salloc(1) Slurm Commands salloc(1)
2
3
4
NAME
salloc - Obtain a Slurm job allocation (a set of nodes), execute a
command, and then release the allocation when the command is finished.
8
9
SYNOPSIS
salloc [OPTIONS(0)...] [ : [OPTIONS(N)...]] [command(0) [args(0)...]]
12
13 Option(s) define multiple jobs in a co-scheduled heterogeneous job.
14 For more details about heterogeneous jobs see the document
15 https://slurm.schedmd.com/heterogeneous_jobs.html
16
17
DESCRIPTION
salloc is used to allocate a Slurm job allocation, which is a set of
resources (nodes), possibly with some set of constraints (e.g. number
of processors per node). When salloc successfully obtains the
requested allocation, it then runs the command specified by the user.
Finally, when the user-specified command is complete, salloc
relinquishes the job allocation.
25
26 The command may be any program the user wishes. Some typical commands
27 are xterm, a shell script containing srun commands, and srun (see the
28 EXAMPLES section). If no command is specified, then salloc runs the
29 user's default shell.
30
31 The following document describes the influence of various options on
32 the allocation of cpus to jobs and tasks.
33 https://slurm.schedmd.com/cpu_management.html
34
35 NOTE: The salloc logic includes support to save and restore the termi‐
36 nal line settings and is designed to be executed in the foreground. If
37 you need to execute salloc in the background, set its standard input to
38 some file, for example: "salloc -n16 a.out </dev/null &"
39
40
RETURN VALUE
If salloc is unable to execute the user command, it will return 1 and
print errors to stderr. Otherwise, on success or if killed by the
signals HUP, INT, KILL, or QUIT, it will return 0.
45
46
COMMAND PATH RESOLUTION
If provided, the command is resolved in the following order:
49
50 1. If command starts with ".", then path is constructed as: current
51 working directory / command
52 2. If command starts with a "/", then path is considered absolute.
53 3. If command can be resolved through PATH. See path_resolution(7).
54 4. If command is in current working directory.
55
56 Current working directory is the calling process working directory un‐
57 less the --chdir argument is passed, which will override the current
58 working directory.
59
60
OPTIONS
-A, --account=<account>
Charge resources used by this job to specified account. The
account is an arbitrary string. The account name may be changed
after job submission using the scontrol command.
66
67 --acctg-freq=<datatype>=<interval>[,<datatype>=<interval>...]
68 Define the job accounting and profiling sampling intervals in
69 seconds. This can be used to override the JobAcctGatherFre‐
70 quency parameter in the slurm.conf file. <datatype>=<interval>
71 specifies the task sampling interval for the jobacct_gather
72 plugin or a sampling interval for a profiling type by the
73 acct_gather_profile plugin. Multiple comma-separated
74 <datatype>=<interval> pairs may be specified. Supported datatype
75 values are:
76
77 task Sampling interval for the jobacct_gather plugins and
78 for task profiling by the acct_gather_profile
79 plugin.
80 NOTE: This frequency is used to monitor memory us‐
81 age. If memory limits are enforced the highest fre‐
82 quency a user can request is what is configured in
83 the slurm.conf file. It can not be disabled.
84
85 energy Sampling interval for energy profiling using the
86 acct_gather_energy plugin.
87
88 network Sampling interval for infiniband profiling using the
89 acct_gather_interconnect plugin.
90
91 filesystem Sampling interval for filesystem profiling using the
92 acct_gather_filesystem plugin.
93
94 The default value for the task sampling interval is 30 seconds.
95 The default value for all other intervals is 0. An interval of
96 0 disables sampling of the specified type. If the task sampling
97 interval is 0, accounting information is collected only at job
98 termination (reducing Slurm interference with the job).
99 Smaller (non-zero) values have a greater impact upon job perfor‐
100 mance, but a value of 30 seconds is not likely to be noticeable
101 for applications having less than 10,000 tasks.
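
For example, a request along the following lines (the intervals and the
command name are illustrative only) samples task accounting every 15
seconds and energy data every 30 seconds:
     salloc --acctg-freq=task=15,energy=30 -N1 my_script.sh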
102
103
104 --bb=<spec>
105 Burst buffer specification. The form of the specification is
106 system dependent. Note the burst buffer may not be accessible
from a login node, but may require that salloc spawn a shell on one
108 of its allocated compute nodes. When the --bb option is used,
109 Slurm parses this option and creates a temporary burst buffer
110 script file that is used internally by the burst buffer plugins.
111 See Slurm's burst buffer guide for more information and exam‐
112 ples:
113 https://slurm.schedmd.com/burst_buffer.html
114
115 --bbf=<file_name>
116 Path of file containing burst buffer specification. The form of
117 the specification is system dependent. Also see --bb. Note the
burst buffer may not be accessible from a login node, but may
require that salloc spawn a shell on one of its allocated compute
120 nodes. See Slurm's burst buffer guide for more information and
121 examples:
122 https://slurm.schedmd.com/burst_buffer.html
123
124 --begin=<time>
125 Defer eligibility of this job allocation until the specified
126 time.
127
128 Time may be of the form HH:MM:SS to run a job at a specific time
129 of day (seconds are optional). (If that time is already past,
130 the next day is assumed.) You may also specify midnight, noon,
131 fika (3 PM) or teatime (4 PM) and you can have a time-of-day
132 suffixed with AM or PM for running in the morning or the
133 evening. You can also say what day the job will be run, by
specifying a date of the form MMDDYY or MM/DD/YY or YYYY-MM-DD.
135 Combine date and time using the following format
136 YYYY-MM-DD[THH:MM[:SS]]. You can also give times like now +
137 count time-units, where the time-units can be seconds (default),
138 minutes, hours, days, or weeks and you can tell Slurm to run the
139 job today with the keyword today and to run the job tomorrow
140 with the keyword tomorrow. The value may be changed after job
141 submission using the scontrol command. For example:
142 --begin=16:00
143 --begin=now+1hour
144 --begin=now+60 (seconds by default)
145 --begin=2010-01-20T12:34:00
146
147
148 Notes on date/time specifications:
149 - Although the 'seconds' field of the HH:MM:SS time specifica‐
150 tion is allowed by the code, note that the poll time of the
151 Slurm scheduler is not precise enough to guarantee dispatch of
152 the job on the exact second. The job will be eligible to start
153 on the next poll following the specified time. The exact poll
154 interval depends on the Slurm scheduler (e.g., 60 seconds with
155 the default sched/builtin).
156 - If no time (HH:MM:SS) is specified, the default is
157 (00:00:00).
158 - If a date is specified without a year (e.g., MM/DD) then the
159 current year is assumed, unless the combination of MM/DD and
160 HH:MM:SS has already passed for that year, in which case the
161 next year is used.
162
163
164 --bell Force salloc to ring the terminal bell when the job allocation
165 is granted (and only if stdout is a tty). By default, salloc
166 only rings the bell if the allocation is pending for more than
167 ten seconds (and only if stdout is a tty). Also see the option
168 --no-bell.
169
170 -D, --chdir=<path>
171 Change directory to path before beginning execution. The path
172 can be specified as full path or relative path to the directory
173 where the command is executed.
174
175 --cluster-constraint=<list>
176 Specifies features that a federated cluster must have to have a
177 sibling job submitted to it. Slurm will attempt to submit a sib‐
178 ling job to a cluster if it has at least one of the specified
179 features.
180
181 -M, --clusters=<string>
182 Clusters to issue commands to. Multiple cluster names may be
183 comma separated. The job will be submitted to the one cluster
184 providing the earliest expected job initiation time. The default
185 value is the current cluster. A value of 'all' will query to run
186 on all clusters. Note that the SlurmDBD must be up for this op‐
187 tion to work properly.
188
189 --comment=<string>
190 An arbitrary comment.
191
192 -C, --constraint=<list>
193 Nodes can have features assigned to them by the Slurm adminis‐
194 trator. Users can specify which of these features are required
195 by their job using the constraint option. Only nodes having
196 features matching the job constraints will be used to satisfy
197 the request. Multiple constraints may be specified with AND,
198 OR, matching OR, resource counts, etc. (some operators are not
199 supported on all system types). Supported constraint options
200 include:
201
202 Single Name
203 Only nodes which have the specified feature will be used.
204 For example, --constraint="intel"
205
206 Node Count
207 A request can specify the number of nodes needed with
208 some feature by appending an asterisk and count after the
209 feature name. For example, --nodes=16 --con‐
210 straint="graphics*4 ..." indicates that the job requires
211 16 nodes and that at least four of those nodes must have
212 the feature "graphics."
213
AND Only nodes with all of the specified features will be
used. The ampersand is used for an AND operator. For
example, --constraint="intel&gpu"

OR Only nodes with at least one of the specified features
will be used. The vertical bar is used for an OR opera‐
tor. For example, --constraint="intel|amd"
221
222 Matching OR
223 If only one of a set of possible options should be used
224 for all allocated nodes, then use the OR operator and en‐
225 close the options within square brackets. For example,
226 --constraint="[rack1|rack2|rack3|rack4]" might be used to
227 specify that all nodes must be allocated on a single rack
228 of the cluster, but any of those four racks can be used.
229
230 Multiple Counts
231 Specific counts of multiple resources may be specified by
232 using the AND operator and enclosing the options within
233 square brackets. For example, --con‐
234 straint="[rack1*2&rack2*4]" might be used to specify that
235 two nodes must be allocated from nodes with the feature
236 of "rack1" and four nodes must be allocated from nodes
237 with the feature "rack2".
238
239 NOTE: This construct does not support multiple Intel KNL
240 NUMA or MCDRAM modes. For example, while --con‐
241 straint="[(knl&quad)*2&(knl&hemi)*4]" is not supported,
242 --constraint="[haswell*2&(knl&hemi)*4]" is supported.
243 Specification of multiple KNL modes requires the use of a
244 heterogeneous job.
245
246 Brackets
247 Brackets can be used to indicate that you are looking for
248 a set of nodes with the different requirements contained
249 within the brackets. For example, --con‐
250 straint="[(rack1|rack2)*1&(rack3)*2]" will get you one
251 node with either the "rack1" or "rack2" features and two
252 nodes with the "rack3" feature. The same request without
253 the brackets will try to find a single node that meets
254 those requirements.
255
256 NOTE: Brackets are only reserved for Multiple Counts and
257 Matching OR syntax. AND operators require a count for
258 each feature inside square brackets (i.e.
259 "[quad*2&hemi*1]"). Slurm will only allow a single set of
260 bracketed constraints per job.
261
Parentheses
Parentheses can be used to group like node features to‐
264 gether. For example, --con‐
265 straint="[(knl&snc4&flat)*4&haswell*1]" might be used to
266 specify that four nodes with the features "knl", "snc4"
267 and "flat" plus one node with the feature "haswell" are
268 required. All options within parenthesis should be
269 grouped with AND (e.g. "&") operands.
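
For example, a complete request using the Matching OR form above (the
rack feature names and the command name are site specific and purely
illustrative) might look like:
     salloc -N4 --constraint="[rack1|rack2|rack3|rack4]" my_script.sh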
270
271 --container=<path_to_container>
272 Absolute path to OCI container bundle.
273
274 --contiguous
275 If set, then the allocated nodes must form a contiguous set.
276
277 NOTE: If SelectPlugin=cons_res this option won't be honored with
278 the topology/tree or topology/3d_torus plugins, both of which
279 can modify the node ordering.
280
281 -S, --core-spec=<num>
282 Count of specialized cores per node reserved by the job for sys‐
283 tem operations and not used by the application. The application
284 will not use these cores, but will be charged for their alloca‐
285 tion. Default value is dependent upon the node's configured
286 CoreSpecCount value. If a value of zero is designated and the
287 Slurm configuration option AllowSpecResourcesUsage is enabled,
288 the job will be allowed to override CoreSpecCount and use the
289 specialized resources on nodes it is allocated. This option can
290 not be used with the --thread-spec option.
291
292 --cores-per-socket=<cores>
293 Restrict node selection to nodes with at least the specified
number of cores per socket. See additional information under the -B
option below when the task/affinity plugin is enabled.
296 NOTE: This option may implicitly set the number of tasks (if -n
297 was not specified) as one task per requested thread.
298
299 --cpu-freq=<p1>[-p2[:p3]]
300
301 Request that job steps initiated by srun commands inside this
302 allocation be run at some requested frequency if possible, on
303 the CPUs selected for the step on the compute node(s).
304
305 p1 can be [#### | low | medium | high | highm1] which will set
306 the frequency scaling_speed to the corresponding value, and set
307 the frequency scaling_governor to UserSpace. See below for defi‐
308 nition of the values.
309
310 p1 can be [Conservative | OnDemand | Performance | PowerSave]
311 which will set the scaling_governor to the corresponding value.
312 The governor has to be in the list set by the slurm.conf option
313 CpuFreqGovernors.
314
315 When p2 is present, p1 will be the minimum scaling frequency and
316 p2 will be the maximum scaling frequency.
317
p2 can be [#### | medium | high | highm1]. p2 must be greater
than p1.
320
321 p3 can be [Conservative | OnDemand | Performance | PowerSave |
322 SchedUtil | UserSpace] which will set the governor to the corre‐
323 sponding value.
324
325 If p3 is UserSpace, the frequency scaling_speed will be set by a
326 power or energy aware scheduling strategy to a value between p1
327 and p2 that lets the job run within the site's power goal. The
328 job may be delayed if p1 is higher than a frequency that allows
329 the job to run within the goal.
330
331 If the current frequency is < min, it will be set to min. Like‐
332 wise, if the current frequency is > max, it will be set to max.
333
334 Acceptable values at present include:
335
336 #### frequency in kilohertz
337
338 Low the lowest available frequency
339
340 High the highest available frequency
341
342 HighM1 (high minus one) will select the next highest
343 available frequency
344
345 Medium attempts to set a frequency in the middle of the
346 available range
347
348 Conservative attempts to use the Conservative CPU governor
349
350 OnDemand attempts to use the OnDemand CPU governor (the de‐
351 fault value)
352
353 Performance attempts to use the Performance CPU governor
354
355 PowerSave attempts to use the PowerSave CPU governor
356
357 UserSpace attempts to use the UserSpace CPU governor
358
The following informational environment variable is set
in the job step when the --cpu-freq option is requested:
SLURM_CPU_FREQ_REQ
363
364 This environment variable can also be used to supply the value
365 for the CPU frequency request if it is set when the 'srun' com‐
366 mand is issued. The --cpu-freq on the command line will over‐
ride the environment variable value. The form of the environ‐
368 ment variable is the same as the command line. See the ENVIRON‐
369 MENT VARIABLES section for a description of the
370 SLURM_CPU_FREQ_REQ variable.
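
As an illustrative request (it assumes the UserSpace governor is listed
in the site's CpuFreqGovernors, and the command name is a placeholder),
the following asks that job steps run between the lowest and highest
available frequencies under the UserSpace governor:
     salloc -n8 --cpu-freq=low-high:UserSpace my_script.sh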
371
372 NOTE: This parameter is treated as a request, not a requirement.
373 If the job step's node does not support setting the CPU fre‐
374 quency, or the requested value is outside the bounds of the le‐
375 gal frequencies, an error is logged, but the job step is allowed
376 to continue.
377
378 NOTE: Setting the frequency for just the CPUs of the job step
379 implies that the tasks are confined to those CPUs. If task con‐
380 finement (i.e. the task/affinity TaskPlugin is enabled, or the
381 task/cgroup TaskPlugin is enabled with "ConstrainCores=yes" set
382 in cgroup.conf) is not configured, this parameter is ignored.
383
384 NOTE: When the step completes, the frequency and governor of
385 each selected CPU is reset to the previous values.
386
NOTE: Submitting jobs with the --cpu-freq option when linuxproc
is used as the ProctrackType can cause jobs to run too quickly,
before accounting is able to poll for job information. As a
result, not all of the accounting information will be present.
391
392 --cpus-per-gpu=<ncpus>
393 Advise Slurm that ensuing job steps will require ncpus proces‐
394 sors per allocated GPU. Not compatible with the --cpus-per-task
395 option.
396
397 -c, --cpus-per-task=<ncpus>
398 Advise Slurm that ensuing job steps will require ncpus proces‐
399 sors per task. By default Slurm will allocate one processor per
400 task.
401
For instance, consider an application that has 4 tasks, each
requiring 3 processors. If our cluster is comprised of
quad-processor nodes and we simply ask for 12 processors, the
controller might give us only 3 nodes. However, by using the
--cpus-per-task=3 option, the controller knows that each task
requires 3 processors on the same node, and the controller will
grant an allocation of 4 nodes, one for each of the 4 tasks.
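
The scenario above corresponds to a request along these lines (the
application name is a placeholder):
     salloc --ntasks=4 --cpus-per-task=3 my_mpi_app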
409
410 --deadline=<OPT>
Remove the job if no ending is possible before this deadline
(start > (deadline - time[-min])). Default is no deadline.
413 Valid time formats are:
414 HH:MM[:SS] [AM|PM]
415 MMDD[YY] or MM/DD[/YY] or MM.DD[.YY]
416 MM/DD[/YY]-HH:MM[:SS]
YYYY-MM-DD[THH:MM[:SS]]
418 now[+count[seconds(default)|minutes|hours|days|weeks]]
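
For instance, either of the following (the times and the command name
are illustrative) rejects the job if it cannot finish before the stated
deadline:
     salloc --deadline=2010-01-20T17:00:00 -N1 my_script.sh
     salloc --deadline=now+2hours -N1 my_script.sh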
419
420 --delay-boot=<minutes>
Do not reboot nodes in order to satisfy this job's feature
422 specification if the job has been eligible to run for less than
423 this time period. If the job has waited for less than the spec‐
424 ified period, it will use only nodes which already have the
425 specified features. The argument is in units of minutes. A de‐
426 fault value may be set by a system administrator using the de‐
427 lay_boot option of the SchedulerParameters configuration parame‐
428 ter in the slurm.conf file, otherwise the default value is zero
429 (no delay).
430
431 -d, --dependency=<dependency_list>
432 Defer the start of this job until the specified dependencies
have been satisfied. <dependency_list> is of the form
434 <type:job_id[:job_id][,type:job_id[:job_id]]> or
435 <type:job_id[:job_id][?type:job_id[:job_id]]>. All dependencies
436 must be satisfied if the "," separator is used. Any dependency
437 may be satisfied if the "?" separator is used. Only one separa‐
438 tor may be used. Many jobs can share the same dependency and
439 these jobs may even belong to different users. The value may
440 be changed after job submission using the scontrol command. De‐
441 pendencies on remote jobs are allowed in a federation. Once a
442 job dependency fails due to the termination state of a preceding
443 job, the dependent job will never be run, even if the preceding
444 job is requeued and has a different termination state in a sub‐
445 sequent execution.
446
447 after:job_id[[+time][:jobid[+time]...]]
448 After the specified jobs start or are cancelled and
449 'time' in minutes from job start or cancellation happens,
450 this job can begin execution. If no 'time' is given then
451 there is no delay after start or cancellation.
452
453 afterany:job_id[:jobid...]
454 This job can begin execution after the specified jobs
455 have terminated.
456
457 afterburstbuffer:job_id[:jobid...]
458 This job can begin execution after the specified jobs
459 have terminated and any associated burst buffer stage out
460 operations have completed.
461
462 aftercorr:job_id[:jobid...]
463 A task of this job array can begin execution after the
464 corresponding task ID in the specified job has completed
465 successfully (ran to completion with an exit code of
466 zero).
467
468 afternotok:job_id[:jobid...]
469 This job can begin execution after the specified jobs
470 have terminated in some failed state (non-zero exit code,
471 node failure, timed out, etc).
472
473 afterok:job_id[:jobid...]
474 This job can begin execution after the specified jobs
475 have successfully executed (ran to completion with an
476 exit code of zero).
477
478 singleton
479 This job can begin execution after any previously
480 launched jobs sharing the same job name and user have
481 terminated. In other words, only one job by that name
482 and owned by that user can be running or suspended at any
483 point in time. In a federation, a singleton dependency
484 must be fulfilled on all clusters unless DependencyParam‐
485 eters=disable_remote_singleton is used in slurm.conf.
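
As a sketch (the job ID 12345 and the command name are hypothetical),
the following defers the allocation until a previously submitted job
has completed successfully:
     salloc --dependency=afterok:12345 -N1 my_script.sh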
486
487 -m, --distribution={*|block|cyclic|arbi‐
488 trary|plane=<size>}[:{*|block|cyclic|fcyclic}[:{*|block|cyclic|fcyclic}]][,{Pack|NoPack}]
489
490 Specify alternate distribution methods for remote processes.
491 For job allocation, this sets environment variables that will be
492 used by subsequent srun requests and also affects which cores
493 will be selected for job allocation.
494
495 This option controls the distribution of tasks to the nodes on
496 which resources have been allocated, and the distribution of
497 those resources to tasks for binding (task affinity). The first
498 distribution method (before the first ":") controls the distri‐
499 bution of tasks to nodes. The second distribution method (after
500 the first ":") controls the distribution of allocated CPUs
501 across sockets for binding to tasks. The third distribution
502 method (after the second ":") controls the distribution of allo‐
503 cated CPUs across cores for binding to tasks. The second and
504 third distributions apply only if task affinity is enabled. The
505 third distribution is supported only if the task/cgroup plugin
506 is configured. The default value for each distribution type is
507 specified by *.
508
509 Note that with select/cons_res and select/cons_tres, the number
510 of CPUs allocated to each socket and node may be different. Re‐
511 fer to https://slurm.schedmd.com/mc_support.html for more infor‐
512 mation on resource allocation, distribution of tasks to nodes,
513 and binding of tasks to CPUs.
514 First distribution method (distribution of tasks across nodes):
515
516
517 * Use the default method for distributing tasks to nodes
518 (block).
519
520 block The block distribution method will distribute tasks to a
521 node such that consecutive tasks share a node. For exam‐
522 ple, consider an allocation of three nodes each with two
523 cpus. A four-task block distribution request will dis‐
524 tribute those tasks to the nodes with tasks one and two
525 on the first node, task three on the second node, and
526 task four on the third node. Block distribution is the
527 default behavior if the number of tasks exceeds the num‐
528 ber of allocated nodes.
529
530 cyclic The cyclic distribution method will distribute tasks to a
531 node such that consecutive tasks are distributed over
532 consecutive nodes (in a round-robin fashion). For exam‐
533 ple, consider an allocation of three nodes each with two
534 cpus. A four-task cyclic distribution request will dis‐
535 tribute those tasks to the nodes with tasks one and four
536 on the first node, task two on the second node, and task
537 three on the third node. Note that when SelectType is
538 select/cons_res, the same number of CPUs may not be allo‐
539 cated on each node. Task distribution will be round-robin
540 among all the nodes with CPUs yet to be assigned to
541 tasks. Cyclic distribution is the default behavior if
542 the number of tasks is no larger than the number of allo‐
543 cated nodes.
544
545 plane The tasks are distributed in blocks of size <size>. The
546 size must be given or SLURM_DIST_PLANESIZE must be set.
547 The number of tasks distributed to each node is the same
548 as for cyclic distribution, but the taskids assigned to
549 each node depend on the plane size. Additional distribu‐
550 tion specifications cannot be combined with this option.
551 For more details (including examples and diagrams),
552 please see https://slurm.schedmd.com/mc_support.html and
553 https://slurm.schedmd.com/dist_plane.html
554
555 arbitrary
The arbitrary method of distribution will allocate
processes in order as listed in the file designated by the
environment variable SLURM_HOSTFILE. If this variable is
set it will override any other method specified. If
not set the method will default to block. The
hostfile must contain at minimum the number of hosts
requested, listed one per line or comma separated. If
specifying a task count (-n, --ntasks=<number>), your tasks
will be laid out on the nodes in the order of the file.
565 NOTE: The arbitrary distribution option on a job alloca‐
566 tion only controls the nodes to be allocated to the job
567 and not the allocation of CPUs on those nodes. This op‐
568 tion is meant primarily to control a job step's task lay‐
569 out in an existing job allocation for the srun command.
570 NOTE: If the number of tasks is given and a list of re‐
571 quested nodes is also given, the number of nodes used
572 from that list will be reduced to match that of the num‐
573 ber of tasks if the number of nodes in the list is
574 greater than the number of tasks.
575
576 Second distribution method (distribution of CPUs across sockets
577 for binding):
578
579
580 * Use the default method for distributing CPUs across sock‐
581 ets (cyclic).
582
583 block The block distribution method will distribute allocated
584 CPUs consecutively from the same socket for binding to
585 tasks, before using the next consecutive socket.
586
587 cyclic The cyclic distribution method will distribute allocated
588 CPUs for binding to a given task consecutively from the
589 same socket, and from the next consecutive socket for the
590 next task, in a round-robin fashion across sockets.
591 Tasks requiring more than one CPU will have all of those
592 CPUs allocated on a single socket if possible.
593
594 fcyclic
595 The fcyclic distribution method will distribute allocated
596 CPUs for binding to tasks from consecutive sockets in a
597 round-robin fashion across the sockets. Tasks requiring
more than one CPU will have each of those CPUs allocated in a
599 cyclic fashion across sockets.
600
601 Third distribution method (distribution of CPUs across cores for
602 binding):
603
604
605 * Use the default method for distributing CPUs across cores
606 (inherited from second distribution method).
607
608 block The block distribution method will distribute allocated
609 CPUs consecutively from the same core for binding to
610 tasks, before using the next consecutive core.
611
612 cyclic The cyclic distribution method will distribute allocated
613 CPUs for binding to a given task consecutively from the
614 same core, and from the next consecutive core for the
615 next task, in a round-robin fashion across cores.
616
617 fcyclic
618 The fcyclic distribution method will distribute allocated
619 CPUs for binding to tasks from consecutive cores in a
620 round-robin fashion across the cores.
621
622 Optional control for task distribution over nodes:
623
624
Pack Rather than distributing a job step's tasks evenly
626 across its allocated nodes, pack them as tightly as pos‐
627 sible on the nodes. This only applies when the "block"
628 task distribution method is used.
629
630 NoPack Rather than packing a job step's tasks as tightly as pos‐
631 sible on the nodes, distribute them evenly. This user
632 option will supersede the SelectTypeParameters
633 CR_Pack_Nodes configuration parameter.
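
For example, the following sketch (the command name is a placeholder)
requests a cyclic distribution of six tasks across three nodes with a
block distribution of CPUs across sockets for binding:
     salloc -N3 -n6 --distribution=cyclic:block my_script.sh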
634
635 -x, --exclude=<node_name_list>
636 Explicitly exclude certain nodes from the resources granted to
637 the job.
638
639 --exclusive[={user|mcs}]
640 The job allocation can not share nodes with other running jobs
641 (or just other users with the "=user" option or with the "=mcs"
642 option). If user/mcs are not specified (i.e. the job allocation
643 can not share nodes with other running jobs), the job is allo‐
644 cated all CPUs and GRES on all nodes in the allocation, but is
645 only allocated as much memory as it requested. This is by design
646 to support gang scheduling, because suspended jobs still reside
647 in memory. To request all the memory on a node, use --mem=0.
648 The default shared/exclusive behavior depends on system configu‐
649 ration and the partition's OverSubscribe option takes precedence
650 over the job's option. NOTE: Since shared GRES (MPS) cannot be
651 allocated at the same time as a sharing GRES (GPU) this option
652 only allocates all sharing GRES and no underlying shared GRES.
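
For example, to request whole nodes together with all of their memory,
as described above (the command name is a placeholder):
     salloc -N2 --exclusive --mem=0 my_script.sh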
653
654 -B, --extra-node-info=<sockets>[:cores[:threads]]
655 Restrict node selection to nodes with at least the specified
656 number of sockets, cores per socket and/or threads per core.
657 NOTE: These options do not specify the resource allocation size.
658 Each value specified is considered a minimum. An asterisk (*)
659 can be used as a placeholder indicating that all available re‐
660 sources of that type are to be utilized. Values can also be
661 specified as min-max. The individual levels can also be speci‐
662 fied in separate options if desired:
663 --sockets-per-node=<sockets>
664 --cores-per-socket=<cores>
665 --threads-per-core=<threads>
666 If task/affinity plugin is enabled, then specifying an alloca‐
667 tion in this manner also results in subsequently launched tasks
668 being bound to threads if the -B option specifies a thread
669 count, otherwise an option of cores if a core count is speci‐
670 fied, otherwise an option of sockets. If SelectType is config‐
671 ured to select/cons_res, it must have a parameter of CR_Core,
672 CR_Core_Memory, CR_Socket, or CR_Socket_Memory for this option
673 to be honored. If not specified, the scontrol show job will
674 display 'ReqS:C:T=*:*:*'. This option applies to job alloca‐
675 tions.
676 NOTE: This option is mutually exclusive with --hint,
677 --threads-per-core and --ntasks-per-core.
678 NOTE: This option may implicitly set the number of tasks (if -n
679 was not specified) as one task per requested thread.
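
As an illustration, either form below restricts selection to nodes with
at least two sockets and eight cores per socket (the counts and the
command name are examples only):
     salloc -B 2:8 -N1 my_script.sh
     salloc --sockets-per-node=2 --cores-per-socket=8 -N1 my_script.sh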
680
681 --get-user-env[=timeout][mode]
682 This option will load login environment variables for the user
683 specified in the --uid option. The environment variables are
684 retrieved by running something along the lines of "su - <user‐
685 name> -c /usr/bin/env" and parsing the output. Be aware that
686 any environment variables already set in salloc's environment
687 will take precedence over any environment variables in the
688 user's login environment. The optional timeout value is in sec‐
689 onds. Default value is 3 seconds. The optional mode value con‐
690 trols the "su" options. With a mode value of "S", "su" is exe‐
691 cuted without the "-" option. With a mode value of "L", "su" is
692 executed with the "-" option, replicating the login environment.
693 If mode is not specified, the mode established at Slurm build
694 time is used. Examples of use include "--get-user-env",
695 "--get-user-env=10" "--get-user-env=10L", and
696 "--get-user-env=S". NOTE: This option only works if the caller
697 has an effective uid of "root".
698
699 --gid=<group>
700 Submit the job with the specified group's group access permis‐
701 sions. group may be the group name or the numerical group ID.
702 In the default Slurm configuration, this option is only valid
703 when used by the user root.
704
705 --gpu-bind=[verbose,]<type>
706 Bind tasks to specific GPUs. By default every spawned task can
707 access every GPU allocated to the step. If "verbose," is speci‐
708 fied before <type>, then print out GPU binding debug information
709 to the stderr of the tasks. GPU binding is ignored if there is
710 only one task.
711
712 Supported type options:
713
714 closest Bind each task to the GPU(s) which are closest. In a
715 NUMA environment, each task may be bound to more than
716 one GPU (i.e. all GPUs in that NUMA environment).
717
718 map_gpu:<list>
Bind by mapping GPU IDs to tasks (or ranks) as spec‐
720 ified where <list> is
721 <gpu_id_for_task_0>,<gpu_id_for_task_1>,... GPU IDs
722 are interpreted as decimal values unless they are pre‐
ceded with '0x' in which case they are interpreted as
724 hexadecimal values. If the number of tasks (or ranks)
725 exceeds the number of elements in this list, elements
726 in the list will be reused as needed starting from the
727 beginning of the list. To simplify support for large
728 task counts, the lists may follow a map with an aster‐
729 isk and repetition count. For example
730 "map_gpu:0*4,1*4". If the task/cgroup plugin is used
731 and ConstrainDevices is set in cgroup.conf, then the
732 GPU IDs are zero-based indexes relative to the GPUs
733 allocated to the job (e.g. the first GPU is 0, even if
734 the global ID is 3). Otherwise, the GPU IDs are global
735 IDs, and all GPUs on each node in the job should be
736 allocated for predictable binding results.
737
738 mask_gpu:<list>
739 Bind by setting GPU masks on tasks (or ranks) as spec‐
740 ified where <list> is
741 <gpu_mask_for_task_0>,<gpu_mask_for_task_1>,... The
742 mapping is specified for a node and identical mapping
743 is applied to the tasks on every node (i.e. the lowest
744 task ID on each node is mapped to the first mask spec‐
745 ified in the list, etc.). GPU masks are always inter‐
746 preted as hexadecimal values but can be preceded with
747 an optional '0x'. To simplify support for large task
748 counts, the lists may follow a map with an asterisk
749 and repetition count. For example
750 "mask_gpu:0x0f*4,0xf0*4". If the task/cgroup plugin
751 is used and ConstrainDevices is set in cgroup.conf,
752 then the GPU IDs are zero-based indexes relative to
753 the GPUs allocated to the job (e.g. the first GPU is
754 0, even if the global ID is 3). Otherwise, the GPU IDs
755 are global IDs, and all GPUs on each node in the job
756 should be allocated for predictable binding results.
757
758 none Do not bind tasks to GPUs (turns off binding if
759 --gpus-per-task is requested).
760
761 per_task:<gpus_per_task>
762 Each task will be bound to the number of gpus speci‐
763 fied in <gpus_per_task>. Gpus are assigned in order to
764 tasks. The first task will be assigned the first x
765 number of gpus on the node etc.
766
767 single:<tasks_per_gpu>
768 Like --gpu-bind=closest, except that each task can
769 only be bound to a single GPU, even when it can be
770 bound to multiple GPUs that are equally close. The
771 GPU to bind to is determined by <tasks_per_gpu>, where
772 the first <tasks_per_gpu> tasks are bound to the first
773 GPU available, the second <tasks_per_gpu> tasks are
774 bound to the second GPU available, etc. This is basi‐
775 cally a block distribution of tasks onto available
776 GPUs, where the available GPUs are determined by the
777 socket affinity of the task and the socket affinity of
778 the GPUs as specified in gres.conf's Cores parameter.
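
For example, the following sketch (the GPU and task counts and the
command name are illustrative) gives each of four tasks its own GPU and
prints the binding decisions to stderr:
     salloc -n4 --gpus=4 --gpu-bind=verbose,per_task:1 my_script.sh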
779
--gpu-freq=[<type>=]<value>[,<type>=<value>][,verbose]
781 Request that GPUs allocated to the job are configured with spe‐
782 cific frequency values. This option can be used to indepen‐
783 dently configure the GPU and its memory frequencies. After the
784 job is completed, the frequencies of all affected GPUs will be
785 reset to the highest possible values. In some cases, system
786 power caps may override the requested values. The field type
787 can be "memory". If type is not specified, the GPU frequency is
788 implied. The value field can either be "low", "medium", "high",
789 "highm1" or a numeric value in megahertz (MHz). If the speci‐
790 fied numeric value is not possible, a value as close as possible
791 will be used. See below for definition of the values. The ver‐
792 bose option causes current GPU frequency information to be
793 logged. Examples of use include "--gpu-freq=medium,memory=high"
794 and "--gpu-freq=450".
795
796 Supported value definitions:
797
798 low the lowest available frequency.
799
800 medium attempts to set a frequency in the middle of the
801 available range.
802
803 high the highest available frequency.
804
805 highm1 (high minus one) will select the next highest avail‐
806 able frequency.
807
808 -G, --gpus=[type:]<number>
809 Specify the total number of GPUs required for the job. An op‐
810 tional GPU type specification can be supplied. For example
811 "--gpus=volta:3". Multiple options can be requested in a comma
812 separated list, for example: "--gpus=volta:3,kepler:1". See
813 also the --gpus-per-node, --gpus-per-socket and --gpus-per-task
814 options.
815 NOTE: The allocation has to contain at least one GPU per node.
816
817 --gpus-per-node=[type:]<number>
818 Specify the number of GPUs required for the job on each node in‐
819 cluded in the job's resource allocation. An optional GPU type
820 specification can be supplied. For example
821 "--gpus-per-node=volta:3". Multiple options can be requested in
822 a comma separated list, for example:
823 "--gpus-per-node=volta:3,kepler:1". See also the --gpus,
824 --gpus-per-socket and --gpus-per-task options.
825
826 --gpus-per-socket=[type:]<number>
827 Specify the number of GPUs required for the job on each socket
828 included in the job's resource allocation. An optional GPU type
829 specification can be supplied. For example
830 "--gpus-per-socket=volta:3". Multiple options can be requested
831 in a comma separated list, for example:
832 "--gpus-per-socket=volta:3,kepler:1". Requires job to specify a
833 sockets per node count ( --sockets-per-node). See also the
834 --gpus, --gpus-per-node and --gpus-per-task options.
835
836 --gpus-per-task=[type:]<number>
837 Specify the number of GPUs required for the job on each task to
838 be spawned in the job's resource allocation. An optional GPU
839 type specification can be supplied. For example
840 "--gpus-per-task=volta:1". Multiple options can be requested in
841 a comma separated list, for example:
842 "--gpus-per-task=volta:3,kepler:1". See also the --gpus,
843 --gpus-per-socket and --gpus-per-node options. This option re‐
844 quires an explicit task count, e.g. -n, --ntasks or "--gpus=X
845 --gpus-per-task=Y" rather than an ambiguous range of nodes with
846 -N, --nodes. This option will implicitly set
847 --gpu-bind=per_task:<gpus_per_task>, but that can be overridden
848 with an explicit --gpu-bind specification.
849
850 --gres=<list>
851 Specifies a comma-delimited list of generic consumable re‐
852 sources. The format of each entry on the list is
853 "name[[:type]:count]". The name is that of the consumable re‐
854 source. The count is the number of those resources with a de‐
855 fault value of 1. The count can have a suffix of "k" or "K"
856 (multiple of 1024), "m" or "M" (multiple of 1024 x 1024), "g" or
857 "G" (multiple of 1024 x 1024 x 1024), "t" or "T" (multiple of
858 1024 x 1024 x 1024 x 1024), "p" or "P" (multiple of 1024 x 1024
859 x 1024 x 1024 x 1024). The specified resources will be allo‐
860 cated to the job on each node. The available generic consumable
resources are configurable by the system administrator. A list
862 of available generic consumable resources will be printed and
863 the command will exit if the option argument is "help". Exam‐
864 ples of use include "--gres=gpu:2", "--gres=gpu:kepler:2", and
865 "--gres=help".
866
867 --gres-flags=<type>
868 Specify generic resource task binding options.
869
870 disable-binding
871 Disable filtering of CPUs with respect to generic re‐
872 source locality. This option is currently required to
873 use more CPUs than are bound to a GRES (i.e. if a GPU is
874 bound to the CPUs on one socket, but resources on more
875 than one socket are required to run the job). This op‐
876 tion may permit a job to be allocated resources sooner
877 than otherwise possible, but may result in lower job per‐
878 formance.
879 NOTE: This option is specific to SelectType=cons_res.
880
881 enforce-binding
882 The only CPUs available to the job will be those bound to
883 the selected GRES (i.e. the CPUs identified in the
884 gres.conf file will be strictly enforced). This option
885 may result in delayed initiation of a job. For example a
886 job requiring two GPUs and one CPU will be delayed until
887 both GPUs on a single socket are available rather than
888 using GPUs bound to separate sockets, however, the appli‐
889 cation performance may be improved due to improved commu‐
890 nication speed. Requires the node to be configured with
891 more than one socket and resource filtering will be per‐
892 formed on a per-socket basis.
893 NOTE: This option is specific to SelectType=cons_tres.
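
For instance, the following sketch (the GPU and CPU counts and the
command name are illustrative) insists that the job only use CPUs that
gres.conf binds to its allocated GPUs:
     salloc -n2 --gres=gpu:2 --gres-flags=enforce-binding my_script.sh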
894
895 -h, --help
896 Display help information and exit.
897
898 --hint=<type>
899 Bind tasks according to application hints.
900 NOTE: This option cannot be used in conjunction with
901 --ntasks-per-core, --threads-per-core or -B. If --hint is speci‐
902 fied as a command line argument, it will take precedence over
903 the environment.
904
905 compute_bound
906 Select settings for compute bound applications: use all
907 cores in each socket, one thread per core.
908
909 memory_bound
910 Select settings for memory bound applications: use only
911 one core in each socket, one thread per core.
912
913 [no]multithread
914 [don't] use extra threads with in-core multi-threading
915 which can benefit communication intensive applications.
916 Only supported with the task/affinity plugin.
917
918 help show this help message
919
920 -H, --hold
921 Specify the job is to be submitted in a held state (priority of
922 zero). A held job can now be released using scontrol to reset
923 its priority (e.g. "scontrol release <job_id>").
924
925 -I, --immediate[=<seconds>]
Exit if resources are not available within the time period spec‐
927 ified. If no argument is given (seconds defaults to 1), re‐
928 sources must be available immediately for the request to suc‐
929 ceed. If defer is configured in SchedulerParameters and sec‐
930 onds=1 the allocation request will fail immediately; defer con‐
931 flicts and takes precedence over this option. By default, --im‐
932 mediate is off, and the command will block until resources be‐
933 come available. Since this option's argument is optional, for
934 proper parsing the single letter option must be followed immedi‐
935 ately with the value and not include a space between them. For
936 example "-I60" and not "-I 60".
937
938 -J, --job-name=<jobname>
939 Specify a name for the job allocation. The specified name will
940 appear along with the job id number when querying running jobs
941 on the system. The default job name is the name of the "com‐
942 mand" specified on the command line.
943
944 -K, --kill-command[=signal]
945 salloc always runs a user-specified command once the allocation
946 is granted. salloc will wait indefinitely for that command to
947 exit. If you specify the --kill-command option salloc will send
948 a signal to your command any time that the Slurm controller
949 tells salloc that its job allocation has been revoked. The job
950 allocation can be revoked for a couple of reasons: someone used
951 scancel to revoke the allocation, or the allocation reached its
952 time limit. If you do not specify a signal name or number and
953 Slurm is configured to signal the spawned command at job termi‐
954 nation, the default signal is SIGHUP for interactive and SIGTERM
955 for non-interactive sessions. Since this option's argument is
956 optional, for proper parsing the single letter option must be
957 followed immediately with the value and not include a space be‐
958 tween them. For example "-K1" and not "-K 1".
959
960 -L, --licenses=<license>[@db][:count][,license[@db][:count]...]
961 Specification of licenses (or other resources available on all
962 nodes of the cluster) which must be allocated to this job. Li‐
963 cense names can be followed by a colon and count (the default
964 count is one). Multiple license names should be comma separated
965 (e.g. "--licenses=foo:4,bar").
966
967 NOTE: When submitting heterogeneous jobs, license requests only
968 work correctly when made on the first component job. For exam‐
969 ple "salloc -L ansys:2 :".
970
971 --mail-type=<type>
972 Notify user by email when certain event types occur. Valid type
973 values are NONE, BEGIN, END, FAIL, REQUEUE, ALL (equivalent to
974 BEGIN, END, FAIL, INVALID_DEPEND, REQUEUE, and STAGE_OUT), IN‐
975 VALID_DEPEND (dependency never satisfied), STAGE_OUT (burst buf‐
976 fer stage out and teardown completed), TIME_LIMIT, TIME_LIMIT_90
977 (reached 90 percent of time limit), TIME_LIMIT_80 (reached 80
978 percent of time limit), and TIME_LIMIT_50 (reached 50 percent of
979 time limit). Multiple type values may be specified in a comma
980 separated list. The user to be notified is indicated with
981 --mail-user.
982
983 --mail-user=<user>
984 User to receive email notification of state changes as defined
985 by --mail-type. The default value is the submitting user.
986
987 --mcs-label=<mcs>
988 Used only when the mcs/group plugin is enabled. This parameter
is a group among the groups of the user. The default value is
calculated by the mcs plugin if it is enabled.
991
992 --mem=<size>[units]
993 Specify the real memory required per node. Default units are
994 megabytes. Different units can be specified using the suffix
995 [K|M|G|T]. Default value is DefMemPerNode and the maximum value
is MaxMemPerNode. If configured, both parameters can be seen
997 using the scontrol show config command. This parameter would
998 generally be used if whole nodes are allocated to jobs (Select‐
999 Type=select/linear). Also see --mem-per-cpu and --mem-per-gpu.
1000 The --mem, --mem-per-cpu and --mem-per-gpu options are mutually
1001 exclusive. If --mem, --mem-per-cpu or --mem-per-gpu are speci‐
1002 fied as command line arguments, then they will take precedence
1003 over the environment.
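
For example (the size and the command name are illustrative), to
request 16 gigabytes of real memory on each allocated node:
     salloc -N2 --mem=16G my_script.sh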
1004
1005 NOTE: A memory size specification of zero is treated as a spe‐
1006 cial case and grants the job access to all of the memory on each
1007 node. If the job is allocated multiple nodes in a heterogeneous
1008 cluster, the memory limit on each node will be that of the node
1009 in the allocation with the smallest memory size (same limit will
1010 apply to every node in the job's allocation).
1011
1012 NOTE: Enforcement of memory limits currently relies upon the
1013 task/cgroup plugin or enabling of accounting, which samples mem‐
1014 ory use on a periodic basis (data need not be stored, just col‐
1015 lected). In both cases memory use is based upon the job's Resi‐
1016 dent Set Size (RSS). A task may exceed the memory limit until
1017 the next periodic accounting sample.
1018
1019 --mem-bind=[{quiet|verbose},]<type>
1020 Bind tasks to memory. Used only when the task/affinity plugin is
1021 enabled and the NUMA memory functions are available. Note that
1022 the resolution of CPU and memory binding may differ on some ar‐
1023 chitectures. For example, CPU binding may be performed at the
1024 level of the cores within a processor while memory binding will
1025 be performed at the level of nodes, where the definition of
1026 "nodes" may differ from system to system. By default no memory
1027 binding is performed; any task using any CPU can use any memory.
1028 This option is typically used to ensure that each task is bound
1029 to the memory closest to its assigned CPU. The use of any type
1030 other than "none" or "local" is not recommended.
1031
1032 NOTE: To have Slurm always report on the selected memory binding
1033 for all commands executed in a shell, you can enable verbose
1034 mode by setting the SLURM_MEM_BIND environment variable value to
1035 "verbose".
1036
1037 The following informational environment variables are set when
1038 --mem-bind is in use:
1039
1040 SLURM_MEM_BIND_LIST
1041 SLURM_MEM_BIND_PREFER
1042 SLURM_MEM_BIND_SORT
1043 SLURM_MEM_BIND_TYPE
1044 SLURM_MEM_BIND_VERBOSE
1045
1046 See the ENVIRONMENT VARIABLES section for a more detailed de‐
1047 scription of the individual SLURM_MEM_BIND* variables.
1048
1049 Supported options include:
1050
1051 help show this help message
1052
1053 local Use memory local to the processor in use
1054
1055 map_mem:<list>
1056 Bind by setting memory masks on tasks (or ranks) as spec‐
1057 ified where <list> is
1058 <numa_id_for_task_0>,<numa_id_for_task_1>,... The map‐
1059 ping is specified for a node and identical mapping is ap‐
1060 plied to the tasks on every node (i.e. the lowest task ID
1061 on each node is mapped to the first ID specified in the
1062 list, etc.). NUMA IDs are interpreted as decimal values
unless they are preceded with '0x' in which case they are
interpreted as hexadecimal values. If the number of tasks
1065 (or ranks) exceeds the number of elements in this list,
1066 elements in the list will be reused as needed starting
1067 from the beginning of the list. To simplify support for
1068 large task counts, the lists may follow a map with an as‐
1069 terisk and repetition count. For example
1070 "map_mem:0x0f*4,0xf0*4". For predictable binding re‐
1071 sults, all CPUs for each node in the job should be allo‐
1072 cated to the job.
1073
1074 mask_mem:<list>
1075 Bind by setting memory masks on tasks (or ranks) as spec‐
1076 ified where <list> is
1077 <numa_mask_for_task_0>,<numa_mask_for_task_1>,... The
1078 mapping is specified for a node and identical mapping is
1079 applied to the tasks on every node (i.e. the lowest task
1080 ID on each node is mapped to the first mask specified in
1081 the list, etc.). NUMA masks are always interpreted as
1082 hexadecimal values. Note that masks must be preceded
1083 with a '0x' if they don't begin with [0-9] so they are
1084 seen as numerical values. If the number of tasks (or
1085 ranks) exceeds the number of elements in this list, ele‐
1086 ments in the list will be reused as needed starting from
1087 the beginning of the list. To simplify support for large
1088 task counts, the lists may follow a mask with an asterisk
1089 and repetition count. For example "mask_mem:0*4,1*4".
1090 For predictable binding results, all CPUs for each node
1091 in the job should be allocated to the job.
1092
1093 no[ne] don't bind tasks to memory (default)
1094
1095 p[refer]
1096 Prefer use of first specified NUMA node, but permit
1097 use of other available NUMA nodes.
1098
1099 q[uiet]
1100 quietly bind before task runs (default)
1101
1102 rank bind by task rank (not recommended)
1103
1104 sort sort free cache pages (run zonesort on Intel KNL nodes)
1105
1106 v[erbose]
1107 verbosely report binding before task runs
1108
1109 --mem-per-cpu=<size>[units]
1110 Minimum memory required per allocated CPU. Default units are
1111 megabytes. Different units can be specified using the suffix
1112 [K|M|G|T]. The default value is DefMemPerCPU and the maximum
1113 value is MaxMemPerCPU (see exception below). If configured, both
1114 parameters can be seen using the scontrol show config command.
1115 Note that if the job's --mem-per-cpu value exceeds the config‐
1116 ured MaxMemPerCPU, then the user's limit will be treated as a
1117 memory limit per task; --mem-per-cpu will be reduced to a value
1118 no larger than MaxMemPerCPU; --cpus-per-task will be set and the
1119 value of --cpus-per-task multiplied by the new --mem-per-cpu
1120 value will equal the original --mem-per-cpu value specified by
1121 the user. This parameter would generally be used if individual
1122 processors are allocated to jobs (SelectType=select/cons_res).
1123 If resources are allocated by core, socket, or whole nodes, then
1124 the number of CPUs allocated to a job may be higher than the
1125 task count and the value of --mem-per-cpu should be adjusted ac‐
1126 cordingly. Also see --mem and --mem-per-gpu. The --mem,
1127 --mem-per-cpu and --mem-per-gpu options are mutually exclusive.
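
As a worked illustration of the adjustment described above (all values
are hypothetical): if MaxMemPerCPU is 4G and a job requests
--mem-per-cpu=8G with one CPU per task, Slurm reduces --mem-per-cpu to
4G and sets --cpus-per-task=2, so that 2 x 4G still equals the
requested 8G per task. Such a request might look like:
     salloc -n1 --mem-per-cpu=8G my_script.sh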
1128
1129 NOTE: If the final amount of memory requested by a job can't be
1130 satisfied by any of the nodes configured in the partition, the
1131 job will be rejected. This could happen if --mem-per-cpu is
1132 used with the --exclusive option for a job allocation and
1133 --mem-per-cpu times the number of CPUs on a node is greater than
1134 the total memory of that node.
1135
1136 --mem-per-gpu=<size>[units]
1137 Minimum memory required per allocated GPU. Default units are
1138 megabytes. Different units can be specified using the suffix
1139 [K|M|G|T]. Default value is DefMemPerGPU and is available on
1140 both a global and per partition basis. If configured, the pa‐
1141 rameters can be seen using the scontrol show config and scontrol
1142 show partition commands. Also see --mem. The --mem,
1143 --mem-per-cpu and --mem-per-gpu options are mutually exclusive.
1144
1145 --mincpus=<n>
1146 Specify a minimum number of logical cpus/processors per node.
1147
1148 --network=<type>
1149 Specify information pertaining to the switch or network. The
1150 interpretation of type is system dependent. This option is sup‐
1151 ported when running Slurm on a Cray natively. It is used to re‐
1152 quest using Network Performance Counters. Only one value per
request is valid. All options are case-insensitive. In this
1154 configuration supported values include:
1155
1156 system
1157 Use the system-wide network performance counters. Only
1158 nodes requested will be marked in use for the job alloca‐
tion. If the job does not fill up the entire system the
rest of the nodes are not able to be used by other jobs
using NPC; if idle, their state will appear as PerfCnts.
1162 These nodes are still available for other jobs not using
1163 NPC.
1164
1165 blade Use the blade network performance counters. Only nodes re‐
1166 quested will be marked in use for the job allocation. If
1167 the job does not fill up the entire blade(s) allocated to
1168 the job those blade(s) are not able to be used by other
jobs using NPC; if idle, their state will appear as PerfC‐
1170 nts. These nodes are still available for other jobs not
1171 using NPC.
1172
1173 In all cases the job allocation request must specify the --ex‐
1174 clusive option. Otherwise the request will be denied.
1175
1176 Also with any of these options steps are not allowed to share
1177 blades, so resources would remain idle inside an allocation if
1178 the step running on a blade does not take up all the nodes on
1179 the blade.
1180
1181 The network option is also supported on systems with IBM's Par‐
1182 allel Environment (PE). See IBM's LoadLeveler job command key‐
1183 word documentation about the keyword "network" for more informa‐
1184 tion. Multiple values may be specified in a comma separated
list. All options are case-insensitive. Supported values in‐
1186 clude:
1187
1188 BULK_XFER[=<resources>]
1189 Enable bulk transfer of data using Remote Di‐
1190 rect-Memory Access (RDMA). The optional resources
1191 specification is a numeric value which can have a
1192 suffix of "k", "K", "m", "M", "g" or "G" for kilo‐
1193 bytes, megabytes or gigabytes. NOTE: The resources
1194 specification is not supported by the underlying IBM
1195 infrastructure as of Parallel Environment version
1196 2.2 and no value should be specified at this time.
1197
CAU=<count> Number of Collective Acceleration Units (CAU) re‐
1199 quired. Applies only to IBM Power7-IH processors.
1200 Default value is zero. Independent CAU will be al‐
1201 located for each programming interface (MPI, LAPI,
1202 etc.)
1203
1204 DEVNAME=<name>
1205 Specify the device name to use for communications
1206 (e.g. "eth0" or "mlx4_0").
1207
1208 DEVTYPE=<type>
1209 Specify the device type to use for communications.
The supported values of type are: "IB" (InfiniBand),
"HFI" (P7 Host Fabric Interface), "IPONLY" (IP-Only
interfaces), "HPCE" (HPC Ethernet), and "KMUX" (Kernel
Emulation of HPCE). The devices allocated to a job must
all be of the same type. The default value depends upon
what hardware is available and, in order of preference,
is IPONLY (which is not considered in User Space mode),
HFI, IB, HPCE, and KMUX.
1220
1221 IMMED =<count>
1222 Number of immediate send slots per window required.
1223 Applies only to IBM Power7-IH processors. Default
1224 value is zero.
1225
1226 INSTANCES =<count>
Specify number of network connections for each task
on each network. The default instance
1229 count is 1.
1230
1231 IPV4 Use Internet Protocol (IP) version 4 communications
1232 (default).
1233
1234 IPV6 Use Internet Protocol (IP) version 6 communications.
1235
1236 LAPI Use the LAPI programming interface.
1237
1238 MPI Use the MPI programming interface. MPI is the de‐
1239 fault interface.
1240
1241 PAMI Use the PAMI programming interface.
1242
1243 SHMEM Use the OpenSHMEM programming interface.
1244
1245 SN_ALL Use all available switch networks (default).
1246
1247 SN_SINGLE Use one available switch network.
1248
1249 UPC Use the UPC programming interface.
1250
1251 US Use User Space communications.
1252
1253 Some examples of network specifications:
1254
1255 Instances=2,US,MPI,SN_ALL
1256 Create two user space connections for MPI communications
1257 on every switch network for each task.
1258
1259 US,MPI,Instances=3,Devtype=IB
1260 Create three user space connections for MPI communica‐
1261 tions on every InfiniBand network for each task.
1262
1263 IPV4,LAPI,SN_Single
1264                      Create an IP version 4 connection for LAPI communications
1265 on one switch network for each task.
1266
1267 Instances=2,US,LAPI,MPI
1268 Create two user space connections each for LAPI and MPI
1269 communications on every switch network for each task.
1270 Note that SN_ALL is the default option so every switch
1271 network is used. Also note that Instances=2 specifies
1272 that two connections are established for each protocol
1273 (LAPI and MPI) and each task. If there are two networks
1274 and four tasks on the node then a total of 32 connections
1275 are established (2 instances x 2 protocols x 2 networks x
1276 4 tasks).
1277
1278 --nice[=adjustment]
1279 Run the job with an adjusted scheduling priority within Slurm.
1280 With no adjustment value the scheduling priority is decreased by
1281               100. A negative nice value increases the priority; a positive
1282               value decreases it. The adjustment range is +/- 2147483645.
1283               Only privileged users can specify a negative adjustment.
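
                    For illustration only (node count and program name are
                    placeholders), a job submitted with a lowered scheduling
                    priority might look like:

                           $ salloc --nice=100 -N1 myprogram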
1284
1285 --no-bell
1286 Silence salloc's use of the terminal bell. Also see the option
1287 --bell.
1288
1289 -k, --no-kill[=off]
1290 Do not automatically terminate a job if one of the nodes it has
1291 been allocated fails. The user will assume the responsibilities
1292 for fault-tolerance should a node fail. When there is a node
1293 failure, any active job steps (usually MPI jobs) on that node
1294 will almost certainly suffer a fatal error, but with --no-kill,
1295 the job allocation will not be revoked so the user may launch
1296 new job steps on the remaining nodes in their allocation.
1297
1298               Specify an optional argument of "off" to disable the effect of the
1299 SALLOC_NO_KILL environment variable.
1300
1301 By default Slurm terminates the entire job allocation if any
1302 node fails in its range of allocated nodes.
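
                    For illustration only (the node count and script name are
                    placeholders), to keep the allocation alive even if a node
                    fails:

                           $ salloc -N8 --no-kill myscript.sh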
1303
1304 --no-shell
1305               Immediately exit after allocating resources, without running a
1306               command. However, the Slurm job will still be created, will re‐
1307               main active, and will own the allocated resources as long as it
1308 is active. You will have a Slurm job id with no associated pro‐
1309 cesses or tasks. You can submit srun commands against this re‐
1310 source allocation, if you specify the --jobid= option with the
1311 job id of this Slurm job. Or, this can be used to temporarily
1312 reserve a set of resources so that other jobs cannot use them
1313 for some period of time. (Note that the Slurm job is subject to
1314 the normal constraints on jobs, including time limits, so that
1315 eventually the job will terminate and the resources will be
1316 freed, or you can terminate the job manually using the scancel
1317 command.)
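
                    For illustration only (the job id shown is a placeholder), a
                    typical sequence might be to create the allocation, run job
                    steps against it with srun, and release it with scancel:

                           $ salloc -N2 --no-shell
                           salloc: Granted job allocation 65541
                           $ srun --jobid=65541 hostname
                           $ scancel 65541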
1318
1319 -F, --nodefile=<node_file>
1320 Much like --nodelist, but the list is contained in a file of
1321               name node_file. The node names of the list may also span multi‐
1322 ple lines in the file. Duplicate node names in the file will
1323 be ignored. The order of the node names in the list is not im‐
1324 portant; the node names will be sorted by Slurm.
1325
1326 -w, --nodelist=<node_name_list>
1327 Request a specific list of hosts. The job will contain all of
1328 these hosts and possibly additional hosts as needed to satisfy
1329 resource requirements. The list may be specified as a
1330 comma-separated list of hosts, a range of hosts (host[1-5,7,...]
1331 for example), or a filename. The host list will be assumed to
1332 be a filename if it contains a "/" character. If you specify a
1333 minimum node or processor count larger than can be satisfied by
1334 the supplied host list, additional resources will be allocated
1335 on other nodes as needed. Duplicate node names in the list will
1336 be ignored. The order of the node names in the list is not im‐
1337 portant; the node names will be sorted by Slurm.
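
                    For illustration only (the host names and counts are
                    placeholders), a request for two specific nodes plus
                    additional nodes as needed might look like:

                           $ salloc -N4 -w node[1-2] myprogram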
1338
1339 -N, --nodes=<minnodes>[-maxnodes]
1340 Request that a minimum of minnodes nodes be allocated to this
1341 job. A maximum node count may also be specified with maxnodes.
1342 If only one number is specified, this is used as both the mini‐
1343 mum and maximum node count. The partition's node limits super‐
1344 sede those of the job. If a job's node limits are outside of
1345 the range permitted for its associated partition, the job will
1346 be left in a PENDING state. This permits possible execution at
1347 a later time, when the partition limit is changed. If a job
1348 node limit exceeds the number of nodes configured in the parti‐
1349 tion, the job will be rejected. Note that the environment vari‐
1350 able SLURM_JOB_NUM_NODES will be set to the count of nodes actu‐
1351 ally allocated to the job. See the ENVIRONMENT VARIABLES sec‐
1352 tion for more information. If -N is not specified, the default
1353 behavior is to allocate enough nodes to satisfy the requested
1354 resources as expressed by per-job specification options, e.g.
1355 -n, -c and --gpus. The job will be allocated as many nodes as
1356 possible within the range specified and without delaying the
1357 initiation of the job. The node count specification may include
1358 a numeric value followed by a suffix of "k" (multiplies numeric
1359 value by 1,024) or "m" (multiplies numeric value by 1,048,576).
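
                    For illustration only (counts and program name are
                    placeholders), a request for between two and four nodes
                    might look like:

                           $ salloc -N 2-4 myprogram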
1360
1361 -n, --ntasks=<number>
1362               salloc does not launch tasks; it requests an allocation of re‐
1363               sources and executes a command. This option advises the Slurm
1364 controller that job steps run within this allocation will launch
1365 a maximum of number tasks and sufficient resources are allocated
1366 to accomplish this. The default is one task per node, but note
1367 that the --cpus-per-task option will change this default.
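
                    For illustration only (the task count and program name are
                    placeholders), an allocation sized for sixteen tasks might
                    look like:

                           $ salloc -n16 srun myprogram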
1368
1369 --ntasks-per-core=<ntasks>
1370 Request the maximum ntasks be invoked on each core. Meant to be
1371 used with the --ntasks option. Related to --ntasks-per-node ex‐
1372 cept at the core level instead of the node level. NOTE: This
1373 option is not supported when using SelectType=select/linear.
1374
1375 --ntasks-per-gpu=<ntasks>
1376 Request that there are ntasks tasks invoked for every GPU. This
1377 option can work in two ways: 1) either specify --ntasks in addi‐
1378 tion, in which case a type-less GPU specification will be auto‐
1379 matically determined to satisfy --ntasks-per-gpu, or 2) specify
1380 the GPUs wanted (e.g. via --gpus or --gres) without specifying
1381 --ntasks, and the total task count will be automatically deter‐
1382 mined. The number of CPUs needed will be automatically in‐
1383 creased if necessary to allow for any calculated task count.
1384 This option will implicitly set --gpu-bind=single:<ntasks>, but
1385 that can be overridden with an explicit --gpu-bind specifica‐
1386 tion. This option is not compatible with a node range (i.e.
1387 -N<minnodes-maxnodes>). This option is not compatible with
1388 --gpus-per-task, --gpus-per-socket, or --ntasks-per-node. This
1389 option is not supported unless SelectType=cons_tres is config‐
1390 ured (either directly or indirectly on Cray systems).
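
                    For illustration only (GPU and task counts are
                    placeholders), the second form described above, where the
                    task count is derived from the GPU request, might look like:

                           $ salloc --gpus=4 --ntasks-per-gpu=2 srun myprogram

                    With these values the intent is a total of eight tasks, two
                    per allocated GPU.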
1391
1392 --ntasks-per-node=<ntasks>
1393 Request that ntasks be invoked on each node. If used with the
1394 --ntasks option, the --ntasks option will take precedence and
1395 the --ntasks-per-node will be treated as a maximum count of
1396 tasks per node. Meant to be used with the --nodes option. This
1397 is related to --cpus-per-task=ncpus, but does not require knowl‐
1398 edge of the actual number of cpus on each node. In some cases,
1399 it is more convenient to be able to request that no more than a
1400 specific number of tasks be invoked on each node. Examples of
1401 this include submitting a hybrid MPI/OpenMP app where only one
1402 MPI "task/rank" should be assigned to each node while allowing
1403 the OpenMP portion to utilize all of the parallelism present in
1404 the node, or submitting a single setup/cleanup/monitoring job to
1405 each node of a pre-existing allocation as one step in a larger
1406 job script.
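
                    For illustration only (node and CPU counts are
                    placeholders), a hybrid MPI/OpenMP layout with one task per
                    node might look like:

                           $ salloc -N4 --ntasks-per-node=1 --cpus-per-task=8 \
                                  srun myprogram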
1407
1408 --ntasks-per-socket=<ntasks>
1409 Request the maximum ntasks be invoked on each socket. Meant to
1410 be used with the --ntasks option. Related to --ntasks-per-node
1411 except at the socket level instead of the node level. NOTE:
1412 This option is not supported when using SelectType=select/lin‐
1413 ear.
1414
1415 -O, --overcommit
1416 Overcommit resources.
1417
1418 When applied to a job allocation (not including jobs requesting
1419 exclusive access to the nodes) the resources are allocated as if
1420 only one task per node is requested. This means that the re‐
1421 quested number of cpus per task (-c, --cpus-per-task) are allo‐
1422 cated per node rather than being multiplied by the number of
1423 tasks. Options used to specify the number of tasks per node,
1424 socket, core, etc. are ignored.
1425
1426 When applied to job step allocations (the srun command when exe‐
1427 cuted within an existing job allocation), this option can be
1428 used to launch more than one task per CPU. Normally, srun will
1429 not allocate more than one process per CPU. By specifying
1430 --overcommit you are explicitly allowing more than one process
1431 per CPU. However no more than MAX_TASKS_PER_NODE tasks are per‐
1432 mitted to execute per node. NOTE: MAX_TASKS_PER_NODE is defined
1433 in the file slurm.h and is not a variable, it is set at Slurm
1434 build time.
1435
1436 -s, --oversubscribe
1437 The job allocation can over-subscribe resources with other run‐
1438 ning jobs. The resources to be over-subscribed can be nodes,
1439 sockets, cores, and/or hyperthreads depending upon configura‐
1440 tion. The default over-subscribe behavior depends on system
1441 configuration and the partition's OverSubscribe option takes
1442 precedence over the job's option. This option may result in the
1443 allocation being granted sooner than if the --oversubscribe op‐
1444 tion was not set and allow higher system utilization, but appli‐
1445 cation performance will likely suffer due to competition for re‐
1446 sources. Also see the --exclusive option.
1447
1448 -p, --partition=<partition_names>
1449 Request a specific partition for the resource allocation. If
1450 not specified, the default behavior is to allow the slurm con‐
1451 troller to select the default partition as designated by the
1452 system administrator. If the job can use more than one parti‐
1453               tion, specify their names in a comma-separated list and the one
1454 offering earliest initiation will be used with no regard given
1455 to the partition name ordering (although higher priority parti‐
1456 tions will be considered first). When the job is initiated, the
1457 name of the partition used will be placed first in the job
1458 record partition string.
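
                    For illustration only (the partition names are
                    placeholders), a job allowed to start in whichever of two
                    partitions offers the earliest initiation might look like:

                           $ salloc -p debug,batch -N2 myprogram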
1459
1460 --power=<flags>
1461 Comma separated list of power management plugin options. Cur‐
1462 rently available flags include: level (all nodes allocated to
1463 the job should have identical power caps, may be disabled by the
1464 Slurm configuration option PowerParameters=job_no_level).
1465
1466 --priority=<value>
1467 Request a specific job priority. May be subject to configura‐
1468 tion specific constraints. value should either be a numeric
1469 value or "TOP" (for highest possible value). Only Slurm opera‐
1470 tors and administrators can set the priority of a job.
1471
1472 --profile={all|none|<type>[,<type>...]}
1473 Enables detailed data collection by the acct_gather_profile
1474 plugin. Detailed data are typically time-series that are stored
1475 in an HDF5 file for the job or an InfluxDB database depending on
1476 the configured plugin.
1477
1478 All All data types are collected. (Cannot be combined with
1479 other values.)
1480
1481 None No data types are collected. This is the default.
1482 (Cannot be combined with other values.)
1483
1484 Valid type values are:
1485
1486 Energy Energy data is collected.
1487
1488 Task Task (I/O, Memory, ...) data is collected.
1489
1490 Lustre Lustre data is collected.
1491
1492 Network
1493 Network (InfiniBand) data is collected.
1494
1495 -q, --qos=<qos>
1496 Request a quality of service for the job. QOS values can be de‐
1497 fined for each user/cluster/account association in the Slurm
1498 database. Users will be limited to their association's defined
1499 set of qos's when the Slurm configuration parameter, Account‐
1500 ingStorageEnforce, includes "qos" in its definition.
1501
1502 -Q, --quiet
1503 Suppress informational messages from salloc. Errors will still
1504 be displayed.
1505
1506 --reboot
1507 Force the allocated nodes to reboot before starting the job.
1508 This is only supported with some system configurations and will
1509 otherwise be silently ignored. Only root, SlurmUser or admins
1510 can reboot nodes.
1511
1512 --reservation=<reservation_names>
1513 Allocate resources for the job from the named reservation. If
1514               the job can use more than one reservation, specify their names
1515               in a comma-separated list and the one offering the earliest
1516               initiation will be used. Each reservation will be considered in
1517               the order it was requested. All reservations will be listed in
1518               scontrol/squeue through the life of the job. In accounting, the
1519               first reservation will be seen and after the job starts the
1520               reservation used will replace it.
1521
1522 --signal=[R:]<sig_num>[@sig_time]
1523 When a job is within sig_time seconds of its end time, send it
1524 the signal sig_num. Due to the resolution of event handling by
1525 Slurm, the signal may be sent up to 60 seconds earlier than
1526 specified. sig_num may either be a signal number or name (e.g.
1527 "10" or "USR1"). sig_time must have an integer value between 0
1528 and 65535. By default, no signal is sent before the job's end
1529 time. If a sig_num is specified without any sig_time, the de‐
1530 fault time will be 60 seconds. Use the "R:" option to allow
1531 this job to overlap with a reservation with MaxStartDelay set.
1532 To have the signal sent at preemption time see the pre‐
1533 empt_send_user_signal SlurmctldParameter.
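
                    For illustration only (the time limit and signal choice are
                    placeholders), a job that should receive SIGUSR1 roughly two
                    minutes before its 30-minute limit might look like:

                           $ salloc -t 30 --signal=USR1@120 -N1 myprogram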
1534
1535 --sockets-per-node=<sockets>
1536 Restrict node selection to nodes with at least the specified
1537 number of sockets. See additional information under -B option
1538 above when task/affinity plugin is enabled.
1539 NOTE: This option may implicitly set the number of tasks (if -n
1540 was not specified) as one task per requested thread.
1541
1542 --spread-job
1543 Spread the job allocation over as many nodes as possible and at‐
1544 tempt to evenly distribute tasks across the allocated nodes.
1545 This option disables the topology/tree plugin.
1546
1547 --switches=<count>[@max-time]
1548 When a tree topology is used, this defines the maximum count of
1549 leaf switches desired for the job allocation and optionally the
1550 maximum time to wait for that number of switches. If Slurm finds
1551 an allocation containing more switches than the count specified,
1552 the job remains pending until it either finds an allocation with
1553               desired switch count or the time limit expires. If there is no
1554 switch count limit, there is no delay in starting the job. Ac‐
1555 ceptable time formats include "minutes", "minutes:seconds",
1556 "hours:minutes:seconds", "days-hours", "days-hours:minutes" and
1557 "days-hours:minutes:seconds". The job's maximum time delay may
1558 be limited by the system administrator using the SchedulerParam‐
1559 eters configuration parameter with the max_switch_wait parameter
1560 option. On a dragonfly network the only switch count supported
1561 is 1 since communication performance will be highest when a job
1562               is allocated resources on one leaf switch or more than 2 leaf
1563               switches. The default max-time is the max_switch_wait Sched‐
1564               ulerParameters value.
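
                    For illustration only (node count and wait time are
                    placeholders), a request confined to a single leaf switch,
                    waiting at most 60 minutes for such an allocation, might
                    look like:

                           $ salloc -N16 --switches=1@60 srun myprogram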
1565
1566 --thread-spec=<num>
1567 Count of specialized threads per node reserved by the job for
1568 system operations and not used by the application. The applica‐
1569 tion will not use these threads, but will be charged for their
1570 allocation. This option can not be used with the --core-spec
1571 option.
1572
1573 --threads-per-core=<threads>
1574 Restrict node selection to nodes with at least the specified
1575 number of threads per core. In task layout, use the specified
1576 maximum number of threads per core. NOTE: "Threads" refers to
1577 the number of processing units on each core rather than the num‐
1578 ber of application tasks to be launched per core. See addi‐
1579 tional information under -B option above when task/affinity
1580 plugin is enabled.
1581 NOTE: This option may implicitly set the number of tasks (if -n
1582 was not specified) as one task per requested thread.
1583
1584 -t, --time=<time>
1585 Set a limit on the total run time of the job allocation. If the
1586 requested time limit exceeds the partition's time limit, the job
1587 will be left in a PENDING state (possibly indefinitely). The
1588 default time limit is the partition's default time limit. When
1589 the time limit is reached, each task in each job step is sent
1590 SIGTERM followed by SIGKILL. The interval between signals is
1591 specified by the Slurm configuration parameter KillWait. The
1592 OverTimeLimit configuration parameter may permit the job to run
1593 longer than scheduled. Time resolution is one minute and second
1594 values are rounded up to the next minute.
1595
1596 A time limit of zero requests that no time limit be imposed.
1597 Acceptable time formats include "minutes", "minutes:seconds",
1598 "hours:minutes:seconds", "days-hours", "days-hours:minutes" and
1599 "days-hours:minutes:seconds".
1600
1601 --time-min=<time>
1602 Set a minimum time limit on the job allocation. If specified,
1603 the job may have its --time limit lowered to a value no lower
1604 than --time-min if doing so permits the job to begin execution
1605 earlier than otherwise possible. The job's time limit will not
1606 be changed after the job is allocated resources. This is per‐
1607 formed by a backfill scheduling algorithm to allocate resources
1608 otherwise reserved for higher priority jobs. Acceptable time
1609 formats include "minutes", "minutes:seconds", "hours:min‐
1610 utes:seconds", "days-hours", "days-hours:minutes" and
1611 "days-hours:minutes:seconds".
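
                    For illustration only (all times are placeholders), a job
                    requesting four hours with --time but willing to be trimmed
                    to no less than one hour by the backfill scheduler might
                    look like:

                           $ salloc -N2 -t 4:00:00 --time-min=1:00:00 myprogram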
1612
1613 --tmp=<size>[units]
1614 Specify a minimum amount of temporary disk space per node. De‐
1615 fault units are megabytes. Different units can be specified us‐
1616 ing the suffix [K|M|G|T].
1617
1618 --uid=<user>
1619 Attempt to submit and/or run a job as user instead of the invok‐
1620 ing user id. The invoking user's credentials will be used to
1621               check access permissions for the target partition. This option
1622               is only valid for user root. User root may use this option to
1623               run jobs as a normal user in a RootOnly partition, for example.
1624               If run as root, salloc will drop
1625 its permissions to the uid specified after node allocation is
1626 successful. user may be the user name or numerical user ID.
1627
1628 --usage
1629 Display brief help message and exit.
1630
1631 --use-min-nodes
1632 If a range of node counts is given, prefer the smaller count.
1633
1634 -v, --verbose
1635 Increase the verbosity of salloc's informational messages. Mul‐
1636 tiple -v's will further increase salloc's verbosity. By default
1637 only errors will be displayed.
1638
1639 -V, --version
1640 Display version information and exit.
1641
1642 --wait-all-nodes=<value>
1643 Controls when the execution of the command begins with respect
1644 to when nodes are ready for use (i.e. booted). By default, the
1645 salloc command will return as soon as the allocation is made.
1646 This default can be altered using the salloc_wait_nodes option
1647 to the SchedulerParameters parameter in the slurm.conf file.
1648
1649 0 Begin execution as soon as allocation can be made. Do not
1650 wait for all nodes to be ready for use (i.e. booted).
1651
1652 1 Do not begin execution until all nodes are ready for use.
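
                    For illustration only (node count and program name are
                    placeholders), to delay the start of the command until every
                    allocated node has booted:

                           $ salloc -N8 --wait-all-nodes=1 srun myprogram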
1653
1654 --wckey=<wckey>
1655 Specify wckey to be used with job. If TrackWCKey=no (default)
1656 in the slurm.conf this value is ignored.
1657
1658 --x11[={all|first|last}]
1659 Sets up X11 forwarding on "all", "first" or "last" node(s) of
1660 the allocation. This option is only enabled if Slurm was com‐
1661 piled with X11 support and PrologFlags=x11 is defined in the
1662 slurm.conf. Default is "all".
1663
1664 PERFORMANCE
1665        Executing salloc sends a remote procedure call to slurmctld. If enough
1666 calls from salloc or other Slurm client commands that send remote pro‐
1667 cedure calls to the slurmctld daemon come in at once, it can result in
1668 a degradation of performance of the slurmctld daemon, possibly result‐
1669 ing in a denial of service.
1670
1671 Do not run salloc or other Slurm client commands that send remote pro‐
1672 cedure calls to slurmctld from loops in shell scripts or other pro‐
1673 grams. Ensure that programs limit calls to salloc to the minimum neces‐
1674 sary for the information you are trying to gather.
1675
1676
1677 INPUT ENVIRONMENT VARIABLES
1678        Upon startup, salloc will read and handle the options set in the fol‐
1679 lowing environment variables. The majority of these variables are set
1680 the same way the options are set, as defined above. For flag options
1681 that are defined to expect no argument, the option can be enabled by
1682 setting the environment variable without a value (empty or NULL
1683 string), the string 'yes', or a non-zero number. Any other value for
1684 the environment variable will result in the option not being set.
1685        There are a couple of exceptions to these rules that are noted below.
1686 NOTE: Command line options always override environment variables set‐
1687 tings.
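
             For illustration only, a flag option such as --exclusive can be
             enabled through its environment variable before invoking salloc:

                    $ export SALLOC_EXCLUSIVE=yes
                    $ salloc -N1 myprogram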
1688
1689
1690 SALLOC_ACCOUNT Same as -A, --account
1691
1692 SALLOC_ACCTG_FREQ Same as --acctg-freq
1693
1694 SALLOC_BELL Same as --bell
1695
1696 SALLOC_BURST_BUFFER Same as --bb
1697
1698 SALLOC_CLUSTERS or SLURM_CLUSTERS
1699 Same as --clusters
1700
1701 SALLOC_CONSTRAINT Same as -C, --constraint
1702
1703 SALLOC_CONTAINER Same as --container.
1704
1705 SALLOC_CORE_SPEC Same as --core-spec
1706
1707 SALLOC_CPUS_PER_GPU Same as --cpus-per-gpu
1708
1709 SALLOC_DEBUG Same as -v, --verbose. Must be set to 0 or 1 to
1710 disable or enable the option.
1711
1712 SALLOC_DELAY_BOOT Same as --delay-boot
1713
1714 SALLOC_EXCLUSIVE Same as --exclusive
1715
1716 SALLOC_GPU_BIND Same as --gpu-bind
1717
1718 SALLOC_GPU_FREQ Same as --gpu-freq
1719
1720 SALLOC_GPUS Same as -G, --gpus
1721
1722 SALLOC_GPUS_PER_NODE Same as --gpus-per-node
1723
1724 SALLOC_GPUS_PER_TASK Same as --gpus-per-task
1725
1726 SALLOC_GRES Same as --gres
1727
1728 SALLOC_GRES_FLAGS Same as --gres-flags
1729
1730 SALLOC_HINT or SLURM_HINT
1731 Same as --hint
1732
1733 SALLOC_IMMEDIATE Same as -I, --immediate
1734
1735 SALLOC_KILL_CMD Same as -K, --kill-command
1736
1737 SALLOC_MEM_BIND Same as --mem-bind
1738
1739 SALLOC_MEM_PER_CPU Same as --mem-per-cpu
1740
1741 SALLOC_MEM_PER_GPU Same as --mem-per-gpu
1742
1743 SALLOC_MEM_PER_NODE Same as --mem
1744
1745 SALLOC_NETWORK Same as --network
1746
1747 SALLOC_NO_BELL Same as --no-bell
1748
1749 SALLOC_NO_KILL Same as -k, --no-kill
1750
1751 SALLOC_OVERCOMMIT Same as -O, --overcommit
1752
1753 SALLOC_PARTITION Same as -p, --partition
1754
1755 SALLOC_POWER Same as --power
1756
1757 SALLOC_PROFILE Same as --profile
1758
1759 SALLOC_QOS Same as --qos
1760
1761 SALLOC_REQ_SWITCH When a tree topology is used, this defines the
1762 maximum count of switches desired for the job al‐
1763 location and optionally the maximum time to wait
1764 for that number of switches. See --switches.
1765
1766 SALLOC_RESERVATION Same as --reservation
1767
1768 SALLOC_SIGNAL Same as --signal
1769
1770 SALLOC_SPREAD_JOB Same as --spread-job
1771
1772 SALLOC_THREAD_SPEC Same as --thread-spec
1773
1774 SALLOC_THREADS_PER_CORE
1775 Same as --threads-per-core
1776
1777 SALLOC_TIMELIMIT Same as -t, --time
1778
1779 SALLOC_USE_MIN_NODES Same as --use-min-nodes
1780
1781 SALLOC_WAIT_ALL_NODES Same as --wait-all-nodes. Must be set to 0 or 1
1782 to disable or enable the option.
1783
1784 SALLOC_WAIT4SWITCH Max time waiting for requested switches. See
1785 --switches
1786
1787 SALLOC_WCKEY Same as --wckey
1788
1789 SLURM_CONF The location of the Slurm configuration file.
1790
1791 SLURM_EXIT_ERROR Specifies the exit code generated when a Slurm
1792 error occurs (e.g. invalid options). This can be
1793 used by a script to distinguish application exit
1794 codes from various Slurm error conditions. Also
1795 see SLURM_EXIT_IMMEDIATE.
1796
1797 SLURM_EXIT_IMMEDIATE Specifies the exit code generated when the --im‐
1798 mediate option is used and resources are not cur‐
1799 rently available. This can be used by a script
1800 to distinguish application exit codes from vari‐
1801 ous Slurm error conditions. Also see
1802 SLURM_EXIT_ERROR.
1803
1804 OUTPUT ENVIRONMENT VARIABLES
1805        salloc will set the following environment variables in the environment
1806 of the executed program:
1807
1808 SLURM_*_HET_GROUP_#
1809 For a heterogeneous job allocation, the environment variables
1810 are set separately for each component.
1811
1812 SLURM_CLUSTER_NAME
1813 Name of the cluster on which the job is executing.
1814
1815 SLURM_CONTAINER
1816 OCI Bundle for job. Only set if --container is specified.
1817
1818 SLURM_CPUS_PER_GPU
1819 Number of CPUs requested per allocated GPU. Only set if the
1820 --cpus-per-gpu option is specified.
1821
1822 SLURM_CPUS_PER_TASK
1823 Number of CPUs requested per task. Only set if the
1824 --cpus-per-task option is specified.
1825
1826 SLURM_DIST_PLANESIZE
1827 Plane distribution size. Only set for plane distributions. See
1828 -m, --distribution.
1829
1830 SLURM_DISTRIBUTION
1831 Only set if the -m, --distribution option is specified.
1832
1833 SLURM_GPU_BIND
1834 Requested binding of tasks to GPU. Only set if the --gpu-bind
1835 option is specified.
1836
1837 SLURM_GPU_FREQ
1838 Requested GPU frequency. Only set if the --gpu-freq option is
1839 specified.
1840
1841 SLURM_GPUS
1842 Number of GPUs requested. Only set if the -G, --gpus option is
1843 specified.
1844
1845 SLURM_GPUS_PER_NODE
1846 Requested GPU count per allocated node. Only set if the
1847 --gpus-per-node option is specified.
1848
1849 SLURM_GPUS_PER_SOCKET
1850 Requested GPU count per allocated socket. Only set if the
1851 --gpus-per-socket option is specified.
1852
1853 SLURM_GPUS_PER_TASK
1854 Requested GPU count per allocated task. Only set if the
1855 --gpus-per-task option is specified.
1856
1857 SLURM_HET_SIZE
1858 Set to count of components in heterogeneous job.
1859
1860 SLURM_JOB_ACCOUNT
1861               Account name associated with the job allocation.
1862
1863 SLURM_JOB_ID
1864 The ID of the job allocation.
1865
1866 SLURM_JOB_CPUS_PER_NODE
1867 Count of CPUs available to the job on the nodes in the alloca‐
1868 tion, using the format CPU_count[(xnumber_of_nodes)][,CPU_count
1869 [(xnumber_of_nodes)] ...]. For example:
1870 SLURM_JOB_CPUS_PER_NODE='72(x2),36' indicates that on the first
1871 and second nodes (as listed by SLURM_JOB_NODELIST) the alloca‐
1872 tion has 72 CPUs, while the third node has 36 CPUs. NOTE: The
1873 select/linear plugin allocates entire nodes to jobs, so the
1874 value indicates the total count of CPUs on allocated nodes. The
1875 select/cons_res and select/cons_tres plugins allocate individual
1876 CPUs to jobs, so this number indicates the number of CPUs allo‐
1877 cated to the job.
1878
1879 SLURM_JOB_NODELIST
1880 List of nodes allocated to the job.
1881
1882 SLURM_JOB_NUM_NODES
1883 Total number of nodes in the job allocation.
1884
1885 SLURM_JOB_PARTITION
1886 Name of the partition in which the job is running.
1887
1888 SLURM_JOB_QOS
1889 Quality Of Service (QOS) of the job allocation.
1890
1891 SLURM_JOB_RESERVATION
1892 Advanced reservation containing the job allocation, if any.
1893
1894 SLURM_JOBID
1895 The ID of the job allocation. See SLURM_JOB_ID. Included for
1896 backwards compatibility.
1897
1898 SLURM_MEM_BIND
1899 Set to value of the --mem-bind option.
1900
1901 SLURM_MEM_BIND_LIST
1902 Set to bit mask used for memory binding.
1903
1904 SLURM_MEM_BIND_PREFER
1905 Set to "prefer" if the --mem-bind option includes the prefer op‐
1906 tion.
1907
1908 SLURM_MEM_BIND_SORT
1909 Sort free cache pages (run zonesort on Intel KNL nodes)
1910
1911 SLURM_MEM_BIND_TYPE
1912 Set to the memory binding type specified with the --mem-bind op‐
1913 tion. Possible values are "none", "rank", "map_map", "mask_mem"
1914 and "local".
1915
1916 SLURM_MEM_BIND_VERBOSE
1917 Set to "verbose" if the --mem-bind option includes the verbose
1918 option. Set to "quiet" otherwise.
1919
1920 SLURM_MEM_PER_CPU
1921 Same as --mem-per-cpu
1922
1923 SLURM_MEM_PER_GPU
1924 Requested memory per allocated GPU. Only set if the
1925 --mem-per-gpu option is specified.
1926
1927 SLURM_MEM_PER_NODE
1928 Same as --mem
1929
1930 SLURM_NNODES
1931 Total number of nodes in the job allocation. See
1932 SLURM_JOB_NUM_NODES. Included for backwards compatibility.
1933
1934 SLURM_NODELIST
1935 List of nodes allocated to the job. See SLURM_JOB_NODELIST. In‐
1936               cluded for backwards compatibility.
1937
1938 SLURM_NODE_ALIASES
1939 Sets of node name, communication address and hostname for nodes
1940               allocated to the job from the cloud. Each element in the set is
1941 colon separated and each set is comma separated. For example:
1942 SLURM_NODE_ALIASES=ec0:1.2.3.4:foo,ec1:1.2.3.5:bar
1943
1944 SLURM_NTASKS
1945 Same as -n, --ntasks
1946
1947 SLURM_NTASKS_PER_CORE
1948 Set to value of the --ntasks-per-core option, if specified.
1949
1950 SLURM_NTASKS_PER_GPU
1951 Set to value of the --ntasks-per-gpu option, if specified.
1952
1953 SLURM_NTASKS_PER_NODE
1954 Set to value of the --ntasks-per-node option, if specified.
1955
1956 SLURM_NTASKS_PER_SOCKET
1957 Set to value of the --ntasks-per-socket option, if specified.
1958
1959 SLURM_OVERCOMMIT
1960 Set to 1 if --overcommit was specified.
1961
1962 SLURM_PROFILE
1963 Same as --profile
1964
1965 SLURM_SUBMIT_DIR
1966 The directory from which salloc was invoked or, if applicable,
1967 the directory specified by the -D, --chdir option.
1968
1969 SLURM_SUBMIT_HOST
1970 The hostname of the computer from which salloc was invoked.
1971
1972 SLURM_TASKS_PER_NODE
1973 Number of tasks to be initiated on each node. Values are comma
1974 separated and in the same order as SLURM_JOB_NODELIST. If two
1975 or more consecutive nodes are to have the same task count, that
1976 count is followed by "(x#)" where "#" is the repetition count.
1977 For example, "SLURM_TASKS_PER_NODE=2(x3),1" indicates that the
1978 first three nodes will each execute two tasks and the fourth
1979 node will execute one task.
1980
1981 SLURM_THREADS_PER_CORE
1982 This is only set if --threads-per-core or SAL‐
1983 LOC_THREADS_PER_CORE were specified. The value will be set to
1984 the value specified by --threads-per-core or SAL‐
1985 LOC_THREADS_PER_CORE. This is used by subsequent srun calls
1986 within the job allocation.
1987
1988 SIGNALS
1989        While salloc is waiting for a PENDING job allocation, most signals will
1990 cause salloc to revoke the allocation request and exit.
1991
1992 However if the allocation has been granted and salloc has already
1993 started the specified command, then salloc will ignore most signals.
1994 salloc will not exit or release the allocation until the command exits.
1995 One notable exception is SIGHUP. A SIGHUP signal will cause salloc to
1996 release the allocation and exit without waiting for the command to fin‐
1997 ish. Another exception is SIGTERM, which will be forwarded to the
1998 spawned process.
1999
2000
2001 EXAMPLES
2002        To get an allocation, and open a new xterm in which srun commands may
2003 be typed interactively:
2004
2005 $ salloc -N16 xterm
2006 salloc: Granted job allocation 65537
2007 # (at this point the xterm appears, and salloc waits for xterm to exit)
2008 salloc: Relinquishing job allocation 65537
2009
2010
2011 To grab an allocation of nodes and launch a parallel application on one
2012 command line:
2013
2014 $ salloc -N5 srun -n10 myprogram
2015
2016
2017 To create a heterogeneous job with 3 components, each allocating a
2018 unique set of nodes:
2019
2020 $ salloc -w node[2-3] : -w node4 : -w node[5-7] bash
2021 salloc: job 32294 queued and waiting for resources
2022 salloc: job 32294 has been allocated resources
2023 salloc: Granted job allocation 32294
2024
2025
2026 COPYING
2027        Copyright (C) 2006-2007 The Regents of the University of California.
2028 Produced at Lawrence Livermore National Laboratory (cf, DISCLAIMER).
2029 Copyright (C) 2008-2010 Lawrence Livermore National Security.
2030 Copyright (C) 2010-2022 SchedMD LLC.
2031
2032 This file is part of Slurm, a resource management program. For de‐
2033 tails, see <https://slurm.schedmd.com/>.
2034
2035 Slurm is free software; you can redistribute it and/or modify it under
2036 the terms of the GNU General Public License as published by the Free
2037 Software Foundation; either version 2 of the License, or (at your op‐
2038 tion) any later version.
2039
2040 Slurm is distributed in the hope that it will be useful, but WITHOUT
2041 ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
2042 FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
2043 for more details.
2044
2045
2046 SEE ALSO
2047        sinfo(1), sattach(1), sbatch(1), squeue(1), scancel(1), scontrol(1),
2048        slurm.conf(5), sched_setaffinity(2), numa(3)
2049
2050
2051
2052April 2022 Slurm Commands salloc(1)