gres.conf(5)               Slurm Configuration File               gres.conf(5)


NAME
       gres.conf - Slurm configuration file for Generic RESource (GRES)
       management.

DESCRIPTION
       gres.conf is an ASCII file which describes the configuration of
       Generic RESource(s) (GRES) on each compute node. If the GRES
       information in the slurm.conf file does not fully describe those
       resources, then a gres.conf file should be included on each compute
       node and the slurm controller. The file will always be located in the
       same directory as slurm.conf.

       If the GRES information in the slurm.conf file fully describes those
       resources (i.e. no "Cores", "File" or "Links" specification is
       required for that GRES type or that information is automatically
       detected), that information may be omitted from the gres.conf file
       and only the configuration information in the slurm.conf file will be
       used. The gres.conf file may be omitted completely if the
       configuration information in the slurm.conf file fully describes all
       GRES.

       If using the gres.conf file to describe the resources available to
       nodes, the first parameter on the line should be NodeName. If
       configuring Generic Resources without specifying nodes, the first
       parameter on the line should be Name.

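       For example (node names and device path are hypothetical), the same
       GPU could be described either way:

              # Describe resources on the node reading this file
              Name=gpu File=/dev/nvidia0

              # Describe resources for specific nodes in a shared gres.conf
              NodeName=tux[0-15] Name=gpu File=/dev/nvidia0
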
       Parameter names are case insensitive. Any text following a "#" in the
       configuration file is treated as a comment through the end of that
       line. Changes to the configuration file take effect upon restart of
       Slurm daemons, daemon receipt of the SIGHUP signal, or execution of
       the command "scontrol reconfigure" unless otherwise noted.

       NOTE: Slurm support for gres/[mps|shard] requires the use of the
       select/cons_tres plugin. For more information on how to configure
       MPS, see https://slurm.schedmd.com/gres.html#MPS_Management. For more
       information on how to configure Sharding, see
       https://slurm.schedmd.com/gres.html#Sharding.

       For more information on GRES scheduling in general, see
       https://slurm.schedmd.com/gres.html.

       The overall configuration parameters available include:

       AutoDetect
              The hardware detection mechanisms to enable for automatic GRES
              configuration. Currently, the options are:

              nvml    Automatically detect NVIDIA GPUs. Requires the NVIDIA
                      Management Library (NVML).

              off     Do not automatically detect any GPUs. Used to override
                      other options.

              oneapi  Automatically detect Intel GPUs. Requires the Intel
                      oneAPI Level Zero library.

              rsmi    Automatically detect AMD GPUs. Requires the ROCm System
                      Management Interface (ROCm SMI) Library.

              AutoDetect can be on a line by itself, in which case it will
              globally apply to all lines in gres.conf by default. In
              addition, AutoDetect can be combined with NodeName to only
              apply to certain nodes. Node-specific AutoDetects will trump
              the global AutoDetect. A node-specific AutoDetect only needs
              to be specified once per node. If specified multiple times for
              the same node, they must all be the same value. To unset
              AutoDetect for a node when a global AutoDetect is set, simply
              set it to "off" in a node-specific GRES line, e.g.:
              "NodeName=tux3 AutoDetect=off Name=gpu File=/dev/nvidia[0-3]".
              AutoDetect cannot be used with cloud nodes.

              AutoDetect will automatically detect files, cores, links, and
              any other hardware. If a parameter such as File, Cores, or
              Links is specified when AutoDetect is used, then the specified
              values are used to sanity check the auto-detected values. If
              there is a mismatch, then the node's state is set to invalid
              and the node is drained.

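              For example (node names, type and device paths hypothetical),
              detection can be enabled globally while one set of nodes also
              lists its devices explicitly so that AutoDetect sanity checks
              them:

                 AutoDetect=nvml
                 NodeName=tux[0-3] Name=gpu Type=tesla File=/dev/nvidia[0-1]
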
       Count  Number of resources of this name/type available on this node.
              The default value is set to the number of File values specified
              (if any), otherwise the default value is one. A suffix of "K",
              "M", "G", "T" or "P" may be used to multiply the number by
              1024, 1048576, 1073741824, etc. respectively. For example:
              "Count=10G".

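              For instance, a count-only GRES (hypothetical name and type)
              could be defined with a suffix; Count=4G is equivalent to
              writing out 4294967296 (4 x 1073741824):

                 Name=bandwidth Type=lustre Count=4G Flags=CountOnly
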
       Cores  Optionally specify the core index numbers for the specific
              cores which can use this resource. For example, it may be
              strongly preferable to use specific cores with specific GRES
              devices (e.g. on a NUMA architecture). While Slurm can track
              and assign resources at the CPU or thread level, the
              scheduling algorithms used to co-allocate GRES devices with
              CPUs operate at a socket or NUMA level for job allocations.
              Therefore it is not possible to preferentially assign GRES to
              different specific CPUs on the same NUMA node or socket, and
              this option should generally be used to identify all cores on
              some socket. Job step allocations requested with --exact,
              however, do look at cores directly, so more specific core
              identification may be useful there.

              Multiple cores may be specified using a comma-delimited list
              or a range may be specified using a "-" separator (e.g.
              "0,1,2,3" or "0-3"). If a job specifies
              --gres-flags=enforce-binding, then only the identified cores
              can be allocated with each generic resource. This will tend to
              improve performance of jobs, but delay the allocation of
              resources to them. If specified and a job is not submitted
              with the --gres-flags=enforce-binding option, the identified
              cores will be preferred for scheduling with each generic
              resource.

              If --gres-flags=disable-binding is specified, then any core
              can be used with the resources, which also increases the speed
              of Slurm's scheduling algorithm but can degrade the
              application performance. The --gres-flags=disable-binding
              option is currently required to use more CPUs than are bound
              to a GRES (e.g. if a GPU is bound to the CPUs on one socket,
              but resources on more than one socket are required to run the
              job). If any core can be effectively used with the resources,
              then do not specify the Cores option, for improved speed in
              the Slurm scheduling logic. A restart of the slurmctld is
              needed for changes to the Cores option to take effect.

              NOTE: Since Slurm must be able to perform resource management
              on heterogeneous clusters having various processing unit
              numbering schemes, a logical core index must be specified
              instead of the physical core index. That logical core index
              might not correspond to your physical core index number. Core
              0 will be the first core on the first socket, while core 1
              will be the second core on the first socket. This numbering
              coincides with the logical core number (Core L#) seen in
              "lstopo -l" command output.

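              For example, on a hypothetical two-socket node where logical
              cores 0-7 are on the first socket and 8-15 on the second, GPUs
              attached to each socket could be described as:

                 Name=gpu Type=tesla File=/dev/nvidia[0-1] Cores=0-7
                 Name=gpu Type=tesla File=/dev/nvidia[2-3] Cores=8-15
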
       File   Fully qualified pathname of the device files associated with a
              resource. The name can include a numeric range suffix to be
              interpreted by Slurm (e.g. File=/dev/nvidia[0-3]).

              This field is generally required if enforcement of generic
              resource allocations is to be supported (i.e. it prevents
              users from making use of resources allocated to a different
              user). Enforcement of the file allocation relies upon Linux
              Control Groups (cgroups) and Slurm's task/cgroup plugin, which
              will place the allocated files into the job's cgroup and
              prevent use of other files. Please see Slurm's Cgroups Guide
              for more information: https://slurm.schedmd.com/cgroups.html.

              If File is specified then Count must be either set to the
              number of file names specified or not set (the default value
              is the number of files specified). The exception to this is
              MPS/Sharding. For either of these GRES, each GPU would be
              identified by device file using the File parameter and Count
              would specify the number of entries that would correspond to
              that GPU. For MPS, this is typically 100 or some multiple of
              100. For Sharding, it is typically the maximum number of jobs
              that could simultaneously share that GPU.

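              For example (device paths hypothetical), a node with two GPUs
              could offer 100 MPS entries and 8 shards per GPU:

                 Name=gpu   File=/dev/nvidia[0-1]
                 Name=mps   Count=100 File=/dev/nvidia0
                 Name=mps   Count=100 File=/dev/nvidia1
                 Name=shard Count=8   File=/dev/nvidia0
                 Name=shard Count=8   File=/dev/nvidia1
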
              If using a card with Multi-Instance GPU functionality, use
              MultipleFiles instead. File and MultipleFiles are mutually
              exclusive.

              NOTE: File is required for all gpu-typed GRES.

              NOTE: If you specify the File parameter for a resource on some
              node, the option must be specified on all nodes and Slurm will
              track the assignment of each specific resource on each node.
              Otherwise Slurm will only track a count of allocated resources
              rather than the state of each individual device file.

              NOTE: Drain a node before changing the count of records with
              File parameters (e.g. if you want to add or remove GPUs from a
              node's configuration). Failure to do so will result in any job
              using those GRES being aborted.

              NOTE: When specifying File, Count is limited in size
              (currently 1024) for each node.

       Flags  Optional flags that can be specified to change configured
              behavior of the GRES.

              Allowed values at present are:

              CountOnly        Do not attempt to load the plugin as this
                               GRES will only be used to track counts of
                               GRES used. This avoids attempting to load a
                               non-existent plugin, which can affect
                               filesystems with high latency metadata
                               operations for non-existent files.

              one_sharing      To be used on a shared gres. If using a
                               shared gres (mps) on top of a sharing gres
                               (gpu), only allow one of the sharing gres to
                               be used by the shared gres. This is the
                               default for MPS.

                               NOTE: If a gres has this flag configured it
                               is global, so all other nodes with that gres
                               will have this flag implied. This flag is not
                               compatible with all_sharing for a specific
                               gres.

              all_sharing      To be used on a shared gres. This is the
                               opposite of one_sharing and can be used to
                               allow all sharing gres (gpu) on a node to be
                               used for shared gres (mps).

                               NOTE: If a gres has this flag configured it
                               is global, so all other nodes with that gres
                               will have this flag implied. This flag is not
                               compatible with one_sharing for a specific
                               gres.

              nvidia_gpu_env   Set environment variable CUDA_VISIBLE_DEVICES
                               for all GPUs on the specified node(s).

              amd_gpu_env      Set environment variable ROCR_VISIBLE_DEVICES
                               for all GPUs on the specified node(s).

              intel_gpu_env    Set environment variable ZE_AFFINITY_MASK for
                               all GPUs on the specified node(s).

              opencl_env       Set environment variable GPU_DEVICE_ORDINAL
                               for all GPUs on the specified node(s).

              no_gpu_env       Set no GPU-specific environment variables.
                               This is mutually exclusive with all other
                               environment-related flags.

              If no environment-related flags are specified, then
              nvidia_gpu_env, amd_gpu_env, intel_gpu_env, and opencl_env
              will be implicitly set by default. If AutoDetect is used and
              environment-related flags are not specified, then
              AutoDetect=nvml will set nvidia_gpu_env, AutoDetect=rsmi will
              set amd_gpu_env, and AutoDetect=oneapi will set intel_gpu_env.
              Conversely, specified environment-related flags will always
              override AutoDetect.

              Environment-related flags set on one GRES line will be
              inherited by the GRES line directly below it if no
              environment-related flags are specified on that line and if it
              is of the same node, name, and type. Environment-related flags
              must be the same for GRES of the same node, name, and type.

              Note that there is a known issue with the AMD ROCm runtime
              where ROCR_VISIBLE_DEVICES is processed first, and then
              CUDA_VISIBLE_DEVICES is processed. To avoid the issues caused
              by this, set Flags=amd_gpu_env for AMD GPUs so only
              ROCR_VISIBLE_DEVICES is set.

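              As a sketch (node names, device paths and GPU type are
              hypothetical), a count-only GRES and AMD GPUs limited to
              ROCR_VISIBLE_DEVICES could be configured as:

                 NodeName=tux[0-7] Name=bandwidth Type=lustre Count=4G Flags=CountOnly
                 NodeName=tux[8-11] Name=gpu Type=mi100 File=/dev/dri/renderD[128-131] Flags=amd_gpu_env
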
       Links  A comma-delimited list of numbers identifying the number of
              connections between this device and other devices to allow
              coscheduling of better connected devices. This is an ordered
              list in which the number of connections this specific device
              has to device number 0 would be in the first position, the
              number of connections it has to device number 1 in the second
              position, etc. A -1 indicates the device itself and a 0
              indicates no connection. If specified, then this line can only
              contain a single GRES device (i.e. can only contain a single
              file via File).

              This is an optional value and is usually automatically
              determined if AutoDetect is enabled. A typical use case would
              be to identify GPUs having NVLink connectivity. Note that for
              GPUs, the minor number assigned by the OS and used in the
              device file (i.e. the X in /dev/nvidiaX) is not necessarily
              the same as the device number/index. The device number is
              created by sorting the GPUs by PCI bus ID and then numbering
              them starting from the smallest bus ID. See
              https://slurm.schedmd.com/gres.html#GPU_Management

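              A sketch for a hypothetical four-GPU node where devices 0 and
              1 are joined by two NVLink connections, devices 2 and 3
              likewise, and there is no direct link between the two pairs:

                 Name=gpu File=/dev/nvidia0 Links=-1,2,0,0
                 Name=gpu File=/dev/nvidia1 Links=2,-1,0,0
                 Name=gpu File=/dev/nvidia2 Links=0,0,-1,2
                 Name=gpu File=/dev/nvidia3 Links=0,0,2,-1
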
       MultipleFiles
              Fully qualified pathname of the device files associated with a
              resource. Graphics cards using Multi-Instance GPU (MIG)
              technology will present multiple device files that should be
              managed as a single generic resource. The file names can be a
              comma-separated list or can include a numeric range suffix
              (e.g. MultipleFiles=/dev/nvidia[0-3]).

              Drain a node before changing the count of records with the
              MultipleFiles parameter, such as when adding or removing GPUs
              from a node's configuration. Failure to do so will result in
              any job using those GRES being aborted.

              When not using GPUs with MIG functionality, use File instead.
              MultipleFiles and File are mutually exclusive.

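              For instance (paths and type are hypothetical; with
              AutoDetect=nvml these are normally discovered automatically),
              a single MIG instance spanning its parent device and two
              capability device files might be described as:

                 Name=gpu Type=a100_3g.20gb MultipleFiles=/dev/nvidia0,/dev/nvidia-caps/nvidia-cap21,/dev/nvidia-caps/nvidia-cap22
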
       Name   Name of the generic resource. Any desired name may be used.
              The name must match a value in GresTypes in slurm.conf. Each
              generic resource has an optional plugin which can provide
              resource-specific functionality. Generic resources that
              currently include an optional plugin are:

              gpu    Graphics Processing Unit

              mps    CUDA Multi-Process Service (MPS)

              nic    Network Interface Card

              shard  Shards of a gpu

       NodeName
              An optional NodeName specification can be used to permit one
              gres.conf file to be used for all compute nodes in a cluster
              by specifying the node(s) that each line should apply to. The
              NodeName specification can use a Slurm hostlist specification
              as shown in the example below.

       Type   An optional arbitrary string identifying the type of generic
              resource. For example, this might be used to identify a
              specific model of GPU, which users can then specify in a job
              request. A restart of the slurmctld and slurmd daemons is
              required for changes to the Type option to take effect.

              NOTE: If using autodetect functionality and defining the Type
              in your gres.conf file, the Type specified should match or be
              a substring of the value that is detected, using an underscore
              in lieu of any spaces.
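
              For example, if detection reports a device whose name
              normalizes to "tesla_v100-sxm2-32gb", any one of the following
              hypothetical Type values would satisfy that rule:

                 Name=gpu Type=tesla_v100-sxm2-32gb File=/dev/nvidia0
                 Name=gpu Type=v100 File=/dev/nvidia0
                 Name=gpu Type=tesla File=/dev/nvidia0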

EXAMPLES
       ##################################################################
       # Slurm's Generic Resource (GRES) configuration file
       # Define GPU devices with MPS support, with AutoDetect sanity checking
       ##################################################################
       AutoDetect=nvml
       Name=gpu Type=gtx560 File=/dev/nvidia0 COREs=0,1
       Name=gpu Type=tesla File=/dev/nvidia1 COREs=2,3
       Name=mps Count=100 File=/dev/nvidia0 COREs=0,1
       Name=mps Count=100 File=/dev/nvidia1 COREs=2,3

       ##################################################################
       # Slurm's Generic Resource (GRES) configuration file
       # Overwrite system defaults and explicitly configure three GPUs
       ##################################################################
       Name=gpu Type=tesla File=/dev/nvidia[0-1] COREs=0,1
       # Name=gpu Type=tesla File=/dev/nvidia[2-3] COREs=2,3
       # NOTE: nvidia2 device is out of service
       Name=gpu Type=tesla File=/dev/nvidia3 COREs=2,3

       ##################################################################
       # Slurm's Generic Resource (GRES) configuration file
       # Use a single gres.conf file for all compute nodes - positive method
       ##################################################################
       ## Explicitly specify devices on nodes tux0-tux15
       # NodeName=tux[0-15] Name=gpu File=/dev/nvidia[0-3]
       # NOTE: tux3 nvidia1 device is out of service
       NodeName=tux[0-2] Name=gpu File=/dev/nvidia[0-3]
       NodeName=tux3 Name=gpu File=/dev/nvidia[0,2-3]
       NodeName=tux[4-15] Name=gpu File=/dev/nvidia[0-3]

       ##################################################################
       # Slurm's Generic Resource (GRES) configuration file
       # Use NVML to gather GPU configuration information
       # for all nodes except one
       ##################################################################
       AutoDetect=nvml
       NodeName=tux3 AutoDetect=off Name=gpu File=/dev/nvidia[0-3]

       ##################################################################
       # Slurm's Generic Resource (GRES) configuration file
       # Specify some nodes with NVML, some with RSMI, and some with no
       # AutoDetect
       ##################################################################
       NodeName=tux[0-7] AutoDetect=nvml
       NodeName=tux[8-11] AutoDetect=rsmi
       NodeName=tux[12-15] Name=gpu File=/dev/nvidia[0-3]

       ##################################################################
       # Slurm's Generic Resource (GRES) configuration file
       # Define 'bandwidth' GRES to use as a way to limit the
       # resource use on these nodes for workflow purposes
       ##################################################################
       NodeName=tux[0-7] Name=bandwidth Type=lustre Count=4G Flags=CountOnly

COPYING
       Copyright (C) 2010 The Regents of the University of California.
       Produced at Lawrence Livermore National Laboratory (cf, DISCLAIMER).
       Copyright (C) 2010-2022 SchedMD LLC.

       This file is part of Slurm, a resource management program. For
       details, see <https://slurm.schedmd.com/>.

       Slurm is free software; you can redistribute it and/or modify it
       under the terms of the GNU General Public License as published by the
       Free Software Foundation; either version 2 of the License, or (at
       your option) any later version.

       Slurm is distributed in the hope that it will be useful, but WITHOUT
       ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
       FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
       for more details.

SEE ALSO
       slurm.conf(5)


January 2023               Slurm Configuration File               gres.conf(5)