gres.conf(5)               Slurm Configuration File              gres.conf(5)

NAME
       gres.conf - Slurm configuration file for Generic RESource (GRES)
       management.

DESCRIPTION
       gres.conf is an ASCII file which describes the configuration of
       Generic RESource(s) (GRES) on each compute node. If the GRES
       information in the slurm.conf file does not fully describe those
       resources, then a gres.conf file should be included on each compute
       node. The file will always be located in the same directory as the
       slurm.conf.

       If the GRES information in the slurm.conf file fully describes those
       resources (i.e. no "Cores", "File" or "Links" specification is
       required for that GRES type, or that information is automatically
       detected), that information may be omitted from the gres.conf file
       and only the configuration information in the slurm.conf file will
       be used. The gres.conf file may be omitted completely if the
       configuration information in the slurm.conf file fully describes
       all GRES.

       If using the gres.conf file to describe the resources available to
       nodes, the first parameter on the line should be NodeName. If
       configuring Generic Resources without specifying nodes, the first
       parameter on the line should be Name.

       Parameter names are case insensitive. Any text following a "#" in
       the configuration file is treated as a comment through the end of
       that line. Changes to the configuration file take effect upon
       restart of Slurm daemons, daemon receipt of the SIGHUP signal, or
       execution of the command "scontrol reconfigure" unless otherwise
       noted.

       NOTE: Slurm support for gres/mps requires the use of the
       select/cons_tres plugin. For more information on how to configure
       MPS, see https://slurm.schedmd.com/gres.html#MPS_Management.

       For more information on GRES scheduling in general, see
       https://slurm.schedmd.com/gres.html.

       The overall configuration parameters available include:

       AutoDetect
              The hardware detection mechanisms to enable for automatic
              GRES configuration. Currently, the options are:

              nvml   Automatically detect NVIDIA GPUs. Requires the NVIDIA
                     Management Library (NVML).

              off    Do not automatically detect any GPUs. Used to
                     override other options.

              rsmi   Automatically detect AMD GPUs. Requires the ROCm
                     System Management Interface (ROCm SMI) Library.

              AutoDetect can be on a line by itself, in which case it will
              globally apply to all lines in gres.conf by default. In
              addition, AutoDetect can be combined with NodeName to only
              apply to certain nodes. Node-specific AutoDetects will
              override the global AutoDetect. A node-specific AutoDetect
              only needs to be specified once per node. If specified
              multiple times for the same nodes, they must all be the same
              value. To unset AutoDetect for a node when a global
              AutoDetect is set, simply set it to "off" in a node-specific
              GRES line. E.g.: NodeName=tux3 AutoDetect=off Name=gpu
              File=/dev/nvidia[0-3].

       Count  Number of resources of this name/type available on this
              node. The default value is set to the number of File values
              specified (if any), otherwise the default value is one. A
              suffix of "K", "M", "G", "T" or "P" may be used to multiply
              the number by 1024, 1048576, 1073741824, etc. respectively.
              For example: "Count=10G".

       Cores  Optionally specify the core index numbers for the specific
              cores which can use this resource. For example, it may be
              strongly preferable to use specific cores with specific GRES
              devices (e.g. on a NUMA architecture). While Slurm can track
              and assign resources at the CPU or thread level, its
              scheduling algorithms used to co-allocate GRES devices with
              CPUs operate at a socket or NUMA level. Therefore it is not
              possible to preferentially assign GRES to different specific
              CPUs on the same NUMA node or socket, and this option should
              be used to identify all cores on some socket.

              Multiple cores may be specified using a comma-delimited
              list, or a range may be specified using a "-" separator
              (e.g. "0,1,2,3" or "0-3"). If a job specifies
              --gres-flags=enforce-binding, then only the identified cores
              can be allocated with each generic resource. This will tend
              to improve performance of jobs, but delay the allocation of
              resources to them. If specified and a job is not submitted
              with the --gres-flags=enforce-binding option, the identified
              cores will be preferred for scheduling with each generic
              resource.

              If --gres-flags=disable-binding is specified, then any core
              can be used with the resources, which also increases the
              speed of Slurm's scheduling algorithm but can degrade the
              application performance. The --gres-flags=disable-binding
              option is currently required to use more CPUs than are bound
              to a GRES (e.g. if a GPU is bound to the CPUs on one socket,
              but resources on more than one socket are required to run
              the job). If any core can be effectively used with the
              resources, then do not specify the Cores option, for
              improved speed in the Slurm scheduling logic. A restart of
              the slurmctld is needed for changes to the Cores option to
              take effect.

              NOTE: Since Slurm must be able to perform resource
              management on heterogeneous clusters having various
              processing unit numbering schemes, a logical core index must
              be specified instead of the physical core index. That
              logical core index might not correspond to your physical
              core index number. Core 0 will be the first core on the
              first socket, while core 1 will be the second core on the
              first socket. This numbering coincides with the logical core
              number (Core L#) seen in "lstopo -l" command output.
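
              For example, a sketch for a hypothetical two-socket node
              where GPUs 0-1 sit on socket 0 (cores 0-7) and GPUs 2-3 sit
              on socket 1 (cores 8-15); the device paths and core counts
              are illustrative:

              Name=gpu File=/dev/nvidia[0-1] Cores=0-7
              Name=gpu File=/dev/nvidia[2-3] Cores=8-15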

       File   Fully qualified pathname of the device files associated with
              a resource. The name can include a numeric range suffix to
              be interpreted by Slurm (e.g. File=/dev/nvidia[0-3]).

              This field is generally required if enforcement of generic
              resource allocations is to be supported (i.e. it prevents
              users from making use of resources allocated to a different
              user). Enforcement of the file allocation relies upon Linux
              Control Groups (cgroups) and Slurm's task/cgroup plugin,
              which will place the allocated files into the job's cgroup
              and prevent use of other files. Please see Slurm's Cgroups
              Guide for more information:
              https://slurm.schedmd.com/cgroups.html.

              If File is specified, then Count must be either set to the
              number of file names specified or not set (the default value
              is the number of files specified). The exception to this is
              MPS. For MPS, each GPU would be identified by device file
              using the File parameter, and Count would specify the number
              of MPS entries that would correspond to that GPU (typically
              100 or some multiple of 100).
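
              For instance, a sketch giving one GPU twice the MPS shares
              of another (device paths illustrative):

              Name=mps Count=200 File=/dev/nvidia0
              Name=mps Count=100 File=/dev/nvidia1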

              If using a card with Multi-Instance GPU functionality, use
              MultipleFiles instead. File and MultipleFiles are mutually
              exclusive.

              NOTE: If you specify the File parameter for a resource on
              some node, the option must be specified on all nodes and
              Slurm will track the assignment of each specific resource on
              each node. Otherwise Slurm will only track a count of
              allocated resources rather than the state of each individual
              device file.

              NOTE: Drain a node before changing the count of records with
              File parameters (e.g. if you want to add or remove GPUs from
              a node's configuration). Failure to do so will result in any
              job using those GRES being aborted.

       Flags  Optional flags that can be specified to change the
              configured behavior of the GRES.

              Allowed values at present are:

              CountOnly       Do not attempt to load a plugin, as this
                              GRES will only be used to track counts of
                              GRES used. This avoids attempting to load a
                              non-existent plugin, which can be expensive
                              on filesystems with high-latency metadata
                              operations for non-existent files.

              nvidia_gpu_env  Set environment variable
                              CUDA_VISIBLE_DEVICES for all GPUs on the
                              specified node(s).

              amd_gpu_env     Set environment variable
                              ROCR_VISIBLE_DEVICES for all GPUs on the
                              specified node(s).

              opencl_env      Set environment variable GPU_DEVICE_ORDINAL
                              for all GPUs on the specified node(s).

              no_gpu_env      Set no GPU-specific environment variables.
                              This is mutually exclusive with all other
                              environment-related flags.

              If no environment-related flags are specified, then
              nvidia_gpu_env, amd_gpu_env, and opencl_env will be
              implicitly set by default. If AutoDetect is used and
              environment-related flags are not specified, then
              AutoDetect=nvml will set nvidia_gpu_env and AutoDetect=rsmi
              will set amd_gpu_env. Conversely, specified
              environment-related flags will always override AutoDetect.

              Environment-related flags set on one GRES line will be
              inherited by the GRES line directly below it if no
              environment-related flags are specified on that line and if
              it is of the same node, name, and type. Environment-related
              flags must be the same for GRES of the same node, name, and
              type.

              Note that there is a known issue with the AMD ROCm runtime
              where ROCR_VISIBLE_DEVICES is processed first, and then
              CUDA_VISIBLE_DEVICES is processed. To avoid the issues
              caused by this, set Flags=amd_gpu_env for AMD GPUs so only
              ROCR_VISIBLE_DEVICES is set.
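
              For example, a minimal sketch for a set of AMD GPU nodes
              (the node names and device paths are hypothetical):

              NodeName=tux[8-11] Name=gpu File=/dev/dri/renderD[128-131] Flags=amd_gpu_env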

       Links  A comma-delimited list of numbers identifying the number of
              connections between this device and other devices, to allow
              coscheduling of better-connected devices. This is an ordered
              list in which the number of connections this specific device
              has to device number 0 would be in the first position, the
              number of connections it has to device number 1 in the
              second position, etc. A -1 indicates the device itself and a
              0 indicates no connection. If specified, then this line can
              only contain a single GRES device (i.e. can only contain a
              single file via File).

              This is an optional value and is usually automatically
              determined if AutoDetect is enabled. A typical use case
              would be to identify GPUs having NVLink connectivity. Note
              that for GPUs, the minor number assigned by the OS and used
              in the device file (i.e. the X in /dev/nvidiaX) is not
              necessarily the same as the device number/index. The device
              number is created by sorting the GPUs by PCI bus ID and then
              numbering them starting from the smallest bus ID. See
              https://slurm.schedmd.com/gres.html#GPU_Management.
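
              For example, a sketch of two GPUs joined by a two-link
              NVLink connection (device paths and link counts are
              illustrative); note that each line names a single device:

              Name=gpu File=/dev/nvidia0 Links=-1,2
              Name=gpu File=/dev/nvidia1 Links=2,-1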

       MultipleFiles
              Fully qualified pathname of the device files associated with
              a resource. Graphics cards using Multi-Instance GPU (MIG)
              technology will present multiple device files that should be
              managed as a single generic resource. The file names can be
              a comma-separated list, or the name can include a numeric
              range suffix (e.g. MultipleFiles=/dev/nvidia[0-3]).

              Drain a node before changing the count of records with the
              MultipleFiles parameter, such as when adding or removing
              GPUs from a node's configuration. Failure to do so will
              result in any job using those GRES being aborted.

              When not using GPUs with MIG functionality, use File
              instead. MultipleFiles and File are mutually exclusive.
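
              For example, a sketch of one MIG instance spanning its
              parent GPU device plus the associated nvidia-caps files (all
              paths and the Type string are hypothetical):

              Name=gpu Type=1g.10gb MultipleFiles=/dev/nvidia0,/dev/nvidia-caps/nvidia-cap21,/dev/nvidia-caps/nvidia-cap22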

       Name   Name of the generic resource. Any desired name may be used.
              The name must match a value in GresTypes in slurm.conf. Each
              generic resource has an optional plugin which can provide
              resource-specific functionality. Generic resources that
              currently include an optional plugin are:

              gpu    Graphics Processing Unit

              mps    CUDA Multi-Process Service (MPS)

              nic    Network Interface Card

       NodeName
              An optional NodeName specification can be used to permit one
              gres.conf file to be used for all compute nodes in a cluster
              by specifying the node(s) that each line should apply to.
              The NodeName specification can use a Slurm hostlist
              specification as shown in the example below.

       Type   An optional arbitrary string identifying the type of generic
              resource. For example, this might be used to identify a
              specific model of GPU, which users can then specify in a job
              request. If Type is specified, then Count is limited in size
              (currently 1024). A restart of the slurmctld and slurmd
              daemons is required for changes to the Type option to take
              effect.

              NOTE: If using autodetect functionality and defining the
              Type in your gres.conf file, the Type specified should match
              or be a substring of the value that is detected, using an
              underscore in lieu of any spaces.
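
              For example, if AutoDetect were to report a GPU named "Tesla
              V100-SXM2-16GB" (an illustrative value), a Type such as
              tesla or v100 would match, since each is a substring of the
              detected name once spaces are replaced with underscores:

              Name=gpu Type=v100 File=/dev/nvidia0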

EXAMPLES
       ##################################################################
       # Slurm's Generic Resource (GRES) configuration file
       # Define GPU devices with MPS support, with AutoDetect sanity checking
       ##################################################################
       AutoDetect=nvml
       Name=gpu Type=gtx560 File=/dev/nvidia0 COREs=0,1
       Name=gpu Type=tesla  File=/dev/nvidia1 COREs=2,3
       Name=mps Count=100   File=/dev/nvidia0 COREs=0,1
       Name=mps Count=100   File=/dev/nvidia1 COREs=2,3

       ##################################################################
       # Slurm's Generic Resource (GRES) configuration file
       # Overwrite system defaults and explicitly configure three GPUs
       ##################################################################
       Name=gpu Type=tesla File=/dev/nvidia[0-1] COREs=0,1
       # Name=gpu Type=tesla File=/dev/nvidia[2-3] COREs=2,3
       # NOTE: nvidia2 device is out of service
       Name=gpu Type=tesla File=/dev/nvidia3 COREs=2,3

       ##################################################################
       # Slurm's Generic Resource (GRES) configuration file
       # Use a single gres.conf file for all compute nodes - positive method
       ##################################################################
       ## Explicitly specify devices on nodes tux0-tux15
       # NodeName=tux[0-15] Name=gpu File=/dev/nvidia[0-3]
       # NOTE: tux3 nvidia1 device is out of service
       NodeName=tux[0-2]  Name=gpu File=/dev/nvidia[0-3]
       NodeName=tux3      Name=gpu File=/dev/nvidia[0,2-3]
       NodeName=tux[4-15] Name=gpu File=/dev/nvidia[0-3]

       ##################################################################
       # Slurm's Generic Resource (GRES) configuration file
       # Use NVML to gather GPU configuration information
       # for all nodes except one
       ##################################################################
       AutoDetect=nvml
       NodeName=tux3 AutoDetect=off Name=gpu File=/dev/nvidia[0-3]

       ##################################################################
       # Slurm's Generic Resource (GRES) configuration file
       # Specify some nodes with NVML, some with RSMI, and some with no
       # AutoDetect
       ##################################################################
       NodeName=tux[0-7]   AutoDetect=nvml
       NodeName=tux[8-11]  AutoDetect=rsmi
       NodeName=tux[12-15] Name=gpu File=/dev/nvidia[0-3]

       ##################################################################
       # Slurm's Generic Resource (GRES) configuration file
       # Define 'bandwidth' GRES to use as a way to limit the
       # resource use on these nodes for workflow purposes
       ##################################################################
       NodeName=tux[0-7] Name=bandwidth Type=lustre Count=4G Flags=CountOnly

COPYING
       Copyright (C) 2010 The Regents of the University of California.
       Produced at Lawrence Livermore National Laboratory (cf, DISCLAIMER).
       Copyright (C) 2010-2022 SchedMD LLC.

       This file is part of Slurm, a resource management program. For
       details, see <https://slurm.schedmd.com/>.

       Slurm is free software; you can redistribute it and/or modify it
       under the terms of the GNU General Public License as published by
       the Free Software Foundation; either version 2 of the License, or
       (at your option) any later version.

       Slurm is distributed in the hope that it will be useful, but
       WITHOUT ANY WARRANTY; without even the implied warranty of
       MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
       General Public License for more details.

SEE ALSO
       slurm.conf(5)



February 2022              Slurm Configuration File              gres.conf(5)