gres.conf(5)              Slurm Configuration File              gres.conf(5)

NAME
       gres.conf - Slurm configuration file for Generic RESource (GRES)
       management.

DESCRIPTION
       gres.conf is an ASCII file which describes the configuration of
       Generic RESource (GRES) on each compute node. If the GRES
       information in the slurm.conf file does not fully describe those
       resources, then a gres.conf file should be included on each
       compute node. The file location can be modified at system build
       time using the DEFAULT_SLURM_CONF parameter or at execution time
       by setting the SLURM_CONF environment variable. The file will
       always be located in the same directory as the slurm.conf file.

       If the GRES information in the slurm.conf file fully describes
       those resources (i.e. no "Cores", "File" or "Links"
       specification is required for that GRES type or that information
       is automatically detected), that information may be omitted from
       the gres.conf file and only the configuration information in the
       slurm.conf file will be used. The gres.conf file may be omitted
       completely if the configuration information in the slurm.conf
       file fully describes all GRES.

       If using the gres.conf file to describe the resources available
       to nodes, the first parameter on the line should be NodeName. If
       configuring Generic Resources without specifying nodes, the
       first parameter on the line should be Name.

       Parameter names are case insensitive. Any text following a "#"
       in the configuration file is treated as a comment through the
       end of that line. Changes to the configuration file take effect
       upon restart of Slurm daemons, daemon receipt of the SIGHUP
       signal, or execution of the command "scontrol reconfigure"
       unless otherwise noted.

       NOTE: Slurm support for gres/mps requires the use of the
       select/cons_tres plugin. For more information on how to
       configure MPS, see
       https://slurm.schedmd.com/gres.html#MPS_Management.

       For more information on GRES scheduling in general, see
       https://slurm.schedmd.com/gres.html.

       The overall configuration parameters available include:

       AutoDetect
              The hardware detection mechanisms to enable for automatic
              GRES configuration. Currently, the options are:

              nvml   Automatically detect NVIDIA GPUs

              off    Do not automatically detect any GPUs. Used to
                     override other options.

              rsmi   Automatically detect AMD GPUs
              AutoDetect can be on a line by itself, in which case it
              will globally apply to all lines in gres.conf by default.
              In addition, AutoDetect can be combined with NodeName to
              only apply to certain nodes. A node-specific AutoDetect
              overrides the global AutoDetect and only needs to be
              specified once per node. If specified multiple times for
              the same nodes, they must all be the same value. To unset
              AutoDetect for a node when a global AutoDetect is set,
              simply set it to "off" in a node-specific GRES line.
              E.g.: NodeName=tux3 AutoDetect=off Name=gpu
              File=/dev/nvidia[0-3].

       Count  Number of resources of this type available on this node.
              The default value is set to the number of File values
              specified (if any), otherwise the default value is one. A
              suffix of "K", "M", "G", "T" or "P" may be used to
              multiply the number by 1024, 1048576, 1073741824, etc.
              respectively. For example: "Count=10G".

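              For example (an illustrative sketch; the node and device
              names here are hypothetical), Count may be derived from
              File or given explicitly with a suffix:

                     # Count defaults to the number of File values (4)
                     NodeName=tux0 Name=gpu File=/dev/nvidia[0-3]
                     # Explicit count with a suffix: 4K = 4096 units
                     NodeName=tux0 Name=bandwidth Type=lustre Count=4K Flags=CountOnly
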
       Cores  Optionally specify the core index numbers for the
              specific cores which can use this resource. For example,
              it may be strongly preferable to use specific cores with
              specific GRES devices (e.g. on a NUMA architecture).
              While Slurm can track and assign resources at the CPU or
              thread level, its scheduling algorithms used to
              co-allocate GRES devices with CPUs operate at a socket or
              NUMA level. Therefore it is not possible to
              preferentially assign GRES with different specific CPUs
              on the same NUMA node or socket, and this option should
              be used to identify all cores on some socket.

              Multiple cores may be specified using a comma-delimited
              list or a range may be specified using a "-" separator
              (e.g. "0,1,2,3" or "0-3"). If a job specifies
              --gres-flags=enforce-binding, then only the identified
              cores can be allocated with each generic resource. This
              will tend to improve performance of jobs, but delay the
              allocation of resources to them. If specified and a job
              is not submitted with the --gres-flags=enforce-binding
              option, the identified cores will be preferred for
              scheduling with each generic resource.

              If --gres-flags=disable-binding is specified, then any
              core can be used with the resources, which also increases
              the speed of Slurm's scheduling algorithm but can degrade
              the application performance. The
              --gres-flags=disable-binding option is currently required
              to use more CPUs than are bound to a GRES (i.e. if a GPU
              is bound to the CPUs on one socket, but resources on more
              than one socket are required to run the job). If any core
              can be effectively used with the resources, then do not
              specify the Cores option for improved speed in the Slurm
              scheduling logic. A restart of the slurmctld is needed
              for changes to the Cores option to take effect.

              NOTE: Since Slurm must be able to perform resource
              management on heterogeneous clusters having various
              processing unit numbering schemes, a logical core index
              must be specified instead of the physical core index.
              That logical core index might not correspond to your
              physical core index number. Core 0 will be the first core
              on the first socket, while core 1 will be the second core
              on the first socket. This numbering coincides with the
              logical core number (Core L#) seen in "lstopo -l" command
              output.

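              As an illustrative sketch (the device paths and core
              layout are hypothetical), a node with two four-core
              sockets and one GPU attached to each socket could pin
              each GPU to the logical cores of its local socket:

                     Name=gpu Type=tesla File=/dev/nvidia0 Cores=0-3
                     Name=gpu Type=tesla File=/dev/nvidia1 Cores=4-7
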
       File   Fully qualified pathname of the device files associated
              with a resource. The name can include a numeric range
              suffix to be interpreted by Slurm (e.g.
              File=/dev/nvidia[0-3]).

              This field is generally required if enforcement of
              generic resource allocations is to be supported (i.e.
              prevents users from making use of resources allocated to
              a different user). Enforcement of the file allocation
              relies upon Linux Control Groups (cgroups) and Slurm's
              task/cgroup plugin, which will place the allocated files
              into the job's cgroup and prevent use of other files.
              Please see Slurm's Cgroups Guide for more information:
              https://slurm.schedmd.com/cgroups.html.

              If File is specified then Count must be either set to the
              number of file names specified or not set (the default
              value is the number of files specified). The exception to
              this is MPS. For MPS, each GPU would be identified by
              device file using the File parameter and Count would
              specify the number of MPS entries that would correspond
              to that GPU (typically 100 or some multiple of 100).

              NOTE: If you specify the File parameter for a resource on
              some node, the option must be specified on all nodes and
              Slurm will track the assignment of each specific resource
              on each node. Otherwise Slurm will only track a count of
              allocated resources rather than the state of each
              individual device file.

              NOTE: Drain a node before changing the count of records
              with File parameters (i.e. if you want to add or remove
              GPUs from a node's configuration). Failure to do so will
              result in any job using those GRES being aborted.

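              For example (the device files are illustrative), GPU
              counts follow from File, while MPS counts are given per
              GPU device file:

                     Name=gpu File=/dev/nvidia[0-1]         # Count defaults to 2
                     Name=mps File=/dev/nvidia0 Count=100   # 100 MPS entries on GPU 0
                     Name=mps File=/dev/nvidia1 Count=100   # 100 MPS entries on GPU 1
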
       Flags  Optional flags that can be specified to change the
              configured behavior of the GRES.

              Allowed values at present are:

              CountOnly        Do not attempt to load the plugin as
                               this GRES will only be used to track
                               counts of GRES used. This avoids
                               attempting to load a non-existent
                               plugin, which can affect filesystems
                               with high latency metadata operations
                               for non-existent files.

              nvidia_gpu_env   Set environment variable
                               CUDA_VISIBLE_DEVICES for all GPUs on the
                               specified node(s).

              amd_gpu_env      Set environment variable
                               ROCR_VISIBLE_DEVICES for all GPUs on the
                               specified node(s).

              opencl_env       Set environment variable
                               GPU_DEVICE_ORDINAL for all GPUs on the
                               specified node(s).

              no_gpu_env       Set no GPU-specific environment
                               variables. This is mutually exclusive
                               with all other environment-related
                               flags.

              If no environment-related flags are specified, then
              nvidia_gpu_env, amd_gpu_env, and opencl_env will be
              implicitly set by default. If AutoDetect is used and
              environment-related flags are not specified, then
              AutoDetect=nvml will set nvidia_gpu_env and
              AutoDetect=rsmi will set amd_gpu_env. Conversely,
              specified environment-related flags will always override
              AutoDetect.

              Environment-related flags set on one GRES line will be
              inherited by the GRES line directly below it if no
              environment-related flags are specified on that line and
              if it is of the same node, name, and type.
              Environment-related flags must be the same for GRES of
              the same node, name, and type.

              Note that there is a known issue with the AMD ROCm
              runtime where ROCR_VISIBLE_DEVICES is processed first,
              and then CUDA_VISIBLE_DEVICES is processed. To avoid the
              issues caused by this, set Flags=amd_gpu_env for AMD GPUs
              so only ROCR_VISIBLE_DEVICES is set.

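              Following the note above, a node with AMD GPUs might set
              only ROCR_VISIBLE_DEVICES (the node name and device paths
              here are hypothetical):

                     NodeName=tux[8-11] Name=gpu File=/dev/dri/renderD[128-131] Flags=amd_gpu_env
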
       Links  A comma-delimited list of numbers identifying the number
              of connections between this device and other devices to
              allow coscheduling of better connected devices. This is
              an ordered list in which the number of connections this
              specific device has to device number 0 would be in the
              first position, the number of connections it has to
              device number 1 in the second position, etc. A -1
              indicates the device itself and a 0 indicates no
              connection. If specified, then this line can only contain
              a single GRES device (i.e. can only contain a single file
              via File).

              This is an optional value and is usually automatically
              determined if AutoDetect is enabled. A typical use case
              would be to identify GPUs having NVLink connectivity.
              Note that for GPUs, the minor number assigned by the OS
              and used in the device file (i.e. the X in /dev/nvidiaX)
              is not necessarily the same as the device number/index.
              The device number is created by sorting the GPUs by PCI
              bus ID and then numbering them starting from the smallest
              bus ID. See
              https://slurm.schedmd.com/gres.html#GPU_Management.

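              For example, in a hypothetical four-GPU node where GPU 0
              has two NVLink connections to GPU 1 and one connection to
              each of GPUs 2 and 3, the lines for devices 0 and 1 (one
              such line per device) might read:

                     Name=gpu Type=tesla File=/dev/nvidia0 Links=-1,2,1,1
                     Name=gpu Type=tesla File=/dev/nvidia1 Links=2,-1,1,1
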
       Name   Name of the generic resource. Any desired name may be
              used. The name must match a value in GresTypes in
              slurm.conf. Each generic resource has an optional plugin
              which can provide resource-specific functionality.
              Generic resources that currently include an optional
              plugin are:

              gpu    Graphics Processing Unit

              mps    CUDA Multi-Process Service (MPS)

              nic    Network Interface Card

       NodeName
              An optional NodeName specification can be used to permit
              one gres.conf file to be used for all compute nodes in a
              cluster by specifying the node(s) that each line should
              apply to. The NodeName specification can use a Slurm
              hostlist specification as shown in the example below.

       Type   An optional arbitrary string identifying the type of
              device. For example, this might be used to identify a
              specific model of GPU, which users can then specify in a
              job request. If Type is specified, then Count is limited
              in size (currently 1024). A restart of the slurmctld and
              slurmd daemons is required for changes to the Type option
              to take effect.

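              For instance (the model names are illustrative), two GPU
              models on one node can be distinguished by Type, which a
              job can then request with e.g. --gres=gpu:tesla:1:

                     Name=gpu Type=tesla File=/dev/nvidia0
                     Name=gpu Type=kepler File=/dev/nvidia1
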
EXAMPLES
##################################################################
# Slurm's Generic Resource (GRES) configuration file
# Define GPU devices with MPS support, with AutoDetect sanity checking
##################################################################
AutoDetect=nvml
Name=gpu Type=gtx560 File=/dev/nvidia0 COREs=0,1
Name=gpu Type=tesla File=/dev/nvidia1 COREs=2,3
Name=mps Count=100 File=/dev/nvidia0 COREs=0,1
Name=mps Count=100 File=/dev/nvidia1 COREs=2,3

##################################################################
# Slurm's Generic Resource (GRES) configuration file
# Overwrite system defaults and explicitly configure three GPUs
##################################################################
Name=gpu Type=tesla File=/dev/nvidia[0-1] COREs=0,1
# Name=gpu Type=tesla File=/dev/nvidia[2-3] COREs=2,3
# NOTE: nvidia2 device is out of service
Name=gpu Type=tesla File=/dev/nvidia3 COREs=2,3

##################################################################
# Slurm's Generic Resource (GRES) configuration file
# Use a single gres.conf file for all compute nodes - positive method
##################################################################
## Explicitly specify devices on nodes tux0-tux15
# NodeName=tux[0-15] Name=gpu File=/dev/nvidia[0-3]
# NOTE: tux3 nvidia1 device is out of service
NodeName=tux[0-2] Name=gpu File=/dev/nvidia[0-3]
NodeName=tux3 Name=gpu File=/dev/nvidia[0,2-3]
NodeName=tux[4-15] Name=gpu File=/dev/nvidia[0-3]

##################################################################
# Slurm's Generic Resource (GRES) configuration file
# Use NVML to gather GPU configuration information
# for all nodes except one
##################################################################
AutoDetect=nvml
NodeName=tux3 AutoDetect=off Name=gpu File=/dev/nvidia[0-3]

##################################################################
# Slurm's Generic Resource (GRES) configuration file
# Specify some nodes with NVML, some with RSMI, and some with no AutoDetect
##################################################################
NodeName=tux[0-7] AutoDetect=nvml
NodeName=tux[8-11] AutoDetect=rsmi
NodeName=tux[12-15] Name=gpu File=/dev/nvidia[0-3]

##################################################################
# Slurm's Generic Resource (GRES) configuration file
# Define 'bandwidth' GRES to use as a way to limit the
# resource use on these nodes for workflow purposes
##################################################################
NodeName=tux[0-7] Name=bandwidth Type=lustre Count=4G Flags=CountOnly

COPYING
       Copyright (C) 2010 The Regents of the University of California.
       Produced at Lawrence Livermore National Laboratory (cf.
       DISCLAIMER).
       Copyright (C) 2010-2021 SchedMD LLC.

       This file is part of Slurm, a resource management program. For
       details, see <https://slurm.schedmd.com/>.

       Slurm is free software; you can redistribute it and/or modify it
       under the terms of the GNU General Public License as published
       by the Free Software Foundation; either version 2 of the
       License, or (at your option) any later version.

       Slurm is distributed in the hope that it will be useful, but
       WITHOUT ANY WARRANTY; without even the implied warranty of
       MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
       General Public License for more details.

SEE ALSO
       slurm.conf(5)

August 2021                Slurm Configuration File             gres.conf(5)