gres.conf(5)               Slurm Configuration File              gres.conf(5)

NAME
       gres.conf - Slurm configuration file for Generic RESource (GRES)
       management.

DESCRIPTION
       gres.conf is an ASCII file which describes the configuration of
       Generic RESource(s) (GRES) on each compute node. If the GRES
       information in the slurm.conf file does not fully describe those
       resources, then a gres.conf file should be included on each compute
       node. The file will always be located in the same directory as the
       slurm.conf.

       If the GRES information in the slurm.conf file fully describes those
       resources (i.e. no "Cores", "File" or "Links" specification is
       required for that GRES type, or that information is automatically
       detected), that information may be omitted from the gres.conf file
       and only the configuration information in the slurm.conf file will
       be used. The gres.conf file may be omitted completely if the
       configuration information in the slurm.conf file fully describes
       all GRES.

       If using the gres.conf file to describe the resources available to
       nodes, the first parameter on the line should be NodeName. If
       configuring Generic Resources without specifying nodes, the first
       parameter on the line should be Name.

       Parameter names are case insensitive. Any text following a "#" in
       the configuration file is treated as a comment through the end of
       that line. Changes to the configuration file take effect upon
       restart of Slurm daemons, daemon receipt of the SIGHUP signal, or
       execution of the command "scontrol reconfigure" unless otherwise
       noted.

       NOTE: Slurm support for gres/mps requires the use of the
       select/cons_tres plugin. For more information on how to configure
       MPS, see https://slurm.schedmd.com/gres.html#MPS_Management.

       For more information on GRES scheduling in general, see
       https://slurm.schedmd.com/gres.html.

       The overall configuration parameters available include:

       AutoDetect
              The hardware detection mechanisms to enable for automatic
              GRES configuration. Currently, the options are:

              nvml   Automatically detect NVIDIA GPUs. Requires the NVIDIA
                     Management Library (NVML).

              off    Do not automatically detect any GPUs. Used to
                     override other options.

              rsmi   Automatically detect AMD GPUs. Requires the ROCm
                     System Management Interface (ROCm SMI) Library.

              AutoDetect can be on a line by itself, in which case it will
              globally apply to all lines in gres.conf by default. In
              addition, AutoDetect can be combined with NodeName to only
              apply to certain nodes. Node-specific AutoDetects will
              override the global AutoDetect. A node-specific AutoDetect
              only needs to be specified once per node. If specified
              multiple times for the same nodes, they must all be the same
              value. To unset AutoDetect for a node when a global
              AutoDetect is set, simply set it to "off" in a node-specific
              GRES line. E.g.: NodeName=tux3 AutoDetect=off Name=gpu
              File=/dev/nvidia[0-3].

       Count  Number of resources of this name/type available on this
              node. The default value is set to the number of File values
              specified (if any), otherwise the default value is one. A
              suffix of "K", "M", "G", "T" or "P" may be used to multiply
              the number by 1024, 1048576, 1073741824, etc. respectively.
              For example: "Count=10G".

       Cores  Optionally specify the core index numbers for the specific
              cores which can use this resource. For example, it may be
              strongly preferable to use specific cores with specific GRES
              devices (e.g. on a NUMA architecture). While Slurm can track
              and assign resources at the CPU or thread level, its
              scheduling algorithms used to co-allocate GRES devices with
              CPUs operate at a socket or NUMA level. Therefore it is not
              possible to preferentially assign GRES to different specific
              CPUs on the same NUMA node or socket, and this option should
              be used to identify all cores on some socket.

              Multiple cores may be specified using a comma-delimited
              list, or a range may be specified using a "-" separator
              (e.g. "0,1,2,3" or "0-3"). If a job specifies
              --gres-flags=enforce-binding, then only the identified cores
              can be allocated with each generic resource. This will tend
              to improve performance of jobs, but delay the allocation of
              resources to them. If specified and a job is not submitted
              with the --gres-flags=enforce-binding option, the identified
              cores will be preferred for scheduling with each generic
              resource.

              If --gres-flags=disable-binding is specified, then any core
              can be used with the resources, which also increases the
              speed of Slurm's scheduling algorithm but can degrade the
              application performance. The --gres-flags=disable-binding
              option is currently required to use more CPUs than are bound
              to a GRES (e.g. if a GPU is bound to the CPUs on one socket,
              but resources on more than one socket are required to run
              the job). If any core can be effectively used with the
              resources, then do not specify the Cores option, for
              improved speed in the Slurm scheduling logic. A restart of
              the slurmctld is needed for changes to the Cores option to
              take effect.

              NOTE: Since Slurm must be able to perform resource
              management on heterogeneous clusters having various
              processing unit numbering schemes, a logical core index must
              be specified instead of the physical core index. That
              logical core index might not correspond to your physical
              core index number. Core 0 will be the first core on the
              first socket, while core 1 will be the second core on the
              first socket. This numbering coincides with the logical core
              number (Core L#) seen in "lstopo -l" command output.
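
              For example, a sketch for a hypothetical two-socket node
              where GPUs 0-1 sit on socket 0 (cores 0-7) and GPUs 2-3 sit
              on socket 1 (cores 8-15); the device paths and core counts
              are illustrative:

              Name=gpu File=/dev/nvidia[0-1] Cores=0-7
              Name=gpu File=/dev/nvidia[2-3] Cores=8-15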

       File   Fully qualified pathname of the device files associated with
              a resource. The name can include a numeric range suffix to
              be interpreted by Slurm (e.g. File=/dev/nvidia[0-3]).

              This field is generally required if enforcement of generic
              resource allocations is to be supported (i.e. it prevents
              users from making use of resources allocated to a different
              user). Enforcement of the file allocation relies upon Linux
              Control Groups (cgroups) and Slurm's task/cgroup plugin,
              which will place the allocated files into the job's cgroup
              and prevent use of other files. Please see Slurm's Cgroups
              Guide for more information:
              https://slurm.schedmd.com/cgroups.html.

              If File is specified, then Count must be either set to the
              number of file names specified or not set (the default value
              is the number of files specified). The exception to this is
              MPS. For MPS, each GPU would be identified by device file
              using the File parameter, and Count would specify the number
              of MPS entries that would correspond to that GPU (typically
              100 or some multiple of 100).
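
              For instance, a sketch giving one GPU twice the MPS shares
              of another (device paths illustrative):

              Name=mps Count=200 File=/dev/nvidia0
              Name=mps Count=100 File=/dev/nvidia1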

              If using a card with Multi-Instance GPU functionality, use
              MultipleFiles instead. File and MultipleFiles are mutually
              exclusive.

              NOTE: If you specify the File parameter for a resource on
              some node, the option must be specified on all nodes and
              Slurm will track the assignment of each specific resource on
              each node. Otherwise Slurm will only track a count of
              allocated resources rather than the state of each individual
              device file.

              NOTE: Drain a node before changing the count of records with
              File parameters (e.g. if you want to add or remove GPUs from
              a node's configuration). Failure to do so will result in any
              job using those GRES being aborted.

       Flags  Optional flags that can be specified to change the
              configured behavior of the GRES.

              Allowed values at present are:

              CountOnly       Do not attempt to load a plugin, as this
                              GRES will only be used to track counts of
                              GRES used. This avoids attempting to load a
                              non-existent plugin, which can be expensive
                              on filesystems with high-latency metadata
                              operations for non-existent files.

              nvidia_gpu_env  Set environment variable
                              CUDA_VISIBLE_DEVICES for all GPUs on the
                              specified node(s).

              amd_gpu_env     Set environment variable
                              ROCR_VISIBLE_DEVICES for all GPUs on the
                              specified node(s).

              opencl_env      Set environment variable GPU_DEVICE_ORDINAL
                              for all GPUs on the specified node(s).

              no_gpu_env      Set no GPU-specific environment variables.
                              This is mutually exclusive with all other
                              environment-related flags.

              If no environment-related flags are specified, then
              nvidia_gpu_env, amd_gpu_env, and opencl_env will be
              implicitly set by default. If AutoDetect is used and
              environment-related flags are not specified, then
              AutoDetect=nvml will set nvidia_gpu_env and AutoDetect=rsmi
              will set amd_gpu_env. Conversely, specified
              environment-related flags will always override AutoDetect.

              Environment-related flags set on one GRES line will be
              inherited by the GRES line directly below it if no
              environment-related flags are specified on that line and if
              it is of the same node, name, and type. Environment-related
              flags must be the same for GRES of the same node, name, and
              type.

              Note that there is a known issue with the AMD ROCm runtime
              where ROCR_VISIBLE_DEVICES is processed first, and then
              CUDA_VISIBLE_DEVICES is processed. To avoid the issues
              caused by this, set Flags=amd_gpu_env for AMD GPUs so only
              ROCR_VISIBLE_DEVICES is set.
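
              For example, a minimal sketch for a set of AMD GPU nodes
              (the node names and device paths are hypothetical):

              NodeName=tux[8-11] Name=gpu File=/dev/dri/renderD[128-131] Flags=amd_gpu_env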

       Links  A comma-delimited list of numbers identifying the number of
              connections between this device and other devices, to allow
              coscheduling of better-connected devices. This is an ordered
              list in which the number of connections this specific device
              has to device number 0 would be in the first position, the
              number of connections it has to device number 1 in the
              second position, etc. A -1 indicates the device itself and a
              0 indicates no connection. If specified, then this line can
              only contain a single GRES device (i.e. can only contain a
              single file via File).

              This is an optional value and is usually automatically
              determined if AutoDetect is enabled. A typical use case
              would be to identify GPUs having NVLink connectivity. Note
              that for GPUs, the minor number assigned by the OS and used
              in the device file (i.e. the X in /dev/nvidiaX) is not
              necessarily the same as the device number/index. The device
              number is created by sorting the GPUs by PCI bus ID and then
              numbering them starting from the smallest bus ID. See
              https://slurm.schedmd.com/gres.html#GPU_Management.
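
              For example, a sketch of two GPUs joined by a two-link
              NVLink connection (device paths and link counts are
              illustrative); note that each line names a single device:

              Name=gpu File=/dev/nvidia0 Links=-1,2
              Name=gpu File=/dev/nvidia1 Links=2,-1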

       MultipleFiles
              Fully qualified pathname of the device files associated with
              a resource. Graphics cards using Multi-Instance GPU (MIG)
              technology will present multiple device files that should be
              managed as a single generic resource. The file names can be
              a comma-separated list, or the name can include a numeric
              range suffix (e.g. MultipleFiles=/dev/nvidia[0-3]).

              Drain a node before changing the count of records with the
              MultipleFiles parameter, such as when adding or removing
              GPUs from a node's configuration. Failure to do so will
              result in any job using those GRES being aborted.

              When not using GPUs with MIG functionality, use File
              instead. MultipleFiles and File are mutually exclusive.
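
              For example, a sketch of one MIG instance spanning its
              parent GPU device plus the associated nvidia-caps files (all
              paths and the Type string are hypothetical):

              Name=gpu Type=1g.10gb MultipleFiles=/dev/nvidia0,/dev/nvidia-caps/nvidia-cap21,/dev/nvidia-caps/nvidia-cap22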

       Name   Name of the generic resource. Any desired name may be used.
              The name must match a value in GresTypes in slurm.conf. Each
              generic resource has an optional plugin which can provide
              resource-specific functionality. Generic resources that
              currently include an optional plugin are:

              gpu    Graphics Processing Unit

              mps    CUDA Multi-Process Service (MPS)

              nic    Network Interface Card

       NodeName
              An optional NodeName specification can be used to permit one
              gres.conf file to be used for all compute nodes in a cluster
              by specifying the node(s) that each line should apply to.
              The NodeName specification can use a Slurm hostlist
              specification as shown in the example below.

       Type   An optional arbitrary string identifying the type of generic
              resource. For example, this might be used to identify a
              specific model of GPU, which users can then specify in a job
              request. If Type is specified, then Count is limited in size
              (currently 1024). A restart of the slurmctld and slurmd
              daemons is required for changes to the Type option to take
              effect.

              NOTE: If using autodetect functionality and defining the
              Type in your gres.conf file, the Type specified should match
              or be a substring of the value that is detected, using an
              underscore in lieu of any spaces.
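
              For example, if AutoDetect were to report a GPU named "Tesla
              V100-SXM2-16GB" (an illustrative value), a Type such as
              tesla or v100 would match, since each is a substring of the
              detected name once spaces are replaced with underscores:

              Name=gpu Type=v100 File=/dev/nvidia0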

EXAMPLES
       ##################################################################
       # Slurm's Generic Resource (GRES) configuration file
       # Define GPU devices with MPS support, with AutoDetect sanity checking
       ##################################################################
       AutoDetect=nvml
       Name=gpu Type=gtx560 File=/dev/nvidia0 COREs=0,1
       Name=gpu Type=tesla  File=/dev/nvidia1 COREs=2,3
       Name=mps Count=100   File=/dev/nvidia0 COREs=0,1
       Name=mps Count=100   File=/dev/nvidia1 COREs=2,3

       ##################################################################
       # Slurm's Generic Resource (GRES) configuration file
       # Overwrite system defaults and explicitly configure three GPUs
       ##################################################################
       Name=gpu Type=tesla File=/dev/nvidia[0-1] COREs=0,1
       # Name=gpu Type=tesla File=/dev/nvidia[2-3] COREs=2,3
       # NOTE: nvidia2 device is out of service
       Name=gpu Type=tesla File=/dev/nvidia3 COREs=2,3

       ##################################################################
       # Slurm's Generic Resource (GRES) configuration file
       # Use a single gres.conf file for all compute nodes - positive method
       ##################################################################
       ## Explicitly specify devices on nodes tux0-tux15
       # NodeName=tux[0-15] Name=gpu File=/dev/nvidia[0-3]
       # NOTE: tux3 nvidia1 device is out of service
       NodeName=tux[0-2]  Name=gpu File=/dev/nvidia[0-3]
       NodeName=tux3      Name=gpu File=/dev/nvidia[0,2-3]
       NodeName=tux[4-15] Name=gpu File=/dev/nvidia[0-3]

       ##################################################################
       # Slurm's Generic Resource (GRES) configuration file
       # Use NVML to gather GPU configuration information
       # for all nodes except one
       ##################################################################
       AutoDetect=nvml
       NodeName=tux3 AutoDetect=off Name=gpu File=/dev/nvidia[0-3]

       ##################################################################
       # Slurm's Generic Resource (GRES) configuration file
       # Specify some nodes with NVML, some with RSMI, and some with no
       # AutoDetect
       ##################################################################
       NodeName=tux[0-7]   AutoDetect=nvml
       NodeName=tux[8-11]  AutoDetect=rsmi
       NodeName=tux[12-15] Name=gpu File=/dev/nvidia[0-3]

       ##################################################################
       # Slurm's Generic Resource (GRES) configuration file
       # Define 'bandwidth' GRES to use as a way to limit the
       # resource use on these nodes for workflow purposes
       ##################################################################
       NodeName=tux[0-7] Name=bandwidth Type=lustre Count=4G Flags=CountOnly

COPYING
       Copyright (C) 2010 The Regents of the University of California.
       Produced at Lawrence Livermore National Laboratory (cf, DISCLAIMER).
       Copyright (C) 2010-2022 SchedMD LLC.

       This file is part of Slurm, a resource management program. For
       details, see <https://slurm.schedmd.com/>.

       Slurm is free software; you can redistribute it and/or modify it
       under the terms of the GNU General Public License as published by
       the Free Software Foundation; either version 2 of the License, or
       (at your option) any later version.

       Slurm is distributed in the hope that it will be useful, but
       WITHOUT ANY WARRANTY; without even the implied warranty of
       MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
       General Public License for more details.

SEE ALSO
       slurm.conf(5)



February 2022              Slurm Configuration File              gres.conf(5)