gres.conf(5)              Slurm Configuration File              gres.conf(5)

NAME
       gres.conf - Slurm configuration file for Generic RESource (GRES)
       management.

DESCRIPTION
       gres.conf is an ASCII file which describes the configuration of
       Generic RESource (GRES) on each compute node. If the GRES
       information in the slurm.conf file does not fully describe those
       resources, then a gres.conf file should be included on each
       compute node. The file location can be modified at system build
       time using the DEFAULT_SLURM_CONF parameter or at execution time
       by setting the SLURM_CONF environment variable. The file will
       always be located in the same directory as the slurm.conf file.

       If the GRES information in the slurm.conf file fully describes
       those resources (i.e. no "Cores", "File" or "Links"
       specification is required for that GRES type or that information
       is automatically detected), that information may be omitted from
       the gres.conf file and only the configuration information in the
       slurm.conf file will be used. The gres.conf file may be omitted
       completely if the configuration information in the slurm.conf
       file fully describes all GRES.

       If using the gres.conf file to describe the resources available
       to nodes, the first parameter on the line should be NodeName. If
       configuring Generic Resources without specifying nodes, the
       first parameter on the line should be Name.

       Parameter names are case insensitive. Any text following a "#"
       in the configuration file is treated as a comment through the
       end of that line. Changes to the configuration file take effect
       upon restart of Slurm daemons, daemon receipt of the SIGHUP
       signal, or execution of the command "scontrol reconfigure"
       unless otherwise noted.

       NOTE: Slurm support for gres/mps requires the use of the
       select/cons_tres plugin. For more information on how to
       configure MPS, see
       https://slurm.schedmd.com/gres.html#MPS_Management.

       For more information on GRES scheduling in general, see
       https://slurm.schedmd.com/gres.html.

       The overall configuration parameters available include:

       AutoDetect
              The hardware detection mechanisms to enable for automatic
              GRES configuration. Currently, the options are:

              nvml   Automatically detect NVIDIA GPUs

              off    Do not automatically detect any GPUs. Used to
                     override other options.

              rsmi   Automatically detect AMD GPUs
              AutoDetect can be on a line by itself, in which case it
              will globally apply to all lines in gres.conf by default.
              In addition, AutoDetect can be combined with NodeName to
              only apply to certain nodes. A node-specific AutoDetect
              overrides the global AutoDetect and only needs to be
              specified once per node. If specified multiple times for
              the same nodes, they must all be the same value. To unset
              AutoDetect for a node when a global AutoDetect is set,
              simply set it to "off" in a node-specific GRES line.
              E.g.: NodeName=tux3 AutoDetect=off Name=gpu
              File=/dev/nvidia[0-3].

       Count  Number of resources of this type available on this node.
              The default value is set to the number of File values
              specified (if any), otherwise the default value is one. A
              suffix of "K", "M", "G", "T" or "P" may be used to
              multiply the number by 1024, 1048576, 1073741824, etc.
              respectively. For example: "Count=10G".

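              For example (an illustrative sketch; the node and device
              names here are hypothetical), Count may be derived from
              File or given explicitly with a suffix:

                     # Count defaults to the number of File values (4)
                     NodeName=tux0 Name=gpu File=/dev/nvidia[0-3]
                     # Explicit count with a suffix: 4K = 4096 units
                     NodeName=tux0 Name=bandwidth Type=lustre Count=4K Flags=CountOnly
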
       Cores  Optionally specify the core index numbers for the
              specific cores which can use this resource. For example,
              it may be strongly preferable to use specific cores with
              specific GRES devices (e.g. on a NUMA architecture).
              While Slurm can track and assign resources at the CPU or
              thread level, its scheduling algorithms used to
              co-allocate GRES devices with CPUs operate at a socket or
              NUMA level. Therefore it is not possible to
              preferentially assign GRES with different specific CPUs
              on the same NUMA node or socket, and this option should
              be used to identify all cores on some socket.

              Multiple cores may be specified using a comma-delimited
              list or a range may be specified using a "-" separator
              (e.g. "0,1,2,3" or "0-3"). If a job specifies
              --gres-flags=enforce-binding, then only the identified
              cores can be allocated with each generic resource. This
              will tend to improve performance of jobs, but delay the
              allocation of resources to them. If specified and a job
              is not submitted with the --gres-flags=enforce-binding
              option, the identified cores will be preferred for
              scheduling with each generic resource.

              If --gres-flags=disable-binding is specified, then any
              core can be used with the resources, which also increases
              the speed of Slurm's scheduling algorithm but can degrade
              the application performance. The
              --gres-flags=disable-binding option is currently required
              to use more CPUs than are bound to a GRES (i.e. if a GPU
              is bound to the CPUs on one socket, but resources on more
              than one socket are required to run the job). If any core
              can be effectively used with the resources, then do not
              specify the Cores option for improved speed in the Slurm
              scheduling logic. A restart of the slurmctld is needed
              for changes to the Cores option to take effect.

              NOTE: Since Slurm must be able to perform resource
              management on heterogeneous clusters having various
              processing unit numbering schemes, a logical core index
              must be specified instead of the physical core index.
              That logical core index might not correspond to your
              physical core index number. Core 0 will be the first core
              on the first socket, while core 1 will be the second core
              on the first socket. This numbering coincides with the
              logical core number (Core L#) seen in "lstopo -l" command
              output.

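              As an illustrative sketch (the device paths and core
              layout are hypothetical), a node with two four-core
              sockets and one GPU attached to each socket could pin
              each GPU to the logical cores of its local socket:

                     Name=gpu Type=tesla File=/dev/nvidia0 Cores=0-3
                     Name=gpu Type=tesla File=/dev/nvidia1 Cores=4-7
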
       File   Fully qualified pathname of the device files associated
              with a resource. The name can include a numeric range
              suffix to be interpreted by Slurm (e.g.
              File=/dev/nvidia[0-3]).

              This field is generally required if enforcement of
              generic resource allocations is to be supported (i.e.
              prevents users from making use of resources allocated to
              a different user). Enforcement of the file allocation
              relies upon Linux Control Groups (cgroups) and Slurm's
              task/cgroup plugin, which will place the allocated files
              into the job's cgroup and prevent use of other files.
              Please see Slurm's Cgroups Guide for more information:
              https://slurm.schedmd.com/cgroups.html.

              If File is specified then Count must be either set to the
              number of file names specified or not set (the default
              value is the number of files specified). The exception to
              this is MPS. For MPS, each GPU would be identified by
              device file using the File parameter and Count would
              specify the number of MPS entries that would correspond
              to that GPU (typically 100 or some multiple of 100).

              NOTE: If you specify the File parameter for a resource on
              some node, the option must be specified on all nodes and
              Slurm will track the assignment of each specific resource
              on each node. Otherwise Slurm will only track a count of
              allocated resources rather than the state of each
              individual device file.

              NOTE: Drain a node before changing the count of records
              with File parameters (i.e. if you want to add or remove
              GPUs from a node's configuration). Failure to do so will
              result in any job using those GRES being aborted.

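              For example (the device files are illustrative), GPU
              counts follow from File, while MPS counts are given per
              GPU device file:

                     Name=gpu File=/dev/nvidia[0-1]         # Count defaults to 2
                     Name=mps File=/dev/nvidia0 Count=100   # 100 MPS entries on GPU 0
                     Name=mps File=/dev/nvidia1 Count=100   # 100 MPS entries on GPU 1
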
       Flags  Optional flags that can be specified to change the
              configured behavior of the GRES.

              Allowed values at present are:

              CountOnly        Do not attempt to load the plugin as
                               this GRES will only be used to track
                               counts of GRES used. This avoids
                               attempting to load a non-existent
                               plugin, which can affect filesystems
                               with high latency metadata operations
                               for non-existent files.

              nvidia_gpu_env   Set environment variable
                               CUDA_VISIBLE_DEVICES for all GPUs on the
                               specified node(s).

              amd_gpu_env      Set environment variable
                               ROCR_VISIBLE_DEVICES for all GPUs on the
                               specified node(s).

              opencl_env       Set environment variable
                               GPU_DEVICE_ORDINAL for all GPUs on the
                               specified node(s).

              no_gpu_env       Set no GPU-specific environment
                               variables. This is mutually exclusive
                               with all other environment-related
                               flags.

              If no environment-related flags are specified, then
              nvidia_gpu_env, amd_gpu_env, and opencl_env will be
              implicitly set by default. If AutoDetect is used and
              environment-related flags are not specified, then
              AutoDetect=nvml will set nvidia_gpu_env and
              AutoDetect=rsmi will set amd_gpu_env. Conversely,
              specified environment-related flags will always override
              AutoDetect.

              Environment-related flags set on one GRES line will be
              inherited by the GRES line directly below it if no
              environment-related flags are specified on that line and
              if it is of the same node, name, and type.
              Environment-related flags must be the same for GRES of
              the same node, name, and type.

              Note that there is a known issue with the AMD ROCm
              runtime where ROCR_VISIBLE_DEVICES is processed first,
              and then CUDA_VISIBLE_DEVICES is processed. To avoid the
              issues caused by this, set Flags=amd_gpu_env for AMD GPUs
              so only ROCR_VISIBLE_DEVICES is set.

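              Following the note above, a node with AMD GPUs might set
              only ROCR_VISIBLE_DEVICES (the node name and device paths
              here are hypothetical):

                     NodeName=tux[8-11] Name=gpu File=/dev/dri/renderD[128-131] Flags=amd_gpu_env
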
       Links  A comma-delimited list of numbers identifying the number
              of connections between this device and other devices to
              allow coscheduling of better connected devices. This is
              an ordered list in which the number of connections this
              specific device has to device number 0 would be in the
              first position, the number of connections it has to
              device number 1 in the second position, etc. A -1
              indicates the device itself and a 0 indicates no
              connection. If specified, then this line can only contain
              a single GRES device (i.e. can only contain a single file
              via File).

              This is an optional value and is usually automatically
              determined if AutoDetect is enabled. A typical use case
              would be to identify GPUs having NVLink connectivity.
              Note that for GPUs, the minor number assigned by the OS
              and used in the device file (i.e. the X in /dev/nvidiaX)
              is not necessarily the same as the device number/index.
              The device number is created by sorting the GPUs by PCI
              bus ID and then numbering them starting from the smallest
              bus ID. See
              https://slurm.schedmd.com/gres.html#GPU_Management.

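              For example, in a hypothetical four-GPU node where GPU 0
              has two NVLink connections to GPU 1 and one connection to
              each of GPUs 2 and 3, the lines for devices 0 and 1 (one
              such line per device) might read:

                     Name=gpu Type=tesla File=/dev/nvidia0 Links=-1,2,1,1
                     Name=gpu Type=tesla File=/dev/nvidia1 Links=2,-1,1,1
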
       Name   Name of the generic resource. Any desired name may be
              used. The name must match a value in GresTypes in
              slurm.conf. Each generic resource has an optional plugin
              which can provide resource-specific functionality.
              Generic resources that currently include an optional
              plugin are:

              gpu    Graphics Processing Unit

              mps    CUDA Multi-Process Service (MPS)

              nic    Network Interface Card

       NodeName
              An optional NodeName specification can be used to permit
              one gres.conf file to be used for all compute nodes in a
              cluster by specifying the node(s) that each line should
              apply to. The NodeName specification can use a Slurm
              hostlist specification as shown in the example below.

       Type   An optional arbitrary string identifying the type of
              device. For example, this might be used to identify a
              specific model of GPU, which users can then specify in a
              job request. If Type is specified, then Count is limited
              in size (currently 1024). A restart of the slurmctld and
              slurmd daemons is required for changes to the Type option
              to take effect.

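              For instance (the model names are illustrative), two GPU
              models on one node can be distinguished by Type, which a
              job can then request with e.g. --gres=gpu:tesla:1:

                     Name=gpu Type=tesla File=/dev/nvidia0
                     Name=gpu Type=kepler File=/dev/nvidia1
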
EXAMPLES
##################################################################
# Slurm's Generic Resource (GRES) configuration file
# Define GPU devices with MPS support, with AutoDetect sanity checking
##################################################################
AutoDetect=nvml
Name=gpu Type=gtx560 File=/dev/nvidia0 COREs=0,1
Name=gpu Type=tesla File=/dev/nvidia1 COREs=2,3
Name=mps Count=100 File=/dev/nvidia0 COREs=0,1
Name=mps Count=100 File=/dev/nvidia1 COREs=2,3

##################################################################
# Slurm's Generic Resource (GRES) configuration file
# Overwrite system defaults and explicitly configure three GPUs
##################################################################
Name=gpu Type=tesla File=/dev/nvidia[0-1] COREs=0,1
# Name=gpu Type=tesla File=/dev/nvidia[2-3] COREs=2,3
# NOTE: nvidia2 device is out of service
Name=gpu Type=tesla File=/dev/nvidia3 COREs=2,3

##################################################################
# Slurm's Generic Resource (GRES) configuration file
# Use a single gres.conf file for all compute nodes - positive method
##################################################################
## Explicitly specify devices on nodes tux0-tux15
# NodeName=tux[0-15] Name=gpu File=/dev/nvidia[0-3]
# NOTE: tux3 nvidia1 device is out of service
NodeName=tux[0-2] Name=gpu File=/dev/nvidia[0-3]
NodeName=tux3 Name=gpu File=/dev/nvidia[0,2-3]
NodeName=tux[4-15] Name=gpu File=/dev/nvidia[0-3]

##################################################################
# Slurm's Generic Resource (GRES) configuration file
# Use NVML to gather GPU configuration information
# for all nodes except one
##################################################################
AutoDetect=nvml
NodeName=tux3 AutoDetect=off Name=gpu File=/dev/nvidia[0-3]

##################################################################
# Slurm's Generic Resource (GRES) configuration file
# Specify some nodes with NVML, some with RSMI, and some with no AutoDetect
##################################################################
NodeName=tux[0-7] AutoDetect=nvml
NodeName=tux[8-11] AutoDetect=rsmi
NodeName=tux[12-15] Name=gpu File=/dev/nvidia[0-3]

##################################################################
# Slurm's Generic Resource (GRES) configuration file
# Define 'bandwidth' GRES to use as a way to limit the
# resource use on these nodes for workflow purposes
##################################################################
NodeName=tux[0-7] Name=bandwidth Type=lustre Count=4G Flags=CountOnly

COPYING
       Copyright (C) 2010 The Regents of the University of California.
       Produced at Lawrence Livermore National Laboratory (cf.
       DISCLAIMER).
       Copyright (C) 2010-2021 SchedMD LLC.

       This file is part of Slurm, a resource management program. For
       details, see <https://slurm.schedmd.com/>.

       Slurm is free software; you can redistribute it and/or modify it
       under the terms of the GNU General Public License as published
       by the Free Software Foundation; either version 2 of the
       License, or (at your option) any later version.

       Slurm is distributed in the hope that it will be useful, but
       WITHOUT ANY WARRANTY; without even the implied warranty of
       MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
       General Public License for more details.

SEE ALSO
       slurm.conf(5)

August 2021                Slurm Configuration File             gres.conf(5)