gres.conf(5)               Slurm Configuration File               gres.conf(5)


NAME
       gres.conf - Slurm configuration file for Generic RESource (GRES)
       management.


DESCRIPTION
       gres.conf is an ASCII file which describes the configuration of
       Generic RESource(s) (GRES) on each compute node. If the GRES
       information in the slurm.conf file does not fully describe those
       resources, then a gres.conf file should be included on each compute
       node and on the slurm controller. The file will always be located in
       the same directory as slurm.conf.

       If the GRES information in the slurm.conf file fully describes those
       resources (i.e. no "Cores", "File" or "Links" specification is
       required for that GRES type, or that information is automatically
       detected), that information may be omitted from the gres.conf file
       and only the configuration information in the slurm.conf file will
       be used. The gres.conf file may be omitted completely if the
       configuration information in the slurm.conf file fully describes
       all GRES.

       If using the gres.conf file to describe the resources available to
       nodes, the first parameter on the line should be NodeName. If
       configuring Generic Resources without specifying nodes, the first
       parameter on the line should be Name.
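
       For example, both forms are sketched below (the node and device
       names are illustrative assumptions only):

              # Per-node form; the first parameter is NodeName:
              NodeName=tux[1-4] Name=gpu File=/dev/nvidia[0-1]
              # Node-independent form; the first parameter is Name:
              Name=gpu File=/dev/nvidia[0-1]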

       Parameter names are case insensitive. Any text following a "#" in
       the configuration file is treated as a comment through the end of
       that line. Changes to the configuration file take effect upon
       restart of Slurm daemons, daemon receipt of the SIGHUP signal, or
       execution of the command "scontrol reconfigure" unless otherwise
       noted.

       NOTE: Slurm support for gres/[mps|shard] requires the use of the
       select/cons_tres plugin. For more information on how to configure
       MPS, see https://slurm.schedmd.com/gres.html#MPS_Management. For
       more information on how to configure Sharding, see
       https://slurm.schedmd.com/gres.html#Sharding.

       For more information on GRES scheduling in general, see
       https://slurm.schedmd.com/gres.html.

       The overall configuration parameters available include:

       AutoDetect
              The hardware detection mechanisms to enable for automatic
              GRES configuration. Currently, the options are:

              nvml   Automatically detect NVIDIA GPUs. Requires the NVIDIA
                     Management Library (NVML).

              off    Do not automatically detect any GPUs. Used to override
                     other options.

              rsmi   Automatically detect AMD GPUs. Requires the ROCm
                     System Management Interface (ROCm SMI) Library.

              AutoDetect can be on a line by itself, in which case it will
              globally apply to all lines in gres.conf by default. In
              addition, AutoDetect can be combined with NodeName to apply
              only to certain nodes. A node-specific AutoDetect will
              override the global AutoDetect. A node-specific AutoDetect
              only needs to be specified once per node. If specified
              multiple times for the same nodes, they must all be the same
              value. To unset AutoDetect for a node when a global
              AutoDetect is set, simply set it to "off" in a node-specific
              GRES line, e.g. "NodeName=tux3 AutoDetect=off Name=gpu
              File=/dev/nvidia[0-3]". AutoDetect cannot be used with cloud
              nodes.

              AutoDetect will automatically detect files, cores, links, and
              any other hardware. If a parameter such as File, Cores, or
              Links is specified when AutoDetect is used, then the
              specified values are used to sanity check the auto-detected
              values. If there is a mismatch, then the node's state is set
              to invalid and the node is drained.
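
              For instance, the following sketch (the device paths, type
              name, and presence of exactly two NVML-detected GPUs are
              assumptions) lets AutoDetect verify the explicit File
              entries:

              AutoDetect=nvml
              Name=gpu Type=tesla File=/dev/nvidia[0-1]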

       Count  Number of resources of this name/type available on this node.
              The default value is set to the number of File values
              specified (if any), otherwise the default value is one. A
              suffix of "K", "M", "G", "T" or "P" may be used to multiply
              the number by 1024, 1048576, 1073741824, etc. respectively.
              For example: "Count=10G".
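
              A brief sketch (the device paths and the "bandwidth" GRES are
              assumed for illustration):

              # Count defaults to the number of File entries (4 here):
              Name=gpu File=/dev/nvidia[0-3]
              # Explicit Count with a multiplier suffix (4G = 4 x 1073741824):
              Name=bandwidth Type=lustre Count=4G Flags=CountOnly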

       Cores  Optionally specify the core index numbers for the specific
              cores which can use this resource. For example, it may be
              strongly preferable to use specific cores with specific GRES
              devices (e.g. on a NUMA architecture). While Slurm can track
              and assign resources at the CPU or thread level, the
              scheduling algorithms used to co-allocate GRES devices with
              CPUs operate at a socket or NUMA level for job allocations.
              Therefore it is not possible to preferentially assign GRES to
              different specific CPUs within the same NUMA node or socket,
              and this option should generally be used to identify all
              cores on some socket. However, job step allocation with
              --exact examines cores directly, so more specific core
              identification may be useful there.

              Multiple cores may be specified using a comma-delimited list,
              or a range may be specified using a "-" separator (e.g.
              "0,1,2,3" or "0-3"). If a job specifies
              --gres-flags=enforce-binding, then only the identified cores
              can be allocated with each generic resource. This will tend
              to improve the performance of jobs, but delay the allocation
              of resources to them. If Cores is specified and a job is
              submitted without the --gres-flags=enforce-binding option,
              the identified cores will be preferred for scheduling with
              each generic resource.

              If --gres-flags=disable-binding is specified, then any core
              can be used with the resources, which also increases the
              speed of Slurm's scheduling algorithm but can degrade the
              application performance. The --gres-flags=disable-binding
              option is currently required to use more CPUs than are bound
              to a GRES (e.g. if a GPU is bound to the CPUs on one socket,
              but resources on more than one socket are required to run the
              job). If any core can be effectively used with the resources,
              then do not specify the Cores option, which improves the
              speed of Slurm's scheduling logic. A restart of the slurmctld
              is needed for changes to the Cores option to take effect.

              NOTE: Since Slurm must be able to perform resource management
              on heterogeneous clusters having various processing unit
              numbering schemes, a logical core index must be specified
              instead of the physical core index. That logical core index
              might not correspond to your physical core index number. Core
              0 will be the first core on the first socket, while core 1
              will be the second core on the first socket. This numbering
              coincides with the logical core number (Core L#) seen in
              "lstopo -l" command output.
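
              A minimal sketch (the logical core ranges and device paths
              are assumptions for a two-socket node with 12 cores per
              socket):

              # Bind each GPU to all cores of the socket it is attached to:
              Name=gpu Type=tesla File=/dev/nvidia0 Cores=0-11
              Name=gpu Type=tesla File=/dev/nvidia1 Cores=12-23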

       File   Fully qualified pathname of the device files associated with
              a resource. The name can include a numeric range suffix to be
              interpreted by Slurm (e.g. File=/dev/nvidia[0-3]).

              This field is generally required if enforcement of generic
              resource allocations is to be supported (i.e. it prevents
              users from making use of resources allocated to a different
              user). Enforcement of the file allocation relies upon Linux
              Control Groups (cgroups) and Slurm's task/cgroup plugin,
              which will place the allocated files into the job's cgroup
              and prevent use of other files. Please see Slurm's Cgroups
              Guide for more information:
              https://slurm.schedmd.com/cgroups.html.

              If File is specified then Count must be either set to the
              number of file names specified or not set (the default value
              is the number of files specified). The exception to this is
              MPS/Sharding. For either of these GRES, each GPU would be
              identified by device file using the File parameter and Count
              would specify the number of entries that would correspond to
              that GPU. For MPS, this is typically 100 or some multiple of
              100. For Sharding, this is typically the maximum number of
              jobs that could simultaneously share that GPU.
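
              For example (the device file and counts are assumptions), one
              GPU might be presented as MPS or shard entries as follows:

              # 100 MPS entries backed by one GPU:
              Name=mps Count=100 File=/dev/nvidia0
              # Allow up to 8 jobs to share the same GPU via shards:
              Name=shard Count=8 File=/dev/nvidia0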

              If using a card with Multi-Instance GPU functionality, use
              MultipleFiles instead. File and MultipleFiles are mutually
              exclusive.

              NOTE: If you specify the File parameter for a resource on
              some node, the option must be specified on all nodes and
              Slurm will track the assignment of each specific resource on
              each node. Otherwise Slurm will only track a count of
              allocated resources rather than the state of each individual
              device file.

              NOTE: Drain a node before changing the count of records with
              File parameters (e.g. if you want to add or remove GPUs from
              a node's configuration). Failure to do so will result in any
              job using those GRES being aborted.

              NOTE: When specifying File, Count is limited in size
              (currently 1024) for each node.

       Flags  Optional flags that can be specified to change the configured
              behavior of the GRES.

              Allowed values at present are:

              CountOnly      Do not attempt to load the plugin as this GRES
                             will only be used to track counts of GRES
                             used. This avoids attempting to load a
                             non-existent plugin, which can affect
                             filesystems with high latency metadata
                             operations for non-existent files. (See the
                             sketch at the end of this Flags section.)

              one_sharing    To be used on a shared gres. If using a shared
                             gres (mps) on top of a sharing gres (gpu),
                             only allow one of the sharing gres to be used
                             by the shared gres. This is the default for
                             MPS.

                             NOTE: If a gres has this flag configured it is
                             global, so all other nodes with that gres will
                             have this flag implied. This flag is not
                             compatible with all_sharing for a specific
                             gres.

              all_sharing    To be used on a shared gres. This is the
                             opposite of one_sharing and can be used to
                             allow all sharing gres (gpu) on a node to be
                             used for shared gres (mps).

                             NOTE: If a gres has this flag configured it is
                             global, so all other nodes with that gres will
                             have this flag implied. This flag is not
                             compatible with one_sharing for a specific
                             gres.

              nvidia_gpu_env Set environment variable CUDA_VISIBLE_DEVICES
                             for all GPUs on the specified node(s).

              amd_gpu_env    Set environment variable ROCR_VISIBLE_DEVICES
                             for all GPUs on the specified node(s).

              opencl_env     Set environment variable GPU_DEVICE_ORDINAL
                             for all GPUs on the specified node(s).

              no_gpu_env     Set no GPU-specific environment variables.
                             This is mutually exclusive with all other
                             environment-related flags.

              If no environment-related flags are specified, then
              nvidia_gpu_env, amd_gpu_env, and opencl_env will be
              implicitly set by default. If AutoDetect is used and
              environment-related flags are not specified, then
              AutoDetect=nvml will set nvidia_gpu_env and AutoDetect=rsmi
              will set amd_gpu_env. Conversely, specified
              environment-related flags will always override AutoDetect.

              Environment-related flags set on one GRES line will be
              inherited by the GRES line directly below it if no
              environment-related flags are specified on that line and if
              it is of the same node, name, and type. Environment-related
              flags must be the same for GRES of the same node, name, and
              type.

              Note that there is a known issue with the AMD ROCm runtime
              where ROCR_VISIBLE_DEVICES is processed first, and then
              CUDA_VISIBLE_DEVICES is processed. To avoid the issues caused
              by this, set Flags=amd_gpu_env for AMD GPUs so only
              ROCR_VISIBLE_DEVICES is set.
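
              A short sketch combining the flags above (the device paths
              and Type value are illustrative assumptions):

              # Count-only resource; no plugin is loaded:
              Name=bandwidth Type=lustre Count=4G Flags=CountOnly
              # AMD GPUs; set only ROCR_VISIBLE_DEVICES to avoid the ROCm
              # runtime issue described above:
              Name=gpu Type=mi250 File=/dev/dri/renderD[128-129] Flags=amd_gpu_env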

       Links  A comma-delimited list of numbers identifying the number of
              connections between this device and other devices to allow
              coscheduling of better-connected devices. This is an ordered
              list in which the number of connections this specific device
              has to device number 0 would be in the first position, the
              number of connections it has to device number 1 in the second
              position, etc. A -1 indicates the device itself and a 0
              indicates no connection. If specified, then this line can
              only contain a single GRES device (i.e. can only contain a
              single file via File).

              This is an optional value and is usually automatically
              determined if AutoDetect is enabled. A typical use case would
              be to identify GPUs having NVLink connectivity. Note that for
              GPUs, the minor number assigned by the OS and used in the
              device file (i.e. the X in /dev/nvidiaX) is not necessarily
              the same as the device number/index. The device number is
              created by sorting the GPUs by PCI bus ID and then numbering
              them starting from the smallest bus ID. See
              https://slurm.schedmd.com/gres.html#GPU_Management.
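
              As an illustration for a four-GPU node (the per-pair NVLink
              counts here are assumptions; with AutoDetect enabled these
              values are normally detected):

              # -1 marks the device itself; 2 NVLink connections to each peer:
              Name=gpu File=/dev/nvidia0 Links=-1,2,2,2
              Name=gpu File=/dev/nvidia1 Links=2,-1,2,2
              Name=gpu File=/dev/nvidia2 Links=2,2,-1,2
              Name=gpu File=/dev/nvidia3 Links=2,2,2,-1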

       MultipleFiles
              Fully qualified pathname of the device files associated with
              a resource. Graphics cards using Multi-Instance GPU (MIG)
              technology will present multiple device files that should be
              managed as a single generic resource. The file names can be a
              comma-separated list, or they can include a numeric range
              suffix (e.g. MultipleFiles=/dev/nvidia[0-3]).

              Drain a node before changing the count of records with the
              MultipleFiles parameter, such as when adding or removing GPUs
              from a node's configuration. Failure to do so will result in
              any job using those GRES being aborted.

              When not using GPUs with MIG functionality, use File instead.
              MultipleFiles and File are mutually exclusive.
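
              A sketch for a single MIG-backed GRES (the parent device and
              capability device paths are assumptions; actual MIG device
              files vary by system):

              Name=gpu Type=a100_1g.5gb MultipleFiles=/dev/nvidia0,/dev/nvidia-caps/nvidia-cap21,/dev/nvidia-caps/nvidia-cap22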

       Name   Name of the generic resource. Any desired name may be used.
              The name must match a value in GresTypes in slurm.conf. Each
              generic resource has an optional plugin which can provide
              resource-specific functionality. Generic resources that
              currently include an optional plugin are:

              gpu    Graphics Processing Unit

              mps    CUDA Multi-Process Service (MPS)

              nic    Network Interface Card

              shard  Shards of a gpu

       NodeName
              An optional NodeName specification can be used to permit one
              gres.conf file to be used for all compute nodes in a cluster
              by specifying the node(s) that each line should apply to. The
              NodeName specification can use a Slurm hostlist specification
              as shown in the example below.

       Type   An optional arbitrary string identifying the type of generic
              resource. For example, this might be used to identify a
              specific model of GPU, which users can then specify in a job
              request. A restart of the slurmctld and slurmd daemons is
              required for changes to the Type option to take effect.

              NOTE: If using autodetect functionality and defining the Type
              in your gres.conf file, the Type specified should match or be
              a substring of the value that is detected, using an
              underscore in lieu of any spaces.
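
              For example, if AutoDetect reports the device name "Tesla
              V100-SXM2-32GB" (an assumed detected value), any of the
              following Type values would match:

              # Full detected name, with underscores in lieu of spaces:
              Name=gpu Type=tesla_v100-sxm2-32gb File=/dev/nvidia0
              # Substrings of the detected name also match:
              Name=gpu Type=tesla File=/dev/nvidia0
              Name=gpu Type=v100 File=/dev/nvidia0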

EXAMPLES
       ##################################################################
       # Slurm's Generic Resource (GRES) configuration file
       # Define GPU devices with MPS support, with AutoDetect sanity checking
       ##################################################################
       AutoDetect=nvml
       Name=gpu Type=gtx560 File=/dev/nvidia0 COREs=0,1
       Name=gpu Type=tesla File=/dev/nvidia1 COREs=2,3
       Name=mps Count=100 File=/dev/nvidia0 COREs=0,1
       Name=mps Count=100 File=/dev/nvidia1 COREs=2,3

       ##################################################################
       # Slurm's Generic Resource (GRES) configuration file
       # Overwrite system defaults and explicitly configure three GPUs
       ##################################################################
       Name=gpu Type=tesla File=/dev/nvidia[0-1] COREs=0,1
       # Name=gpu Type=tesla File=/dev/nvidia[2-3] COREs=2,3
       # NOTE: nvidia2 device is out of service
       Name=gpu Type=tesla File=/dev/nvidia3 COREs=2,3

       ##################################################################
       # Slurm's Generic Resource (GRES) configuration file
       # Use a single gres.conf file for all compute nodes - positive method
       ##################################################################
       ## Explicitly specify devices on nodes tux0-tux15
       # NodeName=tux[0-15] Name=gpu File=/dev/nvidia[0-3]
       # NOTE: tux3 nvidia1 device is out of service
       NodeName=tux[0-2] Name=gpu File=/dev/nvidia[0-3]
       NodeName=tux3 Name=gpu File=/dev/nvidia[0,2-3]
       NodeName=tux[4-15] Name=gpu File=/dev/nvidia[0-3]

       ##################################################################
       # Slurm's Generic Resource (GRES) configuration file
       # Use NVML to gather GPU configuration information
       # for all nodes except one
       ##################################################################
       AutoDetect=nvml
       NodeName=tux3 AutoDetect=off Name=gpu File=/dev/nvidia[0-3]

       ##################################################################
       # Slurm's Generic Resource (GRES) configuration file
       # Specify some nodes with NVML, some with RSMI, and some with no
       # AutoDetect
       ##################################################################
       NodeName=tux[0-7] AutoDetect=nvml
       NodeName=tux[8-11] AutoDetect=rsmi
       NodeName=tux[12-15] Name=gpu File=/dev/nvidia[0-3]

       ##################################################################
       # Slurm's Generic Resource (GRES) configuration file
       # Define 'bandwidth' GRES to use as a way to limit the
       # resource use on these nodes for workflow purposes
       ##################################################################
       NodeName=tux[0-7] Name=bandwidth Type=lustre Count=4G Flags=CountOnly

COPYING
       Copyright (C) 2010 The Regents of the University of California.
       Produced at Lawrence Livermore National Laboratory (cf, DISCLAIMER).
       Copyright (C) 2010-2022 SchedMD LLC.

       This file is part of Slurm, a resource management program. For
       details, see <https://slurm.schedmd.com/>.

       Slurm is free software; you can redistribute it and/or modify it
       under the terms of the GNU General Public License as published by
       the Free Software Foundation; either version 2 of the License, or
       (at your option) any later version.

       Slurm is distributed in the hope that it will be useful, but WITHOUT
       ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
       or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public
       License for more details.

SEE ALSO
       slurm.conf(5)



October 2022               Slurm Configuration File               gres.conf(5)