gres.conf(5)               Slurm Configuration File               gres.conf(5)

NAME

       gres.conf - Slurm configuration file for generic resource management.


DESCRIPTION

       gres.conf is an ASCII file which describes the configuration of
       generic resources on each compute node. Each node must contain a
       gres.conf file if generic resources are to be scheduled by Slurm.  The
       file location can be modified at system build time using the
       DEFAULT_SLURM_CONF parameter or at execution time by setting the
       SLURM_CONF environment variable. The file will always be located in
       the same directory as the slurm.conf file. If generic resource counts
       are set by the gres plugin function node_config_load(), this file may
       be optional.

       Parameter names are case insensitive.  Any text following a "#" in
       the configuration file is treated as a comment through the end of
       that line.  Changes to the configuration file take effect upon
       restart of Slurm daemons, daemon receipt of the SIGHUP signal, or
       execution of the command "scontrol reconfigure" unless otherwise
       noted.

       The overall configuration parameters available include:

       Count  Number of resources of this type available on this node.  The
              default value is set to the number of File values specified
              (if any); otherwise the default value is one. A suffix of "K",
              "M", "G", "T" or "P" may be used to multiply the number by
              1024, 1048576, 1073741824, etc. respectively.

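              For illustration, a sketch of two Count entries (the resource
              names are only examples):

                   # Four GPUs; no File given, so Count must be stated:
                   Name=gpu Count=4
                   # "M" multiplies by 1048576, so 20M = 20971520 units:
                   Name=bandwidth Count=20M
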
       Cores  Specify the first thread CPU index numbers for the specific
              cores which can use this resource.  For example, it may be
              strongly preferable to use specific cores with specific
              devices (e.g. on a NUMA architecture). Multiple cores may be
              specified using a comma-delimited list, or a range may be
              specified using a "-" separator (e.g. "0,1,2,3" or "0-3").  If
              the Cores configuration option is specified and a job is
              submitted with the --gres-flags=enforce-binding option, then
              only the identified cores can be allocated with each generic
              resource, which will tend to improve performance of jobs but
              slow the allocation of resources to them.  If specified and a
              job is not submitted with the --gres-flags=enforce-binding
              option, the identified cores will be preferred for scheduling
              with each generic resource.  If
              --gres-flags=disable-binding is specified, then any core can
              be used with the resources, which also increases the speed of
              Slurm's scheduling algorithm but can degrade the application
              performance.  The --gres-flags=disable-binding option is
              currently required to use more CPUs than are bound to a GRES
              (i.e. if a GPU is bound to the CPUs on one socket, but
              resources on more than one socket are required to run the
              job).  If any core can be effectively used with the resources,
              then do not specify the Cores option, for improved speed in
              the Slurm scheduling logic.  A restart of the slurmctld is
              needed for changes to the Cores option to take effect.

              NOTE: If your cores contain multiple threads, only list the
              first thread of each core. The logic is such that it uses core
              instead of thread scheduling per GRES. Also note that since
              Slurm must be able to perform resource management on
              heterogeneous clusters having various core ID numbering
              schemes, an abstract index will be used instead of the
              physical core index. That abstract id may not correspond to
              your physical core number.  Basically Slurm numbers processing
              units from 0 to n, with 0 being the id of the first processing
              unit (core, or thread if HT is enabled) on the first socket,
              first core and possibly first thread, then continuing
              sequentially to the next thread, core, and socket. The
              numbering generally coincides with the processing unit logical
              number (PU L#) seen in lstopo output.

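              As an illustration, on a hypothetical two-socket node with
              four cores per socket and no hyperthreading (abstract core
              indexes 0-3 on the first socket, 4-7 on the second), each GPU
              could be bound to the cores of its local socket:

                   Name=gpu Type=tesla File=/dev/nvidia0 Cores=0-3
                   Name=gpu Type=tesla File=/dev/nvidia1 Cores=4-7
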
       File   Fully qualified pathname of the device files associated with a
              resource.  The file name parsing logic includes support for
              simple regular expressions as shown in the example.  This
              field is generally required if enforcement of generic resource
              allocations is to be supported (i.e. it prevents users from
              making use of resources allocated to a different user).  If
              File is specified, then Count must be either set to the number
              of file names specified or not set (the default value is the
              number of files specified).  Slurm must track the utilization
              of each individual device if device file names are specified,
              which involves more overhead than just tracking the device
              counts.  Use the File parameter only if the Count is not
              sufficient for tracking purposes.  NOTE: If you specify the
              File parameter for a resource on some node, the option must be
              specified on all nodes and Slurm will track the assignment of
              each specific resource on each node. Otherwise Slurm will only
              track a count of allocated resources rather than the state of
              each individual device file.

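              For example, the following two lines are equivalent sketches
              of a node with two GPU device files; Count may be omitted
              because it defaults to the number of files named:

                   Name=gpu File=/dev/nvidia[0-1]
                   Name=gpu File=/dev/nvidia[0-1] Count=2
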
       Name   Name of the generic resource. Any desired name may be used.
              Each generic resource has an optional plugin which can provide
              resource-specific options.  Generic resources that currently
              include an optional plugin are:

              gpu    Graphics Processing Unit

              nic    Network Interface Card

              mic    Intel Many Integrated Core (MIC) processor

       NodeName
              An optional NodeName specification can be used to permit one
              gres.conf file to be used for all compute nodes in a cluster
              by specifying the node(s) that each line should apply to.  The
              NodeName specification can use a Slurm hostlist specification
              as shown in the example below.

       Type   An arbitrary string identifying the type of device.  For
              example, a particular model of GPU.  If Type is specified,
              then Count is limited in size (currently 1024).

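              As an illustration, distinct Type values (the type names here
              are hypothetical) let jobs request a specific model, e.g. with
              "--gres=gpu:k80:1":

                   Name=gpu Type=k80  File=/dev/nvidia0
                   Name=gpu Type=p100 File=/dev/nvidia1
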

EXAMPLES

       ##################################################################
       # Slurm's Generic Resource (GRES) configuration file
       ##################################################################
       # Configure support for our four GPUs
       Name=gpu Type=gtx560 File=/dev/nvidia0 COREs=0,1
       Name=gpu Type=gtx560 File=/dev/nvidia1 COREs=0,1
       Name=gpu Type=tesla  File=/dev/nvidia2 COREs=2,3
       Name=gpu Type=tesla  File=/dev/nvidia3 COREs=2,3
       Name=bandwidth Count=20M

       ##################################################################
       # Slurm's Generic Resource (GRES) configuration file
       # Use a single gres.conf file for all compute nodes
       ##################################################################
       NodeName=tux[0-15]  Name=gpu File=/dev/nvidia[0-3]
       NodeName=tux[16-31] Name=gpu File=/dev/nvidia[0-7]


COPYING

       Copyright (C) 2010 The Regents of the University of California.
       Produced at Lawrence Livermore National Laboratory (cf. DISCLAIMER).
       Copyright (C) 2010-2014 SchedMD LLC.

       This file is part of Slurm, a resource management program.  For
       details, see <https://slurm.schedmd.com/>.

       Slurm is free software; you can redistribute it and/or modify it
       under the terms of the GNU General Public License as published by the
       Free Software Foundation; either version 2 of the License, or (at
       your option) any later version.

       Slurm is distributed in the hope that it will be useful, but WITHOUT
       ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
       FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
       for more details.


SEE ALSO

       slurm.conf(5)

July 2018                  Slurm Configuration File               gres.conf(5)