gres.conf(5)               Slurm Configuration File              gres.conf(5)


NAME
       gres.conf - Slurm configuration file for generic resource management.


DESCRIPTION
       gres.conf is an ASCII file which describes the configuration of
       generic resources on each compute node. Each node must contain a
       gres.conf file if generic resources are to be scheduled by Slurm. The
       file location can be modified at system build time using the
       DEFAULT_SLURM_CONF parameter or at execution time by setting the
       SLURM_CONF environment variable. The file will always be located in
       the same directory as the slurm.conf file. If generic resource counts
       are set by the gres plugin function node_config_load(), this file may
       be optional.

       Parameter names are case insensitive. Any text following a "#" in the
       configuration file is treated as a comment through the end of that
       line. Changes to the configuration file take effect upon restart of
       Slurm daemons, daemon receipt of the SIGHUP signal, or execution of
       the command "scontrol reconfigure" unless otherwise noted.

       The overall configuration parameters available include:

       Count  Number of resources of this type available on this node. The
              default value is set to the number of File values specified
              (if any), otherwise the default value is one. A suffix of "K",
              "M", "G", "T" or "P" may be used to multiply the number by
              1024, 1048576, 1073741824, etc. respectively.
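
              For example, a consumable resource counted in units rather
              than device files might be configured as follows (a sketch;
              the "bandwidth" name is arbitrary, as any GRES name may be
              used):

              Name=bandwidth Count=20M    # 20971520 units (20 x 1048576)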


       Cores  Specify the first thread CPU index numbers for the specific
              cores which can use this resource. For example, it may be
              strongly preferable to use specific cores with specific
              devices (e.g. on a NUMA architecture). Multiple cores may be
              specified using a comma delimited list or a range may be
              specified using a "-" separator (e.g. "0,1,2,3" or "0-3"). If
              the Cores configuration option is specified and a job is
              submitted with the --gres-flags=enforce-binding option, then
              only the identified cores can be allocated with each generic
              resource, which will tend to improve performance of jobs but
              slow the allocation of resources to them. If Cores is
              specified and a job is not submitted with the
              --gres-flags=enforce-binding option, the identified cores will
              be preferred for scheduling with each generic resource. If
              --gres-flags=disable-binding is specified, then any core can
              be used with the resources, which also increases the speed of
              Slurm's scheduling algorithm but can degrade the application
              performance. The --gres-flags=disable-binding option is
              currently required to use more CPUs than are bound to a GRES
              (i.e. if a GPU is bound to the CPUs on one socket, but
              resources on more than one socket are required to run the
              job). If any core can be effectively used with the resources,
              then do not specify the Cores option, for improved speed in
              the Slurm scheduling logic. A restart of the slurmctld is
              needed for changes to the Cores option to take effect.

              NOTE: If your cores contain multiple threads, only list the
              first thread of each core; the logic uses core rather than
              thread scheduling per GRES. Also note that since Slurm must be
              able to perform resource management on heterogeneous clusters
              having various core ID numbering schemes, an abstract index
              will be used instead of the physical core index. That abstract
              ID may not correspond to your physical core number. Slurm
              numbers processing units from 0 to n, where 0 is the ID of the
              first processing unit (core, or thread if hyperthreading is
              enabled) on the first socket and first core, continuing
              sequentially through the threads, cores, and sockets that
              follow. This numbering generally coincides with the processing
              unit logical number (PU L#) seen in lstopo output.
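
              As a sketch, on a two-socket node where abstract core indices
              0-1 fall on the first socket and 2-3 on the second, a GPU
              attached to each socket might be configured as follows (device
              paths are illustrative):

              Name=gpu File=/dev/nvidia0 Cores=0-1
              Name=gpu File=/dev/nvidia1 Cores=2-3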


       File   Fully qualified pathname of the device files associated with a
              resource. The file name parsing logic includes support for
              simple regular expressions as shown in the example. This field
              is generally required if enforcement of generic resource
              allocations is to be supported (i.e. it prevents users from
              making use of resources allocated to a different user). If
              File is specified then Count must be either set to the number
              of file names specified or not set (the default value is the
              number of files specified). Slurm must track the utilization
              of each individual device if device file names are specified,
              which involves more overhead than just tracking the device
              counts. Use the File parameter only if the Count is not
              sufficient for tracking purposes. NOTE: If you specify the
              File parameter for a resource on some node, the option must be
              specified on all nodes and Slurm will track the assignment of
              each specific resource on each node. Otherwise Slurm will only
              track a count of allocated resources rather than the state of
              each individual device file.
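
              For instance, the following two lines are equivalent; the
              bracket expression expands to /dev/nvidia0 through
              /dev/nvidia3, so Count may be omitted or set to 4 (device
              paths are illustrative):

              Name=gpu File=/dev/nvidia[0-3]
              Name=gpu Count=4 File=/dev/nvidia[0-3]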


       Name   Name of the generic resource. Any desired name may be used.
              Each generic resource has an optional plugin which can provide
              resource-specific options. Generic resources that currently
              include an optional plugin are:

              gpu    Graphics Processing Unit

              nic    Network Interface Card

              mic    Intel Many Integrated Core (MIC) processor


       NodeName
              An optional NodeName specification can be used to permit one
              gres.conf file to be used for all compute nodes in a cluster
              by specifying the node(s) that each line should apply to. The
              NodeName specification can use a Slurm hostlist specification
              as shown in the example below.


       Type   An arbitrary string identifying the type of device. For
              example, a particular model of GPU. If Type is specified, then
              Count is limited in size (currently 1024).
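
              A sketch distinguishing two GPU models by Type without device
              files (the model strings are illustrative):

              Name=gpu Type=tesla Count=2
              Name=gpu Type=gtx560 Count=2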


EXAMPLES
       ##################################################################
       # Slurm's Generic Resource (GRES) configuration file
       ##################################################################
       # Configure support for our four GPUs
       Name=gpu Type=gtx560 File=/dev/nvidia0 Cores=0,1
       Name=gpu Type=gtx560 File=/dev/nvidia1 Cores=0,1
       Name=gpu Type=tesla File=/dev/nvidia2 Cores=2,3
       Name=gpu Type=tesla File=/dev/nvidia3 Cores=2,3
       Name=bandwidth Count=20M


       ##################################################################
       # Slurm's Generic Resource (GRES) configuration file
       # Use a single gres.conf file for all compute nodes
       ##################################################################
       NodeName=tux[0-15] Name=gpu File=/dev/nvidia[0-3]
       NodeName=tux[16-31] Name=gpu File=/dev/nvidia[0-7]


COPYING
       Copyright (C) 2010 The Regents of the University of California.
       Produced at Lawrence Livermore National Laboratory (cf, DISCLAIMER).
       Copyright (C) 2010-2014 SchedMD LLC.

       This file is part of Slurm, a resource management program. For
       details, see <https://slurm.schedmd.com/>.

       Slurm is free software; you can redistribute it and/or modify it
       under the terms of the GNU General Public License as published by the
       Free Software Foundation; either version 2 of the License, or (at
       your option) any later version.

       Slurm is distributed in the hope that it will be useful, but WITHOUT
       ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
       FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
       for more details.


SEE ALSO
       slurm.conf(5)



July 2018                  Slurm Configuration File              gres.conf(5)