acct_gather.conf(5)        Slurm Configuration File        acct_gather.conf(5)

NAME
       acct_gather.conf - Slurm configuration file for the acct_gather plugins

DESCRIPTION
       acct_gather.conf is an ASCII file which defines parameters used by
       Slurm's acct_gather related plugins. The file location can be modified
       at system build time using the DEFAULT_SLURM_CONF parameter or at
       execution time by setting the SLURM_CONF environment variable. The file
       will always be located in the same directory as the slurm.conf file.

       Parameter names are case insensitive. Any text following a "#" in the
       configuration file is treated as a comment through the end of that
       line. The size of each line in the file is limited to 1024 characters.
       Changes to the configuration file take effect upon restart of Slurm
       daemons, daemon receipt of the SIGHUP signal, or execution of the
       command "scontrol reconfigure" unless otherwise noted.
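
The parsing rules above (case-insensitive names, "#" comments, the
1024-character line limit) can be sketched in a few lines of Python. This is
an illustrative reader only, not Slurm's own parser: the function name
parse_conf is made up for this example, and the sketch assumes there is no
quoting or escaping of "#" characters.

```python
def parse_conf(text, max_line_len=1024):
    """Parse key=value lines; illustrative only, not Slurm's own parser."""
    params = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        if len(raw) > max_line_len:
            raise ValueError(f"line {lineno} exceeds {max_line_len} characters")
        # Everything after the first "#" is treated as a comment.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"line {lineno}: expected key=value")
        # Parameter names are case insensitive, so normalize the key.
        params[key.strip().lower()] = value.strip()
    return params


sample = """\
###
# Slurm acct_gather configuration file
###
energyipmifrequency=10   # same parameter as EnergyIPMIFrequency
ProfileHDF5Dir=/app/slurm/profile_data
"""
conf = parse_conf(sample)
```

Splitting on the first "=" only keeps values such as "Node=16,19" intact,
which matters for parameters whose values themselves contain "=".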

       The following acct_gather.conf parameters are defined to control the
       general behavior of various plugins in Slurm.

       The acct_gather.conf file differs from other Slurm .conf files in that
       each plugin defines which options are available. If the plugin for an
       option is not loaded, Slurm treats that option as unknown, which could
       prevent Slurm from loading. If you change plugin types, you might also
       have to change the related options.

       EnergyIPMI
              Options used for acct_gather_energy/ipmi are as follows:

              EnergyIPMIFrequency=<number>
                     This parameter is the number of seconds between BMC
                     access samples.

              EnergyIPMICalcAdjustment=<yes|no>
                     If set to "yes", the consumption between the last BMC
                     access sample and a step consumption update is
                     approximated to get more accurate task consumption. The
                     adjustment is made at the step start and each time the
                     consumption is updated, including the step end. The
                     approximations are not accumulated; only the first and
                     last adjustments are used to calculate the consumption.
                     The default is "no".
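
The effect of the adjustment can be illustrated with a small Python sketch.
This page does not say how the approximation is computed, so the sketch
assumes a simple linear extrapolation (the last power reading is held
constant until the update). The function names and the sample numbers are
hypothetical.

```python
def adjusted_energy(sample_joules, sample_time, watts, now):
    """Energy reading at `now`, extrapolated from the last BMC sample.

    Assumption: power stays at `watts` between the sample and `now`.
    """
    return sample_joules + watts * (now - sample_time)


def step_consumption(start, end, calc_adjustment):
    """Each of start/end is (sample_joules, sample_time, watts, event_time)."""
    if calc_adjustment:
        # Only the adjustments at step start and step end enter the result.
        e0 = adjusted_energy(*start)
        e1 = adjusted_energy(*end)
    else:
        # Without adjustment, use the raw values of the last samples.
        e0, e1 = start[0], end[0]
    return e1 - e0


# BMC sampled every 10 s; the step starts at t=13 s and ends at t=47 s.
start = (1000.0, 10.0, 200.0, 13.0)   # last sample at t=10: 1000 J, 200 W
end = (8000.0, 40.0, 250.0, 47.0)     # last sample at t=40: 8000 J, 250 W

without = step_consumption(start, end, calc_adjustment=False)  # 7000.0 J
with_adj = step_consumption(start, end, calc_adjustment=True)  # 8150.0 J
```

In this made-up case the unadjusted figure ignores the 3 seconds before the
step start and the 7 seconds before the step end that fall between samples.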

              EnergyIPMIPowerSensors=<key=values>
                     Optionally specify the ids of the sensors to use.
                     Multiple <key=values> can be set with ";" separators.
                     The key "Node" is mandatory and is used to determine the
                     consumed energy for nodes (scontrol show node) and jobs
                     (sacct). Other keys are optional and are named by the
                     administrator. These keys are useful only when profiling
                     is activated for energy, to store the power (in watts)
                     of each key. <values> are integers; multiple values can
                     be set with "," separators. The sum of the listed
                     sensors is used for each key. EnergyIPMIPowerSensors is
                     optional; the default value is "Node=number", where
                     "number" is the id of the first power sensor returned by
                     ipmi-sensors.
                     For example:
                     EnergyIPMIPowerSensors=Node=16,19,23,26;Socket0=16,23;Socket1=19,26;SSUP=23,26;KNC=16,19
                     EnergyIPMIPowerSensors=Node=29,32;SSUP0=29;SSUP1=32
                     EnergyIPMIPowerSensors=Node=1280
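
To make the <key=values> syntax concrete, the following Python sketch splits
a value on ";" and ",", checks for the mandatory "Node" key, and sums the
readings of the sensors listed under each key. It is only an illustration of
the format described above: the function names and the watt readings are
invented, and the sketch assumes keys are matched exactly as written.

```python
def parse_power_sensors(value):
    """Parse 'Node=16,19;Socket0=16' into {'Node': [16, 19], 'Socket0': [16]}."""
    groups = {}
    for item in value.split(";"):
        key, sep, ids = item.partition("=")
        if not sep:
            raise ValueError(f"expected key=values, got {item!r}")
        groups[key.strip()] = [int(i) for i in ids.split(",")]
    if "Node" not in groups:
        raise ValueError('the "Node" key is mandatory')
    return groups


def power_per_key(groups, watts_by_sensor):
    """Sum the readings of the sensors listed under each key."""
    return {key: sum(watts_by_sensor[i] for i in ids)
            for key, ids in groups.items()}


groups = parse_power_sensors("Node=16,19,23,26;Socket0=16,23;Socket1=19,26")
readings = {16: 40, 19: 45, 23: 30, 26: 35}   # made-up watt values per sensor id
totals = power_per_key(groups, readings)
# totals == {"Node": 150, "Socket0": 70, "Socket1": 80}
```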

              The following acct_gather.conf parameters are defined to control
              the IPMI config default values for libipmiconsole.

              EnergyIPMIUsername=USERNAME
                     Specify BMC Username.

              EnergyIPMIPassword=PASSWORD
                     Specify BMC Password.

       EnergyXCC
              The acct_gather_energy/xcc plugin uses only in-band
              communication with the XClarity Controller (XCC), so it
              supports a reduced set of options:

              EnergyIPMIFrequency=<number>
                     This parameter is the number of seconds between XCC
                     access samples. Default is 30 seconds.

              EnergyIPMITimeout=<number>
                     Timeout, in seconds, for initializing the IPMI XCC
                     context for a new gathering thread. Default is 10
                     seconds.

       ProfileHDF5
              Options used for acct_gather_profile/hdf5 are as follows:

              ProfileHDF5Dir=<path>
                     This parameter is the path to the shared folder into
                     which the acct_gather_profile plugin will write detailed
                     data (usually as an HDF5 file). The directory is assumed
                     to be on a file system shared by the controller and all
                     compute nodes. This is a required parameter.

              ProfileHDF5Default
                     A comma delimited list of data types to be collected for
                     each job submission. Allowed values are:

                     All     All data types are collected. (Cannot be
                             combined with other values.)

                     None    No data types are collected. This is the
                             default. (Cannot be combined with other values.)

                     Energy  Energy data is collected.

                     Filesystem
                             File system (Lustre) data is collected.

                     Network Network (InfiniBand) data is collected.

                     Task    Task (I/O, Memory, ...) data is collected.

       ProfileInfluxDB
              Options used for acct_gather_profile/influxdb are as follows:

              ProfileInfluxDBDatabase
                     InfluxDB database name where profiling information is to
                     be written.

              ProfileInfluxDBDefault
                     A comma delimited list of data types to be collected for
                     each job submission. Allowed values are:

                     All     All data types are collected. (Cannot be
                             combined with other values.)

                     None    No data types are collected. This is the
                             default. (Cannot be combined with other values.)

                     Energy  Energy data is collected.

                     Filesystem
                             File system (Lustre) data is collected.

                     Network Network (InfiniBand) data is collected.

                     Task    Task (I/O, Memory, ...) data is collected.

              ProfileInfluxDBHost=<hostname>:<port>
                     The hostname of the machine where the influxd instance
                     is executed and the port used by the HTTP API. The port
                     used by the HTTP API is the one configured through the
                     bind-address influxdb.conf option in the [http] section.
                     Example:

                     ProfileInfluxDBHost=myinfluxhost:8086

              ProfileInfluxDBPass
                     Optional password for the username configured in
                     ProfileInfluxDBUser.

              ProfileInfluxDBRTPolicy
                     The InfluxDB retention policy name for the database
                     configured in the ProfileInfluxDBDatabase option.

              ProfileInfluxDBUser
                     Optional InfluxDB username that should be used to gain
                     access to the database configured in
                     ProfileInfluxDBDatabase. This is only needed if InfluxDB
                     is configured with authentication enabled in the [http]
                     config section and a user has been granted at least
                     WRITE access to the database. See also
                     ProfileInfluxDBPass.

       NOTE: This plugin requires the libcurl development files to be
             installed.

       NOTE: Information on how to install and configure InfluxDB and manage
             databases, retention policies and such is available on the
             official webpage.

       NOTE: Collected information is written from every compute node where a
             job runs to the influxd instance listening on the
             ProfileInfluxDBHost. In order to avoid overloading the influxd
             instance with incoming connection requests, the plugin uses an
             internal buffer which is filled with samples. Once the buffer is
             full, an HTTP API write request is performed and the buffer is
             emptied to hold subsequent samples. A final request is also
             performed when a task ends even if the buffer isn't full.

       NOTE: Failed HTTP API write requests are discarded. This means that
             collected profile information in the plugin buffer is lost if it
             can't be written to the influxd database for any reason.
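
The buffering behavior described in the two notes above can be pictured with
the following Python sketch. It is not the plugin's implementation (the
plugin itself is not written in Python): the class name, the capacity value,
and the send callable standing in for the HTTP API write request are all
hypothetical.

```python
class SampleBuffer:
    """Collect samples and send them in batches; hypothetical illustration."""

    def __init__(self, capacity, send):
        self.capacity = capacity
        self.send = send          # callable that performs one write request
        self.samples = []
        self.dropped = 0          # samples lost because a write failed

    def add(self, sample):
        self.samples.append(sample)
        if len(self.samples) >= self.capacity:
            self.flush()          # buffer is full: write it out

    def flush(self):
        """Send whatever is buffered; also called once when the task ends."""
        if not self.samples:
            return
        batch, self.samples = self.samples, []
        try:
            self.send(batch)
        except OSError:
            # A failed write is not retried: the batch is simply lost.
            self.dropped += len(batch)


sent = []
buf = SampleBuffer(capacity=3, send=sent.append)
for i in range(7):
    buf.add(i)
buf.flush()   # task end: send the partially filled buffer
# sent == [[0, 1, 2], [3, 4, 5], [6]]
```

Batching trades fewer connections to influxd against the risk that one
failed request loses a whole buffer of samples.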

       NOTE: Plugin messages are logged along with the slurmstepd logs to
             SlurmdLogFile. To troubleshoot any issues, it is recommended to
             temporarily increase the slurmd debug level to debug3 and add
             Profile to the debug flags. This can be accomplished by setting
             the slurm.conf SlurmdDebug and DebugFlags parameters
             respectively, or dynamically through scontrol setdebug and
             setdebugflags.

       NOTE: Consider using a monitoring and analytics tool such as Grafana
             on top of InfluxDB. Such tools let you create dashboards,
             tables, and other graphics from the stored time series, which
             makes it easier to correlate resource usage peaks reported by
             other node monitoring tools such as Ganglia with specific job
             step tasks.

       InfinibandOFED
              Options used for acct_gather_interconnect/ofed are as follows:

              InfinibandOFEDPort=<number>
                     This parameter represents the port number of the local
                     InfiniBand card to monitor. The default port is 1.

EXAMPLE
       ###
       # Slurm acct_gather configuration file
       ###
       # Parameters for acct_gather_energy/ipmi plugin
       EnergyIPMIFrequency=10
       EnergyIPMICalcAdjustment=yes
       #
       # Parameters for acct_gather_profile/hdf5 plugin
       ProfileHDF5Dir=/app/slurm/profile_data
       # Parameters for acct_gather_interconnect/ofed plugin
       InfinibandOFEDPort=1

COPYING
       Copyright (C) 2012-2013 Bull. Produced at Bull (cf, DISCLAIMER).

       This file is part of Slurm, a resource management program. For
       details, see <https://slurm.schedmd.com/>.

       Slurm is free software; you can redistribute it and/or modify it under
       the terms of the GNU General Public License as published by the Free
       Software Foundation; either version 2 of the License, or (at your
       option) any later version.

       Slurm is distributed in the hope that it will be useful, but WITHOUT
       ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
       FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
       for more details.

SEE ALSO
       slurm.conf(5)

April 2020               Slurm Configuration File          acct_gather.conf(5)