acct_gather.conf(5)        Slurm Configuration File        acct_gather.conf(5)

NAME
acct_gather.conf - Slurm configuration file for the acct_gather plugins

DESCRIPTION
acct_gather.conf is a UTF-8 formatted file that defines parameters used by
Slurm's acct_gather related plugins. The file is always located in the same
directory as slurm.conf.

Parameter names are case insensitive, but parameter values are case
sensitive. Any text following a "#" in the configuration file is treated as
a comment through the end of that line. Each line in the file is limited to
1024 characters.
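
The syntax rules above can be illustrated with a short fragment. This is
only a sketch: the option names come from later sections of this page, and
the values are chosen for illustration.

```
# A full-line comment
EnergyIPMIFrequency=10          # text after "#" is ignored
energyipmicalcadjustment=yes    # same parameter as EnergyIPMICalcAdjustment
# Values are case sensitive, so these two paths would name different
# directories:
#   ProfileHDF5Dir=/app/slurm/profile_data
#   ProfileHDF5Dir=/app/Slurm/profile_data
```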

Changes to the configuration file take effect upon restart of the Slurm
daemons.

The following acct_gather.conf parameters are defined to control the general
behavior of various plugins in Slurm.

The acct_gather.conf file differs from other Slurm .conf files in that each
plugin defines which options are available. Each plugin to be loaded must be
specified in slurm.conf under one of the following configuration entries:

• AcctGatherEnergyType (plugin type=acct_gather_energy)
• AcctGatherInterconnectType (plugin type=acct_gather_interconnect)
• AcctGatherFilesystemType (plugin type=acct_gather_filesystem)
• AcctGatherProfileType (plugin type=acct_gather_profile)
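
As a sketch, a slurm.conf that loads one plugin of each type might contain
the lines below. The plugin names are the ones documented on this page; the
particular combination is only an example, and a site would list just the
plugins it actually uses.

```
# slurm.conf (example combination, not a recommendation)
AcctGatherEnergyType=acct_gather_energy/ipmi
AcctGatherInterconnectType=acct_gather_interconnect/ofed
AcctGatherFilesystemType=acct_gather_filesystem/lustre
AcctGatherProfileType=acct_gather_profile/hdf5
```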

If the respective plugin for an option is not loaded, that option is unknown
to Slurm and is silently ignored. If you change plugin types, you may also
have to change the related options.

acct_gather_energy/ipmi
Required entry in slurm.conf:
    AcctGatherEnergyType=acct_gather_energy/ipmi

Options used for acct_gather_energy/ipmi are as follows:

EnergyIPMIFrequency=<number>
    The number of seconds between BMC access samples.

EnergyIPMICalcAdjustment=<yes|no>
    If set to "yes", the consumption between the last BMC access sample
    and a step consumption update is approximated to get more accurate
    task consumption. The adjustment is made at the step start and each
    time the consumption is updated, including the step end. The
    approximations are not accumulated; only the first and last
    adjustments are used to calculate the consumption. The default is
    "no".

EnergyIPMIPowerSensors=<key=values>
    Optionally specify the ids of the sensors to use. Multiple
    <key=values> pairs can be set with ";" separators. The key "Node" is
    mandatory and is used to determine the consumed energy for nodes
    (scontrol show node) and jobs (sacct). Other keys are optional and
    are named by the administrator. These keys are useful only when
    profiling is activated for energy, to store the power (in watts) of
    each key. <values> are integers; multiple values can be set with ","
    separators. The sum of the listed sensors is used for each key.
    EnergyIPMIPowerSensors is optional; the default value is
    "Node=number", where "number" is the id of the first power sensor
    returned by ipmi-sensors.
    Examples:
        EnergyIPMIPowerSensors=Node=16,19,23,26;Socket0=16,23;Socket1=19,26;SSUP=23,26;KNC=16,19
        EnergyIPMIPowerSensors=Node=29,32;SSUP0=29;SSUP1=32
        EnergyIPMIPowerSensors=Node=1280

The following acct_gather.conf parameters are defined to control the IPMI
config default values for libipmiconsole.

EnergyIPMIUsername=USERNAME
    Specify the BMC username.

EnergyIPMIPassword=PASSWORD
    Specify the BMC password.

Datasets provided by the plugin are named <IPMI_SENSOR_LABEL>Power.
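
Putting the IPMI options together, a configuration could look like the
sketch below. The sensor ids, the key names CPU0 and CPU1, and the
credentials are hypothetical; real sensor ids would be taken from the
output of ipmi-sensors on the node.

```
# acct_gather.conf (sketch; sensor ids and credentials are hypothetical)
EnergyIPMIFrequency=10
EnergyIPMICalcAdjustment=yes
# Node power is the sum of sensors 16 and 19.
# CPU0 and CPU1 are administrator-chosen keys, used only for profiling.
EnergyIPMIPowerSensors=Node=16,19;CPU0=16;CPU1=19
EnergyIPMIUsername=monitor
EnergyIPMIPassword=changeme
```

With these keys, the naming rule above suggests profile datasets such as
NodePower, CPU0Power and CPU1Power.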

acct_gather_energy/rapl
Required entry in slurm.conf:
    AcctGatherEnergyType=acct_gather_energy/rapl

This plugin doesn't read any options from acct_gather.conf.

The dataset provided by the plugin is: Power.

acct_gather_energy/xcc
Required entry in slurm.conf:
    AcctGatherEnergyType=acct_gather_energy/xcc

The acct_gather_energy/xcc plugin supports only in-band communication with
the XClarity Controller, so only a reduced set of options is supported:

EnergyIPMIFrequency=<number>
    The number of seconds between XCC access samples. The default is 30
    seconds.

EnergyIPMITimeout=<number>
    Timeout, in seconds, for initializing the IPMI XCC context for a new
    gathering thread. The default is 10 seconds.

Datasets provided by the plugin are: Energy, CurrPower.
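
A minimal sketch of an XCC setup follows. Both values are simply the
documented defaults written out explicitly, so omitting them should give
the same behavior.

```
# slurm.conf
AcctGatherEnergyType=acct_gather_energy/xcc

# acct_gather.conf (values shown are the documented defaults)
EnergyIPMIFrequency=30
EnergyIPMITimeout=10
```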

acct_gather_filesystem/lustre
Required entry in slurm.conf:
    AcctGatherFilesystemType=acct_gather_filesystem/lustre

This plugin doesn't read any options from acct_gather.conf.

Datasets provided by the plugin are: Reads, ReadMB, Writes, WriteMB.

acct_gather_profile/hdf5
Required entry in slurm.conf:
    AcctGatherProfileType=acct_gather_profile/hdf5

Options used for acct_gather_profile/hdf5 are as follows:

ProfileHDF5Dir=<path>
    The path to the shared directory into which the acct_gather_profile
    plugin will write detailed data (usually as an HDF5 file). The
    directory is assumed to be on a file system shared by the controller
    and all compute nodes. This is a required parameter.

ProfileHDF5Default
    A comma-delimited list of data types to be collected for each job
    submission. Allowed values are:

    All         All data types are collected. Cannot be combined with
                other values.

    None        No data types are collected. This is the default. Cannot
                be combined with other values.

    Energy      Energy data is collected.

    Filesystem  File system (Lustre) data is collected.

    Network     Network (InfiniBand) data is collected.

    Task        Task (I/O, Memory, ...) data is collected.
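
As an illustration, the following sketch enables HDF5 profiling and
collects energy and task data by default. The directory is the one used in
the example section of this page; the choice of data types is arbitrary.

```
# slurm.conf
AcctGatherProfileType=acct_gather_profile/hdf5

# acct_gather.conf
ProfileHDF5Dir=/app/slurm/profile_data
# Comma-delimited list; "All" and "None" cannot be mixed with other values
ProfileHDF5Default=Energy,Task
```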

acct_gather_profile/influxdb
Required entry in slurm.conf:
    AcctGatherProfileType=acct_gather_profile/influxdb

The InfluxDB plugin provides the same information as the HDF5 plugin but
sends it to the configured InfluxDB server instead.

The InfluxDB plugin is designed against the 1.x protocol of InfluxDB. Any
site running a v2.x InfluxDB server will need to configure a v1.x
compatibility endpoint along with the correct user and password
authorization. Token authentication is not currently supported.

Options:

ProfileInfluxDBDatabase
    For InfluxDB v1.x, the name of the database where profiling
    information is to be written. For InfluxDB v2.x, the name of the
    bucket where profiling information is to be written.

ProfileInfluxDBDefault
    A comma-delimited list of data types to be collected for each job
    submission. Allowed values are:

    All         All data types are collected. Cannot be combined with
                other values.

    None        No data types are collected. This is the default. Cannot
                be combined with other values.

    Energy      Energy data is collected.

    Filesystem  File system (Lustre) data is collected.

    Network     Network (InfiniBand) data is collected.

    Task        Task (I/O, Memory, ...) data is collected.

ProfileInfluxDBHost=<hostname>:<port>
    The hostname of the machine where the InfluxDB instance runs and the
    port used by its HTTP API. That port is the one configured through
    the bind-address option in the [http] section of influxdb.conf.
    Example:
        ProfileInfluxDBHost=myinfluxhost:8086

ProfileInfluxDBPass
    Password for the username configured in ProfileInfluxDBUser. Required
    for InfluxDB v2.x and optional for v1.x.

ProfileInfluxDBRTPolicy
    For InfluxDB v1.x, the retention policy name for the database
    configured in the ProfileInfluxDBDatabase option. For InfluxDB v2.x,
    the retention policy bucket name for the database configured in the
    ProfileInfluxDBDatabase option.

ProfileInfluxDBUser
    InfluxDB username used to gain access to the database configured in
    ProfileInfluxDBDatabase. Required for InfluxDB v2.x and optional for
    v1.x, where it is only needed if InfluxDB is configured with
    authentication enabled in the [http] config section and a user has
    been granted at least WRITE access to the database. See also
    ProfileInfluxDBPass.
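
The options above might be combined as in the following sketch for an
InfluxDB v1.x server with authentication enabled. The database name, user
and password are hypothetical, and "autogen" is assumed here only because
it is the name InfluxDB 1.x gives to the retention policy it creates by
default; a site would substitute its own policy name.

```
# slurm.conf
AcctGatherProfileType=acct_gather_profile/influxdb

# acct_gather.conf (database, user and password are hypothetical)
ProfileInfluxDBHost=myinfluxhost:8086
ProfileInfluxDBDatabase=slurm_profile
ProfileInfluxDBRTPolicy=autogen
ProfileInfluxDBUser=slurm
ProfileInfluxDBPass=changeme
ProfileInfluxDBDefault=Energy,Task
```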

NOTES:

This plugin requires the libcurl development files to be installed and
linkable at configure time. The plugin will not build otherwise.

Information on how to install and configure InfluxDB and manage databases,
retention policies and such is available on the official InfluxDB webpage.

Collected information is written from every compute node where a job runs to
the InfluxDB instance listening on ProfileInfluxDBHost. To avoid
overloading the InfluxDB instance with incoming connection requests, the
plugin uses an internal buffer which is filled with samples. Once the buffer
is full, an HTTP API write request is performed and the buffer is emptied to
hold subsequent samples. A final request is also performed when a task ends,
even if the buffer isn't full.

Failed HTTP API write requests are silently discarded. This means that
collected profile information in the plugin buffer is lost if it can't be
written to the InfluxDB database for any reason.

Plugin messages are logged along with the slurmstepd logs to SlurmdLogFile.
To troubleshoot any issues, it is recommended to temporarily increase the
slurmd debug level to debug3 and add Profile to the debug flags. This can be
accomplished by setting the slurm.conf SlurmdDebug and DebugFlags options
respectively, or dynamically through scontrol setdebug and setdebugflags.
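
For the slurm.conf route, the troubleshooting settings described above
could be written as in this sketch. It assumes no other debug flags are
already configured; if DebugFlags is already set, Profile would be added to
the existing comma-separated list instead.

```
# slurm.conf (temporary troubleshooting settings; revert when done)
SlurmdDebug=debug3
DebugFlags=Profile
```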

Grafana can be used to create charts based on the data held by InfluxDB.
Such a tool makes it possible to create dashboards, tables and other
graphics using the stored time series.

acct_gather_interconnect/ofed
Required entry in slurm.conf:
    AcctGatherInterconnectType=acct_gather_interconnect/ofed

Options used for acct_gather_interconnect/ofed are as follows:

InfinibandOFEDPort=<number>
    The port number of the local InfiniBand card to monitor. The default
    port is 1.

Datasets provided by the plugin: PacketsIn, PacketsOut, InMB, OutMB.

acct_gather_interconnect/sysfs
Required entry in slurm.conf:
    AcctGatherInterconnectType=acct_gather_interconnect/sysfs

Options used for acct_gather_interconnect/sysfs are as follows:

SysfsInterfaces=<interfaces>
    Comma-separated list of interface names to collect statistics from.
    Usage from all listed interfaces is summed together and is not broken
    down individually.

Datasets provided by the plugin: PacketsIn, PacketsOut, InMB, OutMB.
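
A short sketch of a sysfs setup follows. The interface names eth0 and ib0
are hypothetical; the actual names depend on the node's network devices.
Traffic from both interfaces would be reported as a single combined total.

```
# slurm.conf
AcctGatherInterconnectType=acct_gather_interconnect/sysfs

# acct_gather.conf (interface names are hypothetical)
SysfsInterfaces=eth0,ib0
```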

EXAMPLE
###
# Slurm acct_gather configuration file
###
# Parameters for acct_gather_energy/ipmi plugin
EnergyIPMIFrequency=10
EnergyIPMICalcAdjustment=yes
#
# Parameters for acct_gather_profile/hdf5 plugin
ProfileHDF5Dir=/app/slurm/profile_data
#
# Parameters for acct_gather_interconnect/ofed plugin
InfinibandOFEDPort=1

COPYING
Copyright (C) 2012-2013 Bull. Copyright (C) 2012-2022 SchedMD LLC. Produced
at Bull (cf, DISCLAIMER).

This file is part of Slurm, a resource management program. For details, see
<https://slurm.schedmd.com/>.

Slurm is free software; you can redistribute it and/or modify it under the
terms of the GNU General Public License as published by the Free Software
Foundation; either version 2 of the License, or (at your option) any later
version.

Slurm is distributed in the hope that it will be useful, but WITHOUT ANY
WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
FOR A PARTICULAR PURPOSE. See the GNU General Public License for more
details.

SEE ALSO
slurm.conf(5)

April 2022                 Slurm Configuration File        acct_gather.conf(5)