acct_gather.conf(5)         Slurm Configuration File        acct_gather.conf(5)


       acct_gather.conf - Slurm configuration file for the acct_gather plugins

       acct_gather.conf is a UTF-8 formatted file which defines parameters
       used by Slurm's acct_gather related plugins. The file location can be
       modified at system build time using the DEFAULT_SLURM_CONF parameter
       or at execution time by setting the SLURM_CONF environment variable.
       The file will always be located in the same directory as the
       slurm.conf file.

       Parameter names are case insensitive but parameter values are case
       sensitive. Any text following a "#" in the configuration file is
       treated as a comment through the end of that line. The size of each
       line in the file is limited to 1024 characters.

       Changes to the configuration file take effect upon restart of the
       Slurm daemons.

       The following acct_gather.conf parameters are defined to control the
       general behavior of various plugins in Slurm.

       The acct_gather.conf file is different from other Slurm .conf files.
       Each plugin defines which options are available. Each plugin to be
       loaded must be specified in slurm.conf under the following
       configuration entries:

       • AcctGatherEnergyType (plugin type=acct_gather_energy)
       • AcctGatherInterconnectType (plugin type=acct_gather_interconnect)
       • AcctGatherFilesystemType (plugin type=acct_gather_filesystem)
       • AcctGatherProfileType (plugin type=acct_gather_profile)

       If the respective plugin for an option is not loaded, that option
       will appear unknown to Slurm and will be silently ignored. If you
       decide to change plugin types, you may also have to change the
       related options.

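       As an illustration, a slurm.conf that loads the IPMI energy, HDF5
       profile and OFED interconnect plugins could contain the lines below.
       The combination of plugins is an assumption chosen for the sketch;
       each site selects its own.

```
# slurm.conf (hypothetical plugin selection)
AcctGatherEnergyType=acct_gather_energy/ipmi
AcctGatherProfileType=acct_gather_profile/hdf5
AcctGatherInterconnectType=acct_gather_interconnect/ofed
# With this selection, EnergyIPMI*, ProfileHDF5* and InfinibandOFEDPort
# options in acct_gather.conf are recognized, while ProfileInfluxDB*
# options would be unknown to Slurm and silently ignored.
```
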
       Required entry in slurm.conf:
              AcctGatherEnergyType=acct_gather_energy/ipmi

       Options used for acct_gather_energy/ipmi are as follows:

       EnergyIPMIFrequency=<number>
              This parameter is the number of seconds between BMC access
              samples.

       EnergyIPMICalcAdjustment=<yes|no>
              If set to "yes", the consumption between the last BMC access
              sample and a step consumption update is approximated to get
              more accurate task consumption. The adjustment is made at the
              step start and each time the consumption is updated, including
              the step end. The approximations are not accumulated; only the
              first and last adjustments are used to calculate the
              consumption. The default is "no".

       EnergyIPMIPowerSensors=<key=values>
              Optionally specify the ids of the sensors to be used. Multiple
              <key=values> can be set with ";" separators. The key "Node" is
              mandatory and is used to know the consumed energy for nodes
              (scontrol show node) and jobs (sacct). Other keys are optional
              and are named by the administrator. These keys are useful only
              when profiling is activated for energy, to store the power (in
              watts) of each key. <values> are integers; multiple values can
              be set with "," separators. The sum of the listed sensors is
              used for each key. EnergyIPMIPowerSensors is optional; the
              default value is "Node=number", where "number" is the id of the
              first power sensor returned by ipmi-sensors.
              For example:
                     EnergyIPMIPowerSensors=Node=16,19,23,26;Socket0=16,23;Socket1=19,26;SSUP=23,26;KNC=16,19
                     EnergyIPMIPowerSensors=Node=29,32;SSUP0=29;SSUP1=32
                     EnergyIPMIPowerSensors=Node=1280

       The following acct_gather.conf parameters are defined to control the
       IPMI config default values for libipmiconsole.

       EnergyIPMIUsername=USERNAME
              Specify BMC Username.

       EnergyIPMIPassword=PASSWORD
              Specify BMC Password.

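       A minimal acct_gather.conf sketch for the IPMI plugin follows. The
       sensor ids, the administrator-chosen key name Socket0 and the
       credentials are placeholders (assumptions). Real sensor ids have to
       be looked up with ipmi-sensors on the target node.

```
# acct_gather.conf for acct_gather_energy/ipmi (placeholder values)
# Sample the BMC every 10 seconds.
EnergyIPMIFrequency=10
# Approximate consumption between the last BMC sample and step updates.
EnergyIPMICalcAdjustment=yes
# "Node" (mandatory) sums sensors 16 and 19; "Socket0" is an optional,
# administrator-named key that only matters when energy profiling is on.
EnergyIPMIPowerSensors=Node=16,19;Socket0=16
# Hypothetical BMC credentials passed on as libipmiconsole defaults.
EnergyIPMIUsername=admin
EnergyIPMIPassword=changeme
```
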
       Required entry in slurm.conf:
              AcctGatherEnergyType=acct_gather_energy/xcc

       The acct_gather_energy/xcc plugin uses only in-band communications
       with the XClarity Controller, so a reduced set of options is
       supported:

       EnergyIPMIFrequency=<number>
              This parameter is the number of seconds between XCC access
              samples. Default is 30 seconds.

       EnergyIPMITimeout=<number>
              Timeout, in seconds, for initializing the IPMI XCC context for
              a new gathering thread. Default is 10 seconds.

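       For the XCC plugin, only the two options above apply. The sketch
       below sets both to arbitrary example values instead of the defaults
       of 30 and 10 seconds.

```
# slurm.conf
AcctGatherEnergyType=acct_gather_energy/xcc

# acct_gather.conf: poll the XClarity Controller every 60 seconds and
# allow 15 seconds to initialize the IPMI XCC context (example values).
EnergyIPMIFrequency=60
EnergyIPMITimeout=15
```
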
       Required entry in slurm.conf:
              AcctGatherProfileType=acct_gather_profile/hdf5

       Options used for acct_gather_profile/hdf5 are as follows:

       ProfileHDF5Dir=<path>
              This parameter is the path to the shared folder into which the
              acct_gather_profile plugin will write detailed data (usually as
              an HDF5 file). The directory is assumed to be on a file system
              shared by the controller and all compute nodes. This is a
              required parameter.

       ProfileHDF5Default
              A comma-delimited list of data types to be collected for each
              job submission. Allowed values are:

              All    All data types are collected. (Cannot be combined with
                     other values.)

              None   No data types are collected. This is the default.
                     (Cannot be combined with other values.)

              Energy Energy data is collected.

              Filesystem
                     File system (Lustre) data is collected.

              Network
                     Network (InfiniBand) data is collected.

              Task   Task (I/O, Memory, ...) data is collected.

161
163 Required entry in slurm.conf:
164 AcctGatherProfileType=acct_gather_profile/influxdb
165
166 The InfluxDB plugin provides the same information as the HDF5 plugin
167 but will instead send information to the configured InfluxDB server.
168
       The InfluxDB plugin is designed against the 1.x protocol of InfluxDB.
       Any site running a v2.x InfluxDB server will need to configure a v1.x
       compatibility endpoint along with the correct user and password
       authorization. Token authentication is not currently supported.

       Options:

       ProfileInfluxDBDatabase
              InfluxDB v1.x database name where profiling information is to
              be written. For InfluxDB v2.x, this is the bucket name where
              profiling information is to be written.

       ProfileInfluxDBDefault
              A comma-delimited list of data types to be collected for each
              job submission. Allowed values are:

              All    All data types are collected. Cannot be combined with
                     other values.

              None   No data types are collected. This is the default.
                     Cannot be combined with other values.

              Energy Energy data is collected.

              Filesystem
                     File system (Lustre) data is collected.

              Network
                     Network (InfiniBand) data is collected.

              Task   Task (I/O, Memory, ...) data is collected.

       ProfileInfluxDBHost=<hostname>:<port>
              The hostname of the machine where the InfluxDB instance runs
              and the port used by its HTTP API. The HTTP API port is the one
              configured through the bind-address option in the [http]
              section of influxdb.conf. Example:
                     ProfileInfluxDBHost=myinfluxhost:8086

       ProfileInfluxDBPass
              Password for the username configured in ProfileInfluxDBUser.
              Required in v2.x and optional in v1.x InfluxDB.

       ProfileInfluxDBRTPolicy
              The InfluxDB v1.x retention policy name for the database
              configured in the ProfileInfluxDBDatabase option. For InfluxDB
              v2.x, the retention policy bucket name for the database
              configured in the ProfileInfluxDBDatabase option.

       ProfileInfluxDBUser
              InfluxDB username that should be used to gain access to the
              database configured in ProfileInfluxDBDatabase. Required in
              v2.x and optional in v1.x InfluxDB. With v1.x, it is only
              needed if InfluxDB is configured with authentication enabled in
              the [http] config section and a user has been granted at least
              WRITE access to the database. See also ProfileInfluxDBPass.

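       A sketch for an InfluxDB v1.x server with authentication enabled is
       shown below. The host name, database, retention policy and
       credentials are placeholders (assumptions). With a v2.x server, the
       v1.x compatibility endpoint must be configured,
       ProfileInfluxDBDatabase names the bucket, and ProfileInfluxDBUser
       and ProfileInfluxDBPass become mandatory.

```
# slurm.conf
AcctGatherProfileType=acct_gather_profile/influxdb

# acct_gather.conf (all values are placeholders)
# HTTP API address, matching bind-address in the [http] section of
# influxdb.conf.
ProfileInfluxDBHost=myinfluxhost:8086
ProfileInfluxDBDatabase=slurm_profiling
ProfileInfluxDBRTPolicy=autogen
ProfileInfluxDBUser=slurm
ProfileInfluxDBPass=changeme
# Collect energy and network data for each job by default.
ProfileInfluxDBDefault=Energy,Network
```
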
       NOTES:
       This plugin requires the libcurl development files to be installed
       and linkable at configure time. The plugin will not build otherwise.

       Information on how to install and configure InfluxDB and manage
       databases, retention policies and such is available on the official
       InfluxDB webpage.

       Collected information is written from every compute node where a job
       runs to the InfluxDB instance listening on the ProfileInfluxDBHost. In
       order to avoid overloading the InfluxDB instance with incoming
       connection requests, the plugin uses an internal buffer which is
       filled with samples. Once the buffer is full, an HTTP API write
       request is performed and the buffer is emptied to hold subsequent
       samples. A final request is also performed when a task ends, even if
       the buffer isn't full.

       Failed HTTP API write requests are silently discarded. This means that
       collected profile information in the plugin buffer is lost if it
       can't be written to the InfluxDB database for any reason.

       Plugin messages are logged along with the slurmstepd logs to
       SlurmdLogFile. In order to troubleshoot any issues, it is recommended
       to temporarily increase the slurmd debug level to debug3 and add
       Profile to the debug flags. This can be accomplished by setting the
       slurm.conf SlurmdDebug and DebugFlags respectively, or dynamically
       through scontrol setdebug and setdebugflags.

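       The slurm.conf form of the troubleshooting settings described above
       is sketched below. If DebugFlags already lists other flags, Profile
       would be appended to that comma-separated list rather than replacing
       it. These lines are meant to be reverted once troubleshooting is
       done.

```
# slurm.conf: temporary settings while troubleshooting the profile plugin
SlurmdDebug=debug3
DebugFlags=Profile
```
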
       Grafana can be used to create charts based on the data held by
       InfluxDB. This kind of tool allows creating dashboards, tables and
       other graphics from the stored time series.

       Required entry in slurm.conf:
              AcctGatherInterconnectType=acct_gather_interconnect/ofed

       Options used for acct_gather_interconnect/ofed are as follows:

       InfinibandOFEDPort=<number>
              This parameter represents the port number of the local
              InfiniBand card to monitor. The default port is 1.

       ###
       # Slurm acct_gather configuration file
       ###
       # Parameters for acct_gather_energy/ipmi plugin
       EnergyIPMIFrequency=10
       EnergyIPMICalcAdjustment=yes
       #
       # Parameters for acct_gather_profile/hdf5 plugin
       ProfileHDF5Dir=/app/slurm/profile_data
       # Parameters for acct_gather_interconnect/ofed plugin
       InfinibandOFEDPort=1

       Copyright (C) 2012-2013 Bull. Copyright (C) 2012-2021 SchedMD LLC.
       Produced at Bull (cf, DISCLAIMER).

       This file is part of Slurm, a resource management program. For
       details, see <https://slurm.schedmd.com/>.

       Slurm is free software; you can redistribute it and/or modify it
       under the terms of the GNU General Public License as published by the
       Free Software Foundation; either version 2 of the License, or (at
       your option) any later version.

       Slurm is distributed in the hope that it will be useful, but WITHOUT
       ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
       FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
       for more details.

       slurm.conf(5)


June 2021                   Slurm Configuration File        acct_gather.conf(5)