acct_gather.conf(5) Slurm Configuration File acct_gather.conf(5)

acct_gather.conf - Slurm configuration file for the acct_gather plugins

acct_gather.conf is a UTF-8 formatted file which defines parameters used
by Slurm's acct_gather related plugins. The file location can be modified
at system build time using the DEFAULT_SLURM_CONF parameter or at
execution time by setting the SLURM_CONF environment variable. The file
will always be located in the same directory as the slurm.conf file.

Parameter names are case insensitive, but parameter values are case
sensitive. Any text following a "#" in the configuration file is treated
as a comment through the end of that line. The size of each line in the
file is limited to 1024 characters.
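
The syntax rules above can be sketched as a small reader. This is an illustrative sketch only, not Slurm's own parser: the function name parse_acct_gather_conf and the choice to raise ValueError are assumptions, and it handles only one Name=Value pair per line.

```python
MAX_LINE_LENGTH = 1024  # per-line limit stated in the text above


def parse_acct_gather_conf(text):
    """Return {lowercased parameter name: value} for Name=Value lines."""
    params = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        if len(raw) > MAX_LINE_LENGTH:
            raise ValueError(f"line {lineno} exceeds {MAX_LINE_LENGTH} characters")
        # Everything from the first "#" to the end of the line is a comment.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        # Split on the first "=" only, so values may themselves contain "=".
        name, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"line {lineno}: expected Name=Value")
        # Names are case insensitive, so normalize them; values keep their case.
        params[name.strip().lower()] = value.strip()
    return params
```

Splitting on the first "=" only keeps a value such as Node=16,19 intact for options like EnergyIPMIPowerSensors.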

Changes to the configuration file take effect upon restart of the Slurm
daemons.

The following acct_gather.conf parameters are defined to control the
general behavior of various plugins in Slurm.

The acct_gather.conf file is different from other Slurm .conf files.
Each plugin defines which options are available. Each plugin to be
loaded must be specified in slurm.conf under the following
configuration entries:

• AcctGatherEnergyType (plugin type=acct_gather_energy)
• AcctGatherInterconnectType (plugin type=acct_gather_interconnect)
• AcctGatherFilesystemType (plugin type=acct_gather_filesystem)
• AcctGatherProfileType (plugin type=acct_gather_profile)

If the respective plugin for an option is not loaded, then that option
will be unknown to Slurm and silently ignored. If you decide to change
plugin types, you may also have to change the related options.

Options used for acct_gather_energy/ipmi are as follows:

EnergyIPMIFrequency=<number>
    This parameter is the number of seconds between BMC access samples.

EnergyIPMICalcAdjustment=<yes|no>
    If set to "yes", the consumption between the last BMC access sample
    and a step consumption update is approximated to get more accurate
    task consumption. The adjustment is made at the step start and each
    time the consumption is updated, including the step end. The
    approximations are not accumulated; only the first and last
    adjustments are used to calculate the consumption. The default is
    "no".

EnergyIPMIPowerSensors=<key=values>
    Optionally specify the IDs of the sensors to use. Multiple
    <key=values> can be set with ";" separators. The key "Node" is
    mandatory and is used to determine the consumed energy for nodes
    (scontrol show node) and jobs (sacct). Other keys are optional and
    are named by the administrator. These keys are useful only when
    profiling is activated for energy, to store the power (in watts) of
    each key. <values> are integers; multiple values can be set with ","
    separators. The sum of the listed sensors is used for each key.
    EnergyIPMIPowerSensors is optional; the default value is
    "Node=number", where "number" is the ID of the first power sensor
    returned by ipmi-sensors.
    e.g.
    EnergyIPMIPowerSensors=Node=16,19,23,26;Socket0=16,23;Socket1=19,26;SSUP=23,26;KNC=16,19
    EnergyIPMIPowerSensors=Node=29,32;SSUP0=29;SSUP1=32
    EnergyIPMIPowerSensors=Node=1280
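
To make the <key=values> syntax concrete, the sketch below splits a value into its keys and sensor IDs and sums per-sensor readings for each key. It is an illustration of the format described above, not the plugin's code; the function names and the readings dictionary (sensor ID to watts) are hypothetical.

```python
def parse_power_sensors(value):
    """Parse 'Node=16,19;Socket0=16' into {'Node': [16, 19], 'Socket0': [16]}."""
    sensors = {}
    for item in value.split(";"):
        key, sep, ids = item.partition("=")
        if not sep or not key or not ids:
            raise ValueError(f"malformed <key=values> entry: {item!r}")
        sensors[key] = [int(sensor_id) for sensor_id in ids.split(",")]
    if "Node" not in sensors:
        raise ValueError('the "Node" key is mandatory')
    return sensors


def power_per_key(sensors, readings):
    """Sum the readings (watts, keyed by sensor ID) of the sensors listed for each key."""
    return {key: sum(readings[i] for i in ids) for key, ids in sensors.items()}
```

With the second example value above, parse_power_sensors("Node=29,32;SSUP0=29;SSUP1=32") yields the keys Node, SSUP0 and SSUP1, and the Node power is the sum of sensors 29 and 32.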

The following acct_gather.conf parameters are defined to control the
IPMI config default values for libipmiconsole.

EnergyIPMIUsername=USERNAME
    Specify BMC Username.

EnergyIPMIPassword=PASSWORD
    Specify BMC Password.

The acct_gather_energy/xcc plugin uses only in-band communication with
the XClarity Controller (XCC), so only a reduced set of options is
supported:

EnergyIPMIFrequency=<number>
    This parameter is the number of seconds between XCC access samples.
    Default is 30 seconds.

EnergyIPMITimeout=<number>
    Timeout, in seconds, for initializing the IPMI XCC context for a new
    gathering thread. Default is 10 seconds.

Options used for acct_gather_profile/hdf5 are as follows:

ProfileHDF5Dir=<path>
    This parameter is the path to the shared folder into which the
    acct_gather_profile plugin will write detailed data (usually as an
    HDF5 file). The directory is assumed to be on a file system shared by
    the controller and all compute nodes. This is a required parameter.

ProfileHDF5Default
    A comma delimited list of data types to be collected for each job
    submission. Allowed values are:

    All         All data types are collected. (Cannot be combined with
                other values.)

    None        No data types are collected. This is the default.
                (Cannot be combined with other values.)

    Energy      Energy data is collected.

    Filesystem  File system (Lustre) data is collected.

    Network     Network (InfiniBand) data is collected.

    Task        Task (I/O, Memory, ...) data is collected.
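
The combination rule for these values can be checked with a few lines of code. The following is a minimal sketch, assuming exact-case matching of the value names as listed; the function name parse_profile_default is hypothetical and this is not how Slurm itself validates the option. The same list of values applies to ProfileInfluxDBDefault.

```python
ALLOWED_TYPES = {"All", "None", "Energy", "Filesystem", "Network", "Task"}
EXCLUSIVE_TYPES = {"All", "None"}  # cannot be combined with other values


def parse_profile_default(value):
    """Split a comma delimited data type list and enforce the rules above."""
    types = [item.strip() for item in value.split(",")]
    unknown = [t for t in types if t not in ALLOWED_TYPES]
    if unknown:
        raise ValueError(f"unknown data type(s): {', '.join(unknown)}")
    if len(types) > 1 and EXCLUSIVE_TYPES.intersection(types):
        raise ValueError('"All" and "None" cannot be combined with other values')
    return types
```

For example, "Energy,Task" is accepted, while "All,Task" is rejected because "All" must appear alone.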

The InfluxDB plugin provides the same information as the HDF5 plugin
but will instead send information to the configured InfluxDB server.

The InfluxDB plugin is designed against the 1.x protocol of InfluxDB.
Any site running a v2.x InfluxDB server will need to configure a v1.x
compatibility endpoint along with the correct user and password
authorization. Token authentication is not currently supported.

Options:

ProfileInfluxDBDatabase
    The InfluxDB v1.x database name, or the InfluxDB v2.x bucket name,
    where profiling information is to be written.

ProfileInfluxDBDefault
    A comma delimited list of data types to be collected for each job
    submission. Allowed values are:

    All         All data types are collected. Cannot be combined with
                other values.

    None        No data types are collected. This is the default.
                Cannot be combined with other values.

    Energy      Energy data is collected.

    Filesystem  File system (Lustre) data is collected.

    Network     Network (InfiniBand) data is collected.

    Task        Task (I/O, Memory, ...) data is collected.

ProfileInfluxDBHost=<hostname>:<port>
    The hostname of the machine where the influxd instance is executed
    and the port used by the HTTP API. The port used by the HTTP API is
    the one configured through the bind-address influxdb.conf option in
    the [http] section. Example:
    ProfileInfluxDBHost=myinfluxhost:8086

ProfileInfluxDBPass
    Password for the username configured in ProfileInfluxDBUser. Required
    in v2.x and optional in v1.x InfluxDB.

ProfileInfluxDBRTPolicy
    The InfluxDB v1.x retention policy name for the database configured
    in the ProfileInfluxDBDatabase option. The InfluxDB v2.x retention
    policy bucket name for the database configured in the
    ProfileInfluxDBDatabase option.

ProfileInfluxDBUser
    InfluxDB username that should be used to gain access to the database
    configured in ProfileInfluxDBDatabase. Required in v2.x and optional
    in v1.x InfluxDB. This is only needed if InfluxDB v1.x is configured
    with authentication enabled in the [http] config section and a user
    has been granted at least WRITE access to the database. See also
    ProfileInfluxDBPass.

NOTES:
This plugin requires the libcurl development files to be installed and
linkable at configure time. The plugin will not build otherwise.

Information on how to install and configure InfluxDB and manage
databases, retention policies and such is available on the official
webpage.

Collected information is written from every compute node where a job
runs to the influxd instance listening on the ProfileInfluxDBHost. In
order to avoid overloading the influxd instance with incoming
connection requests, the plugin uses an internal buffer which is filled
with samples. Once the buffer is full, an HTTP API write request is
performed and the buffer is emptied to hold subsequent samples. A final
request is also performed when a task ends even if the buffer isn't
full.

Failed HTTP API write requests are silently discarded. This means that
collected profile information in the plugin buffer is lost if it can't
be written to the influxd database for any reason.
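
The buffering behavior described in the two paragraphs above can be modeled in a few lines. This is a conceptual sketch only, not the plugin's implementation: the class name SampleBuffer, the capacity argument and the send callable (standing in for one HTTP API write request) are all hypothetical, and the real buffer size is internal to the plugin.

```python
class SampleBuffer:
    """Collect samples and hand them to `send` in batches."""

    def __init__(self, capacity, send):
        self.capacity = capacity  # hypothetical size chosen by the caller
        self.send = send          # stands in for one HTTP API write request
        self.samples = []

    def add(self, sample):
        """Store a sample and write the batch once the buffer is full."""
        self.samples.append(sample)
        if len(self.samples) >= self.capacity:
            self.flush()

    def flush(self):
        """Issue one write for the buffered samples, then start an empty buffer."""
        if not self.samples:
            return
        batch, self.samples = self.samples, []
        # The buffer is emptied whether or not the write succeeds, so a
        # failed request loses the batch instead of retrying it.
        self.send(batch)

    def task_end(self):
        """Final write when the task ends, even if the buffer isn't full."""
        self.flush()
```

Because a failed write is not retried in this model, any outage of the influxd instance shows up as a gap in the stored time series rather than as an error reported to the job.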

Plugin messages are logged along with the slurmstepd logs to
SlurmdLogFile. In order to troubleshoot any issues, it is recommended to
temporarily increase the slurmd debug level to debug3 and add Profile to
the debug flags. This can be accomplished by setting the slurm.conf
SlurmdDebug and DebugFlags options respectively, or dynamically through
scontrol setdebug and setdebugflags.

Grafana can be used to create charts based on the data held by
InfluxDB. This kind of tool permits one to create dashboards, tables and
other graphics using the stored time series.

Options used for acct_gather_interconnect/ofed are as follows:

InfinibandOFEDPort=<number>
    This parameter represents the port number of the local InfiniBand
    card to monitor. The default port is 1.

###
# Slurm acct_gather configuration file
###
# Parameters for acct_gather_energy/ipmi plugin
EnergyIPMIFrequency=10
EnergyIPMICalcAdjustment=yes
#
# Parameters for acct_gather_profile/hdf5 plugin
ProfileHDF5Dir=/app/slurm/profile_data
# Parameters for acct_gather_interconnect/ofed plugin
InfinibandOFEDPort=1

Copyright (C) 2012-2013 Bull. Produced at Bull (cf, DISCLAIMER).

This file is part of Slurm, a resource management program. For details,
see <https://slurm.schedmd.com/>.

Slurm is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free
Software Foundation; either version 2 of the License, or (at your
option) any later version.

Slurm is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
for more details.

slurm.conf(5)

April 2021 Slurm Configuration File acct_gather.conf(5)