acct_gather.conf(5)        Slurm Configuration File        acct_gather.conf(5)

acct_gather.conf - Slurm configuration file for the acct_gather plugins

acct_gather.conf is a UTF-8 formatted file which defines parameters used by
Slurm's acct_gather related plugins. The file is always located in the same
directory as slurm.conf.

Parameter names are case insensitive, but parameter values are case
sensitive. Any text following a "#" in the configuration file is treated as
a comment through the end of that line. Each line in the file is limited to
1024 characters.
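
As an illustration of the syntax rules above, the following fragment is a
minimal sketch; the values are arbitrary and chosen only for the example.
EnergyIPMIFrequency is written in lower case to show that parameter names
are matched without regard to case.

```conf
# A full-line comment.
energyipmifrequency=10          # read the same as EnergyIPMIFrequency=10
EnergyIPMICalcAdjustment=yes    # text after "#" is ignored to end of line
```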

Changes to the configuration file take effect upon restart of the Slurm
daemons.

The following acct_gather.conf parameters are defined to control the general
behavior of various plugins in Slurm.

The acct_gather.conf file differs from other Slurm .conf files: each plugin
defines which options are available. Each plugin to be loaded must be
specified in slurm.conf under the following configuration entries:

• AcctGatherEnergyType (plugin type=acct_gather_energy)
• AcctGatherInterconnectType (plugin type=acct_gather_interconnect)
• AcctGatherFilesystemType (plugin type=acct_gather_filesystem)
• AcctGatherProfileType (plugin type=acct_gather_profile)

If the respective plugin for an option is not loaded, that option is unknown
to Slurm and is silently ignored. If you change plugin types, you may also
have to change the related options.
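
For example, a slurm.conf that loads one plugin of each type might contain
entries like the sketch below. The ipmi, ofed and hdf5 plugin names appear
elsewhere in this page; acct_gather_filesystem/lustre is assumed here as the
file system plugin and should be checked against the plugins available in
your Slurm build.

```conf
# slurm.conf (fragment)
AcctGatherEnergyType=acct_gather_energy/ipmi
AcctGatherInterconnectType=acct_gather_interconnect/ofed
# Assumed plugin name; verify it exists in your installation.
AcctGatherFilesystemType=acct_gather_filesystem/lustre
AcctGatherProfileType=acct_gather_profile/hdf5
```

Options in acct_gather.conf that belong to a plugin not named in these
entries are silently ignored, as described above.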

Required entry in slurm.conf:
AcctGatherEnergyType=acct_gather_energy/ipmi

Options used for acct_gather_energy/ipmi are as follows:

EnergyIPMIFrequency=<number>
    This parameter is the number of seconds between BMC access samples.

EnergyIPMICalcAdjustment=<yes|no>
    If set to "yes", the consumption between the last BMC access sample and
    a step consumption update is approximated to get more accurate task
    consumption. The adjustment is made at the step start and each time the
    consumption is updated, including the step end. The approximations are
    not accumulated; only the first and last adjustments are used to
    calculate the consumption. The default is "no".

EnergyIPMIPowerSensors=<key=values>
    Optionally specify the ids of the sensors to use. Multiple <key=values>
    pairs can be set with ";" separators. The key "Node" is mandatory and is
    used to determine the consumed energy for nodes (scontrol show node) and
    jobs (sacct). Other keys are optional and are named by the
    administrator. These keys are useful only when profiling is activated
    for energy, to store the power (in watts) of each key. <values> are
    integers; multiple values can be set with "," separators. The sum of the
    listed sensors is used for each key. EnergyIPMIPowerSensors is optional;
    the default value is "Node=number", where "number" is the id of the
    first power sensor returned by ipmi-sensors.
    e.g.
    EnergyIPMIPowerSensors=Node=16,19,23,26;Socket0=16,23;Socket1=19,26;SSUP=23,26;KNC=16,19
    EnergyIPMIPowerSensors=Node=29,32;SSUP0=29;SSUP1=32
    EnergyIPMIPowerSensors=Node=1280

The following acct_gather.conf parameters are defined to control the IPMI
config default values for libipmiconsole.

EnergyIPMIUsername=USERNAME
    Specify BMC Username.

EnergyIPMIPassword=PASSWORD
    Specify BMC Password.
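
Putting the options above together, an acct_gather.conf fragment for the
ipmi plugin might look like the following sketch. The sensor ids are copied
from the examples above; the sampling interval and the credentials are
placeholders, not recommended values.

```conf
# acct_gather.conf (fragment) for acct_gather_energy/ipmi
# Sample the BMC every 10 seconds (illustrative value).
EnergyIPMIFrequency=10
EnergyIPMICalcAdjustment=yes
# Node is the sum of sensors 29 and 32; SSUP0 and SSUP1 use one sensor each.
EnergyIPMIPowerSensors=Node=29,32;SSUP0=29;SSUP1=32
# Placeholder credentials.
EnergyIPMIUsername=USERNAME
EnergyIPMIPassword=PASSWORD
```

With this setting, the value reported for the key Node is the sum of
sensors 29 and 32, while the administrator-named keys SSUP0 and SSUP1 are
only stored when energy profiling is active.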

Required entry in slurm.conf:
AcctGatherEnergyType=acct_gather_energy/xcc

The acct_gather_energy/xcc plugin uses only in-band communication with the
XClarity Controller (XCC), so a reduced set of options is supported:

EnergyIPMIFrequency=<number>
    This parameter is the number of seconds between XCC access samples.
    Default is 30 seconds.

EnergyIPMITimeout=<number>
    Timeout, in seconds, for initializing the IPMI XCC context for a new
    gathering thread. Default is 10 seconds.
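
A minimal sketch of an xcc setup, using only the two options listed above.
The values shown are the documented defaults, written out explicitly for
illustration.

```conf
# slurm.conf (fragment)
AcctGatherEnergyType=acct_gather_energy/xcc

# acct_gather.conf (fragment)
EnergyIPMIFrequency=30
EnergyIPMITimeout=10
```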

Required entry in slurm.conf:
AcctGatherProfileType=acct_gather_profile/hdf5

Options used for acct_gather_profile/hdf5 are as follows:

ProfileHDF5Dir=<path>
    This parameter is the path to the shared directory into which the
    acct_gather_profile plugin will write detailed data (usually as an HDF5
    file). The directory is assumed to be on a file system shared by the
    controller and all compute nodes. This is a required parameter.

ProfileHDF5Default
    A comma-delimited list of data types to be collected for each job
    submission. Allowed values are:

    All         All data types are collected. (Cannot be combined with
                other values.)

    None        No data types are collected. This is the default. (Cannot
                be combined with other values.)

    Energy      Energy data is collected.

    Filesystem  File system (Lustre) data is collected.

    Network     Network (InfiniBand) data is collected.

    Task        Task (I/O, Memory, ...) data is collected.
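
For instance, to collect energy and task data for every job by default, the
two options above could be combined as in this sketch. The directory is the
one used in the example configuration later in this page; any path on a
file system shared by the controller and the compute nodes would do.

```conf
# acct_gather.conf (fragment) for acct_gather_profile/hdf5
ProfileHDF5Dir=/app/slurm/profile_data
ProfileHDF5Default=Energy,Task
```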

Required entry in slurm.conf:
AcctGatherProfileType=acct_gather_profile/influxdb

The InfluxDB plugin provides the same information as the HDF5 plugin but
sends it to the configured InfluxDB server instead.

The InfluxDB plugin is designed against the 1.x protocol of InfluxDB. Any
site running a v2.x InfluxDB server will need to configure a v1.x
compatibility endpoint along with the correct user and password
authorization. Token authentication is not currently supported.

Options:
ProfileInfluxDBDatabase
    For InfluxDB v1.x, the name of the database where profiling information
    is to be written. For InfluxDB v2.x, the name of the bucket where
    profiling information is to be written.

ProfileInfluxDBDefault
    A comma-delimited list of data types to be collected for each job
    submission. Allowed values are:

    All         All data types are collected. Cannot be combined with other
                values.

    None        No data types are collected. This is the default. Cannot be
                combined with other values.

    Energy      Energy data is collected.

    Filesystem  File system (Lustre) data is collected.

    Network     Network (InfiniBand) data is collected.

    Task        Task (I/O, Memory, ...) data is collected.

ProfileInfluxDBHost=<hostname>:<port>
    The hostname of the machine where the InfluxDB instance runs and the
    port used by the HTTP API. The port used by the HTTP API is the one
    configured through the bind-address influxdb.conf option in the [http]
    section. Example:
    ProfileInfluxDBHost=myinfluxhost:8086

ProfileInfluxDBPass
    Password for the username configured in ProfileInfluxDBUser. Required
    in v2.x and optional in v1.x InfluxDB.

ProfileInfluxDBRTPolicy
    For InfluxDB v1.x, the retention policy name for the database
    configured in the ProfileInfluxDBDatabase option. For InfluxDB v2.x,
    the retention policy bucket name for the database configured in the
    ProfileInfluxDBDatabase option.

ProfileInfluxDBUser
    InfluxDB username that should be used to gain access to the database
    configured in ProfileInfluxDBDatabase. Required in v2.x and optional in
    v1.x InfluxDB. With v1.x, this is only needed if InfluxDB is configured
    with authentication enabled in the [http] config section and a user has
    been granted at least WRITE access to the database. See also
    ProfileInfluxDBPass.
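
The options above might be combined as in the following sketch. Only
myinfluxhost:8086 comes from this page; the database name, retention policy
name, user and password are hypothetical placeholders to be replaced with
site-specific values.

```conf
# acct_gather.conf (fragment) for acct_gather_profile/influxdb
ProfileInfluxDBHost=myinfluxhost:8086
# Hypothetical database (v1.x) or bucket (v2.x) name.
ProfileInfluxDBDatabase=slurm_profile
# Hypothetical retention policy name.
ProfileInfluxDBRTPolicy=autogen
ProfileInfluxDBDefault=Energy,Task
# Hypothetical credentials.
ProfileInfluxDBUser=slurm
ProfileInfluxDBPass=PASSWORD
```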

NOTES:
This plugin requires the libcurl development files to be installed and
linkable at configure time. The plugin will not build otherwise.

Information on how to install and configure InfluxDB and manage databases,
retention policies and such is available on the official InfluxDB web page.

Collected information is written from every compute node where a job runs
to the InfluxDB instance listening on the ProfileInfluxDBHost. In order to
avoid overloading the InfluxDB instance with incoming connection requests,
the plugin uses an internal buffer which is filled with samples. Once the
buffer is full, an HTTP API write request is performed and the buffer is
emptied to hold subsequent samples. A final request is also performed when
a task ends, even if the buffer isn't full.

Failed HTTP API write requests are silently discarded. This means that
collected profile information in the plugin buffer is lost if it can't be
written to the InfluxDB database for any reason.

Plugin messages are logged along with the slurmstepd logs to SlurmdLogFile.
In order to troubleshoot any issues, it is recommended to temporarily
increase the slurmd debug level to debug3 and add Profile to the debug
flags. This can be accomplished by setting SlurmdDebug and DebugFlags in
slurm.conf, respectively, or dynamically through scontrol setdebug and
setdebugflags.
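
As a sketch of the static variant described above, the slurm.conf entries
could look like this while troubleshooting. It assumes no other debug flags
are already set; if DebugFlags is already present, Profile would be added
to the existing comma-separated list instead.

```conf
# slurm.conf (fragment), temporary troubleshooting settings
SlurmdDebug=debug3
DebugFlags=Profile
```

Both settings are meant to be reverted once the issue has been diagnosed,
since debug3 produces verbose logs.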

Grafana can be used to create charts based on the data held by InfluxDB.
This kind of tool lets one create dashboards, tables and other graphics
using the stored time series.

Required entry in slurm.conf:
AcctGatherInterconnectType=acct_gather_interconnect/ofed

Options used for acct_gather_interconnect/ofed are as follows:

InfinibandOFEDPort=<number>
    This parameter represents the port number of the local InfiniBand card
    to be monitored. The default port is 1.

###
# Slurm acct_gather configuration file
###
# Parameters for acct_gather_energy/ipmi plugin
EnergyIPMIFrequency=10
EnergyIPMICalcAdjustment=yes
#
# Parameters for acct_gather_profile/hdf5 plugin
ProfileHDF5Dir=/app/slurm/profile_data
# Parameters for acct_gather_interconnect/ofed plugin
InfinibandOFEDPort=1

Copyright (C) 2012-2013 Bull. Copyright (C) 2012-2022 SchedMD LLC. Produced
at Bull (cf, DISCLAIMER).

This file is part of Slurm, a resource management program. For details, see
<https://slurm.schedmd.com/>.

Slurm is free software; you can redistribute it and/or modify it under the
terms of the GNU General Public License as published by the Free Software
Foundation; either version 2 of the License, or (at your option) any later
version.

Slurm is distributed in the hope that it will be useful, but WITHOUT ANY
WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
FOR A PARTICULAR PURPOSE. See the GNU General Public License for more
details.

slurm.conf(5)

January 2022               Slurm Configuration File        acct_gather.conf(5)