acct_gather.conf(5) Slurm Configuration File acct_gather.conf(5)

NAME
acct_gather.conf - Slurm configuration file for the acct_gather plugins

DESCRIPTION
acct_gather.conf is an ASCII file which defines parameters used by
Slurm's acct_gather related plugins. The file location can be modified
at system build time using the DEFAULT_SLURM_CONF parameter or at
execution time by setting the SLURM_CONF environment variable. In
either case, the file will always be located in the same directory as
the slurm.conf file.

Parameter names are case insensitive. Any text following a "#" in the
configuration file is treated as a comment through the end of that
line. The size of each line in the file is limited to 1024 characters.
Changes to the configuration file take effect upon restart of Slurm
daemons, daemon receipt of the SIGHUP signal, or execution of the
command "scontrol reconfigure" unless otherwise noted.

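As a brief illustration of the syntax rules above, the following
hypothetical fragment shows a full-line comment, a trailing comment,
and a parameter name written in lower case. The value is illustrative
only, and the fragment assumes the ipmi energy plugin is loaded so
that the option is recognized:

    # A full-line comment
    energyipmifrequency=30    # same parameter as EnergyIPMIFrequency
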
The following acct_gather.conf parameters are defined to control the
general behavior of various plugins in Slurm.

The acct_gather.conf file is different from other Slurm .conf files.
Each plugin defines which options are available. If the plugin for a
given option is not loaded, Slurm will treat that option as unknown,
which could prevent Slurm from loading. If you change plugin types,
you might also have to change the related options.

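For example, a site using the IPMI energy plugin would pair the plugin
selection in slurm.conf with the matching options in this file. This
is a minimal sketch: the frequency value is illustrative, and
acct_gather_energy/ipmi is assumed to be the full plugin name given in
slurm.conf:

    # slurm.conf
    AcctGatherEnergyType=acct_gather_energy/ipmi

    # acct_gather.conf
    EnergyIPMIFrequency=30

With this pairing, adding an option such as ProfileHDF5Dir without
also loading the hdf5 profile plugin would be expected to trigger the
unknown-option behavior described above.
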
EnergyIPMI
Options used for AcctGatherEnergyType/ipmi are as follows:

EnergyIPMIFrequency=<number>
This parameter is the number of seconds between BMC access samples.

EnergyIPMICalcAdjustment=<yes|no>
If set to "yes", the consumption between the last BMC access sample
and a step consumption update is approximated to get more accurate
task consumption. The adjustment is made at the step start and each
time the consumption is updated, including the step end. The
approximations are not accumulated; only the first and last
adjustments are used to calculate the consumption. The default is
"no".

EnergyIPMIPowerSensors=<key=values>
Optionally specify the ids of the sensors to use. Multiple
<key=values> can be set with ";" separators. The key "Node" is
mandatory and is used to determine the consumed energy for nodes
(scontrol show node) and jobs (sacct). Other keys are optional and
are named by the administrator. These keys are useful only when
energy profiling is enabled, in which case the power (in watts) of
each key is stored. <values> are integers; multiple values can be
set with "," separators. The sum of the listed sensors is used for
each key. EnergyIPMIPowerSensors is optional; the default value is
"Node=number" where "number" is the id of the first power sensor
returned by ipmi-sensors.
Examples:
EnergyIPMIPowerSensors=Node=16,19,23,26;Socket0=16,23;Socket1=19,26;SSUP=23,26;KNC=16,19
EnergyIPMIPowerSensors=Node=29,32;SSUP0=29;SSUP1=32
EnergyIPMIPowerSensors=Node=1280

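To make the key semantics concrete, the second example above can be
read as follows. The sensor ids are those from the example; the
comments are an interpretation based on the description above, not
output from a real system:

    # Node  = sensor 29 + sensor 32  (used by scontrol show node and sacct)
    # SSUP0 = sensor 29              (stored only when energy profiling is enabled)
    # SSUP1 = sensor 32              (stored only when energy profiling is enabled)
    EnergyIPMIPowerSensors=Node=29,32;SSUP0=29;SSUP1=32
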
The following acct_gather.conf parameters are defined to control
the IPMI config default values for libipmiconsole.

EnergyIPMIUsername=USERNAME
Specify BMC Username.

EnergyIPMIPassword=PASSWORD
Specify BMC Password.

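A minimal sketch of the credential settings follows. Both values are
hypothetical placeholders; substitute the credentials configured on
your BMC:

    EnergyIPMIUsername=bmc_user
    EnergyIPMIPassword=bmc_secret
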
ProfileHDF5
Options used for AcctGatherProfileType/hdf5 are as follows:

ProfileHDF5Dir=<path>
This parameter is the path to the shared folder into which the
acct_gather_profile plugin will write detailed data (usually as an
HDF5 file). The directory is assumed to be on a file system shared by
the controller and all compute nodes. This is a required parameter.

ProfileHDF5Default
A comma delimited list of data types to be collected for each job
submission. Allowed values are:

All         All data types are collected. (Cannot be combined with
            other values.)

None        No data types are collected. This is the default. (Cannot
            be combined with other values.)

Energy      Energy data is collected.

Filesystem  File system (Lustre) data is collected.

Network     Network (InfiniBand) data is collected.

Task        Task (I/O, Memory, ...) data is collected.

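For instance, to collect energy and task data by default for every
job, the two ProfileHDF5 parameters could be combined as sketched
below. The directory path is the one used in this page's example
configuration; the choice of data types is illustrative:

    ProfileHDF5Dir=/app/slurm/profile_data
    ProfileHDF5Default=Energy,Task
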
ProfileInfluxDB
Options used for AcctGatherProfileType/influxdb are as follows:

ProfileInfluxDBDatabase
InfluxDB database name where profiling information is to be written.

ProfileInfluxDBDefault
A comma delimited list of data types to be collected for each job
submission. Allowed values are:

All         All data types are collected. (Cannot be combined with
            other values.)

None        No data types are collected. This is the default. (Cannot
            be combined with other values.)

Energy      Energy data is collected.

Filesystem  File system (Lustre) data is collected.

Network     Network (InfiniBand) data is collected.

Task        Task (I/O, Memory, ...) data is collected.

ProfileInfluxDBHost=<hostname>:<port>
The hostname of the machine where the influxd instance is executed
and the port used by the HTTP API. The port used by the HTTP API is
the one configured through the bind-address influxdb.conf option in
the [http] section. Example:

ProfileInfluxDBHost=myinfluxhost:8086

ProfileInfluxDBPass
Optional password for the username configured in ProfileInfluxDBUser.

ProfileInfluxDBRTPolicy
The InfluxDB retention policy name for the database configured in the
ProfileInfluxDBDatabase option.

ProfileInfluxDBUser
Optional InfluxDB username that should be used to gain access to the
database configured in ProfileInfluxDBDatabase. This is only needed
if InfluxDB is configured with authentication enabled in the [http]
config section and a user has been granted at least WRITE access to
the database. See also ProfileInfluxDBPass.

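Putting the ProfileInfluxDB options together, a configuration might
look like the sketch below. All values are placeholders:
myinfluxhost:8086 is taken from the example above, while
slurm_profile, autogen, slurm_writer and changeme are hypothetical
names chosen for illustration:

    ProfileInfluxDBHost=myinfluxhost:8086
    ProfileInfluxDBDatabase=slurm_profile
    ProfileInfluxDBRTPolicy=autogen
    ProfileInfluxDBDefault=Task
    ProfileInfluxDBUser=slurm_writer
    ProfileInfluxDBPass=changeme

The last two lines would only be needed if authentication is enabled
on the InfluxDB side.
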
NOTE: This plugin requires the libcurl development files to be
installed.

NOTE: Information on how to install and configure InfluxDB and manage
databases, retention policies and such is available on the official
webpage.

NOTE: Collected information is written from every compute node where a
job runs to the influxd instance listening on the ProfileInfluxDBHost.
In order to avoid overloading the influxd instance with incoming
connection requests, the plugin uses an internal buffer which is
filled with samples. Once the buffer is full, an HTTP API write
request is performed and the buffer is emptied to hold subsequent
samples. A final request is also performed when a task ends, even if
the buffer isn't full.

NOTE: Failed HTTP API write requests are discarded. This means that
collected profile information in the plugin buffer is lost if it
can't be written to the influxd database for any reason.

NOTE: Plugin messages are logged along with the slurmstepd logs to
SlurmdLogFile. In order to troubleshoot any issues, it is recommended
to temporarily increase the slurmd debug level to debug3 and add
Profile to the debug flags. This can be accomplished by setting the
slurm.conf SlurmdDebug and DebugFlags parameters respectively, or
dynamically through scontrol setdebug and setdebugflags.

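As a sketch of the static approach, the following slurm.conf lines
raise the slurmd debug level and enable the Profile debug flag. If
DebugFlags already lists other flags at your site, Profile would be
appended to that comma separated list instead. Revert the change once
troubleshooting is complete:

    # slurm.conf
    SlurmdDebug=debug3
    DebugFlags=Profile
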
NOTE: Consider using a monitoring and analytics tool such as Grafana
on top of InfluxDB. Such tools permit one to create dashboards,
tables, and other graphics using the stored time series. This makes
it easier to correlate resource usage peaks reported by other node
monitoring tools such as Ganglia with specific job step tasks.

InfinibandOFED
Options used for AcctGatherInfinibandType/ofed are as follows:

InfinibandOFEDPort=<number>
This parameter represents the port number of the local InfiniBand
card to be monitored. The default port is 1.

EXAMPLE
###
# Slurm acct_gather configuration file
###
# Parameters for AcctGatherEnergyType/ipmi plugin
EnergyIPMIFrequency=10
EnergyIPMICalcAdjustment=yes
#
# Parameters for AcctGatherProfileType/hdf5 plugin
ProfileHDF5Dir=/app/slurm/profile_data
# Parameters for AcctGatherInfinibandType/ofed plugin
InfinibandOFEDPort=1

COPYING
Copyright (C) 2012-2013 Bull. Produced at Bull (cf, DISCLAIMER).

This file is part of Slurm, a resource management program. For
details, see <https://slurm.schedmd.com/>.

Slurm is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free
Software Foundation; either version 2 of the License, or (at your
option) any later version.

Slurm is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
for more details.

SEE ALSO
slurm.conf(5)

April 2015 Slurm Configuration File acct_gather.conf(5)