acct_gather.conf(5)        Slurm Configuration File        acct_gather.conf(5)

NAME

       acct_gather.conf - Slurm configuration file for the acct_gather plugins

DESCRIPTION

       acct_gather.conf is a UTF8 formatted file which defines parameters used
       by Slurm's acct_gather related plugins. The file location can be
       modified at system build time using the DEFAULT_SLURM_CONF parameter or
       at execution time by setting the SLURM_CONF environment variable. The
       file will always be located in the same directory as the slurm.conf
       file.

       Parameter names are case insensitive but parameter values are case
       sensitive. Any text following a "#" in the configuration file is
       treated as a comment through the end of that line. The size of each
       line in the file is limited to 1024 characters.

       Changes to the configuration file take effect upon restart of the Slurm
       daemons.

       The following acct_gather.conf parameters are defined to control the
       general behavior of various plugins in Slurm.

       The acct_gather.conf file differs from other Slurm .conf files: each
       plugin defines which options are available. Each plugin to be loaded
       must be specified in slurm.conf under the following configuration
       entries:

       • AcctGatherEnergyType (plugin type=acct_gather_energy)
       • AcctGatherInterconnectType (plugin type=acct_gather_interconnect)
       • AcctGatherFilesystemType (plugin type=acct_gather_filesystem)
       • AcctGatherProfileType (plugin type=acct_gather_profile)

       If the respective plugin for an option is not loaded, that option will
       appear unknown to Slurm and will be silently ignored. If you change
       plugin types, you may also have to change the related options.

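       As an illustration, a slurm.conf fragment that loads one plugin for
       three of the four entries above might look like the sketch below. The
       particular combination is an assumption chosen for this example (it
       matches the plugins configured in the EXAMPLE section); no filesystem
       plugin is shown because none is described on this page.

              # slurm.conf (illustrative sketch)
              AcctGatherEnergyType=acct_gather_energy/ipmi
              AcctGatherInterconnectType=acct_gather_interconnect/ofed
              AcctGatherProfileType=acct_gather_profile/hdf5

       With these entries, only the options of the ipmi, ofed and hdf5 plugins
       would be recognized in acct_gather.conf.
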

acct_gather_energy/IPMI

       Required entry in slurm.conf:
              AcctGatherEnergyType=acct_gather_energy/ipmi

       Options used for acct_gather_energy/ipmi are as follows:

              EnergyIPMIFrequency=<number>
                        This parameter is the number of seconds between BMC
                        access samples.

              EnergyIPMICalcAdjustment=<yes|no>
                        If set to "yes", the consumption between the last BMC
                        access sample and a step consumption update is
                        approximated to get more accurate task consumption.
                        The adjustment is made at the step start and each time
                        the consumption is updated, including the step end.
                        The approximations are not accumulated; only the first
                        and last adjustments are used to calculate the
                        consumption. The default is "no".

              EnergyIPMIPowerSensors=<key=values>
                        Optionally specify the IDs of the sensors to use.
                        Multiple <key=values> pairs can be set with ";"
                        separators. The key "Node" is mandatory and is used to
                        report the consumed energy for nodes (scontrol show
                        node) and jobs (sacct). Other keys are optional and
                        are named by the administrator. These keys are useful
                        only when profiling is activated for energy, in which
                        case the power (in watts) of each key is stored.
                        <values> are integers; multiple values can be set with
                        "," separators. The sum of the listed sensors is used
                        for each key. EnergyIPMIPowerSensors is optional; the
                        default value is "Node=number", where "number" is the
                        ID of the first power sensor returned by ipmi-sensors.
                        For example:
                        EnergyIPMIPowerSensors=Node=16,19,23,26;Socket0=16,23;Socket1=19,26;SSUP=23,26;KNC=16,19
                        EnergyIPMIPowerSensors=Node=29,32;SSUP0=29;SSUP1=32
                        EnergyIPMIPowerSensors=Node=1280

              The following acct_gather.conf parameters are defined to control
              the IPMI config default values for libipmiconsole.

              EnergyIPMIUsername=USERNAME
                        Specify BMC Username.

              EnergyIPMIPassword=PASSWORD
                        Specify BMC Password.

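       A minimal acct_gather.conf sketch combining the options above could
       look as follows. The sampling interval, the sensor IDs and the
       credentials are illustrative assumptions, not recommended values;
       "bmcuser" and "bmcpass" are placeholders.

              # acct_gather.conf (illustrative values only)
              EnergyIPMIFrequency=10
              EnergyIPMICalcAdjustment=yes
              # "Node" is mandatory; SSUP0 and SSUP1 are administrator-chosen keys
              EnergyIPMIPowerSensors=Node=29,32;SSUP0=29;SSUP1=32
              # Placeholder credentials
              EnergyIPMIUsername=bmcuser
              EnergyIPMIPassword=bmcpass

       With this setting, the "Node" value is the sum of sensors 29 and 32,
       while SSUP0 and SSUP1 each track a single sensor and are stored only
       when energy profiling is active.
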

acct_gather_energy/XCC

       Required entry in slurm.conf:
              AcctGatherEnergyType=acct_gather_energy/xcc

       The acct_gather_energy/xcc plugin uses only in-band communication with
       the XClarity Controller, thus a reduced set of options is supported:

              EnergyIPMIFrequency=<number>
                        This parameter is the number of seconds between XCC
                        access samples. Default is 30 seconds.

              EnergyIPMITimeout=<number>
                        Timeout, in seconds, for initializing the IPMI XCC
                        context for a new gathering thread. Default is 10
                        seconds.

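       As a sketch, the two files could be set up as follows for the XCC
       plugin. The values shown simply restate the documented defaults
       explicitly; a site would change them only if the defaults do not fit.

              # slurm.conf
              AcctGatherEnergyType=acct_gather_energy/xcc

              # acct_gather.conf (values equal to the documented defaults)
              EnergyIPMIFrequency=30
              EnergyIPMITimeout=10
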

acct_gather_profile/HDF5

       Required entry in slurm.conf:
              AcctGatherProfileType=acct_gather_profile/hdf5

       Options used for acct_gather_profile/hdf5 are as follows:

              ProfileHDF5Dir=<path>
                     This parameter is the path to the shared folder into
                     which the acct_gather_profile plugin will write detailed
                     data (usually as an HDF5 file). The directory is assumed
                     to be on a file system shared by the controller and all
                     compute nodes. This is a required parameter.

              ProfileHDF5Default
                     A comma-delimited list of data types to be collected for
                     each job submission. Allowed values are:

                     All     All data types are collected. (Cannot be combined
                             with other values.)

                     None    No data types are collected. This is the default.
                             (Cannot be combined with other values.)

                     Energy  Energy data is collected.

                     Filesystem
                             File system (Lustre) data is collected.

                     Network Network (InfiniBand) data is collected.

                     Task    Task (I/O, Memory, ...) data is collected.

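       For illustration, an acct_gather.conf fragment that collects energy and
       task data for every job by default might read as below. The directory
       is the one used in the EXAMPLE section and is assumed to be on a file
       system shared by the controller and all compute nodes; the choice of
       data types is an assumption made for this sketch.

              # acct_gather.conf (illustrative sketch)
              ProfileHDF5Dir=/app/slurm/profile_data
              ProfileHDF5Default=Energy,Task

       A value such as "All,Task" would not be valid, since All and None
       cannot be combined with other values.
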

acct_gather_profile/InfluxDB

       Required entry in slurm.conf:
              AcctGatherProfileType=acct_gather_profile/influxdb

       The InfluxDB plugin provides the same information as the HDF5 plugin
       but instead sends it to the configured InfluxDB server.

       The InfluxDB plugin is designed against the 1.x protocol of InfluxDB.
       Any site running a v2.x InfluxDB server will need to configure a v1.x
       compatibility endpoint along with the correct user and password
       authorization. Token authentication is not currently supported.

   Options:
       ProfileInfluxDBDatabase
              For InfluxDB v1.x, the name of the database where profiling
              information is to be written. For InfluxDB v2.x, the name of the
              bucket where profiling information is to be written.

       ProfileInfluxDBDefault
              A comma-delimited list of data types to be collected for each
              job submission. Allowed values are:

              All       All data types are collected. Cannot be combined with
                        other values.

              None      No data types are collected. This is the default.
                        Cannot be combined with other values.

              Energy    Energy data is collected.

              Filesystem
                        File system (Lustre) data is collected.

              Network   Network (InfiniBand) data is collected.

              Task      Task (I/O, Memory, ...) data is collected.

       ProfileInfluxDBHost=<hostname>:<port>
              The hostname of the machine where the InfluxDB instance is
              running and the port used by the HTTP API. The port used by the
              HTTP API is the one configured through the bind-address
              influxdb.conf option in the [http] section. Example:
              ProfileInfluxDBHost=myinfluxhost:8086

       ProfileInfluxDBPass
              Password for the username configured in ProfileInfluxDBUser.
              Required in v2.x and optional in v1.x InfluxDB.

       ProfileInfluxDBRTPolicy
              For InfluxDB v1.x, the name of the retention policy for the
              database configured in the ProfileInfluxDBDatabase option. For
              InfluxDB v2.x, the retention policy bucket name for the database
              configured in the ProfileInfluxDBDatabase option.

       ProfileInfluxDBUser
              InfluxDB username that should be used to gain access to the
              database configured in ProfileInfluxDBDatabase. Required in v2.x
              and optional in v1.x InfluxDB. With v1.x, this is only needed if
              InfluxDB is configured with authentication enabled in the [http]
              config section and a user has been granted at least WRITE access
              to the database. See also ProfileInfluxDBPass.

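       Putting the options together, an acct_gather.conf fragment for an
       InfluxDB v1.x server with authentication enabled might look like the
       sketch below. The host is the one from the ProfileInfluxDBHost example;
       the database name, retention policy name, user and password are
       hypothetical placeholders that a site would replace with its own
       values ("autogen" is assumed here because InfluxDB 1.x creates a
       retention policy of that name by default).

              # acct_gather.conf (illustrative sketch, placeholder values)
              ProfileInfluxDBHost=myinfluxhost:8086
              ProfileInfluxDBDatabase=slurm_profile
              ProfileInfluxDBRTPolicy=autogen
              ProfileInfluxDBDefault=Task
              ProfileInfluxDBUser=slurmwriter
              ProfileInfluxDBPass=changeme

       The user named here would need at least WRITE access to the database,
       as described under ProfileInfluxDBUser.
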
   NOTES:
       This plugin requires the libcurl development files to be installed and
       linkable at configure time. The plugin will not build otherwise.

       Information on how to install and configure InfluxDB and manage
       databases, retention policies and such is available on the official
       webpage.

       Collected information is written from every compute node where a job
       runs to the InfluxDB instance listening on the ProfileInfluxDBHost. In
       order to avoid overloading the InfluxDB instance with incoming
       connection requests, the plugin uses an internal buffer which is filled
       with samples. Once the buffer is full, an HTTP API write request is
       performed and the buffer is emptied to hold subsequent samples. A final
       request is also performed when a task ends even if the buffer isn't
       full.

       Failed HTTP API write requests are silently discarded. This means that
       collected profile information in the plugin buffer is lost if it can't
       be written to the InfluxDB database for any reason.

       Plugin messages are logged along with the slurmstepd logs to
       SlurmdLogFile. In order to troubleshoot any issues, it is recommended
       to temporarily increase the slurmd debug level to debug3 and add
       Profile to the debug flags. This can be accomplished by setting the
       slurm.conf SlurmdDebug and DebugFlags parameters respectively, or
       dynamically through scontrol setdebug and setdebugflags.

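       The slurm.conf route described above can be sketched as follows. This
       is a temporary troubleshooting setting, assumed to be reverted once the
       issue is understood; only the static configuration is shown, not the
       scontrol equivalent.

              # slurm.conf (temporary troubleshooting sketch)
              SlurmdDebug=debug3
              DebugFlags=Profile
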
       Grafana can be used to create charts based on the data held by
       InfluxDB. This kind of tool permits one to create dashboards, tables
       and other graphics using the stored time series.


acct_gather_interconnect/OFED

       Required entry in slurm.conf:
              AcctGatherInterconnectType=acct_gather_interconnect/ofed

       Options used for acct_gather_interconnect/ofed are as follows:

              InfinibandOFEDPort=<number>
                        This parameter represents the port number of the local
                        InfiniBand card to be monitored. The default port is
                        1.

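       As a sketch, a node whose InfiniBand traffic of interest goes through
       the second port of the card could be configured as follows. The
       dual-port card and the choice of port 2 are assumptions made purely for
       illustration.

              # slurm.conf
              AcctGatherInterconnectType=acct_gather_interconnect/ofed

              # acct_gather.conf (hypothetical dual-port card, monitor port 2)
              InfinibandOFEDPort=2
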

EXAMPLE

       ###
       # Slurm acct_gather configuration file
       ###
       # Parameters for acct_gather_energy/ipmi plugin
       EnergyIPMIFrequency=10
       EnergyIPMICalcAdjustment=yes
       #
       # Parameters for acct_gather_profile/hdf5 plugin
       ProfileHDF5Dir=/app/slurm/profile_data
       # Parameters for acct_gather_interconnect/ofed plugin
       InfinibandOFEDPort=1

COPYING

       Copyright (C) 2012-2013 Bull. Copyright (C) 2012-2021 SchedMD LLC.
       Produced at Bull (cf, DISCLAIMER).

       This file is part of Slurm, a resource management program. For
       details, see <https://slurm.schedmd.com/>.

       Slurm is free software; you can redistribute it and/or modify it under
       the terms of the GNU General Public License as published by the Free
       Software Foundation; either version 2 of the License, or (at your
       option) any later version.

       Slurm is distributed in the hope that it will be useful, but WITHOUT
       ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
       FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
       for more details.

SEE ALSO

       slurm.conf(5)

June 2021                  Slurm Configuration File        acct_gather.conf(5)