acct_gather.conf(5)        Slurm Configuration File        acct_gather.conf(5)


NAME

       acct_gather.conf - Slurm configuration file for the acct_gather plugins

DESCRIPTION

       acct_gather.conf is a UTF-8 formatted file which defines parameters used
       by Slurm's acct_gather related plugins. The file will always be located
       in the same directory as the slurm.conf.

       Parameter names are case insensitive but parameter values are case
       sensitive. Any text following a "#" in the configuration file is treated
       as a comment through the end of that line. The size of each line in the
       file is limited to 1024 characters.

       Changes to the configuration file take effect upon restart of the Slurm
       daemons.

       The following acct_gather.conf parameters are defined to control the
       general behavior of various plugins in Slurm.

       The acct_gather.conf file is different from other Slurm .conf files.
       Each plugin defines which options are available. Each plugin to be
       loaded must be specified in the slurm.conf under the following
       configuration entries:

       • AcctGatherEnergyType (plugin type=acct_gather_energy)
       • AcctGatherInterconnectType (plugin type=acct_gather_interconnect)
       • AcctGatherFilesystemType (plugin type=acct_gather_filesystem)
       • AcctGatherProfileType (plugin type=acct_gather_profile)

       If the respective plugin for an option is not loaded, that option is
       unknown to Slurm and is silently ignored. If you decide to change plugin
       types you may also have to change the related options.

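       As a rough illustration of the entries listed above, a slurm.conf
       might load one plugin of each type as sketched here. The particular
       combination is only an assumption for the example; each value is one
       of the plugin names documented in the sections that follow.

```conf
# slurm.conf (illustrative selection; pick the plugins your site actually uses)
AcctGatherEnergyType=acct_gather_energy/ipmi
AcctGatherInterconnectType=acct_gather_interconnect/ofed
AcctGatherFilesystemType=acct_gather_filesystem/lustre
AcctGatherProfileType=acct_gather_profile/hdf5
```

       With this selection, only options belonging to the ipmi, ofed and hdf5
       plugins would be recognized in acct_gather.conf (the lustre plugin
       reads none).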

acct_gather_energy/IPMI

       Required entry in slurm.conf:
              AcctGatherEnergyType=acct_gather_energy/ipmi

       Options used for acct_gather_energy/ipmi are as follows:

              EnergyIPMIFrequency=<number>
                        This parameter is the number of seconds between BMC
                        access samples.

              EnergyIPMICalcAdjustment=<yes|no>
                        If set to "yes", the consumption between the last BMC
                        access sample and a step consumption update is
                        approximated to get more accurate task consumption.
                        The adjustment is made at the step start and each time
                        the consumption is updated, including the step end.
                        The approximations are not accumulated; only the first
                        and last adjustments are used to calculate the
                        consumption. The default is "no".

              EnergyIPMIPowerSensors=<key=values>
                        Optionally specify the ids of the sensors to use.
                        Multiple <key=values> can be set with ";" separators.
                        The key "Node" is mandatory and is used to know the
                        consumed energy for nodes (scontrol show node) and
                        jobs (sacct). Other keys are optional and are named
                        by the administrator. These keys are useful only when
                        profiling is activated for energy, to store the power
                        (in watts) of each key. <values> are integers;
                        multiple values can be set with "," separators. The
                        sum of the listed sensors is used for each key.
                        EnergyIPMIPowerSensors is optional; the default value
                        is "Node=number" where "number" is the id of the first
                        power sensor returned by ipmi-sensors.
                        Examples:
                        EnergyIPMIPowerSensors=Node=16,19,23,26;Socket0=16,23;Socket1=19,26;SSUP=23,26;KNC=16,19
                        EnergyIPMIPowerSensors=Node=29,32;SSUP0=29;SSUP1=32
                        EnergyIPMIPowerSensors=Node=1280

              The following acct_gather.conf parameters are defined to control
              the IPMI config default values for libipmiconsole.

              EnergyIPMIUsername=USERNAME
                        Specify BMC Username.

              EnergyIPMIPassword=PASSWORD
                        Specify BMC Password.

       Datasets provided by the plugin are named <IPMI_SENSOR_LABEL>Power.

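       The options above might be combined as in the following sketch. The
       sensor ids are taken from one of the examples above; the username and
       password are placeholders invented for this illustration, not values
       from any real BMC.

```conf
# acct_gather.conf (sketch for acct_gather_energy/ipmi)
# Sample the BMC every 10 seconds and approximate consumption between samples.
EnergyIPMIFrequency=10
EnergyIPMICalcAdjustment=yes
# "Node" is the sum of sensors 29 and 32; SSUP0 and SSUP1 are
# administrator-chosen keys that only matter when energy profiling is active.
EnergyIPMIPowerSensors=Node=29,32;SSUP0=29;SSUP1=32
# Placeholder BMC credentials (hypothetical values).
EnergyIPMIUsername=bmcuser
EnergyIPMIPassword=bmcpass
```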

acct_gather_energy/rapl

       Required entry in slurm.conf:
              AcctGatherEnergyType=acct_gather_energy/rapl

       This plugin doesn't read any options from acct_gather.conf.

       Dataset provided by the plugin is: Power.

acct_gather_energy/XCC

       Required entry in slurm.conf:
              AcctGatherEnergyType=acct_gather_energy/xcc

       The acct_gather_energy/xcc plugin uses only in-band communication with
       the XClarity Controller, so only a reduced set of options is supported:

              EnergyIPMIFrequency=<number>
                        This parameter is the number of seconds between XCC
                        access samples. Default is 30 seconds.

              EnergyIPMITimeout=<number>
                        Timeout, in seconds, for initializing the IPMI XCC
                        context for a new gathering thread. Default is 10
                        seconds.

       Datasets provided by the plugin are: Energy, CurrPower.

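       A minimal sketch of the XCC options, spelling out the documented
       defaults explicitly. Setting them is not required; they are shown
       only to make the syntax concrete.

```conf
# acct_gather.conf (sketch for acct_gather_energy/xcc)
# Both values below are the documented defaults.
EnergyIPMIFrequency=30
EnergyIPMITimeout=10
```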

acct_gather_filesystem/lustre

       Required entry in slurm.conf:
              AcctGatherFilesystemType=acct_gather_filesystem/lustre

       This plugin doesn't read any options from acct_gather.conf.

       Datasets provided by the plugin are: Reads, ReadMB, Writes, WriteMB.

acct_gather_profile/HDF5

       Required entry in slurm.conf:
              AcctGatherProfileType=acct_gather_profile/hdf5

       Options used for acct_gather_profile/hdf5 are as follows:

              ProfileHDF5Dir=<path>
                     This parameter is the path to the shared folder into
                     which the acct_gather_profile plugin will write detailed
                     data (usually as an HDF5 file). The directory is assumed
                     to be on a file system shared by the controller and all
                     compute nodes. This is a required parameter.

              ProfileHDF5Default
                     A comma-delimited list of data types to be collected for
                     each job submission. Allowed values are:

                     All     All data types are collected. (Cannot be combined
                             with other values.)

                     None    No data types are collected. This is the default.
                             (Cannot be combined with other values.)

                     Energy  Energy data is collected.

                     Filesystem
                             File system (Lustre) data is collected.

                     Network Network (InfiniBand) data is collected.

                     Task    Task (I/O, Memory, ...) data is collected.

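       To make the comma-delimited syntax concrete, the sketch below collects
       energy and task data by default. The directory is the one used in the
       EXAMPLE section of this page; the choice of data types is only an
       assumption for illustration.

```conf
# acct_gather.conf (sketch for acct_gather_profile/hdf5)
# Must be on a file system shared by the controller and all compute nodes.
ProfileHDF5Dir=/app/slurm/profile_data
# Collect energy and task data for each job; "All" and "None" cannot be
# combined with other values.
ProfileHDF5Default=Energy,Task
```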

acct_gather_profile/InfluxDB

       Required entry in slurm.conf:
              AcctGatherProfileType=acct_gather_profile/influxdb

       The InfluxDB plugin provides the same information as the HDF5 plugin
       but will instead send information to the configured InfluxDB server.

       The InfluxDB plugin is designed against the 1.x protocol of InfluxDB.
       Any site running a v2.x InfluxDB server will need to configure a v1.x
       compatibility endpoint along with the correct user and password
       authorization. Token authentication is not currently supported.

   Options:
       ProfileInfluxDBDatabase
              Name of the InfluxDB v1.x database, or of the InfluxDB v2.x
              bucket, where profiling information is to be written.

       ProfileInfluxDBDefault
              A comma-delimited list of data types to be collected for each
              job submission. Allowed values are:

              All       All data types are collected. Cannot be combined with
                        other values.

              None      No data types are collected. This is the default.
                        Cannot be combined with other values.

              Energy    Energy data is collected.

              Filesystem
                        File system (Lustre) data is collected.

              Network   Network (InfiniBand) data is collected.

              Task      Task (I/O, Memory, ...) data is collected.

       ProfileInfluxDBHost=<hostname>:<port>
              The hostname of the machine where the InfluxDB instance is
              executed and the port used by the HTTP API. The port used by the
              HTTP API is the one configured through the bind-address
              influxdb.conf option in the [http] section. Example:
              ProfileInfluxDBHost=myinfluxhost:8086

       ProfileInfluxDBPass
              Password for the username configured in ProfileInfluxDBUser.
              Required in v2.x and optional in v1.x InfluxDB.

       ProfileInfluxDBRTPolicy
              The InfluxDB v1.x retention policy name for the database
              configured in the ProfileInfluxDBDatabase option. The InfluxDB
              v2.x retention policy bucket name for the database configured in
              the ProfileInfluxDBDatabase option.

       ProfileInfluxDBUser
              InfluxDB username that should be used to gain access to the
              database configured in ProfileInfluxDBDatabase. Required in v2.x
              and optional in v1.x InfluxDB. With InfluxDB v1.x, this is only
              needed if authentication is enabled in the [http] config section
              and a user has been granted at least WRITE access to the
              database. See also ProfileInfluxDBPass.

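       The options above could be put together roughly as follows. The host
       comes from the ProfileInfluxDBHost example; the database name,
       retention policy name, username and password are hypothetical values
       chosen for this sketch and must match whatever exists on the actual
       InfluxDB server.

```conf
# acct_gather.conf (sketch for acct_gather_profile/influxdb)
ProfileInfluxDBHost=myinfluxhost:8086
# Hypothetical database and retention policy names.
ProfileInfluxDBDatabase=slurm_profile
ProfileInfluxDBRTPolicy=autogen
# Collect energy and task data by default.
ProfileInfluxDBDefault=Energy,Task
# Hypothetical credentials; the user needs at least WRITE access.
ProfileInfluxDBUser=slurm
ProfileInfluxDBPass=changeme
```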
   NOTES:
       This plugin requires the libcurl development files to be installed and
       linkable at configure time. The plugin will not build otherwise.

       Information on how to install and configure InfluxDB and manage
       databases, retention policies and such is available on the official
       webpage.

       Collected information is written from every compute node where a job
       runs to the InfluxDB instance listening on the ProfileInfluxDBHost. In
       order to avoid overloading the InfluxDB instance with incoming
       connection requests, the plugin uses an internal buffer which is filled
       with samples. Once the buffer is full, an HTTP API write request is
       performed and the buffer is emptied to hold subsequent samples. A final
       request is also performed when a task ends even if the buffer isn't
       full.

       Failed HTTP API write requests are silently discarded. This means that
       collected profile information in the plugin buffer is lost if it can't
       be written to the InfluxDB database for any reason.

       Plugin messages are logged along with the slurmstepd logs to
       SlurmdLogFile. In order to troubleshoot any issues, it is recommended
       to temporarily increase the slurmd debug level to debug3 and add
       Profile to the debug flags. This can be accomplished by setting the
       slurm.conf SlurmdDebug and DebugFlags respectively or dynamically
       through scontrol setdebug and setdebugflags.

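       The slurm.conf route described above might look like the sketch below.
       It is meant as a temporary troubleshooting setting; any existing
       DebugFlags value at a site would need to be merged rather than
       replaced, which this fragment does not show.

```conf
# slurm.conf (temporary troubleshooting settings; revert when done)
SlurmdDebug=debug3
DebugFlags=Profile
```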
       Grafana can be used to create charts based on the data held by
       InfluxDB. This kind of tool permits one to create dashboards, tables
       and other graphics using the stored time series.

acct_gather_interconnect/OFED

       Required entry in slurm.conf:
              AcctGatherInterconnectType=acct_gather_interconnect/ofed

       Options used for acct_gather_interconnect/ofed are as follows:

              InfinibandOFEDPort=<number>
                        This parameter represents the port number of the local
                        InfiniBand card to monitor. The default port is 1.

       Datasets provided by the plugin: PacketsIn, PacketsOut, InMB, OutMB.

acct_gather_interconnect/sysfs

       Required entry in slurm.conf:
              AcctGatherInterconnectType=acct_gather_interconnect/sysfs

       Options used for acct_gather_interconnect/sysfs are as follows:

              SysfsInterfaces=<interfaces>
                        Comma-separated list of interface names to collect
                        statistics from. Usage from all listed interfaces is
                        summed together and is not broken down per interface.

       Datasets provided by the plugin: PacketsIn, PacketsOut, InMB, OutMB.

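       A short sketch of the sysfs plugin configuration follows. The
       interface names eth0 and eth1 are hypothetical; substitute the names
       present on the compute nodes.

```conf
# slurm.conf
AcctGatherInterconnectType=acct_gather_interconnect/sysfs

# acct_gather.conf
# Traffic on both interfaces is reported as a single summed total.
SysfsInterfaces=eth0,eth1
```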

EXAMPLE

       ###
       # Slurm acct_gather configuration file
       ###
       # Parameters for acct_gather_energy/ipmi plugin
       EnergyIPMIFrequency=10
       EnergyIPMICalcAdjustment=yes
       #
       # Parameters for acct_gather_profile/hdf5 plugin
       ProfileHDF5Dir=/app/slurm/profile_data
       # Parameters for acct_gather_interconnect/ofed plugin
       InfinibandOFEDPort=1

COPYING

       Copyright (C) 2012-2013 Bull. Copyright (C) 2012-2022 SchedMD LLC.
       Produced at Bull (cf, DISCLAIMER).

       This file is part of Slurm, a resource management program. For
       details, see <https://slurm.schedmd.com/>.

       Slurm is free software; you can redistribute it and/or modify it under
       the terms of the GNU General Public License as published by the Free
       Software Foundation; either version 2 of the License, or (at your
       option) any later version.

       Slurm is distributed in the hope that it will be useful, but WITHOUT
       ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
       FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
       for more details.

SEE ALSO

       slurm.conf(5)

April 2022                 Slurm Configuration File        acct_gather.conf(5)