acct_gather.conf(5)        Slurm Configuration File        acct_gather.conf(5)

NAME

       acct_gather.conf - Slurm configuration file for the acct_gather plugins

DESCRIPTION

       acct_gather.conf is a UTF-8 formatted file which defines parameters used
       by Slurm's acct_gather related plugins.  The file is always located in
       the same directory as slurm.conf.

       Parameter names are case insensitive but parameter values are case
       sensitive.  Any text following a "#" in the configuration file is
       treated as a comment through the end of that line.  The size of each
       line in the file is limited to 1024 characters.
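
       The parsing rules above can be illustrated with a minimal Python
       sketch.  It is not Slurm's parser: the function name is hypothetical,
       the 1024 limit is assumed to count characters, and no quoting or
       escaping of "#" is handled.

```python
MAX_LINE_LENGTH = 1024  # per-line limit stated in the description


def parse_acct_gather_conf(text):
    """Parse NAME=VALUE lines into a dict keyed by lower-cased parameter name."""
    options = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        if len(raw) > MAX_LINE_LENGTH:
            raise ValueError(f"line {lineno} exceeds {MAX_LINE_LENGTH} characters")
        # Everything after "#" is a comment through the end of the line.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        # Split on the first "=" only, so values may themselves contain "=".
        name, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"line {lineno}: expected NAME=VALUE")
        # Names are case insensitive, so normalise them; values are kept as written.
        options[name.strip().lower()] = value.strip()
    return options


sample = """\
# energy plugin settings
energyipmifrequency=10   # same as EnergyIPMIFrequency
ProfileHDF5Dir=/app/slurm/profile_data
"""
conf = parse_acct_gather_conf(sample)
print(conf)
```

       Lower-casing only the name keeps a path such as the ProfileHDF5Dir
       value byte-for-byte intact, which matters because values are case
       sensitive.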

       Changes to the configuration file take effect upon restart of the Slurm
       daemons.

       The following acct_gather.conf parameters are defined to control the
       general behavior of various plugins in Slurm.

       The acct_gather.conf file is different from other Slurm .conf files.
       Each plugin defines which options are available.  Each plugin to be
       loaded must be specified in slurm.conf under the following
       configuration entries:

       • AcctGatherEnergyType (plugin type=acct_gather_energy)
       • AcctGatherInterconnectType (plugin type=acct_gather_interconnect)
       • AcctGatherFilesystemType (plugin type=acct_gather_filesystem)
       • AcctGatherProfileType (plugin type=acct_gather_profile)

       If the respective plugin for an option is not loaded, then that option
       will be unknown to Slurm and silently ignored.  If you decide to change
       plugin types, you may also have to change the related options.

acct_gather_energy/IPMI

       Required entry in slurm.conf:
              AcctGatherEnergyType=acct_gather_energy/ipmi

       Options used for acct_gather_energy/ipmi are as follows:

              EnergyIPMIFrequency=<number>
                        This parameter is the number of seconds between BMC
                        access samples.

              EnergyIPMICalcAdjustment=<yes|no>
                        If set to "yes", the consumption between the last BMC
                        access sample and a step consumption update is
                        approximated to get more accurate task consumption.
                        The adjustment is made at the step start and each time
                        the consumption is updated, including the step end.
                        The approximations are not accumulated; only the first
                        and last adjustments are used to calculate the
                        consumption.  The default is "no".

              EnergyIPMIPowerSensors=<key=values>
                        Optionally specify the ids of the sensors to use.
                        Multiple <key=values> can be set with ";" separators.
                        The key "Node" is mandatory and is used to determine
                        the consumed energy for nodes (scontrol show node) and
                        jobs (sacct).  Other keys are optional and are named
                        by the administrator.  These keys are useful only when
                        profiling is activated for energy, to store the power
                        (in watts) of each key.  <values> are integers;
                        multiple values can be set with "," separators.  The
                        sum of the listed sensors is used for each key.
                        EnergyIPMIPowerSensors is optional; the default value
                        is "Node=number" where "number" is the id of the first
                        power sensor returned by ipmi-sensors.
                        Examples:
                        EnergyIPMIPowerSensors=Node=16,19,23,26;Socket0=16,23;Socket1=19,26;SSUP=23,26;KNC=16,19
                        EnergyIPMIPowerSensors=Node=29,32;SSUP0=29;SSUP1=32
                        EnergyIPMIPowerSensors=Node=1280
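
       The EnergyIPMIPowerSensors syntax can be read as in the following
       Python sketch.  The function names and the sample readings are
       hypothetical and only illustrate the ";" and "," separators, the
       mandatory "Node" key, and the per-key sum; this is not the plugin's
       own code.

```python
def parse_power_sensors(value):
    """Split 'Node=16,19;Socket0=16' into {'Node': [16, 19], 'Socket0': [16]}."""
    sensors = {}
    for entry in value.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        key, sep, ids = entry.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=values, got {entry!r}")
        # Sensor ids are integers separated by ","; int() rejects anything else.
        sensors[key] = [int(sensor_id) for sensor_id in ids.split(",")]
    if "Node" not in sensors:
        raise ValueError('the "Node" key is mandatory')
    return sensors


def power_per_key(sensors, readings):
    """Sum the readings (watts, keyed by sensor id) of the sensors listed for each key."""
    return {key: sum(readings[i] for i in ids) for key, ids in sensors.items()}


sensors = parse_power_sensors("Node=29,32;SSUP0=29;SSUP1=32")
print(sensors)
# Made-up readings in watts for sensors 29 and 32.
print(power_per_key(sensors, {29: 110, 32: 95}))
```

       With these made-up readings the "Node" key reports 205 watts, the sum
       of both sensors, while SSUP0 and SSUP1 each report a single sensor.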

              The following acct_gather.conf parameters are defined to control
              the IPMI config default values for libipmiconsole.

              EnergyIPMIUsername=USERNAME
                        Specify BMC Username.

              EnergyIPMIPassword=PASSWORD
                        Specify BMC Password.

acct_gather_energy/XCC

       Required entry in slurm.conf:
              AcctGatherEnergyType=acct_gather_energy/xcc

       The acct_gather_energy/xcc plugin supports only in-band communication
       with the XClarity Controller, so only a reduced set of options is
       supported:

              EnergyIPMIFrequency=<number>
                        This parameter is the number of seconds between XCC
                        access samples.  Default is 30 seconds.

              EnergyIPMITimeout=<number>
                        Timeout, in seconds, for initializing the IPMI XCC
                        context for a new gathering thread.  Default is 10
                        seconds.

acct_gather_profile/HDF5

       Required entry in slurm.conf:
              AcctGatherProfileType=acct_gather_profile/hdf5

       Options used for acct_gather_profile/hdf5 are as follows:

              ProfileHDF5Dir=<path>
                     This parameter is the path to the shared folder into
                     which the acct_gather_profile plugin will write detailed
                     data (usually as an HDF5 file).  The directory is assumed
                     to be on a file system shared by the controller and all
                     compute nodes.  This is a required parameter.

              ProfileHDF5Default
                     A comma-delimited list of data types to be collected for
                     each job submission.  Allowed values are:

                     All     All data types are collected. (Cannot be combined
                             with other values.)

                     None    No data types are collected. This is the default.
                             (Cannot be combined with other values.)

                     Energy  Energy data is collected.

                     Filesystem
                             File system (Lustre) data is collected.

                     Network Network (InfiniBand) data is collected.

                     Task    Task (I/O, Memory, ...) data is collected.
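
       The rules for the comma-delimited list can be checked as in the
       following Python sketch.  The function name is hypothetical, and the
       sketch assumes the value names must match exactly as written above,
       following the statement that parameter values are case sensitive.

```python
ALLOWED = {"All", "None", "Energy", "Filesystem", "Network", "Task"}
EXCLUSIVE = {"All", "None"}  # these cannot be combined with other values


def parse_profile_default(value):
    """Return the list of data types, enforcing the documented rules."""
    types = [item.strip() for item in value.split(",") if item.strip()]
    unknown = [t for t in types if t not in ALLOWED]
    if unknown:
        raise ValueError(f"unknown data type(s): {unknown}")
    if len(types) > 1 and EXCLUSIVE.intersection(types):
        raise ValueError("All and None cannot be combined with other values")
    # None is the documented default when nothing is listed.
    return types or ["None"]


print(parse_profile_default("Energy,Task"))
print(parse_profile_default(""))
```

       A value such as "All,Task" is rejected by this sketch, while
       "Energy,Task" selects exactly those two data types.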

acct_gather_profile/InfluxDB

       Required entry in slurm.conf:
              AcctGatherProfileType=acct_gather_profile/influxdb

       The InfluxDB plugin provides the same information as the HDF5 plugin
       but will instead send information to the configured InfluxDB server.

       The InfluxDB plugin is designed against the 1.x protocol of InfluxDB.
       Any site running a v2.x InfluxDB server will need to configure a v1.x
       compatibility endpoint along with the correct user and password
       authorization.  Token authentication is not currently supported.

   Options:
       ProfileInfluxDBDatabase
              For InfluxDB v1.x, the name of the database where profiling
              information is to be written.  For InfluxDB v2.x, the name of
              the bucket where profiling information is to be written.

       ProfileInfluxDBDefault
              A comma-delimited list of data types to be collected for each
              job submission.  Allowed values are:

              All       All data types are collected. Cannot be combined with
                        other values.

              None      No data types are collected. This is the default.
                        Cannot be combined with other values.

              Energy    Energy data is collected.

              Filesystem
                        File system (Lustre) data is collected.

              Network   Network (InfiniBand) data is collected.

              Task      Task (I/O, Memory, ...) data is collected.

       ProfileInfluxDBHost=<hostname>:<port>
              The hostname of the machine where the InfluxDB instance is
              executed and the port used by the HTTP API.  The port used by
              the HTTP API is the one configured through the bind-address
              influxdb.conf option in the [http] section.  Example:
              ProfileInfluxDBHost=myinfluxhost:8086

       ProfileInfluxDBPass
              Password for the username configured in ProfileInfluxDBUser.
              Required in v2.x and optional in v1.x InfluxDB.

       ProfileInfluxDBRTPolicy
              For InfluxDB v1.x, the retention policy name for the database
              configured in the ProfileInfluxDBDatabase option.  For InfluxDB
              v2.x, the retention policy bucket name for the database
              configured in the ProfileInfluxDBDatabase option.

       ProfileInfluxDBUser
              InfluxDB username that should be used to gain access to the
              database configured in ProfileInfluxDBDatabase.  Required in
              v2.x and optional in v1.x InfluxDB.  With v1.x this is only
              needed if InfluxDB is configured with authentication enabled in
              the [http] config section and a user has been granted at least
              WRITE access to the database.  See also ProfileInfluxDBPass.
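
       As a rough illustration of how these options relate to an InfluxDB
       v1.x write request, the Python sketch below builds a URL for the
       standard v1.x /write endpoint with its db and rp query parameters.
       The function name, the http scheme, and the database and retention
       policy names are assumptions; this page does not specify the exact
       request the plugin issues, and credentials are left out.

```python
from urllib.parse import urlencode


def influx_write_url(options):
    """Build a v1.x-style write URL from ProfileInfluxDB* option values."""
    # ProfileInfluxDBHost has the form <hostname>:<port>.
    host, sep, port = options["ProfileInfluxDBHost"].rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError("ProfileInfluxDBHost must look like <hostname>:<port>")
    query = {"db": options["ProfileInfluxDBDatabase"]}
    if "ProfileInfluxDBRTPolicy" in options:
        query["rp"] = options["ProfileInfluxDBRTPolicy"]
    # Assumption: plain HTTP; a real deployment may differ.
    return f"http://{host}:{port}/write?{urlencode(query)}"


url = influx_write_url({
    "ProfileInfluxDBHost": "myinfluxhost:8086",
    "ProfileInfluxDBDatabase": "slurm_profile",   # hypothetical database name
    "ProfileInfluxDBRTPolicy": "autogen",         # hypothetical policy name
})
print(url)
```

       A URL of this shape can be handy when checking by hand that the
       database and retention policy named in acct_gather.conf exist on the
       server.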

   NOTES:
       This plugin requires the libcurl development files to be installed and
       linkable at configure time.  The plugin will not build otherwise.

       Information on how to install and configure InfluxDB and manage
       databases, retention policies and such is available on the official
       InfluxDB webpage.

       Collected information is written from every compute node where a job
       runs to the InfluxDB instance listening on the ProfileInfluxDBHost.  In
       order to avoid overloading the InfluxDB instance with incoming
       connection requests, the plugin uses an internal buffer which is filled
       with samples.  Once the buffer is full, an HTTP API write request is
       performed and the buffer is emptied to hold subsequent samples.  A
       final request is also performed when a task ends, even if the buffer
       isn't full.

       Failed HTTP API write requests are silently discarded.  This means that
       collected profile information in the plugin buffer is lost if it can't
       be written to the InfluxDB database for any reason.
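
       The buffering behavior described above can be modeled with a short
       Python sketch.  The class, the sample-count capacity, and the fake
       sender are hypothetical stand-ins; the real plugin's buffer size and
       HTTP code are not described in this page.

```python
class SampleBuffer:
    """Collect samples and hand them to `send` in batches."""

    def __init__(self, capacity, send):
        self.capacity = capacity  # hypothetical: number of samples per batch
        self.send = send          # callable doing the write; returns True on success
        self.samples = []
        self.lost = 0             # samples dropped because a write failed

    def add(self, sample):
        self.samples.append(sample)
        if len(self.samples) >= self.capacity:
            self.flush()

    def flush(self):
        """Send whatever is buffered; used when full and once more at task end."""
        if not self.samples:
            return
        batch, self.samples = self.samples, []
        if not self.send(batch):
            # No retry: a failed write discards the whole batch.
            self.lost += len(batch)


sent = []


def fake_send(batch):
    # Stand-in for the HTTP request: fail any batch that contains sample 4.
    if 4 in batch:
        return False
    sent.append(batch)
    return True


buf = SampleBuffer(capacity=3, send=fake_send)
for sample in range(7):
    buf.add(sample)
buf.flush()  # task end: the partial buffer is still written
print(sent, buf.lost)
```

       In this run the second batch fails and its three samples are lost,
       while the last, partially filled batch is still written at task end.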

       Plugin messages are logged along with the slurmstepd logs to
       SlurmdLogFile.  In order to troubleshoot any issues, it is recommended
       to temporarily increase the slurmd debug level to debug3 and add
       Profile to the debug flags.  This can be accomplished by setting the
       slurm.conf SlurmdDebug and DebugFlags respectively or dynamically
       through scontrol setdebug and setdebugflags.

       Grafana can be used to create charts based on the data held by
       InfluxDB.  This kind of tool permits one to create dashboards, tables
       and other graphics using the stored time series.

acct_gather_interconnect/OFED

       Required entry in slurm.conf:
              AcctGatherInterconnectType=acct_gather_interconnect/ofed

       Options used for acct_gather_interconnect/ofed are as follows:

              InfinibandOFEDPort=<number>
                        This parameter is the port number of the local
                        InfiniBand card to monitor.  The default port is 1.

EXAMPLE

       ###
       # Slurm acct_gather configuration file
       ###
       # Parameters for acct_gather_energy/ipmi plugin
       EnergyIPMIFrequency=10
       EnergyIPMICalcAdjustment=yes
       #
       # Parameters for acct_gather_profile/hdf5 plugin
       ProfileHDF5Dir=/app/slurm/profile_data
       # Parameters for acct_gather_interconnect/ofed plugin
       InfinibandOFEDPort=1

COPYING

       Copyright (C) 2012-2013 Bull.  Copyright (C) 2012-2022 SchedMD LLC.
       Produced at Bull (cf, DISCLAIMER).

       This file is part of Slurm, a resource management program.  For
       details, see <https://slurm.schedmd.com/>.

       Slurm is free software; you can redistribute it and/or modify it under
       the terms of the GNU General Public License as published by the Free
       Software Foundation; either version 2 of the License, or (at your
       option) any later version.

       Slurm is distributed in the hope that it will be useful, but WITHOUT
       ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
       FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
       for more details.

SEE ALSO

       slurm.conf(5)

January 2022               Slurm Configuration File        acct_gather.conf(5)