acct_gather.conf(5)        Slurm Configuration File        acct_gather.conf(5)

NAME

       acct_gather.conf - Slurm configuration file for the acct_gather plugins

DESCRIPTION

       acct_gather.conf is a UTF-8 formatted file which defines parameters used
       by Slurm's acct_gather related plugins. The file is always located in
       the same directory as the slurm.conf file. That location can be modified
       at system build time using the DEFAULT_SLURM_CONF parameter or at
       execution time by setting the SLURM_CONF environment variable.

       Parameter names are case insensitive but parameter values are case
       sensitive. Any text following a "#" in the configuration file is treated
       as a comment through the end of that line. The size of each line in the
       file is limited to 1024 characters.

       Changes to the configuration file take effect upon restart of the Slurm
       daemons.

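       As a minimal sketch of the syntax rules above, the fragment below shows
       a full-line comment, a trailing comment, and a parameter name written in
       a different case. The values are illustrative only, and the two
       parameters are recognized only when the acct_gather_energy/ipmi plugin
       is loaded.

```conf
# A full-line comment: everything after "#" is ignored.
EnergyIPMIFrequency=10        # a trailing comment on a parameter line
# Parameter names are case insensitive, so the next line sets the
# EnergyIPMICalcAdjustment parameter. The value is written as documented.
energyipmicalcadjustment=yes
```
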
       The following acct_gather.conf parameters are defined to control the
       general behavior of various plugins in Slurm.

       The acct_gather.conf file is different from other Slurm .conf files.
       Each plugin defines which options are available. Each plugin to be
       loaded must be specified in slurm.conf under the following
       configuration entries:

       • AcctGatherEnergyType (plugin type=acct_gather_energy)
       • AcctGatherInterconnectType (plugin type=acct_gather_interconnect)
       • AcctGatherFilesystemType (plugin type=acct_gather_filesystem)
       • AcctGatherProfileType (plugin type=acct_gather_profile)

       If the respective plugin for an option is not loaded, that option is
       unknown to Slurm and is silently ignored. If you change plugin types,
       you may also have to change the related options.

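
       The four entries listed above belong in slurm.conf, not in
       acct_gather.conf. A sketch of such a selection follows. The choice of
       plugins is only an illustration, and acct_gather_filesystem/lustre is
       not described in this page and is assumed here. Any of the four lines
       can be omitted.

```conf
# slurm.conf: choose which acct_gather plugins are loaded
AcctGatherEnergyType=acct_gather_energy/ipmi
AcctGatherInterconnectType=acct_gather_interconnect/ofed
AcctGatherFilesystemType=acct_gather_filesystem/lustre
AcctGatherProfileType=acct_gather_profile/hdf5
```

       With this selection, the EnergyIPMI*, InfinibandOFEDPort and
       ProfileHDF5* options in acct_gather.conf are recognized, while
       ProfileInfluxDB* options would be silently ignored.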

acct_gather_energy/IPMI

       Options used for acct_gather_energy/ipmi are as follows:

              EnergyIPMIFrequency=<number>
                        This parameter is the number of seconds between BMC
                        access samples.

              EnergyIPMICalcAdjustment=<yes|no>
                        If set to "yes", the consumption between the last BMC
                        access sample and a step consumption update is
                        approximated to get more accurate task consumption.
                        The adjustment is made at the step start and each time
                        the consumption is updated, including the step end.
                        The approximations are not accumulated; only the first
                        and last adjustments are used to calculate the
                        consumption. The default is "no".

              EnergyIPMIPowerSensors=<key=values>
                        Optionally specify the ids of the sensors to use.
                        Multiple <key=values> can be set with ";" separators.
                        The key "Node" is mandatory and is used to know the
                        consumed energy for nodes (scontrol show node) and
                        jobs (sacct). Other keys are optional and are named
                        by the administrator. These keys are useful only when
                        profiling is activated for energy, to store the power
                        (in watts) of each key. <values> are integers;
                        multiple values can be set with "," separators. The
                        sum of the listed sensors is used for each key.
                        EnergyIPMIPowerSensors is optional; the default value
                        is "Node=number" where "number" is the id of the first
                        power sensor returned by ipmi-sensors.
                        Examples:
                        EnergyIPMIPowerSensors=Node=16,19,23,26;Socket0=16,23;Socket1=19,26;SSUP=23,26;KNC=16,19
                        EnergyIPMIPowerSensors=Node=29,32;SSUP0=29;SSUP1=32
                        EnergyIPMIPowerSensors=Node=1280

              The following acct_gather.conf parameters are defined to control
              the IPMI config default values for libipmiconsole.

              EnergyIPMIUsername=USERNAME
                        Specify BMC Username.

              EnergyIPMIPassword=PASSWORD
                        Specify BMC Password.

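
       A sketch combining the options above. The sensor ids 16 and 19, the key
       names CPU and GPU, and the credentials are all hypothetical
       placeholders chosen for illustration.

```conf
# Sample the BMC every 30 seconds.
EnergyIPMIFrequency=30
EnergyIPMICalcAdjustment=yes
# Node power is the sum of sensors 16 and 19; this is the value reported
# by "scontrol show node" and sacct. CPU and GPU are administrator-named
# keys whose power is stored only when energy profiling is active.
EnergyIPMIPowerSensors=Node=16,19;CPU=16;GPU=19
# Placeholder BMC credentials.
EnergyIPMIUsername=bmc_user
EnergyIPMIPassword=bmc_password
```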

acct_gather_energy/XCC

       The acct_gather_energy/xcc plugin uses only in-band communication with
       the XClarity Controller (XCC), so a reduced set of options is supported:

              EnergyIPMIFrequency=<number>
                        This parameter is the number of seconds between XCC
                        access samples. Default is 30 seconds.

              EnergyIPMITimeout=<number>
                        Timeout, in seconds, for initializing the IPMI XCC
                        context for a new gathering thread. Default is 10
                        seconds.

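
       For illustration, a fragment for the xcc plugin that doubles both
       documented defaults. The values are arbitrary, and the fragment assumes
       that AcctGatherEnergyType=acct_gather_energy/xcc is set in slurm.conf.

```conf
EnergyIPMIFrequency=60    # sample the XCC every 60 seconds (default is 30)
EnergyIPMITimeout=20      # allow 20 seconds for context setup (default is 10)
```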

acct_gather_profile/HDF5

       Options used for acct_gather_profile/hdf5 are as follows:

              ProfileHDF5Dir=<path>
                     This parameter is the path to the shared folder into
                     which the acct_gather_profile plugin will write detailed
                     data (usually as an HDF5 file). The directory is assumed
                     to be on a file system shared by the controller and all
                     compute nodes. This is a required parameter.

              ProfileHDF5Default
                     A comma delimited list of data types to be collected for
                     each job submission. Allowed values are:

                     All     All data types are collected. (Cannot be combined
                             with other values.)

                     None    No data types are collected. This is the default.
                             (Cannot be combined with other values.)

                     Energy  Energy data is collected.

                     Filesystem
                             File system (Lustre) data is collected.

                     Network Network (InfiniBand) data is collected.

                     Task    Task (I/O, Memory, ...) data is collected.

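
       A sketch of the two HDF5 options used together. The directory path is
       hypothetical and would need to exist on a file system shared by the
       controller and all compute nodes.

```conf
# Shared directory where the profile data is written (hypothetical path).
ProfileHDF5Dir=/shared/slurm/profile_data
# Collect energy and task data by default for each job submission.
ProfileHDF5Default=Energy,Task
# Not valid: All and None cannot be combined with other values.
#ProfileHDF5Default=All,Task
```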

acct_gather_profile/InfluxDB

       The InfluxDB plugin provides the same information as the HDF5 plugin
       but will instead send information to the configured InfluxDB server.

       The InfluxDB plugin is designed against the 1.x protocol of InfluxDB.
       Any site running a v2.x InfluxDB server will need to configure a v1.x
       compatibility endpoint along with the correct user and password
       authorization. Token authentication is not currently supported.

   Options:
       ProfileInfluxDBDatabase
              For InfluxDB v1.x, the name of the database where profiling
              information is to be written. For InfluxDB v2.x, the name of
              the bucket where profiling information is to be written.

       ProfileInfluxDBDefault
              A comma delimited list of data types to be collected for each
              job submission. Allowed values are:

              All       All data types are collected. Cannot be combined with
                        other values.

              None      No data types are collected. This is the default.
                        Cannot be combined with other values.

              Energy    Energy data is collected.

              Filesystem
                        File system (Lustre) data is collected.

              Network   Network (InfiniBand) data is collected.

              Task      Task (I/O, Memory, ...) data is collected.

       ProfileInfluxDBHost=<hostname>:<port>
              The hostname of the machine where the influxd instance is
              executed and the port used by the HTTP API. The port used by
              the HTTP API is the one configured through the bind-address
              option in the [http] section of influxdb.conf. Example:
              ProfileInfluxDBHost=myinfluxhost:8086

       ProfileInfluxDBPass
              Password for the username configured in ProfileInfluxDBUser.
              Required in v2.x and optional in v1.x InfluxDB.

       ProfileInfluxDBRTPolicy
              For InfluxDB v1.x, the retention policy name for the database
              configured in the ProfileInfluxDBDatabase option. For InfluxDB
              v2.x, the retention policy bucket name for the database
              configured in the ProfileInfluxDBDatabase option.

       ProfileInfluxDBUser
              InfluxDB username that should be used to gain access to the
              database configured in ProfileInfluxDBDatabase. Required in
              v2.x and optional in v1.x InfluxDB. With v1.x, this is only
              needed if InfluxDB is configured with authentication enabled
              in the [http] config section and a user has been granted at
              least WRITE access to the database. See also
              ProfileInfluxDBPass.

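
       A sketch of a complete set of InfluxDB options. The database name, user
       name and password are hypothetical, and "autogen" is assumed to be the
       retention policy name in use on a v1.x server.

```conf
# Host and HTTP API port of the influxd instance.
ProfileInfluxDBHost=myinfluxhost:8086
# Database (v1.x) or bucket (v2.x) that receives the profiling data.
ProfileInfluxDBDatabase=slurm_profile
ProfileInfluxDBRTPolicy=autogen
# Collect energy and task data by default for each job submission.
ProfileInfluxDBDefault=Energy,Task
# Credentials: required with v2.x, optional with v1.x.
ProfileInfluxDBUser=slurm
ProfileInfluxDBPass=example_password
```
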
   NOTES:
       This plugin requires the libcurl development files to be installed and
       linkable at configure time. The plugin will not build otherwise.

       Information on how to install and configure InfluxDB and manage
       databases, retention policies and such is available on the official
       InfluxDB webpage.

       Collected information is written from every compute node where a job
       runs to the influxd instance listening on the ProfileInfluxDBHost. In
       order to avoid overloading the influxd instance with incoming
       connection requests, the plugin uses an internal buffer which is filled
       with samples. Once the buffer is full, an HTTP API write request is
       performed and the buffer is emptied to hold subsequent samples. A final
       request is also performed when a task ends even if the buffer isn't
       full.

       Failed HTTP API write requests are silently discarded. This means that
       collected profile information in the plugin buffer is lost if it can't
       be written to the influxd database for any reason.

       Plugin messages are logged along with the slurmstepd logs to
       SlurmdLogFile. In order to troubleshoot any issues, it is recommended
       to temporarily increase the slurmd debug level to debug3 and add
       Profile to the debug flags. This can be accomplished by setting the
       slurm.conf SlurmdDebug and DebugFlags parameters respectively, or
       dynamically through scontrol setdebug and setdebugflags.

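       The static variant of these troubleshooting settings can be sketched as
       the slurm.conf lines below. This assumes no other debug flags are
       already configured; if DebugFlags is already set at a site, Profile
       would be added to the existing comma-separated list instead.

```conf
# slurm.conf (temporary, while troubleshooting the plugin)
SlurmdDebug=debug3
DebugFlags=Profile
```
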
       Grafana can be used to create charts based on the data held by
       InfluxDB. This kind of tool permits one to create dashboards, tables
       and other graphics using the stored time series.

acct_gather_interconnect/OFED

       Options used for acct_gather_interconnect/ofed are as follows:

              InfinibandOFEDPort=<number>
                        This parameter represents the port number of the local
                        InfiniBand card to be monitored. The default port is 1.

EXAMPLE

       ###
       # Slurm acct_gather configuration file
       ###
       # Parameters for acct_gather_energy/ipmi plugin
       EnergyIPMIFrequency=10
       EnergyIPMICalcAdjustment=yes
       #
       # Parameters for acct_gather_profile/hdf5 plugin
       ProfileHDF5Dir=/app/slurm/profile_data
       #
       # Parameters for acct_gather_interconnect/ofed plugin
       InfinibandOFEDPort=1

COPYING

       Copyright (C) 2012-2013 Bull. Produced at Bull (cf, DISCLAIMER).

       This file is part of Slurm, a resource management program. For
       details, see <https://slurm.schedmd.com/>.

       Slurm is free software; you can redistribute it and/or modify it under
       the terms of the GNU General Public License as published by the Free
       Software Foundation; either version 2 of the License, or (at your
       option) any later version.

       Slurm is distributed in the hope that it will be useful, but WITHOUT
       ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
       FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
       for more details.

SEE ALSO

       slurm.conf(5)

April 2021                 Slurm Configuration File        acct_gather.conf(5)