acct_gather.conf(5)        Slurm Configuration File        acct_gather.conf(5)

NAME

       acct_gather.conf - Slurm configuration file for the acct_gather plugins

DESCRIPTION

       acct_gather.conf is an ASCII file which defines parameters used by
       Slurm's acct_gather related plugins.  The file location can be modified
       at system build time using the DEFAULT_SLURM_CONF parameter or at
       execution time by setting the SLURM_CONF environment variable.  The file
       will always be located in the same directory as the slurm.conf file.

       Parameter names are case insensitive.  Any text following a "#" in the
       configuration file is treated as a comment through the end of that
       line.  The size of each line in the file is limited to 1024 characters.
       Changes to the configuration file take effect upon restart of Slurm
       daemons, daemon receipt of the SIGHUP signal, or execution of the
       command "scontrol reconfigure" unless otherwise noted.
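
       As a small illustration of the syntax rules above, the fragment below
       shows a whole-line comment, a trailing comment, and a parameter name
       written in lower case.  The values are illustrative only.

```conf
# A whole-line comment.
EnergyIPMIFrequency=10        # Trailing comment, ignored through end of line.
energyipmicalcadjustment=yes  # Same as EnergyIPMICalcAdjustment=yes.
```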

       The following acct_gather.conf parameters are defined to control the
       general behavior of various plugins in Slurm.

       The acct_gather.conf file differs from other Slurm .conf files in that
       each plugin defines which options are available.  If the plugin that
       defines an option is not loaded, Slurm treats that option as unknown,
       which could prevent Slurm from loading.  If you change plugin types,
       you might also have to change the related options.
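
       A sketch of how loaded plugins and accepted options might pair up is
       shown below.  The AcctGather*Type lines belong in slurm.conf (see
       slurm.conf(5)), not in acct_gather.conf; the chosen plugins and values
       are illustrative assumptions, not recommendations.

```conf
# slurm.conf: select the plugins to load
AcctGatherEnergyType=acct_gather_energy/ipmi
AcctGatherProfileType=acct_gather_profile/hdf5

# acct_gather.conf: only options of the plugins loaded above
EnergyIPMIFrequency=10
ProfileHDF5Dir=/app/slurm/profile_data
# InfinibandOFEDPort=1   # left out: acct_gather_interconnect/ofed is not loaded
```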

       EnergyIPMI
              Options used for acct_gather_energy/ipmi are as follows:

              EnergyIPMIFrequency=<number>
                        This parameter is the number of seconds between BMC
                        access samples.

              EnergyIPMICalcAdjustment=<yes|no>
                        If set to "yes", the consumption between the last BMC
                        access sample and a step consumption update is
                        approximated to get more accurate task consumption.
                        The adjustment is made at the step start and each time
                        the consumption is updated, including the step end.
                        The approximations are not accumulated; only the first
                        and last adjustments are used to calculate the
                        consumption.  The default is "no".

              EnergyIPMIPowerSensors=<key=values>
                        Optionally specify the ids of the sensors to use.
                        Multiple <key=values> can be set with ";" separators.
                        The key "Node" is mandatory and is used to know the
                        consumed energy for nodes (scontrol show node) and
                        jobs (sacct).  Other keys are optional and are named
                        by the administrator.  These keys are useful only when
                        profiling is activated for energy, to store the power
                        (in watts) of each key.  <values> are integers;
                        multiple values can be set with "," separators.  The
                        sum of the listed sensors is used for each key.
                        EnergyIPMIPowerSensors is optional; the default value
                        is "Node=number" where "number" is the id of the first
                        power sensor returned by ipmi-sensors.
                        Examples:
                        EnergyIPMIPowerSensors=Node=16,19,23,26;Socket0=16,23;Socket1=19,26;SSUP=23,26;KNC=16,19
                        EnergyIPMIPowerSensors=Node=29,32;SSUP0=29;SSUP1=32
                        EnergyIPMIPowerSensors=Node=1280

              The following acct_gather.conf parameters are defined to control
              the IPMI config default values for libipmiconsole.

              EnergyIPMIUsername=USERNAME
                        Specify BMC Username.

              EnergyIPMIPassword=PASSWORD
                        Specify BMC Password.
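
              Put together, an IPMI energy section might look like the sketch
              below.  The sensor ids reuse one of the EnergyIPMIPowerSensors
              examples; the BMC account name and password are placeholders,
              and the slurm.conf setting in the first comment is stated as an
              assumption about the surrounding configuration.

```conf
# Assumes AcctGatherEnergyType=acct_gather_energy/ipmi in slurm.conf
EnergyIPMIFrequency=10
EnergyIPMICalcAdjustment=yes
# Node power is the sum of sensors 29 and 32; SSUP0 and SSUP1 are
# administrator-chosen key names, stored only when energy profiling is active.
EnergyIPMIPowerSensors=Node=29,32;SSUP0=29;SSUP1=32
EnergyIPMIUsername=slurm_bmc      # placeholder account name
EnergyIPMIPassword=changeme       # placeholder password
```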

       EnergyXCC
              The acct_gather_energy/xcc plugin uses only in-band
              communication with the XClarity Controller, so a reduced set of
              options is supported:

              EnergyIPMIFrequency=<number>
                        This parameter is the number of seconds between XCC
                        access samples.  Default is 30 seconds.

              EnergyIPMITimeout=<number>
                        Timeout, in seconds, for initializing the IPMI XCC
                        context for a new gathering thread.  Default is 10
                        seconds.
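
              A minimal XCC section, written out with the documented default
              values, might look like this.  The slurm.conf setting in the
              first comment is an assumption about the surrounding
              configuration.

```conf
# Assumes AcctGatherEnergyType=acct_gather_energy/xcc in slurm.conf
EnergyIPMIFrequency=30   # seconds between XCC access samples (default)
EnergyIPMITimeout=10     # seconds to initialize the IPMI XCC context (default)
```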

       ProfileHDF5
              Options used for acct_gather_profile/hdf5 are as follows:

              ProfileHDF5Dir=<path>
                        This parameter is the path to the shared folder into
                        which the acct_gather_profile plugin will write
                        detailed data (usually as an HDF5 file).  The
                        directory is assumed to be on a file system shared by
                        the controller and all compute nodes.  This is a
                        required parameter.

              ProfileHDF5Default
                        A comma delimited list of data types to be collected
                        for each job submission.  Allowed values are:

                        All     All data types are collected.  (Cannot be
                                combined with other values.)

                        None    No data types are collected.  This is the
                                default.  (Cannot be combined with other
                                values.)

                        Energy  Energy data is collected.

                        Filesystem
                                File system (Lustre) data is collected.

                        Network Network (InfiniBand) data is collected.

                        Task    Task (I/O, Memory, ...) data is collected.
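
              As an illustration, an HDF5 profiling section that collects
              task and energy data for every job might look like the sketch
              below.  The directory is the one used in the EXAMPLE section;
              the choice of data types is an arbitrary assumption.

```conf
# Assumes AcctGatherProfileType=acct_gather_profile/hdf5 in slurm.conf
ProfileHDF5Dir=/app/slurm/profile_data
# "All" and "None" cannot be combined with other values; the rest can.
ProfileHDF5Default=Task,Energy
```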

       ProfileInfluxDB
              Options used for acct_gather_profile/influxdb are as follows:

              ProfileInfluxDBDatabase
                        InfluxDB database name where profiling information is
                        to be written.

              ProfileInfluxDBDefault
                        A comma delimited list of data types to be collected
                        for each job submission.  Allowed values are:

                        All     All data types are collected.  (Cannot be
                                combined with other values.)

                        None    No data types are collected.  This is the
                                default.  (Cannot be combined with other
                                values.)

                        Energy  Energy data is collected.

                        Filesystem
                                File system (Lustre) data is collected.

                        Network Network (InfiniBand) data is collected.

                        Task    Task (I/O, Memory, ...) data is collected.

              ProfileInfluxDBHost=<hostname>:<port>
                        The hostname of the machine where the influxd instance
                        is executed and the port used by the HTTP API.  The
                        port used by the HTTP API is the one configured
                        through the bind-address influxdb.conf option in the
                        [http] section.  Example:

                        ProfileInfluxDBHost=myinfluxhost:8086

              ProfileInfluxDBPass
                        Optional password for the username configured in
                        ProfileInfluxDBUser.

              ProfileInfluxDBRTPolicy
                        The InfluxDB retention policy name for the database
                        configured in the ProfileInfluxDBDatabase option.

              ProfileInfluxDBUser
                        Optional InfluxDB username that should be used to gain
                        access to the database configured in
                        ProfileInfluxDBDatabase.  This is only needed if
                        InfluxDB is configured with authentication enabled in
                        the [http] config section and a user has been granted
                        at least WRITE access to the database.  See also
                        ProfileInfluxDBPass.
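
              A complete InfluxDB profiling section might look like the
              sketch below.  The host and port come from the
              ProfileInfluxDBHost example; the database name, retention
              policy name, user name and password are hypothetical
              placeholders that must match what exists on the influxd
              instance.

```conf
# Assumes AcctGatherProfileType=acct_gather_profile/influxdb in slurm.conf
ProfileInfluxDBHost=myinfluxhost:8086
ProfileInfluxDBDatabase=slurm_profile   # hypothetical database name
ProfileInfluxDBRTPolicy=autogen         # assumed retention policy name
ProfileInfluxDBDefault=Task
# Only needed when InfluxDB authentication is enabled:
ProfileInfluxDBUser=slurm_writer        # hypothetical user with WRITE access
ProfileInfluxDBPass=changeme            # placeholder password
```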

       NOTE:  This plugin requires the libcurl development files to be
              installed.

       NOTE:  Information on how to install and configure InfluxDB and manage
              databases, retention policies and such is available on the
              official webpage.

       NOTE:  Collected information is written from every compute node where a
              job runs to the influxd instance listening on the
              ProfileInfluxDBHost.  In order to avoid overloading the influxd
              instance with incoming connection requests, the plugin uses an
              internal buffer which is filled with samples.  Once the buffer
              is full, an HTTP API write request is performed and the buffer
              is emptied to hold subsequent samples.  A final request is also
              performed when a task ends even if the buffer isn't full.

       NOTE:  Failed HTTP API write requests are discarded.  This means that
              collected profile information in the plugin buffer is lost if it
              can't be written to the influxd database for any reason.

       NOTE:  Plugin messages are logged along with the slurmstepd logs to
              SlurmdLogFile.  In order to troubleshoot any issues, it is
              recommended to temporarily increase the slurmd debug level to
              debug3 and add Profile to the debug flags.  This can be
              accomplished by setting the slurm.conf SlurmdDebug and
              DebugFlags respectively or dynamically through scontrol setdebug
              and setdebugflags.
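
              As a sketch, the slurm.conf side of that troubleshooting setup
              might look like the fragment below.  These lines go in
              slurm.conf, not acct_gather.conf, and are meant to be reverted
              once the issue is understood.

```conf
# slurm.conf: temporary troubleshooting settings
SlurmdDebug=debug3
DebugFlags=Profile
```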

       NOTE:  Consider using a monitoring and analytics tool such as Grafana
              on top of InfluxDB.  Tools of this kind let one create
              dashboards, tables, and other graphics using the stored time
              series.  This makes it easier to correlate resource usage peaks
              reported by other node monitoring tools such as Ganglia with
              specific job step tasks.

       InfinibandOFED
              Options used for acct_gather_interconnect/ofed are as follows:

              InfinibandOFEDPort=<number>
                        This parameter represents the port number of the local
                        InfiniBand card to monitor.  The default port is 1.

EXAMPLE

       ###
       # Slurm acct_gather configuration file
       ###
       # Parameters for acct_gather_energy/ipmi plugin
       EnergyIPMIFrequency=10
       EnergyIPMICalcAdjustment=yes
       #
       # Parameters for acct_gather_profile/hdf5 plugin
       ProfileHDF5Dir=/app/slurm/profile_data
       # Parameters for acct_gather_interconnect/ofed plugin
       InfinibandOFEDPort=1

COPYING

       Copyright (C) 2012-2013 Bull.  Produced at Bull (cf, DISCLAIMER).

       This file is part of Slurm, a resource management program.  For
       details, see <https://slurm.schedmd.com/>.

       Slurm is free software; you can redistribute it and/or modify it under
       the terms of the GNU General Public License as published by the Free
       Software Foundation; either version 2 of the License, or (at your
       option) any later version.

       Slurm is distributed in the hope that it will be useful, but WITHOUT
       ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
       FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
       for more details.

SEE ALSO

       slurm.conf(5)

April 2020                 Slurm Configuration File        acct_gather.conf(5)