acct_gather.conf(5)        Slurm Configuration File        acct_gather.conf(5)

NAME

       acct_gather.conf - Slurm configuration file for the acct_gather plugins

DESCRIPTION

       acct_gather.conf is an ASCII file which defines parameters used by
       Slurm's acct_gather related plugins. The file location can be modified
       at system build time using the DEFAULT_SLURM_CONF parameter or at
       execution time by setting the SLURM_CONF environment variable. The file
       will always be located in the same directory as the slurm.conf file.

       Parameter names are case insensitive. Any text following a "#" in the
       configuration file is treated as a comment through the end of that
       line. The size of each line in the file is limited to 1024 characters.
       Changes to the configuration file take effect upon restart of Slurm
       daemons, daemon receipt of the SIGHUP signal, or execution of the
       command "scontrol reconfigure" unless otherwise noted.
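
       The syntax rules above (case-insensitive names, "#" comments, the
       1024-character line limit) can be illustrated with a minimal Python
       sketch. This is not Slurm's parser; the function name parse_conf and
       the choice to raise an error on an over-long line are assumptions made
       for illustration only.

```python
MAX_LINE_LEN = 1024  # per-line limit stated in this man page


def parse_conf(text):
    """Return a dict mapping lower-cased parameter names to raw string values."""
    params = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        if len(raw) > MAX_LINE_LEN:
            # Assumption: this sketch rejects over-long lines outright.
            raise ValueError(f"line {lineno} exceeds {MAX_LINE_LEN} characters")
        # Everything after '#' is a comment through the end of the line.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        if "=" not in line:
            raise ValueError(f"line {lineno}: expected Name=Value")
        # Split on the first '=' only, so values such as "Node=16,19" survive.
        name, value = line.split("=", 1)
        # Parameter names are case insensitive, so normalise them.
        params[name.strip().lower()] = value.strip()
    return params


sample = """\
# Parameters for the ipmi energy plugin
energyipmifrequency=10      # seconds between BMC samples
EnergyIPMIPowerSensors=Node=16,19;Socket0=16
"""
print(parse_conf(sample))
```

       Lower-casing the names is one simple way to make lookups case
       insensitive; values are left untouched because their case may matter.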

       The following acct_gather.conf parameters are defined to control the
       general behavior of various plugins in Slurm.

       The acct_gather.conf file differs from other Slurm .conf files in that
       each plugin defines which options are available. If the plugin for an
       option is not loaded, Slurm treats that option as unknown, which could
       prevent Slurm from loading. If you change plugin types, you might also
       have to change the related options.
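
       As a rough illustration of the point above, the following Python
       sketch flags options whose plugin is not loaded. It is not how Slurm
       performs the check: the prefix table simply mirrors the option groups
       documented in this man page, and the names PREFIX_TO_PLUGIN and
       unknown_options are hypothetical.

```python
# Option-name prefixes (the group headings in this man page) mapped to the
# plugin each group is documented for. Illustrative only.
PREFIX_TO_PLUGIN = {
    "energyipmi": "AcctGatherEnergyType/ipmi",
    "profilehdf5": "AcctGatherProfileType/hdf5",
    "profileinfluxdb": "AcctGatherProfileType/influxdb",
    "infinibandofed": "AcctGatherInfinibandType/ofed",
}


def unknown_options(option_names, loaded_plugins):
    """Return the option names whose documented plugin is not loaded."""
    flagged = []
    for name in option_names:
        lowered = name.lower()  # parameter names are case insensitive
        owner = next(
            (plugin for prefix, plugin in PREFIX_TO_PLUGIN.items()
             if lowered.startswith(prefix)),
            None,
        )
        if owner is None or owner not in loaded_plugins:
            flagged.append(name)
    return flagged


loaded = {"AcctGatherEnergyType/ipmi"}
print(unknown_options(["EnergyIPMIFrequency", "ProfileHDF5Dir"], loaded))
```

       Running a check like this before restarting the daemons can catch
       options left over from a previously configured plugin type.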

       EnergyIPMI
              Options used for AcctGatherEnergyType/ipmi are as follows:

              EnergyIPMIFrequency=<number>
                        This parameter is the number of seconds between BMC
                        access samples.

              EnergyIPMICalcAdjustment=<yes|no>
                        If set to "yes", the consumption between the last BMC
                        access sample and a step consumption update is
                        approximated to get more accurate task consumption.
                        The adjustment is made at the step start and each time
                        the consumption is updated, including the step end.
                        The approximations are not accumulated; only the first
                        and last adjustments are used to calculate the
                        consumption. The default is "no".

              EnergyIPMIPowerSensors=<key=values>
                        Optionally specify the ids of the sensors to use.
                        Multiple <key=values> can be set with ";" separators.
                        The key "Node" is mandatory and is used to know the
                        consumed energy for nodes (scontrol show node) and
                        jobs (sacct). Other keys are optional and are named
                        by the administrator. These keys are useful only when
                        profiling is activated for energy, to store the power
                        (in watts) of each key. <values> are integers;
                        multiple values can be set with "," separators. The
                        sum of the listed sensors is used for each key.
                        EnergyIPMIPowerSensors is optional; the default value
                        is "Node=number", where "number" is the id of the
                        first power sensor returned by ipmi-sensors.
                        Examples:
                        EnergyIPMIPowerSensors=Node=16,19,23,26;Socket0=16,23;Socket1=19,26;SSUP=23,26;KNC=16,19
                        EnergyIPMIPowerSensors=Node=29,32;SSUP0=29;SSUP1=32
                        EnergyIPMIPowerSensors=Node=1280
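
       The <key=values> syntax and the "sum of the listed sensors" rule can
       be sketched in Python as follows. This is an illustration, not the
       plugin's code: the function names and the watt readings are made up,
       and matching the "Node" key with exact case is an assumption.

```python
def parse_power_sensors(value):
    """Parse 'Node=16,19;Socket0=16' into {'Node': [16, 19], 'Socket0': [16]}."""
    sensors = {}
    for entry in value.split(";"):          # ';' separates <key=values> entries
        entry = entry.strip()
        if not entry:
            continue
        key, _, ids = entry.partition("=")
        sensors[key.strip()] = [int(i) for i in ids.split(",")]  # ',' separates ids
    if "Node" not in sensors:
        raise ValueError('the "Node" key is mandatory')
    return sensors


def power_per_key(sensors, readings):
    """Sum the readings (watts, keyed by sensor id) of the sensors under each key."""
    return {key: sum(readings[i] for i in ids) for key, ids in sensors.items()}


sensors = parse_power_sensors("Node=16,19,23,26;Socket0=16,23;Socket1=19,26")
readings = {16: 50, 19: 40, 23: 5, 26: 5}   # made-up watt values per sensor id
print(power_per_key(sensors, readings))
```

       With these made-up readings, the Node total is the sum of all four
       sensors, while each socket key reports only its own two sensors.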

              The following acct_gather.conf parameters are defined to control
              the IPMI config default values for libipmiconsole.

              EnergyIPMIUsername=USERNAME
                        Specify BMC Username.

              EnergyIPMIPassword=PASSWORD
                        Specify BMC Password.

       ProfileHDF5
              Options used for AcctGatherProfileType/hdf5 are as follows:

              ProfileHDF5Dir=<path>
                        This parameter is the path to the shared folder into
                        which the acct_gather_profile plugin will write
                        detailed data (usually as an HDF5 file). The
                        directory is assumed to be on a file system shared by
                        the controller and all compute nodes. This is a
                        required parameter.

              ProfileHDF5Default
                        A comma delimited list of data types to be collected
                        for each job submission. Allowed values are:

                        All     All data types are collected. (Cannot be
                                combined with other values.)

                        None    No data types are collected. This is the
                                default. (Cannot be combined with other
                                values.)

                        Energy  Energy data is collected.

                        Filesystem
                                File system (Lustre) data is collected.

                        Network Network (InfiniBand) data is collected.

                        Task    Task (I/O, Memory, ...) data is collected.
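
       The rules for the comma delimited list (a fixed set of allowed values,
       with All and None not combinable with anything else) can be checked
       with a short Python sketch. It is an illustration only: the function
       name is hypothetical, and treating the values as case insensitive is
       an assumption.

```python
ALLOWED = {"all", "none", "energy", "filesystem", "network", "task"}


def parse_profile_default(value):
    """Return the set of lower-cased data types named in a comma delimited list."""
    types = {item.strip().lower() for item in value.split(",") if item.strip()}
    if not types:
        return {"none"}  # None is the documented default
    unknown = types - ALLOWED
    if unknown:
        raise ValueError(f"unknown data type(s): {sorted(unknown)}")
    # All and None must each stand alone.
    for exclusive in ("all", "none"):
        if exclusive in types and len(types) > 1:
            raise ValueError(f"{exclusive!r} cannot be combined with other values")
    return types


print(sorted(parse_profile_default("Energy,Task")))
```

       A value such as "All,Task" is rejected by this sketch, which mirrors
       the "Cannot be combined with other values" restriction.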

       ProfileInfluxDB
              Options used for AcctGatherProfileType/influxdb are as follows:

              ProfileInfluxDBDatabase
                        InfluxDB database name where profiling information is
                        to be written.

              ProfileInfluxDBDefault
                        A comma delimited list of data types to be collected
                        for each job submission. Allowed values are:

                        All     All data types are collected. (Cannot be
                                combined with other values.)

                        None    No data types are collected. This is the
                                default. (Cannot be combined with other
                                values.)

                        Energy  Energy data is collected.

                        Filesystem
                                File system (Lustre) data is collected.

                        Network Network (InfiniBand) data is collected.

                        Task    Task (I/O, Memory, ...) data is collected.

              ProfileInfluxDBHost=<hostname>:<port>
                        The hostname of the machine where the influxd instance
                        is executed and the port used by the HTTP API. The
                        port used by the HTTP API is the one configured
                        through the bind-address influxdb.conf option in the
                        [http] section. Example:

                        ProfileInfluxDBHost=myinfluxhost:8086

              ProfileInfluxDBPass
                        Optional password for username configured in
                        ProfileInfluxDBUser.

              ProfileInfluxDBRTPolicy
                        The InfluxDB retention policy name for the database
                        configured in the ProfileInfluxDBDatabase option.

              ProfileInfluxDBUser
                        Optional InfluxDB username that should be used to gain
                        access to the database configured in
                        ProfileInfluxDBDatabase. This is only needed if
                        InfluxDB is configured with authentication enabled in
                        the [http] config section and a user has been granted
                        at least WRITE access to the database. See also
                        ProfileInfluxDBPass.
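
       To show how the host, database and retention policy options relate to
       an HTTP API write request, here is a Python sketch that builds a write
       URL. It assumes the InfluxDB 1.x /write endpoint with its db and rp
       query parameters and a plain http scheme; whether the plugin builds
       its request exactly this way is not stated in this man page. The
       database name "slurm_profile" and the function name are hypothetical,
       and credentials are deliberately left out of the URL.

```python
from urllib.parse import urlencode


def influx_write_url(host, database, rt_policy):
    """Build an InfluxDB 1.x style /write URL from acct_gather.conf style values.

    host is a ProfileInfluxDBHost value in <hostname>:<port> form.
    """
    hostname, sep, port = host.rpartition(":")
    if not sep or not hostname or not port.isdigit():
        raise ValueError("expected <hostname>:<port>")
    # db selects the database, rp the retention policy (InfluxDB 1.x parameters).
    query = urlencode({"db": database, "rp": rt_policy})
    return f"http://{hostname}:{port}/write?{query}"  # http scheme is an assumption


print(influx_write_url("myinfluxhost:8086", "slurm_profile", "autogen"))
```

       When authentication is enabled, the username and password would be
       supplied with the request in addition to this URL.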

       NOTE:  This plugin requires the libcurl development files to be
              installed.

       NOTE:  Information on how to install and configure InfluxDB and manage
              databases, retention policies and such is available on the
              official webpage.

       NOTE:  Collected information is written from every compute node where a
              job runs to the influxd instance listening on the
              ProfileInfluxDBHost. In order to avoid overloading the influxd
              instance with incoming connection requests, the plugin uses an
              internal buffer which is filled with samples. Once the buffer is
              full, an HTTP API write request is performed and the buffer is
              emptied to hold subsequent samples. A final request is also
              performed when a task ends even if the buffer isn't full.

       NOTE:  Failed HTTP API write requests are discarded. This means that
              collected profile information in the plugin buffer is lost if it
              can't be written to the influxd database for any reason.
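
       The buffering behavior described in the two notes above can be
       modelled with a small Python sketch. It is a simplified model, not the
       plugin's implementation: the class name, the capacity counted in
       samples, and the send callback are all assumptions for illustration.

```python
class SampleBuffer:
    """Collect samples and send them in batches through a caller-supplied function."""

    def __init__(self, capacity, send):
        self.capacity = capacity  # hypothetical size, counted in samples
        self.send = send          # callable taking a list; returns True on success
        self.samples = []
        self.dropped = 0

    def add(self, sample):
        self.samples.append(sample)
        if len(self.samples) >= self.capacity:
            self.flush()          # buffer full: perform one write request

    def flush(self):
        """Send whatever is buffered; a failed write is discarded, not retried."""
        if not self.samples:
            return
        batch, self.samples = self.samples, []  # buffer is emptied either way
        if not self.send(batch):
            self.dropped += len(batch)

    def task_end(self):
        self.flush()              # final request even if the buffer is not full


sent = []
buf = SampleBuffer(capacity=3, send=lambda batch: sent.append(batch) or True)
for value in range(7):
    buf.add(value)
buf.task_end()
print(sent)
```

       Batching trades a small delay for far fewer connections to influxd;
       the cost is that one failed request loses every sample in that batch.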

       NOTE:  Plugin messages are logged along with the slurmstepd logs to
              SlurmdLogFile. In order to troubleshoot any issues, it is
              recommended to temporarily increase the slurmd debug level to
              debug3 and add Profile to the debug flags. This can be
              accomplished by setting the slurm.conf SlurmdDebug and
              DebugFlags respectively or dynamically through scontrol
              setdebug and setdebugflags.

       NOTE:  Consider using a monitoring and analytics tool such as Grafana
              on top of InfluxDB. Such tools permit one to create dashboards,
              tables, and other graphics using the stored time series. This
              makes it easier to correlate resource usage peaks reported by
              other node monitoring tools such as Ganglia with specific job
              step tasks.

       InfinibandOFED
              Options used for AcctGatherInfinibandType/ofed are as follows:

              InfinibandOFEDPort=<number>
                        This parameter represents the port number of the local
                        Infiniband card to monitor. The default port is 1.

EXAMPLE

       ###
       # Slurm acct_gather configuration file
       ###
       # Parameters for AcctGatherEnergyType/ipmi plugin
       EnergyIPMIFrequency=10
       EnergyIPMICalcAdjustment=yes
       #
       # Parameters for AcctGatherProfileType/hdf5 plugin
       ProfileHDF5Dir=/app/slurm/profile_data
       # Parameters for AcctGatherInfinibandType/ofed plugin
       InfinibandOFEDPort=1

COPYING

       Copyright (C) 2012-2013 Bull. Produced at Bull (cf, DISCLAIMER).

       This file is part of Slurm, a resource management program. For
       details, see <https://slurm.schedmd.com/>.

       Slurm is free software; you can redistribute it and/or modify it under
       the terms of the GNU General Public License as published by the Free
       Software Foundation; either version 2 of the License, or (at your
       option) any later version.

       Slurm is distributed in the hope that it will be useful, but WITHOUT
       ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
       FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
       for more details.

SEE ALSO

       slurm.conf(5)

April 2015                 Slurm Configuration File        acct_gather.conf(5)