slurmdbd.conf(5)        Slurm Configuration File        slurmdbd.conf(5)

slurmdbd.conf - Slurm Database Daemon (SlurmDBD) configuration file

10 slurmdbd.conf is an ASCII file which describes Slurm Database Daemon
11 (SlurmDBD) configuration information. The file will always be located
12 in the same directory as the slurm.conf.
13
14 The contents of the file are case insensitive except for the names of
15 nodes and files. Any text following a "#" in the configuration file is
16 treated as a comment through the end of that line. Changes to the con‐
17 figuration file take effect upon restart of SlurmDBD or daemon receipt
18 of the SIGHUP signal unless otherwise noted.
19
This file should exist only on the computer where SlurmDBD executes and
should be readable only by the user who executes SlurmDBD (e.g.
"slurm"). If the slurmdbd daemon is started as user root and changes
to another user ID, the configuration file will initially be read as
user root, but will be read as the other user ID in response to a
SIGHUP signal. This file should be protected from unauthorized access
since it contains a database password. The available configuration
parameters include:
28
29
30 AllowNoDefAcct
31 Remove requirement for users to have a default account. Bool‐
32 ean, yes to turn on, no (default) to enforce default accounts.
33
34 ArchiveDir
If ArchiveScript is not set, the slurmdbd will generate a file
that can be read in at any time with sacctmgr load filename. This
directory is where the file will be placed after a purge event
has happened and archive for that element is set to true. The
default is /tmp. The format of the file name is

$ArchiveDir/$ClusterName_$ArchiveObject_archive_$BeginTimeStamp_$endTimeStamp

Archive files are limited to 50000 records per file. If more than
50000 records exist during that time period, they will be written
to a new file. Subsequent archive files during the same time period
will have ".<number>" appended to the file name, for example .2,
with the number increasing by one for each file in the same time
period.
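
As an illustration of how the archive options fit together, the
fragment below is a minimal sketch that archives job records to a
directory when they are purged. The directory path is hypothetical
and is assumed to be writable by the user running slurmdbd.

```conf
# Hypothetical archive location (the default would be /tmp).
ArchiveDir=/var/spool/slurm/archive
# Archive job records when they are purged.
ArchiveJobs=yes
# Purge job records older than 12 months.
PurgeJobAfter=12month
```

With these settings and no ArchiveScript, the generated files would
follow the file name format described for ArchiveDir.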
47
48 ArchiveEvents
49 When purging events also archive them. Boolean, yes to archive
50 event data, no otherwise. Default is no.
51
52 ArchiveJobs
53 When purging jobs also archive them. Boolean, yes to archive
54 job data, no otherwise. Default is no.
55
56 ArchiveResvs
57 When purging reservations also archive them. Boolean, yes to
58 archive reservation data, no otherwise. Default is no.
59
60 ArchiveScript
61 This script can be executed every time a rollup happens (every
62 hour, day and month), depending on the Purge*After options.
63 This script is used to transfer accounting records out of the
64 database into an archive. It is used in place of the internal
65 process used to archive objects. The script is executed with no
66 arguments, and the following environment variables are set.
67
SLURM_ARCHIVE_EVENTS
1 to archive events, 0 otherwise.

SLURM_ARCHIVE_LAST_EVENT
Time of last event start to archive.

SLURM_ARCHIVE_JOBS
1 to archive jobs, 0 otherwise.

SLURM_ARCHIVE_LAST_JOB
Time of last job submit to archive.

SLURM_ARCHIVE_STEPS
1 to archive steps, 0 otherwise.

SLURM_ARCHIVE_LAST_STEP
Time of last step start to archive.

SLURM_ARCHIVE_SUSPEND
1 to archive suspend data, 0 otherwise.

SLURM_ARCHIVE_LAST_SUSPEND
Time of last suspend start to archive.

SLURM_ARCHIVE_TXN
1 to archive transaction data, 0 otherwise.

SLURM_ARCHIVE_USAGE
1 to archive usage data, 0 otherwise.
97
98 ArchiveSteps
99 When purging steps also archive them. Boolean, yes to archive
100 step data, no otherwise. Default is no.
101
102 ArchiveSuspend
103 When purging suspend data also archive it. Boolean, yes to ar‐
104 chive suspend data, no otherwise. Default is no.
105
106 ArchiveTXN
107 When purging transaction data also archive it. Boolean, yes to
108 archive transaction data, no otherwise. Default is no.
109
110 ArchiveUsage
When purging usage data (Cluster, Association and WCKey) also
archive it. Boolean, yes to archive usage data, no otherwise.
Default is no.
114
115 AuthInfo
116 Additional information to be used for authentication of communi‐
117 cations with the Slurm control daemon (slurmctld) on each clus‐
118 ter. The interpretation of this option is specific to the con‐
119 figured AuthType. In the case of auth/munge, this can be con‐
120 figured to use a Munge daemon specifically configured to provide
121 authentication between clusters while the default Munge daemon
provides authentication within a cluster. In that case, this
option specifies the pathname of the socket to use. By default this
value is left unspecified, which results in the default
authentication mechanism being used.
126
127 AuthAltTypes
Comma-separated list of alternative authentication plugins
that the slurmdbd will permit for communication.
130
131 AuthAltParameters
Used to define options for the alternative authentication plugins.
Multiple options may be comma separated.
134
135 jwks= Absolute path to JWKS file. Only RS256 keys are sup‐
136 ported, although other key types may be listed in the
137 file. If set, no HS256 key will be loaded by default (and
138 token generation is disabled), although the jwt_key set‐
139 ting may be used to explicitly re-enable HS256 key use
140 (and token generation).
141
142 jwt_key=
143 Absolute path to JWT key file. Key must be HS256, and
144 should only be accessible by SlurmUser.
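
The two AuthAlt* options are normally set together. The fragment
below is a sketch only: the plugin name auth/jwt is an assumption
(this page does not name the alternative plugins), and the key path
is hypothetical.

```conf
# Assumed plugin name; not listed on this page.
AuthAltTypes=auth/jwt
# Hypothetical path to an HS256 key readable only by SlurmUser.
AuthAltParameters=jwt_key=/etc/slurm/jwt_hs256.key
```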
145
146 AuthType
147 Define the authentication method for communications between
148 Slurm components. Acceptable values at present include
149 "auth/munge", which is the default. "auth/munge" indicates that
150 LLNL's MUNGE system is to be used (this is the supported authen‐
151 tication mechanism for Slurm; see "https://dun.github.io/munge/"
152 for more information). SlurmDBD must be terminated prior to
153 changing the value of AuthType and later restarted.
154
155 CommitDelay
How many seconds between commits on a connection from a slurmctld.
Setting this speeds up inserts into the database dramatically, so
consider it if you are running a very high throughput of jobs. In
testing, a value of 1 second improved slurmdbd performance
dramatically and reduced overhead. The trade-off is a small
probability of data loss: if the slurmdbd segfaults or exits
abnormally for any reason, data not yet committed could be lost.
This situation should be very rare, but setting CommitDelay may be
the only way to run in extremely heavy environments.
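
For a site that accepts this trade-off, a minimal sketch using the
1 second value mentioned in the description would be:

```conf
# Commit to the database at most once per second per slurmctld
# connection; uncommitted data may be lost on an abnormal exit.
CommitDelay=1
```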
168
169 CommunicationParameters
170 Comma separated options identifying communication options.
171
172 DisableIPv4 Disable IPv4 only operation for the slurmdbd.
173 This should also be set in your slurm.conf file.
174
175 EnableIPv6 Enable using IPv6 addresses for the slurmdbd.
176 When using both IPv4 and IPv6, address family
177 preferences will be based on your /etc/gai.conf
178 file. This should also be set in your slurm.conf
179 file.
180
181 keepaliveinterval=#
182 Specifies the interval between keepalive probes
183 on the socket communications between the backup
184 and primary slurmdbd. The default value is 30
185 seconds.
186
187 keepaliveprobes=#
188 Specifies the number of keepalive probes sent on
189 the socket communications between the backup and
190 primary slurmdbd. The default value is 3.
191
192 keepalivetime=#
193 Specifies how long to wait before sending
194 keepalive probes between the primary and backup
195 slurmdbd processes. The default value is 30 sec‐
196 onds.
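
As a sketch of how several of these options combine on one line, the
fragment below enables IPv6 and spells out the keepalive settings
using the default values documented above. Listing the defaults
explicitly is only for illustration.

```conf
# Options are comma separated; keepalive values shown are the
# documented defaults.
CommunicationParameters=EnableIPv6,keepalivetime=30,keepaliveinterval=30,keepaliveprobes=3
```

As the EnableIPv6 description states, the same option should also be
set in slurm.conf.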
197
198 DbdBackupHost
199 The short, or long, name of the machine where the backup Slurm
200 Database Daemon is executed (i.e. the name returned by the com‐
201 mand "hostname -s"). This host must have access to the same un‐
202 derlying database specified by the 'Storage' options mentioned
203 below.
204
205 DbdAddr
206 Name that DbdHost should be referred to in establishing a commu‐
207 nications path. This name will be used as an argument to the
208 getaddrinfo() function for identification. For example,
209 "elx0000" might be used to designate the Ethernet address for
210 node "lx0000". By default the DbdAddr will be identical in
211 value to DbdHost.
212
213 DbdHost
214 The short, or long, name of the machine where the Slurm Database
215 Daemon is executed (i.e. the name returned by the command "host‐
216 name -s"). This value must be specified.
217
218 DbdPort
The port number on which the Slurm Database Daemon (slurmdbd)
listens for work. The default value is SLURMDBD_PORT as established
at system build time. If no value is explicitly specified, it will
be set to 6819. This value must be equal to the
AccountingStoragePort parameter in the slurm.conf file.
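
The Dbd* options can be read together as in the sketch below. All
host names are hypothetical, and DbdAddr is shown only to illustrate
a name that differs from DbdHost.

```conf
# Hypothetical host names.
DbdHost=dbd1
DbdAddr=edbd1
DbdBackupHost=dbd2
# Must equal AccountingStoragePort in slurm.conf.
DbdPort=6819
```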
224
225 DebugFlags
226 Defines specific subsystems which should provide more detailed
227 event logging. Multiple subsystems can be specified with comma
228 separators. Most DebugFlags will result in verbose logging for
229 the identified subsystems and could impact performance. Valid
230 subsystems available today (with more to come) include:
231
232 DB_ARCHIVE
233 SQL statements/queries when dealing with archiving and
234 purging the database.
235
236 DB_ASSOC
237 SQL statements/queries when dealing with associations in
238 the database.
239
240 DB_EVENT
241 SQL statements/queries when dealing with (node) events in
242 the database.
243
244 DB_JOB SQL statements/queries when dealing with jobs in the
245 database.
246
247 DB_QOS SQL statements/queries when dealing with QOS in the data‐
248 base.
249
250 DB_QUERY
251 SQL statements/queries when dealing with transactions and
252 such in the database.
253
254 DB_RESERVATION
255 SQL statements/queries when dealing with reservations in
256 the database.
257
258 DB_RESOURCE
259 SQL statements/queries when dealing with resources like
260 licenses in the database.
261
262 DB_STEP
263 SQL statements/queries when dealing with steps in the
264 database.
265
266 DB_TRES
267 SQL statements/queries when dealing with trackable re‐
268 sources in the database.
269
270 DB_USAGE
271 SQL statements/queries when dealing with usage queries
272 and inserts in the database.
273
274 DB_WCKEY
275 SQL statements/queries when dealing with wckeys in the
276 database.
277
278 FEDERATION
279 SQL statements/queries when dealing with federations in
280 the database.
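
For example, to trace only archive/purge and usage activity, two of
the subsystems listed above could be combined as follows. This is a
sketch; the choice of flags is arbitrary.

```conf
# Verbose SQL logging for archiving/purging and usage rollups;
# this may impact performance.
DebugFlags=DB_ARCHIVE,DB_USAGE
```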
281
282 DebugLevel
283 The level of detail to provide the Slurm Database Daemon's logs.
284 The default value is info.
285
286 quiet Log nothing
287
288 fatal Log only fatal errors
289
290 error Log only errors
291
292 info Log errors and general informational messages
293
294 verbose Log errors and verbose informational messages
295
296 debug Log errors and verbose informational messages and de‐
297 bugging messages
298
299 debug2 Log errors and verbose informational messages and more
300 debugging messages
301
302 debug3 Log errors and verbose informational messages and even
303 more debugging messages
304
305 debug4 Log errors and verbose informational messages and even
306 more debugging messages
307
308 debug5 Log errors and verbose informational messages and even
309 more debugging messages
310
311 DebugLevelSyslog
The slurmdbd daemon will log events to the syslog file at the
specified level of detail. If not set, the slurmdbd daemon will
log to syslog at level fatal, with two exceptions. If there is no
LogFile and the daemon is running in the background, it will log
to syslog at the level specified by DebugLevel (or at fatal if
DebugLevel is set to quiet). If the daemon is run in the
foreground, the syslog level will be set to quiet.
319
320 quiet Log nothing
321
322 fatal Log only fatal errors
323
324 error Log only errors
325
326 info Log errors and general informational messages
327
328 verbose Log errors and verbose informational messages
329
330 debug Log errors and verbose informational messages and de‐
331 bugging messages
332
333 debug2 Log errors and verbose informational messages and more
334 debugging messages
335
336 debug3 Log errors and verbose informational messages and even
337 more debugging messages
338
339 debug4 Log errors and verbose informational messages and even
340 more debugging messages
341
342 debug5 Log errors and verbose informational messages and even
343 more debugging messages
344
345 NOTE: By default, Slurm's systemd service files start daemons in
346 the foreground with the -D option. This means that systemd will
347 capture stdout/stderr output and print that to syslog, indepen‐
348 dent of Slurm printing to syslog directly. To prevent systemd
349 from doing this, add "StandardOutput=null" and "StandardEr‐
350 ror=null" to the respective service files or override files.
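
The interaction between the logging options can be sketched as
follows. The log file path matches the sample configuration in this
page; the chosen levels are arbitrary.

```conf
# Detailed messages go to the log file ...
LogFile=/var/log/slurmdbd.log
DebugLevel=verbose
# ... while syslog receives only errors.
DebugLevelSyslog=error
```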
351
352 DefaultQOS
When adding a new cluster, this will be used as the QOS for the
cluster unless the admin explicitly sets one when creating the
cluster.
356
357 LogFile
358 Fully qualified pathname of a file into which the Slurm Database
359 Daemon's logs are written. The default value is none (performs
360 logging via syslog).
361 See the section LOGGING in the slurm.conf man page if a pathname
362 is specified.
363
364 LogTimeFormat
Format of the timestamp in slurmdbd log files. Accepted values
are "iso8601", "iso8601_ms", "rfc5424", "rfc5424_ms", "clock",
"short", and "thread_id". The values ending in "_ms" differ from
the ones without in that fractional seconds with millisecond
precision are printed. The default value is "iso8601_ms". The
"rfc5424" formats are the same as the "iso8601" formats except that
the timezone value is also shown. The "clock" format shows a
timestamp in microseconds retrieved with the C standard clock()
function. The "short" format is a short date and time format.
The "thread_id" format shows the timestamp in the C standard
ctime() function form without the year but including the
microseconds, the daemon's process ID and the current thread ID.
377
378 MaxQueryTimeRange
Return an error if a query is against too large a time span, to
prevent ill-formed queries from causing performance problems
within SlurmDBD. The default value is INFINITE, which allows any
query to proceed. Accepted time formats are the same as the
383 MaxTime option in slurm.conf. Operator and higher privileged
384 users are exempt from this restriction. Note that queries which
385 attempt to return over 3GB of data will still fail to complete
386 with ESLURM_RESULT_TOO_LARGE.
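
As a sketch, the fragment below limits queries from regular users to
a 60 day span. It assumes the "days-hours" form, which is one of the
time formats accepted by MaxTime in slurm.conf.

```conf
# Assumed "days-hours" format: 60 days, 0 hours.
MaxQueryTimeRange=60-0
```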
387
388 MessageTimeout
389 Time permitted for a round-trip communication to complete in
390 seconds. Default value is 10 seconds.
391
392 Parameters
393 Contains arbitrary comma separated parameters used to alter the
394 behavior of the slurmdbd.
395
396 PreserveCaseUser
Do not force user names to lower case when defining users.
By default, user names are forced to lower case.
399
400 PidFile
Fully qualified pathname of a file into which the Slurm Database
Daemon may write its process ID. This may be used for automated
signal processing. The default value is "/var/run/slurmdbd.pid".
405
406 PluginDir
407 Identifies the places in which to look for Slurm plugins. This
408 is a colon-separated list of directories, like the PATH environ‐
409 ment variable. The default value is the prefix given at config‐
410 ure time + "/lib/slurm".
411
412 PrivateData
413 This controls what type of information is hidden from regular
414 users. By default, all information is visible to all users.
415 User SlurmUser, root, and users with AdminLevel=Admin can always
416 view all information. Multiple values may be specified with a
417 comma separator. Acceptable values include:
418
419 accounts
420 prevents users from viewing any account definitions un‐
421 less they are coordinators of them.
422
423 events prevents users from viewing event information unless they
424 have operator status or above.
425
426 jobs prevents users from viewing job records belonging to
427 other users unless they are coordinators of the account
428 running the job when using sacct.
429
430 reservations
431 restricts getting reservation information to users with
432 operator status and above.
433
434 usage prevents users from viewing usage of any other user.
435 This applies to sreport.
436
users prevents users from viewing information of any user other
than themselves. It also restricts users to seeing only the
associations they deal with. Coordinators can see associations
of all users in the accounts they coordinate, but can only see
themselves when listing users.
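
Several of the values above can be combined with commas. The
fragment below is a sketch of a fairly restrictive setting; which
values to hide is a site decision.

```conf
# Hide other users' accounts, jobs, usage and user records from
# regular users.
PrivateData=accounts,jobs,usage,users
```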
442
443 PurgeEventAfter
Events happening on the cluster over this age are purged from
the database. This includes node down times and such. The time
is a numeric value and is a number of months. To purge more
often, append "hours" or "days" to the numeric value (e.g. a
value of "12hours" would purge everything older than 12 hours).
The purge takes place at the start of each purge interval. For
example, if the purge time is 2 months, the purge would happen
at the beginning of each month. If not set (default), then
event records are never purged.
454
455 PurgeJobAfter
Individual job records over this age are purged from the
database. Aggregated information will be preserved until the age
set by PurgeUsageAfter. The time is a numeric value and is a number
of months. To purge more often, append "hours" or "days" to the
numeric value (e.g. a value of "12hours" would purge everything
older than 12 hours). The purge takes place at the start of each
purge interval. For example, if the purge time is 2 months, the
purge would happen at the beginning of each month. If not set
(default), then job records are never purged.
466
467 PurgeResvAfter
Individual reservation records over this age are purged from the
database. Aggregated information will be preserved until the age
set by PurgeUsageAfter. The time is a numeric value and is a number
of months. To purge more often, append "hours" or "days" to the
numeric value (e.g. a value of "12hours" would purge everything
older than 12 hours). The purge takes place at the start of each
purge interval. For example, if the purge time is 2 months, the
purge would happen at the beginning of each month. If not set
(default), then reservation records are never purged.
478
479 PurgeStepAfter
Individual job step records over this age are purged from the
database. Aggregated information will be preserved until the age
set by PurgeUsageAfter. The time is a numeric value and is a number
of months. To purge more often, append "hours" or "days" to the
numeric value (e.g. a value of "12hours" would purge everything
older than 12 hours). The purge takes place at the start of each
purge interval. For example, if the purge time is 2 months, the
purge would happen at the beginning of each month. If not set
(default), then job step records are never purged.
490
491 PurgeSuspendAfter
Records of individual suspend times for jobs over this age are
purged from the database. Aggregated information will be preserved
until the age set by PurgeUsageAfter. The time is a numeric value
and is a number of months. To purge more often, append "hours" or
"days" to the numeric value (e.g. a value of "12hours" would purge
everything older than 12 hours). The purge takes place at the
start of each purge interval. For example, if the purge time is 2
months, the purge would happen at the beginning of each month. If
not set (default), then suspend records are never purged.
503
504 PurgeTXNAfter
Records of individual transactions over this age are purged from
the database. The time is a numeric value and is a number of
months. To purge more often, append "hours" or "days" to the
numeric value (e.g. a value of "12hours" would purge everything
older than 12 hours). The purge takes place at the start of each
purge interval. For example, if the purge time is 2 months, the
purge would happen at the beginning of each month. If not set
(default), then transaction records are never purged.
515
516 PurgeUsageAfter
Usage records (Cluster, Association and WCKey) over this age are
purged from the database. The time is a numeric value and is a
number of months. To purge more often, append "hours" or "days" to
the numeric value (e.g. a value of "12hours" would purge everything
older than 12 hours). The purge takes place at the start of each
purge interval. For example, if the purge time is 2 months, the
purge would happen at the beginning of each month. If not set
(default), then usage records are never purged.
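
The time syntax shared by the Purge*After options can be sketched as
follows. The retention periods are arbitrary examples, not
recommendations.

```conf
# A bare number is a number of months.
PurgeJobAfter=12
PurgeUsageAfter=24
# A unit suffix gives more frequent purges.
PurgeEventAfter=12hours
PurgeStepAfter=30days
```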
527
528 SlurmUser
The name of the user that the slurmdbd daemon executes as. This
user should match the SlurmUser used for all instances of
slurmctld that report to slurmdbd. It must exist on the machine
executing the Slurm Database Daemon and have the same UID there as
on the hosts on which slurmctld executes. For security purposes, a
user other than "root" is recommended. The default value is
"root".
536
537 NOTE: If the SlurmUser for slurmctld is root you can still use a
538 non-root SlurmUser for slurmdbd (in any other case, both Slur‐
539 mUsers should match) by explicitly setting the user's AdminLevel
540 to Admin. After adding a user in this way, you must restart
541 slurmctld.
542
543 StorageHost
Define the name of the host running the database in which the data
will be stored. Ideally this should be the host on which slurmdbd
executes.
547
548 StorageBackupHost
Define the name of the backup host running the database in which
the data will be stored. This can be viewed as a backup solution
when the StorageHost is not responding. It is up to the backup
solution to enforce the coherency of the accounting information
between the two hosts. With clustered database solutions
(active/passive HA), you would not need to use this feature.
Default is none.
556
557 StorageLoc
558 Specify the name of the database as the location where account‐
559 ing records are written. Defaults to "slurm_acct_db".
560
561 StorageParameters
562 Comma separated list of key-value pair parameters. Currently
563 supported values include options to establish a secure connec‐
564 tion to the database:
565
566 SSL_CERT
567 The path name of the client public key certificate file.
568
569 SSL_CA
570 The path name of the Certificate Authority (CA) certificate
571 file.
572
573 SSL_CAPATH
574 The path name of the directory that contains trusted SSL CA
575 certificate files.
576
577 SSL_KEY
578 The path name of the client private key file.
579
580 SSL_CIPHER
581 The list of permissible ciphers for SSL encryption.
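
As a sketch of the key-value syntax, the fragment below supplies a CA
certificate plus a client certificate and key. All three paths are
hypothetical.

```conf
# Hypothetical certificate and key paths; pairs are comma separated.
StorageParameters=SSL_CA=/etc/slurm/ssl/ca.pem,SSL_CERT=/etc/slurm/ssl/client-cert.pem,SSL_KEY=/etc/slurm/ssl/client-key.pem
```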
582
583 StoragePass
584 Define the password used to gain access to the database to store
585 the job accounting data. The '#' character is not permitted in a
586 password.
587
588 StoragePort
The port number that the Slurm Database Daemon (slurmdbd) uses to
communicate with the database. Default is 3306.
591
592 StorageType
593 Define the accounting storage mechanism type. Acceptable values
594 at present include "accounting_storage/mysql". The value "ac‐
595 counting_storage/mysql" indicates that accounting records should
596 be written to a MySQL or MariaDB database specified by the Stor‐
597 ageLoc parameter. This value must be specified.
598
599 StorageUser
Define the name of the user used to connect to the database to
store the job accounting data.
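
Taken together, the Storage* options describe one database
connection. The fragment below is a sketch using the documented
defaults for the port and database name. The host, user and password
are hypothetical placeholders.

```conf
StorageType=accounting_storage/mysql
# Hypothetical: database runs on the same host as slurmdbd.
StorageHost=localhost
StoragePort=3306
StorageLoc=slurm_acct_db
# Hypothetical credentials; the password must not contain '#'.
StorageUser=slurm
StoragePass=change_me
```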
602
603 TCPTimeout
604 Time permitted for TCP connection to be established. Default
605 value is 2 seconds.
606
607 TrackSlurmctldDown
608 Boolean yes or no. If set the slurmdbd will mark all idle re‐
609 sources on the cluster as down when a slurmctld disconnects or
610 is no longer reachable. The default is no.
611
612 TrackWCKey
Boolean yes or no. Enables display and tracking of the Workload
Characterization Key. Must be set to track wckey usage and to
generate rolled up usage tables from WCKeys.
NOTE: If TrackWCKey is set here and not in your various
slurm.conf files, all jobs will be attributed to their default
WCKey.
619
#
# Sample /etc/slurmdbd.conf
#
ArchiveEvents=yes
ArchiveJobs=yes
ArchiveResvs=yes
ArchiveSteps=no
ArchiveSuspend=no
ArchiveTXN=no
ArchiveUsage=no
#ArchiveScript=/usr/sbin/slurm.dbd.archive
AuthInfo=/var/run/munge/munge.socket.2
AuthType=auth/munge
DbdHost=db_host
DebugLevel=info
PurgeEventAfter=1month
PurgeJobAfter=12month
PurgeResvAfter=1month
PurgeStepAfter=1month
PurgeSuspendAfter=1month
PurgeTXNAfter=12month
PurgeUsageAfter=24month
LogFile=/var/log/slurmdbd.log
PidFile=/var/run/slurmdbd.pid
SlurmUser=slurm_mgr
StoragePass=password_to_database
StorageType=accounting_storage/mysql
StorageUser=database_mgr
649
650
652 Copyright (C) 2008-2010 Lawrence Livermore National Security. Produced
653 at Lawrence Livermore National Laboratory (cf, DISCLAIMER).
654 Copyright (C) 2010-2022 SchedMD LLC.
655
656 This file is part of Slurm, a resource management program. For de‐
657 tails, see <https://slurm.schedmd.com/>.
658
659 Slurm is free software; you can redistribute it and/or modify it under
660 the terms of the GNU General Public License as published by the Free
661 Software Foundation; either version 2 of the License, or (at your op‐
662 tion) any later version.
663
664 Slurm is distributed in the hope that it will be useful, but WITHOUT
665 ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
666 FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
667 for more details.
668
669
671 /etc/slurmdbd.conf
672
673
slurm.conf(5), slurmctld(8), slurmdbd(8), syslog(2)
676
677
678
October 2022            Slurm Configuration File        slurmdbd.conf(5)