slurmdbd.conf(5) Slurm Configuration File slurmdbd.conf(5)

NAME
slurmdbd.conf - Slurm Database Daemon (SlurmDBD) configuration file

DESCRIPTION
10 slurmdbd.conf is an ASCII file which describes Slurm Database Daemon
11 (SlurmDBD) configuration information. The file location can be modi‐
12 fied at system build time using the DEFAULT_SLURM_CONF parameter or at
13 execution time by setting the SLURM_CONF environment variable.
14
15 The contents of the file are case insensitive except for the names of
16 nodes and files. Any text following a "#" in the configuration file is
17 treated as a comment through the end of that line. Changes to the con‐
18 figuration file take effect upon restart of SlurmDBD or daemon receipt
19 of the SIGHUP signal unless otherwise noted.
20
This file should exist only on the computer where SlurmDBD executes
and should be readable only by the user who executes SlurmDBD (e.g.
"slurm"). If the slurmdbd daemon is started as user root and changes
to another user ID, the configuration file will initially be read as
user root, but will be read as the other user ID in response to a
SIGHUP signal. This file should be protected from unauthorized access
since it contains a database password. The overall configuration
parameters available include:
29
30
31 ArchiveDir
If ArchiveScript is not set, the slurmdbd will generate a file
that can be read in at any time with sacctmgr load filename. This
directory is where the file will be placed after a purge event
has happened and archiving for that element is set to true. The
default is /tmp. The format of the file name is
$ArchiveDir/$ClusterName_$ArchiveObject_archive_$BeginTimeStamp_$endTimeStamp.
Archive files are limited to 50000 records per file. If more than
50000 records exist during that time period, they will be written
to a new file. Subsequent archive files during the same time period
will have ".<number>" appended to the file name, for example .2,
with the number increasing by one for each file in the same time
period.
44
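As an illustration of how ArchiveDir combines with the Archive* and
Purge*After options, a minimal sketch follows. The directory
/var/spool/slurm/archive, the cluster name "cluster1" and the
retention period are assumptions chosen for the example, not
recommended values.

```
# Hypothetical archive settings (path and period are assumptions)
ArchiveDir=/var/spool/slurm/archive
ArchiveJobs=yes
PurgeJobAfter=12month
# Purged job records would then be written to files named like:
#   /var/spool/slurm/archive/cluster1_<ArchiveObject>_archive_<BeginTimeStamp>_<endTimeStamp>
# with .2, .3, ... appended to later files in the same time period.
```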
45
46 ArchiveEvents
47 When purging events also archive them. Boolean, yes to archive
48 event data, no otherwise. Default is no.
49
50
51 ArchiveJobs
52 When purging jobs also archive them. Boolean, yes to archive
53 job data, no otherwise. Default is no.
54
55
56 ArchiveResvs
57 When purging reservations also archive them. Boolean, yes to
58 archive reservation data, no otherwise. Default is no.
59
60
61 ArchiveScript
62 This script can be executed every time a rollup happens (every
63 hour, day and month), depending on the Purge*After options.
64 This script is used to transfer accounting records out of the
65 database into an archive. It is used in place of the internal
66 process used to archive objects. The script is executed with no
67 arguments, and the following environment variables are set.
68
69 SLURM_ARCHIVE_EVENTS
70 1 for archive events 0 otherwise.
71
72 SLURM_ARCHIVE_LAST_EVENT
73 Time of last event start to archive.
74
75 SLURM_ARCHIVE_JOBS
76 1 for archive jobs 0 otherwise.
77
78 SLURM_ARCHIVE_LAST_JOB
79 Time of last job submit to archive.
80
81 SLURM_ARCHIVE_STEPS
82 1 for archive steps 0 otherwise.
83
84 SLURM_ARCHIVE_LAST_STEP
85 Time of last step start to archive.
86
87 SLURM_ARCHIVE_SUSPEND
88 1 for archive suspend data 0 otherwise.
89
90 SLURM_ARCHIVE_TXN
91 1 for archive transaction data 0 otherwise.
92
93 SLURM_ARCHIVE_USAGE
94 1 for archive usage data 0 otherwise.
95
96 SLURM_ARCHIVE_LAST_SUSPEND
97 Time of last suspend start to archive.
98
99
100
101 ArchiveSteps
102 When purging steps also archive them. Boolean, yes to archive
103 step data, no otherwise. Default is no.
104
105
106 ArchiveSuspend
107 When purging suspend data also archive it. Boolean, yes to ar‐
108 chive suspend data, no otherwise. Default is no.
109
110
111 ArchiveTXN
112 When purging transaction data also archive it. Boolean, yes to
113 archive transaction data, no otherwise. Default is no.
114
115
116 ArchiveUsage
When purging usage data (Cluster, Association and WCKey) also
archive it. Boolean, yes to archive usage data, no otherwise.
Default is no.
120
121
122 AuthInfo
123 Additional information to be used for authentication of communi‐
124 cations with the Slurm control daemon (slurmctld) on each clus‐
125 ter. The interpretation of this option is specific to the con‐
126 figured AuthType. In the case of auth/munge, this can be con‐
127 figured to use a Munge daemon specifically configured to provide
128 authentication between clusters while the default Munge daemon
provides authentication within a cluster. In that case, this
will specify the pathname of the socket to use. By default this
value is left unspecified, which results in the default
authentication mechanism being used.
133
134
135 AuthAltTypes
Comma separated list of alternative authentication plugins
that the slurmdbd will permit for communication.
138
139
140 AuthAltParameters
Used to define options for the alternative authentication plugins.
Multiple options may be comma separated.
143
144 jwks= Absolute path to JWKS file. Only RS256 keys are sup‐
145 ported, although other key types may be listed in the
146 file. If set, no HS256 key will be loaded by default (and
147 token generation is disabled), although the jwt_key set‐
148 ting may be used to explicitly re-enable HS256 key use
149 (and token generation).
150
151 jwt_key=
152 Absolute path to JWT key file. Key must be HS256, and
153 should only be accessible by SlurmUser.
154
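A minimal sketch of enabling an alternative authentication plugin
alongside the default is shown here. It assumes that auth/jwt is the
alternative plugin being enabled and uses a hypothetical key
location, /etc/slurm/jwt_hs256.key; neither is prescribed by this
page.

```
# Assumption: auth/jwt as the alternative plugin; key path is hypothetical
AuthType=auth/munge
AuthAltTypes=auth/jwt
AuthAltParameters=jwt_key=/etc/slurm/jwt_hs256.key
```

As noted for jwt_key above, the key file should be accessible only
by SlurmUser.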
155
156 AuthType
157 Define the authentication method for communications between
158 Slurm components. Acceptable values at present include
159 "auth/munge", which is the default. "auth/munge" indicates that
160 LLNL's MUNGE system is to be used (this is the supported authen‐
161 tication mechanism for Slurm; see "https://dun.github.io/munge/"
162 for more information). SlurmDBD must be terminated prior to
163 changing the value of AuthType and later restarted.
164
165
166 CommitDelay
The number of seconds between commits on a connection from a
slurmctld. Setting this speeds up inserts into the database
dramatically, and should be considered if you are running a very
high throughput of jobs. In testing, 1 second improves the slurmdbd
performance dramatically and reduces overhead. There is a small
probability of data loss, since this creates a window in which
uncommitted data could be lost if the slurmdbd segfaults or exits
abnormally for any reason. The risk is quite low, but still present,
and setting this option may be the only way to run in extremely
heavy environments.
179
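For a high-throughput site, the setting might look like the
following sketch. The value of 1 second is the one mentioned from
testing above; whether it suits a given site is an assumption to
verify locally.

```
# Commit at most once per second on each slurmctld connection
CommitDelay=1
```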
180
181 CommunicationParameters
182 Comma separated options identifying communication options.
183
184 DisableIPv4 Disable IPv4 only operation for the slurmdbd.
185 This should also be set in your slurm.conf file.
186
187 EnableIPv6 Enable using IPv6 addresses for the slurmdbd.
188 When using both IPv4 and IPv6, address family
189 preferences will be based on your /etc/gai.conf
190 file. This should also be set in your slurm.conf
191 file.
192
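A short sketch of the two options follows. Combining both options
for IPv6-only operation is an assumption drawn from the descriptions
above rather than a documented recipe, so it is left commented out.

```
# Enable IPv6 addresses for the slurmdbd (IPv4 remains available)
CommunicationParameters=EnableIPv6
# Assumed form for IPv6-only operation:
#CommunicationParameters=DisableIPv4,EnableIPv6
```

Whichever value is chosen should also be set in slurm.conf, as
stated above.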
193
194 DbdBackupHost
195 The short, or long, name of the machine where the backup Slurm
196 Database Daemon is executed (i.e. the name returned by the com‐
197 mand "hostname -s"). This host must have access to the same un‐
198 derlying database specified by the 'Storage' options mentioned
199 below.
200
201
202 DbdAddr
203 Name that DbdHost should be referred to in establishing a commu‐
204 nications path. This name will be used as an argument to the
205 getaddrinfo() function for identification. For example,
206 "elx0000" might be used to designate the Ethernet address for
207 node "lx0000". By default the DbdAddr will be identical in
208 value to DbdHost.
209
210
211 DbdHost
212 The short, or long, name of the machine where the Slurm Database
213 Daemon is executed (i.e. the name returned by the command "host‐
214 name -s"). This value must be specified.
215
216
217 DbdPort
The port number on which the Slurm Database Daemon (slurmdbd)
listens for work. The default value is SLURMDBD_PORT as established
at system build time. If no value is explicitly specified, it will
be set to 6819. This value must be equal to the
AccountingStoragePort parameter in the slurm.conf file.
223
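The Dbd* options above could be combined as in the following
sketch. The host names dbd1, dbd2 and edbd1 are hypothetical and
stand in for a primary host, a backup host and the primary's
Ethernet address name.

```
# Hypothetical host names
DbdHost=dbd1
DbdAddr=edbd1
DbdBackupHost=dbd2
DbdPort=6819
# slurm.conf on each cluster would then need the matching port:
#   AccountingStoragePort=6819
```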
224
225 DebugFlags
226 Defines specific subsystems which should provide more detailed
227 event logging. Multiple subsystems can be specified with comma
228 separators. Most DebugFlags will result in verbose logging for
229 the identified subsystems and could impact performance. Valid
230 subsystems available today (with more to come) include:
231
232 DB_ARCHIVE
233 SQL statements/queries when dealing with archiving and
234 purging the database.
235
236 DB_ASSOC
237 SQL statements/queries when dealing with associations in
238 the database.
239
240 DB_EVENT
241 SQL statements/queries when dealing with (node) events in
242 the database.
243
244 DB_JOB
245 SQL statements/queries when dealing with jobs in the
246 database.
247
248 DB_QOS
249 SQL statements/queries when dealing with QOS in the data‐
250 base.
251
252 DB_QUERY
253 SQL statements/queries when dealing with transactions and
254 such in the database.
255
256 DB_RESERVATION
257 SQL statements/queries when dealing with reservations in
258 the database.
259
260 DB_RESOURCE
261 SQL statements/queries when dealing with resources like
262 licenses in the database.
263
264 DB_STEP
265 SQL statements/queries when dealing with steps in the
266 database.
267
268 DB_TRES
269 SQL statements/queries when dealing with trackable re‐
270 sources in the database.
271
272 DB_USAGE
273 SQL statements/queries when dealing with usage queries
274 and inserts in the database.
275
276 DB_WCKEY
277 SQL statements/queries when dealing with wckeys in the
278 database.
279
280 FEDERATION
281 SQL statements/queries when dealing with federations in
282 the database.
283
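As an example of the comma separated form, the following sketch
enables detailed SQL logging for two of the subsystems listed
above. The choice of subsystems is arbitrary.

```
# Log SQL statements for job and step handling only
DebugFlags=DB_JOB,DB_STEP
```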
284
285 DebugLevel
The level of detail to provide in the Slurm Database Daemon's
logs. The default value is info.
288
289 quiet Log nothing
290
291 fatal Log only fatal errors
292
293 error Log only errors
294
295 info Log errors and general informational messages
296
297 verbose Log errors and verbose informational messages
298
299 debug Log errors and verbose informational messages and de‐
300 bugging messages
301
302 debug2 Log errors and verbose informational messages and more
303 debugging messages
304
305 debug3 Log errors and verbose informational messages and even
306 more debugging messages
307
308 debug4 Log errors and verbose informational messages and even
309 more debugging messages
310
311 debug5 Log errors and verbose informational messages and even
312 more debugging messages
313
314
315 DebugLevelSyslog
The slurmdbd daemon will log events to the syslog file at the
specified level of detail. If not set, the slurmdbd daemon will
log to syslog at level fatal, with two exceptions. If there is no
LogFile and the daemon is running in the background, it will log
to syslog at the level specified by DebugLevel (or at fatal if
DebugLevel is set to quiet). If the daemon is run in the
foreground, the syslog level will be set to quiet.
323
324
325 quiet Log nothing
326
327 fatal Log only fatal errors
328
329 error Log only errors
330
331 info Log errors and general informational messages
332
333 verbose Log errors and verbose informational messages
334
335 debug Log errors and verbose informational messages and de‐
336 bugging messages
337
338 debug2 Log errors and verbose informational messages and more
339 debugging messages
340
341 debug3 Log errors and verbose informational messages and even
342 more debugging messages
343
344 debug4 Log errors and verbose informational messages and even
345 more debugging messages
346
347 debug5 Log errors and verbose informational messages and even
348 more debugging messages
349
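One way to combine the logging options is sketched below: detailed
messages go to a log file while syslog receives only errors. The
log file path is the one used in this page's sample configuration;
the chosen levels are assumptions for illustration.

```
# Detailed messages to the log file, only errors to syslog
LogFile=/var/log/slurmdbd.log
DebugLevel=debug
DebugLevelSyslog=error
```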
350
351
352 DefaultQOS
When adding a new cluster, this will be used as the QOS for the
cluster unless the admin explicitly sets one when creating the
cluster.
356
357
358 LogFile
359 Fully qualified pathname of a file into which the Slurm Database
360 Daemon's logs are written. The default value is none (performs
361 logging via syslog).
362 See the section LOGGING in the slurm.conf man page if a pathname
363 is specified.
364
365
366 LogTimeFormat
Format of the timestamp in slurmdbd log files. Accepted values
are "iso8601", "iso8601_ms", "rfc5424", "rfc5424_ms", "clock",
"short" and "thread_id". The values ending in "_ms" differ from
the ones without in that fractional seconds with millisecond
precision are printed. The default value is "iso8601_ms". The
"rfc5424" formats are the same as the "iso8601" formats except
that the timezone value is also shown. The "clock" format shows a
timestamp in microseconds retrieved with the C standard clock()
function. The "short" format is a short date and time format.
The "thread_id" format shows the timestamp in the C standard
ctime() function form without the year but including the
microseconds, the daemon's process ID and the current thread ID.
379
380
381 MaxQueryTimeRange
Return an error if a query is against too large a time span,
to prevent ill-formed queries from causing performance problems
within SlurmDBD. Default value is INFINITE, which allows any
query to proceed. Accepted time formats are the same as for the
MaxTime option in slurm.conf. SlurmUser and root are exempt from
this restriction. Note that queries which attempt to return over
3GB of data will still fail to complete with
ESLURM_RESULT_TOO_LARGE.
390
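A minimal sketch of limiting query time spans follows. It assumes
the "days-hours" form, one of the time formats accepted by MaxTime
in slurm.conf; the 60 day limit is an arbitrary example value.

```
# Reject queries spanning more than 60 days (days-hours form)
MaxQueryTimeRange=60-0
```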
391
392 MessageTimeout
393 Time permitted for a round-trip communication to complete in
394 seconds. Default value is 10 seconds.
395
396
397 Parameters
398 Contains arbitrary comma separated parameters used to alter the
399 behavior of the slurmdbd.
400
401 PreserveCaseUser
When defining users, do not force user names to lower case.
Forcing lower case is the default behavior.
404
405
406 PidFile
Fully qualified pathname of a file into which the Slurm Database
Daemon may write its process ID. This may be used for automated
signal processing. The default value is "/var/run/slurmdbd.pid".
411
412
413 PluginDir
414 Identifies the places in which to look for Slurm plugins. This
415 is a colon-separated list of directories, like the PATH environ‐
416 ment variable. The default value is the prefix given at config‐
417 ure time + "/lib/slurm".
418
419
420 PrivateData
421 This controls what type of information is hidden from regular
422 users. By default, all information is visible to all users.
423 User SlurmUser, root, and users with AdminLevel=Admin can always
424 view all information. Multiple values may be specified with a
425 comma separator. Acceptable values include:
426
427 accounts
428 prevents users from viewing any account definitions un‐
429 less they are coordinators of them.
430
431 events prevents users from viewing event information unless they
432 have operator status or above.
433
434 jobs prevents users from viewing job records belonging to
435 other users unless they are coordinators of the account
436 running the job when using sacct.
437
438 reservations
439 restricts getting reservation information to users with
440 operator status and above.
441
442 usage prevents users from viewing usage of any other user.
443 This applies to sreport.
444
users prevents users from viewing information of any user other
than themselves. It also means users can only see associations
they deal with. Coordinators can see associations of all users
in the accounts they coordinate, but can only see themselves
when listing users.
450
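Several of the values above can be combined with commas, as in this
sketch. The particular selection is an assumption for illustration.

```
# Hide other users' job records and usage, and restrict user listings
PrivateData=jobs,usage,users
```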
451
452 PurgeEventAfter
453 Events happening on the cluster over this age are purged from
454 the database. This includes node down times and such. The time
is a numeric value and is a number of months. To purge more
often, append "hours" or "days" to the numeric value (e.g. a
value of "12hours" would purge everything older than 12 hours).
The purge takes place at the start of each purge interval. For
460 example, if the purge time is 2 months, the purge would happen
461 at the beginning of each month. If not set (default), then
462 event records are never purged.
463
464
465 PurgeJobAfter
Individual job records over this age are purged from the
database. Aggregated information will be preserved until
"PurgeUsageAfter". The time is a numeric value and is a number
of months. To purge more often, append "hours" or "days" to the
numeric value (e.g. a value of "12hours" would purge everything
older than 12 hours). The purge takes place at the start of each
purge interval. For example, if the purge time is 2
474 months, the purge would happen at the beginning of each month.
475 If not set (default), then job records are never purged.
476
477
478 PurgeResvAfter
479 Individual reservation records over this age are purged from the
database. Aggregated information will be preserved until
"PurgeUsageAfter". The time is a numeric value and is a number
of months. To purge more often, append "hours" or "days" to the
numeric value (e.g. a value of "12hours" would purge everything
older than 12 hours). The purge takes place at the start of each
purge interval. For example, if the purge time is 2
487 months, the purge would happen at the beginning of each month.
488 If not set (default), then reservation records are never purged.
489
490
491 PurgeStepAfter
492 Individual job step records over this age are purged from the
database. Aggregated information will be preserved until
"PurgeUsageAfter". The time is a numeric value and is a number
of months. To purge more often, append "hours" or "days" to the
numeric value (e.g. a value of "12hours" would purge everything
older than 12 hours). The purge takes place at the start of each
purge interval. For example, if the purge time is 2
500 months, the purge would happen at the beginning of each month.
501 If not set (default), then job step records are never purged.
502
503
504 PurgeSuspendAfter
505 Records of individual suspend times for jobs over this age are
purged from the database. Aggregated information will be
preserved until "PurgeUsageAfter". The time is a numeric value
and is a number of months. To purge more often, append "hours"
or "days" to the numeric value (e.g. a value of "12hours" would
purge everything older than 12 hours). The purge takes place at
the start of each purge interval. For example, if the purge
513 time is 2 months, the purge would happen at the beginning of
514 each month. If not set (default), then suspend records are
515 never purged.
516
517
518 PurgeTXNAfter
519 Records of individual transaction times for transactions over
520 this age are purged from the database. The time is a numeric
value and is a number of months. To purge more often, append
"hours" or "days" to the numeric value (e.g. a value of
"12hours" would purge everything older than 12 hours). The
purge takes place at the start of each purge interval. For
example, if
526 the purge time is 2 months, the purge would happen at the begin‐
527 ning of each month. If not set (default), then transaction
528 records are never purged.
529
530
531 PurgeUsageAfter
532 Usage Records (Cluster, Association and WCKey) over this age are
533 purged from the database. The time is a numeric value and is a
number of months. To purge more often, append "hours" or "days"
to the numeric value (e.g. a value of "12hours" would purge
everything older than 12 hours). The purge takes place at the
start of each purge interval. For example, if the purge
539 time is 2 months, the purge would happen at the beginning of
540 each month. If not set (default), then usage records are never
541 purged.
542
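The Purge*After time formats described above can be sketched as
follows. The retention periods are arbitrary example values; the
"month" spelling follows this page's sample configuration.

```
# Months, written as in the sample configuration
PurgeJobAfter=12month
PurgeUsageAfter=24month
# Shorter retention using the "days" and "hours" suffixes
PurgeStepAfter=30days
PurgeEventAfter=12hours
```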
543
544 SlurmUser
The name of the user that the slurmdbd daemon executes as. This
user must exist on the machine executing the Slurm Database
Daemon and have the same UID as on the hosts on which slurmctld
executes. For security purposes, a user other than "root" is
recommended. The default value is "root". This name should also
be the same SlurmUser on all clusters reporting to the SlurmDBD.
NOTE: If this user is different from the one set for slurmctld
and is not root, it must be added to accounting with
AdminLevel=Admin and slurmctld must be restarted.
554
555
556 StorageHost
Define the name of the host running the database in which the
data will be stored. Ideally this should be the host on which
slurmdbd executes.
560
561
562 StorageBackupHost
Define the name of the backup host running the database in which
the data will be stored. This can be viewed as a backup solution
when the StorageHost is not responding. It is up to the backup
solution to enforce the coherency of the accounting information
between the two hosts. With clustered database solutions
(active/passive HA), you would not need to use this feature.
Default is none.
570
571
572 StorageLoc
573 Specify the name of the database as the location where account‐
574 ing records are written. Defaults to "slurm_acct_db".
575
576
577 StorageParameters
578 Comma separated list of key-value pair parameters. Currently
579 supported values include options to establish a secure connec‐
580 tion to the database:
581
582 SSL_CERT
583 The path name of the client public key certificate file.
584
585 SSL_CA
586 The path name of the Certificate Authority (CA) certificate
587 file.
588
589 SSL_CAPATH
590 The path name of the directory that contains trusted SSL CA
591 certificate files.
592
593 SSL_KEY
594 The path name of the client private key file.
595
596 SSL_CIPHER
597 The list of permissible ciphers for SSL encryption.
598
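A sketch of a secure database connection follows. It assumes each
pair is written as KEY=value, and the certificate locations under
/etc/slurm/ssl are hypothetical.

```
# Hypothetical certificate paths; assumed KEY=value pairs, comma separated
StorageParameters=SSL_CA=/etc/slurm/ssl/ca.pem,SSL_CERT=/etc/slurm/ssl/client-cert.pem,SSL_KEY=/etc/slurm/ssl/client-key.pem
```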
599
600 StoragePass
601 Define the password used to gain access to the database to store
602 the job accounting data. The '#' character is not permitted in a
603 password.
604
605
606 StoragePort
The port number that the Slurm Database Daemon (slurmdbd) uses
to communicate with the database. Default is 3306.
609
610
611 StorageType
612 Define the accounting storage mechanism type. Acceptable values
613 at present include "accounting_storage/mysql". The value "ac‐
614 counting_storage/mysql" indicates that accounting records should
615 be written to a MySQL or MariaDB database specified by the Stor‐
616 ageLoc parameter. This value must be specified.
617
618
619 StorageUser
Define the name of the user used to connect to the database to
store the job accounting data.
622
623
624 TCPTimeout
625 Time permitted for TCP connection to be established. Default
626 value is 2 seconds.
627
628
629 TrackSlurmctldDown
630 Boolean yes or no. If set the slurmdbd will mark all idle re‐
631 sources on the cluster as down when a slurmctld disconnects or
632 is no longer reachable. The default is no.
633
634
635 TrackWCKey
Boolean yes or no. Enables display and tracking of the Workload
Characterization Key (WCKey). This must be set to track WCKey
usage and to generate rolled up usage tables from WCKeys.
NOTE: If TrackWCKey is set here and not in your various
slurm.conf files, all jobs will be attributed to their default
WCKey.
642
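The NOTE above implies that the option is set in two places. A
sketch of that pairing follows; that slurm.conf accepts the option
under the same name is an assumption drawn from the NOTE.

```
# slurmdbd.conf
TrackWCKey=yes
# slurm.conf on every cluster reporting to this slurmdbd
# (assumed to take the same option name)
#TrackWCKey=yes
```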
EXAMPLE
#
# Sample /etc/slurmdbd.conf
#
ArchiveEvents=yes
ArchiveJobs=yes
ArchiveResvs=yes
ArchiveSteps=no
ArchiveSuspend=no
ArchiveTXN=no
ArchiveUsage=no
#ArchiveScript=/usr/sbin/slurm.dbd.archive
AuthInfo=/var/run/munge/munge.socket.2
AuthType=auth/munge
DbdHost=db_host
DebugLevel=info
PurgeEventAfter=1month
PurgeJobAfter=12month
PurgeResvAfter=1month
PurgeStepAfter=1month
PurgeSuspendAfter=1month
PurgeTXNAfter=12month
PurgeUsageAfter=24month
LogFile=/var/log/slurmdbd.log
PidFile=/var/run/slurmdbd.pid
SlurmUser=slurm_mgr
StoragePass=password_to_database
StorageType=accounting_storage/mysql
StorageUser=database_mgr
673
COPYING
676 Copyright (C) 2008-2010 Lawrence Livermore National Security. Produced
677 at Lawrence Livermore National Laboratory (cf, DISCLAIMER).
678 Copyright (C) 2010-2021 SchedMD LLC.
679
680 This file is part of Slurm, a resource management program. For de‐
681 tails, see <https://slurm.schedmd.com/>.
682
683 Slurm is free software; you can redistribute it and/or modify it under
684 the terms of the GNU General Public License as published by the Free
685 Software Foundation; either version 2 of the License, or (at your op‐
686 tion) any later version.
687
688 Slurm is distributed in the hope that it will be useful, but WITHOUT
689 ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
690 FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
691 for more details.
692
FILES
/etc/slurmdbd.conf
696
SEE ALSO
slurm.conf(5), slurmctld(8), slurmdbd(8), syslog(2)

June 2021 Slurm Configuration File slurmdbd.conf(5)