1slurmdbd.conf(5) Slurm Configuration File slurmdbd.conf(5)
2
3
4
5 NAME
6 slurmdbd.conf - Slurm Database Daemon (SlurmDBD) configuration file
7
8
9 DESCRIPTION
10 slurmdbd.conf is an ASCII file which describes Slurm Database Daemon
11 (SlurmDBD) configuration information. The file location can be modi‐
12 fied at system build time using the DEFAULT_SLURM_CONF parameter or at
13 execution time by setting the SLURM_CONF environment variable.
14
15 The contents of the file are case insensitive except for the names of
16 nodes and files. Any text following a "#" in the configuration file is
17 treated as a comment through the end of that line. Changes to the
18 configuration file take effect upon restart of SlurmDBD or upon daemon
19 receipt of the SIGHUP signal unless otherwise noted.
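
As a rough illustration of the syntax rules above, the following Python sketch reads Key=Value lines, drops "#" comments and folds keyword case. It is not how slurmdbd itself parses the file; the helper name parse_conf and the sample text are made up for this example.

```python
def parse_conf(text):
    """Parse Key=Value lines; keywords are case insensitive, '#' starts a comment."""
    options = {}
    for raw in text.splitlines():
        # Anything after '#' is a comment through the end of the line.
        line = raw.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Keywords are folded to lower case; values keep their case because
        # they may be file or node names, which are case sensitive.
        options[key.strip().lower()] = value.strip()
    return options


# Hypothetical sample text, for illustration only.
sample = """
# Sample fragment
DbdHost=db_host      # trailing comment
debuglevel=info
LogFile=/var/log/slurmdbd.log
"""

conf = parse_conf(sample)
print(conf)
```

Because everything after "#" is discarded, a value cannot contain that character, which is consistent with the restriction on StoragePass described below.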
20
21 This file should be only on the computer where SlurmDBD executes and
22 should only be readable by the user which executes SlurmDBD (e.g.
23 "slurm"). If the slurmdbd daemon is started as user root and changes
24 to another user ID, the configuration file will initially be read as
25 user root, but will be read as the other user ID in response to a
26 SIGHUP signal. This file should be protected from unauthorized access
27 since it contains a database password. The overall configuration
28 parameters available include:
29
30
31 ArchiveDir
32 If ArchiveScript is not set the slurmdbd will generate a file
33 that can be read in anytime with sacctmgr load filename. This
34 directory is where the file will be placed after a purge event
35 has happened and archive for that element is set to true.
36 Default is /tmp. The format for this file's name is
37 $ArchiveDir/$ClusterName_$ArchiveObject_archive_$BeginTimeStamp_$endTimeStamp.
38 Archive files are limited to 50000 records per
39 file. If more than 50000 records exist during that time period,
40 they will be written to a new file. Subsequent archive files
41 during the same time period will have ".<number>" appended to
42 the file, for example .2, with the number increasing by one for
43 each file in the same time period.
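
The naming rule above can be sketched in Python as follows. This is an illustration only: the function name, the cluster name and the timestamp strings are hypothetical (the timestamp format is not specified here), and the sketch assumes that the first file of a period carries no numeric suffix while later files get ".2", ".3" and so on.

```python
RECORDS_PER_FILE = 50000  # per-file limit stated in the text above


def archive_file_names(archive_dir, cluster, obj, begin, end, n_records):
    """List the archive file names needed for n_records in one time period."""
    base = f"{archive_dir}/{cluster}_{obj}_archive_{begin}_{end}"
    # Ceiling division; assume at least one file is written per period.
    n_files = max(1, -(-n_records // RECORDS_PER_FILE))
    # Assumption: the first file has no suffix, later ones get ".<number>".
    return [base if i == 1 else f"{base}.{i}" for i in range(1, n_files + 1)]


# Hypothetical values, for illustration only.
names = archive_file_names("/tmp", "cluster1", "job", "BEGIN", "END", 120000)
for name in names:
    print(name)
```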
44
45
46 ArchiveEvents
47 When purging events also archive them. Boolean, yes to archive
48 event data, no otherwise. Default is no.
49
50
51 ArchiveJobs
52 When purging jobs also archive them. Boolean, yes to archive
53 job data, no otherwise. Default is no.
54
55
56 ArchiveResvs
57 When purging reservations also archive them. Boolean, yes to
58 archive reservation data, no otherwise. Default is no.
59
60
61 ArchiveScript
62 This script can be executed every time a rollup happens (every
63 hour, day and month), depending on the Purge*After options.
64 This script is used to transfer accounting records out of the
65 database into an archive. It is used in place of the internal
66 process used to archive objects. The script is executed with
67 no arguments. The following environment variables are set:
68
69 SLURM_ARCHIVE_EVENTS
70 1 for archive events 0 otherwise.
71
72 SLURM_ARCHIVE_LAST_EVENT
73 Time of last event start to archive.
74
75 SLURM_ARCHIVE_JOBS
76 1 for archive jobs 0 otherwise.
77
78 SLURM_ARCHIVE_LAST_JOB
79 Time of last job submit to archive.
80
81 SLURM_ARCHIVE_STEPS
82 1 for archive steps 0 otherwise.
83
84 SLURM_ARCHIVE_LAST_STEP
85 Time of last step start to archive.
86
87 SLURM_ARCHIVE_SUSPEND
88 1 for archive suspend data 0 otherwise.
89
90 SLURM_ARCHIVE_TXN
91 1 for archive transaction data 0 otherwise.
92
93 SLURM_ARCHIVE_USAGE
94 1 for archive usage data 0 otherwise.
95
96 SLURM_ARCHIVE_LAST_SUSPEND
97 Time of last suspend start to archive.
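
An archive script might begin by working out which object types it has been asked to handle. The Python sketch below does only that, using the variable names listed above; the function name is hypothetical, the time values are treated as opaque strings because their format is not described here, and the actual export step is left out.

```python
import os

# Each flag is "1" when that object type should be archived, "0" otherwise.
# The second item is the matching "time of last ..." variable; the text
# above lists none for transaction and usage data.
FLAG_TO_LAST = {
    "SLURM_ARCHIVE_EVENTS": "SLURM_ARCHIVE_LAST_EVENT",
    "SLURM_ARCHIVE_JOBS": "SLURM_ARCHIVE_LAST_JOB",
    "SLURM_ARCHIVE_STEPS": "SLURM_ARCHIVE_LAST_STEP",
    "SLURM_ARCHIVE_SUSPEND": "SLURM_ARCHIVE_LAST_SUSPEND",
    "SLURM_ARCHIVE_TXN": None,
    "SLURM_ARCHIVE_USAGE": None,
}


def archive_requests(env=os.environ):
    """Return {flag: cutoff string or None} for every flag set to "1"."""
    requests = {}
    for flag, last in FLAG_TO_LAST.items():
        if env.get(flag) == "1":
            requests[flag] = env.get(last) if last else None
    return requests


# Hypothetical environment, for illustration only.
demo_env = {
    "SLURM_ARCHIVE_EVENTS": "0",
    "SLURM_ARCHIVE_JOBS": "1",
    "SLURM_ARCHIVE_LAST_JOB": "PLACEHOLDER_TIME",
    "SLURM_ARCHIVE_USAGE": "1",
}
print(archive_requests(demo_env))
```

When run by slurmdbd the function would be called without arguments so that it reads the real process environment.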
98
99
100
101 ArchiveSteps
102 When purging steps also archive them. Boolean,
103 yes to archive step data, no otherwise. Default
104 is no.
105
106
107 ArchiveSuspend
108 When purging suspend data also archive it. Bool‐
109 ean, yes to archive suspend data, no otherwise.
110 Default is no.
111
112
113 ArchiveTXN
114 When purging transaction data also archive it.
115 Boolean, yes to archive transaction data, no oth‐
116 erwise. Default is no.
117
118
119 ArchiveUsage
120 When purging usage data (Cluster, Association and
121 WCKey) also archive it. Boolean, yes to archive
122 usage data, no otherwise. Default is no.
123
124
125 AuthInfo
126 Additional information to be used for authentica‐
127 tion of communications with the Slurm control dae‐
128 mon (slurmctld) on each cluster. The interpreta‐
129 tion of this option is specific to the configured
130 AuthType. In the case of auth/munge, this can be
131 configured to use a Munge daemon specifically con‐
132 figured to provide authentication between clusters
133 while the default Munge daemon provides authenti‐
134 cation within a cluster. In that case, this will
135 specify the pathname of the socket to use. Per
136 default this value is left unspecified, which
137 results in the default authentication mechanism
138 being used.
139
140
141 AuthAltTypes
142 Comma separated list of alternative authentication
143 plugins that the slurmdbd will permit for
144 communication.
145
146
147 AuthType
148 Define the authentication method for communica‐
149 tions between Slurm components. Acceptable values
150 at present include "auth/none" and "auth/munge".
151 The default value is "auth/munge". Do not use
152 "auth/none" if you desire any security.
153 "auth/munge" indicates that LLNL's MUNGE system is
154 to be used (this is the supported authentication
155 mechanism for Slurm; see
156 "https://dun.github.io/munge/" for more informa‐
157 tion). SlurmDBD must be terminated prior to
158 changing the value of AuthType and later
159 restarted.
160
161
162 CommitDelay
163 How many seconds between commits on a connection
164 from a Slurmctld. This speeds up inserts into the
165 database dramatically. If you are running a very
166 high throughput of jobs you should consider set‐
167 ting this. In testing, 1 second improves the
168 slurmdbd performance dramatically and reduces
169 overhead. There is a small probability of data
170 loss though since this creates a window in which
171 if the slurmdbd seg faults or exits abnormally for
172 any reason the data not committed could be lost.
173 This situation should be very rare, so the risk is
174 small but still present, and accepting it may be the
175 only way to run in extremely heavy environments.
178
179
180 DbdBackupHost
181 The short, or long, name of the machine where the
182 backup Slurm Database Daemon is executed (i.e. the
183 name returned by the command "hostname -s"). This
184 host must have access to the same underlying data‐
185 base specified by the 'Storage' options mentioned
186 below.
187
188
189 DbdAddr
190 Name that DbdHost should be referred to in estab‐
191 lishing a communications path. This name will be
192 used as an argument to the gethostbyname() func‐
193 tion for identification. For example, "elx0000"
194 might be used to designate the Ethernet address
195 for node "lx0000". By default the DbdAddr will be
196 identical in value to DbdHost.
197
198
199 DbdHost
200 The short, or long, name of the machine where the
201 Slurm Database Daemon is executed (i.e. the name
202 returned by the command "hostname -s"). This
203 value must be specified.
204
205
206 DbdPort
207 The port number that the Slurm Database Daemon
208 (slurmdbd) listens to for work. The default value
209 is SLURMDBD_PORT as established at system build
210 time. If no value is explicitly specified, it will
211 be set to 6819. This value must be equal to the
212 AccountingStoragePort parameter in the slurm.conf
213 file.
214
215
216 DebugFlags
217 Defines specific subsystems which should provide
218 more detailed event logging. Multiple subsystems
219 can be specified with comma separators. Most
220 DebugFlags will result in verbose logging for the
221 identified subsystems and could impact perfor‐
222 mance. Valid subsystems available today (with
223 more to come) include:
224
225 DB_ARCHIVE SQL statements/queries when deal‐
226 ing with archiving and purging
227 the database.
228
229 DB_ASSOC SQL statements/queries when deal‐
230 ing with associations in the
231 database.
232
233 DB_EVENT SQL statements/queries when deal‐
234 ing with (node) events in the
235 database.
236
237 DB_JOB SQL statements/queries when deal‐
238 ing with jobs in the database.
239
240 DB_QOS SQL statements/queries when deal‐
241 ing with QOS in the database.
242
243 DB_QUERY SQL statements/queries when deal‐
244 ing with transactions and such in
245 the database.
246
247 DB_RESERVATION SQL statements/queries when deal‐
248 ing with reservations in the
249 database.
250
251 DB_RESOURCE SQL statements/queries when deal‐
252 ing with resources like licenses
253 in the database.
254
255 DB_STEP SQL statements/queries when deal‐
256 ing with steps in the database.
257
258 DB_USAGE SQL statements/queries when deal‐
259 ing with usage queries and
260 inserts in the database.
261
262 DB_WCKEY SQL statements/queries when deal‐
263 ing with wckeys in the database.
264
265 FEDERATION SQL statements/queries when deal‐
266 ing with federations in the data‐
267 base.
268
269
270 DebugLevel
271 The level of detail to provide the Slurm Database
272 Daemon's logs. The default value is info.
273
274 quiet Log nothing
275
276 fatal Log only fatal errors
277
278 error Log only errors
279
280 info Log errors and general informational
281 messages
282
283 verbose Log errors and verbose informational
284 messages
285
286 debug Log errors and verbose informational
287 messages and debugging messages
288
289 debug2 Log errors and verbose informational
290 messages and more debugging messages
291
292 debug3 Log errors and verbose informational
293 messages and even more debugging mes‐
294 sages
295
296 debug4 Log errors and verbose informational
297 messages and even more debugging mes‐
298 sages
299
300 debug5 Log errors and verbose informational
301 messages and even more debugging mes‐
302 sages
303
304
305 DebugLevelSyslog
306 The slurmdbd daemon will log events to the syslog
307 file at the specified level of detail. If not set,
308 the level depends on how the daemon is run: in the
309 foreground it is quiet; in the background with no
310 LogFile it is the level specified by DebugLevel (or
311 fatal if DebugLevel is set to quiet); otherwise it
312 is fatal.
315
316
317 quiet Log nothing
318
319 fatal Log only fatal errors
320
321 error Log only errors
322
323 info Log errors and general informational
324 messages
325
326 verbose Log errors and verbose informational
327 messages
328
329 debug Log errors and verbose informational
330 messages and debugging messages
331
332 debug2 Log errors and verbose informational
333 messages and more debugging messages
334
335 debug3 Log errors and verbose informational
336 messages and even more debugging mes‐
337 sages
338
339 debug4 Log errors and verbose informational
340 messages and even more debugging mes‐
341 sages
342
343 debug5 Log errors and verbose informational
344 messages and even more debugging mes‐
345 sages
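
The default rules for DebugLevelSyslog are easy to misread, so here is one reading of them as a small Python function. It is a sketch of the paragraph above, not code taken from slurmdbd; the function and parameter names are hypothetical, and it assumes that running in the foreground takes precedence over the LogFile check.

```python
def effective_syslog_level(debug_level_syslog, debug_level, log_file, foreground):
    """Return the syslog level implied by the rules described above.

    debug_level_syslog and log_file are None when the option is not set.
    """
    if debug_level_syslog is not None:
        return debug_level_syslog      # an explicit setting always wins
    if foreground:
        return "quiet"                 # assumption: foreground is checked first
    if log_file is None:
        # Background with no LogFile: follow DebugLevel, but never below fatal.
        return "fatal" if debug_level == "quiet" else debug_level
    return "fatal"                     # background with a LogFile


print(effective_syslog_level(None, "info", None, False))
print(effective_syslog_level(None, "info", "/var/log/slurmdbd.log", False))
```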
346
347
348
349 DefaultQOS
350 When adding a new cluster this will be used as the
351 qos for the cluster unless something is explicitly
352 set by the admin with the create.
353
354
355 LogFile
356 Fully qualified pathname of a file into which the
357 Slurm Database Daemon's logs are written. The
358 default value is none (performs logging via sys‐
359 log).
360 See the section LOGGING in the slurm.conf man page
361 if a pathname is specified.
362
363
364 LogTimeFormat
365 Format of the timestamp in slurmdbd log files.
366 Accepted values are "iso8601", "iso8601_ms", "rfc5424",
367 "rfc5424_ms", "clock", "short", and "thread_id". The
368 values ending in "_ms" differ from the ones with‐
369 out in that fractional seconds with millisecond
370 precision are printed. The default value is
371 "iso8601_ms". The "rfc5424" formats are the same
372 as the "iso8601" formats except that the timezone
373 value is also shown. The "clock" format shows a
374 timestamp in microseconds retrieved with the C
375 standard clock() function. The "short" format is a
376 short date and time format. The "thread_id" format
377 shows the timestamp in the C standard ctime()
378 function form without the year but including the
379 microseconds, the daemon's process ID and the cur‐
380 rent thread ID.
381
382
383 MaxQueryTimeRange
384 Return an error if a query is against too large of
385 a time span, to prevent ill-formed queries from
386 causing performance problems within SlurmDBD.
387 Default value is INFINITE which allows any queries
388 to proceed. Accepted time formats are the same as
389 the MaxTime option in slurm.conf. User SlurmUser
390 and root are exempt from this restriction. Note
391 that queries which attempt to return over 3GB of
392 data will still fail to complete with
393 ESLURM_RESULT_TOO_LARGE.
394
395
396 MessageTimeout
397 Time permitted for a round-trip communication to
398 complete in seconds. Default value is 10 seconds.
399
400
401 Parameters
402 Contains arbitrary comma separated parameters used
403 to alter the behavior of the slurmdbd.
404
405 PreserveCaseUser
406 When defining users do not force lower case
407 which is the default behavior.
408
409
410 PidFile
411 Fully qualified pathname of a file into which the
412 Slurm Database Daemon may write its process ID.
413 This may be used for automated signal processing.
414 The default value is "/var/run/slurmdbd.pid".
415
416
417 PluginDir
418 Identifies the places in which to look for Slurm
419 plugins. This is a colon-separated list of direc‐
420 tories, like the PATH environment variable. The
421 default value is "/usr/local/lib/slurm".
422
423
424 PrivateData
425 This controls what type of information is hidden
426 from regular users. By default, all information
427 is visible to all users. User SlurmUser, root,
428 and users with AdminLevel=Admin can always view
429 all information. Multiple values may be specified
430 with a comma separator. Acceptable values
431 include:
432
433 accounts
434 prevents users from viewing any account
435 definitions unless they are coordinators of
436 them.
437
438 events prevents users from viewing event informa‐
439 tion unless they have operator status or
440 above.
441
442 jobs prevents users from viewing job records
443 belonging to other users unless they are
444 coordinators of the association running the
445 job when using sacct.
446
447 reservations
448 restricts getting reservation information
449 to users with operator status and above.
450
451 usage prevents users from viewing usage of any
452 other user. This applies to sreport.
453
454 users prevents users from viewing information of
455 any user other than themselves, this also
456 makes it so users can only see associations
457 they deal with. Coordinators can see asso‐
458 ciations of all users they are coordinator
459 of, but can only see themselves when list‐
460 ing users.
461
462
463 PurgeEventAfter
464 Events happening on the cluster over this age are
465 purged from the database. This includes node down
466 times and such. The time is a numeric value and
467 is a number of months. If you want to purge more
468 often you can include "hours", or "days" behind
469 the numeric value to get those more frequent
470 purges (e.g. a value of "12hours" would purge
471 everything older than 12 hours). The purge takes
472 place at the start of each purge interval.
473 For example, if the purge time is 2 months, the
474 purge would happen at the beginning of each month.
475 If not set (default), then event records are
476 never purged.
477
478
479 PurgeJobAfter
480 Individual job records over this age are purged
481 from the database. Aggregated information will be
482 preserved to "PurgeUsageAfter". The time is a
483 numeric value and is a number of months. If you
484 want to purge more often you can include "hours",
485 or "days" behind the numeric value to get those
486 more frequent purges (e.g. a value of "12hours"
487 would purge everything older than 12 hours). The
488 purge takes place at the start of each purge
489 interval. For example, if the purge time is 2
490 months, the purge would happen at the beginning of
491 each month. If not set (default), then job
492 records are never purged.
493
494
495 PurgeResvAfter
496 Individual reservation records over this age are
497 purged from the database. Aggregated information
498 will be preserved to "PurgeUsageAfter". The time
499 is a numeric value and is a number of months. If
500 you want to purge more often you can include
501 "hours", or "days" behind the numeric value to get
502 those more frequent purges (e.g. a value of
503 "12hours" would purge everything older than 12
504 hours). The purge takes place at the start of
505 each purge interval. For example, if the purge
506 time is 2 months, the purge would happen at the
507 beginning of each month. If not set (default),
508 then reservation records are never purged.
509
510
511 PurgeStepAfter
512 Individual job step records over this age are
513 purged from the database. Aggregated information
514 will be preserved to "PurgeUsageAfter". The time
515 is a numeric value and is a number of months. If
516 you want to purge more often you can include
517 "hours", or "days" behind the numeric value to get
518 those more frequent purges (e.g. a value of
519 "12hours" would purge everything older than 12
520 hours). The purge takes place at the start of
521 each purge interval. For example, if the purge
522 time is 2 months, the purge would happen at the
523 beginning of each month. If not set (default),
524 then job step records are never purged.
525
526
527 PurgeSuspendAfter
528 Records of individual suspend times for jobs over
529 this age are purged from the database. Aggregated
530 information will be preserved to
531 "PurgeUsageAfter". The time is a numeric value
532 and is a number of months. If you want to purge
533 more often you can include "hours", or "days"
534 behind the numeric value to get those more frequent
535 purges (e.g. a value of "12hours" would
536 purge everything older than 12 hours). The purge
537 takes place at the start of each purge interval.
538 For example, if the purge time is 2 months,
539 the purge would happen at the beginning of each
540 month. If not set (default), then suspend
541 records are never purged.
542
543
544 PurgeTXNAfter
545 Records of individual transaction times for trans‐
546 actions over this age are purged from the data‐
547 base. The time is a numeric value and is a number
548 of months. If you want to purge more often you
549 can include "hours", or "days" behind the numeric
550 value to get those more frequent purges (e.g. a
551 value of "12hours" would purge everything older
552 than 12 hours). The purge takes place at the
553 start of each purge interval. For example, if
554 the purge time is 2 months, the purge would happen
555 at the beginning of each month. If not set
556 (default), then transaction records are never purged.
557
558
559 PurgeUsageAfter
560 Usage Records (Cluster, Association and WCKey)
561 over this age are purged from the database. The
562 time is a numeric value and is a number of months.
563 If you want to purge more often you can include
564 "hours", or "days" behind the numeric value to get
565 those more frequent purges (e.g. a value of
566 "12hours" would purge everything older than 12
567 hours). The purge takes place at the start of
568 each purge interval. For example, if the purge
569 time is 2 months, the purge would happen at the
570 beginning of each month. If not set (default),
571 then usage records are never purged.
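
All of the Purge*After options above share the same value syntax: a number, optionally followed by a unit, with months as the default unit. The Python sketch below shows one way to split such a value. The function name is hypothetical, and accepting singular unit spellings such as "hour" or "day" is an assumption made for this example (the sample file in this page does use "1month").

```python
import re

# A number, then an optional unit; singular spellings are an assumption.
_PURGE_RE = re.compile(r"^(\d+)\s*(hours?|days?|months?)?$", re.IGNORECASE)


def parse_purge_after(value):
    """Split a Purge*After value into (number, unit); the unit defaults to months."""
    match = _PURGE_RE.match(value.strip())
    if match is None:
        raise ValueError(f"unrecognized purge value: {value!r}")
    number = int(match.group(1))
    # Normalize the unit to its plural, lower-case form.
    unit = (match.group(2) or "months").lower().rstrip("s") + "s"
    return number, unit


for example in ("12hours", "30days", "2", "1month"):
    print(example, parse_purge_after(example))
```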
572
573
574 SlurmUser
575 The name of the user that the slurmdbd daemon exe‐
576 cutes as. This user must exist on the machine
577 executing the Slurm Database Daemon and have the
578 same UID as on the hosts on which slurmctld executes.
579 For security purposes, a user other than "root" is
580 recommended. The default value is "root". This
581 name should also be the same SlurmUser on all
582 clusters reporting to the SlurmDBD. NOTE: If this
583 user is different from the one set for slurmctld
584 and is not root, it must be added to accounting
585 with AdminLevel=Admin and slurmctld must be
586 restarted.
587
588
589 StorageHost
590 Define the name of the host running the database
591 in which the data will be stored. Ideally
592 this should be the host on which slurmdbd
593 executes.
594
595
596 StorageBackupHost
597 Define the name of the backup host running the
598 database in which the data will be stored.
599 This can be viewed as a backup solution when the
600 StorageHost is not responding. It is up to the
601 backup solution to enforce the coherency of the
602 accounting information between the two hosts. With
603 clustered database solutions (active/passive HA),
604 you would not need to use this feature. Default
605 is none.
606
607
608 StorageLoc
609 Specify the name of the database as the location
610 where accounting records are written. Defaults to
611 "slurm_acct_db".
612
613
614 StoragePass
615 Define the password used to gain access to the
616 database to store the job accounting data. The '#'
617 character is not permitted in a password.
618
619
620 StoragePort
621 The port number that the Slurm Database Daemon
622 (slurmdbd) uses to communicate with the database.
623
624
625 StorageType
626 Define the accounting storage mechanism type.
627 Acceptable values at present include "account‐
628 ing_storage/mysql". The value "accounting_stor‐
629 age/mysql" indicates that accounting records
630 should be written to a MySQL or MariaDB database
631 specified by the StorageLoc parameter. This value
632 must be specified.
633
634
635 StorageUser
636 Define the name of the user we are going to con‐
637 nect to the database with to store the job
638 accounting data.
639
640
641 TCPTimeout
642 Time permitted for TCP connection to be estab‐
643 lished. Default value is 2 seconds.
644
645
646 TrackSlurmctldDown
647 Boolean yes or no. If set the slurmdbd will mark
648 all idle resources on the cluster as down when a
649 slurmctld disconnects or is no longer reachable.
650 The default is no.
651
652
653 TrackWCKey
654 Boolean yes or no. Used to enable display and tracking
655 of the Workload Characterization Key. Must be set
656 to track wckey usage. This must be set to gener‐
657 ate rolled up usage tables from WCKeys. NOTE: If
658 TrackWCKey is set here and not in your various
659 slurm.conf files all jobs will be attributed to
660 their default WCKey.
661
662
663 EXAMPLE
664 #
665 # Sample /etc/slurmdbd.conf
666 #
667 ArchiveEvents=yes
668 ArchiveJobs=yes
669 ArchiveResvs=yes
670 ArchiveSteps=no
671 ArchiveSuspend=no
672 ArchiveTXN=no
673 ArchiveUsage=no
674 #ArchiveScript=/usr/sbin/slurm.dbd.archive
675 AuthInfo=/var/run/munge/munge.socket.2
676 AuthType=auth/munge
677 DbdHost=db_host
678 DebugLevel=info
679 PurgeEventAfter=1month
680 PurgeJobAfter=12month
681 PurgeResvAfter=1month
682 PurgeStepAfter=1month
683 PurgeSuspendAfter=1month
684 PurgeTXNAfter=12month
685 PurgeUsageAfter=24month
686 LogFile=/var/log/slurmdbd.log
687 PidFile=/var/tmp/jette/slurmdbd.pid
688 SlurmUser=slurm_mgr
689 StoragePass=shazaam
690 StorageType=accounting_storage/mysql
691 StorageUser=database_mgr
692
693
694 COPYING
695 Copyright (C) 2008-2010 Lawrence Livermore National Secu‐
696 rity. Produced at Lawrence Livermore National Laboratory
697 (cf, DISCLAIMER).
698 Copyright (C) 2010-2014 SchedMD LLC.
699
700 This file is part of Slurm, a resource management pro‐
701 gram. For details, see <https://slurm.schedmd.com/>.
702
703 Slurm is free software; you can redistribute it and/or
704 modify it under the terms of the GNU General Public
705 License as published by the Free Software Foundation;
706 either version 2 of the License, or (at your option) any
707 later version.
708
709 Slurm is distributed in the hope that it will be useful,
710 but WITHOUT ANY WARRANTY; without even the implied war‐
711 ranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PUR‐
712 POSE. See the GNU General Public License for more
713 details.
714
715
716 FILES
717 /etc/slurmdbd.conf
718
719
720 SEE ALSO
721 slurm.conf(5), slurmctld(8), slurmdbd(8), syslog(2)
722
723
724
725July 2019 Slurm Configuration File slurmdbd.conf(5)