1 sacctmgr(1) Slurm Commands sacctmgr(1)
2
3
4
6 sacctmgr - Used to view and modify Slurm account information.
7
8
10 sacctmgr [OPTIONS...] [COMMAND...]
11
12
14 sacctmgr is used to view or modify Slurm account information. The ac‐
15 count information is maintained within a database with the interface
16 being provided by slurmdbd (Slurm Database daemon). This database can
17 serve as a central storehouse of user and computer information for mul‐
18 tiple computers at a single site. Slurm account information is
19 recorded based upon four parameters that form what is referred to as an
20 association. These parameters are user, cluster, partition, and ac‐
21 count. user is the login name. cluster is the name of a Slurm managed
22 cluster as specified by the ClusterName parameter in the slurm.conf
23 configuration file. partition is the name of a Slurm partition on that
24 cluster. account is the bank account for a job. The intended mode of
25 operation is to initiate the sacctmgr command, add, delete, modify,
26 and/or list association records then commit the changes and exit.
27
28 NOTE: The contents of Slurm's database are maintained in lower case.
29 This may result in some sacctmgr output differing from that of other
30 Slurm commands.
31
32
34 -s, --associations
35 Use with show or list to display associations with the entity.
36 This is equivalent to the associations command.
37
38 -h, --help
39 Print a help message describing the usage of sacctmgr. This is
40 equivalent to the help command.
41
42 -i, --immediate
43 Commit changes immediately without asking for confirmation.
44
45 -n, --noheader
46 No header will be added to the beginning of the output.
47
48 -p, --parsable
49 Output will be '|' delimited with a '|' at the end.
50
51 -P, --parsable2
52 Output will be '|' delimited without a '|' at the end.
53
54 -Q, --quiet
55 Print no messages other than error messages. This is equivalent
56 to the quiet command.
57
58 -r, --readonly
59 Makes it so the running sacctmgr cannot modify accounting infor‐
60 mation. The readonly option is for use within interactive mode.
61
62 -v, --verbose
63 Enable detailed logging. This is equivalent to the verbose com‐
64 mand.
65
66 -V, --version
67 Display version number. This is equivalent to the version com‐
68 mand.
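
The difference between --parsable and --parsable2 matters mostly when the output is consumed by a script. The following is a minimal sketch, not part of sacctmgr itself, of how such output could be split into rows. The sample text and its column names are hypothetical; real columns depend on the entity and format requested.

```python
def parse_parsable(text: str, trailing_delimiter: bool = False) -> list[dict[str, str]]:
    """Parse '|'-delimited output into a list of row dicts.

    trailing_delimiter=True matches --parsable (each line ends with '|');
    trailing_delimiter=False matches --parsable2 (no trailing '|').
    The first non-empty line is treated as the header, so this sketch
    assumes --noheader was NOT used.
    """
    rows = []
    header = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if trailing_delimiter and line.endswith("|"):
            line = line[:-1]  # drop only the final delimiter
        fields = line.split("|")
        if header is None:
            header = fields
        else:
            rows.append(dict(zip(header, fields)))
    return rows


# Hypothetical sample output (assumption: not captured from a real system).
sample_parsable2 = "Cluster|Account|User\ntux|science|\ntux|chemistry|alice\n"
sample_parsable = "Cluster|Account|User|\ntux|science||\ntux|chemistry|alice|\n"

rows = parse_parsable(sample_parsable2)
```

With --parsable2 an empty last column still leaves a '|' at the end of the line, which is why the sketch strips a trailing delimiter only when told the input came from --parsable.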
69
71 add <ENTITY> <SPECS>
72 Add an entity. Identical to the create command.
73
74 archive {dump|load} <SPECS>
75 Write database information to a flat file or load information
76 that has previously been written to a file.
77
78 clear stats
79 Clear the server statistics.
80
81 create <ENTITY> <SPECS>
82 Add an entity. Identical to the add command.
83
84 delete <ENTITY> where <SPECS>
85 Delete the specified entities. Identical to the remove command.
86
87 dump <ENTITY> [File=<FILENAME>]
88 Dump cluster data to the specified file. If no filename is
89 specified, the file name clustername.cfg is used by default.
90
91 help Display a description of sacctmgr options and commands.
92
93 list <ENTITY> [<SPECS>]
94 Display information about the specified entity. By default, all
95 entries are displayed; you can narrow the results by specifying
96 SPECS in your query. Identical to the show command.
97
98 load <FILENAME>
99 Load cluster data from the specified file. This is a configura‐
100 tion file generated by running the sacctmgr dump command. This
101 command does not load archive data, see the sacctmgr archive
102 load option instead.
103
104 modify <ENTITY> where <SPECS> set <SPECS>
105 Modify an entity.
106
107 reconfigure
108 Reconfigures the SlurmDBD if running with one.
109
110 remove <ENTITY> where <SPECS>
111 Delete the specified entities. Identical to the delete command.
112
113 show <ENTITY> [<SPECS>]
114 Display information about the specified entity. By default, all
115 entries are displayed; you can narrow the results by specifying
116 SPECS in your query. Identical to the list command.
117
118 shutdown
119 Shutdown the server.
120
121 version
122 Display the version number of sacctmgr.
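
The command forms above compose as "command, entity, where specs, set specs". As a rough illustration, the sketch below builds the argument list for a modify command from two dictionaries. The helper name build_modify and the user name alice are hypothetical; the sketch only assembles arguments and does not run sacctmgr.

```python
import shlex


def build_modify(entity, where, set_specs, immediate=True):
    """Build argv for: sacctmgr [-i] modify <ENTITY> where <SPECS> set <SPECS>.

    `where` and `set_specs` map specification names to values. No validation
    is done here; sacctmgr itself decides which specifications are valid.
    """
    argv = ["sacctmgr"]
    if immediate:
        argv.append("-i")  # commit without asking for confirmation
    argv += ["modify", entity, "where"]
    argv += [f"{key}={value}" for key, value in where.items()]
    argv.append("set")
    argv += [f"{key}={value}" for key, value in set_specs.items()]
    return argv


argv = build_modify("user", {"name": "alice"}, {"MaxJobs": 10})
command_line = shlex.join(argv)  # shell-safe string for logging or review
```

Keeping the command as a list rather than a single string avoids quoting problems if a value contains spaces, for example a Description.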
123
125 NOTE: All commands listed below can be used in the interactive mode,
126 but NOT on the initial command line.
127
128
129 exit Terminate sacctmgr interactive mode. Identical to the quit com‐
130 mand.
131
132 quiet Print no messages other than error messages.
133
134 quit Terminate the execution of sacctmgr interactive mode. Identical
135 to the exit command.
136
137 verbose
138 Enable detailed logging. This includes time-stamps on data
139 structures, record counts, etc. This is an independent command
140 with no options meant for use in interactive mode.
141
142 !! Repeat the last command.
143
145 account
146 A bank account, typically specified at job submit time using the
147 --account= option. These may be arranged in a hierarchical
148 fashion, for example accounts 'chemistry' and 'physics' may be
149 children of the account 'science'. The hierarchy may have an
150 arbitrary depth.
151
152 association
153 The entity used to group information consisting of four parame‐
154 ters: account, cluster, partition (optional), and user. Used
155 only with the list or show command. Add, modify, and delete
156 should be done to a user, account or cluster entity, which will
157 in turn update the underlying associations. Modification of at‐
158 tributes like limits is allowed for an association but not a
159 modification of the four core attributes of an association. You
160 cannot change the partition setting (or set one if it has not
161 been set) for an existing association. Instead, you will need to
162 create a new association with the partition included. You can
163 either keep the previous association with no partition defined,
164 or delete it. Note that these newly added associations are
165 unique entities and any existing usage information will not be
166 carried over to the new association.
167
168 cluster
169 The ClusterName parameter in the slurm.conf configuration file,
170 used to differentiate accounts on different machines.
171
172 configuration
173 Used only with the list or show command to report current system
174 configuration.
175
176 coordinator
177 A special privileged user, usually an account manager, that can
178 add users or sub-accounts to the account they are coordinator
179 over. This should be a trusted person since they can change
180 limits on account and user associations, as well as cancel, re‐
181 queue or reassign accounts of jobs inside their realm.
182
183 event Events like downed or draining nodes on clusters.
184
185 federation
186 A group of clusters that work together to schedule jobs.
187
188 job Used to modify specific fields of a job: Derived Exit Code, Com‐
189 ment, AdminComment, SystemComment, or WCKey.
190
191 problem
192 Use with show or list to display entity problems.
193
194 qos Quality of Service.
195
196 reservation
197 A collection of resources set apart for use by a particular ac‐
198 count, user or group of users for a given period of time.
199
200 resource
201 Software resources for the system. Those are software licenses
202 shared among clusters.
203
204 RunawayJobs
205 Used only with the list or show command to report current jobs
206 that have been orphaned on the local cluster and are now run‐
207 away. If there are jobs in this state it will also give you an
208 option to "fix" them. NOTE: You must have an AdminLevel of at
209 least Operator to perform this.
210
211 stats Used with list or show command to view server statistics. Ac‐
212 cepts optional argument of ave_time or total_time to sort on
213 those fields. By default, sorts on increasing RPC count field.
214
215 transaction
216 List of transactions that have occurred during a given time pe‐
217 riod.
218
219 tres Used with list or show command to view a list of Trackable RE‐
220 Sources configured on the system.
221
222 user The login name. Usernames are case-insensitive (forced to lower‐
223 case) unless the PreserveCaseUser option has been set in the
224 SlurmDBD configuration file.
225
226 wckeys Workload Characterization Key. An arbitrary string for
227 grouping orthogonal accounts.
228
230 NOTE: The group limits (GrpJobs, GrpTRES, etc.) are tested when a job
231 is being considered for being allocated resources. If starting a job
232 would cause any of its group limits to be exceeded, that job will not be
233 considered for scheduling even if that job might preempt other jobs
234 which would release sufficient group resources for the pending job to
235 be initiated.
236
237
238 DefaultQOS=<default_qos>
239 The default QOS this association and its children should have.
240 This is overridden if set directly on a user. To clear a previ‐
241 ously set value use the modify command with a new value of -1.
242
243 Fairshare={<fairshare_number>|parent}
244 Share={<fairshare_number>|parent}
245 Number used in conjunction with other accounts to determine job
246 priority. Can also be the string parent, when used on a user
247 this means that the parent association is used for fairshare.
248 If Fairshare=parent is set on an account, that account's chil‐
249 dren will be effectively reparented for fairshare calculations
250 to the first parent of their parent that is not Fairshare=par‐
251 ent. Limits remain the same, only its fairshare value is af‐
252 fected. To clear a previously set value use the modify command
253 with a new value of -1.
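
The reparenting rule for Fairshare=parent can be pictured as a walk up the account tree. The sketch below is only an illustration of the sentence above, not Slurm's implementation; the account names and the two dictionaries are hypothetical.

```python
def effective_fairshare_parent(name, parent_of, fairshare):
    """Return the ancestor that `name` is attached to for fairshare purposes.

    parent_of maps each association to its parent account.
    fairshare maps each association to a number or the string "parent".
    Ancestors marked "parent" are skipped, so the result is the first
    ancestor that is not Fairshare=parent (or the top of the tree).
    """
    ancestor = parent_of[name]
    while fairshare.get(ancestor) == "parent" and ancestor in parent_of:
        ancestor = parent_of[ancestor]
    return ancestor


# Hypothetical hierarchy: root -> science -> chemistry -> alice
parent_of = {"science": "root", "chemistry": "science", "alice": "chemistry"}
fairshare = {"root": 1, "science": 10, "chemistry": "parent", "alice": 1}

result = effective_fairshare_parent("alice", parent_of, fairshare)
```

Here chemistry is Fairshare=parent, so alice is treated as a child of science for fairshare calculations while her limits still come from chemistry.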
254
255 GrpJobs=<max_jobs>
256 Maximum number of running jobs in aggregate for this association
257 and all associations which are children of this association. To
258 clear a previously set value use the modify command with a new
259 value of -1.
260
261 GrpJobsAccrue=<max_jobs>
262 Maximum number of pending jobs in aggregate able to accrue age
263 priority for this association and all associations which are
264 children of this association. To clear a previously set value
265 use the modify command with a new value of -1.
266
267 GrpSubmit=<max_jobs>
268 GrpSubmitJobs=<max_jobs>
269 Maximum number of jobs which can be in a pending or running
270 state at any time in aggregate for this association and all as‐
271 sociations which are children of this association. To clear a
272 previously set value use the modify command with a new value of
273 -1.
274
275 GrpTRES=TRES=<max_TRES>[,TRES=<max_TRES>,...]
276 Maximum number of TRES running jobs are able to be allocated in
277 aggregate for this association and all associations which are
278 children of this association. To clear a previously set value
279 use the modify command with a new value of -1 for each TRES id.
280
281 NOTE: This limit only applies fully when using the Select Con‐
282 sumable Resource plugin.
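
The TRES=<max_TRES>[,TRES=<max_TRES>,...] syntax is a comma separated list of name=value pairs. A minimal sketch of reading such a string into a dictionary follows. The TRES names in the example are illustrative, and the sketch assumes plain integer values only.

```python
def parse_tres_spec(spec: str) -> dict[str, int]:
    """Split 'name=value[,name=value,...]' into {name: value}.

    A value of -1 is kept as-is; per the text above it means
    "clear the previously set limit" for that TRES.
    """
    limits = {}
    for item in spec.split(","):
        name, separator, value = item.partition("=")
        if not separator or not name or not value:
            raise ValueError(f"expected name=value, got {item!r}")
        limits[name] = int(value)
    return limits


limits = parse_tres_spec("cpu=500,mem=200000,gres/gpu=8")
cleared = [name for name, value in parse_tres_spec("cpu=-1,mem=-1").items() if value == -1]
```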
283
284 GrpTRESMins=TRES=<minutes>[,TRES=<minutes>,...]
285 The total number of TRES minutes that can possibly be used by
286 past, present and future jobs running from this association and
287 its children. To clear a previously set value use the modify
288 command with a new value of -1 for each TRES id.
289
290 NOTE: This limit is not enforced if set on the root association
291 of a cluster. So even though it may appear in sacctmgr output,
292 it will not be enforced.
293
294 ALSO NOTE: This limit only applies when using the Priority Mul‐
295 tifactor plugin. The time is decayed using the value of Priori‐
296 tyDecayHalfLife or PriorityUsageResetPeriod as set in the
297 slurm.conf. When this limit is reached all associated jobs run‐
298 ning will be killed and all future jobs submitted with associa‐
299 tions in the group will be delayed until they are able to run
300 inside the limit.
301
302 GrpTRESRunMins=TRES=<minutes>[,TRES=<minutes>,...]
303 Used to limit the combined total number of TRES minutes used by
304 all jobs running with this association and its children. This
305 takes into consideration the time limit of running jobs and consumes
306 it. If the limit is reached, no new jobs are started until other
307 jobs finish and free up time.
308
309 GrpWall=<max_wall>
310 Maximum wall clock time running jobs are able to be allocated in
311 aggregate for this association and all associations which are
312 children of this association. To clear a previously set value
313 use the modify command with a new value of -1.
314
315 NOTE: This limit is not enforced if set on the root association
316 of a cluster. So even though it may appear in sacctmgr output,
317 it will not be enforced.
318
319 ALSO NOTE: This limit only applies when using the Priority Mul‐
320 tifactor plugin. The time is decayed using the value of Priori‐
321 tyDecayHalfLife or PriorityUsageResetPeriod as set in the
322 slurm.conf. When this limit is reached all associated jobs run‐
323 ning will be killed and all future jobs submitted with associa‐
324 tions in the group will be delayed until they are able to run
325 inside the limit.
326
327 MaxJobs=<max_jobs>
328 Maximum number of jobs each user is allowed to run at one time
329 in this association. This is overridden if set directly on a
330 user. Default is the cluster's limit. To clear a previously
331 set value use the modify command with a new value of -1.
332
333 MaxJobsAccrue=<max_jobs>
334 Maximum number of pending jobs able to accrue age priority at
335 any given time for the given association. This is overridden if
336 set directly on a user. Default is the cluster's limit. To
337 clear a previously set value use the modify command with a new
338 value of -1.
339
340 MaxSubmit=<max_jobs>
341 MaxSubmitJobs=<max_jobs>
342 Maximum number of jobs which this association can have in a
343 pending or running state at any time. Default is the cluster's
344 limit. To clear a previously set value use the modify command
345 with a new value of -1.
346
347 MaxTRESMins=TRES=<minutes>[,TRES=<minutes>,...]
348 MaxTRESMinsPerJob=TRES=<minutes>[,TRES=<minutes>,...]
349 Maximum number of TRES minutes each job is able to use in this
350 association. This is overridden if set directly on a user. De‐
351 fault is the cluster's limit. To clear a previously set value
352 use the modify command with a new value of -1 for each TRES id.
353
354 MaxTRES=TRES=<max_TRES>[,TRES=<max_TRES>,...]
355 MaxTRESPerJob=TRES=<max_TRES>[,TRES=<max_TRES>,...]
356 Maximum number of TRES each job is able to use in this associa‐
357 tion. This is overridden if set directly on a user. Default is
358 the cluster's limit. To clear a previously set value use the
359 modify command with a new value of -1 for each TRES id.
360
361 NOTE: This limit only applies fully when using cons_res or
362 cons_tres select type plugins.
363
364 MaxWall=<max_wall>
365 MaxWallDurationPerJob=<max_wall>
366 Maximum wall clock time each job is able to use in this associa‐
367 tion. This is overridden if set directly on a user. Default is
368 the cluster's limit. <max wall> format is <min> or <min>:<sec>
369 or <hr>:<min>:<sec> or <days>-<hr>:<min>:<sec> or <days>-<hr>.
370 The value is recorded in minutes with rounding as needed. To
371 clear a previously set value use the modify command with a new
372 value of -1.
373
374 NOTE: Changing this value will have no effect on any running or
375 pending job.
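
The accepted <max_wall> formats can be checked with a small converter. This sketch covers only the five formats listed above and does not handle the special value -1. The text says the value is rounded "as needed"; rounding up to the next whole minute is an assumption made here, not something this page states.

```python
def max_wall_to_minutes(value: str) -> int:
    """Convert a <max_wall> string to minutes.

    Accepted forms: <min>, <min>:<sec>, <hr>:<min>:<sec>,
    <days>-<hr>:<min>:<sec>, <days>-<hr>.
    """
    days = hours = minutes = seconds = 0
    if "-" in value:
        day_part, rest = value.split("-", 1)
        days = int(day_part)
        parts = [int(p) for p in rest.split(":")]
        if len(parts) == 1:
            hours = parts[0]
        elif len(parts) == 3:
            hours, minutes, seconds = parts
        else:
            raise ValueError(f"unsupported format: {value!r}")
    else:
        parts = [int(p) for p in value.split(":")]
        if len(parts) == 1:
            minutes = parts[0]
        elif len(parts) == 2:
            minutes, seconds = parts
        elif len(parts) == 3:
            hours, minutes, seconds = parts
        else:
            raise ValueError(f"unsupported format: {value!r}")
    total_seconds = ((days * 24 + hours) * 60 + minutes) * 60 + seconds
    # Assumption: partial minutes are rounded up.
    return (total_seconds + 59) // 60


one_day = max_wall_to_minutes("1-00:00:00")
```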
376
377 Priority
378 What priority will be added to a job's priority when using this
379 association. This is overridden if set directly on a user. De‐
380 fault is the cluster's limit. To clear a previously set value
381 use the modify command with a new value of -1.
382
383 QosLevel<operator><comma_separated_list_of_qos_names>
384 Specify the default Quality of Service's that jobs are able to
385 run at for this association. To get a list of valid QOS's use
386 'sacctmgr list qos'. This value will override its parent's value
387 and push down to its children as the new default. Setting a
388 QosLevel to '' (two single quotes with nothing between them) re‐
389 stores its default setting. You can also use the operator +=
390 and -= to add or remove certain QOS's from a QOS list.
391
392 Valid <operator> values include:
393
394 =
395 Set QosLevel to the specified value. Note: the QOS that can
396 be used at a given account in the hierarchy are inherited
397 by the children of that account. By assigning QOS with the
398 = sign only the assigned QOS can be used by the account and
399 its children.
400 +=
401 Add the specified <qos> value to the current QosLevel.
402 The account will have access to this QOS and the others
403 previously assigned to it.
404 -=
405 Remove the specified <qos> value from the current
406 QosLevel.
407
408
409 See the EXAMPLES section below.
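
The three QosLevel operators behave like assignment, union and difference on a set of QOS names. The sketch below models only that part. It ignores inheritance from parent accounts and the special '' value that restores the default, and the QOS names are hypothetical.

```python
def apply_qos_level(current: set[str], operator: str, names: str) -> set[str]:
    """Return the QOS set after applying QosLevel<operator><names>.

    `names` is a comma separated list of QOS names.
    """
    requested = {name for name in names.split(",") if name}
    if operator == "=":
        return requested            # only the assigned QOS remain
    if operator == "+=":
        return current | requested  # add to what was already there
    if operator == "-=":
        return current - requested  # remove the listed QOS
    raise ValueError(f"unknown operator: {operator!r}")


qos = {"normal"}
qos = apply_qos_level(qos, "+=", "high,debug")
qos = apply_qos_level(qos, "-=", "debug")
```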
410
412 Cluster=<cluster>
413 Specific cluster to add account to. Default is all in system.
414
415 Description=<description>
416 An arbitrary string describing an account.
417
418 Name=<name>
419 The name of a bank account. Note the name must be unique and
420 cannot represent different bank accounts at different points
421 in the account hierarchy.
422
423 Organization=<org>
424 Organization to which the account belongs.
425
426 Parent=<parent>
427 Parent account of this account. Default is the root account, a
428 top level account.
429
430 RawUsage=<value>
431 This allows an administrator to reset the raw usage accrued to
432 an account. The only value currently supported is 0 (zero).
433 This is a settable specification only - it cannot be used as a
434 filter to list accounts.
435
436 WithAssoc
437 Display all associations for this account.
438
439 WithCoord
440 Display all coordinators for this account.
441
442 WithDeleted
443 Display information with previously deleted data. Accounts that
444 are deleted within 24 hours of being created and did not have a
445 job run in the account during that time will be removed from the
446 database. Otherwise, the account will be marked as deleted and
447 will be viewable with the WithDeleted flag.
448
449 NOTE: If using the WithAssoc option you can also query against associa‐
450 tion specific information to view only certain associations this ac‐
451 count may have. These extra options can be found in the SPECIFICATIONS
452 FOR ASSOCIATIONS section. You can also use the general specifications
453 listed above in the GENERAL SPECIFICATIONS FOR ASSOCIATION BASED ENTITIES
454 section.
455
457 Account
458 The name of a bank account.
459
460 Description
461 An arbitrary string describing an account.
462
463 Organization
464 Organization to which the account belongs.
465
466 Coordinators
467 List of users that are a coordinator of the account. (Only
468 filled in when using the WithCoordinator option.)
469
470 NOTE: If using the WithAssoc option you can also view the information
471 about the various associations the account may have on all the clusters
472 in the system. The association information can be filtered. Note that
473 all the accounts in the database will always be shown as filter only
474 takes effect over the association data. The Association format fields
475 are described in the LIST/SHOW ASSOCIATION FORMAT OPTIONS section.
476
478 Clusters=<cluster_name>[,<cluster_name>,...]
479 List the associations of the cluster(s).
480
481 Accounts=<account_name>[,<account_name>,...]
482 List the associations of the account(s).
483
484 Users=<user_name>[,<user_name>,...]
485 List the associations of the user(s).
486
487 Partition=<partition_name>[,<partition_name>,...]
488 List the associations of the partition(s).
489
490 NOTE: You can also use the general specifications listed above in the
491 GENERAL SPECIFICATIONS FOR ASSOCIATION BASED ENTITIES section.
492
493 Other options unique for listing associations:
494
495 OnlyDefaults
496 Display only associations that are default associations.
497
498 Tree Display account names in a hierarchical fashion.
499
500 WithDeleted
501 Display information with previously deleted data. Associations
502 that are deleted within 24 hours of being created and did not
503 have a job run in the association during that time will be re‐
504 moved from the database. Otherwise, the association will be
505 marked as deleted and will be viewable with the WithDeleted
506 flag.
507
508 WithSubAccounts
509 Display information with subaccounts. Only really valuable when
510 used with the account= option. This will display all the subac‐
511 count associations along with the accounts listed in the option.
512
513 WOLimits
514 Display information without limit information. This is for a
515 smaller default format of "Cluster,Account,User,Partition".
516
517 WOPInfo
518 Display information without parent information (i.e. parent id,
519 and parent account name). This option also implicitly sets the
520 WOPLimits option.
521
522 WOPLimits
523 Display information without hierarchical parent limits (i.e.
524 will only display limits where they are set instead of propagat‐
525 ing them from the parent).
526
528 Account
529 The name of a bank account in the association.
530
531 Cluster
532 The name of a cluster in the association.
533
534 DefaultQOS
535 The QOS the association will use by default if it has access to
536 it in the QOS list mentioned below.
537
538 Fairshare
539 Share Number used in conjunction with other accounts to determine job
540 priority. Can also be the string parent, when used on a user
541 this means that the parent association is used for fairshare.
542 If Fairshare=parent is set on an account, that account's chil‐
543 dren will be effectively reparented for fairshare calculations
544 to the first parent of their parent that is not Fairshare=par‐
545 ent. Limits remain the same, only its fairshare value is af‐
546 fected.
547
548 GrpJobs
549 Maximum number of running jobs in aggregate for this association
550 and all associations which are children of this association.
551
552 GrpJobsAccrue
553 Maximum number of pending jobs in aggregate able to accrue age
554 priority for this association and all associations which are
555 children of this association.
556
557 GrpSubmit
558 GrpSubmitJobs
559 Maximum number of jobs which can be in a pending or running
560 state at any time in aggregate for this association and all as‐
561 sociations which are children of this association.
562
563 GrpTRES
564 Maximum number of TRES running jobs are able to be allocated in
565 aggregate for this association and all associations which are
566 children of this association.
567
568 GrpTRESMins
569 The total number of TRES minutes that can possibly be used by
570 past, present and future jobs running from this association and
571 its children.
572
573 GrpTRESRunMins
574 Used to limit the combined total number of TRES minutes used by
575 all jobs running with this association and its children. This
576 takes into consideration the time limit of running jobs and consumes
577 it. If the limit is reached, no new jobs are started until other
578 jobs finish and free up time.
579
580 GrpWall
581 Maximum wall clock time running jobs are able to be allocated in
582 aggregate for this association and all associations which are
583 children of this association.
584
585 ID The id of the association.
586
587 LFT Associations are kept in a hierarchy: this is the left most spot
588 in the hierarchy. When used with the RGT variable, all associa‐
589 tions with a LFT inside this LFT and before the RGT are children
590 of this association.
591
592 MaxJobs
593 Maximum number of jobs each user is allowed to run at one time.
594
595 MaxJobsAccrue
596 Maximum number of pending jobs able to accrue age priority at
597 any given time. This limit only applies to the job's QOS and
598 not the partition's QOS.
599
600 MaxSubmit
601 MaxSubmitJobs
602 Maximum number of jobs in the pending or running state at any
603 time.
604
605 MaxTRES
606 MaxTRESPerJob
607 Maximum number of TRES each job is able to use.
608
609 MaxTRESMins
610 MaxTRESMinsPerJob
611 Maximum number of TRES minutes each job is able to use.
612
613 MaxTRESPerNode
614 Maximum number of TRES each node in a job allocation can use.
615
616 MaxWall
617 MaxWallDurationPerJob
618 Maximum wall clock time each job is able to use.
619
620 Qos Valid QOS' for this association.
621
622 QosRaw QOS' ID.
623
624 ParentID
625 The association id of the parent of this association.
626
627 ParentName
628 The account name of the parent of this association.
629
630 Partition
631 The name of a partition in the association.
632
633 Priority
634 What priority will be added to a job's priority when using this
635 association.
636
637 RGT Associations are kept in a hierarchy: this is the right most
638 spot in the hierarchy. When used with the LFT variable, all as‐
639 sociations with a LFT inside this RGT and after the LFT are
640 children of this association.
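
The LFT and RGT fields describe a nested-set layout: every child's LFT falls strictly between its ancestor's LFT and RGT. The sketch below shows that containment test. The Assoc type and the numeric values are hypothetical and are not taken from a real database.

```python
from typing import NamedTuple


class Assoc(NamedTuple):
    name: str
    lft: int
    rgt: int


def is_descendant(candidate: Assoc, ancestor: Assoc) -> bool:
    """True if candidate's LFT lies after ancestor's LFT and before its RGT."""
    return ancestor.lft < candidate.lft < ancestor.rgt


# Hypothetical hierarchy: root contains science and ops;
# science contains chemistry and physics.
root = Assoc("root", 1, 10)
science = Assoc("science", 2, 7)
chemistry = Assoc("chemistry", 3, 4)
physics = Assoc("physics", 5, 6)
ops = Assoc("ops", 8, 9)

children_of_science = [a.name for a in (chemistry, physics, ops) if is_descendant(a, science)]
```

This layout lets a single range comparison find a whole subtree without walking parent links one level at a time.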
641
642 User The name of a user in the association.
643
644 WithRawQOSLevel
645 Display QosLevel in an unevaluated raw format, consisting of a
646 comma separated list of QOS names prepended with '' (nothing),
647 '+' or '-' for the association. QOS names without +/- prepended
648 were assigned (i.e., sacctmgr modify ... set QosLevel=qos_name)
649 for the entity listed or on one of its parents in the hierarchy.
650 QOS names with +/- prepended indicate the QOS was added/filtered
651 (i.e., sacctmgr modify ... set QosLevel=[+-]qos_name) for the en‐
652 tity listed or on one of its parents in the hierarchy. Including
653 WOPLimits will show exactly where each QOS was assigned, added
654 or filtered in the hierarchy.
655
657 Classification=<classification>
658 Type of machine, current classifications are capability, capac‐
659 ity and capapacity.
660
661 Features=<comma_separated_list_of_feature_names>
662 Features that are specific to the cluster. Federated jobs can be
663 directed to clusters that contain the job requested features.
664
665 Federation=<federation>
666 The federation that this cluster should be a member of. A clus‐
667 ter can only be a member of one federation at a time.
668
669 FedState=<state>
670 The state of the cluster in the federation.
671 Valid states are:
672
673 ACTIVE Cluster will actively accept and schedule federated jobs.
674
675 INACTIVE
676 Cluster will not schedule or accept any jobs.
677
678 DRAIN Cluster will not accept any new jobs and will let exist‐
679 ing federated jobs complete.
680
681 DRAIN+REMOVE
682 Cluster will not accept any new jobs and will remove it‐
683 self from the federation once all federated jobs have
684 completed. When removed from the federation, the cluster
685 will accept jobs as a non-federated cluster.
686
687 Name=<name>
688 The name of a cluster. This should be equal to the ClusterName
689 parameter in the slurm.conf configuration file for some
690 Slurm-managed cluster.
691
692 RPC=<rpc_list>
693 Comma separated list of numeric RPC values.
694
695 WithDeleted
696 Display information with previously deleted data. Clusters that
697 are deleted within 24 hours of being created and did not have a
698 job run in the cluster during that time will be removed from the
699 database. Otherwise, the cluster will be marked as deleted and
700 will be viewable with the WithDeleted flag.
701
702 WithFed
703 Appends federation related columns to default format options
704 (e.g. Federation,ID,Features,FedState).
705
706 WOLimits
707 Display information without limit information. This is for a
708 smaller default format of Cluster,ControlHost,ControlPort,RPC
709
710 NOTE: You can also use the general specifications listed above in the
711 GENERAL SPECIFICATIONS FOR ASSOCIATION BASED ENTITIES section.
712
714 Classification
715 Type of machine, i.e. capability, capacity or capapacity.
716
717 Cluster
718 The name of the cluster.
719
720 ControlHost
721 When a slurmctld registers with the database the IP address of
722 the controller is placed here.
723
724 ControlPort
725 When a slurmctld registers with the database the port the con‐
726 troller is listening on is placed here.
727
728 Features
729 The list of features on the cluster (if any).
730
731 Federation
732 The name of the federation this cluster is a member of (if any).
733
734 FedState
735 The state of the cluster in the federation (if a member of one).
736
737 FedStateRaw
738 Numeric value of the name of the FedState.
739
740 Flags Attributes possessed by the cluster. Current flags include Cray,
741 External and MultipleSlurmd.
742
743 External clusters are registration only clusters. A slurmctld
744 can designate an external slurmdbd with the AccountingStorageEx‐
745 ternalHost slurm.conf option. This allows a slurmctld to regis‐
746 ter to an external slurmdbd so that clusters attached to the ex‐
747 ternal slurmdbd can communicate with the external cluster with
748 Slurm commands.
749
750 ID The ID assigned to the cluster when a member of a federation.
751 This ID uniquely identifies the cluster and its jobs in the fed‐
752 eration.
753
754 NodeCount
755 The current count of nodes associated with the cluster.
756
757 NodeNames
758 The current nodes associated with the cluster.
759
760 PluginIDSelect
761 The numeric value of the select plugin the cluster is using.
762
763 RPC When a slurmctld registers with the database the RPC version the
764 controller is running is placed here.
765
766 TRES Trackable RESources (Billing, BB (Burst buffer), CPU, Energy,
767 GRES, License, Memory, and Node) this cluster is accounting for.
768
769
770 NOTE: You can also view the information about the root association for
771 the cluster. The Association format fields are described in the
772 LIST/SHOW ASSOCIATION FORMAT OPTIONS section.
773
775 Account=<account_name>[,<account_name>,...]
776 Account name to add this user as a coordinator to.
777
778 Names=<user_name>[,<user_name>,...]
779 Names of coordinators.
780
781 NOTE: To list coordinators use the WithCoordinator options with list
782 account or list user.
783
785 All_Clusters
786 Shortcut to get information on all clusters.
787
788 All_Time
789 Shortcut to set the time period to all time.
790
791 Clusters=<cluster_name>[,<cluster_name>,...]
792 List the events of the cluster(s). Default is the cluster where
793 the command was run.
794
795 CondFlags=<flag>[,<flag>,...]
796 Optional list of flags to filter events by.
797 Valid options are
798
799 Open If set, only open node events (currently down) will be
800 returned.
801
802 End=<OPT>
803 Period ending of events. Default is now.
804
805 Valid time formats are...
806 HH:MM[:SS] [AM|PM]
807 MMDD[YY] or MM/DD[/YY] or MM.DD[.YY]
808 MM/DD[/YY]-HH:MM[:SS]
809 YYYY-MM-DD[THH:MM[:SS]]
810 now[{+|-}count[seconds(default)|minutes|hours|days|weeks]]
811
812 Event=<OPT>
813 Specific events to look for, valid options are Cluster or Node,
814 default is both.
815
816 MaxCPUs=<OPT>
817 Max number of CPUs affected by an event.
818
819 MinCPUs=<OPT>
820 Min number of CPUs affected by an event.
821
822 Nodes=<node_name>[,<node_name>,...]
823 Node names affected by an event.
824
825 Reason=<reason>[,<reason>,...]
826 Reason an event happened.
827
828 Start=<OPT>
829 Period start of events. Default is 00:00:00 of previous day,
830 unless states are given with the States= spec events. If this
831 is the case the default behavior is to return events currently
832 in the states specified.
833
834 Valid time formats are...
835 HH:MM[:SS] [AM|PM]
836 MMDD[YY] or MM/DD[/YY] or MM.DD[.YY]
837 MM/DD[/YY]-HH:MM[:SS]
838 YYYY-MM-DD[THH:MM[:SS]]
839 now[{+|-}count[seconds(default)|minutes|hours|days|weeks]]
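
The relative form now[{+|-}count[units]] accepted by Start and End can be read as an offset from the current time. The sketch below interprets only that form, with seconds as the default unit as stated above. It accepts only the full unit names shown in the format list; whether abbreviations are accepted by sacctmgr is not covered here.

```python
import re
from datetime import datetime, timedelta

# Matches: now, now+90, now-2hours, now+1weeks, ...
_NOW_PATTERN = re.compile(r"now(?:([+-])(\d+)(seconds|minutes|hours|days|weeks)?)?")


def parse_now_spec(spec: str, now: datetime) -> datetime:
    """Resolve 'now[{+|-}count[units]]' relative to the given `now`."""
    match = _NOW_PATTERN.fullmatch(spec.strip())
    if match is None:
        raise ValueError(f"not a relative time: {spec!r}")
    sign, count, unit = match.groups()
    if sign is None:
        return now
    # timedelta accepts seconds, minutes, hours, days and weeks as keywords.
    delta = timedelta(**{unit or "seconds": int(count)})
    return now + delta if sign == "+" else now - delta


reference = datetime(2024, 1, 10, 12, 0, 0)  # fixed time so results are repeatable
two_hours_ago = parse_now_spec("now-2hours", reference)
```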
840
841 States=<state>[,<state>,...]
842 State of a node in a node event. If this is set, the event type
843 is set automatically to Node.
844
845 User=<user_name>[,<user_name>,...]
846 Query against users who set the event. If this is set, the
847 event type is set automatically to Node since only user slurm
848 can perform a cluster event.
849
851 Cluster
852 The name of the cluster the event happened on.
853
854 ClusterNodes
855 The hostlist of nodes on a cluster in a cluster event.
856
857 Duration
858 Length of time the event lasted.
859
860 End Period when event ended.
861
862 Event Name of the event.
863
864 EventRaw
865 Numeric value of the name of the event.
866
867 NodeName
868 The node affected by the event. In a cluster event, this is
869 blank.
870
871 Reason The reason an event happened.
872
873 Start Period when event started.
874
875 State On a node event this is the formatted state of the node during
876 the event.
877
878 StateRaw
879 On a node event this is the numeric value of the state of the
880 node during the event.
881
882 TRES Number of TRES involved with the event.
883
884 User On a node event this is the user who caused the event to happen.
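As a hedged illustration of how the event specifications and format
fields above can be combined, the sketch below builds a query for node
events from the last seven days. The seven-day window and the field
selection are assumptions chosen for illustration. The command is
printed rather than executed, so it can be inspected first.

```shell
# Hypothetical query window and field list; adjust as needed.
start="now-7days"
fields="NodeName,Start,End,State,Reason"

# --noheader and --parsable2 give '|' delimited output suited to scripts.
cmd="sacctmgr --noheader --parsable2 show event Event=Node Start=$start format=$fields"

# Dry run: print the command instead of running it.
echo "$cmd"
```

Run the printed command on a host that can reach slurmdbd to see the
actual events.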
885
887 Clusters[+|-]=<cluster_name>[,<cluster_name>,...]
888 List of clusters to add to or remove from a federation. A blank
889 value (e.g. clusters=) will remove all clusters from the federation.
890 NOTE: a cluster can only be a member of one federation.
891
892 Name=<name>
893 The name of the federation.
894
895 Tree Display federations in a hierarchical fashion.
896
897 WithDeleted
898 Display information with previously deleted data. Federations
899 that are deleted within 24 hours of being created will be re‐
900 moved from the database. Federations that were created more than
901 24 hours prior to the deletion request are just marked as
902 deleted and will be viewable with the WithDeleted flag.
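
The three forms of the Clusters specification (add, remove, and clear)
could look like the following sketch. The federation name fed1 and the
cluster names c1 and c3 are hypothetical. The commands are only
printed, not executed.

```shell
fed="fed1"   # hypothetical federation name

# Add cluster c3 to the federation.
add_cmd="sacctmgr -i modify federation $fed set clusters+=c3"

# Remove cluster c1 from the federation.
del_cmd="sacctmgr -i modify federation $fed set clusters-=c1"

# A blank value removes every cluster from the federation.
clear_cmd="sacctmgr -i modify federation $fed set clusters="

# Dry run: print the commands instead of running them.
printf '%s\n' "$add_cmd" "$del_cmd" "$clear_cmd"
```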
903
905 Cluster
906 Name of the cluster that is a member of the federation.
907
908 Features
909 The list of features on the cluster.
910
911 Federation
912 The name of the federation.
913
914 FedState
915 The state of the cluster in the federation.
916
917 FedStateRaw
918 Numeric value of the name of the FedState.
919
920 Index The index of the cluster in the federation.
921
923 AdminComment=<admin_comment>
924 Arbitrary descriptive string. Can only be modified by a Slurm
925 administrator.
926
927 Comment=<comment>
928 The job's comment string when the AccountingStoreFlags parameter
929 in the slurm.conf file contains 'job_comment'. The user can
930 only modify the comment string of their own job.
931
932 Cluster=<cluster_list>
933 List of clusters to alter jobs on, defaults to local cluster.
934
935 DerivedExitCode=<derived_exit_code>
936 The derived exit code can be modified after a job completes
937 based on the user's judgment of whether the job succeeded or
938 failed. The user can only modify the derived exit code of their
939 own job.
940
941 EndTime
942 Jobs must end before this time to be modified. The format is
943 YYYY-MM-DDTHH:MM:SS, unless changed through the SLURM_TIME_FORMAT
944 environment variable.
945
946 JobID=<jobid_list>
947 The id of the job to change. Not needed if altering multiple
948 jobs using wckey specification.
949
950 NewWCKey=<new_wckey>
951 Use to rename a wckey on job(s) in the accounting database.
952
953 StartTime
954 Jobs must start at or after this time to be modified in the same
955 format as EndTime.
956
957 SystemComment=<system_comment>
958 Arbitrary descriptive string, usually managed by the Burst‐
959 BufferPlugin. Can only be modified by a Slurm administrator.
960
961 User=<user_list>
962 Used to specify the users whose jobs are to be altered.
963
964 WCKey=<wckey_list>
965 Used to specify the wckeys to alter.
966
967 The DerivedExitCode, AdminComment, Comment, SystemComment, and WCKey
968 fields are the only fields of a job record in the database that can be
969 modified after job completion.
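
A minimal sketch of modifying a completed job's record, assuming a
hypothetical job id of 1234 and an illustrative comment string. The
Comment field is only stored when AccountingStoreFlags contains
job_comment, as noted above. The command is printed rather than
executed.

```shell
jobid=1234   # hypothetical job id

# Record the user's own verdict on the job plus a free-form comment.
cmd="sacctmgr -i modify job where jobid=$jobid set DerivedExitCode=1 Comment=\"out of memory\""

# Dry run: print the command instead of running it.
echo "$cmd"
```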
970
972 The sacct command is the exclusive command to display job records from
973 the Slurm database.
974
975
977 NOTE: The group limits (GrpJobs, GrpNodes, etc.) are tested when a job
978 is being considered for being allocated resources. If starting a job
979 would cause any of its group limits to be exceeded, that job will not be
980 considered for scheduling even if that job might preempt other jobs
981 which would release sufficient group resources for the pending job to
982 be initiated.
983
984
985 Flags Used by the slurmctld to override or enforce certain character‐
986 istics.
987 Valid options are
988
989 DenyOnLimit
990 If set, jobs using this QOS will be rejected at submis‐
991 sion time if they do not conform to the QOS 'Max' or
992 'Min' limits as stand-alone jobs. Jobs that exceed these
993 limits when other jobs are considered, but conform to the
994 limits when considered individually will not be rejected.
995 Instead they will pend until resources are available.
996 Group limits (e.g. GrpTRES) will also be treated like
997 'Max' limits (e.g. MaxTRESPerNode) and jobs will be de‐
998 nied if they would violate the limit as stand-alone jobs.
999 This currently only applies to QOS and Association lim‐
1000 its.
1001
1002 EnforceUsageThreshold
1003 If set, and the QOS also has a UsageThreshold, any jobs
1004 submitted with this QOS that fall below the UsageThresh‐
1005 old will be held until their Fairshare Usage goes above
1006 the Threshold.
1007
1008 NoDecay
1009 If set, this QOS will not have its GrpTRESMins, GrpWall
1010 and UsageRaw decayed by the slurm.conf PriorityDecay‐
1011 HalfLife or PriorityUsageResetPeriod settings. This al‐
1012 lows a QOS to provide aggregate limits that, once con‐
1013 sumed, will not be replenished automatically. Such a QOS
1014 will act as a time-limited quota of resources for an as‐
1015 sociation that has access to it. Account/user usage will
1016 still be decayed for associations using the QOS. The QOS
1017 GrpTRESMins and GrpWall limits can be increased or the
1018 QOS RawUsage value reset to 0 (zero) to again allow jobs
1019 submitted with this QOS to be queued (if DenyOnLimit is
1020 set) or run (pending with QOSGrp{TRES}MinutesLimit or
1021 QOSGrpWallLimit reasons, where {TRES} is some type of
1022 trackable resource).
1023
1024 NoReserve
1025 If this flag is set and backfill scheduling is used, jobs
1026 using this QOS will not reserve resources in the backfill
1027 schedule's map of resources allocated through time. This
1028 flag is intended for use with a QOS that may be preempted
1029 by jobs associated with all other QOS (e.g. use with a
1030 "standby" QOS). If this flag is used with a QOS which can
1031 not be preempted by all other QOS, it could result in
1032 starvation of larger jobs.
1033
1034 OverPartQOS
1035 If set, jobs using this QOS will be able to override any
1036 limits used by the requested partition's QOS limits.
1037
1038 PartitionMaxNodes
1039 If set, jobs using this QOS will be able to override the
1040 requested partition's MaxNodes limit.
1041
1042 PartitionMinNodes
1043 If set, jobs using this QOS will be able to override the
1044 requested partition's MinNodes limit.
1045
1046 PartitionTimeLimit
1047 If set, jobs using this QOS will be able to override the
1048 requested partition's TimeLimit.
1049
1050 RequiresReservation
1051 If set, jobs using this QOS must designate a reservation
1052 when submitting a job. This option can be useful in re‐
1053 stricting usage of a QOS that may have greater preemptive
1054 capability or additional resources to be allowed only
1055 within a reservation.
1056
1057 UsageFactorSafe
1058 If set, and AccountingStorageEnforce includes Safe, jobs
1059 will only be able to run if the job can run to completion
1060 with the UsageFactor applied.
1061
1062 GraceTime
1063 Preemption grace time, in seconds, to be extended to a job which
1064 has been selected for preemption.
1065
1066 GrpJobs
1067 Maximum number of running jobs in aggregate for this QOS.
1068
1069 GrpJobsAccrue
1070 Maximum number of pending jobs in aggregate able to accrue age
1071 priority for this QOS. This limit only applies to the job's QOS
1072 and not the partition's QOS.
1073
1074 GrpSubmit
1075 GrpSubmitJobs
1076 Maximum number of jobs which can be in a pending or running
1077 state at any time in aggregate for this QOS.
1078
1079 GrpTRES
1080 Maximum number of TRES running jobs are able to be allocated in
1081 aggregate for this QOS.
1082
1083 GrpTRESMins
1084 The total number of TRES minutes that can possibly be used by
1085 past, present and future jobs running from this QOS.
1086
1087 GrpTRESRunMins
1088 Used to limit the combined total number of TRES minutes used by
1089 all jobs running with this QOS. This takes into consideration
1090 the time limit of running jobs and consumes it. If the limit is
1091 reached, no new jobs are started until other jobs finish to allow
1092 time to free up.
1093
1094 GrpWall
1095 Maximum wall clock time running jobs are able to be allocated in
1096 aggregate for this QOS. If this limit is reached, submission
1097 requests will be denied and the running jobs will be killed.
1098
1099 ID The id of the QOS.
1100
1101 LimitFactor
1102 Factor to scale TRES count limits when running with this QOS.
1103 See below for more details.
1104
1105 MaxJobsAccruePA
1106 MaxJobsAccruePerAccount
1107 Maximum number of pending jobs an account (or subacct) can have
1108 accruing age priority at any given time. This limit only ap‐
1109 plies to the job's QOS and not the partition's QOS.
1110
1111 MaxJobsAccruePU
1112 MaxJobsAccruePerUser
1113 Maximum number of pending jobs a user can have accruing age pri‐
1114 ority at any given time. This limit only applies to the job's
1115 QOS and not the partition's QOS.
1116
1117 MaxJobsPA
1118 MaxJobsPerAccount
1119 Maximum number of jobs each account is allowed to run at one
1120 time.
1121
1122 MaxJobsPU
1123 MaxJobsPerUser
1124 Maximum number of jobs each user is allowed to run at one time.
1125
1126 MaxSubmitJobsPA
1127 MaxSubmitJobsPerAccount
1128 Maximum number of jobs in a pending or running state at any time
1129 per account.
1130
1131 MaxSubmitJobsPU
1132 MaxSubmitJobsPerUser
1133 Maximum number of jobs in a pending or running state at any time
1134 per user.
1135
1136 MaxTRES
1137 MaxTRESPerJob
1138 Maximum number of TRES each job is able to use.
1139
1140 MaxTRESMins
1141 MaxTRESMinsPerJob
1142 Maximum number of TRES minutes each job is able to use.
1143
1144 MaxTRESPA
1145 MaxTRESPerAccount
1146 Maximum number of TRES each account is able to use.
1147
1148 MaxTRESPerNode
1149 Maximum number of TRES each node in a job allocation can use.
1150
1151 MaxTRESPU
1152 MaxTRESPerUser
1153 Maximum number of TRES each user is able to use.
1154
1155 MaxWall
1156 MaxWallDurationPerJob
1157 Maximum wall clock time each job is able to use.
1158
1159 MinPrioThreshold
1160 Minimum priority required to reserve resources when scheduling.
1161
1162 MinTRES
1163 MinTRESPerJob
1164 Minimum number of TRES each job running under this QOS must re‐
1165 quest. Otherwise the job will pend until modified.
1166
1167 Name Name of the QOS.
1168
1169 Preempt
1170 Other QOS' this QOS can preempt.
1171
1172 NOTE: The Priority of a QOS is NOT related to QOS preemption,
1173 only Preempt is used to define which QOS can preempt others.
1174
1175 PreemptExemptTime
1176 Specifies a minimum run time for jobs of this QOS before they
1177 are considered for preemption. This QOS option takes precedence
1178 over the global PreemptExemptTime. This is only honored for Pre‐
1179 emptMode=REQUEUE and PreemptMode=CANCEL.
1180 Setting to -1 disables the option, allowing another QOS or the
1181 global option to take effect. Setting to 0 indicates no minimum
1182 run time and supersedes the lower priority QOS (see OverPartQOS)
1183 and/or the global option in slurm.conf.
1184
1185 PreemptMode
1186 Mechanism used to preempt jobs or enable gang scheduling for
1187 this QOS when the cluster PreemptType is set to preempt/qos.
1188 This QOS-specific PreemptMode will override the cluster-wide
1189 PreemptMode for this QOS. Unsetting the QOS specific Preempt‐
1190 Mode, by specifying "OFF", "" or "Cluster", makes it use the de‐
1191 fault cluster-wide PreemptMode.
1192 The GANG option is used to enable gang scheduling independent of
1193 whether preemption is enabled (i.e. independent of the Preempt‐
1194 Type setting). It can be specified in addition to a PreemptMode
1195 setting with the two options comma separated (e.g. Preempt‐
1196 Mode=SUSPEND,GANG).
1197 See <https://slurm.schedmd.com/preempt.html> and
1198 <https://slurm.schedmd.com/gang_scheduling.html> for more de‐
1199 tails.
1200
1201 NOTE: For performance reasons, the backfill scheduler reserves
1202 whole nodes for jobs, not partial nodes. If during backfill
1203 scheduling a job preempts one or more other jobs, the whole
1204 nodes for those preempted jobs are reserved for the preemptor
1205 job, even if the preemptor job requested fewer resources than
1206 that. These reserved nodes aren't available to other jobs dur‐
1207 ing that backfill cycle, even if the other jobs could fit on the
1208 nodes. Therefore, jobs may preempt more resources during a sin‐
1209 gle backfill iteration than they requested.
1210 NOTE: For a heterogeneous job to be considered for preemption, all
1211 components must be eligible for preemption. When a heterogeneous
1212 job is to be preempted the first identified component of the job
1213 with the highest order PreemptMode (SUSPEND (highest), REQUEUE,
1214 CANCEL (lowest)) will be used to set the PreemptMode for all
1215 components. The GraceTime and user warning signal for each com‐
1216 ponent of the heterogeneous job remain unique. Heterogeneous
1217 jobs are excluded from GANG scheduling operations.
1218
1219 OFF Is the default value and disables job preemption and
1220 gang scheduling. It is only compatible with Pre‐
1221 emptType=preempt/none at a global level.
1222
1223 CANCEL The preempted job will be cancelled.
1224
1225 GANG Enables gang scheduling (time slicing) of jobs in
1226 the same partition, and allows the resuming of sus‐
1227 pended jobs. Gang scheduling is performed indepen‐
1228 dently for each partition, so if you only want
1229 time-slicing by OverSubscribe, without any preemp‐
1230 tion, then configuring partitions with overlapping
1231 nodes is not recommended. Time-slicing won't happen
1232 between jobs on different partitions.
1233
1234 NOTE: Heterogeneous jobs are excluded from GANG
1235 scheduling operations.
1236
1237 REQUEUE Preempts jobs by requeuing them (if possible) or
1238 canceling them. For jobs to be requeued they must
1239 have the --requeue sbatch option set or the cluster
1240 wide JobRequeue parameter in slurm.conf must be set
1241 to 1.
1242
1243 SUSPEND The preempted jobs will be suspended, and later the
1244 Gang scheduler will resume them. Therefore the SUS‐
1245 PEND preemption mode always needs the GANG option to
1246 be specified at the cluster level. Also, because the
1247 suspended jobs will still use memory on the allo‐
1248 cated nodes, Slurm needs to be able to track memory
1249 resources to be able to suspend jobs.
1250 If PreemptType=preempt/qos is configured and if the
1251 preempted job(s) and the preemptor job are on the
1252 same partition, then they will share resources with
1253 the Gang scheduler (time-slicing). If not (i.e. if
1254 the preemptees and preemptor are on different parti‐
1255 tions) then the preempted jobs will remain suspended
1256 until the preemptor ends.
1257
1258 NOTE: Suspended jobs will not release GRES. Higher
1259 priority jobs will not be able to preempt to gain
1260 access to GRES.
1261
1262 WITHIN Allows for preemption between jobs sharing the same
1263 qos. By default, PreemptType=preempt/qos will only
1264 consider jobs to be eligible for preemption if they
1265 do not share the same qos value.
1266
1267 Priority
1268 What priority will be added to a job's priority when using this
1269 QOS.
1270
1271 NOTE: The Priority of a QOS is NOT related to QOS preemption,
1272 see Preempt instead.
1273
1274 RawUsage=<value>
1275 This allows an administrator to reset the raw usage accrued to a
1276 QOS. The only value currently supported is 0 (zero). This is a
1277 settable specification only - it cannot be used as a filter to
1278 list QOS.
1279
1280 UsageFactor
1281 Usage factor when running with this QOS. See below for more de‐
1282 tails.
1283
1284 UsageThreshold
1285 A float representing the lowest fairshare of an association al‐
1286 lowable to run a job. If an association falls below this
1287 threshold and has pending jobs or submits new jobs those jobs
1288 will be held until the usage goes back above the threshold. Use
1289 sshare to see current shares on the system.
1290
1291 WithDeleted
1292 Display information with previously deleted data. A QOS that is
1293 deleted within 24 hours of being created and did not have a job
1294 run in the QOS during that time will be removed from the data‐
1295 base. Otherwise, the QOS will be marked as deleted and will be
1296 viewable with the WithDeleted flag.
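
The following sketch shows one way the Flags, Preempt, and PreemptMode
options above might be combined. The QOS names standby and high are
hypothetical, and the pairing of NoReserve with a preemptable QOS
follows the recommendation in the NoReserve description. The commands
are printed rather than executed.

```shell
# Hypothetical QOS names: "standby" is preemptable, "high" is the preemptor.

# Keep standby jobs out of the backfill reservation map and cancel them
# when they are preempted.
standby_cmd="sacctmgr -i modify qos standby set Flags=NoReserve PreemptMode=cancel"

# Allow jobs in "high" to preempt jobs in "standby".
high_cmd="sacctmgr -i modify qos high set Preempt=standby"

# Review the result.
show_cmd="sacctmgr show qos format=Name,Priority,Preempt,PreemptMode,Flags"

# Dry run: print the commands instead of running them.
printf '%s\n' "$standby_cmd" "$high_cmd" "$show_cmd"
```

Note that Priority plays no part in this: only Preempt defines which
QOS can preempt which.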
1297
1299 Description
1300 An arbitrary string describing a QOS.
1301
1302 GraceTime
1303 Preemption grace time to be extended to a job which has been se‐
1304 lected for preemption in the format of hh:mm:ss. The default
1305 value is zero, no preemption grace time is allowed on this QOS.
1306 This value is only meaningful for QOS PreemptMode=CANCEL and
1307 PreemptMode=REQUEUE.
1308
1309 GrpJobs
1310 Maximum number of running jobs in aggregate for this QOS. To
1311 clear a previously set value use the modify command with a new
1312 value of -1.
1313
1314 GrpJobsAccrue
1315 Maximum number of pending jobs in aggregate able to accrue age
1316 priority for this QOS. This limit only applies to the job's QOS
1317 and not the partition's QOS. To clear a previously set value
1318 use the modify command with a new value of -1.
1319
1320 GrpSubmit
1321 GrpSubmitJobs
1322 Maximum number of jobs which can be in a pending or running
1323 state at any time in aggregate for this QOS. To clear a previ‐
1324 ously set value use the modify command with a new value of -1.
1325
1326 GrpTRES
1327 Maximum number of TRES running jobs are able to be allocated in
1328 aggregate for this QOS. To clear a previously set value use the
1329 modify command with a new value of -1 for each TRES id.
1330
1331 GrpTRESMins
1332 The total number of TRES minutes that can possibly be used by
1333 past, present and future jobs running from this QOS. To clear a
1334 previously set value use the modify command with a new value of
1335 -1 for each TRES id. NOTE: This limit only applies when using
1336 the Priority Multifactor plugin. The time is decayed using the
1337 value of PriorityDecayHalfLife or PriorityUsageResetPeriod as
1338 set in the slurm.conf. When this limit is reached all associ‐
1339 ated jobs running will be killed and all future jobs submitted
1340 with this QOS will be delayed until they are able to run inside
1341 the limit.
1342
1343 GrpWall
1344 Maximum wall clock time running jobs are able to be allocated in
1345 aggregate for this QOS. To clear a previously set value use the
1346 modify command with a new value of -1. NOTE: This limit only
1347 applies when using the Priority Multifactor plugin. The time is
1348 decayed using the value of PriorityDecayHalfLife or Priori‐
1349 tyUsageResetPeriod as set in the slurm.conf. When this limit is
1350 reached all associated jobs running will be killed and all fu‐
1351 ture jobs submitted with this QOS will be delayed until they are
1352 able to run inside the limit.
1353
1354 LimitFactor
1355 A float that is factored into an association's [Grp|Max]TRES
1356 limits. For example, if the LimitFactor is 2, then an association
1357 with a GrpTRES of 30 CPUs would be allowed to allocate 60 CPUs
1358 when running under this QOS.
1359
1360 NOTE: This factor is only applied to associations running in
1361 this QOS and is not applied to any limits in the QOS itself.
1362
1363 To clear a previously set value use the modify command with a
1364 new value of -1.
1365
1366 MaxJobsAccruePA
1367 MaxJobsAccruePerAccount
1368 Maximum number of jobs an account (or subacct) can have accruing
1369 age priority at any given time. This limit only applies to the
1370 job's QOS and not the partition's QOS.
1371
1372 MaxJobsAccruePU
1373 MaxJobsAccruePerUser
1374 Maximum number of jobs a user can have accruing age priority at
1375 any given time. This limit only applies to the job's QOS and not
1376 the partition's QOS.
1377
1378 MaxJobsPA
1379 MaxJobsPerAccount
1380 Maximum number of jobs each account is allowed to run at one
1381 time. To clear a previously set value use the modify command
1382 with a new value of -1.
1383
1384 MaxJobsPU
1385 MaxJobsPerUser
1386 Maximum number of jobs each user is allowed to run at one time.
1387 To clear a previously set value use the modify command with a
1388 new value of -1.
1389
1390 MaxTRESMins
1391 MaxTRESMinsPerJob
1392 Maximum number of TRES minutes each job is able to use. To
1393 clear a previously set value use the modify command with a new
1394 value of -1 for each TRES id.
1395
1396 MaxTRESPA
1397 MaxTRESPerAccount
1398 Maximum number of TRES each account is able to use. To clear a
1399 previously set value use the modify command with a new value of
1400 -1 for each TRES id.
1401
1402 MaxTRES
1403 MaxTRESPerJob
1404 Maximum number of TRES each job is able to use. To clear a pre‐
1405 viously set value use the modify command with a new value of -1
1406 for each TRES id.
1407
1408 MaxTRESPerNode
1409 Maximum number of TRES each node in a job allocation can use.
1410 To clear a previously set value use the modify command with a
1411 new value of -1 for each TRES id.
1412
1413 MaxTRESPU
1414 MaxTRESPerUser
1415 Maximum number of TRES each user is able to use. To clear a
1416 previously set value use the modify command with a new value of
1417 -1 for each TRES id.
1418
1419 MaxSubmitJobsPA
1420 MaxSubmitJobsPerAccount
1421 Maximum number of jobs in a pending or running state at any time
1422 per account. To clear a previously set value use the modify
1423 command with a new value of -1.
1424
1425 MaxSubmitJobsPU
1426 MaxSubmitJobsPerUser
1427 Maximum number of jobs in a pending or running state at any time
1428 per user. To clear a previously set value use the modify
1429 command with a new value of -1.
1430
1431 MaxWall
1432 MaxWallDurationPerJob
1433 Maximum wall clock time each job is able to use. <max wall>
1434 format is <min> or <min>:<sec> or <hr>:<min>:<sec> or
1435 <days>-<hr>:<min>:<sec> or <days>-<hr>. The value is recorded
1436 in minutes with rounding as needed. To clear a previously set
1437 value use the modify command with a new value of -1.
1438
1439 MinPrioThreshold
1440 Minimum priority required to reserve resources when scheduling.
1441 To clear a previously set value use the modify command with a
1442 new value of -1.
1443
1444 MinTRES
1445 Minimum number of TRES each job running under this QOS must re‐
1446 quest. Otherwise the job will pend until modified. To clear a
1447 previously set value use the modify command with a new value of
1448 -1 for each TRES id.
1449
1450 Name Name of the QOS. Needed for creation.
1451
1452 Preempt
1453 Other QOS' this QOS can preempt. Setting a Preempt to '' (two
1454 single quotes with nothing between them) restores its default
1455 setting. You can also use the operator += and -= to add or re‐
1456 move certain QOS's from a QOS list.
1457
1458 PreemptMode
1459 Mechanism used to preempt jobs of this QOS if the cluster's
1460 PreemptType is configured to preempt/qos. The default preemption
1461 mechanism is specified by the cluster-wide PreemptMode configu‐
1462 ration parameter. Possible values are "Cluster" (meaning use
1463 cluster default), "Cancel", and "Requeue". This option is not
1464 compatible with PreemptMode=OFF or PreemptMode=SUSPEND (i.e.
1465 preempted jobs must be removed from the resources).
1466
1467 Priority
1468 What priority will be added to a job's priority when using this
1469 QOS. To clear a previously set value use the modify command
1470 with a new value of -1.
1471
1472 UsageFactor
1473 A float that is factored into a job’s TRES usage (e.g. RawUsage,
1474 TRESMins, TRESRunMins). For example, if the usagefactor was 2,
1475 for every TRESBillingUnit second a job ran it would count for 2.
1476 If the usagefactor was .5, every second would only count for
1477 half of the time. A setting of 0 would add no timed usage from
1478 the job.
1479
1480 The usage factor only applies to the job's QOS and not the par‐
1481 tition QOS.
1482
1483 If the UsageFactorSafe flag is set and AccountingStorageEnforce
1484 includes Safe, jobs will only be able to run if the job can run
1485 to completion with the UsageFactor applied.
1486
1487 If the UsageFactorSafe flag is not set and AccountingStorageEn‐
1488 force includes Safe, a job will be able to be scheduled without
1489 the UsageFactor applied and will be able to run without being
1490 killed due to limits.
1491
1492 If the UsageFactorSafe flag is not set and AccountingStorageEn‐
1493 force does not include Safe, a job will be able to be scheduled
1494 without the UsageFactor applied and could be killed due to lim‐
1495 its.
1496
1497 See AccountingStorageEnforce in slurm.conf man page.
1498
1499 Default is 1. To clear a previously set value use the modify
1500 command with a new value of -1.
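
The arithmetic behind LimitFactor and UsageFactor can be sketched as
simple multiplication. The 30 CPU limit and the factors 2 and .5 come
from the descriptions above; the 3600 billing-unit-seconds of run time
is a hypothetical figure. This is only an illustration of the scaling,
not of how Slurm computes it internally.

```shell
# scaled A B: print A * B (awk handles fractional factors).
scaled() { awk -v a="$1" -v b="$2" 'BEGIN { print a * b }'; }

# LimitFactor=2 applied to an association GrpTRES of 30 CPUs.
effective_cpus=$(scaled 30 2)

# UsageFactor applied to a hypothetical 3600 billing-unit-seconds.
charged_double=$(scaled 3600 2)     # every second counts twice
charged_half=$(scaled 3600 0.5)     # every second counts for half
charged_zero=$(scaled 3600 0)       # no timed usage is added

echo "$effective_cpus $charged_double $charged_half $charged_zero"
# prints: 60 7200 1800 0
```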
1501
1503 Clusters=<cluster_name>[,<cluster_name>,...]
1504 List the reservations of the cluster(s). Default is the cluster
1505 where the command was run.
1506
1507 End=<OPT>
1508 Period ending of reservations. Default is now.
1509
1510 Valid time formats are...
1511 HH:MM[:SS] [AM|PM]
1512 MMDD[YY] or MM/DD[/YY] or MM.DD[.YY]
1513 MM/DD[/YY]-HH:MM[:SS]
1514 YYYY-MM-DD[THH:MM[:SS]]
1515 now[{+|-}count[seconds(default)|minutes|hours|days|weeks]]
1516
1517 ID=<OPT>
1518 Comma separated list of reservation ids.
1519
1520 Names=<OPT>
1521 Comma separated list of reservation names.
1522
1523 Nodes=<node_name>[,<node_name>,...]
1524 Node names where reservation ran.
1525
1526 Start=<OPT>
1527 Period start of reservations. Default is 00:00:00 of current
1528 day.
1529
1530 Valid time formats are...
1531 HH:MM[:SS] [AM|PM]
1532 MMDD[YY] or MM/DD[/YY] or MM.DD[.YY]
1533 MM/DD[/YY]-HH:MM[:SS]
1534 YYYY-MM-DD[THH:MM[:SS]]
1535 now[{+|-}count[seconds(default)|minutes|hours|days|weeks]]
1536
1538 Associations
1539 The id's of the associations able to run in the reservation.
1540
1541 Cluster
1542 Name of cluster reservation was on.
1543
1544 End End time of reservation.
1545
1546 Flags Flags on the reservation.
1547
1548 ID Reservation ID.
1549
1550 Name Name of this reservation.
1551
1552 NodeNames
1553 List of nodes in the reservation.
1554
1555 Start Start time of reservation.
1556
1557 TRES List of TRES in the reservation.
1558
1559 UnusedWall
1560 Wall clock time in seconds unused by any job. A job's allocated
1561 usage is its run time multiplied by the ratio of its CPUs to the
1562 total number of CPUs in the reservation. For example, a job us‐
1563 ing all the CPUs in the reservation running for 1 minute would
1564 reduce unused_wall by 1 minute.
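
The UnusedWall rule above can be worked through numerically. The first
case is the example from the text (a job using every CPU for one
minute); the second, a job using 4 of 8 CPUs for 10 minutes, is a
hypothetical case added for contrast.

```shell
# allocated_usage RUN_SECONDS JOB_CPUS RESERVATION_CPUS
# Prints run time multiplied by the job's share of the reservation's CPUs.
allocated_usage() {
    awk -v t="$1" -v j="$2" -v r="$3" 'BEGIN { print t * j / r }'
}

full=$(allocated_usage 60 8 8)     # all CPUs for 1 minute
half=$(allocated_usage 600 4 8)    # half the CPUs for 10 minutes

echo "$full $half"
# prints: 60 300
```

In the second case the job reduces UnusedWall by 300 seconds even
though it ran for 600.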
1565
1567 Clusters=<name_list>
1568 Comma separated list of cluster names on which specified re‐
1569 sources are to be available. If no names are designated then
1570 the clusters already allowed to use this resource will be al‐
1571 tered.
1572
1573 Count=<OPT>
1574 Number of software resources of a specific name configured on
1575 the system being controlled by a resource manager.
1576
1577 Descriptions=
1578 A brief description of the resource.
1579
1580 Flags=<OPT>
1581 Flags that identify specific attributes of the system resource.
1582 At this time no flags have been defined.
1583
1584 Names=<OPT>
1585 Comma separated list of the name of a resource configured on the
1586 system being controlled by a resource manager. If this resource
1587 is seen on the slurmctld its name will be name@server to distin‐
1588 guish it from local resources defined in a slurm.conf.
1589
1590 PercentAllowed=<percent_allowed>
1591 Percentage of a specific resource that can be used on specified
1592 cluster.
1593
1594 Server=<OPT>
1595 The name of the server serving up the resource. Default is
1596 'slurmdb' indicating the licenses are being served by the data‐
1597 base.
1598
1599 ServerType=<OPT>
1600 The type of a software resource manager providing the licenses.
1601 For example, FlexNet Publisher (FlexLM) license server or Reprise
1602 License Manager (RLM).
1603
1604 Type=<OPT>
1605 The type of the resource represented by this record. Currently
1606 the only valid type is License.
1607
1608 WithClusters
1609 Display each cluster's percentage of resources. If a resource
1610 hasn't been given to a cluster the resource will not be dis‐
1611 played with this flag.
1612
1613 WithDeleted
1614 Display information with previously deleted data. Resources
1615 that are deleted within 24 hours of being created will be re‐
1616 moved from the database. Resources that were created more than
1617 24 hours prior to the deletion request are just marked as
1618 deleted and will be viewable with the WithDeleted flag.
1619
1620 NOTE: Resource is used to define each resource configured on a system
1621 available for usage by Slurm clusters.
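
A hedged sketch of defining a license resource and working out what
PercentAllowed means in absolute terms. The resource name nastran, the
server name flex_host, and the cluster names fluid and pdf are all
hypothetical. The command is printed rather than executed.

```shell
count=100            # licenses configured globally (hypothetical)
percent_allowed=50   # share granted to each listed cluster

cmd="sacctmgr -i add resource name=nastran type=license count=$count server=flex_host servertype=flexlm cluster=fluid,pdf percentallowed=$percent_allowed"

# Licenses each listed cluster may use under this setting.
per_cluster=$(( count * percent_allowed / 100 ))

# Dry run: print the command and the derived figure.
echo "$cmd"
echo "licenses per cluster: $per_cluster"
```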
1622
1624 Allocated
1625 The percent of licenses allocated to a cluster.
1626
1627 Cluster
1628 Name of cluster resource is given to.
1629
1630 Count The count of a specific resource configured on the system glob‐
1631 ally.
1632
1633 Description
1634 Description of the resource.
1635
1636 Name Name of this resource.
1637
1638 Server Server serving up the resource.
1639
1640 ServerType
1641 The type of the server controlling the licenses.
1642
1643 Type Type of resource this record represents.
1644
1646 Cluster
1647 Name of cluster job ran on.
1648
1649 ID Id of the job.
1650
1651 Name Name of the job.
1652
1653 Partition
1654 Partition job ran on.
1655
1656 State Current State of the job in the database.
1657
1658 TimeEnd
1659 Current recorded time of the end of the job.
1660
1661 TimeStart
1662 Time job started running.
1663
1665 Accounts=<account_name>[,<account_name>,...]
1666 Only print out the transactions affecting specified accounts.
1667
1668 Action=<Specific_action_the_list_will_display>
1669 Only display transactions of the specified action type.
1670
1671 Actor=<Specific_name_the_list_will_display>
1672 Only display transactions done by a certain person.
1673
1674 Clusters=<cluster_name>[,<cluster_name>,...]
1675 Only print out the transactions affecting specified clusters.
1676
1677 End=<Date_and_time_of_last_transaction_to_return>
1678 Return all transactions before this Date and time. Default is
1679 now.
1680
1681 Start=<Date_and_time_of_first_transaction_to_return>
1682 Return all transactions after this Date and time. Default is
1683 epoch.
1684
1685 Valid time formats for End and Start are...
1686 HH:MM[:SS] [AM|PM]
1687 MMDD[YY] or MM/DD[/YY] or MM.DD[.YY]
1688 MM/DD[/YY]-HH:MM[:SS]
1689 YYYY-MM-DD[THH:MM[:SS]]
1690 now[{+|-}count[seconds(default)|minutes|hours|days|weeks]]
1691
1692 Users=<user_name>[,<user_name>,...]
1693 Only print out the transactions affecting specified users.
1694
1695 WithAssoc
1696 Get information about which associations were affected by the
1697 transactions.
1698
1700 Action Displays the type of Action that took place.
1701
1702 Actor Displays the Actor to generate a transaction.
1703
1704 Info Displays details of the transaction.
1705
1706 TimeStamp
1707 Displays when the transaction occurred.
1708
1709 Where Displays details of the constraints for the transaction.
1710
1711 NOTE: If using the WithAssoc option you can also view the information
1712 about the various associations the transaction affected. The Associa‐
1713 tion format fields are described in the LIST/SHOW ASSOCIATION FORMAT
1714 OPTIONS section.
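
For instance, reviewing the last day of changes made by one
administrator might look like the sketch below. The actor name root
and the one-day window are assumptions for illustration. The command
is printed rather than executed.

```shell
actor="root"          # hypothetical actor
start="now-1days"     # hypothetical window

cmd="sacctmgr list transactions Start=$start Actor=$actor format=TimeStamp,Action,Actor,Where,Info"

# Dry run: print the command instead of running it.
echo "$cmd"
```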
1715
1717 Account=<account>
1718 Account name to add this user to.
1719
1720 AdminLevel=<level>
1721 Admin level of user. Valid levels are None, Operator, and Ad‐
1722 min.
1723
1724 Cluster=<cluster>
1725 Specific cluster to add user to the account on. Default is all
1726 in system.
1727
1728 DefaultAccount=<account>
1729 Identify the default bank account name to be used for a job if
1730 none is specified at submission time.
1731
1732 DefaultWCKey=<defaultwckey>
1733 Identify the default Workload Characterization Key.
1734
1735 Name=<name>
1736 Name of user.
1737
1738 NewName=<newname>
1739 Use to rename a user in the accounting database.
1740
1741 Partition=<name>
1742 Partition name.
1743
1744 RawUsage=<value>
1745 This allows an administrator to reset the raw usage accrued to a
1746 user. The only value currently supported is 0 (zero). This is
1747 a settable specification only - it cannot be used as a filter to
1748 list users.
1749
1750 WCKeys=<wckeys>
1751 Workload Characterization Key values.
1752
1753 WithAssoc
1754 Display all associations for this user.
1755
1756 WithCoord
1757 Display all accounts a user is coordinator for.
1758
1759 WithDeleted
1760 Display information with previously deleted data. Users that
1761 are deleted within 24 hours of being created and did not have a
1762 job run by the user during that time will be removed from the
1763 database. Otherwise, the user will be marked as deleted and
1764 will be viewable with the WithDeleted flag.
1765
1766 NOTE: If using the WithAssoc option you can also query against
1767 association specific information to view only certain associations
1768 this user may have. These extra options can be found in the
1769 SPECIFICATIONS FOR ASSOCIATIONS section. You can also use the general
1770 specifications listed above in the GENERAL SPECIFICATIONS FOR
1771 ASSOCIATION BASED ENTITIES section.
1772
1774 AdminLevel
1775 Admin level of user.
1776
1777 Coordinators
1778 List of accounts the user is a coordinator of. (Only filled in
1779 when using the WithCoord option.)
1780
1781 DefaultAccount
1782 The user's default account.
1783
1784 DefaultWCKey
1785 The user's default wckey.
1786
1787 User The name of a user.
1788
1789 NOTE: If using the WithAssoc option you can also view the information
1790 about the various associations the user may have on all the clusters in
1791 the system. The association information can be filtered. Note that all
1792 the users in the database will always be shown, as the filter only
1793 takes effect over the association data. The Association format fields
1794 are described in the LIST/SHOW ASSOCIATION FORMAT OPTIONS section.
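The user format fields above can be combined with the -n and -P options to post-process a user listing. The sketch below prints the users whose admin level is not None. It assumes the fields User, DefaultAccount and AdminLevel were requested in that order. The function name privileged_users and the sample records are hypothetical.

```shell
# Print users whose AdminLevel is set to something other than "None".
# Assumed input layout (one record per line): User|DefaultAccount|AdminLevel
privileged_users() {
    awk -F'|' '$3 != "" && tolower($3) != "none" { print $1 " (" $3 ")" }'
}

# Hypothetical records standing in for the output of:
#   sacctmgr -n -P show user format=User,DefaultAccount,AdminLevel
sample='adam|physics|None
brian|chemistry|Operator
carol|physics|None'

admins=$(printf '%s\n' "$sample" | privileged_users)
printf '%s\n' "$admins"
```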
1795
1797 Cluster
1798 Specific cluster for the WCKey.
1799
1800 ID The ID of the WCKey.
1801
1802 User The name of a user for the WCKey.
1803
1804 WCKey Workload Characterization Key.
1805
1806 WithDeleted
1807 Display information with previously deleted data. WCKeys that
1808 are deleted within 24 hours of being created and did not have a
1809 job run with the WCKey during that time will be removed from the
1810 database. Otherwise, the WCKey will be marked as deleted and
1811 will be viewable with the WithDeleted flag.
1812
1814 ID The identification number of the trackable resource as it ap‐
1815 pears in the database.
1816
1817 Name The name of the trackable resource. This option is required for
1818 TRES types BB (Burst buffer), GRES, and License. Types CPU, Energy,
1819 Memory, and Node do not have Names. For example, if GRES is the
1820 type, then the name is the denomination of the GRES itself, e.g.
1821 GPU.
1822
1823 Type The type of the trackable resource. Current types are BB (Burst
1824 buffer), CPU, Energy, GRES, License, Memory, and Node.
1825
1827 Trackable RESources (TRES) are used in many QOS or Association limits.
1828 When setting the limits, they are given as a comma separated list. Each
1829 TRES has a different limit, e.g. GrpTRESMins=cpu=10,mem=20 would make 2
1830 different limits: 1 for 10 cpu minutes and 1 for 20 MB memory minutes.
1831 This is the case for each limit that deals with TRES. To remove a limit,
1832 -1 is used, e.g. GrpTRESMins=cpu=-1 would remove only the cpu TRES limit.
1833
1834 NOTE: When dealing with Memory as a TRES all limits are in MB.
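To make the structure of a TRES limit string concrete, the sketch below splits a value such as cpu=10,mem=20 into its individual limits and flags -1 as a request to remove a limit. It only models how the string reads and does not contact Slurm. The function name describe_tres is hypothetical.

```shell
# Split a TRES limit string such as "cpu=10,mem=20" into one limit per line.
# A value of -1 is reported as a request to remove that limit, and memory
# values are labeled as MB, following the notes above.
describe_tres() {
    printf '%s\n' "$1" | tr ',' '\n' | while IFS='=' read -r name value; do
        if [ "$value" = "-1" ]; then
            echo "${name}: remove limit"
        elif [ "$name" = "mem" ]; then
            echo "${name}: ${value} MB"
        else
            echo "${name}: ${value}"
        fi
    done
}

describe_tres "cpu=10,mem=20"
describe_tres "cpu=-1"
```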
1835
1836 NOTE: The Billing TRES is calculated from a partition's TRESBilling‐
1837 Weights. It is temporarily calculated during scheduling for each parti‐
1838 tion to enforce billing TRES limits. The final Billing TRES is calcu‐
1839 lated after the job has been allocated resources. The final number can
1840 be seen in scontrol show jobs and sacct output.
1841
1842
1844 When using the format option for listing various fields you can put a
1845 %NUMBER afterwards to specify how many characters should be printed.
1846
1847 e.g. format=name%30 will print 30 characters of field name right
1848 justified. format=name%-30 will print 30 characters left justified.
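The width and justification behavior can be pictured with the shell's printf, which pads and cuts strings in a comparable way. This is only an analogy: the function name fit_field is hypothetical, and how sacctmgr marks a truncated value is not modeled here.

```shell
# Emulate the effect of a %NUMBER suffix with printf: pad the text to the
# given width and cut off anything longer. A negative width left justifies.
fit_field() {
    width=$1
    text=$2
    case $width in
        -*) w=${width#-}; printf "%-${w}.${w}s\n" "$text" ;;
        *)  printf "%${width}.${width}s\n" "$text" ;;
    esac
}

fit_field 10 physics     # right justified in 10 columns
fit_field -10 physics    # left justified in 10 columns
fit_field 4 physics      # cut to 4 characters
```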
1849
1850
1852 sacctmgr has the capability to load and dump Slurm association data to
1853 and from a file. This method can easily add a new cluster or copy an
1854 existing cluster's associations into a new cluster with similar ac‐
1855 counts. Each file contains Slurm association data for a single cluster.
1856 Be aware that QOS information is not currently included in the informa‐
1857 tion that can be dumped to a file. QOS information can be retrieved and
1858 loaded using the REST API or it must be transferred to a new cluster
1859 manually. Comments can be put into the file with the # character. Each
1860 line of information must begin with one of the four titles: Cluster,
1861 Parent, Account or User. Following the title is a space, dash, space,
1862 entity value, then specifications. Specifications are colon separated.
1863 If any variable, such as an Organization name, has a space in it, sur‐
1864 round the name with single or double quotes.
1865
1866 To create a file of associations you can run
1867 sacctmgr dump tux file=tux.cfg
1868
1869 To load a previously created file you can run
1870 sacctmgr load file=tux.cfg
1871
1872 sacctmgr dump/load must be run as a Slurm administrator or root. If us‐
1873 ing sacctmgr load on a database without any associations, it must be
1874 run as root (because there aren't any users in the database yet).
1875
1876 Other options for load are:
1877 clean - delete what was already there and start from scratch
1878 with this information.
1879 Cluster= - specify a different name for the cluster than that
1880 which is in the file.
1881
1882 Since the associations in the system follow a hierarchy, so does the
1883 file. Anything that is a parent needs to be defined before any chil‐
1884 dren. The only exception is the understood 'root' account. This is
1885 always a default for any cluster and does not need to be defined.
1886
1887 To edit/create a file start with a cluster line for the new cluster:
1888
1889 Cluster - cluster_name:MaxTRESPerJob=node=15
1890
1891 Anything included on this line will be the default for all associations
1892 on this cluster. The options for the cluster are:
1893
1894
1895 FairShare=
1896 Number used in conjunction with other associations to de‐
1897 termine job priority.
1898
1899 GrpJobs=
1900 Maximum number of running jobs in aggregate for this as‐
1901 sociation and all associations which are children of this
1902 association.
1903
1904 GrpJobsAccrue=
1905 Maximum number of pending jobs in aggregate able to ac‐
1906 crue age priority for this association and all associa‐
1907 tions which are children of this association.
1908
1909 GrpNodes=
1910 Maximum number of nodes running jobs are able to be allo‐
1911 cated in aggregate for this association and all associa‐
1912 tions which are children of this association.
1913
1914 GrpSubmitJobs=
1915 Maximum number of jobs which can be in a pending or run‐
1916 ning state at any time in aggregate for this association
1917 and all associations which are children of this associa‐
1918 tion.
1919
1920 GrpTRES=
1921 Maximum number of TRES running jobs are able to be allo‐
1922 cated in aggregate for this association and all associa‐
1923 tions which are children of this association.
1924
1925 GrpTRESMins=
1926 The total number of TRES minutes that can possibly be
1927 used by past, present and future jobs running from this
1928 association and its children.
1929
1930 GrpTRESRunMins=
1931 Used to limit the combined total number of TRES minutes
1932 used by all jobs running with this association and its
1933 children. This takes into consideration the time limit of
1934 running jobs and consumes it. If the limit is reached, no
1935 new jobs are started until other jobs finish to allow
1936 time to free up.
1937
1938 GrpWall=
1939 Maximum wall clock time running jobs are able to be allo‐
1940 cated in aggregate for this association and all associa‐
1941 tions which are children of this association.
1942
1943 MaxJobs=
1944 Maximum number of jobs the children of this association
1945 can run.
1946
1947 MaxTRESPerJob=
1948 Maximum number of trackable resources per job the chil‐
1949 dren of this association can run.
1950
1951 MaxWallDurationPerJob=
1952 Maximum time (not related to job size) that jobs of this
1953 account's children can run.
1954
1955 QOS= Comma separated list of Quality of Service names (Defined
1956 in sacctmgr).
1957
1958 After the entry for the root account you will have entries for the
1959 other accounts on the system. The entries will look similar to this ex‐
1960 ample:
1961
1962 Parent - root
1963 Account - cs:MaxTRESPerJob=node=5:MaxJobs=4:FairShare=399:MaxWallDurationPerJob=40:Description='Computer Science':Organization='LC'
1964 Parent - cs
1965 Account - test:MaxTRESPerJob=node=1:MaxJobs=1:FairShare=1:MaxWallDurationPerJob=1:Description='Test Account':Organization='Test'
1966
1967 Any of the options after a ':' can be left out and they can be in any
1968 order. If you want to add any sub accounts just list the Parent THAT
1969 HAS ALREADY BEEN CREATED before the account you are adding.
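The rules above (four possible titles, the " - " separator, and parents defined before their children) can be checked before a file is loaded. The following is a minimal sketch of such a check, not part of sacctmgr. The function name check_flat_file and the file contents are hypothetical.

```shell
# Check a flat file: every non-comment line must start with Cluster, Parent,
# Account or User followed by " - ", and a Parent must have been defined by
# an earlier Account line (root is always understood).
check_flat_file() {
    awk -v sq="'" '
        /^[ \t]*#/ { next }                      # skip comments
        /^[ \t]*$/ { next }                      # skip blank lines
        $1 !~ /^(Cluster|Parent|Account|User)$/ || $2 != "-" {
            printf "line %d: bad title or separator\n", NR; bad = 1; next
        }
        {
            entity = $0
            sub(/^[A-Za-z]+ - /, "", entity)     # drop "Title - "
            sub(/:.*/, "", entity)               # drop the specifications
            gsub("[" sq "\"]", "", entity)       # drop surrounding quotes
            if ($1 == "Account") known[entity] = 1
            if ($1 == "Parent" && entity != "root" && !(entity in known)) {
                printf "line %d: parent %s is not defined yet\n", NR, entity
                bad = 1
            }
        }
        END { exit (bad ? 1 : 0) }
    ' "$1"
}

# Hypothetical file contents, modeled on the example above.
flat_file=$(mktemp)
cat > "$flat_file" <<'EOF'
# associations for cluster tux
Cluster - tux:MaxTRESPerJob=node=15
Parent - root
Account - cs:MaxJobs=4:FairShare=399:Description='Computer Science'
Parent - cs
Account - test:MaxJobs=1:FairShare=1:Description='Test Account'
EOF

check_flat_file "$flat_file" && echo "flat file looks consistent"
rm -f "$flat_file"
```

The check covers only the structure of the file. It does not verify that the option names after each ':' are valid.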
1970
1971 Account options are:
1972
1973 Description=
1974 A brief description of the account.
1975
1976 FairShare=
1977 Number used in conjunction with other associations to de‐
1978 termine job priority.
1979
1980 GrpTRESMins=
1981 The total number of TRES minutes that can possibly be
1982 used by past, present and future jobs running from this
1983 association and its children.
1984
1985 GrpTRESRunMins=
1986 Limits the combined total number of TRES minutes used by
1987 all jobs running with this association and its children,
1988 counting the time limit of running jobs. If the limit is
1989 reached, no new jobs start until other jobs finish.
1990
1991 GrpTRES=
1992 Maximum number of TRES running jobs are able to be allo‐
1993 cated in aggregate for this association and all associa‐
1994 tions which are children of this association.
1995
1996 GrpJobs=
1997 Maximum number of running jobs in aggregate for this as‐
1998 sociation and all associations which are children of this
1999 association.
2000
2001 GrpJobsAccrue=
2002 Maximum number of pending jobs in aggregate able to ac‐
2003 crue age priority for this association and all associa‐
2004 tions which are children of this association.
2005
2006 GrpNodes=
2007 Maximum number of nodes running jobs are able to be allo‐
2008 cated in aggregate for this association and all associa‐
2009 tions which are children of this association.
2010
2011 GrpSubmitJobs=
2012 Maximum number of jobs which can be in a pending or run‐
2013 ning state at any time in aggregate for this association
2014 and all associations which are children of this associa‐
2015 tion.
2016
2017 GrpWall=
2018 Maximum wall clock time running jobs are able to be allo‐
2019 cated in aggregate for this association and all associa‐
2020 tions which are children of this association.
2021
2022 MaxJobs=
2023 Maximum number of jobs the children of this association
2024 can run.
2025
2026 MaxNodesPerJob=
2027 Maximum number of nodes per job the children of this as‐
2028 sociation can run.
2029
2030 MaxWallDurationPerJob=
2031 Maximum time (not related to job size) that jobs of this
2032 account's children can run.
2033
2034 Organization=
2035 Name of organization that owns this account.
2036
2037 QOS(=,+=,-=)
2038 Comma separated list of Quality of Service names (Defined
2039 in sacctmgr).
2040
2041
2042 To add users to an account add a line after the Parent line, similar to
2043 this:
2044
2045 Parent - test
2046 User - adam:MaxTRESPerJob=node=2:MaxJobs=3:FairShare=1:MaxWallDurationPerJob=1:AdminLevel=Operator:Coordinator='test'
2047
2048
2049 User options are:
2050
2051 AdminLevel=
2052 Type of admin this user is (Administrator, Operator).
2053 Must be defined on the first occurrence of the user.
2054
2055 Coordinator=
2056 Comma separated list of accounts this user is coordinator
2057 over.
2058 Must be defined on the first occurrence of the user.
2059
2060 DefaultAccount=
2061 System wide default account name.
2062 Must be defined on the first occurrence of the user.
2063
2064 FairShare=
2065 Number used in conjunction with other associations to de‐
2066 termine job priority.
2067
2068 MaxJobs=
2069 Maximum number of jobs this user can run.
2070
2071 MaxTRESPerJob=
2072 Maximum number of trackable resources per job this user
2073 can run.
2074
2075 MaxWallDurationPerJob=
2076 Maximum time (not related to job size) this user's jobs can run.
2077
2078 QOS(=,+=,-=)
2079 Comma separated list of Quality of Service names (Defined
2080 in sacctmgr).
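Lines for a flat file can also be generated rather than typed by hand. The sketch below assembles one line from a title, an entity value and a list of specifications, joining them with colons and quoting any value that contains a space. The helper name flat_line is hypothetical and is not provided by Slurm.

```shell
# Build one flat file line: "Title - entity:Key=Value:Key=Value...".
# Values containing a space are wrapped in single quotes.
flat_line() {
    local line="$1 - $2" spec key value
    shift 2
    for spec in "$@"; do
        key=${spec%%=*}      # text before the first '='
        value=${spec#*=}     # text after the first '=' (may contain '=')
        case $value in
            *" "*) value="'$value'" ;;
        esac
        line="$line:$key=$value"
    done
    printf '%s\n' "$line"
}

flat_line Parent test
flat_line User adam MaxTRESPerJob=node=2 MaxJobs=3 AdminLevel=Operator
flat_line Account cs FairShare=399 "Description=Computer Science"
```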
2081
2083 sacctmgr has the capability to archive to a flat file and/or load that
2084 data if needed later. The archiving is usually done by the slurmdbd
2085 and it is highly recommended you only do it through sacctmgr if you
2086 completely understand what you are doing. For slurmdbd options see
2087 "man slurmdbd". Loading data into the database
2088 can be done from these files to either view old data or regenerate
2089 rolled up data.
2090
2091
2092 archive dump
2093 Dump accounting data to file. Data will not be archived unless the cor‐
2094 responding purge option is included in this command or in slur‐
2095 mdbd.conf. This operation cannot be rolled back once executed. If one
2096 of the following options is not specified when sacctmgr is called, the
2097 value configured in slurmdbd.conf is used.
2098
2099
2100 Directory=
2101 Directory to store the archive data.
2102
2103 Events Archive Events. If not specified and PurgeEventAfter is set all
2104 event data removed will be lost permanently.
2105
2106 Jobs Archive Jobs. If not specified and PurgeJobAfter is set all job
2107 data removed will be lost permanently.
2108
2109 PurgeEventAfter=
2110 Purge cluster event records older than time stated in months.
2111 If you want to purge on a shorter time period you can append
2112 hours or days to the numeric value to get those more frequent
2113 purges (e.g. a value of '12hours' would purge everything
2114 older than 12 hours).
2115
2116 PurgeJobAfter=
2117 Purge job records older than time stated in months. If you want
2118 to purge on a shorter time period you can append hours or days
2119 to the numeric value to get those more frequent purges (e.g. a
2120 value of '12hours' would purge everything older than 12
2121 hours).
2122
2123 PurgeStepAfter=
2124 Purge step records older than time stated in months. If you
2125 want to purge on a shorter time period you can append hours or
2126 days to the numeric value to get those more frequent purges
2127 (e.g. a value of '12hours' would purge everything older than 12
2128 hours).
2129
2130 PurgeSuspendAfter=
2131 Purge job suspend records older than time stated in months. If
2132 you want to purge on a shorter time period you can append
2133 hours or days to the numeric value to get those more frequent
2134 purges (e.g. a value of '12hours' would purge everything
2135 older than 12 hours).
2136
2137 Script=
2138 Run this script instead of the generic form of archive to flat
2139 files.
2140
2141 Steps Archive Steps. If not specified and PurgeStepAfter is set all
2142 step data removed will be lost permanently.
2143
2144 Suspend
2145 Archive Suspend Data. If not specified and PurgeSuspendAfter is
2146 set all suspend data removed will be lost permanently.
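Because an archive dump cannot be rolled back, it can help to assemble and review the command before running it. The sketch below validates a purge value against the forms described above (a number of months, or a number followed by hours or days) and only prints the resulting command. The function names and the directory path are hypothetical.

```shell
# Accept only the purge values described above: a number (months), or a
# number followed by "hours" or "days".
valid_purge_value() {
    case $1 in
        ''|*[!0-9a-z]*) return 1 ;;
    esac
    num=${1%%[a-z]*}         # leading digits
    unit=${1#"$num"}         # whatever follows the digits
    [ -n "$num" ] || return 1
    case $unit in
        ''|hours|days) return 0 ;;
        *) return 1 ;;
    esac
}

# Print (do not run) an archive dump command for job records.
archive_jobs_cmd() {
    dir=$1
    after=$2
    valid_purge_value "$after" || { echo "invalid purge value: $after" >&2; return 1; }
    echo "sacctmgr archive dump Directory=$dir Jobs PurgeJobAfter=$after"
}

archive_jobs_cmd /var/spool/slurm/archive 12
```

Including Jobs together with PurgeJobAfter= keeps the purged job records in the archive instead of losing them.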
2147
2148
2149 archive load
2150 Load previously archived data into the database. The archive file will
2151 not be loaded if the records already exist in the database - therefore,
2152 trying to load an archive file more than once will result in an error.
2153 When this data is again archived and purged from the database, if the
2154 old archive file is still in the directory ArchiveDir, a new archive
2155 file will be created (see ArchiveDir in the slurmdbd.conf man page), so
2156 the old file will not be overwritten and these files will have dupli‐
2157 cate records.
2158
2159
2160 Archive files from the current or any prior Slurm release may be loaded
2161 through archive load.
2162
2163
2164 File= File to load into database. The specified file must exist on the
2165 slurmdbd host, which is not necessarily the machine running the
2166 command.
2167
2168 Insert=
2169 SQL to insert directly into the database. This should be used
2170 very cautiously since this writes your SQL directly into the
2171 database.
2172
2174 Executing sacctmgr sends a remote procedure call to slurmdbd. If enough
2175 calls from sacctmgr or other Slurm client commands that send remote
2176 procedure calls to the slurmdbd daemon come in at once, it can result
2177 in a degradation of performance of the slurmdbd daemon, possibly re‐
2178 sulting in a denial of service.
2179
2180 Do not run sacctmgr or other Slurm client commands that send remote
2181 procedure calls to slurmdbd from loops in shell scripts or other pro‐
2182 grams. Ensure that programs limit calls to sacctmgr to the minimum
2183 necessary for the information you are trying to gather.
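One way to follow this advice is to query once and do the per-item work locally. The sketch below counts users per account from the result of a single query instead of calling sacctmgr once per account. It assumes records produced with -n and -P and format=cluster,account,user. The function name users_per_account and the sample records are hypothetical.

```shell
# Summarize users per account from ONE query result.
# Assumed input layout: Cluster|Account|User (User is empty for the
# association of the account itself).
users_per_account() {
    awk -F'|' '$3 != "" { n[$2]++ } END { for (a in n) print a, n[a] }' | sort
}

# Hypothetical records standing in for a single call such as:
#   sacctmgr -n -P show assoc format=cluster,account,user
sample='tux|root|
tux|root|root
tux|chemistry|
tux|chemistry|brian
tux|physics|
tux|physics|adam
tux|physics|carol'

counts=$(printf '%s\n' "$sample" | users_per_account)
printf '%s\n' "$counts"
```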
2184
2185
2187 Some sacctmgr options may be set via environment variables. These envi‐
2188 ronment variables, along with their corresponding options, are listed
2189 below. (Note: Command line options will always override these set‐
2190 tings.)
2191
2192
2193 SLURM_CONF The location of the Slurm configuration file.
2194
2196 NOTE: There is an order to set up accounting associations. You must
2197 define clusters before you add accounts and you must add accounts be‐
2198 fore you can add users.
2199
2200 $ sacctmgr create cluster tux
2201 $ sacctmgr create account name=science fairshare=50
2202 $ sacctmgr create account name=chemistry parent=science fairshare=30
2203 $ sacctmgr create account name=physics parent=science fairshare=20
2204 $ sacctmgr create user name=adam cluster=tux account=physics fairshare=10
2205 $ sacctmgr delete user name=adam cluster=tux account=physics
2206 $ sacctmgr delete account name=physics cluster=tux
2207 $ sacctmgr modify user where name=adam cluster=tux account=physics set maxjobs=2 maxwall=30:00
2208 $ sacctmgr add user brian account=chemistry
2209 $ sacctmgr list associations cluster=tux format=Account,Cluster,User,Fairshare tree withd
2210 $ sacctmgr list transactions Action="Add Users" Start=11/03-10:30:00 format=Where,Time
2211 $ sacctmgr dump cluster=tux file=tux_data_file
2212 $ sacctmgr load tux_data_file
2213
2214 A user's account cannot be changed directly. A new association needs
2215 to be created for the user with the new account. Then the association
2216 with the old account can be deleted.
2217
2218 When modifying an object, the placement of the key words 'set' and the
2219 optional 'where' is critical. Below are examples that produce correct
2220 results. As a rule of thumb, anything you put in front of the set will
2221 be used as a qualifier. If you want to put a qualifier after the key
2222 word 'set', you should use the key word 'where'. The following is
2223 wrong:
2224
2225 $ sacctmgr modify user name=adam set fairshare=10 cluster=tux
2226
2227 This will produce an error as the above line reads modify user adam set
2228 fairshare=10 and cluster=tux. Either of the following is correct:
2229
2230 $ sacctmgr modify user name=adam cluster=tux set fairshare=10
2231 $ sacctmgr modify user name=adam set fairshare=10 where cluster=tux
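Scripts that generate modify commands can avoid the ordering mistake by always emitting the conditions before 'set'. A minimal sketch follows. The helper name modify_cmd is hypothetical, and the function only prints the command so it can be reviewed before it is run.

```shell
# Print a modify command with every condition placed before "set", so that
# no condition can end up being read as part of the "set" list.
# Usage: modify_cmd ENTITY "CONDITIONS" "CHANGES"
modify_cmd() {
    [ -n "$2" ] && [ -n "$3" ] || { echo "need conditions and changes" >&2; return 1; }
    echo "sacctmgr modify $1 where $2 set $3"
}

modify_cmd user "name=adam cluster=tux" "fairshare=10"
```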
2232
2233 When changing qos for something only use the '=' operator when wanting
2234 to explicitly set the qos to something. In most cases you will want to
2235 use the '+=' or '-=' operator to either add to or remove from the ex‐
2236 isting qos already in place.
2237
2238 If a user already has a qos of normal,standby, whether inherited from
2239 a parent or explicitly set, you should use qos+=expedite to add
2240 expedite to the list.
2241
2242 If you only want to add the qos expedite for a certain account
2243 and/or cluster, you can do that by specifying them in the sacctmgr
2244 line.
2245
2246 $ sacctmgr modify user name=adam set qos+=expedite
2247
2248 or
2249
2250 $ sacctmgr modify user name=adam acct=this cluster=tux set qos+=expedite
2251
2252 The following example shows how to add QOS to accounts. List all
2253 available QOSs in the cluster.
2254
2255 $ sacctmgr show qos format=name
2256 Name
2257 ---------
2258 normal
2259 expedite
2260
2261 List all the associations in the cluster.
2262
2263 $ sacctmgr show assoc format=cluster,account,qos
2264 Cluster Account QOS
2265 -------- ---------- -----
2266 zebra root normal
2267 zebra root normal
2268 zebra g normal
2269 zebra g1 normal
2270
2271 Add the QOS expedite to account G1 and display the result. Using the
2272 operator += the QOS will be added together with the existing QOS to
2273 this account.
2274
2275 $ sacctmgr modify account name=g1 set qos+=expedite
2276 $ sacctmgr show assoc format=cluster,account,qos
2277 Cluster Account QOS
2278 -------- -------- -------
2279 zebra root normal
2280 zebra root normal
2281 zebra g normal
2282 zebra g1 expedite,normal
2283
2284 Now set the QOS expedite as the only QOS for the account G and display
2285 the result. Using the operator = makes expedite the only QOS usable by
2286 account G.
2287
2288 $ sacctmgr modify account name=G set qos=expedite
2289 $ sacctmgr show assoc format=cluster,account,qos
2290 Cluster Account QOS
2291 --------- -------- -----
2292 zebra root normal
2293 zebra root normal
2294 zebra g expedite
2295 zebra g1 expedite,normal
2296
2297 If a new account is added under the account G it will inherit the QOS
2298 expedite and it will not have access to QOS normal.
2299
2300 $ sacctmgr add account banana parent=G
2301 $ sacctmgr show assoc format=cluster,account,qos
2302 Cluster Account QOS
2303 --------- -------- -----
2304 zebra root normal
2305 zebra root normal
2306 zebra g expedite
2307 zebra banana expedite
2308 zebra g1 expedite,normal
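The effect of the three operators on a QOS list can be modeled with a small function. This is a sketch of the list semantics shown in the examples above, not Slurm's implementation. The function name apply_qos is hypothetical, and sorting the result is a choice made here so that results compare predictably.

```shell
# Model the QOS operators on a comma separated list.
# Usage: apply_qos CURRENT_LIST OPERATOR NAMES
apply_qos() {
    local current=$1 op=$2 names=$3 result name
    case $op in
        '=')  result=$names ;;                   # replace the whole list
        '+=') result="$current,$names" ;;        # add to the existing list
        '-=') result=$current                    # remove each given name
              for name in $(printf '%s' "$names" | tr ',' ' '); do
                  result=$(printf '%s\n' "$result" | tr ',' '\n' |
                           grep -vxF -e "$name" | paste -sd, -)
              done ;;
        *)    echo "unknown operator: $op" >&2; return 1 ;;
    esac
    # Drop empty entries and duplicates, then join with commas again.
    printf '%s\n' "$result" | tr ',' '\n' | grep -v '^$' | sort -u | paste -sd, -
}

apply_qos normal += expedite             # as for account g1 above
apply_qos expedite,normal = expedite     # as for account g above
apply_qos expedite,normal -= normal
```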
2309
2310 An example of listing trackable resources:
2311
2312 $ sacctmgr show tres
2313       Type              Name       ID
2314 ---------- ----------------- --------
2315        cpu                          1
2316        mem                          2
2317     energy                          3
2318       node                          4
2319    billing                          5
2320       gres         gpu:tesla     1001
2321    license               vcs     1002
2322         bb              cray     1003
2323
2324
2326 Copyright (C) 2008-2010 Lawrence Livermore National Security. Produced
2327 at Lawrence Livermore National Laboratory (cf, DISCLAIMER).
2328 Copyright (C) 2010-2022 SchedMD LLC.
2329
2330 This file is part of Slurm, a resource management program. For de‐
2331 tails, see <https://slurm.schedmd.com/>.
2332
2333 Slurm is free software; you can redistribute it and/or modify it under
2334 the terms of the GNU General Public License as published by the Free
2335 Software Foundation; either version 2 of the License, or (at your op‐
2336 tion) any later version.
2337
2338 Slurm is distributed in the hope that it will be useful, but WITHOUT
2339 ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
2340 FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
2341 for more details.
2342
2343
2345 slurm.conf(5), slurmdbd(8)
2346
2347
2348
2349August 2022 Slurm Commands sacctmgr(1)