sacctmgr(1)                     Slurm Commands                    sacctmgr(1)

NAME
       sacctmgr - Used to view and modify Slurm account information.

SYNOPSIS
       sacctmgr [OPTIONS...] [COMMAND...]

DESCRIPTION
       sacctmgr is used to view or modify Slurm account information.  The
       account information is maintained within a database with the
       interface being provided by slurmdbd (Slurm Database daemon).  This
       database can serve as a central storehouse of user and computer
       information for multiple computers at a single site.  Slurm account
       information is recorded based upon four parameters that form what is
       referred to as an association.  These parameters are user, cluster,
       partition, and account.  user is the login name.  cluster is the
       name of a Slurm managed cluster as specified by the ClusterName
       parameter in the slurm.conf configuration file.  partition is the
       name of a Slurm partition on that cluster.  account is the bank
       account for a job.  The intended mode of operation is to initiate
       the sacctmgr command; add, delete, modify, and/or list association
       records; then commit the changes and exit.

       Note: The contents of Slurm's database are maintained in lower case.
       This may result in some sacctmgr output differing from that of other
       Slurm commands.
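
       For example, a typical interactive session might look like the
       following (the account and user names are illustrative):

              $ sacctmgr
              sacctmgr: add account science Description="science accounts"
              sacctmgr: add user brian Account=science
              sacctmgr: list associations
              sacctmgr: exit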

OPTIONS
       -h, --help
              Print a help message describing the usage of sacctmgr.  This
              is equivalent to the help command.

       -i, --immediate
              Commit changes immediately without asking for confirmation.

       -n, --noheader
              No header will be added to the beginning of the output.

       -p, --parsable
              Output will be '|' delimited with a '|' at the end.

       -P, --parsable2
              Output will be '|' delimited without a '|' at the end.
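
       For example, the same query under each option differs only in the
       trailing delimiter (field names follow the format specification;
       the output shown is illustrative):

              $ sacctmgr -p list cluster format=cluster,controlhost
              Cluster|ControlHost|
              $ sacctmgr -P list cluster format=cluster,controlhost
              Cluster|ControlHost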

       -Q, --quiet
              Print no messages other than error messages.  This is
              equivalent to the quiet command.

       -r, --readonly
              Makes it so the running sacctmgr cannot modify accounting
              information.  The readonly option is for use within
              interactive mode.

       -s, --associations
              Use with show or list to display associations with the
              entity.  This is equivalent to the associations command.

       -v, --verbose
              Enable detailed logging.  This is equivalent to the verbose
              command.

       -V, --version
              Display version number.  This is equivalent to the version
              command.

COMMANDS
       add <ENTITY> <SPECS>
              Add an entity.  Identical to the create command.

       associations
              Use with show or list to display associations with the
              entity.

       clear stats
              Clear the server statistics.

       create <ENTITY> <SPECS>
              Add an entity.  Identical to the add command.

       delete <ENTITY> where <SPECS>
              Delete the specified entities.

       dump <ENTITY> <File=FILENAME>
              Dump cluster data to the specified file.  If the filename is
              not specified it uses <clustername>.cfg by default.

       help   Display a description of sacctmgr options and commands.

       list <ENTITY> [<SPECS>]
              Display information about the specified entity.  By default,
              all entries are displayed; you can narrow results by
              specifying SPECS in your query.  Identical to the show
              command.

       load <FILENAME>
              Load cluster data from the specified file.  This is a
              configuration file generated by running the sacctmgr dump
              command.  This command does not load archive data; see the
              sacctmgr archive load option instead.
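
       For example, to snapshot a cluster's data to a file and later
       restore it (the cluster name is illustrative):

              $ sacctmgr dump snowflake File=snowflake.cfg
              $ sacctmgr load snowflake.cfg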

       modify <ENTITY> where <SPECS> set <SPECS>
              Modify an entity.

       problem
              Use with show or list to display entity problems.

       reconfigure
              Reconfigure the SlurmDBD, if one is running.

       show <ENTITY> [<SPECS>]
              Display information about the specified entity.  By default,
              all entries are displayed; you can narrow results by
              specifying SPECS in your query.  Identical to the list
              command.

       shutdown
              Shutdown the server.

       version
              Display the version number of sacctmgr.

       NOTE: All commands listed below can be used in the interactive
       mode, but NOT on the initial command line.

       exit   Terminate sacctmgr interactive mode.  Identical to the quit
              command.

       quiet  Print no messages other than error messages.

       quit   Terminate the execution of sacctmgr interactive mode.
              Identical to the exit command.

       verbose
              Enable detailed logging.  This includes time-stamps on data
              structures, record counts, etc.  This is an independent
              command with no options meant for use in interactive mode.

       !!     Repeat the last command.

ENTITIES
       account
              A bank account, typically specified at job submit time using
              the --account= option.  These may be arranged in a
              hierarchical fashion; for example, accounts chemistry and
              physics may be children of the account science.  The
              hierarchy may have an arbitrary depth.

       association
              The entity used to group information consisting of four
              parameters: account, cluster, partition (optional), and
              user.  Used only with the list or show command.  Add,
              modify, and delete should be done to a user, account or
              cluster entity.  This will in turn update the underlying
              associations.

       cluster
              The ClusterName parameter in the slurm.conf configuration
              file, used to differentiate accounts on different machines.

       configuration
              Used only with the list or show command to report current
              system configuration.

       coordinator
              A special privileged user, usually an account manager, who
              can add users or sub-accounts to the account they are
              coordinator over.  This should be a trusted person since
              they can change limits on account and user associations
              inside their realm.

       event  Events like downed or draining nodes on clusters.

       federation
              A group of clusters that work together to schedule jobs.

       job    Used to modify specific fields of a job: Derived Exit Code
              and the Comment String.

       qos    Quality of Service.

       Resource
              Software resources for the system.  These are software
              licenses shared among clusters.

       RunawayJobs
              Used only with the list or show command to report current
              jobs that have been orphaned on the local cluster and are
              now runaway.  If there are jobs in this state it will also
              give you an option to "fix" them.  NOTE: You must have an
              AdminLevel of at least Operator to perform this.
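
       For example, to check for (and optionally fix) runaway jobs on the
       local cluster:

              $ sacctmgr list runawayjobs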

       stats  Used with the list or show command to view server
              statistics.  Accepts an optional argument of ave_time or
              total_time to sort on those fields.  By default, sorts on
              the increasing RPC count field.

       transaction
              List of transactions that have occurred during a given time
              period.

       user   The login name.  Only lowercase usernames are supported.

       wckeys Workload Characterization Key.  An arbitrary string for
              grouping orthogonal accounts.

GENERAL SPECIFICATIONS FOR ASSOCIATION BASED ENTITIES
       NOTE: The group limits (GrpJobs, GrpTRES, etc.) are tested when a
       job is being considered for being allocated resources.  If starting
       a job would cause any of its group limits to be exceeded, that job
       will not be considered for scheduling even if that job might
       preempt other jobs which would release sufficient group resources
       for the pending job to be initiated.

       DefaultQOS=<default qos>
              The default QOS this association and its children should
              have.  This is overridden if set directly on a user.  To
              clear a previously set value use the modify command with a
              new value of -1.
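
       For example, to set and then clear a default QOS on an account (the
       account and QOS names are illustrative):

              $ sacctmgr modify account physics set DefaultQOS=normal
              $ sacctmgr modify account physics set DefaultQOS=-1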

       Fairshare=<fairshare number | parent>
              Number used in conjunction with other accounts to determine
              job priority.  Can also be the string parent; when used on a
              user this means that the parent association is used for
              fairshare.  If Fairshare=parent is set on an account, that
              account's children will be effectively reparented for
              fairshare calculations to the first parent of their parent
              that is not Fairshare=parent.  Limits remain the same; only
              its fairshare value is affected.  To clear a previously set
              value use the modify command with a new value of -1.

       GraceTime=<preemption grace time in seconds>
              Specifies, in units of seconds, the preemption grace time to
              be extended to a job which has been selected for preemption.
              The default value is zero, meaning no preemption grace time
              is allowed on this QOS.

              NOTE: This value is only meaningful for QOS
              PreemptMode=CANCEL.

       GrpTRESMins=<TRES=max TRES minutes,...>
              The total number of TRES minutes that can possibly be used
              by past, present and future jobs running from this
              association and its children.  To clear a previously set
              value use the modify command with a new value of -1 for each
              TRES id.

              NOTE: This limit is not enforced if set on the root
              association of a cluster.  So even though it may appear in
              sacctmgr output, it will not be enforced.

              ALSO NOTE: This limit only applies when using the Priority
              Multifactor plugin.  The time is decayed using the value of
              PriorityDecayHalfLife or PriorityUsageResetPeriod as set in
              the slurm.conf.  When this limit is reached all associated
              running jobs will be killed and all future jobs submitted
              with associations in the group will be delayed until they
              are able to run inside the limit.

       GrpTRESRunMins=<TRES=max TRES run minutes,...>
              Used to limit the combined total number of TRES minutes used
              by all jobs running with this association and its children.
              This takes into consideration the time limit of running jobs
              and consumes it.  If the limit is reached, no new jobs are
              started until other jobs finish to allow time to free up.

       GrpTRES=<TRES=max TRES,...>
              Maximum number of TRES running jobs are able to be allocated
              in aggregate for this association and all associations which
              are children of this association.  To clear a previously set
              value use the modify command with a new value of -1 for each
              TRES id.

              NOTE: This limit only applies fully when using the Select
              Consumable Resource plugin.
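
       For example, to cap an account and its children at 80 CPUs and 4
       GPUs in aggregate (the account name and TRES values are
       illustrative):

              $ sacctmgr modify account chemistry set GrpTRES=cpu=80,gres/gpu=4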

       GrpJobs=<max jobs>
              Maximum number of running jobs in aggregate for this
              association and all associations which are children of this
              association.  To clear a previously set value use the modify
              command with a new value of -1.

       GrpJobsAccrue=<max jobs>
              Maximum number of pending jobs in aggregate able to accrue
              age priority for this association and all associations which
              are children of this association.  To clear a previously set
              value use the modify command with a new value of -1.

       GrpSubmitJobs=<max jobs>
              Maximum number of jobs which can be in a pending or running
              state at any time in aggregate for this association and all
              associations which are children of this association.  To
              clear a previously set value use the modify command with a
              new value of -1.

       GrpWall=<max wall>
              Maximum wall clock time running jobs are able to be
              allocated in aggregate for this association and all
              associations which are children of this association.  To
              clear a previously set value use the modify command with a
              new value of -1.

              NOTE: This limit is not enforced if set on the root
              association of a cluster.  So even though it may appear in
              sacctmgr output, it will not be enforced.

              ALSO NOTE: This limit only applies when using the Priority
              Multifactor plugin.  The time is decayed using the value of
              PriorityDecayHalfLife or PriorityUsageResetPeriod as set in
              the slurm.conf.  When this limit is reached all associated
              running jobs will be killed and all future jobs submitted
              with associations in the group will be delayed until they
              are able to run inside the limit.

       MaxTRESMins=<max TRES minutes>
              Maximum number of TRES minutes each job is able to use in
              this association.  This is overridden if set directly on a
              user.  Default is the cluster's limit.  To clear a
              previously set value use the modify command with a new value
              of -1 for each TRES id.

       MaxTRES=<max TRES>
              Maximum number of TRES each job is able to use in this
              association.  This is overridden if set directly on a user.
              Default is the cluster's limit.  To clear a previously set
              value use the modify command with a new value of -1 for each
              TRES id.

              NOTE: This limit only applies fully when using the Select
              Consumable Resource plugin.

       MaxJobs=<max jobs>
              Maximum number of jobs each user is allowed to run at one
              time in this association.  This is overridden if set
              directly on a user.  Default is the cluster's limit.  To
              clear a previously set value use the modify command with a
              new value of -1.

       MaxJobsAccrue=<max jobs>
              Maximum number of pending jobs able to accrue age priority
              at any given time for the given association.  This is
              overridden if set directly on a user.  Default is the
              cluster's limit.  To clear a previously set value use the
              modify command with a new value of -1.

       MaxSubmitJobs=<max jobs>
              Maximum number of jobs this association can have in a
              pending or running state at any time.  Default is the
              cluster's limit.  To clear a previously set value use the
              modify command with a new value of -1.

       MaxWall=<max wall>
              Maximum wall clock time each job is able to use in this
              association.  This is overridden if set directly on a user.
              Default is the cluster's limit.  <max wall> format is <min>
              or <min>:<sec> or <hr>:<min>:<sec> or
              <days>-<hr>:<min>:<sec> or <days>-<hr>.  The value is
              recorded in minutes with rounding as needed.  To clear a
              previously set value use the modify command with a new value
              of -1.

              NOTE: Changing this value will have no effect on any running
              or pending job.
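
       For example, the following equivalent commands all set a two day
       limit, recorded as 2880 minutes (the account name is illustrative):

              $ sacctmgr modify account physics set MaxWall=2880
              $ sacctmgr modify account physics set MaxWall=48:00:00
              $ sacctmgr modify account physics set MaxWall=2-00:00:00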

       Priority
              What priority will be added to a job's priority when using
              this association.  This is overridden if set directly on a
              user.  Default is the cluster's limit.  To clear a
              previously set value use the modify command with a new value
              of -1.

       QosLevel<operator><comma separated list of qos names>
              Specify the default Qualities of Service that jobs are able
              to run at for this association.  To get a list of valid QOSs
              use 'sacctmgr list qos'.  This value will override its
              parent's value and push down to its children as the new
              default.  Setting a QosLevel to '' (two single quotes with
              nothing between them) restores its default setting.  You can
              also use the operators += and -= to add or remove certain
              QOSs from a QOS list.

              Valid <operator> values include:

              =  Set QosLevel to the specified value.  Note: the QOSs that
                 can be used at a given account in the hierarchy are
                 inherited by the children of that account.  By assigning
                 QOSs with the = sign, only the assigned QOSs can be used
                 by the account and its children.

              += Add the specified <qos> value to the current QosLevel.
                 The account will have access to this QOS and the others
                 previously assigned to it.

              -= Remove the specified <qos> value from the current
                 QosLevel.

              See the EXAMPLES section below.
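
       For instance (the account and QOS names are illustrative):

              $ sacctmgr modify account physics set QosLevel=normal,standby
              $ sacctmgr modify account physics set QosLevel+=high
              $ sacctmgr modify account physics set QosLevel-=standby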

SPECIFICATIONS FOR ACCOUNTS
       Cluster=<cluster>
              Specific cluster to add the account to.  Default is all
              clusters in the system.

       Description=<description>
              An arbitrary string describing an account.

       Name=<name>
              The name of a bank account.  Note the name must be unique
              and cannot represent different bank accounts at different
              points in the account hierarchy.

       Organization=<org>
              Organization to which the account belongs.

       Parent=<parent>
              Parent account of this account.  Default is the root
              account, a top level account.

       RawUsage=<value>
              This allows an administrator to reset the raw usage accrued
              to an account.  The only value currently supported is 0
              (zero).  This is a settable specification only - it cannot
              be used as a filter to list accounts.

       WithAssoc
              Display all associations for this account.

       WithCoord
              Display all coordinators for this account.

       WithDeleted
              Display information with previously deleted data.

       NOTE: If using the WithAssoc option you can also query against
       association specific information to view only certain associations
       this account may have.  These extra options can be found in the
       SPECIFICATIONS FOR ASSOCIATIONS section.  You can also use the
       general specifications listed above in the GENERAL SPECIFICATIONS
       FOR ASSOCIATION BASED ENTITIES section.
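
       For example, to add an account beneath an existing parent on one
       cluster (all names are illustrative):

              $ sacctmgr add account chemistry Cluster=snowflake \
                    Parent=science Description="chemistry department"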

LIST/SHOW ACCOUNT FORMAT OPTIONS
       Account
              The name of a bank account.

       Description
              An arbitrary string describing an account.

       Organization
              Organization to which the account belongs.

       Coordinators
              List of users that are coordinators of the account.  (Only
              filled in when using the WithCoordinator option.)

       NOTE: If using the WithAssoc option you can also view the
       information about the various associations the account may have on
       all the clusters in the system.  The association information can be
       filtered.  Note that all the accounts in the database will always
       be shown, as the filter only takes effect over the association
       data.  The Association format fields are described in the LIST/SHOW
       ASSOCIATION FORMAT OPTIONS section.

SPECIFICATIONS FOR ASSOCIATIONS
       Clusters=<comma separated list of cluster names>
              List the associations of the cluster(s).

       Accounts=<comma separated list of account names>
              List the associations of the account(s).

       Users=<comma separated list of user names>
              List the associations of the user(s).

       Partition=<comma separated list of partition names>
              List the associations of the partition(s).

       NOTE: You can also use the general specifications listed above in
       the GENERAL SPECIFICATIONS FOR ASSOCIATION BASED ENTITIES section.

       Other options unique for listing associations:

       OnlyDefaults
              Display only associations that are default associations.

       Tree   Display account names in a hierarchical fashion.

       WithDeleted
              Display information with previously deleted data.

       WithSubAccounts
              Display information with sub-accounts.  Only really valuable
              when used with the account= option.  This will display all
              the sub-account associations along with the accounts listed
              in the option.

       WOLimits
              Display information without limit information.  This is for
              a smaller default format of "Cluster,Account,User,Partition".

       WOPInfo
              Display information without parent information (i.e. parent
              id and parent account name).  This option also implicitly
              sets the WOPLimits option.

       WOPLimits
              Display information without hierarchical parent limits (i.e.
              will only display limits where they are set instead of
              propagating them from the parent).
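
       For example, to list one user's associations without the limit
       columns (the user name is illustrative):

              $ sacctmgr list associations users=brian wolimits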

LIST/SHOW ASSOCIATION FORMAT OPTIONS
       Account
              The name of a bank account in the association.

       Cluster
              The name of a cluster in the association.

       DefaultQOS
              The QOS the association will use by default if it has access
              to it in the QOS list mentioned below.

       Fairshare
              Number used in conjunction with other accounts to determine
              job priority.  Can also be the string parent; when used on a
              user this means that the parent association is used for
              fairshare.  If Fairshare=parent is set on an account, that
              account's children will be effectively reparented for
              fairshare calculations to the first parent of their parent
              that is not Fairshare=parent.  Limits remain the same; only
              its fairshare value is affected.

       GrpTRESMins
              The total number of TRES minutes that can possibly be used
              by past, present and future jobs running from this
              association and its children.

       GrpTRESRunMins
              Used to limit the combined total number of TRES minutes used
              by all jobs running with this association and its children.
              This takes into consideration the time limit of running jobs
              and consumes it.  If the limit is reached, no new jobs are
              started until other jobs finish to allow time to free up.

       GrpTRES
              Maximum number of TRES running jobs are able to be allocated
              in aggregate for this association and all associations which
              are children of this association.

       GrpJobs
              Maximum number of running jobs in aggregate for this
              association and all associations which are children of this
              association.

       GrpJobsAccrue
              Maximum number of pending jobs in aggregate able to accrue
              age priority for this association and all associations which
              are children of this association.

       GrpSubmitJobs
              Maximum number of jobs which can be in a pending or running
              state at any time in aggregate for this association and all
              associations which are children of this association.

       GrpWall
              Maximum wall clock time running jobs are able to be
              allocated in aggregate for this association and all
              associations which are children of this association.

       ID     The id of the association.

       LFT    Associations are kept in a hierarchy: this is the leftmost
              spot in the hierarchy.  When used with the RGT variable, all
              associations with a LFT inside this LFT and before the RGT
              are children of this association.

       MaxTRESMins
              Maximum number of TRES minutes each job is able to use.

       MaxTRES
              Maximum number of TRES each job is able to use.

       MaxJobs
              Maximum number of jobs each user is allowed to run at one
              time.

       MaxJobsAccrue
              Maximum number of pending jobs able to accrue age priority
              at any given time.

       MaxSubmitJobs
              Maximum number of jobs in a pending or running state at any
              time.

       MaxWall
              Maximum wall clock time each job is able to use.

       Qos    Valid QOSs for this association.

       ParentID
              The association id of the parent of this association.

       ParentName
              The account name of the parent of this association.

       Partition
              The name of a partition in the association.

       Priority
              What priority will be added to a job's priority when using
              this association.

       WithRawQOSLevel
              Display QosLevel in an unevaluated raw format, consisting of
              a comma separated list of QOS names prepended with ''
              (nothing), '+' or '-' for the association.  QOS names
              without +/- prepended were assigned (i.e., sacctmgr modify
              ... set QosLevel=qos_name) for the entity listed or on one
              of its parents in the hierarchy.  QOS names with +/-
              prepended indicate the QOS was added/filtered (i.e.,
              sacctmgr modify ... set QosLevel=[+-]qos_name) for the
              entity listed or on one of its parents in the hierarchy.
              Including WOPLimits will show exactly where each QOS was
              assigned, added or filtered in the hierarchy.

       RGT    Associations are kept in a hierarchy: this is the rightmost
              spot in the hierarchy.  When used with the LFT variable, all
              associations with a LFT inside this RGT and after the LFT
              are children of this association.

       User   The name of a user in the association.
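
       For example, to display a custom selection of the fields above:

              $ sacctmgr list associations format=cluster,account,user,fairshare,maxjobs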

SPECIFICATIONS FOR CLUSTERS
       Classification=<classification>
              Type of machine; current classifications are capability and
              capacity.

       Features=<comma separated list of feature names>
              Features that are specific to the cluster.  Federated jobs
              can be directed to clusters that contain the job's requested
              features.

       Federation=<federation>
              The federation that this cluster should be a member of.  A
              cluster can only be a member of one federation at a time.

       FedState=<state>
              The state of the cluster in the federation.
              Valid states are:

              ACTIVE Cluster will actively accept and schedule federated
                     jobs.

              INACTIVE
                     Cluster will not schedule or accept any jobs.

              DRAIN  Cluster will not accept any new jobs and will let
                     existing federated jobs complete.

              DRAIN+REMOVE
                     Cluster will not accept any new jobs and will remove
                     itself from the federation once all federated jobs
                     have completed.  When removed from the federation,
                     the cluster will accept jobs as a non-federated
                     cluster.

       Flags=<flag list>
              Comma separated list of attributes for a particular cluster.
              Current flags include CrayXT, FrontEnd, and MultipleSlurmd.

       Name=<name>
              The name of a cluster.  This should be equal to the
              ClusterName parameter in the slurm.conf configuration file
              for some Slurm-managed cluster.

       RPC=<rpc list>
              Comma separated list of numeric RPC values.

       WithFed
              Appends federation related columns to default format options
              (e.g. Federation,ID,Features,FedState).

       WOLimits
              Display information without limit information.  This is for
              a smaller default format of Cluster,ControlHost,ControlPort,RPC.

       NOTE: You can also use the general specifications listed above in
       the GENERAL SPECIFICATIONS FOR ASSOCIATION BASED ENTITIES section.

LIST/SHOW CLUSTER FORMAT OPTIONS
       Classification
              Type of machine, i.e. capability or capacity.

       Cluster
              The name of the cluster.

       ControlHost
              When a slurmctld registers with the database, the IP address
              of the controller is placed here.

       ControlPort
              When a slurmctld registers with the database, the port the
              controller is listening on is placed here.

       Features
              The list of features on the cluster (if any).

       Federation
              The name of the federation this cluster is a member of (if
              any).

       FedState
              The state of the cluster in the federation (if a member of
              one).

       FedStateRaw
              Numeric value of the name of the FedState.

       Flags  Attributes possessed by the cluster.

       ID     The ID assigned to the cluster when a member of a
              federation.  This ID uniquely identifies the cluster and its
              jobs in the federation.

       NodeCount
              The current count of nodes associated with the cluster.

       NodeNames
              The current nodes associated with the cluster.

       PluginIDSelect
              The numeric value of the select plugin the cluster is using.

       RPC    When a slurmctld registers with the database, the RPC
              version the controller is running is placed here.

       TRES   Trackable RESources (Billing, BB (burst buffer), CPU,
              Energy, GRES, License, Memory, and Node) this cluster is
              accounting for.

       NOTE: You can also view the information about the root association
       for the cluster.  The Association format fields are described in
       the LIST/SHOW ASSOCIATION FORMAT OPTIONS section.

SPECIFICATIONS FOR COORDINATOR
       Account=<comma separated list of account names>
              Account name to add this user as a coordinator to.

       Names=<comma separated list of user names>
              Names of coordinators.

       NOTE: To list coordinators use the WithCoordinator option with list
       account or list user.
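
       For example, to make a user a coordinator of an account and then
       verify it (the names are illustrative):

              $ sacctmgr add coordinator Account=science Names=adam
              $ sacctmgr list account WithCoordinator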

SPECIFICATIONS FOR EVENTS
       All_Clusters
              Shortcut to get information on all clusters.

       All_Time
              Shortcut to get the time period for all time.

       Clusters=<comma separated list of cluster names>
              List the events of the cluster(s).  Default is the cluster
              where the command was run.

       End=<OPT>
              Period ending of events.  Default is now.

              Valid time formats are...

              HH:MM[:SS] [AM|PM]
              MMDD[YY] or MM/DD[/YY] or MM.DD[.YY]
              MM/DD[/YY]-HH:MM[:SS]
              YYYY-MM-DD[THH:MM[:SS]]

       Event=<OPT>
              Specific events to look for.  Valid options are Cluster or
              Node; default is both.

       MaxTRES=<OPT>
              Max number of TRES affected by an event.

       MinTRES=<OPT>
              Min number of TRES affected by an event.

       Nodes=<comma separated list of node names>
              Node names affected by an event.

       Reason=<comma separated list of reasons>
              Reason an event happened.

       Start=<OPT>
              Period start of events.  Default is 00:00:00 of the previous
              day, unless states are given with the States= specification.
              In that case the default behavior is to return events
              currently in the specified states.

              Valid time formats are...

              HH:MM[:SS] [AM|PM]
              MMDD[YY] or MM/DD[/YY] or MM.DD[.YY]
              MM/DD[/YY]-HH:MM[:SS]
              YYYY-MM-DD[THH:MM[:SS]]

       States=<comma separated list of states>
              State of a node in a node event.  If this is set, the event
              type is set automatically to Node.

       User=<comma separated list of users>
              Query against users who set the event.  If this is set, the
              event type is set automatically to Node, since only the
              slurm user can perform a cluster event.
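
       For example, to list node events on one cluster since a given date
       (the cluster name and date are illustrative):

              $ sacctmgr list events Clusters=snowflake Event=Node Start=2024-01-01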

LIST/SHOW EVENT FORMAT OPTIONS
       Cluster
              The name of the cluster the event happened on.

       ClusterNodes
              The hostlist of nodes on a cluster in a cluster event.

       Duration
              Time period the event was around for.

       End    Period when the event ended.

       Event  Name of the event.

       EventRaw
              Numeric value of the name of the event.

       NodeName
              The node affected by the event.  In a cluster event, this is
              blank.

       Reason The reason an event happened.

       Start  Period when the event started.

       State  On a node event, this is the formatted state of the node
              during the event.

       StateRaw
              On a node event, this is the numeric value of the state of
              the node during the event.

       TRES   Number of TRES involved with the event.

       User   On a node event, this is the user who caused the event to
              happen.

SPECIFICATIONS FOR FEDERATION
       Clusters[+|-]=<comma separated list of cluster names>
              List of clusters to add to or remove from a federation.  A
              blank value (e.g. clusters=) will remove all clusters from
              the federation.  NOTE: A cluster can only be a member of one
              federation.

       Name=<name>
              The name of the federation.

       Tree   Display federations in a hierarchical fashion.
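
       For example, to create a federation and then adjust its membership
       (the federation and cluster names are illustrative):

              $ sacctmgr add federation fed1 Clusters=fluffy,snowflake
              $ sacctmgr modify federation fed1 set Clusters+=dopey
              $ sacctmgr modify federation fed1 set Clusters-=fluffy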

LIST/SHOW FEDERATION FORMAT OPTIONS
       Features
              The list of features on the cluster.

       Federation
              The name of the federation.

       Cluster
              Name of the cluster that is a member of the federation.

       FedState
              The state of the cluster in the federation.

       FedStateRaw
              Numeric value of the name of the FedState.

       Index  The index of the cluster in the federation.

SPECIFICATIONS FOR JOB
       DerivedExitCode
              The derived exit code can be modified after a job completes
              based on the user's judgment of whether the job succeeded or
              failed.  The user can only modify the derived exit code of
              their own job.

       Comment
              The job's comment string, when the AccountingStoreJobComment
              parameter in the slurm.conf file is set (or defaults) to
              YES.  The user can only modify the comment string of their
              own job.

       The DerivedExitCode and Comment fields are the only fields of a job
       record in the database that can be modified after job completion.
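
       For example, to record a post-hoc verdict on a completed job (the
       job id and comment are illustrative):

              $ sacctmgr modify job where jobid=1286 set DerivedExitCode=1 Comment="sanity check failed"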

       The sacct command is the exclusive command to display job records
       from the Slurm database.
1062
1063
1065 NOTE: The group limits (GrpJobs, GrpNodes, etc.) are tested when a job
1066 is being considered for being allocated resources. If starting a job
1067 would cause any of its group limit to be exceeded, that job will not be
1068 considered for scheduling even if that job might preempt other jobs
1069 which would release sufficient group resources for the pending job to
1070 be initiated.
1071
1072
1073 Flags Used by the slurmctld to override or enforce certain character‐
1074 istics.
1075 Valid options are
1076
1077 DenyOnLimit
1078 If set, jobs using this QOS will be rejected at submis‐
1079 sion time if they do not conform to the QOS 'Max' limits.
1080 Group limits will be treated like 'Max' limits as well, and
1081 jobs that exceed them will be denied. By default jobs that go
1082 over these limits will pend until they conform.
1083 This currently only applies to QOS and Association lim‐
1084 its.
1085
1086 EnforceUsageThreshold
1087 If set, and the QOS also has a UsageThreshold, any jobs
1088 submitted with this QOS that fall below the UsageThresh‐
1089 old will be held until their Fairshare Usage goes above
1090 the Threshold.
1091
1092 NoReserve
1093 If this flag is set and backfill scheduling is used, jobs
1094 using this QOS will not reserve resources in the backfill
1095 schedule's map of resources allocated through time. This
1096 flag is intended for use with a QOS that may be preempted
1097 by jobs associated with all other QOS (e.g. use with a
1098 "standby" QOS). If this flag is used with a QOS which can
1099 not be preempted by all other QOS, it could result in
1100 starvation of larger jobs.
1101
1102 PartitionMaxNodes
1103 If set, jobs using this QOS will be able to override the
1104 requested partition's MaxNodes limit.
1105
1106 PartitionMinNodes
1107 If set, jobs using this QOS will be able to override the
1108 requested partition's MinNodes limit.
1109
1110 OverPartQOS
1111 If set, jobs using this QOS will be able to override any
1112 limits set by the requested partition's QOS.
1113
1114 PartitionTimeLimit
1115 If set, jobs using this QOS will be able to override the
1116 requested partition's TimeLimit.
1117
1118 RequiresReservation
1119 If set, jobs using this QOS must designate a reservation
1120 when submitting a job. This option can be useful in
1121 restricting usage of a QOS that may have greater preemp‐
1122 tive capability or additional resources to be allowed
1123 only within a reservation.
1124
1125 NoDecay
1126 If set, this QOS will not have its GrpTRESMins, GrpWall
1127 and UsageRaw decayed by the slurm.conf PriorityDecay‐
1128 HalfLife or PriorityUsageResetPeriod settings. This
1129 allows a QOS to provide aggregate limits that, once con‐
1130 sumed, will not be replenished automatically. Such a QOS
1131 will act as a time-limited quota of resources for an
1132 association that has access to it. Account/user usage
1133 will still be decayed for associations using the QOS.
1134 The QOS GrpTRESMins and GrpWall limits can be increased
1135 or the QOS RawUsage value reset to 0 (zero) to again
1136 allow jobs submitted with this QOS to be queued (if Deny‐
1137 OnLimit is set) or run (pending with QOSGrp{TRES}Minutes‐
1138 Limit or QOSGrpWallLimit reasons, where {TRES} is some
1139 type of trackable resource).
1140
1141 UsageFactorSafe
1142 If set, and AccountingStorageEnforce includes Safe, jobs
1143 will only be able to run if the job can run to completion
1144 with the UsageFactor applied.
1145
1146
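As an illustrative sketch, flags are assigned with the modify command; e.g. for a hypothetical standby QOS:

      > sacctmgr modify qos standby set Flags=NoReserve,DenyOnLimit

Multiple flags are given as a comma separated list.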
1147 GraceTime
1148 Preemption grace time to be extended to a job which has been
1149 selected for preemption.
1150
1151
1152 GrpTRESMins
1153 The total number of TRES minutes that can possibly be used by
1154 past, present and future jobs running from this QOS.
1155
1156
1157 GrpTRESRunMins
1158 Used to limit the combined total of TRES minutes used by
1159 all jobs running with this QOS. The time limits of running
1160 jobs are counted against this limit; when it is reached, no
1161 new jobs are started until running jobs finish and free up time.
1162
1163
1164 GrpTRES
1165 Maximum number of TRES running jobs are able to be allocated in
1166 aggregate for this QOS.
1167
1168
1169 GrpJobs
1170 Maximum number of running jobs in aggregate for this QOS.
1171
1172
1173 GrpJobsAccrue
1174 Maximum number of pending jobs in aggregate able to accrue age
1175 priority for this QOS.
1176
1177
1178 GrpSubmitJobs
1179 Maximum number of jobs which can be in a pending or running
1180 state at any time in aggregate for this QOS.
1181
1182
1183 GrpWall
1184 Maximum wall clock time running jobs are able to be allocated in
1185 aggregate for this QOS. If this limit is reached, submission
1186 requests will be denied and the running jobs will be killed.
1187
1188 ID The id of the QOS.
1189
1190
1191 MaxTRESMins
1192 Maximum number of TRES minutes each job is able to use.
1193
1194
1195 MaxTRESPerAccount
1196 Maximum number of TRES each account is able to use.
1197
1198
1199 MaxTRESPerJob
1200 Maximum number of TRES each job is able to use.
1201
1202
1203 MaxTRESPerNode
1204 Maximum number of TRES each node in a job allocation can use.
1205
1206
1207 MaxTRESPerUser
1208 Maximum number of TRES each user is able to use.
1209
1210
1211 MaxJobsAccruePerAccount
1212 Maximum number of pending jobs an account (or subaccount) can have
1213 accruing age priority at any given time.
1214
1215
1216 MaxJobsAccruePerUser
1217 Maximum number of pending jobs a user can have accruing age pri‐
1218 ority at any given time.
1219
1220
1221 MaxJobsPerAccount
1222 Maximum number of jobs each account is allowed to run at one
1223 time.
1224
1225
1226 MaxJobsPerUser
1227 Maximum number of jobs each user is allowed to run at one time.
1228
1229
1230 MinPrioThreshold
1231 Minimum priority required to reserve resources when scheduling.
1232
1233
1234 MinTRESPerJob
1235 Minimum number of TRES each job running under this QOS must
1236 request. Otherwise the job will pend until modified.
1237
1238
1239 MaxSubmitJobsPerAccount
1240 Maximum number of jobs in a pending or running state at any
1241 time per account.
1242
1243
1244 MaxSubmitJobsPerUser
1245 Maximum number of jobs in a pending or running state at any
1246 time per user.
1247
1248
1249 MaxWall
1250 Maximum wall clock time each job is able to use.
1251
1252
1253 Name Name of the QOS.
1254
1255
1256 Preempt
1257 Other QOSs that this QOS can preempt.
1258
1259
1260 PreemptMode
1261 Mechanism used to preempt jobs of this QOS if the cluster's
1262 PreemptType is configured to preempt/qos. The default preemption
1263 mechanism is specified by the cluster-wide PreemptMode configu‐
1264 ration parameter. Possible values are "Cluster" (meaning use
1265 cluster default), "Cancel", "Checkpoint" and "Requeue". This
1266 option is not compatible with PreemptMode=OFF or Preempt‐
1267 Mode=SUSPEND (i.e. preempted jobs must be removed from the
1268 resources).
1269
1270
1271 PreemptExemptTime
1272 Specifies a minimum run time for jobs of this QOS before they
1273 are considered for preemption. This QOS option takes precedence
1274 over the global PreemptExemptTime. Setting to -1 disables the
1275 option, allowing another QOS or the global option to take
1276 effect. Setting to 0 indicates no minimum run time and super‐
1277 sedes the lower priority QOS (see OverPartQOS) and/or the global
1278 option in slurm.conf.
1279
1280
1281 Priority
1282 What priority will be added to a job's priority when using this
1283 QOS.
1284
1285
1286 RawUsage=<value>
1287 This allows an administrator to reset the raw usage accrued to a
1288 QOS. The only value currently supported is 0 (zero). This is a
1289 settable specification only - it cannot be used as a filter to
1290 list accounts.
1291
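For example, the raw usage accrued to a hypothetical QOS named normal could be cleared with:

      > sacctmgr modify qos where name=normal set RawUsage=0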
1292
1293 UsageFactor
1294 Usage factor when running with this QOS. See below for more
1295 details.
1296
1297
1298 UsageThreshold
1299 A float representing the lowest fairshare of an association
1300 allowable to run a job. If an association falls below this
1301 threshold and has pending jobs or submits new jobs those jobs
1302 will be held until the usage goes back above the threshold. Use
1303 sshare to see current shares on the system.
1304
1305
1306 WithDeleted
1307 Display information with previously deleted data.
1308
1309
1310
1311 SPECIFICATIONS FOR QOS
1312 Description
1313 An arbitrary string describing a QOS.
1314
1315
1316 GraceTime
1317 Preemption grace time to be extended to a job which has been
1318 selected for preemption in the format of hh:mm:ss. The default
1319 value is zero, meaning no preemption grace time is allowed for
1320 this QOS. NOTE: This value is only meaningful for QOS Preempt‐
1321 Mode=CANCEL.
1322
1323
1324 GrpTRESMins
1325 The total number of TRES minutes that can possibly be used by
1326 past, present and future jobs running from this QOS. To clear a
1327 previously set value use the modify command with a new value of
1328 -1 for each TRES id. NOTE: This limit only applies when using
1329 the Priority Multifactor plugin. The time is decayed using the
1330 value of PriorityDecayHalfLife or PriorityUsageResetPeriod as
1331 set in the slurm.conf. When this limit is reached all associ‐
1332 ated jobs running will be killed and all future jobs submitted
1333 with this QOS will be delayed until they are able to run inside
1334 the limit.
1335
1336
1337 GrpTRES
1338 Maximum number of TRES running jobs are able to be allocated in
1339 aggregate for this QOS. To clear a previously set value use the
1340 modify command with a new value of -1 for each TRES id.
1341
1342
1343 GrpJobs
1344 Maximum number of running jobs in aggregate for this QOS. To
1345 clear a previously set value use the modify command with a new
1346 value of -1.
1347
1348
1349 GrpJobsAccrue
1350 Maximum number of pending jobs in aggregate able to accrue age
1351 priority for this QOS. To clear a previously set value use the
1352 modify command with a new value of -1.
1353
1354
1355 GrpSubmitJobs
1356 Maximum number of jobs which can be in a pending or running
1357 state at any time in aggregate for this QOS. To clear a previ‐
1358 ously set value use the modify command with a new value of -1.
1359
1360
1361 GrpWall
1362 Maximum wall clock time running jobs are able to be allocated in
1363 aggregate for this QOS. To clear a previously set value use the
1364 modify command with a new value of -1. NOTE: This limit only
1365 applies when using the Priority Multifactor plugin. The time is
1366 decayed using the value of PriorityDecayHalfLife or Priori‐
1367 tyUsageResetPeriod as set in the slurm.conf. When this limit is
1368 reached all associated jobs running will be killed and all
1369 future jobs submitted with this QOS will be delayed until they
1370 are able to run inside the limit.
1371
1372
1373 MaxTRESMins
1374 Maximum number of TRES minutes each job is able to use. To
1375 clear a previously set value use the modify command with a new
1376 value of -1 for each TRES id.
1377
1378
1379 MaxTRESPerAccount
1380 Maximum number of TRES each account is able to use. To clear a
1381 previously set value use the modify command with a new value of
1382 -1 for each TRES id.
1383
1384
1385 MaxTRESPerJob
1386 Maximum number of TRES each job is able to use. To clear a pre‐
1387 viously set value use the modify command with a new value of -1
1388 for each TRES id.
1389
1390
1391 MaxTRESPerNode
1392 Maximum number of TRES each node in a job allocation can use.
1393 To clear a previously set value use the modify command with a
1394 new value of -1 for each TRES id.
1395
1396
1397 MaxTRESPerUser
1398 Maximum number of TRES each user is able to use. To clear a
1399 previously set value use the modify command with a new value of
1400 -1 for each TRES id.
1401
1402
1403 MaxJobsPerAccount
1404 Maximum number of jobs each account is allowed to run at one
1405 time. To clear a previously set value use the modify command
1406 with a new value of -1.
1407
1408
1409 MaxJobsPerUser
1410 Maximum number of jobs each user is allowed to run at one time.
1411 To clear a previously set value use the modify command with a
1412 new value of -1.
1413
1414
1415 MaxSubmitJobsPerAccount
1416 Maximum number of jobs in a pending or running state at any
1417 time per account. To clear a previously set value use the
1418 modify command with a new value of -1.
1419
1420
1421 MaxSubmitJobsPerUser
1422 Maximum number of jobs in a pending or running state at any
1423 time per user. To clear a previously set value use the
1424 modify command with a new value of -1.
1425
1426
1427 MaxWall
1428 Maximum wall clock time each job is able to use. <max wall>
1429 format is <min> or <min>:<sec> or <hr>:<min>:<sec> or
1430 <days>-<hr>:<min>:<sec> or <days>-<hr>. The value is recorded
1431 in minutes with rounding as needed. To clear a previously set
1432 value use the modify command with a new value of -1.
1433
1434
1435 MinPrioThreshold
1436 Minimum priority required to reserve resources when scheduling.
1437 To clear a previously set value use the modify command with a
1438 new value of -1.
1439
1440
1441 MinTRES
1442 Minimum number of TRES each job running under this QOS must
1443 request. Otherwise the job will pend until modified. To clear
1444 a previously set value use the modify command with a new value
1445 of -1 for each TRES id.
1446
1447
1448 Name Name of the QOS. Needed for creation.
1449
1450
1451 Preempt
1452 Other QOSs that this QOS can preempt. Setting Preempt to ''
1453 (two single quotes with nothing between them) restores its
1454 default setting. You can also use the operators += and -= to
1455 add or remove certain QOSs from a QOS list.
1456
1457
1458 PreemptMode
1459 Mechanism used to preempt jobs of this QOS if the cluster's
1460 PreemptType is configured to preempt/qos. The default preemption
1461 mechanism is specified by the cluster-wide PreemptMode configu‐
1462 ration parameter. Possible values are "Cluster" (meaning use
1463 cluster default), "Cancel", "Checkpoint" and "Requeue". This
1464 option is not compatible with PreemptMode=OFF or Preempt‐
1465 Mode=SUSPEND (i.e. preempted jobs must be removed from the
1466 resources).
1467
1468
1469 Priority
1470 What priority will be added to a job's priority when using this
1471 QOS. To clear a previously set value use the modify command
1472 with a new value of -1.
1473
1474
1475 UsageFactor
1476 A float that is factored into a job's TRES usage (e.g. RawUsage,
1477 TRESMins, TRESRunMins). For example, with a UsageFactor of 2,
1478 every TRESBillingUnit second a job runs counts as 2 seconds of
1479 usage. With a UsageFactor of .5, every second counts as only
1480 half a second. A setting of 0 adds no timed usage from the
1481 job.
1482
1483 The usage factor only applies to the job's QOS and not the par‐
1484 tition QOS.
1485
1486 If the UsageFactorSafe flag is set and AccountingStorageEnforce
1487 includes Safe, jobs will only be able to run if the job can run
1488 to completion with the UsageFactor applied.
1489
1490 If the UsageFactorSafe flag is not set and AccountingStorageEn‐
1491 force includes Safe, a job will be able to be scheduled without
1492 the UsageFactor applied and will be able to run without being
1493 killed due to limits.
1494
1495 If the UsageFactorSafe flag is not set and AccountingStorageEn‐
1496 force does not include Safe, a job will be able to be scheduled
1497 without the UsageFactor applied and could be killed due to lim‐
1498 its.
1499
1500 See AccountingStorageEnforce in slurm.conf man page.
1501
1502 Default is 1. To clear a previously set value use the modify
1503 command with a new value of -1.
1504
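As a sketch, a hypothetical standby QOS could be made to accrue usage at half rate with:

      > sacctmgr modify qos standby set UsageFactor=0.5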
1505
1506 SPECIFICATIONS FOR RESERVATIONS
1507 Clusters=<comma separated list of cluster names>
1508 List the reservations of the cluster(s). Default is the cluster
1509 where the command was run.
1510
1511
1512 End=<OPT>
1513 Period ending of reservations. Default is now.
1514
1515 Valid time formats are...
1516
1517 HH:MM[:SS] [AM|PM]
1518 MMDD[YY] or MM/DD[/YY] or MM.DD[.YY]
1519 MM/DD[/YY]-HH:MM[:SS]
1520 YYYY-MM-DD[THH:MM[:SS]]
1521
1522
1523 ID=<OPT>
1524 Comma separated list of reservation ids.
1525
1526
1527 Names=<OPT>
1528 Comma separated list of reservation names.
1529
1530
1531 Nodes=<comma separated list of node names>
1532 Node names where reservation ran.
1533
1534
1535 Start=<OPT>
1536 Period start of reservations. Default is 00:00:00 of current
1537 day.
1538
1539 Valid time formats are...
1540
1541 HH:MM[:SS] [AM|PM]
1542 MMDD[YY] or MM/DD[/YY] or MM.DD[.YY]
1543 MM/DD[/YY]-HH:MM[:SS]
1544 YYYY-MM-DD[THH:MM[:SS]]
1545
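Putting these specifications together, the reservations on a hypothetical cluster tux during January 2020 might be requested as:

      > sacctmgr show reservation cluster=tux start=2020-01-01 end=2020-01-31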
1546
1547 LIST/SHOW RESERVATION FORMAT OPTIONS
1548 Associations
1549 The ids of the associations able to run in the reservation.
1550
1551
1552 Cluster
1553 Name of cluster reservation was on.
1554
1555
1556 End End time of reservation.
1557
1558
1559 Flags Flags on the reservation.
1560
1561
1562 ID Reservation ID.
1563
1564
1565 Name Name of this reservation.
1566
1567
1568 NodeNames
1569 List of nodes in the reservation.
1570
1571
1572 Start Start time of reservation.
1573
1574
1575 TRES List of TRES in the reservation.
1576
1577
1578 UnusedWall
1579 Wall clock time in seconds unused by any job.
1580
1581
1582
1583 SPECIFICATIONS FOR RESOURCE
1584 Clusters=<name list> Comma separated list of cluster names on which
1585 specified resources are to be available. If no names are designated
1586 then the clusters already allowed to use this resource will be altered.
1587
1588
1589 Count=<OPT>
1590 Number of software resources of a specific name configured on
1591 the system being controlled by a resource manager.
1592
1593
1594 Descriptions=
1595 A brief description of the resource.
1596
1597
1598 Flags=<OPT>
1599 Flags that identify specific attributes of the system resource.
1600 At this time no flags have been defined.
1601
1602
1603 ServerType=<OPT>
1604 The type of a software resource manager providing the licenses.
1605 For example, FlexNet Publisher (flexlm) license server or
1606 Reprise License Manager (RLM).
1607
1608
1609 Names=<OPT>
1610 Comma separated list of the name of a resource configured on the
1611 system being controlled by a resource manager. If this resource
1612 is seen on the slurmctld its name will be name@server to dis‐
1613 tinguish it from local resources defined in a slurm.conf.
1614
1615
1616 PercentAllowed=<percent allowed>
1617 Percentage of a specific resource that can be used on specified
1618 cluster.
1619
1620
1621 Server=<OPT>
1622 The name of the server serving up the resource. Default is
1623 'slurmdb' indicating the licenses are being served by the data‐
1624 base.
1625
1626
1627 Type=<OPT>
1628 The type of the resource represented by this record. Currently
1629 the only valid type is License.
1630
1631
1632 WithClusters
1633 Display each cluster's percentage of the resources. If a
1634 resource hasn't been given to a cluster, the resource will not
1635 be displayed with this flag.
1636
1637
1638 NOTE: Resource is used to define each resource configured on a system
1639 available for usage by Slurm clusters.
1640
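As an illustrative sketch (the resource, server, cluster and count values are hypothetical), a license resource could be added and half of it granted to one cluster with:

      > sacctmgr add resource name=matlab server=flex_host servertype=flexlm \
            count=100 type=License percentallowed=50 cluster=tux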
1641
1642 LIST/SHOW RESOURCE FORMAT OPTIONS
1643 Cluster
1644 Name of cluster resource is given to.
1645
1646
1647 Count The count of a specific resource configured on the system glob‐
1648 ally.
1649
1650
1651 Allocated
1652 The percent of licenses allocated to a cluster.
1653
1654
1655 Description
1656 Description of the resource.
1657
1658
1659 ServerType
1660 The type of the server controlling the licenses.
1661
1662
1663 Name Name of this resource.
1664
1665
1666 Server Server serving up the resource.
1667
1668
1669 Type Type of resource this record represents.
1670
1671
1672 LIST/SHOW RUNAWAYJOBS FORMAT OPTIONS
1673 Cluster
1674 Name of cluster job ran on.
1675
1676
1677 ID Id of the job.
1678
1679
1680 Name Name of the job.
1681
1682
1683 Partition
1684 Partition job ran on.
1685
1686
1687 State Current State of the job in the database.
1688
1689
1690 TimeStart
1691 Time job started running.
1692
1693
1694 TimeEnd
1695 Current recorded time of the end of the job.
1696
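These fields describe jobs that are orphaned in the database. They can be inspected with the show runawayjobs command, which lists any runaway jobs it finds and, if any are found, offers to fix their records:

      > sacctmgr show runawayjobs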
1697
1698 SPECIFICATIONS FOR TRANSACTIONS
1699 Accounts=<comma separated list of account names>
1700 Only print out the transactions affecting specified accounts.
1701
1702
1703 Action=<Specific action the list will display>
1704
1705
1706 Actor=<Specific name the list will display>
1707 Only display transactions done by a certain person.
1708
1709
1710 Clusters=<comma separated list of cluster names>
1711 Only print out the transactions affecting specified clusters.
1712
1713
1714 End=<Date and time of last transaction to return>
1715 Return all transactions before this Date and time. Default is
1716 now.
1717
1718
1719 Start=<Date and time of first transaction to return>
1720 Return all transactions after this Date and time. Default is
1721 epoch.
1722
1723 Valid time formats for End and Start are...
1724
1725 HH:MM[:SS] [AM|PM]
1726 MMDD[YY] or MM/DD[/YY] or MM.DD[.YY]
1727 MM/DD[/YY]-HH:MM[:SS]
1728 YYYY-MM-DD[THH:MM[:SS]]
1729
1730
1731 Users=<comma separated list of user names>
1732 Only print out the transactions affecting specified users.
1733
1734
1735 WithAssoc
1736 Get information about which associations were affected by the
1737 transactions.
1738
1739
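For example, the transactions performed by a particular administrator since a given date (the actor name and date are illustrative) could be listed as:

      > sacctmgr list transactions actor=root start=01/01/20 format=TimeStamp,Action,Actor,Where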
1740
1741 LIST/SHOW TRANSACTIONS FORMAT OPTIONS
1742 Action
1743
1744
1745 Actor
1746
1747
1748 Info
1749
1750
1751 TimeStamp
1752
1753
1754 Where
1755
1756 NOTE: If using the WithAssoc option you can also view the information
1757 about the various associations the transaction affected. The Associa‐
1758 tion format fields are described in the LIST/SHOW ASSOCIATION FORMAT
1759 OPTIONS section.
1760
1761
1762
1763 SPECIFICATIONS FOR USERS
1764 Account=<account>
1765 Account name to add this user to.
1766
1767
1768 AdminLevel=<level>
1769 Admin level of user. Valid levels are None, Operator, and
1770 Admin.
1771
1772
1773 Cluster=<cluster>
1774 Specific cluster to add user to the account on. Default is all
1775 in system.
1776
1777
1778 DefaultAccount=<account>
1779 Identify the default bank account name to be used for a job if
1780 none is specified at submission time.
1781
1782
1783 DefaultWCKey=<defaultwckey>
1784 Identify the default Workload Characterization Key.
1785
1786
1787 Name=<name>
1788 Name of user.
1789
1790
1791 NewName=<newname>
1792 Use to rename a user in the accounting database.
1793
1794
1795 Partition=<name>
1796 Partition name.
1797
1798
1799 RawUsage=<value>
1800 This allows an administrator to reset the raw usage accrued to a
1801 user. The only value currently supported is 0 (zero). This is
1802 a settable specification only - it cannot be used as a filter to
1803 list users.
1804
1805
1806 WCKeys=<wckeys>
1807 Workload Characterization Key values.
1808
1809
1810 WithAssoc
1811 Display all associations for this user.
1812
1813
1814 WithCoord
1815 Display all accounts a user is coordinator for.
1816
1817
1818 WithDeleted
1819 Display information with previously deleted data.
1820
1821 NOTE: If using the WithAssoc option you can also query against associa‐
1822 tion specific information to view only certain associations this user
1823 may have. These extra options can be found in the SPECIFICATIONS FOR
1824 ASSOCIATIONS section. You can also use the general specifications list
1825 above in the GENERAL SPECIFICATIONS FOR ASSOCIATION BASED ENTITIES sec‐
1826 tion.
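Combining these specifications, a hypothetical user adam could be added to the physics account on cluster tux with:

      > sacctmgr add user name=adam cluster=tux account=physics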
1827
1828
1829
1830 LIST/SHOW USER FORMAT OPTIONS
1831 AdminLevel
1832 Admin level of user.
1833
1834
1835 DefaultAccount
1836 The user's default account.
1837
1838
1839 Coordinators
1840 List of users that are a coordinator of the account. (Only
1841 filled in when using the WithCoordinator option.)
1842
1843
1844 User The name of a user.
1845
1846 NOTE: If using the WithAssoc option you can also view the information
1847 about the various associations the user may have on all the clusters in
1848 the system. The association information can be filtered. Note that all
1849 the users in the database will always be shown, as the filter only
1850 applies to the association data. The Association format fields are
1851 described in the LIST/SHOW ASSOCIATION FORMAT OPTIONS section.
1852
1853
1854
1855 LIST/SHOW WCKEY FORMAT OPTIONS
1856 WCKey Workload Characterization Key.
1857
1858
1859 Cluster
1860 Specific cluster for the WCKey.
1861
1862
1863 User The name of a user for the WCKey.
1864
1865 NOTE: If using the WithAssoc option you can also view the information
1866 about the various associations the user may have on all the clusters in
1867 the system. The Association format fields are described in the
1868 LIST/SHOW ASSOCIATION FORMAT OPTIONS section.
1869
1870
1871 LIST/SHOW TRES FORMAT OPTIONS
1872 Name The name of the trackable resource. This option is required for
1873 TRES types BB (Burst buffer), GRES, and License. Types CPU,
1874 Energy, Memory, and Node do not have Names. For example, if
1875 GRES is the type then the name is the denomination of the GRES
1876 itself, e.g. GPU.
1877
1878
1879 ID The identification number of the trackable resource as it
1880 appears in the database.
1881
1882
1883 Type The type of the trackable resource. Current types are BB (Burst
1884 buffer), CPU, Energy, GRES, License, Memory, and Node.
1885
1886
1887 TRES INFORMATION
1888 Trackable RESources (TRES) are used in many QOS or Association limits.
1889 When setting these limits, TRES are given as a comma separated list.
1890 Each TRES carries its own limit; e.g. GrpTRESMins=cpu=10,mem=20 makes
1891 two separate limits: 10 CPU minutes and 20 MB memory minutes. This is
1892 the case for each limit that deals with TRES. To remove a limit, use
1893 -1, e.g. GrpTRESMins=cpu=-1 removes only the cpu TRES limit.
1894
1895 NOTE: When dealing with Memory as a TRES all limits are in MB.
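As an example of this syntax (the QOS name is illustrative), two TRES limits could be set at once, and one of them later removed, with:

      > sacctmgr modify qos normal set GrpTRESMins=cpu=10,mem=20
      > sacctmgr modify qos normal set GrpTRESMins=cpu=-1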
1896
1897 NOTE: The Billing TRES is calculated from a partition's TRESBilling‐
1898 Weights. It is temporarily calculated during scheduling for each parti‐
1899 tion to enforce billing TRES limits. The final Billing TRES is calcu‐
1900 lated after the job has been allocated resources. The final number can
1901 be seen in scontrol show jobs and sacct output.
1902
1903
1904 GLOBAL FORMAT OPTION
1905 When using the format option for listing various fields you can put a
1906 %NUMBER afterwards to specify how many characters should be printed.
1907
1908 e.g. format=name%30 will print 30 characters of field name right justi‐
1909 fied. A -30 will print 30 characters left justified.
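For instance, a listing with a 30 character left justified account column and a 10 character user column might be requested as (the field choice is illustrative):

      > sacctmgr show association format=Account%-30,User%10,Cluster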
1910
1911
1912 FLAT FILE DUMP AND LOAD
1913 sacctmgr has the capability to load and dump Slurm association data to
1914 and from a file. This method can easily add a new cluster or copy an
1915 existing cluster's associations into a new cluster with similar
1916 accounts. Each file contains Slurm association data for a single clus‐
1917 ter. Comments can be put into the file with the # character. Each
1918 line of information must begin with one of the four titles: Cluster,
1919 Parent, Account or User. Following the title is a space, dash, space,
1920 entity value, then specifications. Specifications are colon separated.
1921 If any variable such as Organization has a space in it, surround the
1922 name with single or double quotes.
1923
1924 To create a file of associations one can run
1925
1926 > sacctmgr dump tux file=tux.cfg
1927 (file=tux.cfg is optional)
1928
1929 To load a previously created file you can run
1930
1931 > sacctmgr load file=tux.cfg
1932
1933 Other options for load are -
1934
1935 clean - delete what was already there and start from scratch with this
1936 information.
1937 Cluster= - specify a different name for the cluster than that which is
1938 in the file.
1939
1940 A quick explanation of how the file works:
1941
1942 The associations in the system follow a hierarchy, and so does the
1943 file. Anything that is a parent needs to be defined before any chil‐
1944 dren. The only exception is the understood 'root' account. This is
1945 always a default for any cluster and does not need to be defined.
1946
1947 To edit/create a file start with a cluster line for the new cluster
1948
1949 Cluster - cluster_name:MaxNodesPerJob=15
1950
1951 Anything included on this line will be the defaults for all associa‐
1952 tions on this cluster. These options are as follows...
1953
1954 GrpTRESMins=
1955 The total number of TRES minutes that can possibly be used by
1956 past, present and future jobs running from this association and
1957 its children.
1958
1959 GrpTRESRunMins=
1960 Used to limit the combined total number of TRES minutes used by
1961 all jobs running with this association and its children. This
1962 takes into consideration time limit of running jobs and consumes
1963 it, if the limit is reached no new jobs are started until other
1964 jobs finish to allow time to free up.
1965
1966 GrpTRES=
1967 Maximum number of TRES running jobs are able to be allocated in
1968 aggregate for this association and all associations which are
1969 children of this association.
1970
1971 GrpJobs=
1972 Maximum number of running jobs in aggregate for this association
1973 and all associations which are children of this association.
1974
1975 GrpJobsAccrue
1976 Maximum number of pending jobs in aggregate able to accrue age
1977 priority for this association and all associations which are
1978 children of this association.
1979
1980 GrpNodes=
1981 Maximum number of nodes running jobs are able to be allocated in
1982 aggregate for this association and all associations which are
1983 children of this association.
1984
1985 GrpSubmitJobs=
1986 Maximum number of jobs which can be in a pending or running
1987 state at any time in aggregate for this association and all
1988 associations which are children of this association.
1989
1990 GrpWall=
1991 Maximum wall clock time running jobs are able to be allocated in
1992 aggregate for this association and all associations which are
1993 children of this association.
1994
1995 FairShare=
1996 Number used in conjunction with other associations to determine
1997 job priority.
1998
1999 MaxJobs=
2000 Maximum number of jobs the children of this association can run.
2001
2002 MaxNodesPerJob=
2003 Maximum number of nodes per job the children of this association
2004 can run.
2005
2006 MaxWallDurationPerJob=
2007 Maximum time (not related to job size) that jobs of this
2008 account's children can run.
2009
2010 QOS= Comma separated list of Quality of Service names (Defined in
2011 sacctmgr).
2012
2013
2014 Followed by Accounts you want in this fashion...
2015
2016 Parent - root (Defined by default)
2017 Account - cs:MaxNodesPerJob=5:MaxJobs=4:FairShare=399:MaxWallDu‐
2018 rationPerJob=40:Description='Computer Science':Organization='LC'
2019 Parent - cs
2020 Account - test:MaxNodesPerJob=1:MaxJobs=1:FairShare=1:MaxWallDu‐
2021 rationPerJob=1:Description='Test Account':Organization='Test'
2022
2023
2024 Any of the options after a ':' can be left out and they can be in any
2025 order.
2026 If you want to add any sub accounts just list the Parent THAT
2027 HAS ALREADY BEEN CREATED before the account line in this fash‐
2028 ion...
2029
2030 All account options are
2031
2032 Description=
2033 A brief description of the account.
2034
2035 GrpTRESMins=
2036 The total number of TRES minutes that can possibly be used by
2037 past, present and future jobs running from this association and
2038 all associations which are children of this association.
2039
2040 GrpTRESRunMins=
2041 Used to limit the combined total number of TRES minutes used by
2042 all jobs running with this association and its children; when
2043 the limit is reached no new jobs are started until others finish.

       GrpTRES=
              Maximum number of TRES running jobs are able to be allocated
              in aggregate for this association and all associations which
              are children of this association.

       GrpJobs=
              Maximum number of running jobs in aggregate for this
              association and all associations which are children of this
              association.

       GrpJobsAccrue=
              Maximum number of pending jobs in aggregate able to accrue
              age priority for this association and all associations which
              are children of this association.

       GrpNodes=
              Maximum number of nodes running jobs are able to be
              allocated in aggregate for this association and all
              associations which are children of this association.

       GrpSubmitJobs=
              Maximum number of jobs which can be in a pending or running
              state at any time in aggregate for this association and all
              associations which are children of this association.

       GrpWall=
              Maximum wall clock time running jobs are able to be
              allocated in aggregate for this association and all
              associations which are children of this association.

       FairShare=
              Number used in conjunction with other associations to
              determine job priority.

       MaxJobs=
              Maximum number of jobs the children of this association can
              run.

       MaxNodesPerJob=
              Maximum number of nodes per job the children of this
              association can run.

       MaxWallDurationPerJob=
              Maximum time (not related to job size) that jobs of this
              account's children can run.

       Organization=
              Name of the organization that owns this account.

       QOS(=,+=,-=)
              Comma-separated list of Quality of Service names (defined in
              sacctmgr).


       To add users to an account, add a line like this after a Parent -
       line:

       Parent - test
       User - adam:MaxNodesPerJob=2:MaxJobs=3:FairShare=1:MaxWallDurationPerJob=1:AdminLevel=Operator:Coordinator='test'


       All user options are:

       AdminLevel=
              Type of admin this user is (Administrator, Operator).
              Must be defined on the first occurrence of the user.

       Coordinator=
              Comma-separated list of accounts this user is coordinator
              over.
              Must be defined on the first occurrence of the user.

       DefaultAccount=
              System-wide default account name.
              Must be defined on the first occurrence of the user.

       FairShare=
              Number used in conjunction with other associations to
              determine job priority.

       MaxJobs=
              Maximum number of jobs this user can run.

       MaxNodesPerJob=
              Maximum number of nodes per job this user can run.

       MaxWallDurationPerJob=
              Maximum time (not related to job size) this user can run.

       QOS(=,+=,-=)
              Comma-separated list of Quality of Service names (defined in
              sacctmgr).

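       As an illustrative sketch only (the cluster, account, and user
       names here are hypothetical, not part of this manual), a minimal
       dump file combining the account and user options described above
       might look like this:

```
Cluster - tux
Parent - root
Account - science:Description='science root account':Organization='science':FairShare=50
Parent - science
Account - physics:Description='physics department':Organization='science':FairShare=20
Parent - physics
User - adam:DefaultAccount='physics':FairShare=10:AdminLevel=Operator
```

       A file in this format can be loaded with 'sacctmgr load <filename>'.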
ARCHIVE FUNCTIONALITY
       Sacctmgr has the capability to archive to a flat file and/or load
       that data if needed later.  The archiving is usually done by
       slurmdbd, and it is highly recommended you only do it through
       sacctmgr if you completely understand what you are doing.  For
       slurmdbd options, see "man slurmdbd" for more information.  Data
       can be loaded into the database from these files either to view old
       data or to regenerate rolled-up data.

       archive dump
              Dump accounting data to a file.  Depending on options and
              slurmdbd configuration, data may remain in the database or
              be purged.  This operation cannot be rolled back once
              executed.  If one of the following options is not specified
              when sacctmgr is called, the value configured in
              slurmdbd.conf is used.


              Directory=
                     Directory in which to store the archive data.

              Events Archive events.  If not specified and PurgeEventAfter
                     is set, all event data removed will be lost
                     permanently.

              Jobs   Archive jobs.  If not specified and PurgeJobAfter is
                     set, all job data removed will be lost permanently.

              PurgeEventAfter=
                     Purge cluster event records older than the time
                     stated in months.  To purge on a shorter time period,
                     append hours or days to the numeric value for more
                     frequent purges (e.g. a value of '12hours' would
                     purge everything older than 12 hours).

              PurgeJobAfter=
                     Purge job records older than the time stated in
                     months.  To purge on a shorter time period, append
                     hours or days to the numeric value for more frequent
                     purges (e.g. a value of '12hours' would purge
                     everything older than 12 hours).

              PurgeStepAfter=
                     Purge step records older than the time stated in
                     months.  To purge on a shorter time period, append
                     hours or days to the numeric value for more frequent
                     purges (e.g. a value of '12hours' would purge
                     everything older than 12 hours).

              PurgeSuspendAfter=
                     Purge job suspend records older than the time stated
                     in months.  To purge on a shorter time period, append
                     hours or days to the numeric value for more frequent
                     purges (e.g. a value of '12hours' would purge
                     everything older than 12 hours).

              Script=
                     Run this script instead of the generic form of
                     archiving to flat files.

              Steps  Archive steps.  If not specified and PurgeStepAfter
                     is set, all step data removed will be lost
                     permanently.

              Suspend
                     Archive suspend data.  If not specified and
                     PurgeSuspendAfter is set, all suspend data removed
                     will be lost permanently.
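       As a hedged sketch (the directory path and the 12-month retention
       period are hypothetical choices, not defaults of this command), an
       archive dump that archives and purges old job and step records
       might look like:

```
# Archive jobs and steps to /var/spool/slurm/archive (hypothetical path),
# purging records older than 12 months; -i commits without confirmation.
sacctmgr -i archive dump Directory=/var/spool/slurm/archive \
         Jobs Steps PurgeJobAfter=12months PurgeStepAfter=12months
```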


       archive load
              Load previously archived data into the database.  The
              archive file will not be loaded if its records already exist
              in the database; therefore, trying to load an archive file
              more than once will result in an error.  When this data is
              again archived and purged from the database, a new archive
              file will be created, so if the old archive file is still in
              the directory ArchiveDir (see ArchiveDir in the
              slurmdbd.conf man page), it will not be overwritten and the
              two files will contain duplicate records.


              File=  File to load into the database.

              Insert=
                     SQL to insert directly into the database.  This
                     should be used very cautiously, since it writes your
                     SQL directly into the database.

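       As a sketch (the file name below is hypothetical), loading a
       previously created archive back into the database might look like:

```
# Load an archive file produced by a previous "archive dump".
sacctmgr archive load File=/var/spool/slurm/archive/job_archive
```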
ENVIRONMENT VARIABLES
       Some sacctmgr options may be set via environment variables.  These
       environment variables, along with their corresponding options, are
       listed below.  (Note: command-line options will always override
       these settings.)

       SLURM_CONF          The location of the Slurm configuration file.

EXAMPLES
       NOTE: There is an order to setting up accounting associations.  You
       must define clusters before you add accounts, and you must add
       accounts before you can add users.

       -> sacctmgr create cluster tux
       -> sacctmgr create account name=science fairshare=50
       -> sacctmgr create account name=chemistry parent=science fairshare=30
       -> sacctmgr create account name=physics parent=science fairshare=20
       -> sacctmgr create user name=adam cluster=tux account=physics fairshare=10
       -> sacctmgr delete user name=adam cluster=tux account=physics
       -> sacctmgr delete account name=physics cluster=tux
       -> sacctmgr modify user where name=adam cluster=tux account=physics set maxjobs=2 maxwall=30:00
       -> sacctmgr add user brian account=chemistry
       -> sacctmgr list associations cluster=tux format=Account,Cluster,User,Fairshare tree withd
       -> sacctmgr list transactions StartTime=11/03-10:30:00 format=Timestamp,Action,Actor
       -> sacctmgr dump cluster=tux file=tux_data_file
       -> sacctmgr load tux_data_file

       A user's account cannot be changed directly.  A new association
       needs to be created for the user with the new account; then the
       association with the old account can be deleted.
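       Using names from the earlier examples, that procedure can be
       sketched as:

```
# 1. Create a new association for the user under the new account.
sacctmgr add user name=adam cluster=tux account=chemistry
# 2. Delete the association with the old account.
sacctmgr delete user name=adam cluster=tux account=physics
```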

       When modifying an object, the placement of the keyword 'set' and
       the optional keyword 'where' is critical; below are examples that
       produce correct results.  As a rule of thumb, anything you put in
       front of 'set' is used as a condition to select records.  If you
       want to put a condition after the keyword 'set', you must precede
       it with the keyword 'where'.

       wrong-> sacctmgr modify user name=adam set fairshare=10 cluster=tux

       This will produce an error, as the above line reads modify user
       adam set fairshare=10 and cluster=tux.

       right-> sacctmgr modify user name=adam cluster=tux set fairshare=10
       right-> sacctmgr modify user name=adam set fairshare=10 where cluster=tux

       When changing the QOS for something, only use the '=' operator when
       you want to explicitly set the QOS.  In most cases you will want to
       use the '+=' or '-=' operator to either add to or remove from the
       existing QOS already in place.

       If a user already has a QOS of normal,standby from a parent, or it
       was explicitly set, you should use qos+=expedite to add to the list
       in this fashion.

       If you are looking to add the QOS expedite to only a certain
       account and/or cluster, you can do that by specifying them on the
       sacctmgr line.

       -> sacctmgr modify user name=adam set qos+=expedite

       -> sacctmgr modify user name=adam acct=this cluster=tux set qos+=expedite

       Let's give an example of how to add a QOS to accounts.  First, list
       all available QOSs in the cluster:

       ->sacctmgr show qos format=name
           Name
       ---------
          normal
        expedite

       List all the associations in the cluster:

       ->sacctmgr show assoc format=cluster,account,qos
       Cluster  Account    QOS
       -------- ---------- -----
       zebra    root       normal
       zebra    root       normal
       zebra    g          normal
       zebra    g1         normal

       Add the QOS expedite to account g1 and display the result.  Using
       the operator +=, the QOS is added to the existing QOS of this
       account.

       ->sacctmgr modify account name=g1 set qos+=expedite

       ->sacctmgr show assoc format=cluster,account,qos
       Cluster  Account    QOS
       -------- ---------- -----
       zebra    root       normal
       zebra    root       normal
       zebra    g          normal
       zebra    g1         expedite,normal

       Now set the QOS expedite as the only QOS for account g and display
       the result.  Using the operator =, expedite becomes the only QOS
       usable by account g.

       ->sacctmgr modify account name=g set qos=expedite

       ->sacctmgr show assoc format=cluster,account,qos
       Cluster  Account    QOS
       -------- ---------- -----
       zebra    root       normal
       zebra    root       normal
       zebra    g          expedite
       zebra    g1         expedite,normal

       If a new account is added under account g, it will inherit the QOS
       expedite and will not have access to QOS normal.

       ->sacctmgr add account banana parent=g

       ->sacctmgr show assoc format=cluster,account,qos
       Cluster  Account    QOS
       -------- ---------- -----
       zebra    root       normal
       zebra    root       normal
       zebra    g          expedite
       zebra    banana     expedite
       zebra    g1         expedite,normal

       An example of listing trackable resources (TRES):

       ->sacctmgr show tres
       Type       Name              ID
       ---------- ----------------- --------
       cpu                          1
       mem                          2
       energy                       3
       node                         4
       billing                      5
       gres       gpu:tesla         1001
       license    vcs               1002
       bb         cray              1003


COPYING
       Copyright (C) 2008-2010 Lawrence Livermore National Security.
       Produced at Lawrence Livermore National Laboratory (cf. DISCLAIMER).
       Copyright (C) 2010-2016 SchedMD LLC.

       This file is part of Slurm, a resource management program.  For
       details, see <https://slurm.schedmd.com/>.

       Slurm is free software; you can redistribute it and/or modify it
       under the terms of the GNU General Public License as published by
       the Free Software Foundation; either version 2 of the License, or
       (at your option) any later version.

       Slurm is distributed in the hope that it will be useful, but
       WITHOUT ANY WARRANTY; without even the implied warranty of
       MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
       General Public License for more details.

SEE ALSO
       slurm.conf(5), slurmdbd(8)



September 2019                 Slurm Commands                     sacctmgr(1)