slurm.conf(5)              Slurm Configuration File              slurm.conf(5)

NAME
       slurm.conf - Slurm configuration file

DESCRIPTION
10 slurm.conf is an ASCII file which describes general Slurm configuration
11 information, the nodes to be managed, information about how those nodes
12 are grouped into partitions, and various scheduling parameters associ‐
13 ated with those partitions. This file should be consistent across all
14 nodes in the cluster.
15
16 The file location can be modified at system build time using the
17 DEFAULT_SLURM_CONF parameter or at execution time by setting the
18 SLURM_CONF environment variable. The Slurm daemons also allow you to
19 override both the built-in and environment-provided location using the
20 "-f" option on the command line.
21
22 The contents of the file are case insensitive except for the names of
23 nodes and partitions. Any text following a "#" in the configuration
24 file is treated as a comment through the end of that line. Changes to
25 the configuration file take effect upon restart of Slurm daemons, dae‐
26 mon receipt of the SIGHUP signal, or execution of the command "scontrol
27 reconfigure" unless otherwise noted.
28
29 If a line begins with the word "Include" followed by whitespace and
30 then a file name, that file will be included inline with the current
31 configuration file. For large or complex systems, multiple configura‐
32 tion files may prove easier to manage and enable reuse of some files
33 (See INCLUDE MODIFIERS for more details).
34
35 Note on file permissions:
36
37 The slurm.conf file must be readable by all users of Slurm, since it is
38 used by many of the Slurm commands. Other files that are defined in
39 the slurm.conf file, such as log files and job accounting files, may
40 need to be created/owned by the user "SlurmUser" to be successfully
41 accessed. Use the "chown" and "chmod" commands to set the ownership
42 and permissions appropriately. See the section FILE AND DIRECTORY PER‐
43 MISSIONS for information about the various files and directories used
44 by Slurm.
45

PARAMETERS
       The overall configuration parameters available include:
49
50
51 AccountingStorageBackupHost
52 The name of the backup machine hosting the accounting storage
53 database. If used with the accounting_storage/slurmdbd plugin,
54 this is where the backup slurmdbd would be running. Only used
55 with systems using SlurmDBD, ignored otherwise.
56
57
58 AccountingStorageEnforce
59 This controls what level of association-based enforcement to
60 impose on job submissions. Valid options are any combination of
61 associations, limits, nojobs, nosteps, qos, safe, and wckeys, or
              all for everything (except nojobs and nosteps, which must be
              requested explicitly).
64
65 If limits, qos, or wckeys are set, associations will automati‐
66 cally be set.
67
68 If wckeys is set, TrackWCKey will automatically be set.
69
70 If safe is set, limits and associations will automatically be
71 set.
72
              If nojobs is set, nosteps will automatically be set.
74
              By enforcing associations, no new job is allowed to run unless
              a corresponding association exists in the system. If limits are
              enforced, users can be limited by association to whatever job
              size or run time limits are defined.

              If nojobs is set, Slurm will not account for any jobs or steps
              on the system. Likewise, if nosteps is set, Slurm will not
              account for any steps that have run, but limits will still be
              enforced.
83
              If safe is enforced, a job will only be launched against an
              association or qos that has a GrpCPUMins limit set if the job
              will be able to run to completion. Without this option set,
              jobs will be launched as long as their usage hasn't reached the
              cpu-minutes limit, which can lead to jobs being launched but
              then killed when the limit is reached.
90
              With qos and/or wckeys enforced, jobs will not be scheduled
              unless a valid qos and/or workload characterization key is
              specified.
94
95 When AccountingStorageEnforce is changed, a restart of the
96 slurmctld daemon is required (not just a "scontrol reconfig").
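As a sketch, a site that wants jobs rejected unless a matching association exists, with limits enforced conservatively, might combine the options above (the particular combination is illustrative):

```
# "safe" implies "limits" and "associations"; jobs only start if
# they can run to completion under GrpCPUMins limits.
AccountingStorageEnforce=safe,qos
```

As noted above, changing this value requires a restart of the slurmctld daemon, not just "scontrol reconfig".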
97
98
99 AccountingStorageHost
100 The name of the machine hosting the accounting storage database.
101 Only used with systems using SlurmDBD, ignored otherwise. Also
102 see DefaultStorageHost.
103
104
105 AccountingStorageLoc
106 The fully qualified file name where accounting records are writ‐
107 ten when the AccountingStorageType is "accounting_stor‐
108 age/filetxt". Also see DefaultStorageLoc.
109
110
111 AccountingStoragePass
112 The password used to gain access to the database to store the
113 accounting data. Only used for database type storage plugins,
114 ignored otherwise. In the case of Slurm DBD (Database Daemon)
115 with MUNGE authentication this can be configured to use a MUNGE
116 daemon specifically configured to provide authentication between
117 clusters while the default MUNGE daemon provides authentication
118 within a cluster. In that case, AccountingStoragePass should
119 specify the named port to be used for communications with the
120 alternate MUNGE daemon (e.g. "/var/run/munge/global.socket.2").
121 The default value is NULL. Also see DefaultStoragePass.
122
123
124 AccountingStoragePort
125 The listening port of the accounting storage database server.
126 Only used for database type storage plugins, ignored otherwise.
127 Also see DefaultStoragePort.
128
129
130 AccountingStorageTRES
131 Comma separated list of resources you wish to track on the clus‐
132 ter. These are the resources requested by the sbatch/srun job
133 when it is submitted. Currently this consists of any GRES, BB
134 (burst buffer) or license along with CPU, Memory, Node, Energy,
135 FS/[Disk|Lustre], IC/OFED, Pages, and VMem. By default Billing,
136 CPU, Energy, Memory, Node, FS/Disk, Pages and VMem are tracked.
137 These default TRES cannot be disabled, but only appended to.
138 AccountingStorageTRES=gres/craynetwork,license/iop1 will track
139 billing, cpu, energy, memory, nodes, fs/disk, pages and vmem
140 along with a gres called craynetwork as well as a license called
141 iop1. Whenever these resources are used on the cluster they are
142 recorded. The TRES are automatically set up in the database on
143 the start of the slurmctld.
144
145
146 AccountingStorageType
147 The accounting storage mechanism type. Acceptable values at
148 present include "accounting_storage/filetxt", "accounting_stor‐
149 age/none" and "accounting_storage/slurmdbd". The "account‐
150 ing_storage/filetxt" value indicates that accounting records
151 will be written to the file specified by the AccountingStorage‐
152 Loc parameter. The "accounting_storage/slurmdbd" value indi‐
153 cates that accounting records will be written to the Slurm DBD,
154 which manages an underlying MySQL database. See "man slurmdbd"
155 for more information. The default value is "accounting_stor‐
156 age/none" and indicates that account records are not maintained.
157 Note: The filetxt plugin records only a limited subset of
158 accounting information and will prevent some sacct options from
159 proper operation. Also see DefaultStorageType.
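A minimal SlurmDBD-backed accounting setup might combine this with the host parameters above (hostnames are placeholders; 6819 is the conventional slurmdbd listening port):

```
# Write accounting records through the Slurm Database Daemon
AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageHost=dbd.example.com
AccountingStorageBackupHost=dbd-backup.example.com
AccountingStoragePort=6819
```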
160
161
162 AccountingStorageUser
163 The user account for accessing the accounting storage database.
164 Only used for database type storage plugins, ignored otherwise.
165 Also see DefaultStorageUser.
166
167
168 AccountingStoreJobComment
169 If set to "YES" then include the job's comment field in the job
170 complete message sent to the Accounting Storage database. The
171 default is "YES". Note the AdminComment and SystemComment are
172 always recorded in the database.
173
174
175 AcctGatherNodeFreq
              The AcctGather plugins' sampling interval for node accounting.
177 For AcctGather plugin values of none, this parameter is ignored.
178 For all other values this parameter is the number of seconds
179 between node accounting samples. For the acct_gather_energy/rapl
180 plugin, set a value less than 300 because the counters may over‐
181 flow beyond this rate. The default value is zero. This value
182 disables accounting sampling for nodes. Note: The accounting
183 sampling interval for jobs is determined by the value of JobAc‐
184 ctGatherFrequency.
185
186
187 AcctGatherEnergyType
188 Identifies the plugin to be used for energy consumption account‐
189 ing. The jobacct_gather plugin and slurmd daemon call this
190 plugin to collect energy consumption data for jobs and nodes.
191 The collection of energy consumption data takes place on the
192 node level, hence only in case of exclusive job allocation the
193 energy consumption measurements will reflect the job's real con‐
194 sumption. In case of node sharing between jobs the reported con‐
195 sumed energy per job (through sstat or sacct) will not reflect
196 the real energy consumed by the jobs.
197
198 Configurable values at present are:
199
200 acct_gather_energy/none
201 No energy consumption data is collected.
202
203 acct_gather_energy/ipmi
204 Energy consumption data is collected from
205 the Baseboard Management Controller (BMC)
206 using the Intelligent Platform Management
207 Interface (IPMI).
208
209 acct_gather_energy/rapl
210 Energy consumption data is collected from
211 hardware sensors using the Running Average
212 Power Limit (RAPL) mechanism. Note that
213 enabling RAPL may require the execution of
214 the command "sudo modprobe msr".
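For example, node energy could be sampled via RAPL together with the AcctGatherNodeFreq interval described above (30 seconds is an illustrative value, chosen to stay below the 300 second overflow limit):

```
# Sample node energy counters every 30 seconds via RAPL
AcctGatherEnergyType=acct_gather_energy/rapl
AcctGatherNodeFreq=30
```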
215
216
217 AcctGatherInfinibandType
218 Identifies the plugin to be used for infiniband network traffic
219 accounting. The jobacct_gather plugin and slurmd daemon call
220 this plugin to collect network traffic data for jobs and nodes.
221 The collection of network traffic data takes place on the node
222 level, hence only in case of exclusive job allocation the col‐
223 lected values will reflect the job's real traffic. In case of
224 node sharing between jobs the reported network traffic per job
225 (through sstat or sacct) will not reflect the real network traf‐
226 fic by the jobs.
227
228 Configurable values at present are:
229
230 acct_gather_infiniband/none
231 No infiniband network data are collected.
232
233 acct_gather_infiniband/ofed
234 Infiniband network traffic data are col‐
235 lected from the hardware monitoring counters
236 of Infiniband devices through the OFED
237 library. In order to account for per job
238 network traffic, add the "ic/ofed" TRES to
239 AccountingStorageTRES.
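Combining this plugin with the TRES list described under AccountingStorageTRES, per-job InfiniBand traffic could be recorded with, for example:

```
# Collect InfiniBand counters via OFED and record them as a TRES
AcctGatherInfinibandType=acct_gather_infiniband/ofed
AccountingStorageTRES=ic/ofed
```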
240
241
242 AcctGatherFilesystemType
243 Identifies the plugin to be used for filesystem traffic account‐
244 ing. The jobacct_gather plugin and slurmd daemon call this
245 plugin to collect filesystem traffic data for jobs and nodes.
246 The collection of filesystem traffic data takes place on the
247 node level, hence only in case of exclusive job allocation the
248 collected values will reflect the job's real traffic. In case of
249 node sharing between jobs the reported filesystem traffic per
250 job (through sstat or sacct) will not reflect the real filesys‐
251 tem traffic by the jobs.
252
253
254 Configurable values at present are:
255
256 acct_gather_filesystem/none
257 No filesystem data are collected.
258
259 acct_gather_filesystem/lustre
260 Lustre filesystem traffic data are collected
261 from the counters found in /proc/fs/lustre/.
262 In order to account for per job lustre traf‐
263 fic, add the "fs/lustre" TRES to Account‐
264 ingStorageTRES.
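Analogously to the InfiniBand plugin, per-job Lustre traffic could be accounted for with:

```
# Collect Lustre counters and record them as a TRES
AcctGatherFilesystemType=acct_gather_filesystem/lustre
AccountingStorageTRES=fs/lustre
```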
265
266
267 AcctGatherProfileType
268 Identifies the plugin to be used for detailed job profiling.
269 The jobacct_gather plugin and slurmd daemon call this plugin to
270 collect detailed data such as I/O counts, memory usage, or
271 energy consumption for jobs and nodes. There are interfaces in
              this plugin to collect data at step start and completion, task
273 start and completion, and at the account gather frequency. The
274 data collected at the node level is related to jobs only in case
275 of exclusive job allocation.
276
277 Configurable values at present are:
278
279 acct_gather_profile/none
280 No profile data is collected.
281
282 acct_gather_profile/hdf5
283 This enables the HDF5 plugin. The directory
284 where the profile files are stored and which
285 values are collected are configured in the
286 acct_gather.conf file.
287
288 acct_gather_profile/influxdb
289 This enables the influxdb plugin. The
290 influxdb instance host, port, database,
291 retention policy and which values are col‐
292 lected are configured in the
293 acct_gather.conf file.
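For instance, to enable HDF5 profiling (the output directory and the set of collected values are then configured in acct_gather.conf, as described above):

```
# Write detailed job profiles in HDF5 format
AcctGatherProfileType=acct_gather_profile/hdf5
```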
294
295
       AllowSpecResourcesUsage
              If set to 1, Slurm allows individual jobs to override a node's
              configured CoreSpecCount value. For a job to take advantage of
              this feature, the command line option --core-spec must be
              specified. The default value for this option is 1 for Cray
              systems and 0 for other system types.
302
303
304 AuthInfo
305 Additional information to be used for authentication of communi‐
306 cations between the Slurm daemons (slurmctld and slurmd) and the
307 Slurm clients. The interpretation of this option is specific to
308 the configured AuthType. Multiple options may be specified in a
309 comma delimited list. If not specified, the default authentica‐
310 tion information will be used.
311
              cred_expire Default job step credential lifetime, in seconds
                          (e.g. "cred_expire=1200"). It must be long enough
                          to load the user environment, run the prolog, deal
                          with the slurmd getting paged out of memory, etc.
                          This also controls how long a requeued job must
                          wait before starting again. The default value is
                          120 seconds.
319
320 socket Path name to a MUNGE daemon socket to use (e.g.
321 "socket=/var/run/munge/munge.socket.2"). The
322 default value is "/var/run/munge/munge.socket.2".
323 Used by auth/munge and crypto/munge.
324
325 ttl Credential lifetime, in seconds (e.g. "ttl=300").
326 The default value is dependent upon the MUNGE
327 installation, but is typically 300 seconds.
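Putting the sub-options together, an AuthInfo line might look like the following (the cred_expire value is illustrative; the socket path shown is the documented default):

```
# Longer credential lifetime plus an explicit MUNGE socket path
AuthInfo=cred_expire=300,socket=/var/run/munge/munge.socket.2
```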
328
329
330 AuthType
331 The authentication method for communications between Slurm com‐
332 ponents. Acceptable values at present include "auth/munge" and
333 "auth/none". The default value is "auth/munge". "auth/none"
334 includes the UID in each communication, but it is not verified.
335 This may be fine for testing purposes, but do not use
336 "auth/none" if you desire any security. "auth/munge" indicates
337 that MUNGE is to be used. (See "https://dun.github.io/munge/"
338 for more information). All Slurm daemons and commands must be
339 terminated prior to changing the value of AuthType and later
340 restarted.
341
342
343 BackupAddr
344 Defunct option, see SlurmctldHost.
345
346
347 BackupController
348 Defunct option, see SlurmctldHost.
349
350 The backup controller recovers state information from the State‐
351 SaveLocation directory, which must be readable and writable from
352 both the primary and backup controllers. While not essential,
353 it is recommended that you specify a backup controller. See
354 the RELOCATING CONTROLLERS section if you change this.
355
356
357 BatchStartTimeout
358 The maximum time (in seconds) that a batch job is permitted for
359 launching before being considered missing and releasing the
360 allocation. The default value is 10 (seconds). Larger values may
361 be required if more time is required to execute the Prolog, load
362 user environment variables (for Moab spawned jobs), or if the
363 slurmd daemon gets paged from memory.
364 Note: The test for a job being successfully launched is only
365 performed when the Slurm daemon on the compute node registers
366 state with the slurmctld daemon on the head node, which happens
367 fairly rarely. Therefore a job will not necessarily be termi‐
368 nated if its start time exceeds BatchStartTimeout. This config‐
369 uration parameter is also applied to launch tasks and avoid
370 aborting srun commands due to long running Prolog scripts.
371
372
373 BurstBufferType
374 The plugin used to manage burst buffers. Acceptable values at
375 present include "burst_buffer/none". More information later...
376
377
378 CheckpointType
379 The system-initiated checkpoint method to be used for user jobs.
380 The slurmctld daemon must be restarted for a change in Check‐
381 pointType to take effect. Supported values presently include:
382
383 checkpoint/blcr Berkeley Lab Checkpoint Restart (BLCR). NOTE:
384 If a file is found at sbin/scch (relative to
385 the Slurm installation location), it will be
386 executed upon completion of the checkpoint.
387 This can be a script used for managing the
388 checkpoint files. NOTE: Slurm's BLCR logic
389 only supports batch jobs.
390
391 checkpoint/none no checkpoint support (default)
392
393 checkpoint/ompi OpenMPI (version 1.3 or higher)
394
395
       ClusterName
              The name by which this Slurm managed cluster is known in the
              accounting database. This is needed to distinguish accounting
              records when multiple clusters report to the same database.
              Because of limitations in some databases, any upper case
              letters in the name will be silently mapped to lower case. In
              order to avoid confusion, it is recommended that the name be
              lower case.
403
404
405 CommunicationParameters
406 Comma separated options identifying communication options.
407
              CheckGhalQuiesce
                          Used specifically on a Cray using an Aries Ghal
                          interconnect. This will check to see if the system
                          is quiescing when sending a message, and if so,
                          wait until quiescing is done before sending.
413
              NoAddrCache By default, Slurm will cache a node's network
                          address after successfully establishing it. This
                          option disables the cache, and Slurm will look up
                          the node's network address each time a connection
                          is made. This is useful, for example, in a cloud
                          environment where node addresses come and go out
                          of DNS.
422
              NoCtldInAddrAny
                          Bind the slurmctld daemon directly to the address
                          the node's name resolves to, instead of binding to
                          any address on the node (the default).

              NoInAddrAny Bind directly to the address the node's name
                          resolves to, instead of binding to any address on
                          the node (the default). This option applies to all
                          daemons/clients except the slurmctld.
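For example, a cloud deployment whose node addresses change in DNS might disable the address cache:

```
# Resolve node addresses on every connection
CommunicationParameters=NoAddrCache
```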
434
435
436 CompleteWait
437 The time, in seconds, given for a job to remain in COMPLETING
438 state before any additional jobs are scheduled. If set to zero,
439 pending jobs will be started as soon as possible. Since a COM‐
440 PLETING job's resources are released for use by other jobs as
441 soon as the Epilog completes on each individual node, this can
442 result in very fragmented resource allocations. To provide jobs
443 with the minimum response time, a value of zero is recommended
444 (no waiting). To minimize fragmentation of resources, a value
445 equal to KillWait plus two is recommended. In that case, set‐
446 ting KillWait to a small value may be beneficial. The default
447 value of CompleteWait is zero seconds. The value may not exceed
448 65533.
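Following the recommendation above, a site minimizing fragmentation could set CompleteWait to KillWait plus two (a KillWait of 30 seconds is assumed here):

```
# Hold scheduling briefly while COMPLETING jobs finish their
# Epilogs, reducing fragmented allocations
KillWait=30
CompleteWait=32
```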
449
450
451 ControlAddr
452 Defunct option, see SlurmctldHost.
453
454
455 ControlMachine
456 Defunct option, see SlurmctldHost.
457
458
459 CoreSpecPlugin
460 Identifies the plugins to be used for enforcement of core spe‐
461 cialization. The slurmd daemon must be restarted for a change
462 in CoreSpecPlugin to take effect. Acceptable values at present
463 include:
464
465 core_spec/cray used only for Cray systems
466
467 core_spec/none used for all other system types
468
469
470 CpuFreqDef
471 Default CPU frequency value or frequency governor to use when
472 running a job step if it has not been explicitly set with the
473 --cpu-freq option. Acceptable values at present include a
474 numeric value (frequency in kilohertz) or one of the following
475 governors:
476
477 Conservative attempts to use the Conservative CPU governor
478
479 OnDemand attempts to use the OnDemand CPU governor
480
481 Performance attempts to use the Performance CPU governor
482
483 PowerSave attempts to use the PowerSave CPU governor

              There is no default value. If CpuFreqDef is unset and the
              --cpu-freq option is not specified, no attempt is made to set
              the governor.
486
487
488 CpuFreqGovernors
489 List of CPU frequency governors allowed to be set with the sal‐
490 loc, sbatch, or srun option --cpu-freq. Acceptable values at
491 present include:
492
493 Conservative attempts to use the Conservative CPU governor
494
495 OnDemand attempts to use the OnDemand CPU governor (a
496 default value)
497
498 Performance attempts to use the Performance CPU governor (a
499 default value)
500
501 PowerSave attempts to use the PowerSave CPU governor
502
503 UserSpace attempts to use the UserSpace CPU governor (a
504 default value)
505 The default is OnDemand, Performance and UserSpace.
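For example, to restrict users to a subset of governors and pick a default for job steps that do not pass --cpu-freq:

```
# Governors users may request with --cpu-freq
CpuFreqGovernors=OnDemand,Performance,UserSpace
# Governor applied when --cpu-freq is not given
CpuFreqDef=Performance
```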
506
507 CryptoType
508 The cryptographic signature tool to be used in the creation of
509 job step credentials. The slurmctld daemon must be restarted
510 for a change in CryptoType to take effect. Acceptable values at
              present include "crypto/munge". The default value is
              "crypto/munge", which is recommended.
513
514
515 DebugFlags
516 Defines specific subsystems which should provide more detailed
517 event logging. Multiple subsystems can be specified with comma
518 separators. Most DebugFlags will result in verbose logging for
519 the identified subsystems and could impact performance. Valid
520 subsystems available today (with more to come) include:
521
522 Backfill Backfill scheduler details
523
524 BackfillMap Backfill scheduler to log a very verbose map of
525 reserved resources through time. Combine with
526 Backfill for a verbose and complete view of the
527 backfill scheduler's work.
528
529 BurstBuffer Burst Buffer plugin
530
531 CPU_Bind CPU binding details for jobs and steps
532
533 CpuFrequency Cpu frequency details for jobs and steps using
534 the --cpu-freq option.
535
536 Elasticsearch Elasticsearch debug info
537
538 Energy AcctGatherEnergy debug info
539
540 ExtSensors External Sensors debug info
541
542 Federation Federation scheduling debug info
543
544 FrontEnd Front end node details
545
546 Gres Generic resource details
547
548 HeteroJobs Heterogeneous job details
549
550 Gang Gang scheduling details
551
552 JobContainer Job container plugin details
553
554 License License management details
555
556 NodeFeatures Node Features plugin debug info
557
              NO_CONF_HASH Do not log when the slurm.conf file differs
                           between Slurm daemons
560
561 Power Power management plugin
562
563 Priority Job prioritization
564
565 Profile AcctGatherProfile plugins details
566
567 Protocol Communication protocol details
568
569 Reservation Advanced reservations
570
571 SelectType Resource selection plugin
572
573 Steps Slurmctld resource allocation for job steps
574
575 Switch Switch plugin
576
577 TimeCray Timing of Cray APIs
578
              TraceJobs    Trace jobs in slurmctld. It will print detailed
                           job information including state, job ids, and
                           allocated node counts.
582
583 Triggers Slurmctld triggers
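As suggested above, combining the two backfill flags yields a complete picture of the backfill scheduler's work (at the cost of very verbose logging):

```
# Verbose backfill diagnostics, including the reserved-resource map
DebugFlags=Backfill,BackfillMap
```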
584
585
586 DefMemPerCPU
587 Default real memory size available per allocated CPU in
588 megabytes. Used to avoid over-subscribing memory and causing
589 paging. DefMemPerCPU would generally be used if individual pro‐
590 cessors are allocated to jobs (SelectType=select/cons_res). The
591 default value is 0 (unlimited). Also see DefMemPerNode and
592 MaxMemPerCPU. DefMemPerCPU and DefMemPerNode are mutually
593 exclusive.
594
595
596 DefMemPerNode
597 Default real memory size available per allocated node in
598 megabytes. Used to avoid over-subscribing memory and causing
599 paging. DefMemPerNode would generally be used if whole nodes
600 are allocated to jobs (SelectType=select/linear) and resources
601 are over-subscribed (OverSubscribe=yes or OverSubscribe=force).
602 The default value is 0 (unlimited). Also see DefMemPerCPU and
603 MaxMemPerNode. DefMemPerCPU and DefMemPerNode are mutually
604 exclusive.
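For instance, with per-processor allocation a site might default each allocated CPU to 2 GB (values are in megabytes and are illustrative):

```
# Per-CPU allocation with a default memory grant per CPU
SelectType=select/cons_res
DefMemPerCPU=2048
```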
605
606
607 DefaultStorageHost
608 The default name of the machine hosting the accounting storage
609 and job completion databases. Only used for database type stor‐
610 age plugins and when the AccountingStorageHost and JobCompHost
611 have not been defined.
612
613
614 DefaultStorageLoc
615 The fully qualified file name where accounting records and/or
616 job completion records are written when the DefaultStorageType
617 is "filetxt". Also see AccountingStorageLoc and JobCompLoc.
618
619
620 DefaultStoragePass
621 The password used to gain access to the database to store the
622 accounting and job completion data. Only used for database type
623 storage plugins, ignored otherwise. Also see AccountingStor‐
624 agePass and JobCompPass.
625
626
627 DefaultStoragePort
628 The listening port of the accounting storage and/or job comple‐
629 tion database server. Only used for database type storage plug‐
630 ins, ignored otherwise. Also see AccountingStoragePort and Job‐
631 CompPort.
632
633
634 DefaultStorageType
635 The accounting and job completion storage mechanism type.
636 Acceptable values at present include "filetxt", "mysql" and
637 "none". The value "filetxt" indicates that records will be
638 written to a file. The value "mysql" indicates that accounting
639 records will be written to a MySQL or MariaDB database. The
640 default value is "none", which means that records are not main‐
641 tained. Also see AccountingStorageType and JobCompType.
642
643
644 DefaultStorageUser
645 The user account for accessing the accounting storage and/or job
646 completion database. Only used for database type storage plug‐
647 ins, ignored otherwise. Also see AccountingStorageUser and Job‐
648 CompUser.
649
650
651 DisableRootJobs
652 If set to "YES" then user root will be prevented from running
653 any jobs. The default value is "NO", meaning user root will be
654 able to execute jobs. DisableRootJobs may also be set by parti‐
655 tion.
656
657
658 EioTimeout
659 The number of seconds srun waits for slurmstepd to close the
660 TCP/IP connection used to relay data between the user applica‐
661 tion and srun when the user application terminates. The default
662 value is 60 seconds. May not exceed 65533.
663
664
       EnforcePartLimits
              If set to "ALL" then jobs which exceed a partition's size
              and/or time limits will be rejected at submission time. If a
              job is submitted to multiple partitions, the job must satisfy
              the limits on all the requested partitions. If set to "NO"
              then the job will be accepted and remain queued until the
              partition limits are altered (Time and Node Limits). If set to
              "ANY" or "YES" a job must satisfy the limits of at least one
              of the requested partitions to be submitted. The default value
              is "NO". NOTE: If set, then a job's QOS can not be used to
              exceed partition limits. NOTE: The partition limits being
              considered are its configured MaxMemPerCPU, MaxMemPerNode,
              MinNodes, MaxNodes, MaxTime, AllocNodes, AllowAccounts,
              AllowGroups, AllowQOS, and QOS usage threshold.
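For example, to reject out-of-bounds jobs at submission time regardless of which partitions were requested:

```
# Reject jobs that exceed the limits of every requested partition
EnforcePartLimits=ALL
```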
678
679
680 Epilog Fully qualified pathname of a script to execute as user root on
681 every node when a user's job completes (e.g.
              "/usr/local/slurm/epilog"). A glob pattern (see glob(7)) may
683 also be used to run more than one epilog script (e.g.
684 "/etc/slurm/epilog.d/*"). The Epilog script or scripts may be
685 used to purge files, disable user login, etc. By default there
686 is no epilog. See Prolog and Epilog Scripts for more informa‐
687 tion.
688
689
690 EpilogMsgTime
691 The number of microseconds that the slurmctld daemon requires to
692 process an epilog completion message from the slurmd daemons.
693 This parameter can be used to prevent a burst of epilog comple‐
694 tion messages from being sent at the same time which should help
695 prevent lost messages and improve throughput for large jobs.
696 The default value is 2000 microseconds. For a 1000 node job,
697 this spreads the epilog completion messages out over two sec‐
698 onds.
699
700
701 EpilogSlurmctld
702 Fully qualified pathname of a program for the slurmctld to exe‐
703 cute upon termination of a job allocation (e.g.
704 "/usr/local/slurm/epilog_controller"). The program executes as
705 SlurmUser, which gives it permission to drain nodes and requeue
706 the job if a failure occurs (See scontrol(1)). Exactly what the
707 program does and how it accomplishes this is completely at the
              discretion of the system administrator. Information about the
              job being terminated, its allocated nodes, etc. is passed to
              the program using environment variables. See Prolog and Epilog
              Scripts for more information.
712
713
714 ExtSensorsFreq
715 The external sensors plugin sampling interval. If ExtSen‐
716 sorsType=ext_sensors/none, this parameter is ignored. For all
717 other values of ExtSensorsType, this parameter is the number of
718 seconds between external sensors samples for hardware components
719 (nodes, switches, etc.) The default value is zero. This value
720 disables external sensors sampling. Note: This parameter does
721 not affect external sensors data collection for jobs/steps.
722
723
724 ExtSensorsType
725 Identifies the plugin to be used for external sensors data col‐
726 lection. Slurmctld calls this plugin to collect external sen‐
727 sors data for jobs/steps and hardware components. In case of
728 node sharing between jobs the reported values per job/step
729 (through sstat or sacct) may not be accurate. See also "man
730 ext_sensors.conf".
731
732 Configurable values at present are:
733
734 ext_sensors/none No external sensors data is collected.
735
736 ext_sensors/rrd External sensors data is collected from the
737 RRD database.
738
739
740 FairShareDampeningFactor
741 Dampen the effect of exceeding a user or group's fair share of
              allocated resources. Higher values will provide greater ability
743 to differentiate between exceeding the fair share at high levels
744 (e.g. a value of 1 results in almost no difference between over‐
745 consumption by a factor of 10 and 100, while a value of 5 will
746 result in a significant difference in priority). The default
747 value is 1.
748
749
750 FastSchedule
751 Controls how a node's configuration specifications in slurm.conf
752 are used. If the number of node configuration entries in the
753 configuration file is significantly lower than the number of
754 nodes, setting FastSchedule to 1 will permit much faster sched‐
755 uling decisions to be made. (The scheduler can just check the
756 values in a few configuration records instead of possibly thou‐
757 sands of node records.) Note that on systems with hyper-thread‐
758 ing, the processor count reported by the node will be twice the
759 actual processor count. Consider which value you want to be
760 used for scheduling purposes.
761
762 0 Base scheduling decisions upon the actual configuration of
763 each individual node except that the node's processor count
764 in Slurm's configuration must match the actual hardware
765 configuration if PreemptMode=suspend,gang or Select‐
766 Type=select/cons_res are configured (both of those plugins
767 maintain resource allocation information using bitmaps for
768 the cores in the system and must remain static, while the
769 node's memory and disk space can be established later).
770
771 1 (default)
772 Consider the configuration of each node to be that speci‐
773 fied in the slurm.conf configuration file and any node with
774 less than the configured resources will be set to DRAIN.
775
776 2 Consider the configuration of each node to be that speci‐
777 fied in the slurm.conf configuration file and any node with
778 less than the configured resources will not be set DRAIN.
779 This option is generally only useful for testing purposes.
780
781
782 FederationParameters
783 Used to define federation options. Multiple options may be comma
784 separated.
785
786
787 fed_display
788 If set, then the client status commands (e.g. squeue,
789 sinfo, sprio, etc.) will display information in a feder‐
790 ated view by default. This option is functionally equiva‐
791 lent to using the --federation options on each command.
792 Use the client's --local option to override the federated
793 view and get a local view of the given cluster.
794
795
796 FirstJobId
797              The job id to be used for the first job submitted to Slurm
798              without a specific requested value. Job id values generated
799              will be incremented by 1 for each subsequent job. This may be
800              used to provide a meta-scheduler with a job id space which is
801              disjoint from the interactive jobs. The default value is 1.
Also see MaxJobId.
802
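A minimal sketch of reserving a disjoint job id space for a meta-scheduler (the value 100000 is arbitrary):

```
# Jobs submitted without an explicit id are numbered from 100000 upward,
# leaving the lower id range free for a meta-scheduler to assign.
FirstJobId=100000
```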
803
804 GetEnvTimeout
805              Used for Moab scheduled jobs only. Controls how long a job
806              should wait, in seconds, for loading the user's environment before
807 attempting to load it from a cache file. Applies when the srun
808 or sbatch --get-user-env option is used. If set to 0 then always
809 load the user's environment from the cache file. The default
810 value is 2 seconds.
811
812
813 GresTypes
814 A comma delimited list of generic resources to be managed.
815 These generic resources may have an associated plugin available
816 to provide additional functionality. No generic resources are
817 managed by default. Ensure this parameter is consistent across
818 all nodes in the cluster for proper operation. The slurmctld
819 daemon must be restarted for changes to this parameter to become
820 effective.
821
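A hedged example, assuming GPU nodes named tux[01-04] with devices described in each node's gres.conf:

```
# GPUs are managed cluster-wide; each node's Gres= count must match the
# devices defined in its gres.conf.
GresTypes=gpu
NodeName=tux[01-04] Gres=gpu:4 CPUs=32 RealMemory=128000
```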
822
823 GroupUpdateForce
824 If set to a non-zero value, then information about which users
825 are members of groups allowed to use a partition will be updated
826 periodically, even when there have been no changes to the
827 /etc/group file. If set to zero, group member information will
828 be updated only after the /etc/group file is updated. The
829 default value is 1. Also see the GroupUpdateTime parameter.
830
831
832 GroupUpdateTime
833 Controls how frequently information about which users are mem‐
834 bers of groups allowed to use a partition will be updated, and
835 how long user group membership lists will be cached. The time
836 interval is given in seconds with a default value of 600 sec‐
837 onds. A value of zero will prevent periodic updating of group
838 membership information. Also see the GroupUpdateForce parame‐
839 ter.
840
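A sketch combining the two group-update parameters above (values shown are the defaults):

```
# Refresh cached group membership every 10 minutes, even if /etc/group
# has not changed.
GroupUpdateTime=600
GroupUpdateForce=1
```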
841
842 HealthCheckInterval
843 The interval in seconds between executions of HealthCheckPro‐
844 gram. The default value is zero, which disables execution.
845
846
847 HealthCheckNodeState
848 Identify what node states should execute the HealthCheckProgram.
849 Multiple state values may be specified with a comma separator.
850 The default value is ANY to execute on nodes in any state.
851
852 ALLOC Run on nodes in the ALLOC state (all CPUs allo‐
853 cated).
854
855 ANY Run on nodes in any state.
856
857 CYCLE Rather than running the health check program on all
858 nodes at the same time, cycle through running on all
859 compute nodes through the course of the HealthCheck‐
860 Interval. May be combined with the various node
861 state options.
862
863 IDLE Run on nodes in the IDLE state.
864
865 MIXED Run on nodes in the MIXED state (some CPUs idle and
866 other CPUs allocated).
867
868
869 HealthCheckProgram
870 Fully qualified pathname of a script to execute as user root
871 periodically on all compute nodes that are not in the
872 NOT_RESPONDING state. This program may be used to verify the
873 node is fully operational and DRAIN the node or send email if a
874 problem is detected. Any action to be taken must be explicitly
875 performed by the program (e.g. execute "scontrol update Node‐
876 Name=foo State=drain Reason=tmp_file_system_full" to drain a
877 node). The execution interval is controlled using the
878 HealthCheckInterval parameter. Note that the HealthCheckProgram
879 will be executed at the same time on all nodes to minimize its
880              impact upon parallel programs. This program will be killed
881 if it does not terminate normally within 60 seconds. This pro‐
882 gram will also be executed when the slurmd daemon is first
883 started and before it registers with the slurmctld daemon. By
884 default, no program will be executed.
885
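A sketch tying the three health-check parameters together (the script path is hypothetical and must be supplied by the site):

```
# Run the site's check script every five minutes, cycling through the
# nodes rather than running on all of them at once.
HealthCheckProgram=/usr/sbin/nodecheck.sh
HealthCheckInterval=300
HealthCheckNodeState=CYCLE,IDLE,ALLOC
```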
886
887 InactiveLimit
888 The interval, in seconds, after which a non-responsive job allo‐
889 cation command (e.g. srun or salloc) will result in the job
890 being terminated. If the node on which the command is executed
891 fails or the command abnormally terminates, this will terminate
892 its job allocation. This option has no effect upon batch jobs.
893 When setting a value, take into consideration that a debugger
894 using srun to launch an application may leave the srun command
895 in a stopped state for extended periods of time. This limit is
896 ignored for jobs running in partitions with the RootOnly flag
897 set (the scheduler running as root will be responsible for the
898 job). The default value is unlimited (zero) and may not exceed
899 65533 seconds.
900
901
902 JobAcctGatherType
903 The job accounting mechanism type. Acceptable values at present
904              include "jobacct_gather/linux" (for Linux systems, and the
905              recommended one), "jobacct_gather/cgroup" and
906 "jobacct_gather/none" (no accounting data collected). The
907 default value is "jobacct_gather/none". "jobacct_gather/cgroup"
908 is a plugin for the Linux operating system that uses cgroups to
909 collect accounting statistics. The plugin collects the following
910 statistics: From the cgroup memory subsystem: mem‐
911 ory.usage_in_bytes (reported as 'pages') and rss from mem‐
912 ory.stat (reported as 'rss'). From the cgroup cpuacct subsystem:
913 user cpu time and system cpu time. No value is provided by
914 cgroups for virtual memory size ('vsize'). In order to use the
915 sstat tool "jobacct_gather/linux", or "jobacct_gather/cgroup"
916 must be configured.
917 NOTE: Changing this configuration parameter changes the contents
918 of the messages between Slurm daemons. Any previously running
919 job steps are managed by a slurmstepd daemon that will persist
920              through the lifetime of that job step and will not change its
921              communication protocol. Only change this configuration
922              parameter when there are no running job steps.
923
924
925 JobAcctGatherFrequency
926              The job accounting and profiling sampling intervals. The
927              supported format is as follows:
928
929 JobAcctGatherFrequency=<datatype>=<interval>
930 where <datatype>=<interval> specifies the task sam‐
931 pling interval for the jobacct_gather plugin or a
932 sampling interval for a profiling type by the
933 acct_gather_profile plugin. Multiple, comma-sepa‐
934 rated <datatype>=<interval> intervals may be speci‐
935 fied. Supported datatypes are as follows:
936
937 task=<interval>
938 where <interval> is the task sampling inter‐
939 val in seconds for the jobacct_gather plugins
940 and for task profiling by the
941 acct_gather_profile plugin.
942
943 energy=<interval>
944 where <interval> is the sampling interval in
945 seconds for energy profiling using the
946 acct_gather_energy plugin
947
948 network=<interval>
949 where <interval> is the sampling interval in
950 seconds for infiniband profiling using the
951 acct_gather_infiniband plugin.
952
953 filesystem=<interval>
954 where <interval> is the sampling interval in
955 seconds for filesystem profiling using the
956 acct_gather_filesystem plugin.
957
958 The default value for task sampling interval
959 is 30 seconds. The default value for all other intervals is 0.
960 An interval of 0 disables sampling of the specified type. If
961 the task sampling interval is 0, accounting information is col‐
962 lected only at job termination (reducing Slurm interference with
963 the job).
964 Smaller (non-zero) values have a greater impact upon job perfor‐
965 mance, but a value of 30 seconds is not likely to be noticeable
966 for applications having less than 10,000 tasks.
967 Users can independently override each interval on a per job
968 basis using the --acctg-freq option when submitting the job.
969
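The interval format above can be illustrated as follows (the chosen values are arbitrary):

```
# Sample task accounting every 30 seconds and energy use every 60;
# network and filesystem profiling remain disabled (interval 0).
JobAcctGatherType=jobacct_gather/linux
JobAcctGatherFrequency=task=30,energy=60
```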
970
971 JobAcctGatherParams
972              Arbitrary parameters for the job account gather plugin.
973              Acceptable values at present include:
974
975 NoShared Exclude shared memory from accounting.
976
977 UsePss Use PSS value instead of RSS to calculate
978 real usage of memory. The PSS value will be
979 saved as RSS.
980
981              OverMemoryKill Kill steps that are detected using more
982                             memory than requested, every time
983                             accounting information is gathered by the
984                             JobAcctGather plugin. This parameter will
985                             not kill a job directly, but only the step.
986                             See MemLimitEnforce for that purpose. This
987                             parameter should be used with caution: if a
988                             job exceeds its memory allocation it may
989                             affect other processes and/or machine
990                             health. NOTE: It is recommended to limit
991                             memory by enabling task/cgroup in TaskPlugin
992                             and making use of ConstrainRAMSpace=yes in
993                             cgroup.conf instead of using this
994                             JobAcctGather mechanism for memory
995                             enforcement, since the latter has a lower
996                             resolution (JobAcctGatherFreq) and OOMs
997                             could happen at some point.
998
999
1000 JobCheckpointDir
1001 Specifies the default directory for storing or reading job
1002 checkpoint information. The data stored here is only a few thou‐
1003 sand bytes per job and includes information needed to resubmit
1004              the job request, not the job's memory image. The directory
1005              must be readable and writable by SlurmUser, but not writable
1006              by regular users. The job memory images may be in a different
1007              location, as specified by the --checkpoint-dir option at job
1008              submit time or scontrol's ImageDir option.
1009
1010
1011 JobCompHost
1012 The name of the machine hosting the job completion database.
1013 Only used for database type storage plugins, ignored otherwise.
1014 Also see DefaultStorageHost.
1015
1016
1017 JobCompLoc
1018 The fully qualified file name where job completion records are
1019 written when the JobCompType is "jobcomp/filetxt" or the data‐
1020 base where job completion records are stored when the JobComp‐
1021              Type is a database, or a URL of the form
1022              http://yourelasticserver:port when JobCompType is "jobcomp/elasticsearch". NOTE:
1023 when you specify a URL for Elasticsearch, Slurm will remove any
1024 trailing slashes "/" from the configured URL and append
1025 "/slurm/jobcomp", which are the Elasticsearch index name (slurm)
1026 and mapping (jobcomp). NOTE: More information is available at
1027 the Slurm web site ( https://slurm.schedmd.com/elastic‐
1028 search.html ). Also see DefaultStorageLoc.
1029
1030
1031 JobCompPass
1032 The password used to gain access to the database to store the
1033 job completion data. Only used for database type storage plug‐
1034 ins, ignored otherwise. Also see DefaultStoragePass.
1035
1036
1037 JobCompPort
1038 The listening port of the job completion database server. Only
1039 used for database type storage plugins, ignored otherwise. Also
1040 see DefaultStoragePort.
1041
1042
1043 JobCompType
1044 The job completion logging mechanism type. Acceptable values at
1045 present include "jobcomp/none", "jobcomp/elasticsearch", "job‐
1046 comp/filetxt", "jobcomp/mysql" and "jobcomp/script". The
1047 default value is "jobcomp/none", which means that upon job com‐
1048 pletion the record of the job is purged from the system. If
1049 using the accounting infrastructure this plugin may not be of
1050 interest since the information here is redundant. The value
1051 "jobcomp/elasticsearch" indicates that a record of the job
1052 should be written to an Elasticsearch server specified by the
1053 JobCompLoc parameter. NOTE: More information is available at
1054 the Slurm web site ( https://slurm.schedmd.com/elastic‐
1055 search.html ). The value "jobcomp/filetxt" indicates that a
1056 record of the job should be written to a text file specified by
1057 the JobCompLoc parameter. The value "jobcomp/mysql" indicates
1058 that a record of the job should be written to a MySQL or MariaDB
1059 database specified by the JobCompLoc parameter. The value "job‐
1060 comp/script" indicates that a script specified by the JobCompLoc
1061 parameter is to be executed with environment variables indicat‐
1062 ing the job information.
1063
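A sketch of the Elasticsearch case described above (the server name is hypothetical):

```
# Send completion records to an Elasticsearch server; Slurm appends
# "/slurm/jobcomp" to the configured URL.
JobCompType=jobcomp/elasticsearch
JobCompLoc=http://elastic.example.com:9200
```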
1064 JobCompUser
1065 The user account for accessing the job completion database.
1066 Only used for database type storage plugins, ignored otherwise.
1067 Also see DefaultStorageUser.
1068
1069
1070 JobContainerType
1071 Identifies the plugin to be used for job tracking. The slurmd
1072 daemon must be restarted for a change in JobContainerType to
1073 take effect. NOTE: The JobContainerType applies to a job allo‐
1074 cation, while ProctrackType applies to job steps. Acceptable
1075 values at present include:
1076
1077 job_container/cncu used only for Cray systems (CNCU = Compute
1078 Node Clean Up)
1079
1080 job_container/none used for all other system types
1081
1082
1083 JobCredentialPrivateKey
1084 Fully qualified pathname of a file containing a private key used
1085 for authentication by Slurm daemons. This parameter is ignored
1086 if CryptoType=crypto/munge.
1087
1088
1089 JobCredentialPublicCertificate
1090 Fully qualified pathname of a file containing a public key used
1091 for authentication by Slurm daemons. This parameter is ignored
1092 if CryptoType=crypto/munge.
1093
1094
1095 JobFileAppend
1096              This option controls what to do if a job's output or error files
1097              exist when the job is started. If JobFileAppend is set to a
1098 value of 1, then append to the existing file. By default, any
1099 existing file is truncated.
1100
1101
1102 JobRequeue
1103 This option controls the default ability for batch jobs to be
1104              requeued. Jobs may be requeued explicitly by a system
1105              administrator, after node failure, or upon preemption by a
1106              higher priority job. If JobRequeue is set to a value of 1,
1107              then batch jobs may be requeued unless explicitly disabled by
1108              the user. If JobRequeue is set to a value of 0, then batch
1109              jobs will not be requeued unless explicitly enabled by the user. Use the sbatch
1110 --no-requeue or --requeue option to change the default behavior
1111 for individual jobs. The default value is 1.
1112
1113
1114 JobSubmitPlugins
1115 A comma delimited list of job submission plugins to be used.
1116 The specified plugins will be executed in the order listed.
1117 These are intended to be site-specific plugins which can be used
1118 to set default job parameters and/or logging events. Sample
1119 plugins available in the distribution include "all_partitions",
1120 "defaults", "logging", "lua", and "partition". For examples of
1121 use, see the Slurm code in "src/plugins/job_submit" and "con‐
1122 tribs/lua/job_submit*.lua" then modify the code to satisfy your
1123 needs. Slurm can be configured to use multiple job_submit plug‐
1124 ins if desired, however the lua plugin will only execute one lua
1125 script named "job_submit.lua" located in the default script
1126 directory (typically the subdirectory "etc" of the installation
1127 directory). No job submission plugins are used by default.
1128
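A minimal sketch using two of the sample plugins named above:

```
# Run the lua plugin (which loads job_submit.lua from the configuration
# directory) followed by the sample logging plugin.
JobSubmitPlugins=lua,logging
```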
1129
1130 KeepAliveTime
1131              Specifies how long socket communications between the srun
1132              command and its slurmstepd process are kept alive after a
1133              disconnect. Longer values can be used to improve reliability
1134              of communications in the event of network failures. The
1135              default uses the system default value. The value may not
1136              exceed 65533.
1137
1138
1139 KillOnBadExit
1140              If set to 1, a step will be terminated immediately if any task
1141              crashes or aborts, as indicated by a non-zero exit code. With
1142              the default value of 0, if one of the processes crashes or
1143              aborts, the other processes will continue to run while the
1144              crashed or aborted process waits. The user can override this
1145 configuration parameter by using srun's -K, --kill-on-bad-exit.
1146
1147
1148 KillWait
1149 The interval, in seconds, given to a job's processes between the
1150 SIGTERM and SIGKILL signals upon reaching its time limit. If
1151 the job fails to terminate gracefully in the interval specified,
1152 it will be forcibly terminated. The default value is 30 sec‐
1153 onds. The value may not exceed 65533.
1154
1155
1156 NodeFeaturesPlugins
1157 Identifies the plugins to be used for support of node features
1158 which can change through time. For example, a node which might
1159              be booted with various BIOS settings. This is supported through
1160 the use of a node's active_features and available_features
1161 information. Acceptable values at present include:
1162
1163 node_features/knl_cray
1164 used only for Intel Knights Landing proces‐
1165 sors (KNL) on Cray systems
1166
1167 node_features/knl_generic
1168 used for Intel Knights Landing processors
1169 (KNL) on a generic Linux system
1170
1171
1172 LaunchParameters
1173 Identifies options to the job launch plugin. Acceptable values
1174 include:
1175
1176              batch_step_set_cpu_freq Set the cpu frequency for the batch step
1177                                      from the given --cpu-freq option, or
1178                                      the slurm.conf CpuFreqDef setting. By
1179                                      default only steps started with srun
1180                                      will utilize the cpu freq setting options.
1181
1182                                      NOTE: If you are using srun to launch
1183                                      your steps inside a batch script
1184                                      (advised), this option will create a
1185                                      situation where you may have multiple
1186                                      agents setting the cpu_freq, as the
1187                                      batch step usually runs on the same
1188                                      resources as one or more steps that
1189                                      the sruns in the script will create.
1190
1191 cray_net_exclusive Allow jobs on a Cray Native cluster
1192 exclusive access to network resources.
1193 This should only be set on clusters pro‐
1194 viding exclusive access to each node to
1195 a single job at once, and not using par‐
1196 allel steps within the job, otherwise
1197 resources on the node can be oversub‐
1198 scribed.
1199
1200 lustre_no_flush If set on a Cray Native cluster, then do
1201 not flush the Lustre cache on job step
1202 completion. This setting will only take
1203 effect after reconfiguring, and will
1204 only take effect for newly launched
1205 jobs.
1206
1207 mem_sort Sort NUMA memory at step start. User can
1208 override this default with
1209 SLURM_MEM_BIND environment variable or
1210 --mem-bind=nosort command line option.
1211
1212              send_gids               Look up and send the user_name and
1213                                      extended gids for a job within the
1214                                      slurmctld, rather than individually on
1215                                      each node as part of each task launch.
1216                                      This should avoid issues around name
1217                                      service scalability when launching
1218                                      jobs involving many nodes.
1219
1220 slurmstepd_memlock Lock the slurmstepd process's current
1221 memory in RAM.
1222
1223 slurmstepd_memlock_all Lock the slurmstepd process's current
1224 and future memory in RAM.
1225
1226 test_exec Have srun verify existence of the exe‐
1227 cutable program along with user execute
1228 permission on the node where srun was
1229 called before attempting to launch it on
1230 nodes in the step.
1231
1232
1233 LaunchType
1234 Identifies the mechanism to be used to launch application tasks.
1235 Acceptable values include:
1236
1237 launch/slurm
1238 The default value.
1239
1240
1241 Licenses
1242 Specification of licenses (or other resources available on all
1243 nodes of the cluster) which can be allocated to jobs. License
1244 names can optionally be followed by a colon and count with a
1245 default count of one. Multiple license names should be comma
1246 separated (e.g. "Licenses=foo:4,bar"). Note that Slurm pre‐
1247 vents jobs from being scheduled if their required license speci‐
1248 fication is not available. Slurm does not prevent jobs from
1249 using licenses that are not explicitly listed in the job submis‐
1250 sion specification.
1251
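Expanding the inline example above into a submission sketch (the license names and script are hypothetical):

```
# Four "foo" licenses and one "bar" license are available cluster-wide.
Licenses=foo:4,bar
# A job could then request two of them at submit time:
#   sbatch -L foo:2 job.sh
```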
1252
1253 LogTimeFormat
1254 Format of the timestamp in slurmctld and slurmd log files.
1255 Accepted values are "iso8601", "iso8601_ms", "rfc5424",
1256 "rfc5424_ms", "clock", "short" and "thread_id". The values end‐
1257 ing in "_ms" differ from the ones without in that fractional
1258 seconds with millisecond precision are printed. The default
1259 value is "iso8601_ms". The "rfc5424" formats are the same as the
1260 "iso8601" formats except that the timezone value is also shown.
1261 The "clock" format shows a timestamp in microseconds retrieved
1262 with the C standard clock() function. The "short" format is a
1263 short date and time format. The "thread_id" format shows the
1264 timestamp in the C standard ctime() function form without the
1265 year but including the microseconds, the daemon's process ID and
1266 the current thread name and ID.
1267
1268
1269 MailDomain
1270 Domain name to qualify usernames if email address is not explic‐
1271 itly given with the "--mail-user" option. If unset, the local
1272              MTA will need to qualify local addresses itself.
1273
1274
1275 MailProg
1276 Fully qualified pathname to the program used to send email per
1277 user request. The default value is "/bin/mail" (or
1278 "/usr/bin/mail" if "/bin/mail" does not exist but
1279 "/usr/bin/mail" does exist).
1280
1281
1282 MaxArraySize
1283 The maximum job array size. The maximum job array task index
1284 value will be one less than MaxArraySize to allow for an index
1285 value of zero. Configure MaxArraySize to 0 in order to disable
1286 job array use. The value may not exceed 4000001. The value of
1287 MaxJobCount should be much larger than MaxArraySize. The
1288 default value is 1001.
1289
1290
1291 MaxJobCount
1292 The maximum number of jobs Slurm can have in its active database
1293 at one time. Set the values of MaxJobCount and MinJobAge to
1294 ensure the slurmctld daemon does not exhaust its memory or other
1295 resources. Once this limit is reached, requests to submit addi‐
1296 tional jobs will fail. The default value is 10000 jobs. NOTE:
1297 Each task of a job array counts as one job even though they will
1298 not occupy separate job records until modified or initiated.
1299 Performance can suffer with more than a few hundred thousand
1300              jobs. Setting MaxSubmitJobs per user is generally valuable
1301 to prevent a single user from filling the system with jobs.
1302 This is accomplished using Slurm's database and configuring
1303 enforcement of resource limits. This value may not be reset via
1304 "scontrol reconfig". It only takes effect upon restart of the
1305 slurmctld daemon.
1306
1307
1308 MaxJobId
1309              The maximum job id to be used for jobs submitted to Slurm
1310              without a specific requested value. Job ids are unsigned
1311              32-bit integers with the first 26 bits reserved for local job
1312              ids and the remaining 6 bits reserved for a cluster id to
1313              identify a federated job's origin. The maximum allowed local job id is
1314 67,108,863 (0x3FFFFFF). The default value is 67,043,328
1315 (0x03ff0000). MaxJobId only applies to the local job id and not
1316 the federated job id. Job id values generated will be incre‐
1317 mented by 1 for each subsequent job. Once MaxJobId is reached,
1318 the next job will be assigned FirstJobId. Federated jobs will
1319 always have a job ID of 67,108,865 or higher. Also see FirstJo‐
1320 bId.
1321
1322
1323 MaxMemPerCPU
1324 Maximum real memory size available per allocated CPU in
1325 megabytes. Used to avoid over-subscribing memory and causing
1326 paging. MaxMemPerCPU would generally be used if individual pro‐
1327 cessors are allocated to jobs (SelectType=select/cons_res). The
1328 default value is 0 (unlimited). Also see DefMemPerCPU and
1329 MaxMemPerNode. MaxMemPerCPU and MaxMemPerNode are mutually
1330 exclusive.
1331
1332 NOTE: If a job specifies a memory per CPU limit that exceeds
1333 this system limit, that job's count of CPUs per task will auto‐
1334 matically be increased. This may result in the job failing due
1335 to CPU count limits.
1336
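The NOTE above can be sketched with hypothetical values:

```
# Cap memory at 2048 MB per allocated CPU; a job requesting
# --mem-per-cpu=4096 would have its CPUs per task raised to compensate,
# which may then hit CPU count limits.
MaxMemPerCPU=2048
```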
1337
1338 MaxMemPerNode
1339 Maximum real memory size available per allocated node in
1340 megabytes. Used to avoid over-subscribing memory and causing
1341 paging. MaxMemPerNode would generally be used if whole nodes
1342 are allocated to jobs (SelectType=select/linear) and resources
1343 are over-subscribed (OverSubscribe=yes or OverSubscribe=force).
1344 The default value is 0 (unlimited). Also see DefMemPerNode and
1345 MaxMemPerCPU. MaxMemPerCPU and MaxMemPerNode are mutually
1346 exclusive.
1347
1348
1349 MaxStepCount
1350 The maximum number of steps that any job can initiate. This
1351 parameter is intended to limit the effect of bad batch scripts.
1352 The default value is 40000 steps.
1353
1354
1355 MaxTasksPerNode
1356 Maximum number of tasks Slurm will allow a job step to spawn on
1357 a single node. The default MaxTasksPerNode is 512. May not
1358 exceed 65533.
1359
1360
1361 MCSParameters
1362              MCS = Multi-Category Security plugin parameters. The
1363              supported parameters are specific to the MCSPlugin. Changes to
1364 this value take effect when the Slurm daemons are reconfigured.
1365 More information about MCS is available here
1366 <https://slurm.schedmd.com/mcs.html>.
1367
1368
1369 MCSPlugin
1370              MCS = Multi-Category Security: associate a security label to
1371 jobs and ensure that nodes can only be shared among jobs using
1372 the same security label. Acceptable values include:
1373
1374 mcs/none is the default value. No security label associated
1375 with jobs, no particular security restriction when
1376 sharing nodes among jobs.
1377
1378 mcs/account only users with the same account can share the nodes
1379 (requires enabling of accounting).
1380
1381 mcs/group only users with the same group can share the nodes.
1382
1383 mcs/user a node cannot be shared with other users.
1384
1385
1386 MemLimitEnforce
1387 If set to yes then Slurm will terminate the job if it exceeds
1388 the value requested using the --mem-per-cpu option of sal‐
1389 loc/sbatch/srun. This is useful in combination with JobAcct‐
1390 GatherParams=OverMemoryKill. Used when jobs need to specify
1391 --mem-per-cpu for scheduling and they should be terminated if
1392 they exceed the estimated value. The default value is 'no',
1393 which disables this enforcing mechanism. NOTE: It is recom‐
1394 mended to limit memory by enabling task/cgroup in TaskPlugin and
1395              making use of ConstrainRAMSpace=yes in cgroup.conf instead of
1396              using this JobAcctGather mechanism for memory enforcement,
1397              since the latter has a lower resolution (JobAcctGatherFreq)
1398              and OOMs could happen at some point.
1399
1400
1401 MessageTimeout
1402 Time permitted for a round-trip communication to complete in
1403 seconds. Default value is 10 seconds. For systems with shared
1404 nodes, the slurmd daemon could be paged out and necessitate
1405 higher values.
1406
1407
1408 MinJobAge
1409 The minimum age of a completed job before its record is purged
1410              from Slurm's active database. Set the values of MaxJobCount
1411              and MinJobAge to ensure the slurmctld daemon does not exhaust
1412              its memory or other resources. The default value is 300
1413              seconds. A value of zero prevents any job record purging. In
1414              order to eliminate some possible race conditions, the
1415              recommended minimum non-zero value for MinJobAge is 2.
1416
1417
1418 MpiDefault
1419 Identifies the default type of MPI to be used. Srun may over‐
1420 ride this configuration parameter in any case. Currently sup‐
1421 ported versions include: openmpi, pmi2, pmix, and none (default,
1422 which works for many other versions of MPI). More information
1423 about MPI use is available here
1424 <https://slurm.schedmd.com/mpi_guide.html>.
1425
1426
1427 MpiParams
1428 MPI parameters. Used to identify ports used by older versions
1429 of OpenMPI and native Cray systems. The input format is
1430 "ports=12000-12999" to identify a range of communication ports
1431 to be used. NOTE: This is not needed for modern versions of
1432              OpenMPI; removing it can provide a small boost in scheduling
1433              performance. NOTE: This is required for Cray's PMI.
1434
1435 MsgAggregationParams
1436 Message aggregation parameters. Message aggregation is an
1437 optional feature that may improve system performance by reducing
1438 the number of separate messages passed between nodes. The fea‐
1439 ture works by routing messages through one or more message col‐
1440 lector nodes between their source and destination nodes. At each
1441 collector node, messages with the same destination received dur‐
1442 ing a defined message collection window are packaged into a sin‐
1443 gle composite message. When the window expires, the composite
1444 message is sent to the next collector node on the route to its
1445 destination. The route between each source and destination node
1446 is provided by the Route plugin. When a composite message is
1447 received at its destination node, the original messages are
1448 extracted and processed as if they had been sent directly.
1449 Currently, the only message types supported by message aggrega‐
1450 tion are the node registration, batch script completion, step
1451 completion, and epilog complete messages.
1452 The format for this parameter is as follows:
1453
1454 MsgAggregationParams=<option>=<value>
1455 where <option>=<value> specify a particular control
1456 variable. Multiple, comma-separated <option>=<value>
1457 pairs may be specified. Supported options are as
1458 follows:
1459
1460 WindowMsgs=<number>
1461 where <number> is the maximum number of mes‐
1462 sages in each message collection window.
1463
1464 WindowTime=<time>
1465 where <time> is the maximum elapsed time in
1466 milliseconds of each message collection win‐
1467 dow.
1468
1469              A window expires when either WindowMsgs or WindowTime is
1470              reached. By default, message aggregation is disabled. To enable
1472 the feature, set WindowMsgs to a value greater than 1. The
1473 default value for WindowTime is 100 milliseconds.
1474
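A sketch enabling aggregation with the window options described above:

```
# Enable aggregation: a collection window closes after 10 messages or
# 100 milliseconds, whichever comes first.
MsgAggregationParams=WindowMsgs=10,WindowTime=100
```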
1475
1476 OverTimeLimit
1477 Number of minutes by which a job can exceed its time limit
1478 before being canceled. Normally a job's time limit is treated
1479 as a hard limit and the job will be killed upon reaching that
1480 limit. Configuring OverTimeLimit will result in the job's time
1481 limit being treated like a soft limit. Adding the OverTimeLimit
1482 value to the soft time limit provides a hard time limit, at
1483 which point the job is canceled. This is particularly useful
1484              for backfill scheduling, which is based upon each job's soft
1485              time limit. The default value is zero. May not exceed 65533
1486 minutes. A value of "UNLIMITED" is also supported.
1487
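A minimal sketch of the soft-limit behavior described above:

```
# Let jobs run 10 minutes past their soft time limit before being
# canceled; backfill still schedules against the soft limit.
OverTimeLimit=10
```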
1488
1489 PluginDir
1490 Identifies the places in which to look for Slurm plugins. This
1491 is a colon-separated list of directories, like the PATH environ‐
1492 ment variable. The default value is "/usr/local/lib/slurm".
1493
1494
1495 PlugStackConfig
1496 Location of the config file for Slurm stackable plugins that use
1497 the Stackable Plugin Architecture for Node job (K)control
1498 (SPANK). This provides support for a highly configurable set of
1499 plugins to be called before and/or after execution of each task
1500 spawned as part of a user's job step. Default location is
1501 "plugstack.conf" in the same directory as the system slurm.conf.
1502 For more information on SPANK plugins, see the spank(8) manual.
1503
1504
1505 PowerParameters
1506 System power management parameters. The supported parameters
1507 are specific to the PowerPlugin. Changes to this value take
1508 effect when the Slurm daemons are reconfigured. More informa‐
1509 tion about system power management is available here
1510              <https://slurm.schedmd.com/power_mgmt.html>. Options currently
1511              supported by any plugins are listed below.
1512
1513 balance_interval=#
1514 Specifies the time interval, in seconds, between attempts
1515 to rebalance power caps across the nodes. This also con‐
1516 trols the frequency at which Slurm attempts to collect
1517 current power consumption data (old data may be used
1518 until new data is available from the underlying infra‐
1519 structure and values below 10 seconds are not recommended
1520 for Cray systems). The default value is 30 seconds.
1521 Supported by the power/cray plugin.
1522
1523 capmc_path=
1524 Specifies the absolute path of the capmc command. The
1525 default value is "/opt/cray/capmc/default/bin/capmc".
1526 Supported by the power/cray plugin.
1527
1528 cap_watts=#
1529 Specifies the total power limit to be established across
1530 all compute nodes managed by Slurm. A value of 0 sets
1531 every compute node to have an unlimited cap. The default
1532 value is 0. Supported by the power/cray plugin.
1533
1534 decrease_rate=#
1535 Specifies the maximum rate of change in the power cap for
1536 a node where the actual power usage is below the power
1537 cap by an amount greater than lower_threshold (see
1538 below). Value represents a percentage of the difference
1539 between a node's minimum and maximum power consumption.
1540 The default value is 50 percent. Supported by the
1541 power/cray plugin.
1542
1543 get_timeout=#
1544 Amount of time allowed to get power state information in
1545 milliseconds. The default value is 5,000 milliseconds or
1546 5 seconds. Supported by the power/cray plugin and repre‐
1547 sents the time allowed for the capmc command to respond
1548 to various "get" options.
1549
1550 increase_rate=#
1551 Specifies the maximum rate of change in the power cap for
1552 a node where the actual power usage is within
1553 upper_threshold (see below) of the power cap. Value rep‐
1554 resents a percentage of the difference between a node's
1555 minimum and maximum power consumption. The default value
1556 is 20 percent. Supported by the power/cray plugin.
1557
1558 job_level
1559 All nodes associated with every job will have the same
1560 power cap, to the extent possible. Also see the
1561 --power=level option on the job submission commands.
1562
1563 job_no_level
1564 Disable the user's ability to set every node associated
1565                     with a job to the same power cap.  Each node will have
1566                     its power cap set independently.  This disables the
1567 --power=level option on the job submission commands.
1568
1569 lower_threshold=#
1570 Specify a lower power consumption threshold. If a node's
1571 current power consumption is below this percentage of its
1572 current cap, then its power cap will be reduced. The
1573 default value is 90 percent. Supported by the power/cray
1574 plugin.
1575
1576 recent_job=#
1577 If a job has started or resumed execution (from suspend)
1578 on a compute node within this number of seconds from the
1579 current time, the node's power cap will be increased to
1580 the maximum. The default value is 300 seconds. Sup‐
1581 ported by the power/cray plugin.
1582
1583 set_timeout=#
1584 Amount of time allowed to set power state information in
1585 milliseconds. The default value is 30,000 milliseconds
1586 or 30 seconds. Supported by the power/cray plugin and
1587 represents the time allowed for the capmc command to
1588 respond to various "set" options.
1589
1590 set_watts=#
1591                     Specifies the power limit to be set on every compute
1592                     node managed by Slurm.  Every node gets this same power
1593 cap and there is no variation through time based upon
1594 actual power usage on the node. Supported by the
1595 power/cray plugin.
1596
1597 upper_threshold=#
1598 Specify an upper power consumption threshold. If a
1599 node's current power consumption is above this percentage
1600 of its current cap, then its power cap will be increased
1601 to the extent possible. The default value is 95 percent.
1602 Supported by the power/cray plugin.
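
              As an illustrative sketch for the power/cray plugin (the
              values below are examples only, not recommendations),
              several of the options above may be combined in a single
              line:

                   PowerParameters=balance_interval=60,cap_watts=500000,lower_threshold=85,upper_threshold=98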
1603
1604
1605 PowerPlugin
1606 Identifies the plugin used for system power management. Cur‐
1607 rently supported plugins include: cray and none. Changes to
1608 this value require restarting Slurm daemons to take effect.
1609 More information about system power management is available here
1610 <https://slurm.schedmd.com/power_mgmt.html>. By default, no
1611 power plugin is loaded.
1612
1613
1614 PreemptMode
1615 Enables gang scheduling and/or controls the mechanism used to
1616 preempt jobs. When the PreemptType parameter is set to enable
1617 preemption, the PreemptMode selects the default mechanism used
1618 to preempt the lower priority jobs for the cluster. PreemptMode
1619 may be specified on a per partition basis to override this
1620 default value if PreemptType=preempt/partition_prio, but a valid
1621 default PreemptMode value must be specified for the cluster as a
1622 whole when preemption is enabled. The GANG option is used to
1623 enable gang scheduling independent of whether preemption is
1624 enabled (the PreemptType setting). The GANG option can be spec‐
1625 ified in addition to a PreemptMode setting with the two options
1626              comma separated.  The SUSPEND option requires that gang schedul‐
1627              ing be enabled (i.e. "PreemptMode=SUSPEND,GANG").  NOTE: For
1628 performance reasons, the backfill scheduler reserves whole nodes
1629 for jobs, not partial nodes. If during backfill scheduling a job
1630 preempts one or more other jobs, the whole nodes for those pre‐
1631 empted jobs are reserved for the preemptor job, even if the pre‐
1632 emptor job requested fewer resources than that. These reserved
1633 nodes aren't available to other jobs during that backfill cycle,
1634 even if the other jobs could fit on the nodes. Therefore, jobs
1635 may preempt more resources during a single backfill iteration
1636 than they requested.
1637
1638 OFF is the default value and disables job preemption and
1639 gang scheduling.
1640
1641              CANCEL      always cancels the job.
1642
1643 CHECKPOINT preempts jobs by checkpointing them (if possible) or
1644 canceling them.
1645
1646 GANG enables gang scheduling (time slicing) of jobs in
1647 the same partition. NOTE: Gang scheduling is per‐
1648 formed independently for each partition, so config‐
1649 uring partitions with overlapping nodes and gang
1650 scheduling is generally not recommended.
1651
1652 REQUEUE preempts jobs by requeuing them (if possible) or
1653 canceling them. For jobs to be requeued they must
1654 have the --requeue sbatch option set or the cluster
1655 wide JobRequeue parameter in slurm.conf must be set
1656 to one.
1657
1658 SUSPEND If PreemptType=preempt/partition_prio is configured
1659 then suspend and automatically resume the low prior‐
1660 ity jobs. If PreemptType=preempt/qos is configured,
1661 then the jobs sharing resources will always time
1662 slice rather than one job remaining suspended. The
1663 SUSPEND may only be used with the GANG option (the
1664 gang scheduler module performs the job resume opera‐
1665 tion).
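
              For example, a cluster preempting by partition priority
              with gang-scheduled suspend and resume might use:

                   PreemptType=preempt/partition_prio
                   PreemptMode=SUSPEND,GANG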
1666
1667
1668 PreemptType
1669 This specifies the plugin used to identify which jobs can be
1670 preempted in order to start a pending job.
1671
1672 preempt/none
1673 Job preemption is disabled. This is the default.
1674
1675 preempt/partition_prio
1676 Job preemption is based upon partition priority tier.
1677 Jobs in higher priority partitions (queues) may preempt
1678 jobs from lower priority partitions. This is not compat‐
1679 ible with PreemptMode=OFF.
1680
1681 preempt/qos
1682 Job preemption rules are specified by Quality Of Service
1683 (QOS) specifications in the Slurm database. This option
1684 is not compatible with PreemptMode=OFF. A configuration
1685 of PreemptMode=SUSPEND is only supported by the
1686 select/cons_res plugin.
1687
1688
1689 PriorityDecayHalfLife
1690 This controls how long prior resource use is considered in
1691 determining how over- or under-serviced an association is (user,
1692 bank account and cluster) in determining job priority. The
1693 record of usage will be decayed over time, with half of the
1694 original value cleared at age PriorityDecayHalfLife. If set to
1695 0 no decay will be applied. This is helpful if you want to
1696 enforce hard time limits per association. If set to 0 Priori‐
1697 tyUsageResetPeriod must be set to some interval. Applicable
1698 only if PriorityType=priority/multifactor. The unit is a time
1699 string (i.e. min, hr:min:00, days-hr:min:00, or days-hr). The
1700 default value is 7-0 (7 days).
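
              For example, to have half of the recorded usage decay away
              every 14 days:

                   PriorityDecayHalfLife=14-0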
1701
1702
1703 PriorityCalcPeriod
1704 The period of time in minutes in which the half-life decay will
1705 be re-calculated. Applicable only if PriorityType=priority/mul‐
1706 tifactor. The default value is 5 (minutes).
1707
1708
1709 PriorityFavorSmall
1710 Specifies that small jobs should be given preferential schedul‐
1711 ing priority. Applicable only if PriorityType=priority/multi‐
1712 factor. Supported values are "YES" and "NO". The default value
1713 is "NO".
1714
1715
1716 PriorityFlags
1717              Flags to modify priority behavior.  Applicable only if Priority‐
1718              Type=priority/multifactor.  The keywords below have no associ‐
1719 ated value (e.g. "PriorityFlags=ACCRUE_ALWAYS,SMALL_RELA‐
1720 TIVE_TO_TIME").
1721
1722 ACCRUE_ALWAYS If set, priority age factor will be increased
1723 despite job dependencies or holds.
1724
1725 CALCULATE_RUNNING
1726 If set, priorities will be recalculated not
1727 only for pending jobs, but also running and
1728 suspended jobs.
1729
1730              DEPTH_OBLIVIOUS If set, priority will be calculated similarly
1731                              to the normal multifactor calculation, but the
1732                              depth of the associations in the tree does not
1733                              adversely affect their priority.  This option
1734                              precludes the use of FAIR_TREE.
1735
1736 FAIR_TREE If set, priority will be calculated in such a
1737 way that if accounts A and B are siblings and A
1738 has a higher fairshare factor than B, all chil‐
1739 dren of A will have higher fairshare factors
1740 than all children of B.
1741
1742 INCR_ONLY If set, priority values will only increase in
1743 value. Job priority will never decrease in
1744 value.
1745
1746 MAX_TRES If set, the weighted TRES value (e.g. TRES‐
1747 BillingWeights) is calculated as the MAX of
1748 individual TRES' on a node (e.g. cpus, mem,
1749 gres) plus the sum of all global TRES' (e.g.
1750 licenses).
1751
1752 SMALL_RELATIVE_TO_TIME
1753                              If set, the job's size component will be based
1754                              not upon the job size alone, but upon the job's
1755                              size divided by its time limit.
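
              For example, to combine the fair-tree algorithm with
              recalculation of running and suspended job priorities (an
              illustrative combination):

                   PriorityFlags=FAIR_TREE,CALCULATE_RUNNING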
1756
1757
1758 PriorityParameters
1759 Arbitrary string used by the PriorityType plugin.
1760
1761
1762 PriorityMaxAge
1763 Specifies the job age which will be given the maximum age factor
1764 in computing priority. For example, a value of 30 minutes would
1765              result in all jobs over 30 minutes old getting the same
1766 age-based priority. Applicable only if PriorityType=prior‐
1767 ity/multifactor. The unit is a time string (i.e. min,
1768 hr:min:00, days-hr:min:00, or days-hr). The default value is
1769 7-0 (7 days).
1770
1771
1772 PriorityUsageResetPeriod
1773 At this interval the usage of associations will be reset to 0.
1774 This is used if you want to enforce hard limits of time usage
1775 per association. If PriorityDecayHalfLife is set to be 0 no
1776 decay will happen and this is the only way to reset the usage
1777              accumulated by running jobs.  By default this is turned off, and
1778              it is advised to use the PriorityDecayHalfLife option instead to
1779              avoid reaching a state where nothing can run on your cluster;
1780              however, if your scheme is set up to only allow certain amounts
1781              of time on your system, this is the way to enforce it.
1782              Applicable only if PriorityType=priority/multifactor.
1783
1784 NONE Never clear historic usage. The default value.
1785
1786 NOW Clear the historic usage now. Executed at startup
1787 and reconfiguration time.
1788
1789 DAILY Cleared every day at midnight.
1790
1791 WEEKLY Cleared every week on Sunday at time 00:00.
1792
1793 MONTHLY Cleared on the first day of each month at time
1794 00:00.
1795
1796 QUARTERLY Cleared on the first day of each quarter at time
1797 00:00.
1798
1799 YEARLY Cleared on the first day of each year at time 00:00.
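
              For example, to enforce hard monthly usage limits with no
              decay between resets:

                   PriorityDecayHalfLife=0
                   PriorityUsageResetPeriod=MONTHLY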
1800
1801
1802 PriorityType
1803 This specifies the plugin to be used in establishing a job's
1804 scheduling priority. Supported values are "priority/basic" (jobs
1805 are prioritized by order of arrival), "priority/multifactor"
1806 (jobs are prioritized based upon size, age, fair-share of allo‐
1807 cation, etc). Also see PriorityFlags for configuration options.
1808 The default value is "priority/basic".
1809
1810              When not using FIFO scheduling, jobs are prioritized in the
1811              following order:
1812
1813 1. Jobs that can preempt
1814
1815 2. Jobs with an advanced reservation
1816
1817 3. Partition Priority Tier
1818
1819 4. Job Priority
1820
1821 5. Job Id
1822
1823
1824
1825 PriorityWeightAge
1826 An integer value that sets the degree to which the queue wait
1827 time component contributes to the job's priority. Applicable
1828 only if PriorityType=priority/multifactor. The default value is
1829 0.
1830
1831
1832 PriorityWeightFairshare
1833 An integer value that sets the degree to which the fair-share
1834 component contributes to the job's priority. Applicable only if
1835 PriorityType=priority/multifactor. The default value is 0.
1836
1837
1838 PriorityWeightJobSize
1839 An integer value that sets the degree to which the job size com‐
1840 ponent contributes to the job's priority. Applicable only if
1841 PriorityType=priority/multifactor. The default value is 0.
1842
1843
1844 PriorityWeightPartition
1845 Partition factor used by priority/multifactor plugin in calcu‐
1846 lating job priority. Applicable only if PriorityType=prior‐
1847 ity/multifactor. The default value is 0.
1848
1849
1850 PriorityWeightQOS
1851 An integer value that sets the degree to which the Quality Of
1852 Service component contributes to the job's priority. Applicable
1853 only if PriorityType=priority/multifactor. The default value is
1854 0.
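
              The weight parameters above are typically configured
              together.  As an illustrative sketch (the values are
              examples only, not recommendations):

                   PriorityType=priority/multifactor
                   PriorityWeightAge=1000
                   PriorityWeightFairshare=10000
                   PriorityWeightJobSize=1000
                   PriorityWeightPartition=1000
                   PriorityWeightQOS=2000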
1855
1856
1857 PriorityWeightTRES
1858 A comma separated list of TRES Types and weights that sets the
1859 degree that each TRES Type contributes to the job's priority.
1860
1861 e.g.
1862 PriorityWeightTRES=CPU=1000,Mem=2000,GRES/gpu=3000
1863
1864 Applicable only if PriorityType=priority/multifactor and if
1865 AccountingStorageTRES is configured with each TRES Type. Nega‐
1866 tive values are allowed. The default values are 0.
1867
1868
1869 PrivateData
1870 This controls what type of information is hidden from regular
1871 users. By default, all information is visible to all users.
1872 User SlurmUser and root can always view all information. Multi‐
1873 ple values may be specified with a comma separator. Acceptable
1874 values include:
1875
1876 accounts
1877 (NON-SlurmDBD ACCOUNTING ONLY) Prevents users from view‐
1878 ing any account definitions unless they are coordinators
1879 of them.
1880
1881 cloud Powered down nodes in the cloud are visible.
1882
1883              events Prevents users from viewing event information unless they
1884 have operator status or above.
1885
1886 jobs Prevents users from viewing jobs or job steps belonging
1887 to other users. (NON-SlurmDBD ACCOUNTING ONLY) Prevents
1888 users from viewing job records belonging to other users
1889 unless they are coordinators of the association running
1890 the job when using sacct.
1891
1892 nodes Prevents users from viewing node state information.
1893
1894 partitions
1895 Prevents users from viewing partition state information.
1896
1897 reservations
1898 Prevents regular users from viewing reservations which
1899 they can not use.
1900
1901              usage  Prevents users from viewing usage of any other user;
1902                     this applies to sshare.  (NON-SlurmDBD ACCOUNTING ONLY)
1903                     Prevents users from viewing usage of any other user;
1904                     this applies to sreport.
1905
1906 users (NON-SlurmDBD ACCOUNTING ONLY) Prevents users from view‐
1907 ing information of any user other than themselves, this
1908 also makes it so users can only see associations they
1909 deal with. Coordinators can see associations of all
1910 users they are coordinator of, but can only see them‐
1911 selves when listing users.
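
              For example, to hide other users' jobs, usage and user
              information from regular users:

                   PrivateData=jobs,usage,users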
1912
1913
1914 ProctrackType
1915 Identifies the plugin to be used for process tracking on a job
1916 step basis. The slurmd daemon uses this mechanism to identify
1917 all processes which are children of processes it spawns for a
1918 user job step. The slurmd daemon must be restarted for a change
1919 in ProctrackType to take effect. NOTE: "proctrack/linuxproc"
1920 and "proctrack/pgid" can fail to identify all processes associ‐
1921 ated with a job since processes can become a child of the init
1922 process (when the parent process terminates) or change their
1923 process group. To reliably track all processes, "proc‐
1924 track/cgroup" is highly recommended. NOTE: The JobContainerType
1925 applies to a job allocation, while ProctrackType applies to job
1926 steps. Acceptable values at present include:
1927
1928 proctrack/cgroup which uses linux cgroups to constrain and
1929 track processes, and is the default. NOTE:
1930 see "man cgroup.conf" for configuration
1931 details
1932
1933 proctrack/cray which uses Cray proprietary process tracking
1934
1935 proctrack/linuxproc which uses linux process tree using parent
1936 process IDs.
1937
1938 proctrack/lua which uses a site-specific LUA script to
1939 track processes
1940
1941 proctrack/sgi_job which uses SGI's Process Aggregates (PAGG)
1942 kernel module, see
1943 http://oss.sgi.com/projects/pagg/ for more
1944 information
1945
1946 proctrack/pgid which uses process group IDs
1947
1948
1949 Prolog Fully qualified pathname of a program for the slurmd to execute
1950 whenever it is asked to run a job step from a new job allocation
1951              (e.g. "/usr/local/slurm/prolog").  A glob pattern (see glob(7))
1952 may also be used to specify more than one program to run (e.g.
1953 "/etc/slurm/prolog.d/*"). The slurmd executes the prolog before
1954 starting the first job step. The prolog script or scripts may
1955 be used to purge files, enable user login, etc. By default
1956 there is no prolog. Any configured script is expected to com‐
1957 plete execution quickly (in less time than MessageTimeout). If
1958 the prolog fails (returns a non-zero exit code), this will
1959 result in the node being set to a DRAIN state and the job being
1960 requeued in a held state, unless nohold_on_prolog_fail is con‐
1961 figured in SchedulerParameters. See Prolog and Epilog Scripts
1962 for more information.
1963
1964
1965 PrologEpilogTimeout
1966              The interval in seconds Slurm waits for Prolog and Epilog
1967 before terminating them. The default behavior is to wait indefi‐
1968 nitely. This interval applies to the Prolog and Epilog run by
1969 slurmd daemon before and after the job, the PrologSlurmctld and
1970 EpilogSlurmctld run by slurmctld daemon, and the SPANK plugins
1971 run by the slurmstepd daemon.
1972
1973
1974 PrologFlags
1975 Flags to control the Prolog behavior. By default no flags are
1976 set. Multiple flags may be specified in a comma-separated list.
1977 Currently supported options are:
1978
1979 Alloc If set, the Prolog script will be executed at job allo‐
1980 cation. By default, Prolog is executed just before the
1981 task is launched. Therefore, when salloc is started, no
1982 Prolog is executed. Alloc is useful for preparing things
1983 before a user starts to use any allocated resources. In
1984 particular, this flag is needed on a Cray system when
1985 cluster compatibility mode is enabled.
1986
1987 NOTE: Use of the Alloc flag will increase the time
1988 required to start jobs.
1989
1990 Contain At job allocation time, use the ProcTrack plugin to cre‐
1991 ate a job container on all allocated compute nodes.
1992 This container may be used for user processes not
1993                      launched under Slurm control; for example, the PAM module
1994                      may place processes launched through a direct user login
1995                      into this container.  Setting the Contain flag implicitly
1996 sets the Alloc flag. You must set ProctrackType=proc‐
1997 track/cgroup when using the Contain flag.
1998
1999 NoHold If set, the Alloc flag should also be set. This will
2000 allow for salloc to not block until the prolog is fin‐
2001 ished on each node. The blocking will happen when steps
2002 reach the slurmd and before any execution has happened
2003 in the step. This is a much faster way to work and if
2004 using srun to launch your tasks you should use this
2005 flag. This flag cannot be combined with the Contain or
2006 X11 flags.
2007
2008 Serial By default, the Prolog and Epilog scripts run concur‐
2009 rently on each node. This flag forces those scripts to
2010 run serially within each node, but with a significant
2011 penalty to job throughput on each node.
2012
2013 X11 Enable Slurm's built-in X11 forwarding capabilities.
2014 Slurm must have been compiled with libssh2 support
2015                      enabled, and either SSH hostkey authentication or
2016                      per-user SSH key authentication must be enabled within the
2017 cluster. Only RSA keys are supported at this time. Set‐
2018 ting the X11 flag implicitly enables both Contain and
2019 Alloc flags as well.
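
              For example, to run the Prolog at allocation time and
              create a job container on all allocated nodes (Contain
              requires the cgroup process tracking plugin and implies
              Alloc):

                   ProctrackType=proctrack/cgroup
                   PrologFlags=Contain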
2020
2021
2022 PrologSlurmctld
2023 Fully qualified pathname of a program for the slurmctld daemon
2024 to execute before granting a new job allocation (e.g.
2025 "/usr/local/slurm/prolog_controller"). The program executes as
2026 SlurmUser on the same node where the slurmctld daemon executes,
2027 giving it permission to drain nodes and requeue the job if a
2028 failure occurs or cancel the job if appropriate. The program
2029 can be used to reboot nodes or perform other work to prepare
2030 resources for use. Exactly what the program does and how it
2031 accomplishes this is completely at the discretion of the system
2032              administrator.  Information about the job being initiated, its
2033              allocated nodes, etc. are passed to the program using environ‐
2034              ment variables.  While this program is running, the nodes asso‐
2035              ciated with the job will have a POWER_UP/CONFIGURING flag set
2036 in their state, which can be readily viewed. The slurmctld dae‐
2037 mon will wait indefinitely for this program to complete. Once
2038 the program completes with an exit code of zero, the nodes will
2039              be considered ready for use and the job will be started.  If
2040 some node can not be made available for use, the program should
2041 drain the node (typically using the scontrol command) and termi‐
2042 nate with a non-zero exit code. A non-zero exit code will
2043 result in the job being requeued (where possible) or killed.
2044 Note that only batch jobs can be requeued. See Prolog and Epi‐
2045 log Scripts for more information.
2046
2047
2048 PropagatePrioProcess
2049 Controls the scheduling priority (nice value) of user spawned
2050 tasks.
2051
2052 0 The tasks will inherit the scheduling priority from the
2053 slurm daemon. This is the default value.
2054
2055 1 The tasks will inherit the scheduling priority of the com‐
2056 mand used to submit them (e.g. srun or sbatch). Unless the
2057 job is submitted by user root, the tasks will have a sched‐
2058 uling priority no higher than the slurm daemon spawning
2059 them.
2060
2061 2 The tasks will inherit the scheduling priority of the com‐
2062 mand used to submit them (e.g. srun or sbatch) with the
2063 restriction that their nice value will always be one higher
2064 than the slurm daemon (i.e. the tasks scheduling priority
2065 will be lower than the slurm daemon).
2066
2067
2068 PropagateResourceLimits
2069 A list of comma separated resource limit names. The slurmd dae‐
2070 mon uses these names to obtain the associated (soft) limit val‐
2071 ues from the user's process environment on the submit node.
2072 These limits are then propagated and applied to the jobs that
2073 will run on the compute nodes. This parameter can be useful
2074 when system limits vary among nodes. Any resource limits that
2075 do not appear in the list are not propagated. However, the user
2076 can override this by specifying which resource limits to propa‐
2077 gate with the sbatch or srun "--propagate" option. If neither
2078              PropagateResourceLimits nor PropagateResourceLimitsExcept is
2079              configured and the "--propagate" option is not specified, then
2080 the default action is to propagate all limits. Only one of the
2081 parameters, either PropagateResourceLimits or PropagateResource‐
2082 LimitsExcept, may be specified. The user limits can not exceed
2083 hard limits under which the slurmd daemon operates. If the user
2084 limits are not propagated, the limits from the slurmd daemon
2085 will be propagated to the user's job. The limits used for the
2086              Slurm daemons can be set in the /etc/sysconfig/slurm file.
2087              For more information, see https://slurm.schedmd.com/faq.html#memlock.
2088              The following limit names are supported by Slurm (although
2089 some options may not be supported on some systems):
2090
2091 ALL All limits listed below (default)
2092
2093 NONE No limits listed below
2094
2095 AS The maximum address space for a process
2096
2097 CORE The maximum size of core file
2098
2099 CPU The maximum amount of CPU time
2100
2101 DATA The maximum size of a process's data segment
2102
2103 FSIZE The maximum size of files created. Note that if the
2104 user sets FSIZE to less than the current size of the
2105 slurmd.log, job launches will fail with a 'File size
2106 limit exceeded' error.
2107
2108 MEMLOCK The maximum size that may be locked into memory
2109
2110 NOFILE The maximum number of open files
2111
2112 NPROC The maximum number of processes available
2113
2114 RSS The maximum resident set size
2115
2116 STACK The maximum stack size
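
              For example, to propagate only the locked-memory and
              open-file limits from the submit node:

                   PropagateResourceLimits=MEMLOCK,NOFILE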
2117
2118
2119 PropagateResourceLimitsExcept
2120 A list of comma separated resource limit names. By default, all
2121 resource limits will be propagated, (as described by the Propa‐
2122 gateResourceLimits parameter), except for the limits appearing
2123 in this list. The user can override this by specifying which
2124 resource limits to propagate with the sbatch or srun "--propa‐
2125 gate" option. See PropagateResourceLimits above for a list of
2126 valid limit names.
2127
2128
2129 RebootProgram
2130 Program to be executed on each compute node to reboot it.
2131 Invoked on each node once it becomes idle after the command
2132 "scontrol reboot_nodes" is executed by an authorized user or a
2133 job is submitted with the "--reboot" option. After rebooting,
2134 the node is returned to normal use. See ResumeTimeout to con‐
2135 figure the time you expect a reboot to finish in. A node will
2136 be marked DOWN if it doesn't reboot within ResumeTimeout.
2137
2138
2139 ReconfigFlags
2140 Flags to control various actions that may be taken when an
2141 "scontrol reconfig" command is issued. Currently the options
2142 are:
2143
2144 KeepPartInfo If set, an "scontrol reconfig" command will
2145 maintain the in-memory value of partition
2146 "state" and other parameters that may have been
2147 dynamically updated by "scontrol update". Par‐
2148 tition information in the slurm.conf file will
2149 be merged with in-memory data. This flag
2150 supersedes the KeepPartState flag.
2151
2152 KeepPartState If set, an "scontrol reconfig" command will
2153 preserve only the current "state" value of
2154 in-memory partitions and will reset all other
2155 parameters of the partitions that may have been
2156 dynamically updated by "scontrol update" to the
2157 values from the slurm.conf file. Partition
2158 information in the slurm.conf file will be
2159 merged with in-memory data.
2160 The default for the above flags is not set, and the "scontrol
2161 reconfig" will rebuild the partition information using only the
2162 definitions in the slurm.conf file.
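
              For example, to preserve dynamically updated partition
              parameters across a reconfiguration:

                   ReconfigFlags=KeepPartInfo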
2163
2164
2165 RequeueExit
2166 Enables automatic requeue for batch jobs which exit with the
2167              specified values.  Separate multiple exit codes with a comma
2168              and/or specify numeric ranges using a "-" separator (e.g.
2169              "RequeueExit=1-9,18").  Jobs will be put back into pending state and
2170 later scheduled again. Restarted jobs will have the environment
2171 variable SLURM_RESTART_COUNT set to the number of times the job
2172 has been restarted.
2173
2174
2175 RequeueExitHold
2176 Enables automatic requeue for batch jobs which exit with the
2177              specified values, with these jobs being held until released
2178              manually by the user.  Separate multiple exit codes with a
2179              comma and/or specify numeric ranges using a "-" separator
2180              (e.g. "RequeueExitHold=10-12,16").  These jobs are put in the
2181              JOB_SPECIAL_EXIT exit state.  Restarted jobs will have the environment
2182 variable SLURM_RESTART_COUNT set to the number of times the job
2183 has been restarted.
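
              For example, to requeue jobs that exit with code 18 and to
              requeue and hold jobs that exit with codes 10 through 12:

                   RequeueExit=18
                   RequeueExitHold=10-12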
2184
2185
2186 ResumeFailProgram
2187              The program that will be executed when nodes fail to resume
2188 by ResumeTimeout. The argument to the program will be the names
2189 of the failed nodes (using Slurm's hostlist expression format).
2190
2191
2192 ResumeProgram
2193 Slurm supports a mechanism to reduce power consumption on nodes
2194 that remain idle for an extended period of time. This is typi‐
2195 cally accomplished by reducing voltage and frequency or powering
2196 the node down. ResumeProgram is the program that will be exe‐
2197 cuted when a node in power save mode is assigned work to per‐
2198 form. For reasons of reliability, ResumeProgram may execute
2199 more than once for a node when the slurmctld daemon crashes and
2200 is restarted. If ResumeProgram is unable to restore a node to
2201 service with a responding slurmd and an updated BootTime, it
2202 should requeue any job associated with the node and set the node
2203 state to DOWN. If the node isn't actually rebooted (i.e. when
2204 multiple-slurmd is configured) starting slurmd with "-b" option
2205 might be useful. The program executes as SlurmUser. The argu‐
2206 ment to the program will be the names of nodes to be removed
2207 from power savings mode (using Slurm's hostlist expression for‐
2208 mat). By default no program is run. Related configuration
2209 options include ResumeTimeout, ResumeRate, SuspendRate, Suspend‐
2210 Time, SuspendTimeout, SuspendProgram, SuspendExcNodes, and Sus‐
2211 pendExcParts. More information is available at the Slurm web
2212 site ( https://slurm.schedmd.com/power_save.html ).
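
              As an illustrative sketch of a power saving setup (the
              script paths are hypothetical and the values are examples
              only):

                   SuspendTime=600
                   SuspendProgram=/usr/local/sbin/slurm_suspend.sh
                   ResumeProgram=/usr/local/sbin/slurm_resume.sh
                   ResumeTimeout=300
                   ResumeRate=100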
2213
2214
2215 ResumeRate
2216              The rate at which nodes in power save mode are returned to nor‐
2217              mal operation by ResumeProgram.  The value is the number of nodes
2218 per minute and it can be used to prevent power surges if a large
2219 number of nodes in power save mode are assigned work at the same
2220 time (e.g. a large job starts). A value of zero results in no
2221 limits being imposed. The default value is 300 nodes per
2222 minute. Related configuration options include ResumeTimeout,
2223 ResumeProgram, SuspendRate, SuspendTime, SuspendTimeout, Sus‐
2224 pendProgram, SuspendExcNodes, and SuspendExcParts.
2225
2226
2227 ResumeTimeout
2228 Maximum time permitted (in seconds) between when a node resume
2229 request is issued and when the node is actually available for
2230 use. Nodes which fail to respond in this time frame will be
2231 marked DOWN and the jobs scheduled on the node requeued. Nodes
2232 which reboot after this time frame will be marked DOWN with a
2233 reason of "Node unexpectedly rebooted." The default value is 60
2234 seconds. Related configuration options include ResumeProgram,
2235 ResumeRate, SuspendRate, SuspendTime, SuspendTimeout, Suspend‐
2236 Program, SuspendExcNodes and SuspendExcParts. More information
2237 is available at the Slurm web site (
2238 https://slurm.schedmd.com/power_save.html ).
2239
2240
2241 ResvEpilog
2242 Fully qualified pathname of a program for the slurmctld to exe‐
2243 cute when a reservation ends. The program can be used to cancel
2244 jobs, modify partition configuration, etc. The reservation
2245 named will be passed as an argument to the program. By default
2246 there is no epilog.
2247
2248
2249 ResvOverRun
2250 Describes how long a job already running in a reservation should
2251 be permitted to execute after the end time of the reservation
2252 has been reached. The time period is specified in minutes and
2253 the default value is 0 (kill the job immediately). The value
2254 may not exceed 65533 minutes, although a value of "UNLIMITED" is
2255 supported to permit a job to run indefinitely after its reserva‐
2256 tion is terminated.
2257
2258
2259 ResvProlog
2260 Fully qualified pathname of a program for the slurmctld to exe‐
2261 cute when a reservation begins. The program can be used to can‐
2262 cel jobs, modify partition configuration, etc. The reservation
2263 named will be passed as an argument to the program. By default
2264 there is no prolog.
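
       For example, a site could pair the reservation hooks with a grace
       period for running jobs (the script paths are hypothetical):

              ResvProlog=/usr/local/sbin/resv_start.sh
              ResvEpilog=/usr/local/sbin/resv_end.sh
              ResvOverRun=10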
2265
2266
2267 ReturnToService
2268 Controls when a DOWN node will be returned to service. The
2269 default value is 0. Supported values include
2270
2271 0 A node will remain in the DOWN state until a system adminis‐
2272 trator explicitly changes its state (even if the slurmd dae‐
2273 mon registers and resumes communications).
2274
2275 1 A DOWN node will become available for use upon registration
2276 with a valid configuration only if it was set DOWN due to
2277 being non-responsive. If the node was set DOWN for any
2278 other reason (low memory, unexpected reboot, etc.), its
2279 state will not automatically be changed. A node registers
2280 with a valid configuration if its memory, GRES, CPU count,
2281 etc. are equal to or greater than the values configured in
2282 slurm.conf.
2283
2284 2 A DOWN node will become available for use upon registration
2285 with a valid configuration. The node could have been set
2286 DOWN for any reason. A node registers with a valid configu‐
2287 ration if its memory, GRES, CPU count, etc. are equal to or
2288 greater than the values configured in slurm.conf. (Disabled
2289 on Cray ALPS systems.)
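
       For example, to let nodes that were set DOWN only for being
       non-responsive return automatically upon a valid registration:

              ReturnToService=1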
2290
2291
2292 RoutePlugin
2293 Identifies the plugin to be used for defining which nodes will
2294 be used for message forwarding and message aggregation.
2295
2296 route/default
2297 default, use TreeWidth.
2298
2299 route/topology
2300 use the switch hierarchy defined in a topology.conf file.
2301 TopologyPlugin=topology/tree is required.
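
       For example, to forward messages along the switch hierarchy
       (assuming a matching topology.conf has been defined):

              TopologyPlugin=topology/tree
              RoutePlugin=route/topology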
2302
2303
2304 SallocDefaultCommand
2305 Normally, salloc(1) will run the user's default shell when a
2306 command to execute is not specified on the salloc command line.
2307 If SallocDefaultCommand is specified, salloc will instead run
2308 the configured command. The command is passed to '/bin/sh -c',
2309 so shell metacharacters are allowed, and commands with multiple
2310 arguments should be quoted. For instance:
2311
2312 SallocDefaultCommand = "$SHELL"
2313
2315              would run the shell named in the user's $SHELL environment
2316              variable, and
2316
2317 SallocDefaultCommand = "srun -n1 -N1 --mem-per-cpu=0 --pty --preserve-env --mpi=none $SHELL"
2318
2319              would spawn the user's default shell on the allocated
2320 resources, but not consume any of the CPU or memory resources,
2321 configure it as a pseudo-terminal, and preserve all of the job's
2322              environment variables (i.e. not overwrite them with the job
2323 step's allocation information).
2324
2325 For systems with generic resources (GRES) defined, the SallocDe‐
2326 faultCommand value should explicitly specify a zero count for
2327 the configured GRES. Failure to do so will result in the
2328 launched shell consuming those GRES and preventing subsequent
2329 srun commands from using them. For example, on Cray systems add
2330 "--gres=craynetwork:0" as shown below:
2331 SallocDefaultCommand = "srun -n1 -N1 --mem-per-cpu=0 --gres=craynetwork:0 --pty --preserve-env --mpi=none $SHELL"
2332
2333 For systems with TaskPlugin set, adding an option of
2334 "--cpu-bind=no" is recommended if the default shell should have
2335 access to all of the CPUs allocated to the job on that node,
2336 otherwise the shell may be limited to a single cpu or core.
2337
2338
2339 SbcastParameters
2340 Controls sbcast command behavior. Multiple options can be speci‐
2341 fied in a comma separated list. Supported values include:
2342
2343 DestDir= Destination directory for file being broadcast to
2344 allocated compute nodes. Default value is cur‐
2345 rent working directory.
2346
2347 Compression= Specify default file compression library to be
2348 used. Supported values are "lz4", "none" and
2349 "zlib". The default value with the sbcast --com‐
2350 press option is "lz4" and "none" otherwise. Some
2351 compression libraries may be unavailable on some
2352 systems.
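
       For example (the destination directory shown is hypothetical):

              SbcastParameters=DestDir=/tmp,Compression=lz4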
2353
2354
2355 SchedulerParameters
2356 The interpretation of this parameter varies by SchedulerType.
2357 Multiple options may be comma separated.
2358
2359 allow_zero_lic
2360 If set, then job submissions requesting more than config‐
2361 ured licenses won't be rejected.
2362
2363 assoc_limit_stop
2364 If set and a job cannot start due to association limits,
2365 then do not attempt to initiate any lower priority jobs
2366 in that partition. Setting this can decrease system
2367                     throughput and utilization, but avoids potentially starv‐
2368                     ing larger jobs that could otherwise be prevented from
2369                     launching indefinitely.
2370
2371 batch_sched_delay=#
2372 How long, in seconds, the scheduling of batch jobs can be
2373 delayed. This can be useful in a high-throughput envi‐
2374 ronment in which batch jobs are submitted at a very high
2375 rate (i.e. using the sbatch command) and one wishes to
2376 reduce the overhead of attempting to schedule each job at
2377 submit time. The default value is 3 seconds.
2378
2379 bb_array_stage_cnt=#
2380 Number of tasks from a job array that should be available
2381 for burst buffer resource allocation. Higher values will
2382 increase the system overhead as each task from the job
2383                     array will be moved to its own job record in memory, so
2384 relatively small values are generally recommended. The
2385 default value is 10.
2386
2387 bf_busy_nodes
2388 When selecting resources for pending jobs to reserve for
2389 future execution (i.e. the job can not be started immedi‐
2390 ately), then preferentially select nodes that are in use.
2391 This will tend to leave currently idle resources avail‐
2392 able for backfilling longer running jobs, but may result
2393 in allocations having less than optimal network topology.
2394 This option is currently only supported by the
2395 select/cons_res plugin (or select/cray with SelectTypePa‐
2396 rameters set to "OTHER_CONS_RES", which layers the
2397 select/cray plugin over the select/cons_res plugin).
2398
2399 bf_continue
2400 The backfill scheduler periodically releases locks in
2401 order to permit other operations to proceed rather than
2402 blocking all activity for what could be an extended
2403 period of time. Setting this option will cause the back‐
2404 fill scheduler to continue processing pending jobs from
2405 its original job list after releasing locks even if job
2406 or node state changes. This can result in lower priority
2407 jobs being backfill scheduled instead of newly arrived
2408 higher priority jobs, but will permit more queued jobs to
2409 be considered for backfill scheduling.
2410
2411 bf_hetjob_immediate
2412 Instruct the backfill scheduler to attempt to start a
2413 heterogeneous job as soon as all of its components are
2414 determined able to do so. Otherwise, the backfill sched‐
2415 uler will delay heterogeneous jobs initiation attempts
2416 until after the rest of the queue has been processed.
2417 This delay may result in lower priority jobs being allo‐
2418 cated resources, which could delay the initiation of the
2419 heterogeneous job due to account and/or QOS limits being
2420 reached. This option is disabled by default. If enabled
2421                     and bf_hetjob_prio=min is not set, then the latter will
2422                     be set automatically.
2423
2424 bf_hetjob_prio=[min|avg|max]
2425 At the beginning of each backfill scheduling cycle, a
2426                     list of pending jobs to be scheduled is sorted according
2427 to the precedence order configured in PriorityType. This
2428 option instructs the scheduler to alter the sorting algo‐
2429 rithm to ensure that all components belonging to the same
2430 heterogeneous job will be attempted to be scheduled con‐
2431 secutively (thus not fragmented in the resulting list).
2432 More specifically, all components from the same heteroge‐
2433 neous job will be treated as if they all have the same
2434 priority (minimum, average or maximum depending upon this
2435 option's parameter) when compared with other jobs (or
2436 other heterogeneous job components). The original order
2437 will be preserved within the same heterogeneous job. Note
2438 that the operation is calculated for the PriorityTier
2439 layer and for the Priority resulting from the prior‐
2440 ity/multifactor plugin calculations. When enabled, if any
2441 heterogeneous job requested an advanced reservation, then
2442 all of that job's components will be treated as if they
2443 had requested an advanced reservation (and get preferen‐
2444 tial treatment in scheduling).
2445
2446 Note that this operation does not update the Priority
2447 values of the heterogeneous job components, only their
2448 order within the list, so the output of the sprio command
2449                     will not be affected.
2450
2451 Heterogeneous jobs have special scheduling properties:
2452 they are only scheduled by the backfill scheduling plug‐
2453 in, each of their components is considered separately
2454 when reserving resources (and might have different Prior‐
2455 ityTier or different Priority values), and no heteroge‐
2456 neous job component is actually allocated resources until
2457                     all of its components can be initiated. This may imply
2458 potential scheduling deadlock scenarios because compo‐
2459 nents from different heterogeneous jobs can start reserv‐
2460 ing resources in an interleaved fashion (not consecu‐
2461 tively), but none of the jobs can reserve resources for
2462 all components and start. Enabling this option can help
2463 to mitigate this problem. By default, this option is dis‐
2464 abled.
2465
2466 bf_ignore_newly_avail_nodes
2467 If set, then only resources available at the beginning of
2468 a backfill cycle will be considered for use. Otherwise
2469 resources made available during that backfill cycle (dur‐
2470 ing a yield with bf_continue set) may be used for lower
2471 priority jobs, delaying the initiation of higher priority
2472 jobs. Disabled by default.
2473
2474 bf_interval=#
2475 The number of seconds between backfill iterations.
2476                     Higher values result in less overhead and less respon‐
2477 siveness. This option applies only to Scheduler‐
2478 Type=sched/backfill. The default value is 30 seconds.
2479
2480
2481 bf_job_part_count_reserve=#
2482 The backfill scheduling logic will reserve resources for
2483 the specified count of highest priority jobs in each par‐
2484 tition. For example, bf_job_part_count_reserve=10 will
2485 cause the backfill scheduler to reserve resources for the
2486 ten highest priority jobs in each partition. Any lower
2487 priority job that can be started using currently avail‐
2488 able resources and not adversely impact the expected
2489 start time of these higher priority jobs will be started
2490                     by the backfill scheduler.  The default value is zero,
2491 which will reserve resources for any pending job and
2492 delay initiation of lower priority jobs. Also see
2493 bf_min_age_reserve and bf_min_prio_reserve.
2494
2495
2496 bf_max_job_array_resv=#
2497 The maximum number of tasks from a job array for which
2498 the backfill scheduler will reserve resources in the
2499 future. Since job arrays can potentially have millions
2500 of tasks, the overhead in reserving resources for all
2501 tasks can be prohibitive. In addition various limits may
2502 prevent all the jobs from starting at the expected times.
2503 This has no impact upon the number of tasks from a job
2504 array that can be started immediately, only those tasks
2505 expected to start at some future time. The default value
2506 is 20 tasks. NOTE: Jobs submitted to multiple partitions
2507 appear in the job queue once per partition. If different
2508 copies of a single job array record aren't consecutive in
2509 the job queue and another job array record is in between,
2510 then bf_max_job_array_resv tasks are considered per par‐
2511 tition that the job is submitted to.
2512
2513 bf_max_job_assoc=#
2514 The maximum number of jobs per user association to
2515 attempt starting with the backfill scheduler. This set‐
2516 ting is similar to bf_max_job_user but is handy if a user
2517                     has multiple associations equating to basically different
2518 users. One can set this limit to prevent users from
2519 flooding the backfill queue with jobs that cannot start
2520                     and that prevent jobs from other users from starting. The
2521 default value is 0, which means no limit. This option
2522 applies only to SchedulerType=sched/backfill. Also see
2523                     the bf_max_job_user, bf_max_job_part, bf_max_job_test and
2524 bf_max_job_user_part=# options. Set bf_max_job_test to a
2525 value much higher than bf_max_job_assoc.
2526
2527 bf_max_job_part=#
2528 The maximum number of jobs per partition to attempt
2529 starting with the backfill scheduler. This can be espe‐
2530 cially helpful for systems with large numbers of parti‐
2531 tions and jobs. The default value is 0, which means no
2532 limit. This option applies only to Scheduler‐
2533 Type=sched/backfill. Also see the partition_job_depth
2534 and bf_max_job_test options. Set bf_max_job_test to a
2535 value much higher than bf_max_job_part.
2536
2537 bf_max_job_start=#
2538 The maximum number of jobs which can be initiated in a
2539 single iteration of the backfill scheduler. The default
2540 value is 0, which means no limit. This option applies
2541 only to SchedulerType=sched/backfill.
2542
2543 bf_max_job_test=#
2544 The maximum number of jobs to attempt backfill scheduling
2545 for (i.e. the queue depth). Higher values result in more
2546 overhead and less responsiveness. Until an attempt is
2547 made to backfill schedule a job, its expected initiation
2548 time value will not be set. The default value is 100.
2549 In the case of large clusters, configuring a relatively
2550 small value may be desirable. This option applies only
2551 to SchedulerType=sched/backfill.
2552
2553 bf_max_job_user=#
2554 The maximum number of jobs per user to attempt starting
2555 with the backfill scheduler for ALL partitions. One can
2556 set this limit to prevent users from flooding the back‐
2557 fill queue with jobs that cannot start and that prevent
2558                     jobs from other users from starting. This is similar to the
2559 MAXIJOB limit in Maui. The default value is 0, which
2560 means no limit. This option applies only to Scheduler‐
2561 Type=sched/backfill. Also see the bf_max_job_part,
2562 bf_max_job_test and bf_max_job_user_part=# options. Set
2563 bf_max_job_test to a value much higher than
2564 bf_max_job_user.
2565
2566 bf_max_job_user_part=#
2567 The maximum number of jobs per user per partition to
2568 attempt starting with the backfill scheduler for any sin‐
2569 gle partition. The default value is 0, which means no
2570 limit. This option applies only to Scheduler‐
2571 Type=sched/backfill. Also see the bf_max_job_part,
2572 bf_max_job_test and bf_max_job_user=# options.
2573
2574 bf_max_time=#
2575 The maximum time the backfill scheduler can spend
2576 (including time spent sleeping when locks are released)
2577 before discontinuing, even if maximum job counts have not
2578 been reached. This option applies only to Scheduler‐
2579 Type=sched/backfill. The default value is the value of
2580 bf_interval (which defaults to 30 seconds). NOTE: This
2581 needs to be high enough that scheduling isn't always dis‐
2582                     abled, and low enough that interactive workloads can
2583 get through in a reasonable period of time. Certainly
2584 needs to be below 256 (the default RPC thread limit).
2585 Running around the middle (150) may give you good
2586 results.
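
                     Following that guidance, a hypothetical setting in the
                     middle of the suggested range might be:

                            SchedulerParameters=bf_interval=30,bf_max_time=150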
2587
2588 bf_min_age_reserve=#
2589 The backfill and main scheduling logic will not reserve
2590 resources for pending jobs until they have been pending
2591 and runnable for at least the specified number of sec‐
2592 onds. In addition, jobs waiting for less than the speci‐
2593 fied number of seconds will not prevent a newly submitted
2594 job from starting immediately, even if the newly submit‐
2595 ted job has a lower priority. This can be valuable if
2596 jobs lack time limits or all time limits have the same
2597 value. The default value is zero, which will reserve
2598 resources for any pending job and delay initiation of
2599 lower priority jobs. Also see bf_job_part_count_reserve
2600 and bf_min_prio_reserve.
2601
2602 bf_min_prio_reserve=#
2603 The backfill and main scheduling logic will not reserve
2604 resources for pending jobs unless they have a priority
2605 equal to or higher than the specified value. In addi‐
2606 tion, jobs with a lower priority will not prevent a newly
2607 submitted job from starting immediately, even if the
2608 newly submitted job has a lower priority. This can be
2609                     valuable if one wishes to maximize system utilization
2610 without regard for job priority below a certain thresh‐
2611 old. The default value is zero, which will reserve
2612 resources for any pending job and delay initiation of
2613 lower priority jobs. Also see bf_job_part_count_reserve
2614 and bf_min_age_reserve.
2615
2616 bf_resolution=#
2617 The number of seconds in the resolution of data main‐
2618 tained about when jobs begin and end. Higher values
2619                     result in less overhead and less responsiveness.  The
2620 default value is 60 seconds. This option applies only to
2621 SchedulerType=sched/backfill.
2622
2623 bf_window=#
2624 The number of minutes into the future to look when con‐
2625 sidering jobs to schedule. Higher values result in more
2626 overhead and less responsiveness. The default value is
2627 1440 minutes (one day). A value at least as long as the
2628 highest allowed time limit is generally advisable to pre‐
2629 vent job starvation. In order to limit the amount of
2630 data managed by the backfill scheduler, if the value of
2631 bf_window is increased, then it is generally advisable to
2632 also increase bf_resolution. This option applies only to
2633 SchedulerType=sched/backfill.
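
                     For example, to look two days ahead while keeping the
                     amount of managed data in check by coarsening the
                     resolution (the values are illustrative):

                            SchedulerParameters=bf_window=2880,bf_resolution=300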
2634
2635 bf_window_linear=#
2636 For performance reasons, the backfill scheduler will
2637 decrease precision in calculation of job expected termi‐
2638 nation times. By default, the precision starts at 30 sec‐
2639 onds and that time interval doubles with each evaluation
2640 of currently executing jobs when trying to determine when
2641 a pending job can start. This algorithm can support an
2642 environment with many thousands of running jobs, but can
2643 result in the expected start time of pending jobs being
2644                     gradually deferred due to lack of precision.  A
2645 value for bf_window_linear will cause the time interval
2646 to be increased by a constant amount on each iteration.
2647 The value is specified in units of seconds. For example,
2648 a value of 60 will cause the backfill scheduler on the
2649 first iteration to identify the job ending soonest and
2650 determine if the pending job can be started after that
2651 job plus all other jobs expected to end within 30 seconds
2652 (default initial value) of the first job. On the next
2653 iteration, the pending job will be evaluated for starting
2654 after the next job expected to end plus all jobs ending
2655 within 90 seconds of that time (30 second default, plus
2656 the 60 second option value). The third iteration will
2657 have a 150 second window and the fourth 210 seconds.
2658 Without this option, the time windows will double on each
2659 iteration and thus be 30, 60, 120, 240 seconds, etc. The
2660 use of bf_window_linear is not recommended with more than
2661 a few hundred simultaneously executing jobs.
2662
2663 bf_yield_interval=#
2664 The backfill scheduler will periodically relinquish locks
2665 in order for other pending operations to take place.
2666                     This specifies the interval between lock releases, in
2667 microseconds. The default value is 2,000,000 microsec‐
2668 onds (2 seconds). Smaller values may be helpful for high
2669 throughput computing when used in conjunction with the
2670 bf_continue option. Also see the bf_yield_sleep option.
2671
2672 bf_yield_sleep=#
2673 The backfill scheduler will periodically relinquish locks
2674 in order for other pending operations to take place.
2675 This specifies the length of time for which the locks are
2676                     relinquished, in microseconds. The default value is 500,000
2677 microseconds (0.5 seconds). Also see the bf_yield_inter‐
2678 val option.
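
                     For a high-throughput site, the yield options might be
                     combined with bf_continue (the values are illustrative):

                            SchedulerParameters=bf_continue,bf_yield_interval=1000000,bf_yield_sleep=200000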
2679
2680 build_queue_timeout=#
2681 Defines the maximum time that can be devoted to building
2682 a queue of jobs to be tested for scheduling. If the sys‐
2683 tem has a huge number of jobs with dependencies, just
2684 building the job queue can take so much time as to
2685 adversely impact overall system performance and this
2686 parameter can be adjusted as needed. The default value
2687 is 2,000,000 microseconds (2 seconds).
2688
2689 default_queue_depth=#
2690 The default number of jobs to attempt scheduling (i.e.
2691 the queue depth) when a running job completes or other
2692                     routine actions occur; however, the frequency with which
2693 the scheduler is run may be limited by using the defer or
2694 sched_min_interval parameters described below. The full
2695 queue will be tested on a less frequent basis as defined
2696 by the sched_interval option described below. The default
2697 value is 100. See the partition_job_depth option to
2698 limit depth by partition.
2699
2700 defer Setting this option will avoid attempting to schedule
2701 each job individually at job submit time, but defer it
2702 until a later time when scheduling multiple jobs simulta‐
2703 neously may be possible. This option may improve system
2704 responsiveness when large numbers of jobs (many hundreds)
2705 are submitted at the same time, but it will delay the
2706 initiation time of individual jobs. Also see
2707 default_queue_depth above.
2708
2709 delay_boot=#
2710                     Do not reboot nodes in order to satisfy this job's fea‐
2711 ture specification if the job has been eligible to run
2712 for less than this time period. If the job has waited
2713 for less than the specified period, it will use only
2714 nodes which already have the specified features. The
2715 argument is in units of minutes. Individual jobs may
2716 override this default value with the --delay-boot option.
2717
2718 default_gbytes
2719 The default units in job submission memory and temporary
2720 disk size specification will be gigabytes rather than
2721 megabytes. Users can override the default by using a
2722 suffix of "M" for megabytes.
2723
2724 disable_hetero_steps
2725 Disable job steps that span heterogeneous job alloca‐
2726 tions. The default value on Cray systems.
2727
2728 enable_hetero_steps
2729 Enable job steps that span heterogeneous job allocations.
2730 The default value except for Cray systems.
2731
2732 enable_user_top
2733 Enable use of the "scontrol top" command by non-privi‐
2734 leged users.
2735
2736 Ignore_NUMA
2737 Some processors (e.g. AMD Opteron 6000 series) contain
2738 multiple NUMA nodes per socket. This is a configuration
2739 which does not map into the hardware entities that Slurm
2740 optimizes resource allocation for (PU/thread, core,
2741 socket, baseboard, node and network switch). In order to
2742 optimize resource allocations on such hardware, Slurm
2743 will consider each NUMA node within the socket as a sepa‐
2744 rate socket by default. Use the Ignore_NUMA option to
2745 report the correct socket count, but not optimize
2746 resource allocations on the NUMA nodes.
2747
2748 inventory_interval=#
2749 On a Cray system using Slurm on top of ALPS this limits
2750 the number of times a Basil Inventory call is made. Nor‐
2751 mally this call happens every scheduling consideration to
2752                     attempt to close a node state change window with respect
2753 to what ALPS has. This call is rather slow, so making it
2754 less frequently improves performance dramatically, but in
2755 the situation where a node changes state the window is as
2756 large as this setting. In an HTC environment this set‐
2757                     ting is essential; a value of around 10 seconds is advised.
2758
2759 kill_invalid_depend
2760                     If a job has an invalid dependency and it can never run,
2761                     terminate it and set its state to JOB_CANCELLED.  By
2762 default the job stays pending with reason DependencyNev‐
2763 erSatisfied.
2764
2765 max_array_tasks
2766                     Specify the maximum number of tasks that can be included in a
2767 job array. The default limit is MaxArraySize, but this
2768 option can be used to set a lower limit. For example,
2769 max_array_tasks=1000 and MaxArraySize=100001 would permit
2770 a maximum task ID of 100000, but limit the number of
2771 tasks in any single job array to 1000.
2772
2773 max_depend_depth=#
2774 Maximum number of jobs to test for a circular job depen‐
2775 dency. Stop testing after this number of job dependencies
2776 have been tested. The default value is 10 jobs.
2777
2778 max_rpc_cnt=#
2779 If the number of active threads in the slurmctld daemon
2780 is equal to or larger than this value, defer scheduling
2781 of jobs. This can improve Slurm's ability to process
2782 requests at a cost of initiating new jobs less fre‐
2783 quently. The default value is zero, which disables this
2784 option. If a value is set, then a value of 10 or higher
2785 is recommended.
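
                     For example, to defer scheduling whenever 150 or more
                     slurmctld threads are active (an illustrative value):

                            SchedulerParameters=max_rpc_cnt=150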
2786
2787 max_sched_time=#
2788 How long, in seconds, that the main scheduling loop will
2789 execute for before exiting. If a value is configured, be
2790 aware that all other Slurm operations will be deferred
2791 during this time period. Make certain the value is lower
2792 than MessageTimeout. If a value is not explicitly con‐
2793 figured, the default value is half of MessageTimeout with
2794 a minimum default value of 1 second and a maximum default
2795 value of 2 seconds. For example if MessageTimeout=10,
2796 the time limit will be 2 seconds (i.e. MIN(10/2, 2) = 2).
2797
2798 max_script_size=#
2799 Specify the maximum size of a batch script, in bytes.
2800 The default value is 4 megabytes. Larger values may
2801 adversely impact system performance.
2802
2803 max_switch_wait=#
2804 Maximum number of seconds that a job can delay execution
2805 waiting for the specified desired switch count. The
2806 default value is 300 seconds.
2807
2808 no_backup_scheduling
2809 If used, the backup controller will not schedule jobs
2810 when it takes over. The backup controller will allow jobs
2811 to be submitted, modified and cancelled but won't sched‐
2812 ule new jobs. This is useful in Cray environments when
2813 the backup controller resides on an external Cray node.
2814 A restart is required to alter this option. This is
2815 explicitly set on a Cray/ALPS system.
2816
2817 no_env_cache
2818                     If used, any job started on a node that fails to load its
2819                     environment will fail instead of using the cached
2820                     environment.  This also implies the
2821                     requeue_setup_env_fail option.
2822
2823 pack_serial_at_end
2824 If used with the select/cons_res plugin then put serial
2825 jobs at the end of the available nodes rather than using
2826 a best fit algorithm. This may reduce resource fragmen‐
2827 tation for some workloads.
2828
2829 partition_job_depth=#
2830 The default number of jobs to attempt scheduling (i.e.
2831 the queue depth) from each partition/queue in Slurm's
2832 main scheduling logic. The functionality is similar to
2833 that provided by the bf_max_job_part option for the back‐
2834 fill scheduling logic. The default value is 0 (no
2835                     limit).  Jobs excluded from attempted scheduling based
2836 upon partition will not be counted against the
2837 default_queue_depth limit. Also see the bf_max_job_part
2838 option.
2839
2840 preempt_reorder_count=#
2841                     Specify how many attempts should be made in reordering pre‐
2842 emptable jobs to minimize the count of jobs preempted.
2843 The default value is 1. High values may adversely impact
2844 performance. The logic to support this option is only
2845 available in the select/cons_res plugin.
2846
2847 preempt_strict_order
2848 If set, then execute extra logic in an attempt to preempt
2849 only the lowest priority jobs. It may be desirable to
2850 set this configuration parameter when there are multiple
2851 priorities of preemptable jobs. The logic to support
2852 this option is only available in the select/cons_res
2853 plugin.
2854
2855 preempt_youngest_first
2856 If set, then the preemption sorting algorithm will be
2857 changed to sort by the job start times to favor preempt‐
2858 ing younger jobs over older. (Requires preempt/parti‐
2859 tion_prio or preempt/qos plugins.)
2860
2861 nohold_on_prolog_fail
2862 By default if the Prolog exits with a non-zero value the
2863 job is requeued in held state. By specifying this parame‐
2864 ter the job will be requeued but not held so that the
2865 scheduler can dispatch it to another host.
2866
2867 reduce_completing_frag
2868 This option is used to control how scheduling of
2869 resources is performed when jobs are in completing state,
2870 which influences potential fragmentation. If the option
2871 is not set then no jobs will be started in any partition
2872 when any job is in completing state. If the option is
2873 set then no jobs will be started in any individual parti‐
2874 tion that has a job in completing state. In addition, no
2875 jobs will be started in any partition with nodes that
2876 overlap with any nodes in the partition of the completing
2877 job. This option is to be used in conjunction with Com‐
2878 pleteWait. NOTE: CompleteWait must be set for this to
2879 work.
2880
2881 requeue_setup_env_fail
2882 By default if a job environment setup fails the job keeps
2883 running with a limited environment. By specifying this
2884 parameter the job will be requeued in held state and the
2885 execution node drained.
2886
2887 salloc_wait_nodes
2888 If defined, the salloc command will wait until all allo‐
2889 cated nodes are ready for use (i.e. booted) before the
2890 command returns. By default, salloc will return as soon
2891 as the resource allocation has been made.
2892
2893 sbatch_wait_nodes
2894 If defined, the sbatch script will wait until all allo‐
2895 cated nodes are ready for use (i.e. booted) before the
2896 initiation. By default, the sbatch script will be initi‐
2897 ated as soon as the first node in the job allocation is
2898 ready. The sbatch command can use the --wait-all-nodes
2899 option to override this configuration parameter.
2900
2901 sched_interval=#
2902 How frequently, in seconds, the main scheduling loop will
2903 execute and test all pending jobs. The default value is
2904 60 seconds.
2905
2906 sched_max_job_start=#
2907 The maximum number of jobs that the main scheduling logic
2908 will start in any single execution. The default value is
2909 zero, which imposes no limit.
2910
2911 sched_min_interval=#
2912 How frequently, in microseconds, the main scheduling loop
2913 will execute and test any pending jobs. The scheduler
2914 runs in a limited fashion every time that any event hap‐
2915 pens which could enable a job to start (e.g. job submit,
2916 job terminate, etc.). If these events happen at a high
2917 frequency, the scheduler can run very frequently and con‐
2918 sume significant resources if not throttled by this
2919 option. This option specifies the minimum time between
2920 the end of one scheduling cycle and the beginning of the
2921 next scheduling cycle. A value of zero will disable
2922 throttling of the scheduling logic interval. The default
2923 value is 1,000,000 microseconds on Cray/ALPS systems and
2924 2 microseconds on other systems.

       spec_cores_first
              Specialized cores will be selected from the first cores
              of the first sockets, cycling through the sockets on a
              round robin basis.  By default, specialized cores will
              be selected from the last cores of the last sockets,
              cycling through the sockets on a round robin basis.

       step_retry_count=#
              When a step completes and other steps are releasing
              their resource allocations, retry step allocations for
              at least this number of pending steps.  Also see
              step_retry_time.  The default value is 8 steps.

       step_retry_time=#
              When a step completes and other steps are releasing
              their resource allocations, retry step allocations for
              all steps which have been pending for at least this
              number of seconds.  Also see step_retry_count.  The
              default value is 60 seconds.

       whole_hetjob
              Requests to cancel, hold or release any component of a
              heterogeneous job will be applied to all components of
              the job.

              NOTE: this option was previously named whole_pack and
              that name is still supported for backward
              compatibility.

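       Several of the options above can be combined on a single
       SchedulerParameters line.  The following sketch uses options
       documented in this section; the specific values are
       illustrative only and should be tuned per site:

       ```
       SchedulerParameters=sched_interval=30,sched_max_job_start=100,sched_min_interval=50000,step_retry_count=16
       ```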

SchedulerTimeSlice
       Number of seconds in each time slice when gang scheduling is
       enabled (PreemptMode=SUSPEND,GANG).  The value must be between
       5 seconds and 65533 seconds.  The default value is 30 seconds.


SchedulerType
       Identifies the type of scheduler to be used.  Note the
       slurmctld daemon must be restarted for a change in scheduler
       type to become effective (reconfiguring a running daemon has
       no effect for this parameter).  The scontrol command can be
       used to manually change job priorities if desired.  Acceptable
       values include:

       sched/backfill
              For a backfill scheduling module to augment the default
              FIFO scheduling.  Backfill scheduling will initiate
              lower-priority jobs if doing so does not delay the
              expected initiation time of any higher priority job.
              Effectiveness of backfill scheduling is dependent upon
              users specifying job time limits; otherwise all jobs
              will have the same time limit and backfilling is
              impossible.  Note the documentation for the
              SchedulerParameters option above.  This is the default
              configuration.

       sched/builtin
              This is the FIFO scheduler, which initiates jobs in
              priority order.  If any job in the partition cannot be
              scheduled, no lower priority job in that partition will
              be scheduled.  An exception is made for jobs that
              cannot run due to partition constraints (e.g. the time
              limit) or down/drained nodes.  In that case, lower
              priority jobs can be initiated without impacting the
              higher priority job.

       sched/hold
              To hold all newly arriving jobs if a file
              "/etc/slurm.hold" exists; otherwise use the built-in
              FIFO scheduler.

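       A typical backfill configuration ties these parameters
       together.  The following sketch assumes the backfill tuning
       options (bf_interval, bf_max_job_test) described under
       SchedulerParameters; the values are illustrative only:

       ```
       SchedulerType=sched/backfill
       SchedulerParameters=bf_interval=60,bf_max_job_test=500
       SchedulerTimeSlice=30
       ```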

SelectType
       Identifies the type of resource selection algorithm to be
       used.  Changing this value can only be done by restarting the
       slurmctld daemon and will result in the loss of all job
       information (running and pending) since the job state save
       format used by each plugin is different.  Acceptable values
       include:

       select/cons_res
              The resources (cores and memory) within a node are
              individually allocated as consumable resources.  Note
              that whole nodes can be allocated to jobs for selected
              partitions by using the OverSubscribe=Exclusive option.
              See the partition OverSubscribe parameter for more
              information.

       select/cray
              For a Cray system.  The default value is "select/cray"
              for all Cray systems.

       select/linear
              For allocation of entire nodes assuming a
              one-dimensional array of nodes in which sequentially
              ordered nodes are preferable.  For a heterogeneous
              cluster (e.g. different CPU counts on the various
              nodes), resource allocations will favor nodes with high
              CPU counts as needed based upon the job's node and CPU
              specification if TopologyPlugin=topology/none is
              configured.  Use of other topology plugins with
              select/linear and heterogeneous nodes is not
              recommended and may result in valid job allocation
              requests being rejected.  This is the default value.

       select/serial
              For allocating resources to single CPU jobs only.
              Highly optimized for maximum throughput.  NOTE: SPANK
              environment variables are NOT propagated to the job's
              Epilog program.


SelectTypeParameters
       The permitted values of SelectTypeParameters depend upon the
       configured value of SelectType.  The only supported options
       for SelectType=select/linear are CR_ONE_TASK_PER_CORE and
       CR_Memory, which treats memory as a consumable resource and
       prevents memory oversubscription with job preemption or gang
       scheduling.  By default SelectType=select/linear allocates
       whole nodes to jobs without considering their memory
       consumption.  By default SelectType=select/cons_res,
       SelectType=select/cray, and SelectType=select/serial use
       CR_CPU, which allocates CPUs (threads) to jobs without
       considering their memory consumption.

       The following options are supported for
       SelectType=select/cray:

              OTHER_CONS_RES
                     Layer the select/cons_res plugin under the
                     select/cray plugin; the default is to layer on
                     select/linear.  This also allows all the options
                     available for SelectType=select/cons_res.

              NHC_ABSOLUTELY_NO
                     Never run the node health check.  Implies NHC_NO
                     and NHC_NO_STEPS as well.

              NHC_NO_STEPS
                     Do not run the node health check after each
                     step.  Default is to run after each step.

              NHC_NO Do not run the node health check after each
                     allocation.  Default is to run after each
                     allocation.  This also sets NHC_NO_STEPS, so the
                     NHC will never run except when nodes have been
                     left with unkillable steps.

       The following options are supported by the
       SelectType=select/cons_res plugin:

              CR_CPU CPUs are consumable resources.  Configure the
                     number of CPUs on each node, which may be equal
                     to the count of cores or hyper-threads on the
                     node depending upon the desired minimum resource
                     allocation.  The node's Boards, Sockets,
                     CoresPerSocket and ThreadsPerCore may optionally
                     be configured and result in job allocations
                     which have improved locality; however doing so
                     will prevent more than one job from being
                     allocated on each core.

              CR_CPU_Memory
                     CPUs and memory are consumable resources.
                     Configure the number of CPUs on each node, which
                     may be equal to the count of cores or
                     hyper-threads on the node depending upon the
                     desired minimum resource allocation.  The node's
                     Boards, Sockets, CoresPerSocket and
                     ThreadsPerCore may optionally be configured and
                     result in job allocations which have improved
                     locality; however doing so will prevent more
                     than one job from being allocated on each core.
                     Setting a value for DefMemPerCPU is strongly
                     recommended.

              CR_Core
                     Cores are consumable resources.  On nodes with
                     hyper-threads, each thread is counted as a CPU
                     to satisfy a job's resource requirement, but
                     multiple jobs are not allocated threads on the
                     same core.  The count of CPUs allocated to a job
                     may be rounded up to account for every CPU on an
                     allocated core.

              CR_Core_Memory
                     Cores and memory are consumable resources.  On
                     nodes with hyper-threads, each thread is counted
                     as a CPU to satisfy a job's resource
                     requirement, but multiple jobs are not allocated
                     threads on the same core.  The count of CPUs
                     allocated to a job may be rounded up to account
                     for every CPU on an allocated core.  Setting a
                     value for DefMemPerCPU is strongly recommended.

              CR_ONE_TASK_PER_CORE
                     Allocate one task per core by default.  Without
                     this option, by default one task will be
                     allocated per thread on nodes with more than one
                     ThreadsPerCore configured.  NOTE: This option
                     cannot be used with CR_CPU*.

              CR_CORE_DEFAULT_DIST_BLOCK
                     Allocate cores within a node using block
                     distribution by default.  This is a
                     pseudo-best-fit algorithm that minimizes the
                     number of boards and minimizes the number of
                     sockets (within minimum boards) used for the
                     allocation.  This default behavior can be
                     overridden by specifying a particular "-m"
                     parameter with srun/salloc/sbatch.  Without this
                     option, cores will be allocated cyclically
                     across the sockets.

              CR_LLN Schedule resources to jobs on the least loaded
                     nodes (based upon the number of idle CPUs).
                     This is generally only recommended for an
                     environment with serial jobs as idle resources
                     will tend to be highly fragmented, resulting in
                     parallel jobs being distributed across many
                     nodes.  Note that node Weight takes precedence
                     over how many idle resources are on each node.
                     Also see the partition configuration parameter
                     LLN to use the least loaded nodes in selected
                     partitions.

              CR_Pack_Nodes
                     If a job allocation contains more resources than
                     will be used for launching tasks (e.g. if whole
                     nodes are allocated to a job), then rather than
                     distributing a job's tasks evenly across its
                     allocated nodes, pack them as tightly as
                     possible on these nodes.  For example, consider
                     a job allocation containing two entire nodes
                     with eight CPUs each.  If the job starts ten
                     tasks across those two nodes without this
                     option, it will start five tasks on each of the
                     two nodes.  With this option, eight tasks will
                     be started on the first node and two tasks on
                     the second node.

              CR_Socket
                     Sockets are consumable resources.  On nodes with
                     multiple cores, each core or thread is counted
                     as a CPU to satisfy a job's resource
                     requirement, but multiple jobs are not allocated
                     resources on the same socket.

              CR_Socket_Memory
                     Memory and sockets are consumable resources.  On
                     nodes with multiple cores, each core or thread
                     is counted as a CPU to satisfy a job's resource
                     requirement, but multiple jobs are not allocated
                     resources on the same socket.  Setting a value
                     for DefMemPerCPU is strongly recommended.

              CR_Memory
                     Memory is a consumable resource.  NOTE: This
                     implies OverSubscribe=YES or OverSubscribe=FORCE
                     for all partitions.  Setting a value for
                     DefMemPerCPU is strongly recommended.

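       For example, a cluster that allocates individual cores and
       memory to jobs might combine these parameters as follows (the
       DefMemPerCPU value is illustrative only):

       ```
       SelectType=select/cons_res
       SelectTypeParameters=CR_Core_Memory
       DefMemPerCPU=2048
       ```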

SlurmUser
       The name of the user that the slurmctld daemon executes as.
       For security purposes, a user other than "root" is
       recommended.  This user must exist on all nodes of the cluster
       for authentication of communications between Slurm components.
       The default value is "root".


SlurmdParameters
       Parameters specific to the slurmd daemon.  Multiple options
       may be comma separated.

       shutdown_on_reboot
              If set, the slurmd daemon will shut itself down when a
              reboot request is received.


SlurmdUser
       The name of the user that the slurmd daemon executes as.  This
       user must exist on all nodes of the cluster for authentication
       of communications between Slurm components.  The default value
       is "root".


SlurmctldAddr
       An optional address to be used for communications to the
       currently active slurmctld daemon, normally used with Virtual
       IP addressing of the currently active server.  If this
       parameter is not specified then each primary and backup server
       will have its own unique address used for communications as
       specified in the SlurmctldHost parameter.  If this parameter
       is specified then the SlurmctldHost parameter will still be
       used for communications to specific slurmctld primary or
       backup servers, for example to cause all of them to read the
       current configuration files or shut down.  Also see the
       SlurmctldPrimaryOffProg and SlurmctldPrimaryOnProg
       configuration parameters to configure programs that manipulate
       the virtual IP address.


SlurmctldDebug
       The level of detail to provide in the slurmctld daemon's logs.
       The default value is info.  If the slurmctld daemon is
       initiated with the -v or --verbose options, that debug level
       will be preserved or restored upon reconfiguration.


       quiet   Log nothing

       fatal   Log only fatal errors

       error   Log only errors

       info    Log errors and general informational messages

       verbose Log errors and verbose informational messages

       debug   Log errors and verbose informational messages and
               debugging messages

       debug2  Log errors and verbose informational messages and more
               debugging messages

       debug3  Log errors and verbose informational messages and even
               more debugging messages

       debug4  Log errors and verbose informational messages and even
               more debugging messages

       debug5  Log errors and verbose informational messages and even
               more debugging messages


SlurmctldHost
       The short, or long, hostname of the machine where the Slurm
       control daemon is executed (i.e. the name returned by the
       command "hostname -s").  This hostname is optionally followed
       by the address, either the IP address or a name by which the
       address can be identified, enclosed in parentheses (e.g.
       SlurmctldHost=master1(12.34.56.78)).  This value must be
       specified at least once.  If specified more than once, the
       first hostname named will be where the daemon runs.  If the
       first specified host fails, the daemon will execute on the
       second host.  If both the first and second specified hosts
       fail, the daemon will execute on the third host.
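       A primary controller with two backups, each with an explicit
       address, could thus be configured as follows (the hostnames
       and addresses are illustrative only):

       ```
       SlurmctldHost=master1(12.34.56.78)
       SlurmctldHost=backup1(12.34.56.79)
       SlurmctldHost=backup2(12.34.56.80)
       ```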


SlurmctldLogFile
       Fully qualified pathname of a file into which the slurmctld
       daemon's logs are written.  The default value is none
       (performs logging via syslog).
       See the section LOGGING if a pathname is specified.


SlurmctldParameters
       Multiple options may be comma-separated.

       allow_user_triggers
              Permit setting triggers from non-root/slurm_user users.
              SlurmUser must also be set to root to permit these
              triggers to work.  See the strigger man page for
              additional details.

       cloud_dns
              By default, Slurm expects that the network addresses
              for cloud nodes won't be known until creation of the
              node and that Slurm will be notified of the node's
              address (e.g. scontrol update nodename=<name>
              nodeaddr=<addr>).  Since Slurm communications rely on
              the node configuration found in the slurm.conf, Slurm
              will tell the client command, after waiting for all
              nodes to boot, each node's IP address.  However, in
              environments where the nodes are in DNS, this step can
              be avoided by configuring this option.


SlurmctldPidFile
       Fully qualified pathname of a file into which the slurmctld
       daemon may write its process id.  This may be used for
       automated signal processing.  The default value is
       "/var/run/slurmctld.pid".


SlurmctldPlugstack
       A comma delimited list of Slurm controller plugins to be
       started when the daemon begins and terminated when it ends.
       Only the plugin's init and fini functions are called.


SlurmctldPort
       The port number that the Slurm controller, slurmctld, listens
       to for work.  The default value is SLURMCTLD_PORT as
       established at system build time.  If none is explicitly
       specified, it will be set to 6817.  SlurmctldPort may also be
       configured to support a range of port numbers in order to
       accept larger bursts of incoming messages by specifying two
       numbers separated by a dash (e.g. SlurmctldPort=6817-6818).
       NOTE: Either the slurmctld and slurmd daemons must not execute
       on the same nodes, or the values of SlurmctldPort and
       SlurmdPort must be different.

       Note: On Cray systems, Realm-Specific IP Addressing (RSIP)
       will automatically try to interact with anything opened on
       ports 8192-60000.  Configure SlurmctldPort to use a port
       outside of the configured SrunPortRange and RSIP's port range.


SlurmctldPrimaryOffProg
       This program is executed when a slurmctld daemon running as
       the primary server becomes a backup server.  By default no
       program is executed.  See also the related
       "SlurmctldPrimaryOnProg" parameter.


SlurmctldPrimaryOnProg
       This program is executed when a slurmctld daemon running as a
       backup server becomes the primary server.  By default no
       program is executed.  When using virtual IP addresses to
       manage Highly Available Slurm services, this program can be
       used to add the IP address to an interface (and optionally try
       to kill the unresponsive slurmctld daemon and flush the ARP
       caches on nodes on the local ethernet fabric).  See also the
       related "SlurmctldPrimaryOffProg" parameter.

SlurmctldSyslogDebug
       The slurmctld daemon will log events to the syslog file at the
       specified level of detail.  If not set, the slurmctld daemon
       will log to syslog at level fatal, unless there is no
       SlurmctldLogFile and it is running in the background, in which
       case it will log to syslog at the level specified by
       SlurmctldDebug (at fatal in the case that SlurmctldDebug is
       set to quiet).  If run in the foreground, it will be set to
       quiet.


       quiet   Log nothing

       fatal   Log only fatal errors

       error   Log only errors

       info    Log errors and general informational messages

       verbose Log errors and verbose informational messages

       debug   Log errors and verbose informational messages and
               debugging messages

       debug2  Log errors and verbose informational messages and more
               debugging messages

       debug3  Log errors and verbose informational messages and even
               more debugging messages

       debug4  Log errors and verbose informational messages and even
               more debugging messages

       debug5  Log errors and verbose informational messages and even
               more debugging messages


SlurmctldTimeout
       The interval, in seconds, that the backup controller waits for
       the primary controller to respond before assuming control.
       The default value is 120 seconds.  May not exceed 65533.


SlurmdDebug
       The level of detail to provide in the slurmd daemon's logs.
       The default value is info.

       quiet   Log nothing

       fatal   Log only fatal errors

       error   Log only errors

       info    Log errors and general informational messages

       verbose Log errors and verbose informational messages

       debug   Log errors and verbose informational messages and
               debugging messages

       debug2  Log errors and verbose informational messages and more
               debugging messages

       debug3  Log errors and verbose informational messages and even
               more debugging messages

       debug4  Log errors and verbose informational messages and even
               more debugging messages

       debug5  Log errors and verbose informational messages and even
               more debugging messages


SlurmdLogFile
       Fully qualified pathname of a file into which the slurmd
       daemon's logs are written.  The default value is none
       (performs logging via syslog).  Any "%h" within the name is
       replaced with the hostname on which the slurmd is running.
       Any "%n" within the name is replaced with the Slurm node name
       on which the slurmd is running.
       See the section LOGGING if a pathname is specified.


SlurmdPidFile
       Fully qualified pathname of a file into which the slurmd
       daemon may write its process id.  This may be used for
       automated signal processing.  Any "%h" within the name is
       replaced with the hostname on which the slurmd is running.
       Any "%n" within the name is replaced with the Slurm node name
       on which the slurmd is running.  The default value is
       "/var/run/slurmd.pid".


SlurmdPort
       The port number that the Slurm compute node daemon, slurmd,
       listens to for work.  The default value is SLURMD_PORT as
       established at system build time.  If none is explicitly
       specified, its value will be 6818.  NOTE: Either the slurmctld
       and slurmd daemons must not execute on the same nodes, or the
       values of SlurmctldPort and SlurmdPort must be different.

       Note: On Cray systems, Realm-Specific IP Addressing (RSIP)
       will automatically try to interact with anything opened on
       ports 8192-60000.  Configure SlurmdPort to use a port outside
       of the configured SrunPortRange and RSIP's port range.


SlurmdSpoolDir
       Fully qualified pathname of a directory into which the slurmd
       daemon's state information and batch job script information
       are written.  This must be a common pathname for all nodes,
       but should represent a directory which is local to each node
       (reference a local file system).  The default value is
       "/var/spool/slurmd".  Any "%h" within the name is replaced
       with the hostname on which the slurmd is running.  Any "%n"
       within the name is replaced with the Slurm node name on which
       the slurmd is running.


SlurmdSyslogDebug
       The slurmd daemon will log events to the syslog file at the
       specified level of detail.  If not set, the slurmd daemon will
       log to syslog at level fatal, unless there is no SlurmdLogFile
       and it is running in the background, in which case it will log
       to syslog at the level specified by SlurmdDebug (at fatal in
       the case that SlurmdDebug is set to quiet).  If run in the
       foreground, it will be set to quiet.


       quiet   Log nothing

       fatal   Log only fatal errors

       error   Log only errors

       info    Log errors and general informational messages

       verbose Log errors and verbose informational messages

       debug   Log errors and verbose informational messages and
               debugging messages

       debug2  Log errors and verbose informational messages and more
               debugging messages

       debug3  Log errors and verbose informational messages and even
               more debugging messages

       debug4  Log errors and verbose informational messages and even
               more debugging messages

       debug5  Log errors and verbose informational messages and even
               more debugging messages


SlurmdTimeout
       The interval, in seconds, that the Slurm controller waits for
       slurmd to respond before configuring that node's state to
       DOWN.  A value of zero indicates the node will not be tested
       by slurmctld to confirm the state of slurmd, the node will not
       be automatically set to a DOWN state indicating a
       non-responsive slurmd, and some other tool will take
       responsibility for monitoring the state of each compute node
       and its slurmd daemon.  Slurm's hierarchical communication
       mechanism is used to ping the slurmd daemons in order to
       minimize system noise and overhead.  The default value is 300
       seconds.  The value may not exceed 65533 seconds.


SlurmSchedLogFile
       Fully qualified pathname of the scheduling event logging file.
       The syntax of this parameter is the same as for
       SlurmctldLogFile.  In order to configure scheduler logging,
       set both the SlurmSchedLogFile and SlurmSchedLogLevel
       parameters.


SlurmSchedLogLevel
       The initial level of scheduling event logging, similar to the
       SlurmctldDebug parameter used to control the initial level of
       slurmctld logging.  Valid values for SlurmSchedLogLevel are
       "0" (scheduler logging disabled) and "1" (scheduler logging
       enabled).  If this parameter is omitted, the value defaults to
       "0" (disabled).  In order to configure scheduler logging, set
       both the SlurmSchedLogFile and SlurmSchedLogLevel parameters.
       The scheduler logging level can be changed dynamically using
       scontrol.


SrunEpilog
       Fully qualified pathname of an executable to be run by srun
       following the completion of a job step.  The command line
       arguments for the executable will be the command and arguments
       of the job step.  This configuration parameter may be
       overridden by srun's --epilog parameter.  Note that while the
       other "Epilog" executables (e.g., TaskEpilog) are run by
       slurmd on the compute nodes where the tasks are executed, the
       SrunEpilog runs on the node where the "srun" is executing.


SrunPortRange
       The srun command creates a set of listening ports to
       communicate with the controller, the slurmstepd and to handle
       the application I/O.  By default these ports are ephemeral,
       meaning the port numbers are selected by the kernel.  Using
       this parameter allows sites to configure a range of ports from
       which srun ports will be selected.  This is useful if sites
       want to allow only a certain port range on their network.

       Note: On Cray systems, Realm-Specific IP Addressing (RSIP)
       will automatically try to interact with anything opened on
       ports 8192-60000.  Configure SrunPortRange to use a range of
       ports above those used by RSIP, ideally 1000 or more ports,
       for example "SrunPortRange=60001-63000".

       Note: A sufficient number of ports must be configured based on
       the estimated number of srun commands on the submission nodes,
       considering that each srun opens 3 listening ports plus 2 more
       for every 48 hosts.  Example:

              srun -N 48 will use 5 listening ports.

              srun -N 50 will use 7 listening ports.

              srun -N 200 will use 13 listening ports.

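       The examples above follow from srun opening 3 base ports plus
       2 more for every group of up to 48 hosts.  A short sketch of
       that arithmetic (the rounding behavior is inferred from the
       three examples above):

       ```python
       import math

       def srun_listening_ports(nhosts: int) -> int:
           """Ports opened by one srun: 3 base ports plus 2 per
           group of up to 48 hosts."""
           return 3 + 2 * math.ceil(nhosts / 48)

       print(srun_listening_ports(48))   # 5
       print(srun_listening_ports(50))   # 7
       print(srun_listening_ports(200))  # 13
       ```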

SrunProlog
       Fully qualified pathname of an executable to be run by srun
       prior to the launch of a job step.  The command line arguments
       for the executable will be the command and arguments of the
       job step.  This configuration parameter may be overridden by
       srun's --prolog parameter.  Note that while the other "Prolog"
       executables (e.g., TaskProlog) are run by slurmd on the
       compute nodes where the tasks are executed, the SrunProlog
       runs on the node where the "srun" is executing.


StateSaveLocation
       Fully qualified pathname of a directory into which the Slurm
       controller, slurmctld, saves its state (e.g.
       "/usr/local/slurm/checkpoint").  Slurm state will be saved
       here to recover from system failures.  SlurmUser must be able
       to create files in this directory.  If you have a
       BackupController configured, this location should be readable
       and writable by both systems.  Since all running and pending
       job information is stored here, the use of a reliable file
       system (e.g. RAID) is recommended.  The default value is
       "/var/spool".  If any Slurm daemons terminate abnormally,
       their core files will also be written into this directory.


SuspendExcNodes
       Specifies the nodes which are to not be placed in power save
       mode, even if the node remains idle for an extended period of
       time.  Use Slurm's hostlist expression to identify nodes with
       an optional ":" separator and count of nodes to exclude from
       the preceding range.  For example "nid[10-20]:4" will prevent
       4 usable nodes (i.e. IDLE and not DOWN, DRAINING or already
       powered down) in the set "nid[10-20]" from being powered down.
       Multiple sets of nodes can be specified with or without counts
       in a comma separated list (e.g. "nid[10-20]:4,nid[80-90]:2").
       If a node count specification is given, any list of nodes to
       NOT have a node count must be after the last specification
       with a count.  For example "nid[10-20]:4,nid[60-70]" will
       exclude 4 nodes in the set "nid[10-20]" plus all nodes in the
       set "nid[60-70]", while "nid[1-3],nid[10-20]:4" will exclude 4
       nodes from the set "nid[1-3],nid[10-20]".  By default no nodes
       are excluded.  Related configuration options include
       ResumeTimeout, ResumeProgram, ResumeRate, SuspendProgram,
       SuspendRate, SuspendTime, SuspendTimeout, and SuspendExcParts.


SuspendExcParts
       Specifies the partitions whose nodes are to not be placed in
       power save mode, even if the node remains idle for an extended
       period of time.  Multiple partitions can be identified and
       separated by commas.  By default no nodes are excluded.
       Related configuration options include ResumeTimeout,
       ResumeProgram, ResumeRate, SuspendProgram, SuspendRate,
       SuspendTime, SuspendTimeout, and SuspendExcNodes.


SuspendProgram
       SuspendProgram is the program that will be executed when a
       node remains idle for an extended period of time.  This
       program is expected to place the node into some power save
       mode.  This can be used to reduce the frequency and voltage of
       a node or completely power the node off.  The program executes
       as SlurmUser.  The argument to the program will be the names
       of the nodes to be placed into power saving mode (using
       Slurm's hostlist expression format).  By default, no program
       is run.  Related configuration options include ResumeTimeout,
       ResumeProgram, ResumeRate, SuspendRate, SuspendTime,
       SuspendTimeout, SuspendExcNodes, and SuspendExcParts.


SuspendRate
       The rate at which nodes are placed into power save mode by
       SuspendProgram.  The value is a number of nodes per minute and
       it can be used to prevent a large drop in power consumption
       (e.g. after a large job completes).  A value of zero results
       in no limits being imposed.  The default value is 60 nodes per
       minute.  Related configuration options include ResumeTimeout,
       ResumeProgram, ResumeRate, SuspendProgram, SuspendTime,
       SuspendTimeout, SuspendExcNodes, and SuspendExcParts.


SuspendTime
       Nodes which remain idle for this number of seconds will be
       placed into power save mode by SuspendProgram.  For efficient
       system utilization, it is recommended that the value of
       SuspendTime be at least as large as the sum of SuspendTimeout
       plus ResumeTimeout.  A value of -1 disables power save mode
       and is the default.  Related configuration options include
       ResumeTimeout, ResumeProgram, ResumeRate, SuspendProgram,
       SuspendRate, SuspendTimeout, SuspendExcNodes, and
       SuspendExcParts.
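       A minimal power saving setup tying these parameters together
       might look like the following.  The script paths, node names
       and timings are illustrative only; note that SuspendTime
       (1800) exceeds SuspendTimeout plus ResumeTimeout (120 + 600)
       as recommended above:

       ```
       SuspendProgram=/usr/local/sbin/node_suspend.sh
       ResumeProgram=/usr/local/sbin/node_resume.sh
       SuspendTime=1800
       SuspendTimeout=120
       ResumeTimeout=600
       SuspendRate=20
       ResumeRate=20
       SuspendExcNodes=nid[000-003]
       ```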


SuspendTimeout
       Maximum time permitted (in seconds) between when a node
       suspend request is issued and when the node is shut down.  At
       that time the node must be ready for a resume request to be
       issued as needed for new work.  The default value is 30
       seconds.  Related configuration options include ResumeProgram,
       ResumeRate, ResumeTimeout, SuspendRate, SuspendTime,
       SuspendProgram, SuspendExcNodes and SuspendExcParts.  More
       information is available at the Slurm web site
       ( https://slurm.schedmd.com/power_save.html ).


SwitchType
       Identifies the type of switch or interconnect used for
       application communications.  Acceptable values include
       "switch/cray" for Cray systems and "switch/none" for switches
       not requiring special processing for job launch or termination
       (Ethernet and InfiniBand).  The default value is
       "switch/none".  All Slurm daemons, commands and running jobs
       must be restarted for a change in SwitchType to take effect.
       If running jobs exist at the time slurmctld is restarted with
       a new value of SwitchType, records of all jobs in any state
       may be lost.
3681
3682
3683 TaskEpilog
3684              Fully qualified pathname of a program to be executed as the Slurm
3685 job's owner after termination of each task. See TaskProlog for
3686 execution order details.
3687
3688
3689 TaskPlugin
3690 Identifies the type of task launch plugin, typically used to
3691 provide resource management within a node (e.g. pinning tasks to
3692 specific processors). More than one task plugin can be specified
3693 in a comma separated list. The prefix of "task/" is optional.
3694 Acceptable values include:
3695
3696 task/affinity enables resource containment using CPUSETs. This
3697 enables the --cpu-bind and/or --mem-bind srun
3698 options. If you use "task/affinity" and
3699 encounter problems, it may be due to the variety
3700 of system calls used to implement task affinity
3701 on different operating systems.
3702
3703 task/cgroup enables resource containment using Linux control
3704 cgroups. This enables the --cpu-bind and/or
3705 --mem-bind srun options. NOTE: see "man
3706 cgroup.conf" for configuration details.
3707
3708 task/none for systems requiring no special handling of user
3709 tasks. Lacks support for the --cpu-bind and/or
3710 --mem-bind srun options. The default value is
3711 "task/none".
3712
3713       NOTE: It is recommended to stack task/affinity,task/cgroup together
3714       when configuring TaskPlugin, and to set TaskAffinity=no and Constrain‐
3715       Cores=yes in cgroup.conf.  This setup uses the task/affinity plugin to
3716       set task affinity (which it does better than task/cgroup) and uses the
3717       task/cgroup plugin to fence tasks into the specified resources, thus
3718       combining the best of both plugins.
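       For example, the recommended stacked configuration might look like
       the following fragments (one in slurm.conf, one in cgroup.conf):

              # slurm.conf
              TaskPlugin=task/affinity,task/cgroup

              # cgroup.conf
              TaskAffinity=no
              ConstrainCores=yes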
3719
3720 NOTE: For CRAY systems only: task/cgroup must be used with, and listed
3721 after task/cray in TaskPlugin. The task/affinity plugin can be listed
3722       anywhere, but the previous constraint must be satisfied.  So for CRAY
3723 systems, a configuration like this is recommended:
3724
3725 TaskPlugin=task/affinity,task/cray,task/cgroup
3726
3727
3728 TaskPluginParam
3729 Optional parameters for the task plugin. Multiple options
3730 should be comma separated. If None, Boards, Sockets, Cores,
3731 Threads, and/or Verbose are specified, they will override the
3732 --cpu-bind option specified by the user in the srun command.
3733 None, Boards, Sockets, Cores and Threads are mutually exclusive
3734 and since they decrease scheduling flexibility are not generally
3735 recommended (select no more than one of them). Cpusets and
3736 Sched are mutually exclusive (select only one of them). All
3737 TaskPluginParam options are supported on FreeBSD except Cpusets.
3738 The Sched option uses cpuset_setaffinity() on FreeBSD, not
3739 sched_setaffinity().
3740
3741
3742 Boards Bind tasks to boards by default. Overrides automatic
3743 binding.
3744
3745 Cores Bind tasks to cores by default. Overrides automatic
3746 binding.
3747
3748 Cpusets Use cpusets to perform task affinity functions. By
3749 default, Sched task binding is performed.
3750
3751 None Perform no task binding by default. Overrides auto‐
3752 matic binding.
3753
3754 Sched Use sched_setaffinity (if available) to bind tasks to
3755 processors.
3756
3757 Sockets Bind to sockets by default. Overrides automatic bind‐
3758 ing.
3759
3760 Threads Bind to threads by default. Overrides automatic bind‐
3761 ing.
3762
3763 SlurmdOffSpec
3764 If specialized cores or CPUs are identified for the
3765 node (i.e. the CoreSpecCount or CpuSpecList are con‐
3766 figured for the node), then Slurm daemons running on
3767 the compute node (i.e. slurmd and slurmstepd) should
3768 run outside of those resources (i.e. specialized
3769 resources are completely unavailable to Slurm daemons
3770 and jobs spawned by Slurm). This option may not be
3771 used with the task/cray plugin.
3772
3773 Verbose Verbosely report binding before tasks run. Overrides
3774 user options.
3775
3776 Autobind Set a default binding in the event that "auto binding"
3777 doesn't find a match. Set to Threads, Cores or Sock‐
3778 ets (E.g. TaskPluginParam=autobind=threads).
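              For example, to bind tasks to cores by default and report
              the resulting binding verbosely:

                     TaskPluginParam=Cores,Verbose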
3779
3780
3781 TaskProlog
3782              Fully qualified pathname of a program to be executed as the Slurm
3783 job's owner prior to initiation of each task. Besides the nor‐
3784 mal environment variables, this has SLURM_TASK_PID available to
3785 identify the process ID of the task being started. Standard
3786 output from this program can be used to control the environment
3787 variables and output for the user program.
3788
3789 export NAME=value Will set environment variables for the task
3790 being spawned. Everything after the equal
3791 sign to the end of the line will be used as
3792 the value for the environment variable.
3793 Exporting of functions is not currently sup‐
3794 ported.
3795
3796 print ... Will cause that line (without the leading
3797 "print ") to be printed to the job's stan‐
3798 dard output.
3799
3800 unset NAME Will clear environment variables for the
3801 task being spawned.
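       The directives above can be sketched as a minimal TaskProlog
       script.  This is a hypothetical example; the scratch directory
       name is illustrative, and only SLURM_TASK_PID is documented above
       as guaranteed to be set.

```shell
#!/bin/sh
# Minimal TaskProlog sketch.  Lines written to stdout are interpreted
# by slurmd: "export"/"unset" adjust the task's environment and
# "print" writes to the job's standard output.
emit_directives() {
    echo "export JOB_SCRATCH=/tmp/job_${SLURM_JOB_ID}"
    echo "print starting task pid ${SLURM_TASK_PID}"
    echo "unset TMPDIR"
}
emit_directives
```

       The script itself only prints directives; slurmd performs the
       actual environment changes before launching the task.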
3802
3803 The order of task prolog/epilog execution is as follows:
3804
3805 1. pre_launch_priv()
3806 Function in TaskPlugin
3807
3808                2. pre_launch()  Function in TaskPlugin
3809
3810                3. TaskProlog    System-wide per task program defined in
3811                                 slurm.conf
3812
3813                4. user prolog   Job step specific task program defined using
3814                                 srun's --task-prolog option or
3815                                 SLURM_TASK_PROLOG environment variable
3816
3817                5. Execute the job step's task
3818
3819                6. user epilog   Job step specific task program defined using
3820                                 srun's --task-epilog option or
3821                                 SLURM_TASK_EPILOG environment variable
3822
3823                7. TaskEpilog    System-wide per task program defined in
3824                                 slurm.conf
3825
3826                8. post_term()   Function in TaskPlugin
3827
3828
3829 TCPTimeout
3830 Time permitted for TCP connection to be established. Default
3831 value is 2 seconds.
3832
3833
3834 TmpFS Fully qualified pathname of the file system available to user
3835 jobs for temporary storage. This parameter is used in establish‐
3836 ing a node's TmpDisk space. The default value is "/tmp".
3837
3838
3839 TopologyParam
3840 Comma separated options identifying network topology options.
3841
3842 Dragonfly Optimize allocation for Dragonfly network. Valid
3843 when TopologyPlugin=topology/tree.
3844
3845 TopoOptional Only optimize allocation for network topology if
3846 the job includes a switch option. Since optimiz‐
3847 ing resource allocation for topology involves
3848 much higher system overhead, this option can be
3849 used to impose the extra overhead only on jobs
3850 which can take advantage of it. If most job allo‐
3851 cations are not optimized for network topology,
3852                     they may fragment resources to the point that
3853 topology optimization for other jobs will be dif‐
3854 ficult to achieve. NOTE: Jobs may span across
3855 nodes without common parent switches with this
3856 enabled.
3857
3858
3859 TopologyPlugin
3860 Identifies the plugin to be used for determining the network
3861 topology and optimizing job allocations to minimize network con‐
3862 tention. See NETWORK TOPOLOGY below for details. Additional
3863 plugins may be provided in the future which gather topology
3864 information directly from the network. Acceptable values
3865 include:
3866
3867 topology/3d_torus best-fit logic over three-dimensional
3868 topology
3869
3870              topology/node_rank orders nodes based upon information in a
3871 node_rank field in the node record as gen‐
3872 erated by a select plugin. Slurm performs a
3873 best-fit algorithm over those ordered nodes
3874
3875 topology/none default for other systems, best-fit logic
3876 over one-dimensional topology
3877
3878 topology/tree used for a hierarchical network as
3879 described in a topology.conf file
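       For example, to use the hierarchical topology plugin while only
       optimizing placement for jobs that request it:

              TopologyPlugin=topology/tree
              TopologyParam=TopoOptional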
3880
3881
3882 TrackWCKey
3883              Boolean yes or no.  Used to enable display and tracking of the
3884              Workload Characterization Key.  Must be set to track wckey usage
3885              correctly.  NOTE: You must also set TrackWCKey in your slurmdbd.conf
3886 file to create historical usage reports.
3887
3888
3889 TreeWidth
3890 Slurmd daemons use a virtual tree network for communications.
3891 TreeWidth specifies the width of the tree (i.e. the fanout). On
3892 architectures with a front end node running the slurmd daemon,
3893 the value must always be equal to or greater than the number of
3894 front end nodes which eliminates the need for message forwarding
3895 between the slurmd daemons. On other architectures the default
3896 value is 50, meaning each slurmd daemon can communicate with up
3897 to 50 other slurmd daemons and over 2500 nodes can be contacted
3898 with two message hops. The default value will work well for
3899 most clusters. Optimal system performance can typically be
3900 achieved if TreeWidth is set to the square root of the number of
3901 nodes in the cluster for systems having no more than 2500 nodes
3902 or the cube root for larger systems. The value may not exceed
3903 65533.
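              For example, on a 1024 node cluster the square root rule
              described above suggests:

                     TreeWidth=32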
3904
3905
3906 UnkillableStepProgram
3907 If the processes in a job step are determined to be unkillable
3908 for a period of time specified by the UnkillableStepTimeout
3909 variable, the program specified by UnkillableStepProgram will be
3910 executed. This program can be used to take special actions to
3911 clean up the unkillable processes and/or notify computer admin‐
3912              istrators.  The program will be run as SlurmdUser (usually "root")
3913 on the compute node. By default no program is run.
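       A minimal, hypothetical UnkillableStepProgram might simply format
       an alert for administrators.  The environment variable names and
       the delivery mechanism (mail, logger, etc.) are assumptions for
       illustration only.

```shell
#!/bin/sh
# Hypothetical UnkillableStepProgram sketch: format an alert that an
# admin notification tool could then deliver.
format_alert() {
    printf 'unkillable step on %s: job=%s step=%s\n' \
        "$(hostname)" "${SLURM_JOB_ID:-unknown}" "${SLURM_STEP_ID:-unknown}"
}
format_alert
```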
3914
3915
3916 UnkillableStepTimeout
3917 The length of time, in seconds, that Slurm will wait before
3918 deciding that processes in a job step are unkillable (after they
3919 have been signaled with SIGKILL) and execute UnkillableStepPro‐
3920 gram as described above. The default timeout value is 60 sec‐
3921 onds. If exceeded, the compute node will be drained to prevent
3922 future jobs from being scheduled on the node.
3923
3924
3925 UsePAM If set to 1, PAM (Pluggable Authentication Modules for Linux)
3926 will be enabled. PAM is used to establish the upper bounds for
3927 resource limits. With PAM support enabled, local system adminis‐
3928 trators can dynamically configure system resource limits. Chang‐
3929 ing the upper bound of a resource limit will not alter the lim‐
3930 its of running jobs, only jobs started after a change has been
3931 made will pick up the new limits. The default value is 0 (not
3932 to enable PAM support). Remember that PAM also needs to be con‐
3933 figured to support Slurm as a service. For sites using PAM's
3934 directory based configuration option, a configuration file named
3935 slurm should be created. The module-type, control-flags, and
3936 module-path names that should be included in the file are:
3937 auth required pam_localuser.so
3938 auth required pam_shells.so
3939 account required pam_unix.so
3940 account required pam_access.so
3941 session required pam_unix.so
3942 For sites configuring PAM with a general configuration file, the
3943 appropriate lines (see above), where slurm is the service-name,
3944 should be added.
3945
3946       NOTE: The UsePAM option has nothing to do with the con‐
3947       tribs/pam/pam_slurm and/or contribs/pam_slurm_adopt modules.
3948       These two modules can work independently of the value set for
3949 UsePAM.
3950
3951
3952 VSizeFactor
3953 Memory specifications in job requests apply to real memory size
3954 (also known as resident set size). It is possible to enforce
3955 virtual memory limits for both jobs and job steps by limiting
3956 their virtual memory to some percentage of their real memory
3957 allocation. The VSizeFactor parameter specifies the job's or job
3958 step's virtual memory limit as a percentage of its real memory
3959 limit. For example, if a job's real memory limit is 500MB and
3960 VSizeFactor is set to 101 then the job will be killed if its
3961 real memory exceeds 500MB or its virtual memory exceeds 505MB
3962 (101 percent of the real memory limit). The default value is 0,
3963 which disables enforcement of virtual memory limits. The value
3964 may not exceed 65533 percent.
3965
3966
3967 WaitTime
3968 Specifies how many seconds the srun command should by default
3969 wait after the first task terminates before terminating all
3970 remaining tasks. The "--wait" option on the srun command line
3971 overrides this value. The default value is 0, which disables
3972 this feature. May not exceed 65533 seconds.
3973
3974
3975 X11Parameters
3976 For use with Slurm's built-in X11 forwarding implementation.
3977
3978 local_xauthority
3979 If set, xauth data on the compute node will be placed in
3980 a temporary file (under TmpFS) rather than in ~/.Xau‐
3981 thority, and the XAUTHORITY environment variable will be
3982 injected into the job's environment (as well as any
3983 process captured by pam_slurm_adopt). This can help
3984 avoid file locking contention on the user's home direc‐
3985 tory.
3986
3987 use_raw_hostname
3988 If set, xauth hostname will use the raw value of geth‐
3989 ostname() instead of the local part-only (as is used
3990 elsewhere within Slurm).
3991
3992
3993 The configuration of nodes (or machines) to be managed by Slurm is also
3994 specified in /etc/slurm.conf. Changes in node configuration (e.g.
3995 adding nodes, changing their processor count, etc.) require restarting
3996 both the slurmctld daemon and the slurmd daemons. All slurmd daemons
3997 must know each node in the system to forward messages in support of
3998 hierarchical communications. Only the NodeName must be supplied in the
3999 configuration file. All other node configuration information is
4000 optional. It is advisable to establish baseline node configurations,
4001 especially if the cluster is heterogeneous. Nodes which register to
4002 the system with less than the configured resources (e.g. too little
4003 memory), will be placed in the "DOWN" state to avoid scheduling jobs on
4004 them. Establishing baseline configurations will also speed Slurm's
4005 scheduling process by permitting it to compare job requirements against
4006 these (relatively few) configuration parameters and possibly avoid hav‐
4007 ing to check job requirements against every individual node's configu‐
4008 ration. The resources checked at node registration time are: CPUs,
4009 RealMemory and TmpDisk. While baseline values for each of these can be
4010 established in the configuration file, the actual values upon node reg‐
4011 istration are recorded and these actual values may be used for schedul‐
4012 ing purposes (depending upon the value of FastSchedule in the configu‐
4013       ration file).
4014
4015 Default values can be specified with a record in which NodeName is
4016 "DEFAULT". The default entry values will apply only to lines following
4017 it in the configuration file and the default values can be reset multi‐
4018 ple times in the configuration file with multiple entries where "Node‐
4019 Name=DEFAULT". Each line where NodeName is "DEFAULT" will replace or
4020       add to previous default values and not reinitialize the default val‐
4021 ues. The "NodeName=" specification must be placed on every line
4022 describing the configuration of nodes. A single node name can not
4023 appear as a NodeName value in more than one line (duplicate node name
4024 records will be ignored). In fact, it is generally possible and desir‐
4025 able to define the configurations of all nodes in only a few lines.
4026 This convention permits significant optimization in the scheduling of
4027 larger clusters. In order to support the concept of jobs requiring
4028 consecutive nodes on some architectures, node specifications should be
4029       placed in this file in consecutive order.  No single node name may be
4030 listed more than once in the configuration file. Use "DownNodes=" to
4031 record the state of nodes which are temporarily in a DOWN, DRAIN or
4032 FAILING state without altering permanent configuration information. A
4033       job step's tasks are allocated to nodes in the order the nodes appear in
4034 the configuration file. There is presently no capability within Slurm
4035 to arbitrarily order a job step's tasks.
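       For example, a heterogeneous cluster might be described with two
       DEFAULT records (node names and values are illustrative):

              NodeName=DEFAULT Sockets=2 CoresPerSocket=8 ThreadsPerCore=2 RealMemory=64000
              NodeName=tux[001-100]
              NodeName=DEFAULT RealMemory=256000
              NodeName=bigmem[01-04]

       The second DEFAULT record replaces only RealMemory; the previously
       established socket, core and thread counts still apply to the
       bigmem nodes.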
4036
4037 Multiple node names may be comma separated (e.g. "alpha,beta,gamma")
4038 and/or a simple node range expression may optionally be used to specify
4039 numeric ranges of nodes to avoid building a configuration file with
4040 large numbers of entries. The node range expression can contain one
4041 pair of square brackets with a sequence of comma separated numbers
4042 and/or ranges of numbers separated by a "-" (e.g. "linux[0-64,128]", or
4043 "lx[15,18,32-33]"). Note that the numeric ranges can include one or
4044 more leading zeros to indicate the numeric portion has a fixed number
4045 of digits (e.g. "linux[0000-1023]"). Multiple numeric ranges can be
4046 included in the expression (e.g. "rack[0-63]_blade[0-41]"). If one or
4047 more numeric expressions are included, one of them must be at the end
4048 of the name (e.g. "unit[0-31]rack" is invalid), but arbitrary names can
4049 always be used in a comma separated list.
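       For example, the following two lines are equivalent ways of naming
       the same set of four nodes:

              NodeName=lx15,lx18,lx32,lx33
              NodeName=lx[15,18,32-33]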
4050
4051       The node configuration specifies the following information:
4052
4053
4054 NodeName
4055 Name that Slurm uses to refer to a node. Typically this would
4056 be the string that "/bin/hostname -s" returns. It may also be
4057 the fully qualified domain name as returned by "/bin/hostname
4058 -f" (e.g. "foo1.bar.com"), or any valid domain name associated
4059 with the host through the host database (/etc/hosts) or DNS,
4060 depending on the resolver settings. Note that if the short form
4061 of the hostname is not used, it may prevent use of hostlist
4062 expressions (the numeric portion in brackets must be at the end
4063 of the string). It may also be an arbitrary string if NodeHost‐
4064 name is specified. If the NodeName is "DEFAULT", the values
4065 specified with that record will apply to subsequent node speci‐
4066 fications unless explicitly set to other values in that node
4067 record or replaced with a different set of default values. Each
4068 line where NodeName is "DEFAULT" will replace or add to previous
4069              default values and not reinitialize the default values.  For
4070 architectures in which the node order is significant, nodes will
4071 be considered consecutive in the order defined. For example, if
4072 the configuration for "NodeName=charlie" immediately follows the
4073 configuration for "NodeName=baker" they will be considered adja‐
4074 cent in the computer.
4075
4076
4077 NodeHostname
4078 Typically this would be the string that "/bin/hostname -s"
4079 returns. It may also be the fully qualified domain name as
4080 returned by "/bin/hostname -f" (e.g. "foo1.bar.com"), or any
4081 valid domain name associated with the host through the host
4082 database (/etc/hosts) or DNS, depending on the resolver set‐
4083 tings. Note that if the short form of the hostname is not used,
4084 it may prevent use of hostlist expressions (the numeric portion
4085 in brackets must be at the end of the string). A node range
4086 expression can be used to specify a set of nodes. If an expres‐
4087 sion is used, the number of nodes identified by NodeHostname on
4088 a line in the configuration file must be identical to the number
4089 of nodes identified by NodeName. By default, the NodeHostname
4090 will be identical in value to NodeName.
4091
4092
4093 NodeAddr
4094 Name that a node should be referred to in establishing a commu‐
4095 nications path. This name will be used as an argument to the
4096 gethostbyname() function for identification. If a node range
4097 expression is used to designate multiple nodes, they must
4098 exactly match the entries in the NodeName (e.g. "Node‐
4099 Name=lx[0-7] NodeAddr=elx[0-7]"). NodeAddr may also contain IP
4100 addresses. By default, the NodeAddr will be identical in value
4101 to NodeHostname.
4102
4103
4104 Boards Number of Baseboards in nodes with a baseboard controller. Note
4105 that when Boards is specified, SocketsPerBoard, CoresPerSocket,
4106 and ThreadsPerCore should be specified. Boards and CPUs are
4107 mutually exclusive. The default value is 1.
4108
4109
4110 CoreSpecCount
4111 Number of cores reserved for system use. These cores will not
4112 be available for allocation to user jobs. Depending upon the
4113 TaskPluginParam option of SlurmdOffSpec, Slurm daemons (i.e.
4114 slurmd and slurmstepd) may either be confined to these resources
4115 (the default) or prevented from using these resources. Isola‐
4116 tion of the Slurm daemons from user jobs may improve application
4117 performance. If this option and CpuSpecList are both designated
4118 for a node, an error is generated. For information on the algo‐
4119 rithm used by Slurm to select the cores refer to the core spe‐
4120 cialization documentation (
4121 https://slurm.schedmd.com/core_spec.html ).
4122
4123
4124 CoresPerSocket
4125 Number of cores in a single physical processor socket (e.g.
4126 "2"). The CoresPerSocket value describes physical cores, not
4127 the logical number of processors per socket. NOTE: If you have
4128 multi-core processors, you will likely need to specify this
4129 parameter in order to optimize scheduling. The default value is
4130 1.
4131
4132
4133 CpuBind
4134 If a job step request does not specify an option to control how
4135 tasks are bound to allocated CPUs (--cpu-bind) and all nodes
4136              allocated to the job have the same CpuBind option, the node Cpu‐
4137              Bind option will control how tasks are bound to allocated
4138 resources. Supported values for CpuBind are "none", "board",
4139 "socket", "ldom" (NUMA), "core" and "thread".
4140
4141
4142 CPUs Number of logical processors on the node (e.g. "2"). CPUs and
4143 Boards are mutually exclusive. It can be set to the total number
4144 of sockets, cores or threads. This can be useful when you want
4145 to schedule only the cores on a hyper-threaded node. If CPUs is
4146 omitted, it will be set equal to the product of Sockets, Cores‐
4147 PerSocket, and ThreadsPerCore. The default value is 1.
4148
4149
4150 CpuSpecList
4151 A comma delimited list of Slurm abstract CPU IDs reserved for
4152 system use. The list will be expanded to include all other
4153 CPUs, if any, on the same cores. These cores will not be avail‐
4154 able for allocation to user jobs. Depending upon the TaskPlug‐
4155 inParam option of SlurmdOffSpec, Slurm daemons (i.e. slurmd and
4156 slurmstepd) may either be confined to these resources (the
4157 default) or prevented from using these resources. Isolation of
4158 the Slurm daemons from user jobs may improve application perfor‐
4159 mance. If this option and CoreSpecCount are both designated for
4160 a node, an error is generated. This option has no effect unless
4161 cgroup job confinement is also configured (TaskPlu‐
4162 gin=task/cgroup with ConstrainCores=yes in cgroup.conf).
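              For example, to reserve abstract CPUs 0 and 1 for system
              use on a node (node name and counts are illustrative),
              with the required cgroup confinement in place:

                     TaskPlugin=task/cgroup            # with ConstrainCores=yes in cgroup.conf
                     NodeName=tux01 CPUs=32 CpuSpecList=0,1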
4163
4164
4165 Feature
4166 A comma delimited list of arbitrary strings indicative of some
4167 characteristic associated with the node. There is no value
4168              associated with a feature at this time; a node either has a fea‐
4169              ture or it does not.  If desired a feature may contain a numeric
4170 component indicating, for example, processor speed. By default
4171 a node has no features. Also see Gres.
4172
4173
4174       Gres   A comma delimited list of generic resource specifications for a
4175 node. The format is: "<name>[:<type>][:no_consume]:<num‐
4176 ber>[K|M|G]". The first field is the resource name, which
4177 matches the GresType configuration parameter name. The optional
4178 type field might be used to identify a model of that generic
4179 resource. A generic resource can also be specified as non-con‐
4180 sumable (i.e. multiple jobs can use the same generic resource)
4181 with the optional field ":no_consume". The final field must
4182              specify a generic resource count.  A suffix of "K", "M", "G",
4183 "T" or "P" may be used to multiply the number by 1024, 1048576,
4184 1073741824, etc. respectively.
4185 (e.g."Gres=gpu:tesla:1,gpu:kepler:1,bandwidth:lustre:no_con‐
4186 sume:4G"). By default a node has no generic resources and its
4187 maximum count is that of an unsigned 64bit integer. Also see
4188 Feature.
4189
4190
4191 MemSpecLimit
4192 Amount of memory, in megabytes, reserved for system use and not
4193 available for user allocations. If the task/cgroup plugin is
4194 configured and that plugin constrains memory allocations (i.e.
4195 TaskPlugin=task/cgroup in slurm.conf, plus ConstrainRAMSpace=yes
4196 in cgroup.conf), then Slurm compute node daemons (slurmd plus
4197 slurmstepd) will be allocated the specified memory limit. Note
4198              that Memory must be configured as a consumable resource in
4199              SelectTypeParameters for this option to work.  The daemons will
4200              not be killed if they exhaust the memory allocation (i.e. the
4201              Out-Of-Memory Killer is disabled for the daemon's memory
4202              cgroup).  If the task/cgroup plugin is
4203 not configured, the specified memory will only be unavailable
4204 for user allocations.
4205
4206
4207 Port The port number that the Slurm compute node daemon, slurmd, lis‐
4208 tens to for work on this particular node. By default there is a
4209 single port number for all slurmd daemons on all compute nodes
4210 as defined by the SlurmdPort configuration parameter. Use of
4211 this option is not generally recommended except for development
4212 or testing purposes. If multiple slurmd daemons execute on a
4213 node this can specify a range of ports.
4214
4215 Note: On Cray systems, Realm-Specific IP Addressing (RSIP) will
4216 automatically try to interact with anything opened on ports
4217 8192-60000. Configure Port to use a port outside of the config‐
4218 ured SrunPortRange and RSIP's port range.
4219
4220
4221 Procs See CPUs.
4222
4223
4224 RealMemory
4225 Size of real memory on the node in megabytes (e.g. "2048"). The
4226              default value is 1.  Lowering RealMemory in order to set aside
4227              some amount for the OS, unavailable for job allocations, will
4228              not work as intended if Memory is not set as a consumable
4229              resource in SelectTypeParameters, so one of the *_Memory options
4230              must be enabled for that goal to be accomplished.
4231 Also see MemSpecLimit.
4232
4233
4234 Reason Identifies the reason for a node being in state "DOWN",
4235              "DRAINED", "DRAINING", "FAIL" or "FAILING".  Use quotes to
4236 enclose a reason having more than one word.
4237
4238
4239 Sockets
4240 Number of physical processor sockets/chips on the node (e.g.
4241 "2"). If Sockets is omitted, it will be inferred from CPUs,
4242 CoresPerSocket, and ThreadsPerCore. NOTE: If you have
4243 multi-core processors, you will likely need to specify these
4244 parameters. Sockets and SocketsPerBoard are mutually exclusive.
4245 If Sockets is specified when Boards is also used, Sockets is
4246 interpreted as SocketsPerBoard rather than total sockets. The
4247 default value is 1.
4248
4249
4250 SocketsPerBoard
4251 Number of physical processor sockets/chips on a baseboard.
4252 Sockets and SocketsPerBoard are mutually exclusive. The default
4253 value is 1.
4254
4255
4256 State State of the node with respect to the initiation of user jobs.
4257 Acceptable values are "CLOUD", "DOWN", "DRAIN", "FAIL", "FAIL‐
4258 ING", "FUTURE" and "UNKNOWN". Node states of "BUSY" and "IDLE"
4259 should not be specified in the node configuration, but set the
4260 node state to "UNKNOWN" instead. Setting the node state to
4261 "UNKNOWN" will result in the node state being set to "BUSY",
4262 "IDLE" or other appropriate state based upon recovered system
4263 state information. The default value is "UNKNOWN". Also see
4264 the DownNodes parameter below.
4265
4266              CLOUD     Indicates the node exists in the cloud.  Its initial
4267                        state will be treated as powered down.  The node will
4268                        be available for use after its state is recovered
4269 from Slurm's state save file or the slurmd daemon
4270 starts on the compute node.
4271
4272 DOWN Indicates the node failed and is unavailable to be
4273 allocated work.
4274
4275 DRAIN Indicates the node is unavailable to be allocated
4276                        work.
4277
4278 FAIL Indicates the node is expected to fail soon, has no
4279 jobs allocated to it, and will not be allocated to any
4280 new jobs.
4281
4282 FAILING Indicates the node is expected to fail soon, has one
4283 or more jobs allocated to it, but will not be allo‐
4284 cated to any new jobs.
4285
4286 FUTURE Indicates the node is defined for future use and need
4287 not exist when the Slurm daemons are started. These
4288 nodes can be made available for use simply by updating
4289 the node state using the scontrol command rather than
4290 restarting the slurmctld daemon. After these nodes are
4291 made available, change their State in the slurm.conf
4292 file. Until these nodes are made available, they will
4293                        not be seen using any Slurm commands nor will any
4294 attempt be made to contact them.
4295
4296 UNKNOWN Indicates the node's state is undefined (BUSY or
4297 IDLE), but will be established when the slurmd daemon
4298 on that node registers. The default value is
4299 "UNKNOWN".
4300
4301
4302 ThreadsPerCore
4303 Number of logical threads in a single physical core (e.g. "2").
4304              Note that Slurm can allocate resources to jobs down to the
4305 resolution of a core. If your system is configured with more
4306 than one thread per core, execution of a different job on each
4307 thread is not supported unless you configure SelectTypeParame‐
4308 ters=CR_CPU plus CPUs; do not configure Sockets, CoresPerSocket
4309              or ThreadsPerCore.  A job can execute one task per thread from
4310 within one job step or execute a distinct job step on each of
4311 the threads. Note also if you are running with more than 1
4312 thread per core and running the select/cons_res plugin you will
4313 want to set the SelectTypeParameters variable to something other
4314 than CR_CPU to avoid unexpected results. The default value is
4315 1.
4316
4317
4318 TmpDisk
4319 Total size of temporary disk storage in TmpFS in megabytes (e.g.
4320 "16384"). TmpFS (for "Temporary File System") identifies the
4321 location which jobs should use for temporary storage. Note this
4322 does not indicate the amount of free space available to the user
4323              on the node, only the total file system size.  The system admin‐
4324              istrator should ensure this file system is purged as needed so
4325 that user jobs have access to most of this space. The Prolog
4326 and/or Epilog programs (specified in the configuration file)
4327 might be used to ensure the file system is kept clean. The
4328 default value is 0.
4329
4330
4331       TRESWeights
4332              TRESWeights are used to calculate a value that represents how
4333              busy a node is.  Currently only used in federation configura‐
4334              tions.  TRESWeights are different from TRESBillingWeights --
4335              which is used for fairshare calculations.
4336
4337 TRES weights are specified as a comma-separated list of <TRES
4338 Type>=<TRES Weight> pairs.
4339 e.g.
4340 NodeName=node1 ... TRESWeights="CPU=1.0,Mem=0.25G,GRES/gpu=2.0"
4341
4342 By default the weighted TRES value is calculated as the sum of
4343 all node TRES types multiplied by their corresponding TRES
4344 weight.
4345
4346 If PriorityFlags=MAX_TRES is configured, the weighted TRES value
4347 is calculated as the MAX of individual node TRES' (e.g. cpus,
4348 mem, gres).
4349
4350
4351 Weight The priority of the node for scheduling purposes. All things
4352 being equal, jobs will be allocated the nodes with the lowest
4353 weight which satisfies their requirements. For example, a het‐
4354 erogeneous collection of nodes might be placed into a single
4355 partition for greater system utilization, responsiveness and
4356 capability. It would be preferable to allocate smaller memory
4357 nodes rather than larger memory nodes if either will satisfy a
4358 job's requirements. The units of weight are arbitrary, but
4359 larger weights should be assigned to nodes with more processors,
4360 memory, disk space, higher processor speed, etc. Note that if a
4361 job allocation request can not be satisfied using the nodes with
4362 the lowest weight, the set of nodes with the next lowest weight
4363 is added to the set of nodes under consideration for use (repeat
4364 as needed for higher weight values). If you absolutely want to
4365 minimize the number of higher weight nodes allocated to a job
4366 (at a cost of higher scheduling overhead), give each node a dis‐
4367 tinct Weight value and they will be added to the pool of nodes
4368 being considered for scheduling individually. The default value
4369 is 1.
4370
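As an illustration of Weight, the following sketch steers jobs toward the smaller nodes first (node names and sizes here are hypothetical, not part of this manual's examples):

```
NodeName=small[001-016] RealMemory=65536  Weight=10
NodeName=big[01-04]     RealMemory=524288 Weight=50
```

Jobs whose requirements are satisfied by the small nodes will be allocated those first; the large-memory nodes are considered only when the lower-weight set cannot satisfy the request.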
4371
4372 The "DownNodes=" configuration permits you to mark certain nodes as in
4373 a DOWN, DRAIN, FAIL, or FAILING state without altering the permanent
4374 configuration information listed under a "NodeName=" specification.
4375
4376
4377 DownNodes
4378 Any node name, or list of node names, from the "NodeName=" spec‐
4379 ifications.
4380
4381
4382       Reason Identifies the reason for a node being in state "DOWN", "DRAIN",
4383              "FAIL" or "FAILING". Use quotes to enclose a reason having more
4384              than one word.
4385
4386
4387 State State of the node with respect to the initiation of user jobs.
4388 Acceptable values are "DOWN", "DRAIN", "FAIL", "FAILING" and
4389 "UNKNOWN". Node states of "BUSY" and "IDLE" should not be spec‐
4390 ified in the node configuration, but set the node state to
4391 "UNKNOWN" instead. Setting the node state to "UNKNOWN" will
4392 result in the node state being set to "BUSY", "IDLE" or other
4393 appropriate state based upon recovered system state information.
4394 The default value is "UNKNOWN".
4395
4396 DOWN Indicates the node failed and is unavailable to be
4397 allocated work.
4398
4399              DRAIN     Indicates the node is unavailable to be allocated
4400                        work.
4401
4402 FAIL Indicates the node is expected to fail soon, has no
4403 jobs allocated to it, and will not be allocated to any
4404 new jobs.
4405
4406 FAILING Indicates the node is expected to fail soon, has one
4407 or more jobs allocated to it, but will not be allo‐
4408 cated to any new jobs.
4409
4410 UNKNOWN Indicates the node's state is undefined (BUSY or
4411 IDLE), but will be established when the slurmd daemon
4412 on that node registers. The default value is
4413 "UNKNOWN".
4414
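For example, a DownNodes record taking several nodes out of service might look like this (node names and reason are illustrative):

```
DownNodes=node[12-15] State=DOWN Reason="power supply failure"
```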
4415
4416 On computers where frontend nodes are used to execute batch scripts
4417 rather than compute nodes (Cray ALPS systems), one may configure one or
4418 more frontend nodes using the configuration parameters defined below.
4419 These options are very similar to those used in configuring compute
4420 nodes. These options may only be used on systems configured and built
4421       with the appropriate parameters (--enable-front-end) or a system deter‐
4422 mined to have the appropriate architecture by the configure script
4423 (Cray ALPS systems). The front end configuration specifies the follow‐
4424 ing information:
4425
4426
4427 AllowGroups
4428 Comma separated list of group names which may execute jobs on
4429 this front end node. By default, all groups may use this front
4430 end node. If at least one group associated with the user
4431              attempting to execute the job is in AllowGroups, they will be
4432              permitted to use this front end node. May not be used with the
4433 DenyGroups option.
4434
4435
4436 AllowUsers
4437 Comma separated list of user names which may execute jobs on
4438 this front end node. By default, all users may use this front
4439 end node. May not be used with the DenyUsers option.
4440
4441
4442 DenyGroups
4443 Comma separated list of group names which are prevented from
4444 executing jobs on this front end node. May not be used with the
4445 AllowGroups option.
4446
4447
4448 DenyUsers
4449 Comma separated list of user names which are prevented from exe‐
4450 cuting jobs on this front end node. May not be used with the
4451 AllowUsers option.
4452
4453
4454 FrontendName
4455 Name that Slurm uses to refer to a frontend node. Typically
4456 this would be the string that "/bin/hostname -s" returns. It
4457 may also be the fully qualified domain name as returned by
4458 "/bin/hostname -f" (e.g. "foo1.bar.com"), or any valid domain
4459 name associated with the host through the host database
4460 (/etc/hosts) or DNS, depending on the resolver settings. Note
4461 that if the short form of the hostname is not used, it may pre‐
4462 vent use of hostlist expressions (the numeric portion in brack‐
4463 ets must be at the end of the string). If the FrontendName is
4464 "DEFAULT", the values specified with that record will apply to
4465 subsequent node specifications unless explicitly set to other
4466 values in that frontend node record or replaced with a different
4467 set of default values. Each line where FrontendName is
4468              "DEFAULT" will replace or add to previous default values and
4469              not reinitialize the default values. Note that since the naming
4470 of front end nodes would typically not follow that of the com‐
4471 pute nodes (e.g. lacking X, Y and Z coordinates found in the
4472 compute node naming scheme), each front end node name should be
4473              listed separately and without a hostlist expression (i.e.
4474              "frontend00,frontend01" rather than "frontend[00-01]").
4475
4476
4477 FrontendAddr
4478 Name that a frontend node should be referred to in establishing
4479 a communications path. This name will be used as an argument to
4480 the gethostbyname() function for identification. As with Fron‐
4481 tendName, list the individual node addresses rather than using a
4482 hostlist expression. The number of FrontendAddr records per
4483 line must equal the number of FrontendName records per line
4484              (i.e. you can't map two node names to one address). FrontendAddr
4485 may also contain IP addresses. By default, the FrontendAddr
4486 will be identical in value to FrontendName.
4487
4488
4489 Port The port number that the Slurm compute node daemon, slurmd, lis‐
4490 tens to for work on this particular frontend node. By default
4491 there is a single port number for all slurmd daemons on all
4492 frontend nodes as defined by the SlurmdPort configuration param‐
4493 eter. Use of this option is not generally recommended except for
4494 development or testing purposes.
4495
4496 Note: On Cray systems, Realm-Specific IP Addressing (RSIP) will
4497 automatically try to interact with anything opened on ports
4498 8192-60000. Configure Port to use a port outside of the config‐
4499 ured SrunPortRange and RSIP's port range.
4500
4501
4502 Reason Identifies the reason for a frontend node being in state "DOWN",
4503              "DRAINED", "DRAINING", "FAIL" or "FAILING".  Use quotes to
4504 enclose a reason having more than one word.
4505
4506
4507 State State of the frontend node with respect to the initiation of
4508 user jobs. Acceptable values are "DOWN", "DRAIN", "FAIL",
4509 "FAILING" and "UNKNOWN". "DOWN" indicates the frontend node has
4510 failed and is unavailable to be allocated work. "DRAIN" indi‐
4511 cates the frontend node is unavailable to be allocated work.
4512 "FAIL" indicates the frontend node is expected to fail soon, has
4513 no jobs allocated to it, and will not be allocated to any new
4514 jobs. "FAILING" indicates the frontend node is expected to fail
4515 soon, has one or more jobs allocated to it, but will not be
4516 allocated to any new jobs. "UNKNOWN" indicates the frontend
4517 node's state is undefined (BUSY or IDLE), but will be estab‐
4518 lished when the slurmd daemon on that node registers. The
4519 default value is "UNKNOWN". Also see the DownNodes parameter
4520 below.
4521
4522 For example: "FrontendName=frontend[00-03] FrontendAddr=efron‐
4523 tend[00-03] State=UNKNOWN" is used to define four front end
4524 nodes for running slurmd daemons.
4525
4526
4527 The partition configuration permits you to establish different job lim‐
4528 its or access controls for various groups (or partitions) of nodes.
4529 Nodes may be in more than one partition, making partitions serve as
4530 general purpose queues. For example one may put the same set of nodes
4531 into two different partitions, each with different constraints (time
4532 limit, job sizes, groups allowed to use the partition, etc.). Jobs are
4533 allocated resources within a single partition. Default values can be
4534 specified with a record in which PartitionName is "DEFAULT". The
4535 default entry values will apply only to lines following it in the con‐
4536 figuration file and the default values can be reset multiple times in
4537 the configuration file with multiple entries where "Partition‐
4538 Name=DEFAULT". The "PartitionName=" specification must be placed on
4539 every line describing the configuration of partitions. Each line where
4540 PartitionName is "DEFAULT" will replace or add to previous default val‐
4541       ues and not reinitialize the default values. A single partition name
4542 can not appear as a PartitionName value in more than one line (dupli‐
4543 cate partition name records will be ignored). If a partition that is
4544 in use is deleted from the configuration and slurm is restarted or
4545 reconfigured (scontrol reconfigure), jobs using the partition are can‐
4546 celed. NOTE: Put all parameters for each partition on a single line.
4547 Each line of partition configuration information should represent a
4548 different partition. The partition configuration file contains the
4549 following information:
4550
4551
4552 AllocNodes
4553 Comma separated list of nodes from which users can submit jobs
4554 in the partition. Node names may be specified using the node
4555 range expression syntax described above. The default value is
4556 "ALL".
4557
4558
4559 AllowAccounts
4560 Comma separated list of accounts which may execute jobs in the
4561 partition. The default value is "ALL". NOTE: If AllowAccounts
4562 is used then DenyAccounts will not be enforced. Also refer to
4563 DenyAccounts.
4564
4565
4566 AllowGroups
4567 Comma separated list of group names which may execute jobs in
4568 the partition. If at least one group associated with the user
4569              attempting to execute the job is in AllowGroups, they will be
4570              permitted to use this partition. Jobs executed as user root can
4571 use any partition without regard to the value of AllowGroups.
4572 If user root attempts to execute a job as another user (e.g.
4573 using srun's --uid option), this other user must be in one of
4574 groups identified by AllowGroups for the job to successfully
4575 execute. The default value is "ALL". When set, all partitions
4576              that a user does not have access to will be hidden from display
4577 regardless of the settings used for PrivateData. NOTE: For per‐
4578 formance reasons, Slurm maintains a list of user IDs allowed to
4579 use each partition and this is checked at job submission time.
4580 This list of user IDs is updated when the slurmctld daemon is
4581 restarted, reconfigured (e.g. "scontrol reconfig") or the parti‐
4582              tion's AllowGroups value is reset, even if its value is unchanged
4583 (e.g. "scontrol update PartitionName=name AllowGroups=group").
4584              For a user's access to a partition to change, both the user's
4585              group membership and Slurm's internal user ID list must change,
4586              the latter using one of the methods described above.
4587
4588
4589 AllowQos
4590 Comma separated list of Qos which may execute jobs in the parti‐
4591 tion. Jobs executed as user root can use any partition without
4592 regard to the value of AllowQos. The default value is "ALL".
4593 NOTE: If AllowQos is used then DenyQos will not be enforced.
4594 Also refer to DenyQos.
4595
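A partition restricted by both group and QOS could be sketched as follows (the partition, group and QOS names are hypothetical):

```
PartitionName=restricted Nodes=node[001-032] AllowGroups=physics,chem AllowQos=normal,high
```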
4596
4597 Alternate
4598 Partition name of alternate partition to be used if the state of
this partition is "DRAIN" or "INACTIVE".
4600
4601
4602 CpuBind
4603              If a job step request does not specify an option to control how
4604              tasks are bound to allocated CPUs (--cpu-bind), and the nodes
4605              allocated to the job do not all have the same node-level CpuBind
4606              option, then the partition's CpuBind option will control how
4607              tasks are bound to allocated resources. Supported values for
4608              CpuBind are "none", "board", "socket", "ldom" (NUMA), "core"
4609              and "thread".
4610
4611
4612 Default
4613 If this keyword is set, jobs submitted without a partition spec‐
4614 ification will utilize this partition. Possible values are
4615 "YES" and "NO". The default value is "NO".
4616
4617
4618 DefMemPerCPU
4619 Default real memory size available per allocated CPU in
4620 megabytes. Used to avoid over-subscribing memory and causing
4621 paging. DefMemPerCPU would generally be used if individual pro‐
4622 cessors are allocated to jobs (SelectType=select/cons_res). If
4623 not set, the DefMemPerCPU value for the entire cluster will be
4624 used. Also see DefMemPerNode and MaxMemPerCPU. DefMemPerCPU
4625 and DefMemPerNode are mutually exclusive.
4626
4627
4628 DefMemPerNode
4629 Default real memory size available per allocated node in
4630 megabytes. Used to avoid over-subscribing memory and causing
4631 paging. DefMemPerNode would generally be used if whole nodes
4632 are allocated to jobs (SelectType=select/linear) and resources
4633 are over-subscribed (OverSubscribe=yes or OverSubscribe=force).
4634 If not set, the DefMemPerNode value for the entire cluster will
4635 be used. Also see DefMemPerCPU and MaxMemPerNode. DefMemPerCPU
4636 and DefMemPerNode are mutually exclusive.
4637
4638
4639 DenyAccounts
4640 Comma separated list of accounts which may not execute jobs in
4641              the partition. By default, no accounts are denied access. NOTE:
4642 If AllowAccounts is used then DenyAccounts will not be enforced.
4643 Also refer to AllowAccounts.
4644
4645
4646 DenyQos
4647 Comma separated list of Qos which may not execute jobs in the
4648              partition. By default, no QOS are denied access. NOTE: If
4649              AllowQos is used then DenyQos will not be enforced. Also refer
4650              to AllowQos.
4651
4652
4653 DefaultTime
4654 Run time limit used for jobs that don't specify a value. If not
4655 set then MaxTime will be used. Format is the same as for Max‐
4656 Time.
4657
4658
4659 DisableRootJobs
4660 If set to "YES" then user root will be prevented from running
4661 any jobs on this partition. The default value will be the value
4662 of DisableRootJobs set outside of a partition specification
4663 (which is "NO", allowing user root to execute jobs).
4664
4665
4666 ExclusiveUser
4667 If set to "YES" then nodes will be exclusively allocated to
4668 users. Multiple jobs may be run for the same user, but only one
4669 user can be active at a time. This capability is also available
4670 on a per-job basis by using the --exclusive=user option.
4671
4672
4673 GraceTime
4674 Specifies, in units of seconds, the preemption grace time to be
4675 extended to a job which has been selected for preemption. The
4676              default value is zero, meaning no preemption grace time is allowed on
4677 this partition. Once a job has been selected for preemption,
4678 its end time is set to the current time plus GraceTime. The
4679 job's tasks are immediately sent SIGCONT and SIGTERM signals in
4680 order to provide notification of its imminent termination. This
4681 is followed by the SIGCONT, SIGTERM and SIGKILL signal sequence
4682 upon reaching its new end time. This second set of signals is
4683 sent to both the tasks and the containing batch script, if
4684 applicable. Meaningful only for PreemptMode=CANCEL. See also
4685 the global KillWait configuration parameter.
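For instance, a preemptable partition that grants jobs two minutes to clean up after being selected for preemption might be sketched as (partition and node names are illustrative):

```
PartitionName=scavenge Nodes=node[001-064] PreemptMode=CANCEL GraceTime=120
```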
4686
4687
4688 Hidden Specifies if the partition and its jobs are to be hidden by
4689 default. Hidden partitions will by default not be reported by
4690 the Slurm APIs or commands. Possible values are "YES" and "NO".
4691 The default value is "NO". Note that partitions that a user
4692 lacks access to by virtue of the AllowGroups parameter will also
4693 be hidden by default.
4694
4695
4696 LLN Schedule resources to jobs on the least loaded nodes (based upon
4697 the number of idle CPUs). This is generally only recommended for
4698 an environment with serial jobs as idle resources will tend to
4699 be highly fragmented, resulting in parallel jobs being distrib‐
4700 uted across many nodes. Note that node Weight takes precedence
4701 over how many idle resources are on each node. Also see the
4702 SelectParameters configuration parameter CR_LLN to use the least
4703 loaded nodes in every partition.
4704
4705
4706 MaxCPUsPerNode
4707 Maximum number of CPUs on any node available to all jobs from
4708 this partition. This can be especially useful to schedule GPUs.
4709 For example a node can be associated with two Slurm partitions
4710 (e.g. "cpu" and "gpu") and the partition/queue "cpu" could be
4711 limited to only a subset of the node's CPUs, ensuring that one
4712 or more CPUs would be available to jobs in the "gpu" parti‐
4713 tion/queue.
4714
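The GPU scheduling scenario described above could be sketched like this, reserving four of each node's CPUs for jobs in the "gpu" partition (node names, CPU counts and GRES configuration are hypothetical):

```
NodeName=gpunode[1-4] CPUs=32 Gres=gpu:4
PartitionName=cpu Nodes=gpunode[1-4] MaxCPUsPerNode=28
PartitionName=gpu Nodes=gpunode[1-4]
```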
4715
4716 MaxMemPerCPU
4717 Maximum real memory size available per allocated CPU in
4718 megabytes. Used to avoid over-subscribing memory and causing
4719 paging. MaxMemPerCPU would generally be used if individual pro‐
4720 cessors are allocated to jobs (SelectType=select/cons_res). If
4721 not set, the MaxMemPerCPU value for the entire cluster will be
4722 used. Also see DefMemPerCPU and MaxMemPerNode. MaxMemPerCPU
4723 and MaxMemPerNode are mutually exclusive.
4724
4725
4726 MaxMemPerNode
4727 Maximum real memory size available per allocated node in
4728 megabytes. Used to avoid over-subscribing memory and causing
4729 paging. MaxMemPerNode would generally be used if whole nodes
4730 are allocated to jobs (SelectType=select/linear) and resources
4731 are over-subscribed (OverSubscribe=yes or OverSubscribe=force).
4732 If not set, the MaxMemPerNode value for the entire cluster will
4733 be used. Also see DefMemPerNode and MaxMemPerCPU. MaxMemPerCPU
4734 and MaxMemPerNode are mutually exclusive.
4735
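Assuming per-CPU allocation (SelectType=select/cons_res with memory configured as a consumable resource), default and maximum per-CPU memory limits might be combined on one partition like this (values illustrative):

```
PartitionName=normal Nodes=node[001-064] DefMemPerCPU=2048 MaxMemPerCPU=4096
```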
4736
4737 MaxNodes
4738 Maximum count of nodes which may be allocated to any single job.
4739 The default value is "UNLIMITED", which is represented inter‐
4740 nally as -1. This limit does not apply to jobs executed by
4741 SlurmUser or user root.
4742
4743
4744 MaxTime
4745 Maximum run time limit for jobs. Format is minutes, min‐
4746 utes:seconds, hours:minutes:seconds, days-hours, days-hours:min‐
4747 utes, days-hours:minutes:seconds or "UNLIMITED". Time resolu‐
4748 tion is one minute and second values are rounded up to the next
4749 minute. This limit does not apply to jobs executed by SlurmUser
4750 or user root.
4751
4752
4753 MinNodes
4754 Minimum count of nodes which may be allocated to any single job.
4755 The default value is 0. This limit does not apply to jobs exe‐
4756 cuted by SlurmUser or user root.
4757
4758
4759 Nodes Comma separated list of nodes which are associated with this
4760 partition. Node names may be specified using the node range
4761 expression syntax described above. A blank list of nodes (i.e.
4762 "Nodes= ") can be used if one wants a partition to exist, but
4763 have no resources (possibly on a temporary basis). A value of
4764 "ALL" is mapped to all nodes configured in the cluster.
4765
4766
4767 OverSubscribe
4768 Controls the ability of the partition to execute more than one
4769 job at a time on each resource (node, socket or core depending
4770 upon the value of SelectTypeParameters). If resources are to be
4771 over-subscribed, avoiding memory over-subscription is very
4772 important. SelectTypeParameters should be configured to treat
4773 memory as a consumable resource and the --mem option should be
4774 used for job allocations. Sharing of resources is typically
4775 useful only when using gang scheduling (PreemptMode=sus‐
4776 pend,gang). Possible values for OverSubscribe are "EXCLUSIVE",
4777 "FORCE", "YES", and "NO". Note that a value of "YES" or "FORCE"
4778 can negatively impact performance for systems with many thou‐
4779 sands of running jobs. The default value is "NO". For more
4780 information see the following web pages:
4781 https://slurm.schedmd.com/cons_res.html,
4782 https://slurm.schedmd.com/cons_res_share.html,
4783 https://slurm.schedmd.com/gang_scheduling.html, and
4784 https://slurm.schedmd.com/preempt.html.
4785
4786
4787 EXCLUSIVE Allocates entire nodes to jobs even with Select‐
4788 Type=select/cons_res configured. Jobs that run in
4789 partitions with "OverSubscribe=EXCLUSIVE" will have
4790 exclusive access to all allocated nodes.
4791
4792 FORCE Makes all resources in the partition available for
4793 oversubscription without any means for users to dis‐
4794 able it. May be followed with a colon and maximum
4795 number of jobs in running or suspended state. For
4796 example "OverSubscribe=FORCE:4" enables each node,
4797 socket or core to oversubscribe each resource four
4798 ways. Recommended only for systems running with
4799 gang scheduling (PreemptMode=suspend,gang). NOTE:
4800 PreemptType=QOS will permit one additional job to be
4801 run on the partition if started due to job preemp‐
4802 tion. For example, a configuration of OverSub‐
4803 scribe=FORCE:1 will only permit one job per
4804 resources normally, but a second job can be started
4805 if done so through preemption based upon QOS. The
4806 use of PreemptType=QOS and PreemptType=Suspend only
4807 applies with SelectType=select/cons_res.
4808
4809 YES Makes all resources in the partition available for
4810 sharing upon request by the job. Resources will
4811 only be over-subscribed when explicitly requested by
4812 the user using the "--oversubscribe" option on job
4813 submission. May be followed with a colon and maxi‐
4814 mum number of jobs in running or suspended state.
4815 For example "OverSubscribe=YES:4" enables each node,
4816 socket or core to execute up to four jobs at once.
4817 Recommended only for systems running with gang
4818 scheduling (PreemptMode=suspend,gang).
4819
4820 NO Selected resources are allocated to a single job. No
4821 resource will be allocated to more than one job.
4822
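A gang-scheduled partition that two-way oversubscribes its resources, per the FORCE description above, might be sketched as follows (partition and node names are hypothetical, and the cluster-level PreemptMode shown is assumed):

```
# cluster-wide setting required for suspend-based sharing
PreemptMode=suspend,gang
# partition permitting up to two jobs per resource
PartitionName=shared Nodes=node[001-032] OverSubscribe=FORCE:2
```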
4823
4824 PartitionName
4825 Name by which the partition may be referenced (e.g. "Interac‐
4826 tive"). This name can be specified by users when submitting
4827 jobs. If the PartitionName is "DEFAULT", the values specified
4828 with that record will apply to subsequent partition specifica‐
4829 tions unless explicitly set to other values in that partition
4830 record or replaced with a different set of default values. Each
4831 line where PartitionName is "DEFAULT" will replace or add to
4832              previous default values and not reinitialize the default
4833              values.
4834
4835
4836 PreemptMode
4837 Mechanism used to preempt jobs from this partition when Preempt‐
4838 Type=preempt/partition_prio is configured. This partition spe‐
4839 cific PreemptMode configuration parameter will override the Pre‐
4840 emptMode configuration parameter set for the cluster as a whole.
4841 The cluster-level PreemptMode must include the GANG option if
4842 PreemptMode is configured to SUSPEND for any partition. The
4843 cluster-level PreemptMode must not be OFF if PreemptMode is
4844 enabled for any partition. See the description of the clus‐
4845 ter-level PreemptMode configuration parameter above for further
4846 information.
4847
4848
4849 PriorityJobFactor
4850 Partition factor used by priority/multifactor plugin in calcu‐
4851 lating job priority. The value may not exceed 65533. Also see
4852 PriorityTier.
4853
4854
4855 PriorityTier
4856 Jobs submitted to a partition with a higher priority tier value
4857 will be dispatched before pending jobs in partition with lower
4858 priority tier value and, if possible, they will preempt running
4859 jobs from partitions with lower priority tier values. Note that
4860 a partition's priority tier takes precedence over a job's prior‐
4861 ity. The value may not exceed 65533. Also see PriorityJobFac‐
4862 tor.
4863
4864
4865 QOS Used to extend the limits available to a QOS on a partition.
4866 Jobs will not be associated to this QOS outside of being associ‐
4867 ated to the partition. They will still be associated to their
4868 requested QOS. By default, no QOS is used. NOTE: If a limit is
4869 set in both the Partition's QOS and the Job's QOS the Partition
4870 QOS will be honored unless the Job's QOS has the OverPartQOS
4871              flag set, in which case the Job's QOS will have priority.
4872
4873
4874 ReqResv
4875 Specifies users of this partition are required to designate a
4876 reservation when submitting a job. This option can be useful in
4877 restricting usage of a partition that may have higher priority
4878 or additional resources to be allowed only within a reservation.
4879 Possible values are "YES" and "NO". The default value is "NO".
4880
4881
4882 RootOnly
4883 Specifies if only user ID zero (i.e. user root) may allocate
4884 resources in this partition. User root may allocate resources
4885 for any other user, but the request must be initiated by user
4886 root. This option can be useful for a partition to be managed
4887 by some external entity (e.g. a higher-level job manager) and
4888 prevents users from directly using those resources. Possible
4889 values are "YES" and "NO". The default value is "NO".
4890
4891
4892 SelectTypeParameters
4893 Partition-specific resource allocation type. This option
4894 replaces the global SelectTypeParameters value. Supported val‐
4895 ues are CR_Core, CR_Core_Memory, CR_Socket and CR_Socket_Memory.
4896 Use requires the system-wide SelectTypeParameters value be set
4897 to any of the four supported values previously listed; other‐
4898 wise, the partition-specific value will be ignored.
4899
4900
4901 Shared The Shared configuration parameter has been replaced by the
4902 OverSubscribe parameter described above.
4903
4904
4905 State State of partition or availability for use. Possible values are
4906 "UP", "DOWN", "DRAIN" and "INACTIVE". The default value is "UP".
4907 See also the related "Alternate" keyword.
4908
4909              UP        Designates that new jobs may be queued on the partition,
4910 and that jobs may be allocated nodes and run from the
4911 partition.
4912
4913 DOWN Designates that new jobs may be queued on the parti‐
4914 tion, but queued jobs may not be allocated nodes and
4915 run from the partition. Jobs already running on the
4916 partition continue to run. The jobs must be explicitly
4917 canceled to force their termination.
4918
4919 DRAIN Designates that no new jobs may be queued on the par‐
4920 tition (job submission requests will be denied with an
4921 error message), but jobs already queued on the parti‐
4922 tion may be allocated nodes and run. See also the
4923 "Alternate" partition specification.
4924
4925 INACTIVE Designates that no new jobs may be queued on the par‐
4926 tition, and jobs already queued may not be allocated
4927 nodes and run. See also the "Alternate" partition
4928 specification.
4929
4930
4931 TRESBillingWeights
4932 TRESBillingWeights is used to define the billing weights of each
4933 TRES type that will be used in calculating the usage of a job.
4934 The calculated usage is used when calculating fairshare and when
4935 enforcing the TRES billing limit on jobs.
4936
4937 Billing weights are specified as a comma-separated list of <TRES
4938 Type>=<TRES Billing Weight> pairs.
4939
4940 Any TRES Type is available for billing. Note that the base unit
4941 for memory and burst buffers is megabytes.
4942
4943 By default the billing of TRES is calculated as the sum of all
4944 TRES types multiplied by their corresponding billing weight.
4945
4946 The weighted amount of a resource can be adjusted by adding a
4947 suffix of K,M,G,T or P after the billing weight. For example, a
4948 memory weight of "mem=.25" on a job allocated 8GB will be billed
4949 2048 (8192MB *.25) units. A memory weight of "mem=.25G" on the
4950 same job will be billed 2 (8192MB * (.25/1024)) units.
4951
4952 Negative values are allowed.
4953
4954 When a job is allocated 1 CPU and 8 GB of memory on a partition
4955 configured with TRESBilling‐
4956 Weights="CPU=1.0,Mem=0.25G,GRES/gpu=2.0", the billable TRES will
4957 be: (1*1.0) + (8*0.25) + (0*2.0) = 3.0.
4958
4959 If PriorityFlags=MAX_TRES is configured, the billable TRES is
4960 calculated as the MAX of individual TRES' on a node (e.g. cpus,
4961 mem, gres) plus the sum of all global TRES' (e.g. licenses).
4962 Using the same example above the billable TRES will be
4963 MAX(1*1.0, 8*0.25) + (0*2.0) = 2.0.
4964
4965 If TRESBillingWeights is not defined then the job is billed
4966 against the total number of allocated CPUs.
4967
4968 NOTE: TRESBillingWeights doesn't affect job priority directly as
4969 it is currently not used for the size of the job. If you want
4970 TRES' to play a role in the job's priority then refer to the
4971 PriorityWeightTRES option.
4972
4973
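Pulling several of the parameters above together, a complete single-line partition definition (as the NOTE above requires, all parameters on one line) might look like this; every name and limit here is illustrative:

```
PartitionName=batch Nodes=node[001-128] Default=YES MaxTime=1-00:00:00 MaxNodes=32 State=UP OverSubscribe=NO
```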
4974
4976 There are a variety of prolog and epilog program options that execute
4977 with various permissions and at various times. The four options most
4978 likely to be used are: Prolog and Epilog (executed once on each compute
4979 node for each job) plus PrologSlurmctld and EpilogSlurmctld (executed
4980 once on the ControlMachine for each job).
4981
4982 NOTE: Standard output and error messages are normally not preserved.
4983 Explicitly write output and error messages to an appropriate location
4984 if you wish to preserve that information.
4985
4986 NOTE: By default the Prolog script is ONLY run on any individual node
4987 when it first sees a job step from a new allocation; it does not run
4988 the Prolog immediately when an allocation is granted. If no job steps
4989 from an allocation are run on a node, it will never run the Prolog for
4990 that allocation. This Prolog behaviour can be changed by the Pro‐
4991 logFlags parameter. The Epilog, on the other hand, always runs on
4992 every node of an allocation when the allocation is released.
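As a sketch, the prolog/epilog hooks described here might be wired up as follows (the script paths are hypothetical; PrologFlags=Alloc changes the default behavior so the Prolog runs when the allocation is granted rather than at the first job step):

```
Prolog=/etc/slurm/prolog.sh
Epilog=/etc/slurm/epilog.sh
PrologFlags=Alloc
PrologSlurmctld=/etc/slurm/prolog_slurmctld.sh
EpilogSlurmctld=/etc/slurm/epilog_slurmctld.sh
```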
4993
4994 If the Epilog fails (returns a non-zero exit code), this will result in
4995 the node being set to a DRAIN state. If the EpilogSlurmctld fails
4996 (returns a non-zero exit code), this will only be logged. If the Pro‐
4997 log fails (returns a non-zero exit code), this will result in the node
4998 being set to a DRAIN state and the job being requeued in a held state
4999 unless nohold_on_prolog_fail is configured in SchedulerParameters. If
5000 the PrologSlurmctld fails (returns a non-zero exit code), this will
5001       result in the job being requeued to execute on another node if possible.
5002 Only batch jobs can be requeued.
5003 Interactive jobs (salloc and srun) will be cancelled if the Pro‐
5004 logSlurmctld fails.
5005
5006
5007 Information about the job is passed to the script using environment
5008 variables. Unless otherwise specified, these environment variables are
5009 available to all of the programs.
5010
5011 BASIL_RESERVATION_ID
5012 Basil reservation ID. Available on Cray systems with ALPS only.
5013
5014 SLURM_ARRAY_JOB_ID
5015 If this job is part of a job array, this will be set to the job
5016 ID. Otherwise it will not be set. To reference this specific
5017 task of a job array, combine SLURM_ARRAY_JOB_ID with
5018 SLURM_ARRAY_TASK_ID (e.g. "scontrol update
5019              ${SLURM_ARRAY_JOB_ID}_${SLURM_ARRAY_TASK_ID} ..."). Available in
5020 PrologSlurmctld and EpilogSlurmctld only.
5021
5022 SLURM_ARRAY_TASK_ID
5023 If this job is part of a job array, this will be set to the task
5024 ID. Otherwise it will not be set. To reference this specific
5025 task of a job array, combine SLURM_ARRAY_JOB_ID with
5026 SLURM_ARRAY_TASK_ID (e.g. "scontrol update
5027              ${SLURM_ARRAY_JOB_ID}_${SLURM_ARRAY_TASK_ID} ..."). Available in
5028 PrologSlurmctld and EpilogSlurmctld only.
5029
5030 SLURM_ARRAY_TASK_MAX
5031 If this job is part of a job array, this will be set to the max‐
5032 imum task ID. Otherwise it will not be set. Available in Pro‐
5033 logSlurmctld and EpilogSlurmctld only.
5034
5035 SLURM_ARRAY_TASK_MIN
5036 If this job is part of a job array, this will be set to the min‐
5037 imum task ID. Otherwise it will not be set. Available in Pro‐
5038 logSlurmctld and EpilogSlurmctld only.
5039
5040 SLURM_ARRAY_TASK_STEP
5041 If this job is part of a job array, this will be set to the step
5042 size of task IDs. Otherwise it will not be set. Available in
5043 PrologSlurmctld and EpilogSlurmctld only.
5044
5045 SLURM_CLUSTER_NAME
5046 Name of the cluster executing the job.
5047
5048 SLURM_JOB_ACCOUNT
5049 Account name used for the job. Available in PrologSlurmctld and
5050 EpilogSlurmctld only.
5051
5052 SLURM_JOB_CONSTRAINTS
5053 Features required to run the job. Available in Prolog, Pro‐
5054 logSlurmctld and EpilogSlurmctld only.
5055
5056 SLURM_JOB_DERIVED_EC
5057 The highest exit code of all of the job steps. Available in
5058 EpilogSlurmctld only.
5059
5060 SLURM_JOB_EXIT_CODE
5061 The exit code of the job script (or salloc). The value is the
5062 status as returned by the wait() system call (See wait(2))
5063 Available in EpilogSlurmctld only.
5064
5065 SLURM_JOB_EXIT_CODE2
5066 The exit code of the job script (or salloc). The value has the
5067 format <exit>:<sig>. The first number is the exit code, typi‐
5068 cally as set by the exit() function. The second number of the
5069 signal that caused the process to terminate if it was terminated
5070 by a signal. Available in EpilogSlurmctld only.
5071
5072 SLURM_JOB_GID
5073 Group ID of the job's owner. Available in PrologSlurmctld, Epi‐
5074 logSlurmctld and TaskProlog only.
5075
5076 SLURM_JOB_GPUS
5077 GPU IDs allocated to the job (if any). Available in the Prolog
5078 only.
5079
5080 SLURM_JOB_GROUP
5081 Group name of the job's owner. Available in PrologSlurmctld and
5082 EpilogSlurmctld only.
5083
5084 SLURM_JOB_ID
5085 Job ID. CAUTION: If this job is the first task of a job array,
5086 then Slurm commands using this job ID will refer to the entire
5087 job array rather than this specific task of the job array.
5088
5089 SLURM_JOB_NAME
5090 Name of the job. Available in PrologSlurmctld and EpilogSlurm‐
5091 ctld only.
5092
5093 SLURM_JOB_NODELIST
5094 Nodes assigned to job. A Slurm hostlist expression. "scontrol
5095 show hostnames" can be used to convert this to a list of indi‐
5096 vidual host names. Available in PrologSlurmctld and Epi‐
5097 logSlurmctld only.
5098
5099 SLURM_JOB_PARTITION
5100 Partition that job runs in. Available in Prolog, PrologSlurm‐
5101 ctld and EpilogSlurmctld only.
5102
5103 SLURM_JOB_UID
5104 User ID of the job's owner.
5105
5106 SLURM_JOB_USER
5107 User name of the job's owner.
5108
5109
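To show these variables in use, the fragment below sketches a helper that an EpilogSlurmctld script could call to summarize a finished job, including splitting the <exit>:<sig> value of SLURM_JOB_EXIT_CODE2.  The `log_job_summary` name and the log destination are illustrative assumptions, not part of Slurm.

```shell
# Hypothetical EpilogSlurmctld helper: print a one-line summary of a
# finished job from the environment variables documented above.
log_job_summary() {
    # SLURM_JOB_EXIT_CODE2 has the form "<exit>:<sig>"; split it with
    # POSIX parameter expansion.
    exit_part="${SLURM_JOB_EXIT_CODE2%%:*}"
    sig_part="${SLURM_JOB_EXIT_CODE2##*:}"
    echo "job=${SLURM_JOB_ID} user=${SLURM_JOB_USER}" \
         "partition=${SLURM_JOB_PARTITION}" \
         "exit=${exit_part} signal=${sig_part}"
}

# A real EpilogSlurmctld might append this to a site log file (path is
# an assumption):
# log_job_summary >> /var/log/slurm/job_summary.log
```

Because this runs under slurmctld as SlurmUser, any file it writes must be writable by SlurmUser (see FILE AND DIRECTORY PERMISSIONS).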
NETWORK TOPOLOGY
       Slurm is able to optimize job allocations to minimize network
       contention.  Special Slurm logic is used to optimize allocations on
       systems with a three-dimensional interconnect, and information
       about configuring those systems is available at
       <https://slurm.schedmd.com/>.  For a hierarchical network, Slurm
       needs to have detailed information about how nodes are configured
       on the network switches.

       Given network topology information, Slurm allocates all of a job's
       resources onto a single leaf of the network (if possible) using a
       best-fit algorithm.  Otherwise it will allocate a job's resources
       onto multiple leaf switches so as to minimize the use of
       higher-level switches.  The TopologyPlugin parameter controls which
       plugin is used to collect network topology information.  The only
       values presently supported are "topology/3d_torus" (default for
       Cray XT/XE systems, performs best-fit logic over three-dimensional
       topology), "topology/none" (default for other systems, best-fit
       logic over one-dimensional topology), and "topology/tree"
       (determine the network topology based upon information contained in
       a topology.conf file; see "man topology.conf" for more
       information).  Future plugins may gather topology information
       directly from the network.  The topology information is optional.
       If not provided, Slurm will perform a best-fit algorithm assuming
       the nodes are in a one-dimensional array as configured and that the
       communications cost is related to the node distance in this array.

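For "topology/tree", the switch hierarchy is described in topology.conf.  The following sketch shows the general shape; the switch names and node ranges are invented for illustration and must be replaced with the site's actual wiring (see topology.conf(5) for the authoritative syntax):

```
# In slurm.conf:
TopologyPlugin=topology/tree

# In topology.conf (hypothetical layout): two leaf switches under one
# spine switch, each leaf directly connected to half of the nodes.
SwitchName=spine Switches=leaf[1-2]
SwitchName=leaf1 Nodes=dev[0-12]
SwitchName=leaf2 Nodes=dev[13-25]
```

With this layout, a job small enough to fit on dev[0-12] would be placed entirely under leaf1 rather than spanning both leaves through the spine.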
RELOCATING CONTROLLERS
       If the cluster's computers used for the primary or backup
       controller will be out of service for an extended period of time,
       it may be desirable to relocate them.  In order to do so, follow
       this procedure:

       1. Stop the Slurm daemons
       2. Modify the slurm.conf file appropriately
       3. Distribute the updated slurm.conf file to all nodes
       4. Restart the Slurm daemons

       There should be no loss of any running or pending jobs.  Ensure
       that any nodes added to the cluster have the current slurm.conf
       file installed.

       CAUTION: If two nodes are simultaneously configured as the primary
       controller (two nodes on which ControlMachine specifies the local
       host and the slurmctld daemon is executing on each), system
       behavior will be destructive.  If a compute node has an incorrect
       ControlMachine or BackupController parameter, that node may be
       rendered unusable, but no other harm will result.

EXAMPLE
       #
       # Sample /etc/slurm.conf for dev[0-25].llnl.gov
       # Author: John Doe
       # Date: 11/06/2001
       #
       SlurmctldHost=dev0(12.34.56.78)  # Primary server
       SlurmctldHost=dev1(12.34.56.79)  # Backup server
       #
       AuthType=auth/munge
       Epilog=/usr/local/slurm/epilog
       Prolog=/usr/local/slurm/prolog
       FastSchedule=1
       FirstJobId=65536
       InactiveLimit=120
       JobCompType=jobcomp/filetxt
       JobCompLoc=/var/log/slurm/jobcomp
       KillWait=30
       MaxJobCount=10000
       MinJobAge=3600
       PluginDir=/usr/local/lib:/usr/local/slurm/lib
       ReturnToService=0
       SchedulerType=sched/backfill
       SlurmctldLogFile=/var/log/slurm/slurmctld.log
       SlurmdLogFile=/var/log/slurm/slurmd.log
       SlurmctldPort=7002
       SlurmdPort=7003
       SlurmdSpoolDir=/var/spool/slurmd.spool
       StateSaveLocation=/var/spool/slurm.state
       SwitchType=switch/none
       TmpFS=/tmp
       WaitTime=30
       JobCredentialPrivateKey=/usr/local/slurm/private.key
       JobCredentialPublicCertificate=/usr/local/slurm/public.cert
       #
       # Node Configurations
       #
       NodeName=DEFAULT CPUs=2 RealMemory=2000 TmpDisk=64000
       NodeName=DEFAULT State=UNKNOWN
       NodeName=dev[0-25] NodeAddr=edev[0-25] Weight=16
       # Update records for specific DOWN nodes
       DownNodes=dev20 State=DOWN Reason="power,ETA=Dec25"
       #
       # Partition Configurations
       #
       PartitionName=DEFAULT MaxTime=30 MaxNodes=10 State=UP
       PartitionName=debug Nodes=dev[0-8,18-25] Default=YES
       PartitionName=batch Nodes=dev[9-17] MinNodes=4
       PartitionName=long Nodes=dev[9-17] MaxTime=120 AllowGroups=admin

INCLUDE MODIFIERS
       The "include" keyword can be used with modifiers within the
       specified pathname.  These modifiers will be replaced with the
       cluster name or other information depending on which modifier is
       specified.  If the included file is not an absolute path name (i.e.
       it does not start with a slash), it will be searched for in the
       same directory as the slurm.conf file.

       %c     Cluster name specified in the slurm.conf will be used.

       EXAMPLE
       ClusterName=linux
       include /home/slurm/etc/%c_config
       # Above line interpreted as
       # "include /home/slurm/etc/linux_config"

FILE AND DIRECTORY PERMISSIONS
       There are three classes of files.  Files used by slurmctld must be
       accessible by user SlurmUser and accessible by the primary and
       backup control machines.  Files used by slurmd must be accessible
       by user root and accessible from every compute node.  A few files
       need to be accessible by normal users on all login and compute
       nodes.  While many files and directories are listed below, most of
       them will not be used with most configurations.

       AccountingStorageLoc
              If this specifies a file, it must be writable by user
              SlurmUser.  The file must be accessible by the primary and
              backup control machines.  It is recommended that the file be
              readable by all users from login and compute nodes.

       Epilog Must be executable by user root.  It is recommended that the
              file be readable by all users.  The file must exist on every
              compute node.

       EpilogSlurmctld
              Must be executable by user SlurmUser.  It is recommended
              that the file be readable by all users.  The file must be
              accessible by the primary and backup control machines.

       HealthCheckProgram
              Must be executable by user root.  It is recommended that the
              file be readable by all users.  The file must exist on every
              compute node.

       JobCheckpointDir
              Must be writable by user SlurmUser and no other users.  The
              directory must be accessible by the primary and backup
              control machines.

       JobCompLoc
              If this specifies a file, it must be writable by user
              SlurmUser.  The file must be accessible by the primary and
              backup control machines.

       JobCredentialPrivateKey
              Must be readable only by user SlurmUser and writable by no
              other users.  The file must be accessible by the primary and
              backup control machines.

       JobCredentialPublicCertificate
              Readable to all users on all nodes.  Must not be writable by
              regular users.

       MailProg
              Must be executable by user SlurmUser.  Must not be writable
              by regular users.  The file must be accessible by the
              primary and backup control machines.

       Prolog Must be executable by user root.  It is recommended that the
              file be readable by all users.  The file must exist on every
              compute node.

       PrologSlurmctld
              Must be executable by user SlurmUser.  It is recommended
              that the file be readable by all users.  The file must be
              accessible by the primary and backup control machines.

       ResumeProgram
              Must be executable by user SlurmUser.  The file must be
              accessible by the primary and backup control machines.

       SallocDefaultCommand
              Must be executable by all users.  The file must exist on
              every login and compute node.

       slurm.conf
              Readable to all users on all nodes.  Must not be writable by
              regular users.

       SlurmctldLogFile
              Must be writable by user SlurmUser.  The file must be
              accessible by the primary and backup control machines.

       SlurmctldPidFile
              Must be writable by user root.  Preferably writable and
              removable by SlurmUser.  The file must be accessible by the
              primary and backup control machines.

       SlurmdLogFile
              Must be writable by user root.  A distinct file must exist
              on each compute node.

       SlurmdPidFile
              Must be writable by user root.  A distinct file must exist
              on each compute node.

       SlurmdSpoolDir
              Must be writable by user root.  A distinct directory must
              exist on each compute node.

       SrunEpilog
              Must be executable by all users.  The file must exist on
              every login and compute node.

       SrunProlog
              Must be executable by all users.  The file must exist on
              every login and compute node.

       StateSaveLocation
              Must be writable by user SlurmUser.  The directory must be
              accessible by the primary and backup control machines.

       SuspendProgram
              Must be executable by user SlurmUser.  The file must be
              accessible by the primary and backup control machines.

       TaskEpilog
              Must be executable by all users.  The file must exist on
              every compute node.

       TaskProlog
              Must be executable by all users.  The file must exist on
              every compute node.

       UnkillableStepProgram
              Must be executable by user SlurmUser.  The file must be
              accessible by the primary and backup control machines.

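The directory rules above can be applied with ordinary mkdir/chmod/chown commands.  The sketch below demonstrates the pattern under a throwaway prefix; the paths, modes, and the "slurm" account name are assumptions, and the chown step is left commented out because it requires root and an existing SlurmUser account.

```shell
# Demo of preparing Slurm working directories with appropriate modes.
# PREFIX is a scratch location for illustration only; real systems use
# the paths configured in slurm.conf and run these commands as root.
PREFIX="${PREFIX:-/tmp/slurm-demo}"
SLURM_USER="${SLURM_USER:-slurm}"   # assumed SlurmUser account name

# StateSaveLocation: writable by SlurmUser and no one else.
mkdir -p "$PREFIX/spool/slurm.state"
chmod 700 "$PREFIX/spool/slurm.state"

# SlurmdSpoolDir: writable by root only, one per compute node.
mkdir -p "$PREFIX/spool/slurmd.spool"
chmod 755 "$PREFIX/spool/slurmd.spool"

# slurm.conf: readable by all users, not writable by regular users.
touch "$PREFIX/slurm.conf"
chmod 644 "$PREFIX/slurm.conf"

# In production (as root), hand the state directory to SlurmUser:
# chown "$SLURM_USER" "$PREFIX/spool/slurm.state"
```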
LOGGING
       Note that while Slurm daemons create log files and other files as
       needed, they treat the lack of parent directories as a fatal error.
       This prevents the daemons from running if critical file systems are
       not mounted and minimizes the risk of cold-starting (starting
       without preserving jobs).

       Log files and job accounting files may need to be created/owned by
       the "SlurmUser" uid to be successfully accessed.  Use the "chown"
       and "chmod" commands to set the ownership and permissions
       appropriately.  See the section FILE AND DIRECTORY PERMISSIONS for
       information about the various files and directories used by Slurm.

       It is recommended that the logrotate utility be used to ensure that
       various log files do not become too large.  This also applies to
       text files used for accounting, process tracking, and the slurmdbd
       log if they are used.

       Here is a sample logrotate configuration.  Make appropriate site
       modifications and save as /etc/logrotate.d/slurm on all nodes.  See
       the logrotate man page for more details.

       ##
       # Slurm Logrotate Configuration
       ##
       /var/log/slurm/*.log {
               compress
               missingok
               nocopytruncate
               nodelaycompress
               nomail
               notifempty
               noolddir
               rotate 5
               sharedscripts
               size=5M
               create 640 slurm root
               postrotate
                       for daemon in $(/usr/bin/scontrol show daemons)
                       do
                               killall -SIGUSR2 $daemon
                       done
               endscript
       }

       NOTE: The slurmdbd daemon is not listed in the output of 'scontrol
       show daemons', so a separate logrotate configuration should be used
       to send a SIGUSR2 signal to it.

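A separate slurmdbd configuration might look like the following sketch.  The log and pid file paths are assumptions and must match the LogFile and PidFile settings in slurmdbd.conf on the host running slurmdbd:

```
##
# Hypothetical slurmdbd logrotate configuration
# (paths must match LogFile/PidFile in slurmdbd.conf)
##
/var/log/slurm/slurmdbd.log {
        compress
        missingok
        notifempty
        rotate 5
        size=5M
        create 640 slurm root
        postrotate
                kill -USR2 $(cat /var/run/slurmdbd.pid)
        endscript
}
```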
COPYING
       Copyright (C) 2002-2007 The Regents of the University of
       California.  Produced at Lawrence Livermore National Laboratory
       (cf, DISCLAIMER).
       Copyright (C) 2008-2010 Lawrence Livermore National Security.
       Copyright (C) 2010-2017 SchedMD LLC.

       This file is part of Slurm, a resource management program.  For
       details, see <https://slurm.schedmd.com/>.

       Slurm is free software; you can redistribute it and/or modify it
       under the terms of the GNU General Public License as published by
       the Free Software Foundation; either version 2 of the License, or
       (at your option) any later version.

       Slurm is distributed in the hope that it will be useful, but
       WITHOUT ANY WARRANTY; without even the implied warranty of
       MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
       General Public License for more details.

FILES
       /etc/slurm.conf

SEE ALSO
       cgroup.conf(5), gethostbyname(3), getrlimit(2), gres.conf(5),
       group(5), hostname(1), scontrol(1), slurmctld(8), slurmd(8),
       slurmdbd(8), slurmdbd.conf(5), srun(1), spank(8), syslog(2),
       topology.conf(5)

February 2019              Slurm Configuration File             slurm.conf(5)