slurm.conf(5)              Slurm Configuration File              slurm.conf(5)


NAME
       slurm.conf - Slurm configuration file

DESCRIPTION
       slurm.conf is an ASCII file which describes general Slurm
       configuration information, the nodes to be managed, information
       about how those nodes are grouped into partitions, and various
       scheduling parameters associated with those partitions. This file
       should be consistent across all nodes in the cluster.

       The file location can be modified at execution time by setting the
       SLURM_CONF environment variable. The Slurm daemons also allow you
       to override both the built-in and environment-provided location
       using the "-f" option on the command line.

       The contents of the file are case insensitive except for the names
       of nodes and partitions. Any text following a "#" in the
       configuration file is treated as a comment through the end of that
       line. Changes to the configuration file take effect upon restart of
       Slurm daemons, daemon receipt of the SIGHUP signal, or execution of
       the command "scontrol reconfigure", unless otherwise noted.

       If a line begins with the word "Include" followed by whitespace and
       then a file name, that file will be included inline with the
       current configuration file. For large or complex systems, multiple
       configuration files may prove easier to manage and enable reuse of
       some files (see INCLUDE MODIFIERS for more details).

       Note on file permissions:

       The slurm.conf file must be readable by all users of Slurm, since
       it is used by many of the Slurm commands. Other files that are
       defined in the slurm.conf file, such as log files and job
       accounting files, may need to be created/owned by the user
       "SlurmUser" to be successfully accessed. Use the "chown" and
       "chmod" commands to set the ownership and permissions
       appropriately. See the section FILE AND DIRECTORY PERMISSIONS for
       information about the various files and directories used by Slurm.

PARAMETERS
       The overall configuration parameters available include:

       AccountingStorageBackupHost
              The name of the backup machine hosting the accounting
              storage database. If used with the
              accounting_storage/slurmdbd plugin, this is where the backup
              slurmdbd would be running. Only used with systems using
              SlurmDBD, ignored otherwise.

       AccountingStorageEnforce
              This controls what level of association-based enforcement to
              impose on job submissions. Valid options are any combination
              of associations, limits, nojobs, nosteps, qos, safe, and
              wckeys, or all for all things (except nojobs and nosteps,
              which must be requested as well).

              If limits, qos, or wckeys are set, associations will
              automatically be set.

              If wckeys is set, TrackWCKey will automatically be set.

              If safe is set, limits and associations will automatically
              be set.

              If nojobs is set, nosteps will automatically be set.

              By setting associations, no new job is allowed to run unless
              a corresponding association exists in the system. If limits
              are enforced, users can be limited by association to
              whatever job size or run time limits are defined.

              If nojobs is set, Slurm will not account for any jobs or
              steps on the system. Likewise, if nosteps is set, Slurm will
              not account for any steps that have run.

              If safe is enforced, a job will only be launched against an
              association or qos that has a GrpTRESMins limit set if the
              job will be able to run to completion. Without this option
              set, jobs will be launched as long as their usage hasn't
              reached the cpu-minutes limit. This can lead to jobs being
              launched but then killed when the limit is reached.

              With qos and/or wckeys enforced, jobs will not be scheduled
              unless a valid qos and/or workload characterization key is
              specified.

              A restart of slurmctld is required for changes to this
              parameter to take effect.
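
              For example, a site requiring valid associations and QOS
              values, with conservative limit enforcement, might use the
              following combination (adjust to site policy):

                  AccountingStorageEnforce=associations,limits,qos,safe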

       AccountingStorageExternalHost
              A comma-separated list of external slurmdbds
              (<host/ip>[:port][,...]) to register with. If no port is
              given, the AccountingStoragePort will be used.

              This allows clusters registered with the external slurmdbd
              to communicate with each other using the --cluster/-M client
              command options.

              The cluster will add itself to the external slurmdbd if it
              doesn't exist. If a non-external cluster already exists on
              the external slurmdbd, the slurmctld will ignore registering
              to the external slurmdbd.

       AccountingStorageHost
              The name of the machine hosting the accounting storage
              database. Only used with systems using SlurmDBD, ignored
              otherwise.

       AccountingStorageParameters
              Comma-separated list of key-value pair parameters. Currently
              supported values include options to establish a secure
              connection to the database:

              SSL_CERT
                     The path name of the client public key certificate
                     file.

              SSL_CA
                     The path name of the Certificate Authority (CA)
                     certificate file.

              SSL_CAPATH
                     The path name of the directory that contains trusted
                     SSL CA certificate files.

              SSL_KEY
                     The path name of the client private key file.

              SSL_CIPHER
                     The list of permissible ciphers for SSL encryption.
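
              For example, a secure database connection might be
              configured as follows (the file paths are placeholders, not
              defaults):

                  AccountingStorageParameters=SSL_CERT=/etc/slurm/ssl/client-cert.pem,SSL_KEY=/etc/slurm/ssl/client-key.pem,SSL_CA=/etc/slurm/ssl/ca-cert.pem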

       AccountingStoragePass
              The password used to gain access to the database to store
              the accounting data. Only used for database type storage
              plugins, ignored otherwise. In the case of Slurm DBD
              (Database Daemon) with MUNGE authentication, this can be
              configured to use a MUNGE daemon specifically configured to
              provide authentication between clusters, while the default
              MUNGE daemon provides authentication within a cluster. In
              that case, AccountingStoragePass should specify the named
              port to be used for communications with the alternate MUNGE
              daemon (e.g. "/var/run/munge/global.socket.2"). The default
              value is NULL.

       AccountingStoragePort
              The listening port of the accounting storage database
              server. Only used for database type storage plugins, ignored
              otherwise. The default value is SLURMDBD_PORT as established
              at system build time. If no value is explicitly specified,
              it will be set to 6819. This value must be equal to the
              DbdPort parameter in the slurmdbd.conf file.
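
              A minimal SlurmDBD accounting setup might combine the
              parameters above as follows (the host name is illustrative):

                  AccountingStorageType=accounting_storage/slurmdbd
                  AccountingStorageHost=dbd.example.com
                  AccountingStoragePort=6819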

       AccountingStorageTRES
              Comma-separated list of resources you wish to track on the
              cluster. These are the resources requested by the
              sbatch/srun job when it is submitted. Currently this
              consists of any GRES, BB (burst buffer) or license along
              with CPU, Memory, Node, Energy, FS/[Disk|Lustre], IC/OFED,
              Pages, and VMem. By default Billing, CPU, Energy, Memory,
              Node, FS/Disk, Pages and VMem are tracked. These default
              TRES cannot be disabled, but only appended to.
              AccountingStorageTRES=gres/craynetwork,license/iop1 will
              track billing, cpu, energy, memory, nodes, fs/disk, pages
              and vmem along with a gres called craynetwork as well as a
              license called iop1. Whenever these resources are used on
              the cluster they are recorded. The TRES are automatically
              set up in the database on the start of the slurmctld.

              If multiple GRES of different types are tracked (e.g. GPUs
              of different types), then job requests with matching type
              specifications will be recorded. Given a configuration of
              "AccountingStorageTRES=gres/gpu,gres/gpu:tesla,gres/gpu:volta",
              "gres/gpu:tesla" and "gres/gpu:volta" will track only jobs
              that explicitly request those two GPU types, while
              "gres/gpu" will track allocated GPUs of any type ("tesla",
              "volta" or any other GPU type).

              Given a configuration of
              "AccountingStorageTRES=gres/gpu:tesla,gres/gpu:volta",
              "gres/gpu:tesla" and "gres/gpu:volta" will track jobs that
              explicitly request those GPU types. If a job requests GPUs,
              but does not explicitly specify the GPU type, then its
              resource allocation will be accounted for as either
              "gres/gpu:tesla" or "gres/gpu:volta", although the
              accounting may not match the actual GPU type allocated to
              the job and the GPUs allocated to the job could be
              heterogeneous. In an environment containing various GPU
              types, use of a job_submit plugin may be desired in order to
              force jobs to explicitly specify some GPU type.

       AccountingStorageType
              The accounting storage mechanism type. Acceptable values at
              present include "accounting_storage/none" and
              "accounting_storage/slurmdbd". The
              "accounting_storage/slurmdbd" value indicates that
              accounting records will be written to the Slurm DBD, which
              manages an underlying MySQL database. See "man slurmdbd" for
              more information. The default value is
              "accounting_storage/none", which indicates that account
              records are not maintained.

       AccountingStorageUser
              The user account for accessing the accounting storage
              database. Only used for database type storage plugins,
              ignored otherwise.

       AccountingStoreFlags
              Comma-separated list used to tell the slurmctld to store
              extra fields that may be more heavyweight than the normal
              job information.

              Current options are:

              job_comment
                     Include the job's comment field in the job complete
                     message sent to the Accounting Storage database. Note
                     that the AdminComment and SystemComment are always
                     recorded in the database.

              job_env
                     Include a batch job's environment variables used at
                     job submission in the job start message sent to the
                     Accounting Storage database.

              job_script
                     Include the job's batch script in the job start
                     message sent to the Accounting Storage database.
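
              For example, to additionally record batch scripts and job
              comments, at the cost of larger accounting records:

                  AccountingStoreFlags=job_script,job_comment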

       AcctGatherNodeFreq
              The AcctGather plugins' sampling interval for node
              accounting. For AcctGather plugin values of none, this
              parameter is ignored. For all other values this parameter is
              the number of seconds between node accounting samples. For
              the acct_gather_energy/rapl plugin, set a value less than
              300 because the counters may overflow beyond this rate. The
              default value is zero, which disables accounting sampling
              for nodes. Note: The accounting sampling interval for jobs
              is determined by the value of JobAcctGatherFrequency.
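
              For example, to sample node accounting data every 30
              seconds, well under the 300-second RAPL overflow limit noted
              above:

                  AcctGatherNodeFreq=30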

       AcctGatherEnergyType
              Identifies the plugin to be used for energy consumption
              accounting. The jobacct_gather plugin and slurmd daemon call
              this plugin to collect energy consumption data for jobs and
              nodes. The collection of energy consumption data takes place
              at the node level, hence only in the case of exclusive job
              allocation will the energy consumption measurements reflect
              the job's real consumption. In the case of node sharing
              between jobs, the reported consumed energy per job (through
              sstat or sacct) will not reflect the real energy consumed by
              the jobs.

              Configurable values at present are:

              acct_gather_energy/none
                     No energy consumption data is collected.

              acct_gather_energy/ipmi
                     Energy consumption data is collected from the
                     Baseboard Management Controller (BMC) using the
                     Intelligent Platform Management Interface (IPMI).

              acct_gather_energy/pm_counters
                     Energy consumption data is collected from the
                     Baseboard Management Controller (BMC) for HPE Cray
                     systems.

              acct_gather_energy/rapl
                     Energy consumption data is collected from hardware
                     sensors using the Running Average Power Limit (RAPL)
                     mechanism. Note that enabling RAPL may require the
                     execution of the command "sudo modprobe msr".

              acct_gather_energy/xcc
                     Energy consumption data is collected from the Lenovo
                     SD650 XClarity Controller (XCC) using IPMI OEM raw
                     commands.
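
              For example, to collect energy data from RAPL hardware
              counters:

                  AcctGatherEnergyType=acct_gather_energy/rapl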

       AcctGatherInterconnectType
              Identifies the plugin to be used for interconnect network
              traffic accounting. The jobacct_gather plugin and slurmd
              daemon call this plugin to collect network traffic data for
              jobs and nodes. The collection of network traffic data takes
              place at the node level, hence only in the case of exclusive
              job allocation will the collected values reflect the job's
              real traffic. In the case of node sharing between jobs, the
              reported network traffic per job (through sstat or sacct)
              will not reflect the real network traffic by the jobs.

              Configurable values at present are:

              acct_gather_interconnect/none
                     No Infiniband network data are collected.

              acct_gather_interconnect/ofed
                     Infiniband network traffic data are collected from
                     the hardware monitoring counters of Infiniband
                     devices through the OFED library. In order to account
                     for per job network traffic, add the "ic/ofed" TRES
                     to AccountingStorageTRES.

       AcctGatherFilesystemType
              Identifies the plugin to be used for filesystem traffic
              accounting. The jobacct_gather plugin and slurmd daemon call
              this plugin to collect filesystem traffic data for jobs and
              nodes. The collection of filesystem traffic data takes place
              at the node level, hence only in the case of exclusive job
              allocation will the collected values reflect the job's real
              traffic. In the case of node sharing between jobs, the
              reported filesystem traffic per job (through sstat or sacct)
              will not reflect the real filesystem traffic by the jobs.

              Configurable values at present are:

              acct_gather_filesystem/none
                     No filesystem data are collected.

              acct_gather_filesystem/lustre
                     Lustre filesystem traffic data are collected from the
                     counters found in /proc/fs/lustre/. In order to
                     account for per job lustre traffic, add the
                     "fs/lustre" TRES to AccountingStorageTRES.

       AcctGatherProfileType
              Identifies the plugin to be used for detailed job profiling.
              The jobacct_gather plugin and slurmd daemon call this plugin
              to collect detailed data such as I/O counts, memory usage,
              or energy consumption for jobs and nodes. There are
              interfaces in this plugin to collect data at step start and
              completion, task start and completion, and at the account
              gather frequency. The data collected at the node level is
              related to jobs only in the case of exclusive job
              allocation.

              Configurable values at present are:

              acct_gather_profile/none
                     No profile data is collected.

              acct_gather_profile/hdf5
                     This enables the HDF5 plugin. The directory where the
                     profile files are stored and which values are
                     collected are configured in the acct_gather.conf
                     file.

              acct_gather_profile/influxdb
                     This enables the influxdb plugin. The influxdb
                     instance host, port, database, retention policy and
                     which values are collected are configured in the
                     acct_gather.conf file.

       AllowSpecResourcesUsage
              If set to "YES", Slurm allows individual jobs to override a
              node's configured CoreSpecCount value. For a job to take
              advantage of this feature, a command line option of
              --core-spec must be specified. The default value for this
              option is "YES" for Cray systems and "NO" for other system
              types.

       AuthAltTypes
              Comma-separated list of alternative authentication plugins
              that the slurmctld will permit for communication. Acceptable
              values at present include auth/jwt.

              NOTE: auth/jwt requires a jwt_hs256.key to be populated in
              the StateSaveLocation directory for slurmctld only. The
              jwt_hs256.key should only be visible to the SlurmUser and
              root. It is not suggested to place the jwt_hs256.key on any
              nodes but the controller running slurmctld. auth/jwt can be
              activated by the presence of the SLURM_JWT environment
              variable. When activated, it will override the default
              AuthType.

       AuthAltParameters
              Used to define alternative authentication plugins' options.
              Multiple options may be comma separated.

              disable_token_creation
                     Disable "scontrol token" use by non-SlurmUser
                     accounts.

              jwks=  Absolute path to JWKS file. Only RS256 keys are
                     supported, although other key types may be listed in
                     the file. If set, no HS256 key will be loaded by
                     default (and token generation is disabled), although
                     the jwt_key setting may be used to explicitly
                     re-enable HS256 key use (and token generation).

              jwt_key=
                     Absolute path to JWT key file. Key must be HS256, and
                     should only be accessible by SlurmUser. If not set,
                     the default key file is jwt_hs256.key in
                     StateSaveLocation.
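
              For example, to permit JWT authentication alongside the
              default AuthType while restricting token creation (the key
              path is illustrative):

                  AuthAltTypes=auth/jwt
                  AuthAltParameters=jwt_key=/var/spool/slurmctld/jwt_hs256.key,disable_token_creation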

       AuthInfo
              Additional information to be used for authentication of
              communications between the Slurm daemons (slurmctld and
              slurmd) and the Slurm clients. The interpretation of this
              option is specific to the configured AuthType. Multiple
              options may be specified in a comma-delimited list. If not
              specified, the default authentication information will be
              used.

              cred_expire
                     Default job step credential lifetime, in seconds
                     (e.g. "cred_expire=1200"). It must be sufficiently
                     long to load the user environment, run the prolog,
                     deal with the slurmd getting paged out of memory,
                     etc. This also controls how long a requeued job must
                     wait before starting again. The default value is 120
                     seconds.

              socket Path name to a MUNGE daemon socket to use (e.g.
                     "socket=/var/run/munge/munge.socket.2"). The default
                     value is "/var/run/munge/munge.socket.2". Used by
                     auth/munge and cred/munge.

              ttl    Credential lifetime, in seconds (e.g. "ttl=300").
                     The default value is dependent upon the MUNGE
                     installation, but is typically 300 seconds.
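
              For example, to lengthen the credential lifetime on a system
              with a slow Prolog (the values are illustrative):

                  AuthInfo=cred_expire=1200,ttl=600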

       AuthType
              The authentication method for communications between Slurm
              components. Acceptable values at present include
              "auth/munge", which is the default. "auth/munge" indicates
              that MUNGE is to be used (see
              "https://dun.github.io/munge/" for more information). All
              Slurm daemons and commands must be terminated prior to
              changing the value of AuthType and later restarted.

       BackupAddr
              Deprecated option, see SlurmctldHost.

       BackupController
              Deprecated option, see SlurmctldHost.

              The backup controller recovers state information from the
              StateSaveLocation directory, which must be readable and
              writable from both the primary and backup controllers.
              While not essential, it is recommended that you specify a
              backup controller. See the RELOCATING CONTROLLERS section if
              you change this.

       BatchStartTimeout
              The maximum time (in seconds) that a batch job is permitted
              for launching before being considered missing and releasing
              the allocation. The default value is 10 (seconds). Larger
              values may be required if more time is required to execute
              the Prolog, load user environment variables, or if the
              slurmd daemon gets paged from memory.
              Note: The test for a job being successfully launched is
              only performed when the Slurm daemon on the compute node
              registers state with the slurmctld daemon on the head node,
              which happens fairly rarely. Therefore a job will not
              necessarily be terminated if its start time exceeds
              BatchStartTimeout. This configuration parameter is also
              applied to the launch of tasks, and avoids aborting srun
              commands due to long running Prolog scripts.

       BcastExclude
              Comma-separated list of absolute directory paths to be
              excluded when autodetecting and broadcasting executable
              shared object dependencies through sbcast or srun --bcast.
              The keyword "none" can be used to indicate that no
              directory paths should be excluded. The default value is
              "/lib,/usr/lib,/lib64,/usr/lib64". This option can be
              overridden by sbcast --exclude and srun --bcast-exclude.

       BcastParameters
              Controls sbcast and srun --bcast behavior. Multiple options
              can be specified in a comma separated list. Supported
              values include:

              DestDir=
                     Destination directory for the file being broadcast
                     to allocated compute nodes. The default value is the
                     current working directory, or --chdir for srun if
                     set.

              Compression=
                     Specify the default file compression library to be
                     used. Supported values are "lz4" and "none". The
                     default value with the sbcast --compress option is
                     "lz4", and "none" otherwise. Some compression
                     libraries may be unavailable on some systems.

              send_libs
                     If set, attempt to autodetect and broadcast the
                     executable's shared object dependencies to allocated
                     compute nodes. The files are placed in a directory
                     alongside the executable. For srun only, the
                     LD_LIBRARY_PATH is automatically updated to include
                     this cache directory as well. This can be overridden
                     with either the sbcast or srun --send-libs option.
                     By default this is disabled.
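
              For example, to broadcast files into a node-local scratch
              directory with compression and library shipping enabled
              (the destination path is illustrative):

                  BcastParameters=DestDir=/tmp,Compression=lz4,send_libs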

       BurstBufferType
              The plugin used to manage burst buffers. Acceptable values
              at present are:

              burst_buffer/datawarp
                     Use the Cray DataWarp API to provide burst buffer
                     functionality.

              burst_buffer/lua
                     This plugin provides hooks to an API that is defined
                     by a Lua script. This plugin was developed to provide
                     system administrators with a way to do any task (not
                     only file staging) at different points in a job's
                     life cycle.

              burst_buffer/none

       CliFilterPlugins
              A comma-delimited list of command line interface option
              filter/modification plugins. The specified plugins will be
              executed in the order listed. No cli_filter plugins are used
              by default. Acceptable values at present are:

              cli_filter/lua
                     This plugin allows you to write your own
                     implementation of a cli_filter using Lua.

              cli_filter/syslog
                     This plugin enables logging of the job submission
                     activities performed. All the salloc/sbatch/srun
                     options are logged to syslog together with
                     environment variables in JSON format. If the plugin
                     is not the last one in the list, it may log values
                     different from what was actually sent to slurmctld.

              cli_filter/user_defaults
                     This plugin looks for the file $HOME/.slurm/defaults
                     and reads every line of it as a key=value pair, where
                     key is any of the job submission options available to
                     salloc/sbatch/srun and value is a default value
                     defined by the user. For instance:
                         time=1:30
                         mem=2048
                     The above will result in a user-defined default for
                     each of their jobs of "-t 1:30" and "--mem=2048".

       ClusterName
              The name by which this Slurm managed cluster is known in the
              accounting database. This is needed to distinguish
              accounting records when multiple clusters report to the same
              database. Because of limitations in some databases, any
              upper case letters in the name will be silently mapped to
              lower case. In order to avoid confusion, it is recommended
              that the name be lower case.

       CommunicationParameters
              Comma-separated options identifying communication options.

              block_null_hash
                     Require all Slurm authentication tokens to include a
                     newer (20.11.9 and 21.08.8) payload that provides an
                     additional layer of security against credential
                     replay attacks. This option should only be enabled
                     once all Slurm daemons have been upgraded to
                     20.11.9/21.08.8 or newer, and all jobs that were
                     started before the upgrade have been completed.

              CheckGhalQuiesce
                     Used specifically on a Cray using an Aries Ghal
                     interconnect. This will check to see if the system is
                     quiescing when sending a message, and if so, wait
                     until it is done before sending.

              DisableIPv4
                     Disable IPv4-only operation for all Slurm daemons
                     (except slurmdbd). This should also be set in your
                     slurmdbd.conf file.

              EnableIPv6
                     Enable using IPv6 addresses for all Slurm daemons
                     (except slurmdbd). When using both IPv4 and IPv6,
                     address family preferences will be based on your
                     /etc/gai.conf file. This should also be set in your
                     slurmdbd.conf file.

              NoAddrCache
                     By default, Slurm will cache a node's network address
                     after successfully establishing it. This option
                     disables the cache, and Slurm will look up the node's
                     network address each time a connection is made. This
                     is useful, for example, in a cloud environment where
                     node addresses come and go out of DNS.

              NoCtldInAddrAny
                     Used to directly bind to the address that the node
                     running the slurmctld resolves to, instead of binding
                     messages to any address on the node, which is the
                     default.

              NoInAddrAny
                     Used to directly bind to the address that the node
                     resolves to, instead of binding messages to any
                     address on the node, which is the default. This
                     option is for all daemons/clients except for the
                     slurmctld.
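
              For example, a cloud cluster might enable IPv6 and disable
              the address cache so node addresses are always re-resolved:

                  CommunicationParameters=EnableIPv6,NoAddrCache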

       CompleteWait
              The time to wait, in seconds, when any job is in the
              COMPLETING state before any additional jobs are scheduled.
              This is to attempt to keep jobs on nodes that were recently
              in use, with the goal of preventing fragmentation. If set to
              zero, pending jobs will be started as soon as possible.
              Since a COMPLETING job's resources are released for use by
              other jobs as soon as the Epilog completes on each
              individual node, this can result in very fragmented resource
              allocations. To provide jobs with the minimum response time,
              a value of zero is recommended (no waiting). To minimize
              fragmentation of resources, a value equal to KillWait plus
              two is recommended. In that case, setting KillWait to a
              small value may be beneficial. The default value of
              CompleteWait is zero seconds. The value may not exceed
              65533.

              NOTE: Setting reduce_completing_frag affects the behavior of
              CompleteWait.

       ControlAddr
              Deprecated option, see SlurmctldHost.

       ControlMachine
              Deprecated option, see SlurmctldHost.

       CoreSpecPlugin
              Identifies the plugin to be used for enforcement of core
              specialization. A restart of the slurmd daemons is required
              for changes to this parameter to take effect. Acceptable
              values at present include:

              core_spec/cray_aries
                     used only for Cray systems

              core_spec/none
                     used for all other system types

       CpuFreqDef
              Default CPU frequency value or frequency governor to use
              when running a job step if it has not been explicitly set
              with the --cpu-freq option. Acceptable values at present
              include a numeric value (frequency in kilohertz) or one of
              the following governors:

              Conservative attempts to use the Conservative CPU governor

              OnDemand     attempts to use the OnDemand CPU governor

              Performance  attempts to use the Performance CPU governor

              PowerSave    attempts to use the PowerSave CPU governor

              There is no default value. If unset and the --cpu-freq
              option has not been set, no attempt to set the governor is
              made.

       CpuFreqGovernors
              List of CPU frequency governors allowed to be set with the
              salloc, sbatch, or srun option --cpu-freq. Acceptable values
              at present include:

              Conservative attempts to use the Conservative CPU governor

              OnDemand     attempts to use the OnDemand CPU governor (a
                           default value)

              Performance  attempts to use the Performance CPU governor
                           (a default value)

              PowerSave    attempts to use the PowerSave CPU governor

              SchedUtil    attempts to use the SchedUtil CPU governor

              UserSpace    attempts to use the UserSpace CPU governor (a
                           default value)

              The default is OnDemand, Performance and UserSpace.
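
              For example, to default job steps to the OnDemand governor
              while also allowing users to request the Performance
              governor:

                  CpuFreqDef=OnDemand
                  CpuFreqGovernors=OnDemand,Performance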

       CredType
              The cryptographic signature tool to be used in the creation
              of job step credentials. A restart of slurmctld is required
              for changes to this parameter to take effect. The default
              (and recommended) value is "cred/munge".

       DebugFlags
              Defines specific subsystems which should provide more
              detailed event logging. Multiple subsystems can be specified
              with comma separators. Most DebugFlags will result in
              verbose-level logging for the identified subsystems, and
              could impact performance. Valid subsystems available
              include:

              Accrue           Accrue counters accounting details

              Agent            RPC agents (outgoing RPCs from Slurm
                               daemons)

              Backfill         Backfill scheduler details

              BackfillMap      Backfill scheduler to log a very verbose
                               map of reserved resources through time.
                               Combine with Backfill for a verbose and
                               complete view of the backfill scheduler's
                               work.

              BurstBuffer      Burst Buffer plugin

              Cgroup           Cgroup details

              CPU_Bind         CPU binding details for jobs and steps

              CpuFrequency     CPU frequency details for jobs and steps
                               using the --cpu-freq option.

              Data             Generic data structure details.

              Dependency       Job dependency debug info

              Elasticsearch    Elasticsearch debug info

              Energy           AcctGatherEnergy debug info

              ExtSensors       External Sensors debug info

              Federation       Federation scheduling debug info

              FrontEnd         Front end node details

              Gres             Generic resource details

              Hetjob           Heterogeneous job details

              Gang             Gang scheduling details

              JobAccountGather Common job account gathering details (not
                               plugin specific).

              JobContainer     Job container plugin details

              License          License management details

              Network          Network details. Warning: activating this
                               flag may cause logging of passwords, tokens
                               or other authentication credentials.

              NetworkRaw       Dump raw hex values of key Network
                               communications. Warning: This flag will
                               cause very verbose logs and may cause
                               logging of passwords, tokens or other
                               authentication credentials.

              NodeFeatures     Node Features plugin debug info

              NO_CONF_HASH     Do not log when the slurm.conf files differ
                               between Slurm daemons

              Power            Power management plugin and power save
                               (suspend/resume programs) details

              Priority         Job prioritization

              Profile          AcctGatherProfile plugins details

              Protocol         Communication protocol details

              Reservation      Advanced reservations

              Route            Message forwarding debug info

              Script           Debug info regarding the process that runs
                               slurmctld scripts such as PrologSlurmctld
                               and EpilogSlurmctld

              SelectType       Resource selection plugin

              Steps            Slurmctld resource allocation for job steps

              Switch           Switch plugin

              TimeCray         Timing of Cray APIs

              TraceJobs        Trace jobs in slurmctld. It will print
                               detailed job information including state,
                               job ids and allocated node counts.

              Triggers         Slurmctld triggers

              WorkQueue        Work Queue details
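
              For example, to debug backfill scheduling decisions in
              detail:

                  DebugFlags=Backfill,BackfillMap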

       DefCpuPerGPU
              Default count of CPUs allocated per allocated GPU. This
              value is used only if the job didn't specify
              --cpus-per-task and --cpus-per-gpu.

       DefMemPerCPU
              Default real memory size available per allocated CPU in
              megabytes. Used to avoid over-subscribing memory and causing
              paging. DefMemPerCPU would generally be used if individual
              processors are allocated to jobs (SelectType=select/cons_res
              or SelectType=select/cons_tres). The default value is 0
              (unlimited). Also see DefMemPerGPU, DefMemPerNode and
              MaxMemPerCPU. DefMemPerCPU, DefMemPerGPU and DefMemPerNode
              are mutually exclusive.

       DefMemPerGPU
              Default real memory size available per allocated GPU in
              megabytes. The default value is 0 (unlimited). Also see
              DefMemPerCPU and DefMemPerNode. DefMemPerCPU, DefMemPerGPU
              and DefMemPerNode are mutually exclusive.

       DefMemPerNode
              Default real memory size available per allocated node in
              megabytes. Used to avoid over-subscribing memory and causing
              paging. DefMemPerNode would generally be used if whole nodes
              are allocated to jobs (SelectType=select/linear) and
              resources are over-subscribed (OverSubscribe=yes or
              OverSubscribe=force). The default value is 0 (unlimited).
              Also see DefMemPerCPU, DefMemPerGPU and MaxMemPerCPU.
              DefMemPerCPU, DefMemPerGPU and DefMemPerNode are mutually
              exclusive.
799
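       As a sketch, a site using one of the cons_* selection plugins
       might set a per-CPU memory default like the following (the 2048
       MB value is illustrative only):

```
# Illustrative: give each allocated CPU a 2 GB memory default;
# jobs may still request a different amount explicitly.
SelectType=select/cons_tres
DefMemPerCPU=2048
```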
       DependencyParameters
              Multiple options may be comma separated.

              disable_remote_singleton
                     By default, when a federated job has a singleton
                     dependency, each cluster in the federation must
                     clear the singleton dependency before the job's
                     singleton dependency is considered satisfied.
                     Enabling this option means that only the origin
                     cluster must clear the singleton dependency. This
                     option must be set in every cluster in the
                     federation.

              kill_invalid_depend
                     If a job has an invalid dependency and can never
                     run, terminate it and set its state to
                     JOB_CANCELLED. By default the job stays pending
                     with reason DependencyNeverSatisfied.

              max_depend_depth=#
                     Maximum number of jobs to test for a circular job
                     dependency. Stop testing after this number of job
                     dependencies have been tested. The default value
                     is 10 jobs.

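       For example, the two options above could be combined as follows
       (the depth of 20 is an arbitrary illustration):

```
# Illustrative: cancel jobs whose dependencies can never be satisfied,
# and test up to 20 jobs when checking for circular dependencies.
DependencyParameters=kill_invalid_depend,max_depend_depth=20
```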
       DisableRootJobs
              If set to "YES" then user root will be prevented from
              running any jobs. The default value is "NO", meaning user
              root will be able to execute jobs. DisableRootJobs may
              also be set by partition.

       EioTimeout
              The number of seconds srun waits for slurmstepd to close
              the TCP/IP connection used to relay data between the user
              application and srun when the user application
              terminates. The default value is 60 seconds. May not
              exceed 65533.

       EnforcePartLimits
              If set to "ALL" then jobs which exceed a partition's size
              and/or time limits will be rejected at submission time.
              If a job is submitted to multiple partitions, the job
              must satisfy the limits on all the requested partitions.
              If set to "NO" then the job will be accepted and remain
              queued until the partition limits are altered (Time and
              Node Limits). If set to "ANY" a job must satisfy the
              limits of at least one of the requested partitions to be
              submitted. The default value is "NO". NOTE: If set, then
              a job's QOS can not be used to exceed partition limits.
              NOTE: The partition limits being considered are its
              configured MaxMemPerCPU, MaxMemPerNode, MinNodes,
              MaxNodes, MaxTime, AllocNodes, AllowAccounts,
              AllowGroups, AllowQOS, and QOS usage threshold.

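       A minimal sketch of the strictest setting, which rejects
       out-of-bounds jobs at submit time instead of leaving them
       queued:

```
# Illustrative: reject a job at submission unless it satisfies the
# limits of every partition it requests.
EnforcePartLimits=ALL
```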
       Epilog Fully qualified pathname of a script to execute as user
              root on every node when a user's job completes (e.g.
              "/usr/local/slurm/epilog"). A glob pattern (see glob(7))
              may also be used to run more than one epilog script (e.g.
              "/etc/slurm/epilog.d/*"). The Epilog script or scripts
              may be used to purge files, disable user login, etc. By
              default there is no epilog. See Prolog and Epilog Scripts
              for more information.

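       The glob form makes it easy to keep epilog logic in drop-in
       files. A sketch (the directory path is the example from the
       text above):

```
# Illustrative: run every epilog script matching the glob pattern on
# each node when a job completes.
Epilog=/etc/slurm/epilog.d/*
```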
       EpilogMsgTime
              The number of microseconds that the slurmctld daemon
              requires to process an epilog completion message from the
              slurmd daemons. This parameter can be used to prevent a
              burst of epilog completion messages from being sent at
              the same time, which should help prevent lost messages
              and improve throughput for large jobs. The default value
              is 2000 microseconds. For a 1000 node job, this spreads
              the epilog completion messages out over two seconds.

       EpilogSlurmctld
              Fully qualified pathname of a program for the slurmctld
              to execute upon termination of a job allocation (e.g.
              "/usr/local/slurm/epilog_controller"). The program
              executes as SlurmUser, which gives it permission to drain
              nodes and requeue the job if a failure occurs (see
              scontrol(1)). Exactly what the program does and how it
              accomplishes this is completely at the discretion of the
              system administrator. Information about the job being
              initiated, its allocated nodes, etc. are passed to the
              program using environment variables. See Prolog and
              Epilog Scripts for more information.

       ExtSensorsFreq
              The external sensors plugin sampling interval. If
              ExtSensorsType=ext_sensors/none, this parameter is
              ignored. For all other values of ExtSensorsType, this
              parameter is the number of seconds between external
              sensors samples for hardware components (nodes, switches,
              etc.). The default value is zero, which disables external
              sensors sampling. Note: This parameter does not affect
              external sensors data collection for jobs/steps.

       ExtSensorsType
              Identifies the plugin to be used for external sensors
              data collection. Slurmctld calls this plugin to collect
              external sensors data for jobs/steps and hardware
              components. In case of node sharing between jobs the
              reported values per job/step (through sstat or sacct) may
              not be accurate. See also "man ext_sensors.conf".

              Configurable values at present are:

              ext_sensors/none    No external sensors data is
                                  collected.

              ext_sensors/rrd     External sensors data is collected
                                  from the RRD database.

       FairShareDampeningFactor
              Dampen the effect of exceeding a user or group's fair
              share of allocated resources. Higher values provide a
              greater ability to differentiate between exceeding the
              fair share at high levels (e.g. a value of 1 results in
              almost no difference between overconsumption by a factor
              of 10 and 100, while a value of 5 will result in a
              significant difference in priority). The default value
              is 1.

       FederationParameters
              Used to define federation options. Multiple options may
              be comma separated.

              fed_display
                     If set, then the client status commands (e.g.
                     squeue, sinfo, sprio, etc.) will display
                     information in a federated view by default. This
                     option is functionally equivalent to using the
                     --federation option on each command. Use the
                     client's --local option to override the federated
                     view and get a local view of the given cluster.

       FirstJobId
              The job id to be used for the first job submitted to
              Slurm. Job id values generated will be incremented by 1
              for each subsequent job. The value must be larger than 0.
              The default value is 1. Also see MaxJobId.

       GetEnvTimeout
              Controls how long the job should wait (in seconds) to
              load the user's environment before attempting to load it
              from a cache file. Applies when the salloc or sbatch
              --get-user-env option is used. If set to 0 then always
              load the user's environment from the cache file. The
              default value is 2 seconds.

       GresTypes
              A comma-delimited list of generic resources to be managed
              (e.g. GresTypes=gpu,mps). These resources may have an
              associated GRES plugin of the same name providing
              additional functionality. No generic resources are
              managed by default. Ensure this parameter is consistent
              across all nodes in the cluster for proper operation. A
              restart of slurmctld and the slurmd daemons is required
              for this to take effect.

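       A sketch of managing GPUs and CUDA MPS as generic resources
       (the matching per-node Gres= definitions, not shown here, are
       also required):

```
# Illustrative: manage "gpu" and "mps" as generic resources.
# Requires a restart of slurmctld and all slurmd daemons.
GresTypes=gpu,mps
```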
       GroupUpdateForce
              If set to a non-zero value, then information about which
              users are members of groups allowed to use a partition
              will be updated periodically, even when there have been
              no changes to the /etc/group file. If set to zero, group
              member information will be updated only after the
              /etc/group file is updated. The default value is 1. Also
              see the GroupUpdateTime parameter.

       GroupUpdateTime
              Controls how frequently information about which users are
              members of groups allowed to use a partition will be
              updated, and how long user group membership lists will be
              cached. The time interval is given in seconds with a
              default value of 600 seconds. A value of zero will
              prevent periodic updating of group membership
              information. Also see the GroupUpdateForce parameter.

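       The two parameters above work together; a sketch using their
       default values, spelled out for clarity:

```
# Illustrative: refresh group membership every 600 seconds, even when
# /etc/group has not changed (useful with LDAP/NSS-backed groups).
GroupUpdateForce=1
GroupUpdateTime=600
```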
       GpuFreqDef=[<type>=]<value>[,<type>=<value>]
              Default GPU frequency to use when running a job step if
              it has not been explicitly set using the --gpu-freq
              option. This option can be used to independently
              configure the GPU and its memory frequencies. Defaults to
              "high,memory=high". After the job is completed, the
              frequencies of all affected GPUs will be reset to the
              highest possible values. In some cases, system power caps
              may override the requested values. The field type can be
              "memory". If type is not specified, the GPU frequency is
              implied. The value field can either be "low", "medium",
              "high", "highm1" or a numeric value in megahertz (MHz).
              If the specified numeric value is not possible, a value
              as close as possible will be used. See below for
              definition of the values. Examples of use include
              "GpuFreqDef=medium,memory=high" and "GpuFreqDef=450".

              Supported value definitions:

              low       the lowest available frequency.

              medium    attempts to set a frequency in the middle of
                        the available range.

              high      the highest available frequency.

              highm1    (high minus one) will select the next highest
                        available frequency.

       HealthCheckInterval
              The interval in seconds between executions of
              HealthCheckProgram. The default value is zero, which
              disables execution.

       HealthCheckNodeState
              Identify what node states should execute the
              HealthCheckProgram. Multiple state values may be
              specified with a comma separator. The default value is
              ANY to execute on nodes in any state.

              ALLOC     Run on nodes in the ALLOC state (all CPUs
                        allocated).

              ANY       Run on nodes in any state.

              CYCLE     Rather than running the health check program on
                        all nodes at the same time, cycle through
                        running on all compute nodes through the course
                        of the HealthCheckInterval. May be combined
                        with the various node state options.

              IDLE      Run on nodes in the IDLE state.

              MIXED     Run on nodes in the MIXED state (some CPUs idle
                        and other CPUs allocated).

       HealthCheckProgram
              Fully qualified pathname of a script to execute as user
              root periodically on all compute nodes that are not in
              the NOT_RESPONDING state. This program may be used to
              verify the node is fully operational and DRAIN the node
              or send email if a problem is detected. Any action to be
              taken must be explicitly performed by the program (e.g.
              execute "scontrol update NodeName=foo State=drain
              Reason=tmp_file_system_full" to drain a node). The
              execution interval is controlled using the
              HealthCheckInterval parameter. Note that the
              HealthCheckProgram will be executed at the same time on
              all nodes to minimize its impact upon parallel programs.
              This program will be killed if it does not terminate
              normally within 60 seconds. This program will also be
              executed when the slurmd daemon is first started and
              before it registers with the slurmctld daemon. By
              default, no program will be executed.

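       Putting the three health check parameters together, a sketch
       (the script path is hypothetical; the script itself must issue
       any "scontrol update" actions):

```
# Illustrative: run a site-provided check every 5 minutes on idle
# nodes, cycling through them rather than hitting all nodes at once.
HealthCheckProgram=/usr/local/sbin/node_health.sh
HealthCheckInterval=300
HealthCheckNodeState=IDLE,CYCLE
```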
       InactiveLimit
              The interval, in seconds, after which a non-responsive
              job allocation command (e.g. srun or salloc) will result
              in the job being terminated. If the node on which the
              command is executed fails or the command abnormally
              terminates, this will terminate its job allocation. This
              option has no effect upon batch jobs. When setting a
              value, take into consideration that a debugger using srun
              to launch an application may leave the srun command in a
              stopped state for extended periods of time. This limit is
              ignored for jobs running in partitions with the RootOnly
              flag set (the scheduler running as root will be
              responsible for the job). The default value is unlimited
              (zero) and may not exceed 65533 seconds.

       InteractiveStepOptions
              When LaunchParameters=use_interactive_step is enabled,
              launching salloc will automatically start an srun process
              with InteractiveStepOptions to launch a terminal on a
              node in the job allocation. The default value is
              "--interactive --preserve-env --pty $SHELL". The
              "--interactive" option is intentionally not documented in
              the srun man page. It is meant only to be used in
              InteractiveStepOptions in order to create an "interactive
              step" that will not consume resources so that other steps
              may run in parallel with the interactive step.

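       A sketch enabling this behavior, spelling out the default
       options explicitly:

```
# Illustrative: have salloc open a shell on an allocated node via an
# interactive step, using the (default) options shown.
LaunchParameters=use_interactive_step
InteractiveStepOptions=--interactive --preserve-env --pty $SHELL
```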
       JobAcctGatherType
              The job accounting mechanism type. Acceptable values at
              present include "jobacct_gather/linux" (for Linux
              systems), "jobacct_gather/cgroup" and
              "jobacct_gather/none" (no accounting data collected). The
              default value is "jobacct_gather/none".
              "jobacct_gather/cgroup" is a plugin for the Linux
              operating system that uses cgroups to collect accounting
              statistics. The plugin collects the following statistics:
              From the cgroup memory subsystem: memory.usage_in_bytes
              (reported as 'pages') and rss from memory.stat (reported
              as 'rss'). From the cgroup cpuacct subsystem: user cpu
              time and system cpu time. No value is provided by cgroups
              for virtual memory size ('vsize'). In order to use the
              sstat tool, "jobacct_gather/linux" or
              "jobacct_gather/cgroup" must be configured.
              NOTE: Changing this configuration parameter changes the
              contents of the messages between Slurm daemons. Any
              previously running job steps are managed by a slurmstepd
              daemon that will persist through the lifetime of that job
              step and not change its communication protocol. Only
              change this configuration parameter when there are no
              running job steps.

       JobAcctGatherFrequency
              The job accounting and profiling sampling intervals. The
              supported format is as follows:

              JobAcctGatherFrequency=<datatype>=<interval>
                     where <datatype>=<interval> specifies the task
                     sampling interval for the jobacct_gather plugin or
                     a sampling interval for a profiling type by the
                     acct_gather_profile plugin. Multiple,
                     comma-separated <datatype>=<interval> intervals
                     may be specified. Supported datatypes are as
                     follows:

                     task=<interval>
                            where <interval> is the task sampling
                            interval in seconds for the jobacct_gather
                            plugins and for task profiling by the
                            acct_gather_profile plugin.

                     energy=<interval>
                            where <interval> is the sampling interval
                            in seconds for energy profiling using the
                            acct_gather_energy plugin.

                     network=<interval>
                            where <interval> is the sampling interval
                            in seconds for infiniband profiling using
                            the acct_gather_interconnect plugin.

                     filesystem=<interval>
                            where <interval> is the sampling interval
                            in seconds for filesystem profiling using
                            the acct_gather_filesystem plugin.

              The default value for the task sampling interval is 30
              seconds. The default value for all other intervals is 0.
              An interval of 0 disables sampling of the specified type.
              If the task sampling interval is 0, accounting
              information is collected only at job termination
              (reducing Slurm interference with the job).
              Smaller (non-zero) values have a greater impact upon job
              performance, but a value of 30 seconds is not likely to
              be noticeable for applications having less than 10,000
              tasks.
              Users can independently override each interval on a per
              job basis using the --acctg-freq option when submitting
              the job.

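       As a sketch, the format above allows several datatypes in one
       line (the 30/60 second values are illustrative):

```
# Illustrative: sample task accounting every 30 s and energy every
# 60 s; network and filesystem sampling stay disabled (interval 0).
JobAcctGatherType=jobacct_gather/cgroup
JobAcctGatherFrequency=task=30,energy=60
```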
       JobAcctGatherParams
              Arbitrary parameters for the job account gather plugin.
              Acceptable values at present include:

              NoShared        Exclude shared memory from accounting.

              UsePss          Use PSS value instead of RSS to calculate
                              real usage of memory. The PSS value will
                              be saved as RSS.

              OverMemoryKill  Kill processes that are detected to be
                              using more memory than requested by
                              steps, every time accounting information
                              is gathered by the JobAcctGather plugin.
                              This parameter should be used with
                              caution because a job exceeding its
                              memory allocation may affect other
                              processes and/or machine health.

                              NOTE: If available, it is recommended to
                              limit memory by enabling task/cgroup as a
                              TaskPlugin and making use of
                              ConstrainRAMSpace=yes in the cgroup.conf
                              instead of using this JobAcctGather
                              mechanism for memory enforcement. Using
                              JobAcctGather is polling based and there
                              is a delay before a job is killed, which
                              could lead to system Out of Memory
                              events.

                              NOTE: When using OverMemoryKill, if the
                              combined memory used by all the processes
                              in a step exceeds the memory limit, the
                              entire step will be killed/cancelled by
                              the JobAcctGather plugin. This differs
                              from the behavior when using
                              ConstrainRAMSpace, where processes in the
                              step will be killed, but the step will be
                              left active, possibly with other
                              processes left running.

       JobCompHost
              The name of the machine hosting the job completion
              database. Only used for database type storage plugins,
              ignored otherwise.

       JobCompLoc
              The fully qualified file name where job completion
              records are written when the JobCompType is
              "jobcomp/filetxt", the database where job completion
              records are stored when the JobCompType is a database, or
              a complete URL endpoint with format
              <host>:<port>/<target>/_doc when JobCompType is
              "jobcomp/elasticsearch" (e.g.
              "localhost:9200/slurm/_doc"). NOTE: More information is
              available at the Slurm web site
              <https://slurm.schedmd.com/elasticsearch.html>.

       JobCompParams
              Pass an arbitrary text string to the job completion
              plugin. Also see JobCompType.

       JobCompPass
              The password used to gain access to the database to store
              the job completion data. Only used for database type
              storage plugins, ignored otherwise.

       JobCompPort
              The listening port of the job completion database server.
              Only used for database type storage plugins, ignored
              otherwise.

       JobCompType
              The job completion logging mechanism type. Acceptable
              values at present include:

              jobcomp/none
                     Upon job completion, a record of the job is purged
                     from the system. If using the accounting
                     infrastructure this plugin may not be of interest
                     since some of the information is redundant.

              jobcomp/elasticsearch
                     Upon job completion, a record of the job should be
                     written to an Elasticsearch server, specified by
                     the JobCompLoc parameter.
                     NOTE: More information is available at the Slurm
                     web site
                     <https://slurm.schedmd.com/elasticsearch.html>.

              jobcomp/filetxt
                     Upon job completion, a record of the job should be
                     written to a text file, specified by the
                     JobCompLoc parameter.

              jobcomp/lua
                     Upon job completion, a record of the job should be
                     processed by the jobcomp.lua script, located in
                     the default script directory (typically the
                     subdirectory etc of the installation directory).

              jobcomp/mysql
                     Upon job completion, a record of the job should be
                     written to a MySQL or MariaDB database, specified
                     by the JobCompLoc parameter.

              jobcomp/script
                     Upon job completion, a script specified by the
                     JobCompLoc parameter is to be executed with
                     environment variables providing the job
                     information.

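       JobCompType and JobCompLoc are typically set together; a sketch
       using the Elasticsearch endpoint format described under
       JobCompLoc (the host and index name are illustrative):

```
# Illustrative: ship job completion records to a local Elasticsearch
# instance using the <host>:<port>/<target>/_doc endpoint format.
JobCompType=jobcomp/elasticsearch
JobCompLoc=localhost:9200/slurm/_doc
```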
       JobCompUser
              The user account for accessing the job completion
              database. Only used for database type storage plugins,
              ignored otherwise.

       JobContainerType
              Identifies the plugin to be used for job tracking. A
              restart of slurmctld is required for changes to this
              parameter to take effect. NOTE: The JobContainerType
              applies to a job allocation, while ProctrackType applies
              to job steps. Acceptable values at present include:

              job_container/cncu   Used only for Cray systems (CNCU =
                                   Compute Node Clean Up)

              job_container/none   Used for all other system types

              job_container/tmpfs  Used to create a private namespace
                                   on the filesystem for jobs, which
                                   houses temporary file systems (/tmp
                                   and /dev/shm) for each job.
                                   'PrologFlags=Contain' must be set to
                                   use this plugin.

       JobFileAppend
              This option controls what to do if a job's output or
              error files exist when the job is started. If
              JobFileAppend is set to a value of 1, then append to the
              existing file. By default, any existing file is
              truncated.

       JobRequeue
              This option controls the default ability for batch jobs
              to be requeued. Jobs may be requeued explicitly by a
              system administrator, after node failure, or upon
              preemption by a higher priority job. If JobRequeue is set
              to a value of 1, then batch jobs may be requeued unless
              explicitly disabled by the user. If JobRequeue is set to
              a value of 0, then batch jobs will not be requeued unless
              explicitly enabled by the user. Use the sbatch
              --no-requeue or --requeue option to change the default
              behavior for individual jobs. The default value is 1.

       JobSubmitPlugins
              A comma-delimited list of job submission plugins to be
              used. The specified plugins will be executed in the order
              listed. These are intended to be site-specific plugins
              which can be used to set default job parameters and/or
              log events. Sample plugins available in the distribution
              include "all_partitions", "defaults", "logging", "lua",
              and "partition". For examples of use, see the Slurm code
              in "src/plugins/job_submit" and
              "contribs/lua/job_submit*.lua", then modify the code to
              satisfy your needs. Slurm can be configured to use
              multiple job_submit plugins if desired, however the lua
              plugin will only execute one lua script named
              "job_submit.lua" located in the default script directory
              (typically the subdirectory "etc" of the installation
              directory). No job submission plugins are used by
              default.

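       A sketch combining two of the sample plugins named above; the
       plugins run in the order listed:

```
# Illustrative: apply site policy from job_submit.lua (in the default
# script directory), then the sample "all_partitions" plugin.
JobSubmitPlugins=lua,all_partitions
```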
       KeepAliveTime
              Specifies how long socket communications used between the
              srun command and its slurmstepd process are kept alive
              after disconnect. Longer values can be used to improve
              reliability of communications in the event of network
              failures. By default, the system default value is used.
              The value may not exceed 65533.

       KillOnBadExit
              If set to 1, a step will be terminated immediately if any
              task crashes or aborts, as indicated by a non-zero exit
              code. With the default value of 0, if one of the
              processes crashes or aborts, the other processes will
              continue to run while the crashed or aborted process
              waits. The user can override this configuration parameter
              by using srun's -K, --kill-on-bad-exit option.

       KillWait
              The interval, in seconds, given to a job's processes
              between the SIGTERM and SIGKILL signals upon reaching its
              time limit. If the job fails to terminate gracefully in
              the interval specified, it will be forcibly terminated.
              The default value is 30 seconds. The value may not exceed
              65533.

       NodeFeaturesPlugins
              Identifies the plugins to be used for support of node
              features which can change through time. For example, a
              node which might be booted with various BIOS settings.
              This is supported through the use of a node's
              active_features and available_features information.
              Acceptable values at present include:

              node_features/knl_cray
                     Used only for Intel Knights Landing processors
                     (KNL) on Cray systems.

              node_features/knl_generic
                     Used for Intel Knights Landing processors (KNL) on
                     a generic Linux system.

              node_features/helpers
                     Used to report and modify features on nodes using
                     arbitrary scripts or programs.

       LaunchParameters
              Identifies options to the job launch plugin. Acceptable
              values include:

              batch_step_set_cpu_freq Set the cpu frequency for the
                                      batch step from the given
                                      --cpu-freq option or the
                                      slurm.conf CpuFreqDef setting. By
                                      default only steps started with
                                      srun will utilize the cpu freq
                                      setting options.

                                      NOTE: If you are using srun to
                                      launch your steps inside a batch
                                      script (advised), this option
                                      will create a situation where you
                                      may have multiple agents setting
                                      the cpu_freq, as the batch step
                                      usually runs on the same
                                      resources as one or more of the
                                      steps that the sruns in the
                                      script will create.

              cray_net_exclusive      Allow jobs on a Cray Native
                                      cluster exclusive access to
                                      network resources. This should
                                      only be set on clusters providing
                                      exclusive access to each node to
                                      a single job at once, and not
                                      using parallel steps within the
                                      job, otherwise resources on the
                                      node can be oversubscribed.

              enable_nss_slurm        Permits passwd and group
                                      resolution for a job to be
                                      serviced by slurmstepd rather
                                      than requiring a lookup from a
                                      network based service. See
                                      https://slurm.schedmd.com/nss_slurm.html
                                      for more information.

              lustre_no_flush         If set on a Cray Native cluster,
                                      then do not flush the Lustre
                                      cache on job step completion.
                                      This setting will only take
                                      effect after reconfiguring, and
                                      will only take effect for newly
                                      launched jobs.

              mem_sort                Sort NUMA memory at step start.
                                      User can override this default
                                      with the SLURM_MEM_BIND
                                      environment variable or the
                                      --mem-bind=nosort command line
                                      option.

              mpir_use_nodeaddr       When launching tasks Slurm
                                      creates entries in MPIR_proctable
                                      that are used by parallel
                                      debuggers, profilers, and related
                                      tools to attach to running
                                      processes. By default the
                                      MPIR_proctable entries contain
                                      MPIR_procdesc structures where
                                      the host_name is set to NodeName.
                                      If this option is specified,
                                      NodeAddr will be used in this
                                      context instead.

              disable_send_gids       By default, the slurmctld will
                                      look up and send the user_name
                                      and extended gids for a job,
                                      rather than independently on each
                                      node as part of each task launch.
                                      This helps mitigate issues around
                                      name service scalability when
                                      launching jobs involving many
                                      nodes. Using this option will
                                      disable this functionality. This
                                      option is ignored if
                                      enable_nss_slurm is specified.

              slurmstepd_memlock      Lock the slurmstepd process's
                                      current memory in RAM.

              slurmstepd_memlock_all  Lock the slurmstepd process's
                                      current and future memory in RAM.

              test_exec               Have srun verify existence of the
                                      executable program along with
                                      user execute permission on the
                                      node where srun was called before
                                      attempting to launch it on nodes
                                      in the step.

              use_interactive_step    Have salloc use the Interactive
                                      Step to launch a shell on an
                                      allocated compute node rather
                                      than locally to wherever salloc
                                      was invoked. This is accomplished
                                      by launching the srun command
                                      with InteractiveStepOptions as
                                      options.

                                      This does not affect salloc
                                      called with a command as an
                                      argument. These jobs will
                                      continue to be executed as the
                                      calling user on the calling host.

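       Several of the options above can be combined in one
       comma-separated list; an illustrative combination:

```
# Illustrative: verify executables before launch, resolve users via
# slurmstepd, and give salloc an interactive step on the allocation.
LaunchParameters=test_exec,enable_nss_slurm,use_interactive_step
```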
       LaunchType
              Identifies the mechanism to be used to launch application
              tasks. Acceptable values include:

              launch/slurm
                     The default value.

       Licenses
              Specification of licenses (or other resources available
              on all nodes of the cluster) which can be allocated to
              jobs. License names can optionally be followed by a colon
              and count with a default count of one. Multiple license
              names should be comma separated (e.g.
              "Licenses=foo:4,bar"). Note that Slurm prevents jobs from
              being scheduled if their required license specification
              is not available. Slurm does not prevent jobs from using
              licenses that are not explicitly listed in the job
              submission specification.

       LogTimeFormat
              Format of the timestamp in slurmctld and slurmd log
              files. Accepted values are "iso8601", "iso8601_ms",
              "rfc5424", "rfc5424_ms", "clock", "short" and
              "thread_id". The values ending in "_ms" differ from the
              ones without in that fractional seconds with millisecond
              precision are printed. The default value is "iso8601_ms".
              The "rfc5424" formats are the same as the "iso8601"
              formats except that the timezone value is also shown. The
              "clock" format shows a timestamp in microseconds
              retrieved with the C standard clock() function. The
              "short" format is a short date and time format. The
              "thread_id" format shows the timestamp in the C standard
              ctime() function form without the year but including the
              microseconds, the daemon's process ID and the current
              thread name and ID.

       MailDomain
              Domain name to qualify usernames if an email address is
              not explicitly given with the "--mail-user" option. If
              unset, the local MTA will need to qualify local addresses
              itself. Changes to MailDomain will only affect new jobs.

       MailProg
              Fully qualified pathname to the program used to send
              email per user request. The default value is "/bin/mail"
              (or "/usr/bin/mail" if "/bin/mail" does not exist but
              "/usr/bin/mail" does exist). The program is called with
              arguments suitable for the default mail command, however
              additional information about the job is passed in the
              form of environment variables.

              Additional variables are the same as those passed to
              PrologSlurmctld and EpilogSlurmctld, with additional
              variables in the following contexts:

              ALL

                     SLURM_JOB_STATE
                            The base state of the job when the MailProg
                            is called.

                     SLURM_JOB_MAIL_TYPE
                            The mail type triggering the mail.

              BEGIN

                     SLURM_JOB_QUEUED_TIME
                            The amount of time the job was queued.

              END, FAIL, REQUEUE, TIME_LIMIT_*

                     SLURM_JOB_RUN_TIME
                            The amount of time the job ran for.

              END, FAIL

                     SLURM_JOB_EXIT_CODE_MAX
                            Job's exit code or highest exit code for an
                            array job.

                     SLURM_JOB_EXIT_CODE_MIN
                            Job's minimum exit code for an array job.

                     SLURM_JOB_TERM_SIGNAL_MAX
                            Job's highest signal for an array job.

              STAGE_OUT

                     SLURM_JOB_STAGE_OUT_TIME
                            Job's staging out time.

       MaxArraySize
              The maximum job array task index value will be one less
              than MaxArraySize to allow for an index value of zero.
              Configure MaxArraySize to 0 in order to disable job array
              use. The value may not exceed 4000001. The value of
              MaxJobCount should be much larger than MaxArraySize. The
              default value is 1001. See also max_array_tasks in
              SchedulerParameters.

       MaxDBDMsgs
              When communication to the SlurmDBD is not possible, the
              slurmctld will queue messages meant to be processed when
              the SlurmDBD is available again. In order to avoid
              running out of memory, the slurmctld will only queue so
              many messages. The default value is 10000, or MaxJobCount
              * 2 + Node Count * 4, whichever is greater. The value can
              not be less than 10000.

1520 MaxJobCount
1521 The maximum number of jobs slurmctld can have in memory at one
1522 time. Combine with MinJobAge to ensure the slurmctld daemon
1523 does not exhaust its memory or other resources. Once this limit
1524 is reached, requests to submit additional jobs will fail. The
1525 default value is 10000 jobs. NOTE: Each task of a job array
1526 counts as one job even though they will not occupy separate job
1527 records until modified or initiated. Performance can suffer
1528 with more than a few hundred thousand jobs. Setting MaxSubmit‐
1529 Jobs per user is generally valuable to prevent a single user
1530 from filling the system with jobs. This is accomplished using
1531 Slurm's database and configuring enforcement of resource limits.
1532 A restart of slurmctld is required for changes to this parameter
1533 to take effect.
1534
1535 MaxJobId
1536 The maximum job id to be used for jobs submitted to Slurm with‐
1537 out a specific requested value. Job ids are unsigned 32-bit inte‐
1538 gers with the first 26 bits reserved for local job ids and the
1539 remaining 6 bits reserved for a cluster id to identify a feder‐
1540 ated job's origin. The maximum allowed local job id is
1541 67,108,863 (0x3FFFFFF). The default value is 67,043,328
1542 (0x03ff0000). MaxJobId only applies to the local job id and not
1543 the federated job id. Job id values generated will be incre‐
1544 mented by 1 for each subsequent job. Once MaxJobId is reached,
1545 the next job will be assigned FirstJobId. Federated jobs will
1546 always have a job ID of 67,108,865 or higher. Also see FirstJo‐
1547 bId.
1548
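To illustrate the defaults described above, the equivalent explicit configuration would be:

```
# Defaults: job ids count up from FirstJobId and wrap back to it
# after MaxJobId is reached (0x03ff0000 = 67,043,328).
FirstJobId=1
MaxJobId=67043328
```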
1549 MaxMemPerCPU
1550 Maximum real memory size available per allocated CPU in
1551 megabytes. Used to avoid over-subscribing memory and causing
1552 paging. MaxMemPerCPU would generally be used if individual pro‐
1553 cessors are allocated to jobs (SelectType=select/cons_res or Se‐
1554 lectType=select/cons_tres). The default value is 0 (unlimited).
1555 Also see DefMemPerCPU, DefMemPerGPU and MaxMemPerNode. MaxMem‐
1556 PerCPU and MaxMemPerNode are mutually exclusive.
1557
1558 NOTE: If a job specifies a memory per CPU limit that exceeds
1559 this system limit, that job's count of CPUs per task will try to
1560 automatically increase. This may result in the job failing due
1561 to CPU count limits. This auto-adjustment feature is a best-ef‐
1562 fort one and optimal assignment is not guaranteed due to the
1563 possibility of having heterogeneous configurations and
1564 multi-partition/qos jobs. If this is a concern it is advised to
1565 use a job submit LUA plugin instead to enforce auto-adjustments
1566 to your specific needs.
1567
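A minimal sketch, assuming a cluster that allocates individual cores and a hypothetical 4 GB-per-CPU cap:

```
# Hypothetical: cap memory at 4096 MB per allocated CPU when
# individual processors are allocated to jobs.
SelectType=select/cons_tres
MaxMemPerCPU=4096
```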
1568 MaxMemPerNode
1569 Maximum real memory size available per allocated node in
1570 megabytes. Used to avoid over-subscribing memory and causing
1571 paging. MaxMemPerNode would generally be used if whole nodes
1572 are allocated to jobs (SelectType=select/linear) and resources
1573 are over-subscribed (OverSubscribe=yes or OverSubscribe=force).
1574 The default value is 0 (unlimited). Also see DefMemPerNode and
1575 MaxMemPerCPU. MaxMemPerCPU and MaxMemPerNode are mutually ex‐
1576 clusive.
1577
1578 MaxStepCount
1579 The maximum number of steps that any job can initiate. This pa‐
1580 rameter is intended to limit the effect of bad batch scripts.
1581 The default value is 40000 steps.
1582
1583 MaxTasksPerNode
1584 Maximum number of tasks Slurm will allow a job step to spawn on
1585 a single node. The default MaxTasksPerNode is 512. May not ex‐
1586 ceed 65533.
1587
1588 MCSParameters
1589 MCS = Multi-Category Security MCS Plugin Parameters. The sup‐
1590 ported parameters are specific to the MCSPlugin. Changes to
1591 this value take effect when the Slurm daemons are reconfigured.
1592 More information about MCS is available here
1593 <https://slurm.schedmd.com/mcs.html>.
1594
1595 MCSPlugin
1596 MCS = Multi-Category Security : associate a security label to
1597 jobs and ensure that nodes can only be shared among jobs using
1598 the same security label. Acceptable values include:
1599
1600 mcs/none is the default value. No security label associated
1601 with jobs, no particular security restriction when
1602 sharing nodes among jobs.
1603
1604 mcs/account only users with the same account can share the nodes
1605 (requires enabling of accounting).
1606
1607 mcs/group only users with the same group can share the nodes.
1608
1609 mcs/user a node cannot be shared with other users.
1610
1611 MessageTimeout
1612 Time permitted for a round-trip communication to complete in
1613 seconds. Default value is 10 seconds. For systems with shared
1614 nodes, the slurmd daemon could be paged out and necessitate
1615 higher values.
1616
1617 MinJobAge
1618 The minimum age of a completed job before its record is cleared
1619 from the list of jobs slurmctld keeps in memory. Combine with
1620 MaxJobCount to ensure the slurmctld daemon does not exhaust its
1621 memory or other resources. The default value is 300 seconds. A
1622 value of zero prevents any job record purging. Jobs are not
1623 purged during a backfill cycle, so it can take longer than Min‐
1624 JobAge seconds to purge a job if using the backfill scheduling
1625 plugin. In order to eliminate some possible race conditions,
1626 the recommended minimum non-zero value for MinJobAge is 2.
1627
1628 MpiDefault
1629 Identifies the default type of MPI to be used. Srun may over‐
1630 ride this configuration parameter in any case. Currently sup‐
1631 ported versions include: pmi2, pmix, and none (default, which
1632 works for many other versions of MPI). More information about
1633 MPI use is available here
1634 <https://slurm.schedmd.com/mpi_guide.html>.
1635
1636 MpiParams
1637 MPI parameters. Used to identify ports used by older versions
1638 of OpenMPI and native Cray systems. The input format is
1639 "ports=12000-12999" to identify a range of communication ports
1640 to be used. NOTE: This is not needed for modern versions of
1641 OpenMPI; removing it can provide a small boost in scheduling
1642 performance. NOTE: This is required for Cray's PMI.
1643
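Using the port range from the description above, a Cray PMI configuration could look like:

```
# Reserve a communication port range; required for Cray's PMI,
# unnecessary for modern versions of OpenMPI.
MpiParams=ports=12000-12999
```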
1644 OverTimeLimit
1645 Number of minutes by which a job can exceed its time limit be‐
1646 fore being canceled. Normally a job's time limit is treated as
1647 a hard limit and the job will be killed upon reaching that
1648 limit. Configuring OverTimeLimit will result in the job's time
1649 limit being treated like a soft limit. Adding the OverTimeLimit
1650 value to the soft time limit provides a hard time limit, at
1651 which point the job is canceled. This is particularly useful
1652 for backfill scheduling, which is based upon each job's soft time
1653 limit. The default value is zero. May not exceed 65533 min‐
1654 utes. A value of "UNLIMITED" is also supported.
1655
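For example, a hypothetical 10-minute grace period would be configured as:

```
# Treat job time limits as soft; cancel jobs 10 minutes past
# the requested limit.
OverTimeLimit=10
```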
1656 PluginDir
1657 Identifies the places in which to look for Slurm plugins. This
1658 is a colon-separated list of directories, like the PATH environ‐
1659 ment variable. The default value is the prefix given at config‐
1660 ure time + "/lib/slurm". A restart of slurmctld and the slurmd
1661 daemons is required for changes to this parameter to take ef‐
1662 fect.
1663
1664 PlugStackConfig
1665 Location of the config file for Slurm stackable plugins that use
1666 the Stackable Plugin Architecture for Node job (K)control
1667 (SPANK). This provides support for a highly configurable set of
1668 plugins to be called before and/or after execution of each task
1669 spawned as part of a user's job step. Default location is
1670 "plugstack.conf" in the same directory as the system slurm.conf.
1671 For more information on SPANK plugins, see the spank(8) manual.
1672
1673 PowerParameters
1674 System power management parameters. The supported parameters
1675 are specific to the PowerPlugin. Changes to this value take ef‐
1676 fect when the Slurm daemons are reconfigured. More information
1677 about system power management is available here
1678 <https://slurm.schedmd.com/power_mgmt.html>. Options currently
1679 supported by any plugin are listed below.
1680
1681 balance_interval=#
1682 Specifies the time interval, in seconds, between attempts
1683 to rebalance power caps across the nodes. This also con‐
1684 trols the frequency at which Slurm attempts to collect
1685 current power consumption data (old data may be used un‐
1686 til new data is available from the underlying infrastruc‐
1687 ture and values below 10 seconds are not recommended for
1688 Cray systems). The default value is 30 seconds. Sup‐
1689 ported by the power/cray_aries plugin.
1690
1691 capmc_path=
1692 Specifies the absolute path of the capmc command. The
1693 default value is "/opt/cray/capmc/default/bin/capmc".
1694 Supported by the power/cray_aries plugin.
1695
1696 cap_watts=#
1697 Specifies the total power limit to be established across
1698 all compute nodes managed by Slurm. A value of 0 sets
1699 every compute node to have an unlimited cap. The default
1700 value is 0. Supported by the power/cray_aries plugin.
1701
1702 decrease_rate=#
1703 Specifies the maximum rate of change in the power cap for
1704 a node where the actual power usage is below the power
1705 cap by an amount greater than lower_threshold (see be‐
1706 low). Value represents a percentage of the difference
1707 between a node's minimum and maximum power consumption.
1708 The default value is 50 percent. Supported by the
1709 power/cray_aries plugin.
1710
1711 get_timeout=#
1712 Amount of time allowed to get power state information in
1713 milliseconds. The default value is 5,000 milliseconds or
1714 5 seconds. Supported by the power/cray_aries plugin and
1715 represents the time allowed for the capmc command to re‐
1716 spond to various "get" options.
1717
1718 increase_rate=#
1719 Specifies the maximum rate of change in the power cap for
1720 a node where the actual power usage is within up‐
1721 per_threshold (see below) of the power cap. Value repre‐
1722 sents a percentage of the difference between a node's
1723 minimum and maximum power consumption. The default value
1724 is 20 percent. Supported by the power/cray_aries plugin.
1725
1726 job_level
1727 All nodes associated with every job will have the same
1728 power cap, to the extent possible. Also see the
1729 --power=level option on the job submission commands.
1730
1731 job_no_level
1732 Disable the user's ability to set every node associated
1733 with a job to the same power cap. Each node will have
1734 its power cap set independently. This disables the
1735 --power=level option on the job submission commands.
1736
1737 lower_threshold=#
1738 Specify a lower power consumption threshold. If a node's
1739 current power consumption is below this percentage of its
1740 current cap, then its power cap will be reduced. The de‐
1741 fault value is 90 percent. Supported by the
1742 power/cray_aries plugin.
1743
1744 recent_job=#
1745 If a job has started or resumed execution (from suspend)
1746 on a compute node within this number of seconds from the
1747 current time, the node's power cap will be increased to
1748 the maximum. The default value is 300 seconds. Sup‐
1749 ported by the power/cray_aries plugin.
1750
1752 set_timeout=#
1753 Amount of time allowed to set power state information in
1754 milliseconds. The default value is 30,000 milliseconds
1755 or 30 seconds. Supported by the power/cray_aries plugin and
1756 represents the time allowed for the capmc command to re‐
1757 spond to various "set" options.
1758
1759 set_watts=#
1760 Specifies the power limit to be set on every compute
1761 node managed by Slurm. Every node gets this same power
1762 cap and there is no variation through time based upon ac‐
1763 tual power usage on the node. Supported by the
1764 power/cray_aries plugin.
1765
1766 upper_threshold=#
1767 Specify an upper power consumption threshold. If a
1768 node's current power consumption is above this percentage
1769 of its current cap, then its power cap will be increased
1770 to the extent possible. The default value is 95 percent.
1771 Supported by the power/cray_aries plugin.
1772
1773 PowerPlugin
1774 Identifies the plugin used for system power management. Cur‐
1775 rently supported plugins include: cray_aries and none. A
1776 restart of slurmctld is required for changes to this parameter
1777 to take effect. More information about system power management
1778 is available here <https://slurm.schedmd.com/power_mgmt.html>.
1779 By default, no power plugin is loaded.
1780
1781 PreemptMode
1782 Mechanism used to preempt jobs or enable gang scheduling. When
1783 the PreemptType parameter is set to enable preemption, the Pre‐
1784 emptMode selects the default mechanism used to preempt the eli‐
1785 gible jobs for the cluster.
1786 PreemptMode may be specified on a per partition basis to over‐
1787 ride this default value if PreemptType=preempt/partition_prio.
1788 Alternatively, it can be specified on a per QOS basis if Pre‐
1789 emptType=preempt/qos. In either case, a valid default Preempt‐
1790 Mode value must be specified for the cluster as a whole when
1791 preemption is enabled.
1792 The GANG option is used to enable gang scheduling independent of
1793 whether preemption is enabled (i.e. independent of the Preempt‐
1794 Type setting). It can be specified in addition to a PreemptMode
1795 setting with the two options comma separated (e.g. Preempt‐
1796 Mode=SUSPEND,GANG).
1797 See <https://slurm.schedmd.com/preempt.html> and
1798 <https://slurm.schedmd.com/gang_scheduling.html> for more de‐
1799 tails.
1800
1801 NOTE: For performance reasons, the backfill scheduler reserves
1802 whole nodes for jobs, not partial nodes. If during backfill
1803 scheduling a job preempts one or more other jobs, the whole
1804 nodes for those preempted jobs are reserved for the preemptor
1805 job, even if the preemptor job requested fewer resources than
1806 that. These reserved nodes aren't available to other jobs dur‐
1807 ing that backfill cycle, even if the other jobs could fit on the
1808 nodes. Therefore, jobs may preempt more resources during a sin‐
1809 gle backfill iteration than they requested.
1810 NOTE: For a heterogeneous job to be considered for preemption, all
1811 components must be eligible for preemption. When a heterogeneous
1812 job is to be preempted the first identified component of the job
1813 with the highest order PreemptMode (SUSPEND (highest), REQUEUE,
1814 CANCEL (lowest)) will be used to set the PreemptMode for all
1815 components. The GraceTime and user warning signal for each com‐
1816 ponent of the heterogeneous job remain unique. Heterogeneous
1817 jobs are excluded from GANG scheduling operations.
1818
1819 OFF Is the default value and disables job preemption and
1820 gang scheduling. It is only compatible with Pre‐
1821 emptType=preempt/none at a global level. A common
1822 use case for this parameter is to set it on a parti‐
1823 tion to disable preemption for that partition.
1824
1825 CANCEL The preempted job will be cancelled.
1826
1827 GANG Enables gang scheduling (time slicing) of jobs in
1828 the same partition, and allows the resuming of sus‐
1829 pended jobs.
1830
1831 NOTE: Gang scheduling is performed independently for
1832 each partition, so if you only want time-slicing by
1833 OverSubscribe, without any preemption, then config‐
1834 uring partitions with overlapping nodes is not rec‐
1835 ommended. On the other hand, if you want to use
1836 PreemptType=preempt/partition_prio to allow jobs
1837 from higher PriorityTier partitions to Suspend jobs
1838 from lower PriorityTier partitions you will need
1839 overlapping partitions, and PreemptMode=SUSPEND,GANG
1840 to use the Gang scheduler to resume the suspended
1841 jobs(s). In any case, time-slicing won't happen be‐
1842 tween jobs on different partitions.
1843
1844 NOTE: Heterogeneous jobs are excluded from GANG
1845 scheduling operations.
1846
1847 REQUEUE Preempts jobs by requeuing them (if possible) or
1848 canceling them. For jobs to be requeued they must
1849 have the --requeue sbatch option set or the cluster
1850 wide JobRequeue parameter in slurm.conf must be set
1851 to 1.
1852
1853 SUSPEND The preempted jobs will be suspended, and later the
1854 Gang scheduler will resume them. Therefore the SUS‐
1855 PEND preemption mode always needs the GANG option to
1856 be specified at the cluster level. Also, because the
1857 suspended jobs will still use memory on the allo‐
1858 cated nodes, Slurm needs to be able to track memory
1859 resources to be able to suspend jobs.
1860 If PreemptType=preempt/qos is configured and if the
1861 preempted job(s) and the preemptor job are on the
1862 same partition, then they will share resources with
1863 the Gang scheduler (time-slicing). If not (i.e. if
1864 the preemptees and preemptor are on different parti‐
1865 tions) then the preempted jobs will remain suspended
1866 until the preemptor ends.
1867
1868 NOTE: Because gang scheduling is performed indepen‐
1869 dently for each partition, if using PreemptType=pre‐
1870 empt/partition_prio then jobs in higher PriorityTier
1871 partitions will suspend jobs in lower PriorityTier
1872 partitions to run on the released resources. Only
1873 when the preemptor job ends will the suspended jobs
1874 be resumed by the Gang scheduler.
1875 NOTE: Suspended jobs will not release GRES. Higher
1876 priority jobs will not be able to preempt to gain
1877 access to GRES.
1878
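Putting the options above together, a sketch of partition-based suspension (partition and node names are hypothetical):

```
# Jobs in "high" may suspend jobs in "low"; GANG is required so
# the Gang scheduler can resume the suspended jobs.
PreemptType=preempt/partition_prio
PreemptMode=SUSPEND,GANG
PartitionName=high Nodes=node[1-16] PriorityTier=2 Default=NO
PartitionName=low  Nodes=node[1-16] PriorityTier=1 Default=YES
```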
1879 PreemptType
1880 Specifies the plugin used to identify which jobs can be pre‐
1881 empted in order to start a pending job.
1882
1883 preempt/none
1884 Job preemption is disabled. This is the default.
1885
1886 preempt/partition_prio
1887 Job preemption is based upon partition PriorityTier.
1888 Jobs in higher PriorityTier partitions may preempt jobs
1889 from lower PriorityTier partitions. This is not compati‐
1890 ble with PreemptMode=OFF.
1891
1892 preempt/qos
1893 Job preemption rules are specified by Quality Of Service
1894 (QOS) specifications in the Slurm database. This option
1895 is not compatible with PreemptMode=OFF. A configuration
1896 of PreemptMode=SUSPEND is only supported by the Select‐
1897 Type=select/cons_res and SelectType=select/cons_tres
1898 plugins. See the sacctmgr man page to configure the op‐
1899 tions for preempt/qos.
1900
1901 PreemptExemptTime
1902 Global option for minimum run time for all jobs before they can
1903 be considered for preemption. Any QOS PreemptExemptTime takes
1904 precedence over the global option. This is only honored for Pre‐
1905 emptMode=REQUEUE and PreemptMode=CANCEL.
1906 A time of -1 disables the option, equivalent to 0. Acceptable
1907 time formats include "minutes", "minutes:seconds", "hours:min‐
1908 utes:seconds", "days-hours", "days-hours:minutes", and
1909 "days-hours:minutes:seconds".
1910
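For instance, to give every job a hypothetical 30 minutes of guaranteed run time before REQUEUE or CANCEL preemption applies:

```
PreemptExemptTime=30
```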
1911 PrEpParameters
1912 Parameters to be passed to the PrEpPlugins.
1913
1914 PrEpPlugins
1915 A resource for programmers wishing to write their own plugins
1916 for the Prolog and Epilog (PrEp) scripts. The default, and cur‐
1917 rently the only implemented plugin is prep/script. Additional
1918 plugins can be specified in a comma-separated list. For more in‐
1919 formation please see the PrEp Plugin API documentation page:
1920 <https://slurm.schedmd.com/prep_plugins.html>
1921
1922 PriorityCalcPeriod
1923 The period of time in minutes in which the half-life decay will
1924 be re-calculated. Applicable only if PriorityType=priority/mul‐
1925 tifactor. The default value is 5 (minutes).
1926
1927 PriorityDecayHalfLife
1928 This controls how long prior resource use is considered in de‐
1929 termining how over- or under-serviced an association is (user,
1930 bank account and cluster) in determining job priority. The
1931 record of usage will be decayed over time, with half of the
1932 original value cleared at age PriorityDecayHalfLife. If set to
1933 0 no decay will be applied. This is helpful if you want to en‐
1934 force hard time limits per association. If set to 0 Priori‐
1935 tyUsageResetPeriod must be set to some interval. Applicable
1936 only if PriorityType=priority/multifactor. The unit is a time
1937 string (i.e. min, hr:min:00, days-hr:min:00, or days-hr). The
1938 default value is 7-0 (7 days).
1939
1940 PriorityFavorSmall
1941 Specifies that small jobs should be given preferential schedul‐
1942 ing priority. Applicable only if PriorityType=priority/multi‐
1943 factor. Supported values are "YES" and "NO". The default value
1944 is "NO".
1945
1946 PriorityFlags
1947 Flags to modify priority behavior. Applicable only if Priority‐
1948 Type=priority/multifactor. The keywords below have no associ‐
1949 ated value (e.g. "PriorityFlags=ACCRUE_ALWAYS,SMALL_RELA‐
1950 TIVE_TO_TIME").
1951
1952 ACCRUE_ALWAYS If set, priority age factor will be increased
1953 despite job dependencies or holds.
1954
1955 CALCULATE_RUNNING
1956 If set, priorities will be recalculated not
1957 only for pending jobs, but also running and
1958 suspended jobs.
1959
1960 DEPTH_OBLIVIOUS If set, priority will be calculated based simi‐
1961 lar to the normal multifactor calculation, but
1962 depth of the associations in the tree does not
1963 adversely affect their priority. This option
1964 automatically enables NO_FAIR_TREE.
1965
1966 NO_FAIR_TREE Disables the "fair tree" algorithm, and reverts
1967 to "classic" fair share priority scheduling.
1968
1969 INCR_ONLY If set, priority values will only increase in
1970 value. Job priority will never decrease in
1971 value.
1972
1973 MAX_TRES If set, the weighted TRES value (e.g. TRES‐
1974 BillingWeights) is calculated as the MAX of in‐
1975 dividual TRES' on a node (e.g. cpus, mem, gres)
1976 plus the sum of all global TRES' (e.g. li‐
1977 censes).
1978
1979 NO_NORMAL_ALL If set, all NO_NORMAL_* flags are set.
1980
1981 NO_NORMAL_ASSOC If set, the association factor is not normal‐
1982 ized against the highest association priority.
1983
1984 NO_NORMAL_PART If set, the partition factor is not normalized
1985 against the highest partition PriorityJobFac‐
1986 tor.
1987
1988 NO_NORMAL_QOS If set, the QOS factor is not normalized
1989 against the highest qos priority.
1990
1991 NO_NORMAL_TRES If set, the TRES factor is not normalized
1992 against the job's partition TRES counts.
1993
1994 SMALL_RELATIVE_TO_TIME
1995 If set, the job's size component will be based
1996 upon not the job size alone, but the job's size
1997 divided by its time limit.
1998
1999 PriorityMaxAge
2000 Specifies the job age which will be given the maximum age factor
2001 in computing priority. For example, a value of 30 minutes would
2002 result in all jobs over 30 minutes old getting the same
2003 age-based priority. Applicable only if PriorityType=prior‐
2004 ity/multifactor. The unit is a time string (i.e. min,
2005 hr:min:00, days-hr:min:00, or days-hr). The default value is
2006 7-0 (7 days).
2007
2008 PriorityParameters
2009 Arbitrary string used by the PriorityType plugin.
2010
2011 PrioritySiteFactorParameters
2012 Arbitrary string used by the PrioritySiteFactorPlugin plugin.
2013
2014 PrioritySiteFactorPlugin
2015 This specifies an optional plugin to be used alongside "prior‐
2016 ity/multifactor", which is meant to initially set and continu‐
2017 ously update the SiteFactor priority factor. The default value
2018 is "site_factor/none".
2019
2020 PriorityType
2021 This specifies the plugin to be used in establishing a job's
2022 scheduling priority. Also see PriorityFlags for configuration
2023 options. The default value is "priority/basic".
2024
2025 priority/basic
2026 Jobs are evaluated in a First In, First Out (FIFO) man‐
2027 ner.
2028
2029 priority/multifactor
2030 Jobs are assigned a priority based upon a variety of fac‐
2031 tors that include size, age, Fairshare, etc.
2032
2033 When not FIFO scheduling, jobs are prioritized in the following
2034 order:
2035
2036 1. Jobs that can preempt
2037 2. Jobs with an advanced reservation
2038 3. Partition PriorityTier
2039 4. Job priority
2040 5. Job submit time
2041 6. Job ID
2042
2043 PriorityUsageResetPeriod
2044 At this interval the usage of associations will be reset to 0.
2045 This is used if you want to enforce hard limits of time usage
2046 per association. If PriorityDecayHalfLife is set to be 0 no de‐
2047 cay will happen and this is the only way to reset the usage ac‐
2048 cumulated by running jobs. By default this is turned off and it
2049 is advised to use the PriorityDecayHalfLife option to avoid not
2050 having anything running on your cluster, but if your schema is
2051 set up to only allow certain amounts of time on your system this
2052 is the way to do it. Applicable only if PriorityType=prior‐
2053 ity/multifactor.
2054
2055 NONE Never clear historic usage. The default value.
2056
2057 NOW Clear the historic usage now. Executed at startup
2058 and reconfiguration time.
2059
2060 DAILY Cleared every day at midnight.
2061
2062 WEEKLY Cleared every week on Sunday at time 00:00.
2063
2064 MONTHLY Cleared on the first day of each month at time
2065 00:00.
2066
2067 QUARTERLY Cleared on the first day of each quarter at time
2068 00:00.
2069
2070 YEARLY Cleared on the first day of each year at time 00:00.
2071
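A sketch of the hard-limit schema described above, with no decay and a monthly reset (the interval choice is hypothetical):

```
# Usage never decays; it is wiped on the first of each month.
PriorityType=priority/multifactor
PriorityDecayHalfLife=0
PriorityUsageResetPeriod=MONTHLY
```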
2072 PriorityWeightAge
2073 An integer value that sets the degree to which the queue wait
2074 time component contributes to the job's priority. Applicable
2075 only if PriorityType=priority/multifactor. Requires Account‐
2076 ingStorageType=accounting_storage/slurmdbd. The default value
2077 is 0.
2078
2079 PriorityWeightAssoc
2080 An integer value that sets the degree to which the association
2081 component contributes to the job's priority. Applicable only if
2082 PriorityType=priority/multifactor. The default value is 0.
2083
2084 PriorityWeightFairshare
2085 An integer value that sets the degree to which the fair-share
2086 component contributes to the job's priority. Applicable only if
2087 PriorityType=priority/multifactor. Requires AccountingStor‐
2088 ageType=accounting_storage/slurmdbd. The default value is 0.
2089
2090 PriorityWeightJobSize
2091 An integer value that sets the degree to which the job size com‐
2092 ponent contributes to the job's priority. Applicable only if
2093 PriorityType=priority/multifactor. The default value is 0.
2094
2095 PriorityWeightPartition
2096 Partition factor used by priority/multifactor plugin in calcu‐
2097 lating job priority. Applicable only if PriorityType=prior‐
2098 ity/multifactor. The default value is 0.
2099
2100 PriorityWeightQOS
2101 An integer value that sets the degree to which the Quality Of
2102 Service component contributes to the job's priority. Applicable
2103 only if PriorityType=priority/multifactor. The default value is
2104 0.
2105
2106 PriorityWeightTRES
2107 A comma-separated list of TRES Types and weights that sets the
2108 degree that each TRES Type contributes to the job's priority.
2109
2110 e.g.
2111 PriorityWeightTRES=CPU=1000,Mem=2000,GRES/gpu=3000
2112
2113 Applicable only if PriorityType=priority/multifactor and if Ac‐
2114 countingStorageTRES is configured with each TRES Type. Negative
2115 values are allowed. The default values are 0.
2116
2117 PrivateData
2118 This controls what type of information is hidden from regular
2119 users. By default, all information is visible to all users.
2120 User SlurmUser and root can always view all information. Multi‐
2121 ple values may be specified with a comma separator. Acceptable
2122 values include:
2123
2124 accounts
2125 (NON-SlurmDBD ACCOUNTING ONLY) Prevents users from view‐
2126 ing any account definitions unless they are coordinators
2127 of them.
2128
2129 cloud Powered down nodes in the cloud are visible.
2130
2131 events Prevents users from viewing event information unless they
2132 have operator status or above.
2133
2134 jobs Prevents users from viewing jobs or job steps belonging
2135 to other users. (NON-SlurmDBD ACCOUNTING ONLY) Prevents
2136 users from viewing job records belonging to other users
2137 unless they are coordinators of the association running
2138 the job when using sacct.
2139
2140 nodes Prevents users from viewing node state information.
2141
2142 partitions
2143 Prevents users from viewing partition state information.
2144
2145 reservations
2146 Prevents regular users from viewing reservations which
2147 they can not use.
2148
2149 usage Prevents users from viewing usage of any other user;
2150 this applies to sshare. (NON-SlurmDBD ACCOUNTING ONLY) Pre‐
2151 vents users from viewing usage of any other user;
2152 this applies to sreport.
2153
2154 users (NON-SlurmDBD ACCOUNTING ONLY) Prevents users from view‐
2155 ing information of any user other than themselves; this
2156 also limits users to seeing only the associations they
2157 belong to. Coordinators can see associations of all
2158 users in the account they are coordinator of, but can
2159 only see themselves when listing users.
2160
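As one hypothetical combination, hiding other users' jobs, usage, and unusable reservations from regular users:

```
PrivateData=jobs,usage,reservations
```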
2161 ProctrackType
2162 Identifies the plugin to be used for process tracking on a job
2163 step basis. The slurmd daemon uses this mechanism to identify
2164 all processes which are children of processes it spawns for a
2165 user job step. A restart of slurmctld is required for changes
2166 to this parameter to take effect. NOTE: "proctrack/linuxproc"
2167 and "proctrack/pgid" can fail to identify all processes associ‐
2168 ated with a job since processes can become a child of the init
2169 process (when the parent process terminates) or change their
2170 process group. To reliably track all processes, "proc‐
2171 track/cgroup" is highly recommended. NOTE: The JobContainerType
2172 applies to a job allocation, while ProctrackType applies to job
2173 steps. Acceptable values at present include:
2174
2175 proctrack/cgroup
2176 Uses linux cgroups to constrain and track processes, and
2177 is the default for systems with cgroup support.
2178 NOTE: see "man cgroup.conf" for configuration details.
2179
2180 proctrack/cray_aries
2181 Uses Cray proprietary process tracking.
2182
2183 proctrack/linuxproc
2184 Uses linux process tree using parent process IDs.
2185
2186 proctrack/pgid
2187 Uses Process Group IDs.
2188 NOTE: This is the default for the BSD family.
2189
2190 Prolog Fully qualified pathname of a program for the slurmd to execute
2191 whenever it is asked to run a job step from a new job allocation
2192 (e.g. "/usr/local/slurm/prolog"). A glob pattern (see glob(7))
2193 may also be used to specify more than one program to run (e.g.
2194 "/etc/slurm/prolog.d/*"). The slurmd executes the prolog before
2195 starting the first job step. The prolog script or scripts may
2196 be used to purge files, enable user login, etc. By default
2197 there is no prolog. Any configured script is expected to com‐
2198 plete execution quickly (in less time than MessageTimeout). If
2199 the prolog fails (returns a non-zero exit code), this will re‐
2200 sult in the node being set to a DRAIN state and the job being
2201 requeued in a held state, unless nohold_on_prolog_fail is con‐
2202 figured in SchedulerParameters. See Prolog and Epilog Scripts
2203 for more information.
2204
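Using the glob form from the description above:

```
# Run every matching script before the first job step of each
# new allocation.
Prolog=/etc/slurm/prolog.d/*
```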
2205 PrologEpilogTimeout
2206 The interval in seconds Slurm waits for Prolog and Epilog be‐
2207 fore terminating them. The default behavior is to wait indefi‐
2208 nitely. This interval applies to the Prolog and Epilog run by
2209 slurmd daemon before and after the job, the PrologSlurmctld and
2210 EpilogSlurmctld run by slurmctld daemon, and the SPANK plugins
2211 run by the slurmstepd daemon.
2212
2213 PrologFlags
2214 Flags to control the Prolog behavior. By default no flags are
2215 set. Multiple flags may be specified in a comma-separated list.
2216 Currently supported options are:
2217
2218 Alloc If set, the Prolog script will be executed at job allo‐
2219 cation. By default, Prolog is executed just before the
2220 task is launched. Therefore, when salloc is started, no
2221 Prolog is executed. Alloc is useful for preparing things
2222 before a user starts to use any allocated resources. In
2223 particular, this flag is needed on a Cray system when
2224 cluster compatibility mode is enabled.
2225
2226 NOTE: Use of the Alloc flag will increase the time re‐
2227 quired to start jobs.
2228
2229 Contain At job allocation time, use the ProcTrack plugin to cre‐
2230 ate a job container on all allocated compute nodes.
2231 This container may be used for user processes not
2232 launched under Slurm control, for example
2233 pam_slurm_adopt may place processes launched through a
2234 direct user login into this container. If using
2235 pam_slurm_adopt, then ProcTrackType must be set to ei‐
2236 ther proctrack/cgroup or proctrack/cray_aries. Setting
2237                      the Contain flag implicitly sets the Alloc flag.
2238
2239              NoHold  If set, the Alloc flag should also be set.  This al‐
2240                      lows salloc to return without waiting for the prolog
2241                      to finish on each node. The blocking will happen when steps
2242 reach the slurmd and before any execution has happened
2243 in the step. This is a much faster way to work and if
2244 using srun to launch your tasks you should use this
2245 flag. This flag cannot be combined with the Contain or
2246 X11 flags.
2247
2248 Serial By default, the Prolog and Epilog scripts run concur‐
2249 rently on each node. This flag forces those scripts to
2250 run serially within each node, but with a significant
2251 penalty to job throughput on each node.
2252
2253 X11 Enable Slurm's built-in X11 forwarding capabilities.
2254 This is incompatible with ProctrackType=proctrack/linux‐
2255 proc. Setting the X11 flag implicitly enables both Con‐
2256 tain and Alloc flags as well.
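
       Combining the flags above, an illustrative slurm.conf entry might
       read (values are examples only; recall that X11 already implies
       Alloc and Contain):

```
# Run the prolog at allocation time and serialize prolog/epilog runs:
PrologFlags=Alloc,Serial
```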
2257
2258 PrologSlurmctld
2259 Fully qualified pathname of a program for the slurmctld daemon
2260 to execute before granting a new job allocation (e.g. "/usr/lo‐
2261 cal/slurm/prolog_controller"). The program executes as Slur‐
2262 mUser on the same node where the slurmctld daemon executes, giv‐
2263 ing it permission to drain nodes and requeue the job if a fail‐
2264 ure occurs or cancel the job if appropriate. Exactly what the
2265 program does and how it accomplishes this is completely at the
2266 discretion of the system administrator. Information about the
2267              job being initiated, its allocated nodes, etc. is passed to the
2268 program using environment variables. While this program is run‐
2269              ning, the nodes associated with the job will have a
2270 POWER_UP/CONFIGURING flag set in their state, which can be read‐
2271 ily viewed. The slurmctld daemon will wait indefinitely for
2272 this program to complete. Once the program completes with an
2273 exit code of zero, the nodes will be considered ready for use
2274 and the program will be started. If some node can not be made
2275 available for use, the program should drain the node (typically
2276 using the scontrol command) and terminate with a non-zero exit
2277 code. A non-zero exit code will result in the job being re‐
2278 queued (where possible) or killed. Note that only batch jobs can
2279 be requeued. See Prolog and Epilog Scripts for more informa‐
2280 tion.
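
       A hypothetical PrologSlurmctld sketch (illustrative, not from the
       distribution): it runs as SlurmUser on the slurmctld node, reads
       the allocation from environment variables such as
       SLURM_JOB_NODELIST, and drains any node failing a site-defined
       check; the health probe here is a placeholder:

```shell
#!/bin/sh
# Hypothetical PrologSlurmctld sketch. A non-zero exit requeues the
# job (batch jobs) or kills it.
node_ok() {
    # placeholder for a site-specific health probe
    [ -n "$1" ]
}

main() {
    for node in $(scontrol show hostnames "$SLURM_JOB_NODELIST"); do
        if ! node_ok "$node"; then
            # drain the bad node, then fail so the job is requeued
            scontrol update NodeName="$node" State=DRAIN Reason="prolog check failed"
            return 1
        fi
    done
}

# Run only when invoked by slurmctld with scontrol available
if [ -n "$SLURM_JOB_ID" ] && command -v scontrol >/dev/null 2>&1; then
    main || exit 1
fi
```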
2281
2282 PropagatePrioProcess
2283 Controls the scheduling priority (nice value) of user spawned
2284 tasks.
2285
2286 0 The tasks will inherit the scheduling priority from the
2287 slurm daemon. This is the default value.
2288
2289 1 The tasks will inherit the scheduling priority of the com‐
2290 mand used to submit them (e.g. srun or sbatch). Unless the
2291 job is submitted by user root, the tasks will have a sched‐
2292 uling priority no higher than the slurm daemon spawning
2293 them.
2294
2295 2 The tasks will inherit the scheduling priority of the com‐
2296 mand used to submit them (e.g. srun or sbatch) with the re‐
2297 striction that their nice value will always be one higher
2298 than the slurm daemon (i.e. the tasks scheduling priority
2299 will be lower than the slurm daemon).
2300
2301 PropagateResourceLimits
2302 A comma-separated list of resource limit names. The slurmd dae‐
2303 mon uses these names to obtain the associated (soft) limit val‐
2304 ues from the user's process environment on the submit node.
2305 These limits are then propagated and applied to the jobs that
2306 will run on the compute nodes. This parameter can be useful
2307 when system limits vary among nodes. Any resource limits that
2308 do not appear in the list are not propagated. However, the user
2309 can override this by specifying which resource limits to propa‐
2310 gate with the sbatch or srun "--propagate" option. If neither
2311              PropagateResourceLimits nor PropagateResourceLimitsExcept is
2312 configured and the "--propagate" option is not specified, then
2313 the default action is to propagate all limits. Only one of the
2314 parameters, either PropagateResourceLimits or PropagateResource‐
2315              LimitsExcept, may be specified.  The user limits cannot exceed
2316 hard limits under which the slurmd daemon operates. If the user
2317 limits are not propagated, the limits from the slurmd daemon
2318 will be propagated to the user's job. The limits used for the
2319              Slurm daemons can be set in the /etc/sysconfig/slurm file. For
2320              more information, see: https://slurm.schedmd.com/faq.html#mem‐
2321              lock.  The following limit names are supported by Slurm (although
2322 some options may not be supported on some systems):
2323
2324 ALL All limits listed below (default)
2325
2326 NONE No limits listed below
2327
2328 AS The maximum address space (virtual memory) for a
2329 process.
2330
2331 CORE The maximum size of core file
2332
2333 CPU The maximum amount of CPU time
2334
2335 DATA The maximum size of a process's data segment
2336
2337 FSIZE The maximum size of files created. Note that if the
2338 user sets FSIZE to less than the current size of the
2339 slurmd.log, job launches will fail with a 'File size
2340 limit exceeded' error.
2341
2342 MEMLOCK The maximum size that may be locked into memory
2343
2344 NOFILE The maximum number of open files
2345
2346 NPROC The maximum number of processes available
2347
2348 RSS The maximum resident set size. Note that this only
2349 has effect with Linux kernels 2.4.30 or older or BSD.
2350
2351 STACK The maximum stack size
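
       The soft values that slurmd reads are those visible in the
       submitting shell; they can be inspected with ulimit. A quick
       illustration, mapping a few of the limit names above:

```shell
# Show the soft limits on the submit node corresponding to a few of
# the slurm.conf limit names above (illustrative):
nofile=$(ulimit -S -n)   # NOFILE: max open files
stack=$(ulimit -S -s)    # STACK: max stack size (kbytes, or "unlimited")
core=$(ulimit -S -c)     # CORE: max core file size (blocks)
echo "NOFILE=$nofile STACK=$stack CORE=$core"
```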
2352
2353 PropagateResourceLimitsExcept
2354 A comma-separated list of resource limit names. By default, all
2355 resource limits will be propagated, (as described by the Propa‐
2356 gateResourceLimits parameter), except for the limits appearing
2357 in this list. The user can override this by specifying which
2358 resource limits to propagate with the sbatch or srun "--propa‐
2359 gate" option. See PropagateResourceLimits above for a list of
2360 valid limit names.
2361
2362 RebootProgram
2363 Program to be executed on each compute node to reboot it. In‐
2364 voked on each node once it becomes idle after the command "scon‐
2365 trol reboot" is executed by an authorized user or a job is sub‐
2366 mitted with the "--reboot" option. After rebooting, the node is
2367 returned to normal use. See ResumeTimeout to configure the time
2368 you expect a reboot to finish in. A node will be marked DOWN if
2369 it doesn't reboot within ResumeTimeout.
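
       For example (illustrative values; the reboot command is
       site-specific):

```
RebootProgram=/sbin/reboot
ResumeTimeout=600        # allow up to 10 minutes for the reboot
```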
2370
2371 ReconfigFlags
2372 Flags to control various actions that may be taken when an
2373 "scontrol reconfig" command is issued. Currently the options
2374 are:
2375
2376 KeepPartInfo If set, an "scontrol reconfig" command will
2377 maintain the in-memory value of partition
2378 "state" and other parameters that may have been
2379 dynamically updated by "scontrol update". Par‐
2380 tition information in the slurm.conf file will
2381 be merged with in-memory data. This flag su‐
2382 persedes the KeepPartState flag.
2383
2384 KeepPartState If set, an "scontrol reconfig" command will
2385 preserve only the current "state" value of
2386 in-memory partitions and will reset all other
2387 parameters of the partitions that may have been
2388 dynamically updated by "scontrol update" to the
2389 values from the slurm.conf file. Partition in‐
2390 formation in the slurm.conf file will be merged
2391 with in-memory data.
2392
2393 The default for the above flags is not set, and the "scontrol
2394 reconfig" will rebuild the partition information using only the
2395 definitions in the slurm.conf file.
2396
2397 RequeueExit
2398 Enables automatic requeue for batch jobs which exit with the
2399              specified values. Separate multiple exit codes with a comma and/or
2400              specify numeric ranges using a "-" separator (e.g.  "Requeue‐
2401              Exit=1-9,18").  Jobs will be put back into the pending state and
2402 later scheduled again. Restarted jobs will have the environment
2403 variable SLURM_RESTART_COUNT set to the number of times the job
2404 has been restarted.
2405
2406 RequeueExitHold
2407 Enables automatic requeue for batch jobs which exit with the
2408 specified values, with these jobs being held until released man‐
2409              ually by the user.  Separate multiple exit codes with a comma
2410              and/or specify numeric ranges using a "-" separator (e.g. "Re‐
2411              queueExitHold=10-12,16").  These jobs are put in the JOB_SPE‐
2412 CIAL_EXIT exit state. Restarted jobs will have the environment
2413 variable SLURM_RESTART_COUNT set to the number of times the job
2414 has been restarted.
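
       A batch script can detect that it was requeued by examining
       SLURM_RESTART_COUNT; a sketch (the checkpoint handling is a
       placeholder):

```shell
#!/bin/sh
# Fragment of a batch script: SLURM_RESTART_COUNT is unset (or 0) on
# the first run and counts up on each automatic requeue.
restarts="${SLURM_RESTART_COUNT:-0}"
if [ "$restarts" -gt 0 ]; then
    echo "restart number $restarts: resuming from checkpoint"
else
    echo "first run: starting from scratch"
fi
```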
2415
2416 ResumeFailProgram
2417              The program that will be executed when nodes fail to resume
2418              within ResumeTimeout. The argument to the program will be the names
2419 of the failed nodes (using Slurm's hostlist expression format).
2420
2421 ResumeProgram
2422 Slurm supports a mechanism to reduce power consumption on nodes
2423 that remain idle for an extended period of time. This is typi‐
2424 cally accomplished by reducing voltage and frequency or powering
2425 the node down. ResumeProgram is the program that will be exe‐
2426 cuted when a node in power save mode is assigned work to per‐
2427 form. For reasons of reliability, ResumeProgram may execute
2428 more than once for a node when the slurmctld daemon crashes and
2429 is restarted. If ResumeProgram is unable to restore a node to
2430 service with a responding slurmd and an updated BootTime, it
2431 should requeue any job associated with the node and set the node
2432 state to DOWN. If the node isn't actually rebooted (i.e. when
2433              multiple-slurmd is configured) starting slurmd with the "-b" option
2434 might be useful. The program executes as SlurmUser. The argu‐
2435 ment to the program will be the names of nodes to be removed
2436 from power savings mode (using Slurm's hostlist expression for‐
2437 mat). A job to node mapping is available in JSON format by read‐
2438 ing the temporary file specified by the SLURM_RESUME_FILE envi‐
2439 ronment variable. By default no program is run.
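
       An illustrative ResumeProgram skeleton (the power-on mechanism is
       a placeholder; a real site would call an IPMI, Redfish, or cloud
       API):

```shell
#!/bin/sh
# Hypothetical ResumeProgram sketch. Argument $1 is a hostlist
# expression naming the nodes to wake; runs as SlurmUser.
power_on() {
    # placeholder: replace with the site's power-on mechanism
    echo "powering on $1"
}

# Run only when invoked by slurmctld with scontrol available
if [ -n "$1" ] && command -v scontrol >/dev/null 2>&1; then
    for node in $(scontrol show hostnames "$1"); do
        power_on "$node"
    done
fi
```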
2440
2441 ResumeRate
2442 The rate at which nodes in power save mode are returned to nor‐
2443 mal operation by ResumeProgram. The value is a number of nodes
2444 per minute and it can be used to prevent power surges if a large
2445 number of nodes in power save mode are assigned work at the same
2446 time (e.g. a large job starts). A value of zero results in no
2447 limits being imposed. The default value is 300 nodes per
2448 minute.
2449
2450 ResumeTimeout
2451 Maximum time permitted (in seconds) between when a node resume
2452 request is issued and when the node is actually available for
2453 use. Nodes which fail to respond in this time frame will be
2454 marked DOWN and the jobs scheduled on the node requeued. Nodes
2455 which reboot after this time frame will be marked DOWN with a
2456 reason of "Node unexpectedly rebooted." The default value is 60
2457 seconds.
2458
2459 ResvEpilog
2460 Fully qualified pathname of a program for the slurmctld to exe‐
2461 cute when a reservation ends. The program can be used to cancel
2462 jobs, modify partition configuration, etc. The reservation
2463 named will be passed as an argument to the program. By default
2464 there is no epilog.
2465
2466 ResvOverRun
2467 Describes how long a job already running in a reservation should
2468 be permitted to execute after the end time of the reservation
2469 has been reached. The time period is specified in minutes and
2470 the default value is 0 (kill the job immediately). The value
2471 may not exceed 65533 minutes, although a value of "UNLIMITED" is
2472 supported to permit a job to run indefinitely after its reserva‐
2473 tion is terminated.
2474
2475 ResvProlog
2476 Fully qualified pathname of a program for the slurmctld to exe‐
2477 cute when a reservation begins. The program can be used to can‐
2478 cel jobs, modify partition configuration, etc. The reservation
2479 named will be passed as an argument to the program. By default
2480 there is no prolog.
2481
2482 ReturnToService
2483 Controls when a DOWN node will be returned to service. The de‐
2484 fault value is 0. Supported values include
2485
2486 0 A node will remain in the DOWN state until a system adminis‐
2487 trator explicitly changes its state (even if the slurmd dae‐
2488 mon registers and resumes communications).
2489
2490 1 A DOWN node will become available for use upon registration
2491 with a valid configuration only if it was set DOWN due to
2492 being non-responsive. If the node was set DOWN for any
2493 other reason (low memory, unexpected reboot, etc.), its
2494 state will not automatically be changed. A node registers
2495 with a valid configuration if its memory, GRES, CPU count,
2496 etc. are equal to or greater than the values configured in
2497 slurm.conf.
2498
2499 2 A DOWN node will become available for use upon registration
2500 with a valid configuration. The node could have been set
2501 DOWN for any reason. A node registers with a valid configu‐
2502 ration if its memory, GRES, CPU count, etc. are equal to or
2503 greater than the values configured in slurm.conf.
2504
2505 RoutePlugin
2506 Identifies the plugin to be used for defining which nodes will
2507 be used for message forwarding.
2508
2509 route/default
2510 default, use TreeWidth.
2511
2512 route/topology
2513 use the switch hierarchy defined in a topology.conf file.
2514 TopologyPlugin=topology/tree is required.
2515
2516 SchedulerParameters
2517 The interpretation of this parameter varies by SchedulerType.
2518 Multiple options may be comma separated.
2519
2520 allow_zero_lic
2521                     If set, then job submissions requesting more licenses
2522                     than are configured won't be rejected.
2523
2524 assoc_limit_stop
2525 If set and a job cannot start due to association limits,
2526 then do not attempt to initiate any lower priority jobs
2527 in that partition. Setting this can decrease system
2528 throughput and utilization, but avoid potentially starv‐
2529 ing larger jobs by preventing them from launching indefi‐
2530 nitely.
2531
2532 batch_sched_delay=#
2533 How long, in seconds, the scheduling of batch jobs can be
2534 delayed. This can be useful in a high-throughput envi‐
2535 ronment in which batch jobs are submitted at a very high
2536 rate (i.e. using the sbatch command) and one wishes to
2537 reduce the overhead of attempting to schedule each job at
2538 submit time. The default value is 3 seconds.
2539
2540 bb_array_stage_cnt=#
2541 Number of tasks from a job array that should be available
2542 for burst buffer resource allocation. Higher values will
2543 increase the system overhead as each task from the job
2544 array will be moved to its own job record in memory, so
2545 relatively small values are generally recommended. The
2546 default value is 10.
2547
2548 bf_busy_nodes
2549 When selecting resources for pending jobs to reserve for
2550 future execution (i.e. the job can not be started immedi‐
2551 ately), then preferentially select nodes that are in use.
2552 This will tend to leave currently idle resources avail‐
2553 able for backfilling longer running jobs, but may result
2554 in allocations having less than optimal network topology.
2555 This option is currently only supported by the se‐
2556 lect/cons_res and select/cons_tres plugins (or se‐
2557 lect/cray_aries with SelectTypeParameters set to
2558 "OTHER_CONS_RES" or "OTHER_CONS_TRES", which layers the
2559 select/cray_aries plugin over the select/cons_res or se‐
2560 lect/cons_tres plugin respectively).
2561
2562 bf_continue
2563 The backfill scheduler periodically releases locks in or‐
2564 der to permit other operations to proceed rather than
2565 blocking all activity for what could be an extended pe‐
2566 riod of time. Setting this option will cause the back‐
2567 fill scheduler to continue processing pending jobs from
2568 its original job list after releasing locks even if job
2569 or node state changes.
2570
2571 bf_hetjob_immediate
2572 Instruct the backfill scheduler to attempt to start a
2573 heterogeneous job as soon as all of its components are
2574 determined able to do so. Otherwise, the backfill sched‐
2575 uler will delay heterogeneous jobs initiation attempts
2576 until after the rest of the queue has been processed.
2577 This delay may result in lower priority jobs being allo‐
2578 cated resources, which could delay the initiation of the
2579 heterogeneous job due to account and/or QOS limits being
2580                     reached. This option is disabled by default. If enabled
2581                     and bf_hetjob_prio=min is not set, then bf_hetjob_prio=min
2582                     will be set automatically.
2583
2584 bf_hetjob_prio=[min|avg|max]
2585 At the beginning of each backfill scheduling cycle, a
2586                     list of pending jobs to be scheduled is sorted according
2587 to the precedence order configured in PriorityType. This
2588 option instructs the scheduler to alter the sorting algo‐
2589 rithm to ensure that all components belonging to the same
2590 heterogeneous job will be attempted to be scheduled con‐
2591 secutively (thus not fragmented in the resulting list).
2592 More specifically, all components from the same heteroge‐
2593 neous job will be treated as if they all have the same
2594 priority (minimum, average or maximum depending upon this
2595 option's parameter) when compared with other jobs (or
2596 other heterogeneous job components). The original order
2597 will be preserved within the same heterogeneous job. Note
2598 that the operation is calculated for the PriorityTier
2599 layer and for the Priority resulting from the prior‐
2600 ity/multifactor plugin calculations. When enabled, if any
2601 heterogeneous job requested an advanced reservation, then
2602 all of that job's components will be treated as if they
2603 had requested an advanced reservation (and get preferen‐
2604 tial treatment in scheduling).
2605
2606 Note that this operation does not update the Priority
2607 values of the heterogeneous job components, only their
2608 order within the list, so the output of the sprio command
2609                     will not be affected.
2610
2611 Heterogeneous jobs have special scheduling properties:
2612 they are only scheduled by the backfill scheduling
2613 plugin, each of their components is considered separately
2614 when reserving resources (and might have different Prior‐
2615 ityTier or different Priority values), and no heteroge‐
2616 neous job component is actually allocated resources until
2617                     all of its components can be initiated. This may imply
2618 potential scheduling deadlock scenarios because compo‐
2619 nents from different heterogeneous jobs can start reserv‐
2620 ing resources in an interleaved fashion (not consecu‐
2621 tively), but none of the jobs can reserve resources for
2622 all components and start. Enabling this option can help
2623 to mitigate this problem. By default, this option is dis‐
2624 abled.
2625
2626 bf_interval=#
2627 The number of seconds between backfill iterations.
2628 Higher values result in less overhead and better respon‐
2629 siveness. This option applies only to Scheduler‐
2630 Type=sched/backfill. Default: 30, Min: 1, Max: 10800
2631 (3h).
2632
2633
2634 bf_job_part_count_reserve=#
2635 The backfill scheduling logic will reserve resources for
2636 the specified count of highest priority jobs in each par‐
2637 tition. For example, bf_job_part_count_reserve=10 will
2638 cause the backfill scheduler to reserve resources for the
2639 ten highest priority jobs in each partition. Any lower
2640 priority job that can be started using currently avail‐
2641 able resources and not adversely impact the expected
2642 start time of these higher priority jobs will be started
2643                     by the backfill scheduler.  The default value is zero,
2644 which will reserve resources for any pending job and de‐
2645 lay initiation of lower priority jobs. Also see
2646 bf_min_age_reserve and bf_min_prio_reserve. Default: 0,
2647 Min: 0, Max: 100000.
2648
2649 bf_max_job_array_resv=#
2650 The maximum number of tasks from a job array for which
2651 the backfill scheduler will reserve resources in the fu‐
2652 ture. Since job arrays can potentially have millions of
2653 tasks, the overhead in reserving resources for all tasks
2654 can be prohibitive. In addition various limits may pre‐
2655 vent all the jobs from starting at the expected times.
2656 This has no impact upon the number of tasks from a job
2657 array that can be started immediately, only those tasks
2658 expected to start at some future time. Default: 20, Min:
2659 0, Max: 1000. NOTE: Jobs submitted to multiple parti‐
2660 tions appear in the job queue once per partition. If dif‐
2661 ferent copies of a single job array record aren't consec‐
2662 utive in the job queue and another job array record is in
2663 between, then bf_max_job_array_resv tasks are considered
2664 per partition that the job is submitted to.
2665
2666 bf_max_job_assoc=#
2667 The maximum number of jobs per user association to at‐
2668 tempt starting with the backfill scheduler. This setting
2669 is similar to bf_max_job_user but is handy if a user has
2670 multiple associations equating to basically different
2671 users. One can set this limit to prevent users from
2672 flooding the backfill queue with jobs that cannot start
2673                     and that prevent other users' jobs from starting. This
2674 option applies only to SchedulerType=sched/backfill.
2675                     Also see the bf_max_job_user, bf_max_job_part,
2676 bf_max_job_test and bf_max_job_user_part=# options. Set
2677 bf_max_job_test to a value much higher than
2678 bf_max_job_assoc. Default: 0 (no limit), Min: 0, Max:
2679 bf_max_job_test.
2680
2681 bf_max_job_part=#
2682 The maximum number of jobs per partition to attempt
2683 starting with the backfill scheduler. This can be espe‐
2684 cially helpful for systems with large numbers of parti‐
2685 tions and jobs. This option applies only to Scheduler‐
2686 Type=sched/backfill. Also see the partition_job_depth
2687 and bf_max_job_test options. Set bf_max_job_test to a
2688 value much higher than bf_max_job_part. Default: 0 (no
2689 limit), Min: 0, Max: bf_max_job_test.
2690
2691 bf_max_job_start=#
2692 The maximum number of jobs which can be initiated in a
2693 single iteration of the backfill scheduler. This option
2694 applies only to SchedulerType=sched/backfill. Default: 0
2695 (no limit), Min: 0, Max: 10000.
2696
2697 bf_max_job_test=#
2698 The maximum number of jobs to attempt backfill scheduling
2699 for (i.e. the queue depth). Higher values result in more
2700 overhead and less responsiveness. Until an attempt is
2701 made to backfill schedule a job, its expected initiation
2702 time value will not be set. In the case of large clus‐
2703 ters, configuring a relatively small value may be desir‐
2704 able. This option applies only to Scheduler‐
2705 Type=sched/backfill. Default: 500, Min: 1, Max:
2706 1,000,000.
2707
2708 bf_max_job_user=#
2709 The maximum number of jobs per user to attempt starting
2710 with the backfill scheduler for ALL partitions. One can
2711 set this limit to prevent users from flooding the back‐
2712 fill queue with jobs that cannot start and that prevent
2713                     other users' jobs from starting.  This is similar to the
2714 MAXIJOB limit in Maui. This option applies only to
2715 SchedulerType=sched/backfill. Also see the
2716 bf_max_job_part, bf_max_job_test and
2717 bf_max_job_user_part=# options. Set bf_max_job_test to a
2718 value much higher than bf_max_job_user. Default: 0 (no
2719 limit), Min: 0, Max: bf_max_job_test.
2720
2721 bf_max_job_user_part=#
2722 The maximum number of jobs per user per partition to at‐
2723 tempt starting with the backfill scheduler for any single
2724 partition. This option applies only to Scheduler‐
2725 Type=sched/backfill. Also see the bf_max_job_part,
2726 bf_max_job_test and bf_max_job_user=# options. Default:
2727 0 (no limit), Min: 0, Max: bf_max_job_test.
2728
2729 bf_max_time=#
2730 The maximum time in seconds the backfill scheduler can
2731 spend (including time spent sleeping when locks are re‐
2732 leased) before discontinuing, even if maximum job counts
2733 have not been reached. This option applies only to
2734 SchedulerType=sched/backfill. The default value is the
2735 value of bf_interval (which defaults to 30 seconds). De‐
2736 fault: bf_interval value (def. 30 sec), Min: 1, Max: 3600
2737 (1h). NOTE: If bf_interval is short and bf_max_time is
2738 large, this may cause locks to be acquired too frequently
2739 and starve out other serviced RPCs. It's advisable if us‐
2740 ing this parameter to set max_rpc_cnt high enough that
2741 scheduling isn't always disabled, and low enough that the
2742 interactive workload can get through in a reasonable pe‐
2743 riod of time. max_rpc_cnt needs to be below 256 (the de‐
2744 fault RPC thread limit). Running around the middle (150)
2745 may give you good results. NOTE: When increasing the
2746 amount of time spent in the backfill scheduling cycle,
2747 Slurm can be prevented from responding to client requests
2748 in a timely manner. To address this you can use
2749 max_rpc_cnt to specify a number of queued RPCs before the
2750                     scheduler pauses in order to respond to these requests.
2751
2752 bf_min_age_reserve=#
2753 The backfill and main scheduling logic will not reserve
2754 resources for pending jobs until they have been pending
2755 and runnable for at least the specified number of sec‐
2756 onds. In addition, jobs waiting for less than the speci‐
2757 fied number of seconds will not prevent a newly submitted
2758 job from starting immediately, even if the newly submit‐
2759 ted job has a lower priority. This can be valuable if
2760 jobs lack time limits or all time limits have the same
2761 value. The default value is zero, which will reserve re‐
2762 sources for any pending job and delay initiation of lower
2763 priority jobs. Also see bf_job_part_count_reserve and
2764 bf_min_prio_reserve. Default: 0, Min: 0, Max: 2592000
2765 (30 days).
2766
2767 bf_min_prio_reserve=#
2768 The backfill and main scheduling logic will not reserve
2769 resources for pending jobs unless they have a priority
2770 equal to or higher than the specified value. In addi‐
2771 tion, jobs with a lower priority will not prevent a newly
2772 submitted job from starting immediately, even if the
2773 newly submitted job has a lower priority. This can be
2774 valuable if one wished to maximize system utilization
2775 without regard for job priority below a certain thresh‐
2776 old. The default value is zero, which will reserve re‐
2777 sources for any pending job and delay initiation of lower
2778 priority jobs. Also see bf_job_part_count_reserve and
2779 bf_min_age_reserve. Default: 0, Min: 0, Max: 2^63.
2780
2781 bf_node_space_size=#
2782 Size of backfill node_space table. Adding a single job to
2783 backfill reservations in the worst case can consume two
2784 node_space records. In the case of large clusters, con‐
2785 figuring a relatively small value may be desirable. This
2786 option applies only to SchedulerType=sched/backfill.
2787 Also see bf_max_job_test and bf_running_job_reserve. De‐
2788 fault: bf_max_job_test, Min: 2, Max: 2,000,000.
2789
2790 bf_one_resv_per_job
2791 Disallow adding more than one backfill reservation per
2792 job. The scheduling logic builds a sorted list of job-
2793 partition pairs. Jobs submitted to multiple partitions
2794 have as many entries in the list as requested partitions.
2795 By default, the backfill scheduler may evaluate all the
2796 job-partition entries for a single job, potentially re‐
2797 serving resources for each pair, but only starting the
2798 job in the reservation offering the earliest start time.
2799 Having a single job reserving resources for multiple par‐
2800 titions could impede other jobs (or hetjob components)
2801 from reserving resources already reserved for the parti‐
2802 tions that don't offer the earliest start time. A single
2803 job that requests multiple partitions can also prevent
2804 itself from starting earlier in a lower priority parti‐
2805 tion if the partitions overlap nodes and a backfill
2806 reservation in the higher priority partition blocks nodes
2807 that are also in the lower priority partition. This op‐
2808 tion makes it so that a job submitted to multiple parti‐
2809 tions will stop reserving resources once the first job-
2810 partition pair has booked a backfill reservation. Subse‐
2811 quent pairs from the same job will only be tested to
2812 start now. This allows for other jobs to be able to book
2813                     the other pairs' resources at the cost of not guaranteeing
2814                     that the multi-partition job will start in the partition
2815 offering the earliest start time (unless it can start im‐
2816 mediately). This option is disabled by default.
2817
2818 bf_resolution=#
2819 The number of seconds in the resolution of data main‐
2820 tained about when jobs begin and end. Higher values re‐
2821 sult in better responsiveness and quicker backfill cycles
2822 by using larger blocks of time to determine node eligi‐
2823 bility. However, higher values lead to less efficient
2824 system planning, and may miss opportunities to improve
2825 system utilization. This option applies only to Sched‐
2826 ulerType=sched/backfill. Default: 60, Min: 1, Max: 3600
2827 (1 hour).
2828
2829 bf_running_job_reserve
2830 Add an extra step to backfill logic, which creates back‐
2831 fill reservations for jobs running on whole nodes. This
2832 option is disabled by default.
2833
2834 bf_window=#
2835 The number of minutes into the future to look when con‐
2836 sidering jobs to schedule. Higher values result in more
2837 overhead and less responsiveness. A value at least as
2838 long as the highest allowed time limit is generally ad‐
2839 visable to prevent job starvation. In order to limit the
2840 amount of data managed by the backfill scheduler, if the
2841 value of bf_window is increased, then it is generally ad‐
2842 visable to also increase bf_resolution. This option ap‐
2843 plies only to SchedulerType=sched/backfill. Default:
2844 1440 (1 day), Min: 1, Max: 43200 (30 days).
2845
2846 bf_window_linear=#
2847 For performance reasons, the backfill scheduler will de‐
2848 crease precision in calculation of job expected termina‐
2849 tion times. By default, the precision starts at 30 sec‐
2850 onds and that time interval doubles with each evaluation
2851 of currently executing jobs when trying to determine when
2852 a pending job can start. This algorithm can support an
2853 environment with many thousands of running jobs, but can
2854 result in the expected start time of pending jobs being
2855                     gradually deferred due to lack of precision.  A
2856 value for bf_window_linear will cause the time interval
2857 to be increased by a constant amount on each iteration.
2858 The value is specified in units of seconds. For example,
2859 a value of 60 will cause the backfill scheduler on the
2860 first iteration to identify the job ending soonest and
2861 determine if the pending job can be started after that
2862 job plus all other jobs expected to end within 30 seconds
2863 (default initial value) of the first job. On the next it‐
2864 eration, the pending job will be evaluated for starting
2865 after the next job expected to end plus all jobs ending
2866 within 90 seconds of that time (30 second default, plus
2867 the 60 second option value). The third iteration will
2868 have a 150 second window and the fourth 210 seconds.
2869 Without this option, the time windows will double on each
2870 iteration and thus be 30, 60, 120, 240 seconds, etc. The
2871 use of bf_window_linear is not recommended with more than
2872 a few hundred simultaneously executing jobs.
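
       The arithmetic above can be illustrated with a short fragment (the
       value 60 is hypothetical, chosen to match the worked example in the
       text):

       ```
       # Hypothetical slurm.conf fragment: grow the backfill evaluation
       # window by a constant 60 seconds per iteration.
       SchedulerType=sched/backfill
       SchedulerParameters=bf_window_linear=60
       # Resulting windows: 30, 90, 150, 210, ... seconds.
       # Without bf_window_linear the windows double: 30, 60, 120, 240, ...
       ```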
2873
2874 bf_yield_interval=#
2875 The backfill scheduler will periodically relinquish locks
2876 in order for other pending operations to take place.
2877                     This specifies the interval between those lock re‐
2878                     leases, in microseconds.  Smaller values may be helpful for high
2879 throughput computing when used in conjunction with the
2880 bf_continue option. Also see the bf_yield_sleep option.
2881 Default: 2,000,000 (2 sec), Min: 1, Max: 10,000,000 (10
2882 sec).
2883
2884 bf_yield_sleep=#
2885 The backfill scheduler will periodically relinquish locks
2886 in order for other pending operations to take place.
2887 This specifies the length of time for which the locks are
2888 relinquished in microseconds. Also see the bf_yield_in‐
2889 terval option. Default: 500,000 (0.5 sec), Min: 1, Max:
2890 10,000,000 (10 sec).
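
       As a sketch only, a high-throughput site might combine the two yield
       options above with bf_continue (the specific values are hypothetical
       and within the documented ranges):

       ```
       # Hypothetical fragment: release locks every 1 second and hold them
       # released for 0.2 seconds, resuming backfill where it left off.
       SchedulerParameters=bf_continue,bf_yield_interval=1000000,bf_yield_sleep=200000
       ```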
2891
2892 build_queue_timeout=#
2893 Defines the maximum time that can be devoted to building
2894 a queue of jobs to be tested for scheduling. If the sys‐
2895 tem has a huge number of jobs with dependencies, just
2896 building the job queue can take so much time as to ad‐
2897 versely impact overall system performance and this param‐
2898 eter can be adjusted as needed. The default value is
2899 2,000,000 microseconds (2 seconds).
2900
2901 correspond_after_task_cnt=#
2902                     Defines the number of array tasks that get split for po‐
2903                     tential aftercorr dependency checks.  A low number may re‐
2904                     sult in dependency check failures when the job that a task
2905                     depends on is purged before the split occurs.  Default: 10.
2906
2907 default_queue_depth=#
2908 The default number of jobs to attempt scheduling (i.e.
2909 the queue depth) when a running job completes or other
2910                     routine actions occur.  However, the frequency with which
2911 the scheduler is run may be limited by using the defer or
2912 sched_min_interval parameters described below. The full
2913 queue will be tested on a less frequent basis as defined
2914 by the sched_interval option described below. The default
2915 value is 100. See the partition_job_depth option to
2916 limit depth by partition.
2917
2918 defer Setting this option will avoid attempting to schedule
2919 each job individually at job submit time, but defer it
2920 until a later time when scheduling multiple jobs simulta‐
2921 neously may be possible. This option may improve system
2922 responsiveness when large numbers of jobs (many hundreds)
2923 are submitted at the same time, but it will delay the
2924 initiation time of individual jobs. Also see de‐
2925 fault_queue_depth above.
2926
2927 delay_boot=#
2928                     Do not reboot nodes in order to satisfy this job's fea‐
2929                     ture specification if the job has been eligible to run
2930 for less than this time period. If the job has waited
2931 for less than the specified period, it will use only
2932 nodes which already have the specified features. The ar‐
2933 gument is in units of minutes. Individual jobs may over‐
2934 ride this default value with the --delay-boot option.
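
       A minimal sketch of this option (the 10-minute value is hypotheti‐
       cal):

       ```
       # Hypothetical fragment: jobs eligible for less than 10 minutes use
       # only nodes that already have the requested features.
       SchedulerParameters=delay_boot=10
       ```

       An individual job could override this with, for example, "sbatch
       --delay-boot=30 ...".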
2935
2936 disable_job_shrink
2937 Deny user requests to shrink the size of running jobs.
2938 (However, running jobs may still shrink due to node fail‐
2939 ure if the --no-kill option was set.)
2940
2941 disable_hetjob_steps
2942 Disable job steps that span heterogeneous job alloca‐
2943 tions.
2944
2945 enable_hetjob_steps
2946 Enable job steps that span heterogeneous job allocations.
2947 The default value.
2948
2949 enable_user_top
2950 Enable use of the "scontrol top" command by non-privi‐
2951 leged users.
2952
2953 Ignore_NUMA
2954 Some processors (e.g. AMD Opteron 6000 series) contain
2955 multiple NUMA nodes per socket. This is a configuration
2956 which does not map into the hardware entities that Slurm
2957 optimizes resource allocation for (PU/thread, core,
2958 socket, baseboard, node and network switch). In order to
2959 optimize resource allocations on such hardware, Slurm
2960 will consider each NUMA node within the socket as a sepa‐
2961 rate socket by default. Use the Ignore_NUMA option to re‐
2962 port the correct socket count, but not optimize resource
2963 allocations on the NUMA nodes.
2964
2965 max_array_tasks
2966 Specify the maximum number of tasks that can be included
2967 in a job array. The default limit is MaxArraySize, but
2968 this option can be used to set a lower limit. For exam‐
2969 ple, max_array_tasks=1000 and MaxArraySize=100001 would
2970 permit a maximum task ID of 100000, but limit the number
2971 of tasks in any single job array to 1000.
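
       The example in the text corresponds to a fragment like:

       ```
       MaxArraySize=100001
       SchedulerParameters=max_array_tasks=1000
       # Task IDs up to 100000 are valid, but any single job array is
       # limited to 1000 tasks.
       ```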
2972
2973 max_rpc_cnt=#
2974 If the number of active threads in the slurmctld daemon
2975 is equal to or larger than this value, defer scheduling
2976 of jobs. The scheduler will check this condition at cer‐
2977 tain points in code and yield locks if necessary. This
2978 can improve Slurm's ability to process requests at a cost
2979 of initiating new jobs less frequently. Default: 0 (op‐
2980 tion disabled), Min: 0, Max: 1000.
2981
2982 NOTE: The maximum number of threads (MAX_SERVER_THREADS)
2983 is internally set to 256 and defines the number of served
2984 RPCs at a given time. Setting max_rpc_cnt to more than
2985 256 will be only useful to let backfill continue schedul‐
2986 ing work after locks have been yielded (i.e. each 2 sec‐
2987 onds) if there are a maximum of MAX(max_rpc_cnt/10, 20)
2988 RPCs in the queue. i.e. max_rpc_cnt=1000, the scheduler
2989 will be allowed to continue after yielding locks only
2990 when there are less than or equal to 100 pending RPCs.
2991 If a value is set, then a value of 10 or higher is recom‐
2992 mended. It may require some tuning for each system, but
2993 needs to be high enough that scheduling isn't always dis‐
2994 abled, and low enough that requests can get through in a
2995 reasonable period of time.
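
       As a sketch of the thresholds described above (the value 400 is hy‐
       pothetical):

       ```
       # Hypothetical fragment: defer job scheduling while 400 or more
       # slurmctld threads are active.
       SchedulerParameters=max_rpc_cnt=400
       # After yielding locks, backfill continues only when pending RPCs
       # are at or below MAX(400/10, 20) = 40.
       ```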
2996
2997 max_sched_time=#
2998 How long, in seconds, that the main scheduling loop will
2999 execute for before exiting. If a value is configured, be
3000 aware that all other Slurm operations will be deferred
3001 during this time period. Make certain the value is lower
3002 than MessageTimeout. If a value is not explicitly con‐
3003 figured, the default value is half of MessageTimeout with
3004 a minimum default value of 1 second and a maximum default
3005 value of 2 seconds. For example if MessageTimeout=10,
3006 the time limit will be 2 seconds (i.e. MIN(10/2, 2) = 2).
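
       The default derivation in the text can be written out as a fragment:

       ```
       # Default when max_sched_time is not set:
       #   limit = MIN(MessageTimeout / 2, 2), with a 1-second minimum.
       MessageTimeout=10                      # implied limit: MIN(5, 2) = 2 s
       SchedulerParameters=max_sched_time=1   # explicit 1-second cap instead
       ```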
3007
3008 max_script_size=#
3009 Specify the maximum size of a batch script, in bytes.
3010 The default value is 4 megabytes. Larger values may ad‐
3011 versely impact system performance.
3012
3013 max_switch_wait=#
3014 Maximum number of seconds that a job can delay execution
3015 waiting for the specified desired switch count. The de‐
3016 fault value is 300 seconds.
3017
3018 no_backup_scheduling
3019 If used, the backup controller will not schedule jobs
3020 when it takes over. The backup controller will allow jobs
3021 to be submitted, modified and cancelled but won't sched‐
3022 ule new jobs. This is useful in Cray environments when
3023 the backup controller resides on an external Cray node.
3024 A restart of slurmctld is required for changes to this
3025 parameter to take effect.
3026
3027 no_env_cache
3028                     If used, any job that fails to load the environment on a
3029                     node will fail instead of using the cached environment.
3030                     This option also implicitly enables the re‐
3031                     queue_setup_env_fail option.
3032
3033 nohold_on_prolog_fail
3034 By default, if the Prolog exits with a non-zero value the
3035 job is requeued in a held state. By specifying this pa‐
3036 rameter the job will be requeued but not held so that the
3037 scheduler can dispatch it to another host.
3038
3039 pack_serial_at_end
3040 If used with the select/cons_res or select/cons_tres
3041 plugin, then put serial jobs at the end of the available
3042 nodes rather than using a best fit algorithm. This may
3043 reduce resource fragmentation for some workloads.
3044
3045 partition_job_depth=#
3046 The default number of jobs to attempt scheduling (i.e.
3047 the queue depth) from each partition/queue in Slurm's
3048 main scheduling logic. The functionality is similar to
3049 that provided by the bf_max_job_part option for the back‐
3050 fill scheduling logic. The default value is 0 (no
3051                     limit).  Jobs excluded from attempted scheduling based
3052 upon partition will not be counted against the de‐
3053 fault_queue_depth limit. Also see the bf_max_job_part
3054 option.
3055
3056 preempt_reorder_count=#
3057 Specify how many attempts should be made in reordering
3058 preemptable jobs to minimize the count of jobs preempted.
3059 The default value is 1. High values may adversely impact
3060 performance. The logic to support this option is only
3061 available in the select/cons_res and select/cons_tres
3062 plugins.
3063
3064 preempt_strict_order
3065 If set, then execute extra logic in an attempt to preempt
3066 only the lowest priority jobs. It may be desirable to
3067 set this configuration parameter when there are multiple
3068 priorities of preemptable jobs. The logic to support
3069 this option is only available in the select/cons_res and
3070 select/cons_tres plugins.
3071
3072 preempt_youngest_first
3073 If set, then the preemption sorting algorithm will be
3074 changed to sort by the job start times to favor preempt‐
3075 ing younger jobs over older. (Requires preempt/parti‐
3076 tion_prio or preempt/qos plugins.)
3077
3078 reduce_completing_frag
3079 This option is used to control how scheduling of re‐
3080 sources is performed when jobs are in the COMPLETING
3081 state, which influences potential fragmentation. If this
3082 option is not set then no jobs will be started in any
3083 partition when any job is in the COMPLETING state for
3084 less than CompleteWait seconds. If this option is set
3085 then no jobs will be started in any individual partition
3086 that has a job in COMPLETING state for less than Com‐
3087 pleteWait seconds. In addition, no jobs will be started
3088 in any partition with nodes that overlap with any nodes
3089 in the partition of the completing job. This option is
3090 to be used in conjunction with CompleteWait.
3091
3092 NOTE: CompleteWait must be set in order for this to work.
3093 If CompleteWait=0 then this option does nothing.
3094
3095 NOTE: reduce_completing_frag only affects the main sched‐
3096 uler, not the backfill scheduler.
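
       A minimal sketch combining the two parameters, as the text recom‐
       mends (the 32-second value is hypothetical):

       ```
       # Hypothetical fragment: only partitions containing (or overlapping
       # nodes with) a COMPLETING job are held back, for up to 32 seconds.
       CompleteWait=32
       SchedulerParameters=reduce_completing_frag
       ```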
3097
3098 requeue_setup_env_fail
3099 By default if a job environment setup fails the job keeps
3100 running with a limited environment. By specifying this
3101 parameter the job will be requeued in held state and the
3102 execution node drained.
3103
3104 salloc_wait_nodes
3105 If defined, the salloc command will wait until all allo‐
3106 cated nodes are ready for use (i.e. booted) before the
3107 command returns. By default, salloc will return as soon
3108 as the resource allocation has been made.
3109
3110 sbatch_wait_nodes
3111 If defined, the sbatch script will wait until all allo‐
3112 cated nodes are ready for use (i.e. booted) before the
3113 initiation. By default, the sbatch script will be initi‐
3114 ated as soon as the first node in the job allocation is
3115 ready. The sbatch command can use the --wait-all-nodes
3116 option to override this configuration parameter.
3117
3118 sched_interval=#
3119 How frequently, in seconds, the main scheduling loop will
3120 execute and test all pending jobs. The default value is
3121 60 seconds.
3122
3123 sched_max_job_start=#
3124 The maximum number of jobs that the main scheduling logic
3125 will start in any single execution. The default value is
3126 zero, which imposes no limit.
3127
3128 sched_min_interval=#
3129 How frequently, in microseconds, the main scheduling loop
3130 will execute and test any pending jobs. The scheduler
3131 runs in a limited fashion every time that any event hap‐
3132 pens which could enable a job to start (e.g. job submit,
3133 job terminate, etc.). If these events happen at a high
3134 frequency, the scheduler can run very frequently and con‐
3135 sume significant resources if not throttled by this op‐
3136 tion. This option specifies the minimum time between the
3137 end of one scheduling cycle and the beginning of the next
3138 scheduling cycle. A value of zero will disable throt‐
3139 tling of the scheduling logic interval. The default
3140 value is 2 microseconds.
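
       A hedged example of throttling the main scheduler using the options
       described above (values are hypothetical):

       ```
       # Hypothetical fragment for a busy controller: test up to 200 jobs
       # per event-driven pass, at most once every 0.1 seconds.
       SchedulerParameters=default_queue_depth=200,sched_min_interval=100000
       # The full queue is still tested every sched_interval (default 60 s).
       ```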
3141
3142 spec_cores_first
3143 Specialized cores will be selected from the first cores
3144 of the first sockets, cycling through the sockets on a
3145 round robin basis. By default, specialized cores will be
3146 selected from the last cores of the last sockets, cycling
3147 through the sockets on a round robin basis.
3148
3149 step_retry_count=#
3150 When a step completes and there are steps ending resource
3151 allocation, then retry step allocations for at least this
3152 number of pending steps. Also see step_retry_time. The
3153 default value is 8 steps.
3154
3155 step_retry_time=#
3156 When a step completes and there are steps ending resource
3157 allocation, then retry step allocations for all steps
3158 which have been pending for at least this number of sec‐
3159 onds. Also see step_retry_count. The default value is
3160 60 seconds.
3161
3162 whole_hetjob
3163 Requests to cancel, hold or release any component of a
3164 heterogeneous job will be applied to all components of
3165 the job.
3166
3167 NOTE: this option was previously named whole_pack and
3168                     it is still supported for backward compatibility.
3169
3170 SchedulerTimeSlice
3171 Number of seconds in each time slice when gang scheduling is en‐
3172 abled (PreemptMode=SUSPEND,GANG). The value must be between 5
3173 seconds and 65533 seconds. The default value is 30 seconds.
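
       A minimal gang-scheduling sketch using this parameter (the 60-second
       slice is hypothetical, within the documented 5-65533 range):

       ```
       # Hypothetical fragment: gang-scheduled jobs alternate in
       # 60-second time slices.
       PreemptMode=SUSPEND,GANG
       SchedulerTimeSlice=60
       ```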
3174
3175 SchedulerType
3176 Identifies the type of scheduler to be used. A restart of
3177 slurmctld is required for changes to this parameter to take ef‐
3178 fect. The scontrol command can be used to manually change job
3179 priorities if desired. Acceptable values include:
3180
3181 sched/backfill
3182 For a backfill scheduling module to augment the default
3183 FIFO scheduling. Backfill scheduling will initiate
3184 lower-priority jobs if doing so does not delay the ex‐
3185 pected initiation time of any higher priority job. Ef‐
3186 fectiveness of backfill scheduling is dependent upon
3187 users specifying job time limits, otherwise all jobs will
3188 have the same time limit and backfilling is impossible.
3189                     See the documentation for the SchedulerParameters option
3190 above. This is the default configuration.
3191
3192 sched/builtin
3193 This is the FIFO scheduler which initiates jobs in prior‐
3194 ity order. If any job in the partition can not be sched‐
3195 uled, no lower priority job in that partition will be
3196 scheduled. An exception is made for jobs that can not
3197 run due to partition constraints (e.g. the time limit) or
3198 down/drained nodes. In that case, lower priority jobs
3199 can be initiated and not impact the higher priority job.
3200
3201 ScronParameters
3202 Multiple options may be comma separated.
3203
3204 enable Enable the use of scrontab to submit and manage periodic
3205 repeating jobs.
3206
3207 SelectType
3208 Identifies the type of resource selection algorithm to be used.
3209 A restart of slurmctld is required for changes to this parameter
3210 to take effect. When changed, all job information (running and
3211 pending) will be lost, since the job state save format used by
3212 each plugin is different. The only exception to this is when
3213 changing from cons_res to cons_tres or from cons_tres to
3214 cons_res. However, if a job contains cons_tres-specific features
3215 and then SelectType is changed to cons_res, the job will be can‐
3216 celed, since there is no way for cons_res to satisfy require‐
3217 ments specific to cons_tres.
3218
3219 Acceptable values include
3220
3221 select/cons_res
3222 The resources (cores and memory) within a node are indi‐
3223 vidually allocated as consumable resources. Note that
3224 whole nodes can be allocated to jobs for selected parti‐
3225 tions by using the OverSubscribe=Exclusive option. See
3226 the partition OverSubscribe parameter for more informa‐
3227 tion.
3228
3229 select/cons_tres
3230 The resources (cores, memory, GPUs and all other track‐
3231 able resources) within a node are individually allocated
3232 as consumable resources. Note that whole nodes can be
3233 allocated to jobs for selected partitions by using the
3234 OverSubscribe=Exclusive option. See the partition Over‐
3235 Subscribe parameter for more information.
3236
3237 select/cray_aries
3238 for a Cray system. The default value is "se‐
3239 lect/cray_aries" for all Cray systems.
3240
3241 select/linear
3242 for allocation of entire nodes assuming a one-dimensional
3243 array of nodes in which sequentially ordered nodes are
3244 preferable. For a heterogeneous cluster (e.g. different
3245 CPU counts on the various nodes), resource allocations
3246 will favor nodes with high CPU counts as needed based
3247 upon the job's node and CPU specification if TopologyPlu‐
3248 gin=topology/none is configured. Use of other topology
3249 plugins with select/linear and heterogeneous nodes is not
3250 recommended and may result in valid job allocation re‐
3251 quests being rejected. This is the default value.
3252
3253 SelectTypeParameters
3254 The permitted values of SelectTypeParameters depend upon the
3255 configured value of SelectType. The only supported options for
3256 SelectType=select/linear are CR_ONE_TASK_PER_CORE and CR_Memory,
3257 which treats memory as a consumable resource and prevents memory
3258 over subscription with job preemption or gang scheduling. By
3259 default SelectType=select/linear allocates whole nodes to jobs
3260              without considering their memory consumption.  By default Se‐
3261              lectType=select/cons_res, SelectType=select/cray_aries, and Se‐
3262              lectType=select/cons_tres use CR_Core_Memory, which allocates
3263              cores to jobs while considering their memory consumption.
3264
3265 A restart of slurmctld is required for changes to this parameter
3266 to take effect.
3267
3268 The following options are supported for SelectType=se‐
3269 lect/cray_aries:
3270
3271 OTHER_CONS_RES
3272 Layer the select/cons_res plugin under the se‐
3273 lect/cray_aries plugin, the default is to layer on se‐
3274 lect/linear. This also allows all the options available
3275 for SelectType=select/cons_res.
3276
3277 OTHER_CONS_TRES
3278 Layer the select/cons_tres plugin under the se‐
3279 lect/cray_aries plugin, the default is to layer on se‐
3280 lect/linear. This also allows all the options available
3281 for SelectType=select/cons_tres.
3282
3283 The following options are supported by the SelectType=select/cons_res
3284 and SelectType=select/cons_tres plugins:
3285
3286 CR_CPU CPUs are consumable resources. Configure the number of
3287 CPUs on each node, which may be equal to the count of
3288 cores or hyper-threads on the node depending upon the de‐
3289 sired minimum resource allocation. The node's Boards,
3290 Sockets, CoresPerSocket and ThreadsPerCore may optionally
3291 be configured and result in job allocations which have
3292 improved locality; however doing so will prevent more
3293 than one job from being allocated on each core.
3294
3295 CR_CPU_Memory
3296 CPUs and memory are consumable resources. Configure the
3297 number of CPUs on each node, which may be equal to the
3298 count of cores or hyper-threads on the node depending
3299 upon the desired minimum resource allocation. The node's
3300 Boards, Sockets, CoresPerSocket and ThreadsPerCore may
3301 optionally be configured and result in job allocations
3302 which have improved locality; however doing so will pre‐
3303 vent more than one job from being allocated on each core.
3304 Setting a value for DefMemPerCPU is strongly recommended.
3305
3306 CR_Core
3307 Cores are consumable resources. On nodes with hy‐
3308 per-threads, each thread is counted as a CPU to satisfy a
3309 job's resource requirement, but multiple jobs are not al‐
3310 located threads on the same core. The count of CPUs al‐
3311 located to a job is rounded up to account for every CPU
3312                     on an allocated core.  This also affects total allocated
3313                     memory when --mem-per-cpu is used: the per-CPU value is
3314                     multiplied by the total number of CPUs on allocated cores.
3315
3316 CR_Core_Memory
3317 Cores and memory are consumable resources. On nodes with
3318 hyper-threads, each thread is counted as a CPU to satisfy
3319 a job's resource requirement, but multiple jobs are not
3320 allocated threads on the same core. The count of CPUs
3321 allocated to a job may be rounded up to account for every
3322 CPU on an allocated core. Setting a value for DefMemPer‐
3323 CPU is strongly recommended.
3324
3325 CR_ONE_TASK_PER_CORE
3326 Allocate one task per core by default. Without this op‐
3327 tion, by default one task will be allocated per thread on
3328 nodes with more than one ThreadsPerCore configured.
3329 NOTE: This option cannot be used with CR_CPU*.
3330
3331 CR_CORE_DEFAULT_DIST_BLOCK
3332 Allocate cores within a node using block distribution by
3333 default. This is a pseudo-best-fit algorithm that mini‐
3334 mizes the number of boards and minimizes the number of
3335 sockets (within minimum boards) used for the allocation.
3336                     This default behavior can be overridden by specifying a
3337                     particular "-m" parameter with srun/salloc/sbatch.  Without
3338 this option, cores will be allocated cyclically across
3339 the sockets.
3340
3341 CR_LLN Schedule resources to jobs on the least loaded nodes
3342 (based upon the number of idle CPUs). This is generally
3343 only recommended for an environment with serial jobs as
3344 idle resources will tend to be highly fragmented, result‐
3345 ing in parallel jobs being distributed across many nodes.
3346 Note that node Weight takes precedence over how many idle
3347                     resources are on each node.  Also see the partition con‐
3348                     figuration parameter LLN to use the least loaded nodes in
3349                     selected partitions.
3350
3351 CR_Pack_Nodes
3352 If a job allocation contains more resources than will be
3353 used for launching tasks (e.g. if whole nodes are allo‐
3354 cated to a job), then rather than distributing a job's
3355 tasks evenly across its allocated nodes, pack them as
3356 tightly as possible on these nodes. For example, con‐
3357 sider a job allocation containing two entire nodes with
3358 eight CPUs each. If the job starts ten tasks across
3359 those two nodes without this option, it will start five
3360 tasks on each of the two nodes. With this option, eight
3361 tasks will be started on the first node and two tasks on
3362 the second node. This can be superseded by "NoPack" in
3363 srun's "--distribution" option. CR_Pack_Nodes only ap‐
3364 plies when the "block" task distribution method is used.
3365
3366 CR_Socket
3367 Sockets are consumable resources. On nodes with multiple
3368 cores, each core or thread is counted as a CPU to satisfy
3369 a job's resource requirement, but multiple jobs are not
3370 allocated resources on the same socket.
3371
3372 CR_Socket_Memory
3373 Memory and sockets are consumable resources. On nodes
3374 with multiple cores, each core or thread is counted as a
3375 CPU to satisfy a job's resource requirement, but multiple
3376 jobs are not allocated resources on the same socket.
3377 Setting a value for DefMemPerCPU is strongly recommended.
3378
3379 CR_Memory
3380 Memory is a consumable resource. NOTE: This implies
3381 OverSubscribe=YES or OverSubscribe=FORCE for all parti‐
3382 tions. Setting a value for DefMemPerCPU is strongly rec‐
3383 ommended.
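
       A common combination of the parameters above, sketched as a fragment
       (the DefMemPerCPU value is hypothetical):

       ```
       # Hypothetical fragment: track cores, memory, GPUs and other TRES
       # individually as consumable resources.
       SelectType=select/cons_tres
       SelectTypeParameters=CR_Core_Memory
       DefMemPerCPU=2048   # MB; recommended whenever memory is consumable
       ```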
3384
3385 SlurmctldAddr
3386 An optional address to be used for communications to the cur‐
3387 rently active slurmctld daemon, normally used with Virtual IP
3388 addressing of the currently active server. If this parameter is
3389 not specified then each primary and backup server will have its
3390 own unique address used for communications as specified in the
3391 SlurmctldHost parameter. If this parameter is specified then
3392 the SlurmctldHost parameter will still be used for communica‐
3393 tions to specific slurmctld primary or backup servers, for exam‐
3394 ple to cause all of them to read the current configuration files
3395 or shutdown. Also see the SlurmctldPrimaryOffProg and Slurm‐
3396 ctldPrimaryOnProg configuration parameters to configure programs
3397              to manage the virtual IP address.
3398
3399 SlurmctldDebug
3400 The level of detail to provide slurmctld daemon's logs. The de‐
3401 fault value is info. If the slurmctld daemon is initiated with
3402              -v or --verbose options, that debug level will be preserved or
3403 restored upon reconfiguration.
3404
3405 quiet Log nothing
3406
3407 fatal Log only fatal errors
3408
3409 error Log only errors
3410
3411 info Log errors and general informational messages
3412
3413 verbose Log errors and verbose informational messages
3414
3415 debug Log errors and verbose informational messages and de‐
3416 bugging messages
3417
3418 debug2 Log errors and verbose informational messages and more
3419 debugging messages
3420
3421 debug3 Log errors and verbose informational messages and even
3422 more debugging messages
3423
3424 debug4 Log errors and verbose informational messages and even
3425 more debugging messages
3426
3427 debug5 Log errors and verbose informational messages and even
3428 more debugging messages
3429
3430 SlurmctldHost
3431 The short, or long, hostname of the machine where Slurm control
3432 daemon is executed (i.e. the name returned by the command "host‐
3433 name -s"). This hostname is optionally followed by the address,
3434 either the IP address or a name by which the address can be
3435 identified, enclosed in parentheses (e.g. SlurmctldHost=slurm‐
3436 ctl-primary(12.34.56.78)). This value must be specified at least
3437 once. If specified more than once, the first hostname named will
3438 be where the daemon runs. If the first specified host fails,
3439 the daemon will execute on the second host. If both the first
3440              and second specified hosts fail, the daemon will execute on the
3441 third host. A restart of slurmctld is required for changes to
3442 this parameter to take effect.
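
       The failover ordering above can be sketched as follows (hostnames
       and the address are invented for illustration):

       ```
       # Hypothetical fragment: primary controller plus two fallbacks,
       # tried in the order listed.
       SlurmctldHost=slurmctl-primary(12.34.56.78)
       SlurmctldHost=slurmctl-backup1
       SlurmctldHost=slurmctl-backup2
       ```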
3443
3444 SlurmctldLogFile
3445 Fully qualified pathname of a file into which the slurmctld dae‐
3446 mon's logs are written. The default value is none (performs
3447 logging via syslog).
3448 See the section LOGGING if a pathname is specified.
3449
3450 SlurmctldParameters
3451 Multiple options may be comma separated.
3452
3453 allow_user_triggers
3454 Permit setting triggers from non-root/slurm_user users.
3455 SlurmUser must also be set to root to permit these trig‐
3456 gers to work. See the strigger man page for additional
3457 details.
3458
3459 cloud_dns
3460 By default, Slurm expects that the network address for a
3461 cloud node won't be known until the creation of the node
3462 and that Slurm will be notified of the node's address
3463 (e.g. scontrol update nodename=<name> nodeaddr=<addr>).
3464 Since Slurm communications rely on the node configuration
3465                     found in the slurm.conf, Slurm will tell the client com‐
3466                     mand, after waiting for all nodes to boot, each node's IP
3467 address. However, in environments where the nodes are in
3468 DNS, this step can be avoided by configuring this option.
3469
3470 cloud_reg_addrs
3471 When a cloud node registers, the node's NodeAddr and
3472 NodeHostName will automatically be set. They will be re‐
3473 set back to the nodename after powering off.
3474
3475 enable_configless
3476 Permit "configless" operation by the slurmd, slurmstepd,
3477 and user commands. When enabled the slurmd will be per‐
3478 mitted to retrieve config files from the slurmctld, and
3479 on any 'scontrol reconfigure' command new configs will be
3480 automatically pushed out and applied to nodes that are
3481 running in this "configless" mode. A restart of slurm‐
3482 ctld is required for changes to this parameter to take
3483 effect.
3484
3485 idle_on_node_suspend
3486 Mark nodes as idle, regardless of current state, when
3487 suspending nodes with SuspendProgram so that nodes will
3488 be eligible to be resumed at a later time.
3489
3490 node_reg_mem_percent=#
3491 Percentage of memory a node is allowed to register with
3492 without being marked as invalid with low memory. Default
3493 is 100. For State=CLOUD nodes, the default is 90. To dis‐
3494 able this for cloud nodes set it to 100. config_overrides
3495 takes precedence over this option.
3496
3497                     It is recommended to configure task/cgroup with Con‐
3498                     strainRamSpace.  A memory cgroup limit will not be set
3499                     higher than the actual memory on the node.  If needed, configure
3500 AllowedRamSpace in the cgroup.conf to add a buffer.
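
       A minimal sketch of this option (the percentage is hypothetical):

       ```
       # Hypothetical fragment: nodes may register with as little as 95%
       # of their configured memory without being marked invalid.
       SlurmctldParameters=node_reg_mem_percent=95
       ```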
3501
3502 power_save_interval
3503 How often the power_save thread looks to resume and sus‐
3504 pend nodes. The power_save thread will do work sooner if
3505 there are node state changes. Default is 10 seconds.
3506
3507 power_save_min_interval
3508 How often the power_save thread, at a minimum, looks to
3509 resume and suspend nodes. Default is 0.
3510
3511 max_dbd_msg_action
3512 Action used once MaxDBDMsgs is reached, options are 'dis‐
3513 card' (default) and 'exit'.
3514
3515                     When 'discard' is specified and MaxDBDMsgs is reached,
3516                     pending messages of type Step start and complete are
3517                     purged first.  If MaxDBDMsgs is reached again, Job start
3518                     messages are purged.  Job complete and node state change
3519                     messages continue to consume the space freed by those
3520                     purges until MaxDBDMsgs is reached once more, at which
3521                     point no new messages are tracked, causing data loss and
3522                     potentially runaway jobs.
3523
3524 When 'exit' is specified and MaxDBDMsgs is reached the
3525 slurmctld will exit instead of discarding any messages.
3526 It will be impossible to start the slurmctld with this
3527                     option if the slurmdbd is down and the slurmctld is
3528                     tracking more than MaxDBDMsgs messages.
3529
3530 preempt_send_user_signal
3531 Send the user signal (e.g. --signal=<sig_num>) at preemp‐
3532 tion time even if the signal time hasn't been reached. In
3533 the case of a gracetime preemption the user signal will
3534 be sent if the user signal has been specified and not
3535 sent, otherwise a SIGTERM will be sent to the tasks.
3536
3537 reboot_from_controller
3538 Run the RebootProgram from the controller instead of on
3539 the slurmds. The RebootProgram will be passed a
3540 comma-separated list of nodes to reboot.
3541
3542 user_resv_delete
3543 Allow any user able to run in a reservation to delete it.
3544
3545 SlurmctldPidFile
3546 Fully qualified pathname of a file into which the slurmctld
3547 daemon may write its process id. This may be used for automated
3548 signal processing. The default value is "/var/run/slurm‐
3549 ctld.pid".
3550
3551 SlurmctldPlugstack
3552 A comma-delimited list of Slurm controller plugins to be started
3553 when the daemon begins and terminated when it ends. Only the
3554 plugin's init and fini functions are called.
3555
3556 SlurmctldPort
3557 The port number that the Slurm controller, slurmctld, listens to
3558 for work. The default value is SLURMCTLD_PORT as established at
3559 system build time. If none is explicitly specified, it will be
3560 set to 6817. SlurmctldPort may also be configured to support a
3561 range of port numbers in order to accept larger bursts of incom‐
3562 ing messages by specifying two numbers separated by a dash (e.g.
3563 SlurmctldPort=6817-6818). A restart of slurmctld is required
3564              for changes to this parameter to take effect.  NOTE: The
3565              slurmctld and slurmd daemons must not execute on the same nodes,
3566              or the values of SlurmctldPort and SlurmdPort must be different.
3567
3568 Note: On Cray systems, Realm-Specific IP Addressing (RSIP) will
3569 automatically try to interact with anything opened on ports
3570 8192-60000. Configure SlurmctldPort to use a port outside of
3571 the configured SrunPortRange and RSIP's port range.
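              As a sketch, a site expecting bursts of incoming messages might
              configure a controller port range while keeping SlurmdPort
              distinct (the port numbers here are only illustrative):

              ```
              SlurmctldPort=6817-6818
              SlurmdPort=6819
              ```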
3572
3573 SlurmctldPrimaryOffProg
3574 This program is executed when a slurmctld daemon running as the
3575 primary server becomes a backup server. By default no program is
3576 executed. See also the related "SlurmctldPrimaryOnProg" parame‐
3577 ter.
3578
3579 SlurmctldPrimaryOnProg
3580 This program is executed when a slurmctld daemon running as a
3581 backup server becomes the primary server. By default no program
3582              is executed. When using virtual IP addresses to manage Highly
3583              Available Slurm services, this program can be used to add the IP
3584 address to an interface (and optionally try to kill the unre‐
3585 sponsive slurmctld daemon and flush the ARP caches on nodes on
3586 the local Ethernet fabric). See also the related "SlurmctldPri‐
3587 maryOffProg" parameter.
3588
3589 SlurmctldSyslogDebug
3590 The slurmctld daemon will log events to the syslog file at the
3591              specified level of detail. If not set, the slurmctld daemon
3592              logs to syslog at level fatal, with two exceptions: if there is
3593              no SlurmctldLogFile and the daemon runs in the background, it
3594              logs to syslog at the level specified by SlurmctldDebug (or at
3595              fatal if SlurmctldDebug is set to quiet); if it runs in the
3596              foreground, the syslog level is set to quiet.
3597
3598 quiet Log nothing
3599
3600 fatal Log only fatal errors
3601
3602 error Log only errors
3603
3604 info Log errors and general informational messages
3605
3606 verbose Log errors and verbose informational messages
3607
3608 debug Log errors and verbose informational messages and de‐
3609 bugging messages
3610
3611 debug2 Log errors and verbose informational messages and more
3612 debugging messages
3613
3614 debug3 Log errors and verbose informational messages and even
3615 more debugging messages
3616
3617 debug4 Log errors and verbose informational messages and even
3618 more debugging messages
3619
3620 debug5 Log errors and verbose informational messages and even
3621 more debugging messages
3622
3623 NOTE: By default, Slurm's systemd service files start daemons in
3624 the foreground with the -D option. This means that systemd will
3625 capture stdout/stderr output and print that to syslog, indepen‐
3626 dent of Slurm printing to syslog directly. To prevent systemd
3627 from doing this, add "StandardOutput=null" and "StandardEr‐
3628 ror=null" to the respective service files or override files.
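              For example, a systemd override file (such as one created with
              "systemctl edit slurmctld") that suppresses the duplicate syslog
              output described above might look like:

              ```
              [Service]
              StandardOutput=null
              StandardError=null
              ```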
3629
3630 SlurmctldTimeout
3631 The interval, in seconds, that the backup controller waits for
3632 the primary controller to respond before assuming control. The
3633 default value is 120 seconds. May not exceed 65533.
3634
3635 SlurmdDebug
3636              The level of detail to provide in the slurmd daemon's logs.
3637              The default value is info.
3638
3639 quiet Log nothing
3640
3641 fatal Log only fatal errors
3642
3643 error Log only errors
3644
3645 info Log errors and general informational messages
3646
3647 verbose Log errors and verbose informational messages
3648
3649 debug Log errors and verbose informational messages and de‐
3650 bugging messages
3651
3652 debug2 Log errors and verbose informational messages and more
3653 debugging messages
3654
3655 debug3 Log errors and verbose informational messages and even
3656 more debugging messages
3657
3658 debug4 Log errors and verbose informational messages and even
3659 more debugging messages
3660
3661 debug5 Log errors and verbose informational messages and even
3662 more debugging messages
3663
3664 SlurmdLogFile
3665 Fully qualified pathname of a file into which the slurmd dae‐
3666 mon's logs are written. The default value is none (performs
3667 logging via syslog). Any "%h" within the name is replaced with
3668 the hostname on which the slurmd is running. Any "%n" within
3669 the name is replaced with the Slurm node name on which the
3670 slurmd is running.
3671 See the section LOGGING if a pathname is specified.
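              A hypothetical per-node log file configuration using the "%n"
              substitution (the path is illustrative only):

              ```
              SlurmdLogFile=/var/log/slurm/slurmd.%n.log
              ```

              On a node named "nid00012" this would expand to
              /var/log/slurm/slurmd.nid00012.log.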
3672
3673 SlurmdParameters
3674 Parameters specific to the Slurmd. Multiple options may be
3675 comma separated.
3676
3677 config_overrides
3678 If set, consider the configuration of each node to be
3679 that specified in the slurm.conf configuration file and
3680 any node with less than the configured resources will not
3681 be set to INVAL/INVALID_REG. This option is generally
3682 only useful for testing purposes. Equivalent to the now
3683 deprecated FastSchedule=2 option.
3684
3685 l3cache_as_socket
3686 Use the hwloc l3cache as the socket count. Can be useful
3687 on certain processors where the socket level is too
3688 coarse, and the l3cache may provide better task distribu‐
3689 tion. (E.g., along CCX boundaries instead of socket
3690 boundaries.) Requires hwloc v2.
3691
3692 shutdown_on_reboot
3693 If set, the Slurmd will shut itself down when a reboot
3694 request is received.
3695
3696 SlurmdPidFile
3697 Fully qualified pathname of a file into which the slurmd daemon
3698 may write its process id. This may be used for automated signal
3699 processing. Any "%h" within the name is replaced with the host‐
3700 name on which the slurmd is running. Any "%n" within the name
3701 is replaced with the Slurm node name on which the slurmd is run‐
3702 ning. The default value is "/var/run/slurmd.pid".
3703
3704 SlurmdPort
3705 The port number that the Slurm compute node daemon, slurmd, lis‐
3706 tens to for work. The default value is SLURMD_PORT as estab‐
3707 lished at system build time. If none is explicitly specified,
3708 its value will be 6818. A restart of slurmctld is required for
3709              changes to this parameter to take effect.  NOTE: The slurmctld
3710              and slurmd daemons must either run on different nodes or use
3711              different values for SlurmctldPort and SlurmdPort.
3712
3713 Note: On Cray systems, Realm-Specific IP Addressing (RSIP) will
3714 automatically try to interact with anything opened on ports
3715 8192-60000. Configure SlurmdPort to use a port outside of the
3716 configured SrunPortRange and RSIP's port range.
3717
3718 SlurmdSpoolDir
3719 Fully qualified pathname of a directory into which the slurmd
3720 daemon's state information and batch job script information are
3721 written. This must be a common pathname for all nodes, but
3722 should represent a directory which is local to each node (refer‐
3723 ence a local file system). The default value is
3724 "/var/spool/slurmd". Any "%h" within the name is replaced with
3725 the hostname on which the slurmd is running. Any "%n" within
3726 the name is replaced with the Slurm node name on which the
3727 slurmd is running.
3728
3729 SlurmdSyslogDebug
3730 The slurmd daemon will log events to the syslog file at the
3731              specified level of detail. If not set, the slurmd daemon logs
3732              to syslog at level fatal, with two exceptions: if there is no
3733              SlurmdLogFile and the daemon runs in the background, it logs to
3734              syslog at the level specified by SlurmdDebug (or at fatal if
3735              SlurmdDebug is set to quiet); if it runs in the foreground, the
3736              syslog level is set to quiet.
3737
3738 quiet Log nothing
3739
3740 fatal Log only fatal errors
3741
3742 error Log only errors
3743
3744 info Log errors and general informational messages
3745
3746 verbose Log errors and verbose informational messages
3747
3748 debug Log errors and verbose informational messages and de‐
3749 bugging messages
3750
3751 debug2 Log errors and verbose informational messages and more
3752 debugging messages
3753
3754 debug3 Log errors and verbose informational messages and even
3755 more debugging messages
3756
3757 debug4 Log errors and verbose informational messages and even
3758 more debugging messages
3759
3760 debug5 Log errors and verbose informational messages and even
3761 more debugging messages
3762
3763 NOTE: By default, Slurm's systemd service files start daemons in
3764 the foreground with the -D option. This means that systemd will
3765 capture stdout/stderr output and print that to syslog, indepen‐
3766 dent of Slurm printing to syslog directly. To prevent systemd
3767 from doing this, add "StandardOutput=null" and "StandardEr‐
3768 ror=null" to the respective service files or override files.
3769
3770 SlurmdTimeout
3771 The interval, in seconds, that the Slurm controller waits for
3772 slurmd to respond before configuring that node's state to DOWN.
3773 A value of zero indicates the node will not be tested by slurm‐
3774 ctld to confirm the state of slurmd, the node will not be auto‐
3775 matically set to a DOWN state indicating a non-responsive
3776 slurmd, and some other tool will take responsibility for moni‐
3777 toring the state of each compute node and its slurmd daemon.
3778 Slurm's hierarchical communication mechanism is used to ping the
3779 slurmd daemons in order to minimize system noise and overhead.
3780 The default value is 300 seconds. The value may not exceed
3781 65533 seconds.
3782
3783 SlurmdUser
3784 The name of the user that the slurmd daemon executes as. This
3785 user must exist on all nodes of the cluster for authentication
3786 of communications between Slurm components. The default value
3787 is "root".
3788
3789 SlurmSchedLogFile
3790 Fully qualified pathname of the scheduling event logging file.
3791 The syntax of this parameter is the same as for SlurmctldLog‐
3792 File. In order to configure scheduler logging, set both the
3793 SlurmSchedLogFile and SlurmSchedLogLevel parameters.
3794
3795 SlurmSchedLogLevel
3796 The initial level of scheduling event logging, similar to the
3797 SlurmctldDebug parameter used to control the initial level of
3798 slurmctld logging. Valid values for SlurmSchedLogLevel are "0"
3799 (scheduler logging disabled) and "1" (scheduler logging en‐
3800 abled). If this parameter is omitted, the value defaults to "0"
3801 (disabled). In order to configure scheduler logging, set both
3802 the SlurmSchedLogFile and SlurmSchedLogLevel parameters. The
3803 scheduler logging level can be changed dynamically using scon‐
3804 trol.
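              To enable scheduler logging, both parameters must be set
              together; an illustrative configuration (the path is
              hypothetical):

              ```
              SlurmSchedLogFile=/var/log/slurm/sched.log
              SlurmSchedLogLevel=1
              ```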
3805
3806 SlurmUser
3807 The name of the user that the slurmctld daemon executes as. For
3808 security purposes, a user other than "root" is recommended.
3809 This user must exist on all nodes of the cluster for authentica‐
3810 tion of communications between Slurm components. The default
3811 value is "root".
3812
3813 SrunEpilog
3814 Fully qualified pathname of an executable to be run by srun fol‐
3815 lowing the completion of a job step. The command line arguments
3816 for the executable will be the command and arguments of the job
3817 step. This configuration parameter may be overridden by srun's
3818 --epilog parameter. Note that while the other "Epilog" executa‐
3819 bles (e.g., TaskEpilog) are run by slurmd on the compute nodes
3820 where the tasks are executed, the SrunEpilog runs on the node
3821 where the "srun" is executing.
3822
3823 SrunPortRange
3824              The srun command creates a set of listening ports to communi‐
3825              cate with the controller and the slurmstepd, and to handle ap‐
3826              plication I/O.  By default these ports are ephemeral, meaning
3827              the port numbers are selected by the kernel.  This parameter
3828              allows sites to configure the range of ports from which srun
3829              ports will be selected, which is useful if only a certain port
3830              range is permitted on the network.
3831
3832 Note: On Cray systems, Realm-Specific IP Addressing (RSIP) will
3833 automatically try to interact with anything opened on ports
3834 8192-60000. Configure SrunPortRange to use a range of ports
3835 above those used by RSIP, ideally 1000 or more ports, for exam‐
3836 ple "SrunPortRange=60001-63000".
3837
3838 Note: SrunPortRange must be large enough to cover the expected
3839 number of srun ports created on a given submission node. A sin‐
3840 gle srun opens 3 listening ports plus 2 more for every 48 hosts.
3841 Example:
3842
3843 srun -N 48 will use 5 listening ports.
3844
3845 srun -N 50 will use 7 listening ports.
3846
3847 srun -N 200 will use 13 listening ports.
3848
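              The port counts above follow a simple rule: 3 base ports plus 2
              more for each (rounded-up) group of 48 hosts. A small shell
              sketch of that arithmetic (the function name is ours, not part
              of Slurm):

              ```shell
              # Estimate how many listening ports one srun needs for N nodes:
              # 3 base ports plus 2 for each rounded-up group of 48 hosts.
              srun_ports() {
                  echo $(( 3 + 2 * ( ($1 + 47) / 48 ) ))
              }

              srun_ports 200   # prints 13
              ```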
3849 SrunProlog
3850 Fully qualified pathname of an executable to be run by srun
3851 prior to the launch of a job step. The command line arguments
3852 for the executable will be the command and arguments of the job
3853 step. This configuration parameter may be overridden by srun's
3854 --prolog parameter. Note that while the other "Prolog" executa‐
3855 bles (e.g., TaskProlog) are run by slurmd on the compute nodes
3856 where the tasks are executed, the SrunProlog runs on the node
3857 where the "srun" is executing.
3858
3859 StateSaveLocation
3860 Fully qualified pathname of a directory into which the Slurm
3861 controller, slurmctld, saves its state (e.g. "/usr/lo‐
3862              cal/slurm/checkpoint").  Slurm state is saved here to recover
3863 from system failures. SlurmUser must be able to create files in
3864 this directory. If you have a secondary SlurmctldHost config‐
3865 ured, this location should be readable and writable by both sys‐
3866 tems. Since all running and pending job information is stored
3867 here, the use of a reliable file system (e.g. RAID) is recom‐
3868 mended. The default value is "/var/spool". A restart of slurm‐
3869 ctld is required for changes to this parameter to take effect.
3870 If any slurm daemons terminate abnormally, their core files will
3871 also be written into this directory.
3872
3873 SuspendExcNodes
3874              Specifies the nodes which are not to be placed in power save
3875 mode, even if the node remains idle for an extended period of
3876 time. Use Slurm's hostlist expression to identify nodes with an
3877 optional ":" separator and count of nodes to exclude from the
3878 preceding range. For example "nid[10-20]:4" will prevent 4 us‐
3879              able nodes (i.e. IDLE and not DOWN, DRAINING or already powered
3880              down) in the set "nid[10-20]" from being powered down.  Multiple
3881              sets of nodes can be specified with or without counts in a
3882              comma-separated list (e.g. "nid[10-20]:4,nid[80-90]:2").  If a node
3883 count specification is given, any list of nodes to NOT have a
3884 node count must be after the last specification with a count.
3885 For example "nid[10-20]:4,nid[60-70]" will exclude 4 nodes in
3886 the set "nid[10-20]:4" plus all nodes in the set "nid[60-70]"
3887 while "nid[1-3],nid[10-20]:4" will exclude 4 nodes from the set
3888 "nid[1-3],nid[10-20]". By default no nodes are excluded.
3889
3890 SuspendExcParts
3891              Specifies the partitions whose nodes are not to be placed in
3892 power save mode, even if the node remains idle for an extended
3893 period of time. Multiple partitions can be identified and sepa‐
3894 rated by commas. By default no nodes are excluded.
3895
3896 SuspendProgram
3897 SuspendProgram is the program that will be executed when a node
3898 remains idle for an extended period of time. This program is
3899 expected to place the node into some power save mode. This can
3900 be used to reduce the frequency and voltage of a node or com‐
3901 pletely power the node off. The program executes as SlurmUser.
3902 The argument to the program will be the names of nodes to be
3903 placed into power savings mode (using Slurm's hostlist expres‐
3904 sion format). By default, no program is run.
3905
3906 SuspendRate
3907 The rate at which nodes are placed into power save mode by Sus‐
3908 pendProgram. The value is number of nodes per minute and it can
3909 be used to prevent a large drop in power consumption (e.g. after
3910 a large job completes). A value of zero results in no limits
3911 being imposed. The default value is 60 nodes per minute.
3912
3913 SuspendTime
3914 Nodes which remain idle or down for this number of seconds will
3915 be placed into power save mode by SuspendProgram. Setting Sus‐
3916 pendTime to anything but INFINITE (or -1) will enable power save
3917 mode. INFINITE is the default.
3918
3919 SuspendTimeout
3920 Maximum time permitted (in seconds) between when a node suspend
3921 request is issued and when the node is shutdown. At that time
3922 the node must be ready for a resume request to be issued as
3923 needed for new work. The default value is 30 seconds.
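              Putting the power-saving parameters together, a minimal sketch
              (the program paths and node names are hypothetical; ResumeProgram
              is the companion parameter for powering nodes back up):

              ```
              SuspendProgram=/usr/local/sbin/node_suspend.sh
              ResumeProgram=/usr/local/sbin/node_resume.sh
              SuspendTime=1800
              SuspendRate=20
              SuspendTimeout=60
              SuspendExcNodes=login[1-2]
              ```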
3924
3925 SwitchParameters
3926 Optional parameters for the switch plugin.
3927
3928 SwitchType
3929 Identifies the type of switch or interconnect used for applica‐
3930 tion communications. Acceptable values include
3931              "switch/cray_aries" for Cray systems and "switch/none" for
3932              switches not requiring special processing for job launch or ter‐
3933              mination (e.g. Ethernet and InfiniBand).  The default value is
3934              "switch/none".  All Slurm daemons, commands and running jobs
3935 must be restarted for a change in SwitchType to take effect. If
3936 running jobs exist at the time slurmctld is restarted with a new
3937 value of SwitchType, records of all jobs in any state may be
3938 lost.
3939
3940 TaskEpilog
3941              Fully qualified pathname of a program to be executed as the
3942              Slurm job's owner after termination of each task.  See TaskProlog for
3943 execution order details.
3944
3945 TaskPlugin
3946 Identifies the type of task launch plugin, typically used to
3947 provide resource management within a node (e.g. pinning tasks to
3948 specific processors). More than one task plugin can be specified
3949 in a comma-separated list. The prefix of "task/" is optional.
3950 Acceptable values include:
3951
3952 task/affinity enables resource containment using
3953 sched_setaffinity(). This enables the --cpu-bind
3954 and/or --mem-bind srun options.
3955
3956 task/cgroup enables resource containment using Linux control
3957 cgroups. This enables the --cpu-bind and/or
3958 --mem-bind srun options. NOTE: see "man
3959 cgroup.conf" for configuration details.
3960
3961 task/none for systems requiring no special handling of user
3962 tasks. Lacks support for the --cpu-bind and/or
3963 --mem-bind srun options. The default value is
3964 "task/none".
3965
3966 NOTE: It is recommended to stack task/affinity,task/cgroup to‐
3967 gether when configuring TaskPlugin, and setting Constrain‐
3968 Cores=yes in cgroup.conf. This setup uses the task/affinity
3969 plugin for setting the affinity of the tasks and uses the
3970 task/cgroup plugin to fence tasks into the specified resources.
3971
3972 NOTE: For CRAY systems only: task/cgroup must be used with, and
3973 listed after task/cray_aries in TaskPlugin. The task/affinity
3974 plugin can be listed anywhere, but the previous constraint must
3975 be satisfied. For CRAY systems, a configuration like this is
3976 recommended:
3977 TaskPlugin=task/affinity,task/cray_aries,task/cgroup
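              For a typical (non-Cray) Linux cluster, the stacking recommended
              above would be expressed across the two configuration files as:

              ```
              # slurm.conf
              TaskPlugin=task/affinity,task/cgroup

              # cgroup.conf
              ConstrainCores=yes
              ```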
3978
3979 TaskPluginParam
3980 Optional parameters for the task plugin. Multiple options
3981 should be comma separated. None, Sockets, Cores and Threads are
3982 mutually exclusive and treated as a last possible source of
3983 --cpu-bind default. See also Node and Partition CpuBind options.
3984
3985 Cores Bind tasks to cores by default. Overrides automatic
3986 binding.
3987
3988 None Perform no task binding by default. Overrides automatic
3989 binding.
3990
3991 Sockets
3992 Bind to sockets by default. Overrides automatic binding.
3993
3994 Threads
3995 Bind to threads by default. Overrides automatic binding.
3996
3997 SlurmdOffSpec
3998 If specialized cores or CPUs are identified for the node
3999 (i.e. the CoreSpecCount or CpuSpecList are configured for
4000 the node), then Slurm daemons running on the compute node
4001 (i.e. slurmd and slurmstepd) should run outside of those
4002 resources (i.e. specialized resources are completely un‐
4003 available to Slurm daemons and jobs spawned by Slurm).
4004 This option may not be used with the task/cray_aries
4005 plugin.
4006
4007 Verbose
4008 Verbosely report binding before tasks run by default.
4009
4010 Autobind
4011 Set a default binding in the event that "auto binding"
4012 doesn't find a match. Set to Threads, Cores or Sockets
4013 (E.g. TaskPluginParam=autobind=threads).
4014
4015 TaskProlog
4016              Fully qualified pathname of a program to be executed as the
4017              Slurm job's owner prior to initiation of each task.  Besides the nor‐
4018 mal environment variables, this has SLURM_TASK_PID available to
4019 identify the process ID of the task being started. Standard
4020 output from this program can be used to control the environment
4021 variables and output for the user program.
4022
4023 export NAME=value Will set environment variables for the task
4024 being spawned. Everything after the equal
4025 sign to the end of the line will be used as
4026 the value for the environment variable. Ex‐
4027 porting of functions is not currently sup‐
4028 ported.
4029
4030 print ... Will cause that line (without the leading
4031 "print ") to be printed to the job's stan‐
4032 dard output.
4033
4034 unset NAME Will clear environment variables for the
4035 task being spawned.
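              A minimal TaskProlog sketch using the three directives above,
              written as a function so its output can be inspected; in
              slurm.conf, TaskProlog would point at a script whose stdout
              contains exactly these lines (the variable names are
              illustrative only):

              ```shell
              # Lines written to stdout by a TaskProlog are interpreted by
              # slurmd as directives for the task being spawned.
              task_prolog() {
                  pid=${SLURM_TASK_PID:-0}   # provided by Slurm; default for testing
                  echo "export MY_SCRATCH=/tmp/scratch.$pid"          # set env var
                  echo "print task prolog running for task pid $pid"  # job stdout
                  echo "unset DISPLAY"                                # clear env var
              }

              task_prolog
              ```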
4036
4037 The order of task prolog/epilog execution is as follows:
4038
4039              1. pre_launch_priv()
4040                              Function in TaskPlugin
4041
4042              2. pre_launch() Function in TaskPlugin
4043
4044              3. TaskProlog   System-wide per task program defined in
4045                              slurm.conf
4046
4047              4. User prolog  Job-step-specific task program defined using
4048                              srun's --task-prolog option or
4049                              SLURM_TASK_PROLOG environment variable
4050
4051              5. Task         Execute the job step's task
4052
4053              6. User epilog  Job-step-specific task program defined using
4054                              srun's --task-epilog option or
4055                              SLURM_TASK_EPILOG environment variable
4056
4057              7. TaskEpilog   System-wide per task program defined in
4058                              slurm.conf
4059
4060              8. post_term()  Function in TaskPlugin
4061
4062 TCPTimeout
4063 Time permitted for TCP connection to be established. Default
4064 value is 2 seconds.
4065
4066 TmpFS Fully qualified pathname of the file system available to user
4067 jobs for temporary storage. This parameter is used in establish‐
4068 ing a node's TmpDisk space. The default value is "/tmp".
4069
4070 TopologyParam
4071 Comma-separated options identifying network topology options.
4072
4073 Dragonfly Optimize allocation for Dragonfly network. Valid
4074 when TopologyPlugin=topology/tree.
4075
4076 TopoOptional Only optimize allocation for network topology if
4077 the job includes a switch option. Since optimiz‐
4078 ing resource allocation for topology involves
4079 much higher system overhead, this option can be
4080 used to impose the extra overhead only on jobs
4081 which can take advantage of it. If most job allo‐
4082 cations are not optimized for network topology,
4083 they may fragment resources to the point that
4084 topology optimization for other jobs will be dif‐
4085 ficult to achieve. NOTE: Jobs may span across
4086 nodes without common parent switches with this
4087 enabled.
4088
4089 TopologyPlugin
4090 Identifies the plugin to be used for determining the network
4091 topology and optimizing job allocations to minimize network con‐
4092 tention. See NETWORK TOPOLOGY below for details. Additional
4093 plugins may be provided in the future which gather topology in‐
4094 formation directly from the network. Acceptable values include:
4095
4096 topology/3d_torus best-fit logic over three-dimensional
4097 topology
4098
4099 topology/none default for other systems, best-fit logic
4100 over one-dimensional topology
4101
4102 topology/tree used for a hierarchical network as de‐
4103 scribed in a topology.conf file
4104
4105 TrackWCKey
4106              Boolean yes or no.  Used to enable display and tracking of the
4107              Workload Characterization Key.  Must be set to track correct wckey
4108 usage. NOTE: You must also set TrackWCKey in your slurmdbd.conf
4109 file to create historical usage reports.
4110
4111 TreeWidth
4112 Slurmd daemons use a virtual tree network for communications.
4113 TreeWidth specifies the width of the tree (i.e. the fanout). On
4114 architectures with a front end node running the slurmd daemon,
4115 the value must always be equal to or greater than the number of
4116              front end nodes, which eliminates the need for message forwarding
4117 between the slurmd daemons. On other architectures the default
4118 value is 50, meaning each slurmd daemon can communicate with up
4119 to 50 other slurmd daemons and over 2500 nodes can be contacted
4120 with two message hops. The default value will work well for
4121 most clusters. Optimal system performance can typically be
4122 achieved if TreeWidth is set to the square root of the number of
4123 nodes in the cluster for systems having no more than 2500 nodes
4124 or the cube root for larger systems. The value may not exceed
4125 65533.
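              For example, following the square-root guideline, a 900-node
              cluster might set:

              ```
              TreeWidth=30
              ```

              since 30 is the square root of 900, every slurmd daemon can then
              be reached within two message hops.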
4126
4127 UnkillableStepProgram
4128 If the processes in a job step are determined to be unkillable
4129 for a period of time specified by the UnkillableStepTimeout
4130 variable, the program specified by UnkillableStepProgram will be
4131 executed. By default no program is run.
4132
4133 See section UNKILLABLE STEP PROGRAM SCRIPT for more information.
4134
4135 UnkillableStepTimeout
4136 The length of time, in seconds, that Slurm will wait before de‐
4137 ciding that processes in a job step are unkillable (after they
4138 have been signaled with SIGKILL) and execute UnkillableStepPro‐
4139 gram. The default timeout value is 60 seconds. If exceeded,
4140 the compute node will be drained to prevent future jobs from be‐
4141 ing scheduled on the node.
4142
4143 UsePAM If set to 1, PAM (Pluggable Authentication Modules for Linux)
4144 will be enabled. PAM is used to establish the upper bounds for
4145 resource limits. With PAM support enabled, local system adminis‐
4146 trators can dynamically configure system resource limits. Chang‐
4147 ing the upper bound of a resource limit will not alter the lim‐
4148 its of running jobs, only jobs started after a change has been
4149 made will pick up the new limits. The default value is 0 (not
4150 to enable PAM support). Remember that PAM also needs to be con‐
4151 figured to support Slurm as a service. For sites using PAM's
4152 directory based configuration option, a configuration file named
4153 slurm should be created. The module-type, control-flags, and
4154 module-path names that should be included in the file are:
4155 auth required pam_localuser.so
4156 auth required pam_shells.so
4157 account required pam_unix.so
4158 account required pam_access.so
4159 session required pam_unix.so
4160 For sites configuring PAM with a general configuration file, the
4161 appropriate lines (see above), where slurm is the service-name,
4162 should be added.
4163
4164              NOTE: The UsePAM option has nothing to do with the con‐
4165 tribs/pam/pam_slurm and/or contribs/pam_slurm_adopt modules. So
4166 these two modules can work independently of the value set for
4167 UsePAM.
4168
4169 VSizeFactor
4170 Memory specifications in job requests apply to real memory size
4171 (also known as resident set size). It is possible to enforce
4172 virtual memory limits for both jobs and job steps by limiting
4173 their virtual memory to some percentage of their real memory al‐
4174 location. The VSizeFactor parameter specifies the job's or job
4175 step's virtual memory limit as a percentage of its real memory
4176 limit. For example, if a job's real memory limit is 500MB and
4177 VSizeFactor is set to 101 then the job will be killed if its
4178 real memory exceeds 500MB or its virtual memory exceeds 505MB
4179 (101 percent of the real memory limit). The default value is 0,
4180 which disables enforcement of virtual memory limits. The value
4181 may not exceed 65533 percent.
4182
4183 NOTE: This parameter is dependent on OverMemoryKill being con‐
4184 figured in JobAcctGatherParams. It is also possible to configure
4185 the TaskPlugin to use task/cgroup for memory enforcement. VSize‐
4186 Factor will not have an effect on memory enforcement done
4187 through cgroups.
4188
4189 WaitTime
4190 Specifies how many seconds the srun command should by default
4191 wait after the first task terminates before terminating all re‐
4192 maining tasks. The "--wait" option on the srun command line
4193 overrides this value. The default value is 0, which disables
4194 this feature. May not exceed 65533 seconds.
4195
4196 X11Parameters
4197 For use with Slurm's built-in X11 forwarding implementation.
4198
4199 home_xauthority
4200 If set, xauth data on the compute node will be placed in
4201 ~/.Xauthority rather than in a temporary file under
4202 TmpFS.
4203
4205 The configuration of nodes (or machines) to be managed by Slurm is also
4206 specified in /etc/slurm.conf. Changes in node configuration (e.g.
4207 adding nodes, changing their processor count, etc.) require restarting
4208 both the slurmctld daemon and the slurmd daemons. All slurmd daemons
4209 must know each node in the system to forward messages in support of hi‐
4210 erarchical communications. Only the NodeName must be supplied in the
4211 configuration file. All other node configuration information is op‐
4212 tional. It is advisable to establish baseline node configurations, es‐
4213 pecially if the cluster is heterogeneous. Nodes which register to the
4214 system with less than the configured resources (e.g. too little mem‐
4215 ory), will be placed in the "DOWN" state to avoid scheduling jobs on
4216 them. Establishing baseline configurations will also speed Slurm's
4217 scheduling process by permitting it to compare job requirements against
4218 these (relatively few) configuration parameters and possibly avoid hav‐
4219 ing to check job requirements against every individual node's configu‐
4220 ration. The resources checked at node registration time are: CPUs,
4221 RealMemory and TmpDisk.
4222
4223 Default values can be specified with a record in which NodeName is "DE‐
4224 FAULT". The default entry values will apply only to lines following it
4225 in the configuration file and the default values can be reset multiple
4226 times in the configuration file with multiple entries where "Node‐
4227 Name=DEFAULT". Each line where NodeName is "DEFAULT" will replace or
4228 add to previous default values and will not reinitialize the default
4229 values. The "NodeName=" specification must be placed on every line de‐
4230       scribing the configuration of nodes.  A single node name cannot appear
4231 as a NodeName value in more than one line (duplicate node name records
4232 will be ignored). In fact, it is generally possible and desirable to
4233 define the configurations of all nodes in only a few lines. This con‐
4234 vention permits significant optimization in the scheduling of larger
4235 clusters. In order to support the concept of jobs requiring consecu‐
4236       tive nodes on some architectures, node specifications should be placed
4237 in this file in consecutive order. No single node name may be listed
4238 more than once in the configuration file. Use "DownNodes=" to record
4239 the state of nodes which are temporarily in a DOWN, DRAIN or FAILING
4240 state without altering permanent configuration information. A job
4241       step's tasks are allocated to nodes in the order the nodes appear in the
4242 configuration file. There is presently no capability within Slurm to
4243 arbitrarily order a job step's tasks.
4244
4245 Multiple node names may be comma separated (e.g. "alpha,beta,gamma")
4246 and/or a simple node range expression may optionally be used to specify
4247 numeric ranges of nodes to avoid building a configuration file with
4248 large numbers of entries. The node range expression can contain one
4249 pair of square brackets with a sequence of comma-separated numbers
4250 and/or ranges of numbers separated by a "-" (e.g. "linux[0-64,128]", or
4251 "lx[15,18,32-33]"). Note that the numeric ranges can include one or
4252 more leading zeros to indicate the numeric portion has a fixed number
4253 of digits (e.g. "linux[0000-1023]"). Multiple numeric ranges can be
4254 included in the expression (e.g. "rack[0-63]_blade[0-41]"). If one or
4255 more numeric expressions are included, one of them must be at the end
4256 of the name (e.g. "unit[0-31]rack" is invalid), but arbitrary names can
4257 always be used in a comma-separated list.
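
       For example, the following hypothetical line uses one range
       expression to define 66 nodes (linux0 through linux64 plus
       linux128):

            NodeName=linux[0-64,128] CPUs=16 State=UNKNOWN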
4258
4259 The node configuration specifies the following information:
4260
4261
4262 NodeName
4263 Name that Slurm uses to refer to a node. Typically this would
4264 be the string that "/bin/hostname -s" returns. It may also be
4265 the fully qualified domain name as returned by "/bin/hostname
4266 -f" (e.g. "foo1.bar.com"), or any valid domain name associated
4267 with the host through the host database (/etc/hosts) or DNS, de‐
4268 pending on the resolver settings. Note that if the short form
4269 of the hostname is not used, it may prevent use of hostlist ex‐
4270 pressions (the numeric portion in brackets must be at the end of
4271 the string). It may also be an arbitrary string if NodeHostname
4272 is specified. If the NodeName is "DEFAULT", the values speci‐
4273 fied with that record will apply to subsequent node specifica‐
4274 tions unless explicitly set to other values in that node record
4275 or replaced with a different set of default values. Each line
4276 where NodeName is "DEFAULT" will replace or add to previous de‐
4277 fault values and will not reinitialize the default values. For ar‐
4278 chitectures in which the node order is significant, nodes will
4279 be considered consecutive in the order defined. For example, if
4280 the configuration for "NodeName=charlie" immediately follows the
4281 configuration for "NodeName=baker" they will be considered adja‐
4282 cent in the computer.
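
       As an illustration, a hypothetical configuration using a
       "DEFAULT" record so the common topology need not be repeated:

            NodeName=DEFAULT Sockets=2 CoresPerSocket=8 ThreadsPerCore=2
            NodeName=baker RealMemory=64000
            NodeName=charlie RealMemory=128000

       Both baker and charlie inherit the socket, core and thread
       values from the preceding DEFAULT record.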
4283
4284 NodeHostname
4285 Typically this would be the string that "/bin/hostname -s" re‐
4286 turns. It may also be the fully qualified domain name as re‐
4287 turned by "/bin/hostname -f" (e.g. "foo1.bar.com"), or any valid
4288 domain name associated with the host through the host database
4289 (/etc/hosts) or DNS, depending on the resolver settings. Note
4290 that if the short form of the hostname is not used, it may pre‐
4291 vent use of hostlist expressions (the numeric portion in brack‐
4292 ets must be at the end of the string). A node range expression
4293 can be used to specify a set of nodes. If an expression is
4294 used, the number of nodes identified by NodeHostname on a line
4295 in the configuration file must be identical to the number of
4296 nodes identified by NodeName. By default, the NodeHostname will
4297 be identical in value to NodeName.
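
       For example, assuming hypothetical host names, a range
       expression can map each NodeName to a matching NodeHostname:

            NodeName=node[0-3] NodeHostname=host[0-3]

       Both expressions identify four nodes, so the counts match as
       required.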
4298
4299 NodeAddr
4300 Name by which a node should be referred to in establishing a
4301 communications path. This name will be used as an argument to the
4302 getaddrinfo() function for identification. If a node range ex‐
4303 pression is used to designate multiple nodes, they must exactly
4304 match the entries in the NodeName (e.g. "NodeName=lx[0-7]
4305 NodeAddr=elx[0-7]"). NodeAddr may also contain IP addresses.
4306 By default, the NodeAddr will be identical in value to NodeHost‐
4307 name.
4308
4309 BcastAddr
4310 Alternate network path to be used for sbcast network traffic to
4311 a given node. This name will be used as an argument to the
4312 getaddrinfo() function. If a node range expression is used to
4313 designate multiple nodes, they must exactly match the entries in
4314 the NodeName (e.g. "NodeName=lx[0-7] BcastAddr=elx[0-7]").
4315 BcastAddr may also contain IP addresses. By default, the Bcas‐
4316 tAddr is unset, and sbcast traffic will be routed to the
4317 NodeAddr for a given node. Note: cannot be used with Communica‐
4318 tionParameters=NoInAddrAny.
4319
4320 Boards Number of Baseboards in nodes with a baseboard controller. Note
4321 that when Boards is specified, SocketsPerBoard, CoresPerSocket,
4322 and ThreadsPerCore should be specified. The default value is 1.
4323
4324 CoreSpecCount
4325 Number of cores reserved for system use. These cores will not
4326 be available for allocation to user jobs. Depending upon the
4327 TaskPluginParam option of SlurmdOffSpec, Slurm daemons (i.e.
4328 slurmd and slurmstepd) may either be confined to these resources
4329 (the default) or prevented from using these resources. Isola‐
4330 tion of the Slurm daemons from user jobs may improve application
4331 performance. If this option and CpuSpecList are both designated
4332 for a node, an error is generated. For information on the algo‐
4333 rithm used by Slurm to select the cores refer to the core spe‐
4334 cialization documentation
4335 (https://slurm.schedmd.com/core_spec.html).
4336
4337 CoresPerSocket
4338 Number of cores in a single physical processor socket (e.g.
4339 "2"). The CoresPerSocket value describes physical cores, not
4340 the logical number of processors per socket. NOTE: If you have
4341 multi-core processors, you will likely need to specify this pa‐
4342 rameter in order to optimize scheduling. The default value is
4343 1.
4344
4345 CpuBind
4346 If a job step request does not specify an option to control how
4347 tasks are bound to allocated CPUs (--cpu-bind) and all nodes al‐
4348 located to the job have the same CpuBind option the node CpuBind
4349 option will control how tasks are bound to allocated resources.
4350 Supported values for CpuBind are "none", "socket", "ldom"
4351 (NUMA), "core" and "thread".
4352
4353 CPUs Number of logical processors on the node (e.g. "2"). It can be
4354 set to the total number of sockets (supported only by select/lin‐
4355 ear), cores or threads. This can be useful when you want to
4356 schedule only the cores on a hyper-threaded node. If CPUs is
4357 omitted, its default will be set equal to the product of Boards,
4358 Sockets, CoresPerSocket, and ThreadsPerCore.
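
       A sketch of the default computation, using hypothetical values:

            NodeName=node1 Boards=1 SocketsPerBoard=2 CoresPerSocket=8 ThreadsPerCore=2
            # CPUs is not set, so it defaults to 1 * 2 * 8 * 2 = 32

       Setting CPUs=16 instead (the physical core count) would
       schedule only the cores on this hyper-threaded node.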
4359
4360 CpuSpecList
4361 A comma-delimited list of Slurm abstract CPU IDs reserved for
4362 system use. The list will be expanded to include all other
4363 CPUs, if any, on the same cores. These cores will not be avail‐
4364 able for allocation to user jobs. Depending upon the TaskPlug‐
4365 inParam option of SlurmdOffSpec, Slurm daemons (i.e. slurmd and
4366 slurmstepd) may either be confined to these resources (the de‐
4367 fault) or prevented from using these resources. Isolation of
4368 the Slurm daemons from user jobs may improve application perfor‐
4369 mance. If this option and CoreSpecCount are both designated for
4370 a node, an error is generated. This option has no effect unless
4371 cgroup job confinement is also configured (i.e. the task/cgroup
4372 TaskPlugin is enabled and ConstrainCores=yes is set in
4373 cgroup.conf).
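
       For example, a hypothetical node reserving two abstract CPU IDs
       for system use (this assumes the task/cgroup TaskPlugin with
       ConstrainCores=yes in cgroup.conf, as noted above):

            NodeName=node1 CPUs=32 CpuSpecList=0,1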
4374
4375 Features
4376 A comma-delimited list of arbitrary strings indicative of some
4377 characteristic associated with the node. There is no value or
4378 count associated with a feature at this time; a node either has
4379 a feature or it does not. A desired feature may contain a nu‐
4380 meric component indicating, for example, processor speed, but
4381 this numeric component will be considered to be part of the fea‐
4382 ture string. Features are intended to be used to filter nodes
4383 eligible to run jobs via the --constraint argument. By default
4384 a node has no features. Also see Gres for being able to have
4385 more control such as types and count. Using features is faster
4386 than scheduling against GRES but is limited to Boolean opera‐
4387 tions.
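
       A hypothetical example pairing node features with a job
       constraint:

            NodeName=node[0-15] Features=intel,gpu

       Jobs can then be limited to such nodes with, for example,
       "sbatch --constraint=gpu ...".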
4388
4389 Gres A comma-delimited list of generic resources specifications for a
4390 node. The format is: "<name>[:<type>][:no_consume]:<num‐
4391 ber>[K|M|G]". The first field is the resource name, which
4392 matches the GresType configuration parameter name. The optional
4393 type field might be used to identify a model of that generic re‐
4394 source. It is forbidden to specify both an untyped GRES and a
4395 typed GRES with the same <name>. The optional no_consume field
4396 allows you to specify that a generic resource does not have a
4397 finite number of that resource that gets consumed as it is re‐
4398 quested. The no_consume field is a GRES specific setting and ap‐
4399 plies to the GRES, regardless of the type specified. The final
4400 field must specify a generic resources count. A suffix of "K",
4401 "M", "G", "T" or "P" may be used to multiply the number by 1024,
4402 1048576, 1073741824, etc. respectively.
4403 (e.g."Gres=gpu:tesla:1,gpu:kepler:1,bandwidth:lustre:no_con‐
4404 sume:4G"). By default a node has no generic resources and its
4405 maximum count is that of an unsigned 64-bit integer. Also see
4406 Features for Boolean flags to filter nodes using job con‐
4407 straints.
4408
4409 MemSpecLimit
4410 Amount of memory, in megabytes, reserved for system use and not
4411 available for user allocations. If the task/cgroup plugin is
4412 configured and that plugin constrains memory allocations (i.e.
4413 the task/cgroup TaskPlugin is enabled and ConstrainRAMSpace=yes
4414 is set in cgroup.conf), then Slurm compute node daemons (slurmd
4415 plus slurmstepd) will be allocated the specified memory limit.
4416 Note that for this option to work, memory must be configured as
4417 a consumable resource via one of the SelectTypeParameters op‐
4418 tions. The daemons will not be killed if they exhaust the mem‐
4419 ory allocation (i.e. the Out-Of-Memory Killer is
4420 disabled for the daemon's memory cgroup). If the task/cgroup
4421 plugin is not configured, the specified memory will only be un‐
4422 available for user allocations.
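
       As a sketch with hypothetical values, reserving 2048 MB of a
       node's memory for the Slurm daemons:

            NodeName=node1 RealMemory=128000 MemSpecLimit=2048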
4423
4424 Port The port number that the Slurm compute node daemon, slurmd, lis‐
4425 tens to for work on this particular node. By default there is a
4426 single port number for all slurmd daemons on all compute nodes
4427 as defined by the SlurmdPort configuration parameter. Use of
4428 this option is not generally recommended except for development
4429 or testing purposes. If multiple slurmd daemons execute on a
4430 node this can specify a range of ports.
4431
4432 Note: On Cray systems, Realm-Specific IP Addressing (RSIP) will
4433 automatically try to interact with anything opened on ports
4434 8192-60000. Configure Port to use a port outside of the config‐
4435 ured SrunPortRange and RSIP's port range.
4436
4437 Procs See CPUs.
4438
4439 RealMemory
4440 Size of real memory on the node in megabytes (e.g. "2048"). The
4441 default value is 1. Lowering RealMemory to set aside memory for
4442 the OS, making it unavailable for job allocations, will not work
4443 as intended unless Memory is configured as a consumable resource
4444 in SelectTypeParameters; one of the *_Memory options must be en‐
4445 abled for that goal to be accomplished.
4446 Also see MemSpecLimit.
4447
4448 Reason Identifies the reason for a node being in state "DOWN",
4449 "DRAINED", "DRAINING", "FAIL" or "FAILING". Use quotes to en‐
4450 close a reason having more than one word.
4451
4452 Sockets
4453 Number of physical processor sockets/chips on the node (e.g.
4454 "2"). If Sockets is omitted, it will be inferred from CPUs,
4455 CoresPerSocket, and ThreadsPerCore. NOTE: If you have
4456 multi-core processors, you will likely need to specify these pa‐
4457 rameters. Sockets and SocketsPerBoard are mutually exclusive.
4458 If Sockets is specified when Boards is also used, Sockets is in‐
4459 terpreted as SocketsPerBoard rather than total sockets. The de‐
4460 fault value is 1.
4461
4462 SocketsPerBoard
4463 Number of physical processor sockets/chips on a baseboard.
4464 Sockets and SocketsPerBoard are mutually exclusive. The default
4465 value is 1.
4466
4467 State State of the node with respect to the initiation of user jobs.
4468 Acceptable values are CLOUD, DOWN, DRAIN, FAIL, FAILING, FUTURE
4469 and UNKNOWN. Node states of BUSY and IDLE should not be speci‐
4470 fied in the node configuration, but set the node state to UN‐
4471 KNOWN instead. Setting the node state to UNKNOWN will result in
4472 the node state being set to BUSY, IDLE or other appropriate
4473 state based upon recovered system state information. The de‐
4474 fault value is UNKNOWN. Also see the DownNodes parameter below.
4475
4476 CLOUD Indicates the node exists in the cloud. Its initial
4477 state will be treated as powered down. The node will
4478 be available for use after its state is recovered from
4479 Slurm's state save file or the slurmd daemon starts on
4480 the compute node.
4481
4482 DOWN Indicates the node failed and is unavailable to be al‐
4483 located work.
4484
4485 DRAIN Indicates the node is unavailable to be allocated
4486 work.
4487
4488 FAIL Indicates the node is expected to fail soon, has no
4489 jobs allocated to it, and will not be allocated to any
4490 new jobs.
4491
4492 FAILING Indicates the node is expected to fail soon, has one
4493 or more jobs allocated to it, but will not be allo‐
4494 cated to any new jobs.
4495
4496 FUTURE Indicates the node is defined for future use and need
4497 not exist when the Slurm daemons are started. These
4498 nodes can be made available for use simply by updating
4499 the node state using the scontrol command rather than
4500 restarting the slurmctld daemon. After these nodes are
4501 made available, change their State in the slurm.conf
4502 file. Until these nodes are made available, they will
4503 not be seen using any Slurm commands, nor will any
4504 attempt be made to contact them.
4505
4506 Dynamic Future Nodes
4507 A slurmd started with -F[<feature>] will be as‐
4508 sociated with a FUTURE node that matches the
4509 same configuration (sockets, cores, threads) as
4510 reported by slurmd -C. The node's NodeAddr and
4511 NodeHostname will automatically be retrieved
4512 from the slurmd and will be cleared when set
4513 back to the FUTURE state. Dynamic FUTURE nodes
4514 retain non-FUTURE state on restart. Use scon‐
4515 trol to put the node back into the FUTURE state.
4516
4517 If the mapping of the NodeName to the slurmd
4518 HostName is not updated in DNS, Dynamic Future
4519 nodes won't know how to communicate with each
4520 other -- because NodeAddr and NodeHostName are
4521 not defined in the slurm.conf -- and the fanout
4522 communications need to be disabled by setting
4523 TreeWidth to a high number (e.g. 65533). If the
4524 DNS mapping is made, then the cloud_dns Slurm‐
4525 ctldParameter can be used.
4526
4527 UNKNOWN Indicates the node's state is undefined but will be
4528 established (set to BUSY or IDLE) when the slurmd dae‐
4529 mon on that node registers. UNKNOWN is the default
4530 state.
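
       For instance, nodes planned for a future expansion might be
       predefined with hypothetical names:

            NodeName=expansion[0-49] CPUs=32 State=FUTURE

       They can later be brought into service by updating the node
       state with the scontrol command, without restarting slurmctld.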
4531
4532 ThreadsPerCore
4533 Number of logical threads in a single physical core (e.g. "2").
4534 Note that Slurm can allocate resources to jobs down to the
4535 resolution of a core. If your system is configured with more
4536 than one thread per core, execution of a different job on each
4537 thread is not supported unless you configure SelectTypeParame‐
4538 ters=CR_CPU plus CPUs; do not configure Sockets, CoresPerSocket
4539 or ThreadsPerCore. A job can execute one task per thread from
4540 within one job step or execute a distinct job step on each of
4541 the threads. Note also if you are running with more than 1
4542 thread per core and running the select/cons_res or se‐
4543 lect/cons_tres plugin then you will want to set the SelectType‐
4544 Parameters variable to something other than CR_CPU to avoid un‐
4545 expected results. The default value is 1.
4546
4547 TmpDisk
4548 Total size of temporary disk storage in TmpFS in megabytes (e.g.
4549 "16384"). TmpFS (for "Temporary File System") identifies the lo‐
4550 cation which jobs should use for temporary storage. Note this
4551 does not indicate the amount of free space available to the user
4552 on the node, only the total file system size. The system admin‐
4553 istrator should ensure this file system is purged as needed so
4554 that user jobs have access to most of this space. The Prolog
4555 and/or Epilog programs (specified in the configuration file)
4556 might be used to ensure the file system is kept clean. The de‐
4557 fault value is 0.
4558
4559 TRESWeights
4560 TRESWeights are used to calculate a value that represents how
4561 busy a node is. Currently only used in federation configura‐
4562 tions. TRESWeights are different from TRESBillingWeights --
4563 which is used for fairshare calculations.
4564
4565 TRES weights are specified as a comma-separated list of <TRES
4566 Type>=<TRES Weight> pairs.
4567
4568 e.g.
4569 NodeName=node1 ... TRESWeights="CPU=1.0,Mem=0.25G,GRES/gpu=2.0"
4570
4571 By default the weighted TRES value is calculated as the sum of
4572 all node TRES types multiplied by their corresponding TRES
4573 weight.
4574
4575 If PriorityFlags=MAX_TRES is configured, the weighted TRES value
4576 is calculated as the MAX of individual node TRES' (e.g. cpus,
4577 mem, gres).
4578
4579 Weight The priority of the node for scheduling purposes. All things
4580 being equal, jobs will be allocated the nodes with the lowest
4581 weight which satisfies their requirements. For example, a het‐
4582 erogeneous collection of nodes might be placed into a single
4583 partition for greater system utilization, responsiveness and ca‐
4584 pability. It would be preferable to allocate smaller memory
4585 nodes rather than larger memory nodes if either will satisfy a
4586 job's requirements. The units of weight are arbitrary, but
4587 larger weights should be assigned to nodes with more processors,
4588 memory, disk space, higher processor speed, etc. Note that if a
4589 job allocation request can not be satisfied using the nodes with
4590 the lowest weight, the set of nodes with the next lowest weight
4591 is added to the set of nodes under consideration for use (repeat
4592 as needed for higher weight values). If you absolutely want to
4593 minimize the number of higher weight nodes allocated to a job
4594 (at a cost of higher scheduling overhead), give each node a dis‐
4595 tinct Weight value and they will be added to the pool of nodes
4596 being considered for scheduling individually.
4597
4598 The default value is 1.
4599
4600 NOTE: Node weights are first considered among currently avail‐
4601 able nodes. For example, a POWERED_DOWN node with a lower weight
4602 will not be evaluated before an IDLE node.
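
       For example, with hypothetical node definitions, small-memory
       nodes can be preferred by assigning them a lower weight:

            NodeName=small[0-15] RealMemory=32000 Weight=10
            NodeName=big[0-3] RealMemory=256000 Weight=100

       Jobs that fit on the small nodes are allocated there first,
       keeping the large-memory nodes free for jobs that require them.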
4603
4605 The DownNodes= parameter permits you to mark certain nodes as in a
4606 DOWN, DRAIN, FAIL, FAILING or FUTURE state without altering the perma‐
4607 nent configuration information listed under a NodeName= specification.
4608
4609
4610 DownNodes
4611 Any node name, or list of node names, from the NodeName= speci‐
4612 fications.
4613
4614 Reason Identifies the reason for a node being in state DOWN, DRAIN,
4615 FAIL, FAILING or FUTURE. Use quotes to enclose a reason having
4616 more than one word.
4617
4618 State State of the node with respect to the initiation of user jobs.
4619 Acceptable values are DOWN, DRAIN, FAIL, FAILING and FUTURE.
4620 For more information about these states see the descriptions un‐
4621 der State in the NodeName= section above. The default value is
4622 DOWN.
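
       A hypothetical DownNodes entry recording a temporary outage
       without altering the NodeName lines:

            DownNodes=node[12-14] State=DOWN Reason="power supply failure"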
4623
4625 On computers where frontend nodes are used to execute batch scripts
4626 rather than compute nodes, one may configure one or more frontend nodes
4627 using the configuration parameters defined below. These options are
4628 very similar to those used in configuring compute nodes. These options
4629 may only be used on systems configured and built with the appropriate
4630 parameters (--have-front-end). The front end configuration specifies
4631 the following information:
4632
4633
4634 AllowGroups
4635 Comma-separated list of group names which may execute jobs on
4636 this front end node. By default, all groups may use this front
4637 end node. A user will be permitted to use this front end node
4638 if AllowGroups has at least one group associated with the user.
4639 May not be used with the DenyGroups option.
4640
4641 AllowUsers
4642 Comma-separated list of user names which may execute jobs on
4643 this front end node. By default, all users may use this front
4644 end node. May not be used with the DenyUsers option.
4645
4646 DenyGroups
4647 Comma-separated list of group names which are prevented from ex‐
4648 ecuting jobs on this front end node. May not be used with the
4649 AllowGroups option.
4650
4651 DenyUsers
4652 Comma-separated list of user names which are prevented from exe‐
4653 cuting jobs on this front end node. May not be used with the
4654 AllowUsers option.
4655
4656 FrontendName
4657 Name that Slurm uses to refer to a frontend node. Typically
4658 this would be the string that "/bin/hostname -s" returns. It
4659 may also be the fully qualified domain name as returned by
4660 "/bin/hostname -f" (e.g. "foo1.bar.com"), or any valid domain
4661 name associated with the host through the host database
4662 (/etc/hosts) or DNS, depending on the resolver settings. Note
4663 that if the short form of the hostname is not used, it may pre‐
4664 vent use of hostlist expressions (the numeric portion in brack‐
4665 ets must be at the end of the string). If the FrontendName is
4666 "DEFAULT", the values specified with that record will apply to
4667 subsequent node specifications unless explicitly set to other
4668 values in that frontend node record or replaced with a different
4669 set of default values. Each line where FrontendName is "DE‐
4670 FAULT" will replace or add to previous default values and will
4671 not reinitialize the default values.
4672
4673 FrontendAddr
4674 Name by which a frontend node should be referred to in establishing
4675 a communications path. This name will be used as an argument to
4676 the getaddrinfo() function for identification. As with Fron‐
4677 tendName, list the individual node addresses rather than using a
4678 hostlist expression. The number of FrontendAddr records per
4679 line must equal the number of FrontendName records per line
4680 (i.e. you can't map two node names to one address). FrontendAddr
4681 may also contain IP addresses. By default, the FrontendAddr
4682 will be identical in value to FrontendName.
4683
4684 Port The port number that the Slurm compute node daemon, slurmd, lis‐
4685 tens to for work on this particular frontend node. By default
4686 there is a single port number for all slurmd daemons on all
4687 frontend nodes as defined by the SlurmdPort configuration param‐
4688 eter. Use of this option is not generally recommended except for
4689 development or testing purposes.
4690
4691 Note: On Cray systems, Realm-Specific IP Addressing (RSIP) will
4692 automatically try to interact with anything opened on ports
4693 8192-60000. Configure Port to use a port outside of the config‐
4694 ured SrunPortRange and RSIP's port range.
4695
4696 Reason Identifies the reason for a frontend node being in state DOWN,
4697 DRAINED, DRAINING, FAIL or FAILING. Use quotes to enclose a
4698 reason having more than one word.
4699
4700 State State of the frontend node with respect to the initiation of
4701 user jobs. Acceptable values are DOWN, DRAIN, FAIL, FAILING and
4702 UNKNOWN. Node states of BUSY and IDLE should not be specified
4703 in the node configuration, but set the node state to UNKNOWN in‐
4704 stead. Setting the node state to UNKNOWN will result in the
4705 node state being set to BUSY, IDLE or other appropriate state
4706 based upon recovered system state information. For more infor‐
4707 mation about these states see the descriptions under State in
4708 the NodeName= section above. The default value is UNKNOWN.
4709
4710 As an example, you can do something similar to the following to define
4711 four front end nodes for running slurmd daemons.
4712 FrontendName=frontend[00-03] FrontendAddr=efrontend[00-03] State=UNKNOWN
4713
4714
4716 The nodeset configuration allows you to define a name for a specific
4717 set of nodes which can be used to simplify the partition configuration
4718 section, especially for heterogeneous or condo-style systems. Each node‐
4719 set may be defined by an explicit list of nodes, and/or by filtering
4720 the nodes by a particular configured feature. If both Feature= and
4721 Nodes= are used the nodeset shall be the union of the two subsets.
4722 Note that the nodesets are only used to simplify the partition defini‐
4723 tions at present, and are not usable outside of the partition configu‐
4724 ration.
4725
4726
4727 Feature
4728 All nodes with this single feature will be included as part of
4729 this nodeset.
4730
4731 Nodes List of nodes in this set.
4732
4733 NodeSet
4734 Unique name for a set of nodes. Must not overlap with any Node‐
4735 Name definitions.
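
       A sketch of a nodeset built from a feature and then referenced
       in a partition definition (all names are hypothetical):

            NodeSet=gpunodes Feature=gpu
            PartitionName=gpu Nodes=gpunodes MaxTime=24:00:00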
4736
4738 The partition configuration permits you to establish different job lim‐
4739 its or access controls for various groups (or partitions) of nodes.
4740 Nodes may be in more than one partition, making partitions serve as
4741 general purpose queues. For example one may put the same set of nodes
4742 into two different partitions, each with different constraints (time
4743 limit, job sizes, groups allowed to use the partition, etc.). Jobs are
4744 allocated resources within a single partition. Default values can be
4745 specified with a record in which PartitionName is "DEFAULT". The de‐
4746 fault entry values will apply only to lines following it in the config‐
4747 uration file and the default values can be reset multiple times in the
4748 configuration file with multiple entries where "PartitionName=DEFAULT".
4749 The "PartitionName=" specification must be placed on every line de‐
4750 scribing the configuration of partitions. Each line where Partition‐
4751 Name is "DEFAULT" will replace or add to previous default values and
4752 will not reinitialize the default values. A single partition name can not
4753 appear as a PartitionName value in more than one line (duplicate parti‐
4754 tion name records will be ignored). If a partition that is in use is
4755 deleted from the configuration and Slurm is restarted or reconfigured
4756 (scontrol reconfigure), jobs using the partition are canceled. NOTE:
4757 Put all parameters for each partition on a single line. Each line of
4758 partition configuration information should represent a different parti‐
4759 tion. The partition configuration file contains the following informa‐
4760 tion:
4761
4762
4763 AllocNodes
4764 Comma-separated list of nodes from which users can submit jobs
4765 in the partition. Node names may be specified using the node
4766 range expression syntax described above. The default value is
4767 "ALL".
4768
4769 AllowAccounts
4770 Comma-separated list of accounts which may execute jobs in the
4771 partition. The default value is "ALL". NOTE: If AllowAccounts
4772 is used then DenyAccounts will not be enforced. Also refer to
4773 DenyAccounts.
4774
4775 AllowGroups
4776 Comma-separated list of group names which may execute jobs in
4777 this partition. A user will be permitted to submit a job to
4778 this partition if AllowGroups has at least one group associated
4779 with the user. Jobs executed as user root or as user SlurmUser
4780 will be allowed to use any partition, regardless of the value of
4781 AllowGroups. In addition, a Slurm Admin or Operator will be able
4782 to view any partition, regardless of the value of AllowGroups.
4783 If user root attempts to execute a job as another user (e.g. us‐
4784 ing srun's --uid option), then the job will be subject to Allow‐
4785 Groups as if it were submitted by that user. By default, Allow‐
4786 Groups is unset, meaning all groups are allowed to use this par‐
4787 tition. The special value 'ALL' is equivalent to this. Users
4788 who are not members of the specified group will not see informa‐
4789 tion about this partition by default. However, this should not
4790 be treated as a security mechanism, since job information will
4791 be returned if a user requests details about the partition or a
4792 specific job. See the PrivateData parameter to restrict access
4793 to job information. NOTE: For performance reasons, Slurm main‐
4794 tains a list of user IDs allowed to use each partition and this
4795 is checked at job submission time. This list of user IDs is up‐
4796 dated when the slurmctld daemon is restarted, reconfigured (e.g.
4797 "scontrol reconfig") or the partition's AllowGroups value is re‐
4798 set, even if its value is unchanged (e.g. "scontrol update Par‐
4799 titionName=name AllowGroups=group"). For a user's access to a
4800 partition to change, both the user's group membership must
4801 change and Slurm's internal user ID list must be updated using
4802 one of the methods described above.
4803
4804 AllowQos
4805 Comma-separated list of Qos which may execute jobs in the parti‐
4806 tion. Jobs executed as user root can use any partition without
4807 regard to the value of AllowQos. The default value is "ALL".
4808 NOTE: If AllowQos is used then DenyQos will not be enforced.
4809 Also refer to DenyQos.
4810
4811 Alternate
4812 Partition name of alternate partition to be used if the state of
4813 this partition is "DRAIN" or "INACTIVE".
4814
4815 CpuBind
4816 If a job step request does not specify an option to control how
4817 tasks are bound to allocated CPUs (--cpu-bind) and the nodes
4818 allocated to the job do not all have the same node CpuBind op‐
4819 tion, then the partition's CpuBind option will control how tasks
4820 are bound to allocated resources. Supported values for CpuBind
4821 are "none", "socket", "ldom" (NUMA), "core" and "thread".
4822
4823 Default
4824 If this keyword is set, jobs submitted without a partition spec‐
4825 ification will utilize this partition. Possible values are
4826 "YES" and "NO". The default value is "NO".
4827
4828 DefaultTime
4829 Run time limit used for jobs that don't specify a value. If not
4830 set then MaxTime will be used. Format is the same as for Max‐
4831 Time.
4832
4833 DefCpuPerGPU
4834 Default count of CPUs allocated per allocated GPU. This value is
4835 used only if the job didn't specify --cpus-per-task and
4836 --cpus-per-gpu.
4837
4838 DefMemPerCPU
4839 Default real memory size available per allocated CPU in
4840 megabytes. Used to avoid over-subscribing memory and causing
4841 paging. DefMemPerCPU would generally be used if individual pro‐
4842 cessors are allocated to jobs (SelectType=select/cons_res or Se‐
4843 lectType=select/cons_tres). If not set, the DefMemPerCPU value
4844 for the entire cluster will be used. Also see DefMemPerGPU,
4845 DefMemPerNode and MaxMemPerCPU. DefMemPerCPU, DefMemPerGPU and
4846 DefMemPerNode are mutually exclusive.
4847
4848 DefMemPerGPU
4849 Default real memory size available per allocated GPU in
4850 megabytes. Also see DefMemPerCPU, DefMemPerNode and MaxMemPer‐
4851 CPU. DefMemPerCPU, DefMemPerGPU and DefMemPerNode are mutually
4852 exclusive.
4853
4854 DefMemPerNode
4855 Default real memory size available per allocated node in
4856 megabytes. Used to avoid over-subscribing memory and causing
4857 paging. DefMemPerNode would generally be used if whole nodes
4858 are allocated to jobs (SelectType=select/linear) and resources
4859 are over-subscribed (OverSubscribe=yes or OverSubscribe=force).
4860 If not set, the DefMemPerNode value for the entire cluster will
4861 be used. Also see DefMemPerCPU, DefMemPerGPU and MaxMemPerCPU.
4862 DefMemPerCPU, DefMemPerGPU and DefMemPerNode are mutually exclu‐
4863 sive.
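
       For example, a hypothetical partition granting 2048 MB per
       allocated CPU by default (this assumes memory is configured as
       a consumable resource in SelectTypeParameters):

            PartitionName=batch Nodes=node[0-63] DefMemPerCPU=2048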
4864
4865 DenyAccounts
4866 Comma-separated list of accounts which may not execute jobs in
4867 the partition. By default, no accounts are denied access. NOTE:
4868 If AllowAccounts is used then DenyAccounts will not be enforced.
4869 Also refer to AllowAccounts.
4870
4871 DenyQos
4872 Comma-separated list of Qos which may not execute jobs in the
4873 partition. By default, no QOS are denied access. NOTE: If Al‐
4874 lowQos is used then DenyQos will not be enforced. Also refer to
4875 AllowQos.
4876
4877 DisableRootJobs
4878 If set to "YES" then user root will be prevented from running
4879 any jobs on this partition. The default value will be the value
4880 of DisableRootJobs set outside of a partition specification
4881 (which is "NO", allowing user root to execute jobs).
4882
4883 ExclusiveUser
4884 If set to "YES" then nodes will be exclusively allocated to
4885 users. Multiple jobs may be run for the same user, but only one
4886 user can be active at a time. This capability is also available
4887 on a per-job basis by using the --exclusive=user option.
4888
4889 GraceTime
4890 Specifies, in units of seconds, the preemption grace time to be
4891 extended to a job which has been selected for preemption. The
4892 default value is zero, no preemption grace time is allowed on
4893 this partition. Once a job has been selected for preemption,
4894 its end time is set to the current time plus GraceTime. The
4895 job's tasks are immediately sent SIGCONT and SIGTERM signals in
4896 order to provide notification of its imminent termination. This
4897 is followed by the SIGCONT, SIGTERM and SIGKILL signal sequence
4898 upon reaching its new end time. This second set of signals is
4899 sent to both the tasks and the containing batch script, if ap‐
4900 plicable. See also the global KillWait configuration parameter.
4901
4902 Hidden Specifies if the partition and its jobs are to be hidden by de‐
4903 fault. Hidden partitions will by default not be reported by the
4904 Slurm APIs or commands. Possible values are "YES" and "NO".
4905 The default value is "NO". Note that partitions that a user
4906 lacks access to by virtue of the AllowGroups parameter will also
4907 be hidden by default.
4908
4909 LLN Schedule resources to jobs on the least loaded nodes (based upon
4910 the number of idle CPUs). This is generally only recommended for
4911 an environment with serial jobs as idle resources will tend to
4912 be highly fragmented, resulting in parallel jobs being distrib‐
4913 uted across many nodes. Note that node Weight takes precedence
4914 over how many idle resources are on each node. Also see the Se‐
4915 lectParameters configuration parameter CR_LLN to use the least
4916 loaded nodes in every partition.
4917
4918 MaxCPUsPerNode
4919 Maximum number of CPUs on any node available to all jobs from
4920 this partition. This can be especially useful to schedule GPUs.
4921 For example a node can be associated with two Slurm partitions
4922 (e.g. "cpu" and "gpu") and the partition/queue "cpu" could be
4923 limited to only a subset of the node's CPUs, ensuring that one
4924 or more CPUs would be available to jobs in the "gpu" parti‐
4925 tion/queue.
4926
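The two-partition arrangement described above might be sketched as (node name and counts are illustrative):

```
# 16-core node shared by a "cpu" and a "gpu" partition; jobs in
# "cpu" may use at most 14 of the cores, keeping 2 free for "gpu"
NodeName=tux01 CPUs=16 Gres=gpu:2
PartitionName=cpu Nodes=tux01 MaxCPUsPerNode=14
PartitionName=gpu Nodes=tux01
```
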
4927 MaxMemPerCPU
4928 Maximum real memory size available per allocated CPU in
4929 megabytes. Used to avoid over-subscribing memory and causing
4930 paging. MaxMemPerCPU would generally be used if individual pro‐
4931 cessors are allocated to jobs (SelectType=select/cons_res or Se‐
4932 lectType=select/cons_tres). If not set, the MaxMemPerCPU value
4933 for the entire cluster will be used. Also see DefMemPerCPU and
4934 MaxMemPerNode. MaxMemPerCPU and MaxMemPerNode are mutually ex‐
4935 clusive.
4936
4937 MaxMemPerNode
4938 Maximum real memory size available per allocated node in
4939 megabytes. Used to avoid over-subscribing memory and causing
4940 paging. MaxMemPerNode would generally be used if whole nodes
4941 are allocated to jobs (SelectType=select/linear) and resources
4942 are over-subscribed (OverSubscribe=yes or OverSubscribe=force).
4943 If not set, the MaxMemPerNode value for the entire cluster will
4944 be used. Also see DefMemPerNode and MaxMemPerCPU. MaxMemPerCPU
4945 and MaxMemPerNode are mutually exclusive.
4946
4947 MaxNodes
4948 Maximum count of nodes which may be allocated to any single job.
4949 The default value is "UNLIMITED", which is represented inter‐
4950 nally as -1.
4951
4952 MaxTime
4953 Maximum run time limit for jobs. Format is minutes, min‐
4954 utes:seconds, hours:minutes:seconds, days-hours, days-hours:min‐
4955 utes, days-hours:minutes:seconds or "UNLIMITED". Time resolu‐
4956 tion is one minute and second values are rounded up to the next
4957 minute. The job TimeLimit may be updated by root, SlurmUser or
4958 an Operator to a value higher than the configured MaxTime after
4959 job submission.
4960
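For example, the following are all valid limits (partition names are illustrative):

```
PartitionName=debug MaxTime=30        # 30 minutes
PartitionName=short MaxTime=2:00:00   # 2 hours
PartitionName=long  MaxTime=7-0       # 7 days
PartitionName=open  MaxTime=UNLIMITED
```
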
4961 MinNodes
4962 Minimum count of nodes which may be allocated to any single job.
4963 The default value is 0.
4964
4965 Nodes Comma-separated list of nodes or nodesets which are associated
4966 with this partition. Node names may be specified using the node
4967 range expression syntax described above. A blank list of nodes
4968 (i.e. "Nodes= ") can be used if one wants a partition to exist,
4969 but have no resources (possibly on a temporary basis). A value
4970 of "ALL" is mapped to all nodes configured in the cluster.
4971
4972 OverSubscribe
4973 Controls the ability of the partition to execute more than one
4974 job at a time on each resource (node, socket or core depending
4975 upon the value of SelectTypeParameters). If resources are to be
4976 over-subscribed, avoiding memory over-subscription is very im‐
4977 portant. SelectTypeParameters should be configured to treat
4978 memory as a consumable resource and the --mem option should be
4979 used for job allocations. Sharing of resources is typically
4980 useful only when using gang scheduling (PreemptMode=sus‐
4981 pend,gang). Possible values for OverSubscribe are "EXCLUSIVE",
4982 "FORCE", "YES", and "NO". Note that a value of "YES" or "FORCE"
4983 can negatively impact performance for systems with many thou‐
4984 sands of running jobs. The default value is "NO". For more in‐
4985 formation see the following web pages:
4986 https://slurm.schedmd.com/cons_res.html
4987 https://slurm.schedmd.com/cons_res_share.html
4988 https://slurm.schedmd.com/gang_scheduling.html
4989 https://slurm.schedmd.com/preempt.html
4990
4991 EXCLUSIVE Allocates entire nodes to jobs even with Select‐
4992 Type=select/cons_res or SelectType=select/cons_tres
4993 configured. Jobs that run in partitions with Over‐
4994 Subscribe=EXCLUSIVE will have exclusive access to
4995 all allocated nodes. These jobs are allocated all
4996 CPUs and GRES on the nodes, but they are only allo‐
4997 cated as much memory as they ask for. This is by de‐
4998 sign to support gang scheduling, because suspended
4999 jobs still reside in memory. To request all the mem‐
5000 ory on a node, use --mem=0 at submit time.
5001
5002 FORCE Makes all resources (except GRES) in the partition
5003 available for oversubscription without any means for
5004 users to disable it. May be followed with a colon
5005 and maximum number of jobs in running or suspended
5006 state. For example OverSubscribe=FORCE:4 enables
5007 each node, socket or core to oversubscribe each re‐
5008 source four ways. Recommended only for systems us‐
5009 ing PreemptMode=suspend,gang.
5010
5011 NOTE: OverSubscribe=FORCE:1 is a special case that
5012 is not exactly equivalent to OverSubscribe=NO. Over‐
5013 Subscribe=FORCE:1 disables the regular oversubscrip‐
5014 tion of resources in the same partition but it will
5015 still allow oversubscription due to preemption. Set‐
5016 ting OverSubscribe=NO will prevent oversubscription
5017 from happening due to preemption as well.
5018
5019 NOTE: If using PreemptType=preempt/qos you can spec‐
5020 ify a value for FORCE that is greater than 1. For
5021 example, OverSubscribe=FORCE:2 will permit two jobs
5022 per resource normally, but a third job can be
5023 started only if done so through preemption based
5024 upon QOS.
5025
5026 NOTE: If OverSubscribe is configured to FORCE or YES
5027 in your slurm.conf and the system is not configured
5028 to use preemption (PreemptMode=OFF) accounting can
5029 easily grow to values greater than the actual uti‐
5030 lization. It may be common on such systems to get
5031 error messages in the slurmdbd log stating: "We have
5032 more allocated time than is possible."
5033
5034 YES Makes all resources (except GRES) in the partition
5035 available for sharing upon request by the job. Re‐
5036 sources will only be over-subscribed when explicitly
5037 requested by the user using the "--oversubscribe"
5038 option on job submission. May be followed with a
5039 colon and maximum number of jobs in running or sus‐
5040 pended state. For example "OverSubscribe=YES:4" en‐
5041 ables each node, socket or core to execute up to
5042 four jobs at once. Recommended only for systems
5043 running with gang scheduling (PreemptMode=sus‐
5044 pend,gang).
5045
5046 NO Selected resources are allocated to a single job. No
5047 resource will be allocated to more than one job.
5048
5049 NOTE: Even if you are using PreemptMode=sus‐
5050 pend,gang, setting OverSubscribe=NO will disable
5051 preemption on that partition. Use OverSub‐
5052 scribe=FORCE:1 if you want to disable normal over‐
5053 subscription but still allow suspension due to pre‐
5054 emption.
5055
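A gang-scheduling sketch combining these values (it assumes the cluster sets PreemptMode=SUSPEND,GANG and tracks memory as a consumable resource; names are illustrative):

```
# Up to two time-sliced jobs per core in "timeshare"; "exclusive"
# hands whole nodes to each job
PartitionName=timeshare Nodes=tux[0-31] OverSubscribe=FORCE:2
PartitionName=exclusive Nodes=tux[32-63] OverSubscribe=EXCLUSIVE
```
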
5056 OverTimeLimit
5057 Number of minutes by which a job can exceed its time limit be‐
5058 fore being canceled. Normally a job's time limit is treated as
5059 a hard limit and the job will be killed upon reaching that
5060 limit. Configuring OverTimeLimit will result in the job's time
5061 limit being treated like a soft limit. Adding the OverTimeLimit
5062 value to the soft time limit provides a hard time limit, at
5063 which point the job is canceled. This is particularly useful
5064 for backfill scheduling, which is based upon each job's soft time
5065 limit. If not set, the OverTimeLimit value for the entire clus‐
5066 ter will be used. May not exceed 65533 minutes. A value of
5067 "UNLIMITED" is also supported.
5068
5069 PartitionName
5070 Name by which the partition may be referenced (e.g. "Interac‐
5071 tive"). This name can be specified by users when submitting
5072 jobs. If the PartitionName is "DEFAULT", the values specified
5073 with that record will apply to subsequent partition specifica‐
5074 tions unless explicitly set to other values in that partition
5075 record or replaced with a different set of default values. Each
5076 line where PartitionName is "DEFAULT" will replace or add to
5077 previous default values and not reinitialize the default val‐
5078 ues.
5079
5080 PreemptMode
5081 Mechanism used to preempt jobs or enable gang scheduling for
5082 this partition when PreemptType=preempt/partition_prio is con‐
5083 figured. This partition-specific PreemptMode configuration pa‐
5084 rameter will override the cluster-wide PreemptMode for this par‐
5085 tition. It can be set to OFF to disable preemption and gang
5086 scheduling for this partition. See also PriorityTier and the
5087 above description of the cluster-wide PreemptMode parameter for
5088 further details.
5089 The GANG option is used to enable gang scheduling independent of
5090 whether preemption is enabled (i.e. independent of the Preempt‐
5091 Type setting). It can be specified in addition to a PreemptMode
5092 setting with the two options comma separated (e.g. Preempt‐
5093 Mode=SUSPEND,GANG).
5094 See <https://slurm.schedmd.com/preempt.html> and
5095 <https://slurm.schedmd.com/gang_scheduling.html> for more de‐
5096 tails.
5097
5098 NOTE: For performance reasons, the backfill scheduler reserves
5099 whole nodes for jobs, not partial nodes. If during backfill
5100 scheduling a job preempts one or more other jobs, the whole
5101 nodes for those preempted jobs are reserved for the preemptor
5102 job, even if the preemptor job requested fewer resources than
5103 that. These reserved nodes aren't available to other jobs dur‐
5104 ing that backfill cycle, even if the other jobs could fit on the
5105 nodes. Therefore, jobs may preempt more resources during a sin‐
5106 gle backfill iteration than they requested.
5107 NOTE: For a heterogeneous job to be considered for preemption all
5108 components must be eligible for preemption. When a heterogeneous
5109 job is to be preempted the first identified component of the job
5110 with the highest order PreemptMode (SUSPEND (highest), REQUEUE,
5111 CANCEL (lowest)) will be used to set the PreemptMode for all
5112 components. The GraceTime and user warning signal for each com‐
5113 ponent of the heterogeneous job remain unique. Heterogeneous
5114 jobs are excluded from GANG scheduling operations.
5115
5116 OFF Is the default value and disables job preemption and
5117 gang scheduling. It is only compatible with Pre‐
5118 emptType=preempt/none at a global level. A common
5119 use case for this parameter is to set it on a parti‐
5120 tion to disable preemption for that partition.
5121
5122 CANCEL The preempted job will be cancelled.
5123
5124 GANG Enables gang scheduling (time slicing) of jobs in
5125 the same partition, and allows the resuming of sus‐
5126 pended jobs.
5127
5128 NOTE: Gang scheduling is performed independently for
5129 each partition, so if you only want time-slicing by
5130 OverSubscribe, without any preemption, then config‐
5131 uring partitions with overlapping nodes is not rec‐
5132 ommended. On the other hand, if you want to use
5133 PreemptType=preempt/partition_prio to allow jobs
5134 from higher PriorityTier partitions to Suspend jobs
5135 from lower PriorityTier partitions you will need
5136 overlapping partitions, and PreemptMode=SUSPEND,GANG
5137 to use the Gang scheduler to resume the suspended
5138 job(s). In any case, time-slicing won't happen be‐
5139 tween jobs on different partitions.
5140 NOTE: Heterogeneous jobs are excluded from GANG
5141 scheduling operations.
5142
5143 REQUEUE Preempts jobs by requeuing them (if possible) or
5144 canceling them. For jobs to be requeued they must
5145 have the --requeue sbatch option set or the cluster
5146 wide JobRequeue parameter in slurm.conf must be set
5147 to 1.
5148
5149 SUSPEND The preempted jobs will be suspended, and later the
5150 Gang scheduler will resume them. Therefore the SUS‐
5151 PEND preemption mode always needs the GANG option to
5152 be specified at the cluster level. Also, because the
5153 suspended jobs will still use memory on the allo‐
5154 cated nodes, Slurm needs to be able to track memory
5155 resources to be able to suspend jobs.
5156
5157 If the preemptees and preemptor are on different
5158 partitions then the preempted jobs will remain sus‐
5159 pended until the preemptor ends.
5160 NOTE: Because gang scheduling is performed indepen‐
5161 dently for each partition, if using PreemptType=pre‐
5162 empt/partition_prio then jobs in higher PriorityTier
5163 partitions will suspend jobs in lower PriorityTier
5164 partitions to run on the released resources. Only
5165 when the preemptor job ends will the suspended jobs
5166 be resumed by the Gang scheduler.
5167 NOTE: Suspended jobs will not release GRES. Higher
5168 priority jobs will not be able to preempt to gain
5169 access to GRES.
5170
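Partition-priority preemption as described above might be configured as follows (assuming PreemptType=preempt/partition_prio and a cluster-wide PreemptMode=SUSPEND,GANG; names are illustrative):

```
# Jobs in "high" may suspend jobs in "low" on the shared nodes,
# after a 60 second grace period
PartitionName=high Nodes=tux[0-31] PriorityTier=10
PartitionName=low  Nodes=tux[0-31] PriorityTier=1 GraceTime=60
```
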
5171 PriorityJobFactor
5172 Partition factor used by priority/multifactor plugin in calcu‐
5173 lating job priority. The value may not exceed 65533. Also see
5174 PriorityTier.
5175
5176 PriorityTier
5177 Jobs submitted to a partition with a higher PriorityTier value
5178 will be evaluated by the scheduler before pending jobs in a par‐
5179 tition with a lower PriorityTier value. They will also be con‐
5180 sidered for preemption of running jobs in partition(s) with
5181 lower PriorityTier values if PreemptType=preempt/partition_prio.
5182 The value may not exceed 65533. Also see PriorityJobFactor.
5183
5184 QOS Used to extend the limits available to a QOS on a partition.
5185 Jobs will not be associated to this QOS outside of being associ‐
5186 ated to the partition. They will still be associated to their
5187 requested QOS. By default, no QOS is used. NOTE: If a limit is
5188 set in both the Partition's QOS and the Job's QOS the Partition
5189 QOS will be honored unless the Job's QOS has the OverPartQOS
5190 flag set, in which case the Job's QOS will have priority.
5191
5192 ReqResv
5193 Specifies users of this partition are required to designate a
5194 reservation when submitting a job. This option can be useful in
5195 restricting usage of a partition that may have higher priority
5196 or additional resources to be allowed only within a reservation.
5197 Possible values are "YES" and "NO". The default value is "NO".
5198
5199 ResumeTimeout
5200 Maximum time permitted (in seconds) between when a node resume
5201 request is issued and when the node is actually available for
5202 use. Nodes which fail to respond in this time frame will be
5203 marked DOWN and the jobs scheduled on the node requeued. Nodes
5204 which reboot after this time frame will be marked DOWN with a
5205 reason of "Node unexpectedly rebooted." For nodes that are in
5206 multiple partitions with this option set, the highest time will
5207 take effect. If not set on any partition, the node will use the
5208 ResumeTimeout value set for the entire cluster.
5209
5210 RootOnly
5211 Specifies if only user ID zero (i.e. user root) may allocate re‐
5212 sources in this partition. User root may allocate resources for
5213 any other user, but the request must be initiated by user root.
5214 This option can be useful for a partition to be managed by some
5215 external entity (e.g. a higher-level job manager) and prevents
5216 users from directly using those resources. Possible values are
5217 "YES" and "NO". The default value is "NO".
5218
5219 SelectTypeParameters
5220 Partition-specific resource allocation type. This option re‐
5221 places the global SelectTypeParameters value. Supported values
5222 are CR_Core, CR_Core_Memory, CR_Socket and CR_Socket_Memory.
5223 Use requires the system-wide SelectTypeParameters value be set
5224 to any of the four supported values previously listed; other‐
5225 wise, the partition-specific value will be ignored.
5226
5227 Shared The Shared configuration parameter has been replaced by the
5228 OverSubscribe parameter described above.
5229
5230 State State of partition or availability for use. Possible values are
5231 "UP", "DOWN", "DRAIN" and "INACTIVE". The default value is "UP".
5232 See also the related "Alternate" keyword.
5233
5234 UP Designates that new jobs may be queued on the parti‐
5235 tion, and that jobs may be allocated nodes and run
5236 from the partition.
5237
5238 DOWN Designates that new jobs may be queued on the parti‐
5239 tion, but queued jobs may not be allocated nodes and
5240 run from the partition. Jobs already running on the
5241 partition continue to run. The jobs must be explicitly
5242 canceled to force their termination.
5243
5244 DRAIN Designates that no new jobs may be queued on the par‐
5245 tition (job submission requests will be denied with an
5246 error message), but jobs already queued on the parti‐
5247 tion may be allocated nodes and run. See also the
5248 "Alternate" partition specification.
5249
5250 INACTIVE Designates that no new jobs may be queued on the par‐
5251 tition, and jobs already queued may not be allocated
5252 nodes and run. See also the "Alternate" partition
5253 specification.
5254
5255 SuspendTime
5256 Nodes which remain idle or down for this number of seconds will
5257 be placed into power save mode by SuspendProgram. For efficient
5258 system utilization, it is recommended that the value of Suspend‐
5259 Time be at least as large as the sum of SuspendTimeout plus Re‐
5260 sumeTimeout. For nodes that are in multiple partitions with
5261 this option set, the highest time will take effect. If not set
5262 on any partition, the node will use the SuspendTime value set
5263 for the entire cluster. Setting SuspendTime to anything but
5264 "INFINITE" will enable power save mode.
5265
5266 SuspendTimeout
5267 Maximum time permitted (in seconds) between when a node suspend
5268 request is issued and when the node is shut down. At that time
5269 the node must be ready for a resume request to be issued as
5270 needed for new work. For nodes that are in multiple partitions
5271 with this option set, the highest time will take effect. If not
5272 set on any partition, the node will use the SuspendTimeout value
5273 set for the entire cluster.
5274
5275 TRESBillingWeights
5276 TRESBillingWeights is used to define the billing weights of each
5277 TRES type that will be used in calculating the usage of a job.
5278 The calculated usage is used when calculating fairshare and when
5279 enforcing the TRES billing limit on jobs.
5280
5281 Billing weights are specified as a comma-separated list of <TRES
5282 Type>=<TRES Billing Weight> pairs.
5283
5284 Any TRES Type is available for billing. Note that the base unit
5285 for memory and burst buffers is megabytes.
5286
5287 By default the billing of TRES is calculated as the sum of all
5288 TRES types multiplied by their corresponding billing weight.
5289
5290 The weighted amount of a resource can be adjusted by adding a
5291 suffix of K,M,G,T or P after the billing weight. For example, a
5292 memory weight of "mem=.25" on a job allocated 8GB will be billed
5293 2048 (8192MB *.25) units. A memory weight of "mem=.25G" on the
5294 same job will be billed 2 (8192MB * (.25/1024)) units.
5295
5296 Negative values are allowed.
5297
5298 When a job is allocated 1 CPU and 8 GB of memory on a partition
5299 configured with TRESBilling‐
5300 Weights="CPU=1.0,Mem=0.25G,GRES/gpu=2.0", the billable TRES will
5301 be: (1*1.0) + (8*0.25) + (0*2.0) = 3.0.
5302
5303 If PriorityFlags=MAX_TRES is configured, the billable TRES is
5304 calculated as the MAX of individual TRES' on a node (e.g. cpus,
5305 mem, gres) plus the sum of all global TRES' (e.g. licenses). Us‐
5306 ing the same example above the billable TRES will be MAX(1*1.0,
5307 8*0.25) + (0*2.0) = 2.0.
5308
5309 If TRESBillingWeights is not defined then the job is billed
5310 against the total number of allocated CPUs.
5311
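The worked example above corresponds to a partition line such as (partition and node names are illustrative):

```
# 1 CPU + 8 GB of memory bills (1*1.0) + (8*0.25) = 3.0 units
PartitionName=gpuq Nodes=tux[0-7] TRESBillingWeights="CPU=1.0,Mem=0.25G,GRES/gpu=2.0"
```
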
5312 NOTE: TRESBillingWeights doesn't affect job priority directly as
5313 it is currently not used for the size of the job. If you want
5314 TRES' to play a role in the job's priority then refer to the
5315 PriorityWeightTRES option.
5316
5317 PROLOG AND EPILOG SCRIPTS
5318 There are a variety of prolog and epilog program options that execute
5319 with various permissions and at various times. The four options most
5320 likely to be used are: Prolog and Epilog (executed once on each compute
5321 node for each job) plus PrologSlurmctld and EpilogSlurmctld (executed
5322 once on the ControlMachine for each job).
5323
5324 NOTE: Standard output and error messages are normally not preserved.
5325 Explicitly write output and error messages to an appropriate location
5326 if you wish to preserve that information.
5327
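A minimal Epilog sketch that preserves its own record, since standard output is discarded (the log path and the choice of variables are illustrative, not a Slurm requirement):

```shell
#!/bin/sh
# Illustrative Epilog: append one record per completed job to our own
# log file, because prolog/epilog stdout and stderr are not preserved.
# A real site would likely log under /var/log; /tmp keeps the sketch
# self-contained. Exit status 0 is expected; non-zero drains the node.
LOGFILE="${EPILOG_LOG:-/tmp/slurm_epilog.log}"   # hypothetical path
printf '%s job=%s user=%s node=%s\n' \
    "$(date '+%Y-%m-%dT%H:%M:%S')" \
    "${SLURM_JOB_ID:-unknown}" \
    "${SLURM_JOB_USER:-unknown}" \
    "${SLURMD_NODENAME:-unknown}" >> "$LOGFILE"
```

Such a script would be referenced with Epilog=/path/to/script in slurm.conf and must be executable on every compute node.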
5328 NOTE: By default the Prolog script is ONLY run on any individual node
5329 when it first sees a job step from a new allocation. It does not run
5330 the Prolog immediately when an allocation is granted. If no job steps
5331 from an allocation are run on a node, it will never run the Prolog for
5332 that allocation. This Prolog behaviour can be changed by the Pro‐
5333 logFlags parameter. The Epilog, on the other hand, always runs on ev‐
5334 ery node of an allocation when the allocation is released.
5335
5336 If the Epilog fails (returns a non-zero exit code), this will result in
5337 the node being set to a DRAIN state. If the EpilogSlurmctld fails (re‐
5338 turns a non-zero exit code), this will only be logged. If the Prolog
5339 fails (returns a non-zero exit code), this will result in the node be‐
5340 ing set to a DRAIN state and the job being requeued in a held state un‐
5341 less nohold_on_prolog_fail is configured in SchedulerParameters. If
5342 the PrologSlurmctld fails (returns a non-zero exit code), this will re‐
5343 sult in the job being requeued to be executed on another node if possi‐
5344 ble. Only batch jobs can be requeued. Interactive jobs (salloc and
5345 srun) will be cancelled if the PrologSlurmctld fails. If slurmctld is
5346 stopped while either PrologSlurmctld or EpilogSlurmctld is running, the
5347 script will be killed with SIGKILL. The script will restart when slurm‐
5348 ctld restarts.
5349
5350
5351 Information about the job is passed to the script using environment
5352 variables. Unless otherwise specified, these environment variables are
5353 available in each of the scripts mentioned above (Prolog, Epilog, Pro‐
5354 logSlurmctld and EpilogSlurmctld). For a full list of environment vari‐
5355 ables that includes those available in the SrunProlog, SrunEpilog,
5356 TaskProlog and TaskEpilog please see the Prolog and Epilog Guide
5357 <https://slurm.schedmd.com/prolog_epilog.html>.
5358
5359
5360 SLURM_ARRAY_JOB_ID
5361 If this job is part of a job array, this will be set to the job
5362 ID. Otherwise it will not be set. To reference this specific
5363 task of a job array, combine SLURM_ARRAY_JOB_ID with SLURM_AR‐
5364 RAY_TASK_ID (e.g. "scontrol update ${SLURM_AR‐
5365 RAY_JOB_ID}_${SLURM_ARRAY_TASK_ID} ..."). Available in Pro‐
5366 logSlurmctld and EpilogSlurmctld.
5367
5368 SLURM_ARRAY_TASK_ID
5369 If this job is part of a job array, this will be set to the task
5370 ID. Otherwise it will not be set. To reference this specific
5371 task of a job array, combine SLURM_ARRAY_JOB_ID with SLURM_AR‐
5372 RAY_TASK_ID (e.g. "scontrol update ${SLURM_AR‐
5373 RAY_JOB_ID}_${SLURM_ARRAY_TASK_ID} ..."). Available in Pro‐
5374 logSlurmctld and EpilogSlurmctld.
5375
5376 SLURM_ARRAY_TASK_MAX
5377 If this job is part of a job array, this will be set to the max‐
5378 imum task ID. Otherwise it will not be set. Available in Pro‐
5379 logSlurmctld and EpilogSlurmctld.
5380
5381 SLURM_ARRAY_TASK_MIN
5382 If this job is part of a job array, this will be set to the min‐
5383 imum task ID. Otherwise it will not be set. Available in Pro‐
5384 logSlurmctld and EpilogSlurmctld.
5385
5386 SLURM_ARRAY_TASK_STEP
5387 If this job is part of a job array, this will be set to the step
5388 size of task IDs. Otherwise it will not be set. Available in
5389 PrologSlurmctld and EpilogSlurmctld.
5390
5391 SLURM_CLUSTER_NAME
5392 Name of the cluster executing the job.
5393
5394 SLURM_CONF
5395 Location of the slurm.conf file. Available in Prolog and Epilog.
5396
5397 SLURMD_NODENAME
5398 Name of the node running the task. In the case of a parallel job
5399 executing on multiple compute nodes, the various tasks will have
5400 this environment variable set to different values on each com‐
5401 pute node. Available in Prolog and Epilog.
5402
5403 SLURM_JOB_ACCOUNT
5404 Account name used for the job. Available in PrologSlurmctld and
5405 EpilogSlurmctld.
5406
5407 SLURM_JOB_CONSTRAINTS
5408 Features required to run the job. Available in Prolog, Pro‐
5409 logSlurmctld and EpilogSlurmctld.
5410
5411 SLURM_JOB_DERIVED_EC
5412 The highest exit code of all of the job steps. Available in
5413 EpilogSlurmctld.
5414
5415 SLURM_JOB_EXIT_CODE
5416 The exit code of the job script (or salloc). The value is the
5417 status as returned by the wait() system call (see wait(2)).
5418 Available in EpilogSlurmctld.
5419
5420 SLURM_JOB_EXIT_CODE2
5421 The exit code of the job script (or salloc). The value has the
5422 format <exit>:<sig>. The first number is the exit code, typi‐
5423 cally as set by the exit() function. The second number is the
5424 signal that caused the process to terminate, if the process was
5425 terminated by a signal. Available in EpilogSlurmctld.
5426
5427 SLURM_JOB_GID
5428 Group ID of the job's owner.
5429
5430 SLURM_JOB_GPUS
5431 The GPU IDs of GPUs in the job allocation (if any). Available
5432 in the Prolog and Epilog.
5433
5434 SLURM_JOB_GROUP
5435 Group name of the job's owner. Available in PrologSlurmctld and
5436 EpilogSlurmctld.
5437
5438 SLURM_JOB_ID
5439 Job ID.
5440
5441 SLURM_JOBID
5442 Job ID.
5443
5444 SLURM_JOB_NAME
5445 Name of the job. Available in PrologSlurmctld and EpilogSlurm‐
5446 ctld.
5447
5448 SLURM_JOB_NODELIST
5449 Nodes assigned to job. A Slurm hostlist expression. "scontrol
5450 show hostnames" can be used to convert this to a list of indi‐
5451 vidual host names. Available in PrologSlurmctld and Epi‐
5452 logSlurmctld.
5453
5454 SLURM_JOB_PARTITION
5455 Partition that job runs in. Available in Prolog, PrologSlurm‐
5456 ctld and EpilogSlurmctld.
5457
5458 SLURM_JOB_UID
5459 User ID of the job's owner.
5460
5461 SLURM_JOB_USER
5462 User name of the job's owner.
5463
5464 SLURM_SCRIPT_CONTEXT
5465 Identifies which epilog or prolog program is currently running.
5466
5467 UNKILLABLE STEP PROGRAM SCRIPT
5468 This program can be used to take special actions to clean up the unkil‐
5469 lable processes and/or notify system administrators. The program will
5470 be run as SlurmdUser (usually "root") on the compute node where Unkill‐
5471 ableStepTimeout was triggered.
5472
5473 Information about the unkillable job step is passed to the script using
5474 environment variables.
5475
5476
5477 SLURM_JOB_ID
5478 Job ID.
5479
5480 SLURM_STEP_ID
5481 Job Step ID.
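
A sketch of such a script, which simply records the stuck step (log location and message wording are illustrative):

```shell
#!/bin/sh
# Illustrative UnkillableStepProgram: note the unkillable step so an
# administrator can investigate the node; the /tmp path is hypothetical.
echo "unkillable step ${SLURM_JOB_ID:-?}.${SLURM_STEP_ID:-?} on $(uname -n)" \
    >> /tmp/slurm_unkillable.log
```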
5482
5483 NETWORK TOPOLOGY
5484 Slurm is able to optimize job allocations to minimize network con‐
5485 tention. Special Slurm logic is used to optimize allocations on sys‐
5486 tems with a three-dimensional interconnect. Information about con‐
5487 figuring those systems is available at
5488 <https://slurm.schedmd.com/>. For a hierarchical network, Slurm needs
5489 to have detailed information about how nodes are configured on the net‐
5490 work switches.
5491
5492 Given network topology information, Slurm allocates all of a job's re‐
5493 sources onto a single leaf of the network (if possible) using a
5494 best-fit algorithm. Otherwise it will allocate a job's resources onto
5495 multiple leaf switches so as to minimize the use of higher-level
5496 switches. The TopologyPlugin parameter controls which plugin is used
5497 to collect network topology information. The only values presently
5498 supported are "topology/3d_torus" (default for Cray XT/XE systems, per‐
5499 forms best-fit logic over three-dimensional topology), "topology/none"
5500 (default for other systems, best-fit logic over one-dimensional topol‐
5501 ogy), "topology/tree" (determine the network topology based upon infor‐
5502 mation contained in a topology.conf file, see "man topology.conf" for
5503 more information). Future plugins may gather topology information di‐
5504 rectly from the network. The topology information is optional. If not
5505 provided, Slurm will perform a best-fit algorithm assuming the nodes
5506 are in a one-dimensional array as configured and the communications
5507 cost is related to the node distance in this array.
5508
5509
5510 RELOCATING CONTROLLERS
5511 If the cluster's computers used for the primary or backup controller
5512 will be out of service for an extended period of time, it may be desir‐
5513 able to relocate them. In order to do so, follow this procedure:
5514
5515 1. Stop the Slurm daemons
5516 2. Modify the slurm.conf file appropriately
5517 3. Distribute the updated slurm.conf file to all nodes
5518 4. Restart the Slurm daemons
5519
5520 There should be no loss of any running or pending jobs. Ensure that
5521 any nodes added to the cluster have the current slurm.conf file in‐
5522 stalled.
5523
5524 CAUTION: If two nodes are simultaneously configured as the primary con‐
5525 troller (two nodes on which SlurmctldHost specify the local host and
5526 the slurmctld daemon is executing on each), system behavior will be de‐
5527 structive. If a compute node has an incorrect SlurmctldHost parameter,
5528 that node may be rendered unusable, but no other harm will result.
5529
5530
5531 EXAMPLE
5532 #
5533 # Sample /etc/slurm.conf for dev[0-25].llnl.gov
5534 # Author: John Doe
5535 # Date: 11/06/2001
5536 #
5537 SlurmctldHost=dev0(12.34.56.78) # Primary server
5538 SlurmctldHost=dev1(12.34.56.79) # Backup server
5539 #
5540 AuthType=auth/munge
5541 Epilog=/usr/local/slurm/epilog
5542 Prolog=/usr/local/slurm/prolog
5543 FirstJobId=65536
5544 InactiveLimit=120
5545 JobCompType=jobcomp/filetxt
5546 JobCompLoc=/var/log/slurm/jobcomp
5547 KillWait=30
5548 MaxJobCount=10000
5549 MinJobAge=3600
5550 PluginDir=/usr/local/lib:/usr/local/slurm/lib
5551 ReturnToService=0
5552 SchedulerType=sched/backfill
5553 SlurmctldLogFile=/var/log/slurm/slurmctld.log
5554 SlurmdLogFile=/var/log/slurm/slurmd.log
5555 SlurmctldPort=7002
5556 SlurmdPort=7003
5557 SlurmdSpoolDir=/var/spool/slurmd.spool
5558 StateSaveLocation=/var/spool/slurm.state
5559 SwitchType=switch/none
5560 TmpFS=/tmp
5561 WaitTime=30
5562 JobCredentialPrivateKey=/usr/local/slurm/private.key
5563 JobCredentialPublicCertificate=/usr/local/slurm/public.cert
5564 #
5565 # Node Configurations
5566 #
5567 NodeName=DEFAULT CPUs=2 RealMemory=2000 TmpDisk=64000
5568 NodeName=DEFAULT State=UNKNOWN
5569 NodeName=dev[0-25] NodeAddr=edev[0-25] Weight=16
5570 # Update records for specific DOWN nodes
5571 DownNodes=dev20 State=DOWN Reason="power,ETA=Dec25"
5572 #
5573 # Partition Configurations
5574 #
5575 PartitionName=DEFAULT MaxTime=30 MaxNodes=10 State=UP
5576 PartitionName=debug Nodes=dev[0-8,18-25] Default=YES
5577 PartitionName=batch Nodes=dev[9-17] MinNodes=4
5578 PartitionName=long Nodes=dev[9-17] MaxTime=120 AllowGroups=admin

INCLUDE MODIFIERS
       The "include" key word can be used with modifiers within the
       specified pathname. These modifiers are replaced with the cluster
       name or other information, depending on which modifier is
       specified. If the included file is not an absolute pathname
       (i.e., it does not start with a slash), it will be searched for
       in the same directory as the slurm.conf file.

       %c     The cluster name specified in slurm.conf will be used.

       EXAMPLE
       ClusterName=linux
       include /home/slurm/etc/%c_config
       # Above line interpreted as
       # "include /home/slurm/etc/linux_config"

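       The relative-path rule above can be illustrated with a small
       fragment; the file name nodes.conf and the /etc/slurm location
       are hypothetical:

```
# /etc/slurm/slurm.conf (assumed location)
include nodes.conf
# Interpreted as "include /etc/slurm/nodes.conf" because the
# pathname does not begin with a slash.
```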
FILE AND DIRECTORY PERMISSIONS
       There are three classes of files. Files used by slurmctld must be
       accessible by user SlurmUser and accessible by the primary and
       backup control machines. Files used by slurmd must be accessible
       by user root and accessible from every compute node. A few files
       need to be accessible by normal users on all login and compute
       nodes. While many files and directories are listed below, most of
       them will not be used with most configurations.

       Epilog Must be executable by user root. It is recommended that
              the file be readable by all users. The file must exist on
              every compute node.

       EpilogSlurmctld
              Must be executable by user SlurmUser. It is recommended
              that the file be readable by all users. The file must be
              accessible by the primary and backup control machines.

       HealthCheckProgram
              Must be executable by user root. It is recommended that
              the file be readable by all users. The file must exist on
              every compute node.

       JobCompLoc
              If this specifies a file, it must be writable by user
              SlurmUser. The file must be accessible by the primary and
              backup control machines.

       JobCredentialPrivateKey
              Must be readable only by user SlurmUser and writable by no
              other users. The file must be accessible by the primary
              and backup control machines.

       JobCredentialPublicCertificate
              Readable to all users on all nodes. Must not be writable
              by regular users.

       MailProg
              Must be executable by user SlurmUser. Must not be writable
              by regular users. The file must be accessible by the
              primary and backup control machines.

       Prolog Must be executable by user root. It is recommended that
              the file be readable by all users. The file must exist on
              every compute node.

       PrologSlurmctld
              Must be executable by user SlurmUser. It is recommended
              that the file be readable by all users. The file must be
              accessible by the primary and backup control machines.

       ResumeProgram
              Must be executable by user SlurmUser. The file must be
              accessible by the primary and backup control machines.

       slurm.conf
              Readable to all users on all nodes. Must not be writable
              by regular users.

       SlurmctldLogFile
              Must be writable by user SlurmUser. The file must be
              accessible by the primary and backup control machines.

       SlurmctldPidFile
              Must be writable by user root. Preferably writable and
              removable by SlurmUser. The file must be accessible by the
              primary and backup control machines.

       SlurmdLogFile
              Must be writable by user root. A distinct file must exist
              on each compute node.

       SlurmdPidFile
              Must be writable by user root. A distinct file must exist
              on each compute node.

       SlurmdSpoolDir
              Must be writable by user root. A distinct directory must
              exist on each compute node.

       SrunEpilog
              Must be executable by all users. The file must exist on
              every login and compute node.

       SrunProlog
              Must be executable by all users. The file must exist on
              every login and compute node.

       StateSaveLocation
              Must be writable by user SlurmUser. The directory must be
              accessible by the primary and backup control machines.

       SuspendProgram
              Must be executable by user SlurmUser. The file must be
              accessible by the primary and backup control machines.

       TaskEpilog
              Must be executable by all users. The file must exist on
              every compute node.

       TaskProlog
              Must be executable by all users. The file must exist on
              every compute node.

       UnkillableStepProgram
              Must be executable by user SlurmUser. The file must be
              accessible by the primary and backup control machines.
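
       The recommendations above are applied with ordinary install,
       chown, and chmod commands. The sketch below demonstrates the
       conventional modes (700 for the state directory, 640 for the
       controller log) against a scratch prefix, so it is safe to run
       unprivileged; the real paths and the SlurmUser name "slurm" are
       assumptions to adapt to your site.

```shell
# Scratch prefix so the sketch can run as a regular user; in
# production substitute the real paths and add the chown calls.
prefix=$(mktemp -d)
install -d -m 700 "$prefix/spool/slurm.state"   # StateSaveLocation
install -d -m 755 "$prefix/log/slurm"           # parent for SlurmctldLogFile
touch "$prefix/log/slurm/slurmctld.log"
chmod 640 "$prefix/log/slurm/slurmctld.log"
# Production equivalent (requires root), assuming SlurmUser=slurm:
#   chown -R slurm:slurm /var/spool/slurm.state /var/log/slurm
stat -c '%a %n' "$prefix/spool/slurm.state" "$prefix/log/slurm/slurmctld.log"
rm -rf "$prefix"
```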

LOGGING
       Note that while Slurm daemons create log files and other files as
       needed, they treat the lack of parent directories as a fatal
       error. This prevents the daemons from running if critical file
       systems are not mounted and minimizes the risk of cold-starting
       (starting without preserving jobs).

       Log files and job accounting files may need to be created/owned
       by the "SlurmUser" uid to be successfully accessed. Use the
       "chown" and "chmod" commands to set the ownership and permissions
       appropriately. See the section FILE AND DIRECTORY PERMISSIONS for
       information about the various files and directories used by
       Slurm.

       It is recommended that the logrotate utility be used to ensure
       that various log files do not become too large. This also applies
       to text files used for accounting, process tracking, and the
       slurmdbd log if they are used.

       Here is a sample logrotate configuration. Make appropriate site
       modifications and save as /etc/logrotate.d/slurm on all nodes.
       See the logrotate man page for more details.

       ##
       # Slurm Logrotate Configuration
       ##
       /var/log/slurm/*.log {
            compress
            missingok
            nocopytruncate
            nodelaycompress
            nomail
            notifempty
            noolddir
            rotate 5
            sharedscripts
            size=5M
            create 640 slurm root
            postrotate
                 pkill -x --signal SIGUSR2 slurmctld
                 pkill -x --signal SIGUSR2 slurmd
                 pkill -x --signal SIGUSR2 slurmdbd
                 exit 0
            endscript
       }

COPYING
       Copyright (C) 2002-2007 The Regents of the University of
       California. Produced at Lawrence Livermore National Laboratory
       (cf. DISCLAIMER). Copyright (C) 2008-2010 Lawrence Livermore
       National Security. Copyright (C) 2010-2022 SchedMD LLC.

       This file is part of Slurm, a resource management program. For
       details, see <https://slurm.schedmd.com/>.

       Slurm is free software; you can redistribute it and/or modify it
       under the terms of the GNU General Public License as published by
       the Free Software Foundation; either version 2 of the License, or
       (at your option) any later version.

       Slurm is distributed in the hope that it will be useful, but
       WITHOUT ANY WARRANTY; without even the implied warranty of
       MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
       General Public License for more details.

FILES
       /etc/slurm.conf

SEE ALSO
       cgroup.conf(5), getaddrinfo(3), getrlimit(2), gres.conf(5),
       group(5), hostname(1), scontrol(1), slurmctld(8), slurmd(8),
       slurmdbd(8), slurmdbd.conf(5), srun(1), spank(8), syslog(3),
       topology.conf(5)


May 2022                  Slurm Configuration File              slurm.conf(5)