1       slurm.conf(5)              Slurm Configuration File              slurm.conf(5)
2
3
4

NAME

6       slurm.conf - Slurm configuration file
7
8

DESCRIPTION

10       slurm.conf is an ASCII file which describes general Slurm configuration
11       information, the nodes to be managed, information about how those nodes
12       are  grouped into partitions, and various scheduling parameters associ‐
13       ated with those partitions. This file should be consistent  across  all
14       nodes in the cluster.
15
16       The  file  location  can  be  modified at execution time by setting the
17       SLURM_CONF environment variable. The Slurm daemons also  allow  you  to
18       override  both the built-in and environment-provided location using the
19       "-f" option on the command line.
20
21       The contents of the file are case insensitive except for the  names  of
22       nodes  and  partitions.  Any  text following a "#" in the configuration
23       file is treated as a comment through the end of that line.  Changes  to
24       the  configuration file take effect upon restart of Slurm daemons, dae‐
25       mon receipt of the SIGHUP signal, or execution of the command "scontrol
26       reconfigure" unless otherwise noted.
27
28       If  a  line  begins  with the word "Include" followed by whitespace and
29       then a file name, that file will be included inline  with  the  current
30       configuration  file.  For large or complex systems, multiple configura‐
31       tion files may prove easier to manage and enable reuse  of  some  files
32       (See INCLUDE MODIFIERS for more details).
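
       For example, node and partition definitions can be kept in separate
       files and pulled into slurm.conf (the paths shown are illustrative):

              Include /etc/slurm/nodes.conf
              Include /etc/slurm/partitions.conf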
33
34       Note on file permissions:
35
36       The slurm.conf file must be readable by all users of Slurm, since it is
37       used by many of the Slurm commands.  Other files that  are  defined  in
38       the  slurm.conf  file,  such as log files and job accounting files, may
39       need to be created/owned by the user "SlurmUser" to be successfully ac‐
40       cessed.   Use the "chown" and "chmod" commands to set the ownership and
41       permissions appropriately.  See the section FILE AND DIRECTORY  PERMIS‐
42       SIONS  for  information about the various files and directories used by
43       Slurm.
44
45

PARAMETERS

47       The overall configuration parameters available include:
48
49
50       AccountingStorageBackupHost
51              The name of the backup machine hosting  the  accounting  storage
52              database.   If used with the accounting_storage/slurmdbd plugin,
53              this is where the backup slurmdbd would be running.   Only  used
54              with systems using SlurmDBD, ignored otherwise.
55
56       AccountingStorageEnforce
57              This controls what level of association-based enforcement to im‐
58              pose on job submissions.  Valid options are any  combination  of
59              associations, limits, nojobs, nosteps, qos, safe, and wckeys, or
60              all for all things (except nojobs and nosteps, which must be re‐
61              quested as well).
62
63              If  limits,  qos, or wckeys are set, associations will automati‐
64              cally be set.
65
66              If wckeys is set, TrackWCKey will automatically be set.
67
68              If safe is set, limits and associations  will  automatically  be
69              set.
70
71              If nojobs is set, nosteps will automatically be set.
72
73              By  setting  associations, no new job is allowed to run unless a
74              corresponding association exists in the system.  If  limits  are
75              enforced,  users  can  be limited by association to whatever job
76              size or run time limits are defined.
77
78              If nojobs is set, Slurm will not account for any jobs  or  steps
79              on  the  system. Likewise, if nosteps is set, Slurm will not ac‐
80              count for any steps that have run.
81
82              If safe is enforced, a job will only be launched against an  as‐
83              sociation  or  qos  that has a GrpTRESMins limit set, if the job
84              will be able to run to completion. Without this option set, jobs
85              will  be  launched  as  long  as  their usage hasn't reached the
86              cpu-minutes limit. This can lead to jobs being launched but then
87              killed when the limit is reached.
88
89              With qos and/or wckeys enforced, jobs will not be scheduled un‐
90              less a valid qos and/or workload characterization key is  speci‐
91              fied.
92
93              A restart of slurmctld is required for changes to this parameter
94              to take effect.
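
              For example, a site that wants association, limit and QOS
              enforcement with the safe behavior described above might set:

                   AccountingStorageEnforce=associations,limits,qos,safe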
95
96       AccountingStorageExternalHost
97              A     comma-separated     list     of     external     slurmdbds
98              (<host/ip>[:port][,...])  to register with. If no port is given,
99              the AccountingStoragePort will be used.
100
101              This allows clusters registered with the  external  slurmdbd  to
102              communicate  with  each other using the --cluster/-M client com‐
103              mand options.
104
105              The cluster will add itself  to  the  external  slurmdbd  if  it
106              doesn't  exist.  If a non-external cluster already exists on the
107              external slurmdbd, the slurmctld will ignore registering to  the
108              external slurmdbd.
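
              For example (host names and the port are illustrative):

                   AccountingStorageExternalHost=extdbd1.example.com,extdbd2.example.com:7032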
109
110       AccountingStorageHost
111              The name of the machine hosting the accounting storage database.
112              Only used with systems using SlurmDBD, ignored otherwise.
113
114       AccountingStorageParameters
115              Comma-separated list of  key-value  pair  parameters.  Currently
116              supported  values  include options to establish a secure connec‐
117              tion to the database:
118
119              SSL_CERT
120                The path name of the client public key certificate file.
121
122              SSL_CA
123                The path name of the Certificate  Authority  (CA)  certificate
124                file.
125
126              SSL_CAPATH
127                The  path  name  of the directory that contains trusted SSL CA
128                certificate files.
129
130              SSL_KEY
131                The path name of the client private key file.
132
133              SSL_CIPHER
134                The list of permissible ciphers for SSL encryption.
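
              For example, a TLS-protected connection to the database server
              might be configured as follows (file paths are illustrative):

                   AccountingStorageParameters=SSL_CERT=/etc/slurm/ssl/client.pem,SSL_KEY=/etc/slurm/ssl/client.key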
135
136       AccountingStoragePass
137              The password used to gain access to the database  to  store  the
138              accounting  data.   Only used for database type storage plugins,
139              ignored otherwise.  In the case of Slurm DBD  (Database  Daemon)
140              with  MUNGE authentication this can be configured to use a MUNGE
141              daemon specifically configured to provide authentication between
142              clusters  while the default MUNGE daemon provides authentication
143              within a cluster.  In that  case,  AccountingStoragePass  should
144              specify the named socket to be used for communications with the
145              alternate MUNGE daemon (e.g.  "/var/run/munge/global.socket.2").
146              The default value is NULL.
147
148       AccountingStoragePort
149              The  listening  port  of the accounting storage database server.
150              Only used for database type storage plugins, ignored  otherwise.
151              The  default  value  is  SLURMDBD_PORT  as established at system
152              build time. If no value is explicitly specified, it will be  set
153              to  6819.   This value must be equal to the DbdPort parameter in
154              the slurmdbd.conf file.
155
156       AccountingStorageTRES
157              Comma-separated list of resources you wish to track on the clus‐
158              ter.   These  are the resources requested by the sbatch/srun job
159              when it is submitted. Currently this consists of  any  GRES,  BB
160              (burst  buffer) or license along with CPU, Memory, Node, Energy,
161              FS/[Disk|Lustre], IC/OFED, Pages, and VMem. By default  Billing,
162              CPU,  Energy, Memory, Node, FS/Disk, Pages and VMem are tracked.
163              These default TRES cannot be disabled,  but  only  appended  to.
164              AccountingStorageTRES=gres/craynetwork,license/iop1  will  track
165              billing, cpu, energy, memory, nodes,  fs/disk,  pages  and  vmem
166              along with a gres called craynetwork as well as a license called
167              iop1. Whenever these resources are used on the cluster they  are
168              recorded.  The  TRES are automatically set up in the database on
169              the start of the slurmctld.
170
171              If multiple GRES of different types are tracked  (e.g.  GPUs  of
172              different  types), then job requests with matching type specifi‐
173              cations will be recorded.  Given a  configuration  of  "Account‐
174              ingStorageTRES=gres/gpu,gres/gpu:tesla,gres/gpu:volta", then
175              "gres/gpu:tesla" and "gres/gpu:volta" will track only jobs  that
176              explicitly  request  those  two GPU types, while "gres/gpu" will
177              track allocated GPUs of any type ("tesla", "volta" or any  other
178              GPU type).
179
180              Given      a      configuration      of      "AccountingStorage‐
181              TRES=gres/gpu:tesla,gres/gpu:volta", then "gres/gpu:tesla" and
182              "gres/gpu:volta"  will  track jobs that explicitly request those
183              GPU types.  If a job requests  GPUs,  but  does  not  explicitly
184              specify  the  GPU type, then its resource allocation will be ac‐
185              counted for as either "gres/gpu:tesla" or "gres/gpu:volta",  al‐
186              though  the  accounting  may not match the actual GPU type allo‐
187              cated to the job and the GPUs allocated to the job could be het‐
188              erogeneous.  In an environment containing various GPU types, use
189              of a job_submit plugin may be desired in order to force jobs  to
190              explicitly specify some GPU type.
191
192       AccountingStorageType
193              The  accounting  storage  mechanism  type.  Acceptable values at
194              present include "accounting_storage/none" and  "accounting_stor‐
195              age/slurmdbd".   The  "accounting_storage/slurmdbd"  value indi‐
196              cates that accounting records will be written to the Slurm  DBD,
197              which  manages  an underlying MySQL database. See "man slurmdbd"
198              for more information.  The default  value  is  "accounting_stor‐
199              age/none" and indicates that account records are not maintained.
200
201       AccountingStorageUser
202              The  user account for accessing the accounting storage database.
203              Only used for database type storage plugins, ignored otherwise.
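
              Taken together, a minimal SlurmDBD-backed accounting setup might
              look like the following (the host name is illustrative):

                   AccountingStorageType=accounting_storage/slurmdbd
                   AccountingStorageHost=dbd.example.com
                   AccountingStoragePort=6819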
204
205       AccountingStoreFlags
206              Comma-separated list used to tell the slurmctld to store extra
207              fields that may be more heavyweight than the normal job infor‐
208              mation.
209
210              Current options are:
211
212              job_comment
213                     Include the job's comment field in the job complete  mes‐
214                     sage  sent  to the Accounting Storage database.  Note the
215                     AdminComment and SystemComment are always recorded in the
216                     database.
217
218              job_env
219                     Include  a  batch job's environment variables used at job
220                     submission in the job start message sent to the  Account‐
221                     ing Storage database.
222
223              job_script
224                     Include  the  job's batch script in the job start message
225                     sent to the Accounting Storage database.
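
              For example, to also store the batch script and comment field
              for each job:

                   AccountingStoreFlags=job_comment,job_script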
226
227       AcctGatherNodeFreq
228              The AcctGather plugins' sampling interval for node accounting.
229              For AcctGather plugin values of none, this parameter is ignored.
230              For all other values this parameter is the number of seconds be‐
231              tween  node  accounting samples. For the acct_gather_energy/rapl
232              plugin, set a value less than 300 because the counters may over‐
233              flow beyond this rate.  The default value is zero, which
234              disables accounting sampling for nodes.  Note: The accounting
235              sampling  interval for jobs is determined by the value of JobAc‐
236              ctGatherFrequency.
237
238       AcctGatherEnergyType
239              Identifies the plugin to be used for energy consumption account‐
240              ing.   The  jobacct_gather  plugin  and  slurmd daemon call this
241              plugin to collect energy consumption data for  jobs  and  nodes.
242              The  collection  of  energy  consumption data takes place on the
243              node level, hence only in case of exclusive job  allocation  the
244              node level, hence only in the case of an exclusive job allocation
245              will the energy consumption measurements reflect the job's real con‐
246              sumption. In the case of node sharing between jobs, the reported con‐
247              the real energy consumed by the jobs.
248
249              Configurable values at present are:
250
251              acct_gather_energy/none
252                                  No energy consumption data is collected.
253
254              acct_gather_energy/ipmi
255                                  Energy consumption data  is  collected  from
256                                  the  Baseboard  Management  Controller (BMC)
257                                  using the  Intelligent  Platform  Management
258                                  Interface (IPMI).
259
260              acct_gather_energy/pm_counters
261                                  Energy  consumption  data  is collected from
262                                  the Baseboard  Management  Controller  (BMC)
263                                  for HPE Cray systems.
264
265              acct_gather_energy/rapl
266                                  Energy  consumption  data  is collected from
267                                  hardware sensors using the  Running  Average
268                                  Power  Limit (RAPL) mechanism. Note that en‐
269                                  abling RAPL may require the execution of the
270                                  command "sudo modprobe msr".
271
272              acct_gather_energy/xcc
273                                  Energy  consumption  data  is collected from
274                                  the Lenovo SD650 XClarity  Controller  (XCC)
275                                  using IPMI OEM raw commands.
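
              For example, RAPL-based energy accounting sampled every 30
              seconds (the interval shown is illustrative) could be set as:

                   AcctGatherEnergyType=acct_gather_energy/rapl
                   AcctGatherNodeFreq=30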
276
277       AcctGatherInterconnectType
278              Identifies  the plugin to be used for interconnect network traf‐
279              fic accounting.  The jobacct_gather  plugin  and  slurmd  daemon
280              call  this  plugin  to collect network traffic data for jobs and
281              nodes.  The collection of network traffic data  takes  place  on
282              the node level, hence only in the case of an exclusive job allocation
283              will the collected values reflect the job's real traffic. In the
284              case of node sharing between jobs, the reported network traffic
285              per job (through sstat or sacct) will not reflect the real  net‐
286              work traffic by the jobs.
287
288              Configurable values at present are:
289
290              acct_gather_interconnect/none
291                                  No infiniband network data are collected.
292
293              acct_gather_interconnect/ofed
294                                  Infiniband  network  traffic  data  are col‐
295                                  lected from the hardware monitoring counters
296                                  of  Infiniband  devices through the OFED li‐
297                                  brary.  In order to account for per job net‐
298                                  work  traffic, add the "ic/ofed" TRES to Ac‐
299                                  countingStorageTRES.
300
301              acct_gather_interconnect/sysfs
302                                  Network  traffic  statistics  are  collected
303                                  from  the  Linux sysfs pseudo-filesystem for
304                                  specific     interfaces      defined      in
305                                  acct_gather_interconnect.conf(5).   In order
306                                  to account for per job network traffic,  add
307                                  the  "ic/sysfs"  TRES  to AccountingStorage‐
308                                  TRES.
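
              For example, to collect OFED interconnect statistics and account
              for them per job:

                   AcctGatherInterconnectType=acct_gather_interconnect/ofed
                   AccountingStorageTRES=ic/ofed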
309
310       AcctGatherFilesystemType
311              Identifies the plugin to be used for filesystem traffic account‐
312              ing.   The  jobacct_gather  plugin  and  slurmd daemon call this
313              plugin to collect filesystem traffic data for  jobs  and  nodes.
314              The  collection  of  filesystem  traffic data takes place on the
315              node level, hence only in the case of an exclusive job allocation
316              will the collected values reflect the job's real traffic. In the case
317              of node sharing between jobs, the reported filesystem traffic per
318              job  (through sstat or sacct) will not reflect the real filesys‐
319              tem traffic by the jobs.
320
321
322              Configurable values at present are:
323
324              acct_gather_filesystem/none
325                                  No filesystem data are collected.
326
327              acct_gather_filesystem/lustre
328                                  Lustre filesystem traffic data are collected
329                                  from the counters found in /proc/fs/lustre/.
330                                  In order to account for per job lustre traf‐
331                                  fic,  add  the  "fs/lustre" TRES to Account‐
332                                  ingStorageTRES.
333
334       AcctGatherProfileType
335              Identifies the plugin to be used  for  detailed  job  profiling.
336              The  jobacct_gather plugin and slurmd daemon call this plugin to
337              collect detailed data such as I/O counts, memory usage,  or  en‐
338              ergy  consumption  for  jobs  and nodes. There are interfaces in
339              this plugin to collect data at step start and completion, task
340              start  and  completion, and at the account gather frequency. The
341              data collected at the node level is related to jobs only in case
342              of exclusive job allocation.
343
344              Configurable values at present are:
345
346              acct_gather_profile/none
347                                  No profile data is collected.
348
349              acct_gather_profile/hdf5
350                                  This  enables the HDF5 plugin. The directory
351                                  where the profile files are stored and which
352                                  values  are  collected are configured in the
353                                  acct_gather.conf file.
354
355              acct_gather_profile/influxdb
356                                  This enables the influxdb  plugin.  The  in‐
357                                  fluxdb instance host, port, database, reten‐
358                                  tion policy and which values  are  collected
359                                  are configured in the acct_gather.conf file.
360
361       AllowSpecResourcesUsage
362              If set to "YES", Slurm allows individual jobs to override a node's
363              configured CoreSpecCount value. For a job to take  advantage  of
364              this feature, a command line option of --core-spec must be spec‐
365              ified.  The default value for this option is "YES" for Cray sys‐
366              tems and "NO" for other system types.
367
368       AuthAltTypes
369              Comma-separated  list of alternative authentication plugins that
370              the slurmctld will permit for communication.  Acceptable  values
371              at present include auth/jwt.
372
373              NOTE:  auth/jwt  requires a jwt_hs256.key to be populated in the
374              StateSaveLocation   directory   for    slurmctld    only.    The
375              jwt_hs256.key  should only be visible to the SlurmUser and root.
376              It should not be placed on any nodes other than
377              the  controller running slurmctld.  auth/jwt can be activated by
378              the presence of the SLURM_JWT environment variable.  When  acti‐
379              vated, it will override the default AuthType.
380
381       AuthAltParameters
382              Used to define options for alternative authentication plugins. Mul‐
383              tiple options may be comma separated.
384
385              disable_token_creation
386                             Disable "scontrol token" use by non-SlurmUser ac‐
387                             counts.
388
389              max_token_lifespan=<seconds>
390                             Set  max lifespan (in seconds) for any token gen‐
391                             erated for user accounts.  (This limit  does  not
392                             apply to SlurmUser.)
393
394              jwks=          Absolute  path  to JWKS file. Only RS256 keys are
395                             supported, although other key types may be listed
396                             in  the file. If set, no HS256 key will be loaded
397                             by default (and token  generation  is  disabled),
398                             although  the  jwt_key setting may be used to ex‐
399                             plicitly re-enable HS256 key use (and token  gen‐
400                             eration).
401
402              jwt_key=       Absolute path to JWT key file. Key must be HS256,
403                             and should only be accessible  by  SlurmUser.  If
404                             not set, the default key file is jwt_hs256.key in
405                             StateSaveLocation.
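
              For example, JWT authentication with a site-specific token
              lifespan might be configured as follows (the key path and
              lifespan are illustrative):

                   AuthAltTypes=auth/jwt
                   AuthAltParameters=jwt_key=/var/spool/slurmctld/jwt_hs256.key,max_token_lifespan=28800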
406
407       AuthInfo
408              Additional information to be used for authentication of communi‐
409              cations between the Slurm daemons (slurmctld and slurmd) and the
410              Slurm clients.  The interpretation of this option is specific to
411              the configured AuthType.  Multiple options may be specified in a
412              comma-delimited list.  If not specified, the default authentica‐
413              tion information will be used.
414
415              cred_expire   Default  job  step credential lifetime, in seconds
416                            (e.g. "cred_expire=1200").  It must be long enough
417                            to load the user environment, run the prolog, deal
418                            with the slurmd getting paged out of
419                            memory,  etc.   This  also controls how long a re‐
420                            queued job must wait before starting  again.   The
421                            default value is 120 seconds.
422
423              socket        Path  name  to  a MUNGE daemon socket to use (e.g.
424                            "socket=/var/run/munge/munge.socket.2").  The  de‐
425                            fault  value  is  "/var/run/munge/munge.socket.2".
426                            Used by auth/munge and cred/munge.
427
428              ttl           Credential lifetime, in seconds (e.g.  "ttl=300").
429                            The  default value is dependent upon the MUNGE in‐
430                            stallation, but is typically 300 seconds.
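
              For example, to set the MUNGE socket path explicitly and extend
              the job step credential lifetime (values are illustrative):

                   AuthInfo=socket=/var/run/munge/munge.socket.2,cred_expire=300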
431
432       AuthType
433              The authentication method for communications between Slurm  com‐
434              ponents.   Acceptable  values  at  present include "auth/munge",
435              which is the default.  "auth/munge" indicates that MUNGE  is  to
436              be  used.  (See "https://dun.github.io/munge/" for more informa‐
437              tion).  All Slurm daemons and commands must be terminated  prior
438              to changing the value of AuthType and later restarted.
439
440       BackupAddr
441              Deprecated option, see SlurmctldHost.
442
443       BackupController
444              Deprecated option, see SlurmctldHost.
445
446              The backup controller recovers state information from the State‐
447              SaveLocation directory, which must be readable and writable from
448              both  the  primary and backup controllers.  While not essential,
449              it is recommended that you specify  a  backup  controller.   See
450              the RELOCATING CONTROLLERS section if you change this.
451
452       BatchStartTimeout
453              The maximum time (in seconds) that a batch job is allowed to take
454              to launch before being considered missing and releasing the al‐
455              location.  The default value is 10 seconds.  Larger values may
456              be needed if more time is required to execute the Prolog, load
457              user  environment  variables, or if the slurmd daemon gets paged
458              from memory.
459              Note: The test for a job being  successfully  launched  is  only
460              performed  when  the  Slurm daemon on the compute node registers
461              state with the slurmctld daemon on the head node, which  happens
462              fairly  rarely.   Therefore a job will not necessarily be termi‐
463              nated if its start time exceeds BatchStartTimeout.  This config‐
464              uration parameter also applies to task launch, to avoid
465              aborting srun commands due to long-running Prolog scripts.
466
467       BcastExclude
468              Comma-separated list of absolute directory paths to be  excluded
469              when autodetecting and broadcasting executable shared object de‐
470              pendencies through sbcast or srun --bcast.  The  keyword  "none"
471              can  be  used  to indicate that no directory paths should be ex‐
472              cluded. The default value is  "/lib,/usr/lib,/lib64,/usr/lib64".
473              This  option  can  be  overridden  by  sbcast --exclude and srun
474              --bcast-exclude.
475
476       BcastParameters
477              Controls sbcast and srun --bcast behavior. Multiple options  can
478              be  specified  in  a comma separated list.  Supported values in‐
479              clude:
480
481              DestDir=       Destination directory for file being broadcast to
482                             allocated  compute  nodes.  Default value is cur‐
483                             rent working directory, or --chdir  for  srun  if
484                             set.
485
486              Compression=   Specify  default  file  compression library to be
487                             used.  Supported values  are  "lz4"  and  "none".
488                             The  default value with the sbcast --compress op‐
489                             tion is "lz4" and "none"  otherwise.   Some  com‐
490                             pression  libraries  may  be  unavailable on some
491                             systems.
492
493              send_libs      If set, attempt to autodetect and  broadcast  the
494                             executable's  shared object dependencies to allo‐
495                             cated compute nodes. The files are  placed  in  a
496                             directory  alongside  the  executable.  For  srun
497                             only, the LD_LIBRARY_PATH  is  automatically  up‐
498                             dated  to  include  this cache directory as well.
499                             This can be overridden with either sbcast or srun
500                             --send-libs option. By default this is disabled.
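
              For example, to broadcast into a scratch directory with lz4
              compression and shared library broadcasting enabled (the
              destination path is illustrative):

                   BcastParameters=DestDir=/tmp/bcast,Compression=lz4,send_libs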
501
502       BurstBufferType
503              The  plugin  used  to manage burst buffers. Acceptable values at
504              present are:
505
506              burst_buffer/datawarp
507                     Use Cray DataWarp API to provide burst buffer functional‐
508                     ity.
509
510              burst_buffer/lua
511                     This plugin provides hooks to an API that is defined by a
512                     Lua script. This plugin was developed to  provide  system
513                     administrators  with  a way to do any task (not only file
514                     staging) at different points in a job’s life cycle.
515
516              burst_buffer/none
517
518       CliFilterPlugins
519              A comma-delimited list of command  line  interface  option  fil‐
520              ter/modification plugins. The specified plugins will be executed
521              in the order listed.  No cli_filter plugins are used by default.
522              Acceptable values at present are:
523
524              cli_filter/lua
525                     This  plugin  allows you to write your own implementation
526                     of a cli_filter using lua.
527
528              cli_filter/syslog
529                     This plugin enables logging of job submission  activities
530                     performed.  All the salloc/sbatch/srun options are logged
531                     to syslog together with  environment  variables  in  JSON
532                     format.  If the plugin is not the last one in the list it
533                     may log values different than what was actually  sent  to
534                     slurmctld.
535
536              cli_filter/user_defaults
537                     This  plugin looks for the file $HOME/.slurm/defaults and
538                     reads every line of it as a key=value pair, where key  is
539                     any  of  the  job  submission  options  available to sal‐
540                     loc/sbatch/srun and value is a default value  defined  by
541                     the user. For instance:
542                     time=1:30
543                     mem=2048
544                     The  above will result in a user defined default for each
545                     of their jobs of "-t 1:30" and "--mem=2048".
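
              For example, to apply user defaults and then log the resulting
              submission options (syslog listed last so it records the final
              values):

                   CliFilterPlugins=cli_filter/user_defaults,cli_filter/syslog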
546
547       ClusterName
548              The name by which this Slurm managed cluster is known in the ac‐
549       counting database. This is needed to distinguish accounting
550              records when multiple clusters report to the same database.  Be‐
551              cause  of  limitations in some databases, any upper case letters
552              in the name will be silently mapped to lower case. In  order  to
553              avoid confusion, it is recommended that the name be lower case.
554
555       CommunicationParameters
556              Comma-separated list of communication options.
557
558              block_null_hash
559                             Require  all  Slurm  authentication tokens to in‐
560                             clude a newer (20.11.9 and 21.08.8) payload  that
561                             provides  an additional layer of security against
562                             credential replay attacks.   This  option  should
563                             only  be enabled once all Slurm daemons have been
564                             upgraded to 20.11.9/21.08.8  or  newer,  and  all
565                             jobs  that  were  started before the upgrade have
566                             been completed.
567
568              CheckGhalQuiesce
569                             Used specifically on a Cray using an  Aries  Ghal
570                             interconnect.  This will check to see if the sys‐
571                             tem is quiescing when sending a message,  and  if
572                             so, we wait until it is done before sending.
573
574              DisableIPv4    Disable IPv4 only operation for all slurm daemons
575                             (except slurmdbd). This should  also  be  set  in
576                             your slurmdbd.conf file.
577
578              EnableIPv6     Enable using IPv6 addresses for all slurm daemons
579                             (except slurmdbd). When using both IPv4 and IPv6,
580                             address  family preferences will be based on your
581                             /etc/gai.conf file. This should also  be  set  in
582                             your slurmdbd.conf file.
583
584              keepaliveinterval=#
585                             Specifies  the  interval between keepalive probes
586                             on the socket communications between srun and its
587                             slurmstepd process.
588
589              keepaliveprobes=#
590                             Specifies  the number of keepalive probes sent on
591                             the socket communications  between  srun  command
592                             and  its slurmstepd process before the connection
593                             is considered broken.
594
595              keepalivetime=#
596                             Specifies how long socket communications used
597                             between  the  srun  command  and  its  slurmstepd
598                             process are kept alive after  disconnect.  Longer
599                             values can be used to improve reliability of com‐
600                             munications in the event of network failures.
601
602              NoAddrCache    By default, Slurm will cache a node's network ad‐
603                             dress after it has been successfully resolved.
604                             This option disables the cache
605                             and Slurm will look up the node's network address
606                             each time a connection is made.  This is  useful,
607                             for  example,  in  a  cloud environment where the
608                             node addresses come and go out of DNS.
609
610              NoCtldInAddrAny
611                             Used to directly bind to the address of what  the
612                             node resolves to running the slurmctld instead of
613                             binding messages to  any  address  on  the  node,
614                             which is the default.
615
616              NoInAddrAny    Used  to directly bind to the address of what the
617                             node resolves to instead of binding  messages  to
618                             any  address  on  the  node which is the default.
619                             This option is for all daemons/clients except for
620                             the slurmctld.
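
              For example, a cloud-style cluster that needs IPv6 support and
              fresh DNS lookups for every connection might use:

                   CommunicationParameters=EnableIPv6,NoAddrCache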
621
622       CompleteWait
623              The  time to wait, in seconds, when any job is in the COMPLETING
624              state before any additional jobs are scheduled. This is  to  at‐
625              tempt  to keep jobs on nodes that were recently in use, with the
626              goal of preventing fragmentation.  If set to zero, pending  jobs
627              will  be  started as soon as possible.  Since a COMPLETING job's
628              resources are released for use by other jobs as soon as the Epi‐
629              log  completes  on each individual node, this can result in very
630              fragmented resource allocations.  To provide jobs with the mini‐
631              mum  response time, a value of zero is recommended (no waiting).
632              To minimize fragmentation of resources, a value equal  to  Kill‐
633              Wait plus two is recommended.  In that case, setting KillWait to
634              a small value may be beneficial.  The default value of Complete‐
635              Wait is zero seconds.  The value may not exceed 65533.
636
637              NOTE:  Setting  reduce_completing_frag  affects  the behavior of
638              CompleteWait.
639
640       ControlAddr
641              Deprecated option, see SlurmctldHost.
642
643       ControlMachine
644              Deprecated option, see SlurmctldHost.
645
646       CoreSpecPlugin
647              Identifies the plugins to be used for enforcement of  core  spe‐
648              cialization.   A  restart  of the slurmd daemons is required for
649              changes to this parameter to take effect.  Acceptable values  at
650              present include:
651
652              core_spec/cray_aries
653                                  used only for Cray systems
654
655              core_spec/none      used for all other system types
656
657       CpuFreqDef
658              Default  CPU  frequency  value or frequency governor to use when
659              running a job step if it has not been explicitly  set  with  the
660              --cpu-freq  option.   Acceptable values at present include a nu‐
661              meric value (frequency in kilohertz) or  one  of  the  following
662              governors:
663
664              Conservative  attempts to use the Conservative CPU governor
665
666              OnDemand      attempts to use the OnDemand CPU governor
667
668              Performance   attempts to use the Performance CPU governor
669
670              PowerSave     attempts to use the PowerSave CPU governor
671       There is no default value. If CpuFreqDef is unset and the --cpu-freq
672       option is not used, no attempt is made to set the governor.
673
674       CpuFreqGovernors
675              List of CPU frequency governors allowed to be set with the  sal‐
676              loc,  sbatch,  or srun option  --cpu-freq.  Acceptable values at
677              present include:
678
679              Conservative  attempts to use the Conservative CPU governor
680
681              OnDemand      attempts to use the OnDemand CPU governor  (a  de‐
682                            fault value)
683
684              Performance   attempts  to  use  the Performance CPU governor (a
685                            default value)
686
687              PowerSave     attempts to use the PowerSave CPU governor
688
689              SchedUtil     attempts to use the SchedUtil CPU governor
690
691              UserSpace     attempts to use the UserSpace CPU governor (a  de‐
692                            fault value)
693       The default is OnDemand, Performance and UserSpace.
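
       For example, to default job steps to the Performance governor while
       also permitting users to request OnDemand or UserSpace:

              CpuFreqDef=Performance
              CpuFreqGovernors=OnDemand,Performance,UserSpace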
694
695       CredType
696              The  cryptographic  signature tool to be used in the creation of
697              job step credentials.  A restart of slurmctld  is  required  for
698              changes to this parameter to take effect.  The default (and rec‐
699              ommended) value is "cred/munge".
700
701       DebugFlags
702              Defines specific subsystems which should provide  more  detailed
703              event  logging.  Multiple subsystems can be specified with comma
704              separators.  Most DebugFlags will result in  verbose-level  log‐
705              ging  for  the  identified  subsystems, and could impact perfor‐
706              mance.
707
708              NOTE: You can also set  debug  flags  by  having  the  SLURM_DE‐
709              BUG_FLAGS  environment  variable  defined with the desired flags
710              when the process (client command, daemon, etc.) is started.  The
711              environment  variable  takes  precedence over the setting in the
712              slurm.conf.
713
714              Valid subsystems available include:
715
716              Accrue           Accrue counters accounting details
717
718              Agent            RPC agents (outgoing RPCs from Slurm daemons)
719
720              Backfill         Backfill scheduler details
721
722              BackfillMap      Backfill scheduler to log a very verbose map of
723                               reserved  resources  through time. Combine with
724                               Backfill for a verbose and complete view of the
725                               backfill scheduler's work.
726
727              BurstBuffer      Burst Buffer plugin
728
729              Cgroup           Cgroup details
730
731              CPU_Bind         CPU binding details for jobs and steps
732
733              CpuFrequency     Cpu  frequency details for jobs and steps using
734                               the --cpu-freq option.
735
736              Data             Generic data structure details.
737
738              Dependency       Job dependency debug info
739
740              Elasticsearch    Elasticsearch debug info
741
742              Energy           AcctGatherEnergy debug info
743
744              ExtSensors       External Sensors debug info
745
746              Federation       Federation scheduling debug info
747
748              FrontEnd         Front end node details
749
750              Gres             Generic resource details
751
752              Hetjob           Heterogeneous job details
753
754              Gang             Gang scheduling details
755
756              JobAccountGather Common  job  account  gathering  details   (not
757                               plugin specific).
758
759              JobContainer     Job container plugin details
760
761              License          License management details
762
763              Network          Network  details. Warning: activating this flag
764                               may cause logging of passwords, tokens or other
765                               authentication credentials.
766
767              NetworkRaw       Dump  raw  hex values of key Network communica‐
768                               tions. Warning: This flag will cause very  ver‐
769                               bose  logs  and may cause logging of passwords,
770                               tokens or other authentication credentials.
771
772              NodeFeatures     Node Features plugin debug info
773
774              NO_CONF_HASH     Do not log when the slurm.conf files differ be‐
775                               tween Slurm daemons
776
777              Power            Power  management  plugin  and power save (sus‐
778                               pend/resume programs) details
779
780              Priority         Job prioritization
781
782              Profile          AcctGatherProfile plugins details
783
784              Protocol         Communication protocol details
785
786              Reservation      Advanced reservations
787
788              Route            Message forwarding debug info
789
790              Script           Debug info  regarding  the  process  that  runs
791                               slurmctld  scripts  such as PrologSlurmctld and
792                               EpilogSlurmctld
793
794              SelectType       Resource selection plugin
795
796              Steps            Slurmctld resource allocation for job steps
797
798              Switch           Switch plugin
799
800              TimeCray         Timing of Cray APIs
801
802              TraceJobs        Trace jobs in slurmctld. It will print detailed
803                               job  information  including  state, job ids and
804                               allocated node count.
805
806              Triggers         Slurmctld triggers
807
808              WorkQueue        Work Queue details
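
              For example, to troubleshoot backfill scheduling decisions with
              verbose logging:

                   DebugFlags=Backfill,BackfillMap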
809
810       DefCpuPerGPU
811              Default count of CPUs allocated per allocated GPU. This value is
812              used   only  if  the  job  didn't  specify  --cpus-per-task  and
813              --cpus-per-gpu.
814
815       DefMemPerCPU
816              Default real memory size available per usable allocated  CPU  in
817              megabytes.   Used  to  avoid over-subscribing memory and causing
818              paging.  DefMemPerCPU would generally be used if individual pro‐
819              cessors are allocated to jobs (SelectType=select/cons_res or Se‐
820              lectType=select/cons_tres).  The default value is 0 (unlimited).
821              Also  see DefMemPerGPU, DefMemPerNode and MaxMemPerCPU.  DefMem‐
822              PerCPU, DefMemPerGPU and DefMemPerNode are mutually exclusive.
823
824
825              NOTE: This applies to usable allocated CPUs in a job allocation.
826              This  is important when more than one thread per core is config‐
827              ured.  If a job requests --threads-per-core with  fewer  threads
828              on  a core than exist on the core (or --hint=nomultithread which
829              implies --threads-per-core=1), the job will  be  unable  to  use
830              those  extra  threads  on the core and those threads will not be
831              included in the memory per CPU calculation. But if the  job  has
832              access  to  all  threads  on the core, those threads will be in‐
833              cluded in the memory per CPU calculation even if the job did not
834              explicitly request those threads.
835
836              In the following examples, each core has two threads.
837
838              In  this  first  example,  two  tasks can run on separate hyper‐
839              threads in the same core because --threads-per-core is not used.
840              The  third  task uses both threads of the second core. The allo‐
841              cated memory per cpu includes all threads:
842
843              $ salloc -n3 --mem-per-cpu=100
844              salloc: Granted job allocation 17199
845              $ sacct -j $SLURM_JOB_ID -X -o jobid%7,reqtres%35,alloctres%35
846                JobID                             ReqTRES                           AllocTRES
847              ------- ----------------------------------- -----------------------------------
848                17199     billing=3,cpu=3,mem=300M,node=1     billing=4,cpu=4,mem=400M,node=1
849
850              In this second example, because  of  --threads-per-core=1,  each
851              task  is  allocated  an  entire core but is only able to use one
852              thread per core. Allocated CPUs includes  all  threads  on  each
853              core. However, allocated memory per cpu includes only the usable
854              thread in each core.
855
856              $ salloc -n3 --mem-per-cpu=100 --threads-per-core=1
857              salloc: Granted job allocation 17200
858              $ sacct -j $SLURM_JOB_ID -X -o jobid%7,reqtres%35,alloctres%35
859                JobID                             ReqTRES                           AllocTRES
860              ------- ----------------------------------- -----------------------------------
861                17200     billing=3,cpu=3,mem=300M,node=1     billing=6,cpu=6,mem=300M,node=1
862
863       DefMemPerGPU
864              Default  real  memory  size  available  per  allocated  GPU   in
865              megabytes.   The  default  value  is  0  (unlimited).   Also see
866              DefMemPerCPU and DefMemPerNode.  DefMemPerCPU, DefMemPerGPU  and
867              DefMemPerNode are mutually exclusive.
868
869       DefMemPerNode
870              Default  real  memory  size  available  per  allocated  node  in
871              megabytes.  Used to avoid over-subscribing  memory  and  causing
872              paging.   DefMemPerNode  would  generally be used if whole nodes
873              are allocated to jobs (SelectType=select/linear)  and  resources
874              are  over-subscribed (OverSubscribe=yes or OverSubscribe=force).
875              The default value is  0  (unlimited).   Also  see  DefMemPerCPU,
876              DefMemPerGPU  and  MaxMemPerCPU.  DefMemPerCPU, DefMemPerGPU and
877              DefMemPerNode are mutually exclusive.
878
879       DependencyParameters
880              Multiple options may be comma separated.
881
882              disable_remote_singleton
883                     By default, when a federated job has a  singleton  depen‐
884                     dency, each cluster in the federation must clear the sin‐
885                     gleton dependency before the job's  singleton  dependency
886                     is  considered satisfied. Enabling this option means that
887                     only the origin cluster must clear the  singleton  depen‐
888                     dency.  This  option  must be set in every cluster in the
889                     federation.
890
891              kill_invalid_depend
892                     If a job has an invalid dependency and it can never run,
893                     terminate it and set its state to be JOB_CANCELLED. By
894                     default the job stays pending with reason  DependencyNev‐
895                     erSatisfied.
896
897              max_depend_depth=#
898                     Maximum  number of jobs to test for a circular job depen‐
899                     dency. Stop testing after this number of job dependencies
900                     have been tested. The default value is 10 jobs.
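
              For example, to cancel jobs whose dependencies can never be
              satisfied and to test deeper dependency chains (the depth value
              is illustrative):

                   DependencyParameters=kill_invalid_depend,max_depend_depth=20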
901
902       DisableRootJobs
903              If  set  to  "YES" then user root will be prevented from running
904              any jobs.  The default value is "NO", meaning user root will  be
905              able to execute jobs.  DisableRootJobs may also be set by parti‐
906              tion.
907
908       EioTimeout
909              The number of seconds srun waits for  slurmstepd  to  close  the
910              TCP/IP  connection  used to relay data between the user applica‐
911              tion and srun when the user application terminates. The  default
912              value is 60 seconds.  May not exceed 65533.
913
914       EnforcePartLimits
915              If set to "ALL" then jobs which exceed a partition's size and/or
916              time limits will be rejected at submission time. If job is  sub‐
917              mitted  to  multiple partitions, the job must satisfy the limits
918              on all the requested partitions. If set to  "NO"  then  the  job
919              will  be  accepted  and remain queued until the partition limits
920              are altered (Time and Node Limits).  If set to "ANY" a job must
921              satisfy any of the requested partitions to be submitted. The de‐
922              fault value is "NO".  NOTE: If set, then a job's QOS cannot be
923              used to exceed partition limits.  NOTE: The partition limits be‐
924              ing considered are its configured  MaxMemPerCPU,  MaxMemPerNode,
925              MinNodes,  MaxNodes,  MaxTime, AllocNodes, AllowAccounts, Allow‐
926              Groups, AllowQOS, and QOS usage threshold.
927
928       Epilog Fully qualified pathname of a script to execute as user root  on
929              every   node   when  a  user's  job  completes  (e.g.  "/usr/lo‐
930              cal/slurm/epilog"). A glob pattern (See glob (7))  may  also  be
931              used  to  run more than one epilog script (e.g. "/etc/slurm/epi‐
932              log.d/*"). The Epilog script or scripts may  be  used  to  purge
933              files,  disable user login, etc.  By default there is no epilog.
934              See Prolog and Epilog Scripts for more information.
935
936       EpilogMsgTime
937              The number of microseconds that the slurmctld daemon requires to
938              process  an  epilog  completion message from the slurmd daemons.
939              This parameter can be used to prevent a burst of epilog  comple‐
940              tion messages from being sent at the same time which should help
941              prevent lost messages and improve  throughput  for  large  jobs.
942              The  default  value  is 2000 microseconds.  For a 1000 node job,
943              this spreads the epilog completion messages out  over  two  sec‐
944              onds.
945
946       EpilogSlurmctld
947              Fully  qualified pathname of a program for the slurmctld to exe‐
948              cute upon termination  of  a  job  allocation  (e.g.   "/usr/lo‐
949              cal/slurm/epilog_controller").   The  program  executes as Slur‐
950              mUser, which gives it permission to drain nodes and requeue  the
951              job  if  a  failure  occurs (See scontrol(1)).  Exactly what the
952              program does and how it accomplishes this is completely  at  the
953              discretion  of  the system administrator.  Information about the
954              job being initiated, its allocated nodes, etc. are passed to the
955              program  using  environment  variables.   See  Prolog and Epilog
956              Scripts for more information.
957
958       ExtSensorsFreq
959              The external  sensors  plugin  sampling  interval.   If  ExtSen‐
960              sorsType=ext_sensors/none,  this  parameter is ignored.  For all
961              other values of ExtSensorsType, this parameter is the number  of
962              seconds between external sensors samples for hardware components
963              (nodes, switches, etc.) The default value is  zero.  This  value
964              (nodes, switches, etc.).  The default value is zero, which
965              disables external sensors sampling.  Note: This parameter does
966
967       ExtSensorsType
968              Identifies the plugin to be used for external sensors data  col‐
969              lection.   Slurmctld  calls this plugin to collect external sen‐
970              sors data for jobs/steps and hardware  components.  In  case  of
971              node  sharing  between  jobs  the  reported  values per job/step
972              (through sstat or sacct) may not be  accurate.   See  also  "man
973              ext_sensors.conf".
974
975              Configurable values at present are:
976
977              ext_sensors/none    No external sensors data is collected.
978
979              ext_sensors/rrd     External  sensors data is collected from the
980                                  RRD database.
981
982       FairShareDampeningFactor
983              Dampen the effect of exceeding a user or group's fair  share  of
984              allocated resources. Higher values will provide greater ability
985              to differentiate between exceeding the fair share at high levels
986              (e.g. a value of 1 results in almost no difference between over‐
987              consumption by a factor of 10 and 100, while a value of  5  will
988              result  in  a  significant difference in priority).  The default
989              value is 1.
990
991       FederationParameters
992              Used to define federation options. Multiple options may be comma
993              separated.
994
995              fed_display
996                     If  set,  then  the  client status commands (e.g. squeue,
997                     sinfo, sprio, etc.) will display information in a  feder‐
998                     ated view by default. This option is functionally equiva‐
999                     lent to using the --federation options on  each  command.
1000                     Use the client's --local option to override the federated
1001                     view and get a local view of the given cluster.
1002
1003       FirstJobId
1004              The job id to be used for the first job submitted to Slurm.  Job
1005              id values generated will be incremented by 1 for each subsequent
1006              job.  Value must be larger than 0. The default value is 1.  Also
1007              see MaxJobId.
1008
1009       GetEnvTimeout
1010              Controls  how  long the job should wait (in seconds) to load the
1011              user's environment before attempting to load  it  from  a  cache
1012              file.   Applies  when the salloc or sbatch --get-user-env option
1013              is used.  If set to 0 then always load  the  user's  environment
1014              from the cache file.  The default value is 2 seconds.
1015
1016       GresTypes
1017              A  comma-delimited list of generic resources to be managed (e.g.
1018              GresTypes=gpu,mps).  These resources may have an associated GRES
1019              plugin  of the same name providing additional functionality.  No
1020              generic resources are managed by default.  Ensure this parameter
1021              is  consistent across all nodes in the cluster for proper opera‐
1022              tion.  A restart of slurmctld and the slurmd daemons is required
1023              for this to take effect.
1024
1025       GroupUpdateForce
1026              If  set  to a non-zero value, then information about which users
1027              are members of groups allowed to use a partition will be updated
1028              periodically,  even  when  there  have  been  no  changes to the
1029              /etc/group file.  If set to zero, group member information  will
1030              be  updated  only after the /etc/group file is updated.  The de‐
1031              fault value is 1.  Also see the GroupUpdateTime parameter.
1032
1033       GroupUpdateTime
1034              Controls how frequently information about which users  are  mem‐
1035              bers  of  groups allowed to use a partition will be updated, and
1036              how long user group membership lists will be cached.   The  time
1037              interval  is  given  in seconds with a default value of 600 sec‐
1038              onds.  A value of zero will prevent periodic updating  of  group
1039              membership  information.   Also see the GroupUpdateForce parame‐
1040              ter.
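
                  As a sketch only, the following states the documented defaults
                  explicitly; it is not tuning advice:

                         GroupUpdateForce=1
                         GroupUpdateTime=600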
1041
1042       GpuFreqDef=[<type>=]<value>[,<type>=<value>]
1043              Default GPU frequency to use when running a job step if  it  has
1044              not  been  explicitly set using the --gpu-freq option.  This op‐
1045              tion can be used to independently configure the GPU and its mem‐
1046              ory  frequencies. Defaults to "high,memory=high".  After the job
1047              is completed, the frequencies of all affected GPUs will be reset
1048              to  the  highest  possible  values.  In some cases, system power
1049              caps may override the requested values.  The field type  can  be
1050              "memory".   If  type  is not specified, the GPU frequency is im‐
1051              plied.  The value field can either be "low",  "medium",  "high",
1052              "highm1"  or  a numeric value in megahertz (MHz).  If the speci‐
1053              fied numeric value is not possible, a value as close as possible
1054              will be used.  See below for definition of the values.  Examples
1055              of use include "GpuFreqDef=medium,memory=high"  and  "GpuFre‐
1056              qDef=450".
1057
1058              Supported value definitions:
1059
1060              low       the lowest available frequency.
1061
1062              medium    attempts  to  set  a  frequency  in  the middle of the
1063                        available range.
1064
1065              high      the highest available frequency.
1066
1067              highm1    (high minus one) will select the next  highest  avail‐
1068                        able frequency.
1069
1070       HealthCheckInterval
1071              The  interval  in  seconds between executions of HealthCheckPro‐
1072              gram.  The default value is zero, which disables execution.
1073
1074       HealthCheckNodeState
1075              Identify what node states should execute the HealthCheckProgram.
1076              Multiple  state  values may be specified with a comma separator.
1077              The default value is ANY to execute on nodes in any state.
1078
1079              ALLOC       Run on nodes in the  ALLOC  state  (all  CPUs  allo‐
1080                          cated).
1081
1082              ANY         Run on nodes in any state.
1083
1084              CYCLE       Rather  than running the health check program on all
1085                          nodes at the same time, cycle through running on all
1086                          compute nodes through the course of the HealthCheck‐
1087                          Interval. May be  combined  with  the  various  node
1088                          state options.
1089
1090              IDLE        Run on nodes in the IDLE state.
1091
1092              MIXED       Run  on nodes in the MIXED state (some CPUs idle and
1093                          other CPUs allocated).
1094
1095       HealthCheckProgram
1096              Fully qualified pathname of a script to execute as user root pe‐
1097              riodically on all compute nodes that are not in the NOT_RESPOND‐
1098              ING state. This program may be used to verify the node is  fully
1099              operational and DRAIN the node or send email if a problem is de‐
1100              tected.  Any action to be taken must be explicitly performed  by
1101              the   program   (e.g.   execute  "scontrol  update  NodeName=foo
1102              State=drain Reason=tmp_file_system_full" to drain a node).   The
1103              execution  interval  is controlled using the HealthCheckInterval
1104              parameter.  Note that the HealthCheckProgram will be executed at
1105              the  same time on all nodes to minimize its impact upon parallel
1106              programs.  This program will be killed if it does not  terminate
1107              normally  within 60 seconds.  This program will also be executed
1108              when the slurmd daemon is first started and before it  registers
1109              with  the slurmctld daemon.  By default, no program will be exe‐
1110              cuted.
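
                  A hedged example tying the three HealthCheck parameters
                  together; the script path is hypothetical and the 300 second
                  interval is an arbitrary choice:

                         HealthCheckProgram=/usr/local/sbin/node_health.sh
                         HealthCheckInterval=300
                         HealthCheckNodeState=IDLE,CYCLE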
1111
1112       InactiveLimit
1113              The interval, in seconds, after which a non-responsive job allo‐
1114              cation  command (e.g. srun or salloc) will result in the job be‐
1115              ing terminated. If the node on which  the  command  is  executed
1116              fails  or the command abnormally terminates, this will terminate
1117              its job allocation.  This option has no effect upon batch  jobs.
1118              When  setting  a  value, take into consideration that a debugger
1119              using srun to launch an application may leave the  srun  command
1120              in  a stopped state for extended periods of time.  This limit is
1121              ignored for jobs running in partitions with  the  RootOnly  flag
1122              set  (the  scheduler running as root will be responsible for the
1123              job).  The default value is unlimited (zero) and may not  exceed
1124              65533 seconds.
1125
1126       InteractiveStepOptions
1127              When LaunchParameters=use_interactive_step is enabled, launching
1128              salloc will automatically start an srun  process  with  Interac‐
1129              tiveStepOptions  to launch a terminal on a node in the job allo‐
1130              cation.  The  default  value  is  "--interactive  --preserve-env
1131              --pty  $SHELL".  The "--interactive" option is intentionally not
1132              documented in the srun man page. It is meant only to be used  in
1133              InteractiveStepOptions  in order to create an "interactive step"
1134              that will not consume resources so that other steps may  run  in
1135              parallel with the interactive step.
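
                  For illustration, a minimal sketch that enables the
                  interactive step and restates the documented default options
                  explicitly:

                         LaunchParameters=use_interactive_step
                         InteractiveStepOptions="--interactive --preserve-env --pty $SHELL"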
1136
1137       JobAcctGatherType
1138              The job accounting mechanism type.  Acceptable values at present
1139              include    "jobacct_gather/linux"    (for    Linux     systems),
1140              "jobacct_gather/cgroup" and "jobacct_gather/none" (no accounting
1141              data collected).  The default  value  is  "jobacct_gather/none".
1142              "jobacct_gather/cgroup" is a plugin for the Linux operating sys‐
1143              tem that uses cgroups  to  collect  accounting  statistics.  The
1144              plugin collects the following statistics: From the cgroup memory
1145              subsystem: memory.usage_in_bytes (reported as 'pages')  and  rss
1146              from  memory.stat  (reported  as 'rss'). From the cgroup cpuacct
1147              subsystem: user cpu time and system cpu time. No value  is  pro‐
1148              vided by cgroups for virtual memory size ('vsize').  In order to
1149              use  the  sstat  tool,  either  "jobacct_gather/linux"  or
1150              "jobacct_gather/cgroup" must be configured.
1151              NOTE: Changing this configuration parameter changes the contents
1152              of the messages between Slurm daemons.  Any  previously  running
1153              job  steps  are managed by a slurmstepd daemon that will persist
1154              through the lifetime of that job step and not change its  commu‐
1155              nication protocol. Only change this configuration parameter when
1156              there are no running job steps.
1157
1158       JobAcctGatherFrequency
1159              The job accounting and profiling sampling intervals.   The  sup‐
1160              ported format is as follows:
1161
1162              JobAcctGatherFrequency=<datatype>=<interval>
1163                          where  <datatype>=<interval> specifies the task sam‐
1164                          pling interval for the jobacct_gather  plugin  or  a
1165                          sampling  interval  for  a  profiling  type  by  the
1166                          acct_gather_profile  plugin.  Multiple,  comma-sepa‐
1167                          rated  <datatype>=<interval> intervals may be speci‐
1168                          fied. Supported datatypes are as follows:
1169
1170                          task=<interval>
1171                                 where <interval> is the task sampling  inter‐
1172                                 val in seconds for the jobacct_gather plugins
1173                                 and    for    task    profiling    by     the
1174                                 acct_gather_profile plugin.
1175
1176                          energy=<interval>
1177                                 where  <interval> is the sampling interval in
1178                                 seconds  for  energy  profiling   using   the
1179                                 acct_gather_energy plugin
1180
1181                          network=<interval>
1182                                 where  <interval> is the sampling interval in
1183                                 seconds for infiniband  profiling  using  the
1184                                 acct_gather_interconnect plugin.
1185
1186                          filesystem=<interval>
1187                                 where  <interval> is the sampling interval in
1188                                 seconds for filesystem  profiling  using  the
1189                                 acct_gather_filesystem plugin.
1190
1191
1192              The default value for the task sampling interval is 30 seconds.
1193              The default value for all other intervals is 0.  An interval of
1194              0 disables sampling of the specified type.  If the task sampling
1195              interval is 0, accounting information is collected only  at  job
1196              termination, which reduces Slurm interference with the job,  but
1197              also means that the statistics about a job don't reflect the av‐
1198              erage or maximum of several samples throughout the life  of  the
1199              job, but just show the information collected in the single  sam‐
1200              ple.
1202              Smaller (non-zero) values have a greater impact upon job perfor‐
1203              mance, but a value of 30 seconds is not likely to be  noticeable
1204              for applications having less than 10,000 tasks.
1205              Users  can independently override each interval on a per job ba‐
1206              sis using the --acctg-freq option when submitting the job.
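
                  A minimal sketch, assuming cgroup-based gathering is desired;
                  the interval values are arbitrary examples, not
                  recommendations:

                         JobAcctGatherType=jobacct_gather/cgroup
                         JobAcctGatherFrequency=task=30,energy=60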
1207
1208       JobAcctGatherParams
1209              Arbitrary parameters for the job account gather plugin.  Accept‐
1210              able values at present include:
1211
1212              NoShared            Exclude  shared memory from RSS. This option
1213                                  cannot be used with UsePSS.
1214
1215              UsePss              Use PSS value instead of  RSS  to  calculate
1216                                  real  usage of memory. The PSS value will be
1217                                  saved as RSS. This  option  cannot  be  used
1218                                  with NoShared.
1219
1220              OverMemoryKill      Kill processes detected to be using  more
1221                                  memory than requested by  the  step  each
1222                                  time accounting information  is  gathered
1223                                  by the JobAcctGather plugin.  This parame‐
1224                                  ter should be used with caution because a
1225                                  job exceeding its memory  allocation  may
1226                                  affect other processes and/or machine health.
1227
1228                                  NOTE:  If  available,  it  is recommended to
1229                                  limit memory by enabling  task/cgroup  as  a
1230                                  TaskPlugin  and  making  use  of  Constrain‐
1231                                  RAMSpace=yes in the cgroup.conf  instead  of
1232                                  using  this JobAcctGather mechanism for mem‐
1233                                  ory  enforcement.  Using  JobAcctGather   is
1234                                  polling  based and there is a delay before a
1235                                  job is killed, which could  lead  to  system
1236                                  Out of Memory events.
1237
1238                                  NOTE: When using OverMemoryKill, if the com‐
1239                                  bined memory used by all the processes in  a
1240                                  step  exceeds  the  memory limit, the entire
1241                                  step will be killed/cancelled by the  JobAc‐
1242                                  ctGather  plugin.  This differs from the be‐
1243                                  havior when using  ConstrainRAMSpace,  where
1244                                  processes  in  the  step will be killed, but
1245                                  the step will be left active, possibly  with
1246                                  other processes left running.
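
                  As a sketch only, PSS-based memory accounting could be
                  selected as follows (recall that UsePss and NoShared are
                  mutually exclusive):

                         JobAcctGatherParams=UsePss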
1247
1248       JobCompHost
1249              The  name  of  the  machine hosting the job completion database.
1250              Only used for database type storage plugins, ignored otherwise.
1251
1252       JobCompLoc
1253              The fully qualified file name where job completion  records  are
1254              written  when  the JobCompType is "jobcomp/filetxt" or the data‐
1255              base where job completion records are stored when  the  JobComp‐
1256              Type  is  a  database,  or  a  complete URL endpoint with format
1257              <host>:<port>/<target>/_doc when JobCompType  is  "jobcomp/elas‐
1258              ticsearch", e.g. "localhost:9200/slurm/_doc".  NOTE: More
1259              information   is   available   at    the    Slurm    web    site
1260              <https://slurm.schedmd.com/elasticsearch.html>.
1261
1262       JobCompParams
1263              Pass  arbitrary  text string to job completion plugin.  Also see
1264              JobCompType.
1265
1266       JobCompPass
1267              The password used to gain access to the database  to  store  the
1268              job  completion data.  Only used for database type storage plug‐
1269              ins, ignored otherwise.
1270
1271       JobCompPort
1272              The listening port of the job completion database server.   Only
1273              used for database type storage plugins, ignored otherwise.
1274
1275       JobCompType
1276              The job completion logging mechanism type.  Acceptable values at
1277              present include:
1278
1279              jobcomp/none
1280                     Upon job completion, a record of the job is  purged  from
1281                     the  system.  If using the accounting infrastructure this
1282                     plugin may not be of interest since some of the  informa‐
1283                     tion is redundant.
1284
1285              jobcomp/elasticsearch
1286                     Upon  job completion, a record of the job should be writ‐
1287                     ten to an Elasticsearch server, specified by the  JobCom‐
1288                     pLoc parameter.
1289                     NOTE: More information is available at the Slurm web site
1290                     ( https://slurm.schedmd.com/elasticsearch.html ).
1291
1292              jobcomp/filetxt
1293                     Upon job completion, a record of the job should be  writ‐
1294                     ten  to  a text file, specified by the JobCompLoc parame‐
1295                     ter.
1296
1297              jobcomp/lua
1298                     Upon job completion, a record of the job should  be  pro‐
1299                     cessed  by the jobcomp.lua script, located in the default
1300                     script directory (typically the subdirectory etc  of  the
1301              installation directory).
1302
1303              jobcomp/mysql
1304                     Upon  job completion, a record of the job should be writ‐
1305                     ten to a MySQL or MariaDB database, specified by the Job‐
1306                     CompLoc parameter.
1307
1308              jobcomp/script
1309                     Upon job completion, a script specified by the JobCompLoc
1310                     parameter is to be executed  with  environment  variables
1311                     providing the job information.
1312
1313       JobCompUser
1314              The  user  account  for  accessing  the job completion database.
1315              Only used for database type storage plugins, ignored otherwise.
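
                  For illustration, a hedged sketch of a MySQL-backed job
                  completion setup; the host, database name and credentials are
                  placeholders, not defaults:

                         JobCompType=jobcomp/mysql
                         JobCompHost=dbhost
                         JobCompPort=3306
                         JobCompLoc=slurm_jobcomp_db
                         JobCompUser=slurm
                         JobCompPass=changeme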
1316
1317       JobContainerType
1318              Identifies the plugin to be used for job tracking.  A restart of
1319              slurmctld  is required for changes to this parameter to take ef‐
1320              fect.  NOTE: The JobContainerType applies to a  job  allocation,
1321              while  ProctrackType applies to job steps.  Acceptable values at
1322              present include:
1323
1324              job_container/cncu  Used only for Cray systems (CNCU  =  Compute
1325                                  Node Clean Up)
1326
1327              job_container/none  Used for all other system types
1328
1329              job_container/tmpfs Used  to  create  a private namespace on the
1330                                  filesystem for jobs, which houses  temporary
1331                                  file  systems  (/tmp  and /dev/shm) for each
1332                                  job. 'PrologFlags=Contain' must  be  set  to
1333                                  use this plugin.
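
                  A minimal sketch of the tmpfs plugin together with the
                  PrologFlags setting it requires (mount details, if any, would
                  be configured elsewhere and are not shown):

                         JobContainerType=job_container/tmpfs
                         PrologFlags=Contain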
1334
1335       JobFileAppend
1336              This  option controls what to do if a job's output or error file
1337              exist when the job is started.  If JobFileAppend  is  set  to  a
1338              value  of  1, then append to the existing file.  By default, any
1339              existing file is truncated.
1340
1341       JobRequeue
1342              This option controls the default ability for batch  jobs  to  be
1343              requeued.   Jobs may be requeued explicitly by a system adminis‐
1344              trator, after node failure, or upon preemption by a higher  pri‐
1345              ority  job.   If  JobRequeue  is set to a value of 1, then batch
1346              jobs may be requeued unless explicitly disabled by the user.  If
1347              JobRequeue  is  set to a value of 0, then batch jobs will not be
1348              requeued unless explicitly enabled by the user.  Use the  sbatch
1349              --no-requeue  or --requeue option to change the default behavior
1350              for individual jobs.  The default value is 1.
1351
1352       JobSubmitPlugins
1353              These are intended to be site-specific plugins which can be used
1354              to  set  default job parameters and/or logging events. Slurm can
1355              be configured to use multiple  job_submit  plugins  if  desired,
1356              which  must  be  specified as a comma-delimited list and will be
1357              executed in the order listed.
1358              e.g. for multiple job_submit plugin configuration:
1359              JobSubmitPlugins=lua,require_timelimit
1360              Take  a  look   at   <https://slurm.schedmd.com/job_submit_plug
1361              ins.html> for further plugin implementation details. No job sub‐
1362              mission plugins are used by default.  Currently available  plug‐
1363              ins are:
1364
1365              all_partitions          Set  default partition to all partitions
1366                                      on the cluster.
1367
1368              defaults                Set default values for job submission or
1369                                      modify requests.
1370
1371              logging                 Log  select job submission and modifica‐
1372                                      tion parameters.
1373
1374              lua                     Execute a Lua script implementing site's
1375                                      own   job_submit  logic.  Only  one  Lua
1376                                      script will  be  executed.  It  must  be
1377                                      named  "job_submit.lua"  and must be lo‐
1378                                      cated in the default  configuration  di‐
1379                                      rectory   (typically   the  subdirectory
1380                                      "etc" of  the  installation  directory).
1381                                      Sample Lua scripts can be found with the
1382                                      Slurm  distribution,  in  the  directory
1383                                      contribs/lua.  Slurmctld  will  fatal on
1384                                      startup if the configured lua script  is
1385                                      invalid.  Slurm  will  try  to  load the
1386                                      script for each job submission.  If  the
1387                                      script is broken or removed while slurm‐
1388                                      ctld is running, Slurm will fallback  to
1389                                      the  previous  working  version  of  the
1390                                      script.
1391
1392              partition               Set a job's default partition based upon
1393                                      job  submission parameters and available
1394                                      partitions.
1395
1396              pbs                     Translate PBS job submission options  to
1397                                      Slurm equivalent (if possible).
1398
1399              require_timelimit       Force job submissions to specify a time‐
1400                                      limit.
1401
1402              NOTE: For examples of use  see  the  Slurm  code  in  "src/plug‐
1403              ins/job_submit"  and  "contribs/lua/job_submit*.lua" then modify
1404              the code to satisfy your needs.
1405
1406       KillOnBadExit
1407              If set to 1, a step will be terminated immediately if  any  task
1408              crashes or aborts, as indicated by a non-zero exit  code.   With
1409              the default value of 0, if one of the processes crashes or
1410              aborts, the other processes will continue  to  run  while  the
1411              crashed or aborted process waits. The  user  can  override  this
1412              configuration parameter by using srun's -K, --kill-on-bad-exit.
1413
1414       KillWait
1415              The interval, in seconds, given to a job's processes between the
1416              SIGTERM and SIGKILL signals upon reaching its  time  limit.   If
1417              the job fails to terminate gracefully in the interval specified,
1418              it will be forcibly terminated.  The default value  is  30  sec‐
1419              onds.  The value may not exceed 65533.
1420
1421       NodeFeaturesPlugins
1422              Identifies  the  plugins to be used for support of node features
1423              which can change through time. For example, a node  which  might
1424              be booted with various BIOS settings. This is supported  through
1425              the use of a node's active_features and  available_features  in‐
1426              formation.  Acceptable values at present include:
1427
1428              node_features/knl_cray
1429                     Used  only  for Intel Knights Landing processors (KNL) on
1430                     Cray systems.
1431
1432              node_features/knl_generic
1433                     Used for Intel Knights  Landing  processors  (KNL)  on  a
1434                     generic Linux system.
1435
1436              node_features/helpers
1437                     Used  to  report and modify features on nodes using arbi‐
1438                     trary scripts or programs.
1439
1440       LaunchParameters
1441              Identifies options to the job launch plugin.  Acceptable  values
1442              include:
1443
1444              batch_step_set_cpu_freq Set the cpu frequency for the batch step
1445                                      from  given  --cpu-freq,  or  slurm.conf
1446                                      CpuFreqDef,  option.   By  default  only
1447                                      steps started with srun will utilize the
1448                                      cpu freq setting options.
1449
1450                                      NOTE: If you are using srun  to  launch
1451                                      your steps inside a batch  script  (ad‐
1452                                      vised), this option will create a situa‐
1453                                      tion where you may have multiple  agents
1454                                      setting the cpu_freq, as the batch  step
1455                                      usually runs on the same  resources  as
1456                                      the one or more steps that the sruns in
1457                                      the script will create.
1458
1459              cray_net_exclusive      Allow jobs on a Cray Native cluster  ex‐
1460                                      clusive  access  to  network  resources.
1461                                      This should only be set on clusters pro‐
1462                                      viding  exclusive access to each node to
1463                                      a single job at once, and not using par‐
1464                                      allel  steps  within  the job, otherwise
1465                                      resources on the node  can  be  oversub‐
1466                                      scribed.
1467
1468              enable_nss_slurm        Permits  passwd and group resolution for
1469                                      a  job  to  be  serviced  by  slurmstepd
1470                                      rather  than  requiring  a lookup from a
1471                                      network     based      service.      See
1472                                      https://slurm.schedmd.com/nss_slurm.html
1473                                      for more information.
1474
1475              lustre_no_flush         If set on a Cray Native cluster, then do
1476                                      not  flush  the Lustre cache on job step
1477                                      completion. This setting will only  take
1478                                      effect  after  reconfiguring,  and  will
1479                                      only  take  effect  for  newly  launched
1480                                      jobs.
1481
1482              mem_sort                Sort NUMA memory at step start. User can
1483                                      override     this      default      with
1484                                      SLURM_MEM_BIND  environment  variable or
1485                                      --mem-bind=nosort command line option.
1486
1487              mpir_use_nodeaddr       When launching tasks, Slurm creates en‐
1488                                      tries in MPIR_proctable that are used by
1489                                      parallel debuggers, profilers,  and  re‐
1490                                      lated  tools  to  attach   to   running
1491                                      processes. By default the MPIR_proctable
1492                                      entries contain  MPIR_procdesc  struc‐
1493                                      tures where the host_name  is  set  to
1494                                      NodeName. If this option is  specified,
1495                                      NodeAddr will be used  in  this  context
1496                                      instead.
1497
1498              disable_send_gids       By  default,  the slurmctld will look up
1499                                      and send the user_name and extended gids
1500                                      for a job, rather than having them re‐
1501                                      solved independently on each node  as
1502                                      part of each task launch.  This  helps
1503                                      mitigate issues around name service
1504                                      scalability when launching jobs involv‐
1505                                      ing many nodes. Using this option will
1506                                      disable this functionality. This option
1507                                      is ignored if enable_nss_slurm is specified.
1508
1509              slurmstepd_memlock      Lock the  slurmstepd  process's  current
1510                                      memory in RAM.
1511
1512              slurmstepd_memlock_all  Lock  the  slurmstepd  process's current
1513                                      and future memory in RAM.
1514
1515              test_exec               Have srun verify existence of  the  exe‐
1516                                      cutable  program along with user execute
1517                                      permission on the node  where  srun  was
1518                                      called before attempting to launch it on
1519                                      nodes in the step.
1520
1521              use_interactive_step    Have salloc use the Interactive Step  to
1522                                      launch  a  shell on an allocated compute
1523                                      node rather  than  locally  to  wherever
1524                                      salloc was invoked. This is accomplished
1525                                      by launching the srun command  with  In‐
1526                                      teractiveStepOptions as options.
1527
1528                                      This  does not affect salloc called with
1529                                      a command as  an  argument.  These  jobs
1530                                      will  continue  to  be  executed  as the
1531                                      calling user on the calling host.
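
                  As an illustrative sketch only, two of the options above could
                  be combined as follows; whether they are appropriate depends
                  on the site:

                         LaunchParameters=enable_nss_slurm,test_exec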
1532
1533       LaunchType
1534              Identifies the mechanism to be used to launch application tasks.
1535              Acceptable values include:
1536
1537              launch/slurm
1538                     The default value.
1539
1540       Licenses
1541              Specification  of  licenses (or other resources available on all
1542              nodes of the cluster) which can be allocated to  jobs.   License
1543              names can optionally be followed by a colon and count with a de‐
1544              fault count of one.  Multiple license names should be comma sep‐
1545              arated  (e.g.   "Licenses=foo:4,bar").  Note that Slurm prevents
1546              jobs from being scheduled if their required  license  specifica‐
1547              tion  is  not available.  Slurm does not prevent jobs from using
1548              licenses that are not explicitly listed in  the  job  submission
1549              specification.
1550
1551       LogTimeFormat
1552              Format  of  the timestamp in slurmctld and slurmd log files. Ac‐
1553              cepted   values   are   "iso8601",   "iso8601_ms",    "rfc5424",
1554              "rfc5424_ms",  "clock", "short" and "thread_id". The values end‐
1555              ing in "_ms" differ from the ones  without  in  that  fractional
1556              seconds  with  millisecond  precision  are  printed. The default
1557              value is "iso8601_ms". The "rfc5424" formats are the same as the
1558              "iso8601"  formats except that the timezone value is also shown.
1559              The "clock" format shows a timestamp in  microseconds  retrieved
1560              with  the  C  standard clock() function. The "short" format is a
1561              short date and time format. The  "thread_id"  format  shows  the
1562              timestamp  in  the  C standard ctime() function form without the
1563              year but including the microseconds, the daemon's process ID and
1564              the current thread name and ID.
1565
1566       MailDomain
1567              Domain name to qualify usernames if email address is not explic‐
1568              itly given with the "--mail-user" option. If  unset,  the  local
1569              MTA will need to qualify local addresses itself. Changes to Mail‐
1570              Domain will only affect new jobs.
1571
1572       MailProg
1573              Fully qualified pathname to the program used to send  email  per
1574              user   request.    The   default   value   is   "/bin/mail"  (or
1575              "/usr/bin/mail"   if   "/bin/mail"   does    not    exist    but
1576              "/usr/bin/mail"  does  exist).  The program is called with argu‐
1577              ments suitable for the default mail command, however  additional
1578              information  about  the job is passed in the form of environment
1579              variables.
1580
1581              Additional variables are  the  same  as  those  passed  to  Pro‐
1582              logSlurmctld  and  EpilogSlurmctld  with additional variables in
1583              the following contexts:
1584
1585              ALL
1586
1587                     SLURM_JOB_STATE
1588                            The base state of the job  when  the  MailProg  is
1589                            called.
1590
1591                     SLURM_JOB_MAIL_TYPE
1592                            The mail type triggering the mail.
1593
1594              BEGIN
1595
1596                     SLURM_JOB_QEUEUED_TIME
1597                            The amount of time the job was queued.
1598
1599              END, FAIL, REQUEUE, TIME_LIMIT_*
1600
1601                     SLURM_JOB_RUN_TIME
1602                            The amount of time the job ran for.
1603
1604              END, FAIL
1605
1606                     SLURM_JOB_EXIT_CODE_MAX
1607                            Job's  exit code or highest exit code for an array
1608                            job.
1609
1610                     SLURM_JOB_EXIT_CODE_MIN
1611                            Job's minimum exit code for an array job.
1612
1613                     SLURM_JOB_TERM_SIGNAL_MAX
1614                            Job's highest signal for an array job.
1615
1616              STAGE_OUT
1617
1618                     SLURM_JOB_STAGE_OUT_TIME
1619                            Job's staging out time.
1620
1621       MaxArraySize
1622              The maximum job array task index value will  be  one  less  than
1623              MaxArraySize  to  allow  for  an index value of zero.  Configure
1624              MaxArraySize to 0 in order to disable job array use.  The  value
1625              may not exceed 4000001.  The value of MaxJobCount should be much
1626              larger than MaxArraySize.  The default value is 1001.  See  also
1627              max_array_tasks in SchedulerParameters.
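
                  For example (a sketch, not a recommendation), the following
                  permits array task indices 0 through 10000 while keeping the
                  job count limit well above the array size:

                         MaxArraySize=10001
                         MaxJobCount=100000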
1628
1629       MaxDBDMsgs
1630              When communication to the SlurmDBD is not possible the slurmctld
1631              will queue messages meant to be processed when the SlurmDBD  is
1632              available  again.   In  order to avoid running out of memory the
1633              slurmctld will only queue so many messages. The default value is
1634              10000,  or  MaxJobCount  *  2  +  Node  Count  * 4, whichever is
1635              greater.  The value can not be less than 10000.
1636
1637       MaxJobCount
1638              The maximum number of jobs slurmctld can have in memory  at  one
1639              time.   Combine  with  MinJobAge  to ensure the slurmctld daemon
1640              does not exhaust its memory or other resources. Once this  limit
1641              is  reached,  requests  to submit additional jobs will fail. The
1642              default value is 10000 jobs.  NOTE: Each task  of  a  job  array
1643              counts  as one job even though they will not occupy separate job
1644              records until modified or  initiated.   Performance  can  suffer
1645              with more than a few hundred thousand jobs.  Setting a per-user
1646              MaxSubmitJobs limit is generally valuable to prevent a single user
1647              from  filling  the system with jobs.  This is accomplished using
1648              Slurm's database and configuring enforcement of resource limits.
1649              A restart of slurmctld is required for changes to this parameter
1650              to take effect.
1651
1652       MaxJobId
1653              The maximum job id to be used for jobs submitted to Slurm  with‐
1654              out a specific requested value. Job ids are unsigned 32bit inte‐
1655              gers with the first 26 bits reserved for local job ids  and  the
1656              remaining  6 bits reserved for a cluster id to identify a feder‐
1657              ated  job's  origin.  The  maximum  allowed  local  job  id   is
1658              67,108,863   (0x3FFFFFF).   The   default  value  is  67,043,328
1659              (0x03ff0000).  MaxJobId only applies to the local job id and not
1660              the  federated  job  id.  Job id values generated will be incre‐
1661              mented by 1 for each subsequent job. Once MaxJobId  is  reached,
1662              the  next  job will be assigned FirstJobId.  Federated jobs will
1663              always have a job ID of 67,108,865 or higher.  Also see FirstJo‐
1664              bId.
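
                  A hedged sketch showing the two limits together; the values
                  are arbitrary but respect the 67,108,863 ceiling on local job
                  ids:

                         FirstJobId=1000
                         MaxJobId=67000000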
1665
1666       MaxMemPerCPU
1667              Maximum   real  memory  size  available  per  allocated  CPU  in
1668              megabytes.  Used to avoid over-subscribing  memory  and  causing
1669              paging.  MaxMemPerCPU would generally be used if individual pro‐
1670              cessors are allocated to jobs (SelectType=select/cons_res or Se‐
1671              lectType=select/cons_tres).  The default value is 0 (unlimited).
1672              Also see DefMemPerCPU, DefMemPerGPU and MaxMemPerNode.   MaxMem‐
1673              PerCPU and MaxMemPerNode are mutually exclusive.
1674
1675              NOTE:  If  a  job  specifies a memory per CPU limit that exceeds
1676              this system limit, that job's count of CPUs per task will try to
1677              automatically  increase.  This may result in the job failing due
1678              to CPU count limits. This auto-adjustment feature is a  best-ef‐
1679              fort  one  and  optimal  assignment is not guaranteed due to the
1680              possibility   of   having   heterogeneous   configurations   and
1681              multi-partition/qos jobs.  If this is a concern it is advised to
1682              use a job submit LUA plugin instead to enforce  auto-adjustments
1683              to your specific needs.
1684
1685       MaxMemPerNode
1686              Maximum  real  memory  size  available  per  allocated  node  in
1687              megabytes.  Used to avoid over-subscribing  memory  and  causing
1688              paging.   MaxMemPerNode  would  generally be used if whole nodes
1689              are allocated to jobs (SelectType=select/linear)  and  resources
1690              are  over-subscribed (OverSubscribe=yes or OverSubscribe=force).
1691              The default value is 0 (unlimited).  Also see DefMemPerNode  and
1692              MaxMemPerCPU.   MaxMemPerCPU  and MaxMemPerNode are mutually ex‐
1693              clusive.
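
                  As a sketch, assuming a cons_tres cluster, one of the two
                  mutually exclusive limits would be set, for example:

                         SelectType=select/cons_tres
                         MaxMemPerCPU=4096    # 4 GB per allocated CPU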
1694
1695       MaxNodeCount
1696              Maximum count of nodes which may exist in the controller. By de‐
1697              fault  MaxNodeCount  will be set to the number of nodes found in
1698              the slurm.conf. MaxNodeCount will be ignored if  less  than  the
1699              number  of  nodes found in the slurm.conf. Increase MaxNodeCount
1700              to accommodate dynamically created nodes with dynamic node  reg‐
1701              istrations and nodes created with scontrol. The slurmctld daemon
1702              must be restarted for changes to this parameter to take effect.
1703
1704       MaxStepCount
1705              The maximum number of steps that any job can initiate. This  pa‐
1706              rameter  is  intended  to limit the effect of bad batch scripts.
1707              The default value is 40000 steps.
1708
1709       MaxTasksPerNode
1710              Maximum number of tasks Slurm will allow a job step to spawn  on
1711              a  single node. The default MaxTasksPerNode is 512.  May not ex‐
1712              ceed 65533.
1713
1714       MCSParameters
1715              MCS = Multi-Category Security MCS Plugin Parameters.   The  sup‐
1716              ported  parameters  are  specific  to the MCSPlugin.  Changes to
1717              this value take effect when the Slurm daemons are  reconfigured.
1718              More     information     about    MCS    is    available    here
1719              <https://slurm.schedmd.com/mcs.html>.
1720
1721       MCSPlugin
1722              MCS = Multi-Category Security : associate a  security  label  to
1723              jobs  and  ensure that nodes can only be shared among jobs using
1724              the same security label.  Acceptable values include:
1725
1726              mcs/none    is the default value.  No security label  associated
1727                          with  jobs,  no particular security restriction when
1728                          sharing nodes among jobs.
1729
1730              mcs/account only users with the same account can share the nodes
1731                          (requires enabling of accounting).
1732
1733              mcs/group   only users with the same group can share the nodes.
1734
1735              mcs/user    a node cannot be shared with other users.
1736
1737       MessageTimeout
1738              Time  permitted  for  a  round-trip communication to complete in
1739              seconds. Default value is 10 seconds. For  systems  with  shared
1740              nodes,  the  slurmd  daemon  could  be paged out and necessitate
1741              higher values.
1742
1743       MinJobAge
1744              The minimum age of a completed job before its record is  cleared
1745              from  the  list  of jobs slurmctld keeps in memory. Combine with
1746              MaxJobCount to ensure the slurmctld daemon does not exhaust  its
1747              memory  or other resources. The default value is 300 seconds.  A
1748              value of zero prevents any job record  purging.   Jobs  are  not
1749              purged  during a backfill cycle, so it can take longer than Min‐
1750              JobAge seconds to purge a job if using the  backfill  scheduling
1751              plugin.   In  order  to eliminate some possible race conditions,
1752              the minimum non-zero value for MinJobAge recommended is 2.
1753
1754       MpiDefault
1755              Identifies the default type of MPI to be used.  Srun  may  over‐
1756              ride  this  configuration parameter in any case.  Currently sup‐
1757              ported versions include: pmi2, pmix, and  none  (default,  which
1758              works  for  many other versions of MPI).  More information about
1759              MPI          use           is           available           here
1760              <https://slurm.schedmd.com/mpi_guide.html>.
1761
1762       MpiParams
1763              MPI  parameters.   Used  to identify ports used by native Cray's
1764              PMI. The format to identify a range of  communication  ports  is
1765              "ports=12000-12999".
1766
1767       OverTimeLimit
1768              Number  of  minutes by which a job can exceed its time limit be‐
1769              fore being canceled.  Normally a job's time limit is treated  as
1770              a  hard  limit  and  the  job  will be killed upon reaching that
1771              limit.  Configuring OverTimeLimit will result in the job's  time
1772              limit being treated like a soft limit.  Adding the OverTimeLimit
1773              value to the soft time limit provides  a  hard  time  limit,  at
1774              which  point  the  job is canceled.  This is particularly useful
1775              for backfill scheduling, which is based upon each job's soft time
1776              limit.   The  default  value is zero.  May not exceed 65533 min‐
1777              utes.  A value of "UNLIMITED" is also supported.
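
                  For example (a sketch): with the setting below, a job
                  submitted with a 60 minute time limit is backfilled against
                  the 60 minute soft limit but is not canceled until 65 minutes
                  have elapsed:

                         OverTimeLimit=5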
1778
1779       PluginDir
1780              Identifies the places in which to look for Slurm plugins.   This
1781              is a colon-separated list of directories, like the PATH environ‐
1782              ment variable.  The default value is the prefix given at config‐
1783              ure  time + "/lib/slurm".  A restart of slurmctld and the slurmd
1784              daemons is required for changes to this parameter  to  take  ef‐
1785              fect.
1786
1787       PlugStackConfig
1788              Location of the config file for Slurm stackable plugins that use
1789              the  Stackable  Plugin  Architecture  for  Node  job  (K)control
1790              (SPANK).  This provides support for a highly configurable set of
1791              plugins to be called before and/or after execution of each  task
1792              spawned  as  part  of  a  user's  job step.  Default location is
1793              "plugstack.conf" in the same directory as the system slurm.conf.
1794              For more information on SPANK plugins, see the spank(8) manual.
1795
1796       PowerParameters
1797              System  power  management  parameters.  The supported parameters
1798              are specific to the PowerPlugin.  Changes to this value take ef‐
1799              fect  when the Slurm daemons are reconfigured.  More information
1800              about   system    power    management    is    available    here
1801              <https://slurm.schedmd.com/power_mgmt.html>.   Options currently
1802              supported by any plugins are listed below.
1803
1804              balance_interval=#
1805                     Specifies the time interval, in seconds, between attempts
1806                     to rebalance power caps across the nodes.  This also con‐
1807                     trols the frequency at which Slurm  attempts  to  collect
1808                     current  power consumption data (old data may be used un‐
1809                     til new data is available from the underlying infrastruc‐
1810                     ture  and values below 10 seconds are not recommended for
1811                     Cray systems).  The default value is  30  seconds.   Sup‐
1812                     ported by the power/cray_aries plugin.
1813
1814              capmc_path=
1815                     Specifies  the  absolute  path of the capmc command.  The
1816                     default  value  is   "/opt/cray/capmc/default/bin/capmc".
1817                     Supported by the power/cray_aries plugin.
1818
1819              cap_watts=#
1820                     Specifies  the total power limit to be established across
1821                     all compute nodes managed by Slurm.  A value  of  0  sets
1822                     every compute node to have an unlimited cap.  The default
1823                     value is 0.  Supported by the power/cray_aries plugin.
1824
1825              decrease_rate=#
1826                     Specifies the maximum rate of change in the power cap for
1827                     a  node  where  the actual power usage is below the power
1828                     cap by an amount greater than  lower_threshold  (see  be‐
1829                     low).   Value  represents  a percentage of the difference
1830                     between a node's minimum and maximum  power  consumption.
1831                     The  default  value  is  50  percent.   Supported  by the
1832                     power/cray_aries plugin.
1833
1834              get_timeout=#
1835                     Amount of time allowed to get power state information  in
1836                     milliseconds.  The default value is 5,000 milliseconds or
1837                     5 seconds.  Supported by the power/cray_aries plugin  and
1838                     represents  the time allowed for the capmc command to re‐
1839                     spond to various "get" options.
1840
1841              increase_rate=#
1842                     Specifies the maximum rate of change in the power cap for
1843                     a  node  where  the  actual  power  usage  is  within up‐
1844                     per_threshold (see below) of the power cap.  Value repre‐
1845                     sents  a  percentage  of  the difference between a node's
1846                     minimum and maximum power consumption.  The default value
1847                     is 20 percent.  Supported by the power/cray_aries plugin.
1848
1849              job_level
1850                     All  nodes  associated  with every job will have the same
1851                     power  cap,  to  the  extent  possible.   Also  see   the
1852                     --power=level option on the job submission commands.
1853
1854              job_no_level
1855                     Disable  the  user's ability to set every node associated
1856                     with a job to the same power cap.  Each  node  will  have
1857                     its  power  cap  set  independently.   This  disables the
1858                     --power=level option on the job submission commands.
1859
1860              lower_threshold=#
1861                     Specify a lower power consumption threshold.  If a node's
1862                     current power consumption is below this percentage of its
1863                     current cap, then its power cap will be reduced.  The de‐
1864                     fault   value   is   90   percent.    Supported   by  the
1865                     power/cray_aries plugin.
1866
1867              recent_job=#
1868                     If a job has started or resumed execution (from  suspend)
1869                     on  a compute node within this number of seconds from the
1870                     current time, the node's power cap will be  increased  to
1871                     the  maximum.   The  default  value is 300 seconds.  Sup‐
1872                     ported by the power/cray_aries plugin.
1873
1875              set_timeout=#
1876                     Amount of time allowed to set power state information  in
1877                     milliseconds.   The  default value is 30,000 milliseconds
1878                     or 30 seconds.  Supported by the power/cray_aries plugin and
1879                     represents  the time allowed for the capmc command to re‐
1880                     spond to various "set" options.
1881
1882              set_watts=#
1883                     Specifies the power limit to  be  set  on  every  compute
1884                     node managed by Slurm.  Every node gets this same  power
1885                     cap and there is no variation through time based upon ac‐
1886                     tual   power   usage  on  the  node.   Supported  by  the
1887                     power/cray_aries plugin.
1888
1889              upper_threshold=#
1890                     Specify an  upper  power  consumption  threshold.   If  a
1891                     node's current power consumption is above this percentage
1892                     of its current cap, then its power cap will be  increased
1893                     to the extent possible.  The default value is 95 percent.
1894                     Supported by the power/cray_aries plugin.
1895
1896       PowerPlugin
1897              Identifies the plugin used for system  power  management.   Cur‐
1898              rently  supported  plugins  include:  cray_aries  and  none.   A
1899              restart of slurmctld is required for changes to  this  parameter
1900              to  take effect.  More information about system power management
1901              is available  here  <https://slurm.schedmd.com/power_mgmt.html>.
1902              By default, no power plugin is loaded.
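
                  A hedged sketch for a Cray Aries system; the wattage and
                  interval shown are placeholders, not tuned values:

                         PowerPlugin=power/cray_aries
                         PowerParameters=cap_watts=200000,balance_interval=60,recent_job=300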
1903
1904       PreemptMode
1905              Mechanism  used  to preempt jobs or enable gang scheduling. When
1906              the PreemptType parameter is set to enable preemption, the  Pre‐
1907              emptMode  selects the default mechanism used to preempt the eli‐
1908              gible jobs for the cluster.
1909              PreemptMode may be specified on a per partition basis  to  over‐
1910              ride  this  default value if PreemptType=preempt/partition_prio.
1911              Alternatively, it can be specified on a per QOS  basis  if  Pre‐
1912              emptType=preempt/qos.  In  either case, a valid default Preempt‐
1913              Mode value must be specified for the cluster  as  a  whole  when
1914              preemption is enabled.
1915              The GANG option is used to enable gang scheduling independent of
1916              whether preemption is enabled (i.e. independent of the  Preempt‐
1917              Type  setting). It can be specified in addition to a PreemptMode
1918              setting with the two  options  comma  separated  (e.g.  Preempt‐
1919              Mode=SUSPEND,GANG).
1920              See         <https://slurm.schedmd.com/preempt.html>         and
1921              <https://slurm.schedmd.com/gang_scheduling.html>  for  more  de‐
1922              tails.
1923
1924              NOTE:  For  performance reasons, the backfill scheduler reserves
1925              whole nodes for jobs, not  partial  nodes.  If  during  backfill
1926              scheduling  a  job  preempts  one  or more other jobs, the whole
1927              nodes for those preempted jobs are reserved  for  the  preemptor
1928              job,  even  if  the preemptor job requested fewer resources than
1929              that.  These reserved nodes aren't available to other jobs  dur‐
1930              ing that backfill cycle, even if the other jobs could fit on the
1931              nodes. Therefore, jobs may preempt more resources during a  sin‐
1932              gle backfill iteration than they requested.
1933              NOTE: For a heterogeneous job to be considered for preemption all
1934              components must be eligible for preemption. When a heterogeneous
1935              job is to be preempted the first identified component of the job
1936              with the highest order PreemptMode (SUSPEND (highest),  REQUEUE,
1937              CANCEL  (lowest))  will  be  used to set the PreemptMode for all
1938              components. The GraceTime and user warning signal for each  com‐
1939              ponent  of  the  heterogeneous job remain unique.  Heterogeneous
1940              jobs are excluded from GANG scheduling operations.
1941
1942              OFF         Is the default value and disables job preemption and
1943                          gang  scheduling.   It  is only compatible with Pre‐
1944                          emptType=preempt/none at a global level.   A  common
1945                          use case for this parameter is to set it on a parti‐
1946                          tion to disable preemption for that partition.
1947
1948              CANCEL      The preempted job will be cancelled.
1949
1950              GANG        Enables gang scheduling (time slicing)  of  jobs  in
1951                          the  same partition, and allows the resuming of sus‐
1952                          pended jobs.
1953
1954                          NOTE: Gang scheduling is performed independently for
1955                          each  partition, so if you only want time-slicing by
1956                          OverSubscribe, without any preemption, then  config‐
1957                          uring  partitions with overlapping nodes is not rec‐
1958                          ommended.  On the other hand, if  you  want  to  use
1959                          PreemptType=preempt/partition_prio   to  allow  jobs
1960                          from higher PriorityTier partitions to Suspend  jobs
1961                          from  lower  PriorityTier  partitions  you will need
1962                          overlapping partitions, and PreemptMode=SUSPEND,GANG
1963                          to  use  the  Gang scheduler to resume the suspended
1964                          jobs(s).  In any case, time-slicing won't happen be‐
1965                          job(s).  In any case, time-slicing won't happen be‐
1966
1967                          NOTE:  Heterogeneous  jobs  are  excluded  from GANG
1968                          scheduling operations.
1969
1970              REQUEUE     Preempts jobs by requeuing  them  (if  possible)  or
1971                          canceling  them.   For jobs to be requeued they must
1972                          have the --requeue sbatch option set or the  cluster
1973                          wide  JobRequeue parameter in slurm.conf must be set
1974                          to 1.
1975
1976              SUSPEND     The preempted jobs will be suspended, and later  the
1977                          Gang  scheduler will resume them. Therefore the SUS‐
1978                          PEND preemption mode always needs the GANG option to
1979                          be specified at the cluster level. Also, because the
1980                          suspended jobs will still use memory  on  the  allo‐
1981                          cated  nodes, Slurm needs to be able to track memory
1982                          resources to be able to suspend jobs.
1983                          If PreemptType=preempt/qos is configured and if  the
1984                          preempted  job(s)  and  the preemptor job are on the
1985                          same partition, then they will share resources  with
1986                          the  Gang  scheduler (time-slicing). If not (i.e. if
1987                          the preemptees and preemptor are on different parti‐
1988                          tions) then the preempted jobs will remain suspended
1989                          until the preemptor ends.
1990
1991                          NOTE: Because gang scheduling is performed  indepen‐
1992                          dently for each partition, if using PreemptType=pre‐
1993                          empt/partition_prio then jobs in higher PriorityTier
1994                          partitions  will  suspend jobs in lower PriorityTier
1995                          partitions to run on the  released  resources.  Only
1996                          when the preemptor job ends will the suspended jobs
1997                          be resumed by the Gang scheduler.
1998                          NOTE: Suspended jobs will not release  GRES.  Higher
1999                          priority  jobs  will  not be able to preempt to gain
2000                          access to GRES.
2001
2002              WITHIN      For PreemptType=preempt/qos, allow jobs  within  the
2003                          same  qos  to preempt one another. While this can be
2004                          set globally here, it is recommended that this only be
2005                          set  directly on a relevant subset of the system qos
2006                          values instead.
2007
2008       PreemptType
2009              Specifies the plugin used to identify which  jobs  can  be  pre‐
2010              empted in order to start a pending job.
2011
2012              preempt/none
2013                     Job preemption is disabled.  This is the default.
2014
2015              preempt/partition_prio
2016                     Job  preemption  is  based  upon  partition PriorityTier.
2017                     Jobs in higher PriorityTier partitions may  preempt  jobs
2018                     from lower PriorityTier partitions.  This is not compati‐
2019                     ble with PreemptMode=OFF.
2020
2021              preempt/qos
2022                     Job preemption rules are specified by Quality Of  Service
2023                     (QOS)  specifications in the Slurm database.  This option
2024                     is not compatible with PreemptMode=OFF.  A  configuration
2025                     of  PreemptMode=SUSPEND  is only supported by the Select‐
2026                     Type=select/cons_res   and    SelectType=select/cons_tres
2027                     plugins.   See the sacctmgr man page to configure the op‐
2028                     tions for preempt/qos.
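
              As an illustration only (not a recommendation), a cluster  using
              partition-priority preemption with suspend/gang scheduling might
              combine these two parameters as follows:

                       # hypothetical example values
                       PreemptType=preempt/partition_prio
                       PreemptMode=SUSPEND,GANG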
2029
2030       PreemptExemptTime
2031              Global option for minimum run time for all jobs before they  can
2032              be  considered  for  preemption. Any QOS PreemptExemptTime takes
2033              precedence over the global option. This is only honored for Pre‐
2034              emptMode=REQUEUE and PreemptMode=CANCEL.
2035              A  time  of  -1 disables the option, equivalent to 0. Acceptable
2036              time formats include "minutes",  "minutes:seconds",  "hours:min‐
2037              utes:seconds",     "days-hours",    "days-hours:minutes",    and
2038              "days-hours:minutes:seconds".
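
              For example, requiring jobs to run for at least 30  minutes
              before they may be preempted could be expressed as (illustrative
              value only):

                       PreemptExemptTime=30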
2039
2040       PrEpParameters
2041              Parameters to be passed to the PrEpPlugins.
2042
2043       PrEpPlugins
2044              A resource for programmers wishing to write  their  own  plugins
2045              for  the Prolog and Epilog (PrEp) scripts. The default, and cur‐
2046              rently the only implemented plugin, is prep/script.  Additional
2047              plugins can be specified in a comma-separated list. For more in‐
2048              formation please see the PrEp  Plugin  API  documentation  page:
2049              <https://slurm.schedmd.com/prep_plugins.html>
2050
2051       PriorityCalcPeriod
2052              The  period of time in minutes in which the half-life decay will
2053              be re-calculated.  Applicable only if PriorityType=priority/mul‐
2054              tifactor.  The default value is 5 (minutes).
2055
2056       PriorityDecayHalfLife
2057              This  controls  how long prior resource use is considered in de‐
2058              termining how over- or under-serviced an association  is  (user,
2059              bank  account  and  cluster)  in  determining job priority.  The
2060              record of usage will be decayed over  time,  with  half  of  the
2061              original  value cleared at age PriorityDecayHalfLife.  If set to
2062              0 no decay will be applied.  This is helpful if you want to  en‐
2063              force  hard  time  limits  per association.  If set to 0 Priori‐
2064              tyUsageResetPeriod must be set  to  some  interval.   Applicable
2065              only  if  PriorityType=priority/multifactor.  The unit is a time
2066              string (i.e. min, hr:min:00, days-hr:min:00, or  days-hr).   The
2067              default value is 7-0 (7 days).
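
              As a sketch, a site wanting usage to decay with a two-week
              half-life, recalculated every 10 minutes, might use settings
              such as the following (illustrative values only):

                       PriorityDecayHalfLife=14-0
                       PriorityCalcPeriod=10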
2068
2069       PriorityFavorSmall
2070              Specifies  that small jobs should be given preferential schedul‐
2071              ing priority.  Applicable only  if  PriorityType=priority/multi‐
2072              factor.  Supported values are "YES" and "NO".  The default value
2073              is "NO".
2074
2075       PriorityFlags
2076              Flags to modify priority behavior.  Applicable only if Priority‐
2077              Type=priority/multifactor.   The  keywords below have no associ‐
2078              ated   value   (e.g.    "PriorityFlags=ACCRUE_ALWAYS,SMALL_RELA‐
2079              TIVE_TO_TIME").
2080
2081              ACCRUE_ALWAYS    If  set,  priority age factor will be increased
2082                               despite job ineligibility due to either  depen‐
2083                               dencies, holds or begin time in the future. Ac‐
2084                               crue limits are ignored.
2085
2086              CALCULATE_RUNNING
2087                               If set, priorities  will  be  recalculated  not
2088                               only  for  pending  jobs,  but also running and
2089                               suspended jobs.
2090
2091              DEPTH_OBLIVIOUS  If set, priority will be calculated in a man‐
2092                               ner similar to the normal multifactor calcula‐
2093                               tion, but the depth of the associations in the
2094                               tree does not adversely affect their priority.
2095                               This option automatically enables NO_FAIR_TREE.
2096
2097              NO_FAIR_TREE     Disables the "fair tree" algorithm, and reverts
2098                               to "classic" fair share priority scheduling.
2099
2100              INCR_ONLY        If  set,  priority values will only increase in
2101                               value. Job  priority  will  never  decrease  in
2102                               value.
2103
2104              MAX_TRES         If  set,  the  weighted  TRES value (e.g. TRES‐
2105                               BillingWeights) is calculated as the MAX of in‐
2106                               dividual TRES' on a node (e.g. cpus, mem, gres)
2107                               plus the sum of  all  global  TRES'  (e.g.  li‐
2108                               censes).
2109
2110              NO_NORMAL_ALL    If set, all NO_NORMAL_* flags are set.
2111
2112              NO_NORMAL_ASSOC  If  set,  the association factor is not normal‐
2113                               ized against the highest association priority.
2114
2115              NO_NORMAL_PART   If set, the partition factor is not  normalized
2116                               against  the  highest partition PriorityJobFac‐
2117                               tor.
2118
2119              NO_NORMAL_QOS    If  set,  the  QOS  factor  is  not  normalized
2120                               against the highest qos priority.
2121
2122              NO_NORMAL_TRES   If  set,  the  TRES  factor  is  not normalized
2123                               against the job's partition TRES counts.
2124
2125              SMALL_RELATIVE_TO_TIME
2126                               If set, the job's size component will be based
2127                               not upon the job size alone, but upon the job's
2128                               size divided by its time limit.
2129
2130       PriorityMaxAge
2131              Specifies the job age which will be given the maximum age factor
2132              in  computing priority. For example, a value of 30 minutes would
2133              result in all jobs over 30 minutes old  receiving  the  same
2134              age-based  priority.   Applicable  only  if  PriorityType=prior‐
2135              ity/multifactor.   The  unit  is  a  time  string   (i.e.   min,
2136              hr:min:00,  days-hr:min:00,  or  days-hr).  The default value is
2137              7-0 (7 days).
2138
2139       PriorityParameters
2140              Arbitrary string used by the PriorityType plugin.
2141
2142       PrioritySiteFactorParameters
2143              Arbitrary string used by the PrioritySiteFactorPlugin plugin.
2144
2145       PrioritySiteFactorPlugin
2146              This specifies an optional plugin to be used alongside  "prior‐
2147              ity/multifactor",  which  is meant to initially set and continu‐
2148              ously update the SiteFactor priority factor.  The default  value
2149              is "site_factor/none".
2150
2151       PriorityType
2152              This  specifies  the  plugin  to be used in establishing a job's
2153              scheduling priority.  Also see PriorityFlags  for  configuration
2154              options.  The default value is "priority/basic".
2155
2156              priority/basic
2157                     Jobs  are  evaluated in a First In, First Out (FIFO) man‐
2158                     ner.
2159
2160              priority/multifactor
2161                     Jobs are assigned a priority based upon a variety of fac‐
2162                     tors that include size, age, Fairshare, etc.
2163
2164              When not using FIFO scheduling, jobs are  prioritized  in  the
2165              following order:
2166
2167              1. Jobs that can preempt
2168              2. Jobs with an advanced reservation
2169              3. Partition PriorityTier
2170              4. Job priority
2171              5. Job submit time
2172              6. Job ID
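
              For illustration, a minimal multifactor configuration might look
              like the following; the weights are arbitrary example values and
              not recommendations:

                       PriorityType=priority/multifactor
                       PriorityWeightAge=1000
                       PriorityWeightFairshare=10000
                       PriorityWeightJobSize=1000
                       PriorityWeightPartition=1000
                       PriorityWeightQOS=2000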
2173
2174       PriorityUsageResetPeriod
2175              At this interval the usage of associations will be reset  to  0.
2176              This  is  used  if you want to enforce hard limits of time usage
2177              per association.  If PriorityDecayHalfLife is set to be 0 no de‐
2178              cay  will happen and this is the only way to reset the usage ac‐
2179              cumulated by running jobs.  By default this is turned off, and
2180              it is advised to use the PriorityDecayHalfLife option  instead,
2181              to avoid leaving the cluster with nothing eligible to run;  but
2182              if your scheme only allows a fixed amount of time on your  sys‐
2183              tem per association, this is the way to enforce it.  Applicable
2184              only if PriorityType=priority/multifactor.
2185
2186              NONE        Never clear historic usage. The default value.
2187
2188              NOW         Clear  the  historic usage now.  Executed at startup
2189                          and reconfiguration time.
2190
2191              DAILY       Cleared every day at midnight.
2192
2193              WEEKLY      Cleared every week on Sunday at time 00:00.
2194
2195              MONTHLY     Cleared on the first  day  of  each  month  at  time
2196                          00:00.
2197
2198              QUARTERLY   Cleared  on  the  first  day of each quarter at time
2199                          00:00.
2200
2201              YEARLY      Cleared on the first day of each year at time 00:00.
2202
2203       PriorityWeightAge
2204              An integer value that sets the degree to which  the  queue  wait
2205              time  component  contributes  to the job's priority.  Applicable
2206              only if  PriorityType=priority/multifactor.   Requires  Account‐
2207              ingStorageType=accounting_storage/slurmdbd.   The  default value
2208              is 0.
2209
2210       PriorityWeightAssoc
2211              An integer value that sets the degree to which  the  association
2212              component contributes to the job's priority.  Applicable only if
2213              PriorityType=priority/multifactor.  The default value is 0.
2214
2215       PriorityWeightFairshare
2216              An integer value that sets the degree to  which  the  fair-share
2217              component contributes to the job's priority.  Applicable only if
2218              PriorityType=priority/multifactor.    Requires   AccountingStor‐
2219              ageType=accounting_storage/slurmdbd.  The default value is 0.
2220
2221       PriorityWeightJobSize
2222              An integer value that sets the degree to which the job size com‐
2223              ponent contributes to the job's priority.   Applicable  only  if
2224              PriorityType=priority/multifactor.  The default value is 0.
2225
2226       PriorityWeightPartition
2227              Partition  factor  used by priority/multifactor plugin in calcu‐
2228              lating job priority.   Applicable  only  if  PriorityType=prior‐
2229              ity/multifactor.  The default value is 0.
2230
2231       PriorityWeightQOS
2232              An  integer  value  that sets the degree to which the Quality Of
2233              Service component contributes to the job's priority.  Applicable
2234              only if PriorityType=priority/multifactor.  The default value is
2235              0.
2236
2237       PriorityWeightTRES
2238              A comma-separated list of TRES Types and weights that  sets  the
2239              degree that each TRES Type contributes to the job's priority.
2240
2241              e.g.
2242              PriorityWeightTRES=CPU=1000,Mem=2000,GRES/gpu=3000
2243
2244              Applicable  only if PriorityType=priority/multifactor and if Ac‐
2245              countingStorageTRES is configured with each TRES Type.  Negative
2246              values are allowed.  The default values are 0.
2247
2248       PrivateData
2249              This  controls  what  type of information is hidden from regular
2250              users.  By default, all information is  visible  to  all  users.
2251              User SlurmUser and root can always view all information.  Multi‐
2252              ple values may be specified with a comma separator.   Acceptable
2253              values include:
2254
2255              accounts
2256                     (NON-SlurmDBD  ACCOUNTING ONLY) Prevents users from view‐
2257                     ing any account definitions unless they are  coordinators
2258                     of them.
2259
2260              cloud  Powered  down  nodes  in  the cloud are visible.  Without
2261                     this flag, cloud nodes will not appear in the  output  of
2262                     commands  like sinfo unless they are powered on, even for
2263                     SlurmUser and root.
2264
2265              events Prevents users from viewing event information unless they
2266                     have operator status or above.
2267
2268              jobs   Prevents  users  from viewing jobs or job steps belonging
2269                     to other users. (NON-SlurmDBD ACCOUNTING  ONLY)  Prevents
2270                     users  from  viewing job records belonging to other users
2271                     unless they are coordinators of the  association  running
2272                     the job when using sacct.
2273
2274              nodes  Prevents users from viewing node state information.
2275
2276              partitions
2277                     Prevents users from viewing partition state information.
2278
2279              reservations
2280                     Prevents  regular  users  from viewing reservations which
2281                     they can not use.
2282
2283              usage  Prevents users from viewing usage of any other user; this
2284                     applies to sshare.  (NON-SlurmDBD ACCOUNTING ONLY)  Pre‐
2285                     vents users from viewing usage of any  other  user; this
2286                     applies to sreport.
2287
2288              users  (NON-SlurmDBD  ACCOUNTING ONLY) Prevents users from view‐
2289                     ing information of any user other than themselves;  this
2290                     also means users can only see  the  associations  they
2291                     deal with.  Coordinators can see the associations of all
2292                     users in the accounts they  coordinate,  but  they  can
2293                     only see themselves when listing users.
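
              For example, a site wishing to hide job, usage and  reservation
              details from regular users might set (illustrative selection):

                       PrivateData=jobs,usage,reservations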
2294
2295       ProctrackType
2296              Identifies the plugin to be used for process tracking on  a  job
2297              step  basis.   The slurmd daemon uses this mechanism to identify
2298              all processes which are children of processes it  spawns  for  a
2299              user  job  step.  A restart of slurmctld is required for changes
2300              to this parameter to take effect.   NOTE:  "proctrack/linuxproc"
2301              and  "proctrack/pgid" can fail to identify all processes associ‐
2302              ated with a job since processes can become a child of  the  init
2303              process  (when  the  parent  process terminates) or change their
2304              process  group.   To  reliably  track  all   processes,   "proc‐
2305              track/cgroup" is highly recommended.  NOTE: The JobContainerType
2306              applies to a job allocation, while ProctrackType applies to  job
2307              steps.  Acceptable values at present include:
2308
2309              proctrack/cgroup
2310                     Uses  linux cgroups to constrain and track processes, and
2311                     is the default for systems with cgroup support.
2312                     NOTE: see "man cgroup.conf" for configuration details.
2313
2314              proctrack/cray_aries
2315                     Uses Cray proprietary process tracking.
2316
2317              proctrack/linuxproc
2318                     Uses linux process tree using parent process IDs.
2319
2320              proctrack/pgid
2321                     Uses Process Group IDs.
2322                     NOTE: This is the default for the BSD family.
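
              For example, on a Linux system with cgroup support, the
              recommended tracking mechanism would typically be configured as:

                       ProctrackType=proctrack/cgroup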
2323
2324       Prolog Fully qualified pathname of a program for the slurmd to  execute
2325              whenever it is asked to run a job step from a new job allocation
2326              (e.g.  "/usr/local/slurm/prolog"). A glob pattern (See glob (7))
2327              may  also  be used to specify more than one program to run (e.g.
2328              "/etc/slurm/prolog.d/*"). The slurmd executes the prolog  before
2329              starting  the  first job step.  The prolog script or scripts may
2330              be used to purge files, enable  user  login,  etc.   By  default
2331              there  is  no  prolog. Any configured script is expected to com‐
2332              plete execution quickly (in less time than MessageTimeout).   If
2333              the  prolog  fails (returns a non-zero exit code), this will re‐
2334              sult in the node being set to a DRAIN state and  the  job  being
2335              requeued  in  a held state, unless nohold_on_prolog_fail is con‐
2336              figured in SchedulerParameters.  See Prolog and  Epilog  Scripts
2337              for more information.
2338
2339       PrologEpilogTimeout
2340              The interval in seconds Slurm waits for Prolog and Epilog before
2341              terminating them. The default behavior is to wait  indefinitely.
2342              This  interval  applies  to  the Prolog and Epilog run by slurmd
2343              daemon before and after the job, the  PrologSlurmctld  and  Epi‐
2344              logSlurmctld  run by slurmctld daemon, and the SPANK plugin pro‐
2345              log/epilog       calls:        slurm_spank_job_prolog        and
2346              slurm_spank_job_epilog.
2347              If  the PrologSlurmctld times out, the job is requeued if possi‐
2348              ble.  If the Prolog or slurm_spank_job_prolog time out, the  job
2349              is  requeued if possible and the node is drained.  If the Epilog
2350              or slurm_spank_job_epilog time out, the node is drained.  In all
2351              cases, errors are logged.
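
              For example, to terminate prolog or epilog scripts that run for
              longer than five minutes (an illustrative value):

                       PrologEpilogTimeout=300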
2352
2353       PrologFlags
2354              Flags  to  control  the Prolog behavior. By default no flags are
2355              set.  Multiple flags may be specified in a comma-separated list.
2356              Currently supported options are:
2357
2358              Alloc   If  set, the Prolog script will be executed at job allo‐
2359                      cation. By default, Prolog is executed just  before  the
2360                      task  is launched. Therefore, when salloc is started, no
2361                      Prolog is executed. Alloc is useful for preparing things
2362                      before a user starts to use any allocated resources.  In
2363                      particular, this flag is needed on a  Cray  system  when
2364                      cluster compatibility mode is enabled.
2365
2366                      NOTE:  Use  of the Alloc flag will increase the time re‐
2367                      quired to start jobs.
2368
2369              Contain At job allocation time, use the ProcTrack plugin to cre‐
2370                      ate  a  job  container  on  all allocated compute nodes.
2371                      This container  may  be  used  for  user  processes  not
2372                      launched    under    Slurm    control,    for    example
2373                      pam_slurm_adopt may place processes launched  through  a
2374                      direct   user   login  into  this  container.  If  using
2375                      pam_slurm_adopt, then ProcTrackType must be set  to  ei‐
2376                      ther  proctrack/cgroup or proctrack/cray_aries.  Setting
2377                      the Contain implicitly sets the Alloc flag.
2378
2379              DeferBatch
2380                      If set, slurmctld will wait until the  prolog  completes
2381                      on  all  allocated  nodes  before  sending the batch job
2382                      launch request. With just the Alloc flag, slurmctld will
2383                      launch  the  batch step as soon as the first node in the
2384                      job allocation completes the prolog.
2385
2386              NoHold  If set, the Alloc flag should also be set.  This allows
2387                      salloc to avoid blocking until the prolog has  finished
2388                      on each node.  The blocking will happen when steps reach
2389                      the slurmd and before any execution has happened in the
2390                      step.  This is a much faster way to work, and  if  using
2391                      srun to launch your tasks you should use this
2392                      flag. This flag cannot be combined with the  Contain  or
2393                      X11 flags.
2394
2395              Serial  By  default,  the  Prolog and Epilog scripts run concur‐
2396                      rently on each node.  This flag forces those scripts  to
2397                      run  serially  within  each node, but with a significant
2398                      penalty to job throughput on each node.
2399
2400              X11     Enable Slurm's  built-in  X11  forwarding  capabilities.
2401                      This is incompatible with ProctrackType=proctrack/linux‐
2402                      proc.  Setting the X11 flag implicitly enables both Con‐
2403                      tain and Alloc flags as well.
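
              As a sketch, a site using pam_slurm_adopt might  configure  the
              following; Contain implicitly sets the Alloc flag:

                       # illustrative combination, not a recommendation
                       PrologFlags=Contain
                       ProctrackType=proctrack/cgroup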
2404
2405       PrologSlurmctld
2406              Fully  qualified  pathname of a program for the slurmctld daemon
2407              to execute before granting a new job allocation (e.g.  "/usr/lo‐
2408              cal/slurm/prolog_controller").   The  program  executes as Slur‐
2409              mUser on the same node where the slurmctld daemon executes, giv‐
2410              ing  it permission to drain nodes and requeue the job if a fail‐
2411              ure occurs or cancel the job if appropriate.  Exactly  what  the
2412              program  does  and how it accomplishes this is completely at the
2413              discretion of the system administrator.  Information  about  the
2414              job being initiated, its allocated nodes, etc. are passed to the
2415              program using environment variables.  While this program is run‐
2416              ning, the nodes associated with the job  will  have  a
2417              POWER_UP/CONFIGURING flag set in their state, which can be read‐
2418              ily  viewed.   The  slurmctld  daemon will wait indefinitely for
2419              this program to complete.  Once the program  completes  with  an
2420              exit  code  of  zero, the nodes will be considered ready for use
2421              and the job will be started.  If some node can not  be  made
2422              available  for use, the program should drain the node (typically
2423              using the scontrol command) and terminate with a  non-zero  exit
2424              code.   A  non-zero  exit  code will result in the job being re‐
2425              queued (where possible) or killed. Note that only batch jobs can
2426              be  requeued.   See  Prolog and Epilog Scripts for more informa‐
2427              tion.
2428
2429       PropagatePrioProcess
2430              Controls the scheduling priority (nice value)  of  user  spawned
2431              tasks.
2432
2433              0    The  tasks  will  inherit  the scheduling priority from the
2434                   slurm daemon.  This is the default value.
2435
2436              1    The tasks will inherit the scheduling priority of the  com‐
2437                   mand used to submit them (e.g. srun or sbatch).  Unless the
2438                   job is submitted by user root, the tasks will have a sched‐
2439                   uling  priority  no  higher  than the slurm daemon spawning
2440                   them.
2441
2442              2    The tasks will inherit the scheduling priority of the  com‐
2443                   mand used to submit them (e.g. srun or sbatch) with the re‐
2444                   striction that their nice value will always be  one  higher
2445                   than the slurm daemon (i.e. the tasks' scheduling priority
2446                   will be lower than the slurm daemon's).
2447
2448       PropagateResourceLimits
2449              A comma-separated list of resource limit names.  The slurmd dae‐
2450              mon  uses these names to obtain the associated (soft) limit val‐
2451              ues from the user's process  environment  on  the  submit  node.
2452              These  limits  are  then propagated and applied to the jobs that
2453              will run on the compute nodes.  This  parameter  can  be  useful
2454              when  system  limits vary among nodes.  Any resource limits that
2455              do not appear in the list are not propagated.  However, the user
2456              can  override this by specifying which resource limits to propa‐
2457              gate with the sbatch or srun "--propagate"  option.  If  neither
2458              PropagateResourceLimits  nor  PropagateResourceLimitsExcept  is
2459              configured and the "--propagate" option is not  specified,  then
2460              the  default  action is to propagate all limits. Only one of the
2461              parameters, either PropagateResourceLimits or PropagateResource‐
2462              LimitsExcept,  may be specified.  The user limits can not exceed
2463              hard limits under which the slurmd daemon operates. If the  user
2464              limits  are  not  propagated,  the limits from the slurmd daemon
2465              will be propagated to the user's job. The limits  used  for  the
2466              Slurm daemons can be set in the /etc/sysconfig/slurm file.  For
2467              more information, see  https://slurm.schedmd.com/faq.html#mem‐
2468              lock.  The following limit names are supported by Slurm (although
2469              some options may not be supported on some systems):
2470
2471              ALL       All limits listed below (default)
2472
2473              NONE      No limits listed below
2474
2475              AS        The maximum  address  space  (virtual  memory)  for  a
2476                        process.
2477
2478              CORE      The maximum size of core file
2479
2480              CPU       The maximum amount of CPU time
2481
2482              DATA      The maximum size of a process's data segment
2483
2484              FSIZE     The  maximum  size  of files created. Note that if the
2485                        user sets FSIZE to less than the current size  of  the
2486                        slurmd.log,  job  launches will fail with a 'File size
2487                        limit exceeded' error.
2488
2489              MEMLOCK   The maximum size that may be locked into memory
2490
2491              NOFILE    The maximum number of open files
2492
2493              NPROC     The maximum number of processes available
2494
2495              RSS       The maximum resident set size.  Note  that  this  only
2496                        has effect with Linux kernels 2.4.30 or older or BSD.
2497
2498              STACK     The maximum stack size
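
              For example, to propagate only the locked-memory and  open-file
              limits from the submit host (illustrative selection):

                       PropagateResourceLimits=MEMLOCK,NOFILE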
2499
2500       PropagateResourceLimitsExcept
2501              A comma-separated list of resource limit names.  By default, all
2502              resource limits will be propagated, (as described by the  Propa‐
2503              gateResourceLimits  parameter),  except for the limits appearing
2504              in this list.   The user can override this by  specifying  which
2505              resource  limits  to propagate with the sbatch or srun "--propa‐
2506              gate" option.  See PropagateResourceLimits above for a  list  of
2507              valid limit names.
2508
2509       RebootProgram
2510              Program  to  be  executed on each compute node to reboot it. In‐
2511              voked on each node once it becomes idle after the command "scon‐
2512              trol  reboot" is executed by an authorized user or a job is sub‐
2513              mitted with the "--reboot" option.  After rebooting, the node is
2514              returned to normal use.  See ResumeTimeout to configure the time
2515              you expect a reboot to finish in.  A node will be marked DOWN if
2516              it doesn't reboot within ResumeTimeout.
2517
2518       ReconfigFlags
2519              Flags  to  control  various  actions  that  may be taken when an
2520              "scontrol reconfig" command is  issued.  Currently  the  options
2521              are:
2522
2523              KeepPartInfo     If  set,  an  "scontrol  reconfig" command will
2524                               maintain  the  in-memory  value  of   partition
2525                               "state" and other parameters that may have been
2526                               dynamically updated by "scontrol update".  Par‐
2527                               tition  information in the slurm.conf file will
2528                               be merged with in-memory data.  This  flag  su‐
2529                               persedes the KeepPartState flag.
2530
2531              KeepPartState    If  set,  an  "scontrol  reconfig" command will
2532                               preserve only  the  current  "state"  value  of
2533                               in-memory  partitions  and will reset all other
2534                               parameters of the partitions that may have been
2535                               dynamically updated by "scontrol update" to the
2536                               values from the slurm.conf file.  Partition in‐
2537                               formation in the slurm.conf file will be merged
2538                               with in-memory data.
2539
2540              By default neither flag is set, and  "scontrol  reconfig"  will
2541              rebuild the partition information using only the
2542              definitions in the slurm.conf file.
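
              For example, to preserve dynamically updated partition settings
              across a reconfiguration (illustrative):

                       ReconfigFlags=KeepPartInfo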
2543
2544       RequeueExit
2545              Enables automatic requeue for batch jobs  which  exit  with  the
2546              specified values.  Separate multiple exit codes by a comma and/or
2547              specify numeric ranges using a  "-"  separator  (e.g.  "Requeue‐
2548              Exit=1-9,18").  Jobs will be put back into  pending  state  and
2549              later scheduled again.  Restarted jobs will have the environment
2550              variable  SLURM_RESTART_COUNT set to the number of times the job
2551              has been restarted.
2552
2553       RequeueExitHold
2554              Enables automatic requeue for batch jobs  which  exit  with  the
2555              specified values, with these jobs being held until released man‐
2556              ually by the user.  Separate multiple exit  codes  by  a  comma
2557              and/or  specify  numeric ranges using a "-" separator (e.g. "Re‐
2558              queueExitHold=10-12,16").  These jobs are put  in  the  JOB_SPE‐
2559              CIAL_EXIT  exit state.  Restarted jobs will have the environment
2560              variable SLURM_RESTART_COUNT set to the number of times the  job
2561              has been restarted.
2562
2563       ResumeFailProgram
2564              The program that will be executed when nodes  fail  to  resume
2565              by ResumeTimeout. The argument to the program will be the  names
2566              of the failed nodes (using Slurm's hostlist expression format).
2567
2568       ResumeProgram
2569              Slurm  supports a mechanism to reduce power consumption on nodes
2570              that remain idle for an extended period of time.  This is  typi‐
2571              cally accomplished by reducing voltage and frequency or powering
2572              the node down.  ResumeProgram is the program that will  be  exe‐
2573              cuted  when  a  node in power save mode is assigned work to per‐
2574              form.  For reasons of  reliability,  ResumeProgram  may  execute
2575              more  than once for a node when the slurmctld daemon crashes and
2576              is restarted.  If ResumeProgram is unable to restore a  node  to
2577              service  with  a  responding  slurmd and an updated BootTime, it
2578              should set the node state to DOWN, which will result  in  a  re‐
2579              queue of any job associated with the node - this will happen au‐
2580              tomatically if the node doesn't register  within  ResumeTimeout.
2581              If  the  node isn't actually rebooted (i.e. when multiple-slurmd
2582              is configured) starting slurmd with "-b" option might be useful.
2583              The  program executes as SlurmUser.  The argument to the program
2584              will be the names of nodes to be removed from power savings mode
2585              (using  Slurm's  hostlist expression format). A job to node map‐
2586              ping is available in JSON format by reading the  temporary  file
2587              specified by the SLURM_RESUME_FILE environment variable.  By de‐
2588              fault no program is run.
2589
2590       ResumeRate
2591              The rate at which nodes in power save mode are returned to  nor‐
2592              mal  operation by ResumeProgram.  The value is a number of nodes
2593              per minute and it can be used to prevent power surges if a large
2594              number of nodes in power save mode are assigned work at the same
2595              time (e.g. a large job starts).  A value of zero results  in  no
2596              limits  being  imposed.   The  default  value  is  300 nodes per
2597              minute.
2598
2599       ResumeTimeout
2600              Maximum time permitted (in seconds) between when a  node  resume
2601              request  is  issued  and when the node is actually available for
2602              use.  Nodes which fail to respond in this  time  frame  will  be
2603              marked  DOWN and the jobs scheduled on the node requeued.  Nodes
2604              which reboot after this time frame will be marked  DOWN  with  a
2605              reason of "Node unexpectedly rebooted."  The default value is 60
2606              seconds.
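
              As an illustration, the resume side of a power-saving setup
              might look like the following; the program path is hypothetical:

                       # /usr/local/sbin/slurm_resume is a hypothetical site script
                       ResumeProgram=/usr/local/sbin/slurm_resume
                       ResumeRate=100
                       ResumeTimeout=600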
2607
2608       ResvEpilog
2609              Fully qualified pathname of a program for the slurmctld to  exe‐
2610              cute  when a reservation ends. The program can be used to cancel
2611              jobs, modify  partition  configuration,  etc.   The  reservation
2612              named  will be passed as an argument to the program.  By default
2613              there is no epilog.
2614
2615       ResvOverRun
2616              Describes how long a job already running in a reservation should
2617              be  permitted  to  execute after the end time of the reservation
2618              has been reached.  The time period is specified in  minutes  and
2619              the  default  value  is 0 (kill the job immediately).  The value
2620              may not exceed 65533 minutes, although a value of "UNLIMITED" is
2621              supported to permit a job to run indefinitely after its reserva‐
2622              tion is terminated.
2623
2624       ResvProlog
2625              Fully qualified pathname of a program for the slurmctld to  exe‐
2626              cute  when a reservation begins. The program can be used to can‐
2627              cel jobs, modify partition configuration, etc.  The  reservation
2628              named  will be passed as an argument to the program.  By default
2629              there is no prolog.
2630
2631       ReturnToService
2632              Controls when a DOWN node will be returned to service.  The  de‐
2633              fault value is 0.  Supported values include
2634
2635              0   A node will remain in the DOWN state until a system adminis‐
2636                  trator explicitly changes its state (even if the slurmd dae‐
2637                  mon registers and resumes communications).
2638
2639              1   A  DOWN node will become available for use upon registration
2640                  with a valid configuration only if it was set  DOWN  due  to
2641                  being  non-responsive.   If  the  node  was set DOWN for any
2642                  other reason (low  memory,  unexpected  reboot,  etc.),  its
2643                  state  will  not automatically be changed.  A node registers
2644                  with a valid configuration if its memory, GRES,  CPU  count,
2645                  etc.  are  equal to or greater than the values configured in
2646                  slurm.conf.
2647
2648              2   A DOWN node will become available for use upon  registration
2649                  with  a  valid  configuration.  The node could have been set
2650                  DOWN for any reason.  A node registers with a valid configu‐
2651                  ration  if its memory, GRES, CPU count, etc. are equal to or
2652                  greater than the values configured in slurm.conf.
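
              For example, to let nodes that were set DOWN only for being
              non-responsive return to service automatically upon registration
              with a valid configuration:

                       ReturnToService=1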
2653
2654       RoutePlugin
2655              Identifies the plugin to be used for defining which  nodes  will
2656              be used for message forwarding.
2657
2658              route/default
2659                     default, use TreeWidth.
2660
2661              route/topology
2662                     use the switch hierarchy defined in a topology.conf file.
2663                     TopologyPlugin=topology/tree is required.
2664
2665       SchedulerParameters
2666              The interpretation of this parameter  varies  by  SchedulerType.
2667              Multiple options may be comma separated.
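
              For illustration, two of the backfill options described below
              might be combined as follows (example values only):

                       SchedulerParameters=bf_continue,bf_max_job_test=1000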
2668
2669              allow_zero_lic
2670                     If set, then job submissions requesting more than config‐
2671                     ured licenses won't be rejected.
2672
2673              assoc_limit_stop
2674                     If set and a job cannot start due to association  limits,
2675                     then  do  not attempt to initiate any lower priority jobs
2676                     in that  partition.  Setting  this  can  decrease  system
2677                     throughput and utilization, but can avoid  potentially
2678                     starving larger jobs whose launch might  otherwise  be
2679                     delayed indefinitely.
2680
2681              batch_sched_delay=#
2682                     How long, in seconds, the scheduling of batch jobs can be
2683                     delayed.  This can be useful in a  high-throughput  envi‐
2684                     ronment  in which batch jobs are submitted at a very high
2685                     rate (i.e. using the sbatch command) and  one  wishes  to
2686                     reduce the overhead of attempting to schedule each job at
2687                     submit time.  The default value is 3 seconds.
2688
2689              bb_array_stage_cnt=#
2690                     Number of tasks from a job array that should be available
2691                     for  burst buffer resource allocation. Higher values will
2692                     increase the system overhead as each task  from  the  job
2693                     array  will  be moved to its own job record in memory, so
2694                     relatively small values are generally  recommended.   The
2695                     default value is 10.
2696
2697              bf_busy_nodes
2698                     When  selecting resources for pending jobs to reserve for
2699                     future execution (i.e. the job can not be started immedi‐
2700                     ately), then preferentially select nodes that are in use.
2701                     This will tend to leave currently idle  resources  avail‐
2702                     able  for backfilling longer running jobs, but may result
2703                     in allocations having less than optimal network topology.
2704                     This  option  is  currently  only  supported  by  the se‐
2705                     lect/cons_res  and  select/cons_tres  plugins   (or   se‐
2706                     lect/cray_aries    with   SelectTypeParameters   set   to
2707                     "OTHER_CONS_RES" or "OTHER_CONS_TRES", which  layers  the
2708                     select/cray_aries  plugin over the select/cons_res or se‐
2709                     lect/cons_tres plugin respectively).
2710
2711              bf_continue
2712                     The backfill scheduler periodically releases locks in or‐
2713                     der  to  permit  other  operations to proceed rather than
2714                     blocking all activity for what could be an  extended  pe‐
2715                     riod  of  time.  Setting this option will cause the back‐
2716                     fill scheduler to continue processing pending  jobs  from
2717                     its  original  job list after releasing locks even if job
2718                     or node state changes.
2719
2720              bf_hetjob_immediate
2721                     Instruct the backfill scheduler to  attempt  to  start  a
2722                     heterogeneous  job  as  soon as all of its components are
2723                     determined able to do so. Otherwise, the backfill  sched‐
2724                     uler  will  delay  heterogeneous jobs initiation attempts
2725                     until after the rest of the  queue  has  been  processed.
2726                     This  delay may result in lower priority jobs being allo‐
2727                     cated resources, which could delay the initiation of  the
2728                     heterogeneous  job due to account and/or QOS limits being
2729                     reached. This option is disabled by default.  If  enabled
2730                     and bf_hetjob_prio=min is not set, then it will be auto‐
2731                     matically set.
2732
2733              bf_hetjob_prio=[min|avg|max]
2734                     At the beginning of each  backfill  scheduling  cycle,  a
2735                     list of pending jobs to be scheduled is sorted  according
2736                     to the precedence order configured in PriorityType.  This
2737                     option instructs the scheduler to alter the sorting algo‐
2738                     rithm to ensure that all components belonging to the same
2739                     heterogeneous  job will be attempted to be scheduled con‐
2740                     secutively (thus not fragmented in the  resulting  list).
2741                     More specifically, all components from the same heteroge‐
2742                     neous job will be treated as if they all  have  the  same
2743                     priority (minimum, average or maximum depending upon this
2744                     option's parameter) when compared  with  other  jobs  (or
2745                     other  heterogeneous  job components). The original order
2746                     will be preserved within the same heterogeneous job. Note
2747                     that  the  operation  is  calculated for the PriorityTier
2748                     layer and for the  Priority  resulting  from  the  prior‐
2749                     ity/multifactor plugin calculations. When enabled, if any
2750                     heterogeneous job requested an advanced reservation, then
2751                     all  of  that job's components will be treated as if they
2752                     had requested an advanced reservation (and get  preferen‐
2753                     tial treatment in scheduling).
2754
2755                     Note  that  this  operation  does not update the Priority
2756                     values of the heterogeneous job  components,  only  their
2757                     order within the list, so the output of the sprio command
2758                     will not be affected.
2759
2760                     Heterogeneous jobs have  special  scheduling  properties:
2761                     they  are  only  scheduled  by  the  backfill  scheduling
2762                     plugin, each of their components is considered separately
2763                     when reserving resources (and might have different Prior‐
2764                     ityTier or different Priority values), and  no  heteroge‐
2765                     neous job component is actually allocated resources until
2766                     all of its components can be initiated.  This  may  imply
2767                     potential  scheduling  deadlock  scenarios because compo‐
2768                     nents from different heterogeneous jobs can start reserv‐
2769                     ing  resources  in  an  interleaved fashion (not consecu‐
2770                     tively), but none of the jobs can reserve  resources  for
2771                     all  components  and start. Enabling this option can help
2772                     to mitigate this problem. By default, this option is dis‐
2773                     abled.
2774
2775              bf_interval=#
2776                     The   number  of  seconds  between  backfill  iterations.
2777                     Higher values result in less overhead and better  respon‐
2778                     siveness.    This   option  applies  only  to  Scheduler‐
2779                     Type=sched/backfill.  Default: 30,  Min:  1,  Max:  10800
2780                     (3h).  A setting of -1 will disable the backfill schedul‐
2781                     ing loop.
2782
2783              bf_job_part_count_reserve=#
2784                     The backfill scheduling logic will reserve resources  for
2785                     the specified count of highest priority jobs in each par‐
2786                     tition.  For example,  bf_job_part_count_reserve=10  will
2787                     cause the backfill scheduler to reserve resources for the
2788                     ten highest priority jobs in each partition.   Any  lower
2789                     priority  job  that can be started using currently avail‐
2790                     able resources and  not  adversely  impact  the  expected
2791                     start  time of these higher priority jobs will be started
2792                     by the backfill scheduler.  The default  value  is  zero,
2793                     which  will reserve resources for any pending job and de‐
2794                     lay  initiation  of  lower  priority  jobs.    Also   see
2795                     bf_min_age_reserve  and bf_min_prio_reserve.  Default: 0,
2796                     Min: 0, Max: 100000.
2797
2798              bf_licenses
2799                     Require the backfill scheduling logic to track  and  plan
2800                     for  license availability. By default, any job blocked on
2801                     license availability will not have  resources  reserved,
2802                     which can lead to job starvation.  This option implicitly
2803                     enables bf_running_job_reserve.
2804
2805              bf_max_job_array_resv=#
2806                     The maximum number of tasks from a job  array  for  which
2807                     the  backfill scheduler will reserve resources in the fu‐
2808                     ture.  Since job arrays can potentially have millions  of
2809                     tasks,  the overhead in reserving resources for all tasks
2810                     can be prohibitive.  In addition various limits may  pre‐
2811                     vent  all  the  jobs from starting at the expected times.
2812                     This has no impact upon the number of tasks  from  a  job
2813                     array  that  can be started immediately, only those tasks
2814                     expected to start at some future time.  Default: 20, Min:
2815                     0,  Max:  1000.   NOTE: Jobs submitted to multiple parti‐
2816                     tions appear in the job queue once per partition. If dif‐
2817                     ferent copies of a single job array record aren't consec‐
2818                     utive in the job queue and another job array record is in
2819                     between,  then bf_max_job_array_resv tasks are considered
2820                     per partition that the job is submitted to.
2821
2822              bf_max_job_assoc=#
2823                     The maximum number of jobs per user  association  to  at‐
2824                     tempt starting with the backfill scheduler.  This setting
2825                     is similar to bf_max_job_user but is handy if a user  has
2826                     multiple  associations  equating  to  basically different
2827                     users.  One can set this  limit  to  prevent  users  from
2828                     flooding  the  backfill queue with jobs that cannot start
2829                     and that prevent jobs from other users  from  starting.
2830                     This option applies only to SchedulerType=sched/backfill.
2831                     Also   see   the    bf_max_job_user,    bf_max_job_part,
2832                     bf_max_job_test  and bf_max_job_user_part=# options.  Set
2833                     bf_max_job_test   to   a   value   much    higher    than
2834                     bf_max_job_assoc.   Default:  0  (no limit), Min: 0, Max:
2835                     bf_max_job_test.
2836
2837              bf_max_job_part=#
2838                     The maximum number  of  jobs  per  partition  to  attempt
2839                     starting  with  the backfill scheduler. This can be espe‐
2840                     cially helpful for systems with large numbers  of  parti‐
2841                     tions  and  jobs.  This option applies only to Scheduler‐
2842                     Type=sched/backfill.  Also  see  the  partition_job_depth
2843                     and  bf_max_job_test  options.   Set bf_max_job_test to a
2844                     value much higher than bf_max_job_part.  Default:  0  (no
2845                     limit), Min: 0, Max: bf_max_job_test.
2846
2847              bf_max_job_start=#
2848                     The  maximum  number  of jobs which can be initiated in a
2849                     single iteration of the backfill scheduler.  This  option
2850                     applies only to SchedulerType=sched/backfill.  Default: 0
2851                     (no limit), Min: 0, Max: 10000.
2852
2853              bf_max_job_test=#
2854                     The maximum number of jobs to attempt backfill scheduling
2855                     for (i.e. the queue depth).  Higher values result in more
2856                     overhead and less responsiveness.  Until  an  attempt  is
2857                     made  to backfill schedule a job, its expected initiation
2858                     time value will not be set.  In the case of  large  clus‐
2859                     ters,  configuring a relatively small value may be desir‐
2860                     able.    This   option   applies   only   to   Scheduler‐
2861                     Type=sched/backfill.    Default:   500,   Min:   1,  Max:
2862                     1,000,000.
2863
2864              bf_max_job_user=#
2865                     The maximum number of jobs per user to  attempt  starting
2866                     with  the backfill scheduler for ALL partitions.  One can
2867                     set this limit to prevent users from flooding  the  back‐
                         fill queue with jobs that cannot start and that prevent
                         jobs from other users from starting. This is similar to the
2870                     MAXIJOB  limit  in  Maui.   This  option  applies only to
2871                     SchedulerType=sched/backfill.      Also      see      the
2872                     bf_max_job_part,            bf_max_job_test           and
2873                     bf_max_job_user_part=# options.  Set bf_max_job_test to a
2874                     value  much  higher than bf_max_job_user.  Default: 0 (no
2875                     limit), Min: 0, Max: bf_max_job_test.
2876
2877              bf_max_job_user_part=#
2878                     The maximum number of jobs per user per partition to  at‐
2879                     tempt starting with the backfill scheduler for any single
2880                     partition.   This  option  applies  only  to   Scheduler‐
2881                     Type=sched/backfill.    Also   see  the  bf_max_job_part,
2882                     bf_max_job_test and bf_max_job_user=# options.   Default:
2883                     0 (no limit), Min: 0, Max: bf_max_job_test.
2884
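                         As a purely illustrative sketch (values are arbitrary,
                         not recommendations), a site wanting a deep backfill
                         queue with per-partition and per-user caps might set:

                             # Illustrative backfill depth caps; tune to the site workload
                             SchedulerParameters=bf_max_job_test=2000,bf_max_job_part=500,bf_max_job_user=50
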
2885              bf_max_time=#
2886                     The  maximum  time  in seconds the backfill scheduler can
2887                     spend (including time spent sleeping when locks  are  re‐
2888                     leased)  before discontinuing, even if maximum job counts
2889                     have not been  reached.   This  option  applies  only  to
2890                     SchedulerType=sched/backfill.   The  default value is the
2891                     value of bf_interval (which defaults to 30 seconds).  De‐
2892                     fault: bf_interval value (def. 30 sec), Min: 1, Max: 3600
2893                     (1h).  NOTE: If bf_interval is short and  bf_max_time  is
2894                     large, this may cause locks to be acquired too frequently
2895                     and starve out other serviced RPCs. It's advisable if us‐
2896                     ing  this  parameter  to set max_rpc_cnt high enough that
2897                     scheduling isn't always disabled, and low enough that the
2898                     interactive  workload can get through in a reasonable pe‐
2899                     riod of time. max_rpc_cnt needs to be below 256 (the  de‐
2900                     fault  RPC thread limit). Running around the middle (150)
                         may give you good results. NOTE: Increasing the amount
                         of time spent in the backfill scheduling cycle can
                         prevent Slurm from responding to client requests in a
                         timely manner. To address this, use max_rpc_cnt to
                         specify a number of queued RPCs at which the scheduler
                         stops scheduling in order to respond to those requests.
2907
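                         As a hedged example of the tuning described above
                         (values illustrative), a longer backfill cycle can be
                         paired with a moderate max_rpc_cnt so that yielded
                         locks still let client requests through:

                             # Example only: longer backfill cycles with RPC-based yielding
                             SchedulerParameters=bf_interval=30,bf_max_time=120,max_rpc_cnt=150
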
2908              bf_min_age_reserve=#
2909                     The  backfill  and main scheduling logic will not reserve
2910                     resources for pending jobs until they have  been  pending
2911                     and  runnable  for  at least the specified number of sec‐
2912                     onds.  In addition, jobs waiting for less than the speci‐
2913                     fied number of seconds will not prevent a newly submitted
2914                     job from starting immediately, even if the newly  submit‐
2915                     ted  job  has  a lower priority.  This can be valuable if
2916                     jobs lack time limits or all time limits  have  the  same
2917                     value.  The default value is zero, which will reserve re‐
2918                     sources for any pending job and delay initiation of lower
2919                     priority  jobs.   Also  see bf_job_part_count_reserve and
2920                     bf_min_prio_reserve.  Default: 0, Min:  0,  Max:  2592000
2921                     (30 days).
2922
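                         For example (illustrative value), reserving resources
                         only for jobs that have been pending and runnable for
                         at least ten minutes could be expressed as:

                             # Example only: reserve resources for jobs pending >= 10 minutes
                             SchedulerParameters=bf_min_age_reserve=600
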
2923              bf_min_prio_reserve=#
2924                     The  backfill  and main scheduling logic will not reserve
2925                     resources for pending jobs unless they  have  a  priority
2926                     equal  to  or  higher than the specified value.  In addi‐
2927                     tion, jobs with a lower priority will not prevent a newly
2928                     submitted  job  from  starting  immediately,  even if the
2929                     newly submitted job has a lower priority.   This  can  be
                         valuable if one wishes to maximize system utilization
2931                     without regard for job priority below a  certain  thresh‐
2932                     old.   The  default value is zero, which will reserve re‐
2933                     sources for any pending job and delay initiation of lower
2934                     priority  jobs.   Also  see bf_job_part_count_reserve and
2935                     bf_min_age_reserve.  Default: 0, Min: 0, Max: 2^63.
2936
2937              bf_node_space_size=#
2938                     Size of backfill node_space table. Adding a single job to
2939                     backfill  reservations  in the worst case can consume two
2940                     node_space records.  In the case of large clusters,  con‐
2941                     figuring a relatively small value may be desirable.  This
2942                     option  applies  only  to   SchedulerType=sched/backfill.
2943                     Also see bf_max_job_test and bf_running_job_reserve.  De‐
2944                     fault: bf_max_job_test, Min: 2, Max: 2,000,000.
2945
2946              bf_one_resv_per_job
2947                     Disallow adding more than one  backfill  reservation  per
2948                     job.   The  scheduling logic builds a sorted list of job-
2949                     partition pairs. Jobs submitted  to  multiple  partitions
2950                     have as many entries in the list as requested partitions.
2951                     By default, the backfill scheduler may evaluate  all  the
2952                     job-partition  entries  for a single job, potentially re‐
2953                     serving resources for each pair, but  only  starting  the
2954                     job  in the reservation offering the earliest start time.
2955                     Having a single job reserving resources for multiple par‐
2956                     titions  could  impede  other jobs (or hetjob components)
2957                     from reserving resources already reserved for the  parti‐
2958                     tions that don't offer the earliest start time.  A single
2959                     job that requests multiple partitions  can  also  prevent
2960                     itself  from  starting earlier in a lower priority parti‐
2961                     tion if the  partitions  overlap  nodes  and  a  backfill
2962                     reservation in the higher priority partition blocks nodes
2963                     that are also in the lower priority partition.  This  op‐
2964                     tion  makes it so that a job submitted to multiple parti‐
2965                     tions will stop reserving resources once the  first  job-
2966                     partition  pair has booked a backfill reservation. Subse‐
2967                     quent pairs from the same job  will  only  be  tested  to
                         start now. This allows other jobs to book the other
                         pairs' resources, at the cost of not guaranteeing that
                         the multi-partition job will start in the partition
2971                     offering the earliest start time (unless it can start im‐
2972                     mediately).  This option is disabled by default.
2973
2974              bf_resolution=#
2975                     The  number  of  seconds  in the resolution of data main‐
2976                     tained about when jobs begin and end. Higher  values  re‐
2977                     sult in better responsiveness and quicker backfill cycles
2978                     by using larger blocks of time to determine  node  eligi‐
2979                     bility.   However,  higher  values lead to less efficient
2980                     system planning, and may miss  opportunities  to  improve
2981                     system  utilization.   This option applies only to Sched‐
2982                     ulerType=sched/backfill.  Default: 60, Min: 1, Max:  3600
2983                     (1 hour).
2984
2985              bf_running_job_reserve
2986                     Add  an extra step to backfill logic, which creates back‐
2987                     fill reservations for jobs running on whole nodes.   This
2988                     option is disabled by default.
2989
2990              bf_window=#
2991                     The  number  of minutes into the future to look when con‐
2992                     sidering jobs to schedule.  Higher values result in  more
2993                     overhead  and  less  responsiveness.  A value at least as
2994                     long as the highest allowed time limit is  generally  ad‐
2995                     visable to prevent job starvation.  In order to limit the
2996                     amount of data managed by the backfill scheduler, if  the
2997                     value of bf_window is increased, then it is generally ad‐
2998                     visable to also increase bf_resolution.  This option  ap‐
2999                     plies  only  to  SchedulerType=sched/backfill.   Default:
3000                     1440 (1 day), Min: 1, Max: 43200 (30 days).
3001
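                         As a sketch of scaling both options together (values
                         illustrative), a site whose longest time limit is
                         seven days might use a coarser resolution with a wider
                         window:

                             # Example only: 7-day lookahead window, 10-minute resolution
                             SchedulerParameters=bf_window=10080,bf_resolution=600
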
3002              bf_window_linear=#
3003                     For performance reasons, the backfill scheduler will  de‐
3004                     crease  precision in calculation of job expected termina‐
3005                     tion times. By default, the precision starts at  30  sec‐
3006                     onds  and that time interval doubles with each evaluation
3007                     of currently executing jobs when trying to determine when
3008                     a  pending  job  can start. This algorithm can support an
3009                     environment with many thousands of running jobs, but  can
3010                     result  in  the expected start time of pending jobs being
                         gradually deferred due to lack of precision. A
3012                     value  for  bf_window_linear will cause the time interval
3013                     to be increased by a constant amount on  each  iteration.
3014                     The  value is specified in units of seconds. For example,
3015                     a value of 60 will cause the backfill  scheduler  on  the
3016                     first  iteration  to  identify the job ending soonest and
3017                     determine if the pending job can be  started  after  that
3018                     job plus all other jobs expected to end within 30 seconds
3019                     (default initial value) of the first job. On the next it‐
3020                     eration,  the  pending job will be evaluated for starting
3021                     after the next job expected to end plus all  jobs  ending
3022                     within  90  seconds of that time (30 second default, plus
3023                     the 60 second option value).  The  third  iteration  will
3024                     have  a  150  second  window  and the fourth 210 seconds.
3025                     Without this option, the time windows will double on each
3026                     iteration  and thus be 30, 60, 120, 240 seconds, etc. The
3027                     use of bf_window_linear is not recommended with more than
3028                     a few hundred simultaneously executing jobs.
3029
3030              bf_yield_interval=#
3031                     The backfill scheduler will periodically relinquish locks
3032                     in order for other  pending  operations  to  take  place.
                         This specifies the interval, in microseconds, at which
                         the locks are relinquished. Smaller values may be helpful for high
3035                     throughput  computing  when  used in conjunction with the
3036                     bf_continue option.  Also see the bf_yield_sleep  option.
3037                     Default:  2,000,000  (2 sec), Min: 1, Max: 10,000,000 (10
3038                     sec).
3039
3040              bf_yield_sleep=#
3041                     The backfill scheduler will periodically relinquish locks
3042                     in  order  for  other  pending  operations to take place.
3043                     This specifies the length of time for which the locks are
3044                     relinquished  in microseconds.  Also see the bf_yield_in‐
3045                     terval option.  Default: 500,000 (0.5 sec), Min: 1,  Max:
3046                     10,000,000 (10 sec).
3047
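                         A high-throughput site might, for instance, yield
                         locks more often and for shorter periods while letting
                         the backfill cycle continue where it left off (values
                         illustrative):

                             # Example only: yield locks every second, sleep 0.2 sec, then continue
                             SchedulerParameters=bf_continue,bf_yield_interval=1000000,bf_yield_sleep=200000
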
3048              build_queue_timeout=#
3049                     Defines  the maximum time that can be devoted to building
3050                     a queue of jobs to be tested for scheduling.  If the sys‐
3051                     tem  has  a  huge  number of jobs with dependencies, just
3052                     building the job queue can take so much time  as  to  ad‐
                         versely impact overall system performance, and this
                         parameter can be adjusted as needed. The default value is
3055                     2,000,000 microseconds (2 seconds).
3056
3057              correspond_after_task_cnt=#
3058                     Defines  the number of array tasks that get split for po‐
                         tential aftercorr dependency checks. A low number may
                         result in dependent task check failures when the job one
                         depends on is purged before the split. Default: 10.
3062
3063              default_queue_depth=#
3064                     The default number of jobs to  attempt  scheduling  (i.e.
3065                     the  queue  depth)  when a running job completes or other
                         routine actions occur; however, the frequency with which
3067                     the scheduler is run may be limited by using the defer or
3068                     sched_min_interval parameters described below.  The  full
3069                     queue  will be tested on a less frequent basis as defined
3070                     by the sched_interval option described below. The default
3071                     value  is  100.   See  the  partition_job_depth option to
3072                     limit depth by partition.
3073
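                         For example (values illustrative), a cluster with many
                         partitions might test a deeper overall queue while
                         capping the depth considered from each partition:

                             # Example only: deeper overall queue, capped per partition
                             SchedulerParameters=default_queue_depth=500,partition_job_depth=100
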
3074              defer  Setting this option will  avoid  attempting  to  schedule
3075                     each  job  individually  at job submit time, but defer it
3076                     until a later time when scheduling multiple jobs simulta‐
3077                     neously  may be possible.  This option may improve system
3078                     responsiveness when large numbers of jobs (many hundreds)
3079                     are  submitted  at  the  same time, but it will delay the
3080                     initiation  time  of  individual  jobs.  Also   see   de‐
3081                     fault_queue_depth above.
3082
3083              delay_boot=#
                         Do not reboot nodes in order to satisfy this job's fea‐
3085                     ture specification if the job has been  eligible  to  run
3086                     for  less  than  this time period.  If the job has waited
3087                     for less than the specified  period,  it  will  use  only
3088                     nodes which already have the specified features.  The ar‐
3089                     gument is in units of minutes.  Individual jobs may over‐
3090                     ride this default value with the --delay-boot option.
3091
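                         For instance (illustrative value), to reboot nodes for
                         a job's feature request only after the job has waited
                         ten minutes:

                             # Example only: wait 10 minutes before rebooting nodes for features
                             SchedulerParameters=delay_boot=10
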
3092              disable_job_shrink
3093                     Deny  user  requests  to shrink the size of running jobs.
3094                     (However, running jobs may still shrink due to node fail‐
3095                     ure if the --no-kill option was set.)
3096
3097              disable_hetjob_steps
3098                     Disable  job  steps  that  span heterogeneous job alloca‐
3099                     tions.
3100
3101              enable_hetjob_steps
3102                     Enable job steps that span heterogeneous job allocations.
                         This is the default value.
3104
3105              enable_user_top
3106                     Enable  use  of  the "scontrol top" command by non-privi‐
3107                     leged users.
3108
3109              Ignore_NUMA
3110                     Some processors (e.g. AMD Opteron  6000  series)  contain
3111                     multiple  NUMA  nodes per socket. This is a configuration
3112                     which does not map into the hardware entities that  Slurm
3113                     optimizes   resource  allocation  for  (PU/thread,  core,
3114                     socket, baseboard, node and network switch). In order  to
3115                     optimize  resource  allocations  on  such hardware, Slurm
3116                     will consider each NUMA node within the socket as a sepa‐
3117                     rate socket by default. Use the Ignore_NUMA option to re‐
3118                     port the correct socket count, but not optimize  resource
3119                     allocations on the NUMA nodes.
3120
                         NOTE: Since hwloc 2.0, NUMA nodes are not part of the
                         main/CPU topology tree. Because of that, if Slurm is
                         built with hwloc 2.0 or above, Slurm will treat
                         HWLOC_OBJ_PACKAGE as a socket. You can change this
                         behavior using SlurmdParameters=l3cache_as_socket.
3126
3127              ignore_prefer_validation
                         If set and a job requests --prefer, any features in the
                         request that would create an invalid request on the
                         current system will not generate an error. This is help‐
3131                     ful for dynamic systems where nodes  with  features  come
3132                     and  go.   Please note using this option will not protect
3133                     you from typos.
3134
3135              max_array_tasks
3136                     Specify the maximum number of tasks that can be  included
3137                     in  a  job array.  The default limit is MaxArraySize, but
3138                     this option can be used to set a lower limit.  For  exam‐
3139                     ple,  max_array_tasks=1000  and MaxArraySize=100001 would
3140                     permit a maximum task ID of 100000, but limit the  number
3141                     of tasks in any single job array to 1000.
3142
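                         The example above corresponds to a configuration along
                         these lines:

                             # Task IDs up to 100000, at most 1000 tasks per job array
                             MaxArraySize=100001
                             SchedulerParameters=max_array_tasks=1000
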
3143              max_rpc_cnt=#
3144                     If  the  number of active threads in the slurmctld daemon
3145                     is equal to or larger than this value,  defer  scheduling
3146                     of  jobs. The scheduler will check this condition at cer‐
3147                     tain points in code and yield locks if  necessary.   This
3148                     can improve Slurm's ability to process requests at a cost
3149                     of initiating new jobs less frequently. Default:  0  (op‐
3150                     tion disabled), Min: 0, Max: 1000.
3151
3152                     NOTE:  The maximum number of threads (MAX_SERVER_THREADS)
3153                     is internally set to 256 and defines the number of served
                         RPCs at a given time. Setting max_rpc_cnt to more than
                         256 is only useful to let backfill continue scheduling
                         work after locks have been yielded (i.e. every 2
                         seconds) if there are at most MAX(max_rpc_cnt/10, 20)
                         RPCs in the queue. For example, with max_rpc_cnt=1000
                         the scheduler will be allowed to continue after yielding
                         locks only when there are 100 or fewer pending RPCs.
3161                     If a value is set, then a value of 10 or higher is recom‐
3162                     mended.  It  may require some tuning for each system, but
3163                     needs to be high enough that scheduling isn't always dis‐
3164                     abled,  and low enough that requests can get through in a
3165                     reasonable period of time.
3166
3167              max_sched_time=#
3168                     How long, in seconds, that the main scheduling loop  will
3169                     execute for before exiting.  If a value is configured, be
3170                     aware that all other Slurm operations  will  be  deferred
3171                     during this time period.  Make certain the value is lower
3172                     than MessageTimeout.  If a value is not  explicitly  con‐
3173                     figured, the default value is half of MessageTimeout with
3174                     a minimum default value of 1 second and a maximum default
3175                     value  of  2  seconds.  For example if MessageTimeout=10,
3176                     the time limit will be 2 seconds (i.e. MIN(10/2, 2) = 2).
3177
3178              max_script_size=#
3179                     Specify the maximum size of a  batch  script,  in  bytes.
3180                     The  default value is 4 megabytes.  Larger values may ad‐
3181                     versely impact system performance.
3182
3183              max_switch_wait=#
3184                     Maximum number of seconds that a job can delay  execution
3185                     waiting  for  the specified desired switch count. The de‐
3186                     fault value is 300 seconds.
3187
3188              no_backup_scheduling
3189                     If used, the backup controller  will  not  schedule  jobs
3190                     when it takes over. The backup controller will allow jobs
3191                     to be submitted, modified and cancelled but won't  sched‐
3192                     ule  new  jobs.  This is useful in Cray environments when
3193                     the backup controller resides on an external  Cray  node.
3194                     A  restart  of  slurmctld is required for changes to this
3195                     parameter to take effect.
3196
3197              no_env_cache
                         If used, any job started on a node that fails to load
                         the environment will fail instead of using the cached
                         environment. This also implies the
                         requeue_setup_env_fail option.
3202
3203              nohold_on_prolog_fail
3204                     By default, if the Prolog exits with a non-zero value the
3205                     job is requeued in a held state. By specifying  this  pa‐
3206                     rameter the job will be requeued but not held so that the
3207                     scheduler can dispatch it to another host.
3208
3209              pack_serial_at_end
3210                     If used  with  the  select/cons_res  or  select/cons_tres
3211                     plugin,  then put serial jobs at the end of the available
3212                     nodes rather than using a best fit algorithm.   This  may
3213                     reduce resource fragmentation for some workloads.
3214
3215              partition_job_depth=#
3216                     The  default  number  of jobs to attempt scheduling (i.e.
3217                     the queue depth) from  each  partition/queue  in  Slurm's
3218                     main  scheduling  logic.  The functionality is similar to
3219                     that provided by the bf_max_job_part option for the back‐
3220                     fill  scheduling  logic.   The  default  value  is  0 (no
                         limit). Jobs excluded from attempted scheduling based
3222                     upon  partition  will  not  be  counted  against  the de‐
3223                     fault_queue_depth limit.  Also  see  the  bf_max_job_part
3224                     option.
3225
3226              preempt_reorder_count=#
3227                     Specify  how  many  attempts should be made in reordering
3228                     preemptable jobs to minimize the count of jobs preempted.
3229                     The  default value is 1. High values may adversely impact
3230                     performance.  The logic to support this  option  is  only
3231                     available  in  the  select/cons_res  and select/cons_tres
3232                     plugins.
3233
3234              preempt_strict_order
3235                     If set, then execute extra logic in an attempt to preempt
3236                     only  the  lowest  priority jobs.  It may be desirable to
3237                     set this configuration parameter when there are  multiple
3238                     priorities  of  preemptable  jobs.   The logic to support
3239                     this option is only available in the select/cons_res  and
3240                     select/cons_tres plugins.
3241
3242              preempt_youngest_first
3243                     If  set,  then  the  preemption sorting algorithm will be
3244                     changed to sort by the job start times to favor  preempt‐
3245                     ing  younger  jobs  over  older. (Requires preempt/parti‐
3246                     tion_prio or preempt/qos plugins.)
3247
3248              reduce_completing_frag
3249                     This option is used to  control  how  scheduling  of  re‐
3250                     sources  is  performed  when  jobs  are in the COMPLETING
3251                     state, which influences potential fragmentation.  If this
3252                     option  is  not  set  then no jobs will be started in any
3253                     partition when any job is in  the  COMPLETING  state  for
3254                     less  than  CompleteWait  seconds.  If this option is set
3255                     then no jobs will be started in any individual  partition
3256                     that  has  a  job  in COMPLETING state for less than Com‐
3257                     pleteWait seconds.  In addition, no jobs will be  started
3258                     in  any  partition with nodes that overlap with any nodes
3259                     in the partition of the completing job.  This  option  is
3260                     to be used in conjunction with CompleteWait.
3261
3262                     NOTE: CompleteWait must be set in order for this to work.
3263                     If CompleteWait=0 then this option does nothing.
3264
3265                     NOTE: reduce_completing_frag only affects the main sched‐
3266                     uler, not the backfill scheduler.
3267
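                         A sketch of a working combination, since this option
                         requires a non-zero CompleteWait (value illustrative):

                             # Example only: limit scheduling per partition while jobs complete
                             CompleteWait=32
                             SchedulerParameters=reduce_completing_frag
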
3268              requeue_setup_env_fail
                         By default, if a job environment setup fails, the job keeps
3270                     running with a limited environment.  By  specifying  this
3271                     parameter  the job will be requeued in held state and the
3272                     execution node drained.
3273
3274              salloc_wait_nodes
3275                     If defined, the salloc command will wait until all  allo‐
3276                     cated  nodes  are  ready for use (i.e. booted) before the
3277                     command returns. By default, salloc will return  as  soon
3278                     as the resource allocation has been made.
3279
3280              sbatch_wait_nodes
3281                     If  defined,  the sbatch script will wait until all allo‐
3282                     cated nodes are ready for use (i.e.  booted)  before  the
3283                     initiation.  By default, the sbatch script will be initi‐
3284                     ated as soon as the first node in the job  allocation  is
3285                     ready.  The  sbatch  command can use the --wait-all-nodes
3286                     option to override this configuration parameter.
3287
3288              sched_interval=#
3289                     How frequently, in seconds, the main scheduling loop will
3290                     execute  and test all pending jobs.  The default value is
3291                     60 seconds.  A setting of -1 will disable the main sched‐
3292                     uling loop.
3293
3294              sched_max_job_start=#
3295                     The maximum number of jobs that the main scheduling logic
3296                     will start in any single execution.  The default value is
3297                     zero, which imposes no limit.
3298
3299              sched_min_interval=#
3300                     How frequently, in microseconds, the main scheduling loop
3301                     will execute and test any pending  jobs.   The  scheduler
3302                     runs  in a limited fashion every time that any event hap‐
3303                     pens which could enable a job to start (e.g. job  submit,
3304                     job  terminate,  etc.).  If these events happen at a high
3305                     frequency, the scheduler can run very frequently and con‐
3306                     sume  significant  resources if not throttled by this op‐
3307                     tion.  This option specifies the minimum time between the
3308                     end of one scheduling cycle and the beginning of the next
3309                     scheduling cycle.  A value of zero  will  disable  throt‐
3310                     tling  of  the  scheduling  logic  interval.  The default
3311                     value is 2 microseconds.
3312
3313              spec_cores_first
3314                     Specialized cores will be selected from the  first  cores
3315                     of  the  first  sockets, cycling through the sockets on a
3316                     round robin basis.  By default, specialized cores will be
3317                     selected from the last cores of the last sockets, cycling
3318                     through the sockets on a round robin basis.
3319
3320              step_retry_count=#
3321                     When a step completes and there are steps ending resource
3322                     allocation, then retry step allocations for at least this
3323                     number of pending steps.  Also see step_retry_time.   The
3324                     default value is 8 steps.
3325
3326              step_retry_time=#
3327                     When a step completes and there are steps ending resource
3328                     allocation, then retry step  allocations  for  all  steps
3329                     which  have been pending for at least this number of sec‐
3330                     onds.  Also see step_retry_count.  The default  value  is
3331                     60 seconds.
3332
3333              whole_hetjob
3334                     Requests  to  cancel,  hold or release any component of a
3335                     heterogeneous job will be applied to  all  components  of
3336                     the job.
3337
                         NOTE: This option was previously named whole_pack, which
                         is still supported for backward compatibility.
3340
3341       SchedulerTimeSlice
3342              Number of seconds in each time slice when gang scheduling is en‐
3343              abled  (PreemptMode=SUSPEND,GANG).   The value must be between 5
3344              seconds and 65533 seconds.  The default value is 30 seconds.
3345
3346       SchedulerType
3347              Identifies the type of scheduler  to  be  used.   A  restart  of
3348              slurmctld  is required for changes to this parameter to take ef‐
3349              fect.  The scontrol command can be used to manually  change  job
3350              priorities if desired.  Acceptable values include:
3351
3352              sched/backfill
3353                     For  a  backfill scheduling module to augment the default
3354                     FIFO  scheduling.   Backfill  scheduling  will   initiate
3355                     lower-priority  jobs  if  doing so does not delay the ex‐
3356                     pected initiation time of any higher priority  job.   Ef‐
3357                     fectiveness  of  backfill  scheduling  is  dependent upon
3358                     users specifying job time limits, otherwise all jobs will
3359                     have  the  same time limit and backfilling is impossible.
                         See the documentation for the SchedulerParameters
                         option above. This is the default configuration.
3362
3363              sched/builtin
3364                     This is the FIFO scheduler which initiates jobs in prior‐
                         ity order. If any job in the partition cannot be
                         scheduled, no lower priority job in that partition will
                         be scheduled. An exception is made for jobs that cannot
3368                     run due to partition constraints (e.g. the time limit) or
3369                     down/drained nodes.  In that case,  lower  priority  jobs
3370                     can be initiated and not impact the higher priority job.
3371
3372       ScronParameters
3373              Multiple options may be comma separated.
3374
3375              enable Enable  the use of scrontab to submit and manage periodic
3376                     repeating jobs.
3377
3378       SelectType
3379              Identifies the type of resource selection algorithm to be  used.
3380              A restart of slurmctld is required for changes to this parameter
3381              to take effect.  When changed, all job information (running  and
3382              pending)  will  be lost, since the job state save format used by
3383              each plugin is different.  The only exception to  this  is  when
3384              changing  from  cons_res  to  cons_tres  or  from  cons_tres  to
3385              cons_res. However, if a job contains cons_tres-specific features
3386              and then SelectType is changed to cons_res, the job will be can‐
3387              celed, since there is no way for cons_res  to  satisfy  require‐
3388              ments specific to cons_tres.
3389
3390              Acceptable values include
3391
3392              select/cons_res
3393                     The  resources (cores and memory) within a node are indi‐
3394                     vidually allocated as consumable  resources.   Note  that
3395                     whole  nodes can be allocated to jobs for selected parti‐
3396                     tions by using the OverSubscribe=Exclusive  option.   See
3397                     the  partition  OverSubscribe parameter for more informa‐
3398                     tion.
3399
3400              select/cons_tres
3401                     The resources (cores, memory, GPUs and all  other  track‐
3402                     able  resources) within a node are individually allocated
3403                     as consumable resources.  Note that whole  nodes  can  be
3404                     allocated  to  jobs  for selected partitions by using the
3405                     OverSubscribe=Exclusive option.  See the partition  Over‐
3406                     Subscribe parameter for more information.
3407
3408              select/cray_aries
3409                     for   a   Cray   system.    The  default  value  is  "se‐
3410                     lect/cray_aries" for all Cray systems.
3411
3412              select/linear
3413                     for allocation of entire nodes assuming a one-dimensional
3414                     array  of  nodes  in which sequentially ordered nodes are
3415                     preferable.  For a heterogeneous cluster (e.g.  different
3416                     CPU  counts  on  the various nodes), resource allocations
3417                     will favor nodes with high CPU  counts  as  needed  based
3418                     upon the job's node and CPU specification if TopologyPlu‐
3419                     gin=topology/none is configured. Use  of  other  topology
3420                     plugins with select/linear and heterogeneous nodes is not
3421                     recommended and may result in valid  job  allocation  re‐
3422                     quests  being rejected. The linear plugin is not designed
3423                     to track generic resources on  a  node.  In  cases  where
3424                     generic  resources (such as GPUs) need to be tracked, the
3425                     cons_res or cons_tres plugins  should  be  used  instead.
3426                     This is the default value.
3427
3428       SelectTypeParameters
3429              The  permitted  values  of  SelectTypeParameters depend upon the
3430              configured value of SelectType.  The only supported options  for
3431              SelectType=select/linear are CR_ONE_TASK_PER_CORE and CR_Memory,
3432              which treats memory as a consumable resource and prevents memory
3433              over  subscription  with  job preemption or gang scheduling.  By
3434              default SelectType=select/linear allocates whole nodes  to  jobs
3435              without  considering  their  memory consumption.  By default Se‐
3436              lectType=select/cons_res, SelectType=select/cray_aries, and  Se‐
3437              lectType=select/cons_tres,  use  CR_Core_Memory, which allocates
                  cores to jobs while considering their memory consumption.
3439
3440              A restart of slurmctld is required for changes to this parameter
3441              to take effect.
3442
3443              The   following   options   are   supported  for  SelectType=se‐
3444              lect/cray_aries:
3445
3446              OTHER_CONS_RES
3447                     Layer  the   select/cons_res   plugin   under   the   se‐
                         lect/cray_aries plugin; the default is to layer on se‐
3449                     lect/linear.  This also allows all the options  available
3450                     for SelectType=select/cons_res.
3451
3452              OTHER_CONS_TRES
3453                     Layer   the   select/cons_tres   plugin   under  the  se‐
                         lect/cray_aries plugin; the default is to layer on se‐
3455                     lect/linear.   This also allows all the options available
3456                     for SelectType=select/cons_tres.
3457
3458       The following options are supported by  the  SelectType=select/cons_res
3459       and SelectType=select/cons_tres plugins:
3460
3461              CR_CPU CPUs  are  consumable resources.  Configure the number of
3462                     CPUs on each node, which may be equal  to  the  count  of
3463                     cores or hyper-threads on the node depending upon the de‐
3464                     sired minimum resource allocation.   The  node's  Boards,
3465                     Sockets, CoresPerSocket and ThreadsPerCore may optionally
3466                     be configured and result in job  allocations  which  have
3467                     improved  locality;  however  doing  so will prevent more
3468                     than one job from being allocated on each core.
3469
3470              CR_CPU_Memory
3471                     CPUs and memory are consumable resources.  Configure  the
3472                     number  of  CPUs  on each node, which may be equal to the
3473                     count of cores or hyper-threads  on  the  node  depending
3474                     upon the desired minimum resource allocation.  The node's
3475                     Boards, Sockets, CoresPerSocket  and  ThreadsPerCore  may
3476                     optionally  be  configured  and result in job allocations
3477                     which have improved locality; however doing so will  pre‐
3478                     vent more than one job from being allocated on each core.
3479                     Setting a value for DefMemPerCPU is strongly recommended.
3480
3481              CR_Core
3482                     Cores  are  consumable  resources.   On  nodes  with  hy‐
3483                     per-threads, each thread is counted as a CPU to satisfy a
3484                     job's resource requirement, but multiple jobs are not al‐
3485                     located  threads on the same core.  The count of CPUs al‐
3486                     located to a job is rounded up to account for  every  CPU
                         on an allocated core. This will also impact total
                         allocated memory when --mem-per-cpu is used, making it a
                         multiple of the total number of CPUs on allocated cores.
3490
3491              CR_Core_Memory
3492                     Cores and memory are consumable resources.  On nodes with
3493                     hyper-threads, each thread is counted as a CPU to satisfy
3494                     a  job's  resource requirement, but multiple jobs are not
3495                     allocated threads on the same core.  The  count  of  CPUs
3496                     allocated to a job may be rounded up to account for every
3497                     CPU on an allocated core.  Setting a value for DefMemPer‐
3498                     CPU is strongly recommended.
3499
3500              CR_ONE_TASK_PER_CORE
3501                     Allocate  one task per core by default.  Without this op‐
3502                     tion, by default one task will be allocated per thread on
3503                     nodes  with  more  than  one  ThreadsPerCore  configured.
3504                     NOTE: This option cannot be used with CR_CPU*.
3505
3506              CR_CORE_DEFAULT_DIST_BLOCK
3507                     Allocate cores within a node using block distribution  by
3508                     default.   This is a pseudo-best-fit algorithm that mini‐
3509                     mizes the number of boards and minimizes  the  number  of
3510                     sockets  (within minimum boards) used for the allocation.
                         This default behavior can be overridden by specifying a par‐
3512                     ticular  "-m" parameter with srun/salloc/sbatch.  Without
3513                     this option, cores will be  allocated  cyclically  across
3514                     the sockets.
3515
3516              CR_LLN Schedule  resources  to  jobs  on  the least loaded nodes
3517                     (based upon the number of idle CPUs). This  is  generally
3518                     only  recommended  for an environment with serial jobs as
3519                     idle resources will tend to be highly fragmented, result‐
3520                     ing in parallel jobs being distributed across many nodes.
3521                     Note that node Weight takes precedence over how many idle
3522                     resources  are on each node.  Also see the partition con‐
                         figuration parameter LLN to use the least loaded nodes in
3524                     selected partitions.
3525
3526              CR_Pack_Nodes
3527                     If  a job allocation contains more resources than will be
3528                     used for launching tasks (e.g. if whole nodes  are  allo‐
3529                     cated  to  a  job), then rather than distributing a job's
3530                     tasks evenly across its allocated  nodes,  pack  them  as
3531                     tightly  as  possible  on these nodes.  For example, con‐
3532                     sider a job allocation containing two entire  nodes  with
3533                     eight  CPUs  each.   If  the  job starts ten tasks across
3534                     those two nodes without this option, it will  start  five
3535                     tasks  on each of the two nodes.  With this option, eight
3536                     tasks will be started on the first node and two tasks  on
3537                     the  second  node.  This can be superseded by "NoPack" in
3538                     srun's "--distribution" option.  CR_Pack_Nodes  only  ap‐
3539                     plies when the "block" task distribution method is used.
3540
3541              CR_Socket
3542                     Sockets are consumable resources.  On nodes with multiple
3543                     cores, each core or thread is counted as a CPU to satisfy
3544                     a  job's  resource requirement, but multiple jobs are not
3545                     allocated resources on the same socket.
3546
3547              CR_Socket_Memory
3548                     Memory and sockets are consumable  resources.   On  nodes
3549                     with  multiple cores, each core or thread is counted as a
3550                     CPU to satisfy a job's resource requirement, but multiple
3551                     jobs  are  not  allocated  resources  on the same socket.
3552                     Setting a value for DefMemPerCPU is strongly recommended.
3553
3554              CR_Memory
3555                     Memory is a  consumable  resource.   NOTE:  This  implies
3556                     OverSubscribe=YES  or  OverSubscribe=FORCE for all parti‐
3557                     tions.  Setting a value for DefMemPerCPU is strongly rec‐
3558                     ommended.
3559
                  NOTE: If memory isn't configured as a consumable resource (CR_CPU,
                         CR_Core or CR_Socket without _Memory) memory can be over‐
3563                     subscribed. In this case the --mem option is only used to
3564                     filter out nodes with lower configured  memory  and  does
3565                     not  take  running  jobs  into account. For instance, two
3566                     jobs requesting all the memory of a node can run  at  the
3567                     same time.
3568
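                  As a hedged example tying these options together, a cluster that
                  tracks cores, memory and GPUs as consumable resources might use
                  something like the following (DefMemPerCPU value illustrative):

                      # Example only: consumable cores/memory/TRES with a default memory per CPU
                      SelectType=select/cons_tres
                      SelectTypeParameters=CR_Core_Memory
                      DefMemPerCPU=2048
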
3569       SlurmctldAddr
3570              An  optional  address  to be used for communications to the cur‐
3571              rently active slurmctld daemon, normally used  with  Virtual  IP
3572              addressing of the currently active server.  If this parameter is
3573              not specified then each primary and backup server will have  its
3574              own  unique  address used for communications as specified in the
3575              SlurmctldHost parameter.  If this parameter  is  specified  then
3576              the  SlurmctldHost  parameter  will still be used for communica‐
3577              tions to specific slurmctld primary or backup servers, for exam‐
3578              ple to cause all of them to read the current configuration files
3579              or shutdown.  Also see the  SlurmctldPrimaryOffProg  and  Slurm‐
3580              ctldPrimaryOnProg configuration parameters to configure programs
                  to manipulate the virtual IP address.
3582
3583       SlurmctldDebug
                  The level of detail to provide in the slurmctld daemon's logs.
                  The default value is info. If the slurmctld daemon is initiated
                  with -v or --verbose options, that debug level will be preserved
                  or restored upon reconfiguration.
3588
3589              quiet     Log nothing
3590
3591              fatal     Log only fatal errors
3592
3593              error     Log only errors
3594
3595              info      Log errors and general informational messages
3596
3597              verbose   Log errors and verbose informational messages
3598
3599              debug     Log  errors and verbose informational messages and de‐
3600                        bugging messages
3601
3602              debug2    Log errors and verbose informational messages and more
3603                        debugging messages
3604
3605              debug3    Log errors and verbose informational messages and even
3606                        more debugging messages
3607
3608              debug4    Log errors and verbose informational messages and even
3609                        more debugging messages
3610
3611              debug5    Log errors and verbose informational messages and even
3612                        more debugging messages
3613
3614       SlurmctldHost
3615              The short, or long, hostname of the machine where Slurm  control
3616              daemon is executed (i.e. the name returned by the command "host‐
3617              name -s").  This hostname is optionally followed by the address,
3618              either  the  IP  address  or  a name by which the address can be
3619              identified, enclosed in parentheses (e.g.   SlurmctldHost=slurm‐
3620              ctl-primary(12.34.56.78)). This value must be specified at least
3621              once. If specified more than once, the first hostname named will
3622              be  where  the  daemon runs.  If the first specified host fails,
3623              the daemon will execute on the second host.  If both  the  first
                  and second specified hosts fail, the daemon will execute on the
3625              third host.  A restart of slurmctld is required for  changes  to
3626              this parameter to take effect.
3627
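                  For example, a primary and a backup controller with explicit
                  addresses could be listed as follows (hostnames and addresses
                  are placeholders):

                      SlurmctldHost=ctl-primary(10.0.0.1)
                      SlurmctldHost=ctl-backup(10.0.0.2)
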
3628       SlurmctldLogFile
3629              Fully qualified pathname of a file into which the slurmctld dae‐
3630              mon's logs are written.  The default  value  is  none  (performs
3631              logging via syslog).
3632              See the section LOGGING if a pathname is specified.
3633
3634       SlurmctldParameters
3635              Multiple options may be comma separated.
3636
3637              allow_user_triggers
3638                     Permit  setting  triggers from non-root/slurm_user users.
3639                     SlurmUser must also be set to root to permit these  trig‐
3640                     gers  to  work.  See the strigger man page for additional
3641                     details.
3642
3643              cloud_dns
3644                     By default, Slurm expects that the network address for  a
3645                     cloud  node won't be known until the creation of the node
3646                     and that Slurm will be notified  of  the  node's  address
3647                     (e.g.  scontrol  update nodename=<name> nodeaddr=<addr>).
3648                     Since Slurm communications rely on the node configuration
3649                     found  in the slurm.conf, Slurm will tell the client com‐
                         mand, after waiting for all nodes to boot, each node's IP
3651                     address.  However, in environments where the nodes are in
3652                     DNS, this step can be avoided by configuring this option.
3653
3654              cloud_reg_addrs
3655                     When a cloud node  registers,  the  node's  NodeAddr  and
3656                     NodeHostName  will automatically be set. They will be re‐
3657                     set back to the nodename after powering off.
3658
3659              enable_configless
3660                     Permit "configless" operation by the slurmd,  slurmstepd,
3661                     and  user commands.  When enabled the slurmd will be per‐
3662                     mitted to retrieve config files from the  slurmctld,  and
3663                     on any 'scontrol reconfigure' command new configs will be
3664                     automatically pushed out and applied to  nodes  that  are
3665                     running  in  this "configless" mode.  A restart of slurm‐
3666                     ctld is required for changes to this  parameter  to  take
3667                     effect.   NOTE: Included files with the Include directive
3668                     will only be pushed if the filename has no  path  separa‐
3669                     tors and is located adjacent to slurm.conf.
3670
3671              idle_on_node_suspend
3672                     Mark  nodes  as  idle,  regardless of current state, when
3673                     suspending nodes with SuspendProgram so that  nodes  will
3674                     be eligible to be resumed at a later time.
3675
3676              node_reg_mem_percent=#
3677                     Percentage  of  memory a node is allowed to register with
3678                     without being marked as invalid with low memory.  Default
3679                     is 100. For State=CLOUD nodes, the default is 90. To dis‐
3680                     able this for cloud nodes set it to 100. config_overrides
3681                     takes precedence over this option.
3682
3683                     It's  recommended that task/cgroup with ConstrainRamSpace
                         is configured. A memory cgroup limit won't be set higher
3685                     than  the actual memory on the node. If needed, configure
3686                     AllowedRamSpace in the cgroup.conf to add a buffer.
3687
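                         For instance (illustrative value), to allow nodes to
                         register with as little as 95 percent of their
                         configured memory:

                             SlurmctldParameters=node_reg_mem_percent=95
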
3688              power_save_interval
3689                     How often the power_save thread looks to resume and  sus‐
3690                     pend  nodes. The power_save thread will do work sooner if
3691                     there are node state changes. Default is 10 seconds.
3692
3693              power_save_min_interval
3694                     How often the power_save thread, at a minimum,  looks  to
3695                     resume and suspend nodes. Default is 0.
3696
3697              max_dbd_msg_action
3698                     Action used once MaxDBDMsgs is reached, options are 'dis‐
3699                     card' (default) and 'exit'.
3700
                         When 'discard' is specified and MaxDBDMsgs is reached,
                         we start by purging pending messages of types Step start
                         and complete; if MaxDBDMsgs is reached again, Job start
                         messages are purged. Job completions and node state
                         changes continue to consume the empty space created by
                         the purges until MaxDBDMsgs is reached again, at which
                         point no new messages are tracked, creating data loss
                         and potentially runaway jobs.
3709
3710                     When  'exit'  is  specified and MaxDBDMsgs is reached the
3711                     slurmctld will exit instead of discarding  any  messages.
3712                     It  will  be  impossible to start the slurmctld with this
                         option while the slurmdbd is down and the slurmctld is
3714                     tracking more than MaxDBDMsgs.
3715
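                         A sketch of a configuration that prefers stopping the
                         controller over silently discarding accounting records
                         (MaxDBDMsgs value illustrative):

                             # Example only: exit slurmctld rather than discard accounting messages
                             MaxDBDMsgs=100000
                             SlurmctldParameters=max_dbd_msg_action=exit
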
3716              preempt_send_user_signal
3717                     Send the user signal (e.g. --signal=<sig_num>) at preemp‐
3718                     tion time even if the signal time hasn't been reached. In
3719                     the  case  of a gracetime preemption the user signal will
3720                     be sent if the user signal has  been  specified  and  not
3721                     sent, otherwise a SIGTERM will be sent to the tasks.
3722
3723              reboot_from_controller
3724                     Run  the  RebootProgram from the controller instead of on
3725                     the  slurmds.  The  RebootProgram  will   be   passed   a
3726                     comma-separated  list of nodes to reboot as the first ar‐
3727                     gument and if applicable the required features needed for
3728                     reboot as the second argument.
3729
3730              user_resv_delete
3731                     Allow any user able to run in a reservation to delete it.
3732
3733       SlurmctldPidFile
3734              Fully  qualified  pathname  of  a file into which the  slurmctld
3735              daemon may write its process id. This may be used for  automated
3736              signal   processing.   The  default  value  is  "/var/run/slurm‐
3737              ctld.pid".
3738
3739       SlurmctldPlugstack
3740              A comma-delimited list of Slurm controller plugins to be started
3741              when  the  daemon  begins and terminated when it ends.  Only the
3742              plugin's init and fini functions are called.
3743
3744       SlurmctldPort
3745              The port number that the Slurm controller, slurmctld, listens to
3746              for  work. The default value is SLURMCTLD_PORT as established at
3747              system build time. If none is explicitly specified, it  will  be
3748              set  to 6817.  SlurmctldPort may also be configured to support a
3749              range of port numbers in order to accept larger bursts of incom‐
3750              ing messages by specifying two numbers separated by a dash (e.g.
3751              SlurmctldPort=6817-6818).  A restart of  slurmctld  is  required
3752              for changes to this parameter to take effect.  NOTE: Either the
3753              slurmctld and slurmd daemons must not execute on the same nodes,
3754              or the values of SlurmctldPort and SlurmdPort must be different.
3755
3756              Note:  On Cray systems, Realm-Specific IP Addressing (RSIP) will
3757              automatically try to interact  with  anything  opened  on  ports
3758              8192-60000.   Configure  SlurmctldPort  to use a port outside of
3759              the configured SrunPortRange and RSIP's port range.
3760
3761       SlurmctldPrimaryOffProg
3762              This program is executed when a slurmctld daemon running as  the
3763              primary server becomes a backup server. By default no program is
3764              executed.  See also the related "SlurmctldPrimaryOnProg" parame‐
3765              ter.
3766
3767       SlurmctldPrimaryOnProg
3768              This  program  is  executed when a slurmctld daemon running as a
3769              backup server becomes the primary server. By default no  program
3770              is executed.  When using virtual IP addresses to manage  Highly
3771              Available Slurm services, this program can be used to add the IP
3772              address  to  an  interface (and optionally try to kill the unre‐
3773              sponsive  slurmctld daemon and flush the ARP caches on nodes  on
3774              the local Ethernet fabric).  See also the related "SlurmctldPri‐
3775              maryOffProg" parameter.
3776
3777       SlurmctldSyslogDebug
3778              The slurmctld daemon will log events to the syslog file  at  the
3779              specified level of detail.  If not set, the slurmctld daemon will
3780              log to syslog at level fatal, unless there is no SlurmctldLogFile,
3781              in which case it will log to syslog at the level  specified  by
3782              SlurmctldDebug (at fatal if SlurmctldDebug is set to quiet) when
3783              running in the background, or at level quiet when it is running
3784              in the foreground.
3785
3786              quiet     Log nothing
3787
3788              fatal     Log only fatal errors
3789
3790              error     Log only errors
3791
3792              info      Log errors and general informational messages
3793
3794              verbose   Log errors and verbose informational messages
3795
3796              debug     Log errors and verbose informational messages and  de‐
3797                        bugging messages
3798
3799              debug2    Log errors and verbose informational messages and more
3800                        debugging messages
3801
3802              debug3    Log errors and verbose informational messages and even
3803                        more debugging messages
3804
3805              debug4    Log errors and verbose informational messages and even
3806                        more debugging messages
3807
3808              debug5    Log errors and verbose informational messages and even
3809                        more debugging messages
3810
3811              NOTE: By default, Slurm's systemd service files start daemons in
3812              the foreground with the -D option. This means that systemd  will
3813              capture  stdout/stderr output and print that to syslog, indepen‐
3814              dent of Slurm printing to syslog directly.  To  prevent  systemd
3815              from  doing  this,  add  "StandardOutput=null"  and "StandardEr‐
3816              ror=null" to the respective service files or override files.
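
              For example, a systemd drop-in override for the slurmctld service
              could contain the following lines (a minimal sketch; where the
              override file lives depends on your systemd installation):
              [Service]
              StandardOutput=null
              StandardError=null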
3817
3818       SlurmctldTimeout
3819              The interval, in seconds, that the backup controller  waits  for
3820              the  primary controller to respond before assuming control.  The
3821              default value is 120 seconds.  May not exceed 65533.
3822
3823       SlurmdDebug
3824              The level of detail to provide in the slurmd daemon's logs.  The de‐
3825              fault value is info.
3826
3827              quiet     Log nothing
3828
3829              fatal     Log only fatal errors
3830
3831              error     Log only errors
3832
3833              info      Log errors and general informational messages
3834
3835              verbose   Log errors and verbose informational messages
3836
3837              debug     Log  errors and verbose informational messages and de‐
3838                        bugging messages
3839
3840              debug2    Log errors and verbose informational messages and more
3841                        debugging messages
3842
3843              debug3    Log errors and verbose informational messages and even
3844                        more debugging messages
3845
3846              debug4    Log errors and verbose informational messages and even
3847                        more debugging messages
3848
3849              debug5    Log errors and verbose informational messages and even
3850                        more debugging messages
3851
3852       SlurmdLogFile
3853              Fully qualified pathname of a file into which the   slurmd  dae‐
3854              mon's  logs  are  written.   The default value is none (performs
3855              logging via syslog).  The first "%h" within the name is replaced
3856              with  the  hostname  on  which the slurmd is running.  The first
3857              "%n" within the name is replaced with the  Slurm  node  name  on
3858              which the slurmd is running.
3859              See the section LOGGING if a pathname is specified.
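
              For example, the following would place each slurmd's log in a file
              named after its Slurm node name (the directory is illustrative
              only):
              SlurmdLogFile=/var/log/slurm/slurmd.%n.log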
3860
3861       SlurmdParameters
3862              Parameters  specific  to  the  Slurmd.   Multiple options may be
3863              comma separated.
3864
3865              config_overrides
3866                     If set, consider the configuration of  each  node  to  be
3867                     that  specified  in the slurm.conf configuration file and
3868                     any node with less than the configured resources will not
3869                     be  set  to  INVAL/INVALID_REG.  This option is generally
3870                     only useful for testing purposes.  Equivalent to the  now
3871                     deprecated FastSchedule=2 option.
3872
3873              l3cache_as_socket
3874                     Use  the hwloc l3cache as the socket count. Can be useful
3875                     on certain processors  where  the  socket  level  is  too
3876                     coarse, and the l3cache may provide better task distribu‐
3877                     tion. (E.g.,  along  CCX  boundaries  instead  of  socket
3878                     boundaries.)         Mutually        exclusive       with
3879                     numa_node_as_socket.  Requires hwloc v2.
3880
3881              numa_node_as_socket
3882                     Use the hwloc NUMA node to determine the  main  hierarchy
3883                     object to be used as the socket.  If this option is  set,
3884                     Slurm will check the parent object of the NUMA  node  and
3885                     use it as the socket.  This option may be useful for  ar‐
3886                     chitectures like AMD Epyc, where the number of NUMA nodes
3887                     per socket is configurable.  Mutually exclusive with
3888                     l3cache_as_socket.  Requires hwloc v2.
3889
3890              shutdown_on_reboot
3891                     If set, the Slurmd will shut itself down  when  a  reboot
3892                     request is received.
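
              For example (a minimal sketch; choose options appropriate to your
              hardware, and note that l3cache_as_socket and numa_node_as_socket
              are mutually exclusive):
              SlurmdParameters=l3cache_as_socket,shutdown_on_reboot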
3893
3894       SlurmdPidFile
3895              Fully qualified pathname of a file into which the  slurmd daemon
3896              may write its process id. This may be used for automated  signal
3897              processing.  The first "%h" within the name is replaced with the
3898              hostname on which the slurmd is running.  The first "%n"  within
3899              the  name  is  replaced  with  the  Slurm node name on which the
3900              slurmd is running.  The default value is "/var/run/slurmd.pid".
3901
3902       SlurmdPort
3903              The port number that the Slurm compute node daemon, slurmd, lis‐
3904              tens  to  for  work.  The default value is SLURMD_PORT as estab‐
3905              lished at system build time. If none  is  explicitly  specified,
3906              its  value will be 6818.  A restart of slurmctld is required for
3907              changes to this parameter to take effect.  NOTE: Either the
3908              slurmctld and slurmd daemons must not execute on the same nodes,
3909              or the values of SlurmctldPort and SlurmdPort must be different.
3910
3911              Note: On Cray systems, Realm-Specific IP Addressing (RSIP)  will
3912              automatically  try  to  interact  with  anything opened on ports
3913              8192-60000.  Configure SlurmdPort to use a port outside  of  the
3914              configured SrunPortRange and RSIP's port range.
3915
3916       SlurmdSpoolDir
3917              Fully  qualified  pathname  of a directory into which the slurmd
3918              daemon's state information and batch job script information  are
3919              written.  This  must  be  a  common  pathname for all nodes, but
3920              should represent a directory which is local to each node (refer‐
3921              ence    a   local   file   system).   The   default   value   is
3922              "/var/spool/slurmd".  The first "%h" within the name is replaced
3923              with  the  hostname  on  which the slurmd is running.  The first
3924              "%n" within the name is replaced with the  Slurm  node  name  on
3925              which the slurmd is running.
3926
3927       SlurmdSyslogDebug
3928              The slurmd daemon will log events to the syslog file at the spec‐
3929              ified level of detail.  If not set, the slurmd daemon will log to
3930              syslog at level fatal, unless there is no SlurmdLogFile, in which
3931              case it will log to syslog at the level specified by SlurmdDebug
3932              (at fatal if SlurmdDebug is set to quiet) when running in the
3933              background, or at level quiet when it is running in the fore‐
3934              ground.
3935
3936              quiet     Log nothing
3937
3938              fatal     Log only fatal errors
3939
3940              error     Log only errors
3941
3942              info      Log errors and general informational messages
3943
3944              verbose   Log errors and verbose informational messages
3945
3946              debug     Log  errors and verbose informational messages and de‐
3947                        bugging messages
3948
3949              debug2    Log errors and verbose informational messages and more
3950                        debugging messages
3951
3952              debug3    Log errors and verbose informational messages and even
3953                        more debugging messages
3954
3955              debug4    Log errors and verbose informational messages and even
3956                        more debugging messages
3957
3958              debug5    Log errors and verbose informational messages and even
3959                        more debugging messages
3960
3961              NOTE: By default, Slurm's systemd service files start daemons in
3962              the  foreground with the -D option. This means that systemd will
3963              capture stdout/stderr output and print that to syslog,  indepen‐
3964              dent  of  Slurm  printing to syslog directly. To prevent systemd
3965              from doing  this,  add  "StandardOutput=null"  and  "StandardEr‐
3966              ror=null" to the respective service files or override files.
3967
3968       SlurmdTimeout
3969              The  interval,  in  seconds, that the Slurm controller waits for
3970              slurmd to respond before configuring that node's state to  DOWN.
3971              A  value of zero indicates the node will not be tested by slurm‐
3972              ctld to confirm the state of slurmd, the node will not be  auto‐
3973              matically  set  to  a  DOWN  state  indicating  a non-responsive
3974              slurmd, and some other tool will take responsibility  for  moni‐
3975              toring  the  state  of  each compute node and its slurmd daemon.
3976              Slurm's hierarchical communication mechanism is used to ping the
3977              slurmd  daemons  in order to minimize system noise and overhead.
3978              The default value is 300 seconds.   The  value  may  not  exceed
3979              65533 seconds.
3980
3981       SlurmdUser
3982              The  name  of the user that the slurmd daemon executes as.  This
3983              user must exist on all nodes of the cluster  for  authentication
3984              of  communications  between Slurm components.  The default value
3985              is "root".
3986
3987       SlurmSchedLogFile
3988              Fully qualified pathname of the scheduling event  logging  file.
3989              The  syntax  of  this parameter is the same as for SlurmctldLog‐
3990              File.  In order to configure scheduler  logging,  set  both  the
3991              SlurmSchedLogFile and SlurmSchedLogLevel parameters.
3992
3993       SlurmSchedLogLevel
3994              The  initial  level  of scheduling event logging, similar to the
3995              SlurmctldDebug parameter used to control the  initial  level  of
3996              slurmctld  logging.  Valid values for SlurmSchedLogLevel are "0"
3997              (scheduler logging disabled)  and  "1"  (scheduler  logging  en‐
3998              abled).  If this parameter is omitted, the value defaults to "0"
3999              (disabled).  In order to configure scheduler logging,  set  both
4000              the  SlurmSchedLogFile  and  SlurmSchedLogLevel parameters.  The
4001              scheduler logging level can be changed dynamically  using  scon‐
4002              trol.
4003
4004       SlurmUser
4005              The name of the user that the slurmctld daemon executes as.  For
4006              security purposes, a user  other  than  "root"  is  recommended.
4007              This user must exist on all nodes of the cluster for authentica‐
4008              tion of communications between Slurm  components.   The  default
4009              value is "root".
4010
4011       SrunEpilog
4012              Fully qualified pathname of an executable to be run by srun fol‐
4013              lowing the completion of a job step.  The command line arguments
4014              for  the executable will be the command and arguments of the job
4015              step.  This configuration parameter may be overridden by  srun's
4016              --epilog  parameter. Note that while the other "Epilog" executa‐
4017              bles (e.g., TaskEpilog) are run by slurmd on the  compute  nodes
4018              where  the  tasks  are executed, the SrunEpilog runs on the node
4019              where the "srun" is executing.
4020
4021       SrunPortRange
4022              srun creates a set of listening ports to communicate  with  the
4023              controller and the slurmstepds, and to  handle  the  application
4024              I/O.  By default these ports are ephemeral, meaning the port num‐
4025              bers are selected by the kernel.  This parameter  allows  sites
4026              to configure a range of ports from which srun  ports  will  be
4027              selected.  This is useful if sites want to allow only a  certain
4028              port range on their network.
4029
4030              Note: On Cray systems, Realm-Specific IP Addressing (RSIP)  will
4031              automatically  try  to  interact  with  anything opened on ports
4032              8192-60000.  Configure SrunPortRange to use  a  range  of  ports
4033              above  those used by RSIP, ideally 1000 or more ports, for exam‐
4034              ple "SrunPortRange=60001-63000".
4035
4036              Note: SrunPortRange must be large enough to cover  the  expected
4037              number  of srun ports created on a given submission node. A sin‐
4038              gle srun opens 3 listening ports plus 2 more for every 48 hosts.
4039              Example:
4040
4041              srun -N 48 will use 5 listening ports.
4042
4043              srun -N 50 will use 7 listening ports.
4044
4045              srun -N 200 will use 13 listening ports.
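
              As a rough sizing sketch, if a submission node is expected to run
              up to 200 concurrent sruns, each launching jobs of 48 or fewer
              nodes (5 listening ports each), roughly 1000 ports are needed,
              e.g.:
              SrunPortRange=60001-61000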
4046
4047       SrunProlog
4048              Fully  qualified  pathname  of  an  executable to be run by srun
4049              prior to the launch of a job step.  The command  line  arguments
4050              for  the executable will be the command and arguments of the job
4051              step.  This configuration parameter may be overridden by  srun's
4052              --prolog  parameter. Note that while the other "Prolog" executa‐
4053              bles (e.g., TaskProlog) are run by slurmd on the  compute  nodes
4054              where  the  tasks  are executed, the SrunProlog runs on the node
4055              where the "srun" is executing.
4056
4057       StateSaveLocation
4058              Fully qualified pathname of a directory  into  which  the  Slurm
4059              controller,   slurmctld,   saves   its   state  (e.g.  "/usr/lo‐
4060              cal/slurm/checkpoint").  Slurm state will be saved here to recover
4061              from system failures.  SlurmUser must be able to create files in
4062              this directory.  If you have a secondary  SlurmctldHost  config‐
4063              ured, this location should be readable and writable by both sys‐
4064              tems.  Since all running and pending job information  is  stored
4065              here,  the  use  of a reliable file system (e.g. RAID) is recom‐
4066              mended.  The default value is "/var/spool".  A restart of slurm‐
4067              ctld  is  required for changes to this parameter to take effect.
4068              If any slurm daemons terminate abnormally, their core files will
4069              also be written into this directory.
4070
4071       SuspendExcNodes
4072              Specifies  the  nodes  which  are to not be placed in power save
4073              mode, even if the node remains idle for an  extended  period  of
4074              time.  Use Slurm's hostlist expression to identify nodes with an
4075              optional ":" separator and count of nodes to  exclude  from  the
4076              preceding  range.  For example "nid[10-20]:4" will prevent 4 us‐
4077              able nodes (i.e. IDLE and not DOWN, DRAINING or already  powered
4078              down) in the set "nid[10-20]" from being powered down.  Multiple
4079              sets of nodes can be specified with or without counts in a comma
4080              separated list (e.g. "nid[10-20]:4,nid[80-90]:2").   If  a  node
4081              count specification is given, any list of nodes to  NOT  have  a
4082              node  count  must  be after the last specification with a count.
4083              For example "nid[10-20]:4,nid[60-70]" will exclude  4  nodes  in
4084              the  set  "nid[10-20]:4"  plus all nodes in the set "nid[60-70]"
4085              while "nid[1-3],nid[10-20]:4" will exclude 4 nodes from the  set
4086              "nid[1-3],nid[10-20]".  By default no nodes are excluded.
4087
4088       SuspendExcParts
4089              Specifies  the  partitions  whose  nodes are to not be placed in
4090              power save mode, even if the node remains idle for  an  extended
4091              period of time.  Multiple partitions can be identified and sepa‐
4092              rated by commas.  By default no nodes are excluded.
4093
4094       SuspendProgram
4095              SuspendProgram is the program that will be executed when a  node
4096              remains  idle  for  an extended period of time.  This program is
4097              expected to place the node into some power save mode.  This  can
4098              be  used  to  reduce the frequency and voltage of a node or com‐
4099              pletely power the node off.  The program executes as  SlurmUser.
4100              The  argument  to  the  program will be the names of nodes to be
4101              placed into power savings mode (using Slurm's  hostlist  expres‐
4102              sion format).  By default, no program is run.
4103
4104       SuspendRate
4105              The  rate at which nodes are placed into power save mode by Sus‐
4106              pendProgram.  The value is number of nodes per minute and it can
4107              be used to prevent a large drop in power consumption (e.g. after
4108              a large job completes).  A value of zero results  in  no  limits
4109              being imposed.  The default value is 60 nodes per minute.
4110
4111       SuspendTime
4112              Nodes  which remain idle or down for this number of seconds will
4113              be placed into power save mode by SuspendProgram.  Setting  Sus‐
4114              pendTime to anything but INFINITE (or -1) will enable power save
4115              mode. INFINITE is the default.
4116
4117       SuspendTimeout
4118              Maximum time permitted (in seconds) between when a node  suspend
4119              request is issued and when the node is shut down.  At that  time
4120              the node must be ready for a resume  request  to  be  issued  as
4121              needed for new work.  The default value is 30 seconds.
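
              For example, a minimal power saving setup using the parameters
              above might look like the following (a corresponding ResumeProgram
              is also required; the program path, times, and partition name here
              are illustrative only):
              SuspendProgram=/usr/local/sbin/slurm_suspend.sh
              SuspendTime=600
              SuspendRate=20
              SuspendTimeout=60
              SuspendExcParts=interactive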
4122
4123       SwitchParameters
4124              Optional parameters for the switch plugin.
4125
4126              On      HPE      Slingshot      systems      configured     with
4127              SwitchType=switch/hpe_slingshot, the  following  parameters  are
4128              supported (separate multiple parameters with a comma):
4129
4130              vnis=<min>-<max>
4131                     Range  of  VNIs  to  allocate  for jobs and applications.
4132                     This parameter is required.
4133
4134              tcs=<class1>[:<class2>]...
4135                     Set of traffic classes  to  configure  for  applications.
4136                     Supported  traffic  classes are DEDICATED_ACCESS, LOW_LA‐
4137                     TENCY, BULK_DATA, and BEST_EFFORT.
4138
4139              single_node_vni
4140                     Allocate a VNI for single node job steps.
4141
4142              job_vni
4143                     Allocate an additional VNI for jobs, shared among all job
4144                     steps.
4145
4146              def_<rsrc>=<val>
4147                     Per-CPU reserved allocation for this resource.
4148
4149              res_<rsrc>=<val>
4150                     Per-node  reserved allocation for this resource.  If set,
4151                     overrides the per-CPU allocation.
4152
4153              max_<rsrc>=<val>
4154                     Maximum per-node application allocation for this resource.
4155
4156              The resources that may be configured are:
4157
4158              txqs   Transmit command queues. The default is 3 per-CPU,  maxi‐
4159                     mum 1024 per-node.
4160
4161              tgqs   Target  command queues. The default is 2 per-CPU, maximum
4162                     512 per-node.
4163
4164              eqs    Event queues. The default is 8 per-CPU, maximum 2048 per-
4165                     node.
4166
4167              cts    Counters.  The  default  is  2 per-CPU, maximum 2048 per-
4168                     node.
4169
4170              tles   Trigger list entries. The default is 1  per-CPU,  maximum
4171                     2048 per-node.
4172
4173              ptes   Portal table entries. The default is 8 per-CPU,   maximum
4174                     2048 per-node.
4175
4176              les    List entries. The default is 134 per-CPU,  maximum  65535
4177                     per-node.
4178
4179              acs    Addressing  contexts.  The  default is 4 per-CPU, maximum
4180                     1024 per-node.
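
              For example, a minimal HPE Slingshot configuration might look like
              the following (the VNI range, traffic class, and reserved txq
              count are illustrative only):
              SwitchType=switch/hpe_slingshot
              SwitchParameters=vnis=32768-65535,tcs=BEST_EFFORT,def_txqs=4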
4181
4182       SwitchType
4183              Identifies the type of switch or interconnect used for  applica‐
4184              tion      communications.      Acceptable     values     include
4185              "switch/cray_aries" for Cray systems, "switch/hpe_slingshot" for
4186              HPE Slingshot systems and "switch/none" for switches not requir‐
4187              ing special processing for job launch or termination  (Ethernet
4188              and InfiniBand).  The default value is "switch/none".  All Slurm
4189              daemons, commands and running  jobs  must  be  restarted  for  a
4190              change  in  SwitchType to take effect.  If running jobs exist at
4191              the time slurmctld is restarted with a new value of  SwitchType,
4192              records of all jobs in any state may be lost.
4193
4194       TaskEpilog
4195              Fully  qualified  pathname  of  a  program to be executed as the
4196              slurm job's owner after termination of each task.  See  TaskPro‐
4197              log for execution order details.
4198
4199       TaskPlugin
4200              Identifies  the  type  of  task launch plugin, typically used to
4201              provide resource management within a node (e.g. pinning tasks to
4202              specific processors). More than one task plugin can be specified
4203              in a comma-separated list. The prefix of  "task/"  is  optional.
4204              Acceptable values include:
4205
4206              task/affinity  enables      resource      containment      using
4207                             sched_setaffinity().  This enables the --cpu-bind
4208                             and/or --mem-bind srun options.
4209
4210              task/cgroup    enables  resource containment using Linux control
4211                             cgroups.   This  enables  the  --cpu-bind  and/or
4212                             --mem-bind   srun   options.    NOTE:   see  "man
4213                             cgroup.conf" for configuration details.
4214
4215              task/none      for systems requiring no special handling of user
4216                             tasks.   Lacks  support for the --cpu-bind and/or
4217                             --mem-bind srun options.  The  default  value  is
4218                             "task/none".
4219
4220              NOTE:  It  is recommended to stack task/affinity,task/cgroup to‐
4221              gether  when  configuring  TaskPlugin,  and  setting  Constrain‐
4222              Cores=yes  in  cgroup.conf.  This  setup  uses the task/affinity
4223              plugin for setting the  affinity  of  the  tasks  and  uses  the
4224              task/cgroup plugin to fence tasks into the specified resources.
4225
4226              NOTE:  For CRAY systems only: task/cgroup must be used with, and
4227              listed after task/cray_aries in  TaskPlugin.  The  task/affinity
4228              plugin  can be listed anywhere, but the previous constraint must
4229              be satisfied. For CRAY systems, a  configuration  like  this  is
4230              recommended:
4231              TaskPlugin=task/affinity,task/cray_aries,task/cgroup
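
              For non-Cray systems, the stacking recommended in the first  NOTE
              above could be expressed as follows (ConstrainCores=yes would ad‐
              ditionally be set in cgroup.conf, not here):
              TaskPlugin=task/affinity,task/cgroup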
4232
4233       TaskPluginParam
4234              Optional  parameters  for  the  task  plugin.   Multiple options
4235              should be comma separated.  None, Sockets, Cores and Threads are
4236              mutually  exclusive  and  treated  as  a last possible source of
4237              --cpu-bind default. See also Node and Partition CpuBind options.
4238
4239              Cores  Bind tasks to  cores  by  default.   Overrides  automatic
4240                     binding.
4241
4242              None   Perform  no task binding by default.  Overrides automatic
4243                     binding.
4244
4245              Sockets
4246                     Bind to sockets by default.  Overrides automatic binding.
4247
4248              Threads
4249                     Bind to threads by default.  Overrides automatic binding.
4250
4251              SlurmdOffSpec
4252                     If specialized cores or CPUs are identified for the  node
4253                     (i.e. the CoreSpecCount or CpuSpecList are configured for
4254                     the node), then Slurm daemons running on the compute node
4255                     (i.e.  slurmd and slurmstepd) should run outside of those
4256                     resources (i.e. specialized resources are completely  un‐
4257                     available  to  Slurm  daemons and jobs spawned by Slurm).
4258                     This option may not  be  used  with  the  task/cray_aries
4259                     plugin.
4260
4261              Verbose
4262                     Verbosely report binding before tasks run by default.
4263
4264              Autobind
4265                     Set  a  default  binding in the event that "auto binding"
4266                     doesn't find a match.  Set to Threads, Cores  or  Sockets
4267                     (E.g. TaskPluginParam=autobind=threads).
4268
4269       TaskProlog
4270              Fully  qualified  pathname  of  a  program to be executed as the
4271              slurm job's owner prior to initiation of each task.  Besides the
4272              normal  environment variables, this has SLURM_TASK_PID available
4273              to identify the process ID of the task being started.   Standard
4274              output  from this program can be used to control the environment
4275              variables and output for the user program.
4276
4277              export NAME=value   Will set environment variables for the  task
4278                                  being  spawned.   Everything after the equal
4279                                  sign to the end of the line will be used  as
4280                                  the value for the environment variable.  Ex‐
4281                                  porting of functions is not  currently  sup‐
4282                                  ported.
4283
4284              print ...           Will  cause  that  line (without the leading
4285                                  "print ") to be printed to the  job's  stan‐
4286                                  dard output.
4287
4288              unset NAME          Will  clear  environment  variables  for the
4289                                  task being spawned.
4290
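              For example, a TaskProlog program whose standard output contained
              the following lines (a minimal sketch; the variable and message
              are illustrative only):
              export OMP_NUM_THREADS=4
              print task prolog finished
              unset DISPLAY
              would set OMP_NUM_THREADS to 4 and clear DISPLAY in the task's en‐
              vironment, and write "task prolog finished" to the job's standard
              output.
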
4291              The order of task prolog/epilog execution is as follows:
4292
4293              1. pre_launch_priv()
4294                                  Function in TaskPlugin
4295
4296              2. pre_launch()     Function in TaskPlugin
4297
4298              3. TaskProlog       System-wide  per  task  program  defined  in
4299                                  slurm.conf
4300
4301              4. User prolog      Job-step-specific task program defined using
4302                                  srun's     --task-prolog      option      or
4303                                  SLURM_TASK_PROLOG environment variable
4304
4305              5. Task             Execute the job step's task
4306
4307              6. User epilog      Job-step-specific task program defined using
4308                                  srun's     --task-epilog      option      or
4309                                  SLURM_TASK_EPILOG environment variable
4310
4311              7. TaskEpilog       System-wide  per  task  program  defined  in
4312                                  slurm.conf
4313
4314              8. post_term()      Function in TaskPlugin
4315
4316       TCPTimeout
4317              Time permitted for a TCP connection to be established.   Default
4318              value is 2 seconds.
4319
4320       TmpFS  Fully  qualified  pathname  of the file system available to user
4321              jobs for temporary storage. This parameter is used in establish‐
4322              ing a node's TmpDisk space.  The default value is "/tmp".
4323
4324       TopologyParam
4325              Comma-separated options identifying network topology options.
4326
4327              Dragonfly      Optimize allocation for Dragonfly network.  Valid
4328                             when TopologyPlugin=topology/tree.
4329
4330              TopoOptional   Only optimize allocation for network topology  if
4331                             the  job includes a switch option. Since optimiz‐
4332                             ing resource  allocation  for  topology  involves
4333                             much  higher  system overhead, this option can be
4334                             used to impose the extra overhead  only  on  jobs
4335                             which can take advantage of it. If most job allo‐
4336                             cations are not optimized for  network  topology,
4337                             they  may  fragment  resources  to the point that
4338                             topology optimization for other jobs will be dif‐
4339                             ficult  to  achieve.   NOTE: Jobs may span across
4340                             nodes without common parent  switches  with  this
4341                             enabled.
4342
4343       TopologyPlugin
4344              Identifies  the  plugin  to  be used for determining the network
4345              topology and optimizing job allocations to minimize network con‐
4346              tention.   See  NETWORK  TOPOLOGY below for details.  Additional
4347              plugins may be provided in the future which gather topology  in‐
4348              formation directly from the network.  Acceptable values include:
4349
4350              topology/3d_torus    best-fit   logic   over   three-dimensional
4351                                   topology
4352
4353              topology/none        default for other systems,  best-fit  logic
4354                                   over one-dimensional topology
4355
4356              topology/tree        used  for  a  hierarchical  network  as de‐
4357                                   scribed in a topology.conf file
4358
4359       TrackWCKey
4360              Boolean yes or no.  Used to enable display and tracking of  the
4361              Workload Characterization Key.  Must be set to track correct wckey
4362              usage.  NOTE: You must also set TrackWCKey in your slurmdbd.conf
4363              file to create historical usage reports.
4364
4365       TreeWidth
4366              Slurmd  daemons  use  a virtual tree network for communications.
4367              TreeWidth specifies the width of the tree (i.e. the fanout).  On
4368              architectures  with  a front end node running the slurmd daemon,
4369              the value must always be equal to or greater than the number  of
4370              front end nodes, which eliminates the need for message forwarding
4371              between the slurmd daemons.  On other architectures the  default
4372              value  is 50, meaning each slurmd daemon can communicate with up
4373              to 50 other slurmd daemons and over 2500 nodes can be  contacted
4374              with  two  message  hops.   The default value will work well for
4375              most clusters.  Optimal  system  performance  can  typically  be
4376              achieved if TreeWidth is set to the square root of the number of
4377              nodes in the cluster for systems having no more than 2500  nodes
4378              or  the  cube  root for larger systems. The value may not exceed
4379              65533.
4380
4381       UnkillableStepProgram
4382              If the processes in a job step are determined to  be  unkillable
4383              for  a  period  of  time  specified by the UnkillableStepTimeout
4384              variable, the program specified by UnkillableStepProgram will be
4385              executed.  By default no program is run.
4386
4387              See section UNKILLABLE STEP PROGRAM SCRIPT for more information.
4388
4389       UnkillableStepTimeout
4390              The  length of time, in seconds, that Slurm will wait before de‐
4391              ciding that processes in a job step are unkillable  (after  they
4392              have  been signaled with SIGKILL) and execute UnkillableStepPro‐
4393              gram.  The default timeout value is 60  seconds.   If  exceeded,
4394              the compute node will be drained to prevent future jobs from be‐
4395              ing scheduled on the node.
4396
4397       UsePAM If set to 1, PAM (Pluggable Authentication  Modules  for  Linux)
4398              will  be enabled.  PAM is used to establish the upper bounds for
4399              resource limits. With PAM support enabled, local system adminis‐
4400              trators can dynamically configure system resource limits. Chang‐
4401              ing the upper bound of a resource limit will not alter the  lim‐
4402              its of running jobs; only jobs started after a change  has  been
4403              made will pick up the new limits.  The default value is  0  (not
4404              to enable PAM support).  Remember that PAM also needs to be con‐
4405              figured to support Slurm as a service.  For  sites  using  PAM's
4406              directory based configuration option, a configuration file named
4407              slurm should be created.  The  module-type,  control-flags,  and
4408              module-path names that should be included in the file are:
4409              auth        required      pam_localuser.so
4410              auth        required      pam_shells.so
4411              account     required      pam_unix.so
4412              account     required      pam_access.so
4413              session     required      pam_unix.so
4414              For sites configuring PAM with a general configuration file, the
4415              appropriate lines (see above) should be added, with slurm as the
4416              service-name.
4417
4418              NOTE:   UsePAM   option   has   nothing  to  do  with  the  con‐
4419              tribs/pam/pam_slurm and/or contribs/pam_slurm_adopt modules.  So
4420              these  two  modules  can work independently of the value set for
4421              UsePAM.
4422
4423       VSizeFactor
4424              Memory specifications in job requests apply to real memory  size
4425              (also  known  as  resident  set size). It is possible to enforce
4426              virtual memory limits for both jobs and job  steps  by  limiting
4427              their virtual memory to some percentage of their real memory al‐
4428              location. The VSizeFactor parameter specifies the job's  or  job
4429              step's  virtual  memory limit as a percentage of its real memory
4430              limit. For example, if a job's real memory limit  is  500MB  and
4431              VSizeFactor  is  set  to  101 then the job will be killed if its
4432              real memory exceeds 500MB or its virtual  memory  exceeds  505MB
4433              (101 percent of the real memory limit).  The default value is 0,
4434              which disables enforcement of virtual memory limits.  The  value
4435              may not exceed 65533 percent.
4436
4437              NOTE:  This  parameter is dependent on OverMemoryKill being con‐
4438              figured in JobAcctGatherParams. It is also possible to configure
4439              the TaskPlugin to use task/cgroup for memory enforcement. VSize‐
4440              Factor will not  have  an  effect  on  memory  enforcement  done
4441              through cgroups.
4442
4443       WaitTime
4444              Specifies  how  many  seconds the srun command should by default
4445              wait after the first task terminates before terminating all  re‐
4446              maining  tasks.  The  "--wait"  option  on the srun command line
4447              overrides this value.  The default value is  0,  which  disables
4448              this feature.  May not exceed 65533 seconds.
4449
4450       X11Parameters
4451              For use with Slurm's built-in X11 forwarding implementation.
4452
4453              home_xauthority
4454                      If set, xauth data on the compute node will be placed in
4455                      ~/.Xauthority rather than  in  a  temporary  file  under
4456                      TmpFS.
4457

NODE CONFIGURATION

4459       The configuration of nodes (or machines) to be managed by Slurm is also
4460       specified in /etc/slurm.conf.   Changes  in  node  configuration  (e.g.
4461       adding  nodes, changing their processor count, etc.) require restarting
4462       both the slurmctld daemon and the slurmd daemons.  All  slurmd  daemons
4463       must know each node in the system to forward messages in support of hi‐
4464       erarchical communications.  Only the NodeName must be supplied  in  the
4465       configuration  file.   All  other node configuration information is op‐
4466       tional.  It is advisable to establish baseline node configurations, es‐
4467       pecially  if the cluster is heterogeneous.  Nodes which register to the
4468       system with less than the configured resources (e.g.  too  little  mem‐
4469       ory),  will  be  placed in the "DOWN" state to avoid scheduling jobs on
4470       them.  Establishing baseline configurations  will  also  speed  Slurm's
4471       scheduling process by permitting it to compare job requirements against
4472       these (relatively few) configuration parameters and possibly avoid hav‐
4473       ing  to check job requirements against every individual node's configu‐
4474       ration.  The resources checked at node  registration  time  are:  CPUs,
4475       RealMemory and TmpDisk.
4476
4477       Default values can be specified with a record in which NodeName is "DE‐
4478       FAULT".  The default entry values will apply only to lines following it
4479       in  the configuration file and the default values can be reset multiple
4480       times in the configuration file  with  multiple  entries  where  "Node‐
4481       Name=DEFAULT".   Each  line where NodeName is "DEFAULT" will replace or
4482       add to previous default values and will not  reinitialize  the  default
4483       values.  The "NodeName=" specification must be placed on every line de‐
4484       scribing the configuration of nodes.  A single node name can not appear
4485       as  a NodeName value in more than one line (duplicate node name records
4486       will be ignored).  In fact, it is generally possible and  desirable  to
4487       define  the configurations of all nodes in only a few lines.  This con‐
4488       vention permits significant optimization in the  scheduling  of  larger
4489       clusters.   In  order to support the concept of jobs requiring consecu‐
4490       tive nodes on some architectures, node specifications should be placed
4491       in  this  file in consecutive order.  No single node name may be listed
4492       more than once in the configuration file.  Use "DownNodes="  to  record
4493       the  state  of  nodes which are temporarily in a DOWN, DRAIN or FAILING
4494       state without altering  permanent  configuration  information.   A  job
4495       step's tasks are allocated to nodes in the order the nodes appear in the
4496       configuration file. There is presently no capability  within  Slurm  to
4497       arbitrarily order a job step's tasks.
4498
4499       Multiple  node  names  may be comma separated (e.g. "alpha,beta,gamma")
4500       and/or a simple node range expression may optionally be used to specify
4501       numeric  ranges  of  nodes  to avoid building a configuration file with
4502       large numbers of entries.  The node range expression  can  contain  one
4503       pair  of  square  brackets  with  a sequence of comma-separated numbers
4504       and/or ranges of numbers separated by a "-" (e.g. "linux[0-64,128]", or
4505       "lx[15,18,32-33]").   Note  that  the numeric ranges can include one or
4506       more leading zeros to indicate the numeric portion has a  fixed  number
4507       of  digits  (e.g.  "linux[0000-1023]").  Multiple numeric ranges can be
4508       included in the expression (e.g. "rack[0-63]_blade[0-41]").  If one  or
4509       more  numeric  expressions are included, one of them must be at the end
4510       of the name (e.g. "unit[0-31]rack" is invalid), but arbitrary names can
4511       always be used in a comma-separated list.
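
           For example, a minimal sketch combining a DEFAULT record with node
           range expressions (node names, counts, and sizes are illustrative
           only):

              NodeName=DEFAULT CPUs=16 RealMemory=64000 State=UNKNOWN
              NodeName=linux[0001-0064]
              NodeName=linux[0065-0128] RealMemory=128000

           Here the second set of nodes overrides the default RealMemory while
           inheriting the remaining default values.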
4512
4513       The node configuration specifies the following information:
4514
4515
4516       NodeName
4517              Name  that  Slurm uses to refer to a node.  Typically this would
4518              be the string that "/bin/hostname -s" returns.  It may  also  be
4519              the  fully  qualified  domain name as returned by "/bin/hostname
4520              -f" (e.g. "foo1.bar.com"), or any valid domain  name  associated
4521              with the host through the host database (/etc/hosts) or DNS, de‐
4522              pending on the resolver settings.  Note that if the  short  form
4523              of  the hostname is not used, it may prevent use of hostlist ex‐
4524              pressions (the numeric portion in brackets must be at the end of
4525              the string).  It may also be an arbitrary string if NodeHostname
4526              is specified.  If the NodeName is "DEFAULT", the  values  speci‐
4527              fied  with  that record will apply to subsequent node specifica‐
4528              tions unless explicitly set to other values in that node  record
4529              or  replaced  with a different set of default values.  Each line
4530              where NodeName is "DEFAULT" will replace or add to previous  de‐
4531              fault values and not reinitialize the default values.   For  ar‐
4532              chitectures in which the node order is significant,  nodes  will
4533              be considered consecutive in the order defined.  For example, if
4534              the configuration for "NodeName=charlie" immediately follows the
4535              configuration for "NodeName=baker" they will be considered adja‐
4536              cent in the computer.   NOTE:  If  the  NodeName  is  "ALL"  the
4537              process parsing the configuration will exit immediately as it is
4538              an internally reserved word.
4539
4540       NodeHostname
4541              Typically this would be the string that "/bin/hostname  -s"  re‐
4542              turns.   It  may  also be the fully qualified domain name as re‐
4543              turned by "/bin/hostname -f" (e.g. "foo1.bar.com"), or any valid
4544              domain  name  associated with the host through the host database
4545              (/etc/hosts) or DNS, depending on the resolver  settings.   Note
4546              that  if the short form of the hostname is not used, it may pre‐
4547              vent use of hostlist expressions (the numeric portion in  brack‐
4548              ets  must be at the end of the string).  A node range expression
4549              can be used to specify a set of  nodes.   If  an  expression  is
4550              used,  the  number of nodes identified by NodeHostname on a line
4551              in the configuration file must be identical  to  the  number  of
4552              nodes identified by NodeName.  By default, the NodeHostname will
4553              be identical in value to NodeName.
4554
4555       NodeAddr
4556              Name that a node should be referred to in establishing a  commu‐
4557              nications  path.   This  name will be used as an argument to the
4558              getaddrinfo() function for identification.  If a node range  ex‐
4559              pression  is used to designate multiple nodes, they must exactly
4560              match  the  entries  in  the  NodeName  (e.g.  "NodeName=lx[0-7]
4561              NodeAddr=elx[0-7]").   NodeAddr  may  also contain IP addresses.
4562              By default, the NodeAddr will be identical in value to NodeHost‐
4563              name.
4564
4565       BcastAddr
4566              Alternate  network path to be used for sbcast network traffic to
4567              a given node.  This name will be used  as  an  argument  to  the
4568              getaddrinfo()  function.   If a node range expression is used to
4569              designate multiple nodes, they must exactly match the entries in
4570              the   NodeName   (e.g.  "NodeName=lx[0-7]  BcastAddr=elx[0-7]").
4571              BcastAddr may also contain IP addresses.  By default, the  Bcas‐
4572              tAddr  is  unset,  and  sbcast  traffic  will  be  routed to the
4573              NodeAddr for a given node.  Note: cannot be used with Communica‐
4574              tionParameters=NoInAddrAny.
4575
4576       Boards Number of Baseboards in nodes with a baseboard controller.  Note
4577              that when Boards is specified, SocketsPerBoard,  CoresPerSocket,
4578              and ThreadsPerCore should be specified.  The default value is 1.
4579
4580       CoreSpecCount
4581              Number  of  cores  reserved  for system use.  Depending upon the
4582              TaskPluginParam option of SlurmdOffSpec, the Slurm daemon slurmd
4583              may  either be confined to these resources (the default) or pre‐
4584              vented from using these resources.   Isolation  of  slurmd  from
4585              user  jobs  may  improve application performance.  A job can use
4586              these cores if AllowSpecResourcesUsage=yes and the user  explic‐
4587              itly  requests  less than the configured CoreSpecCount.  If this
4588              option and CpuSpecList are both designated for a node, an  error
4589              is generated.  For information on the algorithm used by Slurm to
4590              select the cores refer to the core specialization  documentation
4591              ( https://slurm.schedmd.com/core_spec.html ).
4592
4593       CoresPerSocket
4594              Number  of  cores  in  a  single physical processor socket (e.g.
4595              "2").  The CoresPerSocket value describes  physical  cores,  not
4596              the  logical number of processors per socket.  NOTE: If you have
4597              multi-core processors, you will likely need to specify this  pa‐
4598              rameter  in  order to optimize scheduling.  The default value is
4599              1.
4600
4601       CpuBind
4602              If a job step request does not specify an option to control  how
4603              tasks are bound to allocated CPUs (--cpu-bind) and all nodes al‐
4604              located to the job have the same CpuBind option, the node CpuBind
4605              option will control how tasks are bound to allocated resources.
4606              Supported  values  for  CpuBind  are  "none",  "socket",  "ldom"
4607              (NUMA), "core" and "thread".
4608
4609       CPUs   Number  of logical processors on the node (e.g. "2").  It can be
4610              set to the total number of sockets (supported only by select/lin‐
4611              ear),  cores  or  threads.   This can be useful when you want to
4612              schedule only the cores on a hyper-threaded  node.  If  CPUs  is
4613              omitted, its default will be set equal to the product of Boards,
4614              Sockets, CoresPerSocket, and ThreadsPerCore.
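
              For example, a node defined as follows (values illustrative only):
              NodeName=node001 Sockets=2 CoresPerSocket=8 ThreadsPerCore=2
              would default to CPUs=32 (1 board x 2 sockets x 8 cores x 2
              threads).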
4615
4616       CpuSpecList
4617              A comma-delimited list of Slurm abstract CPU  IDs  reserved  for
4618              system  use.   The  list  will  be expanded to include all other
4619              CPUs, if any, on the same cores.  Depending upon the TaskPlugin‐
4620              Param  option  of SlurmdOffSpec, the Slurm daemon slurmd may ei‐
4621              ther be confined to these resources (the default)  or  prevented
4622              from  using these resources.  Isolation of slurmd from user jobs
4623              may improve application performance.  A job can use these  cores
4624              if  AllowSpecResourcesUsage=yes and the user explicitly requests
4625              less than the number of CPUs in this list.  If this  option  and
4626              CoreSpecCount are both designated for a node, an error is gener‐
4627              ated.  This option has no effect unless cgroup  job  confinement
4628              is  also  configured (i.e. the task/cgroup TaskPlugin is enabled
4629              and ConstrainCores=yes is set in cgroup.conf).
4630
4631       Features
4632              A comma-delimited list of arbitrary strings indicative  of  some
4633              characteristic  associated  with the node.  There is no value or
4634              count associated with a feature at this time; a node either  has
4635              a  feature  or it does not.  A desired feature may contain a nu‐
4636              meric component indicating, for  example,  processor  speed  but
4637              this numeric component will be considered to be part of the fea‐
4638              ture string. Features are intended to be used  to  filter  nodes
4639              eligible  to run jobs via the --constraint argument.  By default
4640              a node has no features.  Also see Gres for more control, such as
4641              types and counts.  Using features is faster
4642              than scheduling against GRES but is limited  to  Boolean  opera‐
4643              tions.
4644
4645       Gres   A comma-delimited list of generic resources specifications for a
4646              node.   The   format   is:   "<name>[:<type>][:no_consume]:<num‐
4647              ber>[K|M|G]".   The  first  field  is  the  resource name, which
4648              matches the GresType configuration parameter name.  The optional
4649              type field might be used to identify a model of that generic re‐
4650              source.  It is forbidden to specify both an untyped GRES  and  a
4651              typed  GRES with the same <name>.  The optional no_consume field
4652              allows you to specify that a generic resource does  not  have  a
4653              finite  number  of that resource that gets consumed as it is re‐
4654              quested. The no_consume field is a GRES specific setting and ap‐
4655              plies  to the GRES, regardless of the type specified.  It should
4656              not be used with a GRES that has a dedicated plugin; if  you're
4657              looking for a way to overcommit GPUs to multiple  processes  at
4658              the same time, you may be interested in using "shard" GRES instead.
4659              The  final field must specify a generic resources count.  A suf‐
4660              fix of "K", "M", "G", "T" or "P" may be  used  to  multiply  the
4661              number   by   1024,   1048576,  1073741824,  etc.  respectively.
4662              (e.g."Gres=gpu:tesla:1,gpu:kepler:1,bandwidth:lustre:no_con‐
4663              sume:4G").   By  default a node has no generic resources and its
4664              maximum count is that of an unsigned 64bit  integer.   Also  see
4665              Features  for  Boolean  flags  to  filter  nodes  using job con‐
4666              straints.
4667
4668       MemSpecLimit
4669              Amount of memory, in megabytes, reserved for system use and  not
4670              available  for  user  allocations.  If the task/cgroup plugin is
4671              configured and that plugin constrains memory  allocations  (i.e.
4672              the  task/cgroup TaskPlugin is enabled and ConstrainRAMSpace=yes
4673              is set in cgroup.conf), then Slurm compute node daemons  (slurmd
4674              plus  slurmstepd)  will be allocated the specified memory limit.
4675              Note that memory must be configured as a consumable resource
4676              through one of the SelectTypeParameters options for this option
4677              to work.  The daemons will not be killed if they
4678              exhaust the memory allocation (i.e. the Out-Of-Memory Killer  is
4679              disabled for the daemon's memory cgroup).   If  the  task/cgroup
4680              plugin  is not configured, the specified memory will only be un‐
4681              available for user allocations.
4682
4683       Port   The port number that the Slurm compute node daemon, slurmd, lis‐
4684              tens  to for work on this particular node. By default there is a
4685              single port number for all slurmd daemons on all  compute  nodes
4686              as  defined  by  the  SlurmdPort configuration parameter. Use of
4687              this option is not generally recommended except for  development
4688              or  testing  purposes.  If  multiple slurmd daemons execute on a
4689              node this can specify a range of ports.
4690
4691              Note: On Cray systems, Realm-Specific IP Addressing (RSIP)  will
4692              automatically  try  to  interact  with  anything opened on ports
4693              8192-60000.  Configure Port to use a port outside of the config‐
4694              ured SrunPortRange and RSIP's port range.
4695
4696       Procs  See CPUs.
4697
4698       RealMemory
4699              Size of real memory on the node in megabytes (e.g. "2048").  The
4700              default value is 1. Lowering RealMemory in order to set aside
4701              some memory for the OS, unavailable for job allocations, will
4702              not work as intended if Memory is not set as a consumable
4703              resource in SelectTypeParameters; one of the *_Memory options
4704              needs to be enabled for that goal to be accomplished.
4705              Also see MemSpecLimit.
4706
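                  For example, memory can be made a consumable resource with
                  cluster-wide settings such as the following sketch (the plugin
                  choice, node name and sizes are illustrative):

                  SelectType=select/cons_tres
                  SelectTypeParameters=CR_Core_Memory
                  NodeName=tux03 RealMemory=64000
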
4707       Reason Identifies  the  reason  for  a  node  being  in  state  "DOWN",
4708              "DRAINED", "DRAINING", "FAIL" or "FAILING".  Use quotes to en‐
4709              close a reason having more than one word.
4710
4711       Sockets
4712              Number  of  physical  processor  sockets/chips on the node (e.g.
4713              "2").  If Sockets is omitted, it will  be  inferred  from  CPUs,
4714              CoresPerSocket,   and   ThreadsPerCore.    NOTE:   If  you  have
4715              multi-core processors, you will likely need to specify these pa‐
4716              rameters.   Sockets  and SocketsPerBoard are mutually exclusive.
4717              If Sockets is specified when Boards is also used, Sockets is in‐
4718              terpreted as SocketsPerBoard rather than total sockets.  The de‐
4719              fault value is 1.
4720
4721       SocketsPerBoard
4722              Number of  physical  processor  sockets/chips  on  a  baseboard.
4723              Sockets and SocketsPerBoard are mutually exclusive.  The default
4724              value is 1.
4725
4726       State  State of the node with respect to the initiation of  user  jobs.
4727              Acceptable  values are CLOUD, DOWN, DRAIN, FAIL, FAILING, FUTURE
4728              and UNKNOWN.  Node states of BUSY and IDLE should not be  speci‐
4729              fied  in  the  node configuration, but set the node state to UN‐
4730              KNOWN instead.  Setting the node state to UNKNOWN will result in
4731              the  node  state  being  set  to BUSY, IDLE or other appropriate
4732              state based upon recovered system state  information.   The  de‐
4733              fault value is UNKNOWN.  Also see the DownNodes parameter below.
4734
4735              CLOUD     Indicates  the  node exists in the cloud.  Its initial
4736                        state will be treated as powered down.  The node  will
4737                        be available for use after its state is recovered from
4738                        Slurm's state save file or the slurmd daemon starts on
4739                        the compute node.
4740
4741              DOWN      Indicates the node failed and is unavailable to be al‐
4742                        located work.
4743
4744              DRAIN     Indicates the node  is  unavailable  to  be  allocated
4745                        work.
4746
4747              FAIL      Indicates  the  node  is expected to fail soon, has no
4748                        jobs allocated to it, and will not be allocated to any
4749                        new jobs.
4750
4751              FAILING   Indicates  the  node is expected to fail soon, has one
4752                        or more jobs allocated to it, but will  not  be  allo‐
4753                        cated to any new jobs.
4754
4755              FUTURE    Indicates  the node is defined for future use and need
4756                        not exist when the Slurm daemons  are  started.  These
4757                        nodes can be made available for use simply by updating
4758                        the node state using the scontrol command rather  than
4759                        restarting the slurmctld daemon. After these nodes are
4760                        made available, change their State in  the  slurm.conf
4761                        file.  Until these nodes are made available, they will
4762              not be seen using any Slurm commands, nor  will  any
4763                        attempt be made to contact them.
4764
4765                        Dynamic Future Nodes
4766                               A slurmd started with -F[<feature>] will be as‐
4767                               sociated with a FUTURE node  that  matches  the
4768                               same configuration (sockets, cores, threads) as
4769                               reported by slurmd -C. The node's NodeAddr  and
4770                               NodeHostname  will  automatically  be retrieved
4771                               from the slurmd and will be  cleared  when  set
4772                               back  to the FUTURE state. Dynamic FUTURE nodes
4773                               retain non-FUTURE state on restart.  Use  scon‐
4774                               trol to put a node back into the FUTURE state.
4775
4776                               If  the  mapping  of the NodeName to the slurmd
4777                               HostName is not updated in DNS, Dynamic  Future
4778                               nodes  won't  know how to communicate with each
4779                               other -- because NodeAddr and NodeHostName  are
4780                               not defined in the slurm.conf -- and the fanout
4781                               communications need to be disabled  by  setting
4782                               TreeWidth to a high number (e.g. 65533). If the
4783                               DNS mapping is made, then the cloud_dns  Slurm‐
4784                               ctldParameter can be used.
4785
4786              UNKNOWN   Indicates  the  node's  state is undefined but will be
4787                        established (set to BUSY or IDLE) when the slurmd dae‐
4788                        mon  on  that  node  registers. UNKNOWN is the default
4789                        state.
4790
4791       ThreadsPerCore
4792              Number of logical threads in a single physical core (e.g.  "2").
4793              Note that Slurm can allocate resources to jobs  down  to  the
4794              resolution of a core. If your system  is  configured  with  more
4795              than  one  thread per core, execution of a different job on each
4796              thread is not supported unless you  configure  SelectTypeParame‐
4797              ters=CR_CPU  plus CPUs; do not configure Sockets, CoresPerSocket
4798              or ThreadsPerCore.  A job can execute one task per thread from
4799              within one job step or execute a distinct job step on each  of
4800              the threads.  Note also that if you are running with more than
4801              one thread per core and using the select/cons_res or  se‐
4802              lect/cons_tres plugin, you will want to set the  SelectType‐
4803              Parameters variable to something other than CR_CPU to avoid un‐
4804              expected results.  The default value is 1.
4805
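                  As a sketch of the processor layout parameters, a dual-socket
                  node with 16 cores per socket and two hardware threads per
                  core might be described as (the node name is illustrative):

                  NodeName=tux04 Sockets=2 CoresPerSocket=16 ThreadsPerCore=2
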
4806       TmpDisk
4807              Total size of temporary disk storage in TmpFS in megabytes (e.g.
4808              "16384"). TmpFS (for "Temporary File System") identifies the lo‐
4809              cation which jobs should use for temporary storage.   Note  this
4810              does not indicate the amount of free space available to the user
4811              on the node, only the total file system size. The system  admin‐
4812              istrator should ensure this file system is purged as needed so
4813              that user jobs have access to most of this  space.   The  Prolog
4814              and/or  Epilog  programs  (specified  in the configuration file)
4815              might be used to ensure the file system is kept clean.  The  de‐
4816              fault value is 0.
4817
4818       Weight The  priority  of  the node for scheduling purposes.  All things
4819              being equal, jobs will be allocated the nodes  with  the  lowest
4820              weight  which satisfies their requirements.  For example, a het‐
4821              erogeneous collection of nodes might be  placed  into  a  single
4822              partition for greater system utilization, responsiveness and ca‐
4823              pability. It would be  preferable  to  allocate  smaller  memory
4824              nodes  rather  than larger memory nodes if either will satisfy a
4825              job's requirements.  The units  of  weight  are  arbitrary,  but
4826              larger weights should be assigned to nodes with more processors,
4827              memory, disk space, higher processor speed, etc.  Note that if a
4828              job allocation request can not be satisfied using the nodes with
4829              the lowest weight, the set of nodes with the next lowest  weight
4830              is added to the set of nodes under consideration for use (repeat
4831              as needed for higher weight values). If you absolutely  want  to
4832              minimize  the  number  of higher weight nodes allocated to a job
4833              (at a cost of higher scheduling overhead), give each node a dis‐
4834              tinct  Weight  value and they will be added to the pool of nodes
4835              being considered for scheduling individually.
4836
4837              The default value is 1.
4838
4839              NOTE: Node weights are first considered among  currently  avail‐
4840              able nodes. For example, a POWERED_DOWN node with a lower weight
4841              will not be evaluated before an IDLE node.
4842
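                  For example, to prefer the smaller-memory nodes whenever
                  either node type satisfies a job's requirements (node names
                  and values are illustrative):

                  NodeName=small[1-16] RealMemory=64000  Weight=10
                  NodeName=big[1-4]    RealMemory=512000 Weight=50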

DOWN NODE CONFIGURATION

4844       The DownNodes= parameter permits you to mark  certain  nodes  as  in  a
4845       DOWN,  DRAIN, FAIL, FAILING or FUTURE state without altering the perma‐
4846       nent configuration information listed under a NodeName= specification.
4847
4848
4849       DownNodes
4850              Any node name, or list of node names, from the NodeName=  speci‐
4851              fications.
4852
4853       Reason Identifies  the  reason  for  a node being in state DOWN, DRAIN,
4854              FAIL, FAILING or FUTURE.  Use quotes to enclose a reason  having
4855              more than one word.
4856
4857       State  State  of  the node with respect to the initiation of user jobs.
4858              Acceptable values are DOWN, DRAIN,  FAIL,  FAILING  and  FUTURE.
4859              For more information about these states see the descriptions un‐
4860              der State in the NodeName= section above.  The default value  is
4861              DOWN.
4862
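           For example, the following marks two nodes as drained without
           changing their NodeName definitions (node names and reason text
           are illustrative):

           DownNodes=tux[05-06] State=DRAIN Reason="Pending memory replacement"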

FRONTEND NODE CONFIGURATION

4864       On  computers  where  frontend  nodes are used to execute batch scripts
4865       rather than compute nodes, one may configure one or more frontend nodes
4866       using  the  configuration  parameters  defined below. These options are
4867       very similar to those used in configuring compute nodes. These  options
4868       may  only  be used on systems configured and built with the appropriate
4869       parameters (--have-front-end).  The front end  configuration  specifies
4870       the following information:
4871
4872
4873       AllowGroups
4874              Comma-separated  list  of  group names which may execute jobs on
4875              this front end node. By default, all groups may use  this  front
4876              end  node.   A user will be permitted to use this front end node
4877              if AllowGroups has at least one group associated with the  user.
4878              May not be used with the DenyGroups option.
4879
4880       AllowUsers
4881              Comma-separated  list  of  user  names which may execute jobs on
4882              this front end node. By default, all users may  use  this  front
4883              end node.  May not be used with the DenyUsers option.
4884
4885       DenyGroups
4886              Comma-separated list of group names which are prevented from ex‐
4887              ecuting jobs on this front end node.  May not be used  with  the
4888              AllowGroups option.
4889
4890       DenyUsers
4891              Comma-separated list of user names which are prevented from exe‐
4892              cuting jobs on this front end node.  May not be  used  with  the
4893              AllowUsers option.
4894
4895       FrontendName
4896              Name  that  Slurm  uses  to refer to a frontend node.  Typically
4897              this would be the string that "/bin/hostname  -s"  returns.   It
4898              may  also  be  the  fully  qualified  domain name as returned by
4899              "/bin/hostname -f" (e.g. "foo1.bar.com"), or  any  valid  domain
4900              name   associated  with  the  host  through  the  host  database
4901              (/etc/hosts) or DNS, depending on the resolver  settings.   Note
4902              that  if the short form of the hostname is not used, it may pre‐
4903              vent use of hostlist expressions (the numeric portion in  brack‐
4904              ets  must  be at the end of the string).  If the FrontendName is
4905              "DEFAULT", the values specified with that record will  apply  to
4906              subsequent  node  specifications  unless explicitly set to other
4907              values in that frontend node record or replaced with a different
4908              set  of  default  values.   Each line where FrontendName is "DE‐
4909              FAULT" will replace or add to previous default values  and  not
4910              reinitialize the default values.
4911
4912       FrontendAddr
4913              Name  that a frontend node should be referred to in establishing
4914              a communications path. This name will be used as an argument  to
4915              the  getaddrinfo()  function  for identification.  As with Fron‐
4916              tendName, list the individual node addresses rather than using a
4917              hostlist  expression.   The  number  of FrontendAddr records per
4918              line must equal the number  of  FrontendName  records  per  line
4919              (i.e. you can't map two node names to one address). FrontendAddr
4920              may also contain IP addresses.   By  default,  the  FrontendAddr
4921              will be identical in value to FrontendName.
4922
4923       Port   The port number that the Slurm compute node daemon, slurmd, lis‐
4924              tens to for work on this particular frontend  node.  By  default
4925              there  is  a  single  port  number for all slurmd daemons on all
4926              frontend nodes as defined by the SlurmdPort configuration param‐
4927              eter. Use of this option is not generally recommended except for
4928              development or testing purposes.
4929
4930              Note: On Cray systems, Realm-Specific IP Addressing (RSIP)  will
4931              automatically  try  to  interact  with  anything opened on ports
4932              8192-60000.  Configure Port to use a port outside of the config‐
4933              ured SrunPortRange and RSIP's port range.
4934
4935       Reason Identifies  the  reason for a frontend node being in state DOWN,
4936              DRAINED, DRAINING, FAIL or FAILING.  Use  quotes  to  enclose  a
4937              reason having more than one word.
4938
4939       State  State  of  the  frontend  node with respect to the initiation of
4940              user jobs.  Acceptable values are DOWN, DRAIN, FAIL, FAILING and
4941              UNKNOWN.   Node  states of BUSY and IDLE should not be specified
4942              in the node configuration, but set the node state to UNKNOWN in‐
4943              stead.   Setting  the  node  state to UNKNOWN will result in the
4944              node state being set to BUSY, IDLE or  other  appropriate  state
4945              based  upon recovered system state information.  For more infor‐
4946              mation about these states see the descriptions  under  State  in
4947              the NodeName= section above.  The default value is UNKNOWN.
4948
4949       As  an example, you can do something similar to the following to define
4950       four front end nodes for running slurmd daemons.
4951       FrontendName=frontend[00-03] FrontendAddr=efrontend[00-03] State=UNKNOWN
4952
4953

NODESET CONFIGURATION

4955       The nodeset configuration allows you to define a name  for  a  specific
4956       set  of nodes which can be used to simplify the partition configuration
4957       section, especially for heterogeneous or condo-style systems. Each node‐
4958       set  may  be  defined by an explicit list of nodes, and/or by filtering
4959       the nodes by a particular configured  feature.  If  both  Feature=  and
4960       Nodes=  are  used  the  nodeset  shall be the union of the two subsets.
4961       Note that the nodesets are only used to simplify the partition  defini‐
4962       tions  at present, and are not usable outside of the partition configu‐
4963       ration.
4964
4965
4966       Feature
4967              All nodes with this single feature will be included as  part  of
4968              this nodeset.
4969
4970       Nodes  List of nodes in this set.
4971
4972       NodeSet
4973              Unique  name for a set of nodes. Must not overlap with any Node‐
4974              Name definitions.
4975
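           For example, a nodeset can group nodes by a feature and then be
           referenced in a partition definition (the names and feature are
           illustrative):

           NodeSet=gpunodes Feature=gpu
           PartitionName=gpu Nodes=gpunodes State=UP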

PARTITION CONFIGURATION

4977       The partition configuration permits you to establish different job lim‐
4978       its  or  access  controls  for various groups (or partitions) of nodes.
4979       Nodes may be in more than one partition,  making  partitions  serve  as
4980       general  purpose queues.  For example one may put the same set of nodes
4981       into two different partitions, each with  different  constraints  (time
4982       limit, job sizes, groups allowed to use the partition, etc.).  Jobs are
4983       allocated resources within a single partition.  Default values  can  be
4984       specified  with  a record in which PartitionName is "DEFAULT".  The de‐
4985       fault entry values will apply only to lines following it in the config‐
4986       uration  file and the default values can be reset multiple times in the
4987       configuration file with multiple entries where "PartitionName=DEFAULT".
4988       The  "PartitionName="  specification  must  be placed on every line de‐
4989       scribing the configuration of partitions.  Each line  where  Partition‐
4990       Name  is  "DEFAULT"  will replace or add to previous default values and
4991       not reinitialize the default values.  A single partition name  can  not
4992       appear as a PartitionName value in more than one line (duplicate parti‐
4993       tion name records will be ignored).  If a partition that is in  use  is
4994       deleted  from  the configuration and slurm is restarted or reconfigured
4995       (scontrol reconfigure), jobs using the partition are  canceled.   NOTE:
4996       Put  all  parameters for each partition on a single line.  Each line of
4997       partition configuration information should represent a different parti‐
4998       tion.  The partition configuration file contains the following informa‐
4999       tion:
5000
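           For example, cluster-wide partition defaults can be set once and
           then overridden per partition (all names and values are
           illustrative):

           PartitionName=DEFAULT MaxTime=12:00:00 State=UP
           PartitionName=batch Nodes=tux[01-64] Default=YES
           PartitionName=debug Nodes=tux[01-04] MaxTime=30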
5001
5002       AllocNodes
5003              Comma-separated list of nodes from which users can  submit  jobs
5004              in  the  partition.   Node names may be specified using the node
5005              range expression syntax described above.  The default  value  is
5006              "ALL".
5007
5008       AllowAccounts
5009              Comma-separated  list  of accounts which may execute jobs in the
5010              partition.  The default value is "ALL".  NOTE: If  AllowAccounts
5011              is  used  then DenyAccounts will not be enforced.  Also refer to
5012              DenyAccounts.
5013
5014       AllowGroups
5015              Comma-separated list of group names which may  execute  jobs  in
5016              this  partition.   A  user  will be permitted to submit a job to
5017              this partition if AllowGroups has at least one group  associated
5018              with  the user.  Jobs executed as user root or as user SlurmUser
5019              will be allowed to use any partition, regardless of the value of
5020              AllowGroups. In addition, a Slurm Admin or Operator will be able
5021              to view any partition, regardless of the value  of  AllowGroups.
5022              If user root attempts to execute a job as another user (e.g. us‐
5023              ing srun's --uid option), then the job will be subject to Allow‐
5024              Groups as if it were submitted by that user.  By default, Allow‐
5025              Groups is unset, meaning all groups are allowed to use this par‐
5026              tition.  The  special  value 'ALL' is equivalent to this.  Users
5027              who are not members of the specified group will not see informa‐
5028              tion  about  this partition by default. However, this should not
5029              be treated as a security mechanism, since job  information  will
5030              be  returned if a user requests details about the partition or a
5031              specific job. See the PrivateData parameter to  restrict  access
5032              to  job information.  NOTE: For performance reasons, Slurm main‐
5033              tains a list of user IDs allowed to use each partition and  this
5034              is checked at job submission time.  This list of user IDs is up‐
5035              dated when the slurmctld daemon is restarted, reconfigured (e.g.
5036              "scontrol reconfig") or the partition's AllowGroups value is re‐
5037              set, even if its value is unchanged (e.g. "scontrol update Parti‐
5038              tionName=name  AllowGroups=group").   For  a  user's access to a
5039              partition to change, both their group membership must change and
5040              Slurm's internal user ID list must change using one of the meth‐
5041              ods described above.
5042
5043       AllowQos
5044              Comma-separated list of Qos which may execute jobs in the parti‐
5045              tion.   Jobs executed as user root can use any partition without
5046              regard to the value of AllowQos.  The default  value  is  "ALL".
5047              NOTE:  If  AllowQos  is  used then DenyQos will not be enforced.
5048              Also refer to DenyQos.
5049
5050       Alternate
5051              Partition name of alternate partition to be used if the state of
5052              this partition is "DRAIN" or "INACTIVE".
5053
5054       CpuBind
5055              If a job step request does not specify an option to control how
5056              tasks are bound to allocated CPUs (--cpu-bind), and the nodes
5057              allocated to the job do not all have the same node-level CpuBind
5058              option, then the partition's CpuBind option will control  how
5059              tasks are bound to allocated resources.  Supported  values  for
5060              CpuBind are "none", "socket", "ldom" (NUMA), "core" and "thread".
5061
5062       Default
5063              If this keyword is set, jobs submitted without a partition spec‐
5064              ification  will  utilize  this  partition.   Possible values are
5065              "YES" and "NO".  The default value is "NO".
5066
5067       DefaultTime
5068              Run time limit used for jobs that don't specify a value. If  not
5069              set  then  MaxTime will be used.  Format is the same as for Max‐
5070              Time.
5071
5072       DefCpuPerGPU
5073              Default count of CPUs allocated per allocated GPU. This value is
5074              used   only  if  the  job  didn't  specify  --cpus-per-task  and
5075              --cpus-per-gpu.
5076
5077       DefMemPerCPU
5078              Default  real  memory  size  available  per  allocated  CPU   in
5079              megabytes.   Used  to  avoid over-subscribing memory and causing
5080              paging.  DefMemPerCPU would generally be used if individual pro‐
5081              cessors are allocated to jobs (SelectType=select/cons_res or Se‐
5082              lectType=select/cons_tres).  If not set, the DefMemPerCPU  value
5083              for  the  entire  cluster  will be used.  Also see DefMemPerGPU,
5084              DefMemPerNode and MaxMemPerCPU.  DefMemPerCPU, DefMemPerGPU  and
5085              DefMemPerNode are mutually exclusive.
5086
5087       DefMemPerGPU
5088              Default   real  memory  size  available  per  allocated  GPU  in
5089              megabytes.  Also see DefMemPerCPU, DefMemPerNode and  MaxMemPer‐
5090              CPU.   DefMemPerCPU, DefMemPerGPU and DefMemPerNode are mutually
5091              exclusive.
5092
5093       DefMemPerNode
5094              Default  real  memory  size  available  per  allocated  node  in
5095              megabytes.   Used  to  avoid over-subscribing memory and causing
5096              paging.  DefMemPerNode would generally be used  if  whole  nodes
5097              are  allocated  to jobs (SelectType=select/linear) and resources
5098              are over-subscribed (OverSubscribe=yes or  OverSubscribe=force).
5099              If  not set, the DefMemPerNode value for the entire cluster will
5100              be used.  Also see DefMemPerCPU, DefMemPerGPU and  MaxMemPerCPU.
5101              DefMemPerCPU, DefMemPerGPU and DefMemPerNode are mutually exclu‐
5102              sive.
5103
5104       DenyAccounts
5105              Comma-separated list of accounts which may not execute  jobs  in
5106              the partition.  By default, no accounts are denied access. NOTE:
5107              If AllowAccounts is used then DenyAccounts will not be enforced.
5108              Also refer to AllowAccounts.
5109
5110       DenyQos
5111              Comma-separated  list  of  Qos which may not execute jobs in the
5112              partition.  By default, no QOS are denied access.  NOTE: If Al‐
5113              lowQos is used then DenyQos will not be enforced.  Also refer to
5114              AllowQos.
5115
5116       DisableRootJobs
5117              If set to "YES" then user root will be  prevented  from  running
5118              any jobs on this partition.  The default value will be the value
5119              of DisableRootJobs set  outside  of  a  partition  specification
5120              (which is "NO", allowing user root to execute jobs).
5121
5122       ExclusiveUser
5123              If  set  to  "YES"  then  nodes will be exclusively allocated to
5124              users.  Multiple jobs may be run for the same user, but only one
5125              user can be active at a time.  This capability is also available
5126              on a per-job basis by using the --exclusive=user option.
5127
5128       GraceTime
5129              Specifies, in units of seconds, the preemption grace time to  be
5130              extended  to  a job which has been selected for preemption.  The
5131              default value is zero, meaning no preemption grace time is  al‐
5132              lowed on this partition.  Once a job is selected for preemption,
5133              its end time is set to the  current  time  plus  GraceTime.  The
5134              job's  tasks are immediately sent SIGCONT and SIGTERM signals in
5135              order to provide notification of its imminent termination.  This
5136              is  followed by the SIGCONT, SIGTERM and SIGKILL signal sequence
5137              upon reaching its new end time. This second set  of  signals  is
5138              sent  to  both the tasks and the containing batch script, if ap‐
5139              plicable.  See also the global KillWait configuration parameter.
5140
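                  For example, to give preempted jobs in a partition two
                  minutes to clean up before being terminated (names and
                  values are illustrative):

                  PartitionName=scavenger Nodes=tux[01-64] GraceTime=120 PreemptMode=CANCEL
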
5141       Hidden Specifies if the partition and its jobs are to be hidden by  de‐
5142              fault.  Hidden partitions will by default not be reported by the
5143              Slurm APIs or commands.  Possible values  are  "YES"  and  "NO".
5144              The  default  value  is  "NO".  Note that partitions that a user
5145              lacks access to by virtue of the AllowGroups parameter will also
5146              be hidden by default.
5147
5148       LLN    Schedule resources to jobs on the least loaded nodes (based upon
5149              the number of idle CPUs). This is generally only recommended for
5150              an  environment  with serial jobs as idle resources will tend to
5151              be highly fragmented, resulting in parallel jobs being  distrib‐
5152              uted  across many nodes.  Note that node Weight takes precedence
5153              over how many idle resources are on each node.  Also see the Se‐
5154              lectTypeParameters  configuration  parameter  CR_LLN  to use the
5155              least loaded nodes in every partition.
5156
5157       MaxCPUsPerNode
5158              Maximum number of CPUs on any node available to  all  jobs  from
5159              this partition.  This can be especially useful to schedule GPUs.
5160              For example a node can be associated with two  Slurm  partitions
5161              (e.g.  "cpu"  and  "gpu") and the partition/queue "cpu" could be
5162              limited to only a subset of the node's CPUs, ensuring  that  one
5163              or  more  CPUs  would  be  available to jobs in the "gpu" parti‐
5164              tion/queue.
5165
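                  Following the scenario above, a sketch of two partitions
                  sharing the same nodes while reserving CPUs for GPU jobs
                  (names and counts are illustrative):

                  PartitionName=cpu Nodes=tux[01-16] MaxCPUsPerNode=28
                  PartitionName=gpu Nodes=tux[01-16]
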
5166       MaxMemPerCPU
5167              Maximum  real  memory  size  available  per  allocated  CPU   in
5168              megabytes.   Used  to  avoid over-subscribing memory and causing
5169              paging.  MaxMemPerCPU would generally be used if individual pro‐
5170              cessors are allocated to jobs (SelectType=select/cons_res or Se‐
5171              lectType=select/cons_tres).  If not set, the MaxMemPerCPU  value
5172              for  the entire cluster will be used.  Also see DefMemPerCPU and
5173              MaxMemPerNode.  MaxMemPerCPU and MaxMemPerNode are mutually  ex‐
5174              clusive.
5175
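                  For example, to default to 2 GB per allocated CPU while
                  capping requests at 4 GB per CPU (names and values are
                  illustrative):

                  PartitionName=batch Nodes=tux[01-64] DefMemPerCPU=2048 MaxMemPerCPU=4096
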
5176       MaxMemPerNode
5177              Maximum  real  memory  size  available  per  allocated  node  in
5178              megabytes.  Used to avoid over-subscribing  memory  and  causing
5179              paging.   MaxMemPerNode  would  generally be used if whole nodes
5180              are allocated to jobs (SelectType=select/linear)  and  resources
5181              are  over-subscribed (OverSubscribe=yes or OverSubscribe=force).
5182              If not set, the MaxMemPerNode value for the entire cluster  will
5183              be used.  Also see DefMemPerNode and MaxMemPerCPU.  MaxMemPerCPU
5184              and MaxMemPerNode are mutually exclusive.
5185
5186       MaxNodes
5187              Maximum count of nodes which may be allocated to any single job.
5188              The  default  value  is "UNLIMITED", which is represented inter‐
5189              nally as -1.
5190
5191       MaxTime
5192              Maximum run time  limit  for  jobs.   Format  is  minutes,  min‐
5193              utes:seconds, hours:minutes:seconds, days-hours, days-hours:min‐
5194              utes, days-hours:minutes:seconds or "UNLIMITED".   Time  resolu‐
5195              tion  is one minute and second values are rounded up to the next
5196              minute.  The job TimeLimit may be updated by root, SlurmUser  or
5197              an  Operator to a value higher than the configured MaxTime after
5198              job submission.
5199
5200       MinNodes
5201              Minimum count of nodes which may be allocated to any single job.
5202              The default value is 0.
5203
5204       Nodes  Comma-separated  list  of nodes or nodesets which are associated
5205              with this partition.  Node names may be specified using the node
5206              range  expression  syntax described above. A blank list of nodes
5207              (i.e. "Nodes= ") can be used if one wants a partition to  exist,
5208              but  have no resources (possibly on a temporary basis).  A value
5209              of "ALL" is mapped to all nodes configured in the cluster.
5210
5211       OverSubscribe
5212              Controls the ability of the partition to execute more  than  one
5213              job  at  a time on each resource (node, socket or core depending
5214              upon the value of SelectTypeParameters).  If resources are to be
5215              over-subscribed,  avoiding  memory over-subscription is very im‐
5216              portant.  SelectTypeParameters should  be  configured  to  treat
5217              memory  as  a consumable resource and the --mem option should be
5218              used for job allocations.  Sharing  of  resources  is  typically
5219              useful   only   when  using  gang  scheduling  (PreemptMode=sus‐
5220              pend,gang).  Possible values for OverSubscribe are  "EXCLUSIVE",
5221              "FORCE", "YES", and "NO".  Note that a value of "YES" or "FORCE"
5222              can negatively impact performance for systems  with  many  thou‐
5223              sands of running jobs.  The default value is "NO".  For more in‐
5224              formation see the following web pages:
5225              https://slurm.schedmd.com/cons_res.html
5226              https://slurm.schedmd.com/cons_res_share.html
5227              https://slurm.schedmd.com/gang_scheduling.html
5228              https://slurm.schedmd.com/preempt.html
5229
5230              EXCLUSIVE   Allocates entire nodes to  jobs  even  with  Select‐
5231                          Type=select/cons_res  or SelectType=select/cons_tres
5232                          configured.  Jobs that run in partitions with  Over‐
5233                          Subscribe=EXCLUSIVE  will  have  exclusive access to
5234                          all allocated nodes.  These jobs are  allocated  all
5235                          CPUs  and GRES on the nodes, but they are only allo‐
5236                          cated as much memory as they ask for. This is by de‐
5237                          sign  to  support gang scheduling, because suspended
5238                          jobs still reside in memory. To request all the mem‐
5239                          ory on a node, use --mem=0 at submit time.
5240
5241              FORCE       Makes  all  resources (except GRES) in the partition
5242                          available for oversubscription without any means for
5243                          users  to  disable it.  May be followed with a colon
5244                          and maximum number of jobs in running  or  suspended
5245                          state.   For  example  OverSubscribe=FORCE:4 enables
5246                          each node, socket or core to oversubscribe each  re‐
5247                          source  four ways.  Recommended only for systems us‐
5248                          ing PreemptMode=suspend,gang.
5249
5250                          NOTE: OverSubscribe=FORCE:1 is a special  case  that
5251                          is not exactly equivalent to OverSubscribe=NO. Over‐
5252                          Subscribe=FORCE:1 disables the regular oversubscrip‐
5253                          tion  of resources in the same partition but it will
5254                          still allow oversubscription due to preemption. Set‐
5255                          ting  OverSubscribe=NO will prevent oversubscription
5256                          from happening due to preemption as well.
5257
5258                          NOTE: If using PreemptType=preempt/qos you can spec‐
5259                          ify  a  value  for FORCE that is greater than 1. For
5260                          example, OverSubscribe=FORCE:2 will permit two  jobs
5261                          per  resource  normally,  but  a  third  job  can be
5262                          started only if done  so  through  preemption  based
5263                          upon QOS.
5264
5265                          NOTE: If OverSubscribe is configured to FORCE or YES
5266                          in your slurm.conf and the system is not  configured
5267                          to  use  preemption (PreemptMode=OFF) accounting can
5268                          easily grow to values greater than the  actual  uti‐
5269                          lization.  It  may  be common on such systems to get
5270                          error messages in the slurmdbd log stating: "We have
5271                          more allocated time than is possible."
5272
5273              YES         Makes  all  resources (except GRES) in the partition
5274                          available for sharing upon request by the job.   Re‐
5275                          sources will only be over-subscribed when explicitly
5276                          requested by the user  using  the  "--oversubscribe"
5277                          option  on  job  submission.  May be followed with a
5278                          colon and maximum number of jobs in running or  sus‐
5279                          pended state.  For example "OverSubscribe=YES:4" en‐
5280                          ables each node, socket or core  to  execute  up  to
5281                          four  jobs  at  once.   Recommended only for systems
5282                          running  with  gang   scheduling   (PreemptMode=sus‐
5283                          pend,gang).
5284
5285              NO          Selected resources are allocated to a single job. No
5286                          resource will be allocated to more than one job.
5287
5288                          NOTE:  Even  if  you  are   using   PreemptMode=sus‐
5289                          pend,gang,  setting  OverSubscribe=NO  will  disable
5290                          preemption   on   that   partition.   Use   OverSub‐
5291                          scribe=FORCE:1  if  you want to disable normal over‐
5292                          subscription but still allow suspension due to  pre‐
5293                          emption.
5294
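                  As a sketch, gang scheduling with up to four time-sliced jobs
                  per resource could be configured as follows (PreemptMode here
                  is the cluster-wide setting; names are illustrative):

                  PreemptMode=suspend,gang
                  PartitionName=shared Nodes=tux[01-32] OverSubscribe=FORCE:4
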
5295       OverTimeLimit
5296              Number  of  minutes by which a job can exceed its time limit be‐
5297              fore being canceled.  Normally a job's time limit is treated  as
5298              a  hard  limit  and  the  job  will be killed upon reaching that
5299              limit.  Configuring OverTimeLimit will result in the job's  time
5300              limit being treated like a soft limit.  Adding the OverTimeLimit
5301              value to the soft time limit provides  a  hard  time  limit,  at
5302              which  point  the  job is canceled.  This is particularly useful
5303              for backfill scheduling, which is based upon each job's soft time
5304              limit.  If not set, the OverTimeLimit value for the entire clus‐
5305              ter will be used.  May not exceed 65533  minutes.   A  value  of
5306              "UNLIMITED" is also supported.
5307
5308       PartitionName
5309              Name  by  which  the partition may be referenced (e.g. "Interac‐
5310              tive").  This name can be specified  by  users  when  submitting
5311              jobs.   If  the PartitionName is "DEFAULT", the values specified
5312              with that record will apply to subsequent  partition  specifica‐
5313              tions  unless  explicitly  set to other values in that partition
5314              record or replaced with a different set of default values.  Each
5315              line  where  PartitionName  is  "DEFAULT" will replace or add to
5316              previous default values and not reinitialize the default  val‐
5317              ues.
5318
5319       PreemptMode
5320              Mechanism  used  to  preempt  jobs or enable gang scheduling for
5321              this partition when PreemptType=preempt/partition_prio  is  con‐
5322              figured.   This partition-specific PreemptMode configuration pa‐
5323              rameter will override the cluster-wide PreemptMode for this par‐
5324              tition.   It  can  be  set to OFF to disable preemption and gang
5325              scheduling for this partition.  See also  PriorityTier  and  the
5326              above  description of the cluster-wide PreemptMode parameter for
5327              further details.
5328              The GANG option is used to enable gang scheduling independent of
5329              whether  preemption is enabled (i.e. independent of the Preempt‐
5330              Type setting). It can be specified in addition to a  PreemptMode
5331              setting  with  the  two  options  comma separated (e.g. Preempt‐
5332              Mode=SUSPEND,GANG).
5333              See         <https://slurm.schedmd.com/preempt.html>         and
5334              <https://slurm.schedmd.com/gang_scheduling.html>  for  more  de‐
5335              tails.
5336
5337              NOTE: For performance reasons, the backfill  scheduler  reserves
5338              whole  nodes  for  jobs,  not  partial nodes. If during backfill
5339              scheduling a job preempts one or  more  other  jobs,  the  whole
5340              nodes  for  those  preempted jobs are reserved for the preemptor
5341              job, even if the preemptor job requested  fewer  resources  than
5342              that.   These reserved nodes aren't available to other jobs dur‐
5343              ing that backfill cycle, even if the other jobs could fit on the
5344              nodes.  Therefore, jobs may preempt more resources during a sin‐
5345              gle backfill iteration than they requested.
5346              NOTE: For a heterogeneous job to be considered for preemption all
5347              components must be eligible for preemption. When a heterogeneous
5348              job is to be preempted the first identified component of the job
5349              with  the highest order PreemptMode (SUSPEND (highest), REQUEUE,
5350              CANCEL (lowest)) will be used to set  the  PreemptMode  for  all
5351              components.  The GraceTime and user warning signal for each com‐
5352              ponent of the heterogeneous job  remain  unique.   Heterogeneous
5353              jobs are excluded from GANG scheduling operations.
5354
5355              OFF         Is the default value and disables job preemption and
5356                          gang scheduling.  It is only  compatible  with  Pre‐
5357                          emptType=preempt/none  at  a global level.  A common
5358                          use case for this parameter is to set it on a parti‐
5359                          tion to disable preemption for that partition.
5360
5361              CANCEL      The preempted job will be cancelled.
5362
5363              GANG        Enables  gang  scheduling  (time slicing) of jobs in
5364                          the same partition, and allows the resuming of  sus‐
5365                          pended jobs.
5366
5367                          NOTE: Gang scheduling is performed independently for
5368                          each partition, so if you only want time-slicing  by
5369                          OverSubscribe,  without any preemption, then config‐
5370                          uring partitions with overlapping nodes is not  rec‐
5371                          ommended.   On  the  other  hand, if you want to use
5372                          PreemptType=preempt/partition_prio  to  allow   jobs
5373                          from  higher PriorityTier partitions to Suspend jobs
5374                          from lower PriorityTier  partitions  you  will  need
5375                          overlapping partitions, and PreemptMode=SUSPEND,GANG
5376                          to use the Gang scheduler to  resume  the  suspended
5377                          job(s).  In any case, time-slicing won't happen be‐
5378                          tween jobs on different partitions.
5379                          NOTE: Heterogeneous  jobs  are  excluded  from  GANG
5380                          scheduling operations.
5381
5382              REQUEUE     Preempts  jobs  by  requeuing  them (if possible) or
5383                          canceling them.  For jobs to be requeued  they  must
5384                          have  the --requeue sbatch option set or the cluster
5385                          wide JobRequeue parameter in slurm.conf must be  set
5386                          to 1.
5387
5388              SUSPEND     The  preempted jobs will be suspended, and later the
5389                          Gang scheduler will resume them. Therefore the  SUS‐
5390                          PEND preemption mode always needs the GANG option to
5391                          be specified at the cluster level. Also, because the
5392                          suspended  jobs  will  still use memory on the allo‐
5393                          cated nodes, Slurm needs to be able to track  memory
5394                          resources to be able to suspend jobs.
5395
5396                          If  the  preemptees  and  preemptor are on different
5397                          partitions then the preempted jobs will remain  sus‐
5398                          pended until the preemptor ends.
5399                          NOTE:  Because gang scheduling is performed indepen‐
5400                          dently for each partition, if using PreemptType=pre‐
5401                          empt/partition_prio then jobs in higher PriorityTier
5402                          partitions will suspend jobs in  lower  PriorityTier
5403                          partitions  to  run  on the released resources. Only
5404                          when the preemptor job ends will the suspended  jobs
5405                          be resumed by the Gang scheduler.
5406                          NOTE:  Suspended  jobs will not release GRES. Higher
5407                          priority jobs will not be able to  preempt  to  gain
5408                          access to GRES.
5409
5410       PriorityJobFactor
5411              Partition  factor  used by priority/multifactor plugin in calcu‐
5412              lating job priority.  The value may not exceed 65533.  Also  see
5413              PriorityTier.
5414
5415       PriorityTier
5416              Jobs  submitted  to a partition with a higher PriorityTier value
5417              will be evaluated by the scheduler before pending jobs in a par‐
5418              tition  with  a lower PriorityTier value. They will also be con‐
5419              sidered for preemption of  running  jobs  in  partition(s)  with
5420              lower PriorityTier values if PreemptType=preempt/partition_prio.
5421              The value may not exceed 65533.  Also see PriorityJobFactor.
5422
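                  A sketch of tiered preemption, assuming PreemptType=pre‐
                  empt/partition_prio is configured globally and that jobs in
                  the lower tier partition are requeueable (names and values
                  are illustrative):

                  PartitionName=high Nodes=tux[01-32] PriorityTier=10
                  PartitionName=low  Nodes=tux[01-32] PriorityTier=1 PreemptMode=REQUEUE
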
5423       QOS    Used to extend the limits available to a  QOS  on  a  partition.
5424              Jobs will not be associated to this QOS outside of being associ‐
5425              ated to the partition.  They will still be associated  to  their
5426              requested QOS.  By default, no QOS is used.  NOTE: If a limit is
5427              set in both the Partition's QOS and the Job's QOS the  Partition
5428              QOS  will  be  honored  unless the Job's QOS has the OverPartQOS
5429              flag set, in which case the Job's QOS will have priority.
5430
5431       ReqResv
5432              Specifies users of this partition are required  to  designate  a
5433              reservation  when submitting a job. This option can be useful in
5434              restricting usage of a partition that may have higher  priority
5435              or additional resources to jobs running within a reservation.
5436              Possible values are "YES" and "NO".  The default value is "NO".
5437
5438       ResumeTimeout
5439              Maximum time permitted (in seconds) between when a  node  resume
5440              request  is  issued  and when the node is actually available for
5441              use.  Nodes which fail to respond in this  time  frame  will  be
5442              marked  DOWN and the jobs scheduled on the node requeued.  Nodes
5443              which reboot after this time frame will be marked  DOWN  with  a
5444              reason  of  "Node unexpectedly rebooted."  For nodes that are in
5445              multiple partitions with this option set, the highest time  will
5446              take  effect. If not set on any partition, the node will use the
5447              ResumeTimeout value set for the entire cluster.
5448
5449       RootOnly
5450              Specifies if only user ID zero (i.e. user root) may allocate re‐
5451              sources  in this partition. User root may allocate resources for
5452              any other user, but the request must be initiated by user  root.
5453              This  option can be useful for a partition to be managed by some
5454              external entity (e.g. a higher-level job manager)  and  prevents
5455              users  from directly using those resources.  Possible values are
5456              "YES" and "NO".  The default value is "NO".
5457
5458       SelectTypeParameters
5459              Partition-specific resource allocation type.   This  option  re‐
5460              places  the global SelectTypeParameters value.  Supported values
5461              are CR_Core,  CR_Core_Memory,  CR_Socket  and  CR_Socket_Memory.
5462              Use  requires  the system-wide SelectTypeParameters value be set
5463              to any of the four supported values  previously  listed;  other‐
5464              wise, the partition-specific value will be ignored.
5465
5466       Shared The  Shared  configuration  parameter  has  been replaced by the
5467              OverSubscribe parameter described above.
5468
5469       State  State of partition or availability for use.  Possible values are
5470              "UP", "DOWN", "DRAIN" and "INACTIVE". The default value is "UP".
5471              See also the related "Alternate" keyword.
5472
5473              UP        Designates that new jobs may be queued on  the  parti‐
5474                        tion,  and  that  jobs  may be allocated nodes and run
5475                        from the partition.
5476
5477              DOWN      Designates that new jobs may be queued on  the  parti‐
5478                        tion,  but  queued jobs may not be allocated nodes and
5479                        run from the partition. Jobs already  running  on  the
5480                        partition continue to run. The jobs must be explicitly
5481                        canceled to force their termination.
5482
5483              DRAIN     Designates that no new jobs may be queued on the  par‐
5484                        tition (job submission requests will be denied with an
5485                        error message), but jobs already queued on the  parti‐
5486                        tion  may  be  allocated  nodes and run.  See also the
5487                        "Alternate" partition specification.
5488
5489              INACTIVE  Designates that no new jobs may be queued on the  par‐
5490                        tition,  and  jobs already queued may not be allocated
5491                        nodes and run.  See  also  the  "Alternate"  partition
5492                        specification.
5493
5494       SuspendTime
5495              Nodes  which remain idle or down for this number of seconds will
5496              be placed into power save mode  by  SuspendProgram.   For  nodes
5497              that  are in multiple partitions with this option set, the high‐
5498              est time will take effect. If not set on any partition, the node
5499              will use the SuspendTime value set for the entire cluster.  Set‐
5500              ting SuspendTime to anything but "INFINITE"  will  enable  power
5501              save mode.
5502
5503       SuspendTimeout
5504              Maximum  time permitted (in seconds) between when a node suspend
5505              request is issued and when the node is shutdown.  At  that  time
5506              the  node  must  be  ready  for a resume request to be issued as
5507              needed for new work.  For nodes that are in multiple  partitions
5508              with  this option set, the highest time will take effect. If not
5509              set on any partition, the node will use the SuspendTimeout value
5510              set for the entire cluster.
5511
5512       TRESBillingWeights
5513              TRESBillingWeights is used to define the billing weights of each
5514              TRES type that will be used in calculating the usage of  a  job.
5515              The calculated usage is used when calculating fairshare and when
5516              enforcing the TRES billing limit on jobs.
5517
5518              Billing weights are specified as a comma-separated list of <TRES
5519              Type>=<TRES Billing Weight> pairs.
5520
5521              Any  TRES Type is available for billing. Note that the base unit
5522              for memory and burst buffers is megabytes.
5523
5524              By default the billing of TRES is calculated as the sum  of  all
5525              TRES types multiplied by their corresponding billing weight.
5526
5527              The  weighted  amount  of a resource can be adjusted by adding a
5528              suffix of K,M,G,T or P after the billing weight. For example,  a
5529              memory weight of "mem=.25" on a job allocated 8GB will be billed
5530              2048 (8192MB *.25) units. A memory weight of "mem=.25G"  on  the
5531              same job will be billed 2 (8192MB * (.25/1024)) units.
5532
5533              Negative values are allowed.
5534
5535              When  a job is allocated 1 CPU and 8 GB of memory on a partition
5536              configured                   with                   TRESBilling‐
5537              Weights="CPU=1.0,Mem=0.25G,GRES/gpu=2.0", the billable TRES will
5538              be: (1*1.0) + (8*0.25) + (0*2.0) = 3.0.
5539
5540              If PriorityFlags=MAX_TRES is configured, the  billable  TRES  is
5541              calculated  as the MAX of individual TRES' on a node (e.g. cpus,
5542              mem, gres) plus the sum of all global TRES' (e.g. licenses). Us‐
5543              ing  the same example above the billable TRES will be MAX(1*1.0,
5544              8*0.25) + (0*2.0) = 2.0.
5545
5546              If TRESBillingWeights is not defined  then  the  job  is  billed
5547              against the total number of allocated CPUs.
5548
5549              NOTE: TRESBillingWeights doesn't affect job priority directly as
5550              it is currently not used for the size of the job.  If  you  want
5551              TRES'  to  play  a  role in the job's priority then refer to the
5552              PriorityWeightTRES option.
5553
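                  The weights used in the example above would be configured as
                  follows (the partition and node names are illustrative):

                  PartitionName=batch Nodes=tux[01-64] TRESBillingWeights="CPU=1.0,Mem=0.25G,GRES/gpu=2.0"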

PROLOG AND EPILOG SCRIPTS

5555       There are a variety of prolog and epilog program options  that  execute
5556       with  various  permissions and at various times.  The four options most
5557       likely to be used are: Prolog and Epilog (executed once on each compute
5558       node  for  each job) plus PrologSlurmctld and EpilogSlurmctld (executed
5559       once on the ControlMachine for each job).
5560
5561       NOTE: Standard output and error messages are  normally  not  preserved.
5562       Explicitly  write  output and error messages to an appropriate location
5563       if you wish to preserve that information.
5564
5565       NOTE:  By default the Prolog script is ONLY run on any individual  node
5566       when  it  first  sees a job step from a new allocation. It does not run
5567       the Prolog immediately when an allocation is granted.  If no job  steps
5568       from  an allocation are run on a node, it will never run the Prolog for
5569       that allocation.  This Prolog behaviour can  be  changed  by  the  Pro‐
5570       logFlags  parameter.  The Epilog, on the other hand, always runs on ev‐
5571       ery node of an allocation when the allocation is released.
5572
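       For example, assuming the Alloc flag (which requests that the Prolog be
       launched when the allocation is first granted rather than at the first
       job step), a site could set the following in slurm.conf; see the
       PrologFlags parameter for the complete list of flags:

       PrologFlags=Alloc
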
5573       If the Epilog fails (returns a non-zero exit code), this will result in
5574       the node being set to a DRAIN state.  If the EpilogSlurmctld fails (re‐
5575       turns a non-zero exit code), this will only be logged.  If  the  Prolog
5576       fails  (returns a non-zero exit code), this will result in the node be‐
5577       ing set to a DRAIN state and the job being requeued in a held state un‐
5578       less  nohold_on_prolog_fail  is  configured in SchedulerParameters.  If
5579       the PrologSlurmctld fails (returns a non-zero exit code), this will re‐
5580       sult in the job being requeued to be executed on another node if possi‐
5581       ble. Only batch jobs can be requeued.   Interactive  jobs  (salloc  and
5582       srun)  will be cancelled if the PrologSlurmctld fails.  If slurmctld is
5583       stopped while either PrologSlurmctld or EpilogSlurmctld is running, the
5584       script will be killed with SIGKILL. The script will restart when slurm‐
5585       ctld restarts.
5586
5587
5588       Information about the job is passed to  the  script  using  environment
5589       variables.  Unless otherwise specified, these environment variables are
5590       available in each of the scripts mentioned above (Prolog, Epilog,  Pro‐
5591       logSlurmctld and EpilogSlurmctld). For a full list of environment vari‐
5592       ables that includes those  available  in  the  SrunProlog,  SrunEpilog,
5593       TaskProlog  and  TaskEpilog  please  see  the  Prolog  and Epilog Guide
5594       <https://slurm.schedmd.com/prolog_epilog.html>.
5595
5596
5597       SLURM_ARRAY_JOB_ID
5598              If this job is part of a job array, this will be set to the  job
5599              ID.   Otherwise  it will not be set.  To reference this specific
5600              task of a job array, combine SLURM_ARRAY_JOB_ID  with  SLURM_AR‐
5601              RAY_TASK_ID      (e.g.      "scontrol     update     ${SLURM_AR‐
5602              RAY_JOB_ID}_${SLURM_ARRAY_TASK_ID}  ...");  Available  in   Pro‐
5603              logSlurmctld and EpilogSlurmctld.
5604
5605       SLURM_ARRAY_TASK_ID
5606              If this job is part of a job array, this will be set to the task
5607              ID.  Otherwise it will not be set.  To reference  this  specific
5608              task  of  a job array, combine SLURM_ARRAY_JOB_ID with SLURM_AR‐
5609              RAY_TASK_ID     (e.g.     "scontrol      update      ${SLURM_AR‐
5610              RAY_JOB_ID}_${SLURM_ARRAY_TASK_ID}   ...");  Available  in  Pro‐
5611              logSlurmctld and EpilogSlurmctld.
5612
5613       SLURM_ARRAY_TASK_MAX
5614              If this job is part of a job array, this will be set to the max‐
5615              imum  task ID.  Otherwise it will not be set.  Available in Pro‐
5616              logSlurmctld and EpilogSlurmctld.
5617
5618       SLURM_ARRAY_TASK_MIN
5619              If this job is part of a job array, this will be set to the min‐
5620              imum  task ID.  Otherwise it will not be set.  Available in Pro‐
5621              logSlurmctld and EpilogSlurmctld.
5622
5623       SLURM_ARRAY_TASK_STEP
5624              If this job is part of a job array, this will be set to the step
5625              size  of  task IDs.  Otherwise it will not be set.  Available in
5626              PrologSlurmctld and EpilogSlurmctld.
5627
5628       SLURM_CLUSTER_NAME
5629              Name of the cluster executing the job.
5630
5631       SLURM_CONF
5632              Location of the slurm.conf file. Available in Prolog and Epilog.
5633
5634       SLURMD_NODENAME
5635              Name of the node running the task. In the case of a parallel job
5636              executing on multiple compute nodes, the various tasks will have
5637              this environment variable set to different values on  each  com‐
5638              pute node. Available in Prolog and Epilog.
5639
5640       SLURM_JOB_ACCOUNT
5641              Account name used for the job.
5642
5643       SLURM_JOB_COMMENT
5644              Comment added to the job.  Available in Prolog, PrologSlurmctld,
5645              Epilog and EpilogSlurmctld.
5646
5647       SLURM_JOB_CONSTRAINTS
5648              Features required to run the job.   Available  in  Prolog,  Pro‐
5649              logSlurmctld, Epilog and EpilogSlurmctld.
5650
5651       SLURM_JOB_DERIVED_EC
5652              The  highest  exit  code  of all of the job steps.  Available in
5653              Epilog and EpilogSlurmctld.
5654
5655       SLURM_JOB_EXIT_CODE
5656              The exit code of the job script (or salloc). The  value  is  the
5657              status  as  returned  by  the  wait()  system call (See wait(2)).
5658              Available in Epilog and EpilogSlurmctld.
5659
5660       SLURM_JOB_EXIT_CODE2
5661              The exit code of the job script (or salloc). The value  has  the
5662              format  <exit>:<sig>.  The  first number is the exit code, typi‐
5663              cally as set by the exit() function. The second number  is  the
5664              signal that caused the process to terminate if it was terminated
5665              by a signal.  Available in Epilog and EpilogSlurmctld.
5666
5667       SLURM_JOB_GID
5668              Group ID of the job's owner.
5669
5670       SLURM_JOB_GPUS
5671              The GPU IDs of GPUs in the job allocation (if  any).   Available
5672              in the Prolog and Epilog.
5673
5674       SLURM_JOB_GROUP
5675              Group name of the job's owner.  Available in PrologSlurmctld and
5676              EpilogSlurmctld.
5677
5678       SLURM_JOB_ID
5679              Job ID.
5680
5681       SLURM_JOBID
5682              Job ID.
5683
5684       SLURM_JOB_NAME
5685              Name of the job.  Available in PrologSlurmctld and  EpilogSlurm‐
5686              ctld.
5687
5688       SLURM_JOB_NODELIST
5689              Nodes  assigned  to job. A Slurm hostlist expression.  "scontrol
5690              show hostnames" can be used to convert this to a list  of  indi‐
5691              vidual  host  names.   Available  in  PrologSlurmctld  and  Epi‐
5692              logSlurmctld.
5693
5694       SLURM_JOB_PARTITION
5695              Partition that job runs in.  Available in  Prolog,  PrologSlurm‐
5696              ctld, Epilog and EpilogSlurmctld.
5697
5698       SLURM_JOB_UID
5699              User ID of the job's owner.
5700
5701       SLURM_JOB_USER
5702              User name of the job's owner.
5703
5704       SLURM_SCRIPT_CONTEXT
5705              Identifies which epilog or prolog program is currently running.
5706
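       As a sketch only, an Epilog script might combine several of the  vari‐
       ables above to clean up per-job scratch space and record the  outcome.
       The scratch path and log file below are placeholders, not  Slurm  de‐
       faults:

       #!/bin/bash
       # Hypothetical Epilog sketch: remove per-job scratch space, log the result.
       SCRATCH="/scratch/${SLURM_JOB_USER}/${SLURM_JOB_ID}"      # placeholder path
       [ -d "$SCRATCH" ] && rm -rf "$SCRATCH"
       echo "$(date) job=${SLURM_JOB_ID} exit=${SLURM_JOB_EXIT_CODE:-unknown}" \
            >> /var/log/slurm/epilog.log                         # placeholder log
       # Always exit 0 here: a non-zero exit code drains the node (see above).
       exit 0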

UNKILLABLE STEP PROGRAM SCRIPT

5708       This program can be used to take special actions to clean up the unkil‐
5709       lable processes and/or notify system administrators.  The program  will
5710       be run as SlurmdUser (usually "root") on the compute node where Unkill‐
5711       ableStepTimeout was triggered.
5712
5713       Information about the unkillable job step is passed to the script using
5714       environment variables.
5715
5716
5717       SLURM_JOB_ID
5718              Job ID.
5719
5720       SLURM_STEP_ID
5721              Job Step ID.
5722
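       As a sketch only, an UnkillableStepProgram might record the stuck step
       and notify administrators; the use of logger and a local mail  command
       are assumptions about the site, not requirements:

       #!/bin/bash
       # Hypothetical UnkillableStepProgram sketch: log the stuck step, alert admins.
       MSG="unkillable step ${SLURM_JOB_ID}.${SLURM_STEP_ID} on $(hostname)"
       logger -t slurm-unkillable "$MSG"
       # Sending mail assumes an MTA is configured on the compute node.
       echo "$MSG" | mail -s "Slurm unkillable step" root
       exit 0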

NETWORK TOPOLOGY

5724       Slurm  is  able  to  optimize  job allocations to minimize network con‐
5725       tention.  Special Slurm logic is used to optimize allocations  on  sys‐
5726       tems with a three-dimensional interconnect, and information about  con‐
5727       figuring those systems is available at <https://slurm.schedmd.com/>.
5728       For a hierarchical network, Slurm needs
5729       to have detailed information about how nodes are configured on the net‐
5730       work switches.
5731
5732       Given  network topology information, Slurm allocates all of a job's re‐
5733       sources onto a single  leaf  of  the  network  (if  possible)  using  a
5734       best-fit  algorithm.  Otherwise it will allocate a job's resources onto
5735       multiple leaf switches so  as  to  minimize  the  use  of  higher-level
5736       switches.   The  TopologyPlugin parameter controls which plugin is used
5737       to collect network topology information.   The  only  values  presently
5738       supported are "topology/3d_torus" (default for Cray XT/XE systems, per‐
5739       forms best-fit logic over three-dimensional topology),  "topology/none"
5740       (default  for other systems, best-fit logic over one-dimensional topol‐
5741       ogy), "topology/tree" (determine the network topology based upon infor‐
5742       mation  contained  in a topology.conf file, see "man topology.conf" for
5743       more information).  Future plugins may gather topology information  di‐
5744       rectly from the network.  The topology information is optional.  If not
5745       provided, Slurm will perform a best-fit algorithm  assuming  the  nodes
5746       are  in  a  one-dimensional  array as configured and the communications
5747       cost is related to the node distance in this array.
5748
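       As an illustration of the tree plugin, slurm.conf names the plugin and
       a companion topology.conf describes the switch hierarchy.  The  switch
       and node names below are placeholders; see  topology.conf(5)  for  the
       full syntax:

       # slurm.conf
       TopologyPlugin=topology/tree

       # topology.conf
       SwitchName=leaf1 Nodes=tux[0-15]
       SwitchName=leaf2 Nodes=tux[16-31]
       SwitchName=spine Switches=leaf[1-2]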
5749

RELOCATING CONTROLLERS

5751       If the cluster's computers used for the primary  or  backup  controller
5752       will be out of service for an extended period of time, it may be desir‐
5753       able to relocate them.  In order to do so, follow this procedure:
5754
5755       1. Stop the Slurm daemons
5756       2. Modify the slurm.conf file appropriately
5757       3. Distribute the updated slurm.conf file to all nodes
5758       4. Restart the Slurm daemons
5759
5760       There should be no loss of any running or pending  jobs.   Ensure  that
5761       any  nodes  added  to  the cluster have the current slurm.conf file in‐
5762       stalled.
5763
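       On a systemd-based cluster the procedure above might look roughly like
       the following sketch; the host names, node list,  slurm.conf  location
       and the use of pdsh/pdcp are all site-specific placeholders:

       # 1. Stop the daemons.
       systemctl stop slurmctld                     # on the controller host(s)
       pdsh -w node[01-99] systemctl stop slurmd
       # 2. and 3. Update SlurmctldHost in slurm.conf, then distribute it.
       pdcp -w node[01-99] /etc/slurm/slurm.conf /etc/slurm/slurm.conf
       # 4. Restart the daemons.
       systemctl start slurmctld                    # on the new controller host
       pdsh -w node[01-99] systemctl start slurmd
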
5764       CAUTION: If two nodes are simultaneously configured as the primary con‐
5765       troller (two nodes on which SlurmctldHost specifies the local host  and
5766       the slurmctld daemon is executing on each), system behavior will be de‐
5767       structive.  If a compute node has an incorrect SlurmctldHost parameter,
5768       that node may be rendered unusable, but no other harm will result.
5769
5770

EXAMPLE

5772       #
5773       # Sample /etc/slurm.conf for dev[0-25].llnl.gov
5774       # Author: John Doe
5775       # Date: 11/06/2001
5776       #
5777       SlurmctldHost=dev0(12.34.56.78)  # Primary server
5778       SlurmctldHost=dev1(12.34.56.79)  # Backup server
5779       #
5780       AuthType=auth/munge
5781       Epilog=/usr/local/slurm/epilog
5782       Prolog=/usr/local/slurm/prolog
5783       FirstJobId=65536
5784       InactiveLimit=120
5785       JobCompType=jobcomp/filetxt
5786       JobCompLoc=/var/log/slurm/jobcomp
5787       KillWait=30
5788       MaxJobCount=10000
5789       MinJobAge=3600
5790       PluginDir=/usr/local/lib:/usr/local/slurm/lib
5791       ReturnToService=0
5792       SchedulerType=sched/backfill
5793       SlurmctldLogFile=/var/log/slurm/slurmctld.log
5794       SlurmdLogFile=/var/log/slurm/slurmd.log
5795       SlurmctldPort=7002
5796       SlurmdPort=7003
5797       SlurmdSpoolDir=/var/spool/slurmd.spool
5798       StateSaveLocation=/var/spool/slurm.state
5799       SwitchType=switch/none
5800       TmpFS=/tmp
5801       WaitTime=30
5802       #
5803       # Node Configurations
5804       #
5805       NodeName=DEFAULT CPUs=2 RealMemory=2000 TmpDisk=64000
5806       NodeName=DEFAULT State=UNKNOWN
5807       NodeName=dev[0-25] NodeAddr=edev[0-25] Weight=16
5808       # Update records for specific DOWN nodes
5809       DownNodes=dev20 State=DOWN Reason="power,ETA=Dec25"
5810       #
5811       # Partition Configurations
5812       #
5813       PartitionName=DEFAULT MaxTime=30 MaxNodes=10 State=UP
5814       PartitionName=debug Nodes=dev[0-8,18-25] Default=YES
5815       PartitionName=batch Nodes=dev[9-17]  MinNodes=4
5816       PartitionName=long Nodes=dev[9-17] MaxTime=120 AllowGroups=admin
5817
5818

INCLUDE MODIFIERS

5820       The "include" key word can be used with modifiers within the  specified
5821       pathname.  These modifiers would be replaced with cluster name or other
5822       information depending on which modifier is specified. If  the  included
5823       file  is  not  an  absolute  path  name  (i.e. it does not start with a
5824       slash), it will be searched for in the same directory as the slurm.conf
5825       file.
5826
5827
5828       %c     Cluster name specified in the slurm.conf will be used.
5829
5830       EXAMPLE
5831       ClusterName=linux
5832       include /home/slurm/etc/%c_config
5833       # Above line interpreted as
5834       # "include /home/slurm/etc/linux_config"
5835
5836

FILE AND DIRECTORY PERMISSIONS

5838       There  are  three classes of files: Files used by slurmctld must be ac‐
5839       cessible by user SlurmUser and accessible by  the  primary  and  backup
5840       control machines.  Files used by slurmd must be accessible by user root
5841       and accessible from every compute node.  A few files need to be  acces‐
5842       sible by normal users on all login and compute nodes.  While many files
5843       and directories are listed below, most of them will not  be  used  with
5844       most configurations.
5845
5846
5847       Epilog Must  be  executable  by  user root.  It is recommended that the
5848              file be readable by all users.  The file  must  exist  on  every
5849              compute node.
5850
5851       EpilogSlurmctld
5852              Must  be  executable  by user SlurmUser.  It is recommended that
5853              the file be readable by all users.  The file must be  accessible
5854              by the primary and backup control machines.
5855
5856       HealthCheckProgram
5857              Must  be  executable  by  user root.  It is recommended that the
5858              file be readable by all users.  The file  must  exist  on  every
5859              compute node.
5860
5861       JobCompLoc
5862              If this specifies a file, it must be writable by user SlurmUser.
5863              The file must be accessible by the primary  and  backup  control
5864              machines.
5865
5866       MailProg
5867              Must  be  executable by user SlurmUser.  Must not be writable by
5868              regular users.  The file must be accessible by the  primary  and
5869              backup control machines.
5870
5871       Prolog Must  be  executable  by  user root.  It is recommended that the
5872              file be readable by all users.  The file  must  exist  on  every
5873              compute node.
5874
5875       PrologSlurmctld
5876              Must  be  executable  by user SlurmUser.  It is recommended that
5877              the file be readable by all users.  The file must be  accessible
5878              by the primary and backup control machines.
5879
5880       ResumeProgram
5881              Must be executable by user SlurmUser.  The file must be accessi‐
5882              ble by the primary and backup control machines.
5883
5884       slurm.conf
5885              Readable by all users on all nodes.  Must  not  be  writable  by
5886              regular users.
5887
5888       SlurmctldLogFile
5889              Must be writable by user SlurmUser.  The file must be accessible
5890              by the primary and backup control machines.
5891
5892       SlurmctldPidFile
5893              Must be writable by user root.  Preferably writable  and  remov‐
5894              able  by  SlurmUser.  The file must be accessible by the primary
5895              and backup control machines.
5896
5897       SlurmdLogFile
5898              Must be writable by user root.  A distinct file  must  exist  on
5899              each compute node.
5900
5901       SlurmdPidFile
5902              Must  be  writable  by user root.  A distinct file must exist on
5903              each compute node.
5904
5905       SlurmdSpoolDir
5906              Must be writable by user root.  A distinct directory  must  exist
5907              each compute node.
5908
5909       SrunEpilog
5910              Must  be  executable by all users.  The file must exist on every
5911              login and compute node.
5912
5913       SrunProlog
5914              Must be executable by all users.  The file must exist  on  every
5915              login and compute node.
5916
5917       StateSaveLocation
5918              Must be writable by user SlurmUser.  The file must be accessible
5919              by the primary and backup control machines.
5920
5921       SuspendProgram
5922              Must be executable by user SlurmUser.  The file must be accessi‐
5923              ble by the primary and backup control machines.
5924
5925       TaskEpilog
5926              Must  be  executable by all users.  The file must exist on every
5927              compute node.
5928
5929       TaskProlog
5930              Must be executable by all users.  The file must exist  on  every
5931              compute node.
5932
5933       UnkillableStepProgram
5934              Must be executable by user SlurmUser.  The file must be accessi‐
5935              ble by the primary and backup control machines.
5936
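       As a hedged example, the commands below set ownership and modes for  a
       few of the files above, assuming SlurmUser is named  "slurm"  and  the
       paths from the EXAMPLE section; adjust both to the local installation:

       chown slurm:slurm /var/log/slurm/slurmctld.log   # SlurmctldLogFile
       chmod 640 /var/log/slurm/slurmctld.log
       chown -R slurm:slurm /var/spool/slurm.state      # StateSaveLocation
       chmod 700 /var/spool/slurm.state
       chmod 644 /etc/slurm.conf                        # readable by all users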

LOGGING

5938       Note that while Slurm daemons create  log  files  and  other  files  as
5939       needed, they treat the lack of parent  directories  as  a  fatal  error.
5940       This prevents the daemons from running if critical file systems are not
5941       mounted  and  will minimize the risk of cold-starting (starting without
5942       preserving jobs).
5943
5944       Log files and job accounting files may need to be created/owned by  the
5945       "SlurmUser"  uid  to  be  successfully  accessed.   Use the "chown" and
5946       "chmod" commands to set the ownership  and  permissions  appropriately.
5947       See  the  section  FILE AND DIRECTORY PERMISSIONS for information about
5948       the various files and directories used by Slurm.
5949
5950       It is recommended that the logrotate utility be  used  to  ensure  that
5951       various  log  files do not become too large.  This also applies to text
5952       files used for accounting, process tracking, and the  slurmdbd  log  if
5953       they are used.
5954
5955       Here is a sample logrotate configuration. Make appropriate site modifi‐
5956       cations and save as  /etc/logrotate.d/slurm  on  all  nodes.   See  the
5957       logrotate man page for more details.
5958
5959       ##
5960       # Slurm Logrotate Configuration
5961       ##
5962       /var/log/slurm/*.log {
5963            compress
5964            missingok
5965            nocopytruncate
5966            nodelaycompress
5967            nomail
5968            notifempty
5969            noolddir
5970            rotate 5
5971            sharedscripts
5972            size=5M
5973            create 640 slurm root
5974            postrotate
5975                 pkill -x --signal SIGUSR2 slurmctld
5976                 pkill -x --signal SIGUSR2 slurmd
5977                 pkill -x --signal SIGUSR2 slurmdbd
5978                 exit 0
5979            endscript
5980       }
5981
5982

COPYING

5984       Copyright  (C)  2002-2007  The Regents of the University of California.
5985       Produced at Lawrence Livermore National Laboratory (cf, DISCLAIMER).
5986       Copyright (C) 2008-2010 Lawrence Livermore National Security.
5987       Copyright (C) 2010-2022 SchedMD LLC.
5988
5989       This file is part of Slurm, a resource  management  program.   For  de‐
5990       tails, see <https://slurm.schedmd.com/>.
5991
5992       Slurm  is free software; you can redistribute it and/or modify it under
5993       the terms of the GNU General Public License as published  by  the  Free
5994       Software  Foundation;  either version 2 of the License, or (at your op‐
5995       tion) any later version.
5996
5997       Slurm is distributed in the hope that it will be  useful,  but  WITHOUT
5998       ANY  WARRANTY;  without even the implied warranty of MERCHANTABILITY or
5999       FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General  Public  License
6000       for more details.
6001
6002

FILES

6004       /etc/slurm.conf
6005
6006

SEE ALSO

6008       cgroup.conf(5),  getaddrinfo(3),  getrlimit(2), gres.conf(5), group(5),
6009       hostname(1), scontrol(1), slurmctld(8), slurmd(8),  slurmdbd(8),  slur‐
6010       mdbd.conf(5), srun(1), spank(8), syslog(3), topology.conf(5)
6011
6012
6013
6014October 2022               Slurm Configuration File              slurm.conf(5)