slurm.conf(5)                 Slurm Configuration File                 slurm.conf(5)

NAME
       slurm.conf - Slurm configuration file

DESCRIPTION
       slurm.conf is an ASCII file which describes general Slurm configuration information, the nodes to be managed, information about how those nodes are grouped into partitions, and various scheduling parameters associated with those partitions. This file should be consistent across all nodes in the cluster.

       The file location can be modified at execution time by setting the SLURM_CONF environment variable. The Slurm daemons also allow you to override both the built-in and environment-provided location using the "-f" option on the command line.

       The contents of the file are case insensitive except for the names of nodes and partitions. Any text following a "#" in the configuration file is treated as a comment through the end of that line. Changes to the configuration file take effect upon restart of Slurm daemons, daemon receipt of the SIGHUP signal, or execution of the command "scontrol reconfigure" unless otherwise noted.

       If a line begins with the word "Include" followed by whitespace and then a file name, that file will be included inline with the current configuration file. For large or complex systems, multiple configuration files may prove easier to manage and enable reuse of some files (See INCLUDE MODIFIERS for more details).
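
       For example, node and partition definitions can be kept in separate files and pulled into the main configuration (the file names here are hypothetical):

              Include /etc/slurm/nodes.conf
              Include /etc/slurm/partitions.conf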

       Note on file permissions:

       The slurm.conf file must be readable by all users of Slurm, since it is used by many of the Slurm commands. Other files that are defined in the slurm.conf file, such as log files and job accounting files, may need to be created/owned by the user "SlurmUser" to be successfully accessed. Use the "chown" and "chmod" commands to set the ownership and permissions appropriately. See the section FILE AND DIRECTORY PERMISSIONS for information about the various files and directories used by Slurm.

PARAMETERS
       The overall configuration parameters available include:

       AccountingStorageBackupHost
              The name of the backup machine hosting the accounting storage database. If used with the accounting_storage/slurmdbd plugin, this is where the backup slurmdbd would be running. Only used with systems using SlurmDBD, ignored otherwise.

       AccountingStorageEnforce
              This controls what level of association-based enforcement to impose on job submissions. Valid options are any combination of associations, limits, nojobs, nosteps, qos, safe, and wckeys, or all for all things (except nojobs and nosteps, which must be requested as well).

              If limits, qos, or wckeys are set, associations will automatically be set.

              If wckeys is set, TrackWCKey will automatically be set.

              If safe is set, limits and associations will automatically be set.

              If nojobs is set, nosteps will automatically be set.

              By setting associations, no new job is allowed to run unless a corresponding association exists in the system. If limits are enforced, users can be limited by association to whatever job size or run time limits are defined.

              If nojobs is set, Slurm will not account for any jobs or steps on the system. Likewise, if nosteps is set, Slurm will not account for any steps that have run.

              If safe is enforced, a job will only be launched against an association or qos that has a GrpTRESMins limit set if the job will be able to run to completion. Without this option set, jobs will be launched as long as their usage hasn't reached the cpu-minutes limit. This can lead to jobs being launched but then killed when the limit is reached.

              With qos and/or wckeys enforced, jobs will not be scheduled unless a valid qos and/or workload characterization key is specified.

              A restart of slurmctld is required for changes to this parameter to take effect.
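
              As an illustration, a site that wants jobs rejected unless they map to a known association, with limits enforced conservatively, might set:

                     AccountingStorageEnforce=associations,limits,safe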

       AccountingStorageExternalHost
              A comma-separated list of external slurmdbds (<host/ip>[:port][,...]) to register with. If no port is given, the AccountingStoragePort will be used.

              This allows clusters registered with the external slurmdbd to communicate with each other using the --cluster/-M client command options.

              The cluster will add itself to the external slurmdbd if it doesn't exist. If a non-external cluster already exists on the external slurmdbd, the slurmctld will ignore registering to the external slurmdbd.

       AccountingStorageHost
              The name of the machine hosting the accounting storage database. Only used with systems using SlurmDBD, ignored otherwise.

       AccountingStorageParameters
              Comma-separated list of key-value pair parameters. Currently supported values include options to establish a secure connection to the database:

              SSL_CERT
                     The path name of the client public key certificate file.

              SSL_CA
                     The path name of the Certificate Authority (CA) certificate file.

              SSL_CAPATH
                     The path name of the directory that contains trusted SSL CA certificate files.

              SSL_KEY
                     The path name of the client private key file.

              SSL_CIPHER
                     The list of permissible ciphers for SSL encryption.

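              As a sketch, a TLS-protected database connection could be configured as follows (the certificate paths are hypothetical):

                     AccountingStorageParameters=SSL_CERT=/etc/slurm/ssl/client-cert.pem,SSL_KEY=/etc/slurm/ssl/client-key.pem,SSL_CA=/etc/slurm/ssl/ca-cert.pem
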
       AccountingStoragePass
              The password used to gain access to the database to store the accounting data. Only used for database type storage plugins, ignored otherwise. In the case of Slurm DBD (Database Daemon) with MUNGE authentication, this can be configured to use a MUNGE daemon specifically configured to provide authentication between clusters while the default MUNGE daemon provides authentication within a cluster. In that case, AccountingStoragePass should specify the named port to be used for communications with the alternate MUNGE daemon (e.g. "/var/run/munge/global.socket.2"). The default value is NULL.

       AccountingStoragePort
              The listening port of the accounting storage database server. Only used for database type storage plugins, ignored otherwise. The default value is SLURMDBD_PORT as established at system build time. If no value is explicitly specified, it will be set to 6819. This value must be equal to the DbdPort parameter in the slurmdbd.conf file.

       AccountingStorageTRES
              Comma-separated list of resources you wish to track on the cluster. These are the resources requested by the sbatch/srun job when it is submitted. Currently this consists of any GRES, BB (burst buffer) or license along with CPU, Memory, Node, Energy, FS/[Disk|Lustre], IC/OFED, Pages, and VMem. By default Billing, CPU, Energy, Memory, Node, FS/Disk, Pages and VMem are tracked. These default TRES cannot be disabled, but only appended to. AccountingStorageTRES=gres/craynetwork,license/iop1 will track billing, cpu, energy, memory, nodes, fs/disk, pages and vmem along with a gres called craynetwork as well as a license called iop1. Whenever these resources are used on the cluster they are recorded. The TRES are automatically set up in the database on the start of the slurmctld.

              If multiple GRES of different types are tracked (e.g. GPUs of different types), then job requests with matching type specifications will be recorded. Given a configuration of "AccountingStorageTRES=gres/gpu,gres/gpu:tesla,gres/gpu:volta", then "gres/gpu:tesla" and "gres/gpu:volta" will track only jobs that explicitly request those two GPU types, while "gres/gpu" will track allocated GPUs of any type ("tesla", "volta" or any other GPU type).

              Given a configuration of "AccountingStorageTRES=gres/gpu:tesla,gres/gpu:volta", then "gres/gpu:tesla" and "gres/gpu:volta" will track jobs that explicitly request those GPU types. If a job requests GPUs, but does not explicitly specify the GPU type, then its resource allocation will be accounted for as either "gres/gpu:tesla" or "gres/gpu:volta", although the accounting may not match the actual GPU type allocated to the job and the GPUs allocated to the job could be heterogeneous. In an environment containing various GPU types, use of a job_submit plugin may be desired in order to force jobs to explicitly specify some GPU type.

       AccountingStorageType
              The accounting storage mechanism type. Acceptable values at present include "accounting_storage/none" and "accounting_storage/slurmdbd". The "accounting_storage/slurmdbd" value indicates that accounting records will be written to the Slurm DBD, which manages an underlying MySQL database. See "man slurmdbd" for more information. The default value is "accounting_storage/none" and indicates that account records are not maintained.

       AccountingStorageUser
              The user account for accessing the accounting storage database. Only used for database type storage plugins, ignored otherwise.

       AccountingStoreFlags
              Comma-separated list used to tell the slurmctld to store extra fields that may be more heavyweight than the normal job information.

              Current options are:

              job_comment
                     Include the job's comment field in the job complete message sent to the Accounting Storage database. Note the AdminComment and SystemComment are always recorded in the database.

              job_env
                     Include a batch job's environment variables used at job submission in the job start message sent to the Accounting Storage database.

              job_script
                     Include the job's batch script in the job start message sent to the Accounting Storage database.

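              For instance, to archive both the batch script and its submission environment with each job's accounting record:

                     AccountingStoreFlags=job_script,job_env
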
       AcctGatherNodeFreq
              The AcctGather plugins' sampling interval for node accounting. For AcctGather plugin values of none, this parameter is ignored. For all other values, this parameter is the number of seconds between node accounting samples. For the acct_gather_energy/rapl plugin, set a value less than 300 because the counters may overflow beyond this rate. The default value is zero, which disables accounting sampling for nodes. Note: The accounting sampling interval for jobs is determined by the value of JobAcctGatherFrequency.

       AcctGatherEnergyType
              Identifies the plugin to be used for energy consumption accounting. The jobacct_gather plugin and slurmd daemon call this plugin to collect energy consumption data for jobs and nodes. The collection of energy consumption data takes place at the node level, so the measurements will reflect a job's real consumption only in the case of an exclusive job allocation. When nodes are shared between jobs, the reported energy consumed per job (through sstat or sacct) will not reflect the energy actually consumed by the jobs.

              Configurable values at present are:

              acct_gather_energy/none
                     No energy consumption data is collected.

              acct_gather_energy/ipmi
                     Energy consumption data is collected from the Baseboard Management Controller (BMC) using the Intelligent Platform Management Interface (IPMI).

              acct_gather_energy/pm_counters
                     Energy consumption data is collected from the Baseboard Management Controller (BMC) for HPE Cray systems.

              acct_gather_energy/rapl
                     Energy consumption data is collected from hardware sensors using the Running Average Power Limit (RAPL) mechanism. Note that enabling RAPL may require the execution of the command "sudo modprobe msr".

              acct_gather_energy/xcc
                     Energy consumption data is collected from the Lenovo SD650 XClarity Controller (XCC) using IPMI OEM raw commands.

       AcctGatherInterconnectType
              Identifies the plugin to be used for interconnect network traffic accounting. The jobacct_gather plugin and slurmd daemon call this plugin to collect network traffic data for jobs and nodes. The collection of network traffic data takes place at the node level, so the collected values will reflect a job's real traffic only in the case of an exclusive job allocation. When nodes are shared between jobs, the reported network traffic per job (through sstat or sacct) will not reflect the network traffic actually generated by the jobs.

              Configurable values at present are:

              acct_gather_interconnect/none
                     No InfiniBand network data are collected.

              acct_gather_interconnect/ofed
                     InfiniBand network traffic data are collected from the hardware monitoring counters of InfiniBand devices through the OFED library. In order to account for per-job network traffic, add the "ic/ofed" TRES to AccountingStorageTRES.

              acct_gather_interconnect/sysfs
                     Network traffic statistics are collected from the Linux sysfs pseudo-filesystem for specific interfaces defined in acct_gather_interconnect.conf(5). In order to account for per-job network traffic, add the "ic/sysfs" TRES to AccountingStorageTRES.

       AcctGatherFilesystemType
              Identifies the plugin to be used for filesystem traffic accounting. The jobacct_gather plugin and slurmd daemon call this plugin to collect filesystem traffic data for jobs and nodes. The collection of filesystem traffic data takes place at the node level, so the collected values will reflect a job's real traffic only in the case of an exclusive job allocation. When nodes are shared between jobs, the reported filesystem traffic per job (through sstat or sacct) will not reflect the filesystem traffic actually generated by the jobs.

              Configurable values at present are:

              acct_gather_filesystem/none
                     No filesystem data are collected.

              acct_gather_filesystem/lustre
                     Lustre filesystem traffic data are collected from the counters found in /proc/fs/lustre/. In order to account for per-job Lustre traffic, add the "fs/lustre" TRES to AccountingStorageTRES.

       AcctGatherProfileType
              Identifies the plugin to be used for detailed job profiling. The jobacct_gather plugin and slurmd daemon call this plugin to collect detailed data such as I/O counts, memory usage, or energy consumption for jobs and nodes. There are interfaces in this plugin to collect data at step start and completion, task start and completion, and at the account gather frequency. The data collected at the node level is related to jobs only in the case of an exclusive job allocation.

              Configurable values at present are:

              acct_gather_profile/none
                     No profile data is collected.

              acct_gather_profile/hdf5
                     This enables the HDF5 plugin. The directory where the profile files are stored and which values are collected are configured in the acct_gather.conf file.

              acct_gather_profile/influxdb
                     This enables the influxdb plugin. The influxdb instance host, port, database, retention policy and which values are collected are configured in the acct_gather.conf file.

       AllowSpecResourcesUsage
              If set to "YES", Slurm allows individual jobs to override a node's configured CoreSpecCount value. For a job to take advantage of this feature, a command line option of --core-spec must be specified. The default value for this option is "YES" for Cray systems and "NO" for other system types.

       AuthAltTypes
              Comma-separated list of alternative authentication plugins that the slurmctld will permit for communication. Acceptable values at present include auth/jwt.

              NOTE: auth/jwt requires a jwt_hs256.key to be populated in the StateSaveLocation directory for slurmctld only. The jwt_hs256.key should only be visible to the SlurmUser and root. It is not suggested to place the jwt_hs256.key on any nodes but the controller running slurmctld. auth/jwt can be activated by the presence of the SLURM_JWT environment variable. When activated, it will override the default AuthType.

       AuthAltParameters
              Used to define alternative authentication plugins' options. Multiple options may be comma separated.

              disable_token_creation
                     Disable "scontrol token" use by non-SlurmUser accounts.

              max_token_lifespan=<seconds>
                     Set max lifespan (in seconds) for any token generated for user accounts. (This limit does not apply to SlurmUser.)

              jwks=  Absolute path to JWKS file. Only RS256 keys are supported, although other key types may be listed in the file. If set, no HS256 key will be loaded by default (and token generation is disabled), although the jwt_key setting may be used to explicitly re-enable HS256 key use (and token generation).

              jwt_key=
                     Absolute path to JWT key file. Key must be HS256, and should only be accessible by SlurmUser. If not set, the default key file is jwt_hs256.key in StateSaveLocation.
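
              A deployment enabling JWT tokens alongside the default MUNGE authentication might look like this (the key path assumes a StateSaveLocation of /var/spool/slurmctld):

                     AuthType=auth/munge
                     AuthAltTypes=auth/jwt
                     AuthAltParameters=jwt_key=/var/spool/slurmctld/jwt_hs256.key,max_token_lifespan=28800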

       AuthInfo
              Additional information to be used for authentication of communications between the Slurm daemons (slurmctld and slurmd) and the Slurm clients. The interpretation of this option is specific to the configured AuthType. Multiple options may be specified in a comma-delimited list. If not specified, the default authentication information will be used.

              cred_expire
                     Default job step credential lifetime, in seconds (e.g. "cred_expire=1200"). It must be sufficiently long to load the user environment, run the prolog, deal with the slurmd getting paged out of memory, etc. This also controls how long a requeued job must wait before starting again. The default value is 120 seconds.

              socket Path name to a MUNGE daemon socket to use (e.g. "socket=/var/run/munge/munge.socket.2"). The default value is "/var/run/munge/munge.socket.2". Used by auth/munge and cred/munge.

              ttl    Credential lifetime, in seconds (e.g. "ttl=300"). The default value is dependent upon the MUNGE installation, but is typically 300 seconds.
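
              For example, to select an alternate MUNGE socket and a longer credential lifetime (the values are illustrative):

                     AuthInfo=socket=/var/run/munge/munge.socket.2,ttl=600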

       AuthType
              The authentication method for communications between Slurm components. Acceptable values at present include "auth/munge", which is the default. "auth/munge" indicates that MUNGE is to be used. (See "https://dun.github.io/munge/" for more information). All Slurm daemons and commands must be terminated prior to changing the value of AuthType and later restarted.

       BackupAddr
              Deprecated option, see SlurmctldHost.

       BackupController
              Deprecated option, see SlurmctldHost.

              The backup controller recovers state information from the StateSaveLocation directory, which must be readable and writable from both the primary and backup controllers. While not essential, it is recommended that you specify a backup controller. See the RELOCATING CONTROLLERS section if you change this.

       BatchStartTimeout
              The maximum time (in seconds) that a batch job is permitted for launching before being considered missing and releasing the allocation. The default value is 10 (seconds). Larger values may be required if more time is needed to execute the Prolog, load user environment variables, or if the slurmd daemon gets paged from memory.
              Note: The test for a job being successfully launched is only performed when the Slurm daemon on the compute node registers state with the slurmctld daemon on the head node, which happens fairly rarely. Therefore a job will not necessarily be terminated if its start time exceeds BatchStartTimeout. This configuration parameter is also applied to the launching of tasks, to avoid aborting srun commands due to long-running Prolog scripts.

       BcastExclude
              Comma-separated list of absolute directory paths to be excluded when autodetecting and broadcasting executable shared object dependencies through sbcast or srun --bcast. The keyword "none" can be used to indicate that no directory paths should be excluded. The default value is "/lib,/usr/lib,/lib64,/usr/lib64". This option can be overridden by sbcast --exclude and srun --bcast-exclude.

       BcastParameters
              Controls sbcast and srun --bcast behavior. Multiple options can be specified in a comma-separated list. Supported values include:

              DestDir=
                     Destination directory for the file being broadcast to allocated compute nodes. Default value is the current working directory, or --chdir for srun if set.

              Compression=
                     Specify the default file compression library to be used. Supported values are "lz4" and "none". The default value with the sbcast --compress option is "lz4" and "none" otherwise. Some compression libraries may be unavailable on some systems.

              send_libs
                     If set, attempt to autodetect and broadcast the executable's shared object dependencies to allocated compute nodes. The files are placed in a directory alongside the executable. For srun only, the LD_LIBRARY_PATH is automatically updated to include this cache directory as well. This can be overridden with either the sbcast or srun --send-libs option. By default this is disabled.
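
              A site staging executables and their library dependencies to node-local storage with compression might set (the destination path is illustrative):

                     BcastParameters=DestDir=/tmp,Compression=lz4,send_libs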

       BurstBufferType
              The plugin used to manage burst buffers. Acceptable values at present are:

              burst_buffer/datawarp
                     Use Cray DataWarp API to provide burst buffer functionality.

              burst_buffer/lua
                     This plugin provides hooks to an API that is defined by a Lua script. This plugin was developed to provide system administrators with a way to do any task (not only file staging) at different points in a job's life cycle.

              burst_buffer/none

       CliFilterPlugins
              A comma-delimited list of command line interface option filter/modification plugins. The specified plugins will be executed in the order listed. No cli_filter plugins are used by default. Acceptable values at present are:

              cli_filter/lua
                     This plugin allows you to write your own implementation of a cli_filter using Lua.

              cli_filter/syslog
                     This plugin enables logging of the job submission activities performed. All the salloc/sbatch/srun options are logged to syslog together with environment variables in JSON format. If the plugin is not the last one in the list, it may log values different than what was actually sent to slurmctld.

              cli_filter/user_defaults
                     This plugin looks for the file $HOME/.slurm/defaults and reads every line of it as a key=value pair, where key is any of the job submission options available to salloc/sbatch/srun and value is a default value defined by the user. For instance:
                     time=1:30
                     mem=2048
                     The above will result in a user defined default for each of their jobs of "-t 1:30" and "--mem=2048".

       ClusterName
              The name by which this Slurm managed cluster is known in the accounting database. This is needed to distinguish accounting records when multiple clusters report to the same database. Because of limitations in some databases, any upper case letters in the name will be silently mapped to lower case. In order to avoid confusion, it is recommended that the name be lower case. The cluster name must be 40 characters or less in order to comply with the limit on the maximum length for table names in MySQL/MariaDB.

       CommunicationParameters
              Comma-separated list of communication options.

              block_null_hash
                     Require all Slurm authentication tokens to include a newer (20.11.9 and 21.08.8) payload that provides an additional layer of security against credential replay attacks. This option should only be enabled once all Slurm daemons have been upgraded to 20.11.9/21.08.8 or newer, and all jobs that were started before the upgrade have been completed.

              CheckGhalQuiesce
                     Used specifically on a Cray using an Aries GHAL interconnect. This will check to see if the system is quiescing when sending a message, and if so, wait until it is done before sending.

              DisableIPv4
                     Disable IPv4-only operation for all slurm daemons (except slurmdbd). This should also be set in your slurmdbd.conf file.

              EnableIPv6
                     Enable using IPv6 addresses for all slurm daemons (except slurmdbd). When using both IPv4 and IPv6, address family preferences will be based on your /etc/gai.conf file. This should also be set in your slurmdbd.conf file.

              keepaliveinterval=#
                     Specifies the interval between keepalive probes on the socket communications between srun and its slurmstepd process.

              keepaliveprobes=#
                     Specifies the number of keepalive probes sent on the socket communications between the srun command and its slurmstepd process before the connection is considered broken.

              keepalivetime=#
                     Specifies how long socket communications used between the srun command and its slurmstepd process are kept alive after disconnect. Longer values can be used to improve reliability of communications in the event of network failures.

              NoAddrCache
                     By default, Slurm will cache a node's network address after successfully establishing the node's network address. This option disables the cache and Slurm will look up the node's network address each time a connection is made. This is useful, for example, in a cloud environment where the node addresses come and go out of DNS.

              NoCtldInAddrAny
                     Used to directly bind to the address that the node running the slurmctld resolves to, instead of binding messages to any address on the node, which is the default.

              NoInAddrAny
                     Used to directly bind to the address that the node resolves to, instead of binding messages to any address on the node, which is the default. This option is for all daemons/clients except for the slurmctld.
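
              For example, a cloud deployment with IPv6 nodes whose addresses come and go in DNS might use:

                     CommunicationParameters=EnableIPv6,NoAddrCache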

       CompleteWait
              The time to wait, in seconds, when any job is in the COMPLETING state before any additional jobs are scheduled. This is to attempt to keep jobs on nodes that were recently in use, with the goal of preventing fragmentation. If set to zero, pending jobs will be started as soon as possible. Since a COMPLETING job's resources are released for use by other jobs as soon as the Epilog completes on each individual node, this can result in very fragmented resource allocations. To provide jobs with the minimum response time, a value of zero is recommended (no waiting). To minimize fragmentation of resources, a value equal to KillWait plus two is recommended. In that case, setting KillWait to a small value may be beneficial. The default value of CompleteWait is zero seconds. The value may not exceed 65533.

              NOTE: Setting reduce_completing_frag affects the behavior of CompleteWait.

       ControlAddr
              Deprecated option, see SlurmctldHost.

       ControlMachine
              Deprecated option, see SlurmctldHost.

       CoreSpecPlugin
              Identifies the plugin to be used for enforcement of core specialization. A restart of the slurmd daemons is required for changes to this parameter to take effect. Acceptable values at present include:

              core_spec/cray_aries
                     used only for Cray systems

              core_spec/none
                     used for all other system types

       CpuFreqDef
              Default CPU frequency value or frequency governor to use when running a job step if it has not been explicitly set with the --cpu-freq option. Acceptable values at present include a numeric value (frequency in kilohertz) or one of the following governors:

              Conservative attempts to use the Conservative CPU governor

              OnDemand     attempts to use the OnDemand CPU governor

              Performance  attempts to use the Performance CPU governor

              PowerSave    attempts to use the PowerSave CPU governor

              There is no default value. If unset, no attempt to set the governor is made if the --cpu-freq option has not been set.

       CpuFreqGovernors
              List of CPU frequency governors allowed to be set with the salloc, sbatch, or srun option --cpu-freq. Acceptable values at present include:

              Conservative attempts to use the Conservative CPU governor

              OnDemand     attempts to use the OnDemand CPU governor (a default value)

              Performance  attempts to use the Performance CPU governor (a default value)

              PowerSave    attempts to use the PowerSave CPU governor

              SchedUtil    attempts to use the SchedUtil CPU governor

              UserSpace    attempts to use the UserSpace CPU governor (a default value)

              The default is OnDemand, Performance and UserSpace.
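
              As an illustration, a site defaulting steps to the Performance governor while still allowing users to choose a power-saving one might set:

                     CpuFreqDef=Performance
                     CpuFreqGovernors=OnDemand,Performance,PowerSave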

       CredType
              The cryptographic signature tool to be used in the creation of job step credentials. A restart of slurmctld is required for changes to this parameter to take effect. The default (and recommended) value is "cred/munge".
703
704 DebugFlags
705 Defines specific subsystems which should provide more detailed
706 event logging. Multiple subsystems can be specified with comma
707 separators. Most DebugFlags will result in verbose-level log‐
708 ging for the identified subsystems, and could impact perfor‐
709 mance.
710
711 Valid subsystems available include:
712
713 Accrue Accrue counters accounting details
714
715 Agent RPC agents (outgoing RPCs from Slurm daemons)
716
717 Backfill Backfill scheduler details
718
719 BackfillMap Backfill scheduler to log a very verbose map of
720 reserved resources through time. Combine with
721 Backfill for a verbose and complete view of the
722 backfill scheduler's work.
723
724 BurstBuffer Burst Buffer plugin
725
726 Cgroup Cgroup details
727
728 CPU_Bind CPU binding details for jobs and steps
729
730 CpuFrequency Cpu frequency details for jobs and steps using
731 the --cpu-freq option.
732
733 Data Generic data structure details.
734
735 Dependency Job dependency debug info
736
737 Elasticsearch Elasticsearch debug info
738
739 Energy AcctGatherEnergy debug info
740
741 ExtSensors External Sensors debug info
742
743 Federation Federation scheduling debug info
744
745 FrontEnd Front end node details
746
747 Gres Generic resource details
748
749 Hetjob Heterogeneous job details
750
751 Gang Gang scheduling details
752
753 JobAccountGather Common job account gathering details (not
754 plugin specific).
755
756 JobContainer Job container plugin details
757
758 License License management details
759
760 Network Network details. Warning: activating this flag
761 may cause logging of passwords, tokens or other
762 authentication credentials.
763
764 NetworkRaw Dump raw hex values of key Network communica‐
765 tions. Warning: This flag will cause very ver‐
766 bose logs and may cause logging of passwords,
767 tokens or other authentication credentials.
768
769 NodeFeatures Node Features plugin debug info
770
771 NO_CONF_HASH Do not log when the slurm.conf files differ be‐
772 tween Slurm daemons
773
774 Power Power management plugin and power save (sus‐
775 pend/resume programs) details
776
777 Priority Job prioritization
778
779 Profile AcctGatherProfile plugins details
780
781 Protocol Communication protocol details
782
783 Reservation Advanced reservations
784
785 Route Message forwarding debug info
786
787 Script Debug info regarding the process that runs
788 slurmctld scripts such as PrologSlurmctld and
789 EpilogSlurmctld
790
791 SelectType Resource selection plugin
792
793 Steps Slurmctld resource allocation for job steps
794
795 Switch Switch plugin
796
797 TimeCray Timing of Cray APIs
798
799 TraceJobs Trace jobs in slurmctld. It will print detailed
800                        job information including state, job ids and
801                        allocated node counts.
802
803 Triggers Slurmctld triggers
804
805 WorkQueue Work Queue details
806
807 DefCpuPerGPU
808 Default count of CPUs allocated per allocated GPU. This value is
809 used only if the job didn't specify --cpus-per-task and
810 --cpus-per-gpu.
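811
812              As a sketch, a site that wants each allocated GPU to carry two
813              CPUs by default (the count of 2 is purely illustrative) might
814              set:
815
816                   DefCpuPerGPU=2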
811
812 DefMemPerCPU
813 Default real memory size available per usable allocated CPU in
814 megabytes. Used to avoid over-subscribing memory and causing
815 paging. DefMemPerCPU would generally be used if individual pro‐
816 cessors are allocated to jobs (SelectType=select/cons_res or Se‐
817 lectType=select/cons_tres). The default value is 0 (unlimited).
818 Also see DefMemPerGPU, DefMemPerNode and MaxMemPerCPU. DefMem‐
819 PerCPU, DefMemPerGPU and DefMemPerNode are mutually exclusive.
820
821
822 NOTE: This applies to usable allocated CPUs in a job allocation.
823 This is important when more than one thread per core is config‐
824 ured. If a job requests --threads-per-core with fewer threads
825 on a core than exist on the core (or --hint=nomultithread which
826 implies --threads-per-core=1), the job will be unable to use
827 those extra threads on the core and those threads will not be
828 included in the memory per CPU calculation. But if the job has
829 access to all threads on the core, those threads will be in‐
830 cluded in the memory per CPU calculation even if the job did not
831 explicitly request those threads.
832
833 In the following examples, each core has two threads.
834
835 In this first example, two tasks can run on separate hyper‐
836 threads in the same core because --threads-per-core is not used.
837 The third task uses both threads of the second core. The allo‐
838 cated memory per cpu includes all threads:
839
840 $ salloc -n3 --mem-per-cpu=100
841 salloc: Granted job allocation 17199
842 $ sacct -j $SLURM_JOB_ID -X -o jobid%7,reqtres%35,alloctres%35
843 JobID ReqTRES AllocTRES
844 ------- ----------------------------------- -----------------------------------
845 17199 billing=3,cpu=3,mem=300M,node=1 billing=4,cpu=4,mem=400M,node=1
846
847 In this second example, because of --threads-per-core=1, each
848 task is allocated an entire core but is only able to use one
849 thread per core. Allocated CPUs includes all threads on each
850 core. However, allocated memory per cpu includes only the usable
851 thread in each core.
852
853 $ salloc -n3 --mem-per-cpu=100 --threads-per-core=1
854 salloc: Granted job allocation 17200
855 $ sacct -j $SLURM_JOB_ID -X -o jobid%7,reqtres%35,alloctres%35
856 JobID ReqTRES AllocTRES
857 ------- ----------------------------------- -----------------------------------
858 17200 billing=3,cpu=3,mem=300M,node=1 billing=6,cpu=6,mem=300M,node=1
859
860 DefMemPerGPU
861 Default real memory size available per allocated GPU in
862 megabytes. The default value is 0 (unlimited). Also see
863 DefMemPerCPU and DefMemPerNode. DefMemPerCPU, DefMemPerGPU and
864 DefMemPerNode are mutually exclusive.
865
866 DefMemPerNode
867 Default real memory size available per allocated node in
868 megabytes. Used to avoid over-subscribing memory and causing
869 paging. DefMemPerNode would generally be used if whole nodes
870 are allocated to jobs (SelectType=select/linear) and resources
871 are over-subscribed (OverSubscribe=yes or OverSubscribe=force).
872 The default value is 0 (unlimited). Also see DefMemPerCPU,
873 DefMemPerGPU and MaxMemPerCPU. DefMemPerCPU, DefMemPerGPU and
874 DefMemPerNode are mutually exclusive.
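875
876              Because the three memory defaults are mutually exclusive, a
877              configuration sets at most one of them. A sketch for a
878              whole-node allocation setup (the 64000 MB figure is
879              illustrative):
880
881                   SelectType=select/linear
882                   DefMemPerNode=64000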
875
876 DependencyParameters
877 Multiple options may be comma separated.
878
879 disable_remote_singleton
880 By default, when a federated job has a singleton depen‐
881 dency, each cluster in the federation must clear the sin‐
882 gleton dependency before the job's singleton dependency
883 is considered satisfied. Enabling this option means that
884 only the origin cluster must clear the singleton depen‐
885 dency. This option must be set in every cluster in the
886 federation.
887
888 kill_invalid_depend
889                      If a job has an invalid dependency that can never be
890                      satisfied, terminate it and set its state to
891                      JOB_CANCELLED. By default the job stays pending with
892                      reason DependencyNeverSatisfied.
893
894 max_depend_depth=#
895 Maximum number of jobs to test for a circular job depen‐
896 dency. Stop testing after this number of job dependencies
897 have been tested. The default value is 10 jobs.
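898
899              As a sketch, a site that wants invalid dependencies cancelled
900              and a deeper circular-dependency search (the depth of 20 is
901              illustrative) could combine the options:
902
903                   DependencyParameters=kill_invalid_depend,max_depend_depth=20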
898
899 DisableRootJobs
900 If set to "YES" then user root will be prevented from running
901 any jobs. The default value is "NO", meaning user root will be
902 able to execute jobs. DisableRootJobs may also be set by parti‐
903 tion.
904
905 EioTimeout
906 The number of seconds srun waits for slurmstepd to close the
907 TCP/IP connection used to relay data between the user applica‐
908 tion and srun when the user application terminates. The default
909 value is 60 seconds. May not exceed 65533.
910
911 EnforcePartLimits
912 If set to "ALL" then jobs which exceed a partition's size and/or
913              time limits will be rejected at submission time. If a job is sub‐
914              mitted to multiple partitions, the job must satisfy the limits
915 on all the requested partitions. If set to "NO" then the job
916 will be accepted and remain queued until the partition limits
917              are altered (Time and Node Limits). If set to "ANY" a job must
918 satisfy any of the requested partitions to be submitted. The de‐
919 fault value is "NO". NOTE: If set, then a job's QOS can not be
920 used to exceed partition limits. NOTE: The partition limits be‐
921 ing considered are its configured MaxMemPerCPU, MaxMemPerNode,
922 MinNodes, MaxNodes, MaxTime, AllocNodes, AllowAccounts, Allow‐
923 Groups, AllowQOS, and QOS usage threshold.
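924
925              For instance, to reject over-limit submissions against every
926              requested partition:
927
928                   EnforcePartLimits=ALL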
924
925 Epilog Fully qualified pathname of a script to execute as user root on
926 every node when a user's job completes (e.g. "/usr/lo‐
927 cal/slurm/epilog"). A glob pattern (See glob (7)) may also be
928 used to run more than one epilog script (e.g. "/etc/slurm/epi‐
929 log.d/*"). The Epilog script or scripts may be used to purge
930 files, disable user login, etc. By default there is no epilog.
931 See Prolog and Epilog Scripts for more information.
932
933 EpilogMsgTime
934 The number of microseconds that the slurmctld daemon requires to
935 process an epilog completion message from the slurmd daemons.
936 This parameter can be used to prevent a burst of epilog comple‐
937 tion messages from being sent at the same time which should help
938 prevent lost messages and improve throughput for large jobs.
939 The default value is 2000 microseconds. For a 1000 node job,
940 this spreads the epilog completion messages out over two sec‐
941 onds.
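942
943              The spread can be checked with shell arithmetic; at the default
944              2000 microseconds per message, a 1000 node job's epilog
945              completion messages span two seconds:

```shell
# EpilogMsgTime spreads epilog completion messages; the total spread in
# seconds is the per-message interval (microseconds) * node count / 1e6.
echo $(( 2000 * 1000 / 1000000 ))   # prints 2
```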
942
943 EpilogSlurmctld
944 Fully qualified pathname of a program for the slurmctld to exe‐
945 cute upon termination of a job allocation (e.g. "/usr/lo‐
946 cal/slurm/epilog_controller"). The program executes as Slur‐
947 mUser, which gives it permission to drain nodes and requeue the
948 job if a failure occurs (See scontrol(1)). Exactly what the
949 program does and how it accomplishes this is completely at the
950 discretion of the system administrator. Information about the
951 job being initiated, its allocated nodes, etc. are passed to the
952 program using environment variables. See Prolog and Epilog
953 Scripts for more information.
954
955 ExtSensorsFreq
956 The external sensors plugin sampling interval. If ExtSen‐
957 sorsType=ext_sensors/none, this parameter is ignored. For all
958 other values of ExtSensorsType, this parameter is the number of
959 seconds between external sensors samples for hardware components
960              (nodes, switches, etc.). The default value is zero, which
961              disables external sensors sampling. Note: This parameter does
962 not affect external sensors data collection for jobs/steps.
963
964 ExtSensorsType
965 Identifies the plugin to be used for external sensors data col‐
966 lection. Slurmctld calls this plugin to collect external sen‐
967 sors data for jobs/steps and hardware components. In case of
968 node sharing between jobs the reported values per job/step
969 (through sstat or sacct) may not be accurate. See also "man
970 ext_sensors.conf".
971
972 Configurable values at present are:
973
974 ext_sensors/none No external sensors data is collected.
975
976 ext_sensors/rrd External sensors data is collected from the
977 RRD database.
978
979 FairShareDampeningFactor
980 Dampen the effect of exceeding a user or group's fair share of
981              allocated resources. Higher values provide greater ability
982 to differentiate between exceeding the fair share at high levels
983 (e.g. a value of 1 results in almost no difference between over‐
984 consumption by a factor of 10 and 100, while a value of 5 will
985 result in a significant difference in priority). The default
986 value is 1.
987
988 FederationParameters
989 Used to define federation options. Multiple options may be comma
990 separated.
991
992 fed_display
993 If set, then the client status commands (e.g. squeue,
994 sinfo, sprio, etc.) will display information in a feder‐
995 ated view by default. This option is functionally equiva‐
996 lent to using the --federation options on each command.
997 Use the client's --local option to override the federated
998 view and get a local view of the given cluster.
999
1000 FirstJobId
1001 The job id to be used for the first job submitted to Slurm. Job
1002              id values generated will be incremented by 1 for each subsequent
1003              job. Value must be larger than 0. The default value is 1. Also
1004              see MaxJobId.
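1005
1006              For example, to reserve low job ids for administrative use by
1007              starting numbering at 1000 (an illustrative choice):
1008
1009                   FirstJobId=1000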
1005
1006 GetEnvTimeout
1007 Controls how long the job should wait (in seconds) to load the
1008 user's environment before attempting to load it from a cache
1009 file. Applies when the salloc or sbatch --get-user-env option
1010 is used. If set to 0 then always load the user's environment
1011 from the cache file. The default value is 2 seconds.
1012
1013 GresTypes
1014 A comma-delimited list of generic resources to be managed (e.g.
1015 GresTypes=gpu,mps). These resources may have an associated GRES
1016 plugin of the same name providing additional functionality. No
1017 generic resources are managed by default. Ensure this parameter
1018 is consistent across all nodes in the cluster for proper opera‐
1019 tion. A restart of slurmctld and the slurmd daemons is required
1020 for this to take effect.
1021
1022 GroupUpdateForce
1023 If set to a non-zero value, then information about which users
1024 are members of groups allowed to use a partition will be updated
1025 periodically, even when there have been no changes to the
1026 /etc/group file. If set to zero, group member information will
1027 be updated only after the /etc/group file is updated. The de‐
1028 fault value is 1. Also see the GroupUpdateTime parameter.
1029
1030 GroupUpdateTime
1031 Controls how frequently information about which users are mem‐
1032 bers of groups allowed to use a partition will be updated, and
1033 how long user group membership lists will be cached. The time
1034 interval is given in seconds with a default value of 600 sec‐
1035 onds. A value of zero will prevent periodic updating of group
1036 membership information. Also see the GroupUpdateForce parame‐
1037 ter.
1038
1039       GpuFreqDef=[<type>=]<value>[,<type>=<value>]
1040 Default GPU frequency to use when running a job step if it has
1041 not been explicitly set using the --gpu-freq option. This op‐
1042 tion can be used to independently configure the GPU and its mem‐
1043 ory frequencies. Defaults to "high,memory=high". After the job
1044 is completed, the frequencies of all affected GPUs will be reset
1045 to the highest possible values. In some cases, system power
1046 caps may override the requested values. The field type can be
1047 "memory". If type is not specified, the GPU frequency is im‐
1048 plied. The value field can either be "low", "medium", "high",
1049 "highm1" or a numeric value in megahertz (MHz). If the speci‐
1050 fied numeric value is not possible, a value as close as possible
1051 will be used. See below for definition of the values. Examples
1052              of use include "GpuFreqDef=medium,memory=high" and "GpuFre‐
1053              qDef=450".
1054
1055 Supported value definitions:
1056
1057 low the lowest available frequency.
1058
1059 medium attempts to set a frequency in the middle of the
1060 available range.
1061
1062 high the highest available frequency.
1063
1064 highm1 (high minus one) will select the next highest avail‐
1065 able frequency.
1066
1067 HealthCheckInterval
1068 The interval in seconds between executions of HealthCheckPro‐
1069 gram. The default value is zero, which disables execution.
1070
1071 HealthCheckNodeState
1072 Identify what node states should execute the HealthCheckProgram.
1073 Multiple state values may be specified with a comma separator.
1074 The default value is ANY to execute on nodes in any state.
1075
1076 ALLOC Run on nodes in the ALLOC state (all CPUs allo‐
1077 cated).
1078
1079 ANY Run on nodes in any state.
1080
1081 CYCLE Rather than running the health check program on all
1082 nodes at the same time, cycle through running on all
1083 compute nodes through the course of the HealthCheck‐
1084 Interval. May be combined with the various node
1085 state options.
1086
1087 IDLE Run on nodes in the IDLE state.
1088
1089 MIXED Run on nodes in the MIXED state (some CPUs idle and
1090 other CPUs allocated).
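1091
1092              A sketch combining these options (the five-minute interval and
1093              script path are illustrative):
1094
1095                   HealthCheckInterval=300
1096                   HealthCheckNodeState=IDLE,CYCLE
1097                   HealthCheckProgram=/usr/sbin/node_health.sh  # illustrative path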
1091
1092 HealthCheckProgram
1093 Fully qualified pathname of a script to execute as user root pe‐
1094 riodically on all compute nodes that are not in the NOT_RESPOND‐
1095 ING state. This program may be used to verify the node is fully
1096 operational and DRAIN the node or send email if a problem is de‐
1097 tected. Any action to be taken must be explicitly performed by
1098 the program (e.g. execute "scontrol update NodeName=foo
1099 State=drain Reason=tmp_file_system_full" to drain a node). The
1100 execution interval is controlled using the HealthCheckInterval
1101 parameter. Note that the HealthCheckProgram will be executed at
1102 the same time on all nodes to minimize its impact upon parallel
1103 programs. This program will be killed if it does not terminate
1104 normally within 60 seconds. This program will also be executed
1105 when the slurmd daemon is first started and before it registers
1106 with the slurmctld daemon. By default, no program will be exe‐
1107 cuted.
1108
1109 InactiveLimit
1110 The interval, in seconds, after which a non-responsive job allo‐
1111 cation command (e.g. srun or salloc) will result in the job be‐
1112 ing terminated. If the node on which the command is executed
1113 fails or the command abnormally terminates, this will terminate
1114 its job allocation. This option has no effect upon batch jobs.
1115 When setting a value, take into consideration that a debugger
1116 using srun to launch an application may leave the srun command
1117 in a stopped state for extended periods of time. This limit is
1118 ignored for jobs running in partitions with the RootOnly flag
1119 set (the scheduler running as root will be responsible for the
1120 job). The default value is unlimited (zero) and may not exceed
1121 65533 seconds.
1122
1123 InteractiveStepOptions
1124 When LaunchParameters=use_interactive_step is enabled, launching
1125 salloc will automatically start an srun process with Interac‐
1126 tiveStepOptions to launch a terminal on a node in the job allo‐
1127 cation. The default value is "--interactive --preserve-env
1128 --pty $SHELL". The "--interactive" option is intentionally not
1129 documented in the srun man page. It is meant only to be used in
1130 InteractiveStepOptions in order to create an "interactive step"
1131 that will not consume resources so that other steps may run in
1132 parallel with the interactive step.
1133
1134 JobAcctGatherType
1135 The job accounting mechanism type. Acceptable values at present
1136 include "jobacct_gather/linux" (for Linux systems),
1137 "jobacct_gather/cgroup" and "jobacct_gather/none" (no accounting
1138 data collected). The default value is "jobacct_gather/none".
1139 "jobacct_gather/cgroup" is a plugin for the Linux operating sys‐
1140 tem that uses cgroups to collect accounting statistics. The
1141 plugin collects the following statistics: From the cgroup memory
1142 subsystem: memory.usage_in_bytes (reported as 'pages') and rss
1143 from memory.stat (reported as 'rss'). From the cgroup cpuacct
1144 subsystem: user cpu time and system cpu time. No value is pro‐
1145 vided by cgroups for virtual memory size ('vsize'). In order to
1146              use the sstat tool, "jobacct_gather/linux" or
1147              "jobacct_gather/cgroup" must be configured.
1148 NOTE: Changing this configuration parameter changes the contents
1149 of the messages between Slurm daemons. Any previously running
1150 job steps are managed by a slurmstepd daemon that will persist
1151 through the lifetime of that job step and not change its commu‐
1152 nication protocol. Only change this configuration parameter when
1153 there are no running job steps.
1154
1155 JobAcctGatherFrequency
1156              The job accounting and profiling sampling intervals. The sup‐
1157              ported format is as follows:
1158
1159 JobAcctGatherFrequency=<datatype>=<interval>
1160 where <datatype>=<interval> specifies the task sam‐
1161 pling interval for the jobacct_gather plugin or a
1162 sampling interval for a profiling type by the
1163 acct_gather_profile plugin. Multiple, comma-sepa‐
1164 rated <datatype>=<interval> intervals may be speci‐
1165 fied. Supported datatypes are as follows:
1166
1167 task=<interval>
1168 where <interval> is the task sampling inter‐
1169 val in seconds for the jobacct_gather plugins
1170 and for task profiling by the
1171 acct_gather_profile plugin.
1172
1173 energy=<interval>
1174 where <interval> is the sampling interval in
1175 seconds for energy profiling using the
1176 acct_gather_energy plugin
1177
1178 network=<interval>
1179 where <interval> is the sampling interval in
1180 seconds for infiniband profiling using the
1181 acct_gather_interconnect plugin.
1182
1183 filesystem=<interval>
1184 where <interval> is the sampling interval in
1185 seconds for filesystem profiling using the
1186 acct_gather_filesystem plugin.
1187
1188
1189              The default value for the task sampling interval is 30
1190              seconds. The default value for all other intervals is 0.
1192 An interval of 0 disables sampling of the specified type. If
1193 the task sampling interval is 0, accounting information is col‐
1194 lected only at job termination, which reduces Slurm interference
1195 with the job, but also means that the statistics about a job
1196              don't reflect the average or maximum of several samples through‐
1197              out the life of the job, but just show the information collected
1198 in the single sample.
1199 Smaller (non-zero) values have a greater impact upon job perfor‐
1200 mance, but a value of 30 seconds is not likely to be noticeable
1201 for applications having less than 10,000 tasks.
1202 Users can independently override each interval on a per job ba‐
1203 sis using the --acctg-freq option when submitting the job.
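1204
1205              Putting the pieces together, a configuration that samples task
1206              statistics every 30 seconds and energy every 60, while leaving
1207              network and filesystem profiling off (their 0 default), would
1208              be:
1209
1210                   JobAcctGatherFrequency=task=30,energy=60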
1204
1205 JobAcctGatherParams
1206 Arbitrary parameters for the job account gather plugin. Accept‐
1207 able values at present include:
1208
1209 NoShared Exclude shared memory from RSS. This option
1210 cannot be used with UsePSS.
1211
1212 UsePss Use PSS value instead of RSS to calculate
1213 real usage of memory. The PSS value will be
1214 saved as RSS. This option cannot be used
1215 with NoShared.
1216
1217              OverMemoryKill   Kill processes that are detected using
1218                               more memory than requested by steps ev‐
1219                               ery time accounting information is gathered
1220 by the JobAcctGather plugin. This parameter
1221 should be used with caution because a job
1222 exceeding its memory allocation may affect
1223 other processes and/or machine health.
1224
1225 NOTE: If available, it is recommended to
1226 limit memory by enabling task/cgroup as a
1227 TaskPlugin and making use of Constrain‐
1228 RAMSpace=yes in the cgroup.conf instead of
1229 using this JobAcctGather mechanism for mem‐
1230 ory enforcement. Using JobAcctGather is
1231 polling based and there is a delay before a
1232 job is killed, which could lead to system
1233 Out of Memory events.
1234
1235 NOTE: When using OverMemoryKill, if the com‐
1236 bined memory used by all the processes in a
1237 step exceeds the memory limit, the entire
1238 step will be killed/cancelled by the JobAc‐
1239 ctGather plugin. This differs from the be‐
1240 havior when using ConstrainRAMSpace, where
1241 processes in the step will be killed, but
1242 the step will be left active, possibly with
1243 other processes left running.
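1244
1245              For example, to account shared memory via PSS rather than RSS:
1246
1247                   JobAcctGatherParams=UsePss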
1244
1245 JobCompHost
1246 The name of the machine hosting the job completion database.
1247 Only used for database type storage plugins, ignored otherwise.
1248
1249 JobCompLoc
1250 The fully qualified file name where job completion records are
1251 written when the JobCompType is "jobcomp/filetxt" or the data‐
1252 base where job completion records are stored when the JobComp‐
1253 Type is a database, or a complete URL endpoint with format
1254              <host>:<port>/<target>/_doc when JobCompType is "jobcomp/elas‐
1255              ticsearch" (e.g. "localhost:9200/slurm/_doc"). NOTE: More
1256 information is available at the Slurm web site
1257 <https://slurm.schedmd.com/elasticsearch.html>.
1258
1259 JobCompParams
1260 Pass arbitrary text string to job completion plugin. Also see
1261 JobCompType.
1262
1263 JobCompPass
1264 The password used to gain access to the database to store the
1265 job completion data. Only used for database type storage plug‐
1266 ins, ignored otherwise.
1267
1268 JobCompPort
1269 The listening port of the job completion database server. Only
1270 used for database type storage plugins, ignored otherwise.
1271
1272 JobCompType
1273 The job completion logging mechanism type. Acceptable values at
1274 present include:
1275
1276 jobcomp/none
1277 Upon job completion, a record of the job is purged from
1278 the system. If using the accounting infrastructure this
1279 plugin may not be of interest since some of the informa‐
1280 tion is redundant.
1281
1282 jobcomp/elasticsearch
1283 Upon job completion, a record of the job should be writ‐
1284 ten to an Elasticsearch server, specified by the JobCom‐
1285 pLoc parameter.
1286 NOTE: More information is available at the Slurm web site
1287 ( https://slurm.schedmd.com/elasticsearch.html ).
1288
1289 jobcomp/filetxt
1290 Upon job completion, a record of the job should be writ‐
1291 ten to a text file, specified by the JobCompLoc parame‐
1292 ter.
1293
1294 jobcomp/lua
1295 Upon job completion, a record of the job should be pro‐
1296 cessed by the jobcomp.lua script, located in the default
1297 script directory (typically the subdirectory etc of the
1298                     installation directory).
1299
1300 jobcomp/mysql
1301 Upon job completion, a record of the job should be writ‐
1302 ten to a MySQL or MariaDB database, specified by the Job‐
1303 CompLoc parameter.
1304
1305 jobcomp/script
1306 Upon job completion, a script specified by the JobCompLoc
1307 parameter is to be executed with environment variables
1308 providing the job information.
1309
1310 JobCompUser
1311 The user account for accessing the job completion database.
1312 Only used for database type storage plugins, ignored otherwise.
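1313
1314              A sketch of a database-backed job completion setup (the host,
1315              database and user names are illustrative):
1316
1317                   JobCompType=jobcomp/mysql
1318                   JobCompHost=dbserver
1319                   JobCompLoc=slurm_jobcomp_db
1320                   JobCompUser=slurm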
1313
1314 JobContainerType
1315 Identifies the plugin to be used for job tracking. A restart of
1316 slurmctld is required for changes to this parameter to take ef‐
1317 fect. NOTE: The JobContainerType applies to a job allocation,
1318 while ProctrackType applies to job steps. Acceptable values at
1319 present include:
1320
1321 job_container/cncu Used only for Cray systems (CNCU = Compute
1322 Node Clean Up)
1323
1324 job_container/none Used for all other system types
1325
1326 job_container/tmpfs Used to create a private namespace on the
1327 filesystem for jobs, which houses temporary
1328 file systems (/tmp and /dev/shm) for each
1329 job. 'PrologFlags=Contain' must be set to
1330 use this plugin.
1331
1332 JobFileAppend
1333 This option controls what to do if a job's output or error file
1334 exist when the job is started. If JobFileAppend is set to a
1335 value of 1, then append to the existing file. By default, any
1336 existing file is truncated.
1337
1338 JobRequeue
1339 This option controls the default ability for batch jobs to be
1340 requeued. Jobs may be requeued explicitly by a system adminis‐
1341 trator, after node failure, or upon preemption by a higher pri‐
1342 ority job. If JobRequeue is set to a value of 1, then batch
1343 jobs may be requeued unless explicitly disabled by the user. If
1344 JobRequeue is set to a value of 0, then batch jobs will not be
1345 requeued unless explicitly enabled by the user. Use the sbatch
1346 --no-requeue or --requeue option to change the default behavior
1347 for individual jobs. The default value is 1.
1348
1349 JobSubmitPlugins
1350 These are intended to be site-specific plugins which can be used
1351 to set default job parameters and/or logging events. Slurm can
1352 be configured to use multiple job_submit plugins if desired,
1353 which must be specified as a comma-delimited list and will be
1354 executed in the order listed.
1355 e.g. for multiple job_submit plugin configuration:
1356 JobSubmitPlugins=lua,require_timelimit
1357 Take a look at <https://slurm.schedmd.com/job_submit_plug‐
1358 ins.html> for further plugin implementation details. No job sub‐
1359 mission plugins are used by default. Currently available plug‐
1360 ins are:
1361
1362 all_partitions Set default partition to all partitions
1363 on the cluster.
1364
1365 defaults Set default values for job submission or
1366 modify requests.
1367
1368 logging Log select job submission and modifica‐
1369 tion parameters.
1370
1371 lua Execute a Lua script implementing site's
1372 own job_submit logic. Only one Lua
1373 script will be executed. It must be
1374 named "job_submit.lua" and must be lo‐
1375 cated in the default configuration di‐
1376 rectory (typically the subdirectory
1377 "etc" of the installation directory).
1378 Sample Lua scripts can be found with the
1379 Slurm distribution, in the directory
1380 contribs/lua. Slurmctld will fatal on
1381 startup if the configured lua script is
1382 invalid. Slurm will try to load the
1383 script for each job submission. If the
1384 script is broken or removed while slurm‐
1385 ctld is running, Slurm will fallback to
1386 the previous working version of the
1387 script.
1388
1389 partition Set a job's default partition based upon
1390 job submission parameters and available
1391 partitions.
1392
1393 pbs Translate PBS job submission options to
1394 Slurm equivalent (if possible).
1395
1396 require_timelimit Force job submissions to specify a time‐
1397 limit.
1398
1399 NOTE: For examples of use see the Slurm code in "src/plug‐
1400 ins/job_submit" and "contribs/lua/job_submit*.lua" then modify
1401 the code to satisfy your needs.
1402
1403 KillOnBadExit
1404              If set to 1, a step will be terminated immediately if any task
1405              crashes or aborts, as indicated by a non-zero exit code. With
1406              the default value of 0, if one of the processes crashes or
1407              aborts, the other processes will continue to run while the
1408              crashed or aborted process waits. The user can override this
1409 configuration parameter by using srun's -K, --kill-on-bad-exit.
1410
1411 KillWait
1412 The interval, in seconds, given to a job's processes between the
1413 SIGTERM and SIGKILL signals upon reaching its time limit. If
1414 the job fails to terminate gracefully in the interval specified,
1415 it will be forcibly terminated. The default value is 30 sec‐
1416 onds. The value may not exceed 65533.
1417
1418 NodeFeaturesPlugins
1419 Identifies the plugins to be used for support of node features
1420              which can change through time. For example, a node might be
1421              booted with various BIOS settings. This is supported through
1422 the use of a node's active_features and available_features in‐
1423 formation. Acceptable values at present include:
1424
1425 node_features/knl_cray
1426 Used only for Intel Knights Landing processors (KNL) on
1427 Cray systems.
1428
1429 node_features/knl_generic
1430 Used for Intel Knights Landing processors (KNL) on a
1431 generic Linux system.
1432
1433 node_features/helpers
1434 Used to report and modify features on nodes using arbi‐
1435 trary scripts or programs.
1436
1437 LaunchParameters
1438 Identifies options to the job launch plugin. Acceptable values
1439 include:
1440
1441 batch_step_set_cpu_freq Set the cpu frequency for the batch step
1442                                       from the given --cpu-freq option, or the
1443                                       slurm.conf CpuFreqDef setting. By default only
1444 steps started with srun will utilize the
1445 cpu freq setting options.
1446
1447 NOTE: If you are using srun to launch
1448                                       your steps inside a batch script (ad‐
1449                                       vised), this option will create a situa‐
1450                                       tion where you may have multiple agents
1451                                       setting the cpu_freq, as the batch step
1452                                       usually runs on the same resources as
1453                                       the one or more steps the sruns in the
1454                                       script will create.
1455
1456 cray_net_exclusive Allow jobs on a Cray Native cluster ex‐
1457 clusive access to network resources.
1458 This should only be set on clusters pro‐
1459 viding exclusive access to each node to
1460 a single job at once, and not using par‐
1461 allel steps within the job, otherwise
1462 resources on the node can be oversub‐
1463 scribed.
1464
1465 enable_nss_slurm Permits passwd and group resolution for
1466 a job to be serviced by slurmstepd
1467 rather than requiring a lookup from a
1468 network based service. See
1469 https://slurm.schedmd.com/nss_slurm.html
1470 for more information.
1471
1472 lustre_no_flush If set on a Cray Native cluster, then do
1473 not flush the Lustre cache on job step
1474 completion. This setting will only take
1475 effect after reconfiguring, and will
1476 only take effect for newly launched
1477 jobs.
1478
1479 mem_sort Sort NUMA memory at step start. User can
1480 override this default with
1481 SLURM_MEM_BIND environment variable or
1482 --mem-bind=nosort command line option.
1483
1484 mpir_use_nodeaddr When launching tasks Slurm creates en‐
1485 tries in MPIR_proctable that are used by
1486                                       parallel debuggers, profilers, and re‐
1487                                       lated tools to attach to running
1488                                       processes. By default the MPIR_proctable
1489                                       entries contain MPIR_procdesc structures
1490                                       where the host_name is set to NodeName.
1491                                       If this option is specified,
1492 NodeAddr will be used in this context
1493 instead.
1494
1495 disable_send_gids By default, the slurmctld will look up
1496 and send the user_name and extended gids
1497 for a job, rather than independently on
1498 each node as part of each task launch.
1499 This helps mitigate issues around name
1500 service scalability when launching jobs
1501 involving many nodes. Using this option
1502 will disable this functionality. This
1503 option is ignored if enable_nss_slurm is
1504 specified.
1505
1506 slurmstepd_memlock Lock the slurmstepd process's current
1507 memory in RAM.
1508
1509 slurmstepd_memlock_all Lock the slurmstepd process's current
1510 and future memory in RAM.
1511
1512 test_exec Have srun verify existence of the exe‐
1513 cutable program along with user execute
1514 permission on the node where srun was
1515 called before attempting to launch it on
1516 nodes in the step.
1517
1518 use_interactive_step Have salloc use the Interactive Step to
1519 launch a shell on an allocated compute
1520 node rather than locally to wherever
1521 salloc was invoked. This is accomplished
1522 by launching the srun command with In‐
1523 teractiveStepOptions as options.
1524
1525 This does not affect salloc called with
1526 a command as an argument. These jobs
1527 will continue to be executed as the
1528 calling user on the calling host.
1529
1530 LaunchType
1531 Identifies the mechanism to be used to launch application tasks.
1532 Acceptable values include:
1533
1534 launch/slurm
1535 The default value.
1536
1537 Licenses
1538 Specification of licenses (or other resources available on all
1539 nodes of the cluster) which can be allocated to jobs. License
1540 names can optionally be followed by a colon and count with a de‐
1541 fault count of one. Multiple license names should be comma sep‐
1542 arated (e.g. "Licenses=foo:4,bar"). Note that Slurm prevents
1543 jobs from being scheduled if their required license specifica‐
1544 tion is not available. Slurm does not prevent jobs from using
1545 licenses that are not explicitly listed in the job submission
1546 specification.
1547
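            As an illustration (hypothetical license names), a site with 30
            floating "fluent" seats and a single "matlab" license could
            configure:

                 Licenses=fluent:30,matlab

            Jobs would then request the licenses at submission time, e.g.
            "sbatch --licenses=fluent:2".
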
1548 LogTimeFormat
1549 Format of the timestamp in slurmctld and slurmd log files. Ac‐
1550 cepted values are "iso8601", "iso8601_ms", "rfc5424",
1551 "rfc5424_ms", "clock", "short" and "thread_id". The values end‐
1552 ing in "_ms" differ from the ones without in that fractional
1553 seconds with millisecond precision are printed. The default
1554 value is "iso8601_ms". The "rfc5424" formats are the same as the
1555 "iso8601" formats except that the timezone value is also shown.
1556 The "clock" format shows a timestamp in microseconds retrieved
1557 with the C standard clock() function. The "short" format is a
1558 short date and time format. The "thread_id" format shows the
1559 timestamp in the C standard ctime() function form without the
1560 year but including the microseconds, the daemon's process ID and
1561 the current thread name and ID.
1562
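            For example, to log timestamps with millisecond precision and
            the timezone included:

                 LogTimeFormat=rfc5424_ms
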
1563 MailDomain
1564 Domain name to qualify usernames if email address is not explic‐
1565 itly given with the "--mail-user" option. If unset, the local
1566 MTA will need to qualify local address itself. Changes to Mail‐
1567 Domain will only affect new jobs.
1568
1569 MailProg
1570 Fully qualified pathname to the program used to send email per
1571 user request. The default value is "/bin/mail" (or
1572 "/usr/bin/mail" if "/bin/mail" does not exist but
1573 "/usr/bin/mail" does exist). The program is called with argu‐
1574 ments suitable for the default mail command, however additional
1575 information about the job is passed in the form of environment
1576 variables.
1577
1578            The variables passed are the same as those passed to Pro‐
1579            logSlurmctld and EpilogSlurmctld, with additional variables
1580            available in the following contexts:
1581
1582 ALL
1583
1584 SLURM_JOB_STATE
1585 The base state of the job when the MailProg is
1586 called.
1587
1588 SLURM_JOB_MAIL_TYPE
1589 The mail type triggering the mail.
1590
1591 BEGIN
1592
1593                     SLURM_JOB_QUEUED_TIME
1594 The amount of time the job was queued.
1595
1596 END, FAIL, REQUEUE, TIME_LIMIT_*
1597
1598 SLURM_JOB_RUN_TIME
1599 The amount of time the job ran for.
1600
1601 END, FAIL
1602
1603 SLURM_JOB_EXIT_CODE_MAX
1604 Job's exit code or highest exit code for an array
1605 job.
1606
1607 SLURM_JOB_EXIT_CODE_MIN
1608 Job's minimum exit code for an array job.
1609
1610 SLURM_JOB_TERM_SIGNAL_MAX
1611 Job's highest signal for an array job.
1612
1613 STAGE_OUT
1614
1615 SLURM_JOB_STAGE_OUT_TIME
1616 Job's staging out time.
1617
1618 MaxArraySize
1619 The maximum job array task index value will be one less than
1620 MaxArraySize to allow for an index value of zero. Configure
1621 MaxArraySize to 0 in order to disable job array use. The value
1622 may not exceed 4000001. The value of MaxJobCount should be much
1623 larger than MaxArraySize. The default value is 1001. See also
1624 max_array_tasks in SchedulerParameters.
1625
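            With the default of 1001, the highest usable task index is
            1000, so a submission such as "sbatch --array=0-1000" is
            accepted while "--array=0-1001" is rejected. A sketch raising
            the limit (example value):

                 MaxArraySize=10001
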
1626 MaxDBDMsgs
1627            When communication to the SlurmDBD is not possible, the slurmctld
1628            will queue messages meant to be processed when the SlurmDBD is
1629 available again. In order to avoid running out of memory the
1630 slurmctld will only queue so many messages. The default value is
1631 10000, or MaxJobCount * 2 + Node Count * 4, whichever is
1632 greater. The value can not be less than 10000.
1633
1634 MaxJobCount
1635 The maximum number of jobs slurmctld can have in memory at one
1636 time. Combine with MinJobAge to ensure the slurmctld daemon
1637 does not exhaust its memory or other resources. Once this limit
1638 is reached, requests to submit additional jobs will fail. The
1639 default value is 10000 jobs. NOTE: Each task of a job array
1640 counts as one job even though they will not occupy separate job
1641 records until modified or initiated. Performance can suffer
1642            with more than a few hundred thousand jobs. Setting a MaxSub‐
1643            mitJobs limit per user is generally valuable to prevent a single
1644            user from filling the system with jobs. This is accomplished using
1645 Slurm's database and configuring enforcement of resource limits.
1646 A restart of slurmctld is required for changes to this parameter
1647 to take effect.
1648
1649 MaxJobId
1650            The maximum job id to be used for jobs submitted to Slurm with‐
1651            out a specific requested value. Job ids are unsigned 32-bit
1652            integers with the first 26 bits reserved for local job ids and the
1653 remaining 6 bits reserved for a cluster id to identify a feder‐
1654 ated job's origin. The maximum allowed local job id is
1655 67,108,863 (0x3FFFFFF). The default value is 67,043,328
1656 (0x03ff0000). MaxJobId only applies to the local job id and not
1657 the federated job id. Job id values generated will be incre‐
1658 mented by 1 for each subsequent job. Once MaxJobId is reached,
1659 the next job will be assigned FirstJobId. Federated jobs will
1660 always have a job ID of 67,108,865 or higher. Also see FirstJo‐
1661 bId.
1662
1663 MaxMemPerCPU
1664 Maximum real memory size available per allocated CPU in
1665 megabytes. Used to avoid over-subscribing memory and causing
1666 paging. MaxMemPerCPU would generally be used if individual pro‐
1667 cessors are allocated to jobs (SelectType=select/cons_res or Se‐
1668 lectType=select/cons_tres). The default value is 0 (unlimited).
1669 Also see DefMemPerCPU, DefMemPerGPU and MaxMemPerNode. MaxMem‐
1670 PerCPU and MaxMemPerNode are mutually exclusive.
1671
1672            NOTE: If a job specifies a memory per CPU limit that exceeds
1673            this system limit, that job's count of CPUs per task will be
1674            increased automatically. This may result in the job failing due
1675            to CPU count limits. This auto-adjustment is a best-effort
1676            feature and optimal assignment is not guaranteed due to the
1677            possibility of heterogeneous configurations and
1678            multi-partition/qos jobs. If this is a concern, it is advised
1679            to use a job submit Lua plugin instead to enforce
1680            auto-adjustments to your specific needs.
1681
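            As an illustration of the auto-adjustment described above
            (example values): with

                 MaxMemPerCPU=4096

            a job submitted with "--mem-per-cpu=8192" would have its CPUs
            per task increased to 2 so that the per-CPU limit is respected.
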
1682 MaxMemPerNode
1683 Maximum real memory size available per allocated node in
1684 megabytes. Used to avoid over-subscribing memory and causing
1685 paging. MaxMemPerNode would generally be used if whole nodes
1686 are allocated to jobs (SelectType=select/linear) and resources
1687 are over-subscribed (OverSubscribe=yes or OverSubscribe=force).
1688 The default value is 0 (unlimited). Also see DefMemPerNode and
1689 MaxMemPerCPU. MaxMemPerCPU and MaxMemPerNode are mutually ex‐
1690 clusive.
1691
1692 MaxNodeCount
1693 Maximum count of nodes which may exist in the controller. By de‐
1694 fault MaxNodeCount will be set to the number of nodes found in
1695 the slurm.conf. MaxNodeCount will be ignored if less than the
1696 number of nodes found in the slurm.conf. Increase MaxNodeCount
1697 to accommodate dynamically created nodes with dynamic node reg‐
1698 istrations and nodes created with scontrol. The slurmctld daemon
1699 must be restarted for changes to this parameter to take effect.
1700
1701 MaxStepCount
1702 The maximum number of steps that any job can initiate. This pa‐
1703 rameter is intended to limit the effect of bad batch scripts.
1704 The default value is 40000 steps.
1705
1706 MaxTasksPerNode
1707 Maximum number of tasks Slurm will allow a job step to spawn on
1708 a single node. The default MaxTasksPerNode is 512. May not ex‐
1709 ceed 65533.
1710
1711 MCSParameters
1712 MCS = Multi-Category Security MCS Plugin Parameters. The sup‐
1713 ported parameters are specific to the MCSPlugin. Changes to
1714 this value take effect when the Slurm daemons are reconfigured.
1715 More information about MCS is available here
1716 <https://slurm.schedmd.com/mcs.html>.
1717
1718 MCSPlugin
1719 MCS = Multi-Category Security : associate a security label to
1720 jobs and ensure that nodes can only be shared among jobs using
1721 the same security label. Acceptable values include:
1722
1723 mcs/none is the default value. No security label associated
1724 with jobs, no particular security restriction when
1725 sharing nodes among jobs.
1726
1727 mcs/account only users with the same account can share the nodes
1728 (requires enabling of accounting).
1729
1730 mcs/group only users with the same group can share the nodes.
1731
1732 mcs/user a node cannot be shared with other users.
1733
1734 MessageTimeout
1735 Time permitted for a round-trip communication to complete in
1736 seconds. Default value is 10 seconds. For systems with shared
1737 nodes, the slurmd daemon could be paged out and necessitate
1738 higher values.
1739
1740 MinJobAge
1741 The minimum age of a completed job before its record is cleared
1742 from the list of jobs slurmctld keeps in memory. Combine with
1743 MaxJobCount to ensure the slurmctld daemon does not exhaust its
1744 memory or other resources. The default value is 300 seconds. A
1745 value of zero prevents any job record purging. Jobs are not
1746 purged during a backfill cycle, so it can take longer than Min‐
1747 JobAge seconds to purge a job if using the backfill scheduling
1748            plugin. In order to eliminate some possible race conditions,
1749            the recommended minimum non-zero value for MinJobAge is 2.
1750
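            MaxJobCount and MinJobAge are typically tuned together; a
            sketch with example values:

                 MaxJobCount=50000
                 MinJobAge=120
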
1751 MpiDefault
1752 Identifies the default type of MPI to be used. Srun may over‐
1753 ride this configuration parameter in any case. Currently sup‐
1754 ported versions include: pmi2, pmix, and none (default, which
1755 works for many other versions of MPI). More information about
1756 MPI use is available here
1757 <https://slurm.schedmd.com/mpi_guide.html>.
1758
1759 MpiParams
1760            MPI parameters. Used to identify the ports used by Cray's
1761            native PMI. The format to identify a range of communication
1762            ports is "ports=12000-12999".
1763
1764 OverTimeLimit
1765 Number of minutes by which a job can exceed its time limit be‐
1766 fore being canceled. Normally a job's time limit is treated as
1767 a hard limit and the job will be killed upon reaching that
1768 limit. Configuring OverTimeLimit will result in the job's time
1769 limit being treated like a soft limit. Adding the OverTimeLimit
1770 value to the soft time limit provides a hard time limit, at
1771            which point the job is canceled. This is particularly useful
1772            for backfill scheduling, which bases its decisions upon each
1773            job's soft time limit. The default value is zero. May not
1774            exceed 65533 minutes. A value of "UNLIMITED" is also supported.
1775
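            For example, to let jobs run up to ten minutes past their soft
            time limit before cancellation:

                 OverTimeLimit=10
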
1776 PluginDir
1777 Identifies the places in which to look for Slurm plugins. This
1778 is a colon-separated list of directories, like the PATH environ‐
1779 ment variable. The default value is the prefix given at config‐
1780 ure time + "/lib/slurm". A restart of slurmctld and the slurmd
1781 daemons is required for changes to this parameter to take ef‐
1782 fect.
1783
1784 PlugStackConfig
1785 Location of the config file for Slurm stackable plugins that use
1786 the Stackable Plugin Architecture for Node job (K)control
1787 (SPANK). This provides support for a highly configurable set of
1788 plugins to be called before and/or after execution of each task
1789 spawned as part of a user's job step. Default location is
1790 "plugstack.conf" in the same directory as the system slurm.conf.
1791 For more information on SPANK plugins, see the spank(8) manual.
1792
1793 PowerParameters
1794 System power management parameters. The supported parameters
1795 are specific to the PowerPlugin. Changes to this value take ef‐
1796 fect when the Slurm daemons are reconfigured. More information
1797 about system power management is available here
1798            <https://slurm.schedmd.com/power_mgmt.html>. Options currently
1799            supported by the available plugins are listed below.
1800
1801 balance_interval=#
1802 Specifies the time interval, in seconds, between attempts
1803 to rebalance power caps across the nodes. This also con‐
1804 trols the frequency at which Slurm attempts to collect
1805 current power consumption data (old data may be used un‐
1806 til new data is available from the underlying infrastruc‐
1807 ture and values below 10 seconds are not recommended for
1808 Cray systems). The default value is 30 seconds. Sup‐
1809 ported by the power/cray_aries plugin.
1810
1811 capmc_path=
1812 Specifies the absolute path of the capmc command. The
1813 default value is "/opt/cray/capmc/default/bin/capmc".
1814 Supported by the power/cray_aries plugin.
1815
1816 cap_watts=#
1817 Specifies the total power limit to be established across
1818 all compute nodes managed by Slurm. A value of 0 sets
1819 every compute node to have an unlimited cap. The default
1820 value is 0. Supported by the power/cray_aries plugin.
1821
1822 decrease_rate=#
1823 Specifies the maximum rate of change in the power cap for
1824 a node where the actual power usage is below the power
1825 cap by an amount greater than lower_threshold (see be‐
1826 low). Value represents a percentage of the difference
1827 between a node's minimum and maximum power consumption.
1828 The default value is 50 percent. Supported by the
1829 power/cray_aries plugin.
1830
1831 get_timeout=#
1832 Amount of time allowed to get power state information in
1833 milliseconds. The default value is 5,000 milliseconds or
1834 5 seconds. Supported by the power/cray_aries plugin and
1835 represents the time allowed for the capmc command to re‐
1836 spond to various "get" options.
1837
1838 increase_rate=#
1839 Specifies the maximum rate of change in the power cap for
1840 a node where the actual power usage is within up‐
1841 per_threshold (see below) of the power cap. Value repre‐
1842 sents a percentage of the difference between a node's
1843 minimum and maximum power consumption. The default value
1844 is 20 percent. Supported by the power/cray_aries plugin.
1845
1846 job_level
1847 All nodes associated with every job will have the same
1848 power cap, to the extent possible. Also see the
1849 --power=level option on the job submission commands.
1850
1851 job_no_level
1852 Disable the user's ability to set every node associated
1853 with a job to the same power cap. Each node will have
1854 its power cap set independently. This disables the
1855 --power=level option on the job submission commands.
1856
1857 lower_threshold=#
1858 Specify a lower power consumption threshold. If a node's
1859 current power consumption is below this percentage of its
1860 current cap, then its power cap will be reduced. The de‐
1861 fault value is 90 percent. Supported by the
1862 power/cray_aries plugin.
1863
1864 recent_job=#
1865 If a job has started or resumed execution (from suspend)
1866 on a compute node within this number of seconds from the
1867 current time, the node's power cap will be increased to
1868 the maximum. The default value is 300 seconds. Sup‐
1869 ported by the power/cray_aries plugin.
1870
1872 set_timeout=#
1873 Amount of time allowed to set power state information in
1874 milliseconds. The default value is 30,000 milliseconds
1875                   or 30 seconds. Supported by the power/cray_aries plugin and
1876 represents the time allowed for the capmc command to re‐
1877 spond to various "set" options.
1878
1879 set_watts=#
1880                   Specifies the power limit to be set on every compute
1881                   node managed by Slurm. Every node gets this same power
1882 cap and there is no variation through time based upon ac‐
1883 tual power usage on the node. Supported by the
1884 power/cray_aries plugin.
1885
1886 upper_threshold=#
1887 Specify an upper power consumption threshold. If a
1888 node's current power consumption is above this percentage
1889 of its current cap, then its power cap will be increased
1890 to the extent possible. The default value is 95 percent.
1891 Supported by the power/cray_aries plugin.
1892
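            A sketch of a power-capping configuration for the
            power/cray_aries plugin (example values):

                 PowerPlugin=power/cray_aries
                 PowerParameters=balance_interval=60,cap_watts=500000,lower_threshold=85,upper_threshold=95
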
1893 PowerPlugin
1894 Identifies the plugin used for system power management. Cur‐
1895 rently supported plugins include: cray_aries and none. A
1896 restart of slurmctld is required for changes to this parameter
1897 to take effect. More information about system power management
1898 is available here <https://slurm.schedmd.com/power_mgmt.html>.
1899 By default, no power plugin is loaded.
1900
1901 PreemptMode
1902 Mechanism used to preempt jobs or enable gang scheduling. When
1903 the PreemptType parameter is set to enable preemption, the Pre‐
1904 emptMode selects the default mechanism used to preempt the eli‐
1905 gible jobs for the cluster.
1906 PreemptMode may be specified on a per partition basis to over‐
1907 ride this default value if PreemptType=preempt/partition_prio.
1908 Alternatively, it can be specified on a per QOS basis if Pre‐
1909 emptType=preempt/qos. In either case, a valid default Preempt‐
1910 Mode value must be specified for the cluster as a whole when
1911 preemption is enabled.
1912 The GANG option is used to enable gang scheduling independent of
1913 whether preemption is enabled (i.e. independent of the Preempt‐
1914 Type setting). It can be specified in addition to a PreemptMode
1915 setting with the two options comma separated (e.g. Preempt‐
1916 Mode=SUSPEND,GANG).
1917 See <https://slurm.schedmd.com/preempt.html> and
1918 <https://slurm.schedmd.com/gang_scheduling.html> for more de‐
1919 tails.
1920
1921 NOTE: For performance reasons, the backfill scheduler reserves
1922 whole nodes for jobs, not partial nodes. If during backfill
1923 scheduling a job preempts one or more other jobs, the whole
1924 nodes for those preempted jobs are reserved for the preemptor
1925 job, even if the preemptor job requested fewer resources than
1926 that. These reserved nodes aren't available to other jobs dur‐
1927 ing that backfill cycle, even if the other jobs could fit on the
1928 nodes. Therefore, jobs may preempt more resources during a sin‐
1929 gle backfill iteration than they requested.
1930            NOTE: For a heterogeneous job to be considered for preemption all
1931 components must be eligible for preemption. When a heterogeneous
1932 job is to be preempted the first identified component of the job
1933 with the highest order PreemptMode (SUSPEND (highest), REQUEUE,
1934 CANCEL (lowest)) will be used to set the PreemptMode for all
1935 components. The GraceTime and user warning signal for each com‐
1936 ponent of the heterogeneous job remain unique. Heterogeneous
1937 jobs are excluded from GANG scheduling operations.
1938
1939 OFF Is the default value and disables job preemption and
1940 gang scheduling. It is only compatible with Pre‐
1941 emptType=preempt/none at a global level. A common
1942 use case for this parameter is to set it on a parti‐
1943 tion to disable preemption for that partition.
1944
1945 CANCEL The preempted job will be cancelled.
1946
1947 GANG Enables gang scheduling (time slicing) of jobs in
1948 the same partition, and allows the resuming of sus‐
1949 pended jobs.
1950
1951 NOTE: Gang scheduling is performed independently for
1952 each partition, so if you only want time-slicing by
1953 OverSubscribe, without any preemption, then config‐
1954 uring partitions with overlapping nodes is not rec‐
1955 ommended. On the other hand, if you want to use
1956 PreemptType=preempt/partition_prio to allow jobs
1957 from higher PriorityTier partitions to Suspend jobs
1958 from lower PriorityTier partitions you will need
1959 overlapping partitions, and PreemptMode=SUSPEND,GANG
1960 to use the Gang scheduler to resume the suspended
1961 jobs(s). In any case, time-slicing won't happen be‐
1962 tween jobs on different partitions.
1963
1964 NOTE: Heterogeneous jobs are excluded from GANG
1965 scheduling operations.
1966
1967 REQUEUE Preempts jobs by requeuing them (if possible) or
1968 canceling them. For jobs to be requeued they must
1969 have the --requeue sbatch option set or the cluster
1970 wide JobRequeue parameter in slurm.conf must be set
1971 to 1.
1972
1973 SUSPEND The preempted jobs will be suspended, and later the
1974 Gang scheduler will resume them. Therefore the SUS‐
1975 PEND preemption mode always needs the GANG option to
1976 be specified at the cluster level. Also, because the
1977 suspended jobs will still use memory on the allo‐
1978 cated nodes, Slurm needs to be able to track memory
1979 resources to be able to suspend jobs.
1980 If PreemptType=preempt/qos is configured and if the
1981 preempted job(s) and the preemptor job are on the
1982 same partition, then they will share resources with
1983 the Gang scheduler (time-slicing). If not (i.e. if
1984 the preemptees and preemptor are on different parti‐
1985 tions) then the preempted jobs will remain suspended
1986 until the preemptor ends.
1987
1988 NOTE: Because gang scheduling is performed indepen‐
1989 dently for each partition, if using PreemptType=pre‐
1990 empt/partition_prio then jobs in higher PriorityTier
1991 partitions will suspend jobs in lower PriorityTier
1992                       partitions to run on the released resources. Only
1993                       when the preemptor job ends will the suspended
1994                       jobs be resumed by the Gang scheduler.
1995 NOTE: Suspended jobs will not release GRES. Higher
1996 priority jobs will not be able to preempt to gain
1997 access to GRES.
1998
1999 WITHIN For PreemptType=preempt/qos, allow jobs within the
2000 same qos to preempt one another. While this can be
2001                       set globally here, it is recommended that this only be
2002 set directly on a relevant subset of the system qos
2003 values instead.
2004
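            The partition-priority suspend/resume scenario described above
            can be sketched as follows (hypothetical node and partition
            names):

                 PreemptType=preempt/partition_prio
                 PreemptMode=SUSPEND,GANG
                 PartitionName=high Nodes=n[1-10] PriorityTier=2
                 PartitionName=low  Nodes=n[1-10] PriorityTier=1 Default=YES
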
2005 PreemptType
2006 Specifies the plugin used to identify which jobs can be pre‐
2007 empted in order to start a pending job.
2008
2009 preempt/none
2010 Job preemption is disabled. This is the default.
2011
2012 preempt/partition_prio
2013 Job preemption is based upon partition PriorityTier.
2014 Jobs in higher PriorityTier partitions may preempt jobs
2015 from lower PriorityTier partitions. This is not compati‐
2016 ble with PreemptMode=OFF.
2017
2018 preempt/qos
2019 Job preemption rules are specified by Quality Of Service
2020 (QOS) specifications in the Slurm database. This option
2021 is not compatible with PreemptMode=OFF. A configuration
2022 of PreemptMode=SUSPEND is only supported by the Select‐
2023 Type=select/cons_res and SelectType=select/cons_tres
2024 plugins. See the sacctmgr man page to configure the op‐
2025 tions for preempt/qos.
2026
2027 PreemptExemptTime
2028 Global option for minimum run time for all jobs before they can
2029 be considered for preemption. Any QOS PreemptExemptTime takes
2030 precedence over the global option. This is only honored for Pre‐
2031 emptMode=REQUEUE and PreemptMode=CANCEL.
2032 A time of -1 disables the option, equivalent to 0. Acceptable
2033 time formats include "minutes", "minutes:seconds", "hours:min‐
2034 utes:seconds", "days-hours", "days-hours:minutes", and
2035 "days-hours:minutes:seconds".
2036
2037 PrEpParameters
2038 Parameters to be passed to the PrEpPlugins.
2039
2040 PrEpPlugins
2041 A resource for programmers wishing to write their own plugins
2042 for the Prolog and Epilog (PrEp) scripts. The default, and cur‐
2043 rently the only implemented plugin is prep/script. Additional
2044 plugins can be specified in a comma-separated list. For more in‐
2045 formation please see the PrEp Plugin API documentation page:
2046 <https://slurm.schedmd.com/prep_plugins.html>
2047
2048 PriorityCalcPeriod
2049 The period of time in minutes in which the half-life decay will
2050 be re-calculated. Applicable only if PriorityType=priority/mul‐
2051 tifactor. The default value is 5 (minutes).
2052
2053 PriorityDecayHalfLife
2054 This controls how long prior resource use is considered in de‐
2055 termining how over- or under-serviced an association is (user,
2056 bank account and cluster) in determining job priority. The
2057 record of usage will be decayed over time, with half of the
2058 original value cleared at age PriorityDecayHalfLife. If set to
2059 0 no decay will be applied. This is helpful if you want to en‐
2060 force hard time limits per association. If set to 0 Priori‐
2061 tyUsageResetPeriod must be set to some interval. Applicable
2062 only if PriorityType=priority/multifactor. The unit is a time
2063 string (i.e. min, hr:min:00, days-hr:min:00, or days-hr). The
2064 default value is 7-0 (7 days).
2065
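            For example, to halve recorded usage every 14 days while
            recalculating every 5 minutes:

                 PriorityType=priority/multifactor
                 PriorityDecayHalfLife=14-0
                 PriorityCalcPeriod=5
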
2066 PriorityFavorSmall
2067 Specifies that small jobs should be given preferential schedul‐
2068 ing priority. Applicable only if PriorityType=priority/multi‐
2069 factor. Supported values are "YES" and "NO". The default value
2070 is "NO".
2071
2072 PriorityFlags
2073 Flags to modify priority behavior. Applicable only if Priority‐
2074 Type=priority/multifactor. The keywords below have no associ‐
2075 ated value (e.g. "PriorityFlags=ACCRUE_ALWAYS,SMALL_RELA‐
2076 TIVE_TO_TIME").
2077
2078 ACCRUE_ALWAYS If set, priority age factor will be increased
2079 despite job ineligibility due to either depen‐
2080 dencies, holds or begin time in the future. Ac‐
2081 crue limits are ignored.
2082
2083 CALCULATE_RUNNING
2084 If set, priorities will be recalculated not
2085 only for pending jobs, but also running and
2086 suspended jobs.
2087
2088 DEPTH_OBLIVIOUS If set, priority will be calculated based simi‐
2089 lar to the normal multifactor calculation, but
2090 depth of the associations in the tree does not
2091 adversely affect their priority. This option
2092 automatically enables NO_FAIR_TREE.
2093
2094 NO_FAIR_TREE Disables the "fair tree" algorithm, and reverts
2095 to "classic" fair share priority scheduling.
2096
2097 INCR_ONLY If set, priority values will only increase in
2098 value. Job priority will never decrease in
2099 value.
2100
2101 MAX_TRES If set, the weighted TRES value (e.g. TRES‐
2102 BillingWeights) is calculated as the MAX of in‐
2103 dividual TRES' on a node (e.g. cpus, mem, gres)
2104 plus the sum of all global TRES' (e.g. li‐
2105 censes).
2106
2107 NO_NORMAL_ALL If set, all NO_NORMAL_* flags are set.
2108
2109 NO_NORMAL_ASSOC If set, the association factor is not normal‐
2110 ized against the highest association priority.
2111
2112 NO_NORMAL_PART If set, the partition factor is not normalized
2113 against the highest partition PriorityJobFac‐
2114 tor.
2115
2116 NO_NORMAL_QOS If set, the QOS factor is not normalized
2117 against the highest qos priority.
2118
2119 NO_NORMAL_TRES If set, the TRES factor is not normalized
2120 against the job's partition TRES counts.
2121
2122 SMALL_RELATIVE_TO_TIME
2123 If set, the job's size component will be based
2124 upon not the job size alone, but the job's size
2125 divided by its time limit.
2126
2127 PriorityMaxAge
2128 Specifies the job age which will be given the maximum age factor
2129 in computing priority. For example, a value of 30 minutes would
2130            result in all jobs over 30 minutes old getting the same
2131 age-based priority. Applicable only if PriorityType=prior‐
2132 ity/multifactor. The unit is a time string (i.e. min,
2133 hr:min:00, days-hr:min:00, or days-hr). The default value is
2134 7-0 (7 days).
2135
2136 PriorityParameters
2137 Arbitrary string used by the PriorityType plugin.
2138
2139 PrioritySiteFactorParameters
2140 Arbitrary string used by the PrioritySiteFactorPlugin plugin.
2141
2142 PrioritySiteFactorPlugin
2143            This specifies an optional plugin to be used alongside "prior‐
2144 ity/multifactor", which is meant to initially set and continu‐
2145 ously update the SiteFactor priority factor. The default value
2146 is "site_factor/none".
2147
2148 PriorityType
2149 This specifies the plugin to be used in establishing a job's
2150 scheduling priority. Also see PriorityFlags for configuration
2151 options. The default value is "priority/basic".
2152
2153 priority/basic
2154 Jobs are evaluated in a First In, First Out (FIFO) man‐
2155 ner.
2156
2157 priority/multifactor
2158 Jobs are assigned a priority based upon a variety of fac‐
2159 tors that include size, age, Fairshare, etc.
2160
2161 When not FIFO scheduling, jobs are prioritized in the following
2162 order:
2163
2164 1. Jobs that can preempt
2165 2. Jobs with an advanced reservation
2166 3. Partition PriorityTier
2167 4. Job priority
2168 5. Job submit time
2169 6. Job ID
2170
2171 PriorityUsageResetPeriod
2172 At this interval the usage of associations will be reset to 0.
2173 This is used if you want to enforce hard limits of time usage
2174 per association. If PriorityDecayHalfLife is set to be 0 no de‐
2175 cay will happen and this is the only way to reset the usage ac‐
2176            cumulated by running jobs. By default this is turned off, and
2177            it is advised to use the PriorityDecayHalfLife option instead
2178            to avoid a situation where nothing can run on your cluster; but
2179            if your scheme is set up to only allow certain amounts of time
2180            on your system, this is the way to do it. Applicable only if
2181            PriorityType=priority/multifactor.
2182
2183 NONE Never clear historic usage. The default value.
2184
2185 NOW Clear the historic usage now. Executed at startup
2186 and reconfiguration time.
2187
2188 DAILY Cleared every day at midnight.
2189
2190 WEEKLY Cleared every week on Sunday at time 00:00.
2191
2192 MONTHLY Cleared on the first day of each month at time
2193 00:00.
2194
2195 QUARTERLY Cleared on the first day of each quarter at time
2196 00:00.
2197
2198 YEARLY Cleared on the first day of each year at time 00:00.
2199
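            A sketch of the hard-limit setup described above: disable
            decay entirely and reset accumulated usage monthly:

                 PriorityDecayHalfLife=0
                 PriorityUsageResetPeriod=MONTHLY
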
2200 PriorityWeightAge
2201 An integer value that sets the degree to which the queue wait
2202 time component contributes to the job's priority. Applicable
2203 only if PriorityType=priority/multifactor. Requires Account‐
2204 ingStorageType=accounting_storage/slurmdbd. The default value
2205 is 0.
2206
2207 PriorityWeightAssoc
2208 An integer value that sets the degree to which the association
2209 component contributes to the job's priority. Applicable only if
2210 PriorityType=priority/multifactor. The default value is 0.
2211
2212 PriorityWeightFairshare
2213 An integer value that sets the degree to which the fair-share
2214 component contributes to the job's priority. Applicable only if
2215 PriorityType=priority/multifactor. Requires AccountingStor‐
2216 ageType=accounting_storage/slurmdbd. The default value is 0.
2217
2218 PriorityWeightJobSize
2219 An integer value that sets the degree to which the job size com‐
2220 ponent contributes to the job's priority. Applicable only if
2221 PriorityType=priority/multifactor. The default value is 0.
2222
2223 PriorityWeightPartition
2224 Partition factor used by priority/multifactor plugin in calcu‐
2225 lating job priority. Applicable only if PriorityType=prior‐
2226 ity/multifactor. The default value is 0.
2227
2228 PriorityWeightQOS
2229 An integer value that sets the degree to which the Quality Of
2230 Service component contributes to the job's priority. Applicable
2231 only if PriorityType=priority/multifactor. The default value is
2232 0.
2233
2234 PriorityWeightTRES
2235 A comma-separated list of TRES Types and weights that sets the
2236 degree that each TRES Type contributes to the job's priority.
2237
2238 e.g.
2239 PriorityWeightTRES=CPU=1000,Mem=2000,GRES/gpu=3000
2240
2241 Applicable only if PriorityType=priority/multifactor and if Ac‐
2242 countingStorageTRES is configured with each TRES Type. Negative
2243 values are allowed. The default values are 0.
2244
2245 PrivateData
2246 This controls what type of information is hidden from regular
2247 users. By default, all information is visible to all users.
2248 User SlurmUser and root can always view all information. Multi‐
2249 ple values may be specified with a comma separator. Acceptable
2250 values include:
2251
2252 accounts
2253 (NON-SlurmDBD ACCOUNTING ONLY) Prevents users from view‐
2254 ing any account definitions unless they are coordinators
2255 of them.
2256
2257 cloud Powered down nodes in the cloud are visible. Without
2258 this flag, cloud nodes will not appear in the output of
2259 commands like sinfo unless they are powered on, even for
2260 SlurmUser and root.
2261
2262 events Prevents users from viewing event information unless they
2263 have operator status or above.
2264
2265 jobs Prevents users from viewing jobs or job steps belonging
2266 to other users. (NON-SlurmDBD ACCOUNTING ONLY) Prevents
2267 users from viewing job records belonging to other users
2268 unless they are coordinators of the association running
2269 the job when using sacct.
2270
2271 nodes Prevents users from viewing node state information.
2272
2273 partitions
2274 Prevents users from viewing partition state information.
2275
2276 reservations
2277 Prevents regular users from viewing reservations which
2278 they can not use.
2279
2280 usage Prevents users from viewing usage of any other user;
2281 this applies to sshare. (NON-SlurmDBD ACCOUNTING ONLY)
2282 Prevents users from viewing usage of any other user;
2283 this applies to sreport.
2284
2285 users (NON-SlurmDBD ACCOUNTING ONLY) Prevents users from view‐
2286 ing information of any user other than themselves; this
2287 also means users can only see the associations they deal
2288 with. Coordinators can see associations of all
2289 users in the account they are coordinator of, but can
2290 only see themselves when listing users.
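
              e.g. to hide job, usage and user information from regular
              users (an illustrative combination only):

              PrivateData=jobs,usage,users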
2291
2292 ProctrackType
2293 Identifies the plugin to be used for process tracking on a job
2294 step basis. The slurmd daemon uses this mechanism to identify
2295 all processes which are children of processes it spawns for a
2296 user job step. A restart of slurmctld is required for changes
2297 to this parameter to take effect. NOTE: "proctrack/linuxproc"
2298 and "proctrack/pgid" can fail to identify all processes associ‐
2299 ated with a job since processes can become a child of the init
2300 process (when the parent process terminates) or change their
2301 process group. To reliably track all processes, "proc‐
2302 track/cgroup" is highly recommended. NOTE: The JobContainerType
2303 applies to a job allocation, while ProctrackType applies to job
2304 steps. Acceptable values at present include:
2305
2306 proctrack/cgroup
2307 Uses linux cgroups to constrain and track processes, and
2308 is the default for systems with cgroup support.
2309 NOTE: see "man cgroup.conf" for configuration details.
2310
2311 proctrack/cray_aries
2312 Uses Cray proprietary process tracking.
2313
2314 proctrack/linuxproc
2315 Uses linux process tree using parent process IDs.
2316
2317 proctrack/pgid
2318 Uses Process Group IDs.
2319 NOTE: This is the default for the BSD family.
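
              e.g. (the recommended setting on Linux systems with cgroup
              support):

              ProctrackType=proctrack/cgroup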
2320
2321 Prolog Fully qualified pathname of a program for the slurmd to execute
2322 whenever it is asked to run a job step from a new job allocation
2323 (e.g. "/usr/local/slurm/prolog"). A glob pattern (see glob(7))
2324 may also be used to specify more than one program to run (e.g.
2325 "/etc/slurm/prolog.d/*"). The slurmd executes the prolog before
2326 starting the first job step. The prolog script or scripts may
2327 be used to purge files, enable user login, etc. By default
2328 there is no prolog. Any configured script is expected to com‐
2329 plete execution quickly (in less time than MessageTimeout). If
2330 the prolog fails (returns a non-zero exit code), this will re‐
2331 sult in the node being set to a DRAIN state and the job being
2332 requeued in a held state, unless nohold_on_prolog_fail is con‐
2333 figured in SchedulerParameters. See Prolog and Epilog Scripts
2334 for more information.
2335
2336 PrologEpilogTimeout
2337 The interval in seconds Slurm waits for Prolog and Epilog before
2338 terminating them. The default behavior is to wait indefinitely.
2339 This interval applies to the Prolog and Epilog run by slurmd
2340 daemon before and after the job, the PrologSlurmctld and Epi‐
2341 logSlurmctld run by slurmctld daemon, and the SPANK plugin pro‐
2342 log/epilog calls: slurm_spank_job_prolog and
2343 slurm_spank_job_epilog.
2344 If the PrologSlurmctld times out, the job is requeued if possi‐
2345 ble. If the Prolog or slurm_spank_job_prolog time out, the job
2346 is requeued if possible and the node is drained. If the Epilog
2347 or slurm_spank_job_epilog time out, the node is drained. In all
2348 cases, errors are logged.
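
              e.g. to limit Prolog and Epilog execution to five minutes
              (an illustrative value):

              PrologEpilogTimeout=300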
2349
2350 PrologFlags
2351 Flags to control the Prolog behavior. By default no flags are
2352 set. Multiple flags may be specified in a comma-separated list.
2353 Currently supported options are:
2354
2355 Alloc If set, the Prolog script will be executed at job allo‐
2356 cation. By default, Prolog is executed just before the
2357 task is launched. Therefore, when salloc is started, no
2358 Prolog is executed. Alloc is useful for preparing things
2359 before a user starts to use any allocated resources. In
2360 particular, this flag is needed on a Cray system when
2361 cluster compatibility mode is enabled.
2362
2363 NOTE: Use of the Alloc flag will increase the time re‐
2364 quired to start jobs.
2365
2366 Contain At job allocation time, use the ProcTrack plugin to cre‐
2367 ate a job container on all allocated compute nodes.
2368 This container may be used for user processes not
2369 launched under Slurm control, for example
2370 pam_slurm_adopt may place processes launched through a
2371 direct user login into this container. If using
2372 pam_slurm_adopt, then ProcTrackType must be set to ei‐
2373 ther proctrack/cgroup or proctrack/cray_aries. Setting
2374 the Contain flag implicitly sets the Alloc flag.
2375
2376 DeferBatch
2377 If set, slurmctld will wait until the prolog completes
2378 on all allocated nodes before sending the batch job
2379 launch request. With just the Alloc flag, slurmctld will
2380 launch the batch step as soon as the first node in the
2381 job allocation completes the prolog.
2382
2383 NoHold If set, the Alloc flag should also be set. This allows
2384 salloc to return without blocking until the prolog fin‐
2385 ishes on each node; instead, blocking happens when steps
2386 reach the slurmd and before any execution has happened
2387 in the step. This is much faster, and this flag is rec‐
2388 ommended if srun is used to launch your tasks. This
2389 flag cannot be combined with the Contain or
2390 X11 flags.
2391
2392 Serial By default, the Prolog and Epilog scripts run concur‐
2393 rently on each node. This flag forces those scripts to
2394 run serially within each node, but with a significant
2395 penalty to job throughput on each node.
2396
2397 X11 Enable Slurm's built-in X11 forwarding capabilities.
2398 This is incompatible with ProctrackType=proctrack/linux‐
2399 proc. Setting the X11 flag implicitly enables both Con‐
2400 tain and Alloc flags as well.
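
              e.g. (an illustrative combination; setting X11 implicitly
              enables the Contain and Alloc flags as well):

              PrologFlags=X11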
2401
2402 PrologSlurmctld
2403 Fully qualified pathname of a program for the slurmctld daemon
2404 to execute before granting a new job allocation (e.g. "/usr/lo‐
2405 cal/slurm/prolog_controller"). The program executes as Slur‐
2406 mUser on the same node where the slurmctld daemon executes, giv‐
2407 ing it permission to drain nodes and requeue the job if a fail‐
2408 ure occurs or cancel the job if appropriate. Exactly what the
2409 program does and how it accomplishes this is completely at the
2410 discretion of the system administrator. Information about the
2411 job being initiated, its allocated nodes, etc. are passed to the
2412 program using environment variables. While this program is run‐
2413 ning, the nodes associated with the job will have a
2414 POWER_UP/CONFIGURING flag set in their state, which can be read‐
2415 ily viewed. The slurmctld daemon will wait indefinitely for
2416 this program to complete. Once the program completes with an
2417 exit code of zero, the nodes will be considered ready for use
2418 and the job will be started. If some node cannot be made
2419 available for use, the program should drain the node (typically
2420 using the scontrol command) and terminate with a non-zero exit
2421 code. A non-zero exit code will result in the job being re‐
2422 queued (where possible) or killed. Note that only batch jobs can
2423 be requeued. See Prolog and Epilog Scripts for more informa‐
2424 tion.
2425
2426 PropagatePrioProcess
2427 Controls the scheduling priority (nice value) of user-spawned
2428 tasks.
2429
2430 0 The tasks will inherit the scheduling priority from the
2431 slurm daemon. This is the default value.
2432
2433 1 The tasks will inherit the scheduling priority of the com‐
2434 mand used to submit them (e.g. srun or sbatch). Unless the
2435 job is submitted by user root, the tasks will have a sched‐
2436 uling priority no higher than the slurm daemon spawning
2437 them.
2438
2439 2 The tasks will inherit the scheduling priority of the com‐
2440 mand used to submit them (e.g. srun or sbatch) with the re‐
2441 striction that their nice value will always be one higher
2442 than the slurm daemon (i.e. the tasks' scheduling priority
2443 will be lower than the slurm daemon).
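
              e.g. to have tasks inherit the scheduling priority of the
              submitting command (an illustrative setting):

              PropagatePrioProcess=1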
2444
2445 PropagateResourceLimits
2446 A comma-separated list of resource limit names. The slurmd dae‐
2447 mon uses these names to obtain the associated (soft) limit val‐
2448 ues from the user's process environment on the submit node.
2449 These limits are then propagated and applied to the jobs that
2450 will run on the compute nodes. This parameter can be useful
2451 when system limits vary among nodes. Any resource limits that
2452 do not appear in the list are not propagated. However, the user
2453 can override this by specifying which resource limits to propa‐
2454 gate with the sbatch or srun "--propagate" option. If neither
2455 PropagateResourceLimits nor PropagateResourceLimitsExcept is
2456 configured and the "--propagate" option is not specified, then
2457 the default action is to propagate all limits. Only one of the
2458 parameters, either PropagateResourceLimits or PropagateResource‐
2459 LimitsExcept, may be specified. The user limits can not exceed
2460 hard limits under which the slurmd daemon operates. If the user
2461 limits are not propagated, the limits from the slurmd daemon
2462 will be propagated to the user's job. The limits used for the
2463 Slurm daemons can be set in the /etc/sysconfig/slurm file. For
2464 more information, see: https://slurm.schedmd.com/faq.html#mem‐
2465 lock. The following limit names are supported by Slurm (although
2466 some options may not be supported on some systems):
2467
2468 ALL All limits listed below (default)
2469
2470 NONE No limits listed below
2471
2472 AS The maximum address space (virtual memory) for a
2473 process.
2474
2475 CORE The maximum size of a core file
2476
2477 CPU The maximum amount of CPU time
2478
2479 DATA The maximum size of a process's data segment
2480
2481 FSIZE The maximum size of files created. Note that if the
2482 user sets FSIZE to less than the current size of the
2483 slurmd.log, job launches will fail with a 'File size
2484 limit exceeded' error.
2485
2486 MEMLOCK The maximum size that may be locked into memory
2487
2488 NOFILE The maximum number of open files
2489
2490 NPROC The maximum number of processes available
2491
2492 RSS The maximum resident set size. Note that this only
2493 has effect with Linux kernels 2.4.30 or older or BSD.
2494
2495 STACK The maximum stack size
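
              e.g. to propagate only the memory lock and open file limits
              (an illustrative selection):

              PropagateResourceLimits=MEMLOCK,NOFILE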
2496
2497 PropagateResourceLimitsExcept
2498 A comma-separated list of resource limit names. By default, all
2499 resource limits will be propagated, (as described by the Propa‐
2500 gateResourceLimits parameter), except for the limits appearing
2501 in this list. The user can override this by specifying which
2502 resource limits to propagate with the sbatch or srun "--propa‐
2503 gate" option. See PropagateResourceLimits above for a list of
2504 valid limit names.
2505
2506 RebootProgram
2507 Program to be executed on each compute node to reboot it. In‐
2508 voked on each node once it becomes idle after the command "scon‐
2509 trol reboot" is executed by an authorized user or a job is sub‐
2510 mitted with the "--reboot" option. After rebooting, the node is
2511 returned to normal use. See ResumeTimeout to configure the time
2512 you expect a reboot to finish in. A node will be marked DOWN if
2513 it doesn't reboot within ResumeTimeout.
2514
2515 ReconfigFlags
2516 Flags to control various actions that may be taken when an
2517 "scontrol reconfig" command is issued. Currently the options
2518 are:
2519
2520 KeepPartInfo If set, an "scontrol reconfig" command will
2521 maintain the in-memory value of partition
2522 "state" and other parameters that may have been
2523 dynamically updated by "scontrol update". Par‐
2524 tition information in the slurm.conf file will
2525 be merged with in-memory data. This flag su‐
2526 persedes the KeepPartState flag.
2527
2528 KeepPartState If set, an "scontrol reconfig" command will
2529 preserve only the current "state" value of
2530 in-memory partitions and will reset all other
2531 parameters of the partitions that may have been
2532 dynamically updated by "scontrol update" to the
2533 values from the slurm.conf file. Partition in‐
2534 formation in the slurm.conf file will be merged
2535 with in-memory data.
2536
2537 By default, neither flag is set, and an "scontrol
2538 reconfig" command will rebuild the partition information
2539 using only the definitions in the slurm.conf file.
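
              e.g. to preserve dynamically updated partition information
              across reconfiguration (illustrative):

              ReconfigFlags=KeepPartInfo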
2540
2541 RequeueExit
2542 Enables automatic requeue for batch jobs which exit with the
2543 specified values. Separate multiple exit codes with a comma
2544 and/or specify numeric ranges using a "-" separator (e.g.
2545 "RequeueExit=1-9,18"). Jobs will be put back into the pending
2546 state and later scheduled again. Restarted jobs will have the environment
2547 variable SLURM_RESTART_COUNT set to the number of times the job
2548 has been restarted.
2549
2550 RequeueExitHold
2551 Enables automatic requeue for batch jobs which exit with the
2552 specified values, with these jobs being held until released man‐
2553 ually by the user. Separate multiple exit codes with a comma
2554 and/or specify numeric ranges using a "-" separator (e.g. "Re‐
2555 queueExitHold=10-12,16"). These jobs are put in the JOB_SPE‐
2556 CIAL_EXIT exit state. Restarted jobs will have the environment
2557 variable SLURM_RESTART_COUNT set to the number of times the job
2558 has been restarted.
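
              e.g. to requeue jobs exiting with codes 1-9 or 18, while
              requeuing and holding jobs exiting with codes 10-12 or 16
              (the exit codes shown are illustrative only):

              RequeueExit=1-9,18
              RequeueExitHold=10-12,16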
2559
2560 ResumeFailProgram
2561 The program that will be executed when nodes fail to resume
2562 by ResumeTimeout. The argument to the program will be the names
2563 of the failed nodes (using Slurm's hostlist expression format).
2564
2565 ResumeProgram
2566 Slurm supports a mechanism to reduce power consumption on nodes
2567 that remain idle for an extended period of time. This is typi‐
2568 cally accomplished by reducing voltage and frequency or powering
2569 the node down. ResumeProgram is the program that will be exe‐
2570 cuted when a node in power save mode is assigned work to per‐
2571 form. For reasons of reliability, ResumeProgram may execute
2572 more than once for a node when the slurmctld daemon crashes and
2573 is restarted. If ResumeProgram is unable to restore a node to
2574 service with a responding slurmd and an updated BootTime, it
2575 should set the node state to DOWN, which will result in a re‐
2576 queue of any job associated with the node - this will happen au‐
2577 tomatically if the node doesn't register within ResumeTimeout.
2578 If the node isn't actually rebooted (i.e. when multiple-slurmd
2579 is configured), starting slurmd with the "-b" option might be useful.
2580 The program executes as SlurmUser. The argument to the program
2581 will be the names of nodes to be removed from power savings mode
2582 (using Slurm's hostlist expression format). A job to node map‐
2583 ping is available in JSON format by reading the temporary file
2584 specified by the SLURM_RESUME_FILE environment variable. By de‐
2585 fault no program is run.
2586
2587 ResumeRate
2588 The rate at which nodes in power save mode are returned to nor‐
2589 mal operation by ResumeProgram. The value is a number of nodes
2590 per minute and it can be used to prevent power surges if a large
2591 number of nodes in power save mode are assigned work at the same
2592 time (e.g. a large job starts). A value of zero results in no
2593 limits being imposed. The default value is 300 nodes per
2594 minute.
2595
2596 ResumeTimeout
2597 Maximum time permitted (in seconds) between when a node resume
2598 request is issued and when the node is actually available for
2599 use. Nodes which fail to respond in this time frame will be
2600 marked DOWN and the jobs scheduled on the node requeued. Nodes
2601 which reboot after this time frame will be marked DOWN with a
2602 reason of "Node unexpectedly rebooted." The default value is 60
2603 seconds.
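
              e.g. a power saving configuration might resemble the fol‐
              lowing (the script path and values are illustrative only):

              ResumeProgram=/usr/local/slurm/node_resume.sh
              ResumeRate=100
              ResumeTimeout=300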
2604
2605 ResvEpilog
2606 Fully qualified pathname of a program for the slurmctld to exe‐
2607 cute when a reservation ends. The program can be used to cancel
2608 jobs, modify partition configuration, etc. The reservation
2609 named will be passed as an argument to the program. By default
2610 there is no epilog.
2611
2612 ResvOverRun
2613 Describes how long a job already running in a reservation should
2614 be permitted to execute after the end time of the reservation
2615 has been reached. The time period is specified in minutes and
2616 the default value is 0 (kill the job immediately). The value
2617 may not exceed 65533 minutes, although a value of "UNLIMITED" is
2618 supported to permit a job to run indefinitely after its reserva‐
2619 tion is terminated.
2620
2621 ResvProlog
2622 Fully qualified pathname of a program for the slurmctld to exe‐
2623 cute when a reservation begins. The program can be used to can‐
2624 cel jobs, modify partition configuration, etc. The reservation
2625 named will be passed as an argument to the program. By default
2626 there is no prolog.
2627
2628 ReturnToService
2629 Controls when a DOWN node will be returned to service. The de‐
2630 fault value is 0. Supported values include
2631
2632 0 A node will remain in the DOWN state until a system adminis‐
2633 trator explicitly changes its state (even if the slurmd dae‐
2634 mon registers and resumes communications).
2635
2636 1 A DOWN node will become available for use upon registration
2637 with a valid configuration only if it was set DOWN due to
2638 being non-responsive. If the node was set DOWN for any
2639 other reason (low memory, unexpected reboot, etc.), its
2640 state will not automatically be changed. A node registers
2641 with a valid configuration if its memory, GRES, CPU count,
2642 etc. are equal to or greater than the values configured in
2643 slurm.conf.
2644
2645 2 A DOWN node will become available for use upon registration
2646 with a valid configuration. The node could have been set
2647 DOWN for any reason. A node registers with a valid configu‐
2648 ration if its memory, GRES, CPU count, etc. are equal to or
2649 greater than the values configured in slurm.conf.
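
              e.g. to automatically return nodes that were set DOWN only
              because they were non-responsive (illustrative):

              ReturnToService=1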
2650
2651 RoutePlugin
2652 Identifies the plugin to be used for defining which nodes will
2653 be used for message forwarding.
2654
2655 route/default
2656 default, use TreeWidth.
2657
2658 route/topology
2659 use the switch hierarchy defined in a topology.conf file.
2660 TopologyPlugin=topology/tree is required.
2661
2662 SchedulerParameters
2663 The interpretation of this parameter varies by SchedulerType.
2664 Multiple options may be comma separated.
2665
2666 allow_zero_lic
2667 If set, then job submissions requesting more than config‐
2668 ured licenses won't be rejected.
2669
2670 assoc_limit_stop
2671 If set and a job cannot start due to association limits,
2672 then do not attempt to initiate any lower priority jobs
2673 in that partition. Setting this can decrease system
2674 throughput and utilization, but avoid potentially starv‐
2675 ing larger jobs by preventing them from launching indefi‐
2676 nitely.
2677
2678 batch_sched_delay=#
2679 How long, in seconds, the scheduling of batch jobs can be
2680 delayed. This can be useful in a high-throughput envi‐
2681 ronment in which batch jobs are submitted at a very high
2682 rate (i.e. using the sbatch command) and one wishes to
2683 reduce the overhead of attempting to schedule each job at
2684 submit time. The default value is 3 seconds.
2685
2686 bb_array_stage_cnt=#
2687 Number of tasks from a job array that should be available
2688 for burst buffer resource allocation. Higher values will
2689 increase the system overhead as each task from the job
2690 array will be moved to its own job record in memory, so
2691 relatively small values are generally recommended. The
2692 default value is 10.
2693
2694 bf_busy_nodes
2695 When selecting resources for pending jobs to reserve for
2696 future execution (i.e. the job can not be started immedi‐
2697 ately), then preferentially select nodes that are in use.
2698 This will tend to leave currently idle resources avail‐
2699 able for backfilling longer running jobs, but may result
2700 in allocations having less than optimal network topology.
2701 This option is currently only supported by the se‐
2702 lect/cons_res and select/cons_tres plugins (or se‐
2703 lect/cray_aries with SelectTypeParameters set to
2704 "OTHER_CONS_RES" or "OTHER_CONS_TRES", which layers the
2705 select/cray_aries plugin over the select/cons_res or se‐
2706 lect/cons_tres plugin respectively).
2707
2708 bf_continue
2709 The backfill scheduler periodically releases locks in or‐
2710 der to permit other operations to proceed rather than
2711 blocking all activity for what could be an extended pe‐
2712 riod of time. Setting this option will cause the back‐
2713 fill scheduler to continue processing pending jobs from
2714 its original job list after releasing locks even if job
2715 or node state changes.
2716
2717 bf_hetjob_immediate
2718 Instruct the backfill scheduler to attempt to start a
2719 heterogeneous job as soon as all of its components are
2720 determined able to do so. Otherwise, the backfill sched‐
2721 uler will delay heterogeneous jobs initiation attempts
2722 until after the rest of the queue has been processed.
2723 This delay may result in lower priority jobs being allo‐
2724 cated resources, which could delay the initiation of the
2725 heterogeneous job due to account and/or QOS limits being
2726 reached. This option is disabled by default. If enabled
2727 and bf_hetjob_prio=min is not set, then bf_hetjob_prio=min
2728 will be set automatically.
2729
2730 bf_hetjob_prio=[min|avg|max]
2731 At the beginning of each backfill scheduling cycle, a
2732 list of pending jobs to be scheduled is sorted according
2733 to the precedence order configured in PriorityType. This
2734 option instructs the scheduler to alter the sorting algo‐
2735 rithm to ensure that all components belonging to the same
2736 heterogeneous job will be attempted to be scheduled con‐
2737 secutively (thus not fragmented in the resulting list).
2738 More specifically, all components from the same heteroge‐
2739 neous job will be treated as if they all have the same
2740 priority (minimum, average or maximum depending upon this
2741 option's parameter) when compared with other jobs (or
2742 other heterogeneous job components). The original order
2743 will be preserved within the same heterogeneous job. Note
2744 that the operation is calculated for the PriorityTier
2745 layer and for the Priority resulting from the prior‐
2746 ity/multifactor plugin calculations. When enabled, if any
2747 heterogeneous job requested an advanced reservation, then
2748 all of that job's components will be treated as if they
2749 had requested an advanced reservation (and get preferen‐
2750 tial treatment in scheduling).
2751
2752 Note that this operation does not update the Priority
2753 values of the heterogeneous job components, only their
2754 order within the list, so the output of the sprio command
2755 will not be affected.
2756
2757 Heterogeneous jobs have special scheduling properties:
2758 they are only scheduled by the backfill scheduling
2759 plugin, each of their components is considered separately
2760 when reserving resources (and might have different Prior‐
2761 ityTier or different Priority values), and no heteroge‐
2762 neous job component is actually allocated resources until
2763 all of its components can be initiated. This may imply
2764 potential scheduling deadlock scenarios because compo‐
2765 nents from different heterogeneous jobs can start reserv‐
2766 ing resources in an interleaved fashion (not consecu‐
2767 tively), but none of the jobs can reserve resources for
2768 all components and start. Enabling this option can help
2769 to mitigate this problem. By default, this option is dis‐
2770 abled.
2771
2772 bf_interval=#
2773 The number of seconds between backfill iterations.
2774 Higher values result in less overhead and better respon‐
2775 siveness. This option applies only to Scheduler‐
2776 Type=sched/backfill. Default: 30, Min: 1, Max: 10800
2777 (3h). A setting of -1 will disable the backfill schedul‐
2778 ing loop.
2779
2780 bf_job_part_count_reserve=#
2781 The backfill scheduling logic will reserve resources for
2782 the specified count of highest priority jobs in each par‐
2783 tition. For example, bf_job_part_count_reserve=10 will
2784 cause the backfill scheduler to reserve resources for the
2785 ten highest priority jobs in each partition. Any lower
2786 priority job that can be started using currently avail‐
2787 able resources and not adversely impact the expected
2788 start time of these higher priority jobs will be started
2789 by the backfill scheduler. The default value is zero,
2790 which will reserve resources for any pending job and de‐
2791 lay initiation of lower priority jobs. Also see
2792 bf_min_age_reserve and bf_min_prio_reserve. Default: 0,
2793 Min: 0, Max: 100000.
2794
2795 bf_licenses
2796 Require the backfill scheduling logic to track and plan
2797 for license availability. By default, any job blocked on
2798 license availability will not have resources reserved
2799 which can lead to job starvation. This option implicitly
2800 enables bf_running_job_reserve.
2801
2802 bf_max_job_array_resv=#
2803 The maximum number of tasks from a job array for which
2804 the backfill scheduler will reserve resources in the fu‐
2805 ture. Since job arrays can potentially have millions of
2806 tasks, the overhead in reserving resources for all tasks
2807 can be prohibitive. In addition various limits may pre‐
2808 vent all the jobs from starting at the expected times.
2809 This has no impact upon the number of tasks from a job
2810 array that can be started immediately, only those tasks
2811 expected to start at some future time. Default: 20, Min:
2812 0, Max: 1000. NOTE: Jobs submitted to multiple parti‐
2813 tions appear in the job queue once per partition. If dif‐
2814 ferent copies of a single job array record aren't consec‐
2815 utive in the job queue and another job array record is in
2816 between, then bf_max_job_array_resv tasks are considered
2817 per partition that the job is submitted to.
2818
2819 bf_max_job_assoc=#
2820 The maximum number of jobs per user association to at‐
2821 tempt starting with the backfill scheduler. This setting
2822 is similar to bf_max_job_user but is handy if a user has
2823 multiple associations equating to basically different
2824 users. One can set this limit to prevent users from
2825 flooding the backfill queue with jobs that cannot start
2826 and that prevent jobs from other users from starting. This
2827 option applies only to SchedulerType=sched/backfill.
2828 Also see the bf_max_job_user, bf_max_job_part,
2829 bf_max_job_test and bf_max_job_user_part=# options. Set
2830 bf_max_job_test to a value much higher than
2831 bf_max_job_assoc. Default: 0 (no limit), Min: 0, Max:
2832 bf_max_job_test.
2833
2834 bf_max_job_part=#
2835 The maximum number of jobs per partition to attempt
2836 starting with the backfill scheduler. This can be espe‐
2837 cially helpful for systems with large numbers of parti‐
2838 tions and jobs. This option applies only to Scheduler‐
2839 Type=sched/backfill. Also see the partition_job_depth
2840 and bf_max_job_test options. Set bf_max_job_test to a
2841 value much higher than bf_max_job_part. Default: 0 (no
2842 limit), Min: 0, Max: bf_max_job_test.
2843
2844 bf_max_job_start=#
2845 The maximum number of jobs which can be initiated in a
2846 single iteration of the backfill scheduler. This option
2847 applies only to SchedulerType=sched/backfill. Default: 0
2848 (no limit), Min: 0, Max: 10000.
2849
2850 bf_max_job_test=#
2851 The maximum number of jobs to attempt backfill scheduling
2852 for (i.e. the queue depth). Higher values result in more
2853 overhead and less responsiveness. Until an attempt is
2854 made to backfill schedule a job, its expected initiation
2855 time value will not be set. In the case of large clus‐
2856 ters, configuring a relatively small value may be desir‐
2857 able. This option applies only to Scheduler‐
2858 Type=sched/backfill. Default: 500, Min: 1, Max:
2859 1,000,000.
2860
2861 bf_max_job_user=#
2862 The maximum number of jobs per user to attempt starting
2863 with the backfill scheduler for ALL partitions. One can
2864 set this limit to prevent users from flooding the back‐
2865 fill queue with jobs that cannot start and that prevent
2866 jobs from other users from starting. This is similar to the
2867 MAXIJOB limit in Maui. This option applies only to
2868 SchedulerType=sched/backfill. Also see the
2869 bf_max_job_part, bf_max_job_test and
2870 bf_max_job_user_part=# options. Set bf_max_job_test to a
2871 value much higher than bf_max_job_user. Default: 0 (no
2872 limit), Min: 0, Max: bf_max_job_test.
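
              e.g. a backfill tuning for a high-throughput system might
              combine several of the above options (the values are illus‐
              trative only and should be tuned for each site):

              SchedulerParameters=bf_continue,bf_interval=60,bf_max_job_test=1000,bf_max_job_user=20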
2873
2874 bf_max_job_user_part=#
2875 The maximum number of jobs per user per partition to at‐
2876 tempt starting with the backfill scheduler for any single
2877 partition. This option applies only to Scheduler‐
2878 Type=sched/backfill. Also see the bf_max_job_part,
2879 bf_max_job_test and bf_max_job_user=# options. Default:
2880 0 (no limit), Min: 0, Max: bf_max_job_test.
2881
2882 bf_max_time=#
2883 The maximum time in seconds the backfill scheduler can
2884 spend (including time spent sleeping when locks are re‐
2885 leased) before discontinuing, even if maximum job counts
2886 have not been reached. This option applies only to
2887 SchedulerType=sched/backfill. The default value is the
2888 value of bf_interval (which defaults to 30 seconds). De‐
2889 fault: bf_interval value (def. 30 sec), Min: 1, Max: 3600
2890 (1h). NOTE: If bf_interval is short and bf_max_time is
2891 large, this may cause locks to be acquired too frequently
2892 and starve out other serviced RPCs. It's advisable if us‐
2893 ing this parameter to set max_rpc_cnt high enough that
2894 scheduling isn't always disabled, and low enough that the
2895 interactive workload can get through in a reasonable pe‐
2896 riod of time. max_rpc_cnt needs to be below 256 (the de‐
2897 fault RPC thread limit). Running around the middle (150)
2898 may give you good results. NOTE: When increasing the
2899 amount of time spent in the backfill scheduling cycle,
2900 Slurm can be prevented from responding to client requests
2901 in a timely manner. To address this you can use
2902 max_rpc_cnt to specify a number of queued RPCs before the
2903 scheduler stops so that it can respond to these requests.
2904
2905 bf_min_age_reserve=#
2906 The backfill and main scheduling logic will not reserve
2907 resources for pending jobs until they have been pending
2908 and runnable for at least the specified number of sec‐
2909 onds. In addition, jobs waiting for less than the speci‐
2910 fied number of seconds will not prevent a newly submitted
2911 job from starting immediately, even if the newly submit‐
2912 ted job has a lower priority. This can be valuable if
2913 jobs lack time limits or all time limits have the same
2914 value. The default value is zero, which will reserve re‐
2915 sources for any pending job and delay initiation of lower
2916 priority jobs. Also see bf_job_part_count_reserve and
2917 bf_min_prio_reserve. Default: 0, Min: 0, Max: 2592000
2918 (30 days).
2919
2920 bf_min_prio_reserve=#
2921 The backfill and main scheduling logic will not reserve
2922 resources for pending jobs unless they have a priority
2923 equal to or higher than the specified value. In addi‐
2924 tion, jobs with a lower priority will not prevent a newly
2925 submitted job from starting immediately, even if the
2926 newly submitted job has a lower priority. This can be
2927 valuable if one wished to maximize system utilization
2928 without regard for job priority below a certain thresh‐
2929 old. The default value is zero, which will reserve re‐
2930 sources for any pending job and delay initiation of lower
2931 priority jobs. Also see bf_job_part_count_reserve and
2932 bf_min_age_reserve. Default: 0, Min: 0, Max: 2^63.
2933
2934 bf_node_space_size=#
2935 Size of backfill node_space table. Adding a single job to
2936 backfill reservations in the worst case can consume two
2937 node_space records. In the case of large clusters, con‐
2938 figuring a relatively small value may be desirable. This
2939 option applies only to SchedulerType=sched/backfill.
2940 Also see bf_max_job_test and bf_running_job_reserve. De‐
2941 fault: bf_max_job_test, Min: 2, Max: 2,000,000.
2942
2943 bf_one_resv_per_job
2944 Disallow adding more than one backfill reservation per
2945 job. The scheduling logic builds a sorted list of job-
2946 partition pairs. Jobs submitted to multiple partitions
2947 have as many entries in the list as requested partitions.
2948 By default, the backfill scheduler may evaluate all the
2949 job-partition entries for a single job, potentially re‐
2950 serving resources for each pair, but only starting the
2951 job in the reservation offering the earliest start time.
2952 Having a single job reserving resources for multiple par‐
2953 titions could impede other jobs (or hetjob components)
2954 from reserving resources already reserved for the parti‐
2955 tions that don't offer the earliest start time. A single
2956 job that requests multiple partitions can also prevent
2957 itself from starting earlier in a lower priority parti‐
2958 tion if the partitions overlap nodes and a backfill
2959 reservation in the higher priority partition blocks nodes
2960 that are also in the lower priority partition. This op‐
2961 tion makes it so that a job submitted to multiple parti‐
2962 tions will stop reserving resources once the first job-
2963 partition pair has booked a backfill reservation. Subse‐
2964 quent pairs from the same job will only be tested to
2965 start now. This allows for other jobs to be able to book
2966 the other pairs resources at the cost of not guaranteeing
2967 that the multi partition job will start in the partition
2968 offering the earliest start time (unless it can start im‐
2969 mediately). This option is disabled by default.
2970
2971 bf_resolution=#
2972 The number of seconds in the resolution of data main‐
2973 tained about when jobs begin and end. Higher values re‐
2974 sult in better responsiveness and quicker backfill cycles
2975 by using larger blocks of time to determine node eligi‐
2976 bility. However, higher values lead to less efficient
2977 system planning, and may miss opportunities to improve
2978 system utilization. This option applies only to Sched‐
2979 ulerType=sched/backfill. Default: 60, Min: 1, Max: 3600
2980 (1 hour).
2981
2982 bf_running_job_reserve
2983 Add an extra step to backfill logic, which creates back‐
2984 fill reservations for jobs running on whole nodes. This
2985 option is disabled by default.
2986
2987 bf_window=#
2988 The number of minutes into the future to look when con‐
2989 sidering jobs to schedule. Higher values result in more
2990 overhead and less responsiveness. A value at least as
2991 long as the highest allowed time limit is generally ad‐
2992 visable to prevent job starvation. In order to limit the
2993 amount of data managed by the backfill scheduler, if the
2994 value of bf_window is increased, then it is generally ad‐
2995 visable to also increase bf_resolution. This option ap‐
2996 plies only to SchedulerType=sched/backfill. Default:
2997 1440 (1 day), Min: 1, Max: 43200 (30 days).
2998
2999 bf_window_linear=#
3000 For performance reasons, the backfill scheduler will de‐
3001 crease precision in calculation of job expected termina‐
3002 tion times. By default, the precision starts at 30 sec‐
3003 onds and that time interval doubles with each evaluation
3004 of currently executing jobs when trying to determine when
3005 a pending job can start. This algorithm can support an
3006 environment with many thousands of running jobs, but can
3007 result in the expected start time of pending jobs being
3008 gradually being deferred due to lack of precision. A
3009 value for bf_window_linear will cause the time interval
3010 to be increased by a constant amount on each iteration.
3011 The value is specified in units of seconds. For example,
3012 a value of 60 will cause the backfill scheduler on the
3013 first iteration to identify the job ending soonest and
3014 determine if the pending job can be started after that
3015 job plus all other jobs expected to end within 30 seconds
3016 (default initial value) of the first job. On the next it‐
3017 eration, the pending job will be evaluated for starting
3018 after the next job expected to end plus all jobs ending
3019 within 90 seconds of that time (30 second default, plus
3020 the 60 second option value). The third iteration will
3021 have a 150 second window and the fourth 210 seconds.
3022 Without this option, the time windows will double on each
3023 iteration and thus be 30, 60, 120, 240 seconds, etc. The
3024 use of bf_window_linear is not recommended with more than
3025 a few hundred simultaneously executing jobs.
3026
3027 bf_yield_interval=#
3028 The backfill scheduler will periodically relinquish locks
3029 in order for other pending operations to take place.
3030 This specifies the times when the locks are relinquished
3031 in microseconds. Smaller values may be helpful for high
3032 throughput computing when used in conjunction with the
3033 bf_continue option. Also see the bf_yield_sleep option.
3034 Default: 2,000,000 (2 sec), Min: 1, Max: 10,000,000 (10
3035 sec).
3036
3037 bf_yield_sleep=#
3038 The backfill scheduler will periodically relinquish locks
3039 in order for other pending operations to take place.
3040 This specifies the length of time for which the locks are
3041 relinquished in microseconds. Also see the bf_yield_in‐
3042 terval option. Default: 500,000 (0.5 sec), Min: 1, Max:
3043 10,000,000 (10 sec).
3044
3045 build_queue_timeout=#
3046 Defines the maximum time that can be devoted to building
3047 a queue of jobs to be tested for scheduling. If the sys‐
3048 tem has a huge number of jobs with dependencies, just
3049 building the job queue can take so much time as to ad‐
3050 versely impact overall system performance and this param‐
3051 eter can be adjusted as needed. The default value is
3052 2,000,000 microseconds (2 seconds).
3053
3054 correspond_after_task_cnt=#
3055                   Defines the number of array tasks that get split for a
3056                   potential aftercorr dependency check. A low number may
3057                   result in dependent task check failures when the job
3058                   depended upon is purged before the split. Default: 10.
3059
3060 default_queue_depth=#
3061 The default number of jobs to attempt scheduling (i.e.
3062 the queue depth) when a running job completes or other
3063 routine actions occur, however the frequency with which
3064 the scheduler is run may be limited by using the defer or
3065 sched_min_interval parameters described below. The full
3066 queue will be tested on a less frequent basis as defined
3067 by the sched_interval option described below. The default
3068 value is 100. See the partition_job_depth option to
3069 limit depth by partition.
3070
3071 defer Setting this option will avoid attempting to schedule
3072 each job individually at job submit time, but defer it
3073 until a later time when scheduling multiple jobs simulta‐
3074 neously may be possible. This option may improve system
3075 responsiveness when large numbers of jobs (many hundreds)
3076 are submitted at the same time, but it will delay the
3077 initiation time of individual jobs. Also see de‐
3078 fault_queue_depth above.
3079
3080 delay_boot=#
3081                   Do not reboot nodes in order to satisfy this job's fea‐
3082 ture specification if the job has been eligible to run
3083 for less than this time period. If the job has waited
3084 for less than the specified period, it will use only
3085 nodes which already have the specified features. The ar‐
3086 gument is in units of minutes. Individual jobs may over‐
3087 ride this default value with the --delay-boot option.
3088
3089 disable_job_shrink
3090 Deny user requests to shrink the size of running jobs.
3091 (However, running jobs may still shrink due to node fail‐
3092 ure if the --no-kill option was set.)
3093
3094 disable_hetjob_steps
3095 Disable job steps that span heterogeneous job alloca‐
3096 tions.
3097
3098 enable_hetjob_steps
3099 Enable job steps that span heterogeneous job allocations.
3100 The default value.
3101
3102 enable_user_top
3103 Enable use of the "scontrol top" command by non-privi‐
3104 leged users.
3105
3106 Ignore_NUMA
3107 Some processors (e.g. AMD Opteron 6000 series) contain
3108 multiple NUMA nodes per socket. This is a configuration
3109 which does not map into the hardware entities that Slurm
3110 optimizes resource allocation for (PU/thread, core,
3111 socket, baseboard, node and network switch). In order to
3112 optimize resource allocations on such hardware, Slurm
3113 will consider each NUMA node within the socket as a sepa‐
3114 rate socket by default. Use the Ignore_NUMA option to re‐
3115 port the correct socket count, but not optimize resource
3116 allocations on the NUMA nodes.
3117
3118                   NOTE: Since hwloc 2.0, NUMA nodes are not part of the
3119                   main/CPU topology tree. Because of that, if Slurm is
3120                   built with hwloc 2.0 or above, Slurm will treat
3121                   HWLOC_OBJ_PACKAGE as a socket. You can change this be‐
3122                   havior using SlurmdParameters=l3cache_as_socket.
3123
3124 ignore_prefer_validation
3125                   If set and a job requests --prefer, any features in the
3126                   request that would create an invalid request on the
3127                   current system will not generate an error. This is help‐
3128 ful for dynamic systems where nodes with features come
3129 and go. Please note using this option will not protect
3130 you from typos.
3131
3132 max_array_tasks
3133 Specify the maximum number of tasks that can be included
3134 in a job array. The default limit is MaxArraySize, but
3135 this option can be used to set a lower limit. For exam‐
3136 ple, max_array_tasks=1000 and MaxArraySize=100001 would
3137 permit a maximum task ID of 100000, but limit the number
3138 of tasks in any single job array to 1000.
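
                  The example above can be written in slurm.conf as (values
                  taken directly from the text):

```
# Task IDs up to 100000 are valid, but any single job array
# may contain at most 1000 tasks.
MaxArraySize=100001
SchedulerParameters=max_array_tasks=1000
```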
3139
3140 max_rpc_cnt=#
3141 If the number of active threads in the slurmctld daemon
3142 is equal to or larger than this value, defer scheduling
3143 of jobs. The scheduler will check this condition at cer‐
3144 tain points in code and yield locks if necessary. This
3145 can improve Slurm's ability to process requests at a cost
3146 of initiating new jobs less frequently. Default: 0 (op‐
3147 tion disabled), Min: 0, Max: 1000.
3148
3149 NOTE: The maximum number of threads (MAX_SERVER_THREADS)
3150 is internally set to 256 and defines the number of served
3151 RPCs at a given time. Setting max_rpc_cnt to more than
3152 256 will be only useful to let backfill continue schedul‐
3153 ing work after locks have been yielded (i.e. each 2 sec‐
3154 onds) if there are a maximum of MAX(max_rpc_cnt/10, 20)
3155                   RPCs in the queue. E.g. with max_rpc_cnt=1000, the
3156                   scheduler will be allowed to continue after yielding
3157                   locks only when there are 100 or fewer pending RPCs.
3158 If a value is set, then a value of 10 or higher is recom‐
3159 mended. It may require some tuning for each system, but
3160 needs to be high enough that scheduling isn't always dis‐
3161 abled, and low enough that requests can get through in a
3162 reasonable period of time.
3163
3164 max_sched_time=#
3165 How long, in seconds, that the main scheduling loop will
3166 execute for before exiting. If a value is configured, be
3167 aware that all other Slurm operations will be deferred
3168 during this time period. Make certain the value is lower
3169 than MessageTimeout. If a value is not explicitly con‐
3170 figured, the default value is half of MessageTimeout with
3171 a minimum default value of 1 second and a maximum default
3172 value of 2 seconds. For example if MessageTimeout=10,
3173 the time limit will be 2 seconds (i.e. MIN(10/2, 2) = 2).
3174
3175 max_script_size=#
3176 Specify the maximum size of a batch script, in bytes.
3177 The default value is 4 megabytes. Larger values may ad‐
3178 versely impact system performance.
3179
3180 max_switch_wait=#
3181 Maximum number of seconds that a job can delay execution
3182 waiting for the specified desired switch count. The de‐
3183 fault value is 300 seconds.
3184
3185 no_backup_scheduling
3186 If used, the backup controller will not schedule jobs
3187 when it takes over. The backup controller will allow jobs
3188 to be submitted, modified and cancelled but won't sched‐
3189 ule new jobs. This is useful in Cray environments when
3190 the backup controller resides on an external Cray node.
3191 A restart of slurmctld is required for changes to this
3192 parameter to take effect.
3193
3194 no_env_cache
3195                   If used, any job started on a node that fails to load
3196                   the environment will fail instead of using the cached
3197                   environment. This also implicitly enables the re‐
3198                   queue_setup_env_fail option.
3199
3200 nohold_on_prolog_fail
3201 By default, if the Prolog exits with a non-zero value the
3202 job is requeued in a held state. By specifying this pa‐
3203 rameter the job will be requeued but not held so that the
3204 scheduler can dispatch it to another host.
3205
3206 pack_serial_at_end
3207 If used with the select/cons_res or select/cons_tres
3208 plugin, then put serial jobs at the end of the available
3209 nodes rather than using a best fit algorithm. This may
3210 reduce resource fragmentation for some workloads.
3211
3212 partition_job_depth=#
3213 The default number of jobs to attempt scheduling (i.e.
3214 the queue depth) from each partition/queue in Slurm's
3215 main scheduling logic. The functionality is similar to
3216 that provided by the bf_max_job_part option for the back‐
3217 fill scheduling logic. The default value is 0 (no
3218                   limit). Jobs excluded from attempted scheduling based
3219 upon partition will not be counted against the de‐
3220 fault_queue_depth limit. Also see the bf_max_job_part
3221 option.
3222
3223 preempt_reorder_count=#
3224 Specify how many attempts should be made in reordering
3225 preemptable jobs to minimize the count of jobs preempted.
3226 The default value is 1. High values may adversely impact
3227 performance. The logic to support this option is only
3228 available in the select/cons_res and select/cons_tres
3229 plugins.
3230
3231 preempt_strict_order
3232 If set, then execute extra logic in an attempt to preempt
3233 only the lowest priority jobs. It may be desirable to
3234 set this configuration parameter when there are multiple
3235 priorities of preemptable jobs. The logic to support
3236 this option is only available in the select/cons_res and
3237 select/cons_tres plugins.
3238
3239 preempt_youngest_first
3240 If set, then the preemption sorting algorithm will be
3241 changed to sort by the job start times to favor preempt‐
3242 ing younger jobs over older. (Requires preempt/parti‐
3243 tion_prio or preempt/qos plugins.)
3244
3245 reduce_completing_frag
3246 This option is used to control how scheduling of re‐
3247 sources is performed when jobs are in the COMPLETING
3248 state, which influences potential fragmentation. If this
3249 option is not set then no jobs will be started in any
3250 partition when any job is in the COMPLETING state for
3251 less than CompleteWait seconds. If this option is set
3252 then no jobs will be started in any individual partition
3253 that has a job in COMPLETING state for less than Com‐
3254 pleteWait seconds. In addition, no jobs will be started
3255 in any partition with nodes that overlap with any nodes
3256 in the partition of the completing job. This option is
3257 to be used in conjunction with CompleteWait.
3258
3259 NOTE: CompleteWait must be set in order for this to work.
3260 If CompleteWait=0 then this option does nothing.
3261
3262 NOTE: reduce_completing_frag only affects the main sched‐
3263 uler, not the backfill scheduler.
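
                  A minimal sketch combining the two parameters (the 32-sec‐
                  ond CompleteWait value is an illustrative assumption):

```
# Jobs in COMPLETING state only block scheduling in their own
# partition (and partitions with overlapping nodes) for up to
# CompleteWait seconds; CompleteWait must be non-zero.
CompleteWait=32
SchedulerParameters=reduce_completing_frag
```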
3264
3265 requeue_setup_env_fail
3266                   By default, if job environment setup fails, the job keeps
3267 running with a limited environment. By specifying this
3268 parameter the job will be requeued in held state and the
3269 execution node drained.
3270
3271 salloc_wait_nodes
3272 If defined, the salloc command will wait until all allo‐
3273 cated nodes are ready for use (i.e. booted) before the
3274 command returns. By default, salloc will return as soon
3275 as the resource allocation has been made.
3276
3277 sbatch_wait_nodes
3278 If defined, the sbatch script will wait until all allo‐
3279 cated nodes are ready for use (i.e. booted) before the
3280 initiation. By default, the sbatch script will be initi‐
3281 ated as soon as the first node in the job allocation is
3282 ready. The sbatch command can use the --wait-all-nodes
3283 option to override this configuration parameter.
3284
3285 sched_interval=#
3286 How frequently, in seconds, the main scheduling loop will
3287 execute and test all pending jobs. The default value is
3288 60 seconds. A setting of -1 will disable the main sched‐
3289 uling loop.
3290
3291 sched_max_job_start=#
3292 The maximum number of jobs that the main scheduling logic
3293 will start in any single execution. The default value is
3294 zero, which imposes no limit.
3295
3296 sched_min_interval=#
3297 How frequently, in microseconds, the main scheduling loop
3298 will execute and test any pending jobs. The scheduler
3299 runs in a limited fashion every time that any event hap‐
3300 pens which could enable a job to start (e.g. job submit,
3301 job terminate, etc.). If these events happen at a high
3302 frequency, the scheduler can run very frequently and con‐
3303 sume significant resources if not throttled by this op‐
3304 tion. This option specifies the minimum time between the
3305 end of one scheduling cycle and the beginning of the next
3306 scheduling cycle. A value of zero will disable throt‐
3307 tling of the scheduling logic interval. The default
3308 value is 2 microseconds.
3309
3310 spec_cores_first
3311 Specialized cores will be selected from the first cores
3312 of the first sockets, cycling through the sockets on a
3313 round robin basis. By default, specialized cores will be
3314 selected from the last cores of the last sockets, cycling
3315 through the sockets on a round robin basis.
3316
3317 step_retry_count=#
3318 When a step completes and there are steps ending resource
3319 allocation, then retry step allocations for at least this
3320 number of pending steps. Also see step_retry_time. The
3321 default value is 8 steps.
3322
3323 step_retry_time=#
3324 When a step completes and there are steps ending resource
3325 allocation, then retry step allocations for all steps
3326 which have been pending for at least this number of sec‐
3327 onds. Also see step_retry_count. The default value is
3328 60 seconds.
3329
3330 whole_hetjob
3331 Requests to cancel, hold or release any component of a
3332 heterogeneous job will be applied to all components of
3333 the job.
3334
3335 NOTE: this option was previously named whole_pack and
3336                   this is still supported for backward compatibility.
3337
3338 SchedulerTimeSlice
3339 Number of seconds in each time slice when gang scheduling is en‐
3340 abled (PreemptMode=SUSPEND,GANG). The value must be between 5
3341 seconds and 65533 seconds. The default value is 30 seconds.
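
       For example, gang scheduling with the default time slice could be con‐
       figured as follows (a sketch using the PreemptMode value named above):

```
# Gang scheduling: suspend and resume jobs in 30-second time slices.
PreemptMode=SUSPEND,GANG
SchedulerTimeSlice=30
```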
3342
3343 SchedulerType
3344 Identifies the type of scheduler to be used. A restart of
3345 slurmctld is required for changes to this parameter to take ef‐
3346 fect. The scontrol command can be used to manually change job
3347 priorities if desired. Acceptable values include:
3348
3349 sched/backfill
3350 For a backfill scheduling module to augment the default
3351 FIFO scheduling. Backfill scheduling will initiate
3352 lower-priority jobs if doing so does not delay the ex‐
3353 pected initiation time of any higher priority job. Ef‐
3354 fectiveness of backfill scheduling is dependent upon
3355 users specifying job time limits, otherwise all jobs will
3356 have the same time limit and backfilling is impossible.
3357                   See the documentation for the SchedulerParameters option
3358 above. This is the default configuration.
3359
3360 sched/builtin
3361 This is the FIFO scheduler which initiates jobs in prior‐
3362 ity order. If any job in the partition can not be sched‐
3363 uled, no lower priority job in that partition will be
3364 scheduled. An exception is made for jobs that can not
3365 run due to partition constraints (e.g. the time limit) or
3366 down/drained nodes. In that case, lower priority jobs
3367 can be initiated and not impact the higher priority job.
3368
3369 ScronParameters
3370 Multiple options may be comma separated.
3371
3372 enable Enable the use of scrontab to submit and manage periodic
3373 repeating jobs.
3374
3375 SelectType
3376 Identifies the type of resource selection algorithm to be used.
3377 A restart of slurmctld and slurmd is required for changes to
3378 this parameter to take effect. When changed, all job information
3379 (running and pending) will be lost, since the job state save
3380 format used by each plugin is different. The only exception to
3381 this is when changing from cons_res to cons_tres or from
3382 cons_tres to cons_res. However, if a job contains cons_tres-spe‐
3383 cific features and then SelectType is changed to cons_res, the
3384 job will be canceled, since there is no way for cons_res to sat‐
3385 isfy requirements specific to cons_tres.
3386
3387 Acceptable values include
3388
3389 select/cons_res
3390 The resources (cores and memory) within a node are indi‐
3391 vidually allocated as consumable resources. Note that
3392 whole nodes can be allocated to jobs for selected parti‐
3393 tions by using the OverSubscribe=Exclusive option. See
3394 the partition OverSubscribe parameter for more informa‐
3395 tion.
3396
3397 select/cons_tres
3398 The resources (cores, memory, GPUs and all other track‐
3399 able resources) within a node are individually allocated
3400 as consumable resources. Note that whole nodes can be
3401 allocated to jobs for selected partitions by using the
3402 OverSubscribe=Exclusive option. See the partition Over‐
3403 Subscribe parameter for more information.
3404
3405 select/cray_aries
3406 for a Cray system. The default value is "se‐
3407 lect/cray_aries" for all Cray systems.
3408
3409 select/linear
3410 for allocation of entire nodes assuming a one-dimensional
3411 array of nodes in which sequentially ordered nodes are
3412 preferable. For a heterogeneous cluster (e.g. different
3413 CPU counts on the various nodes), resource allocations
3414 will favor nodes with high CPU counts as needed based
3415 upon the job's node and CPU specification if TopologyPlu‐
3416 gin=topology/none is configured. Use of other topology
3417 plugins with select/linear and heterogeneous nodes is not
3418 recommended and may result in valid job allocation re‐
3419 quests being rejected. The linear plugin is not designed
3420 to track generic resources on a node. In cases where
3421 generic resources (such as GPUs) need to be tracked, the
3422 cons_res or cons_tres plugins should be used instead.
3423 This is the default value.
3424
3425 SelectTypeParameters
3426 The permitted values of SelectTypeParameters depend upon the
3427 configured value of SelectType. The only supported options for
3428 SelectType=select/linear are CR_ONE_TASK_PER_CORE and CR_Memory,
3429 which treats memory as a consumable resource and prevents memory
3430 over subscription with job preemption or gang scheduling. By
3431 default SelectType=select/linear allocates whole nodes to jobs
3432 without considering their memory consumption. By default Se‐
3433 lectType=select/cons_res, SelectType=select/cray_aries, and Se‐
3434 lectType=select/cons_tres, use CR_Core_Memory, which allocates
3435          cores to jobs while considering their memory consumption.
3436
3437 A restart of slurmctld is required for changes to this parameter
3438 to take effect.
3439
3440 The following options are supported for SelectType=se‐
3441 lect/cray_aries:
3442
3443 OTHER_CONS_RES
3444                   Layer the select/cons_res plugin under the se‐
3445                   lect/cray_aries plugin; the default is to layer on se‐
3446                   lect/linear. This also allows all the options available
3447 for SelectType=select/cons_res.
3448
3449 OTHER_CONS_TRES
3450                   Layer the select/cons_tres plugin under the se‐
3451                   lect/cray_aries plugin; the default is to layer on se‐
3452                   lect/linear. This also allows all the options available
3453 for SelectType=select/cons_tres.
3454
3455 The following options are supported by the SelectType=select/cons_res
3456 and SelectType=select/cons_tres plugins:
3457
3458 CR_CPU CPUs are consumable resources. Configure the number of
3459 CPUs on each node, which may be equal to the count of
3460 cores or hyper-threads on the node depending upon the de‐
3461 sired minimum resource allocation. The node's Boards,
3462 Sockets, CoresPerSocket and ThreadsPerCore may optionally
3463 be configured and result in job allocations which have
3464 improved locality; however doing so will prevent more
3465 than one job from being allocated on each core.
3466
3467 CR_CPU_Memory
3468 CPUs and memory are consumable resources. Configure the
3469 number of CPUs on each node, which may be equal to the
3470 count of cores or hyper-threads on the node depending
3471 upon the desired minimum resource allocation. The node's
3472 Boards, Sockets, CoresPerSocket and ThreadsPerCore may
3473 optionally be configured and result in job allocations
3474 which have improved locality; however doing so will pre‐
3475 vent more than one job from being allocated on each core.
3476 Setting a value for DefMemPerCPU is strongly recommended.
3477
3478 CR_Core
3479 Cores are consumable resources. On nodes with hy‐
3480 per-threads, each thread is counted as a CPU to satisfy a
3481 job's resource requirement, but multiple jobs are not al‐
3482 located threads on the same core. The count of CPUs al‐
3483 located to a job is rounded up to account for every CPU
3484                   on an allocated core. This also impacts total allocated
3485                   memory when --mem-per-cpu is used, making it a multiple
3486                   of the total number of CPUs on the allocated cores.
3487
3488 CR_Core_Memory
3489 Cores and memory are consumable resources. On nodes with
3490 hyper-threads, each thread is counted as a CPU to satisfy
3491 a job's resource requirement, but multiple jobs are not
3492 allocated threads on the same core. The count of CPUs
3493 allocated to a job may be rounded up to account for every
3494 CPU on an allocated core. Setting a value for DefMemPer‐
3495 CPU is strongly recommended.
3496
3497 CR_ONE_TASK_PER_CORE
3498 Allocate one task per core by default. Without this op‐
3499 tion, by default one task will be allocated per thread on
3500 nodes with more than one ThreadsPerCore configured.
3501 NOTE: This option cannot be used with CR_CPU*.
3502
3503 CR_CORE_DEFAULT_DIST_BLOCK
3504 Allocate cores within a node using block distribution by
3505 default. This is a pseudo-best-fit algorithm that mini‐
3506 mizes the number of boards and minimizes the number of
3507 sockets (within minimum boards) used for the allocation.
3508 This default behavior can be overridden specifying a par‐
3509 ticular "-m" parameter with srun/salloc/sbatch. Without
3510 this option, cores will be allocated cyclically across
3511 the sockets.
3512
3513 CR_LLN Schedule resources to jobs on the least loaded nodes
3514 (based upon the number of idle CPUs). This is generally
3515 only recommended for an environment with serial jobs as
3516 idle resources will tend to be highly fragmented, result‐
3517 ing in parallel jobs being distributed across many nodes.
3518 Note that node Weight takes precedence over how many idle
3519 resources are on each node. Also see the partition con‐
3520                   figuration parameter LLN to use the least loaded nodes
3521                   in selected partitions.
3522
3523 CR_Pack_Nodes
3524 If a job allocation contains more resources than will be
3525 used for launching tasks (e.g. if whole nodes are allo‐
3526 cated to a job), then rather than distributing a job's
3527 tasks evenly across its allocated nodes, pack them as
3528 tightly as possible on these nodes. For example, con‐
3529 sider a job allocation containing two entire nodes with
3530 eight CPUs each. If the job starts ten tasks across
3531 those two nodes without this option, it will start five
3532 tasks on each of the two nodes. With this option, eight
3533 tasks will be started on the first node and two tasks on
3534 the second node. This can be superseded by "NoPack" in
3535 srun's "--distribution" option. CR_Pack_Nodes only ap‐
3536 plies when the "block" task distribution method is used.
3537
3538 CR_Socket
3539 Sockets are consumable resources. On nodes with multiple
3540 cores, each core or thread is counted as a CPU to satisfy
3541 a job's resource requirement, but multiple jobs are not
3542 allocated resources on the same socket.
3543
3544 CR_Socket_Memory
3545 Memory and sockets are consumable resources. On nodes
3546 with multiple cores, each core or thread is counted as a
3547 CPU to satisfy a job's resource requirement, but multiple
3548 jobs are not allocated resources on the same socket.
3549 Setting a value for DefMemPerCPU is strongly recommended.
3550
3551 CR_Memory
3552 Memory is a consumable resource. NOTE: This implies
3553 OverSubscribe=YES or OverSubscribe=FORCE for all parti‐
3554 tions. Setting a value for DefMemPerCPU is strongly rec‐
3555 ommended.
3556
3557                   NOTE: If memory isn't configured as a consumable re‐
3558                   source (CR_CPU, CR_Core or CR_Socket without _Memory),
3559                   memory can be over‐
3560 subscribed. In this case the --mem option is only used to
3561 filter out nodes with lower configured memory and does
3562 not take running jobs into account. For instance, two
3563 jobs requesting all the memory of a node can run at the
3564 same time.
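
              Putting the pieces together, a common consumable-resource con‐
              figuration might look like the following sketch (the DefMemPer‐
              CPU value is an illustrative assumption):

```
# Allocate individual cores and memory; jobs that omit a memory
# request default to 2048 MB per allocated CPU.
SelectType=select/cons_tres
SelectTypeParameters=CR_Core_Memory
DefMemPerCPU=2048
```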
3565
3566 SlurmctldAddr
3567 An optional address to be used for communications to the cur‐
3568 rently active slurmctld daemon, normally used with Virtual IP
3569 addressing of the currently active server. If this parameter is
3570 not specified then each primary and backup server will have its
3571 own unique address used for communications as specified in the
3572 SlurmctldHost parameter. If this parameter is specified then
3573 the SlurmctldHost parameter will still be used for communica‐
3574 tions to specific slurmctld primary or backup servers, for exam‐
3575 ple to cause all of them to read the current configuration files
3576              or shutdown.  Also see the SlurmctldPrimaryOffProg and Slurm‐
3577              ctldPrimaryOnProg configuration parameters for configuring
3578              programs that manage the virtual IP address.
3579
3580 SlurmctldDebug
3581              The level of detail to provide in the slurmctld daemon's logs.
3582              The default value is info.  If the slurmctld daemon is initi‐
3583              ated with the -v or --verbose options, that debug level will be
3584              preserved or restored upon reconfiguration.
3585
3586 quiet Log nothing
3587
3588 fatal Log only fatal errors
3589
3590 error Log only errors
3591
3592 info Log errors and general informational messages
3593
3594 verbose Log errors and verbose informational messages
3595
3596 debug Log errors and verbose informational messages and de‐
3597 bugging messages
3598
3599 debug2 Log errors and verbose informational messages and more
3600 debugging messages
3601
3602 debug3 Log errors and verbose informational messages and even
3603 more debugging messages
3604
3605 debug4 Log errors and verbose informational messages and even
3606 more debugging messages
3607
3608 debug5 Log errors and verbose informational messages and even
3609 more debugging messages
3610
3611 SlurmctldHost
3612              The short, or long, hostname of the machine where the Slurm
3613              control daemon is executed (i.e. the name returned by the com‐
3614              mand "hostname -s"). This hostname is optionally followed by the address,
3615 either the IP address or a name by which the address can be
3616 identified, enclosed in parentheses (e.g. SlurmctldHost=slurm‐
3617 ctl-primary(12.34.56.78)). This value must be specified at least
3618 once. If specified more than once, the first hostname named will
3619 be where the daemon runs. If the first specified host fails,
3620              the daemon will execute on the second host.  If both the first
3621              and second specified hosts fail, the daemon will execute on the
3622 third host. A restart of slurmctld is required for changes to
3623 this parameter to take effect.
3624
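              For example, a primary controller with two backups, each with an
              explicit address, could be configured as follows (hostnames and
              addresses here are illustrative):

              ```conf
              # Failover order: primary first, then backups.
              SlurmctldHost=ctl-primary(10.0.0.1)
              SlurmctldHost=ctl-backup1(10.0.0.2)
              SlurmctldHost=ctl-backup2(10.0.0.3)
              ```
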
3625 SlurmctldLogFile
3626 Fully qualified pathname of a file into which the slurmctld dae‐
3627 mon's logs are written. The default value is none (performs
3628 logging via syslog).
3629 See the section LOGGING if a pathname is specified.
3630
3631 SlurmctldParameters
3632 Multiple options may be comma separated.
3633
3634 allow_user_triggers
3635 Permit setting triggers from non-root/slurm_user users.
3636 SlurmUser must also be set to root to permit these trig‐
3637 gers to work. See the strigger man page for additional
3638 details.
3639
3640 cloud_dns
3641 By default, Slurm expects that the network address for a
3642 cloud node won't be known until the creation of the node
3643 and that Slurm will be notified of the node's address
3644 (e.g. scontrol update nodename=<name> nodeaddr=<addr>).
3645 Since Slurm communications rely on the node configuration
3646 found in the slurm.conf, Slurm will tell the client com‐
3647                     mand, after waiting for all nodes to boot, each node's IP
3648 address. However, in environments where the nodes are in
3649 DNS, this step can be avoided by configuring this option.
3650
3651 cloud_reg_addrs
3652 When a cloud node registers, the node's NodeAddr and
3653 NodeHostName will automatically be set. They will be re‐
3654 set back to the nodename after powering off.
3655
3656 enable_configless
3657 Permit "configless" operation by the slurmd, slurmstepd,
3658 and user commands. When enabled the slurmd will be per‐
3659 mitted to retrieve config files from the slurmctld, and
3660 on any 'scontrol reconfigure' command new configs will be
3661 automatically pushed out and applied to nodes that are
3662 running in this "configless" mode. A restart of slurm‐
3663 ctld is required for changes to this parameter to take
3664 effect. NOTE: Included files with the Include directive
3665 will only be pushed if the filename has no path separa‐
3666 tors and is located adjacent to slurm.conf.
3667
3668 idle_on_node_suspend
3669 Mark nodes as idle, regardless of current state, when
3670 suspending nodes with SuspendProgram so that nodes will
3671 be eligible to be resumed at a later time.
3672
3673 node_reg_mem_percent=#
3674 Percentage of memory a node is allowed to register with
3675 without being marked as invalid with low memory. Default
3676 is 100. For State=CLOUD nodes, the default is 90. To dis‐
3677 able this for cloud nodes set it to 100. config_overrides
3678 takes precedence over this option.
3679
3680                     It is recommended to configure task/cgroup with Con‐
3681                     strainRamSpace.  A memory cgroup limit won't be set
3682                     higher than the actual memory on the node. If needed,
3683                     configure AllowedRamSpace in the cgroup.conf to add a
3684                     buffer.
3684
3685 power_save_interval
3686 How often the power_save thread looks to resume and sus‐
3687 pend nodes. The power_save thread will do work sooner if
3688 there are node state changes. Default is 10 seconds.
3689
3690 power_save_min_interval
3691 How often the power_save thread, at a minimum, looks to
3692 resume and suspend nodes. Default is 0.
3693
3694 max_dbd_msg_action
3695 Action used once MaxDBDMsgs is reached, options are 'dis‐
3696 card' (default) and 'exit'.
3697
3698                     When 'discard' is specified and MaxDBDMsgs is reached,
3699                     we start by purging pending messages of types Step start
3700                     and complete; when MaxDBDMsgs is reached again, Job
3701                     start messages are purged. Job completes and node state
3702                     changes continue to consume the space freed by these
3703                     purges until MaxDBDMsgs is reached again, at which point
3704                     no new message is tracked, creating data loss and poten‐
3705                     tially runaway jobs.
3706
3707 When 'exit' is specified and MaxDBDMsgs is reached the
3708 slurmctld will exit instead of discarding any messages.
3709                     It will be impossible to start the slurmctld with this
3710                     option when the slurmdbd is down and the slurmctld is
3711                     tracking more than MaxDBDMsgs.
3712
3713 preempt_send_user_signal
3714 Send the user signal (e.g. --signal=<sig_num>) at preemp‐
3715 tion time even if the signal time hasn't been reached. In
3716 the case of a gracetime preemption the user signal will
3717 be sent if the user signal has been specified and not
3718 sent, otherwise a SIGTERM will be sent to the tasks.
3719
3720 reboot_from_controller
3721 Run the RebootProgram from the controller instead of on
3722 the slurmds. The RebootProgram will be passed a
3723 comma-separated list of nodes to reboot as the first ar‐
3724 gument and if applicable the required features needed for
3725 reboot as the second argument.
3726
3727 user_resv_delete
3728 Allow any user able to run in a reservation to delete it.
3729
3730 SlurmctldPidFile
3731 Fully qualified pathname of a file into which the slurmctld
3732 daemon may write its process id. This may be used for automated
3733 signal processing. The default value is "/var/run/slurm‐
3734 ctld.pid".
3735
3736 SlurmctldPlugstack
3737 A comma-delimited list of Slurm controller plugins to be started
3738 when the daemon begins and terminated when it ends. Only the
3739 plugin's init and fini functions are called.
3740
3741 SlurmctldPort
3742 The port number that the Slurm controller, slurmctld, listens to
3743 for work. The default value is SLURMCTLD_PORT as established at
3744 system build time. If none is explicitly specified, it will be
3745 set to 6817. SlurmctldPort may also be configured to support a
3746 range of port numbers in order to accept larger bursts of incom‐
3747 ing messages by specifying two numbers separated by a dash (e.g.
3748 SlurmctldPort=6817-6818). A restart of slurmctld is required
3749              for changes to this parameter to take effect.  NOTE: Either the
3750              slurmctld and slurmd daemons must not execute on the same
3751              nodes, or the values of SlurmctldPort and SlurmdPort must be
3752              different.
3752
3753 Note: On Cray systems, Realm-Specific IP Addressing (RSIP) will
3754 automatically try to interact with anything opened on ports
3755 8192-60000. Configure SlurmctldPort to use a port outside of
3756 the configured SrunPortRange and RSIP's port range.
3757
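              As a sketch, a controller accepting message bursts on a two-port
              range while slurmd listens on its own port could be configured as
              follows (port values illustrative):

              ```conf
              SlurmctldPort=6817-6818
              SlurmdPort=6819
              ```
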
3758 SlurmctldPrimaryOffProg
3759 This program is executed when a slurmctld daemon running as the
3760 primary server becomes a backup server. By default no program is
3761 executed. See also the related "SlurmctldPrimaryOnProg" parame‐
3762 ter.
3763
3764 SlurmctldPrimaryOnProg
3765 This program is executed when a slurmctld daemon running as a
3766 backup server becomes the primary server. By default no program
3767              is executed. When using virtual IP addresses to manage Highly
3768              Available Slurm services, this program can be used to add the IP
3769 address to an interface (and optionally try to kill the unre‐
3770 sponsive slurmctld daemon and flush the ARP caches on nodes on
3771 the local Ethernet fabric). See also the related "SlurmctldPri‐
3772 maryOffProg" parameter.
3773
3774 SlurmctldSyslogDebug
3775 The slurmctld daemon will log events to the syslog file at the
3776              specified level of detail. If not set, the slurmctld daemon will
3777              log to syslog at level fatal, unless there is no SlurmctldLog‐
3778              File and it is running in the background, in which case it will
3779              log to syslog at the level specified by SlurmctldDebug (at fatal
3780              if SlurmctldDebug is set to quiet); if it is run in the fore‐
3781              ground, the level will be set to quiet.
3782
3783 quiet Log nothing
3784
3785 fatal Log only fatal errors
3786
3787 error Log only errors
3788
3789 info Log errors and general informational messages
3790
3791 verbose Log errors and verbose informational messages
3792
3793 debug Log errors and verbose informational messages and de‐
3794 bugging messages
3795
3796 debug2 Log errors and verbose informational messages and more
3797 debugging messages
3798
3799 debug3 Log errors and verbose informational messages and even
3800 more debugging messages
3801
3802 debug4 Log errors and verbose informational messages and even
3803 more debugging messages
3804
3805 debug5 Log errors and verbose informational messages and even
3806 more debugging messages
3807
3808 NOTE: By default, Slurm's systemd service files start daemons in
3809 the foreground with the -D option. This means that systemd will
3810 capture stdout/stderr output and print that to syslog, indepen‐
3811 dent of Slurm printing to syslog directly. To prevent systemd
3812 from doing this, add "StandardOutput=null" and "StandardEr‐
3813 ror=null" to the respective service files or override files.
3814
3815 SlurmctldTimeout
3816 The interval, in seconds, that the backup controller waits for
3817 the primary controller to respond before assuming control. The
3818 default value is 120 seconds. May not exceed 65533.
3819
3820 SlurmdDebug
3821              The level of detail to provide in the slurmd daemon's logs.
3822              The default value is info.
3823
3824 quiet Log nothing
3825
3826 fatal Log only fatal errors
3827
3828 error Log only errors
3829
3830 info Log errors and general informational messages
3831
3832 verbose Log errors and verbose informational messages
3833
3834 debug Log errors and verbose informational messages and de‐
3835 bugging messages
3836
3837 debug2 Log errors and verbose informational messages and more
3838 debugging messages
3839
3840 debug3 Log errors and verbose informational messages and even
3841 more debugging messages
3842
3843 debug4 Log errors and verbose informational messages and even
3844 more debugging messages
3845
3846 debug5 Log errors and verbose informational messages and even
3847 more debugging messages
3848
3849 SlurmdLogFile
3850 Fully qualified pathname of a file into which the slurmd dae‐
3851 mon's logs are written. The default value is none (performs
3852 logging via syslog). The first "%h" within the name is replaced
3853 with the hostname on which the slurmd is running. The first
3854 "%n" within the name is replaced with the Slurm node name on
3855 which the slurmd is running.
3856 See the section LOGGING if a pathname is specified.
3857
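              For example, to give each host its own log file using the "%h"
              substitution (path illustrative):

              ```conf
              # "%h" expands to the hostname on which slurmd is running.
              SlurmdLogFile=/var/log/slurm/slurmd.%h.log
              ```
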
3858 SlurmdParameters
3859 Parameters specific to the Slurmd. Multiple options may be
3860 comma separated.
3861
3862 config_overrides
3863 If set, consider the configuration of each node to be
3864 that specified in the slurm.conf configuration file and
3865 any node with less than the configured resources will not
3866 be set to INVAL/INVALID_REG. This option is generally
3867 only useful for testing purposes. Equivalent to the now
3868 deprecated FastSchedule=2 option.
3869
3870 l3cache_as_socket
3871 Use the hwloc l3cache as the socket count. Can be useful
3872 on certain processors where the socket level is too
3873 coarse, and the l3cache may provide better task distribu‐
3874 tion. (E.g., along CCX boundaries instead of socket
3875 boundaries.) Mutually exclusive with
3876 numa_node_as_socket. Requires hwloc v2.
3877
3878 numa_node_as_socket
3879                     Use the hwloc NUMA node to determine the main hierarchy
3880                     object to be used as the socket. If the option is set,
3881                     Slurm will check the parent object of the NUMA node and
3882                     use it as the socket. This option may be useful for ar‐
3883                     chitectures like AMD Epyc, where the number of NUMA
3884                     nodes per socket may be configured.  Mutually exclusive
3885                     with l3cache_as_socket.  Requires hwloc v2.
3886
3887 shutdown_on_reboot
3888 If set, the Slurmd will shut itself down when a reboot
3889 request is received.
3890
3891 SlurmdPidFile
3892 Fully qualified pathname of a file into which the slurmd daemon
3893 may write its process id. This may be used for automated signal
3894 processing. The first "%h" within the name is replaced with the
3895 hostname on which the slurmd is running. The first "%n" within
3896 the name is replaced with the Slurm node name on which the
3897 slurmd is running. The default value is "/var/run/slurmd.pid".
3898
3899 SlurmdPort
3900 The port number that the Slurm compute node daemon, slurmd, lis‐
3901 tens to for work. The default value is SLURMD_PORT as estab‐
3902 lished at system build time. If none is explicitly specified,
3903 its value will be 6818. A restart of slurmctld is required for
3904              changes to this parameter to take effect.  NOTE: Either the
3905              slurmctld and slurmd daemons must not execute on the same
3906              nodes, or the values of SlurmctldPort and SlurmdPort must be
3907              different.
3907
3908 Note: On Cray systems, Realm-Specific IP Addressing (RSIP) will
3909 automatically try to interact with anything opened on ports
3910 8192-60000. Configure SlurmdPort to use a port outside of the
3911 configured SrunPortRange and RSIP's port range.
3912
3913 SlurmdSpoolDir
3914 Fully qualified pathname of a directory into which the slurmd
3915 daemon's state information and batch job script information are
3916 written. This must be a common pathname for all nodes, but
3917 should represent a directory which is local to each node (refer‐
3918 ence a local file system). The default value is
3919 "/var/spool/slurmd". The first "%h" within the name is replaced
3920 with the hostname on which the slurmd is running. The first
3921 "%n" within the name is replaced with the Slurm node name on
3922 which the slurmd is running.
3923
3924 SlurmdSyslogDebug
3925 The slurmd daemon will log events to the syslog file at the
3926              specified level of detail. If not set, the slurmd daemon will
3927              log to syslog at level fatal, unless there is no SlurmdLogFile
3928              and it is running in the background, in which case it will log
3929              to syslog at the level specified by SlurmdDebug (at fatal if
3930              SlurmdDebug is set to quiet); if it is run in the foreground,
3931              the level will be set to quiet.
3932
3933 quiet Log nothing
3934
3935 fatal Log only fatal errors
3936
3937 error Log only errors
3938
3939 info Log errors and general informational messages
3940
3941 verbose Log errors and verbose informational messages
3942
3943 debug Log errors and verbose informational messages and de‐
3944 bugging messages
3945
3946 debug2 Log errors and verbose informational messages and more
3947 debugging messages
3948
3949 debug3 Log errors and verbose informational messages and even
3950 more debugging messages
3951
3952 debug4 Log errors and verbose informational messages and even
3953 more debugging messages
3954
3955 debug5 Log errors and verbose informational messages and even
3956 more debugging messages
3957
3958 NOTE: By default, Slurm's systemd service files start daemons in
3959 the foreground with the -D option. This means that systemd will
3960 capture stdout/stderr output and print that to syslog, indepen‐
3961 dent of Slurm printing to syslog directly. To prevent systemd
3962 from doing this, add "StandardOutput=null" and "StandardEr‐
3963 ror=null" to the respective service files or override files.
3964
3965 SlurmdTimeout
3966 The interval, in seconds, that the Slurm controller waits for
3967 slurmd to respond before configuring that node's state to DOWN.
3968 A value of zero indicates the node will not be tested by slurm‐
3969 ctld to confirm the state of slurmd, the node will not be auto‐
3970 matically set to a DOWN state indicating a non-responsive
3971 slurmd, and some other tool will take responsibility for moni‐
3972 toring the state of each compute node and its slurmd daemon.
3973 Slurm's hierarchical communication mechanism is used to ping the
3974 slurmd daemons in order to minimize system noise and overhead.
3975 The default value is 300 seconds. The value may not exceed
3976 65533 seconds.
3977
3978 SlurmdUser
3979 The name of the user that the slurmd daemon executes as. This
3980 user must exist on all nodes of the cluster for authentication
3981 of communications between Slurm components. The default value
3982 is "root".
3983
3984 SlurmSchedLogFile
3985 Fully qualified pathname of the scheduling event logging file.
3986 The syntax of this parameter is the same as for SlurmctldLog‐
3987 File. In order to configure scheduler logging, set both the
3988 SlurmSchedLogFile and SlurmSchedLogLevel parameters.
3989
3990 SlurmSchedLogLevel
3991 The initial level of scheduling event logging, similar to the
3992 SlurmctldDebug parameter used to control the initial level of
3993 slurmctld logging. Valid values for SlurmSchedLogLevel are "0"
3994 (scheduler logging disabled) and "1" (scheduler logging en‐
3995 abled). If this parameter is omitted, the value defaults to "0"
3996 (disabled). In order to configure scheduler logging, set both
3997 the SlurmSchedLogFile and SlurmSchedLogLevel parameters. The
3998 scheduler logging level can be changed dynamically using scon‐
3999 trol.
4000
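              To turn on scheduler event logging, both parameters are set
              together, for example (path illustrative):

              ```conf
              SlurmSchedLogFile=/var/log/slurm/sched.log
              SlurmSchedLogLevel=1
              ```
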
4001 SlurmUser
4002 The name of the user that the slurmctld daemon executes as. For
4003 security purposes, a user other than "root" is recommended.
4004 This user must exist on all nodes of the cluster for authentica‐
4005 tion of communications between Slurm components. The default
4006 value is "root".
4007
4008 SrunEpilog
4009 Fully qualified pathname of an executable to be run by srun fol‐
4010 lowing the completion of a job step. The command line arguments
4011 for the executable will be the command and arguments of the job
4012 step. This configuration parameter may be overridden by srun's
4013 --epilog parameter. Note that while the other "Epilog" executa‐
4014 bles (e.g., TaskEpilog) are run by slurmd on the compute nodes
4015 where the tasks are executed, the SrunEpilog runs on the node
4016 where the "srun" is executing.
4017
4018 SrunPortRange
4019              srun creates a set of listening ports to communicate with the
4020              controller and the slurmstepd, and to handle the application
4021              I/O. By default these ports are ephemeral, meaning the port num‐
4022              bers are selected by the kernel. Using this parameter allows
4023              sites to configure a range of ports from which srun ports will
4024              be selected. This is useful if sites want to allow only a cer‐
4025              tain port range on their network.
4026
4027 Note: On Cray systems, Realm-Specific IP Addressing (RSIP) will
4028 automatically try to interact with anything opened on ports
4029 8192-60000. Configure SrunPortRange to use a range of ports
4030 above those used by RSIP, ideally 1000 or more ports, for exam‐
4031 ple "SrunPortRange=60001-63000".
4032
4033 Note: SrunPortRange must be large enough to cover the expected
4034 number of srun ports created on a given submission node. A sin‐
4035 gle srun opens 3 listening ports plus 2 more for every 48 hosts.
4036 Example:
4037
4038 srun -N 48 will use 5 listening ports.
4039
4040 srun -N 50 will use 7 listening ports.
4041
4042 srun -N 200 will use 13 listening ports.
4043
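              The counts above follow a simple pattern: 3 base ports plus 2
              more for each whole or partial group of 48 hosts. The following
              Python sketch (the formula is inferred from the examples above,
              not taken from the Slurm source) reproduces them:

              ```python
              import math

              def srun_listen_ports(nhosts: int) -> int:
                  # 3 base listening ports, plus 2 for every (whole or
                  # partial) group of 48 hosts in the allocation.
                  return 3 + 2 * math.ceil(nhosts / 48)

              print(srun_listen_ports(48), srun_listen_ports(50), srun_listen_ports(200))
              # prints: 5 7 13
              ```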
4044 SrunProlog
4045 Fully qualified pathname of an executable to be run by srun
4046 prior to the launch of a job step. The command line arguments
4047 for the executable will be the command and arguments of the job
4048 step. This configuration parameter may be overridden by srun's
4049 --prolog parameter. Note that while the other "Prolog" executa‐
4050 bles (e.g., TaskProlog) are run by slurmd on the compute nodes
4051 where the tasks are executed, the SrunProlog runs on the node
4052 where the "srun" is executing.
4053
4054 StateSaveLocation
4055 Fully qualified pathname of a directory into which the Slurm
4056 controller, slurmctld, saves its state (e.g. "/usr/lo‐
4057              cal/slurm/checkpoint"). Slurm state will be saved here to recover
4058 from system failures. SlurmUser must be able to create files in
4059 this directory. If you have a secondary SlurmctldHost config‐
4060 ured, this location should be readable and writable by both sys‐
4061 tems. Since all running and pending job information is stored
4062 here, the use of a reliable file system (e.g. RAID) is recom‐
4063 mended. The default value is "/var/spool". A restart of slurm‐
4064 ctld is required for changes to this parameter to take effect.
4065 If any slurm daemons terminate abnormally, their core files will
4066 also be written into this directory.
4067
4068 SuspendExcNodes
4069              Specifies the nodes which are not to be placed in power save
4070 mode, even if the node remains idle for an extended period of
4071 time. Use Slurm's hostlist expression to identify nodes with an
4072 optional ":" separator and count of nodes to exclude from the
4073 preceding range. For example "nid[10-20]:4" will prevent 4 us‐
4074              able nodes (i.e. IDLE and not DOWN, DRAINING or already powered
4075 down) in the set "nid[10-20]" from being powered down. Multiple
4076 sets of nodes can be specified with or without counts in a comma
4077 separated list (e.g "nid[10-20]:4,nid[80-90]:2"). If a node
4078 count specification is given, any list of nodes to NOT have a
4079 node count must be after the last specification with a count.
4080 For example "nid[10-20]:4,nid[60-70]" will exclude 4 nodes in
4081 the set "nid[10-20]:4" plus all nodes in the set "nid[60-70]"
4082 while "nid[1-3],nid[10-20]:4" will exclude 4 nodes from the set
4083 "nid[1-3],nid[10-20]". By default no nodes are excluded.
4084
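              As a concrete sketch combining a counted and an uncounted set
              (node names illustrative; as described above, uncounted sets
              must follow the last counted one):

              ```conf
              # Keep 4 usable nodes of nid[10-20] powered up, plus all of nid[60-70].
              SuspendExcNodes=nid[10-20]:4,nid[60-70]
              ```
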
4085 SuspendExcParts
4086              Specifies the partitions whose nodes are not to be placed in
4087 power save mode, even if the node remains idle for an extended
4088 period of time. Multiple partitions can be identified and sepa‐
4089 rated by commas. By default no nodes are excluded.
4090
4091 SuspendProgram
4092 SuspendProgram is the program that will be executed when a node
4093 remains idle for an extended period of time. This program is
4094 expected to place the node into some power save mode. This can
4095 be used to reduce the frequency and voltage of a node or com‐
4096 pletely power the node off. The program executes as SlurmUser.
4097 The argument to the program will be the names of nodes to be
4098 placed into power savings mode (using Slurm's hostlist expres‐
4099 sion format). By default, no program is run.
4100
4101 SuspendRate
4102 The rate at which nodes are placed into power save mode by Sus‐
4103              pendProgram.  The value is the number of nodes per minute and
4104 be used to prevent a large drop in power consumption (e.g. after
4105 a large job completes). A value of zero results in no limits
4106 being imposed. The default value is 60 nodes per minute.
4107
4108 SuspendTime
4109 Nodes which remain idle or down for this number of seconds will
4110 be placed into power save mode by SuspendProgram. Setting Sus‐
4111 pendTime to anything but INFINITE (or -1) will enable power save
4112 mode. INFINITE is the default.
4113
4114 SuspendTimeout
4115 Maximum time permitted (in seconds) between when a node suspend
4116 request is issued and when the node is shutdown. At that time
4117 the node must be ready for a resume request to be issued as
4118 needed for new work. The default value is 30 seconds.
4119
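       Putting the suspend-related parameters together, a minimal power-saving
       sketch might look like this (paths and timings are illustrative;
       ResumeProgram is the companion parameter for powering nodes back up):

       ```conf
       SuspendProgram=/usr/local/sbin/slurm_suspend.sh
       ResumeProgram=/usr/local/sbin/slurm_resume.sh
       SuspendTime=600          # seconds idle before a node is suspended
       SuspendRate=20           # nodes per minute
       SuspendTimeout=30        # seconds allowed for a node to shut down
       SuspendExcParts=debug    # never power down the debug partition
       ```
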
4120 SwitchParameters
4121 Optional parameters for the switch plugin.
4122
4123 On HPE Slingshot systems configured with
4124 SwitchType=switch/hpe_slingshot, the following parameters are
4125 supported (separate multiple parameters with a comma):
4126
4127 vnis=<min>-<max>
4128 Range of VNIs to allocate for jobs and applications.
4129 This parameter is required.
4130
4131 tcs=<class1>[:<class2>]...
4132 Set of traffic classes to configure for applications.
4133 Supported traffic classes are DEDICATED_ACCESS, LOW_LA‐
4134 TENCY, BULK_DATA, and BEST_EFFORT.
4135
4136 single_node_vni
4137 Allocate a VNI for single node job steps.
4138
4139 job_vni
4140 Allocate an additional VNI for jobs, shared among all job
4141 steps.
4142
4143 def_<rsrc>=<val>
4144 Per-CPU reserved allocation for this resource.
4145
4146 res_<rsrc>=<val>
4147 Per-node reserved allocation for this resource. If set,
4148 overrides the per-CPU allocation.
4149
4150 max_<rsrc>=<val>
4151                     Maximum per-node allocation for this resource.
4152
4153 The resources that may be configured are:
4154
4155 txqs Transmit command queues. The default is 3 per-CPU, maxi‐
4156 mum 1024 per-node.
4157
4158 tgqs Target command queues. The default is 2 per-CPU, maximum
4159 512 per-node.
4160
4161 eqs Event queues. The default is 8 per-CPU, maximum 2048 per-
4162 node.
4163
4164 cts Counters. The default is 2 per-CPU, maximum 2048 per-
4165 node.
4166
4167 tles Trigger list entries. The default is 1 per-CPU, maximum
4168 2048 per-node.
4169
4170 ptes Portable table entries. The default is 8 per-CPU, maximum
4171 2048 per-node.
4172
4173 les List entries. The default is 134 per-CPU, maximum 65535
4174 per-node.
4175
4176 acs Addressing contexts. The default is 4 per-CPU, maximum
4177 1024 per-node.
4178
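              An illustrative SwitchParameters line for an HPE Slingshot
              system, combining the required VNI range with optional settings
              (all values are site-specific):

              ```conf
              SwitchType=switch/hpe_slingshot
              SwitchParameters=vnis=1024-65535,tcs=BEST_EFFORT:LOW_LATENCY,job_vni,def_eqs=16
              ```
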
4179 SwitchType
4180 Identifies the type of switch or interconnect used for applica‐
4181 tion communications. Acceptable values include
4182 "switch/cray_aries" for Cray systems, "switch/hpe_slingshot" for
4183 HPE Slingshot systems and "switch/none" for switches not requir‐
4184 ing special processing for job launch or termination (Ethernet,
4185 and InfiniBand). The default value is "switch/none". All Slurm
4186 daemons, commands and running jobs must be restarted for a
4187 change in SwitchType to take effect. If running jobs exist at
4188 the time slurmctld is restarted with a new value of SwitchType,
4189 records of all jobs in any state may be lost.
4190
4191 TaskEpilog
4192 Fully qualified pathname of a program to be executed as the
4193 slurm job's owner after termination of each task. See TaskPro‐
4194 log for execution order details.
4195
4196 TaskPlugin
4197 Identifies the type of task launch plugin, typically used to
4198 provide resource management within a node (e.g. pinning tasks to
4199 specific processors). More than one task plugin can be specified
4200 in a comma-separated list. The prefix of "task/" is optional.
4201 Acceptable values include:
4202
4203 task/affinity enables resource containment using
4204 sched_setaffinity(). This enables the --cpu-bind
4205 and/or --mem-bind srun options.
4206
4207 task/cgroup enables resource containment using Linux control
4208 cgroups. This enables the --cpu-bind and/or
4209 --mem-bind srun options. NOTE: see "man
4210 cgroup.conf" for configuration details.
4211
4212 task/none for systems requiring no special handling of user
4213 tasks. Lacks support for the --cpu-bind and/or
4214 --mem-bind srun options. The default value is
4215 "task/none".
4216
4217 NOTE: It is recommended to stack task/affinity,task/cgroup to‐
4218 gether when configuring TaskPlugin, and setting Constrain‐
4219 Cores=yes in cgroup.conf. This setup uses the task/affinity
4220 plugin for setting the affinity of the tasks and uses the
4221 task/cgroup plugin to fence tasks into the specified resources.
4222
4223 NOTE: For CRAY systems only: task/cgroup must be used with, and
4224 listed after task/cray_aries in TaskPlugin. The task/affinity
4225 plugin can be listed anywhere, but the previous constraint must
4226 be satisfied. For CRAY systems, a configuration like this is
4227 recommended:
4228 TaskPlugin=task/affinity,task/cray_aries,task/cgroup
4229
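              On a typical non-Cray Linux cluster, the recommended stacking
              reads as follows (together with the matching setting in
              cgroup.conf):

              ```conf
              # slurm.conf
              TaskPlugin=task/affinity,task/cgroup
              # cgroup.conf (separate file)
              ConstrainCores=yes
              ```
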
4230 TaskPluginParam
4231 Optional parameters for the task plugin. Multiple options
4232 should be comma separated. None, Sockets, Cores and Threads are
4233 mutually exclusive and treated as a last possible source of
4234 --cpu-bind default. See also Node and Partition CpuBind options.
4235
4236 Cores Bind tasks to cores by default. Overrides automatic
4237 binding.
4238
4239 None Perform no task binding by default. Overrides automatic
4240 binding.
4241
4242 Sockets
4243 Bind to sockets by default. Overrides automatic binding.
4244
4245 Threads
4246 Bind to threads by default. Overrides automatic binding.
4247
4248 SlurmdOffSpec
4249 If specialized cores or CPUs are identified for the node
4250 (i.e. the CoreSpecCount or CpuSpecList are configured for
4251 the node), then Slurm daemons running on the compute node
4252 (i.e. slurmd and slurmstepd) should run outside of those
4253 resources (i.e. specialized resources are completely un‐
4254 available to Slurm daemons and jobs spawned by Slurm).
4255 This option may not be used with the task/cray_aries
4256 plugin.
4257
4258 Verbose
4259 Verbosely report binding before tasks run by default.
4260
4261 Autobind
4262 Set a default binding in the event that "auto binding"
4263 doesn't find a match. Set to Threads, Cores or Sockets
4264 (E.g. TaskPluginParam=autobind=threads).
4265
4266 TaskProlog
4267 Fully qualified pathname of a program to be executed as the
4268 slurm job's owner prior to initiation of each task. Besides the
4269 normal environment variables, this has SLURM_TASK_PID available
4270 to identify the process ID of the task being started. Standard
4271 output from this program can be used to control the environment
4272 variables and output for the user program.
4273
4274 export NAME=value Will set environment variables for the task
4275 being spawned. Everything after the equal
4276 sign to the end of the line will be used as
4277 the value for the environment variable. Ex‐
4278 porting of functions is not currently sup‐
4279 ported.
4280
4281 print ... Will cause that line (without the leading
4282 "print ") to be printed to the job's stan‐
4283 dard output.
4284
4285 unset NAME Will clear environment variables for the
4286 task being spawned.
4287
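              A minimal hypothetical TaskProlog script illustrating all three
              directives (the path, variable names, and values are
              illustrative; slurmd parses the script's standard output):

              ```shell
              #!/bin/sh
              # Hypothetical TaskProlog: every line echoed here is parsed by slurmd.
              task_prolog_output() {
                  # Set an environment variable for the task being spawned
                  echo "export SCRATCH_DIR=/tmp/job_scratch"
                  # Send a line to the job's standard output
                  echo "print task prolog ran for task PID ${SLURM_TASK_PID:-unknown}"
                  # Clear a variable inherited from the submission environment
                  echo "unset DISPLAY"
              }
              task_prolog_output
              ```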
4288 The order of task prolog/epilog execution is as follows:
4289
4290              1. pre_launch_priv()
4291                                Function in TaskPlugin
4292
4293              2. pre_launch()   Function in TaskPlugin
4294
4295              3. TaskProlog     System-wide per task program defined in
4296                                slurm.conf
4297
4298              4. User prolog    Job-step-specific task program defined using
4299                                srun's --task-prolog option or
4300                                SLURM_TASK_PROLOG environment variable
4301
4302              5. Task           Execute the job step's task
4303
4304              6. User epilog    Job-step-specific task program defined using
4305                                srun's --task-epilog option or
4306                                SLURM_TASK_EPILOG environment variable
4307
4308              7. TaskEpilog     System-wide per task program defined in
4309                                slurm.conf
4310
4311              8. post_term()    Function in TaskPlugin
4312
4313 TCPTimeout
4314 Time permitted for TCP connection to be established. Default
4315 value is 2 seconds.
4316
4317 TmpFS Fully qualified pathname of the file system available to user
4318 jobs for temporary storage. This parameter is used in establish‐
4319 ing a node's TmpDisk space. The default value is "/tmp".
4320
4321 TopologyParam
4322 Comma-separated options identifying network topology options.
4323
4324 Dragonfly Optimize allocation for Dragonfly network. Valid
4325 when TopologyPlugin=topology/tree.
4326
4327 TopoOptional Only optimize allocation for network topology if
4328 the job includes a switch option. Since optimiz‐
4329 ing resource allocation for topology involves
4330 much higher system overhead, this option can be
4331 used to impose the extra overhead only on jobs
4332 which can take advantage of it. If most job allo‐
4333 cations are not optimized for network topology,
4334 they may fragment resources to the point that
4335 topology optimization for other jobs will be dif‐
4336 ficult to achieve. NOTE: Jobs may span across
4337 nodes without common parent switches with this
4338 enabled.
4339
4340 TopologyPlugin
4341 Identifies the plugin to be used for determining the network
4342 topology and optimizing job allocations to minimize network con‐
4343 tention. See NETWORK TOPOLOGY below for details. Additional
4344 plugins may be provided in the future which gather topology in‐
4345 formation directly from the network. Acceptable values include:
4346
4347 topology/3d_torus best-fit logic over three-dimensional
4348 topology
4349
4350 topology/none default for other systems, best-fit logic
4351 over one-dimensional topology
4352
4353 topology/tree used for a hierarchical network as de‐
4354 scribed in a topology.conf file
4355
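For example, a hierarchical network where only jobs requesting a switch count pay the topology-aware scheduling overhead might be configured as follows (a sketch; the actual switch layout is described separately in topology.conf):

```
TopologyPlugin=topology/tree
TopologyParam=TopoOptional
```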
4356 TrackWCKey
4357              Boolean yes or no. Enables display and tracking of the Workload
4358              Characterization Key. Must be set to yes to track wckey usage
4359              correctly. NOTE: You must also set TrackWCKey in your slurmdbd.conf
4360 file to create historical usage reports.
4361
4362 TreeWidth
4363 Slurmd daemons use a virtual tree network for communications.
4364 TreeWidth specifies the width of the tree (i.e. the fanout). On
4365 architectures with a front end node running the slurmd daemon,
4366 the value must always be equal to or greater than the number of
4367              front end nodes, which eliminates the need for message forwarding
4368 between the slurmd daemons. On other architectures the default
4369 value is 50, meaning each slurmd daemon can communicate with up
4370 to 50 other slurmd daemons and over 2500 nodes can be contacted
4371 with two message hops. The default value will work well for
4372 most clusters. Optimal system performance can typically be
4373 achieved if TreeWidth is set to the square root of the number of
4374 nodes in the cluster for systems having no more than 2500 nodes
4375 or the cube root for larger systems. The value may not exceed
4376 65533.
4377
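Following the sizing guidance above, a 2500-node cluster would set TreeWidth to the square root of the node count (an illustrative sketch):

```
# sqrt(2500) = 50, so all 2500 nodes are reachable in two message hops
TreeWidth=50
```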
4378 UnkillableStepProgram
4379 If the processes in a job step are determined to be unkillable
4380 for a period of time specified by the UnkillableStepTimeout
4381 variable, the program specified by UnkillableStepProgram will be
4382 executed. By default no program is run.
4383
4384 See section UNKILLABLE STEP PROGRAM SCRIPT for more information.
4385
4386 UnkillableStepTimeout
4387 The length of time, in seconds, that Slurm will wait before de‐
4388 ciding that processes in a job step are unkillable (after they
4389 have been signaled with SIGKILL) and execute UnkillableStepPro‐
4390 gram. The default timeout value is 60 seconds. If exceeded,
4391 the compute node will be drained to prevent future jobs from be‐
4392 ing scheduled on the node.
4393
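A minimal configuration might look like the following sketch; the script path is a placeholder, not a Slurm default:

```
# Run a notification script if step processes survive SIGKILL for 120 seconds
UnkillableStepProgram=/usr/local/sbin/report_unkillable.sh
UnkillableStepTimeout=120
```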
4394 UsePAM If set to 1, PAM (Pluggable Authentication Modules for Linux)
4395 will be enabled. PAM is used to establish the upper bounds for
4396 resource limits. With PAM support enabled, local system adminis‐
4397 trators can dynamically configure system resource limits. Chang‐
4398 ing the upper bound of a resource limit will not alter the lim‐
4399 its of running jobs, only jobs started after a change has been
4400 made will pick up the new limits. The default value is 0 (not
4401 to enable PAM support). Remember that PAM also needs to be con‐
4402 figured to support Slurm as a service. For sites using PAM's
4403 directory based configuration option, a configuration file named
4404 slurm should be created. The module-type, control-flags, and
4405 module-path names that should be included in the file are:
4406 auth required pam_localuser.so
4407 auth required pam_shells.so
4408 account required pam_unix.so
4409 account required pam_access.so
4410 session required pam_unix.so
4411 For sites configuring PAM with a general configuration file, the
4412 appropriate lines (see above), where slurm is the service-name,
4413 should be added.
4414
4415              NOTE: The UsePAM option has nothing to do with the
4416              contribs/pam/pam_slurm and/or contribs/pam_slurm_adopt modules,
4417              so these two modules work independently of the value set for
4418              UsePAM.
4419
4420 VSizeFactor
4421 Memory specifications in job requests apply to real memory size
4422 (also known as resident set size). It is possible to enforce
4423 virtual memory limits for both jobs and job steps by limiting
4424 their virtual memory to some percentage of their real memory al‐
4425 location. The VSizeFactor parameter specifies the job's or job
4426 step's virtual memory limit as a percentage of its real memory
4427 limit. For example, if a job's real memory limit is 500MB and
4428 VSizeFactor is set to 101 then the job will be killed if its
4429 real memory exceeds 500MB or its virtual memory exceeds 505MB
4430 (101 percent of the real memory limit). The default value is 0,
4431 which disables enforcement of virtual memory limits. The value
4432 may not exceed 65533 percent.
4433
4434 NOTE: This parameter is dependent on OverMemoryKill being con‐
4435 figured in JobAcctGatherParams. It is also possible to configure
4436 the TaskPlugin to use task/cgroup for memory enforcement. VSize‐
4437 Factor will not have an effect on memory enforcement done
4438 through cgroups.
4439
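For instance, to let each job's virtual memory exceed its real memory limit by ten percent (a sketch; as noted above, OverMemoryKill must be configured for VSizeFactor to take effect):

```
JobAcctGatherParams=OverMemoryKill
# A job with a 4000 MB real memory limit gets a 4400 MB virtual memory limit
VSizeFactor=110
```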
4440 WaitTime
4441 Specifies how many seconds the srun command should by default
4442 wait after the first task terminates before terminating all re‐
4443 maining tasks. The "--wait" option on the srun command line
4444 overrides this value. The default value is 0, which disables
4445 this feature. May not exceed 65533 seconds.
4446
4447 X11Parameters
4448 For use with Slurm's built-in X11 forwarding implementation.
4449
4450 home_xauthority
4451 If set, xauth data on the compute node will be placed in
4452 ~/.Xauthority rather than in a temporary file under
4453 TmpFS.
4454
4455 NODE CONFIGURATION
4456   The configuration of nodes (or machines) to be managed by Slurm is also
4457 specified in /etc/slurm.conf. Changes in node configuration (e.g.
4458 adding nodes, changing their processor count, etc.) require restarting
4459 both the slurmctld daemon and the slurmd daemons. All slurmd daemons
4460 must know each node in the system to forward messages in support of hi‐
4461 erarchical communications. Only the NodeName must be supplied in the
4462 configuration file. All other node configuration information is op‐
4463 tional. It is advisable to establish baseline node configurations, es‐
4464 pecially if the cluster is heterogeneous. Nodes which register to the
4465   system with less than the configured resources (e.g. too little
4466   memory) will be placed in the "DOWN" state to avoid scheduling jobs on
4467 them. Establishing baseline configurations will also speed Slurm's
4468 scheduling process by permitting it to compare job requirements against
4469 these (relatively few) configuration parameters and possibly avoid hav‐
4470 ing to check job requirements against every individual node's configu‐
4471 ration. The resources checked at node registration time are: CPUs,
4472 RealMemory and TmpDisk.
4473
4474 Default values can be specified with a record in which NodeName is "DE‐
4475 FAULT". The default entry values will apply only to lines following it
4476 in the configuration file and the default values can be reset multiple
4477 times in the configuration file with multiple entries where "Node‐
4478 Name=DEFAULT". Each line where NodeName is "DEFAULT" will replace or
4479 add to previous default values and will not reinitialize the default
4480 values. The "NodeName=" specification must be placed on every line de‐
4481 scribing the configuration of nodes. A single node name can not appear
4482 as a NodeName value in more than one line (duplicate node name records
4483 will be ignored). In fact, it is generally possible and desirable to
4484 define the configurations of all nodes in only a few lines. This con‐
4485 vention permits significant optimization in the scheduling of larger
4486 clusters. In order to support the concept of jobs requiring consecu‐
4487   tive nodes on some architectures, node specifications should be placed
4488 in this file in consecutive order. No single node name may be listed
4489 more than once in the configuration file. Use "DownNodes=" to record
4490 the state of nodes which are temporarily in a DOWN, DRAIN or FAILING
4491 state without altering permanent configuration information. A job
4492   step's tasks are allocated to nodes in the order the nodes appear in the
4493 configuration file. There is presently no capability within Slurm to
4494 arbitrarily order a job step's tasks.
4495
4496 Multiple node names may be comma separated (e.g. "alpha,beta,gamma")
4497 and/or a simple node range expression may optionally be used to specify
4498 numeric ranges of nodes to avoid building a configuration file with
4499 large numbers of entries. The node range expression can contain one
4500 pair of square brackets with a sequence of comma-separated numbers
4501 and/or ranges of numbers separated by a "-" (e.g. "linux[0-64,128]", or
4502 "lx[15,18,32-33]"). Note that the numeric ranges can include one or
4503 more leading zeros to indicate the numeric portion has a fixed number
4504 of digits (e.g. "linux[0000-1023]"). Multiple numeric ranges can be
4505 included in the expression (e.g. "rack[0-63]_blade[0-41]"). If one or
4506 more numeric expressions are included, one of them must be at the end
4507 of the name (e.g. "unit[0-31]rack" is invalid), but arbitrary names can
4508 always be used in a comma-separated list.
4509
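Combining a default record with a range expression, a homogeneous 64-node cluster can be described in two lines (node names and hardware sizes here are illustrative):

```
NodeName=DEFAULT Sockets=2 CoresPerSocket=8 ThreadsPerCore=2 RealMemory=64000
NodeName=linux[0001-0064] State=UNKNOWN
```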
4510   The node configuration specifies the following information:
4511
4512
4513 NodeName
4514 Name that Slurm uses to refer to a node. Typically this would
4515 be the string that "/bin/hostname -s" returns. It may also be
4516 the fully qualified domain name as returned by "/bin/hostname
4517 -f" (e.g. "foo1.bar.com"), or any valid domain name associated
4518 with the host through the host database (/etc/hosts) or DNS, de‐
4519 pending on the resolver settings. Note that if the short form
4520 of the hostname is not used, it may prevent use of hostlist ex‐
4521 pressions (the numeric portion in brackets must be at the end of
4522 the string). It may also be an arbitrary string if NodeHostname
4523 is specified. If the NodeName is "DEFAULT", the values speci‐
4524 fied with that record will apply to subsequent node specifica‐
4525 tions unless explicitly set to other values in that node record
4526 or replaced with a different set of default values. Each line
4527          where NodeName is "DEFAULT" will replace or add to previous
4528          default values and not reinitialize the default values. For
4529          architectures in which the node order is significant, nodes will
4530 be considered consecutive in the order defined. For example, if
4531 the configuration for "NodeName=charlie" immediately follows the
4532 configuration for "NodeName=baker" they will be considered adja‐
4533 cent in the computer. NOTE: If the NodeName is "ALL" the
4534 process parsing the configuration will exit immediately as it is
4535 an internally reserved word.
4536
4537 NodeHostname
4538 Typically this would be the string that "/bin/hostname -s" re‐
4539 turns. It may also be the fully qualified domain name as re‐
4540 turned by "/bin/hostname -f" (e.g. "foo1.bar.com"), or any valid
4541 domain name associated with the host through the host database
4542 (/etc/hosts) or DNS, depending on the resolver settings. Note
4543 that if the short form of the hostname is not used, it may pre‐
4544 vent use of hostlist expressions (the numeric portion in brack‐
4545 ets must be at the end of the string). A node range expression
4546 can be used to specify a set of nodes. If an expression is
4547 used, the number of nodes identified by NodeHostname on a line
4548 in the configuration file must be identical to the number of
4549 nodes identified by NodeName. By default, the NodeHostname will
4550 be identical in value to NodeName.
4551
4552 NodeAddr
4553          Name by which the node should be referred to when establishing a
4554          communications path. This name will be used as an argument to the
4555 getaddrinfo() function for identification. If a node range ex‐
4556 pression is used to designate multiple nodes, they must exactly
4557 match the entries in the NodeName (e.g. "NodeName=lx[0-7]
4558 NodeAddr=elx[0-7]"). NodeAddr may also contain IP addresses.
4559 By default, the NodeAddr will be identical in value to NodeHost‐
4560 name.
4561
4562 BcastAddr
4563 Alternate network path to be used for sbcast network traffic to
4564 a given node. This name will be used as an argument to the
4565 getaddrinfo() function. If a node range expression is used to
4566 designate multiple nodes, they must exactly match the entries in
4567 the NodeName (e.g. "NodeName=lx[0-7] BcastAddr=elx[0-7]").
4568 BcastAddr may also contain IP addresses. By default, the Bcas‐
4569 tAddr is unset, and sbcast traffic will be routed to the
4570 NodeAddr for a given node. Note: cannot be used with Communica‐
4571 tionParameters=NoInAddrAny.
4572
4573 Boards Number of Baseboards in nodes with a baseboard controller. Note
4574 that when Boards is specified, SocketsPerBoard, CoresPerSocket,
4575 and ThreadsPerCore should be specified. The default value is 1.
4576
4577 CoreSpecCount
4578 Number of cores reserved for system use. Depending upon the
4579 TaskPluginParam option of SlurmdOffSpec, the Slurm daemon slurmd
4580 may either be confined to these resources (the default) or pre‐
4581 vented from using these resources. Isolation of slurmd from
4582 user jobs may improve application performance. A job can use
4583 these cores if AllowSpecResourcesUsage=yes and the user explic‐
4584 itly requests less than the configured CoreSpecCount. If this
4585 option and CpuSpecList are both designated for a node, an error
4586 is generated. For information on the algorithm used by Slurm to
4587 select the cores refer to the core specialization documentation
4588 ( https://slurm.schedmd.com/core_spec.html ).
4589
4590 CoresPerSocket
4591 Number of cores in a single physical processor socket (e.g.
4592 "2"). The CoresPerSocket value describes physical cores, not
4593 the logical number of processors per socket. NOTE: If you have
4594 multi-core processors, you will likely need to specify this pa‐
4595 rameter in order to optimize scheduling. The default value is
4596 1.
4597
4598 CpuBind
4599 If a job step request does not specify an option to control how
4600 tasks are bound to allocated CPUs (--cpu-bind) and all nodes al‐
4601          located to the job have the same CpuBind option, the node CpuBind
4602 option will control how tasks are bound to allocated resources.
4603 Supported values for CpuBind are "none", "socket", "ldom"
4604 (NUMA), "core" and "thread".
4605
4606 CPUs Number of logical processors on the node (e.g. "2"). It can be
4607          set to the total number of sockets (supported only by select/linear),
4608          cores or threads. This can be useful when you want to
4609 schedule only the cores on a hyper-threaded node. If CPUs is
4610 omitted, its default will be set equal to the product of Boards,
4611 Sockets, CoresPerSocket, and ThreadsPerCore.
4612
4613 CpuSpecList
4614 A comma-delimited list of Slurm abstract CPU IDs reserved for
4615 system use. The list will be expanded to include all other
4616 CPUs, if any, on the same cores. Depending upon the TaskPlugin‐
4617 Param option of SlurmdOffSpec, the Slurm daemon slurmd may ei‐
4618 ther be confined to these resources (the default) or prevented
4619 from using these resources. Isolation of slurmd from user jobs
4620 may improve application performance. A job can use these cores
4621 if AllowSpecResourcesUsage=yes and the user explicitly requests
4622 less than the number of CPUs in this list. If this option and
4623 CoreSpecCount are both designated for a node, an error is gener‐
4624 ated. This option has no effect unless cgroup job confinement
4625 is also configured (i.e. the task/cgroup TaskPlugin is enabled
4626 and ConstrainCores=yes is set in cgroup.conf).
4627
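Reserving abstract CPUs 0 and 1 for slurmd on one node might look like this sketch (node name is illustrative); note the cgroup requirements stated above:

```
# slurm.conf
TaskPlugin=task/cgroup
NodeName=tux01 CPUs=32 CpuSpecList=0,1

# cgroup.conf must also contain:
ConstrainCores=yes
```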
4628 Features
4629 A comma-delimited list of arbitrary strings indicative of some
4630 characteristic associated with the node. There is no value or
4631          count associated with a feature at this time; a node either has
4632 a feature or it does not. A desired feature may contain a nu‐
4633 meric component indicating, for example, processor speed but
4634 this numeric component will be considered to be part of the fea‐
4635 ture string. Features are intended to be used to filter nodes
4636 eligible to run jobs via the --constraint argument. By default
4637 a node has no features. Also see Gres for being able to have
4638 more control such as types and count. Using features is faster
4639 than scheduling against GRES but is limited to Boolean opera‐
4640 tions.
4641
4642 Gres A comma-delimited list of generic resources specifications for a
4643 node. The format is: "<name>[:<type>][:no_consume]:<num‐
4644 ber>[K|M|G]". The first field is the resource name, which
4645 matches the GresType configuration parameter name. The optional
4646 type field might be used to identify a model of that generic re‐
4647 source. It is forbidden to specify both an untyped GRES and a
4648 typed GRES with the same <name>. The optional no_consume field
4649 allows you to specify that a generic resource does not have a
4650 finite number of that resource that gets consumed as it is re‐
4651 quested. The no_consume field is a GRES specific setting and ap‐
4652 plies to the GRES, regardless of the type specified. It should
4653          not be used with a GRES that has a dedicated plugin; if you're
4654          looking for a way to overcommit GPUs to multiple processes at
4655          the same time, you may be interested in using the "shard" GRES instead.
4656 The final field must specify a generic resources count. A suf‐
4657 fix of "K", "M", "G", "T" or "P" may be used to multiply the
4658 number by 1024, 1048576, 1073741824, etc. respectively.
4659 (e.g."Gres=gpu:tesla:1,gpu:kepler:1,bandwidth:lustre:no_con‐
4660 sume:4G"). By default a node has no generic resources and its
4661 maximum count is that of an unsigned 64bit integer. Also see
4662 Features for Boolean flags to filter nodes using job con‐
4663 straints.
4664
4665 MemSpecLimit
4666 Amount of memory, in megabytes, reserved for system use and not
4667 available for user allocations. If the task/cgroup plugin is
4668 configured and that plugin constrains memory allocations (i.e.
4669 the task/cgroup TaskPlugin is enabled and ConstrainRAMSpace=yes
4670 is set in cgroup.conf), then Slurm compute node daemons (slurmd
4671 plus slurmstepd) will be allocated the specified memory limit.
4672          Note that Memory must be set as a consumable resource in
4673          SelectTypeParameters (one of the *_Memory options) for this
4674          option to work. The daemons will not be killed if they
4675          exhaust the memory allocation (i.e. the Out-Of-Memory Killer is
4676 disabled for the daemon's memory cgroup). If the task/cgroup
4677 plugin is not configured, the specified memory will only be un‐
4678 available for user allocations.
4679
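For example, to set aside 2 GB of a node's memory for the Slurm daemons (a sketch; memory must be a consumable resource as noted above, and the node name is illustrative):

```
SelectType=select/cons_tres
SelectTypeParameters=CR_Core_Memory
NodeName=tux01 RealMemory=128000 MemSpecLimit=2048
```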
4680 Port The port number that the Slurm compute node daemon, slurmd, lis‐
4681 tens to for work on this particular node. By default there is a
4682 single port number for all slurmd daemons on all compute nodes
4683 as defined by the SlurmdPort configuration parameter. Use of
4684 this option is not generally recommended except for development
4685 or testing purposes. If multiple slurmd daemons execute on a
4686 node this can specify a range of ports.
4687
4688 Note: On Cray systems, Realm-Specific IP Addressing (RSIP) will
4689 automatically try to interact with anything opened on ports
4690 8192-60000. Configure Port to use a port outside of the config‐
4691 ured SrunPortRange and RSIP's port range.
4692
4693 Procs See CPUs.
4694
4695 RealMemory
4696 Size of real memory on the node in megabytes (e.g. "2048"). The
4697 default value is 1. Lowering RealMemory with the goal of setting
4698 aside some amount for the OS and not available for job alloca‐
4699 tions will not work as intended if Memory is not set as a con‐
4700 sumable resource in SelectTypeParameters. So one of the *_Memory
4701          options needs to be enabled for that goal to be accomplished.
4702 Also see MemSpecLimit.
4703
4704 Reason Identifies the reason for a node being in state "DOWN",
4705          "DRAINED", "DRAINING", "FAIL" or "FAILING". Use quotes to en‐
4706 close a reason having more than one word.
4707
4708 Sockets
4709 Number of physical processor sockets/chips on the node (e.g.
4710 "2"). If Sockets is omitted, it will be inferred from CPUs,
4711 CoresPerSocket, and ThreadsPerCore. NOTE: If you have
4712 multi-core processors, you will likely need to specify these pa‐
4713 rameters. Sockets and SocketsPerBoard are mutually exclusive.
4714 If Sockets is specified when Boards is also used, Sockets is in‐
4715 terpreted as SocketsPerBoard rather than total sockets. The de‐
4716 fault value is 1.
4717
4718 SocketsPerBoard
4719 Number of physical processor sockets/chips on a baseboard.
4720 Sockets and SocketsPerBoard are mutually exclusive. The default
4721 value is 1.
4722
4723 State State of the node with respect to the initiation of user jobs.
4724 Acceptable values are CLOUD, DOWN, DRAIN, FAIL, FAILING, FUTURE
4725 and UNKNOWN. Node states of BUSY and IDLE should not be speci‐
4726 fied in the node configuration, but set the node state to UN‐
4727 KNOWN instead. Setting the node state to UNKNOWN will result in
4728 the node state being set to BUSY, IDLE or other appropriate
4729 state based upon recovered system state information. The de‐
4730 fault value is UNKNOWN. Also see the DownNodes parameter below.
4731
4732 CLOUD Indicates the node exists in the cloud. Its initial
4733 state will be treated as powered down. The node will
4734 be available for use after its state is recovered from
4735 Slurm's state save file or the slurmd daemon starts on
4736 the compute node.
4737
4738 DOWN Indicates the node failed and is unavailable to be al‐
4739 located work.
4740
4741 DRAIN Indicates the node is unavailable to be allocated
4742 work.
4743
4744 FAIL Indicates the node is expected to fail soon, has no
4745 jobs allocated to it, and will not be allocated to any
4746 new jobs.
4747
4748 FAILING Indicates the node is expected to fail soon, has one
4749 or more jobs allocated to it, but will not be allo‐
4750 cated to any new jobs.
4751
4752 FUTURE Indicates the node is defined for future use and need
4753 not exist when the Slurm daemons are started. These
4754 nodes can be made available for use simply by updating
4755 the node state using the scontrol command rather than
4756 restarting the slurmctld daemon. After these nodes are
4757 made available, change their State in the slurm.conf
4758 file. Until these nodes are made available, they will
4759                 not be seen using any Slurm commands, nor will any
4760 attempt be made to contact them.
4761
4762 Dynamic Future Nodes
4763 A slurmd started with -F[<feature>] will be as‐
4764 sociated with a FUTURE node that matches the
4765 same configuration (sockets, cores, threads) as
4766 reported by slurmd -C. The node's NodeAddr and
4767 NodeHostname will automatically be retrieved
4768 from the slurmd and will be cleared when set
4769 back to the FUTURE state. Dynamic FUTURE nodes
4770                        retain non-FUTURE state on restart. Use
4771                        scontrol to put the node back into the FUTURE state.
4772
4773 If the mapping of the NodeName to the slurmd
4774 HostName is not updated in DNS, Dynamic Future
4775 nodes won't know how to communicate with each
4776 other -- because NodeAddr and NodeHostName are
4777 not defined in the slurm.conf -- and the fanout
4778 communications need to be disabled by setting
4779 TreeWidth to a high number (e.g. 65533). If the
4780 DNS mapping is made, then the cloud_dns Slurm‐
4781 ctldParameter can be used.
4782
4783 UNKNOWN Indicates the node's state is undefined but will be
4784 established (set to BUSY or IDLE) when the slurmd dae‐
4785 mon on that node registers. UNKNOWN is the default
4786 state.
4787
4788 ThreadsPerCore
4789 Number of logical threads in a single physical core (e.g. "2").
4790          Note that Slurm can allocate resources to jobs down to the
4791 resolution of a core. If your system is configured with more
4792 than one thread per core, execution of a different job on each
4793 thread is not supported unless you configure SelectTypeParame‐
4794 ters=CR_CPU plus CPUs; do not configure Sockets, CoresPerSocket
4795          or ThreadsPerCore. A job can execute one task per thread from
4796 within one job step or execute a distinct job step on each of
4797 the threads. Note also if you are running with more than 1
4798 thread per core and running the select/cons_res or se‐
4799 lect/cons_tres plugin then you will want to set the SelectType‐
4800 Parameters variable to something other than CR_CPU to avoid un‐
4801 expected results. The default value is 1.
4802
4803 TmpDisk
4804 Total size of temporary disk storage in TmpFS in megabytes (e.g.
4805 "16384"). TmpFS (for "Temporary File System") identifies the lo‐
4806 cation which jobs should use for temporary storage. Note this
4807 does not indicate the amount of free space available to the user
4808          on the node, only the total file system size. The system
4809          administrator should ensure this file system is purged as needed so
4810 that user jobs have access to most of this space. The Prolog
4811 and/or Epilog programs (specified in the configuration file)
4812 might be used to ensure the file system is kept clean. The de‐
4813 fault value is 0.
4814
4815 Weight The priority of the node for scheduling purposes. All things
4816 being equal, jobs will be allocated the nodes with the lowest
4817 weight which satisfies their requirements. For example, a het‐
4818 erogeneous collection of nodes might be placed into a single
4819 partition for greater system utilization, responsiveness and ca‐
4820 pability. It would be preferable to allocate smaller memory
4821 nodes rather than larger memory nodes if either will satisfy a
4822 job's requirements. The units of weight are arbitrary, but
4823 larger weights should be assigned to nodes with more processors,
4824 memory, disk space, higher processor speed, etc. Note that if a
4825 job allocation request can not be satisfied using the nodes with
4826 the lowest weight, the set of nodes with the next lowest weight
4827 is added to the set of nodes under consideration for use (repeat
4828 as needed for higher weight values). If you absolutely want to
4829 minimize the number of higher weight nodes allocated to a job
4830 (at a cost of higher scheduling overhead), give each node a dis‐
4831 tinct Weight value and they will be added to the pool of nodes
4832 being considered for scheduling individually.
4833
4834 The default value is 1.
4835
4836 NOTE: Node weights are first considered among currently avail‐
4837 able nodes. For example, a POWERED_DOWN node with a lower weight
4838 will not be evaluated before an IDLE node.
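For example, to prefer filling small-memory nodes before large-memory ones (node names and values here are illustrative):

```
NodeName=small[01-16] RealMemory=64000 Weight=10
NodeName=big[01-04] RealMemory=512000 Weight=100
```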
4839
4840 DOWN NODE CONFIGURATION
4841   The DownNodes= parameter permits you to mark certain nodes as in a
4842 DOWN, DRAIN, FAIL, FAILING or FUTURE state without altering the perma‐
4843 nent configuration information listed under a NodeName= specification.
4844
4845
4846 DownNodes
4847 Any node name, or list of node names, from the NodeName= speci‐
4848 fications.
4849
4850 Reason Identifies the reason for a node being in state DOWN, DRAIN,
4851 FAIL, FAILING or FUTURE. Use quotes to enclose a reason having
4852 more than one word.
4853
4854 State State of the node with respect to the initiation of user jobs.
4855 Acceptable values are DOWN, DRAIN, FAIL, FAILING and FUTURE.
4856 For more information about these states see the descriptions un‐
4857 der State in the NodeName= section above. The default value is
4858 DOWN.
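For example (the node list and reason text are illustrative):

```
DownNodes=lx[10-12] State=DRAIN Reason="Scheduled power supply replacement"
```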
4859
4860 FRONTEND NODE CONFIGURATION
4861   On computers where frontend nodes are used to execute batch scripts
4862 rather than compute nodes, one may configure one or more frontend nodes
4863 using the configuration parameters defined below. These options are
4864 very similar to those used in configuring compute nodes. These options
4865 may only be used on systems configured and built with the appropriate
4866 parameters (--have-front-end). The front end configuration specifies
4867 the following information:
4868
4869
4870 AllowGroups
4871 Comma-separated list of group names which may execute jobs on
4872 this front end node. By default, all groups may use this front
4873 end node. A user will be permitted to use this front end node
4874 if AllowGroups has at least one group associated with the user.
4875 May not be used with the DenyGroups option.
4876
4877 AllowUsers
4878 Comma-separated list of user names which may execute jobs on
4879 this front end node. By default, all users may use this front
4880 end node. May not be used with the DenyUsers option.
4881
4882 DenyGroups
4883 Comma-separated list of group names which are prevented from ex‐
4884 ecuting jobs on this front end node. May not be used with the
4885 AllowGroups option.
4886
4887 DenyUsers
4888 Comma-separated list of user names which are prevented from exe‐
4889 cuting jobs on this front end node. May not be used with the
4890 AllowUsers option.
4891
4892 FrontendName
4893 Name that Slurm uses to refer to a frontend node. Typically
4894 this would be the string that "/bin/hostname -s" returns. It
4895 may also be the fully qualified domain name as returned by
4896 "/bin/hostname -f" (e.g. "foo1.bar.com"), or any valid domain
4897 name associated with the host through the host database
4898 (/etc/hosts) or DNS, depending on the resolver settings. Note
4899 that if the short form of the hostname is not used, it may pre‐
4900 vent use of hostlist expressions (the numeric portion in brack‐
4901 ets must be at the end of the string). If the FrontendName is
4902 "DEFAULT", the values specified with that record will apply to
4903 subsequent node specifications unless explicitly set to other
4904 values in that frontend node record or replaced with a different
4905          set of default values. Each line where FrontendName is
4906          "DEFAULT" will replace or add to previous default values and not
4907          reinitialize the default values.
4908
4909 FrontendAddr
4910          Name by which a frontend node should be referred to when
4911          establishing a communications path. This name will be used as an argument to
4912 the getaddrinfo() function for identification. As with Fron‐
4913 tendName, list the individual node addresses rather than using a
4914 hostlist expression. The number of FrontendAddr records per
4915 line must equal the number of FrontendName records per line
4916          (i.e. you can't map two node names to one address). FrontendAddr
4917 may also contain IP addresses. By default, the FrontendAddr
4918 will be identical in value to FrontendName.
4919
       Port   The port number that the Slurm compute node daemon, slurmd,
              listens to for work on this particular frontend node. By
              default there is a single port number for all slurmd daemons
              on all frontend nodes as defined by the SlurmdPort
              configuration parameter. Use of this option is not generally
              recommended except for development or testing purposes.

              Note: On Cray systems, Realm-Specific IP Addressing (RSIP)
              will automatically try to interact with anything opened on
              ports 8192-60000. Configure Port to use a port outside of
              the configured SrunPortRange and RSIP's port range.

       Reason Identifies the reason for a frontend node being in state
              DOWN, DRAINED, DRAINING, FAIL or FAILING. Use quotes to
              enclose a reason having more than one word.

       State  State of the frontend node with respect to the initiation
              of user jobs. Acceptable values are DOWN, DRAIN, FAIL,
              FAILING and UNKNOWN. Node states of BUSY and IDLE should not
              be specified in the node configuration, but set the node
              state to UNKNOWN instead. Setting the node state to UNKNOWN
              will result in the node state being set to BUSY, IDLE or
              other appropriate state based upon recovered system state
              information. For more information about these states see
              the descriptions under State in the NodeName= section above.
              The default value is UNKNOWN.

       As an example, you can do something similar to the following to
       define four front end nodes for running slurmd daemons.
       FrontendName=frontend[00-03] FrontendAddr=efrontend[00-03] State=UNKNOWN

NODESET CONFIGURATION
       The nodeset configuration allows you to define a name for a
       specific set of nodes which can be used to simplify the partition
       configuration section, especially for heterogeneous or condo-style
       systems. Each nodeset may be defined by an explicit list of nodes,
       and/or by filtering the nodes by a particular configured feature.
       If both Feature= and Nodes= are used the nodeset shall be the union
       of the two subsets. Note that the nodesets are only used to
       simplify the partition definitions at present, and are not usable
       outside of the partition configuration.

       Feature
              All nodes with this single feature will be included as part
              of this nodeset.

       Nodes  List of nodes in this set.

       NodeSet
              Unique name for a set of nodes. Must not overlap with any
              NodeName definitions.

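       As an illustrative sketch (the node and feature names are
       hypothetical), a nodeset combining a feature filter with an
       explicit node list might look like:

       NodeSet=gpuset Feature=gpu Nodes=tux[100-103]

       A partition could then reference "Nodes=gpuset" instead of listing
       the nodes directly.
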
PARTITION CONFIGURATION
       The partition configuration permits you to establish different job
       limits or access controls for various groups (or partitions) of
       nodes. Nodes may be in more than one partition, making partitions
       serve as general purpose queues. For example one may put the same
       set of nodes into two different partitions, each with different
       constraints (time limit, job sizes, groups allowed to use the
       partition, etc.). Jobs are allocated resources within a single
       partition. Default values can be specified with a record in which
       PartitionName is "DEFAULT". The default entry values will apply
       only to lines following it in the configuration file and the
       default values can be reset multiple times in the configuration
       file with multiple entries where "PartitionName=DEFAULT". The
       "PartitionName=" specification must be placed on every line
       describing the configuration of partitions. Each line where
       PartitionName is "DEFAULT" will replace or add to previous default
       values and not reinitialize the default values. A single partition
       name can not appear as a PartitionName value in more than one line
       (duplicate partition name records will be ignored). If a partition
       that is in use is deleted from the configuration and slurm is
       restarted or reconfigured (scontrol reconfigure), jobs using the
       partition are canceled. NOTE: Put all parameters for each
       partition on a single line. Each line of partition configuration
       information should represent a different partition. The partition
       configuration file contains the following information:

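       As a hedged example (node names and limits are hypothetical), a
       minimal pair of partition definitions might look like:

       PartitionName=DEFAULT MaxTime=24:00:00 State=UP
       PartitionName=batch Nodes=tux[0-127] Default=YES
       PartitionName=debug Nodes=tux[0-15] MaxTime=30

       The DEFAULT record supplies MaxTime and State to both partitions
       that follow it; the "debug" partition then overrides MaxTime.
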
       AllocNodes
              Comma-separated list of nodes from which users can submit
              jobs in the partition. Node names may be specified using
              the node range expression syntax described above. The
              default value is "ALL".

       AllowAccounts
              Comma-separated list of accounts which may execute jobs in
              the partition. The default value is "ALL". NOTE: If
              AllowAccounts is used then DenyAccounts will not be
              enforced. Also refer to DenyAccounts.

       AllowGroups
              Comma-separated list of group names which may execute jobs
              in this partition. A user will be permitted to submit a job
              to this partition if AllowGroups has at least one group
              associated with the user. Jobs executed as user root or as
              user SlurmUser will be allowed to use any partition,
              regardless of the value of AllowGroups. In addition, a
              Slurm Admin or Operator will be able to view any partition,
              regardless of the value of AllowGroups. If user root
              attempts to execute a job as another user (e.g. using
              srun's --uid option), then the job will be subject to
              AllowGroups as if it were submitted by that user. By
              default, AllowGroups is unset, meaning all groups are
              allowed to use this partition. The special value 'ALL' is
              equivalent to this. Users who are not members of the
              specified group will not see information about this
              partition by default. However, this should not be treated
              as a security mechanism, since job information will be
              returned if a user requests details about the partition or
              a specific job. See the PrivateData parameter to restrict
              access to job information. NOTE: For performance reasons,
              Slurm maintains a list of user IDs allowed to use each
              partition and this is checked at job submission time. This
              list of user IDs is updated when the slurmctld daemon is
              restarted, reconfigured (e.g. "scontrol reconfig") or the
              partition's AllowGroups value is reset, even if its value
              is unchanged (e.g. "scontrol update PartitionName=name
              AllowGroups=group"). For a user's access to a partition to
              change, both the user's group membership and Slurm's
              internal user ID list must change using one of the methods
              described above.

       AllowQos
              Comma-separated list of Qos which may execute jobs in the
              partition. Jobs executed as user root can use any partition
              without regard to the value of AllowQos. The default value
              is "ALL". NOTE: If AllowQos is used then DenyQos will not
              be enforced. Also refer to DenyQos.

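       A sketch of partition access controls (the group and QOS names are
       hypothetical):

       PartitionName=restricted Nodes=tux[0-31] AllowGroups=hpcstaff AllowQos=high

       Since DenyAccounts is not enforced when AllowAccounts is used (and
       likewise for DenyQos with AllowQos), use either the Allow or the
       Deny form of each pair per partition, not both.
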
       Alternate
              Partition name of alternate partition to be used if the
              state of this partition is "DRAIN" or "INACTIVE".

       CpuBind
              If a job step request does not specify an option to control
              how tasks are bound to allocated CPUs (--cpu-bind), and the
              nodes allocated to the job do not all have the same node
              CpuBind option, then the partition's CpuBind option will
              control how tasks are bound to allocated resources.
              Supported values for CpuBind are "none", "socket", "ldom"
              (NUMA), "core" and "thread".

       Default
              If this keyword is set, jobs submitted without a partition
              specification will utilize this partition. Possible values
              are "YES" and "NO". The default value is "NO".

       DefaultTime
              Run time limit used for jobs that don't specify a value. If
              not set then MaxTime will be used. Format is the same as
              for MaxTime.

       DefCpuPerGPU
              Default count of CPUs allocated per allocated GPU. This
              value is used only if the job specifies neither
              --cpus-per-task nor --cpus-per-gpu.

       DefMemPerCPU
              Default real memory size available per allocated CPU in
              megabytes. Used to avoid over-subscribing memory and
              causing paging. DefMemPerCPU would generally be used if
              individual processors are allocated to jobs
              (SelectType=select/cons_res or
              SelectType=select/cons_tres). If not set, the DefMemPerCPU
              value for the entire cluster will be used. Also see
              DefMemPerGPU, DefMemPerNode and MaxMemPerCPU. DefMemPerCPU,
              DefMemPerGPU and DefMemPerNode are mutually exclusive.

       DefMemPerGPU
              Default real memory size available per allocated GPU in
              megabytes. Also see DefMemPerCPU, DefMemPerNode and
              MaxMemPerCPU. DefMemPerCPU, DefMemPerGPU and DefMemPerNode
              are mutually exclusive.

       DefMemPerNode
              Default real memory size available per allocated node in
              megabytes. Used to avoid over-subscribing memory and
              causing paging. DefMemPerNode would generally be used if
              whole nodes are allocated to jobs
              (SelectType=select/linear) and resources are
              over-subscribed (OverSubscribe=yes or OverSubscribe=force).
              If not set, the DefMemPerNode value for the entire cluster
              will be used. Also see DefMemPerCPU, DefMemPerGPU and
              MaxMemPerCPU. DefMemPerCPU, DefMemPerGPU and DefMemPerNode
              are mutually exclusive.

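       For example (the values are illustrative only), a partition on a
       cluster using SelectType=select/cons_tres might set per-CPU memory
       limits like this:

       PartitionName=normal Nodes=tux[0-63] DefMemPerCPU=2048 MaxMemPerCPU=4096

       Jobs that do not request memory receive 2048 MB per allocated CPU,
       and may use at most 4096 MB per CPU.
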
       DenyAccounts
              Comma-separated list of accounts which may not execute jobs
              in the partition. By default, no accounts are denied
              access. NOTE: If AllowAccounts is used then DenyAccounts
              will not be enforced. Also refer to AllowAccounts.

       DenyQos
              Comma-separated list of Qos which may not execute jobs in
              the partition. By default, no QOS are denied access. NOTE:
              If AllowQos is used then DenyQos will not be enforced.
              Also refer to AllowQos.

       DisableRootJobs
              If set to "YES" then user root will be prevented from
              running any jobs on this partition. The default value will
              be the value of DisableRootJobs set outside of a partition
              specification (which is "NO", allowing user root to execute
              jobs).

       ExclusiveUser
              If set to "YES" then nodes will be exclusively allocated to
              users. Multiple jobs may be run for the same user, but only
              one user can be active at a time. This capability is also
              available on a per-job basis by using the --exclusive=user
              option.

       GraceTime
              Specifies, in units of seconds, the preemption grace time
              to be extended to a job which has been selected for
              preemption. The default value is zero, no preemption grace
              time is allowed on this partition. Once a job has been
              selected for preemption, its end time is set to the current
              time plus GraceTime. The job's tasks are immediately sent
              SIGCONT and SIGTERM signals in order to provide
              notification of its imminent termination. This is followed
              by the SIGCONT, SIGTERM and SIGKILL signal sequence upon
              reaching its new end time. This second set of signals is
              sent to both the tasks and the containing batch script, if
              applicable. See also the global KillWait configuration
              parameter.

       Hidden Specifies if the partition and its jobs are to be hidden by
              default. Hidden partitions will by default not be reported
              by the Slurm APIs or commands. Possible values are "YES"
              and "NO". The default value is "NO". Note that partitions
              that a user lacks access to by virtue of the AllowGroups
              parameter will also be hidden by default.

       LLN    Schedule resources to jobs on the least loaded nodes (based
              upon the number of idle CPUs). This is generally only
              recommended for an environment with serial jobs as idle
              resources will tend to be highly fragmented, resulting in
              parallel jobs being distributed across many nodes. Note
              that node Weight takes precedence over how many idle
              resources are on each node. Also see the
              SelectTypeParameters configuration parameter CR_LLN to use
              the least loaded nodes in every partition.

       MaxCPUsPerNode
              Maximum number of CPUs on any node available to all jobs
              from this partition. This can be especially useful to
              schedule GPUs. For example a node can be associated with
              two Slurm partitions (e.g. "cpu" and "gpu") and the
              partition/queue "cpu" could be limited to only a subset of
              the node's CPUs, ensuring that one or more CPUs would be
              available to jobs in the "gpu" partition/queue.

       MaxMemPerCPU
              Maximum real memory size available per allocated CPU in
              megabytes. Used to avoid over-subscribing memory and
              causing paging. MaxMemPerCPU would generally be used if
              individual processors are allocated to jobs
              (SelectType=select/cons_res or
              SelectType=select/cons_tres). If not set, the MaxMemPerCPU
              value for the entire cluster will be used. Also see
              DefMemPerCPU and MaxMemPerNode. MaxMemPerCPU and
              MaxMemPerNode are mutually exclusive.

       MaxMemPerNode
              Maximum real memory size available per allocated node in
              megabytes. Used to avoid over-subscribing memory and
              causing paging. MaxMemPerNode would generally be used if
              whole nodes are allocated to jobs
              (SelectType=select/linear) and resources are
              over-subscribed (OverSubscribe=yes or OverSubscribe=force).
              If not set, the MaxMemPerNode value for the entire cluster
              will be used. Also see DefMemPerNode and MaxMemPerCPU.
              MaxMemPerCPU and MaxMemPerNode are mutually exclusive.

       MaxNodes
              Maximum count of nodes which may be allocated to any single
              job. The default value is "UNLIMITED", which is
              represented internally as -1.

       MaxTime
              Maximum run time limit for jobs. Format is minutes,
              minutes:seconds, hours:minutes:seconds, days-hours,
              days-hours:minutes, days-hours:minutes:seconds or
              "UNLIMITED". Time resolution is one minute and second
              values are rounded up to the next minute. The job
              TimeLimit may be updated by root, SlurmUser or an Operator
              to a value higher than the configured MaxTime after job
              submission.

       MinNodes
              Minimum count of nodes which may be allocated to any single
              job. The default value is 0.

       Nodes  Comma-separated list of nodes or nodesets which are
              associated with this partition. Node names may be
              specified using the node range expression syntax described
              above. A blank list of nodes (i.e. "Nodes= ") can be used
              if one wants a partition to exist, but have no resources
              (possibly on a temporary basis). A value of "ALL" is
              mapped to all nodes configured in the cluster.

       OverSubscribe
              Controls the ability of the partition to execute more than
              one job at a time on each resource (node, socket or core
              depending upon the value of SelectTypeParameters). If
              resources are to be over-subscribed, avoiding memory
              over-subscription is very important. SelectTypeParameters
              should be configured to treat memory as a consumable
              resource and the --mem option should be used for job
              allocations. Sharing of resources is typically useful only
              when using gang scheduling (PreemptMode=suspend,gang).
              Possible values for OverSubscribe are "EXCLUSIVE", "FORCE",
              "YES", and "NO". Note that a value of "YES" or "FORCE" can
              negatively impact performance for systems with many
              thousands of running jobs. The default value is "NO". For
              more information see the following web pages:
              https://slurm.schedmd.com/cons_res.html
              https://slurm.schedmd.com/cons_res_share.html
              https://slurm.schedmd.com/gang_scheduling.html
              https://slurm.schedmd.com/preempt.html

              EXCLUSIVE
                     Allocates entire nodes to jobs even with
                     SelectType=select/cons_res or
                     SelectType=select/cons_tres configured. Jobs that
                     run in partitions with OverSubscribe=EXCLUSIVE will
                     have exclusive access to all allocated nodes. These
                     jobs are allocated all CPUs and GRES on the nodes,
                     but they are only allocated as much memory as they
                     ask for. This is by design to support gang
                     scheduling, because suspended jobs still reside in
                     memory. To request all the memory on a node, use
                     --mem=0 at submit time.

              FORCE  Makes all resources (except GRES) in the partition
                     available for oversubscription without any means for
                     users to disable it. May be followed with a colon
                     and maximum number of jobs in running or suspended
                     state. For example OverSubscribe=FORCE:4 enables
                     each node, socket or core to oversubscribe each
                     resource four ways. Recommended only for systems
                     using PreemptMode=suspend,gang.

                     NOTE: OverSubscribe=FORCE:1 is a special case that
                     is not exactly equivalent to OverSubscribe=NO.
                     OverSubscribe=FORCE:1 disables the regular
                     oversubscription of resources in the same partition
                     but it will still allow oversubscription due to
                     preemption. Setting OverSubscribe=NO will prevent
                     oversubscription from happening due to preemption
                     as well.

                     NOTE: If using PreemptType=preempt/qos you can
                     specify a value for FORCE that is greater than 1.
                     For example, OverSubscribe=FORCE:2 will permit two
                     jobs per resource normally, but a third job can be
                     started only if done so through preemption based
                     upon QOS.

                     NOTE: If OverSubscribe is configured to FORCE or
                     YES in your slurm.conf and the system is not
                     configured to use preemption (PreemptMode=OFF)
                     accounting can easily grow to values greater than
                     the actual utilization. It may be common on such
                     systems to get error messages in the slurmdbd log
                     stating: "We have more allocated time than is
                     possible."

              YES    Makes all resources (except GRES) in the partition
                     available for sharing upon request by the job.
                     Resources will only be over-subscribed when
                     explicitly requested by the user using the
                     "--oversubscribe" option on job submission. May be
                     followed with a colon and maximum number of jobs in
                     running or suspended state. For example
                     "OverSubscribe=YES:4" enables each node, socket or
                     core to execute up to four jobs at once.
                     Recommended only for systems running with gang
                     scheduling (PreemptMode=suspend,gang).

              NO     Selected resources are allocated to a single job.
                     No resource will be allocated to more than one job.

                     NOTE: Even if you are using
                     PreemptMode=suspend,gang, setting OverSubscribe=NO
                     will disable preemption on that partition. Use
                     OverSubscribe=FORCE:1 if you want to disable normal
                     oversubscription but still allow suspension due to
                     preemption.

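       As a sketch (the node names are hypothetical), a time-sliced
       partition using gang scheduling might be configured with:

       PreemptMode=suspend,gang
       PartitionName=timeshare Nodes=tux[0-31] OverSubscribe=FORCE:4

       With FORCE:4, up to four jobs may share each resource and the gang
       scheduler time-slices among them.
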
       OverTimeLimit
              Number of minutes by which a job can exceed its time limit
              before being canceled. Normally a job's time limit is
              treated as a hard limit and the job will be killed upon
              reaching that limit. Configuring OverTimeLimit will result
              in the job's time limit being treated like a soft limit.
              Adding the OverTimeLimit value to the soft time limit
              provides a hard time limit, at which point the job is
              canceled. This is particularly useful for backfill
              scheduling, which is based upon each job's soft time
              limit. If not set, the OverTimeLimit value for the entire
              cluster will be used. May not exceed 65533 minutes. A
              value of "UNLIMITED" is also supported.

       PartitionName
              Name by which the partition may be referenced (e.g.
              "Interactive"). This name can be specified by users when
              submitting jobs. If the PartitionName is "DEFAULT", the
              values specified with that record will apply to subsequent
              partition specifications unless explicitly set to other
              values in that partition record or replaced with a
              different set of default values. Each line where
              PartitionName is "DEFAULT" will replace or add to previous
              default values and not reinitialize the default values.

       PreemptMode
              Mechanism used to preempt jobs or enable gang scheduling
              for this partition when
              PreemptType=preempt/partition_prio is configured. This
              partition-specific PreemptMode configuration parameter
              will override the cluster-wide PreemptMode for this
              partition. It can be set to OFF to disable preemption and
              gang scheduling for this partition. See also PriorityTier
              and the above description of the cluster-wide PreemptMode
              parameter for further details.
              The GANG option is used to enable gang scheduling
              independent of whether preemption is enabled (i.e.
              independent of the PreemptType setting). It can be
              specified in addition to a PreemptMode setting with the
              two options comma separated (e.g.
              PreemptMode=SUSPEND,GANG).
              See <https://slurm.schedmd.com/preempt.html> and
              <https://slurm.schedmd.com/gang_scheduling.html> for more
              details.

              NOTE: For performance reasons, the backfill scheduler
              reserves whole nodes for jobs, not partial nodes. If
              during backfill scheduling a job preempts one or more
              other jobs, the whole nodes for those preempted jobs are
              reserved for the preemptor job, even if the preemptor job
              requested fewer resources than that. These reserved nodes
              aren't available to other jobs during that backfill cycle,
              even if the other jobs could fit on the nodes. Therefore,
              jobs may preempt more resources during a single backfill
              iteration than they requested.
              NOTE: For a heterogeneous job to be considered for
              preemption all components must be eligible for preemption.
              When a heterogeneous job is to be preempted the first
              identified component of the job with the highest order
              PreemptMode (SUSPEND (highest), REQUEUE, CANCEL (lowest))
              will be used to set the PreemptMode for all components.
              The GraceTime and user warning signal for each component
              of the heterogeneous job remain unique. Heterogeneous
              jobs are excluded from GANG scheduling operations.

              OFF    Is the default value and disables job preemption
                     and gang scheduling. It is only compatible with
                     PreemptType=preempt/none at a global level. A
                     common use case for this parameter is to set it on
                     a partition to disable preemption for that
                     partition.

              CANCEL The preempted job will be cancelled.

              GANG   Enables gang scheduling (time slicing) of jobs in
                     the same partition, and allows the resuming of
                     suspended jobs.

                     NOTE: Gang scheduling is performed independently
                     for each partition, so if you only want
                     time-slicing by OverSubscribe, without any
                     preemption, then configuring partitions with
                     overlapping nodes is not recommended. On the other
                     hand, if you want to use
                     PreemptType=preempt/partition_prio to allow jobs
                     from higher PriorityTier partitions to Suspend jobs
                     from lower PriorityTier partitions you will need
                     overlapping partitions, and
                     PreemptMode=SUSPEND,GANG to use the Gang scheduler
                     to resume the suspended job(s). In any case,
                     time-slicing won't happen between jobs on different
                     partitions.
                     NOTE: Heterogeneous jobs are excluded from GANG
                     scheduling operations.

              REQUEUE
                     Preempts jobs by requeuing them (if possible) or
                     canceling them. For jobs to be requeued they must
                     have the --requeue sbatch option set or the cluster
                     wide JobRequeue parameter in slurm.conf must be set
                     to 1.

              SUSPEND
                     The preempted jobs will be suspended, and later the
                     Gang scheduler will resume them. Therefore the
                     SUSPEND preemption mode always needs the GANG
                     option to be specified at the cluster level. Also,
                     because the suspended jobs will still use memory on
                     the allocated nodes, Slurm needs to be able to
                     track memory resources to be able to suspend jobs.

                     If the preemptees and preemptor are on different
                     partitions then the preempted jobs will remain
                     suspended until the preemptor ends.
                     NOTE: Because gang scheduling is performed
                     independently for each partition, if using
                     PreemptType=preempt/partition_prio then jobs in
                     higher PriorityTier partitions will suspend jobs in
                     lower PriorityTier partitions to run on the
                     released resources. Only when the preemptor job
                     ends will the suspended jobs be resumed by the Gang
                     scheduler.
                     NOTE: Suspended jobs will not release GRES. Higher
                     priority jobs will not be able to preempt to gain
                     access to GRES.

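       A hedged sketch of partition-based preemption (the partition
       names, node names and tier values are hypothetical):

       PreemptType=preempt/partition_prio
       PreemptMode=SUSPEND,GANG
       PartitionName=low Nodes=tux[0-63] PriorityTier=10
       PartitionName=high Nodes=tux[0-63] PriorityTier=100

       Jobs in "high" may suspend jobs in "low" on the overlapping nodes,
       and the Gang scheduler resumes the suspended jobs once the
       preemptors finish.
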
       PriorityJobFactor
              Partition factor used by priority/multifactor plugin in
              calculating job priority. The value may not exceed 65533.
              Also see PriorityTier.

       PriorityTier
              Jobs submitted to a partition with a higher PriorityTier
              value will be evaluated by the scheduler before pending
              jobs in a partition with a lower PriorityTier value. They
              will also be considered for preemption of running jobs in
              partition(s) with lower PriorityTier values if
              PreemptType=preempt/partition_prio. The value may not
              exceed 65533. Also see PriorityJobFactor.

       QOS    Used to extend the limits available to a QOS on a
              partition. Jobs will not be associated to this QOS outside
              of being associated to the partition. They will still be
              associated to their requested QOS. By default, no QOS is
              used. NOTE: If a limit is set in both the Partition's QOS
              and the Job's QOS, the Partition QOS will be honored unless
              the Job's QOS has the OverPartQOS flag set, in which case
              the Job's QOS will have priority.

       ReqResv
              Specifies users of this partition are required to
              designate a reservation when submitting a job. This option
              can be useful in restricting usage of a partition that may
              have higher priority or additional resources to be allowed
              only within a reservation. Possible values are "YES" and
              "NO". The default value is "NO".

       ResumeTimeout
              Maximum time permitted (in seconds) between when a node
              resume request is issued and when the node is actually
              available for use. Nodes which fail to respond in this
              time frame will be marked DOWN and the jobs scheduled on
              the node requeued. Nodes which reboot after this time
              frame will be marked DOWN with a reason of "Node
              unexpectedly rebooted." For nodes that are in multiple
              partitions with this option set, the highest time will
              take effect. If not set on any partition, the node will
              use the ResumeTimeout value set for the entire cluster.

       RootOnly
              Specifies if only user ID zero (i.e. user root) may
              allocate resources in this partition. User root may
              allocate resources for any other user, but the request
              must be initiated by user root. This option can be useful
              for a partition to be managed by some external entity
              (e.g. a higher-level job manager) and prevents users from
              directly using those resources. Possible values are "YES"
              and "NO". The default value is "NO".

       SelectTypeParameters
              Partition-specific resource allocation type. This option
              replaces the global SelectTypeParameters value. Supported
              values are CR_Core, CR_Core_Memory, CR_Socket and
              CR_Socket_Memory. Use requires the system-wide
              SelectTypeParameters value be set to any of the four
              supported values previously listed; otherwise, the
              partition-specific value will be ignored.

       Shared The Shared configuration parameter has been replaced by
              the OverSubscribe parameter described above.

       State  State of partition or availability for use. Possible
              values are "UP", "DOWN", "DRAIN" and "INACTIVE". The
              default value is "UP". See also the related "Alternate"
              keyword.

              UP     Designates that new jobs may be queued on the
                     partition, and that jobs may be allocated nodes and
                     run from the partition.

              DOWN   Designates that new jobs may be queued on the
                     partition, but queued jobs may not be allocated
                     nodes and run from the partition. Jobs already
                     running on the partition continue to run. The jobs
                     must be explicitly canceled to force their
                     termination.

              DRAIN  Designates that no new jobs may be queued on the
                     partition (job submission requests will be denied
                     with an error message), but jobs already queued on
                     the partition may be allocated nodes and run. See
                     also the "Alternate" partition specification.

              INACTIVE
                     Designates that no new jobs may be queued on the
                     partition, and jobs already queued may not be
                     allocated nodes and run. See also the "Alternate"
                     partition specification.

       SuspendTime
              Nodes which remain idle or down for this number of seconds
              will be placed into power save mode by SuspendProgram.
              For nodes that are in multiple partitions with this option
              set, the highest time will take effect. If not set on any
              partition, the node will use the SuspendTime value set for
              the entire cluster. Setting SuspendTime to INFINITE will
              disable suspending of nodes in this partition. Setting
              SuspendTime to anything but INFINITE (or -1) will enable
              power save mode.

       SuspendTimeout
              Maximum time permitted (in seconds) between when a node
              suspend request is issued and when the node is shut down.
              At that time the node must be ready for a resume request
              to be issued as needed for new work. For nodes that are in
              multiple partitions with this option set, the highest time
              will take effect. If not set on any partition, the node
              will use the SuspendTimeout value set for the entire
              cluster.

       TRESBillingWeights
              TRESBillingWeights is used to define the billing weights
              of each TRES type that will be used in calculating the
              usage of a job. The calculated usage is used when
              calculating fairshare and when enforcing the TRES billing
              limit on jobs.

              Billing weights are specified as a comma-separated list of
              <TRES Type>=<TRES Billing Weight> pairs.

              Any TRES Type is available for billing. Note that the base
              unit for memory and burst buffers is megabytes.

              By default the billing of TRES is calculated as the sum of
              all TRES types multiplied by their corresponding billing
              weight.

              The weighted amount of a resource can be adjusted by
              adding a suffix of K, M, G, T or P after the billing
              weight. For example, a memory weight of "mem=.25" on a job
              allocated 8GB will be billed 2048 (8192MB * .25) units. A
              memory weight of "mem=.25G" on the same job will be billed
              2 (8192MB * (.25/1024)) units.

              Negative values are allowed.

              When a job is allocated 1 CPU and 8 GB of memory on a
              partition configured with
              TRESBillingWeights="CPU=1.0,Mem=0.25G,GRES/gpu=2.0", the
              billable TRES will be: (1*1.0) + (8*0.25) + (0*2.0) = 3.0.

              If PriorityFlags=MAX_TRES is configured, the billable TRES
              is calculated as the MAX of individual TRES' on a node
              (e.g. cpus, mem, gres) plus the sum of all global TRES'
              (e.g. licenses). Using the same example above the billable
              TRES will be MAX(1*1.0, 8*0.25) + (0*2.0) = 2.0.

              If TRESBillingWeights is not defined then the job is
              billed against the total number of allocated CPUs.

              NOTE: TRESBillingWeights doesn't affect job priority
              directly as it is currently not used for the size of the
              job. If you want TRES' to play a role in the job's
              priority then refer to the PriorityWeightTRES option.

5553 There are a variety of prolog and epilog program options that execute
5554 with various permissions and at various times. The four options most
5555 likely to be used are: Prolog and Epilog (executed once on each compute
5556 node for each job) plus PrologSlurmctld and EpilogSlurmctld (executed
5557 once on the ControlMachine for each job).
5558
5559 NOTE: Standard output and error messages are normally not preserved.
5560 Explicitly write output and error messages to an appropriate location
5561 if you wish to preserve that information.
5562
5563 NOTE: By default the Prolog script is ONLY run on any individual node
5564 when it first sees a job step from a new allocation. It does not run
5565 the Prolog immediately when an allocation is granted. If no job steps
5566 from an allocation are run on a node, it will never run the Prolog for
5567 that allocation.  This Prolog behavior can be changed by the Pro‐
5568 logFlags parameter. The Epilog, on the other hand, always runs on ev‐
5569 ery node of an allocation when the allocation is released.
5570
5571 If the Epilog fails (returns a non-zero exit code), this will result in
5572 the node being set to a DRAIN state. If the EpilogSlurmctld fails (re‐
5573 turns a non-zero exit code), this will only be logged. If the Prolog
5574 fails (returns a non-zero exit code), this will result in the node be‐
5575 ing set to a DRAIN state and the job being requeued in a held state un‐
5576 less nohold_on_prolog_fail is configured in SchedulerParameters. If
5577 the PrologSlurmctld fails (returns a non-zero exit code), this will re‐
5578 sult in the job being requeued to be executed on another node if possi‐
5579 ble. Only batch jobs can be requeued. Interactive jobs (salloc and
5580 srun) will be cancelled if the PrologSlurmctld fails. If slurmctld is
5581 stopped while either PrologSlurmctld or EpilogSlurmctld is running, the
5582 script will be killed with SIGKILL. The script will restart when slurm‐
5583 ctld restarts.
5584
5585
5586 Information about the job is passed to the script using environment
5587 variables. Unless otherwise specified, these environment variables are
5588 available in each of the scripts mentioned above (Prolog, Epilog, Pro‐
5589 logSlurmctld and EpilogSlurmctld). For a full list of environment vari‐
5590 ables that includes those available in the SrunProlog, SrunEpilog,
5591 TaskProlog and TaskEpilog please see the Prolog and Epilog Guide
5592 <https://slurm.schedmd.com/prolog_epilog.html>.
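       As a concrete sketch, a minimal Prolog might record each job start
       using these variables. Everything below is illustrative: the log path
       is an assumption, and the script is written to a temporary location
       only so it can be exercised by hand outside of slurmd.

       ```shell
       # Write a minimal Prolog sketch to a temporary path so it can be run by hand.
       cat > /tmp/slurm_prolog_demo.sh <<'EOF'
       #!/bin/sh
       # Hypothetical Prolog: record job start; slurmd normally sets these variables.
       # Stdout is not preserved, so write explicitly to a log file.
       LOG=/tmp/slurm_prolog_demo.log
       echo "start job=${SLURM_JOB_ID:-?} user=${SLURM_JOB_USER:-?} part=${SLURM_JOB_PARTITION:-?}" >> "$LOG"
       exit 0   # a non-zero exit here would DRAIN the node and requeue the job held
       EOF
       chmod +x /tmp/slurm_prolog_demo.sh

       # Exercise it the way slurmd would, with the environment described above.
       SLURM_JOB_ID=1234 SLURM_JOB_USER=alice SLURM_JOB_PARTITION=debug /tmp/slurm_prolog_demo.sh
       cat /tmp/slurm_prolog_demo.log
       ```

       Note the script writes its own log rather than relying on stdout,
       since standard output from these scripts is normally not preserved.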
5593
5594
5595 SLURM_ARRAY_JOB_ID
5596 If this job is part of a job array, this will be set to the job
5597 ID. Otherwise it will not be set. To reference this specific
5598 task of a job array, combine SLURM_ARRAY_JOB_ID with SLURM_AR‐
5600 RAY_TASK_ID (e.g. "scontrol update
5601 ${SLURM_ARRAY_JOB_ID}_${SLURM_ARRAY_TASK_ID} ..."); Available in Pro‐
5601 logSlurmctld and EpilogSlurmctld.
5602
5603 SLURM_ARRAY_TASK_ID
5604 If this job is part of a job array, this will be set to the task
5605 ID. Otherwise it will not be set. To reference this specific
5606 task of a job array, combine SLURM_ARRAY_JOB_ID with SLURM_AR‐
5607 RAY_TASK_ID (e.g. "scontrol update
5608 ${SLURM_ARRAY_JOB_ID}_${SLURM_ARRAY_TASK_ID} ..."); Available in Pro‐
5609 logSlurmctld and EpilogSlurmctld.
5610
5611 SLURM_ARRAY_TASK_MAX
5612 If this job is part of a job array, this will be set to the max‐
5613 imum task ID. Otherwise it will not be set. Available in Pro‐
5614 logSlurmctld and EpilogSlurmctld.
5615
5616 SLURM_ARRAY_TASK_MIN
5617 If this job is part of a job array, this will be set to the min‐
5618 imum task ID. Otherwise it will not be set. Available in Pro‐
5619 logSlurmctld and EpilogSlurmctld.
5620
5621 SLURM_ARRAY_TASK_STEP
5622 If this job is part of a job array, this will be set to the step
5623 size of task IDs. Otherwise it will not be set. Available in
5624 PrologSlurmctld and EpilogSlurmctld.
5625
5626 SLURM_CLUSTER_NAME
5627 Name of the cluster executing the job.
5628
5629 SLURM_CONF
5630 Location of the slurm.conf file. Available in Prolog and Epilog.
5631
5632 SLURMD_NODENAME
5633 Name of the node running the task. In the case of a parallel job
5634 executing on multiple compute nodes, the various tasks will have
5635 this environment variable set to different values on each com‐
5636 pute node. Available in Prolog and Epilog.
5637
5638 SLURM_JOB_ACCOUNT
5639 Account name used for the job.
5640
5641 SLURM_JOB_COMMENT
5642 Comment added to the job. Available in Prolog, PrologSlurmctld,
5643 Epilog and EpilogSlurmctld.
5644
5645 SLURM_JOB_CONSTRAINTS
5646 Features required to run the job. Available in Prolog, Pro‐
5647 logSlurmctld, Epilog and EpilogSlurmctld.
5648
5649 SLURM_JOB_DERIVED_EC
5650 The highest exit code of all of the job steps. Available in
5651 Epilog and EpilogSlurmctld.
5652
5653 SLURM_JOB_EXIT_CODE
5654 The exit code of the job script (or salloc). The value is the
5655 status as returned by the wait() system call (see wait(2)).
5656 Available in Epilog and EpilogSlurmctld.
5657
5658 SLURM_JOB_EXIT_CODE2
5659 The exit code of the job script (or salloc). The value has the
5660 format <exit>:<sig>. The first number is the exit code, typi‐
5661 cally as set by the exit() function.  The second number is the
5662 signal that caused the process to terminate, if it was terminated
5663 by a signal. Available in Epilog and EpilogSlurmctld.
5664
5665 SLURM_JOB_GID
5666 Group ID of the job's owner.
5667
5668 SLURM_JOB_GPUS
5669 The GPU IDs of GPUs in the job allocation (if any). Available
5670 in the Prolog and Epilog.
5671
5672 SLURM_JOB_GROUP
5673 Group name of the job's owner. Available in PrologSlurmctld and
5674 EpilogSlurmctld.
5675
5676 SLURM_JOB_ID
5677 Job ID.
5678
5679 SLURM_JOBID
5680 Job ID.
5681
5682 SLURM_JOB_NAME
5683 Name of the job. Available in PrologSlurmctld and EpilogSlurm‐
5684 ctld.
5685
5686 SLURM_JOB_NODELIST
5687 Nodes assigned to job. A Slurm hostlist expression. "scontrol
5688 show hostnames" can be used to convert this to a list of indi‐
5689 vidual host names. Available in PrologSlurmctld and Epi‐
5690 logSlurmctld.
5691
5692 SLURM_JOB_PARTITION
5693 Partition that job runs in. Available in Prolog, PrologSlurm‐
5694 ctld, Epilog and EpilogSlurmctld.
5695
5696 SLURM_JOB_UID
5697 User ID of the job's owner.
5698
5699 SLURM_JOB_USER
5700 User name of the job's owner.
5701
5702 SLURM_SCRIPT_CONTEXT
5703 Identifies which epilog or prolog program is currently running.
5704
UNKILLABLE STEP PROGRAM SCRIPT
5706 This program can be used to take special actions to clean up the unkil‐
5707 lable processes and/or notify system administrators. The program will
5708 be run as SlurmdUser (usually "root") on the compute node where Unkill‐
5709 ableStepTimeout was triggered.
5710
5711 Information about the unkillable job step is passed to the script using
5712 environment variables.
5713
5714
5715 SLURM_JOB_ID
5716 Job ID.
5717
5718 SLURM_STEP_ID
5719 Job Step ID.
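       A minimal UnkillableStepProgram sketch using the two variables above;
       the log path and message format are assumptions, and the script is
       staged in /tmp only so it can be exercised by hand:

       ```shell
       cat > /tmp/slurm_unkillable_demo.sh <<'EOF'
       #!/bin/sh
       # Hypothetical UnkillableStepProgram: record the stuck step for admins.
       MSG="unkillable step ${SLURM_JOB_ID:-?}.${SLURM_STEP_ID:-?} on $(hostname)"
       echo "$MSG" >> /tmp/slurm_unkillable_demo.log
       # A real site might also page an operator here, e.g. via mail(1).
       EOF
       chmod +x /tmp/slurm_unkillable_demo.sh

       # Exercise it the way slurmd would when UnkillableStepTimeout triggers.
       SLURM_JOB_ID=42 SLURM_STEP_ID=0 /tmp/slurm_unkillable_demo.sh
       ```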
5720
NETWORK TOPOLOGY
5722 Slurm is able to optimize job allocations to minimize network con‐
5723 tention. Special Slurm logic is used to optimize allocations on sys‐
5724 tems with a three-dimensional interconnect; information about con‐
5725 figuring those systems is available at:
5726 <https://slurm.schedmd.com/>. For a hierarchical network, Slurm needs
5727 to have detailed information about how nodes are configured on the net‐
5728 work switches.
5729
5730 Given network topology information, Slurm allocates all of a job's re‐
5731 sources onto a single leaf of the network (if possible) using a
5732 best-fit algorithm. Otherwise it will allocate a job's resources onto
5733 multiple leaf switches so as to minimize the use of higher-level
5734 switches. The TopologyPlugin parameter controls which plugin is used
5735 to collect network topology information. The only values presently
5736 supported are "topology/3d_torus" (default for Cray XT/XE systems, per‐
5737 forms best-fit logic over three-dimensional topology), "topology/none"
5738 (default for other systems, best-fit logic over one-dimensional topol‐
5739 ogy), "topology/tree" (determine the network topology based upon infor‐
5740 mation contained in a topology.conf file, see "man topology.conf" for
5741 more information). Future plugins may gather topology information di‐
5742 rectly from the network. The topology information is optional. If not
5743 provided, Slurm will perform a best-fit algorithm assuming the nodes
5744 are in a one-dimensional array as configured and the communications
5745 cost is related to the node distance in this array.
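       For example, a hierarchical network is described by setting
       topology/tree in slurm.conf and listing the switch hierarchy in
       topology.conf (see "man topology.conf"); the switch and node names
       below are hypothetical:

       ```
       # slurm.conf
       TopologyPlugin=topology/tree

       # topology.conf
       SwitchName=leaf0 Nodes=dev[0-12]
       SwitchName=leaf1 Nodes=dev[13-25]
       SwitchName=spine Switches=leaf[0-1]
       ```

       With this layout, Slurm prefers to place a whole job under leaf0 or
       leaf1 before spanning the spine switch.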
5746
5747
RELOCATING CONTROLLERS
5749 If the cluster's computers used for the primary or backup controller
5750 will be out of service for an extended period of time, it may be desir‐
5751 able to relocate them. In order to do so, follow this procedure:
5752
5753 1. Stop the Slurm daemons
5754 2. Modify the slurm.conf file appropriately
5755 3. Distribute the updated slurm.conf file to all nodes
5756 4. Restart the Slurm daemons
5757
5758 There should be no loss of any running or pending jobs. Ensure that
5759 any nodes added to the cluster have the current slurm.conf file in‐
5760 stalled.
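       Step 2 usually amounts to updating the SlurmctldHost lines; a
       hypothetical before/after, with dev2 standing in for the replacement
       host:

       ```
       # Before: primary controller on dev0
       SlurmctldHost=dev0(12.34.56.78)
       SlurmctldHost=dev1(12.34.56.79)

       # After: primary relocated to dev2 (hypothetical replacement host)
       SlurmctldHost=dev2(12.34.56.80)
       SlurmctldHost=dev1(12.34.56.79)
       ```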
5761
5762 CAUTION: If two nodes are simultaneously configured as the primary con‐
5763 troller (two nodes on which SlurmctldHost specifies the local host and
5764 on which the slurmctld daemon is executing), system behavior will be de‐
5765 structive. If a compute node has an incorrect SlurmctldHost parameter,
5766 that node may be rendered unusable, but no other harm will result.
5767
5768
EXAMPLE
5770 #
5771 # Sample /etc/slurm.conf for dev[0-25].llnl.gov
5772 # Author: John Doe
5773 # Date: 11/06/2001
5774 #
5775 SlurmctldHost=dev0(12.34.56.78) # Primary server
5776 SlurmctldHost=dev1(12.34.56.79) # Backup server
5777 #
5778 AuthType=auth/munge
5779 Epilog=/usr/local/slurm/epilog
5780 Prolog=/usr/local/slurm/prolog
5781 FirstJobId=65536
5782 InactiveLimit=120
5783 JobCompType=jobcomp/filetxt
5784 JobCompLoc=/var/log/slurm/jobcomp
5785 KillWait=30
5786 MaxJobCount=10000
5787 MinJobAge=3600
5788 PluginDir=/usr/local/lib:/usr/local/slurm/lib
5789 ReturnToService=0
5790 SchedulerType=sched/backfill
5791 SlurmctldLogFile=/var/log/slurm/slurmctld.log
5792 SlurmdLogFile=/var/log/slurm/slurmd.log
5793 SlurmctldPort=7002
5794 SlurmdPort=7003
5795 SlurmdSpoolDir=/var/spool/slurmd.spool
5796 StateSaveLocation=/var/spool/slurm.state
5797 SwitchType=switch/none
5798 TmpFS=/tmp
5799 WaitTime=30
5800 #
5801 # Node Configurations
5802 #
5803 NodeName=DEFAULT CPUs=2 RealMemory=2000 TmpDisk=64000
5804 NodeName=DEFAULT State=UNKNOWN
5805 NodeName=dev[0-25] NodeAddr=edev[0-25] Weight=16
5806 # Update records for specific DOWN nodes
5807 DownNodes=dev20 State=DOWN Reason="power,ETA=Dec25"
5808 #
5809 # Partition Configurations
5810 #
5811 PartitionName=DEFAULT MaxTime=30 MaxNodes=10 State=UP
5812 PartitionName=debug Nodes=dev[0-8,18-25] Default=YES
5813 PartitionName=batch Nodes=dev[9-17] MinNodes=4
5814 PartitionName=long Nodes=dev[9-17] MaxTime=120 AllowGroups=admin
5815
5816
INCLUDE MODIFIERS
5818 The "include" key word can be used with modifiers within the specified
5819 pathname.  These modifiers are replaced with the cluster name or other
5820 information, depending on which modifier is specified.  If the included
5821 file is not an absolute path name (i.e. it does not start with a
5822 slash), it will be searched for in the same directory as the
5823 slurm.conf file.
5824
5825
5826 %c Cluster name specified in the slurm.conf will be used.
5827
5828 EXAMPLE
5829 ClusterName=linux
5830 include /home/slurm/etc/%c_config
5831 # Above line interpreted as
5832 # "include /home/slurm/etc/linux_config"
5833
5834
FILE AND DIRECTORY PERMISSIONS
5836 There are three classes of files: Files used by slurmctld must be ac‐
5837 cessible by user SlurmUser and accessible by the primary and backup
5838 control machines. Files used by slurmd must be accessible by user root
5839 and accessible from every compute node. A few files need to be acces‐
5840 sible by normal users on all login and compute nodes. While many files
5841 and directories are listed below, most of them will not be used with
5842 most configurations.
5843
5844
5845 Epilog Must be executable by user root. It is recommended that the
5846 file be readable by all users. The file must exist on every
5847 compute node.
5848
5849 EpilogSlurmctld
5850 Must be executable by user SlurmUser. It is recommended that
5851 the file be readable by all users. The file must be accessible
5852 by the primary and backup control machines.
5853
5854 HealthCheckProgram
5855 Must be executable by user root. It is recommended that the
5856 file be readable by all users. The file must exist on every
5857 compute node.
5858
5859 JobCompLoc
5860 If this specifies a file, it must be writable by user SlurmUser.
5861 The file must be accessible by the primary and backup control
5862 machines.
5863
5864 MailProg
5865 Must be executable by user SlurmUser. Must not be writable by
5866 regular users. The file must be accessible by the primary and
5867 backup control machines.
5868
5869 Prolog Must be executable by user root. It is recommended that the
5870 file be readable by all users. The file must exist on every
5871 compute node.
5872
5873 PrologSlurmctld
5874 Must be executable by user SlurmUser. It is recommended that
5875 the file be readable by all users. The file must be accessible
5876 by the primary and backup control machines.
5877
5878 ResumeProgram
5879 Must be executable by user SlurmUser. The file must be accessi‐
5880 ble by the primary and backup control machines.
5881
5882 slurm.conf
5883 Readable to all users on all nodes. Must not be writable by
5884 regular users.
5885
5886 SlurmctldLogFile
5887 Must be writable by user SlurmUser. The file must be accessible
5888 by the primary and backup control machines.
5889
5890 SlurmctldPidFile
5891 Must be writable by user root. Preferably writable and remov‐
5892 able by SlurmUser. The file must be accessible by the primary
5893 and backup control machines.
5894
5895 SlurmdLogFile
5896 Must be writable by user root. A distinct file must exist on
5897 each compute node.
5898
5899 SlurmdPidFile
5900 Must be writable by user root. A distinct file must exist on
5901 each compute node.
5902
5903 SlurmdSpoolDir
5904 Must be writable by user root. Permissions must be set to 755 so
5905 that job scripts can be executed from this directory.  A dis‐
5906 tinct directory must exist on each compute node.
5907
5908 SrunEpilog
5909 Must be executable by all users. The file must exist on every
5910 login and compute node.
5911
5912 SrunProlog
5913 Must be executable by all users. The file must exist on every
5914 login and compute node.
5915
5916 StateSaveLocation
5917 Must be writable by user SlurmUser. The file must be accessible
5918 by the primary and backup control machines.
5919
5920 SuspendProgram
5921 Must be executable by user SlurmUser. The file must be accessi‐
5922 ble by the primary and backup control machines.
5923
5924 TaskEpilog
5925 Must be executable by all users. The file must exist on every
5926 compute node.
5927
5928 TaskProlog
5929 Must be executable by all users. The file must exist on every
5930 compute node.
5931
5932 UnkillableStepProgram
5933 Must be executable by user SlurmdUser. The file must be acces‐
5934 sible by the primary and backup control machines.
5935
LOGGING
5937 Note that while Slurm daemons create log files and other files as
5938 needed, they treat the lack of parent directories as a fatal error.
5939 This prevents the daemons from running if critical file systems are not
5940 mounted and will minimize the risk of cold-starting (starting without
5941 preserving jobs).
5942
5943 Log files and job accounting files may need to be created/owned by the
5944 "SlurmUser" uid to be successfully accessed. Use the "chown" and
5945 "chmod" commands to set the ownership and permissions appropriately.
5946 See the section FILE AND DIRECTORY PERMISSIONS for information about
5947 the various files and directories used by Slurm.
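       The chown/chmod steps might look like the following; the paths are
       assumptions, and a /tmp directory stands in for /var/log/slurm purely
       so the commands can be tried without root privileges (a real site
       would operate on the actual log directory as root and also chown the
       files to SlurmUser):

       ```shell
       # Illustration only: /tmp/slurm_perm_demo stands in for /var/log/slurm.
       LOGDIR=/tmp/slurm_perm_demo
       install -d -m 755 "$LOGDIR"            # parent directory must already exist
       touch "$LOGDIR/slurmctld.log"
       chmod 640 "$LOGDIR/slurmctld.log"      # readable by the group, not the world
       # On a real system, additionally: chown slurm:slurm "$LOGDIR/slurmctld.log"
       ls -l "$LOGDIR/slurmctld.log"
       ```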
5948
5949 It is recommended that the logrotate utility be used to ensure that
5950 various log files do not become too large. This also applies to text
5951 files used for accounting, process tracking, and the slurmdbd log if
5952 they are used.
5953
5954 Here is a sample logrotate configuration. Make appropriate site modifi‐
5955 cations and save as /etc/logrotate.d/slurm on all nodes. See the
5956 logrotate man page for more details.
5957
5958 ##
5959 # Slurm Logrotate Configuration
5960 ##
5961 /var/log/slurm/*.log {
5962 compress
5963 missingok
5964 nocopytruncate
5965 nodelaycompress
5966 nomail
5967 notifempty
5968 noolddir
5969 rotate 5
5970 sharedscripts
5971 size=5M
5972 create 640 slurm root
5973 postrotate
5974 pkill -x --signal SIGUSR2 slurmctld
5975 pkill -x --signal SIGUSR2 slurmd
5976 pkill -x --signal SIGUSR2 slurmdbd
5977 exit 0
5978 endscript
5979 }
5980
5981
COPYING
5983 Copyright (C) 2002-2007 The Regents of the University of California.
5984 Produced at Lawrence Livermore National Laboratory (cf, DISCLAIMER).
5985 Copyright (C) 2008-2010 Lawrence Livermore National Security.
5986 Copyright (C) 2010-2022 SchedMD LLC.
5987
5988 This file is part of Slurm, a resource management program. For de‐
5989 tails, see <https://slurm.schedmd.com/>.
5990
5991 Slurm is free software; you can redistribute it and/or modify it under
5992 the terms of the GNU General Public License as published by the Free
5993 Software Foundation; either version 2 of the License, or (at your op‐
5994 tion) any later version.
5995
5996 Slurm is distributed in the hope that it will be useful, but WITHOUT
5997 ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
5998 FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
5999 for more details.
6000
6001
FILES
6003 /etc/slurm.conf
6004
6005
SEE ALSO
6007 cgroup.conf(5), getaddrinfo(3), getrlimit(2), gres.conf(5), group(5),
6008 hostname(1), scontrol(1), slurmctld(8), slurmd(8), slurmdbd(8), slur‐
6009 mdbd.conf(5), srun(1), spank(8), syslog(3), topology.conf(5)
6010
6011
6012
6013January 2023 Slurm Configuration File slurm.conf(5)