slurm.conf(5)              Slurm Configuration File              slurm.conf(5)

NAME
       slurm.conf - Slurm configuration file

       The file location can be modified at system build time using the
       DEFAULT_SLURM_CONF parameter or at execution time by setting the
       SLURM_CONF environment variable. The Slurm daemons also allow you
       to override both the built-in and environment-provided location
       using the "-f" option on the command line.

       The contents of the file are case insensitive except for the names
       of nodes and partitions. Any text following a "#" in the
       configuration file is treated as a comment through the end of that
       line. Changes to the configuration file take effect upon restart of
       Slurm daemons, daemon receipt of the SIGHUP signal, or execution of
       the command "scontrol reconfigure" unless otherwise noted.

       If a line begins with the word "Include" followed by whitespace and
       then a file name, that file will be included inline with the
       current configuration file. For large or complex systems, multiple
       configuration files may prove easier to manage and enable reuse of
       some files (see INCLUDE MODIFIERS for more details).

       Note on file permissions:

       The slurm.conf file must be readable by all users of Slurm, since
       it is used by many of the Slurm commands. Other files that are
       defined in the slurm.conf file, such as log files and job
       accounting files, may need to be created/owned by the user
       "SlurmUser" to be successfully accessed. Use the "chown" and
       "chmod" commands to set the ownership and permissions
       appropriately. See the section FILE AND DIRECTORY PERMISSIONS for
       information about the various files and directories used by Slurm.

PARAMETERS
       The overall configuration parameters available include:

       AccountingStorageBackupHost
              The name of the backup machine hosting the accounting
              storage database. If used with the
              accounting_storage/slurmdbd plugin, this is where the backup
              slurmdbd would be running. Only used with systems using
              SlurmDBD, ignored otherwise.

       AccountingStorageEnforce
              This controls what level of association-based enforcement to
              impose on job submissions. Valid options are any combination
              of associations, limits, nojobs, nosteps, qos, safe, and
              wckeys, or all for all things (except nojobs and nosteps,
              which must be requested as well).

              If limits, qos, or wckeys are set, associations will
              automatically be set.

              If wckeys is set, TrackWCKey will automatically be set.

              If safe is set, limits and associations will automatically
              be set.

              If nojobs is set, nosteps will automatically be set.

              By setting associations, no new job is allowed to run unless
              a corresponding association exists in the system. If limits
              are enforced, users can be limited by association to
              whatever job size or run time limits are defined.

              If nojobs is set, Slurm will not account for any jobs or
              steps on the system. Likewise, if nosteps is set, Slurm will
              not account for any steps that have run.

              If safe is enforced, a job will only be launched against an
              association or qos that has a GrpTRESMins limit set if the
              job will be able to run to completion. Without this option
              set, jobs will be launched as long as their usage hasn't
              reached the cpu-minutes limit. This can lead to jobs being
              launched but then killed when the limit is reached.

              With qos and/or wckeys enforced, jobs will not be scheduled
              unless a valid qos and/or workload characterization key is
              specified.

              When AccountingStorageEnforce is changed, a restart of the
              slurmctld daemon is required (not just a "scontrol
              reconfig").

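              For example, a site that wants associations, limits and QOS
              enforced, with jobs started only when they can run to
              completion, might set:

                     AccountingStorageEnforce=associations,limits,qos,safe

              Since safe implies limits and associations, the shorter form
              "AccountingStorageEnforce=qos,safe" is equivalent.
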
       AccountingStorageExternalHost
              A comma-separated list of external slurmdbds
              (<host/ip>[:port][,...]) to register with. If no port is
              given, the AccountingStoragePort will be used.

              This allows clusters registered with the external slurmdbd
              to communicate with each other using the --cluster/-M client
              command options.

              The cluster will add itself to the external slurmdbd if it
              doesn't exist. If a non-external cluster already exists on
              the external slurmdbd, the slurmctld will ignore registering
              to the external slurmdbd.

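              For example, to register with two external slurmdbds, one of
              them on a non-default port (hostnames are illustrative):

                     AccountingStorageExternalHost=dbd1.example.com,dbd2.example.com:7031
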
       AccountingStorageHost
              The name of the machine hosting the accounting storage
              database. Only used with systems using SlurmDBD, ignored
              otherwise.

       AccountingStorageParameters
              Comma-separated list of key-value pair parameters. Currently
              supported values include options to establish a secure
              connection to the database:

              SSL_CERT
                     The path name of the client public key certificate
                     file.

              SSL_CA The path name of the Certificate Authority (CA)
                     certificate file.

              SSL_CAPATH
                     The path name of the directory that contains trusted
                     SSL CA certificate files.

              SSL_KEY
                     The path name of the client private key file.

              SSL_CIPHER
                     The list of permissible ciphers for SSL encryption.

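              For example, a TLS-protected database connection might be
              configured as follows (the paths are illustrative):

                     AccountingStorageParameters=SSL_CERT=/etc/slurm/ssl/client-cert.pem,SSL_KEY=/etc/slurm/ssl/client-key.pem,SSL_CA=/etc/slurm/ssl/ca-cert.pem
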
       AccountingStoragePass
              The password used to gain access to the database to store
              the accounting data. Only used for database type storage
              plugins, ignored otherwise. In the case of Slurm DBD
              (Database Daemon) with MUNGE authentication, this can be
              configured to use a MUNGE daemon specifically configured to
              provide authentication between clusters while the default
              MUNGE daemon provides authentication within a cluster. In
              that case, AccountingStoragePass should specify the named
              port to be used for communications with the alternate MUNGE
              daemon (e.g. "/var/run/munge/global.socket.2"). The default
              value is NULL.

       AccountingStoragePort
              The listening port of the accounting storage database
              server. Only used for database type storage plugins, ignored
              otherwise. The default value is SLURMDBD_PORT as established
              at system build time. If no value is explicitly specified,
              it will be set to 6819. This value must be equal to the
              DbdPort parameter in the slurmdbd.conf file.

       AccountingStorageTRES
              Comma-separated list of resources you wish to track on the
              cluster. These are the resources requested by the
              sbatch/srun job when it is submitted. Currently this
              consists of any GRES, BB (burst buffer) or license along
              with CPU, Memory, Node, Energy, FS/[Disk|Lustre], IC/OFED,
              Pages, and VMem. By default Billing, CPU, Energy, Memory,
              Node, FS/Disk, Pages and VMem are tracked. These default
              TRES cannot be disabled, but only appended to.
              AccountingStorageTRES=gres/craynetwork,license/iop1 will
              track billing, cpu, energy, memory, nodes, fs/disk, pages
              and vmem along with a gres called craynetwork as well as a
              license called iop1. Whenever these resources are used on
              the cluster they are recorded. The TRES are automatically
              set up in the database on the start of the slurmctld.

              If multiple GRES of different types are tracked (e.g. GPUs
              of different types), then job requests with matching type
              specifications will be recorded. Given a configuration of
              "AccountingStorageTRES=gres/gpu,gres/gpu:tesla,gres/gpu:volta",
              then "gres/gpu:tesla" and "gres/gpu:volta" will track only
              jobs that explicitly request those two GPU types, while
              "gres/gpu" will track allocated GPUs of any type ("tesla",
              "volta" or any other GPU type).

              Given a configuration of
              "AccountingStorageTRES=gres/gpu:tesla,gres/gpu:volta", then
              "gres/gpu:tesla" and "gres/gpu:volta" will track jobs that
              explicitly request those GPU types. If a job requests GPUs,
              but does not explicitly specify the GPU type, then its
              resource allocation will be accounted for as either
              "gres/gpu:tesla" or "gres/gpu:volta", although the
              accounting may not match the actual GPU type allocated to
              the job and the GPUs allocated to the job could be
              heterogeneous. In an environment containing various GPU
              types, use of a job_submit plugin may be desired in order to
              force jobs to explicitly specify some GPU type.

       AccountingStorageType
              The accounting storage mechanism type. Acceptable values at
              present include "accounting_storage/none" and
              "accounting_storage/slurmdbd". The
              "accounting_storage/slurmdbd" value indicates that
              accounting records will be written to the Slurm DBD, which
              manages an underlying MySQL database. See "man slurmdbd" for
              more information. The default value is
              "accounting_storage/none" and indicates that account records
              are not maintained.

       AccountingStorageUser
              The user account for accessing the accounting storage
              database. Only used for database type storage plugins,
              ignored otherwise.

       AccountingStoreFlags
              Comma-separated list used to tell the slurmctld to store
              extra fields that may be more heavyweight than the normal
              job information.

              Current options are:

              job_comment
                     Include the job's comment field in the job complete
                     message sent to the Accounting Storage database. Note
                     that the AdminComment and SystemComment are always
                     recorded in the database.

              job_env
                     Include a batch job's environment variables used at
                     job submission in the job start message sent to the
                     Accounting Storage database.

              job_script
                     Include the job's batch script in the job start
                     message sent to the Accounting Storage database.

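              For example, to archive batch scripts and job comments along
              with the normal job record:

                     AccountingStoreFlags=job_script,job_comment
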
       AcctGatherNodeFreq
              The AcctGather plugins' sampling interval for node
              accounting. For AcctGather plugin values of none, this
              parameter is ignored. For all other values this parameter is
              the number of seconds between node accounting samples. For
              the acct_gather_energy/rapl plugin, set a value less than
              300 because the counters may overflow beyond this rate. The
              default value is zero, which disables accounting sampling
              for nodes. Note: The accounting sampling interval for jobs
              is determined by the value of JobAcctGatherFrequency.

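              For example, to sample node accounting data every 30 seconds
              (a value safely below the 300-second RAPL overflow limit):

                     AcctGatherNodeFreq=30
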
       AcctGatherEnergyType
              Identifies the plugin to be used for energy consumption
              accounting. The jobacct_gather plugin and slurmd daemon call
              this plugin to collect energy consumption data for jobs and
              nodes. The collection of energy consumption data takes place
              at the node level, hence only in the case of exclusive job
              allocation will the energy consumption measurements reflect
              the job's real consumption. In the case of node sharing
              between jobs, the reported consumed energy per job (through
              sstat or sacct) will not reflect the real energy consumed by
              the jobs.

              Configurable values at present are:

              acct_gather_energy/none
                     No energy consumption data is collected.

              acct_gather_energy/ipmi
                     Energy consumption data is collected from the
                     Baseboard Management Controller (BMC) using the
                     Intelligent Platform Management Interface (IPMI).

              acct_gather_energy/pm_counters
                     Energy consumption data is collected from the
                     Baseboard Management Controller (BMC) for HPE Cray
                     systems.

              acct_gather_energy/rapl
                     Energy consumption data is collected from hardware
                     sensors using the Running Average Power Limit (RAPL)
                     mechanism. Note that enabling RAPL may require the
                     execution of the command "sudo modprobe msr".

              acct_gather_energy/xcc
                     Energy consumption data is collected from the Lenovo
                     SD650 XClarity Controller (XCC) using IPMI OEM raw
                     commands.

       AcctGatherInterconnectType
              Identifies the plugin to be used for interconnect network
              traffic accounting. The jobacct_gather plugin and slurmd
              daemon call this plugin to collect network traffic data for
              jobs and nodes. The collection of network traffic data takes
              place at the node level, hence only in the case of exclusive
              job allocation will the collected values reflect the job's
              real traffic. In the case of node sharing between jobs, the
              reported network traffic per job (through sstat or sacct)
              will not reflect the real network traffic by the jobs.

              Configurable values at present are:

              acct_gather_interconnect/none
                     No InfiniBand network data are collected.

              acct_gather_interconnect/ofed
                     InfiniBand network traffic data are collected from
                     the hardware monitoring counters of InfiniBand
                     devices through the OFED library. In order to account
                     for per-job network traffic, add the "ic/ofed" TRES
                     to AccountingStorageTRES.

       AcctGatherFilesystemType
              Identifies the plugin to be used for filesystem traffic
              accounting. The jobacct_gather plugin and slurmd daemon call
              this plugin to collect filesystem traffic data for jobs and
              nodes. The collection of filesystem traffic data takes place
              at the node level, hence only in the case of exclusive job
              allocation will the collected values reflect the job's real
              traffic. In the case of node sharing between jobs, the
              reported filesystem traffic per job (through sstat or sacct)
              will not reflect the real filesystem traffic by the jobs.

              Configurable values at present are:

              acct_gather_filesystem/none
                     No filesystem data are collected.

              acct_gather_filesystem/lustre
                     Lustre filesystem traffic data are collected from the
                     counters found in /proc/fs/lustre/. In order to
                     account for per-job Lustre traffic, add the
                     "fs/lustre" TRES to AccountingStorageTRES.

       AcctGatherProfileType
              Identifies the plugin to be used for detailed job profiling.
              The jobacct_gather plugin and slurmd daemon call this plugin
              to collect detailed data such as I/O counts, memory usage,
              or energy consumption for jobs and nodes. There are
              interfaces in this plugin to collect data at step start and
              completion, task start and completion, and at the account
              gather frequency. The data collected at the node level is
              related to jobs only in the case of exclusive job
              allocation.

              Configurable values at present are:

              acct_gather_profile/none
                     No profile data is collected.

              acct_gather_profile/hdf5
                     This enables the HDF5 plugin. The directory where the
                     profile files are stored and which values are
                     collected are configured in the acct_gather.conf
                     file.

              acct_gather_profile/influxdb
                     This enables the InfluxDB plugin. The InfluxDB
                     instance host, port, database, retention policy and
                     which values are collected are configured in the
                     acct_gather.conf file.

       AllowSpecResourcesUsage
              If set to "YES", Slurm allows individual jobs to override a
              node's configured CoreSpecCount value. For a job to take
              advantage of this feature, a command line option of
              --core-spec must be specified. The default value for this
              option is "YES" for Cray systems and "NO" for other system
              types.

       AuthAltTypes
              Comma-separated list of alternative authentication plugins
              that the slurmctld will permit for communication. Acceptable
              values at present include auth/jwt.

              NOTE: auth/jwt requires a jwt_hs256.key to be populated in
              the StateSaveLocation directory for slurmctld only. The
              jwt_hs256.key should only be visible to the SlurmUser and
              root. It is not suggested to place the jwt_hs256.key on any
              nodes but the controller running slurmctld. auth/jwt can be
              activated by the presence of the SLURM_JWT environment
              variable. When activated, it will override the default
              AuthType.

       AuthAltParameters
              Used to define alternative authentication plugins options.
              Multiple options may be comma separated.

              disable_token_creation
                     Disable "scontrol token" use by non-SlurmUser
                     accounts.

              jwks=  Absolute path to JWKS file. Only RS256 keys are
                     supported, although other key types may be listed in
                     the file. If set, no HS256 key will be loaded by
                     default (and token generation is disabled), although
                     the jwt_key setting may be used to explicitly
                     re-enable HS256 key use (and token generation).

              jwt_key=
                     Absolute path to JWT key file. Key must be HS256, and
                     should only be accessible by SlurmUser. If not set,
                     the default key file is jwt_hs256.key in
                     StateSaveLocation.

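              For example, to permit JWT authentication with an HS256 key
              kept outside StateSaveLocation, while restricting token
              creation to the SlurmUser (the path is illustrative):

                     AuthAltTypes=auth/jwt
                     AuthAltParameters=jwt_key=/etc/slurm/jwt_hs256.key,disable_token_creation
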
       AuthInfo
              Additional information to be used for authentication of
              communications between the Slurm daemons (slurmctld and
              slurmd) and the Slurm clients. The interpretation of this
              option is specific to the configured AuthType. Multiple
              options may be specified in a comma-delimited list. If not
              specified, the default authentication information will be
              used.

              cred_expire
                     Default job step credential lifetime, in seconds
                     (e.g. "cred_expire=1200"). It must be sufficiently
                     long to load the user environment, run the prolog,
                     deal with the slurmd getting paged out of memory,
                     etc. This also controls how long a requeued job must
                     wait before starting again. The default value is 120
                     seconds.

              socket Path name to a MUNGE daemon socket to use (e.g.
                     "socket=/var/run/munge/munge.socket.2"). The default
                     value is "/var/run/munge/munge.socket.2". Used by
                     auth/munge and cred/munge.

              ttl    Credential lifetime, in seconds (e.g. "ttl=300"). The
                     default value is dependent upon the MUNGE
                     installation, but is typically 300 seconds.

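              For example, to extend the credential lifetime on a system
              with slow Prolog scripts (the value is illustrative):

                     AuthInfo=cred_expire=300
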
       AuthType
              The authentication method for communications between Slurm
              components. Acceptable values at present include
              "auth/munge", which is the default. "auth/munge" indicates
              that MUNGE is to be used. (See "https://dun.github.io/munge/"
              for more information). All Slurm daemons and commands must
              be terminated prior to changing the value of AuthType and
              later restarted.

       BackupAddr
              Deprecated option, see SlurmctldHost.

       BackupController
              Deprecated option, see SlurmctldHost.

              The backup controller recovers state information from the
              StateSaveLocation directory, which must be readable and
              writable from both the primary and backup controllers.
              While not essential, it is recommended that you specify a
              backup controller. See the RELOCATING CONTROLLERS section if
              you change this.

       BatchStartTimeout
              The maximum time (in seconds) that a batch job is permitted
              to take when launching before being considered missing and
              releasing the allocation. The default value is 10 (seconds).
              Larger values may be required if more time is required to
              execute the Prolog, load user environment variables, or if
              the slurmd daemon gets paged from memory.
              Note: The test for a job being successfully launched is only
              performed when the Slurm daemon on the compute node
              registers state with the slurmctld daemon on the head node,
              which happens fairly rarely. Therefore a job will not
              necessarily be terminated if its start time exceeds
              BatchStartTimeout. This configuration parameter is also
              applied to the launch of tasks and avoids aborting srun
              commands due to long running Prolog scripts.

       BcastExclude
              Comma-separated list of absolute directory paths to be
              excluded when autodetecting and broadcasting executable
              shared object dependencies through sbcast or srun --bcast.
              The keyword "none" can be used to indicate that no directory
              paths should be excluded. The default value is
              "/lib,/usr/lib,/lib64,/usr/lib64". This option can be
              overridden by sbcast --exclude and srun --bcast-exclude.

       BcastParameters
              Controls sbcast and srun --bcast behavior. Multiple options
              can be specified in a comma-separated list. Supported values
              include:

              DestDir=
                     Destination directory for the file being broadcast to
                     allocated compute nodes. Default value is the current
                     working directory, or --chdir for srun if set.

              Compression=
                     Specify the default file compression library to be
                     used. Supported values are "lz4" and "none". The
                     default value with the sbcast --compress option is
                     "lz4" and "none" otherwise. Some compression
                     libraries may be unavailable on some systems.

              send_libs
                     If set, attempt to autodetect and broadcast the
                     executable's shared object dependencies to allocated
                     compute nodes. The files are placed in a directory
                     alongside the executable. For srun only, the
                     LD_LIBRARY_PATH is automatically updated to include
                     this cache directory as well. This can be overridden
                     with either the sbcast or srun --send-libs option. By
                     default this is disabled.

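              For example, to broadcast files into a node-local directory
              with compression and library autodetection enabled by
              default (the path is illustrative):

                     BcastParameters=DestDir=/tmp,Compression=lz4,send_libs
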
       BurstBufferType
              The plugin used to manage burst buffers. Acceptable values
              at present are:

              burst_buffer/datawarp
                     Use Cray DataWarp API to provide burst buffer
                     functionality.

              burst_buffer/lua
                     This plugin provides hooks to an API that is defined
                     by a Lua script. This plugin was developed to provide
                     system administrators with a way to do any task (not
                     only file staging) at different points in a job's
                     life cycle.

              burst_buffer/none

       CliFilterPlugins
              A comma-delimited list of command line interface option
              filter/modification plugins. The specified plugins will be
              executed in the order listed. These are intended to be
              site-specific plugins which can be used to set default job
              parameters and/or logging events. No cli_filter plugins are
              used by default.

       ClusterName
              The name by which this Slurm managed cluster is known in the
              accounting database. This is needed to distinguish
              accounting records when multiple clusters report to the same
              database. Because of limitations in some databases, any
              upper case letters in the name will be silently mapped to
              lower case. In order to avoid confusion, it is recommended
              that the name be lower case.

       CommunicationParameters
              Comma-separated options identifying communication options.

              CheckGhalQuiesce
                     Used specifically on a Cray using an Aries GHAL
                     interconnect. This will check to see if the system is
                     quiescing when sending a message, and if so, wait
                     until it is done before sending.

              DisableIPv4
                     Disable IPv4 only operation for all slurm daemons
                     (except slurmdbd). This should also be set in your
                     slurmdbd.conf file.

              EnableIPv6
                     Enable using IPv6 addresses for all slurm daemons
                     (except slurmdbd). When using both IPv4 and IPv6,
                     address family preferences will be based on your
                     /etc/gai.conf file. This should also be set in your
                     slurmdbd.conf file.

              NoAddrCache
                     By default, Slurm will cache a node's network address
                     after successfully establishing the node's network
                     address. This option disables the cache and Slurm
                     will look up the node's network address each time a
                     connection is made. This is useful, for example, in a
                     cloud environment where the node addresses come and
                     go out of DNS.

              NoCtldInAddrAny
                     Used to directly bind to the address that the node
                     running the slurmctld resolves to, instead of binding
                     messages to any address on the node, which is the
                     default.

              NoInAddrAny
                     Used to directly bind to the address that the node
                     resolves to, instead of binding messages to any
                     address on the node, which is the default. This
                     option is for all daemons/clients except for the
                     slurmctld.

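              For example, a dual-stack cloud cluster whose node addresses
              come and go in DNS might use:

                     CommunicationParameters=EnableIPv6,NoAddrCache
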
       CompleteWait
              The time to wait, in seconds, when any job is in the
              COMPLETING state before any additional jobs are scheduled.
              This is to attempt to keep jobs on nodes that were recently
              in use, with the goal of preventing fragmentation. If set to
              zero, pending jobs will be started as soon as possible.
              Since a COMPLETING job's resources are released for use by
              other jobs as soon as the Epilog completes on each
              individual node, this can result in very fragmented resource
              allocations. To provide jobs with the minimum response time,
              a value of zero is recommended (no waiting). To minimize
              fragmentation of resources, a value equal to KillWait plus
              two is recommended. In that case, setting KillWait to a
              small value may be beneficial. The default value of
              CompleteWait is zero seconds. The value may not exceed
              65533.

              NOTE: Setting reduce_completing_frag affects the behavior of
              CompleteWait.

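              For example, following the KillWait-plus-two recommendation
              above (the values are illustrative):

                     KillWait=10
                     CompleteWait=12
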
       ControlAddr
              Deprecated option, see SlurmctldHost.

       ControlMachine
              Deprecated option, see SlurmctldHost.

       CoreSpecPlugin
              Identifies the plugin to be used for enforcement of core
              specialization. The slurmd daemon must be restarted for a
              change in CoreSpecPlugin to take effect. Acceptable values
              at present include:

              core_spec/cray_aries
                     Used only for Cray systems.

              core_spec/none
                     Used for all other system types.

       CpuFreqDef
              Default CPU frequency value or frequency governor to use
              when running a job step if it has not been explicitly set
              with the --cpu-freq option. Acceptable values at present
              include a numeric value (frequency in kilohertz) or one of
              the following governors:

              Conservative attempts to use the Conservative CPU governor

              OnDemand     attempts to use the OnDemand CPU governor

              Performance  attempts to use the Performance CPU governor

              PowerSave    attempts to use the PowerSave CPU governor

              There is no default value. If unset, no attempt to set the
              governor is made if the --cpu-freq option has not been set.

       CpuFreqGovernors
              List of CPU frequency governors allowed to be set with the
              salloc, sbatch, or srun option --cpu-freq. Acceptable values
              at present include:

              Conservative attempts to use the Conservative CPU governor

              OnDemand     attempts to use the OnDemand CPU governor (a
                           default value)

              Performance  attempts to use the Performance CPU governor (a
                           default value)

              PowerSave    attempts to use the PowerSave CPU governor

              SchedUtil    attempts to use the SchedUtil CPU governor

              UserSpace    attempts to use the UserSpace CPU governor (a
                           default value)

              The default is OnDemand, Performance and UserSpace.

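              For example, to let users choose only the OnDemand and
              Performance governors:

                     CpuFreqGovernors=OnDemand,Performance
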
       CredType
              The cryptographic signature tool to be used in the creation
              of job step credentials. The slurmctld daemon must be
              restarted for a change in CredType to take effect. The
              default (and recommended) value is "cred/munge".

       DebugFlags
              Defines specific subsystems which should provide more
              detailed event logging. Multiple subsystems can be specified
              with comma separators. Most DebugFlags will result in
              verbose-level logging for the identified subsystems, and
              could impact performance. Valid subsystems available
              include:

              Accrue           Accrue counters accounting details

              Agent            RPC agents (outgoing RPCs from Slurm
                               daemons)

              Backfill         Backfill scheduler details

              BackfillMap      Backfill scheduler to log a very verbose
                               map of reserved resources through time.
                               Combine with Backfill for a verbose and
                               complete view of the backfill scheduler's
                               work.

              BurstBuffer      Burst Buffer plugin

              Cgroup           Cgroup details

              CPU_Bind         CPU binding details for jobs and steps

              CpuFrequency     Cpu frequency details for jobs and steps
                               using the --cpu-freq option.

              Data             Generic data structure details.

              Dependency       Job dependency debug info

              Elasticsearch    Elasticsearch debug info

              Energy           AcctGatherEnergy debug info

              ExtSensors       External Sensors debug info

              Federation       Federation scheduling debug info

              FrontEnd         Front end node details

              Gres             Generic resource details

              Hetjob           Heterogeneous job details

              Gang             Gang scheduling details

              JobAccountGather Common job account gathering details (not
                               plugin specific).

              JobContainer     Job container plugin details

              License          License management details

              Network          Network details

              NetworkRaw       Dump raw hex values of key Network
                               communications. Warning: very verbose.

              NodeFeatures     Node Features plugin debug info

              NO_CONF_HASH     Do not log when the slurm.conf files differ
                               between Slurm daemons

              Power            Power management plugin and power save
                               (suspend/resume programs) details

              Priority         Job prioritization

              Profile          AcctGatherProfile plugins details

              Protocol         Communication protocol details

              Reservation      Advanced reservations

              Route            Message forwarding debug info

              Script           Debug info regarding the process that runs
                               slurmctld scripts such as PrologSlurmctld
                               and EpilogSlurmctld

              SelectType       Resource selection plugin

              Steps            Slurmctld resource allocation for job steps

              Switch           Switch plugin

              TimeCray         Timing of Cray APIs

              TraceJobs        Trace jobs in slurmctld. It will print
                               detailed job information including state,
                               job ids and allocated node counts.

              Triggers         Slurmctld triggers

              WorkQueue        Work Queue details

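              For example, to get a complete picture of the backfill
              scheduler's decisions:

                     DebugFlags=Backfill,BackfillMap
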
       DefCpuPerGPU
              Default count of CPUs allocated per allocated GPU. This
              value is used only if the job didn't specify
              --cpus-per-task and --cpus-per-gpu.

       DefMemPerCPU
              Default real memory size available per allocated CPU in
              megabytes. Used to avoid over-subscribing memory and causing
              paging. DefMemPerCPU would generally be used if individual
              processors are allocated to jobs
              (SelectType=select/cons_res or
              SelectType=select/cons_tres). The default value is 0
              (unlimited). Also see DefMemPerGPU, DefMemPerNode and
              MaxMemPerCPU. DefMemPerCPU, DefMemPerGPU and DefMemPerNode
              are mutually exclusive.

       DefMemPerGPU
              Default real memory size available per allocated GPU in
              megabytes. The default value is 0 (unlimited). Also see
              DefMemPerCPU and DefMemPerNode. DefMemPerCPU, DefMemPerGPU
              and DefMemPerNode are mutually exclusive.

       DefMemPerNode
              Default real memory size available per allocated node in
              megabytes. Used to avoid over-subscribing memory and causing
              paging. DefMemPerNode would generally be used if whole nodes
              are allocated to jobs (SelectType=select/linear) and
              resources are over-subscribed (OverSubscribe=yes or
              OverSubscribe=force). The default value is 0 (unlimited).
              Also see DefMemPerCPU, DefMemPerGPU and MaxMemPerCPU.
              DefMemPerCPU, DefMemPerGPU and DefMemPerNode are mutually
              exclusive.

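              For example, on nodes with roughly 4 GB of usable memory per
              core, a per-CPU default and cap might be set as follows (the
              values are illustrative):

                     DefMemPerCPU=2048
                     MaxMemPerCPU=4096
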
811 DependencyParameters
812 Multiple options may be comma separated.
813
814
815 disable_remote_singleton
816 By default, when a federated job has a singleton depen‐
817 dency, each cluster in the federation must clear the sin‐
818 gleton dependency before the job's singleton dependency
819 is considered satisfied. Enabling this option means that
820 only the origin cluster must clear the singleton depen‐
821 dency. This option must be set in every cluster in the
822 federation.
823
              kill_invalid_depend
                     If a job has an invalid dependency and it can never
                     run, terminate it and set its state to be
                     JOB_CANCELLED.  By default the job stays pending
                     with reason DependencyNeverSatisfied.

              max_depend_depth=#
                     Maximum number of jobs to test for a circular job
                     dependency.  Stop testing after this number of job
                     dependencies have been tested.  The default value is
                     10 jobs.
832
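              As an illustration, the options above may be combined on a
              single line; the depth value here is only an example:

                     DependencyParameters=kill_invalid_depend,max_depend_depth=20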
833
834 DisableRootJobs
835 If set to "YES" then user root will be prevented from running
836 any jobs. The default value is "NO", meaning user root will be
837 able to execute jobs. DisableRootJobs may also be set by parti‐
838 tion.
839
840
841 EioTimeout
842 The number of seconds srun waits for slurmstepd to close the
843 TCP/IP connection used to relay data between the user applica‐
844 tion and srun when the user application terminates. The default
845 value is 60 seconds. May not exceed 65533.
846
847
       EnforcePartLimits
              If set to "ALL" then jobs which exceed a partition's size
              and/or time limits will be rejected at submission time.  If
              a job is submitted to multiple partitions, the job must
              satisfy the limits on all the requested partitions.  If set
              to "NO" then the job will be accepted and remain queued
              until the partition limits are altered (Time and Node
              Limits).  If set to "ANY", a job must satisfy the limits of
              at least one of the requested partitions to be submitted.
              The default value is "NO".  NOTE: If set, then a job's QOS
              can not be used to exceed partition limits.  NOTE: The
              partition limits being considered are its configured
              MaxMemPerCPU, MaxMemPerNode, MinNodes, MaxNodes, MaxTime,
              AllocNodes, AllowAccounts, AllowGroups, AllowQOS, and QOS
              usage threshold.
861
862
863 Epilog Fully qualified pathname of a script to execute as user root on
864 every node when a user's job completes (e.g. "/usr/lo‐
865 cal/slurm/epilog"). A glob pattern (See glob (7)) may also be
866 used to run more than one epilog script (e.g. "/etc/slurm/epi‐
867 log.d/*"). The Epilog script or scripts may be used to purge
868 files, disable user login, etc. By default there is no epilog.
869 See Prolog and Epilog Scripts for more information.
870
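              A minimal sketch of the two forms described above (the
              paths are illustrative): a single script,

                     Epilog=/usr/local/slurm/epilog

              or a glob pattern running every script in a directory:

                     Epilog=/etc/slurm/epilog.d/*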
871
872 EpilogMsgTime
873 The number of microseconds that the slurmctld daemon requires to
874 process an epilog completion message from the slurmd daemons.
875 This parameter can be used to prevent a burst of epilog comple‐
876 tion messages from being sent at the same time which should help
877 prevent lost messages and improve throughput for large jobs.
878 The default value is 2000 microseconds. For a 1000 node job,
879 this spreads the epilog completion messages out over two sec‐
880 onds.
881
882
883 EpilogSlurmctld
884 Fully qualified pathname of a program for the slurmctld to exe‐
885 cute upon termination of a job allocation (e.g. "/usr/lo‐
886 cal/slurm/epilog_controller"). The program executes as Slur‐
887 mUser, which gives it permission to drain nodes and requeue the
888 job if a failure occurs (See scontrol(1)). Exactly what the
889 program does and how it accomplishes this is completely at the
890 discretion of the system administrator. Information about the
891 job being initiated, its allocated nodes, etc. are passed to the
892 program using environment variables. See Prolog and Epilog
893 Scripts for more information.
894
895
       ExtSensorsFreq
              The external sensors plugin sampling interval.  If
              ExtSensorsType=ext_sensors/none, this parameter is ignored.
              For all other values of ExtSensorsType, this parameter is
              the number of seconds between external sensors samples for
              hardware components (nodes, switches, etc.).  The default
              value is zero, which disables external sensors sampling.
              Note: This parameter does not affect external sensors data
              collection for jobs/steps.
904
905
906 ExtSensorsType
907 Identifies the plugin to be used for external sensors data col‐
908 lection. Slurmctld calls this plugin to collect external sen‐
909 sors data for jobs/steps and hardware components. In case of
910 node sharing between jobs the reported values per job/step
911 (through sstat or sacct) may not be accurate. See also "man
912 ext_sensors.conf".
913
914 Configurable values at present are:
915
916 ext_sensors/none No external sensors data is collected.
917
918 ext_sensors/rrd External sensors data is collected from the
919 RRD database.
920
921
       FairShareDampeningFactor
              Dampen the effect of exceeding a user or group's fair share
              of allocated resources.  Higher values provide a greater
              ability to differentiate between exceeding the fair share
              at high levels (e.g. a value of 1 results in almost no
              difference between overconsumption by a factor of 10 and
              100, while a value of 5 will result in a significant
              difference in priority).  The default value is 1.
930
931
932 FederationParameters
933 Used to define federation options. Multiple options may be comma
934 separated.
935
936
937 fed_display
938 If set, then the client status commands (e.g. squeue,
939 sinfo, sprio, etc.) will display information in a feder‐
940 ated view by default. This option is functionally equiva‐
941 lent to using the --federation options on each command.
942 Use the client's --local option to override the federated
943 view and get a local view of the given cluster.
944
945
       FirstJobId
              The job id to be used for the first job submitted to Slurm.
              Job id values generated will be incremented by 1 for each
              subsequent job.  Value must be larger than 0.  The default
              value is 1.  Also see MaxJobId.
951
952
953 GetEnvTimeout
954 Controls how long the job should wait (in seconds) to load the
955 user's environment before attempting to load it from a cache
956 file. Applies when the salloc or sbatch --get-user-env option
957 is used. If set to 0 then always load the user's environment
958 from the cache file. The default value is 2 seconds.
959
960
961 GresTypes
962 A comma-delimited list of generic resources to be managed (e.g.
963 GresTypes=gpu,mps). These resources may have an associated GRES
964 plugin of the same name providing additional functionality. No
965 generic resources are managed by default. Ensure this parameter
966 is consistent across all nodes in the cluster for proper opera‐
967 tion. The slurmctld and slurmd daemons must be restarted for
968 changes to this parameter to take effect.
969
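              For example, to manage both GPU and MPS resources (as in
              the example above), a site would list both types:

                     GresTypes=gpu,mps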
970
971 GroupUpdateForce
972 If set to a non-zero value, then information about which users
973 are members of groups allowed to use a partition will be updated
974 periodically, even when there have been no changes to the
975 /etc/group file. If set to zero, group member information will
976 be updated only after the /etc/group file is updated. The de‐
977 fault value is 1. Also see the GroupUpdateTime parameter.
978
979
980 GroupUpdateTime
981 Controls how frequently information about which users are mem‐
982 bers of groups allowed to use a partition will be updated, and
983 how long user group membership lists will be cached. The time
984 interval is given in seconds with a default value of 600 sec‐
985 onds. A value of zero will prevent periodic updating of group
986 membership information. Also see the GroupUpdateForce parame‐
987 ter.
988
989
       GpuFreqDef=[<type>=]<value>[,<type>=<value>]
              Default GPU frequency to use when running a job step if it
              has not been explicitly set using the --gpu-freq option.
              This option can be used to independently configure the GPU
              and its memory frequencies.  Defaults to "high,memory=high".
              After the job is completed, the frequencies of all affected
              GPUs will be reset to the highest possible values.  In some
              cases, system power caps may override the requested values.
              The field type can be "memory".  If type is not specified,
              the GPU frequency is implied.  The value field can either
              be "low", "medium", "high", "highm1" or a numeric value in
              megahertz (MHz).  If the specified numeric value is not
              possible, a value as close as possible will be used.  See
              below for definition of the values.  Examples of use
              include "GpuFreqDef=medium,memory=high" and
              "GpuFreqDef=450".
1005
1006 Supported value definitions:
1007
1008 low the lowest available frequency.
1009
1010 medium attempts to set a frequency in the middle of the
1011 available range.
1012
1013 high the highest available frequency.
1014
1015 highm1 (high minus one) will select the next highest avail‐
1016 able frequency.
1017
1018
1019 HealthCheckInterval
1020 The interval in seconds between executions of HealthCheckPro‐
1021 gram. The default value is zero, which disables execution.
1022
1023
1024 HealthCheckNodeState
1025 Identify what node states should execute the HealthCheckProgram.
1026 Multiple state values may be specified with a comma separator.
1027 The default value is ANY to execute on nodes in any state.
1028
1029 ALLOC Run on nodes in the ALLOC state (all CPUs allo‐
1030 cated).
1031
1032 ANY Run on nodes in any state.
1033
1034 CYCLE Rather than running the health check program on all
1035 nodes at the same time, cycle through running on all
1036 compute nodes through the course of the HealthCheck‐
1037 Interval. May be combined with the various node
1038 state options.
1039
1040 IDLE Run on nodes in the IDLE state.
1041
1042 MIXED Run on nodes in the MIXED state (some CPUs idle and
1043 other CPUs allocated).
1044
1045
1046 HealthCheckProgram
1047 Fully qualified pathname of a script to execute as user root pe‐
1048 riodically on all compute nodes that are not in the NOT_RESPOND‐
1049 ING state. This program may be used to verify the node is fully
1050 operational and DRAIN the node or send email if a problem is de‐
1051 tected. Any action to be taken must be explicitly performed by
1052 the program (e.g. execute "scontrol update NodeName=foo
1053 State=drain Reason=tmp_file_system_full" to drain a node). The
1054 execution interval is controlled using the HealthCheckInterval
              parameter.  Note that the HealthCheckProgram will be
              executed at the same time on all nodes to minimize its
              impact upon parallel programs.  This program will be killed
              if it does not terminate normally within 60 seconds.  This
              program will also be executed when the slurmd daemon is
              first started and before it registers with the slurmctld
              daemon.  By default, no program will be executed.
1062
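              Putting the three HealthCheck* parameters together, a site
              might run a node health check every five minutes on idle
              nodes only, staggered across the interval (the script path
              is illustrative):

                     HealthCheckProgram=/usr/local/sbin/node_health.sh
                     HealthCheckInterval=300
                     HealthCheckNodeState=IDLE,CYCLE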
1063
1064 InactiveLimit
1065 The interval, in seconds, after which a non-responsive job allo‐
1066 cation command (e.g. srun or salloc) will result in the job be‐
1067 ing terminated. If the node on which the command is executed
1068 fails or the command abnormally terminates, this will terminate
1069 its job allocation. This option has no effect upon batch jobs.
1070 When setting a value, take into consideration that a debugger
1071 using srun to launch an application may leave the srun command
1072 in a stopped state for extended periods of time. This limit is
1073 ignored for jobs running in partitions with the RootOnly flag
1074 set (the scheduler running as root will be responsible for the
1075 job). The default value is unlimited (zero) and may not exceed
1076 65533 seconds.
1077
1078
1079 InteractiveStepOptions
1080 When LaunchParameters=use_interactive_step is enabled, launching
1081 salloc will automatically start an srun process with Interac‐
1082 tiveStepOptions to launch a terminal on a node in the job allo‐
1083 cation. The default value is "--interactive --preserve-env
1084 --pty $SHELL". The "--interactive" option is intentionally not
1085 documented in the srun man page. It is meant only to be used in
1086 InteractiveStepOptions in order to create an "interactive step"
1087 that will not consume resources so that other steps may run in
1088 parallel with the interactive step.
1089
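              A sketch of enabling the interactive step with a custom
              shell (the option values are illustrative):

                     LaunchParameters=use_interactive_step
                     InteractiveStepOptions="--interactive --preserve-env --pty /bin/bash"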
1090
       JobAcctGatherType
              The job accounting mechanism type.  Acceptable values at
              present include "jobacct_gather/linux" (for Linux systems;
              the recommended choice), "jobacct_gather/cgroup" and
              "jobacct_gather/none" (no accounting data collected).  The
              default value is "jobacct_gather/none".
              "jobacct_gather/cgroup" is a plugin for the Linux operating
              system that uses cgroups to collect accounting statistics.
              The plugin collects the following statistics: From the
              cgroup memory subsystem: memory.usage_in_bytes (reported as
              'pages') and rss from memory.stat (reported as 'rss').
              From the cgroup cpuacct subsystem: user cpu time and system
              cpu time.  No value is provided by cgroups for virtual
              memory size ('vsize').  In order to use the sstat tool,
              either "jobacct_gather/linux" or "jobacct_gather/cgroup"
              must be configured.
1106 NOTE: Changing this configuration parameter changes the contents
1107 of the messages between Slurm daemons. Any previously running
1108 job steps are managed by a slurmstepd daemon that will persist
1109 through the lifetime of that job step and not change its commu‐
1110 nication protocol. Only change this configuration parameter when
1111 there are no running job steps.
1112
1113
       JobAcctGatherFrequency
              The job accounting and profiling sampling intervals.  The
              supported format is as follows:
1117
1118 JobAcctGatherFrequency=<datatype>=<interval>
1119 where <datatype>=<interval> specifies the task sam‐
1120 pling interval for the jobacct_gather plugin or a
1121 sampling interval for a profiling type by the
1122 acct_gather_profile plugin. Multiple, comma-sepa‐
1123 rated <datatype>=<interval> intervals may be speci‐
1124 fied. Supported datatypes are as follows:
1125
1126 task=<interval>
1127 where <interval> is the task sampling inter‐
1128 val in seconds for the jobacct_gather plugins
1129 and for task profiling by the
1130 acct_gather_profile plugin.
1131
1132 energy=<interval>
1133 where <interval> is the sampling interval in
1134 seconds for energy profiling using the
1135 acct_gather_energy plugin
1136
1137 network=<interval>
1138 where <interval> is the sampling interval in
1139 seconds for infiniband profiling using the
1140 acct_gather_interconnect plugin.
1141
1142 filesystem=<interval>
1143 where <interval> is the sampling interval in
1144 seconds for filesystem profiling using the
1145 acct_gather_filesystem plugin.
1146
1147 The default value for task sampling interval
1148 is 30 seconds. The default value for all other intervals is 0.
1149 An interval of 0 disables sampling of the specified type. If
1150 the task sampling interval is 0, accounting information is col‐
1151 lected only at job termination (reducing Slurm interference with
1152 the job).
1153 Smaller (non-zero) values have a greater impact upon job perfor‐
1154 mance, but a value of 30 seconds is not likely to be noticeable
1155 for applications having less than 10,000 tasks.
1156 Users can independently override each interval on a per job ba‐
1157 sis using the --acctg-freq option when submitting the job.
1158
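              For example, to sample task statistics every 15 seconds and
              energy data every 30 seconds, while leaving the other
              profiling types disabled (values illustrative):

                     JobAcctGatherType=jobacct_gather/linux
                     JobAcctGatherFrequency=task=15,energy=30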
1159
       JobAcctGatherParams
              Arbitrary parameters for the job account gather plugin.
              Acceptable values at present include:
1163
1164 NoShared Exclude shared memory from accounting.
1165
1166 UsePss Use PSS value instead of RSS to calculate
1167 real usage of memory. The PSS value will be
1168 saved as RSS.
1169
              OverMemoryKill Kill processes that are detected to be using
                             more memory than the step requested, every
                             time accounting information is gathered by
                             the JobAcctGather plugin.  This parameter
                             should be used with caution because a job
                             exceeding its memory allocation may affect
                             other processes and/or machine health.
1177
1178 NOTE: If available, it is recommended to
1179 limit memory by enabling task/cgroup as a
1180 TaskPlugin and making use of Constrain‐
1181 RAMSpace=yes in the cgroup.conf instead of
1182 using this JobAcctGather mechanism for mem‐
1183 ory enforcement. Using JobAcctGather is
1184 polling based and there is a delay before a
1185 job is killed, which could lead to system
1186 Out of Memory events.
1187
1188 NOTE: When using OverMemoryKill, if the mem‐
1189 ory usage of one of the processes in a step
1190 exceeds the memory limit, the entire step
1191 will be killed/cancelled by the JobAcct‐
1192 Gather plugin. This differs from the behav‐
1193 ior when using ConstrainRAMSpace, where pro‐
1194 cesses in the step will be killed, but the
1195 step will be left active, possibly with
                             other processes left running.  It also
                             differs in that the combined memory usage of
                             all the processes in the step is considered
                             when evaluating against the memory limit.
1200
1201
1202 JobCompHost
1203 The name of the machine hosting the job completion database.
1204 Only used for database type storage plugins, ignored otherwise.
1205
1206
       JobCompLoc
              The fully qualified file name where job completion records
              are written when the JobCompType is "jobcomp/filetxt", the
              database where job completion records are stored when the
              JobCompType is a database, or a complete URL endpoint with
              format <host>:<port>/<target>/_doc when JobCompType is
              "jobcomp/elasticsearch", e.g. "localhost:9200/slurm/_doc".
              NOTE: More information is available at the Slurm web site
              <https://slurm.schedmd.com/elasticsearch.html>.
1216
1217
1218 JobCompParams
1219 Pass arbitrary text string to job completion plugin. Also see
1220 JobCompType.
1221
1222
1223 JobCompPass
1224 The password used to gain access to the database to store the
1225 job completion data. Only used for database type storage plug‐
1226 ins, ignored otherwise.
1227
1228
1229 JobCompPort
1230 The listening port of the job completion database server. Only
1231 used for database type storage plugins, ignored otherwise.
1232
1233
1234 JobCompType
1235 The job completion logging mechanism type. Acceptable values at
1236 present include:
1237
1238 jobcomp/none
1239 Upon job completion, a record of the job is purged from
1240 the system. If using the accounting infrastructure this
1241 plugin may not be of interest since some of the informa‐
1242 tion is redundant.
1243
1244
1245 jobcomp/elasticsearch
1246 Upon job completion, a record of the job should be writ‐
1247 ten to an Elasticsearch server, specified by the JobCom‐
1248 pLoc parameter.
1249 NOTE: More information is available at the Slurm web site
1250 ( https://slurm.schedmd.com/elasticsearch.html ).
1251
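              A sketch of the Elasticsearch configuration described
              above, using the endpoint format documented under
              JobCompLoc (host and index are illustrative):

                     JobCompType=jobcomp/elasticsearch
                     JobCompLoc=localhost:9200/slurm/_doc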
1252
1253 jobcomp/filetxt
1254 Upon job completion, a record of the job should be writ‐
1255 ten to a text file, specified by the JobCompLoc parame‐
1256 ter.
1257
1258
              jobcomp/lua
                     Upon job completion, a record of the job should be
                     processed by the jobcomp.lua script, located in the
                     default script directory (typically the subdirectory
                     etc of the installation directory).
1264
1265
1266 jobcomp/mysql
1267 Upon job completion, a record of the job should be writ‐
1268 ten to a MySQL or MariaDB database, specified by the Job‐
1269 CompLoc parameter.
1270
1271
1272 jobcomp/script
1273 Upon job completion, a script specified by the JobCompLoc
1274 parameter is to be executed with environment variables
1275 providing the job information.
1276
1277
1278 JobCompUser
1279 The user account for accessing the job completion database.
1280 Only used for database type storage plugins, ignored otherwise.
1281
1282
1283 JobContainerType
1284 Identifies the plugin to be used for job tracking. The slurmd
1285 daemon must be restarted for a change in JobContainerType to
1286 take effect. NOTE: The JobContainerType applies to a job allo‐
1287 cation, while ProctrackType applies to job steps. Acceptable
1288 values at present include:
1289
1290 job_container/cncu Used only for Cray systems (CNCU = Compute
1291 Node Clean Up)
1292
1293 job_container/none Used for all other system types
1294
1295 job_container/tmpfs Used to create a private namespace on the
1296 filesystem for jobs, which houses temporary
1297 file systems (/tmp and /dev/shm) for each
1298 job. 'PrologFlags=Contain' must be set to
1299 use this plugin.
1300
1301
       JobFileAppend
              This option controls what to do if a job's output or error
              file exists when the job is started.  If JobFileAppend is
              set to a value of 1, then append to the existing file.  By
              default, any existing file is truncated.
1307
1308
       JobRequeue
              This option controls the default ability for batch jobs to
              be requeued.  Jobs may be requeued explicitly by a system
              administrator, after node failure, or upon preemption by a
              higher priority job.  If JobRequeue is set to a value of 1,
              then batch jobs may be requeued unless explicitly disabled
              by the user.  If JobRequeue is set to a value of 0, then
              batch jobs will not be requeued unless explicitly enabled
              by the user.  Use the sbatch --no-requeue or --requeue
              option to change the default behavior for individual jobs.
              The default value is 1.
1319
1320
1321 JobSubmitPlugins
1322 A comma-delimited list of job submission plugins to be used.
1323 The specified plugins will be executed in the order listed.
1324 These are intended to be site-specific plugins which can be used
1325 to set default job parameters and/or logging events. Sample
1326 plugins available in the distribution include "all_partitions",
1327 "defaults", "logging", "lua", and "partition". For examples of
1328 use, see the Slurm code in "src/plugins/job_submit" and "con‐
1329 tribs/lua/job_submit*.lua" then modify the code to satisfy your
1330 needs. Slurm can be configured to use multiple job_submit plug‐
1331 ins if desired, however the lua plugin will only execute one lua
1332 script named "job_submit.lua" located in the default script di‐
1333 rectory (typically the subdirectory "etc" of the installation
1334 directory). No job submission plugins are used by default.
1335
1336
       KeepAliveTime
              Specifies how long sockets communications used between the
              srun command and its slurmstepd process are kept alive
              after disconnect.  Longer values can be used to improve
              reliability of communications in the event of network
              failures.  The default value leaves the system default in
              place.  The value may not exceed 65533.
1344
1345
       KillOnBadExit
              If set to 1, a step will be terminated immediately if any
              task crashes or aborts, as indicated by a non-zero exit
              code.  With the default value of 0, if one of the processes
              crashes or aborts, the other processes will continue to run
              while the crashed or aborted process waits.  The user can
              override this configuration parameter by using srun's -K,
              --kill-on-bad-exit option.
1353
1354
1355 KillWait
1356 The interval, in seconds, given to a job's processes between the
1357 SIGTERM and SIGKILL signals upon reaching its time limit. If
1358 the job fails to terminate gracefully in the interval specified,
1359 it will be forcibly terminated. The default value is 30 sec‐
1360 onds. The value may not exceed 65533.
1361
1362
       NodeFeaturesPlugins
              Identifies the plugins to be used for support of node
              features which can change through time.  For example, a
              node might be booted with various BIOS settings.  This is
              supported through the use of a node's active_features and
              available_features information.  Acceptable values at
              present include:
1369
1370 node_features/knl_cray
1371 used only for Intel Knights Landing proces‐
1372 sors (KNL) on Cray systems
1373
1374 node_features/knl_generic
1375 used for Intel Knights Landing processors
1376 (KNL) on a generic Linux system
1377
1378
1379 LaunchParameters
1380 Identifies options to the job launch plugin. Acceptable values
1381 include:
1382
              batch_step_set_cpu_freq Set the cpu frequency for the batch
                                      step from the given --cpu-freq
                                      option, or from the slurm.conf
                                      CpuFreqDef setting.  By default
                                      only steps started with srun will
                                      utilize the cpu freq setting
                                      options.

                                      NOTE: If you are using srun to
                                      launch your steps inside a batch
                                      script (advised), this option will
                                      create a situation where you may
                                      have multiple agents setting the
                                      cpu_freq, as the batch step usually
                                      runs on the same resources as one
                                      or more of the steps that the sruns
                                      in the script will create.
1397
1398 cray_net_exclusive Allow jobs on a Cray Native cluster ex‐
1399 clusive access to network resources.
1400 This should only be set on clusters pro‐
1401 viding exclusive access to each node to
1402 a single job at once, and not using par‐
1403 allel steps within the job, otherwise
1404 resources on the node can be oversub‐
1405 scribed.
1406
1407 enable_nss_slurm Permits passwd and group resolution for
1408 a job to be serviced by slurmstepd
1409 rather than requiring a lookup from a
1410 network based service. See
1411 https://slurm.schedmd.com/nss_slurm.html
1412 for more information.
1413
1414 lustre_no_flush If set on a Cray Native cluster, then do
1415 not flush the Lustre cache on job step
1416 completion. This setting will only take
1417 effect after reconfiguring, and will
1418 only take effect for newly launched
1419 jobs.
1420
1421 mem_sort Sort NUMA memory at step start. User can
1422 override this default with
1423 SLURM_MEM_BIND environment variable or
1424 --mem-bind=nosort command line option.
1425
              mpir_use_nodeaddr      When launching tasks, Slurm creates
                                     entries in MPIR_proctable that are
                                     used by parallel debuggers,
                                     profilers, and related tools to
                                     attach to running processes.  By
                                     default the MPIR_proctable entries
                                     contain MPIR_procdesc structures
                                     where the host_name is set to
                                     NodeName.  If this option is
                                     specified, NodeAddr will be used in
                                     this context instead.
1436
              disable_send_gids      By default, the slurmctld will look
                                     up and send the user_name and
                                     extended gids for a job, rather than
                                     having each node look them up
                                     independently as part of each task
                                     launch.  This helps mitigate issues
                                     around name service scalability when
                                     launching jobs involving many nodes.
                                     Using this option will disable this
                                     functionality.  This option is
                                     ignored if enable_nss_slurm is
                                     specified.
1447
1448 slurmstepd_memlock Lock the slurmstepd process's current
1449 memory in RAM.
1450
1451 slurmstepd_memlock_all Lock the slurmstepd process's current
1452 and future memory in RAM.
1453
1454 test_exec Have srun verify existence of the exe‐
1455 cutable program along with user execute
1456 permission on the node where srun was
1457 called before attempting to launch it on
1458 nodes in the step.
1459
1460 use_interactive_step Have salloc use the Interactive Step to
1461 launch a shell on an allocated compute
1462 node rather than locally to wherever
1463 salloc was invoked. This is accomplished
1464 by launching the srun command with In‐
1465 teractiveStepOptions as options.
1466
1467 This does not affect salloc called with
1468 a command as an argument. These jobs
1469 will continue to be executed as the
1470 calling user on the calling host.
1471
1472
1473 LaunchType
1474 Identifies the mechanism to be used to launch application tasks.
1475 Acceptable values include:
1476
1477 launch/slurm
1478 The default value.
1479
1480
1481 Licenses
1482 Specification of licenses (or other resources available on all
1483 nodes of the cluster) which can be allocated to jobs. License
1484 names can optionally be followed by a colon and count with a de‐
1485 fault count of one. Multiple license names should be comma sep‐
1486 arated (e.g. "Licenses=foo:4,bar"). Note that Slurm prevents
1487 jobs from being scheduled if their required license specifica‐
1488 tion is not available. Slurm does not prevent jobs from using
1489 licenses that are not explicitly listed in the job submission
1490 specification.
1491
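              For example, a site might define two license pools as in
              the example above and let jobs request them with sbatch's
              -L/--licenses option (names and counts are illustrative):

                     Licenses=foo:4,bar

              A job would then request licenses with e.g.
              "sbatch -L foo:2 job.sh".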
1492
1493 LogTimeFormat
1494 Format of the timestamp in slurmctld and slurmd log files. Ac‐
1495 cepted values are "iso8601", "iso8601_ms", "rfc5424",
1496 "rfc5424_ms", "clock", "short" and "thread_id". The values end‐
1497 ing in "_ms" differ from the ones without in that fractional
1498 seconds with millisecond precision are printed. The default
1499 value is "iso8601_ms". The "rfc5424" formats are the same as the
1500 "iso8601" formats except that the timezone value is also shown.
1501 The "clock" format shows a timestamp in microseconds retrieved
1502 with the C standard clock() function. The "short" format is a
1503 short date and time format. The "thread_id" format shows the
1504 timestamp in the C standard ctime() function form without the
1505 year but including the microseconds, the daemon's process ID and
1506 the current thread name and ID.
1507
1508
       MailDomain
              Domain name to qualify usernames if an email address is not
              explicitly given with the "--mail-user" option.  If unset,
              the local MTA will need to qualify local addresses itself.
              Changes to MailDomain will only affect new jobs.
1514
1515
1516 MailProg
1517 Fully qualified pathname to the program used to send email per
1518 user request. The default value is "/bin/mail" (or
1519 "/usr/bin/mail" if "/bin/mail" does not exist but
1520 "/usr/bin/mail" does exist). The program is called with argu‐
1521 ments suitable for the default mail command, however additional
1522 information about the job is passed in the form of environment
1523 variables.
1524
1525 Additional variables are the same as those passed to Pro‐
1526 logSlurmctld and EpilogSlurmctld with additional variables in
1527 the following contexts:
1528
1529
1530 ALL
1531
1532
1533 SLURM_JOB_STATE
1534 The base state of the job when the MailProg is
1535 called.
1536
1537
1538 SLURM_JOB_MAIL_TYPE
1539 The mail type triggering the mail.
1540
1541
1542 BEGIN
1543
1544 SLURM_JOB_QEUEUED_TIME
1545 The amount of time the job was queued.
1546
1547
1548 END, FAIL, REQUEUE, TIME_LIMIT_*
1549
1550 SLURM_JOB_RUN_TIME
1551 The amount of time the job ran for.
1552
1553
1554 END, FAIL
1555
1556 SLURM_JOB_EXIT_CODE_MAX
1557 Job's exit code or highest exit code for an array
1558 job.
1559
1560
              SLURM_JOB_EXIT_CODE_MIN
                     Job's minimum exit code for an array job.
1563
1564
1565 SLURM_JOB_TERM_SIGNAL_MAX
1566 Job's highest signal for an array job.
1567
1568
1569 STAGE_OUT
1570
1571
1572 SLURM_JOB_STAGE_OUT_TIME
1573 Job's staging out time.
1574
1575
       MaxArraySize
              The maximum job array size.  The maximum job array task
              index value will be one less than MaxArraySize to allow for
              an index value of zero.  Configure MaxArraySize to 0 in
              order to disable job array use.  The value may not exceed
              4000001.  The value of MaxJobCount should be much larger
              than MaxArraySize.  The default value is 1001.  See also
              max_array_tasks in SchedulerParameters.
1583
1584
       MaxDBDMsgs
              When communication to the SlurmDBD is not possible, the
              slurmctld will queue messages meant to be processed when
              the SlurmDBD is available again.  In order to avoid running
              out of memory, the slurmctld will only queue so many
              messages.  The default value is 10000, or MaxJobCount * 2 +
              Node Count * 4, whichever is greater.  The value can not be
              less than 10000.
1592
1593
1594 MaxJobCount
1595 The maximum number of jobs Slurm can have in its active database
1596 at one time. Set the values of MaxJobCount and MinJobAge to en‐
1597 sure the slurmctld daemon does not exhaust its memory or other
1598 resources. Once this limit is reached, requests to submit addi‐
1599 tional jobs will fail. The default value is 10000 jobs. NOTE:
1600 Each task of a job array counts as one job even though they will
1601 not occupy separate job records until modified or initiated.
              Performance can suffer with more than a few hundred
              thousand jobs.  Setting MaxSubmitJobs per user is generally
              valuable to prevent a single user from filling the system
              with jobs.
1605 This is accomplished using Slurm's database and configuring en‐
1606 forcement of resource limits. This value may not be reset via
1607 "scontrol reconfig". It only takes effect upon restart of the
1608 slurmctld daemon.
1609
1610
1611 MaxJobId
              The maximum job id to be used for jobs submitted to Slurm
              without a specific requested value.  Job ids are unsigned
              32-bit integers with the first 26 bits reserved for local
              job ids and the remaining 6 bits reserved for a cluster id
              to identify a federated job's origin.  The maximum allowed
              local job id is
1617 67,108,863 (0x3FFFFFF). The default value is 67,043,328
1618 (0x03ff0000). MaxJobId only applies to the local job id and not
1619 the federated job id. Job id values generated will be incre‐
1620 mented by 1 for each subsequent job. Once MaxJobId is reached,
1621 the next job will be assigned FirstJobId. Federated jobs will
1622 always have a job ID of 67,108,865 or higher. Also see FirstJo‐
1623 bId.
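The bit layout described above can be checked with a little shell arithmetic:

```shell
# Job ids are unsigned 32-bit values: the low 26 bits hold the local job
# id and the high 6 bits hold the cluster id used by federation.
max_local=$(( (1 << 26) - 1 ))      # 67108863 (0x3FFFFFF)
default_max=$(( 0x03ff0000 ))       # 67043328, the default MaxJobId
min_federated=$(( (1 << 26) + 1 ))  # 67108865, lowest federated job id
echo "$max_local $default_max $min_federated"
```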
1624
1625
1626 MaxMemPerCPU
1627 Maximum real memory size available per allocated CPU in
1628 megabytes. Used to avoid over-subscribing memory and causing
1629 paging. MaxMemPerCPU would generally be used if individual pro‐
1630 cessors are allocated to jobs (SelectType=select/cons_res or Se‐
1631 lectType=select/cons_tres). The default value is 0 (unlimited).
1632 Also see DefMemPerCPU, DefMemPerGPU and MaxMemPerNode. MaxMem‐
1633 PerCPU and MaxMemPerNode are mutually exclusive.
1634
1635 NOTE: If a job specifies a memory per CPU limit that exceeds
1636 this system limit, that job's count of CPUs per task will auto‐
1637 matically be increased. This may result in the job failing due
1638 to CPU count limits. This auto-adjustment feature is a best-ef‐
1639 fort one and optimal assignment is not guaranteed due to the
1640 possibility of having heterogeneous configurations and
1641 multi-partition/qos jobs. If this is a concern it is advised to
1642 use a job submit LUA plugin instead to enforce auto-adjustments
1643 to your specific needs.
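A rough sketch of the adjustment (not Slurm's exact code, and with hypothetical values): a request exceeding MaxMemPerCPU is satisfied by scaling up cpus-per-task instead.

```shell
# Hypothetical values: MaxMemPerCPU=2000 and a job asking --mem-per-cpu=6000
max_mem_per_cpu=2000          # megabytes
requested=6000                # megabytes
cpus_per_task=1
# ceiling division: how many CPUs are needed to cover the memory request
factor=$(( (requested + max_mem_per_cpu - 1) / max_mem_per_cpu ))
cpus_per_task=$(( cpus_per_task * factor ))
echo "cpus-per-task adjusted to $cpus_per_task"
```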
1644
1645
1646 MaxMemPerNode
1647 Maximum real memory size available per allocated node in
1648 megabytes. Used to avoid over-subscribing memory and causing
1649 paging. MaxMemPerNode would generally be used if whole nodes
1650 are allocated to jobs (SelectType=select/linear) and resources
1651 are over-subscribed (OverSubscribe=yes or OverSubscribe=force).
1652 The default value is 0 (unlimited). Also see DefMemPerNode and
1653 MaxMemPerCPU. MaxMemPerCPU and MaxMemPerNode are mutually ex‐
1654 clusive.
1655
1656
1657 MaxStepCount
1658 The maximum number of steps that any job can initiate. This pa‐
1659 rameter is intended to limit the effect of bad batch scripts.
1660 The default value is 40000 steps.
1661
1662
1663 MaxTasksPerNode
1664 Maximum number of tasks Slurm will allow a job step to spawn on
1665 a single node. The default MaxTasksPerNode is 512. May not ex‐
1666 ceed 65533.
1667
1668
1669 MCSParameters
1670 MCS = Multi-Category Security MCS Plugin Parameters. The sup‐
1671 ported parameters are specific to the MCSPlugin. Changes to
1672 this value take effect when the Slurm daemons are reconfigured.
1673 More information about MCS is available here
1674 <https://slurm.schedmd.com/mcs.html>.
1675
1676
1677 MCSPlugin
1678 MCS = Multi-Category Security : associate a security label to
1679 jobs and ensure that nodes can only be shared among jobs using
1680 the same security label. Acceptable values include:
1681
1682 mcs/none is the default value. No security label associated
1683 with jobs, no particular security restriction when
1684 sharing nodes among jobs.
1685
1686 mcs/account only users with the same account can share the nodes
1687 (requires enabling of accounting).
1688
1689 mcs/group only users with the same group can share the nodes.
1690
1691 mcs/user a node cannot be shared with other users.
1692
1693
1694 MessageTimeout
1695 Time permitted for a round-trip communication to complete in
1696 seconds. Default value is 10 seconds. For systems with shared
1697 nodes, the slurmd daemon could be paged out and necessitate
1698 higher values.
1699
1700
1701 MinJobAge
1702 The minimum age of a completed job before its record is purged
1703 from Slurm's active database. Set the values of MaxJobCount and
1704 MinJobAge to ensure the slurmctld daemon does not exhaust its memory or
1705 other resources. The default value is 300 seconds. A value of
1706 zero prevents any job record purging. Jobs are not purged dur‐
1707 ing a backfill cycle, so it can take longer than MinJobAge sec‐
1708 onds to purge a job if using the backfill scheduling plugin. In
1709 order to eliminate some possible race conditions, the minimum
1710 non-zero value for MinJobAge recommended is 2.
1711
1712
1713 MpiDefault
1714 Identifies the default type of MPI to be used. Srun may over‐
1715 ride this configuration parameter in any case. Currently sup‐
1716 ported versions include: pmi2, pmix, and none (default, which
1717 works for many other versions of MPI). More information about
1718 MPI use is available here
1719 <https://slurm.schedmd.com/mpi_guide.html>.
1720
1721
1722 MpiParams
1723 MPI parameters. Used to identify ports used by older versions
1724 of OpenMPI and native Cray systems. The input format is
1725 "ports=12000-12999" to identify a range of communication ports
1726 to be used. NOTE: This is not needed for modern versions of
1727 OpenMPI; removing it can give a small boost in scheduling
1728 performance. NOTE: This is required for Cray's PMI.
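Where the option is still required (e.g. Cray's PMI), the format looks like:

```
# slurm.conf excerpt: reserve a range of communication ports for MPI
MpiParams=ports=12000-12999
```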
1729
1730
1731 OverTimeLimit
1732 Number of minutes by which a job can exceed its time limit be‐
1733 fore being canceled. Normally a job's time limit is treated as
1734 a hard limit and the job will be killed upon reaching that
1735 limit. Configuring OverTimeLimit will result in the job's time
1736 limit being treated like a soft limit. Adding the OverTimeLimit
1737 value to the soft time limit provides a hard time limit, at
1738 which point the job is canceled. This is particularly useful
1739 for backfill scheduling, which bases its decisions upon each job's soft time
1740 limit. The default value is zero. May not exceed 65533 min‐
1741 utes. A value of "UNLIMITED" is also supported.
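A brief sketch with a hypothetical value:

```
# slurm.conf excerpt: a job submitted with --time=60 has a soft limit of
# 60 minutes and is canceled at the hard limit of 60 + 10 = 70 minutes.
OverTimeLimit=10
```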
1742
1743
1744 PluginDir
1745 Identifies the places in which to look for Slurm plugins. This
1746 is a colon-separated list of directories, like the PATH environ‐
1747 ment variable. The default value is the prefix given at config‐
1748 ure time + "/lib/slurm".
1749
1750
1751 PlugStackConfig
1752 Location of the config file for Slurm stackable plugins that use
1753 the Stackable Plugin Architecture for Node and job (K)control
1754 (SPANK). This provides support for a highly configurable set of
1755 plugins to be called before and/or after execution of each task
1756 spawned as part of a user's job step. Default location is
1757 "plugstack.conf" in the same directory as the system slurm.conf.
1758 For more information on SPANK plugins, see the spank(8) manual.
1759
1760
1761 PowerParameters
1762 System power management parameters. The supported parameters
1763 are specific to the PowerPlugin. Changes to this value take ef‐
1764 fect when the Slurm daemons are reconfigured. More information
1765 about system power management is available here
1766 <https://slurm.schedmd.com/power_mgmt.html>. Options currently
1767 supported by any plugins are listed below.
1768
1769 balance_interval=#
1770 Specifies the time interval, in seconds, between attempts
1771 to rebalance power caps across the nodes. This also con‐
1772 trols the frequency at which Slurm attempts to collect
1773 current power consumption data (old data may be used un‐
1774 til new data is available from the underlying infrastruc‐
1775 ture and values below 10 seconds are not recommended for
1776 Cray systems). The default value is 30 seconds. Sup‐
1777 ported by the power/cray_aries plugin.
1778
1779 capmc_path=
1780 Specifies the absolute path of the capmc command. The
1781 default value is "/opt/cray/capmc/default/bin/capmc".
1782 Supported by the power/cray_aries plugin.
1783
1784 cap_watts=#
1785 Specifies the total power limit to be established across
1786 all compute nodes managed by Slurm. A value of 0 sets
1787 every compute node to have an unlimited cap. The default
1788 value is 0. Supported by the power/cray_aries plugin.
1789
1790 decrease_rate=#
1791 Specifies the maximum rate of change in the power cap for
1792 a node where the actual power usage is below the power
1793 cap by an amount greater than lower_threshold (see be‐
1794 low). Value represents a percentage of the difference
1795 between a node's minimum and maximum power consumption.
1796 The default value is 50 percent. Supported by the
1797 power/cray_aries plugin.
1798
1799 get_timeout=#
1800 Amount of time allowed to get power state information in
1801 milliseconds. The default value is 5,000 milliseconds or
1802 5 seconds. Supported by the power/cray_aries plugin and
1803 represents the time allowed for the capmc command to re‐
1804 spond to various "get" options.
1805
1806 increase_rate=#
1807 Specifies the maximum rate of change in the power cap for
1808 a node where the actual power usage is within up‐
1809 per_threshold (see below) of the power cap. Value repre‐
1810 sents a percentage of the difference between a node's
1811 minimum and maximum power consumption. The default value
1812 is 20 percent. Supported by the power/cray_aries plugin.
1813
1814 job_level
1815 All nodes associated with every job will have the same
1816 power cap, to the extent possible. Also see the
1817 --power=level option on the job submission commands.
1818
1819 job_no_level
1820 Disable the user's ability to set every node associated
1821 with a job to the same power cap. Each node will have
1822 its power cap set independently. This disables the
1823 --power=level option on the job submission commands.
1824
1825 lower_threshold=#
1826 Specify a lower power consumption threshold. If a node's
1827 current power consumption is below this percentage of its
1828 current cap, then its power cap will be reduced. The de‐
1829 fault value is 90 percent. Supported by the
1830 power/cray_aries plugin.
1831
1832 recent_job=#
1833 If a job has started or resumed execution (from suspend)
1834 on a compute node within this number of seconds from the
1835 current time, the node's power cap will be increased to
1836 the maximum. The default value is 300 seconds. Sup‐
1837 ported by the power/cray_aries plugin.
1838
1839
1840 set_timeout=#
1841 Amount of time allowed to set power state information in
1842 milliseconds. The default value is 30,000 milliseconds
1843 or 30 seconds. Supported by the power/cray_aries plugin and
1844 represents the time allowed for the capmc command to re‐
1845 spond to various "set" options.
1846
1847 set_watts=#
1848 Specifies the power limit to be set on every compute
1849 node managed by Slurm. Every node gets this same power
1850 cap and there is no variation through time based upon ac‐
1851 tual power usage on the node. Supported by the
1852 power/cray_aries plugin.
1853
1854 upper_threshold=#
1855 Specify an upper power consumption threshold. If a
1856 node's current power consumption is above this percentage
1857 of its current cap, then its power cap will be increased
1858 to the extent possible. The default value is 95 percent.
1859 Supported by the power/cray_aries plugin.
1860
1861
1862 PowerPlugin
1863 Identifies the plugin used for system power management. Cur‐
1864 rently supported plugins include: cray_aries and none. Changes
1865 to this value require restarting Slurm daemons to take effect.
1866 More information about system power management is available here
1867 <https://slurm.schedmd.com/power_mgmt.html>. By default, no
1868 power plugin is loaded.
1869
1870
1871 PreemptMode
1872 Mechanism used to preempt jobs or enable gang scheduling. When
1873 the PreemptType parameter is set to enable preemption, the Pre‐
1874 emptMode selects the default mechanism used to preempt the eli‐
1875 gible jobs for the cluster.
1876 PreemptMode may be specified on a per partition basis to over‐
1877 ride this default value if PreemptType=preempt/partition_prio.
1878 Alternatively, it can be specified on a per QOS basis if Pre‐
1879 emptType=preempt/qos. In either case, a valid default Preempt‐
1880 Mode value must be specified for the cluster as a whole when
1881 preemption is enabled.
1882 The GANG option is used to enable gang scheduling independent of
1883 whether preemption is enabled (i.e. independent of the Preempt‐
1884 Type setting). It can be specified in addition to a PreemptMode
1885 setting with the two options comma separated (e.g. Preempt‐
1886 Mode=SUSPEND,GANG).
1887 See <https://slurm.schedmd.com/preempt.html> and
1888 <https://slurm.schedmd.com/gang_scheduling.html> for more de‐
1889 tails.
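A minimal sketch (node and partition names are hypothetical) combining suspension with gang scheduling:

```
# slurm.conf excerpt: jobs in the "high" partition may suspend jobs in
# "low"; the Gang scheduler resumes the suspended jobs afterwards.
PreemptType=preempt/partition_prio
PreemptMode=SUSPEND,GANG
PartitionName=high Nodes=node[1-8] PriorityTier=2
PartitionName=low  Nodes=node[1-8] PriorityTier=1 Default=YES
```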
1890
1891 NOTE: For performance reasons, the backfill scheduler reserves
1892 whole nodes for jobs, not partial nodes. If during backfill
1893 scheduling a job preempts one or more other jobs, the whole
1894 nodes for those preempted jobs are reserved for the preemptor
1895 job, even if the preemptor job requested fewer resources than
1896 that. These reserved nodes aren't available to other jobs dur‐
1897 ing that backfill cycle, even if the other jobs could fit on the
1898 nodes. Therefore, jobs may preempt more resources during a sin‐
1899 gle backfill iteration than they requested.
1900
1901 NOTE: For a heterogeneous job to be considered for preemption all
1902 components must be eligible for preemption. When a heterogeneous
1903 job is to be preempted the first identified component of the job
1904 with the highest order PreemptMode (SUSPEND (highest), REQUEUE,
1905 CANCEL (lowest)) will be used to set the PreemptMode for all
1906 components. The GraceTime and user warning signal for each com‐
1907 ponent of the heterogeneous job remain unique. Heterogeneous
1908 jobs are excluded from GANG scheduling operations.
1909
1910 OFF Is the default value and disables job preemption and
1911 gang scheduling. It is only compatible with Pre‐
1912 emptType=preempt/none at a global level. A common
1913 use case for this parameter is to set it on a parti‐
1914 tion to disable preemption for that partition.
1915
1916 CANCEL The preempted job will be cancelled.
1917
1918 GANG Enables gang scheduling (time slicing) of jobs in
1919 the same partition, and allows the resuming of sus‐
1920 pended jobs.
1921
1922 NOTE: Gang scheduling is performed independently for
1923 each partition, so if you only want time-slicing by
1924 OverSubscribe, without any preemption, then config‐
1925 uring partitions with overlapping nodes is not rec‐
1926 ommended. On the other hand, if you want to use
1927 PreemptType=preempt/partition_prio to allow jobs
1928 from higher PriorityTier partitions to Suspend jobs
1929 from lower PriorityTier partitions you will need
1930 overlapping partitions, and PreemptMode=SUSPEND,GANG
1931 to use the Gang scheduler to resume the suspended
1932 job(s). In any case, time-slicing won't happen be‐
1933 tween jobs on different partitions.
1934
1935 NOTE: Heterogeneous jobs are excluded from GANG
1936 scheduling operations.
1937
1938 REQUEUE Preempts jobs by requeuing them (if possible) or
1939 canceling them. For jobs to be requeued they must
1940 have the --requeue sbatch option set or the cluster
1941 wide JobRequeue parameter in slurm.conf must be set
1942 to one.
1943
1944 SUSPEND The preempted jobs will be suspended, and later the
1945 Gang scheduler will resume them. Therefore the SUS‐
1946 PEND preemption mode always needs the GANG option to
1947 be specified at the cluster level. Also, because the
1948 suspended jobs will still use memory on the allo‐
1949 cated nodes, Slurm needs to be able to track memory
1950 resources to be able to suspend jobs.
1951
1952 NOTE: Because gang scheduling is performed indepen‐
1953 dently for each partition, if using PreemptType=pre‐
1954 empt/partition_prio then jobs in higher PriorityTier
1955 partitions will suspend jobs in lower PriorityTier
1956 partitions to run on the released resources. Only
1957 when the preemptor job ends will the suspended jobs
1958 be resumed by the Gang scheduler.
1959
1960 NOTE: Suspended jobs will not release GRES. Higher
1961 priority jobs will not be able to preempt to gain
1962 access to GRES.
1963 If PreemptType=preempt/qos is configured and if the
1964 preempted job(s) and the preemptor job are on the
1965 same partition, then they will share resources with
1966 the Gang scheduler (time-slicing). If not (i.e. if
1967 the preemptees and preemptor are on different parti‐
1968 tions) then the preempted jobs will remain suspended
1969 until the preemptor ends.
1970
1971
1972 PreemptType
1973 Specifies the plugin used to identify which jobs can be pre‐
1974 empted in order to start a pending job.
1975
1976 preempt/none
1977 Job preemption is disabled. This is the default.
1978
1979 preempt/partition_prio
1980 Job preemption is based upon partition PriorityTier.
1981 Jobs in higher PriorityTier partitions may preempt jobs
1982 from lower PriorityTier partitions. This is not compati‐
1983 ble with PreemptMode=OFF.
1984
1985 preempt/qos
1986 Job preemption rules are specified by Quality Of Service
1987 (QOS) specifications in the Slurm database. This option
1988 is not compatible with PreemptMode=OFF. A configuration
1989 of PreemptMode=SUSPEND is only supported by the Select‐
1990 Type=select/cons_res and SelectType=select/cons_tres
1991 plugins. See the sacctmgr man page to configure the op‐
1992 tions for preempt/qos.
1993
1994
1995 PreemptExemptTime
1996 Global option for minimum run time for all jobs before they can
1997 be considered for preemption. Any QOS PreemptExemptTime takes
1998 precedence over the global option. A time of -1 disables the
1999 option, equivalent to 0. Acceptable time formats include "min‐
2000 utes", "minutes:seconds", "hours:minutes:seconds", "days-hours",
2001 "days-hours:minutes", and "days-hours:minutes:seconds".
2002
2003
2004 PrEpParameters
2005 Parameters to be passed to the PrEpPlugins.
2006
2007
2008 PrEpPlugins
2009 A resource for programmers wishing to write their own plugins
2010 for the Prolog and Epilog (PrEp) scripts. The default, and cur‐
2011 rently the only implemented plugin is prep/script. Additional
2012 plugins can be specified in a comma-separated list. For more in‐
2013 formation please see the PrEp Plugin API documentation page:
2014 <https://slurm.schedmd.com/prep_plugins.html>
2015
2016
2017 PriorityCalcPeriod
2018 The period of time in minutes in which the half-life decay will
2019 be re-calculated. Applicable only if PriorityType=priority/mul‐
2020 tifactor. The default value is 5 (minutes).
2021
2022
2023 PriorityDecayHalfLife
2024 This controls how long prior resource use is considered in de‐
2025 termining how over- or under-serviced an association is (user,
2026 bank account and cluster) in determining job priority. The
2027 record of usage will be decayed over time, with half of the
2028 original value cleared at age PriorityDecayHalfLife. If set to
2029 0 no decay will be applied. This is helpful if you want to en‐
2030 force hard time limits per association. If set to 0 Priori‐
2031 tyUsageResetPeriod must be set to some interval. Applicable
2032 only if PriorityType=priority/multifactor. The unit is a time
2033 string (i.e. min, hr:min:00, days-hr:min:00, or days-hr). The
2034 default value is 7-0 (7 days).
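The half-life behavior can be illustrated with a short calculation (the usage value is hypothetical):

```shell
# With PriorityDecayHalfLife=7-0, usage recorded 28 days ago retains
# 0.5^(28/7) = 1/16 of its original weight.
awk 'BEGIN {
    half_life_days = 7
    age_days = 28
    original_usage = 1000
    print original_usage * 0.5 ^ (age_days / half_life_days)
}'
```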
2035
2036
2037 PriorityFavorSmall
2038 Specifies that small jobs should be given preferential schedul‐
2039 ing priority. Applicable only if PriorityType=priority/multi‐
2040 factor. Supported values are "YES" and "NO". The default value
2041 is "NO".
2042
2043
2044 PriorityFlags
2045 Flags to modify priority behavior. Applicable only if Priority‐
2046 Type=priority/multifactor. The keywords below have no associ‐
2047 ated value (e.g. "PriorityFlags=ACCRUE_ALWAYS,SMALL_RELA‐
2048 TIVE_TO_TIME").
2049
2050 ACCRUE_ALWAYS If set, priority age factor will be increased
2051 despite job dependencies or holds.
2052
2053 CALCULATE_RUNNING
2054 If set, priorities will be recalculated not
2055 only for pending jobs, but also running and
2056 suspended jobs.
2057
2058 DEPTH_OBLIVIOUS If set, priority will be calculated simi‐
2059 larly to the normal multifactor calculation, but
2060 the depth of the associations in the tree does
2061 not adversely affect their priority. This option
2062 automatically enables NO_FAIR_TREE.
2063
2064 NO_FAIR_TREE Disables the "fair tree" algorithm, and reverts
2065 to "classic" fair share priority scheduling.
2066
2067 INCR_ONLY If set, priority values will only increase in
2068 value. Job priority will never decrease in
2069 value.
2070
2071 MAX_TRES If set, the weighted TRES value (e.g. TRES‐
2072 BillingWeights) is calculated as the MAX of in‐
2073 dividual TRES' on a node (e.g. cpus, mem, gres)
2074 plus the sum of all global TRES' (e.g. li‐
2075 censes).
2076
2077 NO_NORMAL_ALL If set, all NO_NORMAL_* flags are set.
2078
2079 NO_NORMAL_ASSOC If set, the association factor is not normal‐
2080 ized against the highest association priority.
2081
2082 NO_NORMAL_PART If set, the partition factor is not normalized
2083 against the highest partition PriorityJobFac‐
2084 tor.
2085
2086 NO_NORMAL_QOS If set, the QOS factor is not normalized
2087 against the highest qos priority.
2088
2089 NO_NORMAL_TRES If set, the QOS factor is not normalized
2090 against the job's partition TRES counts.
2091
2092 SMALL_RELATIVE_TO_TIME
2093 If set, the job's size component will be based
2094 upon not the job size alone, but the job's size
2095 divided by its time limit.
2096
2097
2098 PriorityMaxAge
2099 Specifies the job age which will be given the maximum age factor
2100 in computing priority. For example, a value of 30 minutes would
2101 result in all jobs over 30 minutes old getting the same
2102 age-based priority. Applicable only if PriorityType=prior‐
2103 ity/multifactor. The unit is a time string (i.e. min,
2104 hr:min:00, days-hr:min:00, or days-hr). The default value is
2105 7-0 (7 days).
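The saturation behavior can be sketched as follows (the wait times chosen are arbitrary):

```shell
# With PriorityMaxAge=7-0, the age factor grows linearly with queue wait
# time and saturates at 1.0 once the job has waited 7 days (10080 min).
awk 'BEGIN {
    max_age_min = 7 * 24 * 60
    for (wait = 0; wait <= 15120; wait += 5040) {
        f = wait / max_age_min
        if (f > 1) f = 1
        printf "wait=%d min  age_factor=%.2f\n", wait, f
    }
}'
```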
2106
2107
2108 PriorityParameters
2109 Arbitrary string used by the PriorityType plugin.
2110
2111
2112 PrioritySiteFactorParameters
2113 Arbitrary string used by the PrioritySiteFactorPlugin plugin.
2114
2115
2116 PrioritySiteFactorPlugin
2117 This specifies an optional plugin to be used alongside "prior‐
2118 ity/multifactor", which is meant to initially set and continu‐
2119 ously update the SiteFactor priority factor. The default value
2120 is "site_factor/none".
2121
2122
2123 PriorityType
2124 This specifies the plugin to be used in establishing a job's
2125 scheduling priority. Also see PriorityFlags for configuration
2126 options. The default value is "priority/basic".
2127
2128 priority/basic
2129 Jobs are evaluated in a First In, First Out (FIFO) man‐
2130 ner.
2131
2132 priority/multifactor
2133 Jobs are assigned a priority based upon a variety of fac‐
2134 tors that include size, age, Fairshare, etc.
2135 When not FIFO scheduling, jobs are prioritized in the following
2136 order:
2137
2138 1. Jobs that can preempt
2139 2. Jobs with an advanced reservation
2140 3. Partition PriorityTier
2141 4. Job priority
2142 5. Job submit time
2143 6. Job ID
2144
2145
2146 PriorityUsageResetPeriod
2147 At this interval the usage of associations will be reset to 0.
2148 This is used if you want to enforce hard limits of time usage
2149 per association. If PriorityDecayHalfLife is set to be 0 no de‐
2150 cay will happen and this is the only way to reset the usage ac‐
2151 cumulated by running jobs. By default this is turned off and it
2152 is advised to use the PriorityDecayHalfLife option to avoid not
2153 having anything running on your cluster, but if your schema is
2154 set up to only allow certain amounts of time on your system this
2155 is the way to do it. Applicable only if PriorityType=prior‐
2156 ity/multifactor.
2157
2158 NONE Never clear historic usage. The default value.
2159
2160 NOW Clear the historic usage now. Executed at startup
2161 and reconfiguration time.
2162
2163 DAILY Cleared every day at midnight.
2164
2165 WEEKLY Cleared every week on Sunday at time 00:00.
2166
2167 MONTHLY Cleared on the first day of each month at time
2168 00:00.
2169
2170 QUARTERLY Cleared on the first day of each quarter at time
2171 00:00.
2172
2173 YEARLY Cleared on the first day of each year at time 00:00.
2174
2175
2176 PriorityWeightAge
2177 An integer value that sets the degree to which the queue wait
2178 time component contributes to the job's priority. Applicable
2179 only if PriorityType=priority/multifactor. Requires Account‐
2180 ingStorageType=accounting_storage/slurmdbd. The default value
2181 is 0.
2182
2183
2184 PriorityWeightAssoc
2185 An integer value that sets the degree to which the association
2186 component contributes to the job's priority. Applicable only if
2187 PriorityType=priority/multifactor. The default value is 0.
2188
2189
2190 PriorityWeightFairshare
2191 An integer value that sets the degree to which the fair-share
2192 component contributes to the job's priority. Applicable only if
2193 PriorityType=priority/multifactor. Requires AccountingStor‐
2194 ageType=accounting_storage/slurmdbd. The default value is 0.
2195
2196
2197 PriorityWeightJobSize
2198 An integer value that sets the degree to which the job size com‐
2199 ponent contributes to the job's priority. Applicable only if
2200 PriorityType=priority/multifactor. The default value is 0.
2201
2202
2203 PriorityWeightPartition
2204 Partition factor used by priority/multifactor plugin in calcu‐
2205 lating job priority. Applicable only if PriorityType=prior‐
2206 ity/multifactor. The default value is 0.
2207
2208
2209 PriorityWeightQOS
2210 An integer value that sets the degree to which the Quality Of
2211 Service component contributes to the job's priority. Applicable
2212 only if PriorityType=priority/multifactor. The default value is
2213 0.
2214
2215
2216 PriorityWeightTRES
2217 A comma-separated list of TRES Types and weights that sets the
2218 degree that each TRES Type contributes to the job's priority.
2219
2220 e.g.
2221 PriorityWeightTRES=CPU=1000,Mem=2000,GRES/gpu=3000
2222
2223 Applicable only if PriorityType=priority/multifactor and if Ac‐
2224 countingStorageTRES is configured with each TRES Type. Negative
2225 values are allowed. The default values are 0.
2226
2227
2228 PrivateData
2229 This controls what type of information is hidden from regular
2230 users. By default, all information is visible to all users.
2231 User SlurmUser and root can always view all information. Multi‐
2232 ple values may be specified with a comma separator. Acceptable
2233 values include:
2234
2235 accounts
2236 (NON-SlurmDBD ACCOUNTING ONLY) Prevents users from view‐
2237 ing any account definitions unless they are coordinators
2238 of them.
2239
2240 cloud Powered down nodes in the cloud are visible.
2241
2242 events Prevents users from viewing event information unless they
2243 have operator status or above.
2244
2245 jobs Prevents users from viewing jobs or job steps belonging
2246 to other users. (NON-SlurmDBD ACCOUNTING ONLY) Prevents
2247 users from viewing job records belonging to other users
2248 unless they are coordinators of the association running
2249 the job when using sacct.
2250
2251 nodes Prevents users from viewing node state information.
2252
2253 partitions
2254 Prevents users from viewing partition state information.
2255
2256 reservations
2257 Prevents regular users from viewing reservations which
2258 they can not use.
2259
2260 usage Prevents users from viewing usage of any other user; this
2261 applies to sshare. (NON-SlurmDBD ACCOUNTING ONLY) Pre‐
2262 vents users from viewing usage of any other user; this
2263 applies to sreport.
2264
2265 users (NON-SlurmDBD ACCOUNTING ONLY) Prevents users from view‐
2266 ing information of any user other than themselves; this
2267 also restricts them to viewing only associations they
2268 are involved with. Coordinators can see associations of all
2269 users in the account they are coordinator of, but can
2270 only see themselves when listing users.
2271
2272
2273 ProctrackType
2274 Identifies the plugin to be used for process tracking on a job
2275 step basis. The slurmd daemon uses this mechanism to identify
2276 all processes which are children of processes it spawns for a
2277 user job step. The slurmd daemon must be restarted for a change
2278 in ProctrackType to take effect. NOTE: "proctrack/linuxproc"
2279 and "proctrack/pgid" can fail to identify all processes associ‐
2280 ated with a job since processes can become a child of the init
2281 process (when the parent process terminates) or change their
2282 process group. To reliably track all processes, "proc‐
2283 track/cgroup" is highly recommended. NOTE: The JobContainerType
2284 applies to a job allocation, while ProctrackType applies to job
2285 steps. Acceptable values at present include:
2286
2287 proctrack/cgroup
2288 Uses linux cgroups to constrain and track processes, and
2289 is the default for systems with cgroup support.
2290 NOTE: see "man cgroup.conf" for configuration details.
2291
2292 proctrack/cray_aries
2293 Uses Cray proprietary process tracking.
2294
2295 proctrack/linuxproc
2296 Uses linux process tree using parent process IDs.
2297
2298 proctrack/pgid
2299 Uses Process Group IDs.
2300 NOTE: This is the default for the BSD family.
2301
2302
2303 Prolog Fully qualified pathname of a program for the slurmd to execute
2304 whenever it is asked to run a job step from a new job allocation
2305 (e.g. "/usr/local/slurm/prolog"). A glob pattern (see glob(7))
2306 may also be used to specify more than one program to run (e.g.
2307 "/etc/slurm/prolog.d/*"). The slurmd executes the prolog before
2308 starting the first job step. The prolog script or scripts may
2309 be used to purge files, enable user login, etc. By default
2310 there is no prolog. Any configured script is expected to com‐
2311 plete execution quickly (in less time than MessageTimeout). If
2312 the prolog fails (returns a non-zero exit code), this will re‐
2313 sult in the node being set to a DRAIN state and the job being
2314 requeued in a held state, unless nohold_on_prolog_fail is con‐
2315 figured in SchedulerParameters. See Prolog and Epilog Scripts
2316 for more information.
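A hypothetical prolog sketch, not a recommended production script: it creates a per-job scratch directory before the first step runs. Slurm exports SLURM_JOB_ID (among other variables) to the prolog environment; remember that a non-zero exit status would DRAIN the node.

```shell
#!/bin/sh
# Hypothetical prolog sketch: create a per-job scratch directory.
# SLURM_JOB_ID is set by slurmd when the prolog runs; a fallback is
# used here only so the sketch can run outside of Slurm.
SCRATCH="/tmp/scratch.${SLURM_JOB_ID:-test}"
mkdir -p "$SCRATCH" || exit 1   # non-zero exit drains the node
chmod 700 "$SCRATCH"
```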
2317
2318
2319 PrologEpilogTimeout
2320 The interval in seconds Slurm waits for Prolog and Epilog be‐
2321 fore terminating them. The default behavior is to wait indefi‐
2322 nitely. This interval applies to the Prolog and Epilog run by
2323 slurmd daemon before and after the job, the PrologSlurmctld and
2324 EpilogSlurmctld run by slurmctld daemon, and the SPANK plugins
2325 run by the slurmstepd daemon.
2326
2327
2328 PrologFlags
2329 Flags to control the Prolog behavior. By default no flags are
2330 set. Multiple flags may be specified in a comma-separated list.
2331 Currently supported options are:
2332
2333 Alloc If set, the Prolog script will be executed at job allo‐
2334 cation. By default, Prolog is executed just before the
2335 task is launched. Therefore, when salloc is started, no
2336 Prolog is executed. Alloc is useful for preparing things
2337 before a user starts to use any allocated resources. In
2338 particular, this flag is needed on a Cray system when
2339 cluster compatibility mode is enabled.
2340
2341 NOTE: Use of the Alloc flag will increase the time re‐
2342 quired to start jobs.
2343
2344 Contain At job allocation time, use the ProcTrack plugin to cre‐
2345 ate a job container on all allocated compute nodes.
2346 This container may be used for user processes not
2347 launched under Slurm control, for example
2348 pam_slurm_adopt may place processes launched through a
2349 direct user login into this container. If using
2350 pam_slurm_adopt, then ProcTrackType must be set to ei‐
2351 ther proctrack/cgroup or proctrack/cray_aries. Setting
2352 the Contain flag implicitly sets the Alloc flag.
2353
              NoHold  If set, the Alloc flag should also be set.  This
                      allows salloc to return without blocking until the
                      prolog has finished on each node.  Blocking
                      instead occurs when steps reach the slurmd, before
                      any execution has happened in the step.  This is
                      much faster, and is recommended when using srun to
                      launch tasks.  This flag cannot be combined with
                      the Contain or X11 flags.
2362
2363 Serial By default, the Prolog and Epilog scripts run concur‐
2364 rently on each node. This flag forces those scripts to
2365 run serially within each node, but with a significant
2366 penalty to job throughput on each node.
2367
2368 X11 Enable Slurm's built-in X11 forwarding capabilities.
2369 This is incompatible with ProctrackType=proctrack/linux‐
2370 proc. Setting the X11 flag implicitly enables both Con‐
2371 tain and Alloc flags as well.
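
              For example, a site using pam_slurm_adopt might enable job
              containers at allocation time with the following
              illustrative settings (Contain implies Alloc; the
              proctrack plugin choice depends on the system):

                     PrologFlags=Contain
                     ProctrackType=proctrack/cgroup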
2372
2373
       PrologSlurmctld
              Fully qualified pathname of a program for the slurmctld
              daemon to execute before granting a new job allocation
              (e.g. "/usr/local/slurm/prolog_controller").  The program
              executes as SlurmUser on the same node where the slurmctld
              daemon executes, giving it permission to drain nodes and
              requeue the job if a failure occurs, or cancel the job if
              appropriate.  Exactly what the program does and how it
              accomplishes this is completely at the discretion of the
              system administrator.  Information about the job being
              initiated, its allocated nodes, etc. are passed to the
              program using environment variables.  While this program
              is running, the nodes associated with the job will have a
              POWER_UP/CONFIGURING flag set in their state, which can be
              readily viewed.  The slurmctld daemon will wait
              indefinitely for this program to complete.  Once the
              program completes with an exit code of zero, the nodes
              will be considered ready for use and the job will be
              started.  If some node cannot be made available for use,
              the program should drain the node (typically using the
              scontrol command) and terminate with a non-zero exit code.
              A non-zero exit code will result in the job being requeued
              (where possible) or killed.  Note that only batch jobs can
              be requeued.  See Prolog and Epilog Scripts for more
              information.
2397
2398
2399 PropagatePrioProcess
2400 Controls the scheduling priority (nice value) of user spawned
2401 tasks.
2402
2403 0 The tasks will inherit the scheduling priority from the
2404 slurm daemon. This is the default value.
2405
2406 1 The tasks will inherit the scheduling priority of the com‐
2407 mand used to submit them (e.g. srun or sbatch). Unless the
2408 job is submitted by user root, the tasks will have a sched‐
2409 uling priority no higher than the slurm daemon spawning
2410 them.
2411
              2  The tasks will inherit the scheduling priority of the
                 command used to submit them (e.g. srun or sbatch), with
                 the restriction that their nice value will always be
                 one higher than the slurm daemon (i.e. the tasks'
                 scheduling priority will be lower than the slurm
                 daemon's).
2417
2418
       PropagateResourceLimits
              A comma-separated list of resource limit names.  The
              slurmd daemon uses these names to obtain the associated
              (soft) limit values from the user's process environment on
              the submit node.  These limits are then propagated and
              applied to the jobs that will run on the compute nodes.
              This parameter can be useful when system limits vary among
              nodes.  Any resource limits that do not appear in the list
              are not propagated.  However, the user can override this
              by specifying which resource limits to propagate with the
              sbatch or srun "--propagate" option.  If neither
              PropagateResourceLimits nor PropagateResourceLimitsExcept
              is configured and the "--propagate" option is not
              specified, then the default action is to propagate all
              limits.  Only one of the parameters, either
              PropagateResourceLimits or PropagateResourceLimitsExcept,
              may be specified.  The user limits cannot exceed hard
              limits under which the slurmd daemon operates.  If the
              user limits are not propagated, the limits from the slurmd
              daemon will be propagated to the user's job.  The limits
              used for the Slurm daemons can be set in the
              /etc/sysconfig/slurm file.  For more information, see:
              https://slurm.schedmd.com/faq.html#memlock
              The following limit names are supported by Slurm (although
              some options may not be supported on some systems):
2441
2442 ALL All limits listed below (default)
2443
2444 NONE No limits listed below
2445
2446 AS The maximum address space (virtual memory) for a
2447 process.
2448
2449 CORE The maximum size of core file
2450
2451 CPU The maximum amount of CPU time
2452
2453 DATA The maximum size of a process's data segment
2454
2455 FSIZE The maximum size of files created. Note that if the
2456 user sets FSIZE to less than the current size of the
2457 slurmd.log, job launches will fail with a 'File size
2458 limit exceeded' error.
2459
2460 MEMLOCK The maximum size that may be locked into memory
2461
2462 NOFILE The maximum number of open files
2463
2464 NPROC The maximum number of processes available
2465
2466 RSS The maximum resident set size. Note that this only
2467 has effect with Linux kernels 2.4.30 or older or BSD.
2468
2469 STACK The maximum stack size
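
              For example, to propagate only the locked-memory and
              open-file limits from the submit node (an illustrative
              configuration):

                     PropagateResourceLimits=MEMLOCK,NOFILE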
2470
2471
       PropagateResourceLimitsExcept
              A comma-separated list of resource limit names.  By
              default, all resource limits will be propagated (as
              described by the PropagateResourceLimits parameter),
              except for the limits appearing in this list.  The user
              can override this by specifying which resource limits to
              propagate with the sbatch or srun "--propagate" option.
              See PropagateResourceLimits above for a list of valid
              limit names.
2480
2481
       RebootProgram
              Program to be executed on each compute node to reboot it.
              Invoked on each node once it becomes idle after the
              command "scontrol reboot" is executed by an authorized
              user or a job is submitted with the "--reboot" option.
              After rebooting, the node is returned to normal use.  See
              ResumeTimeout to configure how long to wait for a reboot
              to complete.  A node will be marked DOWN if it does not
              reboot within ResumeTimeout.
2490
2491
2492 ReconfigFlags
2493 Flags to control various actions that may be taken when an
2494 "scontrol reconfig" command is issued. Currently the options
2495 are:
2496
2497 KeepPartInfo If set, an "scontrol reconfig" command will
2498 maintain the in-memory value of partition
2499 "state" and other parameters that may have been
2500 dynamically updated by "scontrol update". Par‐
2501 tition information in the slurm.conf file will
2502 be merged with in-memory data. This flag su‐
2503 persedes the KeepPartState flag.
2504
2505 KeepPartState If set, an "scontrol reconfig" command will
2506 preserve only the current "state" value of
2507 in-memory partitions and will reset all other
2508 parameters of the partitions that may have been
2509 dynamically updated by "scontrol update" to the
2510 values from the slurm.conf file. Partition in‐
2511 formation in the slurm.conf file will be merged
2512 with in-memory data.
              Neither flag is set by default, in which case "scontrol
              reconfig" will rebuild the partition information using
              only the definitions in the slurm.conf file.
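
              For example, to preserve dynamically updated partition
              parameters across a reconfiguration (an illustrative
              setting):

                     ReconfigFlags=KeepPartInfo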
2516
2517
       RequeueExit
              Enables automatic requeue for batch jobs which exit with
              the specified values.  Separate multiple exit codes with
              commas and/or specify numeric ranges using a "-" separator
              (e.g. "RequeueExit=1-9,18").  Jobs will be returned to the
              pending state and later scheduled again.  Restarted jobs
              will have the environment variable SLURM_RESTART_COUNT set
              to the number of times the job has been restarted.
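
              For example, to requeue batch jobs that exit with codes 1
              through 9 or 18 (an illustrative setting; a batch script
              can inspect SLURM_RESTART_COUNT to behave differently on a
              restart):

                     RequeueExit=1-9,18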
2526
2527
       RequeueExitHold
              Enables automatic requeue for batch jobs which exit with
              the specified values, with these jobs being held until
              released manually by the user.  Separate multiple exit
              codes with commas and/or specify numeric ranges using a
              "-" separator (e.g. "RequeueExitHold=10-12,16").  These
              jobs are put in the JOB_SPECIAL_EXIT exit state.
              Restarted jobs will have the environment variable
              SLURM_RESTART_COUNT set to the number of times the job has
              been restarted.
2537
2538
       ResumeFailProgram
              The program that will be executed when nodes fail to
              resume by ResumeTimeout.  The argument to the program will
              be the names of the failed nodes (using Slurm's hostlist
              expression format).
2543
2544
2545 ResumeProgram
2546 Slurm supports a mechanism to reduce power consumption on nodes
2547 that remain idle for an extended period of time. This is typi‐
2548 cally accomplished by reducing voltage and frequency or powering
2549 the node down. ResumeProgram is the program that will be exe‐
2550 cuted when a node in power save mode is assigned work to per‐
2551 form. For reasons of reliability, ResumeProgram may execute
2552 more than once for a node when the slurmctld daemon crashes and
2553 is restarted. If ResumeProgram is unable to restore a node to
2554 service with a responding slurmd and an updated BootTime, it
2555 should requeue any job associated with the node and set the node
              state to DOWN.  If the node isn't actually rebooted (i.e.
              when multiple-slurmd is configured), starting slurmd with
              the "-b" option might be useful.  The program executes as
              SlurmUser.  The argument to the program will be the names
              of nodes to be removed
2560 from power savings mode (using Slurm's hostlist expression for‐
2561 mat). A job to node mapping is available in JSON format by read‐
2562 ing the temporary file specified by the SLURM_RESUME_FILE envi‐
2563 ronment variable. By default no program is run.
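
              An illustrative power-saving configuration (the program
              path is hypothetical; the program receives a hostlist
              expression such as "node[01-08]" as its argument):

                     ResumeProgram=/usr/local/sbin/slurm_resume
                     ResumeTimeout=600
                     ResumeRate=100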
2564
2565
       ResumeRate
              The rate at which nodes in power save mode are returned to
              normal operation by ResumeProgram.  The value is the
              number of nodes per minute and it can be used to prevent
              power surges if a large number of nodes in power save mode
              are assigned work at the same time (e.g. when a large job
              starts).  A value of zero results in no limits being
              imposed.  The default value is 300 nodes per minute.
2574
2575
2576 ResumeTimeout
2577 Maximum time permitted (in seconds) between when a node resume
2578 request is issued and when the node is actually available for
2579 use. Nodes which fail to respond in this time frame will be
2580 marked DOWN and the jobs scheduled on the node requeued. Nodes
2581 which reboot after this time frame will be marked DOWN with a
2582 reason of "Node unexpectedly rebooted." The default value is 60
2583 seconds.
2584
2585
2586 ResvEpilog
2587 Fully qualified pathname of a program for the slurmctld to exe‐
2588 cute when a reservation ends. The program can be used to cancel
2589 jobs, modify partition configuration, etc. The reservation
2590 named will be passed as an argument to the program. By default
2591 there is no epilog.
2592
2593
2594 ResvOverRun
2595 Describes how long a job already running in a reservation should
2596 be permitted to execute after the end time of the reservation
2597 has been reached. The time period is specified in minutes and
2598 the default value is 0 (kill the job immediately). The value
2599 may not exceed 65533 minutes, although a value of "UNLIMITED" is
2600 supported to permit a job to run indefinitely after its reserva‐
2601 tion is terminated.
2602
2603
2604 ResvProlog
2605 Fully qualified pathname of a program for the slurmctld to exe‐
2606 cute when a reservation begins. The program can be used to can‐
2607 cel jobs, modify partition configuration, etc. The reservation
2608 named will be passed as an argument to the program. By default
2609 there is no prolog.
2610
2611
2612 ReturnToService
2613 Controls when a DOWN node will be returned to service. The de‐
2614 fault value is 0. Supported values include
2615
2616 0 A node will remain in the DOWN state until a system adminis‐
2617 trator explicitly changes its state (even if the slurmd dae‐
2618 mon registers and resumes communications).
2619
2620 1 A DOWN node will become available for use upon registration
2621 with a valid configuration only if it was set DOWN due to
2622 being non-responsive. If the node was set DOWN for any
2623 other reason (low memory, unexpected reboot, etc.), its
2624 state will not automatically be changed. A node registers
2625 with a valid configuration if its memory, GRES, CPU count,
2626 etc. are equal to or greater than the values configured in
2627 slurm.conf.
2628
2629 2 A DOWN node will become available for use upon registration
2630 with a valid configuration. The node could have been set
2631 DOWN for any reason. A node registers with a valid configu‐
2632 ration if its memory, GRES, CPU count, etc. are equal to or
2633 greater than the values configured in slurm.conf.
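
              For example, to automatically return nodes that were set
              DOWN only because they were non-responsive (an
              illustrative setting):

                     ReturnToService=1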
2634
2635
2636 RoutePlugin
2637 Identifies the plugin to be used for defining which nodes will
2638 be used for message forwarding.
2639
2640 route/default
2641 default, use TreeWidth.
2642
2643 route/topology
2644 use the switch hierarchy defined in a topology.conf file.
2645 TopologyPlugin=topology/tree is required.
2646
2647
2648 SchedulerParameters
2649 The interpretation of this parameter varies by SchedulerType.
2650 Multiple options may be comma separated.
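
              For example, a backfill-oriented configuration might
              combine several of the options described below
              (illustrative values that should be tuned per site):

                     SchedulerType=sched/backfill
                     SchedulerParameters=bf_continue,bf_interval=60,bf_max_job_test=1000,bf_window=2880,bf_resolution=300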
2651
2652 allow_zero_lic
2653 If set, then job submissions requesting more than config‐
2654 ured licenses won't be rejected.
2655
              assoc_limit_stop
                     If set and a job cannot start due to association
                     limits, then do not attempt to initiate any lower
                     priority jobs in that partition.  Setting this can
                     decrease system throughput and utilization, but it
                     avoids potentially starving larger jobs whose
                     launch would otherwise be deferred indefinitely.
2663
2664 batch_sched_delay=#
2665 How long, in seconds, the scheduling of batch jobs can be
2666 delayed. This can be useful in a high-throughput envi‐
2667 ronment in which batch jobs are submitted at a very high
2668 rate (i.e. using the sbatch command) and one wishes to
2669 reduce the overhead of attempting to schedule each job at
2670 submit time. The default value is 3 seconds.
2671
2672 bb_array_stage_cnt=#
2673 Number of tasks from a job array that should be available
2674 for burst buffer resource allocation. Higher values will
2675 increase the system overhead as each task from the job
2676 array will be moved to its own job record in memory, so
2677 relatively small values are generally recommended. The
2678 default value is 10.
2679
2680 bf_busy_nodes
2681 When selecting resources for pending jobs to reserve for
2682 future execution (i.e. the job can not be started immedi‐
2683 ately), then preferentially select nodes that are in use.
2684 This will tend to leave currently idle resources avail‐
2685 able for backfilling longer running jobs, but may result
2686 in allocations having less than optimal network topology.
2687 This option is currently only supported by the se‐
2688 lect/cons_res and select/cons_tres plugins (or se‐
2689 lect/cray_aries with SelectTypeParameters set to
2690 "OTHER_CONS_RES" or "OTHER_CONS_TRES", which layers the
2691 select/cray_aries plugin over the select/cons_res or se‐
2692 lect/cons_tres plugin respectively).
2693
2694 bf_continue
2695 The backfill scheduler periodically releases locks in or‐
2696 der to permit other operations to proceed rather than
2697 blocking all activity for what could be an extended pe‐
2698 riod of time. Setting this option will cause the back‐
2699 fill scheduler to continue processing pending jobs from
2700 its original job list after releasing locks even if job
2701 or node state changes.
2702
              bf_hetjob_immediate
                     Instruct the backfill scheduler to attempt to start
                     a heterogeneous job as soon as all of its
                     components are determined able to do so.
                     Otherwise, the backfill scheduler will delay
                     heterogeneous job initiation attempts until after
                     the rest of the queue has been processed.  This
                     delay may result in lower priority jobs being
                     allocated resources, which could delay the
                     initiation of the heterogeneous job due to account
                     and/or QOS limits being reached.  This option is
                     disabled by default.  If enabled and
                     bf_hetjob_prio=min is not set, then it will be set
                     automatically.
2715
2716 bf_hetjob_prio=[min|avg|max]
                     At the beginning of each backfill scheduling cycle,
                     the list of pending jobs to be scheduled is sorted
                     according to the precedence order configured in
                     PriorityType.  This option instructs the scheduler
                     to alter the sorting algorithm so that all
                     components belonging to the same heterogeneous job
                     are attempted consecutively (thus not fragmented in
                     the resulting list).
2724 More specifically, all components from the same heteroge‐
2725 neous job will be treated as if they all have the same
2726 priority (minimum, average or maximum depending upon this
2727 option's parameter) when compared with other jobs (or
2728 other heterogeneous job components). The original order
2729 will be preserved within the same heterogeneous job. Note
2730 that the operation is calculated for the PriorityTier
2731 layer and for the Priority resulting from the prior‐
2732 ity/multifactor plugin calculations. When enabled, if any
2733 heterogeneous job requested an advanced reservation, then
2734 all of that job's components will be treated as if they
2735 had requested an advanced reservation (and get preferen‐
2736 tial treatment in scheduling).
2737
                     Note that this operation does not update the
                     Priority values of the heterogeneous job
                     components, only their order within the list, so
                     the output of the sprio command will not be
                     affected.
2742
2743 Heterogeneous jobs have special scheduling properties:
2744 they are only scheduled by the backfill scheduling
2745 plugin, each of their components is considered separately
                     when reserving resources (and might have different
                     PriorityTier or different Priority values), and no
                     heterogeneous job component is actually allocated
                     resources until all of its components can be
                     initiated.  This may imply
2750 potential scheduling deadlock scenarios because compo‐
2751 nents from different heterogeneous jobs can start reserv‐
2752 ing resources in an interleaved fashion (not consecu‐
2753 tively), but none of the jobs can reserve resources for
2754 all components and start. Enabling this option can help
2755 to mitigate this problem. By default, this option is dis‐
2756 abled.
2757
2758 bf_interval=#
2759 The number of seconds between backfill iterations.
2760 Higher values result in less overhead and better respon‐
2761 siveness. This option applies only to Scheduler‐
2762 Type=sched/backfill. Default: 30, Min: 1, Max: 10800
2763 (3h).
2764
2765
2766 bf_job_part_count_reserve=#
2767 The backfill scheduling logic will reserve resources for
2768 the specified count of highest priority jobs in each par‐
2769 tition. For example, bf_job_part_count_reserve=10 will
2770 cause the backfill scheduler to reserve resources for the
2771 ten highest priority jobs in each partition. Any lower
2772 priority job that can be started using currently avail‐
2773 able resources and not adversely impact the expected
                     start time of these higher priority jobs will be
                     started by the backfill scheduler.  The default
                     value is zero,
2776 which will reserve resources for any pending job and de‐
2777 lay initiation of lower priority jobs. Also see
2778 bf_min_age_reserve and bf_min_prio_reserve. Default: 0,
2779 Min: 0, Max: 100000.
2780
2781
2782 bf_max_job_array_resv=#
2783 The maximum number of tasks from a job array for which
2784 the backfill scheduler will reserve resources in the fu‐
2785 ture. Since job arrays can potentially have millions of
2786 tasks, the overhead in reserving resources for all tasks
2787 can be prohibitive. In addition various limits may pre‐
2788 vent all the jobs from starting at the expected times.
2789 This has no impact upon the number of tasks from a job
2790 array that can be started immediately, only those tasks
2791 expected to start at some future time. Default: 20, Min:
2792 0, Max: 1000. NOTE: Jobs submitted to multiple parti‐
2793 tions appear in the job queue once per partition. If dif‐
2794 ferent copies of a single job array record aren't consec‐
2795 utive in the job queue and another job array record is in
2796 between, then bf_max_job_array_resv tasks are considered
2797 per partition that the job is submitted to.
2798
              bf_max_job_assoc=#
                     The maximum number of jobs per user association to
                     attempt starting with the backfill scheduler.  This
                     setting is similar to bf_max_job_user but is handy
                     if a user has multiple associations equating to
                     basically different users.  One can set this limit
                     to prevent users from flooding the backfill queue
                     with jobs that cannot start and that prevent jobs
                     from other users from starting.  This option
                     applies only to SchedulerType=sched/backfill.  Also
                     see the bf_max_job_user, bf_max_job_part,
                     bf_max_job_test and bf_max_job_user_part=# options.
                     Set bf_max_job_test to a value much higher than
                     bf_max_job_assoc.  Default: 0 (no limit), Min: 0,
                     Max: bf_max_job_test.
2813
2814 bf_max_job_part=#
2815 The maximum number of jobs per partition to attempt
2816 starting with the backfill scheduler. This can be espe‐
2817 cially helpful for systems with large numbers of parti‐
2818 tions and jobs. This option applies only to Scheduler‐
2819 Type=sched/backfill. Also see the partition_job_depth
2820 and bf_max_job_test options. Set bf_max_job_test to a
2821 value much higher than bf_max_job_part. Default: 0 (no
2822 limit), Min: 0, Max: bf_max_job_test.
2823
2824 bf_max_job_start=#
2825 The maximum number of jobs which can be initiated in a
2826 single iteration of the backfill scheduler. This option
2827 applies only to SchedulerType=sched/backfill. Default: 0
2828 (no limit), Min: 0, Max: 10000.
2829
2830 bf_max_job_test=#
2831 The maximum number of jobs to attempt backfill scheduling
2832 for (i.e. the queue depth). Higher values result in more
2833 overhead and less responsiveness. Until an attempt is
2834 made to backfill schedule a job, its expected initiation
2835 time value will not be set. In the case of large clus‐
2836 ters, configuring a relatively small value may be desir‐
2837 able. This option applies only to Scheduler‐
2838 Type=sched/backfill. Default: 500, Min: 1, Max:
2839 1,000,000.
2840
              bf_max_job_user=#
                     The maximum number of jobs per user to attempt
                     starting with the backfill scheduler for ALL
                     partitions.  One can set this limit to prevent
                     users from flooding the backfill queue with jobs
                     that cannot start and that prevent jobs from other
                     users from starting.  This is similar to the
                     MAXIJOB limit in Maui.  This option applies only to
                     SchedulerType=sched/backfill.  Also see the
                     bf_max_job_part, bf_max_job_test and
                     bf_max_job_user_part=# options.  Set
                     bf_max_job_test to a value much higher than
                     bf_max_job_user.  Default: 0 (no limit), Min: 0,
                     Max: bf_max_job_test.
2853
2854 bf_max_job_user_part=#
2855 The maximum number of jobs per user per partition to at‐
2856 tempt starting with the backfill scheduler for any single
2857 partition. This option applies only to Scheduler‐
2858 Type=sched/backfill. Also see the bf_max_job_part,
2859 bf_max_job_test and bf_max_job_user=# options. Default:
2860 0 (no limit), Min: 0, Max: bf_max_job_test.
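
              For example, to bound how deeply the backfill scheduler
              examines the queue while limiting any single user's or
              partition's share of scheduling attempts (illustrative
              values):

                     SchedulerParameters=bf_max_job_test=5000,bf_max_job_part=500,bf_max_job_user=50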
2861
2862 bf_max_time=#
2863 The maximum time in seconds the backfill scheduler can
2864 spend (including time spent sleeping when locks are re‐
2865 leased) before discontinuing, even if maximum job counts
2866 have not been reached. This option applies only to
2867 SchedulerType=sched/backfill. The default value is the
2868 value of bf_interval (which defaults to 30 seconds). De‐
2869 fault: bf_interval value (def. 30 sec), Min: 1, Max: 3600
2870 (1h). NOTE: If bf_interval is short and bf_max_time is
2871 large, this may cause locks to be acquired too frequently
2872 and starve out other serviced RPCs. It's advisable if us‐
2873 ing this parameter to set max_rpc_cnt high enough that
2874 scheduling isn't always disabled, and low enough that the
2875 interactive workload can get through in a reasonable pe‐
2876 riod of time. max_rpc_cnt needs to be below 256 (the de‐
2877 fault RPC thread limit). Running around the middle (150)
2878 may give you good results. NOTE: When increasing the
2879 amount of time spent in the backfill scheduling cycle,
2880 Slurm can be prevented from responding to client requests
                     in a timely manner.  To address this, you can use
                     max_rpc_cnt to specify a number of queued RPCs at
                     which the scheduler stops in order to respond to
                     those requests.
2884
2885 bf_min_age_reserve=#
2886 The backfill and main scheduling logic will not reserve
2887 resources for pending jobs until they have been pending
2888 and runnable for at least the specified number of sec‐
2889 onds. In addition, jobs waiting for less than the speci‐
2890 fied number of seconds will not prevent a newly submitted
2891 job from starting immediately, even if the newly submit‐
2892 ted job has a lower priority. This can be valuable if
2893 jobs lack time limits or all time limits have the same
2894 value. The default value is zero, which will reserve re‐
2895 sources for any pending job and delay initiation of lower
2896 priority jobs. Also see bf_job_part_count_reserve and
2897 bf_min_prio_reserve. Default: 0, Min: 0, Max: 2592000
2898 (30 days).
2899
2900 bf_min_prio_reserve=#
2901 The backfill and main scheduling logic will not reserve
2902 resources for pending jobs unless they have a priority
2903 equal to or higher than the specified value. In addi‐
2904 tion, jobs with a lower priority will not prevent a newly
2905 submitted job from starting immediately, even if the
2906 newly submitted job has a lower priority. This can be
                     valuable if one wishes to maximize system
                     utilization without regard for job priority below a
                     certain threshold.  The default value is zero,
                     which will reserve resources for any pending job
                     and delay initiation of lower
2911 priority jobs. Also see bf_job_part_count_reserve and
2912 bf_min_age_reserve. Default: 0, Min: 0, Max: 2^63.
2913
2914 bf_node_space_size=#
2915 Size of backfill node_space table. Adding a single job to
2916 backfill reservations in the worst case can consume two
2917 node_space records. In the case of large clusters, con‐
2918 figuring a relatively small value may be desirable. This
2919 option applies only to SchedulerType=sched/backfill.
2920 Also see bf_max_job_test and bf_running_job_reserve. De‐
2921 fault: bf_max_job_test, Min: 2, Max: 2,000,000.
2922
2923 bf_one_resv_per_job
2924 Disallow adding more than one backfill reservation per
2925 job. The scheduling logic builds a sorted list of job-
2926 partition pairs. Jobs submitted to multiple partitions
2927 have as many entries in the list as requested partitions.
2928 By default, the backfill scheduler may evaluate all the
2929 job-partition entries for a single job, potentially re‐
2930 serving resources for each pair, but only starting the
2931 job in the reservation offering the earliest start time.
2932 Having a single job reserving resources for multiple par‐
2933 titions could impede other jobs (or hetjob components)
2934 from reserving resources already reserved for the parti‐
2935 tions that don't offer the earliest start time. A single
2936 job that requests multiple partitions can also prevent
2937 itself from starting earlier in a lower priority parti‐
2938 tion if the partitions overlap nodes and a backfill
2939 reservation in the higher priority partition blocks nodes
2940 that are also in the lower priority partition. This op‐
2941 tion makes it so that a job submitted to multiple parti‐
2942 tions will stop reserving resources once the first job-
2943 partition pair has booked a backfill reservation. Subse‐
2944 quent pairs from the same job will only be tested to
                     start now.  This allows other jobs to book the
                     other pairs' resources, at the cost of not
                     guaranteeing that the multi-partition job will
                     start in the partition offering the earliest start
                     time (unless it can start immediately).  This
                     option is disabled by default.
2950
2951
2952 bf_resolution=#
2953 The number of seconds in the resolution of data main‐
2954 tained about when jobs begin and end. Higher values re‐
2955 sult in better responsiveness and quicker backfill cycles
2956 by using larger blocks of time to determine node eligi‐
2957 bility. However, higher values lead to less efficient
2958 system planning, and may miss opportunities to improve
2959 system utilization. This option applies only to Sched‐
2960 ulerType=sched/backfill. Default: 60, Min: 1, Max: 3600
2961 (1 hour).
2962
2963 bf_running_job_reserve
2964 Add an extra step to backfill logic, which creates back‐
2965 fill reservations for jobs running on whole nodes. This
2966 option is disabled by default.
2967
2968 bf_window=#
2969 The number of minutes into the future to look when con‐
2970 sidering jobs to schedule. Higher values result in more
2971 overhead and less responsiveness. A value at least as
2972 long as the highest allowed time limit is generally ad‐
2973 visable to prevent job starvation. In order to limit the
2974 amount of data managed by the backfill scheduler, if the
2975 value of bf_window is increased, then it is generally ad‐
2976 visable to also increase bf_resolution. This option ap‐
2977 plies only to SchedulerType=sched/backfill. Default:
2978 1440 (1 day), Min: 1, Max: 43200 (30 days).
2979
2980 bf_window_linear=#
2981 For performance reasons, the backfill scheduler will de‐
2982 crease precision in calculation of job expected termina‐
2983 tion times. By default, the precision starts at 30 sec‐
2984 onds and that time interval doubles with each evaluation
2985 of currently executing jobs when trying to determine when
2986 a pending job can start. This algorithm can support an
2987 environment with many thousands of running jobs, but can
                     result in the expected start time of pending jobs
                     being gradually deferred due to lack of precision.
                     A
2990 value for bf_window_linear will cause the time interval
2991 to be increased by a constant amount on each iteration.
2992 The value is specified in units of seconds. For example,
2993 a value of 60 will cause the backfill scheduler on the
2994 first iteration to identify the job ending soonest and
2995 determine if the pending job can be started after that
2996 job plus all other jobs expected to end within 30 seconds
2997 (default initial value) of the first job. On the next it‐
2998 eration, the pending job will be evaluated for starting
2999 after the next job expected to end plus all jobs ending
3000 within 90 seconds of that time (30 second default, plus
3001 the 60 second option value). The third iteration will
3002 have a 150 second window and the fourth 210 seconds.
3003 Without this option, the time windows will double on each
3004 iteration and thus be 30, 60, 120, 240 seconds, etc. The
3005 use of bf_window_linear is not recommended with more than
3006 a few hundred simultaneously executing jobs.
3007
              bf_yield_interval=#
                     The backfill scheduler will periodically relinquish
                     locks in order for other pending operations to take
                     place.  This specifies the interval between lock
                     releases, in microseconds.  Smaller values may be
                     helpful for high throughput computing when used in
                     conjunction with the bf_continue option.  Also see
                     the bf_yield_sleep option.  Default: 2,000,000 (2
                     sec), Min: 1, Max: 10,000,000 (10 sec).
3017
3018 bf_yield_sleep=#
3019 The backfill scheduler will periodically relinquish locks
3020 in order for other pending operations to take place.
3021 This specifies the length of time for which the locks are
3022 relinquished in microseconds. Also see the bf_yield_in‐
3023 terval option. Default: 500,000 (0.5 sec), Min: 1, Max:
3024 10,000,000 (10 sec).
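For example (values are illustrative only, not recommendations), a high-throughput site might combine these options so the backfill scheduler yields locks more often and for shorter periods:

```ini
# Yield locks every 1 second (bf_yield_interval, in microseconds)
# for 0.2 seconds each time (bf_yield_sleep, in microseconds),
# and let backfill resume where it left off (bf_continue).
SchedulerParameters=bf_continue,bf_yield_interval=1000000,bf_yield_sleep=200000
```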
3025
3026 build_queue_timeout=#
3027 Defines the maximum time that can be devoted to building
3028 a queue of jobs to be tested for scheduling. If the sys‐
3029 tem has a huge number of jobs with dependencies, just
3030 building the job queue can take so much time as to ad‐
3031 versely impact overall system performance and this param‐
3032 eter can be adjusted as needed. The default value is
3033 2,000,000 microseconds (2 seconds).
3034
3035 correspond_after_task_cnt=#
3036                       Defines the number of array tasks that get split for a
3037                       potential aftercorr dependency check. A low value may
3038                       result in dependency check failures when the job that a
3039                       task depends on is purged before the split. Default: 10.
3040
3041 default_queue_depth=#
3042 The default number of jobs to attempt scheduling (i.e.
3043 the queue depth) when a running job completes or other
3044                       routine actions occur. However, the frequency with which
3045 the scheduler is run may be limited by using the defer or
3046 sched_min_interval parameters described below. The full
3047 queue will be tested on a less frequent basis as defined
3048 by the sched_interval option described below. The default
3049 value is 100. See the partition_job_depth option to
3050 limit depth by partition.
3051
3052 defer Setting this option will avoid attempting to schedule
3053 each job individually at job submit time, but defer it
3054 until a later time when scheduling multiple jobs simulta‐
3055 neously may be possible. This option may improve system
3056 responsiveness when large numbers of jobs (many hundreds)
3057 are submitted at the same time, but it will delay the
3058 initiation time of individual jobs. Also see de‐
3059 fault_queue_depth above.
3060
3061 delay_boot=#
3062                       Do not reboot nodes in order to satisfy this job's
3063                       feature specification if the job has been eligible to run
3064 for less than this time period. If the job has waited
3065 for less than the specified period, it will use only
3066 nodes which already have the specified features. The ar‐
3067 gument is in units of minutes. Individual jobs may over‐
3068 ride this default value with the --delay-boot option.
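A minimal sketch, using only the option described above (the 10-minute value is arbitrary):

```ini
# Jobs eligible for less than 10 minutes use only nodes that
# already have the requested features; after 10 minutes, nodes
# may be rebooted to satisfy the feature specification.
SchedulerParameters=delay_boot=10
```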
3069
3070 disable_job_shrink
3071                       Deny user requests to shrink the size of running jobs.
3072 (However, running jobs may still shrink due to node fail‐
3073 ure if the --no-kill option was set.)
3074
3075 disable_hetjob_steps
3076 Disable job steps that span heterogeneous job alloca‐
3077 tions.
3078
3079 enable_hetjob_steps
3080 Enable job steps that span heterogeneous job allocations.
3081 The default value.
3082
3083 enable_user_top
3084 Enable use of the "scontrol top" command by non-privi‐
3085 leged users.
3086
3087 Ignore_NUMA
3088 Some processors (e.g. AMD Opteron 6000 series) contain
3089 multiple NUMA nodes per socket. This is a configuration
3090 which does not map into the hardware entities that Slurm
3091 optimizes resource allocation for (PU/thread, core,
3092 socket, baseboard, node and network switch). In order to
3093 optimize resource allocations on such hardware, Slurm
3094 will consider each NUMA node within the socket as a sepa‐
3095 rate socket by default. Use the Ignore_NUMA option to re‐
3096 port the correct socket count, but not optimize resource
3097 allocations on the NUMA nodes.
3098
3099 max_array_tasks
3100 Specify the maximum number of tasks that can be included
3101 in a job array. The default limit is MaxArraySize, but
3102 this option can be used to set a lower limit. For exam‐
3103 ple, max_array_tasks=1000 and MaxArraySize=100001 would
3104 permit a maximum task ID of 100000, but limit the number
3105 of tasks in any single job array to 1000.
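The example in the text above can be written out directly:

```ini
# Permit array task IDs up to 100000, but limit any single
# job array to 1000 tasks.
MaxArraySize=100001
SchedulerParameters=max_array_tasks=1000
```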
3106
3107 max_rpc_cnt=#
3108 If the number of active threads in the slurmctld daemon
3109 is equal to or larger than this value, defer scheduling
3110 of jobs. The scheduler will check this condition at cer‐
3111 tain points in code and yield locks if necessary. This
3112 can improve Slurm's ability to process requests at a cost
3113 of initiating new jobs less frequently. Default: 0 (op‐
3114 tion disabled), Min: 0, Max: 1000.
3115
3116 NOTE: The maximum number of threads (MAX_SERVER_THREADS)
3117 is internally set to 256 and defines the number of served
3118 RPCs at a given time. Setting max_rpc_cnt to more than
3119 256 will be only useful to let backfill continue schedul‐
3120 ing work after locks have been yielded (i.e. each 2 sec‐
3121 onds) if there are a maximum of MAX(max_rpc_cnt/10, 20)
3122                       RPCs in the queue. For example, with max_rpc_cnt=1000,
3123                       the scheduler will be allowed to continue after yielding
3124                       locks only when there are 100 or fewer pending RPCs.
3125 If a value is set, then a value of 10 or higher is recom‐
3126 mended. It may require some tuning for each system, but
3127 needs to be high enough that scheduling isn't always dis‐
3128 abled, and low enough that requests can get through in a
3129 reasonable period of time.
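As a hedged illustration (the value 150 is arbitrary and, per the note above, requires per-system tuning):

```ini
# Defer scheduling when slurmctld has 150 or more active
# threads, yielding locks so requests can be processed.
SchedulerParameters=max_rpc_cnt=150
```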
3130
3131 max_sched_time=#
3132 How long, in seconds, that the main scheduling loop will
3133 execute for before exiting. If a value is configured, be
3134 aware that all other Slurm operations will be deferred
3135 during this time period. Make certain the value is lower
3136 than MessageTimeout. If a value is not explicitly con‐
3137 figured, the default value is half of MessageTimeout with
3138 a minimum default value of 1 second and a maximum default
3139 value of 2 seconds. For example if MessageTimeout=10,
3140 the time limit will be 2 seconds (i.e. MIN(10/2, 2) = 2).
3141
3142 max_script_size=#
3143 Specify the maximum size of a batch script, in bytes.
3144 The default value is 4 megabytes. Larger values may ad‐
3145 versely impact system performance.
3146
3147 max_switch_wait=#
3148 Maximum number of seconds that a job can delay execution
3149 waiting for the specified desired switch count. The de‐
3150 fault value is 300 seconds.
3151
3152 no_backup_scheduling
3153 If used, the backup controller will not schedule jobs
3154 when it takes over. The backup controller will allow jobs
3155 to be submitted, modified and cancelled but won't sched‐
3156 ule new jobs. This is useful in Cray environments when
3157 the backup controller resides on an external Cray node.
3158 A restart is required to alter this option.
3159
3160 no_env_cache
3161                       If used, a job started on a node that fails to load the
3162                       job's environment will fail instead of using a cached
3163                       environment. This also implicitly sets the
3164                       requeue_setup_env_fail option.
3165
3166 nohold_on_prolog_fail
3167 By default, if the Prolog exits with a non-zero value the
3168 job is requeued in a held state. By specifying this pa‐
3169 rameter the job will be requeued but not held so that the
3170 scheduler can dispatch it to another host.
3171
3172 pack_serial_at_end
3173 If used with the select/cons_res or select/cons_tres
3174 plugin, then put serial jobs at the end of the available
3175 nodes rather than using a best fit algorithm. This may
3176 reduce resource fragmentation for some workloads.
3177
3178 partition_job_depth=#
3179 The default number of jobs to attempt scheduling (i.e.
3180 the queue depth) from each partition/queue in Slurm's
3181 main scheduling logic. The functionality is similar to
3182 that provided by the bf_max_job_part option for the back‐
3183 fill scheduling logic. The default value is 0 (no
3184                       limit). Jobs excluded from attempted scheduling based
3185 upon partition will not be counted against the de‐
3186 fault_queue_depth limit. Also see the bf_max_job_part
3187 option.
3188
3189 preempt_reorder_count=#
3190 Specify how many attempts should be made in reordering
3191 preemptable jobs to minimize the count of jobs preempted.
3192 The default value is 1. High values may adversely impact
3193 performance. The logic to support this option is only
3194 available in the select/cons_res and select/cons_tres
3195 plugins.
3196
3197 preempt_strict_order
3198 If set, then execute extra logic in an attempt to preempt
3199 only the lowest priority jobs. It may be desirable to
3200 set this configuration parameter when there are multiple
3201 priorities of preemptable jobs. The logic to support
3202 this option is only available in the select/cons_res and
3203 select/cons_tres plugins.
3204
3205 preempt_youngest_first
3206 If set, then the preemption sorting algorithm will be
3207 changed to sort by the job start times to favor preempt‐
3208 ing younger jobs over older. (Requires preempt/parti‐
3209 tion_prio or preempt/qos plugins.)
3210
3211 reduce_completing_frag
3212 This option is used to control how scheduling of re‐
3213 sources is performed when jobs are in the COMPLETING
3214 state, which influences potential fragmentation. If this
3215 option is not set then no jobs will be started in any
3216 partition when any job is in the COMPLETING state for
3217 less than CompleteWait seconds. If this option is set
3218 then no jobs will be started in any individual partition
3219 that has a job in COMPLETING state for less than Com‐
3220 pleteWait seconds. In addition, no jobs will be started
3221 in any partition with nodes that overlap with any nodes
3222 in the partition of the completing job. This option is
3223 to be used in conjunction with CompleteWait.
3224
3225 NOTE: CompleteWait must be set in order for this to work.
3226 If CompleteWait=0 then this option does nothing.
3227
3228 NOTE: reduce_completing_frag only affects the main sched‐
3229 uler, not the backfill scheduler.
3230
3231 requeue_setup_env_fail
3232                       By default, if a job's environment setup fails, the job keeps
3233 running with a limited environment. By specifying this
3234 parameter the job will be requeued in held state and the
3235 execution node drained.
3236
3237 salloc_wait_nodes
3238 If defined, the salloc command will wait until all allo‐
3239 cated nodes are ready for use (i.e. booted) before the
3240 command returns. By default, salloc will return as soon
3241 as the resource allocation has been made.
3242
3243 sbatch_wait_nodes
3244 If defined, the sbatch script will wait until all allo‐
3245 cated nodes are ready for use (i.e. booted) before the
3246 initiation. By default, the sbatch script will be initi‐
3247 ated as soon as the first node in the job allocation is
3248 ready. The sbatch command can use the --wait-all-nodes
3249 option to override this configuration parameter.
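A minimal sketch combining the two options above (both are described in this section; whether to set them is a site policy choice):

```ini
# Make both salloc and sbatch wait until all allocated nodes
# are booted before the allocation/script starts.
SchedulerParameters=salloc_wait_nodes,sbatch_wait_nodes
```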
3250
3251 sched_interval=#
3252 How frequently, in seconds, the main scheduling loop will
3253 execute and test all pending jobs. The default value is
3254 60 seconds.
3255
3256 sched_max_job_start=#
3257 The maximum number of jobs that the main scheduling logic
3258 will start in any single execution. The default value is
3259 zero, which imposes no limit.
3260
3261 sched_min_interval=#
3262 How frequently, in microseconds, the main scheduling loop
3263 will execute and test any pending jobs. The scheduler
3264 runs in a limited fashion every time that any event hap‐
3265 pens which could enable a job to start (e.g. job submit,
3266 job terminate, etc.). If these events happen at a high
3267 frequency, the scheduler can run very frequently and con‐
3268 sume significant resources if not throttled by this op‐
3269 tion. This option specifies the minimum time between the
3270 end of one scheduling cycle and the beginning of the next
3271 scheduling cycle. A value of zero will disable throt‐
3272 tling of the scheduling logic interval. The default
3273                       value is 2 microseconds.
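For example (values illustrative only), the full and event-driven scheduling passes described above might be tuned together:

```ini
# Run the full scheduling pass every 30 seconds, and throttle
# the event-driven scheduler to at most one run per 0.1 s
# (sched_min_interval is specified in microseconds).
SchedulerParameters=sched_interval=30,sched_min_interval=100000
```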
3274
3275 spec_cores_first
3276 Specialized cores will be selected from the first cores
3277 of the first sockets, cycling through the sockets on a
3278 round robin basis. By default, specialized cores will be
3279 selected from the last cores of the last sockets, cycling
3280 through the sockets on a round robin basis.
3281
3282 step_retry_count=#
3283                       When a step completes and there are steps pending resource
3284                       allocation, then retry step allocations for at least this
3285 number of pending steps. Also see step_retry_time. The
3286 default value is 8 steps.
3287
3288 step_retry_time=#
3289                       When a step completes and there are steps pending resource
3290 allocation, then retry step allocations for all steps
3291 which have been pending for at least this number of sec‐
3292 onds. Also see step_retry_count. The default value is
3293 60 seconds.
3294
3295 whole_hetjob
3296 Requests to cancel, hold or release any component of a
3297 heterogeneous job will be applied to all components of
3298 the job.
3299
3300                       NOTE: this option was previously named whole_pack, which
3301                       is still supported for backward compatibility.
3302
3303
3304 SchedulerTimeSlice
3305 Number of seconds in each time slice when gang scheduling is en‐
3306 abled (PreemptMode=SUSPEND,GANG). The value must be between 5
3307 seconds and 65533 seconds. The default value is 30 seconds.
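A hedged sketch using the preemption mode named above (the 60-second slice is arbitrary, within the documented 5-65533 range):

```ini
# Gang scheduling with 60-second time slices.
PreemptMode=SUSPEND,GANG
SchedulerTimeSlice=60
```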
3308
3309
3310 SchedulerType
3311 Identifies the type of scheduler to be used. Note the slurmctld
3312 daemon must be restarted for a change in scheduler type to be‐
3313 come effective (reconfiguring a running daemon has no effect for
3314 this parameter). The scontrol command can be used to manually
3315 change job priorities if desired. Acceptable values include:
3316
3317 sched/backfill
3318 For a backfill scheduling module to augment the default
3319 FIFO scheduling. Backfill scheduling will initiate
3320 lower-priority jobs if doing so does not delay the ex‐
3321 pected initiation time of any higher priority job. Ef‐
3322 fectiveness of backfill scheduling is dependent upon
3323 users specifying job time limits, otherwise all jobs will
3324 have the same time limit and backfilling is impossible.
3325 Note documentation for the SchedulerParameters option
3326 above. This is the default configuration.
3327
3328 sched/builtin
3329 This is the FIFO scheduler which initiates jobs in prior‐
3330 ity order. If any job in the partition can not be sched‐
3331 uled, no lower priority job in that partition will be
3332 scheduled. An exception is made for jobs that can not
3333 run due to partition constraints (e.g. the time limit) or
3334 down/drained nodes. In that case, lower priority jobs
3335 can be initiated and not impact the higher priority job.
3336
3337 sched/hold
3338                       To hold all newly arriving jobs if the file
3339                       "/etc/slurm.hold" exists; otherwise use the built-in
3340                       FIFO scheduler.
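A minimal sketch of the default configuration, combining SchedulerType with two of the SchedulerParameters options described earlier in this section (values illustrative):

```ini
# Backfill scheduling (the default), testing up to 100 jobs
# per routine scheduling pass and deferring per-job scheduling
# at submit time.
SchedulerType=sched/backfill
SchedulerParameters=default_queue_depth=100,defer
```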
3341
3342
3343 ScronParameters
3344 Multiple options may be comma separated.
3345
3346 enable Enable the use of scrontab to submit and manage periodic
3347 repeating jobs.
3348
3349
3350 SelectType
3351 Identifies the type of resource selection algorithm to be used.
3352 Changing this value can only be done by restarting the slurmctld
3353 daemon. When changed, all job information (running and pending)
3354 will be lost, since the job state save format used by each
3355 plugin is different. The only exception to this is when chang‐
3356 ing from cons_res to cons_tres or from cons_tres to cons_res.
3357 However, if a job contains cons_tres-specific features and then
3358 SelectType is changed to cons_res, the job will be canceled,
3359 since there is no way for cons_res to satisfy requirements spe‐
3360 cific to cons_tres.
3361
3362 Acceptable values include
3363
3364 select/cons_res
3365 The resources (cores and memory) within a node are indi‐
3366 vidually allocated as consumable resources. Note that
3367 whole nodes can be allocated to jobs for selected parti‐
3368 tions by using the OverSubscribe=Exclusive option. See
3369 the partition OverSubscribe parameter for more informa‐
3370 tion.
3371
3372 select/cons_tres
3373 The resources (cores, memory, GPUs and all other track‐
3374 able resources) within a node are individually allocated
3375 as consumable resources. Note that whole nodes can be
3376 allocated to jobs for selected partitions by using the
3377 OverSubscribe=Exclusive option. See the partition Over‐
3378 Subscribe parameter for more information.
3379
3380 select/cray_aries
3381 for a Cray system. The default value is "se‐
3382 lect/cray_aries" for all Cray systems.
3383
3384 select/linear
3385 for allocation of entire nodes assuming a one-dimensional
3386 array of nodes in which sequentially ordered nodes are
3387 preferable. For a heterogeneous cluster (e.g. different
3388 CPU counts on the various nodes), resource allocations
3389 will favor nodes with high CPU counts as needed based
3390 upon the job's node and CPU specification if TopologyPlu‐
3391 gin=topology/none is configured. Use of other topology
3392 plugins with select/linear and heterogeneous nodes is not
3393 recommended and may result in valid job allocation re‐
3394 quests being rejected. This is the default value.
3395
3396
3397 SelectTypeParameters
3398 The permitted values of SelectTypeParameters depend upon the
3399 configured value of SelectType. The only supported options for
3400 SelectType=select/linear are CR_ONE_TASK_PER_CORE and CR_Memory,
3401 which treats memory as a consumable resource and prevents memory
3402 over subscription with job preemption or gang scheduling. By
3403 default SelectType=select/linear allocates whole nodes to jobs
3404 without considering their memory consumption. By default Se‐
3405 lectType=select/cons_res, SelectType=select/cray_aries, and Se‐
3406 lectType=select/cons_tres, use CR_Core_Memory, which allocates
3407        cores to jobs while considering their memory consumption.
3408
3409 The following options are supported for SelectType=se‐
3410 lect/cray_aries:
3411
3412 OTHER_CONS_RES
3413                   Layer the select/cons_res plugin under the
3414                   select/cray_aries plugin; the default is to layer on
3415 select/linear. This also allows all the options
3416 available for SelectType=select/cons_res.
3417
3418 OTHER_CONS_TRES
3419                   Layer the select/cons_tres plugin under the
3420                   select/cray_aries plugin; the default is to layer on
3421 select/linear. This also allows all the options
3422 available for SelectType=select/cons_tres.
3423
3424 The following options are supported by the SelectType=se‐
3425 lect/cons_res and SelectType=select/cons_tres plugins:
3426
3427 CR_CPU CPUs are consumable resources. Configure the num‐
3428 ber of CPUs on each node, which may be equal to
3429 the count of cores or hyper-threads on the node
3430 depending upon the desired minimum resource allo‐
3431 cation. The node's Boards, Sockets, CoresPer‐
3432 Socket and ThreadsPerCore may optionally be con‐
3433 figured and result in job allocations which have
3434 improved locality; however doing so will prevent
3435 more than one job from being allocated on each
3436 core.
3437
3438 CR_CPU_Memory
3439 CPUs and memory are consumable resources. Config‐
3440 ure the number of CPUs on each node, which may be
3441 equal to the count of cores or hyper-threads on
3442 the node depending upon the desired minimum re‐
3443 source allocation. The node's Boards, Sockets,
3444 CoresPerSocket and ThreadsPerCore may optionally
3445 be configured and result in job allocations which
3446 have improved locality; however doing so will pre‐
3447 vent more than one job from being allocated on
3448 each core. Setting a value for DefMemPerCPU is
3449 strongly recommended.
3450
3451 CR_Core
3452 Cores are consumable resources. On nodes with hy‐
3453 per-threads, each thread is counted as a CPU to
3454 satisfy a job's resource requirement, but multiple
3455 jobs are not allocated threads on the same core.
3456 The count of CPUs allocated to a job is rounded up
3457 to account for every CPU on an allocated core.
3458                        This also impacts the total allocated memory when
3459                        --mem-per-cpu is used: it is multiplied by the total
3460                        number of CPUs on the allocated cores.
3461
3462 CR_Core_Memory
3463 Cores and memory are consumable resources. On
3464 nodes with hyper-threads, each thread is counted
3465 as a CPU to satisfy a job's resource requirement,
3466 but multiple jobs are not allocated threads on the
3467 same core. The count of CPUs allocated to a job
3468 may be rounded up to account for every CPU on an
3469 allocated core. Setting a value for DefMemPerCPU
3470 is strongly recommended.
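A sketch of the combination recommended above (DefMemPerCPU is specified in megabytes; 2048 is an arbitrary illustrative value):

```ini
# Cores and memory as consumable resources, with a default
# per-CPU memory limit as strongly recommended above.
SelectType=select/cons_tres
SelectTypeParameters=CR_Core_Memory
DefMemPerCPU=2048
```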
3471
3472 CR_ONE_TASK_PER_CORE
3473 Allocate one task per core by default. Without
3474 this option, by default one task will be allocated
3475 per thread on nodes with more than one ThreadsPer‐
3476 Core configured. NOTE: This option cannot be used
3477 with CR_CPU*.
3478
3479 CR_CORE_DEFAULT_DIST_BLOCK
3480 Allocate cores within a node using block distribu‐
3481 tion by default. This is a pseudo-best-fit algo‐
3482 rithm that minimizes the number of boards and min‐
3483 imizes the number of sockets (within minimum
3484 boards) used for the allocation. This default be‐
3485 havior can be overridden specifying a particular
3486 "-m" parameter with srun/salloc/sbatch. Without
3487 this option, cores will be allocated cyclically
3488 across the sockets.
3489
3490 CR_LLN Schedule resources to jobs on the least loaded
3491 nodes (based upon the number of idle CPUs). This
3492 is generally only recommended for an environment
3493 with serial jobs as idle resources will tend to be
3494 highly fragmented, resulting in parallel jobs be‐
3495 ing distributed across many nodes. Note that node
3496 Weight takes precedence over how many idle re‐
3497 sources are on each node. Also see the partition
3498                        configuration parameter LLN to use the least loaded
3499 nodes in selected partitions.
3500
3501 CR_Pack_Nodes
3502 If a job allocation contains more resources than
3503 will be used for launching tasks (e.g. if whole
3504 nodes are allocated to a job), then rather than
3505 distributing a job's tasks evenly across its allo‐
3506 cated nodes, pack them as tightly as possible on
3507 these nodes. For example, consider a job alloca‐
3508 tion containing two entire nodes with eight CPUs
3509 each. If the job starts ten tasks across those
3510 two nodes without this option, it will start five
3511 tasks on each of the two nodes. With this option,
3512 eight tasks will be started on the first node and
3513 two tasks on the second node. This can be super‐
3514 seded by "NoPack" in srun's "--distribution" op‐
3515 tion. CR_Pack_Nodes only applies when the "block"
3516 task distribution method is used.
3517
3518 CR_Socket
3519 Sockets are consumable resources. On nodes with
3520 multiple cores, each core or thread is counted as
3521 a CPU to satisfy a job's resource requirement, but
3522 multiple jobs are not allocated resources on the
3523 same socket.
3524
3525 CR_Socket_Memory
3526 Memory and sockets are consumable resources. On
3527 nodes with multiple cores, each core or thread is
3528 counted as a CPU to satisfy a job's resource re‐
3529 quirement, but multiple jobs are not allocated re‐
3530 sources on the same socket. Setting a value for
3531 DefMemPerCPU is strongly recommended.
3532
3533 CR_Memory
3534 Memory is a consumable resource. NOTE: This im‐
3535 plies OverSubscribe=YES or OverSubscribe=FORCE for
3536 all partitions. Setting a value for DefMemPerCPU
3537 is strongly recommended.
3538
3539
3540 SlurmctldAddr
3541 An optional address to be used for communications to the cur‐
3542 rently active slurmctld daemon, normally used with Virtual IP
3543 addressing of the currently active server. If this parameter is
3544 not specified then each primary and backup server will have its
3545 own unique address used for communications as specified in the
3546 SlurmctldHost parameter. If this parameter is specified then
3547 the SlurmctldHost parameter will still be used for communica‐
3548 tions to specific slurmctld primary or backup servers, for exam‐
3549 ple to cause all of them to read the current configuration files
3550 or shutdown. Also see the SlurmctldPrimaryOffProg and Slurm‐
3551 ctldPrimaryOnProg configuration parameters to configure programs
3552        that manipulate the virtual IP address.
3553
3554
3555 SlurmctldDebug
3556        The level of detail to provide in the slurmctld daemon's logs.
3557        The default value is info. If the slurmctld daemon is initiated
3558        with the -v or --verbose options, that debug level will be
3559        preserved or restored upon reconfiguration.
3560
3561
3562 quiet Log nothing
3563
3564 fatal Log only fatal errors
3565
3566 error Log only errors
3567
3568 info Log errors and general informational messages
3569
3570 verbose Log errors and verbose informational messages
3571
3572 debug Log errors and verbose informational messages and de‐
3573 bugging messages
3574
3575 debug2 Log errors and verbose informational messages and more
3576 debugging messages
3577
3578 debug3 Log errors and verbose informational messages and even
3579 more debugging messages
3580
3581 debug4 Log errors and verbose informational messages and even
3582 more debugging messages
3583
3584 debug5 Log errors and verbose informational messages and even
3585 more debugging messages
3586
3587
3588 SlurmctldHost
3589        The short, or long, hostname of the machine where the Slurm
3590        control daemon is executed (i.e. the name returned by the
3591        command "hostname -s"). This hostname is optionally followed by the address,
3592 either the IP address or a name by which the address can be
3593 identified, enclosed in parentheses (e.g. SlurmctldHost=slurm‐
3594 ctl-primary(12.34.56.78)). This value must be specified at least
3595 once. If specified more than once, the first hostname named will
3596 be where the daemon runs. If the first specified host fails,
3597 the daemon will execute on the second host. If both the first
3598        and second specified hosts fail, the daemon will execute on the
3599 third host.
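A minimal sketch of a primary/backup pair in the syntax shown above (hostnames and addresses are hypothetical):

```ini
# The daemon runs on ctl1; ctl2 takes over if ctl1 fails.
SlurmctldHost=ctl1(10.0.0.1)
SlurmctldHost=ctl2(10.0.0.2)
```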
3600
3601
3602 SlurmctldLogFile
3603 Fully qualified pathname of a file into which the slurmctld dae‐
3604 mon's logs are written. The default value is none (performs
3605 logging via syslog).
3606 See the section LOGGING if a pathname is specified.
3607
3608
3609 SlurmctldParameters
3610 Multiple options may be comma separated.
3611
3612
3613 allow_user_triggers
3614 Permit setting triggers from non-root/slurm_user users.
3615 SlurmUser must also be set to root to permit these trig‐
3616 gers to work. See the strigger man page for additional
3617 details.
3618
3619 cloud_dns
3620 By default, Slurm expects that the network address for a
3621 cloud node won't be known until the creation of the node
3622 and that Slurm will be notified of the node's address
3623 (e.g. scontrol update nodename=<name> nodeaddr=<addr>).
3624 Since Slurm communications rely on the node configuration
3625                      found in the slurm.conf, Slurm will tell the client
3626                      command, after waiting for all nodes to boot, each node's IP
3627 address. However, in environments where the nodes are in
3628 DNS, this step can be avoided by configuring this option.
3629
3630 cloud_reg_addrs
3631 When a cloud node registers, the node's NodeAddr and
3632 NodeHostName will automatically be set. They will be re‐
3633 set back to the nodename after powering off.
3634
3635 enable_configless
3636 Permit "configless" operation by the slurmd, slurmstepd,
3637 and user commands. When enabled the slurmd will be per‐
3638 mitted to retrieve config files from the slurmctld, and
3639 on any 'scontrol reconfigure' command new configs will be
3640 automatically pushed out and applied to nodes that are
3641 running in this "configless" mode. NOTE: a restart of
3642 the slurmctld is required for this to take effect.
3643
3644 idle_on_node_suspend
3645 Mark nodes as idle, regardless of current state, when
3646 suspending nodes with SuspendProgram so that nodes will
3647 be eligible to be resumed at a later time.
3648
3649 node_reg_mem_percent=#
3650 Percentage of memory a node is allowed to register with
3651 without being marked as invalid with low memory. Default
3652 is 100. For State=CLOUD nodes, the default is 90. To dis‐
3653 able this for cloud nodes set it to 100. config_overrides
3654                      takes precedence over this option.
3655
3656                      It is recommended to configure task/cgroup with
3657                      ConstrainRAMSpace. A memory cgroup limit won't be set
3658                      higher than the actual memory on the node. If needed,
3659                      configure AllowedRAMSpace in cgroup.conf to add a buffer.
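As a hedged illustration (the 95% value is arbitrary):

```ini
# Allow nodes to register with as little as 95% of their
# configured memory without being marked invalid.
SlurmctldParameters=node_reg_mem_percent=95
```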
3660
3661 power_save_interval
3662 How often the power_save thread looks to resume and sus‐
3663 pend nodes. The power_save thread will do work sooner if
3664 there are node state changes. Default is 10 seconds.
3665
3666 power_save_min_interval
3667 How often the power_save thread, at a minimum, looks to
3668 resume and suspend nodes. Default is 0.
3669
3670 max_dbd_msg_action
3671 Action used once MaxDBDMsgs is reached, options are 'dis‐
3672 card' (default) and 'exit'.
3673
3674                      When 'discard' is specified and MaxDBDMsgs is reached,
3675                      pending messages of type step start and step complete
3676                      are purged first; if MaxDBDMsgs is reached again, job
3677                      start messages are purged as well. Job completions and
3678                      node state changes continue to consume the space freed
3679                      by these purges until MaxDBDMsgs is reached once more,
3680                      at which point no new messages are tracked, creating
3681                      data loss and potentially runaway jobs.
3682
3683 When 'exit' is specified and MaxDBDMsgs is reached the
3684 slurmctld will exit instead of discarding any messages.
3685 It will be impossible to start the slurmctld with this
3686                      option when the slurmdbd is down and the slurmctld is
3687 tracking more than MaxDBDMsgs.
3688
3689
3690 preempt_send_user_signal
3691 Send the user signal (e.g. --signal=<sig_num>) at preemp‐
3692 tion time even if the signal time hasn't been reached. In
3693 the case of a gracetime preemption the user signal will
3694 be sent if the user signal has been specified and not
3695 sent, otherwise a SIGTERM will be sent to the tasks.
3696
3697 reboot_from_controller
3698 Run the RebootProgram from the controller instead of on
3699 the slurmds. The RebootProgram will be passed a
3700 comma-separated list of nodes to reboot.
3701
3702 user_resv_delete
3703 Allow any user able to run in a reservation to delete it.
3704
3705
3706 SlurmctldPidFile
3707 Fully qualified pathname of a file into which the slurmctld
3708 daemon may write its process id. This may be used for automated
3709 signal processing. The default value is "/var/run/slurm‐
3710 ctld.pid".
3711
3712
3713 SlurmctldPlugstack
3714 A comma-delimited list of Slurm controller plugins to be started
3715 when the daemon begins and terminated when it ends. Only the
3716 plugin's init and fini functions are called.
3717
3718
3719 SlurmctldPort
3720 The port number that the Slurm controller, slurmctld, listens to
3721 for work. The default value is SLURMCTLD_PORT as established at
3722 system build time. If none is explicitly specified, it will be
3723 set to 6817. SlurmctldPort may also be configured to support a
3724 range of port numbers in order to accept larger bursts of incom‐
3725 ing messages by specifying two numbers separated by a dash (e.g.
3726        SlurmctldPort=6817-6818). NOTE: Either the slurmctld and slurmd
3727        daemons must not execute on the same nodes, or the values of
3728 SlurmctldPort and SlurmdPort must be different.
3729
3730 Note: On Cray systems, Realm-Specific IP Addressing (RSIP) will
3731 automatically try to interact with anything opened on ports
3732 8192-60000. Configure SlurmctldPort to use a port outside of
3733 the configured SrunPortRange and RSIP's port range.
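A sketch of the port-range form described above (ports are illustrative; the slurmd port is kept outside the controller's range, as required when the daemons share nodes):

```ini
# Accept controller traffic on a two-port range for larger
# bursts of incoming messages.
SlurmctldPort=6817-6818
SlurmdPort=6819
```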
3734
3735
3736 SlurmctldPrimaryOffProg
3737 This program is executed when a slurmctld daemon running as the
3738 primary server becomes a backup server. By default no program is
3739 executed. See also the related "SlurmctldPrimaryOnProg" parame‐
3740 ter.
3741
3742
3743 SlurmctldPrimaryOnProg
3744 This program is executed when a slurmctld daemon running as a
3745 backup server becomes the primary server. By default no program
3746                is executed. When using virtual IP addresses to manage Highly
3747                Available Slurm services, this program can be used to add the IP
3748 address to an interface (and optionally try to kill the unre‐
3749 sponsive slurmctld daemon and flush the ARP caches on nodes on
3750 the local Ethernet fabric). See also the related "SlurmctldPri‐
3751 maryOffProg" parameter.
3752
3753 SlurmctldSyslogDebug
3754 The slurmctld daemon will log events to the syslog file at the
3755                specified level of detail. If not set, the slurmctld daemon will
3756                log to syslog at level fatal. However, if there is no SlurmctldLog‐
3757                File and the daemon is running in the background, it will log
3758                to syslog at the level specified by SlurmctldDebug (or at fatal
3759                if SlurmctldDebug is set to quiet); if it is running in the
3760                foreground, syslog logging will be set to quiet.
3761
3762
3763 quiet Log nothing
3764
3765 fatal Log only fatal errors
3766
3767 error Log only errors
3768
3769 info Log errors and general informational messages
3770
3771 verbose Log errors and verbose informational messages
3772
3773 debug Log errors and verbose informational messages and de‐
3774 bugging messages
3775
3776 debug2 Log errors and verbose informational messages and more
3777 debugging messages
3778
3779 debug3 Log errors and verbose informational messages and even
3780 more debugging messages
3781
3782 debug4 Log errors and verbose informational messages and even
3783 more debugging messages
3784
3785 debug5 Log errors and verbose informational messages and even
3786 more debugging messages
3787
3788
3789
3790 SlurmctldTimeout
3791 The interval, in seconds, that the backup controller waits for
3792 the primary controller to respond before assuming control. The
3793 default value is 120 seconds. May not exceed 65533.
3794
3795
3796 SlurmdDebug
3797                The level of detail to provide in the slurmd daemon's logs. The
3798                default value is info.
3799
3800 quiet Log nothing
3801
3802 fatal Log only fatal errors
3803
3804 error Log only errors
3805
3806 info Log errors and general informational messages
3807
3808 verbose Log errors and verbose informational messages
3809
3810 debug Log errors and verbose informational messages and de‐
3811 bugging messages
3812
3813 debug2 Log errors and verbose informational messages and more
3814 debugging messages
3815
3816 debug3 Log errors and verbose informational messages and even
3817 more debugging messages
3818
3819 debug4 Log errors and verbose informational messages and even
3820 more debugging messages
3821
3822 debug5 Log errors and verbose informational messages and even
3823 more debugging messages
3824
3825
3826 SlurmdLogFile
3827 Fully qualified pathname of a file into which the slurmd dae‐
3828 mon's logs are written. The default value is none (performs
3829 logging via syslog). Any "%h" within the name is replaced with
3830 the hostname on which the slurmd is running. Any "%n" within
3831 the name is replaced with the Slurm node name on which the
3832 slurmd is running.
3833 See the section LOGGING if a pathname is specified.
3834
3835
3836 SlurmdParameters
3837 Parameters specific to the Slurmd. Multiple options may be
3838 comma separated.
3839
3840 config_overrides
3841 If set, consider the configuration of each node to be
3842 that specified in the slurm.conf configuration file and
3843                       any node with less than the configured resources will not
3844                       be set to state DRAIN. This option is generally only useful for
3845 testing purposes. Equivalent to the now deprecated
3846 FastSchedule=2 option.
3847
3848 l3cache_as_socket
3849 Use the hwloc l3cache as the socket count. Can be useful
3850 on certain processors where the socket level is too
3851 coarse, and the l3cache may provide better task distribu‐
3852 tion. (E.g., along CCX boundaries instead of socket
3853 boundaries.) Requires hwloc v2.
3854
3855 shutdown_on_reboot
3856 If set, the Slurmd will shut itself down when a reboot
3857 request is received.
3858
3859
3860 SlurmdPidFile
3861 Fully qualified pathname of a file into which the slurmd daemon
3862 may write its process id. This may be used for automated signal
3863 processing. Any "%h" within the name is replaced with the host‐
3864 name on which the slurmd is running. Any "%n" within the name
3865 is replaced with the Slurm node name on which the slurmd is run‐
3866 ning. The default value is "/var/run/slurmd.pid".
3867
3868
3869 SlurmdPort
3870 The port number that the Slurm compute node daemon, slurmd, lis‐
3871 tens to for work. The default value is SLURMD_PORT as estab‐
3872 lished at system build time. If none is explicitly specified,
3873                its value will be 6818.  NOTE: Either the slurmctld and slurmd
3874                daemons must not execute on the same nodes, or the values of
3875                SlurmctldPort and SlurmdPort must be different.
3876
3877 Note: On Cray systems, Realm-Specific IP Addressing (RSIP) will
3878 automatically try to interact with anything opened on ports
3879 8192-60000. Configure SlurmdPort to use a port outside of the
3880 configured SrunPortRange and RSIP's port range.
3881
3882
3883 SlurmdSpoolDir
3884 Fully qualified pathname of a directory into which the slurmd
3885 daemon's state information and batch job script information are
3886 written. This must be a common pathname for all nodes, but
3887 should represent a directory which is local to each node (refer‐
3888 ence a local file system). The default value is
3889 "/var/spool/slurmd". Any "%h" within the name is replaced with
3890 the hostname on which the slurmd is running. Any "%n" within
3891 the name is replaced with the Slurm node name on which the
3892 slurmd is running.
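The "%h" and "%n" substitutions described above amount to simple pattern replacement. The following sketch illustrates the behavior (this is not Slurm's actual implementation, and the hostname and node name values are hypothetical examples):

```python
# Illustrative sketch of the "%h"/"%n" filename substitutions
# described above. Not Slurm's actual code; the example hostname
# and node name are hypothetical.

def expand_spool_dir(pattern, hostname, nodename):
    """Replace %h with the hostname and %n with the Slurm node name."""
    return pattern.replace("%h", hostname).replace("%n", nodename)

print(expand_spool_dir("/var/spool/slurmd.%n", "node01.cluster", "node01"))
# /var/spool/slurmd.node01
```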
3893
3894
3895 SlurmdSyslogDebug
3896 The slurmd daemon will log events to the syslog file at the
3897                specified level of detail. If not set, the slurmd daemon will
3898                log to syslog at level fatal. However, if there is no SlurmdLog‐
3899                File and the daemon is running in the background, it will log to
3900                syslog at the level specified by SlurmdDebug (or at fatal if
3901                SlurmdDebug is set to quiet); if it is running in the fore‐
3902                ground, syslog logging will be set to quiet.
3903
3904
3905 quiet Log nothing
3906
3907 fatal Log only fatal errors
3908
3909 error Log only errors
3910
3911 info Log errors and general informational messages
3912
3913 verbose Log errors and verbose informational messages
3914
3915 debug Log errors and verbose informational messages and de‐
3916 bugging messages
3917
3918 debug2 Log errors and verbose informational messages and more
3919 debugging messages
3920
3921 debug3 Log errors and verbose informational messages and even
3922 more debugging messages
3923
3924 debug4 Log errors and verbose informational messages and even
3925 more debugging messages
3926
3927 debug5 Log errors and verbose informational messages and even
3928 more debugging messages
3929
3930
3931 SlurmdTimeout
3932 The interval, in seconds, that the Slurm controller waits for
3933 slurmd to respond before configuring that node's state to DOWN.
3934                A value of zero indicates that the node will not be tested by
3935                slurmctld to confirm the state of slurmd, that the node will not
3936                be automatically set to a DOWN state indicating a non-responsive
3937 slurmd, and some other tool will take responsibility for moni‐
3938 toring the state of each compute node and its slurmd daemon.
3939 Slurm's hierarchical communication mechanism is used to ping the
3940 slurmd daemons in order to minimize system noise and overhead.
3941 The default value is 300 seconds. The value may not exceed
3942 65533 seconds.
3943
3944
3945 SlurmdUser
3946 The name of the user that the slurmd daemon executes as. This
3947 user must exist on all nodes of the cluster for authentication
3948 of communications between Slurm components. The default value
3949 is "root".
3950
3951
3952 SlurmSchedLogFile
3953 Fully qualified pathname of the scheduling event logging file.
3954 The syntax of this parameter is the same as for SlurmctldLog‐
3955 File. In order to configure scheduler logging, set both the
3956 SlurmSchedLogFile and SlurmSchedLogLevel parameters.
3957
3958
3959 SlurmSchedLogLevel
3960 The initial level of scheduling event logging, similar to the
3961 SlurmctldDebug parameter used to control the initial level of
3962 slurmctld logging. Valid values for SlurmSchedLogLevel are "0"
3963 (scheduler logging disabled) and "1" (scheduler logging en‐
3964 abled). If this parameter is omitted, the value defaults to "0"
3965 (disabled). In order to configure scheduler logging, set both
3966 the SlurmSchedLogFile and SlurmSchedLogLevel parameters. The
3967 scheduler logging level can be changed dynamically using scon‐
3968 trol.
3969
3970
3971 SlurmUser
3972 The name of the user that the slurmctld daemon executes as. For
3973 security purposes, a user other than "root" is recommended.
3974 This user must exist on all nodes of the cluster for authentica‐
3975 tion of communications between Slurm components. The default
3976 value is "root".
3977
3978
3979 SrunEpilog
3980 Fully qualified pathname of an executable to be run by srun fol‐
3981 lowing the completion of a job step. The command line arguments
3982 for the executable will be the command and arguments of the job
3983 step. This configuration parameter may be overridden by srun's
3984 --epilog parameter. Note that while the other "Epilog" executa‐
3985 bles (e.g., TaskEpilog) are run by slurmd on the compute nodes
3986 where the tasks are executed, the SrunEpilog runs on the node
3987 where the "srun" is executing.
3988
3989
3990 SrunPortRange
3991 The srun creates a set of listening ports to communicate with
3992 the controller, the slurmstepd and to handle the application
3993 I/O. By default these ports are ephemeral meaning the port num‐
3994 bers are selected by the kernel. Using this parameter allow
3995 sites to configure a range of ports from which srun ports will
3996 be selected. This is useful if sites want to allow only certain
3997 port range on their network.
3998
3999 Note: On Cray systems, Realm-Specific IP Addressing (RSIP) will
4000 automatically try to interact with anything opened on ports
4001 8192-60000. Configure SrunPortRange to use a range of ports
4002 above those used by RSIP, ideally 1000 or more ports, for exam‐
4003 ple "SrunPortRange=60001-63000".
4004
4005 Note: SrunPortRange must be large enough to cover the expected
4006 number of srun ports created on a given submission node. A sin‐
4007 gle srun opens 3 listening ports plus 2 more for every 48 hosts.
4008 Example:
4009
4010 srun -N 48 will use 5 listening ports.
4011
4012
4013 srun -N 50 will use 7 listening ports.
4014
4015
4016 srun -N 200 will use 13 listening ports.
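The counts in the examples above follow from the stated rule: 3 base listening ports, plus 2 more for every group of 48 hosts (rounded up). A small sketch of that arithmetic:

```python
import math

def srun_listening_ports(nhosts):
    """3 base listening ports plus 2 more for every
    (started) group of 48 hosts, per the rule above."""
    return 3 + 2 * math.ceil(nhosts / 48)

for n in (48, 50, 200):
    print(n, srun_listening_ports(n))
# 48 5
# 50 7
# 200 13
```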
4017
4018
4019 SrunProlog
4020 Fully qualified pathname of an executable to be run by srun
4021 prior to the launch of a job step. The command line arguments
4022 for the executable will be the command and arguments of the job
4023 step. This configuration parameter may be overridden by srun's
4024 --prolog parameter. Note that while the other "Prolog" executa‐
4025 bles (e.g., TaskProlog) are run by slurmd on the compute nodes
4026 where the tasks are executed, the SrunProlog runs on the node
4027 where the "srun" is executing.
4028
4029
4030 StateSaveLocation
4031 Fully qualified pathname of a directory into which the Slurm
4032 controller, slurmctld, saves its state (e.g. "/usr/lo‐
4033                cal/slurm/checkpoint").  Slurm state will be saved here to re‐
4034                cover from system failures. SlurmUser must be able to create files in
4035 this directory. If you have a secondary SlurmctldHost config‐
4036 ured, this location should be readable and writable by both sys‐
4037 tems. Since all running and pending job information is stored
4038 here, the use of a reliable file system (e.g. RAID) is recom‐
4039 mended. The default value is "/var/spool". If any slurm dae‐
4040 mons terminate abnormally, their core files will also be written
4041 into this directory.
4042
4043
4044 SuspendExcNodes
4045                Specifies the nodes which are not to be placed in power save
4046                mode, even if the node remains idle for an extended period of
4047                time. Use Slurm's hostlist expression to identify nodes, with an
4048                optional ":" separator and a count of nodes to exclude from the
4049                preceding range. For example "nid[10-20]:4" will prevent 4 us‐
4050                able nodes (i.e. IDLE and not DOWN, DRAINING or already powered
4051 down) in the set "nid[10-20]" from being powered down. Multiple
4052 sets of nodes can be specified with or without counts in a comma
4053                separated list (e.g. "nid[10-20]:4,nid[80-90]:2"). If a node
4054 count specification is given, any list of nodes to NOT have a
4055 node count must be after the last specification with a count.
4056                For example "nid[10-20]:4,nid[60-70]" will exclude 4 nodes in
4057                the set "nid[10-20]" plus all nodes in the set "nid[60-70]"
4058 while "nid[1-3],nid[10-20]:4" will exclude 4 nodes from the set
4059 "nid[1-3],nid[10-20]". By default no nodes are excluded.
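The optional ":count" suffix described above can be separated from each hostlist expression with a small parser. The sketch below is only an illustration of the syntax (it does not expand Slurm hostlist ranges or consult node state, and is not Slurm's actual parser):

```python
# Hedged sketch: splitting a SuspendExcNodes value into
# (hostlist_expression, count) pairs. Commas inside brackets
# (e.g. "lx[15,18]") must not split entries.

def parse_suspend_exc(spec):
    """Return (hostlist_expression, count_or_None) pairs."""
    entries = []
    depth = 0
    token = ""
    for ch in spec:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "," and depth == 0:
            entries.append(token)
            token = ""
        else:
            token += ch
    entries.append(token)
    parsed = []
    for e in entries:
        if ":" in e and not e.endswith("]"):
            hosts, count = e.rsplit(":", 1)
            parsed.append((hosts, int(count)))
        else:
            parsed.append((e, None))
    return parsed

print(parse_suspend_exc("nid[10-20]:4,nid[60-70]"))
# [('nid[10-20]', 4), ('nid[60-70]', None)]
```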
4060
4061
4062 SuspendExcParts
4063                Specifies the partitions whose nodes are not to be placed in
4064 power save mode, even if the node remains idle for an extended
4065 period of time. Multiple partitions can be identified and sepa‐
4066 rated by commas. By default no nodes are excluded.
4067
4068
4069 SuspendProgram
4070 SuspendProgram is the program that will be executed when a node
4071 remains idle for an extended period of time. This program is
4072 expected to place the node into some power save mode. This can
4073 be used to reduce the frequency and voltage of a node or com‐
4074 pletely power the node off. The program executes as SlurmUser.
4075 The argument to the program will be the names of nodes to be
4076 placed into power savings mode (using Slurm's hostlist expres‐
4077 sion format). By default, no program is run.
4078
4079
4080 SuspendRate
4081 The rate at which nodes are placed into power save mode by Sus‐
4082                pendProgram. The value is the number of nodes per minute and can
4083 be used to prevent a large drop in power consumption (e.g. after
4084 a large job completes). A value of zero results in no limits
4085 being imposed. The default value is 60 nodes per minute.
4086
4087
4088 SuspendTime
4089 Nodes which remain idle or down for this number of seconds will
4090 be placed into power save mode by SuspendProgram. Setting Sus‐
4091 pendTime to anything but INFINITE (or -1) will enable power save
4092 mode. INFINITE is the default.
4093
4094
4095 SuspendTimeout
4096 Maximum time permitted (in seconds) between when a node suspend
4097                request is issued and when the node is shut down. At that time
4098 the node must be ready for a resume request to be issued as
4099 needed for new work. The default value is 30 seconds.
4100
4101
4102 SwitchParameters
4103 Optional parameters for the switch plugin.
4104
4105
4106 SwitchType
4107 Identifies the type of switch or interconnect used for applica‐
4108 tion communications. Acceptable values include
4109                "switch/cray_aries" for Cray systems and "switch/none" for
4110                switches not requiring special processing for job launch or ter‐
4111                mination (e.g. Ethernet and InfiniBand). The default value is
4112                "switch/none".  All Slurm daemons, commands and running jobs
4113 must be restarted for a change in SwitchType to take effect. If
4114 running jobs exist at the time slurmctld is restarted with a new
4115 value of SwitchType, records of all jobs in any state may be
4116 lost.
4117
4118
4119 TaskEpilog
4120                Fully qualified pathname of a program to be executed as the
4121                Slurm job's owner after termination of each task. See TaskProlog for
4122 execution order details.
4123
4124
4125 TaskPlugin
4126 Identifies the type of task launch plugin, typically used to
4127 provide resource management within a node (e.g. pinning tasks to
4128 specific processors). More than one task plugin can be specified
4129 in a comma-separated list. The prefix of "task/" is optional.
4130 Acceptable values include:
4131
4132 task/affinity enables resource containment using
4133 sched_setaffinity(). This enables the --cpu-bind
4134 and/or --mem-bind srun options.
4135
4136 task/cgroup enables resource containment using Linux control
4137 cgroups. This enables the --cpu-bind and/or
4138 --mem-bind srun options. NOTE: see "man
4139 cgroup.conf" for configuration details.
4140
4141 task/none for systems requiring no special handling of user
4142 tasks. Lacks support for the --cpu-bind and/or
4143 --mem-bind srun options. The default value is
4144 "task/none".
4145
4146 NOTE: It is recommended to stack task/affinity,task/cgroup to‐
4147 gether when configuring TaskPlugin, and setting Constrain‐
4148 Cores=yes in cgroup.conf. This setup uses the task/affinity
4149 plugin for setting the affinity of the tasks and uses the
4150 task/cgroup plugin to fence tasks into the specified resources.
4151
4152 NOTE: For CRAY systems only: task/cgroup must be used with, and
4153 listed after task/cray_aries in TaskPlugin. The task/affinity
4154 plugin can be listed anywhere, but the previous constraint must
4155 be satisfied. For CRAY systems, a configuration like this is
4156 recommended:
4157 TaskPlugin=task/affinity,task/cray_aries,task/cgroup
4158
4159
4160 TaskPluginParam
4161 Optional parameters for the task plugin. Multiple options
4162 should be comma separated. None, Boards, Sockets, Cores and
4163 Threads are mutually exclusive and treated as a last possible
4164 source of --cpu-bind default. See also Node and Partition Cpu‐
4165 Bind options.
4166
4167
4168 Cores Bind tasks to cores by default. Overrides automatic
4169 binding.
4170
4171 None Perform no task binding by default. Overrides automatic
4172 binding.
4173
4174 Sockets
4175 Bind to sockets by default. Overrides automatic binding.
4176
4177 Threads
4178 Bind to threads by default. Overrides automatic binding.
4179
4180 SlurmdOffSpec
4181 If specialized cores or CPUs are identified for the node
4182 (i.e. the CoreSpecCount or CpuSpecList are configured for
4183 the node), then Slurm daemons running on the compute node
4184 (i.e. slurmd and slurmstepd) should run outside of those
4185 resources (i.e. specialized resources are completely un‐
4186 available to Slurm daemons and jobs spawned by Slurm).
4187 This option may not be used with the task/cray_aries
4188 plugin.
4189
4190 Verbose
4191 Verbosely report binding before tasks run by default.
4192
4193 Autobind
4194 Set a default binding in the event that "auto binding"
4195 doesn't find a match. Set to Threads, Cores or Sockets
4196 (E.g. TaskPluginParam=autobind=threads).
4197
4198
4199                Fully qualified pathname of a program to be executed as the
4200                Slurm job's owner prior to initiation of each task. Besides the nor‐
4201 job's owner prior to initiation of each task. Besides the nor‐
4202 mal environment variables, this has SLURM_TASK_PID available to
4203 identify the process ID of the task being started. Standard
4204 output from this program can be used to control the environment
4205 variables and output for the user program.
4206
4207 export NAME=value Will set environment variables for the task
4208 being spawned. Everything after the equal
4209 sign to the end of the line will be used as
4210 the value for the environment variable. Ex‐
4211 porting of functions is not currently sup‐
4212 ported.
4213
4214 print ... Will cause that line (without the leading
4215 "print ") to be printed to the job's stan‐
4216 dard output.
4217
4218 unset NAME Will clear environment variables for the
4219 task being spawned.
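A TaskProlog can be any executable that writes the directives above to standard output. The following minimal sketch emits all three; the variable names are hypothetical examples:

```python
#!/usr/bin/env python3
# Minimal TaskProlog sketch: emits the "export", "print" and
# "unset" directives described above on standard output. The
# variable names (MY_SCRATCH, MY_DEBUG_FLAG) are hypothetical.
import os

def make_directives(task_pid):
    # Set an environment variable for the task being spawned.
    yield "export MY_SCRATCH=/tmp/scratch"
    # Write a line (without the leading "print ") to the job's
    # standard output.
    yield "print task prolog ran for pid " + task_pid
    # Clear an environment variable for the task being spawned.
    yield "unset MY_DEBUG_FLAG"

if __name__ == "__main__":
    for line in make_directives(os.environ.get("SLURM_TASK_PID", "?")):
        print(line)
```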
4220
4221 The order of task prolog/epilog execution is as follows:
4222
4223                1. pre_launch_priv()
4224                                   Function in TaskPlugin
4225
4226                2. pre_launch()    Function in TaskPlugin
4227
4228                3. TaskProlog      System-wide per task program defined in
4229                                   slurm.conf
4230
4231                4. User prolog     Job-step-specific task program defined using
4232                                   srun's --task-prolog option or
4233                                   SLURM_TASK_PROLOG environment variable
4234
4235                5. Task            Execute the job step's task
4236
4237                6. User epilog     Job-step-specific task program defined using
4238                                   srun's --task-epilog option or
4239                                   SLURM_TASK_EPILOG environment variable
4240
4241                7. TaskEpilog      System-wide per task program defined in
4242                                   slurm.conf
4243
4244                8. post_term()     Function in TaskPlugin
4245
4246
4247 TCPTimeout
4248 Time permitted for TCP connection to be established. Default
4249 value is 2 seconds.
4250
4251
4252 TmpFS Fully qualified pathname of the file system available to user
4253 jobs for temporary storage. This parameter is used in establish‐
4254 ing a node's TmpDisk space. The default value is "/tmp".
4255
4256
4257 TopologyParam
4258 Comma-separated options identifying network topology options.
4259
4260 Dragonfly Optimize allocation for Dragonfly network. Valid
4261 when TopologyPlugin=topology/tree.
4262
4263 TopoOptional Only optimize allocation for network topology if
4264 the job includes a switch option. Since optimiz‐
4265 ing resource allocation for topology involves
4266 much higher system overhead, this option can be
4267 used to impose the extra overhead only on jobs
4268 which can take advantage of it. If most job allo‐
4269 cations are not optimized for network topology,
4270 they may fragment resources to the point that
4271 topology optimization for other jobs will be dif‐
4272 ficult to achieve. NOTE: Jobs may span across
4273 nodes without common parent switches with this
4274 enabled.
4275
4276
4277 TopologyPlugin
4278 Identifies the plugin to be used for determining the network
4279 topology and optimizing job allocations to minimize network con‐
4280 tention. See NETWORK TOPOLOGY below for details. Additional
4281 plugins may be provided in the future which gather topology in‐
4282 formation directly from the network. Acceptable values include:
4283
4284 topology/3d_torus best-fit logic over three-dimensional
4285 topology
4286
4287 topology/none default for other systems, best-fit logic
4288 over one-dimensional topology
4289
4290 topology/tree used for a hierarchical network as de‐
4291 scribed in a topology.conf file
4292
4293
4294 TrackWCKey
4295                Boolean yes or no. Used to enable display and tracking of the
4296                Workload Characterization Key. Must be set to track correct wckey
4297 usage. NOTE: You must also set TrackWCKey in your slurmdbd.conf
4298 file to create historical usage reports.
4299
4300
4301 TreeWidth
4302 Slurmd daemons use a virtual tree network for communications.
4303 TreeWidth specifies the width of the tree (i.e. the fanout). On
4304 architectures with a front end node running the slurmd daemon,
4305 the value must always be equal to or greater than the number of
4306                front end nodes, which eliminates the need for message forwarding
4307 between the slurmd daemons. On other architectures the default
4308 value is 50, meaning each slurmd daemon can communicate with up
4309 to 50 other slurmd daemons and over 2500 nodes can be contacted
4310 with two message hops. The default value will work well for
4311 most clusters. Optimal system performance can typically be
4312 achieved if TreeWidth is set to the square root of the number of
4313 nodes in the cluster for systems having no more than 2500 nodes
4314 or the cube root for larger systems. The value may not exceed
4315 65533.
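The sizing heuristic above (square root of the node count up to 2500 nodes, cube root beyond) can be sketched as follows. The helper name is hypothetical; this is only the rule as stated in the text, capped at the documented 65533 maximum:

```python
import math

def suggested_tree_width(nnodes):
    """Square root of the node count for systems of up to 2500
    nodes, cube root for larger systems, capped at 65533."""
    if nnodes <= 2500:
        width = math.isqrt(nnodes)
    else:
        width = round(nnodes ** (1.0 / 3.0))
    return max(1, min(width, 65533))

print(suggested_tree_width(2500))   # 50
print(suggested_tree_width(27000))  # 30
```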
4316
4317
4318 UnkillableStepProgram
4319 If the processes in a job step are determined to be unkillable
4320 for a period of time specified by the UnkillableStepTimeout
4321 variable, the program specified by UnkillableStepProgram will be
4322 executed. By default no program is run.
4323
4324 See section UNKILLABLE STEP PROGRAM SCRIPT for more information.
4325
4326
4327 UnkillableStepTimeout
4328 The length of time, in seconds, that Slurm will wait before de‐
4329 ciding that processes in a job step are unkillable (after they
4330 have been signaled with SIGKILL) and execute UnkillableStepPro‐
4331 gram. The default timeout value is 60 seconds. If exceeded,
4332 the compute node will be drained to prevent future jobs from be‐
4333 ing scheduled on the node.
4334
4335
4336 UsePAM If set to 1, PAM (Pluggable Authentication Modules for Linux)
4337 will be enabled. PAM is used to establish the upper bounds for
4338 resource limits. With PAM support enabled, local system adminis‐
4339 trators can dynamically configure system resource limits. Chang‐
4340 ing the upper bound of a resource limit will not alter the lim‐
4341 its of running jobs, only jobs started after a change has been
4342 made will pick up the new limits. The default value is 0 (not
4343 to enable PAM support). Remember that PAM also needs to be con‐
4344 figured to support Slurm as a service. For sites using PAM's
4345 directory based configuration option, a configuration file named
4346 slurm should be created. The module-type, control-flags, and
4347 module-path names that should be included in the file are:
4348 auth required pam_localuser.so
4349 auth required pam_shells.so
4350 account required pam_unix.so
4351 account required pam_access.so
4352 session required pam_unix.so
4353 For sites configuring PAM with a general configuration file, the
4354 appropriate lines (see above), where slurm is the service-name,
4355 should be added.
4356
4357                NOTE: The UsePAM option has nothing to do with the con‐
4358 tribs/pam/pam_slurm and/or contribs/pam_slurm_adopt modules. So
4359 these two modules can work independently of the value set for
4360 UsePAM.
4361
4362
4363 VSizeFactor
4364 Memory specifications in job requests apply to real memory size
4365 (also known as resident set size). It is possible to enforce
4366 virtual memory limits for both jobs and job steps by limiting
4367 their virtual memory to some percentage of their real memory al‐
4368 location. The VSizeFactor parameter specifies the job's or job
4369 step's virtual memory limit as a percentage of its real memory
4370 limit. For example, if a job's real memory limit is 500MB and
4371 VSizeFactor is set to 101 then the job will be killed if its
4372 real memory exceeds 500MB or its virtual memory exceeds 505MB
4373 (101 percent of the real memory limit). The default value is 0,
4374 which disables enforcement of virtual memory limits. The value
4375 may not exceed 65533 percent.
4376
4377 NOTE: This parameter is dependent on OverMemoryKill being con‐
4378 figured in JobAcctGatherParams. It is also possible to configure
4379 the TaskPlugin to use task/cgroup for memory enforcement. VSize‐
4380 Factor will not have an effect on memory enforcement done
4381 through cgroups.
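The worked example above (a 500MB real memory limit with VSizeFactor=101 yielding a 505MB virtual memory limit) is a straightforward percentage calculation:

```python
def vsize_limit_mb(real_limit_mb, vsize_factor):
    """Virtual memory limit as a percentage of the real memory
    limit. A factor of 0 disables virtual memory enforcement."""
    if vsize_factor == 0:
        return None
    return real_limit_mb * vsize_factor / 100

print(vsize_limit_mb(500, 101))
# 505.0
```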
4382
4383
4384 WaitTime
4385 Specifies how many seconds the srun command should by default
4386 wait after the first task terminates before terminating all re‐
4387 maining tasks. The "--wait" option on the srun command line
4388 overrides this value. The default value is 0, which disables
4389 this feature. May not exceed 65533 seconds.
4390
4391
4392 X11Parameters
4393 For use with Slurm's built-in X11 forwarding implementation.
4394
4395 home_xauthority
4396 If set, xauth data on the compute node will be placed in
4397 ~/.Xauthority rather than in a temporary file under
4398 TmpFS.
4399
4400
4401 NODE CONFIGURATION
4402    The configuration of nodes (or machines) to be managed by Slurm is also
4403 specified in /etc/slurm.conf. Changes in node configuration (e.g.
4404 adding nodes, changing their processor count, etc.) require restarting
4405 both the slurmctld daemon and the slurmd daemons. All slurmd daemons
4406 must know each node in the system to forward messages in support of hi‐
4407 erarchical communications. Only the NodeName must be supplied in the
4408 configuration file. All other node configuration information is op‐
4409 tional. It is advisable to establish baseline node configurations, es‐
4410 pecially if the cluster is heterogeneous. Nodes which register to the
4411 system with less than the configured resources (e.g. too little mem‐
4412 ory), will be placed in the "DOWN" state to avoid scheduling jobs on
4413 them. Establishing baseline configurations will also speed Slurm's
4414 scheduling process by permitting it to compare job requirements against
4415 these (relatively few) configuration parameters and possibly avoid hav‐
4416 ing to check job requirements against every individual node's configu‐
4417 ration. The resources checked at node registration time are: CPUs,
4418 RealMemory and TmpDisk.
4419
4420 Default values can be specified with a record in which NodeName is "DE‐
4421 FAULT". The default entry values will apply only to lines following it
4422 in the configuration file and the default values can be reset multiple
4423 times in the configuration file with multiple entries where "Node‐
4424    Name=DEFAULT". Each line where NodeName is "DEFAULT" will replace or
4425    add to previous default values and not reinitialize the default val‐
4426 ues. The "NodeName=" specification must be placed on every line de‐
4427 scribing the configuration of nodes. A single node name can not appear
4428 as a NodeName value in more than one line (duplicate node name records
4429 will be ignored). In fact, it is generally possible and desirable to
4430 define the configurations of all nodes in only a few lines. This con‐
4431 vention permits significant optimization in the scheduling of larger
4432 clusters. In order to support the concept of jobs requiring consecu‐
4433    tive nodes on some architectures, node specifications should be placed
4434 in this file in consecutive order. No single node name may be listed
4435 more than once in the configuration file. Use "DownNodes=" to record
4436 the state of nodes which are temporarily in a DOWN, DRAIN or FAILING
4437 state without altering permanent configuration information. A job
4438    step's tasks are allocated to nodes in the order the nodes appear in the
4439 configuration file. There is presently no capability within Slurm to
4440 arbitrarily order a job step's tasks.
4441
4442 Multiple node names may be comma separated (e.g. "alpha,beta,gamma")
4443 and/or a simple node range expression may optionally be used to specify
4444 numeric ranges of nodes to avoid building a configuration file with
4445 large numbers of entries. The node range expression can contain one
4446 pair of square brackets with a sequence of comma-separated numbers
4447 and/or ranges of numbers separated by a "-" (e.g. "linux[0-64,128]", or
4448 "lx[15,18,32-33]"). Note that the numeric ranges can include one or
4449 more leading zeros to indicate the numeric portion has a fixed number
4450 of digits (e.g. "linux[0000-1023]"). Multiple numeric ranges can be
4451 included in the expression (e.g. "rack[0-63]_blade[0-41]"). If one or
4452 more numeric expressions are included, one of them must be at the end
4453 of the name (e.g. "unit[0-31]rack" is invalid), but arbitrary names can
4454 always be used in a comma-separated list.
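As an illustration of the bracketed range syntax above, here is a simplified expander for expressions with a single bracket pair. It is not Slurm's hostlist code (which also handles multiple bracket pairs such as "rack[0-63]_blade[0-41]" and comma-separated lists of expressions), but it shows the range and leading-zero semantics:

```python
import re

def expand_hostlist(expr):
    """Expand a single-bracket expression such as "lx[15,18,32-33]".
    Leading zeros fix the width, e.g. "linux[0000-0002]"."""
    m = re.fullmatch(r"([^\[\]]*)\[([^\[\]]+)\]([^\[\]]*)", expr)
    if m is None:
        return [expr]  # plain name, nothing to expand
    prefix, ranges, suffix = m.groups()
    names = []
    for part in ranges.split(","):
        lo, _, hi = part.partition("-")
        hi = hi or lo
        # A leading zero means the numeric field has fixed width.
        width = len(lo) if lo.startswith("0") and len(lo) > 1 else 0
        for i in range(int(lo), int(hi) + 1):
            names.append("%s%0*d%s" % (prefix, width, i, suffix))
    return names

print(expand_hostlist("lx[15,18,32-33]"))
# ['lx15', 'lx18', 'lx32', 'lx33']
```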
4455
4456    The node configuration specifies the following information:
4457
4458
4459 NodeName
4460 Name that Slurm uses to refer to a node. Typically this would
4461 be the string that "/bin/hostname -s" returns. It may also be
4462 the fully qualified domain name as returned by "/bin/hostname
4463 -f" (e.g. "foo1.bar.com"), or any valid domain name associated
4464 with the host through the host database (/etc/hosts) or DNS, de‐
4465 pending on the resolver settings. Note that if the short form
4466 of the hostname is not used, it may prevent use of hostlist ex‐
4467 pressions (the numeric portion in brackets must be at the end of
4468 the string). It may also be an arbitrary string if NodeHostname
4469 is specified. If the NodeName is "DEFAULT", the values speci‐
4470 fied with that record will apply to subsequent node specifica‐
4471 tions unless explicitly set to other values in that node record
4472 or replaced with a different set of default values. Each line
4473                where NodeName is "DEFAULT" will replace or add to previous de‐
4474                fault values and not reinitialize the default values. For ar‐
4475 chitectures in which the node order is significant, nodes will
4476 be considered consecutive in the order defined. For example, if
4477 the configuration for "NodeName=charlie" immediately follows the
4478 configuration for "NodeName=baker" they will be considered adja‐
4479 cent in the computer.
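
For example (with hypothetical node names and values), a DEFAULT record can supply values that the node records following it inherit:

NodeName=DEFAULT CPUs=16 RealMemory=32000 State=UNKNOWN
NodeName=baker
NodeName=charlie RealMemory=64000

Here "charlie" overrides RealMemory while inheriting the remaining defaults, and is considered adjacent to "baker" in the computer.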
4480
4481
4482 NodeHostname
4483 Typically this would be the string that "/bin/hostname -s" re‐
4484 turns. It may also be the fully qualified domain name as re‐
4485 turned by "/bin/hostname -f" (e.g. "foo1.bar.com"), or any valid
4486 domain name associated with the host through the host database
4487 (/etc/hosts) or DNS, depending on the resolver settings. Note
4488 that if the short form of the hostname is not used, it may pre‐
4489 vent use of hostlist expressions (the numeric portion in brack‐
4490 ets must be at the end of the string). A node range expression
4491 can be used to specify a set of nodes. If an expression is
4492 used, the number of nodes identified by NodeHostname on a line
4493 in the configuration file must be identical to the number of
4494 nodes identified by NodeName. By default, the NodeHostname will
4495 be identical in value to NodeName.
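
For example (hypothetical names), arbitrary NodeName values can be mapped onto the real hostnames of the same number of nodes:

NodeName=node[0-3] NodeHostname=achilles[0-3] CPUs=8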
4496
4497
4498 NodeAddr
4499 Name by which a node should be referred to in establishing a commu‐
4500 nications path. This name will be used as an argument to the
4501 getaddrinfo() function for identification. If a node range ex‐
4502 pression is used to designate multiple nodes, they must exactly
4503 match the entries in the NodeName (e.g. "NodeName=lx[0-7]
4504 NodeAddr=elx[0-7]"). NodeAddr may also contain IP addresses.
4505 By default, the NodeAddr will be identical in value to NodeHost‐
4506 name.
4507
4508
4509 BcastAddr
4510 Alternate network path to be used for sbcast network traffic to
4511 a given node. This name will be used as an argument to the
4512 getaddrinfo() function. If a node range expression is used to
4513 designate multiple nodes, they must exactly match the entries in
4514 the NodeName (e.g. "NodeName=lx[0-7] BcastAddr=elx[0-7]").
4515 BcastAddr may also contain IP addresses. By default, the Bcas‐
4516 tAddr is unset, and sbcast traffic will be routed to the
4517 NodeAddr for a given node. Note: cannot be used with Communica‐
4518 tionParameters=NoInAddrAny.
4519
4520
4521 Boards Number of Baseboards in nodes with a baseboard controller. Note
4522 that when Boards is specified, SocketsPerBoard, CoresPerSocket,
4523 and ThreadsPerCore should be specified. The default value is 1.
4524
4525
4526 CoreSpecCount
4527 Number of cores reserved for system use. These cores will not
4528 be available for allocation to user jobs. Depending upon the
4529 TaskPluginParam option of SlurmdOffSpec, Slurm daemons (i.e.
4530 slurmd and slurmstepd) may either be confined to these resources
4531 (the default) or prevented from using these resources. Isola‐
4532 tion of the Slurm daemons from user jobs may improve application
4533 performance. If this option and CpuSpecList are both designated
4534 for a node, an error is generated. For information on the algo‐
4535 rithm used by Slurm to select the cores refer to the core spe‐
4536 cialization documentation (
4537 https://slurm.schedmd.com/core_spec.html ).
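
As a sketch (hypothetical node names and values), the following reserves two cores on each node for the Slurm daemons:

NodeName=node[01-16] CPUs=32 CoreSpecCount=2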
4538
4539
4540 CoresPerSocket
4541 Number of cores in a single physical processor socket (e.g.
4542 "2"). The CoresPerSocket value describes physical cores, not
4543 the logical number of processors per socket. NOTE: If you have
4544 multi-core processors, you will likely need to specify this pa‐
4545 rameter in order to optimize scheduling. The default value is
4546 1.
4547
4548
4549 CpuBind
4550 If a job step request does not specify an option to control how
4551 tasks are bound to allocated CPUs (--cpu-bind) and all nodes al‐
4552 located to the job have the same CpuBind option the node CpuBind
4553 option will control how tasks are bound to allocated resources.
4554 Supported values for CpuBind are "none", "board", "socket",
4555 "ldom" (NUMA), "core" and "thread".
4556
4557
4558 CPUs Number of logical processors on the node (e.g. "2"). It can be
4559 set to the total number of sockets (supported only by select/lin‐
4560 ear), cores or threads. This can be useful when you want to
4561 schedule only the cores on a hyper-threaded node. If CPUs is
4562 omitted, its default will be set equal to the product of Boards,
4563 Sockets, CoresPerSocket, and ThreadsPerCore.
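
For example, with the hypothetical configuration below, omitting CPUs is equivalent to setting CPUs=32 (1 board x 2 sockets x 8 cores x 2 threads):

NodeName=node1 Boards=1 Sockets=2 CoresPerSocket=8 ThreadsPerCore=2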
4564
4565
4566 CpuSpecList
4567 A comma-delimited list of Slurm abstract CPU IDs reserved for
4568 system use. The list will be expanded to include all other
4569 CPUs, if any, on the same cores. These cores will not be avail‐
4570 able for allocation to user jobs. Depending upon the TaskPlug‐
4571 inParam option of SlurmdOffSpec, Slurm daemons (i.e. slurmd and
4572 slurmstepd) may either be confined to these resources (the de‐
4573 fault) or prevented from using these resources. Isolation of
4574 the Slurm daemons from user jobs may improve application perfor‐
4575 mance. If this option and CoreSpecCount are both designated for
4576 a node, an error is generated. This option has no effect unless
4577 cgroup job confinement is also configured (TaskPlu‐
4578 gin=task/cgroup with ConstrainCores=yes in cgroup.conf).
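
As a sketch (hypothetical node name and CPU IDs), the following reserves abstract CPUs 0 and 1 for system use, assuming cgroup job confinement is configured as described above:

NodeName=node1 CPUs=16 CpuSpecList=0,1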
4579
4580
4581 Features
4582 A comma-delimited list of arbitrary strings indicative of some
4583 characteristic associated with the node. There is no value or
4584 count associated with a feature at this time; a node either has
4585 a feature or it does not. A desired feature may contain a nu‐
4586 meric component indicating, for example, processor speed but
4587 this numeric component will be considered to be part of the fea‐
4588 ture string. Features are intended to be used to filter nodes
4589 eligible to run jobs via the --constraint argument. By default
4590 a node has no features. Also see Gres for being able to have
4591 more control such as types and count. Using features is faster
4592 than scheduling against GRES but is limited to Boolean opera‐
4593 tions.
4594
4595
4596 Gres A comma-delimited list of generic resources specifications for a
4597 node. The format is: "<name>[:<type>][:no_consume]:<num‐
4598 ber>[K|M|G]". The first field is the resource name, which
4599 matches the GresType configuration parameter name. The optional
4600 type field might be used to identify a model of that generic re‐
4601 source. It is forbidden to specify both an untyped GRES and a
4602 typed GRES with the same <name>. The optional no_consume field
4603 allows you to specify that a generic resource does not have a
4604 finite number of that resource that gets consumed as it is re‐
4605 quested. The no_consume field is a GRES specific setting and ap‐
4606 plies to the GRES, regardless of the type specified. The final
4607 field must specify a generic resources count. A suffix of "K",
4608 "M", "G", "T" or "P" may be used to multiply the number by 1024,
4609 1048576, 1073741824, etc. respectively.
4610 (e.g."Gres=gpu:tesla:1,gpu:kepler:1,bandwidth:lustre:no_con‐
4611 sume:4G"). By default a node has no generic resources and its
4612 maximum count is that of an unsigned 64bit integer. Also see
4613 Features for Boolean flags to filter nodes using job con‐
4614 straints.
4615
4616
4617 MemSpecLimit
4618 Amount of memory, in megabytes, reserved for system use and not
4619 available for user allocations. If the task/cgroup plugin is
4620 configured and that plugin constrains memory allocations (i.e.
4621 TaskPlugin=task/cgroup in slurm.conf, plus ConstrainRAMSpace=yes
4622 in cgroup.conf), then Slurm compute node daemons (slurmd plus
4623 slurmstepd) will be allocated the specified memory limit. Note
4624 that SelectTypeParameters must include one of the options that
4625 treats memory as a consumable resource for this option to work.
4626 The daemons will not be killed if they exhaust
4627 the memory allocation (i.e. the Out-Of-Memory Killer is disabled
4628 for the daemon's memory cgroup). If the task/cgroup plugin is
4629 not configured, the specified memory will only be unavailable
4630 for user allocations.
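
For example (hypothetical values), the following sets aside 2048 MB of a node's memory for system use; assuming memory is configured as a consumable resource, jobs on node1 could then be allocated at most 61952 MB in total:

NodeName=node1 RealMemory=64000 MemSpecLimit=2048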
4631
4632
4633 Port The port number that the Slurm compute node daemon, slurmd, lis‐
4634 tens to for work on this particular node. By default there is a
4635 single port number for all slurmd daemons on all compute nodes
4636 as defined by the SlurmdPort configuration parameter. Use of
4637 this option is not generally recommended except for development
4638 or testing purposes. If multiple slurmd daemons execute on a
4639 node this can specify a range of ports.
4640
4641 Note: On Cray systems, Realm-Specific IP Addressing (RSIP) will
4642 automatically try to interact with anything opened on ports
4643 8192-60000. Configure Port to use a port outside of the config‐
4644 ured SrunPortRange and RSIP's port range.
4645
4646
4647 Procs See CPUs.
4648
4649
4650 RealMemory
4651 Size of real memory on the node in megabytes (e.g. "2048"). The
4652 default value is 1. Lowering RealMemory with the goal of setting
4653 aside some amount for the OS, unavailable for job allocations,
4654 will not work as intended if memory is not set as a consumable
4655 resource in SelectTypeParameters, so one of the *_Memory options
4656 needs to be enabled for that goal to be accomplished.
4657 Also see MemSpecLimit.
4658
4659
4660 Reason Identifies the reason for a node being in state "DOWN",
4661 "DRAINED", "DRAINING", "FAIL" or "FAILING". Use quotes to en‐
4662 close a reason having more than one word.
4663
4664
4665 Sockets
4666 Number of physical processor sockets/chips on the node (e.g.
4667 "2"). If Sockets is omitted, it will be inferred from CPUs,
4668 CoresPerSocket, and ThreadsPerCore. NOTE: If you have
4669 multi-core processors, you will likely need to specify these pa‐
4670 rameters. Sockets and SocketsPerBoard are mutually exclusive.
4671 If Sockets is specified when Boards is also used, Sockets is in‐
4672 terpreted as SocketsPerBoard rather than total sockets. The de‐
4673 fault value is 1.
4674
4675
4676 SocketsPerBoard
4677 Number of physical processor sockets/chips on a baseboard.
4678 Sockets and SocketsPerBoard are mutually exclusive. The default
4679 value is 1.
4680
4681
4682 State State of the node with respect to the initiation of user jobs.
4683 Acceptable values are CLOUD, DOWN, DRAIN, FAIL, FAILING, FUTURE
4684 and UNKNOWN. Node states of BUSY and IDLE should not be speci‐
4685 fied in the node configuration, but set the node state to UN‐
4686 KNOWN instead. Setting the node state to UNKNOWN will result in
4687 the node state being set to BUSY, IDLE or other appropriate
4688 state based upon recovered system state information. The de‐
4689 fault value is UNKNOWN. Also see the DownNodes parameter below.
4690
4691 CLOUD Indicates the node exists in the cloud. Its initial
4692 state will be treated as powered down. The node will
4693 be available for use after its state is recovered from
4694 Slurm's state save file or the slurmd daemon starts on
4695 the compute node.
4696
4697 DOWN Indicates the node failed and is unavailable to be al‐
4698 located work.
4699
4700 DRAIN Indicates the node is unavailable to be allocated
4701 work.
4702
4703 FAIL Indicates the node is expected to fail soon, has no
4704 jobs allocated to it, and will not be allocated to any
4705 new jobs.
4706
4707 FAILING Indicates the node is expected to fail soon, has one
4708 or more jobs allocated to it, but will not be allo‐
4709 cated to any new jobs.
4710
4711 FUTURE Indicates the node is defined for future use and need
4712 not exist when the Slurm daemons are started. These
4713 nodes can be made available for use simply by updating
4714 the node state using the scontrol command rather than
4715 restarting the slurmctld daemon. After these nodes are
4716 made available, change their State in the slurm.conf
4717 file. Until these nodes are made available, they will
4718 not be seen using any Slurm commands, nor will any
4719 attempt be made to contact them.
4720
4721
4722 Dynamic Future Nodes
4723 A slurmd started with -F[<feature>] will be as‐
4724 sociated with a FUTURE node that matches the
4725 same configuration (sockets, cores, threads) as
4726 reported by slurmd -C. The node's NodeAddr and
4727 NodeHostname will automatically be retrieved
4728 from the slurmd and will be cleared when set
4729 back to the FUTURE state. Dynamic FUTURE nodes
4730 retain non-FUTURE state on restart. Use scon‐
4731 trol to put node back into FUTURE state.
4732
4733 If the mapping of the NodeName to the slurmd
4734 HostName is not updated in DNS, Dynamic Future
4735 nodes won't know how to communicate with each
4736 other -- because NodeAddr and NodeHostName are
4737 not defined in the slurm.conf -- and the fanout
4738 communications need to be disabled by setting
4739 TreeWidth to a high number (e.g. 65533). If the
4740 DNS mapping is made, then the cloud_dns Slurm‐
4741 ctldParameter can be used.
4742
4743
4744 UNKNOWN Indicates the node's state is undefined but will be
4745 established (set to BUSY or IDLE) when the slurmd dae‐
4746 mon on that node registers. UNKNOWN is the default
4747 state.
4748
4749
4750 ThreadsPerCore
4751 Number of logical threads in a single physical core (e.g. "2").
4752 Note that Slurm can allocate resources to jobs down to the
4753 resolution of a core. If your system is configured with more
4754 than one thread per core, execution of a different job on each
4755 thread is not supported unless you configure SelectTypeParame‐
4756 ters=CR_CPU plus CPUs; do not configure Sockets, CoresPerSocket
4757 or ThreadsPerCore. A job can execute one task per thread from
4758 within one job step or execute a distinct job step on each of
4759 the threads. Note also if you are running with more than 1
4760 thread per core and running the select/cons_res or se‐
4761 lect/cons_tres plugin then you will want to set the SelectType‐
4762 Parameters variable to something other than CR_CPU to avoid un‐
4763 expected results. The default value is 1.
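
As a sketch of the thread-level scheduling case described above (hypothetical values):

SelectType=select/cons_tres
SelectTypeParameters=CR_CPU
NodeName=node1 CPUs=32

Here Sockets, CoresPerSocket and ThreadsPerCore are deliberately left unset so that each of the 32 threads can be scheduled individually.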
4764
4765
4766 TmpDisk
4767 Total size of temporary disk storage in TmpFS in megabytes (e.g.
4768 "16384"). TmpFS (for "Temporary File System") identifies the lo‐
4769 cation which jobs should use for temporary storage. Note this
4770 does not indicate the amount of free space available to the user
4771 on the node, only the total file system size. The system admin‐
4772 istrator should ensure this file system is purged as needed so
4773 that user jobs have access to most of this space. The Prolog
4774 and/or Epilog programs (specified in the configuration file)
4775 might be used to ensure the file system is kept clean. The de‐
4776 fault value is 0.
4777
4778
4779 TRESWeights
4780 TRESWeights are used to calculate a value that represents how
4781 busy a node is. Currently only used in federation configura‐
4782 tions. TRESWeights are different from TRESBillingWeights --
4783 which is used for fairshare calculations.
4784
4785 TRES weights are specified as a comma-separated list of <TRES
4786 Type>=<TRES Weight> pairs.
4787 e.g.
4788 NodeName=node1 ... TRESWeights="CPU=1.0,Mem=0.25G,GRES/gpu=2.0"
4789
4790 By default the weighted TRES value is calculated as the sum of
4791 all node TRES types multiplied by their corresponding TRES
4792 weight.
4793
4794 If PriorityFlags=MAX_TRES is configured, the weighted TRES value
4795 is calculated as the MAX of individual node TRES' (e.g. cpus,
4796 mem, gres).
4797
4798
4799 Weight The priority of the node for scheduling purposes. All things
4800 being equal, jobs will be allocated the nodes with the lowest
4801 weight which satisfies their requirements. For example, a het‐
4802 erogeneous collection of nodes might be placed into a single
4803 partition for greater system utilization, responsiveness and ca‐
4804 pability. It would be preferable to allocate smaller memory
4805 nodes rather than larger memory nodes if either will satisfy a
4806 job's requirements. The units of weight are arbitrary, but
4807 larger weights should be assigned to nodes with more processors,
4808 memory, disk space, higher processor speed, etc. Note that if a
4809 job allocation request can not be satisfied using the nodes with
4810 the lowest weight, the set of nodes with the next lowest weight
4811 is added to the set of nodes under consideration for use (repeat
4812 as needed for higher weight values). If you absolutely want to
4813 minimize the number of higher weight nodes allocated to a job
4814 (at a cost of higher scheduling overhead), give each node a dis‐
4815 tinct Weight value and they will be added to the pool of nodes
4816 being considered for scheduling individually. The default value
4817 is 1.
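
For example (hypothetical nodes), the following prefers the small-memory nodes whenever they satisfy a job's requirements:

NodeName=small[01-16] RealMemory=32000 Weight=1
NodeName=big[01-04] RealMemory=256000 Weight=10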
4818
4819
4820DOWN NODE CONFIGURATION
4821 The DownNodes= parameter permits you to mark certain nodes as in a
4822 DOWN, DRAIN, FAIL, FAILING or FUTURE state without altering the perma‐
4823 nent configuration information listed under a NodeName= specification.
4824
4825
4826 DownNodes
4827 Any node name, or list of node names, from the NodeName= speci‐
4828 fications.
4829
4830
4831 Reason Identifies the reason for a node being in state DOWN, DRAIN,
4832 FAIL, FAILING or FUTURE. Use quotes to enclose a reason having
4833 more than one word.
4834
4835
4836 State State of the node with respect to the initiation of user jobs.
4837 Acceptable values are DOWN, DRAIN, FAIL, FAILING and FUTURE.
4838 For more information about these states see the descriptions un‐
4839 der State in the NodeName= section above. The default value is
4840 DOWN.
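
As an example (hypothetical node names and reason), two nodes can be marked down without altering their NodeName= records:

DownNodes=node[08-09] State=DOWN Reason="faulty power supply"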
4841
4842
4843FRONTEND NODE CONFIGURATION
4844 On computers where frontend nodes are used to execute batch scripts
4845 rather than compute nodes, one may configure one or more frontend nodes
4846 using the configuration parameters defined below. These options are
4847 very similar to those used in configuring compute nodes. These options
4848 may only be used on systems configured and built with the appropriate
4849 parameters (--have-front-end). The front end configuration specifies
4850 the following information:
4851
4852
4853 AllowGroups
4854 Comma-separated list of group names which may execute jobs on
4855 this front end node. By default, all groups may use this front
4856 end node. A user will be permitted to use this front end node
4857 if AllowGroups has at least one group associated with the user.
4858 May not be used with the DenyGroups option.
4859
4860
4861 AllowUsers
4862 Comma-separated list of user names which may execute jobs on
4863 this front end node. By default, all users may use this front
4864 end node. May not be used with the DenyUsers option.
4865
4866
4867 DenyGroups
4868 Comma-separated list of group names which are prevented from ex‐
4869 ecuting jobs on this front end node. May not be used with the
4870 AllowGroups option.
4871
4872
4873 DenyUsers
4874 Comma-separated list of user names which are prevented from exe‐
4875 cuting jobs on this front end node. May not be used with the
4876 AllowUsers option.
4877
4878
4879 FrontendName
4880 Name that Slurm uses to refer to a frontend node. Typically
4881 this would be the string that "/bin/hostname -s" returns. It
4882 may also be the fully qualified domain name as returned by
4883 "/bin/hostname -f" (e.g. "foo1.bar.com"), or any valid domain
4884 name associated with the host through the host database
4885 (/etc/hosts) or DNS, depending on the resolver settings. Note
4886 that if the short form of the hostname is not used, it may pre‐
4887 vent use of hostlist expressions (the numeric portion in brack‐
4888 ets must be at the end of the string). If the FrontendName is
4889 "DEFAULT", the values specified with that record will apply to
4890 subsequent node specifications unless explicitly set to other
4891 values in that frontend node record or replaced with a different
4892 set of default values. Each line where FrontendName is "DE‐
4893 FAULT" will replace or add to previous default values and not
4894 reinitialize the default values.
4895
4896
4897 FrontendAddr
4898 Name by which a frontend node should be referred to in establishing
4899 a communications path. This name will be used as an argument to
4900 the getaddrinfo() function for identification. As with Fron‐
4901 tendName, list the individual node addresses rather than using a
4902 hostlist expression. The number of FrontendAddr records per
4903 line must equal the number of FrontendName records per line
4904 (i.e. you can't map two node names to one address). FrontendAddr
4905 may also contain IP addresses. By default, the FrontendAddr
4906 will be identical in value to FrontendName.
4907
4908
4909 Port The port number that the Slurm compute node daemon, slurmd, lis‐
4910 tens to for work on this particular frontend node. By default
4911 there is a single port number for all slurmd daemons on all
4912 frontend nodes as defined by the SlurmdPort configuration param‐
4913 eter. Use of this option is not generally recommended except for
4914 development or testing purposes.
4915
4916 Note: On Cray systems, Realm-Specific IP Addressing (RSIP) will
4917 automatically try to interact with anything opened on ports
4918 8192-60000. Configure Port to use a port outside of the config‐
4919 ured SrunPortRange and RSIP's port range.
4920
4921
4922 Reason Identifies the reason for a frontend node being in state DOWN,
4923 DRAINED, DRAINING, FAIL or FAILING. Use quotes to enclose a
4924 reason having more than one word.
4925
4926
4927 State State of the frontend node with respect to the initiation of
4928 user jobs. Acceptable values are DOWN, DRAIN, FAIL, FAILING and
4929 UNKNOWN. Node states of BUSY and IDLE should not be specified
4930 in the node configuration, but set the node state to UNKNOWN in‐
4931 stead. Setting the node state to UNKNOWN will result in the
4932 node state being set to BUSY, IDLE or other appropriate state
4933 based upon recovered system state information. For more infor‐
4934 mation about these states see the descriptions under State in
4935 the NodeName= section above. The default value is UNKNOWN.
4936
4937
4938 As an example, you can do something similar to the following to define
4939 four front end nodes for running slurmd daemons.
4940 FrontendName=frontend[00-03] FrontendAddr=efrontend[00-03] State=UNKNOWN
4941
4942
4943NODESET CONFIGURATION
4944 The nodeset configuration allows you to define a name for a specific
4945 set of nodes which can be used to simplify the partition configuration
4946 section, especially for heterogeneous or condo-style systems. Each node‐
4947 set may be defined by an explicit list of nodes, and/or by filtering
4948 the nodes by a particular configured feature. If both Feature= and
4949 Nodes= are used the nodeset shall be the union of the two subsets.
4950 Note that the nodesets are only used to simplify the partition defini‐
4951 tions at present, and are not usable outside of the partition configu‐
4952 ration.
4953
4954 Feature
4955 All nodes with this single feature will be included as part of
4956 this nodeset.
4957
4958 Nodes List of nodes in this set.
4959
4960 NodeSet
4961 Unique name for a set of nodes. Must not overlap with any Node‐
4962 Name definitions.
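
For example (hypothetical names and feature), a nodeset can combine a feature filter with an explicit node list:

NodeSet=gpunodes Feature=gpu Nodes=node[17-20]

A partition definition can then use "Nodes=gpunodes" instead of repeating the node list.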
4963
4964
4966 The partition configuration permits you to establish different job lim‐
4967 its or access controls for various groups (or partitions) of nodes.
4968 Nodes may be in more than one partition, making partitions serve as
4969 general purpose queues. For example one may put the same set of nodes
4970 into two different partitions, each with different constraints (time
4971 limit, job sizes, groups allowed to use the partition, etc.). Jobs are
4972 allocated resources within a single partition. Default values can be
4973 specified with a record in which PartitionName is "DEFAULT". The de‐
4974 fault entry values will apply only to lines following it in the config‐
4975 uration file and the default values can be reset multiple times in the
4976 configuration file with multiple entries where "PartitionName=DEFAULT".
4977 The "PartitionName=" specification must be placed on every line de‐
4978 scribing the configuration of partitions. Each line where Partition‐
4979 Name is "DEFAULT" will replace or add to previous default values and
4980 not reinitialize the default values. A single partition name cannot
4981 appear as a PartitionName value in more than one line (duplicate parti‐
4982 tion name records will be ignored). If a partition that is in use is
4983 deleted from the configuration and slurm is restarted or reconfigured
4984 (scontrol reconfigure), jobs using the partition are canceled. NOTE:
4985 Put all parameters for each partition on a single line. Each line of
4986 partition configuration information should represent a different parti‐
4987 tion. The partition configuration file contains the following informa‐
4988 tion:
4989
4990
4991 AllocNodes
4992 Comma-separated list of nodes from which users can submit jobs
4993 in the partition. Node names may be specified using the node
4994 range expression syntax described above. The default value is
4995 "ALL".
4996
4997
4998 AllowAccounts
4999 Comma-separated list of accounts which may execute jobs in the
5000 partition. The default value is "ALL". NOTE: If AllowAccounts
5001 is used then DenyAccounts will not be enforced. Also refer to
5002 DenyAccounts.
5003
5004
5005 AllowGroups
5006 Comma-separated list of group names which may execute jobs in
5007 this partition. A user will be permitted to submit a job to
5008 this partition if AllowGroups has at least one group associated
5009 with the user. Jobs executed as user root or as user SlurmUser
5010 will be allowed to use any partition, regardless of the value of
5011 AllowGroups. In addition, a Slurm Admin or Operator will be able
5012 to view any partition, regardless of the value of AllowGroups.
5013 If user root attempts to execute a job as another user (e.g. us‐
5014 ing srun's --uid option), then the job will be subject to Allow‐
5015 Groups as if it were submitted by that user. By default, Allow‐
5016 Groups is unset, meaning all groups are allowed to use this par‐
5017 tition. The special value 'ALL' is equivalent to this. Users
5018 who are not members of the specified group will not see informa‐
5019 tion about this partition by default. However, this should not
5020 be treated as a security mechanism, since job information will
5021 be returned if a user requests details about the partition or a
5022 specific job. See the PrivateData parameter to restrict access
5023 to job information. NOTE: For performance reasons, Slurm main‐
5024 tains a list of user IDs allowed to use each partition and this
5025 is checked at job submission time. This list of user IDs is up‐
5026 dated when the slurmctld daemon is restarted, reconfigured (e.g.
5027 "scontrol reconfig") or the partition's AllowGroups value is re‐
5028 set, even if its value is unchanged (e.g. "scontrol update Parti‐
5029 tionName=name AllowGroups=group"). For a user's access to a
5030 partition to change, the user's group membership must change and
5031 Slurm's internal user ID list must be updated using one of the
5032 methods described above.
5033
5034
5035 AllowQos
5036 Comma-separated list of Qos which may execute jobs in the parti‐
5037 tion. Jobs executed as user root can use any partition without
5038 regard to the value of AllowQos. The default value is "ALL".
5039 NOTE: If AllowQos is used then DenyQos will not be enforced.
5040 Also refer to DenyQos.
5041
5042
5043 Alternate
5044 Partition name of alternate partition to be used if the state of
5045 this partition is "DRAIN" or "INACTIVE."
5046
5047
5048 CpuBind
5049 If a job step request does not specify an option to control how
5050 tasks are bound to allocated CPUs (--cpu-bind) and all nodes al‐
5051 located to the job do not have the same node CpuBind option,
5052 then the partition's CpuBind option will control how tasks are
5053 bound to allocated resources. Supported values for CpuBind are
5054 "none", "board", "socket", "ldom" (NUMA), "core" and "thread".
5055
5056
5057 Default
5058 If this keyword is set, jobs submitted without a partition spec‐
5059 ification will utilize this partition. Possible values are
5060 "YES" and "NO". The default value is "NO".
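
For example (hypothetical names), jobs submitted without a partition specification would run in "batch":

PartitionName=batch Nodes=node[01-64] Default=YES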
5061
5062
5063 DefaultTime
5064 Run time limit used for jobs that don't specify a value. If not
5065 set then MaxTime will be used. Format is the same as for Max‐
5066 Time.
5067
5068
5069 DefCpuPerGPU
5070 Default count of CPUs allocated per allocated GPU. This value is
5071 used only if the job specifies neither --cpus-per-task nor
5072 --cpus-per-gpu.
5073
5074
5075 DefMemPerCPU
5076 Default real memory size available per allocated CPU in
5077 megabytes. Used to avoid over-subscribing memory and causing
5078 paging. DefMemPerCPU would generally be used if individual pro‐
5079 cessors are allocated to jobs (SelectType=select/cons_res or Se‐
5080 lectType=select/cons_tres). If not set, the DefMemPerCPU value
5081 for the entire cluster will be used. Also see DefMemPerGPU,
5082 DefMemPerNode and MaxMemPerCPU. DefMemPerCPU, DefMemPerGPU and
5083 DefMemPerNode are mutually exclusive.
5084
5085
5086 DefMemPerGPU
5087 Default real memory size available per allocated GPU in
5088 megabytes. Also see DefMemPerCPU, DefMemPerNode and MaxMemPer‐
5089 CPU. DefMemPerCPU, DefMemPerGPU and DefMemPerNode are mutually
5090 exclusive.
5091
5092
5093 DefMemPerNode
5094 Default real memory size available per allocated node in
5095 megabytes. Used to avoid over-subscribing memory and causing
5096 paging. DefMemPerNode would generally be used if whole nodes
5097 are allocated to jobs (SelectType=select/linear) and resources
5098 are over-subscribed (OverSubscribe=yes or OverSubscribe=force).
5099 If not set, the DefMemPerNode value for the entire cluster will
5100 be used. Also see DefMemPerCPU, DefMemPerGPU and MaxMemPerCPU.
5101 DefMemPerCPU, DefMemPerGPU and DefMemPerNode are mutually exclu‐
5102 sive.
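
As a sketch (hypothetical values), a per-CPU memory default for one partition can be set as follows:

PartitionName=batch Nodes=node[01-64] DefMemPerCPU=2048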
5103
5104
5105 DenyAccounts
5106 Comma-separated list of accounts which may not execute jobs in
5107 the partition. By default, no accounts are denied access. NOTE:
5108 If AllowAccounts is used then DenyAccounts will not be enforced.
5109 Also refer to AllowAccounts.
5110
5111
5112 DenyQos
5113 Comma-separated list of Qos which may not execute jobs in the
5114 partition. By default, no QOS are denied access. NOTE: If Al‐
5115 lowQos is used then DenyQos will not be enforced. Also refer to
5116 AllowQos.
5117
5118
5119 DisableRootJobs
5120 If set to "YES" then user root will be prevented from running
5121 any jobs on this partition. The default value will be the value
5122 of DisableRootJobs set outside of a partition specification
5123 (which is "NO", allowing user root to execute jobs).
5124
5125
5126 ExclusiveUser
5127 If set to "YES" then nodes will be exclusively allocated to
5128 users. Multiple jobs may be run for the same user, but only one
5129 user can be active at a time. This capability is also available
5130 on a per-job basis by using the --exclusive=user option.
5131
5132
5133 GraceTime
5134 Specifies, in units of seconds, the preemption grace time to be
5135 extended to a job which has been selected for preemption. The
5136 default value is zero, no preemption grace time is allowed on
5137 this partition. Once a job has been selected for preemption,
5138 its end time is set to the current time plus GraceTime. The
5139 job's tasks are immediately sent SIGCONT and SIGTERM signals in
5140 order to provide notification of its imminent termination. This
5141 is followed by the SIGCONT, SIGTERM and SIGKILL signal sequence
5142 upon reaching its new end time. This second set of signals is
5143 sent to both the tasks and the containing batch script, if ap‐
5144 plicable. See also the global KillWait configuration parameter.
5145
5146
5147 Hidden Specifies if the partition and its jobs are to be hidden by de‐
5148 fault. Hidden partitions will by default not be reported by the
5149 Slurm APIs or commands. Possible values are "YES" and "NO".
5150 The default value is "NO". Note that partitions that a user
5151 lacks access to by virtue of the AllowGroups parameter will also
5152 be hidden by default.
5153
5154
5155 LLN Schedule resources to jobs on the least loaded nodes (based upon
5156 the number of idle CPUs). This is generally only recommended for
5157 an environment with serial jobs as idle resources will tend to
5158 be highly fragmented, resulting in parallel jobs being distrib‐
5159 uted across many nodes. Note that node Weight takes precedence
5160 over how many idle resources are on each node. Also see the Se‐
5161 lectParameters configuration parameter CR_LLN to use the least
5162 loaded nodes in every partition.
5163
5164
5165 MaxCPUsPerNode
5166 Maximum number of CPUs on any node available to all jobs from
5167 this partition. This can be especially useful to schedule GPUs.
5168 For example a node can be associated with two Slurm partitions
5169 (e.g. "cpu" and "gpu") and the partition/queue "cpu" could be
5170 limited to only a subset of the node's CPUs, ensuring that one
5171 or more CPUs would be available to jobs in the "gpu" parti‐
5172 tion/queue.
5173
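     A minimal sketch of the GPU-scheduling case described above (node and
     partition names are hypothetical):

```
# slurm.conf fragment: 16-CPU nodes shared by two partitions; "cpu" jobs
# may use at most 12 CPUs per node, leaving 4 CPUs for "gpu" jobs.
NodeName=node[01-04] CPUs=16 Gres=gpu:2
PartitionName=cpu Nodes=node[01-04] MaxCPUsPerNode=12
PartitionName=gpu Nodes=node[01-04]
```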
5174
5175 MaxMemPerCPU
5176 Maximum real memory size available per allocated CPU in
5177 megabytes. Used to avoid over-subscribing memory and causing
5178 paging. MaxMemPerCPU would generally be used if individual pro‐
5179 cessors are allocated to jobs (SelectType=select/cons_res or Se‐
5180 lectType=select/cons_tres). If not set, the MaxMemPerCPU value
5181 for the entire cluster will be used. Also see DefMemPerCPU and
5182 MaxMemPerNode. MaxMemPerCPU and MaxMemPerNode are mutually ex‐
5183 clusive.
5184
5185
5186 MaxMemPerNode
5187 Maximum real memory size available per allocated node in
5188 megabytes. Used to avoid over-subscribing memory and causing
5189 paging. MaxMemPerNode would generally be used if whole nodes
5190 are allocated to jobs (SelectType=select/linear) and resources
5191 are over-subscribed (OverSubscribe=yes or OverSubscribe=force).
5192 If not set, the MaxMemPerNode value for the entire cluster will
5193 be used. Also see DefMemPerNode and MaxMemPerCPU. MaxMemPerCPU
5194 and MaxMemPerNode are mutually exclusive.
5195
5196
5197 MaxNodes
5198 Maximum count of nodes which may be allocated to any single job.
5199 The default value is "UNLIMITED", which is represented inter‐
5200 nally as -1.
5201
5202
5203 MaxTime
5204 Maximum run time limit for jobs. Format is minutes, min‐
5205 utes:seconds, hours:minutes:seconds, days-hours, days-hours:min‐
5206 utes, days-hours:minutes:seconds or "UNLIMITED". Time resolu‐
5207 tion is one minute and second values are rounded up to the next
5208 minute. The job TimeLimit may be updated by root, SlurmUser or
5209 an Operator to a value higher than the configured MaxTime after
5210 job submission.
5211
5212
5213 MinNodes
5214 Minimum count of nodes which may be allocated to any single job.
5215 The default value is 0.
5216
5217
5218 Nodes Comma-separated list of nodes or nodesets which are associated
5219 with this partition. Node names may be specified using the node
5220 range expression syntax described above. A blank list of nodes
5221 (i.e. "Nodes= ") can be used if one wants a partition to exist,
5222 but have no resources (possibly on a temporary basis). A value
5223 of "ALL" is mapped to all nodes configured in the cluster.
5224
5225
5226 OverSubscribe
5227 Controls the ability of the partition to execute more than one
5228 job at a time on each resource (node, socket or core depending
5229 upon the value of SelectTypeParameters). If resources are to be
5230 over-subscribed, avoiding memory over-subscription is very im‐
5231 portant. SelectTypeParameters should be configured to treat
5232 memory as a consumable resource and the --mem option should be
5233 used for job allocations. Sharing of resources is typically
5234 useful only when using gang scheduling (PreemptMode=sus‐
5235 pend,gang). Possible values for OverSubscribe are "EXCLUSIVE",
5236 "FORCE", "YES", and "NO". Note that a value of "YES" or "FORCE"
5237 can negatively impact performance for systems with many thou‐
5238 sands of running jobs. The default value is "NO". For more in‐
5239 formation see the following web pages:
5240 https://slurm.schedmd.com/cons_res.html
5241 https://slurm.schedmd.com/cons_res_share.html
5242 https://slurm.schedmd.com/gang_scheduling.html
5243 https://slurm.schedmd.com/preempt.html
5244
5245
5246 EXCLUSIVE Allocates entire nodes to jobs even with Select‐
5247 Type=select/cons_res or SelectType=select/cons_tres
5248 configured. Jobs that run in partitions with Over‐
5249 Subscribe=EXCLUSIVE will have exclusive access to
5250 all allocated nodes. These jobs are allocated all
5251 CPUs and GRES on the nodes, but they are only allo‐
5252 cated as much memory as they ask for. This is by de‐
5253 sign to support gang scheduling, because suspended
5254 jobs still reside in memory. To request all the mem‐
5255 ory on a node, use --mem=0 at submit time.
5256
5257 FORCE Makes all resources (except GRES) in the partition
5258 available for oversubscription without any means for
5259 users to disable it. May be followed with a colon
5260 and maximum number of jobs in running or suspended
5261 state. For example OverSubscribe=FORCE:4 enables
5262 each node, socket or core to oversubscribe each re‐
5263 source four ways. Recommended only for systems us‐
5264 ing PreemptMode=suspend,gang.
5265
5266 NOTE: OverSubscribe=FORCE:1 is a special case that
5267 is not exactly equivalent to OverSubscribe=NO. Over‐
5268 Subscribe=FORCE:1 disables the regular oversubscrip‐
5269 tion of resources in the same partition but it will
5270 still allow oversubscription due to preemption. Set‐
5271 ting OverSubscribe=NO will prevent oversubscription
5272 from happening due to preemption as well.
5273
5274 NOTE: If using PreemptType=preempt/qos you can spec‐
5275 ify a value for FORCE that is greater than 1. For
5276 example, OverSubscribe=FORCE:2 will permit two jobs
5277 per resource normally, but a third job can be
5278 started only if done so through preemption based
5279 upon QOS.
5280
5281 NOTE: If OverSubscribe is configured to FORCE or YES
5282 in your slurm.conf and the system is not configured
5283 to use preemption (PreemptMode=OFF) accounting can
5284 easily grow to values greater than the actual uti‐
5285 lization. It may be common on such systems to get
5286 error messages in the slurmdbd log stating: "We have
5287 more allocated time than is possible."
5288
5289
5290 YES Makes all resources (except GRES) in the partition
5291 available for sharing upon request by the job. Re‐
5292 sources will only be over-subscribed when explicitly
5293 requested by the user using the "--oversubscribe"
5294 option on job submission. May be followed with a
5295 colon and maximum number of jobs in running or sus‐
5296 pended state. For example "OverSubscribe=YES:4" en‐
5297 ables each node, socket or core to execute up to
5298 four jobs at once. Recommended only for systems
5299 running with gang scheduling (PreemptMode=sus‐
5300 pend,gang).
5301
5302 NO Selected resources are allocated to a single job. No
5303 resource will be allocated to more than one job.
5304
5305 NOTE: Even if you are using PreemptMode=sus‐
5306 pend,gang, setting OverSubscribe=NO will disable
5307 preemption on that partition. Use OverSub‐
5308 scribe=FORCE:1 if you want to disable normal over‐
5309 subscription but still allow suspension due to pre‐
5310 emption.
5311
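     The OverSubscribe variants above can be sketched as follows (node and
     partition names are hypothetical; gang scheduling is assumed):

```
# slurm.conf fragment
PreemptMode=SUSPEND,GANG
# Two-way time-slicing of every allocated resource:
PartitionName=shared Nodes=dev[0-8]  OverSubscribe=FORCE:2
# No normal oversubscription, but suspension via preemption still allowed:
PartitionName=low    Nodes=dev[9-17] OverSubscribe=FORCE:1
```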
5312
5313 OverTimeLimit
5314 Number of minutes by which a job can exceed its time limit be‐
5315 fore being canceled. Normally a job's time limit is treated as
5316 a hard limit and the job will be killed upon reaching that
5317 limit. Configuring OverTimeLimit will result in the job's time
5318 limit being treated like a soft limit. Adding the OverTimeLimit
5319 value to the soft time limit provides a hard time limit, at
     which point the job is canceled.  This is particularly useful for
     backfill scheduling, which is based upon each job's soft time
     limit.  If not set, the OverTimeLimit value for the entire
     cluster will be used.  May not exceed 65533 minutes.  A value of
5324 "UNLIMITED" is also supported.
5325
5326
5327 PartitionName
5328 Name by which the partition may be referenced (e.g. "Interac‐
5329 tive"). This name can be specified by users when submitting
5330 jobs. If the PartitionName is "DEFAULT", the values specified
5331 with that record will apply to subsequent partition specifica‐
5332 tions unless explicitly set to other values in that partition
5333 record or replaced with a different set of default values. Each
     line where PartitionName is "DEFAULT" will replace or add to
     previous default values and not reinitialize the default
     values.
5337
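     The cumulative behavior of "DEFAULT" records can be sketched as
     (partition names are hypothetical):

```
# slurm.conf fragment: each DEFAULT record updates the running defaults
PartitionName=DEFAULT MaxTime=60 State=UP
PartitionName=short               # inherits MaxTime=60, State=UP
PartitionName=DEFAULT MaxTime=720 # changes MaxTime only; State=UP remains
PartitionName=long                # inherits MaxTime=720, State=UP
```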
5338
5339 PreemptMode
5340 Mechanism used to preempt jobs or enable gang scheduling for
5341 this partition when PreemptType=preempt/partition_prio is con‐
5342 figured. This partition-specific PreemptMode configuration pa‐
5343 rameter will override the cluster-wide PreemptMode for this par‐
5344 tition. It can be set to OFF to disable preemption and gang
5345 scheduling for this partition. See also PriorityTier and the
5346 above description of the cluster-wide PreemptMode parameter for
5347 further details.
5348
5349
5350 PriorityJobFactor
5351 Partition factor used by priority/multifactor plugin in calcu‐
5352 lating job priority. The value may not exceed 65533. Also see
5353 PriorityTier.
5354
5355
5356 PriorityTier
5357 Jobs submitted to a partition with a higher PriorityTier value
5358 will be evaluated by the scheduler before pending jobs in a par‐
5359 tition with a lower PriorityTier value. They will also be con‐
5360 sidered for preemption of running jobs in partition(s) with
5361 lower PriorityTier values if PreemptType=preempt/partition_prio.
5362 The value may not exceed 65533. Also see PriorityJobFactor.
5363
5364
5365 QOS Used to extend the limits available to a QOS on a partition.
5366 Jobs will not be associated to this QOS outside of being associ‐
5367 ated to the partition. They will still be associated to their
5368 requested QOS. By default, no QOS is used. NOTE: If a limit is
5369 set in both the Partition's QOS and the Job's QOS the Partition
5370 QOS will be honored unless the Job's QOS has the OverPartQOS
     flag set, in which case the Job's QOS will have priority.
5372
5373
5374 ReqResv
5375 Specifies users of this partition are required to designate a
5376 reservation when submitting a job. This option can be useful in
5377 restricting usage of a partition that may have higher priority
5378 or additional resources to be allowed only within a reservation.
5379 Possible values are "YES" and "NO". The default value is "NO".
5380
5381
5382 ResumeTimeout
5383 Maximum time permitted (in seconds) between when a node resume
5384 request is issued and when the node is actually available for
5385 use. Nodes which fail to respond in this time frame will be
5386 marked DOWN and the jobs scheduled on the node requeued. Nodes
5387 which reboot after this time frame will be marked DOWN with a
5388 reason of "Node unexpectedly rebooted." For nodes that are in
5389 multiple partitions with this option set, the highest time will
5390 take effect. If not set on any partition, the node will use the
5391 ResumeTimeout value set for the entire cluster.
5392
5393
5394 RootOnly
5395 Specifies if only user ID zero (i.e. user root) may allocate re‐
5396 sources in this partition. User root may allocate resources for
5397 any other user, but the request must be initiated by user root.
5398 This option can be useful for a partition to be managed by some
5399 external entity (e.g. a higher-level job manager) and prevents
5400 users from directly using those resources. Possible values are
5401 "YES" and "NO". The default value is "NO".
5402
5403
5404 SelectTypeParameters
5405 Partition-specific resource allocation type. This option re‐
5406 places the global SelectTypeParameters value. Supported values
5407 are CR_Core, CR_Core_Memory, CR_Socket and CR_Socket_Memory.
5408 Use requires the system-wide SelectTypeParameters value be set
5409 to any of the four supported values previously listed; other‐
5410 wise, the partition-specific value will be ignored.
5411
5412
5413 Shared The Shared configuration parameter has been replaced by the
5414 OverSubscribe parameter described above.
5415
5416
5417 State State of partition or availability for use. Possible values are
5418 "UP", "DOWN", "DRAIN" and "INACTIVE". The default value is "UP".
5419 See also the related "Alternate" keyword.
5420
5421 UP Designates that new jobs may be queued on the parti‐
5422 tion, and that jobs may be allocated nodes and run
5423 from the partition.
5424
5425 DOWN Designates that new jobs may be queued on the parti‐
5426 tion, but queued jobs may not be allocated nodes and
5427 run from the partition. Jobs already running on the
5428 partition continue to run. The jobs must be explicitly
5429 canceled to force their termination.
5430
5431 DRAIN Designates that no new jobs may be queued on the par‐
5432 tition (job submission requests will be denied with an
5433 error message), but jobs already queued on the parti‐
5434 tion may be allocated nodes and run. See also the
5435 "Alternate" partition specification.
5436
5437 INACTIVE Designates that no new jobs may be queued on the par‐
5438 tition, and jobs already queued may not be allocated
5439 nodes and run. See also the "Alternate" partition
5440 specification.
5441
5442
5443 SuspendTime
5444 Nodes which remain idle or down for this number of seconds will
5445 be placed into power save mode by SuspendProgram. For efficient
5446 system utilization, it is recommended that the value of Suspend‐
5447 Time be at least as large as the sum of SuspendTimeout plus Re‐
5448 sumeTimeout. For nodes that are in multiple partitions with
5449 this option set, the highest time will take effect. If not set
5450 on any partition, the node will use the SuspendTime value set
5451 for the entire cluster. Setting SuspendTime to anything but
5452 "INFINITE" will enable power save mode.
5453
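     A power-save sketch consistent with the sizing advice above (program
     paths and names are site assumptions):

```
# slurm.conf fragment
SuspendProgram=/usr/local/sbin/node_power_off   # assumed site script
ResumeProgram=/usr/local/sbin/node_power_on     # assumed site script
SuspendTimeout=120
ResumeTimeout=600
# SuspendTime >= SuspendTimeout + ResumeTimeout (720), per the advice above:
PartitionName=cloud Nodes=cn[01-99] SuspendTime=900
```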
5454
5455 SuspendTimeout
5456 Maximum time permitted (in seconds) between when a node suspend
5457 request is issued and when the node is shutdown. At that time
5458 the node must be ready for a resume request to be issued as
5459 needed for new work. For nodes that are in multiple partitions
5460 with this option set, the highest time will take effect. If not
5461 set on any partition, the node will use the SuspendTimeout value
5462 set for the entire cluster.
5463
5464
5465 TRESBillingWeights
5466 TRESBillingWeights is used to define the billing weights of each
5467 TRES type that will be used in calculating the usage of a job.
5468 The calculated usage is used when calculating fairshare and when
5469 enforcing the TRES billing limit on jobs.
5470
5471 Billing weights are specified as a comma-separated list of <TRES
5472 Type>=<TRES Billing Weight> pairs.
5473
5474 Any TRES Type is available for billing. Note that the base unit
5475 for memory and burst buffers is megabytes.
5476
5477 By default the billing of TRES is calculated as the sum of all
5478 TRES types multiplied by their corresponding billing weight.
5479
5480 The weighted amount of a resource can be adjusted by adding a
5481 suffix of K,M,G,T or P after the billing weight. For example, a
5482 memory weight of "mem=.25" on a job allocated 8GB will be billed
5483 2048 (8192MB *.25) units. A memory weight of "mem=.25G" on the
5484 same job will be billed 2 (8192MB * (.25/1024)) units.
5485
5486 Negative values are allowed.
5487
5488 When a job is allocated 1 CPU and 8 GB of memory on a partition
5489 configured with TRESBilling‐
5490 Weights="CPU=1.0,Mem=0.25G,GRES/gpu=2.0", the billable TRES will
5491 be: (1*1.0) + (8*0.25) + (0*2.0) = 3.0.
5492
5493 If PriorityFlags=MAX_TRES is configured, the billable TRES is
5494 calculated as the MAX of individual TRES' on a node (e.g. cpus,
5495 mem, gres) plus the sum of all global TRES' (e.g. licenses). Us‐
5496 ing the same example above the billable TRES will be MAX(1*1.0,
5497 8*0.25) + (0*2.0) = 2.0.
5498
5499 If TRESBillingWeights is not defined then the job is billed
5500 against the total number of allocated CPUs.
5501
5502 NOTE: TRESBillingWeights doesn't affect job priority directly as
5503 it is currently not used for the size of the job. If you want
5504 TRES' to play a role in the job's priority then refer to the
5505 PriorityWeightTRES option.
5506
5507
5508
5510 There are a variety of prolog and epilog program options that execute
5511 with various permissions and at various times. The four options most
5512 likely to be used are: Prolog and Epilog (executed once on each compute
5513 node for each job) plus PrologSlurmctld and EpilogSlurmctld (executed
5514 once on the ControlMachine for each job).
5515
5516 NOTE: Standard output and error messages are normally not preserved.
5517 Explicitly write output and error messages to an appropriate location
5518 if you wish to preserve that information.
5519
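     Since these scripts' stdout and stderr are normally discarded, an
     Epilog can write its own record.  A minimal sketch, assuming a
     site-chosen log directory (the demo default under /tmp is only for
     illustration; a real site would use a path such as /var/log/slurm):

```shell
#!/bin/sh
# Hypothetical Epilog sketch: append a one-line summary per job, since
# the script's own output is not preserved by Slurm.
LOGDIR="${LOGDIR:-/tmp/slurm-epilog-demo}"   # assumed site location
mkdir -p "$LOGDIR"
printf '%s job=%s user=%s node=%s\n' \
    "$(date '+%Y-%m-%d %H:%M:%S')" \
    "${SLURM_JOB_ID:-unknown}" \
    "${SLURM_JOB_USER:-unknown}" \
    "${SLURMD_NODENAME:-unknown}" >> "$LOGDIR/epilog.log"
# Exit status matters: a non-zero exit would DRAIN the node.
```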
5520 NOTE: By default the Prolog script is ONLY run on any individual node
5521 when it first sees a job step from a new allocation. It does not run
5522 the Prolog immediately when an allocation is granted. If no job steps
5523 from an allocation are run on a node, it will never run the Prolog for
     that allocation.  This Prolog behavior can be changed by the
     PrologFlags parameter.  The Epilog, on the other hand, always runs
     on every node of an allocation when the allocation is released.
5527
5528 If the Epilog fails (returns a non-zero exit code), this will result in
5529 the node being set to a DRAIN state. If the EpilogSlurmctld fails (re‐
5530 turns a non-zero exit code), this will only be logged. If the Prolog
5531 fails (returns a non-zero exit code), this will result in the node be‐
5532 ing set to a DRAIN state and the job being requeued in a held state un‐
5533 less nohold_on_prolog_fail is configured in SchedulerParameters. If
5534 the PrologSlurmctld fails (returns a non-zero exit code), this will re‐
5535 sult in the job being requeued to be executed on another node if possi‐
5536 ble. Only batch jobs can be requeued. Interactive jobs (salloc and
     srun) will be cancelled if the PrologSlurmctld fails.  If slurmctld is
5538 stopped while either PrologSlurmctld or EpilogSlurmctld is running, the
5539 script will be killed with SIGKILL. The script will restart when slurm‐
5540 ctld restarts.
5541
5542
5543 Information about the job is passed to the script using environment
5544 variables. Unless otherwise specified, these environment variables are
5545 available in each of the scripts mentioned above (Prolog, Epilog, Pro‐
5546 logSlurmctld and EpilogSlurmctld). For a full list of environment vari‐
5547 ables that includes those available in the SrunProlog, SrunEpilog,
5548 TaskProlog and TaskEpilog please see the Prolog and Epilog Guide
5549 <https://slurm.schedmd.com/prolog_epilog.html>.
5550
5551 SLURM_ARRAY_JOB_ID
5552 If this job is part of a job array, this will be set to the job
5553 ID. Otherwise it will not be set. To reference this specific
     task of a job array, combine SLURM_ARRAY_JOB_ID with
     SLURM_ARRAY_TASK_ID (e.g. "scontrol update
     ${SLURM_ARRAY_JOB_ID}_${SLURM_ARRAY_TASK_ID} ...").  Available in
     PrologSlurmctld and EpilogSlurmctld.
5558
5559 SLURM_ARRAY_TASK_ID
5560 If this job is part of a job array, this will be set to the task
5561 ID. Otherwise it will not be set. To reference this specific
     task of a job array, combine SLURM_ARRAY_JOB_ID with
     SLURM_ARRAY_TASK_ID (e.g. "scontrol update
     ${SLURM_ARRAY_JOB_ID}_${SLURM_ARRAY_TASK_ID} ...").  Available in
     PrologSlurmctld and EpilogSlurmctld.
5566
5567 SLURM_ARRAY_TASK_MAX
5568 If this job is part of a job array, this will be set to the max‐
5569 imum task ID. Otherwise it will not be set. Available in Pro‐
5570 logSlurmctld and EpilogSlurmctld.
5571
5572 SLURM_ARRAY_TASK_MIN
5573 If this job is part of a job array, this will be set to the min‐
5574 imum task ID. Otherwise it will not be set. Available in Pro‐
5575 logSlurmctld and EpilogSlurmctld.
5576
5577 SLURM_ARRAY_TASK_STEP
5578 If this job is part of a job array, this will be set to the step
5579 size of task IDs. Otherwise it will not be set. Available in
5580 PrologSlurmctld and EpilogSlurmctld.
5581
5582 SLURM_CLUSTER_NAME
5583 Name of the cluster executing the job.
5584
5585 SLURM_CONF
5586 Location of the slurm.conf file. Available in Prolog and Epilog.
5587
5588 SLURMD_NODENAME
5589 Name of the node running the task. In the case of a parallel job
5590 executing on multiple compute nodes, the various tasks will have
5591 this environment variable set to different values on each com‐
5592 pute node. Available in Prolog and Epilog.
5593
5594 SLURM_JOB_ACCOUNT
5595 Account name used for the job. Available in PrologSlurmctld and
5596 EpilogSlurmctld.
5597
5598 SLURM_JOB_CONSTRAINTS
5599 Features required to run the job. Available in Prolog, Pro‐
5600 logSlurmctld and EpilogSlurmctld.
5601
5602 SLURM_JOB_DERIVED_EC
5603 The highest exit code of all of the job steps. Available in
5604 EpilogSlurmctld.
5605
5606 SLURM_JOB_EXIT_CODE
5607 The exit code of the job script (or salloc). The value is the
     status as returned by the wait() system call (see wait(2)).
     Available in EpilogSlurmctld.
5610
5611 SLURM_JOB_EXIT_CODE2
5612 The exit code of the job script (or salloc). The value has the
     format <exit>:<sig>.  The first number is the exit code, typically
     as set by the exit() function.  The second number is the signal
     that caused the process to terminate, if it was terminated by a
     signal.  Available in EpilogSlurmctld.
5617
5618 SLURM_JOB_GID
5619 Group ID of the job's owner.
5620
5621 SLURM_JOB_GPUS
5622 The GPU IDs of GPUs in the job allocation (if any). Available
5623 in the Prolog and Epilog.
5624
5625 SLURM_JOB_GROUP
5626 Group name of the job's owner. Available in PrologSlurmctld and
5627 EpilogSlurmctld.
5628
5629 SLURM_JOB_ID
5630 Job ID.
5631
5632 SLURM_JOBID
5633 Job ID.
5634
5635 SLURM_JOB_NAME
5636 Name of the job. Available in PrologSlurmctld and EpilogSlurm‐
5637 ctld.
5638
5639 SLURM_JOB_NODELIST
5640 Nodes assigned to job. A Slurm hostlist expression. "scontrol
5641 show hostnames" can be used to convert this to a list of indi‐
5642 vidual host names. Available in PrologSlurmctld and Epi‐
5643 logSlurmctld.
5644
5645 SLURM_JOB_PARTITION
5646 Partition that job runs in. Available in Prolog, PrologSlurm‐
5647 ctld and EpilogSlurmctld.
5648
5649 SLURM_JOB_UID
5650 User ID of the job's owner.
5651
5652 SLURM_JOB_USER
5653 User name of the job's owner.
5654
5655 SLURM_SCRIPT_CONTEXT
5656 Identifies which epilog or prolog program is currently running.
5657
5658
5660 This program can be used to take special actions to clean up the unkil‐
5661 lable processes and/or notify system administrators. The program will
5662 be run as SlurmdUser (usually "root") on the compute node where Unkill‐
5663 ableStepTimeout was triggered.
5664
5665 Information about the unkillable job step is passed to the script using
5666 environment variables.
5667
5668 SLURM_JOB_ID
5669 Job ID.
5670
5671 SLURM_STEP_ID
5672 Job Step ID.
5673
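     A minimal UnkillableStepProgram sketch using only the two variables
     above (the log path and any notification mechanism are assumptions,
     not part of Slurm):

```shell
#!/bin/sh
# Hypothetical UnkillableStepProgram sketch: record the stuck step so an
# administrator can investigate.  Runs as SlurmdUser on the affected node.
LOG="${UNKILLABLE_LOG:-/tmp/unkillable-demo.log}"   # assumed path
printf 'unkillable step %s.%s on %s\n' \
    "${SLURM_JOB_ID:-?}" "${SLURM_STEP_ID:-?}" "$(hostname)" >> "$LOG"
# A real site might also page or mail an administrator here.
```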
5674
5676 Slurm is able to optimize job allocations to minimize network con‐
     tention.  Special Slurm logic is used to optimize allocations on
     systems with a three-dimensional interconnect, and information about
     configuring those systems is available here:
     <https://slurm.schedmd.com/>.  For a hierarchical network, Slurm needs
5681 to have detailed information about how nodes are configured on the net‐
5682 work switches.
5683
5684 Given network topology information, Slurm allocates all of a job's re‐
5685 sources onto a single leaf of the network (if possible) using a
5686 best-fit algorithm. Otherwise it will allocate a job's resources onto
5687 multiple leaf switches so as to minimize the use of higher-level
5688 switches. The TopologyPlugin parameter controls which plugin is used
5689 to collect network topology information. The only values presently
5690 supported are "topology/3d_torus" (default for Cray XT/XE systems, per‐
5691 forms best-fit logic over three-dimensional topology), "topology/none"
5692 (default for other systems, best-fit logic over one-dimensional topol‐
5693 ogy), "topology/tree" (determine the network topology based upon infor‐
5694 mation contained in a topology.conf file, see "man topology.conf" for
5695 more information). Future plugins may gather topology information di‐
5696 rectly from the network. The topology information is optional. If not
5697 provided, Slurm will perform a best-fit algorithm assuming the nodes
5698 are in a one-dimensional array as configured and the communications
5699 cost is related to the node distance in this array.
5700
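     For topology/tree, the switch hierarchy is described in topology.conf.
     A minimal two-level sketch (switch and node names are hypothetical;
     see "man topology.conf" for the full syntax):

```
# topology.conf fragment: two leaf switches under one root switch
SwitchName=leaf1 Nodes=dev[0-12]
SwitchName=leaf2 Nodes=dev[13-25]
SwitchName=root  Switches=leaf[1-2]
```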
5701
5703 If the cluster's computers used for the primary or backup controller
5704 will be out of service for an extended period of time, it may be desir‐
5705 able to relocate them. In order to do so, follow this procedure:
5706
5707 1. Stop the Slurm daemons
5708 2. Modify the slurm.conf file appropriately
5709 3. Distribute the updated slurm.conf file to all nodes
5710 4. Restart the Slurm daemons
5711
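     Step 2 typically amounts to editing the SlurmctldHost lines; for
     example (host names and addresses are hypothetical):

```
# slurm.conf change for step 2
# old:  SlurmctldHost=oldctl(10.0.0.1)
SlurmctldHost=newctl(10.0.0.9)
```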
5712 There should be no loss of any running or pending jobs. Ensure that
5713 any nodes added to the cluster have the current slurm.conf file in‐
5714 stalled.
5715
5716 CAUTION: If two nodes are simultaneously configured as the primary con‐
5717 troller (two nodes on which SlurmctldHost specify the local host and
5718 the slurmctld daemon is executing on each), system behavior will be de‐
5719 structive. If a compute node has an incorrect SlurmctldHost parameter,
5720 that node may be rendered unusable, but no other harm will result.
5721
5722
5724 #
5725 # Sample /etc/slurm.conf for dev[0-25].llnl.gov
5726 # Author: John Doe
5727 # Date: 11/06/2001
5728 #
5729 SlurmctldHost=dev0(12.34.56.78) # Primary server
5730 SlurmctldHost=dev1(12.34.56.79) # Backup server
5731 #
5732 AuthType=auth/munge
5733 Epilog=/usr/local/slurm/epilog
5734 Prolog=/usr/local/slurm/prolog
5735 FirstJobId=65536
5736 InactiveLimit=120
5737 JobCompType=jobcomp/filetxt
5738 JobCompLoc=/var/log/slurm/jobcomp
5739 KillWait=30
5740 MaxJobCount=10000
5741 MinJobAge=3600
5742 PluginDir=/usr/local/lib:/usr/local/slurm/lib
5743 ReturnToService=0
5744 SchedulerType=sched/backfill
5745 SlurmctldLogFile=/var/log/slurm/slurmctld.log
5746 SlurmdLogFile=/var/log/slurm/slurmd.log
5747 SlurmctldPort=7002
5748 SlurmdPort=7003
5749 SlurmdSpoolDir=/var/spool/slurmd.spool
5750 StateSaveLocation=/var/spool/slurm.state
5751 SwitchType=switch/none
5752 TmpFS=/tmp
5753 WaitTime=30
5754 JobCredentialPrivateKey=/usr/local/slurm/private.key
5755 JobCredentialPublicCertificate=/usr/local/slurm/public.cert
5756 #
5757 # Node Configurations
5758 #
5759 NodeName=DEFAULT CPUs=2 RealMemory=2000 TmpDisk=64000
5760 NodeName=DEFAULT State=UNKNOWN
5761 NodeName=dev[0-25] NodeAddr=edev[0-25] Weight=16
5762 # Update records for specific DOWN nodes
5763 DownNodes=dev20 State=DOWN Reason="power,ETA=Dec25"
5764 #
5765 # Partition Configurations
5766 #
5767 PartitionName=DEFAULT MaxTime=30 MaxNodes=10 State=UP
5768 PartitionName=debug Nodes=dev[0-8,18-25] Default=YES
5769 PartitionName=batch Nodes=dev[9-17] MinNodes=4
5770 PartitionName=long Nodes=dev[9-17] MaxTime=120 AllowGroups=admin
5771
5772
     The "include" keyword can be used with modifiers within the specified
     pathname.  These modifiers are replaced with the cluster name or other
     information depending on which modifier is specified.  If the included
5777 file is not an absolute path name (i.e. it does not start with a
     slash), it will be searched for in the same directory as the slurm.conf
5779 file.
5780
5781 %c Cluster name specified in the slurm.conf will be used.
5782
5783 EXAMPLE
5784 ClusterName=linux
5785 include /home/slurm/etc/%c_config
5786 # Above line interpreted as
5787 # "include /home/slurm/etc/linux_config"
5788
5789
5791 There are three classes of files: Files used by slurmctld must be ac‐
5792 cessible by user SlurmUser and accessible by the primary and backup
5793 control machines. Files used by slurmd must be accessible by user root
5794 and accessible from every compute node. A few files need to be acces‐
5795 sible by normal users on all login and compute nodes. While many files
5796 and directories are listed below, most of them will not be used with
5797 most configurations.
5798
5799 Epilog Must be executable by user root. It is recommended that the
5800 file be readable by all users. The file must exist on every
5801 compute node.
5802
5803 EpilogSlurmctld
5804 Must be executable by user SlurmUser. It is recommended that
5805 the file be readable by all users. The file must be accessible
5806 by the primary and backup control machines.
5807
5808 HealthCheckProgram
5809 Must be executable by user root. It is recommended that the
5810 file be readable by all users. The file must exist on every
5811 compute node.
5812
5813 JobCompLoc
5814 If this specifies a file, it must be writable by user SlurmUser.
5815 The file must be accessible by the primary and backup control
5816 machines.
5817
5818 JobCredentialPrivateKey
5819 Must be readable only by user SlurmUser and writable by no other
5820 users. The file must be accessible by the primary and backup
5821 control machines.
5822
5823 JobCredentialPublicCertificate
5824 Readable to all users on all nodes. Must not be writable by
5825 regular users.
5826
5827 MailProg
5828 Must be executable by user SlurmUser. Must not be writable by
5829 regular users. The file must be accessible by the primary and
5830 backup control machines.
5831
5832 Prolog Must be executable by user root. It is recommended that the
5833 file be readable by all users. The file must exist on every
5834 compute node.
5835
5836 PrologSlurmctld
5837 Must be executable by user SlurmUser. It is recommended that
5838 the file be readable by all users. The file must be accessible
5839 by the primary and backup control machines.
5840
5841 ResumeProgram
5842 Must be executable by user SlurmUser. The file must be accessi‐
5843 ble by the primary and backup control machines.
5844
5845 slurm.conf
5846 Readable to all users on all nodes. Must not be writable by
5847 regular users.
5848
5849 SlurmctldLogFile
5850 Must be writable by user SlurmUser. The file must be accessible
5851 by the primary and backup control machines.
5852
5853 SlurmctldPidFile
5854 Must be writable by user root. Preferably writable and remov‐
5855 able by SlurmUser. The file must be accessible by the primary
5856 and backup control machines.
5857
5858 SlurmdLogFile
5859 Must be writable by user root. A distinct file must exist on
5860 each compute node.
5861
5862 SlurmdPidFile
5863 Must be writable by user root. A distinct file must exist on
5864 each compute node.
5865
5866 SlurmdSpoolDir
     Must be writable by user root.  A distinct directory must exist on
5868 each compute node.

       SrunEpilog
              Must be executable by all users.  The file must exist on every
              login and compute node.

       SrunProlog
              Must be executable by all users.  The file must exist on every
              login and compute node.

       StateSaveLocation
              Must be writable by user SlurmUser.  The file must be
              accessible by the primary and backup control machines.

       SuspendProgram
              Must be executable by user SlurmUser.  The file must be
              accessible by the primary and backup control machines.

       TaskEpilog
              Must be executable by all users.  The file must exist on every
              compute node.

       TaskProlog
              Must be executable by all users.  The file must exist on every
              compute node.

       UnkillableStepProgram
              Must be executable by user SlurmUser.  The file must be
              accessible by the primary and backup control machines.
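
       The "executable by all users" requirements above can be spot-checked
       mechanically.  The sketch below uses a scratch file as a stand-in for
       a script such as TaskProlog (the path in the comment is an assumption
       for illustration); the same test applies to the real paths named in
       your slurm.conf.

```shell
#!/bin/sh
# Sketch: check that a script is executable by all users, as required for
# TaskProlog/TaskEpilog/SrunProlog/SrunEpilog.  A scratch file stands in
# for the real script so the check runs without root.
set -e
demo=$(mktemp)        # stand-in for e.g. /etc/slurm/task_prolog (assumed path)
chmod 755 "$demo"     # rwxr-xr-x: executable by all users

mode=$(stat -c %a "$demo")     # octal mode bits, e.g. 755 (GNU stat)
world=$((0$mode & 01))         # world-execute bit from the last octal digit
if [ "$world" -eq 1 ]; then
    echo "executable by all users"
else
    echo "NOT executable by all users"
fi
```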


LOGGING
       Note that while Slurm daemons create log files and other files as
       needed, they treat the lack of parent directories as a fatal error.
       This prevents the daemons from running if critical file systems are
       not mounted and minimizes the risk of cold-starting (starting without
       preserving jobs).

       Log files and job accounting files may need to be created/owned by
       the "SlurmUser" uid to be successfully accessed.  Use the "chown" and
       "chmod" commands to set the ownership and permissions appropriately.
       See the section FILE AND DIRECTORY PERMISSIONS for information about
       the various files and directories used by Slurm.
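
       For example, the ownership and mode changes might look like the
       following sketch.  The paths and the "slurm" account name are
       assumptions (typical defaults), not values taken from this file; the
       commands operate on a scratch prefix so they are runnable without
       root, and the chown calls are shown as comments because they require
       root.

```shell
#!/bin/sh
# Sketch: apply the ownership/permission rules described above.
# PREFIX is a scratch directory; in production, operate on the real paths
# from slurm.conf and uncomment the chown lines (run as root).
set -e
PREFIX=$(mktemp -d)

# StateSaveLocation: writable by SlurmUser; keep it private to that account.
mkdir -p "$PREFIX/spool/slurmctld"
chmod 700 "$PREFIX/spool/slurmctld"
# chown slurm:slurm "$PREFIX/spool/slurmctld"   # "slurm" = assumed SlurmUser

# Directory holding SlurmctldLogFile / SlurmdLogFile.
mkdir -p "$PREFIX/log/slurm"
chmod 755 "$PREFIX/log/slurm"
# chown slurm:slurm "$PREFIX/log/slurm"

# slurm.conf itself: readable by all users, writable only by root.
touch "$PREFIX/slurm.conf"
chmod 644 "$PREFIX/slurm.conf"
```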

       It is recommended that the logrotate utility be used to ensure that
       various log files do not become too large.  This also applies to text
       files used for accounting, process tracking, and the slurmdbd log if
       they are used.

       Here is a sample logrotate configuration.  Make appropriate site
       modifications and save as /etc/logrotate.d/slurm on all nodes.  See
       the logrotate man page for more details.
5920
5921 ##
5922 # Slurm Logrotate Configuration
5923 ##
5924 /var/log/slurm/*.log {
5925 compress
5926 missingok
5927 nocopytruncate
5928 nodelaycompress
5929 nomail
5930 notifempty
5931 noolddir
5932 rotate 5
5933 sharedscripts
5934 size=5M
5935 create 640 slurm root
5936 postrotate
5937 pkill -x --signal SIGUSR2 slurmctld
5938 pkill -x --signal SIGUSR2 slurmd
5939 pkill -x --signal SIGUSR2 slurmdbd
5940 exit 0
5941 endscript
5942 }
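
       The postrotate script relies on the Slurm daemons reopening their log
       files when they receive SIGUSR2.  If logs are ever rotated by hand,
       outside of logrotate, the same signal can be sent directly; a minimal
       sketch:

```shell
#!/bin/sh
# Sketch: ask each Slurm daemon to close and reopen its log file -- the
# same mechanism the logrotate postrotate script above uses.  pkill exits
# non-zero when no matching process exists, which is harmless here.
for daemon in slurmctld slurmd slurmdbd; do
    if pkill -x --signal SIGUSR2 "$daemon"; then
        echo "signaled $daemon"
    else
        echo "$daemon not running"
    fi
done
```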

COPYING
       Copyright (C) 2002-2007 The Regents of the University of California.
       Produced at Lawrence Livermore National Laboratory (cf, DISCLAIMER).
       Copyright (C) 2008-2010 Lawrence Livermore National Security.
       Copyright (C) 2010-2021 SchedMD LLC.

       This file is part of Slurm, a resource management program.  For
       details, see <https://slurm.schedmd.com/>.

       Slurm is free software; you can redistribute it and/or modify it
       under the terms of the GNU General Public License as published by the
       Free Software Foundation; either version 2 of the License, or (at
       your option) any later version.

       Slurm is distributed in the hope that it will be useful, but WITHOUT
       ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
       FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
       for more details.


FILES
       /etc/slurm.conf


SEE ALSO
       cgroup.conf(5), getaddrinfo(3), getrlimit(2), gres.conf(5), group(5),
       hostname(1), scontrol(1), slurmctld(8), slurmd(8), slurmdbd(8),
       slurmdbd.conf(5), srun(1), spank(8), syslog(3), topology.conf(5)



November 2021              Slurm Configuration File              slurm.conf(5)