slurm.conf(5)              Slurm Configuration File              slurm.conf(5)


NAME

6       slurm.conf - Slurm configuration file
7
8

DESCRIPTION

10       slurm.conf is an ASCII file which describes general Slurm configuration
11       information, the nodes to be managed, information about how those nodes
12       are  grouped into partitions, and various scheduling parameters associ‐
13       ated with those partitions. This file should be consistent  across  all
14       nodes in the cluster.
15
16       The  file  location  can be modified at system build time using the DE‐
17       FAULT_SLURM_CONF  parameter  or  at  execution  time  by  setting   the
18       SLURM_CONF  environment  variable.  The Slurm daemons also allow you to
19       override both the built-in and environment-provided location using  the
20       "-f" option on the command line.
21
22       The  contents  of the file are case insensitive except for the names of
23       nodes and partitions. Any text following a  "#"  in  the  configuration
24       file  is treated as a comment through the end of that line.  Changes to
25       the configuration file take effect upon restart of Slurm daemons,  dae‐
26       mon receipt of the SIGHUP signal, or execution of the command "scontrol
27       reconfigure" unless otherwise noted.
28
29       If a line begins with the word "Include"  followed  by  whitespace  and
30       then  a  file  name, that file will be included inline with the current
31       configuration file. For large or complex systems,  multiple  configura‐
32       tion  files  may  prove easier to manage and enable reuse of some files
33       (See INCLUDE MODIFIERS for more details).
34
35       Note on file permissions:
36
37       The slurm.conf file must be readable by all users of Slurm, since it is
38       used  by  many  of the Slurm commands.  Other files that are defined in
39       the slurm.conf file, such as log files and job  accounting  files,  may
40       need to be created/owned by the user "SlurmUser" to be successfully ac‐
41       cessed.  Use the "chown" and "chmod" commands to set the ownership  and
42       permissions  appropriately.  See the section FILE AND DIRECTORY PERMIS‐
43       SIONS for information about the various files and directories  used  by
44       Slurm.
45
46

PARAMETERS

48       The overall configuration parameters available include:
49
50
51       AccountingStorageBackupHost
52              The  name  of  the backup machine hosting the accounting storage
53              database.  If used with the accounting_storage/slurmdbd  plugin,
54              this  is  where the backup slurmdbd would be running.  Only used
55              with systems using SlurmDBD, ignored otherwise.
56
57
58       AccountingStorageEnforce
59              This controls what level of association-based enforcement to im‐
60              pose  on  job submissions.  Valid options are any combination of
61              associations, limits, nojobs, nosteps, qos, safe, and wckeys, or
62              all for all things (except nojobs and nosteps, which must be re‐
63              quested as well).
64
65              If limits, qos, or wckeys are set, associations  will  automati‐
66              cally be set.
67
68              If wckeys is set, TrackWCKey will automatically be set.
69
70              If  safe  is  set, limits and associations will automatically be
71              set.
72
73              If nojobs is set, nosteps will automatically be set.
74
75              By setting associations, no new job is allowed to run  unless  a
76              corresponding  association  exists in the system.  If limits are
77              enforced, users can be limited by association  to  whatever  job
78              size or run time limits are defined.
79
80              If  nojobs  is set, Slurm will not account for any jobs or steps
81              on the system. Likewise, if nosteps is set, Slurm will  not  ac‐
82              count for any steps that have run.
83
              If safe is enforced, a job will only be launched against an
              association or qos that has a GrpTRESMins limit set if the job
              will be able to run to completion. Without this option set, jobs
              will be launched as long as their usage has not yet reached the
              cpu-minutes limit, which can lead to jobs being launched but
              then killed when the limit is reached.
90
91              With qos and/or wckeys enforced jobs will not be  scheduled  un‐
92              less  a valid qos and/or workload characterization key is speci‐
93              fied.
94
95              When AccountingStorageEnforce  is  changed,  a  restart  of  the
96              slurmctld daemon is required (not just a "scontrol reconfig").
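
              For example, a site wanting limit, QOS and safe enforcement
              (associations are then implied) might set the following; the
              combination is illustrative, not a recommendation:

                   AccountingStorageEnforce=limits,qos,safe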
97
98
99       AccountingStorageExternalHost
100              A     comma-separated     list     of     external     slurmdbds
101              (<host/ip>[:port][,...]) to register with. If no port is  given,
102              the AccountingStoragePort will be used.
103
104              This  allows  clusters  registered with the external slurmdbd to
105              communicate with each other using the --cluster/-M  client  com‐
106              mand options.
107
              The cluster will add itself to the external slurmdbd if it does
              not already exist there. If a non-external cluster already
              exists on the external slurmdbd, the slurmctld will not register
              with the external slurmdbd.
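
              As an illustration only (host names and port are hypothetical),
              two external slurmdbds could be listed as:

                   AccountingStorageExternalHost=dbd1.example.com:6819,dbd2.example.com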
112
113
114       AccountingStorageHost
115              The name of the machine hosting the accounting storage database.
116              Only  used with systems using SlurmDBD, ignored otherwise.  Also
117              see DefaultStorageHost.
118
119
120       AccountingStorageParameters
121              Comma-separated list of  key-value  pair  parameters.  Currently
122              supported  values  include options to establish a secure connec‐
123              tion to the database:
124
125              SSL_CERT
126                The path name of the client public key certificate file.
127
128              SSL_CA
129                The path name of the Certificate  Authority  (CA)  certificate
130                file.
131
132              SSL_CAPATH
133                The  path  name  of the directory that contains trusted SSL CA
134                certificate files.
135
136              SSL_KEY
137                The path name of the client private key file.
138
139              SSL_CIPHER
140                The list of permissible ciphers for SSL encryption.
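
              A minimal sketch of a secure database connection, assuming
              hypothetical certificate and key paths:

                   AccountingStorageParameters=SSL_CERT=/etc/slurm/ssl/cert.pem,SSL_KEY=/etc/slurm/ssl/key.pem,SSL_CA=/etc/slurm/ssl/ca.pem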
141
142
143       AccountingStoragePass
144              The password used to gain access to the database  to  store  the
145              accounting  data.   Only used for database type storage plugins,
146              ignored otherwise.  In the case of Slurm DBD  (Database  Daemon)
147              with  MUNGE authentication this can be configured to use a MUNGE
148              daemon specifically configured to provide authentication between
149              clusters  while the default MUNGE daemon provides authentication
150              within a cluster.  In that  case,  AccountingStoragePass  should
151              specify  the  named  port to be used for communications with the
152              alternate MUNGE daemon (e.g.  "/var/run/munge/global.socket.2").
153              The default value is NULL.  Also see DefaultStoragePass.
154
155
156       AccountingStoragePort
157              The  listening  port  of the accounting storage database server.
158              Only used for database type storage plugins, ignored  otherwise.
159              The  default  value  is  SLURMDBD_PORT  as established at system
160              build time. If no value is explicitly specified, it will be  set
161              to  6819.   This value must be equal to the DbdPort parameter in
162              the slurmdbd.conf file.  Also see DefaultStoragePort.
163
164
165       AccountingStorageTRES
166              Comma-separated list of resources you wish to track on the clus‐
167              ter.   These  are the resources requested by the sbatch/srun job
168              when it is submitted. Currently this consists of  any  GRES,  BB
169              (burst  buffer) or license along with CPU, Memory, Node, Energy,
170              FS/[Disk|Lustre], IC/OFED, Pages, and VMem. By default  Billing,
171              CPU,  Energy, Memory, Node, FS/Disk, Pages and VMem are tracked.
172              These default TRES cannot be disabled,  but  only  appended  to.
173              AccountingStorageTRES=gres/craynetwork,license/iop1  will  track
174              billing, cpu, energy, memory, nodes,  fs/disk,  pages  and  vmem
175              along with a gres called craynetwork as well as a license called
176              iop1. Whenever these resources are used on the cluster they  are
177              recorded.  The  TRES are automatically set up in the database on
178              the start of the slurmctld.
179
180              If multiple GRES of different types are tracked  (e.g.  GPUs  of
181              different  types), then job requests with matching type specifi‐
182              cations will be recorded.  Given a  configuration  of  "Account‐
183              ingStorageTRES=gres/gpu,gres/gpu:tesla,gres/gpu:volta"      Then
184              "gres/gpu:tesla" and "gres/gpu:volta" will track only jobs  that
185              explicitly  request  those  two GPU types, while "gres/gpu" will
186              track allocated GPUs of any type ("tesla", "volta" or any  other
187              GPU type).
188
189              Given      a      configuration      of      "AccountingStorage‐
190              TRES=gres/gpu:tesla,gres/gpu:volta"  Then  "gres/gpu:tesla"  and
191              "gres/gpu:volta"  will  track jobs that explicitly request those
192              GPU types.  If a job requests  GPUs,  but  does  not  explicitly
193              specify  the  GPU type, then its resource allocation will be ac‐
194              counted for as either "gres/gpu:tesla" or "gres/gpu:volta",  al‐
195              though  the  accounting  may not match the actual GPU type allo‐
196              cated to the job and the GPUs allocated to the job could be het‐
197              erogeneous.  In an environment containing various GPU types, use
198              of a job_submit plugin may be desired in order to force jobs  to
199              explicitly specify some GPU type.
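
              A sketch adding typed GPU and license tracking to the default
              TRES (the GPU type "tesla" and license name "iop1" are examples
              only):

                   AccountingStorageTRES=gres/gpu,gres/gpu:tesla,license/iop1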
200
201
202       AccountingStorageType
203              The  accounting  storage  mechanism  type.  Acceptable values at
204              present include "accounting_storage/none" and  "accounting_stor‐
205              age/slurmdbd".   The  "accounting_storage/slurmdbd"  value indi‐
206              cates that accounting records will be written to the Slurm  DBD,
207              which  manages  an underlying MySQL database. See "man slurmdbd"
208              for more information.  The default  value  is  "accounting_stor‐
209              age/none" and indicates that account records are not maintained.
210              Also see DefaultStorageType.
211
212
213       AccountingStorageUser
214              The user account for accessing the accounting storage  database.
215              Only  used for database type storage plugins, ignored otherwise.
216              Also see DefaultStorageUser.
217
218
219       AccountingStoreJobComment
220              If set to "YES" then include the job's comment field in the  job
221              complete  message  sent to the Accounting Storage database.  The
222              default is "YES".  Note the AdminComment and  SystemComment  are
223              always recorded in the database.
224
225
226       AcctGatherNodeFreq
227              The  AcctGather  plugins  sampling interval for node accounting.
228              For AcctGather plugin values of none, this parameter is ignored.
229              For all other values this parameter is the number of seconds be‐
230              tween node accounting samples. For  the  acct_gather_energy/rapl
231              plugin, set a value less than 300 because the counters may over‐
232              flow beyond this rate.  The default value is  zero.  This  value
233              disables  accounting  sampling  for  nodes. Note: The accounting
234              sampling interval for jobs is determined by the value of  JobAc‐
235              ctGatherFrequency.
236
237
238       AcctGatherEnergyType
239              Identifies the plugin to be used for energy consumption account‐
240              ing.  The jobacct_gather plugin  and  slurmd  daemon  call  this
241              plugin  to  collect  energy consumption data for jobs and nodes.
242              The collection of energy consumption data  takes  place  on  the
243              node  level,  hence only in case of exclusive job allocation the
244              energy consumption measurements will reflect the job's real con‐
245              sumption. In case of node sharing between jobs the reported con‐
246              sumed energy per job (through sstat or sacct) will  not  reflect
247              the real energy consumed by the jobs.
248
249              Configurable values at present are:
250
251              acct_gather_energy/none
252                                  No energy consumption data is collected.
253
254              acct_gather_energy/ipmi
255                                  Energy  consumption  data  is collected from
256                                  the Baseboard  Management  Controller  (BMC)
257                                  using  the  Intelligent  Platform Management
258                                  Interface (IPMI).
259
260              acct_gather_energy/pm_counters
261                                  Energy consumption data  is  collected  from
262                                  the  Baseboard  Management  Controller (BMC)
263                                  for HPE Cray systems.
264
265              acct_gather_energy/rapl
266                                  Energy consumption data  is  collected  from
267                                  hardware  sensors  using the Running Average
268                                  Power Limit (RAPL) mechanism. Note that  en‐
269                                  abling RAPL may require the execution of the
270                                  command "sudo modprobe msr".
271
272              acct_gather_energy/xcc
273                                  Energy consumption data  is  collected  from
274                                  the  Lenovo  SD650 XClarity Controller (XCC)
275                                  using IPMI OEM raw commands.
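
              For instance, RAPL-based energy accounting sampled every 30
              seconds could be configured as follows (the interval is
              illustrative):

                   AcctGatherEnergyType=acct_gather_energy/rapl
                   AcctGatherNodeFreq=30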
276
277
278       AcctGatherInterconnectType
279              Identifies the plugin to be used for interconnect network  traf‐
280              fic  accounting.   The  jobacct_gather  plugin and slurmd daemon
281              call this plugin to collect network traffic data  for  jobs  and
282              nodes.   The  collection  of network traffic data takes place on
283              the node level, hence only in case of exclusive  job  allocation
284              the  collected  values  will  reflect the job's real traffic. In
285              case of node sharing between jobs the reported  network  traffic
286              per  job (through sstat or sacct) will not reflect the real net‐
287              work traffic by the jobs.
288
289              Configurable values at present are:
290
291              acct_gather_interconnect/none
292                                  No infiniband network data are collected.
293
294              acct_gather_interconnect/ofed
295                                  Infiniband network  traffic  data  are  col‐
296                                  lected from the hardware monitoring counters
297                                  of Infiniband devices through the  OFED  li‐
298                                  brary.  In order to account for per job net‐
299                                  work traffic, add the "ic/ofed" TRES to  Ac‐
300                                  countingStorageTRES.
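
              For example, per-job Infiniband traffic accounting could be
              enabled with:

                   AcctGatherInterconnectType=acct_gather_interconnect/ofed
                   AccountingStorageTRES=ic/ofed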
301
302
303       AcctGatherFilesystemType
304              Identifies the plugin to be used for filesystem traffic account‐
305              ing.  The jobacct_gather plugin  and  slurmd  daemon  call  this
306              plugin  to  collect  filesystem traffic data for jobs and nodes.
307              The collection of filesystem traffic data  takes  place  on  the
308              node  level,  hence only in case of exclusive job allocation the
309              collected values will reflect the job's real traffic. In case of
310              node  sharing  between  jobs the reported filesystem traffic per
311              job (through sstat or sacct) will not reflect the real  filesys‐
312              tem traffic by the jobs.
313
314
315              Configurable values at present are:
316
317              acct_gather_filesystem/none
318                                  No filesystem data are collected.
319
320              acct_gather_filesystem/lustre
321                                  Lustre filesystem traffic data are collected
322                                  from the counters found in /proc/fs/lustre/.
323                                  In order to account for per job lustre traf‐
324                                  fic, add the "fs/lustre"  TRES  to  Account‐
325                                  ingStorageTRES.
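
              Analogously, per-job Lustre traffic accounting could be enabled
              with:

                   AcctGatherFilesystemType=acct_gather_filesystem/lustre
                   AccountingStorageTRES=fs/lustre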
326
327
328       AcctGatherProfileType
329              Identifies  the  plugin  to  be used for detailed job profiling.
330              The jobacct_gather plugin and slurmd daemon call this plugin  to
331              collect  detailed  data such as I/O counts, memory usage, or en‐
332              ergy consumption for jobs and nodes.  There  are  interfaces  in
333              this  plugin  to collect data as step start and completion, task
334              start and completion, and at the account gather  frequency.  The
335              data collected at the node level is related to jobs only in case
336              of exclusive job allocation.
337
338              Configurable values at present are:
339
340              acct_gather_profile/none
341                                  No profile data is collected.
342
343              acct_gather_profile/hdf5
344                                  This enables the HDF5 plugin. The  directory
345                                  where the profile files are stored and which
346                                  values are collected are configured  in  the
347                                  acct_gather.conf file.
348
349              acct_gather_profile/influxdb
350                                  This  enables  the  influxdb plugin. The in‐
351                                  fluxdb instance host, port, database, reten‐
352                                  tion  policy  and which values are collected
353                                  are configured in the acct_gather.conf file.
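
              For example, HDF5 profiling could be enabled with the line
              below; the directory and collected values would then be set in
              acct_gather.conf:

                   AcctGatherProfileType=acct_gather_profile/hdf5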
354
355
356       AllowSpecResourcesUsage
              If set to "YES", Slurm allows individual jobs to override a
              node's configured CoreSpecCount value. For a job to take
              advantage of this feature, a command line option of --core-spec
              must be specified.  The default value for this option is "YES"
              for Cray systems and "NO" for other system types.
362
363
364       AuthAltTypes
365              Comma-separated list of alternative authentication plugins  that
366              the  slurmctld  will permit for communication. Acceptable values
367              at present include auth/jwt.
368
369              NOTE: auth/jwt requires a jwt_hs256.key to be populated  in  the
370              StateSaveLocation    directory    for    slurmctld   only.   The
371              jwt_hs256.key should only be visible to the SlurmUser and  root.
372              It  is not suggested to place the jwt_hs256.key on any nodes but
373              the controller running slurmctld.  auth/jwt can be activated  by
374              the  presence of the SLURM_JWT environment variable.  When acti‐
375              vated, it will override the default AuthType.
376
377
378       AuthAltParameters
379              Used to define alternative authentication plugins options.  Mul‐
380              tiple options may be comma separated.
381
382              disable_token_creation
383                             Disable "scontrol token" use by non-SlurmUser ac‐
384                             counts.
385
386              jwt_key=       Absolute path to JWT key file. Key must be HS256,
387                             and  should  only  be accessible by SlurmUser. If
388                             not set, the default key file is jwt_hs256.key in
389                             StateSaveLocation.
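
              A minimal JWT sketch; the key path is hypothetical and the file
              must be readable only by SlurmUser and root:

                   AuthAltTypes=auth/jwt
                   AuthAltParameters=jwt_key=/var/spool/slurmctld/jwt_hs256.key,disable_token_creation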
390
391
392       AuthInfo
393              Additional information to be used for authentication of communi‐
394              cations between the Slurm daemons (slurmctld and slurmd) and the
395              Slurm clients.  The interpretation of this option is specific to
396              the configured AuthType.  Multiple options may be specified in a
397              comma delimited list.  If not specified, the default authentica‐
398              tion information will be used.
399
400              cred_expire   Default job step credential lifetime,  in  seconds
                            (e.g. "cred_expire=1200").  It must be long enough
                            to load the user environment, run the prolog, deal
                            with the slurmd getting paged out of
404                            memory, etc.  This also controls how  long  a  re‐
405                            queued  job  must wait before starting again.  The
406                            default value is 120 seconds.
407
408              socket        Path name to a MUNGE daemon socket  to  use  (e.g.
409                            "socket=/var/run/munge/munge.socket.2").   The de‐
410                            fault  value  is  "/var/run/munge/munge.socket.2".
411                            Used by auth/munge and cred/munge.
412
413              ttl           Credential  lifetime, in seconds (e.g. "ttl=300").
414                            The default value is dependent upon the MUNGE  in‐
415                            stallation, but is typically 300 seconds.
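
              For example, an explicit MUNGE socket path together with a
              longer credential lifetime (values illustrative):

                   AuthInfo=socket=/var/run/munge/munge.socket.2,cred_expire=300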
416
417
418       AuthType
419              The  authentication method for communications between Slurm com‐
420              ponents.  Acceptable values at present include "auth/munge"  and
421              "auth/none".   The  default  value is "auth/munge".  "auth/none"
422              includes the UID in each communication, but it is not  verified.
423              This   may  be  fine  for  testing  purposes,  but  do  not  use
424              "auth/none" if you desire any security.  "auth/munge"  indicates
425              that  MUNGE  is to be used.  (See "https://dun.github.io/munge/"
426              for more information).  All Slurm daemons and commands  must  be
427              terminated  prior  to  changing  the value of AuthType and later
428              restarted.
429
430
431       BackupAddr
432              Deprecated option, see SlurmctldHost.
433
434
435       BackupController
436              Deprecated option, see SlurmctldHost.
437
438              The backup controller recovers state information from the State‐
439              SaveLocation directory, which must be readable and writable from
440              both the primary and backup controllers.  While  not  essential,
441              it  is  recommended  that  you specify a backup controller.  See
442              the RELOCATING CONTROLLERS section if you change this.
443
444
445       BatchStartTimeout
446              The maximum time (in seconds) that a batch job is permitted  for
447              launching  before being considered missing and releasing the al‐
448              location. The default value is 10 (seconds). Larger  values  may
449              be required if more time is required to execute the Prolog, load
450              user environment variables, or if the slurmd daemon  gets  paged
451              from memory.
452              Note:  The  test  for  a job being successfully launched is only
453              performed when the Slurm daemon on the  compute  node  registers
454              state  with the slurmctld daemon on the head node, which happens
455              fairly rarely.  Therefore a job will not necessarily  be  termi‐
456              nated if its start time exceeds BatchStartTimeout.  This config‐
457              uration parameter is also applied  to  launch  tasks  and  avoid
458              aborting srun commands due to long running Prolog scripts.
459
460
461       BurstBufferType
462              The  plugin  used  to manage burst buffers. Acceptable values at
463              present are:
464
465              burst_buffer/datawarp
466                     Use Cray DataWarp API to provide burst buffer functional‐
467                     ity.
468
469              burst_buffer/none
470
471
472       CliFilterPlugins
473              A  comma  delimited  list  of command line interface option fil‐
474              ter/modification plugins. The specified plugins will be executed
475              in  the  order  listed.   These are intended to be site-specific
476              plugins which can be used to set default job  parameters  and/or
477              logging events.  No cli_filter plugins are used by default.
478
479
480       ClusterName
481              The name by which this Slurm managed cluster is known in the ac‐
              counting database.  This is needed to distinguish accounting
483              records  when multiple clusters report to the same database. Be‐
484              cause of limitations in some databases, any upper  case  letters
485              in  the  name will be silently mapped to lower case. In order to
486              avoid confusion, it is recommended that the name be lower case.
487
488
489       CommunicationParameters
490              Comma-separated options identifying communication options.
491
492              CheckGhalQuiesce
493                             Used specifically on a Cray using an  Aries  Ghal
494                             interconnect.  This will check to see if the sys‐
495                             tem is quiescing when sending a message,  and  if
496                             so, we wait until it is done before sending.
497
498              DisableIPv4    Disable IPv4 only operation for all slurm daemons
499                             (except slurmdbd). This should  also  be  set  in
500                             your slurmdbd.conf file.
501
502              EnableIPv6     Enable using IPv6 addresses for all slurm daemons
503                             (except slurmdbd). When using both IPv4 and IPv6,
504                             address  family preferences will be based on your
505                             /etc/gai.conf file. This should also  be  set  in
506                             your slurmdbd.conf file.
507
508              NoAddrCache    By default, Slurm will cache a node's network ad‐
509                             dress after successfully establishing the  node's
510                             network  address.  This option disables the cache
511                             and Slurm will look up the node's network address
512                             each  time a connection is made.  This is useful,
513                             for example, in a  cloud  environment  where  the
514                             node addresses come and go out of DNS.
515
516              NoCtldInAddrAny
517                             Used  to directly bind to the address of what the
518                             node resolves to running the slurmctld instead of
519                             binding  messages  to  any  address  on the node,
520                             which is the default.
521
522              NoInAddrAny    Used to directly bind to the address of what  the
523                             node  resolves  to instead of binding messages to
524                             any address on the node  which  is  the  default.
525                             This option is for all daemons/clients except for
526                             the slurmctld.
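
              For example, a cloud-style cluster whose node addresses come and
              go in DNS and which also uses IPv6 might set (illustrative):

                   CommunicationParameters=EnableIPv6,NoAddrCache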
527
528
529
530       CompleteWait
531              The time to wait, in seconds, when any job is in the  COMPLETING
532              state  before  any additional jobs are scheduled. This is to at‐
533              tempt to keep jobs on nodes that were recently in use, with  the
534              goal  of preventing fragmentation.  If set to zero, pending jobs
535              will be started as soon as possible.  Since a  COMPLETING  job's
536              resources are released for use by other jobs as soon as the Epi‐
537              log completes on each individual node, this can result  in  very
538              fragmented resource allocations.  To provide jobs with the mini‐
539              mum response time, a value of zero is recommended (no  waiting).
540              To  minimize  fragmentation of resources, a value equal to Kill‐
541              Wait plus two is recommended.  In that case, setting KillWait to
542              a small value may be beneficial.  The default value of Complete‐
543              Wait is zero seconds.  The value may not exceed 65533.
544
545              NOTE: Setting reduce_completing_frag  affects  the  behavior  of
546              CompleteWait.
547
548
549       ControlAddr
550              Deprecated option, see SlurmctldHost.
551
552
553       ControlMachine
554              Deprecated option, see SlurmctldHost.
555
556
557       CoreSpecPlugin
              Identifies the plugin to be used for enforcement of core spe‐
559              cialization.  The slurmd daemon must be restarted for  a  change
560              in  CoreSpecPlugin to take effect.  Acceptable values at present
561              include:
562
563              core_spec/cray_aries
564                                  used only for Cray systems
565
566              core_spec/none      used for all other system types
567
568
569       CpuFreqDef
570              Default CPU frequency value or frequency governor  to  use  when
571              running  a  job  step if it has not been explicitly set with the
572              --cpu-freq option.  Acceptable values at present include  a  nu‐
573              meric  value  (frequency  in  kilohertz) or one of the following
574              governors:
575
576              Conservative  attempts to use the Conservative CPU governor
577
578              OnDemand      attempts to use the OnDemand CPU governor
579
580              Performance   attempts to use the Performance CPU governor
581
582              PowerSave     attempts to use the PowerSave CPU governor
              There is no default value. If unset and the --cpu-freq option
              has not been set, no attempt to set the governor is made.
585
586
587       CpuFreqGovernors
588              List  of CPU frequency governors allowed to be set with the sal‐
589              loc, sbatch, or srun option  --cpu-freq.  Acceptable  values  at
590              present include:
591
592              Conservative  attempts to use the Conservative CPU governor
593
594              OnDemand      attempts  to  use the OnDemand CPU governor (a de‐
595                            fault value)
596
597              Performance   attempts to use the Performance  CPU  governor  (a
598                            default value)
599
600              PowerSave     attempts to use the PowerSave CPU governor
601
602              UserSpace     attempts  to use the UserSpace CPU governor (a de‐
603                            fault value)
              The default is OnDemand, Performance and UserSpace.
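
              For example, the following restricts jobs to two governors and
              makes Performance the default when --cpu-freq is not given
              (illustrative):

                   CpuFreqGovernors=OnDemand,Performance
                   CpuFreqDef=Performance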
605
606       CredType
607              The cryptographic signature tool to be used in the  creation  of
608              job  step  credentials.   The slurmctld daemon must be restarted
609              for a change in CredType to take effect.  Acceptable  values  at
610              present include "cred/munge" and "cred/none".  The default value
              is "cred/munge" and is recommended.
612
613
614       DebugFlags
615              Defines specific subsystems which should provide  more  detailed
616              event  logging.  Multiple subsystems can be specified with comma
617              separators.  Most DebugFlags will result in  verbose-level  log‐
618              ging  for  the  identified  subsystems, and could impact perfor‐
619              mance.  Valid subsystems available include:
620
621              Accrue           Accrue counters accounting details
622
623              Agent            RPC agents (outgoing RPCs from Slurm daemons)
624
625              Backfill         Backfill scheduler details
626
627              BackfillMap      Backfill scheduler to log a very verbose map of
628                               reserved  resources  through time. Combine with
629                               Backfill for a verbose and complete view of the
630                               backfill scheduler's work.
631
632              BurstBuffer      Burst Buffer plugin
633
634              CPU_Bind         CPU binding details for jobs and steps
635
636              CpuFrequency     Cpu  frequency details for jobs and steps using
637                               the --cpu-freq option.
638
639              Data             Generic data structure details.
640
641              Dependency       Job dependency debug info
642
643              Elasticsearch    Elasticsearch debug info
644
645              Energy           AcctGatherEnergy debug info
646
647              ExtSensors       External Sensors debug info
648
649              Federation       Federation scheduling debug info
650
651              FrontEnd         Front end node details
652
653              Gres             Generic resource details
654
655              Hetjob           Heterogeneous job details
656
657              Gang             Gang scheduling details
658
659              JobContainer     Job container plugin details
660
661              License          License management details
662
663              Network          Network details
664
665              NetworkRaw       Dump raw hex values of key  Network  communica‐
666                               tions. Warning: very verbose.
667
668              NodeFeatures     Node Features plugin debug info
669
670              NO_CONF_HASH     Do not log when the slurm.conf files differ be‐
671                               tween Slurm daemons
672
673              Power            Power management plugin
674
675              PowerSave        Power save (suspend/resume programs) details
676
677              Priority         Job prioritization
678
679              Profile          AcctGatherProfile plugins details
680
681              Protocol         Communication protocol details
682
683              Reservation      Advanced reservations
684
685              Route            Message forwarding debug info
686
687              SelectType       Resource selection plugin
688
689              Steps            Slurmctld resource allocation for job steps
690
691              Switch           Switch plugin
692
693              TimeCray         Timing of Cray APIs
694
695              TRESNode         Limits dealing with TRES=Node
696
697              TraceJobs        Trace jobs in slurmctld. It will print detailed
698                               job  information  including  state, job ids and
699                               allocated nodes counter.
700
701              Triggers         Slurmctld triggers
702
703              WorkQueue        Work Queue details
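
              For example, verbose backfill scheduler diagnostics could be
              enabled with:

                   DebugFlags=Backfill,BackfillMap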
704
705
706       DefCpuPerGPU
707              Default count of CPUs allocated per allocated GPU.
708
709
710       DefMemPerCPU
711              Default  real  memory  size  available  per  allocated  CPU   in
712              megabytes.   Used  to  avoid over-subscribing memory and causing
713              paging.  DefMemPerCPU would generally be used if individual pro‐
714              cessors are allocated to jobs (SelectType=select/cons_res or Se‐
715              lectType=select/cons_tres).  The default value is 0 (unlimited).
716              Also  see DefMemPerGPU, DefMemPerNode and MaxMemPerCPU.  DefMem‐
717              PerCPU, DefMemPerGPU and DefMemPerNode are mutually exclusive.
718
719
720       DefMemPerGPU
721              Default  real  memory  size  available  per  allocated  GPU   in
722              megabytes.   The  default  value  is  0  (unlimited).   Also see
723              DefMemPerCPU and DefMemPerNode.  DefMemPerCPU, DefMemPerGPU  and
724              DefMemPerNode are mutually exclusive.
725
726
727       DefMemPerNode
728              Default  real  memory  size  available  per  allocated  node  in
729              megabytes.  Used to avoid over-subscribing  memory  and  causing
730              paging.   DefMemPerNode  would  generally be used if whole nodes
731              are allocated to jobs (SelectType=select/linear)  and  resources
732              are  over-subscribed (OverSubscribe=yes or OverSubscribe=force).
733              The default value is  0  (unlimited).   Also  see  DefMemPerCPU,
734              DefMemPerGPU  and  MaxMemPerCPU.  DefMemPerCPU, DefMemPerGPU and
735              DefMemPerNode are mutually exclusive.
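
              Because DefMemPerCPU, DefMemPerGPU and DefMemPerNode are
              mutually exclusive, only one of them would appear in a given
              slurm.conf, e.g. (value illustrative):

                   DefMemPerCPU=2048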
736
737
738       DefaultStorageHost
739              The default name of the machine hosting the  accounting  storage
740              and job completion databases.  Only used for database type stor‐
741              age plugins and when the AccountingStorageHost  and  JobCompHost
742              have not been defined.
743
744
745       DefaultStorageLoc
746              The  fully  qualified file name where job completion records are
747              written when the DefaultStorageType is "filetxt".  Also see Job‐
748              CompLoc.
749
750
751       DefaultStoragePass
752              The  password  used  to gain access to the database to store the
753              accounting and job completion data.  Only used for database type
754              storage  plugins,  ignored  otherwise.  Also see AccountingStor‐
755              agePass and JobCompPass.
756
757
758       DefaultStoragePort
759              The listening port of the accounting storage and/or job  comple‐
760              tion database server.  Only used for database type storage plug‐
761              ins, ignored otherwise.  Also see AccountingStoragePort and Job‐
762              CompPort.
763
764
765       DefaultStorageType
766              The  accounting  and job completion storage mechanism type.  Ac‐
767              ceptable  values  at  present  include  "filetxt",  "mysql"  and
768              "none".   The  value  "filetxt"  indicates  that records will be
769              written to a file.  The value "mysql" indicates that  accounting
770              records will be written to a MySQL or MariaDB database.  The de‐
771              fault value is "none", which means that records  are  not  main‐
772              tained.  Also see AccountingStorageType and JobCompType.
773
774
775       DefaultStorageUser
776              The user account for accessing the accounting storage and/or job
777              completion database.  Only used for database type storage  plug‐
778              ins, ignored otherwise.  Also see AccountingStorageUser and Job‐
779              CompUser.
780
781
782       DependencyParameters
783              Multiple options may be comma separated.
784
785
786              disable_remote_singleton
787                     By default, when a federated job has a  singleton  depen‐
                     dency, each cluster in the federation must clear the sin‐
789                     gleton dependency before the job's  singleton  dependency
790                     is  considered satisfied. Enabling this option means that
791                     only the origin cluster must clear the  singleton  depen‐
792                     dency.  This  option  must be set in every cluster in the
793                     federation.
794
              kill_invalid_depend
                     If a job has an invalid dependency and it can never run,
                     terminate it and set its state to be JOB_CANCELLED. By
                     default the job stays pending with reason
                     DependencyNeverSatisfied.

              max_depend_depth=#
                     Maximum number of jobs to test for a circular job
                     dependency. Stop testing after this number of job
                     dependencies have been tested. The default value is 10
                     jobs.
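
              For example (the depth value is illustrative):

                   DependencyParameters=kill_invalid_depend,max_depend_depth=20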
803
804
805       DisableRootJobs
806              If set to "YES" then user root will be  prevented  from  running
807              any  jobs.  The default value is "NO", meaning user root will be
808              able to execute jobs.  DisableRootJobs may also be set by parti‐
809              tion.
810
811
812       EioTimeout
813              The  number  of  seconds  srun waits for slurmstepd to close the
814              TCP/IP connection used to relay data between the  user  applica‐
815              tion  and srun when the user application terminates. The default
816              value is 60 seconds.  May not exceed 65533.
817
818
819       EnforcePartLimits
820              If set to "ALL" then jobs which exceed a partition's size and/or
821              time  limits will be rejected at submission time. If job is sub‐
822              mitted to multiple partitions, the job must satisfy  the  limits
823              on  all  the  requested  partitions. If set to "NO" then the job
824              will be accepted and remain queued until  the  partition  limits
              are altered (Time and Node Limits).  If set to "ANY" a job must
826              satisfy any of the requested partitions to be submitted. The de‐
827              fault  value is "NO".  NOTE: If set, then a job's QOS can not be
828              used to exceed partition limits.  NOTE: The partition limits be‐
829              ing  considered  are its configured MaxMemPerCPU, MaxMemPerNode,
830              MinNodes, MaxNodes, MaxTime, AllocNodes,  AllowAccounts,  Allow‐
831              Groups, AllowQOS, and QOS usage threshold.
832
833
834       Epilog Fully  qualified pathname of a script to execute as user root on
835              every  node  when  a  user's  job  completes   (e.g.   "/usr/lo‐
836              cal/slurm/epilog").  A  glob  pattern (See glob (7)) may also be
837              used to run more than one epilog script  (e.g.  "/etc/slurm/epi‐
838              log.d/*").  The  Epilog  script  or scripts may be used to purge
839              files, disable user login, etc.  By default there is no  epilog.
840              See Prolog and Epilog Scripts for more information.
841
842
843       EpilogMsgTime
844              The number of microseconds that the slurmctld daemon requires to
845              process an epilog completion message from  the  slurmd  daemons.
846              This  parameter can be used to prevent a burst of epilog comple‐
847              tion messages from being sent at the same time which should help
848              prevent  lost  messages  and  improve throughput for large jobs.
849              The default value is 2000 microseconds.  For a  1000  node  job,
850              this  spreads  the  epilog completion messages out over two sec‐
851              onds.
852
853
854       EpilogSlurmctld
855              Fully qualified pathname of a program for the slurmctld to  exe‐
856              cute  upon  termination  of  a  job  allocation (e.g.  "/usr/lo‐
857              cal/slurm/epilog_controller").  The program  executes  as  Slur‐
858              mUser,  which gives it permission to drain nodes and requeue the
859              job if a failure occurs (See  scontrol(1)).   Exactly  what  the
860              program  does  and how it accomplishes this is completely at the
861              discretion of the system administrator.  Information  about  the
862              job being initiated, its allocated nodes, etc. are passed to the
863              program using environment  variables.   See  Prolog  and  Epilog
864              Scripts for more information.
865
866
867       ExtSensorsFreq
868              The  external  sensors  plugin  sampling  interval.   If ExtSen‐
869              sorsType=ext_sensors/none, this parameter is ignored.   For  all
870              other  values of ExtSensorsType, this parameter is the number of
871              seconds between external sensors samples for hardware components
              (nodes, switches, etc.).  The default value is zero, which
              disables external sensors sampling. Note: This parameter does
874              not affect external sensors data collection for jobs/steps.
875
876
877       ExtSensorsType
878              Identifies  the plugin to be used for external sensors data col‐
879              lection.  Slurmctld calls this plugin to collect  external  sen‐
880              sors  data  for  jobs/steps  and hardware components. In case of
881              node sharing between  jobs  the  reported  values  per  job/step
882              (through  sstat  or  sacct)  may not be accurate.  See also "man
883              ext_sensors.conf".
884
885              Configurable values at present are:
886
887              ext_sensors/none    No external sensors data is collected.
888
889              ext_sensors/rrd     External sensors data is collected from  the
890                                  RRD database.
891
892
893       FairShareDampeningFactor
894              Dampen  the  effect of exceeding a user or group's fair share of
              allocated resources. Higher values will provide greater ability
896              to differentiate between exceeding the fair share at high levels
897              (e.g. a value of 1 results in almost no difference between over‐
898              consumption  by  a factor of 10 and 100, while a value of 5 will
899              result in a significant difference in  priority).   The  default
900              value is 1.
901
902
903       FederationParameters
904              Used to define federation options. Multiple options may be comma
905              separated.
906
907
908              fed_display
909                     If set, then the client  status  commands  (e.g.  squeue,
910                     sinfo,  sprio, etc.) will display information in a feder‐
911                     ated view by default. This option is functionally equiva‐
912                     lent  to  using the --federation options on each command.
913                     Use the client's --local option to override the federated
914                     view and get a local view of the given cluster.
915
916
917       FirstJobId
              The job id to be used for the first job submitted to Slurm
              without a specific requested value. Job id values generated will
              be incremented by 1 for each subsequent job. This may be used to
              provide a meta-scheduler with a job id space which is disjoint
              from the interactive jobs.  The default value is 1.  Also see
              MaxJobId.
923
924
925       GetEnvTimeout
926              Controls  how  long the job should wait (in seconds) to load the
927              user's environment before attempting to load  it  from  a  cache
928              file.   Applies  when the salloc or sbatch --get-user-env option
929              is used.  If set to 0 then always load  the  user's  environment
930              from the cache file.  The default value is 2 seconds.
931
932
933       GresTypes
934              A  comma delimited list of generic resources to be managed (e.g.
935              GresTypes=gpu,mps).  These resources may have an associated GRES
936              plugin  of the same name providing additional functionality.  No
937              generic resources are managed by default.  Ensure this parameter
938              is  consistent across all nodes in the cluster for proper opera‐
939              tion.  The slurmctld daemon must be  restarted  for  changes  to
940              this parameter to become effective.
941
942
943       GroupUpdateForce
944              If  set  to a non-zero value, then information about which users
945              are members of groups allowed to use a partition will be updated
946              periodically,  even  when  there  have  been  no  changes to the
947              /etc/group file.  If set to zero, group member information  will
948              be  updated  only after the /etc/group file is updated.  The de‐
949              fault value is 1.  Also see the GroupUpdateTime parameter.
950
951
952       GroupUpdateTime
953              Controls how frequently information about which users  are  mem‐
954              bers  of  groups allowed to use a partition will be updated, and
955              how long user group membership lists will be cached.   The  time
956              interval  is  given  in seconds with a default value of 600 sec‐
957              onds.  A value of zero will prevent periodic updating  of  group
958              membership  information.   Also see the GroupUpdateForce parame‐
959              ter.
960
961
       GpuFreqDef=[<type>=]<value>[,<type>=<value>]
963              Default GPU frequency to use when running a job step if  it  has
964              not  been  explicitly set using the --gpu-freq option.  This op‐
965              tion can be used to independently configure the GPU and its mem‐
966              ory  frequencies. Defaults to "high,memory=high".  After the job
967              is completed, the frequencies of all affected GPUs will be reset
968              to  the  highest  possible  values.  In some cases, system power
969              caps may override the requested values.  The field type  can  be
970              "memory".   If  type  is not specified, the GPU frequency is im‐
971              plied.  The value field can either be "low",  "medium",  "high",
972              "highm1"  or  a numeric value in megahertz (MHz).  If the speci‐
973              fied numeric value is not possible, a value as close as possible
974              will be used.  See below for definition of the values.  Examples
              of use include "GpuFreqDef=medium,memory=high" and "GpuFre‐
976              qDef=450".
977
978              Supported value definitions:
979
980              low       the lowest available frequency.
981
982              medium    attempts  to  set  a  frequency  in  the middle of the
983                        available range.
984
985              high      the highest available frequency.
986
987              highm1    (high minus one) will select the next  highest  avail‐
988                        able frequency.
989
990
991       HealthCheckInterval
992              The  interval  in  seconds between executions of HealthCheckPro‐
993              gram.  The default value is zero, which disables execution.
994
995
996       HealthCheckNodeState
997              Identify what node states should execute the HealthCheckProgram.
998              Multiple  state  values may be specified with a comma separator.
999              The default value is ANY to execute on nodes in any state.
1000
1001              ALLOC       Run on nodes in the  ALLOC  state  (all  CPUs  allo‐
1002                          cated).
1003
1004              ANY         Run on nodes in any state.
1005
1006              CYCLE       Rather  than running the health check program on all
1007                          nodes at the same time, cycle through running on all
1008                          compute nodes through the course of the HealthCheck‐
1009                          Interval. May be  combined  with  the  various  node
1010                          state options.
1011
1012              IDLE        Run on nodes in the IDLE state.
1013
1014              MIXED       Run  on nodes in the MIXED state (some CPUs idle and
1015                          other CPUs allocated).
1016
1017
1018       HealthCheckProgram
1019              Fully qualified pathname of a script to execute as user root pe‐
1020              riodically on all compute nodes that are not in the NOT_RESPOND‐
1021              ING state. This program may be used to verify the node is  fully
1022              operational and DRAIN the node or send email if a problem is de‐
1023              tected.  Any action to be taken must be explicitly performed  by
1024              the   program   (e.g.   execute  "scontrol  update  NodeName=foo
1025              State=drain Reason=tmp_file_system_full" to drain a node).   The
1026              execution  interval  is controlled using the HealthCheckInterval
1027              parameter.  Note that the HealthCheckProgram will be executed at
1028              the  same time on all nodes to minimize its impact upon parallel
              programs.  This program will be killed if it does not termi‐
1030              nate normally within 60 seconds.  This program will also be exe‐
1031              cuted when the slurmd daemon is first started and before it reg‐
1032              isters  with  the slurmctld daemon.  By default, no program will
1033              be executed.
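
              A sketch of a periodic health check, assuming a hypothetical
              site-provided script at /usr/local/sbin/node_health.sh:

                   HealthCheckProgram=/usr/local/sbin/node_health.sh
                   HealthCheckInterval=300
                   HealthCheckNodeState=ANY,CYCLE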
1034
1035
1036       InactiveLimit
1037              The interval, in seconds, after which a non-responsive job allo‐
1038              cation  command (e.g. srun or salloc) will result in the job be‐
1039              ing terminated. If the node on which  the  command  is  executed
1040              fails  or the command abnormally terminates, this will terminate
1041              its job allocation.  This option has no effect upon batch  jobs.
1042              When  setting  a  value, take into consideration that a debugger
1043              using srun to launch an application may leave the  srun  command
1044              in  a stopped state for extended periods of time.  This limit is
1045              ignored for jobs running in partitions with  the  RootOnly  flag
1046              set  (the  scheduler running as root will be responsible for the
1047              job).  The default value is unlimited (zero) and may not  exceed
1048              65533 seconds.
1049
1050
1051       InteractiveStepOptions
1052              When LaunchParameters=use_interactive_step is enabled, launching
1053              salloc will automatically start an srun  process  with  Interac‐
1054              tiveStepOptions  to launch a terminal on a node in the job allo‐
1055              cation.  The  default  value  is  "--interactive  --preserve-env
1056              --pty $SHELL".
1057
1058
1059       JobAcctGatherType
              The job accounting mechanism type.  Acceptable values at present
              include "jobacct_gather/linux" (for Linux systems; this is the
              recommended one), "jobacct_gather/cgroup" and
              "jobacct_gather/none" (no accounting data collected).  The de‐
1064              fault  value  is "jobacct_gather/none".  "jobacct_gather/cgroup"
1065              is a plugin for the Linux operating system that uses cgroups  to
1066              collect accounting statistics. The plugin collects the following
1067              statistics:  From  the  cgroup  memory   subsystem:   memory.us‐
1068              age_in_bytes (reported as 'pages') and rss from memory.stat (re‐
1069              ported as 'rss'). From the cgroup cpuacct  subsystem:  user  cpu
1070              time  and  system  cpu time. No value is provided by cgroups for
1071              virtual memory size ('vsize').  In order to use the  sstat  tool
1072              "jobacct_gather/linux",  or "jobacct_gather/cgroup" must be con‐
1073              figured.
1074              NOTE: Changing this configuration parameter changes the contents
1075              of  the  messages  between Slurm daemons. Any previously running
1076              job steps are managed by a slurmstepd daemon that  will  persist
1077              through  the lifetime of that job step and not change its commu‐
1078              nication protocol. Only change this configuration parameter when
1079              there are no running job steps.
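
                  For example, a site wanting cgroup-based accounting might
                  configure (a sketch, not a recommendation):

                  JobAcctGatherType=jobacct_gather/cgroup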
1080
1081
1082       JobAcctGatherFrequency
1083              The  job  accounting and profiling sampling intervals.  The sup‐
1084              ported format is as follows:
1085
1086              JobAcctGatherFrequency=<datatype>=<interval>
1087                          where <datatype>=<interval> specifies the task  sam‐
1088                          pling  interval  for  the jobacct_gather plugin or a
1089                          sampling  interval  for  a  profiling  type  by  the
1090                          acct_gather_profile  plugin.  Multiple,  comma-sepa‐
1091                          rated <datatype>=<interval> intervals may be  speci‐
1092                          fied. Supported datatypes are as follows:
1093
1094                          task=<interval>
1095                                 where  <interval> is the task sampling inter‐
1096                                 val in seconds for the jobacct_gather plugins
1097                                 and     for    task    profiling    by    the
1098                                 acct_gather_profile plugin.
1099
1100                          energy=<interval>
1101                                 where <interval> is the sampling interval  in
1102                                 seconds   for   energy  profiling  using  the
1103                                 acct_gather_energy plugin
1104
1105                          network=<interval>
1106                                 where <interval> is the sampling interval  in
1107                                 seconds  for  infiniband  profiling using the
1108                                 acct_gather_interconnect plugin.
1109
1110                          filesystem=<interval>
1111                                 where <interval> is the sampling interval  in
1112                                 seconds  for  filesystem  profiling using the
1113                                 acct_gather_filesystem plugin.
1114
1115              The default value for task sampling interval
1116              is 30 seconds. The default value for all other intervals  is  0.
1117              An  interval  of  0 disables sampling of the specified type.  If
1118              the task sampling interval is 0, accounting information is  col‐
1119              lected only at job termination (reducing Slurm interference with
1120              the job).
1121              Smaller (non-zero) values have a greater impact upon job perfor‐
1122              mance,  but a value of 30 seconds is not likely to be noticeable
1123              for applications having less than 10,000 tasks.
1124              Users can independently override each interval on a per job  ba‐
1125              sis using the --acctg-freq option when submitting the job.
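
                  As an illustration only, sampling task statistics every 30
                  seconds and energy, network and filesystem statistics every
                  60 seconds could be requested with:

                  JobAcctGatherFrequency=task=30,energy=60,network=60,filesystem=60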
1126
1127
1128       JobAcctGatherParams
1129              Arbitrary parameters for the job account gather plugin.  Accept‐
1130              able values at present include:
1131
1132              NoShared            Exclude shared memory from accounting.
1133
1134              UsePss              Use PSS value instead of  RSS  to  calculate
1135                                  real usage of memory.  The PSS value will be
1136                                  saved as RSS.
1137
1138              OverMemoryKill      Kill processes that are  being  detected  to
1139                                  use  more memory than requested by steps ev‐
1140                                  ery time accounting information is  gathered
1141                                  by the JobAcctGather plugin.  This parameter
1142                                  should be used with caution  because  a  job
1143                                  exceeding  its  memory allocation may affect
1144                                  other processes and/or machine health.
1145
1146                                  NOTE: If available,  it  is  recommended  to
1147                                  limit  memory  by  enabling task/cgroup as a
1148                                  TaskPlugin  and  making  use  of  Constrain‐
1149                                  RAMSpace=yes  in  the cgroup.conf instead of
1150                                  using this JobAcctGather mechanism for  mem‐
1151                                  ory enforcement. With OverMemoryKill, memory
1152                                  limit is applied against each process  indi‐
1153                                  vidually and is not applied to the step as a
1154                                  whole. This means  that  when  jobs  have  a
1155                                  process  that  consumes too much memory, the
1156                                  process will be killed  but  the  step  will
1157                                  continue  to  run.  When  using cgroups with
1158                                  ConstrainRAMSpace=yes, a process  that  con‐
1159                                  sumes too much memory will result in the job
1160                                  step being killed.  Using  JobAcctGather  is
1161                                  polling  based and there is a delay before a
1162                                  job is killed, which could  lead  to  system
1163                                  Out of Memory events.
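
                  As a sketch, assuming the plugin accepts a comma-separated
                  list of these values, a site could enable PSS-based memory
                  accounting while excluding shared memory with:

                  JobAcctGatherParams=UsePss,NoShared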
1164
1165
1166       JobCompHost
1167              The  name  of  the  machine hosting the job completion database.
1168              Only used for database type storage plugins, ignored  otherwise.
1169              Also see DefaultStorageHost.
1170
1171
1172       JobCompLoc
1173              The  fully  qualified file name where job completion records are
1174              written when the JobCompType is "jobcomp/filetxt" or  the  data‐
1175              base  where  job completion records are stored when the JobComp‐
1176              Type is a database, or  a  complete  URL  endpoint  with  format
1177              <host>:<port>/<target>/_doc  when  JobCompType is "jobcomp/elas‐
1178              ticsearch", e.g. "localhost:9200/slurm/_doc".  NOTE:  More
1179              information    is    available    at    the   Slurm   web   site
1180              <https://slurm.schedmd.com/elasticsearch.html>.   Also  see  De‐
1181              faultStorageLoc.
1182
1183
1184       JobCompParams
1185              Pass  arbitrary  text string to job completion plugin.  Also see
1186              JobCompType.
1187
1188
1189       JobCompPass
1190              The password used to gain access to the database  to  store  the
1191              job  completion data.  Only used for database type storage plug‐
1192              ins, ignored otherwise.  Also see DefaultStoragePass.
1193
1194
1195       JobCompPort
1196              The listening port of the job completion database server.   Only
1197              used for database type storage plugins, ignored otherwise.  Also
1198              see DefaultStoragePort.
1199
1200
1201       JobCompType
1202              The job completion logging mechanism type.  Acceptable values at
1203              present  include  "jobcomp/none", "jobcomp/elasticsearch", "job‐
1204              comp/filetxt",   "jobcomp/lua",   "jobcomp/mysql"   and    "job‐
1205              comp/script".   The default value is "jobcomp/none", which means
1206              that upon job completion the record of the job  is  purged  from
1207              the  system.  If using the accounting infrastructure this plugin
1208              may not be of interest since the information here is  redundant.
1209              The value "jobcomp/elasticsearch" indicates that a record of the
1210              job should be written to an Elasticsearch  server  specified  by
1211              the  JobCompLoc  parameter.  NOTE: More information is available
1212              at the Slurm web site <https://slurm.schedmd.com/elasticsearch.html>.
1213              The value "jobcomp/filetxt" indicates that a
1214              record of the job should be written to a text file specified  by
1215              the  JobCompLoc  parameter.   The  value "jobcomp/lua" indicates
1216              that a record of the job should be processed by the "jobcomp.lua"
1217              script  located  in  the default script directory (typically the
1218              subdirectory "etc" of the installation  directory).   The  value
1219              "jobcomp/mysql"  indicates  that  a  record of the job should be
1220              written to a MySQL or MariaDB database specified by the  JobCom‐
1221              pLoc  parameter.   The  value  "jobcomp/script" indicates that a
1222              script specified by the JobCompLoc parameter is to  be  executed
1223              with environment variables indicating the job information.
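
                  For example, a minimal text-file setup (the path below is
                  only a placeholder and must be writable by Slurm) might look
                  like:

                  JobCompType=jobcomp/filetxt
                  JobCompLoc=/var/log/slurm/job_completions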
1224
1225       JobCompUser
1226              The  user  account  for  accessing  the job completion database.
1227              Only used for database type storage plugins, ignored  otherwise.
1228              Also see DefaultStorageUser.
1229
1230
1231       JobContainerType
1232              Identifies  the  plugin to be used for job tracking.  The slurmd
1233              daemon must be restarted for a  change  in  JobContainerType  to
1234              take  effect.  NOTE: The JobContainerType applies to a job allo‐
1235              cation, while ProctrackType applies to  job  steps.   Acceptable
1236              values at present include:
1237
1238              job_container/cncu  Used  only  for Cray systems (CNCU = Compute
1239                                  Node Clean Up)
1240
1241              job_container/none  Used for all other system types
1242
1243              job_container/tmpfs Used to create a private  namespace  on  the
1244                                  filesystem  for jobs, which houses temporary
1245                                  file systems (/tmp and  /dev/shm)  for  each
1246                                  job.
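
                  For example, to give each job private /tmp and /dev/shm via
                  the tmpfs plugin (a sketch; additional plugin configuration
                  may be required):

                  JobContainerType=job_container/tmpfs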
1247
1248
1249       JobFileAppend
1250              This  option controls what to do if a job's output or error file
1251              exist when the job is started.  If JobFileAppend  is  set  to  a
1252              value  of  1, then append to the existing file.  By default, any
1253              existing file is truncated.
1254
1255
1256       JobRequeue
1257              This option controls the default ability for batch  jobs  to  be
1258              requeued.   Jobs may be requeued explicitly by a system adminis‐
1259              trator, after node failure, or upon preemption by a higher  pri‐
1260              ority job.  If JobRequeue is set to a value of 1, then batch jobs
1261              may be requeued unless explicitly disabled by the user.  If  Jo‐
1262              bRequeue  is set to a value of 0, then batch jobs will not be re‐
1263              queued unless explicitly enabled by the user.   Use  the  sbatch
1264              --no-requeue  or --requeue option to change the default behavior
1265              for individual jobs.  The default value is 1.
1266
1267
1268       JobSubmitPlugins
1269              A comma delimited list of job submission  plugins  to  be  used.
1270              The  specified  plugins  will  be  executed in the order listed.
1271              These are intended to be site-specific plugins which can be used
1272              to  set  default  job  parameters and/or logging events.  Sample
1273              plugins available in the distribution include  "all_partitions",
1274              "defaults",  "logging", "lua", and "partition".  For examples of
1275              use, see the Slurm code in  "src/plugins/job_submit"  and  "con‐
1276              tribs/lua/job_submit*.lua"  then modify the code to satisfy your
1277              needs.  Slurm can be configured to use multiple job_submit plug‐
1278              ins if desired, however the lua plugin will only execute one lua
1279              script named "job_submit.lua" located in the default script  di‐
1280              rectory  (typically  the  subdirectory "etc" of the installation
1281              directory).  No job submission plugins are used by default.
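
                  For example, a site using a job_submit.lua script in the
                  default script directory would typically enable it with:

                  JobSubmitPlugins=lua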
1282
1283
1284       KeepAliveTime
1285              Specifies how long socket communications used between the  srun
1286              command  and its slurmstepd process are kept alive after discon‐
1287              nect.  Longer values can be used to improve reliability of  com‐
1288              munications in the event of network failures.  By default, the
1289              system default value is used.  The value may not exceed
1290              65533.
1291
1292
1293       KillOnBadExit
1294              If  set  to 1, a step will be terminated immediately if any task
1295              crashes or is aborted, as indicated by a non-zero  exit  code.
1296              With the default value of 0, if one of the processes crashes or
1297              is aborted, the other processes will continue to run  while  the
1298              crashed  or  aborted  process  waits. The user can override this
1299              configuration parameter by using srun's -K, --kill-on-bad-exit.
1300
1301
1302       KillWait
1303              The interval, in seconds, given to a job's processes between the
1304              SIGTERM  and  SIGKILL  signals upon reaching its time limit.  If
1305              the job fails to terminate gracefully in the interval specified,
1306              it  will  be  forcibly terminated.  The default value is 30 sec‐
1307              onds.  The value may not exceed 65533.
1308
1309
1310       NodeFeaturesPlugins
1311              Identifies the plugins to be used for support of  node  features
1312              which  can  change through time. For example, a node which might
1313              be booted with various BIOS settings. This is supported  through
1314              the  use  of a node's active_features and available_features in‐
1315              formation.  Acceptable values at present include:
1316
1317              node_features/knl_cray
1318                                  used only for Intel Knights Landing  proces‐
1319                                  sors (KNL) on Cray systems
1320
1321              node_features/knl_generic
1322                                  used  for  Intel  Knights Landing processors
1323                                  (KNL) on a generic Linux system
1324
1325
1326       LaunchParameters
1327              Identifies options to the job launch plugin.  Acceptable  values
1328              include:
1329
1330              batch_step_set_cpu_freq Set the cpu frequency for the batch step
1331                                      from  given  --cpu-freq,  or  slurm.conf
1332                                      CpuFreqDef,  option.   By  default  only
1333                                      steps started with srun will utilize the
1334                                      cpu freq setting options.
1335
1336                                      NOTE:  If  you  are using srun to launch
1337                                      your steps inside a  batch  script  (ad‐
1338                                      vised)  this option will create a situa‐
1339                                      tion where you may have multiple  agents
1340                                      setting  the  cpu_freq as the batch step
1341                                      usually runs on the same resources as one
1342                                      or more of the steps that the sruns in
1343                                      the script will create.
1344
1345              cray_net_exclusive      Allow jobs on a Cray Native cluster  ex‐
1346                                      clusive  access  to  network  resources.
1347                                      This should only be set on clusters pro‐
1348                                      viding  exclusive access to each node to
1349                                      a single job at once, and not using par‐
1350                                      allel  steps  within  the job, otherwise
1351                                      resources on the node  can  be  oversub‐
1352                                      scribed.
1353
1354              enable_nss_slurm        Permits  passwd and group resolution for
1355                                      a  job  to  be  serviced  by  slurmstepd
1356                                      rather  than  requiring  a lookup from a
1357                                      network     based      service.      See
1358                                      https://slurm.schedmd.com/nss_slurm.html
1359                                      for more information.
1360
1361              lustre_no_flush         If set on a Cray Native cluster, then do
1362                                      not  flush  the Lustre cache on job step
1363                                      completion. This setting will only  take
1364                                      effect  after  reconfiguring,  and  will
1365                                      only  take  effect  for  newly  launched
1366                                      jobs.
1367
1368              mem_sort                Sort NUMA memory at step start. User can
1369                                      override     this      default      with
1370                                      SLURM_MEM_BIND  environment  variable or
1371                                      --mem-bind=nosort command line option.
1372
1373              mpir_use_nodeaddr       When launching tasks Slurm  creates  en‐
1374                                      tries in MPIR_proctable that are used by
1375                                      parallel debuggers, profilers,  and  re‐
1376                                      lated   tools   to   attach  to  running
1377                                      processes.  By default the MPIR_proctable
1378                                      entries contain MPIR_procdesc structures
1379                                      where the host_name is set  to  NodeName
1380                                      by default. If this option is specified,
1381                                      NodeAddr will be used  in  this  context
1382                                      instead.
1383
1384              disable_send_gids       By  default,  the slurmctld will look up
1385                                      and send the user_name and extended gids
1386                                      for  a job, rather than independently on
1387                                      each node as part of each  task  launch.
1388                                      This  helps  mitigate issues around name
1389                                      service scalability when launching  jobs
1390                                      involving  many nodes. Using this option
1391                                      will disable  this  functionality.  This
1392                                      option is ignored if enable_nss_slurm is
1393                                      specified.
1394
1395              slurmstepd_memlock      Lock the  slurmstepd  process's  current
1396                                      memory in RAM.
1397
1398              slurmstepd_memlock_all  Lock  the  slurmstepd  process's current
1399                                      and future memory in RAM.
1400
1401              test_exec               Have srun verify existence of  the  exe‐
1402                                      cutable  program along with user execute
1403                                      permission on the node  where  srun  was
1404                                      called before attempting to launch it on
1405                                      nodes in the step.
1406
1407              use_interactive_step    Have salloc use the Interactive Step  to
1408                                      launch  a  shell on an allocated compute
1409                                      node rather  than  locally  to  wherever
1410                                      salloc was invoked. This is accomplished
1411                                      by launching the srun command  with  In‐
1412                                      teractiveStepOptions as options.
1413
1414                                      This  does not affect salloc called with
1415                                      a command as  an  argument.  These  jobs
1416                                      will  continue  to  be  executed  as the
1417                                      calling user on the calling host.
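
                  Options can generally be combined in a comma-separated list;
                  an illustrative (not prescriptive) example is:

                  LaunchParameters=enable_nss_slurm,test_exec,use_interactive_step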
1418
1419
1420       LaunchType
1421              Identifies the mechanism to be used to launch application tasks.
1422              Acceptable values include:
1423
1424              launch/slurm
1425                     The default value.
1426
1427
1428       Licenses
1429              Specification  of  licenses (or other resources available on all
1430              nodes of the cluster) which can be allocated to  jobs.   License
1431              names can optionally be followed by a colon and count with a de‐
1432              fault count of one.  Multiple license names should be comma sep‐
1433              arated  (e.g.   "Licenses=foo:4,bar").  Note that Slurm prevents
1434              jobs from being scheduled if their required  license  specifica‐
1435              tion  is  not available.  Slurm does not prevent jobs from using
1436              licenses that are not explicitly listed in  the  job  submission
1437              specification.
1438
1439
1440       LogTimeFormat
1441              Format  of  the timestamp in slurmctld and slurmd log files. Ac‐
1442              cepted   values   are   "iso8601",   "iso8601_ms",    "rfc5424",
1443              "rfc5424_ms",  "clock", "short" and "thread_id". The values end‐
1444              ing in "_ms" differ from the ones  without  in  that  fractional
1445              seconds  with  millisecond  precision  are  printed. The default
1446              value is "iso8601_ms". The "rfc5424" formats are the same as the
1447              "iso8601"  formats except that the timezone value is also shown.
1448              The "clock" format shows a timestamp in  microseconds  retrieved
1449              with  the  C  standard clock() function. The "short" format is a
1450              short date and time format. The  "thread_id"  format  shows  the
1451              timestamp  in  the  C standard ctime() function form without the
1452              year but including the microseconds, the daemon's process ID and
1453              the current thread name and ID.
1454
1455
1456       MailDomain
1457              Domain name to qualify usernames if email address is not explic‐
1458              itly given with the "--mail-user" option. If  unset,  the  local
1459              MTA will need to qualify local addresses itself. Changes to Mail‐
1460              Domain will only affect new jobs.
1461
1462
1463       MailProg
1464              Fully qualified pathname to the program used to send  email  per
1465              user   request.    The   default   value   is   "/bin/mail"  (or
1466              "/usr/bin/mail"   if   "/bin/mail"   does    not    exist    but
1467              "/usr/bin/mail" does exist).
1468
1469
1470       MaxArraySize
1471              The  maximum  job  array  task index value will be one less than
1472              MaxArraySize to allow for an index  value  of  zero.   Configure
1473              MaxArraySize  to 0 in order to disable job array use.  The value
1474              may not exceed 4000001.  The value of MaxJobCount should be much
1475              larger  than MaxArraySize.  The default value is 1001.  See also
1476              max_array_tasks in SchedulerParameters.
1477
1478
1479       MaxDBDMsgs
1480              When communication to the SlurmDBD is not possible the slurmctld
1481              will queue messages meant to be processed when the SlurmDBD is
1482              available again.  In order to avoid running out  of  memory  the
1483              slurmctld will only queue so many messages. The default value is
1484              10000, or MaxJobCount *  2  +  Node  Count  *  4,  whichever  is
1485              greater.  The value can not be less than 10000.
1486
1487
1488       MaxJobCount
1489              The maximum number of jobs Slurm can have in its active database
1490              at one time. Set the values of MaxJobCount and MinJobAge to  en‐
1491              sure  the  slurmctld daemon does not exhaust its memory or other
1492              resources. Once this limit is reached, requests to submit  addi‐
1493              tional  jobs  will fail. The default value is 10000 jobs.  NOTE:
1494              Each task of a job array counts as one job even though they will
1495              not  occupy  separate  job  records until modified or initiated.
1496              Performance can suffer with more than  a  few  hundred  thousand
1497              jobs.   Setting MaxSubmitJobs per user is generally valuable
1498              to prevent a single user from  filling  the  system  with  jobs.
1499              This  is accomplished using Slurm's database and configuring en‐
1500              forcement of resource limits.  This value may not be  reset  via
1501              "scontrol  reconfig".   It only takes effect upon restart of the
1502              slurmctld daemon.
1503
1504
1505       MaxJobId
1506              The maximum job id to be used for jobs submitted to Slurm  with‐
1507              out a specific requested value. Job ids are unsigned 32bit inte‐
1508              gers with the first 26 bits reserved for local job ids  and  the
1509              remaining  6 bits reserved for a cluster id to identify a feder‐
1510              ated  job's  origin.  The  maximum  allowed  local  job  id   is
1511              67,108,863   (0x3FFFFFF).   The   default  value  is  67,043,328
1512              (0x03ff0000).  MaxJobId only applies to the local job id and not
1513              the  federated  job  id.  Job id values generated will be incre‐
1514              mented by 1 for each subsequent job. Once MaxJobId  is  reached,
1515              the  next  job will be assigned FirstJobId.  Federated jobs will
1516              always have a job ID of 67,108,865 or higher.  Also see FirstJo‐
1517              bId.
1518
1519
1520       MaxMemPerCPU
1521              Maximum   real  memory  size  available  per  allocated  CPU  in
1522              megabytes.  Used to avoid over-subscribing  memory  and  causing
1523              paging.  MaxMemPerCPU would generally be used if individual pro‐
1524              cessors are allocated to jobs (SelectType=select/cons_res or Se‐
1525              lectType=select/cons_tres).  The default value is 0 (unlimited).
1526              Also see DefMemPerCPU, DefMemPerGPU and MaxMemPerNode.   MaxMem‐
1527              PerCPU and MaxMemPerNode are mutually exclusive.
1528
1529              NOTE:  If  a  job  specifies a memory per CPU limit that exceeds
1530              this system limit, that job's count of CPUs per task will try to
1531              automatically  increase.  This may result in the job failing due
1532              to CPU count limits. This auto-adjustment feature is a  best-ef‐
1533              fort  one  and  optimal  assignment is not guaranteed due to the
1534              possibility of having heterogeneous  configurations  and  multi-
1535              partition/qos jobs.  If this is a concern it is advised to use a
1536              job submit LUA plugin instead  to  enforce  auto-adjustments  to
1537              your specific needs.
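
                  As an illustration only (the 4096 MB figure is arbitrary), a
                  cluster allocating individual processors might use:

                  SelectType=select/cons_tres
                  MaxMemPerCPU=4096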
1538
1539
1540       MaxMemPerNode
1541              Maximum  real  memory  size  available  per  allocated  node  in
1542              megabytes.  Used to avoid over-subscribing  memory  and  causing
1543              paging.   MaxMemPerNode  would  generally be used if whole nodes
1544              are allocated to jobs (SelectType=select/linear)  and  resources
1545              are  over-subscribed (OverSubscribe=yes or OverSubscribe=force).
1546              The default value is 0 (unlimited).  Also see DefMemPerNode  and
1547              MaxMemPerCPU.   MaxMemPerCPU  and MaxMemPerNode are mutually ex‐
1548              clusive.
1549
1550
1551       MaxStepCount
1552              The maximum number of steps that any job can initiate. This  pa‐
1553              rameter  is  intended  to limit the effect of bad batch scripts.
1554              The default value is 40000 steps.
1555
1556
1557       MaxTasksPerNode
1558              Maximum number of tasks Slurm will allow a job step to spawn  on
1559              a  single node. The default MaxTasksPerNode is 512.  May not ex‐
1560              ceed 65533.
1561
1562
1563       MCSParameters
1564              MCS = Multi-Category Security MCS Plugin Parameters.   The  sup‐
1565              ported  parameters  are  specific  to the MCSPlugin.  Changes to
1566              this value take effect when the Slurm daemons are  reconfigured.
1567              More     information     about    MCS    is    available    here
1568              <https://slurm.schedmd.com/mcs.html>.
1569
1570
1571       MCSPlugin
1572              MCS = Multi-Category Security : associate a  security  label  to
1573              jobs  and  ensure that nodes can only be shared among jobs using
1574              the same security label.  Acceptable values include:
1575
1576              mcs/none    is the default value.  No security label  associated
1577                          with  jobs,  no particular security restriction when
1578                          sharing nodes among jobs.
1579
1580              mcs/account only users with the same account can share the nodes
1581                          (requires enabling of accounting).
1582
1583              mcs/group   only users with the same group can share the nodes.
1584
1585              mcs/user    a node cannot be shared with other users.
1586
1587
1588       MessageTimeout
1589              Time  permitted  for  a  round-trip communication to complete in
1590              seconds. Default value is 10 seconds. For  systems  with  shared
1591              nodes,  the  slurmd  daemon  could  be paged out and necessitate
1592              higher values.
1593
1594
1595       MinJobAge
1596              The minimum age of a completed job before its record  is  purged
1597              from  Slurm's active database. Set the values of MaxJobCount and
1598              MinJobAge to ensure the slurmctld daemon does not exhaust its memory or
1599              other  resources.  The default value is 300 seconds.  A value of
1600              zero prevents any job record purging.  Jobs are not purged  dur‐
1601              ing  a backfill cycle, so it can take longer than MinJobAge sec‐
1602              onds to purge a job if using the backfill scheduling plugin.  In
1603              order  to  eliminate  some possible race conditions, the minimum
1604              non-zero value for MinJobAge recommended is 2.
1605
1606
1607       MpiDefault
1608              Identifies the default type of MPI to be used.  Srun  may  over‐
1609              ride  this  configuration parameter in any case.  Currently sup‐
1610              ported versions include: pmi2, pmix, and  none  (default,  which
1611              works  for  many other versions of MPI).  More information about
1612              MPI          use           is           available           here
1613              <https://slurm.schedmd.com/mpi_guide.html>.
1614
1615
1616       MpiParams
1617              MPI  parameters.   Used to identify ports used by older versions
1618              of OpenMPI  and  native  Cray  systems.   The  input  format  is
1619              "ports=12000-12999"  to  identify a range of communication ports
1620              to be used.  NOTE: This is not needed  for  modern  versions  of
1621              OpenMPI;  removing  it  can provide a small boost in scheduling
1622              performance.  NOTE: This is required for Cray's PMI.
1623
1624
1625       OverTimeLimit
1626              Number of minutes by which a job can exceed its time  limit  be‐
1627              fore  being canceled.  Normally a job's time limit is treated as
1628              a hard limit and the job  will  be  killed  upon  reaching  that
1629              limit.   Configuring OverTimeLimit will result in the job's time
1630              limit being treated like a soft limit.  Adding the OverTimeLimit
1631              value  to  the  soft  time  limit provides a hard time limit, at
1632              which point the job is canceled.  This  is  particularly  useful
1633              for  backfill  scheduling, which bases upon each job's soft time
1634              limit.  The default value is zero.  May not  exceed  65533  min‐
1635              utes.  A value of "UNLIMITED" is also supported.
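
                  For example, to treat time limits as soft limits with a five
                  minute grace period (an illustrative value):

                  OverTimeLimit=5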
1636
1637
1638       PluginDir
1639              Identifies  the places in which to look for Slurm plugins.  This
1640              is a colon-separated list of directories, like the PATH environ‐
1641              ment variable.  The default value is the prefix given at config‐
1642              ure time + "/lib/slurm".
1643
1644
1645       PlugStackConfig
1646              Location of the config file for Slurm stackable plugins that use
1647              the  Stackable  Plugin  Architecture  for  Node  job  (K)control
1648              (SPANK).  This provides support for a highly configurable set of
1649              plugins  to be called before and/or after execution of each task
1650              spawned as part of a  user's  job  step.   Default  location  is
1651              "plugstack.conf" in the same directory as the system slurm.conf.
1652              For more information on SPANK plugins, see the spank(8) manual.
1653
1654
1655       PowerParameters
1656              System power management parameters.   The  supported  parameters
1657              are specific to the PowerPlugin.  Changes to this value take ef‐
1658              fect when the Slurm daemons are reconfigured.  More  information
1659              about    system    power    management    is    available   here
1660              <https://slurm.schedmd.com/power_mgmt.html>.  Options  currently
1661              supported by any plugins are listed below.
1662
1663              balance_interval=#
1664                     Specifies the time interval, in seconds, between attempts
1665                     to rebalance power caps across the nodes.  This also con‐
1666                     trols  the  frequency  at which Slurm attempts to collect
1667                     current power consumption data (old data may be used  un‐
1668                     til new data is available from the underlying infrastruc‐
1669                     ture and values below 10 seconds are not recommended  for
1670                     Cray  systems).   The  default value is 30 seconds.  Sup‐
1671                     ported by the power/cray_aries plugin.
1672
1673              capmc_path=
1674                     Specifies the absolute path of the  capmc  command.   The
1675                     default   value  is  "/opt/cray/capmc/default/bin/capmc".
1676                     Supported by the power/cray_aries plugin.
1677
1678              cap_watts=#
1679                     Specifies the total power limit to be established  across
1680                     all  compute  nodes  managed by Slurm.  A value of 0 sets
1681                     every compute node to have an unlimited cap.  The default
1682                     value is 0.  Supported by the power/cray_aries plugin.
1683
1684              decrease_rate=#
1685                     Specifies the maximum rate of change in the power cap for
1686                     a node where the actual power usage is  below  the  power
1687                     cap  by  an  amount greater than lower_threshold (see be‐
1688                     low).  Value represents a percentage  of  the  difference
1689                     between  a  node's minimum and maximum power consumption.
1690                     The default  value  is  50  percent.   Supported  by  the
1691                     power/cray_aries plugin.
1692
1693              get_timeout=#
1694                     Amount  of time allowed to get power state information in
1695                     milliseconds.  The default value is 5,000 milliseconds or
1696                     5  seconds.  Supported by the power/cray_aries plugin and
1697                     represents the time allowed for the capmc command to  re‐
1698                     spond to various "get" options.
1699
1700              increase_rate=#
1701                     Specifies the maximum rate of change in the power cap for
1702                     a node  where  the  actual  power  usage  is  within  up‐
1703                     per_threshold (see below) of the power cap.  Value repre‐
1704                     sents a percentage of the  difference  between  a  node's
1705                     minimum and maximum power consumption.  The default value
1706                     is 20 percent.  Supported by the power/cray_aries plugin.
1707
1708              job_level
1709                     All nodes associated with every job will  have  the  same
1710                     power   cap,  to  the  extent  possible.   Also  see  the
1711                     --power=level option on the job submission commands.
1712
1713              job_no_level
1714                     Disable the user's ability to set every  node  associated
1715                     with  a  job  to the same power cap.  Each node will have
1716                     its power  cap  set  independently.   This  disables  the
1717                     --power=level option on the job submission commands.
1718
1719              lower_threshold=#
1720                     Specify a lower power consumption threshold.  If a node's
1721                     current power consumption is below this percentage of its
1722                     current cap, then its power cap will be reduced.  The de‐
1723                     fault  value   is   90   percent.    Supported   by   the
1724                     power/cray_aries plugin.
1725
1726              recent_job=#
1727                     If  a job has started or resumed execution (from suspend)
1728                     on a compute node within this number of seconds from  the
1729                     current  time,  the node's power cap will be increased to
1730                     the maximum.  The default value  is  300  seconds.   Sup‐
1731                     ported by the power/cray_aries plugin.
1732
1733
1734              set_timeout=#
1735                     Amount  of time allowed to set power state information in
1736                     milliseconds.  The default value is  30,000  milliseconds
1737              or  30  seconds.   Supported by the power/cray_aries plugin and
1738                     represents the time allowed for the capmc command to  re‐
1739                     spond to various "set" options.
1740
1741              set_watts=#
1742                     Specifies  the  power  limit  to  be set on every compute
1743                     node managed by Slurm.  Every node gets this same  power
1744                     cap and there is no variation through time based upon ac‐
1745                     tual  power  usage  on  the  node.   Supported   by   the
1746                     power/cray_aries plugin.
1747
1748              upper_threshold=#
1749                     Specify  an  upper  power  consumption  threshold.   If a
1750                     node's current power consumption is above this percentage
1751                     of  its current cap, then its power cap will be increased
1752                     to the extent possible.  The default value is 95 percent.
1753                     Supported by the power/cray_aries plugin.
1754
1755
1756       PowerPlugin
1757              Identifies  the  plugin  used for system power management.  Cur‐
1758              rently supported plugins include: cray_aries and none.   Changes
1759              to  this  value require restarting Slurm daemons to take effect.
1760              More information about system power management is available here
1761              <https://slurm.schedmd.com/power_mgmt.html>.    By  default,  no
1762              power plugin is loaded.
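
                  An illustrative power management setup for a Cray system
                  (all values are examples only) might be:

                  PowerPlugin=power/cray_aries
                  PowerParameters=balance_interval=60,cap_watts=500000,job_level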
1763
1764
1765       PreemptMode
1766              Mechanism used to preempt jobs or enable gang  scheduling.  When
1767              the  PreemptType parameter is set to enable preemption, the Pre‐
1768              emptMode selects the default mechanism used to preempt the  eli‐
1769              gible jobs for the cluster.
1770              PreemptMode  may  be specified on a per partition basis to over‐
1771              ride this default value  if  PreemptType=preempt/partition_prio.
1772              Alternatively,  it  can  be specified on a per QOS basis if Pre‐
1773              emptType=preempt/qos. In either case, a valid  default  Preempt‐
1774              Mode  value  must  be  specified for the cluster as a whole when
1775              preemption is enabled.
1776              The GANG option is used to enable gang scheduling independent of
1777              whether  preemption is enabled (i.e. independent of the Preempt‐
1778              Type setting). It can be specified in addition to a  PreemptMode
1779              setting  with  the  two  options  comma separated (e.g. Preempt‐
1780              Mode=SUSPEND,GANG).
1781              See         <https://slurm.schedmd.com/preempt.html>         and
1782              <https://slurm.schedmd.com/gang_scheduling.html>  for  more  de‐
1783              tails.
1784
1785              NOTE: For performance reasons, the backfill  scheduler  reserves
1786              whole  nodes  for  jobs,  not  partial nodes. If during backfill
1787              scheduling a job preempts one or  more  other  jobs,  the  whole
1788              nodes  for  those  preempted jobs are reserved for the preemptor
1789              job, even if the preemptor job requested  fewer  resources  than
1790              that.   These reserved nodes aren't available to other jobs dur‐
1791              ing that backfill cycle, even if the other jobs could fit on the
1792              nodes.  Therefore, jobs may preempt more resources during a sin‐
1793              gle backfill iteration than they requested.
1794
1795              NOTE: For a heterogeneous job to be considered for preemption all
1796              components must be eligible for preemption. When a heterogeneous
1797              job is to be preempted the first identified component of the job
1798              with  the highest order PreemptMode (SUSPEND (highest), REQUEUE,
1799              CANCEL (lowest)) will be used to set  the  PreemptMode  for  all
1800              components.  The GraceTime and user warning signal for each com‐
1801              ponent of the heterogeneous job  remain  unique.   Heterogeneous
1802              jobs are excluded from GANG scheduling operations.
1803
1804              OFF         Is the default value and disables job preemption and
1805                          gang scheduling.  It is only  compatible  with  Pre‐
1806                          emptType=preempt/none  at  a global level.  A common
1807                          use case for this parameter is to set it on a parti‐
1808                          tion to disable preemption for that partition.
1809
1810              CANCEL      The preempted job will be cancelled.
1811
1812              GANG        Enables  gang  scheduling  (time slicing) of jobs in
1813                          the same partition, and allows the resuming of  sus‐
1814                          pended jobs.
1815
1816                          NOTE: Gang scheduling is performed independently for
1817                          each partition, so if you only want time-slicing  by
1818                          OverSubscribe,  without any preemption, then config‐
1819                          uring partitions with overlapping nodes is not  rec‐
1820                          ommended.   On  the  other  hand, if you want to use
1821                          PreemptType=preempt/partition_prio  to  allow   jobs
1822                          from  higher PriorityTier partitions to Suspend jobs
1823                          from lower PriorityTier  partitions  you  will  need
1824                          overlapping partitions, and PreemptMode=SUSPEND,GANG
1825                          to use the Gang scheduler to  resume  the  suspended
1826                          job(s).  In any case, time-slicing won't happen be‐
1827                          tween jobs on different partitions.
1828
1829                          NOTE: Heterogeneous  jobs  are  excluded  from  GANG
1830                          scheduling operations.
1831
1832              REQUEUE     Preempts  jobs  by  requeuing  them (if possible) or
1833                          canceling them.  For jobs to be requeued  they  must
1834                          have  the --requeue sbatch option set or the cluster
1835                          wide JobRequeue parameter in slurm.conf must be  set
1836                          to one.
1837
1838              SUSPEND     The  preempted jobs will be suspended, and later the
1839                          Gang scheduler will resume them. Therefore the  SUS‐
1840                          PEND preemption mode always needs the GANG option to
1841                          be specified at the cluster level. Also, because the
1842                          suspended  jobs  will  still use memory on the allo‐
1843                          cated nodes, Slurm needs to be able to track  memory
1844                          resources to be able to suspend jobs.
1845
1846                          NOTE:  Because gang scheduling is performed indepen‐
1847                          dently for each partition, if using PreemptType=pre‐
1848                          empt/partition_prio then jobs in higher PriorityTier
1849                          partitions will suspend jobs in  lower  PriorityTier
1850                          partitions  to  run  on the released resources. Only
1851                          when the preemptor job ends will the suspended  jobs
1852                          be resumed by the Gang scheduler.
1853                          If  PreemptType=preempt/qos is configured and if the
1854                          preempted job(s) and the preemptor job  are  on  the
1855                          same  partition, then they will share resources with
1856                          the Gang scheduler (time-slicing). If not  (i.e.  if
1857                          the preemptees and preemptor are on different parti‐
1858                          tions) then the preempted jobs will remain suspended
1859                          until the preemptor ends.
1860
1861
1862       PreemptType
1863              Specifies  the  plugin  used  to identify which jobs can be pre‐
1864              empted in order to start a pending job.
1865
1866              preempt/none
1867                     Job preemption is disabled.  This is the default.
1868
1869              preempt/partition_prio
1870                     Job preemption  is  based  upon  partition  PriorityTier.
1871                     Jobs  in  higher PriorityTier partitions may preempt jobs
1872                     from lower PriorityTier partitions.  This is not compati‐
1873                     ble with PreemptMode=OFF.
1874
1875              preempt/qos
1876                     Job  preemption rules are specified by Quality Of Service
1877                     (QOS) specifications in the Slurm database.  This  option
1878                     is  not compatible with PreemptMode=OFF.  A configuration
1879                     of PreemptMode=SUSPEND is only supported by  the  Select‐
1880                     Type=select/cons_res    and   SelectType=select/cons_tres
1881                     plugins.  See the sacctmgr man page to configure the  op‐
1882                     tions for preempt/qos.
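
                  For example, partition-priority preemption with suspended
                  and gang-scheduled preemptees could be configured as:

                  PreemptType=preempt/partition_prio
                  PreemptMode=SUSPEND,GANG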
1883
1884
1885       PreemptExemptTime
1886              Global  option for minimum run time for all jobs before they can
1887              be considered for preemption. Any  QOS  PreemptExemptTime  takes
1888              precedence  over  the  global option.  A time of -1 disables the
1889              option, equivalent to 0. Acceptable time formats  include  "min‐
1890              utes", "minutes:seconds", "hours:minutes:seconds", "days-hours",
1891              "days-hours:minutes", and "days-hours:minutes:seconds".
1892
1893
1894       PrEpParameters
1895              Parameters to be passed to the PrEpPlugins.
1896
1897
1898       PrEpPlugins
1899              A resource for programmers wishing to write  their  own  plugins
1900              for  the Prolog and Epilog (PrEp) scripts. The default, and cur‐
1901              rently the only implemented plugin  is  prep/script.  Additional
1902              plugins can be specified in a comma-separated list. For more in‐
1903              formation please see the PrEp  Plugin  API  documentation  page:
1904              <https://slurm.schedmd.com/prep_plugins.html>
1905
1906
1907       PriorityCalcPeriod
1908              The  period of time in minutes in which the half-life decay will
1909              be re-calculated.  Applicable only if PriorityType=priority/mul‐
1910              tifactor.  The default value is 5 (minutes).
1911
1912
1913       PriorityDecayHalfLife
1914              This  controls  how long prior resource use is considered in de‐
1915              termining how over- or under-serviced an association  is  (user,
1916              bank  account  and  cluster)  in  determining job priority.  The
1917              record of usage will be decayed over  time,  with  half  of  the
1918              original  value cleared at age PriorityDecayHalfLife.  If set to
1919              0 no decay will be applied.  This is helpful if you want to  en‐
1920              force  hard  time  limits  per association.  If set to 0 Priori‐
1921              tyUsageResetPeriod must be set  to  some  interval.   Applicable
1922              only  if  PriorityType=priority/multifactor.  The unit is a time
1923              string (i.e. min, hr:min:00, days-hr:min:00, or  days-hr).   The
1924              default value is 7-0 (7 days).
1925
1926
1927       PriorityFavorSmall
1928              Specifies  that small jobs should be given preferential schedul‐
1929              ing priority.  Applicable only  if  PriorityType=priority/multi‐
1930              factor.  Supported values are "YES" and "NO".  The default value
1931              is "NO".
1932
1933
1934       PriorityFlags
1935              Flags to modify priority behavior.  Applicable only if Priority‐
1936              Type=priority/multifactor.   The  keywords below have no associ‐
1937              ated   value   (e.g.    "PriorityFlags=ACCRUE_ALWAYS,SMALL_RELA‐
1938              TIVE_TO_TIME").
1939
1940              ACCRUE_ALWAYS    If  set,  priority age factor will be increased
1941                               despite job dependencies or holds.
1942
1943              CALCULATE_RUNNING
1944                               If set, priorities  will  be  recalculated  not
1945                               only  for  pending  jobs,  but also running and
1946                               suspended jobs.
1947
1948              DEPTH_OBLIVIOUS  If set, priority will be calculated similarly
1949                               to the normal multifactor calculation, but the
1950                               depth of the associations in the tree does  not
1951                               adversely  affect  their priority. This option
1952                               automatically enables NO_FAIR_TREE.
1953
1954              NO_FAIR_TREE     Disables the "fair tree" algorithm, and reverts
1955                               to "classic" fair share priority scheduling.
1956
1957              INCR_ONLY        If  set,  priority values will only increase in
1958                               value. Job  priority  will  never  decrease  in
1959                               value.
1960
1961              MAX_TRES         If  set,  the  weighted  TRES value (e.g. TRES‐
1962                               BillingWeights) is calculated as the MAX of in‐
1963                               dividual TRES' on a node (e.g. cpus, mem, gres)
1964                               plus the sum of  all  global  TRES'  (e.g.  li‐
1965                               censes).
1966
1967              NO_NORMAL_ALL    If set, all NO_NORMAL_* flags are set.
1968
1969              NO_NORMAL_ASSOC  If  set,  the association factor is not normal‐
1970                               ized against the highest association priority.
1971
1972              NO_NORMAL_PART   If set, the partition factor is not  normalized
1973                               against  the  highest partition PriorityJobFac‐
1974                               tor.
1975
1976              NO_NORMAL_QOS    If  set,  the  QOS  factor  is  not  normalized
1977                               against the highest qos priority.
1978
1979              NO_NORMAL_TRES   If  set,  the  TRES  factor  is  not normalized
1980                               against the job's partition TRES counts.
1981
1982              SMALL_RELATIVE_TO_TIME
1983                               If set, the job's size component will be  based
1984                               upon not the job size alone, but the job's size
1985                               divided by its time limit.
1986
1987
1988       PriorityMaxAge
1989              Specifies the job age which will be given the maximum age factor
1990              in  computing priority. For example, a value of 30 minutes would
1991              result in all jobs over 30 minutes  old  getting  the  same
1992              age-based  priority.   Applicable  only  if  PriorityType=prior‐
1993              ity/multifactor.   The  unit  is  a  time  string   (i.e.   min,
1994              hr:min:00,  days-hr:min:00,  or  days-hr).  The default value is
1995              7-0 (7 days).
1996
1997
1998       PriorityParameters
1999              Arbitrary string used by the PriorityType plugin.
2000
2001
2002       PrioritySiteFactorParameters
2003              Arbitrary string used by the PrioritySiteFactorPlugin plugin.
2004
2005
2006       PrioritySiteFactorPlugin
2007              This specifies an optional plugin to be used alongside  "prior‐
2008              ity/multifactor",  which  is meant to initially set and continu‐
2009              ously update the SiteFactor priority factor.  The default  value
2010              is "site_factor/none".
2011
2012
2013       PriorityType
2014              This  specifies  the  plugin  to be used in establishing a job's
2015              scheduling priority. Supported values are "priority/basic" (jobs
2016              are  prioritized  by  order  of arrival), "priority/multifactor"
2017              (jobs are prioritized based upon size, age, fair-share of  allo‐
2018              cation, etc).  Also see PriorityFlags for configuration options.
2019              The default value is "priority/basic".
2020
2021              When not using FIFO scheduling, jobs are prioritized in the following
2022              order:
2023
2024              1. Jobs that can preempt
2025              2. Jobs with an advanced reservation
2026              3. Partition Priority Tier
2027              4. Job Priority
2028              5. Job Id
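
                  An illustrative multifactor configuration (all weights are
                  arbitrary examples and should be tuned per site) might be:

                  PriorityType=priority/multifactor
                  PriorityDecayHalfLife=7-0
                  PriorityMaxAge=7-0
                  PriorityWeightAge=1000
                  PriorityWeightFairshare=10000
                  PriorityWeightPartition=1000
                  PriorityWeightQOS=1000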
2029
2030
2031       PriorityUsageResetPeriod
2032              At this interval the usage of associations will be reset to 0.
2033              This is used if you want to enforce hard limits of time usage
2034              per association.  If PriorityDecayHalfLife is set to 0, no de‐
2035              cay will happen and this is the only way to reset the usage ac‐
2036              cumulated by running jobs.  By default this is turned off and it
2037              is advised to use the PriorityDecayHalfLife option to avoid a
2038              situation where nothing can run on your cluster, but if your
2039              scheme only allows certain amounts of time on your system, this
2040              is the way to do it.  Applicable only if PriorityType=prior‐
2041              ity/multifactor.
2042
2043              NONE        Never clear historic usage. The default value.
2044
2045              NOW         Clear the historic usage now.  Executed  at  startup
2046                          and reconfiguration time.
2047
2048              DAILY       Cleared every day at midnight.
2049
2050              WEEKLY      Cleared every week on Sunday at time 00:00.
2051
2052              MONTHLY     Cleared  on  the  first  day  of  each month at time
2053                          00:00.
2054
2055              QUARTERLY   Cleared on the first day of  each  quarter  at  time
2056                          00:00.
2057
2058              YEARLY      Cleared on the first day of each year at time 00:00.
2059
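                  For example (illustrative only), a site granting fixed
                  monthly time allocations per association might combine:
                  PriorityDecayHalfLife=0
                  PriorityUsageResetPeriod=MONTHLY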
2060
2061       PriorityWeightAge
2062              An  integer  value  that sets the degree to which the queue wait
2063              time component contributes to the  job's  priority.   Applicable
2064              only  if  PriorityType=priority/multifactor.   Requires Account‐
2065              ingStorageType=accounting_storage/slurmdbd.  The  default  value
2066              is 0.
2067
2068
2069       PriorityWeightAssoc
2070              An  integer  value that sets the degree to which the association
2071              component contributes to the job's priority.  Applicable only if
2072              PriorityType=priority/multifactor.  The default value is 0.
2073
2074
2075       PriorityWeightFairshare
2076              An  integer  value  that sets the degree to which the fair-share
2077              component contributes to the job's priority.  Applicable only if
2078              PriorityType=priority/multifactor.    Requires   AccountingStor‐
2079              ageType=accounting_storage/slurmdbd.  The default value is 0.
2080
2081
2082       PriorityWeightJobSize
2083              An integer value that sets the degree to which the job size com‐
2084              ponent  contributes  to  the job's priority.  Applicable only if
2085              PriorityType=priority/multifactor.  The default value is 0.
2086
2087
2088       PriorityWeightPartition
2089              Partition factor used by priority/multifactor plugin  in  calcu‐
2090              lating  job  priority.   Applicable  only if PriorityType=prior‐
2091              ity/multifactor.  The default value is 0.
2092
2093
2094       PriorityWeightQOS
2095              An integer value that sets the degree to which  the  Quality  Of
2096              Service component contributes to the job's priority.  Applicable
2097              only if PriorityType=priority/multifactor.  The default value is
2098              0.
2099
2100
2101       PriorityWeightTRES
2102              A  comma-separated  list of TRES Types and weights that sets the
2103              degree that each TRES Type contributes to the job's priority.
2104
2105              e.g.
2106              PriorityWeightTRES=CPU=1000,Mem=2000,GRES/gpu=3000
2107
2108              Applicable only if PriorityType=priority/multifactor and if  Ac‐
2109              countingStorageTRES is configured with each TRES Type.  Negative
2110              values are allowed.  The default values are 0.
2111
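                  As a purely illustrative combination of the PriorityWeight*
                  parameters above (actual values are site specific):
                  PriorityType=priority/multifactor
                  PriorityWeightAge=1000
                  PriorityWeightFairshare=10000
                  PriorityWeightPartition=1000
                  PriorityWeightQOS=2000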
2112
2113       PrivateData
2114              This controls what type of information is  hidden  from  regular
2115              users.   By  default,  all  information is visible to all users.
2116              User SlurmUser and root can always view all information.  Multi‐
2117              ple  values may be specified with a comma separator.  Acceptable
2118              values include:
2119
2120              accounts
2121                     (NON-SlurmDBD ACCOUNTING ONLY) Prevents users from  view‐
2122                     ing  any account definitions unless they are coordinators
2123                     of them.
2124
2125              cloud  Powered down nodes in the cloud are visible.
2126
2127              events Prevents users from viewing event information unless they
2128                     have operator status or above.
2129
2130              jobs   Prevents  users  from viewing jobs or job steps belonging
2131                     to other users. (NON-SlurmDBD ACCOUNTING  ONLY)  Prevents
2132                     users  from  viewing job records belonging to other users
2133                     unless they are coordinators of the  association  running
2134                     the job when using sacct.
2135
2136              nodes  Prevents users from viewing node state information.
2137
2138              partitions
2139                     Prevents users from viewing partition state information.
2140
2141              reservations
2142                     Prevents  regular  users  from viewing reservations which
2143                     they can not use.
2144
2145              usage  Prevents users from viewing usage of any other user; this
2146                     applies to sshare.  (NON-SlurmDBD ACCOUNTING ONLY) Pre‐
2147                     vents users from viewing usage of any other user; this
2148                     applies to sreport.
2149
2150              users  (NON-SlurmDBD ACCOUNTING ONLY) Prevents users from view‐
2151                     ing information of any user other than themselves; it
2152                     also restricts users to seeing only the associations
2153                     they are involved with.  Coordinators can see associa‐
2154                     tions of all users in the account they are coordinator
2155                     of, but can only see themselves when listing users.
2156
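                  For example (illustrative only), to hide job, usage and user
                  information from regular users:
                  PrivateData=jobs,usage,users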
2157
2158       ProctrackType
2159              Identifies the plugin to be used for process tracking on  a  job
2160              step  basis.   The slurmd daemon uses this mechanism to identify
2161              all processes which are children of processes it  spawns  for  a
2162              user job step.  The slurmd daemon must be restarted for a change
2163              in ProctrackType to take  effect.   NOTE:  "proctrack/linuxproc"
2164              and  "proctrack/pgid" can fail to identify all processes associ‐
2165              ated with a job since processes can become a child of  the  init
2166              process  (when  the  parent  process terminates) or change their
2167              process  group.   To  reliably  track  all   processes,   "proc‐
2168              track/cgroup" is highly recommended.  NOTE: The JobContainerType
2169              applies to a job allocation, while ProctrackType applies to  job
2170              steps.  Acceptable values at present include:
2171
2172              proctrack/cgroup
2173                     Uses  linux cgroups to constrain and track processes, and
2174                     is the default for systems with cgroup support.
2175                     NOTE: see "man cgroup.conf" for configuration details.
2176
2177              proctrack/cray_aries
2178                     Uses Cray proprietary process tracking.
2179
2180              proctrack/linuxproc
2181                     Uses linux process tree using parent process IDs.
2182
2183              proctrack/pgid
2184                     Uses Process Group IDs.
2185                     NOTE: This is the default for the BSD family.
2186
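                  For example, to use the cgroup-based tracking recommended
                  above:
                  ProctrackType=proctrack/cgroup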
2187
2188       Prolog Fully qualified pathname of a program for the slurmd to  execute
2189              whenever it is asked to run a job step from a new job allocation
2190              (e.g.  "/usr/local/slurm/prolog"). A glob pattern (See glob (7))
2191              may  also  be used to specify more than one program to run (e.g.
2192              "/etc/slurm/prolog.d/*"). The slurmd executes the prolog  before
2193              starting  the  first job step.  The prolog script or scripts may
2194              be used to purge files, enable  user  login,  etc.   By  default
2195              there  is  no  prolog. Any configured script is expected to com‐
2196              plete execution quickly (in less time than MessageTimeout).   If
2197              the  prolog  fails (returns a non-zero exit code), this will re‐
2198              sult in the node being set to a DRAIN state and  the  job  being
2199              requeued  in  a held state, unless nohold_on_prolog_fail is con‐
2200              figured in SchedulerParameters.  See Prolog and  Epilog  Scripts
2201              for more information.
2202
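                  For example, to run every script in a directory using the
                  glob form mentioned above:
                  Prolog=/etc/slurm/prolog.d/*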
2203
2204       PrologEpilogTimeout
2205              The interval in seconds Slurm waits for Prolog and Epilog be‐
2206              fore terminating them. The default behavior is to  wait  indefi‐
2207              nitely.  This  interval  applies to the Prolog and Epilog run by
2208              slurmd daemon before and after the job, the PrologSlurmctld  and
2209              EpilogSlurmctld  run  by slurmctld daemon, and the SPANK plugins
2210              run by the slurmstepd daemon.
2211
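                  For example (illustrative only), to terminate prolog and
                  epilog scripts that run longer than five minutes:
                  PrologEpilogTimeout=300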
2212
2213       PrologFlags
2214              Flags to control the Prolog behavior. By default  no  flags  are
2215              set.  Multiple flags may be specified in a comma-separated list.
2216              Currently supported options are:
2217
2218              Alloc   If set, the Prolog script will be executed at job  allo‐
2219                      cation.  By  default, Prolog is executed just before the
2220                      task is launched. Therefore, when salloc is started,  no
2221                      Prolog is executed. Alloc is useful for preparing things
2222                      before a user starts to use any allocated resources.  In
2223                      particular,  this  flag  is needed on a Cray system when
2224                      cluster compatibility mode is enabled.
2225
2226                      NOTE: Use of the Alloc flag will increase the  time  re‐
2227                      quired to start jobs.
2228
2229              Contain At job allocation time, use the ProcTrack plugin to cre‐
2230                      ate a job container  on  all  allocated  compute  nodes.
2231                      This  container  may  be  used  for  user  processes not
2232                      launched    under    Slurm    control,    for    example
2233                      pam_slurm_adopt  may  place processes launched through a
2234                      direct  user  login  into  this  container.   If   using
2235                      pam_slurm_adopt,  then  ProcTrackType must be set to ei‐
2236                      ther proctrack/cgroup or proctrack/cray_aries.   Setting
2237                      the Contain flag implicitly sets the Alloc flag.
2238
2239              NoHold  If set, the Alloc flag should also be set.  This allows
2240                      salloc to return without blocking until the prolog has
2241                      finished on each node.  Blocking instead occurs when
2242                      steps reach the slurmd, before any execution has hap‐
2243                      pened in the step.  This is a much faster way to work,
2244                      and if you use srun to launch your tasks you should use
2245                      this flag.  This flag cannot be combined with the Con‐
2246                      tain or X11 flags.
2247
2248              Serial  By default, the Prolog and Epilog  scripts  run  concur‐
2249                      rently  on each node.  This flag forces those scripts to
2250                      run serially within each node, but  with  a  significant
2251                      penalty to job throughput on each node.
2252
2253              X11     Enable  Slurm's  built-in  X11  forwarding capabilities.
2254                      This is incompatible with ProctrackType=proctrack/linux‐
2255                      proc.  Setting the X11 flag implicitly enables both Con‐
2256                      tain and Alloc flags as well.
2257
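                  For example (illustrative only), a site using pam_slurm_adopt
                  as described above might set:
                  PrologFlags=Contain
                  Because Contain implies Alloc, the prolog would then run at
                  job allocation time on every allocated node.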
2258
2259       PrologSlurmctld
2260              Fully qualified pathname of a program for the  slurmctld  daemon
2261              to execute before granting a new job allocation (e.g.  "/usr/lo‐
2262              cal/slurm/prolog_controller").  The program  executes  as  Slur‐
2263              mUser on the same node where the slurmctld daemon executes, giv‐
2264              ing it permission to drain nodes and requeue the job if a  fail‐
2265              ure occurs or cancel the job if appropriate.  The program can be
2266              used to reboot nodes or perform other work to prepare  resources
2267              for  use.  Exactly what the program does and how it accomplishes
2268              this is completely at the discretion of the  system  administra‐
2269              tor.   Information  about the job being initiated, its allocated
2270              nodes, etc. are passed to the program  using  environment  vari‐
2271              ables.  While this program is running, the nodes associated with
2272              the job will have a POWER_UP/CONFIGURING flag set in their
2273              state,  which  can be readily viewed.  The slurmctld daemon will
2274              wait indefinitely for this program to complete.  Once  the  pro‐
2275              gram completes with an exit code of zero, the nodes will be con‐
2276              sidered ready for use and the job will be started.  If some
2277              node can not be made available for use, the program should drain
2278              the node (typically using the scontrol  command)  and  terminate
2279              with  a non-zero exit code.  A non-zero exit code will result in
2280              the job being requeued (where possible)  or  killed.  Note  that
2281              only  batch jobs can be requeued.  See Prolog and Epilog Scripts
2282              for more information.
2283
2284
2285       PropagatePrioProcess
2286              Controls the scheduling priority (nice value)  of  user  spawned
2287              tasks.
2288
2289              0    The  tasks  will  inherit  the scheduling priority from the
2290                   slurm daemon.  This is the default value.
2291
2292              1    The tasks will inherit the scheduling priority of the  com‐
2293                   mand used to submit them (e.g. srun or sbatch).  Unless the
2294                   job is submitted by user root, the tasks will have a sched‐
2295                   uling  priority  no  higher  than the slurm daemon spawning
2296                   them.
2297
2298              2    The tasks will inherit the scheduling priority of the  com‐
2299                   mand used to submit them (e.g. srun or sbatch) with the re‐
2300                   striction that their nice value will always be  one  higher
2301                   than the slurm daemon's (i.e. the tasks' scheduling prior‐
2302                   ity will be lower than that of the slurm daemon).
2303
2304
2305       PropagateResourceLimits
2306              A comma-separated list of resource limit names.  The slurmd dae‐
2307              mon  uses these names to obtain the associated (soft) limit val‐
2308              ues from the user's process  environment  on  the  submit  node.
2309              These  limits  are  then propagated and applied to the jobs that
2310              will run on the compute nodes.  This  parameter  can  be  useful
2311              when  system  limits vary among nodes.  Any resource limits that
2312              do not appear in the list are not propagated.  However, the user
2313              can  override this by specifying which resource limits to propa‐
2314              gate with the sbatch or srun "--propagate"  option.  If  neither
2315              PropagateResourceLimits nor PropagateResourceLimitsExcept is
2316              configured and the "--propagate" option is not  specified,  then
2317              the  default  action is to propagate all limits. Only one of the
2318              parameters, either PropagateResourceLimits or PropagateResource‐
2319              LimitsExcept,  may be specified.  The user limits can not exceed
2320              hard limits under which the slurmd daemon operates. If the  user
2321              limits  are  not  propagated,  the limits from the slurmd daemon
2322              will be propagated to the user's job. The limits  used  for  the
2323              Slurm daemons can be set in the /etc/sysconfig/slurm file.  For
2324              more information, see https://slurm.schedmd.com/faq.html#memlock.
2325              The following limit names are supported by Slurm (although
2326              some options may not be supported on some systems):
2327
2328              ALL       All limits listed below (default)
2329
2330              NONE      No limits listed below
2331
2332              AS        The maximum address space for a process
2333
2334              CORE      The maximum size of core file
2335
2336              CPU       The maximum amount of CPU time
2337
2338              DATA      The maximum size of a process's data segment
2339
2340              FSIZE     The maximum size of files created. Note  that  if  the
2341                        user  sets  FSIZE to less than the current size of the
2342                        slurmd.log, job launches will fail with a  'File  size
2343                        limit exceeded' error.
2344
2345              MEMLOCK   The maximum size that may be locked into memory
2346
2347              NOFILE    The maximum number of open files
2348
2349              NPROC     The maximum number of processes available
2350
2351              RSS       The maximum resident set size
2352
2353              STACK     The maximum stack size
2354
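                  For example (illustrative only), to propagate only the locked
                  memory and open file limits from the submit node:
                  PropagateResourceLimits=MEMLOCK,NOFILE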
2355
2356       PropagateResourceLimitsExcept
2357              A comma-separated list of resource limit names.  By default, all
2358              resource limits will be propagated, (as described by the  Propa‐
2359              gateResourceLimits  parameter),  except for the limits appearing
2360              in this list.   The user can override this by  specifying  which
2361              resource  limits  to propagate with the sbatch or srun "--propa‐
2362              gate" option.  See PropagateResourceLimits above for a  list  of
2363              valid limit names.
2364
2365
2366       RebootProgram
2367              Program  to  be  executed on each compute node to reboot it. In‐
2368              voked on each node once it becomes idle after the command "scon‐
2369              trol  reboot" is executed by an authorized user or a job is sub‐
2370              mitted with the "--reboot" option.  After rebooting, the node is
2371              returned to normal use.  See ResumeTimeout to configure the time
2372              you expect a reboot to finish in.  A node will be marked DOWN if
2373              it doesn't reboot within ResumeTimeout.
2374
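                  For example (the path below is a hypothetical site script,
                  not one provided by Slurm):
                  # hypothetical reboot wrapper
                  RebootProgram=/usr/local/sbin/slurm_reboot.sh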
2375
2376       ReconfigFlags
2377              Flags  to  control  various  actions  that  may be taken when an
2378              "scontrol reconfig" command is  issued.  Currently  the  options
2379              are:
2380
2381              KeepPartInfo     If  set,  an  "scontrol  reconfig" command will
2382                               maintain  the  in-memory  value  of   partition
2383                               "state" and other parameters that may have been
2384                               dynamically updated by "scontrol update".  Par‐
2385                               tition  information in the slurm.conf file will
2386                               be merged with in-memory data.  This  flag  su‐
2387                               persedes the KeepPartState flag.
2388
2389              KeepPartState    If  set,  an  "scontrol  reconfig" command will
2390                               preserve only  the  current  "state"  value  of
2391                               in-memory  partitions  and will reset all other
2392                               parameters of the partitions that may have been
2393                               dynamically updated by "scontrol update" to the
2394                               values from the slurm.conf file.  Partition in‐
2395                               formation in the slurm.conf file will be merged
2396                               with in-memory data.
2397              By default neither flag is set, and "scontrol reconfig" will
2398              rebuild the partition information using only the definitions
2399              in the slurm.conf file.
2400
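                  For example (illustrative only), to preserve dynamically
                  updated partition parameters across a reconfiguration:
                  ReconfigFlags=KeepPartInfo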
2401
2402       RequeueExit
2403              Enables automatic requeue for batch jobs  which  exit  with  the
2404              specified values.  Separate multiple exit codes with a comma
2405              and/or specify numeric ranges using a "-" separator (e.g. "Re‐
2406              queueExit=1-9,18").  Jobs will be put back into pending state and
2407              later scheduled again.  Restarted jobs will have the environment
2408              variable  SLURM_RESTART_COUNT set to the number of times the job
2409              has been restarted.
2410
2411
2412       RequeueExitHold
2413              Enables automatic requeue for batch jobs  which  exit  with  the
2414              specified values, with these jobs being held until released man‐
2415              ually by the user.  Separate multiple exit codes with a comma
2416              and/or specify numeric ranges using a "-" separator (e.g. "Re‐
2417              queueExitHold=10-12,16").  These jobs are put in the JOB_SPE‐
2418              CIAL_EXIT  exit state.  Restarted jobs will have the environment
2419              variable SLURM_RESTART_COUNT set to the number of times the  job
2420              has been restarted.
2421
2422
2423       ResumeFailProgram
2424              The program that will be executed when nodes fail to resume by
2425              ResumeTimeout.  The argument to the program will be the names
2426              of the failed nodes (using Slurm's hostlist expression format).
2427
2428
2429       ResumeProgram
2430              Slurm  supports a mechanism to reduce power consumption on nodes
2431              that remain idle for an extended period of time.  This is  typi‐
2432              cally accomplished by reducing voltage and frequency or powering
2433              the node down.  ResumeProgram is the program that will  be  exe‐
2434              cuted  when  a  node in power save mode is assigned work to per‐
2435              form.  For reasons of  reliability,  ResumeProgram  may  execute
2436              more  than once for a node when the slurmctld daemon crashes and
2437              is restarted.  If ResumeProgram is unable to restore a  node  to
2438              service  with  a  responding  slurmd and an updated BootTime, it
2439              should requeue any job associated with the node and set the node
2440              state  to  DOWN.  If the node isn't actually rebooted (i.e. when
2441              multiple-slurmd is configured) starting slurmd with "-b"  option
2442              might  be useful.  The program executes as SlurmUser.  The argu‐
2443              ment to the program will be the names of  nodes  to  be  removed
2444              from  power savings mode (using Slurm's hostlist expression for‐
2445              mat).  By default no program is run.  Related configuration  op‐
2446              tions  include  ResumeTimeout, ResumeRate, SuspendRate, Suspend‐
2447              Time, SuspendTimeout, SuspendProgram, SuspendExcNodes, and  Sus‐
2448              pendExcParts.   More  information  is available at the Slurm web
2449              site ( https://slurm.schedmd.com/power_save.html ).
2450
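                  A minimal power saving sketch, where the script paths are
                  hypothetical site-provided examples:
                  # hypothetical site scripts that power nodes up and down
                  ResumeProgram=/etc/slurm/power/resume.sh
                  SuspendProgram=/etc/slurm/power/suspend.sh
                  SuspendTime=600
                  ResumeTimeout=300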
2451
2452       ResumeRate
2453              The rate at which nodes in power save mode are returned to  nor‐
2454              mal operation by ResumeProgram.  The value is the number of nodes
2455              per minute and it can be used to prevent power surges if a large
2456              number of nodes in power save mode are assigned work at the same
2457              time (e.g. a large job starts).  A value of zero results  in  no
2458              limits  being  imposed.   The  default  value  is  300 nodes per
2459              minute.  Related configuration  options  include  ResumeTimeout,
2460              ResumeProgram,  SuspendRate,  SuspendTime,  SuspendTimeout, Sus‐
2461              pendProgram, SuspendExcNodes, and SuspendExcParts.
2462
2463
2464       ResumeTimeout
2465              Maximum time permitted (in seconds) between when a  node  resume
2466              request  is  issued  and when the node is actually available for
2467              use.  Nodes which fail to respond in this  time  frame  will  be
2468              marked  DOWN and the jobs scheduled on the node requeued.  Nodes
2469              which reboot after this time frame will be marked  DOWN  with  a
2470              reason of "Node unexpectedly rebooted."  The default value is 60
2471              seconds.  Related configuration options  include  ResumeProgram,
2472              ResumeRate,  SuspendRate,  SuspendTime, SuspendTimeout, Suspend‐
2473              Program, SuspendExcNodes and SuspendExcParts.  More  information
2474              is     available     at     the     Slurm     web     site     (
2475              https://slurm.schedmd.com/power_save.html ).
2476
2477
2478       ResvEpilog
2479              Fully qualified pathname of a program for the slurmctld to  exe‐
2480              cute  when a reservation ends. The program can be used to cancel
2481              jobs, modify  partition  configuration,  etc.   The  reservation
2482              named  will be passed as an argument to the program.  By default
2483              there is no epilog.
2484
2485
2486       ResvOverRun
2487              Describes how long a job already running in a reservation should
2488              be  permitted  to  execute after the end time of the reservation
2489              has been reached.  The time period is specified in  minutes  and
2490              the  default  value  is 0 (kill the job immediately).  The value
2491              may not exceed 65533 minutes, although a value of "UNLIMITED" is
2492              supported to permit a job to run indefinitely after its reserva‐
2493              tion is terminated.
2494
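                  For example (illustrative only), to let running jobs continue
                  for up to ten minutes past the end of their reservation:
                  ResvOverRun=10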
2495
2496       ResvProlog
2497              Fully qualified pathname of a program for the slurmctld to  exe‐
2498              cute  when a reservation begins. The program can be used to can‐
2499              cel jobs, modify partition configuration, etc.  The  reservation
2500              named  will be passed as an argument to the program.  By default
2501              there is no prolog.
2502
2503
2504       ReturnToService
2505              Controls when a DOWN node will be returned to service.  The  de‐
2506              fault value is 0.  Supported values include
2507
2508              0   A node will remain in the DOWN state until a system adminis‐
2509                  trator explicitly changes its state (even if the slurmd dae‐
2510                  mon registers and resumes communications).
2511
2512              1   A  DOWN node will become available for use upon registration
2513                  with a valid configuration only if it was set  DOWN  due  to
2514                  being  non-responsive.   If  the  node  was set DOWN for any
2515                  other reason (low  memory,  unexpected  reboot,  etc.),  its
2516                  state  will  not automatically be changed.  A node registers
2517                  with a valid configuration if its memory, GRES,  CPU  count,
2518                  etc.  are  equal to or greater than the values configured in
2519                  slurm.conf.
2520
2521              2   A DOWN node will become available for use upon  registration
2522                  with  a  valid  configuration.  The node could have been set
2523                  DOWN for any reason.  A node registers with a valid configu‐
2524                  ration  if its memory, GRES, CPU count, etc. are equal to or
2525                  greater than the values configured in slurm.conf.  (Disabled
2526                  on Cray ALPS systems.)
2527
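                  For example (illustrative only), to automatically return
                  nodes that were set DOWN only for being non-responsive:
                  ReturnToService=1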
2528
2529       RoutePlugin
2530              Identifies  the  plugin to be used for defining which nodes will
2531              be used for message forwarding.
2532
2533              route/default
2534                     default, use TreeWidth.
2535
2536              route/topology
2537                     use the switch hierarchy defined in a topology.conf file.
2538                     TopologyPlugin=topology/tree is required.
2539
2540
2541       SbcastParameters
2542              Controls sbcast command behavior. Multiple options can be speci‐
2543              fied in a comma separated list.  Supported values include:
2544
2545              DestDir=       Destination directory for file being broadcast to
2546                             allocated  compute  nodes.  Default value is cur‐
2547                             rent working directory.
2548
2549              Compression=   Specify default file compression  library  to  be
2550                             used.   Supported  values  are  "lz4", "none" and
2551                             "zlib".  The default value with the sbcast --com‐
2552                             press option is "lz4" and "none" otherwise.  Some
2553                             compression libraries may be unavailable on  some
2554                             systems.
2555
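                  For example (illustrative only, with /tmp as an arbitrary
                  destination directory):
                  SbcastParameters=DestDir=/tmp,Compression=lz4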
2556
2557       SchedulerParameters
2558              The  interpretation  of  this parameter varies by SchedulerType.
2559              Multiple options may be comma separated.
2560
2561              allow_zero_lic
2562                     If set, then job submissions requesting more than config‐
2563                     ured licenses won't be rejected.
2564
2565              assoc_limit_stop
2566                     If  set and a job cannot start due to association limits,
2567                     then do not attempt to initiate any lower  priority  jobs
2568                     in that partition.  Setting this can decrease system
2569                     throughput and utilization, but it avoids potentially
2570                     starving larger jobs that would otherwise be prevented
2571                     from launching indefinitely.
2572
2573              batch_sched_delay=#
2574                     How long, in seconds, the scheduling of batch jobs can be
2575                     delayed.   This  can be useful in a high-throughput envi‐
2576                     ronment in which batch jobs are submitted at a very  high
2577                     rate  (i.e.  using  the sbatch command) and one wishes to
2578                     reduce the overhead of attempting to schedule each job at
2579                     submit time.  The default value is 3 seconds.
2580
2581              bb_array_stage_cnt=#
2582                     Number of tasks from a job array that should be available
2583                     for burst buffer resource allocation. Higher values  will
2584                     increase  the  system  overhead as each task from the job
2585                     array will be moved to its own job record in  memory,  so
2586                     relatively  small  values are generally recommended.  The
2587                     default value is 10.
2588
2589              bf_busy_nodes
2590                     When selecting resources for pending jobs to reserve  for
2591                     future execution (i.e. the job can not be started immedi‐
2592                     ately), then preferentially select nodes that are in use.
2593                     This  will  tend to leave currently idle resources avail‐
2594                     able for backfilling longer running jobs, but may  result
2595                     in allocations having less than optimal network topology.
2596                     This option  is  currently  only  supported  by  the  se‐
2597                     lect/cons_res   and   select/cons_tres  plugins  (or  se‐
2598                     lect/cray_aries   with   SelectTypeParameters   set    to
2599                     "OTHER_CONS_RES"  or  "OTHER_CONS_TRES", which layers the
2600                     select/cray_aries plugin over the select/cons_res or  se‐
2601                     lect/cons_tres plugin respectively).
2602
2603              bf_continue
2604                     The backfill scheduler periodically releases locks in or‐
2605                     der to permit other operations  to  proceed  rather  than
2606                     blocking  all  activity for what could be an extended pe‐
2607                     riod of time.  Setting this option will cause  the  back‐
2608                     fill  scheduler  to continue processing pending jobs from
2609                     its original job list after releasing locks even  if  job
2610                     or node state changes.
2611
2612              bf_hetjob_immediate
2613                     Instruct  the  backfill  scheduler  to attempt to start a
2614                     heterogeneous job as soon as all of  its  components  are
2615                     determined  able to do so. Otherwise, the backfill sched‐
2616                     uler will delay heterogeneous  jobs  initiation  attempts
2617                     until  after  the  rest  of the queue has been processed.
2618                     This delay may result in lower priority jobs being  allo‐
2619                     cated  resources, which could delay the initiation of the
2620                     heterogeneous job due to account and/or QOS limits  being
2621                     reached.  This  option is disabled by default. If enabled
2622                     and bf_hetjob_prio=min is not set, then it will be auto‐
2623                     matically set.
2624
2625              bf_hetjob_prio=[min|avg|max]
2626                     At  the  beginning  of  each backfill scheduling cycle, a
2627                     list of pending to be scheduled jobs is sorted  according
2628                     to  the precedence order configured in PriorityType. This
2629                     option instructs the scheduler to alter the sorting algo‐
2630                     rithm to ensure that all components belonging to the same
2631                     heterogeneous job will be attempted to be scheduled  con‐
2632                     secutively  (thus  not fragmented in the resulting list).
2633                     More specifically, all components from the same heteroge‐
2634                     neous  job  will  be treated as if they all have the same
2635                     priority (minimum, average or maximum depending upon this
2636                     option's  parameter)  when  compared  with other jobs (or
2637                     other heterogeneous job components). The  original  order
2638                     will be preserved within the same heterogeneous job. Note
2639                     that the operation is  calculated  for  the  PriorityTier
2640                     layer  and  for  the  Priority  resulting from the prior‐
2641                     ity/multifactor plugin calculations. When enabled, if any
2642                     heterogeneous job requested an advanced reservation, then
2643                     all of that job's components will be treated as  if  they
2644                     had  requested an advanced reservation (and get preferen‐
2645                     tial treatment in scheduling).
2646
2647                     Note that this operation does  not  update  the  Priority
2648                     values  of  the  heterogeneous job components, only their
2649                     order within the list, so the output of the sprio command
2650                     will not be affected.
2651
2652                     Heterogeneous  jobs  have  special scheduling properties:
2653                     they  are  only  scheduled  by  the  backfill  scheduling
2654                     plugin, each of their components is considered separately
2655                     when reserving resources (and might have different Prior‐
2656                     ityTier  or  different Priority values), and no heteroge‐
2657                     neous job component is actually allocated resources until
2658                     all of its components can be initiated.  This may imply
2659                     potential scheduling deadlock  scenarios  because  compo‐
2660                     nents from different heterogeneous jobs can start reserv‐
2661                     ing resources in an  interleaved  fashion  (not  consecu‐
2662                     tively),  but  none of the jobs can reserve resources for
2663                     all components and start. Enabling this option  can  help
2664                     to mitigate this problem. By default, this option is dis‐
2665                     abled.
2666
2667              bf_interval=#
2668                     The  number  of  seconds  between  backfill   iterations.
2669                     Higher  values result in less overhead and better respon‐
2670                     siveness.   This  option  applies  only   to   Scheduler‐
2671                     Type=sched/backfill.   Default:  30,  Min:  1, Max: 10800
2672                     (3h).
2673
2674
2675              bf_job_part_count_reserve=#
2676                     The backfill scheduling logic will reserve resources  for
2677                     the specified count of highest priority jobs in each par‐
2678                     tition.  For example,  bf_job_part_count_reserve=10  will
2679                     cause the backfill scheduler to reserve resources for the
2680                     ten highest priority jobs in each partition.   Any  lower
2681                     priority  job  that can be started using currently avail‐
2682                     able resources and  not  adversely  impact  the  expected
2683                     start  time of these higher priority jobs will be started
2684                     by the backfill scheduler.  The default value is zero,
2685                     which  will reserve resources for any pending job and de‐
2686                     lay  initiation  of  lower  priority  jobs.    Also   see
2687                     bf_min_age_reserve  and bf_min_prio_reserve.  Default: 0,
2688                     Min: 0, Max: 100000.
2689
2690
2691              bf_max_job_array_resv=#
2692                     The maximum number of tasks from a job  array  for  which
2693                     the  backfill scheduler will reserve resources in the fu‐
2694                     ture.  Since job arrays can potentially have millions  of
2695                     tasks,  the overhead in reserving resources for all tasks
2696                     can be prohibitive.  In addition various limits may  pre‐
2697                     vent  all  the  jobs from starting at the expected times.
2698                     This has no impact upon the number of tasks  from  a  job
2699                     array  that  can be started immediately, only those tasks
2700                     expected to start at some future time.  Default: 20, Min:
2701                     0,  Max:  1000.   NOTE: Jobs submitted to multiple parti‐
2702                     tions appear in the job queue once per partition. If dif‐
2703                     ferent copies of a single job array record aren't consec‐
2704                     utive in the job queue and another job array record is in
2705                     between,  then bf_max_job_array_resv tasks are considered
2706                     per partition that the job is submitted to.
2707
2708              bf_max_job_assoc=#
2709                     The maximum number of jobs per user  association  to  at‐
2710                     tempt starting with the backfill scheduler.  This setting
2711                     is similar to bf_max_job_user but is handy if a user  has
2712                     multiple  associations  equating  to  basically different
2713                     users.  One can set this  limit  to  prevent  users  from
2714                     flooding  the  backfill queue with jobs that cannot start
2715                     and that prevent jobs from other users from starting.  This
2716                     option   applies  only  to  SchedulerType=sched/backfill.
2717                     Also see the bf_max_job_user, bf_max_job_part,
2718                     bf_max_job_test and bf_max_job_user_part=# options.  Set
2719                     bf_max_job_test   to   a   value   much    higher    than
2720                     bf_max_job_assoc.   Default:  0  (no limit), Min: 0, Max:
2721                     bf_max_job_test.
2722
2723              bf_max_job_part=#
2724                     The maximum number  of  jobs  per  partition  to  attempt
2725                     starting  with  the backfill scheduler. This can be espe‐
2726                     cially helpful for systems with large numbers  of  parti‐
2727                     tions  and  jobs.  This option applies only to Scheduler‐
2728                     Type=sched/backfill.  Also  see  the  partition_job_depth
2729                     and  bf_max_job_test  options.   Set bf_max_job_test to a
2730                     value much higher than bf_max_job_part.  Default:  0  (no
2731                     limit), Min: 0, Max: bf_max_job_test.
2732
2733              bf_max_job_start=#
2734                     The  maximum  number  of jobs which can be initiated in a
2735                     single iteration of the backfill scheduler.  This  option
2736                     applies only to SchedulerType=sched/backfill.  Default: 0
2737                     (no limit), Min: 0, Max: 10000.
2738
2739              bf_max_job_test=#
2740                     The maximum number of jobs to attempt backfill scheduling
2741                     for (i.e. the queue depth).  Higher values result in more
2742                     overhead and less responsiveness.  Until  an  attempt  is
2743                     made  to backfill schedule a job, its expected initiation
2744                     time value will not be set.  In the case of  large  clus‐
2745                     ters,  configuring a relatively small value may be desir‐
2746                     able.    This   option   applies   only   to   Scheduler‐
2747                     Type=sched/backfill.    Default:   100,   Min:   1,  Max:
2748                     1,000,000.
2749
2750              bf_max_job_user=#
2751                     The maximum number of jobs per user to  attempt  starting
2752                     with  the backfill scheduler for ALL partitions.  One can
2753                     set this limit to prevent users from flooding  the  back‐
2754                     fill  queue  with jobs that cannot start and that prevent
2755                     jobs from other users from starting.  This is similar to the
2756                     MAXIJOB  limit  in  Maui.   This  option  applies only to
2757                     SchedulerType=sched/backfill.      Also      see      the
2758                     bf_max_job_part,            bf_max_job_test           and
2759                     bf_max_job_user_part=# options.  Set bf_max_job_test to a
2760                     value  much  higher than bf_max_job_user.  Default: 0 (no
2761                     limit), Min: 0, Max: bf_max_job_test.
2762
2763              bf_max_job_user_part=#
2764                     The maximum number of jobs per user per partition to  at‐
2765                     tempt starting with the backfill scheduler for any single
2766                     partition.   This  option  applies  only  to   Scheduler‐
2767                     Type=sched/backfill.    Also   see  the  bf_max_job_part,
2768                     bf_max_job_test and bf_max_job_user=# options.   Default:
2769                     0 (no limit), Min: 0, Max: bf_max_job_test.
2770
2771              bf_max_time=#
2772                     The  maximum  time  in seconds the backfill scheduler can
2773                     spend (including time spent sleeping when locks  are  re‐
2774                     leased)  before discontinuing, even if maximum job counts
2775                     have not been  reached.   This  option  applies  only  to
2776                     SchedulerType=sched/backfill.   The  default value is the
2777                     value of bf_interval (which defaults to 30 seconds).  De‐
2778                     fault: bf_interval value (def. 30 sec), Min: 1, Max: 3600
2779                     (1h).  NOTE: If bf_interval is short and  bf_max_time  is
2780                     large, this may cause locks to be acquired too frequently
2781                     and starve out other serviced RPCs. It's advisable if us‐
2782                     ing  this  parameter  to set max_rpc_cnt high enough that
2783                     scheduling isn't always disabled, and low enough that the
2784                     interactive  workload can get through in a reasonable pe‐
2785                     riod of time. max_rpc_cnt needs to be below 256 (the  de‐
2786                     fault  RPC thread limit). Running around the middle (150)
2787                     may give you good results.   NOTE:  When  increasing  the
2788                     amount  of  time  spent in the backfill scheduling cycle,
2789                     Slurm can be prevented from responding to client requests
2790                     in  a  timely  manner.   To  address  this  you  can  use
2791                     max_rpc_cnt to specify a number of queued RPCs before the
2792                     scheduler stops in order to respond to these requests.
2793
2794              bf_min_age_reserve=#
2795                     The  backfill  and main scheduling logic will not reserve
2796                     resources for pending jobs until they have  been  pending
2797                     and  runnable  for  at least the specified number of sec‐
2798                     onds.  In addition, jobs waiting for less than the speci‐
2799                     fied number of seconds will not prevent a newly submitted
2800                     job from starting immediately, even if the newly  submit‐
2801                     ted  job  has  a lower priority.  This can be valuable if
2802                     jobs lack time limits or all time limits  have  the  same
2803                     value.  The default value is zero, which will reserve re‐
2804                     sources for any pending job and delay initiation of lower
2805                     priority  jobs.   Also  see bf_job_part_count_reserve and
2806                     bf_min_prio_reserve.  Default: 0, Min:  0,  Max:  2592000
2807                     (30 days).
2808
2809              bf_min_prio_reserve=#
2810                     The  backfill  and main scheduling logic will not reserve
2811                     resources for pending jobs unless they  have  a  priority
2812                     equal  to  or  higher than the specified value.  In addi‐
2813                     tion, jobs with a lower priority will not prevent a newly
2814                     submitted  job  from  starting  immediately,  even if the
2815                     newly submitted job has a lower priority.   This  can  be
2816                     valuable if one wished to maximize system utilization
2817                     without regard for job priority below a  certain  thresh‐
2818                     old.   The  default value is zero, which will reserve re‐
2819                     sources for any pending job and delay initiation of lower
2820                     priority  jobs.   Also  see bf_job_part_count_reserve and
2821                     bf_min_age_reserve.  Default: 0, Min: 0, Max: 2^63.
2822
2823              bf_one_resv_per_job
2824                     Disallow adding more than one  backfill  reservation  per
2825                     job.   The scheduling logic builds a sorted list of (job,
2826                     partition) pairs. Jobs submitted to  multiple  partitions
2827                     have as many entries in the list as requested partitions.
2828                     By default, the backfill scheduler may evaluate  all  the
2829                     (job,  partition)  entries  for a single job, potentially
2830                     reserving resources for each pair, but only starting  the
2831                     job  in the reservation offering the earliest start time.
2832                     Having a single job reserving resources for multiple par‐
2833                     titions  could  impede  other jobs (or hetjob components)
2834                     from reserving resources already reserved for the  reser‐
2835                     vations  related  to  the partitions that don't offer the
2836                     earliest start time.  This option makes it so that a  job
2837                     submitted  to multiple partitions will stop reserving re‐
2838                     sources once the first (job, partition) pair has booked a
2839                     backfill  reservation. Subsequent pairs from the same job
2840                     will only be tested to start now. This allows  for  other
2841                     jobs  to be able to book the other pairs resources at the
2842                     cost of not guaranteeing that  the  multi  partition  job
2843                     will  start  in the partition offering the earliest start
2844                     time (except if it can start now).  This option  is  dis‐
2845                     abled by default.
2846
2847
2848              bf_resolution=#
2849                     The  number  of  seconds  in the resolution of data main‐
2850                     tained about when jobs begin and end. Higher  values  re‐
2851                     sult in better responsiveness and quicker backfill cycles
2852                     by using larger blocks of time to determine  node  eligi‐
2853                     bility.   However,  higher  values lead to less efficient
2854                     system planning, and may miss  opportunities  to  improve
2855                     system  utilization.   This option applies only to Sched‐
2856                     ulerType=sched/backfill.  Default: 60, Min: 1, Max:  3600
2857                     (1 hour).
2858
2859              bf_running_job_reserve
2860                     Add  an extra step to backfill logic, which creates back‐
2861                     fill reservations for jobs running on whole nodes.   This
2862                     option is disabled by default.
2863
2864              bf_window=#
2865                     The  number  of minutes into the future to look when con‐
2866                     sidering jobs to schedule.  Higher values result in  more
2867                     overhead  and  less  responsiveness.  A value at least as
2868                     long as the highest allowed time limit is  generally  ad‐
2869                     visable to prevent job starvation.  In order to limit the
2870                     amount of data managed by the backfill scheduler, if  the
2871                     value of bf_window is increased, then it is generally ad‐
2872                     visable to also increase bf_resolution.  This option  ap‐
2873                     plies  only  to  SchedulerType=sched/backfill.   Default:
2874                     1440 (1 day), Min: 1, Max: 43200 (30 days).
2875
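                         For example (values are purely illustrative), a site
                         allowing three-day time limits might raise the window
                         and coarsen the resolution together:
                         SchedulerParameters=bf_window=4320,bf_resolution=300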
2876              bf_window_linear=#
2877                     For performance reasons, the backfill scheduler will  de‐
2878                     crease  precision in calculation of job expected termina‐
2879                     tion times. By default, the precision starts at  30  sec‐
2880                     onds  and that time interval doubles with each evaluation
2881                     of currently executing jobs when trying to determine when
2882                     a  pending  job  can start. This algorithm can support an
2883                     environment with many thousands of running jobs, but  can
2884                     result  in  the expected start time of pending jobs being
2885                     gradually deferred due to lack of precision.  A
2886                     value  for  bf_window_linear will cause the time interval
2887                     to be increased by a constant amount on  each  iteration.
2888                     The  value is specified in units of seconds. For example,
2889                     a value of 60 will cause the backfill  scheduler  on  the
2890                     first  iteration  to  identify the job ending soonest and
2891                     determine if the pending job can be  started  after  that
2892                     job plus all other jobs expected to end within 30 seconds
2893                     (default initial value) of the first job. On the next it‐
2894                     eration,  the  pending job will be evaluated for starting
2895                     after the next job expected to end plus all  jobs  ending
2896                     within  90  seconds of that time (30 second default, plus
2897                     the 60 second option value).  The  third  iteration  will
2898                     have  a  150  second  window  and the fourth 210 seconds.
2899                     Without this option, the time windows will double on each
2900                     iteration  and thus be 30, 60, 120, 240 seconds, etc. The
2901                     use of bf_window_linear is not recommended with more than
2902                     a few hundred simultaneously executing jobs.
2903
2904              bf_yield_interval=#
2905                     The backfill scheduler will periodically relinquish locks
2906                     in order for other  pending  operations  to  take  place.
2907                     This specifies the interval between such lock releases,
2908                     in microseconds.  Smaller values may be helpful for  high
2909                     throughput  computing  when  used in conjunction with the
2910                     bf_continue option.  Also see the bf_yield_sleep  option.
2911                     Default:  2,000,000  (2 sec), Min: 1, Max: 10,000,000 (10
2912                     sec).
2913
2914              bf_yield_sleep=#
2915                     The backfill scheduler will periodically relinquish locks
2916                     in  order  for  other  pending  operations to take place.
2917                     This specifies the length of time for which the locks are
2918                     relinquished  in microseconds.  Also see the bf_yield_in‐
2919                     terval option.  Default: 500,000 (0.5 sec), Min: 1,  Max:
2920                     10,000,000 (10 sec).
2921
2922              build_queue_timeout=#
2923                     Defines  the maximum time that can be devoted to building
2924                     a queue of jobs to be tested for scheduling.  If the sys‐
2925                     tem  has  a  huge  number of jobs with dependencies, just
2926                     building the job queue can take so much time  as  to  ad‐
2927                     versely impact overall system performance and this param‐
2928                     eter can be adjusted as needed.   The  default  value  is
2929                     2,000,000 microseconds (2 seconds).
2930
2931              correspond_after_task_cnt=#
2932                     Defines  the number of array tasks that get split for po‐
2933                     tential aftercorr dependency check.  A low number may result
2934                     in dependent task check failures when the job one depends
2935                     on gets purged before the split.  Default: 10.
2936
2937              default_queue_depth=#
2938                     The default number of jobs to  attempt  scheduling  (i.e.
2939                     the  queue  depth)  when a running job completes or other
2940                     routine actions occur, however the frequency  with  which
2941                     the scheduler is run may be limited by using the defer or
2942                     sched_min_interval parameters described below.  The  full
2943                     queue  will be tested on a less frequent basis as defined
2944                     by the sched_interval option described below. The default
2945                     value  is  100.   See  the  partition_job_depth option to
2946                     limit depth by partition.
2947
2948              defer  Setting this option will  avoid  attempting  to  schedule
2949                     each  job  individually  at job submit time, but defer it
2950                     until a later time when scheduling multiple jobs simulta‐
2951                     neously  may be possible.  This option may improve system
2952                     responsiveness when large numbers of jobs (many hundreds)
2953                     are  submitted  at  the  same time, but it will delay the
2954                     initiation  time  of  individual  jobs.  Also   see   de‐
2955                     fault_queue_depth above.
2956
2957              delay_boot=#
2958                     Do not reboot nodes in order to satisfy this job's fea‐
2959                     ture specification if the job has been  eligible  to  run
2960                     for  less  than  this time period.  If the job has waited
2961                     for less than the specified  period,  it  will  use  only
2962                     nodes which already have the specified features.  The ar‐
2963                     gument is in units of minutes.  Individual jobs may over‐
2964                     ride this default value with the --delay-boot option.
2965
2966              disable_job_shrink
2967                     Deny user requests to shrink the size of running jobs.
2968                     (However, running jobs may still shrink due to node fail‐
2969                     ure if the --no-kill option was set.)
2970
2971              disable_hetjob_steps
2972                     Disable  job  steps  that  span heterogeneous job alloca‐
2973                     tions.  This is the default on Cray systems.
2974
2975              enable_hetjob_steps
2976                     Enable job steps that span heterogeneous job allocations.
2977                     This is the default except on Cray systems.
2978
2979              enable_user_top
2980                     Enable  use  of  the "scontrol top" command by non-privi‐
2981                     leged users.
2982
2983              Ignore_NUMA
2984                     Some processors (e.g. AMD Opteron  6000  series)  contain
2985                     multiple  NUMA  nodes per socket. This is a configuration
2986                     which does not map into the hardware entities that  Slurm
2987                     optimizes   resource  allocation  for  (PU/thread,  core,
2988                     socket, baseboard, node and network switch). In order  to
2989                     optimize  resource  allocations  on  such hardware, Slurm
2990                     will consider each NUMA node within the socket as a sepa‐
2991                     rate socket by default. Use the Ignore_NUMA option to re‐
2992                     port the correct socket count, but not optimize  resource
2993                     allocations on the NUMA nodes.
2994
2995              inventory_interval=#
2996                     On  a  Cray system using Slurm on top of ALPS this limits
2997                     the number of times a Basil Inventory call is made.  Nor‐
2998                     mally this call happens every scheduling consideration to
2999                     attempt to close a node state change window with respect
3000                     to what ALPS has.  This call is rather slow, so making it
3001                     less frequently improves performance dramatically, but in
3002                     the situation where a node changes state the window is as
3003                     large as this setting.  In an HTC environment  this  set‐
3004                     ting is a must and we advise around 10 seconds.
3005
3006              max_array_tasks
3007                     Specify  the maximum number of tasks that can be included
3008                     in a job array.  The default limit is  MaxArraySize,  but
3009                     this  option  can be used to set a lower limit. For exam‐
3010                     ple, max_array_tasks=1000 and  MaxArraySize=100001  would
3011                     permit  a maximum task ID of 100000, but limit the number
3012                     of tasks in any single job array to 1000.
3013
3014              max_rpc_cnt=#
3015                     If the number of active threads in the  slurmctld  daemon
3016                     is  equal  to or larger than this value, defer scheduling
3017                     of jobs. The scheduler will check this condition at  cer‐
3018                     tain  points  in code and yield locks if necessary.  This
3019                     can improve Slurm's ability to process requests at a cost
3020                     of  initiating  new jobs less frequently. Default: 0 (op‐
3021                     tion disabled), Min: 0, Max: 1000.
3022
3023                     NOTE: The maximum number of threads  (MAX_SERVER_THREADS)
3024                     is internally set to 256 and defines the number of served
3025                     RPCs at a given time.  Setting max_rpc_cnt to more than
3026                     256 is only useful to let backfill continue scheduling
3027                     work after locks have been yielded (i.e. every 2 sec‐
3028                     onds) if there are at most MAX(max_rpc_cnt/10, 20)
3029                     RPCs in the queue.  For example, with max_rpc_cnt=1000
3030                     the scheduler will be allowed to continue after yield‐
3031                     ing locks only when there are 100 or fewer pending RPCs.
3032                     If a value is set, then a value of 10 or higher is recom‐
3033                     mended. It may require some tuning for each  system,  but
3034                     needs to be high enough that scheduling isn't always dis‐
3035                     abled, and low enough that requests can get through in  a
3036                     reasonable period of time.
3037
3038              max_sched_time=#
3039                     How long, in seconds, the main scheduling loop will
3040                     execute before exiting.  If a value is configured, be
3041                     aware  that  all  other Slurm operations will be deferred
3042                     during this time period.  Make certain the value is lower
3043                     than  MessageTimeout.   If a value is not explicitly con‐
3044                     figured, the default value is half of MessageTimeout with
3045                     a minimum default value of 1 second and a maximum default
3046                     value of 2 seconds.  For  example  if  MessageTimeout=10,
3047                     the time limit will be 2 seconds (i.e. MIN(10/2, 2) = 2).
3048
3049              max_script_size=#
3050                     Specify  the  maximum  size  of a batch script, in bytes.
3051                     The default value is 4 megabytes.  Larger values may  ad‐
3052                     versely impact system performance.
3053
3054              max_switch_wait=#
3055                     Maximum  number of seconds that a job can delay execution
3056                     waiting for the specified desired switch count.  The  de‐
3057                     fault value is 300 seconds.
3058
3059              no_backup_scheduling
3060                     If  used,  the  backup  controller will not schedule jobs
3061                     when it takes over. The backup controller will allow jobs
3062                     to  be submitted, modified and cancelled but won't sched‐
3063                     ule new jobs. This is useful in  Cray  environments  when
3064                     the  backup  controller resides on an external Cray node.
3065                     A restart is required to alter this option. This  is  ex‐
3066                     plicitly set on a Cray/ALPS system.
3067
3068              no_env_cache
3069                     If used, any job started on a node that fails to load
3070                     the environment will fail instead of using the cached
3071                     environment.  This also implicitly enables the re‐
3072                     queue_setup_env_fail option.
3073
3074              nohold_on_prolog_fail
3075                     By default, if the Prolog exits with a non-zero value the
3076                     job  is  requeued in a held state. By specifying this pa‐
3077                     rameter the job will be requeued but not held so that the
3078                     scheduler can dispatch it to another host.
3079
3080              pack_serial_at_end
3081                     If  used  with  the  select/cons_res  or select/cons_tres
3082                     plugin, then put serial jobs at the end of the  available
3083                     nodes  rather  than using a best fit algorithm.  This may
3084                     reduce resource fragmentation for some workloads.
3085
3086              partition_job_depth=#
3087                     The default number of jobs to  attempt  scheduling  (i.e.
3088                     the  queue  depth)  from  each partition/queue in Slurm's
3089                     main scheduling logic.  The functionality is  similar  to
3090                     that provided by the bf_max_job_part option for the back‐
3091                     fill scheduling  logic.   The  default  value  is  0  (no
3092                     limit).  Jobs excluded from attempted scheduling based
3093                     upon partition  will  not  be  counted  against  the  de‐
3094                     fault_queue_depth  limit.   Also  see the bf_max_job_part
3095                     option.
3096
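                         For example, a site with many partitions might raise
                         the overall queue depth while capping the per-parti‐
                         tion depth (illustrative values only):

                              SchedulerParameters=default_queue_depth=500,partition_job_depth=50
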
3097              permit_job_expansion
3098                     Allow running jobs to request additional nodes be  merged
3099                     in with the current job allocation.
3100
3101              preempt_reorder_count=#
3102                     Specify how many attempts should be made in reordering pre‐
3103                     emptable jobs to minimize the count  of  jobs  preempted.
3104                     The  default value is 1. High values may adversely impact
3105                     performance.  The logic to support this  option  is  only
3106                     available  in  the  select/cons_res  and select/cons_tres
3107                     plugins.
3108
3109              preempt_strict_order
3110                     If set, then execute extra logic in an attempt to preempt
3111                     only  the  lowest  priority jobs.  It may be desirable to
3112                     set this configuration parameter when there are  multiple
3113                     priorities  of  preemptable  jobs.   The logic to support
3114                     this option is only available in the select/cons_res  and
3115                     select/cons_tres plugins.
3116
3117              preempt_youngest_first
3118                     If  set,  then  the  preemption sorting algorithm will be
3119                     changed to sort by the job start times to favor  preempt‐
3120                     ing  younger  jobs  over  older. (Requires preempt/parti‐
3121                     tion_prio or preempt/qos plugins.)
3122
3123              reduce_completing_frag
3124                     This option is used to  control  how  scheduling  of  re‐
3125                     sources  is  performed  when  jobs  are in the COMPLETING
3126                     state, which influences potential fragmentation.  If this
3127                     option  is  not  set  then no jobs will be started in any
3128                     partition when any job is in  the  COMPLETING  state  for
3129                     less  than  CompleteWait  seconds.  If this option is set
3130                     then no jobs will be started in any individual  partition
3131                     that  has  a  job  in COMPLETING state for less than Com‐
3132                     pleteWait seconds.  In addition, no jobs will be  started
3133                     in  any  partition with nodes that overlap with any nodes
3134                     in the partition of the completing job.  This  option  is
3135                     to be used in conjunction with CompleteWait.
3136
3137                     NOTE: CompleteWait must be set in order for this to work.
3138                     If CompleteWait=0 then this option does nothing.
3139
3140                     NOTE: reduce_completing_frag only affects the main sched‐
3141                     uler, not the backfill scheduler.
3142
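                         For example, since CompleteWait must be non-zero for
                         this option to have any effect, a configuration using
                         it might look like (the CompleteWait value is illus‐
                         trative):

                              CompleteWait=30
                              SchedulerParameters=reduce_completing_frag
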
3143              requeue_setup_env_fail
3144                     By default, if a job's environment setup fails, the job keeps
3145                     running with a limited environment.  By  specifying  this
3146                     parameter  the job will be requeued in held state and the
3147                     execution node drained.
3148
3149              salloc_wait_nodes
3150                     If defined, the salloc command will wait until all  allo‐
3151                     cated  nodes  are  ready for use (i.e. booted) before the
3152                     command returns. By default, salloc will return  as  soon
3153                     as the resource allocation has been made.
3154
3155              sbatch_wait_nodes
3156                     If  defined,  the sbatch script will wait until all allo‐
3157                     cated nodes are ready for use (i.e.  booted)  before  the
3158                     initiation.  By default, the sbatch script will be initi‐
3159                     ated as soon as the first node in the job  allocation  is
3160                     ready.  The  sbatch  command can use the --wait-all-nodes
3161                     option to override this configuration parameter.
3162
3163              sched_interval=#
3164                     How frequently, in seconds, the main scheduling loop will
3165                     execute  and test all pending jobs.  The default value is
3166                     60 seconds.
3167
3168              sched_max_job_start=#
3169                     The maximum number of jobs that the main scheduling logic
3170                     will start in any single execution.  The default value is
3171                     zero, which imposes no limit.
3172
3173              sched_min_interval=#
3174                     How frequently, in microseconds, the main scheduling loop
3175                     will  execute  and  test any pending jobs.  The scheduler
3176                     runs in a limited fashion every time that any event  hap‐
3177                     pens  which could enable a job to start (e.g. job submit,
3178                     job terminate, etc.).  If these events happen at  a  high
3179                     frequency, the scheduler can run very frequently and con‐
3180                     sume significant resources if not throttled by  this  op‐
3181                     tion.  This option specifies the minimum time between the
3182                     end of one scheduling cycle and the beginning of the next
3183                     scheduling  cycle.   A  value of zero will disable throt‐
3184                     tling of the  scheduling  logic  interval.   The  default
3185                     value  is 1,000,000 microseconds on Cray/ALPS systems and
3186                     2 microseconds on other systems.
3187
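                         For example, a site receiving bursts of submissions
                         might throttle the event-driven scheduler along these
                         lines (illustrative values, not recommendations):

                              SchedulerParameters=defer,sched_min_interval=100000,sched_interval=120
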
3188              spec_cores_first
3189                     Specialized cores will be selected from the  first  cores
3190                     of  the  first  sockets, cycling through the sockets on a
3191                     round robin basis.  By default, specialized cores will be
3192                     selected from the last cores of the last sockets, cycling
3193                     through the sockets on a round robin basis.
3194
3195              step_retry_count=#
3196                     When a step completes and there are steps ending resource
3197                     allocation, then retry step allocations for at least this
3198                     number of pending steps.  Also see step_retry_time.   The
3199                     default value is 8 steps.
3200
3201              step_retry_time=#
3202                     When a step completes and there are steps ending resource
3203                     allocation, then retry step  allocations  for  all  steps
3204                     which  have been pending for at least this number of sec‐
3205                     onds.  Also see step_retry_count.  The default  value  is
3206                     60 seconds.
3207
3208              whole_hetjob
3209                     Requests  to  cancel,  hold or release any component of a
3210                     heterogeneous job will be applied to  all  components  of
3211                     the job.
3212
3213                     NOTE: this option was previously named whole_pack,
3214                     which is still supported for backward compatibility.
3215
3216
3217       SchedulerTimeSlice
3218              Number of seconds in each time slice when gang scheduling is en‐
3219              abled  (PreemptMode=SUSPEND,GANG).   The value must be between 5
3220              seconds and 65533 seconds.  The default value is 30 seconds.
3221
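              For example, gang scheduling with one-minute time slices could
              be sketched as follows (illustrative values):

                   PreemptMode=SUSPEND,GANG
                   SchedulerTimeSlice=60
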
3222
3223       SchedulerType
3224              Identifies the type of scheduler to be used.  Note the slurmctld
3225              daemon  must  be restarted for a change in scheduler type to be‐
3226              come effective (reconfiguring a running daemon has no effect for
3227              this  parameter).   The scontrol command can be used to manually
3228              change job priorities if desired.  Acceptable values include:
3229
3230              sched/backfill
3231                     For a backfill scheduling module to augment  the  default
3232                     FIFO   scheduling.   Backfill  scheduling  will  initiate
3233                     lower-priority jobs if doing so does not  delay  the  ex‐
3234                     pected  initiation  time of any higher priority job.  Ef‐
3235                     fectiveness of  backfill  scheduling  is  dependent  upon
3236                     users specifying job time limits, otherwise all jobs will
3237                     have the same time limit and backfilling  is  impossible.
3238                     See the documentation for the SchedulerParameters option
3239                     above.  This is the default configuration.
3240
3241              sched/builtin
3242                     This is the FIFO scheduler which initiates jobs in prior‐
3243                     ity order.  If any job in the partition can not be sched‐
3244                     uled, no lower priority job in  that  partition  will  be
3245                     scheduled.   An  exception  is made for jobs that can not
3246                     run due to partition constraints (e.g. the time limit) or
3247                     down/drained  nodes.   In  that case, lower priority jobs
3248                     can be initiated and not impact the higher priority job.
3249
3250              sched/hold
3251                     To hold all newly arriving jobs if a file
3252                     "/etc/slurm.hold" exists; otherwise use the built-in
3253                     FIFO scheduler.
3254
3255
3256       ScronParameters
3257              Multiple options may be comma separated.
3258
3259              enable Enable the use of scrontab to submit and manage  periodic
3260                     repeating jobs.
3261
3262
3263       SelectType
3264              Identifies  the type of resource selection algorithm to be used.
3265              Changing this value can only be done by restarting the slurmctld
3266              daemon.  When changed, all job information (running and pending)
3267              will be lost, since the job  state  save  format  used  by  each
3268              plugin  is different.  The only exception to this is when chang‐
3269              ing from cons_res to cons_tres or from  cons_tres  to  cons_res.
3270              However,  if a job contains cons_tres-specific features and then
3271              SelectType is changed to cons_res, the  job  will  be  canceled,
3272              since  there is no way for cons_res to satisfy requirements spe‐
3273              cific to cons_tres.
3274
3275              Acceptable values include
3276
3277              select/cons_res
3278                     The resources (cores and memory) within a node are  indi‐
3279                     vidually  allocated  as  consumable resources.  Note that
3280                     whole nodes can be allocated to jobs for selected  parti‐
3281                     tions  by  using the OverSubscribe=Exclusive option.  See
3282                     the partition OverSubscribe parameter for  more  informa‐
3283                     tion.
3284
3285              select/cons_tres
3286                     The  resources  (cores, memory, GPUs and all other track‐
3287                     able resources) within a node are individually  allocated
3288                     as  consumable  resources.   Note that whole nodes can be
3289                     allocated to jobs for selected partitions  by  using  the
3290                     OverSubscribe=Exclusive  option.  See the partition Over‐
3291                     Subscribe parameter for more information.
3292
3293              select/cray_aries
3294                     for  a  Cray  system.   The   default   value   is   "se‐
3295                     lect/cray_aries" for all Cray systems.
3296
3297              select/linear
3298                     for allocation of entire nodes assuming a one-dimensional
3299                     array of nodes in which sequentially  ordered  nodes  are
3300                     preferable.   For a heterogeneous cluster (e.g. different
3301                     CPU counts on the various  nodes),  resource  allocations
3302                     will  favor  nodes  with  high CPU counts as needed based
3303                     upon the job's node and CPU specification if TopologyPlu‐
3304                     gin=topology/none  is  configured.  Use of other topology
3305                     plugins with select/linear and heterogeneous nodes is not
3306                     recommended  and  may  result in valid job allocation re‐
3307                     quests being rejected.  This is the default value.
3308
3309
3310       SelectTypeParameters
3311              The permitted values of  SelectTypeParameters  depend  upon  the
3312              configured  value of SelectType.  The only supported options for
3313              SelectType=select/linear are CR_ONE_TASK_PER_CORE and CR_Memory,
3314              which treats memory as a consumable resource and prevents memory
3315              over subscription with job preemption or  gang  scheduling.   By
3316              default  SelectType=select/linear  allocates whole nodes to jobs
3317              without considering their memory consumption.   By  default  Se‐
3318              lectType=select/cons_res,  SelectType=select/cray_aries, and Se‐
3319              lectType=select/cons_tres use CR_CPU, which allocates CPUs
3320              (threads) to jobs without considering their memory consumption.
3321
3322              The   following   options   are   supported  for  SelectType=se‐
3323              lect/cray_aries:
3324
3325                     OTHER_CONS_RES
3326                            Layer the select/cons_res  plugin  under  the  se‐
3327                            lect/cray_aries plugin, the default is to layer on
3328                            select/linear.  This also allows all  the  options
3329                            available for SelectType=select/cons_res.
3330
3331                     OTHER_CONS_TRES
3332                            Layer  the  select/cons_tres  plugin under the se‐
3333                            lect/cray_aries plugin, the default is to layer on
3334                            select/linear.   This  also allows all the options
3335                            available for SelectType=select/cons_tres.
3336
3337              The  following  options  are  supported  by  the  SelectType=se‐
3338              lect/cons_res and SelectType=select/cons_tres plugins:
3339
3340                     CR_CPU CPUs are consumable resources.  Configure the num‐
3341                            ber of CPUs on each node, which may  be  equal  to
3342                            the  count  of  cores or hyper-threads on the node
3343                            depending upon the desired minimum resource  allo‐
3344                            cation.   The  node's  Boards,  Sockets, CoresPer‐
3345                            Socket and ThreadsPerCore may optionally  be  con‐
3346                            figured  and  result in job allocations which have
3347                            improved locality; however doing so  will  prevent
3348                            more  than  one  job  from being allocated on each
3349                            core.
3350
3351                     CR_CPU_Memory
3352                            CPUs and memory are consumable resources.  Config‐
3353                            ure  the number of CPUs on each node, which may be
3354                            equal to the count of cores  or  hyper-threads  on
3355                            the  node  depending  upon the desired minimum re‐
3356                            source allocation.  The  node's  Boards,  Sockets,
3357                            CoresPerSocket  and  ThreadsPerCore may optionally
3358                            be configured and result in job allocations  which
3359                            have improved locality; however doing so will pre‐
3360                            vent more than one job  from  being  allocated  on
3361                            each  core.   Setting  a value for DefMemPerCPU is
3362                            strongly recommended.
3363
3364                     CR_Core
3365                            Cores are consumable resources.  On nodes with hy‐
3366                            per-threads,  each  thread  is counted as a CPU to
3367                            satisfy a job's resource requirement, but multiple
3368                            jobs  are  not allocated threads on the same core.
3369                            The count of CPUs allocated to a job is rounded up
3370                            to  account  for  every  CPU on an allocated core.
3371                            This also impacts total allocated memory when
3372                            --mem-per-cpu is used: it becomes a multiple of
3373                            the total number of CPUs on the allocated cores.
3374
3375                     CR_Core_Memory
3376                            Cores and memory  are  consumable  resources.   On
3377                            nodes  with  hyper-threads, each thread is counted
3378                            as a CPU to satisfy a job's resource  requirement,
3379                            but multiple jobs are not allocated threads on the
3380                            same core.  The count of CPUs allocated to  a  job
3381                            may  be  rounded up to account for every CPU on an
3382                            allocated core.  Setting a value for  DefMemPerCPU
3383                            is strongly recommended.
3384
3385                     CR_ONE_TASK_PER_CORE
3386                            Allocate  one  task  per core by default.  Without
3387                            this option, by default one task will be allocated
3388                            per thread on nodes with more than one ThreadsPer‐
3389                            Core configured.  NOTE: This option cannot be used
3390                            with CR_CPU*.
3391
3392                     CR_CORE_DEFAULT_DIST_BLOCK
3393                            Allocate cores within a node using block distribu‐
3394                            tion by default.  This is a pseudo-best-fit  algo‐
3395                            rithm that minimizes the number of boards and min‐
3396                            imizes  the  number  of  sockets  (within  minimum
3397                            boards) used for the allocation.  This default be‐
3398                            havior can be overridden by specifying a par‐
3399                            ticular "-m" parameter with srun/salloc/sbatch.
3400                            Without this option, cores will be allocated
3401                            cyclically across the sockets.
3402
3403                     CR_LLN Schedule  resources  to  jobs  on the least loaded
3404                            nodes (based upon the number of idle  CPUs).  This
3405                            is  generally  only recommended for an environment
3406                            with serial jobs as idle resources will tend to be
3407                            highly  fragmented, resulting in parallel jobs be‐
3408                            ing distributed across many nodes.  Note that node
3409                            Weight  takes  precedence  over  how many idle re‐
3410                            sources are on each node.  Also see the  partition
3411                            configuration parameter LLN to use the least loaded
3412                            nodes in selected partitions.
3413
3414                     CR_Pack_Nodes
3415                            If a job allocation contains more  resources  than
3416                            will  be  used  for launching tasks (e.g. if whole
3417                            nodes are allocated to a job),  then  rather  than
3418                            distributing a job's tasks evenly across its allo‐
3419                            cated nodes, pack them as tightly as  possible  on
3420                            these  nodes.  For example, consider a job alloca‐
3421                            tion containing two entire nodes with  eight  CPUs
3422                            each.   If  the  job starts ten tasks across those
3423                            two nodes without this option, it will start  five
3424                            tasks on each of the two nodes.  With this option,
3425                            eight tasks will be started on the first node  and
3426                            two  tasks on the second node.  This can be super‐
3427                            seded by "NoPack" in srun's  "--distribution"  op‐
3428                            tion.  CR_Pack_Nodes only applies when the "block"
3429                            task distribution method is used.
3430
3431                     CR_Socket
3432                            Sockets are consumable resources.  On  nodes  with
3433                            multiple  cores, each core or thread is counted as
3434                            a CPU to satisfy a job's resource requirement, but
3435                            multiple  jobs  are not allocated resources on the
3436                            same socket.
3437
3438                     CR_Socket_Memory
3439                            Memory and sockets are consumable  resources.   On
3440                            nodes  with multiple cores, each core or thread is
3441                            counted as a CPU to satisfy a job's  resource  re‐
3442                            quirement, but multiple jobs are not allocated re‐
3443                            sources on the same socket.  Setting a  value  for
3444                            DefMemPerCPU is strongly recommended.
3445
3446                     CR_Memory
3447                            Memory  is  a consumable resource.  NOTE: This im‐
3448                            plies OverSubscribe=YES or OverSubscribe=FORCE for
3449                            all  partitions.  Setting a value for DefMemPerCPU
3450                            is strongly recommended.
3451
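              For example, a cluster that allocates individual cores and mem‐
              ory (and tracks GPUs and other resources) might use a configu‐
              ration along these lines (the DefMemPerCPU value is illustra‐
              tive only):

                   SelectType=select/cons_tres
                   SelectTypeParameters=CR_Core_Memory
                   DefMemPerCPU=2048
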
3452
3453       SlurmctldAddr
3454              An optional address to be used for communications  to  the  cur‐
3455              rently  active  slurmctld  daemon, normally used with Virtual IP
3456              addressing of the currently active server.  If this parameter is
3457              not  specified then each primary and backup server will have its
3458              own unique address used for communications as specified  in  the
3459              SlurmctldHost  parameter.   If  this parameter is specified then
3460              the SlurmctldHost parameter will still be  used  for  communica‐
3461              tions to specific slurmctld primary or backup servers, for exam‐
3462              ple to cause all of them to read the current configuration files
3463              or  shutdown.   Also  see the SlurmctldPrimaryOffProg and Slurm‐
3464              ctldPrimaryOnProg configuration parameters to configure programs
3465              that manage the virtual IP address.
3466
3467
3468       SlurmctldDebug
3469              The level of detail to provide in the slurmctld daemon's logs.
3470              The default value is info.  If the slurmctld daemon is initiated
3471              with -v or --verbose options, that debug level will be preserved
3472              or restored upon reconfiguration.
3473
3474
3475              quiet     Log nothing
3476
3477              fatal     Log only fatal errors
3478
3479              error     Log only errors
3480
3481              info      Log errors and general informational messages
3482
3483              verbose   Log errors and verbose informational messages
3484
3485              debug     Log errors and verbose informational messages and  de‐
3486                        bugging messages
3487
3488              debug2    Log errors and verbose informational messages and more
3489                        debugging messages
3490
3491              debug3    Log errors and verbose informational messages and even
3492                        more debugging messages
3493
3494              debug4    Log errors and verbose informational messages and even
3495                        more debugging messages
3496
3497              debug5    Log errors and verbose informational messages and even
3498                        more debugging messages
3499
3500
3501       SlurmctldHost
3502              The  short, or long, hostname of the machine where Slurm control
3503              daemon is executed (i.e. the name returned by the command "host‐
3504              name -s").  This hostname is optionally followed by the address,
3505              either the IP address or a name by  which  the  address  can  be
3506              identified, enclosed in parentheses (e.g. SlurmctldHost=slurm‐
3507              ctl-primary(12.34.56.78)). This value must be specified at least
3508              once. If specified more than once, the first hostname named will
3509              be where the daemon runs.  If the first  specified  host  fails,
3510              the  daemon  will execute on the second host.  If both the first
3511              and second specified hosts fail, the daemon will execute on the
3512              third host.
3513
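              For example, a primary and a backup controller could be listed
              as follows (hostnames and addresses are hypothetical):

                   SlurmctldHost=ctl-primary(10.0.0.1)
                   SlurmctldHost=ctl-backup(10.0.0.2)
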
3514
3515       SlurmctldLogFile
3516              Fully qualified pathname of a file into which the slurmctld dae‐
3517              mon's logs are written.  The default  value  is  none  (performs
3518              logging via syslog).
3519              See the section LOGGING if a pathname is specified.
3520
3521
3522       SlurmctldParameters
3523              Multiple options may be comma separated.
3524
3525
3526              allow_user_triggers
3527                     Permit  setting  triggers from non-root/slurm_user users.
3528                     SlurmUser must also be set to root to permit these  trig‐
3529                     gers  to  work.  See the strigger man page for additional
3530                     details.
3531
3532              cloud_dns
3533                     By default, Slurm expects that the network address for  a
3534                     cloud  node won't be known until the creation of the node
3535                     and that Slurm will be notified  of  the  node's  address
3536                     (e.g.  scontrol  update nodename=<name> nodeaddr=<addr>).
3537                     Since Slurm communications rely on the node configuration
3538                     found  in the slurm.conf, Slurm will tell the client com‐
3539                     mand, after waiting for all nodes to boot, each node's IP
3540                     address.  However, in environments where the nodes are in
3541                     DNS, this step can be avoided by configuring this option.
3542
3543              cloud_reg_addrs
3544                     When a cloud node  registers,  the  node's  NodeAddr  and
3545                     NodeHostName  will automatically be set. They will be re‐
3546                     set back to the nodename after powering off.
3547
3548              enable_configless
3549                     Permit "configless" operation by the slurmd,  slurmstepd,
3550                     and  user commands.  When enabled the slurmd will be per‐
3551                     mitted to retrieve config files from the  slurmctld,  and
3552                     on any 'scontrol reconfigure' command new configs will be
3553                     automatically pushed out and applied to  nodes  that  are
3554                     running  in  this  "configless" mode.  NOTE: a restart of
3555                     the slurmctld is required for this to take effect.
3556
3557              idle_on_node_suspend
3558                     Mark nodes as idle, regardless  of  current  state,  when
3559                     suspending  nodes  with SuspendProgram so that nodes will
3560                     be eligible to be resumed at a later time.
3561
3562              power_save_interval
3563                     How often the power_save thread looks to resume and  sus‐
3564                     pend  nodes. The power_save thread will do work sooner if
3565                     there are node state changes. Default is 10 seconds.
3566
3567              power_save_min_interval
3568                     How often the power_save thread, at a minimum, looks to
3569                     resume and suspend nodes. Default is 0.
3570
3571              max_dbd_msg_action
3572                     Action used once MaxDBDMsgs is reached, options are 'dis‐
3573                     card' (default) and 'exit'.
3574
3575                     When 'discard' is specified and MaxDBDMsgs is reached,
3576                     pending messages of types Step start and complete are
3577                     purged first; if MaxDBDMsgs is reached again, Job start
3578                     messages are purged.  Job complete and node state
3579                     change messages continue to consume the space freed by
3580                     the purging until MaxDBDMsgs is reached once more, at
3581                     which point no new messages are tracked, creating data
3582                     loss and potentially runaway jobs.
3583
3584                     When  'exit'  is  specified and MaxDBDMsgs is reached the
3585                     slurmctld will exit instead of discarding  any  messages.
3586                     It  will  be  impossible to start the slurmctld with this
3587                     option while the slurmdbd is down and the slurmctld is
3588                     tracking more than MaxDBDMsgs.
3589
3590
3591              preempt_send_user_signal
3592                     Send the user signal (e.g. --signal=<sig_num>) at preemp‐
3593                     tion time even if the signal time hasn't been reached. In
3594                     the  case  of a gracetime preemption the user signal will
3595                     be sent if the user signal has  been  specified  and  not
3596                     sent, otherwise a SIGTERM will be sent to the tasks.
3597
3598              reboot_from_controller
3599                     Run  the  RebootProgram from the controller instead of on
3600                     the slurmds. The RebootProgram will be  passed  a  comma-
3601                     separated list of nodes to reboot.
3602
3603              user_resv_delete
3604                     Allow any user able to run in a reservation to delete it.
3605
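              For example, a site running cloud or diskless nodes might com‐
              bine several of these options (an illustrative selection, not a
              recommendation):

                   SlurmctldParameters=enable_configless,cloud_dns,idle_on_node_suspend
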
3606
3607       SlurmctldPidFile
3608              Fully  qualified  pathname  of  a file into which the  slurmctld
3609              daemon may write its process id. This may be used for  automated
3610              signal   processing.   The  default  value  is  "/var/run/slurm‐
3611              ctld.pid".
3612
3613
3614       SlurmctldPlugstack
3615              A comma delimited list of Slurm controller plugins to be started
3616              when  the  daemon  begins and terminated when it ends.  Only the
3617              plugin's init and fini functions are called.
3618
3619
3620       SlurmctldPort
3621              The port number that the Slurm controller, slurmctld, listens to
3622              for  work. The default value is SLURMCTLD_PORT as established at
3623              system build time. If none is explicitly specified, it  will  be
3624              set  to 6817.  SlurmctldPort may also be configured to support a
3625              range of port numbers in order to accept larger bursts of incom‐
3626              ing messages by specifying two numbers separated by a dash (e.g.
3627              SlurmctldPort=6817-6818).  NOTE: Either the slurmctld and slurmd
3628              daemons must not execute on the same nodes, or the values of
3629              SlurmctldPort and SlurmdPort must be different.
3630
3631              Note: On Cray systems, Realm-Specific IP Addressing (RSIP)  will
3632              automatically  try  to  interact  with  anything opened on ports
3633              8192-60000.  Configure SlurmctldPort to use a  port  outside  of
3634              the configured SrunPortRange and RSIP's port range.
3635
3636
3637       SlurmctldPrimaryOffProg
3638              This  program is executed when a slurmctld daemon running as the
3639              primary server becomes a backup server. By default no program is
3640              executed.  See also the related "SlurmctldPrimaryOnProg" parame‐
3641              ter.
3642
3643
3644       SlurmctldPrimaryOnProg
3645              This program is executed when a slurmctld daemon  running  as  a
3646              backup  server becomes the primary server. By default no program
3647              is executed.  When using virtual IP addresses to manage Highly
3648              Available Slurm services, this program can be used to add the IP
3649              address to an interface (and optionally try to  kill  the  unre‐
3650              sponsive   slurmctld daemon and flush the ARP caches on nodes on
3651              the local ethernet fabric).  See also the related "SlurmctldPri‐
3652              maryOffProg" parameter.
3653
3654       SlurmctldSyslogDebug
3655              The  slurmctld  daemon will log events to the syslog file at the
3656              specified level of detail. If not set, the slurmctld daemon will
3657              log  to  syslog at level fatal, unless there is no SlurmctldLog‐
3658              File and it is running in the background, in which case it  will
3659              log to syslog at the level specified by SlurmctldDebug (at fatal
3660              in the case that SlurmctldDebug is set to quiet), or if it is
3661              run in the foreground, in which case it will be set to quiet.
3662
3663
3664              quiet     Log nothing
3665
3666              fatal     Log only fatal errors
3667
3668              error     Log only errors
3669
3670              info      Log errors and general informational messages
3671
3672              verbose   Log errors and verbose informational messages
3673
3674              debug     Log  errors and verbose informational messages and de‐
3675                        bugging messages
3676
3677              debug2    Log errors and verbose informational messages and more
3678                        debugging messages
3679
3680              debug3    Log errors and verbose informational messages and even
3681                        more debugging messages
3682
3683              debug4    Log errors and verbose informational messages and even
3684                        more debugging messages
3685
3686              debug5    Log errors and verbose informational messages and even
3687                        more debugging messages
3688
3689
3690
3691       SlurmctldTimeout
3692              The interval, in seconds, that the backup controller  waits  for
3693              the  primary controller to respond before assuming control.  The
3694              default value is 120 seconds.  May not exceed 65533.
3695
3696
3697       SlurmdDebug
3698              The level of detail to provide in the slurmd daemon's logs.
3699              The default value is info.
3700
3701              quiet     Log nothing
3702
3703              fatal     Log only fatal errors
3704
3705              error     Log only errors
3706
3707              info      Log errors and general informational messages
3708
3709              verbose   Log errors and verbose informational messages
3710
3711              debug     Log  errors and verbose informational messages and de‐
3712                        bugging messages
3713
3714              debug2    Log errors and verbose informational messages and more
3715                        debugging messages
3716
3717              debug3    Log errors and verbose informational messages and even
3718                        more debugging messages
3719
3720              debug4    Log errors and verbose informational messages and even
3721                        more debugging messages
3722
3723              debug5    Log errors and verbose informational messages and even
3724                        more debugging messages
3725
3726
3727       SlurmdLogFile
3728              Fully qualified pathname of a file into which the   slurmd  dae‐
3729              mon's  logs  are  written.   The default value is none (performs
3730              logging via syslog).  Any "%h" within the name is replaced  with
3731              the  hostname  on  which the slurmd is running.  Any "%n" within
3732              the name is replaced with the  Slurm  node  name  on  which  the
3733              slurmd is running.
3734              See the section LOGGING if a pathname is specified.
3735
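              For example, to give each node its own log file named after its
              Slurm node name (the path shown is illustrative):

                   SlurmdLogFile=/var/log/slurm/slurmd.%n.log
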
3736
3737       SlurmdParameters
3738              Parameters  specific  to  the  Slurmd.   Multiple options may be
3739              comma separated.
3740
3741              config_overrides
3742                     If set, consider the configuration of  each  node  to  be
3743                     that  specified  in the slurm.conf configuration file and
3744                     any node with less than the configured resources will not
3745                     be set to DRAIN.  This option is generally only useful for
3746                     testing  purposes.   Equivalent  to  the  now  deprecated
3747                     FastSchedule=2 option.
3748
3749              shutdown_on_reboot
3750                     If  set,  the  Slurmd will shut itself down when a reboot
3751                     request is received.
3752
3753
3754       SlurmdPidFile
3755              Fully qualified pathname of a file into which the  slurmd daemon
3756              may  write its process id. This may be used for automated signal
3757              processing.  Any "%h" within the name is replaced with the host‐
3758              name  on  which the slurmd is running.  Any "%n" within the name
3759              is replaced with the Slurm node name on which the slurmd is run‐
3760              ning.  The default value is "/var/run/slurmd.pid".
3761
3762
3763       SlurmdPort
3764              The port number that the Slurm compute node daemon, slurmd, lis‐
3765              tens to for work. The default value  is  SLURMD_PORT  as  estab‐
3766              lished  at  system  build time. If none is explicitly specified,
3767              its value will be 6818.  NOTE: Either the slurmctld and slurmd
3768              daemons must not execute on the same nodes, or the values of
3769              SlurmctldPort and SlurmdPort must be different.
3770
3771              Note: On Cray systems, Realm-Specific IP Addressing (RSIP)  will
3772              automatically  try  to  interact  with  anything opened on ports
3773              8192-60000.  Configure SlurmdPort to use a port outside  of  the
3774              configured SrunPortRange and RSIP's port range.
3775
3776
3777       SlurmdSpoolDir
3778              Fully  qualified  pathname  of a directory into which the slurmd
3779              daemon's state information and batch job script information  are
3780              written.  This  must  be  a  common  pathname for all nodes, but
3781              should represent a directory which is local to each node (refer‐
3782              ence    a   local   file   system).   The   default   value   is
3783              "/var/spool/slurmd".  Any "%h" within the name is replaced  with
3784              the  hostname  on  which the slurmd is running.  Any "%n" within
3785              the name is replaced with the  Slurm  node  name  on  which  the
3786              slurmd is running.
3787
3788
3789       SlurmdSyslogDebug
3790              The  slurmd  daemon  will  log  events to the syslog file at the
3791              specified level of detail. If not set, the  slurmd  daemon  will
3792              log  to  syslog at level fatal, unless there is no SlurmdLogFile
3793              and it is running in the background, in which case it  will  log
3794              to  syslog  at  the level specified by SlurmdDebug  (at fatal in
3795              the case that SlurmdDebug is set to quiet), or if it is run in
3796              the foreground, in which case it will be set to quiet.
3797
3798
3799              quiet     Log nothing
3800
3801              fatal     Log only fatal errors
3802
3803              error     Log only errors
3804
3805              info      Log errors and general informational messages
3806
3807              verbose   Log errors and verbose informational messages
3808
3809              debug     Log  errors and verbose informational messages and de‐
3810                        bugging messages
3811
3812              debug2    Log errors and verbose informational messages and more
3813                        debugging messages
3814
3815              debug3    Log errors and verbose informational messages and even
3816                        more debugging messages
3817
3818              debug4    Log errors and verbose informational messages and even
3819                        more debugging messages
3820
3821              debug5    Log errors and verbose informational messages and even
3822                        more debugging messages
3823
3824
3825       SlurmdTimeout
3826              The interval, in seconds, that the Slurm  controller  waits  for
3827              slurmd  to respond before configuring that node's state to DOWN.
3828              A value of zero indicates the node will not be tested by  slurm‐
3829              ctld  to confirm the state of slurmd, the node will not be auto‐
3830              matically set  to  a  DOWN  state  indicating  a  non-responsive
3831              slurmd,  and  some other tool will take responsibility for moni‐
3832              toring the state of each compute node  and  its  slurmd  daemon.
3833              Slurm's hierarchical communication mechanism is used to ping the
3834              slurmd daemons in order to minimize system noise  and  overhead.
3835              The  default  value  is  300  seconds.  The value may not exceed
3836              65533 seconds.
3837
3838
3839       SlurmdUser
3840              The name of the user that the slurmd daemon executes  as.   This
3841              user  must  exist on all nodes of the cluster for authentication
3842              of communications between Slurm components.  The  default  value
3843              is "root".
3844
3845
3846       SlurmSchedLogFile
3847              Fully  qualified  pathname of the scheduling event logging file.
3848              The syntax of this parameter is the same  as  for  SlurmctldLog‐
3849              File.   In  order  to  configure scheduler logging, set both the
3850              SlurmSchedLogFile and SlurmSchedLogLevel parameters.
3851
3852
3853       SlurmSchedLogLevel
3854              The initial level of scheduling event logging,  similar  to  the
3855              SlurmctldDebug  parameter  used  to control the initial level of
3856              slurmctld logging.  Valid values for SlurmSchedLogLevel are  "0"
3857              (scheduler  logging  disabled)  and  "1"  (scheduler logging en‐
3858              abled).  If this parameter is omitted, the value defaults to "0"
3859              (disabled).   In  order to configure scheduler logging, set both
3860              the SlurmSchedLogFile and  SlurmSchedLogLevel  parameters.   The
3861              scheduler  logging  level can be changed dynamically using scon‐
3862              trol.
3863
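              For example, scheduler logging could be enabled with a configu‐
              ration such as the following (the path is illustrative):

                   SlurmSchedLogFile=/var/log/slurm/slurm_sched.log
                   SlurmSchedLogLevel=1
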
3864
3865       SlurmUser
3866              The name of the user that the slurmctld daemon executes as.  For
3867              security  purposes,  a  user  other  than "root" is recommended.
3868              This user must exist on all nodes of the cluster for authentica‐
3869              tion  of  communications  between Slurm components.  The default
3870              value is "root".
3871
3872
3873       SrunEpilog
3874              Fully qualified pathname of an executable to be run by srun fol‐
3875              lowing the completion of a job step.  The command line arguments
3876              for the executable will be the command and arguments of the  job
3877              step.   This configuration parameter may be overridden by srun's
3878              --epilog parameter. Note that while the other "Epilog"  executa‐
3879              bles  (e.g.,  TaskEpilog) are run by slurmd on the compute nodes
3880              where the tasks are executed, the SrunEpilog runs  on  the  node
3881              where the "srun" is executing.
3882
3883
3884       SrunPortRange
3885              The  srun  creates  a set of listening ports to communicate with
3886              the controller, the slurmstepd and  to  handle  the  application
3887              I/O.  By default these ports are ephemeral meaning the port num‐
3888              bers are selected by the kernel.  Using this parameter allows
3889              sites to configure a range of ports from which srun ports will
3890              be selected. This is useful if sites want to allow only a cer‐
3891              tain port range on their network.
3892
3893              Note:  On Cray systems, Realm-Specific IP Addressing (RSIP) will
3894              automatically try to interact  with  anything  opened  on  ports
3895              8192-60000.   Configure  SrunPortRange  to  use a range of ports
3896              above those used by RSIP, ideally 1000 or more ports, for  exam‐
3897              ple "SrunPortRange=60001-63000".
3898
3899              Note:  A  sufficient number of ports must be configured based on
3900              the estimated number of srun processes on the submission nodes,
3901              considering that srun opens 3 listening ports plus 2 more for
3902              every 48 hosts. Example:
3903
3904              srun -N 48 will use 5 listening ports.
3905
3906
3907              srun -N 50 will use 7 listening ports.
3908
3909
3910              srun -N 200 will use 13 listening ports.
3911
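              Consistent with the examples above, the number of listening
              ports for an N-node srun can be estimated as 3 + 2 * ceil(N/48).
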
3912
3913       SrunProlog
3914              Fully qualified pathname of an executable  to  be  run  by  srun
3915              prior  to  the launch of a job step.  The command line arguments
3916              for the executable will be the command and arguments of the  job
3917              step.   This configuration parameter may be overridden by srun's
3918              --prolog parameter. Note that while the other "Prolog"  executa‐
3919              bles  (e.g.,  TaskProlog) are run by slurmd on the compute nodes
3920              where the tasks are executed, the SrunProlog runs  on  the  node
3921              where the "srun" is executing.
3922
3923
3924       StateSaveLocation
3925              Fully  qualified  pathname  of  a directory into which the Slurm
3926              controller,  slurmctld,  saves   its   state   (e.g.   "/usr/lo‐
3927              cal/slurm/checkpoint").  Slurm state will be saved here to recover
3928              from system failures.  SlurmUser must be able to create files in
3929              this  directory.   If you have a secondary SlurmctldHost config‐
3930              ured, this location should be readable and writable by both sys‐
3931              tems.   Since  all running and pending job information is stored
3932              here, the use of a reliable file system (e.g.  RAID)  is  recom‐
3933              mended.   The  default value is "/var/spool".  If any slurm dae‐
3934              mons terminate abnormally, their core files will also be written
3935              into this directory.
3936
3937
3938       SuspendExcNodes
3939              Specifies  the  nodes  which  are to not be placed in power save
3940              mode, even if the node remains idle for an  extended  period  of
3941              time.  Use Slurm's hostlist expression to identify nodes with an
3942              optional ":" separator and count of nodes to  exclude  from  the
3943              preceding  range.  For example "nid[10-20]:4" will prevent 4 us‐
3944              able nodes (i.e. IDLE and not DOWN, DRAINING or already powered
3945              down) in the set "nid[10-20]" from being powered down.  Multiple
3946              sets of nodes can be specified with or without counts in a comma
3947              separated list (e.g. "nid[10-20]:4,nid[80-90]:2").  If a node
3948              count specification is given, any list of nodes to  NOT  have  a
3949              node  count  must  be after the last specification with a count.
3950              For example "nid[10-20]:4,nid[60-70]" will exclude  4  nodes  in
3951              the set "nid[10-20]" plus all nodes in the set "nid[60-70]"
3952              while "nid[1-3],nid[10-20]:4" will exclude 4 nodes from the  set
3953              "nid[1-3],nid[10-20]".   By  default no nodes are excluded.  Re‐
3954              lated configuration options  include  ResumeTimeout,  ResumePro‐
3955              gram, ResumeRate, SuspendProgram, SuspendRate, SuspendTime, Sus‐
3956              pendTimeout, and SuspendExcParts.
3957
3958
3959       SuspendExcParts
3960              Specifies the partitions whose nodes are to  not  be  placed  in
3961              power  save  mode, even if the node remains idle for an extended
3962              period of time.  Multiple partitions can be identified and sepa‐
3963              rated  by  commas.   By  default no nodes are excluded.  Related
3964              configuration options include ResumeTimeout, ResumeProgram,  Re‐
3965              sumeRate, SuspendProgram, SuspendRate, SuspendTime,
3966              SuspendTimeout, and SuspendExcNodes.
3967
3968
3969       SuspendProgram
3970              SuspendProgram is the program that will be executed when a  node
3971              remains  idle  for  an extended period of time.  This program is
3972              expected to place the node into some power save mode.  This  can
3973              be  used  to  reduce the frequency and voltage of a node or com‐
3974              pletely power the node off.  The program executes as  SlurmUser.
3975              The  argument  to  the  program will be the names of nodes to be
3976              placed into power savings mode (using Slurm's  hostlist  expres‐
3977              sion  format).  By default, no program is run.  Related configu‐
3978              ration options include ResumeTimeout, ResumeProgram, ResumeRate,
3979              SuspendRate,  SuspendTime,  SuspendTimeout, SuspendExcNodes, and
3980              SuspendExcParts.
3981
3982
3983       SuspendRate
3984              The rate at which nodes are placed into power save mode by  Sus‐
3985              pendProgram.  The value is the number of nodes per minute and can
3986              be used to prevent a large drop in power consumption (e.g. after
3987              a  large  job  completes).  A value of zero results in no limits
3988              being imposed.  The default value is 60 nodes per  minute.   Re‐
3989              lated  configuration  options  include ResumeTimeout, ResumePro‐
3990              gram, ResumeRate, SuspendProgram,  SuspendTime,  SuspendTimeout,
3991              SuspendExcNodes, and SuspendExcParts.
3992
3993
3994       SuspendTime
3995              Nodes  which remain idle or down for this number of seconds will
3996              be placed into power save mode by SuspendProgram.  For efficient
3997              system utilization, it is recommended that the value of Suspend‐
3998              Time be at least as large as the sum of SuspendTimeout plus  Re‐
3999              sumeTimeout.   A value of -1 disables power save mode and is the
4000              default.  Related configuration options  include  ResumeTimeout,
4001              ResumeProgram, ResumeRate, SuspendProgram, SuspendRate, Suspend‐
4002              Timeout, SuspendExcNodes, and SuspendExcParts.
4003
4004
4005       SuspendTimeout
4006              Maximum time permitted (in seconds) between when a node  suspend
4007              request is issued and when the node is shut down.  At that time
4008              the node must be ready for a resume  request  to  be  issued  as
4009              needed  for new work.  The default value is 30 seconds.  Related
4010              configuration options include ResumeProgram, ResumeRate, Resume‐
4011              Timeout,  SuspendRate, SuspendTime, SuspendProgram, SuspendExcN‐
4012              odes and SuspendExcParts.  More information is available at  the
4013              Slurm web site ( https://slurm.schedmd.com/power_save.html ).
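
              A minimal power-saving sketch (the paths and values below are
              illustrative assumptions, not defaults) showing how the related
              options are typically configured together:

              SuspendProgram=/usr/local/sbin/node_suspend.sh
              ResumeProgram=/usr/local/sbin/node_resume.sh
              SuspendTime=1800
              SuspendTimeout=60
              ResumeTimeout=300
              SuspendExcNodes=nid[001-004]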
4014
4015
4016       SwitchType
4017              Identifies  the type of switch or interconnect used for applica‐
4018              tion     communications.      Acceptable     values      include
4019              "switch/cray_aries" for Cray systems, "switch/none" for switches
4020              not requiring special processing for job launch or termination
4021              (Ethernet and InfiniBand).  The default value is
4022              "switch/none".  All Slurm daemons, commands and running jobs
4023              must be restarted for a change in SwitchType to take effect.  If
4024              running jobs exist at the time slurmctld is restarted with a new
4025              value  of  SwitchType,  records  of all jobs in any state may be
4026              lost.
4027
4028
4029       TaskEpilog
4030              Fully qualified pathname of a program to be executed as the slurm
4031              job's  owner after termination of each task.  See TaskProlog for
4032              execution order details.
4033
4034
4035       TaskPlugin
4036              Identifies the type of task launch  plugin,  typically  used  to
4037              provide resource management within a node (e.g. pinning tasks to
4038              specific processors). More than one task plugin can be specified
4039              in  a  comma-separated  list. The prefix of "task/" is optional.
4040              Acceptable values include:
4041
4042              task/affinity  enables      resource      containment      using
4043                             sched_setaffinity().  This enables the --cpu-bind
4044                             and/or --mem-bind srun options.
4045
4046              task/cgroup    enables resource containment using Linux  control
4047                             cgroups.   This  enables  the  --cpu-bind  and/or
4048                             --mem-bind  srun   options.    NOTE:   see   "man
4049                             cgroup.conf" for configuration details.
4050
4051              task/none      for systems requiring no special handling of user
4052                             tasks.  Lacks support for the  --cpu-bind  and/or
4053                             --mem-bind  srun  options.   The default value is
4054                             "task/none".
4055
4056              NOTE: It is recommended to stack  task/affinity,task/cgroup  to‐
4057              gether  when configuring TaskPlugin, and setting TaskAffinity=no
4058              and ConstrainCores=yes  in  cgroup.conf.  This  setup  uses  the
4059              task/affinity  plugin  for  setting  the  affinity  of the tasks
4060              (which is better and different than task/cgroup)  and  uses  the
4061              task/cgroup  plugin to fence tasks into the specified resources,
4062              thus combining the best of both pieces.
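
              A brief configuration sketch of this recommended stacking
              (cgroup.conf shown only in the relevant part):

              TaskPlugin=task/affinity,task/cgroup

              and in cgroup.conf:

              TaskAffinity=no
              ConstrainCores=yes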
4063
4064              NOTE: For CRAY systems only: task/cgroup must be used with,  and
4065              listed  after  task/cray_aries  in TaskPlugin. The task/affinity
4066              plugin can be listed anywhere, but the previous constraint  must
4067              be  satisfied.  For  CRAY  systems, a configuration like this is
4068              recommended:
4069              TaskPlugin=task/affinity,task/cray_aries,task/cgroup
4070
4071
4072       TaskPluginParam
4073              Optional parameters  for  the  task  plugin.   Multiple  options
4074              should  be  comma  separated.   If None, Boards, Sockets, Cores,
4075              Threads, and/or Verbose are specified, they  will  override  the
4076              --cpu-bind  option  specified  by  the user in the srun command.
4077              None, Boards, Sockets, Cores and Threads are mutually  exclusive
4078              and since they decrease scheduling flexibility are not generally
4079              recommended (select no more than one of them).
4080
4081
4082              Boards    Bind tasks to boards by default.  Overrides  automatic
4083                        binding.
4084
4085              Cores     Bind  tasks  to cores by default.  Overrides automatic
4086                        binding.
4087
4088              None      Perform no task binding by default.   Overrides  auto‐
4089                        matic binding.
4090
4091              Sockets   Bind to sockets by default.  Overrides automatic bind‐
4092                        ing.
4093
4094              Threads   Bind to threads by default.  Overrides automatic bind‐
4095                        ing.
4096
4097              SlurmdOffSpec
4098                        If  specialized  cores  or CPUs are identified for the
4099                        node (i.e. the CoreSpecCount or CpuSpecList  are  con‐
4100                        figured  for  the node), then Slurm daemons running on
4101                        the compute node (i.e. slurmd and  slurmstepd)  should
4102                        run  outside  of those resources (i.e. specialized re‐
4103                        sources are completely unavailable  to  Slurm  daemons
4104                        and  jobs  spawned  by Slurm).  This option may not be
4105                        used with the task/cray_aries plugin.
4106
4107              Verbose   Verbosely report binding before tasks run.   Overrides
4108                        user options.
4109
4110              Autobind  Set a default binding in the event that "auto binding"
4111                        doesn't find a match.  Set to Threads, Cores or  Sock‐
4112                        ets (e.g. TaskPluginParam=autobind=threads).
4113
4114
4115       TaskProlog
4116              Fully qualified pathname of a program to be executed as the slurm
4117              job's owner prior to initiation of each task.  Besides the  nor‐
4118              mal  environment variables, this has SLURM_TASK_PID available to
4119              identify the process ID of the  task  being  started.   Standard
4120              output  from this program can be used to control the environment
4121              variables and output for the user program.
4122
4123              export NAME=value   Will set environment variables for the  task
4124                                  being  spawned.   Everything after the equal
4125                                  sign to the end of the line will be used  as
4126                                  the value for the environment variable.  Ex‐
4127                                  porting of functions is not  currently  sup‐
4128                                  ported.
4129
4130              print ...           Will  cause  that  line (without the leading
4131                                  "print ") to be printed to the  job's  stan‐
4132                                  dard output.
4133
4134              unset NAME          Will  clear  environment  variables  for the
4135                                  task being spawned.
4136
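              A brief sketch of output a TaskProlog program might print (the
              variable names are illustrative, not required by Slurm):

              export OMP_NUM_THREADS=4
              print task prolog: environment prepared
              unset DEBUG_MODE
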
4137              The order of task prolog/epilog execution is as follows:
4138
4139              1. pre_launch_priv()
4140                                  Function in TaskPlugin
4141
4142              2. pre_launch()     Function in TaskPlugin
4143
4144              3. TaskProlog       System-wide per task program defined in
4145                                  slurm.conf
4146
4147              4. User prolog      Job-step-specific task program defined using
4148                                  srun's --task-prolog option or
4149                                  SLURM_TASK_PROLOG environment variable
4150
4151              5. Task             Execute the job step's task
4152
4153              6. User epilog      Job-step-specific task program defined using
4154                                  srun's --task-epilog option or
4155                                  SLURM_TASK_EPILOG environment variable
4156
4157              7. TaskEpilog       System-wide per task program defined in
4158                                  slurm.conf
4159
4160              8. post_term()      Function in TaskPlugin
4161
4162
4163       TCPTimeout
4164              Time permitted for a TCP connection to be established.  Default
4165              value is 2 seconds.
4166
4167
4168       TmpFS  Fully  qualified  pathname  of the file system available to user
4169              jobs for temporary storage. This parameter is used in establish‐
4170              ing a node's TmpDisk space.  The default value is "/tmp".
4171
4172
4173       TopologyParam
4174              Comma-separated options identifying network topology options.
4175
4176              Dragonfly      Optimize allocation for Dragonfly network.  Valid
4177                             when TopologyPlugin=topology/tree.
4178
4179              TopoOptional   Only optimize allocation for network topology  if
4180                             the  job includes a switch option. Since optimiz‐
4181                             ing resource  allocation  for  topology  involves
4182                             much  higher  system overhead, this option can be
4183                             used to impose the extra overhead  only  on  jobs
4184                             which can take advantage of it. If most job allo‐
4185                             cations are not optimized for  network  topology,
4186                             they  may  fragment  resources  to the point that
4187                             topology optimization for other jobs will be dif‐
4188                             ficult  to  achieve.   NOTE: Jobs may span across
4189                             nodes without common parent  switches  with  this
4190                             enabled.
4191
4192
4193       TopologyPlugin
4194              Identifies  the  plugin  to  be used for determining the network
4195              topology and optimizing job allocations to minimize network con‐
4196              tention.   See  NETWORK  TOPOLOGY below for details.  Additional
4197              plugins may be provided in the future which gather topology  in‐
4198              formation directly from the network.  Acceptable values include:
4199
4200              topology/3d_torus    best-fit   logic   over   three-dimensional
4201                                   topology
4202
4203              topology/none        default for other systems,  best-fit  logic
4204                                   over one-dimensional topology
4205
4206              topology/tree        used  for  a  hierarchical  network  as de‐
4207                                   scribed in a topology.conf file
4208
4209
4210       TrackWCKey
4211              Boolean yes or no.  Used to enable display and tracking of the
4212              Workload Characterization Key.  Must be set to track correct wckey
4213              usage.  NOTE: You must also set TrackWCKey in your slurmdbd.conf
4214              file to create historical usage reports.
4215
4216
4217       TreeWidth
4218              Slurmd  daemons  use  a virtual tree network for communications.
4219              TreeWidth specifies the width of the tree (i.e. the fanout).  On
4220              architectures  with  a front end node running the slurmd daemon,
4221              the value must always be equal to or greater than the number  of
4222              front end nodes which eliminates the need for message forwarding
4223              between the slurmd daemons.  On other architectures the  default
4224              value  is 50, meaning each slurmd daemon can communicate with up
4225              to 50 other slurmd daemons and over 2500 nodes can be  contacted
4226              with  two  message  hops.   The default value will work well for
4227              most clusters.  Optimal  system  performance  can  typically  be
4228              achieved if TreeWidth is set to the square root of the number of
4229              nodes in the cluster for systems having no more than 2500  nodes
4230              or  the  cube  root for larger systems. The value may not exceed
4231              65533.
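
              As a worked illustration (the cluster size is an assumption):
              for a cluster of roughly 900 nodes, the square-root guideline
              above suggests a value of about 30:

              TreeWidth=30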
4232
4233
4234       UnkillableStepProgram
4235              If the processes in a job step are determined to  be  unkillable
4236              for  a  period  of  time  specified by the UnkillableStepTimeout
4237              variable, the program specified by UnkillableStepProgram will be
4238              executed.  By default no program is run.
4239
4240              See section UNKILLABLE STEP PROGRAM SCRIPT for more information.
4241
4242
4243       UnkillableStepTimeout
4244              The  length of time, in seconds, that Slurm will wait before de‐
4245              ciding that processes in a job step are unkillable  (after  they
4246              have  been signaled with SIGKILL) and execute UnkillableStepPro‐
4247              gram.  The default timeout value is 60  seconds.   If  exceeded,
4248              the compute node will be drained to prevent future jobs from be‐
4249              ing scheduled on the node.
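
              A minimal sketch pairing these two options (the program path is
              a hypothetical placeholder):

              UnkillableStepProgram=/usr/local/sbin/unkillable_report.sh
              UnkillableStepTimeout=120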
4250
4251
4252       UsePAM If set to 1, PAM (Pluggable Authentication  Modules  for  Linux)
4253              will  be enabled.  PAM is used to establish the upper bounds for
4254              resource limits. With PAM support enabled, local system adminis‐
4255              trators can dynamically configure system resource limits. Chang‐
4256              ing the upper bound of a resource limit will not alter the  lim‐
4257              its  of  running jobs, only jobs started after a change has been
4258              made will pick up the new limits.  The default value is  0  (not
4259              to enable PAM support).  Remember that PAM also needs to be con‐
4260              figured to support Slurm as a service.  For  sites  using  PAM's
4261              directory based configuration option, a configuration file named
4262              slurm should be created.  The  module-type,  control-flags,  and
4263              module-path names that should be included in the file are:
4264              auth        required      pam_localuser.so
4265              auth        required      pam_shells.so
4266              account     required      pam_unix.so
4267              account     required      pam_access.so
4268              session     required      pam_unix.so
4269              For sites configuring PAM with a general configuration file, the
4270              appropriate lines (see above), where slurm is the  service-name,
4271              should be added.
4272
4273              NOTE: The UsePAM option has nothing to do with the
4274              contribs/pam/pam_slurm and/or contribs/pam_slurm_adopt modules,
4275              so these two modules can work independently of the value set
4276              for UsePAM.
4277
4278
4279       VSizeFactor
4280              Memory specifications in job requests apply to real memory  size
4281              (also  known  as  resident  set size). It is possible to enforce
4282              virtual memory limits for both jobs and job  steps  by  limiting
4283              their virtual memory to some percentage of their real memory al‐
4284              location. The VSizeFactor parameter specifies the job's  or  job
4285              step's  virtual  memory limit as a percentage of its real memory
4286              limit. For example, if a job's real memory limit  is  500MB  and
4287              VSizeFactor  is  set  to  101 then the job will be killed if its
4288              real memory exceeds 500MB or its virtual  memory  exceeds  505MB
4289              (101 percent of the real memory limit).  The default value is 0,
4290              which disables enforcement of virtual memory limits.  The  value
4291              may not exceed 65533 percent.
4292
4293              NOTE:  This  parameter is dependent on OverMemoryKill being con‐
4294              figured in JobAcctGatherParams. It is also possible to configure
4295              the TaskPlugin to use task/cgroup for memory enforcement. VSize‐
4296              Factor will not  have  an  effect  on  memory  enforcement  done
4297              through cgroups.
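
              A brief configuration sketch (the 110 percent figure is an
              illustrative choice, not a default):

              JobAcctGatherParams=OverMemoryKill
              VSizeFactor=110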
4298
4299
4300       WaitTime
4301              Specifies  how  many  seconds the srun command should by default
4302              wait after the first task terminates before terminating all  re‐
4303              maining  tasks.  The  "--wait"  option  on the srun command line
4304              overrides this value.  The default value is  0,  which  disables
4305              this feature.  May not exceed 65533 seconds.
4306
4307
4308       X11Parameters
4309              For use with Slurm's built-in X11 forwarding implementation.
4310
4311              home_xauthority
4312                      If set, xauth data on the compute node will be placed in
4313                      ~/.Xauthority rather than  in  a  temporary  file  under
4314                      TmpFS.
4315
4316

NODE CONFIGURATION

4318       The configuration of nodes (or machines) to be managed by Slurm is also
4319       specified in /etc/slurm.conf.   Changes  in  node  configuration  (e.g.
4320       adding  nodes, changing their processor count, etc.) require restarting
4321       both the slurmctld daemon and the slurmd daemons.  All  slurmd  daemons
4322       must know each node in the system to forward messages in support of hi‐
4323       erarchical communications.  Only the NodeName must be supplied  in  the
4324       configuration  file.   All  other node configuration information is op‐
4325       tional.  It is advisable to establish baseline node configurations, es‐
4326       pecially  if the cluster is heterogeneous.  Nodes which register to the
4327       system with less than the configured resources (e.g.  too  little  mem‐
4328       ory),  will  be  placed in the "DOWN" state to avoid scheduling jobs on
4329       them.  Establishing baseline configurations  will  also  speed  Slurm's
4330       scheduling process by permitting it to compare job requirements against
4331       these (relatively few) configuration parameters and possibly avoid hav‐
4332       ing  to check job requirements against every individual node's configu‐
4333       ration.  The resources checked at node  registration  time  are:  CPUs,
4334       RealMemory and TmpDisk.
4335
4336       Default values can be specified with a record in which NodeName is "DE‐
4337       FAULT".  The default entry values will apply only to lines following it
4338       in  the configuration file and the default values can be reset multiple
4339       times in the configuration file  with  multiple  entries  where  "Node‐
4340       Name=DEFAULT".   Each  line where NodeName is "DEFAULT" will replace or
4341       add to previous default values and not reinitialize the default val‐
4342       ues.   The  "NodeName="  specification must be placed on every line de‐
4343       scribing the configuration of nodes.  A single node name can not appear
4344       as  a NodeName value in more than one line (duplicate node name records
4345       will be ignored).  In fact, it is generally possible and  desirable  to
4346       define  the configurations of all nodes in only a few lines.  This con‐
4347       vention permits significant optimization in the  scheduling  of  larger
4348       clusters.   In  order to support the concept of jobs requiring consecu‐
4349       tive nodes on some architectures, node specifications should be placed
4350       in  this  file in consecutive order.  No single node name may be listed
4351       more than once in the configuration file.  Use "DownNodes="  to  record
4352       the  state  of  nodes which are temporarily in a DOWN, DRAIN or FAILING
4353       state without altering  permanent  configuration  information.   A  job
4354       step's tasks are allocated to nodes in the order the nodes appear in the
4355       configuration file. There is presently no capability  within  Slurm  to
4356       arbitrarily order a job step's tasks.
4357
4358       Multiple  node  names  may be comma separated (e.g. "alpha,beta,gamma")
4359       and/or a simple node range expression may optionally be used to specify
4360       numeric  ranges  of  nodes  to avoid building a configuration file with
4361       large numbers of entries.  The node range expression  can  contain  one
4362       pair  of  square  brackets  with  a sequence of comma-separated numbers
4363       and/or ranges of numbers separated by a "-" (e.g. "linux[0-64,128]", or
4364       "lx[15,18,32-33]").   Note  that  the numeric ranges can include one or
4365       more leading zeros to indicate the numeric portion has a  fixed  number
4366       of  digits  (e.g.  "linux[0000-1023]").  Multiple numeric ranges can be
4367       included in the expression (e.g. "rack[0-63]_blade[0-41]").  If one  or
4368       more  numeric  expressions are included, one of them must be at the end
4369       of the name (e.g. "unit[0-31]rack" is invalid), but arbitrary names can
4370       always be used in a comma-separated list.
4371
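       For example, a homogeneous group of nodes might be described on a
       single line (the values shown are illustrative, not defaults):

       NodeName=linux[0001-0128] Sockets=2 CoresPerSocket=8 ThreadsPerCore=2 RealMemory=64000 State=UNKNOWN
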
4372       The node configuration specifies the following information:
4373
4374
4375       NodeName
4376              Name  that  Slurm uses to refer to a node.  Typically this would
4377              be the string that "/bin/hostname -s" returns.  It may  also  be
4378              the  fully  qualified  domain name as returned by "/bin/hostname
4379              -f" (e.g. "foo1.bar.com"), or any valid domain  name  associated
4380              with the host through the host database (/etc/hosts) or DNS, de‐
4381              pending on the resolver settings.  Note that if the  short  form
4382              of  the hostname is not used, it may prevent use of hostlist ex‐
4383              pressions (the numeric portion in brackets must be at the end of
4384              the string).  It may also be an arbitrary string if NodeHostname
4385              is specified.  If the NodeName is "DEFAULT", the  values  speci‐
4386              fied  with  that record will apply to subsequent node specifica‐
4387              tions unless explicitly set to other values in that node  record
4388              or  replaced  with a different set of default values.  Each line
4389              where NodeName is "DEFAULT" will replace or add to previous  de‐
4390              fault values and not reinitialize the default values.  For ar‐
4391              chitectures in which the node order is significant,  nodes  will
4392              be considered consecutive in the order defined.  For example, if
4393              the configuration for "NodeName=charlie" immediately follows the
4394              configuration for "NodeName=baker" they will be considered adja‐
4395              cent in the computer.
4396
4397
4398       NodeHostname
4399              Typically this would be the string that "/bin/hostname  -s"  re‐
4400              turns.   It  may  also be the fully qualified domain name as re‐
4401              turned by "/bin/hostname -f" (e.g. "foo1.bar.com"), or any valid
4402              domain  name  associated with the host through the host database
4403              (/etc/hosts) or DNS, depending on the resolver  settings.   Note
4404              that  if the short form of the hostname is not used, it may pre‐
4405              vent use of hostlist expressions (the numeric portion in  brack‐
4406              ets  must be at the end of the string).  A node range expression
4407              can be used to specify a set of  nodes.   If  an  expression  is
4408              used,  the  number of nodes identified by NodeHostname on a line
4409              in the configuration file must be identical  to  the  number  of
4410              nodes identified by NodeName.  By default, the NodeHostname will
4411              be identical in value to NodeName.
4412
4413
4414       NodeAddr
4415              Name that a node should be referred to in establishing a  commu‐
4416              nications  path.   This  name will be used as an argument to the
4417              getaddrinfo() function for identification.  If a node range  ex‐
4418              pression  is used to designate multiple nodes, they must exactly
4419              match  the  entries  in  the  NodeName  (e.g.  "NodeName=lx[0-7]
4420              NodeAddr=elx[0-7]").   NodeAddr  may  also contain IP addresses.
4421              By default, the NodeAddr will be identical in value to NodeHost‐
4422              name.
4423
4424
4425       BcastAddr
4426              Alternate  network path to be used for sbcast network traffic to
4427              a given node.  This name will be used  as  an  argument  to  the
4428              getaddrinfo()  function.   If a node range expression is used to
4429              designate multiple nodes, they must exactly match the entries in
4430              the   NodeName   (e.g.  "NodeName=lx[0-7]  BcastAddr=elx[0-7]").
4431              BcastAddr may also contain IP addresses.  By default, the  Bcas‐
4432              tAddr  is  unset,  and  sbcast  traffic  will  be  routed to the
4433              NodeAddr for a given node.  Note: cannot be used with Communica‐
4434              tionParameters=NoInAddrAny.
4435
4436
4437       Boards Number of Baseboards in nodes with a baseboard controller.  Note
4438              that when Boards is specified, SocketsPerBoard,  CoresPerSocket,
4439              and ThreadsPerCore should be specified.  Boards and CPUs are mu‐
4440              tually exclusive.  The default value is 1.
4441
4442
4443       CoreSpecCount
4444              Number of cores reserved for system use.  These cores  will  not
4445              be  available  for  allocation to user jobs.  Depending upon the
4446              TaskPluginParam option of  SlurmdOffSpec,  Slurm  daemons  (i.e.
4447              slurmd and slurmstepd) may either be confined to these resources
4448              (the default) or prevented from using these  resources.   Isola‐
4449              tion of the Slurm daemons from user jobs may improve application
4450              performance.  If this option and CpuSpecList are both designated
4451              for a node, an error is generated.  For information on the algo‐
4452              rithm used by Slurm to select the cores refer to the  core  spe‐
4453              cialization                    documentation                   (
4454              https://slurm.schedmd.com/core_spec.html ).
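
              A hedged example reserving two cores per node for the Slurm
              daemons (the node names are illustrative, and "..." stands for
              the node's other parameters):

              NodeName=nid[001-016] ... CoreSpecCount=2
              TaskPluginParam=SlurmdOffSpec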
4455
4456
4457       CoresPerSocket
4458              Number of cores in a  single  physical  processor  socket  (e.g.
4459              "2").   The  CoresPerSocket  value describes physical cores, not
4460              the logical number of processors per socket.  NOTE: If you  have
4461              multi-core  processors, you will likely need to specify this pa‐
4462              rameter in order to optimize scheduling.  The default  value  is
4463              1.
4464
4465
4466       CpuBind
4467              If  a job step request does not specify an option to control how
4468              tasks are bound to allocated CPUs (--cpu-bind) and all nodes al‐
4469              located to the job have the same CpuBind option, the node CpuBind
4470              option will control how tasks are bound to allocated  resources.
4471              Supported  values  for  CpuBind  are  "none", "board", "socket",
4472              "ldom" (NUMA), "core" and "thread".
4473
4474
4475       CPUs   Number of logical processors on the node (e.g. "2").   CPUs  and
4476              Boards are mutually exclusive. It can be set to the total number
4477              of sockets (supported only by select/linear), cores or threads.
4478              This can be useful when you want to schedule only the cores on a
4479              hyper-threaded node. If CPUs is omitted, its default will be set
4480              equal  to  the  product  of Boards, Sockets, CoresPerSocket, and
4481              ThreadsPerCore.
4482
4483
4484       CpuSpecList
4485              A comma delimited list of Slurm abstract CPU  IDs  reserved  for
4486              system  use.   The  list  will  be expanded to include all other
4487              CPUs, if any, on the same cores.  These cores will not be avail‐
4488              able  for allocation to user jobs.  Depending upon the TaskPlug‐
4489              inParam option of SlurmdOffSpec, Slurm daemons (i.e. slurmd  and
4490              slurmstepd)  may  either be confined to these resources (the de‐
4491              fault) or prevented from using these  resources.   Isolation  of
4492              the Slurm daemons from user jobs may improve application perfor‐
4493              mance.  If this option and CoreSpecCount are both designated for
4494              a node, an error is generated.  This option has no effect unless
4495              cgroup   job   confinement   is   also   configured    (TaskPlu‐
4496              gin=task/cgroup with ConstrainCores=yes in cgroup.conf).
4497
4498
4499       Features
4500              A  comma  delimited list of arbitrary strings indicative of some
4501              characteristic associated with the node.  There is no  value  or
4502              count associated with a feature at this time; a node either has
4503              a feature or it does not.  A desired feature may contain  a  nu‐
4504              meric  component  indicating,  for  example, processor speed but
4505              this numeric component will be considered to be part of the fea‐
4506              ture  string.  Features  are intended to be used to filter nodes
4507              eligible to run jobs via the --constraint argument.  By  default
4508              a  node  has  no features.  Also see Gres for being able to have
4509              more control such as types and count. Using features  is  faster
4510              than  scheduling  against  GRES but is limited to Boolean opera‐
4511              tions.
4512
4513
4514       Gres   A comma delimited list of generic resources specifications for a
4515              node.    The   format   is:  "<name>[:<type>][:no_consume]:<num‐
4516              ber>[K|M|G]".  The first  field  is  the  resource  name,  which
4517              matches the GresType configuration parameter name.  The optional
4518              type field might be used to identify a model of that generic re‐
4519              source.   It  is forbidden to specify both an untyped GRES and a
4520              typed GRES with the same <name>.  The optional no_consume  field
4521              allows  you  to  specify that a generic resource does not have a
4522              finite number of that resource that gets consumed as it  is  re‐
4523              quested. The no_consume field is a GRES specific setting and ap‐
4524              plies to the GRES, regardless of the type specified.  The  final
4525              field  must specify a generic resources count.  A suffix of "K",
4526              "M", "G", "T" or "P" may be used to multiply the number by 1024,
4527              1048576,          1073741824,         etc.         respectively.
4528              (e.g. "Gres=gpu:tesla:1,gpu:kepler:1,bandwidth:lustre:no_con‐
4529              sume:4G").  By default a node has no generic resources and its
4530              maximum count is that of an unsigned 64-bit integer.  Also see
4531              Features  for  Boolean  flags  to  filter  nodes  using job con‐
4532              straints.
4533
4534
4535       MemSpecLimit
4536              Amount of memory, in megabytes, reserved for system use and  not
4537              available  for  user  allocations.  If the task/cgroup plugin is
4538              configured and that plugin constrains memory  allocations  (i.e.
4539              TaskPlugin=task/cgroup in slurm.conf, plus ConstrainRAMSpace=yes
4540              in cgroup.conf), then Slurm compute node  daemons  (slurmd  plus
4541              slurmstepd)  will  be allocated the specified memory limit. Note
4542              that for this option to work, memory must be configured as a
4543              consumable resource through one of the SelectTypeParameters
4544              memory options.  The daemons will not be killed if they exhaust
4545              the memory allocation (i.e. the Out-Of-Memory Killer is disabled
4546              for the daemon's memory cgroup).  If the task/cgroup  plugin  is
4547              not  configured,  the  specified memory will only be unavailable
4548              for user allocations.
4549
4550
4551       Port   The port number that the Slurm compute node daemon, slurmd, lis‐
4552              tens  to for work on this particular node. By default there is a
4553              single port number for all slurmd daemons on all  compute  nodes
4554              as  defined  by  the  SlurmdPort configuration parameter. Use of
4555              this option is not generally recommended except for  development
4556              or  testing  purposes.  If  multiple slurmd daemons execute on a
4557              node this can specify a range of ports.
4558
4559              Note: On Cray systems, Realm-Specific IP Addressing (RSIP)  will
4560              automatically  try  to  interact  with  anything opened on ports
4561              8192-60000.  Configure Port to use a port outside of the config‐
4562              ured SrunPortRange and RSIP's port range.
4563
4564
4565       Procs  See CPUs.
4566
4567
4568       RealMemory
4569              Size of real memory on the node in megabytes (e.g. "2048").  The
4570              default value is 1. Lowering RealMemory with the goal of setting
4571              aside some amount for the OS, unavailable for job allocations,
4572              will not work as intended if Memory is not set as a consumable
4573              resource in SelectTypeParameters, so one of the *_Memory
4574              options needs to be enabled for that goal to be accomplished.
4575              Also see MemSpecLimit.
4576
4577
4578       Reason Identifies  the  reason  for  a  node  being  in  state  "DOWN",
4579              "DRAINED", "DRAINING", "FAIL" or "FAILING".  Use quotes to en‐
4580              close a reason having more than one word.
4581
4582
4583       Sockets
4584              Number  of  physical  processor  sockets/chips on the node (e.g.
4585              "2").  If Sockets is omitted, it will  be  inferred  from  CPUs,
4586              CoresPerSocket,   and   ThreadsPerCore.    NOTE:   If  you  have
4587              multi-core processors, you will likely need to specify these pa‐
4588              rameters.   Sockets  and SocketsPerBoard are mutually exclusive.
4589              If Sockets is specified when Boards is also used, Sockets is in‐
4590              terpreted as SocketsPerBoard rather than total sockets.  The de‐
4591              fault value is 1.
4592
4593
4594       SocketsPerBoard
4595              Number of  physical  processor  sockets/chips  on  a  baseboard.
4596              Sockets and SocketsPerBoard are mutually exclusive.  The default
4597              value is 1.
4598
4599
4600       State  State of the node with respect to the initiation of  user  jobs.
4601              Acceptable  values are CLOUD, DOWN, DRAIN, FAIL, FAILING, FUTURE
4602              and UNKNOWN.  Node states of BUSY and IDLE should not be  speci‐
4603              fied  in  the  node configuration, but set the node state to UN‐
4604              KNOWN instead.  Setting the node state to UNKNOWN will result in
4605              the  node  state  being  set  to BUSY, IDLE or other appropriate
4606              state based upon recovered system state  information.   The  de‐
4607              fault value is UNKNOWN.  Also see the DownNodes parameter below.
4608
4609              CLOUD     Indicates  the  node exists in the cloud.  Its initial
4610                        state will be treated as powered down.  The node  will
4611                        be available for use after its state is recovered from
4612                        Slurm's state save file or the slurmd daemon starts on
4613                        the compute node.
4614
4615              DOWN      Indicates the node failed and is unavailable to be al‐
4616                        located work.
4617
4618              DRAIN     Indicates the node  is  unavailable  to  be  allocated
4619                        work.
4620
4621              FAIL      Indicates  the  node  is expected to fail soon, has no
4622                        jobs allocated to it, and will not be allocated to any
4623                        new jobs.
4624
4625              FAILING   Indicates  the  node is expected to fail soon, has one
4626                        or more jobs allocated to it, but will  not  be  allo‐
4627                        cated to any new jobs.
4628
4629              FUTURE    Indicates  the node is defined for future use and need
4630                        not exist when the Slurm daemons  are  started.  These
4631                        nodes can be made available for use simply by updating
4632                        the node state using the scontrol command rather  than
4633                        restarting the slurmctld daemon. After these nodes are
4634                        made available, change their State in  the  slurm.conf
4635                        file.  Until these nodes are made available, they will
4636                        not be seen using any Slurm commands, nor will any
4637                        attempt be made to contact them.
4638
4639
4640                        Dynamic Future Nodes
4641                               A slurmd started with -F[<feature>] will be as‐
4642                               sociated with a FUTURE node  that  matches  the
4643                               same configuration (sockets, cores, threads) as
4644                               reported by slurmd -C. The node's NodeAddr  and
4645                               NodeHostname  will  automatically  be retrieved
4646                               from the slurmd and will be  cleared  when  set
4647                               back  to the FUTURE state. Dynamic FUTURE nodes
4648                               retain non-FUTURE state on restart.  Use  scon‐
4649                               trol to put the node back into FUTURE state.
4650
4651                               If  the  mapping  of the NodeName to the slurmd
4652                               HostName is not updated in DNS, Dynamic  Future
4653                               nodes  won't  know how to communicate with each
4654                               other -- because NodeAddr and NodeHostName  are
4655                               not defined in the slurm.conf -- and the fanout
4656                               communications need to be disabled  by  setting
4657                               TreeWidth to a high number (e.g. 65533). If the
4658                               DNS mapping is made, then the cloud_dns  Slurm‐
4659                               ctldParameter can be used.
4660
4661
4662              UNKNOWN   Indicates  the  node's  state is undefined but will be
4663                        established (set to BUSY or IDLE) when the slurmd dae‐
4664                        mon  on  that  node  registers. UNKNOWN is the default
4665                        state.
4666
4667
4668       ThreadsPerCore
4669              Number of logical threads in a single physical core (e.g.  "2").
4670              Note that Slurm can allocate resources to jobs down to the
4671              resolution of a core. If your system  is  configured  with  more
4672              than  one  thread per core, execution of a different job on each
4673              thread is not supported unless you  configure  SelectTypeParame‐
4674              ters=CR_CPU  plus CPUs; do not configure Sockets, CoresPerSocket
4675              or ThreadsPerCore.  A job can execute one task per thread from
4676              within  one  job  step or execute a distinct job step on each of
4677              the threads.  Note also if you are  running  with  more  than  1
4678              thread   per   core  and  running  the  select/cons_res  or  se‐
4679              lect/cons_tres plugin then you will want to set the  SelectType‐
4680              Parameters  variable to something other than CR_CPU to avoid un‐
4681              expected results.  The default value is 1.
4682
4683
4684       TmpDisk
4685              Total size of temporary disk storage in TmpFS in megabytes (e.g.
4686              "16384"). TmpFS (for "Temporary File System") identifies the lo‐
4687              cation which jobs should use for temporary storage.   Note  this
4688              does not indicate the amount of free space available to the user
4689              on the node, only the total file system size. The system admin‐
4690              istrator should ensure this file system is purged as needed so
4691              that user jobs have access to most of this  space.   The  Prolog
4692              and/or  Epilog  programs  (specified  in the configuration file)
4693              might be used to ensure the file system is kept clean.  The  de‐
4694              fault value is 0.
4695
4696
4697       TRESWeights
4698              TRESWeights  are  used  to calculate a value that represents how
4699              busy a node is. Currently only  used  in  federation  configura‐
4700              tions.  TRESWeights  are  different  from  TRESBillingWeights --
4701              which is used for fairshare calculations.
4702
4703              TRES weights are specified as a comma-separated  list  of  <TRES
4704              Type>=<TRES Weight> pairs.
4705              e.g.
4706              NodeName=node1 ... TRESWeights="CPU=1.0,Mem=0.25G,GRES/gpu=2.0"
4707
4708              By  default  the weighted TRES value is calculated as the sum of
4709              all node TRES  types  multiplied  by  their  corresponding  TRES
4710              weight.
4711
4712              If PriorityFlags=MAX_TRES is configured, the weighted TRES value
4713              is calculated as the MAX of individual node  TRES'  (e.g.  cpus,
4714              mem, gres).
4715
4716
4717       Weight The  priority  of  the node for scheduling purposes.  All things
4718              being equal, jobs will be allocated the nodes  with  the  lowest
4719              weight  which satisfies their requirements.  For example, a het‐
4720              erogeneous collection of nodes might be  placed  into  a  single
4721              partition for greater system utilization, responsiveness and ca‐
4722              pability. It would be  preferable  to  allocate  smaller  memory
4723              nodes  rather  than larger memory nodes if either will satisfy a
4724              job's requirements.  The units  of  weight  are  arbitrary,  but
4725              larger weights should be assigned to nodes with more processors,
4726              memory, disk space, higher processor speed, etc.  Note that if a
4727              job allocation request can not be satisfied using the nodes with
4728              the lowest weight, the set of nodes with the next lowest  weight
4729              is added to the set of nodes under consideration for use (repeat
4730              as needed for higher weight values). If you absolutely  want  to
4731              minimize  the  number  of higher weight nodes allocated to a job
4732              (at a cost of higher scheduling overhead), give each node a dis‐
4733              tinct  Weight  value and they will be added to the pool of nodes
4734              being considered for scheduling individually.  The default value
4735              is 1.
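
              For example (the node names, sizes and weights are illustrative,
              and "..." stands for each node's other parameters), smaller
              nodes can be preferred by giving them a lower weight:

              NodeName=small[1-16] ... RealMemory=65536 Weight=10
              NodeName=big[1-4] ... RealMemory=524288 Weight=50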
4736
4737

DOWN NODE CONFIGURATION

4739       The  DownNodes=  parameter  permits  you  to mark certain nodes as in a
4740       DOWN, DRAIN, FAIL, FAILING or FUTURE state without altering the  perma‐
4741       nent configuration information listed under a NodeName= specification.
4742
4743
4744       DownNodes
4745              Any  node name, or list of node names, from the NodeName= speci‐
4746              fications.
4747
4748
4749       Reason Identifies the reason for a node being  in  state  DOWN,  DRAIN,
4750              FAIL,  FAILING or FUTURE.  Use quotes to enclose a reason having
4751              more than one word.
4752
4753
4754       State  State of the node with respect to the initiation of  user  jobs.
4755              Acceptable  values  are  DOWN,  DRAIN, FAIL, FAILING and FUTURE.
4756              For more information about these states see the descriptions un‐
4757              der  State in the NodeName= section above.  The default value is
4758              DOWN.
4759
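       As an example (the node names and reason are illustrative), a line such
       as the following marks two nodes down without altering their NodeName
       definitions:
       DownNodes=nid[021-022] State=DOWN Reason="power supply failure"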
4760

FRONTEND NODE CONFIGURATION

4762       On computers where frontend nodes are used  to  execute  batch  scripts
4763       rather than compute nodes (Cray ALPS systems), one may configure one or
4764       more frontend nodes using the configuration parameters  defined  below.
4765       These  options  are  very  similar to those used in configuring compute
4766       nodes. These options may only be used on systems configured  and  built
4767       with  the  appropriate parameters (--have-front-end) or a system deter‐
4768       mined to have the appropriate  architecture  by  the  configure  script
4769       (Cray ALPS systems).  The front end configuration specifies the follow‐
4770       ing information:
4771
4772
4773       AllowGroups
4774              Comma-separated list of group names which may  execute  jobs  on
4775              this  front  end node. By default, all groups may use this front
4776              end node.  A user will be permitted to use this front  end  node
4777              if  AllowGroups has at least one group associated with the user.
4778              May not be used with the DenyGroups option.
4779
4780
4781       AllowUsers
4782              Comma-separated list of user names which  may  execute  jobs  on
4783              this  front  end  node. By default, all users may use this front
4784              end node.  May not be used with the DenyUsers option.
4785
4786
4787       DenyGroups
4788              Comma-separated list of group names which are prevented from ex‐
4789              ecuting  jobs  on this front end node.  May not be used with the
4790              AllowGroups option.
4791
4792
4793       DenyUsers
4794              Comma-separated list of user names which are prevented from exe‐
4795              cuting  jobs  on  this front end node.  May not be used with the
4796              AllowUsers option.
4797
4798
4799       FrontendName
4800              Name that Slurm uses to refer to  a  frontend  node.   Typically
4801              this  would  be  the string that "/bin/hostname -s" returns.  It
4802              may also be the fully  qualified  domain  name  as  returned  by
4803              "/bin/hostname  -f"  (e.g.  "foo1.bar.com"), or any valid domain
4804              name  associated  with  the  host  through  the  host   database
4805              (/etc/hosts)  or  DNS, depending on the resolver settings.  Note
4806              that if the short form of the hostname is not used, it may  pre‐
4807              vent  use of hostlist expressions (the numeric portion in brack‐
4808              ets must be at the end of the string).  If the  FrontendName  is
4809              "DEFAULT",  the  values specified with that record will apply to
4810              subsequent node specifications unless explicitly  set  to  other
4811              values in that frontend node record or replaced with a different
4812              set of default values.  Each line  where  FrontendName  is  "DE‐
4813              FAULT" will replace or add to previous default values and not
4814              reinitialize the default values.
4815
4816
4817       FrontendAddr
4818              Name that a frontend node should be referred to in  establishing
4819              a  communications path. This name will be used as an argument to
4820              the getaddrinfo() function for identification.   As  with  Fron‐
4821              tendName, list the individual node addresses rather than using a
4822              hostlist expression.  The number  of  FrontendAddr  records  per
4823              line  must  equal  the  number  of FrontendName records per line
4824              (i.e. you can't map two node names to one address).  FrontendAddr
4825              may  also  contain  IP  addresses.  By default, the FrontendAddr
4826              will be identical in value to FrontendName.
4827
4828
4829       Port   The port number that the Slurm compute node daemon, slurmd, lis‐
4830              tens  to  for  work on this particular frontend node. By default
4831              there is a single port number for  all  slurmd  daemons  on  all
4832              frontend nodes as defined by the SlurmdPort configuration param‐
4833              eter. Use of this option is not generally recommended except for
4834              development or testing purposes.
4835
4836              Note:  On Cray systems, Realm-Specific IP Addressing (RSIP) will
4837              automatically try to interact  with  anything  opened  on  ports
4838              8192-60000.  Configure Port to use a port outside of the config‐
4839              ured SrunPortRange and RSIP's port range.
4840
4841
4842       Reason Identifies the reason for a frontend node being in  state  DOWN,
4843              DRAINED,  DRAINING,  FAIL  or  FAILING.  Use quotes to enclose a
4844              reason having more than one word.
4845
4846
4847       State  State of the frontend node with respect  to  the  initiation  of
4848              user jobs.  Acceptable values are DOWN, DRAIN, FAIL, FAILING and
4849              UNKNOWN.  Node states of BUSY and IDLE should not  be  specified
4850              in the node configuration, but set the node state to UNKNOWN in‐
4851              stead.  Setting the node state to UNKNOWN  will  result  in  the
4852              node  state  being  set to BUSY, IDLE or other appropriate state
4853              based upon recovered system state information.  For more  infor‐
4854              mation  about  these  states see the descriptions under State in
4855              the NodeName= section above.  The default value is UNKNOWN.
4856
4857
4858       As an example, you can do something similar to the following to  define
4859       four front end nodes for running slurmd daemons.
4860       FrontendName=frontend[00-03] FrontendAddr=efrontend[00-03] State=UNKNOWN
4861
4862

NODESET CONFIGURATION

4864       The  nodeset  configuration  allows you to define a name for a specific
4865       set of nodes which can be used to simplify the partition  configuration
4866       section, especially for heterogeneous or condo-style systems. Each node‐
4867       set may be defined by an explicit list of nodes,  and/or  by  filtering
4868       the  nodes  by  a  particular  configured feature. If both Feature= and
4869       Nodes= are used the nodeset shall be the  union  of  the  two  subsets.
4870       Note  that the nodesets are only used to simplify the partition defini‐
4871       tions at present, and are not usable outside of the partition  configu‐
4872       ration.
4873
4874       Feature
4875              All  nodes  with this single feature will be included as part of
4876              this nodeset.
4877
4878       Nodes  List of nodes in this set.
4879
4880       NodeSet
4881              Unique name for a set of nodes. Must not overlap with any  Node‐
4882              Name definitions.
4883
4884

PARTITION CONFIGURATION

4886       The partition configuration permits you to establish different job lim‐
4887       its or access controls for various groups  (or  partitions)  of  nodes.
4888       Nodes  may  be  in  more than one partition, making partitions serve as
4889       general purpose queues.  For example one may put the same set of  nodes
4890       into  two  different  partitions, each with different constraints (time
4891       limit, job sizes, groups allowed to use the partition, etc.).  Jobs are
4892       allocated  resources  within a single partition.  Default values can be
4893       specified with a record in which PartitionName is "DEFAULT".   The  de‐
4894       fault entry values will apply only to lines following it in the config‐
4895       uration file and the default values can be reset multiple times in  the
4896       configuration file with multiple entries where "PartitionName=DEFAULT".
4897       The "PartitionName=" specification must be placed  on  every  line  de‐
4898       scribing  the  configuration of partitions.  Each line where Partition‐
4899       Name is "DEFAULT" will replace or add to previous  default  values  and
4900       not reinitialize the default values.  A single partition name cannot
4901       appear as a PartitionName value in more than one line (duplicate parti‐
4902       tion  name  records will be ignored).  If a partition that is in use is
4903       deleted from the configuration and slurm is restarted  or  reconfigured
4904       (scontrol  reconfigure),  jobs using the partition are canceled.  NOTE:
4905       Put all parameters for each partition on a single line.  Each  line  of
4906       partition configuration information should represent a different parti‐
4907       tion.  The partition configuration file contains the following informa‐
4908       tion:
4909
4910
4911       AllocNodes
4912              Comma-separated  list  of nodes from which users can submit jobs
4913              in the partition.  Node names may be specified  using  the  node
4914              range  expression  syntax described above.  The default value is
4915              "ALL".
4916
4917
4918       AllowAccounts
4919              Comma-separated list of accounts which may execute jobs  in  the
4920              partition.   The default value is "ALL".  NOTE: If AllowAccounts
4921              is used then DenyAccounts will not be enforced.  Also  refer  to
4922              DenyAccounts.
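
              As an example (the account names and partition name are only
              illustrative):
              PartitionName=chem AllowAccounts=chemistry,physics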
4923
4924
4925       AllowGroups
4926              Comma-separated  list  of group names which may view and execute
4927              jobs in this partition.  A user will be permitted  to  view  and
4928              submit  a  job to this partition if AllowGroups has at least one
4929              group associated with the user.  Jobs executed as user  root  or
4930              as user SlurmUser will be allowed to view and use any partition,
4931              regardless of the value of AllowGroups. In addition, a Slurm Ad‐
4932              min  or  Operator will be able to view any partition, regardless
4933              of the value of AllowGroups.  If user root attempts to execute a
4934              job  as  another user (e.g. using srun's --uid option), then the
4935              job will be subject to AllowGroups as if it  were  submitted  by
4936              that user.  By default, AllowGroups is unset, meaning all groups
4937              are allowed to use this partition. The special  value  'ALL'  is
4938              equivalent  to this.  Even when PrivateData does not hide parti‐
4939              tion information, AllowGroups will still hide partition informa‐
4940              tion  accordingly.   NOTE:  For performance reasons, Slurm main‐
4941              tains a list of user IDs allowed to use each partition and  this
4942              is checked at job submission time.  This list of user IDs is up‐
4943              dated when the slurmctld daemon is restarted, reconfigured (e.g.
4944              "scontrol reconfig") or the partition's AllowGroups value is re‐
4945              set, even if its value is unchanged (e.g. "scontrol update Parti‐
4946              tionName=name  AllowGroups=group").   For  a  user's access to a
4947              partition to change, both the user's group membership must
4948              change and Slurm's internal user ID list must be updated using
4949              one of the methods described above.
4950
4951
4952       AllowQos
4953              Comma-separated list of Qos which may execute jobs in the parti‐
4954              tion.   Jobs executed as user root can use any partition without
4955              regard to the value of AllowQos.  The default  value  is  "ALL".
4956              NOTE:  If  AllowQos  is  used then DenyQos will not be enforced.
4957              Also refer to DenyQos.
4958
4959
4960       Alternate
4961              Partition name of alternate partition to be used if the state of
4962              this partition is "DRAIN" or "INACTIVE."
4963
4964
4965       CpuBind
4966              If  a job step request does not specify an option to control how
4967              tasks are bound to allocated CPUs (--cpu-bind) and all nodes al‐
4968              located to the job do not have the same CpuBind option, then the
4969              partition's CpuBind option will control how tasks are bound to
4970              allocated resources.  Supported values for CpuBind are
4971              "none", "board", "socket", "ldom" (NUMA), "core" and "thread".
4972
4973
4974       Default
4975              If this keyword is set, jobs submitted without a partition spec‐
4976              ification  will  utilize  this  partition.   Possible values are
4977              "YES" and "NO".  The default value is "NO".
4978
4979
4980       DefCpuPerGPU
4981              Default count of CPUs allocated per allocated GPU.
4982
4983
4984       DefMemPerCPU
4985              Default  real  memory  size  available  per  allocated  CPU   in
4986              megabytes.   Used  to  avoid over-subscribing memory and causing
4987              paging.  DefMemPerCPU would generally be used if individual pro‐
4988              cessors are allocated to jobs (SelectType=select/cons_res or Se‐
4989              lectType=select/cons_tres).  If not set, the DefMemPerCPU  value
4990              for  the  entire  cluster  will be used.  Also see DefMemPerGPU,
4991              DefMemPerNode and MaxMemPerCPU.  DefMemPerCPU, DefMemPerGPU  and
4992              DefMemPerNode are mutually exclusive.
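
              As an example (the partition name, node names and value are
              only illustrative), jobs in a partition could default to 2 GB
              of memory per allocated CPU with:
              PartitionName=compute Nodes=tux[1-32] DefMemPerCPU=2048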
4993
4994
4995       DefMemPerGPU
4996              Default   real  memory  size  available  per  allocated  GPU  in
4997              megabytes.  Also see DefMemPerCPU, DefMemPerNode and  MaxMemPer‐
4998              CPU.   DefMemPerCPU, DefMemPerGPU and DefMemPerNode are mutually
4999              exclusive.
5000
5001
5002       DefMemPerNode
5003              Default  real  memory  size  available  per  allocated  node  in
5004              megabytes.   Used  to  avoid over-subscribing memory and causing
5005              paging.  DefMemPerNode would generally be used  if  whole  nodes
5006              are  allocated  to jobs (SelectType=select/linear) and resources
5007              are over-subscribed (OverSubscribe=yes or  OverSubscribe=force).
5008              If  not set, the DefMemPerNode value for the entire cluster will
5009              be used.  Also see DefMemPerCPU, DefMemPerGPU and  MaxMemPerCPU.
5010              DefMemPerCPU, DefMemPerGPU and DefMemPerNode are mutually exclu‐
5011              sive.
5012
5013
5014       DenyAccounts
5015              Comma-separated list of accounts which may not execute  jobs  in
5016              the partition.  By default, no accounts are denied access.  NOTE:
5017              If AllowAccounts is used then DenyAccounts will not be enforced.
5018              Also refer to AllowAccounts.
5019
5020
5021       DenyQos
5022              Comma-separated  list  of  Qos which may not execute jobs in the
5023              partition.  By default, no QOS are denied access.  NOTE: If Al‐
5024              lowQos is used then DenyQos will not be enforced.  Also refer to
5025              AllowQos.
5026
5027
5028       DefaultTime
5029              Run time limit used for jobs that don't specify a value. If  not
5030              set  then  MaxTime will be used.  Format is the same as for Max‐
5031              Time.
5032
5033
5034       DisableRootJobs
5035              If set to "YES" then user root will be  prevented  from  running
5036              any jobs on this partition.  The default value will be the value
5037              of DisableRootJobs set  outside  of  a  partition  specification
5038              (which is "NO", allowing user root to execute jobs).
5039
5040
5041       ExclusiveUser
5042              If  set  to  "YES"  then  nodes will be exclusively allocated to
5043              users.  Multiple jobs may be run for the same user, but only one
5044              user can be active at a time.  This capability is also available
5045              on a per-job basis by using the --exclusive=user option.
5046
5047
5048       GraceTime
5049              Specifies, in units of seconds, the preemption grace time to  be
5050              extended  to  a job which has been selected for preemption.  The
5051              default value is zero, no preemption grace time  is  allowed  on
5052              this  partition.   Once  a job has been selected for preemption,
5053              its end time is set to the  current  time  plus  GraceTime.  The
5054              job's  tasks are immediately sent SIGCONT and SIGTERM signals in
5055              order to provide notification of its imminent termination.  This
5056              is  followed by the SIGCONT, SIGTERM and SIGKILL signal sequence
5057              upon reaching its new end time. This second set  of  signals  is
5058              sent  to  both the tasks and the containing batch script, if ap‐
5059              plicable.  See also the global KillWait configuration parameter.
5060
5061
5062       Hidden Specifies if the partition and its jobs are to be hidden by  de‐
5063              fault.  Hidden partitions will by default not be reported by the
5064              Slurm APIs or commands.  Possible values  are  "YES"  and  "NO".
5065              The  default  value  is  "NO".  Note that partitions that a user
5066              lacks access to by virtue of the AllowGroups parameter will also
5067              be hidden by default.
5068
5069
5070       LLN    Schedule resources to jobs on the least loaded nodes (based upon
5071              the number of idle CPUs). This is generally only recommended for
5072              an  environment  with serial jobs as idle resources will tend to
5073              be highly fragmented, resulting in parallel jobs being  distrib‐
5074              uted  across many nodes.  Note that node Weight takes precedence
5075              over how many idle resources are on each node.  Also see the Se‐
5076              lectParameters  configuration  parameter CR_LLN to use the least
5077              loaded nodes in every partition.
5078
5079
5080       MaxCPUsPerNode
5081              Maximum number of CPUs on any node available to  all  jobs  from
5082              this partition.  This can be especially useful to schedule GPUs.
5083              For example a node can be associated with two  Slurm  partitions
5084              (e.g.  "cpu"  and  "gpu") and the partition/queue "cpu" could be
5085              limited to only a subset of the node's CPUs, ensuring  that  one
5086              or  more  CPUs  would  be  available to jobs in the "gpu" parti‐
5087              tion/queue.
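
              As an example (the names and CPU counts are only illustrative),
              the scenario above could be configured on 16-CPU nodes by
              reserving four CPUs for the "gpu" partition:
              PartitionName=cpu Nodes=tux[1-4] MaxCPUsPerNode=12
              PartitionName=gpu Nodes=tux[1-4]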
5088
5089
5090       MaxMemPerCPU
5091              Maximum  real  memory  size  available  per  allocated  CPU   in
5092              megabytes.   Used  to  avoid over-subscribing memory and causing
5093              paging.  MaxMemPerCPU would generally be used if individual pro‐
5094              cessors are allocated to jobs (SelectType=select/cons_res or Se‐
5095              lectType=select/cons_tres).  If not set, the MaxMemPerCPU  value
5096              for  the entire cluster will be used.  Also see DefMemPerCPU and
5097              MaxMemPerNode.  MaxMemPerCPU and MaxMemPerNode are mutually  ex‐
5098              clusive.
5099
5100
5101       MaxMemPerNode
5102              Maximum  real  memory  size  available  per  allocated  node  in
5103              megabytes.  Used to avoid over-subscribing  memory  and  causing
5104              paging.   MaxMemPerNode  would  generally be used if whole nodes
5105              are allocated to jobs (SelectType=select/linear)  and  resources
5106              are  over-subscribed (OverSubscribe=yes or OverSubscribe=force).
5107              If not set, the MaxMemPerNode value for the entire cluster  will
5108              be used.  Also see DefMemPerNode and MaxMemPerCPU.  MaxMemPerCPU
5109              and MaxMemPerNode are mutually exclusive.
5110
5111
5112       MaxNodes
5113              Maximum count of nodes which may be allocated to any single job.
5114              The  default  value  is "UNLIMITED", which is represented inter‐
5115              nally as -1.  This limit does not  apply  to  jobs  executed  by
5116              SlurmUser or user root.
5117
5118
5119       MaxTime
5120              Maximum  run  time  limit  for  jobs.   Format  is minutes, min‐
5121              utes:seconds, hours:minutes:seconds, days-hours, days-hours:min‐
5122              utes,  days-hours:minutes:seconds  or "UNLIMITED".  Time resolu‐
5123              tion is one minute and second values are rounded up to the  next
5124              minute.  This limit does not apply to jobs executed by SlurmUser
5125              or user root.
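
              As an example (the value itself is only illustrative), a limit
              of two days and twelve hours could be expressed as:
              MaxTime=2-12:00:00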
5126
5127
5128       MinNodes
5129              Minimum count of nodes which may be allocated to any single job.
5130              The  default value is 0.  This limit does not apply to jobs exe‐
5131              cuted by SlurmUser or user root.
5132
5133
5134       Nodes  Comma-separated list of nodes or nodesets which  are  associated
5135              with this partition.  Node names may be specified using the node
5136              range expression syntax described above. A blank list  of  nodes
5137              (i.e.  "Nodes= ") can be used if one wants a partition to exist,
5138              but have no resources (possibly on a temporary basis).  A  value
5139              of "ALL" is mapped to all nodes configured in the cluster.
5140
5141
5142       OverSubscribe
5143              Controls  the  ability of the partition to execute more than one
5144              job at a time on each resource (node, socket or  core  depending
5145              upon the value of SelectTypeParameters).  If resources are to be
5146              over-subscribed, avoiding memory over-subscription is  very  im‐
5147              portant.   SelectTypeParameters  should  be  configured to treat
5148              memory as a consumable resource and the --mem option  should  be
5149              used  for  job  allocations.   Sharing of resources is typically
5150              useful  only  when  using  gang   scheduling   (PreemptMode=sus‐
5151              pend,gang).   Possible values for OverSubscribe are "EXCLUSIVE",
5152              "FORCE", "YES", and "NO".  Note that a value of "YES" or "FORCE"
5153              can  negatively  impact  performance for systems with many thou‐
5154              sands of running jobs.  The default value is "NO".  For more in‐
5155              formation see the following web pages:
5156              https://slurm.schedmd.com/cons_res.html
5157              https://slurm.schedmd.com/cons_res_share.html
5158              https://slurm.schedmd.com/gang_scheduling.html
5159              https://slurm.schedmd.com/preempt.html
5160
5161
5162              EXCLUSIVE   Allocates  entire  nodes  to  jobs even with Select‐
5163                          Type=select/cons_res or  SelectType=select/cons_tres
5164                          configured.   Jobs that run in partitions with Over‐
5165                          Subscribe=EXCLUSIVE will have  exclusive  access  to
5166                          all allocated nodes.
5167
5168              FORCE       Makes  all  resources in the partition available for
5169                          oversubscription without any means for users to dis‐
5170                          able  it.   May be followed with a colon and maximum
5171                          number of jobs in running or suspended  state.   For
5172                          example  OverSubscribe=FORCE:4  enables  each  node,
5173                          socket or core to oversubscribe each  resource  four
5174                          ways.   Recommended  only for systems using Preempt‐
5175                          Mode=suspend,gang.
5176
5177                          NOTE: OverSubscribe=FORCE:1 is a special  case  that
5178                          is not exactly equivalent to OverSubscribe=NO. Over‐
5179                          Subscribe=FORCE:1 disables the regular oversubscrip‐
5180                          tion  of resources in the same partition but it will
5181                          still allow oversubscription due to preemption. Set‐
5182                          ting  OverSubscribe=NO will prevent oversubscription
5183                          from happening due to preemption as well.
5184
5185                          NOTE: If using PreemptType=preempt/qos you can spec‐
5186                          ify  a  value  for FORCE that is greater than 1. For
5187                          example, OverSubscribe=FORCE:2 will permit two  jobs
5188                          per  resource  normally,  but  a  third  job  can be
5189                          started only if done  so  through  preemption  based
5190                          upon QOS.
5191
5192                          NOTE: If OverSubscribe is configured to FORCE or YES
5193                          in your slurm.conf and the system is not  configured
5194                          to  use  preemption (PreemptMode=OFF) accounting can
5195                          easily grow to values greater than the  actual  uti‐
5196                          lization.  It  may  be common on such systems to get
5197                          error messages in the slurmdbd log stating: "We have
5198                          more allocated time than is possible."
5199
5200
5201              YES         Makes  all  resources in the partition available for
5202                          sharing upon request by  the  job.   Resources  will
5203                          only be over-subscribed when explicitly requested by
5204                          the user using the "--oversubscribe" option  on  job
5205                          submission.   May be followed with a colon and maxi‐
5206                          mum number of jobs in running  or  suspended  state.
5207                          For example "OverSubscribe=YES:4" enables each node,
5208                          socket or core to execute up to four jobs  at  once.
5209                          Recommended  only  for  systems  running  with  gang
5210                          scheduling (PreemptMode=suspend,gang).
5211
5212              NO          Selected resources are allocated to a single job. No
5213                          resource will be allocated to more than one job.
5214
5215                          NOTE:   Even   if  you  are  using  PreemptMode=sus‐
5216                          pend,gang,  setting  OverSubscribe=NO  will  disable
5217                          preemption   on   that   partition.   Use   OverSub‐
5218                          scribe=FORCE:1 if you want to disable  normal  over‐
5219                          subscription  but still allow suspension due to pre‐
5220                          emption.
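
              As an example (the partition and node names are only
              illustrative), gang scheduling of up to two jobs per resource
              could be configured with the following partition line, together
              with PreemptMode=suspend,gang in the global configuration:
              PartitionName=shared Nodes=tux[1-8] OverSubscribe=FORCE:2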
5221
5222
5223       PartitionName
5224              Name by which the partition may be  referenced  (e.g.  "Interac‐
5225              tive").   This  name  can  be specified by users when submitting
5226              jobs.  If the PartitionName is "DEFAULT", the  values  specified
5227              with  that  record will apply to subsequent partition specifica‐
5228              tions unless explicitly set to other values  in  that  partition
5229              record or replaced with a different set of default values.  Each
5230              line where PartitionName is "DEFAULT" will  replace  or  add  to
5231              previous default values and not reinitialize the default val‐
5232              ues.
5233
5234
5235       PreemptMode
5236              Mechanism used to preempt jobs or  enable  gang  scheduling  for
5237              this  partition  when PreemptType=preempt/partition_prio is con‐
5238              figured.  This partition-specific PreemptMode configuration  pa‐
5239              rameter will override the cluster-wide PreemptMode for this par‐
5240              tition.  It can be set to OFF to  disable  preemption  and  gang
5241              scheduling  for  this  partition.  See also PriorityTier and the
5242              above description of the cluster-wide PreemptMode parameter  for
5243              further details.
5244
5245
5246       PriorityJobFactor
5247              Partition  factor  used by priority/multifactor plugin in calcu‐
5248              lating job priority.  The value may not exceed 65533.  Also  see
5249              PriorityTier.
5250
5251
5252       PriorityTier
5253              Jobs  submitted to a partition with a higher priority tier value
5254              will be dispatched before pending jobs in partitions with lower
5255              priority  tier value and, if possible, they will preempt running
5256              jobs from partitions with lower priority tier values.  Note that
5257              a partition's priority tier takes precedence over a job's prior‐
5258              ity.  The value may not exceed 65533.  Also see  PriorityJobFac‐
5259              tor.
5260
5261
5262       QOS    Used  to  extend  the  limits available to a QOS on a partition.
5263              Jobs will not be associated to this QOS outside of being associ‐
5264              ated  to  the partition.  They will still be associated to their
5265              requested QOS.  By default, no QOS is used.  NOTE: If a limit is
5266              set  in both the Partition's QOS and the Job's QOS the Partition
5267              QOS will be honored unless the Job's  QOS  has  the  OverPartQOS
5268              flag set, in which case the Job's QOS will have priority.
5269
5270
5271       ReqResv
5272              Specifies  users  of  this partition are required to designate a
5273              reservation when submitting a job. This option can be useful  in
5274              restricting  usage  of a partition that may have higher priority
5275              or additional resources to be allowed only within a reservation.
5276              Possible values are "YES" and "NO".  The default value is "NO".
5277
5278
5279       RootOnly
5280              Specifies if only user ID zero (i.e. user root) may allocate re‐
5281              sources in this partition. User root may allocate resources  for
5282              any  other user, but the request must be initiated by user root.
5283              This option can be useful for a partition to be managed by  some
5284              external  entity  (e.g. a higher-level job manager) and prevents
5285              users from directly using those resources.  Possible values  are
5286              "YES" and "NO".  The default value is "NO".
5287
5288
5289       SelectTypeParameters
5290              Partition-specific  resource  allocation  type.  This option re‐
5291              places the global SelectTypeParameters value.  Supported  values
5292              are  CR_Core,  CR_Core_Memory,  CR_Socket  and CR_Socket_Memory.
5293              Use requires the system-wide SelectTypeParameters value  be  set
5294              to  any  of  the four supported values previously listed; other‐
5295              wise, the partition-specific value will be ignored.
5296
5297
5298       Shared The Shared configuration parameter  has  been  replaced  by  the
5299              OverSubscribe parameter described above.
5300
5301
5302       State  State of partition or availability for use.  Possible values are
5303              "UP", "DOWN", "DRAIN" and "INACTIVE". The default value is "UP".
5304              See also the related "Alternate" keyword.
5305
5306              UP        Designates  that  new jobs may be queued on the parti‐
5307                        tion, and that jobs may be  allocated  nodes  and  run
5308                        from the partition.
5309
5310              DOWN      Designates  that  new jobs may be queued on the parti‐
5311                        tion, but queued jobs may not be allocated  nodes  and
5312                        run  from  the  partition. Jobs already running on the
5313                        partition continue to run. The jobs must be explicitly
5314                        canceled to force their termination.
5315
5316              DRAIN     Designates  that no new jobs may be queued on the par‐
5317                        tition (job submission requests will be denied with an
5318                        error  message), but jobs already queued on the parti‐
5319                        tion may be allocated nodes and  run.   See  also  the
5320                        "Alternate" partition specification.
5321
5322              INACTIVE  Designates  that no new jobs may be queued on the par‐
5323                        tition, and jobs already queued may not  be  allocated
5324                        nodes  and  run.   See  also the "Alternate" partition
5325                        specification.
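
              As an example (the partition names are only illustrative), a
              partition being drained for maintenance could redirect new
              submissions to an alternate partition:
              PartitionName=gpu State=DRAIN Alternate=gpu2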
5326
5327
5328       TRESBillingWeights
5329              TRESBillingWeights is used to define the billing weights of each
5330              TRES  type  that will be used in calculating the usage of a job.
5331              The calculated usage is used when calculating fairshare and when
5332              enforcing the TRES billing limit on jobs.
5333
5334              Billing weights are specified as a comma-separated list of <TRES
5335              Type>=<TRES Billing Weight> pairs.
5336
5337              Any TRES Type is available for billing. Note that the base  unit
5338              for memory and burst buffers is megabytes.
5339
5340              By  default  the billing of TRES is calculated as the sum of all
5341              TRES types multiplied by their corresponding billing weight.
5342
5343              The weighted amount of a resource can be adjusted  by  adding  a
5344              suffix  of K,M,G,T or P after the billing weight. For example, a
5345              memory weight of "mem=.25" on a job allocated 8GB will be billed
5346              2048  (8192MB  *.25) units. A memory weight of "mem=.25G" on the
5347              same job will be billed 2 (8192MB * (.25/1024)) units.
5348
5349              Negative values are allowed.
5350
5351              When a job is allocated 1 CPU and 8 GB of memory on a  partition
5352              configured                   with                   TRESBilling‐
5353              Weights="CPU=1.0,Mem=0.25G,GRES/gpu=2.0", the billable TRES will
5354              be: (1*1.0) + (8*0.25) + (0*2.0) = 3.0.
5355
5356              If  PriorityFlags=MAX_TRES  is  configured, the billable TRES is
5357              calculated as the MAX of individual TRES' on a node (e.g.  cpus,
5358              mem, gres) plus the sum of all global TRES' (e.g. licenses). Us‐
5359              ing the same example above the billable TRES will be  MAX(1*1.0,
5360              8*0.25) + (0*2.0) = 2.0.
5361
5362              If  TRESBillingWeights  is  not  defined  then the job is billed
5363              against the total number of allocated CPUs.
5364
5365              NOTE: TRESBillingWeights doesn't affect job priority directly as
5366              it  is  currently  not used for the size of the job. If you want
5367              TRES' to play a role in the job's priority  then  refer  to  the
5368              PriorityWeightTRES option.
5369
5370
5371

PROLOG AND EPILOG SCRIPTS

5373       There  are  a variety of prolog and epilog program options that execute
5374       with various permissions and at various times.  The four  options  most
5375       likely to be used are: Prolog and Epilog (executed once on each compute
5376       node for each job) plus PrologSlurmctld and  EpilogSlurmctld  (executed
5377       once on the ControlMachine for each job).
5378
5379       NOTE:  Standard  output  and error messages are normally not preserved.
5380       Explicitly write output and error messages to an  appropriate  location
5381       if you wish to preserve that information.
5382
5383       NOTE:   By default the Prolog script is ONLY run on any individual node
5384       when it first sees a job step from a new allocation. It  does  not  run
5385       the  Prolog immediately when an allocation is granted.  If no job steps
5386       from an allocation are run on a node, it will never run the Prolog  for
5387       that allocation.  This Prolog behavior can be changed by the Pro‐
5388       logFlags parameter.  The Epilog, on the other hand, always runs on  ev‐
5389       ery node of an allocation when the allocation is released.
5390
5391       If the Epilog fails (returns a non-zero exit code), this will result in
5392       the node being set to a DRAIN state.  If the EpilogSlurmctld fails (re‐
5393       turns  a  non-zero exit code), this will only be logged.  If the Prolog
5394       fails (returns a non-zero exit code), this will result in the node  be‐
5395       ing set to a DRAIN state and the job being requeued in a held state un‐
5396       less nohold_on_prolog_fail is configured  in  SchedulerParameters.   If
5397       the PrologSlurmctld fails (returns a non-zero exit code), this will re‐
5398       sult in the job being requeued to be executed on another node if possi‐
5399       ble.  Only  batch  jobs  can be requeued.  Interactive jobs (salloc and
5400       srun) will be cancelled if the PrologSlurmctld fails.
5401
5402
5403       Information about the job is passed to  the  script  using  environment
5404       variables.  Unless otherwise specified, these environment variables are
5405       available in each of the scripts mentioned above (Prolog, Epilog,  Pro‐
5406       logSlurmctld and EpilogSlurmctld). For a full list of environment vari‐
5407       ables that includes those  available  in  the  SrunProlog,  SrunEpilog,
5408       TaskProlog  and  TaskEpilog  please  see  the  Prolog  and Epilog Guide
5409       <https://slurm.schedmd.com/prolog_epilog.html>.
5410
5411       SLURM_ARRAY_JOB_ID
5412              If this job is part of a job array, this will be set to the  job
5413              ID.   Otherwise  it will not be set.  To reference this specific
5414              task of a job array, combine SLURM_ARRAY_JOB_ID  with  SLURM_AR‐
5415              RAY_TASK_ID      (e.g.      "scontrol     update     ${SLURM_AR‐
5416              RAY_JOB_ID}_${SLURM_ARRAY_TASK_ID}  ...");  Available  in   Pro‐
5417              logSlurmctld and EpilogSlurmctld.
5418
5419       SLURM_ARRAY_TASK_ID
5420              If this job is part of a job array, this will be set to the task
5421              ID.  Otherwise it will not be set.  To reference  this  specific
5422              task  of  a job array, combine SLURM_ARRAY_JOB_ID with SLURM_AR‐
5423              RAY_TASK_ID     (e.g.     "scontrol      update      ${SLURM_AR‐
5424              RAY_JOB_ID}_${SLURM_ARRAY_TASK_ID}   ...");  Available  in  Pro‐
5425              logSlurmctld and EpilogSlurmctld.
5426
5427       SLURM_ARRAY_TASK_MAX
5428              If this job is part of a job array, this will be set to the max‐
5429              imum  task ID.  Otherwise it will not be set.  Available in Pro‐
5430              logSlurmctld and EpilogSlurmctld.
5431
5432       SLURM_ARRAY_TASK_MIN
5433              If this job is part of a job array, this will be set to the min‐
5434              imum  task ID.  Otherwise it will not be set.  Available in Pro‐
5435              logSlurmctld and EpilogSlurmctld.
5436
5437       SLURM_ARRAY_TASK_STEP
5438              If this job is part of a job array, this will be set to the step
5439              size  of  task IDs.  Otherwise it will not be set.  Available in
5440              PrologSlurmctld and EpilogSlurmctld.
5441
5442       SLURM_CLUSTER_NAME
5443              Name of the cluster executing the job.
5444
5445       SLURM_CONF
5446              Location of the slurm.conf file. Available in Prolog and Epilog.
5447
5448       SLURMD_NODENAME
5449              Name of the node running the task. In the case of a parallel job
5450              executing on multiple compute nodes, the various tasks will have
5451              this environment variable set to different values on  each  com‐
5452              pute node. Available in Prolog and Epilog.
5453
5454       SLURM_JOB_ACCOUNT
5455              Account name used for the job.  Available in PrologSlurmctld and
5456              EpilogSlurmctld.
5457
5458       SLURM_JOB_CONSTRAINTS
5459              Features required to run the job.   Available  in  Prolog,  Pro‐
5460              logSlurmctld and EpilogSlurmctld.
5461
5462       SLURM_JOB_DERIVED_EC
5463              The  highest  exit  code  of all of the job steps.  Available in
5464              EpilogSlurmctld.
5465
5466       SLURM_JOB_EXIT_CODE
5467              The exit code of the job script (or salloc). The  value  is  the
5468              status as returned by the wait() system call (see wait(2)).
5469              Available in EpilogSlurmctld.
5470
5471       SLURM_JOB_EXIT_CODE2
5472              The exit code of the job script (or salloc). The value  has  the
5473              format  <exit>:<sig>.  The  first number is the exit code, typi‐
5474              cally as set by the exit() function. The second number is the
5475              signal that caused the process to terminate if it was terminated
5476              by a signal.  Available in EpilogSlurmctld.
5477
5478       SLURM_JOB_GID
5479              Group ID of the job's owner.
5480
5481       SLURM_JOB_GPUS
5482              GPU IDs allocated to the job (if any).  Available in the Prolog.
5483
5484       SLURM_JOB_GROUP
5485              Group name of the job's owner.  Available in PrologSlurmctld and
5486              EpilogSlurmctld.
5487
5488       SLURM_JOB_ID
5489              Job ID.
5490
5491       SLURM_JOBID
5492              Job ID.
5493
5494       SLURM_JOB_NAME
5495              Name  of the job.  Available in PrologSlurmctld and EpilogSlurm‐
5496              ctld.
5497
5498       SLURM_JOB_NODELIST
5499              Nodes assigned to job. A Slurm hostlist  expression.   "scontrol
5500              show  hostnames"  can be used to convert this to a list of indi‐
5501              vidual  host  names.   Available  in  PrologSlurmctld  and  Epi‐
5502              logSlurmctld.
5503
5504       SLURM_JOB_PARTITION
5505              Partition  that  job runs in.  Available in Prolog, PrologSlurm‐
5506              ctld and EpilogSlurmctld.
5507
5508       SLURM_JOB_UID
5509              User ID of the job's owner.
5510
5511       SLURM_JOB_USER
5512              User name of the job's owner.
5513
5514       SLURM_SCRIPT_CONTEXT
5515              Identifies which epilog or prolog program is currently running.
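
       As an example, a minimal Prolog script could record some of the
       variables above (this is only a sketch: the log file path is
       illustrative, and the script must be named by the Prolog parameter,
       be executable by user root and exist on every compute node):
       #!/bin/sh
       # Sketch of a Prolog script: append basic job information to a local
       # log file.  A non-zero exit code would drain the node, so exit 0.
       LOG=/var/log/slurm/prolog.log
       echo "$(date) job=$SLURM_JOB_ID user=$SLURM_JOB_USER" \
            "node=$SLURMD_NODENAME partition=$SLURM_JOB_PARTITION" >> "$LOG"
       exit 0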
5516
5517

UNKILLABLE STEP PROGRAM SCRIPT

5519       This program can be used to take special actions to clean up the unkil‐
5520       lable  processes and/or notify system administrators.  The program will
5521       be run as SlurmdUser (usually "root") on the compute node where Unkill‐
5522       ableStepTimeout was triggered.
5523
5524       Information about the unkillable job step is passed to the script using
5525       environment variables.
5526
5527       SLURM_JOB_ID
5528              Job ID.
5529
5530       SLURM_STEP_ID
5531              Job Step ID.
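
       As an example, a minimal UnkillableStepProgram could notify the
       administrators (this is only a sketch; the mail(1) recipient and the
       use of mail itself are illustrative):
       #!/bin/sh
       # Sketch: report the unkillable step on this node to the admins.
       # Runs as SlurmdUser on the node where UnkillableStepTimeout fired.
       echo "Unkillable step $SLURM_JOB_ID.$SLURM_STEP_ID on $(hostname)" |
            mail -s "slurm: unkillable step" root
       exit 0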
5532
5533

NETWORK TOPOLOGY

5535       Slurm is able to optimize job  allocations  to  minimize  network  con‐
5536       tention.   Special  Slurm logic is used to optimize allocations on sys‐
5537       tems with a three-dimensional interconnect, and information about con‐
5538       figuring those systems is available on the web pages here:
5539       <https://slurm.schedmd.com/>.  For a hierarchical network, Slurm  needs
5540       to have detailed information about how nodes are configured on the net‐
5541       work switches.
5542
5543       Given network topology information, Slurm allocates all of a job's  re‐
5544       sources  onto  a  single  leaf  of  the  network  (if possible) using a
5545       best-fit algorithm.  Otherwise it will allocate a job's resources  onto
5546       multiple  leaf  switches  so  as  to  minimize  the use of higher-level
5547       switches.  The TopologyPlugin parameter controls which plugin  is  used
5548       to  collect  network  topology  information.  The only values presently
5549       supported are "topology/3d_torus" (default for Cray XT/XE systems, per‐
5550       forms  best-fit logic over three-dimensional topology), "topology/none"
5551       (default for other systems, best-fit logic over one-dimensional  topol‐
5552       ogy), "topology/tree" (determine the network topology based upon infor‐
5553       mation contained in a topology.conf file, see "man  topology.conf"  for
5554       more  information).  Future plugins may gather topology information di‐
5555       rectly from the network.  The topology information is optional.  If not
5556       provided,  Slurm  will  perform a best-fit algorithm assuming the nodes
5557       are in a one-dimensional array as  configured  and  the  communications
5558       cost is related to the node distance in this array.
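
       As an example, a cluster with a hierarchical network would typically
       select the tree plugin and describe its switches in a separate
       topology.conf file (see topology.conf(5)):
       TopologyPlugin=topology/tree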
5559
5560

RELOCATING CONTROLLERS

5562       If  the  cluster's  computers used for the primary or backup controller
5563       will be out of service for an extended period of time, it may be desir‐
5564       able to relocate them.  In order to do so, follow this procedure:
5565
5566       1. Stop the Slurm daemons
5567       2. Modify the slurm.conf file appropriately
5568       3. Distribute the updated slurm.conf file to all nodes
5569       4. Restart the Slurm daemons
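
       As an example, the steps above might look as follows on a system using
       systemd (the unit names, editor, file path and node name are only
       illustrative; run each command on the appropriate machines):
       systemctl stop slurmctld                  # on the controller(s)
       systemctl stop slurmd                     # on every compute node
       vi /etc/slurm.conf                        # update SlurmctldHost, etc.
       scp /etc/slurm.conf node01:/etc/slurm.conf  # repeat for every node
       systemctl start slurmd                    # on every compute node
       systemctl start slurmctld                 # on the controller(s)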
5570
5571       There  should  be  no loss of any running or pending jobs.  Ensure that
5572       any nodes added to the cluster have the  current  slurm.conf  file  in‐
5573       stalled.
5574
5575       CAUTION: If two nodes are simultaneously configured as the primary con‐
5576       troller (two nodes on which SlurmctldHost specifies the local host and
5577       the slurmctld daemon is executing on each), system behavior will be de‐
5578       structive.  If a compute node has an incorrect SlurmctldHost parameter,
5579       that node may be rendered unusable, but no other harm will result.
5580
5581

EXAMPLE

5583       #
5584       # Sample /etc/slurm.conf for dev[0-25].llnl.gov
5585       # Author: John Doe
5586       # Date: 11/06/2001
5587       #
5588       SlurmctldHost=dev0(12.34.56.78)  # Primary server
5589       SlurmctldHost=dev1(12.34.56.79)  # Backup server
5590       #
5591       AuthType=auth/munge
5592       Epilog=/usr/local/slurm/epilog
5593       Prolog=/usr/local/slurm/prolog
5594       FirstJobId=65536
5595       InactiveLimit=120
5596       JobCompType=jobcomp/filetxt
5597       JobCompLoc=/var/log/slurm/jobcomp
5598       KillWait=30
5599       MaxJobCount=10000
5600       MinJobAge=3600
5601       PluginDir=/usr/local/lib:/usr/local/slurm/lib
5602       ReturnToService=0
5603       SchedulerType=sched/backfill
5604       SlurmctldLogFile=/var/log/slurm/slurmctld.log
5605       SlurmdLogFile=/var/log/slurm/slurmd.log
5606       SlurmctldPort=7002
5607       SlurmdPort=7003
5608       SlurmdSpoolDir=/var/spool/slurmd.spool
5609       StateSaveLocation=/var/spool/slurm.state
5610       SwitchType=switch/none
5611       TmpFS=/tmp
5612       WaitTime=30
5613       JobCredentialPrivateKey=/usr/local/slurm/private.key
5614       JobCredentialPublicCertificate=/usr/local/slurm/public.cert
5615       #
5616       # Node Configurations
5617       #
5618       NodeName=DEFAULT CPUs=2 RealMemory=2000 TmpDisk=64000
5619       NodeName=DEFAULT State=UNKNOWN
5620       NodeName=dev[0-25] NodeAddr=edev[0-25] Weight=16
5621       # Update records for specific DOWN nodes
5622       DownNodes=dev20 State=DOWN Reason="power,ETA=Dec25"
5623       #
5624       # Partition Configurations
5625       #
5626       PartitionName=DEFAULT MaxTime=30 MaxNodes=10 State=UP
5627       PartitionName=debug Nodes=dev[0-8,18-25] Default=YES
5628       PartitionName=batch Nodes=dev[9-17]  MinNodes=4
5629       PartitionName=long Nodes=dev[9-17] MaxTime=120 AllowGroups=admin
5630
5631

INCLUDE MODIFIERS

5633       The "include" keyword can be used with modifiers within the specified
5634       pathname. These modifiers would be replaced with cluster name or  other
5635       information  depending  on which modifier is specified. If the included
5636       file is not an absolute path name  (i.e.  it  does  not  start  with  a
5637       slash), it will be searched for in the same directory as the slurm.conf
5638       file.
5639
5640       %c     Cluster name specified in the slurm.conf will be used.
5641
5642       EXAMPLE
5643       ClusterName=linux
5644       include /home/slurm/etc/%c_config
5645       # Above line interpreted as
5646       # "include /home/slurm/etc/linux_config"
5647
5648

FILE AND DIRECTORY PERMISSIONS

5650       There are three classes of files: Files used by slurmctld must  be  ac‐
5651       cessible  by  user  SlurmUser  and accessible by the primary and backup
5652       control machines.  Files used by slurmd must be accessible by user root
5653       and  accessible from every compute node.  A few files need to be acces‐
5654       sible by normal users on all login and compute nodes.  While many files
5655       and  directories  are  listed below, most of them will not be used with
5656       most configurations.
5657
5658       Epilog Must be executable by user root.  It  is  recommended  that  the
5659              file  be  readable  by  all users.  The file must exist on every
5660              compute node.
5661
5662       EpilogSlurmctld
5663              Must be executable by user SlurmUser.  It  is  recommended  that
5664              the  file be readable by all users.  The file must be accessible
5665              by the primary and backup control machines.
5666
5667       HealthCheckProgram
5668              Must be executable by user root.  It  is  recommended  that  the
5669              file  be  readable  by  all users.  The file must exist on every
5670              compute node.
5671
5672       JobCompLoc
5673              If this specifies a file, it must be writable by user SlurmUser.
5674              The  file  must  be accessible by the primary and backup control
5675              machines.
5676
5677       JobCredentialPrivateKey
5678              Must be readable only by user SlurmUser and writable by no other
5679              users.   The  file  must be accessible by the primary and backup
5680              control machines.
5681
5682       JobCredentialPublicCertificate
5683              Readable to all users on all nodes.  Must  not  be  writable  by
5684              regular users.
5685
5686       MailProg
5687              Must  be  executable by user SlurmUser.  Must not be writable by
5688              regular users.  The file must be accessible by the  primary  and
5689              backup control machines.
5690
5691       Prolog Must  be  executable  by  user root.  It is recommended that the
5692              file be readable by all users.  The file  must  exist  on  every
5693              compute node.
5694
5695       PrologSlurmctld
5696              Must  be  executable  by user SlurmUser.  It is recommended that
5697              the file be readable by all users.  The file must be  accessible
5698              by the primary and backup control machines.
5699
5700       ResumeProgram
5701              Must be executable by user SlurmUser.  The file must be accessi‐
5702              ble by the primary and backup control machines.
5703
5704       slurm.conf
5705              Readable to all users on all nodes.  Must  not  be  writable  by
5706              regular users.
5707
5708       SlurmctldLogFile
5709              Must be writable by user SlurmUser.  The file must be accessible
5710              by the primary and backup control machines.
5711
5712       SlurmctldPidFile
5713              Must be writable by user root.  Preferably writable  and  remov‐
5714              able  by  SlurmUser.  The file must be accessible by the primary
5715              and backup control machines.
5716
5717       SlurmdLogFile
5718              Must be writable by user root.  A distinct file  must  exist  on
5719              each compute node.
5720
5721       SlurmdPidFile
5722              Must  be  writable  by user root.  A distinct file must exist on
5723              each compute node.
5724
5725       SlurmdSpoolDir
5726              Must be writable by user root.  A distinct file  must  exist  on
5727              each compute node.
5728
5729       SrunEpilog
5730              Must  be  executable by all users.  The file must exist on every
5731              login and compute node.
5732
5733       SrunProlog
5734              Must be executable by all users.  The file must exist  on  every
5735              login and compute node.
5736
5737       StateSaveLocation
5738              Must be writable by user SlurmUser.  The file must be accessible
5739              by the primary and backup control machines.
5740
5741       SuspendProgram
5742              Must be executable by user SlurmUser.  The file must be accessi‐
5743              ble by the primary and backup control machines.
5744
5745       TaskEpilog
5746              Must  be  executable by all users.  The file must exist on every
5747              compute node.
5748
5749       TaskProlog
5750              Must be executable by all users.  The file must exist  on  every
5751              compute node.
5752
5753       UnkillableStepProgram
5754              Must be executable by user SlurmUser.  The file must be accessi‐
5755              ble by the primary and backup control machines.
5756
5757

LOGGING

5759       Note that while Slurm daemons create  log  files  and  other  files  as
5760       needed, they treat the lack of parent directories as a fatal error.
5761       This prevents the daemons from running if critical file systems are not
5762       mounted  and  will minimize the risk of cold-starting (starting without
5763       preserving jobs).
5764
5765       Log files and job accounting files may need to be created/owned by the
5766       "SlurmUser"  uid  to  be  successfully  accessed.   Use the "chown" and
5767       "chmod" commands to set the ownership  and  permissions  appropriately.
5768       See  the  section  FILE AND DIRECTORY PERMISSIONS for information about
5769       the various files and directories used by Slurm.
5770
5771       It is recommended that the logrotate utility be  used  to  ensure  that
5772       various  log  files do not become too large.  This also applies to text
5773       files used for accounting, process tracking, and the  slurmdbd  log  if
5774       they are used.
5775
5776       Here is a sample logrotate configuration. Make appropriate site modifi‐
5777       cations and save as  /etc/logrotate.d/slurm  on  all  nodes.   See  the
5778       logrotate man page for more details.
5779
5780       ##
5781       # Slurm Logrotate Configuration
5782       ##
5783       /var/log/slurm/*.log {
5784            compress
5785            missingok
5786            nocopytruncate
5787            nodelaycompress
5788            nomail
5789            notifempty
5790            noolddir
5791            rotate 5
5792            sharedscripts
5793            size=5M
5794            create 640 slurm root
5795            postrotate
5796                 pkill -x --signal SIGUSR2 slurmctld
5797                 pkill -x --signal SIGUSR2 slurmd
5798                 pkill -x --signal SIGUSR2 slurmdbd
5799                 exit 0
5800            endscript
5801       }
5802

COPYING

5804       Copyright  (C)  2002-2007  The Regents of the University of California.
5805       Produced at Lawrence Livermore National Laboratory (cf, DISCLAIMER).
5806       Copyright (C) 2008-2010 Lawrence Livermore National Security.
5807       Copyright (C) 2010-2017 SchedMD LLC.
5808
5809       This file is part of Slurm, a resource  management  program.   For  de‐
5810       tails, see <https://slurm.schedmd.com/>.
5811
5812       Slurm  is free software; you can redistribute it and/or modify it under
5813       the terms of the GNU General Public License as published  by  the  Free
5814       Software  Foundation;  either version 2 of the License, or (at your op‐
5815       tion) any later version.
5816
5817       Slurm is distributed in the hope that it will be  useful,  but  WITHOUT
5818       ANY  WARRANTY;  without even the implied warranty of MERCHANTABILITY or
5819       FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General  Public  License
5820       for more details.
5821
5822

FILES

5824       /etc/slurm.conf
5825
5826

SEE ALSO

5828       cgroup.conf(5),  getaddrinfo(3),  getrlimit(2), gres.conf(5), group(5),
5829       hostname(1), scontrol(1), slurmctld(8), slurmd(8),  slurmdbd(8),  slur‐
5830       mdbd.conf(5), srun(1), spank(8), syslog(3), topology.conf(5)
5831
5832
5833
5834May 2021                   Slurm Configuration File              slurm.conf(5)