slurm.conf(5)              Slurm Configuration File              slurm.conf(5)
2
3
4

NAME

6       slurm.conf - Slurm configuration file
7
8

DESCRIPTION

10       slurm.conf is an ASCII file which describes general Slurm configuration
11       information, the nodes to be managed, information about how those nodes
12       are  grouped into partitions, and various scheduling parameters associ‐
13       ated with those partitions. This file should be consistent  across  all
14       nodes in the cluster.
15
16       The  file  location  can  be  modified  at  system build time using the
17       DEFAULT_SLURM_CONF parameter  or  at  execution  time  by  setting  the
18       SLURM_CONF  environment  variable.  The Slurm daemons also allow you to
19       override both the built-in and environment-provided location using  the
20       "-f" option on the command line.
21
22       The  contents  of the file are case insensitive except for the names of
23       nodes and partitions. Any text following a  "#"  in  the  configuration
24       file  is treated as a comment through the end of that line.  Changes to
25       the configuration file take effect upon restart of Slurm daemons,  dae‐
26       mon receipt of the SIGHUP signal, or execution of the command "scontrol
27       reconfigure" unless otherwise noted.
28
29       If a line begins with the word "Include"  followed  by  whitespace  and
30       then  a  file  name, that file will be included inline with the current
31       configuration file. For large or complex systems,  multiple  configura‐
32       tion  files  may  prove easier to manage and enable reuse of some files
33       (See INCLUDE MODIFIERS for more details).
34
35       Note on file permissions:
36
37       The slurm.conf file must be readable by all users of Slurm, since it is
38       used  by  many  of the Slurm commands.  Other files that are defined in
39       the slurm.conf file, such as log files and job  accounting  files,  may
40       need  to  be  created/owned  by the user "SlurmUser" to be successfully
41       accessed.  Use the "chown" and "chmod" commands to  set  the  ownership
42       and permissions appropriately.  See the section FILE AND DIRECTORY PER‐
43       MISSIONS for information about the various files and  directories  used
44       by Slurm.
45
46

PARAMETERS

48       The overall configuration parameters available include:
49
50
51       AccountingStorageBackupHost
52              The  name  of  the backup machine hosting the accounting storage
53              database.  If used with the accounting_storage/slurmdbd  plugin,
54              this  is  where the backup slurmdbd would be running.  Only used
55              with systems using SlurmDBD, ignored otherwise.
56
57
58       AccountingStorageEnforce
59              This controls what level  of  association-based  enforcement  to
60              impose on job submissions.  Valid options are any combination of
61              associations, limits, nojobs, nosteps, qos, safe, and wckeys, or
62              all  for  all  things  (except nojobs and nosteps, which must be
63              requested as well).
64
65              If limits, qos, or wckeys are set, associations  will  automati‐
66              cally be set.
67
68              If wckeys is set, TrackWCKey will automatically be set.
69
70              If  safe  is  set, limits and associations will automatically be
71              set.
72
73              If nojobs is set, nosteps will automatically be set.
74
75              By setting associations, no new job is allowed to run  unless  a
76              corresponding  association  exists in the system.  If limits are
77              enforced, users can be limited by association  to  whatever  job
78              size or run time limits are defined.
79
80              If  nojobs  is set, Slurm will not account for any jobs or steps
81              on the system. Likewise, if  nosteps  is  set,  Slurm  will  not
82              account for any steps that have run.
83
84              If  safe  is  enforced,  a  job will only be launched against an
85              association or qos that has a GrpTRESMins limit set, if the  job
86              will be able to run to completion. Without this option set, jobs
87              will be launched as long as their usage hasn't reached the  cpu-
88              minutes  limit.  This  can  lead to jobs being launched but then
89              killed when the limit is reached.
90
91              With qos and/or wckeys  enforced  jobs  will  not  be  scheduled
92              unless a valid qos and/or workload characterization key is spec‐
93              ified.
94
95              When AccountingStorageEnforce  is  changed,  a  restart  of  the
96              slurmctld daemon is required (not just a "scontrol reconfig").
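
              For example, a site that wants jobs rejected unless they match
              a known association, with limits, QOS and the safe behavior
              described above enforced, might use a line such as the
              following (purely illustrative):

                   AccountingStorageEnforce=associations,limits,qos,safe

              Because safe implies limits and associations, and qos implies
              associations, "AccountingStorageEnforce=safe,qos" is an
              equivalent shorter form.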
97
98
99       AccountingStorageExternalHost
100              A     comma     separated    list    of    external    slurmdbds
101              (<host/ip>[:port][,...]) to register with. If no port is  given,
102              the AccountingStoragePort will be used.
103
104              This  allows  clusters  registered with the external slurmdbd to
105              communicate with each other using the --cluster/-M  client  com‐
106              mand options.
107
108              The  cluster  will  add  itself  to  the external slurmdbd if it
109              doesn't exist. If a non-external cluster already exists  on  the
110              external  slurmdbd, the slurmctld will ignore registering to the
111              external slurmdbd.
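
              For example, to register with two external slurmdbd instances,
              one of them listening on a non-default port (the host names
              below are placeholders):

                   AccountingStorageExternalHost=dbd-ext1.example.com,dbd-ext2.example.com:7031

              Hosts listed without an explicit port use AccountingStoragePort.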
112
113
114       AccountingStorageHost
115              The name of the machine hosting the accounting storage database.
116              Only  used with systems using SlurmDBD, ignored otherwise.  Also
117              see DefaultStorageHost.
118
119
120       AccountingStorageParameters
121              Comma separated list of  key-value  pair  parameters.  Currently
122              supported  values  include options to establish a secure connec‐
123              tion to the database:
124
125              SSL_CERT
126                The path name of the client public key certificate file.
127
128              SSL_CA
129                The path name of the Certificate  Authority  (CA)  certificate
130                file.
131
132              SSL_CAPATH
133                The  path  name  of the directory that contains trusted SSL CA
134                certificate files.
135
136              SSL_KEY
137                The path name of the client private key file.
138
139              SSL_CIPHER
140                The list of permissible ciphers for SSL encryption.
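
              For example, a connection secured with site-provided
              certificates might be configured as follows (the paths are
              placeholders):

                   AccountingStorageParameters=SSL_CA=/etc/slurm/ssl/ca.pem,SSL_CERT=/etc/slurm/ssl/client-cert.pem,SSL_KEY=/etc/slurm/ssl/client-key.pem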
141
142
143       AccountingStoragePass
144              The password used to gain access to the database  to  store  the
145              accounting  data.   Only used for database type storage plugins,
146              ignored otherwise.  In the case of Slurm DBD  (Database  Daemon)
147              with  MUNGE authentication this can be configured to use a MUNGE
148              daemon specifically configured to provide authentication between
149              clusters  while the default MUNGE daemon provides authentication
150              within a cluster.  In that  case,  AccountingStoragePass  should
              specify the named socket to be used for communications with the
152              alternate MUNGE daemon (e.g.  "/var/run/munge/global.socket.2").
153              The default value is NULL.  Also see DefaultStoragePass.
154
155
156       AccountingStoragePort
157              The  listening  port  of the accounting storage database server.
158              Only used for database type storage plugins, ignored  otherwise.
159              The  default  value  is  SLURMDBD_PORT  as established at system
160              build time. If no value is explicitly specified, it will be  set
161              to  6819.   This value must be equal to the DbdPort parameter in
162              the slurmdbd.conf file.  Also see DefaultStoragePort.
163
164
165       AccountingStorageTRES
166              Comma separated list of resources you wish to track on the clus‐
167              ter.   These  are the resources requested by the sbatch/srun job
168              when it is submitted. Currently this consists of  any  GRES,  BB
169              (burst  buffer) or license along with CPU, Memory, Node, Energy,
170              FS/[Disk|Lustre], IC/OFED, Pages, and VMem. By default  Billing,
171              CPU,  Energy, Memory, Node, FS/Disk, Pages and VMem are tracked.
172              These default TRES cannot be disabled,  but  only  appended  to.
173              AccountingStorageTRES=gres/craynetwork,license/iop1  will  track
174              billing, cpu, energy, memory, nodes,  fs/disk,  pages  and  vmem
175              along with a gres called craynetwork as well as a license called
176              iop1. Whenever these resources are used on the cluster they  are
177              recorded.  The  TRES are automatically set up in the database on
178              the start of the slurmctld.
179
180              If multiple GRES of different types are tracked  (e.g.  GPUs  of
181              different  types), then job requests with matching type specifi‐
182              cations will be recorded.  Given a  configuration  of  "Account‐
              ingStorageTRES=gres/gpu,gres/gpu:tesla,gres/gpu:volta", then
              "gres/gpu:tesla" and "gres/gpu:volta" will track only jobs that
185              explicitly  request  those  two GPU types, while "gres/gpu" will
186              track allocated GPUs of any type ("tesla", "volta" or any  other
187              GPU type).
188
              Given a configuration of "AccountingStorage‐
              TRES=gres/gpu:tesla,gres/gpu:volta", then "gres/gpu:tesla" and
191              "gres/gpu:volta"  will  track jobs that explicitly request those
192              GPU types.  If a job requests  GPUs,  but  does  not  explicitly
193              specify  the  GPU  type,  then  its  resource allocation will be
194              accounted for as either  "gres/gpu:tesla"  or  "gres/gpu:volta",
195              although  the accounting may not match the actual GPU type allo‐
196              cated to the job and the GPUs allocated to the job could be het‐
197              erogeneous.  In an environment containing various GPU types, use
198              of a job_submit plugin may be desired in order to force jobs  to
199              explicitly specify some GPU type.
200
201
202       AccountingStorageType
203              The  accounting  storage  mechanism  type.  Acceptable values at
204              present include "accounting_storage/none" and  "accounting_stor‐
205              age/slurmdbd".   The  "accounting_storage/slurmdbd"  value indi‐
206              cates that accounting records will be written to the Slurm  DBD,
207              which  manages  an underlying MySQL database. See "man slurmdbd"
208              for more information.  The default  value  is  "accounting_stor‐
209              age/none" and indicates that account records are not maintained.
210              Also see DefaultStorageType.
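
              A typical SlurmDBD-based accounting setup would therefore look
              something like the following (the host name is a placeholder
              for the machine running slurmdbd):

                   AccountingStorageType=accounting_storage/slurmdbd
                   AccountingStorageHost=dbd.example.com
                   AccountingStoragePort=6819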
211
212
213       AccountingStorageUser
214              The user account for accessing the accounting storage  database.
215              Only  used for database type storage plugins, ignored otherwise.
216              Also see DefaultStorageUser.
217
218
219       AccountingStoreJobComment
220              If set to "YES" then include the job's comment field in the  job
221              complete  message  sent to the Accounting Storage database.  The
222              default is "YES".  Note the AdminComment and  SystemComment  are
223              always recorded in the database.
224
225
226       AcctGatherNodeFreq
227              The  AcctGather  plugins  sampling interval for node accounting.
228              For AcctGather plugin values of none, this parameter is ignored.
229              For  all  other  values  this parameter is the number of seconds
230              between node accounting samples. For the acct_gather_energy/rapl
231              plugin, set a value less than 300 because the counters may over‐
              flow beyond this rate.  The default value is zero, which
              disables accounting sampling for nodes. Note: The accounting
234              sampling interval for jobs is determined by the value of  JobAc‐
235              ctGatherFrequency.
236
237
238       AcctGatherEnergyType
239              Identifies the plugin to be used for energy consumption account‐
240              ing.  The jobacct_gather plugin  and  slurmd  daemon  call  this
241              plugin  to  collect  energy consumption data for jobs and nodes.
242              The collection of energy consumption data  takes  place  on  the
243              node  level,  hence only in case of exclusive job allocation the
244              energy consumption measurements will reflect the job's real con‐
245              sumption. In case of node sharing between jobs the reported con‐
246              sumed energy per job (through sstat or sacct) will  not  reflect
247              the real energy consumed by the jobs.
248
249              Configurable values at present are:
250
251              acct_gather_energy/none
252                                  No energy consumption data is collected.
253
254              acct_gather_energy/ipmi
255                                  Energy  consumption  data  is collected from
256                                  the Baseboard  Management  Controller  (BMC)
257                                  using  the  Intelligent  Platform Management
258                                  Interface (IPMI).
259
260              acct_gather_energy/pm_counters
261                                  Energy consumption data  is  collected  from
262                                  the  Baseboard  Management  Controller (BMC)
263                                  for HPE Cray systems.
264
265              acct_gather_energy/rapl
266                                  Energy consumption data  is  collected  from
267                                  hardware  sensors  using the Running Average
268                                  Power  Limit  (RAPL)  mechanism.  Note  that
269                                  enabling  RAPL  may require the execution of
270                                  the command "sudo modprobe msr".
271
272              acct_gather_energy/xcc
273                                  Energy consumption data  is  collected  from
274                                  the  Lenovo  SD650 XClarity Controller (XCC)
275                                  using IPMI OEM raw commands.
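
              For example, to collect RAPL energy counters on the compute
              nodes every 30 seconds (a value below the 300 second limit
              noted under AcctGatherNodeFreq):

                   AcctGatherEnergyType=acct_gather_energy/rapl
                   AcctGatherNodeFreq=30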
276
277
278       AcctGatherInterconnectType
279              Identifies the plugin to be used for interconnect network  traf‐
280              fic  accounting.   The  jobacct_gather  plugin and slurmd daemon
281              call this plugin to collect network traffic data  for  jobs  and
282              nodes.   The  collection  of network traffic data takes place on
283              the node level, hence only in case of exclusive  job  allocation
284              the  collected  values  will  reflect the job's real traffic. In
285              case of node sharing between jobs the reported  network  traffic
286              per  job (through sstat or sacct) will not reflect the real net‐
287              work traffic by the jobs.
288
289              Configurable values at present are:
290
291              acct_gather_interconnect/none
292                                  No infiniband network data are collected.
293
294              acct_gather_interconnect/ofed
295                                  Infiniband network  traffic  data  are  col‐
296                                  lected from the hardware monitoring counters
297                                  of  Infiniband  devices  through  the   OFED
298                                  library.   In  order  to account for per job
299                                  network traffic, add the "ic/ofed"  TRES  to
300                                  AccountingStorageTRES.
301
302
303       AcctGatherFilesystemType
304              Identifies the plugin to be used for filesystem traffic account‐
305              ing.  The jobacct_gather plugin  and  slurmd  daemon  call  this
306              plugin  to  collect  filesystem traffic data for jobs and nodes.
307              The collection of filesystem traffic data  takes  place  on  the
308              node  level,  hence only in case of exclusive job allocation the
309              collected values will reflect the job's real traffic. In case of
310              node  sharing  between  jobs the reported filesystem traffic per
311              job (through sstat or sacct) will not reflect the real  filesys‐
312              tem traffic by the jobs.
313
314
315              Configurable values at present are:
316
317              acct_gather_filesystem/none
318                                  No filesystem data are collected.
319
320              acct_gather_filesystem/lustre
321                                  Lustre filesystem traffic data are collected
322                                  from the counters found in /proc/fs/lustre/.
323                                  In order to account for per job lustre traf‐
324                                  fic, add the "fs/lustre"  TRES  to  Account‐
325                                  ingStorageTRES.
326
327
328       AcctGatherProfileType
329              Identifies  the  plugin  to  be used for detailed job profiling.
330              The jobacct_gather plugin and slurmd daemon call this plugin  to
331              collect  detailed  data  such  as  I/O  counts, memory usage, or
332              energy consumption for jobs and nodes. There are  interfaces  in
              this plugin to collect data at step start and completion, task
334              start and completion, and at the account gather  frequency.  The
335              data collected at the node level is related to jobs only in case
336              of exclusive job allocation.
337
338              Configurable values at present are:
339
340              acct_gather_profile/none
341                                  No profile data is collected.
342
343              acct_gather_profile/hdf5
344                                  This enables the HDF5 plugin. The  directory
345                                  where the profile files are stored and which
346                                  values are collected are configured  in  the
347                                  acct_gather.conf file.
348
349              acct_gather_profile/influxdb
350                                  This   enables   the  influxdb  plugin.  The
351                                  influxdb  instance  host,  port,   database,
352                                  retention  policy  and which values are col‐
353                                  lected     are     configured     in     the
354                                  acct_gather.conf file.
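
              For example, to enable HDF5 job profiling (the profile
              directory and the values to be collected are then configured
              in acct_gather.conf, not in slurm.conf):

                   AcctGatherProfileType=acct_gather_profile/hdf5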
355
356
357       AllowSpecResourcesUsage
              If set to "YES", Slurm allows individual jobs to override a
              node's configured CoreSpecCount value. For a job to take
              advantage of this feature, a command line option of --core-spec
              must be specified.  The default value for this option is "YES"
              for Cray systems and "NO" for other system types.
363
364
365       AuthAltTypes
366              Comma  separated list of alternative authentication plugins that
367              the slurmctld will permit for communication.  Acceptable  values
368              at present include auth/jwt.
369
370              NOTE:  auth/jwt  requires a jwt_hs256.key to be populated in the
371              StateSaveLocation   directory   for    slurmctld    only.    The
372              jwt_hs256.key  should only be visible to the SlurmUser and root.
              It should not be placed on any node other than the controller
              running slurmctld.  auth/jwt can be activated by
375              the presence of the SLURM_JWT environment variable.  When  acti‐
376              vated, it will override the default AuthType.
377
378
379       AuthAltParameters
380              Used  to define alternative authentication plugins options. Mul‐
381              tiple options may be comma separated.
382
383              disable_token_creation
384                             Disable "scontrol  token"  use  by  non-SlurmUser
385                             accounts.
386
387              jwt_key=       Absolute path to JWT key file. Key must be HS256,
388                             and should only be accessible  by  SlurmUser.  If
389                             not set, the default key file is jwt_hs256.key in
390                             StateSaveLocation.
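
              For example, to permit JWT authentication alongside the
              default AuthType, with an explicit key location and token
              creation restricted to SlurmUser (the key path is a
              placeholder):

                   AuthAltTypes=auth/jwt
                   AuthAltParameters=jwt_key=/etc/slurm/jwt_hs256.key,disable_token_creation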
391
392
393       AuthInfo
394              Additional information to be used for authentication of communi‐
395              cations between the Slurm daemons (slurmctld and slurmd) and the
396              Slurm clients.  The interpretation of this option is specific to
397              the configured AuthType.  Multiple options may be specified in a
398              comma delimited list.  If not specified, the default authentica‐
399              tion information will be used.
400
401              cred_expire   Default  job  step credential lifetime, in seconds
                            (e.g. "cred_expire=1200").  It must be long
                            enough to load the user environment, run the
404                            prolog, deal with the slurmd getting paged out  of
405                            memory,  etc.   This  also  controls  how  long  a
406                            requeued job must wait before starting again.  The
407                            default value is 120 seconds.
408
409              socket        Path  name  to  a MUNGE daemon socket to use (e.g.
410                            "socket=/var/run/munge/munge.socket.2").       The
411                            default  value is "/var/run/munge/munge.socket.2".
412                            Used by auth/munge and cred/munge.
413
414              ttl           Credential lifetime, in seconds (e.g.  "ttl=300").
415                            The  default  value  is  dependent  upon the MUNGE
416                            installation, but is typically 300 seconds.
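
              For example, to point auth/munge and cred/munge at an
              alternate MUNGE socket and shorten the credential lifetime
              (the socket path is a placeholder):

                   AuthInfo=socket=/var/run/munge/munge.socket.3,ttl=120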
417
418
419       AuthType
420              The authentication method for communications between Slurm  com‐
421              ponents.   Acceptable values at present include "auth/munge" and
422              "auth/none".  The default value  is  "auth/munge".   "auth/none"
423              includes  the UID in each communication, but it is not verified.
424              This  may  be  fine  for  testing  purposes,  but  do  not   use
425              "auth/none"  if you desire any security.  "auth/munge" indicates
426              that MUNGE is to be used.   (See  "https://dun.github.io/munge/"
427              for  more  information).  All Slurm daemons and commands must be
428              terminated prior to changing the value  of  AuthType  and  later
429              restarted.
430
431
432       BackupAddr
433              Deprecated option, see SlurmctldHost.
434
435
436       BackupController
437              Deprecated option, see SlurmctldHost.
438
439              The backup controller recovers state information from the State‐
440              SaveLocation directory, which must be readable and writable from
441              both  the  primary and backup controllers.  While not essential,
442              it is recommended that you specify  a  backup  controller.   See
443              the RELOCATING CONTROLLERS section if you change this.
444
445
446       BatchStartTimeout
              The maximum time (in seconds) allowed for a batch job to
              launch before it is considered missing and its allocation is
              released. The default value is 10 (seconds). Larger values may
450              be required if more time is required to execute the Prolog, load
451              user  environment  variables, or if the slurmd daemon gets paged
452              from memory.
453              Note: The test for a job being  successfully  launched  is  only
454              performed  when  the  Slurm daemon on the compute node registers
455              state with the slurmctld daemon on the head node, which  happens
456              fairly  rarely.   Therefore a job will not necessarily be termi‐
457              nated if its start time exceeds BatchStartTimeout.  This config‐
              uration parameter is also applied to task launch, to avoid
              aborting srun commands due to long-running Prolog scripts.
460
461
462       BurstBufferType
463              The plugin used to manage burst buffers.  Acceptable  values  at
464              present are:
465
466              burst_buffer/datawarp
467                     Use Cray DataWarp API to provide burst buffer functional‐
468                     ity.
469
470              burst_buffer/none
471
472
473       CliFilterPlugins
474              A comma delimited list of command  line  interface  option  fil‐
475              ter/modification plugins. The specified plugins will be executed
476              in the order listed.  These are  intended  to  be  site-specific
477              plugins  which  can be used to set default job parameters and/or
478              logging events.  No cli_filter plugins are used by default.
479
480
481       ClusterName
482              The name by which this Slurm managed cluster  is  known  in  the
              accounting database.  This is needed to distinguish accounting
484              records when multiple clusters  report  to  the  same  database.
485              Because of limitations in some databases, any upper case letters
486              in the name will be silently mapped to lower case. In  order  to
487              avoid confusion, it is recommended that the name be lower case.
488
489
490       CommunicationParameters
491              Comma separated options identifying communication options.
492
493              CheckGhalQuiesce
494                             Used  specifically  on a Cray using an Aries Ghal
495                             interconnect.  This will check to see if the sys‐
496                             tem  is  quiescing when sending a message, and if
497                             so, we wait until it is done before sending.
498
499              DisableIPv4    Disable IPv4 only operation for all slurm daemons
500                             (except  slurmdbd).  This  should  also be set in
501                             your slurmdbd.conf file.
502
503              EnableIPv6     Enable using IPv6 addresses for all slurm daemons
504                             (except slurmdbd). When using both IPv4 and IPv6,
505                             address family preferences will be based on  your
506                             /etc/gai.conf  file.  This  should also be set in
507                             your slurmdbd.conf file.
508
              NoAddrCache    By default, Slurm will cache a node's network
                             address after it has been successfully
                             resolved. This option disables the
512                             cache  and  Slurm will look up the node's network
513                             address each time a connection is made.  This  is
514                             useful, for example, in a cloud environment where
515                             the node addresses come and go out of DNS.
516
              NoCtldInAddrAny
                             Causes the slurmctld to bind directly to the
                             address that its node name resolves to, instead
                             of binding to any address on the node, which is
                             the default.

              NoInAddrAny    Causes all daemons/clients except the slurmctld
                             to bind directly to the address that the node
                             name resolves to, instead of binding to any
                             address on the node, which is the default.
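
              For example, a cloud-style cluster whose node addresses change
              in DNS and which should use IPv6 might set:

                   CommunicationParameters=EnableIPv6,NoAddrCache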
528
529
530
531       CompleteWait
532              The  time to wait, in seconds, when any job is in the COMPLETING
533              state before any additional  jobs  are  scheduled.  This  is  to
534              attempt  to  keep  jobs on nodes that were recently in use, with
535              the goal of preventing fragmentation.  If set to  zero,  pending
536              jobs  will  be  started as soon as possible.  Since a COMPLETING
537              job's resources are released for use by other jobs  as  soon  as
538              the Epilog completes on each individual node, this can result in
539              very fragmented resource allocations.  To provide jobs with  the
540              minimum  response time, a value of zero is recommended (no wait‐
541              ing).  To minimize fragmentation of resources, a value equal  to
542              KillWait  plus  two is recommended.  In that case, setting Kill‐
543              Wait to a small value may be beneficial.  The default  value  of
544              CompleteWait is zero seconds.  The value may not exceed 65533.
545
546              NOTE:  Setting  reduce_completing_frag  affects  the behavior of
547              CompleteWait.
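
              For example, following the recommendation above of KillWait
              plus two on a system where KillWait is 30 seconds:

                   KillWait=30
                   CompleteWait=32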
548
549
550       ControlAddr
551              Deprecated option, see SlurmctldHost.
552
553
554       ControlMachine
555              Deprecated option, see SlurmctldHost.
556
557
558       CoreSpecPlugin
559              Identifies the plugins to be used for enforcement of  core  spe‐
560              cialization.   The  slurmd daemon must be restarted for a change
561              in CoreSpecPlugin to take effect.  Acceptable values at  present
562              include:
563
564              core_spec/cray_aries
565                                  used only for Cray systems
566
567              core_spec/none      used for all other system types
568
569
570       CpuFreqDef
571              Default  CPU  frequency  value or frequency governor to use when
572              running a job step if it has not been explicitly  set  with  the
573              --cpu-freq  option.   Acceptable  values  at  present  include a
574              numeric value (frequency in kilohertz) or one of  the  following
575              governors:
576
577              Conservative  attempts to use the Conservative CPU governor
578
579              OnDemand      attempts to use the OnDemand CPU governor
580
581              Performance   attempts to use the Performance CPU governor
582
583              PowerSave     attempts to use the PowerSave CPU governor
              There is no default value. If unset and the --cpu-freq option
              has not been set, no attempt is made to set the governor.
586
587
588       CpuFreqGovernors
589              List of CPU frequency governors allowed to be set with the  sal‐
590              loc,  sbatch,  or srun option  --cpu-freq.  Acceptable values at
591              present include:
592
593              Conservative  attempts to use the Conservative CPU governor
594
595              OnDemand      attempts to  use  the  OnDemand  CPU  governor  (a
596                            default value)
597
598              Performance   attempts  to  use  the Performance CPU governor (a
599                            default value)
600
601              PowerSave     attempts to use the PowerSave CPU governor
602
603              UserSpace     attempts to use  the  UserSpace  CPU  governor  (a
604                            default value)
              The default is OnDemand, Performance and UserSpace.
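
              For example, to allow only the OnDemand and Performance
              governors and have job steps default to the Performance
              governor when --cpu-freq is not given:

                   CpuFreqGovernors=OnDemand,Performance
                   CpuFreqDef=Performance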
606
607       CredType
608              The  cryptographic  signature tool to be used in the creation of
609              job step credentials.  The slurmctld daemon  must  be  restarted
610              for  a  change in CredType to take effect.  Acceptable values at
611              present include "cred/munge" and "cred/none".  The default value
              is "cred/munge", which is the recommended option.
613
614
615       DebugFlags
616              Defines  specific  subsystems which should provide more detailed
617              event logging.  Multiple subsystems can be specified with  comma
618              separators.   Most  DebugFlags will result in verbose-level log‐
619              ging for the identified subsystems,  and  could  impact  perfor‐
620              mance.  Valid subsystems available include:
621
622              Accrue           Accrue counters accounting details
623
624              Agent            RPC agents (outgoing RPCs from Slurm daemons)
625
626              Backfill         Backfill scheduler details
627
628              BackfillMap      Backfill scheduler to log a very verbose map of
629                               reserved resources through time.  Combine  with
630                               Backfill for a verbose and complete view of the
631                               backfill scheduler's work.
632
633              BurstBuffer      Burst Buffer plugin
634
635              CPU_Bind         CPU binding details for jobs and steps
636
637              CpuFrequency     Cpu frequency details for jobs and steps  using
638                               the --cpu-freq option.
639
640              Data             Generic data structure details.
641
642              Dependency       Job dependency debug info
643
644              Elasticsearch    Elasticsearch debug info
645
646              Energy           AcctGatherEnergy debug info
647
648              ExtSensors       External Sensors debug info
649
650              Federation       Federation scheduling debug info
651
652              FrontEnd         Front end node details
653
654              Gres             Generic resource details
655
656              Hetjob           Heterogeneous job details
657
658              Gang             Gang scheduling details
659
660              JobContainer     Job container plugin details
661
662              License          License management details
663
664              Network          Network details
665
666              NetworkRaw       Dump  raw  hex values of key Network communica‐
667                               tions. Warning: very verbose.
668
669              NodeFeatures     Node Features plugin debug info
670
671              NO_CONF_HASH     Do not log when  the  slurm.conf  files  differ
672                               between Slurm daemons
673
674              Power            Power management plugin
675
676              PowerSave        Power save (suspend/resume programs) details
677
678              Priority         Job prioritization
679
680              Profile          AcctGatherProfile plugins details
681
682              Protocol         Communication protocol details
683
684              Reservation      Advanced reservations
685
686              Route            Message forwarding debug info
687
688              SelectType       Resource selection plugin
689
690              Steps            Slurmctld resource allocation for job steps
691
692              Switch           Switch plugin
693
694              TimeCray         Timing of Cray APIs
695
696              TRESNode         Limits dealing with TRES=Node
697
698              TraceJobs        Trace jobs in slurmctld. It will print detailed
                               job information including state, job ids and
                               the allocated node count.
701
702              Triggers         Slurmctld triggers
703
704              WorkQueue        Work Queue details
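
              For example, to get a detailed view of backfill scheduling
              decisions, including the map of reserved resources over time:

                   DebugFlags=Backfill,BackfillMap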
705
706
707       DefCpuPerGPU
708              Default count of CPUs allocated per allocated GPU.
709
710
711       DefMemPerCPU
712              Default   real  memory  size  available  per  allocated  CPU  in
713              megabytes.  Used to avoid over-subscribing  memory  and  causing
714              paging.  DefMemPerCPU would generally be used if individual pro‐
715              cessors are allocated  to  jobs  (SelectType=select/cons_res  or
716              SelectType=select/cons_tres).   The  default  value is 0 (unlim‐
717              ited).  Also see DefMemPerGPU, DefMemPerNode  and  MaxMemPerCPU.
718              DefMemPerCPU, DefMemPerGPU and DefMemPerNode are mutually exclu‐
719              sive.
720
721
722       DefMemPerGPU
723              Default  real  memory  size  available  per  allocated  GPU   in
724              megabytes.   The  default  value  is  0  (unlimited).   Also see
725              DefMemPerCPU and DefMemPerNode.  DefMemPerCPU, DefMemPerGPU  and
726              DefMemPerNode are mutually exclusive.
727
728
729       DefMemPerNode
730              Default  real  memory  size  available  per  allocated  node  in
731              megabytes.  Used to avoid over-subscribing  memory  and  causing
732              paging.   DefMemPerNode  would  generally be used if whole nodes
733              are allocated to jobs (SelectType=select/linear)  and  resources
734              are  over-subscribed (OverSubscribe=yes or OverSubscribe=force).
735              The default value is  0  (unlimited).   Also  see  DefMemPerCPU,
736              DefMemPerGPU  and  MaxMemPerCPU.  DefMemPerCPU, DefMemPerGPU and
737              DefMemPerNode are mutually exclusive.
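
              For example, on a cluster that allocates individual cores with
              SelectType=select/cons_tres, a default of 4 GB of memory per
              allocated CPU could be expressed as (the value is only
              illustrative):

                   DefMemPerCPU=4096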
738
739
740       DefaultStorageHost
741              The default name of the machine hosting the  accounting  storage
742              and job completion databases.  Only used for database type stor‐
743              age plugins and when the AccountingStorageHost  and  JobCompHost
744              have not been defined.
745
746
747       DefaultStorageLoc
748              The  fully  qualified file name where job completion records are
749              written when the DefaultStorageType is "filetxt".  Also see Job‐
750              CompLoc.
751
752
753       DefaultStoragePass
754              The  password  used  to gain access to the database to store the
755              accounting and job completion data.  Only used for database type
756              storage  plugins,  ignored  otherwise.  Also see AccountingStor‐
757              agePass and JobCompPass.
758
759
760       DefaultStoragePort
761              The listening port of the accounting storage and/or job  comple‐
762              tion database server.  Only used for database type storage plug‐
763              ins, ignored otherwise.  Also see AccountingStoragePort and Job‐
764              CompPort.
765
766
767       DefaultStorageType
768              The  accounting  and  job  completion  storage  mechanism  type.
769              Acceptable values at  present  include  "filetxt",  "mysql"  and
770              "none".   The  value  "filetxt"  indicates  that records will be
771              written to a file.  The value "mysql" indicates that  accounting
772              records  will  be  written  to a MySQL or MariaDB database.  The
773              default value is "none", which means that records are not  main‐
774              tained.  Also see AccountingStorageType and JobCompType.
775
776
777       DefaultStorageUser
778              The user account for accessing the accounting storage and/or job
779              completion database.  Only used for database type storage  plug‐
780              ins, ignored otherwise.  Also see AccountingStorageUser and Job‐
781              CompUser.
782
783
784       DependencyParameters
785              Multiple options may be comma-separated.
786
787
788              disable_remote_singleton
789                     By default, when a federated job has a  singleton  depen‐
790                     deny,  each cluster in the federation must clear the sin‐
791                     gleton dependency before the job's  singleton  dependency
792                     is  considered satisfied. Enabling this option means that
793                     only the origin cluster must clear the  singleton  depen‐
794                     dency.  This  option  must be set in every cluster in the
795                     federation.
796
              kill_invalid_depend
                     If a job has an invalid dependency and it can never
                     run, terminate it and set its state to JOB_CANCELLED.
                     By default the job stays pending with reason
                     DependencyNeverSatisfied.

              max_depend_depth=#
                     Maximum number of jobs to test for a circular job
                     dependency. Stop testing after this number of job
                     dependencies have been tested. The default value is 10
                     jobs.
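
              For example, to cancel jobs whose dependencies can never be
              satisfied and to test circular dependencies somewhat more
              deeply than the default of 10 jobs:

                   DependencyParameters=kill_invalid_depend,max_depend_depth=20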
805
806
807       DisableRootJobs
808              If set to "YES" then user root will be  prevented  from  running
809              any  jobs.  The default value is "NO", meaning user root will be
810              able to execute jobs.  DisableRootJobs may also be set by parti‐
811              tion.
812
813
814       EioTimeout
815              The  number  of  seconds  srun waits for slurmstepd to close the
816              TCP/IP connection used to relay data between the  user  applica‐
817              tion  and srun when the user application terminates. The default
818              value is 60 seconds.  May not exceed 65533.
819
820
821       EnforcePartLimits
822              If set to "ALL" then jobs which exceed a partition's size and/or
              time limits will be rejected at submission time. If a job is sub‐
824              mitted to multiple partitions, the job must satisfy  the  limits
825              on  all  the  requested  partitions. If set to "NO" then the job
826              will be accepted and remain queued until  the  partition  limits
              are altered (Time and Node Limits).  If set to "ANY", a job must
828              satisfy any of the requested partitions  to  be  submitted.  The
829              default  value  is "NO".  NOTE: If set, then a job's QOS can not
830              be used to exceed partition limits.  NOTE: The partition  limits
831              being considered are its configured MaxMemPerCPU, MaxMemPerNode,
832              MinNodes, MaxNodes, MaxTime, AllocNodes,  AllowAccounts,  Allow‐
833              Groups, AllowQOS, and QOS usage threshold.
834
835
836       Epilog Fully  qualified pathname of a script to execute as user root on
837              every   node    when    a    user's    job    completes    (e.g.
838              "/usr/local/slurm/epilog").  A  glob  pattern (See glob (7)) may
839              also  be  used  to  run  more  than  one  epilog  script   (e.g.
840              "/etc/slurm/epilog.d/*").  The  Epilog  script or scripts may be
841              used to purge files, disable user login, etc.  By default  there
842              is  no  epilog.  See Prolog and Epilog Scripts for more informa‐
843              tion.
844
845
846       EpilogMsgTime
847              The number of microseconds that the slurmctld daemon requires to
848              process  an  epilog  completion message from the slurmd daemons.
849              This parameter can be used to prevent a burst of epilog  comple‐
850              tion messages from being sent at the same time which should help
851              prevent lost messages and improve  throughput  for  large  jobs.
852              The  default  value  is 2000 microseconds.  For a 1000 node job,
853              this spreads the epilog completion messages out  over  two  sec‐
854              onds.
855
856
857       EpilogSlurmctld
858              Fully  qualified pathname of a program for the slurmctld to exe‐
859              cute   upon   termination   of   a    job    allocation    (e.g.
860              "/usr/local/slurm/epilog_controller").   The program executes as
861              SlurmUser, which gives it permission to drain nodes and  requeue
862              the job if a failure occurs (See scontrol(1)).  Exactly what the
863              program does and how it accomplishes this is completely  at  the
864              discretion  of  the system administrator.  Information about the
              job, its allocated nodes, etc. is passed to the
866              program  using  environment  variables.   See  Prolog and Epilog
867              Scripts for more information.
868
869
870       ExtSensorsFreq
871              The external  sensors  plugin  sampling  interval.   If  ExtSen‐
872              sorsType=ext_sensors/none,  this  parameter is ignored.  For all
873              other values of ExtSensorsType, this parameter is the number  of
874              seconds between external sensors samples for hardware components
              (nodes, switches, etc.).  The default value is zero, which
              disables external sensors sampling. Note: This parameter does
877              not affect external sensors data collection for jobs/steps.
878
879
880       ExtSensorsType
881              Identifies the plugin to be used for external sensors data  col‐
882              lection.   Slurmctld  calls this plugin to collect external sen‐
883              sors data for jobs/steps and hardware  components.  In  case  of
884              node  sharing  between  jobs  the  reported  values per job/step
885              (through sstat or sacct) may not be  accurate.   See  also  "man
886              ext_sensors.conf".
887
888              Configurable values at present are:
889
890              ext_sensors/none    No external sensors data is collected.
891
892              ext_sensors/rrd     External  sensors data is collected from the
893                                  RRD database.
894
895
896       FairShareDampeningFactor
897              Dampen the effect of exceeding a user or group's fair  share  of
              allocated resources. Higher values will provide greater ability
899              to differentiate between exceeding the fair share at high levels
900              (e.g. a value of 1 results in almost no difference between over‐
901              consumption by a factor of 10 and 100, while a value of  5  will
902              result  in  a  significant difference in priority).  The default
903              value is 1.
904
905
906       FederationParameters
907              Used to define federation options. Multiple options may be comma
908              separated.
909
910
911              fed_display
912                     If  set,  then  the  client status commands (e.g. squeue,
913                     sinfo, sprio, etc.) will display information in a  feder‐
914                     ated view by default. This option is functionally equiva‐
915                     lent to using the --federation options on  each  command.
916                     Use the client's --local option to override the federated
917                     view and get a local view of the given cluster.
918
919
920       FirstJobId
              The job id to be used for the first job submitted to Slurm
              without a specific requested value.  Job id values generated
              will be incremented by 1 for each subsequent job. This may be
              used to provide a meta-scheduler with a job id space which is
              disjoint from the interactive jobs.  The default value is 1.
              Also see MaxJobId.
926
927
928       GetEnvTimeout
929              Controls how long the job should wait (in seconds) to  load  the
930              user's  environment  before  attempting  to load it from a cache
931              file.  Applies when the salloc or sbatch  --get-user-env  option
932              is  used.   If  set to 0 then always load the user's environment
933              from the cache file.  The default value is 2 seconds.
934
935
936       GresTypes
937              A comma delimited list of generic resources to be managed  (e.g.
938              GresTypes=gpu,mps).  These resources may have an associated GRES
939              plugin of the same name providing additional functionality.   No
940              generic resources are managed by default.  Ensure this parameter
941              is consistent across all nodes in the cluster for proper  opera‐
942              tion.   The  slurmctld  daemon  must be restarted for changes to
943              this parameter to become effective.
944
945
946       GroupUpdateForce
947              If set to a non-zero value, then information about  which  users
948              are members of groups allowed to use a partition will be updated
949              periodically, even when  there  have  been  no  changes  to  the
950              /etc/group  file.  If set to zero, group member information will
951              be updated only after  the  /etc/group  file  is  updated.   The
952              default value is 1.  Also see the GroupUpdateTime parameter.
953
954
955       GroupUpdateTime
956              Controls  how  frequently information about which users are mem‐
957              bers of groups allowed to use a partition will be  updated,  and
958              how  long  user group membership lists will be cached.  The time
959              interval is given in seconds with a default value  of  600  sec‐
960              onds.   A  value of zero will prevent periodic updating of group
961              membership information.  Also see the  GroupUpdateForce  parame‐
962              ter.
963
964
       GpuFreqDef=[<type>=]<value>[,<type>=<value>]
966              Default  GPU  frequency to use when running a job step if it has
967              not been explicitly  set  using  the  --gpu-freq  option.   This
968              option  can  be  used to independently configure the GPU and its
969              memory frequencies. Defaults to "high,memory=high".   After  the
970              job  is  completed, the frequencies of all affected GPUs will be
971              reset to the highest possible values.   In  some  cases,  system
972              power  caps  may  override the requested values.  The field type
973              can be "memory".  If type is not specified, the GPU frequency is
974              implied.  The value field can either be "low", "medium", "high",
975              "highm1" or a numeric value in megahertz (MHz).  If  the  speci‐
976              fied numeric value is not possible, a value as close as possible
977              will be used.  See below for definition of the values.  Examples
              of use include "GpuFreqDef=medium,memory=high" and
              "GpuFreqDef=450".
980
981              Supported value definitions:
982
983              low       the lowest available frequency.
984
985              medium    attempts to set a  frequency  in  the  middle  of  the
986                        available range.
987
988              high      the highest available frequency.
989
990              highm1    (high  minus  one) will select the next highest avail‐
991                        able frequency.
992
993
994       HealthCheckInterval
995              The interval in seconds between  executions  of  HealthCheckPro‐
996              gram.  The default value is zero, which disables execution.
997
998
999       HealthCheckNodeState
1000              Identify what node states should execute the HealthCheckProgram.
1001              Multiple state values may be specified with a  comma  separator.
1002              The default value is ANY to execute on nodes in any state.
1003
1004              ALLOC       Run  on  nodes  in  the  ALLOC state (all CPUs allo‐
1005                          cated).
1006
1007              ANY         Run on nodes in any state.
1008
1009              CYCLE       Rather than running the health check program on  all
1010                          nodes at the same time, cycle through running on all
1011                          compute nodes through the course of the HealthCheck‐
1012                          Interval.  May  be  combined  with  the various node
1013                          state options.
1014
1015              IDLE        Run on nodes in the IDLE state.
1016
1017              MIXED       Run on nodes in the MIXED state (some CPUs idle  and
1018                          other CPUs allocated).
1019
1020
1021       HealthCheckProgram
1022              Fully  qualified  pathname  of  a script to execute as user root
1023              periodically  on  all  compute  nodes  that  are  not   in   the
1024              NOT_RESPONDING  state.  This  program  may be used to verify the
1025              node is fully operational and DRAIN the node or send email if  a
1026              problem  is detected.  Any action to be taken must be explicitly
1027              performed by the program (e.g. execute  "scontrol  update  Node‐
1028              Name=foo  State=drain  Reason=tmp_file_system_full"  to  drain a
1029              node).   The  execution  interval  is   controlled   using   the
1030              HealthCheckInterval parameter.  Note that the HealthCheckProgram
1031              will be executed at the same time on all nodes to  minimize  its
              impact upon parallel programs.  This program will be killed
1033              if it does not terminate normally within 60 seconds.  This  pro‐
1034              gram  will  also  be  executed  when  the slurmd daemon is first
1035              started and before it registers with the slurmctld  daemon.   By
1036              default, no program will be executed.
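
              For example, to run a site-provided health check script every
              five minutes, cycling through the nodes rather than running on
              all of them at once (the script path is a placeholder):

                   HealthCheckProgram=/usr/local/sbin/node_health.sh
                   HealthCheckInterval=300
                   HealthCheckNodeState=ANY,CYCLE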
1037
1038
1039       InactiveLimit
1040              The interval, in seconds, after which a non-responsive job allo‐
1041              cation command (e.g. srun or salloc)  will  result  in  the  job
1042              being  terminated.  If the node on which the command is executed
1043              fails or the command abnormally terminates, this will  terminate
1044              its  job allocation.  This option has no effect upon batch jobs.
1045              When setting a value, take into consideration  that  a  debugger
1046              using  srun  to launch an application may leave the srun command
1047              in a stopped state for extended periods of time.  This limit  is
1048              ignored  for  jobs  running in partitions with the RootOnly flag
1049              set (the scheduler running as root will be responsible  for  the
1050              job).   The default value is unlimited (zero) and may not exceed
1051              65533 seconds.
1052
1053
1054       InteractiveStepOptions
1055              When LaunchParameters=use_interactive_step is enabled, launching
1056              salloc  will  automatically  start an srun process with Interac‐
1057              tiveStepOptions to launch a terminal on a node in the job  allo‐
1058              cation.   The  default  value  is  "--interactive --preserve-env
1059              --pty $SHELL".
1060
1061
1062       JobAcctGatherType
1063              The job accounting mechanism type.  Acceptable values at present
              include "jobacct_gather/linux" (for Linux systems, and the
              recommended choice), "jobacct_gather/cgroup" and
              "jobacct_gather/none" (no accounting data collected).  The
1067              default value is "jobacct_gather/none".  "jobacct_gather/cgroup"
1068              is  a plugin for the Linux operating system that uses cgroups to
1069              collect accounting statistics. The plugin collects the following
1070              statistics:    From    the   cgroup   memory   subsystem:   mem‐
1071              ory.usage_in_bytes (reported  as  'pages')  and  rss  from  mem‐
1072              ory.stat (reported as 'rss'). From the cgroup cpuacct subsystem:
1073              user cpu time and system cpu  time.  No  value  is  provided  by
1074              cgroups  for virtual memory size ('vsize').  In order to use the
1075              sstat tool  "jobacct_gather/linux",  or  "jobacct_gather/cgroup"
1076              must be configured.
1077              NOTE: Changing this configuration parameter changes the contents
1078              of the messages between Slurm daemons.  Any  previously  running
1079              job  steps  are managed by a slurmstepd daemon that will persist
1080              through the lifetime of that job step and not change its  commu‐
1081              nication protocol. Only change this configuration parameter when
1082              there are no running job steps.
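
                  For example, to collect accounting data with the recommended
                  Linux plugin (a minimal illustration; as noted above, only
                  change this while no job steps are running):

                  JobAcctGatherType=jobacct_gather/linux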
1083
1084
1085       JobAcctGatherFrequency
1086              The job accounting and profiling sampling intervals.   The  sup‐
1087              ported format is as follows:
1088
1089              JobAcctGatherFrequency=<datatype>=<interval>
1090                          where  <datatype>=<interval> specifies the task sam‐
1091                          pling interval for the jobacct_gather  plugin  or  a
1092                          sampling  interval  for  a  profiling  type  by  the
1093                          acct_gather_profile  plugin.  Multiple,  comma-sepa‐
1094                          rated  <datatype>=<interval> intervals may be speci‐
1095                          fied. Supported datatypes are as follows:
1096
1097                          task=<interval>
1098                                 where <interval> is the task sampling  inter‐
1099                                 val in seconds for the jobacct_gather plugins
1100                                 and    for    task    profiling    by     the
1101                                 acct_gather_profile plugin.
1102
1103                          energy=<interval>
1104                                 where  <interval> is the sampling interval in
1105                                 seconds  for  energy  profiling   using   the
1106                                 acct_gather_energy plugin
1107
1108                          network=<interval>
1109                                 where  <interval> is the sampling interval in
1110                                 seconds for infiniband  profiling  using  the
1111                                 acct_gather_interconnect plugin.
1112
1113                          filesystem=<interval>
1114                                 where  <interval> is the sampling interval in
1115                                 seconds for filesystem  profiling  using  the
1116                                 acct_gather_filesystem plugin.
1117
1118              The default value for task sampling interval
1119              is  30  seconds. The default value for all other intervals is 0.
1120              An interval of 0 disables sampling of the  specified  type.   If
1121              the  task sampling interval is 0, accounting information is col‐
1122              lected only at job termination (reducing Slurm interference with
1123              the job).
1124              Smaller (non-zero) values have a greater impact upon job perfor‐
1125              mance, but a value of 30 seconds is not likely to be  noticeable
1126              for applications having less than 10,000 tasks.
1127              Users  can  independently  override  each  interval on a per job
1128              basis using the --acctg-freq option when submitting the job.
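
                  For example, the following illustrative setting samples task
                  statistics every 30 seconds and energy data every 60
                  seconds, leaving the other profiling types disabled:

                  JobAcctGatherFrequency=task=30,energy=60

                  An individual job could then request a different task
                  sampling interval with, for example,
                  "sbatch --acctg-freq=task=10 ...".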
1129
1130
1131       JobAcctGatherParams
1132              Arbitrary parameters for the job account gather plugin.  Accept‐
1133              able values at present include:
1134
1135              NoShared            Exclude shared memory from accounting.
1136
1137              UsePss              Use  PSS  value  instead of RSS to calculate
1138                                  real usage of memory.  The PSS value will be
1139                                  saved as RSS.
1140
1141              OverMemoryKill      Kill processes detected using more memory
1142                                  than requested by their step each time
1143                                  accounting information is gathered by the
1144                                  JobAcctGather plugin.  This
1145                                  parameter   should   be  used  with  caution
1146                                  because a job exceeding its  memory  alloca‐
1147                                  tion   may  affect  other  processes  and/or
1148                                  machine health.
1149
1150                                  NOTE: If available,  it  is  recommended  to
1151                                  limit  memory  by  enabling task/cgroup as a
1152                                  TaskPlugin  and  making  use  of  Constrain‐
1153                                  RAMSpace=yes  in  the cgroup.conf instead of
1154                                  using this JobAcctGather mechanism for  mem‐
1155                                  ory enforcement. With OverMemoryKill, memory
1156                                  limit is applied against each process  indi‐
1157                                  vidually and is not applied to the step as a
1158                                  whole as it is  with  ConstrainRAMSpace=yes.
1159                                  Because JobAcctGather is polling based,
1160                                  there is a delay before  a  job  is  killed,
1161                                  which  could  lead  to  system Out of Memory
1162                                  events.
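
                  For example, an illustrative comma-separated combination
                  that accounts for memory using PSS and excludes shared
                  memory:

                  JobAcctGatherParams=UsePss,NoShared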
1163
1164
1165       JobCompHost
1166              The name of the machine hosting  the  job  completion  database.
1167              Only  used for database type storage plugins, ignored otherwise.
1168              Also see DefaultStorageHost.
1169
1170
1171       JobCompLoc
1172              The fully qualified file name where job completion  records  are
1173              written  when  the JobCompType is "jobcomp/filetxt" or the data‐
1174              base where job completion records are stored when  the  JobComp‐
1175              Type  is  a  database,  or  a  complete URL endpoint with format
1176              <host>:<port>/<target>/_doc when JobCompType  is  "jobcomp/elas‐
1177              ticsearch", e.g. "localhost:9200/slurm/_doc".  NOTE: More
1178              information   is   available   at    the    Slurm    web    site
1179              <https://slurm.schedmd.com/elasticsearch.html>.      Also    see
1180              DefaultStorageLoc.
1181
1182
1183       JobCompParams
1184              Pass arbitrary text string to job completion plugin.   Also  see
1185              JobCompType.
1186
1187
1188       JobCompPass
1189              The  password  used  to gain access to the database to store the
1190              job completion data.  Only used for database type storage  plug‐
1191              ins, ignored otherwise.  Also see DefaultStoragePass.
1192
1193
1194       JobCompPort
1195              The  listening port of the job completion database server.  Only
1196              used for database type storage plugins, ignored otherwise.  Also
1197              see DefaultStoragePort.
1198
1199
1200       JobCompType
1201              The job completion logging mechanism type.  Acceptable values at
1202              present include "jobcomp/none",  "jobcomp/elasticsearch",  "job‐
1203              comp/filetxt",    "jobcomp/lua",   "jobcomp/mysql"   and   "job‐
1204              comp/script".  The default value is "jobcomp/none", which  means
1205              that  upon  job  completion the record of the job is purged from
1206              the system.  If using the accounting infrastructure this  plugin
1207              may  not be of interest since the information here is redundant.
1208              The value "jobcomp/elasticsearch" indicates that a record of the
1209              job  should  be  written to an Elasticsearch server specified by
1210              the JobCompLoc parameter.  NOTE: More information  is  available
1211              at the Slurm web site <https://slurm.schedmd.com/elasticsearch.html>.
1212              The value "jobcomp/filetxt" indicates that a
1213              record  of the job should be written to a text file specified by
1214              the JobCompLoc parameter.   The  value  "jobcomp/lua"  indicates
1215              that a record of the job should be processed by the "jobcomp.lua"
1216              script located in the default script  directory  (typically  the
1217              subdirectory  "etc"  of  the installation directory).  The value
1218              "jobcomp/mysql" indicates that a record of  the  job  should  be
1219              written  to a MySQL or MariaDB database specified by the JobCom‐
1220              pLoc parameter.  The value  "jobcomp/script"  indicates  that  a
1221              script  specified  by the JobCompLoc parameter is to be executed
1222              with environment variables indicating the job information.
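
                  For example, an illustrative configuration writing
                  completion records to a local text file (the path is an
                  assumption, not a default):

                  JobCompType=jobcomp/filetxt
                  JobCompLoc=/var/log/slurm/job_completions.log   # example path only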
1223
1224       JobCompUser
1225              The user account for  accessing  the  job  completion  database.
1226              Only  used for database type storage plugins, ignored otherwise.
1227              Also see DefaultStorageUser.
1228
1229
1230       JobContainerType
1231              Identifies the plugin to be used for job tracking.   The  slurmd
1232              daemon  must  be  restarted  for a change in JobContainerType to
1233              take effect.  NOTE: The JobContainerType applies to a job  allo‐
1234              cation,  while  ProctrackType  applies to job steps.  Acceptable
1235              values at present include:
1236
1237              job_container/cncu  used only for Cray systems (CNCU  =  Compute
1238                                  Node Clean Up)
1239
1240              job_container/none  used for all other system types
1241
1242
1243       JobFileAppend
1244              This  option controls what to do if a job's output or error file
1245              exists when the job is started.  If JobFileAppend is set to a
1246              value  of  1, then append to the existing file.  By default, any
1247              existing file is truncated.
1248
1249
1250       JobRequeue
1251              This option controls the default ability for batch  jobs  to  be
1252              requeued.   Jobs may be requeued explicitly by a system adminis‐
1253              trator, after node failure, or upon preemption by a higher  pri‐
1254              ority job.  If JobRequeue is set to a value of 1, then batch jobs
1255              may be requeued unless explicitly disabled by the user.  If
1256              JobRequeue is set to a value of 0, then batch jobs will not be
1257              requeued unless explicitly enabled by the user.  Use the  sbatch
1258              --no-requeue  or --requeue option to change the default behavior
1259              for individual jobs.  The default value is 1.
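
                  For example, a site that does not want batch jobs requeued
                  unless a user asks for it might set:

                  JobRequeue=0

                  after which an individual job could still opt in with
                  "sbatch --requeue ...".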
1260
1261
1262       JobSubmitPlugins
1263              A comma delimited list of job submission  plugins  to  be  used.
1264              The  specified  plugins  will  be  executed in the order listed.
1265              These are intended to be site-specific plugins which can be used
1266              to  set  default  job  parameters and/or logging events.  Sample
1267              plugins available in the distribution include  "all_partitions",
1268              "defaults",  "logging", "lua", and "partition".  For examples of
1269              use, see the Slurm code in  "src/plugins/job_submit"  and  "con‐
1270              tribs/lua/job_submit*.lua"  then modify the code to satisfy your
1271              needs.  Slurm can be configured to use multiple job_submit plug‐
1272              ins if desired, however the lua plugin will only execute one lua
1273              script named "job_submit.lua"  located  in  the  default  script
1274              directory  (typically the subdirectory "etc" of the installation
1275              directory).  No job submission plugins are used by default.
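
                  For example, to filter job submissions through a site Lua
                  script (a sketch; the job_submit.lua script itself must be
                  provided in the default script directory):

                  JobSubmitPlugins=lua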
1276
1277
1278       KeepAliveTime
1279              Specifies how long socket communications used between the srun
1280              command  and its slurmstepd process are kept alive after discon‐
1281              nect.  Longer values can be used to improve reliability of  com‐
1282              munications in the event of network failures.  By default, the
1283              system default value is used.  The value may not exceed
1284              65533.
1285
1286
1287       KillOnBadExit
1288              If set to 1, a step will be terminated immediately if any task
1289              crashes or aborts, as indicated by a non-zero exit code.
1290              With the default value of 0, if one of the processes crashes
1291              or aborts, the other processes will continue to run while the
1292              crashed or aborted process waits.  The user can override this
1293              configuration parameter by using srun's -K, --kill-on-bad-exit.
1294
1295
1296       KillWait
1297              The interval, in seconds, given to a job's processes between the
1298              SIGTERM  and  SIGKILL  signals upon reaching its time limit.  If
1299              the job fails to terminate gracefully in the interval specified,
1300              it  will  be  forcibly terminated.  The default value is 30 sec‐
1301              onds.  The value may not exceed 65533.
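
                  For example, to give a job's processes two minutes to clean
                  up after SIGTERM before they are forcibly killed (an
                  illustrative value, not the default):

                  KillWait=120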
1302
1303
1304       NodeFeaturesPlugins
1305              Identifies the plugins to be used for support of  node  features
1306              which can change through time, for example a node which might
1307              be booted with various BIOS settings. This is supported through
1308              the  use  of  a  node's  active_features  and available_features
1309              information.  Acceptable values at present include:
1310
1311              node_features/knl_cray
1312                                  used only for Intel Knights Landing  proces‐
1313                                  sors (KNL) on Cray systems
1314
1315              node_features/knl_generic
1316                                  used  for  Intel  Knights Landing processors
1317                                  (KNL) on a generic Linux system
1318
1319
1320       LaunchParameters
1321              Identifies options to the job launch plugin.  Acceptable  values
1322              include:
1323
1324              batch_step_set_cpu_freq Set the cpu frequency for the batch step
1325                                      from the given --cpu-freq option, or the
1326                                      slurm.conf CpuFreqDef value.  By default only
1327                                      steps started with srun will utilize the
1328                                      cpu freq setting options.
1329
1330                                      NOTE:  If  you  are using srun to launch
1331                                      your  steps  inside   a   batch   script
1332                                      (advised) this option will create a sit‐
1333                                      uation  where  you  may  have   multiple
1334                                      agents setting the cpu_freq as the batch
1335                                      step usually runs on the same resources
1336                                      as one or more steps that the sruns in
1337                                      the script will create.
1338
1339              cray_net_exclusive      Allow jobs  on  a  Cray  Native  cluster
1340                                      exclusive  access  to network resources.
1341                                      This should only be set on clusters pro‐
1342                                      viding  exclusive access to each node to
1343                                      a single job at once, and not using par‐
1344                                      allel  steps  within  the job, otherwise
1345                                      resources on the node  can  be  oversub‐
1346                                      scribed.
1347
1348              enable_nss_slurm        Permits  passwd and group resolution for
1349                                      a  job  to  be  serviced  by  slurmstepd
1350                                      rather  than  requiring  a lookup from a
1351                                      network     based      service.      See
1352                                      https://slurm.schedmd.com/nss_slurm.html
1353                                      for more information.
1354
1355              lustre_no_flush         If set on a Cray Native cluster, then do
1356                                      not  flush  the Lustre cache on job step
1357                                      completion. This setting will only  take
1358                                      effect  after  reconfiguring,  and  will
1359                                      only  take  effect  for  newly  launched
1360                                      jobs.
1361
1362              mem_sort                Sort NUMA memory at step start. User can
1363                                      override     this      default      with
1364                                      SLURM_MEM_BIND  environment  variable or
1365                                      --mem-bind=nosort command line option.
1366
1367              mpir_use_nodeaddr       When  launching  tasks   Slurm   creates
1368                                      entries  in MPIR_proctable that are used
1369                                      by parallel  debuggers,  profilers,  and
1370                                      related tools to attach to running
1371                                      processes.  By default the MPIR_proctable
1372                                      entries contain MPIR_procdesc structures
1373                                      where the host_name is set to NodeName.
1374                                      If this option is specified,
1375                                      NodeAddr will be used  in  this  context
1376                                      instead.
1377
1378              disable_send_gids       By  default,  the slurmctld will look up
1379                                      and send the user_name and extended gids
1380                                      for a job, rather than having them looked up
1381                                      independently on each node at each task launch.
1382                                      This  helps  mitigate issues around name
1383                                      service scalability when launching  jobs
1384                                      involving  many nodes. Using this option
1385                                      will disable  this  functionality.  This
1386                                      option is ignored if enable_nss_slurm is
1387                                      specified.
1388
1389              slurmstepd_memlock      Lock the  slurmstepd  process's  current
1390                                      memory in RAM.
1391
1392              slurmstepd_memlock_all  Lock  the  slurmstepd  process's current
1393                                      and future memory in RAM.
1394
1395              test_exec               Have srun verify existence of  the  exe‐
1396                                      cutable  program along with user execute
1397                                      permission on the node  where  srun  was
1398                                      called before attempting to launch it on
1399                                      nodes in the step.
1400
1401              use_interactive_step    Have salloc use the Interactive Step  to
1402                                      launch  a  shell on an allocated compute
1403                                      node rather  than  locally  to  wherever
1404                                      salloc was invoked. This is accomplished
1405                                      by  launching  the  srun  command   with
1406                                      InteractiveStepOptions as options.
1407
1408                                      This  does not affect salloc called with
1409                                      a command as  an  argument.  These  jobs
1410                                      will  continue  to  be  executed  as the
1411                                      calling user on the calling host.
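
                  For example, an illustrative comma-separated combination
                  enabling slurmstepd-based passwd/group resolution and the
                  Interactive Step for salloc:

                  LaunchParameters=enable_nss_slurm,use_interactive_step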
1412
1413
1414       LaunchType
1415              Identifies the mechanism to be used to launch application tasks.
1416              Acceptable values include:
1417
1418              launch/slurm
1419                     The default value.
1420
1421
1422       Licenses
1423              Specification  of  licenses (or other resources available on all
1424              nodes of the cluster) which can be allocated to  jobs.   License
1425              names  can  optionally  be  followed by a colon and count with a
1426              default count of one.  Multiple license names  should  be  comma
1427              separated  (e.g.   "Licenses=foo:4,bar").   Note that Slurm pre‐
1428              vents jobs from being scheduled if their required license speci‐
1429              fication  is  not  available.   Slurm does not prevent jobs from
1430              using licenses that are not explicitly listed in the job submis‐
1431              sion specification.
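
                  For example, an illustrative configuration defining four
                  "foo" licenses and one "bar" license:

                  Licenses=foo:4,bar

                  A job could then request two of the "foo" licenses with
                  "sbatch --licenses=foo:2 ...".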
1432
1433
1434       LogTimeFormat
1435              Format  of  the  timestamp  in  slurmctld  and slurmd log files.
1436              Accepted  values   are   "iso8601",   "iso8601_ms",   "rfc5424",
1437              "rfc5424_ms",  "clock", "short" and "thread_id". The values end‐
1438              ing in "_ms" differ from the ones  without  in  that  fractional
1439              seconds  with  millisecond  precision  are  printed. The default
1440              value is "iso8601_ms". The "rfc5424" formats are the same as the
1441              "iso8601"  formats except that the timezone value is also shown.
1442              The "clock" format shows a timestamp in  microseconds  retrieved
1443              with  the  C  standard clock() function. The "short" format is a
1444              short date and time format. The  "thread_id"  format  shows  the
1445              timestamp  in  the  C standard ctime() function form without the
1446              year but including the microseconds, the daemon's process ID and
1447              the current thread name and ID.
1448
1449
1450       MailDomain
1451              Domain name to qualify usernames if email address is not explic‐
1452              itly given with the "--mail-user" option. If unset, the local
1453              MTA will need to qualify local addresses itself. Changes to Mail‐
1454              Domain will only affect new jobs.
1455
1456
1457       MailProg
1458              Fully qualified pathname to the program used to send  email  per
1459              user   request.    The   default   value   is   "/bin/mail"  (or
1460              "/usr/bin/mail"   if   "/bin/mail"   does    not    exist    but
1461              "/usr/bin/mail" does exist).
1462
1463
1464       MaxArraySize
1465              The  maximum  job  array size.  The maximum job array task index
1466              value will be one less than MaxArraySize to allow for  an  index
1467              value  of zero.  Configure MaxArraySize to 0 in order to disable
1468              job array use.  The value may not exceed 4000001.  The value  of
1469              MaxJobCount  should  be  much  larger  than  MaxArraySize.   The
1470              default value is 1001.
1471
1472
1473       MaxDBDMsgs
1474              When communication to the SlurmDBD is not possible the slurmctld
1475              will queue messages meant to be processed when the SlurmDBD is
1476              available again.  In order to avoid running out  of  memory  the
1477              slurmctld will only queue so many messages. The default value is
1478              10000, or MaxJobCount *  2  +  Node  Count  *  4,  whichever  is
1479              greater.  The value can not be less than 10000.
1480
1481
1482       MaxJobCount
1483              The maximum number of jobs Slurm can have in its active database
1484              at one time. Set the values  of  MaxJobCount  and  MinJobAge  to
1485              ensure the slurmctld daemon does not exhaust its memory or other
1486              resources. Once this limit is reached, requests to submit  addi‐
1487              tional  jobs  will fail. The default value is 10000 jobs.  NOTE:
1488              Each task of a job array counts as one job even though they will
1489              not  occupy  separate  job  records until modified or initiated.
1490              Performance can suffer with more than  a  few  hundred  thousand
1491              jobs.  Setting a per-user MaxSubmitJobs limit is generally valuable
1492              to prevent a single user from  filling  the  system  with  jobs.
1493              This  is  accomplished  using  Slurm's  database and configuring
1494              enforcement of resource limits.  This value may not be reset via
1495              "scontrol  reconfig".   It only takes effect upon restart of the
1496              slurmctld daemon.
1497
1498
1499       MaxJobId
1500              The maximum job id to be used for jobs submitted to Slurm  with‐
1501              out a specific requested value. Job ids are unsigned 32bit inte‐
1502              gers with the first 26 bits reserved for local job ids  and  the
1503              remaining  6 bits reserved for a cluster id to identify a feder‐
1504              ated job's origin.  The maximum allowed local job id is
1505              67,108,863   (0x3FFFFFF).   The   default  value  is  67,043,328
1506              (0x03ff0000).  MaxJobId only applies to the local job id and not
1507              the  federated  job  id.  Job id values generated will be incre‐
1508              mented by 1 for each subsequent job. Once MaxJobId  is  reached,
1509              the  next  job will be assigned FirstJobId.  Federated jobs will
1510              always have a job ID of 67,108,865 or higher.  Also see FirstJo‐
1511              bId.
1512
1513
1514       MaxMemPerCPU
1515              Maximum   real  memory  size  available  per  allocated  CPU  in
1516              megabytes.  Used to avoid over-subscribing  memory  and  causing
1517              paging.  MaxMemPerCPU would generally be used if individual pro‐
1518              cessors are allocated  to  jobs  (SelectType=select/cons_res  or
1519              SelectType=select/cons_tres).   The  default  value is 0 (unlim‐
1520              ited).  Also see DefMemPerCPU, DefMemPerGPU  and  MaxMemPerNode.
1521              MaxMemPerCPU and MaxMemPerNode are mutually exclusive.
1522
1523              NOTE:  If  a  job  specifies a memory per CPU limit that exceeds
1524              this system limit, that job's count of CPUs per task will try to
1525              automatically  increase.  This may result in the job failing due
1526              to CPU count limits. This auto-adjustment  feature  is  a  best-
1527              effort  one  and optimal assignment is not guaranteed due to the
1528              possibility of having heterogeneous  configurations  and  multi-
1529              partition/qos jobs.  If this is a concern it is advised to use a
1530              job submit LUA plugin instead  to  enforce  auto-adjustments  to
1531              your specific needs.
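
                  As a simplified illustration of the adjustment described
                  above, with

                  MaxMemPerCPU=2048

                  a job submitted with --mem-per-cpu=4096 and --cpus-per-task=1
                  may be allocated two CPUs per task so that its 4096 MB
                  request stays within the 2048 MB per-CPU limit.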
1532
1533
1534       MaxMemPerNode
1535              Maximum  real  memory  size  available  per  allocated  node  in
1536              megabytes.  Used to avoid over-subscribing  memory  and  causing
1537              paging.   MaxMemPerNode  would  generally be used if whole nodes
1538              are allocated to jobs (SelectType=select/linear)  and  resources
1539              are  over-subscribed (OverSubscribe=yes or OverSubscribe=force).
1540              The default value is 0 (unlimited).  Also see DefMemPerNode  and
1541              MaxMemPerCPU.    MaxMemPerCPU  and  MaxMemPerNode  are  mutually
1542              exclusive.
1543
1544
1545       MaxStepCount
1546              The maximum number of steps that  any  job  can  initiate.  This
1547              parameter  is intended to limit the effect of bad batch scripts.
1548              The default value is 40000 steps.
1549
1550
1551       MaxTasksPerNode
1552              Maximum number of tasks Slurm will allow a job step to spawn  on
1553              a  single  node.  The  default  MaxTasksPerNode is 512.  May not
1554              exceed 65533.
1555
1556
1557       MCSParameters
1558              MCS = Multi-Category Security.  MCS plugin parameters.  The sup‐
1559              ported  parameters  are  specific  to the MCSPlugin.  Changes to
1560              this value take effect when the Slurm daemons are  reconfigured.
1561              More     information     about    MCS    is    available    here
1562              <https://slurm.schedmd.com/mcs.html>.
1563
1564
1565       MCSPlugin
1566              MCS = Multi-Category Security : associate a  security  label  to
1567              jobs  and  ensure that nodes can only be shared among jobs using
1568              the same security label.  Acceptable values include:
1569
1570              mcs/none    is the default value.  No security label  associated
1571                          with  jobs,  no particular security restriction when
1572                          sharing nodes among jobs.
1573
1574              mcs/account only users with the same account can share the nodes
1575                          (requires enabling of accounting).
1576
1577              mcs/group   only users with the same group can share the nodes.
1578
1579              mcs/user    a node cannot be shared with other users.
1580
1581
1582       MessageTimeout
1583              Time  permitted  for  a  round-trip communication to complete in
1584              seconds. Default value is 10 seconds. For  systems  with  shared
1585              nodes,  the  slurmd  daemon  could  be paged out and necessitate
1586              higher values.
1587
1588
1589       MinJobAge
1590              The minimum age of a completed job before its record  is  purged
1591              from Slurm's active database. Set the values of MaxJobCount and
1592              MinJobAge to ensure the slurmctld daemon does not exhaust its memory or
1593              other  resources.  The default value is 300 seconds.  A value of
1594              zero prevents any job record purging.  Jobs are not purged  dur‐
1595              ing  a backfill cycle, so it can take longer than MinJobAge sec‐
1596              onds to purge a job if using the backfill scheduling plugin.  In
1597              order to eliminate some possible race conditions, the recommended
1598              minimum non-zero value for MinJobAge is 2.
1599
1600
1601       MpiDefault
1602              Identifies the default type of MPI to be used.  Srun  may  over‐
1603              ride  this  configuration parameter in any case.  Currently sup‐
1604              ported versions include: pmi2, pmix, and  none  (default,  which
1605              works  for  many other versions of MPI).  More information about
1606              MPI          use           is           available           here
1607              <https://slurm.schedmd.com/mpi_guide.html>.
1608
1609
1610       MpiParams
1611              MPI  parameters.   Used to identify ports used by older versions
1612              of OpenMPI  and  native  Cray  systems.   The  input  format  is
1613              "ports=12000-12999"  to  identify a range of communication ports
1614              to be used.  NOTE: This is not needed for modern versions of
1615              OpenMPI, and removing it can provide a small boost in scheduling
1616              performance.  NOTE: This is required for Cray's PMI.
1617
1618
1619       OverTimeLimit
1620              Number of minutes by which a  job  can  exceed  its  time  limit
1621              before  being  canceled.  Normally a job's time limit is treated
1622              as a hard limit and the job will be killed  upon  reaching  that
1623              limit.   Configuring OverTimeLimit will result in the job's time
1624              limit being treated like a soft limit.  Adding the OverTimeLimit
1625              value  to  the  soft  time  limit provides a hard time limit, at
1626              which point the job is canceled.  This  is  particularly  useful
1627              for backfill scheduling, which relies upon each job's soft time
1628              limit.  The default value is zero.  May not  exceed  65533  min‐
1629              utes.  A value of "UNLIMITED" is also supported.
1630
1631
1632       PluginDir
1633              Identifies  the places in which to look for Slurm plugins.  This
1634              is a colon-separated list of directories, like the PATH environ‐
1635              ment variable.  The default value is the prefix given at config‐
1636              ure time + "/lib/slurm".
1637
1638
1639       PlugStackConfig
1640              Location of the config file for Slurm stackable plugins that use
1641              the  Stackable  Plugin  Architecture  for  Node  job  (K)control
1642              (SPANK).  This provides support for a highly configurable set of
1643              plugins  to be called before and/or after execution of each task
1644              spawned as part of a  user's  job  step.   Default  location  is
1645              "plugstack.conf" in the same directory as the system slurm.conf.
1646              For more information on SPANK plugins, see the spank(8) manual.
1647
1648
1649       PowerParameters
1650              System power management parameters.   The  supported  parameters
1651              are  specific  to  the  PowerPlugin.  Changes to this value take
1652              effect when the Slurm daemons are reconfigured.   More  informa‐
1653              tion   about   system   power   management   is  available  here
1654              <https://slurm.schedmd.com/power_mgmt.html>.  Options currently
1655              supported by any plugins are listed below.
1656
1657              balance_interval=#
1658                     Specifies the time interval, in seconds, between attempts
1659                     to rebalance power caps across the nodes.  This also con‐
1660                     trols  the  frequency  at which Slurm attempts to collect
1661                     current power consumption data  (old  data  may  be  used
1662                     until  new  data  is available from the underlying infra‐
1663                     structure and values below 10 seconds are not recommended
1664                     for  Cray  systems).   The  default  value is 30 seconds.
1665                     Supported by the power/cray_aries plugin.
1666
1667              capmc_path=
1668                     Specifies the absolute path of the  capmc  command.   The
1669                     default   value  is  "/opt/cray/capmc/default/bin/capmc".
1670                     Supported by the power/cray_aries plugin.
1671
1672              cap_watts=#
1673                     Specifies the total power limit to be established  across
1674                     all  compute  nodes  managed by Slurm.  A value of 0 sets
1675                     every compute node to have an unlimited cap.  The default
1676                     value is 0.  Supported by the power/cray_aries plugin.
1677
1678              decrease_rate=#
1679                     Specifies the maximum rate of change in the power cap for
1680                     a node where the actual power usage is  below  the  power
1681                     cap  by  an  amount  greater  than  lower_threshold  (see
1682                     below).  Value represents a percentage of the  difference
1683                     between  a  node's minimum and maximum power consumption.
1684                     The default  value  is  50  percent.   Supported  by  the
1685                     power/cray_aries plugin.
1686
1687              get_timeout=#
1688                     Amount  of time allowed to get power state information in
1689                     milliseconds.  The default value is 5,000 milliseconds or
1690                     5  seconds.  Supported by the power/cray_aries plugin and
1691                     represents the time allowed  for  the  capmc  command  to
1692                     respond to various "get" options.
1693
1694              increase_rate=#
1695                     Specifies the maximum rate of change in the power cap for
1696                     a  node  where  the  actual   power   usage   is   within
1697                     upper_threshold (see below) of the power cap.  Value rep‐
1698                     resents a percentage of the difference between  a  node's
1699                     minimum and maximum power consumption.  The default value
1700                     is 20 percent.  Supported by the power/cray_aries plugin.
1701
1702              job_level
1703                     All nodes associated with every job will  have  the  same
1704                     power   cap,  to  the  extent  possible.   Also  see  the
1705                     --power=level option on the job submission commands.
1706
1707              job_no_level
1708                     Disable the user's ability to set every  node  associated
1709                     with  a  job  to the same power cap.  Each node will have
1710                     its power  cap  set  independently.   This  disables  the
1711                     --power=level option on the job submission commands.
1712
1713              lower_threshold=#
1714                     Specify a lower power consumption threshold.  If a node's
1715                     current power consumption is below this percentage of its
1716                     current  cap,  then  its  power cap will be reduced.  The
1717                     default  value  is  90   percent.    Supported   by   the
1718                     power/cray_aries plugin.
1719
1720              recent_job=#
1721                     If  a job has started or resumed execution (from suspend)
1722                     on a compute node within this number of seconds from  the
1723                     current  time,  the node's power cap will be increased to
1724                     the maximum.  The default value  is  300  seconds.   Sup‐
1725                     ported by the power/cray_aries plugin.
1726
1727
1728              set_timeout=#
1729                     Amount  of time allowed to set power state information in
1730                     milliseconds.  The default value is  30,000  milliseconds
1731              or 30 seconds.  Supported by the power/cray_aries plugin and
1732                     represents the time allowed  for  the  capmc  command  to
1733                     respond to various "set" options.
1734
1735              set_watts=#
1736                     Specifies  the  power  limit  to  be set on every compute
1737              node managed by Slurm.  Every node gets this same power
1738                     cap  and  there  is  no variation through time based upon
1739                     actual  power  usage  on  the  node.   Supported  by  the
1740                     power/cray_aries plugin.
1741
1742              upper_threshold=#
1743                     Specify  an  upper  power  consumption  threshold.   If a
1744                     node's current power consumption is above this percentage
1745                     of  its current cap, then its power cap will be increased
1746                     to the extent possible.  The default value is 95 percent.
1747                     Supported by the power/cray_aries plugin.
1748
1749
1750       PowerPlugin
1751              Identifies  the  plugin  used for system power management.  Cur‐
1752              rently supported plugins include: cray_aries and none.   Changes
1753              to  this  value require restarting Slurm daemons to take effect.
1754              More information about system power management is available here
1755              <https://slurm.schedmd.com/power_mgmt.html>.    By  default,  no
1756              power plugin is loaded.
1757
1758
1759       PreemptMode
1760              Mechanism used to preempt jobs or enable gang  scheduling.  When
1761              the  PreemptType parameter is set to enable preemption, the Pre‐
1762              emptMode selects the default mechanism used to preempt the  eli‐
1763              gible jobs for the cluster.
1764              PreemptMode  may  be specified on a per partition basis to over‐
1765              ride this default value  if  PreemptType=preempt/partition_prio.
1766              Alternatively,  it  can  be specified on a per QOS basis if Pre‐
1767              emptType=preempt/qos. In either case, a valid  default  Preempt‐
1768              Mode  value  must  be  specified for the cluster as a whole when
1769              preemption is enabled.
1770              The GANG option is used to enable gang scheduling independent of
1771              whether  preemption is enabled (i.e. independent of the Preempt‐
1772              Type setting). It can be specified in addition to a  PreemptMode
1773              setting  with  the  two  options  comma separated (e.g. Preempt‐
1774              Mode=SUSPEND,GANG).
1775              See         <https://slurm.schedmd.com/preempt.html>         and
1776              <https://slurm.schedmd.com/gang_scheduling.html>     for    more
1777              details.
1778
1779              NOTE: For performance reasons, the backfill  scheduler  reserves
1780              whole  nodes  for  jobs,  not  partial nodes. If during backfill
1781              scheduling a job preempts one or  more  other  jobs,  the  whole
1782              nodes  for  those  preempted jobs are reserved for the preemptor
1783              job, even if the preemptor job requested  fewer  resources  than
1784              that.   These reserved nodes aren't available to other jobs dur‐
1785              ing that backfill cycle, even if the other jobs could fit on the
1786              nodes.  Therefore, jobs may preempt more resources during a sin‐
1787              gle backfill iteration than they requested.
1788
1789              NOTE: For a heterogeneous job to be considered for preemption, all
1790              components must be eligible for preemption. When a heterogeneous
1791              job is to be preempted the first identified component of the job
1792              with  the highest order PreemptMode (SUSPEND (highest), REQUEUE,
1793              CANCEL (lowest)) will be used to set  the  PreemptMode  for  all
1794              components.  The GraceTime and user warning signal for each com‐
1795              ponent of the heterogeneous job  remain  unique.   Heterogeneous
1796              jobs are excluded from GANG scheduling operations.
1797
1798              OFF         Is the default value and disables job preemption and
1799                          gang scheduling.  It is only  compatible  with  Pre‐
1800                          emptType=preempt/none  at  a global level.  A common
1801                          use case for this parameter is to set it on a parti‐
1802                          tion to disable preemption for that partition.
1803
1804              CANCEL      The preempted job will be cancelled.
1805
1806              GANG        Enables  gang  scheduling  (time slicing) of jobs in
1807                          the same partition, and allows the resuming of  sus‐
1808                          pended jobs.
1809
1810                          NOTE: Gang scheduling is performed independently for
1811                          each partition, so if you only want time-slicing  by
1812                          OverSubscribe,  without any preemption, then config‐
1813                          uring partitions with overlapping nodes is not  rec‐
1814                          ommended.   On  the  other  hand, if you want to use
1815                          PreemptType=preempt/partition_prio  to  allow   jobs
1816                          from  higher PriorityTier partitions to Suspend jobs
1817                          from lower PriorityTier  partitions  you  will  need
1818                          overlapping partitions, and PreemptMode=SUSPEND,GANG
1819                          to use the Gang scheduler to  resume  the  suspended
1820                          jobs(s).   In  any  case,  time-slicing won't happen
1821                          between jobs on different partitions.
1822
1823                          NOTE: Heterogeneous  jobs  are  excluded  from  GANG
1824                          scheduling operations.
1825
1826              REQUEUE     Preempts  jobs  by  requeuing  them (if possible) or
1827                          canceling them.  For jobs to be requeued  they  must
1828                          have  the --requeue sbatch option set or the cluster
1829                          wide JobRequeue parameter in slurm.conf must be  set
1830                          to one.
1831
1832              SUSPEND     The  preempted jobs will be suspended, and later the
1833                          Gang scheduler will resume them. Therefore the  SUS‐
1834                          PEND preemption mode always needs the GANG option to
1835                          be specified at the cluster level. Also, because the
1836                          suspended  jobs  will  still use memory on the allo‐
1837                          cated nodes, Slurm needs to be able to track  memory
1838                          resources to be able to suspend jobs.
1839
1840                          NOTE:  Because gang scheduling is performed indepen‐
1841                          dently for each partition, if using PreemptType=pre‐
1842                          empt/partition_prio then jobs in higher PriorityTier
1843                          partitions will suspend jobs in  lower  PriorityTier
1844                          partitions  to  run  on the released resources. Only
1845                          when the preemptor job ends will the suspended  jobs
1846                          be resumed by the Gang scheduler.
1847                          If  PreemptType=preempt/qos is configured and if the
1848                          preempted job(s) and the preemptor job  are  on  the
1849                          same  partition, then they will share resources with
1850                          the Gang scheduler (time-slicing). If not  (i.e.  if
1851                          the preemptees and preemptor are on different parti‐
1852                          tions) then the preempted jobs will remain suspended
1853                          until the preemptor ends.
1854
1855
1856       PreemptType
1857              Specifies  the  plugin  used  to identify which jobs can be pre‐
1858              empted in order to start a pending job.
1859
1860              preempt/none
1861                     Job preemption is disabled.  This is the default.
1862
1863              preempt/partition_prio
1864                     Job preemption  is  based  upon  partition  PriorityTier.
1865                     Jobs  in  higher PriorityTier partitions may preempt jobs
1866                     from lower PriorityTier partitions.  This is not compati‐
1867                     ble with PreemptMode=OFF.
1868
1869              preempt/qos
1870                     Job  preemption rules are specified by Quality Of Service
1871                     (QOS) specifications in the Slurm database.  This  option
1872                     is  not compatible with PreemptMode=OFF.  A configuration
1873                     of PreemptMode=SUSPEND is only supported by  the  Select‐
1874                     Type=select/cons_res    and   SelectType=select/cons_tres
1875                     plugins.  See the sacctmgr  man  page  to  configure  the
1876                     options for preempt/qos.
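
                  For example, an illustrative combination allowing jobs in
                  higher PriorityTier partitions to suspend jobs in lower
                  PriorityTier partitions, with the Gang scheduler resuming
                  them afterwards:

                  PreemptType=preempt/partition_prio
                  PreemptMode=SUSPEND,GANG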
1877
1878
1879       PreemptExemptTime
1880              Global  option for minimum run time for all jobs before they can
1881              be considered for preemption. Any  QOS  PreemptExemptTime  takes
1882              precedence  over  the  global option.  A time of -1 disables the
1883              option, equivalent to 0. Acceptable time formats  include  "min‐
1884              utes", "minutes:seconds", "hours:minutes:seconds", "days-hours",
1885              "days-hours:minutes", and "days-hours:minutes:seconds".
1886
1887
1888       PrEpParameters
1889              Parameters to be passed to the PrEpPlugins.
1890
1891
1892       PrEpPlugins
1893              A resource for programmers wishing to write  their  own  plugins
1894              for the Prolog and Epilog (PrEp) scripts. The default, and cur‐
1895              rently the only implemented plugin, is prep/script.  Additional
1896              plugins  can  be  specified  in a comma separated list. For more
1897              information please see the PrEp Plugin API  documentation  page:
1898              <https://slurm.schedmd.com/prep_plugins.html>
1899
1900
1901       PriorityCalcPeriod
1902              The  period of time in minutes in which the half-life decay will
1903              be re-calculated.  Applicable only if PriorityType=priority/mul‐
1904              tifactor.  The default value is 5 (minutes).
1905
1906
1907       PriorityDecayHalfLife
1908              This  controls  how  long  prior  resource  use is considered in
1909              determining how over- or under-serviced an association is (user,
1910              bank account and cluster) when determining job priority.  The
1911              record of usage will be decayed over  time,  with  half  of  the
1912              original  value cleared at age PriorityDecayHalfLife.  If set to
1913              0 no decay will be applied.  This is  helpful  if  you  want  to
1914              enforce  hard  time limits per association.  If set to 0 Priori‐
1915              tyUsageResetPeriod must be set  to  some  interval.   Applicable
1916              only  if  PriorityType=priority/multifactor.  The unit is a time
1917              string (i.e. min, hr:min:00, days-hr:min:00, or  days-hr).   The
1918              default value is 7-0 (7 days).
1919
1920
1921       PriorityFavorSmall
1922              Specifies  that small jobs should be given preferential schedul‐
1923              ing priority.  Applicable only  if  PriorityType=priority/multi‐
1924              factor.  Supported values are "YES" and "NO".  The default value
1925              is "NO".
1926
1927
1928       PriorityFlags
1929              Flags to modify priority behavior.  Applicable only if Priority‐
1930              Type=priority/multifactor.   The  keywords below have no associ‐
1931              ated   value   (e.g.    "PriorityFlags=ACCRUE_ALWAYS,SMALL_RELA‐
1932              TIVE_TO_TIME").
1933
1934              ACCRUE_ALWAYS    If  set,  priority age factor will be increased
1935                               despite job dependencies or holds.
1936
1937              CALCULATE_RUNNING
1938                               If set, priorities  will  be  recalculated  not
1939                               only  for  pending  jobs,  but also running and
1940                               suspended jobs.
1941
1942              DEPTH_OBLIVIOUS  If set, priority will be calculated similarly
1943                               to the normal multifactor calculation, but the
1944                               depth of the associations in the tree does not
1945                               adversely affect their priority. This option
1946                               automatically enables NO_FAIR_TREE.
1947
1948              NO_FAIR_TREE     Disables the "fair tree" algorithm, and reverts
1949                               to "classic" fair share priority scheduling.
1950
1951              INCR_ONLY        If  set,  priority values will only increase in
1952                               value. Job  priority  will  never  decrease  in
1953                               value.
1954
1955              MAX_TRES         If  set,  the  weighted  TRES value (e.g. TRES‐
1956                               BillingWeights) is calculated  as  the  MAX  of
1957                               individual  TRES'  on  a  node (e.g. cpus, mem,
1958                               gres) plus the sum of all  global  TRES'  (e.g.
1959                               licenses).
1960
1961              NO_NORMAL_ALL    If set, all NO_NORMAL_* flags are set.
1962
1963              NO_NORMAL_ASSOC  If  set,  the association factor is not normal‐
1964                               ized against the highest association priority.
1965
1966              NO_NORMAL_PART   If set, the partition factor is not  normalized
1967                               against  the  highest partition PriorityJobFac‐
1968                               tor.
1969
1970              NO_NORMAL_QOS    If  set,  the  QOS  factor  is  not  normalized
1971                               against the highest qos priority.
1972
1973              NO_NORMAL_TRES   If set, the TRES factor is not normalized
1974                               against the job's partition TRES counts.
1975
1976              SMALL_RELATIVE_TO_TIME
1977                               If set, the job's size component will be based
1978                               not upon the job size alone, but upon the job's size
1979                               divided by its time limit.
1980
1981
1982       PriorityMaxAge
1983              Specifies the job age which will be given the maximum age factor
1984              in computing priority. For example, a value of 30 minutes would
1985              result in all jobs over 30 minutes old getting the same
1986              age-based priority.  Applicable only if PriorityType=prior‐
1987              ity/multifactor.   The  unit  is  a  time  string   (i.e.   min,
1988              hr:min:00,  days-hr:min:00,  or  days-hr).  The default value is
1989              7-0 (7 days).
1990
1991
1992       PriorityParameters
1993              Arbitrary string used by the PriorityType plugin.
1994
1995
1996       PrioritySiteFactorParameters
1997              Arbitrary string used by the PrioritySiteFactorPlugin plugin.
1998
1999
2000       PrioritySiteFactorPlugin
2001              This specifies an optional plugin to be used alongside "prior‐
2002              ity/multifactor",  which  is meant to initially set and continu‐
2003              ously update the SiteFactor priority factor.  The default  value
2004              is "site_factor/none".
2005
2006
2007       PriorityType
2008              This  specifies  the  plugin  to be used in establishing a job's
2009              scheduling priority. Supported values are "priority/basic" (jobs
2010              are  prioritized  by  order  of arrival), "priority/multifactor"
2011              (jobs are prioritized based upon size, age, fair-share of  allo‐
2012              cation, etc).  Also see PriorityFlags for configuration options.
2013              The default value is "priority/basic".
2014
2015              When not FIFO scheduling, jobs are prioritized in the  following
2016              order:
2017
2018              1. Jobs that can preempt
2019              2. Jobs with an advanced reservation
2020              3. Partition Priority Tier
2021              4. Job Priority
2022              5. Job Id
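
                  For example, an illustrative multifactor setup weighting
                  fair-share most heavily (the weights are assumptions, not
                  recommendations, and the age and fair-share factors assume
                  accounting via slurmdbd; see the PriorityWeight* parameters
                  below):

                  PriorityType=priority/multifactor
                  PriorityWeightFairshare=100000
                  PriorityWeightAge=1000
                  PriorityWeightPartition=10000
                  PriorityWeightJobSize=1000
                  PriorityWeightQOS=10000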
2023
2024
2025       PriorityUsageResetPeriod
2026              At  this  interval the usage of associations will be reset to 0.
2027              This is used if you want to enforce hard limits  of  time  usage
2028              per  association.   If  PriorityDecayHalfLife  is set to be 0 no
2029              decay will happen and this is the only way to  reset  the  usage
2030              accumulated by running jobs.  By default this is turned off, and
2031              it is advised to use the PriorityDecayHalfLife option instead so
2032              that something can always run on your cluster; but if your schema
2033              only allows certain amounts of time on your system, this is the
2034              way to do it.  Applicable only if PriorityType=pri‐
2035              ority/multifactor.
2036
2037              NONE        Never clear historic usage. The default value.
2038
2039              NOW         Clear the historic usage now.  Executed  at  startup
2040                          and reconfiguration time.
2041
2042              DAILY       Cleared every day at midnight.
2043
2044              WEEKLY      Cleared every week on Sunday at time 00:00.
2045
2046              MONTHLY     Cleared  on  the  first  day  of  each month at time
2047                          00:00.
2048
2049              QUARTERLY   Cleared on the first day of  each  quarter  at  time
2050                          00:00.
2051
2052              YEARLY      Cleared on the first day of each year at time 00:00.
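
                  For example, a sketch of the hard time-allocation setup
                  described above (values are illustrative):

                  PriorityDecayHalfLife=0          # disable decay
                  PriorityUsageResetPeriod=MONTHLY # usage cleared monthly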
2053
2054
2055       PriorityWeightAge
2056              An  integer  value  that sets the degree to which the queue wait
2057              time component contributes to the  job's  priority.   Applicable
2058              only  if  PriorityType=priority/multifactor.   Requires Account‐
2059              ingStorageType=accounting_storage/slurmdbd.  The  default  value
2060              is 0.
2061
2062
2063       PriorityWeightAssoc
2064              An  integer  value that sets the degree to which the association
2065              component contributes to the job's priority.  Applicable only if
2066              PriorityType=priority/multifactor.  The default value is 0.
2067
2068
2069       PriorityWeightFairshare
2070              An  integer  value  that sets the degree to which the fair-share
2071              component contributes to the job's priority.  Applicable only if
2072              PriorityType=priority/multifactor.    Requires   AccountingStor‐
2073              ageType=accounting_storage/slurmdbd.  The default value is 0.
2074
2075
2076       PriorityWeightJobSize
2077              An integer value that sets the degree to which the job size com‐
2078              ponent  contributes  to  the job's priority.  Applicable only if
2079              PriorityType=priority/multifactor.  The default value is 0.
2080
2081
2082       PriorityWeightPartition
2083              Partition factor used by priority/multifactor plugin  in  calcu‐
2084              lating  job  priority.   Applicable  only if PriorityType=prior‐
2085              ity/multifactor.  The default value is 0.
2086
2087
2088       PriorityWeightQOS
2089              An integer value that sets the degree to which  the  Quality  Of
2090              Service component contributes to the job's priority.  Applicable
2091              only if PriorityType=priority/multifactor.  The default value is
2092              0.
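
                  As an illustration, the PriorityWeight* values above might
                  be combined as follows (the numbers are arbitrary, not
                  recommendations):

                  PriorityWeightAge=1000
                  PriorityWeightFairshare=10000
                  PriorityWeightJobSize=1000
                  PriorityWeightPartition=1000
                  PriorityWeightQOS=2000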
2093
2094
2095       PriorityWeightTRES
2096              A  comma  separated list of TRES Types and weights that sets the
2097              degree that each TRES Type contributes to the job's priority.
2098
2099              e.g.
2100              PriorityWeightTRES=CPU=1000,Mem=2000,GRES/gpu=3000
2101
2102              Applicable  only  if  PriorityType=priority/multifactor  and  if
2103              AccountingStorageTRES  is configured with each TRES Type.  Nega‐
2104              tive values are allowed.  The default values are 0.
2105
2106
2107       PrivateData
2108              This controls what type of information is  hidden  from  regular
2109              users.   By  default,  all  information is visible to all users.
2110              User SlurmUser and root can always view all information.  Multi‐
2111              ple  values may be specified with a comma separator.  Acceptable
2112              values include:
2113
2114              accounts
2115                     (NON-SlurmDBD ACCOUNTING ONLY) Prevents users from  view‐
2116                     ing  any account definitions unless they are coordinators
2117                     of them.
2118
2119              cloud  Powered down nodes in the cloud are visible.
2120
                  events Prevents users from viewing event information unless they
                         have operator status or above.
2123
2124              jobs   Prevents  users  from viewing jobs or job steps belonging
2125                     to other users. (NON-SlurmDBD ACCOUNTING  ONLY)  Prevents
2126                     users  from  viewing job records belonging to other users
2127                     unless they are coordinators of the  association  running
2128                     the job when using sacct.
2129
2130              nodes  Prevents users from viewing node state information.
2131
2132              partitions
2133                     Prevents users from viewing partition state information.
2134
2135              reservations
2136                     Prevents  regular  users  from viewing reservations which
2137                     they can not use.
2138
                  usage  Prevents users from viewing usage of any other user;
                         this applies to sshare.  (NON-SlurmDBD ACCOUNTING
                         ONLY) Prevents users from viewing usage of any other
                         user; this applies to sreport.
2143
                  users  (NON-SlurmDBD ACCOUNTING ONLY) Prevents users from
                         viewing information of any user other than
                         themselves.  This also means users can only see the
                         associations they deal with.  Coordinators can see
                         the associations of all users in the account they
                         coordinate, but can only see themselves when listing
                         users.
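
                  For example, several of the values above may be combined
                  (illustrative only):

                  PrivateData=accounts,jobs,usage,users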
2150
2151
2152       ProctrackType
2153              Identifies the plugin to be used for process tracking on  a  job
2154              step  basis.   The slurmd daemon uses this mechanism to identify
2155              all processes which are children of processes it  spawns  for  a
2156              user job step.  The slurmd daemon must be restarted for a change
2157              in ProctrackType to take  effect.   NOTE:  "proctrack/linuxproc"
2158              and  "proctrack/pgid" can fail to identify all processes associ‐
2159              ated with a job since processes can become a child of  the  init
2160              process  (when  the  parent  process terminates) or change their
2161              process  group.   To  reliably  track  all   processes,   "proc‐
2162              track/cgroup" is highly recommended.  NOTE: The JobContainerType
2163              applies to a job allocation, while ProctrackType applies to  job
2164              steps.  Acceptable values at present include:
2165
2166              proctrack/cgroup
2167                     Uses  linux cgroups to constrain and track processes, and
2168                     is the default for systems with cgroup support.
2169                     NOTE: see "man cgroup.conf" for configuration details.
2170
2171              proctrack/cray_aries
2172                     Uses Cray proprietary process tracking.
2173
2174              proctrack/linuxproc
2175                     Uses linux process tree using parent process IDs.
2176
2177              proctrack/pgid
2178                     Uses Process Group IDs.
2179                     NOTE: This is the default for the BSD family.
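
                  For example, the configuration recommended above for
                  reliable process tracking is:

                  ProctrackType=proctrack/cgroup   # see also "man cgroup.conf"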
2180
2181
2182       Prolog Fully qualified pathname of a program for the slurmd to  execute
2183              whenever it is asked to run a job step from a new job allocation
2184              (e.g.  "/usr/local/slurm/prolog"). A glob pattern (See glob (7))
2185              may  also  be used to specify more than one program to run (e.g.
2186              "/etc/slurm/prolog.d/*"). The slurmd executes the prolog  before
2187              starting  the  first job step.  The prolog script or scripts may
2188              be used to purge files, enable  user  login,  etc.   By  default
2189              there  is  no  prolog. Any configured script is expected to com‐
2190              plete execution quickly (in less time than MessageTimeout).   If
2191              the  prolog  fails  (returns  a  non-zero  exit code), this will
2192              result in the node being set to a DRAIN state and the job  being
2193              requeued  in  a held state, unless nohold_on_prolog_fail is con‐
2194              figured in SchedulerParameters.  See Prolog and  Epilog  Scripts
2195              for more information.
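
                  For example, using the glob form mentioned above to run
                  every script in a directory (the path is illustrative):

                  Prolog=/etc/slurm/prolog.d/*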
2196
2197
2198       PrologEpilogTimeout
                  The interval in seconds Slurm waits for Prolog and Epilog
2200              before terminating them. The default behavior is to wait indefi‐
2201              nitely.  This  interval  applies to the Prolog and Epilog run by
2202              slurmd daemon before and after the job, the PrologSlurmctld  and
2203              EpilogSlurmctld  run  by slurmctld daemon, and the SPANK plugins
2204              run by the slurmstepd daemon.
2205
2206
2207       PrologFlags
2208              Flags to control the Prolog behavior. By default  no  flags  are
2209              set.  Multiple flags may be specified in a comma-separated list.
2210              Currently supported options are:
2211
2212              Alloc   If set, the Prolog script will be executed at job  allo‐
2213                      cation.  By  default, Prolog is executed just before the
2214                      task is launched. Therefore, when salloc is started,  no
2215                      Prolog is executed. Alloc is useful for preparing things
2216                      before a user starts to use any allocated resources.  In
2217                      particular,  this  flag  is needed on a Cray system when
2218                      cluster compatibility mode is enabled.
2219
2220                      NOTE: Use of the  Alloc  flag  will  increase  the  time
2221                      required to start jobs.
2222
2223              Contain At job allocation time, use the ProcTrack plugin to cre‐
2224                      ate a job container  on  all  allocated  compute  nodes.
2225                      This  container  may  be  used  for  user  processes not
2226                      launched    under    Slurm    control,    for    example
2227                      pam_slurm_adopt  may  place processes launched through a
2228                      direct  user  login  into  this  container.   If   using
2229                      pam_slurm_adopt,  then  ProcTrackType  must  be  set  to
                          either proctrack/cgroup or proctrack/cray_aries.
                          Setting the Contain flag implicitly sets the Alloc
                          flag.
2232
                  NoHold  If set, the Alloc flag should also be set.  This
                          allows salloc to return without blocking until the
                          prolog has finished on each node; the blocking
                          instead happens when steps reach the slurmd, before
                          any execution occurs in the step.  This is much
                          faster, and if you use srun to launch your tasks
                          you should use this flag.  This flag cannot be
                          combined with the Contain or X11 flags.
2241
2242              Serial  By default, the Prolog and Epilog  scripts  run  concur‐
2243                      rently  on each node.  This flag forces those scripts to
2244                      run serially within each node, but  with  a  significant
2245                      penalty to job throughput on each node.
2246
2247              X11     Enable  Slurm's  built-in  X11  forwarding capabilities.
2248                      This is incompatible with ProctrackType=proctrack/linux‐
2249                      proc.  Setting the X11 flag implicitly enables both Con‐
2250                      tain and Alloc flags as well.
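
                  For example, a minimal illustrative setting:

                  PrologFlags=Contain   # Alloc is set implicitly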
2251
2252
2253       PrologSlurmctld
2254              Fully qualified pathname of a program for the  slurmctld  daemon
2255              to   execute   before   granting  a  new  job  allocation  (e.g.
2256              "/usr/local/slurm/prolog_controller").  The program executes  as
2257              SlurmUser  on the same node where the slurmctld daemon executes,
2258              giving it permission to drain nodes and requeue  the  job  if  a
2259              failure  occurs  or  cancel the job if appropriate.  The program
2260              can be used to reboot nodes or perform  other  work  to  prepare
2261              resources  for  use.   Exactly  what the program does and how it
2262              accomplishes this is completely at the discretion of the  system
2263              administrator.   Information  about the job being initiated, its
2264              allocated nodes, etc. are passed to the program  using  environ‐
2265              ment  variables.  While this program is running, the nodes asso‐
                  ciated with the job will have a POWER_UP/CONFIGURING flag
                  set in their state, which can be readily viewed.  The
                  slurmctld daemon will wait indefinitely for this program to
                  complete.  Once the program completes with an exit code of
                  zero, the nodes will be considered ready for use and the
                  job will be started.  If
2271              some  node can not be made available for use, the program should
2272              drain the node (typically using the scontrol command) and termi‐
2273              nate  with  a  non-zero  exit  code.   A non-zero exit code will
2274              result in the job being requeued  (where  possible)  or  killed.
2275              Note  that only batch jobs can be requeued.  See Prolog and Epi‐
2276              log Scripts for more information.
2277
2278
2279       PropagatePrioProcess
2280              Controls the scheduling priority (nice value)  of  user  spawned
2281              tasks.
2282
2283              0    The  tasks  will  inherit  the scheduling priority from the
2284                   slurm daemon.  This is the default value.
2285
2286              1    The tasks will inherit the scheduling priority of the  com‐
2287                   mand used to submit them (e.g. srun or sbatch).  Unless the
2288                   job is submitted by user root, the tasks will have a sched‐
2289                   uling  priority  no  higher  than the slurm daemon spawning
2290                   them.
2291
2292              2    The tasks will inherit the scheduling priority of the  com‐
2293                   mand  used  to  submit  them (e.g. srun or sbatch) with the
2294                   restriction that their nice value will always be one higher
                       than the slurm daemon (i.e. the tasks' scheduling priority
2296                   will be lower than the slurm daemon).
2297
2298
2299       PropagateResourceLimits
2300              A list of comma separated resource limit names.  The slurmd dae‐
2301              mon  uses these names to obtain the associated (soft) limit val‐
2302              ues from the user's process  environment  on  the  submit  node.
2303              These  limits  are  then propagated and applied to the jobs that
2304              will run on the compute nodes.  This  parameter  can  be  useful
2305              when  system  limits vary among nodes.  Any resource limits that
2306              do not appear in the list are not propagated.  However, the user
2307              can  override this by specifying which resource limits to propa‐
                  gate with the sbatch or srun "--propagate" option.  If
                  neither PropagateResourceLimits nor
                  PropagateResourceLimitsExcept is configured and the
                  "--propagate" option is not specified, then the default
                  action is to propagate all limits.  Only one of the
                  parameters, either PropagateResourceLimits or
                  PropagateResourceLimitsExcept, may be specified.  The user
                  limits can not exceed hard limits under which the slurmd
                  daemon operates.  If the user limits are not propagated,
                  the limits from the slurmd daemon will be propagated to the
                  user's job.  The limits used for the Slurm daemons can be
                  set in the /etc/sysconfig/slurm file.  For more
                  information, see
                  https://slurm.schedmd.com/faq.html#memlock.  The following
                  limit names are supported by Slurm (although
2320              some options may not be supported on some systems):
2321
2322              ALL       All limits listed below (default)
2323
2324              NONE      No limits listed below
2325
2326              AS        The maximum address space for a process
2327
2328              CORE      The maximum size of core file
2329
2330              CPU       The maximum amount of CPU time
2331
2332              DATA      The maximum size of a process's data segment
2333
2334              FSIZE     The maximum size of files created. Note  that  if  the
2335                        user  sets  FSIZE to less than the current size of the
2336                        slurmd.log, job launches will fail with a  'File  size
2337                        limit exceeded' error.
2338
2339              MEMLOCK   The maximum size that may be locked into memory
2340
2341              NOFILE    The maximum number of open files
2342
2343              NPROC     The maximum number of processes available
2344
2345              RSS       The maximum resident set size
2346
2347              STACK     The maximum stack size
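
                  For example, an illustrative configuration propagating only
                  two of the limits listed above:

                  PropagateResourceLimits=MEMLOCK,NOFILE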
2348
2349
2350       PropagateResourceLimitsExcept
2351              A list of comma separated resource limit names.  By default, all
                  resource limits will be propagated (as described by the Propa‐
                  gateResourceLimits parameter), except for the limits appearing
2354              in this list.   The user can override this by  specifying  which
2355              resource  limits  to propagate with the sbatch or srun "--propa‐
2356              gate" option.  See PropagateResourceLimits above for a  list  of
2357              valid limit names.
2358
2359
2360       RebootProgram
2361              Program  to  be  executed  on  each  compute  node to reboot it.
2362              Invoked on each node once it  becomes  idle  after  the  command
2363              "scontrol  reboot" is executed by an authorized user or a job is
2364              submitted with the "--reboot" option.  After rebooting, the node
2365              is  returned  to normal use.  See ResumeTimeout to configure the
2366              time you expect a reboot to finish in.  A node  will  be  marked
2367              DOWN if it doesn't reboot within ResumeTimeout.
2368
2369
2370       ReconfigFlags
2371              Flags  to  control  various  actions  that  may be taken when an
2372              "scontrol reconfig" command is  issued.  Currently  the  options
2373              are:
2374
2375              KeepPartInfo     If  set,  an  "scontrol  reconfig" command will
2376                               maintain  the  in-memory  value  of   partition
2377                               "state" and other parameters that may have been
2378                               dynamically updated by "scontrol update".  Par‐
2379                               tition  information in the slurm.conf file will
2380                               be  merged  with  in-memory  data.   This  flag
2381                               supersedes the KeepPartState flag.
2382
2383              KeepPartState    If  set,  an  "scontrol  reconfig" command will
2384                               preserve only  the  current  "state"  value  of
2385                               in-memory  partitions  and will reset all other
2386                               parameters of the partitions that may have been
2387                               dynamically updated by "scontrol update" to the
2388                               values from  the  slurm.conf  file.   Partition
2389                               information  in  the  slurm.conf  file  will be
2390                               merged with in-memory data.
                  By default, neither flag is set, and "scontrol reconfig"
                  will rebuild the partition information using only the
                  definitions in the slurm.conf file.
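
                  For example, to preserve dynamically updated partition
                  settings across a reconfiguration:

                  ReconfigFlags=KeepPartInfo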
2394
2395
2396       RequeueExit
2397              Enables automatic requeue for batch jobs  which  exit  with  the
                  specified values.  Separate multiple exit codes with a
                  comma and/or specify numeric ranges using a "-" separator
                  (e.g. "RequeueExit=1-9,18").  Jobs will be put back into
                  pending state and
2401              later scheduled again.  Restarted jobs will have the environment
2402              variable  SLURM_RESTART_COUNT set to the number of times the job
2403              has been restarted.
2404
2405
2406       RequeueExitHold
2407              Enables automatic requeue for batch jobs  which  exit  with  the
2408              specified values, with these jobs being held until released man‐
                  ually by the user.  Separate multiple exit codes with a
                  comma and/or specify numeric ranges using a "-" separator
                  (e.g. "RequeueExitHold=10-12,16").  These jobs are put in the JOB_SPE‐
2412              CIAL_EXIT  exit state.  Restarted jobs will have the environment
2413              variable SLURM_RESTART_COUNT set to the number of times the  job
2414              has been restarted.
2415
2416
2417       ResumeFailProgram
                  The program that will be executed when nodes fail to resume
                  by ResumeTimeout. The argument to the program will be the names
2420              of the failed nodes (using Slurm's hostlist expression format).
2421
2422
2423       ResumeProgram
2424              Slurm  supports a mechanism to reduce power consumption on nodes
2425              that remain idle for an extended period of time.  This is  typi‐
2426              cally accomplished by reducing voltage and frequency or powering
2427              the node down.  ResumeProgram is the program that will  be  exe‐
2428              cuted  when  a  node in power save mode is assigned work to per‐
2429              form.  For reasons of  reliability,  ResumeProgram  may  execute
2430              more  than once for a node when the slurmctld daemon crashes and
2431              is restarted.  If ResumeProgram is unable to restore a  node  to
2432              service  with  a  responding  slurmd and an updated BootTime, it
2433              should requeue any job associated with the node and set the node
2434              state  to  DOWN.  If the node isn't actually rebooted (i.e. when
2435              multiple-slurmd is configured) starting slurmd with "-b"  option
2436              might  be useful.  The program executes as SlurmUser.  The argu‐
2437              ment to the program will be the names of  nodes  to  be  removed
2438              from  power savings mode (using Slurm's hostlist expression for‐
2439              mat).  By default no  program  is  run.   Related  configuration
2440              options include ResumeTimeout, ResumeRate, SuspendRate, Suspend‐
2441              Time, SuspendTimeout, SuspendProgram, SuspendExcNodes, and  Sus‐
2442              pendExcParts.   More  information  is available at the Slurm web
2443              site ( https://slurm.schedmd.com/power_save.html ).
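
                  For example, a power saving sketch; the script paths are
                  hypothetical and the values are illustrative, not
                  recommendations:

                  SuspendProgram=/usr/local/slurm/suspend_nodes.sh  # hypothetical site script
                  SuspendTime=600                                   # idle time before power save
                  ResumeProgram=/usr/local/slurm/resume_nodes.sh    # hypothetical site script
                  ResumeRate=100                                    # nodes per minute
                  ResumeTimeout=300                                 # seconds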
2444
2445
2446       ResumeRate
2447              The rate at which nodes in power save mode are returned to  nor‐
2448              mal  operation  by  ResumeProgram.  The value is number of nodes
2449              per minute and it can be used to prevent power surges if a large
2450              number of nodes in power save mode are assigned work at the same
2451              time (e.g. a large job starts).  A value of zero results  in  no
2452              limits  being  imposed.   The  default  value  is  300 nodes per
2453              minute.  Related configuration  options  include  ResumeTimeout,
2454              ResumeProgram,  SuspendRate,  SuspendTime,  SuspendTimeout, Sus‐
2455              pendProgram, SuspendExcNodes, and SuspendExcParts.
2456
2457
2458       ResumeTimeout
2459              Maximum time permitted (in seconds) between when a  node  resume
2460              request  is  issued  and when the node is actually available for
2461              use.  Nodes which fail to respond in this  time  frame  will  be
2462              marked  DOWN and the jobs scheduled on the node requeued.  Nodes
2463              which reboot after this time frame will be marked  DOWN  with  a
2464              reason of "Node unexpectedly rebooted."  The default value is 60
2465              seconds.  Related configuration options  include  ResumeProgram,
2466              ResumeRate,  SuspendRate,  SuspendTime, SuspendTimeout, Suspend‐
2467              Program, SuspendExcNodes and SuspendExcParts.  More  information
2468              is     available     at     the     Slurm     web     site     (
2469              https://slurm.schedmd.com/power_save.html ).
2470
2471
2472       ResvEpilog
2473              Fully qualified pathname of a program for the slurmctld to  exe‐
2474              cute  when a reservation ends. The program can be used to cancel
2475              jobs, modify  partition  configuration,  etc.   The  reservation
2476              named  will be passed as an argument to the program.  By default
2477              there is no epilog.
2478
2479
2480       ResvOverRun
2481              Describes how long a job already running in a reservation should
2482              be  permitted  to  execute after the end time of the reservation
2483              has been reached.  The time period is specified in  minutes  and
2484              the  default  value  is 0 (kill the job immediately).  The value
2485              may not exceed 65533 minutes, although a value of "UNLIMITED" is
2486              supported to permit a job to run indefinitely after its reserva‐
2487              tion is terminated.
2488
2489
2490       ResvProlog
2491              Fully qualified pathname of a program for the slurmctld to  exe‐
2492              cute  when a reservation begins. The program can be used to can‐
2493              cel jobs, modify partition configuration, etc.  The  reservation
2494              named  will be passed as an argument to the program.  By default
2495              there is no prolog.
2496
2497
2498       ReturnToService
2499              Controls when a DOWN node will  be  returned  to  service.   The
2500              default value is 0.  Supported values include
2501
2502              0   A node will remain in the DOWN state until a system adminis‐
2503                  trator explicitly changes its state (even if the slurmd dae‐
2504                  mon registers and resumes communications).
2505
2506              1   A  DOWN node will become available for use upon registration
2507                  with a valid configuration only if it was set  DOWN  due  to
2508                  being  non-responsive.   If  the  node  was set DOWN for any
2509                  other reason (low  memory,  unexpected  reboot,  etc.),  its
2510                  state  will  not automatically be changed.  A node registers
2511                  with a valid configuration if its memory, GRES,  CPU  count,
2512                  etc.  are  equal to or greater than the values configured in
2513                  slurm.conf.
2514
2515              2   A DOWN node will become available for use upon  registration
2516                  with  a  valid  configuration.  The node could have been set
2517                  DOWN for any reason.  A node registers with a valid configu‐
2518                  ration  if its memory, GRES, CPU count, etc. are equal to or
2519                  greater than the values configured in slurm.conf.  (Disabled
2520                  on Cray ALPS systems.)
2521
2522
2523       RoutePlugin
2524              Identifies  the  plugin to be used for defining which nodes will
2525              be used for message forwarding.
2526
2527              route/default
2528                     default, use TreeWidth.
2529
2530              route/topology
2531                     use the switch hierarchy defined in a topology.conf file.
2532                     TopologyPlugin=topology/tree is required.
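
                  For example, to forward messages along the switch hierarchy
                  defined in topology.conf:

                  TopologyPlugin=topology/tree
                  RoutePlugin=route/topology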
2533
2534
2535       SbcastParameters
2536              Controls sbcast command behavior. Multiple options can be speci‐
2537              fied in a comma separated list.  Supported values include:
2538
2539              DestDir=       Destination directory for file being broadcast to
2540                             allocated  compute  nodes.  Default value is cur‐
2541                             rent working directory.
2542
2543              Compression=   Specify default file compression  library  to  be
2544                             used.   Supported  values  are  "lz4", "none" and
2545                             "zlib".  The default value with the sbcast --com‐
2546                             press option is "lz4" and "none" otherwise.  Some
2547                             compression libraries may be unavailable on  some
2548                             systems.
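
                  For example (the destination directory is illustrative):

                  SbcastParameters=DestDir=/tmp,Compression=lz4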
2549
2550
2551       SchedulerParameters
2552              The  interpretation  of  this parameter varies by SchedulerType.
2553              Multiple options may be comma separated.
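
                  For example, several of the options described below can be
                  combined into one comma-separated list (values are
                  illustrative, not recommendations):

                  SchedulerParameters=bf_continue,bf_interval=60,defer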
2554
2555              allow_zero_lic
2556                     If set, then job submissions requesting more than config‐
2557                     ured licenses won't be rejected.
2558
2559              assoc_limit_stop
2560                     If  set and a job cannot start due to association limits,
2561                     then do not attempt to initiate any lower  priority  jobs
2562                     in  that  partition.  Setting  this  can  decrease system
                         throughput and utilization, but avoids potentially
                         starving larger jobs that might otherwise be
                         prevented from launching indefinitely.
2566
2567              batch_sched_delay=#
2568                     How long, in seconds, the scheduling of batch jobs can be
2569                     delayed.   This  can be useful in a high-throughput envi‐
2570                     ronment in which batch jobs are submitted at a very  high
2571                     rate  (i.e.  using  the sbatch command) and one wishes to
2572                     reduce the overhead of attempting to schedule each job at
2573                     submit time.  The default value is 3 seconds.
2574
2575              bb_array_stage_cnt=#
2576                     Number of tasks from a job array that should be available
2577                     for burst buffer resource allocation. Higher values  will
2578                     increase  the  system  overhead as each task from the job
2579                     array will be moved to its own job record in  memory,  so
2580                     relatively  small  values are generally recommended.  The
2581                     default value is 10.
2582
2583              bf_busy_nodes
2584                     When selecting resources for pending jobs to reserve  for
2585                     future execution (i.e. the job can not be started immedi‐
2586                     ately), then preferentially select nodes that are in use.
2587                     This  will  tend to leave currently idle resources avail‐
2588                     able for backfilling longer running jobs, but may  result
2589                     in allocations having less than optimal network topology.
2590                     This  option  is  currently   only   supported   by   the
2591                     select/cons_res    and   select/cons_tres   plugins   (or
2592                     select/cray_aries  with   SelectTypeParameters   set   to
2593                     "OTHER_CONS_RES"  or  "OTHER_CONS_TRES", which layers the
2594                     select/cray_aries  plugin  over  the  select/cons_res  or
2595                     select/cons_tres plugin respectively).
2596
2597              bf_continue
2598                     The  backfill  scheduler  periodically  releases locks in
2599                     order to permit other operations to proceed  rather  than
2600                     blocking  all  activity  for  what  could  be an extended
2601                     period of time.  Setting this option will cause the back‐
2602                     fill  scheduler  to continue processing pending jobs from
2603                     its original job list after releasing locks even  if  job
2604                     or node state changes.
2605
2606              bf_hetjob_immediate
2607                     Instruct  the  backfill  scheduler  to attempt to start a
2608                     heterogeneous job as soon as all of  its  components  are
2609                     determined  able to do so. Otherwise, the backfill sched‐
                         uler will delay heterogeneous job initiation attempts
2611                     until  after  the  rest  of the queue has been processed.
2612                     This delay may result in lower priority jobs being  allo‐
2613                     cated  resources, which could delay the initiation of the
2614                     heterogeneous job due to account and/or QOS limits  being
2615                     reached.  This  option is disabled by default. If enabled
                         and bf_hetjob_prio=min is not set, then it will be
                         set automatically.
2618
2619              bf_hetjob_prio=[min|avg|max]
2620                     At  the  beginning  of  each backfill scheduling cycle, a
                         list of pending jobs to be scheduled is sorted according
2622                     to  the precedence order configured in PriorityType. This
2623                     option instructs the scheduler to alter the sorting algo‐
2624                     rithm to ensure that all components belonging to the same
2625                     heterogeneous job will be attempted to be scheduled  con‐
2626                     secutively  (thus  not fragmented in the resulting list).
2627                     More specifically, all components from the same heteroge‐
2628                     neous  job  will  be treated as if they all have the same
2629                     priority (minimum, average or maximum depending upon this
2630                     option's  parameter)  when  compared  with other jobs (or
2631                     other heterogeneous job components). The  original  order
2632                     will be preserved within the same heterogeneous job. Note
2633                     that the operation is  calculated  for  the  PriorityTier
2634                     layer  and  for  the  Priority  resulting from the prior‐
2635                     ity/multifactor plugin calculations. When enabled, if any
2636                     heterogeneous job requested an advanced reservation, then
2637                     all of that job's components will be treated as  if  they
2638                     had  requested an advanced reservation (and get preferen‐
2639                     tial treatment in scheduling).
2640
2641                     Note that this operation does  not  update  the  Priority
2642                     values  of  the  heterogeneous job components, only their
2643                     order within the list, so the output of the sprio command
                         will not be affected.
2645
2646                     Heterogeneous  jobs  have  special scheduling properties:
2647                     they are only scheduled by the backfill scheduling  plug‐
2648                     in,  each  of  their  components is considered separately
2649                     when reserving resources (and might have different Prior‐
2650                     ityTier  or  different Priority values), and no heteroge‐
2651                     neous job component is actually allocated resources until
                         all of its components can be initiated.  This may imply
2653                     potential scheduling deadlock  scenarios  because  compo‐
2654                     nents from different heterogeneous jobs can start reserv‐
2655                     ing resources in an  interleaved  fashion  (not  consecu‐
2656                     tively),  but  none of the jobs can reserve resources for
2657                     all components and start. Enabling this option  can  help
2658                     to mitigate this problem. By default, this option is dis‐
2659                     abled.
2660
2661              bf_interval=#
2662                     The  number  of  seconds  between  backfill   iterations.
2663                     Higher  values result in less overhead and better respon‐
2664                     siveness.   This  option  applies  only   to   Scheduler‐
2665                     Type=sched/backfill.   Default:  30,  Min:  1, Max: 10800
2666                     (3h).
2667
2668
2669              bf_job_part_count_reserve=#
2670                     The backfill scheduling logic will reserve resources  for
2671                     the specified count of highest priority jobs in each par‐
2672                     tition.  For example,  bf_job_part_count_reserve=10  will
2673                     cause the backfill scheduler to reserve resources for the
2674                     ten highest priority jobs in each partition.   Any  lower
2675                     priority  job  that can be started using currently avail‐
2676                     able resources and  not  adversely  impact  the  expected
2677                     start  time of these higher priority jobs will be started
                         by the backfill scheduler.  The default value is zero,
2679                     which  will  reserve  resources  for  any pending job and
2680                     delay  initiation  of  lower  priority  jobs.   Also  see
2681                     bf_min_age_reserve  and bf_min_prio_reserve.  Default: 0,
2682                     Min: 0, Max: 100000.
2683
2684
2685              bf_max_job_array_resv=#
2686                     The maximum number of tasks from a job  array  for  which
2687                     the  backfill  scheduler  will  reserve  resources in the
2688                     future.  Since job arrays can potentially  have  millions
2689                     of  tasks,  the  overhead  in reserving resources for all
2690                     tasks can be prohibitive.  In addition various limits may
2691                     prevent all the jobs from starting at the expected times.
2692                     This has no impact upon the number of tasks  from  a  job
2693                     array  that  can be started immediately, only those tasks
2694                     expected to start at some future time.  Default: 20, Min:
2695                     0,  Max:  1000.   NOTE: Jobs submitted to multiple parti‐
2696                     tions appear in the job queue once per partition. If dif‐
2697                     ferent copies of a single job array record aren't consec‐
2698                     utive in the job queue and another job array record is in
2699                     between,  then bf_max_job_array_resv tasks are considered
2700                     per partition that the job is submitted to.
2701
2702              bf_max_job_assoc=#
2703                     The maximum  number  of  jobs  per  user  association  to
2704                     attempt  starting with the backfill scheduler.  This set‐
2705                     ting is similar to bf_max_job_user but is handy if a user
2706                     has multiple associations equating to basically different
2707                     users.  One can set this  limit  to  prevent  users  from
                         flooding the backfill queue with jobs that cannot
                         start and that prevent jobs from other users from
                         starting.  This option applies only to
                         SchedulerType=sched/backfill.  Also see the
                         bf_max_job_user, bf_max_job_part,
2712                     bf_max_job_test  and bf_max_job_user_part=# options.  Set
2713                     bf_max_job_test   to   a   value   much    higher    than
2714                     bf_max_job_assoc.   Default:  0  (no limit), Min: 0, Max:
2715                     bf_max_job_test.
2716
2717              bf_max_job_part=#
2718                     The maximum number  of  jobs  per  partition  to  attempt
2719                     starting  with  the backfill scheduler. This can be espe‐
2720                     cially helpful for systems with large numbers  of  parti‐
2721                     tions  and  jobs.  This option applies only to Scheduler‐
2722                     Type=sched/backfill.  Also  see  the  partition_job_depth
2723                     and  bf_max_job_test  options.   Set bf_max_job_test to a
2724                     value much higher than bf_max_job_part.  Default:  0  (no
2725                     limit), Min: 0, Max: bf_max_job_test.
2726
2727              bf_max_job_start=#
2728                     The  maximum  number  of jobs which can be initiated in a
2729                     single iteration of the backfill scheduler.  This  option
2730                     applies only to SchedulerType=sched/backfill.  Default: 0
2731                     (no limit), Min: 0, Max: 10000.
2732
2733              bf_max_job_test=#
2734                     The maximum number of jobs to attempt backfill scheduling
2735                     for (i.e. the queue depth).  Higher values result in more
2736                     overhead and less responsiveness.  Until  an  attempt  is
2737                     made  to backfill schedule a job, its expected initiation
2738                     time value will not be set.  In the case of  large  clus‐
2739                     ters,  configuring a relatively small value may be desir‐
2740                     able.    This   option   applies   only   to   Scheduler‐
2741                     Type=sched/backfill.    Default:   100,   Min:   1,  Max:
2742                     1,000,000.
2743
2744              bf_max_job_user=#
2745                     The maximum number of jobs per user to  attempt  starting
2746                     with  the backfill scheduler for ALL partitions.  One can
2747                     set this limit to prevent users from flooding  the  back‐
2748                     fill  queue  with jobs that cannot start and that prevent
                         jobs from other users from starting.  This is similar to the
2750                     MAXIJOB  limit  in  Maui.   This  option  applies only to
2751                     SchedulerType=sched/backfill.      Also      see      the
2752                     bf_max_job_part,            bf_max_job_test           and
2753                     bf_max_job_user_part=# options.  Set bf_max_job_test to a
2754                     value  much  higher than bf_max_job_user.  Default: 0 (no
2755                     limit), Min: 0, Max: bf_max_job_test.
2756
2757              bf_max_job_user_part=#
2758                     The maximum number of jobs  per  user  per  partition  to
2759                     attempt starting with the backfill scheduler for any sin‐
2760                     gle partition.  This option applies  only  to  Scheduler‐
2761                     Type=sched/backfill.    Also   see  the  bf_max_job_part,
2762                     bf_max_job_test and bf_max_job_user=# options.   Default:
2763                     0 (no limit), Min: 0, Max: bf_max_job_test.
2764
2765              bf_max_time=#
2766                     The  maximum  time  in seconds the backfill scheduler can
2767                     spend (including  time  spent  sleeping  when  locks  are
2768                     released)  before  discontinuing,  even  if  maximum  job
2769                     counts have not been reached.  This option  applies  only
2770                     to  SchedulerType=sched/backfill.   The  default value is
2771                     the value of bf_interval (which defaults to 30  seconds).
2772                     Default:  bf_interval  value  (def. 30 sec), Min: 1, Max:
2773                     3600 (1h).  NOTE: If bf_interval is short and bf_max_time
2774                     is  large,  this  may cause locks to be acquired too fre‐
2775                     quently and starve out other serviced RPCs.  It's  advis‐
2776                     able  if  using  this  parameter  to set max_rpc_cnt high
2777                     enough that scheduling isn't  always  disabled,  and  low
2778                     enough that the interactive workload can get through in a
2779                     reasonable period of time. max_rpc_cnt needs to be  below
2780                     256  (the  default  RPC thread limit). Running around the
2781                     middle (150) may  give  you  good  results.   NOTE:  When
2782                     increasing  the  amount  of  time  spent  in the backfill
2783                     scheduling cycle, Slurm can be prevented from  responding
2784                     to  client  requests in a timely manner.  To address this
2785                     you can use max_rpc_cnt to specify  a  number  of  queued
2786                     RPCs  before  the  scheduler  stops  to  respond to these
2787                     requests.
2788
2789              bf_min_age_reserve=#
2790                     The backfill and main scheduling logic will  not  reserve
2791                     resources  for  pending jobs until they have been pending
2792                     and runnable for at least the specified  number  of  sec‐
2793                     onds.  In addition, jobs waiting for less than the speci‐
2794                     fied number of seconds will not prevent a newly submitted
2795                     job  from starting immediately, even if the newly submit‐
2796                     ted job has a lower priority.  This can  be  valuable  if
2797                     jobs  lack  time  limits or all time limits have the same
2798                     value.  The default value is  zero,  which  will  reserve
2799                     resources  for  any  pending  job and delay initiation of
2800                     lower priority jobs.  Also see  bf_job_part_count_reserve
2801                     and   bf_min_prio_reserve.   Default:  0,  Min:  0,  Max:
2802                     2592000 (30 days).
2803
2804              bf_min_prio_reserve=#
2805                     The backfill and main scheduling logic will  not  reserve
2806                     resources  for  pending  jobs unless they have a priority
2807                     equal to or higher than the specified  value.   In  addi‐
2808                     tion, jobs with a lower priority will not prevent a newly
2809                     submitted job from  starting  immediately,  even  if  the
2810                     newly  submitted  job  has a lower priority.  This can be
                         valuable if one wishes to maximize system utilization
2812                     without  regard  for job priority below a certain thresh‐
2813                     old.  The default  value  is  zero,  which  will  reserve
2814                     resources  for  any  pending  job and delay initiation of
2815                     lower priority jobs.  Also see  bf_job_part_count_reserve
2816                     and bf_min_age_reserve.  Default: 0, Min: 0, Max: 2^63.
2817
2818              bf_one_resv_per_job
2819                     Disallow  adding  more  than one backfill reservation per
2820                     job.  The scheduling logic builds a sorted list of  (job,
2821                     partition)  pairs.  Jobs submitted to multiple partitions
2822                     have as many entries in the list as requested partitions.
2823                     By  default,  the backfill scheduler may evaluate all the
2824                     (job, partition) entries for a  single  job,  potentially
2825                     reserving  resources for each pair, but only starting the
2826                     job in the reservation offering the earliest start  time.
2827                     Having a single job reserving resources for multiple par‐
2828                     titions could impede other jobs  (or  hetjob  components)
2829                     from  reserving resources already reserved for the reser‐
2830                     vations related to the partitions that  don't  offer  the
2831                     earliest  start time.  This option makes it so that a job
2832                     submitted to  multiple  partitions  will  stop  reserving
2833                     resources once the first (job, partition) pair has booked
2834                     a backfill reservation. Subsequent pairs  from  the  same
2835                     job  will  only  be  tested to start now. This allows for
                         other jobs to be able to book the other pairs' resources
2837                     at  the cost of not guaranteeing that the multi partition
2838                     job will start in the  partition  offering  the  earliest
2839                     start  time (except if it can start now).  This option is
2840                     disabled by default.
2841
2842
2843              bf_resolution=#
2844                     The number of seconds in the  resolution  of  data  main‐
2845                     tained  about  when  jobs  begin  and  end. Higher values
2846                     result in  better  responsiveness  and  quicker  backfill
2847                     cycles  by  using larger blocks of time to determine node
2848                     eligibility.  However, higher values lead to  less  effi‐
2849                     cient  system  planning,  and  may  miss opportunities to
2850                     improve system utilization.  This option applies only  to
2851                     SchedulerType=sched/backfill.   Default: 60, Min: 1, Max:
2852                     3600 (1 hour).
2853
2854              bf_running_job_reserve
2855                     Add an extra step to backfill logic, which creates  back‐
2856                     fill  reservations for jobs running on whole nodes.  This
2857                     option is disabled by default.
2858
2859              bf_window=#
2860                     The number of minutes into the future to look  when  con‐
2861                     sidering  jobs to schedule.  Higher values result in more
2862                     overhead and less responsiveness.  A value  at  least  as
2863                     long  as  the  highest  allowed  time  limit is generally
2864                     advisable to prevent job starvation.  In order  to  limit
2865                     the  amount of data managed by the backfill scheduler, if
2866                     the value of bf_window is increased, then it is generally
2867                     advisable  to  also  increase bf_resolution.  This option
2868                     applies only to  SchedulerType=sched/backfill.   Default:
2869                     1440 (1 day), Min: 1, Max: 43200 (30 days).
2870
2871              bf_window_linear=#
2872                     For  performance  reasons,  the  backfill  scheduler will
2873                     decrease precision in calculation of job expected  termi‐
2874                     nation times. By default, the precision starts at 30 sec‐
2875                     onds and that time interval doubles with each  evaluation
2876                     of currently executing jobs when trying to determine when
2877                     a pending job can start. This algorithm  can  support  an
2878                     environment  with many thousands of running jobs, but can
                         result in the expected start time of pending jobs
                         gradually being deferred due to a lack of precision.  A
2881                     value for bf_window_linear will cause the  time  interval
2882                     to  be  increased by a constant amount on each iteration.
2883                     The value is specified in units of seconds. For  example,
2884                     a  value  of  60 will cause the backfill scheduler on the
2885                     first iteration to identify the job  ending  soonest  and
2886                     determine  if  the  pending job can be started after that
2887                     job plus all other jobs expected to end within 30 seconds
2888                     (default  initial  value)  of  the first job. On the next
2889                     iteration, the pending job will be evaluated for starting
2890                     after  the  next job expected to end plus all jobs ending
2891                     within 90 seconds of that time (30 second  default,  plus
2892                     the  60  second  option value).  The third iteration will
2893                     have a 150 second window  and  the  fourth  210  seconds.
2894                     Without this option, the time windows will double on each
2895                     iteration and thus be 30, 60, 120, 240 seconds, etc.  The
2896                     use of bf_window_linear is not recommended with more than
2897                     a few hundred simultaneously executing jobs.
2898
2899              bf_yield_interval=#
2900                     The backfill scheduler will periodically relinquish locks
2901                     in  order  for  other  pending  operations to take place.
                         This specifies how frequently the locks are
                         relinquished, in microseconds.  Smaller values may be helpful for high
2904                     throughput computing when used in  conjunction  with  the
2905                     bf_continue  option.  Also see the bf_yield_sleep option.
2906                     Default: 2,000,000 (2 sec), Min: 1, Max:  10,000,000  (10
2907                     sec).
2908
2909              bf_yield_sleep=#
2910                     The backfill scheduler will periodically relinquish locks
2911                     in order for other  pending  operations  to  take  place.
2912                     This specifies the length of time for which the locks are
2913                     relinquished   in    microseconds.     Also    see    the
2914                     bf_yield_interval  option.   Default:  500,000 (0.5 sec),
2915                     Min: 1, Max: 10,000,000 (10 sec).
2916
2917              build_queue_timeout=#
2918                     Defines the maximum time that can be devoted to  building
2919                     a queue of jobs to be tested for scheduling.  If the sys‐
2920                     tem has a huge number of  jobs  with  dependencies,  just
2921                     building  the  job  queue  can  take  so  much time as to
2922                     adversely impact  overall  system  performance  and  this
2923                     parameter  can  be adjusted as needed.  The default value
2924                     is 2,000,000 microseconds (2 seconds).
2925
2926              correspond_after_task_cnt=#
2927                     Defines the number of array  tasks  that  get  split  for
                         potential aftercorr dependency checks.  A low number
                         may result in dependent task check failures when the
                         job one depends on gets purged before the split.
                         Default: 10.
2931
2932              default_queue_depth=#
2933                     The  default  number  of jobs to attempt scheduling (i.e.
2934                     the queue depth) when a running job  completes  or  other
                         routine actions occur; however, the frequency with which
2936                     the scheduler is run may be limited by using the defer or
2937                     sched_min_interval  parameters described below.  The full
2938                     queue will be tested on a less frequent basis as  defined
2939                     by the sched_interval option described below. The default
2940                     value is 100.   See  the  partition_job_depth  option  to
2941                     limit depth by partition.
2942
2943              defer  Setting  this  option  will  avoid attempting to schedule
2944                     each job individually at job submit time,  but  defer  it
2945                     until a later time when scheduling multiple jobs simulta‐
2946                     neously may be possible.  This option may improve  system
2947                     responsiveness when large numbers of jobs (many hundreds)
2948                     are submitted at the same time, but  it  will  delay  the
2949                     initiation    time   of   individual   jobs.   Also   see
2950                     default_queue_depth above.
2951
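                         For example, a site that receives large bursts of
                         job submissions might configure (illustrative values
                         only):

                             SchedulerParameters=defer,default_queue_depth=200
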
2952              delay_boot=#
                         Do not reboot nodes in order to satisfy this job's
                         feature specification if the job has been eligible to run
2955                     for less than this time period.  If the  job  has  waited
2956                     for  less  than  the  specified  period, it will use only
2957                     nodes which already have  the  specified  features.   The
2958                     argument  is  in  units  of minutes.  Individual jobs may
2959                     override this default value with the --delay-boot option.
2960
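                         For example, the following (illustrative value) lets
                         a job wait up to 10 minutes for nodes that already
                         have its requested features before any reboot is
                         considered:

                             SchedulerParameters=delay_boot=10
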
2961              disable_job_shrink
                         Deny user requests to shrink the size of running jobs.
2963                     (However, running jobs may still shrink due to node fail‐
2964                     ure if the --no-kill option was set.)
2965
2966              disable_hetjob_steps
2967                     Disable job steps that  span  heterogeneous  job  alloca‐
                         tions.  This is the default on Cray systems.
2969
2970              enable_hetjob_steps
2971                     Enable job steps that span heterogeneous job allocations.
                         This is the default except on Cray systems.
2973
2974              enable_user_top
2975                     Enable use of the "scontrol top"  command  by  non-privi‐
2976                     leged users.
2977
2978              Ignore_NUMA
2979                     Some  processors  (e.g.  AMD Opteron 6000 series) contain
2980                     multiple NUMA nodes per socket. This is  a  configuration
2981                     which  does not map into the hardware entities that Slurm
2982                     optimizes  resource  allocation  for  (PU/thread,   core,
2983                     socket,  baseboard, node and network switch). In order to
2984                     optimize resource allocations  on  such  hardware,  Slurm
2985                     will consider each NUMA node within the socket as a sepa‐
2986                     rate socket by default. Use  the  Ignore_NUMA  option  to
2987                     report   the  correct  socket  count,  but  not  optimize
2988                     resource allocations on the NUMA nodes.
2989
2990              inventory_interval=#
2991                     On a Cray system using Slurm on top of ALPS  this  limits
2992                     the number of times a Basil Inventory call is made.  Nor‐
2993                     mally this call happens every scheduling consideration to
                         attempt to close a node state change window with respect
2995                     to what ALPS has.  This call is rather slow, so making it
2996                     less frequently improves performance dramatically, but in
2997                     the situation where a node changes state the window is as
2998                     large  as  this setting.  In an HTC environment this set‐
2999                     ting is a must and we advise around 10 seconds.
3000
3001              max_array_tasks
                         Specify the maximum number of tasks that can be included in a
3003                     job  array.   The default limit is MaxArraySize, but this
3004                     option can be used to set a  lower  limit.  For  example,
3005                     max_array_tasks=1000 and MaxArraySize=100001 would permit
3006                     a maximum task ID of 100000,  but  limit  the  number  of
3007                     tasks in any single job array to 1000.
3008
3009              max_rpc_cnt=#
3010                     If  the  number of active threads in the slurmctld daemon
3011                     is equal to or larger than this value,  defer  scheduling
3012                     of  jobs. The scheduler will check this condition at cer‐
3013                     tain points in code and yield locks if  necessary.   This
3014                     can improve Slurm's ability to process requests at a cost
3015                     of  initiating  new  jobs  less  frequently.  Default:  0
3016                     (option disabled), Min: 0, Max: 1000.
3017
                         NOTE: The maximum number of threads (MAX_SERVER_THREADS)
                         is internally set to 256 and defines the number of
                         served RPCs at a given time.  Setting max_rpc_cnt to
                         more than 256 is only useful to let backfill continue
                         scheduling work after locks have been yielded (i.e.
                         every 2 seconds) when there are no more than
                         MAX(max_rpc_cnt/10, 20) RPCs in the queue.  For
                         example, with max_rpc_cnt=1000 the scheduler will be
                         allowed to continue after yielding locks only when
                         there are 100 or fewer pending RPCs.  If a value is
                         set, then a value of 10 or higher is recommended.  It
                         may require some tuning for each system, but needs to
                         be high enough that scheduling isn't always disabled,
                         and low enough that requests can get through in a
                         reasonable period of time.
3032
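                         For example (illustrative value, subject to the
                         tuning considerations above):

                             SchedulerParameters=max_rpc_cnt=150
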
3033              max_sched_time=#
3034                     How long, in seconds, that the main scheduling loop  will
3035                     execute for before exiting.  If a value is configured, be
3036                     aware that all other Slurm operations  will  be  deferred
3037                     during this time period.  Make certain the value is lower
3038                     than MessageTimeout.  If a value is not  explicitly  con‐
3039                     figured, the default value is half of MessageTimeout with
3040                     a minimum default value of 1 second and a maximum default
3041                     value  of  2  seconds.  For example if MessageTimeout=10,
3042                     the time limit will be 2 seconds (i.e. MIN(10/2, 2) = 2).
3043
3044              max_script_size=#
3045                     Specify the maximum size of a  batch  script,  in  bytes.
3046                     The  default  value  is  4  megabytes.  Larger values may
3047                     adversely impact system performance.
3048
3049              max_switch_wait=#
3050                     Maximum number of seconds that a job can delay  execution
3051                     waiting  for  the  specified  desired  switch  count. The
3052                     default value is 300 seconds.
3053
3054              no_backup_scheduling
3055                     If used, the backup controller  will  not  schedule  jobs
3056                     when it takes over. The backup controller will allow jobs
3057                     to be submitted, modified and cancelled but won't  sched‐
3058                     ule  new  jobs.  This is useful in Cray environments when
3059                     the backup controller resides on an external  Cray  node.
3060                     A  restart  is  required  to  alter  this option. This is
3061                     explicitly set on a Cray/ALPS system.
3062
3063              no_env_cache
                         If used, any job started on a node that fails to load
                         the environment will fail instead of using the cached
                         environment.  This also implies the
                         requeue_setup_env_fail option.
3068
3069              nohold_on_prolog_fail
3070                     By default, if the Prolog exits with a non-zero value the
3071                     job is requeued in  a  held  state.  By  specifying  this
3072                     parameter  the  job will be requeued but not held so that
3073                     the scheduler can dispatch it to another host.
3074
3075              pack_serial_at_end
3076                     If used  with  the  select/cons_res  or  select/cons_tres
3077                     plugin,  then put serial jobs at the end of the available
3078                     nodes rather than using a best fit algorithm.   This  may
3079                     reduce resource fragmentation for some workloads.
3080
3081              partition_job_depth=#
3082                     The  default  number  of jobs to attempt scheduling (i.e.
3083                     the queue depth) from  each  partition/queue  in  Slurm's
3084                     main  scheduling  logic.  The functionality is similar to
3085                     that provided by the bf_max_job_part option for the back‐
3086                     fill  scheduling  logic.   The  default  value  is  0 (no
                         limit).  Jobs excluded from attempted scheduling based
3088                     upon   partition   will   not   be  counted  against  the
3089                     default_queue_depth limit.  Also see the  bf_max_job_part
3090                     option.
3091
3092              permit_job_expansion
3093                     Allow  running jobs to request additional nodes be merged
3094                     in with the current job allocation.
3095
3096              preempt_reorder_count=#
                         Specify how many attempts should be made in reordering pre‐
3098                     emptable  jobs  to  minimize the count of jobs preempted.
3099                     The default value is 1. High values may adversely  impact
3100                     performance.   The  logic  to support this option is only
3101                     available in  the  select/cons_res  and  select/cons_tres
3102                     plugins.
3103
3104              preempt_strict_order
3105                     If set, then execute extra logic in an attempt to preempt
3106                     only the lowest priority jobs.  It may  be  desirable  to
3107                     set  this configuration parameter when there are multiple
3108                     priorities of preemptable jobs.   The  logic  to  support
3109                     this  option is only available in the select/cons_res and
3110                     select/cons_tres plugins.
3111
3112              preempt_youngest_first
3113                     If set, then the preemption  sorting  algorithm  will  be
3114                     changed  to sort by the job start times to favor preempt‐
3115                     ing younger jobs  over  older.  (Requires  preempt/parti‐
3116                     tion_prio or preempt/qos plugins.)
3117
3118              reduce_completing_frag
3119                     This   option  is  used  to  control  how  scheduling  of
3120                     resources is performed when jobs are  in  the  COMPLETING
3121                     state, which influences potential fragmentation.  If this
3122                     option is not set then no jobs will  be  started  in  any
3123                     partition  when  any  job  is in the COMPLETING state for
3124                     less than CompleteWait seconds.  If this  option  is  set
3125                     then  no jobs will be started in any individual partition
3126                     that has a job in COMPLETING state  for  less  than  Com‐
3127                     pleteWait  seconds.  In addition, no jobs will be started
3128                     in any partition with nodes that overlap with  any  nodes
3129                     in  the  partition of the completing job.  This option is
3130                     to be used in conjunction with CompleteWait.
3131
3132                     NOTE: CompleteWait must be set in order for this to work.
3133                     If CompleteWait=0 then this option does nothing.
3134
3135                     NOTE: reduce_completing_frag only affects the main sched‐
3136                     uler, not the backfill scheduler.
3137
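                         For example, the option might be combined with a
                         non-zero CompleteWait (the value shown is
                         illustrative):

                             CompleteWait=32
                             SchedulerParameters=reduce_completing_frag
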
3138              requeue_setup_env_fail
3139                     By default if a job environment setup fails the job keeps
3140                     running  with  a  limited environment. By specifying this
3141                     parameter the job will be requeued in held state and  the
3142                     execution node drained.
3143
3144              salloc_wait_nodes
3145                     If  defined, the salloc command will wait until all allo‐
3146                     cated nodes are ready for use (i.e.  booted)  before  the
3147                     command  returns.  By default, salloc will return as soon
3148                     as the resource allocation has been made.
3149
3150              sbatch_wait_nodes
3151                     If defined, the sbatch script will wait until  all  allo‐
3152                     cated  nodes  are  ready for use (i.e. booted) before the
3153                     initiation. By default, the sbatch script will be  initi‐
3154                     ated  as  soon as the first node in the job allocation is
3155                     ready. The sbatch command can  use  the  --wait-all-nodes
3156                     option to override this configuration parameter.
3157
3158              sched_interval=#
3159                     How frequently, in seconds, the main scheduling loop will
3160                     execute and test all pending jobs.  The default value  is
3161                     60 seconds.
3162
3163              sched_max_job_start=#
3164                     The maximum number of jobs that the main scheduling logic
3165                     will start in any single execution.  The default value is
3166                     zero, which imposes no limit.
3167
3168              sched_min_interval=#
3169                     How frequently, in microseconds, the main scheduling loop
3170                     will execute and test any pending  jobs.   The  scheduler
3171                     runs  in a limited fashion every time that any event hap‐
3172                     pens which could enable a job to start (e.g. job  submit,
3173                     job  terminate,  etc.).  If these events happen at a high
3174                     frequency, the scheduler can run very frequently and con‐
3175                     sume  significant  resources  if  not  throttled  by this
3176                     option.  This option specifies the minimum  time  between
3177                     the  end of one scheduling cycle and the beginning of the
3178                     next scheduling cycle.  A  value  of  zero  will  disable
3179                     throttling of the scheduling logic interval.  The default
3180                     value is 1,000,000 microseconds on Cray/ALPS systems  and
3181                     2 microseconds on other systems.
3182
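                         For example, a heavily loaded system might throttle
                         and bound the main scheduler with (illustrative
                         values only):

                             SchedulerParameters=sched_min_interval=2000000,sched_interval=120,sched_max_job_start=100
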
3183              spec_cores_first
3184                     Specialized  cores  will be selected from the first cores
3185                     of the first sockets, cycling through the  sockets  on  a
3186                     round robin basis.  By default, specialized cores will be
3187                     selected from the last cores of the last sockets, cycling
3188                     through the sockets on a round robin basis.
3189
3190              step_retry_count=#
3191                     When a step completes and there are steps ending resource
3192                     allocation, then retry step allocations for at least this
3193                     number  of pending steps.  Also see step_retry_time.  The
3194                     default value is 8 steps.
3195
3196              step_retry_time=#
3197                     When a step completes and there are steps ending resource
3198                     allocation,  then  retry  step  allocations for all steps
3199                     which have been pending for at least this number of  sec‐
3200                     onds.   Also  see step_retry_count.  The default value is
3201                     60 seconds.
3202
3203              whole_hetjob
3204                     Requests to cancel, hold or release any  component  of  a
3205                     heterogeneous  job  will  be applied to all components of
3206                     the job.
3207
                         NOTE: This option was previously named whole_pack and
                         is still supported for backward compatibility.
3210
3211
3212       SchedulerTimeSlice
3213              Number  of  seconds  in  each time slice when gang scheduling is
3214              enabled (PreemptMode=SUSPEND,GANG).  The value must be between 5
3215              seconds and 65533 seconds.  The default value is 30 seconds.
3216
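              For example, gang scheduling with a longer time slice might be
              configured as (illustrative value):

                  PreemptMode=SUSPEND,GANG
                  SchedulerTimeSlice=60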
3217
3218       SchedulerType
3219              Identifies the type of scheduler to be used.  Note the slurmctld
3220              daemon must be restarted for  a  change  in  scheduler  type  to
3221              become  effective  (reconfiguring a running daemon has no effect
3222              for this parameter).  The scontrol command can be used to  manu‐
3223              ally  change  job  priorities  if  desired.   Acceptable  values
3224              include:
3225
3226              sched/backfill
3227                     For a backfill scheduling module to augment  the  default
3228                     FIFO   scheduling.   Backfill  scheduling  will  initiate
3229                     lower-priority jobs  if  doing  so  does  not  delay  the
3230                     expected  initiation  time  of  any  higher priority job.
3231                     Effectiveness of backfill scheduling  is  dependent  upon
3232                     users specifying job time limits, otherwise all jobs will
3233                     have the same time limit and backfilling  is  impossible.
3234                     Note  documentation  for  the  SchedulerParameters option
3235                     above.  This is the default configuration.
3236
3237              sched/builtin
3238                     This is the FIFO scheduler which initiates jobs in prior‐
3239                     ity order.  If any job in the partition can not be sched‐
3240                     uled, no lower priority job in  that  partition  will  be
3241                     scheduled.   An  exception  is made for jobs that can not
3242                     run due to partition constraints (e.g. the time limit) or
3243                     down/drained  nodes.   In  that case, lower priority jobs
3244                     can be initiated and not impact the higher priority job.
3245
3246              sched/hold
                         To hold all newly arriving jobs if the file
                         "/etc/slurm.hold" exists; otherwise, use the built-in
                         FIFO scheduler.
3250
3251
3252       ScronParameters
3253              Multiple options may be comma-separated.
3254
3255              enable Enable the use of scrontab to submit and manage  periodic
3256                     repeating jobs.
3257
3258
3259       SelectType
3260              Identifies  the type of resource selection algorithm to be used.
3261              Changing this value can only be done by restarting the slurmctld
3262              daemon.  When changed, all job information (running and pending)
3263              will be lost, since the job state save format used by each plug‐
3264              in  is  different.   The only exception to this is when changing
3265              from cons_res to cons_tres or from cons_tres to  cons_res.  How‐
3266              ever,  if  a  job  contains cons_tres-specific features and then
3267              SelectType is changed to cons_res, the  job  will  be  canceled,
3268              since  there is no way for cons_res to satisfy requirements spe‐
3269              cific to cons_tres.
3270
3271              Acceptable values include
3272
3273              select/cons_res
3274                     The resources (cores and memory) within a node are  indi‐
3275                     vidually  allocated  as  consumable resources.  Note that
3276                     whole nodes can be allocated to jobs for selected  parti‐
3277                     tions  by  using the OverSubscribe=Exclusive option.  See
3278                     the partition OverSubscribe parameter for  more  informa‐
3279                     tion.
3280
3281              select/cons_tres
3282                     The  resources  (cores, memory, GPUs and all other track‐
3283                     able resources) within a node are individually  allocated
3284                     as  consumable  resources.   Note that whole nodes can be
3285                     allocated to jobs for selected partitions  by  using  the
3286                     OverSubscribe=Exclusive  option.  See the partition Over‐
3287                     Subscribe parameter for more information.
3288
3289              select/cray_aries
3290                     for   a   Cray   system.    The    default    value    is
3291                     "select/cray_aries" for all Cray systems.
3292
3293              select/linear
3294                     for allocation of entire nodes assuming a one-dimensional
3295                     array of nodes in which sequentially  ordered  nodes  are
3296                     preferable.   For a heterogeneous cluster (e.g. different
3297                     CPU counts on the various  nodes),  resource  allocations
3298                     will  favor  nodes  with  high CPU counts as needed based
3299                     upon the job's node and CPU specification if TopologyPlu‐
3300                     gin=topology/none  is  configured.  Use of other topology
3301                     plugins with select/linear and heterogeneous nodes is not
3302                     recommended  and  may  result  in  valid  job  allocation
3303                     requests being rejected.  This is the default value.
3304
3305
3306       SelectTypeParameters
3307              The permitted values of  SelectTypeParameters  depend  upon  the
3308              configured  value of SelectType.  The only supported options for
3309              SelectType=select/linear are CR_ONE_TASK_PER_CORE and CR_Memory,
3310              which treats memory as a consumable resource and prevents memory
3311              over subscription with job preemption or  gang  scheduling.   By
3312              default  SelectType=select/linear  allocates whole nodes to jobs
3313              without  considering  their  memory  consumption.   By   default
3314              SelectType=select/cons_res,   SelectType=select/cray_aries,  and
3315              SelectType=select/cons_tres, use  CR_CPU,  which  allocates  CPU
3316              (threads) to jobs without considering their memory consumption.
3317
3318              The    following    options    are    supported    for   Select‐
3319              Type=select/cray_aries:
3320
3321                     OTHER_CONS_RES
                                Layer the select/cons_res plugin under the
                                select/cray_aries plugin; the default is to
                                layer on select/linear.  This also allows all
                                the options available for SelectType=select/cons_res.
3326
3327                     OTHER_CONS_TRES
                                Layer the select/cons_tres plugin under the
                                select/cray_aries plugin; the default is to
                                layer on select/linear.  This also allows all
                                the options available for SelectType=select/cons_tres.
3332
3333              The   following   options   are   supported   by   the   Select‐
3334              Type=select/cons_res and SelectType=select/cons_tres plugins:
3335
3336                     CR_CPU CPUs are consumable resources.  Configure the num‐
3337                            ber of CPUs on each node, which may  be  equal  to
3338                            the  count  of  cores or hyper-threads on the node
3339                            depending upon the desired minimum resource  allo‐
3340                            cation.   The  node's  Boards,  Sockets, CoresPer‐
3341                            Socket and ThreadsPerCore may optionally  be  con‐
3342                            figured  and  result in job allocations which have
3343                            improved locality; however doing so  will  prevent
3344                            more  than  one  job  from being allocated on each
3345                            core.
3346
3347                     CR_CPU_Memory
3348                            CPUs and memory are consumable resources.  Config‐
3349                            ure  the number of CPUs on each node, which may be
3350                            equal to the count of cores  or  hyper-threads  on
3351                            the   node  depending  upon  the  desired  minimum
3352                            resource allocation.  The node's Boards,  Sockets,
3353                            CoresPerSocket  and  ThreadsPerCore may optionally
3354                            be configured and result in job allocations  which
3355                            have improved locality; however doing so will pre‐
3356                            vent more than one job  from  being  allocated  on
3357                            each  core.   Setting  a value for DefMemPerCPU is
3358                            strongly recommended.
3359
3360                     CR_Core
3361                            Cores are consumable  resources.   On  nodes  with
3362                            hyper-threads,  each thread is counted as a CPU to
3363                            satisfy a job's resource requirement, but multiple
3364                            jobs  are  not allocated threads on the same core.
3365                            The count of CPUs allocated to a job is rounded up
3366                            to  account  for  every  CPU on an allocated core.
                                This will also impact the total allocated
                                memory when --mem-per-cpu is used, making it a
                                multiple of the total number of CPUs on
                                allocated cores.
3370
3371                     CR_Core_Memory
3372                            Cores and memory  are  consumable  resources.   On
3373                            nodes  with  hyper-threads, each thread is counted
3374                            as a CPU to satisfy a job's resource  requirement,
3375                            but multiple jobs are not allocated threads on the
3376                            same core.  The count of CPUs allocated to  a  job
3377                            may  be  rounded up to account for every CPU on an
3378                            allocated core.  Setting a value for  DefMemPerCPU
3379                            is strongly recommended.
3380
3381                     CR_ONE_TASK_PER_CORE
3382                            Allocate  one  task  per core by default.  Without
3383                            this option, by default one task will be allocated
3384                            per thread on nodes with more than one ThreadsPer‐
3385                            Core configured.  NOTE: This option cannot be used
3386                            with CR_CPU*.
3387
3388                     CR_CORE_DEFAULT_DIST_BLOCK
3389                            Allocate cores within a node using block distribu‐
3390                            tion by default.  This is a pseudo-best-fit  algo‐
3391                            rithm that minimizes the number of boards and min‐
3392                            imizes  the  number  of  sockets  (within  minimum
3393                            boards)  used  for  the  allocation.  This default
                                behavior can be overridden by specifying a
                                particular "-m" parameter with
                                srun/salloc/sbatch.  Without this option,
                                cores will be allocated cyclically across the
                                sockets.
3398
3399                     CR_LLN Schedule  resources  to  jobs  on the least loaded
3400                            nodes (based upon the number of idle  CPUs).  This
3401                            is  generally  only recommended for an environment
3402                            with serial jobs as idle resources will tend to be
3403                            highly  fragmented,  resulting  in  parallel  jobs
3404                            being distributed across many  nodes.   Note  that
3405                            node  Weight  takes  precedence over how many idle
                                resources are on each node.  Also see the
                                partition configuration parameter LLN to use
                                the least loaded nodes in selected partitions.
3409
3410                     CR_Pack_Nodes
3411                            If a job allocation contains more  resources  than
3412                            will  be  used  for launching tasks (e.g. if whole
3413                            nodes are allocated to a job),  then  rather  than
3414                            distributing a job's tasks evenly across its allo‐
3415                            cated nodes, pack them as tightly as  possible  on
3416                            these  nodes.  For example, consider a job alloca‐
3417                            tion containing two entire nodes with  eight  CPUs
3418                            each.   If  the  job starts ten tasks across those
3419                            two nodes without this option, it will start  five
3420                            tasks on each of the two nodes.  With this option,
3421                            eight tasks will be started on the first node  and
3422                            two  tasks on the second node.  This can be super‐
3423                            seded  by  "NoPack"  in  srun's   "--distribution"
3424                            option.    CR_Pack_Nodes  only  applies  when  the
3425                            "block" task distribution method is used.
3426
3427                     CR_Socket
3428                            Sockets are consumable resources.  On  nodes  with
3429                            multiple  cores, each core or thread is counted as
3430                            a CPU to satisfy a job's resource requirement, but
3431                            multiple  jobs  are not allocated resources on the
3432                            same socket.
3433
3434                     CR_Socket_Memory
3435                            Memory and sockets are consumable  resources.   On
3436                            nodes  with multiple cores, each core or thread is
3437                            counted as a  CPU  to  satisfy  a  job's  resource
3438                            requirement,  but  multiple jobs are not allocated
3439                            resources on the same socket.  Setting a value for
3440                            DefMemPerCPU is strongly recommended.
3441
3442                     CR_Memory
3443                            Memory  is  a  consumable  resource.   NOTE:  This
3444                            implies OverSubscribe=YES  or  OverSubscribe=FORCE
3445                            for  all  partitions.  Setting a value for DefMem‐
3446                            PerCPU is strongly recommended.
3447
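              For example, a cluster allocating cores and memory individually
              while tracking GPUs might use (illustrative values, not a
              recommendation):

                  SelectType=select/cons_tres
                  SelectTypeParameters=CR_Core_Memory
                  DefMemPerCPU=2048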
3448
3449       SlurmctldAddr
3450              An optional address to be used for communications  to  the  cur‐
3451              rently  active  slurmctld  daemon, normally used with Virtual IP
3452              addressing of the currently active server.  If this parameter is
3453              not  specified then each primary and backup server will have its
3454              own unique address used for communications as specified  in  the
3455              SlurmctldHost  parameter.   If  this parameter is specified then
3456              the SlurmctldHost parameter will still be  used  for  communica‐
3457              tions to specific slurmctld primary or backup servers, for exam‐
3458              ple to cause all of them to read the current configuration files
              or shutdown.  Also see the SlurmctldPrimaryOffProg and
              SlurmctldPrimaryOnProg configuration parameters to configure
              programs that manipulate the virtual IP address.
3462
3463
3464       SlurmctldDebug
              The level of detail to provide in the slurmctld daemon's logs.
              The default value is info.  If the slurmctld daemon is initiated
              with the -v or --verbose options, that debug level will be
              preserved and restored upon reconfiguration.
3469
3470
3471              quiet     Log nothing
3472
3473              fatal     Log only fatal errors
3474
3475              error     Log only errors
3476
3477              info      Log errors and general informational messages
3478
3479              verbose   Log errors and verbose informational messages
3480
3481              debug     Log errors  and  verbose  informational  messages  and
3482                        debugging messages
3483
3484              debug2    Log errors and verbose informational messages and more
3485                        debugging messages
3486
3487              debug3    Log errors and verbose informational messages and even
3488                        more debugging messages
3489
3490              debug4    Log errors and verbose informational messages and even
3491                        more debugging messages
3492
3493              debug5    Log errors and verbose informational messages and even
3494                        more debugging messages
3495
3496
3497       SlurmctldHost
3498              The  short, or long, hostname of the machine where Slurm control
3499              daemon is executed (i.e. the name returned by the command "host‐
3500              name -s").  This hostname is optionally followed by the address,
3501              either the IP address or a name by  which  the  address  can  be
              identified, enclosed in parentheses (e.g. SlurmctldHost=slurm‐
3503              ctl-primary(12.34.56.78)). This value must be specified at least
3504              once. If specified more than once, the first hostname named will
3505              be where the daemon runs.  If the first  specified  host  fails,
3506              the  daemon  will execute on the second host.  If both the first
              and second specified hosts fail, the daemon will execute on the
3508              third host.
3509
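              For example, a primary controller with a single backup might be
              configured as follows (the backup hostname and both addresses
              are illustrative):

                  SlurmctldHost=slurmctl-primary(12.34.56.78)
                  SlurmctldHost=slurmctl-backup(12.34.56.79)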
3510
3511       SlurmctldLogFile
3512              Fully qualified pathname of a file into which the slurmctld dae‐
3513              mon's logs are written.  The default  value  is  none  (performs
3514              logging via syslog).
3515              See the section LOGGING if a pathname is specified.
3516
3517
3518       SlurmctldParameters
3519              Multiple options may be comma-separated.
3520
3521
3522              allow_user_triggers
3523                     Permit  setting  triggers from non-root/slurm_user users.
3524                     SlurmUser must also be set to root to permit these  trig‐
3525                     gers  to  work.  See the strigger man page for additional
3526                     details.
3527
3528              cloud_dns
3529                     By default, Slurm expects that the network address for  a
3530                     cloud  node won't be known until the creation of the node
3531                     and that Slurm will be notified  of  the  node's  address
3532                     (e.g.  scontrol  update nodename=<name> nodeaddr=<addr>).
3533                     Since Slurm communications rely on the node configuration
3534                     found  in the slurm.conf, Slurm will tell the client com‐
                         mand, after waiting for all nodes to boot, each node's IP
3536                     address.  However, in environments where the nodes are in
3537                     DNS, this step can be avoided by configuring this option.
3538
3539              cloud_reg_addrs
3540                     When a cloud node  registers,  the  node's  NodeAddr  and
3541                     NodeHostName  will  automatically  be  set.  They will be
3542                     reset back to the nodename after powering off.
3543
3544              enable_configless
3545                     Permit "configless" operation by the slurmd,  slurmstepd,
3546                     and  user commands.  When enabled the slurmd will be per‐
3547                     mitted to retrieve config files from the  slurmctld,  and
3548                     on any 'scontrol reconfigure' command new configs will be
3549                     automatically pushed out and applied to  nodes  that  are
3550                     running  in  this  "configless" mode.  NOTE: a restart of
3551                     the slurmctld is required for this to take effect.
3552
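                     For example (remember that a slurmctld restart is
                     required for the option to take effect):

                         SlurmctldParameters=enable_configless
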
3553              idle_on_node_suspend
3554                     Mark nodes as idle, regardless  of  current  state,  when
3555                     suspending  nodes  with SuspendProgram so that nodes will
3556                     be eligible to be resumed at a later time.
3557
3558              power_save_interval
3559                     How often the power_save thread looks to resume and  sus‐
3560                     pend  nodes. The power_save thread will do work sooner if
3561                     there are node state changes. Default is 10 seconds.
3562
3563              power_save_min_interval
                     How often the power_save thread, at a minimum, looks to
3565                     resume and suspend nodes. Default is 0.
3566
3567              max_dbd_msg_action
3568                     Action used once MaxDBDMsgs is reached, options are 'dis‐
3569                     card' (default) and 'exit'.
3570
                     When 'discard' is specified and MaxDBDMsgs is reached,
                     pending messages of types Step start and complete are
                     purged first; if MaxDBDMsgs is reached again, Job start
                     messages are purged.  Job completes and node state
                     changes continue to consume the space freed by the
                     purges until MaxDBDMsgs is reached once more, at which
                     point no new message is tracked, creating data loss and
                     potentially runaway jobs.
3579
3580                     When  'exit'  is  specified and MaxDBDMsgs is reached the
3581                     slurmctld will exit instead of discarding  any  messages.
3582                     It  will  be  impossible to start the slurmctld with this
                     option when the slurmdbd is down and the slurmctld is
3584                     tracking more than MaxDBDMsgs.
3585
3586
3587              preempt_send_user_signal
3588                     Send the user signal (e.g. --signal=<sig_num>) at preemp‐
3589                     tion time even if the signal time hasn't been reached. In
3590                     the  case  of a gracetime preemption the user signal will
3591                     be sent if the user signal has  been  specified  and  not
3592                     sent, otherwise a SIGTERM will be sent to the tasks.
3593
3594              reboot_from_controller
3595                     Run  the  RebootProgram from the controller instead of on
3596                     the slurmds. The RebootProgram will be  passed  a  comma-
3597                     separated list of nodes to reboot.
3598
3599              user_resv_delete
3600                     Allow any user able to run in a reservation to delete it.
3601
3602
3603       SlurmctldPidFile
3604              Fully  qualified  pathname  of  a file into which the  slurmctld
3605              daemon may write its process id. This may be used for  automated
3606              signal   processing.   The  default  value  is  "/var/run/slurm‐
3607              ctld.pid".
3608
3609
3610       SlurmctldPlugstack
3611              A comma delimited list of Slurm controller plugins to be started
3612              when  the  daemon  begins and terminated when it ends.  Only the
3613              plugin's init and fini functions are called.
3614
3615
3616       SlurmctldPort
3617              The port number that the Slurm controller, slurmctld, listens to
3618              for  work. The default value is SLURMCTLD_PORT as established at
3619              system build time. If none is explicitly specified, it  will  be
3620              set  to 6817.  SlurmctldPort may also be configured to support a
3621              range of port numbers in order to accept larger bursts of incom‐
3622              ing messages by specifying two numbers separated by a dash (e.g.
              SlurmctldPort=6817-6818).  NOTE: Either the slurmctld and slurmd
              daemons must not execute on the same nodes, or the values of
              SlurmctldPort and SlurmdPort must be different.
3626
3627              Note: On Cray systems, Realm-Specific IP Addressing (RSIP)  will
3628              automatically  try  to  interact  with  anything opened on ports
3629              8192-60000.  Configure SlurmctldPort to use a  port  outside  of
3630              the configured SrunPortRange and RSIP's port range.
3631
3632
3633       SlurmctldPrimaryOffProg
3634              This  program is executed when a slurmctld daemon running as the
3635              primary server becomes a backup server. By default no program is
3636              executed.  See also the related "SlurmctldPrimaryOnProg" parame‐
3637              ter.
3638
3639
3640       SlurmctldPrimaryOnProg
3641              This program is executed when a slurmctld daemon  running  as  a
3642              backup  server becomes the primary server. By default no program
              is executed.  When using virtual IP addresses to manage Highly
3644              Available Slurm services, this program can be used to add the IP
3645              address to an interface (and optionally try to  kill  the  unre‐
3646              sponsive   slurmctld daemon and flush the ARP caches on nodes on
3647              the local ethernet fabric).  See also the related "SlurmctldPri‐
3648              maryOffProg" parameter.
3649
3650       SlurmctldSyslogDebug
3651              The  slurmctld  daemon will log events to the syslog file at the
              specified level of detail.  If not set, the slurmctld daemon
              will log to syslog at level fatal, unless there is no
              SlurmctldLogFile and it is running in the background, in which
              case it will log to syslog at the level specified by
              SlurmctldDebug (at fatal if SlurmctldDebug is set to quiet), or
              it is run in the foreground, in which case it will be set to
              quiet.
3658
3659
3660              quiet     Log nothing
3661
3662              fatal     Log only fatal errors
3663
3664              error     Log only errors
3665
3666              info      Log errors and general informational messages
3667
3668              verbose   Log errors and verbose informational messages
3669
3670              debug     Log  errors  and  verbose  informational  messages and
3671                        debugging messages
3672
3673              debug2    Log errors and verbose informational messages and more
3674                        debugging messages
3675
3676              debug3    Log errors and verbose informational messages and even
3677                        more debugging messages
3678
3679              debug4    Log errors and verbose informational messages and even
3680                        more debugging messages
3681
3682              debug5    Log errors and verbose informational messages and even
3683                        more debugging messages
3684
3685
3686
3687       SlurmctldTimeout
3688              The interval, in seconds, that the backup controller  waits  for
3689              the  primary controller to respond before assuming control.  The
3690              default value is 120 seconds.  May not exceed 65533.
3691
3692
3693       SlurmdDebug
3694              The level of  detail  to  provide  slurmd  daemon's  logs.   The
3695              default value is info.
3696
3697              quiet     Log nothing
3698
3699              fatal     Log only fatal errors
3700
3701              error     Log only errors
3702
3703              info      Log errors and general informational messages
3704
3705              verbose   Log errors and verbose informational messages
3706
3707              debug     Log  errors  and  verbose  informational  messages and
3708                        debugging messages
3709
3710              debug2    Log errors and verbose informational messages and more
3711                        debugging messages
3712
3713              debug3    Log errors and verbose informational messages and even
3714                        more debugging messages
3715
3716              debug4    Log errors and verbose informational messages and even
3717                        more debugging messages
3718
3719              debug5    Log errors and verbose informational messages and even
3720                        more debugging messages
3721
3722
3723       SlurmdLogFile
3724              Fully qualified pathname of a file into which the   slurmd  dae‐
3725              mon's  logs  are  written.   The default value is none (performs
3726              logging via syslog).  Any "%h" within the name is replaced  with
3727              the  hostname  on  which the slurmd is running.  Any "%n" within
3728              the name is replaced with the  Slurm  node  name  on  which  the
3729              slurmd is running.
3730              See the section LOGGING if a pathname is specified.
3731
3732
3733       SlurmdParameters
3734              Parameters  specific  to  the  Slurmd.   Multiple options may be
3735              comma separated.
3736
3737              config_overrides
3738                     If set, consider the configuration of  each  node  to  be
3739                     that  specified  in the slurm.conf configuration file and
3740                     any node with less than the configured resources will not
3741                     be  set  DRAIN.  This option is generally only useful for
3742                     testing  purposes.   Equivalent  to  the  now  deprecated
3743                     FastSchedule=2 option.
3744
3745              shutdown_on_reboot
3746                     If  set,  the  Slurmd will shut itself down when a reboot
3747                     request is received.
3748
3749
3750       SlurmdPidFile
3751              Fully qualified pathname of a file into which the  slurmd daemon
3752              may  write its process id. This may be used for automated signal
3753              processing.  Any "%h" within the name is replaced with the host‐
3754              name  on  which the slurmd is running.  Any "%n" within the name
3755              is replaced with the Slurm node name on which the slurmd is run‐
3756              ning.  The default value is "/var/run/slurmd.pid".
3757
3758
3759       SlurmdPort
3760              The port number that the Slurm compute node daemon, slurmd, lis‐
3761              tens to for work. The default value  is  SLURMD_PORT  as  estab‐
3762              lished  at  system  build time. If none is explicitly specified,
              its value will be 6818.  NOTE: Either the slurmctld and slurmd
              daemons must not execute on the same nodes, or the values of
              SlurmctldPort and SlurmdPort must be different.
3766
3767              Note: On Cray systems, Realm-Specific IP Addressing (RSIP)  will
3768              automatically  try  to  interact  with  anything opened on ports
3769              8192-60000.  Configure SlurmdPort to use a port outside  of  the
3770              configured SrunPortRange and RSIP's port range.
3771
3772
3773       SlurmdSpoolDir
3774              Fully  qualified  pathname  of a directory into which the slurmd
3775              daemon's state information and batch job script information  are
3776              written.  This  must  be  a  common  pathname for all nodes, but
3777              should represent a directory which is local to each node (refer‐
3778              ence    a   local   file   system).   The   default   value   is
3779              "/var/spool/slurmd".  Any "%h" within the name is replaced  with
3780              the  hostname  on  which the slurmd is running.  Any "%n" within
3781              the name is replaced with the  Slurm  node  name  on  which  the
3782              slurmd is running.
3783
3784
3785       SlurmdSyslogDebug
3786              The  slurmd  daemon  will  log  events to the syslog file at the
              log to syslog at level fatal, unless there is no SlurmdLogFile
              and it is running in the background, in which case it will log
              to syslog at the level specified by SlurmdDebug (at fatal if
              SlurmdDebug is set to quiet), or it is run in the foreground, in
              which case it will be set to quiet.
3793
3794
3795              quiet     Log nothing
3796
3797              fatal     Log only fatal errors
3798
3799              error     Log only errors
3800
3801              info      Log errors and general informational messages
3802
3803              verbose   Log errors and verbose informational messages
3804
3805              debug     Log  errors  and  verbose  informational  messages and
3806                        debugging messages
3807
3808              debug2    Log errors and verbose informational messages and more
3809                        debugging messages
3810
3811              debug3    Log errors and verbose informational messages and even
3812                        more debugging messages
3813
3814              debug4    Log errors and verbose informational messages and even
3815                        more debugging messages
3816
3817              debug5    Log errors and verbose informational messages and even
3818                        more debugging messages
3819
3820
3821       SlurmdTimeout
3822              The interval, in seconds, that the Slurm  controller  waits  for
3823              slurmd  to respond before configuring that node's state to DOWN.
3824              A value of zero indicates the node will not be tested by  slurm‐
3825              ctld  to confirm the state of slurmd, the node will not be auto‐
3826              matically set  to  a  DOWN  state  indicating  a  non-responsive
3827              slurmd,  and  some other tool will take responsibility for moni‐
3828              toring the state of each compute node  and  its  slurmd  daemon.
3829              Slurm's hierarchical communication mechanism is used to ping the
3830              slurmd daemons in order to minimize system noise  and  overhead.
3831              The  default  value  is  300  seconds.  The value may not exceed
3832              65533 seconds.
3833
3834
3835       SlurmdUser
3836              The name of the user that the slurmd daemon executes  as.   This
3837              user  must  exist on all nodes of the cluster for authentication
3838              of communications between Slurm components.  The  default  value
3839              is "root".
3840
3841
3842       SlurmSchedLogFile
3843              Fully  qualified  pathname of the scheduling event logging file.
3844              The syntax of this parameter is the same  as  for  SlurmctldLog‐
3845              File.   In  order  to  configure scheduler logging, set both the
3846              SlurmSchedLogFile and SlurmSchedLogLevel parameters.
3847
3848
3849       SlurmSchedLogLevel
3850              The initial level of scheduling event logging,  similar  to  the
3851              SlurmctldDebug  parameter  used  to control the initial level of
3852              slurmctld logging.  Valid values for SlurmSchedLogLevel are  "0"
3853              (scheduler   logging   disabled)   and  "1"  (scheduler  logging
3854              enabled).  If this parameter is omitted, the value  defaults  to
3855              "0"  (disabled).   In  order to configure scheduler logging, set
3856              both the SlurmSchedLogFile  and  SlurmSchedLogLevel  parameters.
3857              The  scheduler  logging  level  can be changed dynamically using
3858              scontrol.
3859
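              For example, scheduler event logging might be enabled with (the
              path is illustrative):

                  SlurmSchedLogFile=/var/log/slurm/sched.log
                  SlurmSchedLogLevel=1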
3860
3861       SlurmUser
3862              The name of the user that the slurmctld daemon executes as.  For
3863              security  purposes,  a  user  other  than "root" is recommended.
3864              This user must exist on all nodes of the cluster for authentica‐
3865              tion  of  communications  between Slurm components.  The default
3866              value is "root".
3867
3868
3869       SrunEpilog
3870              Fully qualified pathname of an executable to be run by srun fol‐
3871              lowing the completion of a job step.  The command line arguments
3872              for the executable will be the command and arguments of the  job
3873              step.   This configuration parameter may be overridden by srun's
3874              --epilog parameter. Note that while the other "Epilog"  executa‐
3875              bles  (e.g.,  TaskEpilog) are run by slurmd on the compute nodes
3876              where the tasks are executed, the SrunEpilog runs  on  the  node
3877              where the "srun" is executing.
3878
3879
3880       SrunPortRange
3881              The  srun  creates  a set of listening ports to communicate with
3882              the controller, the slurmstepd and  to  handle  the  application
3883              I/O.  By default these ports are ephemeral meaning the port num‐
              bers are selected by the kernel.  Using this parameter allows
              sites to configure a range of ports from which srun ports will
              be selected.  This is useful if sites want to allow only a
              certain port range on their network.
3888
3889              Note:  On Cray systems, Realm-Specific IP Addressing (RSIP) will
3890              automatically try to interact  with  anything  opened  on  ports
3891              8192-60000.   Configure  SrunPortRange  to  use a range of ports
3892              above those used by RSIP, ideally 1000 or more ports, for  exam‐
3893              ple "SrunPortRange=60001-63000".
3894
3895              Note:  A  sufficient number of ports must be configured based on
3896              the estimated number of srun on the submission nodes considering
3897              that  srun  opens  3  listening  ports  plus 2 more for every 48
3898              hosts. Example:
3899
3900              srun -N 48 will use 5 listening ports.
3901
3902
3903              srun -N 50 will use 7 listening ports.
3904
3905
3906              srun -N 200 will use 13 listening ports.
3907
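              For example, using the sizing rule above, a login node expected
              to run up to 100 concurrent srun commands, each spanning up to
              200 nodes, would need roughly 100 x 13 = 1300 ports; a range
              such as the following (illustrative) would cover that:

                  SrunPortRange=60001-63000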
3908
3909       SrunProlog
3910              Fully qualified pathname of an executable  to  be  run  by  srun
3911              prior  to  the launch of a job step.  The command line arguments
3912              for the executable will be the command and arguments of the  job
3913              step.   This configuration parameter may be overridden by srun's
3914              --prolog parameter. Note that while the other "Prolog"  executa‐
3915              bles  (e.g.,  TaskProlog) are run by slurmd on the compute nodes
3916              where the tasks are executed, the SrunProlog runs  on  the  node
3917              where the "srun" is executing.
3918
3919
3920       StateSaveLocation
3921              Fully  qualified  pathname  of  a directory into which the Slurm
3922              controller,     slurmctld,     saves     its     state     (e.g.
3923              "/usr/local/slurm/checkpoint").  Slurm state will be saved here to
3924              recover from system failures.  SlurmUser must be able to  create
3925              files  in this directory.  If you have a secondary SlurmctldHost
3926              configured, this location should be  readable  and  writable  by
3927              both  systems.  Since all running and pending job information is
3928              stored here, the use of a reliable file system  (e.g.  RAID)  is
3929              recommended.   The  default value is "/var/spool".  If any slurm
3930              daemons terminate abnormally, their  core  files  will  also  be
3931              written into this directory.
3932
3933
3934       SuspendExcNodes
3935              Specifies the nodes which are not to be placed in power save
3936              mode, even if the node remains idle for an  extended  period  of
3937              time.  Use Slurm's hostlist expression to identify nodes with an
3938              optional ":" separator and count of nodes to  exclude  from  the
3939              preceding  range.   For  example  "nid[10-20]:4"  will prevent 4
3940              usable nodes (i.e. IDLE and not DOWN, DRAINING or already powered
3941              down) in the set "nid[10-20]" from being powered down.  Multiple
3942              sets of nodes can be specified with or without counts in a comma
3943              separated list (e.g. "nid[10-20]:4,nid[80-90]:2").  If a node
3944              count specification is given, any list of nodes to  NOT  have  a
3945              node  count  must  be after the last specification with a count.
3946              For example "nid[10-20]:4,nid[60-70]" will exclude  4  nodes  in
3947              the  set  "nid[10-20]:4"  plus all nodes in the set "nid[60-70]"
3948              while "nid[1-3],nid[10-20]:4" will exclude 4 nodes from the  set
3949              "nid[1-3],nid[10-20]".    By  default  no  nodes  are  excluded.
3950              Related configuration options include ResumeTimeout,  ResumePro‐
3951              gram, ResumeRate, SuspendProgram, SuspendRate, SuspendTime, Sus‐
3952              pendTimeout, and SuspendExcParts.
3953
3954
3955       SuspendExcParts
3956              Specifies the partitions whose nodes are not to be placed in
3957              power  save  mode, even if the node remains idle for an extended
3958              period of time.  Multiple partitions can be identified and sepa‐
3959              rated  by  commas.   By  default no nodes are excluded.  Related
3960              configuration  options  include  ResumeTimeout,   ResumeProgram,
3961              ResumeRate, SuspendProgram, SuspendRate, SuspendTime, Suspend‐
3962              Timeout, and SuspendExcNodes.
3963
3964
3965       SuspendProgram
3966              SuspendProgram is the program that will be executed when a  node
3967              remains  idle  for  an extended period of time.  This program is
3968              expected to place the node into some power save mode.  This  can
3969              be  used  to  reduce the frequency and voltage of a node or com‐
3970              pletely power the node off.  The program executes as  SlurmUser.
3971              The  argument  to  the  program will be the names of nodes to be
3972              placed into power savings mode (using Slurm's  hostlist  expres‐
3973              sion  format).  By default, no program is run.  Related configu‐
3974              ration options include ResumeTimeout, ResumeProgram, ResumeRate,
3975              SuspendRate,  SuspendTime,  SuspendTimeout, SuspendExcNodes, and
3976              SuspendExcParts.
3977
3978
3979       SuspendRate
3980              The rate at which nodes are placed into power save mode by  Sus‐
3981              pendProgram.  The value is the number of nodes per minute and can
3982              be used to prevent a large drop in power consumption (e.g. after
3983              a  large  job  completes).  A value of zero results in no limits
3984              being imposed.  The  default  value  is  60  nodes  per  minute.
3985              Related  configuration options include ResumeTimeout, ResumePro‐
3986              gram, ResumeRate, SuspendProgram,  SuspendTime,  SuspendTimeout,
3987              SuspendExcNodes, and SuspendExcParts.
3988
3989
3990       SuspendTime
3991              Nodes  which remain idle or down for this number of seconds will
3992              be placed into power save mode by SuspendProgram.  For efficient
3993              system utilization, it is recommended that the value of Suspend‐
3994              Time be at least as large as  the  sum  of  SuspendTimeout  plus
3995              ResumeTimeout.   A  value  of -1 disables power save mode and is
3996              the default.  Related configuration options include  ResumeTime‐
3997              out,  ResumeProgram,  ResumeRate,  SuspendProgram,  SuspendRate,
3998              SuspendTimeout, SuspendExcNodes, and SuspendExcParts.
3999
4000
4001       SuspendTimeout
4002              Maximum time permitted (in seconds) between when a node  suspend
4003              request is issued and when the node is shut down.  At that time
4004              the node must be ready for a resume  request  to  be  issued  as
4005              needed  for new work.  The default value is 30 seconds.  Related
4006              configuration options include ResumeProgram, ResumeRate, Resume‐
4007              Timeout,  SuspendRate, SuspendTime, SuspendProgram, SuspendExcN‐
4008              odes and SuspendExcParts.  More information is available at  the
4009              Slurm web site ( https://slurm.schedmd.com/power_save.html ).
4010
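              As a sketch only (the script paths and values are illustrative,
              not defaults), the power saving options described above might be
              combined as follows:

              SuspendProgram=/usr/local/slurm/sbin/node_suspend.sh
              ResumeProgram=/usr/local/slurm/sbin/node_resume.sh
              SuspendTime=1800
              SuspendTimeout=60
              ResumeTimeout=300
              SuspendRate=20
              ResumeRate=20
              SuspendExcNodes=login[1-2]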
4011
4012       SwitchType
4013              Identifies  the type of switch or interconnect used for applica‐
4014              tion     communications.      Acceptable     values      include
4015              "switch/cray_aries" for Cray systems, "switch/none" for switches
4016              not requiring special processing for job launch  or  termination
4017              (Ethernet and InfiniBand).  The default value is "switch/none".
4018              All Slurm daemons, commands and running jobs
4019              must be restarted for a change in SwitchType to take effect.  If
4020              running jobs exist at the time slurmctld is restarted with a new
4021              value  of  SwitchType,  records  of all jobs in any state may be
4022              lost.
4023
4024
4025       TaskEpilog
4026              Fully qualified pathname of a program to be executed as the slurm
4027              job's  owner after termination of each task.  See TaskProlog for
4028              execution order details.
4029
4030
4031       TaskPlugin
4032              Identifies the type of task launch  plugin,  typically  used  to
4033              provide resource management within a node (e.g. pinning tasks to
4034              specific processors). More than one task plugin can be specified
4035              in  a  comma  separated list. The prefix of "task/" is optional.
4036              Acceptable values include:
4037
4038              task/affinity  enables      resource      containment      using
4039                             sched_setaffinity().  This enables the --cpu-bind
4040                             and/or --mem-bind srun options.
4041
4042              task/cgroup    enables resource containment using Linux  control
4043                             cgroups.   This  enables  the  --cpu-bind  and/or
4044                             --mem-bind  srun   options.    NOTE:   see   "man
4045                             cgroup.conf" for configuration details.
4046
4047              task/none      for systems requiring no special handling of user
4048                             tasks.  Lacks support for the  --cpu-bind  and/or
4049                             --mem-bind  srun  options.   The default value is
4050                             "task/none".
4051
4052              NOTE:  It  is  recommended  to  stack  task/affinity,task/cgroup
4053              together  when  configuring  TaskPlugin,  and setting TaskAffin‐
4054              ity=no and ConstrainCores=yes in cgroup.conf.  This  setup  uses
4055              the  task/affinity  plugin for setting the affinity of the tasks
4056              (which is better and different than task/cgroup)  and  uses  the
4057              task/cgroup  plugin to fence tasks into the specified resources,
4058              thus combining the best of both pieces.
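              For example, on a non-Cray system the recommended stacking
              described above would be configured as:
              TaskPlugin=task/affinity,task/cgroup
              with ConstrainCores=yes and TaskAffinity=no set in cgroup.conf.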
4059
4060              NOTE: For CRAY systems only: task/cgroup must be used with,  and
4061              listed  after  task/cray_aries  in TaskPlugin. The task/affinity
4062              plugin can be listed anywhere, but the previous constraint  must
4063              be  satisfied.  For  CRAY  systems, a configuration like this is
4064              recommended:
4065              TaskPlugin=task/affinity,task/cray_aries,task/cgroup
4066
4067
4068       TaskPluginParam
4069              Optional parameters  for  the  task  plugin.   Multiple  options
4070              should  be  comma  separated.   If None, Boards, Sockets, Cores,
4071              Threads, and/or Verbose are specified, they  will  override  the
4072              --cpu-bind  option  specified  by  the user in the srun command.
4073              None, Boards, Sockets, Cores and Threads are mutually  exclusive
4074              and since they decrease scheduling flexibility are not generally
4075              recommended (select no more than one of them).
4076
4077
4078              Boards    Bind tasks to boards by default.  Overrides  automatic
4079                        binding.
4080
4081              Cores     Bind  tasks  to cores by default.  Overrides automatic
4082                        binding.
4083
4084              None      Perform no task binding by default.   Overrides  auto‐
4085                        matic binding.
4086
4087              Sockets   Bind to sockets by default.  Overrides automatic bind‐
4088                        ing.
4089
4090              Threads   Bind to threads by default.  Overrides automatic bind‐
4091                        ing.
4092
4093              SlurmdOffSpec
4094                        If  specialized  cores  or CPUs are identified for the
4095                        node (i.e. the CoreSpecCount or CpuSpecList  are  con‐
4096                        figured  for  the node), then Slurm daemons running on
4097                        the compute node (i.e. slurmd and  slurmstepd)  should
4098                        run  outside  of  those  resources  (i.e.  specialized
4099                        resources are completely unavailable to Slurm  daemons
4100                        and  jobs  spawned  by Slurm).  This option may not be
4101                        used with the task/cray_aries plugin.
4102
4103              Verbose   Verbosely report binding before tasks run.   Overrides
4104                        user options.
4105
4106              Autobind  Set a default binding in the event that "auto binding"
4107                        doesn't find a match.  Set to Threads, Cores or  Sock‐
4108                        ets (E.g. TaskPluginParam=autobind=threads).
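              For example, a site wanting core-level default binding with
              verbose reporting (an illustrative combination, not a default)
              might set:
              TaskPluginParam=Cores,Verbose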
4109
4110
4111       TaskProlog
4112              Fully qualified pathname of a program to be executed as the slurm
4113              job's owner prior to initiation of each task.  Besides the  nor‐
4114              mal  environment variables, this has SLURM_TASK_PID available to
4115              identify the process ID of the  task  being  started.   Standard
4116              output  from this program can be used to control the environment
4117              variables and output for the user program.
4118
4119              export NAME=value   Will set environment variables for the  task
4120                                  being  spawned.   Everything after the equal
4121                                  sign to the end of the line will be used  as
4122                                  the  value  for  the  environment  variable.
4123                                  Exporting of functions is not currently sup‐
4124                                  ported.
4125
4126              print ...           Will  cause  that  line (without the leading
4127                                  "print ") to be printed to the  job's  stan‐
4128                                  dard output.
4129
4130              unset NAME          Will  clear  environment  variables  for the
4131                                  task being spawned.
4132
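              For example, a TaskProlog whose standard output contains the
              following lines (the variable names and path are purely
              illustrative) would set one variable, print a message to the
              job's standard output, and clear another variable for the task
              being spawned:

              export SCRATCH_DIR=/tmp/job_scratch
              print task prolog: SCRATCH_DIR has been set
              unset OLD_SCRATCH_DIR
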
4133              The order of task prolog/epilog execution is as follows:
4134
4135              1. pre_launch_priv()
4136                                  Function in TaskPlugin
4137
4138              2. pre_launch()     Function in TaskPlugin
4139
4140              3. TaskProlog       System-wide per task program defined in
4141                                  slurm.conf
4142
4143              4. user prolog      Job step specific task program defined using
4144                                  srun's --task-prolog option or
4145                                  SLURM_TASK_PROLOG environment variable
4146
4147              5. Execute the job step's task
4148
4149              6. user epilog      Job step specific task program defined using
4150                                  srun's --task-epilog option or
4151                                  SLURM_TASK_EPILOG environment variable
4152
4153              7. TaskEpilog       System-wide per task program defined in
4154                                  slurm.conf
4155
4156              8. post_term()      Function in TaskPlugin
4157
4158
4159       TCPTimeout
4160              Time permitted for a TCP connection to be established.  Default
4161              value is 2 seconds.
4162
4163
4164       TmpFS  Fully  qualified  pathname  of the file system available to user
4165              jobs for temporary storage. This parameter is used in establish‐
4166              ing a node's TmpDisk space.  The default value is "/tmp".
4167
4168
4169       TopologyParam
4170              Comma separated list of network topology options.
4171
4172              Dragonfly      Optimize allocation for Dragonfly network.  Valid
4173                             when TopologyPlugin=topology/tree.
4174
4175              TopoOptional   Only optimize allocation for network topology  if
4176                             the  job includes a switch option. Since optimiz‐
4177                             ing resource  allocation  for  topology  involves
4178                             much  higher  system overhead, this option can be
4179                             used to impose the extra overhead  only  on  jobs
4180                             which can take advantage of it. If most job allo‐
4181                             cations are not optimized for  network  topology,
4182                             they  may  fragment  resources  to the point that
4183                             topology optimization for other jobs will be dif‐
4184                             ficult  to  achieve.   NOTE: Jobs may span across
4185                             nodes without common parent  switches  with  this
4186                             enabled.
4187
4188
4189       TopologyPlugin
4190              Identifies  the  plugin  to  be used for determining the network
4191              topology and optimizing job allocations to minimize network con‐
4192              tention.   See  NETWORK  TOPOLOGY below for details.  Additional
4193              plugins may be provided in  the  future  which  gather  topology
4194              information   directly  from  the  network.   Acceptable  values
4195              include:
4196
4197              topology/3d_torus    best-fit   logic   over   three-dimensional
4198                                   topology
4199
4200              topology/none        default  for  other systems, best-fit logic
4201                                   over one-dimensional topology
4202
4203              topology/tree        used  for   a   hierarchical   network   as
4204                                   described in a topology.conf file
4205
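              For example, a cluster with a hierarchical network described in
              topology.conf might use the following (an illustrative
              combination):
              TopologyPlugin=topology/tree
              TopologyParam=TopoOptional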
4206
4207       TrackWCKey
4208              Boolean yes or no.  Used to enable display and tracking of the
4209              Workload Characterization Key.  Must be set to track correct wckey
4210              usage.  NOTE: You must also set TrackWCKey in your slurmdbd.conf
4211              file to create historical usage reports.
4212
4213
4214       TreeWidth
4215              Slurmd daemons use a virtual tree  network  for  communications.
4216              TreeWidth specifies the width of the tree (i.e. the fanout).  On
4217              architectures with a front end node running the  slurmd  daemon,
4218              the  value must always be equal to or greater than the number of
4219              front end nodes, which eliminates the need for message forwarding
4220              between  the slurmd daemons.  On other architectures the default
4221              value is 50, meaning each slurmd daemon can communicate with  up
4222              to  50 other slurmd daemons and over 2500 nodes can be contacted
4223              with two message hops.  The default value  will  work  well  for
4224              most  clusters.   Optimal  system  performance  can typically be
4225              achieved if TreeWidth is set to the square root of the number of
4226              nodes  in the cluster for systems having no more than 2500 nodes
4227              or the cube root for larger systems. The value  may  not  exceed
4228              65533.
4229
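              For example, the guidance above suggests TreeWidth=50 for a
              hypothetical 2500-node cluster (the square root of 2500), while
              a 27000-node system would use the cube root, i.e. TreeWidth=30.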
4230
4231       UnkillableStepProgram
4232              If  the  processes in a job step are determined to be unkillable
4233              for a period of  time  specified  by  the  UnkillableStepTimeout
4234              variable, the program specified by UnkillableStepProgram will be
4235              executed.  This program can be used to take special  actions  to
4236              clean  up the unkillable processes and/or notify computer admin‐
4237              istrators.  The program will be run as SlurmdUser (usually "root")
4238              on the compute node.  By default no program is run.
4239
4240
4241       UnkillableStepTimeout
4242              The  length  of  time,  in  seconds, that Slurm will wait before
4243              deciding that processes in a job step are unkillable (after they
4244              have  been signaled with SIGKILL) and execute UnkillableStepPro‐
4245              gram as described above.  The default timeout value is  60  sec‐
4246              onds.   If exceeded, the compute node will be drained to prevent
4247              future jobs from being scheduled on the node.
4248
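              For example, the two options might be paired as follows (the
              script path is hypothetical):
              UnkillableStepProgram=/usr/local/slurm/sbin/notify_unkillable.sh
              UnkillableStepTimeout=120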
4249
4250       UsePAM If set to 1, PAM (Pluggable Authentication  Modules  for  Linux)
4251              will  be enabled.  PAM is used to establish the upper bounds for
4252              resource limits. With PAM support enabled, local system adminis‐
4253              trators can dynamically configure system resource limits. Chang‐
4254              ing the upper bound of a resource limit will not alter the  lim‐
4255              its  of  running jobs, only jobs started after a change has been
4256              made will pick up the new limits.  The default value is  0  (not
4257              to enable PAM support).  Remember that PAM also needs to be con‐
4258              figured to support Slurm as a service.  For  sites  using  PAM's
4259              directory based configuration option, a configuration file named
4260              slurm should be created.  The  module-type,  control-flags,  and
4261              module-path names that should be included in the file are:
4262              auth        required      pam_localuser.so
4263              auth        required      pam_shells.so
4264              account     required      pam_unix.so
4265              account     required      pam_access.so
4266              session     required      pam_unix.so
4267              For sites configuring PAM with a general configuration file, the
4268              appropriate lines (see above), where slurm is the  service-name,
4269              should be added.
4270
4271              NOTE: The UsePAM option has nothing to do with the
4272              contribs/pam/pam_slurm and/or contribs/pam_slurm_adopt modules,
4273              so these two modules can work independently of the value set
4274              for UsePAM.
4275
4276
4277       VSizeFactor
4278              Memory specifications in job requests apply to real memory  size
4279              (also  known  as  resident  set size). It is possible to enforce
4280              virtual memory limits for both jobs and job  steps  by  limiting
4281              their  virtual  memory  to  some percentage of their real memory
4282              allocation. The VSizeFactor parameter specifies the job's or job
4283              step's  virtual  memory limit as a percentage of its real memory
4284              limit. For example, if a job's real memory limit  is  500MB  and
4285              VSizeFactor  is  set  to  101 then the job will be killed if its
4286              real memory exceeds 500MB or its virtual  memory  exceeds  505MB
4287              (101 percent of the real memory limit).  The default value is 0,
4288              which disables enforcement of virtual memory limits.  The  value
4289              may not exceed 65533 percent.
4290
4291              NOTE:  This  parameter is dependent on OverMemoryKill being con‐
4292              figured in JobAcctGatherParams. It is also possible to configure
4293              the TaskPlugin to use task/cgroup for memory enforcement. VSize‐
4294              Factor will not  have  an  effect  on  memory  enforcement  done
4295              through cgroups.
4296
4297
4298       WaitTime
4299              Specifies  how  many  seconds the srun command should by default
4300              wait after the first  task  terminates  before  terminating  all
4301              remaining  tasks.  The  "--wait" option on the srun command line
4302              overrides this value.  The default value is  0,  which  disables
4303              this feature.  May not exceed 65533 seconds.
4304
4305
4306       X11Parameters
4307              For use with Slurm's built-in X11 forwarding implementation.
4308
4309              home_xauthority
4310                      If set, xauth data on the compute node will be placed in
4311                      ~/.Xauthority rather than  in  a  temporary  file  under
4312                      TmpFS.
4313
4314

NODE CONFIGURATION

4316       The configuration of nodes (or machines) to be managed by Slurm is also
4317       specified in /etc/slurm.conf.   Changes  in  node  configuration  (e.g.
4318       adding  nodes, changing their processor count, etc.) require restarting
4319       both the slurmctld daemon and the slurmd daemons.  All  slurmd  daemons
4320       must  know  each  node  in the system to forward messages in support of
4321       hierarchical communications.  Only the NodeName must be supplied in the
4322       configuration  file.   All  other  node  configuration  information  is
4323       optional.  It is advisable to establish baseline  node  configurations,
4324       especially  if  the  cluster is heterogeneous.  Nodes which register to
4325       the system with less than the configured  resources  (e.g.  too  little
4326       memory) will be placed in the "DOWN" state to avoid scheduling jobs on
4327       them.  Establishing baseline configurations  will  also  speed  Slurm's
4328       scheduling process by permitting it to compare job requirements against
4329       these (relatively few) configuration parameters and possibly avoid hav‐
4330       ing  to check job requirements against every individual node's configu‐
4331       ration.  The resources checked at node  registration  time  are:  CPUs,
4332       RealMemory and TmpDisk.
4333
4334       Default  values  can  be  specified  with a record in which NodeName is
4335       "DEFAULT".  The default entry values will apply only to lines following
4336       it in the configuration file and the default values can be reset multi‐
4337       ple times in the configuration file with multiple entries where  "Node‐
4338       Name=DEFAULT".   Each  line where NodeName is "DEFAULT" will replace or
4339       add to previous default values and not reinitialize the default val‐
4340       ues.   The  "NodeName="  specification  must  be  placed  on every line
4341       describing the configuration of nodes.  A  single  node  name  can  not
4342       appear  as  a NodeName value in more than one line (duplicate node name
4343       records will be ignored).  In fact, it is generally possible and desir‐
4344       able  to  define  the  configurations of all nodes in only a few lines.
4345       This convention permits significant optimization in the  scheduling  of
4346       larger  clusters.   In  order  to support the concept of jobs requiring
4347       consecutive nodes on some architectures, node specifications should  be
4348       placed in this file in consecutive order.  No single node name may be
4349       listed more than once in the configuration file.  Use  "DownNodes="  to
4350       record  the  state  of  nodes which are temporarily in a DOWN, DRAIN or
4351       FAILING state without altering permanent configuration information.   A
4352       job step's tasks are allocated to nodes in the order the nodes appear in
4353       the configuration file. There is presently no capability  within  Slurm
4354       to arbitrarily order a job step's tasks.
4355
4356       Multiple  node  names  may be comma separated (e.g. "alpha,beta,gamma")
4357       and/or a simple node range expression may optionally be used to specify
4358       numeric  ranges  of  nodes  to avoid building a configuration file with
4359       large numbers of entries.  The node range expression  can  contain  one
4360       pair  of  square  brackets  with  a sequence of comma separated numbers
4361       and/or ranges of numbers separated by a "-" (e.g. "linux[0-64,128]", or
4362       "lx[15,18,32-33]").   Note  that  the numeric ranges can include one or
4363       more leading zeros to indicate the numeric portion has a  fixed  number
4364       of  digits  (e.g.  "linux[0000-1023]").  Multiple numeric ranges can be
4365       included in the expression (e.g. "rack[0-63]_blade[0-41]").  If one  or
4366       more  numeric  expressions are included, one of them must be at the end
4367       of the name (e.g. "unit[0-31]rack" is invalid), but arbitrary names can
4368       always be used in a comma separated list.
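
       For example, a single hypothetical line using a node range expression
       can describe a group of identically configured nodes:
       NodeName=linux[001-064] Sockets=2 CoresPerSocket=16 ThreadsPerCore=1 RealMemory=128000 State=UNKNOWN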
4369
4370       The node configuration specifies the following information:
4371
4372
4373       NodeName
4374              Name  that  Slurm uses to refer to a node.  Typically this would
4375              be the string that "/bin/hostname -s" returns.  It may  also  be
4376              the  fully  qualified  domain name as returned by "/bin/hostname
4377              -f" (e.g. "foo1.bar.com"), or any valid domain  name  associated
4378              with  the  host  through  the host database (/etc/hosts) or DNS,
4379              depending on the resolver settings.  Note that if the short form
4380              of  the  hostname  is  not  used, it may prevent use of hostlist
4381              expressions (the numeric portion in brackets must be at the  end
4382              of the string).  It may also be an arbitrary string if NodeHost‐
4383              name is specified.  If the NodeName  is  "DEFAULT",  the  values
4384              specified  with that record will apply to subsequent node speci‐
4385              fications unless explicitly set to other  values  in  that  node
4386              record or replaced with a different set of default values.  Each
4387              line where NodeName is "DEFAULT" will replace or add to previous
4388              default values and not reinitialize the default values.  For
4389              architectures in which the node order is significant, nodes will
4390              be considered consecutive in the order defined.  For example, if
4391              the configuration for "NodeName=charlie" immediately follows the
4392              configuration for "NodeName=baker" they will be considered adja‐
4393              cent in the computer.
4394
4395
4396       NodeHostname
4397              Typically this would  be  the  string  that  "/bin/hostname  -s"
4398              returns.   It  may  also  be  the fully qualified domain name as
4399              returned by "/bin/hostname -f"  (e.g.  "foo1.bar.com"),  or  any
4400              valid  domain  name  associated  with  the host through the host
4401              database (/etc/hosts) or DNS, depending  on  the  resolver  set‐
4402              tings.  Note that if the short form of the hostname is not used,
4403              it may prevent use of hostlist expressions (the numeric  portion
4404              in  brackets  must  be  at the end of the string).  A node range
4405              expression can be used to specify a set of nodes.  If an expres‐
4406              sion  is used, the number of nodes identified by NodeHostname on
4407              a line in the configuration file must be identical to the number
4408              of  nodes  identified by NodeName.  By default, the NodeHostname
4409              will be identical in value to NodeName.
4410
4411
4412       NodeAddr
4413              Name by which a node should be referred to in establishing a
4414              communications path.  This name will be used as an argument to the
4415              getaddrinfo() function for  identification.   If  a  node  range
4416              expression  is  used  to  designate  multiple  nodes,  they must
4417              exactly  match  the  entries  in  the  NodeName   (e.g.   "Node‐
4418              Name=lx[0-7]  NodeAddr=elx[0-7]").  NodeAddr may also contain IP
4419              addresses.  By default, the NodeAddr will be identical in  value
4420              to NodeHostname.
4421
4422
4423       BcastAddr
4424              Alternate  network path to be used for sbcast network traffic to
4425              a given node.  This name will be used  as  an  argument  to  the
4426              getaddrinfo()  function.   If a node range expression is used to
4427              designate multiple nodes, they must exactly match the entries in
4428              the   NodeName   (e.g.  "NodeName=lx[0-7]  BcastAddr=elx[0-7]").
4429              BcastAddr may also contain IP addresses.  By default, the  Bcas‐
4430              tAddr  is  unset,  and  sbcast  traffic  will  be  routed to the
4431              NodeAddr for a given node.  Note: cannot be used with Communica‐
4432              tionParameters=NoInAddrAny.
4433
4434
4435       Boards Number of Baseboards in nodes with a baseboard controller.  Note
4436              that when Boards is specified, SocketsPerBoard,  CoresPerSocket,
4437              and  ThreadsPerCore  should  be  specified.  Boards and CPUs are
4438              mutually exclusive.  The default value is 1.
4439
4440
4441       CoreSpecCount
4442              Number of cores reserved for system use.  These cores  will  not
4443              be  available  for  allocation to user jobs.  Depending upon the
4444              TaskPluginParam option of  SlurmdOffSpec,  Slurm  daemons  (i.e.
4445              slurmd and slurmstepd) may either be confined to these resources
4446              (the default) or prevented from using these  resources.   Isola‐
4447              tion of the Slurm daemons from user jobs may improve application
4448              performance.  If this option and CpuSpecList are both designated
4449              for a node, an error is generated.  For information on the algo‐
4450              rithm used by Slurm to select the cores refer to the  core  spe‐
4451              cialization                    documentation                   (
4452              https://slurm.schedmd.com/core_spec.html ).
4453
4454
4455       CoresPerSocket
4456              Number of cores in a  single  physical  processor  socket  (e.g.
4457              "2").   The  CoresPerSocket  value describes physical cores, not
4458              the logical number of processors per socket.  NOTE: If you  have
4459              multi-core  processors,  you  will  likely  need to specify this
4460              parameter in order to optimize scheduling.  The default value is
4461              1.
4462
4463
4464       CpuBind
4465              If  a job step request does not specify an option to control how
4466              tasks are bound to allocated CPUs  (--cpu-bind)  and  all  nodes
4467              allocated to the job have the same CpuBind option, the node Cpu‐
4468              Bind option will control how tasks are bound to allocated
4469              resources.  Supported  values  for  CpuBind are "none", "board",
4470              "socket", "ldom" (NUMA), "core" and "thread".
4471
4472
4473       CPUs   Number of logical processors on the node (e.g. "2").   CPUs  and
4474              Boards are mutually exclusive. It can be set to the total number
4475              of sockets (supported only by select/linear), cores or threads.
4476              This can be useful when you want to schedule only the cores on a
4477              hyper-threaded node. If CPUs is omitted, its default will be set
4478              equal  to  the  product  of Boards, Sockets, CoresPerSocket, and
4479              ThreadsPerCore.
4480
4481
4482       CpuSpecList
4483              A comma delimited list of Slurm abstract CPU  IDs  reserved  for
4484              system  use.   The  list  will  be expanded to include all other
4485              CPUs, if any, on the same cores.  These cores will not be avail‐
4486              able  for allocation to user jobs.  Depending upon the TaskPlug‐
4487              inParam option of SlurmdOffSpec, Slurm daemons (i.e. slurmd  and
4488              slurmstepd)  may  either  be  confined  to  these resources (the
4489              default) or prevented from using these resources.  Isolation  of
4490              the Slurm daemons from user jobs may improve application perfor‐
4491              mance.  If this option and CoreSpecCount are both designated for
4492              a node, an error is generated.  This option has no effect unless
4493              cgroup   job   confinement   is   also   configured    (TaskPlu‐
4494              gin=task/cgroup with ConstrainCores=yes in cgroup.conf).
4495
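              For example, to reserve the first two abstract CPU IDs of a node
              for the Slurm daemons (the IDs and node name are illustrative):
              NodeName=node01 CpuSpecList=0,1 ...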
4496
4497       Features
4498              A  comma  delimited list of arbitrary strings indicative of some
4499              characteristic associated with the node.  There is no  value  or
4500              count associated with a feature at this time; a node either has
4501              a feature or it does not.   A  desired  feature  may  contain  a
4502              numeric  component  indicating, for example, processor speed but
4503              this numeric component will be considered to be part of the fea‐
4504              ture  string.  Features  are intended to be used to filter nodes
4505              eligible to run jobs via the --constraint argument.  By  default
4506              a  node  has  no features.  Also see Gres for being able to have
4507              more control such as types and count. Using features  is  faster
4508              than  scheduling  against  GRES but is limited to Boolean opera‐
4509              tions.
4510
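              For example (the feature names are purely illustrative):
              NodeName=node[01-08] Features=intel,ib ...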
4511
4512       Gres   A comma delimited list of generic resources specifications for a
4513              node.    The   format   is:  "<name>[:<type>][:no_consume]:<num‐
4514              ber>[K|M|G]".  The first  field  is  the  resource  name,  which
4515              matches the GresType configuration parameter name.  The optional
4516              type field might be used to identify a  model  of  that  generic
4517              resource.  It is forbidden to specify both an untyped GRES and a
4518              typed GRES with the same <name>.  The optional no_consume  field
4519              allows  you  to  specify that a generic resource does not have a
4520              finite number of that resource  that  gets  consumed  as  it  is
4521              requested.  The  no_consume field is a GRES specific setting and
4522              applies to the GRES, regardless  of  the  type  specified.   The
4523              final field must specify a generic resources count.  A suffix of
4524              "K", "M", "G", "T" or "P" may be used to multiply the number  by
4525              1024,      1048576,      1073741824,      etc.     respectively.
4526              (e.g."Gres=gpu:tesla:1,gpu:kepler:1,bandwidth:lustre:no_con‐
4527              sume:4G").   By  default a node has no generic resources and its
4528              maximum count is that of an unsigned 64bit  integer.   Also  see
4529              Features  for  Boolean  flags  to  filter  nodes  using job con‐
4530              straints.
4531
4532
4533       MemSpecLimit
4534              Amount of memory, in megabytes, reserved for system use and  not
4535              available  for  user  allocations.  If the task/cgroup plugin is
4536              configured and that plugin constrains memory  allocations  (i.e.
4537              TaskPlugin=task/cgroup in slurm.conf, plus ConstrainRAMSpace=yes
4538              in cgroup.conf), then Slurm compute node  daemons  (slurmd  plus
4539              slurmstepd)  will  be allocated the specified memory limit. Note
4540              that Memory must be set as a consumable resource in
4541              SelectTypeParameters (one of the *_Memory options) for this
4542              option to work.  The daemons will not be killed if they exhaust
4543              the memory allocation (i.e. the Out-Of-Memory Killer is disabled
4544              for the daemon's memory cgroup).  If the task/cgroup  plugin  is
4545              not  configured,  the  specified memory will only be unavailable
4546              for user allocations.
4547
4548
4549       Port   The port number that the Slurm compute node daemon, slurmd, lis‐
4550              tens  to for work on this particular node. By default there is a
4551              single port number for all slurmd daemons on all  compute  nodes
4552              as  defined  by  the  SlurmdPort configuration parameter. Use of
4553              this option is not generally recommended except for  development
4554              or  testing  purposes.  If  multiple slurmd daemons execute on a
4555              node this can specify a range of ports.
4556
4557              Note: On Cray systems, Realm-Specific IP Addressing (RSIP)  will
4558              automatically  try  to  interact  with  anything opened on ports
4559              8192-60000.  Configure Port to use a port outside of the config‐
4560              ured SrunPortRange and RSIP's port range.
4561
4562
4563       Procs  See CPUs.
4564
4565
4566       RealMemory
4567              Size of real memory on the node in megabytes (e.g. "2048").  The
4568              default value is 1. Lowering RealMemory with the goal of setting
4569              aside some amount for the OS, unavailable for job allocations,
4570              will not work as intended if Memory is not set as a consumable
4571              resource in SelectTypeParameters, so one of the *_Memory options
4572              needs to be enabled for that goal to be accomplished.
4573              Also see MemSpecLimit.
4574
4575
4576       Reason Identifies  the  reason  for  a  node  being  in  state  "DOWN",
4577              "DRAINED", "DRAINING", "FAIL" or "FAILING".  Use quotes to
4578              enclose a reason having more than one word.
4579
4580
4581       Sockets
4582              Number  of  physical  processor  sockets/chips on the node (e.g.
4583              "2").  If Sockets is omitted, it will  be  inferred  from  CPUs,
4584              CoresPerSocket,   and   ThreadsPerCore.    NOTE:   If  you  have
4585              multi-core processors, you will likely  need  to  specify  these
4586              parameters.  Sockets and SocketsPerBoard are mutually exclusive.
4587              If Sockets is specified when Boards is  also  used,  Sockets  is
4588              interpreted  as  SocketsPerBoard rather than total sockets.  The
4589              default value is 1.
4590
4591
4592       SocketsPerBoard
4593              Number of  physical  processor  sockets/chips  on  a  baseboard.
4594              Sockets and SocketsPerBoard are mutually exclusive.  The default
4595              value is 1.
4596
4597
4598       State  State of the node with respect to the initiation of  user  jobs.
4599              Acceptable  values are CLOUD, DOWN, DRAIN, FAIL, FAILING, FUTURE
4600              and UNKNOWN.  Node states of BUSY and IDLE should not be  speci‐
4601              fied  in  the  node  configuration,  but  set  the node state to
4602              UNKNOWN instead.  Setting the node state to UNKNOWN will  result
4603              in  the  node state being set to BUSY, IDLE or other appropriate
4604              state  based  upon  recovered  system  state  information.   The
4605              default  value  is  UNKNOWN.   Also  see the DownNodes parameter
4606              below.
4607
4608              CLOUD     Indicates the node exists in the cloud.   Its  initial
4609                        state  will be treated as powered down.  The node will
4610                        be available for use after its state is recovered from
4611                        Slurm's state save file or the slurmd daemon starts on
4612                        the compute node.
4613
4614              DOWN      Indicates the node failed and  is  unavailable  to  be
4615                        allocated work.
4616
4617              DRAIN     Indicates  the  node  is  unavailable  to be allocated
4618                        work.
4619
4620              FAIL      Indicates the node is expected to fail  soon,  has  no
4621                        jobs allocated to it, and will not be allocated to any
4622                        new jobs.
4623
4624              FAILING   Indicates the node is expected to fail soon,  has  one
4625                        or  more  jobs  allocated to it, but will not be allo‐
4626                        cated to any new jobs.
4627
4628              FUTURE    Indicates the node is defined for future use and  need
4629                        not  exist  when  the Slurm daemons are started. These
4630                        nodes can be made available for use simply by updating
4631                        the  node state using the scontrol command rather than
4632                        restarting the slurmctld daemon. After these nodes are
4633                        made  available,  change their State in the slurm.conf
4634                        file. Until these nodes are made available, they  will
4635                        not be seen using any Slurm commands, nor will any
4636                        attempt be made to contact them.
4637
4638
4639                        Dynamic Future Nodes
4640                               A slurmd started  with  -F[<feature>]  will  be
4641                               associated  with a FUTURE node that matches the
4642                               same configuration (sockets, cores, threads) as
4643                               reported  by slurmd -C. The node's NodeAddr and
4644                               NodeHostname will  automatically  be  retrieved
4645                               from  the  slurmd  and will be cleared when set
4646                               back to the FUTURE state. Dynamic FUTURE  nodes
4647                               retain  non-FUTURE  state on restart. Use scon‐
4648                               trol to put the node back into the FUTURE state.
4649
4650                               If the mapping of the NodeName  to  the  slurmd
4651                               HostName  is not updated in DNS, Dynamic Future
4652                               nodes won't know how to communicate  with  each
4653                               other  -- because NodeAddr and NodeHostName are
4654                               not defined in the slurm.conf -- and the fanout
4655                               communications  need  to be disabled by setting
4656                               TreeWidth to a high number (e.g. 65533). If the
4657                               DNS  mapping is made, then the cloud_dns Slurm‐
4658                               ctldParameter can be used.
4659
4660
4661              UNKNOWN   Indicates the node's state is undefined  but  will  be
4662                        established (set to BUSY or IDLE) when the slurmd dae‐
4663                        mon on that node registers.  UNKNOWN  is  the  default
4664                        state.
4665
4666
4667       ThreadsPerCore
4668              Number  of logical threads in a single physical core (e.g. "2").
4669              Note that Slurm can allocate resources to jobs down to the
4670              resolution  of  a  core.  If your system is configured with more
4671              than one thread per core, execution of a different job  on  each
4672              thread  is  not supported unless you configure SelectTypeParame‐
4673              ters=CR_CPU plus CPUs; do not configure Sockets,  CoresPerSocket
4674              or ThreadsPerCore.  A job can execute one task per thread from
4675              within one job step or execute a distinct job step  on  each  of
4676              the  threads.   Note  also  if  you are running with more than 1
4677              thread   per   core   and   running   the   select/cons_res   or
4678              select/cons_tres  plugin  then  you will want to set the Select‐
4679              TypeParameters variable to something other than CR_CPU to  avoid
4680              unexpected results.  The default value is 1.
4681
4682
4683       TmpDisk
4684              Total size of temporary disk storage in TmpFS in megabytes (e.g.
4685              "16384"). TmpFS (for "Temporary  File  System")  identifies  the
4686              location which jobs should use for temporary storage.  Note this
4687              does not indicate the amount of free space available to the user
4688              on the node, only the total file system size. The system admin‐
4689              istrator should ensure this file system is purged as needed so
4690              that  user  jobs  have access to most of this space.  The Prolog
4691              and/or Epilog programs (specified  in  the  configuration  file)
4692              might  be  used  to  ensure  the file system is kept clean.  The
4693              default value is 0.
4694
4695
4696       TRESWeights
4697              TRESWeights are used to calculate a value that represents how
4698              busy  a  node  is.  Currently only used in federation configura‐
4699              tions. TRESWeights  are  different  from  TRESBillingWeights  --
4700              which is used for fairshare calculations.
4701
4702              TRES  weights  are  specified as a comma-separated list of <TRES
4703              Type>=<TRES Weight> pairs.
4704              e.g.
4705              NodeName=node1 ... TRESWeights="CPU=1.0,Mem=0.25G,GRES/gpu=2.0"
4706
4707              By default the weighted TRES value is calculated as the  sum  of
4708              all  node  TRES  types  multiplied  by  their corresponding TRES
4709              weight.
4710
4711              If PriorityFlags=MAX_TRES is configured, the weighted TRES value
4712              is  calculated  as  the MAX of individual node TRES' (e.g. cpus,
4713              mem, gres).
4714
4715
4716       Weight The priority of the node for scheduling  purposes.   All  things
4717              being  equal,  jobs  will be allocated the nodes with the lowest
4718              weight which satisfies their requirements.  For example, a  het‐
4719              erogeneous  collection  of  nodes  might be placed into a single
4720              partition for greater  system  utilization,  responsiveness  and
4721              capability.  It  would  be preferable to allocate smaller memory
4722              nodes rather than larger memory nodes if either will  satisfy  a
4723              job's  requirements.   The  units  of  weight are arbitrary, but
4724              larger weights should be assigned to nodes with more processors,
4725              memory, disk space, higher processor speed, etc.  Note that if a
4726              job allocation request can not be satisfied using the nodes with
4727              the  lowest weight, the set of nodes with the next lowest weight
4728              is added to the set of nodes under consideration for use (repeat
4729              as  needed  for higher weight values). If you absolutely want to
4730              minimize the number of higher weight nodes allocated  to  a  job
4731              (at a cost of higher scheduling overhead), give each node a dis‐
4732              tinct Weight value and they will be added to the pool  of  nodes
4733              being considered for scheduling individually.  The default value
4734              is 1.
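
              For example, a hypothetical heterogeneous cluster could steer
              jobs toward its smaller nodes first by giving them lower
              weights:
              NodeName=small[001-016] RealMemory=65536  Weight=10
              NodeName=large[001-004] RealMemory=524288 Weight=100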
4735
4736

DOWN NODE CONFIGURATION

4738       The DownNodes= parameter permits you to mark  certain  nodes  as  in  a
4739       DOWN,  DRAIN, FAIL, FAILING or FUTURE state without altering the perma‐
4740       nent configuration information listed under a NodeName= specification.
4741
4742
4743       DownNodes
4744              Any node name, or list of node names, from the NodeName=  speci‐
4745              fications.
4746
4747
4748       Reason Identifies  the  reason  for  a node being in state DOWN, DRAIN,
4749              FAIL, FAILING or FUTURE.  Use quotes to enclose a reason  having
4750              more than one word.
4751
4752
4753       State  State  of  the node with respect to the initiation of user jobs.
4754              Acceptable values are DOWN, DRAIN,  FAIL,  FAILING  and  FUTURE.
4755              For  more  information  about  these states see the descriptions
4756              under State in the NodeName= section above.  The  default  value
4757              is DOWN.
4758
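       As an example (the node names and reason are illustrative), the
       following line marks a group of nodes as drained for maintenance:
       DownNodes=linux[040-043] State=DRAIN Reason="scheduled maintenance"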
4759

FRONTEND NODE CONFIGURATION

4761       On  computers  where  frontend  nodes are used to execute batch scripts
4762       rather than compute nodes (Cray ALPS systems), one may configure one or
4763       more  frontend  nodes using the configuration parameters defined below.
4764       These options are very similar to those  used  in  configuring  compute
4765       nodes.  These  options may only be used on systems configured and built
4766       with the appropriate parameters (--have-front-end) or a  system  deter‐
4767       mined  to  have  the  appropriate  architecture by the configure script
4768       (Cray ALPS systems).  The front end configuration specifies the follow‐
4769       ing information:
4770
4771
4772       AllowGroups
4773              Comma  separated  list  of group names which may execute jobs on
4774              this front end node. By default, all groups may use  this  front
4775              end  node.   If  at  least  one  group  associated with the user
4776              attempting to execute the job is in AllowGroups, the user will
4777              be permitted to use this front end node.  May not be used with the
4778              DenyGroups option.
4779
4780
4781       AllowUsers
4782              Comma separated list of user names which  may  execute  jobs  on
4783              this  front  end  node. By default, all users may use this front
4784              end node.  May not be used with the DenyUsers option.
4785
4786
4787       DenyGroups
4788              Comma separated list of group names  which  are  prevented  from
4789              executing jobs on this front end node.  May not be used with the
4790              AllowGroups option.
4791
4792
4793       DenyUsers
4794              Comma separated list of user names which are prevented from exe‐
4795              cuting  jobs  on  this front end node.  May not be used with the
4796              AllowUsers option.
4797
4798
4799       FrontendName
4800              Name that Slurm uses to refer to  a  frontend  node.   Typically
4801              this  would  be  the string that "/bin/hostname -s" returns.  It
4802              may also be the fully  qualified  domain  name  as  returned  by
4803              "/bin/hostname  -f"  (e.g.  "foo1.bar.com"), or any valid domain
4804              name  associated  with  the  host  through  the  host   database
4805              (/etc/hosts)  or  DNS, depending on the resolver settings.  Note
4806              that if the short form of the hostname is not used, it may  pre‐
4807              vent  use of hostlist expressions (the numeric portion in brack‐
4808              ets must be at the end of the string).  If the  FrontendName  is
4809              "DEFAULT",  the  values specified with that record will apply to
4810              subsequent node specifications unless explicitly  set  to  other
4811              values in that frontend node record or replaced with a different
4812              set  of  default  values.   Each  line  where  FrontendName   is
4813              "DEFAULT" will replace or add to previous default values and not
4814              reinitialize the default values.
4815
4816
4817       FrontendAddr
4818              Name by which a frontend node should be referred to in estab‐
4819              lishing a communications path.  This name will be used as an argument to
4820              the getaddrinfo() function for identification.   As  with  Fron‐
4821              tendName, list the individual node addresses rather than using a
4822              hostlist expression.  The number  of  FrontendAddr  records  per
4823              line  must  equal  the  number  of FrontendName records per line
4824              (i.e. you can't map two node names to one address).  FrontendAddr
4825              may  also  contain  IP  addresses.  By default, the FrontendAddr
4826              will be identical in value to FrontendName.
4827
4828
4829       Port   The port number that the Slurm compute node daemon, slurmd, lis‐
4830              tens  to  for  work on this particular frontend node. By default
4831              there is a single port number for  all  slurmd  daemons  on  all
4832              frontend nodes as defined by the SlurmdPort configuration param‐
4833              eter. Use of this option is not generally recommended except for
4834              development or testing purposes.
4835
4836              Note:  On Cray systems, Realm-Specific IP Addressing (RSIP) will
4837              automatically try to interact  with  anything  opened  on  ports
4838              8192-60000.  Configure Port to use a port outside of the config‐
4839              ured SrunPortRange and RSIP's port range.
4840
4841
4842       Reason Identifies the reason for a frontend node being in  state  DOWN,
4843              DRAINED,  DRAINING,  FAIL  or  FAILING.  Use quotes to enclose a
4844              reason having more than one word.
4845
4846
4847       State  State of the frontend node with respect  to  the  initiation  of
4848              user jobs.  Acceptable values are DOWN, DRAIN, FAIL, FAILING and
4849              UNKNOWN.  Node states of BUSY and IDLE should not  be  specified
4850              in  the  node  configuration,  but set the node state to UNKNOWN
4851              instead.  Setting the node state to UNKNOWN will result  in  the
4852              node  state  being  set to BUSY, IDLE or other appropriate state
4853              based upon recovered system state information.  For more  infor‐
4854              mation  about  these  states see the descriptions under State in
4855              the NodeName= section above.  The default value is UNKNOWN.
4856
4857
4858       As an example, you can do something similar to the following to  define
4859       four front end nodes for running slurmd daemons.
4860       FrontendName=frontend[00-03] FrontendAddr=efrontend[00-03] State=UNKNOWN
4861
4862

NODESET CONFIGURATION

4864       The  nodeset  configuration  allows you to define a name for a specific
4865       set of nodes which can be used to simplify the partition  configuration
4866       section, especially for heterogeneous or condo-style systems. Each node‐
4867       set may be defined by an explicit list of nodes,  and/or  by  filtering
4868       the  nodes  by  a  particular  configured feature. If both Feature= and
4869       Nodes= are used the nodeset shall be the  union  of  the  two  subsets.
4870       Note  that the nodesets are only used to simplify the partition defini‐
4871       tions at present, and are not usable outside of the partition  configu‐
4872       ration.
4873
4874       Feature
4875              All  nodes  with this single feature will be included as part of
4876              this nodeset.
4877
4878       Nodes  List of nodes in this set.
4879
4880       NodeSet
4881              Unique name for a set of nodes. Must not overlap with any  Node‐
4882              Name definitions.
4883
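       As an illustrative sketch (the nodeset, feature and partition names
       below are hypothetical), a nodeset can collect all nodes carrying a
       given feature, or an explicit node list, and then stand in for a node
       list in the partition configuration:

       NodeSet=gpu_nodes Feature=gpu
       NodeSet=bigmem_nodes Nodes=mem[0-7]
       PartitionName=gpu Nodes=gpu_nodes State=UP
       PartitionName=bigmem Nodes=bigmem_nodes State=UP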
4884

PARTITION CONFIGURATION

4886       The partition configuration permits you to establish different job lim‐
4887       its or access controls for various groups  (or  partitions)  of  nodes.
4888       Nodes  may  be  in  more than one partition, making partitions serve as
4889       general purpose queues.  For example one may put the same set of  nodes
4890       into  two  different  partitions, each with different constraints (time
4891       limit, job sizes, groups allowed to use the partition, etc.).  Jobs are
4892       allocated  resources  within a single partition.  Default values can be
4893       specified with a record  in  which  PartitionName  is  "DEFAULT".   The
4894       default  entry values will apply only to lines following it in the con‐
4895       figuration file and the default values can be reset multiple  times  in
4896       the   configuration   file  with  multiple  entries  where  "Partition‐
4897       Name=DEFAULT".  The "PartitionName=" specification must  be  placed  on
4898       every line describing the configuration of partitions.  Each line where
4899       PartitionName is "DEFAULT" will replace or add to previous default
4900       values and will not reinitialize the default values.  A single partition
4901       name cannot appear as a PartitionName value in more than one line
4902       (duplicate partition name records will be ignored).  If a partition
4903       that is in use is deleted from the configuration and Slurm is restarted or
4904       reconfigured  (scontrol reconfigure), jobs using the partition are can‐
4905       celed.  NOTE: Put all parameters for each partition on a  single  line.
4906       Each  line  of  partition  configuration information should represent a
4907       different partition.  The partition  configuration  file  contains  the
4908       following information:
4909
4910
4911       AllocNodes
4912              Comma  separated  list of nodes from which users can submit jobs
4913              in the partition.  Node names may be specified  using  the  node
4914              range  expression  syntax described above.  The default value is
4915              "ALL".
4916
4917
4918       AllowAccounts
4919              Comma separated list of accounts which may execute jobs  in  the
4920              partition.   The default value is "ALL".  NOTE: If AllowAccounts
4921              is used then DenyAccounts will not be enforced.  Also  refer  to
4922              DenyAccounts.
4923
4924
4925       AllowGroups
4926              Comma  separated  list  of group names which may execute jobs in
4927              the partition.  If at least one group associated with the user
4928              attempting to execute the job is in AllowGroups, the user will be
4929              permitted to use this partition.  Jobs executed as user root can
4930              use any partition without regard to the value of AllowGroups.  If
4931              user root attempts to execute a job as another user (e.g. using
4932              srun's --uid option), this other user must be in one of the groups
4933              identified by AllowGroups for the job to successfully execute.
4934              The default value is "ALL".  When set, all partitions that a user
4935              does not have access to will be hidden from display regardless of
4936              the settings used for PrivateData.  NOTE: For performance reasons,
4937              Slurm maintains a list of user IDs allowed to use each partition
4938              and this is checked at job submission time.  This list of user IDs
4939              is updated when the slurmctld daemon is restarted, reconfigured
4940              (e.g. "scontrol reconfig") or the partition's AllowGroups value is
4941              reset, even if its value is unchanged (e.g. "scontrol update
4942              PartitionName=name AllowGroups=group").  For a user's access to a
4943              partition to change, both the user's group membership must change
4944              and Slurm's internal user ID list must be updated using one of the
4945              methods described above.
4946
4947
4948       AllowQos
4949              Comma separated list of Qos which may execute jobs in the parti‐
4950              tion.   Jobs executed as user root can use any partition without
4951              regard to the value of AllowQos.  The default  value  is  "ALL".
4952              NOTE:  If  AllowQos  is  used then DenyQos will not be enforced.
4953              Also refer to DenyQos.
4954
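              As a hedged illustration (the account, group, QOS, partition
              and node names are hypothetical), the allow and deny lists are
              simply comma separated values on the partition line:

              PartitionName=chem Nodes=tux[0-15] AllowGroups=chem,chemadmin AllowAccounts=chemdept DenyQos=lowprio State=UP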
4955
4956       Alternate
4957              Partition name of alternate partition to be used if the state of
4958              this partition is "DRAIN" or "INACTIVE."
4959
4960
4961       CpuBind
4962              If a job step request does not specify an option to control how
4963              tasks are bound to allocated CPUs (--cpu-bind), and the nodes
4964              allocated to the job do not all have the same CpuBind option,
4965              then the partition's CpuBind option will control how tasks are
4966              bound to allocated resources.  Supported values for CpuBind are
4967              "none", "board", "socket", "ldom" (NUMA), "core" and "thread".
4969
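              For example, a partition whose nodes should default to binding
              tasks to cores (the partition and node names below are
              hypothetical) could be configured as:

              PartitionName=debug Nodes=tux[0-31] CpuBind=core State=UP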
4970
4971       Default
4972              If this keyword is set, jobs submitted without a partition spec‐
4973              ification will utilize  this  partition.   Possible  values  are
4974              "YES" and "NO".  The default value is "NO".
4975
4976
4977       DefCpuPerGPU
4978              Default count of CPUs allocated per allocated GPU.
4979
4980
4981       DefMemPerCPU
4982              Default   real  memory  size  available  per  allocated  CPU  in
4983              megabytes.  Used to avoid over-subscribing  memory  and  causing
4984              paging.  DefMemPerCPU would generally be used if individual pro‐
4985              cessors are allocated  to  jobs  (SelectType=select/cons_res  or
4986              SelectType=select/cons_tres).   If  not  set,  the  DefMemPerCPU
4987              value for the entire cluster will be  used.   Also  see  DefMem‐
4988              PerGPU,  DefMemPerNode  and MaxMemPerCPU.  DefMemPerCPU, DefMem‐
4989              PerGPU and DefMemPerNode are mutually exclusive.
4990
4991
4992       DefMemPerGPU
4993              Default  real  memory  size  available  per  allocated  GPU   in
4994              megabytes.   Also see DefMemPerCPU, DefMemPerNode and MaxMemPer‐
4995              CPU.  DefMemPerCPU, DefMemPerGPU and DefMemPerNode are  mutually
4996              exclusive.
4997
4998
4999       DefMemPerNode
5000              Default  real  memory  size  available  per  allocated  node  in
5001              megabytes.  Used to avoid over-subscribing  memory  and  causing
5002              paging.   DefMemPerNode  would  generally be used if whole nodes
5003              are allocated to jobs (SelectType=select/linear)  and  resources
5004              are  over-subscribed (OverSubscribe=yes or OverSubscribe=force).
5005              If not set, the DefMemPerNode value for the entire cluster  will
5006              be  used.  Also see DefMemPerCPU, DefMemPerGPU and MaxMemPerCPU.
5007              DefMemPerCPU, DefMemPerGPU and DefMemPerNode are mutually exclu‐
5008              sive.
5009
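              As a sketch (partition and node names are hypothetical), the
              memory defaults are additional parameters on the partition
              line; remember that DefMemPerCPU, DefMemPerGPU and
              DefMemPerNode are mutually exclusive:

              PartitionName=shared Nodes=tux[0-31] DefMemPerCPU=2048 MaxMemPerCPU=4096
              PartitionName=whole Nodes=mem[0-7] DefMemPerNode=64000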
5010
5011       DenyAccounts
5012              Comma  separated  list of accounts which may not execute jobs in
5013              the partition.  By default, no accounts are denied access.  NOTE:
5014              If AllowAccounts is used then DenyAccounts will not be enforced.
5015              Also refer to AllowAccounts.
5016
5017
5018       DenyQos
5019              Comma separated list of Qos which may not execute  jobs  in  the
5020              partition.  By default, no QOS are denied access.  NOTE: If
5021              AllowQos is used then DenyQos will not be enforced.  Also refer
5022              to AllowQos.
5023
5024
5025       DefaultTime
5026              Run  time limit used for jobs that don't specify a value. If not
5027              set then MaxTime will be used.  Format is the same as  for  Max‐
5028              Time.
5029
5030
5031       DisableRootJobs
5032              If  set  to  "YES" then user root will be prevented from running
5033              any jobs on this partition.  The default value will be the value
5034              of  DisableRootJobs  set  outside  of  a partition specification
5035              (which is "NO", allowing user root to execute jobs).
5036
5037
5038       ExclusiveUser
5039              If set to "YES" then nodes  will  be  exclusively  allocated  to
5040              users.  Multiple jobs may be run for the same user, but only one
5041              user can be active at a time.  This capability is also available
5042              on a per-job basis by using the --exclusive=user option.
5043
5044
5045       GraceTime
5046              Specifies,  in units of seconds, the preemption grace time to be
5047              extended to a job which has been selected for  preemption.   The
5048              default  value  is  zero, no preemption grace time is allowed on
5049              this partition.  Once a job has been  selected  for  preemption,
5050              its  end  time  is  set  to the current time plus GraceTime. The
5051              job's tasks are immediately sent SIGCONT and SIGTERM signals  in
5052              order to provide notification of its imminent termination.  This
5053              is followed by the SIGCONT, SIGTERM and SIGKILL signal  sequence
5054              upon  reaching  its  new end time. This second set of signals is
5055              sent to both the tasks  and  the  containing  batch  script,  if
5056              applicable.   See also the global KillWait configuration parame‐
5057              ter.
5058
5059
5060       Hidden Specifies if the partition and its jobs  are  to  be  hidden  by
5061              default.   Hidden  partitions will by default not be reported by
5062              the Slurm APIs or commands.  Possible values are "YES" and "NO".
5063              The  default  value  is  "NO".  Note that partitions that a user
5064              lacks access to by virtue of the AllowGroups parameter will also
5065              be hidden by default.
5066
5067
5068       LLN    Schedule resources to jobs on the least loaded nodes (based upon
5069              the number of idle CPUs). This is generally only recommended for
5070              an  environment  with serial jobs as idle resources will tend to
5071              be highly fragmented, resulting in parallel jobs being  distrib‐
5072              uted  across many nodes.  Note that node Weight takes precedence
5073              over how many idle resources are on each  node.   Also  see  the
5074              SelectParameters configuration parameter CR_LLN to use the least
5075              loaded nodes in every partition.
5076
5077
5078       MaxCPUsPerNode
5079              Maximum number of CPUs on any node available to  all  jobs  from
5080              this partition.  This can be especially useful to schedule GPUs.
5081              For example a node can be associated with two  Slurm  partitions
5082              (e.g.  "cpu"  and  "gpu") and the partition/queue "cpu" could be
5083              limited to only a subset of the node's CPUs, ensuring  that  one
5084              or  more  CPUs  would  be  available to jobs in the "gpu" parti‐
5085              tion/queue.
5086
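              A minimal sketch of the scenario described above (node and
              partition names are hypothetical, and the corresponding
              gres.conf is omitted): a 16-CPU node with GPUs is shared by a
              "cpu" and a "gpu" partition, with four CPUs held back for GPU
              jobs:

              NodeName=gpunode[0-3] CPUs=16 Gres=gpu:2
              PartitionName=cpu Nodes=gpunode[0-3] MaxCPUsPerNode=12
              PartitionName=gpu Nodes=gpunode[0-3]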
5087
5088       MaxMemPerCPU
5089              Maximum  real  memory  size  available  per  allocated  CPU   in
5090              megabytes.   Used  to  avoid over-subscribing memory and causing
5091              paging.  MaxMemPerCPU would generally be used if individual pro‐
5092              cessors  are  allocated  to  jobs (SelectType=select/cons_res or
5093              SelectType=select/cons_tres).   If  not  set,  the  MaxMemPerCPU
5094              value  for the entire cluster will be used.  Also see DefMemPer‐
5095              CPU and MaxMemPerNode.  MaxMemPerCPU and MaxMemPerNode are mutu‐
5096              ally exclusive.
5097
5098
5099       MaxMemPerNode
5100              Maximum  real  memory  size  available  per  allocated  node  in
5101              megabytes.  Used to avoid over-subscribing  memory  and  causing
5102              paging.   MaxMemPerNode  would  generally be used if whole nodes
5103              are allocated to jobs (SelectType=select/linear)  and  resources
5104              are  over-subscribed (OverSubscribe=yes or OverSubscribe=force).
5105              If not set, the MaxMemPerNode value for the entire cluster  will
5106              be used.  Also see DefMemPerNode and MaxMemPerCPU.  MaxMemPerCPU
5107              and MaxMemPerNode are mutually exclusive.
5108
5109
5110       MaxNodes
5111              Maximum count of nodes which may be allocated to any single job.
5112              The  default  value  is "UNLIMITED", which is represented inter‐
5113              nally as -1.  This limit does not  apply  to  jobs  executed  by
5114              SlurmUser or user root.
5115
5116
5117       MaxTime
5118              Maximum  run  time  limit  for  jobs.   Format  is minutes, min‐
5119              utes:seconds, hours:minutes:seconds, days-hours, days-hours:min‐
5120              utes,  days-hours:minutes:seconds  or "UNLIMITED".  Time resolu‐
5121              tion is one minute and second values are rounded up to the  next
5122              minute.  This limit does not apply to jobs executed by SlurmUser
5123              or user root.
5124
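              As a brief illustration of the time format (the partition and
              node names are hypothetical), a partition with a 30 minute
              default limit and a two day maximum could be written as:

              PartitionName=batch Nodes=tux[0-31] DefaultTime=30 MaxTime=2-00:00:00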
5125
5126       MinNodes
5127              Minimum count of nodes which may be allocated to any single job.
5128              The  default value is 0.  This limit does not apply to jobs exe‐
5129              cuted by SlurmUser or user root.
5130
5131
5132       Nodes  Comma separated list of nodes or nodesets which  are  associated
5133              with this partition.  Node names may be specified using the node
5134              range expression syntax described above. A blank list  of  nodes
5135              (i.e.  "Nodes= ") can be used if one wants a partition to exist,
5136              but have no resources (possibly on a temporary basis).  A  value
5137              of "ALL" is mapped to all nodes configured in the cluster.
5138
5139
5140       OverSubscribe
5141              Controls  the  ability of the partition to execute more than one
5142              job at a time on each resource (node, socket or  core  depending
5143              upon the value of SelectTypeParameters).  If resources are to be
5144              over-subscribed,  avoiding  memory  over-subscription  is   very
5145              important.   SelectTypeParameters  should be configured to treat
5146              memory as a consumable resource and the --mem option  should  be
5147              used  for  job  allocations.   Sharing of resources is typically
5148              useful  only  when  using  gang   scheduling   (PreemptMode=sus‐
5149              pend,gang).   Possible values for OverSubscribe are "EXCLUSIVE",
5150              "FORCE", "YES", and "NO".  Note that a value of "YES" or "FORCE"
5151              can  negatively  impact  performance for systems with many thou‐
5152              sands of running jobs.  The default value  is  "NO".   For  more
5153              information see the following web pages:
5154              https://slurm.schedmd.com/cons_res.html
5155              https://slurm.schedmd.com/cons_res_share.html
5156              https://slurm.schedmd.com/gang_scheduling.html
5157              https://slurm.schedmd.com/preempt.html
5158
5159
5160              EXCLUSIVE   Allocates  entire  nodes  to  jobs even with Select‐
5161                          Type=select/cons_res or  SelectType=select/cons_tres
5162                          configured.   Jobs that run in partitions with Over‐
5163                          Subscribe=EXCLUSIVE will have  exclusive  access  to
5164                          all allocated nodes.
5165
5166              FORCE       Makes  all  resources in the partition available for
5167                          oversubscription without any means for users to dis‐
5168                          able  it.   May be followed with a colon and maximum
5169                          number of jobs in running or suspended  state.   For
5170                          example  OverSubscribe=FORCE:4  enables  each  node,
5171                          socket or core to oversubscribe each  resource  four
5172                          ways.   Recommended  only for systems using Preempt‐
5173                          Mode=suspend,gang.
5174
5175                          NOTE: OverSubscribe=FORCE:1 is a special  case  that
5176                          is not exactly equivalent to OverSubscribe=NO. Over‐
5177                          Subscribe=FORCE:1 disables the regular oversubscrip‐
5178                          tion  of resources in the same partition but it will
5179                          still allow oversubscription due to preemption. Set‐
5180                          ting  OverSubscribe=NO will prevent oversubscription
5181                          from happening due to preemption as well.
5182
5183                          NOTE: If using PreemptType=preempt/qos you can spec‐
5184                          ify  a  value  for FORCE that is greater than 1. For
5185                          example, OverSubscribe=FORCE:2 will permit two  jobs
5186                          per  resource  normally,  but  a  third  job  can be
5187                          started only if done  so  through  preemption  based
5188                          upon QOS.
5189
5190                          NOTE: If OverSubscribe is configured to FORCE or YES
5191                          in your slurm.conf and the system is not  configured
5192                          to  use  preemption (PreemptMode=OFF) accounting can
5193                          easily grow to values greater than the  actual  uti‐
5194                          lization.  It  may  be common on such systems to get
5195                          error messages in the slurmdbd log stating: "We have
5196                          more allocated time than is possible."
5197
5198
5199              YES         Makes  all  resources in the partition available for
5200                          sharing upon request by  the  job.   Resources  will
5201                          only be over-subscribed when explicitly requested by
5202                          the user using the "--oversubscribe" option  on  job
5203                          submission.   May be followed with a colon and maxi‐
5204                          mum number of jobs in running  or  suspended  state.
5205                          For example "OverSubscribe=YES:4" enables each node,
5206                          socket or core to execute up to four jobs  at  once.
5207                          Recommended  only  for  systems  running  with  gang
5208                          scheduling (PreemptMode=suspend,gang).
5209
5210              NO          Selected resources are allocated to a single job. No
5211                          resource will be allocated to more than one job.
5212
5213                          NOTE:   Even   if  you  are  using  PreemptMode=sus‐
5214                          pend,gang,  setting  OverSubscribe=NO  will  disable
5215                          preemption   on   that   partition.   Use   OverSub‐
5216                          scribe=FORCE:1 if you want to disable  normal  over‐
5217                          subscription  but still allow suspension due to pre‐
5218                          emption.
5219
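              For instance, assuming PreemptMode=suspend,gang is configured
              cluster-wide, a sketch of a partition (names are hypothetical)
              that lets each core be time-sliced among up to two jobs would
              be:

              PartitionName=gang Nodes=tux[0-31] OverSubscribe=FORCE:2 State=UP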
5220
5221       PartitionName
5222              Name by which the partition may be  referenced  (e.g.  "Interac‐
5223              tive").   This  name  can  be specified by users when submitting
5224              jobs.  If the PartitionName is "DEFAULT", the  values  specified
5225              with  that  record will apply to subsequent partition specifica‐
5226              tions unless explicitly set to other values  in  that  partition
5227              record or replaced with a different set of default values.  Each
5228              line where PartitionName is "DEFAULT" will replace or add to
5229              previous default values and will not reinitialize the default
5230              values.
5231
5232
5233       PreemptMode
5234              Mechanism used to preempt jobs or  enable  gang  scheduling  for
5235              this  partition  when PreemptType=preempt/partition_prio is con‐
5236              figured.   This  partition-specific  PreemptMode   configuration
5237              parameter  will  override  the cluster-wide PreemptMode for this
5238              partition.  It can be set to OFF to disable preemption and  gang
5239              scheduling  for  this  partition.  See also PriorityTier and the
5240              above description of the cluster-wide PreemptMode parameter  for
5241              further details.
5242
5243
5244       PriorityJobFactor
5245              Partition  factor  used by priority/multifactor plugin in calcu‐
5246              lating job priority.  The value may not exceed 65533.  Also  see
5247              PriorityTier.
5248
5249
5250       PriorityTier
5251              Jobs  submitted to a partition with a higher priority tier value
5252              will be dispatched before pending jobs in partition  with  lower
5253              priority  tier value and, if possible, they will preempt running
5254              jobs from partitions with lower priority tier values.  Note that
5255              a partition's priority tier takes precedence over a job's prior‐
5256              ity.  The value may not exceed 65533.  Also see  PriorityJobFac‐
5257              tor.
5258
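              A brief sketch (partition and node names are hypothetical),
              assuming PreemptType=preempt/partition_prio is configured:
              jobs in the "urgent" partition below are dispatched before,
              and may preempt, jobs in the "normal" partition on the same
              nodes:

              PartitionName=normal Nodes=tux[0-31] PriorityTier=1 Default=YES
              PartitionName=urgent Nodes=tux[0-31] PriorityTier=10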
5259
5260       QOS    Used  to  extend  the  limits available to a QOS on a partition.
5261              Jobs will not be associated to this QOS outside of being associ‐
5262              ated  to  the partition.  They will still be associated to their
5263              requested QOS.  By default, no QOS is used.  NOTE: If a limit is
5264              set  in both the Partition's QOS and the Job's QOS the Partition
5265              QOS will be honored unless the Job's QOS has the OverPartQOS
5266              flag set, in which case the Job's QOS will have priority.
5267
5268
5269       ReqResv
5270              Specifies  users  of  this partition are required to designate a
5271              reservation when submitting a job. This option can be useful  in
5272              restricting  usage  of a partition that may have higher priority
5273              or additional resources to be allowed only within a reservation.
5274              Possible values are "YES" and "NO".  The default value is "NO".
5275
5276
5277       RootOnly
5278              Specifies  if  only  user  ID zero (i.e. user root) may allocate
5279              resources in this partition. User root  may  allocate  resources
5280              for  any  other  user, but the request must be initiated by user
5281              root.  This option can be useful for a partition to  be  managed
5282              by  some  external  entity (e.g. a higher-level job manager) and
5283              prevents users from directly using  those  resources.   Possible
5284              values are "YES" and "NO".  The default value is "NO".
5285
5286
5287       SelectTypeParameters
5288              Partition-specific   resource   allocation  type.   This  option
5289              replaces the global SelectTypeParameters value.  Supported  val‐
5290              ues are CR_Core, CR_Core_Memory, CR_Socket and CR_Socket_Memory.
5291              Use requires the system-wide SelectTypeParameters value  be  set
5292              to  any  of  the four supported values previously listed; other‐
5293              wise, the partition-specific value will be ignored.
5294
5295
5296       Shared The Shared configuration parameter  has  been  replaced  by  the
5297              OverSubscribe parameter described above.
5298
5299
5300       State  State of partition or availability for use.  Possible values are
5301              "UP", "DOWN", "DRAIN" and "INACTIVE". The default value is "UP".
5302              See also the related "Alternate" keyword.
5303
5304              UP        Designates  that  new jobs may be queued on the parti‐
5305                        tion, and that jobs may be  allocated  nodes  and  run
5306                        from the partition.
5307
5308              DOWN      Designates  that  new jobs may be queued on the parti‐
5309                        tion, but queued jobs may not be allocated  nodes  and
5310                        run  from  the  partition. Jobs already running on the
5311                        partition continue to run. The jobs must be explicitly
5312                        canceled to force their termination.
5313
5314              DRAIN     Designates  that no new jobs may be queued on the par‐
5315                        tition (job submission requests will be denied with an
5316                        error  message), but jobs already queued on the parti‐
5317                        tion may be allocated nodes and  run.   See  also  the
5318                        "Alternate" partition specification.
5319
5320              INACTIVE  Designates  that no new jobs may be queued on the par‐
5321                        tition, and jobs already queued may not  be  allocated
5322                        nodes  and  run.   See  also the "Alternate" partition
5323                        specification.
5324
5325
5326       TRESBillingWeights
5327              TRESBillingWeights is used to define the billing weights of each
5328              TRES  type  that will be used in calculating the usage of a job.
5329              The calculated usage is used when calculating fairshare and when
5330              enforcing the TRES billing limit on jobs.
5331
5332              Billing weights are specified as a comma-separated list of <TRES
5333              Type>=<TRES Billing Weight> pairs.
5334
5335              Any TRES Type is available for billing. Note that the base  unit
5336              for memory and burst buffers is megabytes.
5337
5338              By  default  the billing of TRES is calculated as the sum of all
5339              TRES types multiplied by their corresponding billing weight.
5340
5341              The weighted amount of a resource can be adjusted  by  adding  a
5342              suffix  of K,M,G,T or P after the billing weight. For example, a
5343              memory weight of "mem=.25" on a job allocated 8GB will be billed
5344              2048  (8192MB  *.25) units. A memory weight of "mem=.25G" on the
5345              same job will be billed 2 (8192MB * (.25/1024)) units.
5346
5347              Negative values are allowed.
5348
5349              When a job is allocated 1 CPU and 8 GB of memory on a  partition
5350              configured                   with                   TRESBilling‐
5351              Weights="CPU=1.0,Mem=0.25G,GRES/gpu=2.0", the billable TRES will
5352              be: (1*1.0) + (8*0.25) + (0*2.0) = 3.0.
5353
5354              If  PriorityFlags=MAX_TRES  is  configured, the billable TRES is
5355              calculated as the MAX of individual TRES' on a node (e.g.  cpus,
5356              mem,  gres)  plus  the  sum of all global TRES' (e.g. licenses).
5357              Using  the  same  example  above  the  billable  TRES  will   be
5358              MAX(1*1.0, 8*0.25) + (0*2.0) = 2.0.
5359
5360              If  TRESBillingWeights  is  not  defined  then the job is billed
5361              against the total number of allocated CPUs.
5362
5363              NOTE: TRESBillingWeights doesn't affect job priority directly as
5364              it  is  currently  not used for the size of the job. If you want
5365              TRES' to play a role in the job's priority  then  refer  to  the
5366              PriorityWeightTRES option.
5367
5368
5369

PROLOG AND EPILOG SCRIPTS

5371       There  are  a variety of prolog and epilog program options that execute
5372       with various permissions and at various times.  The four  options  most
5373       likely to be used are: Prolog and Epilog (executed once on each compute
5374       node for each job) plus PrologSlurmctld and  EpilogSlurmctld  (executed
5375       once on the ControlMachine for each job).
5376
5377       NOTE:  Standard  output  and error messages are normally not preserved.
5378       Explicitly write output and error messages to an  appropriate  location
5379       if you wish to preserve that information.
5380
5381       NOTE:   By default the Prolog script is ONLY run on any individual node
5382       when it first sees a job step from a new allocation. It  does  not  run
5383       the  Prolog immediately when an allocation is granted.  If no job steps
5384       from an allocation are run on a node, it will never run the Prolog  for
5385       that  allocation.   This  Prolog  behaviour  can be changed by the Pro‐
5386       logFlags parameter.  The Epilog, on the  other  hand,  always  runs  on
5387       every node of an allocation when the allocation is released.
5388
5389       If the Epilog fails (returns a non-zero exit code), this will result in
5390       the node being set to a DRAIN  state.   If  the  EpilogSlurmctld  fails
5391       (returns  a non-zero exit code), this will only be logged.  If the Pro‐
5392       log fails (returns a non-zero exit code), this will result in the  node
5393       being  set  to a DRAIN state and the job being requeued in a held state
5394       unless nohold_on_prolog_fail is configured in SchedulerParameters.   If
5395       the  PrologSlurmctld  fails  (returns  a non-zero exit code), this will
5396       result in the job being requeued to be executed on another node if pos‐
5397       sible.  Only  batch jobs can be requeued.  Interactive jobs (salloc and
5398       srun) will be cancelled if the PrologSlurmctld fails.
5399
5400
5401       Information about the job is passed to  the  script  using  environment
5402       variables.  Unless otherwise specified, these environment variables are
5403       available in each of the scripts mentioned above (Prolog, Epilog,  Pro‐
5404       logSlurmctld and EpilogSlurmctld). For a full list of environment vari‐
5405       ables that includes those  available  in  the  SrunProlog,  SrunEpilog,
5406       TaskProlog  and  TaskEpilog  please  see  the  Prolog  and Epilog Guide
5407       <https://slurm.schedmd.com/prolog_epilog.html>.
5408
5409       SLURM_ARRAY_JOB_ID
5410              If this job is part of a job array, this will be set to the  job
5411              ID.   Otherwise  it will not be set.  To reference this specific
5412              task  of  a   job   array,   combine   SLURM_ARRAY_JOB_ID   with
5413              SLURM_ARRAY_TASK_ID         (e.g.        "scontrol        update
5414              ${SLURM_ARRAY_JOB_ID}_${SLURM_ARRAY_TASK_ID} ..."); Available in
5415              PrologSlurmctld and EpilogSlurmctld only.
5416
5417       SLURM_ARRAY_TASK_ID
5418              If this job is part of a job array, this will be set to the task
5419              ID.  Otherwise it will not be set.  To reference  this  specific
5420              task   of   a   job   array,   combine  SLURM_ARRAY_JOB_ID  with
5421              SLURM_ARRAY_TASK_ID        (e.g.        "scontrol         update
5422              ${SLURM_ARRAY_JOB_ID}_${SLURM_ARRAY_TASK_ID} ..."); Available in
5423              PrologSlurmctld and EpilogSlurmctld only.
5424
5425       SLURM_ARRAY_TASK_MAX
5426              If this job is part of a job array, this will be set to the max‐
5427              imum  task ID.  Otherwise it will not be set.  Available in Pro‐
5428              logSlurmctld and EpilogSlurmctld only.
5429
5430       SLURM_ARRAY_TASK_MIN
5431              If this job is part of a job array, this will be set to the min‐
5432              imum  task ID.  Otherwise it will not be set.  Available in Pro‐
5433              logSlurmctld and EpilogSlurmctld only.
5434
5435       SLURM_ARRAY_TASK_STEP
5436              If this job is part of a job array, this will be set to the step
5437              size  of  task IDs.  Otherwise it will not be set.  Available in
5438              PrologSlurmctld and EpilogSlurmctld only.
5439
5440       SLURM_CLUSTER_NAME
5441              Name of the cluster executing the job.
5442
5443       SLURM_CONF
5444              Location of the slurm.conf file. Available in Prolog and  Epilog
5445              only.
5446
5447       SLURMD_NODENAME
5448              Name of the node running the task. In the case of a parallel job
5449              executing on multiple compute nodes, the various tasks will have
5450              this  environment  variable set to different values on each com‐
5451              pute node. Available in Prolog and Epilog only.
5452
5453       SLURM_JOB_ACCOUNT
5454              Account name used for the job.  Available in PrologSlurmctld and
5455              EpilogSlurmctld only.
5456
5457       SLURM_JOB_CONSTRAINTS
5458              Features  required  to  run  the job.  Available in Prolog, Pro‐
5459              logSlurmctld and EpilogSlurmctld only.
5460
5461       SLURM_JOB_DERIVED_EC
5462              The highest exit code of all of the  job  steps.   Available  in
5463              EpilogSlurmctld only.
5464
5465       SLURM_JOB_EXIT_CODE
5466              The  exit  code  of the job script (or salloc). The value is the
5467              status as returned by the wait() system call (see wait(2)).
5468              Available in EpilogSlurmctld only.
5469
5470       SLURM_JOB_EXIT_CODE2
5471              The  exit  code of the job script (or salloc). The value has the
5472              format <exit>:<sig>. The first number is the  exit  code,  typi‐
5473              cally as set by the exit() function. The second number is the
5474              signal that caused the process to terminate, if it was terminated
5475              by a signal.  Available in EpilogSlurmctld only.
5476
5477       SLURM_JOB_GID
5478              Group  ID  of the job's owner.  Available in PrologSlurmctld and
5479              EpilogSlurmctld only.
5480
5481       SLURM_JOB_GPUS
5482              GPU IDs allocated to the job (if any).  Available in the  Prolog
5483              only.
5484
5485       SLURM_JOB_GROUP
5486              Group name of the job's owner.  Available in PrologSlurmctld and
5487              EpilogSlurmctld only.
5488
5489       SLURM_JOB_ID
5490              Job ID.
5491
5492       SLURM_JOBID
5493              Job ID.
5494
5495       SLURM_JOB_NAME
5496              Name of the job.  Available in PrologSlurmctld and  EpilogSlurm‐
5497              ctld only.
5498
5499       SLURM_JOB_NODELIST
5500              Nodes  assigned  to job. A Slurm hostlist expression.  "scontrol
5501              show hostnames" can be used to convert this to a list  of  indi‐
5502              vidual  host  names.   Available  in  PrologSlurmctld  and  Epi‐
5503              logSlurmctld only.
5504
5505       SLURM_JOB_PARTITION
5506              Partition that job runs in.  Available in  Prolog,  PrologSlurm‐
5507              ctld and EpilogSlurmctld only.
5508
5509       SLURM_JOB_UID
5510              User ID of the job's owner.
5511
5512       SLURM_JOB_USER
5513              User name of the job's owner.
5514
5515       SLURM_SCRIPT_CONTEXT
5516              Identifies which epilog or prolog program is currently running.
5517
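       Because standard output is not preserved, prolog and epilog scripts
       normally write their own log.  The following is a minimal sketch of
       an Epilog script, assuming a site-chosen log file (the path below is
       hypothetical); it only demonstrates use of the environment variables
       described above and the importance of the exit code:

       #!/bin/sh
       # Hypothetical Epilog sketch: append a completion record on this node.
       LOG=/var/log/slurm/epilog.log
       echo "$(date) node=${SLURMD_NODENAME} job=${SLURM_JOB_ID} user=${SLURM_JOB_USER}" >> "$LOG"
       # A non-zero exit code would drain this node, so exit 0 explicitly.
       exit 0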
5518

NETWORK TOPOLOGY

5520       Slurm is able to optimize job allocations to minimize network
5521       contention.  Special Slurm logic is used to optimize allocations on
5522       systems with a three-dimensional interconnect, and information about
5523       configuring those systems is available at
5524       <https://slurm.schedmd.com/>.  For a hierarchical network, Slurm needs
5525       to have detailed information about how nodes are configured on the
5526       network switches.
5527
5528       Given  network  topology  information,  Slurm  allocates all of a job's
5529       resources onto a single leaf of  the  network  (if  possible)  using  a
5530       best-fit  algorithm.  Otherwise it will allocate a job's resources onto
5531       multiple leaf switches so  as  to  minimize  the  use  of  higher-level
5532       switches.   The  TopologyPlugin parameter controls which plugin is used
5533       to collect network topology information.   The  only  values  presently
5534       supported are "topology/3d_torus" (default for Cray XT/XE systems, per‐
5535       forms best-fit logic over three-dimensional topology),  "topology/none"
5536       (default  for other systems, best-fit logic over one-dimensional topol‐
5537       ogy), "topology/tree" (determine the network topology based upon infor‐
5538       mation  contained  in a topology.conf file, see "man topology.conf" for
5539       more information).  Future  plugins  may  gather  topology  information
5540       directly  from  the network.  The topology information is optional.  If
5541       not provided, Slurm will perform  a  best-fit  algorithm  assuming  the
5542       nodes  are  in a one-dimensional array as configured and the communica‐
5543       tions cost is related to the node distance in this array.
5544
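       As a sketch of the hierarchical case (switch and node names are
       hypothetical), slurm.conf selects the plugin and topology.conf
       describes the switch hierarchy; see topology.conf(5) for the full
       syntax:

       # slurm.conf
       TopologyPlugin=topology/tree
       # topology.conf
       SwitchName=leaf0 Nodes=tux[0-15]
       SwitchName=leaf1 Nodes=tux[16-31]
       SwitchName=spine Switches=leaf[0-1]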
5545

RELOCATING CONTROLLERS

5547       If the cluster's computers used for the primary  or  backup  controller
5548       will be out of service for an extended period of time, it may be desir‐
5549       able to relocate them.  In order to do so, follow this procedure:
5550
5551       1. Stop the Slurm daemons
5552       2. Modify the slurm.conf file appropriately
5553       3. Distribute the updated slurm.conf file to all nodes
5554       4. Restart the Slurm daemons
5555
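       A minimal command sketch of that procedure, assuming systemd service
       units named slurmctld and slurmd (adapt to your init system and your
       own method of distributing files):

       systemctl stop slurmd          # on every compute node
       systemctl stop slurmctld       # on the old controller
       vi /etc/slurm.conf             # update the SlurmctldHost entries
       # copy the updated slurm.conf to every node (site-specific)
       systemctl start slurmctld      # on the new controller
       systemctl start slurmd         # on every compute node
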
5556       There should be no loss of any running or pending  jobs.   Ensure  that
5557       any  nodes  added  to  the  cluster  have  the  current slurm.conf file
5558       installed.
5559
5560       CAUTION: If two nodes are simultaneously configured as the primary con‐
5561       troller (two nodes on which SlurmctldHost specifies the local host and
5562       the slurmctld daemon is executing on each),  system  behavior  will  be
5563       destructive.   If a compute node has an incorrect SlurmctldHost parame‐
5564       ter, that node may be rendered unusable, but no other harm will result.
5565
5566

EXAMPLE

5568       #
5569       # Sample /etc/slurm.conf for dev[0-25].llnl.gov
5570       # Author: John Doe
5571       # Date: 11/06/2001
5572       #
5573       SlurmctldHost=dev0(12.34.56.78)  # Primary server
5574       SlurmctldHost=dev1(12.34.56.79)  # Backup server
5575       #
5576       AuthType=auth/munge
5577       Epilog=/usr/local/slurm/epilog
5578       Prolog=/usr/local/slurm/prolog
5579       FirstJobId=65536
5580       InactiveLimit=120
5581       JobCompType=jobcomp/filetxt
5582       JobCompLoc=/var/log/slurm/jobcomp
5583       KillWait=30
5584       MaxJobCount=10000
5585       MinJobAge=3600
5586       PluginDir=/usr/local/lib:/usr/local/slurm/lib
5587       ReturnToService=0
5588       SchedulerType=sched/backfill
5589       SlurmctldLogFile=/var/log/slurm/slurmctld.log
5590       SlurmdLogFile=/var/log/slurm/slurmd.log
5591       SlurmctldPort=7002
5592       SlurmdPort=7003
5593       SlurmdSpoolDir=/var/spool/slurmd.spool
5594       StateSaveLocation=/var/spool/slurm.state
5595       SwitchType=switch/none
5596       TmpFS=/tmp
5597       WaitTime=30
5598       JobCredentialPrivateKey=/usr/local/slurm/private.key
5599       JobCredentialPublicCertificate=/usr/local/slurm/public.cert
5600       #
5601       # Node Configurations
5602       #
5603       NodeName=DEFAULT CPUs=2 RealMemory=2000 TmpDisk=64000
5604       NodeName=DEFAULT State=UNKNOWN
5605       NodeName=dev[0-25] NodeAddr=edev[0-25] Weight=16
5606       # Update records for specific DOWN nodes
5607       DownNodes=dev20 State=DOWN Reason="power,ETA=Dec25"
5608       #
5609       # Partition Configurations
5610       #
5611       PartitionName=DEFAULT MaxTime=30 MaxNodes=10 State=UP
5612       PartitionName=debug Nodes=dev[0-8,18-25] Default=YES
5613       PartitionName=batch Nodes=dev[9-17]  MinNodes=4
5614       PartitionName=long Nodes=dev[9-17] MaxTime=120 AllowGroups=admin
5615
5616

INCLUDE MODIFIERS

5618       The "include" key word can be used with modifiers within the  specified
5619       pathname.  These modifiers would be replaced with cluster name or other
5620       information depending on which modifier is specified. If  the  included
5621       file is not an absolute path name (i.e. it does not start with a
5622       slash), it will be searched for in the same directory as the slurm.conf
5623       file.
5624
5625       %c     Cluster name specified in the slurm.conf will be used.
5626
5627       EXAMPLE
5628       ClusterName=linux
5629       include /home/slurm/etc/%c_config
5630       # Above line interpreted as
5631       # "include /home/slurm/etc/linux_config"
5632
5633

FILE AND DIRECTORY PERMISSIONS

5635       There  are  three  classes  of  files:  Files used by slurmctld must be
5636       accessible by user SlurmUser and accessible by the primary  and  backup
5637       control machines.  Files used by slurmd must be accessible by user root
5638       and accessible from every compute node.  A few files need to be  acces‐
5639       sible by normal users on all login and compute nodes.  While many files
5640       and directories are listed below, most of them will not  be  used  with
5641       most configurations.
5642
5643       Epilog Must  be  executable  by  user root.  It is recommended that the
5644              file be readable by all users.  The file  must  exist  on  every
5645              compute node.
5646
5647       EpilogSlurmctld
5648              Must  be  executable  by user SlurmUser.  It is recommended that
5649              the file be readable by all users.  The file must be  accessible
5650              by the primary and backup control machines.
5651
5652       HealthCheckProgram
5653              Must  be  executable  by  user root.  It is recommended that the
5654              file be readable by all users.  The file  must  exist  on  every
5655              compute node.
5656
5657       JobCompLoc
5658              If this specifies a file, it must be writable by user SlurmUser.
5659              The file must be accessible by the primary  and  backup  control
5660              machines.
5661
5662       JobCredentialPrivateKey
5663              Must be readable only by user SlurmUser and writable by no other
5664              users.  The file must be accessible by the  primary  and  backup
5665              control machines.
5666
5667       JobCredentialPublicCertificate
5668              Readable  to  all  users  on all nodes.  Must not be writable by
5669              regular users.
5670
5671       MailProg
5672              Must be executable by user SlurmUser.  Must not be  writable  by
5673              regular  users.   The file must be accessible by the primary and
5674              backup control machines.
5675
5676       Prolog Must be executable by user root.  It  is  recommended  that  the
5677              file  be  readable  by  all users.  The file must exist on every
5678              compute node.
5679
5680       PrologSlurmctld
5681              Must be executable by user SlurmUser.  It  is  recommended  that
5682              the  file be readable by all users.  The file must be accessible
5683              by the primary and backup control machines.
5684
5685       ResumeProgram
5686              Must be executable by user SlurmUser.  The file must be accessi‐
5687              ble by the primary and backup control machines.
5688
5689       slurm.conf
5690              Readable  to  all  users  on all nodes.  Must not be writable by
5691              regular users.
5692
5693       SlurmctldLogFile
5694              Must be writable by user SlurmUser.  The file must be accessible
5695              by the primary and backup control machines.
5696
5697       SlurmctldPidFile
5698              Must  be  writable by user root.  Preferably writable and remov‐
5699              able by SlurmUser.  The file must be accessible by  the  primary
5700              and backup control machines.
5701
5702       SlurmdLogFile
5703              Must  be  writable  by user root.  A distinct file must exist on
5704              each compute node.
5705
5706       SlurmdPidFile
5707              Must be writable by user root.  A distinct file  must  exist  on
5708              each compute node.
5709
5710       SlurmdSpoolDir
5711              Must be writable by user root.  A distinct directory must exist
5712              on each compute node.
5713
5714       SrunEpilog
5715              Must be executable by all users.  The file must exist  on  every
5716              login and compute node.
5717
5718       SrunProlog
5719              Must  be  executable by all users.  The file must exist on every
5720              login and compute node.
5721
5722       StateSaveLocation
5723              Must be writable by user SlurmUser.  The file must be accessible
5724              by the primary and backup control machines.
5725
5726       SuspendProgram
5727              Must be executable by user SlurmUser.  The file must be accessi‐
5728              ble by the primary and backup control machines.
5729
5730       TaskEpilog
5731              Must be executable by all users.  The file must exist  on  every
5732              compute node.
5733
5734       TaskProlog
5735              Must  be  executable by all users.  The file must exist on every
5736              compute node.
5737
5738       UnkillableStepProgram
5739              Must be executable by user SlurmUser.  The file must be accessi‐
5740              ble by the primary and backup control machines.
5741
5742

LOGGING

5744       Note that while Slurm daemons create log files and other files as
5745       needed, they treat the lack of parent directories as a fatal error.
5746       This prevents the daemons from running if critical file systems are not
5747       mounted and will minimize the risk of cold-starting  (starting  without
5748       preserving jobs).
5749
5750       Log files and job accounting files may need to be created/owned by the
5751       "SlurmUser" uid to be  successfully  accessed.   Use  the  "chown"  and
5752       "chmod"  commands  to  set the ownership and permissions appropriately.
5753       See the section FILE AND DIRECTORY PERMISSIONS  for  information  about
5754       the various files and directories used by Slurm.
5755
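       For example, assuming SlurmUser is set to "slurm" and the log and
       state save paths from the EXAMPLE section above are used, ownership
       and permissions could be set with:

       chown slurm /var/log/slurm/slurmctld.log
       chmod 640 /var/log/slurm/slurmctld.log
       chown -R slurm /var/spool/slurm.state
       chmod 700 /var/spool/slurm.state
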
5756       It  is  recommended  that  the logrotate utility be used to ensure that
5757       various log files do not become too large.  This also applies  to  text
5758       files  used  for  accounting, process tracking, and the slurmdbd log if
5759       they are used.
5760
5761       Here is a sample logrotate configuration. Make appropriate site modifi‐
5762       cations  and  save  as  /etc/logrotate.d/slurm  on  all nodes.  See the
5763       logrotate man page for more details.
5764
5765       ##
5766       # Slurm Logrotate Configuration
5767       ##
5768       /var/log/slurm/*.log {
5769            compress
5770            missingok
5771            nocopytruncate
5772            nodelaycompress
5773            nomail
5774            notifempty
5775            noolddir
5776            rotate 5
5777            sharedscripts
5778            size=5M
5779            create 640 slurm root
5780            postrotate
5781                 pkill -x --signal SIGUSR2 slurmctld
5782                 pkill -x --signal SIGUSR2 slurmd
5783                 pkill -x --signal SIGUSR2 slurmdbd
5784                 exit 0
5785            endscript
5786       }
5787

COPYING

5789       Copyright (C) 2002-2007 The Regents of the  University  of  California.
5790       Produced at Lawrence Livermore National Laboratory (cf, DISCLAIMER).
5791       Copyright (C) 2008-2010 Lawrence Livermore National Security.
5792       Copyright (C) 2010-2017 SchedMD LLC.
5793
5794       This  file  is  part  of  Slurm,  a  resource  management program.  For
5795       details, see <https://slurm.schedmd.com/>.
5796
5797       Slurm is free software; you can redistribute it and/or modify it  under
5798       the  terms  of  the GNU General Public License as published by the Free
5799       Software Foundation; either version 2  of  the  License,  or  (at  your
5800       option) any later version.
5801
5802       Slurm  is  distributed  in the hope that it will be useful, but WITHOUT
5803       ANY WARRANTY; without even the implied warranty of  MERCHANTABILITY  or
5804       FITNESS  FOR  A PARTICULAR PURPOSE.  See the GNU General Public License
5805       for more details.
5806
5807

FILES

5809       /etc/slurm.conf
5810
5811

SEE ALSO

5813       cgroup.conf(5), getaddrinfo(3), getrlimit(2),  gres.conf(5),  group(5),
5814       hostname(1),  scontrol(1),  slurmctld(8), slurmd(8), slurmdbd(8), slur‐
5815       mdbd.conf(5), srun(1), spank(8), syslog(3), topology.conf(5)
5816
5817
5818
5819January 2021               Slurm Configuration File              slurm.conf(5)