slurm.conf(5)           Slurm Configuration File           slurm.conf(5)


NAME
       slurm.conf - Slurm configuration file


DESCRIPTION
       slurm.conf is an ASCII file which describes general Slurm
       configuration information, the nodes to be managed, information
       about how those nodes are grouped into partitions, and various
       scheduling parameters associated with those partitions. This file
       should be consistent across all nodes in the cluster.

       The file location can be modified at system build time using the
       DEFAULT_SLURM_CONF parameter or at execution time by setting the
       SLURM_CONF environment variable. The Slurm daemons also allow you
       to override both the built-in and environment-provided location
       using the "-f" option on the command line.

       The contents of the file are case insensitive except for the names
       of nodes and partitions. Any text following a "#" in the
       configuration file is treated as a comment through the end of that
       line. Changes to the configuration file take effect upon restart
       of Slurm daemons, daemon receipt of the SIGHUP signal, or
       execution of the command "scontrol reconfigure" unless otherwise
       noted.

       If a line begins with the word "Include" followed by whitespace
       and then a file name, that file will be included inline with the
       current configuration file. For large or complex systems, multiple
       configuration files may prove easier to manage and enable reuse of
       some files (see INCLUDE MODIFIERS for more details).

       Note on file permissions:

       The slurm.conf file must be readable by all users of Slurm, since
       it is used by many of the Slurm commands. Other files that are
       defined in the slurm.conf file, such as log files and job
       accounting files, may need to be created/owned by the user
       "SlurmUser" to be successfully accessed. Use the "chown" and
       "chmod" commands to set the ownership and permissions
       appropriately. See the section FILE AND DIRECTORY PERMISSIONS for
       information about the various files and directories used by Slurm.


PARAMETERS
       The overall configuration parameters available include:

       AccountingStorageBackupHost
              The name of the backup machine hosting the accounting
              storage database. If used with the
              accounting_storage/slurmdbd plugin, this is where the
              backup slurmdbd would be running. Only used with systems
              using SlurmDBD, ignored otherwise.


       AccountingStorageEnforce
              This controls what level of association-based enforcement
              to impose on job submissions. Valid options are any
              combination of associations, limits, nojobs, nosteps, qos,
              safe, and wckeys, or all for all things (except nojobs and
              nosteps, which must be requested as well).

              If limits, qos, or wckeys are set, associations will
              automatically be set.

              If wckeys is set, TrackWCKey will automatically be set.

              If safe is set, limits and associations will automatically
              be set.

              If nojobs is set, nosteps will automatically be set.

              By enforcing associations no new job is allowed to run
              unless a corresponding association exists in the system. If
              limits are enforced, users can be limited by association to
              whatever job size or run time limits are defined.

              If nojobs is set, Slurm will not account for any jobs or
              steps on the system; likewise, if nosteps is set, Slurm
              will not account for any steps that have run, but limits
              will still be enforced.

              If safe is enforced, a job will only be launched against an
              association or qos that has a GrpTRESMins limit set if the
              job will be able to run to completion. Without this option
              set, jobs will be launched as long as their usage hasn't
              reached the cpu-minutes limit, which can lead to jobs being
              launched but then killed when the limit is reached.

              With qos and/or wckeys enforced, jobs will not be scheduled
              unless a valid qos and/or workload characterization key is
              specified.

              When AccountingStorageEnforce is changed, a restart of the
              slurmctld daemon is required (not just a "scontrol
              reconfig").

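              As an illustrative sketch, a site that wants jobs rejected
              unless they match a valid association, with limits applied
              only when jobs can run to completion, might set (the
              combination shown is an assumption about site policy, not a
              recommended default):

```conf
AccountingStorageEnforce=associations,limits,qos,safe
```

              Since safe implies limits and associations, listing them
              explicitly is redundant but harmless.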

       AccountingStorageHost
              The name of the machine hosting the accounting storage
              database. Only used with systems using SlurmDBD, ignored
              otherwise. Also see DefaultStorageHost.

       AccountingStorageLoc
              The fully qualified file name where accounting records are
              written when the AccountingStorageType is
              "accounting_storage/filetxt". Also see DefaultStorageLoc.

       AccountingStoragePass
              The password used to gain access to the database to store
              the accounting data. Only used for database type storage
              plugins, ignored otherwise. In the case of Slurm DBD
              (Database Daemon) with MUNGE authentication this can be
              configured to use a MUNGE daemon specifically configured to
              provide authentication between clusters while the default
              MUNGE daemon provides authentication within a cluster. In
              that case, AccountingStoragePass should specify the named
              port to be used for communications with the alternate MUNGE
              daemon (e.g. "/var/run/munge/global.socket.2"). The default
              value is NULL. Also see DefaultStoragePass.

       AccountingStoragePort
              The listening port of the accounting storage database
              server. Only used for database type storage plugins,
              ignored otherwise. The default value is SLURMDBD_PORT as
              established at system build time. If no value is explicitly
              specified, it will be set to 6819. This value must be equal
              to the DbdPort parameter in the slurmdbd.conf file. Also
              see DefaultStoragePort.

       AccountingStorageTRES
              Comma separated list of resources you wish to track on the
              cluster. These are the resources requested by the
              sbatch/srun job when it is submitted. Currently this
              consists of any GRES, BB (burst buffer) or license along
              with CPU, Memory, Node, Energy, FS/[Disk|Lustre], IC/OFED,
              Pages, and VMem. By default Billing, CPU, Energy, Memory,
              Node, FS/Disk, Pages and VMem are tracked. These default
              TRES cannot be disabled, but only appended to.
              AccountingStorageTRES=gres/craynetwork,license/iop1 will
              track billing, cpu, energy, memory, nodes, fs/disk, pages
              and vmem along with a gres called craynetwork as well as a
              license called iop1. Whenever these resources are used on
              the cluster they are recorded. The TRES are automatically
              set up in the database on the start of the slurmctld.

              If multiple GRES of different types are tracked (e.g. GPUs
              of different types), then job requests with matching type
              specifications will be recorded. Given a configuration of
              "AccountingStorageTRES=gres/gpu,gres/gpu:tesla,gres/gpu:volta",
              then "gres/gpu:tesla" and "gres/gpu:volta" will track only
              jobs that explicitly request those two GPU types, while
              "gres/gpu" will track allocated GPUs of any type ("tesla",
              "volta" or any other GPU type).

              Given a configuration of
              "AccountingStorageTRES=gres/gpu:tesla,gres/gpu:volta",
              then "gres/gpu:tesla" and "gres/gpu:volta" will track jobs
              that explicitly request those GPU types. If a job requests
              GPUs, but does not explicitly specify the GPU type, then
              its resource allocation will be accounted for as either
              "gres/gpu:tesla" or "gres/gpu:volta", although the
              accounting may not match the actual GPU type allocated to
              the job and the GPUs allocated to the job could be
              heterogeneous. In an environment containing various GPU
              types, use of a job_submit plugin may be desired in order
              to force jobs to explicitly specify some GPU type.

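              The typed GPU tracking described above can be sketched as a
              single slurm.conf line (the GPU type names are examples,
              not defaults):

```conf
AccountingStorageTRES=gres/gpu,gres/gpu:tesla,gres/gpu:volta
```

              Here the untyped "gres/gpu" entry records GPU usage of any
              type, while the typed entries record only jobs that request
              those types explicitly.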

       AccountingStorageType
              The accounting storage mechanism type. Acceptable values at
              present include "accounting_storage/filetxt",
              "accounting_storage/none" and
              "accounting_storage/slurmdbd". The
              "accounting_storage/filetxt" value indicates that
              accounting records will be written to the file specified by
              the AccountingStorageLoc parameter. The
              "accounting_storage/slurmdbd" value indicates that
              accounting records will be written to the Slurm DBD, which
              manages an underlying MySQL database. See "man slurmdbd"
              for more information. The default value is
              "accounting_storage/none" and indicates that account
              records are not maintained. Note: The filetxt plugin
              records only a limited subset of accounting information and
              will prevent some sacct options from proper operation. Also
              see DefaultStorageType.

       AccountingStorageUser
              The user account for accessing the accounting storage
              database. Only used for database type storage plugins,
              ignored otherwise. Also see DefaultStorageUser.


       AccountingStoreJobComment
              If set to "YES" then include the job's comment field in the
              job complete message sent to the Accounting Storage
              database. The default is "YES". Note the AdminComment and
              SystemComment are always recorded in the database.

       AcctGatherNodeFreq
              The AcctGather plugins sampling interval for node
              accounting. For AcctGather plugin values of none, this
              parameter is ignored. For all other values this parameter
              is the number of seconds between node accounting samples.
              For the acct_gather_energy/rapl plugin, set a value less
              than 300 because the counters may overflow beyond this
              rate. The default value is zero, which disables accounting
              sampling for nodes. Note: The accounting sampling interval
              for jobs is determined by the value of
              JobAcctGatherFrequency.


       AcctGatherEnergyType
              Identifies the plugin to be used for energy consumption
              accounting. The jobacct_gather plugin and slurmd daemon
              call this plugin to collect energy consumption data for
              jobs and nodes. The collection of energy consumption data
              takes place at the node level, so the measurements will
              reflect a job's real consumption only in the case of an
              exclusive job allocation. In case of node sharing between
              jobs, the reported consumed energy per job (through sstat
              or sacct) will not reflect the real energy consumed by the
              jobs.

              Configurable values at present are:

              acct_gather_energy/none
                     No energy consumption data is collected.

              acct_gather_energy/ipmi
                     Energy consumption data is collected from the
                     Baseboard Management Controller (BMC) using the
                     Intelligent Platform Management Interface (IPMI).

              acct_gather_energy/rapl
                     Energy consumption data is collected from hardware
                     sensors using the Running Average Power Limit (RAPL)
                     mechanism. Note that enabling RAPL may require the
                     execution of the command "sudo modprobe msr".

       AcctGatherInfinibandType
              Identifies the plugin to be used for InfiniBand network
              traffic accounting. The jobacct_gather plugin and slurmd
              daemon call this plugin to collect network traffic data for
              jobs and nodes. The collection of network traffic data
              takes place at the node level, so the collected values will
              reflect a job's real traffic only in the case of an
              exclusive job allocation. In case of node sharing between
              jobs, the reported network traffic per job (through sstat
              or sacct) will not reflect the real network traffic
              generated by the jobs.

              Configurable values at present are:

              acct_gather_infiniband/none
                     No InfiniBand network data are collected.

              acct_gather_infiniband/ofed
                     InfiniBand network traffic data are collected from
                     the hardware monitoring counters of InfiniBand
                     devices through the OFED library. In order to
                     account for per job network traffic, add the
                     "ic/ofed" TRES to AccountingStorageTRES.

       AcctGatherFilesystemType
              Identifies the plugin to be used for filesystem traffic
              accounting. The jobacct_gather plugin and slurmd daemon
              call this plugin to collect filesystem traffic data for
              jobs and nodes. The collection of filesystem traffic data
              takes place at the node level, so the collected values will
              reflect a job's real traffic only in the case of an
              exclusive job allocation. In case of node sharing between
              jobs, the reported filesystem traffic per job (through
              sstat or sacct) will not reflect the real filesystem
              traffic generated by the jobs.

              Configurable values at present are:

              acct_gather_filesystem/none
                     No filesystem data are collected.

              acct_gather_filesystem/lustre
                     Lustre filesystem traffic data are collected from
                     the counters found in /proc/fs/lustre/. In order to
                     account for per job Lustre traffic, add the
                     "fs/lustre" TRES to AccountingStorageTRES.

       AcctGatherProfileType
              Identifies the plugin to be used for detailed job
              profiling. The jobacct_gather plugin and slurmd daemon call
              this plugin to collect detailed data such as I/O counts,
              memory usage, or energy consumption for jobs and nodes.
              There are interfaces in this plugin to collect data at step
              start and completion, task start and completion, and at the
              account gather frequency. The data collected at the node
              level is related to jobs only in case of exclusive job
              allocation.

              Configurable values at present are:

              acct_gather_profile/none
                     No profile data is collected.

              acct_gather_profile/hdf5
                     This enables the HDF5 plugin. The directory where
                     the profile files are stored and which values are
                     collected are configured in the acct_gather.conf
                     file.

              acct_gather_profile/influxdb
                     This enables the influxdb plugin. The influxdb
                     instance host, port, database, retention policy and
                     which values are collected are configured in the
                     acct_gather.conf file.

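              As a sketch, the gathering plugins above are typically
              enabled together, with plugin-specific details placed in
              acct_gather.conf (the plugin choices and the 30 second
              interval are site assumptions, not defaults):

```conf
AcctGatherEnergyType=acct_gather_energy/rapl
AcctGatherProfileType=acct_gather_profile/hdf5
AcctGatherNodeFreq=30
```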

       AllowSpecResourcesUsage
              If set to 1, Slurm allows individual jobs to override a
              node's configured CoreSpecCount value. For a job to take
              advantage of this feature, a command line option of
              --core-spec must be specified. The default value for this
              option is 1 for Cray systems and 0 for other system types.

       AuthAltTypes
              Comma separated list of alternative authentication plugins
              that the slurmctld will permit for communication.


       AuthInfo
              Additional information to be used for authentication of
              communications between the Slurm daemons (slurmctld and
              slurmd) and the Slurm clients. The interpretation of this
              option is specific to the configured AuthType. Multiple
              options may be specified in a comma delimited list. If not
              specified, the default authentication information will be
              used.

              cred_expire
                     Default job step credential lifetime, in seconds
                     (e.g. "cred_expire=1200"). It must be sufficiently
                     long to load the user environment, run the prolog,
                     deal with the slurmd getting paged out of memory,
                     etc. This also controls how long a requeued job must
                     wait before starting again. The default value is 120
                     seconds.

              socket Path name to a MUNGE daemon socket to use (e.g.
                     "socket=/var/run/munge/munge.socket.2"). The default
                     value is "/var/run/munge/munge.socket.2". Used by
                     auth/munge and cred/munge.

              ttl    Credential lifetime, in seconds (e.g. "ttl=300").
                     The default value is dependent upon the MUNGE
                     installation, but is typically 300 seconds.

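              Combining the sub-options above, a hedged example (the
              cred_expire value shown is illustrative, not a
              recommendation):

```conf
AuthType=auth/munge
AuthInfo=socket=/var/run/munge/munge.socket.2,cred_expire=240
```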

       AuthType
              The authentication method for communications between Slurm
              components. Acceptable values at present include
              "auth/munge" and "auth/none". The default value is
              "auth/munge". "auth/none" includes the UID in each
              communication, but it is not verified. This may be fine for
              testing purposes, but do not use "auth/none" if you desire
              any security. "auth/munge" indicates that MUNGE is to be
              used (see "https://dun.github.io/munge/" for more
              information). All Slurm daemons and commands must be
              terminated prior to changing the value of AuthType and
              later restarted.


       BackupAddr
              Defunct option, see SlurmctldHost.

       BackupController
              Defunct option, see SlurmctldHost.

              The backup controller recovers state information from the
              StateSaveLocation directory, which must be readable and
              writable from both the primary and backup controllers.
              While not essential, it is recommended that you specify a
              backup controller. See the RELOCATING CONTROLLERS section
              if you change this.

       BatchStartTimeout
              The maximum time (in seconds) that a batch job is permitted
              for launching before being considered missing and releasing
              the allocation. The default value is 10 (seconds). Larger
              values may be required if more time is required to execute
              the Prolog, to load user environment variables (for Moab
              spawned jobs), or if the slurmd daemon gets paged from
              memory.
              Note: The test for a job being successfully launched is
              only performed when the Slurm daemon on the compute node
              registers state with the slurmctld daemon on the head node,
              which happens fairly rarely. Therefore a job will not
              necessarily be terminated if its start time exceeds
              BatchStartTimeout. This configuration parameter is also
              applied to launch tasks and avoid aborting srun commands
              due to long running Prolog scripts.

       BurstBufferType
              The plugin used to manage burst buffers. Acceptable values
              at present are:

              burst_buffer/datawarp
                     Use Cray DataWarp API to provide burst buffer
                     functionality.

              burst_buffer/none

       CheckpointType
              The system-initiated checkpoint method to be used for user
              jobs. The slurmctld daemon must be restarted for a change
              in CheckpointType to take effect. Supported values
              presently include:

              checkpoint/none
                     No checkpoint support (default).

              checkpoint/ompi
                     OpenMPI (version 1.3 or higher).

       CliFilterPlugins
              A comma delimited list of command line interface option
              filter/modification plugins. The specified plugins will be
              executed in the order listed. These are intended to be
              site-specific plugins which can be used to set default job
              parameters and/or logging events. No cli_filter plugins are
              used by default.

       ClusterName
              The name by which this Slurm managed cluster is known in
              the accounting database. This is needed to distinguish
              accounting records when multiple clusters report to the
              same database. Because of limitations in some databases,
              any upper case letters in the name will be silently mapped
              to lower case. In order to avoid confusion, it is
              recommended that the name be lower case.


       CommunicationParameters
              Comma separated options identifying communication options.

              CheckGhalQuiesce
                     Used specifically on a Cray using an Aries Ghal
                     interconnect. This will check to see if the system
                     is quiescing when sending a message, and if so, wait
                     until it is done before sending.

              NoAddrCache
                     By default, Slurm will cache a node's network
                     address after successfully establishing a connection
                     to it. This option disables the cache and Slurm will
                     look up the node's network address each time a
                     connection is made. This is useful, for example, in
                     a cloud environment where the node addresses come
                     and go out of DNS.

              NoCtldInAddrAny
                     Used to directly bind to the address that the node
                     running the slurmctld resolves to, instead of
                     binding messages to any address on the node, which
                     is the default.

              NoInAddrAny
                     Used to directly bind to the address that the node
                     resolves to, instead of binding messages to any
                     address on the node, which is the default. This
                     option is for all daemons/clients except for the
                     slurmctld.


       CompleteWait
              The time, in seconds, given for a job to remain in
              COMPLETING state before any additional jobs are scheduled.
              If set to zero, pending jobs will be started as soon as
              possible. Since a COMPLETING job's resources are released
              for use by other jobs as soon as the Epilog completes on
              each individual node, this can result in very fragmented
              resource allocations. To provide jobs with the minimum
              response time, a value of zero is recommended (no waiting).
              To minimize fragmentation of resources, a value equal to
              KillWait plus two is recommended. In that case, setting
              KillWait to a small value may be beneficial. The default
              value of CompleteWait is zero seconds. The value may not
              exceed 65533.

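              For example, following the anti-fragmentation guidance
              above with an assumed KillWait of 30 seconds:

```conf
KillWait=30
CompleteWait=32
```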

       ControlAddr
              Defunct option, see SlurmctldHost.

       ControlMachine
              Defunct option, see SlurmctldHost.

       CoreSpecPlugin
              Identifies the plugins to be used for enforcement of core
              specialization. The slurmd daemon must be restarted for a
              change in CoreSpecPlugin to take effect. Acceptable values
              at present include:

              core_spec/cray_aries
                     Used only for Cray systems.

              core_spec/none
                     Used for all other system types.

       CpuFreqDef
              Default CPU frequency value or frequency governor to use
              when running a job step if it has not been explicitly set
              with the --cpu-freq option. Acceptable values at present
              include a numeric value (frequency in kilohertz) or one of
              the following governors:

              Conservative attempts to use the Conservative CPU governor

              OnDemand     attempts to use the OnDemand CPU governor

              Performance  attempts to use the Performance CPU governor

              PowerSave    attempts to use the PowerSave CPU governor

              There is no default value. If unset, no attempt to set the
              governor is made if the --cpu-freq option has not been set.

       CpuFreqGovernors
              List of CPU frequency governors allowed to be set with the
              salloc, sbatch, or srun option --cpu-freq. Acceptable
              values at present include:

              Conservative attempts to use the Conservative CPU governor

              OnDemand     attempts to use the OnDemand CPU governor (a
                           default value)

              Performance  attempts to use the Performance CPU governor
                           (a default value)

              PowerSave    attempts to use the PowerSave CPU governor

              UserSpace    attempts to use the UserSpace CPU governor (a
                           default value)

              The default is OnDemand, Performance and UserSpace.

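              Tying the two frequency options together, a sketch (the
              governor choices are illustrative site policy, not
              defaults):

```conf
CpuFreqDef=Performance
CpuFreqGovernors=OnDemand,Performance,PowerSave
```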
       CredType
              The cryptographic signature tool to be used in the creation
              of job step credentials. The slurmctld daemon must be
              restarted for a change in CredType to take effect.
              Acceptable values at present include "cred/munge". The
              default value is "cred/munge" and is the recommended
              option.

       DebugFlags
              Defines specific subsystems which should provide more
              detailed event logging. Multiple subsystems can be
              specified with comma separators. Most DebugFlags will
              result in verbose logging for the identified subsystems and
              could impact performance. Valid subsystems available today
              (with more to come) include:

              Accrue        Accrue counters accounting details

              Agent         RPC agents (outgoing RPCs from Slurm daemons)

              Backfill      Backfill scheduler details

              BackfillMap   Backfill scheduler to log a very verbose map
                            of reserved resources through time. Combine
                            with Backfill for a verbose and complete view
                            of the backfill scheduler's work.

              BurstBuffer   Burst Buffer plugin

              CPU_Bind      CPU binding details for jobs and steps

              CpuFrequency  CPU frequency details for jobs and steps
                            using the --cpu-freq option

              Elasticsearch Elasticsearch debug info

              Energy        AcctGatherEnergy debug info

              ExtSensors    External Sensors debug info

              Federation    Federation scheduling debug info

              FrontEnd      Front end node details

              Gang          Gang scheduling details

              Gres          Generic resource details

              HeteroJobs    Heterogeneous job details

              JobContainer  Job container plugin details

              License       License management details

              NodeFeatures  Node Features plugin debug info

              NO_CONF_HASH  Do not log when the slurm.conf file differs
                            between Slurm daemons

              Power         Power management plugin

              PowerSave     Power save (suspend/resume programs) details

              Priority      Job prioritization

              Profile       AcctGatherProfile plugins details

              Protocol      Communication protocol details

              Reservation   Advanced reservations

              Route         Message forwarding and message aggregation
                            debug info

              SelectType    Resource selection plugin

              Steps         Slurmctld resource allocation for job steps

              Switch        Switch plugin

              TimeCray      Timing of Cray APIs

              TraceJobs     Trace jobs in slurmctld. It will print
                            detailed job information including state, job
                            ids and allocated node count.

              TRESNode      Limits dealing with TRES=Node

              Triggers      Slurmctld triggers

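              For instance, to obtain the verbose backfill map described
              above (the flag combination shown is illustrative):

```conf
DebugFlags=Backfill,BackfillMap
```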

       DefCpuPerGPU
              Default count of CPUs allocated per allocated GPU.

       DefMemPerCPU
              Default real memory size available per allocated CPU in
              megabytes. Used to avoid over-subscribing memory and
              causing paging. DefMemPerCPU would generally be used if
              individual processors are allocated to jobs
              (SelectType=select/cons_res or
              SelectType=select/cons_tres). The default value is 0
              (unlimited). Also see DefMemPerGPU, DefMemPerNode and
              MaxMemPerCPU. DefMemPerCPU, DefMemPerGPU and DefMemPerNode
              are mutually exclusive.

       DefMemPerGPU
              Default real memory size available per allocated GPU in
              megabytes. The default value is 0 (unlimited). Also see
              DefMemPerCPU and DefMemPerNode. DefMemPerCPU, DefMemPerGPU
              and DefMemPerNode are mutually exclusive.

       DefMemPerNode
              Default real memory size available per allocated node in
              megabytes. Used to avoid over-subscribing memory and
              causing paging. DefMemPerNode would generally be used if
              whole nodes are allocated to jobs
              (SelectType=select/linear) and resources are
              over-subscribed (OverSubscribe=yes or OverSubscribe=force).
              The default value is 0 (unlimited). Also see DefMemPerCPU,
              DefMemPerGPU and MaxMemPerCPU. DefMemPerCPU, DefMemPerGPU
              and DefMemPerNode are mutually exclusive.

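              As a sketch for a cluster allocating individual processors
              to jobs (the 2048 MB figure is an assumed site value, not a
              default):

```conf
SelectType=select/cons_tres
DefMemPerCPU=2048
```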

       DefaultStorageHost
              The default name of the machine hosting the accounting
              storage and job completion databases. Only used for
              database type storage plugins and when the
              AccountingStorageHost and JobCompHost have not been
              defined.

       DefaultStorageLoc
              The fully qualified file name where accounting records
              and/or job completion records are written when the
              DefaultStorageType is "filetxt". Also see
              AccountingStorageLoc and JobCompLoc.

       DefaultStoragePass
              The password used to gain access to the database to store
              the accounting and job completion data. Only used for
              database type storage plugins, ignored otherwise. Also see
              AccountingStoragePass and JobCompPass.

       DefaultStoragePort
              The listening port of the accounting storage and/or job
              completion database server. Only used for database type
              storage plugins, ignored otherwise. Also see
              AccountingStoragePort and JobCompPort.

       DefaultStorageType
              The accounting and job completion storage mechanism type.
              Acceptable values at present include "filetxt", "mysql" and
              "none". The value "filetxt" indicates that records will be
              written to a file. The value "mysql" indicates that
              accounting records will be written to a MySQL or MariaDB
              database. The default value is "none", which means that
              records are not maintained. Also see AccountingStorageType
              and JobCompType.

       DefaultStorageUser
              The user account for accessing the accounting storage
              and/or job completion database. Only used for database type
              storage plugins, ignored otherwise. Also see
              AccountingStorageUser and JobCompUser.

       DisableRootJobs
              If set to "YES" then user root will be prevented from
              running any jobs. The default value is "NO", meaning user
              root will be able to execute jobs. DisableRootJobs may also
              be set by partition.

       EioTimeout
              The number of seconds srun waits for slurmstepd to close
              the TCP/IP connection used to relay data between the user
              application and srun when the user application terminates.
              The default value is 60 seconds. May not exceed 65533.


       EnforcePartLimits
              If set to "ALL" then jobs which exceed a partition's size
              and/or time limits will be rejected at submission time. If
              the job is submitted to multiple partitions, the job must
              satisfy the limits on all the requested partitions. If set
              to "NO" then the job will be accepted and remain queued
              until the partition limits are altered (time and node
              limits). If set to "ANY", a job must satisfy the limits of
              at least one of the requested partitions to be submitted.
              The default value is "NO". NOTE: If set, then a job's QOS
              can not be used to exceed partition limits. NOTE: The
              partition limits being considered are its configured
              MaxMemPerCPU, MaxMemPerNode, MinNodes, MaxNodes, MaxTime,
              AllocNodes, AllowAccounts, AllowGroups, AllowQOS, and QOS
              usage threshold.


       Epilog Fully qualified pathname of a script to execute as user
              root on every node when a user's job completes (e.g.
              "/usr/local/slurm/epilog"). A glob pattern (see glob(7))
              may also be used to run more than one epilog script (e.g.
              "/etc/slurm/epilog.d/*"). The Epilog script or scripts may
              be used to purge files, disable user login, etc. By default
              there is no epilog. See Prolog and Epilog Scripts for more
              information.

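              A minimal epilog sketch, assuming a per-job scratch
              directory convention (the /tmp path is a hypothetical site
              convention; SLURM_JOB_ID is provided by Slurm in the epilog
              environment):

```shell
#!/bin/sh
# Illustrative epilog: remove a hypothetical per-job scratch directory.
# SLURM_JOB_ID is set by Slurm when invoking the epilog.
SCRATCH="/tmp/job_${SLURM_JOB_ID:-unknown}"
if [ -d "$SCRATCH" ]; then
    rm -rf "$SCRATCH"
fi
exit 0
```

              Epilog scripts should exit 0 on success; a non-zero exit
              will drain the node.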

       EpilogMsgTime
              The number of microseconds that the slurmctld daemon
              requires to process an epilog completion message from the
              slurmd daemons. This parameter can be used to prevent a
              burst of epilog completion messages from being sent at the
              same time, which should help prevent lost messages and
              improve throughput for large jobs. The default value is
              2000 microseconds. For a 1000 node job, this spreads the
              epilog completion messages out over two seconds.


       EpilogSlurmctld
              Fully qualified pathname of a program for the slurmctld to
              execute upon termination of a job allocation (e.g.
              "/usr/local/slurm/epilog_controller"). The program executes
              as SlurmUser, which gives it permission to drain nodes and
              requeue the job if a failure occurs (see scontrol(1)).
              Exactly what the program does and how it accomplishes this
              is completely at the discretion of the system
              administrator. Information about the job being initiated,
              its allocated nodes, etc. are passed to the program using
              environment variables. See Prolog and Epilog Scripts for
              more information.


       ExtSensorsFreq
              The external sensors plugin sampling interval. If
              ExtSensorsType=ext_sensors/none, this parameter is ignored.
              For all other values of ExtSensorsType, this parameter is
              the number of seconds between external sensors samples for
              hardware components (nodes, switches, etc.). The default
              value is zero, which disables external sensors sampling.
              Note: This parameter does not affect external sensors data
              collection for jobs/steps.

       ExtSensorsType
              Identifies the plugin to be used for external sensors data
              collection. Slurmctld calls this plugin to collect external
              sensors data for jobs/steps and hardware components. In
              case of node sharing between jobs the reported values per
              job/step (through sstat or sacct) may not be accurate. See
              also "man ext_sensors.conf".

              Configurable values at present are:

              ext_sensors/none
                     No external sensors data is collected.

              ext_sensors/rrd
                     External sensors data is collected from the RRD
                     database.

800
801 FairShareDampeningFactor
802 Dampen the effect of exceeding a user or group's fair share of
803 allocated resources. Higher values will provide greater ability
804 to differentiate between exceeding the fair share at high levels
805 (e.g. a value of 1 results in almost no difference between over‐
806 consumption by a factor of 10 and 100, while a value of 5 will
807 result in a significant difference in priority). The default
808 value is 1.
809
810
811 FederationParameters
812 Used to define federation options. Multiple options may be comma
813 separated.
814
815
816 fed_display
817 If set, then the client status commands (e.g. squeue,
818 sinfo, sprio, etc.) will display information in a feder‐
819 ated view by default. This option is functionally equiva‐
820 lent to using the --federation options on each command.
821 Use the client's --local option to override the federated
822 view and get a local view of the given cluster.
823
824
825 FirstJobId
826 The job id to be used for the first job submitted to Slurm with‐
827 out a specific requested value. Job id values generated will be
828 incremented by 1 for each subsequent job. This may be used to
829 provide a meta-scheduler with a job id space which is disjoint
830 from that of interactive jobs. The default value is 1. Also see
831 MaxJobId.
831
832
833 GetEnvTimeout
834 Used for Moab scheduled jobs only. Controls how long, in sec‐
835 onds, a job should wait for the user's environment to load before
836 attempting to load it from a cache file. Applies when the srun
837 or sbatch --get-user-env option is used. If set to 0 then always
838 load the user's environment from the cache file. The default
839 value is 2 seconds.
840
841
842 GresTypes
843 A comma delimited list of generic resources to be managed (e.g.
844 GresTypes=gpu,mps). These resources may have an associated GRES
845 plugin of the same name providing additional functionality. No
846 generic resources are managed by default. Ensure this parameter
847 is consistent across all nodes in the cluster for proper opera‐
848 tion. The slurmctld daemon must be restarted for changes to
849 this parameter to become effective.
850
851
852 GroupUpdateForce
853 If set to a non-zero value, then information about which users
854 are members of groups allowed to use a partition will be updated
855 periodically, even when there have been no changes to the
856 /etc/group file. If set to zero, group member information will
857 be updated only after the /etc/group file is updated. The
858 default value is 1. Also see the GroupUpdateTime parameter.
859
860
861 GroupUpdateTime
862 Controls how frequently information about which users are mem‐
863 bers of groups allowed to use a partition will be updated, and
864 how long user group membership lists will be cached. The time
865 interval is given in seconds with a default value of 600 sec‐
866 onds. A value of zero will prevent periodic updating of group
867 membership information. Also see the GroupUpdateForce parame‐
868 ter.
869
870
871 GpuFreqDef=[<type>=]<value>[,<type>=<value>]
872 Default GPU frequency to use when running a job step if it has
873 not been explicitly set using the --gpu-freq option. This
874 option can be used to independently configure the GPU and its
875 memory frequencies. Defaults to "high,memory=high". After the
876 job is completed, the frequencies of all affected GPUs will be
877 reset to the highest possible values. In some cases, system
878 power caps may override the requested values. The field type
879 can be "memory". If type is not specified, the GPU frequency is
880 implied. The value field can either be "low", "medium", "high",
881 "highm1" or a numeric value in megahertz (MHz). If the speci‐
882 fied numeric value is not possible, a value as close as possible
883 will be used. See below for definition of the values. Examples
884 of use include "GpuFreqDef=medium,memory=high" and "GpuFre‐
885 qDef=450".
886
887 Supported value definitions:
888
889 low the lowest available frequency.
890
891 medium attempts to set a frequency in the middle of the
892 available range.
893
894 high the highest available frequency.
895
896 highm1 (high minus one) will select the next highest avail‐
897 able frequency.
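For instance, both forms described above could appear in slurm.conf as:

```
# Symbolic values for the GPU and its memory:
GpuFreqDef=medium,memory=high
# ...or, alternatively, a numeric GPU frequency in MHz:
GpuFreqDef=450
```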
898
899
900 HealthCheckInterval
901 The interval in seconds between executions of HealthCheckPro‐
902 gram. The default value is zero, which disables execution.
903
904
905 HealthCheckNodeState
906 Identify what node states should execute the HealthCheckProgram.
907 Multiple state values may be specified with a comma separator.
908 The default value is ANY to execute on nodes in any state.
909
910 ALLOC Run on nodes in the ALLOC state (all CPUs allo‐
911 cated).
912
913 ANY Run on nodes in any state.
914
915 CYCLE Rather than running the health check program on all
916 nodes at the same time, cycle through running on all
917 compute nodes through the course of the HealthCheck‐
918 Interval. May be combined with the various node
919 state options.
920
921 IDLE Run on nodes in the IDLE state.
922
923 MIXED Run on nodes in the MIXED state (some CPUs idle and
924 other CPUs allocated).
925
926
927 HealthCheckProgram
928 Fully qualified pathname of a script to execute as user root
929 periodically on all compute nodes that are not in the
930 NOT_RESPONDING state. This program may be used to verify the
931 node is fully operational and DRAIN the node or send email if a
932 problem is detected. Any action to be taken must be explicitly
933 performed by the program (e.g. execute "scontrol update Node‐
934 Name=foo State=drain Reason=tmp_file_system_full" to drain a
935 node). The execution interval is controlled using the
936 HealthCheckInterval parameter. Note that the HealthCheckProgram
937 will be executed at the same time on all nodes to minimize its
938 impact upon parallel programs. This program will be killed
939 if it does not terminate normally within 60 seconds. This pro‐
940 gram will also be executed when the slurmd daemon is first
941 started and before it registers with the slurmctld daemon. By
942 default, no program will be executed.
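A minimal sketch of such a script, assuming a hypothetical site policy of draining nodes whose /tmp is nearly full (the threshold, column parsing, and reason string are illustrative):

```shell
#!/bin/bash
# Hypothetical HealthCheckProgram: drain the node when /tmp is
# nearly full. Runs as root on each compute node.

# Return "drain" when usage (percent) exceeds the threshold, else "ok".
tmp_state() {
    local used_pct=$1 threshold=${2:-90}
    if [ "$used_pct" -gt "$threshold" ]; then echo drain; else echo ok; fi
}

# Extract the "Use%" column for /tmp (assumes typical df output).
used=$(df /tmp | awk 'END { sub("%", "", $5); print $5 }')
if [ "$(tmp_state "$used")" = drain ]; then
    # Any action must be explicit, as noted above:
    scontrol update NodeName="$(hostname -s)" State=drain \
        Reason=tmp_file_system_full
fi
```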
943
944
945 InactiveLimit
946 The interval, in seconds, after which a non-responsive job allo‐
947 cation command (e.g. srun or salloc) will result in the job
948 being terminated. If the node on which the command is executed
949 fails or the command abnormally terminates, this will terminate
950 its job allocation. This option has no effect upon batch jobs.
951 When setting a value, take into consideration that a debugger
952 using srun to launch an application may leave the srun command
953 in a stopped state for extended periods of time. This limit is
954 ignored for jobs running in partitions with the RootOnly flag
955 set (the scheduler running as root will be responsible for the
956 job). The default value is unlimited (zero) and may not exceed
957 65533 seconds.
958
959
960 JobAcctGatherType
961 The job accounting mechanism type. Acceptable values at present
962 include "jobacct_gather/linux" (for Linux systems, and the rec‐
963 ommended choice), "jobacct_gather/cgroup" and
964 "jobacct_gather/none" (no accounting data collected). The
965 default value is "jobacct_gather/none". "jobacct_gather/cgroup"
966 is a plugin for the Linux operating system that uses cgroups to
967 collect accounting statistics. The plugin collects the following
968 statistics: From the cgroup memory subsystem: mem‐
969 ory.usage_in_bytes (reported as 'pages') and rss from mem‐
970 ory.stat (reported as 'rss'). From the cgroup cpuacct subsystem:
971 user cpu time and system cpu time. No value is provided by
972 cgroups for virtual memory size ('vsize'). In order to use the
973 sstat tool, "jobacct_gather/linux" or "jobacct_gather/cgroup"
974 must be configured.
975 NOTE: Changing this configuration parameter changes the contents
976 of the messages between Slurm daemons. Any previously running
977 job steps are managed by a slurmstepd daemon that will persist
978 through the lifetime of that job step and not change its commu‐
979 nication protocol. Only change this configuration parameter when
980 there are no running job steps.
981
982
983 JobAcctGatherFrequency
984 The job accounting and profiling sampling intervals. The sup‐
985 ported format is as follows:
986
987 JobAcctGatherFrequency=<datatype>=<interval>
988 where <datatype>=<interval> specifies the task sam‐
989 pling interval for the jobacct_gather plugin or a
990 sampling interval for a profiling type by the
991 acct_gather_profile plugin. Multiple, comma-sepa‐
992 rated <datatype>=<interval> intervals may be speci‐
993 fied. Supported datatypes are as follows:
994
995 task=<interval>
996 where <interval> is the task sampling inter‐
997 val in seconds for the jobacct_gather plugins
998 and for task profiling by the
999 acct_gather_profile plugin.
1000
1001 energy=<interval>
1002 where <interval> is the sampling interval in
1003 seconds for energy profiling using the
1004 acct_gather_energy plugin
1005
1006 network=<interval>
1007 where <interval> is the sampling interval in
1008 seconds for infiniband profiling using the
1009 acct_gather_infiniband plugin.
1010
1011 filesystem=<interval>
1012 where <interval> is the sampling interval in
1013 seconds for filesystem profiling using the
1014 acct_gather_filesystem plugin.
1015
1016 The default value for task sampling interval
1017 is 30 seconds. The default value for all other intervals is 0.
1018 An interval of 0 disables sampling of the specified type. If
1019 the task sampling interval is 0, accounting information is col‐
1020 lected only at job termination (reducing Slurm interference with
1021 the job).
1022 Smaller (non-zero) values have a greater impact upon job perfor‐
1023 mance, but a value of 30 seconds is not likely to be noticeable
1024 for applications having less than 10,000 tasks.
1025 Users can independently override each interval on a per job
1026 basis using the --acctg-freq option when submitting the job.
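A hypothetical combined setting, following the format above:

```
# 30-second task sampling and 60-second energy profiling; network and
# filesystem profiling left at their default of 0 (disabled).
JobAcctGatherFrequency=task=30,energy=60
```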
1027
1028
1029 JobAcctGatherParams
1030 Arbitrary parameters for the job account gather plugin. Accept‐
1031 able values at present include:
1032
1033 NoShared Exclude shared memory from accounting.
1034
1035 UsePss Use PSS value instead of RSS to calculate
1036 real usage of memory. The PSS value will be
1037 saved as RSS.
1038
1039 OverMemoryKill Kill jobs or steps that are being detected
1040 to use more memory than requested every time
1041 accounting information is gathered by the
1042 JobAcctGather plugin. This parameter will
1043 not kill a job directly, but only the step.
1044 See MemLimitEnforce for that purpose. This
1045 parameter should be used with caution: if a
1046 job exceeds its memory allocation it may
1047 affect other processes and/or machine
1048 health. NOTE: It is recommended to limit
1049 memory by enabling task/cgroup in TaskPlugin
1050 and making use of ConstrainRAMSpace=yes in
1051 cgroup.conf instead of using this JobAcct‐
1052 Gather mechanism for memory enforcement,
1053 since the latter has a lower resolution
1054 (JobAcctGatherFreq) and OOMs could happen at
1055 some point.
1056
1057
1058 JobCheckpointDir
1059 Specifies the default directory for storing or reading job
1060 checkpoint information. The data stored here is only a few thou‐
1061 sand bytes per job and includes information needed to resubmit
1062 the job request, not the job's memory image. The directory must be
1063 readable and writable by SlurmUser, but not writable by regular
1064 users. The job memory images may be in a different location as
1065 specified by --checkpoint-dir option at job submit time or scon‐
1066 trol's ImageDir option.
1067
1068
1069 JobCompHost
1070 The name of the machine hosting the job completion database.
1071 Only used for database type storage plugins, ignored otherwise.
1072 Also see DefaultStorageHost.
1073
1074
1075 JobCompLoc
1076 The fully qualified file name where job completion records are
1077 written when the JobCompType is "jobcomp/filetxt" or the data‐
1078 base where job completion records are stored when the JobComp‐
1079 Type is a database, or a URL of the form http://yourelastic‐
1080 server:port when JobCompType is "jobcomp/elasticsearch". NOTE:
1081 when you specify a URL for Elasticsearch, Slurm will remove any
1082 trailing slashes "/" from the configured URL and append
1083 "/slurm/jobcomp", which are the Elasticsearch index name (slurm)
1084 and mapping (jobcomp). NOTE: More information is available at
1085 the Slurm web site ( https://slurm.schedmd.com/elastic‐
1086 search.html ). Also see DefaultStorageLoc.
1087
1088
1089 JobCompPass
1090 The password used to gain access to the database to store the
1091 job completion data. Only used for database type storage plug‐
1092 ins, ignored otherwise. Also see DefaultStoragePass.
1093
1094
1095 JobCompPort
1096 The listening port of the job completion database server. Only
1097 used for database type storage plugins, ignored otherwise. Also
1098 see DefaultStoragePort.
1099
1100
1101 JobCompType
1102 The job completion logging mechanism type. Acceptable values at
1103 present include "jobcomp/none", "jobcomp/elasticsearch", "job‐
1104 comp/filetxt", "jobcomp/mysql" and "jobcomp/script". The
1105 default value is "jobcomp/none", which means that upon job com‐
1106 pletion the record of the job is purged from the system. If
1107 using the accounting infrastructure this plugin may not be of
1108 interest since the information here is redundant. The value
1109 "jobcomp/elasticsearch" indicates that a record of the job
1110 should be written to an Elasticsearch server specified by the
1111 JobCompLoc parameter. NOTE: More information is available at
1112 the Slurm web site ( https://slurm.schedmd.com/elastic‐
1113 search.html ). The value "jobcomp/filetxt" indicates that a
1114 record of the job should be written to a text file specified by
1115 the JobCompLoc parameter. The value "jobcomp/mysql" indicates
1116 that a record of the job should be written to a MySQL or MariaDB
1117 database specified by the JobCompLoc parameter. The value "job‐
1118 comp/script" indicates that a script specified by the JobCompLoc
1119 parameter is to be executed with environment variables indicat‐
1120 ing the job information.
1121
1122 JobCompUser
1123 The user account for accessing the job completion database.
1124 Only used for database type storage plugins, ignored otherwise.
1125 Also see DefaultStorageUser.
1126
1127
1128 JobContainerType
1129 Identifies the plugin to be used for job tracking. The slurmd
1130 daemon must be restarted for a change in JobContainerType to
1131 take effect. NOTE: The JobContainerType applies to a job allo‐
1132 cation, while ProctrackType applies to job steps. Acceptable
1133 values at present include:
1134
1135 job_container/cncu used only for Cray systems (CNCU = Compute
1136 Node Clean Up)
1137
1138 job_container/none used for all other system types
1139
1140
1141 JobFileAppend
1142 This option controls what to do if a job's output or error file
1143 exist when the job is started. If JobFileAppend is set to a
1144 value of 1, then append to the existing file. By default, any
1145 existing file is truncated.
1146
1147
1148 JobRequeue
1149 This option controls the default ability for batch jobs to be
1150 requeued. Jobs may be requeued explicitly by a system adminis‐
1151 trator, after node failure, or upon preemption by a higher pri‐
1152 ority job. If JobRequeue is set to a value of 1, then batch jobs
1153 may be requeued unless explicitly disabled by the user. If
1154 JobRequeue is set to a value of 0, then batch jobs will not be
1155 requeued unless explicitly enabled by the user. Use the sbatch
1156 --no-requeue or --requeue option to change the default behavior
1157 for individual jobs. The default value is 1.
1158
1159
1160 JobSubmitPlugins
1161 A comma delimited list of job submission plugins to be used.
1162 The specified plugins will be executed in the order listed.
1163 These are intended to be site-specific plugins which can be used
1164 to set default job parameters and/or logging events. Sample
1165 plugins available in the distribution include "all_partitions",
1166 "defaults", "logging", "lua", and "partition". For examples of
1167 use, see the Slurm code in "src/plugins/job_submit" and "con‐
1168 tribs/lua/job_submit*.lua" then modify the code to satisfy your
1169 needs. Slurm can be configured to use multiple job_submit plug‐
1170 ins if desired, however the lua plugin will only execute one lua
1171 script named "job_submit.lua" located in the default script
1172 directory (typically the subdirectory "etc" of the installation
1173 directory). No job submission plugins are used by default.
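For example, a site using only the lua plugin might configure:

```
# Run the lua plugin; it executes the single script job_submit.lua
# from the default script directory described above.
JobSubmitPlugins=lua
```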
1174
1175
1176 KeepAliveTime
1177 Specifies how long socket communications between the srun
1178 command and its slurmstepd process are kept alive after discon‐
1179 nect. Longer values can be used to improve reliability of com‐
1180 munications in the event of network failures. By default, the
1181 system default value is used. The value may not exceed
1182 65533.
1183
1184
1185 KillOnBadExit
1186 If set to 1, a step will be terminated immediately if any task
1187 crashes or aborts, as indicated by a non-zero exit code.
1188 With the default value of 0, if one of the processes crashes
1189 or aborts, the other processes will continue to run while the
1190 crashed or aborted process waits.
1191 configuration parameter by using srun's -K, --kill-on-bad-exit.
1192
1193
1194 KillWait
1195 The interval, in seconds, given to a job's processes between the
1196 SIGTERM and SIGKILL signals upon reaching its time limit. If
1197 the job fails to terminate gracefully in the interval specified,
1198 it will be forcibly terminated. The default value is 30 sec‐
1199 onds. The value may not exceed 65533.
1200
1201
1202 NodeFeaturesPlugins
1203 Identifies the plugins to be used for support of node features
1204 which can change through time. For example, a node which might
1205 be booted with various BIOS settings. This is supported through
1206 the use of a node's active_features and available_features
1207 information. Acceptable values at present include:
1208
1209 node_features/knl_cray
1210 used only for Intel Knights Landing proces‐
1211 sors (KNL) on Cray systems
1212
1213 node_features/knl_generic
1214 used for Intel Knights Landing processors
1215 (KNL) on a generic Linux system
1216
1217
1218 LaunchParameters
1219 Identifies options to the job launch plugin. Acceptable values
1220 include:
1221
1222 batch_step_set_cpu_freq Set the cpu frequency for the batch step
1223 from the given --cpu-freq option, or the
1224 slurm.conf CpuFreqDef setting. By default only
1225 steps started with srun will utilize the
1226 cpu freq setting options.
1227
1228 NOTE: If you are using srun to launch
1229 your steps inside a batch script
1230 (advised) this option will create a sit‐
1231 uation where you may have multiple
1232 agents setting the cpu_freq as the batch
1233 step usually runs on the same resources
1234 as the one or more steps the sruns in the
1235 script will create.
1236
1237 cray_net_exclusive Allow jobs on a Cray Native cluster
1238 exclusive access to network resources.
1239 This should only be set on clusters pro‐
1240 viding exclusive access to each node to
1241 a single job at once, and not using par‐
1242 allel steps within the job, otherwise
1243 resources on the node can be oversub‐
1244 scribed.
1245
1246 enable_nss_slurm Permits passwd and group resolution for
1247 a job to be serviced by slurmstepd
1248 rather than requiring a lookup from a
1249 network based service. See
1250 https://slurm.schedmd.com/nss_slurm.html
1251 for more information.
1252
1253 lustre_no_flush If set on a Cray Native cluster, then do
1254 not flush the Lustre cache on job step
1255 completion. This setting will only take
1256 effect after reconfiguring, and will
1257 only take effect for newly launched
1258 jobs.
1259
1260 mem_sort Sort NUMA memory at step start. User can
1261 override this default with
1262 SLURM_MEM_BIND environment variable or
1263 --mem-bind=nosort command line option.
1264
1265 disable_send_gids By default the slurmctld will look up
1266 and send the user_name and extended gids
1267 for a job, rather than doing individual
1268 lookups on each node as part of each
1269 task launch. This avoids issues with
1270 name service scalability when launching
1271 jobs involving many nodes. Setting this
1272 option disables that behavior.
1273
1274 slurmstepd_memlock Lock the slurmstepd process's current
1275 memory in RAM.
1276
1277 slurmstepd_memlock_all Lock the slurmstepd process's current
1278 and future memory in RAM.
1279
1280 test_exec Have srun verify existence of the exe‐
1281 cutable program along with user execute
1282 permission on the node where srun was
1283 called before attempting to launch it on
1284 nodes in the step.
1285
1286
1287 LaunchType
1288 Identifies the mechanism to be used to launch application tasks.
1289 Acceptable values include:
1290
1291 launch/slurm
1292 The default value.
1293
1294
1295 Licenses
1296 Specification of licenses (or other resources available on all
1297 nodes of the cluster) which can be allocated to jobs. License
1298 names can optionally be followed by a colon and count with a
1299 default count of one. Multiple license names should be comma
1300 separated (e.g. "Licenses=foo:4,bar"). Note that Slurm pre‐
1301 vents jobs from being scheduled if their required license speci‐
1302 fication is not available. Slurm does not prevent jobs from
1303 using licenses that are not explicitly listed in the job submis‐
1304 sion specification.
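As an illustrative configuration, four "foo" licenses and one "bar" license available cluster-wide:

```
Licenses=foo:4,bar
```

A job would then request them at submission time, e.g. "sbatch -L foo:2 job.sh".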
1305
1306
1307 LogTimeFormat
1308 Format of the timestamp in slurmctld and slurmd log files.
1309 Accepted values are "iso8601", "iso8601_ms", "rfc5424",
1310 "rfc5424_ms", "clock", "short" and "thread_id". The values end‐
1311 ing in "_ms" differ from the ones without in that fractional
1312 seconds with millisecond precision are printed. The default
1313 value is "iso8601_ms". The "rfc5424" formats are the same as the
1314 "iso8601" formats except that the timezone value is also shown.
1315 The "clock" format shows a timestamp in microseconds retrieved
1316 with the C standard clock() function. The "short" format is a
1317 short date and time format. The "thread_id" format shows the
1318 timestamp in the C standard ctime() function form without the
1319 year but including the microseconds, the daemon's process ID and
1320 the current thread name and ID.
1321
1322
1323 MailDomain
1324 Domain name to qualify usernames if email address is not explic‐
1325 itly given with the "--mail-user" option. If unset, the local
1326 MTA will need to qualify local addresses itself.
1327
1328
1329 MailProg
1330 Fully qualified pathname to the program used to send email per
1331 user request. The default value is "/bin/mail" (or
1332 "/usr/bin/mail" if "/bin/mail" does not exist but
1333 "/usr/bin/mail" does exist).
1334
1335
1336 MaxArraySize
1337 The maximum job array size. The maximum job array task index
1338 value will be one less than MaxArraySize to allow for an index
1339 value of zero. Configure MaxArraySize to 0 in order to disable
1340 job array use. The value may not exceed 4000001. The value of
1341 MaxJobCount should be much larger than MaxArraySize. The
1342 default value is 1001.
1343
1344
1345 MaxJobCount
1346 The maximum number of jobs Slurm can have in its active database
1347 at one time. Set the values of MaxJobCount and MinJobAge to
1348 ensure the slurmctld daemon does not exhaust its memory or other
1349 resources. Once this limit is reached, requests to submit addi‐
1350 tional jobs will fail. The default value is 10000 jobs. NOTE:
1351 Each task of a job array counts as one job even though they will
1352 not occupy separate job records until modified or initiated.
1353 Performance can suffer with more than a few hundred thousand
1354 jobs. Setting MaxSubmitJobs per user is generally valuable
1355 to prevent a single user from filling the system with jobs.
1356 This is accomplished using Slurm's database and configuring
1357 enforcement of resource limits. This value may not be reset via
1358 "scontrol reconfig". It only takes effect upon restart of the
1359 slurmctld daemon.
1360
1361
1362 MaxJobId
1363 The maximum job id to be used for jobs submitted to Slurm with‐
1364 out a specific requested value. Job ids are unsigned 32bit inte‐
1365 gers with the first 26 bits reserved for local job ids and the
1366 remaining 6 bits reserved for a cluster id to identify a feder‐
1367 ated job's origin. The maximum allowed local job id is
1368 67,108,863 (0x3FFFFFF). The default value is 67,043,328
1369 (0x03ff0000). MaxJobId only applies to the local job id and not
1370 the federated job id. Job id values generated will be incre‐
1371 mented by 1 for each subsequent job. Once MaxJobId is reached,
1372 the next job will be assigned FirstJobId. Federated jobs will
1373 always have a job ID of 67,108,865 or higher. Also see FirstJo‐
1374 bId.
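The bit layout described above can be checked with ordinary shell arithmetic:

```shell
# 32-bit job ids: low 26 bits = local id, high 6 bits = cluster id.
printf '%d\n' $(( 0x3FFFFFF ))      # 67108863, the largest local job id
printf '%d\n' $(( 0x03ff0000 ))     # 67043328, the default MaxJobId
printf '%d\n' $(( (1 << 26) + 1 ))  # 67108865, smallest federated job id
```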
1375
1376
1377 MaxMemPerCPU
1378 Maximum real memory size available per allocated CPU in
1379 megabytes. Used to avoid over-subscribing memory and causing
1380 paging. MaxMemPerCPU would generally be used if individual pro‐
1381 cessors are allocated to jobs (SelectType=select/cons_res or
1382 SelectType=select/cons_tres). The default value is 0 (unlim‐
1383 ited). Also see DefMemPerCPU, DefMemPerGPU and MaxMemPerNode.
1384 MaxMemPerCPU and MaxMemPerNode are mutually exclusive.
1385
1386 NOTE: If a job specifies a memory per CPU limit that exceeds
1387 this system limit, that job's count of CPUs per task will auto‐
1388 matically be increased. This may result in the job failing due
1389 to CPU count limits.
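A sketch of the automatic adjustment described in the NOTE, with illustrative numbers:

```shell
# With MaxMemPerCPU=4000 (MB), a request of --mem-per-cpu=9000 is
# satisfied by raising CPUs per task until the per-CPU share fits.
max_mem_per_cpu=4000
requested=9000
cpus_per_task=$(( (requested + max_mem_per_cpu - 1) / max_mem_per_cpu ))
echo "$cpus_per_task"   # 3
```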
1390
1391
1392 MaxMemPerNode
1393 Maximum real memory size available per allocated node in
1394 megabytes. Used to avoid over-subscribing memory and causing
1395 paging. MaxMemPerNode would generally be used if whole nodes
1396 are allocated to jobs (SelectType=select/linear) and resources
1397 are over-subscribed (OverSubscribe=yes or OverSubscribe=force).
1398 The default value is 0 (unlimited). Also see DefMemPerNode and
1399 MaxMemPerCPU. MaxMemPerCPU and MaxMemPerNode are mutually
1400 exclusive.
1401
1402
1403 MaxStepCount
1404 The maximum number of steps that any job can initiate. This
1405 parameter is intended to limit the effect of bad batch scripts.
1406 The default value is 40000 steps.
1407
1408
1409 MaxTasksPerNode
1410 Maximum number of tasks Slurm will allow a job step to spawn on
1411 a single node. The default MaxTasksPerNode is 512. May not
1412 exceed 65533.
1413
1414
1415 MCSParameters
1416 MCS (Multi-Category Security) plugin parameters. The sup‐
1417 ported parameters are specific to the MCSPlugin. Changes to
1418 this value take effect when the Slurm daemons are reconfigured.
1419 More information about MCS is available here
1420 <https://slurm.schedmd.com/mcs.html>.
1421
1422
1423 MCSPlugin
1424 MCS (Multi-Category Security): associate a security label with
1425 jobs and ensure that nodes can only be shared among jobs using
1426 the same security label. Acceptable values include:
1427
1428 mcs/none is the default value. No security label associated
1429 with jobs, no particular security restriction when
1430 sharing nodes among jobs.
1431
1432 mcs/account only users with the same account can share the nodes
1433 (requires enabling of accounting).
1434
1435 mcs/group only users with the same group can share the nodes.
1436
1437 mcs/user a node cannot be shared with other users.
1438
1439
1440 MemLimitEnforce
1441 If set to yes then Slurm will terminate the job if it exceeds
1442 the value requested using the --mem-per-cpu option of sal‐
1443 loc/sbatch/srun. This is useful in combination with JobAcct‐
1444 GatherParams=OverMemoryKill. Used when jobs need to specify
1445 --mem-per-cpu for scheduling and they should be terminated if
1446 they exceed the estimated value. The default value is 'no',
1447 which disables this enforcing mechanism. NOTE: It is recom‐
1448 mended to limit memory by enabling task/cgroup in TaskPlugin and
1449 making use of ConstrainRAMSpace=yes in cgroup.conf instead of
1450 using this JobAcctGather mechanism for memory enforcement, since
1451 the latter has a lower resolution (JobAcctGatherFreq) and OOMs
1452 could happen at some point.
1453
1454
1455 MessageTimeout
1456 Time permitted for a round-trip communication to complete in
1457 seconds. Default value is 10 seconds. For systems with shared
1458 nodes, the slurmd daemon could be paged out and necessitate
1459 higher values.
1460
1461
1462 MinJobAge
1463 The minimum age of a completed job before its record is purged
1464 from Slurm's active database. Set the values of MaxJobCount and
1465 MinJobAge to ensure the slurmctld daemon does not exhaust its
1466 memory or other resources. The default value is 300 seconds. A value of
1467 zero prevents any job record purging. Jobs are not purged dur‐
1468 ing a backfill cycle, so it can take longer than MinJobAge sec‐
1469 onds to purge a job if using the backfill scheduling plugin. In
1470 order to eliminate some possible race conditions, the minimum
1471 non-zero value for MinJobAge recommended is 2.
1472
1473
1474 MpiDefault
1475 Identifies the default type of MPI to be used. Srun may over‐
1476 ride this configuration parameter in any case. Currently sup‐
1477 ported versions include: openmpi, pmi2, pmix, and none (default,
1478 which works for many other versions of MPI). More information
1479 about MPI use is available here
1480 <https://slurm.schedmd.com/mpi_guide.html>.
1481
1482
1483 MpiParams
1484 MPI parameters. Used to identify ports used by older versions
1485 of OpenMPI and native Cray systems. The input format is
1486 "ports=12000-12999" to identify a range of communication ports
1487 to be used. NOTE: This is not needed for modern versions of
1488 OpenMPI; removing it can provide a small boost in scheduling
1489 performance. NOTE: This is required for Cray's PMI.
1490
1491 MsgAggregationParams
1492 Message aggregation parameters. Message aggregation is an
1493 optional feature that may improve system performance by reducing
1494 the number of separate messages passed between nodes. The fea‐
1495 ture works by routing messages through one or more message col‐
1496 lector nodes between their source and destination nodes. At each
1497 collector node, messages with the same destination received dur‐
1498 ing a defined message collection window are packaged into a sin‐
1499 gle composite message. When the window expires, the composite
1500 message is sent to the next collector node on the route to its
1501 destination. The route between each source and destination node
1502 is provided by the Route plugin. When a composite message is
1503 received at its destination node, the original messages are
1504 extracted and processed as if they had been sent directly.
1505 Currently, the only message types supported by message aggrega‐
1506 tion are the node registration, batch script completion, step
1507 completion, and epilog complete messages.
1508 Since the aggregation node address is set by resolving the host‐
1509 name at slurmd start on each node, using this feature in non-flat
1510 networks is not possible. For example, if slurmctld is in a
1511 different subnetwork than compute nodes and node addresses are
1512 resolved differently on the controller than on the compute nodes,
1513 you may face communication issues. In some cases it may be use‐
1514 ful to set CommunicationParameters=NoInAddrAny to make all dae‐
1515 mons communicate through the same network.
1516 The format for this parameter is as follows:
1517
1518 MsgAggregationParams=<option>=<value>
1519              where <option>=<value> specifies a particular control
1520              variable. Multiple, comma-separated <option>=<value>
1521              pairs may be specified. Supported options are as
1522              follows:
1523
1524 WindowMsgs=<number>
1525 where <number> is the maximum number of mes‐
1526 sages in each message collection window.
1527
1528 WindowTime=<time>
1529 where <time> is the maximum elapsed time in
1530 milliseconds of each message collection win‐
1531 dow.
1532
1533              A window expires when either WindowMsgs or WindowTime is
1534              reached. By default, message aggregation is disabled. To
1535              enable the feature, set WindowMsgs to a value greater than 1.
1536              The default value for WindowTime is 100 milliseconds.
1538
1539
1540 OverTimeLimit
1541 Number of minutes by which a job can exceed its time limit
1542 before being canceled. Normally a job's time limit is treated
1543 as a hard limit and the job will be killed upon reaching that
1544 limit. Configuring OverTimeLimit will result in the job's time
1545 limit being treated like a soft limit. Adding the OverTimeLimit
1546 value to the soft time limit provides a hard time limit, at
1547 which point the job is canceled. This is particularly useful
1548 for backfill scheduling, which bases upon each job's soft time
1549 limit. The default value is zero. May not exceed exceed 65533
1550 minutes. A value of "UNLIMITED" is also supported.
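
              As a sketch, granting every job a 5-minute grace period past
              its soft time limit (the value is illustrative):

              ```conf
              # Jobs may run up to 5 minutes past their time limit before
              # being canceled; backfill still plans against the soft limit.
              OverTimeLimit=5
              ```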
1551
1552
1553 PluginDir
1554 Identifies the places in which to look for Slurm plugins. This
1555 is a colon-separated list of directories, like the PATH environ‐
1556 ment variable. The default value is "/usr/local/lib/slurm".
1557
1558
1559 PlugStackConfig
1560 Location of the config file for Slurm stackable plugins that use
1561 the Stackable Plugin Architecture for Node job (K)control
1562 (SPANK). This provides support for a highly configurable set of
1563 plugins to be called before and/or after execution of each task
1564 spawned as part of a user's job step. Default location is
1565 "plugstack.conf" in the same directory as the system slurm.conf.
1566 For more information on SPANK plugins, see the spank(8) manual.
1567
1568
1569 PowerParameters
1570 System power management parameters. The supported parameters
1571 are specific to the PowerPlugin. Changes to this value take
1572 effect when the Slurm daemons are reconfigured. More informa‐
1573 tion about system power management is available here
1574              <https://slurm.schedmd.com/power_mgmt.html>. Options currently
1575              supported by any plugin are listed below.
1576
1577 balance_interval=#
1578 Specifies the time interval, in seconds, between attempts
1579 to rebalance power caps across the nodes. This also con‐
1580 trols the frequency at which Slurm attempts to collect
1581 current power consumption data (old data may be used
1582 until new data is available from the underlying infra‐
1583 structure and values below 10 seconds are not recommended
1584 for Cray systems). The default value is 30 seconds.
1585 Supported by the power/cray_aries plugin.
1586
1587 capmc_path=
1588 Specifies the absolute path of the capmc command. The
1589 default value is "/opt/cray/capmc/default/bin/capmc".
1590 Supported by the power/cray_aries plugin.
1591
1592 cap_watts=#
1593 Specifies the total power limit to be established across
1594 all compute nodes managed by Slurm. A value of 0 sets
1595 every compute node to have an unlimited cap. The default
1596 value is 0. Supported by the power/cray_aries plugin.
1597
1598 decrease_rate=#
1599 Specifies the maximum rate of change in the power cap for
1600 a node where the actual power usage is below the power
1601 cap by an amount greater than lower_threshold (see
1602 below). Value represents a percentage of the difference
1603 between a node's minimum and maximum power consumption.
1604 The default value is 50 percent. Supported by the
1605 power/cray_aries plugin.
1606
1607 get_timeout=#
1608 Amount of time allowed to get power state information in
1609 milliseconds. The default value is 5,000 milliseconds or
1610 5 seconds. Supported by the power/cray_aries plugin and
1611 represents the time allowed for the capmc command to
1612 respond to various "get" options.
1613
1614 increase_rate=#
1615 Specifies the maximum rate of change in the power cap for
1616 a node where the actual power usage is within
1617 upper_threshold (see below) of the power cap. Value rep‐
1618 resents a percentage of the difference between a node's
1619 minimum and maximum power consumption. The default value
1620 is 20 percent. Supported by the power/cray_aries plugin.
1621
1622 job_level
1623 All nodes associated with every job will have the same
1624 power cap, to the extent possible. Also see the
1625 --power=level option on the job submission commands.
1626
1627 job_no_level
1628 Disable the user's ability to set every node associated
1629 with a job to the same power cap. Each node will have
1630              its power cap set independently. This disables the
1631 --power=level option on the job submission commands.
1632
1633 lower_threshold=#
1634 Specify a lower power consumption threshold. If a node's
1635 current power consumption is below this percentage of its
1636 current cap, then its power cap will be reduced. The
1637 default value is 90 percent. Supported by the
1638 power/cray_aries plugin.
1639
1640 recent_job=#
1641 If a job has started or resumed execution (from suspend)
1642 on a compute node within this number of seconds from the
1643 current time, the node's power cap will be increased to
1644 the maximum. The default value is 300 seconds. Sup‐
1645 ported by the power/cray_aries plugin.
1646
1647
1648 set_timeout=#
1649 Amount of time allowed to set power state information in
1650 milliseconds. The default value is 30,000 milliseconds
1651              or 30 seconds. Supported by the power/cray_aries plugin and
1652 represents the time allowed for the capmc command to
1653 respond to various "set" options.
1654
1655 set_watts=#
1656 Specifies the power limit to be set on every compute
1657              node managed by Slurm. Every node gets the same power
1658 cap and there is no variation through time based upon
1659 actual power usage on the node. Supported by the
1660 power/cray_aries plugin.
1661
1662 upper_threshold=#
1663 Specify an upper power consumption threshold. If a
1664 node's current power consumption is above this percentage
1665 of its current cap, then its power cap will be increased
1666 to the extent possible. The default value is 95 percent.
1667 Supported by the power/cray_aries plugin.
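
              A hypothetical power-capping setup for the power/cray_aries
              plugin might combine several of these options; the values
              below are illustrative, not site recommendations:

              ```conf
              PowerPlugin=power/cray_aries
              # 200 kW total cap, rebalanced every 60 seconds; caps shrink
              # below 90% utilization and grow above 95%.
              PowerParameters=balance_interval=60,cap_watts=200000,lower_threshold=90,upper_threshold=95
              ```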
1668
1669
1670 PowerPlugin
1671 Identifies the plugin used for system power management. Cur‐
1672 rently supported plugins include: cray_aries and none. Changes
1673 to this value require restarting Slurm daemons to take effect.
1674 More information about system power management is available here
1675 <https://slurm.schedmd.com/power_mgmt.html>. By default, no
1676 power plugin is loaded.
1677
1678
1679 PreemptMode
1680 Enables gang scheduling and/or controls the mechanism used to
1681 preempt jobs. When the PreemptType parameter is set to enable
1682 preemption, the PreemptMode selects the default mechanism used
1683 to preempt the lower priority jobs for the cluster. PreemptMode
1684 may be specified on a per partition basis to override this
1685 default value if PreemptType=preempt/partition_prio, but a valid
1686 default PreemptMode value must be specified for the cluster as a
1687 whole when preemption is enabled. The GANG option is used to
1688 enable gang scheduling independent of whether preemption is
1689 enabled (the PreemptType setting). The GANG option can be spec‐
1690 ified in addition to a PreemptMode setting with the two options
1691 comma separated. The SUSPEND option requires that gang schedul‐
1692              ing be enabled (i.e., "PreemptMode=SUSPEND,GANG"). NOTE: For
1693 performance reasons, the backfill scheduler reserves whole nodes
1694 for jobs, not partial nodes. If during backfill scheduling a job
1695 preempts one or more other jobs, the whole nodes for those pre‐
1696 empted jobs are reserved for the preemptor job, even if the pre‐
1697 emptor job requested fewer resources than that. These reserved
1698 nodes aren't available to other jobs during that backfill cycle,
1699 even if the other jobs could fit on the nodes. Therefore, jobs
1700 may preempt more resources during a single backfill iteration
1701 than they requested.
1702
1703 OFF is the default value and disables job preemption and
1704 gang scheduling.
1705
1706              CANCEL      always cancels the job.
1707
1708 CHECKPOINT preempts jobs by checkpointing them (if possible) or
1709 canceling them.
1710
1711 GANG enables gang scheduling (time slicing) of jobs in
1712 the same partition. NOTE: Gang scheduling is per‐
1713 formed independently for each partition, so config‐
1714 uring partitions with overlapping nodes and gang
1715 scheduling is generally not recommended.
1716
1717 REQUEUE preempts jobs by requeuing them (if possible) or
1718 canceling them. For jobs to be requeued they must
1719 have the --requeue sbatch option set or the cluster
1720 wide JobRequeue parameter in slurm.conf must be set
1721 to one.
1722
1723 SUSPEND If PreemptType=preempt/partition_prio is configured
1724 then suspend and automatically resume the low prior‐
1725 ity jobs. If PreemptType=preempt/qos is configured,
1726 then the jobs sharing resources will always time
1727 slice rather than one job remaining suspended. The
1728 SUSPEND may only be used with the GANG option (the
1729 gang scheduler module performs the job resume opera‐
1730 tion).
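
              For instance, partition-priority preemption with suspended
              jobs resumed by the gang scheduler could be configured as
              follows (illustrative):

              ```conf
              # Higher-priority partitions preempt lower ones; preempted
              # jobs are suspended and gang-resumed rather than killed.
              PreemptType=preempt/partition_prio
              PreemptMode=SUSPEND,GANG
              ```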
1731
1732
1733 PreemptType
1734 This specifies the plugin used to identify which jobs can be
1735 preempted in order to start a pending job.
1736
1737 preempt/none
1738 Job preemption is disabled. This is the default.
1739
1740 preempt/partition_prio
1741 Job preemption is based upon partition priority tier.
1742 Jobs in higher priority partitions (queues) may preempt
1743 jobs from lower priority partitions. This is not compat‐
1744 ible with PreemptMode=OFF.
1745
1746 preempt/qos
1747 Job preemption rules are specified by Quality Of Service
1748 (QOS) specifications in the Slurm database. This option
1749 is not compatible with PreemptMode=OFF. A configuration
1750 of PreemptMode=SUSPEND is only supported by the Select‐
1751 Type=select/cons_res and SelectType=select/cons_tres
1752 plugins.
1753
1754
1755 PreemptExemptTime
1756 Global option for minimum run time for all jobs before they can
1757 be considered for preemption. Any QOS PreemptExemptTime takes
1758 precedence over the global option. A time of -1 disables the
1759 option, equivalent to 0. Acceptable time formats include "min‐
1760 utes", "minutes:seconds", "hours:minutes:seconds", "days-hours",
1761 "days-hours:minutes", and "days-hours:minutes:seconds".
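
              The accepted time formats, shown with illustrative values
              (only one such line would appear in a real configuration):

              ```conf
              PreemptExemptTime=30        # minutes
              PreemptExemptTime=30:00     # minutes:seconds
              PreemptExemptTime=1:00:00   # hours:minutes:seconds
              PreemptExemptTime=2-12      # days-hours (2 days, 12 hours)
              ```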
1762
1763
1764 PriorityCalcPeriod
1765 The period of time in minutes in which the half-life decay will
1766 be re-calculated. Applicable only if PriorityType=priority/mul‐
1767 tifactor. The default value is 5 (minutes).
1768
1769
1770 PriorityDecayHalfLife
1771 This controls how long prior resource use is considered in
1772 determining how over- or under-serviced an association is (user,
1773 bank account and cluster) in determining job priority. The
1774 record of usage will be decayed over time, with half of the
1775 original value cleared at age PriorityDecayHalfLife. If set to
1776 0 no decay will be applied. This is helpful if you want to
1777 enforce hard time limits per association. If set to 0 Priori‐
1778 tyUsageResetPeriod must be set to some interval. Applicable
1779 only if PriorityType=priority/multifactor. The unit is a time
1780 string (i.e. min, hr:min:00, days-hr:min:00, or days-hr). The
1781 default value is 7-0 (7 days).
1782
1783
1784 PriorityFavorSmall
1785 Specifies that small jobs should be given preferential schedul‐
1786 ing priority. Applicable only if PriorityType=priority/multi‐
1787 factor. Supported values are "YES" and "NO". The default value
1788 is "NO".
1789
1790
1791 PriorityFlags
1792 Flags to modify priority behavior. Applicable only if Priority‐
1793 Type=priority/multifactor. The keywords below have no associ‐
1794 ated value (e.g. "PriorityFlags=ACCRUE_ALWAYS,SMALL_RELA‐
1795 TIVE_TO_TIME").
1796
1797 ACCRUE_ALWAYS If set, priority age factor will be increased
1798 despite job dependencies or holds.
1799
1800 CALCULATE_RUNNING
1801 If set, priorities will be recalculated not
1802 only for pending jobs, but also running and
1803 suspended jobs.
1804
1805 DEPTH_OBLIVIOUS If set, priority will be calculated based simi‐
1806 lar to the normal multifactor calculation, but
1807                              depth of the associations in the tree does not
1808                              adversely affect their priority. This option
1809 automatically enables NO_FAIR_TREE.
1810
1811 NO_FAIR_TREE Disables the "fair tree" algorithm, and reverts
1812 to "classic" fair share priority scheduling.
1813
1814 INCR_ONLY If set, priority values will only increase in
1815 value. Job priority will never decrease in
1816 value.
1817
1818 MAX_TRES If set, the weighted TRES value (e.g. TRES‐
1819 BillingWeights) is calculated as the MAX of
1820 individual TRES' on a node (e.g. cpus, mem,
1821 gres) plus the sum of all global TRES' (e.g.
1822 licenses).
1823
1824 NO_NORMAL_ALL If set, all NO_NORMAL_* flags are set.
1825
1826 NO_NORMAL_ASSOC If set, the association factor is not normal‐
1827 ized against the highest association priority.
1828
1829 NO_NORMAL_PART If set, the partition factor is not normalized
1830 against the highest partition PriorityTier.
1831
1832 NO_NORMAL_QOS If set, the QOS factor is not normalized
1833 against the highest qos priority.
1834
1835              NO_NORMAL_TRES  If set, the TRES factor is not normalized
1836 against the job's partition TRES counts.
1837
1838 SMALL_RELATIVE_TO_TIME
1839 If set, the job's size component will be based
1840 upon not the job size alone, but the job's size
1841                              divided by its time limit.
1842
1843
1844 PriorityMaxAge
1845 Specifies the job age which will be given the maximum age factor
1846 in computing priority. For example, a value of 30 minutes would
1847              result in all jobs over 30 minutes old getting the same
1848 age-based priority. Applicable only if PriorityType=prior‐
1849 ity/multifactor. The unit is a time string (i.e. min,
1850 hr:min:00, days-hr:min:00, or days-hr). The default value is
1851 7-0 (7 days).
1852
1853
1854 PriorityParameters
1855 Arbitrary string used by the PriorityType plugin.
1856
1857
1858 PrioritySiteFactorParameters
1859 Arbitrary string used by the PrioritySiteFactorPlugin plugin.
1860
1861
1862 PrioritySiteFactorPlugin
1863              This specifies an optional plugin to be used alongside "prior‐
1864 ity/multifactor", which is meant to initially set and continu‐
1865 ously update the SiteFactor priority factor. The default value
1866 is "site_factor/none".
1867
1868
1869 PriorityType
1870 This specifies the plugin to be used in establishing a job's
1871 scheduling priority. Supported values are "priority/basic" (jobs
1872 are prioritized by order of arrival), "priority/multifactor"
1873 (jobs are prioritized based upon size, age, fair-share of allo‐
1874 cation, etc). Also see PriorityFlags for configuration options.
1875 The default value is "priority/basic".
1876
1877 When not FIFO scheduling, jobs are prioritized in the following
1878 order:
1879
1880 1. Jobs that can preempt
1881
1882 2. Jobs with an advanced reservation
1883
1884 3. Partition Priority Tier
1885
1886 4. Job Priority
1887
1888 5. Job Id
1889
1890
1891 PriorityUsageResetPeriod
1892 At this interval the usage of associations will be reset to 0.
1893 This is used if you want to enforce hard limits of time usage
1894 per association. If PriorityDecayHalfLife is set to be 0 no
1895 decay will happen and this is the only way to reset the usage
1896              accumulated by running jobs. By default this is turned off,
1897              and it is advised to use the PriorityDecayHalfLife option
1898              instead, to avoid a situation where nothing can run on your
1899              cluster. However, if your scheme is set up to only allow
1900              certain amounts of time on your system, this is the way to do
1901              it. Applicable only if PriorityType=priority/multifactor.
1902
1903 NONE Never clear historic usage. The default value.
1904
1905 NOW Clear the historic usage now. Executed at startup
1906 and reconfiguration time.
1907
1908 DAILY Cleared every day at midnight.
1909
1910 WEEKLY Cleared every week on Sunday at time 00:00.
1911
1912 MONTHLY Cleared on the first day of each month at time
1913 00:00.
1914
1915 QUARTERLY Cleared on the first day of each quarter at time
1916 00:00.
1917
1918 YEARLY Cleared on the first day of each year at time 00:00.
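
              For example, to enforce hard per-association time limits with
              no usage decay, resetting accumulated usage monthly (an
              illustrative combination, not a recommendation):

              ```conf
              PriorityType=priority/multifactor
              # With a half-life of 0, usage never decays, so a reset
              # period must be configured.
              PriorityDecayHalfLife=0
              PriorityUsageResetPeriod=MONTHLY
              ```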
1919
1920
1921 PriorityWeightAge
1922 An integer value that sets the degree to which the queue wait
1923 time component contributes to the job's priority. Applicable
1924 only if PriorityType=priority/multifactor. The default value is
1925 0.
1926
1927
1928 PriorityWeightAssoc
1929 An integer value that sets the degree to which the association
1930 component contributes to the job's priority. Applicable only if
1931 PriorityType=priority/multifactor. The default value is 0.
1932
1933
1934 PriorityWeightFairshare
1935 An integer value that sets the degree to which the fair-share
1936 component contributes to the job's priority. Applicable only if
1937 PriorityType=priority/multifactor. The default value is 0.
1938
1939
1940 PriorityWeightJobSize
1941 An integer value that sets the degree to which the job size com‐
1942 ponent contributes to the job's priority. Applicable only if
1943 PriorityType=priority/multifactor. The default value is 0.
1944
1945
1946 PriorityWeightPartition
1947 Partition factor used by priority/multifactor plugin in calcu‐
1948 lating job priority. Applicable only if PriorityType=prior‐
1949 ity/multifactor. The default value is 0.
1950
1951
1952 PriorityWeightQOS
1953 An integer value that sets the degree to which the Quality Of
1954 Service component contributes to the job's priority. Applicable
1955 only if PriorityType=priority/multifactor. The default value is
1956 0.
1957
1958
1959 PriorityWeightTRES
1960 A comma separated list of TRES Types and weights that sets the
1961 degree that each TRES Type contributes to the job's priority.
1962
1963 e.g.
1964 PriorityWeightTRES=CPU=1000,Mem=2000,GRES/gpu=3000
1965
1966 Applicable only if PriorityType=priority/multifactor and if
1967 AccountingStorageTRES is configured with each TRES Type. Nega‐
1968 tive values are allowed. The default values are 0.
1969
1970
1971 PrivateData
1972 This controls what type of information is hidden from regular
1973 users. By default, all information is visible to all users.
1974 User SlurmUser and root can always view all information. Multi‐
1975 ple values may be specified with a comma separator. Acceptable
1976 values include:
1977
1978 accounts
1979 (NON-SlurmDBD ACCOUNTING ONLY) Prevents users from view‐
1980 ing any account definitions unless they are coordinators
1981 of them.
1982
1983 cloud Powered down nodes in the cloud are visible.
1984
1985              events  Prevents users from viewing event information unless they
1986 have operator status or above.
1987
1988 jobs Prevents users from viewing jobs or job steps belonging
1989 to other users. (NON-SlurmDBD ACCOUNTING ONLY) Prevents
1990 users from viewing job records belonging to other users
1991 unless they are coordinators of the association running
1992 the job when using sacct.
1993
1994 nodes Prevents users from viewing node state information.
1995
1996 partitions
1997 Prevents users from viewing partition state information.
1998
1999 reservations
2000 Prevents regular users from viewing reservations which
2001 they can not use.
2002
2003 usage Prevents users from viewing usage of any other user, this
2004 applies to sshare. (NON-SlurmDBD ACCOUNTING ONLY) Pre‐
2005 vents users from viewing usage of any other user, this
2006 applies to sreport.
2007
2008 users (NON-SlurmDBD ACCOUNTING ONLY) Prevents users from view‐
2009 ing information of any user other than themselves, this
2010 also makes it so users can only see associations they
2011 deal with. Coordinators can see associations of all
2012 users they are coordinator of, but can only see them‐
2013 selves when listing users.
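
              For example, to hide other users' jobs, usage, and user
              records from regular users (an illustrative selection):

              ```conf
              # Regular users see only their own jobs, usage, and user info.
              PrivateData=jobs,usage,users
              ```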
2014
2015
2016 ProctrackType
2017 Identifies the plugin to be used for process tracking on a job
2018 step basis. The slurmd daemon uses this mechanism to identify
2019 all processes which are children of processes it spawns for a
2020 user job step. The slurmd daemon must be restarted for a change
2021 in ProctrackType to take effect. NOTE: "proctrack/linuxproc"
2022 and "proctrack/pgid" can fail to identify all processes associ‐
2023 ated with a job since processes can become a child of the init
2024 process (when the parent process terminates) or change their
2025 process group. To reliably track all processes, "proc‐
2026 track/cgroup" is highly recommended. NOTE: The JobContainerType
2027 applies to a job allocation, while ProctrackType applies to job
2028 steps. Acceptable values at present include:
2029
2030 proctrack/cgroup which uses linux cgroups to constrain and
2031 track processes, and is the default. NOTE:
2032 see "man cgroup.conf" for configuration
2033 details
2034
2035 proctrack/cray_aries
2036 which uses Cray proprietary process tracking
2037
2038 proctrack/linuxproc which uses linux process tree using parent
2039 process IDs.
2040
2041 proctrack/pgid which uses process group IDs
2042
2043
2044 Prolog Fully qualified pathname of a program for the slurmd to execute
2045 whenever it is asked to run a job step from a new job allocation
2046 (e.g. "/usr/local/slurm/prolog"). A glob pattern (See glob (7))
2047 may also be used to specify more than one program to run (e.g.
2048 "/etc/slurm/prolog.d/*"). The slurmd executes the prolog before
2049 starting the first job step. The prolog script or scripts may
2050 be used to purge files, enable user login, etc. By default
2051 there is no prolog. Any configured script is expected to com‐
2052 plete execution quickly (in less time than MessageTimeout). If
2053 the prolog fails (returns a non-zero exit code), this will
2054 result in the node being set to a DRAIN state and the job being
2055 requeued in a held state, unless nohold_on_prolog_fail is con‐
2056 figured in SchedulerParameters. See Prolog and Epilog Scripts
2057 for more information.
2058
2059
2060 PrologEpilogTimeout
2061              The interval in seconds Slurm waits for Prolog and Epilog
2062 before terminating them. The default behavior is to wait indefi‐
2063 nitely. This interval applies to the Prolog and Epilog run by
2064 slurmd daemon before and after the job, the PrologSlurmctld and
2065 EpilogSlurmctld run by slurmctld daemon, and the SPANK plugins
2066 run by the slurmstepd daemon.
2067
2068
2069 PrologFlags
2070 Flags to control the Prolog behavior. By default no flags are
2071 set. Multiple flags may be specified in a comma-separated list.
2072 Currently supported options are:
2073
2074 Alloc If set, the Prolog script will be executed at job allo‐
2075 cation. By default, Prolog is executed just before the
2076 task is launched. Therefore, when salloc is started, no
2077 Prolog is executed. Alloc is useful for preparing things
2078 before a user starts to use any allocated resources. In
2079 particular, this flag is needed on a Cray system when
2080 cluster compatibility mode is enabled.
2081
2082 NOTE: Use of the Alloc flag will increase the time
2083 required to start jobs.
2084
2085 Contain At job allocation time, use the ProcTrack plugin to cre‐
2086 ate a job container on all allocated compute nodes.
2087 This container may be used for user processes not
2088 launched under Slurm control, for example
2089 pam_slurm_adopt may place processes launched through a
2090 direct user login into this container. If using
2091 pam_slurm_adopt, then ProcTrackType must be set to
2092 either proctrack/cgroup or proctrack/cray_aries. Set‐
2093              ting the Contain flag implicitly sets the Alloc flag.
2094
2095 NoHold If set, the Alloc flag should also be set. This will
2096 allow for salloc to not block until the prolog is fin‐
2097 ished on each node. The blocking will happen when steps
2098 reach the slurmd and before any execution has happened
2099 in the step. This is a much faster way to work and if
2100 using srun to launch your tasks you should use this
2101 flag. This flag cannot be combined with the Contain or
2102 X11 flags.
2103
2104 Serial By default, the Prolog and Epilog scripts run concur‐
2105 rently on each node. This flag forces those scripts to
2106 run serially within each node, but with a significant
2107 penalty to job throughput on each node.
2108
2109 X11 Enable Slurm's built-in X11 forwarding capabilities.
2110 This is incompatible with ProctrackType=proctrack/linux‐
2111 proc. Setting the X11 flag implicitly enables both Con‐
2112 tain and Alloc flags as well.
2113
2114
2115 PrologSlurmctld
2116 Fully qualified pathname of a program for the slurmctld daemon
2117 to execute before granting a new job allocation (e.g.
2118 "/usr/local/slurm/prolog_controller"). The program executes as
2119 SlurmUser on the same node where the slurmctld daemon executes,
2120 giving it permission to drain nodes and requeue the job if a
2121 failure occurs or cancel the job if appropriate. The program
2122 can be used to reboot nodes or perform other work to prepare
2123 resources for use. Exactly what the program does and how it
2124 accomplishes this is completely at the discretion of the system
2125              administrator. Information about the job being initiated, its
2126 allocated nodes, etc. are passed to the program using environ‐
2127 ment variables. While this program is running, the nodes asso‐
2128              ciated with the job will have a POWER_UP/CONFIGURING flag set
2129 in their state, which can be readily viewed. The slurmctld dae‐
2130 mon will wait indefinitely for this program to complete. Once
2131 the program completes with an exit code of zero, the nodes will
2132              be considered ready for use and the job will be started. If
2133 some node can not be made available for use, the program should
2134 drain the node (typically using the scontrol command) and termi‐
2135 nate with a non-zero exit code. A non-zero exit code will
2136 result in the job being requeued (where possible) or killed.
2137 Note that only batch jobs can be requeued. See Prolog and Epi‐
2138 log Scripts for more information.
2139
2140
2141 PropagatePrioProcess
2142 Controls the scheduling priority (nice value) of user spawned
2143 tasks.
2144
2145 0 The tasks will inherit the scheduling priority from the
2146 slurm daemon. This is the default value.
2147
2148 1 The tasks will inherit the scheduling priority of the com‐
2149 mand used to submit them (e.g. srun or sbatch). Unless the
2150 job is submitted by user root, the tasks will have a sched‐
2151 uling priority no higher than the slurm daemon spawning
2152 them.
2153
2154 2 The tasks will inherit the scheduling priority of the com‐
2155 mand used to submit them (e.g. srun or sbatch) with the
2156 restriction that their nice value will always be one higher
2157              than the slurm daemon (i.e. the tasks' scheduling priority
2158 will be lower than the slurm daemon).
2159
2160
2161 PropagateResourceLimits
2162 A list of comma separated resource limit names. The slurmd dae‐
2163 mon uses these names to obtain the associated (soft) limit val‐
2164 ues from the user's process environment on the submit node.
2165 These limits are then propagated and applied to the jobs that
2166 will run on the compute nodes. This parameter can be useful
2167 when system limits vary among nodes. Any resource limits that
2168 do not appear in the list are not propagated. However, the user
2169 can override this by specifying which resource limits to propa‐
2170 gate with the sbatch or srun "--propagate" option. If neither
2171              PropagateResourceLimits nor PropagateResourceLimitsExcept is
2172 configured and the "--propagate" option is not specified, then
2173 the default action is to propagate all limits. Only one of the
2174 parameters, either PropagateResourceLimits or PropagateResource‐
2175 LimitsExcept, may be specified. The user limits can not exceed
2176 hard limits under which the slurmd daemon operates. If the user
2177 limits are not propagated, the limits from the slurmd daemon
2178 will be propagated to the user's job. The limits used for the
2179              Slurm daemons can be set in the /etc/sysconfig/slurm file. For
2180 more information, see: https://slurm.schedmd.com/faq.html#mem‐
2181 lock The following limit names are supported by Slurm (although
2182 some options may not be supported on some systems):
2183
2184 ALL All limits listed below (default)
2185
2186 NONE No limits listed below
2187
2188 AS The maximum address space for a process
2189
2190 CORE The maximum size of core file
2191
2192 CPU The maximum amount of CPU time
2193
2194 DATA The maximum size of a process's data segment
2195
2196 FSIZE The maximum size of files created. Note that if the
2197 user sets FSIZE to less than the current size of the
2198 slurmd.log, job launches will fail with a 'File size
2199 limit exceeded' error.
2200
2201 MEMLOCK The maximum size that may be locked into memory
2202
2203 NOFILE The maximum number of open files
2204
2205 NPROC The maximum number of processes available
2206
2207 RSS The maximum resident set size
2208
2209 STACK The maximum stack size
2210
2211
2212 PropagateResourceLimitsExcept
2213 A list of comma separated resource limit names. By default, all
2214 resource limits will be propagated, (as described by the Propa‐
2215 gateResourceLimits parameter), except for the limits appearing
2216 in this list. The user can override this by specifying which
2217 resource limits to propagate with the sbatch or srun "--propa‐
2218 gate" option. See PropagateResourceLimits above for a list of
2219 valid limit names.
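
              For example, to propagate all of the user's soft limits except
              MEMLOCK (illustrative):

              ```conf
              # All limits are propagated to compute nodes except MEMLOCK.
              PropagateResourceLimitsExcept=MEMLOCK
              ```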
2220
2221
2222 RebootProgram
2223 Program to be executed on each compute node to reboot it.
2224 Invoked on each node once it becomes idle after the command
2225 "scontrol reboot_nodes" is executed by an authorized user or a
2226 job is submitted with the "--reboot" option. After rebooting,
2227 the node is returned to normal use. See ResumeTimeout to con‐
2228 figure the time you expect a reboot to finish in. A node will
2229 be marked DOWN if it doesn't reboot within ResumeTimeout.
2230
2231
2232 ReconfigFlags
2233 Flags to control various actions that may be taken when an
2234 "scontrol reconfig" command is issued. Currently the options
2235 are:
2236
2237 KeepPartInfo If set, an "scontrol reconfig" command will
2238 maintain the in-memory value of partition
2239 "state" and other parameters that may have been
2240 dynamically updated by "scontrol update". Par‐
2241 tition information in the slurm.conf file will
2242 be merged with in-memory data. This flag
2243 supersedes the KeepPartState flag.
2244
2245 KeepPartState If set, an "scontrol reconfig" command will
2246 preserve only the current "state" value of
2247 in-memory partitions and will reset all other
2248 parameters of the partitions that may have been
2249 dynamically updated by "scontrol update" to the
2250 values from the slurm.conf file. Partition
2251 information in the slurm.conf file will be
2252 merged with in-memory data.
2253 The default for the above flags is not set, and the "scontrol
2254 reconfig" will rebuild the partition information using only the
2255 definitions in the slurm.conf file.
2256
2257
2258 RequeueExit
2259 Enables automatic requeue for batch jobs which exit with the
2260              specified values. Separate multiple exit codes with a comma
2261              and/or specify numeric ranges using a "-" separator (e.g.
2262              "RequeueExit=1-9,18"). Jobs will be put back into pending state and
2263 later scheduled again. Restarted jobs will have the environment
2264 variable SLURM_RESTART_COUNT set to the number of times the job
2265 has been restarted.

       RequeueExitHold
              Enables automatic requeue for batch jobs which exit with the
              specified values, with these jobs being held until released
              manually by the user. Separate multiple exit codes with
              commas and/or specify numeric ranges using a "-" separator
              (e.g. "RequeueExitHold=10-12,16"). These jobs are put in
              the JOB_SPECIAL_EXIT exit state. Restarted jobs will have
              the environment variable SLURM_RESTART_COUNT set to the
              number of times the job has been restarted.

       ResumeFailProgram
              The program that will be executed when nodes fail to resume
              by ResumeTimeout. The argument to the program will be the
              names of the failed nodes (using Slurm's hostlist expression
              format).

2284
2285 ResumeProgram
2286 Slurm supports a mechanism to reduce power consumption on nodes
2287 that remain idle for an extended period of time. This is typi‐
2288 cally accomplished by reducing voltage and frequency or powering
2289 the node down. ResumeProgram is the program that will be exe‐
2290 cuted when a node in power save mode is assigned work to per‐
2291 form. For reasons of reliability, ResumeProgram may execute
2292 more than once for a node when the slurmctld daemon crashes and
2293 is restarted. If ResumeProgram is unable to restore a node to
2294 service with a responding slurmd and an updated BootTime, it
2295 should requeue any job associated with the node and set the node
2296 state to DOWN. If the node isn't actually rebooted (i.e. when
2297 multiple-slurmd is configured) starting slurmd with "-b" option
2298 might be useful. The program executes as SlurmUser. The argu‐
2299 ment to the program will be the names of nodes to be removed
2300 from power savings mode (using Slurm's hostlist expression for‐
2301 mat). By default no program is run. Related configuration
2302 options include ResumeTimeout, ResumeRate, SuspendRate, Suspend‐
2303 Time, SuspendTimeout, SuspendProgram, SuspendExcNodes, and Sus‐
2304 pendExcParts. More information is available at the Slurm web
2305 site ( https://slurm.schedmd.com/power_save.html ).
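
              A minimal power-save sketch; the script paths below are
              hypothetical, and the actual power-on mechanism (IPMI, a
              cloud API, etc.) is site-specific:

              ```
              # slurm.conf (illustrative values)
              SuspendProgram=/usr/local/sbin/node_suspend.sh  # hypothetical path
              ResumeProgram=/usr/local/sbin/node_resume.sh    # hypothetical path
              SuspendTime=1800       # power down nodes idle for 30 minutes
              ResumeTimeout=300      # mark a node DOWN if not up in 5 minutes
              SuspendExcNodes=login[1-2]  # never power down these nodes
              ```

              The resume script receives a hostlist expression such as
              "node[01-04]"; it can expand the list with "scontrol show
              hostnames" and then power each node on.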

       ResumeRate
              The rate at which nodes in power save mode are returned to
              normal operation by ResumeProgram. The value is the number
              of nodes per minute and it can be used to prevent power
              surges if a large number of nodes in power save mode are
              assigned work at the same time (e.g. a large job starts). A
              value of zero results in no limits being imposed. The
              default value is 300 nodes per minute. Related
              configuration options include ResumeTimeout, ResumeProgram,
              SuspendRate, SuspendTime, SuspendTimeout, SuspendProgram,
              SuspendExcNodes, and SuspendExcParts.

       ResumeTimeout
              Maximum time permitted (in seconds) between when a node
              resume request is issued and when the node is actually
              available for use. Nodes which fail to respond in this time
              frame will be marked DOWN and the jobs scheduled on the node
              requeued. Nodes which reboot after this time frame will be
              marked DOWN with a reason of "Node unexpectedly rebooted."
              The default value is 60 seconds. Related configuration
              options include ResumeProgram, ResumeRate, SuspendRate,
              SuspendTime, SuspendTimeout, SuspendProgram, SuspendExcNodes
              and SuspendExcParts. More information is available at the
              Slurm web site (https://slurm.schedmd.com/power_save.html).

       ResvEpilog
              Fully qualified pathname of a program for the slurmctld to
              execute when a reservation ends. The program can be used to
              cancel jobs, modify partition configuration, etc. The name
              of the reservation will be passed as an argument to the
              program. By default there is no epilog.

       ResvOverRun
              Describes how long a job already running in a reservation
              should be permitted to execute after the end time of the
              reservation has been reached. The time period is specified
              in minutes and the default value is 0 (kill the job
              immediately). The value may not exceed 65533 minutes,
              although a value of "UNLIMITED" is supported to permit a job
              to run indefinitely after its reservation is terminated.

       ResvProlog
              Fully qualified pathname of a program for the slurmctld to
              execute when a reservation begins. The program can be used
              to cancel jobs, modify partition configuration, etc. The
              name of the reservation will be passed as an argument to the
              program. By default there is no prolog.
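
              For example, a pair of hypothetical reservation scripts
              could be configured as:

              ```
              ResvProlog=/usr/local/sbin/resv_start.sh  # hypothetical path
              ResvEpilog=/usr/local/sbin/resv_end.sh    # hypothetical path
              ```

              slurmctld invokes each script with the reservation name as
              its only argument, e.g. "resv_end.sh maint_window".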

       ReturnToService
              Controls when a DOWN node will be returned to service. The
              default value is 0. Supported values include:

              0   A node will remain in the DOWN state until a system
                  administrator explicitly changes its state (even if the
                  slurmd daemon registers and resumes communications).

              1   A DOWN node will become available for use upon
                  registration with a valid configuration only if it was
                  set DOWN due to being non-responsive. If the node was
                  set DOWN for any other reason (low memory, unexpected
                  reboot, etc.), its state will not automatically be
                  changed. A node registers with a valid configuration
                  if its memory, GRES, CPU count, etc. are equal to or
                  greater than the values configured in slurm.conf.

              2   A DOWN node will become available for use upon
                  registration with a valid configuration. The node
                  could have been set DOWN for any reason. A node
                  registers with a valid configuration if its memory,
                  GRES, CPU count, etc. are equal to or greater than the
                  values configured in slurm.conf. (Disabled on Cray
                  ALPS systems.)
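
              For example, a site that trusts node registrations,
              including after unexpected reboots, could set:

              ```
              ReturnToService=2
              ```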

       RoutePlugin
              Identifies the plugin to be used for defining which nodes
              will be used for message forwarding and message aggregation.

              route/default
                     default, use TreeWidth.

              route/topology
                     use the switch hierarchy defined in a topology.conf
                     file. TopologyPlugin=topology/tree is required.

       SallocDefaultCommand
              Normally, salloc(1) will run the user's default shell when a
              command to execute is not specified on the salloc command
              line. If SallocDefaultCommand is specified, salloc will
              instead run the configured command. The command is passed
              to '/bin/sh -c', so shell metacharacters are allowed, and
              commands with multiple arguments should be quoted. For
              instance:

                  SallocDefaultCommand = "$SHELL"

              would run the shell in the user's $SHELL environment
              variable, and

                  SallocDefaultCommand = "srun -n1 -N1 --mem-per-cpu=0 --pty --preserve-env --mpi=none $SHELL"

              would spawn the user's default shell on the allocated
              resources, but not consume any of the CPU or memory
              resources, configure it as a pseudo-terminal, and preserve
              all of the job's environment variables (i.e. not overwrite
              them with the job step's allocation information).

              For systems with generic resources (GRES) defined, the
              SallocDefaultCommand value should explicitly specify a zero
              count for the configured GRES. Failure to do so will result
              in the launched shell consuming those GRES and preventing
              subsequent srun commands from using them. For example, on
              Cray systems add "--gres=craynetwork:0" as shown below:

                  SallocDefaultCommand = "srun -n1 -N1 --mem-per-cpu=0 --gres=craynetwork:0 --pty --preserve-env --mpi=none $SHELL"

              For systems with TaskPlugin set, adding an option of
              "--cpu-bind=no" is recommended if the default shell should
              have access to all of the CPUs allocated to the job on that
              node, otherwise the shell may be limited to a single CPU or
              core.

       SbcastParameters
              Controls sbcast command behavior. Multiple options can be
              specified in a comma separated list. Supported values
              include:

              DestDir=      Destination directory for the file being
                            broadcast to allocated compute nodes.
                            Default value is the current working
                            directory.

              Compression=  Specify the default file compression library
                            to be used. Supported values are "lz4",
                            "none" and "zlib". The default value with
                            the sbcast --compress option is "lz4" and
                            "none" otherwise. Some compression libraries
                            may be unavailable on some systems.
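
              For example (the destination path is illustrative):

              ```
              SbcastParameters=DestDir=/tmp,Compression=lz4
              ```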

       SchedulerParameters
              The interpretation of this parameter varies by
              SchedulerType. Multiple options may be comma separated.
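
              The options described below are combined into a single
              comma-separated value; for example (all values
              illustrative):

              ```
              SchedulerParameters=bf_continue,bf_interval=60,bf_window=2880,default_queue_depth=200
              ```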

              allow_zero_lic
                     If set, then job submissions requesting more than the
                     configured licenses won't be rejected.

              assoc_limit_stop
                     If set and a job cannot start due to association
                     limits, then do not attempt to initiate any lower
                     priority jobs in that partition. Setting this can
                     decrease system throughput and utilization, but
                     avoid potentially starving larger jobs by preventing
                     them from launching indefinitely.

              batch_sched_delay=#
                     How long, in seconds, the scheduling of batch jobs
                     can be delayed. This can be useful in a
                     high-throughput environment in which batch jobs are
                     submitted at a very high rate (i.e. using the sbatch
                     command) and one wishes to reduce the overhead of
                     attempting to schedule each job at submit time. The
                     default value is 3 seconds.

              bb_array_stage_cnt=#
                     Number of tasks from a job array that should be
                     available for burst buffer resource allocation.
                     Higher values will increase the system overhead as
                     each task from the job array will be moved to its
                     own job record in memory, so relatively small values
                     are generally recommended. The default value is 10.

              bf_busy_nodes
                     When selecting resources for pending jobs to reserve
                     for future execution (i.e. the job can not be
                     started immediately), then preferentially select
                     nodes that are in use. This will tend to leave
                     currently idle resources available for backfilling
                     longer running jobs, but may result in allocations
                     having less than optimal network topology. This
                     option is currently only supported by the
                     select/cons_res and select/cons_tres plugins (or
                     select/cray_aries with SelectTypeParameters set to
                     "OTHER_CONS_RES" or "OTHER_CONS_TRES", which layers
                     the select/cray_aries plugin over the
                     select/cons_res or select/cons_tres plugin
                     respectively).

              bf_continue
                     The backfill scheduler periodically releases locks
                     in order to permit other operations to proceed
                     rather than blocking all activity for what could be
                     an extended period of time. Setting this option
                     will cause the backfill scheduler to continue
                     processing pending jobs from its original job list
                     after releasing locks even if job or node state
                     changes. This can result in lower priority jobs
                     being backfill scheduled instead of newly arrived
                     higher priority jobs, but will permit more queued
                     jobs to be considered for backfill scheduling.

              bf_hetjob_immediate
                     Instruct the backfill scheduler to attempt to start
                     a heterogeneous job as soon as all of its components
                     are determined able to do so. Otherwise, the
                     backfill scheduler will delay heterogeneous job
                     initiation attempts until after the rest of the
                     queue has been processed. This delay may result in
                     lower priority jobs being allocated resources, which
                     could delay the initiation of the heterogeneous job
                     due to account and/or QOS limits being reached.
                     This option is disabled by default. If enabled and
                     bf_hetjob_prio=min is not set, then it would be
                     automatically set.

              bf_hetjob_prio=[min|avg|max]
                     At the beginning of each backfill scheduling cycle,
                     a list of pending jobs to be scheduled is sorted
                     according to the precedence order configured in
                     PriorityType. This option instructs the scheduler
                     to alter the sorting algorithm to ensure that all
                     components belonging to the same heterogeneous job
                     will be attempted to be scheduled consecutively
                     (thus not fragmented in the resulting list). More
                     specifically, all components from the same
                     heterogeneous job will be treated as if they all
                     have the same priority (minimum, average or maximum
                     depending upon this option's parameter) when
                     compared with other jobs (or other heterogeneous job
                     components). The original order will be preserved
                     within the same heterogeneous job. Note that the
                     operation is calculated for the PriorityTier layer
                     and for the Priority resulting from the
                     priority/multifactor plugin calculations. When
                     enabled, if any heterogeneous job requested an
                     advanced reservation, then all of that job's
                     components will be treated as if they had requested
                     an advanced reservation (and get preferential
                     treatment in scheduling).

                     Note that this operation does not update the
                     Priority values of the heterogeneous job components,
                     only their order within the list, so the output of
                     the sprio command will not be affected.

                     Heterogeneous jobs have special scheduling
                     properties: they are only scheduled by the backfill
                     scheduling plugin, each of their components is
                     considered separately when reserving resources (and
                     might have different PriorityTier or different
                     Priority values), and no heterogeneous job component
                     is actually allocated resources until all of its
                     components can be initiated. This may imply
                     potential scheduling deadlock scenarios because
                     components from different heterogeneous jobs can
                     start reserving resources in an interleaved fashion
                     (not consecutively), but none of the jobs can
                     reserve resources for all components and start.
                     Enabling this option can help to mitigate this
                     problem. By default, this option is disabled.

              bf_ignore_newly_avail_nodes
                     If set, then only resources available at the
                     beginning of a backfill cycle will be considered for
                     use. Otherwise resources made available during that
                     backfill cycle (during a yield with bf_continue set)
                     may be used for lower priority jobs, delaying the
                     initiation of higher priority jobs. Disabled by
                     default.

              bf_interval=#
                     The number of seconds between backfill iterations.
                     Higher values result in less overhead and better
                     responsiveness. This option applies only to
                     SchedulerType=sched/backfill. Default: 30, Min: 1,
                     Max: 10800 (3h).

              bf_job_part_count_reserve=#
                     The backfill scheduling logic will reserve resources
                     for the specified count of highest priority jobs in
                     each partition. For example,
                     bf_job_part_count_reserve=10 will cause the backfill
                     scheduler to reserve resources for the ten highest
                     priority jobs in each partition. Any lower priority
                     job that can be started using currently available
                     resources without adversely impacting the expected
                     start time of these higher priority jobs will be
                     started by the backfill scheduler. The default
                     value is zero, which will reserve resources for any
                     pending job and delay initiation of lower priority
                     jobs. Also see bf_min_age_reserve and
                     bf_min_prio_reserve. Default: 0, Min: 0, Max:
                     100000.

              bf_max_job_array_resv=#
                     The maximum number of tasks from a job array for
                     which the backfill scheduler will reserve resources
                     in the future. Since job arrays can potentially
                     have millions of tasks, the overhead in reserving
                     resources for all tasks can be prohibitive. In
                     addition various limits may prevent all the jobs
                     from starting at the expected times. This has no
                     impact upon the number of tasks from a job array
                     that can be started immediately, only those tasks
                     expected to start at some future time. Default: 20,
                     Min: 0, Max: 1000. NOTE: Jobs submitted to multiple
                     partitions appear in the job queue once per
                     partition. If different copies of a single job
                     array record aren't consecutive in the job queue and
                     another job array record is in between, then
                     bf_max_job_array_resv tasks are considered per
                     partition that the job is submitted to.

              bf_max_job_assoc=#
                     The maximum number of jobs per user association to
                     attempt starting with the backfill scheduler. This
                     setting is similar to bf_max_job_user but is handy
                     if a user has multiple associations equating to
                     basically different users. One can set this limit
                     to prevent users from flooding the backfill queue
                     with jobs that cannot start and that prevent jobs
                     from other users from starting. This option applies
                     only to SchedulerType=sched/backfill. Also see the
                     bf_max_job_user, bf_max_job_part, bf_max_job_test
                     and bf_max_job_user_part=# options. Set
                     bf_max_job_test to a value much higher than
                     bf_max_job_assoc. Default: 0 (no limit), Min: 0,
                     Max: bf_max_job_test.

              bf_max_job_part=#
                     The maximum number of jobs per partition to attempt
                     starting with the backfill scheduler. This can be
                     especially helpful for systems with large numbers of
                     partitions and jobs. This option applies only to
                     SchedulerType=sched/backfill. Also see the
                     partition_job_depth and bf_max_job_test options.
                     Set bf_max_job_test to a value much higher than
                     bf_max_job_part. Default: 0 (no limit), Min: 0,
                     Max: bf_max_job_test.

              bf_max_job_start=#
                     The maximum number of jobs which can be initiated in
                     a single iteration of the backfill scheduler. This
                     option applies only to
                     SchedulerType=sched/backfill. Default: 0 (no
                     limit), Min: 0, Max: 10000.

              bf_max_job_test=#
                     The maximum number of jobs to attempt backfill
                     scheduling for (i.e. the queue depth). Higher
                     values result in more overhead and less
                     responsiveness. Until an attempt is made to
                     backfill schedule a job, its expected initiation
                     time value will not be set. In the case of large
                     clusters, configuring a relatively small value may
                     be desirable. This option applies only to
                     SchedulerType=sched/backfill. Default: 100, Min: 1,
                     Max: 1,000,000.

              bf_max_job_user=#
                     The maximum number of jobs per user to attempt
                     starting with the backfill scheduler for ALL
                     partitions. One can set this limit to prevent users
                     from flooding the backfill queue with jobs that
                     cannot start and that prevent jobs from other users
                     from starting. This is similar to the MAXIJOB limit
                     in Maui. This option applies only to
                     SchedulerType=sched/backfill. Also see the
                     bf_max_job_part, bf_max_job_test and
                     bf_max_job_user_part=# options. Set bf_max_job_test
                     to a value much higher than bf_max_job_user.
                     Default: 0 (no limit), Min: 0, Max: bf_max_job_test.

              bf_max_job_user_part=#
                     The maximum number of jobs per user per partition to
                     attempt starting with the backfill scheduler for any
                     single partition. This option applies only to
                     SchedulerType=sched/backfill. Also see the
                     bf_max_job_part, bf_max_job_test and
                     bf_max_job_user=# options. Default: 0 (no limit),
                     Min: 0, Max: bf_max_job_test.
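
                     An illustrative combination of these limits, keeping
                     bf_max_job_test well above the per-user and
                     per-partition caps as advised:

                     ```
                     SchedulerParameters=bf_max_job_test=5000,bf_max_job_user=50,bf_max_job_part=500
                     ```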

              bf_max_time=#
                     The maximum time in seconds the backfill scheduler
                     can spend (including time spent sleeping when locks
                     are released) before discontinuing, even if maximum
                     job counts have not been reached. This option
                     applies only to SchedulerType=sched/backfill. The
                     default value is the value of bf_interval (which
                     defaults to 30 seconds). Default: bf_interval value
                     (def. 30 sec), Min: 1, Max: 3600 (1h). NOTE: If
                     bf_interval is short and bf_max_time is large, this
                     may cause locks to be acquired too frequently and
                     starve out other serviced RPCs. If using this
                     parameter, it is advisable to set max_rpc_cnt high
                     enough that scheduling isn't always disabled, and
                     low enough that the interactive workload can get
                     through in a reasonable period of time. max_rpc_cnt
                     needs to be below 256 (the default RPC thread
                     limit). Running around the middle (150) may give
                     you good results. NOTE: When increasing the amount
                     of time spent in the backfill scheduling cycle,
                     Slurm can be prevented from responding to client
                     requests in a timely manner. To address this you
                     can use max_rpc_cnt to specify a number of queued
                     RPCs before the scheduler stops to respond to these
                     requests.

              bf_min_age_reserve=#
                     The backfill and main scheduling logic will not
                     reserve resources for pending jobs until they have
                     been pending and runnable for at least the specified
                     number of seconds. In addition, jobs waiting for
                     less than the specified number of seconds will not
                     prevent a newly submitted job from starting
                     immediately, even if the newly submitted job has a
                     lower priority. This can be valuable if jobs lack
                     time limits or all time limits have the same value.
                     The default value is zero, which will reserve
                     resources for any pending job and delay initiation
                     of lower priority jobs. Also see
                     bf_job_part_count_reserve and bf_min_prio_reserve.
                     Default: 0, Min: 0, Max: 2592000 (30 days).

              bf_min_prio_reserve=#
                     The backfill and main scheduling logic will not
                     reserve resources for pending jobs unless they have
                     a priority equal to or higher than the specified
                     value. In addition, jobs with a lower priority will
                     not prevent a newly submitted job from starting
                     immediately, even if the newly submitted job has a
                     lower priority. This can be valuable if one wishes
                     to maximize system utilization without regard for
                     job priority below a certain threshold. The default
                     value is zero, which will reserve resources for any
                     pending job and delay initiation of lower priority
                     jobs. Also see bf_job_part_count_reserve and
                     bf_min_age_reserve. Default: 0, Min: 0, Max: 2^63.

              bf_resolution=#
                     The number of seconds in the resolution of data
                     maintained about when jobs begin and end. Higher
                     values result in less overhead and better
                     responsiveness. This option applies only to
                     SchedulerType=sched/backfill. Default: 60, Min: 1,
                     Max: 3600 (1 hour).

              bf_window=#
                     The number of minutes into the future to look when
                     considering jobs to schedule. Higher values result
                     in more overhead and less responsiveness. A value
                     at least as long as the highest allowed time limit
                     is generally advisable to prevent job starvation.
                     In order to limit the amount of data managed by the
                     backfill scheduler, if the value of bf_window is
                     increased, then it is generally advisable to also
                     increase bf_resolution. This option applies only to
                     SchedulerType=sched/backfill. Default: 1440 (1
                     day), Min: 1, Max: 43200 (30 days).
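
                     For example, a cluster whose longest allowed time
                     limit is 7 days might scale both values together
                     (the values here are illustrative):

                     ```
                     # look 7 days ahead; coarsen the time map to
                     # 10-minute buckets to bound backfill bookkeeping
                     SchedulerParameters=bf_window=10080,bf_resolution=600
                     ```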

              bf_window_linear=#
                     For performance reasons, the backfill scheduler will
                     decrease precision in calculation of job expected
                     termination times. By default, the precision starts
                     at 30 seconds and that time interval doubles with
                     each evaluation of currently executing jobs when
                     trying to determine when a pending job can start.
                     This algorithm can support an environment with many
                     thousands of running jobs, but can result in the
                     expected start time of pending jobs being gradually
                     deferred due to lack of precision. A value for
                     bf_window_linear will cause the time interval to be
                     increased by a constant amount on each iteration.
                     The value is specified in units of seconds. For
                     example, a value of 60 will cause the backfill
                     scheduler on the first iteration to identify the job
                     ending soonest and determine if the pending job can
                     be started after that job plus all other jobs
                     expected to end within 30 seconds (default initial
                     value) of the first job. On the next iteration, the
                     pending job will be evaluated for starting after the
                     next job expected to end plus all jobs ending within
                     90 seconds of that time (30 second default, plus the
                     60 second option value). The third iteration will
                     have a 150 second window and the fourth 210 seconds.
                     Without this option, the time windows will double on
                     each iteration and thus be 30, 60, 120, 240 seconds,
                     etc. The use of bf_window_linear is not recommended
                     with more than a few hundred simultaneously
                     executing jobs.

              bf_yield_interval=#
                     The backfill scheduler will periodically relinquish
                     locks in order for other pending operations to take
                     place. This specifies the interval, in
                     microseconds, between lock releases. Smaller values
                     may be helpful for high throughput computing when
                     used in conjunction with the bf_continue option.
                     Also see the bf_yield_sleep option. Default:
                     2,000,000 (2 sec), Min: 1, Max: 10,000,000 (10 sec).

              bf_yield_sleep=#
                     The backfill scheduler will periodically relinquish
                     locks in order for other pending operations to take
                     place. This specifies the length of time, in
                     microseconds, for which the locks are relinquished.
                     Also see the bf_yield_interval option. Default:
                     500,000 (0.5 sec), Min: 1, Max: 10,000,000 (10 sec).

              build_queue_timeout=#
                     Defines the maximum time that can be devoted to
                     building a queue of jobs to be tested for
                     scheduling. If the system has a huge number of jobs
                     with dependencies, just building the job queue can
                     take so much time as to adversely impact overall
                     system performance and this parameter can be
                     adjusted as needed. The default value is 2,000,000
                     microseconds (2 seconds).

              default_queue_depth=#
                     The default number of jobs to attempt scheduling
                     (i.e. the queue depth) when a running job completes
                     or other routine actions occur; however, the
                     frequency with which the scheduler is run may be
                     limited by using the defer or sched_min_interval
                     parameters described below. The full queue will be
                     tested on a less frequent basis as defined by the
                     sched_interval option described below. The default
                     value is 100. See the partition_job_depth option to
                     limit depth by partition.

              defer  Setting this option will avoid attempting to
                     schedule each job individually at job submit time,
                     but defer it until a later time when scheduling
                     multiple jobs simultaneously may be possible. This
                     option may improve system responsiveness when large
                     numbers of jobs (many hundreds) are submitted at the
                     same time, but it will delay the initiation time of
                     individual jobs. Also see default_queue_depth
                     above.

              delay_boot=#
                     Do not reboot nodes in order to satisfy this job's
                     feature specification if the job has been eligible
                     to run for less than this time period. If the job
                     has waited for less than the specified period, it
                     will use only nodes which already have the specified
                     features. The argument is in units of minutes.
                     Individual jobs may override this default value with
                     the --delay-boot option.

              default_gbytes
                     The default units in job submission memory and
                     temporary disk size specifications will be gigabytes
                     rather than megabytes. Users can override the
                     default by using a suffix of "M" for megabytes.

              disable_job_shrink
                     Deny user requests to shrink the size of running
                     jobs. (However, running jobs may still shrink due
                     to node failure if the --no-kill option was set.)

              disable_hetero_steps
                     Disable job steps that span heterogeneous job
                     allocations. This is the default on Cray systems.

              enable_hetero_steps
                     Enable job steps that span heterogeneous job
                     allocations. This is the default except on Cray
                     systems.

              enable_user_top
                     Enable use of the "scontrol top" command by
                     non-privileged users.

              Ignore_NUMA
                     Some processors (e.g. AMD Opteron 6000 series)
                     contain multiple NUMA nodes per socket. This is a
                     configuration which does not map into the hardware
                     entities that Slurm optimizes resource allocation
                     for (PU/thread, core, socket, baseboard, node and
                     network switch). In order to optimize resource
                     allocations on such hardware, Slurm will consider
                     each NUMA node within the socket as a separate
                     socket by default. Use the Ignore_NUMA option to
                     report the correct socket count, but not optimize
                     resource allocations on the NUMA nodes.

              inventory_interval=#
                     On a Cray system using Slurm on top of ALPS this
                     limits the number of times a Basil Inventory call is
                     made. Normally this call happens on every
                     scheduling consideration to attempt to close a node
                     state change window with respect to what ALPS has.
                     This call is rather slow, so making it less
                     frequently improves performance dramatically, but in
                     the situation where a node changes state the window
                     is as large as this setting. In an HTC environment
                     this setting is a must and we advise around 10
                     seconds.

              kill_invalid_depend
                     If a job has an invalid dependency and can never
                     run, terminate it and set its state to
                     JOB_CANCELLED. By default the job stays pending
                     with reason DependencyNeverSatisfied.

              max_array_tasks
                     Specify the maximum number of tasks that can be
                     included in a job array. The default limit is
                     MaxArraySize, but this option can be used to set a
                     lower limit. For example, max_array_tasks=1000 and
                     MaxArraySize=100001 would permit a maximum task ID
                     of 100000, but limit the number of tasks in any
                     single job array to 1000.

              max_depend_depth=#
                     Maximum number of jobs to test for a circular job
                     dependency. Stop testing after this number of job
                     dependencies have been tested. The default value is
                     10 jobs.

              max_rpc_cnt=#
                     If the number of active threads in the slurmctld
                     daemon is equal to or larger than this value, defer
                     scheduling of jobs. The scheduler will check this
                     condition at certain points in code and yield locks
                     if necessary. This can improve Slurm's ability to
                     process requests at a cost of initiating new jobs
                     less frequently. NOTE: The maximum number of
                     threads (MAX_SERVER_THREADS) is internally set to
                     256 and defines the number of served RPCs at a given
                     time. Setting max_rpc_cnt to more than 256 will
                     only be useful to let backfill continue scheduling
                     work after locks have been yielded (i.e. every 2
                     seconds) if there are no more than
                     MAX(max_rpc_cnt/10, 20) RPCs in the queue. For
                     example, with max_rpc_cnt=1000 the scheduler will be
                     allowed to continue after yielding locks only when
                     there are 100 or fewer pending RPCs. Default: 0
                     (option disabled), Min: 0, Max: 1000. If a value is
                     set, then a value of 10 or higher is recommended.
                     It may require some tuning for each system, but
                     needs to be high enough that scheduling isn't always
                     disabled, and low enough that requests can get
                     through in a reasonable period of time.
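
                     For example, to defer scheduling once the RPC
                     backlog grows, using the mid-range value suggested
                     under bf_max_time above:

                     ```
                     SchedulerParameters=max_rpc_cnt=150
                     ```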

              max_sched_time=#
                     How long, in seconds, that the main scheduling loop
                     will execute for before exiting. If a value is
                     configured, be aware that all other Slurm operations
                     will be deferred during this time period. Make
                     certain the value is lower than MessageTimeout. If
                     a value is not explicitly configured, the default
                     value is half of MessageTimeout with a minimum
                     default value of 1 second and a maximum default
                     value of 2 seconds. For example if
                     MessageTimeout=10, the time limit will be 2 seconds
                     (i.e. MIN(10/2, 2) = 2).

              max_script_size=#
                     Specify the maximum size of a batch script, in
                     bytes. The default value is 4 megabytes. Larger
                     values may adversely impact system performance.

              max_switch_wait=#
                     Maximum number of seconds that a job can delay
                     execution waiting for the specified desired switch
                     count. The default value is 300 seconds.
2939
2940 no_backup_scheduling
2941 If used, the backup controller will not schedule jobs
2942 when it takes over. The backup controller will allow jobs
2943 to be submitted, modified and cancelled but won't sched‐
2944 ule new jobs. This is useful in Cray environments when
2945 the backup controller resides on an external Cray node.
2946 A restart is required to alter this option. This is
2947 explicitly set on a Cray/ALPS system.
2948
2949 no_env_cache
2950                     If used, a job started on a node that fails to load the
2951                     environment will fail instead of using the cached envi‐
2952                     ronment.  This also implicitly sets the requeue_set‐
2953                     up_env_fail option.
2954
2955 nohold_on_prolog_fail
2956 By default, if the Prolog exits with a non-zero value the
2957 job is requeued in a held state. By specifying this
2958 parameter the job will be requeued but not held so that
2959 the scheduler can dispatch it to another host.
2960
2961 pack_serial_at_end
2962 If used with the select/cons_res or select/cons_tres
2963 plugin, then put serial jobs at the end of the available
2964 nodes rather than using a best fit algorithm. This may
2965 reduce resource fragmentation for some workloads.
2966
2967 partition_job_depth=#
2968 The default number of jobs to attempt scheduling (i.e.
2969 the queue depth) from each partition/queue in Slurm's
2970 main scheduling logic. The functionality is similar to
2971 that provided by the bf_max_job_part option for the back‐
2972 fill scheduling logic. The default value is 0 (no
2973                     limit).  Jobs excluded from attempted scheduling based
2974 upon partition will not be counted against the
2975 default_queue_depth limit. Also see the bf_max_job_part
2976 option.
2977
2978 permit_job_expansion
2979 Allow running jobs to request additional nodes be merged
2980 in with the current job allocation.
2981
2982 preempt_reorder_count=#
2983                     Specify how many attempts should be made in reordering
2984                     preemptable jobs to minimize the count of jobs preempted.
2985 The default value is 1. High values may adversely impact
2986 performance. The logic to support this option is only
2987 available in the select/cons_res and select/cons_tres
2988 plugins.
2989
2990 preempt_strict_order
2991 If set, then execute extra logic in an attempt to preempt
2992 only the lowest priority jobs. It may be desirable to
2993 set this configuration parameter when there are multiple
2994 priorities of preemptable jobs. The logic to support
2995 this option is only available in the select/cons_res and
2996 select/cons_tres plugins.
2997
2998 preempt_youngest_first
2999 If set, then the preemption sorting algorithm will be
3000 changed to sort by the job start times to favor preempt‐
3001 ing younger jobs over older. (Requires preempt/parti‐
3002 tion_prio or preempt/qos plugins.)
3003
3004 reduce_completing_frag
3005 This option is used to control how scheduling of
3006 resources is performed when jobs are in completing state,
3007 which influences potential fragmentation. If the option
3008 is not set then no jobs will be started in any partition
3009 when any job is in completing state. If the option is
3010 set then no jobs will be started in any individual parti‐
3011 tion that has a job in completing state. In addition, no
3012 jobs will be started in any partition with nodes that
3013 overlap with any nodes in the partition of the completing
3014 job. This option is to be used in conjunction with Com‐
3015 pleteWait. NOTE: CompleteWait must be set for this to
3016 work.
3017
3018 requeue_setup_env_fail
3019                     By default, if a job's environment setup fails, the job keeps
3020 running with a limited environment. By specifying this
3021 parameter the job will be requeued in held state and the
3022 execution node drained.
3023
3024 salloc_wait_nodes
3025 If defined, the salloc command will wait until all allo‐
3026 cated nodes are ready for use (i.e. booted) before the
3027 command returns. By default, salloc will return as soon
3028 as the resource allocation has been made.
3029
3030 sbatch_wait_nodes
3031 If defined, the sbatch script will wait until all allo‐
3032 cated nodes are ready for use (i.e. booted) before the
3033 initiation. By default, the sbatch script will be initi‐
3034 ated as soon as the first node in the job allocation is
3035 ready. The sbatch command can use the --wait-all-nodes
3036 option to override this configuration parameter.
3037
3038 sched_interval=#
3039 How frequently, in seconds, the main scheduling loop will
3040 execute and test all pending jobs. The default value is
3041 60 seconds.
3042
3043 sched_max_job_start=#
3044 The maximum number of jobs that the main scheduling logic
3045 will start in any single execution. The default value is
3046 zero, which imposes no limit.
3047
3048 sched_min_interval=#
3049 How frequently, in microseconds, the main scheduling loop
3050 will execute and test any pending jobs. The scheduler
3051 runs in a limited fashion every time that any event hap‐
3052 pens which could enable a job to start (e.g. job submit,
3053 job terminate, etc.). If these events happen at a high
3054 frequency, the scheduler can run very frequently and con‐
3055 sume significant resources if not throttled by this
3056 option. This option specifies the minimum time between
3057 the end of one scheduling cycle and the beginning of the
3058 next scheduling cycle. A value of zero will disable
3059 throttling of the scheduling logic interval. The default
3060 value is 1,000,000 microseconds on Cray/ALPS systems and
3061 2 microseconds on other systems.
3062
3063 spec_cores_first
3064 Specialized cores will be selected from the first cores
3065 of the first sockets, cycling through the sockets on a
3066 round robin basis. By default, specialized cores will be
3067 selected from the last cores of the last sockets, cycling
3068 through the sockets on a round robin basis.
3069
3070 step_retry_count=#
3071 When a step completes and there are steps ending resource
3072 allocation, then retry step allocations for at least this
3073 number of pending steps. Also see step_retry_time. The
3074 default value is 8 steps.
3075
3076 step_retry_time=#
3077 When a step completes and there are steps ending resource
3078 allocation, then retry step allocations for all steps
3079 which have been pending for at least this number of sec‐
3080 onds. Also see step_retry_count. The default value is
3081 60 seconds.
3082
3083 whole_hetjob
3084 Requests to cancel, hold or release any component of a
3085 heterogeneous job will be applied to all components of
3086 the job.
3087
3088               NOTE: This option was previously named whole_pack, which
3089               is still supported for backward compatibility.
3090
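               The sub-options above are combined into a single comma-sepa‐
               rated SchedulerParameters value in slurm.conf.  A hypothetical
               fragment follows; the values shown are illustrative, not
               defaults, and should be tuned per site:

```
# Illustrative SchedulerParameters line combining several sub-options
SchedulerParameters=max_rpc_cnt=150,sched_interval=30,partition_job_depth=100,pack_serial_at_end
```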
3091
3092 SchedulerTimeSlice
3093 Number of seconds in each time slice when gang scheduling is
3094 enabled (PreemptMode=SUSPEND,GANG). The value must be between 5
3095 seconds and 65533 seconds. The default value is 30 seconds.
3096
3097
3098 SchedulerType
3099 Identifies the type of scheduler to be used. Note the slurmctld
3100 daemon must be restarted for a change in scheduler type to
3101 become effective (reconfiguring a running daemon has no effect
3102 for this parameter). The scontrol command can be used to manu‐
3103 ally change job priorities if desired. Acceptable values
3104 include:
3105
3106 sched/backfill
3107 For a backfill scheduling module to augment the default
3108 FIFO scheduling. Backfill scheduling will initiate
3109 lower-priority jobs if doing so does not delay the
3110 expected initiation time of any higher priority job.
3111 Effectiveness of backfill scheduling is dependent upon
3112 users specifying job time limits, otherwise all jobs will
3113 have the same time limit and backfilling is impossible.
3114                     See the documentation for the SchedulerParameters option
3115 above. This is the default configuration.
3116
3117 sched/builtin
3118 This is the FIFO scheduler which initiates jobs in prior‐
3119 ity order. If any job in the partition can not be sched‐
3120 uled, no lower priority job in that partition will be
3121 scheduled. An exception is made for jobs that can not
3122 run due to partition constraints (e.g. the time limit) or
3123 down/drained nodes. In that case, lower priority jobs
3124 can be initiated and not impact the higher priority job.
3125
3126 sched/hold
3127                     Hold all newly arriving jobs if the file
3128                     "/etc/slurm.hold" exists; otherwise use the built-in
3129                     FIFO scheduler.
3130
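               For example, a minimal configuration selecting the default
               backfill scheduler:

```
SchedulerType=sched/backfill
```

               As noted above, changing this value requires a restart of the
               slurmctld daemon.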
3131
3132 SelectType
3133 Identifies the type of resource selection algorithm to be used.
3134 Changing this value can only be done by restarting the slurmctld
3135 daemon and will result in the loss of all job information (run‐
3136 ning and pending) since the job state save format used by each
3137               plugin is different.  Acceptable values include:
3138
3139 select/cons_res
3140 The resources (cores and memory) within a node are indi‐
3141 vidually allocated as consumable resources. Note that
3142 whole nodes can be allocated to jobs for selected parti‐
3143 tions by using the OverSubscribe=Exclusive option. See
3144 the partition OverSubscribe parameter for more informa‐
3145 tion.
3146
3147 select/cray_aries
3148 for a Cray system. The default value is
3149 "select/cray_aries" for all Cray systems.
3150
3151 select/linear
3152 for allocation of entire nodes assuming a one-dimensional
3153 array of nodes in which sequentially ordered nodes are
3154 preferable. For a heterogeneous cluster (e.g. different
3155 CPU counts on the various nodes), resource allocations
3156 will favor nodes with high CPU counts as needed based
3157 upon the job's node and CPU specification if TopologyPlu‐
3158 gin=topology/none is configured. Use of other topology
3159 plugins with select/linear and heterogeneous nodes is not
3160 recommended and may result in valid job allocation
3161 requests being rejected. This is the default value.
3162
3163 select/cons_tres
3164 The resources (cores, memory, GPUs and all other track‐
3165 able resources) within a node are individually allocated
3166 as consumable resources. Note that whole nodes can be
3167 allocated to jobs for selected partitions by using the
3168 OverSubscribe=Exclusive option. See the partition Over‐
3169 Subscribe parameter for more information.
3170
3171
3172 SelectTypeParameters
3173 The permitted values of SelectTypeParameters depend upon the
3174 configured value of SelectType. The only supported options for
3175 SelectType=select/linear are CR_ONE_TASK_PER_CORE and CR_Memory,
3176 which treats memory as a consumable resource and prevents memory
3177 over subscription with job preemption or gang scheduling. By
3178 default SelectType=select/linear allocates whole nodes to jobs
3179 without considering their memory consumption. By default
3180 SelectType=select/cons_res, SelectType=select/cray_aries, and
3181 SelectType=select/cons_tres, use CR_CPU, which allocates CPU
3182 (threads) to jobs without considering their memory consumption.
3183
3184 The following options are supported for Select‐
3185 Type=select/cray_aries:
3186
3187 OTHER_CONS_RES
3188 Layer the select/cons_res plugin under the
3189 select/cray_aries plugin, the default is to layer
3190 on select/linear. This also allows all the
3191 options available for SelectType=select/cons_res.
3192
3193 OTHER_CONS_TRES
3194 Layer the select/cons_tres plugin under the
3195 select/cray_aries plugin, the default is to layer
3196 on select/linear. This also allows all the
3197 options available for SelectType=select/cons_tres.
3198
3199 The following options are supported by the Select‐
3200 Type=select/cons_res and SelectType=select/cons_tres plugins:
3201
3202 CR_CPU CPUs are consumable resources. Configure the num‐
3203 ber of CPUs on each node, which may be equal to
3204 the count of cores or hyper-threads on the node
3205 depending upon the desired minimum resource allo‐
3206 cation. The node's Boards, Sockets, CoresPer‐
3207 Socket and ThreadsPerCore may optionally be con‐
3208 figured and result in job allocations which have
3209                             improved locality; however, doing so will prevent
3210                             more than one job from being allocated on each
3211                             core.
3212
3213 CR_CPU_Memory
3214 CPUs and memory are consumable resources. Config‐
3215 ure the number of CPUs on each node, which may be
3216 equal to the count of cores or hyper-threads on
3217 the node depending upon the desired minimum
3218 resource allocation. The node's Boards, Sockets,
3219 CoresPerSocket and ThreadsPerCore may optionally
3220                             have improved locality; however, doing so will
3221                             prevent more than one job from being allocated
3222                             on each core.  Setting a value for DefMemPerCPU is
3223 on each core. Setting a value for DefMemPerCPU is
3224 strongly recommended.
3225
3226 CR_Core
3227 Cores are consumable resources. On nodes with
3228 hyper-threads, each thread is counted as a CPU to
3229 satisfy a job's resource requirement, but multiple
3230 jobs are not allocated threads on the same core.
3231 The count of CPUs allocated to a job may be
3232 rounded up to account for every CPU on an allo‐
3233 cated core.
3234
3235 CR_Core_Memory
3236 Cores and memory are consumable resources. On
3237 nodes with hyper-threads, each thread is counted
3238 as a CPU to satisfy a job's resource requirement,
3239 but multiple jobs are not allocated threads on the
3240 same core. The count of CPUs allocated to a job
3241 may be rounded up to account for every CPU on an
3242 allocated core. Setting a value for DefMemPerCPU
3243 is strongly recommended.
3244
3245 CR_ONE_TASK_PER_CORE
3246 Allocate one task per core by default. Without
3247 this option, by default one task will be allocated
3248 per thread on nodes with more than one ThreadsPer‐
3249 Core configured. NOTE: This option cannot be used
3250 with CR_CPU*.
3251
3252 CR_CORE_DEFAULT_DIST_BLOCK
3253 Allocate cores within a node using block distribu‐
3254 tion by default. This is a pseudo-best-fit algo‐
3255 rithm that minimizes the number of boards and min‐
3256 imizes the number of sockets (within minimum
3257 boards) used for the allocation. This default
3258                             behavior can be overridden by specifying a par‐
3259                             ticular "-m" parameter with srun/salloc/sbatch.
3260                             Without this option, cores will be allocated cyclically
3261 across the sockets.
3262
3263 CR_LLN Schedule resources to jobs on the least loaded
3264 nodes (based upon the number of idle CPUs). This
3265 is generally only recommended for an environment
3266 with serial jobs as idle resources will tend to be
3267 highly fragmented, resulting in parallel jobs
3268 being distributed across many nodes. Note that
3269 node Weight takes precedence over how many idle
3270 resources are on each node. Also see the parti‐
3271                             tion configuration parameter LLN to use the least
3272 loaded nodes in selected partitions.
3273
3274 CR_Pack_Nodes
3275 If a job allocation contains more resources than
3276 will be used for launching tasks (e.g. if whole
3277 nodes are allocated to a job), then rather than
3278                             distributing a job's tasks evenly across its
3279 allocated nodes, pack them as tightly as possible
3280 on these nodes. For example, consider a job allo‐
3281 cation containing two entire nodes with eight CPUs
3282 each. If the job starts ten tasks across those
3283 two nodes without this option, it will start five
3284 tasks on each of the two nodes. With this option,
3285 eight tasks will be started on the first node and
3286 two tasks on the second node.
3287
3288 CR_Socket
3289 Sockets are consumable resources. On nodes with
3290 multiple cores, each core or thread is counted as
3291 a CPU to satisfy a job's resource requirement, but
3292 multiple jobs are not allocated resources on the
3293 same socket.
3294
3295 CR_Socket_Memory
3296 Memory and sockets are consumable resources. On
3297 nodes with multiple cores, each core or thread is
3298 counted as a CPU to satisfy a job's resource
3299 requirement, but multiple jobs are not allocated
3300 resources on the same socket. Setting a value for
3301 DefMemPerCPU is strongly recommended.
3302
3303 CR_Memory
3304 Memory is a consumable resource. NOTE: This
3305 implies OverSubscribe=YES or OverSubscribe=FORCE
3306 for all partitions. Setting a value for DefMem‐
3307 PerCPU is strongly recommended.
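
               As a sketch, a cluster that allocates cores and memory indi‐
               vidually might combine the parameters above as follows (the
               DefMemPerCPU value is illustrative only):

```
# Hypothetical example: consumable cores + memory with a default memory per CPU
SelectType=select/cons_tres
SelectTypeParameters=CR_Core_Memory
DefMemPerCPU=2048
```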
3308
3309
3310 SlurmUser
3311 The name of the user that the slurmctld daemon executes as. For
3312 security purposes, a user other than "root" is recommended.
3313 This user must exist on all nodes of the cluster for authentica‐
3314 tion of communications between Slurm components. The default
3315 value is "root".
3316
3317
3318 SlurmdParameters
3319 Parameters specific to the Slurmd. Multiple options may be
3320 comma separated.
3321
3322 config_overrides
3323 If set, consider the configuration of each node to be
3324 that specified in the slurm.conf configuration file and
3325                     any node with fewer resources than configured will not
3326                     be set to DRAIN.  This option is generally only useful for
3327 testing purposes. Equivalent to the now deprecated
3328 FastSchedule=2 option.
3329
3330 shutdown_on_reboot
3331 If set, the Slurmd will shut itself down when a reboot
3332 request is received.
3333
3334
3335 SlurmdUser
3336 The name of the user that the slurmd daemon executes as. This
3337 user must exist on all nodes of the cluster for authentication
3338 of communications between Slurm components. The default value
3339 is "root".
3340
3341
3342 SlurmctldAddr
3343 An optional address to be used for communications to the cur‐
3344 rently active slurmctld daemon, normally used with Virtual IP
3345 addressing of the currently active server. If this parameter is
3346 not specified then each primary and backup server will have its
3347 own unique address used for communications as specified in the
3348 SlurmctldHost parameter. If this parameter is specified then
3349 the SlurmctldHost parameter will still be used for communica‐
3350 tions to specific slurmctld primary or backup servers, for exam‐
3351 ple to cause all of them to read the current configuration files
3352 or shutdown. Also see the SlurmctldPrimaryOffProg and Slurm‐
3353 ctldPrimaryOnProg configuration parameters to configure programs
3354               that manage the virtual IP address.
3355
3356
3357 SlurmctldDebug
3358               The level of detail to provide in the slurmctld daemon's logs.
3359               The default value is info.  If the slurmctld daemon is initi‐
3360               ated with the -v or --verbose options, that debug level will be
3361               preserved or restored upon reconfiguration.
3362
3363
3364 quiet Log nothing
3365
3366 fatal Log only fatal errors
3367
3368 error Log only errors
3369
3370 info Log errors and general informational messages
3371
3372 verbose Log errors and verbose informational messages
3373
3374 debug Log errors and verbose informational messages and
3375 debugging messages
3376
3377 debug2 Log errors and verbose informational messages and more
3378 debugging messages
3379
3380 debug3 Log errors and verbose informational messages and even
3381 more debugging messages
3382
3383 debug4 Log errors and verbose informational messages and even
3384 more debugging messages
3385
3386 debug5 Log errors and verbose informational messages and even
3387 more debugging messages
3388
3389
3390 SlurmctldHost
3391 The short, or long, hostname of the machine where Slurm control
3392 daemon is executed (i.e. the name returned by the command "host‐
3393 name -s"). This hostname is optionally followed by the address,
3394 either the IP address or a name by which the address can be
3395               identified, enclosed in parentheses (e.g. SlurmctldHost=mas‐
3396 ter1(12.34.56.78)). This value must be specified at least once.
3397 If specified more than once, the first hostname named will be
3398 where the daemon runs. If the first specified host fails, the
3399 daemon will execute on the second host. If both the first and
3400               second specified hosts fail, the daemon will execute on the
3401 third host.
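
               A hypothetical primary/backup arrangement using the syntax
               described above (the hostnames and address are illustrative):

```
# First entry is the primary controller; later entries are backups
SlurmctldHost=master1(12.34.56.78)
SlurmctldHost=backup1
SlurmctldHost=backup2
```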
3402
3403
3404 SlurmctldLogFile
3405 Fully qualified pathname of a file into which the slurmctld dae‐
3406 mon's logs are written. The default value is none (performs
3407 logging via syslog).
3408 See the section LOGGING if a pathname is specified.
3409
3410
3411 SlurmctldParameters
3412 Multiple options may be comma-separated.
3413
3414
3415 allow_user_triggers
3416 Permit setting triggers from non-root/slurm_user users.
3417 SlurmUser must also be set to root to permit these trig‐
3418 gers to work. See the strigger man page for additional
3419 details.
3420
3421 cloud_dns
3422 By default, Slurm expects that the network address for a
3423 cloud node won't be known until the creation of the node
3424 and that Slurm will be notified of the node's address
3425 (e.g. scontrol update nodename=<name> nodeaddr=<addr>).
3426 Since Slurm communications rely on the node configuration
3427 found in the slurm.conf, Slurm will tell the client com‐
3428 mand, after waiting for all nodes to boot, each node's ip
3429 address. However, in environments where the nodes are in
3430 DNS, this step can be avoided by configuring this option.
3431
3432               idle_on_node_suspend
3433                     Mark nodes as idle, regardless of current state, when
3434                     suspending nodes with SuspendProgram so that nodes
3435                     will be eligible to be resumed at a later time.
3436
3437               preempt_send_user_signal
3438                     Send the user signal (e.g. --signal=<sig_num>) at pre‐
3439                     emption time even if the signal time hasn't been
3440                     reached.  In the case of a gracetime preemption, the
3441                     user signal will be sent if it has been specified and
3442                     not yet sent; otherwise a SIGTERM will be sent to the
3443                     tasks.
3444
3445               reboot_from_controller
3446                     Run the RebootProgram from the controller instead of
3447                     on the slurmds.  The RebootProgram will be passed a
3448                     comma-separated list of nodes to reboot.
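
               A hypothetical example combining two of the options above:

```
# Nodes are resolvable in DNS; suspended nodes are marked idle
SlurmctldParameters=cloud_dns,idle_on_node_suspend
```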
3448
3449
3450 SlurmctldPidFile
3451 Fully qualified pathname of a file into which the slurmctld
3452 daemon may write its process id. This may be used for automated
3453 signal processing. The default value is "/var/run/slurm‐
3454 ctld.pid".
3455
3456
3457 SlurmctldPlugstack
3458 A comma delimited list of Slurm controller plugins to be started
3459 when the daemon begins and terminated when it ends. Only the
3460 plugin's init and fini functions are called.
3461
3462
3463 SlurmctldPort
3464 The port number that the Slurm controller, slurmctld, listens to
3465 for work. The default value is SLURMCTLD_PORT as established at
3466 system build time. If none is explicitly specified, it will be
3467 set to 6817. SlurmctldPort may also be configured to support a
3468 range of port numbers in order to accept larger bursts of incom‐
3469 ing messages by specifying two numbers separated by a dash (e.g.
3470               SlurmctldPort=6817-6818).  NOTE: Either the slurmctld and
3471               slurmd daemons must not execute on the same nodes, or the
3472               values of SlurmctldPort and SlurmdPort must be different.
3473
3474 Note: On Cray systems, Realm-Specific IP Addressing (RSIP) will
3475 automatically try to interact with anything opened on ports
3476 8192-60000. Configure SlurmctldPort to use a port outside of
3477 the configured SrunPortRange and RSIP's port range.
3478
3479
3480 SlurmctldPrimaryOffProg
3481 This program is executed when a slurmctld daemon running as the
3482 primary server becomes a backup server. By default no program is
3483 executed. See also the related "SlurmctldPrimaryOnProg" parame‐
3484 ter.
3485
3486
3487 SlurmctldPrimaryOnProg
3488 This program is executed when a slurmctld daemon running as a
3489 backup server becomes the primary server. By default no program
is executed.  When using virtual IP addresses to manage Highly
3491 Available Slurm services, this program can be used to add the IP
3492 address to an interface (and optionally try to kill the unre‐
3493 sponsive slurmctld daemon and flush the ARP caches on nodes on
3494 the local ethernet fabric). See also the related "SlurmctldPri‐
3495 maryOffProg" parameter.
3496
3497 SlurmctldSyslogDebug
3498               The slurmctld daemon will log events to the syslog file at the
3499               specified level of detail.  If not set, the slurmctld daemon
3500               will log to syslog at level fatal.  There are two exceptions:
3501               if there is no SlurmctldLogFile and the daemon is running in
3502               the background, it will log to syslog at the level specified by
3503               SlurmctldDebug (at fatal if SlurmctldDebug is set to quiet);
3504               if it is run in the foreground, the level will be set to quiet.
3505
3506
3507 quiet Log nothing
3508
3509 fatal Log only fatal errors
3510
3511 error Log only errors
3512
3513 info Log errors and general informational messages
3514
3515 verbose Log errors and verbose informational messages
3516
3517 debug Log errors and verbose informational messages and
3518 debugging messages
3519
3520 debug2 Log errors and verbose informational messages and more
3521 debugging messages
3522
3523 debug3 Log errors and verbose informational messages and even
3524 more debugging messages
3525
3526 debug4 Log errors and verbose informational messages and even
3527 more debugging messages
3528
3529 debug5 Log errors and verbose informational messages and even
3530 more debugging messages
3531
3532
3533
3534 SlurmctldTimeout
3535 The interval, in seconds, that the backup controller waits for
3536 the primary controller to respond before assuming control. The
3537 default value is 120 seconds. May not exceed 65533.
3538
3539
3540 SlurmdDebug
3541               The level of detail to provide in the slurmd daemon's logs.
3542               The default value is info.
3543
3544 quiet Log nothing
3545
3546 fatal Log only fatal errors
3547
3548 error Log only errors
3549
3550 info Log errors and general informational messages
3551
3552 verbose Log errors and verbose informational messages
3553
3554 debug Log errors and verbose informational messages and
3555 debugging messages
3556
3557 debug2 Log errors and verbose informational messages and more
3558 debugging messages
3559
3560 debug3 Log errors and verbose informational messages and even
3561 more debugging messages
3562
3563 debug4 Log errors and verbose informational messages and even
3564 more debugging messages
3565
3566 debug5 Log errors and verbose informational messages and even
3567 more debugging messages
3568
3569
3570 SlurmdLogFile
3571 Fully qualified pathname of a file into which the slurmd dae‐
3572 mon's logs are written. The default value is none (performs
3573 logging via syslog). Any "%h" within the name is replaced with
3574 the hostname on which the slurmd is running. Any "%n" within
3575 the name is replaced with the Slurm node name on which the
3576 slurmd is running.
3577 See the section LOGGING if a pathname is specified.
3578
3579
3580 SlurmdPidFile
3581 Fully qualified pathname of a file into which the slurmd daemon
3582 may write its process id. This may be used for automated signal
3583 processing. Any "%h" within the name is replaced with the host‐
3584 name on which the slurmd is running. Any "%n" within the name
3585 is replaced with the Slurm node name on which the slurmd is run‐
3586 ning. The default value is "/var/run/slurmd.pid".
3587
3588
3589 SlurmdPort
3590 The port number that the Slurm compute node daemon, slurmd, lis‐
3591 tens to for work. The default value is SLURMD_PORT as estab‐
3592 lished at system build time. If none is explicitly specified,
3593 its value will be 6818. NOTE: Either slurmctld and slurmd dae‐
3594 mons must not execute on the same nodes or the values of Slurm‐
3595 ctldPort and SlurmdPort must be different.
3596
3597 Note: On Cray systems, Realm-Specific IP Addressing (RSIP) will
3598 automatically try to interact with anything opened on ports
3599 8192-60000. Configure SlurmdPort to use a port outside of the
3600 configured SrunPortRange and RSIP's port range.
3601
3602
3603 SlurmdSpoolDir
3604 Fully qualified pathname of a directory into which the slurmd
3605 daemon's state information and batch job script information are
3606 written. This must be a common pathname for all nodes, but
3607 should represent a directory which is local to each node (refer‐
3608 ence a local file system). The default value is
3609 "/var/spool/slurmd". Any "%h" within the name is replaced with
3610 the hostname on which the slurmd is running. Any "%n" within
3611 the name is replaced with the Slurm node name on which the
3612 slurmd is running.
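
               As an illustration of the "%n" substitution, a host running
               multiple slurmd daemons could give each node name its own
               local spool directory (the expanded names are hypothetical):

```
# Expands per node name, e.g. /var/spool/slurmd.node001, /var/spool/slurmd.node002
SlurmdSpoolDir=/var/spool/slurmd.%n
```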
3613
3614
3615 SlurmdSyslogDebug
3616 The slurmd daemon will log events to the syslog file at the
3617               specified level of detail.  If not set, the slurmd daemon will
3618               log to syslog at level fatal.  There are two exceptions: if
3619               there is no SlurmdLogFile and the daemon is running in the
3620               background, it will log to syslog at the level specified by
3621               SlurmdDebug (at fatal if SlurmdDebug is set to quiet); if it
3622               is run in the foreground, the level will be set to quiet.
3623
3624
3625 quiet Log nothing
3626
3627 fatal Log only fatal errors
3628
3629 error Log only errors
3630
3631 info Log errors and general informational messages
3632
3633 verbose Log errors and verbose informational messages
3634
3635 debug Log errors and verbose informational messages and
3636 debugging messages
3637
3638 debug2 Log errors and verbose informational messages and more
3639 debugging messages
3640
3641 debug3 Log errors and verbose informational messages and even
3642 more debugging messages
3643
3644 debug4 Log errors and verbose informational messages and even
3645 more debugging messages
3646
3647 debug5 Log errors and verbose informational messages and even
3648 more debugging messages
3649
3650
3651 SlurmdTimeout
3652 The interval, in seconds, that the Slurm controller waits for
3653 slurmd to respond before configuring that node's state to DOWN.
3654 A value of zero indicates the node will not be tested by slurm‐
3655 ctld to confirm the state of slurmd, the node will not be auto‐
3656 matically set to a DOWN state indicating a non-responsive
3657 slurmd, and some other tool will take responsibility for moni‐
3658 toring the state of each compute node and its slurmd daemon.
3659 Slurm's hierarchical communication mechanism is used to ping the
3660 slurmd daemons in order to minimize system noise and overhead.
3661 The default value is 300 seconds. The value may not exceed
3662 65533 seconds.
3663
3664
3665 SlurmSchedLogFile
3666 Fully qualified pathname of the scheduling event logging file.
3667 The syntax of this parameter is the same as for SlurmctldLog‐
3668 File. In order to configure scheduler logging, set both the
3669 SlurmSchedLogFile and SlurmSchedLogLevel parameters.
3670
3671
3672 SlurmSchedLogLevel
3673 The initial level of scheduling event logging, similar to the
3674 SlurmctldDebug parameter used to control the initial level of
3675 slurmctld logging. Valid values for SlurmSchedLogLevel are "0"
3676 (scheduler logging disabled) and "1" (scheduler logging
3677 enabled). If this parameter is omitted, the value defaults to
3678 "0" (disabled). In order to configure scheduler logging, set
3679 both the SlurmSchedLogFile and SlurmSchedLogLevel parameters.
3680 The scheduler logging level can be changed dynamically using
3681 scontrol.
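
               As noted above, both parameters must be set to enable sched‐
               uler logging; a hypothetical fragment (the path is illustra‐
               tive):

```
SlurmSchedLogFile=/var/log/slurm/sched.log
SlurmSchedLogLevel=1
```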
3682
3683
3684 SrunEpilog
3685 Fully qualified pathname of an executable to be run by srun fol‐
3686 lowing the completion of a job step. The command line arguments
3687 for the executable will be the command and arguments of the job
3688 step. This configuration parameter may be overridden by srun's
3689 --epilog parameter. Note that while the other "Epilog" executa‐
3690 bles (e.g., TaskEpilog) are run by slurmd on the compute nodes
3691 where the tasks are executed, the SrunEpilog runs on the node
3692 where the "srun" is executing.
3693
3694
3695 SrunPortRange
3697              srun creates a set of listening ports to communicate with
3698              the controller and the slurmstepd daemons, and to handle
3699              application I/O.  By default these ports are ephemeral,
3700              meaning the port numbers are selected by the kernel.  This
3701              parameter allows sites to configure the range of ports from
3702              which srun ports will be selected.  This is useful if sites
3703              want to allow only a certain port range on their network.
3703
3704 Note: On Cray systems, Realm-Specific IP Addressing (RSIP) will
3705 automatically try to interact with anything opened on ports
3706 8192-60000. Configure SrunPortRange to use a range of ports
3707 above those used by RSIP, ideally 1000 or more ports, for exam‐
3708 ple "SrunPortRange=60001-63000".
3709
3710              Note: A sufficient number of ports must be configured based
3711              on the estimated number of concurrent srun commands on the
3712              submission nodes, considering that each srun opens 3
3713              listening ports plus 2 more for every 48 hosts.  Example:
3714
3715 srun -N 48 will use 5 listening ports.
3716
3717
3718 srun -N 50 will use 7 listening ports.
3719
3720
3721 srun -N 200 will use 13 listening ports.
3722
3723
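
              The rule above can be expressed as a small calculation.  The
              helper below is an illustrative sketch (not part of Slurm) for
              estimating how wide an SrunPortRange needs to be:

              ```python
              import math

              def srun_listening_ports(nhosts):
                  """Ports used by one srun: 3 base listening ports plus 2
                  more for every (possibly partial) group of 48 hosts."""
                  return 3 + 2 * math.ceil(nhosts / 48)

              # Matches the examples given in the text above.
              for n in (48, 50, 200):
                  print(n, srun_listening_ports(n))
              ```

              Multiply this by the expected number of simultaneous srun
              commands per submission node when sizing the range.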
3724 SrunProlog
3725 Fully qualified pathname of an executable to be run by srun
3726 prior to the launch of a job step. The command line arguments
3727 for the executable will be the command and arguments of the job
3728 step. This configuration parameter may be overridden by srun's
3729 --prolog parameter. Note that while the other "Prolog" executa‐
3730 bles (e.g., TaskProlog) are run by slurmd on the compute nodes
3731 where the tasks are executed, the SrunProlog runs on the node
3732 where the "srun" is executing.
3733
3734
3735 StateSaveLocation
3736 Fully qualified pathname of a directory into which the Slurm
3737 controller, slurmctld, saves its state (e.g.
3738              "/usr/local/slurm/checkpoint").  Slurm state will be saved here to
3739 recover from system failures. SlurmUser must be able to create
3740 files in this directory. If you have a BackupController config‐
3741 ured, this location should be readable and writable by both sys‐
3742 tems. Since all running and pending job information is stored
3743 here, the use of a reliable file system (e.g. RAID) is recom‐
3744 mended. The default value is "/var/spool". If any slurm dae‐
3745 mons terminate abnormally, their core files will also be written
3746 into this directory.
3747
3748
3749 SuspendExcNodes
3750 Specifies the nodes which are to not be placed in power save
3751 mode, even if the node remains idle for an extended period of
3752 time. Use Slurm's hostlist expression to identify nodes with an
3753 optional ":" separator and count of nodes to exclude from the
3754              preceding range.  For example "nid[10-20]:4" will prevent 4
3755              usable nodes (i.e. IDLE and not DOWN, DRAINING or already
3756              powered down) in the set "nid[10-20]" from being powered down.
3757              Multiple sets of nodes can be specified with or without counts
3758              in a comma separated list (e.g. "nid[10-20]:4,nid[80-90]:2").
3759              If a node count specification is given, any list of nodes
3760              without a node count must come after the last specification
3761              with a count.  For example "nid[10-20]:4,nid[60-70]" will
3762              exclude 4 nodes in the set "nid[10-20]" plus all nodes in the
3763              set "nid[60-70]", while "nid[1-3],nid[10-20]:4" will exclude 4
3764              nodes from the set "nid[1-3],nid[10-20]".  By default no nodes are excluded.
3765 Related configuration options include ResumeTimeout, ResumePro‐
3766 gram, ResumeRate, SuspendProgram, SuspendRate, SuspendTime, Sus‐
3767 pendTimeout, and SuspendExcParts.
3768
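
              For example, a slurm.conf fragment (node names are examples
              only) excluding some nodes from power saving might read:

              ```
              # Keep 4 usable nodes of nid[10-20], and all of nid[60-70],
              # out of power save mode:
              SuspendExcNodes=nid[10-20]:4,nid[60-70]
              ```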
3769
3770 SuspendExcParts
3771 Specifies the partitions whose nodes are to not be placed in
3772 power save mode, even if the node remains idle for an extended
3773 period of time. Multiple partitions can be identified and sepa‐
3774 rated by commas. By default no nodes are excluded. Related
3775 configuration options include ResumeTimeout, ResumeProgram,
3776              ResumeRate, SuspendProgram, SuspendRate, SuspendTime, Suspend‐
3777 Timeout, and SuspendExcNodes.
3778
3779
3780 SuspendProgram
3781 SuspendProgram is the program that will be executed when a node
3782 remains idle for an extended period of time. This program is
3783 expected to place the node into some power save mode. This can
3784 be used to reduce the frequency and voltage of a node or com‐
3785 pletely power the node off. The program executes as SlurmUser.
3786 The argument to the program will be the names of nodes to be
3787 placed into power savings mode (using Slurm's hostlist expres‐
3788 sion format). By default, no program is run. Related configu‐
3789 ration options include ResumeTimeout, ResumeProgram, ResumeRate,
3790 SuspendRate, SuspendTime, SuspendTimeout, SuspendExcNodes, and
3791 SuspendExcParts.
3792
3793
3794 SuspendRate
3795 The rate at which nodes are placed into power save mode by Sus‐
3796 pendProgram. The value is number of nodes per minute and it can
3797 be used to prevent a large drop in power consumption (e.g. after
3798 a large job completes). A value of zero results in no limits
3799 being imposed. The default value is 60 nodes per minute.
3800 Related configuration options include ResumeTimeout, ResumePro‐
3801 gram, ResumeRate, SuspendProgram, SuspendTime, SuspendTimeout,
3802 SuspendExcNodes, and SuspendExcParts.
3803
3804
3805 SuspendTime
3806 Nodes which remain idle or down for this number of seconds will
3807 be placed into power save mode by SuspendProgram. For efficient
3808 system utilization, it is recommended that the value of Suspend‐
3809 Time be at least as large as the sum of SuspendTimeout plus
3810 ResumeTimeout. A value of -1 disables power save mode and is
3811 the default. Related configuration options include ResumeTime‐
3812 out, ResumeProgram, ResumeRate, SuspendProgram, SuspendRate,
3813 SuspendTimeout, SuspendExcNodes, and SuspendExcParts.
3814
3815
3816 SuspendTimeout
3817 Maximum time permitted (in seconds) between when a node suspend
3818              request is issued and when the node is shut down.  At that time
3819 the node must be ready for a resume request to be issued as
3820 needed for new work. The default value is 30 seconds. Related
3821 configuration options include ResumeProgram, ResumeRate, Resume‐
3822 Timeout, SuspendRate, SuspendTime, SuspendProgram, SuspendExcN‐
3823 odes and SuspendExcParts. More information is available at the
3824 Slurm web site ( https://slurm.schedmd.com/power_save.html ).
3825
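
              Taken together, a minimal power-saving configuration might look
              like the sketch below.  The script paths and timings are
              illustrative assumptions, not defaults:

              ```
              SuspendProgram=/usr/local/sbin/slurm_suspend.sh   # example path
              ResumeProgram=/usr/local/sbin/slurm_resume.sh     # example path
              SuspendTime=600        # suspend nodes idle for 10 minutes
              SuspendTimeout=30
              ResumeTimeout=300
              SuspendRate=60
              ResumeRate=300
              SuspendExcParts=debug
              ```

              Note that SuspendTime (600) exceeds SuspendTimeout plus
              ResumeTimeout (330), as recommended above.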
3826
3827 SwitchType
3828 Identifies the type of switch or interconnect used for applica‐
3829              tion communications.  Acceptable values include
3830              "switch/cray_aries" for Cray systems and "switch/none" for
3831              switches not requiring special processing for job launch or
3832              termination (e.g. Ethernet and InfiniBand).  The default value
3833              is "switch/none".  All Slurm daemons, commands and running jobs
3834 must be restarted for a change in SwitchType to take effect. If
3835 running jobs exist at the time slurmctld is restarted with a new
3836 value of SwitchType, records of all jobs in any state may be
3837 lost.
3838
3839
3840 TaskEpilog
3841              Fully qualified pathname of a program to be executed as the slurm
3842 job's owner after termination of each task. See TaskProlog for
3843 execution order details.
3844
3845
3846 TaskPlugin
3847 Identifies the type of task launch plugin, typically used to
3848 provide resource management within a node (e.g. pinning tasks to
3849 specific processors). More than one task plugin can be specified
3850 in a comma separated list. The prefix of "task/" is optional.
3851 Acceptable values include:
3852
3853 task/affinity enables resource containment using CPUSETs. This
3854 enables the --cpu-bind and/or --mem-bind srun
3855 options. If you use "task/affinity" and
3856 encounter problems, it may be due to the variety
3857 of system calls used to implement task affinity
3858 on different operating systems.
3859
3860 task/cgroup enables resource containment using Linux control
3861 cgroups. This enables the --cpu-bind and/or
3862 --mem-bind srun options. NOTE: see "man
3863 cgroup.conf" for configuration details.
3864
3865 task/none for systems requiring no special handling of user
3866 tasks. Lacks support for the --cpu-bind and/or
3867 --mem-bind srun options. The default value is
3868 "task/none".
3869
3870       NOTE: It is recommended to stack task/affinity,task/cgroup together
3871       when configuring TaskPlugin, and to set TaskAffinity=no and
3872       ConstrainCores=yes in cgroup.conf.  This setup uses the task/affinity
3873       plugin to set the affinity of the tasks (which it does better than
3874       task/cgroup) and uses the task/cgroup plugin to fence tasks into the
3875       specified resources, thus combining the strengths of both plugins.
3876
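       Concretely, the recommended non-Cray stacking described above
       corresponds to entries along these lines:

       ```
       # slurm.conf
       TaskPlugin=task/affinity,task/cgroup

       # cgroup.conf
       TaskAffinity=no
       ConstrainCores=yes
       ```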
3877 NOTE: For CRAY systems only: task/cgroup must be used with, and listed
3878 after task/cray_aries in TaskPlugin. The task/affinity plugin can be
3879 listed everywhere, but the previous constraint must be satisfied. So
3880 for CRAY systems, a configuration like this is recommended:
3881
3882 TaskPlugin=task/affinity,task/cray_aries,task/cgroup
3883
3884
3885 TaskPluginParam
3886 Optional parameters for the task plugin. Multiple options
3887 should be comma separated. If None, Boards, Sockets, Cores,
3888 Threads, and/or Verbose are specified, they will override the
3889 --cpu-bind option specified by the user in the srun command.
3890 None, Boards, Sockets, Cores and Threads are mutually exclusive
3891 and since they decrease scheduling flexibility are not generally
3892 recommended (select no more than one of them). Cpusets and
3893 Sched are mutually exclusive (select only one of them). All
3894 TaskPluginParam options are supported on FreeBSD except Cpusets.
3895 The Sched option uses cpuset_setaffinity() on FreeBSD, not
3896 sched_setaffinity().
3897
3898
3899 Boards Bind tasks to boards by default. Overrides automatic
3900 binding.
3901
3902 Cores Bind tasks to cores by default. Overrides automatic
3903 binding.
3904
3905 Cpusets Use cpusets to perform task affinity functions. By
3906 default, Sched task binding is performed.
3907
3908 None Perform no task binding by default. Overrides auto‐
3909 matic binding.
3910
3911 Sched Use sched_setaffinity (if available) to bind tasks to
3912 processors.
3913
3914 Sockets Bind to sockets by default. Overrides automatic bind‐
3915 ing.
3916
3917 Threads Bind to threads by default. Overrides automatic bind‐
3918 ing.
3919
3920 SlurmdOffSpec
3921 If specialized cores or CPUs are identified for the
3922 node (i.e. the CoreSpecCount or CpuSpecList are con‐
3923 figured for the node), then Slurm daemons running on
3924 the compute node (i.e. slurmd and slurmstepd) should
3925 run outside of those resources (i.e. specialized
3926 resources are completely unavailable to Slurm daemons
3927 and jobs spawned by Slurm). This option may not be
3928 used with the task/cray_aries plugin.
3929
3930 Verbose Verbosely report binding before tasks run. Overrides
3931 user options.
3932
3933 Autobind Set a default binding in the event that "auto binding"
3934 doesn't find a match. Set to Threads, Cores or Sock‐
3935                        ets (e.g. TaskPluginParam=autobind=threads).
3936
3937
3938 TaskProlog
3939              Fully qualified pathname of a program to be executed as the slurm
3940 job's owner prior to initiation of each task. Besides the nor‐
3941 mal environment variables, this has SLURM_TASK_PID available to
3942 identify the process ID of the task being started. Standard
3943 output from this program can be used to control the environment
3944 variables and output for the user program.
3945
3946 export NAME=value Will set environment variables for the task
3947 being spawned. Everything after the equal
3948 sign to the end of the line will be used as
3949 the value for the environment variable.
3950 Exporting of functions is not currently sup‐
3951 ported.
3952
3953 print ... Will cause that line (without the leading
3954 "print ") to be printed to the job's stan‐
3955 dard output.
3956
3957 unset NAME Will clear environment variables for the
3958 task being spawned.
3959
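
              A minimal TaskProlog script illustrating all three directives
              might look like the following sketch (the variable names are
              examples only):

              ```shell
              #!/bin/sh
              # Each line written to standard output is interpreted by
              # slurmd as a directive for the task being spawned.

              # Set an environment variable for the task.
              echo "export OMP_NUM_THREADS=4"

              # Emit a line (minus the leading "print ") to the job's
              # standard output.
              echo "print starting task with PID $SLURM_TASK_PID"

              # Clear a variable inherited from the submission environment.
              echo "unset DEBUG_LEVEL"
              ```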
3960 The order of task prolog/epilog execution is as follows:
3961
3962              1. pre_launch_priv()
3963                              Function in TaskPlugin
3964
3965              2. pre_launch()  Function in TaskPlugin
3966
3967              3. TaskProlog    System-wide per task program defined in
3968                              slurm.conf
3969
3970              4. user prolog   Job step specific task program defined using
3971                              srun's --task-prolog option or
3972                              SLURM_TASK_PROLOG environment variable
3973
3974              5. Execute the job step's task
3975
3976              6. user epilog   Job step specific task program defined using
3977                              srun's --task-epilog option or
3978                              SLURM_TASK_EPILOG environment variable
3979
3980              7. TaskEpilog    System-wide per task program defined in
3981                              slurm.conf
3982
3983              8. post_term()   Function in TaskPlugin
3984
3985
3986 TCPTimeout
3987 Time permitted for TCP connection to be established. Default
3988 value is 2 seconds.
3989
3990
3991 TmpFS Fully qualified pathname of the file system available to user
3992 jobs for temporary storage. This parameter is used in establish‐
3993 ing a node's TmpDisk space. The default value is "/tmp".
3994
3995
3996 TopologyParam
3997 Comma separated options identifying network topology options.
3998
3999 Dragonfly Optimize allocation for Dragonfly network. Valid
4000 when TopologyPlugin=topology/tree.
4001
4002 TopoOptional Only optimize allocation for network topology if
4003 the job includes a switch option. Since optimiz‐
4004 ing resource allocation for topology involves
4005 much higher system overhead, this option can be
4006 used to impose the extra overhead only on jobs
4007 which can take advantage of it. If most job allo‐
4008 cations are not optimized for network topology,
4009                            they may fragment resources to the point that
4010 topology optimization for other jobs will be dif‐
4011 ficult to achieve. NOTE: Jobs may span across
4012 nodes without common parent switches with this
4013 enabled.
4014
4015
4016 TopologyPlugin
4017 Identifies the plugin to be used for determining the network
4018 topology and optimizing job allocations to minimize network con‐
4019 tention. See NETWORK TOPOLOGY below for details. Additional
4020 plugins may be provided in the future which gather topology
4021 information directly from the network. Acceptable values
4022 include:
4023
4024 topology/3d_torus best-fit logic over three-dimensional
4025 topology
4026
4027              topology/node_rank orders nodes based upon information in a
4028 node_rank field in the node record as gen‐
4029 erated by a select plugin. Slurm performs a
4030 best-fit algorithm over those ordered nodes
4031
4032 topology/none default for other systems, best-fit logic
4033 over one-dimensional topology
4034
4035 topology/tree used for a hierarchical network as
4036 described in a topology.conf file
4037
4038
4039 TrackWCKey
4040 Boolean yes or no. Used to set display and track of the Work‐
4041 load Characterization Key. Must be set to track correct wckey
4042 usage. NOTE: You must also set TrackWCKey in your slurmdbd.conf
4043 file to create historical usage reports.
4044
4045
4046 TreeWidth
4047 Slurmd daemons use a virtual tree network for communications.
4048 TreeWidth specifies the width of the tree (i.e. the fanout). On
4049 architectures with a front end node running the slurmd daemon,
4050 the value must always be equal to or greater than the number of
4051       front end nodes, which eliminates the need for message forwarding
4052 between the slurmd daemons. On other architectures the default
4053 value is 50, meaning each slurmd daemon can communicate with up
4054 to 50 other slurmd daemons and over 2500 nodes can be contacted
4055 with two message hops. The default value will work well for
4056 most clusters. Optimal system performance can typically be
4057 achieved if TreeWidth is set to the square root of the number of
4058 nodes in the cluster for systems having no more than 2500 nodes
4059 or the cube root for larger systems. The value may not exceed
4060 65533.
4061
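
       The sizing guidance above can be sketched as a quick calculation
       (an illustrative helper, not part of Slurm):

       ```python
       import math

       def suggested_tree_width(node_count):
           """Square root of the node count for clusters of up to 2500
           nodes, cube root for larger systems, capped at the maximum
           permitted value of 65533."""
           if node_count <= 2500:
               width = math.ceil(math.sqrt(node_count))
           else:
               width = math.ceil(node_count ** (1.0 / 3.0))
           return min(width, 65533)

       # A 2500-node cluster: width 50, so two hops reach 50 * 50 nodes.
       print(suggested_tree_width(2500))
       ```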
4062
4063 UnkillableStepProgram
4064 If the processes in a job step are determined to be unkillable
4065 for a period of time specified by the UnkillableStepTimeout
4066 variable, the program specified by UnkillableStepProgram will be
4067 executed. This program can be used to take special actions to
4068              clean up the unkillable processes and/or notify computer admin‐
4069              istrators.  The program will be run as SlurmdUser (usually "root")
4070 on the compute node. By default no program is run.
4071
4072
4073 UnkillableStepTimeout
4074 The length of time, in seconds, that Slurm will wait before
4075 deciding that processes in a job step are unkillable (after they
4076 have been signaled with SIGKILL) and execute UnkillableStepPro‐
4077 gram as described above. The default timeout value is 60 sec‐
4078 onds. If exceeded, the compute node will be drained to prevent
4079 future jobs from being scheduled on the node.
4080
4081
4082 UsePAM If set to 1, PAM (Pluggable Authentication Modules for Linux)
4083 will be enabled. PAM is used to establish the upper bounds for
4084 resource limits. With PAM support enabled, local system adminis‐
4085 trators can dynamically configure system resource limits. Chang‐
4086 ing the upper bound of a resource limit will not alter the lim‐
4087 its of running jobs, only jobs started after a change has been
4088 made will pick up the new limits. The default value is 0 (not
4089 to enable PAM support). Remember that PAM also needs to be con‐
4090 figured to support Slurm as a service. For sites using PAM's
4091 directory based configuration option, a configuration file named
4092 slurm should be created. The module-type, control-flags, and
4093 module-path names that should be included in the file are:
4094 auth required pam_localuser.so
4095 auth required pam_shells.so
4096 account required pam_unix.so
4097 account required pam_access.so
4098 session required pam_unix.so
4099 For sites configuring PAM with a general configuration file, the
4100 appropriate lines (see above), where slurm is the service-name,
4101 should be added.
4102
4103              NOTE: The UsePAM option has nothing to do with the con‐
4104 tribs/pam/pam_slurm and/or contribs/pam_slurm_adopt modules. So
4105 these two modules can work independently of the value set for
4106 UsePAM.
4107
4108
4109 VSizeFactor
4110 Memory specifications in job requests apply to real memory size
4111 (also known as resident set size). It is possible to enforce
4112 virtual memory limits for both jobs and job steps by limiting
4113 their virtual memory to some percentage of their real memory
4114 allocation. The VSizeFactor parameter specifies the job's or job
4115 step's virtual memory limit as a percentage of its real memory
4116 limit. For example, if a job's real memory limit is 500MB and
4117 VSizeFactor is set to 101 then the job will be killed if its
4118 real memory exceeds 500MB or its virtual memory exceeds 505MB
4119 (101 percent of the real memory limit). The default value is 0,
4120 which disables enforcement of virtual memory limits. The value
4121 may not exceed 65533 percent.
4122
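
              The arithmetic in the example above is simply a percentage of
              the real memory limit:

              ```python
              def vsize_limit_mb(real_limit_mb, vsize_factor):
                  """Virtual memory limit derived from the real memory limit
                  and VSizeFactor (given as a percentage)."""
                  return real_limit_mb * vsize_factor // 100

              # The documented example: 500 MB real limit, VSizeFactor=101.
              print(vsize_limit_mb(500, 101))
              ```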
4123
4124 WaitTime
4125 Specifies how many seconds the srun command should by default
4126 wait after the first task terminates before terminating all
4127 remaining tasks. The "--wait" option on the srun command line
4128 overrides this value. The default value is 0, which disables
4129 this feature. May not exceed 65533 seconds.
4130
4131
4132 X11Parameters
4133 For use with Slurm's built-in X11 forwarding implementation.
4134
4135 home_xauthority
4136 If set, xauth data on the compute node will be placed in
4137 ~/.Xauthority rather than in a temporary file under
4138 TmpFS.
4139
4140
4141 The configuration of nodes (or machines) to be managed by Slurm is also
4142 specified in /etc/slurm.conf. Changes in node configuration (e.g.
4143 adding nodes, changing their processor count, etc.) require restarting
4144 both the slurmctld daemon and the slurmd daemons. All slurmd daemons
4145 must know each node in the system to forward messages in support of
4146 hierarchical communications. Only the NodeName must be supplied in the
4147 configuration file. All other node configuration information is
4148 optional. It is advisable to establish baseline node configurations,
4149 especially if the cluster is heterogeneous. Nodes which register to
4150 the system with less than the configured resources (e.g. too little
4151 memory), will be placed in the "DOWN" state to avoid scheduling jobs on
4152 them. Establishing baseline configurations will also speed Slurm's
4153 scheduling process by permitting it to compare job requirements against
4154 these (relatively few) configuration parameters and possibly avoid hav‐
4155 ing to check job requirements against every individual node's configu‐
4156 ration. The resources checked at node registration time are: CPUs,
4157 RealMemory and TmpDisk.
4158
4159 Default values can be specified with a record in which NodeName is
4160 "DEFAULT". The default entry values will apply only to lines following
4161 it in the configuration file and the default values can be reset multi‐
4162 ple times in the configuration file with multiple entries where "Node‐
4163 Name=DEFAULT". Each line where NodeName is "DEFAULT" will replace or
4164 add to previous default values and not reinitialize the default
4165 values.  The "NodeName=" specification must be placed on every line
4166 describing the configuration of nodes. A single node name can not
4167 appear as a NodeName value in more than one line (duplicate node name
4168 records will be ignored). In fact, it is generally possible and desir‐
4169 able to define the configurations of all nodes in only a few lines.
4170 This convention permits significant optimization in the scheduling of
4171 larger clusters. In order to support the concept of jobs requiring
4172 consecutive nodes on some architectures, node specifications should be
4173 place in this file in consecutive order. No single node name may be
4174 listed more than once in the configuration file. Use "DownNodes=" to
4175 record the state of nodes which are temporarily in a DOWN, DRAIN or
4176 FAILING state without altering permanent configuration information. A
4177 job step's tasks are allocated to nodes in the order the nodes appear in
4178 the configuration file. There is presently no capability within Slurm
4179 to arbitrarily order a job step's tasks.
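
       For instance, a baseline plus per-group overrides (all names and
       sizes below are examples only) could be written as:

       ```
       NodeName=DEFAULT Sockets=2 CoresPerSocket=8 ThreadsPerCore=2 RealMemory=64000
       NodeName=tux[0-127]                      # inherits all DEFAULT values
       NodeName=bigmem[0-3] RealMemory=256000   # overrides only the memory size
       ```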
4180
4181 Multiple node names may be comma separated (e.g. "alpha,beta,gamma")
4182 and/or a simple node range expression may optionally be used to specify
4183 numeric ranges of nodes to avoid building a configuration file with
4184 large numbers of entries. The node range expression can contain one
4185 pair of square brackets with a sequence of comma separated numbers
4186 and/or ranges of numbers separated by a "-" (e.g. "linux[0-64,128]", or
4187 "lx[15,18,32-33]"). Note that the numeric ranges can include one or
4188 more leading zeros to indicate the numeric portion has a fixed number
4189 of digits (e.g. "linux[0000-1023]"). Multiple numeric ranges can be
4190 included in the expression (e.g. "rack[0-63]_blade[0-41]"). If one or
4191 more numeric expressions are included, one of them must be at the end
4192 of the name (e.g. "unit[0-31]rack" is invalid), but arbitrary names can
4193 always be used in a comma separated list.
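
       The bracketed range syntax can be illustrated with a simplified
       expander.  This sketch handles only a single bracket pair; Slurm's
       real parser is more general:

       ```python
       import re

       def expand_hostlist(expr):
           """Expand one bracketed range expression, e.g.
           "lx[15,18,32-33]" -> ["lx15", "lx18", "lx32", "lx33"]."""
           m = re.fullmatch(r"(.*)\[([\d,\-]+)\](.*)", expr)
           if not m:
               return [expr]          # plain name, nothing to expand
           prefix, ranges, suffix = m.groups()
           names = []
           for part in ranges.split(","):
               if "-" in part:
                   lo, hi = part.split("-")
                   width = len(lo)    # leading zeros imply fixed-width numbers
                   names += [f"{prefix}{i:0{width}d}{suffix}"
                             for i in range(int(lo), int(hi) + 1)]
               else:
                   names.append(f"{prefix}{part}{suffix}")
           return names

       print(expand_hostlist("lx[15,18,32-33]"))
       ```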
4194
4195 The node configuration specifies the following information:
4196
4197
4198 NodeName
4199 Name that Slurm uses to refer to a node. Typically this would
4200 be the string that "/bin/hostname -s" returns. It may also be
4201 the fully qualified domain name as returned by "/bin/hostname
4202 -f" (e.g. "foo1.bar.com"), or any valid domain name associated
4203 with the host through the host database (/etc/hosts) or DNS,
4204 depending on the resolver settings. Note that if the short form
4205 of the hostname is not used, it may prevent use of hostlist
4206 expressions (the numeric portion in brackets must be at the end
4207 of the string). It may also be an arbitrary string if NodeHost‐
4208 name is specified. If the NodeName is "DEFAULT", the values
4209 specified with that record will apply to subsequent node speci‐
4210 fications unless explicitly set to other values in that node
4211 record or replaced with a different set of default values. Each
4212 line where NodeName is "DEFAULT" will replace or add to previous
4213              default values and not reinitialize the default values.  For
4214 architectures in which the node order is significant, nodes will
4215 be considered consecutive in the order defined. For example, if
4216 the configuration for "NodeName=charlie" immediately follows the
4217 configuration for "NodeName=baker" they will be considered adja‐
4218 cent in the computer.
4219
4220
4221 NodeHostname
4222 Typically this would be the string that "/bin/hostname -s"
4223 returns. It may also be the fully qualified domain name as
4224 returned by "/bin/hostname -f" (e.g. "foo1.bar.com"), or any
4225 valid domain name associated with the host through the host
4226 database (/etc/hosts) or DNS, depending on the resolver set‐
4227 tings. Note that if the short form of the hostname is not used,
4228 it may prevent use of hostlist expressions (the numeric portion
4229 in brackets must be at the end of the string). A node range
4230 expression can be used to specify a set of nodes. If an expres‐
4231 sion is used, the number of nodes identified by NodeHostname on
4232 a line in the configuration file must be identical to the number
4233 of nodes identified by NodeName. By default, the NodeHostname
4234 will be identical in value to NodeName.
4235
4236
4237 NodeAddr
4238              Name by which the node should be referred to in establishing a
4239              communications path.  This name will be used as an argument to the
4240 gethostbyname() function for identification. If a node range
4241 expression is used to designate multiple nodes, they must
4242 exactly match the entries in the NodeName (e.g. "Node‐
4243 Name=lx[0-7] NodeAddr=elx[0-7]"). NodeAddr may also contain IP
4244 addresses. By default, the NodeAddr will be identical in value
4245 to NodeHostname.
4246
4247
4248 Boards Number of Baseboards in nodes with a baseboard controller. Note
4249 that when Boards is specified, SocketsPerBoard, CoresPerSocket,
4250 and ThreadsPerCore should be specified. Boards and CPUs are
4251 mutually exclusive. The default value is 1.
4252
4253
4254 CoreSpecCount
4255 Number of cores reserved for system use. These cores will not
4256 be available for allocation to user jobs. Depending upon the
4257 TaskPluginParam option of SlurmdOffSpec, Slurm daemons (i.e.
4258 slurmd and slurmstepd) may either be confined to these resources
4259 (the default) or prevented from using these resources. Isola‐
4260 tion of the Slurm daemons from user jobs may improve application
4261 performance. If this option and CpuSpecList are both designated
4262 for a node, an error is generated. For information on the algo‐
4263 rithm used by Slurm to select the cores refer to the core spe‐
4264 cialization documentation (
4265 https://slurm.schedmd.com/core_spec.html ).
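
              As a sketch (node names and sizes are examples), core
              specialization might be configured as:

              ```
              # Reserve 2 cores per node for system use; see also MemSpecLimit.
              NodeName=nid[00001-00100] CoreSpecCount=2 RealMemory=64000
              ```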
4266
4267
4268 CoresPerSocket
4269 Number of cores in a single physical processor socket (e.g.
4270 "2"). The CoresPerSocket value describes physical cores, not
4271 the logical number of processors per socket. NOTE: If you have
4272 multi-core processors, you will likely need to specify this
4273 parameter in order to optimize scheduling. The default value is
4274 1.
4275
4276
4277 CpuBind
4278 If a job step request does not specify an option to control how
4279 tasks are bound to allocated CPUs (--cpu-bind) and all nodes
4280 allocated to the job have the same CpuBind option the node Cpu‐
4281 Bind option will control how tasks are bound to allocated
4282 resources. Supported values for CpuBind are "none", "board",
4283 "socket", "ldom" (NUMA), "core" and "thread".
4284
4285
4286 CPUs Number of logical processors on the node (e.g. "2"). CPUs and
4287 Boards are mutually exclusive. It can be set to the total number
4288 of sockets, cores or threads. This can be useful when you want
4289 to schedule only the cores on a hyper-threaded node. If CPUs is
4290 omitted, it will be set equal to the product of Sockets, Cores‐
4291 PerSocket, and ThreadsPerCore. The default value is 1.
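
              For example, on a hyper-threaded node (names and counts below
              are illustrative), CPUs can be set to the core count so that
              only cores are scheduled:

              ```
              # 2 sockets x 8 cores x 2 threads = 32 logical CPUs;
              # CPUs=16 schedules only the cores of each node.
              NodeName=ht[0-15] Sockets=2 CoresPerSocket=8 ThreadsPerCore=2 CPUs=16 RealMemory=64000
              ```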
4292
4293
4294 CpuSpecList
4295 A comma delimited list of Slurm abstract CPU IDs reserved for
4296 system use. The list will be expanded to include all other
4297 CPUs, if any, on the same cores. These cores will not be avail‐
4298 able for allocation to user jobs. Depending upon the TaskPlug‐
4299 inParam option of SlurmdOffSpec, Slurm daemons (i.e. slurmd and
4300 slurmstepd) may either be confined to these resources (the
4301 default) or prevented from using these resources. Isolation of
4302 the Slurm daemons from user jobs may improve application perfor‐
4303 mance. If this option and CoreSpecCount are both designated for
4304 a node, an error is generated. This option has no effect unless
4305 cgroup job confinement is also configured (TaskPlu‐
4306 gin=task/cgroup with ConstrainCores=yes in cgroup.conf).
4307
4308
4309 Feature
4310 A comma delimited list of arbitrary strings indicative of some
4311 characteristic associated with the node. There is no value
4312 associated with a feature at this time, a node either has a fea‐
4313 ture or it does not. If desired a feature may contain a numeric
4314 component indicating, for example, processor speed. By default
4315 a node has no features. Also see Gres.
4316
4317
4318       Gres   A comma delimited list of generic resource specifications for a
4319 node. The format is: "<name>[:<type>][:no_consume]:<num‐
4320 ber>[K|M|G]". The first field is the resource name, which
4321 matches the GresType configuration parameter name. The optional
4322 type field might be used to identify a model of that generic
4323 resource. It is forbidden to specify both an untyped GRES and a
4324 typed GRES with the same <name>. A generic resource can also be
4325 specified as non-consumable (i.e. multiple jobs can use the same
4326 generic resource) with the optional field ":no_consume". The
4327              final field must specify a generic resource count.  A suffix of
4328 "K", "M", "G", "T" or "P" may be used to multiply the number by
4329 1024, 1048576, 1073741824, etc. respectively.
4330 (e.g."Gres=gpu:tesla:1,gpu:kepler:1,bandwidth:lustre:no_con‐
4331 sume:4G"). By default a node has no generic resources and its
4332 maximum count is that of an unsigned 64bit integer. Also see
4333 Feature.
4334
4335
4336 MemSpecLimit
4337 Amount of memory, in megabytes, reserved for system use and not
4338 available for user allocations. If the task/cgroup plugin is
4339 configured and that plugin constrains memory allocations (i.e.
4340 TaskPlugin=task/cgroup in slurm.conf, plus ConstrainRAMSpace=yes
4341 in cgroup.conf), then Slurm compute node daemons (slurmd plus
4342 slurmstepd) will be allocated the specified memory limit. Note
4343 that having the Memory set in SelectTypeParameters as any of the
4344 options that has it as a consumable resource is needed for this
4345 option to work. The daemons will not be killed if they exhaust
4346              the memory allocation (i.e. the Out-Of-Memory Killer is disabled
4347 for the daemon's memory cgroup). If the task/cgroup plugin is
4348 not configured, the specified memory will only be unavailable
4349 for user allocations.
4350
4351
4352 Port The port number that the Slurm compute node daemon, slurmd, lis‐
4353 tens to for work on this particular node. By default there is a
4354 single port number for all slurmd daemons on all compute nodes
4355 as defined by the SlurmdPort configuration parameter. Use of
4356 this option is not generally recommended except for development
4357 or testing purposes. If multiple slurmd daemons execute on a
4358 node this can specify a range of ports.
4359
4360 Note: On Cray systems, Realm-Specific IP Addressing (RSIP) will
4361 automatically try to interact with anything opened on ports
4362 8192-60000. Configure Port to use a port outside of the config‐
4363 ured SrunPortRange and RSIP's port range.
4364
4365
4366 Procs See CPUs.
4367
4368
4369 RealMemory
4370 Size of real memory on the node in megabytes (e.g. "2048"). The
4371                default value is 1.  Lowering RealMemory to set aside some
4372                memory for the OS, unavailable for job allocations, will not
4373                work as intended unless memory is configured as a consumable
4374                resource in SelectTypeParameters, so one of the *_Memory
4375                options needs to be enabled for that goal to be accomplished.
4376 Also see MemSpecLimit.
4377
4378
4379 Reason Identifies the reason for a node being in state "DOWN",
4380                "DRAINED", "DRAINING", "FAIL" or "FAILING".  Use quotes to
4381 enclose a reason having more than one word.
4382
4383
4384 Sockets
4385 Number of physical processor sockets/chips on the node (e.g.
4386 "2"). If Sockets is omitted, it will be inferred from CPUs,
4387 CoresPerSocket, and ThreadsPerCore. NOTE: If you have
4388 multi-core processors, you will likely need to specify these
4389 parameters. Sockets and SocketsPerBoard are mutually exclusive.
4390 If Sockets is specified when Boards is also used, Sockets is
4391 interpreted as SocketsPerBoard rather than total sockets. The
4392 default value is 1.
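As an illustration (hypothetical node name), a dual-socket node with 8 cores per socket and 2 threads per core could be described as:

```
NodeName=node02 Sockets=2 CoresPerSocket=8 ThreadsPerCore=2 CPUs=32
# CPUs = Sockets x CoresPerSocket x ThreadsPerCore = 2 x 8 x 2 = 32;
# CPUs may be omitted and inferred from the other three parameters.
```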
4393
4394
4395 SocketsPerBoard
4396 Number of physical processor sockets/chips on a baseboard.
4397 Sockets and SocketsPerBoard are mutually exclusive. The default
4398 value is 1.
4399
4400
4401 State State of the node with respect to the initiation of user jobs.
4402 Acceptable values are "CLOUD", "DOWN", "DRAIN", "FAIL", "FAIL‐
4403 ING", "FUTURE" and "UNKNOWN". Node states of "BUSY" and "IDLE"
4404 should not be specified in the node configuration, but set the
4405 node state to "UNKNOWN" instead. Setting the node state to
4406 "UNKNOWN" will result in the node state being set to "BUSY",
4407 "IDLE" or other appropriate state based upon recovered system
4408 state information. The default value is "UNKNOWN". Also see
4409 the DownNodes parameter below.
4410
4411                CLOUD     Indicates the node exists in the cloud.  Its initial
4412                          state will be treated as powered down.  The node will
4413                          be available for use after its state is recovered
4414                          from Slurm's state save file or the slurmd daemon
4415 starts on the compute node.
4416
4417 DOWN Indicates the node failed and is unavailable to be
4418 allocated work.
4419
4420                DRAIN     Indicates the node is unavailable to be allocated
4421                          work.
4422
4423 FAIL Indicates the node is expected to fail soon, has no
4424 jobs allocated to it, and will not be allocated to any
4425 new jobs.
4426
4427 FAILING Indicates the node is expected to fail soon, has one
4428 or more jobs allocated to it, but will not be allo‐
4429 cated to any new jobs.
4430
4431 FUTURE Indicates the node is defined for future use and need
4432 not exist when the Slurm daemons are started. These
4433 nodes can be made available for use simply by updating
4434 the node state using the scontrol command rather than
4435 restarting the slurmctld daemon. After these nodes are
4436 made available, change their State in the slurm.conf
4437 file. Until these nodes are made available, they will
4438                          not be seen by any Slurm commands, nor will any
4439                          attempt be made to contact them.
4440
4441 UNKNOWN Indicates the node's state is undefined (BUSY or
4442 IDLE), but will be established when the slurmd daemon
4443 on that node registers. The default value is
4444 "UNKNOWN".
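Bringing a FUTURE node into service might look like the following (hypothetical node name; the exact State value accepted can depend on the Slurm version, so treat this as a sketch):

```
# Make the node available without restarting slurmctld:
scontrol update NodeName=node48 State=RESUME
# Then update its State in slurm.conf so the change survives a restart.
```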
4445
4446
4447 ThreadsPerCore
4448 Number of logical threads in a single physical core (e.g. "2").
4449                Note that Slurm can allocate resources to jobs down to the
4450 resolution of a core. If your system is configured with more
4451 than one thread per core, execution of a different job on each
4452 thread is not supported unless you configure SelectTypeParame‐
4453 ters=CR_CPU plus CPUs; do not configure Sockets, CoresPerSocket
4454                or ThreadsPerCore.  A job can execute one task per thread from
4455 within one job step or execute a distinct job step on each of
4456 the threads. Note also if you are running with more than 1
4457 thread per core and running the select/cons_res or
4458 select/cons_tres plugin then you will want to set the Select‐
4459 TypeParameters variable to something other than CR_CPU to avoid
4460 unexpected results. The default value is 1.
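To schedule individual hardware threads as CPUs, the relevant configuration sketch (hypothetical node name) would be:

```
SelectType=select/cons_res
SelectTypeParameters=CR_CPU
NodeName=node03 CPUs=32
# Per the text above, do not configure Sockets, CoresPerSocket or
# ThreadsPerCore when scheduling down to the thread level with CR_CPU.
```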
4461
4462
4463 TmpDisk
4464 Total size of temporary disk storage in TmpFS in megabytes (e.g.
4465 "16384"). TmpFS (for "Temporary File System") identifies the
4466 location which jobs should use for temporary storage. Note this
4467 does not indicate the amount of free space available to the user
4468                on the node, only the total file system size.  The system
4469                administrator should ensure this file system is purged as needed so
4470 that user jobs have access to most of this space. The Prolog
4471 and/or Epilog programs (specified in the configuration file)
4472 might be used to ensure the file system is kept clean. The
4473 default value is 0.
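A minimal sketch (hypothetical path and size):

```
TmpFS=/scratch
NodeName=node04 TmpDisk=16384     # 16 GB of local scratch space
# A Prolog/Epilog script can purge /scratch between jobs.
```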
4474
4475
4476         TRESWeights
4477                TRESWeights are used to calculate a value that represents how
4478                busy a node is.  Currently only used in federation configura‐
4479                tions.  TRESWeights are different from TRESBillingWeights,
4480                which are used for fairshare calculations.
4481
4482 TRES weights are specified as a comma-separated list of <TRES
4483 Type>=<TRES Weight> pairs.
4484 e.g.
4485 NodeName=node1 ... TRESWeights="CPU=1.0,Mem=0.25G,GRES/gpu=2.0"
4486
4487 By default the weighted TRES value is calculated as the sum of
4488 all node TRES types multiplied by their corresponding TRES
4489 weight.
4490
4491 If PriorityFlags=MAX_TRES is configured, the weighted TRES value
4492 is calculated as the MAX of individual node TRES' (e.g. cpus,
4493 mem, gres).
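The two calculation modes can be sketched as follows (an illustrative sketch, not Slurm's implementation; the node values and weights are hypothetical, with memory counted in GB to match the "0.25G" weight):

```python
# Sketch of the weighted TRES value used in federation scheduling.
# Default: sum of each TRES amount times its weight.
# With PriorityFlags=MAX_TRES: the max of the individual weighted terms.

def weighted_tres(tres, weights, max_tres=False):
    """tres and weights map TRES type -> amount / weight."""
    terms = [tres[t] * w for t, w in weights.items() if t in tres]
    return max(terms) if max_tres else sum(terms)

# Hypothetical node matching TRESWeights="CPU=1.0,Mem=0.25G,GRES/gpu=2.0"
node = {"CPU": 16, "Mem": 64, "GRES/gpu": 2}   # Mem in GB
w = {"CPU": 1.0, "Mem": 0.25, "GRES/gpu": 2.0}

print(weighted_tres(node, w))                 # 16*1.0 + 64*0.25 + 2*2.0 = 36.0
print(weighted_tres(node, w, max_tres=True))  # max(16.0, 16.0, 4.0) = 16.0
```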
4494
4495
4496 Weight The priority of the node for scheduling purposes. All things
4497 being equal, jobs will be allocated the nodes with the lowest
4498 weight which satisfies their requirements. For example, a het‐
4499 erogeneous collection of nodes might be placed into a single
4500 partition for greater system utilization, responsiveness and
4501 capability. It would be preferable to allocate smaller memory
4502 nodes rather than larger memory nodes if either will satisfy a
4503 job's requirements. The units of weight are arbitrary, but
4504 larger weights should be assigned to nodes with more processors,
4505 memory, disk space, higher processor speed, etc. Note that if a
4506 job allocation request can not be satisfied using the nodes with
4507 the lowest weight, the set of nodes with the next lowest weight
4508 is added to the set of nodes under consideration for use (repeat
4509 as needed for higher weight values). If you absolutely want to
4510 minimize the number of higher weight nodes allocated to a job
4511 (at a cost of higher scheduling overhead), give each node a dis‐
4512 tinct Weight value and they will be added to the pool of nodes
4513 being considered for scheduling individually. The default value
4514 is 1.
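For example (hypothetical node names and sizes), to prefer the small-memory nodes:

```
NodeName=small[1-16]  RealMemory=64000   Weight=1
NodeName=large[1-4]   RealMemory=512000  Weight=10
# Jobs that fit on a small node are placed there first; large nodes
# are considered only when the lower-weight set cannot satisfy the job.
```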
4515
4516
4517 The "DownNodes=" configuration permits you to mark certain nodes as in
4518 a DOWN, DRAIN, FAIL, or FAILING state without altering the permanent
4519 configuration information listed under a "NodeName=" specification.
4520
4521
4522 DownNodes
4523 Any node name, or list of node names, from the "NodeName=" spec‐
4524 ifications.
4525
4526
4527 Reason Identifies the reason for a node being in state "DOWN", "DRAIN",
4528                "FAIL" or "FAILING".  Use quotes to enclose a reason having more
4529 than one word.
4530
4531
4532 State State of the node with respect to the initiation of user jobs.
4533 Acceptable values are "DOWN", "DRAIN", "FAIL", "FAILING" and
4534 "UNKNOWN". Node states of "BUSY" and "IDLE" should not be spec‐
4535 ified in the node configuration, but set the node state to
4536 "UNKNOWN" instead. Setting the node state to "UNKNOWN" will
4537 result in the node state being set to "BUSY", "IDLE" or other
4538 appropriate state based upon recovered system state information.
4539 The default value is "UNKNOWN".
4540
4541 DOWN Indicates the node failed and is unavailable to be
4542 allocated work.
4543
4544                DRAIN     Indicates the node is unavailable to be allocated
4545                          work.
4546
4547 FAIL Indicates the node is expected to fail soon, has no
4548 jobs allocated to it, and will not be allocated to any
4549 new jobs.
4550
4551 FAILING Indicates the node is expected to fail soon, has one
4552 or more jobs allocated to it, but will not be allo‐
4553 cated to any new jobs.
4554
4555 UNKNOWN Indicates the node's state is undefined (BUSY or
4556 IDLE), but will be established when the slurmd daemon
4557 on that node registers. The default value is
4558 "UNKNOWN".
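A DownNodes entry might look like this (hypothetical node names and reason):

```
DownNodes=node[21-24] State=DOWN Reason="power supply failure"
```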
4559
4560
4561 On computers where frontend nodes are used to execute batch scripts
4562 rather than compute nodes (Cray ALPS systems), one may configure one or
4563 more frontend nodes using the configuration parameters defined below.
4564 These options are very similar to those used in configuring compute
4565 nodes. These options may only be used on systems configured and built
4566 with the appropriate parameters (--have-front-end) or a system deter‐
4567 mined to have the appropriate architecture by the configure script
4568 (Cray ALPS systems). The front end configuration specifies the follow‐
4569 ing information:
4570
4571
4572 AllowGroups
4573 Comma separated list of group names which may execute jobs on
4574 this front end node. By default, all groups may use this front
4575 end node. If at least one group associated with the user
4576 attempting to execute the job is in AllowGroups, he will be per‐
4577 mitted to use this front end node. May not be used with the
4578 DenyGroups option.
4579
4580
4581 AllowUsers
4582 Comma separated list of user names which may execute jobs on
4583 this front end node. By default, all users may use this front
4584 end node. May not be used with the DenyUsers option.
4585
4586
4587 DenyGroups
4588 Comma separated list of group names which are prevented from
4589 executing jobs on this front end node. May not be used with the
4590 AllowGroups option.
4591
4592
4593 DenyUsers
4594 Comma separated list of user names which are prevented from exe‐
4595 cuting jobs on this front end node. May not be used with the
4596 AllowUsers option.
4597
4598
4599 FrontendName
4600 Name that Slurm uses to refer to a frontend node. Typically
4601 this would be the string that "/bin/hostname -s" returns. It
4602 may also be the fully qualified domain name as returned by
4603 "/bin/hostname -f" (e.g. "foo1.bar.com"), or any valid domain
4604 name associated with the host through the host database
4605 (/etc/hosts) or DNS, depending on the resolver settings. Note
4606 that if the short form of the hostname is not used, it may pre‐
4607 vent use of hostlist expressions (the numeric portion in brack‐
4608 ets must be at the end of the string). If the FrontendName is
4609 "DEFAULT", the values specified with that record will apply to
4610 subsequent node specifications unless explicitly set to other
4611 values in that frontend node record or replaced with a different
4612 set of default values. Each line where FrontendName is
4613                "DEFAULT" will replace or add to previous default values and
4614                not reinitialize the default values.  Note that since the naming
4615 of front end nodes would typically not follow that of the com‐
4616 pute nodes (e.g. lacking X, Y and Z coordinates found in the
4617 compute node naming scheme), each front end node name should be
4618                listed separately and without a hostlist expression (i.e.
4619                "frontend00,frontend01" rather than "frontend[00-01]").
4620
4621
4622 FrontendAddr
4623 Name that a frontend node should be referred to in establishing
4624 a communications path. This name will be used as an argument to
4625 the gethostbyname() function for identification. As with Fron‐
4626 tendName, list the individual node addresses rather than using a
4627 hostlist expression. The number of FrontendAddr records per
4628 line must equal the number of FrontendName records per line
4629                (i.e. you can't map two node names to one address).  FrontendAddr
4630 may also contain IP addresses. By default, the FrontendAddr
4631 will be identical in value to FrontendName.
4632
4633
4634 Port The port number that the Slurm compute node daemon, slurmd, lis‐
4635 tens to for work on this particular frontend node. By default
4636 there is a single port number for all slurmd daemons on all
4637 frontend nodes as defined by the SlurmdPort configuration param‐
4638 eter. Use of this option is not generally recommended except for
4639 development or testing purposes.
4640
4641 Note: On Cray systems, Realm-Specific IP Addressing (RSIP) will
4642 automatically try to interact with anything opened on ports
4643 8192-60000. Configure Port to use a port outside of the config‐
4644 ured SrunPortRange and RSIP's port range.
4645
4646
4647 Reason Identifies the reason for a frontend node being in state "DOWN",
4648                "DRAINED", "DRAINING", "FAIL" or "FAILING".  Use quotes to
4649 enclose a reason having more than one word.
4650
4651
4652 State State of the frontend node with respect to the initiation of
4653 user jobs. Acceptable values are "DOWN", "DRAIN", "FAIL",
4654 "FAILING" and "UNKNOWN". "DOWN" indicates the frontend node has
4655 failed and is unavailable to be allocated work. "DRAIN" indi‐
4656 cates the frontend node is unavailable to be allocated work.
4657 "FAIL" indicates the frontend node is expected to fail soon, has
4658 no jobs allocated to it, and will not be allocated to any new
4659 jobs. "FAILING" indicates the frontend node is expected to fail
4660 soon, has one or more jobs allocated to it, but will not be
4661 allocated to any new jobs. "UNKNOWN" indicates the frontend
4662 node's state is undefined (BUSY or IDLE), but will be estab‐
4663 lished when the slurmd daemon on that node registers. The
4664 default value is "UNKNOWN". Also see the DownNodes parameter
4665 below.
4666
4667 For example: "FrontendName=frontend[00-03] FrontendAddr=efron‐
4668 tend[00-03] State=UNKNOWN" is used to define four front end
4669 nodes for running slurmd daemons.
4670
4671
4672 The partition configuration permits you to establish different job lim‐
4673 its or access controls for various groups (or partitions) of nodes.
4674 Nodes may be in more than one partition, making partitions serve as
4675 general purpose queues. For example one may put the same set of nodes
4676 into two different partitions, each with different constraints (time
4677 limit, job sizes, groups allowed to use the partition, etc.). Jobs are
4678 allocated resources within a single partition. Default values can be
4679 specified with a record in which PartitionName is "DEFAULT". The
4680 default entry values will apply only to lines following it in the con‐
4681 figuration file and the default values can be reset multiple times in
4682 the configuration file with multiple entries where "Partition‐
4683 Name=DEFAULT". The "PartitionName=" specification must be placed on
4684 every line describing the configuration of partitions. Each line where
4685 PartitionName is "DEFAULT" will replace or add to previous default val‐
4686       ues and not reinitialize the default values.  A single partition name
4687 can not appear as a PartitionName value in more than one line (dupli‐
4688 cate partition name records will be ignored). If a partition that is
4689 in use is deleted from the configuration and slurm is restarted or
4690 reconfigured (scontrol reconfigure), jobs using the partition are can‐
4691 celed. NOTE: Put all parameters for each partition on a single line.
4692 Each line of partition configuration information should represent a
4693 different partition. The partition configuration file contains the
4694 following information:
4695
4696
4697 AllocNodes
4698 Comma separated list of nodes from which users can submit jobs
4699 in the partition. Node names may be specified using the node
4700 range expression syntax described above. The default value is
4701 "ALL".
4702
4703
4704 AllowAccounts
4705 Comma separated list of accounts which may execute jobs in the
4706 partition. The default value is "ALL". NOTE: If AllowAccounts
4707 is used then DenyAccounts will not be enforced. Also refer to
4708 DenyAccounts.
4709
4710
4711 AllowGroups
4712 Comma separated list of group names which may execute jobs in
4713 the partition. If at least one group associated with the user
4714 attempting to execute the job is in AllowGroups, he will be per‐
4715 mitted to use this partition. Jobs executed as user root can
4716 use any partition without regard to the value of AllowGroups.
4717 If user root attempts to execute a job as another user (e.g.
4718 using srun's --uid option), this other user must be in one of
4719 groups identified by AllowGroups for the job to successfully
4720 execute. The default value is "ALL". When set, all partitions
4721                that a user does not have access to will be hidden from display
4722 regardless of the settings used for PrivateData. NOTE: For per‐
4723 formance reasons, Slurm maintains a list of user IDs allowed to
4724 use each partition and this is checked at job submission time.
4725 This list of user IDs is updated when the slurmctld daemon is
4726 restarted, reconfigured (e.g. "scontrol reconfig") or the parti‐
4727                tion's AllowGroups value is reset, even if its value is unchanged
4728                (e.g. "scontrol update PartitionName=name AllowGroups=group").
4729                For a user's access to a partition to change, both the user's
4730                group membership and Slurm's internal user ID list must change,
4731                the latter using one of the methods described above.
4732
4733
4734 AllowQos
4735 Comma separated list of Qos which may execute jobs in the parti‐
4736 tion. Jobs executed as user root can use any partition without
4737 regard to the value of AllowQos. The default value is "ALL".
4738 NOTE: If AllowQos is used then DenyQos will not be enforced.
4739 Also refer to DenyQos.
4740
4741
4742 Alternate
4743 Partition name of alternate partition to be used if the state of
4744 this partition is "DRAIN" or "INACTIVE."
4745
4746
4747 CpuBind
4748                If a job step request does not specify an option to control how
4749                tasks are bound to allocated CPUs (--cpu-bind), and the nodes
4750                allocated to the job do not all have the same node-level CpuBind
4751                option, then the partition's CpuBind option will control how
4752                tasks are bound to allocated resources.  Supported values for
4753                CpuBind are "none", "board", "socket", "ldom" (NUMA), "core" and
4754 "thread".
4755
4756
4757 Default
4758 If this keyword is set, jobs submitted without a partition spec‐
4759 ification will utilize this partition. Possible values are
4760 "YES" and "NO". The default value is "NO".
4761
4762
4763 DefCpuPerGPU
4764 Default count of CPUs allocated per allocated GPU.
4765
4766
4767 DefMemPerCPU
4768 Default real memory size available per allocated CPU in
4769 megabytes. Used to avoid over-subscribing memory and causing
4770 paging. DefMemPerCPU would generally be used if individual pro‐
4771 cessors are allocated to jobs (SelectType=select/cons_res or
4772 SelectType=select/cons_tres). If not set, the DefMemPerCPU
4773 value for the entire cluster will be used. Also see DefMem‐
4774 PerGPU, DefMemPerNode and MaxMemPerCPU. DefMemPerCPU, DefMem‐
4775 PerGPU and DefMemPerNode are mutually exclusive.
4776
4777
4778 DefMemPerGPU
4779 Default real memory size available per allocated GPU in
4780 megabytes. Also see DefMemPerCPU, DefMemPerNode and MaxMemPer‐
4781 CPU. DefMemPerCPU, DefMemPerGPU and DefMemPerNode are mutually
4782 exclusive.
4783
4784
4785 DefMemPerNode
4786 Default real memory size available per allocated node in
4787 megabytes. Used to avoid over-subscribing memory and causing
4788 paging. DefMemPerNode would generally be used if whole nodes
4789 are allocated to jobs (SelectType=select/linear) and resources
4790 are over-subscribed (OverSubscribe=yes or OverSubscribe=force).
4791 If not set, the DefMemPerNode value for the entire cluster will
4792 be used. Also see DefMemPerCPU, DefMemPerGPU and MaxMemPerCPU.
4793 DefMemPerCPU, DefMemPerGPU and DefMemPerNode are mutually exclu‐
4794 sive.
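For instance (hypothetical partition, nodes and values), to default jobs to 2 GB per allocated CPU with a 4 GB cap:

```
SelectType=select/cons_tres
SelectTypeParameters=CR_Core_Memory
PartitionName=batch Nodes=node[01-32] DefMemPerCPU=2048 MaxMemPerCPU=4096
```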
4795
4796
4797 DenyAccounts
4798 Comma separated list of accounts which may not execute jobs in
4799                the partition.  By default, no accounts are denied access.  NOTE:
4800 If AllowAccounts is used then DenyAccounts will not be enforced.
4801 Also refer to AllowAccounts.
4802
4803
4804 DenyQos
4805 Comma separated list of Qos which may not execute jobs in the
4806                partition.  By default, no QOS are denied access.  NOTE: If
4807                AllowQos is used then DenyQos will not be enforced.  Also refer
4808                to AllowQos.
4809
4810
4811 DefaultTime
4812 Run time limit used for jobs that don't specify a value. If not
4813 set then MaxTime will be used. Format is the same as for Max‐
4814 Time.
4815
4816
4817 DisableRootJobs
4818 If set to "YES" then user root will be prevented from running
4819 any jobs on this partition. The default value will be the value
4820 of DisableRootJobs set outside of a partition specification
4821 (which is "NO", allowing user root to execute jobs).
4822
4823
4824 ExclusiveUser
4825 If set to "YES" then nodes will be exclusively allocated to
4826 users. Multiple jobs may be run for the same user, but only one
4827 user can be active at a time. This capability is also available
4828 on a per-job basis by using the --exclusive=user option.
4829
4830
4831 GraceTime
4832 Specifies, in units of seconds, the preemption grace time to be
4833 extended to a job which has been selected for preemption. The
4834                default value is zero; no preemption grace time is allowed on
4835 this partition. Once a job has been selected for preemption,
4836 its end time is set to the current time plus GraceTime. The
4837 job's tasks are immediately sent SIGCONT and SIGTERM signals in
4838 order to provide notification of its imminent termination. This
4839 is followed by the SIGCONT, SIGTERM and SIGKILL signal sequence
4840 upon reaching its new end time. This second set of signals is
4841 sent to both the tasks and the containing batch script, if
4842 applicable. Meaningful only for PreemptMode=CANCEL. See also
4843 the global KillWait configuration parameter.
4844
4845
4846 Hidden Specifies if the partition and its jobs are to be hidden by
4847 default. Hidden partitions will by default not be reported by
4848 the Slurm APIs or commands. Possible values are "YES" and "NO".
4849 The default value is "NO". Note that partitions that a user
4850 lacks access to by virtue of the AllowGroups parameter will also
4851 be hidden by default.
4852
4853
4854 LLN Schedule resources to jobs on the least loaded nodes (based upon
4855 the number of idle CPUs). This is generally only recommended for
4856 an environment with serial jobs as idle resources will tend to
4857 be highly fragmented, resulting in parallel jobs being distrib‐
4858 uted across many nodes. Note that node Weight takes precedence
4859 over how many idle resources are on each node. Also see the
4860 SelectParameters configuration parameter CR_LLN to use the least
4861 loaded nodes in every partition.
4862
4863
4864 MaxCPUsPerNode
4865 Maximum number of CPUs on any node available to all jobs from
4866 this partition. This can be especially useful to schedule GPUs.
4867 For example a node can be associated with two Slurm partitions
4868 (e.g. "cpu" and "gpu") and the partition/queue "cpu" could be
4869 limited to only a subset of the node's CPUs, ensuring that one
4870 or more CPUs would be available to jobs in the "gpu" parti‐
4871 tion/queue.
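The cpu/gpu split described above might be sketched as follows (hypothetical names and counts; a node with 16 CPUs and 2 GPUs):

```
NodeName=node05 CPUs=16 Gres=gpu:2
PartitionName=cpu Nodes=node05 MaxCPUsPerNode=14
PartitionName=gpu Nodes=node05
# Jobs in "cpu" may use at most 14 of the 16 CPUs, leaving 2 CPUs
# free for jobs in the "gpu" partition to drive the GPUs.
```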
4872
4873
4874 MaxMemPerCPU
4875 Maximum real memory size available per allocated CPU in
4876 megabytes. Used to avoid over-subscribing memory and causing
4877 paging. MaxMemPerCPU would generally be used if individual pro‐
4878 cessors are allocated to jobs (SelectType=select/cons_res or
4879 SelectType=select/cons_tres). If not set, the MaxMemPerCPU
4880 value for the entire cluster will be used. Also see DefMemPer‐
4881 CPU and MaxMemPerNode. MaxMemPerCPU and MaxMemPerNode are mutu‐
4882 ally exclusive.
4883
4884
4885 MaxMemPerNode
4886 Maximum real memory size available per allocated node in
4887 megabytes. Used to avoid over-subscribing memory and causing
4888 paging. MaxMemPerNode would generally be used if whole nodes
4889 are allocated to jobs (SelectType=select/linear) and resources
4890 are over-subscribed (OverSubscribe=yes or OverSubscribe=force).
4891 If not set, the MaxMemPerNode value for the entire cluster will
4892 be used. Also see DefMemPerNode and MaxMemPerCPU. MaxMemPerCPU
4893 and MaxMemPerNode are mutually exclusive.
4894
4895
4896 MaxNodes
4897 Maximum count of nodes which may be allocated to any single job.
4898 The default value is "UNLIMITED", which is represented inter‐
4899 nally as -1. This limit does not apply to jobs executed by
4900 SlurmUser or user root.
4901
4902
4903 MaxTime
4904 Maximum run time limit for jobs. Format is minutes, min‐
4905 utes:seconds, hours:minutes:seconds, days-hours, days-hours:min‐
4906 utes, days-hours:minutes:seconds or "UNLIMITED". Time resolu‐
4907 tion is one minute and second values are rounded up to the next
4908 minute. This limit does not apply to jobs executed by SlurmUser
4909 or user root.
4910
4911
4912 MinNodes
4913 Minimum count of nodes which may be allocated to any single job.
4914 The default value is 0. This limit does not apply to jobs exe‐
4915 cuted by SlurmUser or user root.
4916
4917
4918 Nodes Comma separated list of nodes which are associated with this
4919 partition. Node names may be specified using the node range
4920 expression syntax described above. A blank list of nodes (i.e.
4921 "Nodes= ") can be used if one wants a partition to exist, but
4922 have no resources (possibly on a temporary basis). A value of
4923 "ALL" is mapped to all nodes configured in the cluster.
4924
4925
4926 OverSubscribe
4927 Controls the ability of the partition to execute more than one
4928 job at a time on each resource (node, socket or core depending
4929 upon the value of SelectTypeParameters). If resources are to be
4930 over-subscribed, avoiding memory over-subscription is very
4931 important. SelectTypeParameters should be configured to treat
4932 memory as a consumable resource and the --mem option should be
4933 used for job allocations. Sharing of resources is typically
4934 useful only when using gang scheduling (PreemptMode=sus‐
4935 pend,gang). Possible values for OverSubscribe are "EXCLUSIVE",
4936 "FORCE", "YES", and "NO". Note that a value of "YES" or "FORCE"
4937 can negatively impact performance for systems with many thou‐
4938 sands of running jobs. The default value is "NO". For more
4939 information see the following web pages:
4940 https://slurm.schedmd.com/cons_res.html,
4941 https://slurm.schedmd.com/cons_res_share.html,
4942 https://slurm.schedmd.com/gang_scheduling.html, and
4943 https://slurm.schedmd.com/preempt.html.
4944
4945
4946 EXCLUSIVE Allocates entire nodes to jobs even with Select‐
4947 Type=select/cons_res or SelectType=select/cons_tres
4948 configured. Jobs that run in partitions with "Over‐
4949 Subscribe=EXCLUSIVE" will have exclusive access to
4950 all allocated nodes.
4951
4952 FORCE Makes all resources in the partition available for
4953 oversubscription without any means for users to dis‐
4954 able it. May be followed with a colon and maximum
4955 number of jobs in running or suspended state. For
4956 example "OverSubscribe=FORCE:4" enables each node,
4957 socket or core to oversubscribe each resource four
4958 ways. Recommended only for systems running with
4959 gang scheduling (PreemptMode=suspend,gang). NOTE:
4960 PreemptType=QOS will permit one additional job to be
4961 run on the partition if started due to job preemp‐
4962 tion. For example, a configuration of OverSub‐
4963 scribe=FORCE:1 will only permit one job per
4964 resources normally, but a second job can be started
4965 if done so through preemption based upon QOS. The
4966 use of PreemptType=QOS and PreemptType=Suspend only
4967 applies with SelectType=select/cons_res or Select‐
4968 Type=select/cons_tres.
4969
4970 YES Makes all resources in the partition available for
4971 sharing upon request by the job. Resources will
4972 only be over-subscribed when explicitly requested by
4973 the user using the "--oversubscribe" option on job
4974 submission. May be followed with a colon and maxi‐
4975 mum number of jobs in running or suspended state.
4976 For example "OverSubscribe=YES:4" enables each node,
4977 socket or core to execute up to four jobs at once.
4978 Recommended only for systems running with gang
4979 scheduling (PreemptMode=suspend,gang).
4980
4981 NO Selected resources are allocated to a single job. No
4982 resource will be allocated to more than one job.
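A gang-scheduling sketch (hypothetical partition and nodes; values per the FORCE description above):

```
PreemptMode=suspend,gang
PartitionName=shared Nodes=node[01-08] OverSubscribe=FORCE:4
# Each scheduling resource (node, socket or core, depending on
# SelectTypeParameters) may hold up to four running/suspended jobs.
```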
4983
4984
4985 PartitionName
4986 Name by which the partition may be referenced (e.g. "Interac‐
4987 tive"). This name can be specified by users when submitting
4988 jobs. If the PartitionName is "DEFAULT", the values specified
4989 with that record will apply to subsequent partition specifica‐
4990 tions unless explicitly set to other values in that partition
4991 record or replaced with a different set of default values. Each
4992 line where PartitionName is "DEFAULT" will replace or add to
4993                previous default values and not reinitialize the default val‐
4994                ues.
4995
4996
4997 PreemptMode
4998 Mechanism used to preempt jobs from this partition when Preempt‐
4999 Type=preempt/partition_prio is configured. This partition spe‐
5000 cific PreemptMode configuration parameter will override the Pre‐
5001 emptMode configuration parameter set for the cluster as a whole.
5002 The cluster-level PreemptMode must include the GANG option if
5003 PreemptMode is configured to SUSPEND for any partition. The
5004 cluster-level PreemptMode must not be OFF if PreemptMode is
5005 enabled for any partition. See the description of the clus‐
5006 ter-level PreemptMode configuration parameter above for further
5007 information.
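A sketch of partition-level preemption (hypothetical partitions and nodes):

```
PreemptType=preempt/partition_prio
PreemptMode=suspend,gang                  # cluster level: must include GANG
PartitionName=low  Nodes=node[01-08] PriorityTier=1 PreemptMode=suspend
PartitionName=high Nodes=node[01-08] PriorityTier=2
```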
5008
5009
5010 PriorityJobFactor
5011 Partition factor used by priority/multifactor plugin in calcu‐
5012 lating job priority. The value may not exceed 65533. Also see
5013 PriorityTier.
5014
5015
5016 PriorityTier
5017 Jobs submitted to a partition with a higher priority tier value
5018                will be dispatched before pending jobs in partitions with lower
5019 priority tier value and, if possible, they will preempt running
5020 jobs from partitions with lower priority tier values. Note that
5021 a partition's priority tier takes precedence over a job's prior‐
5022 ity. The value may not exceed 65533. Also see PriorityJobFac‐
5023 tor.
5024
5025
5026 QOS Used to extend the limits available to a QOS on a partition.
5027 Jobs will not be associated to this QOS outside of being associ‐
5028 ated to the partition. They will still be associated to their
5029 requested QOS. By default, no QOS is used. NOTE: If a limit is
5030 set in both the Partition's QOS and the Job's QOS the Partition
5031 QOS will be honored unless the Job's QOS has the OverPartQOS
5032                flag set, in which case the Job's QOS will have priority.
5033
5034
5035 ReqResv
5036 Specifies users of this partition are required to designate a
5037 reservation when submitting a job. This option can be useful in
5038 restricting usage of a partition that may have higher priority
5039 or additional resources to be allowed only within a reservation.
5040 Possible values are "YES" and "NO". The default value is "NO".
5041
5042
5043 RootOnly
5044 Specifies if only user ID zero (i.e. user root) may allocate
5045 resources in this partition. User root may allocate resources
5046 for any other user, but the request must be initiated by user
5047 root. This option can be useful for a partition to be managed
5048 by some external entity (e.g. a higher-level job manager) and
5049 prevents users from directly using those resources. Possible
5050 values are "YES" and "NO". The default value is "NO".
5051
5052
5053 SelectTypeParameters
5054 Partition-specific resource allocation type. This option
5055 replaces the global SelectTypeParameters value. Supported val‐
5056 ues are CR_Core, CR_Core_Memory, CR_Socket and CR_Socket_Memory.
5057 Use requires the system-wide SelectTypeParameters value be set
5058 to any of the four supported values previously listed; other‐
5059 wise, the partition-specific value will be ignored.
5060
5061
5062 Shared The Shared configuration parameter has been replaced by the
5063 OverSubscribe parameter described above.
5064

       State  State of partition or availability for use. Possible
              values are "UP", "DOWN", "DRAIN" and "INACTIVE". The
              default value is "UP". See also the related "Alternate"
              keyword.

              UP        Designates that new jobs may be queued on the
                        partition, and that jobs may be allocated nodes
                        and run from the partition.

              DOWN      Designates that new jobs may be queued on the
                        partition, but queued jobs may not be allocated
                        nodes and run from the partition. Jobs already
                        running on the partition continue to run. The
                        jobs must be explicitly canceled to force their
                        termination.

              DRAIN     Designates that no new jobs may be queued on
                        the partition (job submission requests will be
                        denied with an error message), but jobs already
                        queued on the partition may be allocated nodes
                        and run. See also the "Alternate" partition
                        specification.

              INACTIVE  Designates that no new jobs may be queued on
                        the partition, and jobs already queued may not
                        be allocated nodes and run. See also the
                        "Alternate" partition specification.

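       For example, an administrator can change the state of a
       hypothetical partition named "debug" at run time with scontrol,
       draining it before maintenance and reopening it afterwards:

```
scontrol update PartitionName=debug State=DRAIN
scontrol update PartitionName=debug State=UP
```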

       TRESBillingWeights
              TRESBillingWeights is used to define the billing weights
              of each TRES type that will be used in calculating the
              usage of a job. The calculated usage is used when
              calculating fairshare and when enforcing the TRES billing
              limit on jobs.

              Billing weights are specified as a comma-separated list
              of <TRES Type>=<TRES Billing Weight> pairs.

              Any TRES Type is available for billing. Note that the
              base unit for memory and burst buffers is megabytes.

              By default the billing of TRES is calculated as the sum
              of all TRES types multiplied by their corresponding
              billing weight.

              The weighted amount of a resource can be adjusted by
              adding a suffix of K, M, G, T or P after the billing
              weight. For example, a memory weight of "mem=.25" on a
              job allocated 8GB will be billed 2048 (8192MB * .25)
              units. A memory weight of "mem=.25G" on the same job will
              be billed 2 (8192MB * (.25/1024)) units.

              Negative values are allowed.

              When a job is allocated 1 CPU and 8 GB of memory on a
              partition configured with
              TRESBillingWeights="CPU=1.0,Mem=0.25G,GRES/gpu=2.0", the
              billable TRES will be: (1*1.0) + (8*0.25) + (0*2.0) =
              3.0.

              If PriorityFlags=MAX_TRES is configured, the billable
              TRES is calculated as the MAX of individual TRES' on a
              node (e.g. cpus, mem, gres) plus the sum of all global
              TRES' (e.g. licenses). Using the same example above the
              billable TRES will be MAX(1*1.0, 8*0.25) + (0*2.0) =
              2.0.

              If TRESBillingWeights is not defined then the job is
              billed against the total number of allocated CPUs.

              NOTE: TRESBillingWeights doesn't affect job priority
              directly as it is currently not used for the size of the
              job. If you want TRES' to play a role in the job's
              priority then refer to the PriorityWeightTRES option.

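       The weights from the example above would appear on a partition
       definition in slurm.conf; the partition and node names below are
       illustrative:

```
PartitionName=gpu Nodes=tux[0-15] TRESBillingWeights="CPU=1.0,Mem=0.25G,GRES/gpu=2.0"
```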

PROLOG AND EPILOG SCRIPTS
       There are a variety of prolog and epilog program options that
       execute with various permissions and at various times. The four
       options most likely to be used are: Prolog and Epilog (executed
       once on each compute node for each job) plus PrologSlurmctld and
       EpilogSlurmctld (executed once on the ControlMachine for each
       job).

       NOTE: Standard output and error messages are normally not
       preserved. Explicitly write output and error messages to an
       appropriate location if you wish to preserve that information.

       NOTE: By default the Prolog script is ONLY run on any individual
       node when it first sees a job step from a new allocation; it
       does not run the Prolog immediately when an allocation is
       granted. If no job steps from an allocation are run on a node,
       it will never run the Prolog for that allocation. This Prolog
       behaviour can be changed by the PrologFlags parameter. The
       Epilog, on the other hand, always runs on every node of an
       allocation when the allocation is released.

       If the Epilog fails (returns a non-zero exit code), this will
       result in the node being set to a DRAIN state. If the
       EpilogSlurmctld fails (returns a non-zero exit code), this will
       only be logged. If the Prolog fails (returns a non-zero exit
       code), this will result in the node being set to a DRAIN state
       and the job being requeued in a held state unless
       nohold_on_prolog_fail is configured in SchedulerParameters. If
       the PrologSlurmctld fails (returns a non-zero exit code), this
       will result in the job being requeued to execute on another node
       if possible. Only batch jobs can be requeued. Interactive jobs
       (salloc and srun) will be cancelled if the PrologSlurmctld
       fails.

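       As a concrete illustration, a minimal Prolog script might
       perform a node health check and log its own activity explicitly,
       since stdout/stderr are not preserved. The log path and the
       specific check below are assumptions of this sketch, not Slurm
       requirements:

```shell
#!/bin/sh
# Minimal Prolog sketch. Runs as root on each compute node before the
# first job step of an allocation. A non-zero exit drains the node and
# requeues the job in a held state (unless nohold_on_prolog_fail is
# set in SchedulerParameters).

# Stdout/stderr are discarded, so log explicitly. A real deployment
# would log somewhere like /var/log/slurm/; a relative default is used
# here for illustration.
LOG=${PROLOG_LOG:-./prolog.log}

echo "$(date '+%F %T') prolog start job=${SLURM_JOB_ID:-unknown} user=${SLURM_JOB_USER:-unknown}" >> "$LOG"

# Illustrative health check: fail the Prolog if local scratch is absent.
if [ ! -d /tmp ]; then
    echo "$(date '+%F %T') prolog FAIL job=${SLURM_JOB_ID:-unknown}: no /tmp" >> "$LOG"
    exit 1
fi
```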

       Information about the job is passed to the script using
       environment variables. Unless otherwise specified, these
       environment variables are available to all of the programs.

       SLURM_ARRAY_JOB_ID
              If this job is part of a job array, this will be set to
              the job ID. Otherwise it will not be set. To reference
              this specific task of a job array, combine
              SLURM_ARRAY_JOB_ID with SLURM_ARRAY_TASK_ID (e.g.
              "scontrol update ${SLURM_ARRAY_JOB_ID}_${SLURM_ARRAY_TASK_ID} ...").
              Available in PrologSlurmctld and EpilogSlurmctld only.

       SLURM_ARRAY_TASK_ID
              If this job is part of a job array, this will be set to
              the task ID. Otherwise it will not be set. To reference
              this specific task of a job array, combine
              SLURM_ARRAY_JOB_ID with SLURM_ARRAY_TASK_ID (e.g.
              "scontrol update ${SLURM_ARRAY_JOB_ID}_${SLURM_ARRAY_TASK_ID} ...").
              Available in PrologSlurmctld and EpilogSlurmctld only.

       SLURM_ARRAY_TASK_MAX
              If this job is part of a job array, this will be set to
              the maximum task ID. Otherwise it will not be set.
              Available in PrologSlurmctld and EpilogSlurmctld only.

       SLURM_ARRAY_TASK_MIN
              If this job is part of a job array, this will be set to
              the minimum task ID. Otherwise it will not be set.
              Available in PrologSlurmctld and EpilogSlurmctld only.

       SLURM_ARRAY_TASK_STEP
              If this job is part of a job array, this will be set to
              the step size of task IDs. Otherwise it will not be set.
              Available in PrologSlurmctld and EpilogSlurmctld only.

       SLURM_CLUSTER_NAME
              Name of the cluster executing the job.

       SLURM_JOB_ACCOUNT
              Account name used for the job. Available in
              PrologSlurmctld and EpilogSlurmctld only.

       SLURM_JOB_CONSTRAINTS
              Features required to run the job. Available in Prolog,
              PrologSlurmctld and EpilogSlurmctld only.

       SLURM_JOB_DERIVED_EC
              The highest exit code of all of the job steps. Available
              in EpilogSlurmctld only.

       SLURM_JOB_EXIT_CODE
              The exit code of the job script (or salloc). The value is
              the status as returned by the wait() system call (see
              wait(2)). Available in EpilogSlurmctld only.

       SLURM_JOB_EXIT_CODE2
              The exit code of the job script (or salloc). The value
              has the format <exit>:<sig>. The first number is the exit
              code, typically as set by the exit() function. The second
              number is the signal that caused the process to terminate
              if it was terminated by a signal. Available in
              EpilogSlurmctld only.

       SLURM_JOB_GID
              Group ID of the job's owner. Available in
              PrologSlurmctld, EpilogSlurmctld and TaskProlog only.

       SLURM_JOB_GPUS
              GPU IDs allocated to the job (if any). Available in the
              Prolog only.

       SLURM_JOB_GROUP
              Group name of the job's owner. Available in
              PrologSlurmctld and EpilogSlurmctld only.

       SLURM_JOB_ID
              Job ID. CAUTION: If this job is the first task of a job
              array, then Slurm commands using this job ID will refer
              to the entire job array rather than this specific task of
              the job array.

       SLURM_JOB_NAME
              Name of the job. Available in PrologSlurmctld and
              EpilogSlurmctld only.

       SLURM_JOB_NODELIST
              Nodes assigned to job. A Slurm hostlist expression.
              "scontrol show hostnames" can be used to convert this to
              a list of individual host names. Available in
              PrologSlurmctld and EpilogSlurmctld only.

       SLURM_JOB_PARTITION
              Partition that job runs in. Available in Prolog,
              PrologSlurmctld and EpilogSlurmctld only.

       SLURM_JOB_UID
              User ID of the job's owner.

       SLURM_JOB_USER
              User name of the job's owner.

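       For example, an EpilogSlurmctld script could use these variables
       to report job completion; the <exit>:<sig> value in
       SLURM_JOB_EXIT_CODE2 splits cleanly on the colon with POSIX
       parameter expansion. The reporting logic itself is illustrative:

```shell
#!/bin/sh
# EpilogSlurmctld sketch: runs on the controller as SlurmUser after
# each job. The environment variables used are those documented above.

code=${SLURM_JOB_EXIT_CODE2:-0:0}
rc=${code%%:*}    # exit code, as set by exit()
sig=${code##*:}   # signal that terminated the job, if any

echo "job ${SLURM_JOB_ID:-?} (${SLURM_JOB_NAME:-?}) for ${SLURM_JOB_USER:-?} ended: rc=$rc sig=$sig"

# SLURM_JOB_NODELIST is a hostlist expression; expand it to individual
# host names where the scontrol command is available.
if command -v scontrol >/dev/null 2>&1 && [ -n "${SLURM_JOB_NODELIST:-}" ]; then
    scontrol show hostnames "$SLURM_JOB_NODELIST"
fi
```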

NETWORK TOPOLOGY
       Slurm is able to optimize job allocations to minimize network
       contention. Special Slurm logic is used to optimize allocations
       on systems with a three-dimensional interconnect, and
       information about configuring those systems is available on web
       pages available here: <https://slurm.schedmd.com/>. For a
       hierarchical network, Slurm needs to have detailed information
       about how nodes are configured on the network switches.

       Given network topology information, Slurm allocates all of a
       job's resources onto a single leaf of the network (if possible)
       using a best-fit algorithm. Otherwise it will allocate a job's
       resources onto multiple leaf switches so as to minimize the use
       of higher-level switches. The TopologyPlugin parameter controls
       which plugin is used to collect network topology information.
       The only values presently supported are "topology/3d_torus"
       (default for Cray XT/XE systems, performs best-fit logic over
       three-dimensional topology), "topology/none" (default for other
       systems, best-fit logic over one-dimensional topology), and
       "topology/tree" (determine the network topology based upon
       information contained in a topology.conf file, see "man
       topology.conf" for more information). Future plugins may gather
       topology information directly from the network. The topology
       information is optional. If not provided, Slurm will perform a
       best-fit algorithm assuming the nodes are in a one-dimensional
       array as configured and the communications cost is related to
       the node distance in this array.

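       With TopologyPlugin=topology/tree, the switch hierarchy is
       described in topology.conf. For instance, a hypothetical
       two-leaf tree over nodes dev[0-25] might read:

```
SwitchName=leaf1 Nodes=dev[0-12]
SwitchName=leaf2 Nodes=dev[13-25]
SwitchName=core  Switches=leaf[1-2]
```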

RELOCATING CONTROLLERS
       If the cluster's computers used for the primary or backup
       controller will be out of service for an extended period of
       time, it may be desirable to relocate them. In order to do so,
       follow this procedure:

       1. Stop the Slurm daemons
       2. Modify the slurm.conf file appropriately
       3. Distribute the updated slurm.conf file to all nodes
       4. Restart the Slurm daemons

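       Assuming systemd-managed daemons and some site-specific means of
       distributing files (both assumptions of this sketch), the
       procedure might look like:

```
# 1. Stop the daemons
systemctl stop slurmctld     # on the controller(s)
systemctl stop slurmd        # on each compute node
# 2. Edit SlurmctldHost (or ControlMachine/BackupController) in slurm.conf
# 3. Copy the updated slurm.conf to every node (site-specific mechanism)
# 4. Restart the daemons
systemctl start slurmd       # on each compute node
systemctl start slurmctld    # on the controller(s)
```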
       There should be no loss of any running or pending jobs. Ensure
       that any nodes added to the cluster have the current slurm.conf
       file installed.

       CAUTION: If two nodes are simultaneously configured as the
       primary controller (two nodes on which ControlMachine specifies
       the local host and the slurmctld daemon is executing on each),
       system behavior will be destructive. If a compute node has an
       incorrect ControlMachine or BackupController parameter, that
       node may be rendered unusable, but no other harm will result.


EXAMPLE
       #
       # Sample /etc/slurm.conf for dev[0-25].llnl.gov
       # Author: John Doe
       # Date: 11/06/2001
       #
       SlurmctldHost=dev0(12.34.56.78)  # Primary server
       SlurmctldHost=dev1(12.34.56.79)  # Backup server
       #
       AuthType=auth/munge
       Epilog=/usr/local/slurm/epilog
       Prolog=/usr/local/slurm/prolog
       FirstJobId=65536
       InactiveLimit=120
       JobCompType=jobcomp/filetxt
       JobCompLoc=/var/log/slurm/jobcomp
       KillWait=30
       MaxJobCount=10000
       MinJobAge=3600
       PluginDir=/usr/local/lib:/usr/local/slurm/lib
       ReturnToService=0
       SchedulerType=sched/backfill
       SlurmctldLogFile=/var/log/slurm/slurmctld.log
       SlurmdLogFile=/var/log/slurm/slurmd.log
       SlurmctldPort=7002
       SlurmdPort=7003
       SlurmdSpoolDir=/var/spool/slurmd.spool
       StateSaveLocation=/var/spool/slurm.state
       SwitchType=switch/none
       TmpFS=/tmp
       WaitTime=30
       JobCredentialPrivateKey=/usr/local/slurm/private.key
       JobCredentialPublicCertificate=/usr/local/slurm/public.cert
       #
       # Node Configurations
       #
       NodeName=DEFAULT CPUs=2 RealMemory=2000 TmpDisk=64000
       NodeName=DEFAULT State=UNKNOWN
       NodeName=dev[0-25] NodeAddr=edev[0-25] Weight=16
       # Update records for specific DOWN nodes
       DownNodes=dev20 State=DOWN Reason="power,ETA=Dec25"
       #
       # Partition Configurations
       #
       PartitionName=DEFAULT MaxTime=30 MaxNodes=10 State=UP
       PartitionName=debug Nodes=dev[0-8,18-25] Default=YES
       PartitionName=batch Nodes=dev[9-17] MinNodes=4
       PartitionName=long Nodes=dev[9-17] MaxTime=120 AllowGroups=admin

INCLUDE MODIFIERS
       The "Include" key word can be used with modifiers within the
       specified pathname. These modifiers will be replaced with the
       cluster name or other information depending on which modifier is
       specified. If the included file is not an absolute path name
       (i.e. it does not start with a slash), it will be searched for
       in the same directory as the slurm.conf file.

       %c     Cluster name specified in the slurm.conf will be used.

       EXAMPLE
       ClusterName=linux
       include /home/slurm/etc/%c_config
       # Above line interpreted as
       # "include /home/slurm/etc/linux_config"

FILE AND DIRECTORY PERMISSIONS
       There are three classes of files: Files used by slurmctld must
       be accessible by user SlurmUser and accessible by the primary
       and backup control machines. Files used by slurmd must be
       accessible by user root and accessible from every compute node.
       A few files need to be accessible by normal users on all login
       and compute nodes. While many files and directories are listed
       below, most of them will not be used with most configurations.

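       Using paths from the sample configuration above and assuming
       SlurmUser=slurm, ownership and permissions could be set along
       these lines:

```
chown -R slurm /var/spool/slurm.state      # StateSaveLocation: SlurmUser only
chmod 700 /var/spool/slurm.state
chown slurm /var/log/slurm/slurmctld.log   # slurmctld log: writable by SlurmUser
chmod 644 /var/log/slurm/slurmctld.log     # readable by all users
```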
       AccountingStorageLoc
              If this specifies a file, it must be writable by user
              SlurmUser. The file must be accessible by the primary and
              backup control machines. It is recommended that the file
              be readable by all users from login and compute nodes.

       Epilog Must be executable by user root. It is recommended that
              the file be readable by all users. The file must exist on
              every compute node.

       EpilogSlurmctld
              Must be executable by user SlurmUser. It is recommended
              that the file be readable by all users. The file must be
              accessible by the primary and backup control machines.

       HealthCheckProgram
              Must be executable by user root. It is recommended that
              the file be readable by all users. The file must exist on
              every compute node.

       JobCheckpointDir
              Must be writable by user SlurmUser and no other users.
              The file must be accessible by the primary and backup
              control machines.

       JobCompLoc
              If this specifies a file, it must be writable by user
              SlurmUser. The file must be accessible by the primary and
              backup control machines.

       JobCredentialPrivateKey
              Must be readable only by user SlurmUser and writable by
              no other users. The file must be accessible by the
              primary and backup control machines.

       JobCredentialPublicCertificate
              Readable to all users on all nodes. Must not be writable
              by regular users.

       MailProg
              Must be executable by user SlurmUser. Must not be
              writable by regular users. The file must be accessible by
              the primary and backup control machines.

       Prolog Must be executable by user root. It is recommended that
              the file be readable by all users. The file must exist on
              every compute node.

       PrologSlurmctld
              Must be executable by user SlurmUser. It is recommended
              that the file be readable by all users. The file must be
              accessible by the primary and backup control machines.

       ResumeProgram
              Must be executable by user SlurmUser. The file must be
              accessible by the primary and backup control machines.

       SallocDefaultCommand
              Must be executable by all users. The file must exist on
              every login and compute node.

       slurm.conf
              Readable to all users on all nodes. Must not be writable
              by regular users.

       SlurmctldLogFile
              Must be writable by user SlurmUser. The file must be
              accessible by the primary and backup control machines.

       SlurmctldPidFile
              Must be writable by user root. Preferably writable and
              removable by SlurmUser. The file must be accessible by
              the primary and backup control machines.

       SlurmdLogFile
              Must be writable by user root. A distinct file must exist
              on each compute node.

       SlurmdPidFile
              Must be writable by user root. A distinct file must exist
              on each compute node.

       SlurmdSpoolDir
              Must be writable by user root. A distinct file must exist
              on each compute node.

       SrunEpilog
              Must be executable by all users. The file must exist on
              every login and compute node.

       SrunProlog
              Must be executable by all users. The file must exist on
              every login and compute node.

       StateSaveLocation
              Must be writable by user SlurmUser. The file must be
              accessible by the primary and backup control machines.

       SuspendProgram
              Must be executable by user SlurmUser. The file must be
              accessible by the primary and backup control machines.

       TaskEpilog
              Must be executable by all users. The file must exist on
              every compute node.

       TaskProlog
              Must be executable by all users. The file must exist on
              every compute node.

       UnkillableStepProgram
              Must be executable by user SlurmUser. The file must be
              accessible by the primary and backup control machines.

LOGGING
       Note that while Slurm daemons create log files and other files
       as needed, they treat the lack of parent directories as a fatal
       error. This prevents the daemons from running if critical file
       systems are not mounted and will minimize the risk of
       cold-starting (starting without preserving jobs).

       Log files and job accounting files may need to be created/owned
       by the "SlurmUser" uid to be successfully accessed. Use the
       "chown" and "chmod" commands to set the ownership and
       permissions appropriately. See the section FILE AND DIRECTORY
       PERMISSIONS for information about the various files and
       directories used by Slurm.

       It is recommended that the logrotate utility be used to ensure
       that various log files do not become too large. This also
       applies to text files used for accounting, process tracking, and
       the slurmdbd log if they are used.

       Here is a sample logrotate configuration. Make appropriate site
       modifications and save as /etc/logrotate.d/slurm on all nodes.
       See the logrotate man page for more details.

       ##
       # Slurm Logrotate Configuration
       ##
       /var/log/slurm/*.log {
            compress
            missingok
            nocopytruncate
            nodelaycompress
            nomail
            notifempty
            noolddir
            rotate 5
            sharedscripts
            size=5M
            create 640 slurm root
            postrotate
                 pkill -x --signal SIGUSR2 slurmctld
                 pkill -x --signal SIGUSR2 slurmd
                 pkill -x --signal SIGUSR2 slurmdbd
                 exit 0
            endscript
       }

COPYING
       Copyright (C) 2002-2007 The Regents of the University of
       California. Produced at Lawrence Livermore National Laboratory
       (cf, DISCLAIMER). Copyright (C) 2008-2010 Lawrence Livermore
       National Security. Copyright (C) 2010-2017 SchedMD LLC.

       This file is part of Slurm, a resource management program. For
       details, see <https://slurm.schedmd.com/>.

       Slurm is free software; you can redistribute it and/or modify it
       under the terms of the GNU General Public License as published
       by the Free Software Foundation; either version 2 of the
       License, or (at your option) any later version.

       Slurm is distributed in the hope that it will be useful, but
       WITHOUT ANY WARRANTY; without even the implied warranty of
       MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
       General Public License for more details.

FILES
       /etc/slurm.conf

SEE ALSO
       cgroup.conf(5), gethostbyname(3), getrlimit(2), gres.conf(5),
       group(5), hostname(1), scontrol(1), slurmctld(8), slurmd(8),
       slurmdbd(8), slurmdbd.conf(5), srun(1), spank(8), syslog(2),
       topology.conf(5)


November 2019             Slurm Configuration File            slurm.conf(5)