cgroup.conf(5)             Slurm Configuration File             cgroup.conf(5)

NAME

       cgroup.conf - Slurm configuration file for the cgroup support

DESCRIPTION

       cgroup.conf is an ASCII file which defines parameters used by Slurm's
       Linux cgroup related plugins.  The file will always be located in the
       same directory as the slurm.conf.

       Parameter names are case insensitive.  Any text following a "#" in the
       configuration file is treated as a comment through the end of that
       line.  Changes to the configuration file take effect upon restart of
       Slurm daemons, daemon receipt of the SIGHUP signal, or execution of the
       command "scontrol reconfigure" unless otherwise noted.
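
       The comment and case rules above can be illustrated with a minimal
       Python sketch.  This is not Slurm's own parser: the function name
       parse_cgroup_conf is hypothetical, and the sketch deliberately ignores
       anything beyond simple Name=Value lines.

```python
def parse_cgroup_conf(text):
    """Parse cgroup.conf-style text into a dict keyed by lowercased name.

    Hypothetical helper for illustration only, not part of Slurm.
    """
    params = {}
    for raw_line in text.splitlines():
        # Everything after "#" is a comment through the end of the line.
        line = raw_line.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        name, value = line.split("=", 1)
        # Parameter names are case insensitive, so normalize them.
        params[name.strip().lower()] = value.strip()
    return params


sample = """
###
# Slurm cgroup support configuration file.
###
ConstrainCores=yes
constrainramspace=yes
ConstrainKmemSpace=no        #avoid known Kernel issues
"""
print(parse_cgroup_conf(sample))
```

       Lowercasing the names means ConstrainRAMSpace and constrainramspace
       resolve to the same entry.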

       For general Slurm cgroups information, see the Cgroups Guide at
       <https://slurm.schedmd.com/cgroups.html>.

       The following cgroup.conf parameters are defined to control the general
       behavior of Slurm cgroup plugins.

       CgroupAutomount=<yes|no>
              In cgroup/v1 this parameter detects if
              /sys/fs/cgroup/<controller_name> is available, and if not it
              tries to mount the filesystem.  In cgroup/v2 this parameter only
              takes effect if IgnoreSystemd is set, and enables the required
              controllers on slurmd and slurmstepd cgroup directories.  This
              parameter is intended for development and testing with
              cgroup/v2.

       CgroupMountpoint=PATH
              Only intended for development and testing.  Specifies the PATH
              under which cgroup controllers should be mounted.  The default
              PATH is /sys/fs/cgroup.

       CgroupPlugin=<cgroup/v1|cgroup/v2|autodetect>
              Specify the plugin to be used when interacting with the cgroup
              subsystem.  Supported values at the moment are "cgroup/v1",
              which supports the legacy interface of cgroup v1, "cgroup/v2"
              for the unified architecture, or "autodetect", which tries to
              determine which cgroup version your system provides.  This is
              useful if nodes have support for different cgroup versions.  The
              default value is "autodetect".
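
              To see which cgroup version a node exposes, one common heuristic
              is to look for the cgroup.controllers file, which exists at the
              root of a cgroup v2 (unified) hierarchy.  The Python sketch
              below applies that heuristic.  It is an assumption that this
              resembles what "autodetect" does; the function name
              detect_cgroup_version is hypothetical.

```python
import os


def detect_cgroup_version(mountpoint="/sys/fs/cgroup"):
    """Guess "cgroup/v2" or "cgroup/v1" from the layout under mountpoint.

    Hypothetical helper; Slurm's own autodetect logic may differ.
    """
    # A cgroup v2 (unified) hierarchy exposes cgroup.controllers at its root.
    if os.path.isfile(os.path.join(mountpoint, "cgroup.controllers")):
        return "cgroup/v2"
    # Anything else is treated as v1 here, including "no cgroups mounted".
    return "cgroup/v1"


print(detect_cgroup_version())
```

              The fallback branch cannot tell a cgroup v1 system from one with
              no cgroup filesystem mounted, so treat its answer as a hint
              only.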

       IgnoreSystemd=<yes|no>
              Only for cgroup/v2 and for development and testing.  It will
              avoid any call to dbus and contact with systemd, and instead
              will prepare all the cgroup hierarchy manually.  This option is
              dangerous in systems with systemd since the cgroup can be
              modified by systemd and cause issues to jobs.

       IgnoreSystemdOnFailure=<yes|no>
              Only for cgroup/v2 and for development and testing.  It has
              similar functionality to IgnoreSystemd but only in the case that
              a dbus call does not succeed.

TASK/CGROUP PLUGIN

       The following cgroup.conf parameters are defined to control the
       behavior of this particular plugin:

       AllowedKmemSpace=<number>
              Only for cgroup/v1.  Constrain the job cgroup kernel memory to
              this amount of the allocated memory, specified in bytes.  The
              AllowedKmemSpace must be between the upper and lower memory
              limits, specified by MaxKmemPercent and MinKmemSpace,
              respectively.  If AllowedKmemSpace goes beyond the upper or
              lower limit, it will be reset to that upper or lower limit,
              whichever has been exceeded.
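
              The reset-to-bound rule can be sketched in Python as a simple
              clamp.  The function name clamp_kmem_limit and the example
              bounds are hypothetical; in particular, the upper bound is
              passed in directly rather than derived from MaxKmemPercent, and
              "30M" is read as 30 MiB as an assumption.

```python
def clamp_kmem_limit(allowed_kmem_bytes, lower_bytes, upper_bytes):
    """Reset a requested Kmem limit to whichever bound it exceeds.

    Hypothetical helper for illustration only, not part of Slurm.
    """
    if allowed_kmem_bytes < lower_bytes:
        return lower_bytes
    if allowed_kmem_bytes > upper_bytes:
        return upper_bytes
    return allowed_kmem_bytes


MIB = 1024 * 1024
lower = 30 * MIB      # MinKmemSpace default of 30M, read as MiB (assumption)
upper = 4096 * MIB    # hypothetical upper bound standing in for MaxKmemPercent

print(clamp_kmem_limit(10 * MIB, lower, upper) // MIB)     # 30
print(clamp_kmem_limit(512 * MIB, lower, upper) // MIB)    # 512
print(clamp_kmem_limit(8192 * MIB, lower, upper) // MIB)   # 4096
```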

       AllowedRAMSpace=<number>
              Constrain the job/step cgroup RAM to this percentage of the
              allocated memory.  The percentage supplied may be expressed as a
              floating point number, e.g. 101.5.  Sets the cgroup soft memory
              limit at the allocated memory size and then sets the job/step
              hard memory limit at (AllowedRAMSpace/100) * allocated memory.
              If the job/step exceeds the hard limit, then it might trigger
              Out Of Memory (OOM) events (including oom-kill) which will be
              logged to the kernel log ring buffer (dmesg in Linux).  Setting
              AllowedRAMSpace above 100 may cause system Out of Memory (OOM)
              events as it allows the job/step to allocate more memory than is
              configured for the node.  Reducing the configured node available
              memory to avoid system OOM events is suggested.  Setting
              AllowedRAMSpace below 100 will result in jobs receiving less
              memory than allocated, and the soft memory limit will be set to
              the same value as the hard limit.  Also see ConstrainRAMSpace.
              The default value is 100.
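
              The soft and hard limits described above can be worked through
              with a short Python sketch.  The function name ram_limits and
              the use of megabytes are assumptions made for illustration; the
              arithmetic follows the description in this entry.

```python
def ram_limits(allocated_mb, allowed_ram_space=100.0):
    """Return (soft_mb, hard_mb) for a job/step cgroup.

    Hypothetical helper for illustration only, not part of Slurm.
    """
    allocated_mb = float(allocated_mb)
    # Hard limit: (AllowedRAMSpace/100) * allocated memory.
    hard_mb = allocated_mb * allowed_ram_space / 100.0
    # Soft limit: the allocation, lowered to the hard limit when
    # AllowedRAMSpace is below 100.
    soft_mb = min(allocated_mb, hard_mb)
    return soft_mb, hard_mb


print(ram_limits(4000))          # (4000.0, 4000.0)
print(ram_limits(4000, 101.5))   # (4000.0, 4060.0)
print(ram_limits(4000, 50))      # (2000.0, 2000.0)
```

              The last call shows the below-100 case, where both limits
              collapse to the same value.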

       AllowedSwapSpace=<number>
              Constrain the job cgroup swap space to this percentage of the
              allocated memory.  The default value is 0, which means that
              RAM+Swap will be limited to AllowedRAMSpace.  The supplied
              percentage may be expressed as a floating point number, e.g.
              50.5.  If the limit is exceeded, the job steps will be killed
              and a warning message will be written to standard error.  Also
              see ConstrainSwapSpace.  NOTE: Setting AllowedSwapSpace to 0
              does not restrict the Linux kernel from using swap space.  To
              control how the kernel uses swap space, see MemorySwappiness.
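
              As a rough worked example, the Python sketch below computes a
              RAM+Swap limit by adding the swap percentage on top of
              AllowedRAMSpace.  That additive reading is an assumption drawn
              from the ConstrainSwapSpace description, and the function name
              ram_plus_swap_limit_mb is hypothetical.

```python
def ram_plus_swap_limit_mb(allocated_mb, allowed_ram_space=100.0,
                           allowed_swap_space=0.0):
    """Return an estimated RAM+Swap limit in MB.

    Hypothetical helper; assumes the swap percentage is added to the RAM
    percentage, both relative to the allocated memory.
    """
    percent = allowed_ram_space + allowed_swap_space
    return float(allocated_mb) * percent / 100.0


# Default AllowedSwapSpace=0: RAM+Swap is limited to AllowedRAMSpace.
print(ram_plus_swap_limit_mb(4000))                 # 4000.0
# AllowedSwapSpace=50.5 adds half of the allocation again as swap.
print(ram_plus_swap_limit_mb(4000, 100.0, 50.5))    # 6020.0
```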

       ConstrainCores=<yes|no>
              If configured to "yes" then constrain allowed cores to the
              subset of allocated resources.  This functionality makes use of
              the cpuset subsystem.  Due to a bug fixed in version 1.11.5 of
              HWLOC, the task/affinity plugin may be required in addition to
              task/cgroup for this to function properly.  The default value is
              "no".

       ConstrainDevices=<yes|no>
              If configured to "yes" then constrain the job's allowed devices
              based on GRES allocated resources.  It uses the devices
              subsystem for that.  The default value is "no".

       ConstrainKmemSpace=<yes|no>
              Only for cgroup/v1.  If configured to "yes" then constrain the
              job's Kmem RAM usage in addition to RAM usage.  Only takes
              effect if ConstrainRAMSpace is set to "yes".  If enabled, the
              job's Kmem limit will be assigned the value of AllowedKmemSpace
              or the value coming from MaxKmemPercent.  The default value is
              "no", which will leave the Kmem setting untouched by Slurm.
              Also see AllowedKmemSpace, MaxKmemPercent.

       ConstrainRAMSpace=<yes|no>
              If configured to "yes" then constrain the job's RAM usage by
              setting the memory soft limit to the allocated memory and the
              hard limit to the allocated memory * (AllowedRAMSpace/100).  The
              default value is "no", in which case the job's RAM limit will be
              set to its swap space limit if ConstrainSwapSpace is set to
              "yes".  Also see AllowedSwapSpace, AllowedRAMSpace and
              ConstrainSwapSpace.

              NOTE: When using ConstrainRAMSpace, if the combined memory used
              by all processes in a step is greater than the limit, then the
              kernel will trigger an OOM event, killing one or more of the
              processes in the step.  The step state will be marked as OOM,
              but the step itself will keep running and other processes in the
              step may continue to run as well.  This differs from the
              behavior of OverMemoryKill, where the whole step will be
              killed/cancelled.

              NOTE: When enabled, ConstrainRAMSpace can lead to a noticeable
              decline in per-node job throughput.  Sites with high-throughput
              requirements should carefully weigh the tradeoff between
              per-node throughput and the potential problems that can arise
              from unconstrained memory usage on the node.  See
              <https://slurm.schedmd.com/high_throughput.html> for further
              discussion.

       ConstrainSwapSpace=<yes|no>
              If configured to "yes" then constrain the job's swap space
              usage.  The default value is "no".  Note that when set to "yes"
              and ConstrainRAMSpace is set to "no", AllowedRAMSpace is
              automatically set to 100% in order to limit the RAM+Swap amount
              to 100% of the job's requirement plus the percent of allowed
              swap space.  This amount is thus set to both RAM and RAM+Swap
              limits.  This means that in that particular case,
              ConstrainRAMSpace is automatically enabled with the same limit
              as the one used to constrain swap space.  Also see
              AllowedSwapSpace.

       MaxRAMPercent=PERCENT
              Set an upper bound in percent of total RAM on the RAM constraint
              for a job.  This will be the memory constraint applied to jobs
              that are not explicitly allocated memory by Slurm (i.e. Slurm's
              select plugin is not configured to manage memory allocations).
              The PERCENT may be an arbitrary floating point number.  The
              default value is 100.

       MaxSwapPercent=PERCENT
              Set an upper bound (in percent of total RAM) on the amount of
              RAM+Swap that may be used for a job.  This will be the swap
              limit applied to jobs on systems where memory is not being
              explicitly allocated to a job.  The PERCENT may be an arbitrary
              floating point number between 0 and 100.  The default value is
              100.

       MaxKmemPercent=PERCENT
              Only for cgroup/v1.  Set an upper bound in percent of total RAM
              as the maximum Kmem for a job.  The PERCENT may be an arbitrary
              floating point number; however, the product of MaxKmemPercent
              and job requested memory has to fall between MinKmemSpace and
              job requested memory, otherwise the boundary value is used.  The
              default value is 100.

       MemorySwappiness=<number>
              Only for cgroup/v1.  Configure the kernel's priority for
              swapping out anonymous pages (such as program data) versus file
              cache pages for the job cgroup.  Valid values are between 0 and
              100, inclusive.  A value of 0 prevents the kernel from swapping
              out program data.  A value of 100 gives equal priority to
              swapping out file cache or anonymous pages.  If not set, then
              the kernel's default swappiness value will be used.
              ConstrainSwapSpace must be set to yes in order for this
              parameter to be applied.

       MinKmemSpace=<number>
              Only for cgroup/v1.  Set a lower bound (in MB) on the memory
              limits defined by AllowedKmemSpace.  The default limit is 30M.

       MinRAMSpace=<number>
              Set a lower bound (in MB) on the memory limits defined by
              AllowedRAMSpace and AllowedSwapSpace.  This prevents
              accidentally creating a memory cgroup with such a low limit that
              slurmstepd is immediately killed due to lack of RAM.  The
              default limit is 30M.
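
              How the MinRAMSpace lower bound and the MaxRAMPercent upper
              bound might combine can be sketched in Python.  The function
              name bounded_ram_limit_mb is hypothetical, and the order in
              which the two bounds are applied (upper first, then lower) is an
              assumption rather than documented behavior.

```python
def bounded_ram_limit_mb(limit_mb, total_ram_mb, min_ram_space_mb=30.0,
                         max_ram_percent=100.0):
    """Clamp a computed RAM limit between MinRAMSpace and MaxRAMPercent.

    Hypothetical helper for illustration only, not part of Slurm.
    """
    # MaxRAMPercent is a percentage of the node's total RAM.
    upper_mb = float(total_ram_mb) * max_ram_percent / 100.0
    # Assumption: apply the upper bound first, then let the lower bound win.
    return max(min(float(limit_mb), upper_mb), float(min_ram_space_mb))


print(bounded_ram_limit_mb(8, 64000))                             # 30.0
print(bounded_ram_limit_mb(4000, 64000))                          # 4000.0
print(bounded_ram_limit_mb(100000, 64000, max_ram_percent=90))    # 57600.0
```

              The first call shows the case MinRAMSpace guards against: an
              8 MB limit is raised to 30 MB.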

DISTRIBUTION-SPECIFIC NOTES

       Debian and derivatives (e.g. Ubuntu) usually exclude the memory and
       memsw (swap) cgroups by default.  To include them, add the following
       parameters to the kernel command line: cgroup_enable=memory
       swapaccount=1

       This can usually be placed in /etc/default/grub inside the
       GRUB_CMDLINE_LINUX variable.  A command such as update-grub must be run
       after updating the file.
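
       After rebooting, it can be useful to confirm that both parameters
       reached the kernel.  The Python sketch below checks a command line
       string for them.  The function name missing_kernel_args and the example
       command line are hypothetical; on a running Linux system the real
       string can be read from /proc/cmdline.

```python
REQUIRED_ARGS = ("cgroup_enable=memory", "swapaccount=1")


def missing_kernel_args(cmdline, required=REQUIRED_ARGS):
    """Return the required arguments absent from a kernel command line.

    Hypothetical helper for illustration only, not part of Slurm.
    """
    present = set(cmdline.split())
    return [arg for arg in required if arg not in present]


# Made-up command line; read /proc/cmdline to check a real system.
example = "BOOT_IMAGE=/vmlinuz root=/dev/sda1 ro quiet cgroup_enable=memory"
print(missing_kernel_args(example))    # ['swapaccount=1']
```

       An empty list means both parameters are present.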

EXAMPLE

       /etc/slurm/cgroup.conf:
              This example cgroup.conf file shows a configuration that enables
              the more commonly used cgroup enforcement mechanisms.

              ###
              # Slurm cgroup support configuration file.
              ###
              CgroupAutomount=yes
              CgroupMountpoint=/sys/fs/cgroup
              ConstrainCores=yes
              ConstrainDevices=yes
              ConstrainKmemSpace=no        #avoid known Kernel issues
              ConstrainRAMSpace=yes
              ConstrainSwapSpace=yes

       /etc/slurm/slurm.conf:
              These are the entries required in slurm.conf to activate the
              cgroup enforcement mechanisms.  Make sure that the node
              definitions in your slurm.conf closely match the configuration
              as shown by "slurmd -C".  Either MemSpecLimit should be set or
              RealMemory should be defined with less than the actual amount of
              memory for a node to ensure that all system/non-job processes
              will have sufficient memory at all times.  Sites should also
              configure pam_slurm_adopt to ensure users cannot escape the
              cgroups via ssh.

              ###
              # Slurm configuration entries for cgroups
              ###
              ProctrackType=proctrack/cgroup
              TaskPlugin=task/cgroup,task/affinity
              JobAcctGatherType=jobacct_gather/cgroup #optional for gathering metrics
              PrologFlags=Contain                     #X11 flag is also suggested

COPYING

       Copyright (C) 2010-2012 Lawrence Livermore National Security.  Produced
       at Lawrence Livermore National Laboratory (cf, DISCLAIMER).
       Copyright (C) 2010-2022 SchedMD LLC.

       This file is part of Slurm, a resource management program.  For
       details, see <https://slurm.schedmd.com/>.

       Slurm is free software; you can redistribute it and/or modify it under
       the terms of the GNU General Public License as published by the Free
       Software Foundation; either version 2 of the License, or (at your
       option) any later version.

       Slurm is distributed in the hope that it will be useful, but WITHOUT
       ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
       FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
       for more details.

SEE ALSO

       slurm.conf(5)

April 2022                 Slurm Configuration File             cgroup.conf(5)