cgroup.conf(5) Slurm Configuration File cgroup.conf(5)

NAME
cgroup.conf - Slurm configuration file for the cgroup support

DESCRIPTION
cgroup.conf is an ASCII file which defines parameters used by Slurm's
Linux cgroup related plugins. The file is always located in the same
directory as slurm.conf.

Parameter names are case insensitive. Any text following a "#" in the
configuration file is treated as a comment through the end of that
line. Changes to the configuration file take effect upon restart of
Slurm daemons, daemon receipt of the SIGHUP signal, or execution of the
command "scontrol reconfigure" unless otherwise noted.
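
The two syntax rules above (case-insensitive parameter names, "#" comments running to the end of the line) can be illustrated with a minimal sketch. This is not Slurm's own parser; the helper name parse_conf and the sample text are hypothetical and exist only for illustration.

```python
def parse_conf(text):
    """Parse 'Name=Value' lines; keys are lowercased, '#' starts a comment.

    Hypothetical helper for illustration only, not Slurm's parser.
    """
    params = {}
    for raw_line in text.splitlines():
        # Drop everything after the first '#', then surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        name, value = line.split("=", 1)
        # Lowercasing the name models case-insensitive matching.
        params[name.strip().lower()] = value.strip()
    return params


sample = """
# Slurm cgroup support configuration file.
ConstrainCores=yes
CONSTRAINRAMSPACE=yes   # same parameter, different case
"""
settings = parse_conf(sample)
```

Here both spellings end up under lowercase keys, and the trailing comment is not part of the value.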

For general Slurm cgroups information, see the Cgroups Guide at
<https://slurm.schedmd.com/cgroups.html>.

The following cgroup.conf parameters are defined to control the general
behavior of Slurm cgroup plugins.

CgroupAutomount=<yes|no>
In cgroup/v1 this parameter detects whether
/sys/fs/cgroup/<controller_name> is available and, if it is not, tries
to mount the filesystem. In cgroup/v2 this parameter only takes effect
if IgnoreSystemd is set, and enables the required controllers on the
slurmd and slurmstepd cgroup directories. With cgroup/v2 this parameter
is intended for development and testing.

CgroupMountpoint=PATH
Only intended for development and testing. Specifies the PATH under
which cgroup controllers should be mounted. The default PATH is
/sys/fs/cgroup.

CgroupPlugin=<cgroup/v1|cgroup/v2|autodetect>
Specify the plugin to be used when interacting with the cgroup
subsystem. Supported values at the moment are "cgroup/v1", which
supports the legacy interface of cgroup v1, "cgroup/v2" for the unified
architecture, or "autodetect", which tries to determine which cgroup
version the system provides. This is useful if nodes have support for
different cgroup versions. The default value is "autodetect".
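
As a rough illustration of what telling the two cgroup versions apart can look like, the sketch below uses a common heuristic: a cgroup v2 unified hierarchy exposes a cgroup.controllers file at its root, while a v1 layout has per-controller directories such as memory. This is an assumption-laden sketch and not the logic Slurm's "autodetect" actually uses; the function name detect_cgroup_version is hypothetical, and hybrid setups are not handled.

```python
import os
import tempfile


def detect_cgroup_version(mountpoint="/sys/fs/cgroup"):
    """Guess the cgroup version under mountpoint (heuristic, not Slurm's code)."""
    # cgroup v2 exposes cgroup.controllers at the root of the hierarchy.
    if os.path.isfile(os.path.join(mountpoint, "cgroup.controllers")):
        return "cgroup/v2"
    # Assumption: a v1 layout has one directory per controller, e.g. memory.
    if os.path.isdir(os.path.join(mountpoint, "memory")):
        return "cgroup/v1"
    return None


# Exercise the heuristic against two fake hierarchies in temporary directories.
with tempfile.TemporaryDirectory() as fake_v2:
    open(os.path.join(fake_v2, "cgroup.controllers"), "w").close()
    v2_result = detect_cgroup_version(fake_v2)

with tempfile.TemporaryDirectory() as fake_v1:
    os.mkdir(os.path.join(fake_v1, "memory"))
    v1_result = detect_cgroup_version(fake_v1)
```

Calling detect_cgroup_version() with no argument inspects the default mount point on the local node.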

IgnoreSystemd=<yes|no>
Only for cgroup/v2 and for development and testing. When set, Slurm
avoids any call to dbus and any contact with systemd, and instead
prepares the whole cgroup hierarchy manually. This option is dangerous
on systems running systemd, since systemd can modify the cgroup and
cause issues for jobs.

IgnoreSystemdOnFailure=<yes|no>
Only for cgroup/v2 and for development and testing. It has similar
functionality to IgnoreSystemd, but applies only when a dbus call does
not succeed.

TASK/CGROUP PLUGIN
The following cgroup.conf parameters are defined to control the
behavior of this particular plugin:

AllowedKmemSpace=<number>
Only for cgroup/v1. Constrain the job cgroup kernel memory to this
amount of the allocated memory, specified in bytes. AllowedKmemSpace
must be between the upper and lower memory limits, specified by
MaxKmemPercent and MinKmemSpace, respectively. If AllowedKmemSpace goes
beyond the upper or lower limit, it will be reset to that upper or
lower limit, whichever has been exceeded.
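
The reset-to-boundary rule for AllowedKmemSpace can be sketched as a simple clamp. This is an illustration under stated assumptions, not Slurm's implementation: the function name clamp_kmem is hypothetical, the upper bound is passed in already converted to bytes (how it is derived from MaxKmemPercent is not modelled here), and 1 MB is assumed to mean 1024 * 1024 bytes.

```python
MIB = 1024 * 1024  # assumption: MinKmemSpace megabytes are 1024 * 1024 bytes


def clamp_kmem(allowed_kmem_bytes, upper_bytes, min_kmem_mb=30):
    """Reset allowed_kmem_bytes to whichever bound it exceeds (sketch only)."""
    lower_bytes = min_kmem_mb * MIB
    # Cap at the upper bound first, then raise to the lower bound if needed.
    return max(lower_bytes, min(allowed_kmem_bytes, upper_bytes))


too_small = clamp_kmem(10 * MIB, upper_bytes=1024 * MIB)
too_large = clamp_kmem(2048 * MIB, upper_bytes=1024 * MIB)
in_range = clamp_kmem(512 * MIB, upper_bytes=1024 * MIB)
```

A value below the floor is raised to the floor, a value above the ceiling is lowered to the ceiling, and anything in between is left unchanged.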

AllowedRAMSpace=<number>
Constrain the job/step cgroup RAM to this percentage of the allocated
memory. The percentage supplied may be expressed as a floating point
number, e.g. 101.5. Sets the cgroup soft memory limit at the allocated
memory size and then sets the job/step hard memory limit at
(AllowedRAMSpace/100) * allocated memory. If the job/step exceeds the
hard limit, it might trigger Out Of Memory (OOM) events (including
oom-kill), which will be logged to the kernel log ring buffer (dmesg in
Linux). Setting AllowedRAMSpace above 100 may cause system OOM events,
as it allows a job/step to allocate more memory than is configured for
the node. Reducing the configured node available memory to avoid system
OOM events is suggested. Setting AllowedRAMSpace below 100 will result
in jobs receiving less memory than allocated, and the soft memory limit
will be set to the same value as the hard limit. Also see
ConstrainRAMSpace. The default value is 100.
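
The soft and hard limit arithmetic described for AllowedRAMSpace can be worked through in a short sketch. The function name ram_limits and the use of megabytes as the unit are assumptions made for illustration; the code only mirrors the formula stated above and is not taken from Slurm.

```python
def ram_limits(allocated_mb, allowed_ram_space=100.0):
    """Return (soft, hard) memory limits in MB for a job/step (sketch only)."""
    # Hard limit: (AllowedRAMSpace / 100) * allocated memory.
    hard = allocated_mb * allowed_ram_space / 100.0
    # Soft limit: the allocated memory, or the hard limit when that is lower
    # (the AllowedRAMSpace < 100 case).
    soft = min(allocated_mb, hard)
    return soft, hard


default_case = ram_limits(4000)        # AllowedRAMSpace=100
above_100 = ram_limits(4000, 101.5)    # hard limit exceeds the allocation
below_100 = ram_limits(4000, 50)       # both limits drop to half
```

With a 4000 MB allocation, AllowedRAMSpace=101.5 gives a hard limit of 4060 MB while the soft limit stays at 4000 MB; AllowedRAMSpace=50 sets both limits to 2000 MB.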

AllowedSwapSpace=<number>
Constrain the job cgroup swap space to this percentage of the allocated
memory. The default value is 0, which means that RAM+Swap will be
limited to AllowedRAMSpace. The supplied percentage may be expressed as
a floating point number, e.g. 50.5. If the limit is exceeded, the job
steps will be killed and a warning message will be written to standard
error. Also see ConstrainSwapSpace. NOTE: Setting AllowedSwapSpace to 0
does not restrict the Linux kernel from using swap space. To control
how the kernel uses swap space, see MemorySwappiness.
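
One way to read the AllowedSwapSpace description is that the RAM+Swap limit is the allocated memory scaled by the sum of the two percentages, so that a swap percentage of 0 leaves RAM+Swap equal to the AllowedRAMSpace limit. The sketch below works that reading through; the function name ram_plus_swap_limit and the additive formula are assumptions inferred from the text, not a statement of Slurm's exact computation.

```python
def ram_plus_swap_limit(allocated_mb, allowed_ram_space=100.0,
                        allowed_swap_space=0.0):
    """RAM+Swap limit in MB, assuming the two percentages add (sketch only)."""
    total_percent = allowed_ram_space + allowed_swap_space
    return allocated_mb * total_percent / 100.0


no_swap = ram_plus_swap_limit(4000)                # AllowedSwapSpace=0
half_swap = ram_plus_swap_limit(4000, 100, 50.5)   # AllowedSwapSpace=50.5
```

For a 4000 MB allocation, the default leaves RAM+Swap at 4000 MB, and AllowedSwapSpace=50.5 raises it to 6020 MB.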

ConstrainCores=<yes|no>
If configured to "yes" then constrain allowed cores to the subset of
allocated resources. This functionality makes use of the cpuset
subsystem. Due to a bug fixed in version 1.11.5 of HWLOC, the
task/affinity plugin may be required in addition to task/cgroup for
this to function properly. The default value is "no".

ConstrainDevices=<yes|no>
If configured to "yes" then constrain the job's allowed devices based
on GRES allocated resources. This functionality makes use of the
devices subsystem. The default value is "no".

ConstrainKmemSpace=<yes|no>
Only for cgroup/v1. If configured to "yes" then constrain the job's
Kmem RAM usage in addition to RAM usage. Only takes effect if
ConstrainRAMSpace is set to "yes". If enabled, the job's Kmem limit
will be assigned the value of AllowedKmemSpace or the value coming from
MaxKmemPercent. The default value is "no", which leaves the Kmem
setting untouched by Slurm. Also see AllowedKmemSpace, MaxKmemPercent.

ConstrainRAMSpace=<yes|no>
If configured to "yes" then constrain the job's RAM usage by setting
the memory soft limit to the allocated memory and the hard limit to the
allocated memory * (AllowedRAMSpace/100). The default value is "no", in
which case the job's RAM limit will be set to its swap space limit if
ConstrainSwapSpace is set to "yes". Also see AllowedSwapSpace,
AllowedRAMSpace and ConstrainSwapSpace.

NOTE: When using ConstrainRAMSpace, if the combined memory used by all
processes in a step is greater than the limit, then the kernel will
trigger an OOM event, killing one or more of the processes in the step.
The step state will be marked as OOM, but the step itself will keep
running and other processes in the step may continue to run as well.
This differs from the behavior of OverMemoryKill, where the whole step
will be killed/cancelled.

NOTE: When enabled, ConstrainRAMSpace can lead to a noticeable decline
in per-node job throughput. Sites with high-throughput requirements
should carefully weigh the tradeoff between per-node throughput and the
potential problems that can arise from unconstrained memory usage on
the node. See <https://slurm.schedmd.com/high_throughput.html> for
further discussion.

ConstrainSwapSpace=<yes|no>
If configured to "yes" then constrain the job's swap space usage. The
default value is "no". Note that when set to "yes" and
ConstrainRAMSpace is set to "no", AllowedRAMSpace is automatically set
to 100% in order to limit the RAM+Swap amount to 100% of the job's
requirement plus the percent of allowed swap space. This amount is thus
set to both the RAM and RAM+Swap limits. This means that in that
particular case, ConstrainRAMSpace is automatically enabled with the
same limit as the one used to constrain swap space. Also see
AllowedSwapSpace.
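
The interaction between ConstrainRAMSpace and ConstrainSwapSpace is easy to misread, so the sketch below spells out the special case described above: with ConstrainSwapSpace=yes and ConstrainRAMSpace=no, AllowedRAMSpace is treated as 100 and the same amount is applied to both the RAM and the RAM+Swap limit. The function name effective_limits, the additive RAM+Swap formula, and the use of None for "not constrained" are assumptions for illustration, not Slurm's code.

```python
def effective_limits(allocated_mb, constrain_ram, constrain_swap,
                     allowed_ram_space=100.0, allowed_swap_space=0.0):
    """Return (ram_limit, ram_plus_swap_limit) in MB; None means unconstrained."""
    if constrain_swap and not constrain_ram:
        # Special case: AllowedRAMSpace is forced to 100 and one amount is
        # used for both limits.
        both = allocated_mb * (100.0 + allowed_swap_space) / 100.0
        return both, both
    ram = allocated_mb * allowed_ram_space / 100.0 if constrain_ram else None
    memsw = None
    if constrain_swap:
        memsw = allocated_mb * (allowed_ram_space + allowed_swap_space) / 100.0
    return ram, memsw


swap_only = effective_limits(4000, False, True, 80, 25)
both_on = effective_limits(4000, True, True, 80, 25)
neither = effective_limits(4000, False, False)
```

In the swap-only case the configured AllowedRAMSpace of 80 is ignored and both limits come out at 5000 MB; with both constraints enabled the limits are 3200 MB and 4200 MB.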

MaxRAMPercent=PERCENT
Set an upper bound in percent of total RAM on the RAM constraint for a
job. This will be the memory constraint applied to jobs that are not
explicitly allocated memory by Slurm (i.e. Slurm's select plugin is not
configured to manage memory allocations). The PERCENT may be an
arbitrary floating point number. The default value is 100.

MaxSwapPercent=PERCENT
Set an upper bound (in percent of total RAM) on the amount of RAM+Swap
that may be used for a job. This will be the swap limit applied to jobs
on systems where memory is not being explicitly allocated to jobs. The
PERCENT may be an arbitrary floating point number between 0 and 100.
The default value is 100.

MaxKmemPercent=PERCENT
Only for cgroup/v1. Set an upper bound in percent of total RAM as the
maximum Kmem for a job. The PERCENT may be an arbitrary floating point
number; however, the product of MaxKmemPercent and the job's requested
memory has to fall between MinKmemSpace and the job's requested memory,
otherwise the boundary value is used. The default value is 100.

MemorySwappiness=<number>
Only for cgroup/v1. Configure the kernel's priority for swapping out
anonymous pages (such as program data) versus file cache pages for the
job cgroup. Valid values are between 0 and 100, inclusive. A value of 0
prevents the kernel from swapping out program data. A value of 100
gives equal priority to swapping out file cache or anonymous pages. If
not set, then the kernel's default swappiness value will be used.
ConstrainSwapSpace must be set to "yes" in order for this parameter to
be applied.

MinKmemSpace=<number>
Only for cgroup/v1. Set a lower bound (in MB) on the memory limits
defined by AllowedKmemSpace. The default limit is 30M.

MinRAMSpace=<number>
Set a lower bound (in MB) on the memory limits defined by
AllowedRAMSpace and AllowedSwapSpace. This prevents accidentally
creating a memory cgroup with such a low limit that slurmstepd is
immediately killed due to lack of RAM. The default limit is 30M.
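
MinRAMSpace and MaxRAMPercent act as a floor and a ceiling on the RAM limit. The sketch below shows one plausible way the two bounds combine; the function name bounded_ram_limit and the order in which the bounds are applied are assumptions for illustration, not a description of Slurm's code.

```python
def bounded_ram_limit(requested_mb, total_ram_mb,
                      min_ram_space_mb=30, max_ram_percent=100.0):
    """Clamp a RAM limit (MB) between MinRAMSpace and MaxRAMPercent (sketch)."""
    # Ceiling: MaxRAMPercent of the node's total RAM.
    ceiling_mb = total_ram_mb * max_ram_percent / 100.0
    # Apply the ceiling first, then make sure the floor is respected.
    return max(min_ram_space_mb, min(requested_mb, ceiling_mb))


tiny = bounded_ram_limit(10, total_ram_mb=64000)
huge = bounded_ram_limit(70000, total_ram_mb=64000, max_ram_percent=90)
normal = bounded_ram_limit(2048, total_ram_mb=64000)
```

A 10 MB request is raised to the 30 MB floor, a request larger than the node is cut to 90% of 64000 MB (57600 MB), and an ordinary request passes through unchanged.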

DISTRIBUTION-SPECIFIC NOTES
Debian and derivatives (e.g. Ubuntu) usually exclude the memory and
memsw (swap) cgroups by default. To include them, add the following
parameters to the kernel command line: cgroup_enable=memory swapaccount=1

This can usually be placed in /etc/default/grub inside the
GRUB_CMDLINE_LINUX variable. A command such as update-grub must be run
after updating the file.
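
As an illustration of the edit described above, the sketch below appends the two kernel parameters to a command-line string only when they are missing. It operates on a plain string and does not read or write /etc/default/grub; the function name with_cgroup_params and the sample value "quiet splash" are hypothetical.

```python
REQUIRED_PARAMS = ("cgroup_enable=memory", "swapaccount=1")


def with_cgroup_params(cmdline):
    """Return cmdline with the memory/memsw cgroup parameters appended if absent."""
    tokens = cmdline.split()
    for param in REQUIRED_PARAMS:
        if param not in tokens:
            tokens.append(param)
    return " ".join(tokens)


# Hypothetical existing value of GRUB_CMDLINE_LINUX.
updated = with_cgroup_params("quiet splash")
unchanged = with_cgroup_params(updated)  # running it twice adds nothing new
```

The resulting string is what would go inside GRUB_CMDLINE_LINUX before running a command such as update-grub.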

EXAMPLE
/etc/slurm/cgroup.conf:
This example cgroup.conf file shows a configuration that enables the
more commonly used cgroup enforcement mechanisms.

###
# Slurm cgroup support configuration file.
###
CgroupAutomount=yes
CgroupMountpoint=/sys/fs/cgroup
ConstrainCores=yes
ConstrainDevices=yes
ConstrainKmemSpace=no #avoid known Kernel issues
ConstrainRAMSpace=yes
ConstrainSwapSpace=yes

/etc/slurm/slurm.conf:
These are the entries required in slurm.conf to activate the cgroup
enforcement mechanisms. Make sure that the node definitions in your
slurm.conf closely match the configuration as shown by "slurmd -C".
Either MemSpecLimit should be set, or RealMemory should be defined with
less than the actual amount of memory for a node, to ensure that all
system/non-job processes will have sufficient memory at all times.
Sites should also configure pam_slurm_adopt to ensure users cannot
escape the cgroups via ssh.

###
# Slurm configuration entries for cgroups
###
ProctrackType=proctrack/cgroup
TaskPlugin=task/cgroup,task/affinity
JobAcctGatherType=jobacct_gather/cgroup #optional for gathering metrics
PrologFlags=Contain #X11 flag is also suggested

COPYING
Copyright (C) 2010-2012 Lawrence Livermore National Security. Produced
at Lawrence Livermore National Laboratory (cf, DISCLAIMER).
Copyright (C) 2010-2022 SchedMD LLC.

This file is part of Slurm, a resource management program. For
details, see <https://slurm.schedmd.com/>.

Slurm is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free
Software Foundation; either version 2 of the License, or (at your
option) any later version.

Slurm is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
for more details.

SEE ALSO
slurm.conf(5)

April 2022 Slurm Configuration File cgroup.conf(5)