CGROUPS(7)                 Linux Programmer's Manual                CGROUPS(7)

NAME
       cgroups - Linux control groups

DESCRIPTION
       Control groups, usually referred to as cgroups, are a Linux kernel
       feature which allows processes to be organized into hierarchical
       groups whose usage of various types of resources can then be limited
       and monitored.  The kernel's cgroup interface is provided through a
       pseudo-filesystem called cgroupfs.  Grouping is implemented in the
       core cgroup kernel code, while resource tracking and limits are
       implemented in a set of per-resource-type subsystems (memory, CPU,
       and so on).

   Terminology
       A cgroup is a collection of processes that are bound to a set of
       limits or parameters defined via the cgroup filesystem.

       A subsystem is a kernel component that modifies the behavior of the
       processes in a cgroup.  Various subsystems have been implemented,
       making it possible to do things such as limiting the amount of CPU
       time and memory available to a cgroup, accounting for the CPU time
       used by a cgroup, and freezing and resuming execution of the
       processes in a cgroup.  Subsystems are sometimes also known as
       resource controllers (or simply, controllers).

       The cgroups for a controller are arranged in a hierarchy.  This
       hierarchy is defined by creating, removing, and renaming
       subdirectories within the cgroup filesystem.  At each level of the
       hierarchy, attributes (e.g., limits) can be defined.  The limits,
       control, and accounting provided by cgroups generally have effect
       throughout the subhierarchy underneath the cgroup where the
       attributes are defined.  Thus, for example, the limits placed on a
       cgroup at a higher level in the hierarchy cannot be exceeded by
       descendant cgroups.

   Cgroups version 1 and version 2
       The initial release of the cgroups implementation was in Linux
       2.6.24.  Over time, various cgroup controllers have been added to
       allow the management of various types of resources.  However, the
       development of these controllers was largely uncoordinated, with the
       result that many inconsistencies arose between controllers and
       management of the cgroup hierarchies became rather complex.  A
       longer description of these problems can be found in the kernel
       source file Documentation/admin-guide/cgroup-v2.rst (or
       Documentation/cgroup-v2.txt in Linux 4.17 and earlier).

       Because of the problems with the initial cgroups implementation
       (cgroups version 1), starting in Linux 3.10, work began on a new,
       orthogonal implementation to remedy these problems.  Initially
       marked experimental, and hidden behind the -o __DEVEL__sane_behavior
       mount option, the new version (cgroups version 2) was eventually
       made official with the release of Linux 4.5.  Differences between
       the two versions are described in the text below.  The file
       cgroup.sane_behavior, present in cgroups v1, is a relic of this
       mount option.  The file always reports "0" and is retained only for
       backward compatibility.

       Although cgroups v2 is intended as a replacement for cgroups v1, the
       older system continues to exist (and for compatibility reasons is
       unlikely to be removed).  Currently, cgroups v2 implements only a
       subset of the controllers available in cgroups v1.  The two systems
       are implemented so that both v1 controllers and v2 controllers can
       be mounted on the same system.  Thus, for example, it is possible to
       use those controllers that are supported under version 2, while also
       using version 1 controllers where version 2 does not yet support
       those controllers.  The only restriction here is that a controller
       can't be simultaneously employed in both a cgroups v1 hierarchy and
       in the cgroups v2 hierarchy.

CGROUPS VERSION 1
       Under cgroups v1, each controller may be mounted against a separate
       cgroup filesystem that provides its own hierarchical organization of
       the processes on the system.  It is also possible to comount
       multiple (or even all) cgroups v1 controllers against the same
       cgroup filesystem, meaning that the comounted controllers manage the
       same hierarchical organization of processes.

       For each mounted hierarchy, the directory tree mirrors the control
       group hierarchy.  Each control group is represented by a directory,
       with each of its child control cgroups represented as a child
       directory.  For instance, /user/joe/1.session represents control
       group 1.session, which is a child of cgroup joe, which is a child of
       /user.  Under each cgroup directory is a set of files which can be
       read or written to, reflecting resource limits and a few general
       cgroup properties.

   Tasks (threads) versus processes
       In cgroups v1, a distinction is drawn between processes and tasks.
       In this view, a process can consist of multiple tasks (more commonly
       called threads, from a user-space perspective, and called such in
       the remainder of this man page).  In cgroups v1, it is possible to
       independently manipulate the cgroup memberships of the threads in a
       process.

       The cgroups v1 ability to split threads across different cgroups
       caused problems in some cases.  For example, it made no sense for
       the memory controller, since all of the threads of a process share a
       single address space.  Because of these problems, the ability to
       independently manipulate the cgroup memberships of the threads in a
       process was removed in the initial cgroups v2 implementation, and
       subsequently restored in a more limited form (see the discussion of
       "thread mode" below).

   Mounting v1 controllers
       The use of cgroups requires a kernel built with the CONFIG_CGROUPS
       option.  In addition, each of the v1 controllers has an associated
       configuration option that must be set in order to employ that
       controller.

       In order to use a v1 controller, it must be mounted against a cgroup
       filesystem.  The usual place for such mounts is under a tmpfs(5)
       filesystem mounted at /sys/fs/cgroup.  Thus, one might mount the cpu
       controller as follows:

           mount -t cgroup -o cpu none /sys/fs/cgroup/cpu

       It is possible to comount multiple controllers against the same
       hierarchy.  For example, here the cpu and cpuacct controllers are
       comounted against a single hierarchy:

           mount -t cgroup -o cpu,cpuacct none /sys/fs/cgroup/cpu,cpuacct

       Comounting controllers has the effect that a process is in the same
       cgroup for all of the comounted controllers.  Separately mounting
       controllers allows a process to be in cgroup /foo1 for one
       controller while being in /foo2/foo3 for another.

       It is possible to comount all v1 controllers against the same
       hierarchy:

           mount -t cgroup -o all cgroup /sys/fs/cgroup

       (One can achieve the same result by omitting -o all, since it is the
       default if no controllers are explicitly specified.)

       It is not possible to mount the same controller against multiple
       cgroup hierarchies.  For example, it is not possible to mount both
       the cpu and cpuacct controllers against one hierarchy, and to mount
       the cpu controller alone against another hierarchy.  It is possible
       to create multiple mount points with exactly the same set of
       comounted controllers.  However, in this case all that results is
       multiple mount points providing a view of the same hierarchy.

       Note that on many systems, the v1 controllers are automatically
       mounted under /sys/fs/cgroup; in particular, systemd(1)
       automatically creates such mount points.

   Unmounting v1 controllers
       A mounted cgroup filesystem can be unmounted using the umount(8)
       command, as in the following example:

           umount /sys/fs/cgroup/pids

       But note well: a cgroup filesystem is unmounted only if it is not
       busy, that is, it has no child cgroups.  If this is not the case,
       then the only effect of the umount(8) is to make the mount
       invisible.  Thus, to ensure that the mount point is really removed,
       one must first remove all child cgroups, which in turn can be done
       only after all member processes have been moved from those cgroups
       to the root cgroup.
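       The bottom-up cleanup described above can be sketched as follows.
       This is a hypothetical helper, not part of the cgroups interface; it
       assumes that member processes have already been moved out of the
       child cgroups (otherwise rmdir(2) fails with EBUSY), and it requires
       appropriate permissions on a real cgroup hierarchy:

```python
import os

def remove_child_cgroups(mount_point):
    """Remove all child cgroup directories beneath mount_point,
    deepest first, so that the filesystem can then be unmounted.

    On a real cgroup filesystem, only the directories need to be
    removed; the files inside each cgroup directory vanish with it."""
    # topdown=False yields the deepest directories first, so every
    # child is removed before its parent.
    for dirpath, dirnames, _ in os.walk(mount_point, topdown=False):
        for d in dirnames:
            os.rmdir(os.path.join(dirpath, d))
```

       After the helper returns, the mount point itself can be unmounted
       with umount(8).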

   Cgroups version 1 controllers
       Each of the cgroups version 1 controllers is governed by a kernel
       configuration option (listed below).  Additionally, the availability
       of the cgroups feature is governed by the CONFIG_CGROUPS kernel
       configuration option.

       cpu (since Linux 2.6.24; CONFIG_CGROUP_SCHED)
              Cgroups can be guaranteed a minimum number of "CPU shares"
              when a system is busy.  This does not limit a cgroup's CPU
              usage if the CPUs are not busy.  For further information, see
              Documentation/scheduler/sched-design-CFS.rst (or
              Documentation/scheduler/sched-design-CFS.txt in Linux 5.2 and
              earlier).

              In Linux 3.2, this controller was extended to provide CPU
              "bandwidth" control.  If the kernel is configured with
              CONFIG_CFS_BANDWIDTH, then within each scheduling period
              (defined via a file in the cgroup directory), it is possible
              to define an upper limit on the CPU time allocated to the
              processes in a cgroup.  This upper limit applies even if
              there is no other competition for the CPU.  Further
              information can be found in the kernel source file
              Documentation/scheduler/sched-bwc.rst (or
              Documentation/scheduler/sched-bwc.txt in Linux 5.2 and
              earlier).

       cpuacct (since Linux 2.6.24; CONFIG_CGROUP_CPUACCT)
              This provides accounting for CPU usage by groups of
              processes.

              Further information can be found in the kernel source file
              Documentation/admin-guide/cgroup-v1/cpuacct.rst (or
              Documentation/cgroup-v1/cpuacct.txt in Linux 5.2 and
              earlier).

       cpuset (since Linux 2.6.24; CONFIG_CPUSETS)
              This controller can be used to bind the processes in a cgroup
              to a specified set of CPUs and NUMA nodes.

              Further information can be found in the kernel source file
              Documentation/admin-guide/cgroup-v1/cpusets.rst (or
              Documentation/cgroup-v1/cpusets.txt in Linux 5.2 and
              earlier).

       memory (since Linux 2.6.25; CONFIG_MEMCG)
              The memory controller supports reporting and limiting of
              process memory, kernel memory, and swap used by cgroups.

              Further information can be found in the kernel source file
              Documentation/admin-guide/cgroup-v1/memory.rst (or
              Documentation/cgroup-v1/memory.txt in Linux 5.2 and earlier).

       devices (since Linux 2.6.26; CONFIG_CGROUP_DEVICE)
              This supports controlling which processes may create (mknod)
              devices as well as open them for reading or writing.  The
              policies may be specified as allow-lists and deny-lists.
              Hierarchy is enforced, so new rules must not violate existing
              rules for the target or ancestor cgroups.

              Further information can be found in the kernel source file
              Documentation/admin-guide/cgroup-v1/devices.rst (or
              Documentation/cgroup-v1/devices.txt in Linux 5.2 and
              earlier).

       freezer (since Linux 2.6.28; CONFIG_CGROUP_FREEZER)
              The freezer cgroup can suspend and restore (resume) all
              processes in a cgroup.  Freezing a cgroup /A also causes its
              children (for example, processes in /A/B) to be frozen.

              Further information can be found in the kernel source file
              Documentation/admin-guide/cgroup-v1/freezer-subsystem.rst (or
              Documentation/cgroup-v1/freezer-subsystem.txt in Linux 5.2
              and earlier).

       net_cls (since Linux 2.6.29; CONFIG_CGROUP_NET_CLASSID)
              This places a classid, specified for the cgroup, on network
              packets created by a cgroup.  These classids can then be used
              in firewall rules, as well as used to shape traffic using
              tc(8).  This applies only to packets leaving the cgroup, not
              to traffic arriving at the cgroup.

              Further information can be found in the kernel source file
              Documentation/admin-guide/cgroup-v1/net_cls.rst (or
              Documentation/cgroup-v1/net_cls.txt in Linux 5.2 and
              earlier).

       blkio (since Linux 2.6.33; CONFIG_BLK_CGROUP)
              The blkio cgroup controls and limits access to specified
              block devices by applying IO control in the form of
              throttling and upper limits against leaf nodes and
              intermediate nodes in the storage hierarchy.

              Two policies are available.  The first is a
              proportional-weight time-based division of disk bandwidth,
              implemented with CFQ.  This is in effect for leaf nodes using
              CFQ.  The second is a throttling policy, which specifies
              upper I/O rate limits on a device.

              Further information can be found in the kernel source file
              Documentation/admin-guide/cgroup-v1/blkio-controller.rst (or
              Documentation/cgroup-v1/blkio-controller.txt in Linux 5.2 and
              earlier).

       perf_event (since Linux 2.6.39; CONFIG_CGROUP_PERF)
              This controller allows perf monitoring of the set of
              processes grouped in a cgroup.

              Further information can be found in the kernel source files

       net_prio (since Linux 3.3; CONFIG_CGROUP_NET_PRIO)
              This allows priorities to be specified, per network
              interface, for cgroups.

              Further information can be found in the kernel source file
              Documentation/admin-guide/cgroup-v1/net_prio.rst (or
              Documentation/cgroup-v1/net_prio.txt in Linux 5.2 and
              earlier).

       hugetlb (since Linux 3.5; CONFIG_CGROUP_HUGETLB)
              This supports limiting the use of huge pages by cgroups.

              Further information can be found in the kernel source file
              Documentation/admin-guide/cgroup-v1/hugetlb.rst (or
              Documentation/cgroup-v1/hugetlb.txt in Linux 5.2 and
              earlier).

       pids (since Linux 4.3; CONFIG_CGROUP_PIDS)
              This controller permits limiting the number of processes that
              may be created in a cgroup (and its descendants).

              Further information can be found in the kernel source file
              Documentation/admin-guide/cgroup-v1/pids.rst (or
              Documentation/cgroup-v1/pids.txt in Linux 5.2 and earlier).

       rdma (since Linux 4.11; CONFIG_CGROUP_RDMA)
              The RDMA controller permits limiting the use of
              RDMA/IB-specific resources per cgroup.

              Further information can be found in the kernel source file
              Documentation/admin-guide/cgroup-v1/rdma.rst (or
              Documentation/cgroup-v1/rdma.txt in Linux 5.2 and earlier).

   Creating cgroups and moving processes
       A cgroup filesystem initially contains a single root cgroup, '/',
       which all processes belong to.  A new cgroup is created by creating
       a directory in the cgroup filesystem:

           mkdir /sys/fs/cgroup/cpu/cg1

       This creates a new empty cgroup.

       A process may be moved to this cgroup by writing its PID into the
       cgroup's cgroup.procs file:

           echo $$ > /sys/fs/cgroup/cpu/cg1/cgroup.procs

       Only one PID at a time should be written to this file.

       Writing the value 0 to a cgroup.procs file causes the writing
       process to be moved to the corresponding cgroup.

       When writing a PID into the cgroup.procs file, all threads in the
       process are moved into the new cgroup at once.
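       The one-PID-per-write rule above can be sketched with a hypothetical
       helper (the name move_processes is illustrative, not part of any
       interface; on a real cgroup hierarchy, root privileges or delegated
       ownership of cgroup.procs are required):

```python
def move_processes(pids, cgroup_dir):
    """Move each process in pids into the cgroup at cgroup_dir by
    writing one PID at a time to its cgroup.procs file, since only
    one PID may be written per write."""
    procs = cgroup_dir + "/cgroup.procs"
    for pid in pids:
        # Reopen the file for each PID: every move is a separate write.
        with open(procs, "w") as f:
            f.write(str(pid))
```

       On cgroupfs each write moves the corresponding process; writing the
       caller's own PID (or 0) moves the calling process itself.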

       Within a hierarchy, a process can be a member of exactly one cgroup.
       Writing a process's PID to a cgroup.procs file automatically removes
       it from the cgroup of which it was previously a member.

       The cgroup.procs file can be read to obtain a list of the processes
       that are members of a cgroup.  The returned list of PIDs is not
       guaranteed to be in order.  Nor is it guaranteed to be free of
       duplicates.  (For example, a PID may be recycled while reading from
       the list.)
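       Because the kernel guarantees neither ordering nor uniqueness, a
       reader should normalize the list itself.  A minimal sketch (the
       helper name is hypothetical):

```python
def read_member_pids(cgroup_dir):
    """Return the member PIDs of a cgroup, read from its cgroup.procs
    file.  The kernel guarantees neither ordering nor uniqueness of
    the listed PIDs, so deduplicate (via a set) and sort here."""
    with open(cgroup_dir + "/cgroup.procs") as f:
        return sorted({int(line) for line in f if line.strip()})
```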

       In cgroups v1, an individual thread can be moved to another cgroup
       by writing its thread ID (i.e., the kernel thread ID returned by
       clone(2) and gettid(2)) to the tasks file in a cgroup directory.
       This file can be read to discover the set of threads that are
       members of the cgroup.

   Removing cgroups
       To remove a cgroup, it must first have no child cgroups and contain
       no (nonzombie) processes.  So long as that is the case, one can
       simply remove the corresponding directory pathname.  Note that files
       in a cgroup directory cannot and need not be removed.

   Cgroups v1 release notification
       Two files can be used to determine whether the kernel provides
       notifications when a cgroup becomes empty.  A cgroup is considered
       to be empty when it contains no child cgroups and no member
       processes.

       A special file in the root directory of each cgroup hierarchy,
       release_agent, can be used to register the pathname of a program
       that may be invoked when a cgroup in the hierarchy becomes empty.
       The pathname of the newly empty cgroup (relative to the cgroup mount
       point) is provided as the sole command-line argument when the
       release_agent program is invoked.  The release_agent program might
       remove the cgroup directory, or perhaps repopulate it with a
       process.

       The default value of the release_agent file is empty, meaning that
       no release agent is invoked.

       The content of the release_agent file can also be specified via a
       mount option when the cgroup filesystem is mounted:

           mount -o release_agent=pathname ...
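       The core of a release-agent program can be sketched as follows.
       This is an illustrative sketch only (the function name and the
       "remove the directory" policy are one possible choice, as noted
       above); the kernel passes the newly empty cgroup's pathname,
       relative to the mount point, as the program's sole argument:

```python
import os

def release_agent(mount_point, cgroup_relpath):
    """Handle one release notification: remove the now-empty cgroup
    directory.  cgroup_relpath is the path the kernel passes as the
    sole command-line argument, relative to the hierarchy's mount
    point (it may carry a leading '/')."""
    os.rmdir(os.path.join(mount_point, cgroup_relpath.lstrip("/")))
```

       An agent could equally well repopulate the cgroup with a process
       instead of removing it.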

       Whether or not the release_agent program is invoked when a
       particular cgroup becomes empty is determined by the value in the
       notify_on_release file in the corresponding cgroup directory.  If
       this file contains the value 0, then the release_agent program is
       not invoked.  If it contains the value 1, the release_agent program
       is invoked.  The default value for this file in the root cgroup is
       0.  At the time when a new cgroup is created, the value in this file
       is inherited from the corresponding file in the parent cgroup.

   Cgroup v1 named hierarchies
       In cgroups v1, it is possible to mount a cgroup hierarchy that has
       no attached controllers:

           mount -t cgroup -o none,name=somename none /some/mount/point

       Multiple instances of such hierarchies can be mounted; each
       hierarchy must have a unique name.  The only purpose of such
       hierarchies is to track processes.  (See the discussion of release
       notification above.)  An example of this is the name=systemd cgroup
       hierarchy that is used by systemd(1) to track services and user
       sessions.

       Since Linux 5.0, the cgroup_no_v1 kernel boot option (described
       below) can be used to disable cgroup v1 named hierarchies, by
       specifying cgroup_no_v1=named.

CGROUPS VERSION 2
       In cgroups v2, all mounted controllers reside in a single unified
       hierarchy.  While (different) controllers may be simultaneously
       mounted under the v1 and v2 hierarchies, it is not possible to mount
       the same controller simultaneously under both the v1 and the v2
       hierarchies.

       The new behaviors in cgroups v2 are summarized here, and in some
       cases elaborated in the following subsections.

       1. Cgroups v2 provides a unified hierarchy against which all
          controllers are mounted.

       2. "Internal" processes are not permitted.  With the exception of
          the root cgroup, processes may reside only in leaf nodes (cgroups
          that do not themselves contain child cgroups).  The details are
          somewhat more subtle than this, and are described below.

       3. The controllers that are active in a cgroup must be specified via
          the files cgroup.controllers and cgroup.subtree_control.

       4. The tasks file has been removed.  In addition, the
          cgroup.clone_children file that is employed by the cpuset
          controller has been removed.

       5. An improved mechanism for notification of empty cgroups is
          provided by the cgroup.events file.

       For more changes, see the Documentation/admin-guide/cgroup-v2.rst
       file in the kernel source (or Documentation/cgroup-v2.txt in Linux
       4.17 and earlier).

       Some of the new behaviors listed above saw subsequent modification
       with the addition in Linux 4.14 of "thread mode" (described below).

   Cgroups v2 unified hierarchy
       In cgroups v1, the ability to mount different controllers against
       different hierarchies was intended to allow great flexibility for
       application design.  In practice, though, the flexibility turned out
       to be less useful than expected, and in many cases added complexity.
       Therefore, in cgroups v2, all available controllers are mounted
       against a single hierarchy.  The available controllers are
       automatically mounted, meaning that it is not necessary (or
       possible) to specify the controllers when mounting the cgroup v2
       filesystem using a command such as the following:

           mount -t cgroup2 none /mnt/cgroup2

       A cgroup v2 controller is available only if it is not currently in
       use via a mount against a cgroup v1 hierarchy.  Or, to put things
       another way, it is not possible to employ the same controller
       against both a v1 hierarchy and the unified v2 hierarchy.  This
       means that it may be necessary first to unmount a v1 controller (as
       described above) before that controller is available in v2.  Since
       systemd(1) makes heavy use of some v1 controllers by default, it can
       in some cases be simpler to boot the system with selected v1
       controllers disabled.  To do this, specify the cgroup_no_v1=list
       option on the kernel boot command line; list is a comma-separated
       list of the names of the controllers to disable, or the word all to
       disable all v1 controllers.  (This situation is correctly handled by
       systemd(1), which falls back to operating without the specified
       controllers.)

       Note that on many modern systems, systemd(1) automatically mounts
       the cgroup2 filesystem at /sys/fs/cgroup/unified during the boot
       process.

   Cgroups v2 mount options
       The following options (mount -o) can be specified when mounting the
       cgroup v2 filesystem:

       nsdelegate (since Linux 4.15)
              Treat cgroup namespaces as delegation boundaries.  For
              details, see below.

       memory_localevents (since Linux 5.2)
              The memory.events file should show statistics only for the
              cgroup itself, and not for any descendant cgroups.  This was
              the behavior before Linux 5.2.  Starting in Linux 5.2, the
              default behavior is to include statistics for descendant
              cgroups in memory.events, and this mount option can be used
              to revert to the legacy behavior.  This option is system wide
              and can be set on mount or modified through remount only from
              the initial mount namespace; it is silently ignored in
              noninitial namespaces.

   Cgroups v2 controllers
       The following controllers, documented in the kernel source file
       Documentation/admin-guide/cgroup-v2.rst (or
       Documentation/cgroup-v2.txt in Linux 4.17 and earlier), are
       supported in cgroups version 2:

       cpu (since Linux 4.15)
              This is the successor to the version 1 cpu and cpuacct
              controllers.

       cpuset (since Linux 5.0)
              This is the successor of the version 1 cpuset controller.

       freezer (since Linux 5.2)
              This is the successor of the version 1 freezer controller.

       hugetlb (since Linux 5.6)
              This is the successor of the version 1 hugetlb controller.

       io (since Linux 4.5)
              This is the successor of the version 1 blkio controller.

       memory (since Linux 4.5)
              This is the successor of the version 1 memory controller.

       perf_event (since Linux 4.11)
              This is the same as the version 1 perf_event controller.

       pids (since Linux 4.5)
              This is the same as the version 1 pids controller.

       rdma (since Linux 4.11)
              This is the same as the version 1 rdma controller.

       There is no direct equivalent of the net_cls and net_prio
       controllers from cgroups version 1.  Instead, support has been added
       to iptables(8) to allow eBPF filters that hook on cgroup v2
       pathnames to make decisions about network traffic on a per-cgroup
       basis.

       The v2 devices controller provides no interface files; instead,
       device control is gated by attaching an eBPF (BPF_CGROUP_DEVICE)
       program to a v2 cgroup.

   Cgroups v2 subtree control
       Each cgroup in the v2 hierarchy contains the following two files:

       cgroup.controllers
              This read-only file exposes a list of the controllers that
              are available in this cgroup.  The contents of this file
              match the contents of the cgroup.subtree_control file in the
              parent cgroup.

       cgroup.subtree_control
              This is a list of controllers that are active (enabled) in
              the cgroup.  The set of controllers in this file is a subset
              of the set in the cgroup.controllers file of this cgroup.
              The set of active controllers is modified by writing strings
              to this file containing space-delimited controller names,
              each preceded by '+' (to enable a controller) or '-' (to
              disable a controller), as in the following example:

                  echo '+pids -memory' > x/y/cgroup.subtree_control

              An attempt to enable a controller that is not present in
              cgroup.controllers leads to an ENOENT error when writing to
              the cgroup.subtree_control file.

       Because the list of controllers in cgroup.subtree_control is a
       subset of those in cgroup.controllers, a controller that has been
       disabled in one cgroup in the hierarchy can never be re-enabled in
       the subtree below that cgroup.

       A cgroup's cgroup.subtree_control file determines the set of
       controllers that are exercised in the child cgroups.  When a
       controller (e.g., pids) is present in the cgroup.subtree_control
       file of a parent cgroup, then the corresponding controller-interface
       files (e.g., pids.max) are automatically created in the children of
       that cgroup and can be used to exert resource control in the child
       cgroups.
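       The '+'/'-' write syntax described above can be sketched with a
       hypothetical helper that builds the string to write and checks the
       would-fail case up front (the kernel itself reports ENOENT when an
       unavailable controller is enabled):

```python
def subtree_control_write(available, enable=(), disable=()):
    """Build the string to write to cgroup.subtree_control.

    'available' is the parsed content of this cgroup's
    cgroup.controllers file; enabling a controller not listed there
    would fail with ENOENT, so reject that case before writing."""
    for ctl in enable:
        if ctl not in available:
            raise FileNotFoundError(
                f"controller {ctl!r} not in cgroup.controllers")
    # Space-delimited names, '+' to enable, '-' to disable.
    return " ".join(["+" + c for c in enable] +
                    ["-" + c for c in disable])
```

       For example, subtree_control_write(["cpu", "memory", "pids"],
       enable=["pids"], disable=["memory"]) yields the string
       "+pids -memory" shown in the echo example above.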

   Cgroups v2 "no internal processes" rule
       Cgroups v2 enforces a so-called "no internal processes" rule.
       Roughly speaking, this rule means that, with the exception of the
       root cgroup, processes may reside only in leaf nodes (cgroups that
       do not themselves contain child cgroups).  This avoids the need to
       decide how to partition resources between processes which are
       members of cgroup A and processes in child cgroups of A.

       For instance, if cgroup /cg1/cg2 exists, then a process may reside
       in /cg1/cg2, but not in /cg1.  This is to avoid an ambiguity in
       cgroups v1 with respect to the delegation of resources between
       processes in /cg1 and its child cgroups.  The recommended approach
       in cgroups v2 is to create a subdirectory called leaf for any
       nonleaf cgroup which should contain processes, but no child cgroups.
       Thus, processes which previously would have gone into /cg1 would now
       go into /cg1/leaf.  This has the advantage of making explicit the
       relationship between processes in /cg1/leaf and /cg1's other
       children.

       The "no internal processes" rule is in fact more subtle than stated
       above.  More precisely, the rule is that a (nonroot) cgroup can't
       both (1) have member processes, and (2) distribute resources into
       child cgroups—that is, have a nonempty cgroup.subtree_control file.
       Thus, it is possible for a cgroup to have both member processes and
       child cgroups, but before controllers can be enabled for that
       cgroup, the member processes must be moved out of the cgroup (e.g.,
       perhaps into the child cgroups).

       With the Linux 4.14 addition of "thread mode" (described below), the
       "no internal processes" rule has been relaxed in some cases.

   Cgroups v2 cgroup.events file
       Each nonroot cgroup in the v2 hierarchy contains a read-only file,
       cgroup.events, whose contents are key-value pairs (delimited by
       newline characters, with the key and value separated by spaces)
       providing state information about the cgroup:

           $ cat mygrp/cgroup.events
           populated 1
           frozen 0

       The following keys may appear in this file:

       populated
              The value of this key is 1 if this cgroup or any of its
              descendants has member processes, and 0 otherwise.

       frozen (since Linux 5.2)
              The value of this key is 1 if this cgroup is currently
              frozen, or 0 if it is not.

       The cgroup.events file can be monitored, in order to receive
       notification when the value of one of its keys changes.  Such
       monitoring can be done using inotify(7), which notifies changes as
       IN_MODIFY events, or poll(2), which notifies changes by returning
       the POLLPRI and POLLERR bits in the revents field.
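       The key-value format described above is simple to parse.  A minimal
       sketch (the function name is illustrative), which a monitoring
       program might call after each inotify(7) or poll(2) wakeup:

```python
def parse_cgroup_events(text):
    """Parse cgroup.events-style content (one 'key value' pair per
    line, separated by whitespace) into a dict mapping each key to
    its integer value."""
    events = {}
    for line in text.splitlines():
        if line.strip():
            key, value = line.split()
            events[key] = int(value)
    return events
```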

   Cgroup v2 release notification
       Cgroups v2 provides a new mechanism for obtaining notification when
       a cgroup becomes empty.  The cgroups v1 release_agent and
       notify_on_release files are removed, and replaced by the populated
       key in the cgroup.events file.  This key either has the value 0,
       meaning that the cgroup (and its descendants) contain no (nonzombie)
       member processes, or 1, meaning that the cgroup (or one of its
       descendants) contains member processes.

       The cgroups v2 release-notification mechanism offers the following
       advantages over the cgroups v1 release_agent mechanism:

       *  It allows for cheaper notification, since a single process can
          monitor multiple cgroup.events files (using the techniques
          described earlier).  By contrast, the cgroups v1 mechanism
          requires the expense of creating a process for each notification.

       *  Notification for different cgroup subhierarchies can be delegated
          to different processes.  By contrast, the cgroups v1 mechanism
          allows only one release agent for an entire hierarchy.

   Cgroups v2 cgroup.stat file
       Each cgroup in the v2 hierarchy contains a read-only cgroup.stat
       file (first introduced in Linux 4.14) that consists of lines
       containing key-value pairs.  The following keys currently appear in
       this file:

       nr_descendants
              This is the total number of visible (i.e., living) descendant
              cgroups underneath this cgroup.

       nr_dying_descendants
              This is the total number of dying descendant cgroups
              underneath this cgroup.  A cgroup enters the dying state
              after being deleted.  It remains in that state for an
              undefined period (which will depend on system load) while
              resources are freed before the cgroup is destroyed.  Note
              that the presence of some cgroups in the dying state is
              normal, and is not indicative of any problem.

              A process can't be made a member of a dying cgroup, and a
              dying cgroup can't be brought back to life.

   Limiting the number of descendant cgroups
       Each cgroup in the v2 hierarchy contains the following files, which
       can be used to view and set limits on the number of descendant
       cgroups under that cgroup:

       cgroup.max.depth (since Linux 4.14)
              This file defines a limit on the depth of nesting of
              descendant cgroups.  A value of 0 in this file means that no
              descendant cgroups can be created.  An attempt to create a
              descendant whose nesting level exceeds the limit fails
              (mkdir(2) fails with the error EAGAIN).

              Writing the string "max" to this file means that no limit is
              imposed.  The default value in this file is "max".

       cgroup.max.descendants (since Linux 4.14)
              This file defines a limit on the number of live descendant
              cgroups that this cgroup may have.  An attempt to create more
              descendants than allowed by the limit fails (mkdir(2) fails
              with the error EAGAIN).

              Writing the string "max" to this file means that no limit is
              imposed.  The default value in this file is "max".
661
663 In the context of cgroups, delegation means passing management of some
664 subtree of the cgroup hierarchy to a nonprivileged user. Cgroups v1
665 provides support for delegation based on file permissions in the cgroup
666 hierarchy but with less strict containment rules than v2 (as noted be‐
667 low). Cgroups v2 supports delegation with containment by explicit de‐
668 sign. The focus of the discussion in this section is on delegation in
669 cgroups v2, with some differences for cgroups v1 noted along the way.
670
671 Some terminology is required in order to describe delegation. A dele‐
672 gater is a privileged user (i.e., root) who owns a parent cgroup. A
673 delegatee is a nonprivileged user who will be granted the permissions
674 needed to manage some subhierarchy under that parent cgroup, known as
675 the delegated subtree.
676
677 To perform delegation, the delegater makes certain directories and
678 files writable by the delegatee, typically by changing the ownership of
679 the objects to be the user ID of the delegatee. Assuming that we want
680 to delegate the hierarchy rooted at (say) /dlgt_grp and that there are
681 not yet any child cgroups under that cgroup, the ownership of the fol‐
682 lowing is changed to the user ID of the delegatee:
683
684 /dlgt_grp
685 Changing the ownership of the root of the subtree means that any
686 new cgroups created under the subtree (and the files they con‐
687 tain) will also be owned by the delegatee.
688
689 /dlgt_grp/cgroup.procs
690 Changing the ownership of this file means that the delegatee can
691 move processes into the root of the delegated subtree.
692
693 /dlgt_grp/cgroup.subtree_control (cgroups v2 only)
694 Changing the ownership of this file means that the delegatee can
695 enable controllers (that are present in /dlgt_grp/cgroup.con‐
696 trollers) in order to further redistribute resources at lower
697 levels in the subtree. (As an alternative to changing the own‐
698 ership of this file, the delegater might instead add selected
699 controllers to this file.)
700
701 /dlgt_grp/cgroup.threads (cgroups v2 only)
702 Changing the ownership of this file is necessary if a threaded
703 subtree is being delegated (see the description of "thread
704 mode", below). This permits the delegatee to write thread IDs
705 to the file. (The ownership of this file can also be changed
706 when delegating a domain subtree, but currently this serves no
707 purpose, since, as described below, it is not possible to move a
708 thread between domain cgroups by writing its thread ID to the
709 cgroup.threads file.)
710
711 In cgroups v1, the corresponding file that should instead be
712 delegated is the tasks file.
713
714 The delegater should not change the ownership of any of the controller
715 interfaces files (e.g., pids.max, memory.high) in dlgt_grp. Those
716 files are used from the next level above the delegated subtree in order
717 to distribute resources into the subtree, and the delegatee should not
718 have permission to change the resources that are distributed into the
719 delegated subtree.
720
721 See also the discussion of the /sys/kernel/cgroup/delegate file in
722 NOTES for information about further delegatable files in cgroups v2.
723
724 After the aforementioned steps have been performed, the delegatee can
725 create child cgroups within the delegated subtree (the cgroup subdirec‐
726 tories and the files they contain will be owned by the delegatee) and
727 move processes between cgroups in the subtree. If some controllers are
728 present in dlgt_grp/cgroup.subtree_control, or the ownership of that
729 file was passed to the delegatee, the delegatee can also control the
730 further redistribution of the corresponding resources into the dele‐
731 gated subtree.
732
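The steps above might be carried out as follows (a sketch: the '#'
prompt indicates a root shell run by the delegater; the mount point
and the delegatee user name, cecilia, are assumptions):

```shell
# mkdir /sys/fs/cgroup/dlgt_grp
# chown cecilia /sys/fs/cgroup/dlgt_grp
# chown cecilia /sys/fs/cgroup/dlgt_grp/cgroup.procs
# chown cecilia /sys/fs/cgroup/dlgt_grp/cgroup.subtree_control
# chown cecilia /sys/fs/cgroup/dlgt_grp/cgroup.threads
```

Note that the controller interface files in dlgt_grp (e.g.,
pids.max, memory.high) are deliberately left owned by root.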
733 Cgroups v2 delegation: nsdelegate and cgroup namespaces
734 Starting with Linux 4.13, there is a second way to perform cgroup dele‐
735 gation in the cgroups v2 hierarchy. This is done by mounting or re‐
736 mounting the cgroup v2 filesystem with the nsdelegate mount option.
737 For example, if the cgroup v2 filesystem has already been mounted, we
738 can remount it with the nsdelegate option as follows:
739
740 mount -t cgroup2 -o remount,nsdelegate \
741 none /sys/fs/cgroup/unified
742
743 The effect of this mount option is to cause cgroup namespaces to auto‐
744 matically become delegation boundaries. More specifically, the follow‐
745 ing restrictions apply for processes inside the cgroup namespace:
746
747 * Writes to controller interface files in the root directory of the
748 namespace will fail with the error EPERM. Processes inside the
749 cgroup namespace can still write to delegatable files in the root
750 directory of the cgroup namespace such as cgroup.procs and
751 cgroup.subtree_control, and can create a subhierarchy underneath the
752 root directory.
753
754 * Attempts to migrate processes across the namespace boundary are de‐
755 nied (with the error ENOENT). Processes inside the cgroup namespace
756 can still (subject to the containment rules described below) move
757 processes between cgroups within the subhierarchy under the name‐
758 space root.
759
760 The ability to define cgroup namespaces as delegation boundaries makes
761 cgroup namespaces more useful. To understand why, suppose that we al‐
762 ready have one cgroup hierarchy that has been delegated to a nonprivi‐
763 leged user, cecilia, using the older delegation technique described
764 above. Suppose further that cecilia wanted to further delegate a sub‐
765 hierarchy under the existing delegated hierarchy. (For example, the
766 delegated hierarchy might be associated with an unprivileged container
767 run by cecilia.) Even if a cgroup namespace was employed, because both
768 hierarchies are owned by the unprivileged user cecilia, the following
769 illegitimate actions could be performed:
770
771 * A process in the inferior hierarchy could change the resource con‐
772 troller settings in the root directory of that hierarchy. (These
773 resource controller settings are intended to allow control to be ex‐
774 ercised from the parent cgroup; a process inside the child cgroup
775 should not be allowed to modify them.)
776
777 * A process inside the inferior hierarchy could move processes into
778 and out of the inferior hierarchy if the cgroups in the superior hi‐
779 erarchy were somehow visible.
780
781 Employing the nsdelegate mount option prevents both of these possibili‐
782 ties.
783
784 The nsdelegate mount option only has an effect when performed in the
785 initial mount namespace; in other mount namespaces, the option is
786 silently ignored.
787
788 Note: On some systems, systemd(1) automatically mounts the cgroup v2
789 filesystem. In order to experiment with the nsdelegate operation, it
790 may be useful to boot the kernel with the following command-line op‐
791 tions:
792
793 cgroup_no_v1=all systemd.legacy_systemd_cgroup_controller
794
795 These options cause the kernel to boot with the cgroups v1 controllers
796 disabled (meaning that the controllers are available in the v2 hierar‐
797 chy), and tells systemd(1) not to mount and use the cgroup v2 hierar‐
798 chy, so that the v2 hierarchy can be manually mounted with the desired
799 options after boot-up.
800
801 Cgroup delegation containment rules
802 Some delegation containment rules ensure that the delegatee can move
803 processes between cgroups within the delegated subtree, but can't move
804 processes from outside the delegated subtree into the subtree or vice
805 versa. A nonprivileged process (i.e., the delegatee) can write the PID
806 of a "target" process into a cgroup.procs file only if all of the fol‐
807 lowing are true:
808
809 * The writer has write permission on the cgroup.procs file in the des‐
810 tination cgroup.
811
812 * The writer has write permission on the cgroup.procs file in the
813 nearest common ancestor of the source and destination cgroups. Note
814 that in some cases, the nearest common ancestor may be the source or
815 destination cgroup itself. This requirement is not enforced for
816 cgroups v1 hierarchies, with the consequence that containment in v1
817 is less strict than in v2. (For example, in cgroups v1 the user
818 that owns two distinct delegated subhierarchies can move a process
819 between the hierarchies.)
820
821 * If the cgroup v2 filesystem was mounted with the nsdelegate option,
822 the writer must be able to see the source and destination cgroups
823 from its cgroup namespace.
824
825 * In cgroups v1: the effective UID of the writer (i.e., the delegatee)
826 matches the real user ID or the saved set-user-ID of the target
827 process. Before Linux 4.11, this requirement also applied in
828 cgroups v2.  (This was a historical requirement inherited from cgroups
829 v1 that was later deemed unnecessary, since the other rules suffice
830 for containment in cgroups v2.)
831
832 Note: one consequence of these delegation containment rules is that the
833 unprivileged delegatee can't place the first process into the delegated
834 subtree; instead, the delegater must place the first process (a process
835 owned by the delegatee) into the delegated subtree.
836
837CGROUPS VERSION 2 THREAD MODE
838 Among the restrictions imposed by cgroups v2 that were not present in
839 cgroups v1 are the following:
840
841 * No thread-granularity control: all of the threads of a process must
842 be in the same cgroup.
843
844 * No internal processes: a cgroup can't both have member processes and
845 exercise controllers on child cgroups.
846
847 Both of these restrictions were added because the lack of these re‐
848 strictions had caused problems in cgroups v1. In particular, the
849 cgroups v1 ability to allow thread-level granularity for cgroup member‐
850 ship made no sense for some controllers. (A notable example was the
851 memory controller: since threads share an address space, it made no
852 sense to split threads across different memory cgroups.)
853
854 Notwithstanding the initial design decision in cgroups v2, there were
855 use cases for certain controllers, notably the cpu controller, for
856 which thread-level granularity of control was meaningful and useful.
857 To accommodate such use cases, Linux 4.14 added thread mode for cgroups
858 v2.
859
860 Thread mode allows the following:
861
862 * The creation of threaded subtrees in which the threads of a process
863 may be spread across cgroups inside the tree. (A threaded subtree
864 may contain multiple multithreaded processes.)
865
866 * The concept of threaded controllers, which can distribute resources
867 across the cgroups in a threaded subtree.
868
869 * A relaxation of the "no internal processes rule", so that, within a
870 threaded subtree, a cgroup can both contain member threads and exer‐
871 cise resource control over child cgroups.
872
873 With the addition of thread mode, each nonroot cgroup now contains a
874 new file, cgroup.type, that exposes, and in some circumstances can be
875 used to change, the "type" of a cgroup. This file contains one of the
876 following type values:
877
878 domain This is a normal v2 cgroup that provides process-granularity
879 control. If a process is a member of this cgroup, then all
880 threads of the process are (by definition) in the same cgroup.
881 This is the default cgroup type, and provides the same behavior
882 that was provided for cgroups in the initial cgroups v2 imple‐
883 mentation.
884
885 threaded
886 This cgroup is a member of a threaded subtree. Threads can be
887 added to this cgroup, and controllers can be enabled for the
888 cgroup.
889
890 domain threaded
891 This is a domain cgroup that serves as the root of a threaded
892 subtree. This cgroup type is also known as "threaded root".
893
894 domain invalid
895 This is a cgroup inside a threaded subtree that is in an "in‐
896 valid" state. Processes can't be added to the cgroup, and con‐
897 trollers can't be enabled for the cgroup. The only thing that
898 can be done with this cgroup (other than deleting it) is to con‐
899 vert it to a threaded cgroup by writing the string "threaded" to
900 the cgroup.type file.
901
902 The rationale for the existence of this "interim" type during
903 the creation of a threaded subtree (rather than the kernel sim‐
904 ply immediately converting all cgroups under the threaded root
905 to the type threaded) is to allow for possible future extensions
906 to the thread mode model.
907
908 Threaded versus domain controllers
909 With the addition of thread mode, cgroups v2 now distinguishes two
910 types of resource controllers:
911
912 * Threaded controllers: these controllers support thread-granularity
913 for resource control and can be enabled inside threaded subtrees,
914 with the result that the corresponding controller-interface files
915 appear inside the cgroups in the threaded subtree. As at Linux
916 4.19, the following controllers are threaded: cpu, perf_event, and
917 pids.
918
919 * Domain controllers: these controllers support only process granular‐
920 ity for resource control. From the perspective of a domain con‐
921 troller, all threads of a process are always in the same cgroup.
922 Domain controllers can't be enabled inside a threaded subtree.
923
924 Creating a threaded subtree
925 There are two pathways that lead to the creation of a threaded subtree.
926 The first pathway proceeds as follows:
927
928 1. We write the string "threaded" to the cgroup.type file of a cgroup
929 y/z that currently has the type domain. This has the following ef‐
930 fects:
931
932 * The type of the cgroup y/z becomes threaded.
933
934 * The type of the parent cgroup, y, becomes domain threaded. The
935 parent cgroup is the root of a threaded subtree (also known as
936 the "threaded root").
937
938 * All other cgroups under y that were not already of type threaded
939 (because they were inside already existing threaded subtrees un‐
940 der the new threaded root) are converted to type domain invalid.
941 Any subsequently created cgroups under y will also have the type
942 domain invalid.
943
944 2. We write the string "threaded" to each of the domain invalid cgroups
945 under y, in order to convert them to the type threaded. As a conse‐
946 quence of this step, all cgroups under the threaded root now have
947 the type threaded and the threaded subtree is now fully usable. The
948 requirement to write "threaded" to each of these cgroups is somewhat
949 cumbersome, but allows for possible future extensions to the thread-
950 mode model.
951
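This first pathway might proceed as follows (a sketch: '#' indicates
a root shell, the v2 mount point is assumed to be /sys/fs/cgroup,
and y and z are hypothetical cgroup names):

```shell
# cd /sys/fs/cgroup
# mkdir -p y/z y/other
# echo threaded > y/z/cgroup.type        # Step 1
# cat y/z/cgroup.type
threaded
# cat y/cgroup.type
domain threaded
# cat y/other/cgroup.type
domain invalid
# echo threaded > y/other/cgroup.type    # Step 2
# cat y/other/cgroup.type
threaded
```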
952 The second way of creating a threaded subtree is as follows:
953
954 1. In an existing cgroup, z, that currently has the type domain, we (1)
955 enable one or more threaded controllers and (2) make a process a
956 member of z. (These two steps can be done in either order.) This
957 has the following consequences:
958
959 * The type of z becomes domain threaded.
960
961 * All of the descendant cgroups of z that were not already of type
962 threaded are converted to type domain invalid.
963
964 2. As before, we make the threaded subtree usable by writing the string
965 "threaded" to each of the domain invalid cgroups under y, in order
966 to convert them to the type threaded.
967
968 One of the consequences of the above pathways to creating a threaded
969 subtree is that the threaded root cgroup can be a parent only to
970 threaded (and domain invalid) cgroups. The threaded root cgroup can't
971 be a parent of a domain cgroup, and a threaded cgroup can't have a
972 sibling that is a domain cgroup.
973
974 Using a threaded subtree
975 Within a threaded subtree, threaded controllers can be enabled in each
976 subgroup whose type has been changed to threaded; upon doing so, the
977 corresponding controller interface files appear in the children of that
978 cgroup.
979
980 A process can be moved into a threaded subtree by writing its PID to
981 the cgroup.procs file in one of the cgroups inside the tree. This has
982 the effect of making all of the threads in the process members of the
983 corresponding cgroup and makes the process a member of the threaded
984 subtree. The threads of the process can then be spread across the
985 threaded subtree by writing their thread IDs (see gettid(2)) to the
986 cgroup.threads files in different cgroups inside the subtree. The
987 threads of a process must all reside in the same threaded subtree.
988
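For example (a sketch: '#' indicates a shell with suitable write
permissions; the paths are hypothetical and $PID and $TID are
placeholders for a process ID and a thread ID):

```shell
# echo $PID > /sys/fs/cgroup/y/z/cgroup.procs       # all threads move
# echo $TID > /sys/fs/cgroup/y/other/cgroup.threads # one thread moves
```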
989 As with writing to cgroup.procs, some containment rules apply when
990 writing to the cgroup.threads file:
991
992 * The writer must have write permission on the cgroup.threads file in
993 the destination cgroup.
994
995 * The writer must have write permission on the cgroup.procs file in
996 the nearest common ancestor of the source and destination cgroups.
997 (In some cases, the nearest common ancestor may be the source or
998 destination cgroup itself.)
999
1000 * The source and destination cgroups must be in the same threaded sub‐
1001 tree. (Outside a threaded subtree, an attempt to move a thread by
1002 writing its thread ID to the cgroup.threads file in a different do‐
1003 main cgroup fails with the error EOPNOTSUPP.)
1004
1005 The cgroup.threads file is present in each cgroup (including domain
1006 cgroups) and can be read in order to discover the set of threads that
1007 is present in the cgroup. The set of thread IDs obtained when reading
1008 this file is not guaranteed to be ordered or free of duplicates.
1009
1010 The cgroup.procs file in the threaded root shows the PIDs of all pro‐
1011 cesses that are members of the threaded subtree. The cgroup.procs
1012 files in the other cgroups in the subtree are not readable.
1013
1014 Domain controllers can't be enabled in a threaded subtree; no con‐
1015 troller-interface files appear inside the cgroups underneath the
1016 threaded root. From the point of view of a domain controller, threaded
1017 subtrees are invisible: a multithreaded process inside a threaded sub‐
1018 tree appears to a domain controller as a process that resides in the
1019 threaded root cgroup.
1020
1021 Within a threaded subtree, the "no internal processes" rule does not
1022 apply: a cgroup can both contain member processes (or threads) and exer‐
1023 cise controllers on child cgroups.
1024
1025 Rules for writing to cgroup.type and creating threaded subtrees
1026 A number of rules apply when writing to the cgroup.type file:
1027
1028 * Only the string "threaded" may be written. In other words, the only
1029 explicit transition that is possible is to convert a domain cgroup
1030 to type threaded.
1031
1032 * The effect of writing "threaded" depends on the current value in
1033 cgroup.type, as follows:
1034
1035 • domain or domain threaded: start the creation of a threaded sub‐
1036 tree (whose root is the parent of this cgroup) via the first of
1037 the pathways described above;
1038
1039 • domain invalid: convert this cgroup (which is inside a threaded
1040 subtree) to a usable (i.e., threaded) state;
1041
1042 • threaded: no effect (a "no-op").
1043
1044 * We can't write to a cgroup.type file if the parent's type is domain
1045 invalid. In other words, the cgroups of a threaded subtree must be
1046 converted to the threaded state in a top-down manner.
1047
1048 There are also some constraints that must be satisfied in order to cre‐
1049 ate a threaded subtree rooted at the cgroup x:
1050
1051 * There can be no member processes in the descendant cgroups of x.
1052 (The cgroup x can itself have member processes.)
1053
1054 * No domain controllers may be enabled in x's cgroup.subtree_control
1055 file.
1056
1057 If any of the above constraints is violated, then an attempt to write
1058 "threaded" to a cgroup.type file fails with the error ENOTSUP.
1059
1060 The "domain threaded" cgroup type
1061 According to the pathways described above, the type of a cgroup can
1062 change to domain threaded in either of the following cases:
1063
1064 * The string "threaded" is written to a child cgroup.
1065
1066 * A threaded controller is enabled inside the cgroup and a process is
1067 made a member of the cgroup.
1068
1069 A domain threaded cgroup, x, can revert to the type domain if the above
1070 conditions no longer hold true—that is, if all threaded child cgroups
1071 of x are removed and either x no longer has threaded controllers en‐
1072 abled or no longer has member processes.
1073
1074 When a domain threaded cgroup x reverts to the type domain:
1075
1076 * All domain invalid descendants of x that are not in lower-level
1077 threaded subtrees revert to the type domain.
1078
1079 * The root cgroups in any lower-level threaded subtrees revert to the
1080 type domain threaded.
1081
1082 Exceptions for the root cgroup
1083 The root cgroup of the v2 hierarchy is treated exceptionally: it can be
1084 the parent of both domain and threaded cgroups. If the string
1085 "threaded" is written to the cgroup.type file of one of the children of
1086 the root cgroup, then
1087
1088 * The type of that cgroup becomes threaded.
1089
1090 * The type of any descendants of that cgroup that are not part of
1091 lower-level threaded subtrees changes to domain invalid.
1092
1093 Note that in this case, there is no cgroup whose type becomes domain
1094 threaded. (Notionally, the root cgroup can be considered as the
1095 threaded root for the cgroup whose type was changed to threaded.)
1096
1097 The aim of this exceptional treatment for the root cgroup is to allow a
1098 threaded cgroup that employs the cpu controller to be placed as high as
1099 possible in the hierarchy, so as to minimize the (small) cost of
1100 traversing the cgroup hierarchy.
1101
1102 The cgroups v2 "cpu" controller and realtime threads
1103 As at Linux 4.19, the cgroups v2 cpu controller does not support con‐
1104 trol of realtime threads (specifically threads scheduled under any of
1105 the policies SCHED_FIFO, SCHED_RR, or SCHED_DEADLINE; see
1106 sched(7)). Therefore, the cpu controller can be enabled in the root
1107 cgroup only if all realtime threads are in the root cgroup. (If there
1108 are realtime threads in nonroot cgroups, then a write(2) of the string
1109 "+cpu" to the cgroup.subtree_control file fails with the error EINVAL.)
1110
1111 On some systems, systemd(1) places certain realtime threads in nonroot
1112 cgroups in the v2 hierarchy. On such systems, these threads must first
1113 be moved to the root cgroup before the cpu controller can be enabled.
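The realtime threads on a system can be listed without privilege;
for example (a sketch that assumes the procps ps(1), whose "class"
output column reports the scheduling policy):

```shell
# List the thread IDs that are scheduled under a realtime policy
# (class FF = SCHED_FIFO, RR = SCHED_RR); no privilege is required.
ps -eLo tid,class,comm | awk '$2 == "FF" || $2 == "RR"'
```

As root, each such process could then be moved to the root cgroup by
writing its PID to the root cgroup's cgroup.procs file.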
1114
1115ERRORS
1116 The following errors can occur for mount(2):
1117
1118 EBUSY An attempt to mount a cgroup version 1 filesystem specified nei‐
1119 ther the name= option (to mount a named hierarchy) nor a con‐
1120 troller name (or all).
1121
1122NOTES
1123 A child process created via fork(2) inherits its parent's cgroup mem‐
1124 berships. A process's cgroup memberships are preserved across ex‐
1125 ecve(2).
1126
1127 The clone3(2) CLONE_INTO_CGROUP flag can be used to create a child
1128 process that begins its life in a different version 2 cgroup from the
1129 parent process.
1130
1131 /proc files
1132 /proc/cgroups (since Linux 2.6.24)
1133 This file contains information about the controllers that are
1134 compiled into the kernel. An example of the contents of this
1135 file (reformatted for readability) is the following:
1136
1137 #subsys_name hierarchy num_cgroups enabled
1138 cpuset 4 1 1
1139 cpu 8 1 1
1140 cpuacct 8 1 1
1141 blkio 6 1 1
1142 memory 3 1 1
1143 devices 10 84 1
1144 freezer 7 1 1
1145 net_cls 9 1 1
1146 perf_event 5 1 1
1147 net_prio 9 1 1
1148 hugetlb 0 1 0
1149 pids 2 1 1
1150
1151 The fields in this file are, from left to right:
1152
1153 1. The name of the controller.
1154
1155 2. The unique ID of the cgroup hierarchy on which this con‐
1156 troller is mounted. If multiple cgroups v1 controllers are
1157 bound to the same hierarchy, then each will show the same hi‐
1158 erarchy ID in this field. The value in this field will be 0
1159 if:
1160
1161 a) the controller is not mounted on a cgroups v1 hierarchy;
1162
1163 b) the controller is bound to the cgroups v2 single unified
1164 hierarchy; or
1165
1166 c) the controller is disabled (see below).
1167
1168 3. The number of control groups in this hierarchy using this
1169 controller.
1170
1171 4. This field contains the value 1 if this controller is en‐
1172 abled, or 0 if it has been disabled (via the cgroup_disable
1173 kernel command-line boot parameter).
1174
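For example, the fourth (enabled) field can be used to filter this
file; the following sketch prints the controllers that are compiled
in but disabled (it prints nothing if no controller is disabled):

```shell
# Print the names (first field) of controllers whose "enabled"
# field (fourth field) is 0, skipping the header line.
awk 'NR > 1 && $4 == 0 { print $1 }' /proc/cgroups
```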
1175 /proc/[pid]/cgroup (since Linux 2.6.24)
1176 This file describes control groups to which the process with the
1177 corresponding PID belongs. The displayed information differs
1178 for cgroups version 1 and version 2 hierarchies.
1179
1180 For each cgroup hierarchy of which the process is a member,
1181 there is one entry containing three colon-separated fields:
1182
1183 hierarchy-ID:controller-list:cgroup-path
1184
1185 For example:
1186
1187 5:cpuacct,cpu,cpuset:/daemons
1188
1189 The colon-separated fields are, from left to right:
1190
1191 1. For cgroups version 1 hierarchies, this field contains a
1192 unique hierarchy ID number that can be matched to a hierarchy
1193 ID in /proc/cgroups. For the cgroups version 2 hierarchy,
1194 this field contains the value 0.
1195
1196 2. For cgroups version 1 hierarchies, this field contains a
1197 comma-separated list of the controllers bound to the hierar‐
1198 chy. For the cgroups version 2 hierarchy, this field is
1199 empty.
1200
1201 3. This field contains the pathname of the control group in the
1202 hierarchy to which the process belongs. This pathname is
1203 relative to the mount point of the hierarchy.
1204
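For example, a process can inspect its own memberships by reading
/proc/self/cgroup:

```shell
# Show the cgroup memberships of the calling process.  On a system
# using only cgroups v2, this shows a single entry with hierarchy
# ID 0 and an empty controller list (e.g., "0::/user.slice").
cat /proc/self/cgroup
```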
1205 /sys/kernel/cgroup files
1206 /sys/kernel/cgroup/delegate (since Linux 4.15)
1207 This file exports a list of the cgroups v2 files (one per line)
1208 that are delegatable (i.e., whose ownership should be changed to
1209 the user ID of the delegatee). In the future, the set of dele‐
1210 gatable files may change or grow, and this file provides a way
1211 for the kernel to inform user-space applications of which files
1212 must be delegated. As at Linux 4.15, one sees the following
1213 when inspecting this file:
1214
1215 $ cat /sys/kernel/cgroup/delegate
1216 cgroup.procs
1217 cgroup.subtree_control
1218 cgroup.threads
1219
1220 /sys/kernel/cgroup/features (since Linux 4.15)
1221 Over time, the set of cgroups v2 features that are provided by
1222 the kernel may change or grow, or some features may not be en‐
1223 abled by default. This file provides a way for user-space ap‐
1224 plications to discover what features the running kernel supports
1225 and has enabled. Features are listed one per line:
1226
1227 $ cat /sys/kernel/cgroup/features
1228 nsdelegate
1229 memory_localevents
1230
1231 The entries that can appear in this file are:
1232
1233 memory_localevents (since Linux 5.2)
1234 The kernel supports the memory_localevents mount option.
1235
1236 nsdelegate (since Linux 4.15)
1237 The kernel supports the nsdelegate mount option.
1238
1240 prlimit(1), systemd(1), systemd-cgls(1), systemd-cgtop(1), clone(2),
1241 ioprio_set(2), perf_event_open(2), setrlimit(2), cgroup_namespaces(7),
1242 cpuset(7), namespaces(7), sched(7), user_namespaces(7)
1243
1244 The kernel source file Documentation/admin-guide/cgroup-v2.rst.
1245
1246COLOPHON
1247 This page is part of release 5.10 of the Linux man-pages project. A
1248 description of the project, information about reporting bugs, and the
1249 latest version of this page, can be found at
1250 https://www.kernel.org/doc/man-pages/.
1251
1252
1253
1254Linux 2020-08-13 CGROUPS(7)