cgroups(7)

CGROUPS(7)                 Linux Programmer's Manual                CGROUPS(7)

NAME

       cgroups - Linux control groups

DESCRIPTION

       Control groups, usually referred to as cgroups, are a Linux kernel
       feature which allows processes to be organized into hierarchical groups
       whose usage of various types of resources can then be limited and
       monitored.  The kernel's cgroup interface is provided through a
       pseudo-filesystem called cgroupfs.  Grouping is implemented in the core
       cgroup kernel code, while resource tracking and limits are implemented
       in a set of per-resource-type subsystems (memory, CPU, and so on).
16
17   Terminology
18       A cgroup is a collection of processes that are bound to a set of limits
19       or parameters defined via the cgroup filesystem.
20
21       A subsystem is a kernel component that modifies  the  behavior  of  the
22       processes  in a cgroup.  Various subsystems have been implemented, mak‐
23       ing it possible to do things such as limiting the amount  of  CPU  time
24       and memory available to a cgroup, accounting for the CPU time used by a
25       cgroup, and freezing and resuming  execution  of  the  processes  in  a
26       cgroup.   Subsystems  are  sometimes also known as resource controllers
27       (or simply, controllers).
28
29       The cgroups for a controller are arranged in a hierarchy.  This hierar‐
30       chy  is  defined  by  creating,  removing,  and renaming subdirectories
31       within the cgroup filesystem.  At each  level  of  the  hierarchy,  at‐
32       tributes  (e.g.,  limits) can be defined.  The limits, control, and ac‐
33       counting provided by cgroups generally have effect throughout the  sub‐
34       hierarchy  underneath  the  cgroup  where  the  attributes are defined.
35       Thus, for example, the limits placed on a cgroup at a higher  level  in
36       the hierarchy cannot be exceeded by descendant cgroups.
37
38   Cgroups version 1 and version 2
       The initial release of the cgroups implementation was in Linux 2.6.24.
       Over time, various cgroup controllers have been added to allow the
       management of various types of resources.  However, the development of
       these controllers was largely uncoordinated, with the result that many
       inconsistencies arose between controllers and management of the cgroup
       hierarchies became rather complex.  A longer description of these
       problems can be found in the kernel source file
       Documentation/admin-guide/cgroup-v2.rst (or Documentation/cgroup-v2.txt
       in Linux 4.17 and earlier).
48
49       Because  of  the  problems  with  the  initial  cgroups  implementation
50       (cgroups version 1), starting in Linux 3.10, work began on a  new,  or‐
51       thogonal implementation to remedy these problems.  Initially marked ex‐
52       perimental, and hidden behind the -o __DEVEL__sane_behavior  mount  op‐
53       tion,  the new version (cgroups version 2) was eventually made official
54       with the release of Linux 4.5.  Differences between  the  two  versions
55       are  described  in  the  text  below.   The  file cgroup.sane_behavior,
56       present in cgroups v1, is a relic of this mount option.  The  file  al‐
57       ways reports "0" and is only retained for backward compatibility.
58
59       Although  cgroups  v2  is intended as a replacement for cgroups v1, the
60       older system continues to exist (and for compatibility reasons  is  un‐
61       likely  to be removed).  Currently, cgroups v2 implements only a subset
62       of the controllers available in cgroups v1.  The two systems are imple‐
63       mented so that both v1 controllers and v2 controllers can be mounted on
64       the same system.  Thus, for example, it is possible to use  those  con‐
65       trollers that are supported under version 2, while also using version 1
66       controllers where version 2 does not  yet  support  those  controllers.
67       The  only restriction here is that a controller can't be simultaneously
68       employed in both a cgroups v1 hierarchy and in the cgroups  v2  hierar‐
69       chy.
70

CGROUPS VERSION 1

72       Under  cgroups  v1,  each  controller may be mounted against a separate
73       cgroup filesystem that provides its own  hierarchical  organization  of
74       the  processes  on the system.  It is also possible to comount multiple
75       (or even all) cgroups v1 controllers against the same  cgroup  filesys‐
76       tem,  meaning that the comounted controllers manage the same hierarchi‐
77       cal organization of processes.
78
       For each mounted hierarchy, the directory tree mirrors the control
       group hierarchy.  Each control group is represented by a directory,
       with each of its child cgroups represented as a child directory.  For
       instance, /user/joe/1.session represents control group 1.session, which
       is a child of cgroup joe, which is a child of /user.  Under each cgroup
       directory is a set of files which can be read or written to, reflecting
       resource limits and a few general cgroup properties.
87
88   Tasks (threads) versus processes
89       In  cgroups v1, a distinction is drawn between processes and tasks.  In
90       this view, a process can  consist  of  multiple  tasks  (more  commonly
91       called  threads,  from a user-space perspective, and called such in the
92       remainder of this man page).  In cgroups v1, it is possible to indepen‐
93       dently manipulate the cgroup memberships of the threads in a process.
94
95       The cgroups v1 ability to split threads across different cgroups caused
96       problems in some cases.  For example, it made no sense for  the  memory
97       controller,  since  all  of the threads of a process share a single ad‐
98       dress space.  Because of these problems, the ability  to  independently
99       manipulate  the  cgroup memberships of the threads in a process was re‐
100       moved in the initial cgroups v2 implementation,  and  subsequently  re‐
101       stored  in a more limited form (see the discussion of "thread mode" be‐
102       low).
103
104   Mounting v1 controllers
       The use of cgroups requires a kernel built with the CONFIG_CGROUPS
       option.  In addition, each of the v1 controllers has an associated
       configuration option that must be set in order to employ that
       controller.
108
109       In order to use a v1 controller, it must be mounted  against  a  cgroup
110       filesystem.   The  usual  place  for  such  mounts  is under a tmpfs(5)
111       filesystem mounted at /sys/fs/cgroup.  Thus, one might  mount  the  cpu
112       controller as follows:
113
114           mount -t cgroup -o cpu none /sys/fs/cgroup/cpu
115
116       It is possible to comount multiple controllers against the same hierar‐
117       chy.  For example, here the cpu and cpuacct controllers  are  comounted
118       against a single hierarchy:
119
120           mount -t cgroup -o cpu,cpuacct none /sys/fs/cgroup/cpu,cpuacct
121
122       Comounting  controllers  has  the  effect that a process is in the same
123       cgroup for all of the comounted controllers.  Separately mounting  con‐
124       trollers  allows  a  process  to  be in cgroup /foo1 for one controller
125       while being in /foo2/foo3 for another.
126
127       It is possible to comount all v1 controllers against the  same  hierar‐
128       chy:
129
130           mount -t cgroup -o all cgroup /sys/fs/cgroup
131
132       (One  can  achieve  the same result by omitting -o all, since it is the
133       default if no controllers are explicitly specified.)
134
       It is not possible to mount the same controller against multiple cgroup
       hierarchies.  For example, it is not possible to mount both the cpu and
       cpuacct controllers against one hierarchy, and to mount the cpu
       controller alone against another hierarchy.  It is possible to create
       multiple mounts with exactly the same set of comounted controllers.
       However, in this case all that results is multiple mount points
       providing a view of the same hierarchy.
142
143       Note that on many systems, the v1 controllers are automatically mounted
144       under  /sys/fs/cgroup;  in particular, systemd(1) automatically creates
145       such mounts.
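
       To see which cgroup filesystems are already mounted (for example, by
       systemd(1)), one can inspect the mount table.  The following is a
       minimal sketch, not part of any standard tool: the function name
       list_cgroup_mounts is hypothetical, and it assumes a file in the
       /proc/mounts format, where the second field is the mount point, the
       third is the filesystem type, and the fourth is the mount options.

```shell
# List mounted cgroup filesystems as "<fstype> <mount point> <options>".
# $1 is an optional file in /proc/mounts format (defaults to /proc/mounts).
list_cgroup_mounts() {
    mounts_file=${1:-/proc/mounts}
    # Keep only v1 ("cgroup") and v2 ("cgroup2") filesystem entries.
    awk '$3 == "cgroup" || $3 == "cgroup2" { print $3, $2, $4 }' "$mounts_file"
}
```

       For v1 mounts, the options field includes the names of the comounted
       controllers, so the output also shows which controllers are in use by
       each v1 hierarchy.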
146
147   Unmounting v1 controllers
148       A mounted cgroup filesystem can be unmounted using the  umount(8)  com‐
149       mand, as in the following example:
150
151           umount /sys/fs/cgroup/pids
152
153       But note well: a cgroup filesystem is unmounted only if it is not busy,
154       that is, it has no child cgroups.  If this is not the  case,  then  the
155       only  effect of the umount(8) is to make the mount invisible.  Thus, to
156       ensure that the mount is really removed,  one  must  first  remove  all
157       child  cgroups,  which  in  turn can be done only after all member pro‐
158       cesses have been moved from those cgroups to the root cgroup.
159
160   Cgroups version 1 controllers
161       Each of the cgroups version 1 controllers is governed by a kernel  con‐
162       figuration  option  (listed  below).  Additionally, the availability of
163       the cgroups feature is governed by the CONFIG_CGROUPS kernel configura‐
164       tion option.
165
166       cpu (since Linux 2.6.24; CONFIG_CGROUP_SCHED)
167              Cgroups  can be guaranteed a minimum number of "CPU shares" when
168              a system is busy.  This does not limit a cgroup's CPU  usage  if
169              the  CPUs are not busy.  For further information, see Documenta‐
170              tion/scheduler/sched-design-CFS.rst   (or   Documentation/sched‐
171              uler/sched-design-CFS.txt in Linux 5.2 and earlier).
172
              In Linux 3.2, this controller was extended to provide CPU
              "bandwidth" control.  If the kernel is configured with
              CONFIG_CFS_BANDWIDTH, then within each scheduling period
              (defined via a file in the cgroup directory), it is possible to
              define an upper limit on the CPU time allocated to the processes
              in a cgroup.  This upper limit applies even if there is no other
              competition for the CPU.  Further information can be found in
              the kernel source file Documentation/scheduler/sched-bwc.rst (or
              Documentation/scheduler/sched-bwc.txt in Linux 5.2 and earlier).
182
183       cpuacct (since Linux 2.6.24; CONFIG_CGROUP_CPUACCT)
184              This provides accounting for CPU usage by groups of processes.
185
186              Further information can be found in the kernel source file Docu‐
187              mentation/admin-guide/cgroup-v1/cpuacct.rst    (or    Documenta‐
188              tion/cgroup-v1/cpuacct.txt in Linux 5.2 and earlier).
189
190       cpuset (since Linux 2.6.24; CONFIG_CPUSETS)
              This controller can be used to bind the processes in a cgroup to
              a specified set of CPUs and NUMA nodes.
193
194              Further information can be found in the kernel source file Docu‐
195              mentation/admin-guide/cgroup-v1/cpusets.rst    (or    Documenta‐
196              tion/cgroup-v1/cpusets.txt in Linux 5.2 and earlier).
197
198       memory (since Linux 2.6.25; CONFIG_MEMCG)
199              The memory controller supports reporting and limiting of process
200              memory, kernel memory, and swap used by cgroups.
201
202              Further information can be found in the kernel source file Docu‐
203              mentation/admin-guide/cgroup-v1/memory.rst    (or     Documenta‐
204              tion/cgroup-v1/memory.txt in Linux 5.2 and earlier).
205
206       devices (since Linux 2.6.26; CONFIG_CGROUP_DEVICE)
207              This supports controlling which processes may create (mknod) de‐
208              vices as well as open them for reading or writing.  The policies
209              may  be  specified  as allow-lists and deny-lists.  Hierarchy is
210              enforced, so new rules must not violate existing rules  for  the
211              target or ancestor cgroups.
212
213              Further information can be found in the kernel source file Docu‐
214              mentation/admin-guide/cgroup-v1/devices.rst    (or    Documenta‐
215              tion/cgroup-v1/devices.txt in Linux 5.2 and earlier).
216
217       freezer (since Linux 2.6.28; CONFIG_CGROUP_FREEZER)
218              The  freezer  cgroup  can  suspend and restore (resume) all pro‐
219              cesses in a cgroup.  Freezing a cgroup /A also causes its  chil‐
220              dren, for example, processes in /A/B, to be frozen.
221
222              Further information can be found in the kernel source file Docu‐
223              mentation/admin-guide/cgroup-v1/freezer-subsystem.rst (or  Docu‐
224              mentation/cgroup-v1/freezer-subsystem.txt  in Linux 5.2 and ear‐
225              lier).
226
227       net_cls (since Linux 2.6.29; CONFIG_CGROUP_NET_CLASSID)
228              This places a classid, specified  for  the  cgroup,  on  network
229              packets created by a cgroup.  These classids can then be used in
230              firewall rules, as well as used to shape  traffic  using  tc(8).
231              This  applies only to packets leaving the cgroup, not to traffic
232              arriving at the cgroup.
233
234              Further information can be found in the kernel source file Docu‐
235              mentation/admin-guide/cgroup-v1/net_cls.rst    (or    Documenta‐
236              tion/cgroup-v1/net_cls.txt in Linux 5.2 and earlier).
237
238       blkio (since Linux 2.6.33; CONFIG_BLK_CGROUP)
239              The blkio cgroup controls and limits access to  specified  block
240              devices by applying IO control in the form of throttling and up‐
241              per limits against leaf nodes  and  intermediate  nodes  in  the
242              storage hierarchy.
243
244              Two  policies are available.  The first is a proportional-weight
245              time-based division of disk implemented with CFQ.   This  is  in
246              effect  for  leaf  nodes  using CFQ.  The second is a throttling
247              policy which specifies upper I/O rate limits on a device.
248
249              Further information can be found in the kernel source file Docu‐
250              mentation/admin-guide/cgroup-v1/blkio-controller.rst  (or  Docu‐
251              mentation/cgroup-v1/blkio-controller.txt in Linux 5.2  and  ear‐
252              lier).
253
254       perf_event (since Linux 2.6.39; CONFIG_CGROUP_PERF)
255              This  controller  allows perf monitoring of the set of processes
256              grouped in a cgroup.
257
258              Further information can be found in the kernel source files
259
260       net_prio (since Linux 3.3; CONFIG_CGROUP_NET_PRIO)
261              This allows priorities to be specified, per  network  interface,
262              for cgroups.
263
264              Further information can be found in the kernel source file Docu‐
265              mentation/admin-guide/cgroup-v1/net_prio.rst   (or    Documenta‐
266              tion/cgroup-v1/net_prio.txt in Linux 5.2 and earlier).
267
268       hugetlb (since Linux 3.5; CONFIG_CGROUP_HUGETLB)
269              This supports limiting the use of huge pages by cgroups.
270
271              Further information can be found in the kernel source file Docu‐
272              mentation/admin-guide/cgroup-v1/hugetlb.rst    (or    Documenta‐
273              tion/cgroup-v1/hugetlb.txt in Linux 5.2 and earlier).
274
275       pids (since Linux 4.3; CONFIG_CGROUP_PIDS)
              This controller permits limiting the number of processes that
              may be created in a cgroup (and its descendants).
278
279              Further information can be found in the kernel source file Docu‐
280              mentation/admin-guide/cgroup-v1/pids.rst      (or     Documenta‐
281              tion/cgroup-v1/pids.txt in Linux 5.2 and earlier).
282
283       rdma (since Linux 4.11; CONFIG_CGROUP_RDMA)
284              The RDMA controller permits limiting the use of RDMA/IB-specific
285              resources per cgroup.
286
287              Further information can be found in the kernel source file Docu‐
288              mentation/admin-guide/cgroup-v1/rdma.rst     (or      Documenta‐
289              tion/cgroup-v1/rdma.txt in Linux 5.2 and earlier).
290
291   Creating cgroups and moving processes
292       A cgroup filesystem initially contains a single root cgroup, '/', which
293       all processes belong to.  A new cgroup is created by creating a  direc‐
294       tory in the cgroup filesystem:
295
296           mkdir /sys/fs/cgroup/cpu/cg1
297
298       This creates a new empty cgroup.
299
300       A  process  may  be  moved  to  this cgroup by writing its PID into the
301       cgroup's cgroup.procs file:
302
303           echo $$ > /sys/fs/cgroup/cpu/cg1/cgroup.procs
304
305       Only one PID at a time should be written to this file.
306
307       Writing the value 0 to a cgroup.procs file causes the  writing  process
308       to be moved to the corresponding cgroup.
309
       When writing a PID into the cgroup.procs file, all threads in the
       process are moved into the new cgroup at once.
312
313       Within a hierarchy, a process can be a member of  exactly  one  cgroup.
314       Writing a process's PID to a cgroup.procs file automatically removes it
315       from the cgroup of which it was previously a member.
316
317       The cgroup.procs file can be read to obtain a  list  of  the  processes
318       that are members of a cgroup.  The returned list of PIDs is not guaran‐
319       teed to be in order.  Nor is it guaranteed to be  free  of  duplicates.
320       (For example, a PID may be recycled while reading from the list.)
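
       Because the list read from cgroup.procs may be unordered and may
       contain duplicates, a consumer that needs a clean list has to
       normalize it.  A minimal sketch follows; the function name cgroup_pids
       is hypothetical, and its argument is assumed to be the pathname of a
       cgroup directory.

```shell
# Print the PIDs that are members of a cgroup, sorted numerically and with
# duplicates removed.  $1 is the cgroup directory, for example
# /sys/fs/cgroup/cpu/cg1.
cgroup_pids() {
    sort -n -u "$1/cgroup.procs"
}
```

       The result is still only a snapshot: processes may join or leave the
       cgroup, and PIDs may be recycled, between the read and any later use
       of the list.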
321
322       In  cgroups  v1, an individual thread can be moved to another cgroup by
323       writing its thread ID (i.e., the kernel thread ID returned by  clone(2)
324       and  gettid(2)) to the tasks file in a cgroup directory.  This file can
325       be read to discover the set of threads that are members of the cgroup.
326
327   Removing cgroups
       A cgroup can be removed only when it has no child cgroups and contains
       no (nonzombie) processes.  So long as that is the case, one can simply
       remove the corresponding directory pathname.  Note that files in a
       cgroup directory cannot and need not be removed.
332
333   Cgroups v1 release notification
334       Two  files can be used to determine whether the kernel provides notifi‐
335       cations when a cgroup becomes empty.  A  cgroup  is  considered  to  be
336       empty when it contains no child cgroups and no member processes.
337
       A special file in the root directory of each cgroup hierarchy,
       release_agent, can be used to register the pathname of a program that
       may be invoked when a cgroup in the hierarchy becomes empty.  The
       pathname of the newly empty cgroup (relative to the cgroup mount point)
       is provided as the sole command-line argument when the release_agent
       program is invoked.  The release_agent program might remove the cgroup
       directory, or perhaps repopulate it with a process.
345
346       The  default  value of the release_agent file is empty, meaning that no
347       release agent is invoked.
348
349       The content of the release_agent file can also be specified via a mount
350       option when the cgroup filesystem is mounted:
351
352           mount -o release_agent=pathname ...
353
       Whether or not the release_agent program is invoked when a particular
       cgroup becomes empty is determined by the value in the
       notify_on_release file in the corresponding cgroup directory.  If this
       file contains the value 0, then the release_agent program is not
       invoked.  If it contains the value 1, the release_agent program is
       invoked.  The default value for this file in the root cgroup is 0.  At
       the time when a new cgroup is created, the value in this file is
       inherited from the corresponding file in the parent cgroup.
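
       As an illustration of what a release agent might do, the following
       sketch removes the directory of a newly empty cgroup.  The function
       name remove_released_cgroup is hypothetical.  It assumes that the
       agent knows where the hierarchy is mounted (the kernel supplies only
       the pathname relative to the mount point) and accepts that relative
       pathname with or without a leading slash.

```shell
# Remove the directory of a cgroup that has become empty.
# $1: mount point of the cgroup hierarchy (assumed to be known to the agent)
# $2: pathname of the empty cgroup, relative to the mount point
remove_released_cgroup() {
    mount_point=$1
    rel_path=${2#/}          # drop one leading slash, if present
    # rmdir fails, leaving the directory in place, if the cgroup has been
    # repopulated or has gained child cgroups in the meantime.
    rmdir "$mount_point/$rel_path"
}
```

       A release agent script built on this sketch would call the function
       with its fixed mount point and "$1", the sole argument that it
       receives when it is invoked.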
362
363   Cgroup v1 named hierarchies
364       In cgroups v1, it is possible to mount a cgroup hierarchy that  has  no
365       attached controllers:
366
367           mount -t cgroup -o none,name=somename none /some/mount/point
368
       Multiple instances of such hierarchies can be mounted; each hierarchy
       must have a unique name.  The only purpose of such hierarchies is to
       track processes.  (See the discussion of release notification above.)
       An example of this is the name=systemd cgroup hierarchy that is used by
       systemd(1) to track services and user sessions.
374
375       Since  Linux 5.0, the cgroup_no_v1 kernel boot option (described below)
376       can be used to disable  cgroup  v1  named  hierarchies,  by  specifying
377       cgroup_no_v1=named.
378

CGROUPS VERSION 2

380       In cgroups v2, all mounted controllers reside in a single unified hier‐
381       archy.  While (different) controllers may be simultaneously mounted un‐
382       der  the  v1  and  v2 hierarchies, it is not possible to mount the same
383       controller simultaneously under both the v1 and the v2 hierarchies.
384
385       The new behaviors in cgroups v2 are summarized here, and in some  cases
386       elaborated in the following subsections.
387
388       1. Cgroups  v2  provides  a  unified  hierarchy  against which all con‐
389          trollers are mounted.
390
391       2. "Internal" processes are not permitted.  With the exception  of  the
392          root  cgroup,  processes may reside only in leaf nodes (cgroups that
393          do not themselves contain child cgroups).  The details are  somewhat
394          more subtle than this, and are described below.
395
       3. Active controllers must be specified via the files
          cgroup.controllers and cgroup.subtree_control.
398
399       4. The   tasks   file   has   been   removed.    In    addition,    the
400          cgroup.clone_children file that is employed by the cpuset controller
401          has been removed.
402
403       5. An improved mechanism for notification of empty cgroups is  provided
404          by the cgroup.events file.
405
406       For  more changes, see the Documentation/admin-guide/cgroup-v2.rst file
407       in the kernel source (or Documentation/cgroup-v2.txt in Linux 4.17  and
408       earlier).
409
410       Some of the new behaviors listed above saw subsequent modification with
411       the addition in Linux 4.14 of "thread mode" (described below).
412
413   Cgroups v2 unified hierarchy
414       In cgroups v1, the ability to mount different controllers against  dif‐
415       ferent hierarchies was intended to allow great flexibility for applica‐
416       tion design.  In practice, though, the flexibility  turned  out  to  be
417       less  useful than expected, and in many cases added complexity.  There‐
418       fore, in cgroups v2, all available controllers are  mounted  against  a
419       single hierarchy.  The available controllers are automatically mounted,
420       meaning that it is not necessary (or  possible)  to  specify  the  con‐
421       trollers when mounting the cgroup v2 filesystem using a command such as
422       the following:
423
424           mount -t cgroup2 none /mnt/cgroup2
425
426       A cgroup v2 controller is available only if it is not currently in  use
427       via  a  mount against a cgroup v1 hierarchy.  Or, to put things another
428       way, it is not possible to employ the same controller against both a v1
429       hierarchy and the unified v2 hierarchy.  This means that it may be nec‐
430       essary first to unmount a v1 controller  (as  described  above)  before
431       that  controller  is available in v2.  Since systemd(1) makes heavy use
432       of some v1 controllers by default, it can in some cases be  simpler  to
433       boot  the  system  with  selected v1 controllers disabled.  To do this,
434       specify the cgroup_no_v1=list option on the kernel boot  command  line;
435       list  is a comma-separated list of the names of the controllers to dis‐
436       able, or the word all to disable all v1 controllers.   (This  situation
437       is correctly handled by systemd(1), which falls back to operating with‐
438       out the specified controllers.)
439
440       Note that on many modern systems, systemd(1) automatically  mounts  the
441       cgroup2 filesystem at /sys/fs/cgroup/unified during the boot process.
442
443   Cgroups v2 mount options
       The following options (mount -o) can be specified when mounting the
       cgroup v2 filesystem:
446
447       nsdelegate (since Linux 4.15)
448              Treat cgroup namespaces as delegation boundaries.  For  details,
449              see below.
450
451       memory_localevents (since Linux 5.2)
              The memory.events file should show statistics only for the
              cgroup itself, and not for any descendant cgroups.  This was the
              behavior before Linux 5.2.  Starting in Linux 5.2, the default
              behavior is to include statistics for descendant cgroups in
              memory.events, and this mount option can be used to revert to
              the legacy behavior.  This option is system wide and can be set
              on mount or modified through remount only from the initial mount
              namespace; it is silently ignored in noninitial namespaces.
460
461   Cgroups v2 controllers
462       The following controllers, documented in the kernel source  file  Docu‐
463       mentation/admin-guide/cgroup-v2.rst  (or Documentation/cgroup-v2.txt in
464       Linux 4.17 and earlier), are supported in cgroups version 2:
465
466       cpu (since Linux 4.15)
467              This is the successor to the version  1  cpu  and  cpuacct  con‐
468              trollers.
469
470       cpuset (since Linux 5.0)
471              This is the successor of the version 1 cpuset controller.
472
473       freezer (since Linux 5.2)
474              This is the successor of the version 1 freezer controller.
475
476       hugetlb (since Linux 5.6)
477              This is the successor of the version 1 hugetlb controller.
478
479       io (since Linux 4.5)
480              This is the successor of the version 1 blkio controller.
481
482       memory (since Linux 4.5)
483              This is the successor of the version 1 memory controller.
484
485       perf_event (since Linux 4.11)
486              This is the same as the version 1 perf_event controller.
487
488       pids (since Linux 4.5)
489              This is the same as the version 1 pids controller.
490
491       rdma (since Linux 4.11)
492              This is the same as the version 1 rdma controller.
493
494       There  is  no direct equivalent of the net_cls and net_prio controllers
495       from cgroups version 1.  Instead, support has been added to iptables(8)
496       to  allow  eBPF  filters that hook on cgroup v2 pathnames to make deci‐
497       sions about network traffic on a per-cgroup basis.
498
499       The v2 devices controller provides no interface files; instead,  device
500       control  is gated by attaching an eBPF (BPF_CGROUP_DEVICE) program to a
501       v2 cgroup.
502
503   Cgroups v2 subtree control
504       Each cgroup in the v2 hierarchy contains the following two files:
505
506       cgroup.controllers
507              This read-only file exposes a list of the controllers  that  are
508              available  in  this cgroup.  The contents of this file match the
509              contents  of  the  cgroup.subtree_control  file  in  the  parent
510              cgroup.
511
512       cgroup.subtree_control
513              This  is  a list of controllers that are active (enabled) in the
514              cgroup.  The set of controllers in this file is a subset of  the
515              set in the cgroup.controllers of this cgroup.  The set of active
516              controllers is modified by writing strings to this file contain‐
517              ing  space-delimited  controller names, each preceded by '+' (to
518              enable a controller) or '-' (to disable a controller), as in the
519              following example:
520
521                  echo '+pids -memory' > x/y/cgroup.subtree_control
522
523              An  attempt  to  enable  a  controller  that  is  not present in
524              cgroup.controllers leads to an ENOENT error when writing to  the
525              cgroup.subtree_control file.
526
527       Because  the  list of controllers in cgroup.subtree_control is a subset
528       of those cgroup.controllers, a controller that has been disabled in one
529       cgroup  in  the  hierarchy can never be re-enabled in the subtree below
530       that cgroup.
531
532       A cgroup's cgroup.subtree_control  file  determines  the  set  of  con‐
533       trollers  that  are  exercised in the child cgroups.  When a controller
534       (e.g., pids) is present in the cgroup.subtree_control file of a  parent
535       cgroup,   then  the  corresponding  controller-interface  files  (e.g.,
536       pids.max) are automatically created in the children of that cgroup  and
537       can be used to exert resource control in the child cgroups.
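
       Since enabling a controller that is absent from cgroup.controllers
       fails, a script may prefer to check availability before writing to
       cgroup.subtree_control.  The following is a minimal sketch under that
       assumption; the function name enable_controllers is hypothetical, and
       the check duplicates in user space a test that the kernel performs
       anyway.

```shell
# Enable controllers for the children of a cgroup.
# $1: cgroup directory; remaining arguments: controller names.
# Returns nonzero without writing anything if any requested controller is
# not listed in the cgroup's cgroup.controllers file.
enable_controllers() {
    cg=$1
    shift
    request=
    for ctrl in "$@"; do
        # cgroup.controllers is a space-delimited list on a single line.
        if ! tr ' ' '\n' < "$cg/cgroup.controllers" | grep -qxF -- "$ctrl"; then
            echo "controller not available in $cg: $ctrl" >&2
            return 1
        fi
        request="$request${request:+ }+$ctrl"
    done
    [ -n "$request" ] || return 0
    # All names are written in one string, each preceded by '+'.
    echo "$request" > "$cg/cgroup.subtree_control"
}
```

       For example, enable_controllers x/y pids memory writes the string
       "+pids +memory" to x/y/cgroup.subtree_control, provided that both
       controllers are listed in x/y/cgroup.controllers.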
538
539   Cgroups v2 "no internal processes" rule
540       Cgroups  v2 enforces a so-called "no internal processes" rule.  Roughly
541       speaking, this rule means that, with the exception of the root  cgroup,
542       processes may reside only in leaf nodes (cgroups that do not themselves
543       contain child cgroups).  This avoids the need to decide how  to  parti‐
544       tion resources between processes which are members of cgroup A and pro‐
545       cesses in child cgroups of A.
546
547       For instance, if cgroup /cg1/cg2 exists, then a process may  reside  in
548       /cg1/cg2, but not in /cg1.  This is to avoid an ambiguity in cgroups v1
549       with respect to the delegation of resources between processes  in  /cg1
550       and  its  child  cgroups.  The recommended approach in cgroups v2 is to
551       create a subdirectory called leaf for any nonleaf cgroup  which  should
552       contain  processes, but no child cgroups.  Thus, processes which previ‐
553       ously would have gone into /cg1 would now go into /cg1/leaf.  This  has
554       the  advantage of making explicit the relationship between processes in
555       /cg1/leaf and /cg1's other children.
556
557       The "no internal processes" rule is in fact  more  subtle  than  stated
558       above.   More precisely, the rule is that a (nonroot) cgroup can't both
559       (1) have member processes, and  (2)  distribute  resources  into  child
560       cgroups—that is, have a nonempty cgroup.subtree_control file.  Thus, it
561       is possible for a cgroup  to  have  both  member  processes  and  child
562       cgroups,  but  before  controllers  can be enabled for that cgroup, the
563       member processes must be moved out of the cgroup  (e.g.,  perhaps  into
564       the child cgroups).
565
566       With  the  Linux  4.14 addition of "thread mode" (described below), the
567       "no internal processes" rule has been relaxed in some cases.
568
569   Cgroups v2 cgroup.events file
570       Each nonroot cgroup in the v2  hierarchy  contains  a  read-only  file,
571       cgroup.events, whose contents are key-value pairs (delimited by newline
572       characters, with the key and value separated by spaces) providing state
573       information about the cgroup:
574
575           $ cat mygrp/cgroup.events
576           populated 1
577           frozen 0
578
579       The following keys may appear in this file:
580
581       populated
582              The  value of this key is either 1, if this cgroup or any of its
583              descendants has member processes, or otherwise 0.
584
585       frozen (since Linux 5.2)
586              The value of this key is 1 if this cgroup is  currently  frozen,
587              or 0 if it is not.
588
589       The  cgroup.events file can be monitored, in order to receive notifica‐
590       tion when the value of one of its keys changes.  Such monitoring can be
591       done  using  inotify(7), which notifies changes as IN_MODIFY events, or
592       poll(2), which notifies changes by returning the  POLLPRI  and  POLLERR
593       bits in the revents field.
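
       A monitoring program typically rereads the file after each
       notification.  The following sketch shows one way to extract the
       value of a single key; the function name is hypothetical.

```shell
# Print the value of key $2 from $1/cgroup.events.
# Returns a nonzero status if the key does not appear in the file.
cgroup_event() {
    awk -v key="$2" '$1 == key { print $2; found = 1 }
                     END { exit !found }' "$1/cgroup.events"
}
```

       For example, "cgroup_event mygrp populated" would print 1 for the
       file contents shown above.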
594
595   Cgroup v2 release notification
596       Cgroups  v2  provides a new mechanism for obtaining notification when a
597       cgroup becomes empty.  The cgroups v1 release_agent  and  notify_on_re‐
598       lease  files  are  removed,  and  replaced  by the populated key in the
599       cgroup.events file.  This key either has the value 0, meaning that  the
600       cgroup  (and  its descendants) contain no (nonzombie) member processes,
601       or 1, meaning that the cgroup (or one of its descendants) contains mem‐
602       ber processes.
603
604       The  cgroups v2 release-notification mechanism offers the following ad‐
605       vantages over the cgroups v1 release_agent mechanism:
606
607       *  It allows for cheaper notification, since a single process can moni‐
608          tor  multiple  cgroup.events  files  (using the techniques described
609          earlier).  By contrast, the cgroups v1 mechanism  requires  the  ex‐
610          pense of creating a process for each notification.
611
612       *  Notification for different cgroup subhierarchies can be delegated to
613          different processes.  By contrast, the cgroups v1  mechanism  allows
614          only one release agent for an entire hierarchy.
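
       As a rough illustration of a single process inspecting many
       cgroup.events files, the following sketch lists the cgroups in a
       subtree that currently report "populated 0".  The function name is
       hypothetical, and the sketch simply scans the files rather than
       waiting for notifications.

```shell
# Print every cgroup directory at or below $1 whose cgroup.events file
# contains the line "populated 0".
list_unpopulated() {
    find "$1" -type f -name cgroup.events | while IFS= read -r f; do
        if grep -q '^populated 0$' "$f"; then
            dirname "$f"
        fi
    done
}
```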
615
616   Cgroups v2 cgroup.stat file
617       Each  cgroup  in the v2 hierarchy contains a read-only cgroup.stat file
618       (first introduced in Linux 4.14) that consists of lines containing key-
619       value pairs.  The following keys currently appear in this file:
620
621       nr_descendants
622              This  is  the  total number of visible (i.e., living) descendant
623              cgroups underneath this cgroup.
624
625       nr_dying_descendants
626              This is the total number of dying descendant cgroups  underneath
627              this  cgroup.   A  cgroup  enters  the  dying  state after being
628              deleted.  It remains in  that  state  for  an  undefined  period
629              (which will depend on system load) while resources are freed be‐
630              fore the cgroup is destroyed.  Note that the  presence  of  some
631              cgroups  in  the dying state is normal, and is not indicative of
632              any problem.
633
634              A process can't be made a member of a dying cgroup, and a  dying
635              cgroup can't be brought back to life.
636
637   Limiting the number of descendant cgroups
638       Each cgroup in the v2 hierarchy contains the following files, which can
639       be used to view and set limits on the number of descendant cgroups  un‐
640       der that cgroup:
641
642       cgroup.max.depth (since Linux 4.14)
643              This  file defines a limit on the depth of nesting of descendant
644              cgroups.  A value of 0 in this file  means  that  no  descendant
645              cgroups can be created.  An attempt to create a descendant whose
646              nesting level exceeds the limit fails (mkdir(2) fails  with  the
647              error EAGAIN).
648
649              Writing the string "max" to this file means that no limit is im‐
650              posed.  The default value in this file is "max".
651
652       cgroup.max.descendants (since Linux 4.14)
653              This file defines a limit  on  the  number  of  live  descendant
654              cgroups  that  this  cgroup may have.  An attempt to create more
655              descendants than allowed by the limit fails (mkdir(2) fails with
656              the error EAGAIN).
657
658              Writing the string "max" to this file means that no limit is im‐
659              posed.  The default value in this file is "max".
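
       The two limit files can be set together along the following lines.
       This is a minimal sketch: the function name is hypothetical, and the
       validation merely mirrors the values described above (a nonnegative
       integer or the string "max").

```shell
# Set the nesting-depth limit ($2) and the descendant-count limit ($3)
# of the cgroup directory $1.  Each limit is an integer or "max".
set_descendant_limits() {
    cg=$1 depth=$2 count=$3
    for v in "$depth" "$count"; do
        case $v in
            max) ;;
            ''|*[!0-9]*) echo "invalid limit: $v" >&2; return 1 ;;
        esac
    done
    echo "$depth" > "$cg/cgroup.max.depth" &&
    echo "$count" > "$cg/cgroup.max.descendants"
}
```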
660

CGROUPS DELEGATION: DELEGATING A HIERARCHY TO A LESS PRIVILEGED USER

662       In the context of cgroups, delegation means passing management of  some
663       subtree  of  the  cgroup hierarchy to a nonprivileged user.  Cgroups v1
664       provides support for delegation based on file permissions in the cgroup
665       hierarchy  but with less strict containment rules than v2 (as noted be‐
666       low).  Cgroups v2 supports delegation with containment by explicit  de‐
667       sign.   The focus of the discussion in this section is on delegation in
668       cgroups v2, with some differences for cgroups v1 noted along the way.
669
670       Some terminology is required in order to describe delegation.  A  dele‐
671       gater  is  a  privileged user (i.e., root) who owns a parent cgroup.  A
672       delegatee is a nonprivileged user who will be granted  the  permissions
673       needed  to  manage some subhierarchy under that parent cgroup, known as
674       the delegated subtree.
675
676       To perform delegation, the  delegater  makes  certain  directories  and
677       files writable by the delegatee, typically by changing the ownership of
678       the objects to be the user ID of the delegatee.  Assuming that we  want
679       to  delegate the hierarchy rooted at (say) /dlgt_grp and that there are
680       not yet any child cgroups under that cgroup, the ownership of the  fol‐
681       lowing is changed to the user ID of the delegatee:
682
683       /dlgt_grp
684              Changing the ownership of the root of the subtree means that any
685              new cgroups created under the subtree (and the files  they  con‐
686              tain) will also be owned by the delegatee.
687
688       /dlgt_grp/cgroup.procs
689              Changing the ownership of this file means that the delegatee can
690              move processes into the root of the delegated subtree.
691
692       /dlgt_grp/cgroup.subtree_control (cgroups v2 only)
693              Changing the ownership of this file means that the delegatee can
694              enable  controllers  (that  are present in /dlgt_grp/cgroup.con‐
695              trollers) in order to further redistribute  resources  at  lower
696              levels  in the subtree.  (As an alternative to changing the own‐
697              ership of this file, the delegater might  instead  add  selected
698              controllers to this file.)
699
700       /dlgt_grp/cgroup.threads (cgroups v2 only)
701              Changing  the  ownership of this file is necessary if a threaded
702              subtree is being  delegated  (see  the  description  of  "thread
703              mode",  below).   This permits the delegatee to write thread IDs
704              to the file.  (The ownership of this file can  also  be  changed
705              when  delegating  a domain subtree, but currently this serves no
706              purpose, since, as described below, it is not possible to move a
707              thread  between  domain  cgroups by writing its thread ID to the
708              cgroup.threads file.)
709
710              In cgroups v1, the corresponding file  that  should  instead  be
711              delegated is the tasks file.
712
713       The delegater should not change the ownership of any of the controller
714       interface files (e.g., pids.max, memory.high) in /dlgt_grp.  Those
715       files are used from the next level above the delegated subtree in order
716       to distribute resources into the subtree, and the delegatee should  not
717       have  permission  to change the resources that are distributed into the
718       delegated subtree.
719
720       See also the discussion  of  the  /sys/kernel/cgroup/delegate  file  in
721       NOTES for information about further delegatable files in cgroups v2.
722
723       After  the  aforementioned steps have been performed, the delegatee can
724       create child cgroups within the delegated subtree (the cgroup subdirec‐
725       tories  and  the files they contain will be owned by the delegatee) and
726       move processes between cgroups in the subtree.  If some controllers are
727       present  in  dlgt_grp/cgroup.subtree_control,  or the ownership of that
728       file was passed to the delegatee, the delegatee can  also  control  the
729       further  redistribution  of  the corresponding resources into the dele‐
730       gated subtree.
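
       The ownership changes described above might be scripted as follows.
       This is a sketch only: both function names are hypothetical, and the
       list of files is limited to those named in this section (it does not
       consult /sys/kernel/cgroup/delegate).

```shell
# Print the objects whose ownership is passed to the delegatee: the
# cgroup directory $1 itself, plus the delegatable files that exist.
# Controller interface files (pids.max and so on) are left alone.
delegation_targets() {
    printf '%s\n' "$1"
    for f in cgroup.procs cgroup.subtree_control cgroup.threads; do
        [ -e "$1/$f" ] && printf '%s\n' "$1/$f"
    done
    return 0
}

# Change the owner of those objects to user $2 (a user name or UID).
delegate_cgroup() {
    delegation_targets "$1" | while IFS= read -r path; do
        chown "$2" "$path" || exit 1
    done
}
```

       A delegater running as root might then invoke, for example,
       "delegate_cgroup /sys/fs/cgroup/dlgt_grp cecilia".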
731
732   Cgroups v2 delegation: nsdelegate and cgroup namespaces
733       Starting with Linux 4.13, there is a second way to perform cgroup dele‐
734       gation  in  the  cgroups v2 hierarchy.  This is done by mounting or re‐
735       mounting the cgroup v2 filesystem with  the  nsdelegate  mount  option.
736       For  example,  if the cgroup v2 filesystem has already been mounted, we
737       can remount it with the nsdelegate option as follows:
738
739           mount -t cgroup2 -o remount,nsdelegate \
740                            none /sys/fs/cgroup/unified
741
742       The effect of this mount option is to cause cgroup namespaces to  auto‐
743       matically become delegation boundaries.  More specifically, the follow‐
744       ing restrictions apply for processes inside the cgroup namespace:
745
746       *  Writes to controller interface files in the root  directory  of  the
747          namespace  will  fail  with  the  error EPERM.  Processes inside the
748          cgroup namespace can still write to delegatable files  in  the  root
749          directory   of   the  cgroup  namespace  such  as  cgroup.procs  and
750          cgroup.subtree_control, and can create subhierarchy  underneath  the
751          root directory.
752
753       *  Attempts  to migrate processes across the namespace boundary are de‐
754          nied (with the error ENOENT).  Processes inside the cgroup namespace
755          can  still  (subject  to the containment rules described below) move
756          processes between cgroups within the subhierarchy  under  the  name‐
757          space root.
758
759       The  ability to define cgroup namespaces as delegation boundaries makes
760       cgroup namespaces more useful.  To understand why, suppose that we  al‐
761       ready  have one cgroup hierarchy that has been delegated to a nonprivi‐
762       leged user, cecilia, using the  older  delegation  technique  described
763       above.   Suppose further that cecilia wanted to further delegate a sub‐
764       hierarchy under the existing delegated hierarchy.   (For  example,  the
765       delegated  hierarchy might be associated with an unprivileged container
766       run by cecilia.)  Even if a cgroup namespace was employed, because both
767       hierarchies  are  owned by the unprivileged user cecilia, the following
768       illegitimate actions could be performed:
769
770       *  A process in the inferior hierarchy could change the  resource  con‐
771          troller  settings  in  the root directory of that hierarchy.  (These
772          resource controller settings are intended to allow control to be ex‐
773          ercised  from  the  parent cgroup; a process inside the child cgroup
774          should not be allowed to modify them.)
775
776       *  A process inside the inferior hierarchy could  move  processes  into
777          and out of the inferior hierarchy if the cgroups in the superior hi‐
778          erarchy were somehow visible.
779
780       Employing the nsdelegate mount option prevents both of these possibili‐
781       ties.
782
783       The  nsdelegate  mount  option only has an effect when performed in the
784       initial mount namespace; in  other  mount  namespaces,  the  option  is
785       silently ignored.
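
       Because the option can be silently ignored, it may be useful to check
       whether it is actually in effect.  The following sketch looks for
       nsdelegate among the mount options of a cgroup2 filesystem; the
       function name is hypothetical, and the input is assumed to be in the
       format of /proc/mounts.

```shell
# Succeed if a cgroup2 filesystem is mounted with the nsdelegate option.
# $1 (optional): a file in /proc/mounts format; defaults to /proc/mounts.
has_nsdelegate() {
    awk '$3 == "cgroup2" && ("," $4 ",") ~ /,nsdelegate,/ { found = 1 }
         END { exit !found }' "${1:-/proc/mounts}"
}
```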
786
787       Note:  On  some  systems, systemd(1) automatically mounts the cgroup v2
788       filesystem.  In order to experiment with the nsdelegate  operation,  it
789       may  be  useful  to boot the kernel with the following command-line op‐
790       tions:
791
792           cgroup_no_v1=all systemd.legacy_systemd_cgroup_controller
793
794       These options cause the kernel to boot with the cgroups v1 controllers
795       disabled (meaning that the controllers are available in the v2
796       hierarchy), and tell systemd(1) not to mount and use the cgroup v2
797       hierarchy, so that the v2 hierarchy can be manually mounted with the
798       desired options after boot-up.
799
800   Cgroup delegation containment rules
801       Some delegation containment rules ensure that the  delegatee  can  move
802       processes  between cgroups within the delegated subtree, but can't move
803       processes from outside the delegated subtree into the subtree  or  vice
804       versa.  A nonprivileged process (i.e., the delegatee) can write the PID
805       of a "target" process into a cgroup.procs file only if all of the  fol‐
806       lowing are true:
807
808       *  The writer has write permission on the cgroup.procs file in the des‐
809          tination cgroup.
810
811       *  The writer has write permission on  the  cgroup.procs  file  in  the
812          nearest common ancestor of the source and destination cgroups.  Note
813          that in some cases, the nearest common ancestor may be the source or
814          destination  cgroup  itself.   This  requirement is not enforced for
815          cgroups v1 hierarchies, with the consequence that containment in  v1
816          is  less  strict  than  in v2.  (For example, in cgroups v1 the user
817          that owns two distinct delegated subhierarchies can move  a  process
818          between the hierarchies.)
819
820       *  If  the cgroup v2 filesystem was mounted with the nsdelegate option,
821          the writer must be able to see the source  and  destination  cgroups
822          from its cgroup namespace.
823
824       *  In cgroups v1: the effective UID of the writer (i.e., the delegatee)
825          matches the real user ID or the  saved  set-user-ID  of  the  target
826          process.  Before Linux 4.11, this requirement also applied in
827          cgroups v2.  (This was a historical requirement inherited from
828          cgroups v1 that was later deemed unnecessary, since the other rules
829          suffice for containment in cgroups v2.)
830
831       Note: one consequence of these delegation containment rules is that the
832       unprivileged delegatee can't place the first process into the delegated
833       subtree; instead, the delegater must place the first process (a process
834       owned by the delegatee) into the delegated subtree.
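
       The two permission checks in the rules above can be approximated from
       user space as follows.  This sketch is illustrative only: the function
       names are hypothetical, the arguments are assumed to be absolute paths
       of cgroup directories, and neither the nsdelegate visibility rule nor
       the cgroups v1 UID rule is modeled.

```shell
# Print the nearest common ancestor of two absolute directory paths.
common_ancestor() {
    ca_a=${1%/} ca_b=${2%/}
    while [ -n "$ca_a" ]; do
        case "$ca_b/" in
            "$ca_a"/*) printf '%s\n' "$ca_a"; return 0 ;;
        esac
        # Strip the last path component; stop if there is none left.
        case $ca_a in
            */*) ca_a=${ca_a%/*} ;;
            *) break ;;
        esac
    done
    printf '/\n'
}

# Succeed if the caller has write permission on cgroup.procs in the
# destination cgroup ($2) and in the nearest common ancestor of the
# source ($1) and destination cgroups.
may_write_pid() {
    [ -w "$2/cgroup.procs" ] &&
    [ -w "$(common_ancestor "$1" "$2")/cgroup.procs" ]
}
```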
835

CGROUPS VERSION 2 THREAD MODE

837       Among  the  restrictions imposed by cgroups v2 that were not present in
838       cgroups v1 are the following:
839
840       *  No thread-granularity control: all of the threads of a process  must
841          be in the same cgroup.
842
843       *  No internal processes: a cgroup can't both have member processes and
844          exercise controllers on child cgroups.
845
846       Both of these restrictions were added because the  lack  of  these  re‐
847       strictions  had  caused  problems  in  cgroups  v1.  In particular, the
848       cgroups v1 ability to allow thread-level granularity for cgroup member‐
849       ship  made  no  sense for some controllers.  (A notable example was the
850       memory controller: since threads share an address  space,  it  made  no
851       sense to split threads across different memory cgroups.)
852
853       Notwithstanding  the  initial design decision in cgroups v2, there were
854       use cases for certain controllers,  notably  the  cpu  controller,  for
855       which  thread-level  granularity  of control was meaningful and useful.
856       To accommodate such use cases, Linux 4.14 added thread mode for cgroups
857       v2.
858
859       Thread mode allows the following:
860
861       *  The  creation of threaded subtrees in which the threads of a process
862          may be spread across cgroups inside the tree.  (A  threaded  subtree
863          may contain multiple multithreaded processes.)
864
865       *  The  concept of threaded controllers, which can distribute resources
866          across the cgroups in a threaded subtree.
867
868       *  A relaxation of the "no internal processes" rule, so that, within a
869          threaded subtree, a cgroup can both contain member threads and
870          exercise resource control over child cgroups.
871
872       With the addition of thread mode, each nonroot cgroup  now  contains  a
873       new  file,  cgroup.type, that exposes, and in some circumstances can be
874       used to change, the "type" of a cgroup.  This file contains one of  the
875       following type values:
876
877       domain This  is  a  normal  v2 cgroup that provides process-granularity
878              control.  If a process is a member  of  this  cgroup,  then  all
879              threads  of  the process are (by definition) in the same cgroup.
880              This is the default cgroup type, and provides the same  behavior
881              that  was  provided for cgroups in the initial cgroups v2 imple‐
882              mentation.
883
884       threaded
885              This cgroup is a member of a threaded subtree.  Threads  can  be
886              added  to  this  cgroup,  and controllers can be enabled for the
887              cgroup.
888
889       domain threaded
890              This is a domain cgroup that serves as the root  of  a  threaded
891              subtree.  This cgroup type is also known as "threaded root".
892
893       domain invalid
894              This  is  a  cgroup inside a threaded subtree that is in an "in‐
895              valid" state.  Processes can't be added to the cgroup, and  con‐
896              trollers  can't  be enabled for the cgroup.  The only thing that
897              can be done with this cgroup (other than deleting it) is to con‐
898              vert it to a threaded cgroup by writing the string "threaded" to
899              the cgroup.type file.
900
901              The rationale for the existence of this "interim" type during
902              the creation of a threaded subtree (rather than the kernel
903              simply immediately converting all cgroups under the threaded
904              root to the type threaded) is to allow for possible future
905              extensions to the thread mode model.
906
907   Threaded versus domain controllers
908       With the addition of thread mode, cgroups v2 now distinguishes two
909       types of resource controllers:
910
911       *  Threaded  controllers:  these controllers support thread-granularity
912          for resource control and can be enabled  inside  threaded  subtrees,
913          with  the  result  that the corresponding controller-interface files
914          appear inside the cgroups in the  threaded  subtree.   As  at  Linux
915          4.19,  the  following controllers are threaded: cpu, perf_event, and
916          pids.
917
918       *  Domain controllers: these controllers support only process granular‐
919          ity  for  resource  control.   From the perspective of a domain con‐
920          troller, all threads of a process are always  in  the  same  cgroup.
921          Domain controllers can't be enabled inside a threaded subtree.
922
923   Creating a threaded subtree
924       There are two pathways that lead to the creation of a threaded subtree.
925       The first pathway proceeds as follows:
926
927       1. We write the string "threaded" to the cgroup.type file of  a  cgroup
928          y/z  that currently has the type domain.  This has the following ef‐
929          fects:
930
931          *  The type of the cgroup y/z becomes threaded.
932
933          *  The type of the parent cgroup, y, becomes domain  threaded.   The
934             parent  cgroup  is  the root of a threaded subtree (also known as
935             the "threaded root").
936
937          *  All other cgroups under y that were not already of type  threaded
938             (because  they were inside already existing threaded subtrees un‐
939             der the new threaded root) are converted to type domain  invalid.
940             Any  subsequently created cgroups under y will also have the type
941             domain invalid.
942
943       2. We write the string "threaded" to each of the domain invalid cgroups
944          under y, in order to convert them to the type threaded.  As a
945          consequence of this step, all cgroups under the threaded root now
946          have the type threaded and the threaded subtree is now fully usable.
947          The requirement to write "threaded" to each of these cgroups is
948          somewhat cumbersome, but allows for possible future extensions to
949          the thread-mode model.
950
951       The second way of creating a threaded subtree is as follows:
952
953       1. In an existing cgroup, z, that currently has the type domain, we (1)
954          enable  one  or  more  threaded controllers and (2) make a process a
955          member of z.  (These two steps can be done in either  order.)   This
956          has the following consequences:
957
958          *  The type of z becomes domain threaded.
959
960          *  All of the descendant cgroups of z that were not already of type
961             threaded are converted to type domain invalid.
962
963       2. As before, we make the threaded subtree usable by writing the string
964          "threaded" to each of the domain invalid cgroups under z, in order
965          to convert them to the type threaded.
966
967       One of the consequences of the above pathways to  creating  a  threaded
968       subtree  is  that  the  threaded  root  cgroup  can be a parent only to
969       threaded (and domain invalid) cgroups.  The threaded root cgroup  can't
970       be a parent of a domain cgroup, and a threaded cgroup can't have a
971       sibling that is a domain cgroup.
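
       The conversion step that is common to both pathways (writing
       "threaded" to each cgroup under the threaded root) can be sketched as
       follows.  The function name is hypothetical.  The sketch relies on
       find(1) listing a directory before its contents, so that cgroups are
       converted from the top down.

```shell
# Write "threaded" to the cgroup.type file of every cgroup below $1
# (the intended threaded root) that is not already of type threaded.
# $1 is assumed to be a path without a trailing slash.
make_subtree_threaded() {
    find "$1" -type d ! -path "$1" | while IFS= read -r cg; do
        if [ "$(cat "$cg/cgroup.type")" != threaded ]; then
            echo threaded > "$cg/cgroup.type" || exit 1
        fi
    done
}
```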
972
973   Using a threaded subtree
974       Within a threaded subtree, threaded controllers can be enabled in  each
975       subgroup  whose  type  has been changed to threaded; upon doing so, the
976       corresponding controller interface files appear in the children of that
977       cgroup.
978
979       A  process  can  be moved into a threaded subtree by writing its PID to
980       the cgroup.procs file in one of the cgroups inside the tree.  This  has
981       the  effect  of making all of the threads in the process members of the
982       corresponding cgroup and makes the process a  member  of  the  threaded
983       subtree.   The  threads  of  the  process can then be spread across the
984       threaded subtree by writing their thread IDs  (see  gettid(2))  to  the
985       cgroup.threads  files  in  different  cgroups  inside the subtree.  The
986       threads of a process must all reside in the same threaded subtree.
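
       The two steps described above (moving a process into the subtree, then
       spreading its threads) might look as follows in shell.  Both function
       names are hypothetical; the thread IDs of a process can be obtained,
       for example, by listing /proc/[pid]/task.

```shell
# Move process $2 (and thus all of its threads) into cgroup $1.
move_process() {
    echo "$2" > "$1/cgroup.procs"
}

# Move the threads whose IDs are given as $2, $3, ... into cgroup $1,
# writing one thread ID per write.
move_threads() {
    cg=$1; shift
    for tid in "$@"; do
        echo "$tid" > "$cg/cgroup.threads" || return 1
    done
}
```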
987
988       As with writing to cgroup.procs,  some  containment  rules  apply  when
989       writing to the cgroup.threads file:
990
991       *  The  writer must have write permission on the cgroup.threads file in
992          the destination cgroup.
993
994       *  The writer must have write permission on the  cgroup.procs  file  in
995          the common ancestor of the source and destination cgroups.  (In some
996          cases, the common ancestor may be the source or  destination  cgroup
997          itself.)
998
999       *  The source and destination cgroups must be in the same threaded sub‐
1000          tree.  (Outside a threaded subtree, an attempt to move a  thread  by
1001          writing  its thread ID to the cgroup.threads file in a different do‐
1002          main cgroup fails with the error EOPNOTSUPP.)
1003
1004       The cgroup.threads file is present in  each  cgroup  (including  domain
1005       cgroups)  and  can be read in order to discover the set of threads that
1006       is present in the cgroup.  The set of thread IDs obtained when  reading
1007       this file is not guaranteed to be ordered or free of duplicates.
1008
1009       The  cgroup.procs  file in the threaded root shows the PIDs of all pro‐
1010       cesses that are members of  the  threaded  subtree.   The  cgroup.procs
1011       files in the other cgroups in the subtree are not readable.
1012
1013       Domain  controllers  can't  be  enabled  in a threaded subtree; no con‐
1014       troller-interface  files  appear  inside  the  cgroups  underneath  the
1015       threaded root.  From the point of view of a domain controller, threaded
1016       subtrees are invisible: a multithreaded process inside a threaded  sub‐
1017       tree  appears  to  a domain controller as a process that resides in the
1018       threaded root cgroup.
1019
1020       Within a threaded subtree, the "no internal processes" rule does not
1021       apply: a cgroup can both contain member processes (or threads) and
1022       exercise controllers on child cgroups.
1023
1024   Rules for writing to cgroup.type and creating threaded subtrees
1025       A number of rules apply when writing to the cgroup.type file:
1026
1027       *  Only the string "threaded" may be written.  In other words, the only
1028          explicit  transition  that is possible is to convert a domain cgroup
1029          to type threaded.
1030
1031       *  The effect of writing "threaded" depends on  the  current  value  in
1032          cgroup.type, as follows:
1033
1034          •  domain  or domain threaded: start the creation of a threaded sub‐
1035             tree (whose root is the parent of this cgroup) via the  first  of
1036             the pathways described above;
1037
1038          •  domain invalid:  convert  this cgroup (which is inside a threaded
1039             subtree) to a usable (i.e., threaded) state;
1040
1041          •  threaded: no effect (a "no-op").
1042
1043       *  We can't write to a cgroup.type file if the parent's type is  domain
1044          invalid.   In other words, the cgroups of a threaded subtree must be
1045          converted to the threaded state in a top-down manner.
1046
1047       There are also some constraints that must be satisfied in order to cre‐
1048       ate a threaded subtree rooted at the cgroup x:
1049
1050       *  There  can  be  no  member processes in the descendant cgroups of x.
1051          (The cgroup x can itself have member processes.)
1052
1053       *  No domain controllers may be enabled in  x's  cgroup.subtree_control
1054          file.
1055
1056       If  any  of the above constraints is violated, then an attempt to write
1057       "threaded" to a cgroup.type file fails with the error ENOTSUP.
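
       The two constraints above can be checked before attempting the write.
       The following is a sketch under stated assumptions: the function name
       is hypothetical, and the set of threaded controllers is the one listed
       earlier for Linux 4.19 (cpu, perf_event, and pids), which may differ
       on other kernel versions.

```shell
# Succeed if cgroup $1 satisfies the constraints for becoming the root
# of a threaded subtree.
can_become_threaded_root() {
    x=$1
    # Constraint 1: no member processes in the descendants of x.  The
    # files are read rather than tested for size; unreadable files are
    # skipped in this sketch.
    n=$(find "$x" -type f -name cgroup.procs ! -path "$x/cgroup.procs" \
             -exec cat {} + 2>/dev/null | wc -l)
    [ "$n" -eq 0 ] || return 1
    # Constraint 2: no domain controllers in x's cgroup.subtree_control.
    for c in $(cat "$x/cgroup.subtree_control"); do
        case $c in
            cpu|perf_event|pids) ;;
            *) return 1 ;;
        esac
    done
    return 0
}
```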
1058
1059   The "domain threaded" cgroup type
1060       According to the pathways described above, the type  of  a  cgroup  can
1061       change to domain threaded in either of the following cases:
1062
1063       *  The string "threaded" is written to a child cgroup.
1064
1065       *  A  threaded controller is enabled inside the cgroup and a process is
1066          made a member of the cgroup.
1067
1068       A domain threaded cgroup, x, can revert to the type domain if the above
1069       conditions  no  longer hold true—that is, if all threaded child cgroups
1070       of x are removed and either x no longer has  threaded  controllers  en‐
1071       abled or no longer has member processes.
1072
1073       When a domain threaded cgroup x reverts to the type domain:
1074
1075       *  All  domain  invalid  descendants  of  x that are not in lower-level
1076          threaded subtrees revert to the type domain.
1077
1078       *  The root cgroups in any lower-level threaded subtrees revert to  the
1079          type domain threaded.
1080
1081   Exceptions for the root cgroup
1082       The root cgroup of the v2 hierarchy is treated exceptionally: it can be
1083       the parent  of  both  domain  and  threaded  cgroups.   If  the  string
1084       "threaded" is written to the cgroup.type file of one of the children of
1085       the root cgroup, then
1086
1087       *  The type of that cgroup becomes threaded.
1088
1089       *  The type of any descendants of that cgroup  that  are  not  part  of
1090          lower-level threaded subtrees changes to domain invalid.
1091
1092       Note  that  in  this case, there is no cgroup whose type becomes domain
1093       threaded.  (Notionally, the  root  cgroup  can  be  considered  as  the
1094       threaded root for the cgroup whose type was changed to threaded.)
1095
1096       The aim of this exceptional treatment for the root cgroup is to allow a
1097       threaded cgroup that employs the cpu controller to be placed as high as
1098       possible  in  the  hierarchy,  so  as  to  minimize the (small) cost of
1099       traversing the cgroup hierarchy.
1100
1101   The cgroups v2 "cpu" controller and realtime threads
1102       As at Linux 4.19, the cgroups v2 cpu controller does not support
1103       control of realtime threads (specifically, threads scheduled under any
1104       of the policies SCHED_FIFO, SCHED_RR, and SCHED_DEADLINE; see
1105       sched(7)).  Therefore, the cpu controller can be enabled in the root
1106       cgroup only if all realtime threads are in the root cgroup.  (If  there
1107       are  realtime threads in nonroot cgroups, then a write(2) of the string
1108       "+cpu" to the cgroup.subtree_control file fails with the error EINVAL.)
1109
1110       On some systems, systemd(1) places certain realtime threads in  nonroot
1111       cgroups in the v2 hierarchy.  On such systems, these threads must first
1112       be moved to the root cgroup before the cpu controller can be enabled.
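
       To find the threads concerned, one possibility is to filter the
       scheduling class reported by ps(1).  The following sketch assumes the
       procps-ng implementation of ps(1), which reports the classes FF, RR,
       and DLN for the three realtime policies; the function name is
       hypothetical.

```shell
# Read lines of the form "TID CLASS" on standard input and print the
# IDs of the threads that have a realtime scheduling class.
realtime_tids() {
    awk '$2 == "FF" || $2 == "RR" || $2 == "DLN" { print $1 }'
}
```

       With procps-ng, suitable input can be produced using
       "ps -eLo tid=,cls= | realtime_tids".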
1113

ERRORS

1115       The following errors can occur for mount(2):
1116
1117       EBUSY  An attempt to mount a cgroup version 1 filesystem specified nei‐
1118              ther  the  name=  option (to mount a named hierarchy) nor a con‐
1119              troller name (or all).
1120

NOTES

1122       A child process created via fork(2) inherits its parent's  cgroup  mem‐
1123       berships.   A  process's  cgroup  memberships  are preserved across ex‐
1124       ecve(2).
1125
1126       The clone3(2) CLONE_INTO_CGROUP flag can be  used  to  create  a  child
1127       process  that  begins its life in a different version 2 cgroup from the
1128       parent process.
1129
1130   /proc files
       /proc/cgroups (since Linux 2.6.24)
              This file contains information about the  controllers  that  are
              compiled  into  the  kernel.  An example of the contents of this
              file (reformatted for readability) is the following:

                  #subsys_name    hierarchy      num_cgroups    enabled
                  cpuset          4              1              1
                  cpu             8              1              1
                  cpuacct         8              1              1
                  blkio           6              1              1
                  memory          3              1              1
                  devices         10             84             1
                  freezer         7              1              1
                  net_cls         9              1              1
                  perf_event      5              1              1
                  net_prio        9              1              1
                  hugetlb         0              1              0
                  pids            2              1              1

              The fields in this file are, from left to right:

              1. The name of the controller.

              2. The unique ID of the cgroup  hierarchy  on  which  this  con‐
                 troller  is  mounted.  If multiple cgroups v1 controllers are
                 bound to the same hierarchy, then each will show the same hi‐
                 erarchy  ID in this field.  The value in this field will be 0
                 if:

                   a) the controller is not mounted on a cgroups v1 hierarchy;

                   b) the controller is bound to the cgroups v2 single unified
                      hierarchy; or

                   c) the controller is disabled (see below).

              3. The  number  of  control  groups in this hierarchy using this
                 controller.

              4. This field contains the value 1 if  this  controller  is  en‐
                 abled,  or  0 if it has been disabled (via the cgroup_disable
                 kernel command-line boot parameter).
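
              The four fields described above can be read with a few lines  of
              Python.   The  following  is  a  minimal  sketch; the names Con‐
              trollerInfo, parse_proc_cgroups(), and group_by_hierarchy()  are
              hypothetical  helpers  introduced  for illustration, and the
              parser assumes the whitespace-separated layout shown in the  ex‐
              ample above.

```python
from collections import namedtuple

# One row of /proc/cgroups (hypothetical container type).
ControllerInfo = namedtuple(
    "ControllerInfo", "name hierarchy_id num_cgroups enabled"
)


def parse_proc_cgroups(text):
    """Parse the contents of /proc/cgroups into ControllerInfo rows."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the "#subsys_name ..." header line.
        if not line or line.startswith("#"):
            continue
        name, hierarchy_id, num_cgroups, enabled = line.split()
        rows.append(
            ControllerInfo(
                name, int(hierarchy_id), int(num_cgroups), enabled == "1"
            )
        )
    return rows


def group_by_hierarchy(rows):
    """Map each nonzero hierarchy ID to the controllers bound to it.

    Rows with hierarchy ID 0 are left out, because 0 does not
    identify a cgroups v1 hierarchy.
    """
    groups = {}
    for row in rows:
        if row.hierarchy_id != 0:
            groups.setdefault(row.hierarchy_id, []).append(row.name)
    return groups
```

              Applied to the example above, group_by_hierarchy()  would  place
              cpu  and cpuacct together under hierarchy ID 8, and net_cls and
              net_prio together under hierarchy ID 9.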

       /proc/[pid]/cgroup (since Linux 2.6.24)
              This file describes control groups to which the process with the
              corresponding  PID  belongs.   The displayed information differs
              for cgroups version 1 and version 2 hierarchies.

              For each cgroup hierarchy of which  the  process  is  a  member,
              there is one entry containing three colon-separated fields:

                  hierarchy-ID:controller-list:cgroup-path

              For example:

                  5:cpuacct,cpu,cpuset:/daemons

              The colon-separated fields are, from left to right:

              1. For  cgroups  version  1  hierarchies,  this field contains a
                 unique hierarchy ID number that can be matched to a hierarchy
                 ID  in  /proc/cgroups.   For the cgroups version 2 hierarchy,
                 this field contains the value 0.

              2. For cgroups version 1  hierarchies,  this  field  contains  a
                 comma-separated  list of the controllers bound to the hierar‐
                 chy.  For the cgroups version  2  hierarchy,  this  field  is
                 empty.

              3. This  field contains the pathname of the control group in the
                 hierarchy to which the process  belongs.   This  pathname  is
                 relative to the mount point of the hierarchy.
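
              The three-field entry format described above can  be  parsed  as
              in  the  following  Python  sketch.  The names CgroupEntry,
              parse_pid_cgroup(), and v2_path() are hypothetical helpers  in‐
              troduced  for illustration.  The sketch assumes that a version 2
              entry is recognizable by hierarchy ID 0 together with  an  empty
              controller list, as stated in the field descriptions.

```python
from collections import namedtuple

# One entry of /proc/[pid]/cgroup (hypothetical container type).
CgroupEntry = namedtuple("CgroupEntry", "hierarchy_id controllers path")


def parse_pid_cgroup(text):
    """Parse the contents of /proc/[pid]/cgroup into CgroupEntry items."""
    entries = []
    for line in text.splitlines():
        if not line:
            continue
        # Split at most twice, so that any ':' occurring in the
        # pathname stays inside the third field.
        hierarchy_id, controller_list, path = line.split(":", 2)
        controllers = controller_list.split(",") if controller_list else []
        entries.append(CgroupEntry(int(hierarchy_id), controllers, path))
    return entries


def v2_path(entries):
    """Return the cgroup pathname in the version 2 hierarchy, or None."""
    for entry in entries:
        if entry.hierarchy_id == 0 and not entry.controllers:
            return entry.path
    return None
```

              The returned pathname is relative to the mount point of the  hi‐
              erarchy,  so  a  caller would still need to join it with that
              mount point to locate the cgroup directory.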

   /sys/kernel/cgroup files
       /sys/kernel/cgroup/delegate (since Linux 4.15)
              This  file exports a list of the cgroups v2 files (one per line)
              that are delegatable (i.e., whose ownership should be changed to
              the  user ID of the delegatee).  In the future, the set of dele‐
              gatable files may change or grow, and this file provides  a  way
              for  the kernel to inform user-space applications of which files
              must be delegated.  As at Linux 4.15,  one  sees  the  following
              when inspecting this file:

                  $ cat /sys/kernel/cgroup/delegate
                  cgroup.procs
                  cgroup.subtree_control
                  cgroup.threads
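
              A  program  that  performs delegation could read this list at
              run time rather than hard-coding the filenames.   The  following
              Python  sketch  illustrates  the  idea; the helper names delegat‐
              able_files() and paths_to_chown() are hypothetical, and the  cgroup
              directory is an assumption supplied by the caller.

```python
import os


def delegatable_files(delegate_path="/sys/kernel/cgroup/delegate"):
    """Return the filenames listed (one per line) in the delegate file."""
    with open(delegate_path) as f:
        return [line.strip() for line in f if line.strip()]


def paths_to_chown(cgroup_dir, names):
    """Build the pathname of each delegatable file inside cgroup_dir."""
    return [os.path.join(cgroup_dir, name) for name in names]
```

              Changing the ownership of the resulting pathnames  (for  example,
              with os.chown()) is left to the caller.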

       /sys/kernel/cgroup/features (since Linux 4.15)
              Over  time,  the set of cgroups v2 features that are provided by
              the kernel may change or grow, or some features may not  be  en‐
              abled  by  default.  This file provides a way for user-space ap‐
              plications to discover what features the running kernel supports
              and has enabled.  Features are listed one per line:

                  $ cat /sys/kernel/cgroup/features
                  nsdelegate
                  memory_localevents

              The entries that can appear in this file are:

              memory_localevents (since Linux 5.2)
                     The kernel supports the memory_localevents mount option.

              nsdelegate (since Linux 4.15)
                     The kernel supports the nsdelegate mount option.
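
              Feature discovery of this kind can be  sketched  in  Python  as
              follows.   The  helper  names  read_cgroup_features()  and  sup‐
              ports_nsdelegate() are hypothetical.  Treating a missing file as
              "no  features"  is  an  assumption  of  this sketch, chosen so
              that the same code also runs on kernels that predate the file.

```python
def read_cgroup_features(path="/sys/kernel/cgroup/features"):
    """Return the set of feature names listed (one per line) in path."""
    try:
        with open(path) as f:
            return {line.strip() for line in f if line.strip()}
    except FileNotFoundError:
        # Assumption: an absent file is reported as an empty feature set.
        return set()


def supports_nsdelegate(features):
    """Report whether the nsdelegate entry is present in the set."""
    return "nsdelegate" in features
```

              An application might call supports_nsdelegate()  before  decid‐
              ing whether to request the nsdelegate mount option.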

COLOPHON

       This  page  is  part of release 5.13 of the Linux man-pages project.  A
       description of the project, information about reporting bugs,  and  the
       latest     version     of     this    page,    can    be    found    at
       https://www.kernel.org/doc/man-pages/.



Linux                             2021-08-27                        CGROUPS(7)