CGROUPS(7)                 Linux Programmer's Manual                CGROUPS(7)

NAME

       cgroups - Linux control groups

DESCRIPTION

       Control groups, usually referred to as cgroups, are a Linux kernel
       feature which allows processes to be organized into hierarchical groups
       whose usage of various types of resources can then be limited and
       monitored.  The kernel's cgroup interface is provided through a
       pseudo-filesystem called cgroupfs.  Grouping is implemented in the core
       cgroup kernel code, while resource tracking and limits are implemented
       in a set of per-resource-type subsystems (memory, CPU, and so on).

   Terminology
       A cgroup is a collection of processes that are bound to a set of limits
       or parameters defined via the cgroup filesystem.

       A subsystem is a kernel component that modifies the behavior of the
       processes in a cgroup.  Various subsystems have been implemented,
       making it possible to do things such as limiting the amount of CPU time
       and memory available to a cgroup, accounting for the CPU time used by a
       cgroup, and freezing and resuming execution of the processes in a
       cgroup.  Subsystems are sometimes also known as resource controllers
       (or simply, controllers).

       The cgroups for a controller are arranged in a hierarchy.  This
       hierarchy is defined by creating, removing, and renaming subdirectories
       within the cgroup filesystem.  At each level of the hierarchy,
       attributes (e.g., limits) can be defined.  The limits, control, and
       accounting provided by cgroups generally have effect throughout the
       subhierarchy underneath the cgroup where the attributes are defined.
       Thus, for example, the limits placed on a cgroup at a higher level in
       the hierarchy cannot be exceeded by descendant cgroups.

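       As an illustration of how a limit set higher in a hierarchy bounds the
       cgroups beneath it, the following shell sketch reports the tightest
       limit found between a cgroup and the top of its hierarchy.  It assumes
       the pids controller's pids.max file, which holds either a number or
       the string max.  The function name effective_pids_max and its
       directory arguments are hypothetical and are not part of any cgroup
       interface.

```shell
# effective_pids_max ROOT CGROUP_DIR
# Print the smallest pids.max value found on the path from CGROUP_DIR
# up to (and including) ROOT, or "max" if no cgroup on that path sets
# a numeric limit.  Runs in a subshell so no variables leak out.
effective_pids_max() (
    root=${1%/}
    dir=${2%/}
    best=max
    while :; do
        if [ -r "$dir/pids.max" ]; then
            val=$(cat "$dir/pids.max")
            if [ "$val" != max ]; then
                if [ "$best" = max ] || [ "$val" -lt "$best" ]; then
                    best=$val
                fi
            fi
        fi
        # Stop at the hierarchy root, or if we have walked past it.
        case $dir in
            "$root" | / | . | "") break ;;
        esac
        dir=$(dirname "$dir")
    done
    echo "$best"
)
```

       For example, if a/pids.max holds 100 and a/b/pids.max holds 500, then
       the value reported for a/b is 100, because the ancestor's limit is the
       tighter of the two.
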
   Cgroups version 1 and version 2
       The initial release of the cgroups implementation was in Linux 2.6.24.
       Over time, various cgroup controllers have been added to allow the
       management of various types of resources.  However, the development of
       these controllers was largely uncoordinated, with the result that many
       inconsistencies arose between controllers and management of the cgroup
       hierarchies became rather complex.  (A longer description of these
       problems can be found in the kernel source file
       Documentation/cgroup-v2.txt.)

       Because of the problems with the initial cgroups implementation
       (cgroups version 1), starting in Linux 3.10, work began on a new,
       orthogonal implementation to remedy these problems.  Initially marked
       experimental, and hidden behind the -o __DEVEL__sane_behavior mount
       option, the new version (cgroups version 2) was eventually made
       official with the release of Linux 4.5.  Differences between the two
       versions are described in the text below.  The file
       cgroup.sane_behavior, present in cgroups v1, is a relic of this mount
       option.  The file always reports "0" and is only retained for backward
       compatibility.

       Although cgroups v2 is intended as a replacement for cgroups v1, the
       older system continues to exist (and for compatibility reasons is
       unlikely to be removed).  Currently, cgroups v2 implements only a
       subset of the controllers available in cgroups v1.  The two systems are
       implemented so that both v1 controllers and v2 controllers can be
       mounted on the same system.  Thus, for example, it is possible to use
       those controllers that are supported under version 2, while also using
       version 1 controllers where version 2 does not yet support those
       controllers.  The only restriction here is that a controller can't be
       simultaneously employed in both a cgroups v1 hierarchy and in the
       cgroups v2 hierarchy.

CGROUPS VERSION 1

       Under cgroups v1, each controller may be mounted against a separate
       cgroup filesystem that provides its own hierarchical organization of
       the processes on the system.  It is also possible to comount multiple
       (or even all) cgroups v1 controllers against the same cgroup
       filesystem, meaning that the comounted controllers manage the same
       hierarchical organization of processes.

       For each mounted hierarchy, the directory tree mirrors the control
       group hierarchy.  Each control group is represented by a directory,
       with each of its child cgroups represented as a child directory.  For
       instance, /user/joe/1.session represents control group 1.session, which
       is a child of cgroup joe, which is a child of /user.  Under each cgroup
       directory is a set of files which can be read or written to, reflecting
       resource limits and a few general cgroup properties.

   Tasks (threads) versus processes
       In cgroups v1, a distinction is drawn between processes and tasks.  In
       this view, a process can consist of multiple tasks (more commonly
       called threads, from a user-space perspective, and called such in the
       remainder of this man page).  In cgroups v1, it is possible to
       independently manipulate the cgroup memberships of the threads in a
       process.

       The cgroups v1 ability to split threads across different cgroups caused
       problems in some cases.  For example, it made no sense for the memory
       controller, since all of the threads of a process share a single
       address space.  Because of these problems, the ability to independently
       manipulate the cgroup memberships of the threads in a process was
       removed in the initial cgroups v2 implementation, and subsequently
       restored in a more limited form (see the discussion of "thread mode"
       below).

   Mounting v1 controllers
       The use of cgroups requires a kernel built with the CONFIG_CGROUPS
       option.  In addition, each of the v1 controllers has an associated
       configuration option that must be set in order to employ that
       controller.

       In order to use a v1 controller, it must be mounted against a cgroup
       filesystem.  The usual place for such mounts is under a tmpfs(5)
       filesystem mounted at /sys/fs/cgroup.  Thus, one might mount the cpu
       controller as follows:

           mount -t cgroup -o cpu none /sys/fs/cgroup/cpu

       It is possible to comount multiple controllers against the same
       hierarchy.  For example, here the cpu and cpuacct controllers are
       comounted against a single hierarchy:

           mount -t cgroup -o cpu,cpuacct none /sys/fs/cgroup/cpu,cpuacct

       Comounting controllers has the effect that a process is in the same
       cgroup for all of the comounted controllers.  Separately mounting
       controllers allows a process to be in cgroup /foo1 for one controller
       while being in /foo2/foo3 for another.

       It is possible to comount all v1 controllers against the same
       hierarchy:

           mount -t cgroup -o all cgroup /sys/fs/cgroup

       (One can achieve the same result by omitting -o all, since it is the
       default if no controllers are explicitly specified.)

       It is not possible to mount the same controller against multiple cgroup
       hierarchies.  For example, it is not possible to mount both the cpu and
       cpuacct controllers against one hierarchy, and to mount the cpu
       controller alone against another hierarchy.  It is possible to create
       multiple mount points with exactly the same set of comounted
       controllers.  However, in this case all that results is multiple mount
       points providing a view of the same hierarchy.

       Note that on many systems, the v1 controllers are automatically mounted
       under /sys/fs/cgroup; in particular, systemd(1) automatically creates
       such mount points.

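       Because such mounts are often created automatically, it can be useful
       to check what is already mounted before mounting a controller by hand.
       The following is a minimal sketch that assumes the usual /proc/mounts
       layout (device, mount point, filesystem type, options); the function
       name list_cgroup_mounts is hypothetical.

```shell
# list_cgroup_mounts [MOUNTS_FILE]
# Print "fstype mountpoint" for every filesystem of type cgroup (v1)
# or cgroup2 (v2) listed in MOUNTS_FILE (default: /proc/mounts).
list_cgroup_mounts() {
    awk '$3 == "cgroup" || $3 == "cgroup2" { print $3, $2 }' \
        "${1:-/proc/mounts}"
}
```

       For a v1 mount, the names of the attached controllers appear among the
       mount options in the fourth field of the same file.
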
   Unmounting v1 controllers
       A mounted cgroup filesystem can be unmounted using the umount(8)
       command, as in the following example:

           umount /sys/fs/cgroup/pids

       But note well: a cgroup filesystem is unmounted only if it is not busy,
       that is, it has no child cgroups.  If this is not the case, then the
       only effect of the umount(8) is to make the mount invisible.  Thus, to
       ensure that the mount point is really removed, one must first remove
       all child cgroups, which in turn can be done only after all member
       processes have been moved from those cgroups to the root cgroup.

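       The cleanup described above might be sketched as follows.  The sketch
       assumes that member processes have already been moved to the root
       cgroup and that find(1) supports the -mindepth option (as the GNU
       version does); the function name remove_child_cgroups is hypothetical.

```shell
# remove_child_cgroups MOUNT_POINT
# Remove every child cgroup directory below MOUNT_POINT, deepest first.
# rmdir(1) fails for a cgroup that still has member processes or child
# cgroups, so the function succeeds only if no child directory is left.
remove_child_cgroups() {
    find "$1" -mindepth 1 -depth -type d -exec rmdir {} \;
    [ -z "$(find "$1" -mindepth 1 -type d | head -n 1)" ]
}
```

       A caller could then unmount only on success, for example:
       remove_child_cgroups /sys/fs/cgroup/pids && umount /sys/fs/cgroup/pids
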
   Cgroups version 1 controllers
       Each of the cgroups version 1 controllers is governed by a kernel
       configuration option (listed below).  Additionally, the availability of
       the cgroups feature is governed by the CONFIG_CGROUPS kernel
       configuration option.

       cpu (since Linux 2.6.24; CONFIG_CGROUP_SCHED)
              Cgroups can be guaranteed a minimum number of "CPU shares" when
              a system is busy.  This does not limit a cgroup's CPU usage if
              the CPUs are not busy.  For further information, see
              Documentation/scheduler/sched-design-CFS.txt.

              In Linux 3.2, this controller was extended to provide CPU
              "bandwidth" control.  If the kernel is configured with
              CONFIG_CFS_BANDWIDTH, then within each scheduling period
              (defined via a file in the cgroup directory), it is possible to
              define an upper limit on the CPU time allocated to the processes
              in a cgroup.  This upper limit applies even if there is no other
              competition for the CPU.  Further information can be found in
              the kernel source file Documentation/scheduler/sched-bwc.txt.

       cpuacct (since Linux 2.6.24; CONFIG_CGROUP_CPUACCT)
              This provides accounting for CPU usage by groups of processes.

              Further information can be found in the kernel source file
              Documentation/cgroup-v1/cpuacct.txt.

       cpuset (since Linux 2.6.24; CONFIG_CPUSETS)
              This cgroup can be used to bind the processes in a cgroup to a
              specified set of CPUs and NUMA nodes.

              Further information can be found in the kernel source file
              Documentation/cgroup-v1/cpusets.txt.

       memory (since Linux 2.6.25; CONFIG_MEMCG)
              The memory controller supports reporting and limiting of process
              memory, kernel memory, and swap used by cgroups.

              Further information can be found in the kernel source file
              Documentation/cgroup-v1/memory.txt.

       devices (since Linux 2.6.26; CONFIG_CGROUP_DEVICE)
              This supports controlling which processes may create (mknod)
              devices as well as open them for reading or writing.  The
              policies may be specified as allow-lists and deny-lists.
              Hierarchy is enforced, so new rules must not violate existing
              rules for the target or ancestor cgroups.

              Further information can be found in the kernel source file
              Documentation/cgroup-v1/devices.txt.

       freezer (since Linux 2.6.28; CONFIG_CGROUP_FREEZER)
              The freezer cgroup can suspend and restore (resume) all
              processes in a cgroup.  Freezing a cgroup /A also causes its
              children, for example, processes in /A/B, to be frozen.

              Further information can be found in the kernel source file
              Documentation/cgroup-v1/freezer-subsystem.txt.

       net_cls (since Linux 2.6.29; CONFIG_CGROUP_NET_CLASSID)
              This places a classid, specified for the cgroup, on network
              packets created by a cgroup.  These classids can then be used in
              firewall rules, as well as used to shape traffic using tc(8).
              This applies only to packets leaving the cgroup, not to traffic
              arriving at the cgroup.

              Further information can be found in the kernel source file
              Documentation/cgroup-v1/net_cls.txt.

       blkio (since Linux 2.6.33; CONFIG_BLK_CGROUP)
              The blkio cgroup controls and limits access to specified block
              devices by applying IO control in the form of throttling and
              upper limits against leaf nodes and intermediate nodes in the
              storage hierarchy.

              Two policies are available.  The first is a proportional-weight
              time-based division of disk implemented with CFQ.  This is in
              effect for leaf nodes using CFQ.  The second is a throttling
              policy which specifies upper I/O rate limits on a device.

              Further information can be found in the kernel source file
              Documentation/cgroup-v1/blkio-controller.txt.

       perf_event (since Linux 2.6.39; CONFIG_CGROUP_PERF)
              This controller allows perf monitoring of the set of processes
              grouped in a cgroup.

              Further information can be found in the kernel source file
              tools/perf/Documentation/perf-record.txt.

       net_prio (since Linux 3.3; CONFIG_CGROUP_NET_PRIO)
              This allows priorities to be specified, per network interface,
              for cgroups.

              Further information can be found in the kernel source file
              Documentation/cgroup-v1/net_prio.txt.

       hugetlb (since Linux 3.5; CONFIG_CGROUP_HUGETLB)
              This supports limiting the use of huge pages by cgroups.

              Further information can be found in the kernel source file
              Documentation/cgroup-v1/hugetlb.txt.

       pids (since Linux 4.3; CONFIG_CGROUP_PIDS)
              This controller permits limiting the number of processes that
              may be created in a cgroup (and its descendants).

              Further information can be found in the kernel source file
              Documentation/cgroup-v1/pids.txt.

       rdma (since Linux 4.11; CONFIG_CGROUP_RDMA)
              The RDMA controller permits limiting the use of RDMA/IB-specific
              resources per cgroup.

              Further information can be found in the kernel source file
              Documentation/cgroup-v1/rdma.txt.

   Creating cgroups and moving processes
       A cgroup filesystem initially contains a single root cgroup, '/', which
       all processes belong to.  A new cgroup is created by creating a
       directory in the cgroup filesystem:

           mkdir /sys/fs/cgroup/cpu/cg1

       This creates a new empty cgroup.

       A process may be moved to this cgroup by writing its PID into the
       cgroup's cgroup.procs file:

           echo $$ > /sys/fs/cgroup/cpu/cg1/cgroup.procs

       Only one PID at a time should be written to this file.

       Writing the value 0 to a cgroup.procs file causes the writing process
       to be moved to the corresponding cgroup.

       When writing a PID into the cgroup.procs file, all threads in the
       process are moved into the new cgroup at once.

       Within a hierarchy, a process can be a member of exactly one cgroup.
       Writing a process's PID to a cgroup.procs file automatically removes it
       from the cgroup of which it was previously a member.

       The cgroup.procs file can be read to obtain a list of the processes
       that are members of a cgroup.  The returned list of PIDs is not
       guaranteed to be in order.  Nor is it guaranteed to be free of
       duplicates.  (For example, a PID may be recycled while reading from the
       list.)

       In cgroups v1, an individual thread can be moved to another cgroup by
       writing its thread ID (i.e., the kernel thread ID returned by clone(2)
       and gettid(2)) to the tasks file in a cgroup directory.  This file can
       be read to discover the set of threads that are members of the cgroup.

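       The following sketch shows one way to follow the rules above from a
       shell: PIDs are written one per write, and the member list is sorted
       and deduplicated when read.  The function names move_to_cgroup and
       list_members are hypothetical helpers, not part of the cgroup
       interface.

```shell
# move_to_cgroup CGROUP_DIR PID...
# Move each listed process into the cgroup, one PID per write.
move_to_cgroup() {
    mtc_dir=$1
    shift
    for mtc_pid in "$@"; do
        echo "$mtc_pid" > "$mtc_dir/cgroup.procs" || return 1
    done
}

# list_members CGROUP_DIR
# Print the member PIDs in ascending order with duplicates removed,
# since the kernel guarantees neither ordering nor uniqueness.
list_members() {
    sort -n -u "$1/cgroup.procs"
}
```

       For example, move_to_cgroup /sys/fs/cgroup/cpu/cg1 $$ moves the
       current shell, after which list_members /sys/fs/cgroup/cpu/cg1 would
       include its PID.
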
   Removing cgroups
       To remove a cgroup, it must first have no child cgroups and contain no
       (nonzombie) processes.  So long as that is the case, one can simply
       remove the corresponding directory pathname.  Note that files in a
       cgroup directory cannot and need not be removed.

   Cgroups v1 release notification
       Two files can be used to determine whether the kernel provides
       notifications when a cgroup becomes empty.  A cgroup is considered to
       be empty when it contains no child cgroups and no member processes.

       A special file in the root directory of each cgroup hierarchy,
       release_agent, can be used to register the pathname of a program that
       may be invoked when a cgroup in the hierarchy becomes empty.  The
       pathname of the newly empty cgroup (relative to the cgroup mount point)
       is provided as the sole command-line argument when the release_agent
       program is invoked.  The release_agent program might remove the cgroup
       directory, or perhaps repopulate it with a process.

       The default value of the release_agent file is empty, meaning that no
       release agent is invoked.

       The content of the release_agent file can also be specified via a mount
       option when the cgroup filesystem is mounted:

           mount -o release_agent=pathname ...

       Whether or not the release_agent program is invoked when a particular
       cgroup becomes empty is determined by the value in the
       notify_on_release file in the corresponding cgroup directory.  If this
       file contains the value 0, then the release_agent program is not
       invoked.  If it contains the value 1, the release_agent program is
       invoked.  The default value for this file in the root cgroup is 0.  At
       the time when a new cgroup is created, the value in this file is
       inherited from the corresponding file in the parent cgroup.

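       As a rough sketch of how the two files fit together, the first
       function below registers an agent and enables notification for one
       cgroup, and the second shows what a very simple agent might do with
       the argument it receives.  Both function names and all pathnames used
       with them are hypothetical.

```shell
# enable_release_agent HIERARCHY_ROOT AGENT_PATH CGROUP_DIR
# Register AGENT_PATH as the release agent of the hierarchy mounted at
# HIERARCHY_ROOT, and request its invocation when CGROUP_DIR empties.
enable_release_agent() {
    echo "$2" > "$1/release_agent" &&
        echo 1 > "$3/notify_on_release"
}

# agent_remove_cgroup HIERARCHY_ROOT RELATIVE_PATH
# One thing an agent might do: remove the newly empty cgroup.
# RELATIVE_PATH stands for the agent's sole command-line argument,
# which is relative to the cgroup mount point (e.g., /cg1).
agent_remove_cgroup() {
    rmdir "$1$2"
}
```

       Since new cgroups inherit notify_on_release from their parent, setting
       it on a parent before creating children avoids enabling it for each
       child separately.
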
   Cgroup v1 named hierarchies
       In cgroups v1, it is possible to mount a cgroup hierarchy that has no
       attached controllers:

           mount -t cgroup -o none,name=somename none /some/mount/point

       Multiple instances of such hierarchies can be mounted; each hierarchy
       must have a unique name.  The only purpose of such hierarchies is to
       track processes.  (See the discussion of release notification above.)
       An example of this is the name=systemd cgroup hierarchy that is used by
       systemd(1) to track services and user sessions.

       Since Linux 5.0, the cgroup_no_v1 kernel boot option (described below)
       can be used to disable cgroup v1 named hierarchies, by specifying
       cgroup_no_v1=named.

CGROUPS VERSION 2

       In cgroups v2, all mounted controllers reside in a single unified
       hierarchy.  While (different) controllers may be simultaneously mounted
       under the v1 and v2 hierarchies, it is not possible to mount the same
       controller simultaneously under both the v1 and the v2 hierarchies.

       The new behaviors in cgroups v2 are summarized here, and in some cases
       elaborated in the following subsections.

       1. Cgroups v2 provides a unified hierarchy against which all
          controllers are mounted.

       2. "Internal" processes are not permitted.  With the exception of the
          root cgroup, processes may reside only in leaf nodes (cgroups that
          do not themselves contain child cgroups).  The details are somewhat
          more subtle than this, and are described below.

       3. Active cgroups must be specified via the files cgroup.controllers
          and cgroup.subtree_control.

       4. The tasks file has been removed.  In addition, the
          cgroup.clone_children file that is employed by the cpuset controller
          has been removed.

       5. An improved mechanism for notification of empty cgroups is provided
          by the cgroup.events file.

       For more changes, see the Documentation/cgroup-v2.txt file in the
       kernel source.

       Some of the new behaviors listed above saw subsequent modification with
       the addition in Linux 4.14 of "thread mode" (described below).

   Cgroups v2 unified hierarchy
       In cgroups v1, the ability to mount different controllers against
       different hierarchies was intended to allow great flexibility for
       application design.  In practice, though, the flexibility turned out to
       be less useful than expected, and in many cases added complexity.
       Therefore, in cgroups v2, all available controllers are mounted against
       a single hierarchy.  The available controllers are automatically
       mounted, meaning that it is not necessary (or possible) to specify the
       controllers when mounting the cgroup v2 filesystem using a command such
       as the following:

           mount -t cgroup2 none /mnt/cgroup2

       A cgroup v2 controller is available only if it is not currently in use
       via a mount against a cgroup v1 hierarchy.  Or, to put things another
       way, it is not possible to employ the same controller against both a v1
       hierarchy and the unified v2 hierarchy.  This means that it may be
       necessary first to unmount a v1 controller (as described above) before
       that controller is available in v2.  Since systemd(1) makes heavy use
       of some v1 controllers by default, it can in some cases be simpler to
       boot the system with selected v1 controllers disabled.  To do this,
       specify the cgroup_no_v1=list option on the kernel boot command line;
       list is a comma-separated list of the names of the controllers to
       disable, or the word all to disable all v1 controllers.  (This
       situation is correctly handled by systemd(1), which falls back to
       operating without the specified controllers.)

       Note that on many modern systems, systemd(1) automatically mounts the
       cgroup2 filesystem at /sys/fs/cgroup/unified during the boot process.

   Cgroups v2 mount options
       The following options (mount -o) can be specified when mounting the
       cgroup v2 filesystem:

       nsdelegate (since Linux 4.15)
              Treat cgroup namespaces as delegation boundaries.  For details,
              see below.

       memory_localevents (since Linux 5.2)
              The memory.events file should show statistics only for the
              cgroup itself, and not for any descendant cgroups.  This was the
              behavior before Linux 5.2.  Starting in Linux 5.2, the default
              behavior is to include statistics for descendant cgroups in
              memory.events, and this mount option can be used to revert to
              the legacy behavior.  This option is system wide and can be set
              on mount or modified through remount only from the initial mount
              namespace; it is silently ignored in noninitial namespaces.

   Cgroups v2 controllers
       The following controllers, documented in the kernel source file
       Documentation/cgroup-v2.txt, are supported in cgroups version 2:

       cpu (since Linux 4.15)
              This is the successor to the version 1 cpu and cpuacct
              controllers.

       cpuset (since Linux 5.0)
              This is the successor of the version 1 cpuset controller.

       freezer (since Linux 5.2)
              This is the successor of the version 1 freezer controller.

       hugetlb (since Linux 5.6)
              This is the successor of the version 1 hugetlb controller.

       io (since Linux 4.5)
              This is the successor of the version 1 blkio controller.

       memory (since Linux 4.5)
              This is the successor of the version 1 memory controller.

       perf_event (since Linux 4.11)
              This is the same as the version 1 perf_event controller.

       pids (since Linux 4.5)
              This is the same as the version 1 pids controller.

       rdma (since Linux 4.11)
              This is the same as the version 1 rdma controller.

       There is no direct equivalent of the net_cls and net_prio controllers
       from cgroups version 1.  Instead, support has been added to iptables(8)
       to allow eBPF filters that hook on cgroup v2 pathnames to make
       decisions about network traffic on a per-cgroup basis.

       The v2 devices controller provides no interface files; instead, device
       control is gated by attaching an eBPF (BPF_CGROUP_DEVICE) program to a
       v2 cgroup.

   Cgroups v2 subtree control
       Each cgroup in the v2 hierarchy contains the following two files:

       cgroup.controllers
              This read-only file exposes a list of the controllers that are
              available in this cgroup.  The contents of this file match the
              contents of the cgroup.subtree_control file in the parent
              cgroup.

       cgroup.subtree_control
              This is a list of controllers that are active (enabled) in the
              cgroup.  The set of controllers in this file is a subset of the
              set in the cgroup.controllers of this cgroup.  The set of active
              controllers is modified by writing strings to this file
              containing space-delimited controller names, each preceded by
              '+' (to enable a controller) or '-' (to disable a controller),
              as in the following example:

                  echo '+pids -memory' > x/y/cgroup.subtree_control

              An attempt to enable a controller that is not present in
              cgroup.controllers leads to an ENOENT error when writing to the
              cgroup.subtree_control file.

       Because the list of controllers in cgroup.subtree_control is a subset
       of those in cgroup.controllers, a controller that has been disabled in
       one cgroup in the hierarchy can never be re-enabled in the subtree
       below that cgroup.

       A cgroup's cgroup.subtree_control file determines the set of
       controllers that are exercised in the child cgroups.  When a controller
       (e.g., pids) is present in the cgroup.subtree_control file of a parent
       cgroup, then the corresponding controller-interface files (e.g.,
       pids.max) are automatically created in the children of that cgroup and
       can be used to exert resource control in the child cgroups.

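       The relationship between the two files can be sketched in shell as
       follows.  The function checks each requested controller against
       cgroup.controllers before writing, so that an unavailable controller
       is reported by name rather than as a bare ENOENT error.  The function
       name enable_controllers is hypothetical.

```shell
# enable_controllers CGROUP_DIR CONTROLLER...
# Enable the named controllers for the children of CGROUP_DIR, after
# checking that each one is listed in CGROUP_DIR/cgroup.controllers.
enable_controllers() {
    ec_dir=$1
    shift
    ec_avail=" $(cat "$ec_dir/cgroup.controllers") "
    ec_req=
    for ec_name in "$@"; do
        case $ec_avail in
            *" $ec_name "*)
                ec_req="$ec_req +$ec_name" ;;
            *)
                echo "controller not available: $ec_name" >&2
                return 1 ;;
        esac
    done
    # Write all names in one string, for example "+pids +memory".
    echo "${ec_req# }" > "$ec_dir/cgroup.subtree_control"
}
```

       After enable_controllers x/y pids succeeds, files such as pids.max
       appear in the child cgroups of x/y.
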
   Cgroups v2 "no internal processes" rule
       Cgroups v2 enforces a so-called "no internal processes" rule.  Roughly
       speaking, this rule means that, with the exception of the root cgroup,
       processes may reside only in leaf nodes (cgroups that do not themselves
       contain child cgroups).  This avoids the need to decide how to
       partition resources between processes which are members of cgroup A and
       processes in child cgroups of A.

       For instance, if cgroup /cg1/cg2 exists, then a process may reside in
       /cg1/cg2, but not in /cg1.  This is to avoid an ambiguity in cgroups v1
       with respect to the delegation of resources between processes in /cg1
       and its child cgroups.  The recommended approach in cgroups v2 is to
       create a subdirectory called leaf for any nonleaf cgroup which should
       contain processes, but no child cgroups.  Thus, processes which
       previously would have gone into /cg1 would now go into /cg1/leaf.  This
       has the advantage of making explicit the relationship between processes
       in /cg1/leaf and /cg1's other children.

       The "no internal processes" rule is in fact more subtle than stated
       above.  More precisely, the rule is that a (nonroot) cgroup can't both
       (1) have member processes, and (2) distribute resources into child
       cgroups (that is, have a nonempty cgroup.subtree_control file).  Thus,
       it is possible for a cgroup to have both member processes and child
       cgroups, but before controllers can be enabled for that cgroup, the
       member processes must be moved out of the cgroup (e.g., perhaps into
       the child cgroups).

       With the Linux 4.14 addition of "thread mode" (described below), the
       "no internal processes" rule has been relaxed in some cases.

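       The recommended leaf approach might be sketched as follows.  The
       sketch moves every current member of a cgroup into a child named leaf
       so that controllers can subsequently be enabled in the parent.  The
       function name push_to_leaf is hypothetical, and the sketch ignores
       processes that exit or are created while it runs.

```shell
# push_to_leaf CGROUP_DIR
# Create CGROUP_DIR/leaf and move the member processes of CGROUP_DIR
# into it, one PID per write.
push_to_leaf() {
    mkdir -p "$1/leaf" || return 1
    # Read the whole member list first; it changes as processes move.
    for ptl_pid in $(cat "$1/cgroup.procs"); do
        # A process may exit before it is moved; ignore that failure.
        echo "$ptl_pid" > "$1/leaf/cgroup.procs" 2>/dev/null || :
    done
}
```

       Once the cgroup's own cgroup.procs file reads as empty, a write to its
       cgroup.subtree_control file is no longer blocked by the rule.
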
   Cgroups v2 cgroup.events file
       Each nonroot cgroup in the v2 hierarchy contains a read-only file,
       cgroup.events, whose contents are key-value pairs (delimited by newline
       characters, with the key and value separated by spaces) providing state
       information about the cgroup:

           $ cat mygrp/cgroup.events
           populated 1
           frozen 0

       The following keys may appear in this file:

       populated
              The value of this key is either 1, if this cgroup or any of its
              descendants has member processes, or otherwise 0.

       frozen (since Linux 5.2)
              The value of this key is 1 if this cgroup is currently frozen,
              or 0 if it is not.

       The cgroup.events file can be monitored, in order to receive
       notification when the value of one of its keys changes.  Such
       monitoring can be done using inotify(7), which notifies changes as
       IN_MODIFY events, or poll(2), which notifies changes by returning the
       POLLPRI and POLLERR bits in the revents field.

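       A shell script has no direct access to inotify(7) or poll(2), so the
       sketch below falls back on re-reading the file at intervals.  This is
       a simplification for illustration; a program that needs prompt
       notification would use one of the two mechanisms just described.  The
       function names cgroup_event and wait_until_empty are hypothetical.

```shell
# cgroup_event CGROUP_DIR KEY
# Print the value of KEY (for example, populated or frozen) from the
# cgroup.events file in CGROUP_DIR.
cgroup_event() {
    awk -v key="$2" '$1 == key { print $2 }' "$1/cgroup.events"
}

# wait_until_empty CGROUP_DIR [INTERVAL_SECONDS]
# Return 0 once the populated key reads 0, re-checking every
# INTERVAL_SECONDS (default 1).  Return 1 if the file is unreadable,
# for example because the cgroup has been removed.
wait_until_empty() {
    while :; do
        [ -r "$1/cgroup.events" ] || return 1
        [ "$(cgroup_event "$1" populated)" = 0 ] && return 0
        sleep "${2:-1}"
    done
}
```
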
579   Cgroup v2 release notification
580       Cgroups v2 provides a new mechanism for obtaining notification  when  a
581       cgroup    becomes    empty.    The   cgroups   v1   release_agent   and
582       notify_on_release files are removed, and replaced by the populated  key
583       in  the  cgroup.events  file.  This key either has the value 0, meaning
584       that the cgroup (and its descendants)  contain  no  (nonzombie)  member
585       processes,  or  1,  meaning that the cgroup (or one of its descendants)
586       contains member processes.
587
588       The cgroups v2  release-notification  mechanism  offers  the  following
589       advantages over the cgroups v1 release_agent mechanism:
590
591       *  It allows for cheaper notification, since a single process can moni‐
592          tor multiple cgroup.events files  (using  the  techniques  described
593          earlier).   By  contrast,  the  cgroups  v1  mechanism  requires the
594          expense of creating a process for each notification.
595
596       *  Notification for different cgroup subhierarchies can be delegated to
597          different  processes.   By contrast, the cgroups v1 mechanism allows
598          only one release agent for an entire hierarchy.
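
       As a rough illustration of the first advantage, the following Python
       sketch lets one process wait on several cgroup.events files using
       poll(2) with POLLPRI, as described earlier.  The helper names
       (is_populated, wait_for_empty) are hypothetical, and error handling is
       kept to a minimum.

```python
import os
import select


def is_populated(events_text):
    """Return True if the 'populated' key in cgroup.events text is 1."""
    for line in events_text.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[0] == "populated":
            return fields[1] == "1"
    raise ValueError("no 'populated' key found")


def wait_for_empty(event_paths, timeout_ms=None):
    """Wait until one of the given cgroup.events files shows populated 0.

    Returns the path of the first file seen with populated 0, or None if
    poll() timed out.  A timeout_ms of None means wait indefinitely.
    """
    poller = select.poll()
    fds = {}
    try:
        for path in event_paths:
            fd = os.open(path, os.O_RDONLY)
            fds[fd] = path
            poller.register(fd, select.POLLPRI)
        while True:
            # Check the current state first, so that a cgroup that is
            # already empty is noticed without waiting for a change.
            for fd, path in fds.items():
                text = os.pread(fd, 4096, 0).decode()
                if not is_populated(text):
                    return path
            if not poller.poll(timeout_ms):
                return None
    finally:
        for fd in fds:
            os.close(fd)
```

       Each file is re-read from offset 0 after every wakeup, because the
       notification says only that some key changed, not which one.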
599
600   Cgroups v2 cgroup.stat file
601       Each cgroup in the v2 hierarchy contains a read-only  cgroup.stat  file
602       (first introduced in Linux 4.14) that consists of lines containing key-
603       value pairs.  The following keys currently appear in this file:
604
605       nr_descendants
606              This is the total number of visible  (i.e.,  living)  descendant
607              cgroups underneath this cgroup.
608
609       nr_dying_descendants
610              This  is the total number of dying descendant cgroups underneath
611              this cgroup.  A  cgroup  enters  the  dying  state  after  being
612              deleted.   It  remains  in  that  state  for an undefined period
613              (which will depend on system load)  while  resources  are  freed
614              before  the cgroup is destroyed.  Note that the presence of some
615              cgroups in the dying state is normal, and is not  indicative  of
616              any problem.
617
618              A  process can't be made a member of a dying cgroup, and a dying
619              cgroup can't be brought back to life.
620
621   Limiting the number of descendant cgroups
622       Each cgroup in the v2 hierarchy contains the following files, which can
623       be  used  to  view  and  set limits on the number of descendant cgroups
624       under that cgroup:
625
626       cgroup.max.depth (since Linux 4.14)
627              This file defines a limit on the depth of nesting of  descendant
628              cgroups.   A  value  of  0 in this file means that no descendant
629              cgroups can be created.  An attempt to create a descendant whose
630              nesting  level  exceeds the limit fails (mkdir(2) fails with the
631              error EAGAIN).
632
633              Writing the string "max" to this file means  that  no  limit  is
634              imposed.  The default value in this file is "max".
635
636       cgroup.max.descendants (since Linux 4.14)
637              This  file  defines  a  limit  on  the number of live descendant
638              cgroups that this cgroup may have.  An attempt  to  create  more
639              descendants than allowed by the limit fails (mkdir(2) fails with
640              the error EAGAIN).
641
642              Writing the string "max" to this file means  that  no  limit  is
643              imposed.  The default value in this file is "max".
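
       A program that creates cgroups may want to distinguish "limit reached"
       from other mkdir(2) failures.  The sketch below assumes only the
       behavior stated above (the string "max" means no limit; EAGAIN means a
       limit would be exceeded); the function names are illustrative.

```python
import errno
import os


def parse_limit(text):
    """Return the limit as an int, or None if the file contains "max"."""
    value = text.strip()
    return None if value == "max" else int(value)


def create_child(parent_dir, name):
    """Try to create a child cgroup; return False if a limit was hit."""
    try:
        os.mkdir(os.path.join(parent_dir, name))
        return True
    except OSError as e:
        if e.errno == errno.EAGAIN:
            # cgroup.max.depth or cgroup.max.descendants would be exceeded.
            return False
        raise  # any other error is left to the caller
```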
644

CGROUPS DELEGATION: DELEGATING A HIERARCHY TO A LESS PRIVILEGED USER

646       In  the context of cgroups, delegation means passing management of some
647       subtree of the cgroup hierarchy to a nonprivileged  user.   Cgroups  v1
648       provides support for delegation based on file permissions in the cgroup
649       hierarchy but with less strict containment  rules  than  v2  (as  noted
650       below).   Cgroups  v2  supports delegation with containment by explicit
651       design.  The focus of the discussion in this section is  on  delegation
652       in  cgroups  v2,  with  some differences for cgroups v1 noted along the
653       way.
654
655       Some terminology is required in order to describe delegation.  A  dele‐
656       gater  is  a  privileged user (i.e., root) who owns a parent cgroup.  A
657       delegatee is a nonprivileged user who will be granted  the  permissions
658       needed  to  manage some subhierarchy under that parent cgroup, known as
659       the delegated subtree.
660
661       To perform delegation, the  delegater  makes  certain  directories  and
662       files writable by the delegatee, typically by changing the ownership of
663       the objects to be the user ID of the delegatee.  Assuming that we  want
664       to  delegate the hierarchy rooted at (say) /dlgt_grp and that there are
665       not yet any child cgroups under that cgroup, the ownership of the  fol‐
666       lowing is changed to the user ID of the delegatee:
667
668       /dlgt_grp
669              Changing the ownership of the root of the subtree means that any
670              new cgroups created under the subtree (and the files  they  con‐
671              tain) will also be owned by the delegatee.
672
673       /dlgt_grp/cgroup.procs
674              Changing the ownership of this file means that the delegatee can
675              move processes into the root of the delegated subtree.
676
677       /dlgt_grp/cgroup.subtree_control (cgroups v2 only)
678              Changing the ownership of this file means that the delegatee can
679              enable  controllers  (that  are present in /dlgt_grp/cgroup.con‐
680              trollers) in order to further redistribute  resources  at  lower
681              levels  in the subtree.  (As an alternative to changing the own‐
682              ership of this file, the delegater might  instead  add  selected
683              controllers to this file.)
684
685       /dlgt_grp/cgroup.threads (cgroups v2 only)
686              Changing  the  ownership of this file is necessary if a threaded
687              subtree is being  delegated  (see  the  description  of  "thread
688              mode",  below).   This permits the delegatee to write thread IDs
689              to the file.  (The ownership of this file can  also  be  changed
690              when  delegating  a domain subtree, but currently this serves no
691              purpose, since, as described below, it is not possible to move a
692              thread  between  domain  cgroups by writing its thread ID to the
693              cgroup.threads file.)
694
695              In cgroups v1, the corresponding file  that  should  instead  be
696              delegated is the tasks file.
697
698       The  delegater should not change the ownership of any of the controller
699       interface files (e.g., pids.max, memory.high) in dlgt_grp.  Those
700       files are used from the next level above the delegated subtree in order
701       to distribute resources into the subtree, and the delegatee should  not
702       have  permission  to change the resources that are distributed into the
703       delegated subtree.
704
705       See also the discussion  of  the  /sys/kernel/cgroup/delegate  file  in
706       NOTES for information about further delegatable files in cgroups v2.
707
708       After  the  aforementioned steps have been performed, the delegatee can
709       create child cgroups within the delegated subtree (the cgroup subdirec‐
710       tories  and  the files they contain will be owned by the delegatee) and
711       move processes between cgroups in the subtree.  If some controllers are
712       present  in  dlgt_grp/cgroup.subtree_control,  or the ownership of that
713       file was passed to the delegatee, the delegatee can  also  control  the
714       further  redistribution  of  the corresponding resources into the dele‐
715       gated subtree.
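
       The ownership changes listed above can be sketched in Python as
       follows.  This is an illustration rather than a complete delegation
       tool: the function name delegate and the list DELEGATED_FILES are
       assumptions of this example, and the caller needs enough privilege to
       change ownership.  Controller interface files are deliberately left
       untouched.

```python
import os

# Files named in the text above.  cgroup.subtree_control and cgroup.threads
# exist only in cgroups v2; cgroup.threads matters only for threaded subtrees.
DELEGATED_FILES = ("cgroup.procs", "cgroup.subtree_control", "cgroup.threads")


def delegate(cgroup_dir, uid):
    """Give ownership of a cgroup directory and its delegatable files to uid.

    Returns the list of paths whose ownership was changed.
    """
    changed = []
    os.chown(cgroup_dir, uid, -1)        # -1 leaves the group ID unchanged
    changed.append(cgroup_dir)
    for name in DELEGATED_FILES:
        path = os.path.join(cgroup_dir, name)
        if os.path.exists(path):         # some files are absent on v1
            os.chown(path, uid, -1)
            changed.append(path)
    return changed
```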
716
717   Cgroups v2 delegation: nsdelegate and cgroup namespaces
718       Starting with Linux 4.13, there is a second way to perform cgroup dele‐
719       gation  in  the  cgroups  v2  hierarchy.   This  is done by mounting or
720       remounting the cgroup v2 filesystem with the nsdelegate  mount  option.
721       For  example,  if the cgroup v2 filesystem has already been mounted, we
722       can remount it with the nsdelegate option as follows:
723
724           mount -t cgroup2 -o remount,nsdelegate \
725                            none /sys/fs/cgroup/unified
726
727       The effect of this mount option is to cause cgroup namespaces to  auto‐
728       matically become delegation boundaries.  More specifically, the follow‐
729       ing restrictions apply for processes inside the cgroup namespace:
730
731       *  Writes to controller interface files in the root  directory  of  the
732          namespace  will  fail  with  the  error EPERM.  Processes inside the
733          cgroup namespace can still write to delegatable files  in  the  root
734          directory   of   the  cgroup  namespace  such  as  cgroup.procs  and
735          cgroup.subtree_control, and can create subhierarchy  underneath  the
736          root directory.
737
738       *  Attempts  to  migrate  processes  across  the namespace boundary are
739          denied (with the error ENOENT).  Processes inside the cgroup  names‐
740          pace  can  still  (subject to the containment rules described below)
741          move processes between cgroups within  the  subhierarchy  under  the
742          namespace root.
743
744       The  ability to define cgroup namespaces as delegation boundaries makes
745       cgroup namespaces more useful.  To  understand  why,  suppose  that  we
746       already have one cgroup hierarchy that has been delegated to a nonpriv‐
747       ileged user, cecilia, using the older  delegation  technique  described
748       above.   Suppose further that cecilia wanted to further delegate a sub‐
749       hierarchy under the existing delegated hierarchy.   (For  example,  the
750       delegated  hierarchy might be associated with an unprivileged container
751       run by cecilia.)  Even if a cgroup namespace was employed, because both
752       hierarchies  are  owned by the unprivileged user cecilia, the following
753       illegitimate actions could be performed:
754
755       *  A process in the inferior hierarchy could change the  resource  con‐
756          troller  settings  in  the root directory of that hierarchy.  (These
757          resource controller settings are intended to  allow  control  to  be
758          exercised  from the parent cgroup; a process inside the child cgroup
759          should not be allowed to modify them.)
760
761       *  A process inside the inferior hierarchy could  move  processes  into
762          and  out  of  the  inferior hierarchy if the cgroups in the superior
763          hierarchy were somehow visible.
764
765       Employing the nsdelegate mount option prevents both of these possibili‐
766       ties.
767
768       The  nsdelegate  mount  option only has an effect when performed in the
769       initial mount namespace; in  other  mount  namespaces,  the  option  is
770       silently ignored.
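
       One way to check whether the option took effect is to look at the
       mount options reported for the cgroup2 filesystem.  The sketch below
       assumes that an active nsdelegate option is listed among the options
       in /proc/mounts (see proc(5)); the function names and the sample line
       are illustrative.

```python
def cgroup2_mount_options(mounts_text):
    """Map each cgroup2 mount point in /proc/mounts-style text to its options."""
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, filesystem type, options, ...
        if len(fields) >= 4 and fields[2] == "cgroup2":
            result[fields[1]] = fields[3].split(",")
    return result


def has_nsdelegate(mounts_text, mount_point):
    """Return True if the cgroup2 mount at mount_point lists nsdelegate."""
    options = cgroup2_mount_options(mounts_text).get(mount_point, [])
    return "nsdelegate" in options


# Hypothetical sample line; on a real system, read /proc/mounts instead.
sample = "none /sys/fs/cgroup/unified cgroup2 rw,relatime,nsdelegate 0 0\n"
print(has_nsdelegate(sample, "/sys/fs/cgroup/unified"))
```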
771
772       Note:  On  some  systems, systemd(1) automatically mounts the cgroup v2
773       filesystem.  In order to experiment with the nsdelegate  operation,  it
774       may  be  useful  to  boot  the  kernel  with the following command-line
775       options:
776
777           cgroup_no_v1=all systemd.legacy_systemd_cgroup_controller
778
779       These options cause the kernel to boot with the cgroups v1 controllers
780       disabled (meaning that the controllers are available in the v2
781       hierarchy), and tell systemd(1) not to mount and use the cgroup v2
782       hierarchy, so that the v2 hierarchy can be manually mounted with the
783       desired options after boot-up.
784
785   Cgroup delegation containment rules
786       Some delegation containment rules ensure that the  delegatee  can  move
787       processes  between cgroups within the delegated subtree, but can't move
788       processes from outside the delegated subtree into the subtree  or  vice
789       versa.  A nonprivileged process (i.e., the delegatee) can write the PID
790       of a "target" process into a cgroup.procs file only if all of the  fol‐
791       lowing are true:
792
793       *  The writer has write permission on the cgroup.procs file in the des‐
794          tination cgroup.
795
796       *  The writer has write permission on  the  cgroup.procs  file  in  the
797          nearest common ancestor of the source and destination cgroups.  Note
798          that in some cases, the nearest common ancestor may be the source or
799          destination  cgroup  itself.   This  requirement is not enforced for
800          cgroups v1 hierarchies, with the consequence that containment in  v1
801          is  less  strict  than  in v2.  (For example, in cgroups v1 the user
802          that owns two distinct delegated subhierarchies can move  a  process
803          between the hierarchies.)
804
805       *  If  the cgroup v2 filesystem was mounted with the nsdelegate option,
806          the writer must be able to see the source  and  destination  cgroups
807          from its cgroup namespace.
808
809       *  In cgroups v1: the effective UID of the writer (i.e., the delegatee)
810          matches the real user ID or the  saved  set-user-ID  of  the  target
811          process.   Before  Linux  4.11,  this  requirement  also  applied in
812       cgroups v2.  (This was a historical requirement inherited from cgroups
813          v1  that was later deemed unnecessary, since the other rules suffice
814          for containment in cgroups v2.)
815
816       Note: one consequence of these delegation containment rules is that the
817       unprivileged delegatee can't place the first process into the delegated
818       subtree; instead, the delegater must place the first process (a process
819       owned by the delegatee) into the delegated subtree.
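
       The first two rules can be approximated from user space before
       attempting a migration.  The following Python sketch is only a
       pre-check under stated assumptions: the kernel makes the real
       decision, the namespace and UID rules are not modeled, and
       os.access() tests permissions using the real (not effective) user ID.
       The function names are illustrative.

```python
import os


def nearest_common_ancestor(src_cgroup, dst_cgroup):
    """Return the deepest directory that contains both cgroup paths."""
    return os.path.commonpath([src_cgroup, dst_cgroup])


def may_write_pid(src_cgroup, dst_cgroup):
    """Check write permission on the two cgroup.procs files named above."""
    ancestor = nearest_common_ancestor(src_cgroup, dst_cgroup)
    needed = [
        os.path.join(dst_cgroup, "cgroup.procs"),
        os.path.join(ancestor, "cgroup.procs"),
    ]
    return all(os.access(path, os.W_OK) for path in needed)
```

       Note that when the source cgroup is an ancestor of the destination,
       the common ancestor is the source cgroup itself.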
820

CGROUPS VERSION 2 THREAD MODE

822       Among  the  restrictions imposed by cgroups v2 that were not present in
823       cgroups v1 are the following:
824
825       *  No thread-granularity control: all of the threads of a process  must
826          be in the same cgroup.
827
828       *  No internal processes: a cgroup can't both have member processes and
829          exercise controllers on child cgroups.
830
831       Both of these  restrictions  were  added  because  the  lack  of  these
832       restrictions  had  caused  problems  in cgroups v1.  In particular, the
833       cgroups v1 ability to allow thread-level granularity for cgroup member‐
834       ship  made  no  sense for some controllers.  (A notable example was the
835       memory controller: since threads share an address  space,  it  made  no
836       sense to split threads across different memory cgroups.)
837
838       Notwithstanding  the  initial design decision in cgroups v2, there were
839       use cases for certain controllers,  notably  the  cpu  controller,  for
840       which  thread-level  granularity  of control was meaningful and useful.
841       To accommodate such use cases, Linux 4.14 added thread mode for cgroups
842       v2.
843
844       Thread mode allows the following:
845
846       *  The  creation of threaded subtrees in which the threads of a process
847          may be spread across cgroups inside the tree.  (A  threaded  subtree
848          may contain multiple multithreaded processes.)
849
850       *  The  concept of threaded controllers, which can distribute resources
851          across the cgroups in a threaded subtree.
852
853       *  A relaxation of the "no internal processes" rule, so that, within a
854          threaded subtree, a cgroup can both contain member threads and exer‐
855          cise resource control over child cgroups.
856
857       With the addition of thread mode, each nonroot cgroup  now  contains  a
858       new  file,  cgroup.type, that exposes, and in some circumstances can be
859       used to change, the "type" of a cgroup.  This file contains one of  the
860       following type values:
861
862       domain This  is  a  normal  v2 cgroup that provides process-granularity
863              control.  If a process is a member  of  this  cgroup,  then  all
864              threads  of  the process are (by definition) in the same cgroup.
865              This is the default cgroup type, and provides the same  behavior
866              that  was  provided for cgroups in the initial cgroups v2 imple‐
867              mentation.
868
869       threaded
870              This cgroup is a member of a threaded subtree.  Threads  can  be
871              added  to  this  cgroup,  and controllers can be enabled for the
872              cgroup.
873
874       domain threaded
875              This is a domain cgroup that serves as the root  of  a  threaded
876              subtree.  This cgroup type is also known as "threaded root".
877
878       domain invalid
879              This  is  a  cgroup  inside  a  threaded  subtree  that is in an
880              "invalid" state.  Processes can't be added to  the  cgroup,  and
881              controllers  can't  be  enabled  for the cgroup.  The only thing
882              that can be done with this cgroup (other than deleting it) is to
883              convert it to a threaded cgroup by writing the string "threaded"
884              to the cgroup.type file.
885
886              The rationale for the existence of this  "interim"  type  during
887              the  creation of a threaded subtree (rather than the kernel sim‐
888              ply immediately converting all cgroups under the  threaded  root
889              to the type threaded) is to allow for possible future extensions
890              to the thread mode model.
891
892   Threaded versus domain controllers
893       With the addition of thread mode, cgroups v2 now distinguishes two
894       types of resource controllers:
895
896       *  Threaded  controllers:  these controllers support thread-granularity
897          for resource control and can be enabled  inside  threaded  subtrees,
898          with  the  result  that the corresponding controller-interface files
899          appear inside the cgroups in the  threaded  subtree.   As  at  Linux
900          4.19,  the  following controllers are threaded: cpu, perf_event, and
901          pids.
902
903       *  Domain controllers: these controllers support only process granular‐
904          ity  for  resource  control.   From the perspective of a domain con‐
905          troller, all threads of a process are always  in  the  same  cgroup.
906          Domain controllers can't be enabled inside a threaded subtree.
907
908   Creating a threaded subtree
909       There are two pathways that lead to the creation of a threaded subtree.
910       The first pathway proceeds as follows:
911
912       1. We write the string "threaded" to the cgroup.type file of  a  cgroup
913          y/z  that  currently  has  the  type domain.  This has the following
914          effects:
915
916          *  The type of the cgroup y/z becomes threaded.
917
918          *  The type of the parent cgroup, y, becomes domain  threaded.   The
919             parent  cgroup  is  the root of a threaded subtree (also known as
920             the "threaded root").
921
922          *  All other cgroups under y that were not already of type  threaded
923             (because  they  were  inside  already  existing threaded subtrees
924             under the  new  threaded  root)  are  converted  to  type  domain
925             invalid.  Any subsequently created cgroups under y will also have
926             the type domain invalid.
927
928       2. We write the string "threaded" to each of the domain invalid cgroups
929          under y, in order to convert them to the type threaded.  As a
930          consequence of this step, all cgroups under the threaded root now have
931          the type threaded and the threaded subtree is now fully usable.  The
932          requirement to write "threaded" to each of these cgroups is somewhat
933          cumbersome, but allows for possible future extensions to the thread-
934          mode model.
935
936       The second way of creating a threaded subtree is as follows:
937
938       1. In an existing cgroup, z, that currently has the type domain, we (1)
939          enable  one  or  more  threaded controllers and (2) make a process a
940          member of z.  (These two steps can be done in either  order.)   This
941          has the following consequences:
942
943          *  The type of z becomes domain threaded.
944
945          *  All  of the descendant cgroups of z that were not already of type
946             threaded are converted to type domain invalid.
947
948       2. As before, we make the threaded subtree usable by writing the string
949          "threaded"  to  each of the domain invalid cgroups under z, in order
950          to convert them to the type threaded.
951
952       One of the consequences of the above pathways to  creating  a  threaded
953       subtree  is  that  the  threaded  root  cgroup  can be a parent only to
954       threaded (and domain invalid) cgroups.  The threaded root cgroup  can't
955       be a parent of a domain cgroup, and a threaded cgroup can't have a
956       sibling that is a domain cgroup.
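
       Writing "threaded" to every cgroup under the intended threaded root
       can be automated.  The following Python sketch visits parents before
       children, since this page states elsewhere that conversion must
       proceed top-down.  The function name is hypothetical, and the sketch
       assumes that every directory below the root is a cgroup.

```python
import os


def convert_descendants_to_threaded(threaded_root):
    """Write "threaded" to cgroup.type in every cgroup below threaded_root.

    Cgroups are visited parent-first.  Returns the visited paths, relative
    to threaded_root, in the order in which they were written.
    """
    visited = []
    for dirpath, dirnames, _ in os.walk(threaded_root, topdown=True):
        dirnames.sort()              # deterministic traversal order
        if dirpath == threaded_root:
            continue                 # nothing is written to the root itself
        with open(os.path.join(dirpath, "cgroup.type"), "w") as f:
            f.write("threaded")
        visited.append(os.path.relpath(dirpath, threaded_root))
    return visited
```

       The first write starts the threaded subtree (the first pathway above);
       the remaining writes convert the domain invalid cgroups.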
957
958   Using a threaded subtree
959       Within a threaded subtree, threaded controllers can be enabled in  each
960       subgroup  whose  type  has been changed to threaded; upon doing so, the
961       corresponding controller interface files appear in the children of that
962       cgroup.
963
964       A  process  can  be moved into a threaded subtree by writing its PID to
965       the cgroup.procs file in one of the cgroups inside the tree.  This  has
966       the  effect  of making all of the threads in the process members of the
967       corresponding cgroup and makes the process a  member  of  the  threaded
968       subtree.   The  threads  of  the  process can then be spread across the
969       threaded subtree by writing their thread IDs  (see  gettid(2))  to  the
970       cgroup.threads  files  in  different  cgroups  inside the subtree.  The
971       threads of a process must all reside in the same threaded subtree.
972
973       As with writing to cgroup.procs,  some  containment  rules  apply  when
974       writing to the cgroup.threads file:
975
976       *  The  writer must have write permission on the cgroup.threads file in
977          the destination cgroup.
978
979       *  The writer must have write permission on the  cgroup.procs  file  in
980          the common ancestor of the source and destination cgroups.  (In some
981          cases, the common ancestor may be the source or  destination  cgroup
982          itself.)
983
984       *  The source and destination cgroups must be in the same threaded sub‐
985          tree.  (Outside a threaded subtree, an attempt to move a  thread  by
986          writing  its  thread  ID  to  the cgroup.threads file in a different
987          domain cgroup fails with the error EOPNOTSUPP.)
988
989       The cgroup.threads file is present in  each  cgroup  (including  domain
990       cgroups)  and  can be read in order to discover the set of threads that
991       is present in the cgroup.  The set of thread IDs obtained when  reading
992       this file is not guaranteed to be ordered or free of duplicates.
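
       Because the list read from cgroup.threads may be unordered and may
       contain duplicates, a reader will usually normalize it.  The Python
       sketch below shows one way to do so, together with a helper that moves
       a thread by writing its thread ID; both function names are
       illustrative.

```python
import os


def read_thread_ids(cgroup_dir):
    """Return the thread IDs in cgroup_dir/cgroup.threads, sorted and unique."""
    with open(os.path.join(cgroup_dir, "cgroup.threads")) as f:
        return sorted({int(line) for line in f if line.strip()})


def move_thread(cgroup_dir, tid):
    """Move one thread by writing its thread ID to cgroup.threads.

    On a real cgroup filesystem, a violation of the containment rules
    surfaces here as an OSError (for example, with errno EOPNOTSUPP).
    """
    with open(os.path.join(cgroup_dir, "cgroup.threads"), "w") as f:
        f.write(str(tid))
```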
993
994       The  cgroup.procs  file in the threaded root shows the PIDs of all pro‐
995       cesses that are members of  the  threaded  subtree.   The  cgroup.procs
996       files in the other cgroups in the subtree are not readable.
997
998       Domain  controllers  can't  be  enabled  in a threaded subtree; no con‐
999       troller-interface  files  appear  inside  the  cgroups  underneath  the
1000       threaded root.  From the point of view of a domain controller, threaded
1001       subtrees are invisible: a multithreaded process inside a threaded  sub‐
1002       tree  appears  to  a domain controller as a process that resides in the
1003       threaded root cgroup.
1004
1005       Within a threaded subtree, the "no internal processes"  rule  does  not
1006       apply: a cgroup can both contain member processes (or threads) and
1007       exercise controllers on child cgroups.
1008
1009   Rules for writing to cgroup.type and creating threaded subtrees
1010       A number of rules apply when writing to the cgroup.type file:
1011
1012       *  Only the string "threaded" may be written.  In other words, the only
1013          explicit  transition  that is possible is to convert a domain cgroup
1014          to type threaded.
1015
1016       *  The effect of writing "threaded" depends on  the  current  value  in
1017          cgroup.type, as follows:
1018
1019          ·  domain  or domain threaded: start the creation of a threaded sub‐
1020             tree (whose root is the parent of this cgroup) via the  first  of
1021             the pathways described above;
1022
1023          ·  domain invalid:  convert  this cgroup (which is inside a threaded
1024             subtree) to a usable (i.e., threaded) state;
1025
1026          ·  threaded: no effect (a "no-op").
1027
1028       *  We can't write to a cgroup.type file if the parent's type is  domain
1029          invalid.   In other words, the cgroups of a threaded subtree must be
1030          converted to the threaded state in a top-down manner.
1031
1032       There are also some constraints that must be satisfied in order to cre‐
1033       ate a threaded subtree rooted at the cgroup x:
1034
1035       *  There  can  be  no  member processes in the descendant cgroups of x.
1036          (The cgroup x can itself have member processes.)
1037
1038       *  No domain controllers may be enabled in  x's  cgroup.subtree_control
1039          file.
1040
1041       If  any  of the above constraints is violated, then an attempt to write
1042       "threaded" to a cgroup.type file fails with the error ENOTSUP.
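
       The two constraints above can be checked before writing to
       cgroup.type.  The Python sketch below is a pre-check only, and the
       kernel remains the authority.  The set THREADED_CONTROLLERS reflects
       the list given in this page for Linux 4.19 and may differ on other
       kernels; the function name is hypothetical.

```python
import os

# Threaded controllers as listed in this page for Linux 4.19.
THREADED_CONTROLLERS = {"cpu", "perf_event", "pids"}


def threaded_subtree_problems(x):
    """Return a list of reasons why x can't become a threaded root."""
    problems = []
    with open(os.path.join(x, "cgroup.subtree_control")) as f:
        enabled = set(f.read().split())
    domain = sorted(enabled - THREADED_CONTROLLERS)
    if domain:
        problems.append("domain controllers enabled: " + " ".join(domain))
    for dirpath, dirnames, filenames in os.walk(x):
        dirnames.sort()
        # x itself is allowed to have member processes; skip it.
        if dirpath == x or "cgroup.procs" not in filenames:
            continue
        with open(os.path.join(dirpath, "cgroup.procs")) as f:
            if f.read().strip():
                problems.append("member processes in " + dirpath)
    return problems
```

       An empty list means that neither constraint was found to be violated
       at the time of the check.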
1043
1044   The "domain threaded" cgroup type
1045       According to the pathways described above, the type  of  a  cgroup  can
1046       change to domain threaded in either of the following cases:
1047
1048       *  The string "threaded" is written to a child cgroup.
1049
1050       *  A  threaded controller is enabled inside the cgroup and a process is
1051          made a member of the cgroup.
1052
1053       A domain threaded cgroup, x, can revert to the type domain if the above
1054       conditions  no  longer hold true—that is, if all threaded child cgroups
1055       of x are removed and  either  x  no  longer  has  threaded  controllers
1056       enabled or no longer has member processes.
1057
1058       When a domain threaded cgroup x reverts to the type domain:
1059
1060       *  All  domain  invalid  descendants  of  x that are not in lower-level
1061          threaded subtrees revert to the type domain.
1062
1063       *  The root cgroups in any lower-level threaded subtrees revert to  the
1064          type domain threaded.
1065
1066   Exceptions for the root cgroup
1067       The root cgroup of the v2 hierarchy is treated exceptionally: it can be
1068       the parent  of  both  domain  and  threaded  cgroups.   If  the  string
1069       "threaded" is written to the cgroup.type file of one of the children of
1070       the root cgroup, then
1071
1072       *  The type of that cgroup becomes threaded.
1073
1074       *  The type of any descendants of that cgroup  that  are  not  part  of
1075          lower-level threaded subtrees changes to domain invalid.
1076
1077       Note  that  in  this case, there is no cgroup whose type becomes domain
1078       threaded.  (Notionally, the  root  cgroup  can  be  considered  as  the
1079       threaded root for the cgroup whose type was changed to threaded.)
1080
1081       The aim of this exceptional treatment for the root cgroup is to allow a
1082       threaded cgroup that employs the cpu controller to be placed as high as
1083       possible  in  the  hierarchy,  so  as  to  minimize the (small) cost of
1084       traversing the cgroup hierarchy.
1085
1086   The cgroups v2 "cpu" controller and realtime threads
1087       As at Linux 4.19, the cgroups v2 cpu controller does not  support  con‐
1088       trol  of  realtime threads (specifically threads scheduled under any of
1089       the policies SCHED_FIFO, SCHED_RR, and SCHED_DEADLINE; see
1090       sched(7)).   Therefore,  the  cpu controller can be enabled in the root
1091       cgroup only if all realtime threads are in the root cgroup.  (If  there
1092       are  realtime threads in nonroot cgroups, then a write(2) of the string
1093       "+cpu" to the cgroup.subtree_control file fails with the error EINVAL.)
1094
1095       On some systems, systemd(1) places certain realtime threads in  nonroot
1096       cgroups in the v2 hierarchy.  On such systems, these threads must first
1097       be moved to the root cgroup before the cpu controller can be enabled.
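
       To find the threads that would block enabling the cpu controller, one
       can inspect the scheduling policy of each thread.  The following
       Python sketch lists the realtime threads of one process; it assumes a
       mounted /proc and that sched_getscheduler(2) accepts a thread ID, as
       it does on Linux.  The numeric fallback for SCHED_DEADLINE is taken
       from the Linux headers, and the function names are illustrative.

```python
import os

# Older Python versions do not export SCHED_DEADLINE; 6 is the Linux value.
SCHED_DEADLINE = getattr(os, "SCHED_DEADLINE", 6)


def is_realtime_policy(policy):
    """Return True for SCHED_FIFO, SCHED_RR, or SCHED_DEADLINE."""
    policy &= ~os.SCHED_RESET_ON_FORK    # the flag may be ORed into the value
    return policy in (os.SCHED_FIFO, os.SCHED_RR, SCHED_DEADLINE)


def realtime_threads(pid):
    """Return the sorted thread IDs of pid that use a realtime policy."""
    tids = []
    for name in os.listdir("/proc/%d/task" % pid):
        tid = int(name)
        try:
            if is_realtime_policy(os.sched_getscheduler(tid)):
                tids.append(tid)
        except ProcessLookupError:
            pass                         # the thread exited in the meantime
    return tids if not tids else sorted(tids)
```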
1098

ERRORS

1100       The following errors can occur for mount(2):
1101
1102       EBUSY  An attempt to mount a cgroup version 1 filesystem specified nei‐
1103              ther  the  name=  option (to mount a named hierarchy) nor a con‐
1104              troller name (or all).
1105

NOTES

1107       A child process created via fork(2) inherits its parent's  cgroup  mem‐
1108       berships.    A   process's  cgroup  memberships  are  preserved  across
1109       execve(2).
1110
1111       The clone3(2) CLONE_INTO_CGROUP flag can be  used  to  create  a  child
1112       process  that  begins its life in a different version 2 cgroup from the
1113       parent process.
1114
   /proc files
       /proc/cgroups (since Linux 2.6.24)
              This file contains information about the controllers that are
              compiled into the kernel.  An example of the contents of this
              file (reformatted for readability) is the following:

                  #subsys_name    hierarchy      num_cgroups    enabled
                  cpuset          4              1              1
                  cpu             8              1              1
                  cpuacct         8              1              1
                  blkio           6              1              1
                  memory          3              1              1
                  devices         10             84             1
                  freezer         7              1              1
                  net_cls         9              1              1
                  perf_event      5              1              1
                  net_prio        9              1              1
                  hugetlb         0              1              0
                  pids            2              1              1

              The fields in this file are, from left to right:

              1. The name of the controller.

              2. The unique ID of the cgroup hierarchy on which this
                 controller is mounted.  If multiple cgroups v1 controllers
                 are bound to the same hierarchy, then each will show the
                 same hierarchy ID in this field.  The value in this field
                 will be 0 if:

                   a) the controller is not mounted on a cgroups v1 hierarchy;

                   b) the controller is bound to the cgroups v2 single unified
                      hierarchy; or

                   c) the controller is disabled (see below).

              3. The number of control groups in this hierarchy using this
                 controller.

              4. This field contains the value 1 if this controller is
                 enabled, or 0 if it has been disabled (via the cgroup_disable
                 kernel command-line boot parameter).
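
              The four fields described above can be read programmatically.
              The following is a minimal sketch in Python, not part of any
              kernel or library interface: the names ControllerInfo,
              parse_proc_cgroups, and SAMPLE are hypothetical and chosen
              only for illustration.  It assumes whitespace-separated
              fields and a header line beginning with '#', as in the
              example shown earlier.

```python
from typing import List, NamedTuple


class ControllerInfo(NamedTuple):
    """One row of /proc/cgroups (hypothetical helper type)."""
    name: str           # field 1: controller name
    hierarchy_id: int   # field 2: hierarchy ID (0 has several meanings)
    num_cgroups: int    # field 3: number of cgroups using the controller
    enabled: bool       # field 4: True if the field is 1


def parse_proc_cgroups(text: str) -> List[ControllerInfo]:
    """Parse the text of /proc/cgroups into one record per controller."""
    controllers = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the '#subsys_name ...' header line.
        if not line or line.startswith("#"):
            continue
        name, hierarchy, num_cgroups, enabled = line.split()
        controllers.append(
            ControllerInfo(name, int(hierarchy), int(num_cgroups),
                           enabled == "1")
        )
    return controllers


# Sample input: a subset of the example rows shown above.
SAMPLE = """\
#subsys_name    hierarchy      num_cgroups    enabled
cpu             8              1              1
cpuacct         8              1              1
hugetlb         0              1              0
pids            2              1              1
"""

if __name__ == "__main__":
    for info in parse_proc_cgroups(SAMPLE):
        print(info)
```

              On a live system, the same function could be applied to the
              text read from /proc/cgroups.  Note that a hierarchy ID of 0
              is ambiguous on its own; the enabled field distinguishes case
              c) from cases a) and b).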

       /proc/[pid]/cgroup (since Linux 2.6.24)
              This file describes the control groups to which the process
              with the corresponding PID belongs.  The displayed information
              differs for cgroups version 1 and version 2 hierarchies.

              For each cgroup hierarchy of which the process is a member,
              there is one entry containing three colon-separated fields:

                  hierarchy-ID:controller-list:cgroup-path

              For example:

                  5:cpuacct,cpu,cpuset:/daemons

              The colon-separated fields are, from left to right:

              1. For cgroups version 1 hierarchies, this field contains a
                 unique hierarchy ID number that can be matched to a hierarchy
                 ID in /proc/cgroups.  For the cgroups version 2 hierarchy,
                 this field contains the value 0.

              2. For cgroups version 1 hierarchies, this field contains a
                 comma-separated list of the controllers bound to the
                 hierarchy.  For the cgroups version 2 hierarchy, this field
                 is empty.

              3. This field contains the pathname of the control group in the
                 hierarchy to which the process belongs.  This pathname is
                 relative to the mount point of the hierarchy.
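
              The three-field format above can be split as in the following
              sketch.  The names CgroupEntry and parse_pid_cgroup are
              hypothetical, and the version 2 path /user.slice in the usage
              lines is an illustrative assumption rather than output taken
              from a real system.  Each line is split on at most two colons,
              so that a colon appearing inside the cgroup pathname is left
              intact.

```python
from typing import List, NamedTuple


class CgroupEntry(NamedTuple):
    """One line of /proc/[pid]/cgroup (hypothetical helper type)."""
    hierarchy_id: int       # field 1: 0 for the cgroups v2 hierarchy
    controllers: List[str]  # field 2: empty list for cgroups v2
    path: str               # field 3: relative to the hierarchy mount point


def parse_pid_cgroup(text: str) -> List[CgroupEntry]:
    """Parse the text of /proc/[pid]/cgroup, one entry per hierarchy."""
    entries = []
    for line in text.splitlines():
        if not line:
            continue
        # Split on the first two colons only; the path may contain colons.
        hierarchy_id, controller_list, path = line.split(":", 2)
        controllers = controller_list.split(",") if controller_list else []
        entries.append(CgroupEntry(int(hierarchy_id), controllers, path))
    return entries


if __name__ == "__main__":
    # First line is the example above; second line is an assumed v2 entry.
    sample = "5:cpuacct,cpu,cpuset:/daemons\n0::/user.slice\n"
    for entry in parse_pid_cgroup(sample):
        print(entry)
```

              A caller could pass the text read from /proc/self/cgroup to
              inspect its own membership.  An entry whose hierarchy ID is 0
              and whose controller list is empty corresponds to the cgroups
              version 2 hierarchy.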

   /sys/kernel/cgroup files
       /sys/kernel/cgroup/delegate (since Linux 4.15)
              This file exports a list of the cgroups v2 files (one per line)
              that are delegatable (i.e., whose ownership should be changed to
              the user ID of the delegatee).  In the future, the set of
              delegatable files may change or grow, and this file provides a
              way for the kernel to inform user-space applications of which
              files must be delegated.  As of Linux 4.15, one sees the
              following when inspecting this file:

                  $ cat /sys/kernel/cgroup/delegate
                  cgroup.procs
                  cgroup.subtree_control
                  cgroup.threads
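
              An application that performs delegation might use this list
              rather than hard-coding filenames.  The sketch below only
              computes the pathnames whose ownership would be changed; it
              does not change ownership itself.  The function name
              delegatable_paths and the directory /sys/fs/cgroup/example
              are hypothetical.

```python
import posixpath
from typing import List


def delegatable_paths(delegate_text: str, cgroup_dir: str) -> List[str]:
    """Return the files inside cgroup_dir named by the delegate list.

    delegate_text is the content of /sys/kernel/cgroup/delegate,
    which lists one filename per line.
    """
    names = [line.strip() for line in delegate_text.splitlines()
             if line.strip()]
    return [posixpath.join(cgroup_dir, name) for name in names]


if __name__ == "__main__":
    # Content as shown above for Linux 4.15.
    sample = "cgroup.procs\ncgroup.subtree_control\ncgroup.threads\n"
    # Hypothetical cgroup directory, used only for illustration.
    for path in delegatable_paths(sample, "/sys/fs/cgroup/example"):
        print(path)
```

              A delegater could then pass each returned pathname to a call
              such as os.chown(), supplying the user ID of the delegatee.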

       /sys/kernel/cgroup/features (since Linux 4.15)
              Over time, the set of cgroups v2 features that are provided by
              the kernel may change or grow, or some features may not be
              enabled by default.  This file provides a way for user-space
              applications to discover what features the running kernel
              supports and has enabled.  Features are listed one per line:

                  $ cat /sys/kernel/cgroup/features
                  nsdelegate
                  memory_localevents

              The entries that can appear in this file are:

              memory_localevents (since Linux 5.2)
                     The kernel supports the memory_localevents mount option.

              nsdelegate (since Linux 4.15)
                     The kernel supports the nsdelegate mount option.
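
              Feature discovery of the kind described above can be sketched
              as follows.  The function names parse_features and
              read_features are hypothetical.  Treating an unreadable file
              as an empty feature set is an assumption of this sketch (for
              example, on kernels older than Linux 4.15, where the file
              does not exist), not behavior specified by this page.

```python
from typing import FrozenSet


def parse_features(text: str) -> FrozenSet[str]:
    """Parse feature names, which are listed one per line."""
    return frozenset(line.strip() for line in text.splitlines()
                     if line.strip())


def read_features(path: str = "/sys/kernel/cgroup/features") -> FrozenSet[str]:
    """Read the feature list; return an empty set if it cannot be read."""
    try:
        with open(path) as f:
            return parse_features(f.read())
    except OSError:
        # Assumption: a missing or unreadable file means no known features.
        return frozenset()


if __name__ == "__main__":
    features = read_features()
    print("nsdelegate supported:", "nsdelegate" in features)
    print("memory_localevents supported:", "memory_localevents" in features)
```

              An application might consult the result before passing the
              nsdelegate or memory_localevents option when mounting the
              cgroups v2 filesystem.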

COLOPHON

       This page is part of release 5.07 of the Linux man-pages project.  A
       description of the project, information about reporting bugs, and the
       latest version of this page, can be found at
       https://www.kernel.org/doc/man-pages/.

Linux                             2020-04-11                        CGROUPS(7)