1USER_NAMESPACES(7)         Linux Programmer's Manual        USER_NAMESPACES(7)
2
3
4

NAME

6       user_namespaces - overview of Linux user namespaces
7

DESCRIPTION

9       For an overview of namespaces, see namespaces(7).
10
11       User namespaces isolate security-related identifiers and attributes, in
12       particular, user IDs and  group  IDs  (see  credentials(7)),  the  root
13       directory,  keys  (see  keyrings(7)),  and  capabilities (see capabili‐
14       ties(7)).  A process's user and group IDs can be different  inside  and
15       outside  a  user namespace.  In particular, a process can have a normal
16       unprivileged user ID outside a user namespace while at  the  same  time
17       having a user ID of 0 inside the namespace; in other words, the process
18       has full privileges for operations inside the user  namespace,  but  is
19       unprivileged for operations outside the namespace.
20
21   Nested namespaces, namespace membership
22       User  namespaces can be nested; that is, each user namespace—except the
23       initial ("root") namespace—has a parent user namespace,  and  can  have
24       zero  or  more child user namespaces.  The parent user namespace is the
25       user namespace of the process that creates the  user  namespace  via  a
26       call to unshare(2) or clone(2) with the CLONE_NEWUSER flag.
27
28       The  kernel imposes (since version 3.11) a limit of 32 nested levels of
29       user namespaces.  Calls to unshare(2) or clone(2) that would cause this
30       limit to be exceeded fail with the error EUSERS.
31
32       Each process is a member of exactly one user namespace.  A process cre‐
33       ated via fork(2) or clone(2) without the CLONE_NEWUSER flag is a member
34       of  the  same  user namespace as its parent.  A single-threaded process
35       can  join  another  user  namespace  with  setns(2)  if  it   has   the
36       CAP_SYS_ADMIN  in that namespace; upon doing so, it gains a full set of
37       capabilities in that namespace.
38
39       A call to clone(2) or unshare(2) with the CLONE_NEWUSER flag makes  the
40       new  child process (for clone(2)) or the caller (for unshare(2)) a mem‐
41       ber of the new user namespace created by the call.
42
43       The NS_GET_PARENT ioctl(2)  operation  can  be  used  to  discover  the
44       parental relationship between user namespaces; see ioctl_ns(2).
45
46   Capabilities
47       The  child  process  created  by  clone(2)  with the CLONE_NEWUSER flag
48       starts out with a complete set of capabilities in the new  user  names‐
49       pace.   Likewise,  a  process  that  creates a new user namespace using
50       unshare(2) or joins an existing user namespace using setns(2)  gains  a
51       full  set  of  capabilities in that namespace.  On the other hand, that
52       process has no capabilities in the parent (in the case of clone(2))  or
53       previous  (in the case of unshare(2) and setns(2)) user namespace, even
54       if the new namespace is created or joined by the  root  user  (i.e.,  a
55       process with user ID 0 in the root namespace).
56
57       Note that a call to execve(2) will cause a process's capabilities to be
58       recalculated in the usual  way  (see  capabilities(7)).   Consequently,
59       unless the process has a user ID of 0 within the namespace, or the exe‐
60       cutable file has a nonempty inheritable capabilities mask, the  process
61       will  lose  all  capabilities.  See the discussion of user and group ID
62       mappings, below.
63
64       A call to clone(2), unshare(2), or  setns(2)  using  the  CLONE_NEWUSER
65       flag sets the "securebits" flags (see capabilities(7)) to their default
66       values (all flags disabled) in the child (for clone(2)) or caller  (for
67       unshare(2),  or  setns(2)).  Note that because the caller no longer has
68       capabilities in its original user namespace after a call  to  setns(2),
69       it  is not possible for a process to reset its "securebits" flags while
70       retaining its user namespace membership by using  a  pair  of  setns(2)
71       calls to move to another user namespace and then return to its original
72       user namespace.
73
74       The rules for determining whether or not a process has a capability  in
75       a particular user namespace are as follows:
76
77       1. A process has a capability inside a user namespace if it is a member
78          of that namespace and it has the capability in its  effective  capa‐
79          bility  set.  A process can gain capabilities in its effective capa‐
80          bility set in various ways.  For example, it may execute a set-user-
81          ID  program  or an executable with associated file capabilities.  In
82          addition,  a  process  may  gain  capabilities  via  the  effect  of
83          clone(2), unshare(2), or setns(2), as already described.
84
85       2. If  a process has a capability in a user namespace, then it has that
86          capability in all child (and further removed descendant)  namespaces
87          as well.
88
89       3. When  a  user namespace is created, the kernel records the effective
90          user ID of the creating process as being the "owner" of  the  names‐
91          pace.   A  process  that resides in the parent of the user namespace
92          and whose effective user ID matches the owner of the  namespace  has
93          all  capabilities in the namespace.  By virtue of the previous rule,
94          this means that the process has  all  capabilities  in  all  further
95          removed  descendant  user  namespaces as well.  The NS_GET_OWNER_UID
96          ioctl(2) operation can be used to discover the user ID of the  owner
97          of the namespace; see ioctl_ns(2).
98
99   Effect of capabilities within a user namespace
100       Having  a  capability inside a user namespace permits a process to per‐
101       form operations (that require privilege) only on resources governed  by
102       that  namespace.   In other words, having a capability in a user names‐
103       pace permits a process to perform privileged  operations  on  resources
104       that  are  governed  by  (nonuser)  namespaces associated with the user
105       namespace (see the next subsection).
106
107       On the other hand, there are many  privileged  operations  that  affect
108       resources that are not associated with any namespace type, for example,
109       changing the system time (governed by CAP_SYS_TIME), loading  a  kernel
110       module (governed by CAP_SYS_MODULE), and creating a device (governed by
111       CAP_MKNOD).  Only a process with privileges in the initial user  names‐
112       pace can perform such operations.