PID_NAMESPACES(7) Linux Programmer's Manual PID_NAMESPACES(7)

NAME
pid_namespaces - overview of Linux PID namespaces

DESCRIPTION
For an overview of namespaces, see namespaces(7).

PID namespaces isolate the process ID number space, meaning that
processes in different PID namespaces can have the same PID. PID
namespaces allow containers to provide functionality such as
suspending/resuming the set of processes in the container and
migrating the container to a new host while the processes inside the
container maintain the same PIDs.

PIDs in a new PID namespace start at 1, somewhat like a standalone
system, and calls to fork(2), vfork(2), or clone(2) will produce
processes with PIDs that are unique within the namespace.

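The numbering described above can be observed with a short program.
The following is a minimal sketch, not a definitive implementation: the
helper name pid_seen_in_new_pid_ns() and the stack size are
assumptions made for illustration, and creating the namespace normally
requires the CAP_SYS_ADMIN capability.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

#define STACK_SIZE (1024 * 1024)    /* arbitrary size for the child's stack */

/* Runs as the first process in the new PID namespace and reports its
   own view of its PID through the exit status. */
static int report_pid(void *arg)
{
    (void) arg;
    return (int) getpid();
}

/* Hypothetical helper: returns the PID that the child sees for itself,
   or -1 (with errno set) if the namespace could not be created. */
int pid_seen_in_new_pid_ns(void)
{
    char *stack = malloc(STACK_SIZE);
    if (stack == NULL)
        return -1;

    /* The stack grows downward on common architectures, so pass its top. */
    pid_t child = clone(report_pid, stack + STACK_SIZE,
                        CLONE_NEWPID | SIGCHLD, NULL);
    if (child == -1) {
        int saved = errno;          /* typically EPERM when unprivileged */
        free(stack);
        errno = saved;
        return -1;
    }

    int status;
    pid_t w = waitpid(child, &status, 0);
    free(stack);
    if (w == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

When the call succeeds, the value returned is expected to be 1, while
the PID returned by clone(2) in the parent is an ordinary PID in the
parent's namespace.
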
Use of PID namespaces requires a kernel that is configured with the
CONFIG_PID_NS option.

The namespace init process
The first process created in a new namespace (i.e., the process created
using clone(2) with the CLONE_NEWPID flag, or the first child created
by a process after a call to unshare(2) using the CLONE_NEWPID flag)
has the PID 1, and is the "init" process for the namespace (see
init(1)). A child process that is orphaned within the namespace will
be reparented to this process rather than init(1) (unless one of the
ancestors of the child in the same PID namespace employed the prctl(2)
PR_SET_CHILD_SUBREAPER command to mark itself as the reaper of orphaned
descendant processes).

If the "init" process of a PID namespace terminates, the kernel
terminates all of the processes in the namespace via a SIGKILL signal.
This behavior reflects the fact that the "init" process is essential
for the correct operation of a PID namespace. In this case, a
subsequent fork(2) into this PID namespace fails with the error ENOMEM;
it is not possible to create a new process in a PID namespace whose
"init" process has terminated. Such scenarios can occur when, for
example, a process uses an open file descriptor for a
/proc/[pid]/ns/pid file corresponding to a process that was in a
namespace to setns(2) into that namespace after the "init" process has
terminated. Another possible scenario can occur after a call to
unshare(2): if the first child subsequently created by a fork(2)
terminates, then subsequent calls to fork(2) fail with ENOMEM.

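The unshare(2) scenario just described can be reproduced as follows.
This is a sketch under stated assumptions: the helper name
fork_after_init_exit() and its return convention are invented for this
example, the experiment runs in a throwaway process so that the caller
is unaffected, and the unshare(2) call normally needs CAP_SYS_ADMIN.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper. Returns ENOMEM if the fork(2) made after the
   namespace's "init" exited failed with that error, 0 if that fork
   succeeded, and -1 if the experiment could not be carried out. */
int fork_after_init_exit(void)
{
    pid_t helper = fork();
    if (helper == -1)
        return -1;

    if (helper == 0) {
        if (unshare(CLONE_NEWPID) == -1)
            _exit(255);             /* could not create the namespace */

        pid_t init = fork();        /* becomes PID 1 of the new namespace */
        if (init == -1)
            _exit(255);
        if (init == 0)
            _exit(0);               /* "init" terminates immediately */
        if (waitpid(init, NULL, 0) == -1)
            _exit(255);

        pid_t second = fork();      /* the namespace no longer has an init */
        if (second == -1)
            _exit(errno == ENOMEM ? 1 : 255);
        if (second == 0)
            _exit(0);
        waitpid(second, NULL, 0);
        _exit(0);
    }

    int status;
    if (waitpid(helper, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    switch (WEXITSTATUS(status)) {
    case 0:
        return 0;
    case 1:
        return ENOMEM;
    default:
        return -1;
    }
}
```

A privileged caller is expected to get ENOMEM; an unprivileged caller
gets -1 because the namespace cannot be created.
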
Only signals for which the "init" process has established a signal
handler can be sent to the "init" process by other members of the PID
namespace. This restriction applies even to privileged processes, and
prevents other members of the PID namespace from accidentally killing
the "init" process.

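The restriction above can be illustrated with a namespace whose "init"
installs no handler for SIGTERM. The following is a minimal sketch:
the helper names, the stack size, and the marker exit status are
assumptions for illustration, and CAP_SYS_ADMIN is normally required.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

#define STACK_SIZE (1024 * 1024)    /* arbitrary size for the child's stack */
#define INIT_SURVIVED 42            /* arbitrary marker exit status */

/* "init" of the new namespace; it installs no handler for SIGTERM. */
static int init_main(void *arg)
{
    (void) arg;
    pid_t member = fork();
    if (member == -1)
        return 1;
    if (member == 0) {
        kill(1, SIGTERM);           /* PID 1 here is the namespace's init */
        _exit(0);
    }
    if (waitpid(member, NULL, 0) == -1)
        return 1;
    return INIT_SURVIVED;           /* reached only if SIGTERM had no effect */
}

/* Hypothetical helper: returns 1 if "init" survived the SIGTERM sent by
   another member of its namespace, 0 if it was killed by a signal, and
   -1 if the experiment could not be carried out. */
int init_ignores_unhandled_sigterm(void)
{
    char *stack = malloc(STACK_SIZE);
    if (stack == NULL)
        return -1;

    pid_t init = clone(init_main, stack + STACK_SIZE,
                       CLONE_NEWPID | SIGCHLD, NULL);
    int status = 0;
    pid_t w = (init == -1) ? -1 : waitpid(init, &status, 0);
    free(stack);

    if (w == -1)
        return -1;
    if (WIFEXITED(status) && WEXITSTATUS(status) == INIT_SURVIVED)
        return 1;
    if (WIFSIGNALED(status))
        return 0;
    return -1;
}
```
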
Likewise, a process in an ancestor namespace can (subject to the usual
permission checks described in kill(2)) send signals to the "init"
process of a child PID namespace only if the "init" process has
established a handler for that signal. (Within the handler, the
siginfo_t si_pid field described in sigaction(2) will be zero.)
SIGKILL and SIGSTOP are treated exceptionally: these signals are
forcibly delivered when sent from an ancestor PID namespace. Neither
of these signals can be caught by the "init" process, and so will
result in the usual actions associated with those signals
(respectively, terminating and stopping the process).

Starting with Linux 3.4, the reboot(2) system call causes a signal to
be sent to the namespace "init" process. See reboot(2) for more
details.

Nesting PID namespaces
PID namespaces can be nested: each PID namespace has a parent, except
for the initial ("root") PID namespace. The parent of a PID namespace
is the PID namespace of the process that created the namespace using
clone(2) or unshare(2). PID namespaces thus form a tree, with all
namespaces ultimately tracing their ancestry to the root namespace.
Since Linux 3.7, the kernel limits the maximum nesting depth for PID
namespaces to 32.

A process is visible to other processes in its PID namespace, and to
the processes in each direct ancestor PID namespace going back to the
root PID namespace. In this context, "visible" means that one process
can be the target of operations by another process using system calls
that specify a process ID. Conversely, the processes in a child PID
namespace can't see processes in the parent and further removed
ancestor namespaces. More succinctly: a process can see (e.g., send
signals with kill(2), set nice values with setpriority(2), etc.) only
processes contained in its own PID namespace and in descendants of that
namespace.

A process has one process ID in each of the layers of the PID namespace
hierarchy in which it is visible, walking back through each direct
ancestor namespace to the root PID namespace. System calls that
operate on process IDs always operate using the process ID that is
visible in the PID namespace of the caller. A call to getpid(2) always
returns the PID associated with the namespace in which the process was
created.

Some processes in a PID namespace may have parents that are outside of
the namespace. For example, the parent of the initial process in the
namespace (i.e., the init(1) process with PID 1) is necessarily in
another namespace. Likewise, the direct children of a process that
uses setns(2) to cause its children to join a PID namespace are in a
different PID namespace from the caller of setns(2). Calls to
getppid(2) for such processes return 0.

While processes may freely descend into child PID namespaces (e.g.,
using setns(2) with a PID namespace file descriptor), they may not move
in the other direction. That is to say, processes may not enter any
ancestor namespaces (parent, grandparent, etc.). Changing PID
namespaces is a one-way operation.

The NS_GET_PARENT ioctl(2) operation can be used to discover the
parental relationship between PID namespaces; see ioctl_ns(2).

setns(2) and unshare(2) semantics
Calls to setns(2) that specify a PID namespace file descriptor and
calls to unshare(2) with the CLONE_NEWPID flag cause children
subsequently created by the caller to be placed in a different PID
namespace from the caller. (Since Linux 4.12, that PID namespace is
shown via the /proc/[pid]/ns/pid_for_children file, as described in
namespaces(7).) These calls do not, however, change the PID namespace
of the calling process, because doing so would change the caller's idea
of its own PID (as reported by getpid(2)), which would break many
applications and libraries.

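Whether a process has redirected its future children into another PID
namespace can be checked by comparing the two /proc files mentioned
above. The following is a minimal sketch; the helper name
children_share_pid_ns() is an assumption for illustration, and the
comparison by device and inode number follows the approach described in
namespaces(7).

```c
#include <sys/stat.h>

/* Hypothetical helper: returns 1 if children created by the caller
   would be placed in the caller's own PID namespace, 0 if they would be
   placed in a different one, and -1 on error (for example, on kernels
   older than Linux 4.12, which lack the pid_for_children file). */
int children_share_pid_ns(void)
{
    struct stat own, kids;

    if (stat("/proc/self/ns/pid", &own) == -1)
        return -1;
    if (stat("/proc/self/ns/pid_for_children", &kids) == -1)
        return -1;

    /* Two namespace files refer to the same namespace when both the
       device and the inode numbers match. */
    return own.st_dev == kids.st_dev && own.st_ino == kids.st_ino;
}
```

In a process that has not called setns(2) or unshare(2) for a PID
namespace, the result is expected to be 1.
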
To put things another way: a process's PID namespace membership is
determined when the process is created and cannot be changed
thereafter. Among other things, this means that the parental
relationship between processes mirrors the parental relationship
between PID namespaces: the parent of a process is either in the same
namespace or resides in the immediate parent PID namespace.

Compatibility of CLONE_NEWPID with other CLONE_* flags
In current versions of Linux, CLONE_NEWPID can't be combined with
CLONE_THREAD. Threads are required to be in the same PID namespace so
that the threads in a process can send signals to each other.
Similarly, it must be possible to see all of the threads of a process
in the proc(5) filesystem. Additionally, if two threads were in
different PID namespaces, the process ID of the process sending a
signal could not be meaningfully encoded when a signal is sent (see the
description of the siginfo_t type in sigaction(2)). Since the sender's
process ID is computed when a signal is enqueued, a signal queue shared
by processes in multiple PID namespaces would make that computation
impossible.

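The rejected flag combination can be probed directly. The following is
a minimal sketch: the helper names and the stack size are assumptions
for illustration. On current kernels the call is expected to fail with
EINVAL; under a seccomp filter that blocks namespace flags it may fail
with EPERM instead.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <stdlib.h>

#define STACK_SIZE (64 * 1024)      /* arbitrary size for the thread stack */

/* Thread function; it is not expected to run, because clone(2) should
   refuse the flag combination. */
static int never_runs(void *arg)
{
    (void) arg;
    return 0;
}

/* Hypothetical helper: returns the errno reported by clone(2) for
   CLONE_THREAD combined with CLONE_NEWPID, 0 if the call unexpectedly
   succeeded, or -1 if no stack could be allocated. */
int errno_of_newpid_thread_clone(void)
{
    char *stack = malloc(STACK_SIZE);
    if (stack == NULL)
        return -1;

    /* CLONE_THREAD requires CLONE_SIGHAND, which requires CLONE_VM. */
    int flags = CLONE_VM | CLONE_SIGHAND | CLONE_THREAD | CLONE_NEWPID;
    int rc = clone(never_runs, stack + STACK_SIZE, flags, NULL);
    if (rc == -1) {
        int err = errno;
        free(stack);
        return err;
    }
    return 0;   /* the stack is left allocated: a thread may be using it */
}
```
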
In earlier versions of Linux, CLONE_NEWPID was additionally disallowed
(failing with the error EINVAL) in combination with CLONE_SIGHAND
(before Linux 4.3) as well as CLONE_VM (before Linux 3.12). The
changes that lifted these restrictions have also been ported to earlier
stable kernels.

/proc and PID namespaces
A /proc filesystem shows (in the /proc/[pid] directories) only
processes visible in the PID namespace of the process that performed
the mount, even if the /proc filesystem is viewed from processes in
other namespaces.

After creating a new PID namespace, it is useful for the child to
change its root directory and mount a new procfs instance at /proc so
that tools such as ps(1) work correctly. If a new mount namespace is
simultaneously created by including CLONE_NEWNS in the flags argument
of clone(2) or unshare(2), then it isn't necessary to change the root
directory: a new procfs instance can be mounted directly over /proc.

From a shell, the command to mount /proc is:

    $ mount -t proc proc /proc

Calling readlink(2) on the path /proc/self yields the process ID of the
caller in the PID namespace of the procfs mount (i.e., the PID
namespace of the process that mounted the procfs). This can be useful
for introspection purposes, when a process wants to discover its PID in
other namespaces.

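The readlink(2) technique above can be sketched as follows. The helper
name pid_via_proc_self() is an assumption for illustration, and the
code assumes that a procfs instance is mounted at /proc.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns the caller's PID as seen by the procfs
   instance mounted at /proc, or -1 on error. */
pid_t pid_via_proc_self(void)
{
    char target[64];
    ssize_t n = readlink("/proc/self", target, sizeof(target) - 1);
    if (n == -1)
        return -1;
    target[n] = '\0';               /* readlink(2) does not add a terminator */

    char *end;
    long pid = strtol(target, &end, 10);
    if (end == target || *end != '\0' || pid <= 0)
        return -1;
    return (pid_t) pid;
}
```

When /proc was mounted from the caller's own PID namespace, the value
matches getpid(2); when it was mounted from an ancestor namespace, the
value is the caller's PID in that ancestor namespace.
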
/proc files
/proc/sys/kernel/ns_last_pid (since Linux 3.3)
This file displays the last PID that was allocated in this PID
namespace. When the next PID is allocated, the kernel will search for
the lowest unallocated PID that is greater than this value, and when
this file is subsequently read it will show that PID.

This file is writable by a process that has the CAP_SYS_ADMIN
capability inside its user namespace. This makes it possible to
determine the PID that is allocated to the next process that is created
inside this PID namespace.

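Reading the file requires no special privilege. The following is a
minimal sketch; the helper name read_ns_last_pid() is an assumption for
illustration.

```c
#include <stdio.h>

/* Hypothetical helper: returns the last PID allocated in the caller's
   PID namespace, or -1 if the file could not be read or parsed. */
long read_ns_last_pid(void)
{
    FILE *f = fopen("/proc/sys/kernel/ns_last_pid", "r");
    if (f == NULL)
        return -1;

    long pid = -1;
    if (fscanf(f, "%ld", &pid) != 1)
        pid = -1;
    fclose(f);
    return pid;
}
```

Other processes may allocate PIDs between the read and a later fork(2),
so the value is only a hint unless the caller controls all process
creation in the namespace.
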
Miscellaneous
When a process ID is passed over a UNIX domain socket to a process in a
different PID namespace (see the description of SCM_CREDENTIALS in
unix(7)), it is translated into the corresponding PID value in the
receiving process's PID namespace.

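The following minimal sketch shows where a receiver obtains the
translated PID. The helper name sender_pid_via_scm_credentials() is an
assumption for illustration. For simplicity, the sender and the
receiver are the same process, so no translation is visible here and
the reported PID is expected to equal getpid(2).

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: sends one byte over a socket pair and returns
   the sender PID that the kernel reports to the receiving socket, or
   -1 on error. */
pid_t sender_pid_via_scm_credentials(void)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) == -1)
        return -1;

    pid_t result = -1;
    int on = 1;
    char byte = 'x';
    char data;
    union {                         /* ensures correct alignment of the buffer */
        char buf[CMSG_SPACE(sizeof(struct ucred))];
        struct cmsghdr align;
    } control;
    struct iovec iov = { .iov_base = &data, .iov_len = 1 };
    struct msghdr msg = {
        .msg_iov = &iov,
        .msg_iovlen = 1,
        .msg_control = control.buf,
        .msg_controllen = sizeof(control.buf),
    };

    /* SO_PASSCRED asks the kernel to attach the sender's credentials. */
    if (setsockopt(sv[1], SOL_SOCKET, SO_PASSCRED, &on, sizeof(on)) == 0 &&
        write(sv[0], &byte, 1) == 1 &&
        recvmsg(sv[1], &msg, 0) == 1) {
        for (struct cmsghdr *c = CMSG_FIRSTHDR(&msg); c != NULL;
             c = CMSG_NXTHDR(&msg, c)) {
            if (c->cmsg_level == SOL_SOCKET &&
                c->cmsg_type == SCM_CREDENTIALS) {
                struct ucred cred;
                memcpy(&cred, CMSG_DATA(c), sizeof(cred));
                result = cred.pid;  /* PID in the receiver's namespace */
            }
        }
    }

    close(sv[0]);
    close(sv[1]);
    return result;
}
```
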
CONFORMING TO
Namespaces are a Linux-specific feature.

EXAMPLE
See user_namespaces(7).

SEE ALSO
clone(2), reboot(2), setns(2), unshare(2), proc(5), capabilities(7),
credentials(7), mount_namespaces(7), namespaces(7), user_namespaces(7),
switch_root(8)

COLOPHON
This page is part of release 4.16 of the Linux man-pages project. A
description of the project, information about reporting bugs, and the
latest version of this page, can be found at
https://www.kernel.org/doc/man-pages/.

Linux 2017-11-26 PID_NAMESPACES(7)