pid_namespaces(7)

PID_NAMESPACES(7)          Linux Programmer's Manual         PID_NAMESPACES(7)

NAME

       pid_namespaces - overview of Linux PID namespaces

DESCRIPTION

       For an overview of namespaces, see namespaces(7).

       PID namespaces isolate the process ID number space, meaning that
       processes in different PID namespaces can have the same PID. PID
       namespaces allow containers to provide functionality such as
       suspending/resuming the set of processes in the container and
       migrating the container to a new host while the processes inside the
       container maintain the same PIDs.

       PIDs in a new PID namespace start at 1, somewhat like a standalone
       system, and calls to fork(2), vfork(2), or clone(2) will produce
       processes with PIDs that are unique within the namespace.

       Use of PID namespaces requires a kernel that is configured with the
       CONFIG_PID_NS option.
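
       From a program, one way to check for this option is to test whether
       the /proc/self/ns/pid symbolic link exists, because the kernel creates
       that entry only when PID namespace support is built in. The following
       is a minimal sketch that assumes /proc is mounted. The function name
       pid_ns_supported() is hypothetical.

```c
#include <unistd.h>

/* Hypothetical helper: returns 1 if /proc/self/ns/pid exists (PID
   namespace support is compiled in), 0 otherwise. Assumes /proc is
   mounted; without it the answer is always 0. */
int pid_ns_supported(void)
{
    return access("/proc/self/ns/pid", F_OK) == 0;
}
```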

   The namespace init process
       The first process created in a new namespace (i.e., the process
       created using clone(2) with the CLONE_NEWPID flag, or the first child
       created by a process after a call to unshare(2) using the CLONE_NEWPID
       flag) has the PID 1, and is the "init" process for the namespace (see
       init(1)). This process becomes the parent of any child processes that
       are orphaned because a process that resides in this PID namespace
       terminated (see below for further details).
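
       The following sketch shows this from C. It assumes the kernel permits
       unprivileged user namespaces. CLONE_NEWUSER is combined with
       CLONE_NEWPID so that an ordinary user may create the PID namespace; a
       caller with CAP_SYS_ADMIN could omit it. The child reports its own PID
       and its parent's PID as seen from inside the new namespace. The names
       report_ids() and ids_in_new_pid_ns() are hypothetical.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define CHILD_STACK_SIZE (1024 * 1024)

/* Runs as the first process ("init") of the new PID namespace. */
static int report_ids(void *arg)
{
    int fd = *(int *) arg;
    /* Expected: getpid() == 1; getppid() == 0 because the parent
       lives in an ancestor PID namespace. */
    pid_t ids[2] = { getpid(), getppid() };
    return write(fd, ids, sizeof(ids)) == (ssize_t) sizeof(ids) ? 0 : 1;
}

/* Hypothetical helper: fills ids[0] (child's own PID) and ids[1] (its
   parent's PID as seen from inside). Returns 0 on success, or -1 if the
   child could not be created (e.g., EPERM when namespace creation is not
   permitted). */
int ids_in_new_pid_ns(pid_t ids[2])
{
    int pfd[2];
    if (pipe(pfd) == -1)
        return -1;
    char *stack = malloc(CHILD_STACK_SIZE);
    if (stack == NULL) {
        close(pfd[0]);
        close(pfd[1]);
        return -1;
    }

    int rc = -1;
    pid_t child = clone(report_ids, stack + CHILD_STACK_SIZE,
                        CLONE_NEWUSER | CLONE_NEWPID | SIGCHLD, &pfd[1]);
    close(pfd[1]);              /* so read() sees EOF if the child fails */
    if (child != -1) {
        if (read(pfd[0], ids, 2 * sizeof(pid_t)) == (ssize_t) (2 * sizeof(pid_t)))
            rc = 0;
        waitpid(child, NULL, 0);
    }
    close(pfd[0]);
    free(stack);
    return rc;
}
```

       On success, ids[0] is 1 and ids[1] is 0, because getppid(2) reports 0
       for a parent that resides in another PID namespace. The parent, in
       contrast, knows the same child by the PID that clone() returned to it.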

       If the "init" process of a PID namespace terminates, the kernel
       terminates all of the processes in the namespace via a SIGKILL signal.
       This behavior reflects the fact that the "init" process is essential
       for the correct operation of a PID namespace. In this case, a
       subsequent fork(2) into this PID namespace fails with the error
       ENOMEM; it is not possible to create a new process in a PID namespace
       whose "init" process has terminated. Such scenarios can occur when,
       for example, a process uses setns(2) with an open file descriptor for
       the /proc/[pid]/ns/pid file of a process that was in the namespace,
       after the "init" process of that namespace has terminated. Another
       possible scenario can occur after a call to unshare(2): if the first
       child subsequently created by a fork(2) terminates, then subsequent
       calls to fork(2) fail with ENOMEM.
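
       The unshare(2) scenario can be reproduced with the sketch below. The
       work happens in a throwaway helper process so that the caller's own
       namespace settings are not altered. The sketch assumes unprivileged
       user namespaces are available, which is why CLONE_NEWUSER accompanies
       CLONE_NEWPID. The function name fork_after_init_exits() is
       hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if fork() failed with ENOMEM after the
   namespace's "init" exited, 0 if it behaved differently, and -1 if the
   namespace could not be created. The caller itself is left untouched. */
int fork_after_init_exits(void)
{
    pid_t helper = fork();
    if (helper == -1)
        return -1;
    if (helper == 0) {
        /* Only the helper's pid_for_children changes. */
        if (unshare(CLONE_NEWUSER | CLONE_NEWPID) == -1)
            _exit(2);
        pid_t ns_init = fork();     /* becomes PID 1 in the new namespace */
        if (ns_init == -1)
            _exit(2);
        if (ns_init == 0)
            _exit(0);               /* the "init" process terminates */
        waitpid(ns_init, NULL, 0);

        pid_t again = fork();       /* expected to fail with ENOMEM */
        if (again == 0)
            _exit(0);
        if (again > 0) {
            waitpid(again, NULL, 0);
            _exit(1);
        }
        _exit(errno == ENOMEM ? 0 : 1);
    }

    int status;
    if (waitpid(helper, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    switch (WEXITSTATUS(status)) {
    case 0:  return 1;
    case 1:  return 0;
    default: return -1;
    }
}
```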

       Only signals for which the "init" process has established a signal
       handler can be sent to the "init" process by other members of the PID
       namespace. This restriction applies even to privileged processes, and
       prevents other members of the PID namespace from accidentally killing
       the "init" process.

       Likewise, a process in an ancestor namespace can (subject to the usual
       permission checks described in kill(2)) send signals to the "init"
       process of a child PID namespace only if the "init" process has
       established a handler for that signal. (Within the handler, the
       siginfo_t si_pid field described in sigaction(2) will be zero.)
       SIGKILL and SIGSTOP are treated exceptionally: these signals are
       forcibly delivered when sent from an ancestor PID namespace. Neither of
       these signals can be caught by the "init" process, and so will result
       in the usual actions associated with those signals (respectively,
       terminating and stopping the process).
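
       The sketch below exercises both rules. It assumes unprivileged user
       namespaces are available, which is why CLONE_NEWUSER accompanies
       CLONE_NEWPID. The child becomes "init" without installing any
       handlers. A second member of its namespace sends it SIGTERM, which is
       discarded. The parent in the ancestor namespace then sends SIGKILL,
       which is delivered. The function names are hypothetical.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define CHILD_STACK_SIZE (1024 * 1024)

/* "init" of the new namespace; it installs no signal handlers. */
static int init_main(void *arg)
{
    int fd = *(int *) arg;
    pid_t sender = fork();
    if (sender == -1)
        return 1;
    if (sender == 0) {
        /* Another member of the namespace tries to terminate "init". */
        kill(1, SIGTERM);
        _exit(0);
    }
    waitpid(sender, NULL, 0);

    /* Still running: SIGTERM had no handler, so it was not delivered. */
    char alive = 'A';
    if (write(fd, &alive, 1) != 1)
        return 1;
    for (;;)
        pause();                    /* wait for SIGKILL from the ancestor */
}

/* Hypothetical helper: returns 1 if "init" ignored SIGTERM from inside
   its namespace and was then killed by SIGKILL from the ancestor, 0
   otherwise, and -1 if the namespace could not be created. */
int init_signal_demo(void)
{
    int pfd[2];
    if (pipe(pfd) == -1)
        return -1;
    char *stack = malloc(CHILD_STACK_SIZE);
    if (stack == NULL) {
        close(pfd[0]);
        close(pfd[1]);
        return -1;
    }

    int result = -1;
    pid_t child = clone(init_main, stack + CHILD_STACK_SIZE,
                        CLONE_NEWUSER | CLONE_NEWPID | SIGCHLD, &pfd[1]);
    close(pfd[1]);
    if (child != -1) {
        char c;
        int survived = (read(pfd[0], &c, 1) == 1);
        kill(child, SIGKILL);       /* forcibly delivered from an ancestor */
        int status = 0;
        waitpid(child, &status, 0);
        result = survived && WIFSIGNALED(status) && WTERMSIG(status) == SIGKILL;
    }
    close(pfd[0]);
    free(stack);
    return result;
}
```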

       Starting with Linux 3.4, the reboot(2) system call causes a signal to
       be sent to the namespace "init" process. See reboot(2) for more
       details.

   Nesting PID namespaces
       PID namespaces can be nested: each PID namespace has a parent, except
       for the initial ("root") PID namespace. The parent of a PID namespace
       is the PID namespace of the process that created the namespace using
       clone(2) or unshare(2). PID namespaces thus form a tree, with all
       namespaces ultimately tracing their ancestry to the root namespace.
       Since Linux 3.7, the kernel limits the maximum nesting depth for PID
       namespaces to 32.

       A process is visible to other processes in its PID namespace, and to
       the processes in each direct ancestor PID namespace going back to the
       root PID namespace. In this context, "visible" means that one process
       can be the target of operations by another process using system calls
       that specify a process ID. Conversely, the processes in a child PID
       namespace can't see processes in the parent and further removed
       ancestor namespaces. More succinctly: a process can see (e.g., send
       signals with kill(2), set nice values with setpriority(2), etc.) only
       processes contained in its own PID namespace and in descendants of that
       namespace.

       A process has one process ID in each of the layers of the PID namespace
       hierarchy in which it is visible, walking back through each direct
       ancestor namespace to the root PID namespace. System calls that operate
       on process IDs always operate using the process ID that is visible in
       the PID namespace of the caller. A call to getpid(2) always returns the
       PID associated with the namespace in which the process was created.
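
       On Linux 4.1 and later, the NSpid field of /proc/[pid]/status lists
       these per-layer process IDs (see proc(5)). The sketch below parses
       that field for the calling process. The function name read_nspid() is
       hypothetical. The values start with the PID in the namespace of the
       process that mounted /proc and end with the PID in the caller's own
       namespace, so the last value equals getpid(2).

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: parses the NSpid line of /proc/self/status.
   Stores at most max PIDs in pids[], ordered from the PID namespace of
   the procfs mount down to the caller's own namespace, and returns the
   count; returns -1 if the file or the line is unavailable. */
int read_nspid(pid_t *pids, int max)
{
    FILE *f = fopen("/proc/self/status", "r");
    if (f == NULL)
        return -1;

    char line[512];
    int n = -1;
    while (fgets(line, sizeof(line), f) != NULL) {
        if (strncmp(line, "NSpid:", 6) != 0)
            continue;
        n = 0;
        char *p = line + 6;
        while (n < max) {
            char *end;
            long v = strtol(p, &end, 10);   /* skips the tab separators */
            if (end == p)
                break;                      /* no more numbers on the line */
            pids[n++] = (pid_t) v;
            p = end;
        }
        break;
    }
    fclose(f);
    return n;
}
```

       In the root PID namespace the field holds a single value. A process
       two levels below the root, viewed through a /proc mounted in the root
       namespace, shows three.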

       Some processes in a PID namespace may have parents that are outside of
       the namespace. For example, the parent of the initial process in the
       namespace (i.e., the init(1) process with PID 1) is necessarily in
       another namespace. Likewise, the direct children of a process that uses
       setns(2) to cause its children to join a PID namespace are in a
       different PID namespace from the caller of setns(2). Calls to
       getppid(2) for such processes return 0.

       While processes may freely descend into child PID namespaces (e.g.,
       using setns(2) with a PID namespace file descriptor), they may not move
       in the other direction. That is to say, processes may not enter any
       ancestor namespaces (parent, grandparent, etc.). Changing PID
       namespaces is a one-way operation.

       The NS_GET_PARENT ioctl(2) operation can be used to discover the
       parental relationship between PID namespaces; see ioctl_ns(2).
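
       Because the parent of the caller's own PID namespace is an ancestor,
       NS_GET_PARENT cannot be used to climb out of it. According to
       ioctl_ns(2), the operation fails with EPERM when the requested
       namespace is outside the caller's namespace scope, and also when there
       is no parent. The sketch below shows this for the calling process. The
       function name is hypothetical, and NS_GET_PARENT requires Linux 4.9 or
       later.

```c
#include <errno.h>
#include <fcntl.h>
#include <linux/nsfs.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if the kernel hands back a file
   descriptor for the parent of the caller's own PID namespace, 0 if it
   refuses with EPERM, and -1 on any other error (e.g., ENOTTY on kernels
   without NS_GET_PARENT). */
int own_pid_ns_parent_visible(void)
{
    int fd = open("/proc/self/ns/pid", O_RDONLY);
    if (fd == -1)
        return -1;
    int parent = ioctl(fd, NS_GET_PARENT);
    int saved_errno = errno;
    close(fd);
    if (parent != -1) {
        close(parent);
        return 1;
    }
    return saved_errno == EPERM ? 0 : -1;
}
```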

   setns(2) and unshare(2) semantics
       Calls to setns(2) that specify a PID namespace file descriptor and
       calls to unshare(2) with the CLONE_NEWPID flag cause children
       subsequently created by the caller to be placed in a different PID
       namespace from the caller. (Since Linux 4.12, that PID namespace is
       shown via the /proc/[pid]/ns/pid_for_children file, as described in
       namespaces(7).) These calls do not, however, change the PID namespace
       of the calling process, because doing so would change the caller's idea
       of its own PID (as reported by getpid(2)), which would break many
       applications and libraries.

       To put things another way: a process's PID namespace membership is
       determined when the process is created and cannot be changed
       thereafter. Among other things, this means that the parental
       relationship between processes mirrors the parental relationship
       between PID namespaces: the parent of a process is either in the same
       namespace or resides in the immediate parent PID namespace.

       A process may call unshare(2) with the CLONE_NEWPID flag only once.
       After it has performed this operation, its
       /proc/[pid]/ns/pid_for_children symbolic link will be empty until the
       first child is created in the namespace.
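
       The sketch below checks these rules. It runs in a throwaway helper
       process so that the caller's own pid_for_children is left alone. The
       helper calls unshare(2), confirms that getpid(2) is unchanged, creates
       the first child, and then compares the /proc/self/ns/pid and
       /proc/self/ns/pid_for_children links. The sketch assumes unprivileged
       user namespaces (hence CLONE_NEWUSER) and Linux 4.12 or later. The
       function names are hypothetical.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if both namespace links name the same namespace, 0 if they
   differ, and -1 if either cannot be read. */
static int same_ns_link(const char *a, const char *b)
{
    char x[64], y[64];
    ssize_t m = readlink(a, x, sizeof(x) - 1);
    ssize_t n = readlink(b, y, sizeof(y) - 1);
    if (m == -1 || n == -1)
        return -1;
    x[m] = '\0';
    y[n] = '\0';
    return strcmp(x, y) == 0;       /* e.g., "pid:[4026531836]" */
}

/* Hypothetical helper: returns 1 if unshare(CLONE_NEWPID) left the
   caller's PID unchanged while pid_for_children named a different
   namespace, 0 if not, and -1 on error. */
int unshare_keeps_caller_pid(void)
{
    pid_t helper = fork();
    if (helper == -1)
        return -1;
    if (helper == 0) {
        pid_t before = getpid();
        if (unshare(CLONE_NEWUSER | CLONE_NEWPID) == -1)
            _exit(2);
        int pid_unchanged = (getpid() == before);

        pid_t ns_init = fork();     /* first child: PID 1 in the new namespace */
        if (ns_init == -1)
            _exit(2);
        if (ns_init == 0) {
            pause();
            _exit(0);
        }
        int same = same_ns_link("/proc/self/ns/pid",
                                "/proc/self/ns/pid_for_children");
        kill(ns_init, SIGKILL);     /* sent from the ancestor namespace */
        waitpid(ns_init, NULL, 0);
        if (same == -1)
            _exit(2);
        _exit(pid_unchanged && same == 0 ? 0 : 1);
    }

    int status;
    if (waitpid(helper, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    switch (WEXITSTATUS(status)) {
    case 0:  return 1;
    case 1:  return 0;
    default: return -1;
    }
}
```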

   Adoption of orphaned children
       When a child process becomes orphaned, it is reparented to the "init"
       process in the PID namespace of its parent (unless one of the nearer
       ancestors of the parent employed the prctl(2) PR_SET_CHILD_SUBREAPER
       command to mark itself as the reaper of orphaned descendant
       processes). Note that because of the setns(2) and unshare(2)
       semantics described above, this may be the "init" process in the PID
       namespace that is the parent of the child's PID namespace, rather than
       the "init" process in the child's own PID namespace.
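
       The subreaper exception can be observed without any namespace
       privileges. In the sketch below, the caller marks itself with
       PR_SET_CHILD_SUBREAPER and forks a child. That child forks a
       grandchild and exits immediately, and the caller then asks the
       orphaned grandchild for its new parent, which is expected to be the
       caller. The flag is reset afterward. The function names are
       hypothetical.

```c
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if the orphaned grandchild reports the caller as its new
   parent, 0 if it reports someone else, and -1 on error. */
static int orphan_reports_caller(void)
{
    int go[2], res[2];
    if (pipe(go) == -1)
        return -1;
    if (pipe(res) == -1) {
        close(go[0]);
        close(go[1]);
        return -1;
    }

    pid_t child = fork();
    if (child == 0) {
        if (fork() == 0) {
            /* Grandchild: wait until the middle process has been reaped,
               then send {own PID, new parent PID}. */
            char c;
            pid_t ids[2] = { getpid(), -1 };
            close(go[1]);
            close(res[0]);
            if (read(go[0], &c, 1) == 1)
                ids[1] = getppid();
            _exit(write(res[1], ids, sizeof(ids)) == (ssize_t) sizeof(ids) ? 0 : 1);
        }
        _exit(0);               /* middle process exits, orphaning the grandchild */
    }

    int result = -1;
    close(res[1]);              /* read() below sees EOF if nobody reports */
    if (child != -1) {
        pid_t ids[2];
        waitpid(child, NULL, 0);
        if (write(go[1], "x", 1) == 1 &&
            read(res[0], ids, sizeof(ids)) == (ssize_t) sizeof(ids)) {
            result = (ids[1] == getpid());
            waitpid(ids[0], NULL, 0);   /* succeeds only if we adopted it */
        }
    }
    close(go[0]);
    close(go[1]);
    close(res[0]);
    return result;
}

/* Hypothetical helper: demonstrates adoption by a child subreaper. */
int subreaper_adopts_orphan(void)
{
    if (prctl(PR_SET_CHILD_SUBREAPER, 1, 0, 0, 0) == -1)
        return -1;
    int result = orphan_reports_caller();
    prctl(PR_SET_CHILD_SUBREAPER, 0, 0, 0, 0);   /* restore the default */
    return result;
}
```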

   Compatibility of CLONE_NEWPID with other CLONE_* flags
       In current versions of Linux, CLONE_NEWPID can't be combined with
       CLONE_THREAD. Threads are required to be in the same PID namespace so
       that the threads in a process can send signals to each other.
       Similarly, it must be possible to see all of the threads of a process
       in the proc(5) filesystem. Additionally, if two threads were in
       different PID namespaces, the process ID of the process sending a
       signal could not be meaningfully encoded when a signal is sent (see
       the description of the siginfo_t type in sigaction(2)). Since this is
       computed when a signal is enqueued, a signal queue shared by processes
       in multiple PID namespaces would defeat that.

       In earlier versions of Linux, CLONE_NEWPID was additionally disallowed
       (failing with the error EINVAL) in combination with CLONE_SIGHAND
       (before Linux 4.3) as well as CLONE_VM (before Linux 3.12). The
       changes that lifted these restrictions have also been ported to
       earlier stable kernels.
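
       The kernel rejects this combination before it checks any namespace
       privileges, so the behavior can be observed as an ordinary user. The
       sketch below requests a thread in a new PID namespace and returns the
       resulting errno, which is expected to be EINVAL. A seccomp filter that
       rejects namespace flags, such as those used by some container
       runtimes, may report EPERM instead. The function names are
       hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <stdlib.h>

#define THREAD_STACK_SIZE (64 * 1024)

static int never_runs(void *arg)
{
    (void) arg;
    return 0;
}

/* Hypothetical helper: attempts to create a thread in a new PID namespace
   and returns the errno from clone(), or 0 if the call succeeded. */
int newpid_thread_errno(void)
{
    char *stack = malloc(THREAD_STACK_SIZE);
    if (stack == NULL)
        return ENOMEM;
    /* CLONE_THREAD requires CLONE_SIGHAND, which requires CLONE_VM. */
    int flags = CLONE_NEWPID | CLONE_THREAD | CLONE_SIGHAND | CLONE_VM;
    int rc = clone(never_runs, stack + THREAD_STACK_SIZE, flags, NULL);
    int err = (rc == -1) ? errno : 0;
    /* A successful call would leave the stack in use by the new thread,
       so it is freed only on failure. */
    if (rc == -1)
        free(stack);
    return err;
}
```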

   /proc and PID namespaces
       A /proc filesystem shows (in the /proc/[pid] directories) only
       processes visible in the PID namespace of the process that performed
       the mount, even if the /proc filesystem is viewed from processes in
       other namespaces.

       After creating a new PID namespace, it is useful for the child to
       change its root directory and mount a new procfs instance at /proc so
       that tools such as ps(1) work correctly. If a new mount namespace is
       simultaneously created by including CLONE_NEWNS in the flags argument
       of clone(2) or unshare(2), then it isn't necessary to change the root
       directory: a new procfs instance can be mounted directly over /proc.

       From a shell, the command to mount /proc is:

           $ mount -t proc proc /proc
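
       The sketch below does the equivalent of that command from C, in a
       child created with CLONE_NEWPID and CLONE_NEWNS. It also passes
       CLONE_NEWUSER and therefore assumes unprivileged user namespaces are
       enabled. Before mounting, the child makes all mounts private so that
       the new /proc does not propagate back to the parent's mount namespace.
       It then checks that /proc/self now reads "1". The mount may still be
       refused, for example where parts of the existing /proc are hidden by
       overmounts. The function names are hypothetical.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mount.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define CHILD_STACK_SIZE (1024 * 1024)

static int mount_proc_child(void *arg)
{
    (void) arg;
    /* Keep the new mount from propagating to the parent's namespace. */
    if (mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, NULL) == -1)
        return 2;
    /* Equivalent of: mount -t proc proc /proc */
    if (mount("proc", "/proc", "proc", 0, NULL) == -1)
        return 2;
    char buf[32];
    ssize_t n = readlink("/proc/self", buf, sizeof(buf) - 1);
    if (n == -1)
        return 2;
    buf[n] = '\0';
    /* The fresh procfs belongs to the new PID namespace, where we are 1. */
    return strcmp(buf, "1") == 0 ? 0 : 1;
}

/* Hypothetical helper: returns 1 if the child saw itself as "1" through
   its own /proc, 0 if not, and -1 if the namespaces or the mount could
   not be set up. */
int mount_proc_in_new_ns(void)
{
    char *stack = malloc(CHILD_STACK_SIZE);
    if (stack == NULL)
        return -1;
    pid_t child = clone(mount_proc_child, stack + CHILD_STACK_SIZE,
                        CLONE_NEWUSER | CLONE_NEWPID | CLONE_NEWNS | SIGCHLD,
                        NULL);
    int result = -1;
    int status;
    if (child != -1 && waitpid(child, &status, 0) == child && WIFEXITED(status)) {
        if (WEXITSTATUS(status) == 0)
            result = 1;
        else if (WEXITSTATUS(status) == 1)
            result = 0;
    }
    free(stack);
    return result;
}
```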

       Calling readlink(2) on the path /proc/self yields the process ID of the
       caller in the PID namespace of the procfs mount (i.e., the PID
       namespace of the process that mounted the procfs). This can be useful
       for introspection purposes, when a process wants to discover its PID in
       other namespaces.
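
       A minimal sketch of that lookup follows. The function name is
       hypothetical. When /proc was mounted from the caller's own PID
       namespace, the result equals getpid(2). When /proc was mounted from an
       ancestor namespace, the result is the caller's PID in that ancestor.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns the caller's PID as seen in the PID
   namespace of the procfs mounted at /proc, or -1 if /proc/self cannot
   be read (for example, if the caller is not visible in that namespace). */
pid_t pid_via_proc_self(void)
{
    char buf[32];
    ssize_t n = readlink("/proc/self", buf, sizeof(buf) - 1);
    if (n == -1)
        return -1;
    buf[n] = '\0';
    return (pid_t) strtol(buf, NULL, 10);
}
```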

   /proc files
       /proc/sys/kernel/ns_last_pid (since Linux 3.3)
              This file (which is virtualized per PID namespace) displays the
              last PID that was allocated in this PID namespace. When the next
              PID is allocated, the kernel will search for the lowest
              unallocated PID that is greater than this value, and when this
              file is subsequently read it will show that PID.

              This file is writable by a process that has the CAP_SYS_ADMIN or
              (since Linux 5.9) CAP_CHECKPOINT_RESTORE capability inside the
              user namespace that owns the PID namespace. This makes it
              possible to determine the PID that is allocated to the next
              process that is created inside this PID namespace.
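
       The sketch below reads the value. For a suitably privileged caller, it
       also uses the writable file to request a particular PID for the next
       child. Both function names are hypothetical. The second helper is
       racy: another process in the namespace may fork in between, so the PID
       it returns should be checked rather than assumed.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define NS_LAST_PID "/proc/sys/kernel/ns_last_pid"

/* Hypothetical helper: returns the last PID allocated in the caller's PID
   namespace, or -1 if the file cannot be read. */
long read_ns_last_pid(void)
{
    FILE *f = fopen(NS_LAST_PID, "r");
    if (f == NULL)
        return -1;
    long v = -1;
    if (fscanf(f, "%ld", &v) != 1)
        v = -1;
    fclose(f);
    return v;
}

/* Hypothetical helper: asks for the next child to get PID 'want' by
   writing want - 1, then forks. Needs CAP_SYS_ADMIN (or, since Linux 5.9,
   CAP_CHECKPOINT_RESTORE) in the owning user namespace. Returns the new
   child's PID, which may differ from 'want', or -1 on error. */
pid_t fork_with_pid_hint(pid_t want)
{
    FILE *f = fopen(NS_LAST_PID, "w");
    if (f == NULL)
        return -1;
    int ok = fprintf(f, "%ld", (long) want - 1) > 0;
    if (fclose(f) != 0 || !ok)      /* buffered data is written at fclose() */
        return -1;
    pid_t child = fork();
    if (child == 0)
        _exit(0);
    if (child > 0)
        waitpid(child, NULL, 0);
    return child;
}
```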

   Miscellaneous
       When a process ID is passed over a UNIX domain socket to a process in a
       different PID namespace (see the description of SCM_CREDENTIALS in
       unix(7)), it is translated into the corresponding PID value in the
       receiving process's PID namespace.
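
       The sketch below shows the mechanics with a socket pair inside a
       single process. The receiver enables SO_PASSCRED and reads the
       SCM_CREDENTIALS control message. Because both ends share one PID
       namespace here, the delivered PID equals getpid(2). A receiver in
       another PID namespace would instead get the sender's PID as seen in
       its own namespace. The function name is hypothetical.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: sends the caller's credentials through a UNIX
   domain socket pair and returns the PID delivered to the receiving end,
   or -1 on error. */
pid_t pid_seen_by_receiver(void)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) == -1)
        return -1;

    pid_t result = -1;
    union {                                   /* suitably aligned buffer */
        char buf[CMSG_SPACE(sizeof(struct ucred))];
        struct cmsghdr align;
    } ctl;
    char byte = 'x';
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
                          .msg_control = ctl.buf,
                          .msg_controllen = sizeof(ctl.buf) };
    int on = 1;

    do {
        /* The receiving socket must enable SO_PASSCRED. */
        if (setsockopt(sv[1], SOL_SOCKET, SO_PASSCRED, &on, sizeof(on)) == -1)
            break;

        /* Without privilege, the sender must supply its own PID and its
           own user and group IDs (see unix(7)). */
        struct ucred cred = { .pid = getpid(), .uid = getuid(), .gid = getgid() };
        memset(&ctl, 0, sizeof(ctl));
        struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);
        cm->cmsg_level = SOL_SOCKET;
        cm->cmsg_type = SCM_CREDENTIALS;
        cm->cmsg_len = CMSG_LEN(sizeof(cred));
        memcpy(CMSG_DATA(cm), &cred, sizeof(cred));
        if (sendmsg(sv[0], &msg, 0) == -1)
            break;

        /* The kernel translates the PID into the receiver's namespace. */
        memset(&ctl, 0, sizeof(ctl));
        msg.msg_controllen = sizeof(ctl.buf);
        if (recvmsg(sv[1], &msg, 0) == -1)
            break;
        for (cm = CMSG_FIRSTHDR(&msg); cm != NULL; cm = CMSG_NXTHDR(&msg, cm)) {
            if (cm->cmsg_level == SOL_SOCKET && cm->cmsg_type == SCM_CREDENTIALS) {
                struct ucred got;
                memcpy(&got, CMSG_DATA(cm), sizeof(got));
                result = got.pid;
            }
        }
    } while (0);

    close(sv[0]);
    close(sv[1]);
    return result;
}
```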

CONFORMING TO

       Namespaces are a Linux-specific feature.

EXAMPLES

       See user_namespaces(7).

COLOPHON

       This page is part of release 5.12 of the Linux man-pages project. A
       description of the project, information about reporting bugs, and the
       latest version of this page, can be found at
       https://www.kernel.org/doc/man-pages/.

Linux                             2020-11-01                 PID_NAMESPACES(7)
