pid_namespaces(7)

PID_NAMESPACES(7)          Linux Programmer's Manual         PID_NAMESPACES(7)

NAME

       pid_namespaces - overview of Linux PID namespaces

DESCRIPTION

       For an overview of namespaces, see namespaces(7).

       PID namespaces isolate the process ID number space, meaning that
       processes in different PID namespaces can have the same PID.  PID
       namespaces allow containers to provide functionality such as
       suspending/resuming the set of processes in the container and
       migrating the container to a new host while the processes inside
       the container maintain the same PIDs.

       PIDs in a new PID namespace start at 1, somewhat like a standalone
       system, and calls to fork(2), vfork(2), or clone(2) will produce
       processes with PIDs that are unique within the namespace.
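
       The numbering described above can be observed with a minimal
       sketch such as the following.  It assumes that unprivileged user
       namespaces are permitted (CLONE_NEWUSER is added only so that no
       privilege is needed); the function name first_child_pid() is
       hypothetical, and the flag values are copied from
       <linux/sched.h>.

```python
import ctypes
import os

# Flag values as defined in <linux/sched.h>.
CLONE_NEWUSER = 0x10000000
CLONE_NEWPID = 0x20000000

libc = ctypes.CDLL(None, use_errno=True)


def first_child_pid():
    """Return the PID that the first child of a new PID namespace sees
    for itself, or None if the namespace could not be created."""
    r, w = os.pipe()
    helper = os.fork()
    if helper == 0:
        # Helper process: unshare here so that the caller is unaffected.
        try:
            os.close(r)
            if libc.unshare(CLONE_NEWUSER | CLONE_NEWPID) == 0:
                child = os.fork()
                if child == 0:
                    # First process created in the new namespace.
                    os.write(w, str(os.getpid()).encode())
                    os._exit(0)
                os.waitpid(child, 0)
        finally:
            os._exit(0)
    os.close(w)
    data = os.read(r, 32)   # empty if the helper could not unshare
    os.close(r)
    os.waitpid(helper, 0)
    return int(data) if data else None


if __name__ == "__main__":
    print(first_child_pid())
```

       The unshare(2) call is made in a throwaway helper because it
       permanently changes where the caller's later children are placed.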

       Use of PID namespaces requires a kernel that is configured with
       the CONFIG_PID_NS option.

   The namespace init process
       The first process created in a new namespace (i.e., the process
       created using clone(2) with the CLONE_NEWPID flag, or the first
       child created by a process after a call to unshare(2) using the
       CLONE_NEWPID flag) has the PID 1, and is the "init" process for
       the namespace (see init(1)).  This process becomes the parent of
       any child processes that are orphaned because a process that
       resides in this PID namespace terminated (see below for further
       details).

       If the "init" process of a PID namespace terminates, the kernel
       terminates all of the processes in the namespace via a SIGKILL
       signal.  This behavior reflects the fact that the "init" process
       is essential for the correct operation of a PID namespace.  In
       this case, a subsequent fork(2) into this PID namespace fails with
       the error ENOMEM; it is not possible to create a new process in a
       PID namespace whose "init" process has terminated.  Such scenarios
       can occur when, for example, a process uses an open file
       descriptor for a /proc/[pid]/ns/pid file corresponding to a
       process that was in a namespace to setns(2) into that namespace
       after the "init" process has terminated.  Another possible
       scenario can occur after a call to unshare(2): if the first child
       subsequently created by a fork(2) terminates, then subsequent
       calls to fork(2) fail with ENOMEM.
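
       The unshare(2) scenario can be reproduced with a sketch along the
       following lines.  It assumes that unprivileged user namespaces
       are permitted; the function name fork_result_after_init_exit() is
       hypothetical.

```python
import ctypes
import errno
import os

# Flag values as defined in <linux/sched.h>.
CLONE_NEWUSER = 0x10000000
CLONE_NEWPID = 0x20000000

libc = ctypes.CDLL(None, use_errno=True)


def fork_result_after_init_exit():
    """Return the errno name reported by fork() once the namespace's
    "init" has exited ("OK" if fork() succeeded), or None if the
    namespace could not be created."""
    r, w = os.pipe()
    helper = os.fork()
    if helper == 0:
        try:
            os.close(r)
            if libc.unshare(CLONE_NEWUSER | CLONE_NEWPID) != 0:
                os._exit(1)
            init = os.fork()          # becomes PID 1 of the new namespace
            if init == 0:
                os._exit(0)           # "init" terminates immediately
            os.waitpid(init, 0)
            try:
                pid = os.fork()       # second fork into the dead namespace
                if pid == 0:
                    os._exit(0)
                os.waitpid(pid, 0)
                os.write(w, b"OK")
            except OSError as e:
                name = errno.errorcode.get(e.errno, str(e.errno))
                os.write(w, name.encode())
        finally:
            os._exit(0)
    os.close(w)
    data = os.read(r, 32)
    os.close(r)
    os.waitpid(helper, 0)
    return data.decode() if data else None


if __name__ == "__main__":
    print(fork_result_after_init_exit())
```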

       Only signals for which the "init" process has established a signal
       handler can be sent to the "init" process by other members of the
       PID namespace.  This restriction applies even to privileged
       processes, and prevents other members of the PID namespace from
       accidentally killing the "init" process.

       Likewise, a process in an ancestor namespace can, subject to the
       usual permission checks described in kill(2), send signals to the
       "init" process of a child PID namespace only if the "init" process
       has established a handler for that signal.  (Within the handler,
       the siginfo_t si_pid field described in sigaction(2) will be
       zero.)  SIGKILL and SIGSTOP are treated exceptionally: these
       signals are forcibly delivered when sent from an ancestor PID
       namespace.  Neither of these signals can be caught by the "init"
       process, and so will result in the usual actions associated with
       those signals (respectively, terminating and stopping the
       process).
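
       The following sketch probes this behavior from an ancestor
       namespace: it sends SIGTERM (for which the "init" process has no
       handler) and then SIGKILL.  It assumes that unprivileged user
       namespaces are permitted; the function name probe_init_signals()
       and the 0.2 second settling delay are illustrative choices.

```python
import ctypes
import os
import signal
import time

# Flag values as defined in <linux/sched.h>.
CLONE_NEWUSER = 0x10000000
CLONE_NEWPID = 0x20000000

libc = ctypes.CDLL(None, use_errno=True)


def probe_init_signals():
    """Return (survived_sigterm, killed_by_sigkill) for a namespace
    "init" that installs no handlers, or None if setup failed."""
    r, w = os.pipe()
    helper = os.fork()
    if helper == 0:
        try:
            os.close(r)
            if libc.unshare(CLONE_NEWUSER | CLONE_NEWPID) != 0:
                os._exit(1)
            init = os.fork()
            if init == 0:
                # Namespace "init": default dispositions, just idle.
                os.close(w)
                time.sleep(30)
                os._exit(0)
            os.kill(init, signal.SIGTERM)    # no handler: not delivered
            time.sleep(0.2)
            survived = os.waitpid(init, os.WNOHANG) == (0, 0)
            killed = False
            if survived:
                os.kill(init, signal.SIGKILL)  # forcibly delivered
                status = os.waitpid(init, 0)[1]
                killed = (os.WIFSIGNALED(status)
                          and os.WTERMSIG(status) == signal.SIGKILL)
            os.write(w, b"%d %d" % (survived, killed))
        finally:
            os._exit(0)
    os.close(w)
    data = os.read(r, 32)
    os.close(r)
    os.waitpid(helper, 0)
    if not data:
        return None
    a, b = data.split()
    return (a == b"1", b == b"1")


if __name__ == "__main__":
    print(probe_init_signals())
```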

       Starting with Linux 3.4, the reboot(2) system call causes a signal
       to be sent to the namespace "init" process.  See reboot(2) for
       more details.

   Nesting PID namespaces
       PID namespaces can be nested: each PID namespace has a parent,
       except for the initial ("root") PID namespace.  The parent of a
       PID namespace is the PID namespace of the process that created the
       namespace using clone(2) or unshare(2).  PID namespaces thus form
       a tree, with all namespaces ultimately tracing their ancestry to
       the root namespace.  Since Linux 3.7, the kernel limits the
       maximum nesting depth for PID namespaces to 32.

       A process is visible to other processes in its PID namespace, and
       to the processes in each direct ancestor PID namespace going back
       to the root PID namespace.  In this context, "visible" means that
       one process can be the target of operations by another process
       using system calls that specify a process ID.  Conversely, the
       processes in a child PID namespace can't see processes in the
       parent and further removed ancestor namespaces.  More succinctly:
       a process can see (e.g., send signals with kill(2), set nice
       values with setpriority(2), etc.) only processes contained in its
       own PID namespace and in descendants of that namespace.

       A process has one process ID in each layer of the PID namespace
       hierarchy in which it is visible, walking back through each direct
       ancestor namespace to the root PID namespace.  System calls that
       operate on process IDs always operate using the process ID that is
       visible in the PID namespace of the caller.  A call to getpid(2)
       always returns the PID associated with the namespace in which the
       process was created.
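
       One way to inspect the per-layer process IDs is the NSpid field
       of /proc/[pid]/status.  That field is not described in this page
       (see proc(5); it is available since Linux 4.1), so the following
       is only a sketch; the function names are hypothetical.

```python
def parse_nspid(status_text):
    """Extract the NSpid list from the text of a /proc/[pid]/status
    file.  The list runs from the outermost visible PID namespace to
    the process's own namespace; it is empty if the field is absent."""
    for line in status_text.splitlines():
        if line.startswith("NSpid:"):
            return [int(field) for field in line.split()[1:]]
    return []


def own_pids(status_path="/proc/self/status"):
    """Return the caller's PIDs, one per visible namespace layer."""
    with open(status_path) as f:
        return parse_nspid(f.read())


if __name__ == "__main__":
    print(own_pids())
```

       For a process that is not inside a nested PID namespace, the list
       has a single entry.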

       Some processes in a PID namespace may have parents that are
       outside of the namespace.  For example, the parent of the initial
       process in the namespace (i.e., the init(1) process with PID 1) is
       necessarily in another namespace.  Likewise, the direct children
       of a process that uses setns(2) to cause its children to join a
       PID namespace are in a different PID namespace from the caller of
       setns(2).  Calls to getppid(2) for such processes return 0.

       While processes may freely descend into child PID namespaces
       (e.g., using setns(2) with a PID namespace file descriptor), they
       may not move in the other direction.  That is to say, processes
       may not enter any ancestor namespaces (parent, grandparent, etc.).
       Changing PID namespaces is a one-way operation.

       The NS_GET_PARENT ioctl(2) operation can be used to discover the
       parental relationship between PID namespaces; see ioctl_ns(2).
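
       A minimal sketch of such a query follows.  The request value is
       an assumption derived from the definition _IO(0xb7, 0x2) in
       <linux/nsfs.h>, and the function name parent_pid_ns_id() is
       hypothetical.  The sketch returns None when the kernel refuses
       or does not support the operation; ioctl_ns(2) describes the
       individual errors.

```python
import fcntl
import os

# Assumed value of NS_GET_PARENT, that is, _IO(0xb7, 0x2).
NS_GET_PARENT = 0xB702


def parent_pid_ns_id(path="/proc/self/ns/pid"):
    """Return (st_dev, st_ino) identifying the parent of the PID
    namespace referred to by 'path', or None if it is unavailable."""
    try:
        fd = os.open(path, os.O_RDONLY)
    except OSError:
        return None
    try:
        # On success the ioctl returns a new file descriptor that
        # refers to the parent namespace.
        parent_fd = fcntl.ioctl(fd, NS_GET_PARENT)
    except OSError:
        return None
    finally:
        os.close(fd)
    try:
        st = os.fstat(parent_fd)
        return (st.st_dev, st.st_ino)
    finally:
        os.close(parent_fd)


if __name__ == "__main__":
    print(parent_pid_ns_id())
```

       Two namespace file descriptors refer to the same namespace if
       their device and inode numbers match, which is why the pair is
       returned rather than the descriptor itself.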

   setns(2) and unshare(2) semantics
       Calls to setns(2) that specify a PID namespace file descriptor and
       calls to unshare(2) with the CLONE_NEWPID flag cause children
       subsequently created by the caller to be placed in a different PID
       namespace from the caller.  (Since Linux 4.12, that PID namespace
       is shown via the /proc/[pid]/ns/pid_for_children file, as
       described in namespaces(7).)  These calls do not, however, change
       the PID namespace of the calling process, because doing so would
       change the caller's idea of its own PID (as reported by getpid()),
       which would break many applications and libraries.
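
       The following sketch checks that the caller's own PID and PID
       namespace are unchanged across unshare(2).  It assumes that
       unprivileged user namespaces are permitted and that /proc is
       mounted; the function name caller_unchanged_by_unshare() is
       hypothetical.

```python
import ctypes
import os

# Flag values as defined in <linux/sched.h>.
CLONE_NEWUSER = 0x10000000
CLONE_NEWPID = 0x20000000

libc = ctypes.CDLL(None, use_errno=True)


def caller_unchanged_by_unshare():
    """Return True if getpid() and /proc/self/ns/pid are the same
    before and after unshare(CLONE_NEWPID), False if they differ, or
    None if the call could not be made."""
    r, w = os.pipe()
    helper = os.fork()
    if helper == 0:
        try:
            os.close(r)
            before = (os.getpid(), os.readlink("/proc/self/ns/pid"))
            if libc.unshare(CLONE_NEWUSER | CLONE_NEWPID) != 0:
                os._exit(1)
            after = (os.getpid(), os.readlink("/proc/self/ns/pid"))
            os.write(w, b"1" if before == after else b"0")
        finally:
            os._exit(0)
    os.close(w)
    data = os.read(r, 8)
    os.close(r)
    os.waitpid(helper, 0)
    return (data == b"1") if data else None


if __name__ == "__main__":
    print(caller_unchanged_by_unshare())
```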

       To put things another way: a process's PID namespace membership is
       determined when the process is created and cannot be changed
       thereafter.  Among other things, this means that the parental
       relationship between processes mirrors the parental relationship
       between PID namespaces: the parent of a process is either in the
       same namespace or resides in the immediate parent PID namespace.

       A process may call unshare(2) with the CLONE_NEWPID flag only
       once.  After it has performed this operation, its
       /proc/[pid]/ns/pid_for_children symbolic link will be empty until
       the first child is created in the namespace.

   Adoption of orphaned children
       When a child process becomes orphaned, it is reparented to the
       "init" process in the PID namespace of its parent (unless one of
       the nearer ancestors of the parent employed the prctl(2)
       PR_SET_CHILD_SUBREAPER command to mark itself as the reaper of
       orphaned descendant processes).  Note that because of the setns(2)
       and unshare(2) semantics described above, this may be the "init"
       process in the PID namespace that is the parent of the child's PID
       namespace, rather than the "init" process in the child's own PID
       namespace.
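
       The subreaper exception mentioned above needs no privilege and
       can be sketched as follows.  The option value 36 for
       PR_SET_CHILD_SUBREAPER is taken from <linux/prctl.h>; the
       function name orphan_adopted_by_subreaper() and the five second
       timeout are illustrative assumptions.

```python
import ctypes
import os
import time

PR_SET_CHILD_SUBREAPER = 36   # from <linux/prctl.h>

libc = ctypes.CDLL(None, use_errno=True)


def orphan_adopted_by_subreaper():
    """Return True if an orphaned grandchild is reparented to the
    helper process that marked itself as a child subreaper."""
    r, w = os.pipe()
    helper = os.fork()
    if helper == 0:
        try:
            os.close(r)
            if libc.prctl(PR_SET_CHILD_SUBREAPER, ctypes.c_ulong(1),
                          ctypes.c_ulong(0), ctypes.c_ulong(0),
                          ctypes.c_ulong(0)) != 0:
                os._exit(1)
            middle = os.fork()
            if middle == 0:
                middle_pid = os.getpid()
                if os.fork() == 0:
                    # Grandchild: wait until the middle process is gone.
                    deadline = time.monotonic() + 5
                    while (os.getppid() == middle_pid
                           and time.monotonic() < deadline):
                        time.sleep(0.01)
                    os.write(w, str(os.getppid()).encode())
                os._exit(0)   # middle process exits, orphaning its child
            os.close(w)
            # Reap the middle process and the adopted grandchild.
            while True:
                try:
                    os.wait()
                except ChildProcessError:
                    break
        finally:
            os._exit(0)
    os.close(w)
    data = os.read(r, 32)
    os.close(r)
    os.waitpid(helper, 0)
    return bool(data) and int(data) == helper


if __name__ == "__main__":
    print(orphan_adopted_by_subreaper())
```

       Without the prctl(2) call, the grandchild would instead be
       reparented to the "init" process of the PID namespace.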

   Compatibility of CLONE_NEWPID with other CLONE_* flags
       In current versions of Linux, CLONE_NEWPID can't be combined with
       CLONE_THREAD.  Threads are required to be in the same PID
       namespace so that the threads in a process can send signals to
       each other.  Similarly, it must be possible to see all of the
       threads of a process in the proc(5) filesystem.  Additionally, if
       two threads were in different PID namespaces, the process ID of
       the process sending a signal could not be meaningfully encoded
       when a signal is sent (see the description of the siginfo_t type
       in sigaction(2)).  Since this is computed when a signal is
       enqueued, a signal queue shared by processes in multiple PID
       namespaces would defeat that.

       In earlier versions of Linux, CLONE_NEWPID was additionally
       disallowed (failing with the error EINVAL) in combination with
       CLONE_SIGHAND (before Linux 4.3) as well as CLONE_VM (before Linux
       3.12).  The changes that lifted these restrictions have also been
       ported to earlier stable kernels.

   /proc and PID namespaces
       A /proc filesystem shows (in the /proc/[pid] directories) only
       processes visible in the PID namespace of the process that
       performed the mount, even if the /proc filesystem is viewed from
       processes in other namespaces.

       After creating a new PID namespace, it is useful for the child to
       change its root directory and mount a new procfs instance at /proc
       so that tools such as ps(1) work correctly.  If a new mount
       namespace is simultaneously created by including CLONE_NEWNS in
       the flags argument of clone(2) or unshare(2), then it isn't
       necessary to change the root directory: a new procfs instance can
       be mounted directly over /proc.

       From a shell, the command to mount /proc is:

           $ mount -t proc proc /proc

       Calling readlink(2) on the path /proc/self yields the process ID
       of the caller in the PID namespace of the procfs mount (i.e., the
       PID namespace of the process that mounted the procfs).  This can
       be useful for introspection purposes, when a process wants to
       discover its PID in other namespaces.
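
       As a small sketch of this technique, the following helper reads
       the link under a given procfs mount point.  The function name
       and its mount parameter are hypothetical; pointing it at a
       procfs instance mounted from an ancestor namespace would yield
       the caller's PID in that namespace.

```python
import os


def pid_seen_by_procfs(mount="/proc"):
    """Return the caller's PID in the PID namespace of the procfs
    instance mounted at 'mount'."""
    return int(os.readlink(os.path.join(mount, "self")))


if __name__ == "__main__":
    # The two values match when /proc was mounted from the caller's
    # own PID namespace.
    print(pid_seen_by_procfs(), os.getpid())
```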

   /proc files
       /proc/sys/kernel/ns_last_pid (since Linux 3.3)
              This file (which is virtualized per PID namespace) displays
              the last PID that was allocated in this PID namespace.
              When the next PID is allocated, the kernel will search for
              the lowest unallocated PID that is greater than this value,
              and when this file is subsequently read it will show that
              PID.

              This file is writable by a process that has the
              CAP_SYS_ADMIN capability inside the user namespace that
              owns the PID namespace.  This makes it possible to
              determine the PID that is allocated to the next process
              that is created inside this PID namespace.
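
       Reading and writing the file can be sketched as follows.  The
       function names are hypothetical, and the write succeeds only
       with the capability noted above.  Because the kernel picks the
       lowest unallocated PID greater than the stored value, the sketch
       writes one less than the desired PID; another process creating
       a child in the meantime can still take that PID first.

```python
NS_LAST_PID = "/proc/sys/kernel/ns_last_pid"


def read_ns_last_pid(path=NS_LAST_PID):
    """Return the last PID allocated in the caller's PID namespace."""
    with open(path) as f:
        return int(f.read())


def request_next_pid(pid, path=NS_LAST_PID):
    """Ask that the next process created receive 'pid' (if it is
    unallocated) by storing pid - 1 as the last allocated PID."""
    with open(path, "w") as f:
        f.write(str(pid - 1))


if __name__ == "__main__":
    print(read_ns_last_pid())
```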

   Miscellaneous
       When a process ID is passed over a UNIX domain socket to a process
       in a different PID namespace (see the description of
       SCM_CREDENTIALS in unix(7)), it is translated into the
       corresponding PID value in the receiving process's PID namespace.
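
       The mechanics of receiving such a PID can be sketched as follows.
       Both ends of the socket pair are in the same process here, so no
       translation is visible and the received PID equals getpid(2); a
       receiver in another PID namespace would see the translated
       value.  The function name is hypothetical, and the "iII" layout
       assumes struct ucred as described in unix(7).

```python
import os
import socket
import struct

UCRED = struct.Struct("iII")   # struct ucred: pid, uid, gid


def sender_pid_via_scm_credentials():
    """Send one datagram over a UNIX socket pair and return the
    sender's PID as reported by SCM_CREDENTIALS, or None."""
    a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    with a, b:
        # Ask the kernel to attach the sender's credentials.
        b.setsockopt(socket.SOL_SOCKET, socket.SO_PASSCRED, 1)
        a.send(b"x")
        _, ancdata, _, _ = b.recvmsg(1, socket.CMSG_SPACE(UCRED.size))
        for level, kind, data in ancdata:
            if level == socket.SOL_SOCKET and kind == socket.SCM_CREDENTIALS:
                pid, _uid, _gid = UCRED.unpack(data[:UCRED.size])
                return pid
    return None


if __name__ == "__main__":
    print(sender_pid_via_scm_credentials(), os.getpid())
```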

CONFORMING TO

       Namespaces are a Linux-specific feature.

EXAMPLES

       See user_namespaces(7).

COLOPHON

       This page is part of release 5.07 of the Linux man-pages project.
       A description of the project, information about reporting bugs,
       and the latest version of this page, can be found at
       https://www.kernel.org/doc/man-pages/.

Linux                             2020-06-09                 PID_NAMESPACES(7)
