pid_namespaces(7)

PID_NAMESPACES(7)          Linux Programmer's Manual         PID_NAMESPACES(7)

NAME

       pid_namespaces - overview of Linux PID namespaces

DESCRIPTION

       For an overview of namespaces, see namespaces(7).

       PID namespaces isolate the process ID number space, meaning that
       processes in different PID namespaces can have the same PID. PID
       namespaces allow containers to provide functionality such as
       suspending/resuming the set of processes in the container and
       migrating the container to a new host while the processes inside the
       container maintain the same PIDs.

       PIDs in a new PID namespace start at 1, somewhat like a standalone
       system, and calls to fork(2), vfork(2), or clone(2) will produce
       processes with PIDs that are unique within the namespace.

       Use of PID namespaces requires a kernel that is configured with the
       CONFIG_PID_NS option.

   The namespace init process
       The first process created in a new namespace (i.e., the process
       created using clone(2) with the CLONE_NEWPID flag, or the first child
       created by a process after a call to unshare(2) using the
       CLONE_NEWPID flag) has the PID 1, and is the "init" process for the
       namespace (see init(1)). A child process that is orphaned within the
       namespace will be reparented to this process rather than init(1)
       (unless one of the ancestors of the child in the same PID namespace
       employed the prctl(2) PR_SET_CHILD_SUBREAPER command to mark itself
       as the reaper of orphaned descendant processes).

       If the "init" process of a PID namespace terminates, the kernel
       terminates all of the processes in the namespace via a SIGKILL
       signal. This behavior reflects the fact that the "init" process is
       essential for the correct operation of a PID namespace. In this case,
       a subsequent fork(2) into this PID namespace fails with the error
       ENOMEM; it is not possible to create a new process in a PID namespace
       whose "init" process has terminated. Such scenarios can occur when,
       for example, a process uses an open file descriptor for a
       /proc/[pid]/ns/pid file corresponding to a process that was in a
       namespace to setns(2) into that namespace after the "init" process
       has terminated. Another possible scenario can occur after a call to
       unshare(2): if the first child subsequently created by a fork(2)
       terminates, then subsequent calls to fork(2) fail with ENOMEM.
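
       The unshare(2) scenario can be probed with a short script. The
       following is only a sketch: the function names are hypothetical,
       unshare(2) is reached through ctypes, and CLONE_NEWUSER is added on
       the assumption that the kernel permits unprivileged user namespaces
       (otherwise the probe simply reports why unshare(2) failed). The probe
       runs in a throwaway child so that the caller is left unchanged.

```python
import ctypes
import errno
import os

CLONE_NEWUSER = 0x10000000  # value from <sched.h>
CLONE_NEWPID = 0x20000000   # value from <sched.h>


def _probe() -> str:
    """Run inside a throwaway child; return a short outcome string."""
    libc = ctypes.CDLL(None, use_errno=True)
    # Assumption: a new user namespace is requested too, so no privilege
    # is needed where unprivileged user namespaces are enabled.
    if libc.unshare(CLONE_NEWUSER | CLONE_NEWPID) != 0:
        return "unshare: " + errno.errorcode.get(ctypes.get_errno(), "unknown")
    first = os.fork()        # first child: PID 1 ("init") of the new namespace
    if first == 0:
        os._exit(0)          # the "init" process terminates at once
    os.waitpid(first, 0)
    try:
        second = os.fork()   # the new namespace no longer has an "init"
    except OSError as exc:
        return "fork: " + errno.errorcode.get(exc.errno, "unknown")
    if second == 0:
        os._exit(0)
    os.waitpid(second, 0)
    return "fork: ok"


def fork_after_init_exit() -> str:
    """Run the probe in a child process and return its report."""
    read_fd, write_fd = os.pipe()
    child = os.fork()
    if child == 0:
        os.close(read_fd)
        try:
            outcome = _probe()
        except BaseException as exc:  # report instead of propagating
            outcome = "error: " + type(exc).__name__
        os.write(write_fd, outcome.encode())
        os._exit(0)
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as pipe:
        data = pipe.read()
    os.waitpid(child, 0)
    return data.decode()


if __name__ == "__main__":
    print(fork_after_init_exit())
```

       Where the unshare(2) call succeeds, the second fork(2) is expected to
       be reported as failing with ENOMEM, as described above.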

       Only signals for which the "init" process has established a signal
       handler can be sent to the "init" process by other members of the PID
       namespace. This restriction applies even to privileged processes, and
       prevents other members of the PID namespace from accidentally killing
       the "init" process.

       Likewise, a process in an ancestor namespace can (subject to the
       usual permission checks described in kill(2)) send signals to the
       "init" process of a child PID namespace only if the "init" process
       has established a handler for that signal. (Within the handler, the
       siginfo_t si_pid field described in sigaction(2) will be zero.)
       SIGKILL and SIGSTOP are treated exceptionally: these signals are
       forcibly delivered when sent from an ancestor PID namespace. Neither
       of these signals can be caught by the "init" process, and so they
       will result in the usual actions associated with those signals
       (respectively, terminating and stopping the process).
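
       The delivery rules in the two preceding paragraphs can be summarized
       as a small decision function. This is a model of the rules as stated
       here, not kernel code; the function name and its parameters are
       hypothetical, and the permission checks of kill(2) are assumed to
       have passed already.

```python
import signal


def init_receives(sig, handled, sender_in_ancestor_ns):
    """Model: is `sig` delivered to a namespace "init" process?

    handled: set of signals for which "init" has installed a handler.
    sender_in_ancestor_ns: True if the sender is in an ancestor PID
    namespace, False if it is a member of the same PID namespace.
    """
    if sig in (signal.SIGKILL, signal.SIGSTOP):
        # These cannot be caught, so no handler can exist; they are
        # forcibly delivered only when sent from an ancestor namespace.
        return sender_in_ancestor_ns
    # Every other signal needs a handler, whoever the sender is.
    return sig in handled


if __name__ == "__main__":
    handlers = {signal.SIGTERM}
    print(init_receives(signal.SIGTERM, handlers, False))
    print(init_receives(signal.SIGKILL, handlers, False))
    print(init_receives(signal.SIGKILL, handlers, True))
```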

       Starting with Linux 3.4, the reboot(2) system call causes a signal to
       be sent to the namespace "init" process. See reboot(2) for more
       details.

   Nesting PID namespaces
       PID namespaces can be nested: each PID namespace has a parent, except
       for the initial ("root") PID namespace. The parent of a PID namespace
       is the PID namespace of the process that created the namespace using
       clone(2) or unshare(2). PID namespaces thus form a tree, with all
       namespaces ultimately tracing their ancestry to the root namespace.
       Since Linux 3.7, the kernel limits the maximum nesting depth for PID
       namespaces to 32.

       A process is visible to other processes in its PID namespace, and to
       the processes in each direct ancestor PID namespace going back to the
       root PID namespace. In this context, "visible" means that one process
       can be the target of operations by another process using system calls
       that specify a process ID. Conversely, the processes in a child PID
       namespace can't see processes in the parent and further removed
       ancestor namespaces. More succinctly: a process can see (e.g., send
       signals with kill(2), set nice values with setpriority(2), etc.) only
       processes contained in its own PID namespace and in descendants of
       that namespace.
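
       The visibility rule amounts to an ancestor test on the namespace
       tree. The sketch below models the tree as a plain dictionary mapping
       each namespace to its parent; the names used are illustrative only
       and do not correspond to any kernel interface.

```python
def ancestry(ns, parent):
    """Return [ns, parent of ns, ..., root] for a namespace tree."""
    chain = [ns]
    while parent.get(ns) is not None:
        ns = parent[ns]
        chain.append(ns)
    return chain


def can_see(observer_ns, target_ns, parent):
    """True if a process in observer_ns can see a process in target_ns.

    That is the case when target_ns is observer_ns itself or one of
    its descendants.
    """
    return observer_ns in ancestry(target_ns, parent)


if __name__ == "__main__":
    # root -> a -> b, and root -> c
    tree = {"root": None, "a": "root", "b": "a", "c": "root"}
    print(can_see("root", "b", tree))  # root sees all descendants
    print(can_see("b", "a", tree))     # a child cannot see its parent
    print(can_see("a", "c", tree))     # siblings cannot see each other
```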

       A process has one process ID in each of the layers of the PID
       namespace hierarchy in which it is visible, walking back through each
       direct ancestor namespace to the root PID namespace. System calls
       that operate on process IDs always operate using the process ID that
       is visible in the PID namespace of the caller. A call to getpid(2)
       always returns the PID associated with the namespace in which the
       process was created.
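
       The per-layer process IDs can be inspected through the NSpid field of
       /proc/[pid]/status. That field is described in proc(5) (since Linux
       4.1), not in this page, so the sketch below is an illustration that
       relies on it; the helper names are hypothetical. The leftmost value
       is relative to the PID namespace of the procfs mount, followed by the
       values in successively nested namespaces.

```python
def nspid_chain(status_text):
    """Extract the list of PIDs from the NSpid line of a status file."""
    for line in status_text.splitlines():
        if line.startswith("NSpid:"):
            return [int(field) for field in line.split()[1:]]
    return []  # field absent (for example, a kernel older than 4.1)


def own_nspid_chain(path="/proc/self/status"):
    """Return the NSpid chain of the calling process, or [] if unavailable."""
    try:
        with open(path) as status:
            return nspid_chain(status.read())
    except OSError:
        return []


if __name__ == "__main__":
    print(own_nspid_chain())
```

       For a process that is not inside a nested PID namespace, the list is
       expected to hold a single entry.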

       Some processes in a PID namespace may have parents that are outside
       of the namespace. For example, the parent of the initial process in
       the namespace (i.e., the init(1) process with PID 1) is necessarily
       in another namespace. Likewise, the direct children of a process that
       uses setns(2) to cause its children to join a PID namespace are in a
       different PID namespace from the caller of setns(2). Calls to
       getppid(2) for such processes return 0.

       While processes may freely descend into child PID namespaces (e.g.,
       using setns(2) with a PID namespace file descriptor), they may not
       move in the other direction. That is to say, processes may not enter
       any ancestor namespaces (parent, grandparent, etc.). Changing PID
       namespaces is a one-way operation.

       The NS_GET_PARENT ioctl(2) operation can be used to discover the
       parental relationship between PID namespaces; see ioctl_ns(2).
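
       A minimal sketch of that query follows. The request value 0xB702 is
       an assumption: it is _IO(0xb7, 0x2) from <linux/nsfs.h> as encoded on
       architectures such as x86 and ARM, and it differs where the ioctl
       direction bits are encoded differently. The function name is
       hypothetical. The call is expected to fail (returning None here) for
       the initial PID namespace or when the parent lies outside the
       caller's namespace scope.

```python
import fcntl
import os

# Assumption: _IO(0xb7, 0x2) as encoded on x86 and ARM.
NS_GET_PARENT = 0xB702


def parent_pid_namespace(pid="self"):
    """Return e.g. 'pid:[4026531836]' for the parent PID namespace, or None."""
    try:
        ns_fd = os.open(f"/proc/{pid}/ns/pid", os.O_RDONLY)
    except OSError:
        return None  # no procfs, or no such process
    try:
        # On success the ioctl returns a new file descriptor that
        # refers to the parent namespace.
        parent_fd = fcntl.ioctl(ns_fd, NS_GET_PARENT)
    except OSError:
        return None  # e.g. EPERM, or an old kernel without this ioctl
    finally:
        os.close(ns_fd)
    try:
        return os.readlink(f"/proc/self/fd/{parent_fd}")
    except OSError:
        return None
    finally:
        os.close(parent_fd)


if __name__ == "__main__":
    print(parent_pid_namespace())
```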

   setns(2) and unshare(2) semantics
       Calls to setns(2) that specify a PID namespace file descriptor and
       calls to unshare(2) with the CLONE_NEWPID flag cause children
       subsequently created by the caller to be placed in a different PID
       namespace from the caller. (Since Linux 4.12, that PID namespace is
       shown via the /proc/[pid]/ns/pid_for_children file, as described in
       namespaces(7).) These calls do not, however, change the PID namespace
       of the calling process, because doing so would change the caller's
       idea of its own PID (as reported by getpid(2)), which would break
       many applications and libraries.
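
       The distinction can be observed by comparing the two symbolic links
       for a process. The helper below is a sketch with a hypothetical name;
       it returns None for a link that cannot be read, for example
       pid_for_children on kernels before Linux 4.12.

```python
import os


def pid_namespace_links(pid="self"):
    """Read the ns/pid and ns/pid_for_children links of a process."""
    links = {}
    for name in ("pid", "pid_for_children"):
        try:
            links[name] = os.readlink(f"/proc/{pid}/ns/{name}")
        except OSError:
            links[name] = None  # link missing or not readable
    return links


if __name__ == "__main__":
    links = pid_namespace_links()
    print(links)
    # The two differ after unshare(CLONE_NEWPID) or a PID-namespace
    # setns(2), because only future children are affected.
    print("children join a different namespace:",
          links["pid"] != links["pid_for_children"])
```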

       To put things another way: a process's PID namespace membership is
       determined when the process is created and cannot be changed
       thereafter. Among other things, this means that the parental
       relationship between processes mirrors the parental relationship
       between PID namespaces: the parent of a process is either in the same
       namespace or resides in the immediate parent PID namespace.

   Compatibility of CLONE_NEWPID with other CLONE_* flags
       In current versions of Linux, CLONE_NEWPID can't be combined with
       CLONE_THREAD. Threads are required to be in the same PID namespace so
       that the threads in a process can send signals to each other.
       Similarly, it must be possible to see all of the threads of a process
       in the proc(5) filesystem. Additionally, if two threads were in
       different PID namespaces, the process ID of the process sending a
       signal could not be meaningfully encoded when a signal is sent (see
       the description of the siginfo_t type in sigaction(2)). Because this
       value is computed when a signal is enqueued, a signal queue shared by
       processes in multiple PID namespaces would make that impossible.

       In earlier versions of Linux, CLONE_NEWPID was additionally
       disallowed (failing with the error EINVAL) in combination with
       CLONE_SIGHAND (before Linux 4.3) as well as CLONE_VM (before Linux
       3.12). The changes that lifted these restrictions have also been
       ported to earlier stable kernels.

   /proc and PID namespaces
       A /proc filesystem shows (in the /proc/[pid] directories) only
       processes visible in the PID namespace of the process that performed
       the mount, even if the /proc filesystem is viewed from processes in
       other namespaces.

       After creating a new PID namespace, it is useful for the child to
       change its root directory and mount a new procfs instance at /proc so
       that tools such as ps(1) work correctly. If a new mount namespace is
       simultaneously created by including CLONE_NEWNS in the flags argument
       of clone(2) or unshare(2), then it isn't necessary to change the root
       directory: a new procfs instance can be mounted directly over /proc.

       From a shell, the command to mount /proc is:

           $ mount -t proc proc /proc

       Calling readlink(2) on the path /proc/self yields the process ID of
       the caller in the PID namespace of the procfs mount (i.e., the PID
       namespace of the process that mounted the procfs). This can be useful
       for introspection purposes, when a process wants to discover its PID
       in other namespaces.
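
       As a brief illustration, the lookup can be wrapped in a helper. The
       function name and its mount argument are hypothetical; passing the
       path of a procfs instance mounted from another PID namespace would
       yield the caller's PID as seen from that namespace.

```python
import os


def pid_seen_by_procfs(mount="/proc"):
    """Return the caller's PID in the PID namespace of the procfs at `mount`."""
    try:
        return int(os.readlink(os.path.join(mount, "self")))
    except (OSError, ValueError):
        return None  # not a procfs mount, or the link is unreadable


if __name__ == "__main__":
    # The two values match when /proc was mounted from the caller's
    # own PID namespace.
    print(pid_seen_by_procfs(), os.getpid())
```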

   /proc files
       /proc/sys/kernel/ns_last_pid (since Linux 3.3)
              This file displays the last PID that was allocated in this PID
              namespace. When the next PID is allocated, the kernel will
              search for the lowest unallocated PID that is greater than
              this value, and when this file is subsequently read it will
              show that PID.

              This file is writable by a process that has the CAP_SYS_ADMIN
              capability inside its user namespace. This makes it possible
              to determine the PID that is allocated to the next process
              that is created inside this PID namespace.
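
       The allocation rule can be sketched as follows. The helper names are
       hypothetical, and the model covers only the search described above;
       it deliberately ignores what the kernel does once the PID limit is
       reached.

```python
def read_ns_last_pid(path="/proc/sys/kernel/ns_last_pid"):
    """Return the last PID allocated in this PID namespace, or None."""
    try:
        with open(path) as handle:
            return int(handle.read().strip())
    except (OSError, ValueError):
        return None


def next_pid(last_pid, in_use):
    """Model: lowest unallocated PID greater than last_pid.

    Assumption: wraparound at the PID limit is not modeled.
    """
    candidate = last_pid + 1
    while candidate in in_use:
        candidate += 1
    return candidate


if __name__ == "__main__":
    print(read_ns_last_pid())
    print(next_pid(41, {42, 43, 45}))
```

       A privileged process that writes N-1 to the file can therefore expect
       the next process created in the namespace to receive PID N, provided
       that N is free and nothing else allocates a PID in between.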

   Miscellaneous
       When a process ID is passed over a UNIX domain socket to a process in
       a different PID namespace (see the description of SCM_CREDENTIALS in
       unix(7)), it is translated into the corresponding PID value in the
       receiving process's PID namespace.
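
       The sketch below shows where the PID appears on the receiving side.
       It uses a socket pair inside a single process, so no translation
       takes place and the received value is simply the caller's own PID; a
       receiver in another PID namespace would see the translated value
       instead. The function name is hypothetical, and the layout of struct
       ucred (pid, uid, gid) is taken from unix(7).

```python
import os
import socket
import struct

UCRED = struct.Struct("iII")  # struct ucred: pid_t, uid_t, gid_t


def sender_pid_via_credentials():
    """Send one datagram to ourselves and return the sender PID, or None."""
    if not hasattr(socket, "SO_PASSCRED"):
        return None  # not a Linux build of Python
    receiver, sender = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    with receiver, sender:
        # Ask the kernel to attach the sender's credentials.
        receiver.setsockopt(socket.SOL_SOCKET, socket.SO_PASSCRED, 1)
        sender.send(b"x")
        _, ancdata, _, _ = receiver.recvmsg(1, socket.CMSG_SPACE(UCRED.size))
        for level, kind, data in ancdata:
            if level == socket.SOL_SOCKET and kind == socket.SCM_CREDENTIALS:
                pid, _uid, _gid = UCRED.unpack(data[:UCRED.size])
                return pid
    return None


if __name__ == "__main__":
    print(sender_pid_via_credentials(), os.getpid())
```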

CONFORMING TO

       Namespaces are a Linux-specific feature.

EXAMPLE

       See user_namespaces(7).

COLOPHON

       This page is part of release 4.15 of the Linux man-pages project. A
       description of the project, information about reporting bugs, and the
       latest version of this page, can be found at
       https://www.kernel.org/doc/man-pages/.

Linux                             2017-11-26                 PID_NAMESPACES(7)
