mount_namespaces(7)    Miscellaneous Information Manual    mount_namespaces(7)

NAME
       mount_namespaces - overview of Linux mount namespaces

DESCRIPTION
       For an overview of namespaces, see namespaces(7).

       Mount namespaces provide isolation of the list of mounts seen by the
       processes in each namespace instance.  Thus, the processes in each of
       the mount namespace instances will see distinct single-directory
       hierarchies.

       The views provided by the /proc/pid/mounts, /proc/pid/mountinfo, and
       /proc/pid/mountstats files (all described in proc(5)) correspond to the
       mount namespace in which the process with the PID pid resides.  (All of
       the processes that reside in the same mount namespace will see the same
       view in these files.)

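       Whether two processes share a mount namespace, and therefore see the
       same view in these files, can be checked from user space.  The
       following C sketch relies on the rule given in namespaces(7): two
       processes are in the same namespace if the device and inode numbers of
       their /proc/pid/ns/mnt links are the same.  It assumes that /proc is
       mounted and that the caller is permitted to inspect both processes;
       same_mount_ns() and in_my_mount_ns() are hypothetical names.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: 1 if pid_a and pid_b are in the same mount
   namespace, 0 if not, -1 if either /proc/pid/ns/mnt cannot be stat()ed. */
int same_mount_ns(pid_t pid_a, pid_t pid_b)
{
    char path_a[64], path_b[64];
    struct stat st_a, st_b;

    snprintf(path_a, sizeof(path_a), "/proc/%ld/ns/mnt", (long) pid_a);
    snprintf(path_b, sizeof(path_b), "/proc/%ld/ns/mnt", (long) pid_b);

    /* stat() follows the magic link to the namespace object itself. */
    if (stat(path_a, &st_a) == -1 || stat(path_b, &st_b) == -1)
        return -1;

    return st_a.st_dev == st_b.st_dev && st_a.st_ino == st_b.st_ino;
}

/* Convenience wrapper: compare "pid" with the calling process. */
int in_my_mount_ns(pid_t pid)
{
    return same_mount_ns(getpid(), pid);
}
```

       For example, in_my_mount_ns(pid) is expected to return 1 for a child
       created without CLONE_NEWNS and 0 for one created with it.
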
       A new mount namespace is created using either clone(2) or unshare(2)
       with the CLONE_NEWNS flag.  When a new mount namespace is created, its
       mount list is initialized as follows:

       •  If the namespace is created using clone(2), the mount list of the
          child's namespace is a copy of the mount list in the parent
          process's mount namespace.

       •  If the namespace is created using unshare(2), the mount list of the
          new namespace is a copy of the mount list in the caller's previous
          mount namespace.

       Subsequent modifications to the mount list (mount(2) and umount(2)) in
       either mount namespace will not (by default) affect the mount list seen
       in the other namespace (but see the following discussion of shared
       subtrees).
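
       The clone(2) case can be sketched in C as follows.  This is a minimal,
       hypothetical example (run_child_in_new_mount_ns() and child_fn() are
       illustrative names): the child compares its own /proc/self/ns/mnt with
       the parent's to confirm that it is in a new mount namespace.  Creating
       a mount namespace requires CAP_SYS_ADMIN, so without that capability
       the call is expected to fail, typically with EPERM.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>

#define STACK_SIZE (1024 * 1024)

/* Runs in the child: exit status 0 if the child's mount namespace differs
   from the parent's (passed in via arg), 1 if it is the same, 2 on error. */
static int child_fn(void *arg)
{
    const struct stat *parent = arg;
    struct stat self;

    if (stat("/proc/self/ns/mnt", &self) == -1)
        return 2;
    return (self.st_dev != parent->st_dev ||
            self.st_ino != parent->st_ino) ? 0 : 1;
}

/* Returns the child's exit status, or -1 if the child could not be
   created (for example, EPERM without CAP_SYS_ADMIN). */
int run_child_in_new_mount_ns(void)
{
    struct stat parent;
    char *stack;
    pid_t pid;
    int status = 0;

    if (stat("/proc/self/ns/mnt", &parent) == -1)
        return -1;

    stack = malloc(STACK_SIZE);
    if (stack == NULL)
        return -1;

    /* The child starts with a copy of the parent's mount list.  The stack
       grows downward on most architectures, so pass the top of the region.
       Without CLONE_VM the child gets a copy of memory, so &parent stays
       valid in the child. */
    pid = clone(child_fn, stack + STACK_SIZE, CLONE_NEWNS | SIGCHLD, &parent);
    if (pid == -1 || waitpid(pid, &status, 0) == -1) {
        free(stack);
        return -1;
    }
    free(stack);
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

       A return value of 0 indicates that the child ran in a mount namespace
       distinct from the caller's.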

SHARED SUBTREES
       After the implementation of mount namespaces was completed, experience
       showed that the isolation that they provided was, in some cases, too
       great.  For example, in order to make a newly loaded optical disk
       available in all mount namespaces, a mount operation was required in
       each namespace.  For this use case, and others, the shared subtree
       feature was introduced in Linux 2.6.15.  This feature allows for
       automatic, controlled propagation of mount(2) and umount(2) events
       between namespaces (or, more precisely, between the mounts that are
       members of a peer group that are propagating events to one another).

       Each mount is marked (via mount(2)) as having one of the following
       propagation types:

       MS_SHARED
              This mount shares events with members of a peer group.  mount(2)
              and umount(2) events immediately under this mount will propagate
              to the other mounts that are members of the peer group.
              Propagation here means that the same mount(2) or umount(2) will
              automatically occur under all of the other mounts in the peer
              group.  Conversely, mount(2) and umount(2) events that take
              place under peer mounts will propagate to this mount.

       MS_PRIVATE
              This mount is private; it does not have a peer group.  mount(2)
              and umount(2) events do not propagate into or out of this mount.

       MS_SLAVE
              mount(2) and umount(2) events propagate into this mount from a
              (master) shared peer group.  mount(2) and umount(2) events under
              this mount do not propagate to any peer.

              Note that a mount can be the slave of another peer group while
              at the same time sharing mount(2) and umount(2) events with a
              peer group of which it is a member.  (More precisely, one peer
              group can be the slave of another peer group.)

       MS_UNBINDABLE
              This is like a private mount, and in addition this mount can't
              be bind mounted.  Attempts to bind mount this mount (mount(2)
              with the MS_BIND flag) will fail.

              When a recursive bind mount (mount(2) with the MS_BIND and
              MS_REC flags) is performed on a directory subtree, any
              unbindable mounts within the subtree are automatically pruned
              (i.e., not replicated) when replicating that subtree to produce
              the target subtree.

       For a discussion of the propagation type assigned to a new mount, see
       NOTES.

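       At the system-call level, each of these propagation types is requested
       by passing the corresponding flag to mount(2), optionally combined
       with MS_REC to apply it to every mount in a subtree; when only the
       propagation type is being changed, mount(2) ignores the source,
       filesystemtype, and data arguments.  The following C sketch shows that
       mapping.  The helper names propagation_flags() and set_propagation()
       are hypothetical, and the mount(2) call needs CAP_SYS_ADMIN in the
       user namespace that owns the caller's mount namespace.

```c
#include <stddef.h>
#include <string.h>
#include <sys/mount.h>

/* Map a propagation type name to the mount(2) flags that request it.
   "recursive" adds MS_REC (as in mount --make-rshared and friends).
   Returns 0 for an unknown name. */
unsigned long propagation_flags(const char *type, int recursive)
{
    unsigned long flags;

    if (strcmp(type, "shared") == 0)
        flags = MS_SHARED;
    else if (strcmp(type, "private") == 0)
        flags = MS_PRIVATE;
    else if (strcmp(type, "slave") == 0)
        flags = MS_SLAVE;
    else if (strcmp(type, "unbindable") == 0)
        flags = MS_UNBINDABLE;
    else
        return 0;

    return recursive ? (flags | MS_REC) : flags;
}

/* Change the propagation type of the mount at "target".
   Returns 0 on success, -1 on failure (errno is set by mount(2)). */
int set_propagation(const char *target, const char *type, int recursive)
{
    unsigned long flags = propagation_flags(type, recursive);

    if (flags == 0)
        return -1;                       /* unknown type name */
    return mount(NULL, target, NULL, flags, NULL);
}
```

       Under these assumptions, set_propagation("/mntS", "shared", 0)
       corresponds to the command mount --make-shared /mntS used in the
       examples below, and set_propagation("/", "private", 1) to
       mount --make-rprivate /.
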
       The propagation type is a per-mount-point setting; some mounts may be
       marked as shared (with each shared mount being a member of a distinct
       peer group), while others are private (or slaved or unbindable).

       Note that a mount's propagation type determines whether mount(2) and
       umount(2) of mounts immediately under the mount are propagated.  Thus,
       the propagation type does not affect propagation of events for
       grandchildren and further removed descendant mounts.  What happens if
       the mount itself is unmounted is determined by the propagation type
       that is in effect for the parent of the mount.

       Members are added to a peer group when a mount is marked as shared and
       either:

       (a)  the mount is replicated during the creation of a new mount
            namespace; or

       (b)  a new bind mount is created from the mount.

       In both of these cases, the new mount joins the peer group of which the
       existing mount is a member.

       A new peer group is also created when a child mount is created under
       an existing mount that is marked as shared.  In this case, the new
       child mount is also marked as shared and the resulting peer group
       consists of all the mounts that are replicated under the peers of
       parent mounts.

       A mount ceases to be a member of a peer group when either the mount is
       explicitly unmounted, or when the mount is implicitly unmounted because
       a mount namespace is removed (because it has no more member processes).

       The propagation type of the mounts in a mount namespace can be
       discovered via the "optional fields" exposed in /proc/pid/mountinfo.
       (See proc(5) for details of this file.)  The following tags can appear
       in the optional fields for a record in that file:

       shared:X
              This mount is shared in peer group X.  Each peer group has a
              unique ID that is automatically generated by the kernel, and all
              mounts in the same peer group will show the same ID.  (These IDs
              are assigned starting from the value 1, and may be recycled when
              a peer group ceases to have any members.)

       master:X
              This mount is a slave to shared peer group X.

       propagate_from:X (since Linux 2.6.26)
              This mount is a slave and receives propagation from shared peer
              group X.  This tag will always appear in conjunction with a
              master:X tag.  Here, X is the closest dominant peer group under
              the process's root directory.  If X is the immediate master of
              the mount, or if there is no dominant peer group under the same
              root, then only the master:X field is present and not the
              propagate_from:X field.  For further details, see below.

       unbindable
              This is an unbindable mount.

       If none of the above tags is present, then this is a private mount.

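       These tags can also be decoded programmatically.  The following C
       sketch parses one /proc/pid/mountinfo record, skipping the six fixed
       fields that precede the optional fields and stopping at the " - "
       separator that follows them (see proc(5)).  The structure and function
       names are hypothetical; a peer group ID of 0 stands for "tag absent",
       which is safe because the kernel assigns IDs starting from 1.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct propagation {
    int shared;          /* X from shared:X, or 0 */
    int master;          /* X from master:X, or 0 */
    int propagate_from;  /* X from propagate_from:X, or 0 */
    int unbindable;      /* 1 if the "unbindable" tag is present */
};

/* Fill *p from one mountinfo record.  Returns 0 on success, or -1 if the
   " - " separator is missing (for example, on a truncated line). */
int parse_propagation(const char *line, struct propagation *p)
{
    char buf[4096];
    int field = 0;

    assert(line != NULL && p != NULL);
    memset(p, 0, sizeof(*p));
    snprintf(buf, sizeof(buf), "%s", line);

    /* Fields are separated by spaces; spaces inside pathnames are escaped
       as \040, so splitting on spaces is safe. */
    for (char *tok = strtok(buf, " \n"); tok != NULL;
         tok = strtok(NULL, " \n")) {
        if (++field <= 6)
            continue;                      /* fixed fields (1) to (6) */
        if (strcmp(tok, "-") == 0)
            return 0;                      /* end of the optional fields */
        if (strncmp(tok, "shared:", 7) == 0)
            p->shared = atoi(tok + 7);
        else if (strncmp(tok, "master:", 7) == 0)
            p->master = atoi(tok + 7);
        else if (strncmp(tok, "propagate_from:", 15) == 0)
            p->propagate_from = atoi(tok + 15);
        else if (strcmp(tok, "unbindable") == 0)
            p->unbindable = 1;
    }
    return -1;
}

/* A record with none of the tags describes a private mount. */
int is_private(const struct propagation *p)
{
    return p->shared == 0 && p->master == 0 && p->unbindable == 0;
}
```

       For a record carrying master:105 propagate_from:102, such as the one
       shown later in this page, the sketch reports master 105 and
       propagate_from 102; a record with no tags is classified as private by
       is_private().
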
   MS_SHARED and MS_PRIVATE example
       Suppose that on a terminal in the initial mount namespace, we mark one
       mount as shared and another as private, and then view the mounts in
       /proc/self/mountinfo:

           sh1# mount --make-shared /mntS
           sh1# mount --make-private /mntP
           sh1# cat /proc/self/mountinfo | grep '/mnt' | sed 's/ - .*//'
           77 61 8:17 / /mntS rw,relatime shared:1
           83 61 8:15 / /mntP rw,relatime

       From the /proc/self/mountinfo output, we see that /mntS is a shared
       mount in peer group 1, and that /mntP has no optional tags, indicating
       that it is a private mount.  The first two fields in each record in
       this file are the unique ID for this mount, and the mount ID of the
       parent mount.  We can further inspect this file to see that the parent
       mount of /mntS and /mntP is the root directory, /, which is mounted as
       private:

           sh1# cat /proc/self/mountinfo | awk '$1 == 61' | sed 's/ - .*//'
           61 0 8:2 / / rw,relatime

       On a second terminal, we create a new mount namespace where we run a
       second shell and inspect the mounts:

           $ PS1='sh2# ' sudo unshare -m --propagation unchanged sh
           sh2# cat /proc/self/mountinfo | grep '/mnt' | sed 's/ - .*//'
           222 145 8:17 / /mntS rw,relatime shared:1
           225 145 8:15 / /mntP rw,relatime

       The new mount namespace received a copy of the initial mount
       namespace's mounts.  These new mounts maintain the same propagation
       types, but have unique mount IDs.  (The --propagation unchanged option
       prevents unshare(1) from marking all mounts as private when creating a
       new mount namespace, which it does by default.)

       In the second terminal, we then create submounts under each of /mntS
       and /mntP and inspect the set-up:

           sh2# mkdir /mntS/a
           sh2# mount /dev/sdb6 /mntS/a
           sh2# mkdir /mntP/b
           sh2# mount /dev/sdb7 /mntP/b
           sh2# cat /proc/self/mountinfo | grep '/mnt' | sed 's/ - .*//'
           222 145 8:17 / /mntS rw,relatime shared:1
           225 145 8:15 / /mntP rw,relatime
           178 222 8:22 / /mntS/a rw,relatime shared:2
           230 225 8:23 / /mntP/b rw,relatime

       From the above, it can be seen that /mntS/a was created as shared
       (inheriting this setting from its parent mount) and /mntP/b was created
       as a private mount.

       Returning to the first terminal and inspecting the set-up, we see that
       the new mount created under the shared mount /mntS propagated to its
       peer mount (in the initial mount namespace), but the new mount created
       under the private mount /mntP did not propagate:

           sh1# cat /proc/self/mountinfo | grep '/mnt' | sed 's/ - .*//'
           77 61 8:17 / /mntS rw,relatime shared:1
           83 61 8:15 / /mntP rw,relatime
           179 77 8:22 / /mntS/a rw,relatime shared:2

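       The awk(1) step in this example, which finds the parent of a mount by
       its ID, can also be reproduced in C.  The following sketch (all names
       are hypothetical) reads fields 1, 2, and 5 (mount ID, parent ID, and
       mount point) of each record in the text of a mountinfo file and
       reports the mount point of a given mount's parent.  If several mounts
       are stacked on the same mount point, the last record wins.

```c
#include <stdio.h>
#include <string.h>

/* Copy the next line of *s into buf (truncating overlong lines) and
   advance *s past it.  Returns 0 once the input is exhausted. */
static int next_line(const char **s, char *buf, size_t buflen)
{
    const char *start = *s;
    size_t n;

    if (*start == '\0')
        return 0;
    n = strcspn(start, "\n");
    *s = (start[n] == '\n') ? start + n + 1 : start + n;
    if (n >= buflen)
        n = buflen - 1;
    memcpy(buf, start, n);
    buf[n] = '\0';
    return 1;
}

/* Extract fields 1 (mount ID), 2 (parent ID), and 5 (mount point). */
static int parse_ids(const char *line, int *id, int *parent, char *mp)
{
    return sscanf(line, "%d %d %*s %*s %255s", id, parent, mp) == 3;
}

/* Look up the mount point of the parent of the mount at mount_point.
   Returns 0 and fills out on success, -1 if either record is missing. */
int parent_mount_point(const char *mountinfo, const char *mount_point,
                       char *out, size_t outlen)
{
    char line[1024], mp[256];
    const char *s;
    int id, parent, wanted = -1, found = 0;

    /* Pass 1: the parent ID of the (topmost) mount at mount_point. */
    for (s = mountinfo; next_line(&s, line, sizeof(line)); )
        if (parse_ids(line, &id, &parent, mp) && strcmp(mp, mount_point) == 0)
            wanted = parent;
    if (wanted == -1)
        return -1;

    /* Pass 2: the record whose mount ID equals that parent ID. */
    for (s = mountinfo; next_line(&s, line, sizeof(line)); )
        if (parse_ids(line, &id, &parent, mp) && id == wanted) {
            snprintf(out, outlen, "%s", mp);
            found = 1;
        }
    return found ? 0 : -1;
}
```

       Given the root record 61 0 8:2 / / together with the records shown in
       the first terminal, it reports / as the parent of /mntP and /mntS as
       the parent of /mntS/a.
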
   MS_SLAVE example
       Making a mount a slave allows it to receive propagated mount(2) and
       umount(2) events from a master shared peer group, while preventing it
       from propagating events to that master.  This is useful if we want to
       (say) receive a mount event when an optical disk is mounted in the
       master shared peer group (in another mount namespace), but want to
       prevent mount(2) and umount(2) events under the slave mount from having
       side effects in other namespaces.

       We can demonstrate the effect of slaving by first marking two mounts as
       shared in the initial mount namespace:

           sh1# mount --make-shared /mntX
           sh1# mount --make-shared /mntY
           sh1# cat /proc/self/mountinfo | grep '/mnt' | sed 's/ - .*//'
           132 83 8:23 / /mntX rw,relatime shared:1
           133 83 8:22 / /mntY rw,relatime shared:2

       On a second terminal, we create a new mount namespace and inspect the
       mounts:

           sh2# unshare -m --propagation unchanged sh
           sh2# cat /proc/self/mountinfo | grep '/mnt' | sed 's/ - .*//'
           168 167 8:23 / /mntX rw,relatime shared:1
           169 167 8:22 / /mntY rw,relatime shared:2

       In the new mount namespace, we then mark one of the mounts as a slave:

           sh2# mount --make-slave /mntY
           sh2# cat /proc/self/mountinfo | grep '/mnt' | sed 's/ - .*//'
           168 167 8:23 / /mntX rw,relatime shared:1
           169 167 8:22 / /mntY rw,relatime master:2

       From the above output, we see that /mntY is now a slave mount that is
       receiving propagation events from the shared peer group with the ID 2.

       Continuing in the new namespace, we create submounts under each of
       /mntX and /mntY:

           sh2# mkdir /mntX/a
           sh2# mount /dev/sda3 /mntX/a
           sh2# mkdir /mntY/b
           sh2# mount /dev/sda5 /mntY/b

       When we inspect the state of the mounts in the new mount namespace, we
       see that /mntX/a was created as a new shared mount (inheriting the
       "shared" setting from its parent mount) and /mntY/b was created as a
       private mount:

           sh2# cat /proc/self/mountinfo | grep '/mnt' | sed 's/ - .*//'
           168 167 8:23 / /mntX rw,relatime shared:1
           169 167 8:22 / /mntY rw,relatime master:2
           173 168 8:3 / /mntX/a rw,relatime shared:3
           175 169 8:5 / /mntY/b rw,relatime

       Returning to the first terminal (in the initial mount namespace), we
       see that the mount /mntX/a propagated to the peer (the shared /mntX),
       but the mount /mntY/b was not propagated:

           sh1# cat /proc/self/mountinfo | grep '/mnt' | sed 's/ - .*//'
           132 83 8:23 / /mntX rw,relatime shared:1
           133 83 8:22 / /mntY rw,relatime shared:2
           174 132 8:3 / /mntX/a rw,relatime shared:3

       Now we create a new mount under /mntY in the first shell:

           sh1# mkdir /mntY/c
           sh1# mount /dev/sda1 /mntY/c
           sh1# cat /proc/self/mountinfo | grep '/mnt' | sed 's/ - .*//'
           132 83 8:23 / /mntX rw,relatime shared:1
           133 83 8:22 / /mntY rw,relatime shared:2
           174 132 8:3 / /mntX/a rw,relatime shared:3
           178 133 8:1 / /mntY/c rw,relatime shared:4

       When we examine the mounts in the second mount namespace, we see that
       in this case the new mount has been propagated to the slave mount, and
       that the new mount is itself a slave mount (to peer group 4):

           sh2# cat /proc/self/mountinfo | grep '/mnt' | sed 's/ - .*//'
           168 167 8:23 / /mntX rw,relatime shared:1
           169 167 8:22 / /mntY rw,relatime master:2
           173 168 8:3 / /mntX/a rw,relatime shared:3
           175 169 8:5 / /mntY/b rw,relatime
           179 169 8:1 / /mntY/c rw,relatime master:4

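       The propagation behavior exercised in this example can be captured in
       a small model.  The following C sketch is a deliberate simplification,
       not the kernel's algorithm, and its names are hypothetical: each mount
       carries the shared:X and master:X values seen in mountinfo, and an
       event under a mount reaches every member of that mount's peer group
       and, transitively, the slaves of that group (including slaves that are
       themselves shared), but never flows from a slave back to its master.

```c
#include <assert.h>

#define MAX_MOUNTS 64

/* Simplified mount: peer group IDs as shown in /proc/pid/mountinfo,
   with 0 meaning that the tag is absent. */
struct mnt {
    int id;      /* mount ID */
    int shared;  /* X from shared:X, or 0 */
    int master;  /* X from master:X, or 0 */
};

static int contains(const int *a, int n, int v)
{
    for (int i = 0; i < n; i++)
        if (a[i] == v)
            return 1;
    return 0;
}

/* Store in out[] (room for n IDs) the IDs of the mounts that see an event
   occurring under mnts[origin], origin first, and return how many. */
int propagation_targets(const struct mnt *mnts, int n, int origin, int *out)
{
    int groups[MAX_MOUNTS], ngroups = 0, nout = 0;
    int seen[MAX_MOUNTS] = {0};

    assert(n <= MAX_MOUNTS && origin >= 0 && origin < n);

    out[nout++] = mnts[origin].id;
    seen[origin] = 1;
    if (mnts[origin].shared == 0)
        return nout;              /* private or pure slave: stays local */

    groups[ngroups++] = mnts[origin].shared;
    for (int g = 0; g < ngroups; g++) {
        for (int i = 0; i < n; i++) {
            if (seen[i])
                continue;
            /* Peers in group g, and slaves whose master is group g. */
            if (mnts[i].shared == groups[g] || mnts[i].master == groups[g]) {
                seen[i] = 1;
                out[nout++] = mnts[i].id;
                /* A slave that is also shared forwards to its own group. */
                if (mnts[i].shared != 0 &&
                    !contains(groups, ngroups, mnts[i].shared))
                    groups[ngroups++] = mnts[i].shared;
            }
        }
    }
    return nout;
}
```

       With /mntX and /mntY from both namespaces, an event under /mntX in the
       new namespace (ID 168) reaches 168 and 132, an event under the slave
       /mntY (ID 169) stays local, and an event under /mntY in the initial
       namespace (ID 133) reaches 133 and 169, in line with the session
       above.
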
   MS_UNBINDABLE example
       One of the primary purposes of unbindable mounts is to avoid the "mount
       explosion" problem when repeatedly performing bind mounts of a
       higher-level subtree at a lower-level mount.  The problem is
       illustrated by the following shell session.

       Suppose we have a system with the following mounts:

           # mount | awk '{print $1, $2, $3}'
           /dev/sda1 on /
           /dev/sdb6 on /mntX
           /dev/sdb7 on /mntY

       Suppose furthermore that we wish to recursively bind mount the root
       directory under several users' home directories.  We do this for the
       first user, and inspect the mounts:

           # mount --rbind / /home/cecilia/
           # mount | awk '{print $1, $2, $3}'
           /dev/sda1 on /
           /dev/sdb6 on /mntX
           /dev/sdb7 on /mntY
           /dev/sda1 on /home/cecilia
           /dev/sdb6 on /home/cecilia/mntX
           /dev/sdb7 on /home/cecilia/mntY

       When we repeat this operation for the second user, we start to see the
       explosion problem:

           # mount --rbind / /home/henry
           # mount | awk '{print $1, $2, $3}'
           /dev/sda1 on /
           /dev/sdb6 on /mntX
           /dev/sdb7 on /mntY
           /dev/sda1 on /home/cecilia
           /dev/sdb6 on /home/cecilia/mntX
           /dev/sdb7 on /home/cecilia/mntY
           /dev/sda1 on /home/henry
           /dev/sdb6 on /home/henry/mntX
           /dev/sdb7 on /home/henry/mntY
           /dev/sda1 on /home/henry/home/cecilia
           /dev/sdb6 on /home/henry/home/cecilia/mntX
           /dev/sdb7 on /home/henry/home/cecilia/mntY

       Under /home/henry, we have not only recursively added the /mntX and
       /mntY mounts, but also the recursive mounts of those directories under
       /home/cecilia that were created in the previous step.  Upon repeating
       the step for a third user, it becomes obvious that the explosion is
       exponential in nature:

           # mount --rbind / /home/otto
           # mount | awk '{print $1, $2, $3}'
           /dev/sda1 on /
           /dev/sdb6 on /mntX
           /dev/sdb7 on /mntY
           /dev/sda1 on /home/cecilia
           /dev/sdb6 on /home/cecilia/mntX
           /dev/sdb7 on /home/cecilia/mntY
           /dev/sda1 on /home/henry
           /dev/sdb6 on /home/henry/mntX
           /dev/sdb7 on /home/henry/mntY
           /dev/sda1 on /home/henry/home/cecilia
           /dev/sdb6 on /home/henry/home/cecilia/mntX
           /dev/sdb7 on /home/henry/home/cecilia/mntY
           /dev/sda1 on /home/otto
           /dev/sdb6 on /home/otto/mntX
           /dev/sdb7 on /home/otto/mntY
           /dev/sda1 on /home/otto/home/cecilia
           /dev/sdb6 on /home/otto/home/cecilia/mntX
           /dev/sdb7 on /home/otto/home/cecilia/mntY
           /dev/sda1 on /home/otto/home/henry
           /dev/sdb6 on /home/otto/home/henry/mntX
           /dev/sdb7 on /home/otto/home/henry/mntY
           /dev/sda1 on /home/otto/home/henry/home/cecilia
           /dev/sdb6 on /home/otto/home/henry/home/cecilia/mntX
           /dev/sdb7 on /home/otto/home/henry/home/cecilia/mntY

       The mount explosion problem in the above scenario can be avoided by
       making each of the new mounts unbindable.  The effect of doing this is
       that recursive mounts of the root directory will not replicate the
       unbindable mounts.  We make such a mount for the first user:

           # mount --rbind --make-unbindable / /home/cecilia

       Before going further, we show that unbindable mounts are indeed
       unbindable:

           # mkdir /mntZ
           # mount --bind /home/cecilia /mntZ
           mount: wrong fs type, bad option, bad superblock on /home/cecilia,
                  missing codepage or helper program, or other error

                  In some cases useful info is found in syslog - try
                  dmesg | tail or so.

       Now we create unbindable recursive bind mounts for the other two users:

           # mount --rbind --make-unbindable / /home/henry
           # mount --rbind --make-unbindable / /home/otto

       Upon examining the list of mounts, we see there has been no explosion
       of mounts, because the unbindable mounts were not replicated under each
       user's directory:

           # mount | awk '{print $1, $2, $3}'
           /dev/sda1 on /
           /dev/sdb6 on /mntX
           /dev/sdb7 on /mntY
           /dev/sda1 on /home/cecilia
           /dev/sdb6 on /home/cecilia/mntX
           /dev/sdb7 on /home/cecilia/mntY
           /dev/sda1 on /home/henry
           /dev/sdb6 on /home/henry/mntX
           /dev/sdb7 on /home/henry/mntY
           /dev/sda1 on /home/otto
           /dev/sdb6 on /home/otto/mntX
           /dev/sdb7 on /home/otto/mntY

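       The arithmetic behind the explosion can be made explicit.  In the
       following simplified C model (count_mounts() is a hypothetical name),
       each mount --rbind of / replicates every mount that already exists,
       including earlier copies, so the total doubles with each user; once the
       copies are unbindable they are pruned, and each user adds only the
       original mounts.  Starting from the three mounts above, three users
       give 24 mounts in the first case and 12 in the second, matching the
       listings.

```c
/* "initial" mounts exist before any recursive bind of / is made;
   "users" is the number of mount --rbind / /home/<user> operations;
   "unbindable" says whether each new copy is made unbindable. */
unsigned long count_mounts(unsigned long initial, unsigned int users,
                           int unbindable)
{
    unsigned long total = initial;

    for (unsigned int u = 0; u < users; u++) {
        if (unbindable)
            total += initial;   /* earlier (unbindable) copies are pruned */
        else
            total += total;     /* every existing mount, copies included */
    }
    return total;
}
```
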
   Propagation type transitions
       The following table shows the effect that applying a new propagation
       type (i.e., mount --make-xxxx) has on the existing propagation type of
       a mount.  The rows correspond to existing propagation types, and the
       columns are the new propagation settings.  For reasons of space,
       "private" is abbreviated as "priv" and "unbindable" as "unbind".

                     make-shared   make-slave      make-priv  make-unbind
       ─────────────┬───────────────────────────────────────────────────────
       shared       │shared        slave/priv [1]  priv       unbind
       slave        │slave+shared  slave [2]       priv       unbind
       slave+shared │slave+shared  slave           priv       unbind
       private      │shared        priv [2]        priv       unbind
       unbindable   │shared        unbind [2]      priv       unbind

       Note the following details about the table:

       [1] If a shared mount is the only mount in its peer group, making it a
           slave automatically makes it private.

       [2] Slaving a nonshared mount has no effect on the mount.

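       The table lends itself to a lookup array.  The following C sketch
       (enumerator and function names are hypothetical) encodes it directly,
       resolving the "slave/priv [1]" cell with an extra argument that says
       whether the shared mount is the only member of its peer group; the "no
       effect" cases marked [2] simply map a type to itself.

```c
#include <assert.h>

enum ptype { PT_SHARED, PT_SLAVE, PT_SLAVE_SHARED, PT_PRIVATE,
             PT_UNBINDABLE, PT_COUNT };
enum op { OP_MAKE_SHARED, OP_MAKE_SLAVE, OP_MAKE_PRIVATE,
          OP_MAKE_UNBINDABLE, OP_COUNT };

/* Rows: existing type; columns: requested change (as in the table). */
static const enum ptype transitions[PT_COUNT][OP_COUNT] = {
    [PT_SHARED]       = { PT_SHARED, PT_SLAVE, PT_PRIVATE, PT_UNBINDABLE },
    [PT_SLAVE]        = { PT_SLAVE_SHARED, PT_SLAVE, PT_PRIVATE,
                          PT_UNBINDABLE },
    [PT_SLAVE_SHARED] = { PT_SLAVE_SHARED, PT_SLAVE, PT_PRIVATE,
                          PT_UNBINDABLE },
    [PT_PRIVATE]      = { PT_SHARED, PT_PRIVATE, PT_PRIVATE, PT_UNBINDABLE },
    [PT_UNBINDABLE]   = { PT_SHARED, PT_UNBINDABLE, PT_PRIVATE,
                          PT_UNBINDABLE },
};

/* New propagation type after applying "op" to a mount of type "cur";
   only_member says whether a shared mount is alone in its peer group. */
enum ptype transition(enum ptype cur, enum op op, int only_member)
{
    assert(cur < PT_COUNT && op < OP_COUNT);
    /* Note [1]: a lone shared mount has no peer to become a slave of. */
    if (cur == PT_SHARED && op == OP_MAKE_SLAVE && only_member)
        return PT_PRIVATE;
    return transitions[cur][op];
}
```
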
   Bind (MS_BIND) semantics
       Suppose that the following command is performed:

           mount --bind A/a B/b

       Here, A is the source mount, B is the destination mount, a is a
       subdirectory path under the mount point A, and b is a subdirectory path
       under the mount point B.  The propagation type of the resulting mount,
       B/b, depends on the propagation types of the mounts A and B, and is
       summarized in the following table.

                                  source(A)
                          shared  private    slave         unbind
       ──────────────────┬──────────────────────────────────────────
       dest(B)  shared   │shared  shared     slave+shared  invalid
                nonshared│shared  private    slave         invalid

       Note that a recursive bind of a subtree follows the same semantics as
       for a bind operation on each mount in the subtree.  (Unbindable mounts
       are automatically pruned at the target mount point.)

       For further details, see Documentation/filesystems/sharedsubtree.rst in
       the kernel source tree.

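       The same table can be written as a C function.  This is a hypothetical
       sketch with illustrative names, not a kernel interface.  Because a
       plain mount follows the bind rules with a private source (see "Mount
       semantics" below), bind_result(SRC_PRIVATE, dest_shared) also gives the
       type of a brand-new mount created under B.

```c
enum src_type { SRC_SHARED, SRC_PRIVATE, SRC_SLAVE, SRC_UNBINDABLE };
enum bind_type { B_SHARED, B_PRIVATE, B_SLAVE, B_SLAVE_SHARED, B_INVALID };

/* Propagation type of B/b after "mount --bind A/a B/b", given the type of
   the source mount A and whether the destination mount B is shared. */
enum bind_type bind_result(enum src_type a, int dest_shared)
{
    switch (a) {
    case SRC_SHARED:
        return B_SHARED;
    case SRC_PRIVATE:
        return dest_shared ? B_SHARED : B_PRIVATE;
    case SRC_SLAVE:
        return dest_shared ? B_SLAVE_SHARED : B_SLAVE;
    case SRC_UNBINDABLE:
    default:
        return B_INVALID;   /* unbindable mounts cannot be bind mounted */
    }
}
```
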
   Move (MS_MOVE) semantics
       Suppose that the following command is performed:

           mount --move A B/b

       Here, A is the source mount, B is the destination mount, and b is a
       subdirectory path under the mount point B.  The propagation type of the
       resulting mount, B/b, depends on the propagation types of the mounts A
       and B, and is summarized in the following table.

                                  source(A)
                          shared  private    slave         unbind
       ──────────────────┬─────────────────────────────────────────────
       dest(B)  shared   │shared  shared     slave+shared  invalid
                nonshared│shared  private    slave         unbindable

       Note: moving a mount that resides under a shared mount is invalid.

       For further details, see Documentation/filesystems/sharedsubtree.rst in
       the kernel source tree.

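       At the system-call level, mount --move corresponds to mount(2) with the
       MS_MOVE flag, for which the filesystemtype and data arguments are
       ignored.  The combinations marked invalid above, such as moving a mount
       whose parent is shared, are rejected by the kernel; mount(2) lists
       EINVAL for these cases.  The following minimal C sketch
       (move_subtree() is a hypothetical name) returns 0 or the errno value.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/mount.h>

/* Move the mount at "from" to "to", like mount --move from to.
   Returns 0 on success or an errno value: EINVAL is expected for the
   combinations marked invalid above, EPERM without CAP_SYS_ADMIN. */
int move_subtree(const char *from, const char *to)
{
    /* filesystemtype and data are ignored for MS_MOVE. */
    if (mount(from, to, NULL, MS_MOVE, NULL) == -1)
        return errno;
    return 0;
}
```
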
   Mount semantics
       Suppose that we use the following command to create a mount:

           mount device B/b

       Here, B is the destination mount, and b is a subdirectory path under
       the mount point B.  The propagation type of the resulting mount, B/b,
       follows the same rules as for a bind mount, where the propagation type
       of the source mount is considered always to be private.

   Unmount semantics
       Suppose that we use the following command to tear down a mount:

           umount A

       Here, A is a mount on B/b, where B is the parent mount and b is a
       subdirectory path under the mount point B.  If B is shared, then all
       most-recently-mounted mounts at b on mounts that receive propagation
       from mount B and do not have submounts under them are unmounted.

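       The rule can be restated as a filter over the mounts that receive
       propagation from B.  In the following hypothetical C model, each such
       mount records whether something is mounted at b on it and whether the
       most recent mount there has submounts; the function counts how many of
       those most recent mounts the propagated unmount removes.

```c
/* One mount that receives propagation from the parent mount B. */
struct receiver {
    int mounted_at_b;        /* 1 if some mount sits at path b on it */
    int top_has_submounts;   /* 1 if the most recent mount at b has any */
};

/* Number of mounts removed, besides A itself, by "umount A" when A is
   mounted at B/b.  Nothing propagates unless B is shared. */
int propagated_unmounts(const struct receiver *r, int n, int b_shared)
{
    int count = 0;

    if (!b_shared)
        return 0;
    for (int i = 0; i < n; i++)
        if (r[i].mounted_at_b && !r[i].top_has_submounts)
            count++;
    return count;
}
```
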
   The /proc/pid/mountinfo propagate_from tag
       The propagate_from:X tag is shown in the optional fields of a
       /proc/pid/mountinfo record in cases where a process can't see a slave's
       immediate master (i.e., the pathname of the master is not reachable
       from the filesystem root directory) and so cannot determine the chain
       of propagation between the mounts it can see.

       In the following example, we first create a two-link master-slave chain
       between the mounts /mnt, /tmp/etc, and /mnt/tmp/etc.  Then the
       chroot(1) command is used to make the /tmp/etc mount point unreachable
       from the root directory, creating a situation where the master of
       /mnt/tmp/etc is not reachable from the (new) root directory of the
       process.

       First, we bind mount the root directory onto /mnt and then bind mount
       /proc at /mnt/proc so that after the later chroot(1) the proc(5)
       filesystem remains visible at the correct location in the chroot-ed
       environment.

           # mkdir -p /mnt/proc
           # mount --bind / /mnt
           # mount --bind /proc /mnt/proc

       Next, we ensure that the /mnt mount is a shared mount in a new peer
       group (with no peers):

           # mount --make-private /mnt  # Isolate from any previous peer group
           # mount --make-shared /mnt
           # cat /proc/self/mountinfo | grep '/mnt' | sed 's/ - .*//'
           239 61 8:2 / /mnt ... shared:102
           248 239 0:4 / /mnt/proc ... shared:5

       Next, we bind mount /mnt/etc onto /tmp/etc:

           # mkdir -p /tmp/etc
           # mount --bind /mnt/etc /tmp/etc
           # cat /proc/self/mountinfo | egrep '/mnt|/tmp/' | sed 's/ - .*//'
           239 61 8:2 / /mnt ... shared:102
           248 239 0:4 / /mnt/proc ... shared:5
           267 40 8:2 /etc /tmp/etc ... shared:102

       Initially, these two mounts are in the same peer group, but we then
       make /tmp/etc a slave of /mnt/etc, and then make /tmp/etc shared as
       well, so that it can propagate events to the next slave in the chain:

           # mount --make-slave /tmp/etc
           # mount --make-shared /tmp/etc
           # cat /proc/self/mountinfo | egrep '/mnt|/tmp/' | sed 's/ - .*//'
           239 61 8:2 / /mnt ... shared:102
           248 239 0:4 / /mnt/proc ... shared:5
           267 40 8:2 /etc /tmp/etc ... shared:105 master:102

       Then we bind mount /tmp/etc onto /mnt/tmp/etc.  Again, the two mounts
       are initially in the same peer group, but we then make /mnt/tmp/etc a
       slave of /tmp/etc:

           # mkdir -p /mnt/tmp/etc
           # mount --bind /tmp/etc /mnt/tmp/etc
           # mount --make-slave /mnt/tmp/etc
           # cat /proc/self/mountinfo | egrep '/mnt|/tmp/' | sed 's/ - .*//'
           239 61 8:2 / /mnt ... shared:102
           248 239 0:4 / /mnt/proc ... shared:5
           267 40 8:2 /etc /tmp/etc ... shared:105 master:102
           273 239 8:2 /etc /mnt/tmp/etc ... master:105

       From the above, we see that /mnt is the master of the slave /tmp/etc,
       which in turn is the master of the slave /mnt/tmp/etc.

       We then chroot(1) to the /mnt directory, which renders the mount with
       ID 267 unreachable from the (new) root directory:

           # chroot /mnt

       When we examine the state of the mounts inside the chroot-ed
       environment, we see the following:

           # cat /proc/self/mountinfo | sed 's/ - .*//'
           239 61 8:2 / / ... shared:102
           248 239 0:4 / /proc ... shared:5
           273 239 8:2 /etc /tmp/etc ... master:105 propagate_from:102

       Above, we see that the mount with ID 273 is a slave whose master is the
       peer group 105.  The mount point for that master is unreachable, and so
       a propagate_from tag is displayed, indicating that the closest dominant
       peer group (i.e., the nearest reachable mount in the slave chain) is
       the peer group with the ID 102 (corresponding to the /mnt mount point
       before the chroot(1) was performed).
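
       The choice of X can be modeled in a few lines.  The following C sketch
       is a simplification with hypothetical structure and function names:
       starting from the slave's immediate master, it walks up the chain of
       master peer groups and returns the first one that has a member
       reachable from the process's root directory, and it returns 0 (no
       propagate_from tag) when that group is the immediate master or when no
       group in the chain is reachable.

```c
#include <stddef.h>

/* Simplified peer group: its ID, the ID of the group it is a slave of
   (0 if none), and whether one of its members in the caller's mount
   namespace is reachable from the caller's root directory. */
struct peer_group {
    int id;
    int master;
    int reachable;
};

static const struct peer_group *find_group(const struct peer_group *g,
                                           int n, int id)
{
    for (int i = 0; i < n; i++)
        if (g[i].id == id)
            return &g[i];
    return NULL;
}

/* For a slave mount whose immediate master is peer group "master", return
   the X that would appear in propagate_from:X, or 0 if the tag would be
   omitted. */
int propagate_from(const struct peer_group *groups, int n, int master)
{
    const struct peer_group *g = find_group(groups, n, master);

    /* Walk up the master chain; the step limit guards against bad input. */
    for (int steps = 0; g != NULL && steps < n; steps++) {
        if (g->reachable)
            return g->id == master ? 0 : g->id;
        g = find_group(groups, n, g->master);
    }
    return 0;
}
```

       For the chroot-ed process in this example, group 105 is unreachable and
       its master, group 102, is reachable, so the model yields 102, matching
       the propagate_from:102 tag.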

STANDARDS
       Linux.

HISTORY
       Linux 2.4.19.

NOTES
       The propagation type assigned to a new mount depends on the propagation
       type of the parent mount.  If the mount has a parent (i.e., it is a
       non-root mount point) and the propagation type of the parent is
       MS_SHARED, then the propagation type of the new mount is also
       MS_SHARED.  Otherwise, the propagation type of the new mount is
       MS_PRIVATE.

       Notwithstanding the fact that the default propagation type for a new
       mount is in many cases MS_PRIVATE, MS_SHARED is typically more useful.
       For this reason, systemd(1) automatically remounts all mounts as
       MS_SHARED on system startup.  Thus, on most modern systems, the default
       propagation type is in practice MS_SHARED.

       When unshare(1) is used to create a mount namespace, the goal is
       commonly to provide full isolation of the mounts in the new namespace.
       For this reason, unshare(1) (since util-linux 2.27) reverses the step
       performed by systemd(1) by making all mounts private in the new
       namespace.  That is, unshare(1) performs the equivalent of the
       following in the new mount namespace:

           mount --make-rprivate /

       To prevent this, one can use the --propagation unchanged option to
       unshare(1).

       An application that creates a new mount namespace directly using
       clone(2) or unshare(2) may desire to prevent propagation of mount
       events to other mount namespaces (as is done by unshare(1)).  This can
       be done by changing the propagation type of mounts in the new namespace
       to either MS_SLAVE or MS_PRIVATE, using a call such as the following:

           mount(NULL, "/", NULL, MS_SLAVE | MS_REC, NULL);

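       Putting the two steps together, a program might create and detach a
       mount namespace as in the following minimal C sketch.  The helper name
       isolate_mount_ns() is hypothetical; both calls require CAP_SYS_ADMIN,
       and on failure the function returns -1 with errno set.  Passing
       MS_PRIVATE instead of MS_SLAVE mirrors the default behavior of
       unshare(1) described above.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stddef.h>
#include <sys/mount.h>

/* Move the caller into a new mount namespace and change the propagation
   type of every mount in it (MS_REC).  With MS_SLAVE the namespace still
   receives events from its former peers; with MS_PRIVATE it neither sends
   nor receives them.  Returns 0 on success, -1 with errno set. */
int isolate_mount_ns(unsigned long propagation)
{
    if (unshare(CLONE_NEWNS) == -1)
        return -1;
    if (mount(NULL, "/", NULL, propagation | MS_REC, NULL) == -1)
        return -1;
    return 0;
}
```

       A caller would typically invoke isolate_mount_ns(MS_SLAVE) early,
       before creating any mounts that are meant to stay local to the new
       namespace.
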
       For a discussion of propagation types when moving mounts (MS_MOVE) and
       creating bind mounts (MS_BIND), see
       Documentation/filesystems/sharedsubtree.rst.

   Restrictions on mount namespaces
       Note the following points with respect to mount namespaces:

       [1] Each mount namespace has an owner user namespace.  As explained
           above, when a new mount namespace is created, its mount list is
           initialized as a copy of the mount list of another mount namespace.
           If the new namespace and the namespace from which the mount list
           was copied are owned by different user namespaces, then the new
           mount namespace is considered less privileged.

       [2] When creating a less privileged mount namespace, shared mounts are
           reduced to slave mounts.  This ensures that mount events occurring
           in less privileged mount namespaces will not propagate to more
           privileged mount namespaces.

650       [3] Mounts  that  come  as  a  single unit from a more privileged mount
651           namespace are locked together and may not be separated  in  a  less
652           privileged  mount namespace.  (The unshare(2) CLONE_NEWNS operation
653           brings across all of the mounts from the original  mount  namespace
654           as a single unit, and recursive mounts that propagate between mount
655           namespaces propagate as a single unit.)
656
657           In this context, "may not be separated" means that the  mounts  are
658           locked  so  that  they may not be individually unmounted.  Consider
659           the following example:
660
661               $ sudo sh
662               # mount --bind /dev/null /etc/shadow
663               # cat /etc/shadow       # Produces no output
664
665           The above steps, performed in a more  privileged  mount  namespace,
666           have  created a bind mount that obscures the contents of the shadow
667           password file, /etc/shadow.  For security reasons, it should not be
668           possible  to  umount(2) that mount in a less privileged mount name‐
669           space, since that would reveal the contents of /etc/shadow.
670
671           Suppose we now create a new mount namespace owned  by  a  new  user
672           namespace.   The  new mount namespace will inherit copies of all of
673           the mounts from  the  previous  mount  namespace.   However,  those
674           mounts will be locked because the new mount namespace is less priv‐
675           ileged.  Consequently, an attempt to umount(2) the mount  fails  as
676           shown in the following step:
677
678               # unshare --user --map-root-user --mount \
679                              strace -o /tmp/log \
680                              umount /etc/shadow
681               umount: /etc/shadow: not mounted.
682               # grep '^umount' /tmp/log
683               umount2("/etc/shadow", 0)     = -1 EINVAL (Invalid argument)
684
685           The error message from umount(8) is a little confusing, but the
686           strace(1) output reveals that the underlying umount2(2) system call
687           failed  with  the  error EINVAL, which is the error that the kernel
688           returns to indicate that the mount is locked.
689
690           Note, however, that it is possible to stack (and unstack)  a  mount
691           on  top  of one of the inherited locked mounts in a less privileged
692           mount namespace:
693
694               # echo 'aaaaa' > /tmp/a    # File to mount onto /etc/shadow
695               # unshare --user --map-root-user --mount \
696                   sh -c 'mount --bind /tmp/a /etc/shadow; cat /etc/shadow'
697               aaaaa
698               # umount /etc/shadow
699
700           The final umount(8) command above, which is performed in  the  ini‐
701           tial mount namespace, makes the original /etc/shadow file once more
702           visible in that namespace.
703
704       [4] Following on from point [3], note that it is possible to  umount(2)
705           an  entire  subtree of mounts that propagated as a unit into a less
706           privileged mount namespace, as illustrated in the  following  exam‐
707           ple.
708
709           First,  we  create  new user and mount namespaces using unshare(1).
710           In the new mount namespace, the propagation type of all  mounts  is
711           set  to private.  We then create a shared bind mount at /mnt, and a
712           small hierarchy of mounts underneath that mount.
713
714               $ PS1='ns1# ' sudo unshare --user --map-root-user \
715                                      --mount --propagation private bash
716               ns1# echo $$        # We need the PID of this shell later
717               778501
718               ns1# mount --make-shared --bind /mnt /mnt
719               ns1# mkdir /mnt/x
720               ns1# mount --make-private -t tmpfs none /mnt/x
721               ns1# mkdir /mnt/x/y
722               ns1# mount --make-private -t tmpfs none /mnt/x/y
723               ns1# grep /mnt /proc/self/mountinfo | sed 's/ - .*//'
724               986 83 8:5 /mnt /mnt rw,relatime shared:344
725               989 986 0:56 / /mnt/x rw,relatime
726               990 989 0:57 / /mnt/x/y rw,relatime
727
728           Continuing in the same shell session, we then create a second shell
729           in a new user namespace and a new (less privileged) mount namespace
730           and check the state of the propagated mounts rooted at /mnt.
731
732               ns1# PS1='ns2# ' unshare --user --map-root-user \
733                                      --mount --propagation unchanged bash
734               ns2# grep /mnt /proc/self/mountinfo | sed 's/ - .*//'
735               1239 1204 8:5 /mnt /mnt rw,relatime master:344
736               1240 1239 0:56 / /mnt/x rw,relatime
737               1241 1240 0:57 / /mnt/x/y rw,relatime
738
739           Of note in the above output is that the  propagation  type  of  the
740           mount  /mnt  has  been reduced to slave, as explained in point [2].
741           This means that submount events will propagate from the master /mnt
742           in "ns1", but propagation will not occur in the opposite direction.
743
744           From  a  separate  terminal window, we then use nsenter(1) to enter
745           the mount and user namespaces corresponding to "ns1".  In that ter‐
746           minal window, we then recursively bind mount /mnt/x at the location
747           /mnt/ppp.
748
749               $ PS1='ns3# ' sudo nsenter -t 778501 --user --mount
750               ns3# mount --rbind --make-private /mnt/x /mnt/ppp
751               ns3# grep /mnt /proc/self/mountinfo | sed 's/ - .*//'
752               986 83 8:5 /mnt /mnt rw,relatime shared:344
753               989 986 0:56 / /mnt/x rw,relatime
754               990 989 0:57 / /mnt/x/y rw,relatime
755               1242 986 0:56 / /mnt/ppp rw,relatime
756               1243 1242 0:57 / /mnt/ppp/y rw,relatime shared:518
757
758           Because the propagation type of the parent mount, /mnt, was shared,
759           the recursive bind mount propagated a small subtree of mounts under
760           the slave mount /mnt into "ns2", as can be  verified  by  executing
761           the following command in that shell session:
762
763               ns2# grep /mnt /proc/self/mountinfo | sed 's/ - .*//'
764               1239 1204 8:5 /mnt /mnt rw,relatime master:344
765               1240 1239 0:56 / /mnt/x rw,relatime
766               1241 1240 0:57 / /mnt/x/y rw,relatime
767               1244 1239 0:56 / /mnt/ppp rw,relatime
768               1245 1244 0:57 / /mnt/ppp/y rw,relatime master:518
769
770           While it is not possible to umount(2) a part of the propagated sub‐
771           tree (/mnt/ppp/y) in "ns2", it is possible to umount(2) the  entire
772           subtree, as shown by the following commands:
773
774               ns2# umount /mnt/ppp/y
775               umount: /mnt/ppp/y: not mounted.
776               ns2# umount -l /mnt/ppp                        # Succeeds...
777               ns2# grep /mnt /proc/self/mountinfo | sed 's/ - .*//'
778               1239 1204 8:5 /mnt /mnt rw,relatime master:344
779               1240 1239 0:56 / /mnt/x rw,relatime
780               1241 1240 0:57 / /mnt/x/y rw,relatime
781
782       [5] The settings of the mount(2) flags MS_RDONLY, MS_NOSUID,
783           MS_NOEXEC, and the "atime" flags (MS_NOATIME, MS_NODIRATIME,
784           MS_RELATIME) become locked when propagated from a more
785           privileged to a less privileged mount namespace, and may not be
786           changed in the less privileged mount namespace.
787
788           This point is illustrated in the following example where, in a more
789           privileged mount namespace, we create a bind mount that  is  marked
790           as  read-only.   For security reasons, it should not be possible to
791           make the mount writable in a less privileged mount  namespace,  and
792           indeed the kernel prevents this:
793
794               $ sudo mkdir /mnt/dir
795               $ sudo mount --bind -o ro /some/path /mnt/dir
796               $ sudo unshare --user --map-root-user --mount \
797                              mount -o remount,rw /mnt/dir
798               mount: /mnt/dir: permission denied.
799
800       [6] A file or directory that is a mount point in one namespace but
801           not in another may be renamed, unlinked, or removed (rmdir(2))
802           in the mount namespace in which it is not a mount point (subject
803           to the usual permission checks).  Consequently, the mount is
804           removed in the mount namespace where the file or directory was a
805           mount point.
806
807           Previously (before Linux 3.18), attempting to  unlink,  rename,  or
808           remove  a file or directory that was a mount point in another mount
809           namespace would result in the error EBUSY.  That behavior had tech‐
810           nical problems of enforcement (e.g., for NFS) and permitted denial-
811           of-service attacks against more privileged users (i.e.,  preventing
812           individual  files  from  being  updated  by bind mounting on top of
813           them).
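
       In the example outputs under points [2] and [4], the propagation
       type of each mount is shown by the optional fields of
       /proc/pid/mountinfo (see proc(5)): "shared:N" for a member of peer
       group N, "master:N" for a slave of peer group N, and no such tag for
       a private mount.  The following C sketch decodes those fields from a
       single line.  The names parse_mountinfo_prop() and struct mount_prop
       are hypothetical, and the parser assumes space-separated fields, with
       spaces inside paths escaped as \040 as proc(5) describes.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>   /* strtol() */
#include <string.h>   /* strtok_r(), strncmp(), strcmp(), strncpy() */

/* Hypothetical type: propagation state decoded from mountinfo fields. */
struct mount_prop {
    bool shared;        /* "shared:N": member of peer group N */
    bool slave;         /* "master:N": slave of peer group N */
    bool unbindable;    /* "unbindable" */
    long peer_group;    /* N from "shared:N", or -1 */
    long master_group;  /* N from "master:N", or -1 */
};

/*
 * Hypothetical helper: decode one line of /proc/PID/mountinfo.  Fields
 * 1-6 are the mount ID, parent ID, major:minor, root, mount point, and
 * mount options; the optional fields follow and end at a lone "-" (absent
 * when the line was cut short, as by the sed(1) commands in the examples).
 * Returns 0 on success, or -1 if the line has fewer than six fields.
 * Lines longer than the local buffer are silently truncated.
 */
int parse_mountinfo_prop(const char *line, struct mount_prop *mp)
{
    char buf[4096];
    char *save = NULL;
    int field = 0;

    assert(line != NULL && mp != NULL);
    *mp = (struct mount_prop){ false, false, false, -1, -1 };

    /* strtok_r() modifies its input, so tokenize a copy. */
    strncpy(buf, line, sizeof(buf) - 1);
    buf[sizeof(buf) - 1] = '\0';

    for (char *tok = strtok_r(buf, " \n", &save); tok != NULL;
         tok = strtok_r(NULL, " \n", &save)) {
        if (++field <= 6)
            continue;                     /* skip the fixed fields */
        if (strcmp(tok, "-") == 0)
            break;                        /* end of the optional fields */
        if (strncmp(tok, "shared:", 7) == 0) {
            mp->shared = true;
            mp->peer_group = strtol(tok + 7, NULL, 10);
        } else if (strncmp(tok, "master:", 7) == 0) {
            mp->slave = true;
            mp->master_group = strtol(tok + 7, NULL, 10);
        } else if (strcmp(tok, "unbindable") == 0) {
            mp->unbindable = true;
        }                                 /* other tags are ignored */
    }
    return field >= 6 ? 0 : -1;
}
```

       Applied to the two /mnt lines from point [4], the parser reports
       peer group 344 as shared in "ns1" and as the master of a slave mount
       in "ns2", which is the reduction described in point [2].
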
814

EXAMPLES

816       See pivot_root(2).
817

SEE ALSO

819       unshare(1),  clone(2),   mount(2),   mount_setattr(2),   pivot_root(2),
820       setns(2),  umount(2),  unshare(2),  proc(5),  namespaces(7), user_name‐
821       spaces(7),  findmnt(8),  mount(8),   pam_namespace(8),   pivot_root(8),
822       umount(8)
823
824       Documentation/filesystems/sharedsubtree.rst in the kernel source tree.
825
826
827
828Linux man-pages 6.05              2023-05-03               mount_namespaces(7)