MOUNT_NAMESPACES(7)        Linux Programmer's Manual        MOUNT_NAMESPACES(7)

NAME
mount_namespaces - overview of Linux mount namespaces
7
DESCRIPTION
9 For an overview of namespaces, see namespaces(7).
10
11 Mount namespaces provide isolation of the list of mounts seen by the
12 processes in each namespace instance. Thus, the processes in each of
13 the mount namespace instances will see distinct single-directory hier‐
14 archies.
15
16 The views provided by the /proc/[pid]/mounts, /proc/[pid]/mountinfo,
17 and /proc/[pid]/mountstats files (all described in proc(5)) correspond
18 to the mount namespace in which the process with the PID [pid] resides.
19 (All of the processes that reside in the same mount namespace will see
20 the same view in these files.)
21
22 A new mount namespace is created using either clone(2) or unshare(2)
23 with the CLONE_NEWNS flag. When a new mount namespace is created, its
24 mount list is initialized as follows:
25
26 * If the namespace is created using clone(2), the mount list of the
27 child's namespace is a copy of the mount list in the parent
28 process's mount namespace.
29
30 * If the namespace is created using unshare(2), the mount list of the
31 new namespace is a copy of the mount list in the caller's previous
32 mount namespace.
33
34 Subsequent modifications to the mount list (mount(2) and umount(2)) in
35 either mount namespace will not (by default) affect the mount list seen
36 in the other namespace (but see the following discussion of shared sub‐
37 trees).
38
Shared subtrees
40 After the implementation of mount namespaces was completed, experience
41 showed that the isolation that they provided was, in some cases, too
42 great. For example, in order to make a newly loaded optical disk
43 available in all mount namespaces, a mount operation was required in
44 each namespace. For this use case, and others, the shared subtree fea‐
45 ture was introduced in Linux 2.6.15. This feature allows for auto‐
46 matic, controlled propagation of mount and unmount events between name‐
47 spaces (or, more precisely, between the mounts that are members of a
48 peer group that are propagating events to one another).
49
50 Each mount is marked (via mount(2)) as having one of the following
51 propagation types:
52
53 MS_SHARED
54 This mount shares events with members of a peer group. Mount
55 and unmount events immediately under this mount will propagate
56 to the other mounts that are members of the peer group. Propa‐
57 gation here means that the same mount or unmount will automati‐
58 cally occur under all of the other mounts in the peer group.
59 Conversely, mount and unmount events that take place under peer
60 mounts will propagate to this mount.
61
62 MS_PRIVATE
63 This mount is private; it does not have a peer group. Mount and
64 unmount events do not propagate into or out of this mount.
65
66 MS_SLAVE
67 Mount and unmount events propagate into this mount from a (mas‐
68 ter) shared peer group. Mount and unmount events under this
69 mount do not propagate to any peer.
70
71 Note that a mount can be the slave of another peer group while
72 at the same time sharing mount and unmount events with a peer
73 group of which it is a member. (More precisely, one peer group
74 can be the slave of another peer group.)
75
76 MS_UNBINDABLE
77 This is like a private mount, and in addition this mount can't
78 be bind mounted. Attempts to bind mount this mount (mount(2)
79 with the MS_BIND flag) will fail.
80
When a recursive bind mount (mount(2) with the MS_BIND and
MS_REC flags) is performed on a directory subtree, any unbindable
mounts within the subtree are automatically pruned (i.e., not
replicated) when replicating that subtree to produce the target
subtree.
86
87 For a discussion of the propagation type assigned to a new mount, see
88 NOTES.
89
90 The propagation type is a per-mount-point setting; some mounts may be
91 marked as shared (with each shared mount being a member of a distinct
92 peer group), while others are private (or slaved or unbindable).
93
94 Note that a mount's propagation type determines whether mounts and un‐
95 mounts of mounts immediately under the mount are propagated. Thus, the
96 propagation type does not affect propagation of events for grandchil‐
97 dren and further removed descendant mounts. What happens if the mount
98 itself is unmounted is determined by the propagation type that is in
99 effect for the parent of the mount.
100
101 Members are added to a peer group when a mount is marked as shared and
102 either:
103
104 * the mount is replicated during the creation of a new mount name‐
105 space; or
106
107 * a new bind mount is created from the mount.
108
109 In both of these cases, the new mount joins the peer group of which the
110 existing mount is a member.
111
112 A new peer group is also created when a child mount is created under an
113 existing mount that is marked as shared. In this case, the new child
114 mount is also marked as shared and the resulting peer group consists of
115 all the mounts that are replicated under the peers of parent mounts.
116
117 A mount ceases to be a member of a peer group when either the mount is
118 explicitly unmounted, or when the mount is implicitly unmounted because
119 a mount namespace is removed (because it has no more member processes).
120
121 The propagation type of the mounts in a mount namespace can be discov‐
122 ered via the "optional fields" exposed in /proc/[pid]/mountinfo. (See
123 proc(5) for details of this file.) The following tags can appear in
124 the optional fields for a record in that file:
125
126 shared:X
127 This mount is shared in peer group X. Each peer group has a
128 unique ID that is automatically generated by the kernel, and all
129 mounts in the same peer group will show the same ID. (These IDs
130 are assigned starting from the value 1, and may be recycled when
131 a peer group ceases to have any members.)
132
133 master:X
134 This mount is a slave to shared peer group X.
135
136 propagate_from:X (since Linux 2.6.26)
137 This mount is a slave and receives propagation from shared peer
138 group X. This tag will always appear in conjunction with a mas‐
139 ter:X tag. Here, X is the closest dominant peer group under the
140 process's root directory. If X is the immediate master of the
141 mount, or if there is no dominant peer group under the same
142 root, then only the master:X field is present and not the propa‐
143 gate_from:X field. For further details, see below.
144
145 unbindable
146 This is an unbindable mount.
147
148 If none of the above tags is present, then this is a private mount.
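
The tag rules above can be applied mechanically. The following is a
minimal sketch, not part of any standard tool: the function name
propagation_types is hypothetical, and it assumes records in the
/proc/[pid]/mountinfo format, where the optional fields begin at field 7
and end at the "-" separator (or at end of line if the separator has been
stripped).

```shell
# propagation_types: read mountinfo records on stdin and print
# "<mount point> <propagation type>" for each record.
# (Hypothetical helper name; the tag names come from the list above.)
propagation_types() {
    awk '{
        shared = 0; slave = 0; unbindable = 0
        # Optional fields run from field 7 up to the "-" separator.
        for (i = 7; i <= NF && $i != "-"; i++) {
            if ($i ~ /^shared:/)         shared = 1
            else if ($i ~ /^master:/)    slave = 1
            else if ($i == "unbindable") unbindable = 1
        }
        if (shared && slave)  type = "slave+shared"
        else if (shared)      type = "shared"
        else if (slave)       type = "slave"
        else if (unbindable)  type = "unbindable"
        else                  type = "private"   # no tag present
        print $5, type
    }'
}
```

For example, "propagation_types < /proc/self/mountinfo" would list every
mount visible to the calling shell together with its propagation type.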
149
150 MS_SHARED and MS_PRIVATE example
151 Suppose that on a terminal in the initial mount namespace, we mark one
152 mount as shared and another as private, and then view the mounts in
153 /proc/self/mountinfo:
154
155 sh1# mount --make-shared /mntS
156 sh1# mount --make-private /mntP
157 sh1# cat /proc/self/mountinfo | grep '/mnt' | sed 's/ - .*//'
158 77 61 8:17 / /mntS rw,relatime shared:1
159 83 61 8:15 / /mntP rw,relatime
160
161 From the /proc/self/mountinfo output, we see that /mntS is a shared
162 mount in peer group 1, and that /mntP has no optional tags, indicating
163 that it is a private mount. The first two fields in each record in
164 this file are the unique ID for this mount, and the mount ID of the
165 parent mount. We can further inspect this file to see that the parent
166 mount of /mntS and /mntP is the root directory, /, which is mounted as
167 private:
168
169 sh1# cat /proc/self/mountinfo | awk '$1 == 61' | sed 's/ - .*//'
170 61 0 8:2 / / rw,relatime
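
The lookup of a parent mount through the first two fields can be wrapped
in a small helper. This is only a sketch under stated assumptions: the
name parent_mount_point is hypothetical, and if several mounts are
stacked on the same mount point, the last matching record is used.

```shell
# parent_mount_point MOUNTPOINT: read mountinfo records on stdin and
# print the mount point of the parent of MOUNTPOINT.
# Field 1 is the mount ID, field 2 the parent ID, field 5 the mount point.
parent_mount_point() {
    awk -v target="$1" '
        { point[$1] = $5 }                        # mount ID -> mount point
        $5 == target { parent = $2; found = 1 }   # remember the parent ID
        END { if (found && (parent in point)) print point[parent] }
    '
}
```

With the records shown above, "parent_mount_point /mntS <
/proc/self/mountinfo" would print "/".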
171
172 On a second terminal, we create a new mount namespace where we run a
173 second shell and inspect the mounts:
174
175 $ PS1='sh2# ' sudo unshare -m --propagation unchanged sh
176 sh2# cat /proc/self/mountinfo | grep '/mnt' | sed 's/ - .*//'
177 222 145 8:17 / /mntS rw,relatime shared:1
178 225 145 8:15 / /mntP rw,relatime
179
180 The new mount namespace received a copy of the initial mount name‐
181 space's mounts. These new mounts maintain the same propagation types,
182 but have unique mount IDs. (The --propagation unchanged option pre‐
183 vents unshare(1) from marking all mounts as private when creating a new
184 mount namespace, which it does by default.)
185
186 In the second terminal, we then create submounts under each of /mntS
187 and /mntP and inspect the set-up:
188
189 sh2# mkdir /mntS/a
190 sh2# mount /dev/sdb6 /mntS/a
191 sh2# mkdir /mntP/b
192 sh2# mount /dev/sdb7 /mntP/b
193 sh2# cat /proc/self/mountinfo | grep '/mnt' | sed 's/ - .*//'
194 222 145 8:17 / /mntS rw,relatime shared:1
195 225 145 8:15 / /mntP rw,relatime
196 178 222 8:22 / /mntS/a rw,relatime shared:2
197 230 225 8:23 / /mntP/b rw,relatime
198
199 From the above, it can be seen that /mntS/a was created as shared (in‐
200 heriting this setting from its parent mount) and /mntP/b was created as
201 a private mount.
202
203 Returning to the first terminal and inspecting the set-up, we see that
204 the new mount created under the shared mount /mntS propagated to its
205 peer mount (in the initial mount namespace), but the new mount created
206 under the private mount /mntP did not propagate:
207
208 sh1# cat /proc/self/mountinfo | grep '/mnt' | sed 's/ - .*//'
209 77 61 8:17 / /mntS rw,relatime shared:1
210 83 61 8:15 / /mntP rw,relatime
211 179 77 8:22 / /mntS/a rw,relatime shared:2
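
To see which mounts in the two namespaces are peers, the mountinfo
records from both shells can be concatenated and grouped by peer group
ID. The following sketch assumes the records were saved to files
beforehand; the helper name peer_groups is hypothetical.

```shell
# peer_groups: read mountinfo records (possibly from several mount
# namespaces) on stdin and print one line per peer group in the form
# "<peer group ID>: <mount IDs of the members seen in the input>".
peer_groups() {
    awk '{
        for (i = 7; i <= NF && $i != "-"; i++) {
            if ($i ~ /^shared:/) {
                id = substr($i, 8)          # text after "shared:"
                if (id in members) {
                    members[id] = members[id] " " $1
                } else {
                    members[id] = $1
                }
            }
        }
    }
    END { for (id in members) print id ": " members[id] }' | sort -n
}
```

Fed the records from both shells in this example, the helper would
report peer group 1 with mounts 77 and 222, and peer group 2 with mounts
179 and 178; the private mounts do not appear.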
212
213 MS_SLAVE example
214 Making a mount a slave allows it to receive propagated mount and un‐
215 mount events from a master shared peer group, while preventing it from
216 propagating events to that master. This is useful if we want to (say)
217 receive a mount event when an optical disk is mounted in the master
218 shared peer group (in another mount namespace), but want to prevent
219 mount and unmount events under the slave mount from having side effects
220 in other namespaces.
221
222 We can demonstrate the effect of slaving by first marking two mounts as
223 shared in the initial mount namespace:
224
225 sh1# mount --make-shared /mntX
226 sh1# mount --make-shared /mntY
227 sh1# cat /proc/self/mountinfo | grep '/mnt' | sed 's/ - .*//'
228 132 83 8:23 / /mntX rw,relatime shared:1
229 133 83 8:22 / /mntY rw,relatime shared:2
230
231 On a second terminal, we create a new mount namespace and inspect the
232 mounts:
233
234 sh2# unshare -m --propagation unchanged sh
235 sh2# cat /proc/self/mountinfo | grep '/mnt' | sed 's/ - .*//'
236 168 167 8:23 / /mntX rw,relatime shared:1
237 169 167 8:22 / /mntY rw,relatime shared:2
238
239 In the new mount namespace, we then mark one of the mounts as a slave:
240
241 sh2# mount --make-slave /mntY
242 sh2# cat /proc/self/mountinfo | grep '/mnt' | sed 's/ - .*//'
243 168 167 8:23 / /mntX rw,relatime shared:1
244 169 167 8:22 / /mntY rw,relatime master:2
245
246 From the above output, we see that /mntY is now a slave mount that is
247 receiving propagation events from the shared peer group with the ID 2.
248
249 Continuing in the new namespace, we create submounts under each of
250 /mntX and /mntY:
251
252 sh2# mkdir /mntX/a
253 sh2# mount /dev/sda3 /mntX/a
254 sh2# mkdir /mntY/b
255 sh2# mount /dev/sda5 /mntY/b
256
257 When we inspect the state of the mounts in the new mount namespace, we
258 see that /mntX/a was created as a new shared mount (inheriting the
259 "shared" setting from its parent mount) and /mntY/b was created as a
260 private mount:
261
262 sh2# cat /proc/self/mountinfo | grep '/mnt' | sed 's/ - .*//'
263 168 167 8:23 / /mntX rw,relatime shared:1
264 169 167 8:22 / /mntY rw,relatime master:2
265 173 168 8:3 / /mntX/a rw,relatime shared:3
266 175 169 8:5 / /mntY/b rw,relatime
267
268 Returning to the first terminal (in the initial mount namespace), we
269 see that the mount /mntX/a propagated to the peer (the shared /mntX),
270 but the mount /mntY/b was not propagated:
271
272 sh1# cat /proc/self/mountinfo | grep '/mnt' | sed 's/ - .*//'
273 132 83 8:23 / /mntX rw,relatime shared:1
274 133 83 8:22 / /mntY rw,relatime shared:2
275 174 132 8:3 / /mntX/a rw,relatime shared:3
276
277 Now we create a new mount under /mntY in the first shell:
278
279 sh1# mkdir /mntY/c
280 sh1# mount /dev/sda1 /mntY/c
281 sh1# cat /proc/self/mountinfo | grep '/mnt' | sed 's/ - .*//'
282 132 83 8:23 / /mntX rw,relatime shared:1
283 133 83 8:22 / /mntY rw,relatime shared:2
284 174 132 8:3 / /mntX/a rw,relatime shared:3
285 178 133 8:1 / /mntY/c rw,relatime shared:4
286
287 When we examine the mounts in the second mount namespace, we see that
288 in this case the new mount has been propagated to the slave mount, and
289 that the new mount is itself a slave mount (to peer group 4):
290
291 sh2# cat /proc/self/mountinfo | grep '/mnt' | sed 's/ - .*//'
292 168 167 8:23 / /mntX rw,relatime shared:1
293 169 167 8:22 / /mntY rw,relatime master:2
294 173 168 8:3 / /mntX/a rw,relatime shared:3
295 175 169 8:5 / /mntY/b rw,relatime
296 179 169 8:1 / /mntY/c rw,relatime master:4
297
298 MS_UNBINDABLE example
299 One of the primary purposes of unbindable mounts is to avoid the "mount
300 explosion" problem when repeatedly performing bind mounts of a higher-
301 level subtree at a lower-level mount. The problem is illustrated by
302 the following shell session.
303
304 Suppose we have a system with the following mounts:
305
306 # mount | awk '{print $1, $2, $3}'
307 /dev/sda1 on /
308 /dev/sdb6 on /mntX
309 /dev/sdb7 on /mntY
310
311 Suppose furthermore that we wish to recursively bind mount the root di‐
312 rectory under several users' home directories. We do this for the
313 first user, and inspect the mounts:
314
315 # mount --rbind / /home/cecilia/
316 # mount | awk '{print $1, $2, $3}'
317 /dev/sda1 on /
318 /dev/sdb6 on /mntX
319 /dev/sdb7 on /mntY
320 /dev/sda1 on /home/cecilia
321 /dev/sdb6 on /home/cecilia/mntX
322 /dev/sdb7 on /home/cecilia/mntY
323
324 When we repeat this operation for the second user, we start to see the
325 explosion problem:
326
327 # mount --rbind / /home/henry
328 # mount | awk '{print $1, $2, $3}'
329 /dev/sda1 on /
330 /dev/sdb6 on /mntX
331 /dev/sdb7 on /mntY
332 /dev/sda1 on /home/cecilia
333 /dev/sdb6 on /home/cecilia/mntX
334 /dev/sdb7 on /home/cecilia/mntY
335 /dev/sda1 on /home/henry
336 /dev/sdb6 on /home/henry/mntX
337 /dev/sdb7 on /home/henry/mntY
338 /dev/sda1 on /home/henry/home/cecilia
339 /dev/sdb6 on /home/henry/home/cecilia/mntX
340 /dev/sdb7 on /home/henry/home/cecilia/mntY
341
342 Under /home/henry, we have not only recursively added the /mntX and
343 /mntY mounts, but also the recursive mounts of those directories under
344 /home/cecilia that were created in the previous step. Upon repeating
345 the step for a third user, it becomes obvious that the explosion is ex‐
346 ponential in nature:
347
348 # mount --rbind / /home/otto
349 # mount | awk '{print $1, $2, $3}'
350 /dev/sda1 on /
351 /dev/sdb6 on /mntX
352 /dev/sdb7 on /mntY
353 /dev/sda1 on /home/cecilia
354 /dev/sdb6 on /home/cecilia/mntX
355 /dev/sdb7 on /home/cecilia/mntY
356 /dev/sda1 on /home/henry
357 /dev/sdb6 on /home/henry/mntX
358 /dev/sdb7 on /home/henry/mntY
359 /dev/sda1 on /home/henry/home/cecilia
360 /dev/sdb6 on /home/henry/home/cecilia/mntX
361 /dev/sdb7 on /home/henry/home/cecilia/mntY
362 /dev/sda1 on /home/otto
363 /dev/sdb6 on /home/otto/mntX
364 /dev/sdb7 on /home/otto/mntY
365 /dev/sda1 on /home/otto/home/cecilia
366 /dev/sdb6 on /home/otto/home/cecilia/mntX
367 /dev/sdb7 on /home/otto/home/cecilia/mntY
368 /dev/sda1 on /home/otto/home/henry
369 /dev/sdb6 on /home/otto/home/henry/mntX
370 /dev/sdb7 on /home/otto/home/henry/mntY
371 /dev/sda1 on /home/otto/home/henry/home/cecilia
372 /dev/sdb6 on /home/otto/home/henry/home/cecilia/mntX
373 /dev/sdb7 on /home/otto/home/henry/home/cecilia/mntY
374
375 The mount explosion problem in the above scenario can be avoided by
376 making each of the new mounts unbindable. The effect of doing this is
377 that recursive mounts of the root directory will not replicate the un‐
378 bindable mounts. We make such a mount for the first user:
379
380 # mount --rbind --make-unbindable / /home/cecilia
381
382 Before going further, we show that unbindable mounts are indeed unbind‐
383 able:
384
385 # mkdir /mntZ
386 # mount --bind /home/cecilia /mntZ
387 mount: wrong fs type, bad option, bad superblock on /home/cecilia,
388 missing codepage or helper program, or other error
389
390 In some cases useful info is found in syslog - try
391 dmesg | tail or so.
392
393 Now we create unbindable recursive bind mounts for the other two users:
394
395 # mount --rbind --make-unbindable / /home/henry
396 # mount --rbind --make-unbindable / /home/otto
397
398 Upon examining the list of mounts, we see there has been no explosion
399 of mounts, because the unbindable mounts were not replicated under each
400 user's directory:
401
402 # mount | awk '{print $1, $2, $3}'
403 /dev/sda1 on /
404 /dev/sdb6 on /mntX
405 /dev/sdb7 on /mntY
406 /dev/sda1 on /home/cecilia
407 /dev/sdb6 on /home/cecilia/mntX
408 /dev/sdb7 on /home/cecilia/mntY
409 /dev/sda1 on /home/henry
410 /dev/sdb6 on /home/henry/mntX
411 /dev/sdb7 on /home/henry/mntY
412 /dev/sda1 on /home/otto
413 /dev/sdb6 on /home/otto/mntX
414 /dev/sdb7 on /home/otto/mntY
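
The difference between the two scenarios can be expressed as simple
arithmetic on the mount counts seen in the listings above. The sketch
below only models those counts (it performs no mounts); the function
names are hypothetical, and the default of 3 starting mounts is taken
from this example.

```shell
# mounts_after_rbinds N [BASE]: number of mounts after N plain recursive
# bind mounts of /, starting from BASE mounts (default 3).
# Each rbind replicates every existing mount, so the count doubles.
mounts_after_rbinds() {
    n=$1
    count=${2:-3}
    while [ "$n" -gt 0 ]; do
        count=$((count * 2))
        n=$((n - 1))
    done
    echo "$count"
}

# mounts_after_unbindable_rbinds N [BASE]: as above, but each new
# subtree is unbindable, so later rbinds replicate only the BASE
# original mounts and the count grows linearly.
mounts_after_unbindable_rbinds() {
    echo $(( ${2:-3} + $1 * ${2:-3} ))
}
```

For the three users in this example, the first function yields 24 mounts
and the second yields 12, matching the two listings.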
415
416 Propagation type transitions
417 The following table shows the effect that applying a new propagation
418 type (i.e., mount --make-xxxx) has on the existing propagation type of
419 a mount. The rows correspond to existing propagation types, and the
420 columns are the new propagation settings. For reasons of space, "pri‐
421 vate" is abbreviated as "priv" and "unbindable" as "unbind".
422
              make-shared   make-slave      make-priv  make-unbind
─────────────┬───────────────────────────────────────────────────────
shared       │shared        slave/priv [1]  priv       unbind
slave        │slave+shared  slave [2]       priv       unbind
slave+shared │slave+shared  slave           priv       unbind
private      │shared        priv [2]        priv       unbind
unbindable   │shared        unbind [2]      priv       unbind
430
431 Note the following details to the table:
432
433 [1] If a shared mount is the only mount in its peer group, making it a
434 slave automatically makes it private.
435
436 [2] Slaving a nonshared mount has no effect on the mount.
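
The transition table and its two notes can be restated as a lookup
function. This is a sketch for checking one's reading of the table, not
an interface offered by mount(8); the name new_propagation and its
argument conventions are hypothetical.

```shell
# new_propagation CURRENT REQUEST [PEERS]
#   CURRENT: shared | slave | slave+shared | private | unbindable
#   REQUEST: shared | slave | private | unbindable
#            (corresponding to mount --make-REQUEST)
#   PEERS:   number of other mounts in the peer group of CURRENT
#            (default 1); consulted only for shared -> slave, note [1].
new_propagation() {
    current=$1
    request=$2
    peers=${3:-1}
    case $request in
        private|unbindable)
            echo "$request" ;;
        shared)
            case $current in
                slave|slave+shared) echo "slave+shared" ;;
                *)                  echo "shared" ;;
            esac ;;
        slave)
            case $current in
                shared)
                    # Note [1]: a lone shared mount becomes private.
                    if [ "$peers" -gt 0 ]; then
                        echo "slave"
                    else
                        echo "private"
                    fi ;;
                slave+shared) echo "slave" ;;
                *)            echo "$current" ;;   # Note [2]: no effect
            esac ;;
        *)
            echo "unknown request: $request" >&2
            return 1 ;;
    esac
}
```

For instance, "new_propagation shared slave 0" prints "private", while
"new_propagation slave shared" prints "slave+shared".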
437
438 Bind (MS_BIND) semantics
439 Suppose that the following command is performed:
440
441 mount --bind A/a B/b
442
443 Here, A is the source mount, B is the destination mount, a is a subdi‐
444 rectory path under the mount point A, and b is a subdirectory path un‐
445 der the mount point B. The propagation type of the resulting mount,
446 B/b, depends on the propagation types of the mounts A and B, and is
447 summarized in the following table.
448
                             source(A)
                   shared  private    slave         unbind
──────────────────┬──────────────────────────────────────────
dest(B)  shared   │shared  shared     slave+shared  invalid
         nonshared│shared  private    slave         invalid
454
455 Note that a recursive bind of a subtree follows the same semantics as
456 for a bind operation on each mount in the subtree. (Unbindable mounts
457 are automatically pruned at the target mount point.)
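
The bind table above can likewise be written as a lookup. The following
is a sketch only; the name bind_propagation and the spelling of its
arguments are hypothetical.

```shell
# bind_propagation DEST SOURCE
#   DEST:   shared | nonshared   (propagation type of destination mount B)
#   SOURCE: shared | private | slave | unbindable   (source mount A)
# Prints the propagation type of the resulting mount B/b, or "invalid".
bind_propagation() {
    dest=$1
    src=$2
    case $src in
        unbindable)
            echo "invalid" ;;
        shared)
            echo "shared" ;;
        private)
            if [ "$dest" = shared ]; then echo "shared"; else echo "private"; fi ;;
        slave)
            if [ "$dest" = shared ]; then echo "slave+shared"; else echo "slave"; fi ;;
        *)
            echo "unknown source type: $src" >&2
            return 2 ;;
    esac
}
```

For example, "bind_propagation shared slave" prints "slave+shared", and
any bind of an unbindable source prints "invalid".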
458
459 For further details, see Documentation/filesystems/sharedsubtree.txt in
460 the kernel source tree.
461
462 Move (MS_MOVE) semantics
463 Suppose that the following command is performed:
464
465 mount --move A B/b
466
467 Here, A is the source mount, B is the destination mount, and b is a
468 subdirectory path under the mount point B. The propagation type of the
469 resulting mount, B/b, depends on the propagation types of the mounts A
470 and B, and is summarized in the following table.
471
                             source(A)
                   shared  private    slave         unbind
──────────────────┬─────────────────────────────────────────────
dest(B)  shared   │shared  shared     slave+shared  invalid
         nonshared│shared  private    slave         unbindable
477
478 Note: moving a mount that resides under a shared mount is invalid.
479
480 For further details, see Documentation/filesystems/sharedsubtree.txt in
481 the kernel source tree.
482
483 Mount semantics
484 Suppose that we use the following command to create a mount:
485
486 mount device B/b
487
488 Here, B is the destination mount, and b is a subdirectory path under
489 the mount point B. The propagation type of the resulting mount, B/b,
490 follows the same rules as for a bind mount, where the propagation type
491 of the source mount is considered always to be private.
492
493 Unmount semantics
494 Suppose that we use the following command to tear down a mount:
495
umount A
497
498 Here, A is a mount on B/b, where B is the parent mount and b is a sub‐
499 directory path under the mount point B. If B is shared, then all most-
500 recently-mounted mounts at b on mounts that receive propagation from
501 mount B and do not have submounts under them are unmounted.
502
503 The /proc/[pid]/mountinfo propagate_from tag
504 The propagate_from:X tag is shown in the optional fields of a
505 /proc/[pid]/mountinfo record in cases where a process can't see a
506 slave's immediate master (i.e., the pathname of the master is not
507 reachable from the filesystem root directory) and so cannot determine
508 the chain of propagation between the mounts it can see.
509
510 In the following example, we first create a two-link master-slave chain
511 between the mounts /mnt, /tmp/etc, and /mnt/tmp/etc. Then the ch‐
512 root(1) command is used to make the /tmp/etc mount point unreachable
513 from the root directory, creating a situation where the master of
514 /mnt/tmp/etc is not reachable from the (new) root directory of the
515 process.
516
517 First, we bind mount the root directory onto /mnt and then bind mount
518 /proc at /mnt/proc so that after the later chroot(1) the proc(5)
519 filesystem remains visible at the correct location in the chroot-ed en‐
520 vironment.
521
522 # mkdir -p /mnt/proc
523 # mount --bind / /mnt
524 # mount --bind /proc /mnt/proc
525
526 Next, we ensure that the /mnt mount is a shared mount in a new peer
527 group (with no peers):
528
529 # mount --make-private /mnt # Isolate from any previous peer group
530 # mount --make-shared /mnt
531 # cat /proc/self/mountinfo | grep '/mnt' | sed 's/ - .*//'
532 239 61 8:2 / /mnt ... shared:102
533 248 239 0:4 / /mnt/proc ... shared:5
534
535 Next, we bind mount /mnt/etc onto /tmp/etc:
536
537 # mkdir -p /tmp/etc
538 # mount --bind /mnt/etc /tmp/etc
539 # cat /proc/self/mountinfo | egrep '/mnt|/tmp/' | sed 's/ - .*//'
540 239 61 8:2 / /mnt ... shared:102
541 248 239 0:4 / /mnt/proc ... shared:5
542 267 40 8:2 /etc /tmp/etc ... shared:102
543
Initially, these two mounts are in the same peer group, but we then
make /tmp/etc a slave of /mnt/etc, and then make /tmp/etc shared as
well, so that it can propagate events to the next slave in the chain:
547
548 # mount --make-slave /tmp/etc
549 # mount --make-shared /tmp/etc
550 # cat /proc/self/mountinfo | egrep '/mnt|/tmp/' | sed 's/ - .*//'
551 239 61 8:2 / /mnt ... shared:102
552 248 239 0:4 / /mnt/proc ... shared:5
553 267 40 8:2 /etc /tmp/etc ... shared:105 master:102
554
555 Then we bind mount /tmp/etc onto /mnt/tmp/etc. Again, the two mounts
556 are initially in the same peer group, but we then make /mnt/tmp/etc a
557 slave of /tmp/etc:
558
559 # mkdir -p /mnt/tmp/etc
560 # mount --bind /tmp/etc /mnt/tmp/etc
561 # mount --make-slave /mnt/tmp/etc
562 # cat /proc/self/mountinfo | egrep '/mnt|/tmp/' | sed 's/ - .*//'
563 239 61 8:2 / /mnt ... shared:102
564 248 239 0:4 / /mnt/proc ... shared:5
565 267 40 8:2 /etc /tmp/etc ... shared:105 master:102
566 273 239 8:2 /etc /mnt/tmp/etc ... master:105
567
568 From the above, we see that /mnt is the master of the slave /tmp/etc,
569 which in turn is the master of the slave /mnt/tmp/etc.
570
571 We then chroot(1) to the /mnt directory, which renders the mount with
572 ID 267 unreachable from the (new) root directory:
573
574 # chroot /mnt
575
576 When we examine the state of the mounts inside the chroot-ed environ‐
577 ment, we see the following:
578
579 # cat /proc/self/mountinfo | sed 's/ - .*//'
580 239 61 8:2 / / ... shared:102
581 248 239 0:4 / /proc ... shared:5
582 273 239 8:2 /etc /tmp/etc ... master:105 propagate_from:102
583
Above, we see that the mount with ID 273 is a slave whose master is the
peer group 105. The mount point for that master is unreachable, and so
a propagate_from tag is displayed, indicating that the closest dominant
peer group (i.e., the nearest reachable mount in the slave chain) is
the peer group with the ID 102 (corresponding to the /mnt mount point
before the chroot(1) was performed).
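
A script can detect the same situation by looking for slave mounts whose
master peer group does not appear as the shared:X group of any listed
mount. The sketch below makes that assumption explicit: it treats "not
listed in the input" as "not reachable", which matches this example but
is a simplification. The helper name hidden_masters is hypothetical.

```shell
# hidden_masters: read mountinfo records on stdin. For every slave mount
# whose master peer group is not the shared:X group of any mount in the
# input, print "<mount point> master:<X> propagate_from:<Y>", where <Y>
# is "-" if the record carries no propagate_from tag.
hidden_masters() {
    awk '{
        m = ""; p = "-"
        for (i = 7; i <= NF && $i != "-"; i++) {
            if ($i ~ /^shared:/)              visible[substr($i, 8)] = 1
            else if ($i ~ /^master:/)         m = substr($i, 8)
            else if ($i ~ /^propagate_from:/) p = substr($i, 16)
        }
        if (m != "") { point[NR] = $5; master[NR] = m; from[NR] = p }
    }
    END {
        for (n = 1; n <= NR; n++)
            if ((n in master) && !(master[n] in visible))
                print point[n], "master:" master[n], "propagate_from:" from[n]
    }'
}
```

Run against the records seen inside the chroot-ed environment, this
reports /tmp/etc with master 105 and propagate_from 102; run against the
records seen before the chroot(1), it prints nothing, because peer group
105 is still visible there.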
590
VERSIONS
592 Mount namespaces first appeared in Linux 2.4.19.
593
CONFORMING TO
595 Namespaces are a Linux-specific feature.
596
NOTES
598 The propagation type assigned to a new mount depends on the propagation
599 type of the parent mount. If the mount has a parent (i.e., it is a
600 non-root mount point) and the propagation type of the parent is
601 MS_SHARED, then the propagation type of the new mount is also
602 MS_SHARED. Otherwise, the propagation type of the new mount is MS_PRI‐
603 VATE.
604
Although the default propagation type for a new mount is in many cases
MS_PRIVATE, MS_SHARED is typically more useful. For this reason,
systemd(1) automatically remounts all mounts as MS_SHARED on system
startup. Thus, on most modern systems, the default propagation type
is in practice MS_SHARED.
610
611 Since, when one uses unshare(1) to create a mount namespace, the goal
612 is commonly to provide full isolation of the mounts in the new name‐
613 space, unshare(1) (since util-linux version 2.27) in turn reverses the
614 step performed by systemd(1), by making all mounts private in the new
615 namespace. That is, unshare(1) performs the equivalent of the follow‐
616 ing in the new mount namespace:
617
618 mount --make-rprivate /
619
620 To prevent this, one can use the --propagation unchanged option to un‐
621 share(1).
622
623 An application that creates a new mount namespace directly using
624 clone(2) or unshare(2) may desire to prevent propagation of mount
625 events to other mount namespaces (as is done by unshare(1)). This can
626 be done by changing the propagation type of mounts in the new namespace
627 to either MS_SLAVE or MS_PRIVATE, using a call such as the following:
628
629 mount(NULL, "/", MS_SLAVE | MS_REC, NULL);
630
631 For a discussion of propagation types when moving mounts (MS_MOVE) and
632 creating bind mounts (MS_BIND), see Documentation/filesystems/shared‐
633 subtree.txt.
634
635 Restrictions on mount namespaces
636 Note the following points with respect to mount namespaces:
637
638 [1] Each mount namespace has an owner user namespace. As explained
639 above, when a new mount namespace is created, its mount list is
640 initialized as a copy of the mount list of another mount namespace.
641 If the new namespace and the namespace from which the mount list
642 was copied are owned by different user namespaces, then the new
643 mount namespace is considered less privileged.
644
[2] When creating a less privileged mount namespace, shared mounts are
reduced to slave mounts. This ensures that mount and unmount events
in less privileged mount namespaces will not propagate to more
privileged mount namespaces.
649
650 [3] Mounts that come as a single unit from a more privileged mount
651 namespace are locked together and may not be separated in a less
652 privileged mount namespace. (The unshare(2) CLONE_NEWNS operation
653 brings across all of the mounts from the original mount namespace
654 as a single unit, and recursive mounts that propagate between mount
655 namespaces propagate as a single unit.)
656
657 In this context, "may not be separated" means that the mounts are
658 locked so that they may not be individually unmounted. Consider
659 the following example:
660
661 $ sudo sh
662 # mount --bind /dev/null /etc/shadow
663 # cat /etc/shadow # Produces no output
664
665 The above steps, performed in a more privileged mount namespace,
666 have created a bind mount that obscures the contents of the shadow
667 password file, /etc/shadow. For security reasons, it should not be
668 possible to unmount that mount in a less privileged mount name‐
669 space, since that would reveal the contents of /etc/shadow.
670
Suppose we now create a new mount namespace owned by a new user
namespace. The new mount namespace will inherit copies of all of
the mounts from the previous mount namespace. However, those
mounts will be locked because the new mount namespace is less
privileged. Consequently, an attempt to unmount the mount fails, as
shown in the following step:
677
# unshare --user --map-root-user --mount \
               strace -o /tmp/log \
               umount /etc/shadow
681 umount: /etc/shadow: not mounted.
682 # grep '^umount' /tmp/log
683 umount2("/etc/shadow", 0) = -1 EINVAL (Invalid argument)
684
The error message from umount(8) is a little confusing, but the
strace(1) output reveals that the underlying umount2(2) system call
failed with the error EINVAL, which is the error that the kernel
returns to indicate that the mount is locked.
689
690 Note, however, that it is possible to stack (and unstack) a mount
691 on top of one of the inherited locked mounts in a less privileged
692 mount namespace:
693
694 # echo 'aaaaa' > /tmp/a # File to mount onto /etc/shadow
695 # unshare --user --map-root-user --mount \
696 sh -c 'mount --bind /tmp/a /etc/shadow; cat /etc/shadow'
697 aaaaa
698 # umount /etc/shadow
699
700 The final umount(8) command above, which is performed in the ini‐
701 tial mount namespace, makes the original /etc/shadow file once more
702 visible in that namespace.
703
704 [4] Following on from point [3], note that it is possible to unmount an
705 entire subtree of mounts that propagated as a unit into a less
706 privileged mount namespace, as illustrated in the following exam‐
707 ple.
708
709 First, we create new user and mount namespaces using unshare(1).
710 In the new mount namespace, the propagation type of all mounts is
711 set to private. We then create a shared bind mount at /mnt, and a
712 small hierarchy of mounts underneath that mount.
713
714 $ PS1='ns1# ' sudo unshare --user --map-root-user \
715 --mount --propagation private bash
716 ns1# echo $$ # We need the PID of this shell later
717 778501
718 ns1# mount --make-shared --bind /mnt /mnt
719 ns1# mkdir /mnt/x
720 ns1# mount --make-private -t tmpfs none /mnt/x
721 ns1# mkdir /mnt/x/y
722 ns1# mount --make-private -t tmpfs none /mnt/x/y
723 ns1# grep /mnt /proc/self/mountinfo | sed 's/ - .*//'
724 986 83 8:5 /mnt /mnt rw,relatime shared:344
725 989 986 0:56 / /mnt/x rw,relatime
726 990 989 0:57 / /mnt/x/y rw,relatime
727
728 Continuing in the same shell session, we then create a second shell
729 in a new user namespace and a new (less privileged) mount namespace
730 and check the state of the propagated mounts rooted at /mnt.
731
732 ns1# PS1='ns2# ' unshare --user --map-root-user \
733 --mount --propagation unchanged bash
734 ns2# grep /mnt /proc/self/mountinfo | sed 's/ - .*//'
735 1239 1204 8:5 /mnt /mnt rw,relatime master:344
736 1240 1239 0:56 / /mnt/x rw,relatime
737 1241 1240 0:57 / /mnt/x/y rw,relatime
738
739 Of note in the above output is that the propagation type of the
740 mount /mnt has been reduced to slave, as explained in point [2].
741 This means that submount events will propagate from the master /mnt
742 in "ns1", but propagation will not occur in the opposite direction.
743
744 From a separate terminal window, we then use nsenter(1) to enter
745 the mount and user namespaces corresponding to "ns1". In that ter‐
746 minal window, we then recursively bind mount /mnt/x at the location
747 /mnt/ppp.
748
749 $ PS1='ns3# ' sudo nsenter -t 778501 --user --mount
750 ns3# mount --rbind --make-private /mnt/x /mnt/ppp
751 ns3# grep /mnt /proc/self/mountinfo | sed 's/ - .*//'
752 986 83 8:5 /mnt /mnt rw,relatime shared:344
753 989 986 0:56 / /mnt/x rw,relatime
754 990 989 0:57 / /mnt/x/y rw,relatime
755 1242 986 0:56 / /mnt/ppp rw,relatime
756 1243 1242 0:57 / /mnt/ppp/y rw,relatime shared:518
757
758 Because the propagation type of the parent mount, /mnt, was shared,
759 the recursive bind mount propagated a small subtree of mounts under
760 the slave mount /mnt into "ns2", as can be verified by executing
761 the following command in that shell session:
762
763 ns2# grep /mnt /proc/self/mountinfo | sed 's/ - .*//'
764 1239 1204 8:5 /mnt /mnt rw,relatime master:344
765 1240 1239 0:56 / /mnt/x rw,relatime
766 1241 1240 0:57 / /mnt/x/y rw,relatime
767 1244 1239 0:56 / /mnt/ppp rw,relatime
768 1245 1244 0:57 / /mnt/ppp/y rw,relatime master:518
769
770 While it is not possible to unmount a part of the propagated sub‐
771 tree (/mnt/ppp/y) in "ns2", it is possible to unmount the entire
772 subtree, as shown by the following commands:
773
774 ns2# umount /mnt/ppp/y
775 umount: /mnt/ppp/y: not mounted.
ns2# umount -l /mnt/ppp        # Succeeds...
ns2# grep /mnt /proc/self/mountinfo | sed 's/ - .*//'
778 1239 1204 8:5 /mnt /mnt rw,relatime master:344
779 1240 1239 0:56 / /mnt/x rw,relatime
780 1241 1240 0:57 / /mnt/x/y rw,relatime
781
782 [5] The mount(2) flags MS_RDONLY, MS_NOSUID, MS_NOEXEC, and the "atime"
783 flags (MS_NOATIME, MS_NODIRATIME, MS_RELATIME) settings become
784 locked when propagated from a more privileged to a less privileged
785 mount namespace, and may not be changed in the less privileged
786 mount namespace.
787
788 This point is illustrated in the following example where, in a more
789 privileged mount namespace, we create a bind mount that is marked
790 as read-only. For security reasons, it should not be possible to
791 make the mount writable in a less privileged mount namespace, and
792 indeed the kernel prevents this:
793
794 $ sudo mkdir /mnt/dir
795 $ sudo mount --bind -o ro /some/path /mnt/dir
796 $ sudo unshare --user --map-root-user --mount \
797 mount -o remount,rw /mnt/dir
798 mount: /mnt/dir: permission denied.
799
[6] A file or directory that is a mount point in one namespace but not
in another may be renamed, unlinked, or removed (rmdir(2)) in the
mount namespace in which it is not a mount point (subject to the
usual permission checks). Consequently, the mount point is removed
in the mount namespace where it was a mount point.
806
807 Previously (before Linux 3.18), attempting to unlink, rename, or
808 remove a file or directory that was a mount point in another mount
809 namespace would result in the error EBUSY. That behavior had tech‐
810 nical problems of enforcement (e.g., for NFS) and permitted denial-
811 of-service attacks against more privileged users (i.e., preventing
812 individual files from being updated by bind mounting on top of
813 them).
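
As a rough aid for point [5], the per-mount options in field 6 of a
mountinfo record show which of the lockable flags are currently set on a
mount. The sketch below assumes the usual option spellings for those
flags (ro, nosuid, noexec, noatime, nodiratime, relatime); it reports
which options are set, not whether the kernel has actually locked them.
The helper name lockable_options is hypothetical.

```shell
# lockable_options: read mountinfo records on stdin and print, for each
# record, the mount point followed by those per-mount options (field 6)
# that correspond to the flags listed in point [5].
lockable_options() {
    awk '{
        n = split($6, opt, ",")
        out = ""
        for (i = 1; i <= n; i++)
            if (opt[i] ~ /^(ro|nosuid|noexec|noatime|nodiratime|relatime)$/)
                out = out " " opt[i]
        print $5 out
    }'
}
```

For example, "grep /mnt/dir /proc/self/mountinfo | lockable_options"
would list the flags of the read-only bind mount from point [5].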
814
EXAMPLES
816 See pivot_root(2).
817
SEE ALSO
819 unshare(1), clone(2), mount(2), mount_setattr(2), pivot_root(2),
820 setns(2), umount(2), unshare(2), proc(5), namespaces(7), user_name‐
821 spaces(7), findmnt(8), mount(8), pam_namespace(8), pivot_root(8),
822 umount(8)
823
824 Documentation/filesystems/sharedsubtree.txt in the kernel source tree.
825
COLOPHON
827 This page is part of release 5.13 of the Linux man-pages project. A
828 description of the project, information about reporting bugs, and the
829 latest version of this page, can be found at
830 https://www.kernel.org/doc/man-pages/.

Linux                             2021-08-27               MOUNT_NAMESPACES(7)