pivot_root(2)              System Calls Manual              pivot_root(2)

NAME
       pivot_root - change the root mount

LIBRARY
       Standard C library (libc, -lc)

SYNOPSIS
       #include <sys/syscall.h>      /* Definition of SYS_* constants */
       #include <unistd.h>

       int syscall(SYS_pivot_root, const char *new_root, const char *put_old);

       Note: glibc provides no wrapper for pivot_root(), necessitating the use
       of syscall(2).

DESCRIPTION
       pivot_root() changes the root mount in the mount namespace of the
       calling process. More precisely, it moves the root mount to the
       directory put_old and makes new_root the new root mount. The calling
       process must have the CAP_SYS_ADMIN capability in the user namespace
       that owns the caller's mount namespace.
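
The capability requirement can be probed before attempting the call. The following is a minimal sketch, not part of the original page: the helper name has_effective_cap_sys_admin() is hypothetical, and it only inspects the caller's effective capability set through capget(2). It does not verify that the capability is held in the user namespace that owns the mount namespace, so a positive result does not guarantee that pivot_root() will pass its permission check.

```c
#define _GNU_SOURCE
#include <linux/capability.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if CAP_SYS_ADMIN is in the caller's
   effective set, 0 if it is not, and -1 if capget(2) fails. */
static int
has_effective_cap_sys_admin(void)
{
    struct __user_cap_header_struct hdr = {
        .version = _LINUX_CAPABILITY_VERSION_3,
        .pid = 0,                       /* 0 means the calling thread */
    };
    struct __user_cap_data_struct data[_LINUX_CAPABILITY_U32S_3];

    if (syscall(SYS_capget, &hdr, data) == -1)
        return -1;

    return (data[CAP_TO_INDEX(CAP_SYS_ADMIN)].effective
            & CAP_TO_MASK(CAP_SYS_ADMIN)) != 0;
}
```

A result of 0 suggests that pivot_root() would fail with EPERM; a result of 1 is only a hint, for the reason given above.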

       pivot_root() changes the root directory and the current working
       directory of each process or thread in the same mount namespace to
       new_root if they point to the old root directory. (See also NOTES.)
       On the other hand, pivot_root() does not change the caller's current
       working directory (unless it is on the old root directory), and thus
       it should be followed by a chdir("/") call.

       The following restrictions apply:

       •  new_root and put_old must be directories.

       •  new_root and put_old must not be on the same mount as the current
          root.

       •  put_old must be at or underneath new_root; that is, adding some
          nonnegative number of "/.." suffixes to the pathname pointed to by
          put_old must yield the same directory as new_root.

       •  new_root must be a path to a mount point, but can't be "/". A path
          that is not already a mount point can be converted into one by bind
          mounting the path onto itself.

       •  The propagation type of the parent mount of new_root and the parent
          mount of the current root directory must not be MS_SHARED;
          similarly, if put_old is an existing mount point, its propagation
          type must not be MS_SHARED. These restrictions ensure that
          pivot_root() never propagates any changes to another mount
          namespace.

       •  The current root directory must be a mount point.

RETURN VALUE
       On success, zero is returned. On error, -1 is returned, and errno is
       set to indicate the error.

ERRORS
       pivot_root() may fail with any of the same errors as stat(2).
       Additionally, it may fail with the following errors:

       EBUSY  new_root or put_old is on the current root mount. (This error
              covers the pathological case where new_root is "/".)

       EINVAL new_root is not a mount point.

       EINVAL put_old is not at or underneath new_root.

       EINVAL The current root directory is not a mount point (because of an
              earlier chroot(2)).

       EINVAL The current root is on the rootfs (initial ramfs) mount; see
              NOTES.

       EINVAL Either the mount point at new_root, or the parent mount of that
              mount point, has propagation type MS_SHARED.

       EINVAL put_old is a mount point and has the propagation type MS_SHARED.

       ENOTDIR
              new_root or put_old is not a directory.

       EPERM  The calling process does not have the CAP_SYS_ADMIN capability.
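
Because EINVAL covers several distinct conditions, a caller may want to print a hint alongside the raw errno value. The following is a small sketch with a hypothetical helper name, pivot_root_hint(); its message strings merely paraphrase the list above and are not produced by the kernel or by libc.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical helper: map an errno value set by pivot_root() to a
   hint paraphrasing the ERRORS list. Other values, such as those
   shared with stat(2), fall back to strerror(3). */
static const char *
pivot_root_hint(int err)
{
    switch (err) {
    case EBUSY:
        return "new_root or put_old is on the current root mount";
    case EINVAL:
        return "new_root is not a mount point, put_old is not at or "
               "underneath new_root, the current root is not a mount "
               "point or is on rootfs, or a mount involved has "
               "MS_SHARED propagation";
    case ENOTDIR:
        return "new_root or put_old is not a directory";
    case EPERM:
        return "the caller lacks CAP_SYS_ADMIN";
    default:
        return strerror(err);
    }
}
```

A typical use would be to pass errno to this helper immediately after a failed syscall(SYS_pivot_root, ...) and include the result in the diagnostic.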

STANDARDS
       Linux.

HISTORY
       Linux 2.3.41.

NOTES
       A command-line interface for this system call is provided by
       pivot_root(8).

       pivot_root() allows the caller to switch to a new root filesystem while
       at the same time placing the old root mount at a location under
       new_root from where it can subsequently be unmounted. (The fact that
       it moves all processes that have a root directory or current working
       directory on the old root directory to the new root frees the old root
       directory of users, allowing the old root mount to be unmounted more
       easily.)

       One use of pivot_root() is during system startup, when the system
       mounts a temporary root filesystem (e.g., an initrd(4)), then mounts
       the real root filesystem, and eventually turns the latter into the root
       directory of all relevant processes and threads. A modern use is to
       set up a root filesystem during the creation of a container.

       The fact that pivot_root() modifies process root and current working
       directories in the manner noted in DESCRIPTION is necessary in order to
       prevent kernel threads from keeping the old root mount busy with their
       root and current working directories, even if they never access the
       filesystem in any way.

       The rootfs (initial ramfs) cannot be pivot_root()ed. The recommended
       method of changing the root filesystem in this case is to delete
       everything in rootfs, overmount rootfs with the new root, attach
       stdin/stdout/stderr to the new /dev/console, and exec the new init(1).
       Helper programs for this process exist; see switch_root(8).
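
The "overmount rootfs with the new root" step can be sketched as follows. This is an illustration only, under the assumption that the new root is already mounted at new_root and that the caller is privileged. The helper name overmount_root() is hypothetical, and the sketch leaves out deleting the contents of rootfs, reattaching the standard streams to the new /dev/console, and executing the new init(1).

```c
#define _GNU_SOURCE
#include <sys/mount.h>
#include <unistd.h>

/* Hypothetical helper: move the mount at new_root on top of "/" and
   make it the caller's root directory. Returns 0 on success, or -1
   with errno set by the failing call. */
static int
overmount_root(const char *new_root)
{
    if (chdir(new_root) == -1)
        return -1;

    /* Move the mount that the current directory is on over "/". */
    if (mount(".", "/", NULL, MS_MOVE, NULL) == -1)
        return -1;

    /* The current directory is now the top mount at "/"; adopt it
       as the root directory. */
    if (chroot(".") == -1)
        return -1;

    return chdir("/");
}
```

In practice, the complete procedure is better left to a helper such as switch_root(8).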

   pivot_root(".", ".")
       new_root and put_old may be the same directory. In particular, the
       following sequence allows a pivot-root operation without needing to
       create and remove a temporary directory:

           chdir(new_root);
           pivot_root(".", ".");
           umount2(".", MNT_DETACH);

       This sequence succeeds because the pivot_root() call stacks the old
       root mount point on top of the new root mount point at /. At that
       point, the calling process's root directory and current working
       directory refer to the new root mount point (new_root). During the
       subsequent umount2() call, resolution of "." starts with new_root and
       then moves up the list of mounts stacked at /, with the result that
       the old root mount point is unmounted.
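
The same sequence with error checking might look like the sketch below. The helper name pivot_in_place() is hypothetical. The sketch assumes that the caller is privileged, that new_root is already a mount point, and that the propagation restrictions listed in DESCRIPTION are satisfied.

```c
#define _GNU_SOURCE
#include <sys/mount.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: pivot to new_root without a temporary put_old
   directory. Returns 0 on success, or -1 with errno set by the
   failing call. */
static int
pivot_in_place(const char *new_root)
{
    if (chdir(new_root) == -1)
        return -1;

    /* Stack the old root mount on top of the new root mount at "/". */
    if (syscall(SYS_pivot_root, ".", ".") == -1)
        return -1;

    /* "." resolves to the old root mount stacked at "/"; detach it. */
    if (umount2(".", MNT_DETACH) == -1)
        return -1;

    /* Finish with an explicit chdir("/"), as DESCRIPTION recommends. */
    return chdir("/");
}
```

On failure, the caller can inspect errno to tell a missing directory (from chdir(2)) apart from a rejected pivot.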

   Historical notes
       For many years, this manual page carried the following text:

              pivot_root() may or may not change the current root and the
              current working directory of any processes or threads which
              use the old root directory. The caller of pivot_root() must
              ensure that processes with root or current working directory
              at the old root operate correctly in either case. An easy way
              to ensure this is to change their root and current working
              directory to new_root before invoking pivot_root().

       This text, written before the system call implementation was even
       finalized in the kernel, was probably intended to warn users at that
       time that the implementation might change before final release.
       However, the behavior stated in DESCRIPTION has remained consistent
       since this system call was first implemented and will not change now.

EXAMPLES
       The program below demonstrates the use of pivot_root() inside a mount
       namespace that is created using clone(2). After pivoting to the root
       directory named in the program's first command-line argument, the child
       created by clone(2) then executes the program named in the remaining
       command-line arguments.

       We demonstrate the program by creating a directory that will serve as
       the new root filesystem and placing a copy of the (statically linked)
       busybox(1) executable in that directory.

           $ mkdir /tmp/rootfs
           $ ls -id /tmp/rootfs    # Show inode number of new root directory
           319459 /tmp/rootfs
           $ cp $(which busybox) /tmp/rootfs
           $ PS1='bbsh$ ' sudo ./pivot_root_demo /tmp/rootfs /busybox sh
           bbsh$ PATH=/
           bbsh$ busybox ln busybox ln
           bbsh$ ln busybox echo
           bbsh$ ln busybox ls
           bbsh$ ls
           busybox  echo  ln  ls
           bbsh$ ls -id /          # Compare with inode number above
           319459 /
           bbsh$ echo 'hello world'
           hello world

   Program source

       /* pivot_root_demo.c */

       #define _GNU_SOURCE
       #include <err.h>
       #include <limits.h>
       #include <sched.h>
       #include <signal.h>
       #include <stdio.h>
       #include <stdlib.h>
       #include <sys/mman.h>
       #include <sys/mount.h>
       #include <sys/stat.h>
       #include <sys/syscall.h>
       #include <sys/wait.h>
       #include <unistd.h>

       static int
       pivot_root(const char *new_root, const char *put_old)
       {
           return syscall(SYS_pivot_root, new_root, put_old);
       }

       #define STACK_SIZE (1024 * 1024)

       static int              /* Startup function for cloned child */
       child(void *arg)
       {
           char        path[PATH_MAX];
           char        **args = arg;
           char        *new_root = args[0];
           const char  *put_old = "/oldrootfs";

           /* Ensure that 'new_root' and its parent mount don't have
              shared propagation (which would cause pivot_root() to
              return an error), and prevent propagation of mount
              events to the initial mount namespace. */

           if (mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, NULL) == -1)
               err(EXIT_FAILURE, "mount-MS_PRIVATE");

           /* Ensure that 'new_root' is a mount point. */

           if (mount(new_root, new_root, NULL, MS_BIND, NULL) == -1)
               err(EXIT_FAILURE, "mount-MS_BIND");

           /* Create directory to which old root will be pivoted. */

           snprintf(path, sizeof(path), "%s/%s", new_root, put_old);
           if (mkdir(path, 0777) == -1)
               err(EXIT_FAILURE, "mkdir");

           /* And pivot the root filesystem. */

           if (pivot_root(new_root, path) == -1)
               err(EXIT_FAILURE, "pivot_root");

           /* Switch the current working directory to "/". */

           if (chdir("/") == -1)
               err(EXIT_FAILURE, "chdir");

           /* Unmount old root and remove mount point. */

           if (umount2(put_old, MNT_DETACH) == -1)
               perror("umount2");
           if (rmdir(put_old) == -1)
               perror("rmdir");

           /* Execute the command given after the new-root argument
              (args[1] here, argv[2] of the original command line). */

           execv(args[1], &args[1]);
           err(EXIT_FAILURE, "execv");
       }

       int
       main(int argc, char *argv[])
       {
           char *stack;

           /* Both a new root directory and a command are required. */

           if (argc < 3)
               errx(EXIT_FAILURE, "Usage: %s new-root command [arg...]",
                    argv[0]);

           /* Create a child process in a new mount namespace. */

           stack = mmap(NULL, STACK_SIZE, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
           if (stack == MAP_FAILED)
               err(EXIT_FAILURE, "mmap");

           if (clone(child, stack + STACK_SIZE,
                     CLONE_NEWNS | SIGCHLD, &argv[1]) == -1)
               err(EXIT_FAILURE, "clone");

           /* Parent falls through to here; wait for child. */

           if (wait(NULL) == -1)
               err(EXIT_FAILURE, "wait");

           exit(EXIT_SUCCESS);
       }

SEE ALSO
       chdir(2), chroot(2), mount(2), stat(2), initrd(4), mount_namespaces(7),
       pivot_root(8), switch_root(8)

Linux man-pages 6.05              2023-05-03              pivot_root(2)