OPENAT2(2)                 Linux Programmer's Manual                OPENAT2(2)

NAME
       openat2 - open and possibly create a file (extended)

SYNOPSIS
       #include <fcntl.h>          /* Definition of O_* and S_* constants */
       #include <linux/openat2.h>  /* Definition of RESOLVE_* constants */
       #include <sys/syscall.h>    /* Definition of SYS_* constants */
       #include <unistd.h>

       long syscall(SYS_openat2, int dirfd, const char *pathname,
                    struct open_how *how, size_t size);

       Note: glibc provides no wrapper for openat2(), necessitating the use of
       syscall(2).
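
       The invocation above can be wrapped in a small helper.  The following
       is a minimal sketch, assuming that kernel headers from Linux 5.6 or
       later are installed (so that <linux/openat2.h> and SYS_openat2 are
       available).  The names openat2_raw() and open_readonly_at() are
       hypothetical and are not provided by any library:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>          /* O_* and AT_* constants */
#include <linux/openat2.h>  /* struct open_how, RESOLVE_* constants */
#include <stddef.h>
#include <sys/syscall.h>    /* SYS_openat2 */
#include <unistd.h>         /* syscall(), close() */

/* Hypothetical helper: thin wrapper around the raw system call.
   Returns a new file descriptor, or -1 with errno set. */
static int
openat2_raw(int dirfd, const char *pathname,
            struct open_how *how, size_t size)
{
    return (int) syscall(SYS_openat2, dirfd, pathname, how, size);
}

/* Hypothetical helper: open pathname read-only relative to dirfd,
   with no resolve restrictions.  The designated initializer leaves
   every field that is not named set to zero. */
static int
open_readonly_at(int dirfd, const char *pathname)
{
    struct open_how how = { .flags = O_RDONLY | O_CLOEXEC };

    return openat2_raw(dirfd, pathname, &how, sizeof(how));
}
```

       With these helpers, open_readonly_at(AT_FDCWD, ".") returns a file
       descriptor on success, or -1 with errno set (for example, to ENOSYS
       on a kernel that predates openat2()).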

DESCRIPTION
       The openat2() system call is an extension of openat(2) and provides a
       superset of its functionality.

       The openat2() system call opens the file specified by pathname.  If
       the specified file does not exist, it may optionally (if O_CREAT is
       specified in how.flags) be created.

       As with openat(2), if pathname is a relative pathname, then it is
       interpreted relative to the directory referred to by the file
       descriptor dirfd (or the current working directory of the calling
       process, if dirfd is the special value AT_FDCWD).  If pathname is an
       absolute pathname, then dirfd is ignored (unless how.resolve contains
       RESOLVE_IN_ROOT, in which case pathname is resolved relative to
       dirfd).

       Rather than taking a single flags argument, an extensible structure
       (how) is passed to allow for future extensions.  The size argument
       must be specified as sizeof(struct open_how).

   The open_how structure
       The how argument specifies how pathname should be opened, and acts as
       a superset of the flags and mode arguments to openat(2).  This
       argument is a pointer to a structure of the following form:

           struct open_how {
               u64 flags;    /* O_* flags */
               u64 mode;     /* Mode for O_{CREAT,TMPFILE} */
               u64 resolve;  /* RESOLVE_* flags */
               /* ... */
           };

       Any future extensions to openat2() will be implemented as new fields
       appended to the above structure, with a zero value in a new field
       resulting in the kernel behaving as though that extension field was
       not present.  Therefore, the caller must zero-fill this structure on
       initialization.  (See the "Extensibility" section of the NOTES for
       more detail on why this is necessary.)

       The fields of the open_how structure are as follows:

       flags  This field specifies the file creation and file status flags to
              use when opening the file.  All of the O_* flags defined for
              openat(2) are valid openat2() flag values.

              Whereas openat(2) ignores unknown bits in its flags argument,
              openat2() returns an error if unknown or conflicting flags are
              specified in how.flags.

       mode   This field specifies the mode for the new file, with identical
              semantics to the mode argument of openat(2).

              Whereas openat(2) ignores bits other than those in the range
              07777 in its mode argument, openat2() returns an error if
              how.mode contains bits other than 07777.  Similarly, an error
              is returned if openat2() is called with a nonzero how.mode and
              how.flags does not contain O_CREAT or O_TMPFILE.

       resolve
              This is a bit-mask of flags that modify the way in which all
              components of pathname will be resolved.  (See
              path_resolution(7) for background information.)

              The primary use case for these flags is to allow trusted
              programs to restrict how untrusted paths (or paths inside
              untrusted directories) are resolved.  The full list of resolve
              flags is as follows:

              RESOLVE_BENEATH
                     Do not permit the path resolution to succeed if any
                     component of the resolution is not a descendant of the
                     directory indicated by dirfd.  This causes absolute
                     symbolic links (and absolute values of pathname) to be
                     rejected.

                     Currently, this flag also disables magic-link resolution
                     (see below).  However, this may change in the future.
                     Therefore, to ensure that magic links are not resolved,
                     the caller should explicitly specify
                     RESOLVE_NO_MAGICLINKS.

              RESOLVE_IN_ROOT
                     Treat the directory referred to by dirfd as the root
                     directory while resolving pathname.  Absolute symbolic
                     links are interpreted relative to dirfd.  If a prefix
                     component of pathname equates to dirfd, then an
                     immediately following .. component likewise equates to
                     dirfd (just as /.. is traditionally equivalent to /).
                     If pathname is an absolute path, it is also interpreted
                     relative to dirfd.

                     The effect of this flag is as though the calling process
                     had used chroot(2) to (temporarily) modify its root
                     directory (to the directory referred to by dirfd).
                     However, unlike chroot(2) (which changes the filesystem
                     root permanently for a process), RESOLVE_IN_ROOT allows
                     a program to efficiently restrict path resolution on a
                     per-open basis.

                     Currently, this flag also disables magic-link
                     resolution.  However, this may change in the future.
                     Therefore, to ensure that magic links are not resolved,
                     the caller should explicitly specify
                     RESOLVE_NO_MAGICLINKS.

              RESOLVE_NO_MAGICLINKS
                     Disallow all magic-link resolution during path
                     resolution.

                     Magic links are symbolic link-like objects that are most
                     notably found in proc(5); examples include
                     /proc/[pid]/exe and /proc/[pid]/fd/*.  (See symlink(7)
                     for more details.)

                     Unknowingly opening magic links can be risky for some
                     applications.  Examples of such risks include the
                     following:

                     • If the process opening a pathname is a controlling
                       process that currently has no controlling terminal
                       (see credentials(7)), then opening a magic link inside
                       /proc/[pid]/fd that happens to refer to a terminal
                       would cause the process to acquire a controlling
                       terminal.

                     • In a containerized environment, a magic link inside
                       /proc may refer to an object outside the container,
                       and thus may provide a means to escape from the
                       container.

                     Because of such risks, an application may prefer to
                     disable magic link resolution using the
                     RESOLVE_NO_MAGICLINKS flag.

                     If the trailing component (i.e., basename) of pathname
                     is a magic link, how.resolve contains
                     RESOLVE_NO_MAGICLINKS, and how.flags contains both
                     O_PATH and O_NOFOLLOW, then an O_PATH file descriptor
                     referencing the magic link will be returned.

              RESOLVE_NO_SYMLINKS
                     Disallow resolution of symbolic links during path
                     resolution.  This option implies RESOLVE_NO_MAGICLINKS.

                     If the trailing component (i.e., basename) of pathname
                     is a symbolic link, how.resolve contains
                     RESOLVE_NO_SYMLINKS, and how.flags contains both O_PATH
                     and O_NOFOLLOW, then an O_PATH file descriptor
                     referencing the symbolic link will be returned.

                     Note that the effect of the RESOLVE_NO_SYMLINKS flag,
                     which affects the treatment of symbolic links in all of
                     the components of pathname, differs from the effect of
                     the O_NOFOLLOW file creation flag (in how.flags), which
                     affects the handling of symbolic links only in the final
                     component of pathname.

                     Applications that employ the RESOLVE_NO_SYMLINKS flag
                     are encouraged to make its use configurable (unless it
                     is used for a specific security purpose), as symbolic
                     links are very widely used by end-users.  Setting this
                     flag indiscriminately (i.e., for purposes not
                     specifically related to security) for all uses of
                     openat2() may result in spurious errors on previously
                     functional systems.  This may occur if, for example, a
                     system pathname that is used by an application is
                     modified (e.g., in a new distribution release) so that
                     a pathname component (now) contains a symbolic link.

              RESOLVE_NO_XDEV
                     Disallow traversal of mount points during path
                     resolution (including all bind mounts).  Consequently,
                     pathname must either be on the same mount as the
                     directory referred to by dirfd, or on the same mount as
                     the current working directory if dirfd is specified as
                     AT_FDCWD.

                     Applications that employ the RESOLVE_NO_XDEV flag are
                     encouraged to make its use configurable (unless it is
                     used for a specific security purpose), as bind mounts
                     are widely used by end-users.  Setting this flag
                     indiscriminately (i.e., for purposes not specifically
                     related to security) for all uses of openat2() may
                     result in spurious errors on previously functional
                     systems.  This may occur if, for example, a system
                     pathname that is used by an application is modified
                     (e.g., in a new distribution release) so that a
                     pathname component (now) contains a bind mount.

              RESOLVE_CACHED
                     Make the open operation fail unless all path components
                     are already present in the kernel's lookup cache.  If
                     any kind of revalidation or I/O is needed to satisfy the
                     lookup, openat2() fails with the error EAGAIN.  This is
                     useful in providing a fast-path open that can be
                     performed without resorting to thread offload, or other
                     mechanisms that an application might use to offload
                     slower operations.

              If any bits other than those listed above are set in
              how.resolve, an error is returned.
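
       As an illustration of the resolve flags, the following sketch opens an
       untrusted path strictly beneath a directory.  It assumes kernel
       headers from Linux 5.6 or later; the helper name open_beneath() is
       hypothetical.  RESOLVE_NO_MAGICLINKS is given explicitly, as
       recommended in the description of RESOLVE_BENEATH:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <linux/openat2.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: open untrusted_path read-only, refusing any
   resolution that would leave the directory referred to by dirfd.
   Returns a new file descriptor, or -1 with errno set. */
static int
open_beneath(int dirfd, const char *untrusted_path)
{
    struct open_how how = {
        .flags   = O_RDONLY | O_CLOEXEC,
        .resolve = RESOLVE_BENEATH | RESOLVE_NO_MAGICLINKS,
    };

    return (int) syscall(SYS_openat2, dirfd, untrusted_path,
                         &how, sizeof(how));
}
```

       Given a directory file descriptor dirfd, a call such as
       open_beneath(dirfd, "/etc/passwd") is expected to fail, because
       RESOLVE_BENEATH rejects absolute pathnames, whereas a relative path
       that stays inside the directory is opened normally.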
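       The checks that openat2() applies to how.mode can be approximated in
       user space before the call is made.  The following sketch mirrors only
       the two rules given in the description of the mode field; the function
       name check_open_how_mode() is hypothetical, and the kernel remains the
       final authority on what is accepted:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>

/* Hypothetical pre-check for the how.mode rules.
   Returns 0 if the combination looks acceptable, or -EINVAL. */
static int
check_open_how_mode(uint64_t flags, uint64_t mode)
{
    /* O_TMPFILE is a multi-bit value, so compare against the whole mask. */
    int creating = (flags & O_CREAT) != 0 ||
                   (flags & O_TMPFILE) == O_TMPFILE;

    if (mode & ~(uint64_t) 07777)
        return -EINVAL;     /* bits outside 07777 are rejected */
    if (mode != 0 && !creating)
        return -EINVAL;     /* nonzero mode needs O_CREAT or O_TMPFILE */
    return 0;
}
```

       For example, check_open_how_mode(O_RDONLY, 0644) returns -EINVAL,
       while check_open_how_mode(O_WRONLY | O_CREAT, 0644) returns 0.
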
RETURN VALUE
       On success, a new file descriptor is returned.  On error, -1 is
       returned, and errno is set to indicate the error.

ERRORS
       The set of errors returned by openat2() includes all of the errors
       returned by openat(2), as well as the following additional errors:

       E2BIG  An extension that this kernel does not support was specified in
              how.  (See the "Extensibility" section of NOTES for more detail
              on how extensions are handled.)

       EAGAIN how.resolve contains either RESOLVE_IN_ROOT or RESOLVE_BENEATH,
              and the kernel could not ensure that a ".." component didn't
              escape (due to a race condition or potential attack).  The
              caller may choose to retry the openat2() call.

       EAGAIN RESOLVE_CACHED was set, and the open operation cannot be
              performed using only cached information.  The caller should
              retry without RESOLVE_CACHED set in how.resolve.

       EINVAL An unknown flag or invalid value was specified in how.

       EINVAL how.mode is nonzero, but how.flags does not contain O_CREAT or
              O_TMPFILE.

       EINVAL size was smaller than any known version of struct open_how.

       ELOOP  how.resolve contains RESOLVE_NO_SYMLINKS, and one of the path
              components was a symbolic link (or magic link).

       ELOOP  how.resolve contains RESOLVE_NO_MAGICLINKS, and one of the path
              components was a magic link.

       EXDEV  how.resolve contains either RESOLVE_IN_ROOT or RESOLVE_BENEATH,
              and an escape from the root during path resolution was
              detected.

       EXDEV  how.resolve contains RESOLVE_NO_XDEV, and a path component
              crosses a mount point.
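
       The RESOLVE_CACHED case of EAGAIN suggests a two-step pattern: try the
       cached lookup first, and fall back to an ordinary open.  A minimal
       sketch follows, assuming kernel headers from Linux 5.6 or later.  The
       helper name open_cached_first() is hypothetical, and the fallback
       definition of RESOLVE_CACHED (0x20) is an assumption made for build
       environments whose headers predate the flag:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <linux/openat2.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef RESOLVE_CACHED
#define RESOLVE_CACHED 0x20     /* assumption: headers older than the flag */
#endif

/* Hypothetical helper: attempt a cache-only open, then retry without
   RESOLVE_CACHED if the kernel reports EAGAIN.
   Returns a new file descriptor, or -1 with errno set. */
static int
open_cached_first(int dirfd, const char *pathname, unsigned long long flags)
{
    struct open_how how = { .flags = flags, .resolve = RESOLVE_CACHED };
    int fd;

    fd = (int) syscall(SYS_openat2, dirfd, pathname, &how, sizeof(how));
    if (fd >= 0 || errno != EAGAIN)
        return fd;              /* success, or an unrelated error */

    /* Slow path: the lookup needs revalidation or I/O. */
    how.resolve &= ~(unsigned long long) RESOLVE_CACHED;
    return (int) syscall(SYS_openat2, dirfd, pathname, &how, sizeof(how));
}
```

       In a real program the second call would typically be handed to a
       worker thread rather than issued inline; it is issued inline here
       only to keep the sketch short.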
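       For the RESOLVE_IN_ROOT and RESOLVE_BENEATH case of EAGAIN, the caller
       may retry the call.  The sketch below bounds the number of attempts so
       that a persistent race cannot stall the caller indefinitely.  It
       assumes kernel headers from Linux 5.6 or later; the helper name
       open_in_root_retry() and the choice of a retry limit are illustrative
       assumptions:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <linux/openat2.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: open pathname with rootfd treated as the root
   directory, retrying at most max_tries times while the kernel
   reports EAGAIN.  Returns a new file descriptor, or -1 with errno
   set (EINVAL if max_tries is less than 1). */
static int
open_in_root_retry(int rootfd, const char *pathname, int max_tries)
{
    struct open_how how = {
        .flags   = O_RDONLY | O_CLOEXEC,
        .resolve = RESOLVE_IN_ROOT | RESOLVE_NO_MAGICLINKS,
    };
    int fd = -1;

    if (max_tries < 1) {
        errno = EINVAL;
        return -1;
    }

    for (int i = 0; i < max_tries; i++) {
        fd = (int) syscall(SYS_openat2, rootfd, pathname,
                           &how, sizeof(how));
        if (fd >= 0 || errno != EAGAIN)
            break;              /* success, or an error not worth retrying */
    }
    return fd;
}
```

       If every attempt fails with EAGAIN, the helper returns -1 with errno
       still set to EAGAIN, leaving the decision about further retries to
       the caller.
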
VERSIONS
       openat2() first appeared in Linux 5.6.

CONFORMING TO
       This system call is Linux-specific.

       The semantics of RESOLVE_BENEATH were modeled after FreeBSD's
       O_BENEATH.

NOTES
   Extensibility
       In order to allow for future extensibility, openat2() requires the
       user-space application to specify the size of the open_how structure
       that it is passing.  By providing this information, it is possible for
       openat2() to provide both forwards- and backwards-compatibility, with
       size acting as an implicit version number.  (Because new extension
       fields will always be appended, the structure size will always
       increase.)  This extensibility design is very similar to other system
       calls such as sched_setattr(2), perf_event_open(2), and clone3(2).

       If we let usize be the size of the structure as specified by the
       user-space application, and ksize be the size of the structure which
       the kernel supports, then there are three cases to consider:

       • If ksize equals usize, then there is no version mismatch and how can
         be used verbatim.

       • If ksize is larger than usize, then there are some extension fields
         that the kernel supports which the user-space application is unaware
         of.  Because a zero value in any added extension field signifies a
         no-op, the kernel treats all of the extension fields not provided by
         the user-space application as having zero values.  This provides
         backwards-compatibility.

       • If ksize is smaller than usize, then there are some extension fields
         which the user-space application is aware of but which the kernel
         does not support.  Because any extension field must have its zero
         values signify a no-op, the kernel can safely ignore the unsupported
         extension fields if they are all-zero.  If any unsupported extension
         fields are nonzero, then -1 is returned and errno is set to E2BIG.
         This provides forwards-compatibility.
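
       The three cases can be modeled in user space.  The following sketch
       is not the kernel's code; it is a simplified model with a
       hypothetical function name, copy_extensible_struct(), in which kdst
       stands for the kernel's view of the structure and usrc for the bytes
       supplied by the application:

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the size negotiation.
   Copies a usize-byte structure into a ksize-byte one.
   Returns 0 on success, or -E2BIG if the source carries nonzero
   data in fields that the destination does not know about. */
static int
copy_extensible_struct(void *kdst, size_t ksize,
                       const void *usrc, size_t usize)
{
    const unsigned char *u = usrc;
    size_t common = usize < ksize ? usize : ksize;

    /* ksize < usize: unknown trailing fields must be all-zero. */
    for (size_t i = ksize; i < usize; i++)
        if (u[i] != 0)
            return -E2BIG;

    /* ksize > usize: fields the caller did not supply read as zero. */
    memset(kdst, 0, ksize);

    /* ksize == usize falls out naturally: everything is copied. */
    memcpy(kdst, usrc, common);
    return 0;
}
```

       For example, with ksize equal to 24, a 32-byte source whose last
       eight bytes are zero is accepted, and the same source with any of
       those bytes nonzero yields -E2BIG.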

       Because the definition of struct open_how may change in the future
       (with new fields being added when system headers are updated),
       user-space applications should zero-fill struct open_how to ensure
       that recompiling the program with new headers will not result in
       spurious errors at runtime.  The simplest way is to use a designated
       initializer:

           struct open_how how = { .flags = O_RDWR,
                                   .resolve = RESOLVE_IN_ROOT };

       or explicitly using memset(3) or similar:

           struct open_how how;
           memset(&how, 0, sizeof(how));
           how.flags = O_RDWR;
           how.resolve = RESOLVE_IN_ROOT;

       A user-space application that wishes to determine which extensions the
       running kernel supports can do so by conducting a binary search on
       size with a structure which has every byte nonzero (to find the
       largest value which doesn't produce an error of E2BIG).
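
       The binary search can be sketched as follows.  All function names
       here are hypothetical.  The search itself takes a probe callback, so
       that it can be exercised against a simulated kernel
       (simulated_too_big(), which pretends the supported size is 40 bytes)
       as well as against the running one (openat2_too_big()).  The lower
       bound of 24 bytes is the size of the three u64 fields shown in
       DESCRIPTION, and the 4096-byte upper bound is an arbitrary assumption:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Probe callback: nonzero if a structure of this size yields E2BIG. */
typedef int (*too_big_fn)(size_t size);

/* Largest size in [lo, hi] that the probe accepts, assuming that the
   probe accepts lo and that once a size is too big, all larger sizes
   are too big as well. */
static size_t
largest_accepted_size(too_big_fn too_big, size_t lo, size_t hi)
{
    while (lo < hi) {
        size_t mid = lo + (hi - lo + 1) / 2;   /* round up to terminate */

        if (too_big(mid))
            hi = mid - 1;
        else
            lo = mid;
    }
    return lo;
}

/* Simulated kernel that supports a 40-byte structure (illustration). */
static int
simulated_too_big(size_t size)
{
    return size > 40;
}

/* Probe of the running kernel, using a buffer with every byte nonzero. */
static int
openat2_too_big(size_t size)
{
    unsigned char buf[4096];
    long ret;

    if (size > sizeof(buf))
        return 1;
    memset(buf, 0xff, sizeof(buf));
    errno = 0;
    ret = syscall(SYS_openat2, AT_FDCWD, ".", buf, size);
    if (ret >= 0)
        close((int) ret);       /* not expected with an all-ones buffer */
    return ret == -1 && errno == E2BIG;
}
```

       A caller would use largest_accepted_size(openat2_too_big, 24, 4096).
       The result is meaningful only if openat2() is available: on a kernel
       where every probe fails with ENOSYS, no size is ever reported as too
       big, and the upper bound is returned unchanged.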

SEE ALSO
       openat(2), path_resolution(7), symlink(7)

COLOPHON
       This page is part of release 5.13 of the Linux man-pages project.  A
       description of the project, information about reporting bugs, and the
       latest version of this page, can be found at
       https://www.kernel.org/doc/man-pages/.

Linux                             2021-03-22                        OPENAT2(2)