IOCTL_USERFAULTFD(2) Linux Programmer's Manual IOCTL_USERFAULTFD(2)

NAME
ioctl_userfaultfd - create a file descriptor for handling page faults
in user space

SYNOPSIS
    #include <sys/ioctl.h>

    int ioctl(int fd, int cmd, ...);

DESCRIPTION
Various ioctl(2) operations can be performed on a userfaultfd object
(created by a call to userfaultfd(2)) using calls of the form:

    ioctl(fd, cmd, argp);

In the above, fd is a file descriptor referring to a userfaultfd
object, cmd is one of the commands listed below, and argp is a pointer
to a data structure that is specific to cmd.

The various ioctl(2) operations are described below. The UFFDIO_API,
UFFDIO_REGISTER, and UFFDIO_UNREGISTER operations are used to configure
userfaultfd behavior. These operations allow the caller to choose what
features will be enabled and what kinds of events will be delivered to
the application. The remaining operations are range operations. These
operations enable the calling application to resolve page-fault events.

UFFDIO_API
(Since Linux 4.3.) Enable operation of the userfaultfd and perform the
API handshake.

The argp argument is a pointer to a uffdio_api structure, defined as:

    struct uffdio_api {
        __u64 api;      /* Requested API version (input) */
        __u64 features; /* Requested features (input/output) */
        __u64 ioctls;   /* Available ioctl() operations (output) */
    };

The api field denotes the API version requested by the application.

The kernel verifies that it can support the requested API version, and
sets the features and ioctls fields to bit masks representing all the
available features and the generic ioctl(2) operations available.

For Linux kernel versions before 4.11, the features field must be
initialized to zero before the call to UFFDIO_API, and zero (i.e., no
feature bits) is placed in the features field by the kernel upon return
from ioctl(2).

Starting from Linux 4.11, the features field can be used to ask whether
particular features are supported and to explicitly enable userfaultfd
features that are disabled by default. The kernel always reports all
the available features in the features field.

To enable userfaultfd features, the application should set a bit
corresponding to each feature it wants to enable in the features field.
If the kernel supports all the requested features, it will enable them.
Otherwise, it will zero out the returned uffdio_api structure and
return EINVAL.
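
The handshake described above might be sketched as follows. The helper
name uffd_open() is hypothetical. The sketch assumes that UFFD_API is
the API version constant provided by <linux/userfaultfd.h> and that the
descriptor is obtained through syscall(2) rather than through a C
library wrapper.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <stddef.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of any library): create a userfaultfd
 * object and perform the UFFDIO_API handshake, requesting `features`.
 * On success, returns the descriptor and copies the kernel's reply
 * into *reply (if non-NULL).  On failure, returns -1 with errno set.
 */
int uffd_open(__u64 features, struct uffdio_api *reply)
{
    int fd = (int) syscall(SYS_userfaultfd, O_CLOEXEC | O_NONBLOCK);
    if (fd == -1)
        return -1;

    struct uffdio_api api;
    memset(&api, 0, sizeof(api));
    api.api = UFFD_API;         /* requested API version */
    api.features = features;    /* must be 0 before Linux 4.11 */

    if (ioctl(fd, UFFDIO_API, &api) == -1) {
        int saved_errno = errno;    /* keep the ioctl error across close() */
        close(fd);
        errno = saved_errno;
        return -1;
    }

    if (reply != NULL)
        *reply = api;   /* features and ioctls now hold the kernel's masks */
    return fd;
}
```

A caller that passes a feature bit and gets -1 with errno set to EINVAL
can conclude that the kernel rejected the version or one of the
requested features.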

The following feature bits may be set:

UFFD_FEATURE_EVENT_FORK (since Linux 4.11)
    When this feature is enabled, the userfaultfd objects associated
    with a parent process are duplicated into the child process during
    fork(2) and a UFFD_EVENT_FORK event is delivered to the userfaultfd
    monitor.

UFFD_FEATURE_EVENT_REMAP (since Linux 4.11)
    If this feature is enabled, when the faulting process invokes
    mremap(2), the userfaultfd monitor will receive an event of type
    UFFD_EVENT_REMAP.

UFFD_FEATURE_EVENT_REMOVE (since Linux 4.11)
    If this feature is enabled, when the faulting process calls
    madvise(2) with the MADV_DONTNEED or MADV_REMOVE advice value to
    free a virtual memory area, the userfaultfd monitor will receive
    an event of type UFFD_EVENT_REMOVE.

UFFD_FEATURE_EVENT_UNMAP (since Linux 4.11)
    If this feature is enabled, when the faulting process unmaps
    virtual memory either explicitly with munmap(2), or implicitly
    during either mmap(2) or mremap(2), the userfaultfd monitor will
    receive an event of type UFFD_EVENT_UNMAP.

UFFD_FEATURE_MISSING_HUGETLBFS (since Linux 4.11)
    If this feature bit is set, the kernel supports registering
    userfaultfd ranges on hugetlbfs virtual memory areas.

UFFD_FEATURE_MISSING_SHMEM (since Linux 4.11)
    If this feature bit is set, the kernel supports registering
    userfaultfd ranges on shared memory areas. This includes all
    kernel shared memory APIs: System V shared memory, tmpfs(5),
    shared mappings of /dev/zero, mmap(2) with the MAP_SHARED flag
    set, memfd_create(2), and so on.

UFFD_FEATURE_SIGBUS (since Linux 4.14)
    If this feature bit is set, no page-fault events
    (UFFD_EVENT_PAGEFAULT) will be delivered. Instead, a SIGBUS
    signal will be sent to the faulting process. Applications using
    this feature will not require the use of a userfaultfd monitor
    for processing memory accesses to the regions registered with
    userfaultfd.

The returned ioctls field can contain the following bits:

1 << _UFFDIO_API
    The UFFDIO_API operation is supported.

1 << _UFFDIO_REGISTER
    The UFFDIO_REGISTER operation is supported.

1 << _UFFDIO_UNREGISTER
    The UFFDIO_UNREGISTER operation is supported.
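
Testing the returned mask is a plain bit operation. The sketch below
assumes the _UFFDIO_* bit numbers from <linux/userfaultfd.h>; the
helper name uffd_ioctl_supported() is hypothetical.

```c
#include <linux/userfaultfd.h>
#include <stdbool.h>

/*
 * Hypothetical helper: report whether bit number `nr` (for example
 * _UFFDIO_REGISTER) is set in an ioctls mask returned by the kernel.
 */
bool uffd_ioctl_supported(__u64 ioctls, unsigned int nr)
{
    if (nr >= 64)               /* the mask has only 64 bits */
        return false;
    return (ioctls & ((__u64) 1 << nr)) != 0;
}
```

For example, uffd_ioctl_supported(api.ioctls, _UFFDIO_REGISTER) could
be checked after a successful UFFDIO_API call and before any range is
registered.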

This ioctl(2) operation returns 0 on success. On error, -1 is returned
and errno is set to indicate the cause of the error. Possible errors
include:

EFAULT argp refers to an address that is outside the calling process's
       accessible address space.

EINVAL The userfaultfd has already been enabled by a previous
       UFFDIO_API operation.

EINVAL The API version requested in the api field is not supported by
       this kernel, or the features field passed to the kernel includes
       feature bits that are not supported by the current kernel
       version.

UFFDIO_REGISTER
(Since Linux 4.3.) Register a memory address range with the
userfaultfd object. The pages in the range must be "compatible".

Before Linux 4.11, only private anonymous ranges were compatible for
registering with UFFDIO_REGISTER.

Since Linux 4.11, hugetlbfs and shared memory ranges are also
compatible with UFFDIO_REGISTER.

The argp argument is a pointer to a uffdio_register structure, defined
as:

    struct uffdio_range {
        __u64 start;    /* Start of range */
        __u64 len;      /* Length of range (bytes) */
    };

    struct uffdio_register {
        struct uffdio_range range;
        __u64 mode;     /* Desired mode of operation (input) */
        __u64 ioctls;   /* Available ioctl() operations (output) */
    };

The range field defines a memory range starting at start and continuing
for len bytes that should be handled by the userfaultfd.

The mode field defines the mode of operation desired for this memory
region. The following values may be bitwise ORed to set the
userfaultfd mode for the specified range:

UFFDIO_REGISTER_MODE_MISSING
    Track page faults on missing pages.

UFFDIO_REGISTER_MODE_WP
    Track page faults on write-protected pages.

Currently, the only supported mode is UFFDIO_REGISTER_MODE_MISSING.

If the operation is successful, the kernel modifies the ioctls bit-mask
field to indicate which ioctl(2) operations are available for the
specified range. The returned bit mask has the same form as the one
returned by UFFDIO_API.

This ioctl(2) operation returns 0 on success. On error, -1 is returned
and errno is set to indicate the cause of the error. Possible errors
include:

EBUSY  A mapping in the specified range is registered with another
       userfaultfd object.

EFAULT argp refers to an address that is outside the calling process's
       accessible address space.

EINVAL An invalid or unsupported bit was specified in the mode field;
       or the mode field was zero.

EINVAL There is no mapping in the specified address range.

EINVAL range.start or range.len is not a multiple of the system page
       size; or, range.len is zero; or these fields are otherwise
       invalid.

EINVAL There is an incompatible mapping in the specified address range.

UFFDIO_UNREGISTER
(Since Linux 4.3.) Unregister a memory address range from userfaultfd.
The pages in the range must be "compatible" (see the description of
UFFDIO_REGISTER).

The address range to unregister is specified in the uffdio_range
structure pointed to by argp.

This ioctl(2) operation returns 0 on success. On error, -1 is returned
and errno is set to indicate the cause of the error. Possible errors
include:

EINVAL Either the start or the len field of the uffdio_range structure
       was not a multiple of the system page size; or the len field was
       zero; or these fields were otherwise invalid.

EINVAL There was an incompatible mapping in the specified address
       range.

EINVAL There was no mapping in the specified address range.
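
The alignment rules listed above can be checked before the call is
made. The following is a minimal sketch; both helper names are
hypothetical, and the early EINVAL merely mirrors the error that the
kernel is documented to return for such a range.

```c
#include <errno.h>
#include <linux/userfaultfd.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* True if len is nonzero and both start and len are page multiples. */
bool uffd_range_aligned(uint64_t start, uint64_t len, uint64_t page_size)
{
    return page_size != 0 && len != 0 &&
           start % page_size == 0 && len % page_size == 0;
}

/*
 * Hypothetical helper: unregister [addr, addr + len) from `uffd`.
 * Returns 0 on success, or -1 with errno set.
 */
int uffd_unregister(int uffd, void *addr, size_t len)
{
    long page_size = sysconf(_SC_PAGESIZE);

    if (page_size <= 0 ||
        !uffd_range_aligned((uintptr_t) addr, len, (uint64_t) page_size)) {
        errno = EINVAL;     /* reject locally what the kernel would reject */
        return -1;
    }

    struct uffdio_range range;
    memset(&range, 0, sizeof(range));
    range.start = (__u64) (uintptr_t) addr;
    range.len = len;

    return ioctl(uffd, UFFDIO_UNREGISTER, &range);
}
```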

UFFDIO_COPY
(Since Linux 4.3.) Atomically copy a contiguous memory chunk into the
userfaultfd-registered range and optionally wake up the blocked thread.
The source and destination addresses and the number of bytes to copy
are specified by the src, dst, and len fields of the uffdio_copy
structure pointed to by argp:

    struct uffdio_copy {
        __u64 dst;  /* Destination of copy */
        __u64 src;  /* Source of copy */
        __u64 len;  /* Number of bytes to copy */
        __u64 mode; /* Flags controlling behavior of copy */
        __s64 copy; /* Number of bytes copied, or negated error */
    };

The following value may be bitwise ORed in mode to change the behavior
of the UFFDIO_COPY operation:

UFFDIO_COPY_MODE_DONTWAKE
    Do not wake up the thread that waits for page-fault resolution.

The copy field is used by the kernel to return the number of bytes that
were actually copied, or an error (a negated errno-style value). If the
value returned in copy doesn't match the value that was specified in
len, the operation fails with the error EAGAIN. The copy field is
output-only; it is not read by the UFFDIO_COPY operation.

This ioctl(2) operation returns 0 on success. In this case, the entire
area was copied. On error, -1 is returned and errno is set to indicate
the cause of the error. Possible errors include:

EAGAIN The number of bytes copied (i.e., the value returned in the copy
       field) does not equal the value that was specified in the len
       field.

EINVAL Either dst or len was not a multiple of the system page size, or
       the range specified by src and len or dst and len was invalid.

EINVAL An invalid bit was specified in the mode field.

ENOENT (since Linux 4.11)
       The faulting process has changed its virtual memory layout
       simultaneously with an outstanding UFFDIO_COPY operation.

ENOSPC (from Linux 4.11 until Linux 4.13)
       The faulting process has exited at the time of a UFFDIO_COPY
       operation.

ESRCH (since Linux 4.13)
       The faulting process has exited at the time of a UFFDIO_COPY
       operation.
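
One way a monitor might handle a short copy is sketched below. The
helper name uffd_copy_all() is hypothetical. The retry logic assumes
that a positive value in the copy field reported together with EAGAIN
counts bytes that are already in place, so that only the remainder
needs to be retried; any other failure is passed back to the caller.

```c
#include <errno.h>
#include <linux/userfaultfd.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

/*
 * Hypothetical helper: copy len bytes from src into the registered
 * range at dst, retrying after a partial copy.  Returns 0 once all
 * bytes are in place, or -1 with errno set by ioctl(2).
 */
int uffd_copy_all(int uffd, void *dst, const void *src, size_t len,
                  __u64 mode)
{
    __u64 d = (__u64) (uintptr_t) dst;
    __u64 s = (__u64) (uintptr_t) src;
    __u64 left = len;

    while (left > 0) {
        struct uffdio_copy copy;

        memset(&copy, 0, sizeof(copy));
        copy.dst = d;
        copy.src = s;
        copy.len = left;
        copy.mode = mode;       /* 0 or UFFDIO_COPY_MODE_DONTWAKE */

        if (ioctl(uffd, UFFDIO_COPY, &copy) == 0)
            return 0;           /* the entire remainder was copied */

        /* Assumption: only EAGAIN with a positive count is retryable. */
        if (errno != EAGAIN || copy.copy <= 0)
            return -1;

        d += (__u64) copy.copy;
        s += (__u64) copy.copy;
        left -= (__u64) copy.copy;
    }
    return 0;
}
```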

UFFDIO_ZEROPAGE
(Since Linux 4.3.) Zero out a memory range registered with
userfaultfd.

The requested range is specified by the range field of the
uffdio_zeropage structure pointed to by argp:

    struct uffdio_zeropage {
        struct uffdio_range range;
        __u64 mode;     /* Flags controlling behavior of zeroing */
        __s64 zeropage; /* Number of bytes zeroed, or negated error */
    };

The following value may be bitwise ORed in mode to change the behavior
of the UFFDIO_ZEROPAGE operation:

UFFDIO_ZEROPAGE_MODE_DONTWAKE
    Do not wake up the thread that waits for page-fault resolution.

The zeropage field is used by the kernel to return the number of bytes
that were actually zeroed, or an error in the same manner as
UFFDIO_COPY. If the value returned in the zeropage field doesn't match
the value that was specified in range.len, the operation fails with the
error EAGAIN. The zeropage field is output-only; it is not read by the
UFFDIO_ZEROPAGE operation.

This ioctl(2) operation returns 0 on success. In this case, the entire
area was zeroed. On error, -1 is returned and errno is set to indicate
the cause of the error. Possible errors include:

EAGAIN The number of bytes zeroed (i.e., the value returned in the
       zeropage field) does not equal the value that was specified in
       the range.len field.

EINVAL Either range.start or range.len was not a multiple of the system
       page size; or range.len was zero; or the range specified was
       invalid.

EINVAL An invalid bit was specified in the mode field.

ESRCH (since Linux 4.13)
       The faulting process has exited at the time of a UFFDIO_ZEROPAGE
       operation.

UFFDIO_WAKE
(Since Linux 4.3.) Wake up the thread waiting for page-fault
resolution on a specified memory address range.

The UFFDIO_WAKE operation is used in conjunction with UFFDIO_COPY and
UFFDIO_ZEROPAGE operations that have the UFFDIO_COPY_MODE_DONTWAKE or
UFFDIO_ZEROPAGE_MODE_DONTWAKE bit set in the mode field. The
userfaultfd monitor can perform several UFFDIO_COPY and UFFDIO_ZEROPAGE
operations in a batch and then explicitly wake up the faulting thread
using UFFDIO_WAKE.

The argp argument is a pointer to a uffdio_range structure (shown
above) that specifies the address range.

This ioctl(2) operation returns 0 on success. On error, -1 is returned
and errno is set to indicate the cause of the error. Possible errors
include:

EINVAL The start or the len field of the uffdio_range structure was not
       a multiple of the system page size; or len was zero; or the
       specified range was otherwise invalid.

RETURN VALUE
See descriptions of the individual operations, above.

ERRORS
See descriptions of the individual operations, above. In addition, the
following general errors can occur for all of the operations described
above:

EFAULT argp does not point to a valid memory address.

EINVAL (For all operations except UFFDIO_API.) The userfaultfd object
       has not yet been enabled (via the UFFDIO_API operation).

CONFORMING TO
These ioctl(2) operations are Linux-specific.

BUGS
In order to detect available userfaultfd features and enable some
subset of those features, the userfaultfd file descriptor must be
closed after the first UFFDIO_API operation that queries feature
availability and reopened before the second UFFDIO_API operation that
actually enables the desired features.
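
The two-pass workaround described above might be sketched as follows.
The helper names are hypothetical, and the descriptor is assumed to be
created through syscall(2). The first descriptor is used only to read
the kernel's feature mask and is then closed; the second descriptor
enables the subset of the wanted features that the kernel reported.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <stddef.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Create a userfaultfd and run UFFDIO_API with the given features. */
static int uffd_handshake(__u64 features, struct uffdio_api *api)
{
    int fd = (int) syscall(SYS_userfaultfd, O_CLOEXEC | O_NONBLOCK);
    if (fd == -1)
        return -1;

    memset(api, 0, sizeof(*api));
    api->api = UFFD_API;
    api->features = features;

    if (ioctl(fd, UFFDIO_API, api) == -1) {
        int saved_errno = errno;    /* keep the ioctl error across close() */
        close(fd);
        errno = saved_errno;
        return -1;
    }
    return fd;
}

/*
 * Hypothetical helper: enable whichever of the `wanted` features the
 * kernel supports.  Returns a ready descriptor and stores the enabled
 * subset in *enabled (if non-NULL), or returns -1 with errno set.
 */
int uffd_open_with(__u64 wanted, __u64 *enabled)
{
    struct uffdio_api api;

    /* Pass 1: request no features, just read the available mask. */
    int fd = uffd_handshake(0, &api);
    if (fd == -1)
        return -1;
    close(fd);                      /* this descriptor cannot be reused */

    __u64 usable = wanted & api.features;

    /* Pass 2: a fresh descriptor, enabling only the usable subset. */
    fd = uffd_handshake(usable, &api);
    if (fd == -1)
        return -1;

    if (enabled != NULL)
        *enabled = usable;
    return fd;
}
```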

EXAMPLES
See userfaultfd(2).

SEE ALSO
ioctl(2), mmap(2), userfaultfd(2)

Documentation/admin-guide/mm/userfaultfd.rst in the Linux kernel source
tree

COLOPHON
This page is part of release 5.10 of the Linux man-pages project. A
description of the project, information about reporting bugs, and the
latest version of this page, can be found at
https://www.kernel.org/doc/man-pages/.

Linux 2020-06-09 IOCTL_USERFAULTFD(2)