IOCTL_USERFAULTFD(2) Linux Programmer's Manual IOCTL_USERFAULTFD(2)

NAME
ioctl_userfaultfd - create a file descriptor for handling page faults in user space

SYNOPSIS
#include <linux/userfaultfd.h>  /* Definition of UFFD* constants */
#include <sys/ioctl.h>

int ioctl(int fd, int cmd, ...);

DESCRIPTION
Various ioctl(2) operations can be performed on a userfaultfd object
(created by a call to userfaultfd(2)) using calls of the form:

    ioctl(fd, cmd, argp);

In the above, fd is a file descriptor referring to a userfaultfd
object, cmd is one of the commands listed below, and argp is a pointer
to a data structure that is specific to cmd.

The various ioctl(2) operations are described below. The UFFDIO_API,
UFFDIO_REGISTER, and UFFDIO_UNREGISTER operations are used to configure
userfaultfd behavior. These operations allow the caller to choose what
features will be enabled and what kinds of events will be delivered to
the application. The remaining operations are range operations, which
enable the calling application to resolve page-fault events.

UFFDIO_API
(Since Linux 4.3.) Enable operation of the userfaultfd and perform the
API handshake.

The argp argument is a pointer to a uffdio_api structure, defined as:

    struct uffdio_api {
        __u64 api;        /* Requested API version (input) */
        __u64 features;   /* Requested features (input/output) */
        __u64 ioctls;     /* Available ioctl() operations (output) */
    };

The api field denotes the API version requested by the application; it
should be set to UFFD_API.

The kernel verifies that it can support the requested API version, and
sets the features and ioctls fields to bit masks representing all the
available features and the generic ioctl(2) operations available.
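
The handshake can be sketched in C as follows. This is a minimal sketch, not a reference implementation: the helper names uffd_open() and uffd_handshake() are hypothetical, the descriptor is obtained by invoking the raw userfaultfd system call through syscall(2), and depending on system configuration that call may be refused (for example, with EPERM).

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>              /* O_CLOEXEC, O_NONBLOCK */
#include <linux/userfaultfd.h>  /* struct uffdio_api, UFFD_API, UFFDIO_API */
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>        /* SYS_userfaultfd */
#include <unistd.h>

/* Hypothetical helper: create a userfaultfd object via the raw system
   call. Returns a file descriptor, or -1 with errno set. */
int uffd_open(void)
{
    return (int) syscall(SYS_userfaultfd, O_CLOEXEC | O_NONBLOCK);
}

/* Hypothetical helper: perform the UFFDIO_API handshake without
   requesting any optional features. On success, *features and *ioctls
   receive the bit masks reported by the kernel. */
int uffd_handshake(int fd, __u64 *features, __u64 *ioctls)
{
    struct uffdio_api api;

    memset(&api, 0, sizeof(api));
    api.api = UFFD_API;   /* requested API version */
    api.features = 0;     /* required to be zero before Linux 4.11 */

    if (ioctl(fd, UFFDIO_API, &api) == -1)
        return -1;        /* errno set by ioctl(2) */

    *features = api.features;
    *ioctls = api.ioctls;
    return 0;
}
```

A caller would typically inspect the returned ioctls mask before relying on any particular operation.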

For Linux kernel versions before 4.11, the features field must be
initialized to zero before the call to UFFDIO_API, and zero (i.e., no
feature bits) is placed in the features field by the kernel upon
return from ioctl(2).

Starting from Linux 4.11, the features field can be used to ask whether
particular features are supported and to explicitly enable userfaultfd
features that are disabled by default. The kernel always reports all
the available features in the features field.

To enable userfaultfd features, the application should set a bit
corresponding to each feature it wants to enable in the features field.
If the kernel supports all the requested features, it will enable them.
Otherwise, it will zero out the returned uffdio_api structure and
return EINVAL.
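
Because a feature request is all-or-nothing, a caller that already knows the available features (for example, from an earlier probe; see BUGS) can compute in advance which requested bits would cause EINVAL. The sketch below assumes that workflow; uffd_unsupported() and uffd_enable_features() are hypothetical names.

```c
#include <errno.h>
#include <linux/userfaultfd.h>
#include <string.h>
#include <sys/ioctl.h>

/* Hypothetical helper: the bits of 'wanted' that are absent from
   'available'. If the result is nonzero, requesting 'wanted' would
   fail as a whole and enable nothing. */
__u64 uffd_unsupported(__u64 available, __u64 wanted)
{
    return wanted & ~available;
}

/* Hypothetical helper: request the features in 'wanted' on a
   userfaultfd that has not yet been enabled. On success, *reported
   receives every feature the kernel supports (not only 'wanted'). */
int uffd_enable_features(int fd, __u64 wanted, __u64 *reported)
{
    struct uffdio_api api;

    memset(&api, 0, sizeof(api));
    api.api = UFFD_API;
    api.features = wanted;
    if (ioctl(fd, UFFDIO_API, &api) == -1)
        return -1;   /* e.g. EINVAL: some bit in 'wanted' unsupported */
    *reported = api.features;
    return 0;
}
```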

The following feature bits may be set:

UFFD_FEATURE_EVENT_FORK (since Linux 4.11)
    When this feature is enabled, the userfaultfd objects associated
    with a parent process are duplicated into the child process during
    fork(2) and a UFFD_EVENT_FORK event is delivered to the userfaultfd
    monitor.

UFFD_FEATURE_EVENT_REMAP (since Linux 4.11)
    If this feature is enabled, when the faulting process invokes
    mremap(2), the userfaultfd monitor will receive an event of type
    UFFD_EVENT_REMAP.

UFFD_FEATURE_EVENT_REMOVE (since Linux 4.11)
    If this feature is enabled, when the faulting process calls
    madvise(2) with the MADV_DONTNEED or MADV_REMOVE advice value to free
    a virtual memory area, the userfaultfd monitor will receive an event
    of type UFFD_EVENT_REMOVE.

UFFD_FEATURE_EVENT_UNMAP (since Linux 4.11)
    If this feature is enabled, when the faulting process unmaps virtual
    memory either explicitly with munmap(2), or implicitly during either
    mmap(2) or mremap(2), the userfaultfd monitor will receive an event
    of type UFFD_EVENT_UNMAP.

UFFD_FEATURE_MISSING_HUGETLBFS (since Linux 4.11)
    If this feature bit is set, the kernel supports registering
    userfaultfd ranges on hugetlbfs virtual memory areas.

UFFD_FEATURE_MISSING_SHMEM (since Linux 4.11)
    If this feature bit is set, the kernel supports registering
    userfaultfd ranges on shared memory areas. This includes all kernel
    shared memory APIs: System V shared memory, tmpfs(5), shared mappings
    of /dev/zero, mmap(2) with the MAP_SHARED flag set, memfd_create(2),
    and so on.

UFFD_FEATURE_SIGBUS (since Linux 4.14)
    If this feature bit is set, no page-fault events
    (UFFD_EVENT_PAGEFAULT) will be delivered. Instead, a SIGBUS signal
    will be sent to the faulting process. Applications using this
    feature will not require the use of a userfaultfd monitor for
    processing memory accesses to the regions registered with
    userfaultfd.

UFFD_FEATURE_THREAD_ID (since Linux 4.14)
    If this feature bit is set, uffd_msg.pagefault.feat.ptid will be set
    to the ID of the faulting thread for each page-fault message.
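
When diagnosing which of these features a kernel reports, it can help to turn the mask into names. The following sketch is illustrative only: the table covers just the bits listed above, uffd_format_features() is a hypothetical helper, and any bit it does not recognize (for example, one added by a later kernel) is returned to the caller instead of being printed.

```c
#include <linux/userfaultfd.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Feature bits documented on this page, with printable names. */
static const struct {
    __u64 bit;
    const char *name;
} uffd_feature_names[] = {
    { UFFD_FEATURE_EVENT_FORK,        "EVENT_FORK" },
    { UFFD_FEATURE_EVENT_REMAP,       "EVENT_REMAP" },
    { UFFD_FEATURE_EVENT_REMOVE,      "EVENT_REMOVE" },
    { UFFD_FEATURE_EVENT_UNMAP,       "EVENT_UNMAP" },
    { UFFD_FEATURE_MISSING_HUGETLBFS, "MISSING_HUGETLBFS" },
    { UFFD_FEATURE_MISSING_SHMEM,     "MISSING_SHMEM" },
    { UFFD_FEATURE_SIGBUS,            "SIGBUS" },
    { UFFD_FEATURE_THREAD_ID,         "THREAD_ID" },
};

/* Hypothetical helper: write the names of the known bits in 'features',
   separated by '|', into buf (NUL-terminated if size > 0, possibly
   truncated). Returns the bits that have no entry in the table. */
__u64 uffd_format_features(__u64 features, char *buf, size_t size)
{
    size_t used = 0;
    __u64 unknown = features;
    size_t n = sizeof(uffd_feature_names) / sizeof(uffd_feature_names[0]);

    if (size > 0)
        buf[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        if (!(features & uffd_feature_names[i].bit))
            continue;
        unknown &= ~uffd_feature_names[i].bit;
        if (used < size) {
            int w = snprintf(buf + used, size - used, "%s%s",
                             used > 0 ? "|" : "", uffd_feature_names[i].name);
            if (w > 0)
                used += (size_t) w;
        }
    }
    return unknown;
}
```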

The returned ioctls field can contain the following bits:

1 << _UFFDIO_API
    The UFFDIO_API operation is supported.

1 << _UFFDIO_REGISTER
    The UFFDIO_REGISTER operation is supported.

1 << _UFFDIO_UNREGISTER
    The UFFDIO_UNREGISTER operation is supported.

1 << _UFFDIO_WRITEPROTECT
    The UFFDIO_WRITEPROTECT operation is supported.
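
Testing for an operation in a returned ioctls mask is a single shift and AND. A minimal sketch; uffd_ioctl_supported() is a hypothetical helper, and the _UFFDIO_* numbers come from <linux/userfaultfd.h>.

```c
#include <linux/userfaultfd.h>
#include <stdbool.h>

/* Hypothetical helper: true if the operation whose _UFFDIO_* number is
   'nr' is present in an ioctls mask returned by the kernel. */
bool uffd_ioctl_supported(__u64 ioctls, unsigned int nr)
{
    if (nr >= 64)
        return false;   /* out of range for a 64-bit mask */
    return (ioctls & ((__u64) 1 << nr)) != 0;
}
```

The same check applies to the ioctls mask returned by UFFDIO_REGISTER, which uses the same bit encoding.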

This ioctl(2) operation returns 0 on success. On error, -1 is returned
and errno is set to indicate the error. Possible errors include:

EFAULT argp refers to an address that is outside the calling process's
       accessible address space.

EINVAL The userfaultfd has already been enabled by a previous UFFDIO_API
       operation.

EINVAL The API version requested in the api field is not supported by
       this kernel, or the features field passed to the kernel includes
       feature bits that are not supported by the current kernel version.

UFFDIO_REGISTER
(Since Linux 4.3.) Register a memory address range with the userfaultfd
object. The pages in the range must be "compatible".

Before Linux 4.11, only private anonymous ranges are compatible for
registering with UFFDIO_REGISTER.

Since Linux 4.11, hugetlbfs and shared memory ranges are also
compatible with UFFDIO_REGISTER.

The argp argument is a pointer to a uffdio_register structure, defined
as:

    struct uffdio_range {
        __u64 start;    /* Start of range */
        __u64 len;      /* Length of range (bytes) */
    };

    struct uffdio_register {
        struct uffdio_range range;
        __u64 mode;     /* Desired mode of operation (input) */
        __u64 ioctls;   /* Available ioctl() operations (output) */
    };

The range field defines a memory range starting at start and continuing
for len bytes that should be handled by the userfaultfd.

The mode field defines the mode of operation desired for this memory
region. The following values may be bitwise ORed to set the userfaultfd
mode for the specified range:

UFFDIO_REGISTER_MODE_MISSING
    Track page faults on missing pages.

UFFDIO_REGISTER_MODE_WP
    Track page faults on write-protected pages.

If the operation is successful, the kernel modifies the ioctls bit mask
field to indicate which ioctl(2) operations are available for the
specified range. The returned bit mask uses the same 1 << _UFFDIO_*
encoding as the ioctls field returned by UFFDIO_API.
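
A registration request can be sketched as below, assuming an anonymous private mapping (for example, one created with mmap(2)) and missing-page tracking only. uffd_range_ok() and uffd_register_missing() are hypothetical helpers; the pre-check mirrors the page-alignment rules listed under EINVAL, so that such mistakes are caught before the ioctl(2) call.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/userfaultfd.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical helper: nonzero if start and len are both multiples of
   the page size and len is nonzero. */
int uffd_range_ok(__u64 start, __u64 len, __u64 page)
{
    return len != 0 && start % page == 0 && len % page == 0;
}

/* Hypothetical helper: register [start, start + len) for missing-page
   tracking. On success, *ioctls receives the operations available on the
   range; a caller would normally check it for 1 << _UFFDIO_COPY. */
int uffd_register_missing(int fd, void *start, size_t len, __u64 *ioctls)
{
    struct uffdio_register reg;
    __u64 page = (__u64) sysconf(_SC_PAGESIZE);

    if (!uffd_range_ok((__u64) (uintptr_t) start, len, page)) {
        errno = EINVAL;   /* the kernel would reject this range too */
        return -1;
    }

    memset(&reg, 0, sizeof(reg));
    reg.range.start = (__u64) (uintptr_t) start;
    reg.range.len = len;
    reg.mode = UFFDIO_REGISTER_MODE_MISSING;

    if (ioctl(fd, UFFDIO_REGISTER, &reg) == -1)
        return -1;
    *ioctls = reg.ioctls;
    return 0;
}
```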

This ioctl(2) operation returns 0 on success. On error, -1 is returned
and errno is set to indicate the error. Possible errors include:

EBUSY  A mapping in the specified range is registered with another
       userfaultfd object.

EFAULT argp refers to an address that is outside the calling process's
       accessible address space.

EINVAL An invalid or unsupported bit was specified in the mode field; or
       the mode field was zero.

EINVAL There is no mapping in the specified address range.

EINVAL range.start or range.len is not a multiple of the system page
       size; or, range.len is zero; or these fields are otherwise
       invalid.

EINVAL There was an incompatible mapping in the specified address range.

UFFDIO_UNREGISTER
(Since Linux 4.3.) Unregister a memory address range from userfaultfd.
The pages in the range must be "compatible" (see the description of
UFFDIO_REGISTER).

The address range to unregister is specified in the uffdio_range
structure pointed to by argp.
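
Because UFFDIO_UNREGISTER takes a bare uffdio_range rather than a wrapper structure, the call is short. A minimal sketch with the hypothetical helper uffd_unregister():

```c
#include <errno.h>
#include <linux/userfaultfd.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>

/* Hypothetical helper: stop userfaultfd handling of [start, start + len).
   start and len must satisfy the same page-size rules as for
   UFFDIO_REGISTER. Returns 0, or -1 with errno set. */
int uffd_unregister(int fd, void *start, size_t len)
{
    struct uffdio_range range = {
        .start = (__u64) (uintptr_t) start,
        .len = len,
    };

    return ioctl(fd, UFFDIO_UNREGISTER, &range);
}
```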

This ioctl(2) operation returns 0 on success. On error, -1 is returned
and errno is set to indicate the error. Possible errors include:

EINVAL Either the start or the len field of the uffdio_range structure
       was not a multiple of the system page size; or the len field was
       zero; or these fields were otherwise invalid.

EINVAL There was an incompatible mapping in the specified address range.

EINVAL There was no mapping in the specified address range.

UFFDIO_COPY
(Since Linux 4.3.) Atomically copy a contiguous memory chunk into the
userfaultfd-registered range and optionally wake up the blocked thread.
The source and destination addresses and the number of bytes to copy
are specified by the src, dst, and len fields of the uffdio_copy
structure pointed to by argp:

    struct uffdio_copy {
        __u64 dst;    /* Destination of copy */
        __u64 src;    /* Source of copy */
        __u64 len;    /* Number of bytes to copy */
        __u64 mode;   /* Flags controlling behavior of copy */
        __s64 copy;   /* Number of bytes copied, or negated error */
    };

The following values may be bitwise ORed in mode to change the behavior
of the UFFDIO_COPY operation:

UFFDIO_COPY_MODE_DONTWAKE
    Do not wake up the thread that waits for page-fault resolution.

UFFDIO_COPY_MODE_WP
    Copy the page with read-only permission. This allows the user to
    trap the next write to the page, which will block and generate
    another write-protect userfault message. This is used only when both
    UFFDIO_REGISTER_MODE_MISSING and UFFDIO_REGISTER_MODE_WP modes are
    enabled for the registered range.

The copy field is used by the kernel to return the number of bytes that
were actually copied, or an error (a negated errno-style value). If the
value returned in copy doesn't match the value that was specified in
len, the operation fails with the error EAGAIN. The copy field is
output-only; it is not read by the UFFDIO_COPY operation.
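
Since a short copy is reported as EAGAIN together with a positive count in the copy field, a monitor can resume from where the kernel stopped. The sketch below assumes that policy (retrying only when the kernel made progress, and treating anything else as an error); uffd_copy_all() is a hypothetical helper, and dst is assumed to lie in a range registered with UFFDIO_REGISTER_MODE_MISSING.

```c
#include <errno.h>
#include <linux/userfaultfd.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

/* Hypothetical helper: copy len bytes from src into the registered
   range at dst, waking waiting threads as pages are filled. Retries
   the remainder after a partial copy. Returns 0, or -1 with errno set. */
int uffd_copy_all(int fd, void *dst, const void *src, __u64 len)
{
    __u64 done = 0;

    while (done < len) {
        struct uffdio_copy c;

        memset(&c, 0, sizeof(c));
        c.dst = (__u64) (uintptr_t) dst + done;
        c.src = (__u64) (uintptr_t) src + done;
        c.len = len - done;
        c.mode = 0;            /* no DONTWAKE: wake waiters */

        if (ioctl(fd, UFFDIO_COPY, &c) == 0)
            return 0;          /* entire remaining area copied */
        if (errno != EAGAIN || c.copy <= 0)
            return -1;         /* real error, or no progress made */
        done += (__u64) c.copy;
    }
    return 0;
}
```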

This ioctl(2) operation returns 0 on success. In this case, the entire
area was copied. On error, -1 is returned and errno is set to indicate
the error. Possible errors include:

EAGAIN The number of bytes copied (i.e., the value returned in the copy
       field) does not equal the value that was specified in the len
       field.

EINVAL Either dst or len was not a multiple of the system page size, or
       the range specified by src and len or dst and len was invalid.

EINVAL An invalid bit was specified in the mode field.

ENOENT (since Linux 4.11)
       The faulting process has changed its virtual memory layout
       simultaneously with an outstanding UFFDIO_COPY operation.

ENOSPC (from Linux 4.11 until Linux 4.13)
       The faulting process has exited at the time of a UFFDIO_COPY
       operation.

ESRCH  (since Linux 4.13)
       The faulting process has exited at the time of a UFFDIO_COPY
       operation.

UFFDIO_ZEROPAGE
(Since Linux 4.3.) Zero out a memory range registered with userfaultfd.

The requested range is specified by the range field of the
uffdio_zeropage structure pointed to by argp:

    struct uffdio_zeropage {
        struct uffdio_range range;
        __u64 mode;       /* Flags controlling behavior of zeroing */
        __s64 zeropage;   /* Number of bytes zeroed, or negated error */
    };

The following value may be bitwise ORed in mode to change the behavior
of the UFFDIO_ZEROPAGE operation:

UFFDIO_ZEROPAGE_MODE_DONTWAKE
    Do not wake up the thread that waits for page-fault resolution.

The zeropage field is used by the kernel to return the number of bytes
that were actually zeroed, or an error in the same manner as
UFFDIO_COPY. If the value returned in the zeropage field doesn't match
the value that was specified in range.len, the operation fails with the
error EAGAIN. The zeropage field is output-only; it is not read by the
UFFDIO_ZEROPAGE operation.
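
A zero-fill resolution differs from UFFDIO_COPY mainly in needing no source buffer. A minimal sketch with the hypothetical helper uffd_zero(), which reports a partial result to the caller rather than retrying:

```c
#include <errno.h>
#include <linux/userfaultfd.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

/* Hypothetical helper: resolve missing-page faults in [start, start+len)
   with zero-filled pages. If dontwake is nonzero, waiting threads stay
   blocked until a later UFFDIO_WAKE. Returns the number of bytes zeroed
   (less than len after a partial EAGAIN), or -1 with errno set. */
long long uffd_zero(int fd, void *start, __u64 len, int dontwake)
{
    struct uffdio_zeropage z;

    memset(&z, 0, sizeof(z));
    z.range.start = (__u64) (uintptr_t) start;
    z.range.len = len;
    z.mode = dontwake ? UFFDIO_ZEROPAGE_MODE_DONTWAKE : 0;

    if (ioctl(fd, UFFDIO_ZEROPAGE, &z) == -1) {
        /* On EAGAIN, z.zeropage may hold the bytes zeroed so far */
        return (errno == EAGAIN && z.zeropage > 0) ? z.zeropage : -1;
    }
    return z.zeropage;
}
```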

This ioctl(2) operation returns 0 on success. In this case, the entire
area was zeroed. On error, -1 is returned and errno is set to indicate
the error. Possible errors include:

EAGAIN The number of bytes zeroed (i.e., the value returned in the
       zeropage field) does not equal the value that was specified in
       the range.len field.

EINVAL Either range.start or range.len was not a multiple of the system
       page size; or range.len was zero; or the range specified was
       invalid.

EINVAL An invalid bit was specified in the mode field.

ESRCH  (since Linux 4.13)
       The faulting process has exited at the time of a UFFDIO_ZEROPAGE
       operation.

UFFDIO_WAKE
(Since Linux 4.3.) Wake up the thread waiting for page-fault resolution
on a specified memory address range.

The UFFDIO_WAKE operation is used in conjunction with UFFDIO_COPY and
UFFDIO_ZEROPAGE operations that have the UFFDIO_COPY_MODE_DONTWAKE or
UFFDIO_ZEROPAGE_MODE_DONTWAKE bit set in the mode field. The userfaultfd
monitor can perform several UFFDIO_COPY and UFFDIO_ZEROPAGE operations
in a batch and then explicitly wake up the faulting thread using
UFFDIO_WAKE.

The argp argument is a pointer to a uffdio_range structure (shown
above) that specifies the address range.
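
The batching pattern just described can be sketched as follows. This is an illustration under stated assumptions, not a prescribed algorithm: struct uffd_batch_item, uffd_batch_span(), and uffd_copy_batch() are hypothetical; every item is assumed to be page-aligned and inside a registered range; and a single UFFDIO_WAKE covers the span from the lowest to the highest destination, so a thread blocked on a page in a gap between items would be woken too and, presumably, fault again.

```c
#include <errno.h>
#include <linux/userfaultfd.h>
#include <stddef.h>
#include <string.h>
#include <sys/ioctl.h>

/* Hypothetical batch entry: copy len bytes from src to dst. */
struct uffd_batch_item {
    __u64 dst, src, len;
};

/* Smallest range covering the destinations of the first n items. */
struct uffdio_range uffd_batch_span(const struct uffd_batch_item *it, size_t n)
{
    struct uffdio_range r = { 0, 0 };
    __u64 lo = 0, hi = 0;

    for (size_t i = 0; i < n; i++) {
        if (i == 0 || it[i].dst < lo)
            lo = it[i].dst;
        if (i == 0 || it[i].dst + it[i].len > hi)
            hi = it[i].dst + it[i].len;
    }
    if (n > 0) {
        r.start = lo;
        r.len = hi - lo;
    }
    return r;
}

/* Copy every item with UFFDIO_COPY_MODE_DONTWAKE, then wake waiters
   once. If a copy fails, the items copied before it are still woken
   and the error from the failed copy is reported. */
int uffd_copy_batch(int fd, const struct uffd_batch_item *it, size_t n)
{
    size_t done = 0;
    int saved_errno = 0;

    for (; done < n; done++) {
        struct uffdio_copy c;

        memset(&c, 0, sizeof(c));
        c.dst = it[done].dst;
        c.src = it[done].src;
        c.len = it[done].len;
        c.mode = UFFDIO_COPY_MODE_DONTWAKE;
        if (ioctl(fd, UFFDIO_COPY, &c) == -1) {
            saved_errno = errno;
            break;
        }
    }

    if (done > 0) {
        struct uffdio_range span = uffd_batch_span(it, done);

        if (ioctl(fd, UFFDIO_WAKE, &span) == -1 && saved_errno == 0)
            return -1;
    }
    if (saved_errno != 0) {
        errno = saved_errno;
        return -1;
    }
    return 0;
}
```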

This ioctl(2) operation returns 0 on success. On error, -1 is returned
and errno is set to indicate the error. Possible errors include:

EINVAL The start or the len field of the uffdio_range structure was not
       a multiple of the system page size; or len was zero; or the
       specified range was otherwise invalid.

UFFDIO_WRITEPROTECT
(Since Linux 5.7.) Write-protect or write-unprotect a memory range that
was registered with mode UFFDIO_REGISTER_MODE_WP.

The argp argument is a pointer to a uffdio_writeprotect structure, as
shown below:

    struct uffdio_writeprotect {
        struct uffdio_range range; /* Range to change write permission */
        __u64 mode;                /* Mode to change write permission */
    };

There are two mode bits that are supported in this structure:

UFFDIO_WRITEPROTECT_MODE_WP
    When this mode bit is set, the ioctl will be a write-protect
    operation upon the memory range specified by range. Otherwise it
    will be a write-unprotect operation upon the specified range, which
    can be used to resolve a userfaultfd write-protect page fault.

UFFDIO_WRITEPROTECT_MODE_DONTWAKE
    When this mode bit is set, do not wake up any thread that waits for
    page-fault resolution after the operation. This can be specified
    only if UFFDIO_WRITEPROTECT_MODE_WP is not specified.
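
A sketch of both directions, assuming the range was registered with UFFDIO_REGISTER_MODE_WP. uffd_wp_mode() and uffd_writeprotect() are hypothetical helpers: the first encodes the rule that DONTWAKE is accepted only for unprotect requests, and the second retries while the call fails with EAGAIN (an interrupted call), with no upper bound on retries.

```c
#include <errno.h>
#include <linux/userfaultfd.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

/* Hypothetical helper: mode bits for a protect (wp != 0) or unprotect
   request. DONTWAKE may not be combined with MODE_WP, so it is
   silently dropped for protect requests. */
__u64 uffd_wp_mode(int wp, int dontwake)
{
    if (wp)
        return UFFDIO_WRITEPROTECT_MODE_WP;
    return dontwake ? UFFDIO_WRITEPROTECT_MODE_DONTWAKE : 0;
}

/* Hypothetical helper: change write protection on [start, start + len).
   Retries for as long as the call fails with EAGAIN. */
int uffd_writeprotect(int fd, void *start, __u64 len, int wp, int dontwake)
{
    struct uffdio_writeprotect w;
    int rc;

    memset(&w, 0, sizeof(w));
    w.range.start = (__u64) (uintptr_t) start;
    w.range.len = len;
    w.mode = uffd_wp_mode(wp, dontwake);

    do
        rc = ioctl(fd, UFFDIO_WRITEPROTECT, &w);
    while (rc == -1 && errno == EAGAIN);
    return rc;
}
```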

This ioctl(2) operation returns 0 on success. On error, -1 is returned
and errno is set to indicate the error. Possible errors include:

EINVAL The start or the len field of the uffdio_range structure was not
       a multiple of the system page size; or len was zero; or the
       specified range was otherwise invalid.

EAGAIN The process was interrupted; retry this call.

ENOENT The range specified in range is not valid. For example, the
       virtual address does not exist, or is not registered with
       userfaultfd write-protect mode.

EFAULT Encountered a generic fault during processing.

RETURN VALUE
See descriptions of the individual operations, above.

ERRORS
See descriptions of the individual operations, above. In addition, the
following general errors can occur for all of the operations described
above:

EFAULT argp does not point to a valid memory address.

EINVAL (For all operations except UFFDIO_API.) The userfaultfd object
       has not yet been enabled (via the UFFDIO_API operation).

CONFORMING TO
These ioctl(2) operations are Linux-specific.

BUGS
In order to detect available userfault features and enable some subset
of those features, the userfaultfd file descriptor must be closed after
the first UFFDIO_API operation that queries feature availability and
reopened before the second UFFDIO_API operation that actually enables
the desired features.
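
The probe-then-reopen sequence described above can be sketched as follows. It is a workaround sketch under stated assumptions: uffd_open_with(), uffd_api(), and close_keep_errno() are hypothetical helpers, the descriptor is created with the raw userfaultfd system call through syscall(2), and features the kernel does not report are silently dropped from the request rather than treated as an error.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: one UFFDIO_API call requesting 'features'.
   On success, *reported receives all features the kernel supports. */
static int uffd_api(int fd, __u64 features, __u64 *reported)
{
    struct uffdio_api api;

    memset(&api, 0, sizeof(api));
    api.api = UFFD_API;
    api.features = features;
    if (ioctl(fd, UFFDIO_API, &api) == -1)
        return -1;
    *reported = api.features;
    return 0;
}

/* Hypothetical helper: close 'fd' without clobbering errno. */
static void close_keep_errno(int fd)
{
    int saved = errno;

    close(fd);
    errno = saved;
}

/* Hypothetical helper: return a userfaultfd with the supported subset
   of 'wanted' enabled (stored in *enabled), or -1 with errno set. */
int uffd_open_with(__u64 wanted, __u64 *enabled)
{
    __u64 avail;
    int fd = (int) syscall(SYS_userfaultfd, O_CLOEXEC | O_NONBLOCK);

    if (fd == -1)
        return -1;
    if (uffd_api(fd, 0, &avail) == -1) {   /* probe: request nothing */
        close_keep_errno(fd);
        return -1;
    }
    close(fd);   /* the probe enabled this descriptor; it cannot be reused */

    fd = (int) syscall(SYS_userfaultfd, O_CLOEXEC | O_NONBLOCK);
    if (fd == -1)
        return -1;
    wanted &= avail;                       /* drop unsupported bits */
    if (uffd_api(fd, wanted, &avail) == -1) {
        close_keep_errno(fd);
        return -1;
    }
    *enabled = wanted;
    return fd;
}
```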

EXAMPLES
See userfaultfd(2).

SEE ALSO
ioctl(2), mmap(2), userfaultfd(2)

Documentation/admin-guide/mm/userfaultfd.rst in the Linux kernel source
tree

COLOPHON
This page is part of release 5.13 of the Linux man-pages project. A
description of the project, information about reporting bugs, and the
latest version of this page, can be found at
https://www.kernel.org/doc/man-pages/.

Linux 2021-03-22 IOCTL_USERFAULTFD(2)