ioctl_userfaultfd(2)       System Calls Manual       ioctl_userfaultfd(2)

NAME
       ioctl_userfaultfd - create a file descriptor for handling page faults
       in user space

LIBRARY
       Standard C library (libc, -lc)

SYNOPSIS
       #include <linux/userfaultfd.h>  /* Definition of UFFD* constants */
       #include <sys/ioctl.h>

       int ioctl(int fd, int cmd, ...);

DESCRIPTION
       Various ioctl(2) operations can be performed on a userfaultfd object
       (created by a call to userfaultfd(2)) using calls of the form:

           ioctl(fd, cmd, argp);

       In the above, fd is a file descriptor referring to a userfaultfd
       object, cmd is one of the commands listed below, and argp is a pointer
       to a data structure that is specific to cmd.

       The various ioctl(2) operations are described below. The UFFDIO_API,
       UFFDIO_REGISTER, and UFFDIO_UNREGISTER operations are used to configure
       userfaultfd behavior. These operations allow the caller to choose what
       features will be enabled and what kinds of events will be delivered to
       the application. The remaining operations are range operations. These
       operations enable the calling application to resolve page-fault events.

   UFFDIO_API
       (Since Linux 4.3.) Enable operation of the userfaultfd and perform the
       API handshake.

       The argp argument is a pointer to a uffdio_api structure, defined as:

           struct uffdio_api {
               __u64 api;        /* Requested API version (input) */
               __u64 features;   /* Requested features (input/output) */
               __u64 ioctls;     /* Available ioctl() operations (output) */
           };

       The api field denotes the API version requested by the application.

       The kernel verifies that it can support the requested API version, and
       sets the features and ioctls fields to bit masks representing all the
       available features and the generic ioctl(2) operations available.

       Before Linux 4.11, the features field must be initialized to zero
       before the call to UFFDIO_API, and zero (i.e., no feature bits) is
       placed in the features field by the kernel upon return from ioctl(2).

       Starting from Linux 4.11, the features field can be used to ask whether
       particular features are supported and to explicitly enable userfaultfd
       features that are disabled by default. The kernel always reports all
       the available features in the features field.

       To enable userfaultfd features, the application should set a bit
       corresponding to each feature it wants to enable in the features field.
       If the kernel supports all the requested features, it will enable them.
       Otherwise, it will zero out the returned uffdio_api structure and the
       call fails with the error EINVAL.

       The following feature bits may be set:

       UFFD_FEATURE_EVENT_FORK (since Linux 4.11)
              When this feature is enabled, the userfaultfd objects associated
              with a parent process are duplicated into the child process
              during fork(2) and a UFFD_EVENT_FORK event is delivered to the
              userfaultfd monitor.

       UFFD_FEATURE_EVENT_REMAP (since Linux 4.11)
              If this feature is enabled, when the faulting process invokes
              mremap(2), the userfaultfd monitor will receive an event of type
              UFFD_EVENT_REMAP.

       UFFD_FEATURE_EVENT_REMOVE (since Linux 4.11)
              If this feature is enabled, when the faulting process calls
              madvise(2) with the MADV_DONTNEED or MADV_REMOVE advice value to
              free a virtual memory area, the userfaultfd monitor will receive
              an event of type UFFD_EVENT_REMOVE.

       UFFD_FEATURE_EVENT_UNMAP (since Linux 4.11)
              If this feature is enabled, when the faulting process unmaps
              virtual memory either explicitly with munmap(2), or implicitly
              during either mmap(2) or mremap(2), the userfaultfd monitor will
              receive an event of type UFFD_EVENT_UNMAP.

       UFFD_FEATURE_MISSING_HUGETLBFS (since Linux 4.11)
              If this feature bit is set, the kernel supports registering
              userfaultfd ranges on hugetlbfs virtual memory areas.

       UFFD_FEATURE_MISSING_SHMEM (since Linux 4.11)
              If this feature bit is set, the kernel supports registering
              userfaultfd ranges on shared memory areas. This includes all
              kernel shared memory APIs: System V shared memory, tmpfs(5),
              shared mappings of /dev/zero, mmap(2) with the MAP_SHARED flag
              set, memfd_create(2), and so on.

       UFFD_FEATURE_SIGBUS (since Linux 4.14)
              If this feature bit is set, no page-fault events
              (UFFD_EVENT_PAGEFAULT) will be delivered. Instead, a SIGBUS
              signal will be sent to the faulting process. Applications using
              this feature will not require the use of a userfaultfd monitor
              for processing memory accesses to the regions registered with
              userfaultfd.

       UFFD_FEATURE_THREAD_ID (since Linux 4.14)
              If this feature bit is set, uffd_msg.pagefault.feat.ptid will be
              set to the ID of the faulting thread for each page-fault
              message.

       UFFD_FEATURE_MINOR_HUGETLBFS (since Linux 5.13)
              If this feature bit is set, the kernel supports registering
              userfaultfd ranges in minor mode on hugetlbfs-backed memory
              areas.

       UFFD_FEATURE_MINOR_SHMEM (since Linux 5.14)
              If this feature bit is set, the kernel supports registering
              userfaultfd ranges in minor mode on shmem-backed memory areas.

       The returned ioctls field can contain the following bits:

       1 << _UFFDIO_API
              The UFFDIO_API operation is supported.

       1 << _UFFDIO_REGISTER
              The UFFDIO_REGISTER operation is supported.

       1 << _UFFDIO_UNREGISTER
              The UFFDIO_UNREGISTER operation is supported.

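       The bits listed above can be tested with a small helper. The following
       is a minimal sketch; uffd_ioctl_available() is a hypothetical name
       chosen for illustration and is not part of the userfaultfd API. The
       shift is done in 64 bits because the ioctls field is a __u64.

```c
#include <linux/userfaultfd.h>  /* _UFFDIO_* operation numbers, __u64 */
#include <stdbool.h>

/* Hypothetical helper: report whether the bit for operation number nr
   (for example, _UFFDIO_REGISTER) is set in an ioctls mask that was
   returned by the kernel. */
static bool
uffd_ioctl_available(__u64 ioctls, unsigned int nr)
{
    return (ioctls & ((__u64) 1 << nr)) != 0;
}
```
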
       This ioctl(2) operation returns 0 on success. On error, -1 is returned
       and errno is set to indicate the error. Possible errors include:

       EFAULT argp refers to an address that is outside the calling process's
              accessible address space.

       EINVAL The userfaultfd has already been enabled by a previous
              UFFDIO_API operation.

       EINVAL The API version requested in the api field is not supported by
              this kernel, or the features field passed to the kernel includes
              feature bits that are not supported by the current kernel
              version.

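       The handshake described above might be wrapped as follows. This is a
       sketch under stated assumptions: uffd_handshake() is a hypothetical
       helper, fd is assumed to come from userfaultfd(2), and UFFD_API is the
       API version constant provided by <linux/userfaultfd.h>.

```c
#include <errno.h>
#include <linux/userfaultfd.h>
#include <string.h>
#include <sys/ioctl.h>

/* Hypothetical helper: perform the UFFDIO_API handshake on fd, asking
   the kernel to enable the feature bits in wanted.  On success,
   returns 0 and leaves the kernel's reply (features, ioctls) in
   *reply.  On error, returns -1 with errno set by ioctl(2). */
static int
uffd_handshake(int fd, __u64 wanted, struct uffdio_api *reply)
{
    memset(reply, 0, sizeof(*reply));
    reply->api = UFFD_API;        /* Requested API version */
    reply->features = wanted;     /* Must be 0 before Linux 4.11 */

    if (ioctl(fd, UFFDIO_API, reply) == -1)
        return -1;                /* For example, EINVAL */

    return 0;
}
```

       A caller that passes 0 for wanted only learns which features are
       available; on kernels since Linux 4.11, reply->features then lists
       them.
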
   UFFDIO_REGISTER
       (Since Linux 4.3.) Register a memory address range with the userfaultfd
       object. The pages in the range must be "compatible". Please refer to
       the list of register modes below for the compatible memory backends for
       each mode.

       The argp argument is a pointer to a uffdio_register structure, defined
       as:

           struct uffdio_range {
               __u64 start;    /* Start of range */
               __u64 len;      /* Length of range (bytes) */
           };

           struct uffdio_register {
               struct uffdio_range range;
               __u64 mode;     /* Desired mode of operation (input) */
               __u64 ioctls;   /* Available ioctl() operations (output) */
           };

       The range field defines a memory range starting at start and continuing
       for len bytes that should be handled by the userfaultfd.

       The mode field defines the mode of operation desired for this memory
       region. The following values may be bitwise ORed to set the userfaultfd
       mode for the specified range:

       UFFDIO_REGISTER_MODE_MISSING
              Track page faults on missing pages. Since Linux 4.3, only
              private anonymous ranges are compatible. Since Linux 4.11,
              hugetlbfs and shared memory ranges are also compatible.

       UFFDIO_REGISTER_MODE_WP
              Track page faults on write-protected pages. Since Linux 5.7,
              only private anonymous ranges are compatible.

       UFFDIO_REGISTER_MODE_MINOR
              Track minor page faults. Since Linux 5.13, only hugetlbfs
              ranges are compatible. Since Linux 5.14, compatibility with
              shmem ranges was added.

       If the operation is successful, the kernel modifies the ioctls bit-mask
       field to indicate which ioctl(2) operations are available for the
       specified range. This returned bit mask can contain the following
       bits:

       1 << _UFFDIO_COPY
              The UFFDIO_COPY operation is supported.

       1 << _UFFDIO_WAKE
              The UFFDIO_WAKE operation is supported.

       1 << _UFFDIO_WRITEPROTECT
              The UFFDIO_WRITEPROTECT operation is supported.

       1 << _UFFDIO_ZEROPAGE
              The UFFDIO_ZEROPAGE operation is supported.

       1 << _UFFDIO_CONTINUE
              The UFFDIO_CONTINUE operation is supported.

       This ioctl(2) operation returns 0 on success. On error, -1 is returned
       and errno is set to indicate the error. Possible errors include:

       EBUSY  A mapping in the specified range is registered with another
              userfaultfd object.

       EFAULT argp refers to an address that is outside the calling process's
              accessible address space.

       EINVAL An invalid or unsupported bit was specified in the mode field;
              or the mode field was zero.

       EINVAL There is no mapping in the specified address range.

       EINVAL range.start or range.len is not a multiple of the system page
              size; or range.len is zero; or these fields are otherwise
              invalid.

       EINVAL There was an incompatible mapping in the specified address
              range.

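       Registering a range for missing-page tracking could look like the
       sketch below. uffd_register_missing() is a hypothetical helper; it
       assumes that addr and len are already multiples of the page size and
       that the userfaultfd has been enabled with UFFDIO_API.

```c
#include <errno.h>
#include <linux/userfaultfd.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

/* Hypothetical helper: register [addr, addr + len) with
   UFFDIO_REGISTER_MODE_MISSING.  On success, returns 0 and stores the
   per-range ioctls bit mask in *ioctls.  On error, returns -1 with
   errno set by ioctl(2) and leaves *ioctls untouched. */
static int
uffd_register_missing(int fd, void *addr, size_t len, __u64 *ioctls)
{
    struct uffdio_register reg;

    memset(&reg, 0, sizeof(reg));
    reg.range.start = (__u64) (uintptr_t) addr;
    reg.range.len = len;
    reg.mode = UFFDIO_REGISTER_MODE_MISSING;

    if (ioctl(fd, UFFDIO_REGISTER, &reg) == -1)
        return -1;

    *ioctls = reg.ioctls;   /* For example, test 1 << _UFFDIO_COPY */
    return 0;
}
```
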
   UFFDIO_UNREGISTER
       (Since Linux 4.3.) Unregister a memory address range from userfaultfd.
       The pages in the range must be "compatible" (see the description of
       UFFDIO_REGISTER).

       The address range to unregister is specified in the uffdio_range
       structure pointed to by argp.

       This ioctl(2) operation returns 0 on success. On error, -1 is returned
       and errno is set to indicate the error. Possible errors include:

       EINVAL Either the start or the len field of the uffdio_range structure
              was not a multiple of the system page size; or the len field was
              zero; or these fields were otherwise invalid.

       EINVAL There was an incompatible mapping in the specified address
              range.

       EINVAL There was no mapping in the specified address range.

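       The alignment rules in the first EINVAL entry above can be checked
       before calling the kernel. The sketch below uses two hypothetical
       helpers, uffd_range_aligned() and uffd_unregister(), and assumes that
       the base page size reported by sysconf(3) is the relevant one (ranges
       backed by huge pages are not considered here).

```c
#include <errno.h>
#include <linux/userfaultfd.h>
#include <stdbool.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical helper: true if len is nonzero and both start and len
   are multiples of page_size. */
static bool
uffd_range_aligned(__u64 start, __u64 len, __u64 page_size)
{
    return len != 0 && start % page_size == 0 && len % page_size == 0;
}

/* Hypothetical helper: unregister [start, start + len).  Returns 0 on
   success, or -1 with errno set. */
static int
uffd_unregister(int fd, __u64 start, __u64 len)
{
    struct uffdio_range range = { .start = start, .len = len };
    __u64 page_size = (__u64) sysconf(_SC_PAGESIZE);

    if (!uffd_range_aligned(start, len, page_size)) {
        errno = EINVAL;     /* Same error the text above documents */
        return -1;
    }
    return ioctl(fd, UFFDIO_UNREGISTER, &range);
}
```
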
   UFFDIO_COPY
       (Since Linux 4.3.) Atomically copy a contiguous memory chunk into the
       userfault registered range and optionally wake up the blocked thread.
       The source and destination addresses and the number of bytes to copy
       are specified by the src, dst, and len fields of the uffdio_copy
       structure pointed to by argp:

           struct uffdio_copy {
               __u64 dst;    /* Destination of copy */
               __u64 src;    /* Source of copy */
               __u64 len;    /* Number of bytes to copy */
               __u64 mode;   /* Flags controlling behavior of copy */
               __s64 copy;   /* Number of bytes copied, or negated error */
           };

       The following values may be bitwise ORed in mode to change the behavior
       of the UFFDIO_COPY operation:

       UFFDIO_COPY_MODE_DONTWAKE
              Do not wake up the thread that waits for page-fault resolution.

       UFFDIO_COPY_MODE_WP
              Copy the page with read-only permission. This allows the user
              to trap the next write to the page, which will block and
              generate another write-protect userfault message. This is used
              only when both UFFDIO_REGISTER_MODE_MISSING and
              UFFDIO_REGISTER_MODE_WP modes are enabled for the registered
              range.

       The copy field is used by the kernel to return the number of bytes that
       were actually copied, or an error (a negated errno-style value). If the
       value returned in copy doesn't match the value that was specified in
       len, the operation fails with the error EAGAIN. The copy field is
       output-only; it is not read by the UFFDIO_COPY operation.

       This ioctl(2) operation returns 0 on success. In this case, the entire
       area was copied. On error, -1 is returned and errno is set to indicate
       the error. Possible errors include:

       EAGAIN The number of bytes copied (i.e., the value returned in the copy
              field) does not equal the value that was specified in the len
              field.

       EINVAL Either dst or len was not a multiple of the system page size, or
              the range specified by src and len or dst and len was invalid.

       EINVAL An invalid bit was specified in the mode field.

       ENOENT (since Linux 4.11)
              The faulting process has changed its virtual memory layout
              simultaneously with an outstanding UFFDIO_COPY operation.

       ENOSPC (from Linux 4.11 until Linux 4.13)
              The faulting process has exited at the time of a UFFDIO_COPY
              operation.

       ESRCH (since Linux 4.13)
              The faulting process has exited at the time of a UFFDIO_COPY
              operation.

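       One way a monitor might react to EAGAIN is to resume after the bytes
       reported in the copy field. The sketch below shows that idea with a
       hypothetical helper, uffd_copy_all(). Retrying is a design choice of
       this example, not something the interface requires, and a monitor may
       prefer to treat a partial copy as an error.

```c
#include <errno.h>
#include <linux/userfaultfd.h>
#include <sys/ioctl.h>

/* Hypothetical helper: copy len bytes from src to dst, resuming after
   a partial copy.  Returns 0 once everything has been copied, or -1
   with errno set if the operation cannot make progress. */
static int
uffd_copy_all(int fd, __u64 dst, __u64 src, __u64 len)
{
    while (len > 0) {
        struct uffdio_copy cp = {
            .dst = dst, .src = src, .len = len, .mode = 0,
        };

        if (ioctl(fd, UFFDIO_COPY, &cp) == 0)
            return 0;           /* The entire area was copied */

        if (errno != EAGAIN || cp.copy <= 0)
            return -1;          /* Hard error, or nothing was copied */

        /* Partial copy: cp.copy bytes are in place; retry the rest. */
        dst += (__u64) cp.copy;
        src += (__u64) cp.copy;
        len -= (__u64) cp.copy;
    }
    return 0;
}
```
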
   UFFDIO_ZEROPAGE
       (Since Linux 4.3.) Zero out a memory range registered with userfaultfd.

       The requested range is specified by the range field of the
       uffdio_zeropage structure pointed to by argp:

           struct uffdio_zeropage {
               struct uffdio_range range;
               __u64 mode;       /* Flags controlling behavior of zeropage */
               __s64 zeropage;   /* Number of bytes zeroed, or negated error */
           };

       The following value may be bitwise ORed in mode to change the behavior
       of the UFFDIO_ZEROPAGE operation:

       UFFDIO_ZEROPAGE_MODE_DONTWAKE
              Do not wake up the thread that waits for page-fault resolution.

       The zeropage field is used by the kernel to return the number of bytes
       that were actually zeroed, or an error in the same manner as
       UFFDIO_COPY. If the value returned in the zeropage field doesn't match
       the value that was specified in range.len, the operation fails with the
       error EAGAIN. The zeropage field is output-only; it is not read by the
       UFFDIO_ZEROPAGE operation.

       This ioctl(2) operation returns 0 on success. In this case, the entire
       area was zeroed. On error, -1 is returned and errno is set to indicate
       the error. Possible errors include:

       EAGAIN The number of bytes zeroed (i.e., the value returned in the
              zeropage field) does not equal the value that was specified in
              the range.len field.

       EINVAL Either range.start or range.len was not a multiple of the system
              page size; or range.len was zero; or the range specified was
              invalid.

       EINVAL An invalid bit was specified in the mode field.

       ESRCH (since Linux 4.13)
              The faulting process has exited at the time of a UFFDIO_ZEROPAGE
              operation.

   UFFDIO_WAKE
       (Since Linux 4.3.) Wake up the thread waiting for page-fault resolution
       on a specified memory address range.

       The UFFDIO_WAKE operation is used in conjunction with UFFDIO_COPY and
       UFFDIO_ZEROPAGE operations that have the UFFDIO_COPY_MODE_DONTWAKE or
       UFFDIO_ZEROPAGE_MODE_DONTWAKE bit set in the mode field. The userfault
       monitor can perform several UFFDIO_COPY and UFFDIO_ZEROPAGE operations
       in a batch and then explicitly wake up the faulting thread using
       UFFDIO_WAKE.

       The argp argument is a pointer to a uffdio_range structure (shown
       above) that specifies the address range.

       This ioctl(2) operation returns 0 on success. On error, -1 is returned
       and errno is set to indicate the error. Possible errors include:

       EINVAL The start or the len field of the uffdio_range structure was not
              a multiple of the system page size; or len was zero; or the
              specified range was otherwise invalid.

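       The batch-then-wake pattern described above can be sketched as follows.
       uffd_zero_batch() is a hypothetical helper; it assumes every range is
       page-aligned and registered, and it stops at the first failure without
       undoing earlier work.

```c
#include <errno.h>
#include <linux/userfaultfd.h>
#include <stddef.h>
#include <sys/ioctl.h>

/* Hypothetical helper: zero n ranges with the DONTWAKE flag, then wake
   the waiters on each range.  Returns 0 on success, or -1 with errno
   set by the first ioctl(2) call that fails. */
static int
uffd_zero_batch(int fd, const struct uffdio_range *ranges, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        struct uffdio_zeropage zp = {
            .range = ranges[i],
            .mode = UFFDIO_ZEROPAGE_MODE_DONTWAKE,
        };

        if (ioctl(fd, UFFDIO_ZEROPAGE, &zp) == -1)
            return -1;
    }

    for (size_t i = 0; i < n; i++) {
        struct uffdio_range r = ranges[i];

        if (ioctl(fd, UFFDIO_WAKE, &r) == -1)
            return -1;
    }
    return 0;
}
```
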
   UFFDIO_WRITEPROTECT
       (Since Linux 5.7.) Write-protect or write-unprotect a memory range
       registered with mode UFFDIO_REGISTER_MODE_WP.

       The argp argument is a pointer to a uffdio_writeprotect structure as
       shown below:

           struct uffdio_writeprotect {
               struct uffdio_range range; /* Range to change write permission */
               __u64 mode;                /* Mode to change write permission */
           };

       There are two mode bits that are supported in this structure:

       UFFDIO_WRITEPROTECT_MODE_WP
              When this mode bit is set, the ioctl will be a write-protect
              operation upon the memory range specified by range. Otherwise,
              it will be a write-unprotect operation upon the specified range,
              which can be used to resolve a userfaultfd write-protect page
              fault.

       UFFDIO_WRITEPROTECT_MODE_DONTWAKE
              When this mode bit is set, do not wake up any thread that waits
              for page-fault resolution after the operation. This can be
              specified only if UFFDIO_WRITEPROTECT_MODE_WP is not specified.

       This ioctl(2) operation returns 0 on success. On error, -1 is returned
       and errno is set to indicate the error. Possible errors include:

       EINVAL The start or the len field of the uffdio_range structure was not
              a multiple of the system page size; or len was zero; or the
              specified range was otherwise invalid.

       EAGAIN The process was interrupted; retry this call.

       ENOENT The range specified in range is not valid. For example, the
              virtual address does not exist, or is not registered with
              userfaultfd write-protect mode.

       EFAULT Encountered a generic fault during processing.

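       The two mode bits and the restriction on combining them can be
       illustrated with a small wrapper. uffd_set_wp() is a hypothetical
       helper; rejecting the invalid combination before the call is a
       convenience of this sketch.

```c
#include <errno.h>
#include <linux/userfaultfd.h>
#include <stdbool.h>
#include <sys/ioctl.h>

/* Hypothetical helper: write-protect (protect == true) or
   write-unprotect (protect == false) [start, start + len).  dontwake
   is honored only for write-unprotect, as described above.  Returns 0
   on success, or -1 with errno set. */
static int
uffd_set_wp(int fd, __u64 start, __u64 len, bool protect, bool dontwake)
{
    struct uffdio_writeprotect wp = {
        .range = { .start = start, .len = len },
        .mode = 0,
    };

    if (protect && dontwake) {
        errno = EINVAL;     /* DONTWAKE requires MODE_WP to be unset */
        return -1;
    }
    if (protect)
        wp.mode |= UFFDIO_WRITEPROTECT_MODE_WP;
    if (dontwake)
        wp.mode |= UFFDIO_WRITEPROTECT_MODE_DONTWAKE;

    return ioctl(fd, UFFDIO_WRITEPROTECT, &wp);
}
```
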
   UFFDIO_CONTINUE
       (Since Linux 5.13.) Resolve a minor page fault by installing page table
       entries for existing pages in the page cache.

       The argp argument is a pointer to a uffdio_continue structure as shown
       below:

           struct uffdio_continue {
               struct uffdio_range range;
                              /* Range to install PTEs for and continue */
               __u64 mode;    /* Flags controlling the behavior of continue */
               __s64 mapped;  /* Number of bytes mapped, or negated error */
           };

       The following value may be bitwise ORed in mode to change the behavior
       of the UFFDIO_CONTINUE operation:

       UFFDIO_CONTINUE_MODE_DONTWAKE
              Do not wake up the thread that waits for page-fault resolution.

       The mapped field is used by the kernel to return the number of bytes
       that were actually mapped, or an error in the same manner as
       UFFDIO_COPY. If the value returned in the mapped field doesn't match
       the value that was specified in range.len, the operation fails with the
       error EAGAIN. The mapped field is output-only; it is not read by the
       UFFDIO_CONTINUE operation.

       This ioctl(2) operation returns 0 on success. In this case, the entire
       area was mapped. On error, -1 is returned and errno is set to indicate
       the error. Possible errors include:

       EAGAIN The number of bytes mapped (i.e., the value returned in the
              mapped field) does not equal the value that was specified in the
              range.len field.

       EINVAL Either range.start or range.len was not a multiple of the system
              page size; or range.len was zero; or the range specified was
              invalid.

       EINVAL An invalid bit was specified in the mode field.

       EEXIST One or more pages were already mapped in the given range.

       ENOENT The faulting process has changed its virtual memory layout
              simultaneously with an outstanding UFFDIO_CONTINUE operation.

       ENOMEM Allocating memory needed to set up the page table mappings
              failed.

       EFAULT No existing page could be found in the page cache for the given
              range.

       ESRCH  The faulting process has exited at the time of a UFFDIO_CONTINUE
              operation.

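       A monitor resolving a minor fault might issue UFFDIO_CONTINUE through a
       wrapper such as the one sketched below. uffd_continue() is a
       hypothetical helper; it assumes that the page contents have already
       been brought up to date through another mapping of the same memory.

```c
#include <errno.h>
#include <linux/userfaultfd.h>
#include <stddef.h>
#include <sys/ioctl.h>

/* Hypothetical helper: install page table entries for
   [start, start + len) and wake the faulting thread.  The value of the
   output-only mapped field is passed back through *mapped when mapped
   is not NULL.  Returns 0 on success, or -1 with errno set. */
static int
uffd_continue(int fd, __u64 start, __u64 len, __s64 *mapped)
{
    struct uffdio_continue cont = {
        .range = { .start = start, .len = len },
        .mode = 0,
    };
    int ret = ioctl(fd, UFFDIO_CONTINUE, &cont);

    if (mapped != NULL)
        *mapped = cont.mapped;  /* Bytes mapped, or negated error */
    return ret;
}
```
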
RETURN VALUE
       See descriptions of the individual operations, above.

ERRORS
       See descriptions of the individual operations, above. In addition, the
       following general errors can occur for all of the operations described
       above:

       EFAULT argp does not point to a valid memory address.

       EINVAL (For all operations except UFFDIO_API.) The userfaultfd object
              has not yet been enabled (via the UFFDIO_API operation).

STANDARDS
       Linux.

BUGS
       In order to detect available userfault features and enable some subset
       of those features, the userfaultfd file descriptor must be closed after
       the first UFFDIO_API operation that queries feature availability, and
       reopened before the second UFFDIO_API operation that actually enables
       the desired features.

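       The query, close, and reopen sequence described above might be written
       as in the following sketch. Both helpers are hypothetical. The example
       assumes that SYS_userfaultfd is defined by <sys/syscall.h>, that the
       caller is permitted to create userfaultfd objects, and that silently
       dropping unavailable feature bits is acceptable to the application.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: keep only the wanted feature bits that the
   kernel reported as available. */
static __u64
uffd_pick_features(__u64 available, __u64 wanted)
{
    return available & wanted;
}

/* Hypothetical helper: open a userfaultfd and run UFFDIO_API with the
   given feature bits.  Returns the descriptor, or -1 with errno set. */
static int
uffd_open_api(struct uffdio_api *api, __u64 features)
{
    int fd = (int) syscall(SYS_userfaultfd, O_CLOEXEC);

    if (fd == -1)
        return -1;
    memset(api, 0, sizeof(*api));
    api->api = UFFD_API;
    api->features = features;
    if (ioctl(fd, UFFDIO_API, api) == -1) {
        int saved = errno;      /* close(2) may overwrite errno */

        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}

/* Hypothetical helper: query on a first descriptor, close it, then
   enable the usable subset of wanted on a second descriptor. */
static int
uffd_open_with(__u64 wanted, __u64 *enabled)
{
    struct uffdio_api api;
    int fd = uffd_open_api(&api, 0);    /* Query only */

    if (fd == -1)
        return -1;
    close(fd);

    *enabled = uffd_pick_features(api.features, wanted);
    return uffd_open_api(&api, *enabled);
}
```
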
EXAMPLES
       See userfaultfd(2).

SEE ALSO
       ioctl(2), mmap(2), userfaultfd(2)

       Documentation/admin-guide/mm/userfaultfd.rst in the Linux kernel source
       tree

Linux man-pages 6.04             2023-03-30             ioctl_userfaultfd(2)