ioctl_userfaultfd(2)          System Calls Manual          ioctl_userfaultfd(2)

NAME
       ioctl_userfaultfd - create a file descriptor for handling page faults
       in user space

LIBRARY
       Standard C library (libc, -lc)

SYNOPSIS
       #include <linux/userfaultfd.h>  /* Definition of UFFD* constants */
       #include <sys/ioctl.h>

       int ioctl(int fd, int cmd, ...);

DESCRIPTION
       Various ioctl(2) operations can be performed on a userfaultfd object
       (created by a call to userfaultfd(2)) using calls of the form:

           ioctl(fd, cmd, argp);

       In the above, fd is a file descriptor referring to a userfaultfd
       object, cmd is one of the commands listed below, and argp is a pointer
       to a data structure that is specific to cmd.

       The various ioctl(2) operations are described below. The UFFDIO_API,
       UFFDIO_REGISTER, and UFFDIO_UNREGISTER operations are used to configure
       userfaultfd behavior. These operations allow the caller to choose what
       features will be enabled and what kinds of events will be delivered to
       the application. The remaining operations are range operations. These
       operations enable the calling application to resolve page-fault events.

   UFFDIO_API
       (Since Linux 4.3.) Enable operation of the userfaultfd and perform the
       API handshake.

       The argp argument is a pointer to a uffdio_api structure, defined as:

           struct uffdio_api {
               __u64 api;        /* Requested API version (input) */
               __u64 features;   /* Requested features (input/output) */
               __u64 ioctls;     /* Available ioctl() operations (output) */
           };

       The api field denotes the API version requested by the application; it
       should be set to UFFD_API.

       The kernel verifies that it can support the requested API version, and
       sets the features and ioctls fields to bit masks representing the
       features and the generic ioctl(2) operations that are available.

       Before Linux 4.11, the features field must be initialized to zero
       before the call to UFFDIO_API, and zero (i.e., no feature bits) is
       placed in the features field by the kernel upon return from ioctl(2).

       Starting from Linux 4.11, the features field can be used to ask whether
       particular features are supported and to explicitly enable userfaultfd
       features that are disabled by default. The kernel always reports all
       the available features in the features field.

       To enable userfaultfd features, the application should set a bit
       corresponding to each feature it wants to enable in the features
       field. If the kernel supports all the requested features, it will
       enable them. Otherwise, it will zero out the returned uffdio_api
       structure and return EINVAL.

       The following feature bits may be set:

       UFFD_FEATURE_EVENT_FORK (since Linux 4.11)
              When this feature is enabled, the userfaultfd objects associated
              with a parent process are duplicated into the child process
              during fork(2), and a UFFD_EVENT_FORK event is delivered to the
              userfaultfd monitor.

       UFFD_FEATURE_EVENT_REMAP (since Linux 4.11)
              If this feature is enabled, when the faulting process invokes
              mremap(2), the userfaultfd monitor will receive an event of
              type UFFD_EVENT_REMAP.

       UFFD_FEATURE_EVENT_REMOVE (since Linux 4.11)
              If this feature is enabled, when the faulting process calls
              madvise(2) with the MADV_DONTNEED or MADV_REMOVE advice value to
              free a virtual memory area, the userfaultfd monitor will receive
              an event of type UFFD_EVENT_REMOVE.

       UFFD_FEATURE_EVENT_UNMAP (since Linux 4.11)
              If this feature is enabled, when the faulting process unmaps
              virtual memory either explicitly with munmap(2), or implicitly
              during either mmap(2) or mremap(2), the userfaultfd monitor will
              receive an event of type UFFD_EVENT_UNMAP.

       UFFD_FEATURE_MISSING_HUGETLBFS (since Linux 4.11)
              If this feature bit is set, the kernel supports registering
              userfaultfd ranges on hugetlbfs virtual memory areas.

       UFFD_FEATURE_MISSING_SHMEM (since Linux 4.11)
              If this feature bit is set, the kernel supports registering
              userfaultfd ranges on shared memory areas. This includes all
              kernel shared memory APIs: System V shared memory, tmpfs(5),
              shared mappings of /dev/zero, mmap(2) with the MAP_SHARED flag
              set, memfd_create(2), and so on.

       UFFD_FEATURE_SIGBUS (since Linux 4.14)
              If this feature bit is set, no page-fault events
              (UFFD_EVENT_PAGEFAULT) will be delivered. Instead, a SIGBUS
              signal will be sent to the faulting process. Applications using
              this feature will not require the use of a userfaultfd monitor
              for processing memory accesses to the regions registered with
              userfaultfd.

       UFFD_FEATURE_THREAD_ID (since Linux 4.14)
              If this feature bit is set, uffd_msg.pagefault.feat.ptid will be
              set to the thread ID of the faulting thread for each page-fault
              message.

       UFFD_FEATURE_MINOR_HUGETLBFS (since Linux 5.13)
              If this feature bit is set, the kernel supports registering
              userfaultfd ranges in minor mode on hugetlbfs-backed memory
              areas.

       UFFD_FEATURE_MINOR_SHMEM (since Linux 5.14)
              If this feature bit is set, the kernel supports registering
              userfaultfd ranges in minor mode on shmem-backed memory areas.

       UFFD_FEATURE_EXACT_ADDRESS (since Linux 5.18)
              If this feature bit is set, uffd_msg.pagefault.address will be
              set to the exact page-fault address that was reported by the
              hardware; the offset within the page will not be masked off.
              Note that old Linux versions might report the exact address as
              well, even though the feature bit is not set.

       The returned ioctls field can contain the following bits:

       1 << _UFFDIO_API
              The UFFDIO_API operation is supported.

       1 << _UFFDIO_REGISTER
              The UFFDIO_REGISTER operation is supported.

       1 << _UFFDIO_UNREGISTER
              The UFFDIO_UNREGISTER operation is supported.

       This ioctl(2) operation returns 0 on success. On error, -1 is returned
       and errno is set to indicate the error. Possible errors include:

       EFAULT argp refers to an address that is outside the calling process's
              accessible address space.

       EINVAL The userfaultfd has already been enabled by a previous
              UFFDIO_API operation.

       EINVAL The API version requested in the api field is not supported by
              this kernel, or the features field passed to the kernel includes
              feature bits that are not supported by the current kernel
              version.

   UFFDIO_REGISTER
       (Since Linux 4.3.) Register a memory address range with the
       userfaultfd object. The pages in the range must be "compatible". Refer
       to the list of register modes below for the compatible memory backends
       for each mode.

       The argp argument is a pointer to a uffdio_register structure, defined
       as:

           struct uffdio_range {
               __u64 start;    /* Start of range */
               __u64 len;      /* Length of range (bytes) */
           };

           struct uffdio_register {
               struct uffdio_range range;
               __u64 mode;     /* Desired mode of operation (input) */
               __u64 ioctls;   /* Available ioctl() operations (output) */
           };

       The range field defines a memory range starting at start and continuing
       for len bytes that should be handled by the userfaultfd.

       The mode field defines the mode of operation desired for this memory
       region. The following values may be bitwise ORed to set the
       userfaultfd mode for the specified range:

       UFFDIO_REGISTER_MODE_MISSING
              Track page faults on missing pages. Since Linux 4.3, only
              private anonymous ranges are compatible. Since Linux 4.11,
              hugetlbfs and shared memory ranges are also compatible.

       UFFDIO_REGISTER_MODE_WP
              Track page faults on write-protected pages. Since Linux 5.7,
              only private anonymous ranges are compatible.

       UFFDIO_REGISTER_MODE_MINOR
              Track minor page faults. Since Linux 5.13, only hugetlbfs
              ranges are compatible. Since Linux 5.14, shmem ranges are also
              compatible.

       If the operation is successful, the kernel modifies the ioctls bit-mask
       field to indicate which ioctl(2) operations are available for the
       specified range. This returned bit mask can contain the following
       bits:

       1 << _UFFDIO_COPY
              The UFFDIO_COPY operation is supported.

       1 << _UFFDIO_WAKE
              The UFFDIO_WAKE operation is supported.

       1 << _UFFDIO_WRITEPROTECT
              The UFFDIO_WRITEPROTECT operation is supported.

       1 << _UFFDIO_ZEROPAGE
              The UFFDIO_ZEROPAGE operation is supported.

       1 << _UFFDIO_CONTINUE
              The UFFDIO_CONTINUE operation is supported.

       This ioctl(2) operation returns 0 on success. On error, -1 is returned
       and errno is set to indicate the error. Possible errors include:

       EBUSY  A mapping in the specified range is registered with another
              userfaultfd object.

       EFAULT argp refers to an address that is outside the calling process's
              accessible address space.

       EINVAL An invalid or unsupported bit was specified in the mode field;
              or the mode field was zero.

       EINVAL There is no mapping in the specified address range.

       EINVAL range.start or range.len is not a multiple of the system page
              size; or range.len is zero; or these fields are otherwise
              invalid.

       EINVAL There was an incompatible mapping in the specified address
              range.
239 UFFDIO_REGISTER.)
240
241 The address range to unregister is specified in the uffdio_range struc‐
242 ture pointed to by argp.
243
244 This ioctl(2) operation returns 0 on success. On error, -1 is returned
245 and errno is set to indicate the error. Possible errors include:
246
247 EINVAL Either the start or the len field of the ufdio_range structure
248 was not a multiple of the system page size; or the len field was
249 zero; or these fields were otherwise invalid.
250
251 EINVAL There as an incompatible mapping in the specified address range.
252
253 EINVAL There was no mapping in the specified address range.

   UFFDIO_COPY
       (Since Linux 4.3.) Atomically copy a contiguous memory chunk into the
       userfaultfd-registered range and optionally wake up the blocked
       thread. The source and destination addresses and the number of bytes
       to copy are specified by the src, dst, and len fields of the
       uffdio_copy structure pointed to by argp:

           struct uffdio_copy {
               __u64 dst;    /* Destination of copy */
               __u64 src;    /* Source of copy */
               __u64 len;    /* Number of bytes to copy */
               __u64 mode;   /* Flags controlling behavior of copy */
               __s64 copy;   /* Number of bytes copied, or negated error */
           };

       The following values may be bitwise ORed in mode to change the behavior
       of the UFFDIO_COPY operation:

       UFFDIO_COPY_MODE_DONTWAKE
              Do not wake up the thread that waits for page-fault resolution.

       UFFDIO_COPY_MODE_WP
              Copy the page with read-only permission. This allows the user
              to trap the next write to the page, which will block and
              generate another write-protect userfault message. This is used
              only when both UFFDIO_REGISTER_MODE_MISSING and
              UFFDIO_REGISTER_MODE_WP modes are enabled for the registered
              range.

       The copy field is used by the kernel to return the number of bytes that
       were actually copied, or an error (a negated errno-style value). If the
       value returned in copy doesn't match the value that was specified in
       len, the operation fails with the error EAGAIN. The copy field is
       output-only; it is not read by the UFFDIO_COPY operation.

       This ioctl(2) operation returns 0 on success. In this case, the entire
       area was copied. On error, -1 is returned and errno is set to indicate
       the error. Possible errors include:

       EAGAIN The number of bytes copied (i.e., the value returned in the copy
              field) does not equal the value that was specified in the len
              field.

       EINVAL Either dst or len was not a multiple of the system page size, or
              the range specified by src and len or dst and len was invalid.

       EINVAL An invalid bit was specified in the mode field.

       ENOENT (since Linux 4.11)
              The faulting process has changed its virtual memory layout
              simultaneously with an outstanding UFFDIO_COPY operation.

       ENOSPC (from Linux 4.11 until Linux 4.13)
              The faulting process has exited at the time of a UFFDIO_COPY
              operation.

       ESRCH (since Linux 4.13)
              The faulting process has exited at the time of a UFFDIO_COPY
              operation.

   UFFDIO_ZEROPAGE
       (Since Linux 4.3.) Zero out a memory range registered with
       userfaultfd.

       The requested range is specified by the range field of the
       uffdio_zeropage structure pointed to by argp:

           struct uffdio_zeropage {
               struct uffdio_range range;
               __u64 mode;       /* Flags controlling behavior of zeropage */
               __s64 zeropage;   /* Number of bytes zeroed, or negated error */
           };

       The following value may be bitwise ORed in mode to change the behavior
       of the UFFDIO_ZEROPAGE operation:

       UFFDIO_ZEROPAGE_MODE_DONTWAKE
              Do not wake up the thread that waits for page-fault resolution.

       The zeropage field is used by the kernel to return the number of bytes
       that were actually zeroed, or an error in the same manner as
       UFFDIO_COPY. If the value returned in the zeropage field doesn't match
       the value that was specified in range.len, the operation fails with the
       error EAGAIN. The zeropage field is output-only; it is not read by the
       UFFDIO_ZEROPAGE operation.

       This ioctl(2) operation returns 0 on success. In this case, the entire
       area was zeroed. On error, -1 is returned and errno is set to indicate
       the error. Possible errors include:

       EAGAIN The number of bytes zeroed (i.e., the value returned in the
              zeropage field) does not equal the value that was specified in
              the range.len field.

       EINVAL Either range.start or range.len was not a multiple of the system
              page size; or range.len was zero; or the range specified was
              invalid.

       EINVAL An invalid bit was specified in the mode field.

       ESRCH (since Linux 4.13)
              The faulting process has exited at the time of a UFFDIO_ZEROPAGE
              operation.

   UFFDIO_WAKE
       (Since Linux 4.3.) Wake up the thread waiting for page-fault
       resolution on a specified memory address range.

       The UFFDIO_WAKE operation is used in conjunction with UFFDIO_COPY and
       UFFDIO_ZEROPAGE operations that have the UFFDIO_COPY_MODE_DONTWAKE or
       UFFDIO_ZEROPAGE_MODE_DONTWAKE bit set in the mode field. The
       userfaultfd monitor can perform several UFFDIO_COPY and UFFDIO_ZEROPAGE
       operations in a batch and then explicitly wake up the faulting thread
       using UFFDIO_WAKE.

       The argp argument is a pointer to a uffdio_range structure (shown
       above) that specifies the address range.

       This ioctl(2) operation returns 0 on success. On error, -1 is returned
       and errno is set to indicate the error. Possible errors include:

       EINVAL The start or the len field of the uffdio_range structure was not
              a multiple of the system page size; or len was zero; or the
              specified range was otherwise invalid.

   UFFDIO_WRITEPROTECT
       (Since Linux 5.7.) Write-protect or write-unprotect a memory range that
       was registered with mode UFFDIO_REGISTER_MODE_WP.

       The argp argument is a pointer to a uffdio_writeprotect structure as
       shown below:

           struct uffdio_writeprotect {
               struct uffdio_range range;  /* Range to change write permission */
               __u64 mode;                 /* Mode to change write permission */
           };

       There are two mode bits that are supported in this structure:

       UFFDIO_WRITEPROTECT_MODE_WP
              When this mode bit is set, the ioctl will be a write-protect
              operation upon the memory range specified by range. Otherwise,
              it will be a write-unprotect operation upon the specified range,
              which can be used to resolve a userfaultfd write-protect page
              fault.

       UFFDIO_WRITEPROTECT_MODE_DONTWAKE
              When this mode bit is set, do not wake up any thread that waits
              for page-fault resolution after the operation. This can be
              specified only if UFFDIO_WRITEPROTECT_MODE_WP is not specified.

       This ioctl(2) operation returns 0 on success. On error, -1 is returned
       and errno is set to indicate the error. Possible errors include:

       EINVAL The start or the len field of the uffdio_range structure was not
              a multiple of the system page size; or len was zero; or the
              specified range was otherwise invalid.

       EAGAIN The process was interrupted; retry this call.

       ENOENT The range specified in range is not valid. For example, the
              virtual address does not exist, or is not registered with
              userfaultfd write-protect mode.

       EFAULT Encountered a generic fault during processing.

   UFFDIO_CONTINUE
       (Since Linux 5.13.) Resolve a minor page fault by installing page
       table entries for existing pages in the page cache.

       The argp argument is a pointer to a uffdio_continue structure as shown
       below:

           struct uffdio_continue {
               struct uffdio_range range;
                               /* Range to install PTEs for and continue */
               __u64 mode;     /* Flags controlling the behavior of continue */
               __s64 mapped;   /* Number of bytes mapped, or negated error */
           };

       The following value may be bitwise ORed in mode to change the behavior
       of the UFFDIO_CONTINUE operation:

       UFFDIO_CONTINUE_MODE_DONTWAKE
              Do not wake up the thread that waits for page-fault resolution.

       The mapped field is used by the kernel to return the number of bytes
       that were actually mapped, or an error in the same manner as
       UFFDIO_COPY. If the value returned in the mapped field doesn't match
       the value that was specified in range.len, the operation fails with the
       error EAGAIN. The mapped field is output-only; it is not read by the
       UFFDIO_CONTINUE operation.

       This ioctl(2) operation returns 0 on success. In this case, the entire
       area was mapped. On error, -1 is returned and errno is set to indicate
       the error. Possible errors include:

       EAGAIN The number of bytes mapped (i.e., the value returned in the
              mapped field) does not equal the value that was specified in the
              range.len field.

       EINVAL Either range.start or range.len was not a multiple of the system
              page size; or range.len was zero; or the range specified was
              invalid.

       EINVAL An invalid bit was specified in the mode field.

       EEXIST One or more pages were already mapped in the given range.

       ENOENT The faulting process has changed its virtual memory layout
              simultaneously with an outstanding UFFDIO_CONTINUE operation.

       ENOMEM Allocating memory needed to set up the page table mappings
              failed.

       EFAULT No existing page could be found in the page cache for the given
              range.

       ESRCH  The faulting process has exited at the time of a
              UFFDIO_CONTINUE operation.

RETURN VALUE
       See descriptions of the individual operations, above.

ERRORS
       See descriptions of the individual operations, above. In addition, the
       following general errors can occur for all of the operations described
       above:

       EFAULT argp does not point to a valid memory address.

       EINVAL (For all operations except UFFDIO_API.) The userfaultfd object
              has not yet been enabled (via the UFFDIO_API operation).

STANDARDS
       Linux.

BUGS
       In order to detect available userfault features and enable some subset
       of those features, the userfaultfd file descriptor must be closed after
       the first UFFDIO_API operation that queries feature availability, and a
       new userfaultfd must be created with userfaultfd(2) before the second
       UFFDIO_API operation that actually enables the desired features.

EXAMPLES
       See userfaultfd(2).

SEE ALSO
       ioctl(2), mmap(2), userfaultfd(2)

       Documentation/admin-guide/mm/userfaultfd.rst in the Linux kernel source
       tree

Linux man-pages 6.05              2023-05-03              ioctl_userfaultfd(2)