userfaultfd(2)             System Calls Manual             userfaultfd(2)

NAME
       userfaultfd - create a file descriptor for handling page faults in user
       space

LIBRARY
       Standard C library (libc, -lc)

SYNOPSIS
       #include <fcntl.h>             /* Definition of O_* constants */
       #include <sys/syscall.h>       /* Definition of SYS_* constants */
       #include <linux/userfaultfd.h> /* Definition of UFFD_* constants */
       #include <unistd.h>

       int syscall(SYS_userfaultfd, int flags);

       Note: glibc provides no wrapper for userfaultfd(), necessitating the
       use of syscall(2).
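
       Because there is no glibc wrapper, programs usually define their own.
       The following is a minimal sketch, assuming Linux headers that define
       SYS_userfaultfd; uffd_create() is a hypothetical name, not a library
       function.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/* uffd_create() is a hypothetical helper, not a library function: a
   thin wrapper around the raw system call.  flags is a bitwise OR of
   O_CLOEXEC, O_NONBLOCK, and (since Linux 5.11) UFFD_USER_MODE_ONLY.
   Returns a new file descriptor, or -1 with errno set. */
static int
uffd_create(int flags)
{
#ifdef SYS_userfaultfd
    return (int) syscall(SYS_userfaultfd, flags);
#else
    (void) flags;
    errno = ENOSYS;     /* Headers do not define the syscall number */
    return -1;
#endif
}
```

       For example, uffd_create(O_CLOEXEC | O_NONBLOCK) requests the same
       kind of descriptor that the program in EXAMPLES creates.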

DESCRIPTION
       userfaultfd() creates a new userfaultfd object that can be used for
       delegation of page-fault handling to a user-space application, and
       returns a file descriptor that refers to the new object.  The new
       userfaultfd object is configured using ioctl(2).

       Once the userfaultfd object is configured, the application can use
       read(2) to receive userfaultfd notifications.  The reads from
       userfaultfd may be blocking or non-blocking, depending on the value of
       flags used for the creation of the userfaultfd or subsequent calls to
       fcntl(2).
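
       Switching an existing descriptor between blocking and non-blocking
       reads is ordinary fcntl(2) usage.  A minimal sketch; the helper name
       fd_set_nonblocking() is hypothetical, and nothing in it is specific to
       userfaultfd.

```c
#include <fcntl.h>
#include <stdbool.h>

/* Hypothetical helper: turn O_NONBLOCK on or off for fd while keeping
   the other file status flags.  Returns 0 on success, -1 with errno
   set on failure. */
static int
fd_set_nonblocking(int fd, bool nonblocking)
{
    int fl = fcntl(fd, F_GETFL);
    if (fl == -1)
        return -1;

    if (nonblocking)
        fl |= O_NONBLOCK;
    else
        fl &= ~O_NONBLOCK;

    return fcntl(fd, F_SETFL, fl);
}
```

       Reading the flags first with F_GETFL avoids clearing any other status
       flags that may already be set on the open file description.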

       The following values may be bitwise ORed in flags to change the
       behavior of userfaultfd():

       O_CLOEXEC
              Enable the close-on-exec flag for the new userfaultfd file
              descriptor.  See the description of the O_CLOEXEC flag in
              open(2).

       O_NONBLOCK
              Enables non-blocking operation for the userfaultfd object.
              See the description of the O_NONBLOCK flag in open(2).

       UFFD_USER_MODE_ONLY
              This is a userfaultfd-specific flag that was introduced in
              Linux 5.11.  When set, the userfaultfd object will only be
              able to handle page faults originating from user space on the
              registered regions.  If a kernel-originated fault is triggered
              on a range registered with this userfaultfd, a SIGBUS signal
              is delivered.

       When the last file descriptor referring to a userfaultfd object is
       closed, all memory ranges that were registered with the object are
       unregistered and unread events are flushed.

       Userfaultfd supports three modes of registration:

       UFFDIO_REGISTER_MODE_MISSING (since Linux 4.10)
              When registered with UFFDIO_REGISTER_MODE_MISSING mode, user
              space will receive a page-fault notification when a missing
              page is accessed.  The faulted thread will be stopped from
              execution until the page fault is resolved from user space by
              either a UFFDIO_COPY or a UFFDIO_ZEROPAGE ioctl.

       UFFDIO_REGISTER_MODE_MINOR (since Linux 5.13)
              When registered with UFFDIO_REGISTER_MODE_MINOR mode, user
              space will receive a page-fault notification when a minor page
              fault occurs, that is, when a backing page is in the page
              cache but page table entries don't yet exist.  The faulted
              thread will be stopped from execution until the page fault is
              resolved from user space by a UFFDIO_CONTINUE ioctl.

       UFFDIO_REGISTER_MODE_WP (since Linux 5.7)
              When registered with UFFDIO_REGISTER_MODE_WP mode, user space
              will receive a page-fault notification when a write-protected
              page is written.  The faulted thread will be stopped from
              execution until user space write-unprotects the page using a
              UFFDIO_WRITEPROTECT ioctl.

       Multiple modes can be enabled at the same time for the same memory
       range.

       Since Linux 4.14, a userfaultfd page-fault notification can optionally
       embed the faulting thread ID into the notification.  This feature must
       be enabled explicitly by setting the UFFD_FEATURE_THREAD_ID feature bit
       when initializing the userfaultfd context with the UFFDIO_API ioctl(2).
       By default, thread ID reporting is disabled.

   Usage
       The userfaultfd mechanism is designed to allow a thread in a
       multithreaded program to perform user-space paging for the other
       threads in the process.  When a page fault occurs for one of the
       regions registered to the userfaultfd object, the faulting thread is
       put to sleep and an event is generated that can be read via the
       userfaultfd file descriptor.  The fault-handling thread reads events
       from this file descriptor and services them using the operations
       described in ioctl_userfaultfd(2).  When servicing the page-fault
       events, the fault-handling thread can trigger a wake-up for the
       sleeping thread.

       It is possible for the faulting threads and the fault-handling threads
       to run in the context of different processes.  In this case, these
       threads may belong to different programs, and the program that
       executes the faulting threads will not necessarily cooperate with the
       program that handles the page faults.  In such non-cooperative mode,
       the process that monitors userfaultfd and handles page faults needs to
       be aware of the changes in the virtual memory layout of the faulting
       process to avoid memory corruption.

       Since Linux 4.11, userfaultfd can also notify the fault-handling
       threads about changes in the virtual memory layout of the faulting
       process.  In addition, if the faulting process invokes fork(2), the
       userfaultfd objects associated with the parent may be duplicated into
       the child process, and the userfaultfd monitor will be notified (via
       the UFFD_EVENT_FORK event described below) about the file descriptor
       associated with the userfaultfd objects created for the child process,
       which allows the userfaultfd monitor to perform user-space paging for
       the child process.  Unlike page faults, which have to be synchronous
       and require an explicit or implicit wakeup, all other events are
       delivered asynchronously, and the non-cooperative process resumes
       execution as soon as the userfaultfd manager executes read(2).  The
       userfaultfd manager should carefully synchronize calls to UFFDIO_COPY
       with the processing of events.

       The current asynchronous model of event delivery is optimal for
       single-threaded non-cooperative userfaultfd manager implementations.

       Since Linux 5.7, userfaultfd is able to do synchronous page dirty
       tracking using the new write-protect register mode.  One should check
       against the feature bit UFFD_FEATURE_PAGEFAULT_FLAG_WP before using
       this feature.  Similarly to the original userfaultfd missing mode, the
       write-protect mode will generate a userfaultfd notification when the
       protected page is written.  The user needs to resolve the page fault
       by unprotecting the faulted page and waking the faulted thread so that
       it can continue.  For more information, refer to the "Userfaultfd
       write-protect mode" subsection.

   Userfaultfd operation
       After the userfaultfd object is created with userfaultfd(), the
       application must enable it using the UFFDIO_API ioctl(2) operation.
       This operation allows a handshake between the kernel and user space to
       determine the API version and supported features.  This operation must
       be performed before any of the other ioctl(2) operations described
       below (or those operations fail with the EINVAL error).

       After a successful UFFDIO_API operation, the application then registers
       memory address ranges using the UFFDIO_REGISTER ioctl(2) operation.
       After successful completion of a UFFDIO_REGISTER operation, a page
       fault occurring in the requested memory range, and satisfying the mode
       defined at the registration time, will be forwarded by the kernel to
       the user-space application.  The application can then use the
       UFFDIO_COPY, UFFDIO_ZEROPAGE, or UFFDIO_CONTINUE ioctl(2) operations to
       resolve the page fault.
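
       For a missing-page fault whose desired contents are all zeros,
       UFFDIO_ZEROPAGE is enough.  A minimal sketch, assuming the faulting
       address comes from uffd_msg.pagefault.address and that page_size is a
       power of two; page_align_down() and uffd_resolve_with_zeropage() are
       hypothetical names.

```c
#include <linux/userfaultfd.h>
#include <stdint.h>
#include <sys/ioctl.h>

/* Round a faulting address down to the start of its page.  page_size
   must be a power of two (as returned by sysconf(_SC_PAGE_SIZE)). */
static uint64_t
page_align_down(uint64_t addr, uint64_t page_size)
{
    return addr & ~(page_size - 1);
}

/* Hypothetical helper: resolve a missing-page fault at fault_addr by
   mapping a zero-filled page there with UFFDIO_ZEROPAGE.  With mode 0
   (no UFFDIO_ZEROPAGE_MODE_DONTWAKE), the faulting thread is woken.
   Returns 0 on success, -1 with errno set on failure. */
static int
uffd_resolve_with_zeropage(int uffd, uint64_t fault_addr,
                           uint64_t page_size)
{
    struct uffdio_zeropage zp = {
        .range = {
            .start = page_align_down(fault_addr, page_size),
            .len   = page_size,
        },
        .mode = 0,
    };

    return ioctl(uffd, UFFDIO_ZEROPAGE, &zp) == -1 ? -1 : 0;
}
```

       In a handler loop like the one in EXAMPLES, this call could replace the
       UFFDIO_COPY step when the page should simply read as zeros.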

       Since Linux 4.14, if the application sets the UFFD_FEATURE_SIGBUS
       feature bit using the UFFDIO_API ioctl(2), no page-fault notification
       will be forwarded to user space.  Instead, a SIGBUS signal is delivered
       to the faulting process.  With this feature, userfaultfd can be used
       for robustness purposes to simply catch any access to areas within the
       registered address range that do not have pages allocated, without
       having to listen to userfaultfd events.  No userfaultfd monitor will be
       required for dealing with such memory accesses.  For example, this
       feature can be useful for applications that want to prevent the kernel
       from automatically allocating pages and filling holes in sparse files
       when the hole is accessed through a memory mapping.

       The UFFD_FEATURE_SIGBUS feature is implicitly inherited through fork(2)
       if used in combination with UFFD_FEATURE_EVENT_FORK.
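
       To illustrate the SIGBUS mode, the following self-contained sketch
       enables UFFD_FEATURE_SIGBUS, registers one anonymous page in missing
       mode, and then reads it with no monitor thread at all.  It assumes
       Linux 4.14 or later and permission to create a userfaultfd (see EPERM
       under ERRORS); probe_sigbus_mode() and on_sigbus() are hypothetical
       names.  siglongjmp(3) is used only to skip past the faulting read.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <setjmp.h>
#include <signal.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

static sigjmp_buf probe_env;

/* SIGBUS handler: jump back out of the faulting access. */
static void
on_sigbus(int sig)
{
    (void) sig;
    siglongjmp(probe_env, 1);
}

/* Hypothetical helper.  Returns 1 if reading a registered, unpopulated
   page raised SIGBUS, 0 if the read completed normally, or -1 if the
   userfaultfd could not be set up (e.g., EPERM, or no SIGBUS feature). */
static int
probe_sigbus_mode(void)
{
    long                    psz = sysconf(_SC_PAGE_SIZE);
    int                     uffd, result = -1;
    char                    *area = MAP_FAILED;
    struct sigaction        sa, old;
    struct uffdio_api       api = { .api = UFFD_API,
                                    .features = UFFD_FEATURE_SIGBUS };
    struct uffdio_register  reg;

    uffd = (int) syscall(SYS_userfaultfd, O_CLOEXEC);
    if (uffd == -1)
        return -1;
    if (ioctl(uffd, UFFDIO_API, &api) == -1)
        goto out;

    area = mmap(NULL, psz, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (area == MAP_FAILED)
        goto out;

    reg.range.start = (uintptr_t) area;
    reg.range.len = (uint64_t) psz;
    reg.mode = UFFDIO_REGISTER_MODE_MISSING;
    if (ioctl(uffd, UFFDIO_REGISTER, &reg) == -1)
        goto out;

    sa.sa_handler = on_sigbus;
    sa.sa_flags = 0;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGBUS, &sa, &old) == -1)
        goto out;

    if (sigsetjmp(probe_env, 1) == 0) {
        volatile char c = area[0];  /* No page yet, and no monitor */
        (void) c;
        result = 0;                 /* Reached only if no SIGBUS */
    } else {
        result = 1;                 /* Resumed here from on_sigbus() */
    }
    sigaction(SIGBUS, &old, NULL);  /* Restore the previous handler */

out:
    if (area != MAP_FAILED)
        munmap(area, psz);
    close(uffd);
    return result;
}
```

       A real program would more likely treat the SIGBUS as a fatal error or
       use it to report an access to an unpopulated area; the jump back is
       only there so that the probe can return a result.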

       Details of the various ioctl(2) operations can be found in
       ioctl_userfaultfd(2).

       Since Linux 4.11, events other than page faults may be enabled during
       the UFFDIO_API operation.

       Before Linux 4.11, userfaultfd could be used only with anonymous
       private memory mappings.  Since Linux 4.11, userfaultfd can also be
       used with hugetlbfs and shared memory mappings.

   Userfaultfd write-protect mode (since Linux 5.7)
       Since Linux 5.7, userfaultfd supports write-protect mode for anonymous
       memory.  The user needs to first check availability of this feature
       using the UFFDIO_API ioctl against the feature bit
       UFFD_FEATURE_PAGEFAULT_FLAG_WP before using this feature.

       Since Linux 5.19, write-protect mode is also supported on shmem and
       hugetlbfs memory types.  This can be detected with the feature bit
       UFFD_FEATURE_WP_HUGETLBFS_SHMEM.

       To register with userfaultfd write-protect mode, the user needs to
       initiate the UFFDIO_REGISTER ioctl with mode UFFDIO_REGISTER_MODE_WP
       set.  Note that it is legal to monitor the same memory range with
       multiple modes.  For example, the user can do UFFDIO_REGISTER with the
       mode set to UFFDIO_REGISTER_MODE_MISSING | UFFDIO_REGISTER_MODE_WP.
       When only UFFDIO_REGISTER_MODE_WP is registered, user space will not
       receive any notification when a missing page is written.  Instead,
       user space will receive a write-protect page-fault notification only
       when an existing but write-protected page is written.

       After the UFFDIO_REGISTER ioctl has completed with
       UFFDIO_REGISTER_MODE_WP mode set, the user can write-protect any
       existing memory within the range using the ioctl UFFDIO_WRITEPROTECT,
       where uffdio_writeprotect.mode should be set to
       UFFDIO_WRITEPROTECT_MODE_WP.

       When a write-protect event happens, user space will receive a
       page-fault notification whose uffd_msg.pagefault.flags will have the
       UFFD_PAGEFAULT_FLAG_WP flag set.  Note: since only writes can trigger
       this kind of fault, write-protect notifications will always have the
       UFFD_PAGEFAULT_FLAG_WRITE bit set along with the
       UFFD_PAGEFAULT_FLAG_WP bit.

       To resolve a write-protection page fault, the user should initiate
       another UFFDIO_WRITEPROTECT ioctl on the faulted page or range, this
       time with UFFDIO_WRITEPROTECT_MODE_WP cleared in
       uffdio_writeprotect.mode.
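
       Protecting and unprotecting differ only in the mode bit, so one helper
       can do both.  A minimal sketch, assuming the range was already
       registered with UFFDIO_REGISTER_MODE_WP and that headers from Linux
       5.7 or later are available; uffd_set_wp() is a hypothetical name.

```c
#include <linux/userfaultfd.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/ioctl.h>

/* Hypothetical helper: write-protect (protect == true) or
   write-unprotect (protect == false) [start, start + len) in a range
   previously registered with UFFDIO_REGISTER_MODE_WP.  When
   unprotecting without UFFDIO_WRITEPROTECT_MODE_DONTWAKE, threads
   blocked on write-protect faults in the range are woken.
   Returns 0 on success, -1 with errno set on failure. */
static int
uffd_set_wp(int uffd, uint64_t start, uint64_t len, bool protect)
{
    struct uffdio_writeprotect wp = {
        .range = { .start = start, .len = len },
        .mode  = protect ? UFFDIO_WRITEPROTECT_MODE_WP : 0,
    };

    return ioctl(uffd, UFFDIO_WRITEPROTECT, &wp) == -1 ? -1 : 0;
}
```

       In a fault handler, a message with UFFD_PAGEFAULT_FLAG_WP set could be
       resolved with uffd_set_wp(uffd, page_start, page_size, false), where
       page_start is the faulting address rounded down to a page boundary.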

   Userfaultfd minor fault mode (since Linux 5.13)
       Since Linux 5.13, userfaultfd supports minor fault mode.  In this mode,
       fault messages are produced not for major faults (where the page was
       missing), but rather for minor faults, where a page exists in the page
       cache, but the page table entries are not yet present.  The user needs
       to first check availability of this feature using the UFFDIO_API ioctl
       with the appropriate feature bits set before using this feature:
       UFFD_FEATURE_MINOR_HUGETLBFS since Linux 5.13, or
       UFFD_FEATURE_MINOR_SHMEM since Linux 5.14.

       To register with userfaultfd minor fault mode, the user needs to
       initiate the UFFDIO_REGISTER ioctl with mode UFFDIO_REGISTER_MODE_MINOR
       set.

       When a minor fault occurs, user space will receive a page-fault
       notification whose uffd_msg.pagefault.flags will have the
       UFFD_PAGEFAULT_FLAG_MINOR flag set.

       To resolve a minor page fault, the handler should decide whether or
       not the existing page contents need to be modified first.  If so, this
       should be done in place via a second, non-userfaultfd-registered
       mapping to the same backing page (e.g., by mapping the shmem or
       hugetlbfs file twice).  Once the page is considered "up to date", the
       fault can be resolved by initiating a UFFDIO_CONTINUE ioctl, which
       installs the page table entries and (by default) wakes up the faulting
       thread(s).
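
       The final UFFDIO_CONTINUE step can be wrapped as follows.  This is a
       sketch, assuming headers from Linux 5.13 or later and that any
       in-place update through the second mapping has already been done;
       uffd_continue_range() is a hypothetical name.

```c
#include <linux/userfaultfd.h>
#include <stdint.h>
#include <sys/ioctl.h>

/* Hypothetical helper: resolve a minor fault by installing page table
   entries for page(s) already present in the page cache.  The range
   must be aligned to the page size of the backing memory.  With mode 0
   the faulting thread(s) are woken; UFFDIO_CONTINUE_MODE_DONTWAKE
   would defer that.  Returns 0 on success, -1 with errno set. */
static int
uffd_continue_range(int uffd, uint64_t start, uint64_t len)
{
    struct uffdio_continue cont = {
        .range = { .start = start, .len = len },
        .mode  = 0,
    };

    return ioctl(uffd, UFFDIO_CONTINUE, &cont) == -1 ? -1 : 0;
}
```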

       Minor fault mode supports only hugetlbfs-backed (since Linux 5.13) and
       shmem-backed (since Linux 5.14) memory.

   Reading from the userfaultfd structure
       Each read(2) from the userfaultfd file descriptor returns one or more
       uffd_msg structures, each of which describes a page-fault event or an
       event required for the non-cooperative userfaultfd usage:

           struct uffd_msg {
               __u8  event;            /* Type of event */
               ...
               union {
                   struct {
                       __u64 flags;    /* Flags describing fault */
                       __u64 address;  /* Faulting address */
                       union {
                           __u32 ptid; /* Thread ID of the fault */
                       } feat;
                   } pagefault;

                   struct {            /* Since Linux 4.11 */
                       __u32 ufd;      /* Userfault file descriptor
                                          of the child process */
                   } fork;

                   struct {            /* Since Linux 4.11 */
                       __u64 from;     /* Old address of remapped area */
                       __u64 to;       /* New address of remapped area */
                       __u64 len;      /* Original mapping length */
                   } remap;

                   struct {            /* Since Linux 4.11 */
                       __u64 start;    /* Start address of removed area */
                       __u64 end;      /* End address of removed area */
                   } remove;
                   ...
               } arg;

               /* Padding fields omitted */
           } __packed;

       If multiple events are available and the supplied buffer is large
       enough, read(2) returns as many events as will fit in the supplied
       buffer.  If the buffer supplied to read(2) is smaller than the size of
       the uffd_msg structure, the read(2) fails with the error EINVAL.
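
       Because a single read(2) can return several messages, a handler can
       read into an array and then process each entry.  A minimal sketch;
       uffd_read_batch() is a hypothetical name, and treating EAGAIN as "no
       messages" assumes the descriptor is non-blocking.

```c
#include <errno.h>
#include <linux/userfaultfd.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: read as many uffd_msg structures as fit in
   msgs[0..max-1] with a single read(2).  Returns the number of
   messages read, 0 if none were available on a non-blocking
   descriptor, or -1 with errno set on error.  Retries on EINTR. */
static ssize_t
uffd_read_batch(int uffd, struct uffd_msg *msgs, size_t max)
{
    ssize_t n;

    do
        n = read(uffd, msgs, max * sizeof(*msgs));
    while (n == -1 && errno == EINTR);

    if (n == -1)
        return (errno == EAGAIN) ? 0 : -1;

    /* Convert the byte count into a count of whole messages. */
    return n / (ssize_t) sizeof(*msgs);
}
```

       Sizing the buffer as an array of struct uffd_msg also guarantees that
       it is never smaller than one message, which would cause EINVAL.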

       The fields set in the uffd_msg structure are as follows:

       event  The type of event.  Depending on the event type, different
              fields of the arg union represent details required for the event
              processing.  The non-page-fault events are generated only when
              the appropriate feature is enabled during the API handshake with
              the UFFDIO_API ioctl(2).

              The following values can appear in the event field:

              UFFD_EVENT_PAGEFAULT (since Linux 4.3)
                     A page-fault event.  The page-fault details are
                     available in the pagefault field.

              UFFD_EVENT_FORK (since Linux 4.11)
                     Generated when the faulting process invokes fork(2) (or
                     clone(2) without the CLONE_VM flag).  The event details
                     are available in the fork field.

              UFFD_EVENT_REMAP (since Linux 4.11)
                     Generated when the faulting process invokes mremap(2).
                     The event details are available in the remap field.

              UFFD_EVENT_REMOVE (since Linux 4.11)
                     Generated when the faulting process invokes madvise(2)
                     with MADV_DONTNEED or MADV_REMOVE advice.  The event
                     details are available in the remove field.

              UFFD_EVENT_UNMAP (since Linux 4.11)
                     Generated when the faulting process unmaps a memory
                     range, either explicitly using munmap(2) or implicitly
                     during mmap(2) or mremap(2).  The event details are
                     available in the remove field.

       pagefault.address
              The address that triggered the page fault.

       pagefault.flags
              A bit mask of flags that describe the event.  For
              UFFD_EVENT_PAGEFAULT, the following flags may appear:

              UFFD_PAGEFAULT_FLAG_WP
                     If this flag is set, then the fault was a write-protect
                     fault.

              UFFD_PAGEFAULT_FLAG_MINOR
                     If this flag is set, then the fault was a minor fault.

              UFFD_PAGEFAULT_FLAG_WRITE
                     If this flag is set, then the fault was a write fault.

              If neither UFFD_PAGEFAULT_FLAG_WP nor UFFD_PAGEFAULT_FLAG_MINOR
              is set, then the fault was a missing fault.

       pagefault.feat.ptid
              The thread ID that triggered the page fault.  This field is
              filled in only if the UFFD_FEATURE_THREAD_ID feature was
              enabled.

       fork.ufd
              The file descriptor associated with the userfaultfd object
              created for the child created by fork(2).

       remap.from
              The original address of the memory range that was remapped
              using mremap(2).

       remap.to
              The new address of the memory range that was remapped using
              mremap(2).

       remap.len
              The original length of the memory range that was remapped
              using mremap(2).

       remove.start
              The start address of the memory range that was freed using
              madvise(2) or unmapped.

       remove.end
              The end address of the memory range that was freed using
              madvise(2) or unmapped.
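
       The rules above for interpreting event and pagefault.flags can be
       collected into a small dispatcher.  A sketch, assuming headers from
       Linux 5.13 or later (for UFFD_PAGEFAULT_FLAG_MINOR); uffd_fault_kind()
       and uffd_describe() are hypothetical names, and the output format is
       arbitrary.

```c
#include <inttypes.h>
#include <linux/userfaultfd.h>
#include <stdio.h>

/* Classify a page fault from pagefault.flags: WP and MINOR identify
   those modes; otherwise the fault was a missing fault. */
static const char *
uffd_fault_kind(uint64_t flags)
{
    if (flags & UFFD_PAGEFAULT_FLAG_WP)
        return "write-protect";
    if (flags & UFFD_PAGEFAULT_FLAG_MINOR)
        return "minor";
    return "missing";
}

/* Hypothetical helper: write a one-line description of *m into buf,
   choosing the arg union member that matches m->event. */
static void
uffd_describe(const struct uffd_msg *m, char *buf, size_t size)
{
    switch (m->event) {
    case UFFD_EVENT_PAGEFAULT:
        snprintf(buf, size, "pagefault %s%s at 0x%" PRIx64,
                 uffd_fault_kind(m->arg.pagefault.flags),
                 (m->arg.pagefault.flags & UFFD_PAGEFAULT_FLAG_WRITE)
                     ? " (write)" : "",
                 (uint64_t) m->arg.pagefault.address);
        break;
    case UFFD_EVENT_FORK:
        snprintf(buf, size, "fork: child uffd %" PRIu32,
                 (uint32_t) m->arg.fork.ufd);
        break;
    case UFFD_EVENT_REMAP:
        snprintf(buf, size, "remap 0x%" PRIx64 " -> 0x%" PRIx64
                 " len %" PRIu64, (uint64_t) m->arg.remap.from,
                 (uint64_t) m->arg.remap.to, (uint64_t) m->arg.remap.len);
        break;
    case UFFD_EVENT_REMOVE:
    case UFFD_EVENT_UNMAP:      /* Both report via the remove field */
        snprintf(buf, size, "%s 0x%" PRIx64 "-0x%" PRIx64,
                 m->event == UFFD_EVENT_REMOVE ? "remove" : "unmap",
                 (uint64_t) m->arg.remove.start,
                 (uint64_t) m->arg.remove.end);
        break;
    default:
        snprintf(buf, size, "unknown event %u", (unsigned) m->event);
        break;
    }
}
```

       For a missing fault at 0x7fd30106c00f with no flags set, uffd_describe()
       produces "pagefault missing at 0x7fd30106c00f".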

       A read(2) on a userfaultfd file descriptor can fail with the following
       errors:

       EINVAL The userfaultfd object has not yet been enabled using the
              UFFDIO_API ioctl(2) operation.

       If the O_NONBLOCK flag is enabled in the associated open file
       description, the userfaultfd file descriptor can be monitored with
       poll(2), select(2), and epoll(7).  When events are available, the file
       descriptor is reported as readable.  If the O_NONBLOCK flag is not
       enabled, then poll(2) (always) indicates the file as having a POLLERR
       condition, and select(2) indicates the file descriptor as both readable
       and writable.
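
       A sketch of waiting for events with poll(2), assuming the descriptor
       was created with (or later switched to) O_NONBLOCK; the helper name
       uffd_wait_readable() and the choice of EINVAL for an error condition
       are assumptions of this example.

```c
#include <errno.h>
#include <poll.h>

/* Hypothetical helper: wait up to timeout_ms milliseconds (-1 means
   forever) for a non-blocking userfaultfd to become readable.  Returns
   1 if readable, 0 on timeout, -1 on error.  POLLERR is treated as an
   error because poll(2) always reports it when O_NONBLOCK is not set.
   Note: retrying after EINTR restarts the full timeout. */
static int
uffd_wait_readable(int uffd, int timeout_ms)
{
    struct pollfd pfd = { .fd = uffd, .events = POLLIN };
    int n;

    do
        n = poll(&pfd, 1, timeout_ms);
    while (n == -1 && errno == EINTR);

    if (n <= 0)
        return n;               /* 0: timeout; -1: poll() failed */
    if (pfd.revents & (POLLERR | POLLNVAL)) {
        errno = EINVAL;         /* Arbitrary choice for this sketch */
        return -1;
    }
    return (pfd.revents & POLLIN) ? 1 : 0;
}
```

       After a return value of 1, a non-blocking read(2) can drain the queued
       messages, treating EAGAIN as "no more events for now".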

RETURN VALUE
       On success, userfaultfd() returns a new file descriptor that refers to
       the userfaultfd object.  On error, -1 is returned, and errno is set to
       indicate the error.

ERRORS
       EINVAL An unsupported value was specified in flags.

       EMFILE The per-process limit on the number of open file descriptors has
              been reached.

       ENFILE The system-wide limit on the total number of open files has been
              reached.

       ENOMEM Insufficient kernel memory was available.

       EPERM (since Linux 5.2)
              The caller is not privileged (does not have the CAP_SYS_PTRACE
              capability in the initial user namespace), and
              /proc/sys/vm/unprivileged_userfaultfd has the value 0.
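
       When userfaultfd() fails with EPERM, it can help to report the sysctl
       mentioned above.  A minimal sketch; read_unprivileged_userfaultfd() is
       a hypothetical name, and treating a missing file as "unknown" (-1) is
       an assumption of this example.

```c
#include <stdio.h>

/* Hypothetical helper: read /proc/sys/vm/unprivileged_userfaultfd.
   Returns its integer value, or -1 if the file is absent (for example,
   on kernels before Linux 5.2) or cannot be parsed. */
static int
read_unprivileged_userfaultfd(void)
{
    FILE *f = fopen("/proc/sys/vm/unprivileged_userfaultfd", "r");
    int value = -1;

    if (f == NULL)
        return -1;
    if (fscanf(f, "%d", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}
```

       A value of 0 together with EPERM is consistent with the caller lacking
       CAP_SYS_PTRACE in the initial user namespace.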

STANDARDS
       Linux.

HISTORY
       Linux 4.3.

       Support for hugetlbfs and shared memory areas and non-page-fault events
       was added in Linux 4.11.

NOTES
       The userfaultfd mechanism can be used as an alternative to traditional
       user-space paging techniques based on the use of the SIGSEGV signal and
       mmap(2).  It can also be used to implement lazy restore for
       checkpoint/restore mechanisms, as well as post-copy migration to allow
       (nearly) uninterrupted execution when transferring virtual machines and
       Linux containers from one host to another.

BUGS
       If the UFFD_FEATURE_EVENT_FORK feature is enabled and a system call
       from the fork(2) family is interrupted by a signal or fails, a stale
       userfaultfd descriptor might be created.  In this case, a spurious
       UFFD_EVENT_FORK will be delivered to the userfaultfd monitor.

EXAMPLES
       The program below demonstrates the use of the userfaultfd mechanism.
       The program uses two threads, one of which acts as the page-fault
       handler for the process, for the pages in a demand-zero region created
       using mmap(2).

       The program takes one command-line argument, which is the number of
       pages that will be created in a mapping whose page faults will be
       handled via userfaultfd.  After creating a userfaultfd object, the
       program then creates an anonymous private mapping of the specified
       size and registers the address range of that mapping using the
       UFFDIO_REGISTER ioctl(2) operation.  The program then creates a second
       thread that will perform the task of handling page faults.

       The main thread then walks through the pages of the mapping, fetching
       bytes from successive pages.  Because the pages have not yet been
       accessed, the first access of a byte in each page will trigger a
       page-fault event on the userfaultfd file descriptor.

       Each of the page-fault events is handled by the second thread, which
       sits in a loop processing input from the userfaultfd file descriptor.
       In each loop iteration, the second thread first calls poll(2) to check
       the state of the file descriptor, and then reads an event from the file
       descriptor.  All such events should be UFFD_EVENT_PAGEFAULT events,
       which the thread handles by copying a page of data into the faulting
       region using the UFFDIO_COPY ioctl(2) operation.

       The following is an example of what we see when running the program:

           $ ./userfaultfd_demo 3
           Address returned by mmap() = 0x7fd30106c000

           fault_handler_thread():
           poll() returns: nready = 1; POLLIN = 1; POLLERR = 0
           UFFD_EVENT_PAGEFAULT event: flags = 0; address = 7fd30106c00f
           (uffdio_copy.copy returned 4096)
           Read address 0x7fd30106c00f in main(): A
           Read address 0x7fd30106c40f in main(): A
           Read address 0x7fd30106c80f in main(): A
           Read address 0x7fd30106cc0f in main(): A

           fault_handler_thread():
           poll() returns: nready = 1; POLLIN = 1; POLLERR = 0
           UFFD_EVENT_PAGEFAULT event: flags = 0; address = 7fd30106d00f
           (uffdio_copy.copy returned 4096)
           Read address 0x7fd30106d00f in main(): B
           Read address 0x7fd30106d40f in main(): B
           Read address 0x7fd30106d80f in main(): B
           Read address 0x7fd30106dc0f in main(): B

           fault_handler_thread():
           poll() returns: nready = 1; POLLIN = 1; POLLERR = 0
           UFFD_EVENT_PAGEFAULT event: flags = 0; address = 7fd30106e00f
           (uffdio_copy.copy returned 4096)
           Read address 0x7fd30106e00f in main(): C
           Read address 0x7fd30106e40f in main(): C
           Read address 0x7fd30106e80f in main(): C
           Read address 0x7fd30106ec0f in main(): C

   Program source

       /* userfaultfd_demo.c

          Licensed under the GNU General Public License version 2 or later.
       */
       #define _GNU_SOURCE
       #include <err.h>
       #include <errno.h>
       #include <fcntl.h>
       #include <inttypes.h>
       #include <linux/userfaultfd.h>
       #include <poll.h>
       #include <pthread.h>
       #include <stdio.h>
       #include <stdlib.h>
       #include <string.h>
       #include <sys/ioctl.h>
       #include <sys/mman.h>
       #include <sys/syscall.h>
       #include <unistd.h>

       static int page_size;

       static void *
       fault_handler_thread(void *arg)
       {
           int                 nready;
           long                uffd;       /* userfaultfd file descriptor */
           ssize_t             nread;
           struct pollfd       pollfd;
           struct uffdio_copy  uffdio_copy;

           static int              fault_cnt = 0; /* Number of faults so far
                                                      handled */
           static char             *page = NULL;
           static struct uffd_msg  msg;   /* Data read from userfaultfd */

           uffd = (long) arg;

           /* Create a page that will be copied into the faulting region. */

           if (page == NULL) {
               page = mmap(NULL, page_size, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
               if (page == MAP_FAILED)
                   err(EXIT_FAILURE, "mmap");
           }

           /* Loop, handling incoming events on the userfaultfd
              file descriptor. */

           for (;;) {

               /* See what poll() tells us about the userfaultfd. */

               pollfd.fd = uffd;
               pollfd.events = POLLIN;
               nready = poll(&pollfd, 1, -1);
               if (nready == -1)
                   err(EXIT_FAILURE, "poll");

               printf("\nfault_handler_thread():\n");
               printf(" poll() returns: nready = %d; "
                      "POLLIN = %d; POLLERR = %d\n", nready,
                      (pollfd.revents & POLLIN) != 0,
                      (pollfd.revents & POLLERR) != 0);

               /* Read an event from the userfaultfd. */

               nread = read(uffd, &msg, sizeof(msg));
               if (nread == 0) {
                   printf("EOF on userfaultfd!\n");
                   exit(EXIT_FAILURE);
               }

               if (nread == -1)
                   err(EXIT_FAILURE, "read");

               /* We expect only one kind of event; verify that assumption. */

               if (msg.event != UFFD_EVENT_PAGEFAULT) {
                   fprintf(stderr, "Unexpected event on userfaultfd\n");
                   exit(EXIT_FAILURE);
               }

               /* Display info about the page-fault event.  The casts match
                  the __u64 fields to the PRIx64 conversion. */

               printf(" UFFD_EVENT_PAGEFAULT event: ");
               printf("flags = %"PRIx64"; ",
                      (uint64_t) msg.arg.pagefault.flags);
               printf("address = %"PRIx64"\n",
                      (uint64_t) msg.arg.pagefault.address);

               /* Copy the page pointed to by 'page' into the faulting
                  region.  Vary the contents that are copied in, so that it
                  is more obvious that each fault is handled separately. */

               memset(page, 'A' + fault_cnt % 20, page_size);
               fault_cnt++;

               uffdio_copy.src = (unsigned long) page;

               /* We need to handle page faults in units of pages(!).
                  So, round faulting address down to page boundary. */

               uffdio_copy.dst = (unsigned long) msg.arg.pagefault.address &
                                 ~(page_size - 1);
               uffdio_copy.len = page_size;
               uffdio_copy.mode = 0;
               uffdio_copy.copy = 0;
               if (ioctl(uffd, UFFDIO_COPY, &uffdio_copy) == -1)
                   err(EXIT_FAILURE, "ioctl-UFFDIO_COPY");

               printf(" (uffdio_copy.copy returned %"PRId64")\n",
                      (int64_t) uffdio_copy.copy);
           }
       }

       int
       main(int argc, char *argv[])
       {
           int                     s;
           char                    c;
           char                    *addr;  /* Start of region handled by
                                              userfaultfd */
           long                    uffd;   /* userfaultfd file descriptor */
           size_t                  len, l; /* Length of region handled by
                                              userfaultfd */
           pthread_t               thr;    /* ID of thread that handles page
                                              faults */
           struct uffdio_api       uffdio_api;
           struct uffdio_register  uffdio_register;

           if (argc != 2) {
               fprintf(stderr, "Usage: %s num-pages\n", argv[0]);
               exit(EXIT_FAILURE);
           }

           page_size = sysconf(_SC_PAGE_SIZE);
           len = strtoull(argv[1], NULL, 0) * page_size;

           /* Create and enable userfaultfd object. */

           uffd = syscall(SYS_userfaultfd, O_CLOEXEC | O_NONBLOCK);
           if (uffd == -1)
               err(EXIT_FAILURE, "userfaultfd");

           uffdio_api.api = UFFD_API;
           uffdio_api.features = 0;
           if (ioctl(uffd, UFFDIO_API, &uffdio_api) == -1)
               err(EXIT_FAILURE, "ioctl-UFFDIO_API");

           /* Create a private anonymous mapping.  The memory will be
              demand-zero paged--that is, not yet allocated.  When we
              actually touch the memory, it will be allocated via
              the userfaultfd. */

           addr = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
           if (addr == MAP_FAILED)
               err(EXIT_FAILURE, "mmap");

           printf("Address returned by mmap() = %p\n", addr);

           /* Register the memory range of the mapping we just created for
              handling by the userfaultfd object.  In 'mode', we request to
              track missing pages (i.e., pages that have not yet been
              faulted in). */

           uffdio_register.range.start = (unsigned long) addr;
           uffdio_register.range.len = len;
           uffdio_register.mode = UFFDIO_REGISTER_MODE_MISSING;
           if (ioctl(uffd, UFFDIO_REGISTER, &uffdio_register) == -1)
               err(EXIT_FAILURE, "ioctl-UFFDIO_REGISTER");

           /* Create a thread that will process the userfaultfd events. */

           s = pthread_create(&thr, NULL, fault_handler_thread, (void *) uffd);
           if (s != 0) {
               errno = s;      /* pthread_create() returns an error number
                                  instead of setting errno */
               err(EXIT_FAILURE, "pthread_create");
           }

           /* Main thread now touches memory in the mapping, touching
              locations 1024 bytes apart.  This will trigger userfaultfd
              events for all pages in the region. */

           l = 0xf;    /* Ensure that faulting address is not on a page
                          boundary, in order to test that we correctly
                          handle that case in fault_handler_thread(). */
           while (l < len) {
               c = addr[l];
               printf("Read address %p in %s(): ", addr + l, __func__);
               printf("%c\n", c);
               l += 1024;
               usleep(100000);         /* Slow things down a little */
           }

           exit(EXIT_SUCCESS);
       }

SEE ALSO
       fcntl(2), ioctl(2), ioctl_userfaultfd(2), madvise(2), mmap(2)

       Documentation/admin-guide/mm/userfaultfd.rst in the Linux kernel source
       tree

Linux man-pages 6.04               2023-03-30                 userfaultfd(2)