userfaultfd(2)                System Calls Manual               userfaultfd(2)

NAME

       userfaultfd - create a file descriptor for handling page faults in user
       space

LIBRARY

       Standard C library (libc, -lc)

SYNOPSIS

       #include <fcntl.h>             /* Definition of O_* constants */
       #include <sys/syscall.h>       /* Definition of SYS_* constants */
       #include <linux/userfaultfd.h> /* Definition of UFFD_* constants */
       #include <unistd.h>

       int syscall(SYS_userfaultfd, int flags);

       Note: glibc provides no wrapper for  userfaultfd(),  necessitating  the
       use of syscall(2).

DESCRIPTION

       userfaultfd()  creates  a  new  userfaultfd object that can be used for
       delegation of page-fault handling to a user-space application, and  re‐
       turns  a  file descriptor that refers to the new object.  The new user‐
       faultfd object is configured using ioctl(2).

       Once the userfaultfd object is  configured,  the  application  can  use
       read(2)  to  receive  userfaultfd  notifications.  The reads from user‐
       faultfd may be blocking or non-blocking,  depending  on  the  value  of
       flags  used  for the creation of the userfaultfd or subsequent calls to
       fcntl(2).

       The following values may be bitwise ORed in flags to change the  behav‐
       ior of userfaultfd():

       O_CLOEXEC
              Enable  the  close-on-exec flag for the new userfaultfd file de‐
              scriptor.  See the description of the O_CLOEXEC flag in open(2).

       O_NONBLOCK
              Enables non-blocking operation for the userfaultfd object.   See
              the description of the O_NONBLOCK flag in open(2).

       UFFD_USER_MODE_ONLY
              This is a userfaultfd-specific flag that was introduced in
              Linux 5.11.  When set, the userfaultfd object will only be able
              to handle page faults originating from user space on the
              registered regions.  When a kernel-originated fault is triggered
              on the registered range with this userfaultfd, a SIGBUS signal
              will be delivered.
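
       As a minimal sketch of the flag handling described above, the helper
       below creates a userfaultfd object with O_CLOEXEC always set and lets
       the caller OR in further flags.  The name uffd_open() is hypothetical
       and not part of any library; the sketch assumes kernel headers that
       define SYS_userfaultfd.

```c
#define _GNU_SOURCE
#include <fcntl.h>             /* O_CLOEXEC, O_NONBLOCK */
#include <linux/userfaultfd.h> /* UFFD_* constants */
#include <sys/syscall.h>       /* SYS_userfaultfd */
#include <unistd.h>            /* syscall(), close() */

/* Hypothetical helper: create a userfaultfd object.
   O_CLOEXEC is always set; extra_flags may add O_NONBLOCK or, on
   Linux 5.11 and later, UFFD_USER_MODE_ONLY.
   Returns the new file descriptor, or -1 with errno set. */
static int
uffd_open(int extra_flags)
{
    return (int) syscall(SYS_userfaultfd, O_CLOEXEC | extra_flags);
}
```

       A caller would typically invoke uffd_open(O_NONBLOCK) and treat a
       return value of -1 as described under ERRORS.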

       When the last file descriptor referring  to  a  userfaultfd  object  is
       closed,  all memory ranges that were registered with the object are un‐
       registered and unread events are flushed.

       Userfaultfd supports three modes of registration:

       UFFDIO_REGISTER_MODE_MISSING (since Linux 4.10)
              When registered with  UFFDIO_REGISTER_MODE_MISSING  mode,  user-
              space will receive a page-fault notification when a missing page
              is accessed.  The faulted thread will be stopped from  execution
              until  the  page  fault is resolved from user-space by either an
              UFFDIO_COPY or an UFFDIO_ZEROPAGE ioctl.

       UFFDIO_REGISTER_MODE_MINOR (since Linux 5.13)
              When registered with UFFDIO_REGISTER_MODE_MINOR mode, user-space
              will  receive  a page-fault notification when a minor page fault
              occurs.  That is, when a backing page is in the page cache,  but
              page  table entries don't yet exist.  The faulted thread will be
              stopped from execution until the page  fault  is  resolved  from
              user-space by an UFFDIO_CONTINUE ioctl.

       UFFDIO_REGISTER_MODE_WP (since Linux 5.7)
              When  registered  with  UFFDIO_REGISTER_MODE_WP mode, user-space
              will receive a page-fault notification  when  a  write-protected
              page is written.  The faulted thread will be stopped from execu‐
              tion until user-space write-unprotects the page  using  an  UFF‐
              DIO_WRITEPROTECT ioctl.

       Multiple  modes  can  be  enabled  at the same time for the same memory
       range.

       Since Linux 4.14, a userfaultfd page-fault notification can selectively
       embed  faulting thread ID information into the notification.  One needs
       to enable this feature explicitly using the UFFD_FEATURE_THREAD_ID fea‐
       ture bit when initializing the userfaultfd context.  By default, thread
       ID reporting is disabled.

   Usage
       The userfaultfd mechanism is designed to allow a  thread  in  a  multi‐
       threaded  program to perform user-space paging for the other threads in
       the process.  When a page fault occurs for one of  the  regions  regis‐
       tered  to  the  userfaultfd object, the faulting thread is put to sleep
       and an event is generated that can be read via the userfaultfd file de‐
       scriptor.   The  fault-handling  thread reads events from this file de‐
       scriptor  and  services  them  using  the   operations   described   in
       ioctl_userfaultfd(2).  When servicing the page fault events, the fault-
       handling thread can trigger a wake-up for the sleeping thread.

       It is possible for the faulting threads and the fault-handling  threads
       to  run  in  the  context  of different processes.  In this case, these
       threads may belong to different programs, and the program that executes
       the  faulting  threads  will not necessarily cooperate with the program
       that handles the  page  faults.   In  such  non-cooperative  mode,  the
       process  that  monitors userfaultfd and handles page faults needs to be
       aware of the changes in the  virtual  memory  layout  of  the  faulting
       process to avoid memory corruption.

       Since  Linux  4.11,  userfaultfd  can  also  notify  the fault-handling
       threads about changes in the virtual  memory  layout  of  the  faulting
       process.   In  addition,  if  the faulting process invokes fork(2), the
       userfaultfd objects associated with the parent may be  duplicated  into
       the child process and the userfaultfd monitor will be notified (via the
       UFFD_EVENT_FORK described below) about the file  descriptor  associated
       with the userfaultfd objects created for the child process, which allows
       the userfaultfd monitor to perform  user-space  paging  for  the  child
       process.   Unlike  page faults which have to be synchronous and require
       an explicit or implicit wakeup, all other events  are  delivered  asyn‐
       chronously and the non-cooperative process resumes execution as soon as
       the userfaultfd manager  executes  read(2).   The  userfaultfd  manager
       should  carefully  synchronize calls to UFFDIO_COPY with the processing
       of events.

       The current asynchronous model of the event  delivery  is  optimal  for
       single-threaded non-cooperative userfaultfd manager implementations.

       Since  Linux  5.7,  userfaultfd  is  able  to do synchronous page dirty
       tracking using the new write-protect register mode.  One  should  check
       against  the  feature  bit  UFFD_FEATURE_PAGEFAULT_FLAG_WP before using
       this feature.  Similar to the original userfaultfd  missing  mode,  the
       write-protect  mode  will  generate a userfaultfd notification when the
       protected page is written.  The user needs to resolve the page fault by
       unprotecting  the  faulted  page and kicking the faulted thread to con‐
       tinue.  For more information, please refer to the  "Userfaultfd  write-
       protect mode" section.

   Userfaultfd operation
       After  the userfaultfd object is created with userfaultfd(), the appli‐
       cation must enable it using the UFFDIO_API  ioctl(2)  operation.   This
       operation  allows  a handshake between the kernel and user space to de‐
       termine the API version and supported features.  This operation must be
       performed  before  any of the other ioctl(2) operations described below
       (or those operations fail with the EINVAL error).
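
       The handshake described above could be wrapped as in the following
       sketch.  The helper names uffd_handshake() and features_missing() are
       hypothetical.  The sketch assumes that, on success, the kernel
       overwrites the features field with the set of features it supports,
       as documented in ioctl_userfaultfd(2).

```c
#include <linux/userfaultfd.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

/* Hypothetical helper: return the bits of 'wanted' that are absent
   from 'supported' (0 means every wanted feature is available). */
static uint64_t
features_missing(uint64_t supported, uint64_t wanted)
{
    return wanted & ~supported;
}

/* Hypothetical helper: perform the UFFDIO_API handshake, requesting
   the feature bits in 'wanted' (for example UFFD_FEATURE_THREAD_ID).
   On success, returns 0 and, if 'supported' is not NULL, stores the
   feature bits reported back by the kernel.
   On failure, returns -1 with errno set by ioctl(2). */
static int
uffd_handshake(int uffd, uint64_t wanted, uint64_t *supported)
{
    struct uffdio_api api;

    memset(&api, 0, sizeof(api));
    api.api = UFFD_API;
    api.features = wanted;

    if (ioctl(uffd, UFFDIO_API, &api) == -1)
        return -1;

    if (supported != NULL)
        *supported = api.features;
    return 0;
}
```

       Passing 0 for wanted requests no optional features, which is what the
       example program in this page does.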

       After a successful UFFDIO_API operation, the application then registers
       memory  address  ranges  using  the UFFDIO_REGISTER ioctl(2) operation.
       After successful completion of  a  UFFDIO_REGISTER  operation,  a  page
       fault  occurring in the requested memory range, and satisfying the mode
       defined at the registration time, will be forwarded by  the  kernel  to
       the user-space application.  The application can then use the
       UFFDIO_COPY, UFFDIO_ZEROPAGE, or UFFDIO_CONTINUE ioctl(2) operations to
       resolve the page fault.
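
       The register-then-resolve sequence above might look like the
       following sketch for missing mode.  All three helper names are
       hypothetical, and page_base() assumes that the page size is a power
       of two.

```c
#include <linux/userfaultfd.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

/* Round addr down to the start of its page.
   Assumes page_size is a power of two. */
static uint64_t
page_base(uint64_t addr, uint64_t page_size)
{
    return addr & ~(page_size - 1);
}

/* Hypothetical helper: register [start, start + len) for
   missing-page faults.  Returns 0 on success, -1 with errno set. */
static int
uffd_register_missing(int uffd, void *start, size_t len)
{
    struct uffdio_register reg;

    memset(&reg, 0, sizeof(reg));
    reg.range.start = (uintptr_t) start;
    reg.range.len = len;
    reg.mode = UFFDIO_REGISTER_MODE_MISSING;
    return ioctl(uffd, UFFDIO_REGISTER, &reg);
}

/* Hypothetical helper: resolve a missing fault at fault_addr by
   copying one page from src_page into the faulting page.
   Returns 0 on success, -1 with errno set. */
static int
uffd_resolve_copy(int uffd, uint64_t fault_addr, const void *src_page,
                  uint64_t page_size)
{
    struct uffdio_copy copy;

    memset(&copy, 0, sizeof(copy));
    copy.dst = page_base(fault_addr, page_size);
    copy.src = (uintptr_t) src_page;
    copy.len = page_size;
    copy.mode = 0;              /* Default: wake the faulting thread */
    return ioctl(uffd, UFFDIO_COPY, &copy);
}
```

       The rounding step matters because the reported fault address is the
       exact address that was accessed, while UFFDIO_COPY operates on whole
       pages.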

       Since  Linux 4.14, if the application sets the UFFD_FEATURE_SIGBUS fea‐
       ture bit using the UFFDIO_API ioctl(2), no page-fault notification will
       be  forwarded  to  user space.  Instead a SIGBUS signal is delivered to
       the faulting process.  With this feature, userfaultfd can be  used  for
       robustness purposes to simply catch any access to areas within the reg‐
       istered address range that do not have pages allocated, without  having
       to  listen  to  userfaultfd events.  No userfaultfd monitor will be re‐
       quired for dealing with such memory accesses.  For example,  this  fea‐
       ture  can  be  useful  for applications that want to prevent the kernel
       from automatically allocating pages and filling holes in  sparse  files
       when the hole is accessed through a memory mapping.

       The UFFD_FEATURE_SIGBUS feature is implicitly inherited through fork(2)
       if used in combination with UFFD_FEATURE_EVENT_FORK.

       Details of the various ioctl(2) operations can be found in  ioctl_user‐
       faultfd(2).

       Since Linux 4.11, events other than page faults may be enabled during
       the UFFDIO_API operation.

       Before Linux 4.11, userfaultfd could be used only with anonymous
       private memory mappings.  Since Linux 4.11, userfaultfd can also be
       used with hugetlbfs and shared memory mappings.

   Userfaultfd write-protect mode (since Linux 5.7)
       Since Linux 5.7, userfaultfd supports write-protect mode for  anonymous
       memory.  The user needs to first check availability of this feature us‐
       ing  UFFDIO_API  ioctl  against  the  feature  bit   UFFD_FEATURE_PAGE‐
       FAULT_FLAG_WP before using this feature.

       Since Linux 5.19, write-protect mode is also supported on shmem and
       hugetlbfs memory types.  It can be detected with the feature bit
       UFFD_FEATURE_WP_HUGETLBFS_SHMEM.

       To register with userfaultfd write-protect mode, the user needs to ini‐
       tiate the UFFDIO_REGISTER ioctl with mode UFFDIO_REGISTER_MODE_WP  set.
       Note  that  it  is legal to monitor the same memory range with multiple
       modes.  For example, the user can do UFFDIO_REGISTER with the mode  set
       to  UFFDIO_REGISTER_MODE_MISSING | UFFDIO_REGISTER_MODE_WP.  When there
       is only UFFDIO_REGISTER_MODE_WP registered, user-space will not receive
       any  notification  when a missing page is written.  Instead, user-space
       will receive a write-protect page-fault notification only when an
       existing but write-protected page is written.

       After the UFFDIO_REGISTER ioctl has completed with
       UFFDIO_REGISTER_MODE_WP mode set, the user can write-protect any
       existing memory within the range using the ioctl UFFDIO_WRITEPROTECT
       where uffdio_writeprotect.mode should be set to
       UFFDIO_WRITEPROTECT_MODE_WP.

       When a write-protect event happens, user-space will receive a
       page-fault notification whose uffd_msg.pagefault.flags will have the
       UFFD_PAGEFAULT_FLAG_WP flag set.  Note: since only writes can trigger
       this kind of fault, write-protect notifications will always have the
       UFFD_PAGEFAULT_FLAG_WRITE bit set along with the UFFD_PAGEFAULT_FLAG_WP
       bit.

       To resolve a write-protection page fault, the user should initiate
       another UFFDIO_WRITEPROTECT ioctl, whose uffdio_writeprotect.mode
       should have the flag UFFDIO_WRITEPROTECT_MODE_WP cleared upon the
       faulted page or range.
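
       Protecting and unprotecting a range use the same ioctl and differ
       only in the mode field, as the following sketch illustrates.  The
       helper name uffd_writeprotect() is hypothetical, and the sketch
       assumes kernel headers from Linux 5.7 or later.

```c
#include <linux/userfaultfd.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

/* Hypothetical helper: write-protect (protect != 0) or
   write-unprotect (protect == 0) the page-aligned range
   [start, start + len).  The range must already be registered with
   UFFDIO_REGISTER_MODE_WP.
   Returns 0 on success, -1 with errno set. */
static int
uffd_writeprotect(int uffd, uint64_t start, uint64_t len, int protect)
{
    struct uffdio_writeprotect wp;

    memset(&wp, 0, sizeof(wp));
    wp.range.start = start;
    wp.range.len = len;
    /* Setting the flag protects; clearing it resolves the fault. */
    wp.mode = protect ? UFFDIO_WRITEPROTECT_MODE_WP : 0;
    return ioctl(uffd, UFFDIO_WRITEPROTECT, &wp);
}
```

       A dirty-tracking monitor would call this with protect set to 1 once
       for the tracked range, then with protect set to 0 for each page
       reported in a write-protect fault.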

   Userfaultfd minor fault mode (since Linux 5.13)
       Since Linux 5.13, userfaultfd supports minor fault mode.  In this mode,
       fault  messages  are  produced not for major faults (where the page was
       missing), but rather for minor faults, where a page exists in the  page
       cache,  but the page table entries are not yet present.  The user needs
       to first check availability of this feature using the UFFDIO_API  ioctl
       with  the  appropriate  feature  bits  set  before  using this feature:
       UFFD_FEATURE_MINOR_HUGETLBFS  since  Linux  5.13,  or  UFFD_FEATURE_MI‐
       NOR_SHMEM since Linux 5.14.

       To register with userfaultfd minor fault mode, the user needs to initi‐
       ate the UFFDIO_REGISTER ioctl with mode UFFDIO_REGISTER_MODE_MINOR set.

       When a minor fault occurs, user-space will receive a page-fault notifi‐
       cation   whose   uffd_msg.pagefault.flags   will  have  the  UFFD_PAGE‐
       FAULT_FLAG_MINOR flag set.

       To resolve a minor page fault, the handler should decide whether or not
       the  existing  page  contents  need  to be modified first.  If so, this
       should be done in-place via a second,  non-userfaultfd-registered  map‐
       ping  to the same backing page (e.g., by mapping the shmem or hugetlbfs
       file twice).  Once the page is considered "up to date", the  fault  can
       be  resolved by initiating an UFFDIO_CONTINUE ioctl, which installs the
       page table entries and (by default) wakes up the faulting thread(s).

       Minor fault mode supports only hugetlbfs-backed (since Linux 5.13)  and
       shmem-backed (since Linux 5.14) memory.
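
       The final step of minor-fault resolution could be sketched as
       follows.  The helper name uffd_continue() is hypothetical, and the
       sketch assumes kernel headers from Linux 5.13 or later, which define
       UFFDIO_CONTINUE and struct uffdio_continue.

```c
#include <linux/userfaultfd.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

/* Hypothetical helper: resolve a minor fault for the page-aligned
   range [start, start + len) once the page-cache contents are
   considered up to date.  With mode 0 the faulting thread(s) are
   woken up.  Returns 0 on success, -1 with errno set. */
static int
uffd_continue(int uffd, uint64_t start, uint64_t len)
{
    struct uffdio_continue cont;

    memset(&cont, 0, sizeof(cont));
    cont.range.start = start;
    cont.range.len = len;
    cont.mode = 0;
    return ioctl(uffd, UFFDIO_CONTINUE, &cont);
}
```

       Any modification of the page contents would happen before this call,
       through the second, unregistered mapping of the same file.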

   Reading from the userfaultfd structure
       Each  read(2)  from the userfaultfd file descriptor returns one or more
       uffd_msg structures, each of which describes a page-fault event  or  an
       event required for the non-cooperative userfaultfd usage:

           struct uffd_msg {
               __u8  event;            /* Type of event */
               ...
               union {
                   struct {
                       __u64 flags;    /* Flags describing fault */
                       __u64 address;  /* Faulting address */
                       union {
                           __u32 ptid; /* Thread ID of the fault */
                       } feat;
                   } pagefault;

                   struct {            /* Since Linux 4.11 */
                       __u32 ufd;      /* Userfault file descriptor
                                          of the child process */
                   } fork;

                   struct {            /* Since Linux 4.11 */
                       __u64 from;     /* Old address of remapped area */
                       __u64 to;       /* New address of remapped area */
                       __u64 len;      /* Original mapping length */
                   } remap;

                   struct {            /* Since Linux 4.11 */
                       __u64 start;    /* Start address of removed area */
                       __u64 end;      /* End address of removed area */
                   } remove;
                   ...
               } arg;

               /* Padding fields omitted */
           } __packed;

       If  multiple  events  are  available  and  the supplied buffer is large
       enough, read(2) returns as many events as will fit in the supplied buf‐
       fer.  If the buffer supplied to read(2) is smaller than the size of the
       uffd_msg structure, the read(2) fails with the error EINVAL.
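
       Because one read(2) may return several events, a monitor can read
       into an array and derive the number of events from the byte count.
       The following sketch illustrates this; the helper names
       read_events() and count_pagefaults() are hypothetical.

```c
#include <linux/userfaultfd.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read up to 'max' events into msgs.
   Returns the number of bytes read (a multiple of
   sizeof(struct uffd_msg)), or -1 with errno set. */
static ssize_t
read_events(int uffd, struct uffd_msg *msgs, size_t max)
{
    return read(uffd, msgs, max * sizeof(msgs[0]));
}

/* Hypothetical helper: count the page-fault events among the
   messages contained in the first nread bytes of msgs. */
static size_t
count_pagefaults(const struct uffd_msg *msgs, ssize_t nread)
{
    size_t nmsgs, count = 0;

    if (nread <= 0)
        return 0;

    nmsgs = (size_t) nread / sizeof(msgs[0]);
    for (size_t i = 0; i < nmsgs; i++) {
        if (msgs[i].event == UFFD_EVENT_PAGEFAULT)
            count++;
    }
    return count;
}
```

       A real monitor would dispatch on msgs[i].event inside the loop rather
       than merely counting.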

       The fields set in the uffd_msg structure are as follows:

       event  The type of event.  Depending on the event type, different
              fields of the arg union represent details required for the event
              processing.  The non-page-fault events are generated only when
              the appropriate feature is enabled during the API handshake with
              the UFFDIO_API ioctl(2).

              The following values can appear in the event field:

              UFFD_EVENT_PAGEFAULT (since Linux 4.3)
                     A page-fault event.  The page-fault details are available
                     in the pagefault field.

              UFFD_EVENT_FORK (since Linux 4.11)
                     Generated  when  the faulting process invokes fork(2) (or
                     clone(2) without the CLONE_VM flag).  The  event  details
                     are available in the fork field.

              UFFD_EVENT_REMAP (since Linux 4.11)
                     Generated  when  the  faulting process invokes mremap(2).
                     The event details are available in the remap field.

              UFFD_EVENT_REMOVE (since Linux 4.11)
                     Generated when the faulting  process  invokes  madvise(2)
                     with  MADV_DONTNEED or MADV_REMOVE advice.  The event de‐
                     tails are available in the remove field.

              UFFD_EVENT_UNMAP (since Linux 4.11)
                     Generated when  the  faulting  process  unmaps  a  memory
                     range,  either  explicitly  using munmap(2) or implicitly
                     during mmap(2)  or  mremap(2).   The  event  details  are
                     available in the remove field.

       pagefault.address
              The address that triggered the page fault.

       pagefault.flags
              A bit mask of flags that describe the event.  For
              UFFD_EVENT_PAGEFAULT, the following flags may appear:

              UFFD_PAGEFAULT_FLAG_WP
                     If this flag is set, then the fault was  a  write-protect
                     fault.

              UFFD_PAGEFAULT_FLAG_MINOR
                     If this flag is set, then the fault was a minor fault.

              UFFD_PAGEFAULT_FLAG_WRITE
                     If this flag is set, then the fault was a write fault.

              If neither UFFD_PAGEFAULT_FLAG_WP nor UFFD_PAGEFAULT_FLAG_MINOR
              is set, then the fault was a missing fault.
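
       The flag rules above can be expressed as a small classifier.  The
       enum and the function name fault_kind() are hypothetical, and the
       sketch assumes kernel headers from Linux 5.13 or later, which define
       UFFD_PAGEFAULT_FLAG_MINOR.

```c
#include <linux/userfaultfd.h>
#include <stdint.h>

enum fault_kind {
    FAULT_MISSING,      /* Neither the WP nor the MINOR flag is set */
    FAULT_WP,           /* UFFD_PAGEFAULT_FLAG_WP is set */
    FAULT_MINOR         /* UFFD_PAGEFAULT_FLAG_MINOR is set */
};

/* Hypothetical helper: classify a page-fault event from the value of
   uffd_msg.arg.pagefault.flags.  UFFD_PAGEFAULT_FLAG_WRITE only says
   that the access was a write, so it does not affect the result. */
static enum fault_kind
fault_kind(uint64_t flags)
{
    if (flags & UFFD_PAGEFAULT_FLAG_WP)
        return FAULT_WP;
    if (flags & UFFD_PAGEFAULT_FLAG_MINOR)
        return FAULT_MINOR;
    return FAULT_MISSING;
}
```

       A handler could switch on the result to choose between
       UFFDIO_WRITEPROTECT, UFFDIO_CONTINUE, and UFFDIO_COPY or
       UFFDIO_ZEROPAGE.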

       pagefault.feat.ptid
              The thread ID that triggered the page fault.

       fork.ufd
              The file descriptor associated with the userfault object created
              for the child created by fork(2).

       remap.from
              The original address of the memory range that was remapped using
              mremap(2).

       remap.to
              The new address of the memory  range  that  was  remapped  using
              mremap(2).

       remap.len
              The  original length of the memory range that was remapped using
              mremap(2).

       remove.start
              The start address of the memory range that was freed using
              madvise(2) or unmapped.

       remove.end
              The end address of the memory range that was freed using
              madvise(2) or unmapped.

       A read(2) on a userfaultfd file descriptor can fail with the  following
       errors:

       EINVAL The userfaultfd object has not yet been enabled using the
              UFFDIO_API ioctl(2) operation.

       If the O_NONBLOCK flag is enabled in the associated open file
       description, the userfaultfd file descriptor can be monitored with
       poll(2), select(2), and epoll(7).  When events are available, the file
       descriptor is indicated as readable.  If the O_NONBLOCK flag is not
       enabled, then poll(2) (always) indicates the file as having a POLLERR
       condition, and select(2) indicates the file descriptor as both readable
       and writable.
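
       The following sketch shows one way a monitor might combine poll(2)
       with a nonblocking read(2).  The helper names wait_readable() and
       read_one_event(), and the return-value conventions, are choices made
       for this example only.

```c
#include <errno.h>
#include <linux/userfaultfd.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper: wait until fd is readable.
   Returns 1 if readable, 0 on timeout, -1 if poll(2) failed, and
   -2 if poll(2) reported POLLERR (which, per the text above, is the
   case for a userfaultfd without O_NONBLOCK). */
static int
wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int n = poll(&pfd, 1, timeout_ms);

    if (n <= 0)
        return n;
    if (pfd.revents & POLLERR)
        return -2;
    return (pfd.revents & POLLIN) ? 1 : 0;
}

/* Hypothetical helper: read a single event.
   Returns 1 if an event was stored in *msg, 0 if no event was
   pending (EAGAIN on a nonblocking descriptor), -1 otherwise. */
static int
read_one_event(int uffd, struct uffd_msg *msg)
{
    ssize_t n = read(uffd, msg, sizeof(*msg));

    if (n == (ssize_t) sizeof(*msg))
        return 1;
    if (n == -1 && errno == EAGAIN)
        return 0;
    return -1;
}
```

       Treating EAGAIN as "no event" lets the monitor return to poll(2)
       if another thread has already consumed the event.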

RETURN VALUE

       On  success, userfaultfd() returns a new file descriptor that refers to
       the userfaultfd object.  On error, -1 is returned, and errno is set  to
       indicate the error.

ERRORS

       EINVAL An unsupported value was specified in flags.

       EMFILE The per-process limit on the number of open file descriptors has
              been reached.

       ENFILE The system-wide limit on the total number of open files has been
              reached.

       ENOMEM Insufficient kernel memory was available.

       EPERM (since Linux 5.2)
              The  caller  is not privileged (does not have the CAP_SYS_PTRACE
              capability in the initial user namespace), and  /proc/sys/vm/un‐
              privileged_userfaultfd has the value 0.
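
       A program that wants to explain an EPERM failure to its user could
       inspect the sysctl named above, as in the following sketch.  The
       helper name unprivileged_uffd_setting() is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: return the value of
   /proc/sys/vm/unprivileged_userfaultfd, or -1 if the file cannot be
   read or parsed (for example, on kernels that lack this sysctl). */
static int
unprivileged_uffd_setting(void)
{
    FILE *fp = fopen("/proc/sys/vm/unprivileged_userfaultfd", "r");
    int value = -1;

    if (fp == NULL)
        return -1;
    if (fscanf(fp, "%d", &value) != 1)
        value = -1;
    fclose(fp);
    return value;
}
```

       A result of 0 means that an unprivileged caller can expect EPERM
       under the condition described above.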

STANDARDS

       Linux.

HISTORY

       Linux 4.3.

       Support for hugetlbfs and shared memory areas and non-page-fault events
       was added in Linux 4.11.

NOTES

       The userfaultfd mechanism can be used as an alternative to  traditional
       user-space paging techniques based on the use of the SIGSEGV signal and
       mmap(2).  It can also be used to  implement  lazy  restore  for  check‐
       point/restore  mechanisms,  as  well  as  post-copy  migration to allow
       (nearly) uninterrupted execution when transferring virtual machines and
       Linux containers from one host to another.

BUGS

       If UFFD_FEATURE_EVENT_FORK is enabled and a system call from the
       fork(2) family is interrupted by a signal or fails, a stale
       userfaultfd descriptor might be created.  In this case, a spurious
       UFFD_EVENT_FORK will be delivered to the userfaultfd monitor.

EXAMPLES

       The program below demonstrates the use of  the  userfaultfd  mechanism.
       The program creates two threads, one of which acts as the page-fault
       handler for the process, for the pages in a demand-zero paged region
       created using mmap(2).

       The  program  takes  one  command-line argument, which is the number of
       pages that will be created in a mapping whose page faults will be  han‐
       dled via userfaultfd.  After creating a userfaultfd object, the program
       then creates an anonymous private mapping of  the  specified  size  and
       registers  the  address range of that mapping using the UFFDIO_REGISTER
       ioctl(2) operation.  The program then creates a second thread that will
       perform the task of handling page faults.

       The  main  thread  then walks through the pages of the mapping fetching
       bytes from successive pages.  Because the pages have not yet  been  ac‐
       cessed,  the  first  access of a byte in each page will trigger a page-
       fault event on the userfaultfd file descriptor.

       Each of the page-fault events is handled by the  second  thread,  which
       sits  in  a loop processing input from the userfaultfd file descriptor.
       In each loop iteration, the second thread first calls poll(2) to  check
       the state of the file descriptor, and then reads an event from the file
       descriptor.  All such events  should  be  UFFD_EVENT_PAGEFAULT  events,
       which  the  thread  handles by copying a page of data into the faulting
       region using the UFFDIO_COPY ioctl(2) operation.

       The following is an example of what we see when running the program:

           $ ./userfaultfd_demo 3
           Address returned by mmap() = 0x7fd30106c000

           fault_handler_thread():
               poll() returns: nready = 1; POLLIN = 1; POLLERR = 0
               UFFD_EVENT_PAGEFAULT event: flags = 0; address = 7fd30106c00f
                   (uffdio_copy.copy returned 4096)
           Read address 0x7fd30106c00f in main(): A
           Read address 0x7fd30106c40f in main(): A
           Read address 0x7fd30106c80f in main(): A
           Read address 0x7fd30106cc0f in main(): A

           fault_handler_thread():
               poll() returns: nready = 1; POLLIN = 1; POLLERR = 0
               UFFD_EVENT_PAGEFAULT event: flags = 0; address = 7fd30106d00f
                   (uffdio_copy.copy returned 4096)
           Read address 0x7fd30106d00f in main(): B
           Read address 0x7fd30106d40f in main(): B
           Read address 0x7fd30106d80f in main(): B
           Read address 0x7fd30106dc0f in main(): B

           fault_handler_thread():
               poll() returns: nready = 1; POLLIN = 1; POLLERR = 0
               UFFD_EVENT_PAGEFAULT event: flags = 0; address = 7fd30106e00f
                   (uffdio_copy.copy returned 4096)
           Read address 0x7fd30106e00f in main(): C
           Read address 0x7fd30106e40f in main(): C
           Read address 0x7fd30106e80f in main(): C
           Read address 0x7fd30106ec0f in main(): C

   Program source

       /* userfaultfd_demo.c

          Licensed under the GNU General Public License version 2 or later.
       */
       #define _GNU_SOURCE
       #include <err.h>
       #include <errno.h>
       #include <fcntl.h>
       #include <inttypes.h>
       #include <linux/userfaultfd.h>
       #include <poll.h>
       #include <pthread.h>
       #include <stdio.h>
       #include <stdlib.h>
       #include <string.h>
       #include <sys/ioctl.h>
       #include <sys/mman.h>
       #include <sys/syscall.h>
       #include <unistd.h>

       static int page_size;

       static void *
       fault_handler_thread(void *arg)
       {
           int                 nready;
           long                uffd;   /* userfaultfd file descriptor */
           ssize_t             nread;
           struct pollfd       pollfd;
           struct uffdio_copy  uffdio_copy;

           static int      fault_cnt = 0; /* Number of faults so far handled */
           static char     *page = NULL;
           static struct uffd_msg  msg;  /* Data read from userfaultfd */

           uffd = (long) arg;

           /* Create a page that will be copied into the faulting region. */

           if (page == NULL) {
               page = mmap(NULL, page_size, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
               if (page == MAP_FAILED)
                   err(EXIT_FAILURE, "mmap");
           }

           /* Loop, handling incoming events on the userfaultfd
              file descriptor. */

           for (;;) {

               /* See what poll() tells us about the userfaultfd. */

               pollfd.fd = uffd;
               pollfd.events = POLLIN;
               nready = poll(&pollfd, 1, -1);
               if (nready == -1)
                   err(EXIT_FAILURE, "poll");

               printf("\nfault_handler_thread():\n");
               printf("    poll() returns: nready = %d; "
                      "POLLIN = %d; POLLERR = %d\n", nready,
                      (pollfd.revents & POLLIN) != 0,
                      (pollfd.revents & POLLERR) != 0);

               /* Read an event from the userfaultfd. */

               nread = read(uffd, &msg, sizeof(msg));
               if (nread == 0) {
                   printf("EOF on userfaultfd!\n");
                   exit(EXIT_FAILURE);
               }

               if (nread == -1)
                   err(EXIT_FAILURE, "read");

               /* We expect only one kind of event; verify that assumption. */

               if (msg.event != UFFD_EVENT_PAGEFAULT) {
                   fprintf(stderr, "Unexpected event on userfaultfd\n");
                   exit(EXIT_FAILURE);
               }

               /* Display info about the page-fault event. */

               printf("    UFFD_EVENT_PAGEFAULT event: ");
               printf("flags = %"PRIx64"; ", msg.arg.pagefault.flags);
               printf("address = %"PRIx64"\n", msg.arg.pagefault.address);

               /* Copy the page pointed to by 'page' into the faulting
                  region. Vary the contents that are copied in, so that it
                  is more obvious that each fault is handled separately. */

               memset(page, 'A' + fault_cnt % 20, page_size);
               fault_cnt++;

               uffdio_copy.src = (unsigned long) page;

               /* We need to handle page faults in units of pages(!).
                  So, round faulting address down to page boundary. */

               uffdio_copy.dst = (unsigned long) msg.arg.pagefault.address &
                                                  ~(page_size - 1);
               uffdio_copy.len = page_size;
               uffdio_copy.mode = 0;
               uffdio_copy.copy = 0;
               if (ioctl(uffd, UFFDIO_COPY, &uffdio_copy) == -1)
                   err(EXIT_FAILURE, "ioctl-UFFDIO_COPY");

               printf("        (uffdio_copy.copy returned %"PRId64")\n",
                      uffdio_copy.copy);
           }
       }

       int
       main(int argc, char *argv[])
       {
           int        s;
           char       c;
           char       *addr;   /* Start of region handled by userfaultfd */
           long       uffd;    /* userfaultfd file descriptor */
           size_t     len, l;  /* Length of region handled by userfaultfd */
           pthread_t  thr;     /* ID of thread that handles page faults */
           struct uffdio_api       uffdio_api;
           struct uffdio_register  uffdio_register;

           if (argc != 2) {
               fprintf(stderr, "Usage: %s num-pages\n", argv[0]);
               exit(EXIT_FAILURE);
           }

           page_size = sysconf(_SC_PAGE_SIZE);
           len = strtoull(argv[1], NULL, 0) * page_size;

           /* Create and enable userfaultfd object. */

           uffd = syscall(SYS_userfaultfd, O_CLOEXEC | O_NONBLOCK);
           if (uffd == -1)
               err(EXIT_FAILURE, "userfaultfd");

           uffdio_api.api = UFFD_API;
           uffdio_api.features = 0;
           if (ioctl(uffd, UFFDIO_API, &uffdio_api) == -1)
               err(EXIT_FAILURE, "ioctl-UFFDIO_API");

           /* Create a private anonymous mapping. The memory will be
              demand-zero paged--that is, not yet allocated. When we
              actually touch the memory, it will be allocated via
              the userfaultfd. */

           addr = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
           if (addr == MAP_FAILED)
               err(EXIT_FAILURE, "mmap");

           printf("Address returned by mmap() = %p\n", addr);

           /* Register the memory range of the mapping we just created for
              handling by the userfaultfd object. In mode, we request to track
              missing pages (i.e., pages that have not yet been faulted in). */

           uffdio_register.range.start = (unsigned long) addr;
           uffdio_register.range.len = len;
           uffdio_register.mode = UFFDIO_REGISTER_MODE_MISSING;
           if (ioctl(uffd, UFFDIO_REGISTER, &uffdio_register) == -1)
               err(EXIT_FAILURE, "ioctl-UFFDIO_REGISTER");

           /* Create a thread that will process the userfaultfd events. */

           s = pthread_create(&thr, NULL, fault_handler_thread, (void *) uffd);
           if (s != 0) {
               /* pthread functions return the error number instead of
                  setting errno. errc() is a BSD extension that glibc does
                  not provide, so set errno and use err() instead. */
               errno = s;
               err(EXIT_FAILURE, "pthread_create");
           }

           /* Main thread now touches memory in the mapping, touching
              locations 1024 bytes apart. This will trigger userfaultfd
              events for all pages in the region. */

           l = 0xf;    /* Ensure that faulting address is not on a page
                          boundary, in order to test that we correctly
                          handle that case in fault_handler_thread(). */
           while (l < len) {
               c = addr[l];
               printf("Read address %p in %s(): ", addr + l, __func__);
               printf("%c\n", c);
               l += 1024;
               usleep(100000);         /* Slow things down a little */
           }

           exit(EXIT_SUCCESS);
       }

SEE ALSO

       fcntl(2), ioctl(2), ioctl_userfaultfd(2), madvise(2), mmap(2)

       Documentation/admin-guide/mm/userfaultfd.rst in the Linux kernel source
       tree

Linux man-pages 6.04              2023-03-30                    userfaultfd(2)