ioctl_userfaultfd(2)          System Calls Manual         ioctl_userfaultfd(2)

NAME

       ioctl_userfaultfd  -  create a file descriptor for handling page faults
       in user space

LIBRARY

       Standard C library (libc, -lc)

SYNOPSIS

       #include <linux/userfaultfd.h>  /* Definition of UFFD* constants */
       #include <sys/ioctl.h>

       int ioctl(int fd, int cmd, ...);

DESCRIPTION

       Various ioctl(2) operations can be performed on  a  userfaultfd  object
       (created by a call to userfaultfd(2)) using calls of the form:

           ioctl(fd, cmd, argp);

       In  the  above,  fd is a file descriptor referring to a userfaultfd ob‐
       ject, cmd is one of the commands listed below, and argp is a pointer to
       a data structure that is specific to cmd.

       The  various  ioctl(2) operations are described below.  The UFFDIO_API,
       UFFDIO_REGISTER, and UFFDIO_UNREGISTER operations are used to configure
       userfaultfd behavior.  These operations allow the caller to choose what
       features will be enabled and what kinds of events will be delivered  to
       the application.  The remaining operations are range operations.  These
       operations enable the calling application to resolve page-fault events.

   UFFDIO_API
       (Since Linux 4.3.)  Enable operation of the userfaultfd and perform API
       handshake.

       The argp argument is a pointer to a uffdio_api structure, defined as:

           struct uffdio_api {
               __u64 api;        /* Requested API version (input) */
               __u64 features;   /* Requested features (input/output) */
               __u64 ioctls;     /* Available ioctl() operations (output) */
           };

       The api field denotes the API version requested by the application.

       The  kernel verifies that it can support the requested API version, and
       sets the features and ioctls fields to bit masks representing  all  the
       available features and the generic ioctl(2) operations available.

       Before  Linux  4.11, the features field must be initialized to zero be‐
       fore the call to UFFDIO_API, and zero (i.e., no feature bits) is placed
       in the features field by the kernel upon return from ioctl(2).

       Starting from Linux 4.11, the features field can be used to ask whether
       particular features are supported  and  explicitly  enable  userfaultfd
       features  that  are disabled by default.  The kernel always reports all
       the available features in the features field.

       To enable userfaultfd features the application should set a bit  corre‐
       sponding  to each feature it wants to enable in the features field.  If
       the kernel supports all the requested features  it  will  enable  them.
       Otherwise it will zero out the returned uffdio_api structure and return
       EINVAL.

       The following feature bits may be set:

       UFFD_FEATURE_EVENT_FORK (since Linux 4.11)
              When this feature is enabled, the userfaultfd objects associated
              with a parent process are duplicated into the child process dur‐
              ing fork(2) and a UFFD_EVENT_FORK  event  is  delivered  to  the
              userfaultfd monitor.

       UFFD_FEATURE_EVENT_REMAP (since Linux 4.11)
              If  this  feature  is enabled, when the faulting process invokes
              mremap(2), the userfaultfd monitor will receive an event of type
              UFFD_EVENT_REMAP.

       UFFD_FEATURE_EVENT_REMOVE (since Linux 4.11)
              If this feature is enabled, when the faulting process calls mad‐
              vise(2) with the MADV_DONTNEED or MADV_REMOVE  advice  value  to
              free a virtual memory area, the userfaultfd monitor will receive
              an event of type UFFD_EVENT_REMOVE.

       UFFD_FEATURE_EVENT_UNMAP (since Linux 4.11)
              If this feature is enabled, when  the  faulting  process  unmaps
              virtual  memory  either explicitly with munmap(2), or implicitly
              during either mmap(2) or mremap(2), the userfaultfd monitor will
              receive an event of type UFFD_EVENT_UNMAP.

       UFFD_FEATURE_MISSING_HUGETLBFS (since Linux 4.11)
              If  this  feature  bit  is  set, the kernel supports registering
              userfaultfd ranges on hugetlbfs virtual memory areas.

       UFFD_FEATURE_MISSING_SHMEM (since Linux 4.11)
              If this feature bit is  set,  the  kernel  supports  registering
              userfaultfd  ranges  on  shared memory areas.  This includes all
              kernel shared memory APIs: System  V  shared  memory,  tmpfs(5),
              shared  mappings  of /dev/zero, mmap(2) with the MAP_SHARED flag
              set, memfd_create(2), and so on.

       UFFD_FEATURE_SIGBUS (since Linux 4.14)
              If   this   feature   bit   is   set,   no   page-fault   events
              (UFFD_EVENT_PAGEFAULT)  will  be  delivered.   Instead, a SIGBUS
              signal will be sent to the faulting process.  Applications using
              this  feature  will not require the use of a userfaultfd monitor
              for processing memory accesses to the  regions  registered  with
              userfaultfd.

       UFFD_FEATURE_THREAD_ID (since Linux 4.14)
              If this feature bit is set, uffd_msg.pagefault.feat.ptid will be
              set to the faulted thread ID for each page-fault message.

       UFFD_FEATURE_MINOR_HUGETLBFS (since Linux 5.13)
              If this feature bit is  set,  the  kernel  supports  registering
              userfaultfd  ranges in minor mode on hugetlbfs-backed memory ar‐
              eas.

       UFFD_FEATURE_MINOR_SHMEM (since Linux 5.14)
              If this feature bit is  set,  the  kernel  supports  registering
              userfaultfd ranges in minor mode on shmem-backed memory areas.

       UFFD_FEATURE_EXACT_ADDRESS (since Linux 5.18)
              If  this  feature bit is set, uffd_msg.pagefault.address will be
              set to the exact page-fault address that  was  reported  by  the
              hardware,  and  will  not mask the offset within the page.  Note
              that old Linux versions might  indicate  the  exact  address  as
              well, even though the feature bit is not set.

       The returned ioctls field can contain the following bits:

       1 << _UFFDIO_API
              The UFFDIO_API operation is supported.

       1 << _UFFDIO_REGISTER
              The UFFDIO_REGISTER operation is supported.

       1 << _UFFDIO_UNREGISTER
              The UFFDIO_UNREGISTER operation is supported.

       This ioctl(2) operation returns 0 on success.  On error, -1 is returned
       and errno is set to indicate the error.  Possible errors include:

       EFAULT argp refers to an address that is outside the calling  process's
              accessible address space.

       EINVAL The  userfaultfd  has  already  been  enabled by a previous UFF‐
              DIO_API operation.

       EINVAL The API version requested in the api field is not  supported  by
              this kernel, or the features field passed to the kernel includes
              feature bits that are not supported by the current  kernel  ver‐
              sion.
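
       The handshake described above might be wrapped as in the following min‐
       imal sketch.  The helper names uffd_handshake() and uffd_ioctl_support‐
       ed() are hypothetical and not part of any library, and the decision  to
       fail  with  ENOTSUP when UFFDIO_REGISTER is not reported is only an as‐
       sumption of this example.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>

/* Nonzero if bit number nr (for example _UFFDIO_REGISTER) is set in a
   returned ioctls bit mask. */
static int uffd_ioctl_supported(uint64_t ioctls, unsigned int nr)
{
    return (int)((ioctls >> nr) & 1);
}

/* Hypothetical helper: perform the UFFDIO_API handshake on fd, requesting
   the given feature bits.  On success, *api holds the masks reported by
   the kernel. */
static int uffd_handshake(int fd, uint64_t features, struct uffdio_api *api)
{
    memset(api, 0, sizeof(*api));
    api->api = UFFD_API;
    api->features = features;

    if (ioctl(fd, UFFDIO_API, api) == -1)
        return -1;              /* errno was set by ioctl(2) */

    /* Assumption of this sketch: a caller that cannot register ranges
       has no use for the descriptor. */
    if (!uffd_ioctl_supported(api->ioctls, _UFFDIO_REGISTER)) {
        errno = ENOTSUP;
        return -1;
    }
    return 0;
}
```

       A caller would typically pass the descriptor returned by userfaultfd(2)
       and then inspect api.features to see what the kernel reported.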

   UFFDIO_REGISTER
       (Since  Linux  4.3.)   Register  a  memory address range with the user‐
       faultfd object.  The pages in the range must be  "compatible".   Please
       refer  to  the  list  of register modes below for the compatible memory
       backends for each mode.

       The argp argument is a pointer to a uffdio_register structure,  defined
       as:

           struct uffdio_range {
               __u64 start;    /* Start of range */
               __u64 len;      /* Length of range (bytes) */
           };

           struct uffdio_register {
               struct uffdio_range range;
               __u64 mode;     /* Desired mode of operation (input) */
               __u64 ioctls;   /* Available ioctl() operations (output) */
           };

       The range field defines a memory range starting at start and continuing
       for len bytes that should be handled by the userfaultfd.

       The mode field defines the mode of operation desired  for  this  memory
       region.   The  following  values  may  be bitwise ORed to set the user‐
       faultfd mode for the specified range:

       UFFDIO_REGISTER_MODE_MISSING
              Track page faults on missing pages.  Since Linux 4.3, only  pri‐
              vate   anonymous  ranges  are  compatible.   Since  Linux  4.11,
              hugetlbfs and shared memory ranges are also compatible.

       UFFDIO_REGISTER_MODE_WP
              Track page faults on write-protected pages.   Since  Linux  5.7,
              only private anonymous ranges are compatible.

       UFFDIO_REGISTER_MODE_MINOR
              Track  minor  page  faults.   Since  Linux  5.13, only hugetlbfs
              ranges are compatible.  Since  Linux  5.14,  compatibility  with
              shmem ranges was added.

       If the operation is successful, the kernel modifies the ioctls bit-mask
       field to indicate which ioctl(2) operations are available for the spec‐
       ified range.  This returned bit mask can contain the following bits:

       1 << _UFFDIO_COPY
              The UFFDIO_COPY operation is supported.

       1 << _UFFDIO_WAKE
              The UFFDIO_WAKE operation is supported.

       1 << _UFFDIO_WRITEPROTECT
              The UFFDIO_WRITEPROTECT operation is supported.

       1 << _UFFDIO_ZEROPAGE
              The UFFDIO_ZEROPAGE operation is supported.

       1 << _UFFDIO_CONTINUE
              The UFFDIO_CONTINUE operation is supported.

       This ioctl(2) operation returns 0 on success.  On error, -1 is returned
       and errno is set to indicate the error.  Possible errors include:

       EBUSY  A mapping in the specified  range  is  registered  with  another
              userfaultfd object.

       EFAULT argp  refers to an address that is outside the calling process's
              accessible address space.

       EINVAL An invalid or unsupported bit was specified in the  mode  field;
              or the mode field was zero.

       EINVAL There is no mapping in the specified address range.

       EINVAL range.start  or  range.len  is not a multiple of the system page
              size; or, range.len is zero; or these fields are  otherwise  in‐
              valid.

       EINVAL There is an incompatible mapping in the specified address range.
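
       As  an illustration of the registration step, the following sketch reg‐
       isters a range in UFFDIO_REGISTER_MODE_MISSING mode and hands back  the
       returned  ioctls  bit  mask.  The helper name uffd_register_missing() is
       hypothetical; the alignment check performed before the  call  is  merely
       an early version of the EINVAL conditions listed above.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>

/* Hypothetical helper: register [addr, addr + len) for missing-page
   faults.  On success, *ioctls (if not NULL) receives the bit mask of
   operations available for the range. */
static int uffd_register_missing(int fd, void *addr, size_t len,
                                 uint64_t *ioctls)
{
    long page = sysconf(_SC_PAGESIZE);
    struct uffdio_register reg;

    /* Reject ranges the kernel would refuse with EINVAL anyway. */
    if (page <= 0 || len == 0 ||
        (uintptr_t)addr % (uintptr_t)page != 0 ||
        len % (size_t)page != 0) {
        errno = EINVAL;
        return -1;
    }

    memset(&reg, 0, sizeof(reg));
    reg.range.start = (uintptr_t)addr;
    reg.range.len = len;
    reg.mode = UFFDIO_REGISTER_MODE_MISSING;

    if (ioctl(fd, UFFDIO_REGISTER, &reg) == -1)
        return -1;              /* errno was set by ioctl(2) */

    if (ioctls != NULL)
        *ioctls = reg.ioctls;
    return 0;
}
```

       A caller could then test, for example, (mask >> _UFFDIO_COPY) & 1 before
       relying on UFFDIO_COPY for that range.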

   UFFDIO_UNREGISTER
       (Since Linux 4.3.)  Unregister a memory address range from userfaultfd.
       The pages in the range must be "compatible"  (see  the  description  of
       UFFDIO_REGISTER).

       The address range to unregister is specified in the uffdio_range struc‐
       ture pointed to by argp.

       This ioctl(2) operation returns 0 on success.  On error, -1 is returned
       and errno is set to indicate the error.  Possible errors include:

       EINVAL Either the start or the len field of the uffdio_range  structure
              was not a multiple of the system page size; or the len field was
              zero; or these fields were otherwise invalid.

       EINVAL There is an incompatible mapping in the specified address range.

       EINVAL There was no mapping in the specified address range.

   UFFDIO_COPY
       (Since  Linux 4.3.)  Atomically copy a contiguous memory chunk into the
       userfault registered range and optionally wake up the  blocked  thread.
       The  source  and  destination addresses and the number of bytes to copy
       are specified by the src, dst, and len fields of the uffdio_copy struc‐
       ture pointed to by argp:

           struct uffdio_copy {
               __u64 dst;    /* Destination of copy */
               __u64 src;    /* Source of copy */
               __u64 len;    /* Number of bytes to copy */
               __u64 mode;   /* Flags controlling behavior of copy */
               __s64 copy;   /* Number of bytes copied, or negated error */
           };

       The following values may be bitwise ORed in mode to change the behavior
       of the UFFDIO_COPY operation:

       UFFDIO_COPY_MODE_DONTWAKE
              Do not wake up the thread that waits for page-fault resolution.

       UFFDIO_COPY_MODE_WP
              Copy the page with read-only permission.  This allows  the  user
              to  trap the next write to the page, which will block and gener‐
              ate another write-protect userfault message.  This is used  only
              when   both   UFFDIO_REGISTER_MODE_MISSING   and   UFFDIO_REGIS‐
              TER_MODE_WP modes are enabled for the registered range.

       The copy field is used by the kernel to return the number of bytes that
       were  actually  copied, or an error (a negated errno-style value).   If
       the value returned in copy doesn't match the value that  was  specified
       in  len,  the  operation fails with the error EAGAIN.  The copy field is
       output-only; it is not read by the UFFDIO_COPY operation.

       This ioctl(2) operation returns 0 on success.  In this case, the entire
       area was copied.  On error, -1 is returned and errno is set to indicate
       the error.  Possible errors include:

       EAGAIN The number of bytes copied (i.e., the value returned in the copy
              field)  does  not  equal the value that was specified in the len
              field.

       EINVAL Either dst or len was not a multiple of the system page size, or
              the range specified by src and len or dst and len was invalid.

       EINVAL An invalid bit was specified in the mode field.

       ENOENT (since Linux 4.11)
              The  faulting  process has changed its virtual memory layout si‐
              multaneously with an outstanding UFFDIO_COPY operation.

       ENOSPC (from Linux 4.11 until Linux 4.13)
              The faulting process has exited at the time of a UFFDIO_COPY op‐
              eration.

       ESRCH (since Linux 4.13)
              The faulting process has exited at the time of a UFFDIO_COPY op‐
              eration.
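
       One possible way for a monitor to react to the EAGAIN case is to resume
       the copy after the bytes reported in the copy field, as in  the  sketch
       below.   The  helper  name uffd_copy_all() and the retry bound are hypo‐
       thetical choices made for this example, not something  mandated  by  the
       interface.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>

#define UFFD_COPY_MAX_TRIES 16  /* arbitrary bound chosen for this sketch */

/* Hypothetical helper: copy len bytes from src to the registered address
   dst, resuming after a partial copy reported through EAGAIN. */
static int uffd_copy_all(int fd, uint64_t dst, const void *src, uint64_t len)
{
    uint64_t done = 0;
    int tries;

    if (len == 0) {
        errno = EINVAL;
        return -1;
    }

    for (tries = 0; tries < UFFD_COPY_MAX_TRIES; tries++) {
        struct uffdio_copy c;

        memset(&c, 0, sizeof(c));
        c.dst = dst + done;
        c.src = (uint64_t)(uintptr_t)src + done;
        c.len = len - done;
        c.mode = 0;             /* wake the faulting thread when done */

        if (ioctl(fd, UFFDIO_COPY, &c) == 0)
            return 0;           /* the whole remaining area was copied */
        if (errno != EAGAIN)
            return -1;
        if (c.copy > 0)         /* partial copy: skip what was done */
            done += (uint64_t)c.copy;
    }

    errno = EAGAIN;             /* gave up after the retry bound */
    return -1;
}
```

       Whether retrying is appropriate depends on the application; a  monitor
       may equally well report the error and re-read the next event.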

   UFFDIO_ZEROPAGE
       (Since Linux 4.3.)  Zero out  a  memory  range  registered  with  user‐
       faultfd.

       The  requested  range is specified by the range field of the uffdio_ze‐
       ropage structure pointed to by argp:

           struct uffdio_zeropage {
               struct uffdio_range range;
               __u64 mode;     /* Flags controlling behavior of zeropage */
               __s64 zeropage; /* Number of bytes zeroed, or negated error */
           };

       The following value may be bitwise ORed in mode to change the  behavior
       of the UFFDIO_ZEROPAGE operation:

       UFFDIO_ZEROPAGE_MODE_DONTWAKE
              Do not wake up the thread that waits for page-fault resolution.

       The  zeropage field is used by the kernel to return the number of bytes
       that were actually zeroed, or an error  in  the  same  manner  as  UFF‐
       DIO_COPY.   If  the  value returned in the zeropage field doesn't match
       the value that was specified in range.len, the operation fails with the
       error EAGAIN.  The zeropage field is output-only; it is not read by the
       UFFDIO_ZEROPAGE operation.

       This ioctl(2) operation returns 0 on success.  In this case, the entire
       area was zeroed.  On error, -1 is returned and errno is set to indicate
       the error.  Possible errors include:

       EAGAIN The number of bytes zeroed (i.e., the value returned in the  ze‐
              ropage field) does not equal the value that was specified in the
              range.len field.

       EINVAL Either range.start or range.len was not a multiple of the system
              page size; or range.len was zero; or the range specified was in‐
              valid.

       EINVAL An invalid bit was specified in the mode field.

       ESRCH (since Linux 4.13)
              The faulting process has exited at the time of a UFFDIO_ZEROPAGE
              operation.

   UFFDIO_WAKE
       (Since  Linux  4.3.)  Wake up the thread waiting for page-fault resolu‐
       tion on a specified memory address range.

       The UFFDIO_WAKE operation is used in conjunction with  UFFDIO_COPY  and
       UFFDIO_ZEROPAGE  operations  that have the UFFDIO_COPY_MODE_DONTWAKE or
       UFFDIO_ZEROPAGE_MODE_DONTWAKE bit set in the mode field.  The userfault
       monitor  can perform several UFFDIO_COPY and UFFDIO_ZEROPAGE operations
       in a batch and then explicitly wake up the faulting thread  using  UFF‐
       DIO_WAKE.

       The  argp  argument  is  a  pointer  to a uffdio_range structure (shown
       above) that specifies the address range.

       This ioctl(2) operation returns 0 on success.  On error, -1 is returned
       and errno is set to indicate the error.  Possible errors include:

       EINVAL The start or the len field of the uffdio_range structure was not
              a multiple of the system page size; or  len  was  zero;  or  the
              specified range was otherwise invalid.
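
       The  batching  pattern  described  above  can be sketched as follows: a
       number of pages are zero-filled with UFFDIO_ZEROPAGE_MODE_DONTWAKE set,
       and a single UFFDIO_WAKE then covers the whole range.  The helper  name
       uffd_zero_batch_then_wake() is hypothetical, and zeroing  one  page  per
       call is only an assumption made to keep the example short.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>

/* Hypothetical helper: zero-fill npages pages starting at start without
   waking anyone, then wake waiters on the whole range with one call. */
static int uffd_zero_batch_then_wake(int fd, uint64_t start,
                                     uint64_t page_size, unsigned int npages)
{
    struct uffdio_range range;
    unsigned int i;

    if (page_size == 0 || npages == 0) {
        errno = EINVAL;
        return -1;
    }

    for (i = 0; i < npages; i++) {
        struct uffdio_zeropage z;

        memset(&z, 0, sizeof(z));
        z.range.start = start + (uint64_t)i * page_size;
        z.range.len = page_size;
        z.mode = UFFDIO_ZEROPAGE_MODE_DONTWAKE;

        if (ioctl(fd, UFFDIO_ZEROPAGE, &z) == -1)
            return -1;          /* z.zeropage holds details on EAGAIN */
    }

    /* All pages are in place; now let the faulting thread(s) continue. */
    range.start = start;
    range.len = page_size * npages;
    return ioctl(fd, UFFDIO_WAKE, &range);
}
```

       The same structure applies when the batch consists of UFFDIO_COPY oper‐
       ations with UFFDIO_COPY_MODE_DONTWAKE.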

   UFFDIO_WRITEPROTECT
       (Since Linux 5.7.)  Write-protect or write-unprotect  a  memory  range
       that was registered with userfaultfd in mode UFFDIO_REGISTER_MODE_WP.

       The argp argument is a pointer to a  uffdio_writeprotect  structure  as
       shown below:

           struct uffdio_writeprotect {
               struct uffdio_range range; /* Range to change write permission */
               __u64 mode;                /* Mode to change write permission */
           };

       There are two mode bits that are supported in this structure:

       UFFDIO_WRITEPROTECT_MODE_WP
              When this mode bit is set, the ioctl will be a write-protect op‐
              eration upon the memory range specified by range.  Otherwise  it
              will  be  a  write-unprotect operation upon the specified range,
              which can be used to resolve a  userfaultfd  write-protect  page
              fault.

       UFFDIO_WRITEPROTECT_MODE_DONTWAKE
              When  this mode bit is set, do not wake up any thread that waits
              for page-fault resolution after  the  operation.   This  can  be
              specified only if UFFDIO_WRITEPROTECT_MODE_WP is not specified.

       This ioctl(2) operation returns 0 on success.  On error, -1 is returned
       and errno is set to indicate the error.  Possible errors include:

       EINVAL The start or the len field of the uffdio_range structure was not
              a  multiple  of  the  system  page size; or len was zero; or the
              specified range was otherwise invalid.

       EAGAIN The process was interrupted; retry this call.

       ENOENT The range specified in range is not  valid.   For  example,  the
              virtual  address  does  not  exist, or is not registered with the
              userfaultfd write-protect mode.

       EFAULT Encountered a generic fault during processing.
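
       The rule that UFFDIO_WRITEPROTECT_MODE_DONTWAKE may be used only  with‐
       out  UFFDIO_WRITEPROTECT_MODE_WP can be checked before issuing the call,
       as in this sketch.  Both helper names are hypothetical, and the  early
       EINVAL is a convenience of the example rather than documented behavior.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>

/* Nonzero if mode is a combination described above: only known bits,
   and DONTWAKE never together with WP. */
static int uffd_wp_mode_valid(uint64_t mode)
{
    const uint64_t known = UFFDIO_WRITEPROTECT_MODE_WP |
                           UFFDIO_WRITEPROTECT_MODE_DONTWAKE;

    if (mode & ~known)
        return 0;
    if ((mode & UFFDIO_WRITEPROTECT_MODE_WP) &&
        (mode & UFFDIO_WRITEPROTECT_MODE_DONTWAKE))
        return 0;
    return 1;
}

/* Hypothetical helper: write-protect the range when mode contains
   UFFDIO_WRITEPROTECT_MODE_WP, write-unprotect it otherwise. */
static int uffd_writeprotect(int fd, uint64_t start, uint64_t len,
                             uint64_t mode)
{
    struct uffdio_writeprotect wp;

    if (!uffd_wp_mode_valid(mode)) {
        errno = EINVAL;
        return -1;
    }

    memset(&wp, 0, sizeof(wp));
    wp.range.start = start;
    wp.range.len = len;
    wp.mode = mode;
    return ioctl(fd, UFFDIO_WRITEPROTECT, &wp);
}
```

       Passing a mode of 0 write-unprotects the range and wakes  any  waiting
       thread, which is the usual way to resolve a write-protect fault.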

   UFFDIO_CONTINUE
       (Since Linux 5.13.)  Resolve a minor page fault by installing page  ta‐
       ble entries for existing pages in the page cache.

       The  argp argument is a pointer to a uffdio_continue structure as shown
       below:

           struct uffdio_continue {
               struct uffdio_range range;
                              /* Range to install PTEs for and continue */
               __u64 mode;    /* Flags controlling the behavior of continue */
               __s64 mapped;  /* Number of bytes mapped, or negated error */
           };

       The following value may be bitwise ORed in mode to change the  behavior
       of the UFFDIO_CONTINUE operation:

       UFFDIO_CONTINUE_MODE_DONTWAKE
              Do not wake up the thread that waits for page-fault resolution.

       The  mapped  field  is used by the kernel to return the number of bytes
       that were actually mapped, or an error  in  the  same  manner  as  UFF‐
       DIO_COPY.   If the value returned in the mapped field doesn't match the
       value that was specified in range.len, the operation fails with the er‐
       ror  EAGAIN.   The  mapped  field is output-only; it is not read by the
       UFFDIO_CONTINUE operation.

       This ioctl(2) operation returns 0 on success.  In this case, the entire
       area was mapped.  On error, -1 is returned and errno is set to indicate
       the error.  Possible errors include:

       EAGAIN The number of bytes mapped (i.e.,  the  value  returned  in  the
              mapped field) does not equal the value that was specified in the
              range.len field.

       EINVAL Either range.start or range.len was not a multiple of the system
              page size; or range.len was zero; or the range specified was in‐
              valid.

       EINVAL An invalid bit was specified in the mode field.

       EEXIST One or more pages were already mapped in the given range.

       ENOENT The faulting process has changed its virtual memory  layout  si‐
              multaneously with an outstanding UFFDIO_CONTINUE operation.

       ENOMEM Allocating  memory  needed  to  set up  the  page table mappings
              failed.

       EFAULT No existing page could be found in the page cache for the  given
              range.

       ESRCH  The faulting process has exited at the time of a UFFDIO_CONTINUE
              operation.
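
       A  monitor  resolving minor faults might wrap the operation as sketched
       below.  The helper name uffd_continue() is hypothetical,  and  reporting
       EEXIST  through  a  distinct  return value (rather than as a failure) is
       purely an assumption of this example; some applications may  prefer  to
       treat it as an error.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>

/* Hypothetical helper: install page table entries for the given range.
   Returns 0 on success, 1 if pages were already mapped (EEXIST), and
   -1 with errno set on any other error. */
static int uffd_continue(int fd, uint64_t start, uint64_t len, int dontwake)
{
    struct uffdio_continue c;

    memset(&c, 0, sizeof(c));
    c.range.start = start;
    c.range.len = len;
    c.mode = dontwake ? UFFDIO_CONTINUE_MODE_DONTWAKE : 0;

    if (ioctl(fd, UFFDIO_CONTINUE, &c) == 0)
        return 0;               /* the entire area was mapped */

    /* Assumption of this sketch: already-mapped pages are reported to
       the caller separately so that it can decide how to proceed. */
    if (errno == EEXIST)
        return 1;

    return -1;                  /* c.mapped holds details on EAGAIN */
}
```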

RETURN VALUE

       See descriptions of the individual operations, above.

ERRORS

       See descriptions of the individual operations, above.  In addition, the
       following  general errors can occur for all of the operations described
       above:

       EFAULT argp does not point to a valid memory address.

       EINVAL (For all operations except UFFDIO_API.)  The userfaultfd  object
              has not yet been enabled (via the UFFDIO_API operation).

STANDARDS

       Linux.

BUGS

       In order to detect available userfault features and enable some  subset
       of those features, the userfaultfd file descriptor must be closed after
       the first UFFDIO_API operation that queries  feature  availability  and
       reopened  before  the second UFFDIO_API operation that actually enables
       the desired features.
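
       The two-step workaround can be sketched as follows.  The  helper  names
       are  hypothetical,  the  descriptor  is obtained through syscall(2) with
       SYS_userfaultfd, and the policy of silently dropping unavailable feature
       bits is an assumption of this example.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <linux/userfaultfd.h>

/* Create a new userfaultfd object; returns -1 with errno set on error. */
static int uffd_open(void)
{
    return (int)syscall(SYS_userfaultfd, O_CLOEXEC | O_NONBLOCK);
}

/* Keep only the wanted feature bits that the kernel reported. */
static uint64_t uffd_pick_features(uint64_t available, uint64_t wanted)
{
    return available & wanted;
}

/* Hypothetical helper: query features on a first descriptor, close it,
   then enable the supported subset of wanted on a second descriptor. */
static int uffd_open_with_features(uint64_t wanted, uint64_t *enabled)
{
    struct uffdio_api api;
    uint64_t pick;
    int fd, saved;

    fd = uffd_open();
    if (fd == -1)
        return -1;
    memset(&api, 0, sizeof(api));
    api.api = UFFD_API;         /* features == 0: query only */
    if (ioctl(fd, UFFDIO_API, &api) == -1)
        goto fail;
    pick = uffd_pick_features(api.features, wanted);
    close(fd);                  /* the handshake cannot be repeated */

    fd = uffd_open();
    if (fd == -1)
        return -1;
    memset(&api, 0, sizeof(api));
    api.api = UFFD_API;
    api.features = pick;
    if (ioctl(fd, UFFDIO_API, &api) == -1)
        goto fail;
    *enabled = pick;
    return fd;

fail:
    saved = errno;              /* preserve errno across close(2) */
    close(fd);
    errno = saved;
    return -1;
}
```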

EXAMPLES

       See userfaultfd(2).

SEE ALSO

       ioctl(2), mmap(2), userfaultfd(2)

       Documentation/admin-guide/mm/userfaultfd.rst in the Linux kernel source
       tree

Linux man-pages 6.05              2023-05-03              ioctl_userfaultfd(2)