ioctl_userfaultfd(2)          System Calls Manual         ioctl_userfaultfd(2)

NAME

6       ioctl_userfaultfd  -  create a file descriptor for handling page faults
7       in user space
8

LIBRARY

10       Standard C library (libc, -lc)
11

SYNOPSIS

13       #include <linux/userfaultfd.h>  /* Definition of UFFD* constants */
14       #include <sys/ioctl.h>
15
16       int ioctl(int fd, int cmd, ...);
17

DESCRIPTION

19       Various ioctl(2) operations can be performed on  a  userfaultfd  object
20       (created by a call to userfaultfd(2)) using calls of the form:
21
22           ioctl(fd, cmd, argp);
23       In  the  above,  fd is a file descriptor referring to a userfaultfd ob‐
24       ject, cmd is one of the commands listed below, and argp is a pointer to
25       a data structure that is specific to cmd.
26
27       The  various  ioctl(2) operations are described below.  The UFFDIO_API,
28       UFFDIO_REGISTER, and UFFDIO_UNREGISTER operations are used to configure
29       userfaultfd behavior.  These operations allow the caller to choose what
30       features will be enabled and what kinds of events will be delivered  to
31       the application.  The remaining operations are range operations.  These
32       operations enable the calling application to resolve page-fault events.
33
34   UFFDIO_API
35       (Since Linux 4.3.)  Enable operation of the userfaultfd and perform API
36       handshake.
37
38       The argp argument is a pointer to a uffdio_api structure, defined as:
39
40           struct uffdio_api {
41               __u64 api;        /* Requested API version (input) */
42               __u64 features;   /* Requested features (input/output) */
43               __u64 ioctls;     /* Available ioctl() operations (output) */
44           };
45
46       The api field denotes the API version requested by the application.
47
48       The  kernel verifies that it can support the requested API version, and
49       sets the features and ioctls fields to bit masks representing  all  the
50       available features and the generic ioctl(2) operations available.
51
52       Before  Linux  4.11, the features field must be initialized to zero be‐
53       fore the call to UFFDIO_API, and zero (i.e., no feature bits) is placed
54       in the features field by the kernel upon return from ioctl(2).
55
56       Starting from Linux 4.11, the features field can be used to ask whether
57       particular features are supported  and  explicitly  enable  userfaultfd
58       features  that  are disabled by default.  The kernel always reports all
59       the available features in the features field.
60
61       To enable userfaultfd features the application should set a bit  corre‐
62       sponding  to each feature it wants to enable in the features field.  If
63       the kernel supports all the requested features  it  will  enable  them.
64       Otherwise it will zero out the returned uffdio_api structure and return
65       EINVAL.
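
       The handshake described above might be sketched as follows.  This is
       a minimal illustration under stated assumptions, not a definitive
       implementation: the helper name uffd_api_handshake() is hypothetical,
       fd is assumed to come from userfaultfd(2), and UFFD_API is the version
       constant defined in <linux/userfaultfd.h>.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>

/* Hypothetical helper: perform the UFFDIO_API handshake on fd,
 * requesting the feature bits in `features` (0 requests none).
 * Returns 0 on success, leaving the kernel's answer in *api.
 * Returns -1 on failure, with errno set by ioctl(2). */
static int uffd_api_handshake(int fd, uint64_t features,
                              struct uffdio_api *api)
{
    assert(api != NULL);
    memset(api, 0, sizeof(*api));
    api->api = UFFD_API;        /* requested API version (input) */
    api->features = features;   /* features to enable (input/output) */
    return ioctl(fd, UFFDIO_API, api);
}
```

       On success, api->features and api->ioctls hold the bit masks that the
       kernel reported; on EINVAL, the text above says the structure is
       zeroed out.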
66
67       The following feature bits may be set:
68
       UFFD_FEATURE_EVENT_FORK (since Linux 4.11)
              When this feature is enabled, the userfaultfd objects associated
              with a parent process are duplicated into the child process
              during fork(2) and a UFFD_EVENT_FORK event is delivered to the
              userfaultfd monitor.
74
75       UFFD_FEATURE_EVENT_REMAP (since Linux 4.11)
76              If  this  feature  is enabled, when the faulting process invokes
77              mremap(2), the userfaultfd monitor will receive an event of type
78              UFFD_EVENT_REMAP.
79
80       UFFD_FEATURE_EVENT_REMOVE (since Linux 4.11)
81              If this feature is enabled, when the faulting process calls mad‐
82              vise(2) with the MADV_DONTNEED or MADV_REMOVE  advice  value  to
83              free  a virtual memory area the userfaultfd monitor will receive
84              an event of type UFFD_EVENT_REMOVE.
85
86       UFFD_FEATURE_EVENT_UNMAP (since Linux 4.11)
87              If this feature is enabled, when  the  faulting  process  unmaps
88              virtual  memory  either explicitly with munmap(2), or implicitly
89              during either mmap(2) or mremap(2), the userfaultfd monitor will
90              receive an event of type UFFD_EVENT_UNMAP.
91
       UFFD_FEATURE_MISSING_HUGETLBFS (since Linux 4.11)
              If this feature bit is set, the kernel supports registering
              userfaultfd ranges on hugetlbfs virtual memory areas.
95
96       UFFD_FEATURE_MISSING_SHMEM (since Linux 4.11)
97              If this feature bit is  set,  the  kernel  supports  registering
98              userfaultfd  ranges  on  shared memory areas.  This includes all
99              kernel shared memory APIs: System  V  shared  memory,  tmpfs(5),
100              shared  mappings  of /dev/zero, mmap(2) with the MAP_SHARED flag
101              set, memfd_create(2), and so on.
102
103       UFFD_FEATURE_SIGBUS (since Linux 4.14)
104              If   this   feature   bit   is   set,   no   page-fault   events
105              (UFFD_EVENT_PAGEFAULT)  will  be  delivered.   Instead, a SIGBUS
106              signal will be sent to the faulting process.  Applications using
107              this  feature  will not require the use of a userfaultfd monitor
108              for processing memory accesses to the  regions  registered  with
109              userfaultfd.
110
       UFFD_FEATURE_THREAD_ID (since Linux 4.14)
              If this feature bit is set, uffd_msg.pagefault.feat.ptid will be
              set to the ID of the faulting thread for each page-fault
              message.
114
115       UFFD_FEATURE_MINOR_HUGETLBFS (since Linux 5.13)
116              If this feature bit is  set,  the  kernel  supports  registering
117              userfaultfd  ranges in minor mode on hugetlbfs-backed memory ar‐
118              eas.
119
120       UFFD_FEATURE_MINOR_SHMEM (since Linux 5.14)
121              If this feature bit is  set,  the  kernel  supports  registering
122              userfaultfd ranges in minor mode on shmem-backed memory areas.
123
124       The returned ioctls field can contain the following bits:
125
126       1 << _UFFDIO_API
127              The UFFDIO_API operation is supported.
128
129       1 << _UFFDIO_REGISTER
130              The UFFDIO_REGISTER operation is supported.
131
132       1 << _UFFDIO_UNREGISTER
133              The UFFDIO_UNREGISTER operation is supported.
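
       Testing one of these bits might look like the sketch below.  The
       helper name uffd_ioctl_available() is hypothetical; the _UFFDIO_*
       numbers are taken from <linux/userfaultfd.h>.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <linux/userfaultfd.h>

/* Hypothetical helper: report whether the bit for ioctl number `nr`
 * (for example _UFFDIO_REGISTER) is set in an ioctls bit mask returned
 * by UFFDIO_API or UFFDIO_REGISTER. */
static bool uffd_ioctl_available(uint64_t ioctls, unsigned int nr)
{
    assert(nr < 64);    /* the mask is a single 64-bit word */
    return (ioctls & ((uint64_t)1 << nr)) != 0;
}
```

       The same check applies to the ioctls mask returned by
       UFFDIO_REGISTER, using numbers such as _UFFDIO_COPY.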
134
135       This ioctl(2) operation returns 0 on success.  On error, -1 is returned
136       and errno is set to indicate the error.  Possible errors include:
137
138       EFAULT argp refers to an address that is outside the calling  process's
139              accessible address space.
140
141       EINVAL The  userfaultfd  has  already  been  enabled by a previous UFF‐
142              DIO_API operation.
143
144       EINVAL The API version requested in the api field is not  supported  by
145              this kernel, or the features field passed to the kernel includes
146              feature bits that are not supported by the current  kernel  ver‐
147              sion.
148
149   UFFDIO_REGISTER
150       (Since  Linux  4.3.)   Register  a  memory address range with the user‐
151       faultfd object.  The pages in the range must be  "compatible".   Please
152       refer  to  the  list  of register modes below for the compatible memory
153       backends for each mode.
154
155       The argp argument is a pointer to a uffdio_register structure,  defined
156       as:
157
158           struct uffdio_range {
159               __u64 start;    /* Start of range */
160               __u64 len;      /* Length of range (bytes) */
161           };
162
163           struct uffdio_register {
164               struct uffdio_range range;
165               __u64 mode;     /* Desired mode of operation (input) */
166               __u64 ioctls;   /* Available ioctl() operations (output) */
167           };
168
169       The range field defines a memory range starting at start and continuing
170       for len bytes that should be handled by the userfaultfd.
171
172       The mode field defines the mode of operation desired  for  this  memory
173       region.   The  following  values  may  be bitwise ORed to set the user‐
174       faultfd mode for the specified range:
175
176       UFFDIO_REGISTER_MODE_MISSING
177              Track page faults on missing pages.  Since Linux 4.3, only  pri‐
178              vate   anonymous  ranges  are  compatible.   Since  Linux  4.11,
179              hugetlbfs and shared memory ranges are also compatible.
180
181       UFFDIO_REGISTER_MODE_WP
182              Track page faults on write-protected pages.   Since  Linux  5.7,
183              only private anonymous ranges are compatible.
184
185       UFFDIO_REGISTER_MODE_MINOR
186              Track  minor  page  faults.   Since  Linux  5.13, only hugetlbfs
187              ranges are compatible.  Since  Linux  5.14,  compatibility  with
188              shmem ranges was added.
189
190       If the operation is successful, the kernel modifies the ioctls bit-mask
191       field to indicate which ioctl(2) operations are available for the spec‐
192       ified range.  This returned bit mask can contain the following bits:
193
194       1 << _UFFDIO_COPY
195              The UFFDIO_COPY operation is supported.
196
197       1 << _UFFDIO_WAKE
198              The UFFDIO_WAKE operation is supported.
199
200       1 << _UFFDIO_WRITEPROTECT
              The UFFDIO_WRITEPROTECT operation is supported.
202
203       1 << _UFFDIO_ZEROPAGE
204              The UFFDIO_ZEROPAGE operation is supported.
205
206       1 << _UFFDIO_CONTINUE
207              The UFFDIO_CONTINUE operation is supported.
208
209       This ioctl(2) operation returns 0 on success.  On error, -1 is returned
210       and errno is set to indicate the error.  Possible errors include:
211
212       EBUSY  A mapping in the specified  range  is  registered  with  another
213              userfaultfd object.
214
215       EFAULT argp  refers to an address that is outside the calling process's
216              accessible address space.
217
218       EINVAL An invalid or unsupported bit was specified in the  mode  field;
219              or the mode field was zero.
220
221       EINVAL There is no mapping in the specified address range.
222
223       EINVAL range.start  or  range.len  is not a multiple of the system page
224              size; or, range.len is zero; or these fields are  otherwise  in‐
225              valid.
226
       EINVAL There was an incompatible mapping in the specified address range.
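
       A registration in missing mode might be sketched as follows.  Both
       helper names are hypothetical.  The alignment check only mirrors the
       EINVAL rule stated above; the kernel performs its own validation, and
       fd is assumed to have completed the UFFDIO_API handshake.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>

/* Mirrors the documented EINVAL rule: start and len must be multiples
 * of the page size, and len must be nonzero. */
static bool uffd_range_aligned(uint64_t start, uint64_t len,
                               uint64_t page_size)
{
    assert(page_size != 0);
    return len != 0 && start % page_size == 0 && len % page_size == 0;
}

/* Hypothetical helper: register [addr, addr + len) in missing mode.
 * On success, returns 0 and stores the per-range ioctls mask in
 * *ioctls (if non-NULL).  On failure, returns -1 with errno set. */
static int uffd_register_missing(int fd, void *addr, size_t len,
                                 uint64_t *ioctls)
{
    struct uffdio_register reg;

    memset(&reg, 0, sizeof(reg));
    reg.range.start = (uint64_t)(uintptr_t)addr;
    reg.range.len = len;
    reg.mode = UFFDIO_REGISTER_MODE_MISSING;
    if (ioctl(fd, UFFDIO_REGISTER, &reg) == -1)
        return -1;
    if (ioctls != NULL)
        *ioctls = reg.ioctls;
    return 0;
}
```

       A caller would typically obtain page_size from sysconf(_SC_PAGESIZE)
       and addr from mmap(2).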
228
229   UFFDIO_UNREGISTER
230       (Since Linux 4.3.)  Unregister a memory address range from userfaultfd.
       The pages in the range must be "compatible" (see the description of
       UFFDIO_REGISTER).
233
234       The address range to unregister is specified in the uffdio_range struc‐
235       ture pointed to by argp.
236
237       This ioctl(2) operation returns 0 on success.  On error, -1 is returned
238       and errno is set to indicate the error.  Possible errors include:
239
       EINVAL Either the start or the len field of the uffdio_range structure
              was not a multiple of the system page size; or the len field was
              zero; or these fields were otherwise invalid.
243
       EINVAL There was an incompatible mapping in the specified address range.
245
246       EINVAL There was no mapping in the specified address range.
247
248   UFFDIO_COPY
       (Since Linux 4.3.)  Atomically copy a contiguous memory chunk into the
       userfault registered range and optionally wake up the blocked thread.
       The source and destination addresses and the number of bytes to copy
       are specified by the src, dst, and len fields of the uffdio_copy
       structure pointed to by argp:
254
255           struct uffdio_copy {
256               __u64 dst;    /* Destination of copy */
257               __u64 src;    /* Source of copy */
258               __u64 len;    /* Number of bytes to copy */
259               __u64 mode;   /* Flags controlling behavior of copy */
260               __s64 copy;   /* Number of bytes copied, or negated error */
261           };
262
263       The  following value may be bitwise ORed in mode to change the behavior
264       of the UFFDIO_COPY operation:
265
       UFFDIO_COPY_MODE_DONTWAKE
              Do not wake up the thread that waits for page-fault resolution.
268
269       UFFDIO_COPY_MODE_WP
270              Copy the page with read-only permission.  This allows  the  user
271              to  trap the next write to the page, which will block and gener‐
272              ate another write-protect userfault message.  This is used  only
273              when   both   UFFDIO_REGISTER_MODE_MISSING   and   UFFDIO_REGIS‐
274              TER_MODE_WP modes are enabled for the registered range.
275
276       The copy field is used by the kernel to return the number of bytes that
277       was actually copied, or an error (a negated errno-style value).  If the
278       value returned in copy doesn't match the value that  was  specified  in
279       len, the operation fails with the error EAGAIN.  The copy field is out‐
280       put-only; it is not read by the UFFDIO_COPY operation.
281
282       This ioctl(2) operation returns 0 on success.  In this case, the entire
283       area was copied.  On error, -1 is returned and errno is set to indicate
284       the error.  Possible errors include:
285
286       EAGAIN The number of bytes copied (i.e., the value returned in the copy
287              field)  does  not  equal the value that was specified in the len
288              field.
289
290       EINVAL Either dst or len was not a multiple of the system page size, or
291              the range specified by src and len or dst and len was invalid.
292
293       EINVAL An invalid bit was specified in the mode field.
294
295       ENOENT (since Linux 4.11)
296              The  faulting  process has changed its virtual memory layout si‐
297              multaneously with an outstanding UFFDIO_COPY operation.
298
299       ENOSPC (from Linux 4.11 until Linux 4.13)
300              The faulting process has exited at the time of a UFFDIO_COPY op‐
301              eration.
302
303       ESRCH (since Linux 4.13)
304              The faulting process has exited at the time of a UFFDIO_COPY op‐
305              eration.
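
       Resolving a missing-page fault with UFFDIO_COPY might be sketched as
       follows.  The helper name uffd_copy() is hypothetical, and the sketch
       assumes that dst lies in a range already registered with
       UFFDIO_REGISTER_MODE_MISSING.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>

/* Hypothetical helper: copy len bytes from src into the registered
 * range at dst.  Returns 0 if the whole area was copied, or -1 with
 * errno set.  *copied receives the copy field: the number of bytes
 * copied or a negated error, or 0 if the kernel never wrote it. */
static int uffd_copy(int fd, uint64_t dst, uint64_t src, uint64_t len,
                     uint64_t mode, int64_t *copied)
{
    struct uffdio_copy c;
    int ret;

    assert(copied != NULL);
    memset(&c, 0, sizeof(c));
    c.dst = dst;
    c.src = src;
    c.len = len;
    c.mode = mode;      /* 0, or e.g. UFFDIO_COPY_MODE_DONTWAKE */
    ret = ioctl(fd, UFFDIO_COPY, &c);
    *copied = c.copy;
    return ret;
}
```

       When the call fails with EAGAIN, *copied tells the caller how far the
       copy got, so it can decide whether to retry the remainder.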
306
307   UFFDIO_ZEROPAGE
308       (Since Linux 4.3.)  Zero out  a  memory  range  registered  with  user‐
309       faultfd.
310
311       The  requested  range is specified by the range field of the uffdio_ze‐
312       ropage structure pointed to by argp:
313
314           struct uffdio_zeropage {
315               struct uffdio_range range;
               __u64 mode;     /* Flags controlling behavior of zeropage */
317               __s64 zeropage; /* Number of bytes zeroed, or negated error */
318           };
319
320       The following value may be bitwise ORed in mode to change the  behavior
321       of the UFFDIO_ZEROPAGE operation:
322
323       UFFDIO_ZEROPAGE_MODE_DONTWAKE
324              Do not wake up the thread that waits for page-fault resolution.
325
326       The  zeropage field is used by the kernel to return the number of bytes
327       that was actually zeroed, or an  error  in  the  same  manner  as  UFF‐
328       DIO_COPY.   If  the  value returned in the zeropage field doesn't match
329       the value that was specified in range.len, the operation fails with the
330       error EAGAIN.  The zeropage field is output-only; it is not read by the
331       UFFDIO_ZEROPAGE operation.
332
333       This ioctl(2) operation returns 0 on success.  In this case, the entire
334       area was zeroed.  On error, -1 is returned and errno is set to indicate
335       the error.  Possible errors include:
336
337       EAGAIN The number of bytes zeroed (i.e., the value returned in the  ze‐
338              ropage field) does not equal the value that was specified in the
339              range.len field.
340
341       EINVAL Either range.start or range.len was not a multiple of the system
342              page size; or range.len was zero; or the range specified was in‐
343              valid.
344
345       EINVAL An invalid bit was specified in the mode field.
346
347       ESRCH (since Linux 4.13)
348              The faulting process has exited at the time of a UFFDIO_ZEROPAGE
349              operation.
350
351   UFFDIO_WAKE
352       (Since  Linux  4.3.)  Wake up the thread waiting for page-fault resolu‐
353       tion on a specified memory address range.
354
355       The UFFDIO_WAKE operation is used in conjunction with  UFFDIO_COPY  and
356       UFFDIO_ZEROPAGE  operations  that have the UFFDIO_COPY_MODE_DONTWAKE or
357       UFFDIO_ZEROPAGE_MODE_DONTWAKE bit set in the mode field.  The userfault
358       monitor  can perform several UFFDIO_COPY and UFFDIO_ZEROPAGE operations
359       in a batch and then explicitly wake up the faulting thread  using  UFF‐
360       DIO_WAKE.
361
362       The  argp  argument  is  a  pointer  to a uffdio_range structure (shown
363       above) that specifies the address range.
364
365       This ioctl(2) operation returns 0 on success.  On error, -1 is returned
366       and errno is set to indicate the error.  Possible errors include:
367
       EINVAL The start or the len field of the uffdio_range structure was not
              a multiple of the system page size; or len was zero; or the
              specified range was otherwise invalid.
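
       The batching pattern described above might be sketched as follows,
       here with UFFDIO_ZEROPAGE.  The helper name is hypothetical, and the
       sketch assumes that the pages are consecutive and lie in a range
       registered in missing mode.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>

/* Hypothetical helper: zero n_pages consecutive pages starting at
 * start without waking any waiter, then issue a single UFFDIO_WAKE
 * for the whole range.  Returns 0 on success, or -1 with errno set
 * by the first ioctl(2) that failed. */
static int uffd_zero_batch_then_wake(int fd, uint64_t start,
                                     uint64_t page_size,
                                     unsigned int n_pages)
{
    struct uffdio_range range;
    unsigned int i;

    assert(page_size != 0);
    for (i = 0; i < n_pages; i++) {
        struct uffdio_zeropage z;

        memset(&z, 0, sizeof(z));
        z.range.start = start + (uint64_t)i * page_size;
        z.range.len = page_size;
        z.mode = UFFDIO_ZEROPAGE_MODE_DONTWAKE;
        if (ioctl(fd, UFFDIO_ZEROPAGE, &z) == -1)
            return -1;
    }

    range.start = start;
    range.len = (uint64_t)n_pages * page_size;
    return ioctl(fd, UFFDIO_WAKE, &range);
}
```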
371
   UFFDIO_WRITEPROTECT
       (Since Linux 5.7.)  Write-protect or write-unprotect a memory range
       registered with mode UFFDIO_REGISTER_MODE_WP.

       The argp argument is a pointer to a uffdio_writeprotect structure as
       shown below:
378
379           struct uffdio_writeprotect {
               struct uffdio_range range; /* Range to change write permission */
381               __u64 mode;                /* Mode to change write permission */
382           };
383
384       There are two mode bits that are supported in this structure:
385
386       UFFDIO_WRITEPROTECT_MODE_WP
387              When this mode bit is set, the ioctl will be a write-protect op‐
388              eration upon the memory range specified by range.  Otherwise  it
389              will  be  a  write-unprotect operation upon the specified range,
390              which can be used to resolve a  userfaultfd  write-protect  page
391              fault.
392
393       UFFDIO_WRITEPROTECT_MODE_DONTWAKE
394              When  this mode bit is set, do not wake up any thread that waits
395              for page-fault resolution after  the  operation.   This  can  be
396              specified only if UFFDIO_WRITEPROTECT_MODE_WP is not specified.
397
398       This ioctl(2) operation returns 0 on success.  On error, -1 is returned
399       and errno is set to indicate the error.  Possible errors include:
400
       EINVAL The start or the len field of the uffdio_range structure was not
              a multiple of the system page size; or len was zero; or the
              specified range was otherwise invalid.
404
405       EAGAIN The process was interrupted; retry this call.
406
       ENOENT The range specified in range is not valid.  For example, the
              virtual address does not exist, or it is not registered with
              userfaultfd write-protect mode.
410
411       EFAULT Encountered a generic fault during processing.
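
       The two mode bits and their documented restriction might be sketched
       as follows.  Both helper names are hypothetical, and the local
       validation is a convenience of this sketch: it fails early with
       EINVAL rather than relying on the kernel's own checks.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>

/* Accept only the two documented mode bits, and reject DONTWAKE when
 * it is combined with WP, following the rule stated above. */
static bool uffd_wp_mode_valid(uint64_t mode)
{
    const uint64_t known = UFFDIO_WRITEPROTECT_MODE_WP |
                           UFFDIO_WRITEPROTECT_MODE_DONTWAKE;

    if ((mode & ~known) != 0)
        return false;
    return (mode & known) != known;
}

/* Hypothetical helper: write-protect the range (mode includes
 * UFFDIO_WRITEPROTECT_MODE_WP) or write-unprotect it (bit clear).
 * Returns 0 on success, or -1 with errno set. */
static int uffd_writeprotect(int fd, uint64_t start, uint64_t len,
                             uint64_t mode)
{
    struct uffdio_writeprotect wp;

    if (!uffd_wp_mode_valid(mode)) {
        errno = EINVAL;         /* rejected locally by this sketch */
        return -1;
    }
    memset(&wp, 0, sizeof(wp));
    wp.range.start = start;
    wp.range.len = len;
    wp.mode = mode;
    return ioctl(fd, UFFDIO_WRITEPROTECT, &wp);
}
```

       Passing a mode of 0 requests a write-unprotect operation that also
       wakes any waiting thread.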
412
413   UFFDIO_CONTINUE
414       (Since Linux 5.13.)  Resolve a minor page fault by installing page  ta‐
415       ble entries for existing pages in the page cache.
416
417       The  argp argument is a pointer to a uffdio_continue structure as shown
418       below:
419
420           struct uffdio_continue {
421               struct uffdio_range range;
422                              /* Range to install PTEs for and continue */
423               __u64 mode;    /* Flags controlling the behavior of continue */
424               __s64 mapped;  /* Number of bytes mapped, or negated error */
425           };
426
427       The following value may be bitwise ORed in mode to change the  behavior
428       of the UFFDIO_CONTINUE operation:
429
430       UFFDIO_CONTINUE_MODE_DONTWAKE
431              Do not wake up the thread that waits for page-fault resolution.
432
433       The  mapped  field  is used by the kernel to return the number of bytes
434       that were actually mapped, or an error  in  the  same  manner  as  UFF‐
435       DIO_COPY.   If the value returned in the mapped field doesn't match the
436       value that was specified in range.len, the operation fails with the er‐
437       ror  EAGAIN.   The  mapped  field is output-only; it is not read by the
438       UFFDIO_CONTINUE operation.
439
440       This ioctl(2) operation returns 0 on success.  In this case, the entire
441       area was mapped.  On error, -1 is returned and errno is set to indicate
442       the error.  Possible errors include:
443
444       EAGAIN The number of bytes mapped (i.e.,  the  value  returned  in  the
445              mapped field) does not equal the value that was specified in the
446              range.len field.
447
448       EINVAL Either range.start or range.len was not a multiple of the system
449              page size; or range.len was zero; or the range specified was in‐
450              valid.
451
452       EINVAL An invalid bit was specified in the mode field.
453
454       EEXIST One or more pages were already mapped in the given range.
455
456       ENOENT The faulting process has changed its virtual memory  layout  si‐
457              multaneously with an outstanding UFFDIO_CONTINUE operation.
458
       ENOMEM Allocating memory needed to set up the page table mappings
              failed.
461
462       EFAULT No existing page could be found in the page cache for the  given
463              range.
464
465       ESRCH  The faulting process has exited at the time of a UFFDIO_CONTINUE
466              operation.
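
       Resolving a minor fault might be sketched as follows.  The helper
       name uffd_continue() is hypothetical.  Treating EEXIST as success is
       a policy choice of this sketch, on the assumption that an already
       mapped page needs no further action; the text above does not mandate
       it.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>

/* Hypothetical helper: install page table entries for the range
 * [start, start + len).  Returns 0 on success or, by this sketch's
 * policy, when the kernel reports EEXIST.  Otherwise returns -1 with
 * errno set.  *mapped receives the mapped field as left after the
 * call. */
static int uffd_continue(int fd, uint64_t start, uint64_t len,
                         uint64_t mode, int64_t *mapped)
{
    struct uffdio_continue c;
    int ret;

    assert(mapped != NULL);
    memset(&c, 0, sizeof(c));
    c.range.start = start;
    c.range.len = len;
    c.mode = mode;      /* 0 or UFFDIO_CONTINUE_MODE_DONTWAKE */
    ret = ioctl(fd, UFFDIO_CONTINUE, &c);
    *mapped = c.mapped;
    if (ret == -1 && errno == EEXIST)
        return 0;       /* sketch policy: already mapped */
    return ret;
}
```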
467

RETURN VALUE

469       See descriptions of the individual operations, above.
470

ERRORS

472       See descriptions of the individual operations, above.  In addition, the
473       following  general errors can occur for all of the operations described
474       above:
475
476       EFAULT argp does not point to a valid memory address.
477
478       EINVAL (For all operations except UFFDIO_API.)  The userfaultfd  object
479              has not yet been enabled (via the UFFDIO_API operation).
480

STANDARDS

482       Linux.
483

BUGS

       In order to detect available userfault features and enable some subset
       of those features, the userfaultfd file descriptor must be closed after
       the first UFFDIO_API operation (which queries feature availability) and
       reopened before the second UFFDIO_API operation (which actually enables
       the desired features).
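
       The query-then-reopen sequence described above might be sketched as
       follows.  The helper names are hypothetical, and the sketch invokes
       the userfaultfd(2) system call through syscall(2) with the O_CLOEXEC
       and O_NONBLOCK flags; whether the call is permitted depends on
       system configuration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <linux/userfaultfd.h>

/* Create a userfaultfd object via the raw system call. */
static int uffd_open(void)
{
    return (int)syscall(SYS_userfaultfd, O_CLOEXEC | O_NONBLOCK);
}

/* Run UFFDIO_API on fd; on failure, close fd and preserve errno. */
static int uffd_api_or_close(int fd, struct uffdio_api *api,
                             uint64_t features)
{
    memset(api, 0, sizeof(*api));
    api->api = UFFD_API;
    api->features = features;
    if (ioctl(fd, UFFDIO_API, api) == -1) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    return 0;
}

/* Hypothetical helper: return a descriptor on which the subset of
 * `wanted` that the kernel reports as available has been enabled, or
 * -1 with errno set.  *enabled receives that subset on success. */
static int uffd_open_with_features(uint64_t wanted, uint64_t *enabled)
{
    struct uffdio_api api;
    uint64_t usable;
    int fd;

    assert(enabled != NULL);

    fd = uffd_open();           /* first descriptor: query only */
    if (fd == -1 || uffd_api_or_close(fd, &api, 0) == -1)
        return -1;
    usable = api.features & wanted;
    close(fd);

    fd = uffd_open();           /* second descriptor: enable */
    if (fd == -1 || uffd_api_or_close(fd, &api, usable) == -1)
        return -1;
    *enabled = usable;
    return fd;
}
```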
490

EXAMPLES

492       See userfaultfd(2).
493

SEE ALSO

495       ioctl(2), mmap(2), userfaultfd(2)
496
497       Documentation/admin-guide/mm/userfaultfd.rst in the Linux kernel source
498       tree

Linux man-pages 6.04              2023-03-30              ioctl_userfaultfd(2)