IOCTL_USERFAULTFD(2)       Linux Programmer's Manual      IOCTL_USERFAULTFD(2)

NAME

       ioctl_userfaultfd - perform operations on a userfaultfd object for
       handling page faults in user space

SYNOPSIS

       #include <linux/userfaultfd.h>  /* Definition of UFFD* constants */
       #include <sys/ioctl.h>

       int ioctl(int fd, int cmd, ...);

DESCRIPTION

       Various ioctl(2) operations can be performed on a userfaultfd object
       (created by a call to userfaultfd(2)) using calls of the form:

           ioctl(fd, cmd, argp);

       In the above, fd is a file descriptor referring to a userfaultfd
       object, cmd is one of the commands listed below, and argp is a pointer
       to a data structure that is specific to cmd.

       The various ioctl(2) operations are described below. The UFFDIO_API,
       UFFDIO_REGISTER, and UFFDIO_UNREGISTER operations are used to configure
       userfaultfd behavior. These operations allow the caller to choose what
       features will be enabled and what kinds of events will be delivered to
       the application. The remaining operations are range operations. These
       operations enable the calling application to resolve page-fault events.

   UFFDIO_API
       (Since Linux 4.3.) Enable operation of the userfaultfd and perform the
       API handshake.

       The argp argument is a pointer to a uffdio_api structure, defined as:

           struct uffdio_api {
               __u64 api;        /* Requested API version (input) */
               __u64 features;   /* Requested features (input/output) */
               __u64 ioctls;     /* Available ioctl() operations (output) */
           };

       The api field denotes the API version requested by the application.
       The only version currently defined is UFFD_API.

       The kernel verifies that it can support the requested API version, and
       sets the features and ioctls fields to bit masks representing,
       respectively, all the available features and the generic ioctl(2)
       operations available.

       For Linux kernel versions before 4.11, the features field must be
       initialized to zero before the call to UFFDIO_API, and zero (i.e., no
       feature bits) is placed in the features field by the kernel upon return
       from ioctl(2).

       Starting from Linux 4.11, the features field can be used to ask whether
       particular features are supported and explicitly enable userfaultfd
       features that are disabled by default. The kernel always reports all
       the available features in the features field.

       To enable userfaultfd features, the application should set the bit
       corresponding to each feature it wants to enable in the features field.
       If the kernel supports all the requested features, it enables them.
       Otherwise, it zeroes out the returned uffdio_api structure and returns
       EINVAL.
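
       The handshake can be sketched in C as follows. This is a minimal,
       non-authoritative example: the helper names uffd_handshake() and
       uffd_open() are hypothetical, the userfaultfd is created with
       syscall(2) and SYS_userfaultfd on the assumption that no C library
       wrapper is used, and the O_CLOEXEC | O_NONBLOCK flags are an arbitrary
       choice.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Perform the UFFDIO_API handshake on 'uffd', asking the kernel to enable
 * the feature bits in 'features'. On success, return 0 and store the
 * feature and ioctl bit masks reported by the kernel; on failure, return
 * -1 with errno set by ioctl(2) (EINVAL if a requested feature is not
 * supported). */
static int uffd_handshake(int uffd, uint64_t features,
                          uint64_t *out_features, uint64_t *out_ioctls)
{
    struct uffdio_api api;

    memset(&api, 0, sizeof(api));
    api.api = UFFD_API;        /* the API version defined by the header */
    api.features = features;   /* must be 0 on kernels before Linux 4.11 */

    if (ioctl(uffd, UFFDIO_API, &api) == -1)
        return -1;

    if (out_features != NULL)
        *out_features = api.features;
    if (out_ioctls != NULL)
        *out_ioctls = api.ioctls;
    return 0;
}

/* Create a userfaultfd with the raw system call and enable 'features'.
 * Returns the new file descriptor, or -1 with errno set. */
static int uffd_open(uint64_t features)
{
    int uffd = (int) syscall(SYS_userfaultfd, O_CLOEXEC | O_NONBLOCK);

    if (uffd == -1)
        return -1;
    if (uffd_handshake(uffd, features, NULL, NULL) == -1) {
        int saved_errno = errno;   /* close(2) must not clobber errno */
        close(uffd);
        errno = saved_errno;
        return -1;
    }
    return uffd;
}
```

       For example, a program that would like thread IDs in page-fault
       messages might call uffd_open(UFFD_FEATURE_THREAD_ID) and, if that
       fails with EINVAL, fall back to uffd_open(0).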

       The following feature bits may be set:

       UFFD_FEATURE_EVENT_FORK (since Linux 4.11)
              When this feature is enabled, the userfaultfd objects associated
              with a parent process are duplicated into the child process
              during fork(2) and a UFFD_EVENT_FORK event is delivered to the
              userfaultfd monitor.

       UFFD_FEATURE_EVENT_REMAP (since Linux 4.11)
              If this feature is enabled, when the faulting process invokes
              mremap(2), the userfaultfd monitor will receive an event of type
              UFFD_EVENT_REMAP.

       UFFD_FEATURE_EVENT_REMOVE (since Linux 4.11)
              If this feature is enabled, when the faulting process calls
              madvise(2) with the MADV_DONTNEED or MADV_REMOVE advice value to
              free a virtual memory area, the userfaultfd monitor will receive
              an event of type UFFD_EVENT_REMOVE.

       UFFD_FEATURE_EVENT_UNMAP (since Linux 4.11)
              If this feature is enabled, when the faulting process unmaps
              virtual memory either explicitly with munmap(2), or implicitly
              during either mmap(2) or mremap(2), the userfaultfd monitor will
              receive an event of type UFFD_EVENT_UNMAP.

       UFFD_FEATURE_MISSING_HUGETLBFS (since Linux 4.11)
              If this feature bit is set, the kernel supports registering
              userfaultfd ranges on hugetlbfs virtual memory areas.

       UFFD_FEATURE_MISSING_SHMEM (since Linux 4.11)
              If this feature bit is set, the kernel supports registering
              userfaultfd ranges on shared memory areas. This includes all
              kernel shared memory APIs: System V shared memory, tmpfs(5),
              shared mappings of /dev/zero, mmap(2) with the MAP_SHARED flag
              set, memfd_create(2), and so on.

       UFFD_FEATURE_SIGBUS (since Linux 4.14)
              If this feature bit is set, no page-fault events
              (UFFD_EVENT_PAGEFAULT) will be delivered. Instead, a SIGBUS
              signal will be sent to the faulting process. Applications using
              this feature will not require the use of a userfaultfd monitor
              for processing memory accesses to the regions registered with
              userfaultfd.

       UFFD_FEATURE_THREAD_ID (since Linux 4.14)
              If this feature bit is set, uffd_msg.arg.pagefault.feat.ptid will
              be set to the thread ID of the faulting thread for each
              page-fault message.
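
       A monitor reads struct uffd_msg records from the userfaultfd with
       read(2) and, for page-fault events, can extract the faulting address
       and (when UFFD_FEATURE_THREAD_ID is enabled) the thread ID. The sketch
       below assumes the struct uffd_msg layout from <linux/userfaultfd.h>;
       the helper names uffd_decode_fault() and uffd_read_fault() are
       hypothetical.

```c
#include <linux/userfaultfd.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Decode one message read from a userfaultfd. For a page-fault message,
 * store the faulting address and thread ID and return 0; return -1 for
 * any other event type. The thread ID is meaningful only if
 * UFFD_FEATURE_THREAD_ID was enabled during the UFFDIO_API handshake. */
static int uffd_decode_fault(const struct uffd_msg *msg,
                             uint64_t *addr, uint32_t *ptid)
{
    if (msg->event != UFFD_EVENT_PAGEFAULT)
        return -1;
    *addr = msg->arg.pagefault.address;
    *ptid = msg->arg.pagefault.feat.ptid;
    return 0;
}

/* Read a single message from 'uffd' and decode it as above. Returns -1 if
 * no complete message could be read (errno is set when read(2) fails,
 * e.g. EAGAIN on an O_NONBLOCK descriptor with no pending message). */
static int uffd_read_fault(int uffd, uint64_t *addr, uint32_t *ptid)
{
    struct uffd_msg msg;
    ssize_t n = read(uffd, &msg, sizeof(msg));

    if (n != (ssize_t) sizeof(msg))
        return -1;
    return uffd_decode_fault(&msg, addr, ptid);
}
```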

       The returned ioctls field can contain the following bits:

       1 << _UFFDIO_API
              The UFFDIO_API operation is supported.

       1 << _UFFDIO_REGISTER
              The UFFDIO_REGISTER operation is supported.

       1 << _UFFDIO_UNREGISTER
              The UFFDIO_UNREGISTER operation is supported.

       1 << _UFFDIO_WRITEPROTECT
              The UFFDIO_WRITEPROTECT operation is supported.
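
       Because each bit has the form 1 << _UFFDIO_*, a program can test the
       returned ioctls mask with a shift and a bitwise AND. A minimal sketch;
       the helper names uffd_has_ioctl() and uffd_missing_ioctls() are
       hypothetical.

```c
#include <linux/userfaultfd.h>
#include <stdbool.h>
#include <stdint.h>

/* Return true if the operation numbered 'nr' (one of the _UFFDIO_*
 * constants, e.g. _UFFDIO_REGISTER) is advertised in 'ioctls', a mask
 * returned by UFFDIO_API or UFFDIO_REGISTER. */
static bool uffd_has_ioctl(uint64_t ioctls, unsigned int nr)
{
    return (ioctls & ((uint64_t) 1 << nr)) != 0;
}

/* Return the subset of 'required' (a mask built the same way) that is not
 * present in 'ioctls'; 0 means every required operation is available. */
static uint64_t uffd_missing_ioctls(uint64_t ioctls, uint64_t required)
{
    return required & ~ioctls;
}
```

       For instance, uffd_missing_ioctls(api.ioctls, (uint64_t) 1 <<
       _UFFDIO_REGISTER) is zero exactly when UFFDIO_REGISTER is available.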

       This ioctl(2) operation returns 0 on success. On error, -1 is returned
       and errno is set to indicate the error. Possible errors include:

       EFAULT argp refers to an address that is outside the calling process's
              accessible address space.

       EINVAL The userfaultfd has already been enabled by a previous
              UFFDIO_API operation.

       EINVAL The API version requested in the api field is not supported by
              this kernel, or the features field passed to the kernel includes
              feature bits that are not supported by the current kernel
              version.

   UFFDIO_REGISTER
       (Since Linux 4.3.) Register a memory address range with the
       userfaultfd object. The pages in the range must be "compatible".

       Before Linux 4.11, only private anonymous ranges are compatible for
       registering with UFFDIO_REGISTER.

       Since Linux 4.11, hugetlbfs and shared memory ranges are also
       compatible with UFFDIO_REGISTER.

       The argp argument is a pointer to a uffdio_register structure, defined
       as:

           struct uffdio_range {
               __u64 start;    /* Start of range */
               __u64 len;      /* Length of range (bytes) */
           };

           struct uffdio_register {
               struct uffdio_range range;
               __u64 mode;     /* Desired mode of operation (input) */
               __u64 ioctls;   /* Available ioctl() operations (output) */
           };

       The range field defines a memory range starting at start and continuing
       for len bytes that should be handled by the userfaultfd.

       The mode field defines the mode of operation desired for this memory
       region. The following values may be bitwise ORed to set the
       userfaultfd mode for the specified range:

       UFFDIO_REGISTER_MODE_MISSING
              Track page faults on missing pages.

       UFFDIO_REGISTER_MODE_WP
              Track page faults on write-protected pages.

       If the operation is successful, the kernel modifies the ioctls bit mask
       field to indicate which ioctl(2) operations are available for the
       specified range. This returned bit mask is encoded as for UFFDIO_API,
       with one bit of the form 1 << _UFFDIO_* per available operation.
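
       A sketch of registering a range, for example for missing-page tracking
       with UFFDIO_REGISTER_MODE_MISSING, assuming the range came from
       mmap(2) with a page-aligned address and length. The uffd_range_ok()
       pre-check only mirrors the alignment rules from the EINVAL errors
       listed below; the kernel remains the authority. The helper names are
       hypothetical.

```c
#include <errno.h>
#include <linux/userfaultfd.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Start and len must be multiples of the system page size, and len must
 * not be zero. */
static bool uffd_range_ok(uint64_t start, uint64_t len)
{
    uint64_t page = (uint64_t) sysconf(_SC_PAGESIZE);

    return len != 0 && start % page == 0 && len % page == 0;
}

/* Register [addr, addr + len) with 'uffd' using 'mode'. On success, return
 * 0 and store the ioctls mask for the range; on failure, return -1 with
 * errno set. */
static int uffd_register(int uffd, void *addr, uint64_t len, uint64_t mode,
                         uint64_t *out_ioctls)
{
    struct uffdio_register reg;

    if (!uffd_range_ok((uint64_t) (uintptr_t) addr, len)) {
        errno = EINVAL;            /* the kernel would reject it as well */
        return -1;
    }
    memset(&reg, 0, sizeof(reg));
    reg.range.start = (uint64_t) (uintptr_t) addr;
    reg.range.len = len;
    reg.mode = mode;
    if (ioctl(uffd, UFFDIO_REGISTER, &reg) == -1)
        return -1;
    if (out_ioctls != NULL)
        *out_ioctls = reg.ioctls;  /* e.g. 1 << _UFFDIO_COPY, ... */
    return 0;
}
```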

       This ioctl(2) operation returns 0 on success. On error, -1 is returned
       and errno is set to indicate the error. Possible errors include:

       EBUSY  A mapping in the specified range is registered with another
              userfaultfd object.

       EFAULT argp refers to an address that is outside the calling process's
              accessible address space.

       EINVAL An invalid or unsupported bit was specified in the mode field;
              or the mode field was zero.

       EINVAL There is no mapping in the specified address range.

       EINVAL range.start or range.len is not a multiple of the system page
              size; or, range.len is zero; or these fields are otherwise
              invalid.

       EINVAL There is an incompatible mapping in the specified address range.

   UFFDIO_UNREGISTER
       (Since Linux 4.3.) Unregister a memory address range from userfaultfd.
       The pages in the range must be "compatible" (see the description of
       UFFDIO_REGISTER).

       The address range to unregister is specified in the uffdio_range
       structure pointed to by argp.

       This ioctl(2) operation returns 0 on success. On error, -1 is returned
       and errno is set to indicate the error. Possible errors include:

       EINVAL Either the start or the len field of the uffdio_range structure
              was not a multiple of the system page size; or the len field was
              zero; or these fields were otherwise invalid.

       EINVAL There is an incompatible mapping in the specified address range.

       EINVAL There was no mapping in the specified address range.
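
       Unlike UFFDIO_REGISTER, UFFDIO_UNREGISTER takes a bare uffdio_range
       rather than a wrapping structure. A minimal sketch; the helper name
       uffd_unregister() is hypothetical.

```c
#include <errno.h>
#include <linux/userfaultfd.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>

/* Unregister [addr, addr + len) from 'uffd'. argp points directly at a
 * struct uffdio_range. Returns 0 on success, or -1 with errno set by
 * ioctl(2). */
static int uffd_unregister(int uffd, void *addr, uint64_t len)
{
    struct uffdio_range range = {
        .start = (uint64_t) (uintptr_t) addr,
        .len = len,
    };

    return ioctl(uffd, UFFDIO_UNREGISTER, &range);
}
```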

   UFFDIO_COPY
       (Since Linux 4.3.) Atomically copy a contiguous memory chunk into the
       userfaultfd-registered range and optionally wake up the blocked thread.
       The source and destination addresses and the number of bytes to copy
       are specified by the src, dst, and len fields of the uffdio_copy
       structure pointed to by argp:

           struct uffdio_copy {
               __u64 dst;    /* Destination of copy */
               __u64 src;    /* Source of copy */
               __u64 len;    /* Number of bytes to copy */
               __u64 mode;   /* Flags controlling behavior of copy */
               __s64 copy;   /* Number of bytes copied, or negated error */
           };

       The following values may be bitwise ORed in mode to change the behavior
       of the UFFDIO_COPY operation:

       UFFDIO_COPY_MODE_DONTWAKE
              Do not wake up the thread that waits for page-fault resolution.

       UFFDIO_COPY_MODE_WP
              Copy the page with read-only permission. This allows the user to
              trap the next write to the page, which will block and generate
              another write-protect userfault message. This is used only when
              both UFFDIO_REGISTER_MODE_MISSING and UFFDIO_REGISTER_MODE_WP
              modes are enabled for the registered range.

       The copy field is used by the kernel to return the number of bytes that
       were actually copied, or an error (a negated errno-style value). If the
       value returned in copy doesn't match the value that was specified in
       len, the operation fails with the error EAGAIN. The copy field is
       output-only; it is not read by the UFFDIO_COPY operation.
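
       Because a short copy is reported as EAGAIN with the number of bytes
       copied in the copy field, a monitor may choose to continue with the
       remainder. The following sketch does so; retrying after a partial copy
       is one possible policy rather than something the kernel requires, and
       the helper name uffd_copy_all() is hypothetical.

```c
#include <errno.h>
#include <linux/userfaultfd.h>
#include <stdint.h>
#include <sys/ioctl.h>

/* Copy 'len' bytes from 'src' into the registered range at 'dst'. If the
 * kernel reports a partial copy (the ioctl fails with EAGAIN and a positive
 * count in the copy field), continue with the part that remains. Returns
 * 0 once everything has been copied, or -1 with errno set otherwise. */
static int uffd_copy_all(int uffd, uint64_t dst, uint64_t src,
                         uint64_t len, uint64_t mode)
{
    while (len > 0) {
        struct uffdio_copy copy = {
            .dst = dst, .src = src, .len = len, .mode = mode, .copy = 0,
        };

        if (ioctl(uffd, UFFDIO_COPY, &copy) == 0)
            return 0;                /* the whole remaining area was copied */
        if (errno != EAGAIN || copy.copy <= 0)
            return -1;               /* real error, or no progress was made */

        dst += (uint64_t) copy.copy; /* skip the bytes already copied */
        src += (uint64_t) copy.copy;
        len -= (uint64_t) copy.copy;
    }
    return 0;
}
```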

       This ioctl(2) operation returns 0 on success. In this case, the entire
       area was copied. On error, -1 is returned and errno is set to indicate
       the error. Possible errors include:

       EAGAIN The number of bytes copied (i.e., the value returned in the copy
              field) does not equal the value that was specified in the len
              field.

       EINVAL Either dst or len was not a multiple of the system page size, or
              the range specified by src and len or dst and len was invalid.

       EINVAL An invalid bit was specified in the mode field.

       ENOENT (since Linux 4.11)
              The faulting process has changed its virtual memory layout
              simultaneously with an outstanding UFFDIO_COPY operation.

       ENOSPC (from Linux 4.11 until Linux 4.13)
              The faulting process has exited at the time of a UFFDIO_COPY
              operation.

       ESRCH (since Linux 4.13)
              The faulting process has exited at the time of a UFFDIO_COPY
              operation.

   UFFDIO_ZEROPAGE
       (Since Linux 4.3.) Zero out a memory range registered with
       userfaultfd.

       The requested range is specified by the range field of the
       uffdio_zeropage structure pointed to by argp:

           struct uffdio_zeropage {
               struct uffdio_range range;
               __u64 mode;     /* Flags controlling behavior of zeroing */
               __s64 zeropage; /* Number of bytes zeroed, or negated error */
           };

       The following value may be bitwise ORed in mode to change the behavior
       of the UFFDIO_ZEROPAGE operation:

       UFFDIO_ZEROPAGE_MODE_DONTWAKE
              Do not wake up the thread that waits for page-fault resolution.

       The zeropage field is used by the kernel to return the number of bytes
       that were actually zeroed, or an error in the same manner as
       UFFDIO_COPY. If the value returned in the zeropage field doesn't match
       the value that was specified in range.len, the operation fails with the
       error EAGAIN. The zeropage field is output-only; it is not read by the
       UFFDIO_ZEROPAGE operation.

       This ioctl(2) operation returns 0 on success. In this case, the entire
       area was zeroed. On error, -1 is returned and errno is set to indicate
       the error. Possible errors include:

       EAGAIN The number of bytes zeroed (i.e., the value returned in the
              zeropage field) does not equal the value that was specified in
              the range.len field.

       EINVAL Either range.start or range.len was not a multiple of the system
              page size; or range.len was zero; or the range specified was
              invalid.

       EINVAL An invalid bit was specified in the mode field.

       ESRCH (since Linux 4.13)
              The faulting process has exited at the time of a UFFDIO_ZEROPAGE
              operation.

   UFFDIO_WAKE
       (Since Linux 4.3.) Wake up the thread waiting for page-fault resolution
       on a specified memory address range.

       The UFFDIO_WAKE operation is used in conjunction with UFFDIO_COPY and
       UFFDIO_ZEROPAGE operations that have the UFFDIO_COPY_MODE_DONTWAKE or
       UFFDIO_ZEROPAGE_MODE_DONTWAKE bit set in the mode field. The userfault
       monitor can perform several UFFDIO_COPY and UFFDIO_ZEROPAGE operations
       in a batch and then explicitly wake up the faulting thread using
       UFFDIO_WAKE.

       The argp argument is a pointer to a uffdio_range structure (shown
       above) that specifies the address range.

       This ioctl(2) operation returns 0 on success. On error, -1 is returned
       and errno is set to indicate the error. Possible errors include:

       EINVAL The start or the len field of the uffdio_range structure was not
              a multiple of the system page size; or len was zero; or the
              specified range was otherwise invalid.
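
       The batch-then-wake pattern described above might look like the
       following sketch. It assumes npages adjacent, page-aligned pages
       inside one registered range, and for brevity it does not retry partial
       copies (EAGAIN). The helper name uffd_copy_batch_then_wake() is
       hypothetical.

```c
#include <errno.h>
#include <linux/userfaultfd.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>

/* Resolve 'npages' adjacent pages starting at 'dst' by copying from the
 * buffer 'src' with UFFDIO_COPY_MODE_DONTWAKE, then wake every thread
 * blocked on the whole range with a single UFFDIO_WAKE. 'page' is the
 * system page size. Returns 0 on success, or -1 with errno set. */
static int uffd_copy_batch_then_wake(int uffd, uint64_t dst, const void *src,
                                     size_t npages, uint64_t page)
{
    if (npages == 0)
        return 0;                 /* UFFDIO_WAKE rejects a zero length */

    for (size_t i = 0; i < npages; i++) {
        struct uffdio_copy copy = {
            .dst = dst + i * page,
            .src = (uint64_t) (uintptr_t) src + i * page,
            .len = page,
            .mode = UFFDIO_COPY_MODE_DONTWAKE,   /* defer the wakeup */
        };

        if (ioctl(uffd, UFFDIO_COPY, &copy) == -1)
            return -1;
    }

    struct uffdio_range range = { .start = dst, .len = npages * page };
    return ioctl(uffd, UFFDIO_WAKE, &range);
}
```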

   UFFDIO_WRITEPROTECT
       (Since Linux 5.7.) Write-protect or write-unprotect a memory range
       that has been registered with userfaultfd using
       UFFDIO_REGISTER_MODE_WP.

       The argp argument is a pointer to a uffdio_writeprotect structure, as
       shown below:

           struct uffdio_writeprotect {
               struct uffdio_range range; /* Range to change write permission */
               __u64 mode;                /* Mode to change write permission */
           };

       There are two mode bits that are supported in this structure:

       UFFDIO_WRITEPROTECT_MODE_WP
              When this mode bit is set, the ioctl will be a write-protect
              operation upon the memory range specified by range. Otherwise it
              will be a write-unprotect operation upon the specified range,
              which can be used to resolve a userfaultfd write-protect page
              fault.

       UFFDIO_WRITEPROTECT_MODE_DONTWAKE
              When this mode bit is set, do not wake up any thread that waits
              for page-fault resolution after the operation. This can be
              specified only if UFFDIO_WRITEPROTECT_MODE_WP is not specified.

       This ioctl(2) operation returns 0 on success. On error, -1 is returned
       and errno is set to indicate the error. Possible errors include:

       EINVAL The start or the len field of the uffdio_range structure was not
              a multiple of the system page size; or len was zero; or the
              specified range was otherwise invalid.

       EAGAIN The process was interrupted; retry this call.

       ENOENT The range specified in range is not valid. For example, the
              virtual address does not exist, or is not registered with
              userfaultfd write-protect mode.

       EFAULT Encountered a generic fault during processing.
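
       A sketch of a helper that write-protects or write-unprotects a range;
       unprotecting with wakeups enabled is how a write-protect fault would
       be resolved. The helper name uffd_writeprotect() is hypothetical, and
       its early EINVAL check only mirrors the restriction on
       UFFDIO_WRITEPROTECT_MODE_DONTWAKE described above.

```c
#include <errno.h>
#include <linux/userfaultfd.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/ioctl.h>

/* Write-protect (protect == true) or write-unprotect [start, start + len),
 * a range registered with UFFDIO_REGISTER_MODE_WP. 'dontwake' suppresses
 * waking blocked threads and is accepted only when unprotecting. Returns
 * 0 on success, or -1 with errno set. */
static int uffd_writeprotect(int uffd, uint64_t start, uint64_t len,
                             bool protect, bool dontwake)
{
    struct uffdio_writeprotect wp = {
        .range = { .start = start, .len = len },
        .mode = 0,
    };

    if (protect && dontwake) {
        errno = EINVAL;              /* DONTWAKE is not allowed with WP */
        return -1;
    }
    if (protect)
        wp.mode |= UFFDIO_WRITEPROTECT_MODE_WP;
    if (dontwake)
        wp.mode |= UFFDIO_WRITEPROTECT_MODE_DONTWAKE;

    return ioctl(uffd, UFFDIO_WRITEPROTECT, &wp);
}
```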

RETURN VALUE

       See the descriptions of the individual operations above.

ERRORS

       See the descriptions of the individual operations above. In addition,
       the following general errors can occur for all of the operations
       described above:

       EFAULT argp does not point to a valid memory address.

       EINVAL (For all operations except UFFDIO_API.) The userfaultfd object
              has not yet been enabled (via the UFFDIO_API operation).

CONFORMING TO

       These ioctl(2) operations are Linux-specific.

BUGS

       In order to detect available userfault features and enable some subset
       of those features, the userfaultfd file descriptor must be closed after
       the first UFFDIO_API operation that queries feature availability and
       reopened before the second UFFDIO_API operation that actually enables
       the desired features.
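
       The workaround can be sketched as follows: one descriptor is used only
       to query the available features and is then closed, and a second
       descriptor enables the supported subset of the wanted features. This
       is a minimal sketch; the helper names uffd_new() and
       uffd_open_supported() are hypothetical, and the raw syscall(2) call
       assumes no C library wrapper is used.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Create a new userfaultfd and perform UFFDIO_API with 'features'. On
 * success, return the fd and store the kernel's reported feature mask in
 * *available; on failure, return -1 with errno set. */
static int uffd_new(uint64_t features, uint64_t *available)
{
    struct uffdio_api api;
    int fd = (int) syscall(SYS_userfaultfd, O_CLOEXEC | O_NONBLOCK);

    if (fd == -1)
        return -1;
    memset(&api, 0, sizeof(api));
    api.api = UFFD_API;
    api.features = features;
    if (ioctl(fd, UFFDIO_API, &api) == -1) {
        int saved_errno = errno;
        close(fd);
        errno = saved_errno;
        return -1;
    }
    if (available != NULL)
        *available = api.features;
    return fd;
}

/* Enable the subset of 'wanted' that this kernel supports and store that
 * subset in *enabled. Returns the new fd, or -1 with errno set. */
static int uffd_open_supported(uint64_t wanted, uint64_t *enabled)
{
    uint64_t available = 0;
    int fd = uffd_new(0, &available);        /* query only */

    if (fd == -1)
        return -1;
    close(fd);                               /* UFFDIO_API cannot be redone */

    uint64_t subset = wanted & available;    /* drop unsupported bits */
    fd = uffd_new(subset, NULL);
    if (fd != -1 && enabled != NULL)
        *enabled = subset;
    return fd;
}
```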

EXAMPLES

       See userfaultfd(2).

SEE ALSO

       ioctl(2), mmap(2), userfaultfd(2)

       Documentation/admin-guide/mm/userfaultfd.rst in the Linux kernel source
       tree

COLOPHON

       This page is part of release 5.12 of the Linux man-pages project. A
       description of the project, information about reporting bugs, and the
       latest version of this page, can be found at
       https://www.kernel.org/doc/man-pages/.

Linux                             2021-03-22              IOCTL_USERFAULTFD(2)