IOCTL_USERFAULTFD(2)       Linux Programmer's Manual      IOCTL_USERFAULTFD(2)

NAME

       ioctl_userfaultfd - create a file descriptor for handling page faults
       in user space

SYNOPSIS

       #include <sys/ioctl.h>

       int ioctl(int fd, int cmd, ...);

DESCRIPTION

       Various ioctl(2) operations can be performed on a userfaultfd object
       (created by a call to userfaultfd(2)) using calls of the form:

           ioctl(fd, cmd, argp);

       In the above, fd is a file descriptor referring to a userfaultfd
       object, cmd is one of the commands listed below, and argp is a pointer
       to a data structure that is specific to cmd.

       The various ioctl(2) operations are described below.  The UFFDIO_API,
       UFFDIO_REGISTER, and UFFDIO_UNREGISTER operations are used to configure
       userfaultfd behavior.  These operations allow the caller to choose what
       features will be enabled and what kinds of events will be delivered to
       the application.  The remaining operations are range operations.  These
       operations enable the calling application to resolve page-fault events.

   UFFDIO_API
       (Since Linux 4.3.)  Enable operation of the userfaultfd and perform API
       handshake.

       The argp argument is a pointer to a uffdio_api structure, defined as:

           struct uffdio_api {
               __u64 api;        /* Requested API version (input) */
               __u64 features;   /* Requested features (input/output) */
               __u64 ioctls;     /* Available ioctl() operations (output) */
           };

       The api field denotes the API version requested by the application.

       The kernel verifies that it can support the requested API version, and
       sets the features and ioctls fields to bit masks representing all the
       available features and the generic ioctl(2) operations available.

       For Linux kernel versions before 4.11, the features field must be
       initialized to zero before the call to UFFDIO_API, and zero (i.e., no
       feature bits) is placed in the features field by the kernel upon return
       from ioctl(2).

       Starting from Linux 4.11, the features field can be used to ask whether
       particular features are supported and explicitly enable userfaultfd
       features that are disabled by default.  The kernel always reports all
       the available features in the features field.

       To enable userfaultfd features, the application should set a bit
       corresponding to each feature it wants to enable in the features field.
       If the kernel supports all the requested features, it will enable them.
       Otherwise, it will zero out the returned uffdio_api structure and return
       EINVAL.

       The following feature bits may be set:

       UFFD_FEATURE_EVENT_FORK (since Linux 4.11)
              When this feature is enabled, the userfaultfd objects associated
              with a parent process are duplicated into the child process
              during fork(2) and a UFFD_EVENT_FORK event is delivered to the
              userfaultfd monitor.

       UFFD_FEATURE_EVENT_REMAP (since Linux 4.11)
              If this feature is enabled, when the faulting process invokes
              mremap(2), the userfaultfd monitor will receive an event of type
              UFFD_EVENT_REMAP.

       UFFD_FEATURE_EVENT_REMOVE (since Linux 4.11)
              If this feature is enabled, when the faulting process calls
              madvise(2) with the MADV_DONTNEED or MADV_REMOVE advice value to
              free a virtual memory area, the userfaultfd monitor will receive
              an event of type UFFD_EVENT_REMOVE.

       UFFD_FEATURE_EVENT_UNMAP (since Linux 4.11)
              If this feature is enabled, when the faulting process unmaps
              virtual memory either explicitly with munmap(2), or implicitly
              during either mmap(2) or mremap(2), the userfaultfd monitor
              will receive an event of type UFFD_EVENT_UNMAP.

       UFFD_FEATURE_MISSING_HUGETLBFS (since Linux 4.11)
              If this feature bit is set, the kernel supports registering
              userfaultfd ranges on hugetlbfs virtual memory areas.

       UFFD_FEATURE_MISSING_SHMEM (since Linux 4.11)
              If this feature bit is set, the kernel supports registering
              userfaultfd ranges on shared memory areas.  This includes all
              kernel shared memory APIs: System V shared memory, tmpfs(5),
              shared mappings of /dev/zero, mmap(2) with the MAP_SHARED flag
              set, memfd_create(2), and so on.

       UFFD_FEATURE_SIGBUS (since Linux 4.14)
              If this feature bit is set, no page-fault events
              (UFFD_EVENT_PAGEFAULT) will be delivered.  Instead, a SIGBUS
              signal will be sent to the faulting process.  Applications using
              this feature will not require the use of a userfaultfd monitor
              for processing memory accesses to the regions registered with
              userfaultfd.

       The returned ioctls field can contain the following bits:

       1 << _UFFDIO_API
              The UFFDIO_API operation is supported.

       1 << _UFFDIO_REGISTER
              The UFFDIO_REGISTER operation is supported.

       1 << _UFFDIO_UNREGISTER
              The UFFDIO_UNREGISTER operation is supported.

       This ioctl(2) operation returns 0 on success.  On error, -1 is returned
       and errno is set to indicate the cause of the error.  Possible errors
       include:

       EFAULT argp refers to an address that is outside the calling process's
              accessible address space.

       EINVAL The userfaultfd has already been enabled by a previous
              UFFDIO_API operation.

       EINVAL The API version requested in the api field is not supported by
              this kernel, or the features field passed to the kernel includes
              feature bits that are not supported by the current kernel
              version.

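       The handshake described above might be performed as in the following
       sketch.  The helper names uffd_api_prepare() and uffd_api_handshake()
       are hypothetical, and uffd is assumed to be a file descriptor obtained
       from userfaultfd(2) on which UFFDIO_API has not yet been called.

```c
#include <assert.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>

/* Hypothetical helper: fill in a UFFDIO_API request.  Pass 0 for
   features on kernels before Linux 4.11. */
void uffd_api_prepare(struct uffdio_api *api, __u64 features)
{
    assert(api != NULL);
    memset(api, 0, sizeof(*api));
    api->api = UFFD_API;        /* API version from <linux/userfaultfd.h> */
    api->features = features;   /* feature bits the caller wants enabled */
}

/* Hypothetical helper: perform the handshake on an existing userfaultfd
   file descriptor.  Returns 0 on success, or -1 with errno set by
   ioctl(2).  On success, api->features and api->ioctls hold the bit
   masks reported by the kernel. */
int uffd_api_handshake(int uffd, struct uffdio_api *api, __u64 features)
{
    uffd_api_prepare(api, features);
    return ioctl(uffd, UFFDIO_API, api);
}
```

       A caller would typically treat EINVAL from this call as meaning that
       either the API version or one of the requested feature bits is not
       supported, or that the descriptor has already been enabled.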
   UFFDIO_REGISTER
       (Since Linux 4.3.)  Register a memory address range with the
       userfaultfd object.  The pages in the range must be "compatible".

       Before Linux 4.11, only private anonymous ranges are compatible for
       registering with UFFDIO_REGISTER.

       Since Linux 4.11, hugetlbfs and shared memory ranges are also
       compatible with UFFDIO_REGISTER.

       The argp argument is a pointer to a uffdio_register structure, defined
       as:

           struct uffdio_range {
               __u64 start;    /* Start of range */
               __u64 len;      /* Length of range (bytes) */
           };

           struct uffdio_register {
               struct uffdio_range range;
               __u64 mode;     /* Desired mode of operation (input) */
               __u64 ioctls;   /* Available ioctl() operations (output) */
           };

       The range field defines a memory range starting at start and continuing
       for len bytes that should be handled by the userfaultfd.

       The mode field defines the mode of operation desired for this memory
       region.  The following values may be bitwise ORed to set the
       userfaultfd mode for the specified range:

       UFFDIO_REGISTER_MODE_MISSING
              Track page faults on missing pages.

       UFFDIO_REGISTER_MODE_WP
              Track page faults on write-protected pages.

       Currently, the only supported mode is UFFDIO_REGISTER_MODE_MISSING.

       If the operation is successful, the kernel modifies the ioctls bit-mask
       field to indicate which ioctl(2) operations are available for the
       specified range.  This returned bit mask is as for UFFDIO_API.

       This ioctl(2) operation returns 0 on success.  On error, -1 is returned
       and errno is set to indicate the cause of the error.  Possible errors
       include:

       EBUSY  A mapping in the specified range is registered with another
              userfaultfd object.

       EFAULT argp refers to an address that is outside the calling process's
              accessible address space.

       EINVAL An invalid or unsupported bit was specified in the mode field;
              or the mode field was zero.

       EINVAL There is no mapping in the specified address range.

       EINVAL range.start or range.len is not a multiple of the system page
              size; or, range.len is zero; or these fields are otherwise
              invalid.

       EINVAL There is an incompatible mapping in the specified address range.

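       A registration in UFFDIO_REGISTER_MODE_MISSING mode could look like the
       following sketch.  The helper names uffd_register_prepare() and
       uffd_register_missing() are hypothetical.  The alignment check mirrors
       the EINVAL conditions listed above so that obviously bad ranges are
       rejected before the kernel is called; it is an illustrative
       convenience, not something the interface requires of the caller.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/userfaultfd.h>

/* Hypothetical helper: fill in a registration request for missing-page
   tracking.  Returns 0, or -1 with errno set to EINVAL if the range is
   empty or not page-aligned. */
int uffd_register_prepare(struct uffdio_register *reg, void *addr,
                          size_t len, size_t page_size)
{
    assert(reg != NULL && page_size != 0);
    if (len == 0 || (uintptr_t) addr % page_size != 0 ||
        len % page_size != 0) {
        errno = EINVAL;
        return -1;
    }
    memset(reg, 0, sizeof(*reg));
    reg->range.start = (uintptr_t) addr;
    reg->range.len = len;
    reg->mode = UFFDIO_REGISTER_MODE_MISSING;
    return 0;
}

/* Hypothetical helper: register [addr, addr + len) with uffd.  On
   success, returns 0 and, if ioctls is not NULL, stores the bit mask
   of operations available for the range. */
int uffd_register_missing(int uffd, void *addr, size_t len, __u64 *ioctls)
{
    struct uffdio_register reg;
    long page_size = sysconf(_SC_PAGESIZE);

    if (page_size <= 0)
        return -1;
    if (uffd_register_prepare(&reg, addr, len, (size_t) page_size) == -1)
        return -1;
    if (ioctl(uffd, UFFDIO_REGISTER, &reg) == -1)
        return -1;
    if (ioctls != NULL)
        *ioctls = reg.ioctls;
    return 0;
}
```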
   UFFDIO_UNREGISTER
       (Since Linux 4.3.)  Unregister a memory address range from userfaultfd.
       The pages in the range must be "compatible" (see the description of
       UFFDIO_REGISTER).

       The address range to unregister is specified in the uffdio_range
       structure pointed to by argp.

       This ioctl(2) operation returns 0 on success.  On error, -1 is returned
       and errno is set to indicate the cause of the error.  Possible errors
       include:

       EINVAL Either the start or the len field of the uffdio_range structure
              was not a multiple of the system page size; or the len field was
              zero; or these fields were otherwise invalid.

       EINVAL There was an incompatible mapping in the specified address range.

       EINVAL There was no mapping in the specified address range.

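       Unregistering takes only a uffdio_range, as in the following sketch.
       The helper name uffd_unregister() is hypothetical; addr and len are
       assumed to describe a page-aligned range that was previously
       registered with the same userfaultfd object.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>

/* Hypothetical helper: unregister [addr, addr + len) from uffd.
   Returns 0 on success, or -1 with errno set by ioctl(2). */
int uffd_unregister(int uffd, void *addr, size_t len)
{
    struct uffdio_range range;

    assert(addr != NULL);
    range.start = (uintptr_t) addr;
    range.len = len;
    return ioctl(uffd, UFFDIO_UNREGISTER, &range);
}
```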
   UFFDIO_COPY
       (Since Linux 4.3.)  Atomically copy a contiguous memory chunk into the
       userfault registered range and optionally wake up the blocked thread.
       The source and destination addresses and the number of bytes to copy
       are specified by the src, dst, and len fields of the uffdio_copy
       structure pointed to by argp:

           struct uffdio_copy {
               __u64 dst;    /* Destination of copy */
               __u64 src;    /* Source of copy */
               __u64 len;    /* Number of bytes to copy */
               __u64 mode;   /* Flags controlling behavior of copy */
               __s64 copy;   /* Number of bytes copied, or negated error */
           };

       The following value may be bitwise ORed in mode to change the behavior
       of the UFFDIO_COPY operation:

       UFFDIO_COPY_MODE_DONTWAKE
              Do not wake up the thread that waits for page-fault resolution.

       The copy field is used by the kernel to return the number of bytes that
       was actually copied, or an error (a negated errno-style value).  If the
       value returned in copy doesn't match the value that was specified in
       len, the operation fails with the error EAGAIN.  The copy field is
       output-only; it is not read by the UFFDIO_COPY operation.

       This ioctl(2) operation returns 0 on success.  In this case, the entire
       area was copied.  On error, -1 is returned and errno is set to indicate
       the cause of the error.  Possible errors include:

       EAGAIN The number of bytes copied (i.e., the value returned in the copy
              field) does not equal the value that was specified in the len
              field.

       EINVAL Either dst or len was not a multiple of the system page size, or
              the range specified by src and len or dst and len was invalid.

       EINVAL An invalid bit was specified in the mode field.

       ENOENT (since Linux 4.11)
              The faulting process has changed its virtual memory layout
              simultaneously with an outstanding UFFDIO_COPY operation.

       ENOSPC (from Linux 4.11 until Linux 4.13)
              The faulting process has exited at the time of a UFFDIO_COPY
              operation.

       ESRCH (since Linux 4.13)
              The faulting process has exited at the time of a UFFDIO_COPY
              operation.

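       One possible way to handle the EAGAIN case is to retry from the point
       the kernel reached, as in the following sketch.  The helper name
       uffd_copy_all() is hypothetical, and the retry policy is an assumption
       made for illustration: it treats a positive value in the copy field as
       the number of bytes already copied and gives up on any other error.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>

/* Hypothetical helper: copy len bytes from src into the registered
   range at dst, resuming after a partial copy.  Returns 0 once the
   whole area has been copied, or -1 with errno set. */
int uffd_copy_all(int uffd, void *dst, const void *src, size_t len,
                  __u64 mode)
{
    size_t done = 0;

    assert(dst != NULL && src != NULL);
    while (done < len) {
        struct uffdio_copy copy;

        memset(&copy, 0, sizeof(copy));
        copy.dst = (uintptr_t) dst + done;
        copy.src = (uintptr_t) src + done;
        copy.len = len - done;
        copy.mode = mode;       /* 0 or UFFDIO_COPY_MODE_DONTWAKE */

        if (ioctl(uffd, UFFDIO_COPY, &copy) == 0)
            return 0;               /* entire remaining area was copied */
        if (errno != EAGAIN || copy.copy <= 0)
            return -1;              /* hard error, or no progress made */
        done += (size_t) copy.copy; /* partial copy: continue after it */
    }
    return 0;
}
```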
   UFFDIO_ZEROPAGE
       (Since Linux 4.3.)  Zero out a memory range registered with
       userfaultfd.

       The requested range is specified by the range field of the
       uffdio_zeropage structure pointed to by argp:

           struct uffdio_zeropage {
               struct uffdio_range range;
               __u64 mode;     /* Flags controlling behavior of the operation */
               __s64 zeropage; /* Number of bytes zeroed, or negated error */
           };

       The following value may be bitwise ORed in mode to change the behavior
       of the UFFDIO_ZEROPAGE operation:

       UFFDIO_ZEROPAGE_MODE_DONTWAKE
              Do not wake up the thread that waits for page-fault resolution.

       The zeropage field is used by the kernel to return the number of bytes
       that was actually zeroed, or an error in the same manner as
       UFFDIO_COPY.  If the value returned in the zeropage field doesn't match
       the value that was specified in range.len, the operation fails with the
       error EAGAIN.  The zeropage field is output-only; it is not read by the
       UFFDIO_ZEROPAGE operation.

       This ioctl(2) operation returns 0 on success.  In this case, the entire
       area was zeroed.  On error, -1 is returned and errno is set to indicate
       the cause of the error.  Possible errors include:

       EAGAIN The number of bytes zeroed (i.e., the value returned in the
              zeropage field) does not equal the value that was specified in
              the range.len field.

       EINVAL Either range.start or range.len was not a multiple of the system
              page size; or range.len was zero; or the range specified was
              invalid.

       EINVAL An invalid bit was specified in the mode field.

       ESRCH (since Linux 4.13)
              The faulting process has exited at the time of a UFFDIO_ZEROPAGE
              operation.

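       A single UFFDIO_ZEROPAGE call might be wrapped as in the following
       sketch.  The helper name uffd_zero_range() is hypothetical; it passes
       the value of the zeropage field back to the caller so that a partial
       result reported with EAGAIN can be inspected.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>

/* Hypothetical helper: zero [addr, addr + len) in a registered range.
   Returns the result of ioctl(2).  If zeroed is not NULL, it receives
   the zeropage field: the number of bytes zeroed, or a negated error. */
int uffd_zero_range(int uffd, void *addr, size_t len, __s64 *zeroed)
{
    struct uffdio_zeropage zp;
    int ret;

    assert(addr != NULL);
    memset(&zp, 0, sizeof(zp));
    zp.range.start = (uintptr_t) addr;
    zp.range.len = len;
    zp.mode = 0;        /* wake the faulting thread when done */

    ret = ioctl(uffd, UFFDIO_ZEROPAGE, &zp);
    if (zeroed != NULL)
        *zeroed = zp.zeropage;
    return ret;
}
```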
   UFFDIO_WAKE
       (Since Linux 4.3.)  Wake up the thread waiting for page-fault
       resolution on a specified memory address range.

       The UFFDIO_WAKE operation is used in conjunction with UFFDIO_COPY and
       UFFDIO_ZEROPAGE operations that have the UFFDIO_COPY_MODE_DONTWAKE or
       UFFDIO_ZEROPAGE_MODE_DONTWAKE bit set in the mode field.  The userfault
       monitor can perform several UFFDIO_COPY and UFFDIO_ZEROPAGE operations
       in a batch and then explicitly wake up the faulting thread using
       UFFDIO_WAKE.

       The argp argument is a pointer to a uffdio_range structure (shown
       above) that specifies the address range.

       This ioctl(2) operation returns 0 on success.  On error, -1 is returned
       and errno is set to indicate the cause of the error.  Possible errors
       include:

       EINVAL The start or the len field of the uffdio_range structure was not
              a multiple of the system page size; or len was zero; or the
              specified range was otherwise invalid.
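       The batching pattern described above could be sketched as follows.
       The helper name uffd_copy_batch_then_wake() is hypothetical, and the
       page-by-page loop is only one possible way to split the work; the
       point is that each copy sets UFFDIO_COPY_MODE_DONTWAKE and a single
       UFFDIO_WAKE covers the whole range afterward.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>

/* Hypothetical helper: copy npages pages from src to dst without waking
   the faulting thread, then wake it once for the whole range.  Returns
   0 on success, or -1 with errno set by the failing ioctl(2). */
int uffd_copy_batch_then_wake(int uffd, void *dst, const void *src,
                              size_t page_size, size_t npages)
{
    struct uffdio_range range;
    size_t i;

    assert(dst != NULL && src != NULL && page_size != 0);
    for (i = 0; i < npages; i++) {
        struct uffdio_copy copy;

        memset(&copy, 0, sizeof(copy));
        copy.dst = (uintptr_t) dst + i * page_size;
        copy.src = (uintptr_t) src + i * page_size;
        copy.len = page_size;
        copy.mode = UFFDIO_COPY_MODE_DONTWAKE;
        if (ioctl(uffd, UFFDIO_COPY, &copy) == -1)
            return -1;
    }

    range.start = (uintptr_t) dst;
    range.len = npages * page_size;
    return ioctl(uffd, UFFDIO_WAKE, &range);
}
```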

RETURN VALUE

       See descriptions of the individual operations, above.

ERRORS

       See descriptions of the individual operations, above.  In addition, the
       following general errors can occur for all of the operations described
       above:

       EFAULT argp does not point to a valid memory address.

       EINVAL (For all operations except UFFDIO_API.)  The userfaultfd object
              has not yet been enabled (via the UFFDIO_API operation).

CONFORMING TO

       These ioctl(2) operations are Linux-specific.

BUGS

       In order to detect available userfault features and enable some subset
       of those features, the userfaultfd file descriptor must be closed after
       the first UFFDIO_API operation that queries feature availability and
       reopened before the second UFFDIO_API operation that actually enables
       the desired features.
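       The query, close, reopen, and enable sequence described above might be
       written as in the following sketch.  The helper names uffd_has_all(),
       uffd_open_api(), and uffd_open_with_features() are hypothetical.  The
       descriptor is assumed to be created with the raw system call, using
       the O_CLOEXEC and O_NONBLOCK flags; see userfaultfd(2).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/userfaultfd.h>

/* Hypothetical helper: true if every bit in wanted is set in available. */
bool uffd_has_all(__u64 available, __u64 wanted)
{
    return (available & wanted) == wanted;
}

/* Hypothetical helper: create a userfaultfd and run UFFDIO_API with the
   given feature bits.  Returns the descriptor, or -1 with errno set. */
static int uffd_open_api(__u64 features, struct uffdio_api *api)
{
    int fd;

    assert(api != NULL);
    fd = (int) syscall(SYS_userfaultfd, O_CLOEXEC | O_NONBLOCK);
    if (fd == -1)
        return -1;
    memset(api, 0, sizeof(*api));
    api->api = UFFD_API;
    api->features = features;
    if (ioctl(fd, UFFDIO_API, api) == -1) {
        int saved = errno;  /* keep the ioctl(2) error across close(2) */

        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}

/* Hypothetical helper: return a userfaultfd with the wanted features
   enabled, or -1 with errno set (EINVAL if a feature is unavailable). */
int uffd_open_with_features(__u64 wanted)
{
    struct uffdio_api api;
    int fd;

    /* First descriptor: query the available features, then close it. */
    fd = uffd_open_api(0, &api);
    if (fd == -1)
        return -1;
    close(fd);
    if (!uffd_has_all(api.features, wanted)) {
        errno = EINVAL;
        return -1;
    }

    /* Second descriptor: enable the features that were asked for. */
    return uffd_open_api(wanted, &api);
}
```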

EXAMPLES

       See userfaultfd(2).

SEE ALSO

       ioctl(2), mmap(2), userfaultfd(2)

       Documentation/admin-guide/mm/userfaultfd.rst in the Linux kernel source
       tree

COLOPHON

       This page is part of release 5.10 of the Linux man-pages project.  A
       description of the project, information about reporting bugs, and the
       latest version of this page, can be found at
       https://www.kernel.org/doc/man-pages/.

Linux                             2020-06-09              IOCTL_USERFAULTFD(2)