IO_URING_ENTER(2)          Linux Programmer's Manual         IO_URING_ENTER(2)

NAME

       io_uring_enter - initiate and/or complete asynchronous I/O

SYNOPSIS

       #include <linux/io_uring.h>

       int io_uring_enter(unsigned int fd, unsigned int to_submit,
                          unsigned int min_complete, unsigned int flags,
                          sigset_t *sig);

DESCRIPTION

       io_uring_enter() is used to initiate and complete I/O using the shared
       submission and completion queues set up by a call to io_uring_setup(2).
       A single call can both submit new I/O and wait for completions of I/O
       initiated by this call or previous calls to io_uring_enter().

       fd is the file descriptor returned by io_uring_setup(2).  to_submit
       specifies the number of I/Os to submit from the submission queue.
       flags is a bitmask of the following values:

       IORING_ENTER_GETEVENTS
              If this flag is set, then the system call will wait for the
              number of events specified in min_complete before returning.
              This flag can be set along with to_submit to both submit and
              complete events in a single system call.

       IORING_ENTER_SQ_WAKEUP
              If the ring has been created with IORING_SETUP_SQPOLL, then this
              flag asks the kernel to wake up the SQ kernel thread to submit
              IO.

       IORING_ENTER_SQ_WAIT
              If the ring has been created with IORING_SETUP_SQPOLL, then the
              application has no real insight into when the SQ kernel thread
              has consumed entries from the SQ ring.  This can lead to a
              situation where the application can no longer get a free SQE
              entry to submit and does not know when one will become available
              as the SQ kernel thread consumes them.  If the system call is
              used with this flag set, then it will wait until at least one
              entry is free in the SQ ring.

       If the io_uring instance was configured for polling, by specifying
       IORING_SETUP_IOPOLL in the call to io_uring_setup(2), then min_complete
       has a slightly different meaning.  Passing a value of 0 instructs the
       kernel to return any events which are already complete, without
       blocking.  If min_complete is a non-zero value, the kernel will still
       return immediately if any completion events are available.  If no
       event completions are available, then the call will poll either until
       one or more completions become available, or until the process has
       exceeded its scheduler time slice.

       Note that, for interrupt driven I/O (where IORING_SETUP_IOPOLL was not
       specified in the call to io_uring_setup(2)), an application may check
       the completion queue for event completions without entering the kernel
       at all.
60
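       A minimal sketch of such a check, assuming the application has already
       mapped the completion ring and holds pointers to its head and tail
       indices.  The helper name cq_ready and the parameter names khead and
       ktail are illustrative only; the mapping itself is not shown here.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Number of completion events that are ready to be reaped.
 * khead and ktail are assumed to point at the head and tail indices of
 * the mapped completion ring (hypothetical names).  The kernel advances
 * the tail; the application advances the head.
 */
static unsigned cq_ready(const unsigned *khead, const unsigned *ktail)
{
    assert(khead != NULL && ktail != NULL);

    /* Acquire load: once the new tail value is observed, the CQE
     * contents written before it are visible as well. */
    unsigned tail = __atomic_load_n(ktail, __ATOMIC_ACQUIRE);
    unsigned head = *khead;   /* only the application writes the head */

    /* Unsigned subtraction stays correct across index wrap-around. */
    return tail - head;
}
```

       A non-zero return value means completions can be consumed directly
       from the ring, with no system call involved.
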
       When the system call returns that a certain number of SQEs have been
       consumed and submitted, it is safe to reuse SQE entries in the ring.
       This is true even if the actual IO submission had to be punted to async
       context, which means that the SQE may in fact not have been submitted
       yet.  If the kernel requires later use of a particular SQE entry, it
       will have made a private copy of it.

       sig is a pointer to a signal mask (see sigprocmask(2)); if sig is not
       NULL, io_uring_enter() first replaces the current signal mask by the
       one pointed to by sig, then waits for events to become available in the
       completion queue, and then restores the original signal mask.  The
       following io_uring_enter() call:

           ret = io_uring_enter(fd, 0, 1, IORING_ENTER_GETEVENTS, &sig);

       is equivalent to atomically executing the following calls:

           pthread_sigmask(SIG_SETMASK, &sig, &orig);
           ret = io_uring_enter(fd, 0, 1, IORING_ENTER_GETEVENTS, NULL);
           pthread_sigmask(SIG_SETMASK, &orig, NULL);

       See the description of pselect(2) for an explanation of why the sig
       parameter is necessary.
84
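       The C library may not provide a wrapper for this system call.  The
       following is a minimal sketch of one, assuming that
       __NR_io_uring_enter is available from <sys/syscall.h>.  The names
       uring_enter and wait_one_with_mask are hypothetical.  The raw system
       call takes the size of the signal mask as a sixth argument; passing
       _NSIG / 8 is an assumption that mirrors what liburing does.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/io_uring.h>

/* Thin wrapper around the raw system call (hypothetical helper name). */
static int uring_enter(unsigned int fd, unsigned int to_submit,
                       unsigned int min_complete, unsigned int flags,
                       sigset_t *sig)
{
    /* Sixth argument: size of the signal mask in bytes. */
    return (int) syscall(__NR_io_uring_enter, fd, to_submit,
                         min_complete, flags, sig, _NSIG / 8);
}

/*
 * Submit nothing and wait for at least one completion while the signal
 * mask pointed to by mask is temporarily installed.
 */
static int wait_one_with_mask(unsigned int ring_fd, sigset_t *mask)
{
    assert(mask != NULL);
    return uring_enter(ring_fd, 0, 1, IORING_ENTER_GETEVENTS, mask);
}
```

       Both helpers return -1 and set errno on failure, for example when the
       file descriptor does not refer to an io_uring instance.
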
       Submission queue entries are represented using the following data
       structure:

           /*
            * IO submission data structure (Submission Queue Entry)
            */
           struct io_uring_sqe {
               __u8    opcode;         /* type of operation for this sqe */
               __u8    flags;          /* IOSQE_ flags */
               __u16   ioprio;         /* ioprio for the request */
               __s32   fd;             /* file descriptor to do IO on */
               union {
                   __u64   off;            /* offset into file */
                   __u64   addr2;
               };
               union {
                   __u64   addr;           /* pointer to buffer or iovecs */
                   __u64   splice_off_in;
               };
               __u32   len;            /* buffer size or number of iovecs */
               union {
                   __kernel_rwf_t  rw_flags;
                   __u32   fsync_flags;
                   __u16   poll_events;    /* compatibility */
                   __u32   poll32_events;  /* word-reversed for BE */
                   __u32   sync_range_flags;
                   __u32   msg_flags;
                   __u32   timeout_flags;
                   __u32   accept_flags;
                   __u32   cancel_flags;
                   __u32   open_flags;
                   __u32   statx_flags;
                   __u32   fadvise_advice;
                   __u32   splice_flags;
               };
               __u64   user_data;      /* data to be passed back at completion time */
               union {
                   struct {
                       union {
                           /* index into fixed buffers, if used */
                           __u16   buf_index;
                           /* for grouped buffer selection */
                           __u16   buf_group;
                       };
                       /* personality to use, if used */
                       __u16   personality;
                       __s32   splice_fd_in;
                   };
                   __u64   __pad2[3];
               };
           };

       The opcode describes the operation to be performed.  It can be one of:

       IORING_OP_NOP
              Do not perform any I/O.  This is useful for testing the
              performance of the io_uring implementation itself.

       IORING_OP_READV

       IORING_OP_WRITEV
              Vectored read and write operations, similar to preadv2(2) and
              pwritev2(2).

       IORING_OP_READ_FIXED

       IORING_OP_WRITE_FIXED
              Read from or write to pre-mapped buffers.  See
              io_uring_register(2) for details on how to set up a context for
              fixed reads and writes.

       IORING_OP_FSYNC
              File sync.  See also fsync(2).  Note that, while I/O is
              initiated in the order in which it appears in the submission
              queue, completions are unordered.  For example, an application
              which places a write I/O followed by an fsync in the submission
              queue cannot expect the fsync to apply to the write.  The two
              operations execute in parallel, so the fsync may complete before
              the write is issued to the storage.  The same is also true for
              previously issued writes that have not completed prior to the
              fsync.

       IORING_OP_POLL_ADD
              Poll the fd specified in the submission queue entry for the
              events specified in the poll_events field.  Unlike poll or epoll
              without EPOLLONESHOT, this interface always works in one shot
              mode.  That is, once the poll operation is completed, it will
              have to be resubmitted.  This command works like an async
              poll(2) and the completion event result is the returned mask of
              events.

       IORING_OP_POLL_REMOVE
              Remove an existing poll request.  If found, the res field of the
              struct io_uring_cqe will contain 0.  If not found, res will
              contain -ENOENT.

       IORING_OP_EPOLL_CTL
              Add, remove or modify entries in the interest list of epoll(7).
              See epoll_ctl(2) for details of the system call.  fd holds the
              file descriptor that represents the epoll instance, off holds
              the file descriptor to add, remove or modify, len holds the
              operation (EPOLL_CTL_ADD, EPOLL_CTL_DEL, EPOLL_CTL_MOD) to
              perform, and addr holds a pointer to the epoll_event structure.
              Available since 5.6.

       IORING_OP_SYNC_FILE_RANGE
              Issue the equivalent of a sync_file_range(2) on the file
              descriptor.  The fd field is the file descriptor to sync, the
              off field holds the offset in bytes, the len field holds the
              length in bytes, and the sync_range_flags field holds the flags
              for the command.  See also sync_file_range(2) for the general
              description of the related system call.  Available since 5.2.

       IORING_OP_SENDMSG
              Issue the equivalent of a sendmsg(2) system call.  fd must be
              set to the socket file descriptor, addr must contain a pointer
              to the msghdr structure, and msg_flags holds the flags
              associated with the system call.  See also sendmsg(2) for the
              general description of the related system call.  Available since
              5.3.

       IORING_OP_RECVMSG
              Works just like IORING_OP_SENDMSG, except for recvmsg(2)
              instead.  See the description of IORING_OP_SENDMSG.  Available
              since 5.3.

       IORING_OP_SEND
              Issue the equivalent of a send(2) system call.  fd must be set
              to the socket file descriptor, addr must contain a pointer to
              the buffer, len denotes the length of the buffer to send, and
              msg_flags holds the flags associated with the system call.  See
              also send(2) for the general description of the related system
              call.  Available since 5.6.

       IORING_OP_RECV
              Works just like IORING_OP_SEND, except for recv(2) instead.  See
              the description of IORING_OP_SEND.  Available since 5.6.

       IORING_OP_TIMEOUT
              This command will register a timeout operation.  The addr field
              must contain a pointer to a struct timespec64 structure, len
              must contain 1 to signify one timespec64 structure, and
              timeout_flags may contain IORING_TIMEOUT_ABS for an absolute
              timeout value, or 0 for a relative timeout.  off may contain a
              completion event count.  A timeout will trigger a wakeup event
              on the completion ring for anyone waiting for events.  A timeout
              condition is met when either the specified timeout expires, or
              the specified number of events have completed.  Either condition
              will trigger the event.  If off is set to 0, completed events
              are not counted, which effectively acts like a timer.  io_uring
              timeouts use the CLOCK_MONOTONIC clock source.  The request will
              complete with -ETIME if the timeout got completed through
              expiration of the timer, or 0 if the timeout got completed
              through requests completing on their own.  If the timeout was
              cancelled before it expired, the request will complete with
              -ECANCELED.  Available since 5.4.

       IORING_OP_TIMEOUT_REMOVE
              If timeout_flags are zero, then it attempts to remove an
              existing timeout operation.  addr must contain the user_data
              field of the previously issued timeout operation.  If the
              specified timeout request is found and cancelled successfully,
              this request will terminate with a result value of 0.  If the
              timeout request was found but expiration was already in
              progress, this request will terminate with a result value of
              -EBUSY.  If the timeout request wasn't found, the request will
              terminate with a result value of -ENOENT.  Available since 5.5.

              If timeout_flags contain IORING_TIMEOUT_UPDATE, instead of
              removing an existing operation, it updates it.  addr and return
              values are the same as before.  The addr2 field must contain a
              pointer to a struct timespec64 structure.  timeout_flags may
              also contain IORING_TIMEOUT_ABS.  Available since 5.11.

       IORING_OP_ACCEPT
              Issue the equivalent of an accept4(2) system call.  fd must be
              set to the socket file descriptor, addr must contain the pointer
              to the sockaddr structure, and addr2 must contain a pointer to
              the socklen_t addrlen field.  See also accept4(2) for the
              general description of the related system call.  Available since
              5.5.

       IORING_OP_ASYNC_CANCEL
              Attempt to cancel an already issued request.  addr must contain
              the user_data field of the request that should be cancelled.
              The cancellation request will complete with one of the following
              result codes.  If found, the res field of the cqe will contain
              0.  If not found, res will contain -ENOENT.  If found and
              cancellation was attempted, the res field will contain
              -EALREADY.  In this case, the request may or may not terminate.
              In general, requests that are interruptible (like socket IO)
              will get cancelled, while disk IO requests cannot be cancelled
              if already started.  Available since 5.5.

       IORING_OP_LINK_TIMEOUT
              This request must be linked with another request through
              IOSQE_IO_LINK, which is described below.  Unlike
              IORING_OP_TIMEOUT, IORING_OP_LINK_TIMEOUT acts on the linked
              request, not the completion queue.  The format of the command is
              otherwise like IORING_OP_TIMEOUT, except there's no completion
              event count as it's tied to a specific request.  If used, the
              timeout specified in the command will cancel the linked command,
              unless the linked command completes before the timeout.  The
              timeout will complete with -ETIME if the timer expired and
              cancellation of the linked request was attempted, or -ECANCELED
              if the timer got cancelled because of completion of the linked
              request.  Like IORING_OP_TIMEOUT, the clock source used is
              CLOCK_MONOTONIC.  Available since 5.5.

       IORING_OP_CONNECT
              Issue the equivalent of a connect(2) system call.  fd must be
              set to the socket file descriptor, addr must contain the const
              pointer to the sockaddr structure, and off must contain the
              socklen_t addrlen field.  See also connect(2) for the general
              description of the related system call.  Available since 5.5.

       IORING_OP_FALLOCATE
              Issue the equivalent of a fallocate(2) system call.  fd must be
              set to the file descriptor, len must contain the mode associated
              with the operation, off must contain the offset on which to
              operate, and addr must contain the length.  See also
              fallocate(2) for the general description of the related system
              call.  Available since 5.6.

       IORING_OP_FADVISE
              Issue the equivalent of a posix_fadvise(2) system call.  fd must
              be set to the file descriptor, off must contain the offset on
              which to operate, len must contain the length, and
              fadvise_advice must contain the advice associated with the
              operation.  See also posix_fadvise(2) for the general
              description of the related system call.  Available since 5.6.

       IORING_OP_MADVISE
              Issue the equivalent of a madvise(2) system call.  addr must
              contain the address to operate on, len must contain the length
              on which to operate, and fadvise_advice must contain the advice
              associated with the operation.  See also madvise(2) for the
              general description of the related system call.  Available since
              5.6.

       IORING_OP_OPENAT
              Issue the equivalent of an openat(2) system call.  fd is the
              dirfd argument, addr must contain a pointer to the *pathname
              argument, open_flags should contain any flags passed in, and len
              is the access mode of the file.  See also openat(2) for the
              general description of the related system call.  Available since
              5.6.

       IORING_OP_OPENAT2
              Issue the equivalent of an openat2(2) system call.  fd is the
              dirfd argument, addr must contain a pointer to the *pathname
              argument, len should contain the size of the open_how structure,
              and off should be set to the address of the open_how structure.
              See also openat2(2) for the general description of the related
              system call.  Available since 5.6.

       IORING_OP_CLOSE
              Issue the equivalent of a close(2) system call.  fd is the file
              descriptor to be closed.  See also close(2) for the general
              description of the related system call.  Available since 5.6.

       IORING_OP_STATX
              Issue the equivalent of a statx(2) system call.  fd is the dirfd
              argument, addr must contain a pointer to the *pathname string,
              statx_flags is the flags argument, len should be the mask
              argument, and off must contain a pointer to the statxbuf to be
              filled in.  See also statx(2) for the general description of the
              related system call.  Available since 5.6.

       IORING_OP_READ

       IORING_OP_WRITE
              Issue the equivalent of a read(2) or write(2) system call.  fd
              is the file descriptor to be operated on, addr contains the
              buffer in question, and len contains the length of the IO
              operation.  These are non-vectored versions of the
              IORING_OP_READV and IORING_OP_WRITEV opcodes.  See also read(2)
              and write(2) for the general description of the related system
              call.  Available since 5.6.

       IORING_OP_SPLICE
              Issue the equivalent of a splice(2) system call.  splice_fd_in
              is the file descriptor to read from, splice_off_in is the offset
              to read from, fd is the file descriptor to write to, and off is
              the offset at which to start writing.  A sentinel value of -1 is
              used to pass the equivalent of a NULL for the offsets to
              splice(2).  len contains the number of bytes to copy.
              splice_flags contains a bit mask for the flag field associated
              with the system call.  Please note that one of the file
              descriptors must refer to a pipe.  See also splice(2) for the
              general description of the related system call.  Available since
              5.7.

       IORING_OP_TEE
              Issue the equivalent of a tee(2) system call.  splice_fd_in is
              the file descriptor to read from, fd is the file descriptor to
              write to, len contains the number of bytes to copy, and
              splice_flags contains a bit mask for the flag field associated
              with the system call.  Please note that both of the file
              descriptors must refer to a pipe.  See also tee(2) for the
              general description of the related system call.  Available since
              5.8.

       IORING_OP_FILES_UPDATE
              This command is an alternative to using
              IORING_REGISTER_FILES_UPDATE which then works in an async
              fashion, like the rest of the io_uring commands.  The arguments
              passed in are the same.  addr must contain a pointer to the
              array of file descriptors, len must contain the length of the
              array, and off must contain the offset at which to operate.
              Note that the array of file descriptors pointed to in addr must
              remain valid until this operation has completed.  Available
              since 5.6.

       IORING_OP_PROVIDE_BUFFERS
              This command allows an application to register a group of
              buffers to be used by commands that read/receive data.  Using
              buffers in this manner can eliminate the need to separate the
              poll + read, which provides a convenient point in time to
              allocate a buffer for a given request.  It's often infeasible to
              have as many buffers available as pending reads or receives.
              With this feature, the application can have its pool of buffers
              ready in the kernel, and when the file or socket is ready to
              read/receive data, a buffer can be selected for the operation.
              fd must contain the number of buffers to provide, addr must
              contain the starting address to add buffers from, len must
              contain the length of each buffer to add from the range,
              buf_group must contain the group ID of this range of buffers,
              and off must contain the starting buffer ID of this range of
              buffers.  With that set, the kernel adds buffers starting with
              the memory address in addr, each with a length of len.  Hence
              the application should provide len * fd worth of memory in addr.
              Buffers are grouped by the group ID, and each buffer within this
              group will be identical in size according to the above
              arguments.  This allows the application to provide different
              groups of buffers, and this is often used to have differently
              sized buffers available depending on what the expectations are
              of the individual request.  When submitting a request that
              should use a provided buffer, the IOSQE_BUFFER_SELECT flag must
              be set, and buf_group must be set to the desired buffer group ID
              where the buffer should be selected from.  Available since 5.7.

       IORING_OP_REMOVE_BUFFERS
              Remove buffers previously registered with
              IORING_OP_PROVIDE_BUFFERS.  fd must contain the number of
              buffers to remove, and buf_group must contain the buffer group
              ID from which to remove the buffers.  Available since 5.7.

       IORING_OP_SHUTDOWN
              Issue the equivalent of a shutdown(2) system call.  fd is the
              file descriptor of the socket being shut down, and len must be
              set to the how argument.  No other fields should be set.
              Available since 5.11.

       IORING_OP_RENAMEAT
              Issue the equivalent of a renameat2(2) system call.  fd should
              be set to the olddirfd, addr should be set to the oldpath, len
              should be set to the newdirfd, addr2 should be set to the
              newpath, and finally rename_flags should be set to the flags
              passed in to renameat2(2).  Available since 5.11.

       IORING_OP_UNLINKAT
              Issue the equivalent of an unlinkat(2) system call.  fd should
              be set to the dirfd, addr should be set to the pathname, and
              unlink_flags should be set to the flags being passed in to
              unlinkat(2).  Available since 5.11.
478
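       The field assignments described above for IORING_OP_READV,
       IORING_OP_POLL_ADD and IORING_OP_TIMEOUT can be sketched as small
       helpers that fill in an SQE.  The helper names are hypothetical and
       this is not the interface of any particular library.  The sketch also
       assumes that userspace sees the timeout structure as struct
       __kernel_timespec from <linux/time_types.h>, and that fd is set to -1
       for a timeout because no file is involved.

```c
#include <assert.h>
#include <poll.h>
#include <stdint.h>
#include <string.h>
#include <sys/uio.h>
#include <linux/io_uring.h>
#include <linux/time_types.h>

/* Vectored read: addr points to the iovec array, len is its entry count. */
static void prep_readv(struct io_uring_sqe *sqe, int fd,
                       const struct iovec *iov, unsigned nr_vecs,
                       __u64 offset, __u64 user_data)
{
    assert(sqe != NULL && iov != NULL);
    memset(sqe, 0, sizeof(*sqe));    /* start from a zeroed entry */
    sqe->opcode = IORING_OP_READV;
    sqe->fd = fd;
    sqe->addr = (__u64) (uintptr_t) iov;
    sqe->len = nr_vecs;
    sqe->off = offset;
    sqe->user_data = user_data;
}

/* One-shot poll: the completion's res carries the returned event mask. */
static void prep_poll_add(struct io_uring_sqe *sqe, int fd,
                          unsigned short events, __u64 user_data)
{
    assert(sqe != NULL);
    memset(sqe, 0, sizeof(*sqe));
    sqe->opcode = IORING_OP_POLL_ADD;
    sqe->fd = fd;
    sqe->poll_events = events;
    sqe->user_data = user_data;
}

/* Timeout: addr points to one timespec, off is the completion count. */
static void prep_timeout(struct io_uring_sqe *sqe,
                         const struct __kernel_timespec *ts,
                         unsigned count, unsigned flags, __u64 user_data)
{
    assert(sqe != NULL && ts != NULL);
    memset(sqe, 0, sizeof(*sqe));
    sqe->opcode = IORING_OP_TIMEOUT;
    sqe->fd = -1;                    /* assumption: no file is involved */
    sqe->addr = (__u64) (uintptr_t) ts;
    sqe->len = 1;                    /* exactly one timespec structure */
    sqe->off = count;                /* 0: behave like a plain timer */
    sqe->timeout_flags = flags;      /* 0 or IORING_TIMEOUT_ABS */
    sqe->user_data = user_data;
}
```

       The iovec array and the timespec must stay valid until the kernel has
       consumed the SQE; keeping them alive until the completion arrives is
       the conservative choice.
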
479
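       The IORING_OP_PROVIDE_BUFFERS layout described above, where fd carries
       the buffer count and the application supplies len * fd bytes at addr,
       can be sketched as follows.  The helper names prep_provide_buffers and
       provided_bytes are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <linux/io_uring.h>

/* Describe nr_bufs equally sized buffers laid out back to back at base. */
static void prep_provide_buffers(struct io_uring_sqe *sqe, void *base,
                                 unsigned buf_len, int nr_bufs,
                                 unsigned short group_id, unsigned start_id)
{
    assert(sqe != NULL && base != NULL && nr_bufs > 0);
    memset(sqe, 0, sizeof(*sqe));
    sqe->opcode = IORING_OP_PROVIDE_BUFFERS;
    sqe->fd = nr_bufs;                     /* number of buffers */
    sqe->addr = (__u64) (uintptr_t) base;  /* start of the memory range */
    sqe->len = buf_len;                    /* length of each buffer */
    sqe->off = start_id;                   /* ID of the first buffer */
    sqe->buf_group = group_id;             /* group the buffers join */
}

/* Bytes the application must supply at addr: len * fd in SQE terms. */
static size_t provided_bytes(const struct io_uring_sqe *sqe)
{
    assert(sqe != NULL);
    return (size_t) sqe->len * (size_t) sqe->fd;
}
```

       For example, four buffers of 1024 bytes each require a 4096-byte
       region that stays valid for as long as the buffers are registered.
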
       The flags field is a bit mask.  The supported flags are:

       IOSQE_FIXED_FILE
              When this flag is specified, fd is an index into the files array
              registered with the io_uring instance (see the
              IORING_REGISTER_FILES section of the io_uring_register(2) man
              page).  Available since 5.1.

       IOSQE_IO_DRAIN
              When this flag is specified, the SQE will not be started before
              previously submitted SQEs have completed, and new SQEs will not
              be started before this one completes.  Available since 5.2.

       IOSQE_IO_LINK
              When this flag is specified, it forms a link with the next SQE
              in the submission ring.  That next SQE will not be started
              before this one completes.  This, in effect, forms a chain of
              SQEs, which can be arbitrarily long.  The tail of the chain is
              denoted by the first SQE that does not have this flag set.  This
              flag has no effect on previous SQE submissions, nor does it
              impact SQEs that are outside of the chain tail.  This means that
              multiple chains can be executing in parallel, or chains and
              individual SQEs.  Only members inside the chain are serialized.
              A chain of SQEs will be broken if any request in that chain ends
              in error.  io_uring considers any unexpected result an error.
              This means that, e.g., a short read will also terminate the
              remainder of the chain.  If a chain of SQE links is broken, the
              remaining unstarted part of the chain will be terminated and
              completed with -ECANCELED as the error code.  Available since
              5.3.

       IOSQE_IO_HARDLINK
              Like IOSQE_IO_LINK, but it doesn't sever regardless of the
              completion result.  Note that the link will still sever if we
              fail submitting the parent request; hard links are only
              resilient in the presence of completion results for requests
              that did submit correctly.  IOSQE_IO_HARDLINK implies
              IOSQE_IO_LINK.  Available since 5.5.

       IOSQE_ASYNC
              Normal operation for io_uring is to try and issue an sqe as
              non-blocking first, and if that fails, execute it in an async
              manner.  To support more efficient overlapped operation of
              requests that the application knows/assumes will always (or most
              of the time) block, the application can ask for an sqe to be
              issued async from the start.  Available since 5.6.

       IOSQE_BUFFER_SELECT
              Used in conjunction with the IORING_OP_PROVIDE_BUFFERS command,
              which registers a pool of buffers to be used by commands that
              read or receive data.  When buffers are registered for this use
              case, and this flag is set in the command, io_uring will grab a
              buffer from this pool when the request is ready to receive or
              read data.  If successful, the resulting CQE will have
              IORING_CQE_F_BUFFER set in the flags part of the struct, and the
              upper IORING_CQE_BUFFER_SHIFT bits will contain the ID of the
              selected buffer.  This allows the application to know exactly
              which buffer was selected for the operation.  If no buffers are
              available and this flag is set, then the request will fail with
              -ENOBUFS as the error code.  Once a buffer has been used, it is
              no longer available in the kernel pool.  The application must
              re-register the given buffer again when it is ready to recycle
              it (e.g., when it has finished using it).  Available since 5.7.
542
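       The IOSQE_IO_LINK description above suggests one way to make an fsync
       apply to a preceding write, which the IORING_OP_FSYNC entry notes is
       not guaranteed otherwise.  The following is a minimal sketch, not the
       only possible approach; the helper name prep_write_then_fsync and the
       user_data values 1 and 2 are arbitrary choices for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <linux/io_uring.h>

/*
 * Fill two SQEs so that the fsync starts only after the write has
 * completed.  The two entries must be queued back to back, because the
 * link always refers to the next SQE in the submission ring.
 */
static void prep_write_then_fsync(struct io_uring_sqe *write_sqe,
                                  struct io_uring_sqe *fsync_sqe,
                                  int fd, const void *buf, unsigned len,
                                  __u64 offset)
{
    assert(write_sqe != NULL && fsync_sqe != NULL && buf != NULL);

    memset(write_sqe, 0, sizeof(*write_sqe));
    write_sqe->opcode = IORING_OP_WRITE;
    write_sqe->flags = IOSQE_IO_LINK;    /* link to the next SQE */
    write_sqe->fd = fd;
    write_sqe->addr = (__u64) (uintptr_t) buf;
    write_sqe->len = len;
    write_sqe->off = offset;
    write_sqe->user_data = 1;

    memset(fsync_sqe, 0, sizeof(*fsync_sqe));
    fsync_sqe->opcode = IORING_OP_FSYNC;
    fsync_sqe->fd = fd;                  /* flag unset: tail of the chain */
    fsync_sqe->fsync_flags = 0;          /* normal file integrity sync */
    fsync_sqe->user_data = 2;
}
```

       Because a short write counts as an error for the chain, the
       application should be prepared for the fsync to complete with
       -ECANCELED and to resubmit the remaining work.
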
543
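       Recovering the selected buffer ID from a completion, as described for
       IOSQE_BUFFER_SELECT above, can be sketched like this.  The helper name
       cqe_buffer_id and the use of -1 as a "no buffer" marker are
       assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <linux/io_uring.h>

/*
 * Return the ID of the buffer the kernel selected for this completion,
 * or -1 if the CQE does not carry one.
 */
static int cqe_buffer_id(const struct io_uring_cqe *cqe)
{
    assert(cqe != NULL);
    if (!(cqe->flags & IORING_CQE_F_BUFFER))
        return -1;
    /* The ID lives in the bits above IORING_CQE_BUFFER_SHIFT. */
    return (int) (cqe->flags >> IORING_CQE_BUFFER_SHIFT);
}
```

       The returned ID identifies the buffer to hand back with
       IORING_OP_PROVIDE_BUFFERS once the application has finished with it.
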
       ioprio specifies the I/O priority.  See ioprio_get(2) for a description
       of Linux I/O priorities.

       fd specifies the file descriptor against which the operation will be
       performed, with the exception noted above.

       If the operation is one of IORING_OP_READ_FIXED or
       IORING_OP_WRITE_FIXED, addr and len must fall within the buffer located
       at buf_index in the fixed buffer array.  If the operation is either
       IORING_OP_READV or IORING_OP_WRITEV, then addr points to an iovec array
       of len entries.

       rw_flags, specified for read and write operations, contains a bitwise
       OR of per-I/O flags, as described in the preadv2(2) man page.

       The fsync_flags bit mask may contain either 0, for a normal file
       integrity sync, or IORING_FSYNC_DATASYNC to provide data sync only
       semantics.  See the descriptions of O_SYNC and O_DSYNC in the open(2)
       manual page for more information.

       The bits that may be set in poll_events are defined in <poll.h>, and
       documented in poll(2).

       user_data is an application-supplied value that will be copied into the
       completion queue entry (see below).  buf_index is an index into an
       array of fixed buffers, and is only valid if fixed buffers were
       registered.  personality is the credentials id to use for this
       operation.  See io_uring_register(2) for how to register personalities
       with io_uring.  If set to 0, the current personality of the submitting
       task is used.

       Once the submission queue entry is initialized, I/O is submitted by
       placing the index of the submission queue entry into the tail of the
       submission queue.  After one or more indexes are added to the queue,
       and the queue tail is advanced, the io_uring_enter(2) system call can
       be invoked to initiate the I/O.
580
581       Completions use the following data structure:
582
           /*
            * IO completion data structure (Completion Queue Entry)
            */
           struct io_uring_cqe {
               __u64    user_data; /* sqe->user_data value passed back */
               __s32    res;       /* result code for this event */
               __u32    flags;     /* reserved for future use */
           };
591
592       user_data is copied from the field of the same name in the submission
593       queue entry.  The primary use case is to store data that the
594       application will need to access upon completion of this particular I/O.
595       The flags field is reserved for future use.  res is the
596       operation-specific result; io_uring-specific errors (e.g. flags or opcode
597       invalid) are also returned through this field, as described in section CQE ERRORS.
598
599       For read and write opcodes, the return values match those documented in
600       the preadv2(2) and pwritev2(2) man pages.  Return codes for the  io_ur‐
601       ing-specific  opcodes  are documented in the description of the opcodes
602       above.
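
       Since res carries either a non-negative result or a negated error code,
       an application typically separates the two cases when it reaps a
       completion.  The following is a minimal sketch of that step; the helper
       name cqe_split_res is hypothetical, and it takes the res value directly
       rather than a pointer to the completion queue entry.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: interpret the res field of a completion queue entry.
 * Returns 0 and stores the non-negative result (for example a byte count)
 * in *value, or returns the positive errno value if res is negative.
 */
static int cqe_split_res(int32_t res, uint32_t *value)
{
    assert(value != NULL);

    if (res < 0) {
        *value = 0;
        return -res;           /* e.g. res == -EINVAL yields EINVAL */
    }

    *value = (uint32_t)res;
    return 0;
}
```

       The returned code can be passed to strerror(3) for reporting, in the
       same way as an errno value from a failed system call.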
603

RETURN VALUE

605       io_uring_enter() returns the  number  of  I/Os  successfully  consumed.
606       This  can  be zero if to_submit was zero or if the submission queue was
607       empty.
608
609       The errors related to a submission queue entry will be returned through
610       a  completion queue entry (see section CQE ERRORS), rather than through
611       the system call itself.
612
613       Errors that do not occur on behalf of a submission queue entry are
614       returned directly by the system call.  On such an error, -1 is returned
615       and errno is set appropriately.
616

ERRORS

618       These are the errors returned by the io_uring_enter() system call.
619
620       EAGAIN The kernel was unable to allocate memory  for  the  request,  or
621              otherwise  ran  out  of  resources to handle it. The application
622              should wait for some completions and try again.
623
624       EBADF  fd is not a valid file descriptor.
625
626       EBADFD fd is a valid file descriptor, but the io_uring ring is  not  in
627              the  right state (enabled). See io_uring_register(2) for details
628              on how to enable the ring.
629
630       EBUSY  The application is attempting to overcommit the  number  of  re‐
631              quests it can have pending. The application should wait for some
632              completions and try again. This may occur if the application
633              tries to queue more requests than there is room for in the CQ ring.
634
635       EINVAL Some bits in the flags argument are invalid.
636
637       EFAULT An  invalid  user  space address was specified for the sig argu‐
638              ment.
639
640       ENXIO  The io_uring instance is in the process of being torn down.
641
642       EOPNOTSUPP
643              fd does not refer to an io_uring instance.
644
645       EINTR  The operation was interrupted by the delivery of a signal before
646              it could complete; see signal(7).  This can happen while waiting
647              for events with IORING_ENTER_GETEVENTS.
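
       One possible way for a caller to act on these errors is sketched below.
       The enum and the function classify_enter_errno are hypothetical names,
       and the policy shown (retry immediately on EINTR, reap completions
       before retrying on EAGAIN or EBUSY, treat everything else as fatal) is
       an assumption for illustration, not a requirement of the interface.

```c
#include <errno.h>

/* Hypothetical policy for handling a failed io_uring_enter() call. */
enum enter_action {
    ENTER_RETRY_NOW,           /* call again with the same arguments */
    ENTER_REAP_THEN_RETRY,     /* wait for some completions, then retry */
    ENTER_FATAL                /* report the error to the caller */
};

static enum enter_action classify_enter_errno(int err)
{
    switch (err) {
    case EINTR:
        /* Interrupted by a signal while waiting for events. */
        return ENTER_RETRY_NOW;
    case EAGAIN:
    case EBUSY:
        /* Resources or CQ ring space are exhausted for now. */
        return ENTER_REAP_THEN_RETRY;
    default:
        /* EBADF, EBADFD, EINVAL, EFAULT, ENXIO, EOPNOTSUPP and others. */
        return ENTER_FATAL;
    }
}
```

       A caller would pass the value of errno, saved immediately after
       io_uring_enter() returns -1, to such a function.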
648
649

CQE ERRORS

651       These io_uring-specific errors are returned as a negative value in  the
652       res field of the completion queue entry.
653
654       EACCES The flags field or opcode in a submission queue entry is not al‐
655              lowed due to registered restrictions.  See  io_uring_register(2)
656              for details on how restrictions work.
657
658       EBADF  The  fd  field  in the submission queue entry is invalid, or the
659              IOSQE_FIXED_FILE flag was set in the submission queue entry, but
660              no files were registered with the io_uring instance.
661
662       EFAULT The buffer is outside of the process's accessible address space.
663
664       EFAULT IORING_OP_READ_FIXED  or  IORING_OP_WRITE_FIXED was specified in
665              the opcode field of the submission queue entry, but either  buf‐
666              fers  were not registered for this io_uring instance, or the ad‐
667              dress range described by addr and len does not  fit  within  the
668              buffer registered at buf_index.
669
670       EINVAL The  flags  field  or  opcode in a submission queue entry is in‐
671              valid.
672
673       EINVAL The buf_index member of the submission queue entry is invalid.
674
675       EINVAL The personality field in a submission queue entry is invalid.
676
677       EINVAL IORING_OP_NOP was specified in the submission queue  entry,  but
678              the io_uring context was set up for polling (IORING_SETUP_IOPOLL
679              was specified in the call to io_uring_setup(2)).
680
681       EINVAL IORING_OP_READV or IORING_OP_WRITEV was specified in the submis‐
682              sion  queue  entry,  but the io_uring instance has fixed buffers
683              registered.
684
685       EINVAL IORING_OP_READ_FIXED or IORING_OP_WRITE_FIXED was  specified  in
686              the submission queue entry, and the buf_index is invalid.
687
688       EINVAL IORING_OP_READV,  IORING_OP_WRITEV,  IORING_OP_READ_FIXED,  IOR‐
689              ING_OP_WRITE_FIXED or IORING_OP_FSYNC was specified in the  sub‐
690              mission  queue  entry,  but the io_uring instance was configured
691              for IOPOLLing, or any of addr, ioprio, off,  len,  or  buf_index
692              was set in the submission queue entry.
693
694       EINVAL IORING_OP_POLL_ADD or IORING_OP_POLL_REMOVE was specified in the
695              opcode field of the submission queue entry, but the io_uring in‐
696              stance    was    configured    for   busy-wait   polling   (IOR‐
697              ING_SETUP_IOPOLL), or any of ioprio, off, len, or buf_index  was
698              non-zero in the submission queue entry.
699
700       EINVAL IORING_OP_POLL_ADD was specified in the opcode field of the sub‐
701              mission queue entry, and the addr field was non-zero.
702
703       EOPNOTSUPP
704              The opcode is valid, but is not supported by this kernel.
705
706       EOPNOTSUPP
707              IOSQE_BUFFER_SELECT was set in the flags field of the submission
708              queue entry, but the opcode doesn't support buffer selection.
709
710
711
712Linux                             2019-01-22                 IO_URING_ENTER(2)