IO_URING_ENTER(2)          Linux Programmer's Manual         IO_URING_ENTER(2)
2
3
4
NAME
6 io_uring_enter - initiate and/or complete asynchronous I/O
7
SYNOPSIS
#include <linux/io_uring.h>

int io_uring_enter(unsigned int fd, unsigned int to_submit,
                   unsigned int min_complete, unsigned int flags,
                   sigset_t *sig);
14
DESCRIPTION
io_uring_enter() is used to initiate and complete I/O using the shared
submission and completion queues set up by a call to io_uring_setup(2).
A single call can both submit new I/O and wait for completions of I/O
initiated by this call or previous calls to io_uring_enter().
20
21 fd is the file descriptor returned by io_uring_setup(2). to_submit
22 specifies the number of I/Os to submit from the submission queue.
23 flags is a bitmask of the following values:
24
25 IORING_ENTER_GETEVENTS
If this flag is set, then the system call will wait for the
specified number of events in min_complete before returning.
This flag can be set along with to_submit to both submit and
complete events in a single system call.
30
31 IORING_ENTER_SQ_WAKEUP
If the ring has been created with IORING_SETUP_SQPOLL, then this
flag asks the kernel to wake up the SQ kernel thread to submit
I/O.
35
36 IORING_ENTER_SQ_WAIT
If the ring has been created with IORING_SETUP_SQPOLL, then the
application has no real insight into when the SQ kernel thread
has consumed entries from the SQ ring. This can lead to a
situation where the application can no longer get a free SQE
entry to submit, without knowing when one becomes available as
the SQ kernel thread consumes them. If the system call is used
with this flag set, then it will wait until at least one entry
is free in the SQ ring.
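
C libraries commonly ship no wrapper for this system call, so a program that does not use a helper library usually issues it through syscall(2). The following is a minimal sketch under that assumption. The names ring_enter() and ring_submit_and_wait_one() are hypothetical, and the sixth argument (the size of the kernel signal mask, _NSIG / 8) is part of the raw system call interface rather than something this page documents.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/io_uring.h>

/*
 * Hypothetical helper (not part of any library): thin wrapper around the
 * raw system call. The kernel expects the size of the signal mask as a
 * sixth argument; _NSIG / 8 is the size of the kernel's signal set.
 * Returns the number of SQEs consumed, or -1 with errno set.
 */
static int ring_enter(unsigned int fd, unsigned int to_submit,
                      unsigned int min_complete, unsigned int flags,
                      sigset_t *sig)
{
    return (int)syscall(__NR_io_uring_enter, fd, to_submit, min_complete,
                        flags, sig, _NSIG / 8);
}

/*
 * Submit to_submit entries and wait for at least one completion in the
 * same system call, as described for IORING_ENTER_GETEVENTS.
 */
static int ring_submit_and_wait_one(unsigned int fd, unsigned int to_submit)
{
    int ret = ring_enter(fd, to_submit, 1, IORING_ENTER_GETEVENTS, NULL);

    /* On success the kernel never reports more than was requested. */
    assert(ret < 0 || (unsigned int)ret <= to_submit);
    return ret;
}
```

A caller would pass the descriptor returned by io_uring_setup(2) as fd. Passing any other descriptor makes the call fail with -1 and errno set.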
45
46 If the io_uring instance was configured for polling, by specifying IOR‐
47 ING_SETUP_IOPOLL in the call to io_uring_setup(2), then min_complete
48 has a slightly different meaning. Passing a value of 0 instructs the
49 kernel to return any events which are already complete, without block‐
50 ing. If min_complete is a non-zero value, the kernel will still return
51 immediately if any completion events are available. If no event com‐
52 pletions are available, then the call will poll either until one or
53 more completions become available, or until the process has exceeded
54 its scheduler time slice.
55
56 Note that, for interrupt driven I/O (where IORING_SETUP_IOPOLL was not
57 specified in the call to io_uring_setup(2)), an application may check
58 the completion queue for event completions without entering the kernel
59 at all.
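
Checking the completion queue from user space amounts to comparing the ring's head and tail indexes. The sketch below illustrates that idea on a hypothetical struct cq_view. Its field names are assumptions, and a real application would fill them from the ring offsets reported by io_uring_setup(2) after mapping the ring. The GCC/Clang __atomic builtins are used for the memory ordering between the application and the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <linux/io_uring.h>

/*
 * Hypothetical view of a mapped completion ring (names are assumptions
 * for this sketch, not kernel or library definitions).
 */
struct cq_view {
    unsigned int *head;         /* consumer index, written by the app */
    unsigned int *tail;         /* producer index, written by the kernel */
    unsigned int mask;          /* number of ring entries minus one */
    struct io_uring_cqe *cqes;  /* array of completion queue entries */
};

/*
 * Copy the next completion into *out and consume it.
 * Returns 1 if an entry was available, 0 if the ring was empty.
 * No system call is made.
 */
static int cq_pop(struct cq_view *cq, struct io_uring_cqe *out)
{
    unsigned int head, tail;

    assert(cq != NULL && out != NULL);
    head = *cq->head;
    /* Acquire: see the CQE contents written before the tail update. */
    tail = __atomic_load_n(cq->tail, __ATOMIC_ACQUIRE);
    if (head == tail)
        return 0;

    *out = cq->cqes[head & cq->mask];
    /* Release: the slot may be reused only after the copy above. */
    __atomic_store_n(cq->head, head + 1, __ATOMIC_RELEASE);
    return 1;
}
```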
60
When the system call returns that a certain number of SQEs have been
consumed and submitted, it's safe to reuse SQE entries in the ring.
This is true even if the actual I/O submission had to be punted to async
context, which means that the SQE may in fact not have been submitted
yet. If the kernel requires later use of a particular SQE entry, it
will have made a private copy of it.
67
68 sig is a pointer to a signal mask (see sigprocmask(2)); if sig is not
69 NULL, io_uring_enter() first replaces the current signal mask by the
70 one pointed to by sig, then waits for events to become available in the
71 completion queue, and then restores the original signal mask. The fol‐
72 lowing io_uring_enter() call:
73
    ret = io_uring_enter(fd, 0, 1, IORING_ENTER_GETEVENTS, &sig);
75
76 is equivalent to atomically executing the following calls:
77
    pthread_sigmask(SIG_SETMASK, &sig, &orig);
    ret = io_uring_enter(fd, 0, 1, IORING_ENTER_GETEVENTS, NULL);
    pthread_sigmask(SIG_SETMASK, &orig, NULL);
81
82 See the description of pselect(2) for an explanation of why the sig pa‐
83 rameter is necessary.
84
85 Submission queue entries are represented using the following data
86 structure:
87
/*
 * IO submission data structure (Submission Queue Entry)
 */
struct io_uring_sqe {
    __u8    opcode;         /* type of operation for this sqe */
    __u8    flags;          /* IOSQE_ flags */
    __u16   ioprio;         /* ioprio for the request */
    __s32   fd;             /* file descriptor to do IO on */
    union {
        __u64   off;        /* offset into file */
        __u64   addr2;
    };
    union {
        __u64   addr;       /* pointer to buffer or iovecs */
        __u64   splice_off_in;
    };
    __u32   len;            /* buffer size or number of iovecs */
    union {
        __kernel_rwf_t  rw_flags;
        __u32   fsync_flags;
        __u16   poll_events;    /* compatibility */
        __u32   poll32_events;  /* word-reversed for BE */
        __u32   sync_range_flags;
        __u32   msg_flags;
        __u32   timeout_flags;
        __u32   accept_flags;
        __u32   cancel_flags;
        __u32   open_flags;
        __u32   statx_flags;
        __u32   fadvise_advice;
        __u32   splice_flags;
    };
    __u64   user_data;      /* data to be passed back at completion time */
    union {
        struct {
            union {
                /* index into fixed buffers, if used */
                __u16   buf_index;
                /* for grouped buffer selection */
                __u16   buf_group;
            };
            /* personality to use, if used */
            __u16   personality;
            __s32   splice_fd_in;
        };
        __u64   __pad2[3];
    };
};
137
138 The opcode describes the operation to be performed. It can be one of:
139
140 IORING_OP_NOP
141 Do not perform any I/O. This is useful for testing the perfor‐
142 mance of the io_uring implementation itself.
143
144 IORING_OP_READV
145
146 IORING_OP_WRITEV
147 Vectored read and write operations, similar to preadv2(2) and
148 pwritev2(2).
149
150
151 IORING_OP_READ_FIXED
152
153 IORING_OP_WRITE_FIXED
Read from or write to pre-mapped buffers. See
io_uring_register(2) for details on how to set up a context for
fixed reads and writes.
157
158
159 IORING_OP_FSYNC
160 File sync. See also fsync(2). Note that, while I/O is initi‐
161 ated in the order in which it appears in the submission queue,
162 completions are unordered. For example, an application which
163 places a write I/O followed by an fsync in the submission queue
164 cannot expect the fsync to apply to the write. The two opera‐
165 tions execute in parallel, so the fsync may complete before the
166 write is issued to the storage. The same is also true for pre‐
167 viously issued writes that have not completed prior to the
168 fsync.
169
170
171 IORING_OP_POLL_ADD
172 Poll the fd specified in the submission queue entry for the
173 events specified in the poll_events field. Unlike poll or epoll
174 without EPOLLONESHOT, this interface always works in one shot
175 mode. That is, once the poll operation is completed, it will
176 have to be resubmitted. This command works like an async poll(2)
177 and the completion event result is the returned mask of events.
178
179
180 IORING_OP_POLL_REMOVE
181 Remove an existing poll request. If found, the res field of the
182 struct io_uring_cqe will contain 0. If not found, res will con‐
183 tain -ENOENT.
184
185
186 IORING_OP_EPOLL_CTL
Add, remove or modify entries in the interest list of epoll(7).
See epoll_ctl(2) for details of the system call. fd holds the
file descriptor that represents the epoll instance, off holds
the file descriptor to add, remove or modify, len holds the
operation (EPOLL_CTL_ADD, EPOLL_CTL_DEL, EPOLL_CTL_MOD) to
perform, and addr holds a pointer to the epoll_event structure.
Available since 5.6.
194
195
196 IORING_OP_SYNC_FILE_RANGE
Issue the equivalent of a sync_file_range(2) on the file
descriptor. The fd field is the file descriptor to sync, the off
field holds the offset in bytes, the len field holds the length
in bytes, and the sync_range_flags field holds the flags for the
command. See also sync_file_range(2) for the general description
of the related system call. Available since 5.2.
203
204
205 IORING_OP_SENDMSG
206 Issue the equivalent of a sendmsg(2) system call. fd must be
207 set to the socket file descriptor, addr must contain a pointer
208 to the msghdr structure, and msg_flags holds the flags associ‐
209 ated with the system call. See also sendmsg(2) for the general
210 description of the related system call. Available since 5.3.
211
212
213 IORING_OP_RECVMSG
214 Works just like IORING_OP_SENDMSG, except for recvmsg(2) in‐
215 stead. See the description of IORING_OP_SENDMSG. Available since
216 5.3.
217
218
219 IORING_OP_SEND
220 Issue the equivalent of a send(2) system call. fd must be set
221 to the socket file descriptor, addr must contain a pointer to
222 the buffer, len denotes the length of the buffer to send, and
223 msg_flags holds the flags associated with the system call. See
224 also send(2) for the general description of the related system
225 call. Available since 5.6.
226
227
228 IORING_OP_RECV
229 Works just like IORING_OP_SEND, except for recv(2) instead. See
230 the description of IORING_OP_SEND. Available since 5.6.
231
232
233 IORING_OP_TIMEOUT
234 This command will register a timeout operation. The addr field
235 must contain a pointer to a struct timespec64 structure, len
236 must contain 1 to signify one timespec64 structure, time‐
237 out_flags may contain IORING_TIMEOUT_ABS for an absolute timeout
238 value, or 0 for a relative timeout. off may contain a comple‐
239 tion event count. A timeout will trigger a wakeup event on the
240 completion ring for anyone waiting for events. A timeout condi‐
241 tion is met when either the specified timeout expires, or the
242 specified number of events have completed. Either condition will
243 trigger the event. If set to 0, completed events are not
244 counted, which effectively acts like a timer. io_uring timeouts
245 use the CLOCK_MONOTONIC clock source. The request will complete
246 with -ETIME if the timeout got completed through expiration of
247 the timer, or 0 if the timeout got completed through requests
248 completing on their own. If the timeout was cancelled before it
249 expired, the request will complete with -ECANCELED. Available
250 since 5.4.
251
252
253 IORING_OP_TIMEOUT_REMOVE
If timeout_flags are zero, then it attempts to remove an
existing timeout operation. addr must contain the user_data
field of the previously issued timeout operation. If the
specified timeout request is found and cancelled successfully,
this request will terminate with a result value of 0. If the
timeout request was found but expiration was already in
progress, this request will terminate with a result value of
-EBUSY. If the timeout request wasn't found, the request will
terminate with a result value of -ENOENT. Available since 5.5.
263
If timeout_flags contain IORING_TIMEOUT_UPDATE, then instead of
removing an existing operation, it updates it. The addr field
and the return values are the same as described above. The
addr2 field must contain a pointer to a struct timespec64
structure. timeout_flags may also contain IORING_TIMEOUT_ABS.
Available since 5.11.
269
270
271 IORING_OP_ACCEPT
272 Issue the equivalent of an accept4(2) system call. fd must be
273 set to the socket file descriptor, addr must contain the pointer
274 to the sockaddr structure, and addr2 must contain a pointer to
275 the socklen_t addrlen field. See also accept4(2) for the general
276 description of the related system call. Available since 5.5.
277
278
279 IORING_OP_ASYNC_CANCEL
Attempt to cancel an already issued request. addr must contain
the user_data field of the request that should be cancelled. The
cancellation request will complete with one of the following
result codes. If found, the res field of the cqe will contain 0.
If not found, res will contain -ENOENT. If found and
cancellation was attempted, the res field will contain
-EALREADY. In this case, the request may or may not terminate.
In general, requests that are interruptible (like socket I/O)
will get cancelled, while disk I/O requests cannot be cancelled
if already started. Available since 5.5.
290
291
292 IORING_OP_LINK_TIMEOUT
This request must be linked with another request through
IOSQE_IO_LINK, which is described below. Unlike
IORING_OP_TIMEOUT, IORING_OP_LINK_TIMEOUT acts on the linked
request, not the completion queue. The format of the command is
otherwise like IORING_OP_TIMEOUT, except there's no completion
event count as it's tied to a specific request. If used, the
timeout specified in the command will cancel the linked command,
unless the linked command completes before the timeout. The
timeout will complete with -ETIME if the timer expired and
cancellation of the linked request was attempted, or -ECANCELED
if the timer got cancelled because of completion of the linked
request. Like IORING_OP_TIMEOUT, the clock source used is
CLOCK_MONOTONIC. Available since 5.5.
306
307
308
309 IORING_OP_CONNECT
310 Issue the equivalent of a connect(2) system call. fd must be
311 set to the socket file descriptor, addr must contain the const
312 pointer to the sockaddr structure, and off must contain the
313 socklen_t addrlen field. See also connect(2) for the general de‐
314 scription of the related system call. Available since 5.5.
315
316
317 IORING_OP_FALLOCATE
318 Issue the equivalent of a fallocate(2) system call. fd must be
319 set to the file descriptor, len must contain the mode associated
320 with the operation, off must contain the offset on which to op‐
321 erate, and addr must contain the length. See also fallocate(2)
322 for the general description of the related system call. Avail‐
323 able since 5.6.
324
325
326 IORING_OP_FADVISE
327 Issue the equivalent of a posix_fadvise(2) system call. fd must
328 be set to the file descriptor, off must contain the offset on
329 which to operate, len must contain the length, and fadvise_ad‐
330 vice must contain the advice associated with the operation. See
331 also posix_fadvise(2) for the general description of the related
332 system call. Available since 5.6.
333
334
335 IORING_OP_MADVISE
336 Issue the equivalent of a madvise(2) system call. addr must
337 contain the address to operate on, len must contain the length
338 on which to operate, and fadvise_advice must contain the advice
339 associated with the operation. See also madvise(2) for the gen‐
340 eral description of the related system call. Available since
341 5.6.
342
343
344 IORING_OP_OPENAT
Issue the equivalent of an openat(2) system call. fd is the
dirfd argument, addr must contain a pointer to the pathname
argument, open_flags should contain any flags passed in, and len
is the access mode of the file. See also openat(2) for the
general description of the related system call. Available since
5.6.
350
351
352 IORING_OP_OPENAT2
Issue the equivalent of an openat2(2) system call. fd is the
dirfd argument, addr must contain a pointer to the pathname
argument, len should contain the size of the open_how structure,
and off should be set to the address of the open_how structure.
See also openat2(2) for the general description of the related
system call. Available since 5.6.
359
360
361 IORING_OP_CLOSE
362 Issue the equivalent of a close(2) system call. fd is the file
363 descriptor to be closed. See also close(2) for the general de‐
364 scription of the related system call. Available since 5.6.
365
366
367 IORING_OP_STATX
368 Issue the equivalent of a statx(2) system call. fd is the dirfd
369 argument, addr must contain a pointer to the *pathname string,
370 statx_flags is the flags argument, len should be the mask argu‐
371 ment, and off must contain a pointer to the statxbuf to be
372 filled in. See also statx(2) for the general description of the
373 related system call. Available since 5.6.
374
375
376 IORING_OP_READ
377
378 IORING_OP_WRITE
379 Issue the equivalent of a read(2) or write(2) system call. fd
380 is the file descriptor to be operated on, addr contains the buf‐
381 fer in question, and len contains the length of the IO opera‐
382 tion. These are non-vectored versions of the IORING_OP_READV and
383 IORING_OP_WRITEV opcodes. See also read(2) and write(2) for the
384 general description of the related system call. Available since
385 5.6.
386
387
388 IORING_OP_SPLICE
389 Issue the equivalent of a splice(2) system call. splice_fd_in
390 is the file descriptor to read from, splice_off_in is an offset
391 to read from, fd is the file descriptor to write to, off is an
392 offset from which to start writing to. A sentinel value of -1 is
393 used to pass the equivalent of a NULL for the offsets to
394 splice(2). len contains the number of bytes to copy.
395 splice_flags contains a bit mask for the flag field associated
396 with the system call. Please note that one of the file descrip‐
397 tors must refer to a pipe. See also splice(2) for the general
398 description of the related system call. Available since 5.7.
399
400
401 IORING_OP_TEE
402 Issue the equivalent of a tee(2) system call. splice_fd_in is
403 the file descriptor to read from, fd is the file descriptor to
404 write to, len contains the number of bytes to copy, and
405 splice_flags contains a bit mask for the flag field associated
406 with the system call. Please note that both of the file de‐
407 scriptors must refer to a pipe. See also tee(2) for the general
408 description of the related system call. Available since 5.8.
409
410
411 IORING_OP_FILES_UPDATE
412 This command is an alternative to using IORING_REGIS‐
413 TER_FILES_UPDATE which then works in an async fashion, like the
414 rest of the io_uring commands. The arguments passed in are the
415 same. addr must contain a pointer to the array of file descrip‐
416 tors, len must contain the length of the array, and off must
417 contain the offset at which to operate. Note that the array of
418 file descriptors pointed to in addr must remain valid until this
419 operation has completed. Available since 5.6.
420
421
422 IORING_OP_PROVIDE_BUFFERS
423 This command allows an application to register a group of buf‐
424 fers to be used by commands that read/receive data. Using buf‐
425 fers in this manner can eliminate the need to separate the poll
426 + read, which provides a convenient point in time to allocate a
427 buffer for a given request. It's often infeasible to have as
428 many buffers available as pending reads or receive. With this
429 feature, the application can have its pool of buffers ready in
430 the kernel, and when the file or socket is ready to read/receive
431 data, a buffer can be selected for the operation. fd must con‐
432 tain the number of buffers to provide, addr must contain the
433 starting address to add buffers from, len must contain the
434 length of each buffer to add from the range, buf_group must con‐
435 tain the group ID of this range of buffers, and off must contain
436 the starting buffer ID of this range of buffers. With that set,
437 the kernel adds buffers starting with the memory address in
438 addr, each with a length of len. Hence the application should
439 provide len * fd worth of memory in addr. Buffers are grouped
440 by the group ID, and each buffer within this group will be iden‐
441 tical in size according to the above arguments. This allows the
442 application to provide different groups of buffers, and this is
443 often used to have differently sized buffers available depending
444 on what the expectations are of the individual request. When
445 submitting a request that should use a provided buffer, the
446 IOSQE_BUFFER_SELECT flag must be set, and buf_group must be set
447 to the desired buffer group ID where the buffer should be se‐
448 lected from. Available since 5.7.
449
450
451 IORING_OP_REMOVE_BUFFERS
452 Remove buffers previously registered with IORING_OP_PROVIDE_BUF‐
453 FERS. fd must contain the number of buffers to remove, and
454 buf_group must contain the buffer group ID from which to remove
455 the buffers. Available since 5.7.
456
457
458 IORING_OP_SHUTDOWN
Issue the equivalent of a shutdown(2) system call. fd is the
file descriptor of the socket being shut down; no other fields
should be set. Available since 5.11.
462
463
464 IORING_OP_RENAMEAT
Issue the equivalent of a renameat2(2) system call. fd should
be set to the olddirfd, addr should be set to the oldpath, len
should be set to the newdirfd, addr2 should be set to the
newpath, and finally rename_flags should be set to the flags
passed in to renameat2(2). Available since 5.11.
471
472
473 IORING_OP_UNLINKAT
Issue the equivalent of an unlinkat(2) system call. fd should
be set to the dirfd, addr should be set to the pathname, and
unlink_flags should be set to the flags being passed in to
unlinkat(2). Available since 5.11.
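
Each opcode above reuses the generic SQE fields in its own way. As one illustration, the following sketch fills an SQE for IORING_OP_PROVIDE_BUFFERS using the field mapping given in that opcode's description. The helper name prep_provide_buffers() and its parameter names are hypothetical, and the code assumes kernel headers from 5.7 or later.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <linux/io_uring.h>

/*
 * Hypothetical helper: describe nr buffers of buf_len bytes each, laid
 * out back to back starting at base, as buffer group `group` with IDs
 * starting at first_id. base must point at nr * buf_len bytes.
 */
static void prep_provide_buffers(struct io_uring_sqe *sqe, void *base,
                                 unsigned int buf_len, int nr,
                                 unsigned short group,
                                 unsigned short first_id)
{
    assert(sqe != NULL && base != NULL && nr > 0);
    memset(sqe, 0, sizeof(*sqe));
    sqe->opcode = IORING_OP_PROVIDE_BUFFERS;
    sqe->fd = nr;                           /* number of buffers */
    sqe->addr = (uint64_t)(uintptr_t)base;  /* start of the memory range */
    sqe->len = buf_len;                     /* length of each buffer */
    sqe->buf_group = group;                 /* buffer group ID */
    sqe->off = first_id;                    /* ID of the first buffer */
}
```

For example, a pool of four 256-byte buffers in group 7 would need a 1024-byte region passed as base, with nr set to 4 and buf_len set to 256.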
478
479
480 The flags field is a bit mask. The supported flags are:
481
482 IOSQE_FIXED_FILE
483 When this flag is specified, fd is an index into the files array
484 registered with the io_uring instance (see the IORING_REGIS‐
485 TER_FILES section of the io_uring_register(2) man page). Avail‐
486 able since 5.1.
487
488 IOSQE_IO_DRAIN
489 When this flag is specified, the SQE will not be started before
490 previously submitted SQEs have completed, and new SQEs will not
491 be started before this one completes. Available since 5.2.
492
493 IOSQE_IO_LINK
494 When this flag is specified, it forms a link with the next SQE
495 in the submission ring. That next SQE will not be started before
496 this one completes. This, in effect, forms a chain of SQEs,
497 which can be arbitrarily long. The tail of the chain is denoted
498 by the first SQE that does not have this flag set. This flag
499 has no effect on previous SQE submissions, nor does it impact
500 SQEs that are outside of the chain tail. This means that multi‐
501 ple chains can be executing in parallel, or chains and individ‐
502 ual SQEs. Only members inside the chain are serialized. A chain
of SQEs will be broken if any request in that chain ends in
error. io_uring considers any unexpected result an error. This
means that, e.g., a short read will also terminate the remainder
of the chain. If a chain of SQE links is broken, the remaining
unstarted part of the chain will be terminated and completed
with -ECANCELED as the error code. Available since 5.3.
509
510 IOSQE_IO_HARDLINK
Like IOSQE_IO_LINK, but the link is not severed regardless of
the completion result. Note that the link will still be severed
if submitting the parent request fails; hard links are only
resilient in the presence of completion results for requests
that did submit correctly. IOSQE_IO_HARDLINK implies
IOSQE_IO_LINK. Available since 5.5.
517
518 IOSQE_ASYNC
519 Normal operation for io_uring is to try and issue an sqe as non-
520 blocking first, and if that fails, execute it in an async man‐
521 ner. To support more efficient overlapped operation of requests
522 that the application knows/assumes will always (or most of the
523 time) block, the application can ask for an sqe to be issued
524 async from the start. Available since 5.6.
525
526 IOSQE_BUFFER_SELECT
527 Used in conjunction with the IORING_OP_PROVIDE_BUFFERS command,
528 which registers a pool of buffers to be used by commands that
529 read or receive data. When buffers are registered for this use
530 case, and this flag is set in the command, io_uring will grab a
531 buffer from this pool when the request is ready to receive or
read data. If successful, the resulting CQE will have
IORING_CQE_F_BUFFER set in the flags part of the struct, and the
upper bits of flags (the value of flags shifted right by
IORING_CQE_BUFFER_SHIFT) will contain the ID of the selected
buffer. This allows the application to know exactly which buffer
was selected for the operation. If no buffers are available and
this flag is set, then the request will fail with -ENOBUFS as
the error code. Once a buffer has been used, it is no longer
available in the kernel pool. The application must re-register
the given buffer when it is ready to recycle it (e.g. when it
has finished using it). Available since 5.7.
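
The note under IORING_OP_FSYNC says that a write followed by an fsync is not ordered by default. One way to get that ordering, based on the IOSQE_IO_LINK description above, is to link the two SQEs. The sketch below assumes the two entries are placed in adjacent slots of the submission ring. The helper name prep_write_then_fsync() and the user_data values 1 and 2 are arbitrary choices for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <linux/io_uring.h>

/*
 * Hypothetical helper: prepare a write followed by a data sync that is
 * not started until the write has completed. write_sqe must be queued
 * immediately before fsync_sqe.
 */
static void prep_write_then_fsync(struct io_uring_sqe *write_sqe,
                                  struct io_uring_sqe *fsync_sqe,
                                  int fd, const void *buf,
                                  unsigned int len, uint64_t offset)
{
    assert(write_sqe != NULL && fsync_sqe != NULL);

    memset(write_sqe, 0, sizeof(*write_sqe));
    write_sqe->opcode = IORING_OP_WRITE;
    write_sqe->flags = IOSQE_IO_LINK;   /* next SQE waits for this one */
    write_sqe->fd = fd;
    write_sqe->addr = (uint64_t)(uintptr_t)buf;
    write_sqe->len = len;
    write_sqe->off = offset;
    write_sqe->user_data = 1;

    memset(fsync_sqe, 0, sizeof(*fsync_sqe));
    fsync_sqe->opcode = IORING_OP_FSYNC;
    fsync_sqe->fd = fd;                 /* no link flag: end of the chain */
    fsync_sqe->fsync_flags = IORING_FSYNC_DATASYNC;
    fsync_sqe->user_data = 2;
}
```

Because a short write counts as an unexpected result, it would break the chain and the fsync would complete with -ECANCELED. An application that wants the fsync to run regardless might consider IOSQE_IO_HARDLINK instead.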
542
543
544 ioprio specifies the I/O priority. See ioprio_get(2) for a description
545 of Linux I/O priorities.
546
547 fd specifies the file descriptor against which the operation will be
548 performed, with the exception noted above.
549
550 If the operation is one of IORING_OP_READ_FIXED or IOR‐
551 ING_OP_WRITE_FIXED, addr and len must fall within the buffer located at
552 buf_index in the fixed buffer array. If the operation is either IOR‐
553 ING_OP_READV or IORING_OP_WRITEV, then addr points to an iovec array of
554 len entries.
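
The difference between how the vectored and the fixed-buffer opcodes interpret addr and len can be seen by filling the two kinds of SQE side by side. This is only a sketch: prep_readv() and prep_read_fixed() are hypothetical helper names, and the fixed-buffer variant assumes that buffers were already registered through io_uring_register(2).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/uio.h>
#include <linux/io_uring.h>

/* Hypothetical helper: vectored read into nr_iov buffers at offset. */
static void prep_readv(struct io_uring_sqe *sqe, int fd,
                       const struct iovec *iov, unsigned int nr_iov,
                       uint64_t offset)
{
    assert(sqe != NULL && iov != NULL);
    memset(sqe, 0, sizeof(*sqe));
    sqe->opcode = IORING_OP_READV;
    sqe->fd = fd;
    sqe->addr = (uint64_t)(uintptr_t)iov;   /* pointer to the iovec array */
    sqe->len = nr_iov;                      /* number of iovec entries */
    sqe->off = offset;
}

/*
 * Hypothetical helper: read nbytes into buf, which must lie entirely
 * inside the registered buffer at index buf_index.
 */
static void prep_read_fixed(struct io_uring_sqe *sqe, int fd, void *buf,
                            unsigned int nbytes, uint64_t offset,
                            unsigned short buf_index)
{
    assert(sqe != NULL && buf != NULL);
    memset(sqe, 0, sizeof(*sqe));
    sqe->opcode = IORING_OP_READ_FIXED;
    sqe->fd = fd;
    sqe->addr = (uint64_t)(uintptr_t)buf;   /* address inside the buffer */
    sqe->len = nbytes;                      /* length in bytes */
    sqe->off = offset;
    sqe->buf_index = buf_index;             /* which registered buffer */
}
```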
555
556 rw_flags, specified for read and write operations, contains a bitwise
557 OR of per-I/O flags, as described in the preadv2(2) man page.
558
559 The fsync_flags bit mask may contain either 0, for a normal file integ‐
560 rity sync, or IORING_FSYNC_DATASYNC to provide data sync only seman‐
561 tics. See the descriptions of O_SYNC and O_DSYNC in the open(2) manual
562 page for more information.
563
564 The bits that may be set in poll_events are defined in <poll.h>, and
565 documented in poll(2).
566
567 user_data is an application-supplied value that will be copied into the
568 completion queue entry (see below). buf_index is an index into an ar‐
569 ray of fixed buffers, and is only valid if fixed buffers were regis‐
570 tered. personality is the credentials id to use for this operation.
571 See io_uring_register(2) for how to register personalities with io_ur‐
572 ing. If set to 0, the current personality of the submitting task is
573 used.
574
575 Once the submission queue entry is initialized, I/O is submitted by
576 placing the index of the submission queue entry into the tail of the
577 submission queue. After one or more indexes are added to the queue,
578 and the queue tail is advanced, the io_uring_enter(2) system call can
579 be invoked to initiate the I/O.
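
The queueing step described above can be sketched as follows. struct sq_view and sq_push() are hypothetical names. In a real program the pointers would come from the ring offsets reported by io_uring_setup(2) after mapping the submission ring. The GCC/Clang __atomic builtins provide the ordering between the application and the kernel.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical view of a mapped submission ring (names are assumptions
 * for this sketch, not kernel or library definitions).
 */
struct sq_view {
    unsigned int *head;     /* consumer index, written by the kernel */
    unsigned int *tail;     /* producer index, written by the app */
    unsigned int mask;      /* number of ring entries minus one */
    unsigned int entries;   /* number of ring entries */
    unsigned int *array;    /* ring of indexes into the SQE array */
};

/*
 * Queue the SQE at sqe_index for submission.
 * Returns 0 on success, or -1 if the submission ring is full.
 */
static int sq_push(struct sq_view *sq, unsigned int sqe_index)
{
    unsigned int head, tail;

    assert(sq != NULL);
    tail = *sq->tail;
    head = __atomic_load_n(sq->head, __ATOMIC_ACQUIRE);
    if (tail - head >= sq->entries)
        return -1;

    sq->array[tail & sq->mask] = sqe_index;
    /* Release: the SQE and its index must be visible before the tail. */
    __atomic_store_n(sq->tail, tail + 1, __ATOMIC_RELEASE);
    return 0;
}
```

After one or more successful sq_push() calls, the application would call io_uring_enter() with to_submit set to the number of entries queued.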
580
581 Completions use the following data structure:
582
/*
 * IO completion data structure (Completion Queue Entry)
 */
struct io_uring_cqe {
    __u64   user_data;  /* sqe->user_data value passed back */
    __s32   res;        /* result code for this event */
    __u32   flags;
};
591
user_data is copied from the field of the same name in the submission
queue entry. The primary use case is to store data that the
application will need to access upon completion of this particular I/O.
The flags field carries buffer selection information when
IOSQE_BUFFER_SELECT is used (see above) and is otherwise reserved for
future use. res is the operation-specific result, but io_uring-specific
errors (e.g. flags or opcode invalid) are also returned through this
field. They are described in section CQE ERRORS.
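
When a request was submitted with IOSQE_BUFFER_SELECT, the ID of the buffer that the kernel picked can be recovered from the flags field of the completion. A minimal sketch, assuming kernel headers from 5.7 or later (cqe_buffer_id() is a hypothetical helper name):

```c
#include <assert.h>
#include <stddef.h>
#include <linux/io_uring.h>

/*
 * Hypothetical helper: return the ID of the provided buffer used for
 * this completion, or -1 if no buffer was selected.
 */
static int cqe_buffer_id(const struct io_uring_cqe *cqe)
{
    assert(cqe != NULL);
    if (!(cqe->flags & IORING_CQE_F_BUFFER))
        return -1;
    return (int)(cqe->flags >> IORING_CQE_BUFFER_SHIFT);
}
```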
598
599 For read and write opcodes, the return values match those documented in
600 the preadv2(2) and pwritev2(2) man pages. Return codes for the io_ur‐
601 ing-specific opcodes are documented in the description of the opcodes
602 above.
603
RETURN VALUE
605 io_uring_enter() returns the number of I/Os successfully consumed.
606 This can be zero if to_submit was zero or if the submission queue was
607 empty.
608
609 The errors related to a submission queue entry will be returned through
610 a completion queue entry (see section CQE ERRORS), rather than through
611 the system call itself.
612
613 Errors that occur not on behalf of a submission queue entry are re‐
614 turned via the system call directly. On such an error, -1 is returned
615 and errno is set appropriately.
616
ERRORS
These are the errors returned by the io_uring_enter() system call.
619
620 EAGAIN The kernel was unable to allocate memory for the request, or
621 otherwise ran out of resources to handle it. The application
622 should wait for some completions and try again.
623
624 EBADF fd is not a valid file descriptor.
625
626 EBADFD fd is a valid file descriptor, but the io_uring ring is not in
627 the right state (enabled). See io_uring_register(2) for details
628 on how to enable the ring.
629
EBUSY  The application is attempting to overcommit the number of
requests it can have pending. The application should wait for some
completions and try again. May occur if the application tries to
queue more requests than there is room for in the CQ ring.
634
635 EINVAL Some bits in the flags argument are invalid.
636
637 EFAULT An invalid user space address was specified for the sig argu‐
638 ment.
639
640 ENXIO The io_uring instance is in the process of being torn down.
641
642 EOPNOTSUPP
643 fd does not refer to an io_uring instance.
644
645 EINTR The operation was interrupted by a delivery of a signal before
646 it could complete; see signal(7). Can happen while waiting for
647 events with IORING_ENTER_GETEVENTS.
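
An application might sort these errors into a few broad reactions. The sketch below encodes one possible policy and nothing more: EAGAIN and EBUSY follow the advice given above (reap completions, then retry), retrying immediately on EINTR is an assumption of this example, and every other error is treated as fatal. The enum and function names are hypothetical.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical classification of io_uring_enter() failures. */
enum enter_action {
    ENTER_RETRY_NOW,        /* call io_uring_enter() again */
    ENTER_REAP_THEN_RETRY,  /* consume completions first, then retry */
    ENTER_FATAL             /* programming or setup error */
};

/* Map an errno value set by io_uring_enter() to an action. */
static enum enter_action classify_enter_errno(int err)
{
    assert(err > 0);
    switch (err) {
    case EINTR:
        return ENTER_RETRY_NOW;
    case EAGAIN:
    case EBUSY:
        return ENTER_REAP_THEN_RETRY;
    default:
        return ENTER_FATAL;
    }
}
```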
648
649
CQE ERRORS
651 These io_uring-specific errors are returned as a negative value in the
652 res field of the completion queue entry.
653
654 EACCES The flags field or opcode in a submission queue entry is not al‐
655 lowed due to registered restrictions. See io_uring_register(2)
656 for details on how restrictions work.
657
658 EBADF The fd field in the submission queue entry is invalid, or the
659 IOSQE_FIXED_FILE flag was set in the submission queue entry, but
660 no files were registered with the io_uring instance.
661
EFAULT The buffer is outside of the process's accessible address space.
663
664 EFAULT IORING_OP_READ_FIXED or IORING_OP_WRITE_FIXED was specified in
665 the opcode field of the submission queue entry, but either buf‐
666 fers were not registered for this io_uring instance, or the ad‐
667 dress range described by addr and len does not fit within the
668 buffer registered at buf_index.
669
670 EINVAL The flags field or opcode in a submission queue entry is in‐
671 valid.
672
673 EINVAL The buf_index member of the submission queue entry is invalid.
674
675 EINVAL The personality field in a submission queue entry is invalid.
676
EINVAL IORING_OP_NOP was specified in the submission queue entry, but
the io_uring context was set up for polling (IORING_SETUP_IOPOLL
was specified in the call to io_uring_setup(2)).
680
681 EINVAL IORING_OP_READV or IORING_OP_WRITEV was specified in the submis‐
682 sion queue entry, but the io_uring instance has fixed buffers
683 registered.
684
685 EINVAL IORING_OP_READ_FIXED or IORING_OP_WRITE_FIXED was specified in
686 the submission queue entry, and the buf_index is invalid.
687
688 EINVAL IORING_OP_READV, IORING_OP_WRITEV, IORING_OP_READ_FIXED, IOR‐
689 ING_OP_WRITE_FIXED or IORING_OP_FSYNC was specified in the sub‐
690 mission queue entry, but the io_uring instance was configured
691 for IOPOLLing, or any of addr, ioprio, off, len, or buf_index
692 was set in the submission queue entry.
693
694 EINVAL IORING_OP_POLL_ADD or IORING_OP_POLL_REMOVE was specified in the
695 opcode field of the submission queue entry, but the io_uring in‐
696 stance was configured for busy-wait polling (IOR‐
697 ING_SETUP_IOPOLL), or any of ioprio, off, len, or buf_index was
698 non-zero in the submission queue entry.
699
700 EINVAL IORING_OP_POLL_ADD was specified in the opcode field of the sub‐
701 mission queue entry, and the addr field was non-zero.
702
703 EOPNOTSUPP
704 opcode is valid, but not supported by this kernel.
705
706 EOPNOTSUPP
707 IOSQE_BUFFER_SELECT was set in the flags field of the submission
708 queue entry, but the opcode doesn't support buffer selection.
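
Because these errors arrive as negated errno values in res, a completion handler typically checks the sign of res before interpreting it. A minimal sketch (cqe_errno() is a hypothetical helper name):

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <linux/io_uring.h>

/*
 * Hypothetical helper: return the positive errno value carried by a
 * failed completion, or 0 if the request succeeded. On success, res
 * holds the operation-specific result (for example, a byte count).
 */
static int cqe_errno(const struct io_uring_cqe *cqe)
{
    assert(cqe != NULL);
    return cqe->res < 0 ? -cqe->res : 0;
}
```

The returned value can be passed to strerror(3) to produce a readable message, for example strerror(cqe_errno(cqe)) when the result is non-zero.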
709
710
711
Linux                             2019-01-22                 IO_URING_ENTER(2)