IO_URING_SETUP(2)          Linux Programmer's Manual         IO_URING_SETUP(2)

NAME

       io_uring_setup - set up a context for performing asynchronous I/O

SYNOPSIS

       #include <linux/io_uring.h>

       int io_uring_setup(u32 entries, struct io_uring_params *p);

DESCRIPTION

       The io_uring_setup() system call sets up a submission queue (SQ) and
       completion queue (CQ) with at least entries entries, and returns a file
       descriptor which can be used to perform subsequent operations on the
       io_uring instance.  The submission and completion queues are shared
       between userspace and the kernel, which eliminates the need to copy
       data when initiating and completing I/O.

       The p argument, a pointer to a struct io_uring_params, is used by the
       application to pass options to the kernel, and by the kernel to convey
       information about the ring buffers.

           struct io_uring_params {
               __u32 sq_entries;
               __u32 cq_entries;
               __u32 flags;
               __u32 sq_thread_cpu;
               __u32 sq_thread_idle;
               __u32 features;
               __u32 resv[4];
               struct io_sqring_offsets sq_off;
               struct io_cqring_offsets cq_off;
           };

       The flags, sq_thread_cpu, and sq_thread_idle fields are used to
       configure the io_uring instance.  flags is a bit mask of 0 or more of
       the following values ORed together:

       IORING_SETUP_IOPOLL
              Perform busy-waiting for an I/O completion, as opposed to
              getting notifications via an asynchronous IRQ (Interrupt
              Request).  The file system (if any) and block device must
              support polling in order for this to work.  Busy-waiting
              provides lower latency, but may consume more CPU resources than
              interrupt-driven I/O.  Currently, this feature is usable only on
              a file descriptor opened using the O_DIRECT flag.  When a read
              or write is submitted to a polled context, the application must
              poll for completions on the CQ ring by calling
              io_uring_enter(2).  It is illegal to mix and match polled and
              non-polled I/O on an io_uring instance.

       IORING_SETUP_SQPOLL
              When this flag is specified, a kernel thread is created to
              perform submission queue polling.  An io_uring instance
              configured in this way enables an application to issue I/O
              without ever context switching into the kernel.  By using the
              submission queue to fill in new submission queue entries and
              watching for completions on the completion queue, the
              application can submit and reap I/Os without doing a single
              system call.

              If the kernel thread is idle for more than sq_thread_idle
              milliseconds, it will set the IORING_SQ_NEED_WAKEUP bit in the
              flags field of the struct io_sq_ring.  When this happens, the
              application must call io_uring_enter(2) to wake the kernel
              thread.  If I/O is kept busy, the kernel thread will never
              sleep.  An application making use of this feature will need to
              guard the io_uring_enter(2) call with the following code
              sequence:

                  /*
                   * Ensure that the wakeup flag is read after the tail pointer has been
                   * written.
                   */
                  smp_mb();
                  if (*sq_ring->flags & IORING_SQ_NEED_WAKEUP)
                      io_uring_enter(fd, 0, 0, IORING_ENTER_SQ_WAKEUP);

              where sq_ring is a submission queue ring set up using the struct
              io_sqring_offsets described below.

              To successfully use this feature, the application must register
              a set of files to be used for I/O through io_uring_register(2)
              using the IORING_REGISTER_FILES opcode.  Failure to do so will
              result in submitted I/O being errored with EBADF.

       IORING_SETUP_SQ_AFF
              If this flag is specified, then the poll thread will be bound to
              the CPU set in the sq_thread_cpu field of the struct
              io_uring_params.  This flag is only meaningful when
              IORING_SETUP_SQPOLL is specified.

       IORING_SETUP_CQSIZE
              Create the completion queue with struct
              io_uring_params.cq_entries entries.  The value must be greater
              than entries, and may be rounded up to the next power of two.

       If no flags are specified, the io_uring instance is set up for
       interrupt-driven I/O.  I/O may be submitted using io_uring_enter(2) and
       can be reaped by polling the completion queue.
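
       The smp_mb() call in the IORING_SETUP_SQPOLL sequence above is a
       kernel-internal macro, so applications need their own barrier.  A
       minimal userspace sketch, assuming a C11 sequentially consistent fence
       is an acceptable stand-in; the helper name sq_needs_wakeup() and its
       pointer argument are illustrative and not part of the io_uring API:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <linux/io_uring.h>   /* IORING_SQ_NEED_WAKEUP */

/*
 * Hypothetical helper: returns true if the SQ poll thread has gone idle
 * and must be woken through io_uring_enter(2) with IORING_ENTER_SQ_WAKEUP.
 * sq_flags is assumed to point at the flags word of the mapped SQ ring,
 * i.e. the mapping address plus sq_off.flags.
 */
static bool sq_needs_wakeup(const unsigned *sq_flags)
{
    /*
     * Full fence standing in for smp_mb(): the flags word must be read
     * only after the new tail value has been stored.
     */
    atomic_thread_fence(memory_order_seq_cst);

    /* Volatile read so the compiler reloads the kernel-updated word. */
    return (*(const volatile unsigned *)sq_flags & IORING_SQ_NEED_WAKEUP) != 0;
}
```

       A caller would publish the new tail first, then call io_uring_enter(2)
       with IORING_ENTER_SQ_WAKEUP only when this helper returns true.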

       The resv array must be initialized to zero.

       features is filled in by the kernel and specifies the features
       supported by the running kernel version.  It is a bit mask of the
       following values:

       IORING_FEAT_SINGLE_MMAP
              If this flag is set, the SQ and CQ rings can be mapped with a
              single mmap(2) call.  The SQEs must still be allocated
              separately.  This brings the necessary mmap(2) calls down from
              three to two.

       IORING_FEAT_NODROP
              If this flag is set, io_uring supports never dropping completion
              events.  If a completion event occurs and the CQ ring is full,
              the kernel stores the event internally until such a time that
              the CQ ring has room for more entries.  If this overflow
              condition is entered, attempting to submit more I/O will fail
              with the -EBUSY error value if the kernel can't flush the
              overflowed events to the CQ ring.  If this happens, the
              application must reap events from the CQ ring and attempt the
              submit again.

       IORING_FEAT_SUBMIT_STABLE
              If this flag is set, applications can be certain that any data
              for async offload has been consumed when the kernel has consumed
              the SQE.

       IORING_FEAT_RW_CUR_POS
              If this flag is set, applications can specify offset == -1 with
              IORING_OP_{READV,WRITEV}, IORING_OP_{READ,WRITE}_FIXED, and
              IORING_OP_{READ,WRITE} to mean current file position, which
              behaves like preadv2(2) and pwritev2(2) with offset == -1.  The
              request will use (and update) the current file position.  This
              comes with the caveat that if the application has multiple reads
              or writes in flight, then the end result will not be as
              expected.  This is similar to threads sharing a file descriptor
              and doing I/O using the current file position.

       IORING_FEAT_CUR_PERSONALITY
              If this flag is set, then io_uring guarantees that both sync and
              async execution of a request assume the credentials of the task
              that called io_uring_enter(2) to queue the requests.  If this
              flag isn't set, then requests are issued with the credentials of
              the task that originally registered the io_uring.  If only one
              task is using a ring, then this flag doesn't matter as the
              credentials will always be the same.  Note that this is the
              default behavior; tasks can still register different
              personalities through io_uring_register(2) with
              IORING_REGISTER_PERSONALITY and specify the personality to use
              in the sqe.
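
       Because features is a bit mask, an application can test individual
       bits once io_uring_setup() has returned.  The following is a sketch
       only; the helper names ring_has_feature() and ring_mmap_calls() are
       made up for illustration:

```c
#include <stdbool.h>
#include <linux/io_uring.h>   /* struct io_uring_params, IORING_FEAT_* */

/* Illustrative helper: true if the kernel reported feature bit `feat`. */
static bool ring_has_feature(const struct io_uring_params *p, unsigned feat)
{
    return (p->features & feat) != 0;
}

/*
 * Illustrative helper: number of mmap(2) calls needed to map the rings
 * and the SQE array.  With IORING_FEAT_SINGLE_MMAP the SQ and CQ rings
 * share one mapping; the SQE array always needs its own.
 */
static int ring_mmap_calls(const struct io_uring_params *p)
{
    return ring_has_feature(p, IORING_FEAT_SINGLE_MMAP) ? 2 : 3;
}
```

       Checking the bits at run time lets one binary work on kernels with
       and without a given feature.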

       The rest of the fields in the struct io_uring_params are filled in by
       the kernel, and provide the information necessary to memory map the
       submission queue, completion queue, and the array of submission queue
       entries.  sq_entries specifies the number of submission queue entries
       allocated.  sq_off describes the offsets of various ring buffer fields:

           struct io_sqring_offsets {
               __u32 head;
               __u32 tail;
               __u32 ring_mask;
               __u32 ring_entries;
               __u32 flags;
               __u32 dropped;
               __u32 array;
               __u32 resv[3];
           };

       Taken together, sq_entries and sq_off provide all of the information
       necessary for accessing the submission queue ring buffer and the
       submission queue entry array.  The submission queue can be mapped with
       a call like:

           ptr = mmap(0, sq_off.array + sq_entries * sizeof(__u32),
                      PROT_READ|PROT_WRITE, MAP_SHARED|MAP_POPULATE,
                      ring_fd, IORING_OFF_SQ_RING);

       where sq_off is the io_sqring_offsets structure, and ring_fd is the
       file descriptor returned from io_uring_setup(2).  The addition of
       sq_off.array to the length of the region accounts for the fact that the
       ring is located at the end of the data structure.  As an example, the
       ring buffer head pointer can be accessed by adding sq_off.head to the
       address returned from mmap(2):

           head = ptr + sq_off.head;
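
       If ptr is a void pointer, the addition above relies on a GNU C
       extension.  A small sketch of the same size and address calculations
       in standard C; the helper names sq_ring_bytes() and ring_field() are
       illustrative only:

```c
#include <stddef.h>
#include <linux/io_uring.h>   /* struct io_uring_params, __u32 */

/* Illustrative helper: length to pass to mmap(2) for the SQ ring. */
static size_t sq_ring_bytes(const struct io_uring_params *p)
{
    return p->sq_off.array + (size_t)p->sq_entries * sizeof(__u32);
}

/*
 * Illustrative helper: address of a 32-bit ring field located `off`
 * bytes from the start of the mapping.  Casting to char * first keeps
 * the pointer arithmetic within standard C.
 */
static unsigned *ring_field(void *ring_ptr, unsigned off)
{
    return (unsigned *)((char *)ring_ptr + off);
}
```

       With these, the head pointer from the example would be obtained as
       ring_field(ptr, sq_off.head).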

       The flags field is used by the kernel to communicate state information
       to the application.  Currently, it is used to inform the application
       when a call to io_uring_enter(2) is necessary.  See the documentation
       for the IORING_SETUP_SQPOLL flag above.  The dropped member is
       incremented for each invalid submission queue entry encountered in the
       ring buffer.

       The head and tail track the ring buffer state.  The tail is incremented
       by the application when submitting new I/O, and the head is incremented
       by the kernel when the I/O has been successfully submitted.
       Determining the index of the head or tail into the ring is accomplished
       by applying a mask:

           index = tail & ring_mask;
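
       The masking step can be sketched as follows, assuming the ring size is
       a power of two (so ring_mask equals ring_entries minus one) and that
       head and tail are free-running 32-bit counters.  The helper names are
       illustrative:

```c
/* Illustrative helper: slot in the ring for a free-running counter. */
static unsigned ring_index(unsigned counter, unsigned ring_mask)
{
    return counter & ring_mask;
}

/*
 * Illustrative helper: number of entries between head and tail.
 * Unsigned subtraction stays correct when the counters wrap around.
 */
static unsigned ring_pending(unsigned head, unsigned tail)
{
    return tail - head;
}
```

       For a ring of 8 entries (ring_mask of 7), a tail value of 9 maps to
       slot 1.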

       The array of submission queue entries is mapped with:

           sqentries = mmap(0, sq_entries * sizeof(struct io_uring_sqe),
                            PROT_READ|PROT_WRITE, MAP_SHARED|MAP_POPULATE,
                            ring_fd, IORING_OFF_SQES);

       The completion queue is described by cq_entries and cq_off shown here:

           struct io_cqring_offsets {
               __u32 head;
               __u32 tail;
               __u32 ring_mask;
               __u32 ring_entries;
               __u32 overflow;
               __u32 cqes;
               __u32 resv[4];
           };

       The completion queue is simpler, since the entries are not separated
       from the queue itself, and can be mapped with:

           ptr = mmap(0, cq_off.cqes + cq_entries * sizeof(struct io_uring_cqe),
                      PROT_READ|PROT_WRITE, MAP_SHARED|MAP_POPULATE, ring_fd,
                      IORING_OFF_CQ_RING);
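
       Because the completion entries live inside the CQ mapping, locating
       one is a matter of offset arithmetic.  A sketch under that assumption;
       cq_ring_bytes() and cq_entry() are hypothetical helper names, and the
       caller is assumed to have read ring_mask from the mapping already:

```c
#include <stddef.h>
#include <linux/io_uring.h>   /* struct io_uring_params, struct io_uring_cqe */

/* Illustrative helper: length to pass to mmap(2) for the CQ ring. */
static size_t cq_ring_bytes(const struct io_uring_params *p)
{
    return p->cq_off.cqes +
           (size_t)p->cq_entries * sizeof(struct io_uring_cqe);
}

/*
 * Illustrative helper: address of the completion entry for counter value
 * `head`.  cq_ptr is the start of the CQ mapping, cqes_off is cq_off.cqes.
 */
static struct io_uring_cqe *cq_entry(void *cq_ptr, unsigned cqes_off,
                                     unsigned head, unsigned ring_mask)
{
    struct io_uring_cqe *cqes =
        (struct io_uring_cqe *)((char *)cq_ptr + cqes_off);

    return &cqes[head & ring_mask];
}
```

       An application would read the entry at the current head, then advance
       the head to hand the slot back to the kernel.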

       Closing the file descriptor returned by io_uring_setup(2) will free all
       resources associated with the io_uring context.
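
       The setup and teardown steps can be sketched as below.  glibc provides
       no io_uring_setup() wrapper, so this sketch goes through syscall(2)
       and assumes the installed kernel headers define __NR_io_uring_setup.
       The names ring_setup() and ring_teardown() are illustrative:

```c
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/io_uring.h>   /* struct io_uring_params */

/*
 * Illustrative wrapper: zero the parameter block (resv must be zero),
 * set the requested flags, and invoke the system call directly.
 * Returns a ring file descriptor, or -1 with errno set.
 */
static int ring_setup(unsigned entries, unsigned flags,
                      struct io_uring_params *p)
{
    memset(p, 0, sizeof(*p));
    p->flags = flags;
    return (int)syscall(__NR_io_uring_setup, entries, p);
}

/* Illustrative helper: closing the descriptor releases the context. */
static int ring_teardown(int ring_fd)
{
    return close(ring_fd);
}
```

       On a kernel without io_uring support, or in a sandbox that blocks the
       call, ring_setup() returns -1 and errno describes the failure.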

RETURN VALUE

       io_uring_setup(2) returns a new file descriptor on success.  The
       application may then provide the file descriptor in a subsequent
       mmap(2) call to map the submission and completion queues, or to the
       io_uring_register(2) or io_uring_enter(2) system calls.

       On error, -1 is returned and errno is set appropriately.

ERRORS

       EFAULT p is outside your accessible address space.

       EINVAL The resv array contains non-zero data, p.flags contains an
              unsupported flag, entries is out of bounds, IORING_SETUP_SQ_AFF
              was specified but IORING_SETUP_SQPOLL was not, or
              IORING_SETUP_CQSIZE was specified but io_uring_params.cq_entries
              was invalid.

       EMFILE The per-process limit on the number of open file descriptors has
              been reached (see the description of RLIMIT_NOFILE in
              getrlimit(2)).

       ENFILE The system-wide limit on the total number of open files has been
              reached.

       ENOMEM Insufficient kernel resources are available.

       EPERM  IORING_SETUP_SQPOLL was specified, but the effective user ID of
              the caller did not have sufficient privileges.

SEE ALSO

       io_uring_register(2), io_uring_enter(2)



Linux                             2019-01-29                 IO_URING_SETUP(2)