BLKIO(3)                                                               BLKIO(3)
2
3
4

NAME

6       blkio - Block device I/O library
7

DESCRIPTION

9       libblkio is a library for accessing data stored on block devices. Block
10       devices offer persistent data storage and are addressable in fixed-size
11       units  called  blocks.  Block  sizes of 4 KiB or 512 bytes are typical.
12       Hard disk drives, solid state disks (SSDs), USB mass  storage  devices,
13       and other types of hardware are block devices.
14
15       The  focus  of libblkio is on fast I/O for multi-threaded applications.
16       Management of block devices, including partitioning  and  resizing,  is
17       outside the scope of the library.
18
19       Block  devices have one or more queues for submitting I/O requests such
20       as reads and writes. Block devices  process  I/O  requests  from  their
21       queues  and produce a return code for each completed request indicating
22       success or an error.
23
24       The application is responsible for thread-safety.  No  thread  synchro‐
25       nization  is  necessary when a queue is only used from a single thread.
26       Proper synchronization is required when sharing a queue between  multi‐
27       ple threads.
28
29       libblkio  can  be used in blocking, event-driven, and polling modes de‐
30       pending on the architecture of the application and its performance  re‐
31       quirements.
32
33       Blocking  mode  suspends  the execution of the current thread until the
34       request completes. This is the most natural way of writing programs that
35       perform a sequence of I/O requests but cannot exploit request parallel‐
36       ism.
37
38       Event-driven mode provides a completion file descriptor that the appli‐
39       cation  can  monitor  from its event loop. This allows multiple I/O re‐
40       quests to be in flight simultaneously and the application  can  respond
41       to other events while waiting for completions.
42
43       Polling mode also supports multiple in-flight requests but the applica‐
44       tion continuously checks for completions, typically from a tight  loop,
45       in order to minimize latency.
46
47       libblkio contains drivers for several block I/O interfaces. This allows
48       applications using libblkio to access different block devices through a
49       single API.
50
51   Creating a blkio instance
52       A  struct  blkio  instance  is  created  from a specific driver such as
53       "io_uring" as follows:
54
55          struct blkio *b;
56          int ret;
57
58          ret = blkio_create("io_uring", &b);
59          if (ret < 0) {
60              fprintf(stderr, "%s: %s\n", strerror(-ret), blkio_get_error_msg());
61              return;
62          }
63
64       For a list of available drivers, see the DRIVERS section below.
65
66   Error messages
67       Functions generally return 0 on success and a negative  errno(3)  value
68       on failure. In the latter case, a per-thread error message is also set
69       and can be obtained as a const char * by calling blkio_get_error_msg().
70
71       Note that these messages are not stable and may change in between back‐
72       ward-compatible  libblkio  releases. The same applies to returned errno
73       values, unless a specific value is explicitly documented for a particu‐
74       lar error condition.
75
76   Connecting to a block device
77       Connection  details for a block device are specified by setting proper‐
78       ties on the blkio instance. The  available  properties  depend  on  the
79       driver.  For  example,  the io_uring driver's "path" property is set to
80       /dev/sdb to access a local disk:
81
82          int ret = blkio_set_str(b, "path", "/dev/sdb");
83          if (ret < 0) {
84              fprintf(stderr, "%s: %s\n", strerror(-ret), blkio_get_error_msg());
85              blkio_destroy(&b);
86              return;
87          }
88
89       Once the connection details have been specified the blkio instance  can
90       be connected to the block device with blkio_connect():
91
92          ret = blkio_connect(b);
93
94   Starting a block device
95       After the blkio instance is connected, properties are available to con‐
96       figure its operation and query device characteristics such as the maxi‐
97       mum number of queues. See PROPERTIES for details.
98
99       For example, the number of queues can be set as follows:
100
101          ret = blkio_set_int(b, "num-queues", 4);
102
103       Once  configuration  is  complete  the  blkio  instance is started with
104       blkio_start():
105
106          ret = blkio_start(b);
107
108   Mapping memory regions
109       Memory containing I/O data buffers must be "mapped"  before  submitting
110       requests that touch the memory when the "needs-mem-regions" property is
111       true.  Otherwise mapping memory is optional but doing  so  may  improve
112       performance.
113
114       Memory  regions  are  mapped  globally  for  the blkio instance and are
115       available to all queues. A memory region is represented as follows:
116
117          struct blkio_mem_region
118          {
119              void *addr;
120              uint64_t iova;
121              size_t len;
122              int64_t fd_offset;
123              int fd;
124              uint32_t flags;
125          };
126
127       The addr field contains the starting address of the memory region.  Re‐
128       quests  transfer data between the block device and a subset of the mem‐
129       ory region, including  up  to  the  entire  memory  region.  Individual
130       read/write  requests or readv/writev request segments (iovecs) must not
131       access more than one memory region. Multiple requests  can  access  the
132       same  memory  region simultaneously, although usually with non-overlap‐
133       ping areas.
134
135       The addr field must be a multiple of the  "mem-region-alignment"  prop‐
136       erty.
137
138       The iova field is reserved and must be zero.
139
140       The len field is the size of the memory region in bytes. The value must
141       be a multiple of the "mem-region-alignment" property.
142
143       The fd field is the file descriptor for the memory region. Some drivers
144       require  that  I/O data buffers are located in file-backed memory. This
145       can be anonymous memory from memfd_create(2) rather than an actual file
146       on disk.  If the "needs-mem-region-fd" property is true then this field
147       must be a valid file descriptor. If the property is  false  this  field
148       may be -1.
149
150       The fd_offset field is the byte offset from the start of the file given
151       in fd.
152
153       The flags field is reserved and must be zero.
154
155       The application can either allocate I/O data  buffers  itself  and  de‐
156       scribe  them  with  struct  blkio_mem_region  or  it  can use blkio_al‐
157       loc_mem_region() and blkio_free_mem_region() to allocate  memory  suit‐
158       able for I/O data buffers:
159
160          int blkio_alloc_mem_region(struct blkio *b, struct blkio_mem_region *region,
161                                     size_t len);
162          void blkio_free_mem_region(struct blkio *b,
163                                     const struct blkio_mem_region *region);
164
165       The  len  argument  is the number of bytes to allocate. These functions
166       may only be called after the blkio instance has been started.
167
168       File descriptors for memory regions  created  with  blkio_alloc_mem_re‐
169       gion() are automatically closed across execve(2).
170
171       Memory  regions can be mapped and unmapped after the blkio instance has
172       been started using the blkio_map_mem_region()  and  blkio_unmap_mem_re‐
173       gion() functions:
174
175          int blkio_map_mem_region(struct blkio *b,
176                                   const struct blkio_mem_region *region);
177          void blkio_unmap_mem_region(struct blkio *b,
178                                      const struct blkio_mem_region *region);
179
180       These  functions  must  not be called while requests are in flight that
181       access the affected memory region. Memory  regions  must  not  overlap.
182       Memory  regions  must  be  unmapped/freed  with exactly the same region
183       field values that they were mapped/allocated with.
184
185       blkio_map_mem_region() does  not  take  ownership  of  region->fd.  The
186       caller may close region->fd after blkio_map_mem_region() returns.
187
188       blkio_map_mem_region()  returns  an  error if called on a memory region
189       that is already mapped against  the  given  blkio.  blkio_unmap_mem_re‐
190       gion()  has no effect when called on a memory region that is not mapped
191       against the given blkio.
192
193       blkio_free_mem_region() must not be called on a memory region that  was
194       mapped but not unmapped.
195
196       For  best  performance  applications should map memory regions once and
197       reuse them instead of changing memory regions frequently.
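
       For example, a typical setup allocates one region after blkio_start()
       and maps it once (a sketch; the 1 MiB size is an arbitrary choice):

          struct blkio_mem_region region;

          ret = blkio_alloc_mem_region(b, &region, 1024 * 1024);
          if (ret < 0) ...

          ret = blkio_map_mem_region(b, &region);
          if (ret < 0) ...

          /* I/O data buffers can now be carved out of the region */
          void *buf = region.addr;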
198
199       The "max-mem-regions" property gives the maximum number of  memory  re‐
200       gions that can be mapped.
201
202       Memory  regions  are  automatically  unmapped  when  blkio_destroy() is
203       called, and memory regions allocated using blkio_alloc_mem_region() are
204       freed.
205
206   Performing I/O
207       Once  at  least one memory region has been mapped, the queues are ready
208       for request processing. The following example  reads  4096  bytes  from
209       byte offset 0x10000:
210
211          struct blkioq *q = blkio_get_queue(b, 0);
212
213          blkioq_read(q, 0x10000, buf, buf_size, NULL, 0);
214
215          struct blkio_completion completion;
216          ret = blkioq_do_io(q, &completion, 1, 1, NULL);
217          if (ret != 1) ...
218          if (completion.ret != 0) ...
219
220       This  is  an  example of blocking mode where blkioq_do_io() waits until
221       the I/O request completes. See below for details  on  event-driven  and
222       polling modes.
223
224       The blkioq_do_io() function offers the following arguments:
225
226          int blkioq_do_io(struct blkioq *q,
227                           struct blkio_completion *completions,
228                           int min_completions,
229                           int max_completions,
230                           struct timespec *timeout);
231
232       The  completions  argument  is  a pointer to an array that is filled in
233       with completions when the function returns. When max_completions  is  0
234       completions   may  be  NULL.  Completions  are  represented  by  struct
235       blkio_completion:
236
237          struct blkio_completion
238          {
239              void *user_data;
240              const char *error_msg;
241              int ret;
242              /* reserved space */
243          };
244
245       The user_data field is the same pointer passed to blkioq_read() in  the
246       example  above.  Applications  that  submit  multiple  requests can use
247       user_data to correlate completions to previously submitted requests.
248
249       The ret field is the return code for the I/O request in negative  errno
250       representation. This field is 0 on success.
251
252       For  some  errors,  the  error_msg field points to a message describing
253       what caused the request to fail. Note that this may be NULL even if ret
254       is not 0, and is always NULL when ret is 0.
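
       For example, a failed completion might be reported as follows (a
       sketch):

          if (completion.ret != 0) {
              fprintf(stderr, "request failed: %s%s%s\n",
                      strerror(-completion.ret),
                      completion.error_msg ? ": " : "",
                      completion.error_msg ? completion.error_msg : "");
          }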
255
256       Note that these messages are not stable and may change in between back‐
257       ward-compatible libblkio releases. The same applies to the errno values
258       returned  through ret, unless a specific value is explicitly documented
259       for a particular error condition.
260
261       struct blkio_completion also includes some reserved space which may  be
262       used to add more fields in the future in a backward-compatible manner.
263
264       The remaining arguments of blkioq_do_io() are as follows:
265
266       The min_completions argument controls how many completions to wait for.
267       A value greater than 0 causes the function to block until the number of
268       completions  has been reached. A value of 0 causes the function to sub‐
269       mit I/O and return completions that have already occurred without wait‐
270       ing  for  more. If greater than the number of currently outstanding re‐
271       quests, blkioq_do_io() fails with -EINVAL.
272
273       The max_completions argument is the maximum number of completions ele‐
274       ments to fill in. This value must be greater than or equal to min_comple‐
275       tions.
276
277       The timeout argument specifies the maximum amount of time to  wait  for
278       completions.  The function returns -ETIME if the timeout expires before
279       a request completes. If timeout is NULL  the  function  blocks  indefi‐
280       nitely. When timeout is non-NULL the elapsed time is subtracted and the
281       struct timespec is updated when the function returns regardless of suc‐
282       cess or failure.
283
284       The  return value is the number of completions elements filled in. This
285       value is within the inclusive range [min_completions,  max_completions]
286       on success or a negative errno on failure.
287
288       A blkioq_do_io_interruptible() variant is also available:
289
290          int blkioq_do_io_interruptible(struct blkioq *q,
291                                         struct blkio_completion *completions,
292                                         int min_completions,
293                                         int max_completions,
294                                         struct timespec *timeout,
295                                         const sigset_t *sig);
296
297       Unlike  blkioq_do_io(), this function can be interrupted by signals and
298       return -EINTR. The sig argument temporarily sets the signal mask of the
299       process  while  waiting  for completions, which allows the thread to be
300       woken by a signal without race conditions. To ensure this  function  is
301       interrupted when a signal is received, (1) the said signal must be in a
302       blocked state when invoking the function (see sigprocmask(2)) and (2) a
303       signal mask unblocking that signal must be given as the sig argument.
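
       The following sketch waits for a completion while allowing SIGUSR1 to
       interrupt the wait (the choice of SIGUSR1 is only an example):

          sigset_t block_mask, wait_mask;

          /* Block SIGUSR1 so it can only be delivered while waiting below */
          sigemptyset(&block_mask);
          sigaddset(&block_mask, SIGUSR1);
          sigprocmask(SIG_BLOCK, &block_mask, &wait_mask);

          /* wait_mask, the previous mask, is assumed not to block SIGUSR1 */
          ret = blkioq_do_io_interruptible(q, &completion, 1, 1, NULL, &wait_mask);
          if (ret == -EINTR) {
              /* interrupted by SIGUSR1 */
          }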
304
305   Event-driven mode
306       Completion  processing  can be integrated into the event loop of an ap‐
307       plication so that other activity can take place while I/O is in flight.
308       Each  queue  has  a  completion file descriptor that is returned by the
309       following function:
310
311          int blkioq_get_completion_fd(struct blkioq *q);
312
313       The returned file descriptor becomes readable when blkioq_do_io() needs
314       to be called again. Spurious events can occur, causing the fd to become
315       readable even if there are no new completions available.
316
317       The returned file descriptor has O_NONBLOCK set.  The  application  may
318       switch the file descriptor to blocking mode.
319
320       By  default,  the  driver  might not generate completion events for re‐
321       quests so it is necessary to explicitly enable the completion file  de‐
322       scriptor before use:
323
324          void blkioq_set_completion_fd_enabled(struct blkioq *q, bool enable);
325
326       Changes  made  using  this function apply also to requests that are al‐
327       ready in flight but not yet completed. Note  that  even  after  calling
328       this function with enabled as false, the driver may still generate com‐
329       pletion events.
330
331       The application must read 8 bytes from the completion  file  descriptor
332       to  reset  the event before calling blkioq_do_io(). The contents of the
333       bytes are undefined and should not be interpreted by the application.
334
335       The following example demonstrates event-driven I/O:
336
337          struct blkioq *q = blkio_get_queue(b, 0);
338          int completion_fd = blkioq_get_completion_fd(q);
339          char event_data[8];
340
341          /* Switch to blocking mode for read(2) below */
342          fcntl(completion_fd, F_SETFL,
343                fcntl(completion_fd, F_GETFL) & ~O_NONBLOCK);
344
345          /* Enable completion events */
346          blkioq_set_completion_fd_enabled(q, true);
347
348          blkioq_read(q, 0x10000, buf, buf_size, NULL, 0);
349
350          /* Since min_completions = 0 we will submit but not wait */
351          ret = blkioq_do_io(q, NULL, 0, 0, NULL);
352          if (ret != 0) ...
353
354          /* Wait for the next event on the completion file descriptor */
355          struct blkio_completion completion;
356          do {
357            read(completion_fd, event_data, sizeof(event_data));
358            ret = blkioq_do_io(q, &completion, 0, 1, NULL);
359          } while (ret == 0);
360          if (ret != 1) ...
361          if (completion.ret != 0) ...
362
363       This example uses a blocking read(2) to wait and consume the next event
364       on  the  completion file descriptor. Because spurious events can occur,
365       it then checks if there actually is a  completion  available,  retrying
366       read(2) otherwise.
367
368       Normally  completion_fd  would  be registered with an event loop so the
369       application can perform other tasks while waiting.
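
       For example, with epoll(7) the completion file descriptor might be
       registered as follows (a sketch with error handling omitted):

          struct epoll_event ev = { .events = EPOLLIN, .data.ptr = q };
          int epfd = epoll_create1(0);

          epoll_ctl(epfd, EPOLL_CTL_ADD, completion_fd, &ev);

          /* In the event loop: when completion_fd becomes readable, read the
             8-byte event and call blkioq_do_io() with min_completions = 0. */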
370
371       Applications may save CPU cycles by  suppressing  completion  file  de‐
372       scriptor  notifications while processing completions. This optimization
373       avoids an unnecessary application event loop iteration  and  completion
374       file  descriptor  read when additional completions arrive while the ap‐
375       plication is processing completions:
376
377          static void process_completions(...)
378          {
379              int ret;
380
381              /* Suppress completion fd notifications while we process completions */
382              blkioq_set_completion_fd_enabled(q, false);
383
384              do {
385                  struct blkio_completion completion;
386                  ret = blkioq_do_io(q, &completion, 0, 1, NULL);
387
388                  if (ret == 0) {
389                      blkioq_set_completion_fd_enabled(q, true);
390
391                      /* Re-check for completions to avoid race */
392                      ret = blkioq_do_io(q, &completion, 0, 1, NULL);
393                      if (ret == 1) {
394                          blkioq_set_completion_fd_enabled(q, false);
395                      }
396                  }
397
398                  if (ret < 0) {
399                      ... /* error */
400                  }
401
402                  if (ret == 1) {
403                      ... /* process completion */
404                  }
405              } while (ret == 1);
406          }
407
408   Application-level polling mode
409       Waiting for completions using blkioq_do_io() with min_completions  >  0
410       can  cause  the  current thread to be descheduled by the operating sys‐
411       tem's scheduler.  The same is true when waiting for events on the  com‐
412       pletion  file  descriptor  returned by blkioq_get_completion_fd(). Some
413       applications require consistent low response times and therefore cannot
414       risk being descheduled.
415
416       blkioq_do_io()  may  be called from a CPU polling loop with min_comple‐
417       tions = 0 to check for completions:
418
419          struct blkioq *q = blkio_get_queue(b, 0);
420
421          blkioq_read(q, 0x10000, buf, buf_size, NULL, 0);
422
423          /* Busy-wait for the completion */
424          struct blkio_completion completion;
425          do {
426              ret = blkioq_do_io(q, &completion, 0, 1, NULL);
427          } while (ret == 0);
428
429          if (ret != 1) ...
430          if (completion.ret != 0) ...
431
432       This approach is ideal for applications that need to poll several event
433       sources  simultaneously, or that need to intersperse polling with other
434       application logic. Otherwise, driver-level polling (see below) may lead
435       to further performance gains.
436
437   Driver-level polling mode (poll queues)
438       Poll  queues  differ  from the "regular" queues presented above in that
439       calling blkioq_do_io() with min_completions > 0 causes libblkio  itself
440       (or other lower layers) to poll for completions. This can be more effi‐
441       cient than repeatedly invoking blkioq_do_io() with min_completions =  0
442       on  a  "regular"  queue.  For  instance, with the io_uring driver, poll
443       queues cause the kernel itself to poll for  completions,  avoiding  re‐
444       peated context switching while polling.
445
446       A  limitation  of poll queues is that the CPU thread is occupied with a
447       single poll queue and cannot detect other events in the  meantime  such
448       as network I/O or application events. Applications wishing to poll mul‐
449       tiple things simultaneously may prefer to use application-level polling
450       (see above).
451
452       Poll  queue  support  is contingent on the particular driver and driver
453       configuration being used. To determine whether a given  blkio  supports
454       poll queues, check the "supports-poll-queues" property:
455
456          bool supports_poll_queues;
457          ret = blkio_get_bool(b, "supports-poll-queues", &supports_poll_queues);
458          if (ret != 0) ...
459
460          if (!supports_poll_queues) {
461              fprintf(stderr, "Poll queues not supported\n");
462              return;
463          }
464
465       It  is possible for poll queues not to support flush, write zeroes, and
466       discard requests, even if "regular" queues of the same blkio  do.  How‐
467       ever,  read,  write,  readv,  and writev requests are always supported.
468       There is currently no mechanism to check which types  of  requests  are
469       supported by poll queues.
470
471       To  use  poll  queues, set the "num-poll-queues" property to a positive
472       value before calling blkio_start(), then use blkio_get_poll_queue()  to
473       retrieve the poll queues. A single blkio can have both "regular" queues
474       and poll queues:
475
476          ...
477          ret = blkio_connect(b);
478          if (ret != 0) ...
479
480          ret = blkio_set_int(b, "num-queues", 1);
481          ret = blkio_set_int(b, "num-poll-queues", 1);
482          if (ret != 0) ...
483
484          ret = blkio_start(b);
485          if (ret != 0) ...
486
487          struct blkioq *q      = blkio_get_queue(b, 0);
488          struct blkioq *poll_q = blkio_get_poll_queue(b, 0);
489
490       It  is  possible  to  set  property  "num-queues"  to  0  as  long   as
491       "num-poll-queues" is positive.
492
493       Poll  queues also differ from "regular" queues in that they do not have
494       a completion fd. blkioq_get_completion_fd() returns -1 when called on a
495       poll  queue, and blkioq_set_completion_fd_enabled() has no effect. Fur‐
496       ther, blkioq_do_io_interruptible() is not currently supported  on  poll
497       queues.
498
499       Note  that  you  can  still  perform  application-level polling on poll
500       queues by repeatedly calling blkioq_do_io() with min_completions  =  0,
501       but this will lead to suboptimal performance.
502
503   Dynamically adding and removing queues
504       Some  drivers  have support for adding queues on demand after the blkio
505       instance is already started:
506
507          int index = blkio_add_queue(b); /* or blkio_add_poll_queue() */
508          if (index < 0) ...
509
510          struct blkioq *q = blkio_get_queue(b, index); /* or blkio_get_poll_queue() */
511
512       The "can-add-queues" property determines  whether  this  is  supported.
513       When it is, the blkio instance can be started with 0 queues.
514
515       In  addition,  all drivers allow explicitly removing queues, regardless
516       of  whether   those   queues   were   created   by   blkio_start()   or
517       blkio_add_queue() / blkio_add_poll_queue():
518
519          assert(blkio_get_queue(b, 0) != NULL);
520          assert(blkio_get_queue(b, 1) != NULL);
521
522          /* blkio_remove_queue() will return 0, indicating success */
523          assert(blkio_remove_queue(b, 0) == 0);
524
525          /* Other queues' indices are not shifted, so q will be non-NULL and valid */
526          struct blkioq *q = blkio_get_queue(b, 1);
527          assert(q != NULL);
528
529          /* blkio_remove_queue() will return -ENOENT, since queue 0 no longer exists */
530          assert(blkio_remove_queue(b, 0) == -ENOENT);
531
532       Once a queue is removed, any struct blkioq * pointing to it becomes in‐
533       valid.
534
535   Request types
536       The following types of I/O requests are available:
537
538          void blkioq_read(struct blkioq *q, uint64_t start, void *buf, size_t len,
539                           void *user_data, uint32_t flags);
540          void blkioq_write(struct blkioq *q, uint64_t start, void *buf, size_t len,
541                            void *user_data, uint32_t flags);
542          void blkioq_readv(struct blkioq *q, uint64_t start, struct iovec *iovec,
543                            int iovcnt, void *user_data, uint32_t flags);
544          void blkioq_writev(struct blkioq *q, uint64_t start, struct iovec *iovec,
545                             int iovcnt, void *user_data, uint32_t flags);
546          void blkioq_write_zeroes(struct blkioq *q, uint64_t start, uint64_t len,
547                                   void *user_data, uint32_t flags);
548          void blkioq_discard(struct blkioq *q, uint64_t start, uint64_t len,
549                              void *user_data, uint32_t flags);
550          void blkioq_flush(struct blkioq *q, void *user_data, uint32_t flags);
551
552       The block device may see requests as soon as these functions are
553       called, but blkioq_do_io() must be called to ensure requests are seen.
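
       For example, several requests can be queued back to back and then
       submitted with a single blkioq_do_io() call (a sketch; buf_a and buf_b
       are hypothetical I/O data buffers):

          blkioq_read(q, 0x0000, buf_a, 4096, (void *)1, 0);
          blkioq_read(q, 0x1000, buf_b, 4096, (void *)2, 0);

          struct blkio_completion completions[2];
          ret = blkioq_do_io(q, completions, 2, 2, NULL);
          if (ret != 2) ...

          /* completions[i].user_data identifies which request completed */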
554
555       If property "needs-mem-regions" is true, I/O data buffers pointed to by
556       buf and iovec must be within  regions  mapped  using  blkio_map_mem_re‐
557       gion().
558
559       The  application  must  not free the iovec elements until the request's
560       completion is returned by blkioq_do_io().
561
562       All  drivers  are  guaranteed  to  support  at   least   blkioq_read(),
563       blkioq_write(),  blkioq_readv(),  blkioq_writev(),  and blkioq_flush().
564       When attempting to queue a request that the driver  does  not  support,
565       the request itself fails and its completion's ret field is -ENOTSUP.
566
567       blkioq_read() and blkioq_readv() read data from the block device at byte
568       offset start. blkioq_write() and blkioq_writev() write data to the block
569       device at byte offset start. The I/O data buffer length is len bytes for
570       blkioq_read() and blkioq_write(), or the total size of the iovec elements
571       for blkioq_readv() and blkioq_writev(). start and the length of the I/O
572       data buffer must be a multiple of the "request-alignment" property. I/O
573       data buffer addresses and lengths, including buf and individual iovec
574       elements, must be multiples of the "buf-alignment" property.
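
       For example, a vectored write might look as follows (a sketch; hdr_buf
       and data_buf are hypothetical buffers assumed to satisfy the alignment
       properties above and, if required, to lie within mapped memory
       regions):

          struct iovec iov[] = {
              { .iov_base = hdr_buf,  .iov_len = 4096 },
              { .iov_base = data_buf, .iov_len = 8192 },
          };

          /* Write 12288 bytes starting at byte offset 0x4000 */
          blkioq_writev(q, 0x4000, iov, 2, NULL, 0);
          ret = blkioq_do_io(q, &completion, 1, 1, NULL);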
575
576       blkioq_write_zeroes() causes zeros to be written to the  specified  re‐
577       gion.   When   supported,   this  may  be  more  efficient  than  using
578       blkioq_write() with a zero-filled buffer.
579
580       blkioq_discard() causes data in the specified region to  be  discarded.
581       Subsequent reads to the same region return unspecified data until it is
582       written to again. Note that discarded data  is  not  guaranteed  to  be
583       erased and may still be returned by reads.
584
585       blkioq_flush() persists completed writes to the storage medium. Data is
586       persistent once the flush request completes successfully.  Applications
587       that  need  to  ensure that data persists across power failure or crash
588       must submit flush requests at appropriate points.
589
590       The  user_data  pointer  is  returned  in  the   struct   blkio_comple‐
591       tion::user_data field by blkioq_do_io(). It allows applications to cor‐
592       relate a completion with its request.
593
594       No ordering guarantees are defined for requests that are in flight  si‐
595       multaneously. For example, a flush request is not guaranteed to persist
596       in-flight write requests. Instead the application must wait  for  write
597       requests   that  it  wishes  to  persist  to  complete  before  calling
598       blkioq_flush().
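
       For example, to persist a single write the application waits for the
       write's completion before queuing the flush (a sketch):

          blkioq_write(q, 0x10000, buf, buf_size, NULL, 0);
          ret = blkioq_do_io(q, &completion, 1, 1, NULL); /* wait for the write */
          if (ret != 1 || completion.ret != 0) ...

          blkioq_flush(q, NULL, 0);
          ret = blkioq_do_io(q, &completion, 1, 1, NULL); /* wait for the flush */
          if (ret != 1 || completion.ret != 0) ...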
599
600       Similarly, there are no ordering guarantees between multiple queues  of
601       a  block  device.  Multi-threaded applications that rely on an ordering
602       between multiple queues must wait for the first request to complete  on
603       one  queue,  synchronize  threads as needed, and then submit the second
604       request on the other queue.
605
606   Request flags
607       The following request flags are available:
608
609       BLKIO_REQ_FUA
610              Ensures that data written by  this  request  reaches  persistent
611              storage  before  the request is completed. This is also known as
612              Full Unit Access (FUA). This flag eliminates the need for a sep‐
613              arate blkioq_flush() call after the request has completed. Other
614              data  that  was  previously  successfully  written  without  the
615              BLKIO_REQ_FUA  flag is not necessarily persisted by this flag as
616              it is only guaranteed to affect the current  request.  Supported
617              by blkioq_write() and blkioq_writev().
618
619       BLKIO_REQ_NO_UNMAP
620              Ensures  that  blkioq_write_zeroes()  does  not cause underlying
621              storage space to be deallocated,  guaranteeing  that  subsequent
622              writes to the same region do not fail due to lack of space.
623
624       BLKIO_REQ_NO_FALLBACK
625              Ensures that blkioq_write_zeroes() does not resort to performing
626              regular write requests with zero-filled buffers. If  that  would
627              otherwise  be  the  case  and this flag is set, then the request
628              fails and its completion's ret field is -ENOTSUP.
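
       For example, a write whose data must be durable as soon as its
       completion is seen can set BLKIO_REQ_FUA directly, avoiding a separate
       flush request (a sketch):

          blkioq_write(q, 0x10000, buf, buf_size, NULL, BLKIO_REQ_FUA);
          ret = blkioq_do_io(q, &completion, 1, 1, NULL);
          if (ret != 1 || completion.ret != 0) ...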
629

PROPERTIES

631       The configuration of blkio instances is done through property accesses.
632       Each  property  has a name and a type (bool, int, str, uint64). Proper‐
633       ties may be read-only (r), write-only (w), or read/write (rw).
634
635       Access to properties depends on the blkio instance state  (created/con‐
636       nected/started).  A  property  may be read/write in the connected state
637       but read-only in the started state. This is written as "rw connected, r
638       started".
639
640       The following property APIs are available:
641
642          int blkio_get_bool(struct blkio *b, const char *name, bool *value);
643          int blkio_get_int(struct blkio *b, const char *name, int *value);
644          int blkio_get_uint64(struct blkio *b, const char *name, uint64_t *value);
645          int blkio_get_str(struct blkio *b, const char *name, char **value);
646
647          int blkio_set_bool(struct blkio *b, const char *name, bool value);
648          int blkio_set_int(struct blkio *b, const char *name, int value);
649          int blkio_set_uint64(struct blkio *b, const char *name, uint64_t value);
650          int blkio_set_str(struct blkio *b, const char *name, const char *value);
651
652       blkio_get_str()  assigns  to  *value and the caller must use free(3) to
653       deallocate the memory.
654
655       blkio_get_str() automatically converts to string representation if  the
656       property  is  not  a  str.  blkio_set_str() automatically converts from
657       string representation if the property is not a str. This can be used to
658       easily   fetch  values  from  and  store  values  to  an  application's
659       text-based configuration file or command-line. Aside  from  this  auto‐
660       matic conversion, the other property APIs fail with ENOTTY if the prop‐
661       erty does not have the right type.
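
       For example, the device capacity can be read either as a uint64 or, via
       the automatic conversion, as a string (a sketch; the "capacity"
       property is readable once the instance is connected):

          uint64_t capacity;
          ret = blkio_get_uint64(b, "capacity", &capacity);
          if (ret < 0) ...

          char *capacity_str;
          ret = blkio_get_str(b, "capacity", &capacity_str);
          if (ret == 0) {
              printf("capacity: %s bytes\n", capacity_str);
              free(capacity_str);
          }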
662
663       The following properties are common across all drivers. Driver-specific
664       properties are documented in DRIVERS.
665
666   Properties available after blkio_create()
667       can-add-queues (bool, r created/connected/started)
668              Whether  the  driver  supports  dynamically  adding  queues with
669              blkio_add_queue() / blkio_add_poll_queue().
670
671       driver (str, r created/connected/started)
672              The driver name that was passed to blkio_create().  See  DRIVERS
673              for details on available drivers.
674
675       read-only (bool, rw created, r connected/started)
676              If  true,  requests  other than read and flush fail with -EBADF.
677              The default is false.
678
679   Properties available after blkio_connect()
680       DEVICE AND QUEUES
681
682          capacity (uint64, r connected/started)
683                 The size of the block device in bytes.
684
685          max-queues (int, r connected/started)
686                 The maximum number of queues, including poll queues if any.
687
688          num-queues (int, rw connected, r started)
689                 The number of queues. The default is 1.
690
691          num-poll-queues (int, rw connected, r started)
692                 The number of poll queues. The default is 0. If set to a pos‐
693                 itive  value  and  property  "supports-poll-queues" is false,
694                 blkio_start() will fail.
695
696          supports-poll-queues (bool, r connected/started)
697                 Whether the driver supports poll queues.
698
699       MEMORY REGIONS
700
701          max-mem-regions (uint64, r connected/started)
702                 The maximum number of memory regions that can  be  mapped  at
703                 any given time.
704
705          may-pin-mem-regions (bool, r connected/started)
706                 Will  the driver sometimes pin memory region pages and there‐
707                 fore prevent madvise(MADV_DONTNEED) and related syscalls from
708                 working?
709
710          mem-region-alignment (uint64, r connected/started)
711                 The alignment requirement, in bytes, for the addr, iova, and
712                 len fields of struct blkio_mem_region. This is always a multiple
713                 of the "buf-alignment" property.
714
715          needs-mem-regions (bool, r connected/started)
716                 Is  it necessary to map memory regions with blkio_map_mem_re‐
717                 gion()?
718
719          needs-mem-region-fd (bool, r connected/started)
720                 Is it necessary to provide a file descriptor for each  memory
721                 region?
722
723       ALL REQUESTS
724
725          optimal-io-alignment (int, r connected/started)
726                 The  ideal number of bytes of request start and length align‐
727                 ment for maximizing performance. This is a  multiple  of  the
728                 "request-alignment" property.
729
730          optimal-io-size (int, r connected/started)
731                 The ideal request length in bytes for achieving high through‐
732                 put. Can be 0 if unspecified. Otherwise, this is  a  multiple
733                 of the "optimal-io-alignment" property.
734
735          request-alignment (int, r connected/started)
736                 All request starts and lengths must be multiples of this
737                 value. Often this value is 512 bytes.
738
739       READ AND WRITE REQUESTS
740
741          buf-alignment (int, r connected/started)
742                 I/O data buffer memory address and length alignment,  includ‐
743                 ing  plain  void  *buf  buffers  and iovec segments. Note the
744                 "mem-region-alignment" property is always a multiple of  this
745                 value.
746
747          can-grow (bool, r connected/started)
748                 If false, blkioq_read(), blkioq_readv(), blkioq_write() and
749                 blkioq_writev() will fail if an attempt to read/write beyond
750                 EOF is made. Otherwise, reads will succeed and the portion of
751                 the read buffer that overruns EOF will be filled with zeros,
752                 and writes will increase the device's capacity.
753
754          max-segments (int, r connected/started)
755                 The maximum iovcnt in a request.
756
757          max-segment-len (int, r connected/started)
758                 The  maximum size of each iovec in a request. Can be 0 if un‐
759                 specified.
760
761          max-transfer (int, r connected/started)
762                 The maximum read or write request length in bytes. Can  be  0
763                 if unspecified.
764
765          optimal-buf-alignment (int, r connected/started)
766                 The  ideal  number of bytes of I/O data buffer memory address
767                 and length alignment, including plain void *buf  buffers  and
768                 iovec segments.
769
770          supports-fua-natively (bool, r connected/started)
771                 Whether   blkioq_write()   and  blkioq_writev()  support  the
772                 BLKIO_REQ_FUA flag natively, as opposed to  emulating  it  by
773                 internally performing a flush request after the write.
774
775       WRITE ZEROES REQUESTS
776
777          max-write-zeroes-len (uint64, r connected/started)
778                 The maximum length of a write zeroes request in bytes. Can be
779                 0 if unspecified.
780
781       DISCARD REQUESTS
782
783          discard-alignment (int, r connected/started)
784                 Discard request start and length, after subtracting the value
785                 of  the "discard-alignment-offset" property, must be a multi‐
786                 ple of this value. This may or may not be 0  if  discard  re‐
787                 quests are not supported. If not 0, this is a multiple of the
788                 "request-alignment" property.
789
790          discard-alignment-offset (int, r connected/started)
791                 Offset of the first block that may be discarded. This may  be
792                 non-zero, for example, when the device is a partition that is
793                 not aligned to the value of the "discard-alignment" property.
794                 This  may  or  may  not be 0 if discard requests are not sup‐
795                 ported. If not 0, this is a multiple of  the  "request-align‐
796                 ment"  property,  and  is  less  than the "discard-alignment"
797                 property.
798
799          max-discard-len (uint64, r connected/started)
800                 The maximum length of a discard request in bytes. Can be 0 if
801                 unspecified.
802

DRIVERS

804   io_uring
805       The  io_uring  driver  uses the Linux io_uring system call interface to
806       perform I/O on files and block device nodes.  Both  regular  files  and
807       block device nodes are supported.
808
809       Note that io_uring was introduced in Linux kernel version 5.1, and ker‐
810       nels may also be configured to disable io_uring.  If  io_uring  is  not
811       available, blkio_create() fails with -ENOSYS when using this driver.
812
813       When performing I/O on regular files, write zeroes requests that extend
814       past the end-of-file may or may not update the file size. This is  left
815       unspecified and the user must not rely on any particular behavior.
816
817       This  driver supports poll queues only when using O_DIRECT on block de‐
818       vices or file systems that support polling. Its poll queues never  sup‐
819       port flush, write zeroes, or discard requests.
820
821       Driver-specific properties available after blkio_create()
822
823          direct (bool, rw created, r connected/started)
824                 True  to  bypass the page cache with O_DIRECT. The default is
825                 false.
826
827          fd (int, rw created, r connected/started)
828                 An existing open file descriptor for the file or block device
829                 node. Ownership of the file descriptor is passed to the li‐
830                 brary when blkio_connect() succeeds.
831
832                 If this property is set, properties "direct" and  "read-only"
833                 have  no  effect  and it is the user's responsibility to open
834                 the file with the desired flags.   Further,  during  connect,
835                 those  two  properties are updated to reflect the file status
836                 flags of the given file descriptor.
837
838          path (str, rw created, r connected/started)
839                 The file system path of the file or block device node.
840
841                 If this property is set, property "fd" must not  be  set  and
842                 will  be  updated  on  connect to reflect the opened file de‐
843                 scriptor. Note that the file descriptor is owned by libblkio.
844
845       Driver-specific properties available after blkio_connect()
846
847          num-entries (int, rw connected, r started)
848                 The minimum number of entries that each  io_uring  submission
849                 queue and completion queue should have. The default is 128.
850
851                 A larger value allows more requests to be in flight, but con‐
852                 sumes more resources. Tuning this value  can  affect  perfor‐
853                 mance.
854
855                 io_uring  imposes a maximum on this number: 32768 as of main‐
856                 line kernel 5.18, and 4096 prior to 5.4. If this  maximum  is
857                 exceeded, blkio_start() will fail with -EINVAL.
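
       A typical configuration sequence for this driver might look as follows
       (a sketch; the path, O_DIRECT setting, and ring size are arbitrary
       examples):

          struct blkio *b;
          int ret;

          ret = blkio_create("io_uring", &b);
          if (ret < 0) ...

          ret = blkio_set_str(b, "path", "/dev/sdb");
          if (ret < 0) ...

          ret = blkio_set_bool(b, "direct", true);
          if (ret < 0) ...

          ret = blkio_connect(b);
          if (ret < 0) ...

          ret = blkio_set_int(b, "num-entries", 256);
          if (ret < 0) ...

          ret = blkio_start(b);
          if (ret < 0) ...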
858
859   nvme-io_uring
860       The  nvme-io_uring  driver  submits  NVMe  commands directly to an NVMe
861       namespace using io_uring passthrough, which is available since mainline
862       Linux kernel 5.19.
863
864       The  process must have the CAP_SYS_ADMIN capability to use this driver,
865       and the NVMe namespace must use the NVM command set.
866
867       Driver-specific properties available after blkio_create()
868
869          fd (int, rw created, r connected/started)
870                 An existing open file descriptor  for  the  NVMe  namespace's
871                 character  device  (e.g.,  /dev/ng0n1). Ownership of the file
872                 descriptor is passed to the library when blkio_connect()
873                 succeeds.
874
875          path (str, rw created, r connected/started)
876                 A  path  to  the  NVMe  namespace's  character  device (e.g.,
877                 /dev/ng0n1).
878
879                 If this property is set, property "fd" must not  be  set  and
880                 will  be  updated  on  connect to reflect the opened file de‐
881                 scriptor. Note that the file descriptor is owned by libblkio.
882
883       Driver-specific properties available after blkio_connect()
884
885          num-entries (int, rw connected, r started)
886                 The minimum number of entries that each  io_uring  submission
887                 queue and completion queue should have. The default is 128.
888
889                 A larger value allows more requests to be in flight, but con‐
890                 sumes more resources. Tuning this value  can  affect  perfor‐
891                 mance.
892
893                 io_uring  imposes a maximum on this number: 32768 as of main‐
894                 line kernel 5.18, and 4096 prior to 5.4. If this  maximum  is
895                 exceeded, blkio_start() will fail with -EINVAL.
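
       For example (a sketch; the character device path matches the example
       above):

          ret = blkio_create("nvme-io_uring", &b);
          if (ret < 0) ...

          ret = blkio_set_str(b, "path", "/dev/ng0n1");
          if (ret < 0) ...

          ret = blkio_connect(b);
          if (ret < 0) ...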
896
897   virtio-blk-...
898       The following virtio-blk drivers are provided:
899
900       • The virtio-blk-vfio-pci driver uses VFIO to control a PCI vir‐
901         tio-blk device.
902
903       • The virtio-blk-vhost-user driver connects as a client to a  Unix  do‐
904         main  socket provided by a vhost-user-blk backend (e.g. exported from
905         qemu-storage-daemon).
906
907       • The virtio-blk-vhost-vdpa driver uses the vhost-vdpa kernel interface
908         to perform I/O on a vDPA device. The vDPA device can be implemented in
909         software (VDUSE, in-kernel, simulator) or in hardware.
910
911       These drivers always support poll queues, and their poll queues support
912       all types of requests.
913
914       The following properties apply to all these drivers.
915
916       Driver-specific properties available after blkio_create()
917
918          path (str, rw created, r connected/started)
919
920                 • virtio-blk-vfio-pci:  The  file system path of the device's
921                   sysfs directory, e.g., /sys/bus/pci/devices/0000:00:01.0.
922
923                 • virtio-blk-vhost-user:  The  file  system   path   of   the
924                   vhost-user socket to connect to.
925
926                 • virtio-blk-vhost-vdpa:   The   file   system  path  of  the
927                   vhost-vdpa character device to connect to.
928
929       Driver-specific properties available after blkio_connect()
930
931          max-queue-size (int, r connected/started)
932                 The maximum queue size supported by the device.
933
934          queue-size (int, rw connected, r started)
935                 The queue size to configure the device with. The  default  is
936                 256. A larger value allows more requests to be in flight, but
937                 consumes more resources.  Tuning this value can  affect  per‐
938                 formance.
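
       For example, connecting to a vhost-user-blk backend might look as
       follows (a sketch; the socket path and queue size are hypothetical
       examples):

          ret = blkio_create("virtio-blk-vhost-user", &b);
          if (ret < 0) ...

          ret = blkio_set_str(b, "path", "/tmp/vhost-user-blk.sock");
          if (ret < 0) ...

          ret = blkio_connect(b);
          if (ret < 0) ...

          ret = blkio_set_int(b, "queue-size", 128);
          if (ret < 0) ...

          ret = blkio_start(b);
          if (ret < 0) ...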
939

BUILD SYSTEM INTEGRATION

941       pkg-config is the recommended way to build a program with libblkio:
942
943          $ cc -o app app.c `pkg-config blkio --cflags --libs`
944
945       Meson projects can use pkg-config as follows:
946
947          blkio = dependency('blkio')
948          executable('app', 'app.c', dependencies : [blkio])
949

FREQUENTLY ASKED QUESTIONS

951   Can network storage drivers be added?
952       Maybe.  The API was designed with a synchronous control path. Functions
953       like blkio_get_uint64() must  return  quickly.  Operations  on  network
954       storage can take an unbounded amount of time (in the absence of a time‐
955       out mechanism) and are not a good fit for synchronous APIs. A more com‐
956       plex  asynchronous  control  path  API  could be added for applications
957       wishing to use network storage drivers in the future.
958
959   Can non-Linux operating systems be supported in the future?
960       Maybe. No attempt has been made to restrict the library to  POSIX  fea‐
961       tures  only  and most drivers are platform-specific. If there is demand
962       for supporting other operating systems and developers willing  to  work
963       on it then it may be possible.
964
965   Can a Linux AIO driver be added?
966       Linux  AIO  could  serve as a fallback on systems where io_uring is not
967       available.  However, io_submit(2) can block the process and this causes
968       performance problems in event-driven applications that require that the
969       event loop does not block. Unless Linux AIO is  fixed  it  is  unlikely
970       that a proposal to add a driver will be accepted.
971

SEE ALSO

973       io_uring_setup(2), io_setup(2), aio(7)
974
975
976
977
978                                                                      BLKIO(3)