BLKIO(3)                                                              BLKIO(3)

NAME
       blkio - Block device I/O library

DESCRIPTION
       libblkio is a library for accessing data stored on block devices.
       Block devices offer persistent data storage and are addressable in
       fixed-size units called blocks. Block sizes of 4 KiB or 512 bytes
       are typical. Hard disk drives, solid state disks (SSDs), USB mass
       storage devices, and other types of hardware are block devices.

       The focus of libblkio is on fast I/O for multi-threaded
       applications. Management of block devices, including partitioning
       and resizing, is outside the scope of the library.

       Block devices have one or more queues for submitting I/O requests
       such as reads and writes. Block devices process I/O requests from
       their queues and produce a return code for each completed request
       indicating success or an error.

       The application is responsible for thread-safety. No thread
       synchronization is necessary when a queue is only used from a
       single thread. Proper synchronization is required when sharing a
       queue between multiple threads.

       libblkio can be used in blocking, event-driven, and polling modes
       depending on the architecture of the application and its
       performance requirements.

       Blocking mode suspends the execution of the current thread until
       the request completes. This is the most natural way of writing
       programs that perform a sequence of I/O requests but cannot exploit
       request parallelism.

       Event-driven mode provides a completion file descriptor that the
       application can monitor from its event loop. This allows multiple
       I/O requests to be in flight simultaneously and the application can
       respond to other events while waiting for completions.

       Polling mode also supports multiple in-flight requests but the
       application continuously checks for completions, typically from a
       tight loop, in order to minimize latency.

       libblkio contains drivers for several block I/O interfaces. This
       allows applications using libblkio to access different block
       devices through a single API.

   Creating a blkio instance
       A struct blkio instance is created from a specific driver such as
       "io_uring" as follows:

           struct blkio *b;
           int ret;

           ret = blkio_create("io_uring", &b);
           if (ret < 0) {
               fprintf(stderr, "%s: %s\n", strerror(-ret), blkio_get_error_msg());
               return;
           }

       For a list of available drivers, see the DRIVERS section below.

   Error messages
       Functions generally return 0 on success and a negative errno(3)
       value on failure. In the latter case, a per-thread error message is
       also set and can be obtained as a const char * by calling
       blkio_get_error_msg().

       Note that these messages are not stable and may change in between
       backward-compatible libblkio releases. The same applies to returned
       errno values, unless a specific value is explicitly documented for
       a particular error condition.

   Connecting to a block device
       Connection details for a block device are specified by setting
       properties on the blkio instance. The available properties depend
       on the driver. For example, the io_uring driver's "path" property
       is set to /dev/sdb to access a local disk:

           int ret = blkio_set_str(b, "path", "/dev/sdb");
           if (ret < 0) {
               fprintf(stderr, "%s: %s\n", strerror(-ret), blkio_get_error_msg());
               blkio_destroy(&b);
               return;
           }

       Once the connection details have been specified the blkio instance
       can be connected to the block device with blkio_connect():

           ret = blkio_connect(b);

   Starting a block device
       After the blkio instance is connected, properties are available to
       configure its operation and query device characteristics such as
       the maximum number of queues. See PROPERTIES for details.

       For example, the number of queues can be set as follows:

           ret = blkio_set_int(b, "num-queues", 4);

       Once configuration is complete the blkio instance is started with
       blkio_start():

           ret = blkio_start(b);

   Mapping memory regions
       Memory containing I/O data buffers must be "mapped" before
       submitting requests that touch the memory when the
       "needs-mem-regions" property is true. Otherwise mapping memory is
       optional but doing so may improve performance.

       Memory regions are mapped globally for the blkio instance and are
       available to all queues. A memory region is represented as follows:

           struct blkio_mem_region
           {
               void *addr;
               uint64_t iova;
               size_t len;
               int64_t fd_offset;
               int fd;
               uint32_t flags;
           };

       The addr field contains the starting address of the memory region.
       Requests transfer data between the block device and a subset of the
       memory region, including up to the entire memory region. Individual
       read/write requests or readv/writev request segments (iovecs) must
       not access more than one memory region. Multiple requests can
       access the same memory region simultaneously, although usually with
       non-overlapping areas.

       The addr field must be a multiple of the "mem-region-alignment"
       property.

       The iova field is reserved and must be zero.

       The len field is the size of the memory region in bytes. The value
       must be a multiple of the "mem-region-alignment" property.

       The fd field is the file descriptor for the memory region. Some
       drivers require that I/O data buffers are located in file-backed
       memory. This can be anonymous memory from memfd_create(2) rather
       than an actual file on disk. If the "needs-mem-region-fd" property
       is true then this field must be a valid file descriptor. If the
       property is false this field may be -1.

       The fd_offset field is the byte offset from the start of the file
       given in fd.

       The flags field is reserved and must be zero.
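
       As an illustration of the fields above, the following sketch
       describes a region backed by anonymous memory from memfd_create(2).
       The make_region() helper and the fixed length are hypothetical, not
       part of libblkio; real code must size the region according to the
       "mem-region-alignment" property and obtain the struct definition
       from the libblkio header:

```c
#define _GNU_SOURCE
#include <err.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* struct blkio_mem_region as shown above, repeated here so the sketch
 * is self-contained; normally it is provided by the libblkio header. */
struct blkio_mem_region {
    void *addr;
    uint64_t iova;
    size_t len;
    int64_t fd_offset;
    int fd;
    uint32_t flags;
};

/* Hypothetical helper: describe a region of len bytes backed by
 * anonymous memory from memfd_create(2). len is assumed to satisfy
 * the "mem-region-alignment" property. */
static struct blkio_mem_region make_region(size_t len)
{
    int fd = memfd_create("blkio-buffers", MFD_CLOEXEC);
    if (fd < 0 || ftruncate(fd, len) < 0)
        err(1, "memfd_create");

    void *addr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED,
                      fd, 0);
    if (addr == MAP_FAILED)
        err(1, "mmap");

    return (struct blkio_mem_region) {
        .addr = addr,   /* multiple of "mem-region-alignment" */
        .iova = 0,      /* reserved, must be zero */
        .len = len,     /* multiple of "mem-region-alignment" */
        .fd_offset = 0, /* region starts at the beginning of the file */
        .fd = fd,       /* valid fd when "needs-mem-region-fd" is true */
        .flags = 0,     /* reserved, must be zero */
    };
}
```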

       The application can either allocate I/O data buffers itself and
       describe them with struct blkio_mem_region or it can use
       blkio_alloc_mem_region() and blkio_free_mem_region() to allocate
       memory suitable for I/O data buffers:

           int blkio_alloc_mem_region(struct blkio *b, struct blkio_mem_region *region,
                                      size_t len);
           void blkio_free_mem_region(struct blkio *b,
                                      const struct blkio_mem_region *region);

       The len argument is the number of bytes to allocate. These
       functions may only be called after the blkio instance has been
       started.

       File descriptors for memory regions created with
       blkio_alloc_mem_region() are automatically closed across execve(2).

       Memory regions can be mapped and unmapped after the blkio instance
       has been started using the blkio_map_mem_region() and
       blkio_unmap_mem_region() functions:

           int blkio_map_mem_region(struct blkio *b,
                                    const struct blkio_mem_region *region);
           void blkio_unmap_mem_region(struct blkio *b,
                                       const struct blkio_mem_region *region);

       These functions must not be called while requests are in flight
       that access the affected memory region. Memory regions must not
       overlap. Memory regions must be unmapped/freed with exactly the
       same region field values that they were mapped/allocated with.

       blkio_map_mem_region() does not take ownership of region->fd. The
       caller may close region->fd after blkio_map_mem_region() returns.
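
       Putting these calls together, a typical lifecycle looks like the
       following sketch (buf_size is an assumed length satisfying
       "mem-region-alignment"; error handling is abbreviated):

```c
struct blkio_mem_region region;

ret = blkio_alloc_mem_region(b, &region, buf_size);
if (ret < 0)
    return; /* error */

ret = blkio_map_mem_region(b, &region);
if (ret < 0) {
    blkio_free_mem_region(b, &region);
    return; /* error */
}

/* ... submit I/O using buffers inside the region ... */

/* Unmap only once no in-flight request touches the region */
blkio_unmap_mem_region(b, &region);
blkio_free_mem_region(b, &region);
```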

       blkio_map_mem_region() returns an error if called on a memory
       region that is already mapped against the given blkio.
       blkio_unmap_mem_region() has no effect when called on a memory
       region that is not mapped against the given blkio.

       blkio_free_mem_region() must not be called on a memory region that
       was mapped but not unmapped.

       For best performance applications should map memory regions once
       and reuse them instead of changing memory regions frequently.

       The "max-mem-regions" property gives the maximum number of memory
       regions that can be mapped.

       Memory regions are automatically unmapped when blkio_destroy() is
       called, and memory regions allocated using
       blkio_alloc_mem_region() are freed.

   Performing I/O
       Once at least one memory region has been mapped, the queues are
       ready for request processing. The following example reads 4096
       bytes from byte offset 0x10000:

           struct blkioq *q = blkio_get_queue(b, 0);

           blkioq_read(q, 0x10000, buf, buf_size, NULL, 0);

           struct blkio_completion completion;
           ret = blkioq_do_io(q, &completion, 1, 1, NULL);
           if (ret != 1) ...
           if (completion.ret != 0) ...

       This is an example of blocking mode where blkioq_do_io() waits
       until the I/O request completes. See below for details on
       event-driven and polling modes.

       The blkioq_do_io() function offers the following arguments:

           int blkioq_do_io(struct blkioq *q,
                            struct blkio_completion *completions,
                            int min_completions,
                            int max_completions,
                            struct timespec *timeout);

       The completions argument is a pointer to an array that is filled
       in with completions when the function returns. When
       max_completions is 0 completions may be NULL. Completions are
       represented by struct blkio_completion:

           struct blkio_completion
           {
               void *user_data;
               const char *error_msg;
               int ret;
               /* reserved space */
           };

       The user_data field is the same pointer passed to blkioq_read() in
       the example above. Applications that submit multiple requests can
       use user_data to correlate completions to previously submitted
       requests.

       The ret field is the return code for the I/O request in negative
       errno representation. This field is 0 on success.

       For some errors, the error_msg field points to a message
       describing what caused the request to fail. Note that this may be
       NULL even if ret is not 0, and is always NULL when ret is 0.

       Note that these messages are not stable and may change in between
       backward-compatible libblkio releases. The same applies to the
       errno values returned through ret, unless a specific value is
       explicitly documented for a particular error condition.

       struct blkio_completion also includes some reserved space which
       may be used to add more fields in the future in a
       backward-compatible manner.

       The remaining arguments of blkioq_do_io() are as follows:

       The min_completions argument controls how many completions to wait
       for. A value greater than 0 causes the function to block until the
       number of completions has been reached. A value of 0 causes the
       function to submit I/O and return completions that have already
       occurred without waiting for more. If greater than the number of
       currently outstanding requests, blkioq_do_io() fails with -EINVAL.

       The max_completions argument is the maximum number of completions
       elements to fill in. This value must be greater than or equal to
       min_completions.

       The timeout argument specifies the maximum amount of time to wait
       for completions. The function returns -ETIME if the timeout
       expires before a request completes. If timeout is NULL the
       function blocks indefinitely. When timeout is non-NULL the elapsed
       time is subtracted and the struct timespec is updated when the
       function returns regardless of success or failure.

       The return value is the number of completions elements filled in.
       This value is within the inclusive range [min_completions,
       max_completions] on success or a negative errno on failure.
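
       For example, the following sketch submits pending requests and
       waits up to one second for at least one of up to 8 completions
       (the batch size of 8 is an arbitrary choice for illustration):

```c
struct blkio_completion completions[8];
struct timespec timeout = { .tv_sec = 1, .tv_nsec = 0 };

ret = blkioq_do_io(q, completions, 1, 8, &timeout);
if (ret == -ETIME) {
    /* no request completed within one second; timeout now
     * holds the remaining time */
} else if (ret < 0) {
    /* other error */
} else {
    for (int i = 0; i < ret; i++) {
        /* completions[i].user_data identifies each request */
    }
}
```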

       A blkioq_do_io_interruptible() variant is also available:

           int blkioq_do_io_interruptible(struct blkioq *q,
                                          struct blkio_completion *completions,
                                          int min_completions,
                                          int max_completions,
                                          struct timespec *timeout,
                                          const sigset_t *sig);

       Unlike blkioq_do_io(), this function can be interrupted by signals
       and return -EINTR. The sig argument temporarily sets the signal
       mask of the process while waiting for completions, which allows
       the thread to be woken by a signal without race conditions. To
       ensure this function is interrupted when a signal is received, (1)
       that signal must be blocked when invoking the function (see
       sigprocmask(2)) and (2) a signal mask unblocking that signal must
       be given as the sig argument.
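
       The two conditions can be sketched as follows, using SIGUSR1 as an
       example signal:

```c
sigset_t block_set, wait_mask;

/* (1) Block SIGUSR1 so it can only be delivered while waiting */
sigemptyset(&block_set);
sigaddset(&block_set, SIGUSR1);
pthread_sigmask(SIG_BLOCK, &block_set, &wait_mask);

/* (2) The mask used while waiting has SIGUSR1 unblocked */
sigdelset(&wait_mask, SIGUSR1);

struct blkio_completion completion;
ret = blkioq_do_io_interruptible(q, &completion, 1, 1, NULL, &wait_mask);
if (ret == -EINTR) {
    /* woken by SIGUSR1 before a completion arrived */
}
```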

   Event-driven mode
       Completion processing can be integrated into the event loop of an
       application so that other activity can take place while I/O is in
       flight. Each queue has a completion file descriptor that is
       returned by the following function:

           int blkioq_get_completion_fd(struct blkioq *q);

       The returned file descriptor becomes readable when blkioq_do_io()
       needs to be called again. Spurious events can occur, causing the
       fd to become readable even if there are no new completions
       available.

       The returned file descriptor has O_NONBLOCK set. The application
       may switch the file descriptor to blocking mode.

       By default, the driver might not generate completion events for
       requests so it is necessary to explicitly enable the completion
       file descriptor before use:

           void blkioq_set_completion_fd_enabled(struct blkioq *q, bool enable);

       Changes made using this function apply also to requests that are
       already in flight but not yet completed. Note that even after
       calling this function with enable as false, the driver may still
       generate completion events.

       The application must read 8 bytes from the completion file
       descriptor to reset the event before calling blkioq_do_io(). The
       contents of the bytes are undefined and should not be interpreted
       by the application.

       The following example demonstrates event-driven I/O:

           struct blkioq *q = blkio_get_queue(b, 0);
           int completion_fd = blkioq_get_completion_fd(q);
           char event_data[8];

           /* Switch to blocking mode for read(2) below */
           fcntl(completion_fd, F_SETFL,
                 fcntl(completion_fd, F_GETFL) & ~O_NONBLOCK);

           /* Enable completion events */
           blkioq_set_completion_fd_enabled(q, true);

           blkioq_read(q, 0x10000, buf, buf_size, NULL, 0);

           /* Since min_completions = 0 we will submit but not wait */
           ret = blkioq_do_io(q, NULL, 0, 0, NULL);
           if (ret != 0) ...

           /* Wait for the next event on the completion file descriptor */
           struct blkio_completion completion;
           do {
               read(completion_fd, event_data, sizeof(event_data));
               ret = blkioq_do_io(q, &completion, 0, 1, NULL);
           } while (ret == 0);
           if (ret != 1) ...
           if (completion.ret != 0) ...

       This example uses a blocking read(2) to wait for and consume the
       next event on the completion file descriptor. Because spurious
       events can occur, it then checks whether a completion is actually
       available, retrying read(2) otherwise.

       Normally completion_fd would be registered with an event loop so
       the application can perform other tasks while waiting.

       Applications may save CPU cycles by suppressing completion file
       descriptor notifications while processing completions. This
       optimization avoids an unnecessary application event loop
       iteration and completion file descriptor read when additional
       completions arrive while the application is processing
       completions:

           static void process_completions(...)
           {
               int ret;

               /* Suppress completion fd notifications while we process completions */
               blkioq_set_completion_fd_enabled(q, false);

               do {
                   struct blkio_completion completion;
                   ret = blkioq_do_io(q, &completion, 0, 1, NULL);

                   if (ret == 0) {
                       blkioq_set_completion_fd_enabled(q, true);

                       /* Re-check for completions to avoid race */
                       ret = blkioq_do_io(q, &completion, 0, 1, NULL);
                       if (ret == 1) {
                           blkioq_set_completion_fd_enabled(q, false);
                       }
                   }

                   if (ret < 0) {
                       ... /* error */
                   }

                   if (ret == 1) {
                       ... /* process completion */
                   }
               } while (ret == 1);
           }

   Application-level polling mode
       Waiting for completions using blkioq_do_io() with min_completions
       > 0 can cause the current thread to be descheduled by the
       operating system's scheduler. The same is true when waiting for
       events on the completion file descriptor returned by
       blkioq_get_completion_fd(). Some applications require consistent
       low response times and therefore cannot risk being descheduled.

       blkioq_do_io() may be called from a CPU polling loop with
       min_completions = 0 to check for completions:

           struct blkioq *q = blkio_get_queue(b, 0);

           blkioq_read(q, 0x10000, buf, buf_size, NULL, 0);

           /* Busy-wait for the completion */
           struct blkio_completion completion;
           do {
               ret = blkioq_do_io(q, &completion, 0, 1, NULL);
           } while (ret == 0);

           if (ret != 1) ...
           if (completion.ret != 0) ...

       This approach is ideal for applications that need to poll several
       event sources simultaneously, or that need to intersperse polling
       with other application logic. Otherwise, driver-level polling (see
       below) may lead to further performance gains.

   Driver-level polling mode (poll queues)
       Poll queues differ from the "regular" queues presented above in
       that calling blkioq_do_io() with min_completions > 0 causes
       libblkio itself (or other lower layers) to poll for completions.
       This can be more efficient than repeatedly invoking blkioq_do_io()
       with min_completions = 0 on a "regular" queue. For instance, with
       the io_uring driver, poll queues cause the kernel itself to poll
       for completions, avoiding repeated context switching while
       polling.

       A limitation of poll queues is that the CPU thread is occupied
       with a single poll queue and cannot detect other events in the
       meantime such as network I/O or application events. Applications
       wishing to poll multiple things simultaneously may prefer to use
       application-level polling (see above).

       Poll queue support is contingent on the particular driver and
       driver configuration being used. To determine whether a given
       blkio supports poll queues, check the "supports-poll-queues"
       property:

           bool supports_poll_queues;
           ret = blkio_get_bool(b, "supports-poll-queues", &supports_poll_queues);
           if (ret != 0) ...

           if (!supports_poll_queues) {
               fprintf(stderr, "Poll queues not supported\n");
               return;
           }

       It is possible for poll queues not to support flush, write zeroes,
       and discard requests, even if "regular" queues of the same blkio
       do. However, read, write, readv, and writev requests are always
       supported. There is currently no mechanism to check which types of
       requests are supported by poll queues.

       To use poll queues, set the "num-poll-queues" property to a
       positive value before calling blkio_start(), then use
       blkio_get_poll_queue() to retrieve the poll queues. A single blkio
       can have both "regular" queues and poll queues:

           ...
           ret = blkio_connect(b);
           if (ret != 0) ...

           ret = blkio_set_int(b, "num-queues", 1);
           ret = blkio_set_int(b, "num-poll-queues", 1);
           if (ret != 0) ...

           ret = blkio_start(b);
           if (ret != 0) ...

           struct blkioq *q = blkio_get_queue(b, 0);
           struct blkioq *poll_q = blkio_get_poll_queue(b, 0);

       It is possible to set property "num-queues" to 0 as long as
       "num-poll-queues" is positive.

       Poll queues also differ from "regular" queues in that they do not
       have a completion fd. blkioq_get_completion_fd() returns -1 when
       called on a poll queue, and blkioq_set_completion_fd_enabled() has
       no effect. Further, blkioq_do_io_interruptible() is not currently
       supported on poll queues.

       Note that you can still perform application-level polling on poll
       queues by repeatedly calling blkioq_do_io() with min_completions =
       0, but this will lead to suboptimal performance.

   Dynamically adding and removing queues
       Some drivers have support for adding queues on demand after the
       blkio instance is already started:

           int index = blkio_add_queue(b); /* or blkio_add_poll_queue() */
           if (index < 0) ...

           struct blkioq *q = blkio_get_queue(b, index); /* or blkio_get_poll_queue() */

       The "can-add-queues" property determines whether this is
       supported. When it is, the blkio instance can be started with 0
       queues.

       In addition, all drivers allow explicitly removing queues,
       regardless of whether those queues were created by blkio_start()
       or blkio_add_queue() / blkio_add_poll_queue():

           assert(blkio_get_queue(b, 0) != NULL);
           assert(blkio_get_queue(b, 1) != NULL);

           /* blkio_remove_queue() will return 0, indicating success */
           assert(blkio_remove_queue(b, 0) == 0);

           /* Other queues' indices are not shifted, so q will be non-NULL and valid */
           struct blkioq *q = blkio_get_queue(b, 1);
           assert(q != NULL);

           /* blkio_remove_queue() will return -ENOENT, since queue 0 no longer exists */
           assert(blkio_remove_queue(b, 0) == -ENOENT);

       Once a queue is removed, any struct blkioq * pointing to it
       becomes invalid.

   Request types
       The following types of I/O requests are available:

           void blkioq_read(struct blkioq *q, uint64_t start, void *buf, size_t len,
                            void *user_data, uint32_t flags);
           void blkioq_write(struct blkioq *q, uint64_t start, void *buf, size_t len,
                             void *user_data, uint32_t flags);
           void blkioq_readv(struct blkioq *q, uint64_t start, struct iovec *iovec,
                             int iovcnt, void *user_data, uint32_t flags);
           void blkioq_writev(struct blkioq *q, uint64_t start, struct iovec *iovec,
                              int iovcnt, void *user_data, uint32_t flags);
           void blkioq_write_zeroes(struct blkioq *q, uint64_t start, uint64_t len,
                                    void *user_data, uint32_t flags);
           void blkioq_discard(struct blkioq *q, uint64_t start, uint64_t len,
                               void *user_data, uint32_t flags);
           void blkioq_flush(struct blkioq *q, void *user_data, uint32_t flags);

       The block device may see requests as soon as these functions are
       called, but blkioq_do_io() must be called to ensure requests are
       seen.

       If property "needs-mem-regions" is true, I/O data buffers pointed
       to by buf and iovec must be within regions mapped using
       blkio_map_mem_region().

       The application must not free the iovec elements until the
       request's completion is returned by blkioq_do_io().

       All drivers are guaranteed to support at least blkioq_read(),
       blkioq_write(), blkioq_readv(), blkioq_writev(), and
       blkioq_flush(). When attempting to queue a request that the driver
       does not support, the request itself fails and its completion's
       ret field is -ENOTSUP.

       blkioq_read() and blkioq_readv() read data from the block device
       at byte offset start. blkioq_write() and blkioq_writev() write
       data to the block device at byte offset start. The length of the
       I/O data buffer is len bytes and the total size of the iovec
       elements, respectively. start and the length of the I/O data
       buffer must be a multiple of the "request-alignment" property. I/O
       data buffer addresses and lengths, including buf and individual
       iovec elements, must be multiples of the "buf-alignment" property.

       blkioq_write_zeroes() causes zeros to be written to the specified
       region. When supported, this may be more efficient than using
       blkioq_write() with a zero-filled buffer.

       blkioq_discard() causes data in the specified region to be
       discarded. Subsequent reads to the same region return unspecified
       data until it is written to again. Note that discarded data is not
       guaranteed to be erased and may still be returned by reads.

       blkioq_flush() persists completed writes to the storage medium.
       Data is persistent once the flush request completes successfully.
       Applications that need to ensure that data persists across power
       failure or crash must submit flush requests at appropriate points.

       The user_data pointer is returned in the struct
       blkio_completion::user_data field by blkioq_do_io(). It allows
       applications to correlate a completion with its request.

       No ordering guarantees are defined for requests that are in flight
       simultaneously. For example, a flush request is not guaranteed to
       persist in-flight write requests. Instead the application must
       wait for write requests that it wishes to persist to complete
       before calling blkioq_flush().

       Similarly, there are no ordering guarantees between multiple
       queues of a block device. Multi-threaded applications that rely on
       an ordering between multiple queues must wait for the first
       request to complete on one queue, synchronize threads as needed,
       and then submit the second request on the other queue.
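
       A sketch of the write-then-flush ordering described above (buf and
       buf_size are assumed to be suitably aligned; error handling is
       abbreviated):

```c
struct blkio_completion completion;

/* Submit the write and wait for its completion first... */
blkioq_write(q, 0x10000, buf, buf_size, NULL, 0);
ret = blkioq_do_io(q, &completion, 1, 1, NULL);
if (ret != 1 || completion.ret != 0)
    return; /* error */

/* ...only then is a flush guaranteed to cover it */
blkioq_flush(q, NULL, 0);
ret = blkioq_do_io(q, &completion, 1, 1, NULL);
if (ret != 1 || completion.ret != 0)
    return; /* error */
```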

   Request flags
       The following request flags are available:

       BLKIO_REQ_FUA
              Ensures that data written by this request reaches
              persistent storage before the request is completed. This is
              also known as Full Unit Access (FUA). This flag eliminates
              the need for a separate blkioq_flush() call after the
              request has completed. Other data that was previously
              successfully written without the BLKIO_REQ_FUA flag is not
              necessarily persisted by this flag as it is only guaranteed
              to affect the current request. Supported by blkioq_write()
              and blkioq_writev().

       BLKIO_REQ_NO_UNMAP
              Ensures that blkioq_write_zeroes() does not cause
              underlying storage space to be deallocated, guaranteeing
              that subsequent writes to the same region do not fail due
              to lack of space.

       BLKIO_REQ_NO_FALLBACK
              Ensures that blkioq_write_zeroes() does not resort to
              performing regular write requests with zero-filled buffers.
              If that would otherwise be the case and this flag is set,
              then the request fails and its completion's ret field is
              -ENOTSUP.
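
       For example, a write whose data must be durable on completion can
       carry BLKIO_REQ_FUA instead of being followed by a separate flush
       (a sketch; buf and buf_size are assumed to be suitably aligned):

```c
blkioq_write(q, 0x10000, buf, buf_size, NULL, BLKIO_REQ_FUA);

struct blkio_completion completion;
ret = blkioq_do_io(q, &completion, 1, 1, NULL);
if (ret == 1 && completion.ret == 0) {
    /* data from this request is persistent; no blkioq_flush() needed */
}
```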

PROPERTIES
       The configuration of blkio instances is done through property
       accesses. Each property has a name and a type (bool, int, str,
       uint64). Properties may be read-only (r), write-only (w), or
       read/write (rw).

       Access to properties depends on the blkio instance state
       (created/connected/started). A property may be read/write in the
       connected state but read-only in the started state. This is
       written as "rw connected, r started".

       The following property APIs are available:

           int blkio_get_bool(struct blkio *b, const char *name, bool *value);
           int blkio_get_int(struct blkio *b, const char *name, int *value);
           int blkio_get_uint64(struct blkio *b, const char *name, uint64_t *value);
           int blkio_get_str(struct blkio *b, const char *name, char **value);

           int blkio_set_bool(struct blkio *b, const char *name, bool value);
           int blkio_set_int(struct blkio *b, const char *name, int value);
           int blkio_set_uint64(struct blkio *b, const char *name, uint64_t value);
           int blkio_set_str(struct blkio *b, const char *name, const char *value);

       blkio_get_str() assigns to *value and the caller must use free(3)
       to deallocate the memory.

       blkio_get_str() automatically converts to string representation if
       the property is not a str. blkio_set_str() automatically converts
       from string representation if the property is not a str. This can
       be used to easily fetch values from and store values to an
       application's text-based configuration file or command-line. Aside
       from this automatic conversion, the other property APIs fail with
       -ENOTTY if the property does not have the right type.
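
       For instance, the numeric "capacity" property can be read either
       natively or through the string conversion (a sketch; error
       handling abbreviated):

```c
uint64_t capacity;
ret = blkio_get_uint64(b, "capacity", &capacity);
if (ret < 0)
    return; /* error */

/* Same value as a string, e.g. for a config file; free(3) it after use */
char *capacity_str;
ret = blkio_get_str(b, "capacity", &capacity_str);
if (ret < 0)
    return; /* error */
printf("capacity: %s bytes\n", capacity_str);
free(capacity_str);
```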

       The following properties are common across all drivers.
       Driver-specific properties are documented in DRIVERS.

   Properties available after blkio_create()
       can-add-queues (bool, r created/connected/started)
              Whether the driver supports dynamically adding queues with
              blkio_add_queue() / blkio_add_poll_queue().

       driver (str, r created/connected/started)
              The driver name that was passed to blkio_create(). See
              DRIVERS for details on available drivers.

       read-only (bool, rw created, r connected/started)
              If true, requests other than read and flush fail with
              -EBADF. The default is false.

   Properties available after blkio_connect()
       DEVICE AND QUEUES

       capacity (uint64, r connected/started)
              The size of the block device in bytes.

       max-queues (int, r connected/started)
              The maximum number of queues, including poll queues if any.

       num-queues (int, rw connected, r started)
              The number of queues. The default is 1.

       num-poll-queues (int, rw connected, r started)
              The number of poll queues. The default is 0. If set to a
              positive value and property "supports-poll-queues" is
              false, blkio_start() will fail.

       supports-poll-queues (bool, r connected/started)
              Whether the driver supports poll queues.

       MEMORY REGIONS

       max-mem-regions (uint64, r connected/started)
              The maximum number of memory regions that can be mapped at
              any given time.

       may-pin-mem-regions (bool, r connected/started)
              Whether the driver may pin memory region pages, preventing
              madvise(MADV_DONTNEED) and related syscalls from working.

       mem-region-alignment (uint64, r connected/started)
              The alignment requirement, in bytes, for the addr, iova,
              and len fields of struct blkio_mem_region. This is always a
              multiple of the "buf-alignment" property.

       needs-mem-regions (bool, r connected/started)
              Whether it is necessary to map memory regions with
              blkio_map_mem_region().

       needs-mem-region-fd (bool, r connected/started)
              Whether it is necessary to provide a file descriptor for
              each memory region.

       ALL REQUESTS

       optimal-io-alignment (int, r connected/started)
              The ideal number of bytes of request start and length
              alignment for maximizing performance. This is a multiple of
              the "request-alignment" property.

       optimal-io-size (int, r connected/started)
              The ideal request length in bytes for achieving high
              throughput. Can be 0 if unspecified. Otherwise, this is a
              multiple of the "optimal-io-alignment" property.

       request-alignment (int, r connected/started)
              All request starts and lengths must be a multiple of this
              value. Often this value is 512 bytes.

       flush-needed (bool, r connected/started)
              Whether a flush request must be sent after write request
              completion to ensure data persistence.

       READ AND WRITE REQUESTS

       buf-alignment (int, r connected/started)
              I/O data buffer memory address and length alignment,
              including plain void *buf buffers and iovec segments. Note
              the "mem-region-alignment" property is always a multiple of
              this value.

       can-grow (bool, r connected/started)
              If false, blkioq_read(), blkioq_readv(), blkioq_write() and
              blkioq_writev() will fail if an attempt to read/write
              beyond EOF is made. Otherwise, reads will succeed and the
              portion of the read buffer that overruns EOF will be filled
              with zeros, and writes will increase the device's capacity.

       max-segments (int, r connected/started)
              The maximum iovcnt in a request.
760
761 max-segment-len (int, r connected/started)
762 The maximum size of each iovec in a request. Can be 0 if un‐
763 specified.
764
765 max-transfer (int, r connected/started)
766 The maximum read or write request length in bytes. Can be 0
767 if unspecified.
768
769 optimal-buf-alignment (int, r connected/started)
770 The ideal number of bytes of I/O data buffer memory address
771 and length alignment, including plain void *buf buffers and
772 iovec segments.
773
774 supports-fua-natively (bool, r connected/started)
775 Whether blkioq_write() and blkioq_writev() support the
776 BLKIO_REQ_FUA flag natively, as opposed to emulating it by
777 internally performing a flush request after the write.
778
   WRITE ZEROES REQUESTS

       max-write-zeroes-len (uint64, r connected/started)
              The maximum length of a write zeroes request in bytes.
              Can be 0 if unspecified.

   DISCARD REQUESTS

       discard-alignment (int, r connected/started)
              Discard request starts and lengths, after subtracting
              the value of the "discard-alignment-offset" property,
              must be multiples of this value. This may or may not be
              0 when discard requests are not supported. If not 0,
              this is a multiple of the "request-alignment" property.

       discard-alignment-offset (int, r connected/started)
              Offset of the first block that may be discarded. This
              may be non-zero, for example, when the device is a
              partition that is not aligned to the value of the
              "discard-alignment" property. This may or may not be 0
              when discard requests are not supported. If not 0, this
              is a multiple of the "request-alignment" property, and
              is less than the "discard-alignment" property.

       max-discard-len (uint64, r connected/started)
              The maximum length of a discard request in bytes. Can be
              0 if unspecified.

DRIVERS
   io_uring
       The io_uring driver uses the Linux io_uring system call
       interface to perform I/O on regular files and block device
       nodes.

       Note that io_uring was introduced in Linux kernel version 5.1,
       and kernels may also be configured to disable io_uring. If
       io_uring is not available, blkio_create() fails with -ENOSYS
       when using this driver.

       When performing I/O on regular files, write zeroes requests
       that extend past the end-of-file may or may not update the file
       size. This is left unspecified and the user must not rely on
       any particular behavior.

       This driver supports poll queues only when using O_DIRECT on
       block devices or file systems that support polling. Its poll
       queues never support flush, write zeroes, or discard requests.

   Driver-specific properties available after blkio_create()

       direct (bool, rw created, r connected/started)
              True to bypass the page cache with O_DIRECT. The default
              is false.

       fd (int, rw created, r connected/started)
              An existing open file descriptor for the file or block
              device node. Ownership of the file descriptor is passed
              to the library when blkio_connect() returns success.

              If this property is set, the "direct" and "read-only"
              properties have no effect and it is the user's
              responsibility to open the file with the desired flags.
              Further, during connect, those two properties are
              updated to reflect the file status flags of the given
              file descriptor.

       path (str, rw created, r connected/started)
              The file system path of the file or block device node.

              If this property is set, the "fd" property must not be
              set and will be updated on connect to reflect the opened
              file descriptor. Note that the file descriptor is owned
              by libblkio.

   Driver-specific properties available after blkio_connect()

       num-entries (int, rw connected, r started)
              The minimum number of entries that each io_uring
              submission queue and completion queue should have. The
              default is 128.

              A larger value allows more requests to be in flight, but
              consumes more resources. Tuning this value can affect
              performance.

              io_uring imposes a maximum on this number: 32768 as of
              mainline kernel 5.18, and 4096 prior to 5.4. If this
              maximum is exceeded, blkio_start() will fail with
              -EINVAL.

   nvme-io_uring
       The nvme-io_uring driver submits NVMe commands directly to an
       NVMe namespace using io_uring passthrough, which is available
       since mainline Linux kernel 5.19.

       The process must have the CAP_SYS_ADMIN capability to use this
       driver, and the NVMe namespace must use the NVM command set.

   Driver-specific properties available after blkio_create()

       fd (int, rw created, r connected/started)
              An existing open file descriptor for the NVMe
              namespace's character device (e.g., /dev/ng0n1).
              Ownership of the file descriptor is passed to the
              library when blkio_connect() returns success.

       path (str, rw created, r connected/started)
              A path to the NVMe namespace's character device (e.g.,
              /dev/ng0n1).

              If this property is set, the "fd" property must not be
              set and will be updated on connect to reflect the opened
              file descriptor. Note that the file descriptor is owned
              by libblkio.

   Driver-specific properties available after blkio_connect()

       num-entries (int, rw connected, r started)
              The minimum number of entries that each io_uring
              submission queue and completion queue should have. The
              default is 128.

              A larger value allows more requests to be in flight, but
              consumes more resources. Tuning this value can affect
              performance.

              io_uring imposes a maximum on this number: 32768 as of
              mainline kernel 5.18, and 4096 prior to 5.4. If this
              maximum is exceeded, blkio_start() will fail with
              -EINVAL.

   virtio-blk-...
       The following virtio-blk drivers are provided:

       • The virtio-blk-vfio-pci driver uses VFIO to control a PCI
         virtio-blk device.

       • The virtio-blk-vhost-user driver connects as a client to a
         Unix domain socket provided by a vhost-user-blk backend
         (e.g., exported from qemu-storage-daemon).

       • The virtio-blk-vhost-vdpa driver uses the vhost-vdpa kernel
         interface to perform I/O on a vDPA device. The vDPA device
         may be implemented in software (VDUSE, in-kernel simulator)
         or in hardware.

       These drivers always support poll queues, and their poll queues
       support all types of requests.

       The following properties apply to all of these drivers, with
       some exceptions noted in the individual property descriptions.

   Driver-specific properties available after blkio_create()

       fd (int, rw created, r connected/started)
              An existing open file descriptor for the file system
              path (see "path" below). Ownership of the file
              descriptor is passed to the library when blkio_connect()
              returns success. Currently supported by the following
              drivers:

              • virtio-blk-vhost-vdpa

       path (str, rw created, r connected/started)

              • virtio-blk-vfio-pci: the file system path of the
                device's sysfs directory, e.g.,
                /sys/bus/pci/devices/0000:00:01.0.

              • virtio-blk-vhost-user: the file system path of the
                vhost-user socket to connect to.

              • virtio-blk-vhost-vdpa: the file system path of the
                vhost-vdpa character device to connect to.

   Driver-specific properties available after blkio_connect()

       max-queue-size (int, r connected/started)
              The maximum queue size supported by the device.

       queue-size (int, rw connected, r started)
              The queue size to configure the device with. The default
              is 256. A larger value allows more requests to be in
              flight, but consumes more resources. Tuning this value
              can affect performance.

       pkg-config is the recommended way to build a program with
       libblkio:

           $ cc -o app app.c `pkg-config blkio --cflags --libs`

       Meson projects can use pkg-config as follows:

           blkio = dependency('blkio')
           executable('app', 'app.c', dependencies : [blkio])

FREQUENTLY ASKED QUESTIONS
   Can network storage drivers be added?
       Maybe. The API was designed with a synchronous control path.
       Functions like blkio_get_uint64() must return quickly.
       Operations on network storage can take an unbounded amount of
       time (in the absence of a timeout mechanism) and are not a good
       fit for synchronous APIs. A more complex asynchronous control
       path API could be added in the future for applications wishing
       to use network storage drivers.

   Can non-Linux operating systems be supported in the future?
       Maybe. No attempt has been made to restrict the library to
       POSIX features only, and most drivers are platform-specific. If
       there is demand for supporting other operating systems, and
       developers willing to work on it, then it may be possible.

   Can a Linux AIO driver be added?
       Linux AIO could serve as a fallback on systems where io_uring
       is not available. However, io_submit(2) can block the process,
       which causes performance problems in event-driven applications
       that require a non-blocking event loop. Unless Linux AIO is
       fixed, it is unlikely that a proposal to add a driver will be
       accepted.

SEE ALSO
       io_uring_setup(2), io_setup(2), aio(7)



                                                                     BLKIO(3)