BLKIO(3)                                                              BLKIO(3)


NAME
       blkio - Block device I/O library

DESCRIPTION
       libblkio is a library for accessing data stored on block devices.
       Block devices offer persistent data storage and are addressable in
       fixed-size units called blocks. Block sizes of 4 KiB or 512 bytes
       are typical. Hard disk drives, solid state drives (SSDs), USB mass
       storage devices, and other types of hardware are block devices.

       The focus of libblkio is on fast I/O for multi-threaded
       applications. Management of block devices, including partitioning
       and resizing, is outside the scope of the library.

       Block devices have one or more queues for submitting I/O requests
       such as reads and writes. Block devices process I/O requests from
       their queues and produce a return code for each completed request
       indicating success or an error.

       The application is responsible for thread-safety. No thread
       synchronization is necessary when a queue is only used from a
       single thread. Proper synchronization is required when sharing a
       queue between multiple threads.

       libblkio can be used in blocking, event-driven, and polling modes
       depending on the architecture of the application and its
       performance requirements.

       Blocking mode suspends the execution of the current thread until
       the request completes. This is the most natural way of writing
       programs that perform a sequence of I/O requests, but it cannot
       exploit request parallelism.

       Event-driven mode provides a completion file descriptor that the
       application can monitor from its event loop. This allows multiple
       I/O requests to be in flight simultaneously and the application
       can respond to other events while waiting for completions.

       Polling mode also supports multiple in-flight requests but the
       application continuously checks for completions, typically from a
       tight loop, in order to minimize latency.

       libblkio contains drivers for several block I/O interfaces. This
       allows applications using libblkio to access different block
       devices through a single API.

   Creating a blkio instance
       A struct blkio instance is created from a specific driver such as
       "io_uring" as follows:

           struct blkio *b;
           int ret;

           ret = blkio_create("io_uring", &b);
           if (ret < 0) {
               fprintf(stderr, "%s: %s\n", strerror(-ret),
                       blkio_get_error_msg());
               return;
           }

       For a list of available drivers, see the DRIVERS section below.

   Error messages
       Functions generally return 0 on success and a negative errno(3)
       value on failure. In the latter case, a per-thread error message
       is also set and can be obtained as a const char * by calling
       blkio_get_error_msg().

       Note that these messages are not stable and may change between
       backward-compatible libblkio releases. The same applies to
       returned errno values, unless a specific value is explicitly
       documented for a particular error condition.

   Connecting to a block device
       Connection details for a block device are specified by setting
       properties on the blkio instance. The available properties depend
       on the driver. For example, the io_uring driver's "path" property
       is set to /dev/sdb to access a local disk:

           int ret = blkio_set_str(b, "path", "/dev/sdb");
           if (ret < 0) {
               fprintf(stderr, "%s: %s\n", strerror(-ret),
                       blkio_get_error_msg());
               blkio_destroy(&b);
               return;
           }

       Once the connection details have been specified, the blkio
       instance can be connected to the block device with
       blkio_connect():

           ret = blkio_connect(b);

   Starting a block device
       After the blkio instance is connected, properties are available
       to configure its operation and query device characteristics such
       as the maximum number of queues. See PROPERTIES for details.

       For example, the number of queues can be set as follows:

           ret = blkio_set_int(b, "num-queues", 4);

       Once configuration is complete the blkio instance is started with
       blkio_start():

           ret = blkio_start(b);

   Mapping memory regions
       Memory containing I/O data buffers must be "mapped" before
       submitting requests that touch the memory when the
       "needs-mem-regions" property is true. Otherwise mapping memory is
       optional, but doing so may improve performance.

       Memory regions are mapped globally for the blkio instance and are
       available to all queues. A memory region is represented as
       follows:

           struct blkio_mem_region
           {
               void *addr;
               uint64_t iova;
               size_t len;
               int64_t fd_offset;
               int fd;
               uint32_t flags;
           };

       The addr field contains the starting address of the memory
       region. Requests transfer data between the block device and a
       subset of the memory region, up to and including the entire
       memory region. Individual read/write requests or readv/writev
       request segments (iovecs) must not access more than one memory
       region. Multiple requests can access the same memory region
       simultaneously, although usually with non-overlapping areas.

       The addr field must be a multiple of the "mem-region-alignment"
       property.

       The iova field is reserved and must be zero.

       The len field is the size of the memory region in bytes. The
       value must be a multiple of the "mem-region-alignment" property.

       The fd field is the file descriptor for the memory region. Some
       drivers require that I/O data buffers are located in file-backed
       memory. This can be anonymous memory from memfd_create(2) rather
       than an actual file on disk. If the "needs-mem-region-fd"
       property is true then this field must be a valid file descriptor.
       If the property is false this field may be -1.

       The fd_offset field is the byte offset from the start of the file
       given in fd.

       The flags field is reserved and must be zero.

       The application can either allocate I/O data buffers itself and
       describe them with struct blkio_mem_region, or it can use
       blkio_alloc_mem_region() and blkio_free_mem_region() to allocate
       memory suitable for I/O data buffers:

           int blkio_alloc_mem_region(struct blkio *b,
                                      struct blkio_mem_region *region,
                                      size_t len);
           void blkio_free_mem_region(struct blkio *b,
                                      const struct blkio_mem_region *region);

       The len argument is the number of bytes to allocate. These
       functions may only be called after the blkio instance has been
       started.

       File descriptors for memory regions created with
       blkio_alloc_mem_region() are automatically closed across
       execve(2).

       Memory regions can be mapped and unmapped after the blkio
       instance has been started using the blkio_map_mem_region() and
       blkio_unmap_mem_region() functions:

           int blkio_map_mem_region(struct blkio *b,
                                    const struct blkio_mem_region *region);
           void blkio_unmap_mem_region(struct blkio *b,
                                       const struct blkio_mem_region *region);

       These functions must not be called while requests that access the
       affected memory region are in flight. Memory regions must not
       overlap. Memory regions must be unmapped/freed with exactly the
       same region field values that they were mapped/allocated with.

       blkio_map_mem_region() does not take ownership of region->fd. The
       caller may close region->fd after blkio_map_mem_region() returns.

       blkio_map_mem_region() returns an error if called on a memory
       region that is already mapped against the given blkio.
       blkio_unmap_mem_region() has no effect when called on a memory
       region that is not mapped against the given blkio.

       blkio_free_mem_region() must not be called on a memory region
       that was mapped but not unmapped.

       For best performance applications should map memory regions once
       and reuse them instead of changing memory regions frequently.

       The "max-mem-regions" property gives the maximum number of memory
       regions that can be mapped.

       Memory regions are automatically unmapped when blkio_destroy() is
       called, and memory regions allocated using
       blkio_alloc_mem_region() are freed.

   Performing I/O
       Once at least one memory region has been mapped, the queues are
       ready for request processing. The following example reads 4096
       bytes from byte offset 0x10000:

           struct blkioq *q = blkio_get_queue(b, 0);

           blkioq_read(q, 0x10000, buf, buf_size, NULL, 0);

           struct blkio_completion completion;
           ret = blkioq_do_io(q, &completion, 1, 1, NULL);
           if (ret != 1) ...
           if (completion.ret != 0) ...

       This is an example of blocking mode where blkioq_do_io() waits
       until the I/O request completes. See below for details on
       event-driven and polling modes.

       The blkioq_do_io() function takes the following arguments:

           int blkioq_do_io(struct blkioq *q,
                            struct blkio_completion *completions,
                            int min_completions,
                            int max_completions,
                            struct timespec *timeout);

       The completions argument is a pointer to an array that is filled
       in with completions when the function returns. When
       max_completions is 0, completions may be NULL. Completions are
       represented by struct blkio_completion:

           struct blkio_completion
           {
               void *user_data;
               const char *error_msg;
               int ret;
               /* reserved space */
           };

       The user_data field is the same pointer passed to blkioq_read()
       in the example above. Applications that submit multiple requests
       can use user_data to correlate completions to previously
       submitted requests.

       The ret field is the return code for the I/O request in negative
       errno representation. This field is 0 on success.

       For some errors, the error_msg field points to a message
       describing what caused the request to fail. Note that this may be
       NULL even if ret is not 0, and is always NULL when ret is 0.

       Note that these messages are not stable and may change between
       backward-compatible libblkio releases. The same applies to the
       errno values returned through ret, unless a specific value is
       explicitly documented for a particular error condition.

       struct blkio_completion also includes some reserved space which
       may be used to add more fields in the future in a
       backward-compatible manner.

       The remaining arguments of blkioq_do_io() are as follows:

       The min_completions argument controls how many completions to
       wait for. A value greater than 0 causes the function to block
       until the given number of completions has been reached. A value
       of 0 causes the function to submit I/O and return completions
       that have already occurred, without waiting for more. If
       min_completions is greater than the number of currently
       outstanding requests, blkioq_do_io() fails with -EINVAL.

       The max_completions argument is the maximum number of completions
       array elements to fill in. This value must be greater than or
       equal to min_completions.

       The timeout argument specifies the maximum amount of time to wait
       for completions. The function returns -ETIME if the timeout
       expires before a request completes. If timeout is NULL the
       function blocks indefinitely. When timeout is non-NULL, the
       elapsed time is subtracted and the struct timespec is updated
       when the function returns, regardless of success or failure.

       The return value is the number of completions array elements
       filled in. This value is within the inclusive range
       [min_completions, max_completions] on success, or a negative
       errno on failure.

       A blkioq_do_io_interruptible() variant is also available:

           int blkioq_do_io_interruptible(struct blkioq *q,
                                          struct blkio_completion *completions,
                                          int min_completions,
                                          int max_completions,
                                          struct timespec *timeout,
                                          const sigset_t *sig);

       Unlike blkioq_do_io(), this function can be interrupted by
       signals and return -EINTR. The sig argument temporarily sets the
       signal mask of the calling thread while waiting for completions,
       which allows the thread to be woken by a signal without race
       conditions. To ensure this function is interrupted when a signal
       is received, (1) that signal must be blocked when invoking the
       function (see sigprocmask(2)) and (2) a signal mask unblocking
       that signal must be given as the sig argument.

   Event-driven mode
       Completion processing can be integrated into the event loop of an
       application so that other activity can take place while I/O is in
       flight. Each queue has a completion file descriptor that is
       returned by the following function:

           int blkioq_get_completion_fd(struct blkioq *q);

       The returned file descriptor becomes readable when blkioq_do_io()
       needs to be called again. Spurious events can occur, causing the
       file descriptor to become readable even if there are no new
       completions available.

       The returned file descriptor has O_NONBLOCK set. The application
       may switch the file descriptor to blocking mode.

       By default, the driver might not generate completion events for
       requests, so it is necessary to explicitly enable the completion
       file descriptor before use:

           void blkioq_set_completion_fd_enabled(struct blkioq *q, bool enable);

       Changes made using this function also apply to requests that are
       already in flight but not yet completed. Note that even after
       calling this function with enable as false, the driver may still
       generate completion events.

       The application must read 8 bytes from the completion file
       descriptor to reset the event before calling blkioq_do_io(). The
       contents of the bytes are undefined and should not be interpreted
       by the application.

       The following example demonstrates event-driven I/O:

           struct blkioq *q = blkio_get_queue(b, 0);
           int completion_fd = blkioq_get_completion_fd(q);
           char event_data[8];

           /* Switch to blocking mode for read(2) below */
           fcntl(completion_fd, F_SETFL,
                 fcntl(completion_fd, F_GETFL) & ~O_NONBLOCK);

           /* Enable completion events */
           blkioq_set_completion_fd_enabled(q, true);

           blkioq_read(q, 0x10000, buf, buf_size, NULL, 0);

           /* Since min_completions = 0 we will submit but not wait */
           ret = blkioq_do_io(q, NULL, 0, 0, NULL);
           if (ret != 0) ...

           /* Wait for the next event on the completion file descriptor */
           struct blkio_completion completion;
           do {
               read(completion_fd, event_data, sizeof(event_data));
               ret = blkioq_do_io(q, &completion, 0, 1, NULL);
           } while (ret == 0);
           if (ret != 1) ...
           if (completion.ret != 0) ...

       This example uses a blocking read(2) to wait for and consume the
       next event on the completion file descriptor. Because spurious
       events can occur, it then checks whether a completion is actually
       available, retrying read(2) otherwise.

       Normally completion_fd would be registered with an event loop so
       the application can perform other tasks while waiting.

       Applications may save CPU cycles by suppressing completion file
       descriptor notifications while processing completions. This
       optimization avoids an unnecessary application event loop
       iteration and completion file descriptor read when additional
       completions arrive while the application is processing
       completions:

           static void process_completions(...)
           {
               int ret;

               /* Suppress completion fd notifications while we process
                * completions */
               blkioq_set_completion_fd_enabled(q, false);

               do {
                   struct blkio_completion completion;
                   ret = blkioq_do_io(q, &completion, 0, 1, NULL);

                   if (ret == 0) {
                       blkioq_set_completion_fd_enabled(q, true);

                       /* Re-check for completions to avoid race */
                       ret = blkioq_do_io(q, &completion, 0, 1, NULL);
                       if (ret == 1) {
                           blkioq_set_completion_fd_enabled(q, false);
                       }
                   }

                   if (ret < 0) {
                       ... /* error */
                   }

                   if (ret == 1) {
                       ... /* process completion */
                   }
               } while (ret == 1);
           }

   Application-level polling mode
       Waiting for completions using blkioq_do_io() with min_completions
       > 0 can cause the current thread to be descheduled by the
       operating system's scheduler. The same is true when waiting for
       events on the completion file descriptor returned by
       blkioq_get_completion_fd(). Some applications require
       consistently low response times and therefore cannot risk being
       descheduled.

       blkioq_do_io() may be called from a CPU polling loop with
       min_completions = 0 to check for completions:

           struct blkioq *q = blkio_get_queue(b, 0);

           blkioq_read(q, 0x10000, buf, buf_size, NULL, 0);

           /* Busy-wait for the completion */
           struct blkio_completion completion;
           do {
               ret = blkioq_do_io(q, &completion, 0, 1, NULL);
           } while (ret == 0);

           if (ret != 1) ...
           if (completion.ret != 0) ...

       This approach is ideal for applications that need to poll several
       event sources simultaneously, or that need to intersperse polling
       with other application logic. Otherwise, driver-level polling
       (see below) may lead to further performance gains.

   Driver-level polling mode (poll queues)
       Poll queues differ from the "regular" queues presented above in
       that calling blkioq_do_io() with min_completions > 0 causes
       libblkio itself (or other lower layers) to poll for completions.
       This can be more efficient than repeatedly invoking
       blkioq_do_io() with min_completions = 0 on a "regular" queue. For
       instance, with the io_uring driver, poll queues cause the kernel
       itself to poll for completions, avoiding repeated context
       switching while polling.

       A limitation of poll queues is that the CPU thread is occupied
       with a single poll queue and cannot detect other events in the
       meantime, such as network I/O or application events. Applications
       wishing to poll multiple event sources simultaneously may prefer
       application-level polling (see above).

       Poll queue support depends on the particular driver and driver
       configuration being used. To determine whether a given blkio
       supports poll queues, check the "supports-poll-queues" property:

           bool supports_poll_queues;
           ret = blkio_get_bool(b, "supports-poll-queues",
                                &supports_poll_queues);
           if (ret != 0) ...

           if (!supports_poll_queues) {
               fprintf(stderr, "Poll queues not supported\n");
               return;
           }

       Poll queues may not support flush, write zeroes, and discard
       requests, even if "regular" queues of the same blkio do. However,
       read, write, readv, and writev requests are always supported.
       There is currently no mechanism to check which types of requests
       are supported by poll queues.

       To use poll queues, set the "num-poll-queues" property to a
       positive value before calling blkio_start(), then use
       blkio_get_poll_queue() to retrieve the poll queues. A single
       blkio can have both "regular" queues and poll queues:

           ...
           ret = blkio_connect(b);
           if (ret != 0) ...

           ret = blkio_set_int(b, "num-queues", 1);
           if (ret != 0) ...
           ret = blkio_set_int(b, "num-poll-queues", 1);
           if (ret != 0) ...

           ret = blkio_start(b);
           if (ret != 0) ...

           struct blkioq *q = blkio_get_queue(b, 0);
           struct blkioq *poll_q = blkio_get_poll_queue(b, 0);

       It is possible to set the "num-queues" property to 0 as long as
       "num-poll-queues" is positive.

       Poll queues also differ from "regular" queues in that they do not
       have a completion fd. blkioq_get_completion_fd() returns -1 when
       called on a poll queue, and blkioq_set_completion_fd_enabled()
       has no effect. Further, blkioq_do_io_interruptible() is not
       currently supported on poll queues.

       Note that you can still perform application-level polling on poll
       queues by repeatedly calling blkioq_do_io() with min_completions
       = 0, but this will lead to suboptimal performance.

   Dynamically adding and removing queues
       Some drivers support adding queues on demand after the blkio
       instance has already been started:

           int index = blkio_add_queue(b); /* or blkio_add_poll_queue() */
           if (index < 0) ...

           struct blkioq *q = blkio_get_queue(b, index);
                                           /* or blkio_get_poll_queue() */

       The "can-add-queues" property determines whether this is
       supported. When it is, the blkio instance can be started with 0
       queues.

       In addition, all drivers allow explicitly removing queues,
       regardless of whether those queues were created by blkio_start()
       or blkio_add_queue() / blkio_add_poll_queue():

           assert(blkio_get_queue(b, 0) != NULL);
           assert(blkio_get_queue(b, 1) != NULL);

           /* blkio_remove_queue() will return 0, indicating success */
           assert(blkio_remove_queue(b, 0) == 0);

           /* Other queues' indices are not shifted, so q will be
            * non-NULL and valid */
           struct blkioq *q = blkio_get_queue(b, 1);
           assert(q != NULL);

           /* blkio_remove_queue() will return -ENOENT, since queue 0 no
            * longer exists */
           assert(blkio_remove_queue(b, 0) == -ENOENT);

       Once a queue is removed, any struct blkioq * pointing to it
       becomes invalid.

   Request types
       The following types of I/O requests are available:

           void blkioq_read(struct blkioq *q, uint64_t start, void *buf,
                            size_t len, void *user_data, uint32_t flags);
           void blkioq_write(struct blkioq *q, uint64_t start, void *buf,
                             size_t len, void *user_data, uint32_t flags);
           void blkioq_readv(struct blkioq *q, uint64_t start,
                             struct iovec *iovec, int iovcnt,
                             void *user_data, uint32_t flags);
           void blkioq_writev(struct blkioq *q, uint64_t start,
                              struct iovec *iovec, int iovcnt,
                              void *user_data, uint32_t flags);
           void blkioq_write_zeroes(struct blkioq *q, uint64_t start,
                                    uint64_t len, void *user_data,
                                    uint32_t flags);
           void blkioq_discard(struct blkioq *q, uint64_t start,
                               uint64_t len, void *user_data,
                               uint32_t flags);
           void blkioq_flush(struct blkioq *q, void *user_data,
                             uint32_t flags);

       The block device may see requests as soon as these functions are
       called, but blkioq_do_io() must be called to ensure requests are
       seen.

       If the "needs-mem-regions" property is true, I/O data buffers
       pointed to by buf and iovec must be within regions mapped using
       blkio_map_mem_region().

       The application must not free the iovec elements until the
       request's completion is returned by blkioq_do_io().

       All drivers are guaranteed to support at least blkioq_read(),
       blkioq_write(), blkioq_readv(), blkioq_writev(), and
       blkioq_flush(). When attempting to queue a request that the
       driver does not support, the request itself fails and its
       completion's ret field is -ENOTSUP.

       blkioq_read() and blkioq_readv() read data from the block device
       at byte offset start. blkioq_write() and blkioq_writev() write
       data to the block device at byte offset start. The length of the
       I/O data buffer is len bytes or the total size of the iovec
       elements, respectively. start and the length of the I/O data
       buffer must be multiples of the "request-alignment" property. I/O
       data buffer addresses and lengths, including buf and individual
       iovec elements, must be multiples of the "buf-alignment"
       property.

       blkioq_write_zeroes() causes zeros to be written to the specified
       region. When supported, this may be more efficient than using
       blkioq_write() with a zero-filled buffer.

       blkioq_discard() causes data in the specified region to be
       discarded. Subsequent reads to the same region return unspecified
       data until it is written to again. Note that discarded data is
       not guaranteed to be erased and may still be returned by reads.

       blkioq_flush() persists completed writes to the storage medium.
       Data is persistent once the flush request completes successfully.
       Applications that need to ensure that data persists across power
       failure or crash must submit flush requests at appropriate
       points.

       The user_data pointer is returned in the struct
       blkio_completion::user_data field by blkioq_do_io(). It allows
       applications to correlate a completion with its request.

       No ordering guarantees are defined for requests that are in
       flight simultaneously. For example, a flush request is not
       guaranteed to persist in-flight write requests. Instead, the
       application must wait for the write requests that it wishes to
       persist to complete before calling blkioq_flush().

       Similarly, there are no ordering guarantees between multiple
       queues of a block device. Multi-threaded applications that rely
       on an ordering between multiple queues must wait for the first
       request to complete on one queue, synchronize threads as needed,
       and then submit the second request on the other queue.

   Request flags
       The following request flags are available:

       BLKIO_REQ_FUA
              Ensures that data written by this request reaches
              persistent storage before the request is completed. This
              is also known as Force Unit Access (FUA). This flag
              eliminates the need for a separate blkioq_flush() call
              after the request has completed. Other data that was
              previously successfully written without the BLKIO_REQ_FUA
              flag is not necessarily persisted by this flag, as it is
              only guaranteed to affect the current request. Supported
              by blkioq_write() and blkioq_writev().

       BLKIO_REQ_NO_UNMAP
              Ensures that blkioq_write_zeroes() does not cause
              underlying storage space to be deallocated, guaranteeing
              that subsequent writes to the same region do not fail due
              to lack of space.

       BLKIO_REQ_NO_FALLBACK
              Ensures that blkioq_write_zeroes() does not resort to
              performing regular write requests with zero-filled
              buffers. If that would otherwise be the case and this flag
              is set, then the request fails and its completion's ret
              field is -ENOTSUP.

PROPERTIES
       The configuration of blkio instances is done through property
       accesses. Each property has a name and a type (bool, int, str,
       uint64). Properties may be read-only (r), write-only (w), or
       read/write (rw).

       Access to properties depends on the blkio instance state
       (created/connected/started). A property may be read/write in the
       connected state but read-only in the started state. This is
       written as "rw connected, r started".

       The following property APIs are available:

           int blkio_get_bool(struct blkio *b, const char *name,
                              bool *value);
           int blkio_get_int(struct blkio *b, const char *name,
                             int *value);
           int blkio_get_uint64(struct blkio *b, const char *name,
                                uint64_t *value);
           int blkio_get_str(struct blkio *b, const char *name,
                             char **value);

           int blkio_set_bool(struct blkio *b, const char *name,
                              bool value);
           int blkio_set_int(struct blkio *b, const char *name,
                             int value);
           int blkio_set_uint64(struct blkio *b, const char *name,
                                uint64_t value);
           int blkio_set_str(struct blkio *b, const char *name,
                             const char *value);

       blkio_get_str() assigns to *value and the caller must use free(3)
       to deallocate the memory.

       blkio_get_str() automatically converts the value to its string
       representation if the property is not a str. blkio_set_str()
       automatically converts from the string representation if the
       property is not a str. This makes it easy to fetch values from
       and store values to an application's text-based configuration
       file or command line. Aside from this automatic conversion, the
       other property APIs fail with -ENOTTY if the property does not
       have the right type.

       The following properties are common across all drivers.
       Driver-specific properties are documented in DRIVERS.

   Properties available after blkio_create()
       can-add-queues (bool, r created/connected/started)
              Whether the driver supports dynamically adding queues with
              blkio_add_queue() / blkio_add_poll_queue().

       driver (str, r created/connected/started)
              The driver name that was passed to blkio_create(). See
              DRIVERS for details on available drivers.

       read-only (bool, rw created, r connected/started)
              If true, requests other than read and flush fail with
              -EBADF. The default is false.

   Properties available after blkio_connect()
       DEVICE AND QUEUES

       capacity (uint64, r connected/started)
              The size of the block device in bytes.

       max-queues (int, r connected/started)
              The maximum number of queues, including poll queues if
              any.

       num-queues (int, rw connected, r started)
              The number of queues. The default is 1.

       num-poll-queues (int, rw connected, r started)
              The number of poll queues. The default is 0. If set to a
              positive value while the "supports-poll-queues" property
              is false, blkio_start() will fail.

       supports-poll-queues (bool, r connected/started)
              Whether the driver supports poll queues.

       MEMORY REGIONS

       max-mem-regions (uint64, r connected/started)
              The maximum number of memory regions that can be mapped at
              any given time.

       may-pin-mem-regions (bool, r connected/started)
              Whether the driver may pin memory region pages, thereby
              preventing madvise(MADV_DONTNEED) and related syscalls
              from working.

       mem-region-alignment (uint64, r connected/started)
              The alignment requirement, in bytes, for the addr, iova,
              and len fields of struct blkio_mem_region. This is always
              a multiple of the "buf-alignment" property.

       needs-mem-regions (bool, r connected/started)
              Whether it is necessary to map memory regions with
              blkio_map_mem_region().

       needs-mem-region-fd (bool, r connected/started)
              Whether it is necessary to provide a file descriptor for
              each memory region.

       ALL REQUESTS

       optimal-io-alignment (int, r connected/started)
              The ideal number of bytes of request start and length
              alignment for maximizing performance. This is a multiple
              of the "request-alignment" property.

       optimal-io-size (int, r connected/started)
              The ideal request length in bytes for achieving high
              throughput. Can be 0 if unspecified. Otherwise, this is a
              multiple of the "optimal-io-alignment" property.

       request-alignment (int, r connected/started)
              Request start and length must be multiples of this value,
              which is often 512 bytes.

       READ AND WRITE REQUESTS

       buf-alignment (int, r connected/started)
              I/O data buffer memory address and length alignment,
              including plain void *buf buffers and iovec segments. Note
              that the "mem-region-alignment" property is always a
              multiple of this value.

       can-grow (bool, r connected/started)
              If false, blkioq_read(), blkioq_readv(), blkioq_write(),
              and blkioq_writev() will fail if an attempt to read/write
              beyond EOF is made. Otherwise, reads will succeed and the
              portion of the read buffer that overruns EOF will be
              filled with zeros, and writes will increase the device's
              capacity.

       max-segments (int, r connected/started)
              The maximum iovcnt in a request.

       max-segment-len (int, r connected/started)
              The maximum size of each iovec in a request. Can be 0 if
              unspecified.

       max-transfer (int, r connected/started)
              The maximum read or write request length in bytes. Can be
              0 if unspecified.

       optimal-buf-alignment (int, r connected/started)
              The ideal number of bytes of I/O data buffer memory
              address and length alignment, including plain void *buf
              buffers and iovec segments.

       supports-fua-natively (bool, r connected/started)
              Whether blkioq_write() and blkioq_writev() support the
              BLKIO_REQ_FUA flag natively, as opposed to emulating it by
              internally performing a flush request after the write.

775 WRITE ZEROES REQUESTS
776
777 max-write-zeroes-len (uint64, r connected/started)
778 The maximum length of a write zeroes request in bytes. Can be
779 0 if unspecified.
780
   DISCARD REQUESTS

   discard-alignment (int, r connected/started)
          Discard request start and length, after subtracting the
          value of the "discard-alignment-offset" property, must be a
          multiple of this value. This may or may not be 0 if discard
          requests are not supported. If not 0, this is a multiple of
          the "request-alignment" property.

   discard-alignment-offset (int, r connected/started)
          Offset of the first block that may be discarded. This may
          be non-zero, for example, when the device is a partition
          that is not aligned to the value of the "discard-alignment"
          property. This may or may not be 0 if discard requests are
          not supported. If not 0, this is a multiple of the
          "request-alignment" property, and is less than the
          "discard-alignment" property.

   max-discard-len (uint64, r connected/started)
          The maximum length of a discard request in bytes. Can be 0
          if unspecified.

   io_uring
   The io_uring driver uses the Linux io_uring system call interface
   to perform I/O on files and block device nodes. Both regular files
   and block device nodes are supported.

   Note that io_uring was introduced in Linux kernel version 5.1, and
   kernels may also be configured to disable io_uring. If io_uring is
   not available, blkio_create() fails with -ENOSYS when using this
   driver.

   When performing I/O on regular files, write zeroes requests that
   extend past the end of file may or may not update the file size.
   This is left unspecified and the user must not rely on any
   particular behavior.

   This driver supports poll queues only when using O_DIRECT on block
   devices or file systems that support polling. Its poll queues
   never support flush, write zeroes, or discard requests.

   Driver-specific properties available after blkio_create()

   direct (bool, rw created, r connected/started)
          True to bypass the page cache with O_DIRECT. The default is
          false.

   fd (int, rw created, r connected/started)
          An existing open file descriptor for the file or block
          device node. Ownership of the file descriptor is passed to
          the library when "connected" is successfully set to true.

          If this property is set, the "direct" and "read-only"
          properties have no effect and it is the user's
          responsibility to open the file with the desired flags.
          Further, during connect, those two properties are updated
          to reflect the file status flags of the given file
          descriptor.

   path (str, rw created, r connected/started)
          The file system path of the file or block device node.

          If this property is set, the "fd" property must not be set
          and will be updated on connect to reflect the opened file
          descriptor. Note that the file descriptor is owned by
          libblkio.

   Driver-specific properties available after blkio_connect()

   num-entries (int, rw connected, r started)
          The minimum number of entries that each io_uring submission
          queue and completion queue should have. The default is 128.

          A larger value allows more requests to be in flight, but
          consumes more resources. Tuning this value can affect
          performance.

          io_uring imposes a maximum on this number: 32768 as of
          mainline kernel 5.18, and 4096 prior to 5.4. If this
          maximum is exceeded, blkio_start() will fail with -EINVAL.

   nvme-io_uring
   The nvme-io_uring driver submits NVMe commands directly to an NVMe
   namespace using io_uring passthrough, which is available since
   mainline Linux kernel 5.19.

   The process must have the CAP_SYS_ADMIN capability to use this
   driver, and the NVMe namespace must use the NVM command set.

   Driver-specific properties available after blkio_create()

   fd (int, rw created, r connected/started)
          An existing open file descriptor for the NVMe namespace's
          character device (e.g., /dev/ng0n1). Ownership of the file
          descriptor is passed to the library when "connected" is
          successfully set to true.

   path (str, rw created, r connected/started)
          A path to the NVMe namespace's character device (e.g.,
          /dev/ng0n1).

          If this property is set, the "fd" property must not be set
          and will be updated on connect to reflect the opened file
          descriptor. Note that the file descriptor is owned by
          libblkio.

   Driver-specific properties available after blkio_connect()

   num-entries (int, rw connected, r started)
          The minimum number of entries that each io_uring submission
          queue and completion queue should have. The default is 128.

          A larger value allows more requests to be in flight, but
          consumes more resources. Tuning this value can affect
          performance.

          io_uring imposes a maximum on this number: 32768 as of
          mainline kernel 5.18, and 4096 prior to 5.4. If this
          maximum is exceeded, blkio_start() will fail with -EINVAL.

   virtio-blk-...
   The following virtio-blk drivers are provided:

   •  The virtio-blk-vfio-pci driver uses VFIO to control a PCI
      virtio-blk device.

   •  The virtio-blk-vhost-user driver connects as a client to a Unix
      domain socket provided by a vhost-user-blk backend (e.g.,
      exported from qemu-storage-daemon).

   •  The virtio-blk-vhost-vdpa driver uses the vhost-vdpa kernel
      interface to perform I/O on a vDPA device. A vDPA device may be
      implemented in software (VDUSE, in-kernel, simulator) or in
      hardware.

   These drivers always support poll queues, and their poll queues
   support all types of requests.

   The following properties apply to all of these drivers.

   Driver-specific properties available after blkio_create()

   path (str, rw created, r connected/started)

          •  virtio-blk-vfio-pci: The file system path of the
             device's sysfs directory, e.g.,
             /sys/bus/pci/devices/0000:00:01.0.

          •  virtio-blk-vhost-user: The file system path of the
             vhost-user socket to connect to.

          •  virtio-blk-vhost-vdpa: The file system path of the
             vhost-vdpa character device to connect to.

   Driver-specific properties available after blkio_connect()

   max-queue-size (int, r connected/started)
          The maximum queue size supported by the device.

   queue-size (int, rw connected, r started)
          The queue size to configure the device with. The default is
          256. A larger value allows more requests to be in flight,
          but consumes more resources. Tuning this value can affect
          performance.

   pkg-config is the recommended way to build a program with libblkio:

      $ cc -o app app.c `pkg-config blkio --cflags --libs`

   Meson projects can use pkg-config as follows:

      blkio = dependency('blkio')
      executable('app', 'app.c', dependencies : [blkio])

   Can network storage drivers be added?
   Maybe. The API was designed with a synchronous control path.
   Functions like blkio_get_uint64() must return quickly. Operations
   on network storage can take an unbounded amount of time (in the
   absence of a timeout mechanism) and are not a good fit for
   synchronous APIs. A more complex asynchronous control path API
   could be added in the future for applications wishing to use
   network storage drivers.

   Can non-Linux operating systems be supported in the future?
   Maybe. No attempt has been made to restrict the library to POSIX
   features only, and most drivers are platform-specific. If there is
   demand for supporting other operating systems, and developers
   willing to work on it, then it may be possible.

   Can a Linux AIO driver be added?
   Linux AIO could serve as a fallback on systems where io_uring is
   not available. However, io_submit(2) can block the process, which
   causes performance problems in event-driven applications that
   require that the event loop does not block. Unless Linux AIO is
   fixed, it is unlikely that a proposal to add a driver will be
   accepted.

SEE ALSO
   io_uring_setup(2), io_setup(2), aio(7)



BLKIO(3)