fio(1) General Commands Manual fio(1)

NAME
6 fio - flexible I/O tester
7
SYNOPSIS
9 fio [options] [jobfile]...
10
DESCRIPTION
12 fio is a tool that will spawn a number of threads or processes doing a
13 particular type of I/O action as specified by the user. The typical
14 use of fio is to write a job file matching the I/O load one wants to
15 simulate.
16
OPTIONS
18 --debug=type
19 Enable verbose tracing type of various fio actions. May be `all'
20 for all types or individual types separated by a comma (e.g.
21 `--debug=file,mem' will enable file and memory debugging).
22 `help' will list all available tracing options.
23
24 --parse-only
25 Parse options only, don't start any I/O.
26
27 --merge-blktrace-only
28 Merge blktraces only, don't start any I/O.
29
30 --output=filename
31 Write output to filename.
32
33 --output-format=format
Set the reporting format to `normal', `terse', `json', or `json+'.
Multiple formats can be selected, separated by commas. `terse' is a
semicolon-delimited, CSV-style format. `json+' is like `json', except
that it adds a full dump of the latency buckets.
38
39 --bandwidth-log
40 Generate aggregate bandwidth logs.
41
42 --minimal
43 Print statistics in a terse, semicolon-delimited format.
44
45 --append-terse
46 Print statistics in selected mode AND terse, semicolon-delimited
47 format. Deprecated, use --output-format instead to select mul‐
48 tiple formats.
49
50 --terse-version=version
51 Set terse version output format (default `3', or `2', `4', `5').
52
53 --version
54 Print version information and exit.
55
56 --help Print a summary of the command line options and exit.
57
58 --cpuclock-test
59 Perform test and validation of internal CPU clock.
60
61 --crctest=[test]
62 Test the speed of the built-in checksumming functions. If no ar‐
63 gument is given, all of them are tested. Alternatively, a comma
64 separated list can be passed, in which case the given ones are
65 tested.
66
67 --cmdhelp=command
68 Print help information for command. May be `all' for all com‐
69 mands.
70
71 --enghelp=[ioengine[,command]]
72 List all commands defined by ioengine, or print help for command
73 defined by ioengine. If no ioengine is given, list all available
74 ioengines.
75
76 --showcmd
77 Convert given jobfiles to a set of command-line options.
78
79 --readonly
80 Turn on safety read-only checks, preventing writes and trims.
81 The --readonly option is an extra safety guard to prevent users
82 from accidentally starting a write or trim workload when that is
83 not desired. Fio will only modify the device under test if
84 `rw=write/randwrite/rw/randrw/trim/randtrim/trimwrite' is given.
85 This safety net can be used as an extra precaution.
86
87 --eta=when
Specifies when the real-time ETA estimate should be printed. when may
be `always', `never' or `auto'. `auto' is the default; it prints the
ETA when requested if the output is a TTY. `always' disregards the
output type and prints the ETA when requested. `never' never prints
the ETA.
93
94 --eta-interval=time
By default, fio requests client ETA status roughly every second. With
this option, the interval is configurable. Fio imposes a minimum
allowed time to avoid flooding the console: intervals shorter than 250
msec are not supported.
99
100 --eta-newline=time
101 Force a new line for every time period passed. When the unit is
102 omitted, the value is interpreted in seconds.
103
104 --status-interval=time
105 Force a full status dump of cumulative (from job start) values
106 at time intervals. This option does *not* provide per-period
107 measurements. So values such as bandwidth are running averages.
108 When the time unit is omitted, time is interpreted in seconds.
109 Note that using this option with `--output-format=json' will
110 yield output that technically isn't valid json, since the output
111 will be collated sets of valid json. It will need to be split
112 into valid sets of json after the run.
113
114 --section=name
115 Only run specified section name in job file. Multiple sections
116 can be specified. The --section option allows one to combine
117 related jobs into one file. E.g. one job file could define
118 light, moderate, and heavy sections. Tell fio to run only the
119 "heavy" section by giving `--section=heavy' command line option.
120 One can also specify the "write" operations in one section and
121 "verify" operation in another section. The --section option only
122 applies to job sections. The reserved *global* section is always
123 parsed and used.
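
As a rough illustration of the --section option, the job file below
defines a light and a heavy section that share a *global* section. The
file name `sections.fio' and all workload values are assumptions made
up for this sketch.

```ini
; sections.fio (hypothetical file name)
[global]
; defaults applied to every job section below
rw=randread
size=64m
runtime=30

[light]
bs=4k
numjobs=1

[heavy]
bs=64k
numjobs=4
```

Running `fio --section=heavy sections.fio' should then start only the
heavy job, while the *global* section is still parsed and used.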
124
125 --alloc-size=kb
Allocate additional internal smalloc pools of size kb in KiB. The
--alloc-size option increases the shared memory set aside for use by
fio. If running large jobs with randommap enabled, fio can run out of
memory. Smalloc is an internal allocator for shared structures from a
fixed-size memory pool and can grow to 16 pools. The pool size
defaults to 16MiB. NOTE: While fio is running, `.fio_smalloc.*'
backing store files are visible in `/tmp'.
133
134 --warnings-fatal
135 All fio parser warnings are fatal, causing fio to exit with an
136 error.
137
138 --max-jobs=nr
139 Set the maximum number of threads/processes to support to nr.
140 NOTE: On Linux, it may be necessary to increase the shared-mem‐
141 ory limit (`/proc/sys/kernel/shmmax') if fio runs into errors
142 while creating jobs.
143
144 --server=args
145 Start a backend server, with args specifying what to listen to.
146 See CLIENT/SERVER section.
147
148 --daemonize=pidfile
Background a fio server, writing the PID to the given pidfile.
151
152 --client=hostname
153 Instead of running the jobs locally, send and run them on the
154 given hostname or set of hostnames. See CLIENT/SERVER section.
155
156 --remote-config=file
157 Tell fio server to load this local file.
158
159 --idle-prof=option
160 Report CPU idleness. option is one of the following:
161
162 calibrate
163 Run unit work calibration only and exit.
164
165 system Show aggregate system idleness and unit work.
166
167 percpu As system but also show per CPU idleness.
168
169 --inflate-log=log
170 Inflate and output compressed log.
171
172 --trigger-file=file
173 Execute trigger command when file exists.
174
175 --trigger-timeout=time
176 Execute trigger at this time.
177
178 --trigger=command
179 Set this command as local trigger.
180
181 --trigger-remote=command
182 Set this command as remote trigger.
183
184 --aux-path=path
185 Use the directory specified by path for generated state files
186 instead of the current working directory.
187
189 Any parameters following the options will be assumed to be job files,
190 unless they match a job file parameter. Multiple job files can be
191 listed and each job file will be regarded as a separate group. Fio will
192 stonewall execution between each group.
193
194 Fio accepts one or more job files describing what it is supposed to do.
195 The job file format is the classic ini file, where the names enclosed
196 in [] brackets define the job name. You are free to use any ASCII name
197 you want, except *global* which has special meaning. Following the job
198 name is a sequence of zero or more parameters, one per line, that de‐
199 fine the behavior of the job. If the first character in a line is a ';'
200 or a '#', the entire line is discarded as a comment.
201
202 A *global* section sets defaults for the jobs described in that file. A
203 job may override a *global* section parameter, and a job file may even
204 have several *global* sections if so desired. A job is only affected by
205 a *global* section residing above it.
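
The rules above can be sketched in a small job file. The job names and
values here are illustrative assumptions, not taken from the
`examples/' directory.

```ini
; lines starting with ';' or '#' are comments
[global]
rw=read
bs=4k
size=32m

[seq-read]
; no parameters of its own: uses the defaults from the section above

[big-blocks]
; overrides one default and keeps the others
bs=1m

[global]
; a second *global* section; it only affects jobs defined below it
rw=randwrite

[rand-write]
; picks up rw=randwrite from the second *global* section
```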
206
The --cmdhelp option also lists all options. If used with a command
argument, --cmdhelp will detail the given command.
209
210 See the `examples/' directory for inspiration on how to write job
211 files. Note the copyright and license requirements currently apply to
212 `examples/' files.
213
214 Note that the maximum length of a line in the job file is 8192 bytes.
215
217 Some parameters take an option of a given type, such as an integer or a
218 string. Anywhere a numeric value is required, an arithmetic expression
219 may be used, provided it is surrounded by parentheses. Supported opera‐
220 tors are:
221
222 addition (+)
223
224 subtraction (-)
225
226 multiplication (*)
227
228 division (/)
229
230 modulus (%)
231
232 exponentiation (^)
233
234 For time values in expressions, units are microseconds by default. This
235 is different than for time values not in expressions (not enclosed in
236 parentheses).
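
A minimal sketch of expressions in a job file follows. The job name
and values are assumptions chosen for illustration, and the runtime
line simply follows the microsecond rule stated above.

```ini
[expr-demo]
rw=read
; 4 * 1024 = 4096 bytes per I/O
bs=(4*1024)
; 2 to the power of 26 = 67108864 bytes (64 MiB)
size=(2^26)
; inside parentheses a time value is taken in microseconds,
; so this is meant to be 10 seconds
runtime=(10*1000*1000)
```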
237
239 The following parameter types are used.
240
241 str String. A sequence of alphanumeric characters.
242
time Integer with possible time suffix. Without a unit, the value is
interpreted as seconds unless otherwise specified. Accepts a suffix
of 'd' for days, 'h' for hours, 'm' for minutes, 's' for seconds,
'ms' (or 'msec') for milliseconds and 'us' (or 'usec') for
microseconds. For example, use 10m for 10 minutes.
248
249 int Integer. A whole number value, which may contain an integer pre‐
250 fix and an integer suffix.
251
252 [*integer prefix*] **number** [*integer suffix*]
253
254 The optional *integer prefix* specifies the number's base. The
255 default is decimal. *0x* specifies hexadecimal.
256
257 The optional *integer suffix* specifies the number's units, and
258 includes an optional unit prefix and an optional unit. For quan‐
259 tities of data, the default unit is bytes. For quantities of
260 time, the default unit is seconds unless otherwise specified.
261
262 With `kb_base=1000', fio follows international standards for
263 unit prefixes. To specify power-of-10 decimal values defined in
264 the International System of Units (SI):
265
266 K means kilo (K) or 1000
267 M means mega (M) or 1000**2
268 G means giga (G) or 1000**3
269 T means tera (T) or 1000**4
270 P means peta (P) or 1000**5
271
272 To specify power-of-2 binary values defined in IEC 80000-13:
273
274 Ki means kibi (Ki) or 1024
275 Mi means mebi (Mi) or 1024**2
276 Gi means gibi (Gi) or 1024**3
277 Ti means tebi (Ti) or 1024**4
278 Pi means pebi (Pi) or 1024**5
279
For Zone Block Device Mode:

z means Zone

With `kb_base=1024' (the default), the unit prefixes are opposite
from those specified in the SI and IEC 80000-13 standards to provide
compatibility with old scripts. For example, 4k means 4096.
287
288 For quantities of data, an optional unit of 'B' may be included
289 (e.g., 'kB' is the same as 'k').
290
291 The *integer suffix* is not case sensitive (e.g., m/mi mean
292 mebi/mega, not milli). 'b' and 'B' both mean byte, not bit.
293
Examples with `kb_base=1000':

4 KiB: 4096, 4096b, 4096B, 4ki, 4kib, 4kiB, 4Ki, 4KiB
1 MiB: 1048576, 1mi, 1024ki
1 MB: 1000000, 1m, 1000k
1 TiB: 1099511627776, 1ti, 1024gi, 1048576mi
1 TB: 1000000000000, 1t, 1000g, 1000000m

Examples with `kb_base=1024' (default):

4 KiB: 4096, 4096b, 4096B, 4k, 4kb, 4kB, 4K, 4KB
1 MiB: 1048576, 1m, 1024k
1 MB: 1000000, 1mi, 1000ki
1 TiB: 1099511627776, 1t, 1024g, 1048576m
1 TB: 1000000000000, 1ti, 1000gi, 1000000mi
309
310 To specify times (units are not case sensitive):
311
D means days
H means hours
M means minutes
s or sec means seconds (default)
ms or msec means milliseconds
us or usec means microseconds
318
319 `z' suffix specifies that the value is measured in zones. Value
320 is recalculated once block device's zone size becomes known.
321
322 If the option accepts an upper and lower range, use a colon ':'
323 or minus '-' to separate such values. See irange parameter type.
324 If the lower value specified happens to be larger than the upper
325 value the two values are swapped.
326
327 bool Boolean. Usually parsed as an integer, however only defined for
328 true and false (1 and 0).
329
330 irange Integer range with suffix. Allows value range to be given, such
331 as 1024-4096. A colon may also be used as the separator, e.g.
332 1k:4k. If the option allows two sets of ranges, they can be
333 specified with a ',' or '/' delimiter: 1k-4k/8k-32k. Also see
334 int parameter type.
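
As a sketch of range values in practice, the job below uses bsrange (a
fio block size option that takes one range for reads and one for
writes) together with a startdelay range. The job name and the
numbers are illustrative assumptions.

```ini
[range-demo]
rw=randrw
size=64m
; reads use 1 KiB to 4 KiB blocks, writes use 8 KiB to 32 KiB blocks
bsrange=1k-4k,8k-32k
; each clone waits a random 1 to 5 seconds before starting
startdelay=1-5
numjobs=2
```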
335
336 float_list
337 A list of floating point numbers, separated by a ':' character.
338
340 With the above in mind, here follows the complete list of fio job pa‐
341 rameters.
342
343 Units
344 kb_base=int
345 Select the interpretation of unit prefixes in input parameters.
346
347 1000 Inputs comply with IEC 80000-13 and the Interna‐
348 tional System of Units (SI). Use:
349
350 - power-of-2 values with IEC prefixes (e.g., KiB)
351 - power-of-10 values with SI prefixes (e.g., kB)
352
353 1024 Compatibility mode (default). To avoid breaking
354 old scripts:
355
356 - power-of-2 values with SI prefixes
357 - power-of-10 values with IEC prefixes
358
359 See bs for more details on input parameters.
360
361 Outputs always use correct prefixes. Most outputs include both
362 side-by-side, like:
363
364 bw=2383.3kB/s (2327.4KiB/s)
365
366 If only one value is reported, then kb_base selects the one to
367 use:
368
369 1000 -- SI prefixes
370 1024 -- IEC prefixes
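
The two interpretations can be compared side by side. This is a sketch
based on the prefix rules described under the int parameter type; the
job names and sizes are assumptions.

```ini
; default (kb_base=1024): SI-looking prefixes are powers of two
[compat]
rw=read
size=1m
; 4k = 4096 bytes, 1m = 1048576 bytes
bs=4k

; kb_base=1000: prefixes follow SI and IEC 80000-13
[standards]
kb_base=1000
rw=read
size=1mi
; 4ki = 4096 bytes; with this setting 4k would mean 4000 bytes
bs=4ki
```

Both jobs are intended to issue 4096-byte reads against a 1 MiB file;
only the spelling of the sizes differs.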
371
372 unit_base=int
373 Base unit for reporting. Allowed values are:
374
375 0 Use auto-detection (default).
376
377 8 Byte based.
378
379 1 Bit based.
380
381 Job description
382 name=str
383 ASCII name of the job. This may be used to override the name
384 printed by fio for this job. Otherwise the job name is used. On
385 the command line this parameter has the special purpose of also
386 signaling the start of a new job.
387
388 description=str
389 Text description of the job. Doesn't do anything except dump
390 this text description when this job is run. It's not parsed.
391
392 loops=int
393 Run the specified number of iterations of this job. Used to re‐
394 peat the same workload a given number of times. Defaults to 1.
395
396 numjobs=int
397 Create the specified number of clones of this job. Each clone of
398 job is spawned as an independent thread or process. May be used
399 to setup a larger number of threads/processes doing the same
400 thing. Each thread is reported separately; to see statistics for
401 all clones as a whole, use group_reporting in conjunction with
402 new_group. See --max-jobs. Default: 1.
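
A minimal sketch of numjobs and loops, assuming made-up job names and
sizes. group_reporting is used so that the clones are summarized
together rather than one by one.

```ini
[global]
rw=randread
bs=4k
size=128m
; report the clones below as one aggregate instead of four separate jobs
group_reporting

[workers]
; four independent threads/processes running the same workload
numjobs=4
; repeat the workload twice per clone
loops=2
```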
403
404 Time related parameters
405 runtime=time
406 Limit runtime. The test will run until it completes the config‐
407 ured I/O workload or until it has run for this specified amount
408 of time, whichever occurs first. It can be quite hard to deter‐
409 mine for how long a specified job will run, so this parameter is
410 handy to cap the total runtime to a given time. When the unit
411 is omitted, the value is interpreted in seconds.
412
413 time_based
414 If set, fio will run for the duration of the runtime specified
415 even if the file(s) are completely read or written. It will sim‐
416 ply loop over the same workload as many times as the runtime al‐
417 lows.
418
419 startdelay=irange(int)
420 Delay the start of job for the specified amount of time. Can be
421 a single value or a range. When given as a range, each thread
422 will choose a value randomly from within the range. Value is in
423 seconds if a unit is omitted.
424
425 ramp_time=time
426 If set, fio will run the specified workload for this amount of
427 time before logging any performance numbers. Useful for letting
428 performance settle before logging results, thus minimizing the
429 runtime required for stable results. Note that the ramp_time is
430 considered lead in time for a job, thus it will increase the to‐
431 tal runtime if a special timeout or runtime is specified. When
432 the unit is omitted, the value is given in seconds.
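
The time related parameters above are often combined as in the
following sketch. The job name, block size and file size are
assumptions.

```ini
[timed-randread]
rw=randread
bs=4k
size=256m
; keep looping over the file until the runtime expires
time_based
; measure for 60 seconds
runtime=60
; run a 10 second warm-up first; it is not logged
ramp_time=10
; units can also be spelled out, e.g. runtime=1m or ramp_time=10s
```

Since ramp_time counts as lead-in time, this job should take roughly
70 seconds in total.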
433
434 clocksource=str
435 Use the given clocksource as the base of timing. The supported
436 options are:
437
438 gettimeofday
439 gettimeofday(2)
440
441 clock_gettime
442 clock_gettime(2)
443
444 cpu Internal CPU clock source
445
cpu is the preferred clocksource if it is reliable, as it is very fast
(and fio is heavy on time calls). Fio will automatically use this
clocksource if it's supported and considered reliable on the system it
is running on, unless another clocksource is specifically set. For
x86/x86-64 CPUs, this means the CPU must support an invariant TSC.
452
453 gtod_reduce=bool
454 Enable all of the gettimeofday(2) reducing options (dis‐
455 able_clat, disable_slat, disable_bw_measurement) plus reduce
456 precision of the timeout somewhat to really shrink the gettime‐
457 ofday(2) call count. With this option enabled, we only do about
458 0.4% of the gettimeofday(2) calls we would have done if all time
459 keeping was enabled.
460
461 gtod_cpu=int
462 Sometimes it's cheaper to dedicate a single thread of execution
463 to just getting the current time. Fio (and databases, for in‐
464 stance) are very intensive on gettimeofday(2) calls. With this
465 option, you can set one CPU aside for doing nothing but logging
466 current time to a shared memory location. Then the other
467 threads/processes that run I/O workloads need only copy that
468 segment, instead of entering the kernel with a gettimeofday(2)
469 call. The CPU set aside for doing these time calls will be ex‐
470 cluded from other uses. Fio will manually clear it from the CPU
471 mask of other jobs.
472
473 Target file/device
474 directory=str
Prefix filenames with this directory. Used to place files in a
different location than `./'. You can specify a number of directories
by separating the names with a ':' character. These directories will
be distributed evenly among the job clones created by numjobs, as long
as they are using generated filenames. If specific filename(s) are
set, fio will use the first listed directory, thereby matching the
filename semantics (which generate a file for each clone if not
specified, but let all clones use the same file if set).
484
485 See the filename option for information on how to escape ':'
486 characters within the directory path itself.
487
488 Note: To control the directory fio will use for internal state
489 files use --aux-path.
490
491 filename=str
492 Fio normally makes up a filename based on the job name, thread
493 number, and file number (see filename_format). If you want to
494 share files between threads in a job or several jobs with fixed
495 file paths, specify a filename for each of them to override the
496 default. If the ioengine is file based, you can specify a number
497 of files by separating the names with a ':' colon. So if you
498 wanted a job to open `/dev/sda' and `/dev/sdb' as the two work‐
499 ing files, you would use `filename=/dev/sda:/dev/sdb'. This also
500 means that whenever this option is specified, nrfiles is ig‐
501 nored. The size of regular files specified by this option will
502 be size divided by number of files unless an explicit size is
503 specified by filesize.
504
505 Each colon in the wanted path must be escaped with a '\' charac‐
506 ter. For instance, if the path is `/dev/dsk/foo@3,0:c' then you
507 would use `filename=/dev/dsk/foo@3,0\:c' and if the path is
508 `F:\filename' then you would use `filename=F\:\filename'.
509
510 On Windows, disk devices are accessed as `\\.\PhysicalDrive0'
511 for the first device, `\\.\PhysicalDrive1' for the second etc.
512 Note: Windows and FreeBSD prevent write access to areas of the
513 disk containing in-use data (e.g. filesystems).
514
515 The filename `-' is a reserved name, meaning *stdin* or *std‐
516 out*. Which of the two depends on the read/write direction set.
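
The colon rules can be sketched as follows, reusing the example paths
from the text above. The job names and block size are assumptions, and
both jobs only read.

```ini
[two-devices]
; read-only workload across two block devices
rw=read
bs=128k
filename=/dev/sda:/dev/sdb

[escaped-colon]
rw=read
bs=128k
; the ':' inside the path is escaped so it is not read as a separator
filename=/dev/dsk/foo@3,0\:c
```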
517
518 filename_format=str
519 If sharing multiple files between jobs, it is usually necessary
520 to have fio generate the exact names that you want. By default,
521 fio will name a file based on the default file format specifica‐
522 tion of `jobname.jobnumber.filenumber'. With this option, that
523 can be customized. Fio will recognize and replace the following
524 keywords in this string:
525
526 $jobname
527 The name of the worker thread or process.
528
529 $clientuid
530 IP of the fio process when using client/server
531 mode.
532
533 $jobnum
534 The incremental number of the worker thread or
535 process.
536
537 $filenum
538 The incremental number of the file for that worker
539 thread or process.
540
541 To have dependent jobs share a set of files, this option can be
542 set to have fio generate filenames that are shared between the
543 two. For instance, if `testfiles.$filenum' is specified, file
544 number 4 for any job will be named `testfiles.4'. The default of
545 `$jobname.$jobnum.$filenum' will be used if no other format
546 specifier is given.
547
If you specify a path, then the directories will be created up to the
main directory for the file. For example, if you specify
`a/b/c/$jobnum', then the directories a/b/c will be created before the
file setup part of the job. If you specify directory, then the path
will be relative to that directory; otherwise it is treated as the
absolute path.
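
As a sketch of dependent jobs sharing a file set, the job file below
has a writer lay out files that a reader then consumes. The directory
`/tmp/fio-demo' is a hypothetical, already existing location, and the
sizes are assumptions.

```ini
[global]
directory=/tmp/fio-demo
nrfiles=4
size=64m
; both jobs generate the same names: testfiles.<file number>
filename_format=testfiles.$filenum

[writer]
rw=write

[reader]
; wait for the writer to finish before reading the same files
stonewall
rw=read
```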
554
555 unique_filename=bool
556 To avoid collisions between networked clients, fio defaults to
557 prefixing any generated filenames (with a directory specified)
558 with the source of the client connecting. To disable this behav‐
559 ior, set this option to 0.
560
561 opendir=str
562 Recursively open any files below directory str.
563
564 lockfile=str
Fio defaults to not locking any files before it does I/O to them. If
a file or file descriptor is shared, fio can serialize I/O to that
file to make the end result consistent. This is useful for emulating
real workloads that share files. The lock modes are:
570
571 none No locking. The default.
572
573 exclusive
574 Only one thread or process may do I/O at a time,
575 excluding all others.
576
577 readwrite
578 Read-write locking on the file. Many readers may
579 access the file at the same time, but writes get
580 exclusive access.
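
A minimal sketch of a shared file protected by read-write locking. The
file path is hypothetical and the job names and sizes are assumptions.

```ini
[global]
; every job and clone below works on the same file
filename=/tmp/fio-demo/shared.dat
size=64m
bs=4k
; many readers at once, writes get exclusive access
lockfile=readwrite

[readers]
rw=randread
numjobs=4

[writer]
rw=randwrite
```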
581
582 nrfiles=int
Number of files to use for this job. Defaults to 1. The size of files
will be size divided by this unless an explicit size is specified by
filesize. Files are created for each thread separately, and each file
will have a file number within its name by default, as explained in
the filename section.
588
589 openfiles=int
Number of files to keep open at the same time. Defaults to the same
as nrfiles; can be set smaller to limit the number of simultaneous
opens.
593
594 file_service_type=str
595 Defines how fio decides which file from a job to service next.
596 The following types are defined:
597
598 random Choose a file at random.
599
600 roundrobin
601 Round robin over opened files. This is the de‐
602 fault.
603
604 sequential
605 Finish one file before moving on to the next. Mul‐
606 tiple files can still be open depending on open‐
607 files.
608
609 zipf Use a Zipf distribution to decide what file to ac‐
610 cess.
611
612 pareto Use a Pareto distribution to decide what file to
613 access.
614
615 normal Use a Gaussian (normal) distribution to decide
616 what file to access.
617
618 gauss Alias for normal.
619
620 For random, roundrobin, and sequential, a postfix can be ap‐
621 pended to tell fio how many I/Os to issue before switching to a
622 new file. For example, specifying `file_service_type=random:8'
623 would cause fio to issue 8 I/Os before selecting a new file at
624 random. For the non-uniform distributions, a floating point
625 postfix can be given to influence how the distribution is
626 skewed. See random_distribution for a description of how that
627 would work.
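
The file selection options can be combined as in this sketch. The job
name and values are assumptions.

```ini
[many-files]
rw=randread
bs=4k
size=64m
; eight files of 8 MiB each (size divided by nrfiles)
nrfiles=8
; keep at most four of them open at once
openfiles=4
; pick a file at random and issue 8 I/Os before picking again
file_service_type=random:8
```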
628
629 ioscheduler=str
630 Attempt to switch the device hosting the file to the specified
631 I/O scheduler before running. If the file is a pipe, a character
632 device file or if device hosting the file could not be deter‐
633 mined, this option is ignored.
634
635 create_serialize=bool
636 If true, serialize the file creation for the jobs. This may be
637 handy to avoid interleaving of data files, which may greatly de‐
638 pend on the filesystem used and even the number of processors in
639 the system. Default: true.
640
641 create_fsync=bool
642 fsync(2) the data file after creation. This is the default.
643
644 create_on_open=bool
645 If true, don't pre-create files but allow the job's open() to
646 create a file when it's time to do I/O. Default: false -- pre-
647 create all necessary files when the job starts.
648
649 create_only=bool
650 If true, fio will only run the setup phase of the job. If files
651 need to be laid out or updated on disk, only that will be done
652 -- the actual job contents are not executed. Default: false.
653
654 allow_file_create=bool
655 If true, fio is permitted to create files as part of its work‐
656 load. If this option is false, then fio will error out if the
657 files it needs to use don't already exist. Default: true.
658
659 allow_mounted_write=bool
If this isn't set, fio will abort jobs that are destructive (e.g.
that write) to what appears to be a mounted device or partition. This
helps catch inadvertently destructive tests, where the user does not
realize that the test will destroy data on the mounted file system.
Note that some platforms don't allow writing against a mounted device
regardless of this option. Default: false.
667
668 pre_read=bool
669 If this is given, files will be pre-read into memory before
670 starting the given I/O operation. This will also clear the in‐
671 validate flag, since it is pointless to pre-read and then drop
672 the cache. This will only work for I/O engines that are seek-
673 able, since they allow you to read the same data multiple times.
674 Thus it will not work on non-seekable I/O engines (e.g. network,
675 splice). Default: false.
676
677 unlink=bool
678 Unlink the job files when done. Not the default, as repeated
679 runs of that job would then waste time recreating the file set
680 again and again. Default: false.
681
682 unlink_each_loop=bool
683 Unlink job files after each iteration or loop. Default: false.
684
685 zonemode=str
686 Accepted values are:
687
none The zonerange, zonesize, zonecapacity and zoneskip
parameters are ignored.
690
691 strided
692 I/O happens in a single zone until zonesize bytes
693 have been transferred. After that number of bytes
694 has been transferred processing of the next zone
695 starts. The zonecapacity parameter is ignored.
696
697 zbd Zoned block device mode. I/O happens sequentially
698 in each zone, even if random I/O has been se‐
699 lected. Random I/O happens across all zones in‐
700 stead of being restricted to a single zone. Trim
701 is handled using a zone reset operation. Trim only
702 considers non-empty sequential write required and
703 sequential write preferred zones.
704
705 zonerange=int
706 For zonemode=strided, this is the size of a single zone. See
707 also zonesize and zoneskip.
708
709 For zonemode=zbd, this parameter is ignored.
710
711 zonesize=int
712 For zonemode=strided, this is the number of bytes to transfer
713 before skipping zoneskip bytes. If this parameter is smaller
714 than zonerange then only a fraction of each zone with zonerange
715 bytes will be accessed. If this parameter is larger than zon‐
716 erange then each zone will be accessed multiple times before
717 skipping to the next zone.
718
719 For zonemode=zbd, this is the size of a single zone. The zon‐
720 erange parameter is ignored in this mode. For a job accessing a
721 zoned block device, the specified zonesize must be 0 or equal to
722 the device zone size. For a regular block device or file, the
723 specified zonesize must be at least 512B.
724
725 zonecapacity=int
726 For zonemode=zbd, this defines the capacity of a single zone,
727 which is the accessible area starting from the zone start ad‐
728 dress. This parameter only applies when using zonemode=zbd in
729 combination with regular block devices. If not specified it de‐
730 faults to the zone size. If the target device is a zoned block
731 device, the zone capacity is obtained from the device informa‐
732 tion and this option is ignored.
733
734 zoneskip=int[z]
735 For zonemode=strided, the number of bytes to skip after zonesize
736 bytes of data have been transferred.
737
738 For zonemode=zbd, the zonesize aligned number of bytes to skip
739 once a zone is fully written (write workloads) or all written
740 data in the zone have been read (read workloads). This parameter
741 is valid only for sequential workloads and ignored for random
742 workloads. For read workloads, see also read_beyond_wp.
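
For zonemode=strided, the interplay of zonerange, zonesize and
zoneskip might look like the following sketch. The job name and sizes
are assumptions, and the comments restate the parameter descriptions
above rather than measured behavior.

```ini
[strided-read]
rw=read
bs=64k
size=1g
zonemode=strided
; each zone covers 256 MiB of the file
zonerange=256m
; only 64 MiB are transferred in a zone before moving on,
; so a fraction of each zone is accessed
zonesize=64m
; no extra bytes are skipped after zonesize bytes
zoneskip=0
```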
743
744
745 read_beyond_wp=bool
746 This parameter applies to zonemode=zbd only.
747
748 Zoned block devices are block devices that consist of multiple
749 zones. Each zone has a type, e.g. conventional or sequential. A
750 conventional zone can be written at any offset that is a multi‐
751 ple of the block size. Sequential zones must be written sequen‐
752 tially. The position at which a write must occur is called the
753 write pointer. A zoned block device can be either host managed
754 or host aware. For host managed devices the host must ensure
755 that writes happen sequentially. Fio recognizes host managed de‐
756 vices and serializes writes to sequential zones for these de‐
757 vices.
758
759 If a read occurs in a sequential zone beyond the write pointer
760 then the zoned block device will complete the read without read‐
761 ing any data from the storage medium. Since such reads lead to
762 unrealistically high bandwidth and IOPS numbers fio only reads
763 beyond the write pointer if explicitly told to do so. Default:
764 false.
765
766 max_open_zones=int
A zone of a zoned block device is in the open state when it is
partially written (i.e. not all sectors of the zone have been
written). Zoned block devices may have a limit on the total number of
zones that can be simultaneously in the open state, that is, the
number of zones that can be written to simultaneously. The
max_open_zones parameter limits the number of zones to which write
commands are issued by all fio jobs, that is, it limits the number of
zones that will be in the open state. This parameter is relevant only
if zonemode=zbd is used. The default value is always equal to the
maximum number of open zones of the target zoned block device, and a
value higher than this limit cannot be specified unless the option
ignore_zone_limits is specified. When ignore_zone_limits is specified
or the target device has no limit on the number of zones that can be
in an open state, max_open_zones can be set to 0 to disable any limit
on the number of zones that can be simultaneously written to by all
jobs.
784
785 job_max_open_zones=int
786 In the same manner as max_open_zones, limit the number of open
787 zones per fio job, that is, the number of zones that a single
788 job can simultaneously write to. A value of zero indicates no
789 limit. Default: zero.
790
791 ignore_zone_limits=bool
792 If this option is used, fio will ignore the maximum number of
793 open zones limit of the zoned block device in use, thus allowing
794 the option max_open_zones value to be larger than the device re‐
795 ported limit. Default: false.
796
797 zone_reset_threshold=float
A number between zero and one that indicates the ratio of written
bytes in the zones with write pointers in the IO range to the size of
the IO range. When the current ratio is above this threshold, zones
are reset periodically as zone_reset_frequency specifies. If there
are multiple jobs when using this option, the IO range for all write
jobs has to be the same.
804
805 zone_reset_frequency=float
806 A number between zero and one that indicates how often a zone
807 reset should be issued if the zone reset threshold has been ex‐
808 ceeded. A zone reset is submitted after each (1 / zone_re‐
809 set_frequency) write requests. This and the previous parameter
810 can be used to simulate garbage collection activity.
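
A sketch of a zoned write job that uses the reset parameters to mimic
garbage collection. The device path `/dev/nullb0' is a hypothetical
zoned block device, the values are assumptions, and the job
overwrites data on its target.

```ini
[zbd-write]
; hypothetical zoned block device; its contents will be overwritten
filename=/dev/nullb0
direct=1
zonemode=zbd
rw=randwrite
bs=64k
; never write to more than 8 zones at the same time
max_open_zones=8
; once more than half of the IO range holds written data,
; start resetting zones
zone_reset_threshold=0.5
; reset a zone after every 1 / 0.1 = 10 write requests
zone_reset_frequency=0.1
```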
811
812
813 I/O type
814 direct=bool
815 If value is true, use non-buffered I/O. This is usually O_DI‐
816 RECT. Note that OpenBSD and ZFS on Solaris don't support direct
817 I/O. On Windows the synchronous ioengines don't support direct
818 I/O. Default: false.
819
820 buffered=bool
821 If value is true, use buffered I/O. This is the opposite of the
822 direct option. Defaults to true.
823
824 readwrite=str, rw=str
825 Type of I/O pattern. Accepted values are:
826
827 read Sequential reads.
828
829 write Sequential writes.
830
831 trim Sequential trims (Linux block devices and SCSI
832 character devices only).
833
834 randread
835 Random reads.
836
837 randwrite
838 Random writes.
839
840 randtrim
841 Random trims (Linux block devices and SCSI charac‐
842 ter devices only).
843
844 rw,readwrite
845 Sequential mixed reads and writes.
846
847 randrw Random mixed reads and writes.
848
849 trimwrite
850 Sequential trim+write sequences. Blocks will be
851 trimmed first, then the same blocks will be writ‐
852 ten to. So if `io_size=64K' is specified, Fio will
853 trim a total of 64K bytes and also write 64K bytes
854 on the same trimmed blocks. This behaviour will be
855 consistent with `number_ios' or other Fio options
856 limiting the total bytes or number of I/O's.
857
858 randtrimwrite
859                      Like trimwrite, but uses random offsets rather
860                      than sequential ones.
861
862 Fio defaults to read if the option is not specified. For the
863 mixed I/O types, the default is to split them 50/50. For certain
864 types of I/O the result may still be skewed a bit, since the
865 speed may be different.
866
867 It is possible to specify the number of I/Os to do before get‐
868 ting a new offset by appending `:<nr>' to the end of the string
869 given. For a random read, it would look like `rw=randread:8' for
870 passing in an offset modifier with a value of 8. If the suffix
871 is used with a sequential I/O pattern, then the `<nr>' value
872 specified will be added to the generated offset for each I/O
873 turning sequential I/O into sequential I/O with holes. For in‐
874 stance, using `rw=write:4k' will skip 4k for every write. Also
875 see the rw_sequencer option.
876
877 rw_sequencer=str
878 If an offset modifier is given by appending a number to the
879 `rw=str' line, then this option controls how that number modi‐
880 fies the I/O offset being generated. Accepted values are:
881
882 sequential
883 Generate sequential offset.
884
885 identical
886 Generate the same offset.
887
888 sequential is only useful for random I/O, where fio would nor‐
889 mally generate a new random offset for every I/O. If you append
890 e.g. 8 to randread, i.e. `rw=randread:8' you would get a new
891 random offset for every 8 I/Os. The result would be a sequence
892        of 8 sequential offsets with a random starting point. However,
893        this behavior may change if a sequential I/O reaches the end of
894        the file. As sequential I/O is already sequential, setting
895        sequential for that would not result in any difference. identical
896        behaves in a similar fashion, except it sends the same offset 8
897        times before generating a new offset.
898
899 Example #1:
900
901 rw=randread:8
902 rw_sequencer=sequential
903 bs=4k
904
905 The generated sequence of offsets will look like this: 4k, 8k,
906 12k, 16k, 20k, 24k, 28k, 32k, 92k, 96k, 100k, 104k, 108k, 112k,
907 116k, 120k, 48k, 52k ...
908
909 Example #2:
910
911 rw=randread:8
912 rw_sequencer=identical
913 bs=4k
914
915 The generated sequence of offsets will look like this: 4k, 4k,
916 4k, 4k, 4k, 4k, 4k, 4k, 92k, 92k, 92k, 92k, 92k, 92k, 92k, 92k,
917 48k, 48k, 48k ...
918
919 unified_rw_reporting=str
920 Fio normally reports statistics on a per data direction basis,
921 meaning that reads, writes, and trims are accounted and reported
922 separately. This option determines whether fio reports the re‐
923 sults normally, summed together, or as both options. Accepted
924 values are:
925
926 none Normal statistics reporting.
927
928 mixed Statistics are summed per data direction and reported to‐
929 gether.
930
931 both Statistics are reported normally, followed by the mixed
932 statistics.
933
934 0 Backward-compatible alias for none.
935
936 1 Backward-compatible alias for mixed.
937
938 2 Alias for both.
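
          As an illustration, a mixed random job might request both
          reporting styles as sketched below. The section name, file path
          and sizes are placeholders; rwmixread is described later in this
          section.

          ```ini
          ; Hypothetical job: path and sizes are placeholders.
          [mixed-report]
          filename=/path/to/testfile
          size=1g
          rw=randrw
          rwmixread=70
          ; print per-direction statistics, followed by the mixed statistics
          unified_rw_reporting=both
          ```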
939
940 randrepeat=bool
941 Seed all random number generators in a predictable way so the
942 pattern is repeatable across runs. Default: true.
943
944 allrandrepeat=bool
945 Alias for randrepeat. Default: true.
946
947 randseed=int
948 Seed the random number generators based on this seed value, to
949 be able to control what sequence of output is being generated.
950 If not set, the random sequence depends on the randrepeat set‐
951 ting.
952
953 fallocate=str
954 Whether pre-allocation is performed when laying down files. Ac‐
955 cepted values are:
956
957 none Do not pre-allocate space.
958
959 native Use a platform's native pre-allocation call but
960 fall back to none behavior if it fails/is not im‐
961 plemented.
962
963 posix Pre-allocate via posix_fallocate(3).
964
965 keep Pre-allocate via fallocate(2) with FAL‐
966 LOC_FL_KEEP_SIZE set.
967
968 truncate
969                      Extend file to final size using ftruncate(2)
970                      instead of allocating.
971
972 0 Backward-compatible alias for none.
973
974 1 Backward-compatible alias for posix.
975
976 May not be available on all supported platforms. keep is only
977 available on Linux. If using ZFS on Solaris this cannot be set
978 to posix because ZFS doesn't support pre-allocation. Default:
979 native if any pre-allocation methods except truncate are avail‐
980 able, none if not.
981
982 Note that using truncate on Windows will interact surprisingly
983 with non-sequential write patterns. When writing to a file that
984 has been extended by setting the end-of-file information, Win‐
985 dows will backfill the unwritten portion of the file up to that
986 offset with zeroes before issuing the new write. This means that
987 a single small write to the end of an extended file will stall
988 until the entire file has been filled with zeroes.
989
990 fadvise_hint=str
991 Use posix_fadvise(2) or posix_madvise(2) to advise the kernel
992 what I/O patterns are likely to be issued. Accepted values are:
993
994 0 Backwards compatible hint for "no hint".
995
996 1 Backwards compatible hint for "advise with fio
997 workload type". This uses FADV_RANDOM for a random
998 workload, and FADV_SEQUENTIAL for a sequential
999 workload.
1000
1001 sequential
1002 Advise using FADV_SEQUENTIAL.
1003
1004 random Advise using FADV_RANDOM.
1005
1006 noreuse
1007 Advise using FADV_NOREUSE. This may be a no-op on
1008 older Linux kernels. Since Linux 6.3, it provides
1009 a hint to the LRU algorithm. See the posix_fad‐
1010 vise(2) man page.
1011
1012 write_hint=str
1013 Use fcntl(2) to advise the kernel what life time to expect from
1014 a write. Only supported on Linux, as of version 4.13. Accepted
1015 values are:
1016
1017 none No particular life time associated with this file.
1018
1019 short Data written to this file has a short life time.
1020
1021 medium Data written to this file has a medium life time.
1022
1023 long Data written to this file has a long life time.
1024
1025 extreme
1026 Data written to this file has a very long life
1027 time.
1028
1029 The values are all relative to each other, and no absolute mean‐
1030 ing should be associated with them.
1031
1032 offset=int[%|z]
1033 Start I/O at the provided offset in the file, given as either a
1034 fixed size in bytes, zones or a percentage. If a percentage is
1035 given, the generated offset will be aligned to the minimum
1036 blocksize or to the value of offset_align if provided. Data be‐
1037 fore the given offset will not be touched. This effectively caps
1038 the file size at `real_size - offset'. Can be combined with size
1039 to constrain the start and end range of the I/O workload. A
1040 percentage can be specified by a number between 1 and 100 fol‐
1041 lowed by '%', for example, `offset=20%' to specify 20%. In ZBD
1042 mode, value can be set as number of zones using 'z'.
1043
1044 offset_align=int
1045 If set to non-zero value, the byte offset generated by a per‐
1046 centage offset is aligned upwards to this value. Defaults to 0
1047 meaning that a percentage offset is aligned to the minimum block
1048 size.
1049
1050 offset_increment=int[%|z]
1051 If this is provided, then the real offset becomes `offset + off‐
1052 set_increment * thread_number', where the thread number is a
1053 counter that starts at 0 and is incremented for each sub-job
1054 (i.e. when numjobs option is specified). This option is useful
1055 if there are several jobs which are intended to operate on a
1056 file in parallel disjoint segments, with even spacing between
1057 the starting points. Percentages can be used for this option.
1058 If a percentage is given, the generated offset will be aligned
1059        to the minimum blocksize or to the value of offset_align if
1060        provided. In ZBD mode, the value can be set as a number of zones
1061        using 'z'.
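
           A sketch of the disjoint-segment use case, assuming a target of
           at least 4GiB (the path, section name and sizes are
           placeholders). Following the formula above, sub-job n starts at
           0 + 1GiB * n and, with size=1g, works within its own 1GiB
           segment:

           ```ini
           ; Hypothetical job: four sub-jobs, each writing its own 1GiB segment.
           [segments]
           filename=/path/to/testfile
           numjobs=4
           offset=0
           offset_increment=1g
           size=1g
           rw=write
           bs=1m
           ```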
1062
1063 number_ios=int
1064 Fio will normally perform I/Os until it has exhausted the size
1065        of the region set by size, or if it exhausts the allocated time
1066 (or hits an error condition). With this setting, the range/size
1067 can be set independently of the number of I/Os to perform. When
1068 fio reaches this number, it will exit normally and report sta‐
1069 tus. Note that this does not extend the amount of I/O that will
1070 be done, it will only stop fio if this condition is met before
1071 other end-of-job criteria.
1072
1073 fsync=int
1074 If writing to a file, issue an fsync(2) (or its equivalent) of
1075 the dirty data for every number of blocks given. For example, if
1076 you give 32 as a parameter, fio will sync the file after every
1077 32 writes issued. If fio is using non-buffered I/O, we may not
1078 sync the file. The exception is the sg I/O engine, which syn‐
1079 chronizes the disk cache anyway. Defaults to 0, which means fio
1080 does not periodically issue and wait for a sync to complete.
1081 Also see end_fsync and fsync_on_close.
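
           For instance, the 32-write case described above could be
           written as the following job-file sketch (the path, size and
           block size are placeholders). end_fsync is added here only to
           show how a final sync can be combined with the periodic one:

           ```ini
           ; Hypothetical job: sync after every 32 writes, and once more at the end.
           [fsync-every-32]
           filename=/path/to/testfile
           size=256m
           rw=write
           bs=4k
           fsync=32
           end_fsync=1
           ```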
1082
1083 fdatasync=int
1084 Like fsync but uses fdatasync(2) to only sync data and not meta‐
1085 data blocks. In Windows, DragonFlyBSD or OSX there is no fdata‐
1086 sync(2) so this falls back to using fsync(2). Defaults to 0,
1087 which means fio does not periodically issue and wait for a data-
1088 only sync to complete.
1089
1090 write_barrier=int
1091 Make every N-th write a barrier write.
1092
1093 sync_file_range=str:int
1094 Use sync_file_range(2) for every int number of write operations.
1095 Fio will track range of writes that have happened since the last
1096 sync_file_range(2) call. str can currently be one or more of:
1097
1098 wait_before
1099 SYNC_FILE_RANGE_WAIT_BEFORE
1100
1101 write SYNC_FILE_RANGE_WRITE
1102
1103 wait_after
1104               SYNC_FILE_RANGE_WAIT_AFTER
1105
1106 So if you do `sync_file_range=wait_before,write:8', fio would
1107 use `SYNC_FILE_RANGE_WAIT_BEFORE | SYNC_FILE_RANGE_WRITE' for
1108 every 8 writes. Also see the sync_file_range(2) man page. This
1109 option is Linux specific.
1110
1111 overwrite=bool
1112 If true, writes to a file will always overwrite existing data.
1113 If the file doesn't already exist, it will be created before the
1114 write phase begins. If the file exists and is large enough for
1115 the specified write phase, nothing will be done. Default: false.
1116
1117 end_fsync=bool
1118 If true, fsync(2) file contents when a write stage has com‐
1119 pleted. Default: false.
1120
1121 fsync_on_close=bool
1122 If true, fio will fsync(2) a dirty file on close. This differs
1123 from end_fsync in that it will happen on every file close, not
1124 just at the end of the job. Default: false.
1125
1126 rwmixread=int
1127 Percentage of a mixed workload that should be reads. Default:
1128 50.
1129
1130 rwmixwrite=int
1131 Percentage of a mixed workload that should be writes. If both
1132        rwmixread and rwmixwrite are given and the values do not add up
1133 to 100%, the latter of the two will be used to override the
1134 first. This may interfere with a given rate setting, if fio is
1135 asked to limit reads or writes to a certain rate. If that is the
1136 case, then the distribution may be skewed. Default: 50.
1137
1138 random_distribution=str:float[:float][,str:float][,str:float]
1139 By default, fio will use a completely uniform random distribu‐
1140 tion when asked to perform random I/O. Sometimes it is useful to
1141 skew the distribution in specific ways, ensuring that some parts
1142        of the data are hotter than others. Fio includes the following
1143 distribution models:
1144
1145 random Uniform random distribution
1146
1147 zipf Zipf distribution
1148
1149 pareto Pareto distribution
1150
1151 normal Normal (Gaussian) distribution
1152
1153               zoned  Zoned random distribution
1154               zoned_abs Zoned absolute random distribution
1155
1156 When using a zipf or pareto distribution, an input value is also
1157 needed to define the access pattern. For zipf, this is the `Zipf
1158 theta'. For pareto, it's the `Pareto power'. Fio includes a
1159        test program, fio-genzipf, that can be used to visualize what the
1160 given input values will yield in terms of hit rates. If you
1161 wanted to use zipf with a `theta' of 1.2, you would use `ran‐
1162 dom_distribution=zipf:1.2' as the option. If a non-uniform model
1163 is used, fio will disable use of the random map. For the normal
1164 distribution, a normal (Gaussian) deviation is supplied as a
1165 value between 0 and 100.
1166
1167        The second, optional float is allowed for pareto, zipf and normal
1168        distributions. It allows one to set the base of the distribution
1169        in a non-default place, giving more control over the most probable
1170        outcome. This value is in the range [0-1], which maps linearly to
1171        the range of possible random values. Defaults are: random for
1172        pareto and zipf, and 0.5 for normal. If you wanted to use zipf with
1173        a `theta' of 1.2 centered on 1/4 of the allowed value range, you
1174        would use `random_distribution=zipf:1.2:0.25'.
1175
1176 For a zoned distribution, fio supports specifying percentages of
1177        I/O access that should fall within what range of the file or
1178        device. For example, given the following criteria:
1179
1180 60% of accesses should be to the first 10%
1181 30% of accesses should be to the next 20%
1182 8% of accesses should be to the next 30%
1183 2% of accesses should be to the next 40%
1184
1185 we can define that through zoning of the random accesses. For
1186 the above example, the user would do:
1187
1188 random_distribution=zoned:60/10:30/20:8/30:2/40
1189
1190        A zoned_abs distribution works exactly like zoned, except
1191 that it takes absolute sizes. For example, let's say you wanted
1192 to define access according to the following criteria:
1193
1194 60% of accesses should be to the first 20G
1195 30% of accesses should be to the next 100G
1196 10% of accesses should be to the next 500G
1197
1198 we can define an absolute zoning distribution with:
1199
1200               random_distribution=zoned_abs:60/20G:30/100G:10/500G
1201
1202 For both zoned and zoned_abs, fio supports defining up to 256
1203 separate zones.
1204
1205        This works similarly to how bssplit sets ranges and percentages
1206        of block sizes. Like bssplit, it's possible to specify separate
1207        zones for reads, writes, and trims. If just one set is given,
1208        it'll apply to all of them.
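
           Put into a job file, two of the models described above might
           look like the sketch below. The section names, path and sizes
           are placeholders; the zipf theta and the zone percentages are
           the ones used in the text:

           ```ini
           ; Hypothetical jobs: path and sizes are placeholders.
           [global]
           filename=/path/to/testfile
           size=1g
           rw=randread
           bs=4k

           [zipf-reads]
           random_distribution=zipf:1.2

           [zoned-reads]
           ; 60% of accesses to the first 10%, 30% to the next 20%, and so on
           random_distribution=zoned:60/10:30/20:8/30:2/40
           ```

           Since both sections are in one file, they would run
           concurrently unless separated, for example with stonewall.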
1209
1210 percentage_random=int[,int][,int]
1211 For a random workload, set how big a percentage should be ran‐
1212 dom. This defaults to 100%, in which case the workload is fully
1213 random. It can be set from anywhere from 0 to 100. Setting it to
1214 0 would make the workload fully sequential. Any setting in be‐
1215 tween will result in a random mix of sequential and random I/O,
1216 at the given percentages. Comma-separated values may be speci‐
1217 fied for reads, writes, and trims as described in blocksize.
1218
1219 norandommap
1220 Normally fio will cover every block of the file when doing ran‐
1221 dom I/O. If this option is given, fio will just get a new random
1222 offset without looking at past I/O history. This means that some
1223 blocks may not be read or written, and that some blocks may be
1224 read/written more than once. If this option is used with verify
1225 and multiple blocksizes (via bsrange), only intact blocks are
1226 verified, i.e., partially-overwritten blocks are ignored. With
1227 an async I/O engine and an I/O depth > 1, it is possible for the
1228 same block to be overwritten, which can cause verification er‐
1229 rors. Either do not use norandommap in this case, or also use
1230 the lfsr random generator.
1231
1232 softrandommap=bool
1233 See norandommap. If fio runs with the random block map enabled
1234 and it fails to allocate the map, if this option is set it will
1235 continue without a random block map. As coverage will not be as
1236 complete as with random maps, this option is disabled by de‐
1237 fault.
1238
1239 random_generator=str
1240 Fio supports the following engines for generating I/O offsets
1241 for random I/O:
1242
1243 tausworthe
1244 Strong 2^88 cycle random number generator.
1245
1246 lfsr Linear feedback shift register generator.
1247
1248 tausworthe64
1249 Strong 64-bit 2^258 cycle random number generator.
1250
1251 tausworthe is a strong random number generator, but it requires
1252 tracking on the side if we want to ensure that blocks are only
1253 read or written once. lfsr guarantees that we never generate the
1254 same offset twice, and it's also less computationally expensive.
1255 It's not a true random generator, however, though for I/O pur‐
1256 poses it's typically good enough. lfsr only works with single
1257 block sizes, not with workloads that use multiple block sizes.
1258 If used with such a workload, fio may read or write some blocks
1259 multiple times. The default value is tausworthe, unless the re‐
1260 quired space exceeds 2^32 blocks. If it does, then tausworthe64
1261 is selected automatically.
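
           The advice given under norandommap (use the lfsr generator when
           verifying with an async engine and an I/O depth above 1) could
           be sketched as follows. This is an illustration only: the path
           and sizes are placeholders, and ioengine=libaio, iodepth and
           verify=crc32c are options documented elsewhere in this manual.
           A single block size is used because lfsr does not work with
           multiple block sizes:

           ```ini
           ; Hypothetical job: random writes with verification and no random map.
           [verify-lfsr]
           filename=/path/to/testfile
           size=1g
           ioengine=libaio
           iodepth=16
           rw=randwrite
           bs=4k
           norandommap
           ; lfsr never generates the same offset twice
           random_generator=lfsr
           verify=crc32c
           ```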
1262
1263 Block size
1264 blocksize=int[,int][,int], bs=int[,int][,int]
1265 The block size in bytes used for I/O units. Default: 4096. A
1266 single value applies to reads, writes, and trims. Comma-sepa‐
1267 rated values may be specified for reads, writes, and trims. A
1268 value not terminated in a comma applies to subsequent types. Ex‐
1269 amples:
1270
1271 bs=256k means 256k for reads, writes and trims.
1272 bs=8k,32k means 8k for reads, 32k for writes and
1273 trims.
1274 bs=8k,32k, means 8k for reads, 32k for writes, and
1275 default for trims.
1276 bs=,8k means default for reads, 8k for writes and
1277 trims.
1278 bs=,8k, means default for reads, 8k for writes,
1279 and default for trims.
1280
1281 blocksize_range=irange[,irange][,irange],
1282 bsrange=irange[,irange][,irange]
1283 A range of block sizes in bytes for I/O units. The issued I/O
1284 unit will always be a multiple of the minimum size, unless
1285 blocksize_unaligned is set. Comma-separated ranges may be spec‐
1286 ified for reads, writes, and trims as described in blocksize.
1287 Example:
1288
1289 bsrange=1k-4k,2k-8k
1290
1291 bssplit=str[,str][,str]
1292 Sometimes you want even finer grained control of the block sizes
1293 issued, not just an even split between them. This option allows
1294 you to weight various block sizes, so that you are able to de‐
1295 fine a specific amount of block sizes issued. The format for
1296 this option is:
1297
1298 bssplit=blocksize/percentage:blocksize/percentage
1299
1300 for as many block sizes as needed. So if you want to define a
1301 workload that has 50% 64k blocks, 10% 4k blocks, and 40% 32k
1302 blocks, you would write:
1303
1304 bssplit=4k/10:64k/50:32k/40
1305
1306 Ordering does not matter. If the percentage is left blank, fio
1307 will fill in the remaining values evenly. So a bssplit option
1308 like this one:
1309
1310 bssplit=4k/50:1k/:32k/
1311
1312        would have 50% 4k ios, and 25% each of 1k and 32k ios. The
1313        percentages always add up to 100; if bssplit is given a range that
1314        adds up to more, it will error out.
1315
1316 Comma-separated values may be specified for reads, writes, and
1317 trims as described in blocksize.
1318
1319 If you want a workload that has 50% 2k reads and 50% 4k reads,
1320 while having 90% 4k writes and 10% 8k writes, you would specify:
1321
1322 bssplit=2k/50:4k/50,4k/90:8k/10
1323
1324 Fio supports defining up to 64 different weights for each data
1325 direction.
1326
1327 blocksize_unaligned, bs_unaligned
1328 If set, fio will issue I/O units with any size within block‐
1329 size_range, not just multiples of the minimum size. This typi‐
1330 cally won't work with direct I/O, as that normally requires sec‐
1331 tor alignment.
1332
1333 bs_is_seq_rand=bool
1334 If this option is set, fio will use the normal read,write block‐
1335 size settings as sequential,random blocksize settings instead.
1336 Any random read or write will use the WRITE blocksize settings,
1337 and any sequential read or write will use the READ blocksize
1338 settings.
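
           A sketch of how this might be used, with placeholder path and
           sizes: in a half-random, half-sequential read job, the first bs
           value (the READ slot) would apply to the sequential I/Os and the
           second (the WRITE slot) to the random ones.

           ```ini
           ; Hypothetical job: 64k sequential I/O mixed with 4k random I/O.
           [seq-rand-sizes]
           filename=/path/to/testfile
           size=1g
           rw=randread
           percentage_random=50
           ; first value: sequential I/O, second value: random I/O
           bs=64k,4k
           bs_is_seq_rand=1
           ```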
1339
1340 blockalign=int[,int][,int], ba=int[,int][,int]
1341 Boundary to which fio will align random I/O units. Default:
1342 blocksize. Minimum alignment is typically 512b for using direct
1343 I/O, though it usually depends on the hardware block size. This
1344 option is mutually exclusive with using a random map for files,
1345 so it will turn off that option. Comma-separated values may be
1346 specified for reads, writes, and trims as described in block‐
1347 size.
1348
1349 Buffers and memory
1350 zero_buffers
1351 Initialize buffers with all zeros. Default: fill buffers with
1352 random data.
1353
1354 refill_buffers
1355 If this option is given, fio will refill the I/O buffers on ev‐
1356 ery submit. The default is to only fill it at init time and re‐
1357 use that data. Only makes sense if zero_buffers isn't specified,
1358 naturally. If data verification is enabled, refill_buffers is
1359 also automatically enabled.
1360
1361 scramble_buffers=bool
1362 If refill_buffers is too costly and the target is using data
1363 deduplication, then setting this option will slightly modify the
1364 I/O buffer contents to defeat normal de-dupe attempts. This is
1365 not enough to defeat more clever block compression attempts, but
1366 it will stop naive dedupe of blocks. Default: true.
1367
1368 buffer_compress_percentage=int
1369 If this is set, then fio will attempt to provide I/O buffer con‐
1370 tent (on WRITEs) that compresses to the specified level. Fio
1371 does this by providing a mix of random data followed by fixed
1372 pattern data. The fixed pattern is either zeros, or the pattern
1373 specified by buffer_pattern. If the buffer_pattern option is
1374 used, it might skew the compression ratio slightly. Setting buf‐
1375 fer_compress_percentage to a value other than 100 will also en‐
1376 able refill_buffers in order to reduce the likelihood that adja‐
1377 cent blocks are so similar that they over compress when seen to‐
1378 gether. See buffer_compress_chunk for how to set a finer or
1379 coarser granularity of the random/fixed data regions. Defaults
1380 to unset i.e., buffer data will not adhere to any compression
1381 level.
1382
1383 buffer_compress_chunk=int
1384 This setting allows fio to manage how big the random/fixed data
1385 region is when using buffer_compress_percentage. When buf‐
1386 fer_compress_chunk is set to some non-zero value smaller than
1387        the block size, fio can repeat the random/fixed region throughout
1388        the I/O buffer at the specified interval (which is particularly
1389        useful when bigger block sizes are used for a job). When set to
1390 0, fio will use a chunk size that matches the block size result‐
1391 ing in a single random/fixed region within the I/O buffer. De‐
1392 faults to 512. When the unit is omitted, the value is inter‐
1393 preted in bytes.
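
           As a rough illustration (path, sizes and percentages are
           placeholders), the job below asks for write buffers that
           compress to about 50%, with the random/fixed pattern repeated
           every 4k inside each 128k block:

           ```ini
           ; Hypothetical job: roughly 50% compressible write buffers.
           [compressible-writes]
           filename=/path/to/testfile
           size=1g
           rw=write
           bs=128k
           buffer_compress_percentage=50
           ; smaller than bs, so the pattern repeats within each buffer
           buffer_compress_chunk=4k
           ```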
1394
1395 buffer_pattern=str
1396 If set, fio will fill the I/O buffers with this pattern or with
1397 the contents of a file. If not set, the contents of I/O buffers
1398 are defined by the other options related to buffer contents. The
1399 setting can be any pattern of bytes, and can be prefixed with 0x
1400 for hex values. It may also be a string, where the string must
1401 then be wrapped with "". Or it may also be a filename, where the
1402 filename must be wrapped with '' in which case the file is
1403 opened and read. Note that not all the file contents will be
1404 read if that would cause the buffers to overflow. So, for exam‐
1405 ple:
1406
1407 buffer_pattern='filename'
1408 or:
1409 buffer_pattern="abcd"
1410 or:
1411 buffer_pattern=-12
1412 or:
1413 buffer_pattern=0xdeadface
1414
1415 Also you can combine everything together in any order:
1416
1417 buffer_pattern=0xdeadface"abcd"-12'filename'
1418
1419 dedupe_percentage=int
1420 If set, fio will generate this percentage of identical buffers
1421 when writing. These buffers will be naturally dedupable. The
1422 contents of the buffers depend on what other buffer compression
1423 settings have been set. It's possible to have the individual
1424 buffers either fully compressible, or not at all -- this option
1425 only controls the distribution of unique buffers. Setting this
1426 option will also enable refill_buffers to prevent every buffer
1427 being identical.
1428
1429 dedupe_mode=str
1430 If dedupe_percentage is given, then this option controls how fio
1431 generates the dedupe buffers.
1432
1433 repeat
1434
1435 Generate dedupe buffers by repeating previous
1436 writes
1437
1438 working_set
1439
1440 Generate dedupe buffers from working set
1441
1442        repeat is the default option for fio. Dedupe buffers are
1443        generated by repeating previous unique writes.
1444
1445 working_set is a more realistic workload. With working_set,
1446 dedupe_working_set_percentage should be provided. Given that,
1447 fio will use the initial unique write buffers as its working
1448 set. Upon deciding to dedupe, fio will randomly choose a buffer
1449 from the working set. Note that by using working_set the dedupe
1450        percentage will converge to the desired value over time while repeat
1451 maintains the desired percentage throughout the job.
1452
1453 dedupe_working_set_percentage=int
1454 If dedupe_mode is set to working_set, then this controls the
1455 percentage of size of the file or device used as the buffers fio
1456        will choose to generate the dedupe buffers from.
1457
1458        Note that size needs to be explicitly provided and only 1 file
1459        per job is supported.
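
           The dedupe options above might be combined as in this sketch.
           The path and all percentages are placeholders; size is given
           explicitly and a single file is used, as the note above
           requires:

           ```ini
           ; Hypothetical job: 30% dedupable writes drawn from a 10% working set.
           [dedupe-working-set]
           filename=/path/to/testfile
           size=1g
           rw=write
           bs=4k
           dedupe_percentage=30
           dedupe_mode=working_set
           dedupe_working_set_percentage=10
           ```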
1460
1461 dedupe_global=bool
1462 This controls whether the deduplication buffers will be shared
1463 amongst all jobs that have this option set. The buffers are
1464 spread evenly between participating jobs.
1465
1466 Note that dedupe_mode must be set to working_set for this to
1467        work. Can be used in combination with compression.
1468
1469 invalidate=bool
1470 Invalidate the buffer/page cache parts of the files to be
1471 used prior to starting I/O if the platform and file type
1472 support it. Defaults to true. This will be ignored if
1473 pre_read is also specified for the same job.
1474
1475 sync=str
1476 Whether, and what type, of synchronous I/O to use for
1477 writes. The allowed values are:
1478
1479 none Do not use synchronous IO, the default.
1480
1481 0 Same as none.
1482
1483 sync Use synchronous file IO. For the majority
1484 of I/O engines, this means using O_SYNC.
1485
1486 1 Same as sync.
1487
1488 dsync Use synchronous data IO. For the majority
1489 of I/O engines, this means using O_DSYNC.
1490
1491 iomem=str, mem=str
1492 Fio can use various types of memory as the I/O unit buf‐
1493 fer. The allowed values are:
1494
1495 malloc Use memory from malloc(3) as the buffers.
1496 Default memory type.
1497
1498 shm Use shared memory as the buffers. Allocated
1499 through shmget(2).
1500
1501 shmhuge
1502 Same as shm, but use huge pages as backing.
1503
1504 mmap Use mmap(2) to allocate buffers. May either
1505 be anonymous memory, or can be file backed
1506 if a filename is given after the option.
1507 The format is `mem=mmap:/path/to/file'.
1508
1509 mmaphuge
1510 Use a memory mapped huge file as the buffer
1511 backing. Append filename after mmaphuge,
1512 ala `mem=mmaphuge:/hugetlbfs/file'.
1513
1514 mmapshared
1515                             Same as mmap, but use a MAP_SHARED
1516                             mapping.
1517
1518 cudamalloc
1519 Use GPU memory as the buffers for GPUDirect
1520 RDMA benchmark. The ioengine must be rdma.
1521
1522 The area allocated is a function of the maximum allowed
1523 bs size for the job, multiplied by the I/O depth given.
1524 Note that for shmhuge and mmaphuge to work, the system
1525 must have free huge pages allocated. This can normally be
1526 checked and set by reading/writing
1527 `/proc/sys/vm/nr_hugepages' on a Linux system. Fio as‐
1528 sumes a huge page is 2 or 4MiB in size depending on the
1529 platform. So to calculate the number of huge pages you
1530 need for a given job file, add up the I/O depth of all
1531 jobs (normally one unless iodepth is used) and multiply
1532 by the maximum bs set. Then divide that number by the
1533 huge page size. You can see the size of the huge pages in
1534 `/proc/meminfo'. If no huge pages are allocated by having
1535 a non-zero number in `nr_hugepages', using mmaphuge or
1536 shmhuge will fail. Also see hugepage-size.
1537
1538 mmaphuge also needs to have hugetlbfs mounted and the
1539 file location should point there. So if it's mounted in
1540 `/huge', you would use `mem=mmaphuge:/huge/somefile'.
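
                   The huge page calculation above can be worked through on
                   a small example. This sketch assumes a 2MiB huge page
                   size and uses placeholder values throughout
                   (ioengine=libaio is documented elsewhere in this manual):
                   an I/O depth of 32 multiplied by a bs of 1MiB gives
                   32MiB of buffers, and 32MiB / 2MiB = 16 huge pages.

                   ```ini
                   ; Hypothetical job: needs 32 * 1MiB / 2MiB = 16 free huge pages,
                   ; assuming the system huge page size is 2MiB.
                   [hugepage-buffers]
                   filename=/path/to/testfile
                   size=1g
                   ioengine=libaio
                   iodepth=32
                   rw=read
                   bs=1m
                   mem=shmhuge
                   ```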
1541
1542 iomem_align=int, mem_align=int
1543 This indicates the memory alignment of the I/O memory
1544 buffers. Note that the given alignment is applied to the
1545               first I/O unit buffer; if using iodepth, the alignment of
1546               the following buffers is given by the bs used. In other
1547               words, if using a bs that is a multiple of the page size
1548               in the system, all buffers will be aligned to this value.
1549 If using a bs that is not page aligned, the alignment of
1550 subsequent I/O memory buffers is the sum of the
1551 iomem_align and bs used.
1552
1553 hugepage-size=int
1554 Defines the size of a huge page. Must at least be equal
1555 to the system setting, see `/proc/meminfo' and `/sys/ker‐
1556 nel/mm/hugepages/'. Defaults to 2 or 4MiB depending on
1557 the platform. Should probably always be a multiple of
1558 megabytes, so using `hugepage-size=Xm' is the preferred
1559 way to set this to avoid setting a non-pow-2 bad value.
1560
1561 lockmem=int
1562 Pin the specified amount of memory with mlock(2). Can be
1563 used to simulate a smaller amount of memory. The amount
1564 specified is per worker.
1565
1566 I/O size
1567 size=int[%|z]
1568 The total size of file I/O for each thread of this job. Fio will
1569        run until this many bytes have been transferred, unless runtime
1570 is altered by other means such as (1) runtime, (2) io_size, (3)
1571 number_ios, (4) gaps/holes while doing I/O's such as
1572 `rw=read:16K', or (5) sequential I/O reaching end of the file
1573 which is possible when percentage_random is less than 100. Fio
1574 will divide this size between the available files determined by
1575 options such as nrfiles, filename, unless filesize is specified
1576 by the job. If the result of division happens to be 0, the size
1577 is set to the physical size of the given files or devices if
1578 they exist. If this option is not specified, fio will use the
1579 full size of the given files or devices. If the files do not ex‐
1580 ist, size must be given. It is also possible to give size as a
1581 percentage between 1 and 100. If `size=20%' is given, fio will
1582 use 20% of the full size of the given files or devices. In ZBD
1583 mode, size can be given in units of number of zones using 'z'.
1584 Can be combined with offset to constrain the start and end range
1585 that I/O will be done within.
1586
1587 io_size=int[%|z], io_limit=int[%|z]
1588 Normally fio operates within the region set by size, which means
1589 that the size option sets both the region and size of I/O to be
1590 performed. Sometimes that is not what you want. With this op‐
1591 tion, it is possible to define just the amount of I/O that fio
1592 should do. For instance, if size is set to 20GiB and io_size is
1593 set to 5GiB, fio will perform I/O within the first 20GiB but
1594 exit when 5GiB have been done. The opposite is also possible --
1595 if size is set to 20GiB, and io_size is set to 40GiB, then fio
1596 will do 40GiB of I/O within the 0..20GiB region. Value can be
1597 set as percentage: io_size=N%. In this case io_size multiplies
1598 size= value. In ZBD mode, value can also be set as number of
1599 zones using 'z'.
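
The two cases described above could be written as the following job file fragment. It is an illustrative sketch: the file path and job names are hypothetical, and stonewall is used only so the two jobs do not run at the same time.

```ini
; Sketch only: the path and the job names are hypothetical.
[global]
filename=/mnt/scratch/fio-demo.dat
rw=randwrite
bs=4k

[short-run]
; Work within a 20GiB region, but exit after 5GiB of I/O.
size=20g
io_size=5g

[long-run]
; Wait for the previous job to finish before starting.
stonewall
; Do 40GiB of I/O within the same 0..20GiB region.
size=20g
io_size=40g
```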
1600
1601 filesize=irange(int)
1602 Individual file sizes. May be a range, in which case fio will
1603 select sizes for files at random within the given range. If not
1604 given, each created file is the same size. This option overrides
1605 size in terms of file size, i.e. size becomes merely the default
1606 for io_size (and has no effect at all if io_size is set
1607 explicitly).
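
A possible use of a filesize range is sketched below. The directory, job name and sizes are arbitrary assumptions; the fragment only illustrates how filesize and size interact as described above.

```ini
; Sketch only: the directory and the job name are hypothetical.
[many-files]
directory=/mnt/scratch/fio-demo
nrfiles=16
; Each file gets a random size between 64KiB and 1MiB.
filesize=64k-1m
; With filesize set, size only acts as the default for io_size.
size=8m
rw=write
bs=4k
```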
1608
1609 file_append=bool
1610 Perform I/O after the end of the file. Normally fio will operate
1611 within the size of a file. If this option is set, then fio will
1612 append to the file instead. This has identical behavior to set‐
1613 ting offset to the size of a file. This option is ignored on
1614 non-regular files.
1615
1616 fill_device=bool, fill_fs=bool
1617 Sets size to something really large and waits for ENOSPC (no
1618 space left on device) or EDQUOT (disk quota exceeded) as the
1619 terminating condition. Only makes sense with sequential write.
1620 For a read workload, the mount point will be filled first then
1621 I/O started on the result.
1622
1623 I/O engine
1624 ioengine=str
1625 Defines how the job issues I/O to the file. The following types
1626 are defined:
1627
1628 sync Basic read(2) or write(2) I/O. lseek(2) is used to
1629 position the I/O location. See fsync and fdata‐
1630 sync for syncing write I/Os.
1631
1632 psync Basic pread(2) or pwrite(2) I/O. Default on all
1633 supported operating systems except for Windows.
1634
1635 vsync Basic readv(2) or writev(2) I/O. Will emulate
1636 queuing by coalescing adjacent I/Os into a single
1637 submission.
1638
1639 pvsync Basic preadv(2) or pwritev(2) I/O.
1640
1641 pvsync2
1642 Basic preadv2(2) or pwritev2(2) I/O.
1643
1644 io_uring
1645 Fast Linux native asynchronous I/O. Supports async
1646 IO for both direct and buffered IO. This engine
1647 defines engine specific options.
1648
1649 io_uring_cmd
1650 Fast Linux native asynchronous I/O for passthrough
1651 commands. This engine defines engine specific op‐
1652 tions.
1653
1654 libaio Linux native asynchronous I/O. Note that Linux may
1655 only support queued behavior with non-buffered I/O
1656 (set `direct=1' or `buffered=0'). This engine de‐
1657 fines engine specific options.
1658
1659 posixaio
1660 POSIX asynchronous I/O using aio_read(3) and
1661 aio_write(3).
1662
1663 solarisaio
1664 Solaris native asynchronous I/O.
1665
1666 windowsaio
1667 Windows native asynchronous I/O. Default on Win‐
1668 dows.
1669
1670 mmap File is memory mapped with mmap(2) and data copied
1671 to/from using memcpy(3).
1672
1673 splice splice(2) is used to transfer the data and vm‐
1674 splice(2) to transfer data from user space to the
1675 kernel.
1676
1677 sg SCSI generic sg v3 I/O. May either be synchronous
1678 using the SG_IO ioctl, or if the target is an sg
1679 character device we use read(2) and write(2) for
1680 asynchronous I/O. Requires filename option to
1681 specify either block or character devices. This
1682 engine supports trim operations. The sg engine in‐
1683 cludes engine specific options.
1684
1685 libzbc Read, write, trim and ZBC/ZAC operations to a
1686 zoned block device using libzbc library. The tar‐
1687 get can be either an SG character device or a
1688 block device file.
1689
1690 null Doesn't transfer any data, just pretends to. This
1691 is mainly used to exercise fio itself and for de‐
1692 bugging/testing purposes.
1693
1694 net Transfer over the network to given `host:port'.
1695 Depending on the protocol used, the hostname,
1696 port, listen and filename options are used to
1697 specify what sort of connection to make, while the
1698 protocol option determines which protocol will be
1699 used. This engine defines engine specific options.
1700
1701 netsplice
1702 Like net, but uses splice(2) and vmsplice(2) to
1703 map data and send/receive. This engine defines
1704 engine specific options.
1705
1706 cpuio Doesn't transfer any data, but burns CPU cycles
1707 according to the cpuload, cpuchunks and cpumode
1708 options. A job never finishes unless there is at
1709 least one non-cpuio job.
1710
1711 cpuload=85 will cause that job to do nothing but
1712 burn 85% of the CPU. In case of SMP machines, use
1713 numjobs=<nr_of_cpu> to get desired CPU usage, as
1714 the cpuload only loads a single CPU at the desired
1715 rate.
1716
1717 cpumode=qsort replaces the default noop
1718 instruction loop with a qsort algorithm to consume
1719 more energy.
1720
1721 rdma The RDMA I/O engine supports both RDMA memory se‐
1722 mantics (RDMA_WRITE/RDMA_READ) and channel seman‐
1723 tics (Send/Recv) for the InfiniBand, RoCE and
1724 iWARP protocols. This engine defines engine spe‐
1725 cific options.
1726 falloc I/O engine that does regular fallocate to simulate
1727 data transfer as fio ioengine.
1728 DDIR_READ does fallocate(,mode =
1729 FALLOC_FL_KEEP_SIZE,).
1730 DDIR_WRITE does fallocate(,mode = 0).
1731 DDIR_TRIM does fallocate(,mode =
1732 FALLOC_FL_KEEP_SIZE|FALLOC_FL_PUNCH_HOLE).
1733
1734 ftruncate
1735 I/O engine that sends ftruncate(2) operations in
1736 response to write (DDIR_WRITE) events. Each ftrun‐
1737 cate issued sets the file's size to the current
1738 block offset. blocksize is ignored.
1739
1740 e4defrag
1741 I/O engine that does regular EXT4_IOC_MOVE_EXT
1742 ioctls to simulate defragment activity in response
1743 to DDIR_WRITE events.
1744
1745 rados I/O engine supporting direct access to Ceph Reli‐
1746 able Autonomic Distributed Object Store (RADOS)
1747 via librados. This ioengine defines engine spe‐
1748 cific options.
1749
1750 rbd I/O engine supporting direct access to Ceph Rados
1751 Block Devices (RBD) via librbd without the need to
1752 use the kernel rbd driver. This ioengine defines
1753 engine specific options.
1754
1755 http I/O engine supporting GET/PUT requests over
1756 HTTP(S) with libcurl to a WebDAV or S3 endpoint.
1757 This ioengine defines engine specific options.
1758
1759 This engine only supports direct IO of iodepth=1;
1760 you need to scale this via numjobs. blocksize de‐
1761 fines the size of the objects to be created.
1762
1763 TRIM is translated to object deletion.
1764
1765 gfapi Uses the GlusterFS libgfapi sync interface for direct
1766 access to GlusterFS volumes without having to go
1767 through FUSE. This ioengine defines engine spe‐
1768 cific options.
1769
1770 gfapi_async
1771 Uses the GlusterFS libgfapi async interface for direct
1772 access to GlusterFS volumes without having to go
1773 through FUSE. This ioengine defines engine spe‐
1774 cific options.
1775
1776 libhdfs
1777 Read and write through Hadoop (HDFS). The filename
1778 option is used to specify host,port of the hdfs
1779 name-node to connect. This engine interprets off‐
1780 sets a little differently. In HDFS, files once
1781 created cannot be modified so random writes are
1782 not possible. To imitate this the libhdfs engine
1783 expects a bunch of small files to be created over
1784 HDFS and will randomly pick a file from them based
1785 on the offset generated by fio backend (see the
1786 example job file to create such files, use
1787 `rw=write' option). Please note, it may be neces‐
1788 sary to set environment variables to work with
1789 HDFS/libhdfs properly. Each job uses its own con‐
1790 nection to HDFS.
1791
1792 mtd Read, write and erase an MTD character device
1793 (e.g., `/dev/mtd0'). Discards are treated as
1794 erases. Depending on the underlying device type,
1795 the I/O may have to go in a certain pattern, e.g.,
1796 on NAND, writing sequentially to erase blocks and
1797 discarding before overwriting. The trimwrite mode
1798 works well for this constraint.
1799
1800 dev-dax
1801 Read and write using device DAX to a persistent
1802 memory device (e.g., /dev/dax0.0) through the PMDK
1803 libpmem library.
1804
1805 external
1806 Prefix to specify loading an external I/O engine
1807 object file. Append the engine filename, e.g. `io‐
1808 engine=external:/tmp/foo.o' to load ioengine
1809 `foo.o' in `/tmp'. The path can be either absolute
1810 or relative. See `engines/skeleton_external.c' in
1811 the fio source for details of writing an external
1812 I/O engine.
1813
1814 filecreate
1815 Simply create the files and do no I/O to them.
1816 You still need to set filesize so that all the ac‐
1817 counting still occurs, but no actual I/O will be
1818 done other than creating the file.
1819
1820 filestat
1821 Simply do stat() and do no I/O to the file. You
1822 need to set 'filesize' and 'nrfiles', so that
1823 files will be created. This engine is to measure
1824 file lookup and meta data access.
1825
1826 filedelete
1827 Simply delete files by unlink() and do no I/O to
1828 the file. You need to set 'filesize' and 'nr‐
1829 files', so that files will be created. This en‐
1830 gine is to measure file delete.
1831
1832 libpmem
1833 Read and write using mmap I/O to a file on a
1834 filesystem mounted with DAX on a persistent memory
1835 device through the PMDK libpmem library.
1836
1837 ime_psync
1838 Synchronous read and write using DDN's Infinite
1839 Memory Engine (IME). This engine is very basic and
1840 issues calls to IME whenever an IO is queued.
1841
1842 ime_psyncv
1843 Synchronous read and write using DDN's Infinite
1844 Memory Engine (IME). This engine uses iovecs and
1845 will try to stack as many IOs as possible (if the
1846 IOs are "contiguous" and the IO depth is not ex‐
1847 ceeded) before issuing a call to IME.
1848
1849 ime_aio
1850 Asynchronous read and write using DDN's Infinite
1851 Memory Engine (IME). This engine will try to stack
1852 as many IOs as possible by creating requests for
1853 IME. FIO will then decide when to commit these
1854 requests.
1855
1856 libiscsi
1857 Read and write an iSCSI LUN with libiscsi.
1858
1859 nbd Synchronous read and write a Network Block Device
1860 (NBD).
1861
1862 libcufile
1863 I/O engine supporting libcufile synchronous access
1864 to nvidia-fs and a GPUDirect Storage-supported
1865 filesystem. This engine performs I/O without
1866 transferring buffers between user-space and the
1867 kernel, unless verify is set or cuda_io is posix.
1868 iomem must not be cudamalloc. This ioengine de‐
1869 fines engine specific options.
1870
1871 dfs I/O engine supporting asynchronous read and write
1872 operations to the DAOS File System (DFS) via
1873 libdfs.
1874
1875 nfs I/O engine supporting asynchronous read and write
1876 operations to NFS filesystems from userspace via
1877 libnfs. This is useful for achieving higher con‐
1878 currency and thus throughput than is possible via
1879 kernel NFS.
1880
1881 exec Execute 3rd party tools. Could be used to perform
1882 monitoring during a job's runtime.
1883
1884 xnvme I/O engine using the xNVMe C API, for NVMe de‐
1885 vices. The xnvme engine provides flexibility to
1886 access GNU/Linux Kernel NVMe driver via libaio,
1887 IOCTLs, io_uring, the SPDK NVMe driver, or your
1888 own custom NVMe driver. The xnvme engine includes
1889 engine specific options. (See https://xnvme.io/).
1890
1891 libblkio
1892 Use the libblkio library (https://gitlab.com/lib‐
1893 blkio/libblkio). The specific driver to use must
1894 be set using libblkio_driver. If mem/iomem is not
1895 specified, memory allocation is delegated to lib‐
1896 blkio (and so is guaranteed to work with the se‐
1897 lected driver). One libblkio instance is used per
1898 process, so all jobs setting option thread will
1899 share a single instance (with one queue per
1900 thread) and must specify compatible options. Note
1901 that some drivers don't allow several instances to
1902 access the same device or file simultaneously, but
1903 allow it for threads.
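
Selecting an engine is a matter of setting ioengine in the job file. The fragment below is a minimal sketch for libaio; the file path, job name and queue depth are hypothetical choices. It sets direct=1 because, as noted for libaio above, Linux may only support queued behavior with non-buffered I/O.

```ini
; Sketch only: the path, the job name and the depth are hypothetical.
[global]
ioengine=libaio
; libaio may only queue asynchronously with non-buffered I/O.
direct=1
iodepth=32
bs=4k

[randread]
filename=/mnt/scratch/fio-demo.dat
size=1g
rw=randread
```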
1904
1905 I/O engine specific parameters
1906 In addition, there are some parameters which are only valid when a spe‐
1907 cific ioengine is in use. These are used identically to normal parame‐
1908 ters, with the caveat that when used on the command line, they must
1909 come after the ioengine that defines them is selected.
1910
1911 (io_uring,libaio)cmdprio_percentage=int[,int]
1912 Set the percentage of I/O that will be issued with the highest
1913 priority. Default: 0. A single value applies to reads and
1914 writes. Comma-separated values may be specified for reads and
1915 writes. For this option to be effective, NCQ priority must be
1916 supported and enabled, and `direct=1' option must be used. fio
1917 must also be run as the root user. Unlike slat/clat/lat stats,
1918 which can be tracked and reported independently, per priority
1919 stats only track and report a single type of latency. By de‐
1920 fault, completion latency (clat) will be reported, if lat_per‐
1921 centiles is set, total latency (lat) will be reported.
1922
1923 (io_uring,libaio)cmdprio_class=int[,int]
1924 Set the I/O priority class to use for I/Os that must be issued
1925 with a priority when cmdprio_percentage or cmdprio_bssplit is
1926 set. If not specified when cmdprio_percentage or cmdprio_bss‐
1927 plit is set, this defaults to the highest priority class. A sin‐
1928 gle value applies to reads and writes. Comma-separated values
1929 may be specified for reads and writes. See man ionice(1). See
1930 also the prioclass option.
1931
1932 (io_uring,libaio)cmdprio=int[,int]
1933 Set the I/O priority value to use for I/Os that must be issued
1934 with a priority when cmdprio_percentage or cmdprio_bssplit is
1935 set. If not specified when cmdprio_percentage or cmdprio_bss‐
1936 plit is set, this defaults to 0. Linux limits us to a positive
1937 value between 0 and 7, with 0 being the highest. A single value
1938 applies to reads and writes. Comma-separated values may be
1939 specified for reads and writes. See man ionice(1). Refer to an
1940 appropriate manpage for other operating systems since the mean‐
1941 ing of priority may differ. See also the prio option.
1942
1943 (io_uring,libaio)cmdprio_bssplit=str[,str]
1944 To get a finer control over I/O priority, this option allows
1945 specifying the percentage of IOs that must have a priority set
1946 depending on the block size of the IO. This option is useful
1947 only when used together with the option bssplit, that is, multi‐
1948 ple different block sizes are used for reads and writes.
1949
1950 The first accepted format for this option is the same as the
1951 format of the bssplit option:
1952
1953 cmdprio_bssplit=blocksize/percentage:blocksize/percentage
1954
1955 In this case, each entry will use the priority class and prior‐
1956 ity level defined by the options cmdprio_class and cmdprio re‐
1957 spectively.
1958
1959 The second accepted format for this option is:
1960
1961 cmdprio_bssplit=blocksize/percentage/class/level:block‐
1962 size/percentage/class/level
1963
1964 In this case, the priority class and priority level is defined
1965 inside each entry. In comparison with the first accepted format,
1966 the second accepted format does not restrict all entries to have
1967 the same priority class and priority level.
1968
1969 For both formats, only the read and write data directions are
1970 supported, values for trim IOs are ignored. This option is mutu‐
1971 ally exclusive with the cmdprio_percentage option.
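
Both formats are sketched in the job file fragment below. The device name, job names, block sizes and percentages are hypothetical; the class and level values are arbitrary examples within the ranges described for cmdprio_class and cmdprio.

```ini
; Sketch only: the device, the job names and all values are hypothetical.
[global]
ioengine=libaio
direct=1
iodepth=32
rw=randread
; NCQ priority must be supported and enabled on this device.
filename=/dev/sdX
bssplit=4k/60:64k/40

[first-format]
; Every entry uses the class and level from cmdprio_class and cmdprio.
cmdprio_class=1
cmdprio=0
; Prioritize all 4k reads and no 64k reads.
cmdprio_bssplit=4k/100:64k/0

[second-format]
stonewall
; Each entry is blocksize/percentage/class/level.
cmdprio_bssplit=4k/100/1/0:64k/50/2/4
```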
1972
1973 (io_uring,io_uring_cmd)fixedbufs
1974 If fio is asked to do direct IO, then Linux will map pages for
1975 each IO call, and release them when IO is done. If this option
1976 is set, the pages are pre-mapped before IO is started. This
1977 eliminates the need to map and release for each IO. This is
1978 more efficient, and reduces the IO latency as well.
1979
1980 (io_uring,io_uring_cmd)nonvectored=int
1981 With this option, fio will use non-vectored read/write commands,
1982 where address must contain the address directly. Default is -1.
1983
1984 (io_uring,io_uring_cmd)force_async
1985 Normal operation for io_uring is to try and issue an sqe as
1986 non-blocking first, and if that fails, execute it in an async
1987 manner. With this option set to N, fio will ask for every Nth
1988 sqe to be issued in an async manner. Default is 0.
1989
1990 (io_uring,io_uring_cmd,xnvme)hipri
1991 If this option is set, fio will attempt to use polled IO comple‐
1992 tions. Normal IO completions generate interrupts to signal the
1993 completion of IO, polled completions do not. Hence they
1994 require active reaping by the application. The benefits are more
1995 efficient IO for high IOPS scenarios, and lower latencies for
1996 low queue depth IO.
1997
1998 (io_uring,io_uring_cmd)registerfiles
1999 With this option, fio registers the set of files being used with
2000 the kernel. This avoids the overhead of managing file counts in
2001 the kernel, making the submission and completion part more
2002 lightweight. Required for the below sqthread_poll option.
2003
2004 (io_uring,io_uring_cmd,xnvme)sqthread_poll
2005 Normally fio will submit IO by issuing a system call to notify
2006 the kernel of available items in the SQ ring. If this option is
2007 set, the act of submitting IO will be done by a polling thread
2008 in the kernel. This frees up cycles for fio, at the cost of us‐
2009 ing more CPU in the system. As submission is just the time it
2010 takes to fill in the sqe entries and any syscall required to
2011 wake up the idle kernel thread, fio will not report submission
2012 latencies.
2013
2014 (io_uring,io_uring_cmd)sqthread_poll_cpu=int
2015 When `sqthread_poll` is set, this option provides a way to de‐
2016 fine which CPU should be used for the polling thread.
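
The io_uring options above are often combined. The fragment below is a sketch under assumptions: the device name, job name, queue depth and CPU number are hypothetical, and the flag-style options are written without a value, as they appear in this manual.

```ini
; Sketch only: the device, the job name and the CPU number are hypothetical.
[uring-poll]
ioengine=io_uring
direct=1
iodepth=64
rw=randread
bs=4k
filename=/dev/nvme0n1
; Pre-map the I/O buffers once instead of mapping them for each I/O.
fixedbufs
; Register the files with the kernel; required for sqthread_poll.
registerfiles
; Let a kernel thread submit I/O, placed on CPU 2.
sqthread_poll
sqthread_poll_cpu=2
```

With sqthread_poll set, fio does not report submission latencies for this job.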
2017
2018 (io_uring_cmd)cmd_type=str
2019 Specifies the type of uring passthrough command to be used. Sup‐
2020 ported value is nvme. Default is nvme.
2021
2022 (libaio)userspace_reap
2023 Normally, with the libaio engine in use, fio will use the
2024 io_getevents(3) system call to reap newly returned events. With
2025 this flag turned on, the AIO ring will be read directly from
2026 user-space to reap events. The reaping mode is only enabled when
2027 polling for a minimum of 0 events (e.g. when `iodepth_batch_com‐
2028 plete=0').
2029
2030 (pvsync2)hipri
2031 Set RWF_HIPRI on I/O, indicating to the kernel that it's of
2032 higher priority than normal.
2033
2034 (pvsync2)hipri_percentage
2035 When hipri is set this determines the probability of a pvsync2
2036 I/O being high priority. The default is 100%.
2037
2038 (pvsync2,libaio,io_uring,io_uring_cmd)nowait=bool
2039 By default if a request cannot be executed immediately (e.g. re‐
2040 source starvation, waiting on locks) it is queued and the initi‐
2041 ating process will be blocked until the required resource be‐
2042 comes free. This option sets the RWF_NOWAIT flag (supported
2043 from the 4.14 Linux kernel) and the call will return instantly
2044 with EAGAIN or a partial result rather than waiting.
2045
2046 It is useful to also use ignore_error=EAGAIN when using this
2047 option. Note: glibc 2.27 and 2.28 have a bug in the syscall wrappers
2048 preadv2 and pwritev2: they return EOPNOTSUPP instead of EAGAIN.
2049
2050 For cached I/O, using this option usually means a request
2051 operates only with cached data. Currently the RWF_NOWAIT flag is
2052 not supported for cached writes. For direct I/O, requests will
2053 only succeed if cache invalidation isn't required, file blocks
2054 are fully allocated and the disk request could be issued immedi‐
2055 ately.
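
A minimal sketch of a job that uses nowait together with ignore_error follows. The file path, job name and sizes are hypothetical.

```ini
; Sketch only: the path and the job name are hypothetical.
[nowait-read]
ioengine=pvsync2
direct=1
rw=randread
bs=4k
filename=/mnt/scratch/fio-demo.dat
size=1g
; Return EAGAIN instead of blocking when a request cannot run immediately.
nowait=1
; Keep the job running when EAGAIN is returned.
ignore_error=EAGAIN
```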
2056
2057 (io_uring_cmd)fdp=bool
2058 Enable Flexible Data Placement mode for write commands.
2059
2060 (io_uring_cmd)fdp_pli=str
2061 Select which Placement ID Index/Indices this job is allowed to
2062 use for writes. By default, the job will cycle through all
2063 available Placement IDs, so use this to isolate these
2064 identifiers to specific jobs. To have fio use placement
2065 identifiers only at indices 0, 2 and 5, set
2066 `fdp_pli=0,2,5`.
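
In a job file, the FDP options might be combined as in the sketch below. The device path and job name are hypothetical, the device is assumed to have Flexible Data Placement enabled, and the job overwrites data on it.

```ini
; Sketch only: the device and the job name are hypothetical.
; Writes are destructive.
[fdp-writer]
ioengine=io_uring_cmd
cmd_type=nvme
filename=/dev/ng0n1
rw=randwrite
bs=4k
iodepth=32
fdp=1
; Restrict this job to the placement identifiers at indices 0, 2 and 5.
fdp_pli=0,2,5
```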
2067
2068 (cpuio)cpuload=int
2069 Attempt to use the specified percentage of CPU cycles. This is a
2070 mandatory option when using cpuio I/O engine.
2071
2072 (cpuio)cpuchunks=int
2073 Split the load into cycles of the given time. In microseconds.
2074
2075 (cpuio)cpumode=str
2076 Specify how to stress the CPU. It can take these two values:
2077
2078 noop This is the default and directs the CPU to execute
2079 noop instructions.
2080
2081 qsort Replace the default noop instructions with a qsort
2082 algorithm to consume more energy.
2083
2084 (cpuio)exit_on_io_done=bool
2085 Detect when I/O threads are done, then exit.
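
The cpuio options can be combined as in the following sketch. The job name, runtime, job count and chunk length are arbitrary assumptions; numjobs is set because cpuload only loads a single CPU per job.

```ini
; Sketch only: the job name and all values are hypothetical.
[global]
ioengine=cpuio
time_based
runtime=15

[burn]
; Four jobs, each loading one CPU at 85%.
numjobs=4
cpuload=85
; Split the load into cycles of 50000 microseconds.
cpuchunks=50000
; Use the qsort algorithm instead of noop instructions.
cpumode=qsort
```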
2086
2087 (libhdfs)namenode=str
2088 The hostname or IP address of a HDFS cluster namenode to con‐
2089 tact.
2090
2091 (libhdfs)port=int
2092 The listening port of the HDFS cluster namenode.
2093
2094 (netsplice,net)port=int
2095 The TCP or UDP port to bind to or connect to. If this is used
2096 with numjobs to spawn multiple instances of the same job type,
2097 then this will be the starting port number since fio will use a
2098 range of ports.
2099
2100 (rdma,librpma_*)port=int
2101 The port to use for RDMA-CM communication. This should be the
2102 same value on the client and the server side.
2103
2104 (netsplice,net,rdma)hostname=str
2105 The hostname or IP address to use for TCP, UDP or RDMA-CM based
2106 I/O. If the job is a TCP listener or UDP reader, the hostname
2107 is not used and must be omitted unless it is a valid UDP multi‐
2108 cast address.
2109
2110 (librpma_*)serverip=str
2111 The IP address to be used for RDMA-CM based I/O.
2112
2113 (librpma_*_server)direct_write_to_pmem=bool
2114 Set to 1 only when Direct Write to PMem from the remote host is
2115 possible. Otherwise, set to 0.
2116
2117 (librpma_*_server)busy_wait_polling=bool
2118 Set to 0 to wait for completion instead of busy-wait polling
2119 completion. Default: 1.
2120
2121 (netsplice,net)interface=str
2122 The IP address of the network interface used to send or receive
2123 UDP multicast.
2124
2125 (netsplice,net)ttl=int
2126 Time-to-live value for outgoing UDP multicast packets. Default:
2127 1.
2128
2129 (netsplice,net)nodelay=bool
2130 Set TCP_NODELAY on TCP connections.
2131
2132 (netsplice,net)protocol=str, proto=str
2133 The network protocol to use. Accepted values are:
2134
2135 tcp Transmission control protocol.
2136
2137 tcpv6 Transmission control protocol V6.
2138
2139 udp User datagram protocol.
2140
2141 udpv6 User datagram protocol V6.
2142
2143 unix UNIX domain socket.
2144
2145 When the protocol is TCP or UDP, the port must also be given, as
2146 well as the hostname if the job is a TCP listener or UDP reader.
2147 For unix sockets, the normal filename option should be used and
2148 the port is invalid.
2149
2150 (netsplice,net)listen
2151 For TCP network connections, tell fio to listen for incoming
2152 connections rather than initiating an outgoing connection. The
2153 hostname must be omitted if this option is used.
2154
2155 (netsplice,net)pingpong
2156 Normally a network writer will just continue writing data, and a
2157 network reader will just consume packets. If `pingpong=1' is
2158 set, a writer will send its normal payload to the reader, then
2159 wait for the reader to send the same payload back. This allows
2160 fio to measure network latencies. The submission and completion
2161 latencies then measure local time spent sending or receiving,
2162 and the completion latency measures how long it took for the
2163 other end to receive and send back. For UDP multicast traffic
2164 `pingpong=1' should only be set for a single reader when multi‐
2165 ple readers are listening to the same address.
2166
2167 (netsplice,net)window_size=int
2168 Set the desired socket buffer size for the connection.
2169
2170 (netsplice,net)mss=int
2171 Set the TCP maximum segment size (TCP_MAXSEG).
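
A TCP sender and receiver on one machine might be described as in the sketch below. The port number, sizes and job names are hypothetical; startdelay is used only to give the listener time to start.

```ini
; Sketch only: the port, the sizes and the job names are hypothetical.
[global]
ioengine=net
protocol=tcp
port=8888
bs=4k
size=64m
; Have the reader send each payload back so latency can be measured.
pingpong=1

[receiver]
; The listener omits hostname.
listen
rw=read

[sender]
hostname=localhost
; Give the receiver time to start listening.
startdelay=1
rw=write
```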
2172
2173 (e4defrag)donorname=str
2174 File will be used as a block donor (swap extents between files).
2175
2176 (e4defrag)inplace=int
2177 Configure donor file blocks allocation strategy:
2178
2179 0 Default. Preallocate donor's file on init.
2180
2181 1 Allocate space immediately inside defragment
2182 event, and free right after event.
2183
2184 (rbd,rados)clustername=str
2185 Specifies the name of the Ceph cluster.
2186
2187 (rbd)rbdname=str
2188 Specifies the name of the RBD.
2189
2190 (rbd,rados)pool=str
2191 Specifies the name of the Ceph pool containing RBD or RADOS
2192 data.
2193
2194 (rbd,rados)clientname=str
2195 Specifies the username (without the 'client.' prefix) used to
2196 access the Ceph cluster. If the clustername is specified, the
2197 clientname shall be the full *type.id* string. If no type. pre‐
2198 fix is given, fio will add 'client.' by default.
2199
2200 (rados)conf=str
2201 Specifies the configuration path of ceph cluster, so conf file
2202 does not have to be /etc/ceph/ceph.conf.
2203
2204 (rbd,rados)busy_poll=bool
2205 Poll store instead of waiting for completion. Usually this
2206 provides better throughput at the cost of higher (up to 100%) CPU
2207 utilization.
2208
2209 (rados)touch_objects=bool
2210 During initialization, touch (create if they do not exist) all
2211 objects (files). Touching all objects affects ceph caches and
2212 likely impacts test results. Enabled by default.
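
For the rbd engine, the Ceph options could be set as in the sketch below. The client, pool and image names are hypothetical, and the pool and image are assumed to exist already.

```ini
; Sketch only: the client, pool and image names are hypothetical.
[global]
ioengine=rbd
; Given without the 'client.' prefix.
clientname=admin
pool=rbd
rbdname=fio_test
rw=randwrite
bs=4k
; Wait for completions instead of polling.
busy_poll=0

[rbd-iodepth32]
iodepth=32
```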
2213
2214 (http)http_host=str
2215 Hostname to connect to. For S3, this could be the bucket name.
2216 Default is localhost.
2217
2218 (http)http_user=str
2219 Username for HTTP authentication.
2220
2221 (http)http_pass=str
2222 Password for HTTP authentication.
2223
2224 (http)https=str
2225 Whether to use HTTPS instead of plain HTTP. on enables HTTPS;
2226 insecure will enable HTTPS, but disable SSL peer verification
2227 (use with caution!). Default is off.
2228
2229 (http)http_mode=str
2230 Which HTTP access mode to use: webdav, swift, or s3. Default is
2231 webdav.
2232
2233 (http)http_s3_region=str
2234 The S3 region/zone to include in the request. Default is us-
2235 east-1.
2236
2237 (http)http_s3_key=str
2238 The S3 secret key.
2239
2240 (http)http_s3_keyid=str
2241 The S3 key/access id.
2242
2243 (http)http_s3_sse_customer_key=str
2244 The encryption customer key in SSE server side.
2245
2246 (http)http_s3_sse_customer_algorithm=str
2247 The encryption customer algorithm in SSE server side. Default is
2248 AES256.
2249
2250 (http)http_s3_storage_class=str
2251 Which storage class to access. User-customizable settings.
2252 Default is STANDARD.
2253
2254 (http)http_swift_auth_token=str
2255 The Swift auth token. See the example configuration file on how
2256 to retrieve this.
2257
2258 (http)http_verbose=int
2259 Enable verbose requests from libcurl. Useful for debugging. 1
2260 turns on verbose logging from libcurl, 2 additionally enables
2261 HTTP IO tracing. Default is 0.
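
An S3 job could be assembled from the http options as sketched below. The endpoint, region, bucket and object path are hypothetical, and the credentials are assumed to be supplied through the environment variables S3_ID and S3_KEY.

```ini
; Sketch only: the endpoint, the bucket and the variable names are hypothetical.
[global]
ioengine=http
http_mode=s3
https=on
http_host=s3.example.com
http_s3_region=us-east-1
filename=/example-bucket/fio-object
; Credentials come from the environment instead of the job file.
http_s3_keyid=${S3_ID}
http_s3_key=${S3_KEY}
; The engine only supports direct I/O with iodepth=1; scale with numjobs.
direct=1
iodepth=1

[put]
rw=write
; blocksize defines the size of the objects created.
bs=64k
size=64k
```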
2262
2263 (mtd)skip_bad=bool
2264 Skip operations against known bad blocks.
2265
2266 (libhdfs)hdfsdirectory
2267 libhdfs will create chunks in this HDFS directory.
2268
2269 (libhdfs)chunk_size
2270 The size of the chunk to use for each file.
2271
2272 (rdma)verb=str
2273 The RDMA verb to use on this side of the RDMA ioengine connec‐
2274 tion. Valid values are write, read, send and recv. These corre‐
2275 spond to the equivalent RDMA verbs (e.g. write = rdma_write
2276 etc.). Note that this only needs to be specified on the client
2277 side of the connection. See the examples folder.
2278
2279 (rdma)bindname=str
2280 The name to use to bind the local RDMA-CM connection to a local
2281 RDMA device. This could be a hostname or an IPv4 or IPv6
2282 address. On the server side this will be passed into the
2283 rdma_bind_addr() function and on the client side it will be used
2284 in the rdma_resolve_addr() function. This can be useful when
2285 multiple paths exist between the client and the server or in
2286 certain loopback configurations.
2287
2288 (filestat)stat_type=str
2289 Specify stat system call type to measure lookup/getattr perfor‐
2290 mance. Default is stat for stat(2).
2291
2292 (sg)hipri
2293 If this option is set, fio will attempt to use polled IO comple‐
2294 tions. This will have a similar effect as (io_uring)hipri. Only
2295 SCSI READ and WRITE commands will have the SGV4_FLAG_HIPRI set
2296 (not UNMAP (trim) nor VERIFY). Older versions of the Linux sg
2297 driver that do not support hipri will simply ignore this flag
2298 and do normal IO. The Linux SCSI Low Level Driver (LLD) that
2299 "owns" the device also needs to support hipri (also known as
2300 iopoll and mq_poll). The MegaRAID driver is an example of a SCSI
2301 LLD. Default: clear (0) which does normal (interrupt-based)
2302 IO.
2303
2304 (sg)readfua=bool
2305 With readfua option set to 1, read operations include the force
2306 unit access (fua) flag. Default: 0.
2307
2308 (sg)writefua=bool
2309 With writefua option set to 1, write operations include the
2310 force unit access (fua) flag. Default: 0.
2311
2312 (sg)sg_write_mode=str
2313 Specify the type of write commands to issue. This option can
2314 take multiple values:
2315
2316 write (default)
2317 Write opcodes are issued as usual
2318
2319 write_and_verify
2320 Issue WRITE AND VERIFY commands. The BYTCHK bit is
2321 set to 00b. This directs the device to carry out a
2322 medium verification with no data comparison for
2323 the data that was written. The writefua option is
2324 ignored with this selection.
2325
2326 verify This option is deprecated. Use write_and_verify
2327 instead.
2328
2329 write_same
2330 Issue WRITE SAME commands. This transfers a single
2331 block to the device and writes this same block of
2332 data to a contiguous sequence of LBAs beginning at
2333 the specified offset. fio's block size parameter
2334 specifies the amount of data written with each
2335 command. However, the amount of data actually
2336 transferred to the device is equal to the device's
2337 block (sector) size. For a device with 512 byte
2338 sectors, blocksize=8k will write 16 sectors with
2339 each command. fio will still generate 8k of data
2340 for each command but only the first 512 bytes will
2341 be used and transferred to the device. The write‐
2342 fua option is ignored with this selection.
2343
2344 same This option is deprecated. Use write_same instead.
2345
2346 write_same_ndob
2347 Issue WRITE SAME(16) commands as above but with
2348 the No Data Output Buffer (NDOB) bit set. No data
2349 will be transferred to the device with this bit
2350 set. Data written will be a pre-determined pattern
2351 such as all zeroes.
2352
2353 write_stream
2354 Issue WRITE STREAM(16) commands. Use the stream_id
2355 option to specify the stream identifier.
2356
2357 verify_bytchk_00
2358 Issue VERIFY commands with BYTCHK set to 00. This
2359 directs the device to carry out a medium verifica‐
2360 tion with no data comparison.
2361
2362 verify_bytchk_01
2363 Issue VERIFY commands with BYTCHK set to 01. This
2364 directs the device to compare the data on the de‐
2365 vice with the data transferred to the device.
2366
2367 verify_bytchk_11
2368 Issue VERIFY commands with BYTCHK set to 11. This
2369 transfers a single block to the device and com‐
2370 pares the contents of this block with the data on
2371 the device beginning at the specified offset.
2372 fio's block size parameter specifies the total
2373 amount of data compared with this command. How‐
2374 ever, only one block (sector) worth of data is
2375 transferred to the device. This is similar to the
2376 WRITE SAME command except that data is compared
2377 instead of written.
2378
2379 (sg)stream_id=int
2380 Set the stream identifier for WRITE STREAM commands. If this is
2381 set to 0 (which is not a valid stream identifier) fio will open
2382 a stream and then close it when done. Default is 0.
2383
2384 (nbd)uri=str
2385 Specify the NBD URI of the server to test. The string is a
2386 standard NBD URI (see https://github.com/NetworkBlockDe‐
2387 vice/nbd/tree/master/doc). Example URIs:
2388
2389 nbd://localhost:10809
2390
2391 nbd+unix:///?socket=/tmp/socket
2392
2393 nbds://tlshost/exportname
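
A minimal job file sketch using one of the URI forms above. It assumes an NBD server is already listening on the named Unix socket; the job name and the rw, bs, iodepth and runtime values are illustrative choices, not recommendations.

```ini
[global]
ioengine=nbd
; Assumption: an NBD server is serving an export on this socket.
uri=nbd+unix:///?socket=/tmp/socket
rw=randrw
bs=4k
iodepth=64
time_based
runtime=60

[nbd-job]
offset=0
```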
2394
2395 (libcufile)gpu_dev_ids=str
2396 Specify the GPU IDs to use with CUDA. This is a colon-separated
2397 list of int. GPUs are assigned to workers roundrobin. Default
2398 is 0.
2399
2400 (libcufile)cuda_io=str
2401 Specify the type of I/O to use with CUDA. This option takes the
2402 following values:
2403
2404 cufile (default)
2405 Use libcufile and nvidia-fs. This option performs
2406 I/O directly between a GPUDirect Storage filesys‐
2407 tem and GPU buffers, avoiding use of a bounce buf‐
2408 fer. If verify is set, cudaMemcpy is used to copy
2409 verification data between RAM and GPU(s). Verifi‐
2410 cation data is copied from RAM to GPU before a
2411 write and from GPU to RAM after a read. direct
2412 must be 1.
2413
2414 posix Use POSIX to perform I/O with a RAM buffer, and
2415 use cudaMemcpy to transfer data between RAM and
2416 the GPU(s). Data is copied from GPU to RAM before
2417 a write and copied from RAM to GPU after a read.
2418 verify does not affect the use of cudaMemcpy.
2419
2420 (dfs)pool
2421 Specify the label or UUID of the DAOS pool to connect to.
2422
2423 (dfs)cont
2424 Specify the label or UUID of the DAOS container to open.
2425
2426 (dfs)chunk_size
2427 Specify a different chunk size (in bytes) for the dfs file. Use
2428 DAOS container's chunk size by default.
2429
2430 (dfs)object_class
2431 Specify a different object class for the dfs file. Use DAOS
2432 container's object class by default.
2433
2434 (nfs)nfs_url
2435 URL in libnfs format, e.g.
2436 nfs://<server|ipv4|ipv6>/path[?arg=val[&arg=val]*] Refer to the
2437 libnfs README for more details.
2438
2439 (exec)program=str
2440 Specify the program to execute. Note the program will receive a
2441 SIGTERM when the job is reaching the time limit. A SIGKILL is
2442 sent once the job is over. The delay between the two signals is
2443 defined by grace_time option.
2444
2445 (exec)arguments=str
2446 Specify arguments to pass to program. Some special variables
2447 can be expanded to pass fio's job details to the program:
2448
2449 %r replaced by the duration of the job in seconds
2450
2451 %n replaced by the name of the job
2452
2453 (exec)grace_time=int
2454 Defines the time between the SIGTERM and SIGKILL signals. De‐
2455 fault is 1 second.
2456
2457 (exec)std_redirect=bool
2458 If set, stdout and stderr streams are redirected to files named
2459 from the job name. Default is true.
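
The exec options above can be combined as in the following sketch. The program path and the job name are hypothetical; the example only illustrates how %r, grace_time and std_redirect fit together.

```ini
[run-sleep]
ioengine=exec
; Hypothetical program: sleep for the job duration, passed in via %r.
program=/bin/sleep
arguments=%r
time_based
runtime=30
; Allow 2 seconds between SIGTERM and SIGKILL.
grace_time=2
; Do not redirect stdout/stderr to files named after the job.
std_redirect=0
```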
2460
2461 (xnvme)xnvme_async=str
2462 Select the xnvme async command interface. This can take these
2463 values.
2464
2465 emu This is the default and is used to emulate asynchronous
2466 I/O by using a single thread to create a queue
2467 pair on top of a synchronous I/O interface using
2468 the NVMe driver IOCTL.
2469
2470 thrpool
2471 Emulate an asynchronous I/O interface with a pool
2472 of userspace threads on top of a synchronous I/O
2473 interface using the NVMe driver IOCTL. By default
2474 four threads are used.
2475
2476 io_uring
2477 Linux native asynchronous I/O interface which sup‐
2478 ports both direct and buffered I/O.
2479
2480 libaio Use Linux aio for asynchronous I/O.
2481
2482 posix Use the posix asynchronous I/O interface to per‐
2483 form one or more I/O operations asynchronously.
2484
2485 vfio Use the user-space VFIO-based backend, implemented
2486 using libvfn instead of SPDK.
2487
2488 nil Do not transfer any data; just pretend to. This is
2489 mainly used for introspective performance evalua‐
2490 tion.
2491
2492 (xnvme)xnvme_sync=str
2493 Select the xnvme synchronous command interface. This can take
2494 these values.
2495
2496 nvme This is default and uses Linux NVMe Driver ioctl()
2497 for synchronous I/O.
2498
2499 psync This supports regular as well as vectored pread()
2500 and pwrite() commands.
2501
2502 block This is the same as psync except that it also sup‐
2503 ports zone management commands using Linux block
2504 layer IOCTLs.
2505
2506 (xnvme)xnvme_admin=str
2507 Select the xnvme admin command interface. This can take these
2508 values.
2509
2510 nvme This is default and uses Linux NVMe Driver ioctl()
2511 for admin commands.
2512
2513 block Use Linux Block Layer ioctl() and sysfs for admin
2514 commands.
2515
2516 (xnvme)xnvme_dev_nsid=int
2517 xnvme namespace identifier for userspace NVMe driver SPDK or
2518 vfio.
2519
2520 (xnvme)xnvme_dev_subnqn=str
2521 Sets the subsystem NQN for fabrics. This is for xNVMe to utilize
2522 a fabrics target with multiple systems.
2523
2524 (xnvme)xnvme_mem=str
2525 Select the xnvme memory backend. This can take these values.
2526
2527 posix This is the default posix memory backend for Linux
2528 NVMe driver.
2529
2530 hugepage
2531 Use hugepages, instead of existing posix memory
2532 backend. The memory backend uses hugetlbfs. This
2533 requires users to allocate hugepages, mount
2534 hugetlbfs, and set the XNVME_HUGETLB_PATH environment
2535 variable.
2536
2537 spdk Uses SPDK's memory allocator.
2538
2539 vfio Uses libvfn's memory allocator. This also speci‐
2540 fies the use of libvfn backend instead of SPDK.
2541
2542 (xnvme)xnvme_iovec
2543 If this option is set, xnvme will use vectored read/write com‐
2544 mands.
2545
2546 (libblkio)libblkio_driver=str
2547 The libblkio driver to use. Different drivers access devices
2548 through different underlying interfaces. Available drivers de‐
2549 pend on the libblkio version in use and are listed at
2550 https://libblkio.gitlab.io/libblkio/blkio.html#drivers
2551
2552 (libblkio)libblkio_path=str
2553 Sets the value of the driver-specific "path" property before
2554 connecting the libblkio instance, which identifies the target
2555 device or file on which to perform I/O. Its exact semantics are
2556 driver-dependent and not all drivers may support it; see
2557 https://libblkio.gitlab.io/libblkio/blkio.html#drivers
2558
2559 (libblkio)libblkio_pre_connect_props=str
2560 A colon-separated list of additional libblkio properties to be
2561 set after creating but before connecting the libblkio instance.
2562 Each property must have the format <name>=<value>. Colons can be
2563 escaped as \:. These are set after the engine sets any other
2564 properties, so those can be overridden. Available properties
2565 depend on the libblkio version in use and are listed at
2566 https://libblkio.gitlab.io/libblkio/blkio.html#properties
2567
2568 (libblkio)libblkio_num_entries=int
2569 Sets the value of the driver-specific "num-entries" property be‐
2570 fore starting the libblkio instance. Its exact semantics are
2571 driver-dependent and not all drivers may support it; see
2572 https://libblkio.gitlab.io/libblkio/blkio.html#drivers
2573
2574 (libblkio)libblkio_queue_size=int
2575 Sets the value of the driver-specific "queue-size" property be‐
2576 fore starting the libblkio instance. Its exact semantics are
2577 driver-dependent and not all drivers may support it; see
2578 https://libblkio.gitlab.io/libblkio/blkio.html#drivers
2579
2580 (libblkio)libblkio_pre_start_props=str
2581 A colon-separated list of additional libblkio properties to be
2582 set after connecting but before starting the libblkio instance.
2583 Each property must have the format <name>=<value>. Colons can be
2584 escaped as \:. These are set after the engine sets any other
2585 properties, so those can be overridden. Available properties
2586 depend on the libblkio version in use and are listed at
2587 https://libblkio.gitlab.io/libblkio/blkio.html#properties
2588
2589 (libblkio)hipri
2590 Use poll queues. This is incompatible with lib‐
2591 blkio_wait_mode=eventfd and libblkio_force_enable_comple‐
2592 tion_eventfd.
2593
2594 (libblkio)libblkio_vectored
2595 Submit vectored read and write requests.
2596
2597 (libblkio)libblkio_write_zeroes_on_trim
2598 Submit trims as "write zeroes" requests instead of discard re‐
2599 quests.
2600
2601 (libblkio)libblkio_wait_mode=str
2602 How to wait for completions:
2603
2604 block (default)
2605 Use a blocking call to blkioq_do_io().
2606
2607 eventfd
2608 Use a blocking call to read() on the completion
2609 eventfd.
2610
2611 loop Use a busy loop with a non-blocking call to
2612 blkioq_do_io().
2613
2614 (libblkio)libblkio_force_enable_completion_eventfd
2615 Enable the queue's completion eventfd even when unused. This may
2616 impact performance. The default is to enable it only if lib‐
2617 blkio_wait_mode=eventfd.
2618
2619 (windowsaio)no_completion_thread
2620 Avoid using a separate thread for completion polling.
2621
2622 I/O depth
2623 iodepth=int
2624 Number of I/O units to keep in flight against the file. Note
2625 that increasing iodepth beyond 1 will not affect synchronous io‐
2626 engines (except for small degrees when verify_async is in use).
2627 Even async engines may impose OS restrictions causing the de‐
2628 sired depth not to be achieved. This may happen on Linux when
2629 using libaio and not setting `direct=1', since buffered I/O is
2630 not async on that OS. Keep an eye on the I/O depth distribution
2631 in the fio output to verify that the achieved depth is as ex‐
2632 pected. Default: 1.
2633
2634 iodepth_batch_submit=int, iodepth_batch=int
2635 This defines how many pieces of I/O to submit at once. It de‐
2636 faults to 1 which means that we submit each I/O as soon as it is
2637 available, but can be raised to submit bigger batches of I/O at
2638 a time. If it is set to 0 the iodepth value will be used.
2639
2640 iodepth_batch_complete_min=int, iodepth_batch_complete=int
2641 This defines how many pieces of I/O to retrieve at once. It de‐
2642 faults to 1 which means that we'll ask for a minimum of 1 I/O in
2643 the retrieval process from the kernel. The I/O retrieval will go
2644 on until we hit the limit set by iodepth_low. If this variable
2645 is set to 0, then fio will always check for completed events be‐
2646 fore queuing more I/O. This helps reduce I/O latency, at the
2647 cost of more retrieval system calls.
2648
2649 iodepth_batch_complete_max=int
2650 This defines the maximum number of I/Os to retrieve at once. This
2651 variable should be used along with iodepth_batch_com‐
2652 plete_min=int variable, specifying the range of min and max
2653 amount of I/O which should be retrieved. By default it is equal
2654 to iodepth_batch_complete_min value. Example #1:
2655
2656 iodepth_batch_complete_min=1
2657 iodepth_batch_complete_max=<iodepth>
2658
2659 which means that we will retrieve at least 1 I/O and up to the
2660 whole submitted queue depth. If no I/O has been completed
2661 yet, we will wait. Example #2:
2662
2663 iodepth_batch_complete_min=0
2664 iodepth_batch_complete_max=<iodepth>
2665
2666 which means that we can retrieve up to the whole submitted queue
2667 depth, but if no I/O has been completed yet, we will NOT
2668 wait and immediately exit the system call. In this example we
2669 simply do polling.
2670
2671 iodepth_low=int
2672 The low water mark indicating when to start filling the queue
2673 again. Defaults to the same as iodepth, meaning that fio will
2674 attempt to keep the queue full at all times. If iodepth is set
2675 to e.g. 16 and iodepth_low is set to 4, then after fio has
2676 filled the queue of 16 requests, it will let the depth drain
2677 down to 4 before starting to fill it again.
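
The queue-depth options in this section might be combined as in the sketch below. It assumes the libaio engine on Linux and a filesystem that supports `direct=1`; the job name and the specific batch sizes are illustrative only.

```ini
[queue-drain]
ioengine=libaio
; Buffered I/O is not async with libaio on Linux, so use direct I/O.
direct=1
rw=randread
bs=4k
size=1g
; Fill the queue to 16, let it drain to 4, then refill.
iodepth=16
iodepth_low=4
; Submit up to 8 I/Os per submission call.
iodepth_batch_submit=8
; Reap at least 1 and at most 16 completions per retrieval.
iodepth_batch_complete_min=1
iodepth_batch_complete_max=16
```

The I/O depth distribution in the fio output shows whether the requested depth was actually reached.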
2678
2679 serialize_overlap=bool
2680 Serialize in-flight I/Os that might otherwise cause or suffer
2681 from data races. When two or more I/Os are submitted simultane‐
2682 ously, there is no guarantee that the I/Os will be processed or
2683 completed in the submitted order. Further, if two or more of
2684 those I/Os are writes, any overlapping region between them can
2685 become indeterminate/undefined on certain storage. These issues
2686 can cause verification to fail erratically when at least one of
2687 the racing I/Os is changing data and the overlapping region has
2688 a non-zero size. Setting serialize_overlap tells fio to avoid
2689 provoking this behavior by explicitly serializing in-flight I/Os
2690 that have a non-zero overlap. Note that setting this option can
2691 reduce both performance and the iodepth achieved.
2692
2693 This option only applies to I/Os issued for a single job except
2694 when it is enabled along with io_submit_mode=offload. In offload
2695 mode, fio will check for overlap among all I/Os submitted by
2696 offload jobs with serialize_overlap enabled.
2697
2698 Default: false.
2699
2700 io_submit_mode=str
2701 This option controls how fio submits the I/O to the I/O engine.
2702 The default is `inline', which means that the fio job threads
2703 submit and reap I/O directly. If set to `offload', the job
2704 threads will offload I/O submission to a dedicated pool of I/O
2705 threads. This requires some coordination and thus has a bit of
2706 extra overhead, especially for lower queue depth I/O where it
2707 can increase latencies. The benefit is that fio can manage sub‐
2708 mission rates independently of the device completion rates. This
2709 avoids skewed latency reporting if I/O gets backed up on the de‐
2710 vice side (the coordinated omission problem). Note that this op‐
2711 tion cannot reliably be used with async IO engines.
2712
2713 I/O rate
2714 thinktime=time
2715 Stall the job for the specified period of time after an I/O has
2716 completed before issuing the next. May be used to simulate pro‐
2717 cessing being done by an application. When the unit is omitted,
2718 the value is interpreted in microseconds. See thinktime_blocks,
2719 thinktime_iotime and thinktime_spin.
2720
2721 thinktime_spin=time
2722 Only valid if thinktime is set - pretend to spend CPU time doing
2723 something with the data received, before falling back to sleep‐
2724 ing for the rest of the period specified by thinktime. When the
2725 unit is omitted, the value is interpreted in microseconds.
2726
2727 thinktime_blocks=int
2728 Only valid if thinktime is set - control how many blocks to is‐
2729 sue, before waiting thinktime usecs. If not set, defaults to 1
2730 which will make fio wait thinktime usecs after every block. This
2731 effectively makes any queue depth setting redundant, since no
2732 more than 1 I/O will be queued before we have to complete it and
2733 do our thinktime. In other words, this setting effectively caps
2734 the queue depth if the latter is larger.
2735
2736 thinktime_blocks_type=str
2737 Only valid if thinktime is set - control how thinktime_blocks
2738 triggers. The default is `complete', which triggers thinktime
2739 when fio completes thinktime_blocks blocks. If this is set to
2740 `issue', then the trigger happens at the issue side.
2741
2742 thinktime_iotime=time
2743 Only valid if thinktime is set - control thinktime interval by
2744 time. The thinktime stall is repeated after IOs are executed
2745 for thinktime_iotime. For example, `--thinktime_iotime=9s
2746 --thinktime=1s' repeats a 10-second cycle with IOs for 9 seconds
2747 and stall for 1 second. When the unit is omitted, thinktime_io‐
2748 time is interpreted as a number of seconds. If this option is
2749 used together with thinktime_blocks, the thinktime stall is re‐
2750 peated after thinktime_iotime or after thinktime_blocks IOs,
2751 whichever happens first.
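
The 10-second cycle described above could be written in a job file roughly as follows. The job name, workload type and sizes are assumptions made for illustration.

```ini
[bursty]
rw=randwrite
bs=4k
size=256m
time_based
runtime=60
; Issue I/O for 9 seconds, then stall for 1 second, and repeat.
thinktime_iotime=9s
thinktime=1s
```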
2752
2753
2754 rate=int[,int][,int]
2755 Cap the bandwidth used by this job. The number is in bytes/sec,
2756 the normal suffix rules apply. Comma-separated values may be
2757 specified for reads, writes, and trims as described in block‐
2758 size.
2759
2760 For example, using `rate=1m,500k' would limit reads to 1MiB/sec
2761 and writes to 500KiB/sec. Capping only reads or writes can be
2762 done with `rate=,500k' or `rate=500k,' where the former will
2763 only limit writes (to 500KiB/sec) and the latter will only limit
2764 reads.
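
As a sketch, the `rate=1m,500k' example above in a mixed read/write job; the job name and the remaining options are illustrative assumptions.

```ini
[capped-mixed]
rw=randrw
bs=4k
size=256m
time_based
runtime=30
; Reads capped at 1MiB/sec, writes capped at 500KiB/sec.
rate=1m,500k
```

Replacing the last line with `rate=,500k` would leave reads uncapped and limit only writes.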
2765
2766 rate_min=int[,int][,int]
2767 Tell fio to do whatever it can to maintain at least this band‐
2768 width. Failing to meet this requirement will cause the job to
2769 exit. Comma-separated values may be specified for reads, writes,
2770 and trims as described in blocksize.
2771
2772 rate_iops=int[,int][,int]
2773 Cap the bandwidth to this number of IOPS. Basically the same as
2774 rate, just specified independently of bandwidth. If the job is
2775 given a block size range instead of a fixed value, the smallest
2776 block size is used as the metric. Comma-separated values may be
2777 specified for reads, writes, and trims as described in block‐
2778 size.
2779
2780 rate_iops_min=int[,int][,int]
2781 If fio doesn't meet this rate of I/O, it will cause the job to
2782 exit. Comma-separated values may be specified for reads,
2783 writes, and trims as described in blocksize.
2784
2785 rate_process=str
2786 This option controls how fio manages rated I/O submissions. The
2787 default is `linear', which submits I/O in a linear fashion with
2788 fixed delays between I/Os that get adjusted based on I/O
2789 completion rates. If this is set to `poisson', fio will submit I/O
2790 based on a more real world random request flow, known as the
2791 Poisson process (https://en.wikipedia.org/wiki/Pois‐
2792 son_point_process). The lambda will be 10^6 / IOPS for the given
2793 workload.
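
A short sketch of a rate-limited job using the Poisson submission process. The 1000 IOPS target is an arbitrary example value; with it, the lambda formula above gives 10^6 / 1000 = 1000.

```ini
[poisson-reads]
rw=randread
bs=4k
size=256m
time_based
runtime=30
; Target 1000 IOPS, so lambda = 10^6 / 1000 = 1000.
rate_iops=1000
rate_process=poisson
```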
2794
2795 rate_ignore_thinktime=bool
2796 By default, fio will attempt to catch up to the specified rate
2797 setting, if any kind of thinktime setting was used. If this op‐
2798 tion is set, then fio will ignore the thinktime and continue do‐
2799 ing IO at the specified rate, instead of entering a catch-up
2800 mode after thinktime is done.
2801
2802 rate_cycle=int
2803 Average bandwidth for rate and rate_min over this number of mil‐
2804 liseconds. Defaults to 1000.
2805
2806 I/O latency
2807 latency_target=time
2808 If set, fio will attempt to find the max performance point that
2809 the given workload will run at while maintaining a latency below
2810 this target. When the unit is omitted, the value is interpreted
2811 in microseconds. See latency_window and latency_percentile.
2812
2813 latency_window=time
2814 Used with latency_target to specify the sample window that the
2815 job is run at varying queue depths to test the performance. When
2816 the unit is omitted, the value is interpreted in microseconds.
2817
2818 latency_percentile=float
2819 The percentage of I/Os that must fall within the criteria
2820 specified by latency_target and latency_window. If not set, this
2821 defaults to 100.0, meaning that all I/Os must be at or below
2822 the value set by latency_target.
2823
2824 latency_run=bool
2825 Used with latency_target. If false (default), fio will find the
2826 highest queue depth that meets latency_target and exit. If true,
2827 fio will continue running and try to meet latency_target by ad‐
2828 justing queue depth.
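
The latency options above are typically used together. The following sketch assumes the libaio engine and direct I/O; the 500 ms target, 5 second window and 99.9 percentile are example values, written in microseconds because that is the default unit.

```ini
[latency-probe]
ioengine=libaio
direct=1
rw=randread
bs=4k
size=1g
; Upper bound on the queue depth fio may try.
iodepth=128
; Target latency: 500 ms = 500000 microseconds.
latency_target=500000
; Evaluate each queue depth over a 5 second window.
latency_window=5000000
; 99.9% of I/Os must complete within the target.
latency_percentile=99.9
```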
2829
2830 max_latency=time[,time][,time]
2831 If set, fio will exit the job with an ETIMEDOUT error if it ex‐
2832 ceeds this maximum latency. When the unit is omitted, the value
2833 is interpreted in microseconds. Comma-separated values may be
2834 specified for reads, writes, and trims as described in block‐
2835 size.
2836
2837 I/O replay
2838 write_iolog=str
2839 Write the issued I/O patterns to the specified file. See
2840 read_iolog. Specify a separate file for each job, otherwise the
2841 iologs will be interspersed and the file may be corrupt. This
2842 file will be opened in append mode.
2843
2844 read_iolog=str
2845 Open an iolog with the specified filename and replay the I/O
2846 patterns it contains. This can be used to store a workload and
2847 replay it sometime later. The iolog given may also be a blktrace
2848 binary file, which allows fio to replay a workload captured by
2849 blktrace. See blktrace(8) for how to capture such logging data.
2850 For blktrace replay, the file needs to be turned into a blkparse
2851 binary data file first (`blkparse <device> -o /dev/null -d
2852 file_for_fio.bin'). You can specify a number of files by sepa‐
2853 rating the names with a ':' character. See the filename option
2854 for information on how to escape ':' characters within the file
2855 names. These files will be sequentially assigned to job clones
2856 created by numjobs. '-' is a reserved name, meaning read from
2857 stdin, notably if filename is set to '-' which means stdin as
2858 well, then this flag can't be set to '-'.
2859
2860 read_iolog_chunked=bool
2861 Determines how the iolog is read. If false (default), the entire
2862 read_iolog file is read at once. If true, input from the
2863 iolog is read gradually. Useful when the iolog is very large
2864 or is being generated.
2865
2866 merge_blktrace_file=str
2867 When specified, rather than replaying the logs passed to
2868 read_iolog, the logs go through a merge phase which aggregates
2869 them into a single blktrace. The resulting file is then passed
2870 on as the read_iolog parameter. The intention here is to make
2871 the order of events consistent. This limits the influence of the
2872 scheduler compared to replaying multiple blktraces via concur‐
2873 rent jobs.
2874
2875 merge_blktrace_scalars=float_list
2876 This is a percentage based option that is index paired with the
2877 list of files passed to read_iolog. When merging is performed,
2878 scale the time of each event by the corresponding amount. For
2879 example, `--merge_blktrace_scalars="50:100"' runs the first
2880 trace in halftime and the second trace in realtime. This knob is
2881 separately tunable from replay_time_scale which scales the trace
2882 during runtime and will not change the output of the merge un‐
2883 like this option.
2884
2885 merge_blktrace_iters=float_list
2886 This is a whole number option that is index paired with the list
2887 of files passed to read_iolog. When merging is performed, run
2888 each trace for the specified number of iterations. For example,
2889 `--merge_blktrace_iters="2:1"' runs the first trace for two it‐
2890 erations and the second trace for one iteration.
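
The merge options can be sketched in a job file as follows. The trace and output file names are hypothetical; each position in the scalar and iteration lists applies to the trace at the same position in read_iolog.

```ini
[merge-traces]
; Hypothetical blkparse binary dumps, separated by ':'.
read_iolog=trace_a.bin:trace_b.bin
; The merged trace is written here and then used as the replay input.
merge_blktrace_file=merged.bin
; Scale event times of the first trace to 50%, keep the second at 100%.
merge_blktrace_scalars=50:100
; Run the first trace twice and the second once.
merge_blktrace_iters=2:1
```

Running fio with `--merge-blktrace-only` performs the merge without starting any I/O.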
2891
2892 replay_no_stall=bool
2893 When replaying I/O with read_iolog the default behavior is to
2894 attempt to respect the timestamps within the log and replay them
2895 with the appropriate delay between IOPS. By setting this vari‐
2896 able fio will not respect the timestamps and attempt to replay
2897 them as fast as possible while still respecting ordering. The
2898 result is the same I/O pattern to a given device, but different
2899 timings.
2900
2901 replay_time_scale=int
2902 When replaying I/O with read_iolog, fio will honor the original
2903 timing in the trace. With this option, it's possible to scale
2904 the time. It's a percentage option: if set to 50, it means run at
2905 50% of the original IO rate in the trace. If set to 200, run at
2906 twice the original IO rate. Defaults to 100.
2907
2908 replay_redirect=str
2909 While replaying I/O patterns using read_iolog the default behav‐
2910 ior is to replay the IOPS onto the major/minor device that each
2911 IOP was recorded from. This is sometimes undesirable because on
2912 a different machine those major/minor numbers can map to a dif‐
2913 ferent device. Changing hardware on the same system can also re‐
2914 sult in a different major/minor mapping. replay_redirect causes
2915 all I/Os to be replayed onto the single specified device regard‐
2916 less of the device it was recorded from. i.e. `replay_redi‐
2917 rect=/dev/sdc' would cause all I/O in the blktrace or iolog to
2918 be replayed onto `/dev/sdc'. This means multiple devices will be
2919 replayed onto a single device, if the trace contains multiple
2920 devices. If you want multiple devices to be replayed concur‐
2921 rently to multiple redirected devices you must blkparse your
2922 trace into separate traces and replay them with independent fio
2923 invocations. Unfortunately this also breaks the strict time or‐
2924 dering between multiple device accesses.
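
A replay job that redirects all recorded I/O to one device might look like the sketch below. The trace file name follows the blkparse example given for read_iolog; the engine, depth and target device are assumptions. Any writes in the trace will be issued to the target device, so it should not hold data that matters.

```ini
[replay-to-sdc]
ioengine=libaio
direct=1
iodepth=32
; Binary trace produced by blkparse, as described for read_iolog.
read_iolog=file_for_fio.bin
; Replay every I/O onto this device, whatever device it was recorded from.
replay_redirect=/dev/sdc
; Ignore recorded timestamps; replay as fast as ordering allows.
replay_no_stall=1
```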
2925
2926 replay_align=int
2927 Force alignment of the byte offsets in a trace to this value.
2928 The value must be a power of 2.
2929
2930 replay_scale=int
2931 Scale byte offsets down by this factor when replaying traces.
2932 Should most likely use replay_align as well.
2933
2934 replay_skip=str
2935 Sometimes it's useful to skip certain IO types in a replay
2936 trace. This could be, for instance, eliminating the writes in
2937 the trace, or not replaying the trims/discards if you are
2938 redirecting to a device that doesn't support them. This option
2939 takes a comma-separated list of read, write, trim, sync.
2940
2941 Threads, processes and job synchronization
2942 thread Fio defaults to creating jobs by using fork, however if this op‐
2943 tion is given, fio will create jobs by using POSIX Threads'
2944 function pthread_create(3) to create threads instead.
2945
2946 wait_for=str
2947 If set, the current job won't be started until all workers of
2948 the specified waitee job are done. wait_for operates on the job
2949 name basis, so there are a few limitations. First, the waitee
2950 must be defined prior to the waiter job (meaning no forward ref‐
2951 erences). Second, if a job is being referenced as a waitee, it
2952 must have a unique name (no duplicate waitees).
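
A minimal sketch of wait_for, assuming two jobs that share one hypothetical test file. The waitee is defined first and has a unique name, as required above.

```ini
[global]
bs=4k
size=256m
; Hypothetical file shared by both jobs.
filename=fio-waitfor-testfile

[writer]
rw=write

[reader]
; Starts only after all workers of "writer" are done.
wait_for=writer
rw=read
```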
2953
2954 nice=int
2955 Run the job with the given nice value. See man nice(2). On Win‐
2956 dows, values less than -15 set the process class to "High"; -1
2957 through -15 set "Above Normal"; 1 through 15 "Below Normal"; and
2958 above 15 "Idle" priority class.
2959
2960 prio=int
2961 Set the I/O priority value of this job. Linux limits us to a
2962 positive value between 0 and 7, with 0 being the highest. See
2963 man ionice(1). Refer to an appropriate manpage for other operat‐
2964 ing systems since meaning of priority may differ. For per-com‐
2965 mand priority setting, see the I/O engine specific `cmdprio_per‐
2966 centage` and `cmdprio` options.
2967
2968 prioclass=int
2969 Set the I/O priority class. See man ionice(1). For per-command
2970 priority setting, see the I/O engine specific `cmdprio_percent‐
2971 age` and `cmdprio_class` options.
2972
2973 cpus_allowed=str
2974 Controls the same options as cpumask, but accepts a textual
2975 specification of the permitted CPUs instead and CPUs are indexed
2976 from 0. So to use CPUs 0 and 5 you would specify `cpus_al‐
2977 lowed=0,5'. This option also allows a range of CPUs to be speci‐
2978 fied -- say you wanted a binding to CPUs 0, 5, and 8 to 15, you
2979 would set `cpus_allowed=0,5,8-15'.
2980
2981 On Windows, when `cpus_allowed' is unset only CPUs from fio's
2982 current processor group will be used and affinity settings are
2983 inherited from the system. An fio build configured to target
2984 Windows 7 makes options that set CPUs processor group aware and
2985 values will set both the processor group and a CPU from within
2986 that group. For example, on a system where processor group 0 has
2987 40 CPUs and processor group 1 has 32 CPUs, `cpus_allowed' values
2988 between 0 and 39 will bind CPUs from processor group 0 and
2989 `cpus_allowed' values between 40 and 71 will bind CPUs from pro‐
2990 cessor group 1. When using `cpus_allowed_policy=shared' all CPUs
2991 specified by a single `cpus_allowed' option must be from the
2992 same processor group. For Windows fio builds not built for Win‐
2993 dows 7, CPUs will only be selected from (and be relative to)
2994 whatever processor group fio happens to be running in and CPUs
2995 from other processor groups cannot be used.
2996
2997 cpus_allowed_policy=str
2998 Set the policy of how fio distributes the CPUs specified by
2999 cpus_allowed or cpumask. Two policies are supported:
3000
3001 shared All jobs will share the CPU set specified.
3002
3003 split Each job will get a unique CPU from the CPU set.
3004
3005 shared is the default behavior, if the option isn't specified.
3006 If split is specified, then fio will assign one cpu per job. If
3007 not enough CPUs are given for the jobs listed, then fio will
3008 roundrobin the CPUs in the set.
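
The CPU list from the cpus_allowed example combined with the split policy, as a sketch; the job name and job count are illustrative.

```ini
[global]
rw=randread
bs=4k
size=256m
; CPUs 0, 5, and 8 through 15 (indexed from 0).
cpus_allowed=0,5,8-15
; Give each job its own CPU from the set instead of sharing all of them.
cpus_allowed_policy=split

[workers]
numjobs=4
```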
3009
3010 cpumask=int
3011 Set the CPU affinity of this job. The parameter given is a bit
3012 mask of allowed CPUs the job may run on. So if you want the al‐
3013 lowed CPUs to be 1 and 5, you would pass the decimal value of (1
3014 << 1 | 1 << 5), or 34. See man sched_setaffinity(2). This may
3015 not work on all supported operating systems or kernel versions.
3016 This option doesn't work well for a higher CPU count than what
3017 you can store in an integer mask, so it can only control cpus
3018 1-32. For boxes with larger CPU counts, use cpus_allowed.
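
The bit mask arithmetic above, written out in a job file; the job name and workload options are illustrative assumptions.

```ini
[pinned]
rw=randread
bs=4k
size=256m
; CPUs 1 and 5: (1 << 1) | (1 << 5) = 2 + 32 = 34.
cpumask=34
```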
3019
3020 numa_cpu_nodes=str
3021 Set this job running on specified NUMA nodes' CPUs. The argu‐
3022 ments allow comma delimited list of cpu numbers, A-B ranges, or
3023 `all'. Note, to enable NUMA options support, fio must be built
3024 on a system with libnuma-dev(el) installed.
3025
3026 numa_mem_policy=str
3027 Set this job's memory policy and corresponding NUMA nodes. For‐
3028 mat of the arguments:
3029
3030 <mode>[:<nodelist>]
3031
3032 `mode' is one of the following memory policies: `default', `pre‐
3033 fer', `bind', `interleave' or `local'. For `default' and `local'
3034 memory policies, no node needs to be specified. For `prefer',
3035 only one node is allowed. For `bind' and `interleave' the `node‐
3036 list' may be as follows: a comma delimited list of numbers, A-B
3037 ranges, or `all'.
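
A sketch that keeps a job's CPUs and memory on the same NUMA node. It assumes a fio build with NUMA support and a system that has a node 0.

```ini
[numa-local]
rw=randread
bs=4k
size=256m
; Run on the CPUs belonging to NUMA node 0.
numa_cpu_nodes=0
; <mode>:<nodelist> form: bind memory allocations to node 0.
numa_mem_policy=bind:0
```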
3038
3039 cgroup=str
3040 Add job to this control group. If it doesn't exist, it will be
3041 created. The system must have a mounted cgroup blkio mount point
3042 for this to work. If your system doesn't have it mounted, you
3043 can do so with:
3044
3045 # mount -t cgroup -o blkio none /cgroup
3046
3047 cgroup_weight=int
3048 Set the weight of the cgroup to this value. See the documenta‐
3049 tion that comes with the kernel, allowed values are in the range
3050 of 100..1000.
3051
3052 cgroup_nodelete=bool
3053 Normally fio will delete the cgroups it has created after the
3054 job completion. To override this behavior and to leave cgroups
3055 around after the job completion, set `cgroup_nodelete=1'. This
3056 can be useful if one wants to inspect various cgroup files after
3057 job completion. Default: false.
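
The cgroup options could be used as in this sketch, which assumes a mounted cgroup blkio hierarchy as described above. The group names are hypothetical, and the weights are the two ends of the allowed 100..1000 range.

```ini
[global]
rw=randread
bs=4k
size=256m
time_based
runtime=30

[heavy]
cgroup=fio-heavy
cgroup_weight=1000

[light]
cgroup=fio-light
cgroup_weight=100
; Keep this cgroup after the job completes so its files can be inspected.
cgroup_nodelete=1
```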
3058
3059 flow_id=int
3060 The ID of the flow. If not specified, it defaults to being a
3061 global flow. See flow.
3062
3063 flow=int
3064 Weight in token-based flow control. If this value is used, then
3065 fio regulates the activity between two or more jobs sharing the
3066 same flow_id. Fio attempts to keep each job activity propor‐
3067 tional to other jobs' activities in the same flow_id group, with
3068 respect to requested weight per job. That is, if one job has
3069 `flow=3', another job has `flow=2' and another has `flow=1',
3070 then there will be a roughly 3:2:1 ratio in how much one runs vs
3071 the others.
3072
3073 flow_sleep=int
3074 The period of time, in microseconds, to wait after the flow
3075 counter has exceeded its proportion before retrying operations.
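
A sketch of token-based flow control between two jobs in the same flow group. The job names, the flow_id value and the 3:1 weights are illustrative; per the description above, the jobs should run in roughly that ratio.

```ini
[global]
rw=randread
bs=4k
size=256m
time_based
runtime=30
; Both jobs belong to the same flow group.
flow_id=1
; Wait 1000 microseconds before retrying when over proportion.
flow_sleep=1000

[major]
flow=3

[minor]
flow=1
```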
3076
3077 stonewall, wait_for_previous
3078 Wait for preceding jobs in the job file to exit, before starting
3079 this one. Can be used to insert serialization points in the job
3080 file. A stone wall also implies starting a new reporting group,
3081 see group_reporting. Optionally you can use `stonewall=0` to
3082 disable or `stonewall=1` to enable it.
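
A minimal sketch of a serialization point, assuming a write job followed by a read job over the same hypothetical file.

```ini
[global]
bs=4k
size=256m
; Hypothetical file used by both jobs.
filename=fio-stonewall-testfile

[fill]
rw=write

[check]
; Do not start until the preceding job has exited.
stonewall
rw=read
```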
3083
3084 exitall
3085 By default, fio will continue running all other jobs when one
3086 job finishes. Sometimes this is not the desired action. Setting
3087 exitall will instead make fio terminate all jobs in the same
3088 group, as soon as one job of that group finishes.
3089
3090 exit_what=str
3091 By default, fio will continue running all other jobs when one
3092 job finishes. Sometimes this is not the desired action. Setting
3093 exitall will instead make fio terminate all jobs in the same
3094 group. The option exit_what allows you to control which jobs get
3095 terminated when exitall is enabled. The default value is group.
3096 The allowed values are:
3097
3098 all terminates all jobs.
3099
3100 group is the default and does not change the behaviour
3101 of exitall.
3102
3103 stonewall
3104 terminates all currently running jobs across all
3105 groups and continues execution with the next
3106 stonewalled group.
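
The difference between the exit_what values only shows when several
groups run at the same time. The job file below is a hedged sketch of
such a setup; the job names, file path, and runtimes are assumptions,
and the behaviour noted in the comments is inferred from the option
descriptions above rather than from a recorded run.

```ini
; Sketch only: paths, sizes, and runtimes are hypothetical.
[global]
filename=/tmp/fio-exit.test
size=64m
bs=4k
time_based
exitall
exit_what=stonewall

; Group 0: finishes first and triggers exitall.
[short]
rw=randread
runtime=5

; Group 1: runs concurrently with group 0 because there is no stonewall.
[long]
new_group
rw=randread
runtime=60

; Starts only after the jobs above have exited.
[after]
stonewall
rw=read
runtime=5
```

With `exit_what=stonewall', the job named long should be terminated when
short finishes, and after should then start. With the default
`exit_what=group', long would be expected to run for its full runtime
first.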
3107
3108 exec_prerun=str
3109 Before running this job, issue the command specified through
3110 system(3). Output is redirected to a file called
3111 `jobname.prerun.txt'.
3112
3113 exec_postrun=str
3114 After the job completes, issue the command specified through
3115 system(3). Output is redirected to a file called
3116 `jobname.postrun.txt'.
3117
3118 uid=int
3119 Instead of running as the invoking user, set the user ID to this
3120 value before the thread/process does any work.
3121
3122 gid=int
3123 Set group ID, see uid.
3124
3125 Verification
3126 verify_only
3127 Do not perform specified workload, only verify data still
3128 matches previous invocation of this workload. This option allows
3129 one to check data multiple times at a later date without over‐
3130 writing it. This option makes sense only for workloads that
3131 write data, and does not support workloads with the time_based
3132 option set.
3133
3134 do_verify=bool
3135 Run the verify phase after a write phase. Only valid if verify
3136 is set. Default: true.
3137
3138 verify=str
3139 If writing to a file, fio can verify the file contents after
3140 each iteration of the job. Each verification method also implies
3141 verification of special header, which is written to the begin‐
3142 ning of each block. This header also includes meta information,
3143 like offset of the block, block number, timestamp when block was
3144 written, etc. verify can be combined with verify_pattern option.
3145 The allowed values are:
3146
3147 md5 Use an md5 sum of the data area and store it in
3148 the header of each block.
3149
3150 crc64 Use an experimental crc64 sum of the data area and
3151 store it in the header of each block.
3152
3153 crc32c Use a crc32c sum of the data area and store it in
3154 the header of each block. This will automatically
3155 use hardware acceleration (e.g. SSE4.2 on an x86
3156 or CRC crypto extensions on ARM64) but will fall
3157 back to software crc32c if none is found. Gener‐
3158 ally the fastest checksum fio supports when hard‐
3159 ware accelerated.
3160
3161 crc32c-intel
3162 Synonym for crc32c.
3163
3164 crc32 Use a crc32 sum of the data area and store it in
3165 the header of each block.
3166
3167 crc16 Use a crc16 sum of the data area and store it in
3168 the header of each block.
3169
3170 crc7 Use a crc7 sum of the data area and store it in
3171 the header of each block.
3172
3173 xxhash Use xxhash as the checksum function. Generally the
3174 fastest software checksum that fio supports.
3175
3176 sha512 Use sha512 as the checksum function.
3177
3178 sha256 Use sha256 as the checksum function.
3179
3180 sha1 Use optimized sha1 as the checksum function.
3181
3182 sha3-224
3183 Use optimized sha3-224 as the checksum function.
3184
3185 sha3-256
3186 Use optimized sha3-256 as the checksum function.
3187
3188 sha3-384
3189 Use optimized sha3-384 as the checksum function.
3190
3191 sha3-512
3192 Use optimized sha3-512 as the checksum function.
3193
3194 meta This option is deprecated, since now meta informa‐
3195 tion is included in generic verification header
3196 and meta verification happens by default. For de‐
3197 tailed information see the description of the ver‐
3198 ify setting. This option is kept because of com‐
3199 patibility's sake with old configurations. Do not
3200 use it.
3201
3202 pattern
3203 Verify a strict pattern. Normally fio includes a
3204 header with some basic information and checksum‐
3205 ming, but if this option is set, only the specific
3206 pattern set with verify_pattern is verified.
3207
3208 null Only pretend to verify. Useful for testing inter‐
3209 nals with `ioengine=null', not for much else.
3210
3211 This option can be used for repeated burn-in tests of a system
3212 to make sure that the written data is also correctly read back.
3213 If the data direction given is a read or random read, fio will
3214 assume that it should verify a previously written file. If the
3215 data direction includes any form of write, the verify will be of
3216 the newly written data.
3217
3218 To avoid false verification errors, do not use the norandommap
3219 option when verifying data with async I/O engines and I/O depths
3220 > 1. Or use the norandommap and the lfsr random generator to‐
3221 gether to avoid writing to the same offset with multiple out‐
3222 standing I/Os.
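
A basic write-then-verify job could look like the following sketch. The
job name, file path, and sizes are assumptions; norandommap is left
unset, in line with the note above.

```ini
; Sketch only: the file path and sizes are hypothetical.
[write-and-verify]
filename=/tmp/fio-verify.test
size=128m
bs=4k
rw=randwrite
; Store a crc32c checksum in the header of each block.
verify=crc32c
; Run the verify phase after the write phase (this is the default).
do_verify=1
; Stop the job on the first verification failure.
verify_fatal=1
```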
3223
3224 verify_offset=int
3225 Swap the verification header with data somewhere else in the
3226 block before writing. It is swapped back before verifying.
3227
3228 verify_interval=int
3229 Write the verification header at a finer granularity than the
3230 blocksize. It will be written for chunks the size of
3231 verify_interval. verify_interval should divide blocksize evenly.
3232
3233 verify_pattern=str
3234 If set, fio will fill the I/O buffers with this pattern. Fio de‐
3235 faults to filling with totally random bytes, but sometimes it's
3236 interesting to fill with a known pattern for I/O verification
3237 purposes. Depending on the width of the pattern, fio will fill
3238 1/2/3/4 bytes of the buffer at a time (it can be either a
3239 decimal or a hex number). If verify_pattern is larger than a
3240 32-bit quantity, it has to be a hex number that starts with either
3241 "0x" or "0X". Use with verify. Also, verify_pattern supports the %o
3242 format, which means that the offset of each block will be written
3243 and then verified back, e.g.:
3244
3245 verify_pattern=%o
3246
3247 Or use combination of everything:
3248
3249 verify_pattern=0xff%o"abcd"-12
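
In a job file, a fixed pattern is typically paired with `verify=pattern'.
The sketch below assumes a hypothetical file path and an arbitrary
32-bit pattern value.

```ini
; Sketch only: the file path, size, and pattern value are hypothetical.
[pattern-check]
filename=/tmp/fio-pattern.test
size=64m
bs=4k
rw=write
; Verify only the pattern itself, without the usual header and checksum.
verify=pattern
verify_pattern=0xdeadbeef
```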
3250
3251 verify_fatal=bool
3252 Normally fio will keep checking the entire contents before quit‐
3253 ting on a block verification failure. If this option is set, fio
3254 will exit the job on the first observed failure. Default: false.
3255
3256 verify_dump=bool
3257 If set, dump the contents of both the original data block and
3258 the data block we read off disk to files. This allows later
3259 analysis to inspect just what kind of data corruption occurred.
3260 Off by default.
3261
3262 verify_async=int
3263 Fio will normally verify I/O inline from the submitting thread.
3264 This option takes an integer describing how many async offload
3265 threads to create for I/O verification instead, causing fio to
3266 offload the duty of verifying I/O contents to one or more sepa‐
3267 rate threads. If using this offload option, even sync I/O en‐
3268 gines can benefit from using an iodepth setting higher than 1,
3269 as it allows them to have I/O in flight while verifies are run‐
3270 ning. Defaults to 0 async threads, i.e. verification is not
3271 asynchronous.
3272
3273 verify_async_cpus=str
3274 Tell fio to set the given CPU affinity on the async I/O verifi‐
3275 cation threads. See cpus_allowed for the format used.
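
The offload options above might be combined as in this sketch. The file
path, sizes, thread count, and CPU numbers are assumptions; the CPU list
presumes the system has CPUs 0 and 1.

```ini
; Sketch only: the file path, sizes, and CPU numbers are hypothetical.
[async-verify]
filename=/tmp/fio-async-verify.test
size=256m
bs=4k
rw=write
verify=md5
; Offload verification to two separate threads.
verify_async=2
; Pin the verification threads to CPUs 0 and 1.
verify_async_cpus=0,1
; With offloaded verifies, a sync engine can still use a depth above 1.
iodepth=4
```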
3276
3277 verify_backlog=int
3278 Fio will normally verify the written contents of a job that uti‐
3279 lizes verify once that job has completed. In other words, every‐
3280 thing is written then everything is read back and verified. You
3281 may want to verify continually instead for a variety of reasons.
3282 Fio stores the meta data associated with an I/O block in memory,
3283 so for large verify workloads, quite a bit of memory would be
3284 used up holding this meta data. If this option is enabled, fio
3285 will write only N blocks before verifying these blocks.
3286
3287 verify_backlog_batch=int
3288 Control how many blocks fio will verify if verify_backlog is
3289 set. If not set, will default to the value of verify_backlog
3290 (meaning the entire queue is read back and verified). If ver‐
3291 ify_backlog_batch is less than verify_backlog then not all
3292 blocks will be verified, if verify_backlog_batch is larger than
3293 verify_backlog, some blocks will be verified more than once.
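
Continuous verification can be sketched as follows. The file path,
sizes, and the backlog value of 1024 blocks are assumptions picked for
the example.

```ini
; Sketch only: the file path, sizes, and backlog depth are hypothetical.
[backlog-verify]
filename=/tmp/fio-backlog.test
size=1g
bs=4k
rw=randwrite
verify=crc32c
; Verify after every 1024 written blocks instead of at the end of the job.
verify_backlog=1024
; Verify the whole backlog each time (same as leaving this option unset).
verify_backlog_batch=1024
```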
3294
3295 verify_state_save=bool
3296 When a job exits during the write phase of a verify workload,
3297 save its current state. This allows fio to replay up until that
3298 point, if the verify state is loaded for the verify read phase.
3299 The format of the filename is, roughly:
3300
3301 <type>-<jobname>-<jobindex>-verify.state.
3302
3303 <type> is "local" for a local run, "sock" for a client/server
3304 socket connection, and "ip" (192.168.0.1, for instance) for a
3305 networked client/server connection. Defaults to true.
3306
3307 verify_state_load=bool
3308 If a verify termination trigger was used, fio stores the current
3309 write state of each thread. This can be used at verification
3310 time so that fio knows how far it should verify. Without this
3311 information, fio will run a full verification pass, according to
3312 the settings in the job file used. Default false.
3313
3314 experimental_verify=bool
3315 Enable experimental verification. Standard verify records I/O
3316 metadata for later use during the verification phase. Experimen‐
3317 tal verify instead resets the file after the write phase and
3318 then replays I/Os for the verification phase.
3319
3320 trim_percentage=int
3321 Number of verify blocks to discard/trim.
3322
3323 trim_verify_zero=bool
3324 Verify that trim/discarded blocks are returned as zeros.
3325
3326 trim_backlog=int
3327 Trim after this number of blocks are written.
3328
3329 trim_backlog_batch=int
3330 Trim this number of I/O blocks.
3331
3332 Steady state
3333 steadystate=str:float, ss=str:float
3334 Define the criterion and limit for assessing steady state per‐
3335 formance. The first parameter designates the criterion whereas
3336 the second parameter sets the threshold. When the criterion
3337 falls below the threshold for the specified duration, the job
3338 will stop. For example, `iops_slope:0.1%' will direct fio to
3339 terminate the job when the least squares regression slope falls
3340 below 0.1% of the mean IOPS. If group_reporting is enabled this
3341 will apply to all jobs in the group. Below is the list of avail‐
3342 able steady state assessment criteria. All assessments are car‐
3343 ried out using only data from the rolling collection window.
3344 Threshold limits can be expressed as a fixed value or as a per‐
3345 centage of the mean in the collection window.
3346
3347 When using this feature, most jobs should include the time_based
3348 and runtime options or the loops option so that fio does not
3349 stop running after it has covered the full size of the specified
3350 file(s) or device(s).
3351
3352 iops Collect IOPS data. Stop the job if all in‐
3353 dividual IOPS measurements are within the
3354 specified limit of the mean IOPS (e.g.,
3355 `iops:2' means that all individual IOPS
3356 values must be within 2 of the mean,
3357 whereas `iops:0.2%' means that all individ‐
3358 ual IOPS values must be within 0.2% of the
3359 mean IOPS to terminate the job).
3360
3361 iops_slope
3362 Collect IOPS data and calculate the least
3363 squares regression slope. Stop the job if
3364 the slope falls below the specified limit.
3365
3366 bw Collect bandwidth data. Stop the job if all
3367 individual bandwidth measurements are
3368 within the specified limit of the mean
3369 bandwidth.
3370
3371 bw_slope
3372 Collect bandwidth data and calculate the
3373 least squares regression slope. Stop the
3374 job if the slope falls below the specified
3375 limit.
3376
3377 steadystate_duration=time, ss_dur=time
3378 A rolling window of this duration will be used to judge
3379 whether steady state has been reached. Data will be col‐
3380 lected every ss_interval. The default is 0 which disables
3381 steady state detection. When the unit is omitted, the
3382 value is interpreted in seconds.
3383
3384 steadystate_ramp_time=time, ss_ramp=time
3385 Allow the job to run for the specified duration before
3386 beginning data collection for checking the steady state
3387 job termination criterion. The default is 0. When the
3388 unit is omitted, the value is interpreted in seconds.
3389
3390 steadystate_check_interval=time, ss_interval=time
3391 The values during the rolling window will be collected
3392 with a period of this value. If ss_interval is 30s and
3393 ss_dur is 300s, 10 measurements will be taken. Default is
3394 1s but that might not converge, especially for slower de‐
3395 vices, so set this accordingly. When the unit is omitted,
3396 the value is interpreted in seconds.
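
Putting the steady state options together, a job file might look like
the sketch below. The file path, size, runtime cap, and window settings
are assumptions; with a 60 second window sampled every second, 60
measurements would be taken, following the arithmetic given above.

```ini
; Sketch only: the file path, size, and timings are hypothetical.
[ss-randread]
filename=/tmp/fio-ss.test
size=1g
bs=4k
rw=randread
; Keep running past one pass over the file, as recommended above.
time_based
runtime=600
; Stop once the IOPS regression slope falls below 0.1% of the mean IOPS.
steadystate=iops_slope:0.1%
; Judge over a 60 second rolling window, sampled every second.
steadystate_duration=60
steadystate_check_interval=1
; Skip the first 10 seconds before collecting data.
steadystate_ramp_time=10
```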
3397
3398 Measurements and reporting
3399 per_job_logs=bool
3400 If set, this generates bw/clat/iops log with per file private
3401 filenames. If not set, jobs with identical names will share the
3402 log filename. Default: true.
3403
3404 group_reporting
3405 It may sometimes be interesting to display statistics for groups
3406 of jobs as a whole instead of for each individual job. This is
3407 especially true if numjobs is used; looking at individual
3408 thread/process output quickly becomes unwieldy. To see the final
3409 report per-group instead of per-job, use group_reporting. Jobs
3410 in a file will be part of the same reporting group, unless
3411 separated by a stonewall, or by using new_group.
3412
3413 new_group
3414 Start a new reporting group. See: group_reporting. If not given,
3415 all jobs in a file will be part of the same reporting group, un‐
3416 less separated by a stonewall.
3417
3418 stats=bool
3419 By default, fio collects and shows final output results for all
3420 jobs that run. If this option is set to 0, then fio will ignore
3421 the job in the final stat output.
3422
3423 write_bw_log=str
3424 If given, write a bandwidth log for this job. Can be used to
3425 store data of the bandwidth of the jobs in their lifetime.
3426
3427 If no str argument is given, the default filename of `job‐
3428 name_type.x.log' is used. Even when the argument is given, fio
3429 will still append the type of log. So if one specifies:
3430
3431 write_bw_log=foo
3432
3433 The actual log name will be `foo_bw.x.log' where `x' is the in‐
3434 dex of the job (1..N, where N is the number of jobs). If
3435 per_job_logs is false, then the filename will not include the
3436 `.x' job index.
3437
3438 The included fio_generate_plots script uses gnuplot to turn
3439 these text files into nice graphs. See the LOG FILE FORMATS sec‐
3440 tion for how data is structured within the file.
3441
3442 write_lat_log=str
3443 Same as write_bw_log, except this option creates I/O submission
3444 (e.g., `name_slat.x.log'), completion (e.g., `name_clat.x.log'),
3445 and total (e.g., `name_lat.x.log') latency files instead. See
3446 write_bw_log for details about the filename format and the LOG
3447 FILE FORMATS section for how data is structured within the
3448 files.
3449
3450 write_hist_log=str
3451 Same as write_bw_log but writes an I/O completion latency his‐
3452 togram file (e.g., `name_hist.x.log') instead. Note that this
3453 file will be empty unless log_hist_msec has also been set. See
3454 write_bw_log for details about the filename format and the LOG
3455 FILE FORMATS section for how data is structured within the file.
3456
3457 write_iops_log=str
3458 Same as write_bw_log, but writes an IOPS file (e.g.,
3459 `name_iops.x.log') instead. Because fio defaults to individual
3460 I/O logging, the value entry in the IOPS log will be 1 unless
3461 windowed logging (see log_avg_msec) has been enabled. See
3462 write_bw_log for details about the filename format and LOG FILE
3463 FORMATS for how data is structured within the file.
3464
3465 log_entries=int
3466 By default, fio will log an entry in the iops, latency, or bw
3467 log for every I/O that completes. The initial number of I/O log
3468 entries is 1024. When the log entries are all used, new log en‐
3469 tries are dynamically allocated. This dynamic log entry alloca‐
3470 tion may negatively impact time-related statistics such as I/O
3471 tail latencies (e.g. 99.9th percentile completion latency). This
3472 option allows specifying a larger initial number of log entries
3473 to avoid run-time allocation of new log entries, resulting in
3474 more precise time-related I/O statistics. See log_avg_msec
3475 as well. Defaults to 1024.
3476
3477 log_avg_msec=int
3478 By default, fio will log an entry in the iops, latency, or bw
3479 log for every I/O that completes. When writing to the disk log,
3480 that can quickly grow to a very large size. Setting this option
3481 makes fio average each log entry over the specified period
3482 of time, reducing the resolution of the log. See log_max_value
3483 as well. Defaults to 0, logging all entries. Also see LOG FILE
3484 FORMATS section.
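
The logging options above could be combined as in this sketch. The log
prefix foo is borrowed from the write_bw_log example; the file path,
sizes, and averaging window are assumptions.

```ini
; Sketch only: the file path, sizes, and averaging window are hypothetical.
[logged-job]
filename=/tmp/fio-log.test
size=256m
bs=4k
rw=randwrite
write_bw_log=foo
write_iops_log=foo
write_lat_log=foo
; Average each log entry over 1000 milliseconds instead of logging every I/O.
log_avg_msec=1000
```

Following the naming rule described under write_bw_log, this should
produce files such as `foo_bw.1.log' and `foo_iops.1.log', plus the
latency logs.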
3485
3486 log_hist_msec=int
3487 Same as log_avg_msec, but logs entries for completion latency
3488 histograms. Computing latency percentiles from averages of in‐
3489 tervals using log_avg_msec is inaccurate. Setting this option
3490 makes fio log histogram entries over the specified period of
3491 time, reducing log sizes for high IOPS devices while retaining
3492 percentile accuracy. See log_hist_coarseness and write_hist_log
3493 as well. Defaults to 0, meaning histogram logging is disabled.
3494
3495 log_hist_coarseness=int
3496 Integer ranging from 0 to 6, defining the coarseness of the res‐
3497 olution of the histogram logs enabled with log_hist_msec. For
3498 each increment in coarseness, fio outputs half as many bins. De‐
3499 faults to 0, for which histogram logs contain 1216 latency bins.
3500 See LOG FILE FORMATS section.
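
A histogram logging setup might be sketched as below. The file path,
sizes, interval, and coarseness value are assumptions. By the halving
rule stated above, a coarseness of 2 would give 1216 / 4 = 304 bins per
entry.

```ini
; Sketch only: the file path, sizes, and interval are hypothetical.
[hist-job]
filename=/tmp/fio-hist.test
size=256m
bs=4k
rw=randread
write_hist_log=foo
; Log one histogram entry per 1000 milliseconds.
log_hist_msec=1000
; Halve the default bin count twice.
log_hist_coarseness=2
```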
3501
3502 log_max_value=bool
3503 If log_avg_msec is set, fio logs the average over that window.
3504 If you instead want to log the maximum value, set this option to
3505 1. Defaults to 0, meaning that averaged values are logged.
3506
3507 log_offset=bool
3508 If this is set, the iolog options will include the byte offset
3509 for the I/O entry as well as the other data values. Defaults to
3510 0 meaning that offsets are not present in logs. Also see LOG
3511 FILE FORMATS section.
3512
3513 log_prio=bool
3514 If this is set, the iolog options will include the I/O priority
3515 for the I/O entry as well as the other data values. Defaults to
3516 0 meaning that I/O priorities are not present in logs. Also see
3517 LOG FILE FORMATS section.
3518
3519 log_compression=int
3520 If this is set, fio will compress the I/O logs as it goes, to
3521 keep the memory footprint lower. When a log reaches the speci‐
3522 fied size, that chunk is removed and compressed in the back‐
3523 ground. Given that I/O logs are fairly highly compressible, this
3524 yields a nice memory savings for longer runs. The downside is
3525 that the compression will consume some background CPU cycles, so
3526 it may impact the run. This, however, is also true if the log‐
3527 ging ends up consuming most of the system memory. So pick your
3528 poison. The I/O logs are saved normally at the end of a run, by
3529 decompressing the chunks and storing them in the specified log
3530 file. This feature depends on the availability of zlib.
3531
3532 log_compression_cpus=str
3533 Define the set of CPUs that are allowed to handle online log
3534 compression for the I/O jobs. This can provide better isolation
3535 between performance sensitive jobs, and background compression
3536 work. See cpus_allowed for the format used.
3537
3538 log_store_compressed=bool
3539 If set, fio will store the log files in a compressed format.
3540 They can be decompressed with fio, using the --inflate-log com‐
3541 mand line parameter. The files will be stored with a `.fz' suf‐
3542 fix.
3543
3544 log_unix_epoch=bool
3545 If set, fio will log Unix timestamps to the log files produced
3546 by enabling write_type_log for each log type, instead of the de‐
3547 fault zero-based timestamps.
3548
3549 log_alternate_epoch=bool
3550 If set, fio will log timestamps based on the epoch used by the
3551 clock specified in the log_alternate_epoch_clock_id option, to
3552 the log files produced by enabling write_type_log for each log
3553 type, instead of the default zero-based timestamps.
3554
3555 log_alternate_epoch_clock_id=int
3556 Specifies the clock_id to be used by clock_gettime to obtain the
3557 alternate epoch if either log_unix_epoch or log_alternate_epoch
3558 is true. Otherwise has no effect. Default value is 0, or
3559 CLOCK_REALTIME.
3560
3561 block_error_percentiles=bool
3562 If set, record errors in trim block-sized units from writes and
3563 trims and output a histogram of how many trims it took to get to
3564 errors, and what kind of error was encountered.
3565
3566 bwavgtime=int
3567 Average the calculated bandwidth over the given time. Value is
3568 specified in milliseconds. If the job also does bandwidth log‐
3569 ging through write_bw_log, then the minimum of this option and
3570 log_avg_msec will be used. Default: 500ms.
3571
3572 iopsavgtime=int
3573 Average the calculated IOPS over the given time. Value is speci‐
3574 fied in milliseconds. If the job also does IOPS logging through
3575 write_iops_log, then the minimum of this option and log_avg_msec
3576 will be used. Default: 500ms.
3577
3578 disk_util=bool
3579 Generate disk utilization statistics, if the platform supports
3580 it. Default: true.
3581
3582 disable_lat=bool
3583 Disable measurements of total latency numbers. Useful only for
3584 cutting back the number of calls to gettimeofday(2), as that
3585 does impact performance at really high IOPS rates. Note that to
3586 really get rid of a large amount of these calls, this option
3587 must be used with disable_slat and disable_bw_measurement as
3588 well.
3589
3590 disable_clat=bool
3591 Disable measurements of completion latency numbers. See dis‐
3592 able_lat.
3593
3594 disable_slat=bool
3595 Disable measurements of submission latency numbers. See dis‐
3596 able_lat.
3597
3598 disable_bw_measurement=bool, disable_bw=bool
3599 Disable measurements of throughput/bandwidth numbers. See dis‐
3600 able_lat.
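
To cut back on timing calls as described under disable_lat, the
measurement switches can be set together. This is a sketch with a
hypothetical file path and sizes; whether it helps depends on the IOPS
rate of the device under test.

```ini
; Sketch only: the file path and sizes are hypothetical.
[low-overhead]
filename=/tmp/fio-fast.test
size=256m
bs=4k
rw=randread
; Disable total latency, submission latency, and bandwidth measurements.
disable_lat=1
disable_slat=1
disable_bw_measurement=1
```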
3601
3602 slat_percentiles=bool
3603 Report submission latency percentiles. Submission latency is not
3604 recorded for synchronous ioengines.
3605
3606 clat_percentiles=bool
3607 Report completion latency percentiles.
3608
3609 lat_percentiles=bool
3610 Report total latency percentiles. Total latency is the sum of
3611 submission latency and completion latency.
3612
3613 percentile_list=float_list
3614 Overwrite the default list of percentiles for latencies and the
3615 block error histogram. Each number is a floating point number in
3616 the range (0,100], and the maximum length of the list is 20. Use
3617 ':' to separate the numbers. For example, `--per‐
3618 centile_list=99.5:99.9' will cause fio to report the latency du‐
3619 rations below which 99.5% and 99.9% of the observed latencies
3620 fell, respectively.
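
In a job file, the same option takes the list without the leading
dashes. The sketch below assumes a hypothetical file path and an
arbitrary choice of percentiles.

```ini
; Sketch only: the file path, sizes, and percentile choice are hypothetical.
[custom-percentiles]
filename=/tmp/fio-pct.test
size=128m
bs=4k
rw=randread
clat_percentiles=1
; Report the median plus the 99th and 99.9th percentiles.
percentile_list=50:99:99.9
```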
3621
3622 significant_figures=int
3623 If using --output-format of `normal', set the significant fig‐
3624 ures to this value. Higher values will yield more precise IOPS
3625 and throughput units, while lower values will round. Requires a
3626 minimum value of 1 and a maximum value of 10. Defaults to 4.
3627
3628 Error handling
3629 exitall_on_error
3630 When one job finishes in error, terminate the rest. The default
3631 is to wait for each job to finish.
3632
3633 continue_on_error=str
3634 Normally fio will exit the job on the first observed failure. If
3635 this option is set, fio will continue the job when there is a
3636 'non-fatal error' (EIO or EILSEQ) until the runtime is exceeded
3637 or the I/O size specified is completed. If this option is used,
3638 there are two more stats that are appended, the total error
3639 count and the first error. The error field given in the stats is
3640 the first error that was hit during the run.
3641
3642 Note: a write error from the device may go unnoticed by fio when
3643 using buffered IO, as the write() (or similar) system call
3644 merely dirties the kernel pages, unless `sync' or `direct' is
3645 used. Device IO errors occur when the dirty data is actually
3646 written out to disk. If fully sync writes aren't desirable,
3647 `fsync' or `fdatasync' can be used as well. This is specific to
3648 writes, as reads are always synchronous.
3649
3650 The allowed values are:
3651
3652 none Exit on any I/O or verify errors.
3653
3654 read Continue on read errors, exit on all
3655 others.
3656
3657 write Continue on write errors, exit on
3658 all others.
3659
3660 io Continue on any I/O error, exit on
3661 all others.
3662
3663 verify Continue on verify errors, exit on
3664 all others.
3665
3666 all Continue on all errors.
3667
3668 0 Backward-compatible alias for
3669 'none'.
3670
3671 1 Backward-compatible alias for 'all'.
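
As a sketch of one of the values listed above, the job below keeps
running when a verify error is seen. The file path and sizes are
assumptions.

```ini
; Sketch only: the file path and sizes are hypothetical.
[tolerant-verify]
filename=/tmp/fio-continue.test
size=128m
bs=4k
rw=randwrite
verify=crc32c
; Keep going on verify errors; any other error still ends the job.
continue_on_error=verify
```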
3672
3673 ignore_error=str
3674 Sometimes you want to ignore some errors during a test.
3675 In that case you can specify an error list for
3676 each error type, instead of only being able to
3677 ignore the default 'non-fatal error' using
3678 continue_on_error. The format is
3679 `ignore_error=READ_ERR_LIST,WRITE_ERR_LIST,VERIFY_ERR_LIST';
3680 errors within a given error type are separated with ':'.
3681 An error may be a symbol ('ENOSPC', 'ENOMEM') or an
3682 integer. Example:
3683
3684 ignore_error=EAGAIN,ENOSPC:122
3685
3686 This option will ignore EAGAIN from READ, and
3687 ENOSPC and 122(EDQUOT) from WRITE. This option
3688 works by overriding continue_on_error with the
3689 list of errors for each error type if any.
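
Placed in a job file, the example above might be used as in this sketch.
The job name, file path, and sizes are assumptions; the error list is
the one from the example.

```ini
; Sketch only: the file path and sizes are hypothetical.
[ignore-some-errors]
filename=/tmp/fio-ignore.test
size=128m
bs=4k
rw=randrw
; Read errors: ignore EAGAIN. Write errors: ignore ENOSPC and 122.
; The verify error list is left empty.
ignore_error=EAGAIN,ENOSPC:122
```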
3690
3691 error_dump=bool
3692 If set, dump every error even if it is non-fatal.
3693 If disabled, only fatal errors will be dumped.
3694 Default: true.
3695
3696 Running predefined workloads
3697 Fio includes predefined profiles that mimic the I/O workloads generated
3698 by other tools.
3699
3700 profile=str
3701 The predefined workload to run. Current profiles are:
3702
3703 tiobench
3704 Threaded I/O bench (tiotest/tiobench) like work‐
3705 load.
3706
3707 act Aerospike Certification Tool (ACT) like workload.
3708
3709 To view a profile's additional options use --cmdhelp after specifying
3710 the profile. For example:
3711
3712 $ fio --profile=act --cmdhelp
3713
3714 Act profile options
3715 device-names=str
3716 Devices to use.
3717
3718 load=int
3719 ACT load multiplier. Default: 1.
3720
3721 test-duration=time
3722 How long the entire test takes to run. When the unit is omitted,
3723 the value is given in seconds. Default: 24h.
3724
3725 threads-per-queue=int
3726 Number of read I/O threads per device. Default: 8.
3727
3728 read-req-num-512-blocks=int
3729 Number of 512B blocks to read at a time. Default: 3.
3730
3731 large-block-op-kbytes=int
3732 Size of large block ops in KiB (writes). Default: 131072.
3733
3734 prep Set to run ACT prep phase.
3735
3736 Tiobench profile options
3737 size=str
3738 Size in MiB.
3739
3740 block=int
3741 Block size in bytes. Default: 4096.
3742
3743 numruns=int
3744 Number of runs.
3745
3746 dir=str
3747 Test directory.
3748
3749 threads=int
3750 Number of threads.
3751
3753 Fio spits out a lot of output. While running, fio will display the sta‐
3754 tus of the jobs created. An example of that would be:
3755
3756 Jobs: 1 (f=1): [_(1),M(1)][24.8%][r=20.5MiB/s,w=23.5MiB/s][r=82,w=94 IOPS][eta 01m:31s]
3757
3758 The characters inside the first set of square brackets denote the cur‐
3759 rent status of each thread. The first character is the first job de‐
3760 fined in the job file, and so forth. The possible values (in typical
3761 life cycle order) are:
3762
3763 P Thread setup, but not started.
3764 C Thread created.
3765 I Thread initialized, waiting or generating necessary data.
3766 p Thread running pre-reading file(s).
3767 / Thread is in ramp period.
3768 R Running, doing sequential reads.
3769 r Running, doing random reads.
3770 W Running, doing sequential writes.
3771 w Running, doing random writes.
3772 M Running, doing mixed sequential reads/writes.
3773 m Running, doing mixed random reads/writes.
3774 D Running, doing sequential trims.
3775 d Running, doing random trims.
3776 F Running, currently waiting for fsync(2).
3777 V Running, doing verification of written data.
3778 f Thread finishing.
3779 E Thread exited, not reaped by main thread yet.
3780 - Thread reaped.
3781 X Thread reaped, exited with an error.
3782 K Thread reaped, exited due to signal.
3783
3784 Fio will condense the thread string so as not to take up more space on the
3785 command line than needed. For instance, if you have 10 readers and 10
3786 writers running, the output would look like this:
3787
3788 Jobs: 20 (f=20): [R(10),W(10)][4.0%][r=20.5MiB/s,w=23.5MiB/s][r=82,w=94 IOPS][eta 57m:36s]
3789
3790 Note that the status string is displayed in order, so it's possible to
3791 tell which of the jobs are currently doing what. In the example above
3792 this means that jobs 1-10 are readers and 11-20 are writers.
3793
3794 The other values are fairly self explanatory -- number of threads cur‐
3795 rently running and doing I/O, the number of currently open files (f=),
3796 the estimated completion percentage, the rate of I/O since last check
3797 (read speed listed first, then write speed and optionally trim speed)
3798 in terms of bandwidth and IOPS, and time to completion for the current
3799 running group. It's impossible to estimate runtime of the following
3800 groups (if any).
3801
3802 When fio is done (or interrupted by Ctrl-C), it will show the data for
3803 each thread, group of threads, and disks in that order. For each over‐
3804 all thread (or group) the output looks like:
3805
3806 Client1: (groupid=0, jobs=1): err= 0: pid=16109: Sat Jun 24 12:07:54 2017
3807 write: IOPS=88, BW=623KiB/s (638kB/s)(30.4MiB/50032msec)
3808 slat (nsec): min=500, max=145500, avg=8318.00, stdev=4781.50
3809 clat (usec): min=170, max=78367, avg=4019.02, stdev=8293.31
3810 lat (usec): min=174, max=78375, avg=4027.34, stdev=8291.79
3811 clat percentiles (usec):
3812 | 1.00th=[ 302], 5.00th=[ 326], 10.00th=[ 343], 20.00th=[ 363],
3813 | 30.00th=[ 392], 40.00th=[ 404], 50.00th=[ 416], 60.00th=[ 445],
3814 | 70.00th=[ 816], 80.00th=[ 6718], 90.00th=[12911], 95.00th=[21627],
3815 | 99.00th=[43779], 99.50th=[51643], 99.90th=[68682], 99.95th=[72877],
3816 | 99.99th=[78119]
3817 bw ( KiB/s): min= 532, max= 686, per=0.10%, avg=622.87, stdev=24.82, samples= 100
3818 iops : min= 76, max= 98, avg=88.98, stdev= 3.54, samples= 100
3819 lat (usec) : 250=0.04%, 500=64.11%, 750=4.81%, 1000=2.79%
3820 lat (msec) : 2=4.16%, 4=1.84%, 10=4.90%, 20=11.33%, 50=5.37%
3821 lat (msec) : 100=0.65%
3822 cpu : usr=0.27%, sys=0.18%, ctx=12072, majf=0, minf=21
3823 IO depths : 1=85.0%, 2=13.1%, 4=1.8%, 8=0.1%, 16=0.0%, 32=0.0%, >=64=0.0%
3824 submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
3825 complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
3826 issued rwt: total=0,4450,0, short=0,0,0, dropped=0,0,0
3827 latency : target=0, window=0, percentile=100.00%, depth=8
3828
3829 The job name (or first job's name when using group_reporting) is
3830 printed, along with the group id, count of jobs being aggregated, last
3831 error id seen (which is 0 when there are no errors), pid/tid of that
3832 thread and the time the job/group completed. Below are the I/O statis‐
3833 tics for each data direction performed (showing writes in the example
3834 above). In the order listed, they denote:
3835
3836 read/write/trim
3837 The string before the colon shows the I/O direction the
3838 statistics are for. IOPS is the average I/Os performed
3839 per second. BW is the average bandwidth rate shown as:
3840 value in power of 2 format (value in power of 10 format).
3841 The last two values show: (total I/O performed in power
3842 of 2 format / runtime of that thread).
3843
3844 slat Submission latency (min being the minimum, max being the
3845 maximum, avg being the average, stdev being the standard
3846 deviation). This is the time it took to submit the I/O.
3847 For sync I/O this row is not displayed as the slat is re‐
3848 ally the completion latency (since queue/complete is one
3849 operation there). This value can be in nanoseconds, mi‐
3850 croseconds or milliseconds --- fio will choose the most
3851 appropriate base and print that (in the example above
3852 nanoseconds was the best scale). Note: in --minimal mode
3853 latencies are always expressed in microseconds.
3854
3855 clat Completion latency. Same names as slat, this denotes the
3856 time from submission to completion of the I/O pieces. For
3857 sync I/O, clat will usually be equal (or very close) to
3858 0, as the time from submit to complete is basically just
3859 CPU time (I/O has already been done, see slat explana‐
3860 tion).
3861
3862 lat Total latency. Same names as slat and clat, this denotes
3863 the time from when fio created the I/O unit to completion
3864 of the I/O operation.
3865
bw Bandwidth statistics based on measurements from discrete
intervals. Fio continuously monitors bytes transferred and
I/O operations completed. By default fio calculates
bandwidth in each half-second interval (see bwavgtime) and
reports descriptive statistics for the measurements here.
The names match those of the slat, clat and lat stats, with
two additions: the number of samples taken (samples) and an
approximate percentage of the total aggregate bandwidth this
thread received in its group (per). This last value is only
really useful if the threads in this group are on the same
disk, since they are then competing for disk access.
3877
3878 iops IOPS statistics based on measurements from discrete in‐
3879 tervals. For details see the description for bw above.
3880 See iopsavgtime to control the duration of the intervals.
3881 Same values reported here as for bw except for percent‐
3882 age.
3883
lat (nsec/usec/msec)
The distribution of I/O completion latencies. This is the
time from when I/O leaves fio to when it is completed.
Unlike the separate read/write/trim sections above, the
data here and in the remaining sections apply to all I/Os
for the reporting group. 250=0.04% means that 0.04% of
the I/Os completed in under 250us. 500=64.11% means that
64.11% of the I/Os required 250 to 499us for completion.
3892
cpu CPU usage. User and system time, along with the number of
context switches this thread went through, and finally the
number of major and minor page faults. The CPU utilization
numbers are averages for the jobs in that reporting group,
while the context and fault counters are summed.
3899
3900 IO depths
3901 The distribution of I/O depths over the job lifetime. The
3902 numbers are divided into powers of 2 and each entry cov‐
3903 ers depths from that value up to those that are lower
3904 than the next entry -- e.g., 16= covers depths from 16 to
3905 31. Note that the range covered by a depth distribution
3906 entry can be different to the range covered by the equiv‐
3907 alent submit/complete distribution entry.
3908
IO submit
How many pieces of I/O were submitted in a single submit
call. Each entry denotes that amount and below, until the
previous entry -- e.g., 16=100% means that we submitted
anywhere between 9 and 16 I/Os per submit call. Note that
the range covered by a submit distribution entry can be
different to the range covered by the equivalent depth
distribution entry.
3917
3918 IO complete
3919 Like the above submit number, but for completions in‐
3920 stead.
3921
3922 IO issued rwt
3923 The number of read/write/trim requests issued, and how
3924 many of them were short or dropped.
3925
3926 IO latency
3927 These values are for latency_target and related options.
3928 When these options are engaged, this section describes
3929 the I/O depth required to meet the specified latency tar‐
3930 get.
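The `lat' distribution rows described above can be read as a histogram whose labels are bucket upper bounds. The following Python sketch is only an illustration: the helper name `percent_below' is hypothetical (not part of fio), and the bucket values are copied by hand from the example output, with the msec buckets converted to usec. It accumulates the buckets to show what share of I/Os completed under a given latency.

```python
# Bucket upper bounds (usec) mapped to the percentage of I/Os in that bucket.
# Values are copied from the "lat (usec)" and "lat (msec)" rows of the example
# output; msec bounds were multiplied by 1000.
LAT_BUCKETS_USEC = {
    250: 0.04, 500: 64.11, 750: 4.81, 1000: 2.79,
    2000: 4.16, 4000: 1.84, 10000: 4.90, 20000: 11.33,
    50000: 5.37, 100000: 0.65,
}


def percent_below(buckets, limit_usec):
    """Sum the percentages of all buckets whose upper bound is <= limit_usec."""
    return sum(pct for bound, pct in buckets.items() if bound <= limit_usec)


if __name__ == "__main__":
    print(f"under 1 ms:   {percent_below(LAT_BUCKETS_USEC, 1000):.2f}%")
    print(f"under 100 ms: {percent_below(LAT_BUCKETS_USEC, 100000):.2f}%")
```

Since each entry covers only its own range, the percentages across all `lat' rows should add up to roughly 100%.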
3931
3932 After each client has been listed, the group statistics are printed.
3933 They will look like this:
3934
3935 Run status group 0 (all jobs):
3936 READ: bw=20.9MiB/s (21.9MB/s), 10.4MiB/s-10.8MiB/s (10.9MB/s-11.3MB/s), io=64.0MiB (67.1MB), run=2973-3069msec
3937 WRITE: bw=1231KiB/s (1261kB/s), 616KiB/s-621KiB/s (630kB/s-636kB/s), io=64.0MiB (67.1MB), run=52747-53223msec
3938
3939 For each data direction it prints:
3940
3941 bw Aggregate bandwidth of threads in this group followed by
3942 the minimum and maximum bandwidth of all the threads in
3943 this group. Values outside of brackets are power-of-2
3944 format and those within are the equivalent value in a
3945 power-of-10 format.
3946
3947 io Aggregate I/O performed of all threads in this group. The
3948 format is the same as bw.
3949
3950 run The smallest and longest runtimes of the threads in this
3951 group.
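The relation between the power-of-2 values and the bracketed power-of-10 values can be checked by hand. The sketch below assumes the usual definitions (1 KiB = 1024 bytes, 1 kB = 1000 bytes); the function names are illustrative and not part of fio.

```python
KIB = 1024        # bytes per KiB (power-of-2 unit, printed outside the brackets)
MIB = 1024 ** 2   # bytes per MiB


def kib_per_s_to_kb_per_s(kib_per_s):
    """Convert KiB/s (1024-byte units) to kB/s (1000-byte units)."""
    return kib_per_s * KIB / 1000


def mib_to_mb(mib):
    """Convert MiB (2**20 bytes) to MB (10**6 bytes)."""
    return mib * MIB / 1000 ** 2


if __name__ == "__main__":
    # WRITE line of the example: bw=1231KiB/s (1261kB/s), io=64.0MiB (67.1MB)
    print(round(kib_per_s_to_kb_per_s(1231)))
    print(round(mib_to_mb(64.0), 1))
```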
3952
3953 And finally, the disk statistics are printed. This is Linux specific.
3954 They will look like this:
3955
3956 Disk stats (read/write):
3957 sda: ios=16398/16511, merge=30/162, ticks=6853/819634, in_queue=826487, util=100.00%
3958
3959 Each value is printed for both reads and writes, with reads first. The
3960 numbers denote:
3961
3962 ios Number of I/Os performed by all groups.
3963
3964 merge Number of merges performed by the I/O scheduler.
3965
3966 ticks Number of ticks we kept the disk busy.
3967
3968 in_queue
3969 Total time spent in the disk queue.
3970
3971 util The disk utilization. A value of 100% means we kept the
3972 disk busy constantly, 50% would be a disk idling half of
3973 the time.
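For scripted post-processing, a disk statistics line like the one above can be split into its fields. This is a minimal sketch that assumes the line has exactly the shape shown in the example; `parse_disk_stats' is a hypothetical helper, not something fio ships.

```python
import re


def parse_disk_stats(line):
    """Parse one 'Disk stats' line into a dict.

    Values written as read/write pairs become {"read": ..., "write": ...},
    percentages become floats, everything else becomes an int.
    """
    name, _, rest = line.partition(":")
    stats = {"name": name.strip()}
    for key, value in re.findall(r"(\w+)=([\d./%]+)", rest):
        if "/" in value:
            read, write = value.split("/")
            stats[key] = {"read": int(read), "write": int(write)}
        elif value.endswith("%"):
            stats[key] = float(value.rstrip("%"))
        else:
            stats[key] = int(value)
    return stats


if __name__ == "__main__":
    sample = ("sda: ios=16398/16511, merge=30/162, ticks=6853/819634, "
              "in_queue=826487, util=100.00%")
    print(parse_disk_stats(sample))
```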
3974
3975 It is also possible to get fio to dump the current output while it is
3976 running, without terminating the job. To do that, send fio the USR1
3977 signal. You can also get regularly timed dumps by using the --sta‐
3978 tus-interval parameter, or by creating a file in `/tmp' named
3979 `fio-dump-status'. If fio sees this file, it will unlink it and dump
3980 the current output status.
3981
3983 For scripted usage where you typically want to generate tables or
3984 graphs of the results, fio can output the results in a semicolon sepa‐
3985 rated format. The format is one long line of values, such as:
3986
3987 2;card0;0;0;7139336;121836;60004;1;10109;27.932460;116.933948;220;126861;3495.446807;1085.368601;226;126864;3523.635629;1089.012448;24063;99944;50.275485%;59818.274627;5540.657370;7155060;122104;60004;1;8338;29.086342;117.839068;388;128077;5032.488518;1234.785715;391;128085;5061.839412;1236.909129;23436;100928;50.287926%;59964.832030;5644.844189;14.595833%;19.394167%;123706;0;7313;0.1%;0.1%;0.1%;0.1%;0.1%;0.1%;100.0%;0.00%;0.00%;0.00%;0.00%;0.00%;0.00%;0.01%;0.02%;0.05%;0.16%;6.04%;40.40%;52.68%;0.64%;0.01%;0.00%;0.01%;0.00%;0.00%;0.00%;0.00%;0.00%
3988 A description of this job goes here.
3989
3990 The job description (if provided) follows on a second line for terse
3991 v2. It appears on the same line for other terse versions.
3992
3993 To enable terse output, use the --minimal or `--output-format=terse'
3994 command line options. The first value is the version of the terse out‐
3995 put format. If the output has to be changed for some reason, this num‐
3996 ber will be incremented by 1 to signify that change.
3997
3998 Split up, the format is as follows (comments in brackets denote when a
3999 field was introduced or whether it's specific to some terse version):
4000
4001 terse version, fio version [v3], jobname, groupid, error
4002
4003 READ status:
4004
4005 Total IO (KiB), bandwidth (KiB/sec), IOPS, runtime (msec)
4006 Submission latency: min, max, mean, stdev (usec)
4007 Completion latency: min, max, mean, stdev (usec)
4008 Completion latency percentiles: 20 fields (see below)
4009 Total latency: min, max, mean, stdev (usec)
4010 Bw (KiB/s): min, max, aggregate percentage of total, mean, stdev, number of samples [v5]
4011 IOPS [v5]: min, max, mean, stdev, number of samples
4012
4013 WRITE status:
4014
4015 Total IO (KiB), bandwidth (KiB/sec), IOPS, runtime (msec)
4016 Submission latency: min, max, mean, stdev (usec)
4017 Completion latency: min, max, mean, stdev (usec)
4018 Completion latency percentiles: 20 fields (see below)
4019 Total latency: min, max, mean, stdev (usec)
4020 Bw (KiB/s): min, max, aggregate percentage of total, mean, stdev, number of samples [v5]
4021 IOPS [v5]: min, max, mean, stdev, number of samples
4022
4023 TRIM status [all but version 3]:
4024
4025 Fields are similar to READ/WRITE status.
4026
4027 CPU usage:
4028
4029 user, system, context switches, major faults, minor faults
4030
4031 I/O depths:
4032
4033 <=1, 2, 4, 8, 16, 32, >=64
4034
4035 I/O latencies microseconds:
4036
4037 <=2, 4, 10, 20, 50, 100, 250, 500, 750, 1000
4038
4039 I/O latencies milliseconds:
4040
4041 <=2, 4, 10, 20, 50, 100, 250, 500, 750, 1000, 2000, >=2000
4042
4043 Disk utilization [v3]:
4044
4045 disk name, read ios, write ios, read merges, write merges, read ticks, write ticks, time spent in queue, disk utilization percentage
4046
4047 Additional Info (dependent on continue_on_error, default off):
4048
4049 total # errors, first error code
4050
4051 Additional Info (dependent on description being set):
4052
4053 Text description
4054
4055 Completion latency percentiles can be a grouping of up to 20 sets, so
4056 for the terse output fio writes all of them. Each field will look like
4057 this:
4058
4059 1.00%=6112
4060
4061 which is the Xth percentile, and the `usec' latency associated with it.
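A percentile field of this shape can be split into its two parts as follows. The helper name is illustrative only, and the sketch assumes every field follows the `X%=latency' pattern shown above.

```python
def parse_percentile_field(field):
    """Split a terse percentile field such as '1.00%=6112'.

    Returns (percentile, latency_usec).
    """
    pct_text, _, latency_text = field.partition("=")
    return float(pct_text.rstrip("%")), int(latency_text)


if __name__ == "__main__":
    print(parse_percentile_field("1.00%=6112"))
```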
4062
4063 For Disk utilization, all disks used by fio are shown. So for each disk
4064 there will be a disk utilization section.
4065
4066 Below is a single line containing short names for each of the fields in
4067 the minimal output v3, separated by semicolons:
4068
4069 terse_version_3;fio_version;jobname;groupid;error;read_kb;read_bandwidth_kb;read_iops;read_runtime_ms;read_slat_min_us;read_slat_max_us;read_slat_mean_us;read_slat_dev_us;read_clat_min_us;read_clat_max_us;read_clat_mean_us;read_clat_dev_us;read_clat_pct01;read_clat_pct02;read_clat_pct03;read_clat_pct04;read_clat_pct05;read_clat_pct06;read_clat_pct07;read_clat_pct08;read_clat_pct09;read_clat_pct10;read_clat_pct11;read_clat_pct12;read_clat_pct13;read_clat_pct14;read_clat_pct15;read_clat_pct16;read_clat_pct17;read_clat_pct18;read_clat_pct19;read_clat_pct20;read_tlat_min_us;read_lat_max_us;read_lat_mean_us;read_lat_dev_us;read_bw_min_kb;read_bw_max_kb;read_bw_agg_pct;read_bw_mean_kb;read_bw_dev_kb;write_kb;write_bandwidth_kb;write_iops;write_runtime_ms;write_slat_min_us;write_slat_max_us;write_slat_mean_us;write_slat_dev_us;write_clat_min_us;write_clat_max_us;write_clat_mean_us;write_clat_dev_us;write_clat_pct01;write_clat_pct02;write_clat_pct03;write_clat_pct04;write_clat_pct05;write_clat_pct06;write_clat_pct07;write_clat_pct08;write_clat_pct09;write_clat_pct10;write_clat_pct11;write_clat_pct12;write_clat_pct13;write_clat_pct14;write_clat_pct15;write_clat_pct16;write_clat_pct17;write_clat_pct18;write_clat_pct19;write_clat_pct20;write_tlat_min_us;write_lat_max_us;write_lat_mean_us;write_lat_dev_us;write_bw_min_kb;write_bw_max_kb;write_bw_agg_pct;write_bw_mean_kb;write_bw_dev_kb;cpu_user;cpu_sys;cpu_csw;cpu_mjf;cpu_minf;iodepth_1;iodepth_2;iodepth_4;iodepth_8;iodepth_16;iodepth_32;iodepth_64;lat_2us;lat_4us;lat_10us;lat_20us;lat_50us;lat_100us;lat_250us;lat_500us;lat_750us;lat_1000us;lat_2ms;lat_4ms;lat_10ms;lat_20ms;lat_50ms;lat_100ms;lat_250ms;lat_500ms;lat_750ms;lat_1000ms;lat_2000ms;lat_over_2000ms;disk_name;disk_read_iops;disk_write_iops;disk_read_merges;disk_write_merges;disk_read_ticks;write_ticks;disk_queue_time;disk_util
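One way to use the header line above is to pair each short name with the value at the same position in a terse record. The sketch below is an assumption-laden illustration: it embeds only the first nine names to stay short (the full header can be pasted in instead), and the sample record is hypothetical, shortened data rather than real fio output.

```python
# First nine short names from the v3 header line; paste the complete header
# here to label every field.
V3_HEADER_PREFIX = (
    "terse_version_3;fio_version;jobname;groupid;error;"
    "read_kb;read_bandwidth_kb;read_iops;read_runtime_ms"
)


def label_terse_fields(header, line):
    """Pair semicolon-separated names with semicolon-separated values."""
    names = header.split(";")
    values = line.strip().split(";")
    # zip() stops at the shorter sequence, so extra values are ignored.
    return dict(zip(names, values))


if __name__ == "__main__":
    # Hypothetical, shortened v3 record used only for illustration.
    sample = "3;fio-3.x;job1;0;0;1024;512;128;2000"
    for name, value in label_terse_fields(V3_HEADER_PREFIX, sample).items():
        print(f"{name} = {value}")
```

Because the disk utilization section repeats once per disk, a plain positional pairing like this labels only the first disk; further disks would need the trailing names applied again.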
4070
4071 In client/server mode terse output differs from what appears when jobs
4072 are run locally. Disk utilization data is omitted from the standard
4073 terse output and for v3 and later appears on its own separate line at
4074 the end of each terse reporting cycle.
4075
4077 The json output format is intended to be both human readable and conve‐
4078 nient for automated parsing. For the most part its sections mirror
4079 those of the normal output. The runtime value is reported in msec and
4080 the bw value is reported in 1024 bytes per second units.
4081
4083 The json+ output format is identical to the json output format except
4084 that it adds a full dump of the completion latency bins. Each bins ob‐
4085 ject contains a set of (key, value) pairs where keys are latency dura‐
4086 tions and values count how many I/Os had completion latencies of the
4087 corresponding duration. For example, consider:
4088
4089 "bins" : { "87552" : 1, "89600" : 1, "94720" : 1, "96768" : 1,
4090 "97792" : 1, "99840" : 1, "100864" : 2, "103936" : 6, "104960" :
4091 534, "105984" : 5995, "107008" : 7529, ... }
4092
4093 This data indicates that one I/O required 87,552ns to complete, two
4094 I/Os required 100,864ns to complete, and 7529 I/Os required 107,008ns
4095 to complete.
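The bins can be summarized with a few lines of Python. This sketch uses only the entries visible in the example above (the elided part of the dump is not reconstructed), and the helper names are illustrative. It totals the I/O count and finds the bin in which a given percentile falls.

```python
# Only the bins listed explicitly in the example; the elided entries ("...")
# are unknown and therefore left out.
BINS = {
    "87552": 1, "89600": 1, "94720": 1, "96768": 1, "97792": 1,
    "99840": 1, "100864": 2, "103936": 6, "104960": 534,
    "105984": 5995, "107008": 7529,
}


def total_ios(bins):
    """Number of I/Os counted across all bins."""
    return sum(bins.values())


def percentile_bin(bins, pct):
    """Smallest latency key (nsec) whose cumulative count reaches pct percent."""
    target = total_ios(bins) * pct / 100
    running = 0
    for latency in sorted(bins, key=int):
        running += bins[latency]
        if running >= target:
            return int(latency)
    return None  # empty input


if __name__ == "__main__":
    print("I/Os counted:", total_ios(BINS))
    print("median bin (nsec):", percentile_bin(BINS, 50))
```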
4096
4097 Also included with fio is a Python script fio_jsonplus_clat2csv that
4098 takes json+ output and generates CSV-formatted latency data suitable
4099 for plotting.
4100
4101 The latency durations actually represent the midpoints of latency in‐
4102 tervals. For details refer to `stat.h' in the fio source.
4103
There are three trace file formats that you can encounter (v1, v2 and
v3). The older (v1) format has been unsupported since version 1.20-rc3
(March 2008). It is still described below in case you get an old trace
and want to understand it.
4109
4110 In any case the trace is a simple text file with a single action per
4111 line.
4112
4113 Trace file format v1
4114 Each line represents a single I/O action in the following for‐
4115 mat:
4116
4117 rw, offset, length
4118
4119 where `rw=0/1' for read/write, and the `offset' and `length' en‐
4120 tries being in bytes.
4121
4122 This format is not supported in fio versions >= 1.20-rc3.
4123
4124 Trace file format v2
4125 The second version of the trace file format was added in fio
4126 version 1.17. It allows one to access more than one file per
4127 trace and has a bigger set of possible file actions.
4128
4129 The first line of the trace file has to be:
4130
4131 "fio version 2 iolog"
4132
4133 Following this can be lines in two different formats, which are
4134 described below.
4135
4136 The file management format:
4137 filename action
4138
4139 The `filename' is given as an absolute path. The `action'
4140 can be one of these:
4141
4142 add Add the given `filename' to the trace.
4143
4144 open Open the file with the given `filename'.
4145 The `filename' has to have been added with
4146 the add action before.
4147
4148 close Close the file with the given `filename'.
4149 The file has to have been opened before.
4150
4151 The file I/O action format:
4152 filename action offset length
4153
4154 The `filename' is given as an absolute path, and has to
4155 have been added and opened before it can be used with
4156 this format. The `offset' and `length' are given in
4157 bytes. The `action' can be one of these:
4158
4159 wait Wait for `offset' microseconds. Everything
4160 below 100 is discarded. The time is rela‐
4161 tive to the previous `wait' statement. Note
4162 that action `wait` is not allowed as of
4163 version 3, as the same behavior can be
4164 achieved using timestamps.
4165
4166 read Read `length' bytes beginning from `off‐
4167 set'.
4168
4169 write Write `length' bytes beginning from `off‐
4170 set'.
4171
4172 sync fsync(2) the file.
4173
4174 datasync
4175 fdatasync(2) the file.
4176
4177 trim Trim the given file from the given `offset'
4178 for `length' bytes.
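A v2 trace can also be generated by a script. The sketch below is a hedged illustration: `build_iolog_v2' is a hypothetical helper, the path `/tmp/testfile' is a placeholder, and it assumes the header string is written without the surrounding quotes shown above. It emits the add/open/I-O/close sequence for a single file.

```python
ALLOWED_ACTIONS = {"read", "write", "trim"}


def build_iolog_v2(filename, ios):
    """Build the text of a version 2 iolog for one file.

    ios is an iterable of (action, offset, length) tuples, with offset and
    length in bytes.
    """
    lines = ["fio version 2 iolog", f"{filename} add", f"{filename} open"]
    for action, offset, length in ios:
        if action not in ALLOWED_ACTIONS:
            raise ValueError(f"unsupported action: {action}")
        lines.append(f"{filename} {action} {offset} {length}")
    lines.append(f"{filename} close")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_iolog_v2("/tmp/testfile", [("write", 0, 4096),
                                           ("read", 0, 4096)]), end="")
```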
4179
4180 Trace file format v3
4181 The third version of the trace file format was added in fio ver‐
4182 sion 3.31. It forces each action to have a timestamp associated
4183 with it.
4184
4185 The first line of the trace file has to be:
4186
4187 "fio version 3 iolog"
4188
4189 Following this can be lines in two different formats, which are
4190 described below.
4191
4192 The file management format:
4193 timestamp filename action
4194
4195 The file I/O action format:
4196 timestamp filename action offset length
4197
The `timestamp` is relative to the beginning of the run
(i.e., it starts at 0). The `filename`, `action`, `offset` and
`length` are identical to version 2, except that version
3 does not allow the `wait` action.
4202
4204 Colocation is a common practice used to get the most out of a machine.
4205 Knowing which workloads play nicely with each other and which ones
4206 don't is a much harder task. While fio can replay workloads concur‐
4207 rently via multiple jobs, it leaves some variability up to the sched‐
4208 uler making results harder to reproduce. Merging is a way to make the
4209 order of events consistent.
4210
Merging is integrated into I/O replay and done when a
merge_blktrace_file is specified. The list of files passed to
read_iolog goes through the merge process, which writes a single
merged file to the specified path. That output file is then passed on
as if it were the only file passed to read_iolog. An example would look
like:
4216
$ fio --read_iolog="<file1>:<file2>" --merge_blktrace_file="<output_file>"
4219
4220 Creating only the merged file can be done by passing the command line
4221 argument merge-blktrace-only.
4222
4223 Scaling traces can be done to see the relative impact of any particular
4224 trace being slowed down or sped up. merge_blktrace_scalars takes in a
4225 colon separated list of percentage scalars. It is index paired with the
4226 files passed to read_iolog.
4227
4228 With scaling, it may be desirable to match the running time of all
4229 traces. This can be done with merge_blktrace_iters. It is index paired
4230 with read_iolog just like merge_blktrace_scalars.
4231
As an example, take two traces, A and B, each 60s long. To see the
impact of trace A issuing I/Os twice as fast, while repeating trace A
to cover the runtime of trace B, the following can be done:

$ fio --read_iolog="<trace_a>:<trace_b>" \
--merge_blktrace_file="<output_file>" \
--merge_blktrace_scalars="50:100" --merge_blktrace_iters="2:1"
4239
This runs trace A twice at 2x speed, which takes approximately the same
time as a single run of trace B.
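The arithmetic behind this example can be written out. The sketch assumes, as the example suggests, that a scalar is a percentage applied to the trace's duration (50 halves the time) and that each iteration repeats the scaled trace once; the function name is illustrative.

```python
def merged_runtime(duration_s, scalar_pct, iters):
    """Approximate replay time of one trace after scaling and repeating."""
    return duration_s * scalar_pct / 100 * iters


if __name__ == "__main__":
    # Trace A: 60s, scalar 50, 2 iterations. Trace B: 60s, scalar 100, 1 iteration.
    print("trace A:", merged_runtime(60, 50, 2), "s")
    print("trace B:", merged_runtime(60, 100, 1), "s")
```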
4242
In some cases, we want to understand the CPU overhead of a test. For
example, we may test patches specifically to see whether they reduce
CPU usage. Fio implements a balloon approach: it creates a thread per
CPU that runs at idle priority, meaning that it only runs when nobody
else needs the CPU. By measuring the amount of work completed by each
thread, the idleness of each CPU can be derived.

A unit of work is defined as touching a full page of unsigned
characters. The mean and standard deviation of the time to complete a
unit of work are reported in the "unit work" section. Options can be
chosen to report detailed per-CPU idleness or overall system idleness
by aggregating per-CPU stats.
4255
When data verification is done, fio is usually run in one of two ways.
4258 The first is a normal write job of some sort with verify enabled. When
4259 the write phase has completed, fio switches to reads and verifies ev‐
4260 erything it wrote. The second model is running just the write phase,
4261 and then later on running the same job (but with reads instead of
4262 writes) to repeat the same I/O patterns and verify the contents. Both
4263 of these methods depend on the write phase being completed, as fio oth‐
4264 erwise has no idea how much data was written.
4265
4266 With verification triggers, fio supports dumping the current write
4267 state to local files. Then a subsequent read verify workload can load
4268 this state and know exactly where to stop. This is useful for testing
4269 cases where power is cut to a server in a managed fashion, for in‐
4270 stance.
4271
4272 A verification trigger consists of two things:
4273
4274 1) Storing the write state of each job.
4275
4276 2) Executing a trigger command.
4277
4278 The write state is relatively small, on the order of hundreds of bytes
4279 to single kilobytes. It contains information on the number of comple‐
4280 tions done, the last X completions, etc.
4281
4282 A trigger is invoked either through creation ('touch') of a specified
4283 file in the system, or through a timeout setting. If fio is run with
4284 `--trigger-file=/tmp/trigger-file', then it will continually check for
4285 the existence of `/tmp/trigger-file'. When it sees this file, it will
4286 fire off the trigger (thus saving state, and executing the trigger com‐
4287 mand).
4288
4289 For client/server runs, there's both a local and remote trigger. If fio
4290 is running as a server backend, it will send the job states back to the
4291 client for safe storage, then execute the remote trigger, if specified.
4292 If a local trigger is specified, the server will still send back the
4293 write state, but the client will then execute the trigger.
4294
4295 Verification trigger example
Let's say we want to run a powercut test on the remote Linux
machine 'server'. Our write workload is in `write-test.fio'. We
want to cut power to 'server' at some point during the run, and
we'll run this test from the safety of our local machine,
'localbox'. On the server, we'll start the fio backend normally:
4301
4302 server# fio --server
4303
4304 and on the client, we'll fire off the workload:
4305
localbox$ fio --client=server --trigger-file=/tmp/my-trigger \
--trigger-remote="bash -c \"echo b > /proc/sysrq-trigger\""
4309
4310 We set `/tmp/my-trigger' as the trigger file, and we tell fio to
4311 execute:
4312
4313 echo b > /proc/sysrq-trigger
4314
4315 on the server once it has received the trigger and sent us the
4316 write state. This will work, but it's not really cutting power
4317 to the server, it's merely abruptly rebooting it. If we have a
4318 remote way of cutting power to the server through IPMI or simi‐
4319 lar, we could do that through a local trigger command instead.
4320 Let's assume we have a script that does IPMI reboot of a given
4321 hostname, ipmi-reboot. On localbox, we could then have run fio
4322 with a local trigger instead:
4323
localbox$ fio --client=server --trigger-file=/tmp/my-trigger \
--trigger="ipmi-reboot server"
4326
4327 For this case, fio would wait for the server to send us the
4328 write state, then execute `ipmi-reboot server' when that hap‐
4329 pened.
4330
4331 Loading verify state
4332 To load stored write state, a read verification job file must
4333 contain the verify_state_load option. If that is set, fio will
4334 load the previously stored state. For a local fio run this is
4335 done by loading the files directly, and on a client/server run,
4336 the server backend will ask the client to send the files over
4337 and load them from there.
4338
4340 Fio supports a variety of log file formats, for logging latencies,
4341 bandwidth, and IOPS. The logs share a common format, which looks like
4342 this:
4343
4344 time (msec), value, data direction, block size (bytes), offset
4345 (bytes), command priority
4346
4347 `Time' for the log entry is always in milliseconds. The `value' logged
4348 depends on the type of log, it will be one of the following:
4349
4350 Latency log
4351 Value is latency in nsecs
4352
4353 Bandwidth log
4354 Value is in KiB/sec
4355
4356 IOPS log
4357 Value is IOPS
4358
4359 `Data direction' is one of the following:
4360
4361 0 I/O is a READ
4362
4363 1 I/O is a WRITE
4364
4365 2 I/O is a TRIM
4366
4367 The entry's `block size' is always in bytes. The `offset' is the posi‐
4368 tion in bytes from the start of the file for that particular I/O. The
4369 logging of the offset can be toggled with log_offset.
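A log line in this common format can be parsed as sketched below. The sketch assumes all six columns are present (in particular that offset logging is enabled), the sample line is hypothetical, and the names `LogEntry' and `parse_log_line' are illustrative only. The priority column is kept as text because its encoding depends on log_prio.

```python
from typing import NamedTuple

DIRECTIONS = {0: "READ", 1: "WRITE", 2: "TRIM"}


class LogEntry(NamedTuple):
    time_msec: int
    value: int        # nsec, KiB/sec or IOPS, depending on the log type
    direction: str
    block_size: int   # bytes
    offset: int       # bytes
    priority: str     # left as text


def parse_log_line(line):
    """Parse one comma-separated log line with all six columns present."""
    fields = [field.strip() for field in line.split(",")]
    if len(fields) != 6:
        raise ValueError(f"expected 6 columns, got {len(fields)}")
    time_msec, value, ddir, block_size, offset, priority = fields
    return LogEntry(int(time_msec), int(value), DIRECTIONS[int(ddir)],
                    int(block_size), int(offset), priority)


if __name__ == "__main__":
    # Hypothetical latency-log line, for illustration only.
    print(parse_log_line("1000, 4019, 1, 4096, 8192, 0"))
```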
4370
If log_prio is not set, the entry's `Command priority` is 1 for an IO
executed with the highest RT priority class (prioclass=1 or
cmdprio_class=1) and 0 otherwise. This is controlled by the prioclass
option and the ioengine specific cmdprio_percentage and cmdprio_class
options. If log_prio is set, the entry's `Command priority` is the
priority set for the IO, as a 16-bit hexadecimal number with the lowest
13 bits indicating the priority value (prio and cmdprio options) and
the highest 3 bits indicating the IO priority class (prioclass and
cmdprio_class options).
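The bit layout used when log_prio is set can be decoded with two bit operations. This is a sketch under the layout stated above (lowest 13 bits for the value, highest 3 bits for the class); the function name and the sample values are illustrative, not taken from a real log.

```python
def decode_command_priority(text):
    """Decode a 16-bit hexadecimal priority into (class, value)."""
    raw = int(text, 16)               # accepts "4000" as well as "0x4000"
    prio_class = (raw >> 13) & 0x7    # highest 3 bits
    prio_value = raw & 0x1FFF         # lowest 13 bits
    return prio_class, prio_value


if __name__ == "__main__":
    # Hypothetical values chosen to exercise both bit fields.
    print(decode_command_priority("0x4000"))
    print(decode_command_priority("2004"))
```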
4380
Fio defaults to logging every individual I/O, but when windowed logging
is set through log_avg_msec, either the average (by default) or the
maximum (if log_max_value is set) `value' seen over the specified
period of time is recorded. Each `data direction' seen within the
window period will aggregate its values in a separate row. Further,
when using windowed logging the `block size' and `offset' entries will
always contain 0.
4388
4390 Normally fio is invoked as a stand-alone application on the machine
4391 where the I/O workload should be generated. However, the backend and
4392 frontend of fio can be run separately i.e., the fio server can generate
4393 an I/O workload on the "Device Under Test" while being controlled by a
4394 client on another machine.
4395
4396 Start the server on the machine which has access to the storage DUT:
4397
4398 $ fio --server=args
4399
where `args' defines what fio listens to. The arguments are of the form
`type:hostname or IP,port'. `type' is either `ip' (or ip4) for TCP/IP
v4, `ip6' for TCP/IP v6, or `sock' for a local unix domain socket.
`hostname' is either a hostname or IP address, and `port' is the port
to listen on (only valid for TCP/IP, not a local socket). Some
examples:
4406
4407 1) fio --server
4408 Start a fio server, listening on all interfaces on the
4409 default port (8765).
4410
4411 2) fio --server=ip:hostname,4444
4412 Start a fio server, listening on IP belonging to hostname
4413 and on port 4444.
4414
4415 3) fio --server=ip6:::1,4444
4416 Start a fio server, listening on IPv6 localhost ::1 and
4417 on port 4444.
4418
4419 4) fio --server=,4444
4420 Start a fio server, listening on all interfaces on port
4421 4444.
4422
4423 5) fio --server=1.2.3.4
4424 Start a fio server, listening on IP 1.2.3.4 on the de‐
4425 fault port.
4426
4427 6) fio --server=sock:/tmp/fio.sock
4428 Start a fio server, listening on the local socket
4429 `/tmp/fio.sock'.
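The six examples above can be summarized by a small parser. This is only an illustration of how the strings decompose, not fio's own parsing code; the function name is hypothetical, and treating an argument without a type prefix as `ip' is an assumption.

```python
DEFAULT_PORT = 8765


def parse_server_arg(arg=""):
    """Split a --server argument into type, host/path and port."""
    if arg.startswith("sock:"):
        return {"type": "sock", "path": arg[len("sock:"):]}
    conn_type = "ip"  # assumption: no prefix means TCP/IP v4
    for prefix in ("ip4:", "ip6:", "ip:"):
        if arg.startswith(prefix):
            conn_type = prefix[:-1]
            arg = arg[len(prefix):]
            break
    # The port follows the last comma, so IPv6 colons are left untouched.
    host, sep, port = arg.rpartition(",")
    if not sep:  # no comma: the whole string is the host
        host, port = arg, ""
    return {"type": conn_type, "host": host,
            "port": int(port) if port else DEFAULT_PORT}


if __name__ == "__main__":
    for example in ("", "ip:hostname,4444", "ip6:::1,4444", ",4444",
                    "1.2.3.4", "sock:/tmp/fio.sock"):
        print(repr(example), "->", parse_server_arg(example))
```

An empty host in the result corresponds to listening on all interfaces.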
4430
4431 Once a server is running, a "client" can connect to the fio server
4432 with:
4433
4434 $ fio <local-args> --client=<server> <remote-args> <job file(s)>
4435
4436 where `local-args' are arguments for the client where it is running,
4437 `server' is the connect string, and `remote-args' and `job file(s)' are
4438 sent to the server. The `server' string follows the same format as it
4439 does on the server side, to allow IP/hostname/socket and port strings.
4440
4441 Fio can connect to multiple servers this way:
4442
4443 $ fio --client=<server1> <job file(s)> --client=<server2> <job
4444 file(s)>
4445
4446 If the job file is located on the fio server, then you can tell the
4447 server to load a local file as well. This is done by using --re‐
4448 mote-config:
4449
4450 $ fio --client=server --remote-config /path/to/file.fio
4451
4452 Then fio will open this local (to the server) job file instead of being
4453 passed one from the client.
4454
If you have many servers (example: 100 VMs/containers), you can input a
pathname of a file containing host IPs/names as the parameter value for
the --client option. For example, here is a `host.list' file containing
2 hostnames:
4459
4460 host1.your.dns.domain
4461 host2.your.dns.domain
4462
4463 The fio command would then be:
4464
4465 $ fio --client=host.list <job file(s)>
4466
4467 In this mode, you cannot input server-specific parameters or job files
4468 -- all servers receive the same job file.
4469
To let `fio --client' runs use a shared filesystem from multiple hosts,
`fio --client' prepends the IP address of the server to the filename.
For example, if fio is using the directory `/mnt/nfs/fio' and is
writing filename `fileio.tmp', with a --client `hostfile' containing
two hostnames `h1' and `h2' with IP addresses 192.168.10.120 and
192.168.10.121, then fio will create two files:
4476
4477 /mnt/nfs/fio/192.168.10.120.fileio.tmp
4478 /mnt/nfs/fio/192.168.10.121.fileio.tmp
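When collecting results afterwards, the per-server file names can be predicted with a short helper. The sketch mirrors the naming pattern of the example above; the function name is illustrative and the pattern is inferred from that example alone.

```python
import posixpath


def client_filenames(directory, filename, server_ips):
    """Return the per-server paths following '<directory>/<ip>.<filename>'."""
    return [posixpath.join(directory, f"{ip}.{filename}") for ip in server_ips]


if __name__ == "__main__":
    for path in client_filenames("/mnt/nfs/fio", "fileio.tmp",
                                 ["192.168.10.120", "192.168.10.121"]):
        print(path)
```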
4479
4480 Terse output in client/server mode will differ slightly from what is
4481 produced when fio is run in stand-alone mode. See the terse output sec‐
4482 tion for details.
4483
4485 fio was written by Jens Axboe <axboe@kernel.dk>.
4486 This man page was written by Aaron Carroll <aaronc@cse.unsw.edu.au>
4487 based on documentation by Jens Axboe.
4488 This man page was rewritten by Tomohiro Kusumi <tkusumi@tuxera.com>
4489 based on documentation by Jens Axboe.
4490
4492 Report bugs to the fio mailing list <fio@vger.kernel.org>.
4493 See REPORTING-BUGS.
4494
4495 REPORTING-BUGS: http://git.kernel.dk/cgit/fio/plain/REPORTING-BUGS
4496
4498 For further documentation see HOWTO and README.
4499 Sample jobfiles are available in the `examples/' directory.
4500 These are typically located under `/usr/share/doc/fio'.
4501
4502 HOWTO: http://git.kernel.dk/cgit/fio/plain/HOWTO
4503 README: http://git.kernel.dk/cgit/fio/plain/README
4504
4505
4506
4507User Manual August 2017 fio(1)