fio(1) General Commands Manual fio(1)

NAME
fio - flexible I/O tester

SYNOPSIS
fio [options] [jobfile]...

DESCRIPTION
fio is a tool that spawns a number of threads or processes doing a
particular type of I/O action as specified by the user. The typical
use of fio is to write a job file matching the I/O load one wants to
simulate.
16
18 --debug=type
19 Enable verbose tracing type of various fio actions. May be `all'
20 for all types or individual types separated by a comma (e.g.
21 `--debug=file,mem' will enable file and memory debugging).
22 `help' will list all available tracing options.
23
24 --parse-only
25 Parse options only, don't start any I/O.
26
27 --merge-blktrace-only
28 Merge blktraces only, don't start any I/O.
29
30 --output=filename
31 Write output to filename.
32
33 --output-format=format
Set the reporting format to `normal', `terse', `json', or
`json+'. Multiple formats can be selected, separated by commas.
`terse' is a CSV-based format. `json+' is like `json', except it
adds a full dump of the latency buckets.
38
39 --bandwidth-log
40 Generate aggregate bandwidth logs.
41
42 --minimal
43 Print statistics in a terse, semicolon-delimited format.
44
45 --append-terse
Print statistics in the selected mode and also in terse,
semicolon-delimited format. Deprecated; use --output-format
instead to select multiple formats.
49
50 --terse-version=version
51 Set terse version output format (default `3', or `2', `4', `5').
52
53 --version
54 Print version information and exit.
55
56 --help Print a summary of the command line options and exit.
57
58 --cpuclock-test
59 Perform test and validation of internal CPU clock.
60
61 --crctest=[test]
62 Test the speed of the built-in checksumming functions. If no
63 argument is given, all of them are tested. Alternatively, a
64 comma separated list can be passed, in which case the given ones
65 are tested.
66
67 --cmdhelp=command
68 Print help information for command. May be `all' for all com‐
69 mands.
70
71 --enghelp=[ioengine[,command]]
72 List all commands defined by ioengine, or print help for command
73 defined by ioengine. If no ioengine is given, list all available
74 ioengines.
75
76 --showcmd=jobfile
77 Convert jobfile to a set of command-line options.
78
79 --readonly
80 Turn on safety read-only checks, preventing writes and trims.
81 The --readonly option is an extra safety guard to prevent users
82 from accidentally starting a write or trim workload when that is
83 not desired. Fio will only modify the device under test if
84 `rw=write/randwrite/rw/randrw/trim/randtrim/trimwrite' is given.
85 This safety net can be used as an extra precaution.
86
87 --eta=when
Specifies when the real-time ETA estimate should be printed. when
may be `always', `never' or `auto'. `auto' is the default; it
prints ETA when requested if the output is a TTY. `always'
disregards the output type, and prints ETA when requested.
`never' never prints ETA.
93
94 --eta-interval=time
By default, fio requests client ETA status roughly every second.
With this option, the interval is configurable. Fio imposes a
minimum interval to avoid flooding the console: values below
250 msec are not supported.
99
100 --eta-newline=time
101 Force a new line for every time period passed. When the unit is
102 omitted, the value is interpreted in seconds.
103
104 --status-interval=time
105 Force a full status dump of cumulative (from job start) values
106 at time intervals. This option does *not* provide per-period
107 measurements. So values such as bandwidth are running averages.
108 When the time unit is omitted, time is interpreted in seconds.
109 Note that using this option with `--output-format=json' will
110 yield output that technically isn't valid json, since the output
111 will be collated sets of valid json. It will need to be split
112 into valid sets of json after the run.
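The splitting step mentioned above can be sketched in Python. This is
only an illustration: split_json_documents is a hypothetical helper, the
sample string is a made-up stand-in for real fio output, and the sketch
assumes the captured text contains nothing but JSON documents written
one after another.

```python
import json

def split_json_documents(text):
    """Split back-to-back JSON documents into a list of parsed objects."""
    decoder = json.JSONDecoder()
    docs = []
    pos = 0
    while pos < len(text):
        # raw_decode() does not skip leading whitespace, so do it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        # raw_decode() returns the object and the index where it ended.
        obj, pos = decoder.raw_decode(text, pos)
        docs.append(obj)
    return docs

# Made-up stand-in: two JSON objects written one after the other.
sample = '{"jobs": [{"jobname": "a"}]}\n{"jobs": [{"jobname": "b"}]}\n'
dumps = split_json_documents(sample)
print(len(dumps))
```

Each element of the returned list is one status dump and can be
processed or re-serialized on its own.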
113
114 --section=name
115 Only run specified section name in job file. Multiple sections
116 can be specified. The --section option allows one to combine
117 related jobs into one file. E.g. one job file could define
118 light, moderate, and heavy sections. Tell fio to run only the
119 "heavy" section by giving `--section=heavy' command line option.
120 One can also specify the "write" operations in one section and
121 "verify" operation in another section. The --section option only
122 applies to job sections. The reserved *global* section is always
123 parsed and used.
124
125 --alloc-size=kb
Allocate additional internal smalloc pools of size kb in KiB.
The --alloc-size option increases the shared memory set aside
for use by fio. If running large jobs with randommap enabled,
fio can run out of memory. Smalloc is an internal allocator for
shared structures from a fixed size memory pool and can grow to
16 pools. The pool size defaults to 16MiB. NOTE: While fio is
running, `.fio_smalloc.*' backing store files are visible in
`/tmp'.
133
134 --warnings-fatal
135 All fio parser warnings are fatal, causing fio to exit with an
136 error.
137
138 --max-jobs=nr
139 Set the maximum number of threads/processes to support to nr.
140 NOTE: On Linux, it may be necessary to increase the shared-mem‐
141 ory limit (`/proc/sys/kernel/shmmax') if fio runs into errors
142 while creating jobs.
143
144 --server=args
145 Start a backend server, with args specifying what to listen to.
146 See CLIENT/SERVER section.
147
148 --daemonize=pidfile
Background a fio server, writing the pid to the given pidfile.
151
152 --client=hostname
153 Instead of running the jobs locally, send and run them on the
154 given hostname or set of hostnames. See CLIENT/SERVER section.
155
156 --remote-config=file
157 Tell fio server to load this local file.
158
159 --idle-prof=option
160 Report CPU idleness. option is one of the following:
161
162 calibrate
163 Run unit work calibration only and exit.
164
165 system Show aggregate system idleness and unit work.
166
167 percpu As system but also show per CPU idleness.
168
169 --inflate-log=log
170 Inflate and output compressed log.
171
172 --trigger-file=file
173 Execute trigger command when file exists.
174
175 --trigger-timeout=time
176 Execute trigger at this time.
177
178 --trigger=command
179 Set this command as local trigger.
180
181 --trigger-remote=command
182 Set this command as remote trigger.
183
184 --aux-path=path
185 Use the directory specified by path for generated state files
186 instead of the current working directory.
187
189 Any parameters following the options will be assumed to be job files,
190 unless they match a job file parameter. Multiple job files can be
191 listed and each job file will be regarded as a separate group. Fio will
192 stonewall execution between each group.
193
194 Fio accepts one or more job files describing what it is supposed to do.
195 The job file format is the classic ini file, where the names enclosed
196 in [] brackets define the job name. You are free to use any ASCII name
197 you want, except *global* which has special meaning. Following the job
198 name is a sequence of zero or more parameters, one per line, that
199 define the behavior of the job. If the first character in a line is a
200 ';' or a '#', the entire line is discarded as a comment.
201
202 A *global* section sets defaults for the jobs described in that file. A
203 job may override a *global* section parameter, and a job file may even
204 have several *global* sections if so desired. A job is only affected by
205 a *global* section residing above it.
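The ordering rule for *global* sections can be illustrated with a small
Python model. It is a simplified sketch, not fio's parser: resolve_jobs
is a hypothetical helper, the job names are made up, and it assumes that
later *global* sections add to (rather than replace) earlier ones.

```python
def resolve_jobs(text):
    """Return {job name: options}, applying [global] sections in file order."""
    defaults = {}   # accumulated from every [global] section seen so far
    jobs = {}
    current = None  # dict that parameter lines are currently written to
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in ";#":
            continue  # blank line or comment
        if line.startswith("[") and line.endswith("]"):
            name = line[1:-1]
            if name == "global":
                current = defaults
            else:
                # A job starts from a snapshot of the defaults above it.
                current = jobs[name] = dict(defaults)
            continue
        if current is None:
            continue  # parameter before any section; ignored in this sketch
        key, _, value = line.partition("=")
        current[key.strip()] = value.strip()
    return jobs

jobfile = """\
; defaults for the jobs below
[global]
rw=read
direct=1

[first]
rw=randread

[global]
direct=0

[second]
"""

jobs = resolve_jobs(jobfile)
print(jobs["first"])   # rw overridden by the job, direct from the first [global]
print(jobs["second"])  # rw from the first [global], direct from the second
```

The second *global* section changes direct only for the job that follows
it; the job above it keeps the value it inherited earlier.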
206
The --cmdhelp option also lists all options. If used with a command
argument, --cmdhelp will detail the given command.
209
210 See the `examples/' directory for inspiration on how to write job
211 files. Note the copyright and license requirements currently apply to
212 `examples/' files.
213
214 Note that the maximum length of a line in the job file is 8192 bytes.
215
217 Some parameters take an option of a given type, such as an integer or a
218 string. Anywhere a numeric value is required, an arithmetic expression
219 may be used, provided it is surrounded by parentheses. Supported opera‐
220 tors are:
221
222 addition (+)
223
224 subtraction (-)
225
226 multiplication (*)
227
228 division (/)
229
230 modulus (%)
231
232 exponentiation (^)
233
For time values in expressions, units are microseconds by default. This
differs from time values outside expressions (not enclosed in
parentheses).
237
239 The following parameter types are used.
240
241 str String. A sequence of alphanumeric characters.
242
time Integer with possible time suffix. Without a unit, the value is
interpreted as seconds unless otherwise specified. Accepts a
suffix of 'd' for days, 'h' for hours, 'm' for minutes, 's' for
seconds, 'ms' (or 'msec') for milliseconds and 'us' (or 'usec')
for microseconds. For example, use 10m for 10 minutes.
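As an illustration of the suffix rules in this entry, the following
Python sketch converts a time value to microseconds. parse_time_us is a
hypothetical helper, not part of fio, and it only models the plain
(non-expression) form where the default unit is seconds.

```python
import re

# Microseconds per unit, for the suffixes listed in this entry.
UNIT_US = {
    "d": 86_400_000_000, "h": 3_600_000_000, "m": 60_000_000,
    "s": 1_000_000, "ms": 1_000, "msec": 1_000, "us": 1, "usec": 1,
}

def parse_time_us(value, default_unit="s"):
    """Convert e.g. '10m' or '250ms' to an integer number of microseconds."""
    match = re.fullmatch(r"(\d+)([a-z]*)", value.strip().lower())
    if match is None:
        raise ValueError(f"not a time value: {value!r}")
    number, unit = match.groups()
    unit = unit or default_unit  # no suffix: fall back to seconds
    if unit not in UNIT_US:
        raise ValueError(f"unknown time unit: {unit!r}")
    return int(number) * UNIT_US[unit]

print(parse_time_us("10m"))
print(parse_time_us("250ms"))
print(parse_time_us("30"))
```

Working in integer microseconds avoids floating point rounding for the
sub-second units.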
248
249 int Integer. A whole number value, which may contain an integer pre‐
250 fix and an integer suffix.
251
252 [*integer prefix*] **number** [*integer suffix*]
253
254 The optional *integer prefix* specifies the number's base. The
255 default is decimal. *0x* specifies hexadecimal.
256
257 The optional *integer suffix* specifies the number's units, and
258 includes an optional unit prefix and an optional unit. For quan‐
259 tities of data, the default unit is bytes. For quantities of
260 time, the default unit is seconds unless otherwise specified.
261
262 With `kb_base=1000', fio follows international standards for
263 unit prefixes. To specify power-of-10 decimal values defined in
264 the International System of Units (SI):
265
266 K means kilo (K) or 1000
267 M means mega (M) or 1000**2
268 G means giga (G) or 1000**3
269 T means tera (T) or 1000**4
270 P means peta (P) or 1000**5
271
272 To specify power-of-2 binary values defined in IEC 80000-13:
273
274 Ki means kibi (Ki) or 1024
275 Mi means mebi (Mi) or 1024**2
276 Gi means gibi (Gi) or 1024**3
277 Ti means tebi (Ti) or 1024**4
278 Pi means pebi (Pi) or 1024**5
279
280 With `kb_base=1024' (the default), the unit prefixes are oppo‐
281 site from those specified in the SI and IEC 80000-13 standards
282 to provide compatibility with old scripts. For example, 4k means
283 4096.
284
285 For quantities of data, an optional unit of 'B' may be included
286 (e.g., 'kB' is the same as 'k').
287
288 The *integer suffix* is not case sensitive (e.g., m/mi mean
289 mebi/mega, not milli). 'b' and 'B' both mean byte, not bit.
290
Examples with `kb_base=1000':

4 KiB: 4096, 4096b, 4096B, 4ki, 4kib, 4kiB, 4Ki, 4KiB
1 MiB: 1048576, 1mi, 1024ki
1 MB: 1000000, 1m, 1000k
1 TiB: 1099511627776, 1ti, 1024gi, 1048576mi
1 TB: 1000000000000, 1t, 1000g, 1000000m

Examples with `kb_base=1024' (default):

4 KiB: 4096, 4096b, 4096B, 4k, 4kb, 4kB, 4K, 4KB
1 MiB: 1048576, 1m, 1024k
1 MB: 1000000, 1mi, 1000ki
1 TiB: 1099511627776, 1t, 1024g, 1048576m
1 TB: 1000000000000, 1ti, 1000gi, 1000000mi
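The unit prefix rules stated in this entry (SI and IEC meanings with
`kb_base=1000', swapped meanings with `kb_base=1024') can be modeled in
a few lines of Python. This is a simplified sketch for checking values
by hand, not fio's parser: parse_size is a hypothetical helper and it
handles decimal numbers only (no 0x prefix).

```python
import re

def parse_size(text, kb_base=1024):
    """Convert e.g. '4k', '1mi' or '4096B' to a number of bytes."""
    match = re.fullmatch(r"(\d+)(?:([kmgtp])(i?))?(b?)", text.strip().lower())
    if match is None:
        raise ValueError(f"not a size value: {text!r}")
    digits, prefix, iec, _byte_unit = match.groups()
    number = int(digits)
    if not prefix:
        return number  # plain byte count, optional 'b' unit
    exponent = "kmgtp".index(prefix) + 1
    binary = bool(iec)       # standards: 'ki' is 1024, 'k' is 1000
    if kb_base == 1024:
        binary = not binary  # compatibility mode swaps the two meanings
    base = 1024 if binary else 1000
    return number * base ** exponent

print(parse_size("4k"))                # default kb_base=1024
print(parse_size("4k", kb_base=1000))
print(parse_size("4ki", kb_base=1000))
```

Under the default, 4k is therefore 4096 bytes, while the same string
with `kb_base=1000' is 4000 bytes.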
306
307 To specify times (units are not case sensitive):
308
D means days
H means hours
M means minutes
s or sec means seconds (default)
ms or msec means milliseconds
us or usec means microseconds
315
316 If the option accepts an upper and lower range, use a colon ':'
317 or minus '-' to separate such values. See irange parameter type.
318 If the lower value specified happens to be larger than the upper
319 value the two values are swapped.
320
321 bool Boolean. Usually parsed as an integer, however only defined for
322 true and false (1 and 0).
323
324 irange Integer range with suffix. Allows value range to be given, such
325 as 1024-4096. A colon may also be used as the separator, e.g.
326 1k:4k. If the option allows two sets of ranges, they can be
327 specified with a ',' or '/' delimiter: 1k-4k/8k-32k. Also see
328 int parameter type.
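The range syntax, including the swap of reversed bounds described under
the int type, can be illustrated with a short Python sketch.
parse_irange is a hypothetical helper, not fio code. To stay short it
only understands plain integers and the 'k' suffix with its default
`kb_base=1024' meaning, and it expects both bounds to be present.

```python
import re

def parse_irange(text):
    """Parse '1k-4k', '1k:4k' or two ranges split by ',' or '/'."""
    def value(token):
        token = token.strip().lower()
        if token.endswith("k"):
            return int(token[:-1]) * 1024  # 'k' under the default kb_base=1024
        return int(token)

    ranges = []
    for part in re.split(r"[,/]", text):
        low, high = (value(token) for token in re.split(r"[-:]", part))
        if low > high:
            low, high = high, low  # reversed bounds are swapped
        ranges.append((low, high))
    return ranges

print(parse_irange("1k-4k/8k-32k"))
```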
329
330 float_list
331 A list of floating point numbers, separated by a ':' character.
332
334 With the above in mind, here follows the complete list of fio job
335 parameters.
336
337 Units
338 kb_base=int
339 Select the interpretation of unit prefixes in input parameters.
340
341 1000 Inputs comply with IEC 80000-13 and the Interna‐
342 tional System of Units (SI). Use:
343
344 - power-of-2 values with IEC prefixes (e.g., KiB)
345 - power-of-10 values with SI prefixes (e.g., kB)
346
347 1024 Compatibility mode (default). To avoid breaking
348 old scripts:
349
350 - power-of-2 values with SI prefixes
351 - power-of-10 values with IEC prefixes
352
353 See bs for more details on input parameters.
354
355 Outputs always use correct prefixes. Most outputs include both
356 side-by-side, like:
357
358 bw=2383.3kB/s (2327.4KiB/s)
359
360 If only one value is reported, then kb_base selects the one to
361 use:
362
363 1000 -- SI prefixes
364 1024 -- IEC prefixes
365
366 unit_base=int
367 Base unit for reporting. Allowed values are:
368
369 0 Use auto-detection (default).
370
371 8 Byte based.
372
373 1 Bit based.
374
375 Job description
376 name=str
377 ASCII name of the job. This may be used to override the name
378 printed by fio for this job. Otherwise the job name is used. On
379 the command line this parameter has the special purpose of also
380 signaling the start of a new job.
381
382 description=str
383 Text description of the job. Doesn't do anything except dump
384 this text description when this job is run. It's not parsed.
385
386 loops=int
387 Run the specified number of iterations of this job. Used to
388 repeat the same workload a given number of times. Defaults to 1.
389
390 numjobs=int
Create the specified number of clones of this job. Each clone of
the job is spawned as an independent thread or process. May be
used to set up a larger number of threads/processes doing the
same thing. Each thread is reported separately; to see statistics
for all clones as a whole, use group_reporting in conjunction
with new_group. See --max-jobs. Default: 1.
397
398 Time related parameters
399 runtime=time
400 Tell fio to terminate processing after the specified period of
401 time. It can be quite hard to determine for how long a specified
402 job will run, so this parameter is handy to cap the total run‐
403 time to a given time. When the unit is omitted, the value is
404 interpreted in seconds.
405
406 time_based
407 If set, fio will run for the duration of the runtime specified
408 even if the file(s) are completely read or written. It will sim‐
409 ply loop over the same workload as many times as the runtime
410 allows.
411
412 startdelay=irange(int)
Delay the start of the job for the specified amount of time. Can
be a single value or a range. When given as a range, each thread
will choose a value randomly from within the range. Value is in
seconds if a unit is omitted.
417
418 ramp_time=time
419 If set, fio will run the specified workload for this amount of
420 time before logging any performance numbers. Useful for letting
421 performance settle before logging results, thus minimizing the
422 runtime required for stable results. Note that the ramp_time is
423 considered lead in time for a job, thus it will increase the
424 total runtime if a special timeout or runtime is specified. When
425 the unit is omitted, the value is given in seconds.
426
427 clocksource=str
428 Use the given clocksource as the base of timing. The supported
429 options are:
430
431 gettimeofday
432 gettimeofday(2)
433
434 clock_gettime
435 clock_gettime(2)
436
437 cpu Internal CPU clock source
438
439 cpu is the preferred clocksource if it is reliable, as it is
440 very fast (and fio is heavy on time calls). Fio will automati‐
441 cally use this clocksource if it's supported and considered
442 reliable on the system it is running on, unless another clock‐
443 source is specifically set. For x86/x86-64 CPUs, this means sup‐
444 porting TSC Invariant.
445
446 gtod_reduce=bool
447 Enable all of the gettimeofday(2) reducing options (dis‐
448 able_clat, disable_slat, disable_bw_measurement) plus reduce
449 precision of the timeout somewhat to really shrink the gettime‐
450 ofday(2) call count. With this option enabled, we only do about
451 0.4% of the gettimeofday(2) calls we would have done if all time
452 keeping was enabled.
453
454 gtod_cpu=int
455 Sometimes it's cheaper to dedicate a single thread of execution
456 to just getting the current time. Fio (and databases, for
457 instance) are very intensive on gettimeofday(2) calls. With this
458 option, you can set one CPU aside for doing nothing but logging
459 current time to a shared memory location. Then the other
460 threads/processes that run I/O workloads need only copy that
461 segment, instead of entering the kernel with a gettimeofday(2)
462 call. The CPU set aside for doing these time calls will be
463 excluded from other uses. Fio will manually clear it from the
464 CPU mask of other jobs.
465
466 Target file/device
467 directory=str
Prefix filenames with this directory. Used to place files in a
different location than `./'. You can specify a number of
directories by separating the names with a ':' character. These
directories will be distributed equally among the job clones
created by numjobs as long as they are using generated
filenames. If specific filename(s) are set, fio will use the
first listed directory, thereby matching the filename semantics
(which generate a file for each clone if not specified, but let
all clones use the same file if set).
477
478 See the filename option for information on how to escape ':'
479 characters within the directory path itself.
480
481 Note: To control the directory fio will use for internal state
482 files use --aux-path.
483
484 filename=str
485 Fio normally makes up a filename based on the job name, thread
486 number, and file number (see filename_format). If you want to
487 share files between threads in a job or several jobs with fixed
488 file paths, specify a filename for each of them to override the
489 default. If the ioengine is file based, you can specify a number
490 of files by separating the names with a ':' colon. So if you
491 wanted a job to open `/dev/sda' and `/dev/sdb' as the two work‐
492 ing files, you would use `filename=/dev/sda:/dev/sdb'. This also
493 means that whenever this option is specified, nrfiles is
494 ignored. The size of regular files specified by this option will
495 be size divided by number of files unless an explicit size is
496 specified by filesize.
497
498 Each colon in the wanted path must be escaped with a '\' charac‐
499 ter. For instance, if the path is `/dev/dsk/foo@3,0:c' then you
500 would use `filename=/dev/dsk/foo@3,0\:c' and if the path is
501 `F:\filename' then you would use `filename=F\:\filename'.
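The escaping rule can be checked with a small Python model that splits a
filename value on unescaped colons. split_filenames is a hypothetical
helper for illustration; it treats only the two-character sequence '\:'
as an escape, which is an assumption about how other backslashes are
handled.

```python
import re

def split_filenames(value):
    """Split a filename= value on ':' characters not preceded by a backslash."""
    parts = re.split(r"(?<!\\):", value)
    # Drop the escaping backslash from each escaped colon.
    return [part.replace("\\:", ":") for part in parts]

print(split_filenames("/dev/sda:/dev/sdb"))
print(split_filenames(r"/dev/dsk/foo@3,0\:c"))
print(split_filenames(r"F\:\filename"))
```

The first call yields two paths, while the escaped colons in the other
two keep each value as a single path.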
502
503 On Windows, disk devices are accessed as `\\.\PhysicalDrive0'
504 for the first device, `\\.\PhysicalDrive1' for the second etc.
505 Note: Windows and FreeBSD prevent write access to areas of the
506 disk containing in-use data (e.g. filesystems).
507
508 The filename `-' is a reserved name, meaning *stdin* or *std‐
509 out*. Which of the two depends on the read/write direction set.
510
511 filename_format=str
512 If sharing multiple files between jobs, it is usually necessary
513 to have fio generate the exact names that you want. By default,
514 fio will name a file based on the default file format specifica‐
515 tion of `jobname.jobnumber.filenumber'. With this option, that
516 can be customized. Fio will recognize and replace the following
517 keywords in this string:
518
519 $jobname
520 The name of the worker thread or process.
521
522 $jobnum
523 The incremental number of the worker thread or
524 process.
525
526 $filenum
527 The incremental number of the file for that worker
528 thread or process.
529
530 To have dependent jobs share a set of files, this option can be
531 set to have fio generate filenames that are shared between the
532 two. For instance, if `testfiles.$filenum' is specified, file
533 number 4 for any job will be named `testfiles.4'. The default of
534 `$jobname.$jobnum.$filenum' will be used if no other format
535 specifier is given.
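The keyword substitution can be pictured with the following Python
sketch. expand_filename_format is a hypothetical helper and the job name
"writer" is made up; the sketch only performs the textual replacement
described above.

```python
def expand_filename_format(fmt, jobname, jobnum, filenum):
    """Replace the $jobname, $jobnum and $filenum keywords in fmt."""
    return (fmt.replace("$jobname", jobname)
               .replace("$jobnum", str(jobnum))
               .replace("$filenum", str(filenum)))

# A format without $jobname/$jobnum gives the same name for every job.
print(expand_filename_format("testfiles.$filenum", "writer", 0, 4))
# The default format keeps files of different jobs apart.
print(expand_filename_format("$jobname.$jobnum.$filenum", "writer", 2, 4))
```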
536
If you specify a path then the directories will be created up to
the main directory for the file. For example, if you specify
`a/b/c/$jobnum' then the directories a/b/c will be created
before the file setup part of the job. If you specify directory
then the path will be relative to that directory, otherwise it
is treated as the absolute path.
543
544 unique_filename=bool
545 To avoid collisions between networked clients, fio defaults to
546 prefixing any generated filenames (with a directory specified)
547 with the source of the client connecting. To disable this behav‐
548 ior, set this option to 0.
549
550 opendir=str
551 Recursively open any files below directory str.
552
553 lockfile=str
554 Fio defaults to not locking any files before it does I/O to
555 them. If a file or file descriptor is shared, fio can serialize
556 I/O to that file to make the end result consistent. This is
557 usual for emulating real workloads that share files. The lock
558 modes are:
559
560 none No locking. The default.
561
562 exclusive
563 Only one thread or process may do I/O at a time,
564 excluding all others.
565
566 readwrite
567 Read-write locking on the file. Many readers may
568 access the file at the same time, but writes get
569 exclusive access.
570
571 nrfiles=int
Number of files to use for this job. Defaults to 1. The size of
files will be size divided by this unless an explicit size is
specified by filesize. Files are created for each thread
separately, and each file will have a file number within its
name by default, as explained in the filename section.
577
578 openfiles=int
Number of files to keep open at the same time. Defaults to the
same as nrfiles; can be set smaller to limit the number of
simultaneous opens.
582
583 file_service_type=str
584 Defines how fio decides which file from a job to service next.
585 The following types are defined:
586
587 random Choose a file at random.
588
589 roundrobin
590 Round robin over opened files. This is the
591 default.
592
593 sequential
594 Finish one file before moving on to the next. Mul‐
595 tiple files can still be open depending on open‐
596 files.
597
598 zipf Use a Zipf distribution to decide what file to
599 access.
600
601 pareto Use a Pareto distribution to decide what file to
602 access.
603
604 normal Use a Gaussian (normal) distribution to decide
605 what file to access.
606
607 gauss Alias for normal.
608
609 For random, roundrobin, and sequential, a postfix can be
610 appended to tell fio how many I/Os to issue before switching to
611 a new file. For example, specifying `file_service_type=random:8'
612 would cause fio to issue 8 I/Os before selecting a new file at
613 random. For the non-uniform distributions, a floating point
614 postfix can be given to influence how the distribution is
615 skewed. See random_distribution for a description of how that
616 would work.
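The effect of the numeric postfix can be illustrated with a Python
generator that yields the index of the file serviced by each I/O. It is
a toy model with hypothetical names (service_order, batch) and covers
only the random and roundrobin types; it is not how fio is implemented.

```python
import itertools
import random

def service_order(nr_files, mode="roundrobin", batch=1, seed=0):
    """Yield file indexes, issuing `batch` I/Os before picking a new file."""
    rng = random.Random(seed)
    next_index = 0
    while True:
        if mode == "random":
            current = rng.randrange(nr_files)
        elif mode == "roundrobin":
            current = next_index
            next_index = (next_index + 1) % nr_files
        else:
            raise ValueError(f"mode not modeled here: {mode!r}")
        for _ in range(batch):
            yield current

# Model of file_service_type=roundrobin:2 over three files.
print(list(itertools.islice(service_order(3, "roundrobin", batch=2), 8)))
```

With mode="random" and batch=8, every run of eight consecutive values is
the same file index, matching the `file_service_type=random:8' example.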
617
618 ioscheduler=str
619 Attempt to switch the device hosting the file to the specified
620 I/O scheduler before running.
621
622 create_serialize=bool
623 If true, serialize the file creation for the jobs. This may be
624 handy to avoid interleaving of data files, which may greatly
625 depend on the filesystem used and even the number of processors
626 in the system. Default: true.
627
628 create_fsync=bool
629 fsync(2) the data file after creation. This is the default.
630
631 create_on_open=bool
632 If true, don't pre-create files but allow the job's open() to
633 create a file when it's time to do I/O. Default: false -- pre-
634 create all necessary files when the job starts.
635
636 create_only=bool
637 If true, fio will only run the setup phase of the job. If files
638 need to be laid out or updated on disk, only that will be done
639 -- the actual job contents are not executed. Default: false.
640
641 allow_file_create=bool
642 If true, fio is permitted to create files as part of its work‐
643 load. If this option is false, then fio will error out if the
644 files it needs to use don't already exist. Default: true.
645
646 allow_mounted_write=bool
If this isn't set, fio will abort jobs that are destructive
(e.g. that write) to what appears to be a mounted device or
partition. This helps catch inadvertently destructive tests,
created without realizing that the test will destroy data on the
mounted file system. Note that some platforms don't allow
writing against a mounted device regardless of this option.
Default: false.
654
655 pre_read=bool
If this is given, files will be pre-read into memory before
starting the given I/O operation. This will also clear the
invalidate flag, since it is pointless to pre-read and then drop
the cache. This will only work for I/O engines that are
seekable, since they allow you to read the same data multiple
times. Thus it will not work on non-seekable I/O engines (e.g.
network, splice). Default: false.
663
664 unlink=bool
665 Unlink the job files when done. Not the default, as repeated
666 runs of that job would then waste time recreating the file set
667 again and again. Default: false.
668
669 unlink_each_loop=bool
670 Unlink job files after each iteration or loop. Default: false.
671
672 zonemode=str
673 Accepted values are:
674
675 none The zonerange, zonesize and zoneskip parameters
676 are ignored.
677
678 strided
679 I/O happens in a single zone until zonesize bytes
680 have been transferred. After that number of bytes
681 has been transferred processing of the next zone
682 starts.
683
684 zbd Zoned block device mode. I/O happens sequentially
685 in each zone, even if random I/O has been
686 selected. Random I/O happens across all zones
687 instead of being restricted to a single zone.
688
689 zonerange=int
690 For zonemode=strided, this is the size of a single zone. See
691 also zonesize and zoneskip.
692
693 For zonemode=zbd, this parameter is ignored.
694
695 zonesize=int
696 For zonemode=strided, this is the number of bytes to transfer
697 before skipping zoneskip bytes. If this parameter is smaller
698 than zonerange then only a fraction of each zone with zonerange
699 bytes will be accessed. If this parameter is larger than zon‐
700 erange then each zone will be accessed multiple times before
701 skipping to the next zone.
702
703 For zonemode=zbd, this is the size of a single zone. The zon‐
704 erange parameter is ignored in this mode. For a job accessing a
705 zoned block device, the specified zonesize must be 0 or equal to
706 the device zone size. For a regular block device or file, the
707 specified zonesize must be at least 512B.
708
709 zoneskip=int
710 For zonemode=strided, the number of bytes to skip after zonesize
711 bytes of data have been transferred.
712
713 For zonemode=zbd, the zonesize aligned number of bytes to skip
714 once a zone is fully written (write workloads) or all written
715 data in the zone have been read (read workloads). This parameter
716 is valid only for sequential workloads and ignored for random
717 workloads. For read workloads, see also read_beyond_wp.
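For zonemode=strided, the interplay of zonerange, zonesize and zoneskip
can be sketched as a small Python model of a sequential workload. This
is an illustration under stated assumptions, not fio's implementation:
strided_offsets is a hypothetical helper, it assumes that the next zone
starts zonerange + zoneskip bytes after the current one, and it ignores
wrapping at the end of the file.

```python
def strided_offsets(count, bs, zonerange, zonesize, zoneskip):
    """Offsets of `count` sequential I/Os of `bs` bytes in strided mode."""
    offsets = []
    zone_start = 0  # start of the current zone
    pos = 0         # offset of the next I/O
    moved = 0       # bytes transferred in the current zone
    for _ in range(count):
        if moved >= zonesize:
            # Assumption: the next zone begins zonerange + zoneskip bytes on.
            zone_start += zonerange + zoneskip
            pos, moved = zone_start, 0
        if pos >= zone_start + zonerange:
            pos = zone_start  # zonesize > zonerange: wrap inside the zone
        offsets.append(pos)
        pos += bs
        moved += bs
    return offsets

# zonesize smaller than zonerange: only part of each zone is touched.
print(strided_offsets(6, bs=4096, zonerange=16384, zonesize=8192, zoneskip=4096))
# zonesize larger than zonerange: each zone is covered more than once.
print(strided_offsets(5, bs=4096, zonerange=8192, zonesize=16384, zoneskip=0))
```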
718
719
720 read_beyond_wp=bool
721 This parameter applies to zonemode=zbd only.
722
723 Zoned block devices are block devices that consist of multiple
724 zones. Each zone has a type, e.g. conventional or sequential. A
725 conventional zone can be written at any offset that is a multi‐
726 ple of the block size. Sequential zones must be written sequen‐
727 tially. The position at which a write must occur is called the
728 write pointer. A zoned block device can be either host managed
729 or host aware. For host managed devices the host must ensure
730 that writes happen sequentially. Fio recognizes host managed
731 devices and serializes writes to sequential zones for these
732 devices.
733
734 If a read occurs in a sequential zone beyond the write pointer
735 then the zoned block device will complete the read without read‐
736 ing any data from the storage medium. Since such reads lead to
737 unrealistically high bandwidth and IOPS numbers fio only reads
738 beyond the write pointer if explicitly told to do so. Default:
739 false.
740
741 max_open_zones=int
When running a random write test across an entire drive, many
more zones will be open than in a typical application workload.
This option limits the number of open zones. The number of open
zones is defined as the number of zones to which write commands
are issued.
747
748 zone_reset_threshold=float
749 A number between zero and one that indicates the ratio of logi‐
750 cal blocks with data to the total number of logical blocks in
751 the test above which zones should be reset periodically.
752
753 zone_reset_frequency=float
754 A number between zero and one that indicates how often a zone
755 reset should be issued if the zone reset threshold has been
756 exceeded. A zone reset is submitted after each (1 /
757 zone_reset_frequency) write requests. This and the previous
758 parameter can be used to simulate garbage collection activity.
759
760
761 I/O type
762 direct=bool
763 If value is true, use non-buffered I/O. This is usually
764 O_DIRECT. Note that OpenBSD and ZFS on Solaris don't support
765 direct I/O. On Windows the synchronous ioengines don't support
766 direct I/O. Default: false.
767
768 atomic=bool
769 If value is true, attempt to use atomic direct I/O. Atomic
770 writes are guaranteed to be stable once acknowledged by the
771 operating system. Only Linux supports O_ATOMIC right now.
772
773 buffered=bool
774 If value is true, use buffered I/O. This is the opposite of the
775 direct option. Defaults to true.
776
777 readwrite=str, rw=str
778 Type of I/O pattern. Accepted values are:
779
780 read Sequential reads.
781
782 write Sequential writes.
783
784 trim Sequential trims (Linux block devices and SCSI
785 character devices only).
786
787 randread
788 Random reads.
789
790 randwrite
791 Random writes.
792
793 randtrim
794 Random trims (Linux block devices and SCSI charac‐
795 ter devices only).
796
797 rw,readwrite
798 Sequential mixed reads and writes.
799
800 randrw Random mixed reads and writes.
801
802 trimwrite
803 Sequential trim+write sequences. Blocks will be
804 trimmed first, then the same blocks will be writ‐
805 ten to.
806
807 Fio defaults to read if the option is not specified. For the
808 mixed I/O types, the default is to split them 50/50. For certain
809 types of I/O the result may still be skewed a bit, since the
810 speed may be different.
811
812 It is possible to specify the number of I/Os to do before get‐
813 ting a new offset by appending `:<nr>' to the end of the string
814 given. For a random read, it would look like `rw=randread:8' for
815 passing in an offset modifier with a value of 8. If the suffix
816 is used with a sequential I/O pattern, then the `<nr>' value
817 specified will be added to the generated offset for each I/O
818 turning sequential I/O into sequential I/O with holes. For
819 instance, using `rw=write:4k' will skip 4k for every write. Also
820 see the rw_sequencer option.
821
822 rw_sequencer=str
823 If an offset modifier is given by appending a number to the
824 `rw=str' line, then this option controls how that number modi‐
825 fies the I/O offset being generated. Accepted values are:
826
827 sequential
828 Generate sequential offset.
829
830 identical
831 Generate the same offset.
832
833 sequential is only useful for random I/O, where fio would nor‐
834 mally generate a new random offset for every I/O. If you append
835 e.g. 8 to randread, you would get a new random offset for every
836 8 I/Os. The result would be a seek for only every 8 I/Os,
837 instead of for every I/O. Use `rw=randread:8' to specify that.
838 As sequential I/O is already sequential, setting sequential for
839 that would not result in any differences. identical behaves in a
840 similar fashion, except it sends the same offset 8 times before
841 generating a new offset.
842
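A minimal sketch of the identical mode, assuming a placeholder file path and job name: each randomly chosen offset is issued 8 times before a new one is generated.

```ini
; Illustrative only: path, size and job name are hypothetical.
[repeat-each-offset]
filename=/path/to/testfile
size=1g
bs=4k
rw=randread:8
rw_sequencer=identical
```
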
843 unified_rw_reporting=bool
844 Fio normally reports statistics on a per data direction basis,
845 meaning that reads, writes, and trims are accounted and reported
846 separately. If this option is set, fio sums the results and
847 reports them as "mixed" instead.
848
849 randrepeat=bool
850 Seed the random number generator used for random I/O patterns in
851 a predictable way so the pattern is repeatable across runs.
852 Default: true.
853
854 allrandrepeat=bool
855 Seed all random number generators in a predictable way so
856 results are repeatable across runs. Default: false.
857
858 randseed=int
859 Seed the random number generators based on this seed value, to
860 be able to control what sequence of output is being generated.
861 If not set, the random sequence depends on the randrepeat set‐
862 ting.
863
864 fallocate=str
865 Whether pre-allocation is performed when laying down files.
866 Accepted values are:
867
868 none Do not pre-allocate space.
869
870 native Use a platform's native pre-allocation call but
871 fall back to none behavior if it fails/is not
872 implemented.
873
874 posix Pre-allocate via posix_fallocate(3).
875
876 keep Pre-allocate via fallocate(2) with FAL‐
877 LOC_FL_KEEP_SIZE set.
878
879 truncate
880 Extend file to final size using ftruncate(2)
881 instead of allocating.
882
883 0 Backward-compatible alias for none.
884
885 1 Backward-compatible alias for posix.
886
887 May not be available on all supported platforms. keep is only
888 available on Linux. If using ZFS on Solaris this cannot be set
889 to posix because ZFS doesn't support pre-allocation. Default:
890 native if any pre-allocation methods except truncate are avail‐
891 able, none if not.
892
893 Note that using truncate on Windows will interact surprisingly
894 with non-sequential write patterns. When writing to a file that
895 has been extended by setting the end-of-file information, Win‐
896 dows will backfill the unwritten portion of the file up to that
897 offset with zeroes before issuing the new write. This means that
898 a single small write to the end of an extended file will stall
899 until the entire file has been filled with zeroes.
900
901 fadvise_hint=str
902 Use posix_fadvise(2) or posix_madvise(2) to advise the kernel
903 what I/O patterns are likely to be issued. Accepted values are:
904
905 0 Backwards compatible hint for "no hint".
906
907 1 Backwards compatible hint for "advise with fio
908 workload type". This uses FADV_RANDOM for a random
909 workload, and FADV_SEQUENTIAL for a sequential
910 workload.
911
912 sequential
913 Advise using FADV_SEQUENTIAL.
914
915 random Advise using FADV_RANDOM.
916
917 write_hint=str
918 Use fcntl(2) to advise the kernel what life time to expect from
919 a write. Only supported on Linux, as of version 4.13. Accepted
920 values are:
921
922 none No particular life time associated with this file.
923
924 short Data written to this file has a short life time.
925
926 medium Data written to this file has a medium life time.
927
928 long Data written to this file has a long life time.
929
930 extreme
931 Data written to this file has a very long life
932 time.
933
934 The values are all relative to each other, and no absolute mean‐
935 ing should be associated with them.
936
937 offset=int
938 Start I/O at the provided offset in the file, given as either a
939 fixed size in bytes or a percentage. If a percentage is given,
940 the generated offset will be aligned to the minimum blocksize or
941 to the value of offset_align if provided. Data before the given
942 offset will not be touched. This effectively caps the file size
943 at `real_size - offset'. Can be combined with size to constrain
944 the start and end range of the I/O workload. A percentage can
945 be specified by a number between 1 and 100 followed by '%', for
946 example, `offset=20%' to specify 20%.
947
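For illustration, offset and size can both be given as percentages to confine I/O to a window of the target. The sketch below assumes the file already exists (so its full size is known); the path and job name are placeholders.

```ini
; Illustrative only: restricts I/O to the region from 20% to 50% of the file.
[window]
filename=/path/to/existing-file
rw=randread
bs=4k
; start 20% into the file
offset=20%
; cover 30% of the full file size from that point
size=30%
```
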
948 offset_align=int
949 If set to non-zero value, the byte offset generated by a per‐
950 centage offset is aligned upwards to this value. Defaults to 0
951 meaning that a percentage offset is aligned to the minimum block
952 size.
953
954 offset_increment=int
955 If this is provided, then the real offset becomes `offset + off‐
956 set_increment * thread_number', where the thread number is a
957 counter that starts at 0 and is incremented for each sub-job
958 (i.e. when numjobs option is specified). This option is useful
959 if there are several jobs which are intended to operate on a
960 file in parallel disjoint segments, with even spacing between
961 the starting points. Percentages can be used for this option.
962 If a percentage is given, the generated offset will be aligned
963 to the minimum blocksize or to the value of offset_align if pro‐
964 vided.
965
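As a worked sketch of the formula above, four sub-jobs writing disjoint 1g segments of one file could be set up as follows. The path and job name are placeholders.

```ini
; Illustrative only: real offset = offset + offset_increment * thread_number
[parallel-segments]
filename=/path/to/testfile
rw=write
bs=1m
numjobs=4
size=1g
offset=0
offset_increment=1g
; sub-job 0 starts at 0, sub-job 1 at 1g, sub-job 2 at 2g, sub-job 3 at 3g
```
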
966 number_ios=int
967 Fio will normally perform I/Os until it has exhausted the size
968 of the region set by size, or if it exhausts the allocated time
969 (or hits an error condition). With this setting, the range/size
970 can be set independently of the number of I/Os to perform. When
971 fio reaches this number, it will exit normally and report sta‐
972 tus. Note that this does not extend the amount of I/O that will
973 be done, it will only stop fio if this condition is met before
974 other end-of-job criteria.
975
976 fsync=int
977 If writing to a file, issue an fsync(2) (or its equivalent) of
978 the dirty data for every number of blocks given. For example, if
979 you give 32 as a parameter, fio will sync the file after every
980 32 writes issued. If fio is using non-buffered I/O, we may not
981 sync the file. The exception is the sg I/O engine, which syn‐
982 chronizes the disk cache anyway. Defaults to 0, which means fio
983 does not periodically issue and wait for a sync to complete.
984 Also see end_fsync and fsync_on_close.
985
986 fdatasync=int
987 Like fsync but uses fdatasync(2) to only sync data and not meta‐
988 data blocks. In Windows, FreeBSD, DragonFlyBSD or OSX there is
989 no fdatasync(2) so this falls back to using fsync(2). Defaults
990 to 0, which means fio does not periodically issue and wait for a
991 data-only sync to complete.
992
993 write_barrier=int
994 Make every N-th write a barrier write.
995
996 sync_file_range=str:int
997 Use sync_file_range(2) for every int number of write operations.
998 Fio will track range of writes that have happened since the last
999 sync_file_range(2) call. str can currently be one or more of:
1000
1001 wait_before
1002 SYNC_FILE_RANGE_WAIT_BEFORE
1003
1004 write SYNC_FILE_RANGE_WRITE
1005
1006 wait_after
1007 SYNC_FILE_RANGE_WAIT_AFTER
1008
1009 So if you do `sync_file_range=wait_before,write:8', fio would
1010 use `SYNC_FILE_RANGE_WAIT_BEFORE | SYNC_FILE_RANGE_WRITE' for
1011 every 8 writes. Also see the sync_file_range(2) man page. This
1012 option is Linux specific.
1013
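A minimal sketch of the option in a job file, assuming a Linux host; the path, sizes and job name are placeholders.

```ini
; Illustrative only: calls sync_file_range(2) with
; SYNC_FILE_RANGE_WAIT_BEFORE | SYNC_FILE_RANGE_WRITE after every 8 writes.
[sfr-writes]
filename=/path/to/testfile
rw=write
bs=64k
size=256m
sync_file_range=wait_before,write:8
```
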
1014 overwrite=bool
1015 If true, writes to a file will always overwrite existing data.
1016 If the file doesn't already exist, it will be created before the
1017 write phase begins. If the file exists and is large enough for
1018 the specified write phase, nothing will be done. Default: false.
1019
1020 end_fsync=bool
1021 If true, fsync(2) file contents when a write stage has com‐
1022 pleted. Default: false.
1023
1024 fsync_on_close=bool
1025 If true, fio will fsync(2) a dirty file on close. This differs
1026 from end_fsync in that it will happen on every file close, not
1027 just at the end of the job. Default: false.
1028
1029 rwmixread=int
1030 Percentage of a mixed workload that should be reads. Default:
1031 50.
1032
1033 rwmixwrite=int
1034 Percentage of a mixed workload that should be writes. If both
1035 rwmixread and rwmixwrite are given and the values do not add up
1036 to 100%, the latter of the two will be used to override the
1037 first. This may interfere with a given rate setting, if fio is
1038 asked to limit reads or writes to a certain rate. If that is the
1039 case, then the distribution may be skewed. Default: 50.
1040
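For example, a 70/30 random read/write mix might be sketched as below. Only one of the two mix options is set, so the other is implied; the path, sizes and job name are placeholders.

```ini
; Illustrative only: 70% reads, so the remaining 30% are writes.
[mixed-70-30]
filename=/path/to/testfile
size=1g
bs=4k
rw=randrw
rwmixread=70
```
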
1041 random_distribution=str:float[,str:float][,str:float]
1042 By default, fio will use a completely uniform random distribu‐
1043 tion when asked to perform random I/O. Sometimes it is useful to
1044 skew the distribution in specific ways, ensuring that some parts
1045 of the data are hotter than others. Fio includes the following
1046 distribution models:
1047
1048 random Uniform random distribution
1049
1050 zipf Zipf distribution
1051
1052 pareto Pareto distribution
1053
1054 normal Normal (Gaussian) distribution
1055
1056 zoned Zoned random distribution
1057 zoned_abs Zoned absolute random distribution
1058
1059 When using a zipf or pareto distribution, an input value is also
1060 needed to define the access pattern. For zipf, this is the `Zipf
1061 theta'. For pareto, it's the `Pareto power'. Fio includes a
1062 test program, fio-genzipf, that can be used to visualize what the
1063 given input values will yield in terms of hit rates. If you
1064 wanted to use zipf with a `theta' of 1.2, you would use `ran‐
1065 dom_distribution=zipf:1.2' as the option. If a non-uniform model
1066 is used, fio will disable use of the random map. For the normal
1067 distribution, a normal (Gaussian) deviation is supplied as a
1068 value between 0 and 100.
1069
1070 For a zoned distribution, fio supports specifying percentages of
1071 I/O access that should fall within what range of the file or
1072 device. For example, given the criteria:
1073
1074 60% of accesses should be to the first 10%
1075 30% of accesses should be to the next 20%
1076 8% of accesses should be to the next 30%
1077 2% of accesses should be to the next 40%
1078
1079 we can define that through zoning of the random accesses. For
1080 the above example, the user would do:
1081
1082 random_distribution=zoned:60/10:30/20:8/30:2/40
1083
1084 A zoned_abs distribution works exactly like zoned, except
1085 that it takes absolute sizes. For example, let's say you wanted
1086 to define access according to the following criteria:
1087
1088 60% of accesses should be to the first 20G
1089 30% of accesses should be to the next 100G
1090 10% of accesses should be to the next 500G
1091
1092 we can define an absolute zoning distribution with:
1093
1094 random_distribution=zoned_abs:60/20G:30/100G:10/500G
1095
1096 For both zoned and zoned_abs, fio supports defining up to 256
1097 separate zones.
1098
1099 This is similar to how bssplit sets ranges and percentages of
1100 block sizes. Like bssplit, it's possible to specify separate
1101 zones for reads, writes, and trims. If just one set is given,
1102 it'll apply to all of them.
1103
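The distribution models can be sketched in a job file as follows, with one skewed job per model run back to back. The path, sizes and job names are placeholders.

```ini
; Illustrative only: path, size and job names are hypothetical.
[global]
filename=/path/to/testfile
size=1g
bs=4k
rw=randread

[zipf-reads]
; Zipf distribution with a theta of 1.2
random_distribution=zipf:1.2

[zoned-reads]
stonewall
; 60% of accesses to the first 10%, 30% to the next 20%,
; 8% to the next 30% and 2% to the final 40%
random_distribution=zoned:60/10:30/20:8/30:2/40
```
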
1104 percentage_random=int[,int][,int]
1105 For a random workload, set how big a percentage should be ran‐
1106 dom. This defaults to 100%, in which case the workload is fully
1107 random. It can be set anywhere from 0 to 100. Setting it to
1108 0 would make the workload fully sequential. Any setting in
1109 between will result in a random mix of sequential and random
1110 I/O, at the given percentages. Comma-separated values may be
1111 specified for reads, writes, and trims as described in block‐
1112 size.
1113
1114 norandommap
1115 Normally fio will cover every block of the file when doing ran‐
1116 dom I/O. If this option is given, fio will just get a new random
1117 offset without looking at past I/O history. This means that some
1118 blocks may not be read or written, and that some blocks may be
1119 read/written more than once. If this option is used with verify
1120 and multiple blocksizes (via bsrange), only intact blocks are
1121 verified, i.e., partially-overwritten blocks are ignored. With
1122 an async I/O engine and an I/O depth > 1, it is possible for the
1123 same block to be overwritten, which can cause verification
1124 errors. Either do not use norandommap in this case, or also use
1125 the lfsr random generator.
1126
1127 softrandommap=bool
1128 See norandommap. If fio runs with the random block map enabled
1129 and it fails to allocate the map, if this option is set it will
1130 continue without a random block map. As coverage will not be as
1131 complete as with random maps, this option is disabled by
1132 default.
1133
1134 random_generator=str
1135 Fio supports the following engines for generating I/O offsets
1136 for random I/O:
1137
1138 tausworthe
1139 Strong 2^88 cycle random number generator.
1140
1141 lfsr Linear feedback shift register generator.
1142
1143 tausworthe64
1144 Strong 64-bit 2^258 cycle random number generator.
1145
1146 tausworthe is a strong random number generator, but it requires
1147 tracking on the side if we want to ensure that blocks are only
1148 read or written once. lfsr guarantees that we never generate the
1149 same offset twice, and it's also less computationally expensive.
1150 It's not a true random generator, however, though for I/O pur‐
1151 poses it's typically good enough. lfsr only works with single
1152 block sizes, not with workloads that use multiple block sizes.
1153 If used with such a workload, fio may read or write some blocks
1154 multiple times. The default value is tausworthe, unless the
1155 required space exceeds 2^32 blocks. If it does, then taus‐
1156 worthe64 is selected automatically.
1157
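As a sketch of the verification advice given under norandommap, a random write job with a queue depth above 1 could select the lfsr generator so that no offset is generated twice. The engine, depth, checksum type, path and job name below are assumptions chosen for illustration.

```ini
; Illustrative only: a single block size is used, as lfsr requires.
[verify-randwrite]
filename=/path/to/testfile
size=1g
bs=4k
rw=randwrite
ioengine=libaio
direct=1
iodepth=16
norandommap
random_generator=lfsr
verify=crc32c
```
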
1158 Block size
1159 blocksize=int[,int][,int], bs=int[,int][,int]
1160 The block size in bytes used for I/O units. Default: 4096. A
1161 single value applies to reads, writes, and trims. Comma-sepa‐
1162 rated values may be specified for reads, writes, and trims. A
1163 value not terminated in a comma applies to subsequent types.
1164 Examples:
1165
1166 bs=256k means 256k for reads, writes and trims.
1167 bs=8k,32k means 8k for reads, 32k for writes and
1168 trims.
1169 bs=8k,32k, means 8k for reads, 32k for writes, and
1170 default for trims.
1171 bs=,8k means default for reads, 8k for writes and
1172 trims.
1173 bs=,8k, means default for reads, 8k for writes,
1174 and default for trims.
1175
1176 blocksize_range=irange[,irange][,irange],
1177 bsrange=irange[,irange][,irange]
1178 A range of block sizes in bytes for I/O units. The issued I/O
1179 unit will always be a multiple of the minimum size, unless
1180 blocksize_unaligned is set. Comma-separated ranges may be spec‐
1181 ified for reads, writes, and trims as described in blocksize.
1182 Example:
1183
1184 bsrange=1k-4k,2k-8k
1185
1186 bssplit=str[,str][,str]
1187 Sometimes you want even finer grained control of the block sizes
1188 issued, not just an even split between them. This option allows
1189 you to weight various block sizes, so that you are able to
1190 define a specific amount of block sizes issued. The format for
1191 this option is:
1192
1193 bssplit=blocksize/percentage:blocksize/percentage
1194
1195 for as many block sizes as needed. So if you want to define a
1196 workload that has 50% 64k blocks, 10% 4k blocks, and 40% 32k
1197 blocks, you would write:
1198
1199 bssplit=4k/10:64k/50:32k/40
1200
1201 Ordering does not matter. If the percentage is left blank, fio
1202 will fill in the remaining values evenly. So a bssplit option
1203 like this one:
1204
1205 bssplit=4k/50:1k/:32k/
1206
1207 would have 50% 4k I/Os, and 25% each of 1k and 32k I/Os. The
1208 percentages always add up to 100; if bssplit is given a range
1209 that adds up to more, it will error out.
1210
1211 Comma-separated values may be specified for reads, writes, and
1212 trims as described in blocksize.
1213
1214 If you want a workload that has 50% 2k reads and 50% 4k reads,
1215 while having 90% 4k writes and 10% 8k writes, you would specify:
1216
1217 bssplit=2k/50:4k/50,4k/90:8k/10
1218
1219 Fio supports defining up to 64 different weights for each data
1220 direction.
1221
1222 blocksize_unaligned, bs_unaligned
1223 If set, fio will issue I/O units with any size within block‐
1224 size_range, not just multiples of the minimum size. This typi‐
1225 cally won't work with direct I/O, as that normally requires sec‐
1226 tor alignment.
1227
1228 bs_is_seq_rand=bool
1229 If this option is set, fio will use the normal read,write block‐
1230 size settings as sequential,random blocksize settings instead.
1231 Any random read or write will use the WRITE blocksize settings,
1232 and any sequential read or write will use the READ blocksize
1233 settings.
1234
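A hedged sketch of how this could be combined with percentage_random: with the settings below, the first bs value (normally the read size) would apply to sequential I/Os and the second (normally the write size) to random ones. The path, sizes and job name are placeholders.

```ini
; Illustrative only: half the reads are sequential, half random.
[seq-rand-sizes]
filename=/path/to/testfile
size=1g
rw=randread
percentage_random=50
; 64k for sequential I/O, 4k for random I/O
bs=64k,4k
bs_is_seq_rand=1
```
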
1235 blockalign=int[,int][,int], ba=int[,int][,int]
1236 Boundary to which fio will align random I/O units. Default:
1237 blocksize. Minimum alignment is typically 512b for using direct
1238 I/O, though it usually depends on the hardware block size. This
1239 option is mutually exclusive with using a random map for files,
1240 so it will turn off that option. Comma-separated values may be
1241 specified for reads, writes, and trims as described in block‐
1242 size.
1243
1244 Buffers and memory
1245 zero_buffers
1246 Initialize buffers with all zeros. Default: fill buffers with
1247 random data.
1248
1249 refill_buffers
1250 If this option is given, fio will refill the I/O buffers on
1251 every submit. The default is to only fill it at init time and
1252 reuse that data. Only makes sense if zero_buffers isn't speci‐
1253 fied, naturally. If data verification is enabled, refill_buffers
1254 is also automatically enabled.
1255
1256 scramble_buffers=bool
1257 If refill_buffers is too costly and the target is using data
1258 deduplication, then setting this option will slightly modify the
1259 I/O buffer contents to defeat normal de-dupe attempts. This is
1260 not enough to defeat more clever block compression attempts, but
1261 it will stop naive dedupe of blocks. Default: true.
1262
1263 buffer_compress_percentage=int
1264 If this is set, then fio will attempt to provide I/O buffer con‐
1265 tent (on WRITEs) that compresses to the specified level. Fio
1266 does this by providing a mix of random data followed by fixed
1267 pattern data. The fixed pattern is either zeros, or the pattern
1268 specified by buffer_pattern. If the buffer_pattern option is
1269 used, it might skew the compression ratio slightly. Setting buf‐
1270 fer_compress_percentage to a value other than 100 will also
1271 enable refill_buffers in order to reduce the likelihood that
1272 adjacent blocks are so similar that they over compress when seen
1273 together. See buffer_compress_chunk for how to set a finer or
1274 coarser granularity of the random/fixed data regions. Defaults
1275 to unset i.e., buffer data will not adhere to any compression
1276 level.
1277
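For illustration, a write job targeting roughly 50% compressible data, with the random/fixed pattern repeated every 4k inside each 128k buffer, might look like this. The path, sizes and job name are placeholders, and the ratio actually achieved depends on the compressor in use.

```ini
; Illustrative only: path, size and job name are hypothetical.
[compressible-writes]
filename=/path/to/testfile
size=1g
rw=write
bs=128k
buffer_compress_percentage=50
buffer_compress_chunk=4k
```
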
1278 buffer_compress_chunk=int
1279 This setting allows fio to manage how big the random/fixed data
1280 region is when using buffer_compress_percentage. When buf‐
1281 fer_compress_chunk is set to some non-zero value smaller than
1282 the block size, fio can repeat the random/fixed region throughout
1283 the I/O buffer at the specified interval (which is particularly
1284 useful when bigger block sizes are used for a job). When set to
1285 0, fio will use a chunk size that matches the block size result‐
1286 ing in a single random/fixed region within the I/O buffer.
1287 Defaults to 512. When the unit is omitted, the value is inter‐
1288 preted in bytes.
1289
1290 buffer_pattern=str
1291 If set, fio will fill the I/O buffers with this pattern or with
1292 the contents of a file. If not set, the contents of I/O buffers
1293 are defined by the other options related to buffer contents. The
1294 setting can be any pattern of bytes, and can be prefixed with 0x
1295 for hex values. It may also be a string, where the string must
1296 then be wrapped with "". Or it may also be a filename, where the
1297 filename must be wrapped with '' in which case the file is
1298 opened and read. Note that not all the file contents will be
1299 read if that would cause the buffers to overflow. So, for exam‐
1300 ple:
1301
1302 buffer_pattern='filename'
1303 or:
1304 buffer_pattern="abcd"
1305 or:
1306 buffer_pattern=-12
1307 or:
1308 buffer_pattern=0xdeadface
1309
1310 Also you can combine everything together in any order:
1311
1312 buffer_pattern=0xdeadface"abcd"-12'filename'
1313
1314 dedupe_percentage=int
1315 If set, fio will generate this percentage of identical buffers
1316 when writing. These buffers will be naturally dedupable. The
1317 contents of the buffers depend on what other buffer compression
1318 settings have been set. It's possible to have the individual
1319 buffers either fully compressible, or not at all -- this option
1320 only controls the distribution of unique buffers. Setting this
1321 option will also enable refill_buffers to prevent every buffer
1322 being identical.
1323
1324 invalidate=bool
1325 Invalidate the buffer/page cache parts of the files to be used
1326 prior to starting I/O if the platform and file type support it.
1327 Defaults to true. This will be ignored if pre_read is also
1328 specified for the same job.
1329
1330 sync=bool
1331 Use synchronous I/O for buffered writes. For the majority of I/O
1332 engines, this means using O_SYNC. Default: false.
1333
1334 iomem=str, mem=str
1335 Fio can use various types of memory as the I/O unit buffer. The
1336 allowed values are:
1337
1338 malloc Use memory from malloc(3) as the buffers. Default
1339 memory type.
1340
1341 shm Use shared memory as the buffers. Allocated
1342 through shmget(2).
1343
1344 shmhuge
1345 Same as shm, but use huge pages as backing.
1346
1347 mmap Use mmap(2) to allocate buffers. May either be
1348 anonymous memory, or can be file backed if a file‐
1349 name is given after the option. The format is
1350 `mem=mmap:/path/to/file'.
1351
1352 mmaphuge
1353 Use a memory mapped huge file as the buffer back‐
1354 ing. Append filename after mmaphuge, ala `mem=mma‐
1355 phuge:/hugetlbfs/file'.
1356
1357 mmapshared
1358 Same as mmap, but use a MAP_SHARED mapping.
1359
1360 cudamalloc
1361 Use GPU memory as the buffers for GPUDirect RDMA
1362 benchmark. The ioengine must be rdma.
1363
1364 The area allocated is a function of the maximum allowed bs size
1365 for the job, multiplied by the I/O depth given. Note that for
1366 shmhuge and mmaphuge to work, the system must have free huge
1367 pages allocated. This can normally be checked and set by read‐
1368 ing/writing `/proc/sys/vm/nr_hugepages' on a Linux system. Fio
1369 assumes a huge page is 4MiB in size. So to calculate the number
1370 of huge pages you need for a given job file, add up the I/O
1371 depth of all jobs (normally one unless iodepth is used) and mul‐
1372 tiply by the maximum bs set. Then divide that number by the huge
1373 page size. You can see the size of the huge pages in `/proc/mem‐
1374 info'. If no huge pages are allocated by having a non-zero num‐
1375 ber in `nr_hugepages', using mmaphuge or shmhuge will fail. Also
1376 see hugepage-size.
1377
1378 mmaphuge also needs to have hugetlbfs mounted and the file loca‐
1379 tion should point there. So if it's mounted in `/huge', you
1380 would use `mem=mmaphuge:/huge/somefile'.
1381
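The huge page calculation above can be worked through on a small example. The job below is a sketch with placeholder path, sizes and job name, and it assumes enough huge pages have already been reserved through `/proc/sys/vm/nr_hugepages'.

```ini
; Illustrative only.
; Buffer area   = iodepth * bs = 16 * 1MiB = 16MiB
; Pages needed  = 16MiB / 4MiB huge page size = 4
; (if /proc/meminfo reports 2048 kB huge pages, the same job needs 8)
[hugepage-buffers]
filename=/path/to/testfile
size=1g
rw=read
bs=1m
ioengine=libaio
direct=1
iodepth=16
mem=shmhuge
hugepage-size=4m
```
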
1382 iomem_align=int, mem_align=int
1383 This indicates the memory alignment of the I/O memory buffers.
1384 Note that the given alignment is applied to the first I/O unit
1385 buffer; if using iodepth, the alignment of the following buffers
1386 is given by the bs used. In other words, if using a bs that is
1387 a multiple of the page size in the system, all buffers will be
1388 aligned to this value. If using a bs that is not page aligned,
1389 the alignment of subsequent I/O memory buffers is the sum of the
1390 iomem_align and bs used.
1391
1392 hugepage-size=int
1393 Defines the size of a huge page. Must at least be equal to the
1394 system setting, see `/proc/meminfo'. Defaults to 4MiB. Should
1395 probably always be a multiple of megabytes, so using
1396 `hugepage-size=Xm' is the preferred way to set this to avoid
1397 setting a non-pow-2 bad value.
1398
1399 lockmem=int
1400 Pin the specified amount of memory with mlock(2). Can be used to
1401 simulate a smaller amount of memory. The amount specified is per
1402 worker.
1403
1404 I/O size
1405 size=int
1406 The total size of file I/O for each thread of this job. Fio will
1407 run until this many bytes have been transferred, unless runtime
1408 is limited by other options (such as runtime, for instance, or
1409 increased/decreased by io_size). Fio will divide this size
1410 between the available files determined by options such as
1411 nrfiles, filename, unless filesize is specified by the job. If
1412 the result of division happens to be 0, the size is set to the
1413 physical size of the given files or devices if they exist. If
1414 this option is not specified, fio will use the full size of the
1415 given files or devices. If the files do not exist, size must be
1416 given. It is also possible to give size as a percentage between
1417 1 and 100. If `size=20%' is given, fio will use 20% of the full
1418 size of the given files or devices. Can be combined with offset
1419 to constrain the start and end range that I/O will be done
1420 within.
1421
1422 io_size=int, io_limit=int
1423 Normally fio operates within the region set by size, which means
1424 that the size option sets both the region and size of I/O to be
1425 performed. Sometimes that is not what you want. With this
1426 option, it is possible to define just the amount of I/O that fio
1427 should do. For instance, if size is set to 20GiB and io_size is
1428 set to 5GiB, fio will perform I/O within the first 20GiB but
1429 exit when 5GiB have been done. The opposite is also possible --
1430 if size is set to 20GiB, and io_size is set to 40GiB, then fio
1431 will do 40GiB of I/O within the 0..20GiB region.
1432
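The first case described above can be sketched as a job file: the region is 20GiB wide, but the job exits after 5GiB of I/O. The path and job name are placeholders.

```ini
; Illustrative only: size sets the region, io_size sets the amount of I/O.
[region-vs-amount]
filename=/path/to/testfile
rw=randread
bs=4k
size=20g
io_size=5g
```

Setting io_size=40g instead would keep the same 20GiB region but perform 40GiB of I/O within it.
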
1433 filesize=irange(int)
1434 Individual file sizes. May be a range, in which case fio will
1435 select sizes for files at random within the given range and lim‐
1436 ited to size in total (if that is given). If not given, each
1437 created file is the same size. This option overrides size in
1438 terms of file size, which means this value is used as a fixed
1439 size or possible range of each file.
1440
1441 file_append=bool
1442 Perform I/O after the end of the file. Normally fio will operate
1443 within the size of a file. If this option is set, then fio will
1444 append to the file instead. This has identical behavior to set‐
1445 ting offset to the size of a file. This option is ignored on
1446 non-regular files.
1447
1448 fill_device=bool, fill_fs=bool
1449 Sets size to something really large and waits for ENOSPC (no
1450 space left on device) as the terminating condition. Only makes
1451 sense with sequential write. For a read workload, the mount
1452 point will be filled first then I/O started on the result. This
1453 option doesn't make sense if operating on a raw device node,
1454 since the size of that is already known by the file system.
1455 Additionally, writing beyond end-of-device will not return
1456 ENOSPC there.
1457
1458 I/O engine
1459 ioengine=str
1460 Defines how the job issues I/O to the file. The following types
1461 are defined:
1462
1463 sync Basic read(2) or write(2) I/O. lseek(2) is used to
1464 position the I/O location. See fsync and fdata‐
1465 sync for syncing write I/Os.
1466
1467 psync Basic pread(2) or pwrite(2) I/O. Default on all
1468 supported operating systems except for Windows.
1469
1470 vsync Basic readv(2) or writev(2) I/O. Will emulate
1471 queuing by coalescing adjacent I/Os into a single
1472 submission.
1473
1474 pvsync Basic preadv(2) or pwritev(2) I/O.
1475
1476 pvsync2
1477 Basic preadv2(2) or pwritev2(2) I/O.
1478
1479 libaio Linux native asynchronous I/O. Note that Linux may
1480 only support queued behavior with non-buffered I/O
1481 (set `direct=1' or `buffered=0'). This engine
1482 defines engine specific options.
1483
1484 posixaio
1485 POSIX asynchronous I/O using aio_read(3) and
1486 aio_write(3).
1487
1488 solarisaio
1489 Solaris native asynchronous I/O.
1490
1491 windowsaio
1492 Windows native asynchronous I/O. Default on Win‐
1493 dows.
1494
1495 mmap File is memory mapped with mmap(2) and data copied
1496 to/from using memcpy(3).
1497
1498 splice splice(2) is used to transfer the data and
1499 vmsplice(2) to transfer data from user space to
1500 the kernel.
1501
1502 sg SCSI generic sg v3 I/O. May either be synchronous
1503 using the SG_IO ioctl, or if the target is an sg
1504 character device we use read(2) and write(2) for
1505 asynchronous I/O. Requires filename option to
1506 specify either block or character devices. This
1507 engine supports trim operations. The sg engine
1508 includes engine specific options.
1509
1510 null Doesn't transfer any data, just pretends to. This
1511 is mainly used to exercise fio itself and for
1512 debugging/testing purposes.
1513
1514 net Transfer over the network to given `host:port'.
1515 Depending on the protocol used, the hostname,
1516 port, listen and filename options are used to
1517 specify what sort of connection to make, while the
1518 protocol option determines which protocol will be
1519 used. This engine defines engine specific options.
1520
1521 netsplice
1522 Like net, but uses splice(2) and vmsplice(2) to
1523 map data and send/receive. This engine defines
1524 engine specific options.
1525
1526 cpuio Doesn't transfer any data, but burns CPU cycles
1527 according to the cpuload and cpuchunks options.
1528 Setting cpuload=85 will cause that job to do noth‐
1529 ing but burn 85% of the CPU. In case of SMP
1530 machines, use `numjobs=<nr_of_cpu>' to get desired
1531 CPU usage, as the cpuload only loads a single CPU
1532 at the desired rate. A job never finishes unless
1533 there is at least one non-cpuio job.
1534
1535 guasi The GUASI I/O engine is the Generic Userspace
1536 Asynchronous Syscall Interface approach to async
1537 I/O. See http://www.xmailserver.org/guasi-lib.html
1538 for more info on GUASI.
1539
1540 rdma The RDMA I/O engine supports both RDMA memory
1541 semantics (RDMA_WRITE/RDMA_READ) and channel
1542 semantics (Send/Recv) for the InfiniBand, RoCE and
1543 iWARP protocols. This engine defines engine spe‐
1544 cific options.
1545
1546 falloc I/O engine that does regular fallocate to simulate
1547 data transfer as fio ioengine.
1548
1549 DDIR_READ does fallocate(,mode = FAL‐
1550 LOC_FL_KEEP_SIZE,).
1551 DDIR_WRITE does fallocate(,mode = 0).
1552 DDIR_TRIM does fallocate(,mode = FAL‐
1553 LOC_FL_KEEP_SIZE|FALLOC_FL_PUNCH_HOLE).
1554
1555 ftruncate
1556 I/O engine that sends ftruncate(2) operations in
1557 response to write (DDIR_WRITE) events. Each ftrun‐
1558 cate issued sets the file's size to the current
1559 block offset. blocksize is ignored.
1560
1561 e4defrag
1562 I/O engine that does regular EXT4_IOC_MOVE_EXT
1563 ioctls to simulate defragment activity in response
1564 to DDIR_WRITE events.
1565
1566 rados I/O engine supporting direct access to Ceph Reli‐
1567 able Autonomic Distributed Object Store (RADOS)
1568 via librados. This ioengine defines engine spe‐
1569 cific options.
1570
1571 rbd I/O engine supporting direct access to Ceph Rados
1572 Block Devices (RBD) via librbd without the need to
1573 use the kernel rbd driver. This ioengine defines
1574 engine specific options.
1575
1576 http I/O engine supporting GET/PUT requests over
1577 HTTP(S) with libcurl to a WebDAV or S3 endpoint.
1578 This ioengine defines engine specific options.
1579
1580 This engine only supports direct IO of iodepth=1;
1581 you need to scale this via numjobs. blocksize
1582 defines the size of the objects to be created.
1583
1584 TRIM is translated to object deletion.
1585
1586 gfapi Use the GlusterFS libgfapi sync interface for direct
1587 access to GlusterFS volumes without having to go
1588 through FUSE. This ioengine defines engine spe‐
1589 cific options.
1590
1591 gfapi_async
1592 Use the GlusterFS libgfapi async interface for direct
1593 access to GlusterFS volumes without having to go
1594 through FUSE. This ioengine defines engine spe‐
1595 cific options.
1596
1597 libhdfs
1598 Read and write through Hadoop (HDFS). The filename
1599 option is used to specify host,port of the hdfs
1600 name-node to connect. This engine interprets off‐
1601 sets a little differently. In HDFS, files once
1602 created cannot be modified so random writes are
1603 not possible. To imitate this the libhdfs engine
1604 expects a bunch of small files to be created over
1605 HDFS and will randomly pick a file from them based
1606 on the offset generated by fio backend (see the
1607 example job file to create such files, use
1608 `rw=write' option). Please note, it may be neces‐
1609 sary to set environment variables to work with
1610 HDFS/libhdfs properly. Each job uses its own con‐
1611 nection to HDFS.
1612
1613 mtd Read, write and erase an MTD character device
1614 (e.g., `/dev/mtd0'). Discards are treated as
1615 erases. Depending on the underlying device type,
1616 the I/O may have to go in a certain pattern, e.g.,
1617 on NAND, writing sequentially to erase blocks and
1618 discarding before overwriting. The trimwrite mode
1619 works well for this constraint.
1620
1621 pmemblk
1622 Read and write using filesystem DAX to a file on a
1623 filesystem mounted with DAX on a persistent memory
1624 device through the PMDK libpmemblk library.
1625
1626 dev-dax
1627 Read and write using device DAX to a persistent
1628 memory device (e.g., /dev/dax0.0) through the PMDK
1629 libpmem library.
1630
1631 external
1632 Prefix to specify loading an external I/O engine
1633 object file. Append the engine filename, e.g.
1634 `ioengine=external:/tmp/foo.o' to load ioengine
1635 `foo.o' in `/tmp'. The path can be either absolute
1636 or relative. See `engines/skeleton_external.c' in
1637 the fio source for details of writing an external
1638 I/O engine.
1639
1640 filecreate
1641 Simply create the files and do no I/O to them.
1642 You still need to set filesize so that all the
1643 accounting still occurs, but no actual I/O will be
1644 done other than creating the file.
1645
1646 filestat
1647 Simply do stat() and do no I/O to the file. You
1648 need to set 'filesize' and 'nrfiles', so that
1649 files will be created. This engine is to measure
1650 file lookup and meta data access.
1651
1652 libpmem
1653 Read and write using mmap I/O to a file on a
1654 filesystem mounted with DAX on a persistent memory
1655 device through the PMDK libpmem library.
1656
1657 ime_psync
1658 Synchronous read and write using DDN's Infinite
1659 Memory Engine (IME). This engine is very basic and
1660 issues calls to IME whenever an IO is queued.
1661
1662 ime_psyncv
1663 Synchronous read and write using DDN's Infinite
1664 Memory Engine (IME). This engine uses iovecs and
1665 will try to stack as many IOs as possible (if the
1666 IOs are "contiguous" and the IO depth is not
1667 exceeded) before issuing a call to IME.
1668
1669 ime_aio
1670 Asynchronous read and write using DDN's Infinite
1671 Memory Engine (IME). This engine will try to stack
1672 as many IOs as possible by creating requests for
1673 IME. FIO will then decide when to commit these
1674 requests.
1675
1676 libiscsi
1677 Read and write an iSCSI LUN with libiscsi.
1678
1679 nbd Synchronous read and write to a Network Block Device
1680 (NBD).
1681
1682 I/O engine specific parameters
1683 In addition, there are some parameters which are only valid when a spe‐
1684 cific ioengine is in use. These are used identically to normal parame‐
1685 ters, with the caveat that when used on the command line, they must
1686 come after the ioengine that defines them is selected.
1687
1688 (io_uring,libaio)cmdprio_percentage=int
1689 Set the percentage of I/O that will be issued with higher prior‐
1690 ity by setting the priority bit. Non-read I/O is likely unaf‐
1691 fected by ``cmdprio_percentage``. This option cannot be used
1692 with the `prio` or `prioclass` options. For this option to set
1693 the priority bit properly, NCQ priority must be supported and
1694 enabled and `direct=1' option must be used.
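
As a minimal sketch of the constraints above, the following job fragment asks fio to issue about 10% of reads with the priority bit set. The job name, device path and percentage are illustrative assumptions, not values from this manual; the device is assumed to support NCQ priority and to have it enabled.

```ini
# Hypothetical job: /dev/sdX is a placeholder for an NCQ-priority capable device.
[prio-reads]
ioengine=libaio
# direct=1 is required for the priority bit to be set properly.
direct=1
rw=randread
bs=4k
iodepth=32
filename=/dev/sdX
# About 10% of I/Os get the priority bit; do not combine with prio or prioclass.
cmdprio_percentage=10
```

On the command line, the same option would have to appear after `--ioengine=libaio', as noted at the start of this section.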
1695
1696 (io_uring)fixedbufs
1697 If fio is asked to do direct IO, then Linux will map pages for
1698 each IO call, and release them when IO is done. If this option
1699 is set, the pages are pre-mapped before IO is started. This
1700 eliminates the need to map and release for each IO. This is
1701 more efficient, and reduces the IO latency as well.
1702
1703 (io_uring)hipri
1704 If this option is set, fio will attempt to use polled IO comple‐
1705 tions. Normal IO completions generate interrupts to signal the
1706 completion of IO; polled completions do not. Hence they
1707 require active reaping by the application. The benefits are
1708 more efficient IO for high IOPS scenarios, and lower latencies
1709 for low queue depth IO.
1710
1711 (io_uring)registerfiles
1712 With this option, fio registers the set of files being used with
1713 the kernel. This avoids the overhead of managing file counts in
1714 the kernel, making the submission and completion part more
1715 lightweight. Required for the below sqthread_poll option.
1716
1717 (io_uring)sqthread_poll
1718 Normally fio will submit IO by issuing a system call to notify
1719 the kernel of available items in the SQ ring. If this option is
1720 set, the act of submitting IO will be done by a polling thread
1721 in the kernel. This frees up cycles for fio, at the cost of
1722 using more CPU in the system.
1723
1724 (io_uring)sqthread_poll_cpu
1725 When `sqthread_poll` is set, this option provides a way to
1726 define which CPU should be used for the polling thread.
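
The io_uring options above can be combined in one job. The fragment below is a sketch under assumptions: the job name, file path and CPU number are placeholders, and the kernel is assumed to support each feature. The flags are written with an explicit `=1', which fio accepts for set-style options.

```ini
# Hypothetical job: the path and CPU number are placeholders.
[uring-sqpoll]
ioengine=io_uring
direct=1
rw=randread
bs=4k
iodepth=64
size=1g
filename=/mnt/test/fio.dat
# Pre-map the I/O buffers once instead of mapping pages per I/O.
fixedbufs=1
# Register the file set with the kernel; required for sqthread_poll.
registerfiles=1
# Let a kernel thread poll the SQ ring instead of issuing a submit system call.
sqthread_poll=1
# Pin that polling thread to CPU 2.
sqthread_poll_cpu=2
```

hipri is left out here because polled completions depend on device and driver support.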
1727
1728 (libaio)userspace_reap
1729 Normally, with the libaio engine in use, fio will use the
1730 io_getevents(3) system call to reap newly returned events. With
1731 this flag turned on, the AIO ring will be read directly from
1732 user-space to reap events. The reaping mode is only enabled when
1733 polling for a minimum of 0 events (e.g. when `iodepth_batch_com‐
1734 plete=0').
1735
1736 (pvsync2)hipri
1737 Set RWF_HIPRI on I/O, indicating to the kernel that it's of
1738 higher priority than normal.
1739
1740 (pvsync2)hipri_percentage
1741 When hipri is set this determines the probability of a pvsync2
1742 I/O being high priority. The default is 100%.
1743
1744 (cpuio)cpuload=int
1745 Attempt to use the specified percentage of CPU cycles. This is a
1746 mandatory option when using cpuio I/O engine.
1747
1748 (cpuio)cpuchunks=int
1749 Split the load into cycles of the given time. In microseconds.
1750
1751 (cpuio)exit_on_io_done=bool
1752 Detect when I/O threads are done, then exit.
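
A small sketch of a cpuio job follows. The job name and numbers are assumptions chosen for illustration; time_based and runtime are general fio options documented elsewhere in this manual.

```ini
# Hypothetical job that burns CPU without doing any I/O.
[burn-half]
ioengine=cpuio
# Mandatory for cpuio: aim for roughly 50% of CPU cycles.
cpuload=50
# Split the load into cycles of 50000 microseconds (50 ms).
cpuchunks=50000
time_based
runtime=10
```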
1753
1754 (libhdfs)namenode=str
1755 The hostname or IP address of a HDFS cluster namenode to con‐
1756 tact.
1757
1758 (libhdfs)port
1759 The listening port of the HDFS cluster namenode.
1760
1761 (netsplice,net)port
1762 The TCP or UDP port to bind to or connect to. If this is used
1763 with numjobs to spawn multiple instances of the same job type,
1764 then this will be the starting port number since fio will use a
1765 range of ports.
1766
1767 (rdma)port
1768 The port to use for RDMA-CM communication. This should be the
1769 same value on the client and the server side.
1770
1771 (netsplice,net,rdma)hostname=str
1772 The hostname or IP address to use for TCP, UDP or RDMA-CM based
1773 I/O. If the job is a TCP listener or UDP reader, the hostname
1774 is not used and must be omitted unless it is a valid UDP multi‐
1775 cast address.
1776
1777 (netsplice,net)interface=str
1778 The IP address of the network interface used to send or receive
1779 UDP multicast.
1780
1781 (netsplice,net)ttl=int
1782 Time-to-live value for outgoing UDP multicast packets. Default:
1783 1.
1784
1785 (netsplice,net)nodelay=bool
1786 Set TCP_NODELAY on TCP connections.
1787
1788 (netsplice,net)protocol=str, proto=str
1789 The network protocol to use. Accepted values are:
1790
1791 tcp Transmission control protocol.
1792
1793 tcpv6 Transmission control protocol V6.
1794
1795 udp User datagram protocol.
1796
1797 udpv6 User datagram protocol V6.
1798
1799 unix UNIX domain socket.
1800
1801 When the protocol is TCP or UDP, the port must also be given, as
1802 well as the hostname if the job is a TCP listener or UDP reader.
1803 For unix sockets, the normal filename option should be used and
1804 the port is invalid.
1805
1806 (netsplice,net)listen
1807 For TCP network connections, tell fio to listen for incoming
1808 connections rather than initiating an outgoing connection. The
1809 hostname must be omitted if this option is used.
1810
1811 (netsplice,net)pingpong
1812 Normally a network writer will just continue writing data, and a
1813 network reader will just consume packets. If `pingpong=1' is
1814 set, a writer will send its normal payload to the reader, then
1815 wait for the reader to send the same payload back. This allows
1816 fio to measure network latencies. The submission and completion
1817 latencies then measure local time spent sending or receiving,
1818 and the completion latency measures how long it took for the
1819 other end to receive and send back. For UDP multicast traffic
1820 `pingpong=1' should only be set for a single reader when multi‐
1821 ple readers are listening to the same address.
1822
1823 (netsplice,net)window_size=int
1824 Set the desired socket buffer size for the connection.
1825
1826 (netsplice,net)mss=int
1827 Set the TCP maximum segment size (TCP_MAXSEG).
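
To illustrate how the net options fit together, here is a sketch of a TCP sender and receiver pair running on one host. The job names, port number and sizes are assumptions; pingpong is set on both sides in this sketch so that each payload is echoed back.

```ini
# Hypothetical loopback test; port 8888 and the sizes are placeholders.
[global]
ioengine=net
protocol=tcp
port=8888
bs=4k
size=100m
# Echo each payload back so completion latency reflects a round trip.
pingpong=1

[receiver]
# TCP listener: the hostname must be omitted here.
listen
rw=read

[sender]
hostname=localhost
# Give the listener a moment to start before connecting.
startdelay=1
rw=write
```

For a UNIX domain socket, `protocol=unix' would be used with the normal filename option instead of port and hostname.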
1828
1829 (e4defrag)donorname=str
1830 File will be used as a block donor (swap extents between files).
1831
1832 (e4defrag)inplace=int
1833 Configure donor file blocks allocation strategy:
1834
1835 0 Default. Preallocate donor's file on init.
1836
1837 1 Allocate space immediately inside defragment
1838 event, and free right after event.
1839
1840 (rbd,rados)clustername=str
1841 Specifies the name of the Ceph cluster.
1842
1843 (rbd)rbdname=str
1844 Specifies the name of the RBD.
1845
1846 (rbd,rados)pool=str
1847 Specifies the name of the Ceph pool containing RBD or RADOS
1848 data.
1849
1850 (rbd,rados)clientname=str
1851 Specifies the username (without the 'client.' prefix) used to
1852 access the Ceph cluster. If the clustername is specified, the
1853 clientname shall be the full *type.id* string. If no type. pre‐
1854 fix is given, fio will add 'client.' by default.
1855
1856 (rbd,rados)busy_poll=bool
1857 Poll store instead of waiting for completion. Usually this
1858 provides better throughput at the cost of higher (up to 100%) CPU
1859 utilization.
1860
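A minimal rbd job might look like the sketch below. The client, pool and image names are placeholders; the image is assumed to exist already and the Ceph configuration is assumed to be reachable by librbd.

```ini
# Hypothetical Ceph setup: client.admin, pool "rbd", image "fio_test".
[global]
ioengine=rbd
# Username without the 'client.' prefix.
clientname=admin
pool=rbd
rbdname=fio_test
rw=randwrite
bs=4k

[rbd-iodepth32]
iodepth=32
```
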
1861 (http)http_host=str
1862 Hostname to connect to. For S3, this could be the bucket name.
1863 Default is localhost.
1864
1865 (http)http_user=str
1866 Username for HTTP authentication.
1867
1868 (http)http_pass=str
1869 Password for HTTP authentication.
1870
1871 (http)https=str
1872 Whether to use HTTPS instead of plain HTTP. on enables HTTPS;
1873 insecure will enable HTTPS, but disable SSL peer verification
1874 (use with caution!). Default is off.
1875
1876 (http)http_mode=str
1877 Which HTTP access mode to use: webdav, swift, or s3. Default is
1878 webdav.
1879
1880 (http)http_s3_region=str
1881 The S3 region/zone to include in the request. Default is us-
1882 east-1.
1883
1884 (http)http_s3_key=str
1885 The S3 secret key.
1886
1887 (http)http_s3_keyid=str
1888 The S3 key/access id.
1889
1890 (http)http_swift_auth_token=str
1891 The Swift auth token. See the example configuration file on how
1892 to retrieve this.
1893
1894 (http)http_verbose=int
1895 Enable verbose requests from libcurl. Useful for debugging. 1
1896 turns on verbose logging from libcurl, 2 additionally enables
1897 HTTP IO tracing. Default is 0
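
The http options can be sketched as an S3 upload job. Everything specific here is an assumption: the host, region, bucket and object path are placeholders, and the credentials are read from environment variables named S3_KEY and S3_ID through fio's `${VAR}' job-file expansion.

```ini
# Hypothetical S3 target; replace host, region and path with real values.
[s3-put]
ioengine=http
# The http engine only supports direct I/O at iodepth=1; scale with numjobs.
direct=1
iodepth=1
https=on
http_mode=s3
http_host=s3.example-region.amazonaws.com
http_s3_region=example-region
http_s3_key=${S3_KEY}
http_s3_keyid=${S3_ID}
# Bucket and object prefix.
filename=/example-bucket/object
rw=write
# blocksize defines the size of each object created.
bs=1m
size=16m
```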
1898
1899 (mtd)skip_bad=bool
1900 Skip operations against known bad blocks.
1901
1902 (libhdfs)hdfsdirectory
1903 libhdfs will create chunks in this HDFS directory.
1904
1905 (libhdfs)chunk_size
1906 The size of the chunk to use for each file.
1907
1908 (rdma)verb=str
1909 The RDMA verb to use on this side of the RDMA ioengine connec‐
1910 tion. Valid values are write, read, send and recv. These corre‐
1911 spond to the equivalent RDMA verbs (e.g. write = rdma_write
1912 etc.). Note that this only needs to be specified on the client
1913 side of the connection. See the examples folder.
1914
1915 (rdma)bindname=str
1916 The name to use to bind the local RDMA-CM connection to a local
1917 RDMA device. This could be a hostname or an IPv4 or IPv6
1918 address. On the server side this will be passed into the
1919 rdma_bind_addr() function and on the client side it will be used
1920 in the rdma_resolve_addr() function. This can be useful when
1921 multiple paths exist between the client and the server or in
1922 certain loopback configurations.
1923
1924 (filestat)stat_type=str
1925 Specify stat system call type to measure lookup/getattr perfor‐
1926 mance. Default is stat for stat(2).
1927
1928 (sg)readfua=bool
1929 With readfua option set to 1, read operations include the force
1930 unit access (fua) flag. Default: 0.
1931
1932 (sg)writefua=bool
1933 With writefua option set to 1, write operations include the
1934 force unit access (fua) flag. Default: 0.
1935
1936 (sg)sg_write_mode=str
1937 Specify the type of write commands to issue. This option can
1938 take three values:
1939
1940 write (default)
1941 Write opcodes are issued as usual
1942
1943 verify Issue WRITE AND VERIFY commands. The BYTCHK bit is
1944 set to 0. This directs the device to carry out a
1945 medium verification with no data comparison. The
1946 writefua option is ignored with this selection.
1947
1948 same Issue WRITE SAME commands. This transfers a single
1949 block to the device and writes this same block of
1950 data to a contiguous sequence of LBAs beginning at
1951 the specified offset. fio's block size parameter
1952 specifies the amount of data written with each
1953 command. However, the amount of data actually
1954 transferred to the device is equal to the device's
1955 block (sector) size. For a device with 512 byte
1956 sectors, blocksize=8k will write 16 sectors with
1957 each command. fio will still generate 8k of data
1958 for each command but only the first 512 bytes will
1959 be used and transferred to the device. The write‐
1960 fua option is ignored with this selection.
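
The WRITE SAME case described above can be sketched as the following job. The device path is a placeholder and 512-byte sectors are assumed, so each 8k command covers 16 sectors. The job overwrites data on the target device.

```ini
# Hypothetical job: /dev/sgX is a placeholder SCSI generic device.
# WARNING: this overwrites data on the device.
[write-same]
ioengine=sg
filename=/dev/sgX
rw=write
# 8k per command; with 512-byte sectors that is 16 sectors written,
# although only the first 512 bytes are transferred to the device.
bs=8k
sg_write_mode=same
```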
1961
1962 (nbd)uri=str
1963 Specify the NBD URI of the server to test. The string is a
1964 standard NBD URI (see https://github.com/NetworkBlockDe‐
1965 vice/nbd/tree/master/doc). Example URIs:
1966
1967 nbd://localhost:10809
1968
1969 nbd+unix:///?socket=/tmp/socket
1970
1971 nbds://tlshost/exportname
1972
1973
1974 I/O depth
1975 iodepth=int
1976 Number of I/O units to keep in flight against the file. Note
1977 that increasing iodepth beyond 1 will not affect synchronous
1978 ioengines (except for small degrees when verify_async is in
1979 use). Even async engines may impose OS restrictions causing the
1980 desired depth not to be achieved. This may happen on Linux when
1981 using libaio and not setting `direct=1', since buffered I/O is
1982 not async on that OS. Keep an eye on the I/O depth distribution
1983 in the fio output to verify that the achieved depth is as
1984 expected. Default: 1.
1985
1986 iodepth_batch_submit=int, iodepth_batch=int
1987 This defines how many pieces of I/O to submit at once. It
1988 defaults to 1 which means that we submit each I/O as soon as it
1989 is available, but can be raised to submit bigger batches of I/O
1990 at a time. If it is set to 0 the iodepth value will be used.
1991
1992 iodepth_batch_complete_min=int, iodepth_batch_complete=int
1993 This defines how many pieces of I/O to retrieve at once. It
1994 defaults to 1 which means that we'll ask for a minimum of 1 I/O
1995 in the retrieval process from the kernel. The I/O retrieval will
1996 go on until we hit the limit set by iodepth_low. If this vari‐
1997 able is set to 0, then fio will always check for completed
1998 events before queuing more I/O. This helps reduce I/O latency,
1999 at the cost of more retrieval system calls.
2000
2001 iodepth_batch_complete_max=int
2002 This defines the maximum number of pieces of I/O to retrieve at once. This
2003 variable should be used along with iodepth_batch_com‐
2004 plete_min=int variable, specifying the range of min and max
2005 amount of I/O which should be retrieved. By default it is equal
2006 to iodepth_batch_complete_min value. Example #1:
2007
2008 iodepth_batch_complete_min=1
2009 iodepth_batch_complete_max=<iodepth>
2010
2011 which means that we will retrieve at least 1 I/O and up to the
2012 whole submitted queue depth. If no I/O has been completed
2013 yet, we will wait. Example #2:
2014
2015 iodepth_batch_complete_min=0
2016 iodepth_batch_complete_max=<iodepth>
2017
2018 which means that we can retrieve up to the whole submitted queue
2019 depth, but if no I/O has been completed yet, we will NOT
2020 wait and immediately exit the system call. In this example we
2021 simply do polling.
2022
2023 iodepth_low=int
2024 The low water mark indicating when to start filling the queue
2025 again. Defaults to the same as iodepth, meaning that fio will
2026 attempt to keep the queue full at all times. If iodepth is set
2027 to e.g. 16 and iodepth_low is set to 4, then after fio has
2028 filled the queue of 16 requests, it will let the depth drain
2029 down to 4 before starting to fill it again.
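
The queue depth options above interact, so a combined sketch may help. The job name, file path and numbers are assumptions chosen to mirror the 16 and 4 example in the text.

```ini
# Hypothetical job: the path and sizes are placeholders.
[batched-queue]
ioengine=libaio
# Buffered I/O is not async with libaio on Linux, so use direct I/O.
direct=1
rw=randread
bs=4k
size=1g
filename=/mnt/test/fio.dat
# Keep up to 16 I/Os in flight.
iodepth=16
# Submit in batches of up to 8.
iodepth_batch_submit=8
# Reap at least 1 and at most 16 completions per retrieval.
iodepth_batch_complete_min=1
iodepth_batch_complete_max=16
# Let the queue drain to 4 before refilling it.
iodepth_low=4
```

The I/O depth distribution in the fio output shows whether the requested depth was actually reached.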
2030
2031 serialize_overlap=bool
2032 Serialize in-flight I/Os that might otherwise cause or suffer
2033 from data races. When two or more I/Os are submitted simultane‐
2034 ously, there is no guarantee that the I/Os will be processed or
2035 completed in the submitted order. Further, if two or more of
2036 those I/Os are writes, any overlapping region between them can
2037 become indeterminate/undefined on certain storage. These issues
2038 can cause verification to fail erratically when at least one of
2039 the racing I/Os is changing data and the overlapping region has
2040 a non-zero size. Setting serialize_overlap tells fio to avoid
2041 provoking this behavior by explicitly serializing in-flight I/Os
2042 that have a non-zero overlap. Note that setting this option can
2043 reduce both performance and the iodepth achieved.
2044
2045 This option only applies to I/Os issued for a single job except
2046 when it is enabled along with io_submit_mode=offload. In offload
2047 mode, fio will check for overlap among all I/Os submitted by
2048 offload jobs with serialize_overlap enabled.
2049
2050 Default: false.
2051
2052 io_submit_mode=str
2053 This option controls how fio submits the I/O to the I/O engine.
2054 The default is `inline', which means that the fio job threads
2055 submit and reap I/O directly. If set to `offload', the job
2056 threads will offload I/O submission to a dedicated pool of I/O
2057 threads. This requires some coordination and thus has a bit of
2058 extra overhead, especially for lower queue depth I/O where it
2059 can increase latencies. The benefit is that fio can manage sub‐
2060 mission rates independently of the device completion rates. This
2061 avoids skewed latency reporting if I/O gets backed up on the
2062 device side (the coordinated omission problem).
2063
2064 I/O rate
2065 thinktime=time
2066 Stall the job for the specified period of time after an I/O has
2067 completed before issuing the next. May be used to simulate pro‐
2068 cessing being done by an application. When the unit is omitted,
2069 the value is interpreted in microseconds. See thinktime_blocks
2070 and thinktime_spin.
2071
2072 thinktime_spin=time
2073 Only valid if thinktime is set - pretend to spend CPU time doing
2074 something with the data received, before falling back to sleep‐
2075 ing for the rest of the period specified by thinktime. When the
2076 unit is omitted, the value is interpreted in microseconds.
2077
2078 thinktime_blocks=int
2079 Only valid if thinktime is set - control how many blocks to
2080 issue, before waiting thinktime usecs. If not set, defaults to 1
2081 which will make fio wait thinktime usecs after every block. This
2082 effectively makes any queue depth setting redundant, since no
2083 more than 1 I/O will be queued before we have to complete it and
2084 do our thinktime. In other words, this setting effectively caps
2085 the queue depth if the latter is larger.
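
As a sketch of how the three thinktime options combine, the job below issues 8 blocks, then pauses for 1000 microseconds, spending the first 200 of them spinning on the CPU. The job name and all values are illustrative assumptions.

```ini
# Hypothetical job modelling an application that processes data in bursts.
[think]
rw=randread
bs=4k
size=256m
# Stall for 1000 microseconds (no unit given, so microseconds).
thinktime=1000
# Burn CPU for the first 200 microseconds, then sleep for the rest.
thinktime_spin=200
# Stall after every 8 blocks rather than after every block.
thinktime_blocks=8
```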
2086
2087 rate=int[,int][,int]
2088 Cap the bandwidth used by this job. The number is in bytes/sec,
2089 the normal suffix rules apply. Comma-separated values may be
2090 specified for reads, writes, and trims as described in block‐
2091 size.
2092
2093 For example, using `rate=1m,500k' would limit reads to 1MiB/sec
2094 and writes to 500KiB/sec. Capping only reads or writes can be
2095 done with `rate=,500k' or `rate=500k,' where the former will
2096 only limit writes (to 500KiB/sec) and the latter will only limit
2097 reads.
2098
2099 rate_min=int[,int][,int]
2100 Tell fio to do whatever it can to maintain at least this band‐
2101 width. Failing to meet this requirement will cause the job to
2102 exit. Comma-separated values may be specified for reads, writes,
2103 and trims as described in blocksize.
2104
2105 rate_iops=int[,int][,int]
2106 Cap the bandwidth to this number of IOPS. Basically the same as
2107 rate, just specified independently of bandwidth. If the job is
2108 given a block size range instead of a fixed value, the smallest
2109 block size is used as the metric. Comma-separated values may be
2110 specified for reads, writes, and trims as described in block‐
2111 size.
2112
2113 rate_iops_min=int[,int][,int]
2114 If fio doesn't meet this rate of I/O, it will cause the job to
2115 exit. Comma-separated values may be specified for reads,
2116 writes, and trims as described in blocksize.
2117
2118 rate_process=str
2119 This option controls how fio manages rated I/O submissions. The
2120 default is `linear', which submits I/O in a linear fashion with
2121 fixed delays between I/Os that get adjusted based on I/O
2122 completion rates. If this is set to `poisson', fio will submit I/O
2123 based on a more real world random request flow, known as the
2124 Poisson process (https://en.wikipedia.org/wiki/Pois‐
2125 son_point_process). The lambda will be 10^6 / IOPS for the given
2126 workload.
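
A rate-limited job using Poisson arrivals could be sketched as follows. The job name and rate are assumptions; with rate_iops=1000, the formula above gives a lambda of 10^6 / 1000 = 1000.

```ini
# Hypothetical job: 1000 IOPS with randomized arrival times.
[poisson-arrivals]
rw=randread
bs=4k
size=1g
rate_iops=1000
# Replace the default fixed-delay ('linear') pacing.
rate_process=poisson
```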
2127
2128 rate_ignore_thinktime=bool
2129 By default, fio will attempt to catch up to the specified rate
2130 setting, if any kind of thinktime setting was used. If this
2131 option is set, then fio will ignore the thinktime and continue
2132 doing IO at the specified rate, instead of entering a catch-up
2133 mode after thinktime is done.
2134
2135 I/O latency
2136 latency_target=time
2137 If set, fio will attempt to find the max performance point that
2138 the given workload will run at while maintaining a latency below
2139 this target. When the unit is omitted, the value is interpreted
2140 in microseconds. See latency_window and latency_percentile.
2141
2142 latency_window=time
2143 Used with latency_target to specify the sample window that the
2144 job is run at varying queue depths to test the performance. When
2145 the unit is omitted, the value is interpreted in microseconds.
2146
2147 latency_percentile=float
2148 The percentage of I/Os that must fall within the criteria speci‐
2149 fied by latency_target and latency_window. If not set, this
2150 defaults to 100.0, meaning that all I/Os must be equal to or below
2151 the value set by latency_target.
2152
2153 max_latency=time
2154 If set, fio will exit the job with an ETIMEDOUT error if it
2155 exceeds this maximum latency. When the unit is omitted, the
2156 value is interpreted in microseconds.
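
The latency targeting options above are normally used together. The sketch below assumes an async engine and treats iodepth as the ceiling for the queue depths fio tries; the job name, path and all numbers are illustrative.

```ini
# Hypothetical latency-profiling job; values are in microseconds.
[latency-profile]
ioengine=libaio
direct=1
rw=randread
bs=4k
size=1g
filename=/mnt/test/fio.dat
iodepth=128
# Target: latency at or below 10 ms.
latency_target=10000
# Evaluate over 5 second sample windows.
latency_window=5000000
# 99.9% of I/Os must meet the target.
latency_percentile=99.9
```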
2157
2158 rate_cycle=int
2159 Average bandwidth for rate and rate_min over this number of mil‐
2160 liseconds. Defaults to 1000.
2161
2162 I/O replay
2163 write_iolog=str
2164 Write the issued I/O patterns to the specified file. See
2165 read_iolog. Specify a separate file for each job, otherwise the
2166 iologs will be interspersed and the file may be corrupt.
2167
2168 read_iolog=str
2169 Open an iolog with the specified filename and replay the I/O
2170 patterns it contains. This can be used to store a workload and
2171 replay it sometime later. The iolog given may also be a blktrace
2172 binary file, which allows fio to replay a workload captured by
2173 blktrace. See blktrace(8) for how to capture such logging data.
2174 For blktrace replay, the file needs to be turned into a blkparse
2175 binary data file first (`blkparse <device> -o /dev/null -d
2176 file_for_fio.bin'). You can specify a number of files by sepa‐
2177 rating the names with a ':' character. See the filename option
2178 for information on how to escape ':' characters within the file
2179 names. These files will be sequentially assigned to job clones
2180 created by numjobs.
2181
2182 read_iolog_chunked=bool
2183 Determines how the iolog is read. If false (default), the entire
2184 read_iolog is read at once. If true, input from the iolog is read
2185 gradually. Useful when the iolog is very large, or when it is
2186 generated.
2187
2188 merge_blktrace_file=str
2189 When specified, rather than replaying the logs passed to
2190 read_iolog, the logs go through a merge phase which aggregates
2191 them into a single blktrace. The resulting file is then passed
2192 on as the read_iolog parameter. The intention here is to make
2193 the order of events consistent. This limits the influence of the
2194 scheduler compared to replaying multiple blktraces via concur‐
2195 rent jobs.
2196
2197 merge_blktrace_scalars=float_list
2198 This is a percentage based option that is index paired with the
2199 list of files passed to read_iolog. When merging is performed,
2200 scale the time of each event by the corresponding amount. For
2201 example, `--merge_blktrace_scalars="50:100"' runs the first
2202 trace in halftime and the second trace in realtime. This knob is
2203 separately tunable from replay_time_scale which scales the trace
2204 during runtime and will not change the output of the merge
2205 unlike this option.
2206
2207 merge_blktrace_iters=float_list
2208 This is a whole number option that is index paired with the list
2209 of files passed to read_iolog. When merging is performed, run
2210 each trace for the specified number of iterations. For example,
2211 `--merge_blktrace_iters="2:1"' runs the first trace for two
2212 iterations and the second trace for one iteration.
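
The merge options can be sketched as one job. The trace and output file names are placeholders; both inputs are assumed to be blkparse binary dumps as described under read_iolog.

```ini
# Hypothetical trace files; the names are placeholders.
[merge-and-replay]
# Two input traces, index paired with the lists below.
read_iolog=trace_a.bin:trace_b.bin
# Merge them into a single blktrace before replaying.
merge_blktrace_file=merged.bin
# Scale event times: first trace by 50%, second by 100%.
merge_blktrace_scalars=50:100
# Run the first trace for two iterations and the second for one.
merge_blktrace_iters=2:1
```

Running fio with `--merge-blktrace-only' performs the merge without starting any I/O.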
2213
2214 replay_no_stall=bool
2215 When replaying I/O with read_iolog the default behavior is to
2216 attempt to respect the timestamps within the log and replay them
2217 with the appropriate delay between IOPS. By setting this vari‐
2218 able fio will not respect the timestamps and attempt to replay
2219 them as fast as possible while still respecting ordering. The
2220 result is the same I/O pattern to a given device, but different
2221 timings.
2222
2223 replay_time_scale=int
2224 When replaying I/O with read_iolog, fio will honor the original
2225 timing in the trace. With this option, it's possible to scale
2226 the time. This is a percentage option: if set to 50, fio runs at
2227 50% of the original IO rate in the trace. If set to 200, it runs at
2228 twice the original IO rate. Defaults to 100.
2229
2230 replay_redirect=str
2231 While replaying I/O patterns using read_iolog the default behav‐
2232 ior is to replay the IOPS onto the major/minor device that each
2233 IOP was recorded from. This is sometimes undesirable because on
2234 a different machine those major/minor numbers can map to a dif‐
2235 ferent device. Changing hardware on the same system can also
2236 result in a different major/minor mapping. replay_redirect
2237 causes all I/Os to be replayed onto the single specified device
2238 regardless of the device it was recorded from. i.e. `replay_re‐
2239 direct=/dev/sdc' would cause all I/O in the blktrace or iolog to
2240 be replayed onto `/dev/sdc'. This means multiple devices will be
2241 replayed onto a single device, if the trace contains multiple
2242 devices. If you want multiple devices to be replayed concur‐
2243 rently to multiple redirected devices you must blkparse your
2244 trace into separate traces and replay them with independent fio
2245 invocations. Unfortunately this also breaks the strict time
2246 ordering between multiple device accesses.
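
A replay job that redirects every I/O to one device might be sketched like this. The file and device names are taken from the examples in the text; the job name is an assumption.

```ini
# Replays a blkparse binary dump onto /dev/sdc regardless of the
# device each I/O was recorded from.
[replay-on-sdc]
read_iolog=file_for_fio.bin
replay_redirect=/dev/sdc
# Ignore recorded timestamps and replay as fast as possible, in order.
replay_no_stall=1
```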
2247
2248 replay_align=int
2249 Force alignment of the byte offsets in a trace to this value.
2250 The value must be a power of 2.
2251
2252 replay_scale=int
2253 Scale byte offsets down by this factor when replaying traces.
2254 Should most likely use replay_align as well.
2255
2256 replay_skip=str
2257 Sometimes it's useful to skip certain IO types in a replay
2258 trace. This could be, for instance, eliminating the writes in
2259 the trace, or not replaying the trims/discards if you are
2260 redirecting to a device that doesn't support them. This option
2261 takes a comma separated list of read, write, trim, sync.
2262
2263 Threads, processes and job synchronization
2264 thread Fio defaults to creating jobs by using fork, however if this
2265 option is given, fio will create jobs by using POSIX Threads'
2266 function pthread_create(3) to create threads instead.
2267
2268 wait_for=str
2269 If set, the current job won't be started until all workers of
2270 the specified waitee job are done. wait_for operates on the job
2271 name basis, so there are a few limitations. First, the waitee
2272 must be defined prior to the waiter job (meaning no forward ref‐
2273 erences). Second, if a job is being referenced as a waitee, it
2274 must have a unique name (no duplicate waitees).
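
The ordering rules for wait_for can be sketched with two jobs. The job names and file path are assumptions; the waitee is defined first and has a unique name, as required above.

```ini
# Hypothetical two-phase test: lay out the file, then measure reads.
[prepare]
rw=write
bs=1m
size=1g
filename=/mnt/test/fio.dat

[measure]
# Do not start until all workers of 'prepare' are done.
wait_for=prepare
rw=randread
bs=4k
size=1g
filename=/mnt/test/fio.dat
```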
2275
2276 nice=int
2277 Run the job with the given nice value. See man nice(2). On Win‐
2278 dows, values less than -15 set the process class to "High"; -1
2279 through -15 set "Above Normal"; 1 through 15 "Below Normal"; and
2280 above 15 "Idle" priority class.
2281
2282 prio=int
2283 Set the I/O priority value of this job. Linux limits us to a
2284 positive value between 0 and 7, with 0 being the highest. See
2285 man ionice(1). Refer to an appropriate manpage for other operat‐
2286 ing systems since meaning of priority may differ. For per-com‐
2287 mand priority setting, see I/O engine specific `cmdprio_percent‐
2288 age` and `hipri_percentage` options.
2289
2290 prioclass=int
2291 Set the I/O priority class. See man ionice(1). For per-command
2292 priority setting, see I/O engine specific `cmdprio_percentage`
2293 and `hipri_percentage` options.
2294
2295 cpus_allowed=str
2296 Controls the same options as cpumask, but accepts a textual
2297 specification of the permitted CPUs instead and CPUs are indexed
2298 from 0. So to use CPUs 0 and 5 you would specify
2299 `cpus_allowed=0,5'. This option also allows a range of CPUs to
2300 be specified -- say you wanted a binding to CPUs 0, 5, and 8 to
2301 15, you would set `cpus_allowed=0,5,8-15'.
2302
2303 On Windows, when `cpus_allowed' is unset only CPUs from fio's
2304 current processor group will be used and affinity settings are
2305 inherited from the system. An fio build configured to target
2306 Windows 7 makes options that set CPUs processor group aware and
2307 values will set both the processor group and a CPU from within
2308 that group. For example, on a system where processor group 0 has
2309 40 CPUs and processor group 1 has 32 CPUs, `cpus_allowed' values
2310 between 0 and 39 will bind CPUs from processor group 0 and
2311 `cpus_allowed' values between 40 and 71 will bind CPUs from pro‐
2312 cessor group 1. When using `cpus_allowed_policy=shared' all CPUs
2313 specified by a single `cpus_allowed' option must be from the
2314 same processor group. For Windows fio builds not built for Win‐
2315 dows 7, CPUs will only be selected from (and be relative to)
2316 whatever processor group fio happens to be running in and CPUs
2317 from other processor groups cannot be used.
2318
2319 cpus_allowed_policy=str
2320 Set the policy of how fio distributes the CPUs specified by
2321 cpus_allowed or cpumask. Two policies are supported:
2322
2323 shared All jobs will share the CPU set specified.
2324
2325 split Each job will get a unique CPU from the CPU set.
2326
2327 shared is the default behavior, if the option isn't specified.
2328 If split is specified, then fio will assign one CPU per job. If
2329 not enough CPUs are given for the jobs listed, then fio will
2330 round-robin the CPUs in the set.
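
As a rough illustration of the split policy, the Python sketch below hands out one CPU per job and wraps around when the set is exhausted. The function name and the assumption that CPUs are handed out in ascending order are illustrative only; fio's actual assignment order is not specified here.

```python
def split_cpus(cpus, num_jobs):
    """Assign one CPU per job, wrapping around the set when it runs out.

    Hypothetical model of cpus_allowed_policy=split; assumes CPUs are
    handed out in the order given.
    """
    if not cpus:
        raise ValueError("CPU set must not be empty")
    return [cpus[job % len(cpus)] for job in range(num_jobs)]


# Three CPUs shared among five jobs: the fourth job wraps to the first CPU.
print(split_cpus([0, 5, 8], 5))
```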
2331
2332 cpumask=int
2333 Set the CPU affinity of this job. The parameter given is a bit
2334 mask of allowed CPUs the job may run on. So if you want the
2335 allowed CPUs to be 1 and 5, you would pass the decimal value of
2336 (1 << 1 | 1 << 5), or 34. See man sched_setaffinity(2). This may
2337 not work on all supported operating systems or kernel versions.
2338 This option doesn't work well for a higher CPU count than what
2339 you can store in an integer mask, so it can only control cpus
2340 1-32. For boxes with larger CPU counts, use cpus_allowed.
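
The decimal value to pass can be computed from a list of CPU indexes. The Python sketch below is only an illustration of the bit arithmetic described above; the helper name cpumask_for is hypothetical.

```python
def cpumask_for(cpus):
    """Return the decimal bit mask with one bit set per CPU index."""
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError("CPU index must be non-negative")
        mask |= 1 << cpu
    return mask


# CPUs 1 and 5 give (1 << 1 | 1 << 5).
print(cpumask_for([1, 5]))
```

The printed value can be used directly, e.g. `cpumask=34'.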
2341
2342 numa_cpu_nodes=str
2343 Set this job running on specified NUMA nodes' CPUs. The argu‐
2344 ments allow comma delimited list of cpu numbers, A-B ranges, or
2345 `all'. Note, to enable NUMA options support, fio must be built
2346 on a system with libnuma-dev(el) installed.
2347
2348 numa_mem_policy=str
2349 Set this job's memory policy and corresponding NUMA nodes. For‐
2350 mat of the arguments:
2351
2352 <mode>[:<nodelist>]
2353
2354 `mode' is one of the following memory policies: `default', `pre‐
2355 fer', `bind', `interleave' or `local'. For `default' and `local'
2356 memory policies, no node needs to be specified. For `prefer',
2357 only one node is allowed. For `bind' and `interleave' the
2358 `nodelist' may be as follows: a comma delimited list of numbers,
2359 A-B ranges, or `all'.
2360
2361 cgroup=str
2362 Add job to this control group. If it doesn't exist, it will be
2363 created. The system must have a mounted cgroup blkio mount point
2364 for this to work. If your system doesn't have it mounted, you
2365 can do so with:
2366
2367 # mount -t cgroup -o blkio none /cgroup
2368
2369 cgroup_weight=int
2370 Set the weight of the cgroup to this value. See the
2371 documentation that comes with the kernel. Allowed values are in
2372 the range of 100..1000.
2373
2374 cgroup_nodelete=bool
2375 Normally fio will delete the cgroups it has created after the
2376 job completion. To override this behavior and to leave cgroups
2377 around after the job completion, set `cgroup_nodelete=1'. This
2378 can be useful if one wants to inspect various cgroup files after
2379 job completion. Default: false.
2380
2381 flow_id=int
2382 The ID of the flow. If not specified, it defaults to being a
2383 global flow. See flow.
2384
2385 flow=int
2386 Weight in token-based flow control. If this value is used, then
2387 there is a 'flow counter' which is used to regulate the propor‐
2388 tion of activity between two or more jobs. Fio attempts to keep
2389 this flow counter near zero. The flow parameter stands for how
2390 much should be added or subtracted to the flow counter on each
2391 iteration of the main I/O loop. That is, if one job has `flow=8'
2392 and another job has `flow=-1', then there will be a roughly 1:8
2393 ratio in how much one runs vs the other.
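
The ratio can be illustrated with a toy model in Python. This is not fio's scheduler: it assumes a single shared counter and, as a simplification, always runs whichever job leaves the counter closest to zero. The function name simulate_flow is hypothetical.

```python
def simulate_flow(flows, iterations):
    """Toy model of a shared flow counter.

    Each iteration runs the job whose flow value leaves the counter
    closest to zero, then adds that value to the counter. Returns how
    many times each job ran.
    """
    counter = 0
    runs = [0] * len(flows)
    for _ in range(iterations):
        idx = min(range(len(flows)), key=lambda i: abs(counter + flows[i]))
        counter += flows[idx]
        runs[idx] += 1
    return runs


# One job with flow=8 and one with flow=-1.
print(simulate_flow([8, -1], 900))
```

In this model the job with `flow=-1' runs eight times for every run of the job with `flow=8', matching the roughly 1:8 ratio described above.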
2394
2395 flow_watermark=int
2396 The maximum value that the absolute value of the flow counter is
2397 allowed to reach before the job must wait for a lower value of
2398 the counter.
2399
2400 flow_sleep=int
2401 The period of time, in microseconds, to wait after the flow
2402 watermark has been exceeded before retrying operations.
2403
2404 stonewall, wait_for_previous
2405 Wait for preceding jobs in the job file to exit, before starting
2406 this one. Can be used to insert serialization points in the job
2407 file. A stone wall also implies starting a new reporting group,
2408 see group_reporting.
2409
2410 exitall
2411 By default, fio will continue running all other jobs when one
2412 job finishes. Sometimes this is not the desired action. Setting
2413 exitall will instead make fio terminate all jobs in the same
2414 group, as soon as one job of that group finishes.
2415
2416 exit_what
2417 By default, fio will continue running all other jobs when one
2418 job finishes. Sometimes this is not the desired action. Setting
2419 exitall will instead make fio terminate all jobs in the same
2420 group. The option exit_what allows you to control which jobs get
2421 terminated when exitall is enabled. The default is group and
2422 does not change the behaviour of exitall. The setting all termi‐
2423 nates all jobs. The setting stonewall terminates all currently
2424 running jobs across all groups and continues execution with the
2425 next stonewalled group.
2426
2427 exec_prerun=str
2428 Before running this job, issue the command specified through
2429 system(3). Output is redirected to a file called
2430 `jobname.prerun.txt'.
2431
2432 exec_postrun=str
2433 After the job completes, issue the command specified through
2434 system(3). Output is redirected to a file called
2435 `jobname.postrun.txt'.
2436
2437 uid=int
2438 Instead of running as the invoking user, set the user ID to this
2439 value before the thread/process does any work.
2440
2441 gid=int
2442 Set group ID, see uid.
2443
2444 Verification
2445 verify_only
2446 Do not perform specified workload, only verify data still
2447 matches previous invocation of this workload. This option allows
2448 one to check data multiple times at a later date without over‐
2449 writing it. This option makes sense only for workloads that
2450 write data, and does not support workloads with the time_based
2451 option set.
2452
2453 do_verify=bool
2454 Run the verify phase after a write phase. Only valid if verify
2455 is set. Default: true.
2456
2457 verify=str
2458 If writing to a file, fio can verify the file contents after
2459 each iteration of the job. Each verification method also implies
2460 verification of special header, which is written to the begin‐
2461 ning of each block. This header also includes meta information,
2462 like offset of the block, block number, timestamp when block was
2463 written, etc. verify can be combined with verify_pattern option.
2464 The allowed values are:
2465
2466 md5 Use an md5 sum of the data area and store it in
2467 the header of each block.
2468
2469 crc64 Use an experimental crc64 sum of the data area and
2470 store it in the header of each block.
2471
2472 crc32c Use a crc32c sum of the data area and store it in
2473 the header of each block. This will automatically
2474 use hardware acceleration (e.g. SSE4.2 on an x86
2475 or CRC crypto extensions on ARM64) but will fall
2476 back to software crc32c if none is found. Gener‐
2477 ally the fastest checksum fio supports when hard‐
2478 ware accelerated.
2479
2480 crc32c-intel
2481 Synonym for crc32c.
2482
2483 crc32 Use a crc32 sum of the data area and store it in
2484 the header of each block.
2485
2486 crc16 Use a crc16 sum of the data area and store it in
2487 the header of each block.
2488
2489 crc7 Use a crc7 sum of the data area and store it in
2490 the header of each block.
2491
2492 xxhash Use xxhash as the checksum function. Generally the
2493 fastest software checksum that fio supports.
2494
2495 sha512 Use sha512 as the checksum function.
2496
2497 sha256 Use sha256 as the checksum function.
2498
2499 sha1 Use optimized sha1 as the checksum function.
2500
2501 sha3-224
2502 Use optimized sha3-224 as the checksum function.
2503
2504 sha3-256
2505 Use optimized sha3-256 as the checksum function.
2506
2507 sha3-384
2508 Use optimized sha3-384 as the checksum function.
2509
2510 sha3-512
2511 Use optimized sha3-512 as the checksum function.
2512
2513 meta This option is deprecated, since now meta informa‐
2514 tion is included in generic verification header
2515 and meta verification happens by default. For
2516 detailed information see the description of the
2517 verify setting. This option is kept because of
2518 compatibility's sake with old configurations. Do
2519 not use it.
2520
2521 pattern
2522 Verify a strict pattern. Normally fio includes a
2523 header with some basic information and checksum‐
2524 ming, but if this option is set, only the specific
2525 pattern set with verify_pattern is verified.
2526
2527 null Only pretend to verify. Useful for testing inter‐
2528 nals with `ioengine=null', not for much else.
2529
2530 This option can be used for repeated burn-in tests of a system
2531 to make sure that the written data is also correctly read back.
2532 If the data direction given is a read or random read, fio will
2533 assume that it should verify a previously written file. If the
2534 data direction includes any form of write, the verify will be of
2535 the newly written data.
2536
2537 To avoid false verification errors, do not use the norandommap
2538 option when verifying data with async I/O engines and I/O depths
2539 > 1. Or use the norandommap and the lfsr random generator
2540 together to avoid writing to the same offset with multiple
2541 outstanding I/Os.
2542
2543 verify_offset=int
2544 Swap the verification header with data somewhere else in the
2545 block before writing. It is swapped back before verifying.
2546
2547 verify_interval=int
2548 Write the verification header at a finer granularity than the
2549 blocksize. It will be written for chunks the size of
2550 verify_interval, which should divide blocksize evenly.
2551
2552 verify_pattern=str
2553 If set, fio will fill the I/O buffers with this pattern. Fio
2554 defaults to filling with totally random bytes, but sometimes
2555 it's interesting to fill with a known pattern for I/O verifica‐
2556 tion purposes. Depending on the width of the pattern, fio will
2557 fill 1/2/3/4 bytes of the buffer at a time (it can be either a
2558 decimal or a hex number). The verify_pattern if larger than a
2559 32-bit quantity has to be a hex number that starts with either
2560 "0x" or "0X". Use with verify. Also, verify_pattern supports the %o
2561 format, which means that the block offset is written to each block
2562 and then verified on read back, e.g.:
2563
2564 verify_pattern=%o
2565
2566 Or use combination of everything:
2567
2568 verify_pattern=0xff%o"abcd"-12
2569
2570 verify_fatal=bool
2571 Normally fio will keep checking the entire contents before quit‐
2572 ting on a block verification failure. If this option is set, fio
2573 will exit the job on the first observed failure. Default: false.
2574
2575 verify_dump=bool
2576 If set, dump the contents of both the original data block and
2577 the data block we read off disk to files. This allows later
2578 analysis to inspect just what kind of data corruption occurred.
2579 Off by default.
2580
2581 verify_async=int
2582 Fio will normally verify I/O inline from the submitting thread.
2583 This option takes an integer describing how many async offload
2584 threads to create for I/O verification instead, causing fio to
2585 offload the duty of verifying I/O contents to one or more sepa‐
2586 rate threads. If using this offload option, even sync I/O
2587 engines can benefit from using an iodepth setting higher than 1,
2588 as it allows them to have I/O in flight while verifies are run‐
2589 ning. Defaults to 0 async threads, i.e. verification is not
2590 asynchronous.
2591
2592 verify_async_cpus=str
2593 Tell fio to set the given CPU affinity on the async I/O verifi‐
2594 cation threads. See cpus_allowed for the format used.
2595
2596 verify_backlog=int
2597 Fio will normally verify the written contents of a job that uti‐
2598 lizes verify once that job has completed. In other words, every‐
2599 thing is written then everything is read back and verified. You
2600 may want to verify continually instead for a variety of reasons.
2601 Fio stores the meta data associated with an I/O block in memory,
2602 so for large verify workloads, quite a bit of memory would be
2603 used up holding this meta data. If this option is enabled, fio
2604 will write only N blocks before verifying these blocks.
2605
2606 verify_backlog_batch=int
2607 Control how many blocks fio will verify if verify_backlog is
2608 set. If not set, will default to the value of verify_backlog
2609 (meaning the entire queue is read back and verified). If ver‐
2610 ify_backlog_batch is less than verify_backlog then not all
2611 blocks will be verified, if verify_backlog_batch is larger than
2612 verify_backlog, some blocks will be verified more than once.
2613
2614 verify_state_save=bool
2615 When a job exits during the write phase of a verify workload,
2616 save its current state. This allows fio to replay up until that
2617 point, if the verify state is loaded for the verify read phase.
2618 The format of the filename is, roughly:
2619
2620 <type>-<jobname>-<jobindex>-verify.state.
2621
2622 <type> is "local" for a local run, "sock" for a client/server
2623 socket connection, and "ip" (192.168.0.1, for instance) for a
2624 networked client/server connection. Defaults to true.
2625
2626 verify_state_load=bool
2627 If a verify termination trigger was used, fio stores the current
2628 write state of each thread. This can be used at verification
2629 time so that fio knows how far it should verify. Without this
2630 information, fio will run a full verification pass, according to
2631 the settings in the job file used. Default false.
2632
2633 trim_percentage=int
2634 Number of verify blocks to discard/trim.
2635
2636 trim_verify_zero=bool
2637 Verify that trim/discarded blocks are returned as zeros.
2638
2639 trim_backlog=int
2640 Trim after this number of blocks are written.
2641
2642 trim_backlog_batch=int
2643 Trim this number of I/O blocks.
2644
2645 experimental_verify=bool
2646 Enable experimental verification.
2647
2648 Steady state
2649 steadystate=str:float, ss=str:float
2650 Define the criterion and limit for assessing steady state per‐
2651 formance. The first parameter designates the criterion whereas
2652 the second parameter sets the threshold. When the criterion
2653 falls below the threshold for the specified duration, the job
2654 will stop. For example, `iops_slope:0.1%' will direct fio to
2655 terminate the job when the least squares regression slope falls
2656 below 0.1% of the mean IOPS. If group_reporting is enabled this
2657 will apply to all jobs in the group. Below is the list of avail‐
2658 able steady state assessment criteria. All assessments are car‐
2659 ried out using only data from the rolling collection window.
2660 Threshold limits can be expressed as a fixed value or as a per‐
2661 centage of the mean in the collection window.
2662
2663 When using this feature, most jobs should include the time_based
2664 and runtime options or the loops option so that fio does not
2665 stop running after it has covered the full size of the specified
2666 file(s) or device(s).
2667
2668 iops Collect IOPS data. Stop the job if all
2669 individual IOPS measurements are within the
2670 specified limit of the mean IOPS (e.g.,
2671 `iops:2' means that all individual IOPS
2672 values must be within 2 of the mean,
2673 whereas `iops:0.2%' means that all individ‐
2674 ual IOPS values must be within 0.2% of the
2675 mean IOPS to terminate the job).
2676
2677 iops_slope
2678 Collect IOPS data and calculate the least
2679 squares regression slope. Stop the job if
2680 the slope falls below the specified limit.
2681
2682 bw Collect bandwidth data. Stop the job if all
2683 individual bandwidth measurements are
2684 within the specified limit of the mean
2685 bandwidth.
2686
2687 bw_slope
2688 Collect bandwidth data and calculate the
2689 least squares regression slope. Stop the
2690 job if the slope falls below the specified
2691 limit.
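
The two kinds of criteria can be sketched in Python as follows. This is an illustration under stated assumptions, not fio's implementation: the function names are hypothetical, samples are assumed to be taken at evenly spaced intervals, and "within" is treated as inclusive.

```python
def within_mean(samples, limit, percent=False):
    """Check whether every sample lies within `limit` of the mean.

    With percent=True the limit is a percentage of the mean (as in
    `iops:0.2%'), otherwise a fixed value (as in `iops:2').
    """
    mean = sum(samples) / len(samples)
    tolerance = mean * limit / 100.0 if percent else limit
    return all(abs(s - mean) <= tolerance for s in samples)


def ls_slope(samples):
    """Least squares regression slope, using x = 0, 1, 2, ... per sample."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2.0
    mean_y = sum(samples) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


window = [100, 101, 99, 100]
print(within_mean(window, 2))   # fixed limit of 2 IOPS
print(ls_slope(window))         # slope over the window
```

A slope criterion would then compare the computed slope against the limit, either as a fixed value or as a percentage of the mean of the window.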
2692
2693 steadystate_duration=time, ss_dur=time
2694 A rolling window of this duration will be used to judge
2695 whether steady state has been reached. Data will be col‐
2696 lected once per second. The default is 0 which disables
2697 steady state detection. When the unit is omitted, the
2698 value is interpreted in seconds.
2699
2700 steadystate_ramp_time=time, ss_ramp=time
2701 Allow the job to run for the specified duration before
2702 beginning data collection for checking the steady state
2703 job termination criterion. The default is 0. When the
2704 unit is omitted, the value is interpreted in seconds.
2705
2706 Measurements and reporting
2707 per_job_logs=bool
2708 If set, this generates bw/clat/iops log with per file private
2709 filenames. If not set, jobs with identical names will share the
2710 log filename. Default: true.
2711
2712 group_reporting
2713 It may sometimes be interesting to display statistics for groups
2714 of jobs as a whole instead of for each individual job. This is
2715 especially true if numjobs is used; looking at individual
2716 thread/process output quickly becomes unwieldy. To see the final
2717 report per-group instead of per-job, use group_reporting. Jobs
2718 in a file will be part of the same reporting group, unless
2719 separated by a stonewall, or by using new_group.
2720
2721 new_group
2722 Start a new reporting group. See: group_reporting. If not given,
2723 all jobs in a file will be part of the same reporting group,
2724 unless separated by a stonewall.
2725
2726 stats=bool
2727 By default, fio collects and shows final output results for all
2728 jobs that run. If this option is set to 0, then fio will ignore
2729 this job in the final stat output.
2730
2731 write_bw_log=str
2732 If given, write a bandwidth log for this job. Can be used to
2733 store data of the bandwidth of the jobs in their lifetime.
2734
2735 If no str argument is given, the default filename of `job‐
2736 name_type.x.log' is used. Even when the argument is given, fio
2737 will still append the type of log. So if one specifies:
2738
2739 write_bw_log=foo
2740
2741 The actual log name will be `foo_bw.x.log' where `x' is the
2742 index of the job (1..N, where N is the number of jobs). If
2743 per_job_logs is false, then the filename will not include the
2744 `.x` job index.
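
The naming rule above can be summarized with a short Python sketch. The helper log_filename is hypothetical and simply restates the rule as described; it is not taken from fio's source.

```python
def log_filename(prefix, log_type, job_index, per_job_logs=True):
    """Build a log file name following the rule described above.

    prefix is the write_*_log argument (or the job name), log_type is
    e.g. "bw", "lat", or "iops", and job_index counts from 1.
    """
    if per_job_logs:
        return "%s_%s.%d.log" % (prefix, log_type, job_index)
    # Without per-job logs the `.x' job index is left out.
    return "%s_%s.log" % (prefix, log_type)


print(log_filename("foo", "bw", 1))
print(log_filename("foo", "bw", 1, per_job_logs=False))
```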
2745
2746 The included fio_generate_plots script uses gnuplot to turn
2747 these text files into nice graphs. See the LOG FILE FORMATS sec‐
2748 tion for how data is structured within the file.
2749
2750 write_lat_log=str
2751 Same as write_bw_log, except this option creates I/O submission
2752 (e.g., `name_slat.x.log'), completion (e.g., `name_clat.x.log'),
2753 and total (e.g., `name_lat.x.log') latency files instead. See
2754 write_bw_log for details about the filename format and the LOG
2755 FILE FORMATS section for how data is structured within the
2756 files.
2757
2758 write_hist_log=str
2759 Same as write_bw_log but writes an I/O completion latency his‐
2760 togram file (e.g., `name_hist.x.log') instead. Note that this
2761 file will be empty unless log_hist_msec has also been set. See
2762 write_bw_log for details about the filename format and the LOG
2763 FILE FORMATS section for how data is structured within the file.
2764
2765 write_iops_log=str
2766 Same as write_bw_log, but writes an IOPS file (e.g.
2767 `name_iops.x.log`) instead. Because fio defaults to individual
2768 I/O logging, the value entry in the IOPS log will be 1 unless
2769 windowed logging (see log_avg_msec) has been enabled. See
2770 write_bw_log for details about the filename format and LOG FILE
2771 FORMATS for how data is structured within the file.
2772
2773 log_avg_msec=int
2774 By default, fio will log an entry in the iops, latency, or bw
2775 log for every I/O that completes. When writing to the disk log,
2776 that can quickly grow to a very large size. Setting this option
2777 makes fio average each log entry over the specified period
2778 of time, reducing the resolution of the log. See log_max_value
2779 as well. Defaults to 0, logging all entries. Also see LOG FILE
2780 FORMATS section.
2781
2782 log_hist_msec=int
2783 Same as log_avg_msec, but logs entries for completion latency
2784 histograms. Computing latency percentiles from averages of
2785 intervals using log_avg_msec is inaccurate. Setting this option
2786 makes fio log histogram entries over the specified period of
2787 time, reducing log sizes for high IOPS devices while retaining
2788 percentile accuracy. See log_hist_coarseness and write_hist_log
2789 as well. Defaults to 0, meaning histogram logging is disabled.
2790
2791 log_hist_coarseness=int
2792 Integer ranging from 0 to 6, defining the coarseness of the res‐
2793 olution of the histogram logs enabled with log_hist_msec. For
2794 each increment in coarseness, fio outputs half as many bins.
2795 Defaults to 0, for which histogram logs contain 1216 latency
2796 bins. See LOG FILE FORMATS section.
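
Since each coarseness step halves the number of bins, the bin count per setting can be worked out as in the Python sketch below. The helper name hist_bins is hypothetical; the figures follow only from the halving rule and the 1216-bin default stated above.

```python
def hist_bins(coarseness):
    """Number of latency bins for a given log_hist_coarseness (0 to 6)."""
    if not 0 <= coarseness <= 6:
        raise ValueError("coarseness must be between 0 and 6")
    # Each increment halves the 1216 bins available at coarseness 0.
    return 1216 >> coarseness


for c in range(7):
    print(c, hist_bins(c))
```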
2797
2798 log_max_value=bool
2799 If log_avg_msec is set, fio logs the average over that window.
2800 If you instead want to log the maximum value, set this option to
2801 1. Defaults to 0, meaning that averaged values are logged.
2802
2803 log_offset=bool
2804 If this is set, the iolog options will include the byte offset
2805 for the I/O entry as well as the other data values. Defaults to
2806 0 meaning that offsets are not present in logs. Also see LOG
2807 FILE FORMATS section.
2808
2809 log_compression=int
2810 If this is set, fio will compress the I/O logs as it goes, to
2811 keep the memory footprint lower. When a log reaches the speci‐
2812 fied size, that chunk is removed and compressed in the back‐
2813 ground. Given that I/O logs are fairly highly compressible, this
2814 yields a nice memory savings for longer runs. The downside is
2815 that the compression will consume some background CPU cycles, so
2816 it may impact the run. This, however, is also true if the log‐
2817 ging ends up consuming most of the system memory. So pick your
2818 poison. The I/O logs are saved normally at the end of a run, by
2819 decompressing the chunks and storing them in the specified log
2820 file. This feature depends on the availability of zlib.
2821
2822 log_compression_cpus=str
2823 Define the set of CPUs that are allowed to handle online log
2824 compression for the I/O jobs. This can provide better isolation
2825 between performance sensitive jobs, and background compression
2826 work. See cpus_allowed for the format used.
2827
2828 log_store_compressed=bool
2829 If set, fio will store the log files in a compressed format.
2830 They can be decompressed with fio, using the --inflate-log com‐
2831 mand line parameter. The files will be stored with a `.fz' suf‐
2832 fix.
2833
2834 log_unix_epoch=bool
2835 If set, fio will log Unix timestamps to the log files produced
2836 by enabling write_type_log for each log type, instead of the
2837 default zero-based timestamps.
2838
2839 block_error_percentiles=bool
2840 If set, record errors in trim block-sized units from writes and
2841 trims and output a histogram of how many trims it took to get to
2842 errors, and what kind of error was encountered.
2843
2844 bwavgtime=int
2845 Average the calculated bandwidth over the given time. Value is
2846 specified in milliseconds. If the job also does bandwidth log‐
2847 ging through write_bw_log, then the minimum of this option and
2848 log_avg_msec will be used. Default: 500ms.
2849
2850 iopsavgtime=int
2851 Average the calculated IOPS over the given time. Value is speci‐
2852 fied in milliseconds. If the job also does IOPS logging through
2853 write_iops_log, then the minimum of this option and log_avg_msec
2854 will be used. Default: 500ms.
2855
2856 disk_util=bool
2857 Generate disk utilization statistics, if the platform supports
2858 it. Default: true.
2859
2860 disable_lat=bool
2861 Disable measurements of total latency numbers. Useful only for
2862 cutting back the number of calls to gettimeofday(2), as that
2863 does impact performance at really high IOPS rates. Note that to
2864 really get rid of a large amount of these calls, this option
2865 must be used with disable_slat and disable_bw_measurement as
2866 well.
2867
2868 disable_clat=bool
2869 Disable measurements of completion latency numbers. See dis‐
2870 able_lat.
2871
2872 disable_slat=bool
2873 Disable measurements of submission latency numbers. See dis‐
2874 able_lat.
2875
2876 disable_bw_measurement=bool, disable_bw=bool
2877 Disable measurements of throughput/bandwidth numbers. See dis‐
2878 able_lat.
2879
2880 slat_percentiles=bool
2881 Report submission latency percentiles. Submission latency is not
2882 recorded for synchronous ioengines.
2883
2884 clat_percentiles=bool
2885 Report completion latency percentiles.
2886
2887 lat_percentiles=bool
2888 Report total latency percentiles. Total latency is the sum of
2889 submission latency and completion latency.
2890
2891 percentile_list=float_list
2892 Overwrite the default list of percentiles for latencies and the
2893 block error histogram. Each number is a floating point number in
2894 the range (0,100], and the maximum length of the list is 20. Use
2895 ':' to separate the numbers. For example, `--per‐
2896 centile_list=99.5:99.9' will cause fio to report the latency
2897 durations below which 99.5% and 99.9% of the observed latencies
2898 fell, respectively.
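
The constraints on the list can be checked ahead of time with a small Python sketch. The helper parse_percentile_list is hypothetical and only encodes the limits stated above (range (0,100], at most 20 entries, ':' as separator).

```python
def parse_percentile_list(spec):
    """Split a percentile_list value on ':' and validate each entry."""
    values = [float(item) for item in spec.split(":")]
    if len(values) > 20:
        raise ValueError("at most 20 percentiles are allowed")
    for value in values:
        # The range (0,100] excludes 0 and includes 100.
        if not 0.0 < value <= 100.0:
            raise ValueError("percentile out of range: %r" % value)
    return values


print(parse_percentile_list("99.5:99.9"))
```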
2899
2900 significant_figures=int
2901 If using --output-format of `normal', set the significant fig‐
2902 ures to this value. Higher values will yield more precise IOPS
2903 and throughput units, while lower values will round. Requires a
2904 minimum value of 1 and a maximum value of 10. Defaults to 4.
2905
2906 Error handling
2907 exitall_on_error
2908 When one job finishes in error, terminate the rest. The default
2909 is to wait for each job to finish.
2910
2911 continue_on_error=str
2912 Normally fio will exit the job on the first observed failure. If
2913 this option is set, fio will continue the job when there is a
2914 'non-fatal error' (EIO or EILSEQ) until the runtime is exceeded
2915 or the I/O size specified is completed. If this option is used,
2916 there are two more stats that are appended, the total error
2917 count and the first error. The error field given in the stats is
2918 the first error that was hit during the run. The allowed values
2919 are:
2920
2921 none Exit on any I/O or verify errors.
2922
2923 read Continue on read errors, exit on all others.
2924
2925 write Continue on write errors, exit on all others.
2926
2927 io Continue on any I/O error, exit on all others.
2928
2929 verify Continue on verify errors, exit on all others.
2930
2931 all Continue on all errors.
2932
2933 0 Backward-compatible alias for 'none'.
2934
2935 1 Backward-compatible alias for 'all'.
2936
2937 ignore_error=str
2938 Sometimes you want to ignore some errors during a test. In that
2939 case you can specify an error list for each error type, instead
2940 of only being able to ignore the default 'non-fatal error' using
2941 continue_on_error. The format is
2942 `ignore_error=READ_ERR_LIST,WRITE_ERR_LIST,VERIFY_ERR_LIST'.
2943 Errors for a given error type are separated with ':'. An error
2944 may be a symbol ('ENOSPC', 'ENOMEM') or an integer. Example:
2945
2946 ignore_error=EAGAIN,ENOSPC:122
2947
2948 This option will ignore EAGAIN from READ, and ENOSPC and
2949 122(EDQUOT) from WRITE. This option works by overriding con‐
2950 tinue_on_error with the list of errors for each error type if
2951 any.
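
The structure of the option value can be made explicit with a Python sketch that splits it into per-type lists. The helper parse_ignore_error and the dictionary keys are hypothetical; the sketch only mirrors the format described above and does not resolve symbols to numbers.

```python
def parse_ignore_error(spec):
    """Split an ignore_error value into read, write, and verify lists.

    Lists are separated by ',' and errors within a list by ':'.
    Numeric entries become integers; symbolic names stay as strings.
    """
    names = ("read", "write", "verify")
    lists = spec.split(",")
    if len(lists) > len(names):
        raise ValueError("at most three error lists are allowed")
    result = {name: [] for name in names}
    for name, item in zip(names, lists):
        result[name] = [int(e) if e.isdigit() else e
                        for e in item.split(":") if e]
    return result


print(parse_ignore_error("EAGAIN,ENOSPC:122"))
```

For the example above, the read list holds EAGAIN, the write list holds ENOSPC and 122, and the verify list is empty.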
2952
2953 error_dump=bool
2954 If set, dump every error even if it is non-fatal. If disabled,
2955 only fatal errors will be dumped. Default: true.
2956
2957 Running predefined workloads
2958 Fio includes predefined profiles that mimic the I/O workloads generated
2959 by other tools.
2960
2961 profile=str
2962 The predefined workload to run. Current profiles are:
2963
2964 tiobench
2965 Threaded I/O bench (tiotest/tiobench) like work‐
2966 load.
2967
2968 act Aerospike Certification Tool (ACT) like workload.
2969
2970 To view a profile's additional options use --cmdhelp after specifying
2971 the profile. For example:
2972
2973 $ fio --profile=act --cmdhelp
2974
2975 Act profile options
2976 device-names=str
2977 Devices to use.
2978
2979 load=int
2980 ACT load multiplier. Default: 1.
2981
2982 test-duration=time
2983 How long the entire test takes to run. When the unit is omitted,
2984 the value is given in seconds. Default: 24h.
2985
2986 threads-per-queue=int
2987 Number of read I/O threads per device. Default: 8.
2988
2989 read-req-num-512-blocks=int
2990 Number of 512B blocks to read at a time. Default: 3.
2991
2992 large-block-op-kbytes=int
2993 Size of large block ops in KiB (writes). Default: 131072.
2994
2995 prep Set to run ACT prep phase.
2996
2997 Tiobench profile options
2998 size=str
2999 Size in MiB.
3000
3001 block=int
3002 Block size in bytes. Default: 4096.
3003
3004 numruns=int
3005 Number of runs.
3006
3007 dir=str
3008 Test directory.
3009
3010 threads=int
3011 Number of threads.
3012
3014 Fio spits out a lot of output. While running, fio will display the sta‐
3015 tus of the jobs created. An example of that would be:
3016
3017 Jobs: 1 (f=1): [_(1),M(1)][24.8%][r=20.5MiB/s,w=23.5MiB/s][r=82,w=94 IOPS][eta 01m:31s]
3018
3019 The characters inside the first set of square brackets denote the cur‐
3020 rent status of each thread. The first character is the first job
3021 defined in the job file, and so forth. The possible values (in typical
3022 life cycle order) are:
3023
3024 P Thread setup, but not started.
3025 C Thread created.
3026 I Thread initialized, waiting or generating necessary data.
3027 p Thread running pre-reading file(s).
3028 / Thread is in ramp period.
3029 R Running, doing sequential reads.
3030 r Running, doing random reads.
3031 W Running, doing sequential writes.
3032 w Running, doing random writes.
3033 M Running, doing mixed sequential reads/writes.
3034 m Running, doing mixed random reads/writes.
3035 D Running, doing sequential trims.
3036 d Running, doing random trims.
3037 F Running, currently waiting for fsync(2).
3038 V Running, doing verification of written data.
3039 f Thread finishing.
3040 E Thread exited, not reaped by main thread yet.
3041 - Thread reaped.
3042 X Thread reaped, exited with an error.
3043 K Thread reaped, exited due to signal.
3044
3045 Fio will condense the thread string so as not to take up more space on the
3046 command line than needed. For instance, if you have 10 readers and 10
3047 writers running, the output would look like this:
3048
3049 Jobs: 20 (f=20): [R(10),W(10)][4.0%][r=20.5MiB/s,w=23.5MiB/s][r=82,w=94 IOPS][eta 57m:36s]
3050
3051 Note that the status string is displayed in order, so it's possible to
3052 tell which of the jobs are currently doing what. In the example above
3053 this means that jobs 1-10 are readers and 11-20 are writers.
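
To recover the per-job view from a condensed string, the counts can be expanded again. The Python sketch below is an illustration only; the helper expand_status is hypothetical and assumes every entry has the `X(n)' form seen in the examples above.

```python
import re


def expand_status(status):
    """Expand a condensed status string into one character per job.

    Takes the text between the first pair of square brackets, e.g.
    "R(10),W(10)". Entries not in the X(n) form are kept as they are.
    """
    expanded = []
    for token in status.split(","):
        match = re.fullmatch(r"(.)\((\d+)\)", token)
        if match:
            expanded.append(match.group(1) * int(match.group(2)))
        else:
            expanded.append(token)
    return "".join(expanded)


jobs = expand_status("R(10),W(10)")
print(jobs)
print(jobs[10])  # status character of job 11
```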
3054
3055 The other values are fairly self explanatory -- number of threads cur‐
3056 rently running and doing I/O, the number of currently open files (f=),
3057 the estimated completion percentage, the rate of I/O since last check
3058 (read speed listed first, then write speed and optionally trim speed)
3059 in terms of bandwidth and IOPS, and time to completion for the current
3060 running group. It's impossible to estimate runtime of the following
3061 groups (if any).
3062
3063 When fio is done (or interrupted by Ctrl-C), it will show the data for
3064 each thread, group of threads, and disks in that order. For each over‐
3065 all thread (or group) the output looks like:
3066
3067 Client1: (groupid=0, jobs=1): err= 0: pid=16109: Sat Jun 24 12:07:54 2017
3068 write: IOPS=88, BW=623KiB/s (638kB/s)(30.4MiB/50032msec)
3069 slat (nsec): min=500, max=145500, avg=8318.00, stdev=4781.50
3070 clat (usec): min=170, max=78367, avg=4019.02, stdev=8293.31
3071 lat (usec): min=174, max=78375, avg=4027.34, stdev=8291.79
3072 clat percentiles (usec):
3073 | 1.00th=[ 302], 5.00th=[ 326], 10.00th=[ 343], 20.00th=[ 363],
3074 | 30.00th=[ 392], 40.00th=[ 404], 50.00th=[ 416], 60.00th=[ 445],
3075 | 70.00th=[ 816], 80.00th=[ 6718], 90.00th=[12911], 95.00th=[21627],
3076 | 99.00th=[43779], 99.50th=[51643], 99.90th=[68682], 99.95th=[72877],
3077 | 99.99th=[78119]
3078 bw ( KiB/s): min= 532, max= 686, per=0.10%, avg=622.87, stdev=24.82, samples= 100
3079 iops : min= 76, max= 98, avg=88.98, stdev= 3.54, samples= 100
3080 lat (usec) : 250=0.04%, 500=64.11%, 750=4.81%, 1000=2.79%
3081 lat (msec) : 2=4.16%, 4=1.84%, 10=4.90%, 20=11.33%, 50=5.37%
3082 lat (msec) : 100=0.65%
3083 cpu : usr=0.27%, sys=0.18%, ctx=12072, majf=0, minf=21
3084 IO depths : 1=85.0%, 2=13.1%, 4=1.8%, 8=0.1%, 16=0.0%, 32=0.0%, >=64=0.0%
3085 submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
3086 complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
3087 issued rwt: total=0,4450,0, short=0,0,0, dropped=0,0,0
3088 latency : target=0, window=0, percentile=100.00%, depth=8
3089
3090 The job name (or first job's name when using group_reporting) is
3091 printed, along with the group id, count of jobs being aggregated, last
3092 error id seen (which is 0 when there are no errors), pid/tid of that
3093 thread and the time the job/group completed. Below are the I/O statis‐
3094 tics for each data direction performed (showing writes in the example
3095 above). In the order listed, they denote:
3096
3097 read/write/trim
3098 The string before the colon shows the I/O direction the
3099 statistics are for. IOPS is the average I/Os performed
3100 per second. BW is the average bandwidth rate shown as:
3101 value in power of 2 format (value in power of 10 format).
3102 The last two values show: (total I/O performed in power
3103 of 2 format / runtime of that thread).
3104
slat Submission latency (min being the minimum, max being the
maximum, avg being the average, stdev being the standard
deviation). This is the time it took to submit the I/O.
For sync I/O this row is not displayed as the slat is
really the completion latency (since queue/complete is
one operation there). This value can be in nanoseconds,
microseconds or milliseconds; fio will choose the most
appropriate base and print that (in the example above
nanoseconds was the best scale). Note: in --minimal mode
latencies are always expressed in microseconds.
3115
clat Completion latency. Same names as slat. This denotes the
time from submission to completion of the I/O pieces. For
sync I/O, clat will usually be equal (or very close) to
0, as the time from submit to complete is basically just
CPU time (I/O has already been done, see slat explanation).
3122
3123 lat Total latency. Same names as slat and clat, this denotes
3124 the time from when fio created the I/O unit to completion
3125 of the I/O operation.
3126
3127 bw Bandwidth statistics based on samples. Same names as the
3128 xlat stats, but also includes the number of samples taken
3129 (samples) and an approximate percentage of total aggre‐
3130 gate bandwidth this thread received in its group (per).
3131 This last value is only really useful if the threads in
3132 this group are on the same disk, since they are then com‐
3133 peting for disk access.
3134
3135 iops IOPS statistics based on samples. Same names as bw.
3136
3137 lat (nsec/usec/msec)
The distribution of I/O completion latencies. This is the
time from when the I/O leaves fio to when it gets completed.
Unlike the separate read/write/trim sections above, the
data here and in the remaining sections apply to all I/Os
for the reporting group. 250=0.04% means that 0.04% of
the I/Os completed in under 250us. 500=64.11% means that
64.11% of the I/Os required 250 to 499us for completion.
3145
cpu CPU usage. User and system time percentages, along with the
number of context switches this thread went through, and
finally the number of major and minor page faults. The CPU
utilization numbers are averages for the jobs in that
reporting group, while the context and fault counters are
summed.
3152
3153 IO depths
3154 The distribution of I/O depths over the job lifetime. The
3155 numbers are divided into powers of 2 and each entry cov‐
3156 ers depths from that value up to those that are lower
3157 than the next entry -- e.g., 16= covers depths from 16 to
3158 31. Note that the range covered by a depth distribution
3159 entry can be different to the range covered by the equiv‐
3160 alent submit/complete distribution entry.
3161
3162 IO submit
How many pieces of I/O were submitted in a single submit
call. Each entry denotes that amount and below, until the
previous entry -- e.g., 16=100% means that we submitted
anywhere from 9 to 16 I/Os per submit call. Note that
the range covered by a submit distribution entry can be
different to the range covered by the equivalent depth
distribution entry.
3170
3171 IO complete
3172 Like the above submit number, but for completions
3173 instead.
3174
3175 IO issued rwt
3176 The number of read/write/trim requests issued, and how
3177 many of them were short or dropped.
3178
3179 IO latency
3180 These values are for latency_target and related options.
3181 When these options are engaged, this section describes
3182 the I/O depth required to meet the specified latency tar‐
3183 get.
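
The relationship between the values on the read/write/trim row can be checked with a little arithmetic. The following is only a sketch using the numbers from the example output above; the function names are illustrative, and because fio prints rounded values (30.4MiB, for instance) the recomputed averages agree with the printed ones only approximately.

```python
def kib_to_kb(kib_per_s: float) -> float:
    """Convert a power-of-2 rate (KiB/s) to a power-of-10 rate (kB/s)."""
    return kib_per_s * 1024 / 1000


def average_rates(total_mib: float, runtime_msec: int, ios: int):
    """Return (bandwidth in KiB/s, IOPS) averaged over the runtime."""
    runtime_s = runtime_msec / 1000
    bw_kib = total_mib * 1024 / runtime_s
    return bw_kib, ios / runtime_s


# "BW=623KiB/s (638kB/s)": the same rate in both unit systems.
print(round(kib_to_kb(623)))

# "(30.4MiB/50032msec)" together with "issued rwt: total=0,4450,0".
bw_kib, iops = average_rates(30.4, 50032, 4450)
print(f"BW is about {bw_kib:.1f} KiB/s, IOPS is about {iops:.1f}")
```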
3184
3185 After each client has been listed, the group statistics are printed.
3186 They will look like this:
3187
3188 Run status group 0 (all jobs):
3189 READ: bw=20.9MiB/s (21.9MB/s), 10.4MiB/s-10.8MiB/s (10.9MB/s-11.3MB/s), io=64.0MiB (67.1MB), run=2973-3069msec
3190 WRITE: bw=1231KiB/s (1261kB/s), 616KiB/s-621KiB/s (630kB/s-636kB/s), io=64.0MiB (67.1MB), run=52747-53223msec
3191
3192 For each data direction it prints:
3193
3194 bw Aggregate bandwidth of threads in this group followed by
3195 the minimum and maximum bandwidth of all the threads in
3196 this group. Values outside of brackets are power-of-2
3197 format and those within are the equivalent value in a
3198 power-of-10 format.
3199
3200 io Aggregate I/O performed of all threads in this group. The
3201 format is the same as bw.
3202
run The shortest and longest runtimes of the threads in this
group.
3205
3206 And finally, the disk statistics are printed. This is Linux specific.
3207 They will look like this:
3208
3209 Disk stats (read/write):
3210 sda: ios=16398/16511, merge=30/162, ticks=6853/819634, in_queue=826487, util=100.00%
3211
3212 Each value is printed for both reads and writes, with reads first. The
3213 numbers denote:
3214
3215 ios Number of I/Os performed by all groups.
3216
3217 merge Number of merges performed by the I/O scheduler.
3218
3219 ticks Number of ticks we kept the disk busy.
3220
3221 in_queue
3222 Total time spent in the disk queue.
3223
3224 util The disk utilization. A value of 100% means we kept the
3225 disk busy constantly, 50% would be a disk idling half of
3226 the time.
3227
3228 It is also possible to get fio to dump the current output while it is
3229 running, without terminating the job. To do that, send fio the USR1
3230 signal. You can also get regularly timed dumps by using the --sta‐
3231 tus-interval parameter, or by creating a file in `/tmp' named
3232 `fio-dump-status'. If fio sees this file, it will unlink it and dump
3233 the current output status.
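
Both ways of requesting a status dump can be driven from a script. The helper below is a hypothetical sketch, not part of fio: it sends USR1 when given a process ID, and otherwise creates the `fio-dump-status' file. The `directory' parameter exists only so the helper can be tried out safely; fio itself looks in `/tmp'.

```python
import os
import signal
from pathlib import Path


def request_status_dump(pid=None, directory="/tmp"):
    """Ask a running fio to dump its current output.

    With a pid, send SIGUSR1 to that process. Without one, create the
    fio-dump-status marker file that fio checks for and then unlinks.
    """
    if pid is not None:
        os.kill(pid, signal.SIGUSR1)
        return None
    marker = Path(directory) / "fio-dump-status"
    marker.touch()
    return marker
```

Calling `request_status_dump()' with no arguments creates `/tmp/fio-dump-status'.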
3234
3236 For scripted usage where you typically want to generate tables or
3237 graphs of the results, fio can output the results in a semicolon sepa‐
3238 rated format. The format is one long line of values, such as:
3239
3240 2;card0;0;0;7139336;121836;60004;1;10109;27.932460;116.933948;220;126861;3495.446807;1085.368601;226;126864;3523.635629;1089.012448;24063;99944;50.275485%;59818.274627;5540.657370;7155060;122104;60004;1;8338;29.086342;117.839068;388;128077;5032.488518;1234.785715;391;128085;5061.839412;1236.909129;23436;100928;50.287926%;59964.832030;5644.844189;14.595833%;19.394167%;123706;0;7313;0.1%;0.1%;0.1%;0.1%;0.1%;0.1%;100.0%;0.00%;0.00%;0.00%;0.00%;0.00%;0.00%;0.01%;0.02%;0.05%;0.16%;6.04%;40.40%;52.68%;0.64%;0.01%;0.00%;0.01%;0.00%;0.00%;0.00%;0.00%;0.00%
3241 A description of this job goes here.
3242
3243 The job description (if provided) follows on a second line for terse
3244 v2. It appears on the same line for other terse versions.
3245
3246 To enable terse output, use the --minimal or `--output-format=terse'
3247 command line options. The first value is the version of the terse out‐
3248 put format. If the output has to be changed for some reason, this num‐
3249 ber will be incremented by 1 to signify that change.
3250
3251 Split up, the format is as follows (comments in brackets denote when a
3252 field was introduced or whether it's specific to some terse version):
3253
3254 terse version, fio version [v3], jobname, groupid, error
3255
3256 READ status:
3257
3258 Total IO (KiB), bandwidth (KiB/sec), IOPS, runtime (msec)
3259 Submission latency: min, max, mean, stdev (usec)
3260 Completion latency: min, max, mean, stdev (usec)
3261 Completion latency percentiles: 20 fields (see below)
3262 Total latency: min, max, mean, stdev (usec)
3263 Bw (KiB/s): min, max, aggregate percentage of total, mean, stdev, number of samples [v5]
3264 IOPS [v5]: min, max, mean, stdev, number of samples
3265
3266 WRITE status:
3267
3268 Total IO (KiB), bandwidth (KiB/sec), IOPS, runtime (msec)
3269 Submission latency: min, max, mean, stdev (usec)
3270 Completion latency: min, max, mean, stdev (usec)
3271 Completion latency percentiles: 20 fields (see below)
3272 Total latency: min, max, mean, stdev (usec)
3273 Bw (KiB/s): min, max, aggregate percentage of total, mean, stdev, number of samples [v5]
3274 IOPS [v5]: min, max, mean, stdev, number of samples
3275
3276 TRIM status [all but version 3]:
3277
3278 Fields are similar to READ/WRITE status.
3279
3280 CPU usage:
3281
3282 user, system, context switches, major faults, minor faults
3283
3284 I/O depths:
3285
3286 <=1, 2, 4, 8, 16, 32, >=64
3287
3288 I/O latencies microseconds:
3289
3290 <=2, 4, 10, 20, 50, 100, 250, 500, 750, 1000
3291
3292 I/O latencies milliseconds:
3293
3294 <=2, 4, 10, 20, 50, 100, 250, 500, 750, 1000, 2000, >=2000
3295
3296 Disk utilization [v3]:
3297
3298 disk name, read ios, write ios, read merges, write merges, read ticks, write ticks, time spent in queue, disk utilization percentage
3299
3300 Additional Info (dependent on continue_on_error, default off):
3301
3302 total # errors, first error code
3303
3304 Additional Info (dependent on description being set):
3305
3306 Text description
3307
3308 Completion latency percentiles can be a grouping of up to 20 sets, so
3309 for the terse output fio writes all of them. Each field will look like
3310 this:
3311
3312 1.00%=6112
3313
3314 which is the Xth percentile, and the `usec' latency associated with it.
3315
3316 For Disk utilization, all disks used by fio are shown. So for each disk
3317 there will be a disk utilization section.
3318
3319 Below is a single line containing short names for each of the fields in
3320 the minimal output v3, separated by semicolons:
3321
3322 terse_version_3;fio_version;jobname;groupid;error;read_kb;read_bandwidth;read_iops;read_runtime_ms;read_slat_min;read_slat_max;read_slat_mean;read_slat_dev;read_clat_min;read_clat_max;read_clat_mean;read_clat_dev;read_clat_pct01;read_clat_pct02;read_clat_pct03;read_clat_pct04;read_clat_pct05;read_clat_pct06;read_clat_pct07;read_clat_pct08;read_clat_pct09;read_clat_pct10;read_clat_pct11;read_clat_pct12;read_clat_pct13;read_clat_pct14;read_clat_pct15;read_clat_pct16;read_clat_pct17;read_clat_pct18;read_clat_pct19;read_clat_pct20;read_tlat_min;read_lat_max;read_lat_mean;read_lat_dev;read_bw_min;read_bw_max;read_bw_agg_pct;read_bw_mean;read_bw_dev;write_kb;write_bandwidth;write_iops;write_runtime_ms;write_slat_min;write_slat_max;write_slat_mean;write_slat_dev;write_clat_min;write_clat_max;write_clat_mean;write_clat_dev;write_clat_pct01;write_clat_pct02;write_clat_pct03;write_clat_pct04;write_clat_pct05;write_clat_pct06;write_clat_pct07;write_clat_pct08;write_clat_pct09;write_clat_pct10;write_clat_pct11;write_clat_pct12;write_clat_pct13;write_clat_pct14;write_clat_pct15;write_clat_pct16;write_clat_pct17;write_clat_pct18;write_clat_pct19;write_clat_pct20;write_tlat_min;write_lat_max;write_lat_mean;write_lat_dev;write_bw_min;write_bw_max;write_bw_agg_pct;write_bw_mean;write_bw_dev;cpu_user;cpu_sys;cpu_csw;cpu_mjf;cpu_minf;iodepth_1;iodepth_2;iodepth_4;iodepth_8;iodepth_16;iodepth_32;iodepth_64;lat_2us;lat_4us;lat_10us;lat_20us;lat_50us;lat_100us;lat_250us;lat_500us;lat_750us;lat_1000us;lat_2ms;lat_4ms;lat_10ms;lat_20ms;lat_50ms;lat_100ms;lat_250ms;lat_500ms;lat_750ms;lat_1000ms;lat_2000ms;lat_over_2000ms;disk_name;disk_read_iops;disk_write_iops;disk_read_merges;disk_write_merges;disk_read_ticks;write_ticks;disk_queue_time;disk_util
3323
3324 In client/server mode terse output differs from what appears when jobs
3325 are run locally. Disk utilization data is omitted from the standard
3326 terse output and for v3 and later appears on its own separate line at
3327 the end of each terse reporting cycle.
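
For scripted use, a terse v3 line can be split on semicolons and paired with field names. The sketch below generates names that follow the field order described above; the names themselves, the function names and the sample line are illustrative assumptions. Anything after the fixed fields (disk utilization, error info, description) is kept unparsed, and a job name containing a semicolon would break this simple split.

```python
def terse_v3_field_names():
    """Build short names for the fixed part of a terse v3 record."""
    names = ["terse_version", "fio_version", "jobname", "groupid", "error"]
    for ddir in ("read", "write"):
        names += [f"{ddir}_{f}" for f in ("kb", "bandwidth", "iops", "runtime_ms")]
        for group in ("slat", "clat"):
            names += [f"{ddir}_{group}_{s}" for s in ("min", "max", "mean", "dev")]
        names += [f"{ddir}_clat_pct{i:02d}" for i in range(1, 21)]
        names += [f"{ddir}_lat_{s}" for s in ("min", "max", "mean", "dev")]
        names += [f"{ddir}_bw_{s}" for s in ("min", "max", "agg_pct", "mean", "dev")]
    names += ["cpu_user", "cpu_sys", "cpu_csw", "cpu_mjf", "cpu_minf"]
    names += [f"iodepth_{n}" for n in (1, 2, 4, 8, 16, 32, 64)]
    names += [f"lat_{n}us" for n in (2, 4, 10, 20, 50, 100, 250, 500, 750, 1000)]
    names += [f"lat_{n}ms" for n in (2, 4, 10, 20, 50, 100, 250, 500, 750, 1000, 2000)]
    names.append("lat_over_2000ms")
    return names


def parse_terse_v3(line):
    """Map one terse v3 line to a dict; trailing fields go to 'extra'."""
    values = line.rstrip("\n").split(";")
    if values[0] != "3":
        raise ValueError("not a terse v3 line")
    names = terse_v3_field_names()
    record = dict(zip(names, values))
    record["extra"] = values[len(names):]  # disk stats and optional fields
    return record


# Synthetic sample line (not real fio output): 5 header fields, 116 zeros.
sample = ";".join(["3", "fio-x.y", "job1", "0", "0"] + ["0"] * 116)
record = parse_terse_v3(sample)
print(record["jobname"], record["read_iops"], len(record["extra"]))
```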
3328
3330 The json output format is intended to be both human readable and conve‐
3331 nient for automated parsing. For the most part its sections mirror
3332 those of the normal output. The runtime value is reported in msec and
3333 the bw value is reported in 1024 bytes per second units.
3334
3336 The json+ output format is identical to the json output format except
3337 that it adds a full dump of the completion latency bins. Each bins
3338 object contains a set of (key, value) pairs where keys are latency
3339 durations and values count how many I/Os had completion latencies of
3340 the corresponding duration. For example, consider:
3341
3342 "bins" : { "87552" : 1, "89600" : 1, "94720" : 1, "96768" : 1,
3343 "97792" : 1, "99840" : 1, "100864" : 2, "103936" : 6, "104960" :
3344 534, "105984" : 5995, "107008" : 7529, ... }
3345
3346 This data indicates that one I/O required 87,552ns to complete, two
3347 I/Os required 100,864ns to complete, and 7529 I/Os required 107,008ns
3348 to complete.
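
A bins object can be turned into percentiles by accumulating the counts in order of latency. The following is a minimal sketch that uses only the entries shown in the example above (the elided ones are left out); the function name is hypothetical.

```python
import math


def clat_percentile(bins, pct):
    """Return the smallest latency key (ns) whose cumulative count covers
    pct percent of all I/Os recorded in a json+ style bins mapping."""
    total = sum(bins.values())
    rank = math.ceil(total * pct / 100)
    seen = 0
    for duration in sorted(bins, key=int):
        seen += bins[duration]
        if seen >= rank:
            return int(duration)
    raise ValueError("bins is empty")


bins = {"87552": 1, "89600": 1, "94720": 1, "96768": 1, "97792": 1,
        "99840": 1, "100864": 2, "103936": 6, "104960": 534,
        "105984": 5995, "107008": 7529}
print(sum(bins.values()), clat_percentile(bins, 50))
```

For these entries the total is 14072 I/Os and the median falls in the 107008 bin.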
3349
3350 Also included with fio is a Python script fio_jsonplus_clat2csv that
3351 takes json+ output and generates CSV-formatted latency data suitable
3352 for plotting.
3353
3354 The latency durations actually represent the midpoints of latency
3355 intervals. For details refer to `stat.h' in the fio source.
3356
There are two trace file formats that you can encounter. The older (v1)
format is unsupported since version 1.20-rc3 (March 2008). It will
still be described below in case you get an old trace and want to
understand it.
3362
3363 In any case the trace is a simple text file with a single action per
3364 line.
3365
3366 Trace file format v1
3367 Each line represents a single I/O action in the following for‐
3368 mat:
3369
3370 rw, offset, length
3371
3372 where `rw=0/1' for read/write, and the `offset' and `length'
3373 entries being in bytes.
3374
3375 This format is not supported in fio versions >= 1.20-rc3.
3376
3377 Trace file format v2
The second version of the trace file format was added in fio
version 1.17. It allows access to more than one file per trace
and has a bigger set of possible file actions.
3381
3382 The first line of the trace file has to be:
3383
3384 "fio version 2 iolog"
3385
3386 Following this can be lines in two different formats, which are
3387 described below.
3388
3389 The file management format:
3390 filename action
3391
3392 The `filename' is given as an absolute path. The `action'
3393 can be one of these:
3394
3395 add Add the given `filename' to the trace.
3396
3397 open Open the file with the given `filename'.
3398 The `filename' has to have been added with
3399 the add action before.
3400
3401 close Close the file with the given `filename'.
3402 The file has to have been opened before.
3403
3404 The file I/O action format:
3405 filename action offset length
3406
3407 The `filename' is given as an absolute path, and has to
3408 have been added and opened before it can be used with
3409 this format. The `offset' and `length' are given in
3410 bytes. The `action' can be one of these:
3411
3412 wait Wait for `offset' microseconds. Everything
3413 below 100 is discarded. The time is rela‐
3414 tive to the previous `wait' statement.
3415
3416 read Read `length' bytes beginning from `off‐
3417 set'.
3418
3419 write Write `length' bytes beginning from `off‐
3420 set'.
3421
3422 sync fsync(2) the file.
3423
3424 datasync
3425 fdatasync(2) the file.
3426
3427 trim Trim the given file from the given `offset'
3428 for `length' bytes.
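
To make the v2 layout concrete, the sketch below builds a small trace for a single file: the header line, the add and open management lines, a few I/O actions, and the closing line. The function name and the file path are hypothetical, and only the actions that take an offset and length in the description above are accepted.

```python
ALLOWED_ACTIONS = {"wait", "read", "write", "trim"}


def build_iolog_v2(filename, actions):
    """Return the text of a version 2 iolog for one file.

    actions is a list of (action, offset, length) tuples.
    """
    lines = ["fio version 2 iolog", f"{filename} add", f"{filename} open"]
    for action, offset, length in actions:
        if action not in ALLOWED_ACTIONS:
            raise ValueError(f"unsupported action: {action}")
        lines.append(f"{filename} {action} {offset} {length}")
    lines.append(f"{filename} close")
    return "\n".join(lines) + "\n"


# Hypothetical absolute path, as the format requires.
trace = build_iolog_v2("/tmp/testfile", [("write", 0, 4096), ("read", 0, 4096)])
print(trace, end="")
```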
3429
3431 Colocation is a common practice used to get the most out of a machine.
3432 Knowing which workloads play nicely with each other and which ones
3433 don't is a much harder task. While fio can replay workloads concur‐
3434 rently via multiple jobs, it leaves some variability up to the sched‐
3435 uler making results harder to reproduce. Merging is a way to make the
3436 order of events consistent.
3437
Merging is integrated into I/O replay and is done when a
merge_blktrace_file is specified. The list of files passed to read_iolog
goes through the merge process, which writes a single merged file to the
specified path. The output file is then passed on as if it were the only
file passed to read_iolog. An example would look like:
3443
$ fio --read_iolog="<file1>:<file2>" --merge_blktrace_file="<output_file>"
3446
Creating only the merged file can be done by passing the command line
argument --merge-blktrace-only.
3449
Scaling traces can be done to see the relative impact of any particular
trace being slowed down or sped up. merge_blktrace_scalars takes a
colon-separated list of percentage scalars, paired by index with the
files passed to read_iolog.
3454
3455 With scaling, it may be desirable to match the running time of all
3456 traces. This can be done with merge_blktrace_iters. It is index paired
3457 with read_iolog just like merge_blktrace_scalars.
3458
As an example, take two traces, A and B, each 60s long. To see the
impact of trace A issuing I/Os twice as fast, while repeating trace A
over the runtime of trace B, the following can be done:
3462
$ fio --read_iolog="<trace_a>:<trace_b>" --merge_blktrace_file="<output_file>" --merge_blktrace_scalars="50:100" --merge_blktrace_iters="2:1"
3466
3467 This runs trace A at 2x the speed twice for approximately the same run‐
3468 time as a single run of trace B.
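
The arithmetic behind this example can be sketched as follows, under the assumption (implied by the example, not stated as a formula) that a scalar is the percentage of the original duration and that iterations repeat the scaled trace back to back. The function name is illustrative.

```python
def scaled_runtime(trace_seconds, scalar_percent, iterations):
    """Approximate replay time of one trace after scaling and repeating."""
    return trace_seconds * scalar_percent / 100 * iterations


# Trace A: 60s scaled to 50%, run twice. Trace B: 60s unscaled, run once.
print(scaled_runtime(60, 50, 2), scaled_runtime(60, 100, 1))
```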
3469
In some cases, we want to understand the CPU overhead of a test. For
example, we may test patches specifically to see whether they reduce
CPU usage. Fio implements a balloon approach: it creates a thread per
CPU that runs at idle priority, meaning that it only runs when nobody
else needs the CPU. By measuring the amount of work completed by each
thread, the idleness of each CPU can be derived.
3477
A unit of work is defined as touching a full page of unsigned characters.
The mean and standard deviation of the time to complete a unit of work
are reported in the "unit work" section. Options can be chosen to report
detailed per-CPU idleness or overall system idleness by aggregating
per-CPU stats.
3483
When data verification is done, fio is usually run in one of two ways.
The first is a normal write job of some sort with verify enabled. When
the write phase has completed, fio switches to reads and verifies
everything it wrote. The second model is running just the write phase,
and then later on running the same job (but with reads instead of
writes) to repeat the same I/O patterns and verify the contents. Both
of these methods depend on the write phase being completed, as fio
otherwise has no idea how much data was written.
3493
3494 With verification triggers, fio supports dumping the current write
3495 state to local files. Then a subsequent read verify workload can load
3496 this state and know exactly where to stop. This is useful for testing
3497 cases where power is cut to a server in a managed fashion, for
3498 instance.
3499
3500 A verification trigger consists of two things:
3501
3502 1) Storing the write state of each job.
3503
3504 2) Executing a trigger command.
3505
3506 The write state is relatively small, on the order of hundreds of bytes
3507 to single kilobytes. It contains information on the number of comple‐
3508 tions done, the last X completions, etc.
3509
3510 A trigger is invoked either through creation ('touch') of a specified
3511 file in the system, or through a timeout setting. If fio is run with
3512 `--trigger-file=/tmp/trigger-file', then it will continually check for
3513 the existence of `/tmp/trigger-file'. When it sees this file, it will
3514 fire off the trigger (thus saving state, and executing the trigger com‐
3515 mand).
3516
3517 For client/server runs, there's both a local and remote trigger. If fio
3518 is running as a server backend, it will send the job states back to the
3519 client for safe storage, then execute the remote trigger, if specified.
3520 If a local trigger is specified, the server will still send back the
3521 write state, but the client will then execute the trigger.
3522
3523 Verification trigger example
Let's say we want to run a powercut test on the remote Linux
machine 'server'. Our write workload is in `write-test.fio'. We
want to cut power to 'server' at some point during the run, and
we'll run this test from the safety of our local machine,
'localbox'. On the server, we'll start the fio backend normally:
3529
3530 server# fio --server
3531
3532 and on the client, we'll fire off the workload:
3533
localbox$ fio --client=server --trigger-file=/tmp/my-trigger --trigger-remote="bash -c \"echo b > /proc/sysrq-trigger\""
3537
3538 We set `/tmp/my-trigger' as the trigger file, and we tell fio to
3539 execute:
3540
3541 echo b > /proc/sysrq-trigger
3542
3543 on the server once it has received the trigger and sent us the
3544 write state. This will work, but it's not really cutting power
3545 to the server, it's merely abruptly rebooting it. If we have a
3546 remote way of cutting power to the server through IPMI or simi‐
3547 lar, we could do that through a local trigger command instead.
3548 Let's assume we have a script that does IPMI reboot of a given
3549 hostname, ipmi-reboot. On localbox, we could then have run fio
3550 with a local trigger instead:
3551
localbox$ fio --client=server --trigger-file=/tmp/my-trigger --trigger="ipmi-reboot server"
3554
3555 For this case, fio would wait for the server to send us the
3556 write state, then execute `ipmi-reboot server' when that hap‐
3557 pened.
3558
3559 Loading verify state
3560 To load stored write state, a read verification job file must
3561 contain the verify_state_load option. If that is set, fio will
3562 load the previously stored state. For a local fio run this is
3563 done by loading the files directly, and on a client/server run,
3564 the server backend will ask the client to send the files over
3565 and load them from there.
3566
3568 Fio supports a variety of log file formats, for logging latencies,
3569 bandwidth, and IOPS. The logs share a common format, which looks like
3570 this:
3571
3572 time (msec), value, data direction, block size (bytes), offset
3573 (bytes)
3574
`Time' for the log entry is always in milliseconds. The `value' logged
depends on the type of log and will be one of the following:
3577
3578 Latency log
3579 Value is latency in nsecs
3580
3581 Bandwidth log
3582 Value is in KiB/sec
3583
3584 IOPS log
3585 Value is IOPS
3586
3587 `Data direction' is one of the following:
3588
3589 0 I/O is a READ
3590
3591 1 I/O is a WRITE
3592
3593 2 I/O is a TRIM
3594
3595 The entry's `block size' is always in bytes. The `offset' is the posi‐
3596 tion in bytes from the start of the file for that particular I/O. The
3597 logging of the offset can be toggled with log_offset.
3598
Fio defaults to logging every individual I/O, but when windowed logging
is set through log_avg_msec, either the average (by default) or the
maximum (if log_max_value is set) `value' seen over the specified period
of time is recorded. Each `data direction' seen within the window
period will aggregate its values in a separate row. Further, when using
windowed logging the `block size' and `offset' entries will always
contain 0.
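
A log line in the common format can be read with a few lines of code. The sketch below assumes comma-separated integer fields in the order listed above, treats the offset as optional since its logging can be toggled, and ignores any further columns; the names and the sample line are hypothetical.

```python
from typing import NamedTuple, Optional

DIRECTIONS = {0: "read", 1: "write", 2: "trim"}


class LogEntry(NamedTuple):
    time_msec: int
    value: int           # nsec, KiB/sec or IOPS, depending on the log type
    direction: str
    block_size: int
    offset: Optional[int]


def parse_log_line(line: str) -> LogEntry:
    """Parse 'time, value, data direction, block size[, offset]'."""
    fields = [int(field) for field in line.split(",")]
    time_msec, value, ddir, block_size = fields[:4]
    offset = fields[4] if len(fields) > 4 else None
    return LogEntry(time_msec, value, DIRECTIONS[ddir], block_size, offset)


# Hypothetical latency log line: a 4096-byte write at offset 8192.
print(parse_log_line("1000, 5321, 1, 4096, 8192"))
```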
3606
Normally fio is invoked as a stand-alone application on the machine
where the I/O workload should be generated. However, the backend and
frontend of fio can be run separately, i.e., the fio server can generate
an I/O workload on the "Device Under Test" (DUT) while being controlled
by a client on another machine.
3613
3614 Start the server on the machine which has access to the storage DUT:
3615
3616 $ fio --server=args
3617
where `args' defines what fio listens to. The arguments are of the form
`type:hostname,port', in which each part is optional (see the examples
below). `type' is either `ip' (or ip4) for TCP/IP v4, `ip6' for TCP/IP
v6, or `sock' for a local unix domain socket. `hostname' is either a
hostname or IP address, and `port' is the port to listen on (only valid
for TCP/IP, not a local socket). Some examples:
3624
3625 1) fio --server
3626 Start a fio server, listening on all interfaces on the
3627 default port (8765).
3628
3629 2) fio --server=ip:hostname,4444
3630 Start a fio server, listening on IP belonging to hostname
3631 and on port 4444.
3632
3633 3) fio --server=ip6:::1,4444
3634 Start a fio server, listening on IPv6 localhost ::1 and
3635 on port 4444.
3636
3637 4) fio --server=,4444
3638 Start a fio server, listening on all interfaces on port
3639 4444.
3640
3641 5) fio --server=1.2.3.4
3642 Start a fio server, listening on IP 1.2.3.4 on the
3643 default port.
3644
3645 6) fio --server=sock:/tmp/fio.sock
3646 Start a fio server, listening on the local socket
3647 `/tmp/fio.sock'.
3648
3649 Once a server is running, a "client" can connect to the fio server
3650 with:
3651
3652 $ fio <local-args> --client=<server> <remote-args> <job file(s)>
3653
3654 where `local-args' are arguments for the client where it is running,
3655 `server' is the connect string, and `remote-args' and `job file(s)' are
3656 sent to the server. The `server' string follows the same format as it
3657 does on the server side, to allow IP/hostname/socket and port strings.
3658
3659 Fio can connect to multiple servers this way:
3660
3661 $ fio --client=<server1> <job file(s)> --client=<server2> <job
3662 file(s)>
3663
3664 If the job file is located on the fio server, then you can tell the
3665 server to load a local file as well. This is done by using
3666 --remote-config:
3667
3668 $ fio --client=server --remote-config /path/to/file.fio
3669
3670 Then fio will open this local (to the server) job file instead of being
3671 passed one from the client.
3672
If you have many servers (example: 100 VMs/containers), you can input a
pathname of a file containing host IPs/names as the parameter value for
the --client option. For example, here is a `host.list' file
containing 2 hostnames:
3677
3678 host1.your.dns.domain
3679 host2.your.dns.domain
3680
3681 The fio command would then be:
3682
3683 $ fio --client=host.list <job file(s)>
3684
3685 In this mode, you cannot input server-specific parameters or job files
3686 -- all servers receive the same job file.
3687
3688 In order to let `fio --client' runs use a shared filesystem from multi‐
3689 ple hosts, `fio --client' now prepends the IP address of the server to
3690 the filename. For example, if fio is using the directory `/mnt/nfs/fio'
3691 and is writing filename `fileio.tmp', with a --client `hostfile' con‐
3692 taining two hostnames `h1' and `h2' with IP addresses 192.168.10.120
3693 and 192.168.10.121, then fio will create two files:
3694
3695 /mnt/nfs/fio/192.168.10.120.fileio.tmp
3696 /mnt/nfs/fio/192.168.10.121.fileio.tmp
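
The naming rule shown above can be expressed in a couple of lines, which may help when collecting result files from a shared mount afterwards. This is only a sketch of the rule as the example presents it; the function name is hypothetical.

```python
import posixpath


def client_filename(directory, server_ip, filename):
    """Return the per-server file path: <directory>/<server IP>.<filename>."""
    return posixpath.join(directory, f"{server_ip}.{filename}")


for ip in ("192.168.10.120", "192.168.10.121"):
    print(client_filename("/mnt/nfs/fio", ip, "fileio.tmp"))
```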
3697
3698 Terse output in client/server mode will differ slightly from what is
3699 produced when fio is run in stand-alone mode. See the terse output sec‐
3700 tion for details.
3701
3703 fio was written by Jens Axboe <axboe@kernel.dk>.
3704 This man page was written by Aaron Carroll <aaronc@cse.unsw.edu.au>
3705 based on documentation by Jens Axboe.
3706 This man page was rewritten by Tomohiro Kusumi <tkusumi@tuxera.com>
3707 based on documentation by Jens Axboe.
3708
3710 Report bugs to the fio mailing list <fio@vger.kernel.org>.
3711 See REPORTING-BUGS.
3712
3713 REPORTING-BUGS: http://git.kernel.dk/cgit/fio/plain/REPORTING-BUGS
3714
3716 For further documentation see HOWTO and README.
3717 Sample jobfiles are available in the `examples/' directory.
3718 These are typically located under `/usr/share/doc/fio'.
3719
3720 HOWTO: http://git.kernel.dk/cgit/fio/plain/HOWTO
3721 README: http://git.kernel.dk/cgit/fio/plain/README
3722
3723
3724
3725User Manual August 2017 fio(1)