fio(1) General Commands Manual fio(1)
4
6 fio - flexible I/O tester
7
9 fio [options] [jobfile]...
10
12 fio is a tool that will spawn a number of threads or processes doing a
13 particular type of I/O action as specified by the user. The typical
14 use of fio is to write a job file matching the I/O load one wants to
15 simulate.
16
18 --debug=type
19 Enable verbose tracing type of various fio actions. May be `all'
20 for all types or individual types separated by a comma (e.g.
21 `--debug=file,mem' will enable file and memory debugging).
22 `help' will list all available tracing options.
23
24 --parse-only
25 Parse options only, don't start any I/O.
26
27 --merge-blktrace-only
28 Merge blktraces only, don't start any I/O.
29
30 --output=filename
31 Write output to filename.
32
33 --output-format=format
34 Set the reporting format to `normal', `terse', `json', or
`json+'. Multiple formats can be selected, separated by a comma.
36 `terse' is a CSV based format. `json+' is like `json', except it
37 adds a full dump of the latency buckets.
38
39 --bandwidth-log
40 Generate aggregate bandwidth logs.
41
42 --minimal
43 Print statistics in a terse, semicolon-delimited format.
44
45 --append-terse
46 Print statistics in selected mode AND terse, semicolon-delimited
47 format. Deprecated, use --output-format instead to select mul‐
48 tiple formats.
49
50 --terse-version=version
51 Set terse version output format (default `3', or `2', `4', `5').
52
53 --version
54 Print version information and exit.
55
56 --help Print a summary of the command line options and exit.
57
58 --cpuclock-test
59 Perform test and validation of internal CPU clock.
60
61 --crctest=[test]
62 Test the speed of the built-in checksumming functions. If no ar‐
63 gument is given, all of them are tested. Alternatively, a comma
64 separated list can be passed, in which case the given ones are
65 tested.
66
67 --cmdhelp=command
68 Print help information for command. May be `all' for all com‐
69 mands.
70
71 --enghelp=[ioengine[,command]]
72 List all commands defined by ioengine, or print help for command
73 defined by ioengine. If no ioengine is given, list all available
74 ioengines.
75
76 --showcmd=jobfile
77 Convert jobfile to a set of command-line options.
78
79 --readonly
80 Turn on safety read-only checks, preventing writes and trims.
81 The --readonly option is an extra safety guard to prevent users
82 from accidentally starting a write or trim workload when that is
83 not desired. Fio will only modify the device under test if
84 `rw=write/randwrite/rw/randrw/trim/randtrim/trimwrite' is given.
85 This safety net can be used as an extra precaution.
86
87 --eta=when
Specifies when the real-time ETA estimate should be printed. when
may be `always', `never' or `auto'. `auto' is the default; it
prints ETA when requested if the output is a TTY. `always'
disregards the output type and prints ETA when requested. `never'
never prints ETA.
93
94 --eta-interval=time
95 By default, fio requests client ETA status roughly every second.
With this option, the interval is configurable. Fio imposes a
minimum allowed time to avoid flooding the console; intervals
shorter than 250 msec are not supported.
99
100 --eta-newline=time
101 Force a new line for every time period passed. When the unit is
102 omitted, the value is interpreted in seconds.
103
104 --status-interval=time
105 Force a full status dump of cumulative (from job start) values
106 at time intervals. This option does *not* provide per-period
107 measurements. So values such as bandwidth are running averages.
108 When the time unit is omitted, time is interpreted in seconds.
109 Note that using this option with `--output-format=json' will
110 yield output that technically isn't valid json, since the output
111 will be collated sets of valid json. It will need to be split
112 into valid sets of json after the run.
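
Splitting the collated output can be sketched in Python as follows. This is a minimal sketch, assuming the captured text consists only of back-to-back JSON documents separated by whitespace; any non-JSON lines in real output are not handled. The helper name split_json_stream is hypothetical.

```python
import json


def split_json_stream(text):
    """Yield each JSON document found back-to-back in text."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode() does not skip leading whitespace, so do it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        # raw_decode() returns the parsed object and the index where it ended.
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


if __name__ == "__main__":
    # Placeholder documents, not real fio output.
    sample = '{"sample": 1}\n{"sample": 2}\n'
    for doc in split_json_stream(sample):
        print(doc)
```

Each yielded object is one complete status dump and can be processed on its own.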
113
114 --section=name
115 Only run specified section name in job file. Multiple sections
116 can be specified. The --section option allows one to combine
117 related jobs into one file. E.g. one job file could define
118 light, moderate, and heavy sections. Tell fio to run only the
119 "heavy" section by giving `--section=heavy' command line option.
120 One can also specify the "write" operations in one section and
121 "verify" operation in another section. The --section option only
122 applies to job sections. The reserved *global* section is always
123 parsed and used.
124
125 --alloc-size=kb
126 Allocate additional internal smalloc pools of size kb in KiB.
127 The --alloc-size option increases shared memory set aside for
128 use by fio. If running large jobs with randommap enabled, fio
129 can run out of memory. Smalloc is an internal allocator for
130 shared structures from a fixed size memory pool and can grow to
131 16 pools. The pool size defaults to 16MiB. NOTE: While running
132 `.fio_smalloc.*' backing store files are visible in `/tmp'.
133
134 --warnings-fatal
135 All fio parser warnings are fatal, causing fio to exit with an
136 error.
137
138 --max-jobs=nr
139 Set the maximum number of threads/processes to support to nr.
140 NOTE: On Linux, it may be necessary to increase the shared-mem‐
141 ory limit (`/proc/sys/kernel/shmmax') if fio runs into errors
142 while creating jobs.
143
144 --server=args
145 Start a backend server, with args specifying what to listen to.
146 See CLIENT/SERVER section.
147
148 --daemonize=pidfile
Background a fio server, writing the pid to the given pidfile.
151
152 --client=hostname
153 Instead of running the jobs locally, send and run them on the
154 given hostname or set of hostnames. See CLIENT/SERVER section.
155
156 --remote-config=file
157 Tell fio server to load this local file.
158
159 --idle-prof=option
160 Report CPU idleness. option is one of the following:
161
162 calibrate
163 Run unit work calibration only and exit.
164
165 system Show aggregate system idleness and unit work.
166
167 percpu As system but also show per CPU idleness.
168
169 --inflate-log=log
170 Inflate and output compressed log.
171
172 --trigger-file=file
173 Execute trigger command when file exists.
174
175 --trigger-timeout=time
176 Execute trigger at this time.
177
178 --trigger=command
179 Set this command as local trigger.
180
181 --trigger-remote=command
182 Set this command as remote trigger.
183
184 --aux-path=path
185 Use the directory specified by path for generated state files
186 instead of the current working directory.
187
189 Any parameters following the options will be assumed to be job files,
190 unless they match a job file parameter. Multiple job files can be
191 listed and each job file will be regarded as a separate group. Fio will
192 stonewall execution between each group.
193
194 Fio accepts one or more job files describing what it is supposed to do.
195 The job file format is the classic ini file, where the names enclosed
196 in [] brackets define the job name. You are free to use any ASCII name
197 you want, except *global* which has special meaning. Following the job
198 name is a sequence of zero or more parameters, one per line, that de‐
199 fine the behavior of the job. If the first character in a line is a ';'
200 or a '#', the entire line is discarded as a comment.
201
202 A *global* section sets defaults for the jobs described in that file. A
203 job may override a *global* section parameter, and a job file may even
204 have several *global* sections if so desired. A job is only affected by
205 a *global* section residing above it.
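
The scoping rule for *global* sections can be modelled with a short Python sketch. It illustrates the rules described above and is not fio's actual parser; the function name parse_job_file and the sample job names are hypothetical, and comment detection after stripping whitespace is a simplification.

```python
def parse_job_file(text):
    """Return {job_name: {param: value}} with *global* defaults applied."""
    defaults = {}    # accumulated from every [global] section seen so far
    jobs = {}
    current = None   # dict receiving the parameters of the current section
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines starting with ';' or '#'.
        if not line or line[0] in ";#":
            continue
        if line.startswith("[") and line.endswith("]"):
            name = line[1:-1]
            if name == "global":
                current = defaults
            else:
                # A job starts from a copy of the defaults defined above it,
                # so later [global] sections do not affect it.
                current = jobs[name] = dict(defaults)
            continue
        if current is None:
            raise ValueError("parameter outside of a section: " + line)
        key, sep, value = line.partition("=")
        # Flag-style parameters such as time_based carry no value.
        current[key.strip()] = value.strip() if sep else True
    return jobs


if __name__ == "__main__":
    sample = """
    ; job1 sees rw=randread, job2 sees rw=write
    [global]
    rw=randread

    [job1]

    [global]
    rw=write

    [job2]
    time_based
    """
    print(parse_job_file(sample))
```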
206
The --cmdhelp option also lists all options. If used with a command
argument, --cmdhelp will detail the given command.
209
210 See the `examples/' directory for inspiration on how to write job
211 files. Note the copyright and license requirements currently apply to
212 `examples/' files.
213
214 Note that the maximum length of a line in the job file is 8192 bytes.
215
217 Some parameters take an option of a given type, such as an integer or a
218 string. Anywhere a numeric value is required, an arithmetic expression
219 may be used, provided it is surrounded by parentheses. Supported opera‐
220 tors are:
221
222 addition (+)
223
224 subtraction (-)
225
226 multiplication (*)
227
228 division (/)
229
230 modulus (%)
231
232 exponentiation (^)
233
234 For time values in expressions, units are microseconds by default. This
235 is different than for time values not in expressions (not enclosed in
236 parentheses).
237
239 The following parameter types are used.
240
241 str String. A sequence of alphanumeric characters.
242
time Integer with possible time suffix. Without a unit, the value is
interpreted as seconds unless otherwise specified. Accepts a
suffix of 'd' for days, 'h' for hours, 'm' for minutes, 's' for
seconds, 'ms' (or 'msec') for milliseconds and 'us' (or 'usec')
for microseconds. For example, use 10m for 10 minutes.
248
249 int Integer. A whole number value, which may contain an integer pre‐
250 fix and an integer suffix.
251
252 [*integer prefix*] **number** [*integer suffix*]
253
254 The optional *integer prefix* specifies the number's base. The
255 default is decimal. *0x* specifies hexadecimal.
256
257 The optional *integer suffix* specifies the number's units, and
258 includes an optional unit prefix and an optional unit. For quan‐
259 tities of data, the default unit is bytes. For quantities of
260 time, the default unit is seconds unless otherwise specified.
261
262 With `kb_base=1000', fio follows international standards for
263 unit prefixes. To specify power-of-10 decimal values defined in
264 the International System of Units (SI):
265
266 K means kilo (K) or 1000
267 M means mega (M) or 1000**2
268 G means giga (G) or 1000**3
269 T means tera (T) or 1000**4
270 P means peta (P) or 1000**5
271
272 To specify power-of-2 binary values defined in IEC 80000-13:
273
274 Ki means kibi (Ki) or 1024
275 Mi means mebi (Mi) or 1024**2
276 Gi means gibi (Gi) or 1024**3
277 Ti means tebi (Ti) or 1024**4
278 Pi means pebi (Pi) or 1024**5
279
280 With `kb_base=1024' (the default), the unit prefixes are oppo‐
281 site from those specified in the SI and IEC 80000-13 standards
282 to provide compatibility with old scripts. For example, 4k means
283 4096.
284
285 For quantities of data, an optional unit of 'B' may be included
286 (e.g., 'kB' is the same as 'k').
287
288 The *integer suffix* is not case sensitive (e.g., m/mi mean
289 mebi/mega, not milli). 'b' and 'B' both mean byte, not bit.
290
Examples with `kb_base=1000':

4 KiB: 4096, 4096b, 4096B, 4ki, 4kib, 4kiB, 4Ki, 4KiB
1 MiB: 1048576, 1mi, 1024ki
1 MB: 1000000, 1m, 1000k
1 TiB: 1099511627776, 1ti, 1024gi, 1048576mi
1 TB: 1000000000000, 1t, 1000g, 1000000m

Examples with `kb_base=1024' (default):

4 KiB: 4096, 4096b, 4096B, 4k, 4kb, 4kB, 4K, 4KB
1 MiB: 1048576, 1m, 1024k
1 MB: 1000000, 1mi, 1000ki
1 TiB: 1099511627776, 1t, 1024g, 1048576m
1 TB: 1000000000000, 1ti, 1000gi, 1000000mi
306
307 To specify times (units are not case sensitive):
308
309 D means days
310 H means hours
M means minutes
312 s or sec means seconds (default)
313 ms or msec means milliseconds
314 us or usec means microseconds
315
The `z' suffix specifies that the value is measured in zones. The
value is recalculated once the block device's zone size becomes known.
318
319 If the option accepts an upper and lower range, use a colon ':'
320 or minus '-' to separate such values. See irange parameter type.
321 If the lower value specified happens to be larger than the upper
322 value the two values are swapped.
323
324 bool Boolean. Usually parsed as an integer, however only defined for
325 true and false (1 and 0).
326
327 irange Integer range with suffix. Allows value range to be given, such
328 as 1024-4096. A colon may also be used as the separator, e.g.
329 1k:4k. If the option allows two sets of ranges, they can be
330 specified with a ',' or '/' delimiter: 1k-4k/8k-32k. Also see
331 int parameter type.
332
333 float_list
334 A list of floating point numbers, separated by a ':' character.
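
The prefix rules for the int and irange types can be sketched in Python. This is an illustration of the rules stated above under simplifying assumptions, not fio's parser: the integer prefix (0x), the `z' suffix, and arithmetic expressions are not handled, and the names parse_size and parse_irange are hypothetical.

```python
import re

# Exponent applied to the base (1000 or 1024) for each unit prefix.
_POWERS = {"k": 1, "m": 2, "g": 3, "t": 4, "p": 5}


def parse_size(text, kb_base=1024):
    """Convert a data size such as '4k' or '1mi' to bytes."""
    if kb_base not in (1000, 1024):
        raise ValueError("kb_base must be 1000 or 1024")
    # Suffixes are not case sensitive; a trailing 'b' means byte.
    m = re.fullmatch(r"(\d+)(?:([kmgtp])(i?))?(b?)", text.strip().lower())
    if m is None:
        raise ValueError("unsupported size: " + text)
    number, prefix, iec, _unit = m.groups()
    value = int(number)
    if prefix is None:
        return value
    # kb_base=1000: 'k' is 1000 and 'ki' is 1024.
    # kb_base=1024 (compatibility mode): the two meanings are swapped.
    binary = (iec == "i") == (kb_base == 1000)
    return value * (1024 if binary else 1000) ** _POWERS[prefix]


def parse_irange(text, kb_base=1024):
    """Parse 'low-high' or 'low:high' into an ordered (low, high) pair."""
    parts = re.split(r"[-:]", text, maxsplit=1)
    low, high = (parse_size(p, kb_base) for p in parts)
    # A lower value larger than the upper value causes the two to be swapped.
    return (low, high) if low <= high else (high, low)


if __name__ == "__main__":
    print(parse_size("4k"), parse_size("4k", kb_base=1000))
    print(parse_irange("1k:4k"))
```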
335
337 With the above in mind, here follows the complete list of fio job pa‐
338 rameters.
339
340 Units
341 kb_base=int
342 Select the interpretation of unit prefixes in input parameters.
343
344 1000 Inputs comply with IEC 80000-13 and the Interna‐
345 tional System of Units (SI). Use:
346
347 - power-of-2 values with IEC prefixes (e.g., KiB)
348 - power-of-10 values with SI prefixes (e.g., kB)
349
350 1024 Compatibility mode (default). To avoid breaking
351 old scripts:
352
353 - power-of-2 values with SI prefixes
354 - power-of-10 values with IEC prefixes
355
356 See bs for more details on input parameters.
357
358 Outputs always use correct prefixes. Most outputs include both
359 side-by-side, like:
360
361 bw=2383.3kB/s (2327.4KiB/s)
362
363 If only one value is reported, then kb_base selects the one to
364 use:
365
366 1000 -- SI prefixes
367 1024 -- IEC prefixes
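
The two figures in the sample bw line above are the same rate expressed in 1000-byte and 1024-byte units. A small Python check of that arithmetic (the function name si_to_iec_rate is hypothetical):

```python
def si_to_iec_rate(kb_per_s):
    """Convert a rate in kB/s (1000-byte units) to KiB/s (1024-byte units)."""
    return kb_per_s * 1000 / 1024


if __name__ == "__main__":
    # 2383.3 kB/s from the sample line, rounded to one decimal place.
    print(round(si_to_iec_rate(2383.3), 1))
```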
368
369 unit_base=int
370 Base unit for reporting. Allowed values are:
371
372 0 Use auto-detection (default).
373
374 8 Byte based.
375
376 1 Bit based.
377
378 Job description
379 name=str
380 ASCII name of the job. This may be used to override the name
381 printed by fio for this job. Otherwise the job name is used. On
382 the command line this parameter has the special purpose of also
383 signaling the start of a new job.
384
385 description=str
386 Text description of the job. Doesn't do anything except dump
387 this text description when this job is run. It's not parsed.
388
389 loops=int
390 Run the specified number of iterations of this job. Used to re‐
391 peat the same workload a given number of times. Defaults to 1.
392
393 numjobs=int
Create the specified number of clones of this job. Each clone of
the job is spawned as an independent thread or process. May be
used to set up a larger number of threads/processes doing the same
thing. Each thread is reported separately; to see statistics for
all clones as a whole, use group_reporting in conjunction with
new_group. See --max-jobs. Default: 1.
400
401 Time related parameters
402 runtime=time
403 Tell fio to terminate processing after the specified period of
404 time. It can be quite hard to determine for how long a specified
405 job will run, so this parameter is handy to cap the total run‐
406 time to a given time. When the unit is omitted, the value is in‐
407 terpreted in seconds.
408
409 time_based
410 If set, fio will run for the duration of the runtime specified
411 even if the file(s) are completely read or written. It will sim‐
412 ply loop over the same workload as many times as the runtime al‐
413 lows.
414
415 startdelay=irange(int)
416 Delay the start of job for the specified amount of time. Can be
417 a single value or a range. When given as a range, each thread
418 will choose a value randomly from within the range. Value is in
419 seconds if a unit is omitted.
420
421 ramp_time=time
422 If set, fio will run the specified workload for this amount of
423 time before logging any performance numbers. Useful for letting
424 performance settle before logging results, thus minimizing the
425 runtime required for stable results. Note that the ramp_time is
426 considered lead in time for a job, thus it will increase the to‐
427 tal runtime if a special timeout or runtime is specified. When
428 the unit is omitted, the value is given in seconds.
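
How time values combine can be sketched in Python. This is a minimal model of the time suffixes listed under the time parameter type, with seconds as the default unit; it is not fio's parser, and the names parse_time_us and wall_clock_us are hypothetical.

```python
import re

# Multipliers to microseconds for the documented time suffixes.
_US = {
    "d": 86_400_000_000, "h": 3_600_000_000, "m": 60_000_000,
    "s": 1_000_000, "sec": 1_000_000,
    "ms": 1_000, "msec": 1_000,
    "us": 1, "usec": 1,
}


def parse_time_us(text, default_unit="s"):
    """Convert a time value such as '10m' or '250ms' to microseconds."""
    m = re.fullmatch(r"(\d+)([a-z]*)", text.strip().lower())
    if m is None:
        raise ValueError("unsupported time value: " + text)
    unit = m.group(2) or default_unit
    if unit not in _US:
        raise ValueError("unknown time unit: " + unit)
    return int(m.group(1)) * _US[unit]


def wall_clock_us(runtime, ramp_time="0"):
    """ramp_time is lead-in time, so it is added on top of runtime."""
    return parse_time_us(runtime) + parse_time_us(ramp_time)


if __name__ == "__main__":
    # runtime=60 with ramp_time=10s runs for about 70 seconds in total.
    print(wall_clock_us("60", "10s") / 1_000_000)
```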
429
430 clocksource=str
431 Use the given clocksource as the base of timing. The supported
432 options are:
433
434 gettimeofday
435 gettimeofday(2)
436
437 clock_gettime
438 clock_gettime(2)
439
440 cpu Internal CPU clock source
441
442 cpu is the preferred clocksource if it is reliable, as it is
443 very fast (and fio is heavy on time calls). Fio will automati‐
444 cally use this clocksource if it's supported and considered re‐
445 liable on the system it is running on, unless another clock‐
446 source is specifically set. For x86/x86-64 CPUs, this means sup‐
447 porting TSC Invariant.
448
449 gtod_reduce=bool
450 Enable all of the gettimeofday(2) reducing options (dis‐
451 able_clat, disable_slat, disable_bw_measurement) plus reduce
452 precision of the timeout somewhat to really shrink the gettime‐
453 ofday(2) call count. With this option enabled, we only do about
454 0.4% of the gettimeofday(2) calls we would have done if all time
455 keeping was enabled.
456
457 gtod_cpu=int
458 Sometimes it's cheaper to dedicate a single thread of execution
459 to just getting the current time. Fio (and databases, for in‐
460 stance) are very intensive on gettimeofday(2) calls. With this
461 option, you can set one CPU aside for doing nothing but logging
462 current time to a shared memory location. Then the other
463 threads/processes that run I/O workloads need only copy that
464 segment, instead of entering the kernel with a gettimeofday(2)
465 call. The CPU set aside for doing these time calls will be ex‐
466 cluded from other uses. Fio will manually clear it from the CPU
467 mask of other jobs.
468
469 Target file/device
470 directory=str
Prefix filenames with this directory. Used to place files in a
different location than `./'. You can specify a number of
directories by separating the names with a ':' character. These
directories are distributed equally among the job clones created
by numjobs, as long as they are using generated filenames. If
specific filename(s) are set, fio uses the first listed directory,
thereby matching the filename semantic (which generates a file for
each clone if not specified, but lets all clones use the same file
if set).
480
481 See the filename option for information on how to escape ':'
482 characters within the directory path itself.
483
484 Note: To control the directory fio will use for internal state
485 files use --aux-path.
486
487 filename=str
488 Fio normally makes up a filename based on the job name, thread
489 number, and file number (see filename_format). If you want to
490 share files between threads in a job or several jobs with fixed
491 file paths, specify a filename for each of them to override the
492 default. If the ioengine is file based, you can specify a number
493 of files by separating the names with a ':' colon. So if you
494 wanted a job to open `/dev/sda' and `/dev/sdb' as the two work‐
495 ing files, you would use `filename=/dev/sda:/dev/sdb'. This also
496 means that whenever this option is specified, nrfiles is ig‐
497 nored. The size of regular files specified by this option will
498 be size divided by number of files unless an explicit size is
499 specified by filesize.
500
501 Each colon in the wanted path must be escaped with a '\' charac‐
502 ter. For instance, if the path is `/dev/dsk/foo@3,0:c' then you
503 would use `filename=/dev/dsk/foo@3,0\:c' and if the path is
504 `F:\filename' then you would use `filename=F\:\filename'.
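
The escaping rule can be illustrated with a small Python sketch that splits a filename value on unescaped colons. It models the rule as described here and is not fio's own code; the function name split_filenames is hypothetical.

```python
def split_filenames(value):
    """Split a filename= value on unescaped ':' and unescape '\\:'."""
    names, current, i = [], [], 0
    while i < len(value):
        if value.startswith("\\:", i):
            # An escaped colon is part of the path itself.
            current.append(":")
            i += 2
        elif value[i] == ":":
            # An unescaped colon separates two file names.
            names.append("".join(current))
            current = []
            i += 1
        else:
            current.append(value[i])
            i += 1
    names.append("".join(current))
    return names


if __name__ == "__main__":
    print(split_filenames("/dev/sda:/dev/sdb"))
    print(split_filenames("/dev/dsk/foo@3,0\\:c"))
```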
505
506 On Windows, disk devices are accessed as `\\.\PhysicalDrive0'
507 for the first device, `\\.\PhysicalDrive1' for the second etc.
508 Note: Windows and FreeBSD prevent write access to areas of the
509 disk containing in-use data (e.g. filesystems).
510
511 The filename `-' is a reserved name, meaning *stdin* or *std‐
512 out*. Which of the two depends on the read/write direction set.
513
514 filename_format=str
515 If sharing multiple files between jobs, it is usually necessary
516 to have fio generate the exact names that you want. By default,
517 fio will name a file based on the default file format specifica‐
518 tion of `jobname.jobnumber.filenumber'. With this option, that
519 can be customized. Fio will recognize and replace the following
520 keywords in this string:
521
522 $jobname
523 The name of the worker thread or process.
524
525 $clientuid
526 IP of the fio process when using client/server
527 mode.
528
529 $jobnum
530 The incremental number of the worker thread or
531 process.
532
533 $filenum
534 The incremental number of the file for that worker
535 thread or process.
536
537 To have dependent jobs share a set of files, this option can be
538 set to have fio generate filenames that are shared between the
539 two. For instance, if `testfiles.$filenum' is specified, file
540 number 4 for any job will be named `testfiles.4'. The default of
541 `$jobname.$jobnum.$filenum' will be used if no other format
542 specifier is given.
543
If you specify a path then the directories will be created up to
the main directory for the file. So for example if you specify
`a/b/c/$jobnum' then the directories a/b/c will be created before
the file setup part of the job. If you specify directory then the
path will be relative to that directory; otherwise it is treated
as an absolute path.
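
The keyword substitution can be imitated in Python with string.Template, whose $name syntax happens to match the keywords above. This is only an illustration of the naming scheme, not how fio implements it, and expand_filename is a hypothetical helper.

```python
from string import Template


def expand_filename(fmt, jobname, jobnum, filenum, clientuid=""):
    """Replace $jobname, $clientuid, $jobnum and $filenum in fmt."""
    # safe_substitute() leaves any unknown $keyword untouched.
    return Template(fmt).safe_substitute(
        jobname=jobname, clientuid=clientuid, jobnum=jobnum, filenum=filenum
    )


if __name__ == "__main__":
    # Shared naming: every job maps file number 4 to the same name.
    print(expand_filename("testfiles.$filenum", "reader", 0, 4))
    print(expand_filename("testfiles.$filenum", "writer", 1, 4))
```

Because the format omits $jobname and $jobnum, both jobs resolve to the same file name, which is what lets dependent jobs share a file set.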
550
551 unique_filename=bool
552 To avoid collisions between networked clients, fio defaults to
553 prefixing any generated filenames (with a directory specified)
554 with the source of the client connecting. To disable this behav‐
555 ior, set this option to 0.
556
557 opendir=str
558 Recursively open any files below directory str.
559
560 lockfile=str
561 Fio defaults to not locking any files before it does I/O to
562 them. If a file or file descriptor is shared, fio can serialize
I/O to that file to make the end result consistent. This is
useful for emulating real workloads that share files. The lock
565 modes are:
566
567 none No locking. The default.
568
569 exclusive
570 Only one thread or process may do I/O at a time,
571 excluding all others.
572
573 readwrite
574 Read-write locking on the file. Many readers may
575 access the file at the same time, but writes get
576 exclusive access.
577
578 nrfiles=int
Number of files to use for this job. Defaults to 1. The size of
files will be size divided by this unless an explicit size is
specified by filesize. Files are created for each thread separately,
and each file will have a file number within its name by default,
as explained in the filename section.
584
585 openfiles=int
Number of files to keep open at the same time. Defaults to the
same as nrfiles, can be set smaller to limit the number of
simultaneous opens.
589
590 file_service_type=str
591 Defines how fio decides which file from a job to service next.
592 The following types are defined:
593
594 random Choose a file at random.
595
596 roundrobin
597 Round robin over opened files. This is the de‐
598 fault.
599
600 sequential
601 Finish one file before moving on to the next. Mul‐
602 tiple files can still be open depending on open‐
603 files.
604
605 zipf Use a Zipf distribution to decide what file to ac‐
606 cess.
607
608 pareto Use a Pareto distribution to decide what file to
609 access.
610
611 normal Use a Gaussian (normal) distribution to decide
612 what file to access.
613
614 gauss Alias for normal.
615
616 For random, roundrobin, and sequential, a postfix can be ap‐
617 pended to tell fio how many I/Os to issue before switching to a
618 new file. For example, specifying `file_service_type=random:8'
619 would cause fio to issue 8 I/Os before selecting a new file at
620 random. For the non-uniform distributions, a floating point
621 postfix can be given to influence how the distribution is
622 skewed. See random_distribution for a description of how that
623 would work.
624
625 ioscheduler=str
626 Attempt to switch the device hosting the file to the specified
627 I/O scheduler before running.
628
629 create_serialize=bool
630 If true, serialize the file creation for the jobs. This may be
631 handy to avoid interleaving of data files, which may greatly de‐
632 pend on the filesystem used and even the number of processors in
633 the system. Default: true.
634
635 create_fsync=bool
636 fsync(2) the data file after creation. This is the default.
637
638 create_on_open=bool
639 If true, don't pre-create files but allow the job's open() to
640 create a file when it's time to do I/O. Default: false -- pre-
641 create all necessary files when the job starts.
642
643 create_only=bool
644 If true, fio will only run the setup phase of the job. If files
645 need to be laid out or updated on disk, only that will be done
646 -- the actual job contents are not executed. Default: false.
647
648 allow_file_create=bool
649 If true, fio is permitted to create files as part of its work‐
650 load. If this option is false, then fio will error out if the
651 files it needs to use don't already exist. Default: true.
652
653 allow_mounted_write=bool
If this isn't set, fio will abort jobs that are destructive
(e.g. that write) to what appears to be a mounted device or
partition. This helps catch inadvertently destructive tests, where
the user does not realize that the test will destroy data on the
mounted file system. Note that some platforms don't allow writing
against a mounted device regardless of this option. Default:
false.
661
662 pre_read=bool
663 If this is given, files will be pre-read into memory before
664 starting the given I/O operation. This will also clear the in‐
665 validate flag, since it is pointless to pre-read and then drop
666 the cache. This will only work for I/O engines that are seek-
667 able, since they allow you to read the same data multiple times.
668 Thus it will not work on non-seekable I/O engines (e.g. network,
669 splice). Default: false.
670
671 unlink=bool
672 Unlink the job files when done. Not the default, as repeated
673 runs of that job would then waste time recreating the file set
674 again and again. Default: false.
675
676 unlink_each_loop=bool
677 Unlink job files after each iteration or loop. Default: false.
678
679 zonemode=str
680 Accepted values are:
681
none The zonerange, zonesize, zonecapacity and zoneskip
parameters are ignored.
684
685 strided
686 I/O happens in a single zone until zonesize bytes
687 have been transferred. After that number of bytes
688 has been transferred processing of the next zone
689 starts. The zonecapacity parameter is ignored.
690
691 zbd Zoned block device mode. I/O happens sequentially
692 in each zone, even if random I/O has been se‐
693 lected. Random I/O happens across all zones in‐
694 stead of being restricted to a single zone.
695
696 zonerange=int
697 For zonemode=strided, this is the size of a single zone. See
698 also zonesize and zoneskip.
699
700 For zonemode=zbd, this parameter is ignored.
701
702 zonesize=int
703 For zonemode=strided, this is the number of bytes to transfer
704 before skipping zoneskip bytes. If this parameter is smaller
705 than zonerange then only a fraction of each zone with zonerange
706 bytes will be accessed. If this parameter is larger than zon‐
707 erange then each zone will be accessed multiple times before
708 skipping to the next zone.
709
710 For zonemode=zbd, this is the size of a single zone. The zon‐
711 erange parameter is ignored in this mode. For a job accessing a
712 zoned block device, the specified zonesize must be 0 or equal to
713 the device zone size. For a regular block device or file, the
714 specified zonesize must be at least 512B.
715
716 zonecapacity=int
717 For zonemode=zbd, this defines the capacity of a single zone,
718 which is the accessible area starting from the zone start ad‐
719 dress. This parameter only applies when using zonemode=zbd in
720 combination with regular block devices. If not specified it de‐
721 faults to the zone size. If the target device is a zoned block
722 device, the zone capacity is obtained from the device informa‐
723 tion and this option is ignored.
724
725 zoneskip=int[z]
726 For zonemode=strided, the number of bytes to skip after zonesize
727 bytes of data have been transferred.
728
729 For zonemode=zbd, the zonesize aligned number of bytes to skip
730 once a zone is fully written (write workloads) or all written
731 data in the zone have been read (read workloads). This parameter
732 is valid only for sequential workloads and ignored for random
733 workloads. For read workloads, see also read_beyond_wp.
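
For zonemode=strided, the interplay of zonerange, zonesize and zoneskip can be sketched in Python. The sketch assumes that consecutive zones start zonerange + zoneskip bytes apart, which the text above does not spell out, so treat that stride as an assumption. The function names are hypothetical.

```python
def strided_zone_starts(file_size, zonerange, zoneskip, start=0):
    """Return the start offset of every zone that begins inside the file."""
    # Assumption: the next zone begins zonerange + zoneskip bytes later.
    stride = zonerange + zoneskip
    return list(range(start, file_size, stride))


def coverage_per_visit(zonesize, zonerange):
    """Fraction of a zone transferred before moving on to the next zone.

    A value below 1 means only part of each zone is accessed; a value
    above 1 means the zone is accessed more than once per visit.
    """
    return zonesize / zonerange


if __name__ == "__main__":
    # 1 MiB file, 256 KiB zones, 256 KiB skipped between zones.
    print(strided_zone_starts(1 << 20, 256 << 10, 256 << 10))
    print(coverage_per_visit(128 << 10, 256 << 10))
```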
734
735
736 read_beyond_wp=bool
737 This parameter applies to zonemode=zbd only.
738
739 Zoned block devices are block devices that consist of multiple
740 zones. Each zone has a type, e.g. conventional or sequential. A
741 conventional zone can be written at any offset that is a multi‐
742 ple of the block size. Sequential zones must be written sequen‐
743 tially. The position at which a write must occur is called the
744 write pointer. A zoned block device can be either host managed
745 or host aware. For host managed devices the host must ensure
746 that writes happen sequentially. Fio recognizes host managed de‐
747 vices and serializes writes to sequential zones for these de‐
748 vices.
749
750 If a read occurs in a sequential zone beyond the write pointer
751 then the zoned block device will complete the read without read‐
752 ing any data from the storage medium. Since such reads lead to
753 unrealistically high bandwidth and IOPS numbers fio only reads
754 beyond the write pointer if explicitly told to do so. Default:
755 false.
756
757 max_open_zones=int
When running a random write test across an entire drive many
more zones will be open than in a typical application workload.
This option limits the number of open zones. The number of open
zones is defined as the number of zones to which write commands
are issued by all threads/processes.
764
765 job_max_open_zones=int
766 Limit on the number of simultaneously opened zones per single
767 thread/process.
768
769 zone_reset_threshold=float
770 A number between zero and one that indicates the ratio of logi‐
771 cal blocks with data to the total number of logical blocks in
772 the test above which zones should be reset periodically.
773
774 zone_reset_frequency=float
775 A number between zero and one that indicates how often a zone
776 reset should be issued if the zone reset threshold has been ex‐
777 ceeded. A zone reset is submitted after each (1 / zone_re‐
778 set_frequency) write requests. This and the previous parameter
779 can be used to simulate garbage collection activity.
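
The relationship between the two zone reset parameters can be written out as a small Python sketch. It restates the arithmetic given above and does not reflect fio's internal bookkeeping; the function names are hypothetical.

```python
def resets_enabled(blocks_with_data, total_blocks, zone_reset_threshold):
    """True once the ratio of blocks holding data exceeds the threshold."""
    return blocks_with_data / total_blocks > zone_reset_threshold


def writes_between_resets(zone_reset_frequency):
    """A reset is submitted after each (1 / zone_reset_frequency) writes."""
    if not 0 < zone_reset_frequency <= 1:
        raise ValueError("zone_reset_frequency must be in (0, 1]")
    return round(1 / zone_reset_frequency)


if __name__ == "__main__":
    # With a threshold of 0.5 and a frequency of 0.25, resets start once
    # more than half of the blocks hold data, then occur every 4 writes.
    print(resets_enabled(60, 100, 0.5), writes_between_resets(0.25))
```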
780
781
782 I/O type
783 direct=bool
784 If value is true, use non-buffered I/O. This is usually O_DI‐
785 RECT. Note that OpenBSD and ZFS on Solaris don't support direct
786 I/O. On Windows the synchronous ioengines don't support direct
787 I/O. Default: false.
788
789 atomic=bool
790 If value is true, attempt to use atomic direct I/O. Atomic
791 writes are guaranteed to be stable once acknowledged by the op‐
792 erating system. Only Linux supports O_ATOMIC right now.
793
794 buffered=bool
795 If value is true, use buffered I/O. This is the opposite of the
796 direct option. Defaults to true.
797
798 readwrite=str, rw=str
799 Type of I/O pattern. Accepted values are:
800
801 read Sequential reads.
802
803 write Sequential writes.
804
805 trim Sequential trims (Linux block devices and SCSI
806 character devices only).
807
808 randread
809 Random reads.
810
811 randwrite
812 Random writes.
813
814 randtrim
815 Random trims (Linux block devices and SCSI charac‐
816 ter devices only).
817
818 rw,readwrite
819 Sequential mixed reads and writes.
820
821 randrw Random mixed reads and writes.
822
823 trimwrite
824 Sequential trim+write sequences. Blocks will be
825 trimmed first, then the same blocks will be writ‐
826 ten to.
827
828 Fio defaults to read if the option is not specified. For the
829 mixed I/O types, the default is to split them 50/50. For certain
830 types of I/O the result may still be skewed a bit, since the
831 speed may be different.
832
833 It is possible to specify the number of I/Os to do before get‐
834 ting a new offset by appending `:<nr>' to the end of the string
835 given. For a random read, it would look like `rw=randread:8' for
836 passing in an offset modifier with a value of 8. If the suffix
837 is used with a sequential I/O pattern, then the `<nr>' value
838 specified will be added to the generated offset for each I/O
839 turning sequential I/O into sequential I/O with holes. For in‐
840 stance, using `rw=write:4k' will skip 4k for every write. Also
841 see the rw_sequencer option.
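As a minimal sketch of the offset modifier, the job file below defines one random and one sequential job. The job names, the file path, and the sizes are illustrative assumptions and do not come from this manual.

```ini
# Hypothetical job file; /path/to/testfile is a placeholder.
[global]
filename=/path/to/testfile
size=256m
bs=4k

# Random reads that pick a new random offset only after every 8 I/Os.
[randread-burst]
rw=randread:8

# Sequential writes that skip 4k after each write, leaving holes.
# stonewall makes this job wait until the previous job has finished.
[write-holes]
stonewall
rw=write:4k
```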
842
843 rw_sequencer=str
844 If an offset modifier is given by appending a number to the
845 `rw=str' line, then this option controls how that number modi‐
846 fies the I/O offset being generated. Accepted values are:
847
848 sequential
849 Generate sequential offset.
850
851 identical
852 Generate the same offset.
853
854 sequential is only useful for random I/O, where fio would nor‐
855 mally generate a new random offset for every I/O. If you append
856 e.g. 8 to randread, you would get a new random offset for every
857 8 I/Os. The result would be a seek for only every 8 I/Os, in‐
858 stead of for every I/O. Use `rw=randread:8' to specify that. As
859 sequential I/O is already sequential, setting sequential for
860 that would not result in any differences. identical behaves in a
861 similar fashion, except it sends the same offset 8 times
862 before generating a new offset.
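The difference between the two sequencer modes can be sketched with a single job. The job name, file path, and sizes are placeholders chosen for illustration.

```ini
# Hypothetical job file; /path/to/testfile is a placeholder.
[same-offset-x8]
filename=/path/to/testfile
size=256m
bs=4k
# Offset modifier of 8 appended to the I/O pattern.
rw=randread:8
# identical: issue the same random offset 8 times, then pick a new one.
# Changing this to sequential would instead issue 8 consecutive
# offsets after each random seek.
rw_sequencer=identical
```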
863
864 unified_rw_reporting=bool
865 Fio normally reports statistics on a per data direction basis,
866 meaning that reads, writes, and trims are accounted and reported
867 separately. If this option is set, fio sums the results and
868 reports them as "mixed" instead.
869
870 randrepeat=bool
871 Seed the random number generator used for random I/O patterns in
872 a predictable way so the pattern is repeatable across runs. De‐
873 fault: true.
874
875 allrandrepeat=bool
876 Seed all random number generators in a predictable way so re‐
877 sults are repeatable across runs. Default: false.
878
879 randseed=int
880 Seed the random number generators based on this seed value, to
881 be able to control what sequence of output is being generated.
882 If not set, the random sequence depends on the randrepeat set‐
883 ting.
884
885 fallocate=str
886 Whether pre-allocation is performed when laying down files. Ac‐
887 cepted values are:
888
889 none Do not pre-allocate space.
890
891 native Use a platform's native pre-allocation call but
892 fall back to none behavior if it fails/is not im‐
893 plemented.
894
895 posix Pre-allocate via posix_fallocate(3).
896
897 keep Pre-allocate via fallocate(2) with FAL‐
898 LOC_FL_KEEP_SIZE set.
899
900 truncate
901 Extend file to final size using ftruncate(2)
902 instead of allocating.
903
904 0 Backward-compatible alias for none.
905
906 1 Backward-compatible alias for posix.
907
908 May not be available on all supported platforms. keep is only
909 available on Linux. If using ZFS on Solaris this cannot be set
910 to posix because ZFS doesn't support pre-allocation. Default:
911 native if any pre-allocation methods except truncate are avail‐
912 able, none if not.
913
914 Note that using truncate on Windows will interact surprisingly
915 with non-sequential write patterns. When writing to a file that
916 has been extended by setting the end-of-file information, Win‐
917 dows will backfill the unwritten portion of the file up to that
918 offset with zeroes before issuing the new write. This means that
919 a single small write to the end of an extended file will stall
920 until the entire file has been filled with zeroes.
921
922 fadvise_hint=str
923 Use posix_fadvise(2) or posix_madvise(2) to advise the kernel
924 what I/O patterns are likely to be issued. Accepted values are:
925
926 0 Backwards compatible hint for "no hint".
927
928 1 Backwards compatible hint for "advise with fio
929 workload type". This uses FADV_RANDOM for a random
930 workload, and FADV_SEQUENTIAL for a sequential
931 workload.
932
933 sequential
934 Advise using FADV_SEQUENTIAL.
935
936 random Advise using FADV_RANDOM.
937
938 write_hint=str
939 Use fcntl(2) to advise the kernel what life time to expect from
940 a write. Only supported on Linux, as of version 4.13. Accepted
941 values are:
942
943 none No particular life time associated with this file.
944
945 short Data written to this file has a short life time.
946
947 medium Data written to this file has a medium life time.
948
949 long Data written to this file has a long life time.
950
951 extreme
952 Data written to this file has a very long life
953 time.
954
955 The values are all relative to each other, and no absolute mean‐
956 ing should be associated with them.
957
958 offset=int[%|z]
959 Start I/O at the provided offset in the file, given as either a
960 fixed size in bytes or a percentage. If a percentage is given,
961 the generated offset will be aligned to the minimum blocksize or
962 to the value of offset_align if provided. Data before the given
963 offset will not be touched. This effectively caps the file size
964 at `real_size - offset'. Can be combined with size to constrain
965 the start and end range of the I/O workload. A percentage can
966 be specified by a number between 1 and 100 followed by '%', for
967 example, `offset=20%' to specify 20%.
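A possible way to combine offset with size is sketched below. It assumes the target file or device already exists, since percentages are taken relative to its full size. The job name and path are placeholders.

```ini
# Hypothetical job file; /path/to/testfile must already exist.
[middle-slice]
filename=/path/to/testfile
rw=randread
bs=4k
# Skip the first 20% of the file; data before this point is not touched.
offset=20%
# Limit the I/O region to 30% of the full file size, starting at offset.
size=30%
```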
968
969 offset_align=int
970 If set to non-zero value, the byte offset generated by a per‐
971 centage offset is aligned upwards to this value. Defaults to 0
972 meaning that a percentage offset is aligned to the minimum block
973 size.
974
975 offset_increment=int[%|z]
976 If this is provided, then the real offset becomes `offset + off‐
977 set_increment * thread_number', where the thread number is a
978 counter that starts at 0 and is incremented for each sub-job
979 (i.e. when numjobs option is specified). This option is useful
980 if there are several jobs which are intended to operate on a
981 file in parallel disjoint segments, with even spacing between
982 the starting points. Percentages can be used for this option.
983 If a percentage is given, the generated offset will be aligned
984 to the minimum blocksize or to the value of offset_align if pro‐
985 vided.
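The formula above can be illustrated with four sub-jobs writing disjoint segments of one file. The job name, path, and sizes are assumptions made for this sketch.

```ini
# Hypothetical job file; /path/to/testfile is a placeholder.
# numjobs=4 creates sub-jobs with thread numbers 0, 1, 2 and 3.
# Real offset = offset + offset_increment * thread_number, so the
# sub-jobs start at 0, 1g, 2g and 3g respectively.
[disjoint-writers]
filename=/path/to/testfile
rw=write
bs=1m
numjobs=4
offset=0
offset_increment=1g
# Each sub-job covers 1g, so the segments do not overlap.
size=1g
```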
986
987 number_ios=int
988 Fio will normally perform I/Os until it has exhausted the size
989 of the region set by size, or until it exhausts the allocated time
990 (or hits an error condition). With this setting, the range/size
991 can be set independently of the number of I/Os to perform. When
992 fio reaches this number, it will exit normally and report sta‐
993 tus. Note that this does not extend the amount of I/O that will
994 be done, it will only stop fio if this condition is met before
995 other end-of-job criteria.
996
997 fsync=int
998 If writing to a file, issue an fsync(2) (or its equivalent) of
999 the dirty data for every number of blocks given. For example, if
1000 you give 32 as a parameter, fio will sync the file after every
1001 32 writes issued. If fio is using non-buffered I/O, we may not
1002 sync the file. The exception is the sg I/O engine, which syn‐
1003 chronizes the disk cache anyway. Defaults to 0, which means fio
1004 does not periodically issue and wait for a sync to complete.
1005 Also see end_fsync and fsync_on_close.
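A short sketch of periodic syncing on a buffered write job follows. The job name, path, and sizes are placeholders.

```ini
# Hypothetical job file; /path/to/testfile is a placeholder.
[periodic-fsync]
filename=/path/to/testfile
size=128m
rw=write
bs=4k
# Buffered I/O, so there is dirty data for the periodic sync to flush.
direct=0
# Issue fsync(2) after every 32 writes.
fsync=32
# Additionally sync once when the write stage has completed.
end_fsync=1
```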
1006
1007 fdatasync=int
1008 Like fsync but uses fdatasync(2) to only sync data and not meta‐
1009 data blocks. In Windows, FreeBSD, DragonFlyBSD or OSX there is
1010 no fdatasync(2) so this falls back to using fsync(2). Defaults
1011 to 0, which means fio does not periodically issue and wait for a
1012 data-only sync to complete.
1013
1014 write_barrier=int
1015 Make every N-th write a barrier write.
1016
1017 sync_file_range=str:int
1018 Use sync_file_range(2) for every int number of write operations.
1019 Fio will track range of writes that have happened since the last
1020 sync_file_range(2) call. str can currently be one or more of:
1021
1022 wait_before
1023 SYNC_FILE_RANGE_WAIT_BEFORE
1024
1025 write SYNC_FILE_RANGE_WRITE
1026
1027 wait_after
1028 SYNC_FILE_RANGE_WAIT_AFTER
1029
1030 So if you do `sync_file_range=wait_before,write:8', fio would
1031 use `SYNC_FILE_RANGE_WAIT_BEFORE | SYNC_FILE_RANGE_WRITE' for
1032 every 8 writes. Also see the sync_file_range(2) man page. This
1033 option is Linux specific.
1034
1035 overwrite=bool
1036 If true, writes to a file will always overwrite existing data.
1037 If the file doesn't already exist, it will be created before the
1038 write phase begins. If the file exists and is large enough for
1039 the specified write phase, nothing will be done. Default: false.
1040
1041 end_fsync=bool
1042 If true, fsync(2) file contents when a write stage has com‐
1043 pleted. Default: false.
1044
1045 fsync_on_close=bool
1046 If true, fio will fsync(2) a dirty file on close. This differs
1047 from end_fsync in that it will happen on every file close, not
1048 just at the end of the job. Default: false.
1049
1050 rwmixread=int
1051 Percentage of a mixed workload that should be reads. Default:
1052 50.
1053
1054 rwmixwrite=int
1055 Percentage of a mixed workload that should be writes. If both
1056 rwmixread and rwmixwrite are given and the values do not add up
1057 to 100%, the latter of the two will be used to override the
1058 first. This may interfere with a given rate setting, if fio is
1059 asked to limit reads or writes to a certain rate. If that is the
1060 case, then the distribution may be skewed. Default: 50.
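For illustration, a mixed random workload with a 70/30 read/write split could be written as below. The job name, path, and sizes are placeholders.

```ini
# Hypothetical job file; /path/to/testfile is a placeholder.
[mixed-70-30]
filename=/path/to/testfile
size=256m
bs=4k
rw=randrw
# 70% of the I/Os are reads; the remaining 30% are writes.
rwmixread=70
```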
1061
1062 random_distribution=str:float[:float][,str:float][,str:float]
1063 By default, fio will use a completely uniform random distribu‐
1064 tion when asked to perform random I/O. Sometimes it is useful to
1065 skew the distribution in specific ways, ensuring that some parts
1066 of the data are hotter than others. Fio includes the following
1067 distribution models:
1068
1069 random Uniform random distribution
1070
1071 zipf Zipf distribution
1072
1073 pareto Pareto distribution
1074
1075 normal Normal (Gaussian) distribution
1076
1077 zoned Zoned random distribution
1078 zoned_abs Zoned absolute random distribution
1079
1080 When using a zipf or pareto distribution, an input value is also
1081 needed to define the access pattern. For zipf, this is the `Zipf
1082 theta'. For pareto, it's the `Pareto power'. Fio includes a
1083 test program, fio-genzipf, that can be used to visualize what the
1084 given input values will yield in terms of hit rates. If you
1085 wanted to use zipf with a `theta' of 1.2, you would use `ran‐
1086 dom_distribution=zipf:1.2' as the option. If a non-uniform model
1087 is used, fio will disable use of the random map. For the normal
1088 distribution, a normal (Gaussian) deviation is supplied as a
1089 value between 0 and 100.
1090
1091 The second, optional float is allowed for the pareto, zipf and
1092 normal distributions. It moves the center of the distribution
1093 away from its default position, giving more control over the most
1094 probable outcome. The value is in the range [0-1], which maps
1095 linearly to the range of possible random values. Defaults are:
1096 random for pareto and zipf, and 0.5 for normal. If you wanted to
1097 use zipf with a `theta' of 1.2 centered on 1/4 of the allowed value
1098 range, you would use `random_distribution=zipf:1.2:0.25'.
1099
1100 For a zoned distribution, fio supports specifying what percentage
1101 of I/O accesses should fall within which range of the file or
1102 device. For example, given these criteria:
1103
1104 60% of accesses should be to the first 10%
1105 30% of accesses should be to the next 20%
1106 8% of accesses should be to the next 30%
1107 2% of accesses should be to the next 40%
1108
1109 we can define that through zoning of the random accesses. For
1110 the above example, the user would do:
1111
1112 random_distribution=zoned:60/10:30/20:8/30:2/40
1113
1114 A zoned_abs distribution works exactly like zoned, except
1115 that it takes absolute sizes. For example, let's say you wanted
1116 to define access according to the following criteria:
1117
1118 60% of accesses should be to the first 20G
1119 30% of accesses should be to the next 100G
1120 10% of accesses should be to the next 500G
1121
1122 we can define an absolute zoning distribution with:
1123
1124 random_distribution=zoned_abs:60/20G:30/100G:10/500G
1125
1126 For both zoned and zoned_abs, fio supports defining up to 256
1127 separate zones.
1128
1129 This works similarly to how bssplit sets ranges and percentages
1130 of block sizes. As with bssplit, it's possible to specify
1131 separate zones for reads, writes, and trims. If just one set is
1132 given, it'll apply to all of them.
1133
1134 percentage_random=int[,int][,int]
1135 For a random workload, set how big a percentage should be ran‐
1136 dom. This defaults to 100%, in which case the workload is fully
1137 random. It can be set from anywhere from 0 to 100. Setting it to
1138 0 would make the workload fully sequential. Any setting in be‐
1139 tween will result in a random mix of sequential and random I/O,
1140 at the given percentages. Comma-separated values may be speci‐
1141 fied for reads, writes, and trims as described in blocksize.
1142
1143 norandommap
1144 Normally fio will cover every block of the file when doing ran‐
1145 dom I/O. If this option is given, fio will just get a new random
1146 offset without looking at past I/O history. This means that some
1147 blocks may not be read or written, and that some blocks may be
1148 read/written more than once. If this option is used with verify
1149 and multiple blocksizes (via bsrange), only intact blocks are
1150 verified, i.e., partially-overwritten blocks are ignored. With
1151 an async I/O engine and an I/O depth > 1, it is possible for the
1152 same block to be overwritten, which can cause verification er‐
1153 rors. Either do not use norandommap in this case, or also use
1154 the lfsr random generator.
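The last recommendation might look like the following sketch for a queued verify workload. The verify option is described elsewhere in this manual. The job name, path, sizes, and the choice of crc32c are assumptions made for illustration.

```ini
# Hypothetical job file; /path/to/testfile is a placeholder.
[verify-no-map]
filename=/path/to/testfile
size=256m
bs=4k
rw=randwrite
ioengine=libaio
direct=1
# Async engine with I/O depth > 1: overlapping writes are possible
# when no random map tracks which blocks were already written.
iodepth=16
norandommap
# lfsr never generates the same offset twice, which avoids overwriting
# a block that is still waiting to be verified.
random_generator=lfsr
verify=crc32c
```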
1155
1156 softrandommap=bool
1157 See norandommap. If fio runs with the random block map enabled
1158 and it fails to allocate the map, if this option is set it will
1159 continue without a random block map. As coverage will not be as
1160 complete as with random maps, this option is disabled by de‐
1161 fault.
1162
1163 random_generator=str
1164 Fio supports the following engines for generating I/O offsets
1165 for random I/O:
1166
1167 tausworthe
1168 Strong 2^88 cycle random number generator.
1169
1170 lfsr Linear feedback shift register generator.
1171
1172 tausworthe64
1173 Strong 64-bit 2^258 cycle random number generator.
1174
1175 tausworthe is a strong random number generator, but it requires
1176 tracking on the side if we want to ensure that blocks are only
1177 read or written once. lfsr guarantees that we never generate the
1178 same offset twice, and it's also less computationally expensive.
1179 It's not a true random generator, however, though for I/O pur‐
1180 poses it's typically good enough. lfsr only works with single
1181 block sizes, not with workloads that use multiple block sizes.
1182 If used with such a workload, fio may read or write some blocks
1183 multiple times. The default value is tausworthe, unless the re‐
1184 quired space exceeds 2^32 blocks. If it does, then tausworthe64
1185 is selected automatically.
1186
1187 Block size
1188 blocksize=int[,int][,int], bs=int[,int][,int]
1189 The block size in bytes used for I/O units. Default: 4096. A
1190 single value applies to reads, writes, and trims. Comma-sepa‐
1191 rated values may be specified for reads, writes, and trims. A
1192 value not terminated in a comma applies to subsequent types. Ex‐
1193 amples:
1194
1195 bs=256k means 256k for reads, writes and trims.
1196 bs=8k,32k means 8k for reads, 32k for writes and
1197 trims.
1198 bs=8k,32k, means 8k for reads, 32k for writes, and
1199 default for trims.
1200 bs=,8k means default for reads, 8k for writes and
1201 trims.
1202 bs=,8k, means default for reads, 8k for writes,
1203 and default for trims.
1204
1205 blocksize_range=irange[,irange][,irange],
1206 bsrange=irange[,irange][,irange]
1207 A range of block sizes in bytes for I/O units. The issued I/O
1208 unit will always be a multiple of the minimum size, unless
1209 blocksize_unaligned is set. Comma-separated ranges may be spec‐
1210 ified for reads, writes, and trims as described in blocksize.
1211 Example:
1212
1213 bsrange=1k-4k,2k-8k
1214
1215 bssplit=str[,str][,str]
1216 Sometimes you want even finer grained control of the block sizes
1217 issued, not just an even split between them. This option allows
1218 you to weight various block sizes, so that you are able to de‐
1219 fine a specific amount of block sizes issued. The format for
1220 this option is:
1221
1222 bssplit=blocksize/percentage:blocksize/percentage
1223
1224 for as many block sizes as needed. So if you want to define a
1225 workload that has 50% 64k blocks, 10% 4k blocks, and 40% 32k
1226 blocks, you would write:
1227
1228 bssplit=4k/10:64k/50:32k/40
1229
1230 Ordering does not matter. If the percentage is left blank, fio
1231 will fill in the remaining values evenly. So a bssplit option
1232 like this one:
1233
1234 bssplit=4k/50:1k/:32k/
1235
1236 would have 50% 4k I/Os, and 25% each of 1k and 32k I/Os. The
1237 percentages always add up to 100; if bssplit is given a range that
1238 adds up to more, it will error out.
1239
1240 Comma-separated values may be specified for reads, writes, and
1241 trims as described in blocksize.
1242
1243 If you want a workload that has 50% 2k reads and 50% 4k reads,
1244 while having 90% 4k writes and 10% 8k writes, you would specify:
1245
1246 bssplit=2k/50:4k/50,4k/90:8k/10
1247
1248 Fio supports defining up to 64 different weights for each data
1249 direction.
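Placed in a complete job, the per-direction split shown above could look like this sketch. The job name, path, and size are placeholders.

```ini
# Hypothetical job file; /path/to/testfile is a placeholder.
[split-sizes]
filename=/path/to/testfile
size=256m
rw=randrw
# Before the comma: reads are 50% 2k and 50% 4k.
# After the comma: writes are 90% 4k and 10% 8k.
bssplit=2k/50:4k/50,4k/90:8k/10
```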
1250
1251 blocksize_unaligned, bs_unaligned
1252 If set, fio will issue I/O units with any size within block‐
1253 size_range, not just multiples of the minimum size. This typi‐
1254 cally won't work with direct I/O, as that normally requires sec‐
1255 tor alignment.
1256
1257 bs_is_seq_rand=bool
1258 If this option is set, fio will use the normal read,write block‐
1259 size settings as sequential,random blocksize settings instead.
1260 Any random read or write will use the WRITE blocksize settings,
1261 and any sequential read or write will use the READ blocksize
1262 settings.
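As a sketch of this remapping, the job below mixes sequential and random I/O and gives each kind its own block size. The job name, path, sizes, and the 50% ratio are illustrative assumptions.

```ini
# Hypothetical job file; /path/to/testfile is a placeholder.
[seq-vs-rand-sizes]
filename=/path/to/testfile
size=256m
rw=randrw
# Half of the I/Os are random, the other half sequential.
percentage_random=50
# With bs_is_seq_rand set, the first (READ) value is used for
# sequential I/O and the second (WRITE) value for random I/O.
bs=64k,4k
bs_is_seq_rand=1
```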
1263
1264 blockalign=int[,int][,int], ba=int[,int][,int]
1265 Boundary to which fio will align random I/O units. Default:
1266 blocksize. Minimum alignment is typically 512b for using direct
1267 I/O, though it usually depends on the hardware block size. This
1268 option is mutually exclusive with using a random map for files,
1269 so it will turn off that option. Comma-separated values may be
1270 specified for reads, writes, and trims as described in block‐
1271 size.
1272
1273 Buffers and memory
1274 zero_buffers
1275 Initialize buffers with all zeros. Default: fill buffers with
1276 random data.
1277
1278 refill_buffers
1279 If this option is given, fio will refill the I/O buffers on ev‐
1280 ery submit. The default is to only fill it at init time and re‐
1281 use that data. Only makes sense if zero_buffers isn't specified,
1282 naturally. If data verification is enabled, refill_buffers is
1283 also automatically enabled.
1284
1285 scramble_buffers=bool
1286 If refill_buffers is too costly and the target is using data
1287 deduplication, then setting this option will slightly modify the
1288 I/O buffer contents to defeat normal de-dupe attempts. This is
1289 not enough to defeat more clever block compression attempts, but
1290 it will stop naive dedupe of blocks. Default: true.
1291
1292 buffer_compress_percentage=int
1293 If this is set, then fio will attempt to provide I/O buffer con‐
1294 tent (on WRITEs) that compresses to the specified level. Fio
1295 does this by providing a mix of random data followed by fixed
1296 pattern data. The fixed pattern is either zeros, or the pattern
1297 specified by buffer_pattern. If the buffer_pattern option is
1298 used, it might skew the compression ratio slightly. Setting buf‐
1299 fer_compress_percentage to a value other than 100 will also en‐
1300 able refill_buffers in order to reduce the likelihood that adja‐
1301 cent blocks are so similar that they over compress when seen to‐
1302 gether. See buffer_compress_chunk for how to set a finer or
1303 coarser granularity of the random/fixed data regions. Defaults
1304 to unset i.e., buffer data will not adhere to any compression
1305 level.
1306
1307 buffer_compress_chunk=int
1308 This setting allows fio to manage how big the random/fixed data
1309 region is when using buffer_compress_percentage. When buf‐
1310 fer_compress_chunk is set to some non-zero value smaller than
1311 the block size, fio can repeat the random/fixed region throughout
1312 the I/O buffer at the specified interval (which is particularly
1313 useful when bigger block sizes are used for a job). When set to
1314 0, fio will use a chunk size that matches the block size result‐
1315 ing in a single random/fixed region within the I/O buffer. De‐
1316 faults to 512. When the unit is omitted, the value is inter‐
1317 preted in bytes.
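The two compression options might be combined as in the sketch below. The job name, path, sizes, and the 50% target are assumptions. The compression ratio the target actually sees depends on its compressor.

```ini
# Hypothetical job file; /path/to/testfile is a placeholder.
[compressible-writes]
filename=/path/to/testfile
size=256m
rw=write
bs=64k
# Ask fio for write buffers that compress to the 50% level.
buffer_compress_percentage=50
# Repeat the random/fixed pattern every 4k within each 64k buffer
# instead of using the default chunk size.
buffer_compress_chunk=4k
```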
1318
1319 buffer_pattern=str
1320 If set, fio will fill the I/O buffers with this pattern or with
1321 the contents of a file. If not set, the contents of I/O buffers
1322 are defined by the other options related to buffer contents. The
1323 setting can be any pattern of bytes, and can be prefixed with 0x
1324 for hex values. It may also be a string, where the string must
1325 then be wrapped with "". Or it may also be a filename, where the
1326 filename must be wrapped with '' in which case the file is
1327 opened and read. Note that not all the file contents will be
1328 read if that would cause the buffers to overflow. So, for exam‐
1329 ple:
1330
1331 buffer_pattern='filename'
1332 or:
1333 buffer_pattern="abcd"
1334 or:
1335 buffer_pattern=-12
1336 or:
1337 buffer_pattern=0xdeadface
1338
1339 Also you can combine everything together in any order:
1340
1341 buffer_pattern=0xdeadface"abcd"-12'filename'
1342
1343 dedupe_percentage=int
1344 If set, fio will generate this percentage of identical buffers
1345 when writing. These buffers will be naturally dedupable. The
1346 contents of the buffers depend on what other buffer compression
1347 settings have been set. It's possible to have the individual
1348 buffers either fully compressible, or not at all -- this option
1349 only controls the distribution of unique buffers. Setting this
1350 option will also enable refill_buffers to prevent every buffer
1351 being identical.
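A minimal sketch of a write job for a deduplicating target follows. The job name, path, sizes, and the 30% figure are illustrative assumptions.

```ini
# Hypothetical job file; /path/to/testfile is a placeholder.
[dedupe-writes]
filename=/path/to/testfile
size=256m
rw=write
bs=4k
# 30% of the written buffers are identical and therefore dedupable.
# This also enables refill_buffers, as described above.
dedupe_percentage=30
```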
1352
1353 invalidate=bool
1354 Invalidate the buffer/page cache parts of the files to be used
1355 prior to starting I/O if the platform and file type support it.
1356 Defaults to true. This will be ignored if pre_read is also
1357 specified for the same job.
1358
1359 sync=str
1360 Whether, and what type, of synchronous I/O to use for writes.
1361 The allowed values are:
1362
1363 none Do not use synchronous IO, the default.
1364
1365 0 Same as none.
1366
1367 sync Use synchronous file IO. For the majority of I/O
1368 engines, this means using O_SYNC.
1369
1370 1 Same as sync.
1371
1372 dsync Use synchronous data IO. For the majority of I/O
1373 engines, this means using O_DSYNC.
1374
1375 iomem=str, mem=str
1376 Fio can use various types of memory as the I/O unit buffer. The
1377 allowed values are:
1378
1379 malloc Use memory from malloc(3) as the buffers. Default
1380 memory type.
1381
1382 shm Use shared memory as the buffers. Allocated
1383 through shmget(2).
1384
1385 shmhuge
1386 Same as shm, but use huge pages as backing.
1387
1388 mmap Use mmap(2) to allocate buffers. May either be
1389 anonymous memory, or can be file backed if a file‐
1390 name is given after the option. The format is
1391 `mem=mmap:/path/to/file'.
1392
1393 mmaphuge
1394 Use a memory mapped huge file as the buffer
1395 backing. Append the filename after mmaphuge, e.g.
1396 `mem=mmaphuge:/hugetlbfs/file'.
1397
1398 mmapshared
1399 Same as mmap, but use a MAP_SHARED mapping.
1400
1401 cudamalloc
1402 Use GPU memory as the buffers for GPUDirect RDMA
1403 benchmark. The ioengine must be rdma.
1404
1405 The area allocated is a function of the maximum allowed bs size
1406 for the job, multiplied by the I/O depth given. Note that for
1407 shmhuge and mmaphuge to work, the system must have free huge
1408 pages allocated. This can normally be checked and set by read‐
1409 ing/writing `/proc/sys/vm/nr_hugepages' on a Linux system. Fio
1410 assumes a huge page is 4MiB in size. So to calculate the number
1411 of huge pages you need for a given job file, add up the I/O
1412 depth of all jobs (normally one unless iodepth is used) and mul‐
1413 tiply by the maximum bs set. Then divide that number by the huge
1414 page size. You can see the size of the huge pages in `/proc/mem‐
1415 info'. If no huge pages are allocated by having a non-zero num‐
1416 ber in `nr_hugepages', using mmaphuge or shmhuge will fail. Also
1417 see hugepage-size.
1418
1419 mmaphuge also needs to have hugetlbfs mounted and the file loca‐
1420 tion should point there. So if it's mounted in `/huge', you
1421 would use `mem=mmaphuge:/huge/somefile'.
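The huge page calculation described above can be worked through for a single job. The job name, path, and sizes are placeholders, and the arithmetic uses the 4MiB huge page size that this manual states as fio's assumption. Check `/proc/meminfo' for the size on the actual system.

```ini
# Hypothetical job file; /path/to/testfile is a placeholder.
# Buffer area = iodepth x maximum bs = 16 x 1MiB = 16MiB.
# Huge pages needed = 16MiB / 4MiB per page = 4, so nr_hugepages
# must be at least 4 before this job is started.
[hugepage-buffers]
filename=/path/to/testfile
size=1g
rw=read
bs=1m
ioengine=libaio
direct=1
iodepth=16
mem=shmhuge
```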
1422
1423 iomem_align=int, mem_align=int
1424 This indicates the memory alignment of the I/O memory buffers.
1425 Note that the given alignment is applied to the first I/O unit
1426 buffer; if using iodepth, the alignment of the following buffers
1427 is given by the bs used. In other words, if using a bs that is
1428 a multiple of the page size in the system, all buffers will be
1429 aligned to this value. If using a bs that is not page aligned,
1430 the alignment of subsequent I/O memory buffers is the sum of the
1431 iomem_align and bs used.
1432
1433 hugepage-size=int
1434 Defines the size of a huge page. Must at least be equal to the
1435 system setting, see `/proc/meminfo'. Defaults to 4MiB. Should
1436 probably always be a multiple of megabytes, so using
1437 `hugepage-size=Xm' is the preferred way to set this to avoid
1438 setting a non-pow-2 bad value.
1439
1440 lockmem=int
1441 Pin the specified amount of memory with mlock(2). Can be used to
1442 simulate a smaller amount of memory. The amount specified is per
1443 worker.
1444
1445 I/O size
1446 size=int[%|z]
1447 The total size of file I/O for each thread of this job. Fio will
1448 run until this many bytes have been transferred, unless runtime
1449 is limited by other options (such as runtime, for instance, or
1450 increased/decreased by io_size). Fio will divide this size be‐
1451 tween the available files determined by options such as nrfiles,
1452 filename, unless filesize is specified by the job. If the result
1453 of division happens to be 0, the size is set to the physical
1454 size of the given files or devices if they exist. If this op‐
1455 tion is not specified, fio will use the full size of the given
1456 files or devices. If the files do not exist, size must be given.
1457 It is also possible to give size as a percentage between 1 and
1458 100. If `size=20%' is given, fio will use 20% of the full size
1459 of the given files or devices. Can be combined with offset to
1460 constrain the start and end range that I/O will be done within.
1461
1462 io_size=int[%|z], io_limit=int[%|z]
1463 Normally fio operates within the region set by size, which means
1464 that the size option sets both the region and size of I/O to be
1465 performed. Sometimes that is not what you want. With this op‐
1466 tion, it is possible to define just the amount of I/O that fio
1467 should do. For instance, if size is set to 20GiB and io_size is
1468 set to 5GiB, fio will perform I/O within the first 20GiB but
1469 exit when 5GiB have been done. The opposite is also possible --
1470 if size is set to 20GiB, and io_size is set to 40GiB, then fio
1471 will do 40GiB of I/O within the 0..20GiB region. Value can be
1472 set as percentage: io_size=N%. In this case io_size multiplies
1473 size= value.
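The split between region and amount of I/O can be sketched with a scaled-down version of the example above. The job name, path, and sizes are assumptions.

```ini
# Hypothetical job file; /path/to/testfile is a placeholder.
[region-vs-amount]
filename=/path/to/testfile
rw=randread
bs=4k
# I/O may land anywhere within the first 1g of the file.
size=1g
# Stop after 256m of I/O has been done within that region.
io_size=256m
```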
1474
1475 filesize=irange(int)
1476 Individual file sizes. May be a range, in which case fio will
1477 select sizes for files at random within the given range and lim‐
1478 ited to size in total (if that is given). If not given, each
1479 created file is the same size. This option overrides size in
1480 terms of file size, which means this value is used as a fixed
1481 size or possible range of each file.
1482
1483 file_append=bool
1484 Perform I/O after the end of the file. Normally fio will operate
1485 within the size of a file. If this option is set, then fio will
1486 append to the file instead. This has identical behavior to set‐
1487 ting offset to the size of a file. This option is ignored on
1488 non-regular files.
1489
1490 fill_device=bool, fill_fs=bool
1491 Sets size to something really large and waits for ENOSPC (no
1492 space left on device) as the terminating condition. Only makes
1493 sense with sequential write. For a read workload, the mount
1494 point will be filled first then I/O started on the result. This
1495 option doesn't make sense if operating on a raw device node,
1496 since the size of that is already known by the file system. Ad‐
1497 ditionally, writing beyond end-of-device will not return ENOSPC
1498 there.
1499
1500 I/O engine
1501 ioengine=str
1502 Defines how the job issues I/O to the file. The following types
1503 are defined:
1504
1505 sync Basic read(2) or write(2) I/O. lseek(2) is used to
1506 position the I/O location. See fsync and fdata‐
1507 sync for syncing write I/Os.
1508
1509 psync Basic pread(2) or pwrite(2) I/O. Default on all
1510 supported operating systems except for Windows.
1511
1512 vsync Basic readv(2) or writev(2) I/O. Will emulate
1513 queuing by coalescing adjacent I/Os into a single
1514 submission.
1515
1516 pvsync Basic preadv(2) or pwritev(2) I/O.
1517
1518 pvsync2
1519 Basic preadv2(2) or pwritev2(2) I/O.
1520
1521 libaio Linux native asynchronous I/O. Note that Linux may
1522 only support queued behavior with non-buffered I/O
1523 (set `direct=1' or `buffered=0'). This engine de‐
1524 fines engine specific options.
1525
1526 posixaio
1527 POSIX asynchronous I/O using aio_read(3) and
1528 aio_write(3).
1529
1530 solarisaio
1531 Solaris native asynchronous I/O.
1532
1533 windowsaio
1534 Windows native asynchronous I/O. Default on Win‐
1535 dows.
1536
1537 mmap File is memory mapped with mmap(2) and data copied
1538 to/from using memcpy(3).
1539
1540 splice splice(2) is used to transfer the data and vm‐
1541 splice(2) to transfer data from user space to the
1542 kernel.
1543
1544 sg SCSI generic sg v3 I/O. May either be synchronous
1545 using the SG_IO ioctl, or if the target is an sg
1546 character device we use read(2) and write(2) for
1547 asynchronous I/O. Requires filename option to
1548 specify either block or character devices. This
1549 engine supports trim operations. The sg engine in‐
1550 cludes engine specific options.
1551
1552 libzbc Synchronous I/O engine for SMR hard-disks using
1553 the libzbc library. The target can be either an sg
1554 character device or a block device file. This en‐
1555 gine supports the zonemode=zbd zone operations.
1556
1557 null Doesn't transfer any data, just pretends to. This
1558 is mainly used to exercise fio itself and for de‐
1559 bugging/testing purposes.
1560
1561 net Transfer over the network to given `host:port'.
1562 Depending on the protocol used, the hostname,
1563 port, listen and filename options are used to
1564 specify what sort of connection to make, while the
1565 protocol option determines which protocol will be
1566 used. This engine defines engine specific options.
1567
1568 netsplice
1569 Like net, but uses splice(2) and vmsplice(2) to
1570 map data and send/receive. This engine defines
1571 engine specific options.
1572
1573 cpuio Doesn't transfer any data, but burns CPU cycles
1574 according to the cpuload, cpuchunks and cpumode
1575 options. A job never finishes unless there is at
1576 least one non-cpuio job.
1577
1578 cpuload=85 will cause that job to do nothing but
1579 burn 85% of the CPU. In case of SMP machines, use
1580 numjobs=<nr_of_cpu> to get desired CPU usage, as
1581 the cpuload only loads a single CPU at the desired
1582 rate.
1583
1584 cpumode=qsort replaces the default noop instruc‐
1585 tion loop with a qsort algorithm to consume more
1586 energy.
1587
1588 rdma The RDMA I/O engine supports both RDMA memory se‐
1589 mantics (RDMA_WRITE/RDMA_READ) and channel seman‐
1590 tics (Send/Recv) for the InfiniBand, RoCE and
1591 iWARP protocols. This engine defines engine spe‐
1592 cific options.
1593 falloc I/O engine that does regular fallocate to simulate
1594 data transfer as fio ioengine.
1595 DDIR_READ does fallocate(,mode = FAL‐
1596 LOC_FL_KEEP_SIZE,).
1597 DDIR_WRITE does fallocate(,mode = 0).
1598 DDIR_TRIM does fallocate(,mode = FAL‐
1599 LOC_FL_KEEP_SIZE|FALLOC_FL_PUNCH_HOLE).
1600
1601 ftruncate
1602 I/O engine that sends ftruncate(2) operations in
1603 response to write (DDIR_WRITE) events. Each ftrun‐
1604 cate issued sets the file's size to the current
1605 block offset. blocksize is ignored.
1606
1607 e4defrag
1608 I/O engine that does regular EXT4_IOC_MOVE_EXT
1609 ioctls to simulate defragment activity in request
1610 to DDIR_WRITE event.
1611
1612 rados I/O engine supporting direct access to Ceph Reli‐
1613 able Autonomic Distributed Object Store (RADOS)
1614 via librados. This ioengine defines engine spe‐
1615 cific options.
1616
1617 rbd I/O engine supporting direct access to Ceph Rados
1618 Block Devices (RBD) via librbd without the need to
1619 use the kernel rbd driver. This ioengine defines
1620 engine specific options.
1621
1622 http I/O engine supporting GET/PUT requests over
1623 HTTP(S) with libcurl to a WebDAV or S3 endpoint.
1624 This ioengine defines engine specific options.
1625
1626 This engine only supports direct IO of iodepth=1;
1627 you need to scale this via numjobs. blocksize de‐
1628 fines the size of the objects to be created.
1629
1630 TRIM is translated to object deletion.
1631
1632 gfapi Uses the GlusterFS libgfapi sync interface for direct
1633 access to GlusterFS volumes without having to go
1634 through FUSE. This ioengine defines engine spe‐
1635 cific options.
1636
1637 gfapi_async
1638 Uses the GlusterFS libgfapi async interface for direct
1639 access to GlusterFS volumes without having to go
1640 through FUSE. This ioengine defines engine spe‐
1641 cific options.
1642
1643 libhdfs
1644 Read and write through Hadoop (HDFS). The filename
1645 option is used to specify host,port of the hdfs
1646 name-node to connect. This engine interprets off‐
1647 sets a little differently. In HDFS, files once
1648 created cannot be modified so random writes are
1649 not possible. To imitate this the libhdfs engine
1650 expects a bunch of small files to be created over
1651 HDFS and will randomly pick a file from them based
1652 on the offset generated by fio backend (see the
1653 example job file to create such files, use
1654 `rw=write' option). Please note, it may be neces‐
1655 sary to set environment variables to work with
1656 HDFS/libhdfs properly. Each job uses its own con‐
1657 nection to HDFS.
1658
1659 mtd Read, write and erase an MTD character device
1660 (e.g., `/dev/mtd0'). Discards are treated as
1661 erases. Depending on the underlying device type,
1662 the I/O may have to go in a certain pattern, e.g.,
1663 on NAND, writing sequentially to erase blocks and
1664 discarding before overwriting. The trimwrite mode
1665 works well for this constraint.
1666
1667 pmemblk
1668 Read and write using filesystem DAX to a file on a
1669 filesystem mounted with DAX on a persistent memory
1670 device through the PMDK libpmemblk library.
1671
1672 dev-dax
1673 Read and write using device DAX to a persistent
1674 memory device (e.g., /dev/dax0.0) through the PMDK
1675 libpmem library.
1676
1677 external
1678 Prefix to specify loading an external I/O engine
1679 object file. Append the engine filename, e.g. `io‐
1680 engine=external:/tmp/foo.o' to load ioengine
1681 `foo.o' in `/tmp'. The path can be either absolute
1682 or relative. See `engines/skeleton_external.c' in
1683 the fio source for details of writing an external
1684 I/O engine.
1685
1686 filecreate
1687 Simply create the files and do no I/O to them.
1688 You still need to set filesize so that all the ac‐
1689 counting still occurs, but no actual I/O will be
1690 done other than creating the file.
1691
1692 filestat
1693 Simply do stat() and do no I/O to the file. You
1694 need to set 'filesize' and 'nrfiles', so that
1695 files will be created. This engine is to measure
1696 file lookup and meta data access.
1697
1698 libpmem
1699 Read and write using mmap I/O to a file on a
1700 filesystem mounted with DAX on a persistent memory
1701 device through the PMDK libpmem library.
1702
1703 ime_psync
1704 Synchronous read and write using DDN's Infinite
1705 Memory Engine (IME). This engine is very basic and
1706 issues calls to IME whenever an IO is queued.
1707
1708 ime_psyncv
1709 Synchronous read and write using DDN's Infinite
1710 Memory Engine (IME). This engine uses iovecs and
1711 will try to stack as many IOs as possible (if the
1712 IOs are "contiguous" and the IO depth is not ex‐
1713 ceeded) before issuing a call to IME.
1714
1715 ime_aio
1716 Asynchronous read and write using DDN's Infinite
1717 Memory Engine (IME). This engine will try to stack
1718 as many IOs as possible by creating requests for
1719 IME. FIO will then decide when to commit these
1720 requests.
1721
1722 libiscsi
1723 Read and write an iSCSI LUN with libiscsi.
1724
1725 nbd Synchronous read and write to a Network Block Device
1726 (NBD).
1727
1728 libcufile
1729 I/O engine supporting libcufile synchronous access
1730 to nvidia-fs and a GPUDirect Storage-supported
1731 filesystem. This engine performs I/O without
1732 transferring buffers between user-space and the
1733 kernel, unless verify is set or cuda_io is posix.
1734 iomem must not be cudamalloc. This ioengine de‐
1735 fines engine specific options.
1736
1737 I/O engine specific parameters
1738 In addition, there are some parameters which are only valid when a spe‐
1739 cific ioengine is in use. These are used identically to normal parame‐
1740 ters, with the caveat that when used on the command line, they must
1741 come after the ioengine that defines them is selected.
1742
1743 (io_uring,libaio)cmdprio_percentage=int
1744 Set the percentage of I/O that will be issued with higher prior‐
1745 ity by setting the priority bit. Non-read I/O is likely unaf‐
1746 fected by ``cmdprio_percentage``. This option cannot be used
1747 with the `prio` or `prioclass` options. For this option to set
1748 the priority bit properly, NCQ priority must be supported and
1749 enabled and `direct=1' option must be used. fio must also be run
1750 as the root user.
1751
1752 (io_uring)fixedbufs
1753 If fio is asked to do direct IO, then Linux will map pages for
1754 each IO call, and release them when IO is done. If this option
1755 is set, the pages are pre-mapped before IO is started. This
1756 eliminates the need to map and release for each IO. This is
1757 more efficient, and reduces the IO latency as well.
1758
1759 (io_uring)hipri
1760 If this option is set, fio will attempt to use polled IO comple‐
1761 tions. Normal IO completions generate interrupts to signal the
1762 completion of IO; polled completions do not. Hence they re‐
1763 quire active reaping by the application. The benefits are more
1764 efficient IO for high IOPS scenarios, and lower latencies for
1765 low queue depth IO.
1766
1767 (io_uring)registerfiles
1768 With this option, fio registers the set of files being used with
1769 the kernel. This avoids the overhead of managing file counts in
1770 the kernel, making the submission and completion part more
1771 lightweight. Required for the below sqthread_poll option.
1772
1773 (io_uring)sqthread_poll
1774 Normally fio will submit IO by issuing a system call to notify
1775 the kernel of available items in the SQ ring. If this option is
1776 set, the act of submitting IO will be done by a polling thread
1777 in the kernel. This frees up cycles for fio, at the cost of us‐
1778 ing more CPU in the system.
1779
1780 (io_uring)sqthread_poll_cpu
1781 When `sqthread_poll` is set, this option provides a way to de‐
1782 fine which CPU should be used for the polling thread.
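
As a minimal sketch, the io_uring options above might be combined in a job file like this. The job name, file path, size, queue depth and CPU number are illustrative assumptions, not values taken from this manual.

```ini
; Hypothetical job: random reads through io_uring with kernel-side
; submission polling.
[uring-randread]
ioengine=io_uring
; Engine specific options are given after the ioengine is selected.
fixedbufs=1
registerfiles=1
sqthread_poll=1
sqthread_poll_cpu=2
direct=1
rw=randread
bs=4k
iodepth=32
; Assumed test file; adjust the path and size for your system.
filename=/mnt/test/fio-uring.dat
size=1g
```

registerfiles is set here because the text above states it is required for sqthread_poll.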
1783
1784 (libaio)userspace_reap
1785 Normally, with the libaio engine in use, fio will use the
1786 io_getevents(3) system call to reap newly returned events. With
1787 this flag turned on, the AIO ring will be read directly from
1788 user-space to reap events. The reaping mode is only enabled when
1789 polling for a minimum of 0 events (e.g. when `iodepth_batch_com‐
1790 plete=0').
1791
1792 (pvsync2)hipri
1793 Set RWF_HIPRI on I/O, indicating to the kernel that it's of
1794 higher priority than normal.
1795
1796 (pvsync2)hipri_percentage
1797 When hipri is set this determines the probability of a pvsync2
1798 I/O being high priority. The default is 100%.
1799
1800 (pvsync2,libaio,io_uring)nowait
1801 By default if a request cannot be executed immediately (e.g. re‐
1802 source starvation, waiting on locks) it is queued and the initi‐
1803 ating process will be blocked until the required resource be‐
1804 comes free. This option sets the RWF_NOWAIT flag (supported
1805 from the 4.14 Linux kernel) and the call will return instantly
1806 with EAGAIN or a partial result rather than waiting.
1807
1808 It is useful to also use ignore_error=EAGAIN when using this op‐
1809 tion. Note: glibc 2.27 and 2.28 have a bug in the syscall wrappers
1810 preadv2 and pwritev2: they return EOPNOTSUPP instead of EAGAIN.
1811
1812 For cached I/O, using this option usually means a request oper‐
1813 ates only with cached data. Currently the RWF_NOWAIT flag is
1814 not supported for cached writes. For direct I/O, requests will
1815 only succeed if cache invalidation isn't required, file blocks
1816 are fully allocated and the disk request could be issued immedi‐
1817 ately.
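
A minimal sketch of a job using nowait together with ignore_error, as suggested above. The job name, file path and size are assumptions chosen for illustration.

```ini
; Hypothetical job: random reads that return instead of blocking.
[nowait-randread]
ioengine=pvsync2
nowait=1
; Do not abort the job when a request returns EAGAIN.
ignore_error=EAGAIN
direct=1
rw=randread
bs=4k
; Assumed test file; adjust the path and size for your system.
filename=/mnt/test/fio-nowait.dat
size=1g
```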
1818
1819 (cpuio)cpuload=int
1820 Attempt to use the specified percentage of CPU cycles. This is a
1821 mandatory option when using cpuio I/O engine.
1822
1823 (cpuio)cpuchunks=int
1824 Split the load into cycles of the given time. In microseconds.
1825
1826 (cpuio)exit_on_io_done=bool
1827 Detect when I/O threads are done, then exit.
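
The cpuio options might be used as in the following sketch. The job name, runtime, chunk length and number of jobs are illustrative assumptions; set numjobs to the number of CPUs you want to load.

```ini
; Hypothetical job: burn roughly 85% of four CPUs for 30 seconds.
[global]
ioengine=cpuio
time_based
runtime=30

[burn]
cpuload=85
; Length of each load cycle, in microseconds.
cpuchunks=50000
; cpuload only loads a single CPU, so start one job per CPU.
numjobs=4
```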
1828
1829 (libhdfs)namenode=str
1830 The hostname or IP address of a HDFS cluster namenode to con‐
1831 tact.
1832
1833 (libhdfs)port
1834 The listening port of the HDFS cluster namenode.
1835
1836 (netsplice,net)port
1837 The TCP or UDP port to bind to or connect to. If this is used
1838 with numjobs to spawn multiple instances of the same job type,
1839 then this will be the starting port number since fio will use a
1840 range of ports.
1841
1842 (rdma)port
1843 The port to use for RDMA-CM communication. This should be the
1844 same value on the client and the server side.
1845
1846 (netsplice,net,rdma)hostname=str
1847 The hostname or IP address to use for TCP, UDP or RDMA-CM based
1848 I/O. If the job is a TCP listener or UDP reader, the hostname
1849 is not used and must be omitted unless it is a valid UDP multi‐
1850 cast address.
1851
1852 (netsplice,net)interface=str
1853 The IP address of the network interface used to send or receive
1854 UDP multicast.
1855
1856 (netsplice,net)ttl=int
1857 Time-to-live value for outgoing UDP multicast packets. Default:
1858 1.
1859
1860 (netsplice,net)nodelay=bool
1861 Set TCP_NODELAY on TCP connections.
1862
1863 (netsplice,net)protocol=str, proto=str
1864 The network protocol to use. Accepted values are:
1865
1866 tcp Transmission control protocol.
1867
1868 tcpv6 Transmission control protocol V6.
1869
1870 udp User datagram protocol.
1871
1872 udpv6 User datagram protocol V6.
1873
1874 unix UNIX domain socket.
1875
1876 When the protocol is TCP or UDP, the port must also be given, as
1877 well as the hostname if the job is a TCP listener or UDP reader.
1878 For unix sockets, the normal filename option should be used and
1879 the port is invalid.
1880
1881 (netsplice,net)listen
1882 For TCP network connections, tell fio to listen for incoming
1883 connections rather than initiating an outgoing connection. The
1884 hostname must be omitted if this option is used.
1885
1886 (netsplice,net)pingpong
1887 Normally a network writer will just continue writing data, and a
1888 network reader will just consume packets. If `pingpong=1' is
1889 set, a writer will send its normal payload to the reader, then
1890 wait for the reader to send the same payload back. This allows
1891 fio to measure network latencies. The submission and completion
1892 latencies then measure local time spent sending or receiving,
1893 and the completion latency measures how long it took for the
1894 other end to receive and send back. For UDP multicast traffic
1895 `pingpong=1' should only be set for a single reader when multi‐
1896 ple readers are listening to the same address.
1897
1898 (netsplice,net)window_size=int
1899 Set the desired socket buffer size for the connection.
1900
1901 (netsplice,net)mss=int
1902 Set the TCP maximum segment size (TCP_MAXSEG).
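
As a sketch of how the net options fit together, the job file below runs a TCP receiver and sender on the same machine. The port number, sizes and job names are assumptions for illustration.

```ini
; Hypothetical loopback test: one listener, one sender.
[global]
ioengine=net
port=8888
protocol=tcp
bs=4k
size=1g

[receiver]
; The listener omits hostname, as described for the listen option.
listen
rw=read

[sender]
hostname=localhost
; Give the receiver a moment to start listening.
startdelay=1
rw=write
```

Adding `pingpong=1' to the global section would turn this into a round-trip latency measurement, per the pingpong description above.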
1903
1904 (e4defrag)donorname=str
1905 File will be used as a block donor (swap extents between files).
1906
1907 (e4defrag)inplace=int
1908 Configure donor file blocks allocation strategy:
1909
1910 0 Default. Preallocate donor's file on init.
1911
1912 1 Allocate space immediately inside defragment
1913 event, and free right after event.
1914
1915 (rbd,rados)clustername=str
1916 Specifies the name of the Ceph cluster.
1917
1918 (rbd)rbdname=str
1919 Specifies the name of the RBD.
1920
1921 (rbd,rados)pool=str
1922 Specifies the name of the Ceph pool containing RBD or RADOS
1923 data.
1924
1925 (rbd,rados)clientname=str
1926 Specifies the username (without the 'client.' prefix) used to
1927 access the Ceph cluster. If the clustername is specified, the
1928 clientname shall be the full *type.id* string. If no type. pre‐
1929 fix is given, fio will add 'client.' by default.
1930
1931 (rbd,rados)busy_poll=bool
1932 Poll store instead of waiting for completion. Usually this pro‐
1933 vides better throughput at the cost of higher (up to 100%) CPU uti‐
1934 lization.
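
A minimal sketch of an rbd job using the options above. The client, pool and image names are assumptions, and the image is assumed to exist already in the cluster.

```ini
; Hypothetical job: 4k random writes to an existing RBD image.
[global]
ioengine=rbd
; Username without the 'client.' prefix.
clientname=admin
pool=rbd
rbdname=fio_test
rw=randwrite
bs=4k

[rbd-iodepth32]
iodepth=32
```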
1935
1936 (http)http_host=str
1937 Hostname to connect to. For S3, this could be the bucket name.
1938 Default is localhost.
1939
1940 (http)http_user=str
1941 Username for HTTP authentication.
1942
1943 (http)http_pass=str
1944 Password for HTTP authentication.
1945
1946 (http)https=str
1947 Whether to use HTTPS instead of plain HTTP. on enables HTTPS;
1948 insecure will enable HTTPS, but disable SSL peer verification
1949 (use with caution!). Default is off.
1950
1951 (http)http_mode=str
1952 Which HTTP access mode to use: webdav, swift, or s3. Default is
1953 webdav.
1954
1955 (http)http_s3_region=str
1956 The S3 region/zone to include in the request. Default is us-
1957 east-1.
1958
1959 (http)http_s3_key=str
1960 The S3 secret key.
1961
1962 (http)http_s3_keyid=str
1963 The S3 key/access id.
1964
1965 (http)http_swift_auth_token=str
1966 The Swift auth token. See the example configuration file on how
1967 to retrieve this.
1968
1969 (http)http_verbose=int
1970 Enable verbose requests from libcurl. Useful for debugging. 1
1971 turns on verbose logging from libcurl, 2 additionally enables
1972 HTTP IO tracing. Default is 0.
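
The http options could be combined for an S3 endpoint roughly as follows. The host, region, bucket and object names are placeholders, and the credentials are assumed to be supplied through the environment variables S3_KEY and S3_ID.

```ini
; Hypothetical job: write objects to an S3-compatible endpoint.
[global]
ioengine=http
direct=1
https=on
http_mode=s3
; Placeholder endpoint and region.
http_host=s3.example.com
http_s3_region=us-east-1
; Credentials taken from the environment (assumed variable names).
http_s3_key=${S3_KEY}
http_s3_keyid=${S3_ID}
http_verbose=0

[s3-write]
; Placeholder bucket and object path.
filename=/my-bucket/fio-object
rw=write
; blocksize defines the size of each object created.
bs=1m
size=16m
; The engine only supports iodepth=1, so scale with numjobs.
numjobs=4
```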
1973
1974 (mtd)skip_bad=bool
1975 Skip operations against known bad blocks.
1976
1977 (libhdfs)hdfsdirectory
1978 libhdfs will create chunks in this HDFS directory.
1979
1980 (libhdfs)chunk_size
1981 The size of the chunk to use for each file.
1982
1983 (rdma)verb=str
1984 The RDMA verb to use on this side of the RDMA ioengine connec‐
1985 tion. Valid values are write, read, send and recv. These corre‐
1986 spond to the equivalent RDMA verbs (e.g. write = rdma_write
1987 etc.). Note that this only needs to be specified on the client
1988 side of the connection. See the examples folder.
1989
1990 (rdma)bindname=str
1991 The name to use to bind the local RDMA-CM connection to a local
1992 RDMA device. This could be a hostname or an IPv4 or IPv6 ad‐
1993 dress. On the server side this will be passed into the
1994 rdma_bind_addr() function and on the client side it will be used
1995 in the rdma_resolve_addr() function. This can be useful when mul‐
1996 tiple paths exist between the client and the server or in cer‐
1997 tain loopback configurations.
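
A sketch of a matching pair of rdma job files, one per host. The port, host name, sizes and queue depths are assumptions for illustration. The server side might look like this:

```ini
; Hypothetical server job file, run on the receiving host.
[global]
ioengine=rdma
port=18515
bs=1m
size=1g

[receiver]
rw=read
iodepth=16
```

and the client side, which is the only side that needs verb:

```ini
; Hypothetical client job file, run on the sending host.
[global]
ioengine=rdma
; Placeholder server name; port must match the server side.
hostname=rdma-server.example.com
port=18515
verb=write
bs=1m
size=1g

[sender]
rw=write
iodepth=1
```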
1998
1999 (filestat)stat_type=str
2000 Specify stat system call type to measure lookup/getattr perfor‐
2001 mance. Default is stat for stat(2).
2002
2003 (sg)hipri
2004 If this option is set, fio will attempt to use polled IO comple‐
2005 tions. This will have a similar effect as (io_uring)hipri. Only
2006 SCSI READ and WRITE commands will have the SGV4_FLAG_HIPRI set
2007 (not UNMAP (trim) nor VERIFY). Older versions of the Linux sg
2008 driver that do not support hipri will simply ignore this flag
2009 and do normal IO. The Linux SCSI Low Level Driver (LLD) that
2010 "owns" the device also needs to support hipri (also known as
2011 iopoll and mq_poll). The MegaRAID driver is an example of a SCSI
2012 LLD. Default: clear (0), which does normal (interrupt-based)
2013 IO.
2014
2015 (sg)readfua=bool
2016 With readfua option set to 1, read operations include the force
2017 unit access (fua) flag. Default: 0.
2018
2019 (sg)writefua=bool
2020 With writefua option set to 1, write operations include the
2021 force unit access (fua) flag. Default: 0.
2022
2023 (sg)sg_write_mode=str
2024 Specify the type of write commands to issue. This option can
2025 take three values:
2026
2027 write (default)
2028 Write opcodes are issued as usual
2029
2030 verify Issue WRITE AND VERIFY commands. The BYTCHK bit is
2031 set to 0. This directs the device to carry out a
2032 medium verification with no data comparison. The
2033 writefua option is ignored with this selection.
2034
2035 same Issue WRITE SAME commands. This transfers a single
2036 block to the device and writes this same block of
2037 data to a contiguous sequence of LBAs beginning at
2038 the specified offset. fio's block size parameter
2039 specifies the amount of data written with each
2040 command. However, the amount of data actually
2041 transferred to the device is equal to the device's
2042 block (sector) size. For a device with 512 byte
2043 sectors, blocksize=8k will write 16 sectors with
2044 each command. fio will still generate 8k of data
2045 for each command but only the first 512 bytes will
2046 be used and transferred to the device. The write‐
2047 fua option is ignored with this selection.
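
As an illustration of the WRITE SAME case described above, the sketch below uses blocksize=8k, which on a device with 512 byte sectors corresponds to 8192 / 512 = 16 sectors written per command. The device path and size are assumptions, and the job overwrites data on the target.

```ini
; WARNING: this job writes to the target device and destroys its
; contents. The device path below is a placeholder.
[write-same]
ioengine=sg
filename=/dev/sg1
sg_write_mode=same
rw=write
; 8k per command = 16 sectors on a 512-byte-sector device.
bs=8k
size=1g
```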
2048
2049 (nbd)uri=str
2050 Specify the NBD URI of the server to test. The string is a
2051 standard NBD URI (see https://github.com/NetworkBlockDe‐
2052 vice/nbd/tree/master/doc). Example URIs:
2053
2054 nbd://localhost:10809
2055
2056 nbd+unix:///?socket=/tmp/socket
2057
2058 nbds://tlshost/exportname
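
A minimal sketch of an nbd job using the Unix socket URI form shown above. It assumes an NBD server is already listening on /tmp/socket; the job name and runtime are illustrative.

```ini
; Hypothetical job: mixed random I/O against an NBD export.
[global]
ioengine=nbd
uri=nbd+unix:///?socket=/tmp/socket
rw=randrw
time_based
runtime=60

[nbd-job]
offset=0
```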
2059
2060 (libcufile)gpu_dev_ids=str
2061 Specify the GPU IDs to use with CUDA. This is a colon-separated
2062 list of integers. GPUs are assigned to workers round-robin. Default
2063 is 0.
2064
2065 (libcufile)cuda_io=str
2066 Specify the type of I/O to use with CUDA. This option takes the
2067 following values:
2068
2069 cufile (default)
2070 Use libcufile and nvidia-fs. This option performs
2071 I/O directly between a GPUDirect Storage filesys‐
2072 tem and GPU buffers, avoiding use of a bounce buf‐
2073 fer. If verify is set, cudaMemcpy is used to copy
2074 verification data between RAM and GPU(s). Verifi‐
2075 cation data is copied from RAM to GPU before a
2076 write and from GPU to RAM after a read. direct
2077 must be 1.
2078
2079 posix Use POSIX to perform I/O with a RAM buffer, and
2080 use cudaMemcpy to transfer data between RAM and
2081 the GPU(s). Data is copied from GPU to RAM before
2082 a write and copied from RAM to GPU after a read.
2083 verify does not affect the use of cudaMemcpy.
2084
2085 I/O depth
2086 iodepth=int
2087 Number of I/O units to keep in flight against the file. Note
2088 that increasing iodepth beyond 1 will not affect synchronous io‐
2089 engines (except for small degrees when verify_async is in use).
2090 Even async engines may impose OS restrictions causing the de‐
2091 sired depth not to be achieved. This may happen on Linux when
2092 using libaio and not setting `direct=1', since buffered I/O is
2093 not async on that OS. Keep an eye on the I/O depth distribution
2094 in the fio output to verify that the achieved depth is as ex‐
2095 pected. Default: 1.
2096
2097 iodepth_batch_submit=int, iodepth_batch=int
2098 This defines how many pieces of I/O to submit at once. It de‐
2099 faults to 1 which means that we submit each I/O as soon as it is
2100 available, but can be raised to submit bigger batches of I/O at
2101 a time. If it is set to 0, the iodepth value will be used.
2102
2103 iodepth_batch_complete_min=int, iodepth_batch_complete=int
2104 This defines how many pieces of I/O to retrieve at once. It de‐
2105 faults to 1 which means that we'll ask for a minimum of 1 I/O in
2106 the retrieval process from the kernel. The I/O retrieval will go
2107 on until we hit the limit set by iodepth_low. If this variable
2108 is set to 0, then fio will always check for completed events be‐
2109 fore queuing more I/O. This helps reduce I/O latency, at the
2110 cost of more retrieval system calls.
2111
2112 iodepth_batch_complete_max=int
2113 This defines the maximum pieces of I/O to retrieve at once. This
2114 variable should be used along with iodepth_batch_com‐
2115 plete_min=int variable, specifying the range of min and max
2116 amount of I/O which should be retrieved. By default it is equal
2117 to iodepth_batch_complete_min value. Example #1:
2118
2119 iodepth_batch_complete_min=1
2120 iodepth_batch_complete_max=<iodepth>
2121
2122 which means that we will retrieve at least 1 I/O and up to the
2123 whole submitted queue depth. If no I/O has completed
2124 yet, we will wait. Example #2:
2125
2126 iodepth_batch_complete_min=0
2127 iodepth_batch_complete_max=<iodepth>
2128
2129 which means that we can retrieve up to the whole submitted queue
2130 depth, but if no I/O has completed yet, we will NOT
2131 wait and immediately exit the system call. In this example we
2132 simply do polling.
2133
2134 iodepth_low=int
2135 The low water mark indicating when to start filling the queue
2136 again. Defaults to the same as iodepth, meaning that fio will
2137 attempt to keep the queue full at all times. If iodepth is set
2138 to e.g. 16 and iodepth_low is set to 4, then after fio has
2139 filled the queue of 16 requests, it will let the depth drain
2140 down to 4 before starting to fill it again.
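
The queue depth options above might be combined as in this sketch, which mirrors the iodepth=16 and iodepth_low=4 example. The job name, batch sizes, file path and size are assumptions.

```ini
; Hypothetical job: fill the queue to 16, let it drain to 4, refill.
[depth-demo]
ioengine=libaio
; Buffered I/O is not async with libaio on Linux, so use direct I/O.
direct=1
rw=randread
bs=4k
iodepth=16
iodepth_low=4
; Submit up to 8 I/Os at once.
iodepth_batch_submit=8
; Reap at least 1 and at most 16 completions per retrieval.
iodepth_batch_complete_min=1
iodepth_batch_complete_max=16
; Assumed test file; adjust the path and size for your system.
filename=/mnt/test/fio-depth.dat
size=1g
```

Checking the I/O depth distribution in the fio output shows whether the requested depth was actually reached.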
2141
2142 serialize_overlap=bool
2143 Serialize in-flight I/Os that might otherwise cause or suffer
2144 from data races. When two or more I/Os are submitted simultane‐
2145 ously, there is no guarantee that the I/Os will be processed or
2146 completed in the submitted order. Further, if two or more of
2147 those I/Os are writes, any overlapping region between them can
2148 become indeterminate/undefined on certain storage. These issues
2149 can cause verification to fail erratically when at least one of
2150 the racing I/Os is changing data and the overlapping region has
2151 a non-zero size. Setting serialize_overlap tells fio to avoid
2152 provoking this behavior by explicitly serializing in-flight I/Os
2153 that have a non-zero overlap. Note that setting this option can
2154 reduce both performance and the iodepth achieved.
2155
2156 This option only applies to I/Os issued for a single job except
2157 when it is enabled along with io_submit_mode=offload. In offload
2158 mode, fio will check for overlap among all I/Os submitted by
2159 offload jobs with serialize_overlap enabled.
2160
2161 Default: false.
2162
2163 io_submit_mode=str
2164 This option controls how fio submits the I/O to the I/O engine.
2165 The default is `inline', which means that the fio job threads
2166 submit and reap I/O directly. If set to `offload', the job
2167 threads will offload I/O submission to a dedicated pool of I/O
2168 threads. This requires some coordination and thus has a bit of
2169 extra overhead, especially for lower queue depth I/O where it
2170 can increase latencies. The benefit is that fio can manage sub‐
2171 mission rates independently of the device completion rates. This
2172 avoids skewed latency reporting if I/O gets backed up on the de‐
2173 vice side (the coordinated omission problem). Note that this op‐
2174 tion cannot reliably be used with async IO engines.
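
A sketch of offload submission with a fixed request rate, the combination the paragraph above describes for avoiding skewed latency reporting. A synchronous engine is used because of the note about async engines. The rate, job name, file path and size are assumptions.

```ini
; Hypothetical job: issue reads at a fixed rate from a separate
; pool of I/O threads.
[offload-fixed-rate]
ioengine=psync
io_submit_mode=offload
rate_iops=1000
rw=randread
bs=4k
direct=1
; Assumed test file; adjust the path and size for your system.
filename=/mnt/test/fio-offload.dat
size=1g
```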
2175
2176 I/O rate
2177 thinktime=time
2178 Stall the job for the specified period of time after an I/O has
2179 completed before issuing the next. May be used to simulate pro‐
2180 cessing being done by an application. When the unit is omitted,
2181 the value is interpreted in microseconds. See thinktime_blocks
2182 and thinktime_spin.
2183
2184 thinktime_spin=time
2185 Only valid if thinktime is set - pretend to spend CPU time doing
2186 something with the data received, before falling back to sleep‐
2187 ing for the rest of the period specified by thinktime. When the
2188 unit is omitted, the value is interpreted in microseconds.
2189
2190 thinktime_blocks=int
2191 Only valid if thinktime is set - control how many blocks to is‐
2192 sue, before waiting thinktime usecs. If not set, defaults to 1
2193 which will make fio wait thinktime usecs after every block. This
2194 effectively makes any queue depth setting redundant, since no
2195 more than 1 I/O will be queued before we have to complete it and
2196 do our thinktime. In other words, this setting effectively caps
2197 the queue depth if the latter is larger.
2198
2199 thinktime_blocks_type=str
2200 Only valid if thinktime is set - control how thinktime_blocks
2201 triggers. The default is `complete', which triggers thinktime
2202 when fio completes thinktime_blocks blocks. If this is set to
2203 `issue', then the trigger happens at the issue side.
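
The thinktime options could be combined as follows. With these assumed values, fio stalls for 1000 microseconds after every 8 completed blocks, spending the first 200 microseconds spinning and the rest sleeping. The job name, file path and size are illustrative.

```ini
; Hypothetical job: simulate application processing between I/Os.
[think]
ioengine=psync
rw=randread
bs=4k
; Values without a unit are interpreted in microseconds.
thinktime=1000
thinktime_spin=200
thinktime_blocks=8
thinktime_blocks_type=complete
; Assumed test file; adjust the path and size for your system.
filename=/mnt/test/fio-think.dat
size=256m
```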
2204
2205 rate=int[,int][,int]
2206 Cap the bandwidth used by this job. The number is in bytes/sec,
2207 the normal suffix rules apply. Comma-separated values may be
2208 specified for reads, writes, and trims as described in block‐
2209 size.
2210
2211 For example, using `rate=1m,500k' would limit reads to 1MiB/sec
2212 and writes to 500KiB/sec. Capping only reads or writes can be
2213 done with `rate=,500k' or `rate=500k,' where the former will
2214 only limit writes (to 500KiB/sec) and the latter will only limit
2215 reads.
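
In a job file, the `rate=1m,500k' example above could look like this sketch. The job name, engine, file path and size are assumptions.

```ini
; Hypothetical job: mixed workload with separate read/write caps.
[capped-randrw]
ioengine=psync
rw=randrw
bs=4k
; Reads limited to 1MiB/sec, writes to 500KiB/sec.
rate=1m,500k
; Assumed test file; adjust the path and size for your system.
filename=/mnt/test/fio-rate.dat
size=1g
```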
2216
2217 rate_min=int[,int][,int]
2218 Tell fio to do whatever it can to maintain at least this band‐
2219 width. Failing to meet this requirement will cause the job to
2220 exit. Comma-separated values may be specified for reads, writes,
2221 and trims as described in blocksize.
2222
2223 rate_iops=int[,int][,int]
2224 Cap the bandwidth to this number of IOPS. Basically the same as
2225 rate, just specified independently of bandwidth. If the job is
2226 given a block size range instead of a fixed value, the smallest
2227 block size is used as the metric. Comma-separated values may be
2228 specified for reads, writes, and trims as described in block‐
2229 size.
2230
2231 rate_iops_min=int[,int][,int]
2232 If fio doesn't meet this rate of I/O, it will cause the job to
2233 exit. Comma-separated values may be specified for reads,
2234 writes, and trims as described in blocksize.
2235
2236 rate_process=str
2237 This option controls how fio manages rated I/O submissions. The
2238 default is `linear', which submits I/O in a linear fashion with
2239 fixed delays between I/Os that gets adjusted based on I/O com‐
2240 pletion rates. If this is set to `poisson', fio will submit I/O
2241 based on a more real world random request flow, known as the
2242 Poisson process (https://en.wikipedia.org/wiki/Pois‐
2243 son_point_process). The lambda will be 10^6 / IOPS for the given
2244 workload.
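
As a worked example of the formula above, a job capped at 2000 IOPS gives lambda = 10^6 / 2000 = 500. Reading that value as a mean interval in microseconds is an interpretation, not something this manual states. The job name, file path and size are assumptions.

```ini
; Hypothetical job: Poisson-distributed submissions at 2000 IOPS.
[poisson-reads]
ioengine=psync
rw=randread
bs=4k
rate_iops=2000
; lambda = 10^6 / 2000 = 500 for this workload.
rate_process=poisson
; Assumed test file; adjust the path and size for your system.
filename=/mnt/test/fio-poisson.dat
size=1g
```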
2245
2246 rate_ignore_thinktime=bool
2247 By default, fio will attempt to catch up to the specified rate
2248 setting, if any kind of thinktime setting was used. If this op‐
2249 tion is set, then fio will ignore the thinktime and continue do‐
2250 ing IO at the specified rate, instead of entering a catch-up
2251 mode after thinktime is done.
2252
2253 I/O latency
2254 latency_target=time
2255 If set, fio will attempt to find the max performance point that
2256 the given workload will run at while maintaining a latency below
2257 this target. When the unit is omitted, the value is interpreted
2258 in microseconds. See latency_window and latency_percentile.
2259
2260 latency_window=time
2261 Used with latency_target to specify the sample window that the
2262 job is run at varying queue depths to test the performance. When
2263 the unit is omitted, the value is interpreted in microseconds.
2264
2265 latency_percentile=float
2266 The percentage of I/Os that must fall within the criteria speci‐
2267 fied by latency_target and latency_window. If not set, this de‐
2268 faults to 100.0, meaning that all I/Os must be equal to or below
2269 the value set by latency_target.
2270
2271 latency_run=bool
2272 Used with latency_target. If false (default), fio will find the
2273 highest queue depth that meets latency_target and exit. If true,
2274 fio will continue running and try to meet latency_target by ad‐
2275 justing queue depth.
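
A sketch of a latency-targeted job. The target, window and percentile values are assumptions chosen for illustration, as are the job name, file path and size.

```ini
; Hypothetical job: find the highest queue depth that keeps 99.9%
; of I/Os at or below 500 milliseconds.
[latency-profile]
ioengine=libaio
direct=1
rw=randread
bs=4k
; Upper bound on the queue depth fio may try.
iodepth=128
; 500000 microseconds = 500 milliseconds.
latency_target=500000
; 5000000 microseconds = 5 second sample window.
latency_window=5000000
latency_percentile=99.9
; Assumed test file; adjust the path and size for your system.
filename=/mnt/test/fio-latency.dat
size=1g
```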
2276
2277 max_latency=time
2278 If set, fio will exit the job with an ETIMEDOUT error if it ex‐
2279 ceeds this maximum latency. When the unit is omitted, the value
2280 is interpreted in microseconds.
2281
2282 rate_cycle=int
2283 Average bandwidth for rate and rate_min over this number of mil‐
2284 liseconds. Defaults to 1000.
2285
2286 I/O replay
2287 write_iolog=str
2288 Write the issued I/O patterns to the specified file. See
2289 read_iolog. Specify a separate file for each job, otherwise the
2290 iologs will be interspersed and the file may be corrupt.
2291
2292 read_iolog=str
2293 Open an iolog with the specified filename and replay the I/O
2294 patterns it contains. This can be used to store a workload and
2295 replay it sometime later. The iolog given may also be a blktrace
2296 binary file, which allows fio to replay a workload captured by
2297 blktrace. See blktrace(8) for how to capture such logging data.
2298 For blktrace replay, the file needs to be turned into a blkparse
2299 binary data file first (`blkparse <device> -o /dev/null -d
2300 file_for_fio.bin'). You can specify a number of files by sepa‐
2301 rating the names with a ':' character. See the filename option
2302 for information on how to escape ':' characters within the file
2303 names. These files will be sequentially assigned to job clones
2304 created by numjobs. '-' is a reserved name, meaning read from
2305 stdin, notably if filename is set to '-' which means stdin as
2306 well, then this flag can't be set to '-'.
2307
2308 read_iolog_chunked=bool
2309 Determines how iolog is read. If false (default), the entire
2310 read_iolog will be read at once. If true, input from the
2311 iolog will be read gradually. Useful when the iolog is very large
2312 or is being generated.
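
Recording and replaying a workload could be sketched as two job files. The paths, sizes and job names are assumptions. The first records the issued I/O pattern:

```ini
; Hypothetical recording job: one iolog file for this single job.
[record]
ioengine=psync
rw=randrw
bs=4k
size=256m
filename=/mnt/test/fio-iolog.dat
write_iolog=/tmp/fio-randrw.iolog
```

and the second replays it later:

```ini
; Hypothetical replay job: read the pattern in chunks.
[replay]
ioengine=psync
read_iolog=/tmp/fio-randrw.iolog
read_iolog_chunked=1
```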
2313
2314 merge_blktrace_file=str
2315 When specified, rather than replaying the logs passed to
2316 read_iolog, the logs go through a merge phase which aggregates
2317 them into a single blktrace. The resulting file is then passed
2318 on as the read_iolog parameter. The intention here is to make
2319 the order of events consistent. This limits the influence of the
2320 scheduler compared to replaying multiple blktraces via concur‐
2321 rent jobs.
2322
2323 merge_blktrace_scalars=float_list
2324 This is a percentage based option that is index paired with the
2325 list of files passed to read_iolog. When merging is performed,
2326 scale the time of each event by the corresponding amount. For
2327 example, `--merge_blktrace_scalars="50:100"' runs the first
2328 trace in halftime and the second trace in realtime. This knob is
2329 separately tunable from replay_time_scale which scales the trace
2330 during runtime and will not change the output of the merge un‐
2331 like this option.
2332
2333 merge_blktrace_iters=float_list
2334 This is a whole number option that is index paired with the list
2335 of files passed to read_iolog. When merging is performed, run
2336 each trace for the specified number of iterations. For example,
2337 `--merge_blktrace_iters="2:1"' runs the first trace for two it‐
2338 erations and the second trace for one iteration.
2339
2340 replay_no_stall=bool
2341 When replaying I/O with read_iolog the default behavior is to
2342 attempt to respect the timestamps within the log and replay them
2343 with the appropriate delay between I/Os. When this option is
2344 set, fio does not respect the timestamps and attempts to replay
2345 the I/Os as fast as possible while still respecting ordering. The
2346 result is the same I/O pattern to a given device, but different
2347 timings.
2348
2349 replay_time_scale=int
2350 When replaying I/O with read_iolog, fio will honor the original
2351 timing in the trace. With this option, it's possible to scale
2352 the time. The value is a percentage: if set to 50, fio runs at
2353 50% of the original I/O rate in the trace; if set to 200, it
2354 runs at twice the original I/O rate. Defaults to 100.
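
As a rough illustration of the percentage semantics described above, the
following Python sketch rescales the delay between two trace entries.
The function name scaled_delay and the microsecond unit are assumptions
made for this example; it is not fio's implementation.

```python
def scaled_delay(original_delay_us, replay_time_scale=100):
    """Return the delay to use between two replayed I/Os.

    replay_time_scale is a percentage of the original I/O rate, so a
    higher rate means a proportionally shorter delay.
    """
    if replay_time_scale <= 0:
        raise ValueError("replay_time_scale must be positive")
    return original_delay_us * 100 / replay_time_scale


print(scaled_delay(1000, 50))   # half the rate: the delay doubles
print(scaled_delay(1000, 200))  # twice the rate: the delay halves
```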
2355
2356 replay_redirect=str
2357 While replaying I/O patterns using read_iolog the default
2358 behavior is to replay each I/O onto the major/minor device that
2359 it was recorded from. This is sometimes undesirable because on a
2360 different machine those major/minor numbers can map to a
2361 different device. Changing hardware on the same system can also
2362 result in a different major/minor mapping. replay_redirect
2363 causes all I/Os to be replayed onto the single specified device
2364 regardless of the device they were recorded from. For example,
2365 `replay_redirect=/dev/sdc' would cause all I/O in the blktrace
2366 or iolog to be replayed onto `/dev/sdc'. This means multiple
2367 devices will be replayed onto a single device if the trace
2368 contains multiple devices. If you want multiple devices to be
2369 replayed concurrently to multiple redirected devices, you must
2370 blkparse your trace into separate traces and replay them with
2371 independent fio invocations. Unfortunately this also breaks the
2372 strict time ordering between multiple device accesses.
2373
2374 replay_align=int
2375 Force alignment of the byte offsets in a trace to this value.
2376 The value must be a power of 2.
2377
2378 replay_scale=int
2379 Scale byte offsets down by this factor when replaying traces.
2380 You should most likely use replay_align as well.
2381
2382 replay_skip=str
2383 Sometimes it's useful to skip certain I/O types in a replay
2384 trace, for instance to eliminate the writes in the trace, or to
2385 avoid replaying trims/discards when redirecting to a device
2386 that doesn't support them. This option takes a comma separated
2387 list of read, write, trim, sync.
2388
2389 Threads, processes and job synchronization
2390 thread Fio defaults to creating jobs by using fork, however if this op‐
2391 tion is given, fio will create jobs by using POSIX Threads'
2392 function pthread_create(3) to create threads instead.
2393
2394 wait_for=str
2395 If set, the current job won't be started until all workers of
2396 the specified waitee job are done. wait_for operates on the job
2397 name basis, so there are a few limitations. First, the waitee
2398 must be defined prior to the waiter job (meaning no forward ref‐
2399 erences). Second, if a job is being referenced as a waitee, it
2400 must have a unique name (no duplicate waitees).
2401
2402 nice=int
2403 Run the job with the given nice value. See man nice(2). On Win‐
2404 dows, values less than -15 set the process class to "High"; -1
2405 through -15 set "Above Normal"; 1 through 15 "Below Normal"; and
2406 above 15 "Idle" priority class.
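
The Windows mapping above can be summarised with a small Python sketch.
The function name is hypothetical, and treating a nice value of 0 as the
"Normal" class is an assumption, since the text does not state it.

```python
def windows_priority_class(nice):
    """Map a nice value to the Windows priority class described above."""
    if nice < -15:
        return "High"
    if nice < 0:            # -15 through -1
        return "Above Normal"
    if nice == 0:           # assumption: not stated in the text
        return "Normal"
    if nice <= 15:          # 1 through 15
        return "Below Normal"
    return "Idle"


for value in (-20, -15, 0, 15, 19):
    print(value, windows_priority_class(value))
```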
2407
2408 prio=int
2409 Set the I/O priority value of this job. Linux limits us to a
2410 positive value between 0 and 7, with 0 being the highest. See
2411 man ionice(1). Refer to an appropriate manpage for other operat‐
2412 ing systems since meaning of priority may differ. For per-com‐
2413 mand priority setting, see I/O engine specific `cmdprio_percent‐
2414 age` and `hipri_percentage` options.
2415
2416 prioclass=int
2417 Set the I/O priority class. See man ionice(1). For per-command
2418 priority setting, see I/O engine specific `cmdprio_percentage`
2419 and `hipri_percentage` options.
2420
2421 cpus_allowed=str
2422 Controls the same options as cpumask, but accepts a textual
2423 specification of the permitted CPUs instead and CPUs are indexed
2424 from 0. So to use CPUs 0 and 5 you would specify `cpus_al‐
2425 lowed=0,5'. This option also allows a range of CPUs to be speci‐
2426 fied -- say you wanted a binding to CPUs 0, 5, and 8 to 15, you
2427 would set `cpus_allowed=0,5,8-15'.
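
To make the list syntax concrete, here is a minimal Python sketch that
expands such a specification into individual CPU indices. The helper
name expand_cpu_list is hypothetical and the parsing is simplified; it
only illustrates the comma and range notation shown above.

```python
def expand_cpu_list(spec):
    """Expand a string such as "0,5,8-15" into a sorted list of CPUs."""
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


print(expand_cpu_list("0,5,8-15"))
# [0, 5, 8, 9, 10, 11, 12, 13, 14, 15]
```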
2428
2429 On Windows, when `cpus_allowed' is unset only CPUs from fio's
2430 current processor group will be used and affinity settings are
2431 inherited from the system. An fio build configured to target
2432 Windows 7 makes options that set CPUs processor group aware and
2433 values will set both the processor group and a CPU from within
2434 that group. For example, on a system where processor group 0 has
2435 40 CPUs and processor group 1 has 32 CPUs, `cpus_allowed' values
2436 between 0 and 39 will bind CPUs from processor group 0 and
2437 `cpus_allowed' values between 40 and 71 will bind CPUs from pro‐
2438 cessor group 1. When using `cpus_allowed_policy=shared' all CPUs
2439 specified by a single `cpus_allowed' option must be from the
2440 same processor group. For Windows fio builds not built for Win‐
2441 dows 7, CPUs will only be selected from (and be relative to)
2442 whatever processor group fio happens to be running in and CPUs
2443 from other processor groups cannot be used.
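
The processor group example can be sketched in Python as follows. The
function name locate_cpu is hypothetical, and the assumption that a
value is translated to an index relative to the start of its group is
an interpretation of the text, not something confirmed from fio's
source.

```python
def locate_cpu(index, group_sizes):
    """Return (group, cpu_within_group) for a flat cpus_allowed value."""
    if index < 0:
        raise ValueError("CPU index must not be negative")
    for group, size in enumerate(group_sizes):
        if index < size:
            return group, index
        index -= size
    raise ValueError("CPU index beyond the last processor group")


# Group 0 has 40 CPUs and group 1 has 32 CPUs, as in the example above.
print(locate_cpu(39, [40, 32]))  # (0, 39)
print(locate_cpu(40, [40, 32]))  # (1, 0)
```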
2444
2445 cpus_allowed_policy=str
2446 Set the policy of how fio distributes the CPUs specified by
2447 cpus_allowed or cpumask. Two policies are supported:
2448
2449 shared All jobs will share the CPU set specified.
2450
2451 split Each job will get a unique CPU from the CPU set.
2452
2453 shared is the default behavior, if the option isn't specified.
2454 If split is specified, then fio will assign one CPU per job. If
2455 not enough CPUs are given for the jobs listed, then fio will
2456 round-robin the CPUs in the set.
2457
2458 cpumask=int
2459 Set the CPU affinity of this job. The parameter given is a bit
2460 mask of allowed CPUs the job may run on. So if you want the al‐
2461 lowed CPUs to be 1 and 5, you would pass the decimal value of (1
2462 << 1 | 1 << 5), or 34. See man sched_setaffinity(2). This may
2463 not work on all supported operating systems or kernel versions.
2464 This option doesn't work well for a higher CPU count than what
2465 you can store in an integer mask, so it can only control cpus
2466 1-32. For boxes with larger CPU counts, use cpus_allowed.
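
The mask arithmetic above can be checked with a few lines of Python. The
helper name cpumask_for is made up for this illustration; it simply sets
one bit per CPU number.

```python
def cpumask_for(cpus):
    """Build the decimal cpumask value that allows the given CPUs."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return mask


print(cpumask_for([1, 5]))  # 34, i.e. (1 << 1 | 1 << 5)
```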
2467
2468 numa_cpu_nodes=str
2469 Set this job running on specified NUMA nodes' CPUs. The argu‐
2470 ments allow comma delimited list of cpu numbers, A-B ranges, or
2471 `all'. Note, to enable NUMA options support, fio must be built
2472 on a system with libnuma-dev(el) installed.
2473
2474 numa_mem_policy=str
2475 Set this job's memory policy and corresponding NUMA nodes. For‐
2476 mat of the arguments:
2477
2478 <mode>[:<nodelist>]
2479
2480 `mode' is one of the following memory policies: `default', `pre‐
2481 fer', `bind', `interleave' or `local'. For `default' and `local'
2482 memory policies, no node needs to be specified. For `prefer',
2483 only one node is allowed. For `bind' and `interleave' the `node‐
2484 list' may be as follows: a comma delimited list of numbers, A-B
2485 ranges, or `all'.
2486
2487 cgroup=str
2488 Add job to this control group. If it doesn't exist, it will be
2489 created. The system must have a mounted cgroup blkio mount point
2490 for this to work. If your system doesn't have it mounted, you
2491 can do so with:
2492
2493 # mount -t cgroup -o blkio none /cgroup
2494
2495 cgroup_weight=int
2496 Set the weight of the cgroup to this value. See the
2497 documentation that comes with the kernel. Allowed values are in
2498 the range of 100..1000.
2499
2500 cgroup_nodelete=bool
2501 Normally fio will delete the cgroups it has created after the
2502 job completion. To override this behavior and to leave cgroups
2503 around after the job completion, set `cgroup_nodelete=1'. This
2504 can be useful if one wants to inspect various cgroup files after
2505 job completion. Default: false.
2506
2507 flow_id=int
2508 The ID of the flow. If not specified, it defaults to being a
2509 global flow. See flow.
2510
2511 flow=int
2512 Weight in token-based flow control. If this value is used, then
2513 fio regulates the activity between two or more jobs sharing the
2514 same flow_id. Fio attempts to keep each job activity propor‐
2515 tional to other jobs' activities in the same flow_id group, with
2516 respect to requested weight per job. That is, if one job has
2517 `flow=3', another job has `flow=2' and another has `flow=1`,
2518 then there will be a roughly 3:2:1 ratio in how much one runs vs
2519 the others.
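
As an illustration of the proportions involved, the Python sketch below
turns a set of flow weights into the approximate share of activity each
job would receive. The function name flow_shares is hypothetical, and
the result is only the idealised ratio; fio's actual scheduling is
approximate.

```python
from fractions import Fraction


def flow_shares(weights):
    """Return each job's ideal share of activity for the given weights."""
    total = sum(weights)
    return [Fraction(w, total) for w in weights]


for weight, share in zip((3, 2, 1), flow_shares((3, 2, 1))):
    print(f"flow={weight}: about {float(share):.0%} of the activity")
```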
2520
2521 flow_sleep=int
2522 The period of time, in microseconds, to wait after the flow
2523 counter has exceeded its proportion before retrying operations.
2524
2525 stonewall, wait_for_previous
2526 Wait for preceding jobs in the job file to exit, before starting
2527 this one. Can be used to insert serialization points in the job
2528 file. A stone wall also implies starting a new reporting group,
2529 see group_reporting. Optionally you can use `stonewall=0` to
2530 disable or `stonewall=1` to enable it.
2531
2532 exitall
2533 By default, fio will continue running all other jobs when one
2534 job finishes. Sometimes this is not the desired action. Setting
2535 exitall will instead make fio terminate all jobs in the same
2536 group, as soon as one job of that group finishes.
2537
2538 exit_what=str
2539 By default, fio will continue running all other jobs when one
2540 job finishes. Sometimes this is not the desired action. Setting
2541 exitall will instead make fio terminate all jobs in the same
2542 group. The option exit_what allows you to control which jobs get
2543 terminated when exitall is enabled. The default value is group.
2544 The allowed values are:
2545
2546 all terminates all jobs.
2547
2548 group is the default and does not change the behaviour
2549 of exitall.
2550
2551 stonewall
2552 terminates all currently running jobs across all
2553 groups and continues execution with the next
2554 stonewalled group.
2555
2556 exec_prerun=str
2557 Before running this job, issue the command specified through
2558 system(3). Output is redirected to a file called
2559 `jobname.prerun.txt'.
2560
2561 exec_postrun=str
2562 After the job completes, issue the command specified through
2563 system(3). Output is redirected to a file called
2564 `jobname.postrun.txt'.
2565
2566 uid=int
2567 Instead of running as the invoking user, set the user ID to this
2568 value before the thread/process does any work.
2569
2570 gid=int
2571 Set group ID, see uid.
2572
2573 Verification
2574 verify_only
2575 Do not perform specified workload, only verify data still
2576 matches previous invocation of this workload. This option allows
2577 one to check data multiple times at a later date without over‐
2578 writing it. This option makes sense only for workloads that
2579 write data, and does not support workloads with the time_based
2580 option set.
2581
2582 do_verify=bool
2583 Run the verify phase after a write phase. Only valid if verify
2584 is set. Default: true.
2585
2586 verify=str
2587 If writing to a file, fio can verify the file contents after
2588 each iteration of the job. Each verification method also implies
2589 verification of special header, which is written to the begin‐
2590 ning of each block. This header also includes meta information,
2591 like offset of the block, block number, timestamp when block was
2592 written, etc. verify can be combined with verify_pattern option.
2593 The allowed values are:
2594
2595 md5 Use an md5 sum of the data area and store it in
2596 the header of each block.
2597
2598 crc64 Use an experimental crc64 sum of the data area and
2599 store it in the header of each block.
2600
2601 crc32c Use a crc32c sum of the data area and store it in
2602 the header of each block. This will automatically
2603 use hardware acceleration (e.g. SSE4.2 on an x86
2604 or CRC crypto extensions on ARM64) but will fall
2605 back to software crc32c if none is found. Gener‐
2606 ally the fastest checksum fio supports when hard‐
2607 ware accelerated.
2608
2609 crc32c-intel
2610 Synonym for crc32c.
2611
2612 crc32 Use a crc32 sum of the data area and store it in
2613 the header of each block.
2614
2615 crc16 Use a crc16 sum of the data area and store it in
2616 the header of each block.
2617
2618 crc7 Use a crc7 sum of the data area and store it in
2619 the header of each block.
2620
2621 xxhash Use xxhash as the checksum function. Generally the
2622 fastest software checksum that fio supports.
2623
2624 sha512 Use sha512 as the checksum function.
2625
2626 sha256 Use sha256 as the checksum function.
2627
2628 sha1 Use optimized sha1 as the checksum function.
2629
2630 sha3-224
2631 Use optimized sha3-224 as the checksum function.
2632
2633 sha3-256
2634 Use optimized sha3-256 as the checksum function.
2635
2636 sha3-384
2637 Use optimized sha3-384 as the checksum function.
2638
2639 sha3-512
2640 Use optimized sha3-512 as the checksum function.
2641
2642 meta This option is deprecated, since now meta informa‐
2643 tion is included in generic verification header
2644 and meta verification happens by default. For de‐
2645 tailed information see the description of the ver‐
2646 ify setting. This option is kept because of com‐
2647 patibility's sake with old configurations. Do not
2648 use it.
2649
2650 pattern
2651 Verify a strict pattern. Normally fio includes a
2652 header with some basic information and checksum‐
2653 ming, but if this option is set, only the specific
2654 pattern set with verify_pattern is verified.
2655
2656 null Only pretend to verify. Useful for testing inter‐
2657 nals with `ioengine=null', not for much else.
2658
2659 This option can be used for repeated burn-in tests of a system
2660 to make sure that the written data is also correctly read back.
2661 If the data direction given is a read or random read, fio will
2662 assume that it should verify a previously written file. If the
2663 data direction includes any form of write, the verify will be of
2664 the newly written data.
2665
2666 To avoid false verification errors, do not use the norandommap
2667 option when verifying data with async I/O engines and I/O depths
2668 > 1. Or use the norandommap and the lfsr random generator
2669 together to avoid writing to the same offset with multiple
2670 outstanding I/Os.
2671
2672 verify_offset=int
2673 Swap the verification header with data somewhere else in the
2674 block before writing. It is swapped back before verifying.
2675
2676 verify_interval=int
2677 Write the verification header at a finer granularity than the
2678 blocksize. It will be written for chunks the size of
2679 verify_interval. blocksize should be a multiple of this value.
2680
2681 verify_pattern=str
2682 If set, fio will fill the I/O buffers with this pattern. Fio de‐
2683 faults to filling with totally random bytes, but sometimes it's
2684 interesting to fill with a known pattern for I/O verification
2685 purposes. Depending on the width of the pattern, fio will fill
2686 1/2/3/4 bytes of the buffer at a time (it can be either a
2687 decimal or a hex number). A verify_pattern larger than a 32-bit
2688 quantity has to be a hex number that starts with either "0x" or
2689 "0X". Use with verify. Also, verify_pattern supports the %o
2690 format, which means that for each block the offset will be
2691 written and then verified back, e.g.:
2692
2693 verify_pattern=%o
2694
2695 Or use combination of everything:
2696
2697 verify_pattern=0xff%o"abcd"-12
2698
2699 verify_fatal=bool
2700 Normally fio will keep checking the entire contents before quit‐
2701 ting on a block verification failure. If this option is set, fio
2702 will exit the job on the first observed failure. Default: false.
2703
2704 verify_dump=bool
2705 If set, dump the contents of both the original data block and
2706 the data block we read off disk to files. This allows later
2707 analysis to inspect just what kind of data corruption occurred.
2708 Off by default.
2709
2710 verify_async=int
2711 Fio will normally verify I/O inline from the submitting thread.
2712 This option takes an integer describing how many async offload
2713 threads to create for I/O verification instead, causing fio to
2714 offload the duty of verifying I/O contents to one or more sepa‐
2715 rate threads. If using this offload option, even sync I/O en‐
2716 gines can benefit from using an iodepth setting higher than 1,
2717 as it allows them to have I/O in flight while verifies are run‐
2718 ning. Defaults to 0 async threads, i.e. verification is not
2719 asynchronous.
2720
2721 verify_async_cpus=str
2722 Tell fio to set the given CPU affinity on the async I/O verifi‐
2723 cation threads. See cpus_allowed for the format used.
2724
2725 verify_backlog=int
2726 Fio will normally verify the written contents of a job that uti‐
2727 lizes verify once that job has completed. In other words, every‐
2728 thing is written then everything is read back and verified. You
2729 may want to verify continually instead for a variety of reasons.
2730 Fio stores the meta data associated with an I/O block in memory,
2731 so for large verify workloads, quite a bit of memory would be
2732 used up holding this meta data. If this option is enabled, fio
2733 will write only N blocks before verifying these blocks.
2734
2735 verify_backlog_batch=int
2736 Control how many blocks fio will verify if verify_backlog is
2737 set. If not set, will default to the value of verify_backlog
2738 (meaning the entire queue is read back and verified). If ver‐
2739 ify_backlog_batch is less than verify_backlog then not all
2740 blocks will be verified, if verify_backlog_batch is larger than
2741 verify_backlog, some blocks will be verified more than once.
2742
2743 verify_state_save=bool
2744 When a job exits during the write phase of a verify workload,
2745 save its current state. This allows fio to replay up until that
2746 point, if the verify state is loaded for the verify read phase.
2747 The format of the filename is, roughly:
2748
2749 <type>-<jobname>-<jobindex>-verify.state.
2750
2751 <type> is "local" for a local run, "sock" for a client/server
2752 socket connection, and "ip" (192.168.0.1, for instance) for a
2753 networked client/server connection. Defaults to true.
2754
2755 verify_state_load=bool
2756 If a verify termination trigger was used, fio stores the current
2757 write state of each thread. This can be used at verification
2758 time so that fio knows how far it should verify. Without this
2759 information, fio will run a full verification pass, according to
2760 the settings in the job file used. Default: false.
2761
2762 trim_percentage=int
2763 Number of verify blocks to discard/trim.
2764
2765 trim_verify_zero=bool
2766 Verify that trim/discarded blocks are returned as zeros.
2767
2768 trim_backlog=int
2769 Trim after this number of blocks are written.
2770
2771 trim_backlog_batch=int
2772 Trim this number of I/O blocks.
2773
2774 experimental_verify=bool
2775 Enable experimental verification.
2776
2777 Steady state
2778 steadystate=str:float, ss=str:float
2779 Define the criterion and limit for assessing steady state per‐
2780 formance. The first parameter designates the criterion whereas
2781 the second parameter sets the threshold. When the criterion
2782 falls below the threshold for the specified duration, the job
2783 will stop. For example, `iops_slope:0.1%' will direct fio to
2784 terminate the job when the least squares regression slope falls
2785 below 0.1% of the mean IOPS. If group_reporting is enabled this
2786 will apply to all jobs in the group. Below is the list of avail‐
2787 able steady state assessment criteria. All assessments are car‐
2788 ried out using only data from the rolling collection window.
2789 Threshold limits can be expressed as a fixed value or as a per‐
2790 centage of the mean in the collection window.
2791
2792 When using this feature, most jobs should include the time_based
2793 and runtime options or the loops option so that fio does not
2794 stop running after it has covered the full size of the specified
2795 file(s) or device(s).
2796
2797 iops Collect IOPS data. Stop the job if all in‐
2798 dividual IOPS measurements are within the
2799 specified limit of the mean IOPS (e.g.,
2800 `iops:2' means that all individual IOPS
2801 values must be within 2 of the mean,
2802 whereas `iops:0.2%' means that all individ‐
2803 ual IOPS values must be within 0.2% of the
2804 mean IOPS to terminate the job).
2805
2806 iops_slope
2807 Collect IOPS data and calculate the least
2808 squares regression slope. Stop the job if
2809 the slope falls below the specified limit.
2810
2811 bw Collect bandwidth data. Stop the job if all
2812 individual bandwidth measurements are
2813 within the specified limit of the mean
2814 bandwidth.
2815
2816 bw_slope
2817 Collect bandwidth data and calculate the
2818 least squares regression slope. Stop the
2819 job if the slope falls below the specified
2820 limit.
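
The criteria above can be sketched in Python as follows. This is a
simplified reading of the text, not fio's code: the function names, the
use of an inclusive comparison, and the sample values are assumptions
made for illustration. The samples stand for the per-second
measurements in the rolling collection window.

```python
def mean(values):
    return sum(values) / len(values)


def within_limit_of_mean(samples, limit, percent=False):
    """iops/bw criterion: every sample lies within `limit` of the mean."""
    avg = mean(samples)
    allowed = avg * limit / 100 if percent else limit
    return all(abs(s - avg) <= allowed for s in samples)


def least_squares_slope(samples):
    """iops_slope/bw_slope criterion: slope of the fitted line."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    x_avg = (n - 1) / 2
    y_avg = mean(samples)
    num = sum((x - x_avg) * (y - y_avg) for x, y in enumerate(samples))
    den = sum((x - x_avg) ** 2 for x in range(n))
    return num / den


window = [1000, 1001, 999, 1000]
print(within_limit_of_mean(window, 2))                  # like `iops:2'
print(within_limit_of_mean(window, 0.2, percent=True))  # like `iops:0.2%'
print(least_squares_slope(window))
```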
2821
2822 steadystate_duration=time, ss_dur=time
2823 A rolling window of this duration will be used to judge
2824 whether steady state has been reached. Data will be col‐
2825 lected once per second. The default is 0 which disables
2826 steady state detection. When the unit is omitted, the
2827 value is interpreted in seconds.
2828
2829 steadystate_ramp_time=time, ss_ramp=time
2830 Allow the job to run for the specified duration before
2831 beginning data collection for checking the steady state
2832 job termination criterion. The default is 0. When the
2833 unit is omitted, the value is interpreted in seconds.
2834
2835 Measurements and reporting
2836 per_job_logs=bool
2837 If set, this generates bw/clat/iops logs with per file private
2838 filenames. If not set, jobs with identical names will share the
2839 log filename. Default: true.
2840
2841 group_reporting
2842 It may sometimes be interesting to display statistics for groups
2843 of jobs as a whole instead of for each individual job. This is
2844 especially true if numjobs is used; looking at individual
2845 thread/process output quickly becomes unwieldy. To see the final
2846 report per-group instead of per-job, use group_reporting. Jobs
2847 in a file will be part of the same reporting group unless
2848 separated by a stonewall or by new_group.
2849
2850 new_group
2851 Start a new reporting group. See: group_reporting. If not given,
2852 all jobs in a file will be part of the same reporting group, un‐
2853 less separated by a stonewall.
2854
2855 stats=bool
2856 By default, fio collects and shows final output results for all
2857 jobs that run. If this option is set to 0, then fio will ignore
2858 this job in the final stat output.
2859
2860 write_bw_log=str
2861 If given, write a bandwidth log for this job. Can be used to
2862 store data of the bandwidth of the jobs in their lifetime.
2863
2864 If no str argument is given, the default filename of `job‐
2865 name_type.x.log' is used. Even when the argument is given, fio
2866 will still append the type of log. So if one specifies:
2867
2868 write_bw_log=foo
2869
2870 The actual log name will be `foo_bw.x.log' where `x' is the in‐
2871 dex of the job (1..N, where N is the number of jobs). If
2872 per_job_logs is false, then the filename will not include the
2873 `.x` job index.
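
The naming rule can be expressed as a short Python sketch. The helper
name log_filename is hypothetical; the format string follows the
description above and is not taken from fio's source.

```python
def log_filename(prefix, log_type, job_index, per_job_logs=True):
    """Build a log name such as `foo_bw.1.log' from its parts."""
    if per_job_logs:
        return f"{prefix}_{log_type}.{job_index}.log"
    # Without per_job_logs the `.x' job index is left out.
    return f"{prefix}_{log_type}.log"


print(log_filename("foo", "bw", 1))                      # foo_bw.1.log
print(log_filename("foo", "bw", 1, per_job_logs=False))  # foo_bw.log
```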
2874
2875 The included fio_generate_plots script uses gnuplot to turn
2876 these text files into nice graphs. See the LOG FILE FORMATS sec‐
2877 tion for how data is structured within the file.
2878
2879 write_lat_log=str
2880 Same as write_bw_log, except this option creates I/O submission
2881 (e.g., `name_slat.x.log'), completion (e.g., `name_clat.x.log'),
2882 and total (e.g., `name_lat.x.log') latency files instead. See
2883 write_bw_log for details about the filename format and the LOG
2884 FILE FORMATS section for how data is structured within the
2885 files.
2886
2887 write_hist_log=str
2888 Same as write_bw_log but writes an I/O completion latency his‐
2889 togram file (e.g., `name_hist.x.log') instead. Note that this
2890 file will be empty unless log_hist_msec has also been set. See
2891 write_bw_log for details about the filename format and the LOG
2892 FILE FORMATS section for how data is structured within the file.
2893
2894 write_iops_log=str
2895 Same as write_bw_log, but writes an IOPS file (e.g.
2896 `name_iops.x.log`) instead. Because fio defaults to individual
2897 I/O logging, the value entry in the IOPS log will be 1 unless
2898 windowed logging (see log_avg_msec) has been enabled. See
2899 write_bw_log for details about the filename format and LOG FILE
2900 FORMATS for how data is structured within the file.
2901
2902 log_avg_msec=int
2903 By default, fio will log an entry in the iops, latency, or bw
2904 log for every I/O that completes. When written to disk, such a
2905 log can quickly grow to a very large size. Setting this option
2906 makes fio average each log entry over the specified period
2907 of time, reducing the resolution of the log. See log_max_value
2908 as well. Defaults to 0, logging all entries. Also see LOG FILE
2909 FORMATS section.
2910
2911 log_hist_msec=int
2912 Same as log_avg_msec, but logs entries for completion latency
2913 histograms. Computing latency percentiles from averages of in‐
2914 tervals using log_avg_msec is inaccurate. Setting this option
2915 makes fio log histogram entries over the specified period of
2916 time, reducing log sizes for high IOPS devices while retaining
2917 percentile accuracy. See log_hist_coarseness and write_hist_log
2918 as well. Defaults to 0, meaning histogram logging is disabled.
2919
2920 log_hist_coarseness=int
2921 Integer ranging from 0 to 6, defining the coarseness of the res‐
2922 olution of the histogram logs enabled with log_hist_msec. For
2923 each increment in coarseness, fio outputs half as many bins. De‐
2924 faults to 0, for which histogram logs contain 1216 latency bins.
2925 See LOG FILE FORMATS section.
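
The halving rule gives the following bin counts. This Python sketch
only applies the arithmetic stated above (1216 bins at coarseness 0,
halved per increment); the function name is made up for the example.

```python
def histogram_bins(coarseness):
    """Number of latency bins logged at a given log_hist_coarseness."""
    if not 0 <= coarseness <= 6:
        raise ValueError("log_hist_coarseness ranges from 0 to 6")
    return 1216 >> coarseness  # halve once per coarseness step


for level in range(7):
    print(level, histogram_bins(level))
```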
2926
2927 log_max_value=bool
2928 If log_avg_msec is set, fio logs the average over that window.
2929 If you instead want to log the maximum value, set this option to
2930 1. Defaults to 0, meaning that averaged values are logged.
2931
2932 log_offset=bool
2933 If this is set, the iolog options will include the byte offset
2934 for the I/O entry as well as the other data values. Defaults to
2935 0 meaning that offsets are not present in logs. Also see LOG
2936 FILE FORMATS section.
2937
2938 log_compression=int
2939 If this is set, fio will compress the I/O logs as it goes, to
2940 keep the memory footprint lower. When a log reaches the speci‐
2941 fied size, that chunk is removed and compressed in the back‐
2942 ground. Given that I/O logs are fairly highly compressible, this
2943 yields a nice memory savings for longer runs. The downside is
2944 that the compression will consume some background CPU cycles, so
2945 it may impact the run. This, however, is also true if the log‐
2946 ging ends up consuming most of the system memory. So pick your
2947 poison. The I/O logs are saved normally at the end of a run, by
2948 decompressing the chunks and storing them in the specified log
2949 file. This feature depends on the availability of zlib.
2950
2951 log_compression_cpus=str
2952 Define the set of CPUs that are allowed to handle online log
2953 compression for the I/O jobs. This can provide better isolation
2954 between performance sensitive jobs, and background compression
2955 work. See cpus_allowed for the format used.
2956
2957 log_store_compressed=bool
2958 If set, fio will store the log files in a compressed format.
2959 They can be decompressed with fio, using the --inflate-log com‐
2960 mand line parameter. The files will be stored with a `.fz' suf‐
2961 fix.
2962
2963 log_unix_epoch=bool
2964 If set, fio will log Unix timestamps to the log files produced
2965 by enabling write_type_log for each log type, instead of the de‐
2966 fault zero-based timestamps.
2967
2968 block_error_percentiles=bool
2969 If set, record errors in trim block-sized units from writes and
2970 trims and output a histogram of how many trims it took to get to
2971 errors, and what kind of error was encountered.
2972
2973 bwavgtime=int
2974 Average the calculated bandwidth over the given time. Value is
2975 specified in milliseconds. If the job also does bandwidth log‐
2976 ging through write_bw_log, then the minimum of this option and
2977 log_avg_msec will be used. Default: 500ms.
2978
2979 iopsavgtime=int
2980 Average the calculated IOPS over the given time. Value is speci‐
2981 fied in milliseconds. If the job also does IOPS logging through
2982 write_iops_log, then the minimum of this option and log_avg_msec
2983 will be used. Default: 500ms.
2984
2985 disk_util=bool
2986 Generate disk utilization statistics, if the platform supports
2987 it. Default: true.
2988
2989 disable_lat=bool
2990 Disable measurements of total latency numbers. Useful only for
2991 cutting back the number of calls to gettimeofday(2), as that
2992 does impact performance at really high IOPS rates. Note that to
2993 really get rid of a large amount of these calls, this option
2994 must be used with disable_slat and disable_bw_measurement as
2995 well.
2996
2997 disable_clat=bool
2998 Disable measurements of completion latency numbers. See dis‐
2999 able_lat.
3000
3001 disable_slat=bool
3002 Disable measurements of submission latency numbers. See dis‐
3003 able_lat.
3004
3005 disable_bw_measurement=bool, disable_bw=bool
3006 Disable measurements of throughput/bandwidth numbers. See dis‐
3007 able_lat.
3008
3009 slat_percentiles=bool
3010 Report submission latency percentiles. Submission latency is not
3011 recorded for synchronous ioengines.
3012
3013 clat_percentiles=bool
3014 Report completion latency percentiles.
3015
3016 lat_percentiles=bool
3017 Report total latency percentiles. Total latency is the sum of
3018 submission latency and completion latency.
3019
3020 percentile_list=float_list
3021 Overwrite the default list of percentiles for latencies and the
3022 block error histogram. Each number is a floating point number in
3023 the range (0,100], and the maximum length of the list is 20. Use
3024 ':' to separate the numbers. For example, `--per‐
3025 centile_list=99.5:99.9' will cause fio to report the latency du‐
3026 rations below which 99.5% and 99.9% of the observed latencies
3027 fell, respectively.
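
A minimal Python sketch of the constraints described above: values are
separated by ':', each must lie in (0,100], and at most 20 are allowed.
The function name parse_percentile_list is hypothetical and the error
messages are illustrative, not fio's.

```python
def parse_percentile_list(value):
    """Parse a string such as "99.5:99.9" into a list of floats."""
    items = [float(p) for p in value.split(":")]
    if len(items) > 20:
        raise ValueError("at most 20 percentiles are allowed")
    for p in items:
        if not 0 < p <= 100:
            raise ValueError(f"percentile {p} is outside (0,100]")
    return items


print(parse_percentile_list("99.5:99.9"))  # [99.5, 99.9]
```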
3028
3029 significant_figures=int
3030 If using --output-format of `normal', set the significant fig‐
3031 ures to this value. Higher values will yield more precise IOPS
3032 and throughput units, while lower values will round. Requires a
3033 minimum value of 1 and a maximum value of 10. Defaults to 4.
3034
3035 Error handling
3036 exitall_on_error
3037 When one job finishes in error, terminate the rest. The default
3038 is to wait for each job to finish.
3039
3040 continue_on_error=str
3041 Normally fio will exit the job on the first observed failure. If
3042 this option is set, fio will continue the job when there is a
3043 'non-fatal error' (EIO or EILSEQ) until the runtime is exceeded
3044 or the I/O size specified is completed. If this option is used,
3045 there are two more stats that are appended, the total error
3046 count and the first error. The error field given in the stats is
3047 the first error that was hit during the run. The allowed values
3048 are:
3049
3050 none Exit on any I/O or verify errors.
3051
3052 read Continue on read errors, exit on all others.
3053
3054 write Continue on write errors, exit on all others.
3055
3056 io Continue on any I/O error, exit on all others.
3057
3058 verify Continue on verify errors, exit on all others.
3059
3060 all Continue on all errors.
3061
3062 0 Backward-compatible alias for 'none'.
3063
3064 1 Backward-compatible alias for 'all'.
3065
3066 ignore_error=str
3067 To ignore specific errors during a test, specify an error list
3068 for each error type, instead of only being able to ignore the
3069 default 'non-fatal error' using continue_on_error. The format is
3070 `ignore_error=READ_ERR_LIST,WRITE_ERR_LIST,VERIFY_ERR_LIST',
3071 where the errors within each list are separated with ':'. An
3072 error may be given as a symbol ('ENOSPC', 'ENOMEM') or as an
3073 integer. Example:
3074
3075 ignore_error=EAGAIN,ENOSPC:122
3076
3077 This option will ignore EAGAIN from READ, and ENOSPC and
3078 122(EDQUOT) from WRITE. This option works by overriding con‐
3079 tinue_on_error with the list of errors for each error type if
3080 any.
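To make the layout of the option value concrete, here is a small Python sketch that splits an ignore_error string into its per-type lists. It only illustrates the syntax described above and is not how fio itself parses the option; the function name `parse_ignore_error` is hypothetical.

```python
from typing import Dict, List


def parse_ignore_error(value: str) -> Dict[str, List[str]]:
    """Split an ignore_error value into read, write and verify error lists."""
    kinds = ("read", "write", "verify")
    groups = value.split(",")          # ',' separates the error types
    if len(groups) > len(kinds):
        raise ValueError("at most three error lists are expected")
    result: Dict[str, List[str]] = {kind: [] for kind in kinds}
    for kind, group in zip(kinds, groups):
        # ':' separates the individual errors within one type
        result[kind] = [err for err in group.split(":") if err]
    return result


print(parse_ignore_error("EAGAIN,ENOSPC:122"))
```

For the example value, the read list holds EAGAIN, the write list holds ENOSPC and 122, and the verify list stays empty.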
3081
3082 error_dump=bool
3083 If set, dump every error, even non-fatal ones. True by default.
3084 If disabled, only fatal errors will be dumped.
3085
3086 Running predefined workloads
3087 Fio includes predefined profiles that mimic the I/O workloads generated
3088 by other tools.
3089
3090 profile=str
3091 The predefined workload to run. Current profiles are:
3092
3093 tiobench
3094 Threaded I/O bench (tiotest/tiobench) like work‐
3095 load.
3096
3097 act Aerospike Certification Tool (ACT) like workload.
3098
3099 To view a profile's additional options use --cmdhelp after specifying
3100 the profile. For example:
3101
3102 $ fio --profile=act --cmdhelp
3103
3104 Act profile options
3105 device-names=str
3106 Devices to use.
3107
3108 load=int
3109 ACT load multiplier. Default: 1.
3110
3111 test-duration=time
3112 How long the entire test takes to run. When the unit is omitted,
3113 the value is given in seconds. Default: 24h.
3114
3115 threads-per-queue=int
3116 Number of read I/O threads per device. Default: 8.
3117
3118 read-req-num-512-blocks=int
3119 Number of 512B blocks to read at a time. Default: 3.
3120
3121 large-block-op-kbytes=int
3122 Size of large block ops in KiB (writes). Default: 131072.
3123
3124 prep Set to run ACT prep phase.
3125
3126 Tiobench profile options
3127 size=str
3128 Size in MiB.
3129
3130 block=int
3131 Block size in bytes. Default: 4096.
3132
3133 numruns=int
3134 Number of runs.
3135
3136 dir=str
3137 Test directory.
3138
3139 threads=int
3140 Number of threads.
3141
3143 Fio spits out a lot of output. While running, fio will display the sta‐
3144 tus of the jobs created. An example of that would be:
3145
3146 Jobs: 1 (f=1): [_(1),M(1)][24.8%][r=20.5MiB/s,w=23.5MiB/s][r=82,w=94 IOPS][eta 01m:31s]
3147
3148 The characters inside the first set of square brackets denote the cur‐
3149 rent status of each thread. The first character is the first job de‐
3150 fined in the job file, and so forth. The possible values (in typical
3151 life cycle order) are:
3152
3153 P Thread setup, but not started.
3154 C Thread created.
3155 I Thread initialized, waiting or generating necessary data.
3156 p Thread running pre-reading file(s).
3157 / Thread is in ramp period.
3158 R Running, doing sequential reads.
3159 r Running, doing random reads.
3160 W Running, doing sequential writes.
3161 w Running, doing random writes.
3162 M Running, doing mixed sequential reads/writes.
3163 m Running, doing mixed random reads/writes.
3164 D Running, doing sequential trims.
3165 d Running, doing random trims.
3166 F Running, currently waiting for fsync(2).
3167 V Running, doing verification of written data.
3168 f Thread finishing.
3169 E Thread exited, not reaped by main thread yet.
3170 _ Thread reaped.
3171 X Thread reaped, exited with an error.
3172 K Thread reaped, exited due to signal.
3173
3174 Fio will condense the thread string so as not to take up more space on
3175 the command line than needed. For instance, if you have 10 readers and
3176 10 writers running, the output would look like this:
3177
3178 Jobs: 20 (f=20): [R(10),W(10)][4.0%][r=20.5MiB/s,w=23.5MiB/s][r=82,w=94 IOPS][eta 57m:36s]
3179
3180 Note that the status string is displayed in order, so it's possible to
3181 tell which of the jobs are currently doing what. In the example above
3182 this means that jobs 1-10 are readers and 11-20 are writers.
3183
3184 The other values are fairly self explanatory -- number of threads cur‐
3185 rently running and doing I/O, the number of currently open files (f=),
3186 the estimated completion percentage, the rate of I/O since last check
3187 (read speed listed first, then write speed and optionally trim speed)
3188 in terms of bandwidth and IOPS, and time to completion for the current
3189 running group. It's impossible to estimate runtime of the following
3190 groups (if any).
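Because the condensed thread string is ordered, it can be expanded back into one status character per job. The Python sketch below assumes the string consists of comma-separated tokens of the form `X(n)`, as in the two sample status lines above; the function name `expand_thread_states` is hypothetical.

```python
import re
from typing import List


def expand_thread_states(status_line: str) -> List[str]:
    """Return one status character per job from a fio status line."""
    match = re.search(r"\[([^\]]*)\]", status_line)  # first [...] group
    if match is None:
        raise ValueError("no bracketed thread string found")
    states: List[str] = []
    for token in match.group(1).split(","):
        # Condensed tokens look like "R(10)"; a bare "R" counts once.
        m = re.fullmatch(r"(.)(?:\((\d+)\))?", token)
        if m is None:
            raise ValueError(f"unrecognized token: {token!r}")
        states.extend(m.group(1) * int(m.group(2) or 1))
    return states


line = ("Jobs: 20 (f=20): [R(10),W(10)][4.0%][r=20.5MiB/s,w=23.5MiB/s]"
        "[r=82,w=94 IOPS][eta 57m:36s]")
states = expand_thread_states(line)
print(states[0], states[10])
```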
3191
3192 When fio is done (or interrupted by Ctrl-C), it will show the data for
3193 each thread, group of threads, and disks in that order. For each over‐
3194 all thread (or group) the output looks like:
3195
3196 Client1: (groupid=0, jobs=1): err= 0: pid=16109: Sat Jun 24 12:07:54 2017
3197 write: IOPS=88, BW=623KiB/s (638kB/s)(30.4MiB/50032msec)
3198 slat (nsec): min=500, max=145500, avg=8318.00, stdev=4781.50
3199 clat (usec): min=170, max=78367, avg=4019.02, stdev=8293.31
3200 lat (usec): min=174, max=78375, avg=4027.34, stdev=8291.79
3201 clat percentiles (usec):
3202 | 1.00th=[ 302], 5.00th=[ 326], 10.00th=[ 343], 20.00th=[ 363],
3203 | 30.00th=[ 392], 40.00th=[ 404], 50.00th=[ 416], 60.00th=[ 445],
3204 | 70.00th=[ 816], 80.00th=[ 6718], 90.00th=[12911], 95.00th=[21627],
3205 | 99.00th=[43779], 99.50th=[51643], 99.90th=[68682], 99.95th=[72877],
3206 | 99.99th=[78119]
3207 bw ( KiB/s): min= 532, max= 686, per=0.10%, avg=622.87, stdev=24.82, samples= 100
3208 iops : min= 76, max= 98, avg=88.98, stdev= 3.54, samples= 100
3209 lat (usec) : 250=0.04%, 500=64.11%, 750=4.81%, 1000=2.79%
3210 lat (msec) : 2=4.16%, 4=1.84%, 10=4.90%, 20=11.33%, 50=5.37%
3211 lat (msec) : 100=0.65%
3212 cpu : usr=0.27%, sys=0.18%, ctx=12072, majf=0, minf=21
3213 IO depths : 1=85.0%, 2=13.1%, 4=1.8%, 8=0.1%, 16=0.0%, 32=0.0%, >=64=0.0%
3214 submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
3215 complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
3216 issued rwt: total=0,4450,0, short=0,0,0, dropped=0,0,0
3217 latency : target=0, window=0, percentile=100.00%, depth=8
3218
3219 The job name (or first job's name when using group_reporting) is
3220 printed, along with the group id, count of jobs being aggregated, last
3221 error id seen (which is 0 when there are no errors), pid/tid of that
3222 thread and the time the job/group completed. Below are the I/O statis‐
3223 tics for each data direction performed (showing writes in the example
3224 above). In the order listed, they denote:
3225
3226 read/write/trim
3227 The string before the colon shows the I/O direction the
3228 statistics are for. IOPS is the average I/Os performed
3229 per second. BW is the average bandwidth rate shown as:
3230 value in power of 2 format (value in power of 10 format).
3231 The last two values show: (total I/O performed in power
3232 of 2 format / runtime of that thread).
3233
3234 slat Submission latency (min being the minimum, max being the
3235 maximum, avg being the average, stdev being the standard
3236 deviation). This is the time it took to submit the I/O.
3237 For sync I/O this row is not displayed as the slat is re‐
3238 ally the completion latency (since queue/complete is one
3239 operation there). This value can be in nanoseconds,
3240 microseconds or milliseconds; fio will choose the most
3241 appropriate base and print that (in the example above
3242 nanoseconds was the best scale). Note: in --minimal mode
3243 latencies are always expressed in microseconds.
3244
3245 clat Completion latency. Same names as slat, this denotes the
3246 time from submission to completion of the I/O pieces. For
3247 sync I/O, clat will usually be equal (or very close) to
3248 0, as the time from submit to complete is basically just
3249 CPU time (I/O has already been done, see slat explana‐
3250 tion).
3251
3252 lat Total latency. Same names as slat and clat, this denotes
3253 the time from when fio created the I/O unit to completion
3254 of the I/O operation.
3255
3256 bw Bandwidth statistics based on samples. Same names as the
3257 xlat stats, but also includes the number of samples taken
3258 (samples) and an approximate percentage of total aggre‐
3259 gate bandwidth this thread received in its group (per).
3260 This last value is only really useful if the threads in
3261 this group are on the same disk, since they are then com‐
3262 peting for disk access.
3263
3264 iops IOPS statistics based on samples. Same names as bw.
3265
3266 lat (nsec/usec/msec)
3267 The distribution of I/O completion latencies. This is the
3268 time from when I/O leaves fio until it gets completed.
3269 Unlike the separate read/write/trim sections above, the
3270 data here and in the remaining sections apply to all I/Os
3271 for the reporting group. 250=0.04% means that 0.04% of
3272 the I/Os completed in under 250us. 500=64.11% means that
3273 64.11% of the I/Os required 250 to 499us for completion.
3274
3275 cpu CPU usage. User and system time, along with the number
3276 of context switches this thread went through, and
3277 finally the number of major and minor page faults. The
3278 CPU utilization numbers are averages for the jobs in
3279 that reporting group, while the context and fault
3280 counters are summed.
3281
3282 IO depths
3283 The distribution of I/O depths over the job lifetime. The
3284 numbers are divided into powers of 2 and each entry cov‐
3285 ers depths from that value up to those that are lower
3286 than the next entry -- e.g., 16= covers depths from 16 to
3287 31. Note that the range covered by a depth distribution
3288 entry can be different to the range covered by the equiv‐
3289 alent submit/complete distribution entry.
3290
3291 IO submit
3292 How many pieces of I/O were submitted in a single submit
3293 call. Each entry denotes that amount and below, until the
3294 previous entry -- e.g., 16=100% means that we submitted
3295 anywhere between 9 and 16 I/Os per submit call. Note that
3296 the range covered by a submit distribution entry can be
3297 different to the range covered by the equivalent depth
3298 distribution entry.
3299
3300 IO complete
3301 Like the above submit number, but for completions in‐
3302 stead.
3303
3304 IO issued rwt
3305 The number of read/write/trim requests issued, and how
3306 many of them were short or dropped.
3307
3308 IO latency
3309 These values are for latency_target and related options.
3310 When these options are engaged, this section describes
3311 the I/O depth required to meet the specified latency tar‐
3312 get.
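The `lat (usec)' and `lat (msec)' distribution rows in the sample output can be merged into a single table keyed by the bucket's upper bound. The following Python sketch does that for the sample rows shown earlier; the function name and the choice of microseconds as the common unit are assumptions made for illustration.

```python
import re
from typing import Dict, Iterable

UNIT_TO_USEC = {"nsec": 0.001, "usec": 1.0, "msec": 1000.0}
LAT_ROW = re.compile(r"^\s*lat \((nsec|usec|msec)\)\s*:\s*(.*)$")


def parse_lat_distribution(lines: Iterable[str]) -> Dict[float, float]:
    """Map each bucket's upper bound (in usec) to its percentage of I/Os."""
    buckets: Dict[float, float] = {}
    for line in lines:
        m = LAT_ROW.match(line)
        if m is None:
            continue
        scale = UNIT_TO_USEC[m.group(1)]
        # The min/max/avg summary row has no "N=P%" pairs, so it adds nothing.
        for ge, bound, pct in re.findall(r"(>=)?(\d+)=([\d.]+)%", m.group(2)):
            key = float("inf") if ge else float(bound) * scale
            buckets[key] = float(pct)
    return buckets


sample = [
    "lat (usec)   : 250=0.04%, 500=64.11%, 750=4.81%, 1000=2.79%",
    "lat (msec)   : 2=4.16%, 4=1.84%, 10=4.90%, 20=11.33%, 50=5.37%",
    "lat (msec)   : 100=0.65%",
]
buckets = parse_lat_distribution(sample)
under_1ms = sum(pct for bound, pct in buckets.items() if bound <= 1000.0)
print(f"{under_1ms:.2f}% of I/Os completed within 1 ms")
```

For the sample rows the percentages add up to 100%, which is a quick sanity check that no bucket was missed.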
3313
3314 After each client has been listed, the group statistics are printed.
3315 They will look like this:
3316
3317 Run status group 0 (all jobs):
3318 READ: bw=20.9MiB/s (21.9MB/s), 10.4MiB/s-10.8MiB/s (10.9MB/s-11.3MB/s), io=64.0MiB (67.1MB), run=2973-3069msec
3319 WRITE: bw=1231KiB/s (1261kB/s), 616KiB/s-621KiB/s (630kB/s-636kB/s), io=64.0MiB (67.1MB), run=52747-53223msec
3320
3321 For each data direction it prints:
3322
3323 bw Aggregate bandwidth of threads in this group followed by
3324 the minimum and maximum bandwidth of all the threads in
3325 this group. Values outside of brackets are power-of-2
3326 format and those within are the equivalent value in a
3327 power-of-10 format.
3328
3329 io Aggregate I/O performed of all threads in this group. The
3330 format is the same as bw.
3331
3332 run The smallest and longest runtimes of the threads in this
3333 group.
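The relationship between the power-of-2 value and the power-of-10 value in brackets is plain unit arithmetic. The Python sketch below reproduces the figures from the sample group statistics; the helper name `to_decimal` is hypothetical.

```python
BINARY = {"KiB": 1024, "MiB": 1024 ** 2, "GiB": 1024 ** 3}
DECIMAL = {"kB": 1000, "MB": 1000 ** 2, "GB": 1000 ** 3}


def to_decimal(value: float, binary_unit: str, decimal_unit: str) -> float:
    """Convert a power-of-2 quantity to the given power-of-10 unit."""
    return value * BINARY[binary_unit] / DECIMAL[decimal_unit]


# READ: bw=20.9MiB/s (21.9MB/s), io=64.0MiB (67.1MB)
print(round(to_decimal(20.9, "MiB", "MB"), 1))
print(round(to_decimal(64.0, "MiB", "MB"), 1))
# WRITE: bw=1231KiB/s (1261kB/s)
print(round(to_decimal(1231, "KiB", "kB")))
```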
3334
3335 And finally, the disk statistics are printed. This is Linux specific.
3336 They will look like this:
3337
3338 Disk stats (read/write):
3339 sda: ios=16398/16511, merge=30/162, ticks=6853/819634, in_queue=826487, util=100.00%
3340
3341 Each value is printed for both reads and writes, with reads first. The
3342 numbers denote:
3343
3344 ios Number of I/Os performed by all groups.
3345
3346 merge Number of merges performed by the I/O scheduler.
3347
3348 ticks Number of ticks we kept the disk busy.
3349
3350 in_queue
3351 Total time spent in the disk queue.
3352
3353 util The disk utilization. A value of 100% means we kept the
3354 disk busy constantly, 50% would be a disk idling half of
3355 the time.
3356
3357 It is also possible to get fio to dump the current output while it is
3358 running, without terminating the job. To do that, send fio the USR1
3359 signal. You can also get regularly timed dumps by using the --sta‐
3360 tus-interval parameter, or by creating a file in `/tmp' named
3361 `fio-dump-status'. If fio sees this file, it will unlink it and dump
3362 the current output status.
3363
3365 For scripted usage where you typically want to generate tables or
3366 graphs of the results, fio can output the results in a semicolon sepa‐
3367 rated format. The format is one long line of values, such as:
3368
3369 2;card0;0;0;7139336;121836;60004;1;10109;27.932460;116.933948;220;126861;3495.446807;1085.368601;226;126864;3523.635629;1089.012448;24063;99944;50.275485%;59818.274627;5540.657370;7155060;122104;60004;1;8338;29.086342;117.839068;388;128077;5032.488518;1234.785715;391;128085;5061.839412;1236.909129;23436;100928;50.287926%;59964.832030;5644.844189;14.595833%;19.394167%;123706;0;7313;0.1%;0.1%;0.1%;0.1%;0.1%;0.1%;100.0%;0.00%;0.00%;0.00%;0.00%;0.00%;0.00%;0.01%;0.02%;0.05%;0.16%;6.04%;40.40%;52.68%;0.64%;0.01%;0.00%;0.01%;0.00%;0.00%;0.00%;0.00%;0.00%
3370 A description of this job goes here.
3371
3372 The job description (if provided) follows on a second line for terse
3373 v2. It appears on the same line for other terse versions.
3374
3375 To enable terse output, use the --minimal or `--output-format=terse'
3376 command line options. The first value is the version of the terse out‐
3377 put format. If the output has to be changed for some reason, this num‐
3378 ber will be incremented by 1 to signify that change.
3379
3380 Split up, the format is as follows (comments in brackets denote when a
3381 field was introduced or whether it's specific to some terse version):
3382
3383 terse version, fio version [v3], jobname, groupid, error
3384
3385 READ status:
3386
3387 Total IO (KiB), bandwidth (KiB/sec), IOPS, runtime (msec)
3388 Submission latency: min, max, mean, stdev (usec)
3389 Completion latency: min, max, mean, stdev (usec)
3390 Completion latency percentiles: 20 fields (see below)
3391 Total latency: min, max, mean, stdev (usec)
3392 Bw (KiB/s): min, max, aggregate percentage of total, mean, stdev, number of samples [v5]
3393 IOPS [v5]: min, max, mean, stdev, number of samples
3394
3395 WRITE status:
3396
3397 Total IO (KiB), bandwidth (KiB/sec), IOPS, runtime (msec)
3398 Submission latency: min, max, mean, stdev (usec)
3399 Completion latency: min, max, mean, stdev (usec)
3400 Completion latency percentiles: 20 fields (see below)
3401 Total latency: min, max, mean, stdev (usec)
3402 Bw (KiB/s): min, max, aggregate percentage of total, mean, stdev, number of samples [v5]
3403 IOPS [v5]: min, max, mean, stdev, number of samples
3404
3405 TRIM status [all but version 3]:
3406
3407 Fields are similar to READ/WRITE status.
3408
3409 CPU usage:
3410
3411 user, system, context switches, major faults, minor faults
3412
3413 I/O depths:
3414
3415 <=1, 2, 4, 8, 16, 32, >=64
3416
3417 I/O latencies microseconds:
3418
3419 <=2, 4, 10, 20, 50, 100, 250, 500, 750, 1000
3420
3421 I/O latencies milliseconds:
3422
3423 <=2, 4, 10, 20, 50, 100, 250, 500, 750, 1000, 2000, >=2000
3424
3425 Disk utilization [v3]:
3426
3427 disk name, read ios, write ios, read merges, write merges, read ticks, write ticks, time spent in queue, disk utilization percentage
3428
3429 Additional Info (dependent on continue_on_error, default off):
3430
3431 total # errors, first error code
3432
3433 Additional Info (dependent on description being set):
3434
3435 Text description
3436
3437 Completion latency percentiles can be a grouping of up to 20 sets, so
3438 for the terse output fio writes all of them. Each field will look like
3439 this:
3440
3441 1.00%=6112
3442
3443 which is the Xth percentile, and the `usec' latency associated with it.
3444
3445 For Disk utilization, all disks used by fio are shown. So for each disk
3446 there will be a disk utilization section.
3447
3448 Below is a single line containing short names for each of the fields in
3449 the minimal output v3, separated by semicolons:
3450
3451 terse_version_3;fio_version;jobname;groupid;error;read_kb;read_bandwidth_kb;read_iops;read_runtime_ms;read_slat_min_us;read_slat_max_us;read_slat_mean_us;read_slat_dev_us;read_clat_min_us;read_clat_max_us;read_clat_mean_us;read_clat_dev_us;read_clat_pct01;read_clat_pct02;read_clat_pct03;read_clat_pct04;read_clat_pct05;read_clat_pct06;read_clat_pct07;read_clat_pct08;read_clat_pct09;read_clat_pct10;read_clat_pct11;read_clat_pct12;read_clat_pct13;read_clat_pct14;read_clat_pct15;read_clat_pct16;read_clat_pct17;read_clat_pct18;read_clat_pct19;read_clat_pct20;read_tlat_min_us;read_lat_max_us;read_lat_mean_us;read_lat_dev_us;read_bw_min_kb;read_bw_max_kb;read_bw_agg_pct;read_bw_mean_kb;read_bw_dev_kb;write_kb;write_bandwidth_kb;write_iops;write_runtime_ms;write_slat_min_us;write_slat_max_us;write_slat_mean_us;write_slat_dev_us;write_clat_min_us;write_clat_max_us;write_clat_mean_us;write_clat_dev_us;write_clat_pct01;write_clat_pct02;write_clat_pct03;write_clat_pct04;write_clat_pct05;write_clat_pct06;write_clat_pct07;write_clat_pct08;write_clat_pct09;write_clat_pct10;write_clat_pct11;write_clat_pct12;write_clat_pct13;write_clat_pct14;write_clat_pct15;write_clat_pct16;write_clat_pct17;write_clat_pct18;write_clat_pct19;write_clat_pct20;write_tlat_min_us;write_lat_max_us;write_lat_mean_us;write_lat_dev_us;write_bw_min_kb;write_bw_max_kb;write_bw_agg_pct;write_bw_mean_kb;write_bw_dev_kb;cpu_user;cpu_sys;cpu_csw;cpu_mjf;cpu_minf;iodepth_1;iodepth_2;iodepth_4;iodepth_8;iodepth_16;iodepth_32;iodepth_64;lat_2us;lat_4us;lat_10us;lat_20us;lat_50us;lat_100us;lat_250us;lat_500us;lat_750us;lat_1000us;lat_2ms;lat_4ms;lat_10ms;lat_20ms;lat_50ms;lat_100ms;lat_250ms;lat_500ms;lat_750ms;lat_1000ms;lat_2000ms;lat_over_2000ms;disk_name;disk_read_iops;disk_write_iops;disk_read_merges;disk_write_merges;disk_read_ticks;write_ticks;disk_queue_time;disk_util
3452
3453 In client/server mode terse output differs from what appears when jobs
3454 are run locally. Disk utilization data is omitted from the standard
3455 terse output and for v3 and later appears on its own separate line at
3456 the end of each terse reporting cycle.
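For scripted use, the v3 field order listed above can be generated programmatically and zipped with the values of a terse line. This is a Python sketch under stated assumptions: it covers only the fixed v3 fields up to `lat_over_2000ms', it uses `read_lat_min_us' where the name line above says `read_tlat_min_us', and anything after the fixed fields (per-disk data, error info, description) is returned unparsed. The function names are hypothetical.

```python
from typing import Dict, List

STATS = ("min", "max", "mean", "dev")


def terse_v3_field_names() -> List[str]:
    """Build the names of the fixed terse v3 fields, in output order."""
    names = ["terse_version", "fio_version", "jobname", "groupid", "error"]
    for d in ("read", "write"):
        names += [f"{d}_kb", f"{d}_bandwidth_kb", f"{d}_iops", f"{d}_runtime_ms"]
        names += [f"{d}_slat_{s}_us" for s in STATS]
        names += [f"{d}_clat_{s}_us" for s in STATS]
        names += [f"{d}_clat_pct{i:02d}" for i in range(1, 21)]
        names += [f"{d}_lat_{s}_us" for s in STATS]
        names += [f"{d}_bw_min_kb", f"{d}_bw_max_kb", f"{d}_bw_agg_pct",
                  f"{d}_bw_mean_kb", f"{d}_bw_dev_kb"]
    names += ["cpu_user", "cpu_sys", "cpu_csw", "cpu_mjf", "cpu_minf"]
    names += [f"iodepth_{n}" for n in (1, 2, 4, 8, 16, 32, 64)]
    names += [f"lat_{n}us" for n in (2, 4, 10, 20, 50, 100, 250, 500, 750, 1000)]
    names += [f"lat_{n}ms" for n in (2, 4, 10, 20, 50, 100, 250, 500, 750, 1000, 2000)]
    names.append("lat_over_2000ms")
    return names


def parse_terse_v3(line: str) -> Dict[str, object]:
    """Map the fixed fields of one terse v3 line to their names."""
    fields = line.rstrip("\n").split(";")
    names = terse_v3_field_names()
    if fields[0] != "3":
        raise ValueError("not a terse v3 line")
    if len(fields) < len(names):
        raise ValueError("line has too few fields")
    record: Dict[str, object] = dict(zip(names, fields))
    record["extra"] = fields[len(names):]  # disk stats etc., left unparsed
    return record
```

All values are kept as strings; convert the ones you need (for example `int(record["read_iops"])') at the point of use.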
3457
3459 The json output format is intended to be both human readable and conve‐
3460 nient for automated parsing. For the most part its sections mirror
3461 those of the normal output. The runtime value is reported in msec and
3462 the bw value is reported in 1024 bytes per second units.
3463
3465 The json+ output format is identical to the json output format except
3466 that it adds a full dump of the completion latency bins. Each bins ob‐
3467 ject contains a set of (key, value) pairs where keys are latency dura‐
3468 tions and values count how many I/Os had completion latencies of the
3469 corresponding duration. For example, consider:
3470
3471 "bins" : { "87552" : 1, "89600" : 1, "94720" : 1, "96768" : 1,
3472 "97792" : 1, "99840" : 1, "100864" : 2, "103936" : 6, "104960" :
3473 534, "105984" : 5995, "107008" : 7529, ... }
3474
3475 This data indicates that one I/O required 87,552ns to complete, two
3476 I/Os required 100,864ns to complete, and 7529 I/Os required 107,008ns
3477 to complete.
3478
3479 Also included with fio is a Python script fio_jsonplus_clat2csv that
3480 takes json+ output and generates CSV-formatted latency data suitable
3481 for plotting.
3482
3483 The latency durations actually represent the midpoints of latency in‐
3484 tervals. For details refer to `stat.h' in the fio source.
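As an illustration of how the bins can be consumed, the Python sketch below estimates a latency percentile from a `bins' object by walking the durations in ascending order. Because the keys are interval midpoints, the result is an approximation. The function name `percentile_from_bins' is hypothetical, and the sample data is just the visible part of the example above.

```python
from typing import Dict


def percentile_from_bins(bins: Dict[str, int], pct: float) -> int:
    """Return the bin duration (ns) at or below which pct% of I/Os fall."""
    if not 0.0 < pct <= 100.0:
        raise ValueError("pct must be in (0, 100]")
    total = sum(bins.values())
    if total == 0:
        raise ValueError("bins contain no samples")
    threshold = total * pct / 100.0
    seen = 0
    for duration in sorted(bins, key=int):   # keys are strings in JSON
        seen += bins[duration]
        if seen >= threshold:
            return int(duration)
    return int(max(bins, key=int))           # guard against float rounding


bins = {"87552": 1, "89600": 1, "94720": 1, "96768": 1, "97792": 1,
        "99840": 1, "100864": 2, "103936": 6, "104960": 534,
        "105984": 5995, "107008": 7529}
print(percentile_from_bins(bins, 50.0))
```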
3485
3487 There are two trace file formats that you can encounter. The older (v1)
3488 format is unsupported since version 1.20-rc3 (March 2008). It is
3489 still described below in case you get an old trace and want to
3490 understand it.
3491
3492 In any case the trace is a simple text file with a single action per
3493 line.
3494
3495 Trace file format v1
3496 Each line represents a single I/O action in the following for‐
3497 mat:
3498
3499 rw, offset, length
3500
3501 where `rw=0/1' for read/write, and the `offset' and `length' en‐
3502 tries being in bytes.
3503
3504 This format is not supported in fio versions >= 1.20-rc3.
3505
3506 Trace file format v2
3507 The second version of the trace file format was added in fio
3508 version 1.17. It allows access to more than one file per trace
3509 and has a bigger set of possible file actions.
3510
3511 The first line of the trace file has to be:
3512
3513 "fio version 2 iolog"
3514
3515 Following this can be lines in two different formats, which are
3516 described below.
3517
3518 The file management format:
3519 filename action
3520
3521 The `filename' is given as an absolute path. The `action'
3522 can be one of these:
3523
3524 add Add the given `filename' to the trace.
3525
3526 open Open the file with the given `filename'.
3527 The `filename' has to have been added with
3528 the add action before.
3529
3530 close Close the file with the given `filename'.
3531 The file has to have been opened before.
3532
3533 The file I/O action format:
3534 filename action offset length
3535
3536 The `filename' is given as an absolute path, and has to
3537 have been added and opened before it can be used with
3538 this format. The `offset' and `length' are given in
3539 bytes. The `action' can be one of these:
3540
3541 wait Wait for `offset' microseconds. Everything
3542 below 100 is discarded. The time is rela‐
3543 tive to the previous `wait' statement.
3544
3545 read Read `length' bytes beginning from `off‐
3546 set'.
3547
3548 write Write `length' bytes beginning from `off‐
3549 set'.
3550
3551 sync fsync(2) the file.
3552
3553 datasync
3554 fdatasync(2) the file.
3555
3556 trim Trim the given file from the given `offset'
3557 for `length' bytes.
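A v2 trace can be produced by hand or by a script. The Python sketch below builds the lines of a single-file trace in the order required above (header, add, open, I/O actions, close). It assumes the header line is written without the surrounding quotes and that every I/O action carries both an offset and a length field; the function name and the path used are hypothetical.

```python
from typing import Iterable, List, Tuple

HEADER = "fio version 2 iolog"
IO_ACTIONS = {"wait", "read", "write", "sync", "datasync", "trim"}


def build_iolog_v2(filename: str,
                   ios: Iterable[Tuple[str, int, int]]) -> List[str]:
    """Return the lines of a v2 trace that touches a single file."""
    lines = [HEADER, f"{filename} add", f"{filename} open"]
    for action, offset, length in ios:
        if action not in IO_ACTIONS:
            raise ValueError(f"unknown I/O action: {action}")
        lines.append(f"{filename} {action} {offset} {length}")
    lines.append(f"{filename} close")
    return lines


trace = build_iolog_v2("/tmp/testfile", [("write", 0, 4096), ("read", 0, 4096)])
print("\n".join(trace))
```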
3558
3560 Colocation is a common practice used to get the most out of a machine.
3561 Knowing which workloads play nicely with each other and which ones
3562 don't is a much harder task. While fio can replay workloads concur‐
3563 rently via multiple jobs, it leaves some variability up to the sched‐
3564 uler making results harder to reproduce. Merging is a way to make the
3565 order of events consistent.
3566
3567 Merging is integrated into I/O replay and done when
3568 merge_blktrace_file is specified. The files passed to read_iolog go
3569 through the merge process, which writes a single merged trace to the
3570 specified file. The output file is then used as if it were the only
3571 file passed to read_iolog. An example would look like:
3572
3573 $ fio --read_iolog="<file1>:<file2>"
3574 --merge_blktrace_file="<output_file>"
3575
3576 Creating only the merged file can be done by passing the command line
3577 argument --merge-blktrace-only.
3578
3579 Scaling traces can be done to see the relative impact of any particular
3580 trace being slowed down or sped up. merge_blktrace_scalars takes in a
3581 colon separated list of percentage scalars. It is index paired with the
3582 files passed to read_iolog.
3583
3584 With scaling, it may be desirable to match the running time of all
3585 traces. This can be done with merge_blktrace_iters. It is index paired
3586 with read_iolog just like merge_blktrace_scalars.
3587
3588 As an example, take two traces, A and B, each 60s long. If we want to
3589 see the impact of trace A issuing I/Os twice as fast, and repeat trace A
3590 over the runtime of trace B, the following can be done:
3591
3592 $ fio --read_iolog="<trace_a>:<trace_b>"
3593 --merge_blktrace_file="<output_file>" --merge_blktrace_scalars="50:100"
3594 --merge_blktrace_iters="2:1"
3595
3596 This runs trace A at 2x the speed twice for approximately the same run‐
3597 time as a single run of trace B.
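The arithmetic behind the example can be written out as follows. This Python sketch assumes, as the example implies, that a scalar is a percentage applied to the trace's duration and that iterations repeat the scaled trace back to back; the function name is hypothetical.

```python
def merged_runtime(duration_s: float, scalar_pct: float, iters: int) -> float:
    """Approximate runtime of one trace after scaling and repetition."""
    return duration_s * (scalar_pct / 100.0) * iters


# Trace A: 60s scaled to 50% (2x speed), run twice.
print(merged_runtime(60, 50, 2))
# Trace B: 60s, unscaled, run once.
print(merged_runtime(60, 100, 1))
```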
3598
3600 In some cases, we want to understand CPU overhead in a test. For
3601 example, we may test patches specifically to see whether they reduce
3602 CPU usage. Fio implements a balloon approach to create a thread per
3603 CPU that runs at idle priority, meaning that it only runs when nobody
3604 else needs the CPU. By measuring the amount of work completed by the
3605 thread, idleness of each CPU can be derived accordingly.
3606
3607 A unit of work is defined as touching a full page of unsigned characters.
3608 The mean and standard deviation of the time to complete a unit of work
3609 are reported in the "unit work" section. Options can be chosen to report
3610 detailed per-CPU idleness or overall system idleness by aggregating per-CPU stats.
3611
3613 Fio is usually run in one of two ways when data verification is done.
3614 The first is a normal write job of some sort with verify enabled. When
3615 the write phase has completed, fio switches to reads and verifies ev‐
3616 erything it wrote. The second model is running just the write phase,
3617 and then later on running the same job (but with reads instead of
3618 writes) to repeat the same I/O patterns and verify the contents. Both
3619 of these methods depend on the write phase being completed, as fio oth‐
3620 erwise has no idea how much data was written.
3621
3622 With verification triggers, fio supports dumping the current write
3623 state to local files. Then a subsequent read verify workload can load
3624 this state and know exactly where to stop. This is useful for testing
3625 cases where power is cut to a server in a managed fashion, for in‐
3626 stance.
3627
3628 A verification trigger consists of two things:
3629
3630 1) Storing the write state of each job.
3631
3632 2) Executing a trigger command.
3633
3634 The write state is relatively small, on the order of hundreds of bytes
3635 to single kilobytes. It contains information on the number of comple‐
3636 tions done, the last X completions, etc.
3637
3638 A trigger is invoked either through creation ('touch') of a specified
3639 file in the system, or through a timeout setting. If fio is run with
3640 `--trigger-file=/tmp/trigger-file', then it will continually check for
3641 the existence of `/tmp/trigger-file'. When it sees this file, it will
3642 fire off the trigger (thus saving state, and executing the trigger com‐
3643 mand).
3644
3645 For client/server runs, there's both a local and remote trigger. If fio
3646 is running as a server backend, it will send the job states back to the
3647 client for safe storage, then execute the remote trigger, if specified.
3648 If a local trigger is specified, the server will still send back the
3649 write state, but the client will then execute the trigger.
3650
3651 Verification trigger example
3652 Let's say we want to run a powercut test on the remote Linux ma‐
3653 chine 'server'. Our write workload is in `write-test.fio'. We
3654 want to cut power to 'server' at some point during the run, and
3655 we'll run this test from the safety of our local machine,
3656 'localbox'. On the server, we'll start the fio backend normally:
3657
3658 server# fio --server
3659
3660 and on the client, we'll fire off the workload:
3661
3662 localbox$ fio --client=server
3663 --trigger-file=/tmp/my-trigger
3664 --trigger-remote="bash -c \"echo b > /proc/sysrq-trigger\""
3665
3666 We set `/tmp/my-trigger' as the trigger file, and we tell fio to
3667 execute:
3668
3669 echo b > /proc/sysrq-trigger
3670
3671 on the server once it has received the trigger and sent us the
3672 write state. This will work, but it's not really cutting power
3673 to the server, it's merely abruptly rebooting it. If we have a
3674 remote way of cutting power to the server through IPMI or simi‐
3675 lar, we could do that through a local trigger command instead.
3676 Let's assume we have a script that does IPMI reboot of a given
3677 hostname, ipmi-reboot. On localbox, we could then have run fio
3678 with a local trigger instead:
3679
3680 localbox$ fio --client=server --trigger-file=/tmp/my-trigger
3681 --trigger="ipmi-reboot server"
3682
3683 For this case, fio would wait for the server to send us the
3684 write state, then execute `ipmi-reboot server' when that hap‐
3685 pened.
3686
3687 Loading verify state
3688 To load stored write state, a read verification job file must
3689 contain the verify_state_load option. If that is set, fio will
3690 load the previously stored state. For a local fio run this is
3691 done by loading the files directly, and on a client/server run,
3692 the server backend will ask the client to send the files over
3693 and load them from there.
3694
3696 Fio supports a variety of log file formats, for logging latencies,
3697 bandwidth, and IOPS. The logs share a common format, which looks like
3698 this:
3699
3700 time (msec), value, data direction, block size (bytes), offset
3701 (bytes), command priority
3702
3703 `Time' for the log entry is always in milliseconds. The `value' logged
3704 depends on the type of log, it will be one of the following:
3705
3706 Latency log
3707 Value is latency in nsecs
3708
3709 Bandwidth log
3710 Value is in KiB/sec
3711
3712 IOPS log
3713 Value is IOPS
3714
3715 `Data direction' is one of the following:
3716
3717 0 I/O is a READ
3718
3719 1 I/O is a WRITE
3720
3721 2 I/O is a TRIM
3722
3723 The entry's `block size' is always in bytes. The `offset' is the posi‐
3724 tion in bytes from the start of the file for that particular I/O. The
3725 logging of the offset can be toggled with log_offset.
3726
3727 `Command priority' is 0 for normal priority and 1 for high priority.
3728 This is controlled by the ioengine specific cmdprio_percentage.
3729
3730 Fio defaults to logging every individual I/O but when windowed logging
3731 is set through log_avg_msec, either the average (by default) or the
3732 maximum (if log_max_value is set) `value' seen over the specified period
3733 of time is recorded. Each `data direction' seen within the window pe‐
3734 riod will aggregate its values in a separate row. Further, when using
3735 windowed logging the `block size' and `offset' entries will always con‐
3736 tain 0.
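A log line in the common format can be split into named fields as sketched below in Python. The sketch assumes the fields are comma-separated integers and that the offset column is simply absent when log_offset is off, so a line has either five or six fields. The class and function names, and the sample lines, are hypothetical.

```python
from typing import NamedTuple, Optional

DIRECTIONS = {0: "READ", 1: "WRITE", 2: "TRIM"}


class LogEntry(NamedTuple):
    time_ms: int
    value: int
    direction: str
    block_size: int
    offset: Optional[int]   # None when the offset column is not logged
    priority: int


def parse_log_line(line: str) -> LogEntry:
    """Parse one line of a fio latency, bandwidth or IOPS log."""
    fields = [int(field) for field in line.split(",")]
    if len(fields) == 6:
        time_ms, value, ddir, bs, offset, prio = fields
    elif len(fields) == 5:
        time_ms, value, ddir, bs, prio = fields
        offset = None
    else:
        raise ValueError(f"unexpected number of fields: {len(fields)}")
    return LogEntry(time_ms, value, DIRECTIONS[ddir], bs, offset, prio)


entry = parse_log_line("1000, 4019, 1, 4096, 8192, 0")
print(entry.direction, entry.offset)
```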
3737
3739 Normally fio is invoked as a stand-alone application on the machine
3740 where the I/O workload should be generated. However, the backend and
3741 frontend of fio can be run separately, i.e., the fio server can generate
3742 an I/O workload on the "Device Under Test" while being controlled by a
3743 client on another machine.
3744
3745 Start the server on the machine which has access to the storage DUT:
3746
3747 $ fio --server=args
3748
3749 where `args' defines what fio listens to. The arguments are of the form
3750 `type:hostname,port' or `type:IP,port'. `type' is either `ip' (or `ip4')
3751 for TCP/IP v4, `ip6' for TCP/IP v6, or `sock' for a local unix domain
3752 socket. `hostname' is either a hostname or IP address, and `port' is the
3753 port to listen to (only valid for TCP/IP, not a local socket). Some
3754 examples:
3755
3756 1) fio --server
3757 Start a fio server, listening on all interfaces on the
3758 default port (8765).
3759
3760 2) fio --server=ip:hostname,4444
3761 Start a fio server, listening on IP belonging to hostname
3762 and on port 4444.
3763
3764 3) fio --server=ip6:::1,4444
3765 Start a fio server, listening on IPv6 localhost ::1 and
3766 on port 4444.
3767
3768 4) fio --server=,4444
3769 Start a fio server, listening on all interfaces on port
3770 4444.
3771
3772 5) fio --server=1.2.3.4
3773 Start a fio server, listening on IP 1.2.3.4 on the de‐
3774 fault port.
3775
3776 6) fio --server=sock:/tmp/fio.sock
3777 Start a fio server, listening on the local socket
3778 `/tmp/fio.sock'.
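
To make the six forms above concrete, here is a minimal Python sketch that splits a --server style string into type, address and port. It is only an illustration of the documented forms; `parse_server_arg' is a hypothetical helper, fio's own parser may behave differently, and treating a string without a type prefix as `ip' is an assumption of this sketch.

```python
DEFAULT_PORT = 8765  # default port stated in example 1 above


def parse_server_arg(arg=""):
    """Return (type, address, port) for a --server style string.

    An empty address stands for "all interfaces". The port is None for
    a local socket, since ports only apply to TCP/IP.
    """
    if arg.startswith("sock:"):
        return ("sock", arg[len("sock:"):], None)

    kind = "ip"  # assumption: no prefix means TCP/IP v4
    for prefix in ("ip4:", "ip6:", "ip:"):
        if arg.startswith(prefix):
            kind = "ip6" if prefix == "ip6:" else "ip"
            arg = arg[len(prefix):]
            break

    # Split on the last comma so that IPv6 colons stay in the address.
    host, sep, port = arg.rpartition(",")
    if not sep:
        host, port = arg, ""
    return (kind, host, int(port) if port else DEFAULT_PORT)
```

For instance, `parse_server_arg("ip6:::1,4444")' yields the address `::1' and port 4444, while `parse_server_arg(",4444")' yields an empty address, meaning all interfaces.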
3779
3780 Once a server is running, a "client" can connect to the fio server
3781 with:
3782
3783 $ fio <local-args> --client=<server> <remote-args> <job file(s)>
3784
3785 where `local-args' are arguments for the client where it is running,
3786 `server' is the connect string, and `remote-args' and `job file(s)' are
3787 sent to the server. The `server' string follows the same format as it
3788 does on the server side, to allow IP/hostname/socket and port strings.
3789
3790 Fio can connect to multiple servers this way:
3791
3792 $ fio --client=<server1> <job file(s)> --client=<server2> <job
3793 file(s)>
3794
3795 If the job file is located on the fio server, then you can tell the
3796 server to load a local file as well. This is done by using --re‐
3797 mote-config:
3798
3799 $ fio --client=server --remote-config /path/to/file.fio
3800
3801 Then fio will open this local (to the server) job file instead of being
3802 passed one from the client.
3803
3804 If you have many servers (example: 100 VMs/containers), you can pass the
3805 pathname of a file containing host IPs/names as the value of the
3806 --client option. For example, here is a `host.list' file containing 2
3807 hostnames:
3808
3809 host1.your.dns.domain
3810 host2.your.dns.domain
3811
3812 The fio command would then be:
3813
3814 $ fio --client=host.list <job file(s)>
3815
3816 In this mode, you cannot input server-specific parameters or job files
3817 -- all servers receive the same job file.
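
When the set of servers is produced by a script, the host list can be generated rather than written by hand. The Python sketch below writes one host per line and builds the matching client command line as an argument list. `write_host_list' and `client_command' are hypothetical helpers, `job.fio' is a placeholder job file name, and the sketch only constructs the command without running fio.

```python
import os
import tempfile


def write_host_list(hosts, path):
    """Write one host name or IP per line, as in the `host.list' example."""
    with open(path, "w") as fh:
        for host in hosts:
            fh.write(host + "\n")


def client_command(host_list_path, job_files):
    """Build the fio client argv; every listed server gets the same job files."""
    return ["fio", f"--client={host_list_path}", *job_files]


workdir = tempfile.mkdtemp()
host_list = os.path.join(workdir, "host.list")
write_host_list(["host1.your.dns.domain", "host2.your.dns.domain"], host_list)

# "job.fio" is a placeholder; the list could be handed to subprocess.run().
command = client_command(host_list, ["job.fio"])
```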
3818
3819 To let `fio --client' runs share a filesystem across multiple hosts,
3820 `fio --client' prepends the IP address of the server to the filename.
3821 For example, if fio is using the directory `/mnt/nfs/fio' and is
3822 writing filename `fileio.tmp', with a --client `hostfile' containing
3823 two hostnames `h1' and `h2' with IP addresses 192.168.10.120 and
3824 192.168.10.121, then fio will create two files:
3825
3826 /mnt/nfs/fio/192.168.10.120.fileio.tmp
3827 /mnt/nfs/fio/192.168.10.121.fileio.tmp
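
When post-processing results on the shared filesystem, it can help to compute the expected per-server file names up front. The Python sketch below mirrors the naming shown in the two paths above; `client_filename' is a hypothetical helper, and it assumes the prefix is always the server's IP address followed by a dot.

```python
import posixpath


def client_filename(directory, filename, server_ip):
    """Return the path expected for one server: <directory>/<ip>.<filename>."""
    return posixpath.join(directory, f"{server_ip}.{filename}")


# The two servers from the example above.
expected = [
    client_filename("/mnt/nfs/fio", "fileio.tmp", ip)
    for ip in ("192.168.10.120", "192.168.10.121")
]
```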
3828
3829 Terse output in client/server mode will differ slightly from what is
3830 produced when fio is run in stand-alone mode. See the terse output sec‐
3831 tion for details.
3832
3834 fio was written by Jens Axboe <axboe@kernel.dk>.
3835 This man page was written by Aaron Carroll <aaronc@cse.unsw.edu.au>
3836 based on documentation by Jens Axboe.
3837 This man page was rewritten by Tomohiro Kusumi <tkusumi@tuxera.com>
3838 based on documentation by Jens Axboe.
3839
3841 Report bugs to the fio mailing list <fio@vger.kernel.org>.
3842 See REPORTING-BUGS.
3843
3844 REPORTING-BUGS: http://git.kernel.dk/cgit/fio/plain/REPORTING-BUGS
3845
3847 For further documentation see HOWTO and README.
3848 Sample jobfiles are available in the `examples/' directory.
3849 These are typically located under `/usr/share/doc/fio'.
3850
3851 HOWTO: http://git.kernel.dk/cgit/fio/plain/HOWTO
3852 README: http://git.kernel.dk/cgit/fio/plain/README
3853
3854
3855
3856User Manual August 2017 fio(1)