fiologparser_hist.py(1)     General Commands Manual     fiologparser_hist.py(1)

fiologparser_hist.py - Calculate statistics from fio histograms

fiologparser_hist.py [options] [clat_hist_files]...

fiologparser_hist.py is a utility for converting *_clat_hist* files
generated by fio into a CSV of latency statistics including minimum,
average, and maximum latency, and the 50th, 90th, 95th, and 99th
percentiles.

    $ fiologparser_hist.py *_clat_hist*
    end-time, samples, min, avg, median, 90%, 95%, 99%, max
    1000, 15, 192, 1678.107, 1788.859, 1856.076, 1880.040, 1899.208, 1888.000
    2000, 43, 152, 1642.368, 1714.099, 1816.659, 1845.552, 1888.131, 1888.000
    4000, 39, 1152, 1546.962, 1545.785, 1627.192, 1640.019, 1691.204, 1744
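
The CSV output can be loaded for further analysis with the Python standard
library. The following is a minimal sketch and not part of
fiologparser_hist.py: read_stats() is a hypothetical helper name, and SAMPLE
simply repeats the example output shown above. It assumes that fields are
separated by a comma followed by a space, which is why skipinitialspace is
set.

```python
import csv
import io

# The example output from above, repeated verbatim for illustration.
SAMPLE = """\
end-time, samples, min, avg, median, 90%, 95%, 99%, max
1000, 15, 192, 1678.107, 1788.859, 1856.076, 1880.040, 1899.208, 1888.000
2000, 43, 152, 1642.368, 1714.099, 1816.659, 1845.552, 1888.131, 1888.000
4000, 39, 1152, 1546.962, 1545.785, 1627.192, 1640.019, 1691.204, 1744
"""


def read_stats(stream):
    """Parse the CSV into a list of dicts keyed by column name (hypothetical helper)."""
    reader = csv.DictReader(stream, skipinitialspace=True)
    return [{name: float(value) for name, value in row.items()} for row in reader]


rows = read_stats(io.StringIO(SAMPLE))
for row in rows:
    print(row["end-time"], row["samples"], row["99%"])
```

In practice the stream would be the tool's standard output or a file it was
redirected to, rather than an in-memory string.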

--help Print these options.

--buff_size=int
       Number of samples to buffer into numpy at a time. Default is
       10,000. This can be adjusted to help performance.

--max_latency=int
       Number of seconds of data to process at a time. Defaults to 20
       seconds, in order to handle the 17 second upper bound on latency
       in histograms reported by fio. This should be increased if fio
       has been run with a larger maximum latency. Lowering this when a
       lower maximum latency is known can improve performance. See
       NOTES for more details.

-i, --interval=int
       Interval at which statistics are reported. Defaults to 1000 ms.
       This should be set to at least the value of log_hist_msec
       given to fio.

-d, --divisor=int
       Divide statistics by this value. Defaults to 1. Useful if you
       want to convert latencies from milliseconds to seconds
       (divisor=1000).

--warn Enables warning messages printed to stderr, useful for
       debugging.

--group_nr=int
       Set this to the value of FIO_IO_U_PLAT_GROUP_NR as defined in
       stat.h if fio has been recompiled. Defaults to 19, the current
       value used in fio. See NOTES for more details.

end-times are calculated to be uniform increments of the --interval
value given, regardless of when histogram samples are reported. Of
note:

    Intervals with no samples are omitted. In the example above this
    means "no statistics from 2 to 3 seconds" and "39 samples
    influenced the statistics of the interval from 3 to 4 seconds".

    Intervals with a single sample will have the same value for all
    statistics.

The number of samples is unweighted, corresponding to the total number
of samples which have any effect whatsoever on the interval.
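
The uniform end-times and the omission of empty intervals can be illustrated
with a simplified sketch. It is not the tool's implementation:
bucket_end_times() is a hypothetical name, and each sample is treated as a
single point in time, whereas the tool itself also counts samples that only
partly overlap an interval.

```python
import math
from collections import Counter


def bucket_end_times(sample_times_ms, interval_ms=1000):
    """Count samples per interval end-time; empty intervals never appear.

    Simplifying assumption: each sample is a point in time, so it lands in
    exactly one interval.
    """
    counts = Counter(
        math.ceil(t / interval_ms) * interval_ms for t in sample_times_ms
    )
    return sorted(counts.items())


buckets = bucket_end_times([250, 900, 1500, 3100, 3999])
print(buckets)
```

No sample falls between 2000 ms and 3000 ms here, so no row with an end-time
of 3000 is produced, mirroring the gap in the example output.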

Min statistics are computed using the value of the lower boundary of
the first bin (in increasing bin order) with non-zero samples in it.
Similarly for max, we take the upper boundary of the last bin with
non-zero samples in it. This is semantically identical to taking the
0th and 100th percentiles with a 50% bin-width buffer (because
percentiles are computed using mid-points of the bins). This enforces
the following nice properties:

    min <= 50th <= 90th <= 95th <= 99th <= max

    min and max are strict lower and upper bounds on the actual min
    / max seen by fio (and reported in *_clat.* with averaging
    turned off).
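
The min / max rule can be sketched in a few lines. This is an illustration
under stated assumptions, not the tool's code: hist_min_max() and its three
parallel lists (lower bin edges, upper bin edges, per-bin sample counts) are
hypothetical names chosen for this example.

```python
def hist_min_max(lower_edges, upper_edges, counts):
    """Return (min, max) bounds from a histogram, or None if it is empty.

    min is the lower edge of the first occupied bin and max is the upper
    edge of the last occupied bin, with bins in increasing order.
    """
    occupied = [i for i, count in enumerate(counts) if count > 0]
    if not occupied:
        return None
    return lower_edges[occupied[0]], upper_edges[occupied[-1]]


# Four bins of width 10; only the second and fourth contain samples.
lo, hi = hist_min_max([0, 10, 20, 30], [10, 20, 30, 40], [0, 3, 0, 2])
print(lo, hi)
```

Because percentiles are computed from bin mid-points, they always fall
between these two edges, which is what gives the ordering property above.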

Average statistics use a standard weighted arithmetic mean.

Percentile statistics are computed using the weighted percentile method
as described here:

    https://en.wikipedia.org/wiki/Percentile#Weighted_percentile

See weights() method for details on how weights are computed for
individual samples. In process_interval() we further multiply by the
height of each bin to get weighted histograms.
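
A minimal sketch of the weighted percentile method and the weighted mean,
using numpy, is given here for orientation. It follows the method on the
linked page rather than quoting the tool's source: the function names, and
the use of plain bin heights as weights, are assumptions made for this
example. In the tool the weights also reflect how much each sample overlaps
the interval.

```python
import numpy as np


def weighted_percentile(percs, values, weights):
    """Linearly interpolate weighted percentiles of values (e.g. bin mid-points)."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(values)
    values, weights = values[order], weights[order]
    # Percent rank of each value: the centre of its weight in the running total.
    ranks = 100.0 * (np.cumsum(weights) - weights / 2.0) / np.sum(weights)
    return np.interp(percs, ranks, values)


def weighted_average(values, weights):
    """Standard weighted arithmetic mean."""
    return float(np.average(values, weights=weights))


mids = [10.0, 20.0, 30.0]   # hypothetical bin mid-points
heights = [1, 2, 1]         # hypothetical bin heights used as weights
p50, p95 = weighted_percentile([50, 95], mids, heights)
avg = weighted_average(mids, heights)
print(p50, p95, avg)
```

With a single value, every requested percentile interpolates to that value,
which is consistent with the note that intervals with a single sample report
the same value for all statistics.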

We convert files given on the command line, assumed to be fio histogram
files. An individual histogram file can contain the histograms for
multiple different r/w directions (notably when --rw=randrw). This is
accounted for by tracking each r/w direction separately. In the
statistics reported we ultimately merge *all* histograms (regardless of
r/w direction).

The value of *_GROUP_NR in stat.h (and *_BITS) determines how many
latency bins fio outputs when histogramming is enabled. Namely for the
current default of GROUP_NR=19, we get 1,216 bins with a maximum
latency of approximately 17 seconds. For certain applications this may
not be sufficient. With GROUP_NR=24 we have 1,536 bins, giving us a
maximum latency of 541 seconds (~ 9 minutes). If you expect your
application to experience latencies greater than 17 seconds, you will
need to recompile fio with a larger GROUP_NR, e.g. with:

    sed -i.bak 's/^#define FIO_IO_U_PLAT_GROUP_NR 19/#define FIO_IO_U_PLAT_GROUP_NR 24/g' stat.h
    make fio

Quick reference table for the max latency corresponding to a sampling
of values for GROUP_NR:

    GROUP_NR | # bins | max latency bin value
    19       | 1216   | 16.9 sec
    20       | 1280   | 33.8 sec
    21       | 1344   | 67.6 sec
    22       | 1408   | 2 min, 15 sec
    23       | 1472   | 4 min, 32 sec
    24       | 1536   | 9 min, 4 sec
    25       | 1600   | 18 min, 8 sec
    26       | 1664   | 36 min, 16 sec

At present this program automatically detects the number of histogram
bins in the log files, and adjusts the bin latency values accordingly.
In particular if you use the --log_hist_coarseness parameter of fio,
you get output files with a number of bins according to the following
table (note that the first row is identical to the table above):

    coarse \ GROUP_NR
           |    19    20    21    22    23    24    25    26
    -------+------------------------------------------------
         0 |  1216  1280  1344  1408  1472  1536  1600  1664
         1 |   608   640   672   704   736   768   800   832
         2 |   304   320   336   352   368   384   400   416
         3 |   152   160   168   176   184   192   200   208
         4 |    76    80    84    88    92    96   100   104
         5 |    38    40    42    44    46    48    50    52
         6 |    19    20    21    22    23    24    25    26
         7 |   N/A    10   N/A    11   N/A    12   N/A    13
         8 |   N/A     5   N/A   N/A   N/A     6   N/A   N/A

For other values of GROUP_NR and coarseness, this table can be computed
like this:

    import numpy as np

    bins = [1216, 1280, 1344, 1408, 1472, 1536, 1600, 1664]
    max_coarse = 8
    # Bin count at each coarseness level, or NaN when it does not divide evenly.
    fncn = lambda z: [z // 2**x if z % 2**x == 0 else np.nan for x in range(max_coarse + 1)]
    np.transpose([fncn(z) for z in bins])
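
Going the other way, the coarseness can be recovered from an observed bin
count. The sketch below is only an illustration of the relationship in the
table and is not the detection logic used by the tool: coarseness_for() is a
hypothetical name, and the constant of 64 bins per group is inferred from the
tables above (1216 / 19 = 64) rather than read from stat.h.

```python
BINS_PER_GROUP = 64  # assumption inferred from the tables: 1216 bins / 19 groups


def coarseness_for(n_bins, group_nr=19, max_coarse=8):
    """Return the coarseness level that yields n_bins, or None if none does."""
    full = group_nr * BINS_PER_GROUP
    for coarse in range(max_coarse + 1):
        # A level is only valid when the full bin count divides evenly.
        if full % 2**coarse == 0 and full // 2**coarse == n_bins:
            return coarse
    return None


print(coarseness_for(304))               # GROUP_NR=19 column of the table
print(coarseness_for(10, group_nr=20))   # GROUP_NR=20 column of the table
```

A return value of None corresponds to a bin count that does not appear in
the given GROUP_NR column, including the N/A entries.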

If you have not adjusted GROUP_NR for your (high latency) application,
then you will see the percentiles computed by this tool max out at the
max latency bin value as in the first table above, and in this plot
(where GROUP_NR=19 and thus we see a max latency of ~16.7 seconds in
the red line):

    https://www.cronburg.com/fio/max_latency_bin_value_bug.png

The motivation for this tool, its design decisions, and the
implementation process are described in further detail here:

    https://www.cronburg.com/fio/cloud-latency-problem-measurement/

fiologparser_hist.py and this manual page were written by Karl Cronburg
<karl.cronburg@gmail.com>.

Report bugs to the fio mailing list <fio@vger.kernel.org>.

August 18, 2016                                       fiologparser_hist.py(1)