fiologparser_hist.py(1) General Commands Manual fiologparser_hist.py(1)

NAME
fiologparser_hist.py - Calculate statistics from fio histograms

SYNOPSIS
fiologparser_hist.py [options] [clat_hist_files]...

DESCRIPTION
fiologparser_hist.py is a utility for converting *_clat_hist* files
generated by fio into a CSV of latency statistics including minimum,
average, and maximum latency, and selectable percentiles.

EXAMPLES
$ fiologparser_hist.py *_clat_hist*
end-time, samples, min, avg, median, 90%, 95%, 99%, max
1000, 15, 192, 1678.107, 1788.859, 1856.076, 1880.040, 1899.208, 1888.000
2000, 43, 152, 1642.368, 1714.099, 1816.659, 1845.552, 1888.131, 1888.000
4000, 39, 1152, 1546.962, 1545.785, 1627.192, 1640.019, 1691.204, 1744
...
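
The CSV shown above can be post-processed with ordinary tooling. The following is a minimal sketch, not part of fiologparser_hist.py, that reads such output with the Python standard library. The names SAMPLE and read_stats are hypothetical, and the sample text is simply the example output copied from above.

```python
import csv
import io

# Sample output copied from the example above; a real run would read the
# tool's stdout or a saved file instead.
SAMPLE = """\
end-time, samples, min, avg, median, 90%, 95%, 99%, max
1000, 15, 192, 1678.107, 1788.859, 1856.076, 1880.040, 1899.208, 1888.000
2000, 43, 152, 1642.368, 1714.099, 1816.659, 1845.552, 1888.131, 1888.000
4000, 39, 1152, 1546.962, 1545.785, 1627.192, 1640.019, 1691.204, 1744
"""


def read_stats(text):
    """Return one dict per output row, with every value converted to float."""
    # skipinitialspace drops the blank that follows each comma.
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    return [{name: float(value) for name, value in row.items()} for row in reader]


rows = read_stats(SAMPLE)
end_times = [row["end-time"] for row in rows]
```

Because intervals without samples are omitted from the output, consumers should not assume that consecutive rows are exactly one interval apart (here the row for end-time 3000 is absent).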

OPTIONS
--help Print these options.

--buff_size=int
Number of samples to buffer into numpy at a time. Default is
10,000. This can be adjusted to help performance.

--max_latency=int
Number of seconds of data to process at a time. Defaults to 20
seconds, in order to handle the 17 second upper bound on latency
in histograms reported by fio. This should be increased if fio
has been run with a larger maximum latency. Lowering this when a
lower maximum latency is known can improve performance. See
NOTES for more details.

-i, --interval=int
Interval at which statistics are reported. Defaults to 1000 ms.
This should be set to at least the value of log_hist_msec as
given to fio.

--noweight
Do not perform weighting of samples between output intervals.
Default is False.

-d, --divisor=int
Divide statistics by this value. Defaults to 1. Useful if you
want to convert latencies from milliseconds to seconds
(divisor=1000).

--warn Enables warning messages printed to stderr, useful for
debugging.

--group_nr=int
Set this to the value of FIO_IO_U_PLAT_GROUP_NR as defined in
stat.h if fio has been recompiled. Defaults to 19, the current
value used in fio. See NOTES for more details.

--percentiles=str
Pass the desired list of comma- or colon-separated percentiles to
print. The default is "90.0:95.0:99.0", but the min, median (50%),
and max percentiles are always printed.
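
As an illustration of the accepted format, the sketch below splits a comma- or colon-separated percentile list into numbers. It is an assumption about how such a string could be handled, not the parser used by fiologparser_hist.py, and the function name parse_percentiles is hypothetical.

```python
import re


def parse_percentiles(spec="90.0:95.0:99.0"):
    """Split a comma- or colon-separated percentile list into floats."""
    values = [float(p) for p in re.split(r"[,:]", spec) if p.strip()]
    for p in values:
        # Percentiles outside 0..100 have no meaning, so reject them early.
        if not 0.0 <= p <= 100.0:
            raise ValueError(f"percentile out of range: {p}")
    return values


default_percentiles = parse_percentiles()
custom_percentiles = parse_percentiles("50,99.9")
```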

--usbin
Indicate to the parser that histogram bin latency values are in
microseconds. The default is nanoseconds, but histogram logs
from fio versions <= 2.99 are in microseconds.

--directions=str
By default, histogram bins for all directions (e.g. read and
write) are combined, producing one 'mixed' result. To produce
independent directional results, pass some combination of 'rwtm'
characters with the --directions=rwtm option. A 'dir' column is
added indicating the result direction for a row.

NOTES
end-times are calculated to be uniform increments of the --interval
value given, regardless of when histogram samples are reported. Of
note:

Intervals with no samples are omitted. In the example above this
means "no statistics from 2 to 3 seconds" and "39 samples
influenced the statistics of the interval from 3 to 4 seconds".

Intervals with a single sample will have the same value for all
statistics.

The number of samples is unweighted, corresponding to the total number
of samples which have any effect whatsoever on the interval.

Min statistics are computed using the value of the lower boundary of the
first bin (in increasing bin order) with non-zero samples in it.
Similarly for max, we take the upper boundary of the last bin with non-zero
samples in it. This is semantically identical to taking the 0th and
100th percentiles with a 50% bin-width buffer (because percentiles are
computed using mid-points of the bins). This enforces the following
nice properties:

min <= 50th <= 90th <= 95th <= 99th <= max

min and max are strict lower and upper bounds on the actual min
/ max seen by fio (and reported in *_clat.* with averaging
turned off).
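
The boundary and mid-point rules above can be sketched as follows. This is a simplified, unweighted illustration under stated assumptions, not the code in fiologparser_hist.py: the function name histogram_stats, the explicit edges list, and the example histogram are all hypothetical.

```python
def histogram_stats(edges, counts, percentiles=(50.0, 90.0, 95.0, 99.0)):
    """Compute min, max, and percentiles from one histogram.

    edges holds len(counts) + 1 ascending bin boundaries and counts holds
    the number of samples in each bin.
    """
    nonzero = [i for i, c in enumerate(counts) if c > 0]
    if not nonzero:
        raise ValueError("histogram has no samples")
    result = {
        "min": edges[nonzero[0]],       # lower boundary of first non-empty bin
        "max": edges[nonzero[-1] + 1],  # upper boundary of last non-empty bin
    }
    total = sum(counts)
    for p in percentiles:
        target = total * p / 100.0
        running = 0
        for i, c in enumerate(counts):
            running += c
            if c > 0 and running >= target:
                # Percentiles are reported at the mid-point of the bin.
                result[p] = (edges[i] + edges[i + 1]) / 2.0
                break
    return result


# Hypothetical histogram: four bins of width 10, the first one empty.
stats = histogram_stats([0, 10, 20, 30, 40], [0, 5, 3, 2])
```

Since every percentile is a mid-point of a non-empty bin, it always lies between the lower boundary of the first non-empty bin and the upper boundary of the last one, which is why the ordering property holds.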

Average statistics use a standard weighted arithmetic mean.

When the --noweight option is not given (the default), percentile
statistics are computed using the weighted percentile method
(https://en.wikipedia.org/wiki/Percentile#Weighted_percentile). See the
weights() method for details on how weights are computed for individual
samples. In process_interval() we further multiply by the height of
each bin to get weighted histograms.
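
For readers unfamiliar with the method, the sketch below shows one common formulation of a weighted percentile: each sorted value is assigned the percent rank 100 * (S_n - w_n / 2) / S_N, where S_n is the running sum of weights, and ranks in between are linearly interpolated. It is an assumption about the general technique only. It does not reproduce weights() or process_interval(), and the function name weighted_percentile is hypothetical.

```python
def weighted_percentile(values, weights, pct):
    """Linearly interpolated weighted percentile of values at pct (0..100)."""
    # Sort by value and drop samples that carry no weight.
    pairs = sorted((v, w) for v, w in zip(values, weights) if w > 0)
    if not pairs:
        raise ValueError("no samples with positive weight")
    total = sum(w for _, w in pairs)
    ranks = []
    running = 0.0
    for _, w in pairs:
        running += w
        # Percent rank at the centre of this sample's share of the weight.
        ranks.append(100.0 * (running - w / 2.0) / total)
    # Outside the covered rank range, clamp to the extreme values.
    if pct <= ranks[0]:
        return float(pairs[0][0])
    if pct >= ranks[-1]:
        return float(pairs[-1][0])
    for k in range(1, len(pairs)):
        if pct <= ranks[k]:
            frac = (pct - ranks[k - 1]) / (ranks[k] - ranks[k - 1])
            return pairs[k - 1][0] + frac * (pairs[k][0] - pairs[k - 1][0])


median_equal = weighted_percentile([1, 2, 3, 4], [1, 1, 1, 1], 50)
skewed = weighted_percentile([1, 2], [3, 1], 62.5)
```

With equal weights this reduces to an ordinary interpolated percentile. With unequal weights, heavily weighted values occupy a wider range of percent ranks.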

We convert the files given on the command line, which are assumed to be
fio histogram files. An individual histogram file can contain the
histograms for multiple different r/w directions (notably when
--rw=randrw). This is accounted for by tracking each r/w direction
separately. In the statistics reported we ultimately merge *all*
histograms (regardless of r/w direction), unless --directions is given.

The value of *_GROUP_NR in stat.h (and *_BITS) determines how many
latency bins fio outputs when histogramming is enabled. Namely for the
current default of GROUP_NR=19, we get 1,216 bins with a maximum
latency of approximately 17 seconds. For certain applications this may
not be sufficient. With GROUP_NR=24 we have 1,536 bins, giving us a
maximum latency of 541 seconds (~9 minutes). If you expect your
application to experience latencies greater than 17 seconds, you will need
to recompile fio with a larger GROUP_NR, e.g. with:

sed -i.bak 's/^#define FIO_IO_U_PLAT_GROUP_NR 19$/#define FIO_IO_U_PLAT_GROUP_NR 24/g' stat.h
make fio

Quick reference table for the max latency corresponding to a sampling
of values for GROUP_NR:

GROUP_NR | # bins | max latency bin value
19       | 1216   | 16.9 sec
20       | 1280   | 33.8 sec
21       | 1344   | 67.6 sec
22       | 1408   | 2 min, 15 sec
23       | 1472   | 4 min, 32 sec
24       | 1536   | 9 min, 4 sec
25       | 1600   | 18 min, 8 sec
26       | 1664   | 36 min, 16 sec
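
The "# bins" column above grows by 64 for each increment of GROUP_NR. The sketch below reproduces that column under the assumption, inferred only from the table (1216 / 19 == 64), that each group contributes 64 bins. The constant BINS_PER_GROUP and the function num_bins are hypothetical names, not identifiers from fio.

```python
# Assumption inferred from the table above: every group adds 64 bins.
BINS_PER_GROUP = 64


def num_bins(group_nr):
    """Number of histogram bins expected for a given GROUP_NR value."""
    return group_nr * BINS_PER_GROUP


table = {group_nr: num_bins(group_nr) for group_nr in range(19, 27)}
```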

At present this program automatically detects the number of histogram
bins in the log files, and adjusts the bin latency values accordingly.
In particular if you use the --log_hist_coarseness parameter of fio,
you get output files with a number of bins according to the following
table (note that the first row is identical to the table above):

coarse \ GROUP_NR
        19    20    21    22    23    24    25    26
   --------------------------------------------------
0  [[ 1216, 1280, 1344, 1408, 1472, 1536, 1600, 1664],
1   [  608,  640,  672,  704,  736,  768,  800,  832],
2   [  304,  320,  336,  352,  368,  384,  400,  416],
3   [  152,  160,  168,  176,  184,  192,  200,  208],
4   [   76,   80,   84,   88,   92,   96,  100,  104],
5   [   38,   40,   42,   44,   46,   48,   50,   52],
6   [   19,   20,   21,   22,   23,   24,   25,   26],
7   [  N/A,   10,  N/A,   11,  N/A,   12,  N/A,   13],
8   [  N/A,    5,  N/A,  N/A,  N/A,    6,  N/A,  N/A]]

For other values of GROUP_NR and coarseness, this table can be computed
like this:

import numpy as np

bins = [1216, 1280, 1344, 1408, 1472, 1536, 1600, 1664]
max_coarse = 8
# Each coarseness level halves the bin count; nan marks counts that do not divide evenly.
fncn = lambda z: [z // 2**x if z % 2**x == 0 else np.nan for x in range(max_coarse + 1)]
np.transpose([fncn(z) for z in bins])

If you have not adjusted GROUP_NR for your (high latency) application,
then you will see the percentiles computed by this tool max out at the
max latency bin value as in the first table above, and in this plot
(where GROUP_NR=19 and thus we see a max latency of ~16.7 seconds in
the red line):

https://www.cronburg.com/fio/max_latency_bin_value_bug.png

The motivation, design decisions, and implementation process are
described in further detail here:

https://www.cronburg.com/fio/cloud-latency-problem-measurement/

AUTHOR
fiologparser_hist.py and this manual page were written by Karl Cronburg
<karl.cronburg@gmail.com>.

REPORTING BUGS
Report bugs to the fio mailing list <fio@vger.kernel.org>.

August 18, 2016 fiologparser_hist.py(1)