1Dumbbench(3) User Contributed Perl Documentation Dumbbench(3)
2
3
4
6 Dumbbench - More reliable benchmarking with the least amount of
7 thinking
8
10 Command line interface: (See "dumbbench --help")
11
12 dumbbench -p 0.005 -- ./testprogram --testprogramoption
13
14 This will start churning for a while and then prints something like:
15
16 Ran 23 iterations of the command.
17 Rejected 3 samples as outliers.
18 Rounded run time per iteration (seconds): 9.519e-01 +/- 3.7e-03 (0.4%)
19
20 As a module:
21
22 use Dumbbench;
23
24 my $bench = Dumbbench->new(
25 target_rel_precision => 0.005, # seek ~0.5%
26 initial_runs => 20, # the higher the more reliable
27 );
28 $bench->add_instances(
29 Dumbbench::Instance::Cmd->new(command => [qw(perl -e 'something')]),
30 Dumbbench::Instance::PerlEval->new(code => 'for(1..1e7){something}'),
31 Dumbbench::Instance::PerlSub->new(code => sub {for(1..1e7){something}}),
32 );
33 # (Note: Comparing the run of externals commands with
34 # evals/subs probably isn't reliable)
35 $bench->run;
36 $bench->report;
37
39 This module attempts to implement reasonably robust benchmarking with
40 little extra effort and expertise required from the user. That is to
41 say, benchmarking using this module is likely an improvement over
42
43 time some-command --to --benchmark
44
45 or
46
47 use Benchmark qw/timethis/;
48 timethis(1000, 'system("some-command", ...)');
49
50 The module currently works similar to the former command line, except
51 (in layman terms) it will run the command many times, estimate the
52 uncertainty of the result and keep iterating until a certain user-
53 defined precision has been reached. Then, it calculates the resulting
54 uncertainty and goes through some pain to discard bad runs and subtract
55 overhead from the timings. The reported timing includes an uncertainty,
56 so that multiple benchmarks can more easily be compared.
57
58 Please note that "Dumbbench" works entirely with wallclock time as
59 reported by "Time::HiRes"' "time()" function.
60
62 In addition to the methods listed here, there are read-only accessors
63 for all named arguments of the constructor (which are also object
64 attributes).
65
66 new
67 Constructor that takes the following arguments (with defaults):
68
69 verbosity => 0, # 0, 1, or 2
70 target_rel_precision => 0.05, # 5% target precision
71 target_abs_precision => 0, # no target absolute precision (in s)
72 intial_runs => 20, # no. of guaranteed initial runs
73 max_iterations => 10000, # hard max. no of iterations
74 variability_measure => 'mad', # method for calculating uncertainty
75 outlier_rejection => 3, # no. of "sigma"s for the outlier rejection
76
77 "variability_measure" and "outlier_rejection" probably make sense after
78 reading "HOW IT WORKS" below. Setting "outlier_rejection" to 0 will
79 turn it off entirely.
80
81 add_instances
82 Takes one or more instances of subclasses of Dumbbench::Instance as
83 argument. Each of those is one benchmark, really. They are run in
84 sequence and reported separately.
85
86 Right now, there are the following "Dumbbench::Instance"
87 implementations: Dumbbench::Instance::Cmd for running/benchmarking
88 external commands, Dumbbench::Instance::PerlEval for
89 running/benchmarking Perl code in this same process using "eval", and
90 Dumbbench::Instance::PerlSub for running/benchmarking Perl code in this
91 same process using a subroutine reference.
92
93 run
94 Runs the dry-run and benchmark run.
95
96 report
97 Prints a short report about the benchmark results.
98
99 instances
100 Returns a list of all instance objects in this benchmark set. The
101 instance objects each have a "result()" and "dry_result()" method for
102 accessing the numeric benchmark results.
103
104 box_plot
105 Returns a Dumbbench::BoxPlot instance. Note that you need SOOT
106 installed to use that module, but this does not require it as a
107 prerequisite since it's not a trivial installation.
108
109 A Dumbbench::BoxPlot is a nice an easy way to get a graphic chart if
110 you're in the mood instead of getting the same results from "report".
111
112 If you don't want to get into the details of Dumbbench::BoxPlot, you
113 can do:
114
115 # $bench is your Dumbbench instance
116 eval { $bench->box_plot->show };
117
119 Why it doesn't work and why we try regardless
120 Recall that the goal is to obtain a reliable estimate of the run-time
121 of a certain operation or command. Now, please realize that this is
122 impossible since the run-time of an operation may depend on many things
123 that can change rapidly: Modern CPUs change their frequency dynamically
124 depending on load. CPU caches may be invalidated at odd moments and
125 page faults provide less fine-grained distration. Naturally, OS
126 kernels will do weird things just to spite you. It's almost hopeless.
127
128 Since people (you, I, everybody!) insist on benchmarking anyway, this
129 is a best-effort at estimating the run-time. Naturally, it includes
130 estimating the uncertainty of the run time. This is extremely important
131 for comparing multiple benchmarks and that is usually the ultimate
132 goal. In order to get an estimate of the expectation value and its
133 uncertainty, we need a model of the underlying distribution:
134
135 A model for timing results
136 Let's take a step back and think about how the run-time of multiple
137 invocations of the same code will be distributed. Having a qualitative
138 idea what the distribution of many (MANY) measurements looks like is
139 extremely important for estimating the expectation value and
140 uncertainty from a sample of few measurements.
141
142 In a perfect, deterministic, single-tasking computer, we will get N
143 times the exact same timing. In the real world, there are at least a
144 million ways that this assumption is broken on a small scale. For each
145 run, the load of the computer will be slightly different. The content
146 of main memory and CPU caches may differ. All of these small effects
147 will make a given run a tiny bit slower or faster than any other.
148 Thankfully, this is a case where statistics (more precisely the Central
149 Limit Theorem) provides us with the qualitative result: The
150 measurements will be normally distributed (i.e. following a Gaussian
151 distribution) around some expectation value (which happens to be the
152 mean in this case). Good. Unfortunately, benchmarks are more evil than
153 that. In addition to the small-scale effects that smear the result,
154 there are things that (at the given run time of the benchmark) may be
155 large enough to cause a large jump in run time. Assuming these are
156 comparatively rare and typically cause extraordinarily long run-times
157 (as opposed to extraordinarily low run-times), we arrive at an overall
158 model of having a central, smooth-ish normal distribution with a few
159 outliers towards long run-times.
160
161 So in this model, if we perform "N" measurements, almost all "N" times
162 will be close to the expectation value and a fraction will be
163 significantly higher. This is troublesome because the outliers create
164 a bias in the uncertainty estimation and the asymmetry of the overall
165 distribution will bias a simple calculation of the mean.
166
167 What we would like to report to the user is the mean and uncertainty of
168 the main distribution while ignoring the outliers.
169
170 A robust estimation of the expectation value
171 Given the previously discussed model, we estimate the expectation value
172 with the following algorithm:
173
174 1)
175 Calculate the median of the whole distribution. The median is a
176 fairly robust estimator of the expectation value with respect to
177 outliers (assuming they're comparatively rare).
178
179 2)
180 Calculate the median-absolute-deviation from the whole distribution
181 (MAD, see wikipedia). The MAD needs rescaling to become a measure of
182 variability. The MAD will be our initial guess for an uncertainty.
183 Like the median, it is quite robust against outliers.
184
185 3)
186 We use the median and MAD to remove the tails of our distribution.
187 All timings that deviate from the median by more than $X times the
188 MAD are rejected. This measure should cut outliers without
189 introducing much bias both in symmetric and asymmetric source
190 distributions.
191
192 An alternative would be to use an ordinary truncated mean (that is
193 the mean of all timings while disregarding the $N largest and $N
194 smallest results). But the truncated mean can produce a biased result
195 in asymmetric source distributions. The resulting expectation value
196 would be artificially increased.
197
198 In summary: Using the median as the initial guess for the expectation
199 value and the MAD as the guess for the variability keeps the bias
200 down in the general case.
201
202 4)
203 Finally, the use the mean of the truncated distribution as the
204 expectation value and the MAD of the truncated distribution as a
205 measure of variability. To get the uncertainty on the expectation
206 value, we take "MAD / sqrt($N)" where $N is the number of remaining
207 measurements.
208
209 Conclusion
210 I hope I could convince you that interpreting less sophisticated
211 benchmarks is a dangerous if not futile exercise. The reason this
212 module exists is that not everybody is willing to go through such
213 contortions to arrive at a reliable conclusion, but everybody loves
214 benchmarking. So let's at least get the basics right. Do not compare
215 raw timings of meaningless benchmarks but robust estimates of the run
216 time of meaningless benchmarks instead.
217
219 Dumbbench::Instance, Dumbbench::Instance::Cmd,
220 Dumbbench::Instance::PerlEval, Dumbbench::Instance::PerlSub,
221 Dumbbench::Result
222
223 Benchmark
224
225 Number::WithError does the Gaussian error propagation.
226
227 <http://en.wikipedia.org/wiki/Median_absolute_deviation>
228
230 Steffen Mueller, <smueller@cpan.org>
231
233 Copyright (C) 2010, 2011, 2012, 2013 by Steffen Mueller
234
235 This library is free software; you can redistribute it and/or modify it
236 under the same terms as Perl itself, either Perl version 5.8.1 or, at
237 your option, any later version of Perl 5 you may have available.
238
239
240
241perl v5.36.0 2022-07-22 Dumbbench(3)