Dumbbench(3pm)

1Dumbbench(3)          User Contributed Perl Documentation         Dumbbench(3)
2
3
4

NAME

6       Dumbbench - More reliable benchmarking with the least amount of
7       thinking
8

SYNOPSIS

10       Command line interface: (See "dumbbench --help")
11
12         dumbbench -p 0.005 -- ./testprogram --testprogramoption
13
14       This will start churning for a while and then prints something like:
15
16         Ran 23 iterations of the command.
17         Rejected 3 samples as outliers.
18         Rounded run time per iteration: 9.519e-01 +/- 3.7e-03 (0.4%)
19
20       As a module:
21
22         use Dumbbench;
23
24         my $bench = Dumbbench->new(
25           target_rel_precision => 0.005, # seek ~0.5%
26           initial_runs         => 20,    # the higher the more reliable
27         );
28         $bench->add_instances(
29           Dumbbench::Instance::Cmd->new(command => [qw(perl -e 'something')]),
30           Dumbbench::Instance::PerlEval->new(code => 'for(1..1e7){something}'),
31           Dumbbench::Instance::PerlSub->new(code => sub {for(1..1e7){something}}),
32         );
33         # (Note: Comparing the run of externals commands with
34         #  evals/subs probably isn't reliable)
35         $bench->run;
36         $bench->report;
37

DESCRIPTION

39       This module attempts to implement reasonably robust benchmarking with
40       little extra effort and expertise required from the user. That is to
41       say, benchmarking using this module is likely an improvement over
42
43         time some-command --to --benchmark
44
45       or
46
47         use Benchmark qw/timethis/;
48         timethis(1000, 'system("some-command", ...)');
49
50       The module currently works similar to the former command line, except
51       (in layman terms) it will run the command many times, estimate the
52       uncertainty of the result and keep iterating until a certain user-
53       defined precision has been reached. Then, it calculates the resulting
54       uncertainty and goes through some pain to discard bad runs and subtract
55       overhead from the timings. The reported timing includes an uncertainty,
56       so that multiple benchmarks can more easily be compared.
57
58       Please note that "Dumbbench" works entirely with wallclock time as
59       reported by "Time::HiRes"' "time()" function.
60

METHODS

62       In addition to the methods listed here, there are read-only accessors
63       for all named arguments of the constructor (which are also object
64       attributes).
65
66   new
67       Constructor that takes the following arguments (with defaults):
68
69         verbosity            => 0,     # 0, 1, or 2
70         target_rel_precision => 0.05,  # 5% target precision
71         target_abs_precision => 0,     # no target absolute precision (in s)
72         intial_runs          => 20,    # no. of guaranteed initial runs
73         max_iterations       => 10000, # hard max. no of iterations
74         variability_measure  => 'mad', # method for calculating uncertainty
75         outlier_rejection    => 3,     # no. of "sigma"s for the outlier rejection
76
77       "variability_measure" and "outlier_rejection" probably make sense after
78       reading "HOW IT WORKS" below. Setting "outlier_rejection" to 0 will
79       turn it off entirely.
80
81   add_instances
82       Takes one ore more instances of subclasses of Dumbbench::Instance as
83       argument. Each of those is one benchmark, really.  They are run in
84       sequence and reported separately.
85
86       Right now, there are the following "Dumbbench::Instance"
87       implementations: Dumbbench::Instance::Cmd for running/benchmarking
88       external commands, Dumbbench::Instance::PerlEval for
89       running/benchmarking Perl code in this same process using "eval", and
90       Dumbbench::Instance::PerlSub for running/benchmarking Perl code in this
91       same process using a subroutine reference.
92
93   run
94       Runs the dry-run and benchmark run.
95
96   report
97       Prints a short report about the benchmark results.
98
99   instances
100       Returns a list of all instance objects in this benchmark set.  The
101       instance objects each have a "result()" and "dry_result()" method for
102       accessing the numeric benchmark results.
103
104   box_plot
105       Returns a Dumbbench::BoxPlot instance.
106
107       A Dumbbench::BoxPlot is a nice an easy way to get a graphic chart if
108       you're in the mood instead of getting the same results from "report".
109
110       If you don't want to get into the details of Dumbbench::BoxPlot, you
111       can do:
112
113         # $bench is your Dumbbench instance
114         $bench->box_plot->show;
115

HOW IT WORKS AND WHY IT DOESN'T

117   Why it doesn't work and why we try regardless
118       Recall that the goal is to obtain a reliable estimate of the run-time
119       of a certain operation or command. Now, please realize that this is
120       impossible since the run-time of an operation may depend on many things
121       that can change rapidly: Modern CPUs change their frequency dynamically
122       depending on load. CPU caches may be invalidated at odd moments and
123       page faults provide less fine-grained distration.  Naturally, OS
124       kernels will do weird things just to spite you. It's almost hopeless.
125
126       Since people (you, I, everybody!) insist on benchmarking anyway, this
127       is a best-effort at estimating the run-time. Naturally, it includes
128       estimating the uncertainty of the run time. This is extremely important
129       for comparing multiple benchmarks and that is usually the ultimate
130       goal. In order to get an estimate of the expectation value and its
131       uncertainty, we need a model of the underlying distribution:
132
133   A model for timing results
134       Let's take a step back and think about how the run-time of multiple
135       invocations of the same code will be distributed. Having a qualitative
136       idea what the distribution of many (MANY) measurements looks like is
137       extremely important for estimating the expectation value and
138       uncertainty from a sample of few measurements.
139
140       In a perfect, deterministic, single-tasking computer, we will get N
141       times the exact same timing. In the real world, there are at least a
142       million ways that this assumption is broken on a small scale. For each
143       run, the load of the computer will be slightly different. The content
144       of main memory and CPU caches may differ. All of these small effects
145       will make a given run a tiny bit slower or faster than any other.
146       Thankfully, this is a case where statistics (more precisely the Central
147       Limit Theorem) provides us with the qualitative result: The
148       measurements will be normally distributed (i.e. following a Gaussian
149       distribution) around some expectation value (which happens to be the
150       mean in this case).  Good. Unfortunately, benchmarks are more evil than
151       that. In addition to the small-scale effects that smear the result,
152       there are things that (at the given run time of the benchmark) may be
153       large enough to cause a large jump in run time. Assuming these are
154       comparatively rare and typically cause extraordinarily long run-times
155       (as opposed to extraordinarily low run-times), we arrive at an overall
156       model of having a central, smooth-ish normal distribution with a few
157       outliers towards long run-times.
158
159       So in this model, if we perform "N" measurements, almost all "N" times
160       will be close to the expectation value and a fraction will be
161       significantly higher.  This is troublesome because the outliers create
162       a bias in the uncertainty estimation and the asymmetry of the overall
163       distribution will bias a simple calculation of the mean.
164
165       What we would like to report to the user is the mean and uncertainty of
166       the main distribution while ignoring the outliers.
167
168   A robust estimation of the expectation value
169       Given the previously discussed model, we estimate the expectation value
170       with the following algorithm:
171
172       1)
173         Calculate the median of the whole distribution.  The median is a
174         fairly robust estimator of the expectation value with respect to
175         outliers (assuming they're comparatively rare).
176
177       2)
178         Calculate the median-absolute-deviation from the whole distribution
179         (MAD, see wikipedia). The MAD needs rescaling to become a measure of
180         variability. The MAD will be our initial guess for an uncertainty.
181         Like the median, it is quite robust against outliers.
182
183       3)
184         We use the median and MAD to remove the tails of our distribution.
185         All timings that deviate from the median by more than $X times the
186         MAD are rejected. This measure should cut outliers without
187         introducing much bias both in symmetric and asymmetric source
188         distributions.
189
190         An alternative would be to use an ordinary truncated mean (that is
191         the mean of all timings while disregarding the $N largest and $N
192         smallest results). But the truncated mean can produce a biased result
193         in asymmetric source distributions. The resulting expectation value
194         would be artificially increased.
195
196         In summary: Using the median as the initial guess for the expectation
197         value and the MAD as the guess for the variability keeps the bias
198         down in the general case.
199
200       4)
201         Finally, the use the mean of the truncated distribution as the
202         expectation value and the MAD of the truncated distribution as a
203         measure of variability.  To get the uncertainty on the expectation
204         value, we take "MAD / sqrt($N)" where $N is the number of remaining
205         measurements.
206
207   Conclusion
208       I hope I could convince you that interpreting less sophisticated
209       benchmarks is a dangerous if not futile exercise. The reason this
210       module exists is that not everybody is willing to go through such
211       contortions to arrive at a reliable conclusion, but everybody loves
212       benchmarking. So let's at least get the basics right.  Do not compare
213       raw timings of meaningless benchmarks but robust estimates of the run
214       time of meaningless benchmarks instead.
215

AUTHOR

228       Steffen Mueller, <smueller@cpan.org>
229

COPYRIGHT AND LICENSE

231       Copyright (C) 2010, 2011, 2012, 2013 by Steffen Mueller
232
233       This library is free software; you can redistribute it and/or modify it
234       under the same terms as Perl itself, either Perl version 5.8.1 or, at
235       your option, any later version of Perl 5 you may have available.
236
237
238
239perl v5.32.1                      2021-01-27                      Dumbbench(3)