dieharder(1)               General Commands Manual               dieharder(1)


NAME
       dieharder - A testing and benchmarking tool for random number
       generators.


SYNOPSIS
       dieharder [-a] [-d dieharder test number] [-f filename] [-B]
            [-D output flag [-D output flag] ... ] [-F] [-c separator]
            [-g generator number or -1] [-h] [-k ks_flag] [-l]
            [-L overlap] [-m multiply_p] [-n ntuple]
            [-p number of p samples] [-P Xoff]
            [-o filename] [-s seed strategy] [-S random number seed]
            [-t number of test samples] [-v verbose flag]
            [-W weak] [-X fail] [-Y Xtrategy]
            [-x xvalue] [-y yvalue] [-z zvalue]

OPTIONS
       -a     runs all the tests with standard/default options to create a
              user-controllable report.  To control the formatting of the
              report, see -D below.  To control the power of the test (which
              uses default values for tsamples that cannot generally be
              varied and psamples which generally can) see -m below as a
              "multiplier" of the default number of psamples (used only in a
              -a run).

       -d test number - selects a specific dieharder test.

       -f filename - generators 201 or 202 permit either raw binary or
              formatted ASCII numbers to be read in from a file for testing.
              Generator 200 reads in raw binary numbers from stdin.  Note
              well: many tests with default parameters require a lot of
              rands!  To see a sample of the (required) header for ASCII
              formatted input, run

                 dieharder -o -f example.input -t 10

              and then examine the contents of example.input.  Raw binary
              input reads 32 bit increments of the specified data stream.
              stdin_input_raw accepts a pipe from a raw binary stream.

       -B     binary mode (used with -o below) causes output rands to be
              written in raw binary, not formatted ascii.

       -D output flag - permits fields to be selected for inclusion in
              dieharder output.  Each flag can be entered as a binary number
              that turns on a specific output field or header or by flag
              name; flags are aggregated.  To see all currently known flags
              use the -F command.

       -F     lists all known flags by name and number.

       -c table separator - where separator is e.g. ',' (CSV) or ' '
              (whitespace).

       -g generator number - selects a specific generator for testing.
              Using -g -1 causes all known generators to be printed out to
              the display.

       -h     prints context-sensitive help -- usually Usage (this message)
              or a test synopsis if entered as e.g. dieharder -d 3 -h.

       -k ks_flag - where ks_flag is one of:

              0 is fast but slightly sloppy for psamples > 4999 (default).

              1 is MUCH slower but more accurate for larger numbers of
              psamples.

              2 is slower still, but (we hope) accurate to machine precision
              for any number of psamples up to some as yet unknown numerical
              upper limit (it has been tested out to at least hundreds of
              thousands).

              3 is kuiper ks, fast, quite inaccurate for small samples,
              deprecated.

       -l     list all known tests.

       -L overlap

              1 (use overlap, default)

              0 (don't use overlap)

              in operm5 or other tests that support overlapping and
              non-overlapping sample modes.

       -m multiply_p - multiply default # of psamples in -a(ll) runs to
              crank up the resolution of failure.

       -n ntuple - set ntuple length for tests on short bit strings that
              permit the length to be varied (e.g. rgb bitdist).

       -o filename - output -t count random numbers from current generator
              to file.

       -p count - sets the number of p-value samples per test (default 100).

       -P Xoff - sets the number of psamples that will accumulate before
              deciding that a generator is "good" and really, truly passes
              even a -Y 2 T2D run.  Currently the default is 100000;
              eventually it will be set from AES-derived T2D test failure
              thresholds for fully automated reliable operation, but for now
              it is more a "boredom" threshold set by how long one might
              reasonably want to wait on any given test run.

       -S seed - where seed is a uint.  Overrides the default random seed
              selection.  Ignored for file or stdin input.

       -s strategy - if strategy is the (default) 0, dieharder reseeds (or
              rewinds) once at the beginning when the random number
              generator is selected and then never again.  If strategy is
              nonzero, the generator is reseeded or rewound at the beginning
              of EACH TEST.  If -S seed was specified, or a file is used,
              this means every test is applied to the same sequence (which
              is useful for validation and testing of dieharder, but not a
              good way to test rngs).  Otherwise a new random seed is
              selected for each test.

       -t count - sets the number of random entities used in each test,
              where possible.  Be warned -- some tests have fixed sample
              sizes; others are variable but have practical minimum sizes.
              It is suggested you begin with the values used in -a and
              experiment carefully on a test by test basis.

       -W weak - sets the "weak" threshold to make the test(s) more or less
              forgiving during e.g. a test-to-destruction run.  Default is
              currently 0.005.

       -X fail - sets the "fail" threshold to make the test(s) more or less
              forgiving during e.g. a test-to-destruction run.  Default is
              currently 0.000001, which is basically "certain failure of the
              null hypothesis", the desired mode of reproducible generator
              failure.

       -Y Xtrategy - the Xtrategy flag controls the new "test to failure"
              (T2F) modes.  These flags and their modes act as follows:

              0 - just run dieharder with the specified number of tsamples
              and psamples, do not dynamically modify a run based on
              results.  This is the way it has always run, and is the
              default.

              1 - "resolve ambiguity" (RA) mode.  If a test returns "weak",
              this is an undesired result.  What does that mean, after all?
              If you run a long test series, you will see occasional weak
              returns for a perfect generator because p is uniformly
              distributed and will appear in any finite interval from time
              to time.  Even if a test run returns more than one weak
              result, you cannot be certain that the generator is failing.
              RA mode adds psamples (usually in blocks of 100) until the
              test result ends up solidly not weak or proceeds to
              unambiguous failure.  This is morally equivalent to running
              the test several times to see if a weak result is
              reproducible, but eliminates the bias of personal judgement in
              the process since the default failure threshold is very small
              and very unlikely to be reached by random chance even in many
              runs.

              This option should only be used with -k 2.

166 2 - "test to destruction" mode. Sometimes you just want to
167 know where or if a generator will .I ever fail a test (or test
168 series). -Y 2 causes psamples to be added 100 at a time until a
169 test returns an overall pvalue lower than the failure threshold
170 or a specified maximum number of psamples (see -P) is reached.
171
172 Note well! In this mode one may well fail due to the alternate
173 null hypothesis -- the test itself is a bad test and fails!
174 Many dieharder tests, despite our best efforts, are numerically
175 unstable or have only approximately known target statistics or
176 are straight up asymptotic results, and will eventually return a
177 failing result even for a gold-standard generator (such as AES),
178 or for the hypercautious the XOR generator with AES, threefish,
179 kiss, all loaded at once and xor'd together. It is therefore
180 safest to use this mode .I comparatively, executing a T2D run on
181 AES to get an idea of the test failure threshold(s) (something I
182 will eventually do and publish on the web so everybody doesn't
183 have to do it independently) and then running it on your target
184 generator. Failure with numbers of psamples within an order of
185 magnitude of the AES thresholds should probably be considered
186 possible test failures, not generator failures. Failures at
187 levels significantly less than the known gold standard generator
188 failure thresholds are, of course, probably failures of the gen‐
189 erator.
190
191 This option should only be used with -k 2.
192
       -v verbose flag -- controls the verbosity of the output for debugging
              only.  Probably of little use to non-developers, and
              developers can read the enum(s) in dieharder.h and the test
              sources to see which flag values turn on output in which
              routines.  A value of 1 results in a highly detailed trace of
              program activity.

       -x,-y,-z number - Some tests have parameters that can safely be
              varied from their default value.  For example, in the diehard
              birthdays test, one can vary the number of "birthdays" drawn
              from a "year" of some length, which can also be varied.
              -x 2048 -y 30 alters these two values but should still run
              fine.  These parameters should be documented internally (where
              they exist) in the e.g. -d 0 -h visible notes.

       NOTE WELL: The assessment(s) for the rngs may, in fact, be completely
              incorrect or misleading.  There are still "bad tests" in
              dieharder, although we are working to fix and improve them
              (and try to document them in the test descriptions visible
              with -d testnumber -h).  In particular, 'Weak' pvalues should
              occur one test in two hundred, and 'Failed' pvalues should
              occur one test in a million with the default thresholds -
              that's what p MEANS.  Use them at your Own Risk!  Be Warned!

              Or better yet, use the new -Y 1 and -Y 2 resolve ambiguity or
              test to destruction modes above, comparing to similar runs on
              one of the as-good-as-it-gets cryptographic generators, AES or
              threefish.


DESCRIPTION
       dieharder

       Welcome to the current snapshot of the dieharder random number
       tester.  It encapsulates all of the Gnu Scientific Library (GSL)
       random number generators (rngs) as well as a number of generators
       from the R statistical library, hardware sources such as
       /dev/*random, "gold standard" cryptographic quality generators
       (useful for testing dieharder and for purposes of comparison to new
       generators) as well as generators contributed by users or found in
       the literature, into a single harness that can time them and subject
       them to various tests for randomness.  These tests are variously
       drawn from George Marsaglia's "Diehard battery of random number
       tests", the NIST Statistical Test Suite, and again from other sources
       such as personal invention, user contribution, other (open source)
       test suites, or the literature.

       The primary point of dieharder is to make it easy to time and test
       (pseudo)random number generators, including both software and
       hardware rngs, with a fully open source tool.  In addition to
       providing "instant" access to testing of all built-in generators,
       users can choose one of three ways to test their own random number
       generators or sources: a unix pipe of a raw binary (presumed random)
       bitstream; a file containing a (presumed random) raw binary bitstream
       or formatted ascii uints or floats; and embedding your generator in
       dieharder's GSL-compatible rng harness and adding it to the list of
       built-in generators.  The stdin and file input methods are described
       below in their own section, as is suggested "best practice" for
       newbies to random number generator testing.

       An important motivation for using dieharder is that the entire test
       suite is fully Gnu Public License (GPL) open source code and hence,
       rather than being prohibited from "looking underneath the hood", all
       users are openly encouraged to critically examine the dieharder code
       for errors, add new tests or generators or user interfaces, or use it
       freely as is to test their own favorite candidate rngs, subject only
       to the constraints of the GPL.  As a result of its openness,
       literally hundreds of improvements and bug fixes have been
       contributed by users to date, resulting in a far stronger and more
       reliable test suite than would have been possible with closed and
       locked down sources or even open sources (such as STS) that lack the
       dynamical feedback mechanism permitting corrections to be shared.

       Even small errors in test statistics permit the alternative (usually
       unstated) null hypothesis to become an important factor in rng
       testing -- the unwelcome possibility that your generator is just fine
       but it is the test that is failing.  One extremely useful feature of
       dieharder is that it is at least moderately self validating.  Using
       the "gold standard" aes and threefish cryptographic generators, you
       can observe how these generators perform on dieharder runs to the
       same general degree of accuracy that you wish to use on the
       generators you are testing.  In general, dieharder tests that
       consistently fail at any given level of precision (selected with e.g.
       -a -m 10) on both of the gold standard rngs (and/or the better GSL
       generators, mt19937, gfsr4, taus) are probably unreliable at that
       precision and it would hardly be surprising if they failed your
       generator as well.

       Experts in statistics are encouraged to give the suite a try, perhaps
       using any of the example calls below at first and then using it
       freely on their own generators or as a harness for adding their own
       tests.  Novices (to either statistics or random number generator
       testing) are strongly encouraged to read the next section on p-values
       and the null hypothesis and to run the test suite a few times with a
       more verbose output report to learn how the whole thing works.


QUICK START
       Examples of how to set up pipe or file input are given below.
       However, it is recommended that a user play with some of the built-in
       generators to gain familiarity with dieharder reports and tests
       before tackling their own favorite generator or file full of possibly
       random numbers.

       To see dieharder's default standard test report for its default
       generator (mt19937) simply run:

          dieharder -a

       To increase the resolution of possible failures in the standard
       -a(ll) test, use the -m "multiplier" for each test's default number
       of pvalues (which are selected more to make a full test run take an
       hour or so instead of days than because they form a truly exhaustive
       test sequence):

          dieharder -a -m 10

       To test a different generator (say the gold standard AES_OFB) simply
       specify the generator on the command line with a flag:

          dieharder -g 205 -a -m 10

       Arguments can be in any order.  The generator can also be selected by
       name:

          dieharder -g AES_OFB -a

       To apply only the diehard opso test to the AES_OFB generator, specify
       the test by name or number:

          dieharder -g 205 -d 5

       or

          dieharder -g 205 -d diehard_opso

       Nearly every aspect or field in dieharder's output report format is
       user-selectable by means of display option flags.  In addition, the
       field separator character can be selected by the user to make the
       output particularly easy for them to parse (-c ' ') or import into a
       spreadsheet (-c ',').  Try:

          dieharder -g 205 -d diehard_opso -c ',' -D test_name -D pvalues

       to see an extremely terse, easy to import report or

          dieharder -g 205 -d diehard_opso -c ' ' -D default -D histogram -D description

       to see a verbose report good for a "beginner" that includes a full
       description of each test itself.

       Finally, the dieharder binary is remarkably autodocumenting even if
       the man page is not available.  All users should try the following
       commands to see what they do:

          dieharder -h

       (prints the command synopsis like the one above).

          dieharder -a -h
          dieharder -d 6 -h

       (prints the test descriptions only for -a(ll) tests or for the
       specific test indicated).

          dieharder -l

       (lists all known tests, including how reliable rgb thinks that they
       are as things stand).

          dieharder -g -1

       (lists all known rngs).

          dieharder -F

       (lists all the currently known display/output control flags used with
       -D).

       Both beginners and experts should be aware that the assessment
       provided by dieharder in its standard report should be regarded with
       great suspicion.  It is entirely possible for a generator to "pass"
       all tests as far as their individual p-values are concerned and yet
       to fail utterly when considering them all together.  Similarly, it is
       probable that a rng will at the very least show up as "weak" on 0, 1
       or 2 tests in a typical -a(ll) run, and may even "fail" 1 test in one
       such run out of 10 or so.  To understand why this is so, it is
       necessary to understand something of rng testing, p-values, and the
       null hypothesis!


P-VALUES AND THE NULL HYPOTHESIS
       dieharder returns "p-values".  To understand what a p-value is and
       how to use it, it is essential to understand the null hypothesis, H0.

       The null hypothesis for random number generator testing is "This
       generator is a perfect random number generator, and for any choice of
       seed produces an infinitely long, unique sequence of numbers that
       have all the expected statistical properties of random numbers, to
       all orders".  Note well that we know that this hypothesis is
       technically false for all software generators as they are periodic
       and do not have the correct entropy content for this statement to
       ever be true.  However, many hardware generators fail a priori as
       well, as they contain subtle bias or correlations due to the
       deterministic physics that underlies them.  Nature is often
       unpredictable but it is rarely random and the two words don't (quite)
       mean the same thing!

       The null hypothesis can be practically true, however.  Both software
       and hardware generators can be "random" enough that their sequences
       cannot be distinguished from random ones, at least not easily or with
       the available tools (including dieharder!).  Hence the null
       hypothesis is a practical, not a theoretically pure, statement.

       To test H0, one uses the rng in question to generate a sequence of
       presumably random numbers.  Using these numbers one can generate any
       one of a wide range of test statistics -- empirically computed
       numbers that, if H0 holds, are random samples drawn from a known
       distribution (and that may or may not be covariant, depending on
       whether overlapping sequences of random numbers are used to generate
       successive samples while generating the statistic(s)).  From a
       knowledge of the target distribution of the statistic(s) and the
       associated cumulative distribution function (CDF) and the empirical
       value of the randomly generated statistic(s), one can read off the
       probability of obtaining the empirical result if the sequence was
       truly random, that is, if the null hypothesis is true and the
       generator in question is a "good" random number generator!  This
       probability is the "p-value" for the particular test run.

       For example, to test a coin (or a sequence of bits) we might simply
       count the number of heads and tails in a very long string of flips.
       If we assume that the coin is a "perfect coin", we expect the number
       of heads and tails to be binomially distributed and can easily
       compute the probability of getting any particular number of heads and
       tails.  If we compare our recorded number of heads and tails from the
       test series to this distribution and find that the probability of
       getting the count we obtained is very low with, say, way more heads
       than tails, we'd suspect the coin wasn't a perfect coin.  dieharder
       applies this very test (made mathematically precise) and many others
       that operate on this same principle to the string of random bits
       produced by the rng being tested to provide a picture of how "random"
       the rng is.

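       The following is a minimal, illustrative sketch of such a coin test,
       not dieharder's actual implementation: it counts the one bits
       ("heads") in a stream of raw 32 bit words read from stdin and
       converts the count into a two-sided p-value using the normal
       approximation to the binomial distribution.  The file name and usage
       line are hypothetical.

          /* coin_test.c - illustrative only; build with: cc -O2 coin_test.c -lm
           * Example (assumed): cat /dev/urandom | head -c 4000000 | ./a.out   */
          #include <stdio.h>
          #include <stdint.h>
          #include <math.h>

          int main(void)
          {
             uint32_t w;
             double heads = 0.0, n = 0.0;

             /* Count the 1 bits ("heads") in every 32-bit word read from stdin. */
             while (fread(&w, sizeof(w), 1, stdin) == 1) {
                for (int b = 0; b < 32; b++) heads += (w >> b) & 1u;
                n += 32.0;
             }
             if (n == 0.0) return 1;

             /* Under H0 the head count is binomial(n, 1/2); for large n this is
              * approximately normal with mean n/2 and variance n/4.            */
             double z = (heads - 0.5 * n) / sqrt(0.25 * n);
             double p = erfc(fabs(z) / sqrt(2.0));   /* two-sided p-value */

             printf("n = %.0f bits, heads = %.0f, z = %f, p = %f\n", n, heads, z, p);
             return 0;
          }
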
       Note that the usual dogma is that if the p-value is low -- typically
       less than 0.05 -- one "rejects" the null hypothesis.  In a word, it
       is improbable that one would get the result obtained if the generator
       is a good one.  If it is any other value, one does not "accept" the
       generator as good, one "fails to reject" the generator as bad for
       this particular test.  A "good random number generator" is hence one
       that we haven't been able to make fail yet!

       This criterion is, of course, naive in the extreme and cannot be used
       with dieharder!  It makes just as much sense to reject a generator
       that has p-values of 0.95 or more!  Both of these p-value ranges are
       equally unlikely on any given test run, and should be returned for
       (on average) 5% of all test runs by a perfect random number
       generator.  A generator that fails to produce p-values less than 0.05
       5% of the time it is tested with different seeds is a bad random
       number generator, one that fails the test of the null hypothesis.
       Since dieharder returns over 100 pvalues by default per test, one
       would expect any perfectly good rng to "fail" such a naive test
       around five times by this criterion in a single dieharder run!

       The p-values themselves, as it turns out, are test statistics!  By
       their nature, p-values should be uniformly distributed on the range
       0-1.  In 100+ test runs with independent seeds, one should not be
       surprised to obtain 0, 1, 2, or even (rarely) 3 p-values less than
       0.01.  On the other hand, obtaining 7 p-values in the range
       0.24-0.25, or seeing that 70 of the p-values are greater than 0.5,
       should make the generator highly suspect!  How can a user determine
       when a test is producing "too many" of any particular value range
       for p?  Or too few?

       Dieharder does it for you, automatically.  One can in fact convert a
       set of p-values into a p-value by comparing their distribution to the
       expected one, using a Kolmogorov-Smirnov test against the expected
       uniform distribution of p.

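       A minimal sketch of that reduction step follows.  It is not
       dieharder's internal kstest routine; it computes the standard
       one-sample Kolmogorov-Smirnov statistic of a set of p-values against
       the uniform distribution on [0,1] and converts it to a single p-value
       with the usual asymptotic series (Stephens' finite-n correction).
       The function names are illustrative.

          /* ks_uniform.c - one-sample KS test of p-values vs uniform(0,1).
           * Illustrative only; build with: cc -O2 ks_uniform.c -lm          */
          #include <stdio.h>
          #include <stdlib.h>
          #include <math.h>

          static int cmp_double(const void *a, const void *b)
          {
             double x = *(const double *)a, y = *(const double *)b;
             return (x > y) - (x < y);
          }

          /* Asymptotic Kolmogorov distribution:
           * Q(lambda) = 2 * sum_{j>=1} (-1)^(j-1) exp(-2 j^2 lambda^2)      */
          static double ks_q(double lambda)
          {
             double sum = 0.0, sign = 1.0;
             for (int j = 1; j <= 100; j++) {
                double term = exp(-2.0 * j * j * lambda * lambda);
                sum += sign * term;
                sign = -sign;
                if (term < 1e-12) break;
             }
             double q = 2.0 * sum;
             return (q > 1.0) ? 1.0 : (q < 0.0 ? 0.0 : q);
          }

          /* p-value of the hypothesis that pv[0..n-1] are uniform on [0,1]. */
          static double kstest_uniform(double *pv, int n)
          {
             qsort(pv, n, sizeof(double), cmp_double);
             double d = 0.0;
             for (int i = 0; i < n; i++) {
                double lo = pv[i] - (double)i / n;        /* EDF just below pv[i] */
                double hi = (double)(i + 1) / n - pv[i];  /* EDF just above pv[i] */
                if (lo > d) d = lo;
                if (hi > d) d = hi;
             }
             double sn = sqrt((double)n);
             return ks_q((sn + 0.12 + 0.11 / sn) * d);
          }

          int main(void)
          {
             double pv[4096];
             int n = 0;
             while (n < 4096 && scanf("%lf", &pv[n]) == 1) n++;
             if (n > 0)
                printf("KS p-value of %d p-values vs uniform: %f\n",
                       n, kstest_uniform(pv, n));
             return 0;
          }
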
       These p-values obtained from looking at the distribution of p-values
       should in turn be uniformly distributed and could in principle be
       subjected to still more KS tests in aggregate.  The distribution of
       p-values for a good generator should be idempotent, even across
       different test statistics and multiple runs.

       A failure of the distribution of p-values at any level of aggregation
       signals trouble.  In fact, if the p-values of any given test are
       subjected to a KS test, and those p-values are then subjected to a KS
       test, as we add more p-values to either level we will either observe
       idempotence of the resulting distribution of p to uniformity, or we
       will observe idempotence to a single p-value of zero!  That is, a
       good generator will produce a roughly uniform distribution of
       p-values, in the specific sense that the p-values of the
       distributions of p-values are themselves roughly uniform and so on ad
       infinitum, while a bad generator will produce a non-uniform
       distribution of p-values, and as more p-values drawn from the
       non-uniform distribution are added to its KS test, at some point the
       failure will be absolutely unmistakeable as the resulting p-value
       approaches 0 in the limit.  Trouble indeed!

       The question is, trouble with what?  Random number tests are
       themselves complex computational objects, and there is a probability
       that their code is incorrectly framed or that roundoff or other
       numerical -- not methodological -- errors are contributing to a
       distortion of the distribution of some of the p-values obtained.
       This is not an idle observation; when one works on writing random
       number generator testing programs, one is always testing the tests
       themselves with "good" (we hope) random number generators so that
       egregious failures of the null hypothesis signal not a bad generator
       but an error in the test code.  The null hypothesis above is
       correctly framed from a theoretical point of view, but from a real
       and practical point of view it should read: "This generator is a
       perfect random number generator, and for any choice of seed produces
       an infinitely long, unique sequence of numbers that have all the
       expected statistical properties of random numbers, to all orders, and
       this test is a perfect test and returns precisely correct p-values
       from the test computation."  Observed "failure" of this joint null
       hypothesis H0' can come from failure of either or both of these
       disjoint components, and comes from the second as often or more often
       than the first during the test development process.  When one cranks
       up the "resolution" of the test (discussed next) to where a generator
       starts to fail some test, one realizes, or should realize, that
       development never ends and that new test regimes will always reveal
       new failures not only of the generators but of the code.

       With that said, one of dieharder's most significant advantages is the
       control that it gives you over a critical test parameter.  From the
       remarks above, we can see that we should feel very uncomfortable
       about "failing" any given random number generator on the basis of a
       5%, or even a 1%, criterion, especially when we apply a test suite
       like dieharder that returns over 100 (and climbing) distinct test
       p-values as of the last snapshot.  We want failure to be unambiguous
       and reproducible!

       To accomplish this, one can simply crank up dieharder's resolution.
       If we ran any given test against a random number generator and it
       returned a p-value of (say) 0.007328, we'd be perfectly justified in
       wondering if it is really a good generator.  However, the probability
       of getting this result isn't really all that small -- when one uses
       dieharder for hours at a time numbers like this will definitely
       happen quite frequently and mean nothing.  If one runs the same test
       again (with a different seed or part of the random sequence) and gets
       a p-value of 0.009122, and a third time and gets 0.002669 -- well,
       that's three 1% (or less) shots in a row and that should happen only
       one in a million times.  One way to clearly resolve failures, then,
       is to increase the number of p-values generated in a test run.  If
       the actual distribution of p being returned by the test is not
       uniform, a KS test will eventually return a p-value that is not some
       ambiguous 0.035517 but is instead 0.000000, with the latter produced
       time after time as we rerun.

       For this reason, dieharder is extremely conservative about announcing
       rng "weakness" or "failure" relative to any given test.  Its internal
       criteria for these things are currently p < 0.5% or p > 99.5% for
       weakness (at the 1% level total) and a considerably more stringent
       criterion for failure: p < 0.05% or p > 99.95%.  Note well that the
       ranges are symmetric -- too high a value of p is just as bad (and
       unlikely) as too low, and it is critical to flag it, because it is
       quite possible for a rng to be too good, on average, and not to
       produce enough low p-values on the full spectrum of dieharder tests.
       This is where the final kstest is of paramount importance, and where
       the "histogram" option can be very useful to help you visualize the
       failure in the distribution of p -- run e.g.:

          dieharder [whatever] -D default -D histogram

       and you will see a crude ascii histogram of the pvalues that failed
       (or passed) any given level of test.

       Scattered reports of weakness or marginal failure in a preliminary
       -a(ll) run should therefore not be immediate cause for alarm.
       Rather, they are tests to repeat, to watch out for, to push the rng
       harder on using the -m option to -a or simply increasing -p for a
       specific test.  Dieharder permits one to increase the number of
       p-values generated for any test, subject only to the availability of
       enough random numbers (for file based tests) and time, to make
       failures unambiguous.  A test that is truly weak at -p 100 will
       almost always fail egregiously at some larger value of psamples, be
       it -p 1000 or -p 100000.  However, because dieharder is a research
       tool and is under perpetual development and testing, it is strongly
       suggested that one always consider the alternative null hypothesis --
       that the failure is a failure of the test code in dieharder itself in
       some limit of large numbers -- and take at least some steps (such as
       running the same test at the same resolution on a "gold standard"
       generator) to ensure that the failure is indeed probably in the rng
       and not the dieharder code.

       Lacking a source of perfect random numbers to use as a reference,
       validating the tests themselves is not easy and always leaves one
       with some ambiguity (even aes or threefish).  During development the
       best one can usually do is to rely heavily on these "presumed good"
       random number generators.  There are a number of generators that we
       have theoretical reasons to expect to be extraordinarily good and to
       lack correlations out to some known underlying dimensionality, and
       that also test out extremely well quite consistently.  By using
       several such generators and not just one, one can hope that those
       generators have (at the very least) different correlations and should
       not all uniformly fail a test in the same way and with the same
       number of p-values.  When all of these generators consistently fail a
       test at a given level, I tend to suspect that the problem is in the
       test code, not the generators, although it is very difficult to be
       certain, and many errors in dieharder's code have been discovered and
       ultimately fixed in just this way by myself or others.

       One advantage of dieharder is that it has a number of these "good
       generators" immediately available for comparison runs, courtesy of
       the Gnu Scientific Library and user contribution (notably David
       Bauer, who kindly encapsulated aes and threefish).  I use AES_OFB,
       Threefish_OFB, mt19937_1999, gfsr4, ranlxd2 and taus2 (as well as
       "true random" numbers from random.org) for this purpose, and I try to
       ensure that dieharder will "pass" in particular the -g 205 -S 1 -s 1
       generator at any reasonable p-value resolution out to -p 1000 or
       farther.

       Tests (such as the diehard operm5 and sums test) that consistently
       fail at these high resolutions are flagged as being "suspect" --
       possible failures of the alternative null hypothesis -- and they are
       strongly deprecated!  Their results should not be used to test random
       number generators pending agreement in the statistics and random
       number community that those tests are in fact valid and correct, so
       that observed failures can indeed safely be attributed to a failure
       of the intended null hypothesis.

       As I keep emphasizing (for good reason!) dieharder is community
       supported.  I therefore openly ask that users of dieharder who are
       expert in statistics help me fix the code or algorithms being
       implemented.  I would like to see this test suite ultimately be
       validated by the general statistics community in hard use in an open
       environment, where every possible failure of the testing mechanism
       itself is subject to scrutiny and eventual correction.  In this way
       we will eventually achieve a very powerful suite of tools indeed,
       ones that may well give us very specific information not just about
       failure but about the mode of failure as well, just how the sequence
       tested deviates from randomness.

       Thus far, dieharder has benefitted tremendously from the community.
       Individuals have openly contributed tests, new generators to be
       tested, and fixes for existing tests that were revealed by their own
       work with the testing instrument.  Efforts are underway to make
       dieharder more portable so that it will build on more platforms and
       faster so that more thorough testing can be done.  Please feel free
       to participate.


FILE INPUT
       The simplest way to use dieharder with an external generator that
       produces raw binary (presumed random) bits is to pipe the raw binary
       output from this generator (presumed to be a binary stream of 32 bit
       unsigned integers) directly into dieharder, e.g.:

          cat /dev/urandom | ./dieharder -a -g 200

       Go ahead and try this example.  It will run the entire dieharder
       suite of tests on the stream produced by the linux built-in generator
       /dev/urandom (using /dev/random is not recommended as it is too slow
       to test in a reasonable amount of time).

       Alternatively, dieharder can be used to test files of numbers
       produced by a candidate random number generator:

          dieharder -a -g 201 -f random.org_bin

       for raw binary input or

          dieharder -a -g 202 -f random.org.txt

       for formatted ascii input.

       A formatted ascii input file can accept either uints (integers in the
       range 0 to 2^31-1, one per line) or decimal uniform deviates with at
       least ten significant digits (that can be multiplied by 2^32 to
       produce a uint without dropping precision), also one per line.
       Floats with fewer digits will almost certainly fail bitlevel tests,
       although they may pass some of the tests that act on uniform
       deviates.

       Finally, one can fairly easily wrap any generator in the same (GSL)
       random number harness used internally by dieharder and simply test it
       the same way one would any other internal generator recognized by
       dieharder.  This is strongly recommended where it is possible,
       because dieharder needs to use a lot of random numbers to thoroughly
       test a generator.  A built-in generator can simply let dieharder
       determine how many it needs and generate them on demand, whereas a
       file that is too small will "rewind", rendering the results of any
       test where a rewind occurs suspect.

       Note well that file input rands are delivered to the tests on demand,
       but if the test needs more than are available it simply rewinds the
       file and cycles through it again, and again, and again as needed.
       Obviously this significantly reduces the sample space and can lead to
       completely incorrect results for the p-value histograms unless there
       are enough rands to run EACH test without repetition (it is harmless
       to reuse the sequence for different tests).  Let the user beware!


       A frequently asked question from new users wishing to test a
       generator they are working on for fun or profit (or both) is "How
       should I get its output into dieharder?"  This is a nontrivial
       question, as dieharder consumes enormous numbers of random numbers in
       a full test cycle, and then there are features like -m 10 or -m 100
       that let one effortlessly demand 10 or 100 times as many to stress a
       new generator even more.

       Even with large file support in dieharder, it is difficult to provide
       enough random numbers in a file to really make dieharder happy.  It
       is therefore strongly suggested that you either:

       a) Edit the output stage of your random number generator and get it
       to write its production to stdout as a random bit stream -- basically
       create 32 bit unsigned random integers and write them directly to
       stdout as e.g. char data or raw binary.  Note that this is not the
       same as writing raw floating point numbers (which will not be random
       at all as a bitstream) and that "endianness" of the uints should not
       matter for the null hypothesis of a "good" generator, as random bytes
       are random in any order.  Crank the generator and feed this stream to
       dieharder in a pipe as described above (a minimal sketch of such a
       writer appears after this list).

       b) Use the samples of GSL-wrapped dieharder rngs to similarly wrap
       your generator (or calls to your generator's hardware interface).
       Follow the examples in the ./dieharder source directory to add it as
       a "user" generator in the command line interface, rebuild, and invoke
       the generator as a "native" dieharder generator (it should appear in
       the list produced by -g -1 when done correctly).  The advantage of
       doing it this way is that you can then (if your new generator is
       highly successful) contribute it back to the dieharder project if you
       wish!  Not to mention the fact that it makes testing it very easy (a
       sketch of the GSL side of such a wrapper appears below).

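       As an illustration of option a), here is a minimal, hypothetical
       stand-alone writer.  The xorshift32 routine is only a placeholder for
       "your generator"; everything else is plain C that writes raw 32 bit
       unsigned integers to stdout forever so that dieharder can read as
       many as it needs.

          /* rawgen.c - placeholder generator piped into dieharder as raw binary.
           * Build:  cc -O2 rawgen.c -o rawgen
           * Use:    ./rawgen | dieharder -g 200 -a                              */
          #include <stdio.h>
          #include <stdint.h>

          /* Stand-in for your generator: replace with your own routine. */
          static uint32_t xorshift32(uint32_t *state)
          {
             uint32_t x = *state;
             x ^= x << 13;
             x ^= x >> 17;
             x ^= x << 5;
             return *state = x;
          }

          int main(void)
          {
             uint32_t state = 2463534242u;   /* any nonzero seed */
             uint32_t buf[1024];

             for (;;) {
                for (int i = 0; i < 1024; i++)
                   buf[i] = xorshift32(&state);
                if (fwrite(buf, sizeof(uint32_t), 1024, stdout) != 1024)
                   return 0;   /* downstream pipe closed */
             }
          }
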
       Most users will probably go with option a) at least initially, but be
       aware that b) is probably easier than you think.  The dieharder
       maintainers may be able to give you a hand with it if you get into
       trouble, but no promises.

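       For option b), the GSL side of the wrapper looks roughly like the
       sketch below.  The names (mygen, mygen_state_t) and the xorshift
       update are placeholders, and the dieharder-specific step of
       registering the new type in the command line interface is not shown;
       it follows the examples in the source tree mentioned above.

          /* mygen_gsl.c - sketch of a GSL rng type wrapping "your" generator. */
          #include <stdlib.h>
          #include <gsl/gsl_rng.h>

          typedef struct {
             unsigned long int x;          /* whatever state your generator needs */
          } mygen_state_t;

          static void mygen_set(void *vstate, unsigned long int seed)
          {
             mygen_state_t *s = vstate;
             s->x = seed ? seed : 4357;    /* avoid the all-zero state */
          }

          static unsigned long int mygen_get(void *vstate)
          {
             mygen_state_t *s = vstate;
             /* Replace this xorshift with calls to your generator or hardware. */
             s->x ^= (s->x << 13) & 0xffffffffUL;
             s->x ^= (s->x >> 17);
             s->x ^= (s->x << 5) & 0xffffffffUL;
             return s->x & 0xffffffffUL;
          }

          static double mygen_get_double(void *vstate)
          {
             return mygen_get(vstate) / 4294967296.0;   /* uniform on [0,1) */
          }

          static const gsl_rng_type mygen_type = {
             "mygen",                      /* name reported by dieharder */
             0xffffffffUL,                 /* max */
             0,                            /* min */
             sizeof(mygen_state_t),
             &mygen_set,
             &mygen_get,
             &mygen_get_double
          };

          /* e.g. gsl_rng *r = gsl_rng_alloc(gsl_rng_mygen); gsl_rng_set(r, seed); */
          const gsl_rng_type *gsl_rng_mygen = &mygen_type;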

       A warning for those who are testing files of random numbers.
       dieharder is a tool that tests random number generators, not files of
       random numbers!  It is extremely inappropriate to try to "certify" a
       file of random numbers as being random just because it fails to
       "fail" any of the dieharder tests in e.g. a dieharder -a run.  To put
       it bluntly, if one rejects all such files that fail any test at the
       0.05 level (or any other), the one thing one can be certain of is
       that the files in question are not random, as a truly random sequence
       would fail any given test at the 0.05 level 5% of the time!

       To put it another way, any file of numbers produced by a generator
       that "fails to fail" the dieharder suite should be considered
       "random", even if it contains sequences that might well "fail" any
       given test at some specific cutoff.  One has to presume that, in
       passing the broader tests of the generator itself, the p-values for
       the test involved were determined to be globally correctly
       distributed, so that e.g. failure at the 0.01 level occurs neither
       more nor less than 1% of the time, on average, over many many tests.
       If one particular file generates a failure at this level, one can
       therefore safely presume that it is a random file pulled from among
       the many thousands of similar files the generator might create that
       have the correct distribution of p-values at all levels of testing
       and aggregation.

       To sum up, use dieharder to validate your generator (via input from
       files or an embedded stream).  Then by all means use your generator
       to produce files or streams of random numbers.  Do not use dieharder
       as an accept/reject tool to validate the files themselves!


EXAMPLES
       To demonstrate all tests, run on the default GSL rng, enter:

          dieharder -a

       To demonstrate a test of an external generator of a raw binary stream
       of bits, use the stdin (raw) interface:

          cat /dev/urandom | dieharder -g 200 -a

       To use it with an ascii formatted file:

          dieharder -g 202 -f testrands.txt -a

       (testrands.txt should consist of a header such as:

          #==================================================================
          # generator mt19937_1999  seed = 1274511046
          #==================================================================
          type: d
          count: 100000
          numbit: 32
          3129711816
          85411969
          2545911541

       etc.).
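
       The following is a small, hypothetical helper that writes a file in
       that format (the comment banner and header fields shown above, plus
       one decimal uint per line).  The xorshift32 routine again stands in
       for whatever generator you want to test; the authoritative way to
       produce a correct header remains dieharder -o -f example.input -t 10
       as noted in the options above.

          /* make_ascii.c - write a dieharder-style formatted ascii input file.
           * Build: cc -O2 make_ascii.c -o make_ascii
           * Use:   ./make_ascii testrands.txt 100000
           *        dieharder -g 202 -f testrands.txt -a                        */
          #include <stdio.h>
          #include <stdint.h>
          #include <stdlib.h>

          static uint32_t xorshift32(uint32_t *s)   /* placeholder generator */
          {
             *s ^= *s << 13; *s ^= *s >> 17; *s ^= *s << 5;
             return *s;
          }

          int main(int argc, char **argv)
          {
             const char *name = (argc > 1) ? argv[1] : "testrands.txt";
             unsigned long count = (argc > 2) ? strtoul(argv[2], NULL, 10) : 100000;
             uint32_t state = 123456789u;

             FILE *fp = fopen(name, "w");
             if (!fp) { perror(name); return 1; }

             /* Comment banner plus header: type d (decimal uints),
              * number of rands, and bits per rand.                   */
             fprintf(fp, "#==================================================================\n");
             fprintf(fp, "# generator mygen  seed = %lu\n", (unsigned long) state);
             fprintf(fp, "#==================================================================\n");
             fprintf(fp, "type: d\n");
             fprintf(fp, "count: %lu\n", count);
             fprintf(fp, "numbit: 32\n");
             for (unsigned long i = 0; i < count; i++)
                fprintf(fp, "%lu\n", (unsigned long) xorshift32(&state));

             fclose(fp);
             return 0;
          }
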
768
769 To use it with a binary file
770
771 dieharder -g 201 -f testrands.bin -a
772
773 or
774
775 cat testrands.bin | dieharder -g 200 -a
776
777 An example that demonstrates the use of "prefixes" on the output lines
778 that make it relatively easy to filter off the different parts of the
779 output report and chop them up into numbers that can be used in other
780 programs or in spreadsheets, try:
781
782 dieharder -a -c ',' -D default -D prefix
783
784
DISPLAY OPTIONS
       As of version 3.x.x, dieharder has a single output interface that
       produces tabular data per test, with common information in headers.
       The display control options and flags can be used to customize the
       output to your individual specific needs.

       The options are controlled by binary flags.  The flags, and their
       text versions, are displayed if you enter:

          dieharder -F

       by itself on a line.

       The flags can be entered all at once by adding up all the desired
       option flags.  For example, a very sparse output could be selected by
       adding the flags for the test_name (8) and the associated pvalues
       (128) to get 136:

          dieharder -a -D 136

       Since the flags are accumulated from zero (unless no flag is entered
       and the default is used) you could accomplish the same display via:

          dieharder -a -D 8 -D pvalues

       Note that you can enter flags by value or by name, in any
       combination.  Because people use dieharder to obtain values and then
       wish to export them into spreadsheets (comma separated values) or
       into filter scripts, you can change the field separator character.
       For example:

          dieharder -a -c ',' -D default -D -1 -D -2

       produces output that is ideal for importing into a spreadsheet (note
       that one can subtract field values from the base set of fields
       provided by the default option as long as it is given first).

       An interesting option is the -D prefix flag, which turns on a field
       identifier prefix to make it easy to filter out particular kinds of
       data.  However, it is equally easy to turn on any particular kind of
       output to the exclusion of others directly by means of the flags.

       Two other flags of interest to novices to random number generator
       testing are -D histogram (turns on a histogram of the underlying
       pvalues, per test) and -D description (turns on a complete test
       description, per test).  These flags turn the output table into more
       of a series of "reports" of each test.


PUBLICATION RULES
       dieharder is entirely original code and can be modified and used at
       will by any user, provided that:

       a) The original copyright notices are maintained and that the source,
       including all modifications, is made publicly available at the time
       of any derived publication.  This is open source software according
       to the precepts and spirit of the Gnu Public License.  See the
       accompanying file COPYING, which also must accompany any
       redistribution.

       b) The primary author of the code (Robert G. Brown) is appropriately
       acknowledged and referenced in any derived publication.  It is
       strongly suggested that George Marsaglia and the Diehard suite and
       the various authors of the Statistical Test Suite be similarly
       acknowledged, although this suite shares no actual code with these
       random number test suites.

       c) Full responsibility for the accuracy, suitability, and
       effectiveness of the program rests with the users and/or modifiers.
       As is clearly stated in the accompanying copyright.h:

       THE COPYRIGHT HOLDERS DISCLAIM ALL WARRANTIES WITH REGARD TO THIS
       SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND
       FITNESS, IN NO EVENT SHALL THE COPYRIGHT HOLDERS BE LIABLE FOR ANY
       SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER
       RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF
       CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN
       CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.


ACKNOWLEDGEMENTS
       The author of this suite gratefully acknowledges George Marsaglia
       (the author of the diehard test suite) and the various authors of
       NIST Special Publication 800-22 (which describes the Statistical Test
       Suite for testing pseudorandom number generators for cryptographic
       applications), for excellent descriptions of the tests therein.
       These descriptions enabled this suite to be developed with a GPL.

       The author also wishes to reiterate that the academic correctness and
       accuracy of the implementation of these tests is his sole
       responsibility and not that of the authors of the Diehard or STS
       suites.  This is especially true where he has seen fit to modify
       those tests from their strict original descriptions.


COPYRIGHT
       GPL 2b; see the file COPYING that accompanies the source of this
       program.  This is the "standard Gnu General Public License version 2
       or any later version", with the one minor (humorous) "Beverage"
       modification listed below.  Note that this modification is probably
       not legally defensible and can be followed really pretty much
       according to the honor rule.

       As to my personal preferences in beverages, red wine is great, beer
       is delightful, and Coca Cola or coffee or tea or even milk is
       acceptable to those who for religious or personal reasons wish to
       avoid stressing my liver.

       The Beverage Modification to the GPL:

       Any satisfied user of this software shall, upon meeting the primary
       author(s) of this software for the first time under the appropriate
       circumstances, offer to buy him or her or them a beverage.  This
       beverage may or may not be alcoholic, depending on the personal
       ethical and moral views of the offerer.  The beverage cost need not
       exceed one U.S. dollar (although it certainly may at the whim of the
       offerer:-) and may be accepted or declined with no further obligation
       on the part of the offerer.  It is not necessary to repeat the offer
       after the first meeting, but it can't hurt...



dieharder               Copyright 2003 Robert G. Brown               dieharder(1)