dieharder(1)                General Commands Manual               dieharder(1)
2
3
4

NAME

6       dieharder  -  A testing and benchmarking tool for random number genera‐
7       tors.
8
9

SYNOPSIS

11       dieharder [-a] [-d dieharder test number] [-f filename] [-B]
12                 [-D output flag [-D output flag] ... ] [-F] [-c separator]
13                 [-g generator number or -1] [-h] [-k ks_flag] [-l]
14                 [-L overlap] [-m multiply_p] [-n ntuple]
15                 [-p number of p samples] [-P Xoff]
16                 [-o filename] [-s seed strategy] [-S random number seed]
19                 [-t number of test samples] [-v verbose flag]
20                 [-W weak] [-X fail] [-Y Xtrategy]
21                 [-x xvalue] [-y yvalue] [-z zvalue]
22
23

dieharder OPTIONS

25       -a runs all the tests with standard/default options to create a
26              user-controllable report.  To  control  the  formatting  of  the
27              report,  see  -D below.  To control the power of the test (which
28              uses default values for tsamples that cannot generally be varied
29              and psamples which generally can) see -m below as a "multiplier"
30              of the default number of psamples (used only in a -a run).
31
       -d test number - selects a specific dieharder test.
33
34       -f filename - generators 201 or 202 permit either raw binary or
35              formatted ASCII numbers to be read in from a file  for  testing.
36              generator  200  reads  in  raw  binary numbers from stdin.  Note
37              well: many tests with default parameters require a lot of rands!
38              To  see  a  sample  of the (required) header for ASCII formatted
39              input, run
40
41                       dieharder -o -f example.input -t 10
42
43              and then examine the  contents  of  example.input.   Raw  binary
44              input  reads  32  bit  increments  of the specified data stream.
45              stdin_input_raw accepts a pipe from a raw binary stream.
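
              For example (a hedged illustration; the file names are
              placeholders for your own data), a raw binary file or a
              formatted ASCII file can be tested with:

                       dieharder -g 201 -f random.org_bin -a
                       dieharder -g 202 -f random.org.txt -a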
46
47       -B binary mode (used with -o below) causes output rands to  be  written
48       in raw binary, not formatted ascii.
49
50       -D output flag - permits fields to be selected for inclusion in
51              dieharder  output.   Each flag can be entered as a binary number
52              that turns on a specific output field or header or by flag name;
53              flags  are aggregated.  To see all currently known flags use the
54              -F command.
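
              For example (a hedged illustration; the flag names and numeric
              values are those shown in the DISPLAY OPTIONS section below),
              selecting only the test name (8) and p-value (128) fields can
              be done by name or by their sum:

                       dieharder -a -D test_name -D pvalues
                       dieharder -a -D 136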
55
56       -F - lists all known flags by name and number.
57
58       -c table separator - where separator is e.g. ',' (CSV) or '  '  (white‐
59       space).
60
61       -g generator number - selects a specific generator for testing.  Using
62              -g  -1 causes all known generators to be printed out to the dis‐
63              play.
64
65       -h prints context-sensitive help -- usually Usage (this message) or a
66              test synopsis if entered as e.g. dieharder -d 3 -h.
67
       -k ks_flag - sets the Kolmogorov-Smirnov test strategy, where ks_flag
69
70              0 is fast but slightly sloppy for psamples > 4999 (default).
71
72              1 is MUCH slower but more accurate for larger numbers  of  psam‐
73              ples.
74
75              2  is  slower still, but (we hope) accurate to machine precision
76              for any number of psamples up to some as yet  unknown  numerical
77              upper  limit  (it  has  been  tested out to at least hundreds of
78              thousands).
79
80              3 is kuiper ks, fast, quite inaccurate for small samples, depre‐
81              cated.
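
              For example (a hedged sketch; the generator and test numbers
              are merely illustrative), a higher resolution run using the
              slower but more accurate KS test might look like:

                       dieharder -g 205 -d 0 -p 5000 -k 2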
82
83       -l list all known tests.
84
85       -L overlap
86
87              1 (use overlap, default)
88
89              0 (don't use overlap)
90
91              in  operm5 or other tests that support overlapping and non-over‐
92              lapping sample modes.
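
              For example (a hedged sketch, assuming the operm5 test is
              listed by -l under the name diehard_operm5), overlapping
              samples can be disabled with:

                       dieharder -g 205 -d diehard_operm5 -L 0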
93
       -m multiply_p - multiply default # of psamples in -a(ll) runs to crank
              up the resolution of failure.

       -n ntuple - set ntuple length for tests on short bit strings that
              permit the length to be varied (e.g. rgb bitdist).
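
              For example (a hedged sketch, assuming the rgb bitdist test is
              listed by -l as rgb_bitdist), one might raise the resolution of
              a full -a(ll) run tenfold, or test 5-bit ntuples:

                       dieharder -a -m 10
                       dieharder -g 205 -d rgb_bitdist -n 5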
98
99       -o filename - output -t count random numbers from current generator  to
100       file.
101
102       -p count - sets the number of p-value samples per test (default 100).
103
       -P Xoff - sets the number of psamples that will accumulate before
              deciding that a generator is "good" and really, truly passes
              even a -Y 2 T2D run.  Currently the default is 100000;
              eventually it will be
108              set from AES-derived T2D test failure thresholds for fully auto‐
109              mated  reliable  operation,  but  for now it is more a "boredom"
110              threshold set by how long one might reasonably want to  wait  on
111              any given test run.
112
113       -S seed - where seed is a uint.  Overrides the default random seed
114              selection.  Ignored for file or stdin input.
115
116       -s strategy - if strategy is the (default) 0, dieharder reseeds (or
117              rewinds)  once at the beginning when the random number generator
118              is selected and then never again.  If strategy is  nonzero,  the
119              generator  is reseeded or rewound at the beginning of EACH TEST.
120              If -S seed was specified, or a file is used,  this  means  every
121              test  is applied to the same sequence (which is useful for vali‐
122              dation and testing of dieharder, but not  a  good  way  to  test
123              rngs).  Otherwise a new random seed is selected for each test.
124
125       -t count - sets the number of random entities used in each test, where
126              possible.  Be warned -- some tests have fixed sample sizes; oth‐
127              ers are variable but have practical minimum sizes.  It  is  sug‐
128              gested you begin with the values used in -a and experiment care‐
129              fully on a test by test basis.
130
131       -W weak - sets the "weak" threshold to make the test(s) more or less
132              forgiving during e.g. a  test-to-destruction  run.   Default  is
133              currently 0.005.
134
135       -X fail - sets the "fail" threshold to make the test(s) more or less
136              forgiving  during  e.g.  a  test-to-destruction run.  Default is
137              currently 0.000001, which is basically "certain failure  of  the
138              null  hypothesis",  the  desired  mode of reproducible generator
139              failure.
140
       -Y Xtrategy - the Xtrategy flag controls the new "test to failure"
              (T2F) modes.  These flags and their modes act as follows:
144
145                0  -  just run dieharder with the specified number of tsamples
146              and psamples, do not dynamically modify a run based on  results.
147              This is the way it has always run, and is the default.
148
149                1  - "resolve ambiguity" (RA) mode.  If a test returns "weak",
150              this is an undesired result.  What does that  mean,  after  all?
151              If  you  run  a  long  test series, you will see occasional weak
              returns for a perfect generator because p is uniformly
              distributed and will appear in any finite interval from time to time.
154              Even if a test run returns more than one weak result, you cannot
155              be certain that the generator is failing.  RA mode adds psamples
156              (usually in blocks of 100) until the test result ends up solidly
157              not  weak  or  proceeds to unambiguous failure.  This is morally
158              equivalent to running the test several times to see  if  a  weak
159              result  is  reproducible,  but  eliminates  the bias of personal
160              judgement in the process since the default failure threshold  is
161              very small and very unlikely to be reached by random chance even
162              in many runs.
163
164              This option should only be used with -k 2.
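
              For example (a hedged illustration), a resolve-ambiguity run of
              a single test against the AES generator might be invoked as:

                       dieharder -g 205 -d diehard_opso -Y 1 -k 2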
165
166                2 - "test to destruction" mode.  Sometimes you  just  want  to
              know where or if a generator will ever fail a test (or test
168              series).  -Y 2 causes psamples to be added 100 at a time until a
169              test  returns an overall pvalue lower than the failure threshold
170              or a specified maximum number of psamples (see -P) is reached.
171
172              Note well!  In this mode one may well fail due to the  alternate
173              null  hypothesis  --  the  test  itself is a bad test and fails!
174              Many dieharder tests, despite our best efforts, are  numerically
175              unstable  or  have only approximately known target statistics or
176              are straight up asymptotic results, and will eventually return a
177              failing result even for a gold-standard generator (such as AES),
178              or for the hypercautious the XOR generator with AES,  threefish,
179              kiss,  all  loaded  at once and xor'd together.  It is therefore
              safest to use this mode comparatively, executing a T2D run on
181              AES to get an idea of the test failure threshold(s) (something I
182              will eventually do and publish on the web so  everybody  doesn't
183              have  to do it independently) and then running it on your target
184              generator.  Failure with numbers of psamples within an order  of
185              magnitude  of  the  AES thresholds should probably be considered
186              possible test failures, not  generator  failures.   Failures  at
187              levels significantly less than the known gold standard generator
188              failure thresholds are, of course, probably failures of the gen‐
189              erator.
190
191              This option should only be used with -k 2.
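
              For example (a hedged illustration; mygen.bin is a placeholder
              for your own generator's raw binary output), a comparative
              test-to-destruction run capped by -P might look like:

                       dieharder -g 205 -d diehard_opso -Y 2 -k 2 -P 1000000
                       cat mygen.bin | dieharder -g 200 -d diehard_opso -Y 2 -k 2 -P 1000000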
192
193       -v verbose flag -- controls the verbosity of the output for debugging
194              only.   Probably of little use to non-developers, and developers
195              can read the enum(s) in dieharder.h and the test sources to  see
              which flag values turn on output for which routines.  A value
              of 1 results in a highly detailed trace of program activity.
198
199       -x,-y,-z number - Some tests have parameters that can safely be varied
200              from their default value.  For example, in the diehard birthdays
              test, one can vary the number of birthdays and the number of
              bits in the test integers.  -x 2048 -y 30 alters these two values but should still run
203              fine.   These  parameters should be documented internally (where
204              they exist) in the e.g. -d 0 -h visible notes.
205
206              NOTE WELL: The assessment(s) for the rngs may, in fact, be  com‐
207              pletely incorrect or misleading.  There are still "bad tests" in
208              dieharder, although we are working to fix and improve them  (and
              try to document them in the test descriptions visible with -d
              testnumber -h).  In particular, 'Weak' pvalues should occur one
211              test  in two hundred, and 'Failed' pvalues should occur one test
212              in a million with the default thresholds - that's what p  MEANS.
213              Use them at your Own Risk!  Be Warned!
214
215              Or  better  yet,  use the new -Y 1 and -Y 2 resolve ambiguity or
216              test to destruction modes above, comparing to  similar  runs  on
217              one  of  the as-good-as-it-gets cryptographic generators, AES or
218              threefish.
219
220

DESCRIPTION

222       dieharder
223
224       Welcome to the current snapshot of the dieharder random number  tester.
225       It  encapsulates  all of the Gnu Scientific Library (GSL) random number
226       generators (rngs) as well as a number of generators from the R  statis‐
227       tical  library,  hardware sources such as /dev/*random, "gold standard"
228       cryptographic quality generators (useful for testing dieharder and  for
229       purposes  of  comparison  to new generators) as well as generators con‐
230       tributed by users or found in the literature into a single harness that
231       can  time them and subject them to various tests for randomness.  These
232       tests are variously drawn from George Marsaglia's "Diehard  battery  of
233       random  number  tests", the NIST Statistical Test Suite, and again from
234       other sources such as  personal  invention,  user  contribution,  other
235       (open source) test suites, or the literature.
236
237       The  primary  point  of  dieharder  is to make it easy to time and test
238       (pseudo)random number generators, including both software and  hardware
239       rngs,  with  a  fully  open  source  tool.   In  addition  to providing
240       "instant" access to testing  of  all  built-in  generators,  users  can
241       choose  one of three ways to test their own random number generators or
242       sources:  a unix pipe of a raw binary (presumed  random)  bitstream;  a
243       file  containing  a (presumed random) raw binary bitstream or formatted
244       ascii uints or floats; and embedding your generator in dieharder's GSL-
245       compatible  rng  harness  and adding it to the list of built-in genera‐
246       tors.  The stdin and file input methods are described  below  in  their
247       own section, as is suggested "best practice" for newbies to random num‐
248       ber generator testing.
249
250       An important motivation for using dieharder is  that  the  entire  test
251       suite  is  fully  Gnu  Public  License (GPL) open source code and hence
252       rather than being prohibited from "looking  underneath  the  hood"  all
253       users  are  openly  encouraged to critically examine the dieharder code
254       for errors, add new tests or generators or user interfaces, or  use  it
255       freely  as is to test their own favorite candidate rngs subject only to
256       the constraints of the GPL.  As a result  of  its  openness,  literally
257       hundreds  of  improvements and bug fixes have been contributed by users
258       to date, resulting in a far stronger and more reliable test suite  than
259       would  have  been  possible with closed and locked down sources or even
260       open sources (such as STS) that lack the dynamical  feedback  mechanism
261       permitting corrections to be shared.
262
263       Even  small  errors  in test statistics permit the alternative (usually
264       unstated) null hypothesis to become an important factor in rng  testing
265       -- the unwelcome possibility that your generator is just fine but it is
266       the test that is failing.  One extremely useful feature of dieharder is
267       that  it is at least moderately self validating.  Using the "gold stan‐
268       dard" aes and threefish cryptographic generators, you can  observe  how
269       these  generators  perform on dieharder runs to the same general degree
270       of accuracy that you wish to use on the generators you are testing.  In
271       general,  dieharder  tests that consistently fail at any given level of
272       precision (selected with e.g. -a -m 10) on both of  the  gold  standard
273       rngs (and/or the better GSL generators, mt19937, gfsr4, taus) are prob‐
274       ably unreliable at that precision and it would hardly be surprising  if
275       they failed your generator as well.
276
277       Experts  in  statistics are encouraged to give the suite a try, perhaps
278       using any of the example calls below at first and then using it  freely
279       on  their  own  generators  or as a harness for adding their own tests.
280       Novices (to either statistics or random number generator  testing)  are
281       strongly  encouraged  to read the next section on p-values and the null
282       hypothesis and running the test suite a few times with a  more  verbose
283       output report to learn how the whole thing works.
284
285

QUICK START EXAMPLES

287       Examples  for  how  to set up pipe or file input are given below.  How‐
288       ever, it is recommended that a user play with some of the built in gen‐
289       erators  to  gain  familiarity  with dieharder reports and tests before
290       tackling their own favorite generator or file full of  possibly  random
291       numbers.
292
293       To see dieharder's default standard test report for its default genera‐
294       tor (mt19937) simply run:
295
296          dieharder -a
297
298       To increase the resolution of possible failures of the standard  -a(ll)
299       test,  use  the -m "multiplier" for the test default numbers of pvalues
300       (which are selected more to make a full test run take  an  hour  or  so
301       instead  of  days than because it is truly an exhaustive test sequence)
302       run:
303
304          dieharder -a -m 10
305
306       To test a different generator (say the gold  standard  AES_OFB)  simply
307       specify the generator on the command line with a flag:
308
309          dieharder -g 205 -a -m 10
310
311       Arguments  can  be in any order.  The generator can also be selected by
312       name:
313
314          dieharder -g AES_OFB -a
315
316       To apply only the diehard opso test to the AES_OFB  generator,  specify
317       the test by name or number:
318
319          dieharder -g 205 -d 5
320
321       or
322
323          dieharder -g 205 -d diehard_opso
324
325       Nearly  every  aspect  or  field in dieharder's output report format is
326       user-selectable by means of display option  flags.   In  addition,  the
327       field  separator character can be selected by the user to make the out‐
328       put particularly easy for them to parse (-c  '  ')  or  import  into  a
329       spreadsheet (-c ',').  Try:
330
331          dieharder -g 205 -d diehard_opso -c ',' -D test_name -D pvalues
332
333       to see an extremely terse, easy to import report or
334
          dieharder -g 205 -d diehard_opso -c ' ' -D default -D histogram -D description
337
338       to see a verbose report good for a  "beginner"  that  includes  a  full
339       description of each test itself.
340
341       Finally, the dieharder binary is remarkably autodocumenting even if the
342       man page is not available. All users should try the following  commands
343       to see what they do:
344
345          dieharder -h
346
347       (prints the command synopsis like the one above).
348
349          dieharder -a -h
350          dieharder -d 6 -h
351
352       (prints the test descriptions only for -a(ll) tests or for the specific
353       test indicated).
354
355          dieharder -l
356
357       (lists all known tests, including how reliable rgb thinks that they are
358       as things stand).
359
360          dieharder -g -1
361
362       (lists all known rngs).
363
364          dieharder -F
365
366       (lists  all  the currently known display/output control flags used with
367       -D).
368
369       Both beginners and experts should be aware that the assessment provided
370       by  dieharder in its standard report should be regarded with great sus‐
371       picion.  It is entirely possible for a generator to "pass" all tests as
372       far  as their individual p-values are concerned and yet to fail utterly
373       when considering them all together.  Similarly, it is probable  that  a
374       rng  will  at  the very least show up as "weak" on 0, 1 or 2 tests in a
375       typical -a(ll) run, and may even "fail" 1 test one such run  in  10  or
376       so.   To understand why this is so, it is necessary to understand some‐
377       thing of rng testing, p-values, and the null hypothesis!
378
379

P-VALUES AND THE NULL HYPOTHESIS

381       dieharder returns "p-values".  To understand what a p-value is and  how
382       to use it, it is essential to understand the null hypothesis, H0.
383
384       The null hypothesis for random number generator testing is "This gener‐
385       ator is a perfect random number generator, and for any choice  of  seed
       produces an infinitely long, unique sequence of numbers that have all
387       the expected statistical properties of random numbers, to all  orders".
388       Note  well  that  we know that this hypothesis is technically false for
389       all software generators as they are periodic and do not have  the  cor‐
390       rect entropy content for this statement to ever be true.  However, many
391       hardware generators fail a priori as well, as they contain subtle  bias
392       or  correlations  due to the deterministic physics that underlies them.
393       Nature is often unpredictable but it is rarely random and the two words
394       don't (quite) mean the same thing!
395
396       The  null  hypothesis  can be practically true, however.  Both software
397       and hardware generators can be "random"  enough  that  their  sequences
398       cannot  be  distinguished from random ones, at least not easily or with
399       the available tools (including dieharder!) Hence the null hypothesis is
400       a practical, not a theoretically pure, statement.
401
402       To  test  H0  ,  one uses the rng in question to generate a sequence of
403       presumably random numbers.  Using these numbers one  can  generate  any
404       one  of a wide range of test statistics -- empirically computed numbers
405       that are considered random samples that may or  may  not  be  covariant
406       subject  to  H0,  depending  on whether overlapping sequences of random
407       numbers are used to generate successive samples  while  generating  the
408       statistic(s), drawn from a known distribution.  From a knowledge of the
409       target distribution of the statistic(s) and the  associated  cumulative
410       distribution  function  (CDF)  and  the empirical value of the randomly
411       generated statistic(s), one can read off the probability  of  obtaining
412       the  empirical result if the sequence was truly random, that is, if the
413       null hypothesis is true and the generator in question is a "good"  ran‐
414       dom  number  generator!  This probability is the "p-value" for the par‐
415       ticular test run.
416
417       For example, to test a coin (or a sequence of  bits)  we  might  simply
418       count the number of heads and tails in a very long string of flips.  If
419       we assume that the coin is a "perfect coin", we expect  the  number  of
420       heads and tails to be binomially distributed and can easily compute the
421       probability of getting any particular number of heads and tails.  If we
422       compare  our recorded number of heads and tails from the test series to
423       this distribution and find that the probability of getting the count we
424       obtained  is very low with, say, way more heads than tails we'd suspect
425       the coin wasn't a perfect coin.  dieharder applies this very test (made
426       mathematically precise) and many others that operate on this same prin‐
427       ciple to the string of random bits produced by the rng being tested  to
428       provide a picture of how "random" the rng is.
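
       As a concrete (and hedged) illustration, the STS monobit test included
       in dieharder performs essentially this head/tail count on the bit
       stream; assuming your build lists it under the name sts_monobit, it
       can be run on its own with:

          dieharder -g 205 -d sts_monobit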
429
430       Note  that  the  usual dogma is that if the p-value is low -- typically
431       less than 0.05 -- one "rejects" the null hypothesis.  In a word, it  is
432       improbable that one would get the result obtained if the generator is a
433       good one.  If it is any other value, one does not "accept" the  genera‐
434       tor  as  good, one "fails to reject" the generator as bad for this par‐
435       ticular test.  A "good random number generator" is hence  one  that  we
436       haven't been able to make fail yet!
437
438       This  criterion  is, of course, naive in the extreme and cannot be used
439       with dieharder!  It makes just as much sense to reject a generator that
440       has p-values of 0.95 or more!  Both of these p-value ranges are equally
441       unlikely on any given test run, and should be returned for (on average)
442       5%  of all test runs by a perfect random number generator.  A generator
443       that fails to produce p-values less than 0.05 5%  of  the  time  it  is
444       tested  with different seeds is a bad random number generator, one that
445       fails the test of the null hypothesis.  Since  dieharder  returns  over
446       100  pvalues  by  default per test, one would expect any perfectly good
447       rng to "fail" such a naive test around five times by this criterion  in
448       a single dieharder run!
449
450       The  p-values  themselves,  as  it  turns out, are test statistics!  By
451       their nature, p-values should be uniformly  distributed  on  the  range
452       0-1.   In 100+ test runs with independent seeds, one should not be sur‐
453       prised to obtain 0, 1, 2, or even (rarely) 3 p-values less  than  0.01.
454       On  the other hand obtaining 7 p-values in the range 0.24-0.25, or see‐
455       ing that 70 of the p-values are greater than 0.5 should make the gener‐
456       ator highly suspect!  How can a user determine when a test is producing
457       "too many" of any particular value range for p?  Or too few?
458
459       Dieharder does it for you, automatically.  One can in  fact  convert  a
460       set  of  p-values into a p-value by comparing their distribution to the
461       expected one, using a Kolmogorov-Smirnov test against the expected uni‐
462       form distribution of p.
463
464       These  p-values  obtained  from looking at the distribution of p-values
465       should in turn be uniformly distributed and could in principle be  sub‐
466       jected to still more KS tests in aggregate.  The distribution of p-val‐
467       ues for a good generator should be idempotent,  even  across  different
468       test statistics and multiple runs.
469
470       A  failure  of the distribution of p-values at any level of aggregation
471       signals trouble.  In fact, if the p-values of any given test  are  sub‐
472       jected  to  a  KS  test,  and those p-values are then subjected to a KS
473       test, as we add more p-values to either level we  will  either  observe
474       idempotence  of  the  resulting  distribution of p to uniformity, or we
475       will observe idempotence to a single p-value of zero!  That is, a  good
476       generator  will  produce a roughly uniform distribution of p-values, in
477       the specific sense that the p-values of the distributions  of  p-values
478       are themselves roughly uniform and so on ad infinitum, while a bad gen‐
479       erator will produce a non-uniform distribution of p-values, and as more
480       p-values  drawn  from  the non-uniform distribution are added to its KS
481       test, at some point the failure will be absolutely unmistakeable as the
482       resulting p-value approaches 0 in the limit.  Trouble indeed!
483
484       The question is, trouble with what?  Random number tests are themselves
485       complex computational objects, and there is a  probability  that  their
486       code  is  incorrectly framed or that roundoff or other numerical -- not
487       methodical -- errors are contributing to a distortion of the  distribu‐
488       tion  of  some  of the p-values obtained.  This is not an idle observa‐
489       tion; when one works on writing random number  generator  testing  pro‐
490       grams, one is always testing the tests themselves with "good" (we hope)
491       random number generators so that egregious failures of the null hypoth‐
492       esis  signal  not  a  bad generator but an error in the test code.  The
493       null hypothesis above is correctly framed from a theoretical  point  of
494       view, but from a real and practical point of view it should read: "This
495       generator is a perfect random number generator, and for any  choice  of
       seed produces an infinitely long, unique sequence of numbers that have
497       all the expected statistical  properties  of  random  numbers,  to  all
498       orders and this test is a perfect test and returns precisely correct p-
499       values from the test computation."  Observed "failure"  of  this  joint
500       null  hypothesis  H0'  can come from failure of either or both of these
501       disjoint components, and comes from the second as often or  more  often
502       than the first during the test development process.  When one cranks up
503       the "resolution" of the test (discussed  next)  to  where  a  generator
504       starts to fail some test one realizes, or should realize, that develop‐
505       ment never ends and that new test regimes will always reveal new  fail‐
506       ures not only of the generators but of the code.
507
508       With  that  said, one of dieharder's most significant advantages is the
509       control that it gives you over a critical  test  parameter.   From  the
510       remarks  above, we can see that we should feel very uncomfortable about
511       "failing" any given random number generator on the basis of  a  5%,  or
512       even  a  1%,  criterion,  especially  when  we  apply a test suite like
513       dieharder that returns over 100 (and climbing) distinct  test  p-values
514       as  of the last snapshot.  We want failure to be unambiguous and repro‐
515       ducible!
516
517       To accomplish this, one can simply crank up its resolution.  If we  ran
518       any  given  test against a random number generator and it returned a p-
519       value of (say) 0.007328, we'd be perfectly justified in wondering if it
520       is  really  a good generator.  However, the probability of getting this
521       result isn't really all that small -- when one uses dieharder for hours
522       at a time numbers like this will definitely happen quite frequently and
523       mean nothing.  If one runs the same test again (with a  different  seed
524       or  part  of the random sequence) and gets a p-value of 0.009122, and a
525       third time and gets 0.002669 -- well, that's three 1% (or  less)  shots
526       in  a  row and that should happen only one in a million times.  One way
527       to clearly resolve failures, then, is to increase the number of  p-val‐
528       ues  generated  in  a  test run.  If the actual distribution of p being
529       returned by the test is not uniform, a KS test will eventually return a
530       p-value  that  is  not some ambiguous 0.035517 but is instead 0.000000,
531       with the latter produced time after time as we rerun.
532
533       For this reason, dieharder is extremely conservative  about  announcing
       rng "weakness" or "failure" relative to any given test.  Its internal
       criteria for these are currently p < 0.5% or p > 99.5% for weakness
536       (at the 1% level total) and a considerably more stringent criterion for
537       failure: p < 0.05% or p > 99.95%.  Note well that the ranges  are  sym‐
538       metric  --  too  high a value of p is just as bad (and unlikely) as too
539       low, and it is critical to flag it, because it is quite possible for  a
540       rng  to be too good, on average, and not to produce enough low p-values
541       on the full spectrum of dieharder  tests.   This  is  where  the  final
542       kstest is of paramount importance, and where the "histogram" option can
543       be very useful to help you visualize the failure in the distribution of
544       p -- run e.g.:
545
546         dieharder [whatever] -D default -D histogram
547
548       and you will see a crude ascii histogram of the pvalues that failed (or
549       passed) any given level of test.
550
551       Scattered reports of weakness or  marginal  failure  in  a  preliminary
552       -a(ll)  run should therefore not be immediate cause for alarm.  Rather,
553       they are tests to repeat, to watch out for, to push the rng  harder  on
554       using  the -m option to -a or simply increasing -p for a specific test.
555       Dieharder permits one to increase the number of p-values generated  for
556       any  test,  subject  only  to the availability of enough random numbers
557       (for file based tests) and time, to make failures unambiguous.  A  test
558       that  is  truly  weak  at -p 100 will almost always fail egregiously at
559       some larger value of psamples, be it -p 1000 or  -p  100000.   However,
560       because dieharder is a research tool and is under perpetual development
561       and testing, it is strongly suggested  that  one  always  consider  the
562       alternative  null  hypothesis  --  that the failure is a failure of the
563       test code in dieharder itself in some limit of  large  numbers  --  and
564       take  at  least  some  steps (such as running the same test at the same
565       resolution on a "gold standard" generator) to ensure that  the  failure
566       is indeed probably in the rng and not the dieharder code.
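
       For example (a hedged sketch; generator names are those printed by
       -g -1), a suspicious single-test result can be pushed harder at
       identical resolution on both the target generator and a gold standard:

          dieharder -g mt19937 -d diehard_opso -p 1000
          dieharder -g AES_OFB -d diehard_opso -p 1000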
567
568       Lacking a source of perfect random numbers to use as a reference, vali‐
569       dating the tests themselves is not easy and always leaves one with some
570       ambiguity (even aes or threefish).  During development the best one can
571       usually do is to rely heavily on these "presumed  good"  random  number
572       generators.   There are a number of generators that we have theoretical
573       reasons to expect to be extraordinarily good and to  lack  correlations
574       out  to  some  known  underlying dimensionality, and that also test out
575       extremely well quite consistently.  By using  several  such  generators
576       and  not just one, one can hope that those generators have (at the very
577       least) different correlations and should not all uniformly fail a  test
578       in  the  same  way  and  with the same number of p-values.  When all of
579       these generators consistently fail a test at a given level, I  tend  to
580       suspect  that  the  problem  is  in  the test code, not the generators,
581       although it is very  difficult  to  be  certain,  and  many  errors  in
582       dieharder's code have been discovered and ultimately fixed in just this
583       way by myself or others.
584
585       One advantage of dieharder is that it has a number of these "good  gen‐
586       erators" immediately available for comparison runs, courtesy of the Gnu
587       Scientific Library and user  contribution  (notably  David  Bauer,  who
588       kindly  encapsulated aes and threefish).  I use AES_OFB, Threefish_OFB,
       mt19937_1999, gfsr4, ranlxd2 and taus2 (as well as "true random" num‐
590       bers  from  random.org)  for  this  purpose,  and  I try to ensure that
591       dieharder will "pass" in particular the -g 205 -S 1 -s 1  generator  at
592       any reasonable p-value resolution out to -p 1000 or farther.
593
594       Tests (such as the diehard operm5 and sums test) that consistently fail
595       at these high resolutions are flagged as being  "suspect"  --  possible
596       failures  of  the  alternative null hypothesis -- and they are strongly
597       deprecated!  Their results should not be used  to  test  random  number
598       generators pending agreement in the statistics and random number commu‐
599       nity that those tests are in fact valid and correct  so  that  observed
600       failures  can  indeed safely be attributed to a failure of the intended
601       null hypothesis.
602
603       As I keep emphasizing (for good reason!) dieharder  is  community  sup‐
604       ported.   I  therefore  openly  ask that the users of dieharder who are
       expert in statistics help me fix the code or algorithms being imple‐
606       mented.  I would like to see this test suite ultimately be validated by
607       the general statistics community in hard use in  an  open  environment,
608       where every possible failure of the testing mechanism itself is subject
609       to scrutiny and eventual correction.  In this way  we  will  eventually
610       achieve  a very powerful suite of tools indeed, ones that may well give
611       us very specific information not just about failure but of the mode  of
612       failure as well, just how the sequence tested deviates from randomness.
613
614       Thus  far,  dieharder  has  benefitted tremendously from the community.
615       Individuals have openly contributed tests, new generators to be tested,
616       and  fixes for existing tests that were revealed by their own work with
617       the testing instrument.  Efforts are underway to  make  dieharder  more
618       portable  so  that  it  will build on more platforms and faster so that
619       more thorough testing can be done.  Please feel free to participate.
620
621

FILE INPUT

623       The simplest way to use dieharder with an external generator that  pro‐
624       duces  raw binary (presumed random) bits is to pipe the raw binary out‐
625       put from this generator (presumed to be  a  binary  stream  of  32  bit
626       unsigned integers) directly into dieharder, e.g.:
627
628         cat /dev/urandom | ./dieharder -a -g 200
629
630       Go  ahead and try this example.  It will run the entire dieharder suite
631       of tests on  the  stream  produced  by  the  linux  built-in  generator
632       /dev/urandom (using /dev/random is not recommended as it is too slow to
633       test in a reasonable amount of time).
634
635       Alternatively, dieharder can be used to test files of numbers  produced
       by a candidate random number generator:
637
638         dieharder -a -g 201 -f random.org_bin
639
640       for raw binary input or
641
642         dieharder -a -g 202 -f random.org.txt
643
644       for formatted ascii input.
645
646       A  formatted  ascii input file can accept either uints (integers in the
647       range 0 to 2^31-1, one per line) or decimal uniform  deviates  with  at
648       least ten significant digits (that can be multiplied by UINT_MAX = 2^32
       to produce a uint without dropping precision), also one per line.
650       Floats  with  fewer  digits  will almost certainly fail bitlevel tests,
651       although they may pass some of the tests that act on uniform deviates.
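
       A simple way to produce a correctly formatted ASCII file, including
       the required header, is to let dieharder write one itself (as noted
       under -f above) and then test it; a sketch, noting that even millions
       of entries may force some tests to rewind:

          dieharder -o -f example.input -t 10000000
          dieharder -g 202 -f example.input -a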
652
653       Finally, one can fairly easily wrap any generator  in  the  same  (GSL)
654       random  number  harness used internally by dieharder and simply test it
655       the same way one would  any  other  internal  generator  recognized  by
656       dieharder.   This is strongly recommended where it is possible, because
657       dieharder needs to use a lot of random numbers  to  thoroughly  test  a
658       generator.  A built in generator can simply let dieharder determine how
659       many it needs and generate them on demand, where a  file  that  is  too
660       small  will  "rewind" and render the test results where a rewind occurs
661       suspect.
662
663       Note well that file input rands are delivered to the tests  on  demand,
664       but  if  the  test  needs more than are available it simply rewinds the
665       file and cycles through it again,  and  again,  and  again  as  needed.
666       Obviously  this  significantly reduces the sample space and can lead to
667       completely incorrect results for the p-value  histograms  unless  there
668       are enough rands to run EACH test without repetition (it is harmless to
669       reuse the sequence for different tests).  Let the user beware!
670
671

BEST PRACTICE

673       A frequently asked question from new users wishing to test a  generator
674       they  are  working  on for fun or profit (or both) is "How should I get
675       its  output  into  dieharder?"   This  is  a  nontrivial  question,  as
676       dieharder  consumes  enormous  numbers of random numbers in a full test
677       cycle, and then there are features like -m 10 or -m 100  that  let  one
678       effortlessly  demand  10 or 100 times as many to stress a new generator
679       even more.
680
681       Even with large file support in dieharder, it is difficult  to  provide
682       enough  random numbers in a file to really make dieharder happy.  It is
683       therefore strongly suggested that you either:
684
685       a) Edit the output stage of your random number generator and get it  to
686       write its production to stdout as a random bit stream -- basically cre‐
687       ate 32 bit unsigned random integers and write them directly  to  stdout
688       as  e.g.  char  data  or raw binary.  Note that this is not the same as
689       writing raw floating point numbers (that will not be random at all as a
690       bitstream) and that "endianness" of the uints should not matter for the
691       null hypothesis of a "good" generator, as random bytes  are  random  in
692       any  order.  Crank the generator and feed this stream to dieharder in a
693       pipe as described above.
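
       A minimal sketch of option a) in C follows.  It is hedged: next32()
       merely stands in for whatever routine returns your generator's next
       32-bit output, and is not part of dieharder or the GSL.

          #include <stdio.h>
          #include <stdint.h>

          /* Assumed: your generator's output routine. */
          extern uint32_t next32(void);

          int main(void)
          {
              /* Write an endless raw binary stream of 32-bit uints to
                 stdout, suitable for:  ./mygen | dieharder -g 200 -a  */
              for (;;) {
                  uint32_t r = next32();
                  if (fwrite(&r, sizeof r, 1, stdout) != 1)
                      return 1;   /* stop on a write error (e.g. closed pipe) */
              }
          }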
694
695       b) Use the samples of GSL-wrapped dieharder rngs to similarly wrap your
696       generator  (or  calls  to your generator's hardware interface).  Follow
697       the examples in the ./dieharder source directory to add it as a  "user"
698       generator in the command line interface, rebuild, and invoke the gener‐
699       ator as a "native" dieharder generator (it should appear  in  the  list
700       produced by -g -1 when done correctly).  The advantage of doing it this
701       way is that you can then (if your new generator is  highly  successful)
702       contribute  it  back to the dieharder project if you wish!  Not to men‐
703       tion the fact that it makes testing it very easy.
704
705       Most users will probably go with option a) at least initially,  but  be
706       aware  that  b) is probably easier than you think.  The dieharder main‐
707       tainers may be able to give you a hand with it if you get into trouble,
708       but no promises.
709
710

WARNING!

712       A warning for those who are testing files of random numbers.  dieharder
713       is a tool that tests random number generators, not files of random num‐
714       bers!  It is extremely inappropriate to try to "certify" a file of ran‐
715       dom numbers as being random just because it fails to "fail" any of  the
716       dieharder  tests in e.g. a dieharder -a run.  To put it bluntly, if one
717       rejects all such files that fail any test at the  0.05  level  (or  any
718       other),  the one thing one can be certain of is that the files in ques‐
719       tion are not random, as a truly random sequence would  fail  any  given
720       test at the 0.05 level 5% of the time!
721
722       To put it another way, any file of numbers produced by a generator that
723       "fails to fail" the dieharder suite should be considered "random", even
724       if  it contains sequences that might well "fail" any given test at some
       specific cutoff.  One has to presume that, since the generator itself
       passed the broader tests, the p-values for the test involved were
       determined to be globally correctly distributed, so that e.g. failure at
728       the  0.01  level  occurs  neither more nor less than 1% of the time, on
729       average, over many many tests.  If  one  particular  file  generates  a
730       failure  at  this  level, one can therefore safely presume that it is a
731       random file pulled from many thousands of similar files  the  generator
732       might create that have the correct distribution of p-values at all lev‐
733       els of testing and aggregation.
734
735       To sum up, use dieharder to validate your  generator  (via  input  from
736       files  or an embedded stream).  Then by all means use your generator to
737       produce files or streams of random numbers.  Do not use dieharder as an
738       accept/reject tool to validate the files themselves!
739
740

EXAMPLES

742       To demonstrate all tests, run on the default GSL rng, enter:
743
744         dieharder -a
745
746       To  demonstrate  a test of an external generator of a raw binary stream
747       of bits, use the stdin (raw) interface:
748
749         cat /dev/urandom | dieharder -g 200 -a
750
751       To use it with an ascii formatted file:
752
753         dieharder -g 202 -f testrands.txt -a
754
755       (testrands.txt should consist of a header such as:
756
757        #==================================================================
758        # generator mt19937_1999  seed = 1274511046
759        #==================================================================
760        type: d
761        count: 100000
762        numbit: 32
763        3129711816
764          85411969
765        2545911541
766
767       etc.).
768
769       To use it with a binary file
770
771         dieharder -g 201 -f testrands.bin -a
772
773       or
774
775         cat testrands.bin | dieharder -g 200 -a
776
       For an example that demonstrates the use of "prefixes" on the output lines
778       that  make  it relatively easy to filter off the different parts of the
779       output report and chop them up into numbers that can be used  in  other
780       programs or in spreadsheets, try:
781
782         dieharder -a -c ',' -D default -D prefix
783
784

DISPLAY OPTIONS

786       As  of version 3.x.x, dieharder has a single output interface that pro‐
787       duces tabular data per test, with common information in  headers.   The
788       display  control  options and flags can be used to customize the output
789       to your individual specific needs.
790
791       The options are controlled by binary flags.  The flags, and their  text
792       versions, are displayed if you enter:
793
794         dieharder -F
795
796       by itself on a line.
797
798       The  flags  can  be  entered  all  at once by adding up all the desired
799       option flags.  For example, a very sparse output could be  selected  by
800       adding the flags for the test_name (8) and the associated pvalues (128)
801       to get 136:
802
803         dieharder -a -D 136
804
       Since the flags are accumulated from zero (unless no flag is entered and
806       the default is used) you could accomplish the same display via:
807
808         dieharder -a -D 8 -D pvalues
809
810       Note  that you can enter flags by value or by name, in any combination.
       Because people use dieharder to obtain values and then wish to export
       them into spreadsheets (comma separated values) or into filter scripts,
       you can change the field separator character.  For example:
814
815         dieharder -a -c ',' -D default -D -1 -D -2
816
817       produces output that is ideal for importing into  a  spreadsheet  (note
818       that one can subtract field values from the base set of fields provided
819       by the default option as long as it is given first).
820
821       An interesting option is the -D prefix flag, which  turns  on  a  field
822       identifier  prefix  to  make  it easy to filter out particular kinds of
823       data.  However, it is equally easy to turn on any  particular  kind  of
824       output to the exclusion of others directly by means of the flags.
825
826       Two other flags of interest to novices to random number generator test‐
827       ing are the -D histogram (turns on a histogram of the underlying  pval‐
828       ues,  per  test)  and -D description (turns on a complete test descrip‐
829       tion, per test).  These flags turn the output  table  into  more  of  a
830       series of "reports" of each test.
831
832

PUBLICATION RULES

834       dieharder  is  entirely  original  code and can be modified and used at
835       will by any user, provided that:
836
837         a) The original copyright notices are maintained and that the source,
       including all modifications, is made publicly available at the time
839       of any derived publication.  This is open source software according  to
840       the  precepts and spirit of the Gnu Public License.  See the accompany‐
841       ing file COPYING, which also must accompany any redistribution.
842
843         b) The primary author of the code (Robert G. Brown) is  appropriately
844       acknowledged and referenced in any derived publication.  It is strongly
845       suggested that George Marsaglia and the Diehard suite and  the  various
846       authors  of  the  Statistical  Test  Suite  be  similarly acknowledged,
847       although this suite shares no actual code with these random number test
848       suites.
849
850         c)  Full responsibility for the accuracy, suitability, and effective‐
851       ness of the program rests with  the  users  and/or  modifiers.   As  is
852       clearly stated in the accompanying copyright.h:
853
854       THE COPYRIGHT HOLDERS DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFT‐
855       WARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND  FITNESS,
856       IN  NO  EVENT  SHALL  THE  COPYRIGHT HOLDERS BE LIABLE FOR ANY SPECIAL,
857       INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES  WHATSOEVER  RESULTING
858       FROM  LOSS  OF  USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT,
859       NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF  OR  IN  CONNECTION
860       WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
861
862

ACKNOWLEDGEMENTS

864       The  author of this suite gratefully acknowledges George Marsaglia (the
865       author of the diehard test suite) and the various authors of NIST  Spe‐
866       cial Publication 800-22 (which describes the Statistical Test Suite for
867       testing pseudorandom number generators for cryptographic applications),
868       for  excellent  descriptions  of the tests therein.  These descriptions
869       enabled this suite to be developed with a GPL.
870
871       The author also wishes to reiterate that the academic  correctness  and
872       accuracy  of the implementation of these tests is his sole responsibil‐
873       ity and not that of the authors of the Diehard or STS suites.  This  is
874       especially  true where he has seen fit to modify those tests from their
875       strict original descriptions.
876

COPYRIGHT

879       GPL 2b; see the file COPYING that accompanies the source of  this  pro‐
880       gram.   This  is  the "standard Gnu General Public License version 2 or
881       any later version", with the one minor (humorous) "Beverage"  modifica‐
882       tion listed below.  Note that this modification is probably not legally
883       defensible and can be followed really  pretty  much  according  to  the
884       honor rule.
885
886       As  to my personal preferences in beverages, red wine is great, beer is
887       delightful, and Coca Cola or coffee or tea or even milk  acceptable  to
888       those  who for religious or personal reasons wish to avoid stressing my
889       liver.
890
891       The Beverage Modification to the GPL:
892
893       Any satisfied user of this software shall,  upon  meeting  the  primary
894       author(s)  of  this  software  for the first time under the appropriate
895       circumstances, offer to buy him or her or them a beverage.  This bever‐
896       age  may or may not be alcoholic, depending on the personal ethical and
897       moral views of the offerer.  The beverage cost need not exceed one U.S.
898       dollar (although it certainly may at the whim of the offerer:-) and may
899       be accepted or declined with no further obligation on the part  of  the
900       offerer.  It is not necessary to repeat the offer after the first meet‐
901       ing, but it can't hurt...
902
903
904
905
dieharder               Copyright 2003 Robert G. Brown            dieharder(1)