bup-margin(1)                                                    bup-margin(1)

NAME

       bup-margin - figure out your deduplication safety margin

SYNOPSIS

       bup margin [options...]

DESCRIPTION

       bup margin iterates through all objects in your bup repository,
       calculating the largest number of prefix bits shared between any two
       entries.  This number, n, identifies the longest prefix of a SHA-1
       hash you could use and still encounter a collision between your
       object ids.

       For example, one system that was tested had a collection of 11 million
       objects (70 GB), and bup margin returned 45.  That means a 46-bit hash
       would be sufficient to avoid all collisions among that set of objects;
       each object in that repository could be uniquely identified by its
       first 46 bits.

       The number of bits needed seems to increase by about 1 or 2 for every
       doubling of the number of objects.  Since SHA-1 hashes have 160 bits,
       that leaves 115 bits of margin in the example above.  Of course,
       because SHA-1 hashes are essentially random, it's theoretically
       possible to use many more bits with far fewer objects.

       If you're paranoid about the possibility of SHA-1 collisions, you can
       monitor your repository by running bup margin occasionally to see if
       you're getting dangerously close to 160 bits.

OPTIONS

       --predict
              Guess the offset into each index file where a particular object
              will appear, and report the maximum deviation of the correct
              answer from the guess.  This is potentially useful for tuning an
              interpolation search algorithm.

       --ignore-midx
              Don't use .midx files; use only .idx files.  This is only really
              useful when used with --predict.
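
       One way to picture the guess that --predict evaluates is plain linear
       interpolation over a sorted list of ids.  The Python sketch below is
       an assumption about the general technique, not a copy of bup's
       implementation: the function name max_prediction_error is
       hypothetical, and using the first 64 bits of each id as the
       interpolation key is an illustrative choice.

```python
import struct


def max_prediction_error(sorted_digests) -> int:
    """Largest distance between an id's real position and its guessed one.

    Each digest must be at least 8 bytes long and the list must be sorted.
    """
    count = len(sorted_digests)
    worst = 0
    for actual, digest in enumerate(sorted_digests):
        # Treat the first 64 bits as a fraction of the whole hash space.
        prefix = struct.unpack("!Q", digest[:8])[0]
        # Uniformly distributed ids would land near this offset.
        guess = (prefix * count) >> 64
        worst = max(worst, abs(actual - guess))
    return worst
```

       A small maximum deviation means an interpolation search can start
       very close to the right entry and only needs to scan a narrow window
       around its first guess.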

EXAMPLES

              $ bup margin
              Reading indexes: 100.00% (1612581/1612581), done.
              40
              40 matching prefix bits
              1.94 bits per doubling
              120 bits (61.86 doublings) remaining
              4.19338e+18 times larger is possible

              Everyone on earth could have 625878182 data sets
              like yours, all in one repository, and we would
              expect 1 object collision.

              $ bup margin --predict
              PackIdxList: using 1 index.
              Reading indexes: 100.00% (1612581/1612581), done.
              915 of 1612581 (0.057%)
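
       The figures in the first example appear to follow from simple
       arithmetic on the object count and the number of matching bits.  The
       Python sketch below reproduces that arithmetic as a reading aid.  The
       function name margin_report is hypothetical, and the world population
       of 6.7e9 is an assumption chosen because it is consistent with the
       sample output; this page does not state it.

```python
import math

SHA1_BITS = 160
POPULATION = 6.7e9  # assumed value; not given in this page


def margin_report(matching_bits: int, object_count: int) -> dict:
    """Estimate how much growth remains before prefixes use all 160 bits."""
    # How many times the repository has doubled, starting from one object.
    doublings_so_far = math.log2(object_count)
    bits_per_doubling = matching_bits / doublings_so_far
    remaining_bits = SHA1_BITS - matching_bits
    remaining_doublings = remaining_bits / bits_per_doubling
    growth_factor = 2 ** remaining_doublings
    return {
        "bits_per_doubling": bits_per_doubling,
        "remaining_bits": remaining_bits,
        "remaining_doublings": remaining_doublings,
        "growth_factor": growth_factor,
        "data_sets_per_person": int(growth_factor / POPULATION),
    }


if __name__ == "__main__":
    # Values taken from the sample run above.
    for key, value in margin_report(40, 1612581).items():
        print(key, value)
```

       Note that the estimate extrapolates the observed bits-per-doubling
       rate linearly, so it is a rough guide rather than a guarantee.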

SEE ALSO

       bup-midx(1), bup-save(1)

BUP

       Part of the bup(1) suite.

AUTHORS

       Avery Pennarun <apenwarr@gmail.com>.



Bup 0.29.2                        2018-10-20                     bup-margin(1)