bup-margin(1)                                                    bup-margin(1)

bup-margin - figure out your deduplication safety margin

bup margin [options...]

bup margin iterates through all objects in your bup repository,
calculating the largest number of prefix bits shared between any two
entries. This number, n, identifies the longest SHA-1 prefix you
could use and still encounter a collision between your object ids.

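As a rough illustration of that calculation (a minimal sketch, not bup's actual implementation, which works from the repository's index files): in a sorted list of equal-length ids, the pair sharing the longest prefix is always adjacent, so a single pass over neighbours is enough. The function names below are hypothetical, and the ids are stand-ins generated for the example.

```python
import hashlib


def shared_prefix_bits(a: bytes, b: bytes) -> int:
    """Number of leading bits that two equal-length ids have in common."""
    diff = int.from_bytes(a, "big") ^ int.from_bytes(b, "big")
    # The highest set bit of the XOR marks the first position where they differ.
    return len(a) * 8 - diff.bit_length()


def longest_shared_prefix(ids) -> int:
    """Largest number of prefix bits shared by any two distinct entries."""
    ordered = sorted(set(ids))
    return max(
        (shared_prefix_bits(p, q) for p, q in zip(ordered, ordered[1:])),
        default=0,
    )


if __name__ == "__main__":
    # Stand-in object ids: SHA-1 digests of some made-up contents.
    ids = [hashlib.sha1(b"object %d" % i).digest() for i in range(100_000)]
    print(f"{longest_shared_prefix(ids)} matching prefix bits")
```

Sorting first keeps the work at O(n log n) rather than comparing every pair of ids.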
For example, one system that was tested had a collection of 11 million
objects (70 GB), and bup margin returned 45. That means a 46-bit hash
would be sufficient to avoid all collisions among that set of objects;
each object in that repository could be uniquely identified by its
first 46 bits.

The number of bits needed seems to increase by about 1 or 2 for every
doubling of the number of objects. Since SHA-1 hashes have 160 bits,
that leaves 115 bits of margin in the example above. Of course,
because SHA-1 hashes are essentially random, it's theoretically
possible to use many more bits with far fewer objects.

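The arithmetic behind these figures can be sketched as follows. This is an illustration only: the formulas are inferred from the numbers quoted on this page rather than taken from bup's source, the function and key names are hypothetical, and the extrapolation assumes the observed bits-per-doubling rate keeps holding as the repository grows.

```python
import math

SHA1_BITS = 160  # length of a SHA-1 hash in bits


def margin_report(matching_bits: int, object_count: int) -> dict:
    """Estimate the remaining safety margin from one bup margin result."""
    doublings = math.log2(object_count)          # doublings needed to reach this size
    bits_per_doubling = matching_bits / doublings
    remaining_bits = SHA1_BITS - matching_bits
    remaining_doublings = remaining_bits / bits_per_doubling
    return {
        "bits_per_doubling": bits_per_doubling,
        "remaining_bits": remaining_bits,
        "remaining_doublings": remaining_doublings,
        "growth_factor": 2.0 ** remaining_doublings,
    }


if __name__ == "__main__":
    # The tested system described above: 11 million objects, 45 matching bits.
    for key, value in margin_report(45, 11_000_000).items():
        print(f"{key}: {value:g}")
```

For the 11-million-object system this gives roughly 1.9 bits per doubling and 115 bits remaining. Applied to the sample run shown later on this page (40 bits, 1612581 objects), the same arithmetic yields the 1.94 bits per doubling and 61.86 doublings printed there.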
If you're paranoid about the possibility of SHA-1 collisions, you can
monitor your repository by running bup margin occasionally to see if
you're getting dangerously close to 160 bits.

--predict
    Guess the offset into each index file where a particular object
    will appear, and report the maximum deviation of the correct
    answer from the guess. This is potentially useful for tuning an
    interpolation search algorithm.

--ignore-midx
    Don't use .midx files; use only .idx files. This is only really
    useful when used with --predict.

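To illustrate what --predict measures, here is a minimal sketch under stated assumptions, not bup's implementation. Because SHA-1 ids are close to uniformly distributed, the leading bits of an id, read as a fraction of the hash space, give a guess at its position in a sorted index. The helper name and the choice of the first 8 bytes are illustrative.

```python
import hashlib


def max_prediction_error(sorted_ids) -> int:
    """Largest distance between an id's real slot and its interpolated guess."""
    n = len(sorted_ids)
    worst = 0
    for actual, oid in enumerate(sorted_ids):
        # Treat the first 64 bits as a fraction in [0, 1] of the hash space.
        fraction = int.from_bytes(oid[:8], "big") / 2**64
        guess = int(fraction * n)
        worst = max(worst, abs(actual - guess))
    return worst


if __name__ == "__main__":
    # Stand-in for one index file: sorted SHA-1 digests of made-up contents.
    ids = sorted(hashlib.sha1(b"object %d" % i).digest() for i in range(100_000))
    worst = max_prediction_error(ids)
    print(f"{worst} of {len(ids)} ({100.0 * worst / len(ids):.3f}%)")
```

A maximum deviation that is small relative to the index size, such as the 915 of 1612581 (0.057%) in the sample run on this page, suggests an interpolation search could begin very close to the right slot.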
$ bup margin
Reading indexes: 100.00% (1612581/1612581), done.
40
40 matching prefix bits
1.94 bits per doubling
120 bits (61.86 doublings) remaining
4.19338e+18 times larger is possible

Everyone on earth could have 625878182 data sets
like yours, all in one repository, and we would
expect 1 object collision.

$ bup margin --predict
PackIdxList: using 1 index.
Reading indexes: 100.00% (1612581/1612581), done.
915 of 1612581 (0.057%)

bup-midx(1), bup-save(1)

Part of the bup(1) suite.

Avery Pennarun <apenwarr@gmail.com>.

Bup 0.29.2                        2018-10-20                     bup-margin(1)