Digest(3)             User Contributed Perl Documentation            Digest(3)

NAME
       File::RsyncP::Digest - Perl interface to rsync message digest
       algorithms

SYNOPSIS
           use File::RsyncP::Digest;

           $rsDigest = File::RsyncP::Digest->new;

           # specify rsync protocol version (default is <= 26 -> buggy digests).
           $rsDigest->protocol(version);

           # file MD4 digests
           $rsDigest->reset();
           $rsDigest->add(LIST);
           $rsDigest->addfile(HANDLE);

           $digest = $rsDigest->digest();
           $string = $rsDigest->hexdigest();

           # Return 32 byte pair of digests (protocol <= 26 and >= 27).
           $digestPair = $rsDigest->digest2();

           $digest = File::RsyncP::Digest->hash(SCALAR);
           $string = File::RsyncP::Digest->hexhash(SCALAR);

           # block digests
           $digests = $rsDigest->blockDigest($data, $blockSize, $md4DigestLen,
                                             $checksumSeed);

           $digests = $rsDigest->blockDigestUpdate($state, $blockSize,
                                       $blockLastLen, $md4DigestLen, $checksumSeed);

           $digests2 = $rsDigest->blockDigestExtract($digests16, $md4DigestLen);

DESCRIPTION
       The File::RsyncP::Digest module allows you to compute rsync digests,
       including the RSA Data Security Inc. MD4 Message Digest algorithm and
       Adler32 checksums, from within Perl programs.

   Rsync Digests
       Rsync uses two main digests (or checksums) to verify, with very high
       probability, that the data on both sides is identical without having
       to exchange the data itself.

       The server (remote) side of rsync generates a checksumSeed (usually
       the Unix time()) that is exchanged during the protocol startup.  This
       seed is used in both the file digest and the block MD4 digest
       calculations, so the block and file checksums change every time rsync
       is run.

       File Digest
           This is an MD4 digest of the checksum seed followed by the entire
           file's contents.  The digest is 128 bits long.  It is sent at the
           end of a file's deltas to ensure that the reconstructed file is
           correct.  It is also optionally computed and sent as part of the
           file list if the --checksum option is specified to rsync.

       Block digest
           Each file is divided into blocks, 700 bytes long by default.  The
           digest of each block consists of the Adler32 checksum of the block
           and the MD4 digest of the block followed by the checksum seed.
           During phase 1, only the first two bytes of the MD4 digest are
           sent, so the total digest is 6 bytes or 48 bits (4 bytes for
           Adler32 plus the first 2 bytes of the MD4 digest).  During phase 2
           (which is necessary for received files that have an incorrect file
           digest), the entire 128 bit MD4 digest is used, so the block
           digest is 20 bytes or 160 bits.  (Prior to rsync protocol XXX, the
           full 20 byte digest was sent every time and there was only a
           single phase.)

       This module contains routines for computing file and block digests in a
       manner that is identical to rsync.

       Incidentally, rsync contains two bugs in its implementation of MD4 (up
       to and including rsync protocol version 26):

       ·   MD4Final() is not called when the data size (i.e., the file or
           block size plus 4 bytes for the checksum seed) is a multiple of 64
           bytes.

       ·   MD4 is not correct for total data sizes greater than 512MB (2^32
           bits).  Rsync's MD4 maintains the data size in a 32 bit counter,
           which overflows for file sizes bigger than 512MB.

       The effects of these bugs are benign: the MD4 digest should not be
       cryptographically weakened and both sides are consistent.
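
The two conditions listed above can be restated as a small check.  This is only a sketch: the helper name md4_bug_conditions is hypothetical and not part of this module, and it assumes that a non-zero checksum seed contributes exactly 4 bytes to the data size.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of File::RsyncP::Digest): reports which of
# the two protocol <= 26 MD4 quirks described above would apply to $len
# bytes of data.  Assumes a non-zero checksum seed adds 4 bytes.
sub md4_bug_conditions {
    my ($len, $seed) = @_;
    my $total = $len + ($seed ? 4 : 0);
    return {
        skips_final   => ($total % 64 == 0) ? 1 : 0,  # multiple of 64 bytes
        counter_wraps => ($total > 2**29)   ? 1 : 0,  # 2**29 bytes = 2**32 bits
    };
}

my $r = md4_bug_conditions(700, 0x12345678);
print "700 byte block with seed: skips_final=$r->{skips_final}\n";
```

With the default 700 byte block size and a non-zero seed, each full block is 704 bytes (11 x 64), so the first condition applies to every full block.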
92
93 This module implements both versions of the MD4 digest: the buggy ver‐
94 sion for protocol versions <= 26 and the correct version for protocol
95 versions >= 27. The default mode is the buggy version (protocol ver‐
96 sions <= 26).
97
98 You can specify the rsync protocol version to determine which MD4 ver‐
99 sion is used:
100
101 # specify rsync protocol version (default is <= 26 -> buggy digests).
102 $rsDigest->protocol(version);
103
104 Also, you can get both digests in a single call. The result is
105 returned as a single 32 byte scalar: the first 16 bytes is the buggy
106 digest and the second 16 bytes is the correct digest:
107
108 # Return 32 byte pair of digests (protocol <= 26 and >= 27).
109 $digestPair = $rsDigest->digest2();
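
As a minimal sketch of how the 32 byte pair might be taken apart, the following splits the result of digest2() with unpack.  It uses only the methods listed in the synopsis; the input string and the variable names are illustrative.

```perl
use strict;
use warnings;
use File::RsyncP::Digest;

my $rsDigest = File::RsyncP::Digest->new;
$rsDigest->add("some file contents");

# digest2() returns 32 bytes: the protocol <= 26 (buggy) digest first,
# then the protocol >= 27 (correct) digest.
my $digestPair = $rsDigest->digest2();
my ($buggy, $correct) = unpack("a16 a16", $digestPair);

print "protocol <= 26: ", unpack("H*", $buggy),   "\n";
print "protocol >= 27: ", unpack("H*", $correct), "\n";
```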

   Usage
       A new rsync digest context object is created with the new operation.
       Multiple simultaneous digest contexts can be maintained, if desired.

   Computing Block Digests
       After a context is created, the function to compute block checksums is:

           $digests = $rsDigest->blockDigest($data, $blockSize, $md4DigestLen,
                                             $checksumSeed)

       The first argument is the data, which can contain as much raw data as
       you wish (i.e., multiple blocks).  Both the Adler32 checksum and the
       MD4 checksum are computed for each block in data.  The partial end
       block (if present) is also processed.  The 4 bytes of the integer
       checksumSeed are appended at the end of each block digest calculation
       if the seed is non-zero.  The blockSize is specified in the second
       argument (default is 700).  The third argument, md4DigestLen,
       specifies how many bytes of the MD4 digest are included in the
       returned data.  Rsync uses a value of 2 for the first pass (meaning 6
       bytes of total digests are returned per block), and all 16 bytes for
       the second pass (meaning 20 bytes of total digests are returned per
       block).  The returned number of bytes is the number of bytes in each
       digest (Adler32 + partial/complete MD4) times the number of blocks:

           (4 + md4DigestLen) * ceil(length(data) / blockSize);
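
The formula above can be evaluated directly.  The sketch below uses a hypothetical helper, block_digest_length, that is not part of this module; it only restates the arithmetic.

```perl
use strict;
use warnings;
use POSIX qw(ceil);

# Hypothetical helper: expected length in bytes of the string returned by
# blockDigest(), following the formula given above.
sub block_digest_length {
    my ($dataLen, $blockSize, $md4DigestLen) = @_;
    return (4 + $md4DigestLen) * ceil($dataLen / $blockSize);
}

print block_digest_length(2000, 700, 2),  "\n";   # 3 blocks * 6 bytes  = 18
print block_digest_length(2000, 700, 16), "\n";   # 3 blocks * 20 bytes = 60
```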

       To allow block checksums to be cached (when checksumSeed is unknown),
       and then quickly updated once the checksumSeed is known, the checksum
       data should first be computed with a digest length of -1 and a
       checksumSeed of 0:

           $state = $rsDigest->blockDigest($data, $blockSize, -1, 0);

       The returned $state should be saved for later retrieval, together with
       the length of the last partial block (e.g., length($data) %
       $blockSize).  The length of $state depends upon the number of blocks
       and the block size.  In addition to the 16 bytes of MD4 state, up to
       63 bytes of unprocessed data per block are also saved in $state.  For
       each block,

           16 + ($blockSize % 64)

       bytes are saved in $state, so $state is most compact when $blockSize is
       a multiple of 64.  (The last, partial, block might have a smaller block
       size, requiring up to 63 bytes of unprocessed data even if $blockSize
       is a multiple of 64.)
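
To see why a block size that is a multiple of 64 is the most compact, the per-block figure can be tabulated.  This sketch only evaluates the formula quoted above for full blocks; the helper name is hypothetical, and it does not account for the last partial block or for anything else the module may store alongside the MD4 state.

```perl
use strict;
use warnings;

# Hypothetical helper: MD4 state bytes saved per full block, per the
# formula 16 + ($blockSize % 64) given above.
sub md4_state_bytes_per_block {
    my ($blockSize) = @_;
    return 16 + ($blockSize % 64);
}

printf "%4d byte blocks: %2d bytes of MD4 state each\n",
    $_, md4_state_bytes_per_block($_) for (700, 704, 2048);
```

Under this formula a 700 byte block needs 76 bytes, while 704 and 2048 byte blocks need only 16.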

       Once the checksumSeed is known the updated checksums can then be
       computed using:

           $digests = $rsDigest->blockDigestUpdate($state, $blockSize,
                                       $blockLastLen, $md4DigestLen, $checksumSeed);

       The first argument is the cached checksums from blockDigest.  The third
       argument is the length of the (partial) last block.

       Alternatively, I hope to add a --checksum-seed=n option to rsync that
       allows the checksum seed to be set to 0.  This causes the checksum seed
       to be omitted from the MD4 calculation and it makes caching the
       checksums much easier.  A zero checksum seed does not weaken the block
       digest.  I'm not sure whether or not it weakens the file digest (the
       checksum seed is applied at the start of the file digest and end of the
       block digest).  In this case, the full 16 byte checksums should be
       computed using:

           $digests16 = $rsDigest->blockDigest($data, $blockSize, 16, 0);

       and for phase 1 the 2 byte MD4 substrings can be extracted with:

           $digests2 = $rsDigest->blockDigestExtract($digests16, 2);

       The original $digests16 does not need any additional processing for
       phase 2.
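
The extraction step can be pictured as slicing fixed-size records.  The following pure-Perl stand-in is a sketch, not the module's implementation: extract_digests is a hypothetical name, and it assumes each 20 byte record is a 4 byte Adler32 followed by the 16 byte MD4 digest.

```perl
use strict;
use warnings;

# Hypothetical pure-Perl stand-in for blockDigestExtract(), assuming each
# 20 byte record holds a 4 byte Adler32 followed by a 16 byte MD4 digest.
sub extract_digests {
    my ($digests16, $md4DigestLen) = @_;
    my $out = '';
    for (my $off = 0; $off < length($digests16); $off += 20) {
        # Keep the Adler32 plus the leading $md4DigestLen bytes of the MD4.
        $out .= substr($digests16, $off, 4 + $md4DigestLen);
    }
    return $out;
}

# Two fake 20 byte records, used only to show the slicing.
my $fake = ("A" x 4) . ("m" x 16) . ("B" x 4) . ("n" x 16);
print extract_digests($fake, 2), "\n";    # prints AAAAmmBBBBnn
```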

   Computing File Digests
       In addition, functions identical to Digest::MD4 are provided that allow
       rsync's MD4 file digest to be computed.  The checksum seed, if
       non-zero, is included at the start of the data, before the file's
       contents are added.

       The context is updated with the add operation, which adds the strings
       contained in the LIST parameter.  Note that "add('foo', 'bar')",
       "add('foo')" followed by "add('bar')", and "add('foobar')" should all
       give the same result.

       The final MD4 message digest value is returned by the digest operation
       as a 16-byte binary string.  This operation delivers the result of add
       operations since the last new or reset operation.  Note that the digest
       operation is effectively a destructive, read-once operation.  Once it
       has been performed, the context must be reset before being used to
       calculate another digest value.
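
A short sketch of both points, using only the methods listed in the synopsis: the way the input is split across add calls should not matter, and the context is reset between digests.

```perl
use strict;
use warnings;
use File::RsyncP::Digest;

my $rsDigest = File::RsyncP::Digest->new;

$rsDigest->add('foo', 'bar');
my $first = $rsDigest->hexdigest();     # read-once: consumes the context

$rsDigest->reset();                     # required before reusing the context
$rsDigest->add('foobar');
my $second = $rsDigest->hexdigest();

print $first eq $second ? "same digest\n" : "digests differ\n";
```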

       Several convenience functions are also provided.  The addfile operation
       takes an open file-handle and reads it until end-of-file in 1024 byte
       blocks, adding the contents to the context.  The file-handle can either
       be specified by name or passed as a type-glob reference.  The hexdigest
       operation calls digest and returns the result as a printable string of
       hexadecimal digits.  This is exactly the same operation as performed by
       the unpack operation in the examples below.
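
A minimal sketch of addfile and hexdigest together.  It assumes a lexical file handle is accepted in the same way as a type-glob reference, and it writes its own scratch file with File::Temp so that it is self-contained; the file contents are arbitrary.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use File::RsyncP::Digest;

# Write a small scratch file so the example does not depend on external data.
my ($out, $path) = tempfile(UNLINK => 1);
binmode($out);
print $out "foobarbaz";
close($out) or die "close: $!";

open(my $fh, '<', $path) or die "open $path: $!";
binmode($fh);

my $rsDigest = File::RsyncP::Digest->new;
$rsDigest->addfile($fh);                # assumption: lexical handle works here
my $hex = $rsDigest->hexdigest();       # 32 hexadecimal digits
close($fh);

print "Rsync MD4 Digest is $hex\n";
```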

       The hash operation can act as either a static member function (i.e.,
       you invoke it on the class, as in the synopsis above) or as a normal
       virtual function.  In both cases it performs the complete MD4 cycle
       (reset, add, digest) on the supplied scalar value.  This is convenient
       for handling small quantities of data.  When invoked on the class, a
       temporary context is created.  When invoked through an already created
       context object, that context is used.  The latter form is slightly more
       efficient.  The hexhash operation is analogous to hexdigest.
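
The two calling styles can be compared side by side.  This is a sketch using the hexhash method from the synopsis; the input string is arbitrary.

```perl
use strict;
use warnings;
use File::RsyncP::Digest;

# Class invocation: a temporary context is created for this one call.
my $hex1 = File::RsyncP::Digest->hexhash("foobarbaz");

# Object invocation: the existing context is reused.
my $rsDigest = File::RsyncP::Digest->new;
my $hex2     = $rsDigest->hexhash("foobarbaz");

print "class call:  $hex1\n";
print "object call: $hex2\n";
```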

EXAMPLES
           use File::RsyncP::Digest;

           my $rsDigest = File::RsyncP::Digest->new;
           $rsDigest->add('foo', 'bar');
           $rsDigest->add('baz');
           my $digest = $rsDigest->digest();

           print("Rsync MD4 Digest is " . unpack("H*", $digest) . "\n");

       The above example would print out the message

           Rsync MD4 Digest is 6df23dc03f9b54cc38a0fc1483df6e21

       To compute the rsync phase 1 block checksums (4 + 2 = 6 bytes per
       block) for a 2000 byte file containing 700 a's, 700 b's and 600 c's,
       with a checksum seed of 0x12345678:

           use File::RsyncP::Digest;

           my $rsDigest = File::RsyncP::Digest->new;
           my $data = ("a" x 700) . ("b" x 700) . ("c" x 600);
           my $digest = $rsDigest->blockDigest($data, 700, 2, 0x12345678);

           print("Rsync block checksums are " . unpack("H*", $digest) . "\n");

       This will print:

           Rsync block checksums are 3c09a624641bf80b0ce3abd208e8645d5b49
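
The 18 bytes printed above can be read as three 6 byte records.  The sketch below splits them and cross-checks the first 4 bytes of each record against a pure-Perl version of rsync's weak (Adler32-style) checksum.  The helper weak_checksum is hypothetical and is not the module's code; it assumes bytes are summed as unsigned values and that the result is stored little-endian, which has only been checked here against this 7-bit ASCII example.

```perl
use strict;
use warnings;

# Hypothetical pure-Perl sketch of rsync's weak block checksum.
# Assumption: bytes are summed as unsigned values; data with the high bit
# set may be treated differently by rsync itself.
sub weak_checksum {
    my ($block) = @_;
    my ($s1, $s2) = (0, 0);
    for my $byte (unpack("C*", $block)) {
        $s1 = ($s1 + $byte) & 0xffff;   # running sum of bytes
        $s2 = ($s2 + $s1)   & 0xffff;   # running sum of the running sums
    }
    return pack("V", ($s2 << 16) | $s1);    # 4 bytes, little-endian
}

my $data    = ("a" x 700) . ("b" x 700) . ("c" x 600);
my $digests = pack("H*", "3c09a624641bf80b0ce3abd208e8645d5b49");

for my $i (0 .. 2) {
    # Each phase 1 record: 4 bytes of Adler32, then 2 bytes of MD4.
    my ($adler, $md4) = unpack("a4 a2", substr($digests, $i * 6, 6));
    my $expect = weak_checksum(substr($data, $i * 700, 700));
    printf "block %d: adler32=%s md4=%s %s\n", $i,
        unpack("H*", $adler), unpack("H*", $md4),
        $adler eq $expect ? "(weak checksum matches)" : "(weak checksum differs)";
}
```

The MD4 bytes are only displayed, not recomputed, since they depend on the checksum seed and on the protocol version in use.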

       The same result can be achieved in two steps by saving the state, and
       then finishing the calculation:

           my $state = $rsDigest->blockDigest($data, 700, -1, 0);

           my $digest = $rsDigest->blockDigestUpdate($state, 700,
                                       length($data) % 700, 2, 0x12345678);

       or by computing full-length MD4 digests, and extracting the 2 byte
       version:

           my $digest16 = $rsDigest->blockDigest($data, 700, 16, 0x12345678);
           my $digest = $rsDigest->blockDigestExtract($digest16, 2);

LICENSE
       This program is free software; you can redistribute it and/or modify it
       under the terms of the GNU General Public License as published by the
       Free Software Foundation; either version 2 of the License, or (at your
       option) any later version.

       This program is distributed in the hope that it will be useful, but
       WITHOUT ANY WARRANTY; without even the implied warranty of
       MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
       General Public License for more details.

       You should have received a copy of the GNU General Public License in
       the LICENSE file along with this program; if not, write to the Free
       Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA
       02111-1307 USA.

       The MD4 algorithm is defined in RFC1320.  The basic C code implementing
       the algorithm is derived from that in the RFC and is covered by the
       following copyright:

          MD4 is Copyright (C) 1990-2, RSA Data Security, Inc. All rights
          reserved.

          License to copy and use this software is granted provided that it
          is identified as the "RSA Data Security, Inc. MD4 Message-Digest
          Algorithm" in all material mentioning or referencing this software
          or this function.

          License is also granted to make and use derivative works provided
          that such works are identified as "derived from the RSA Data
          Security, Inc. MD4 Message-Digest Algorithm" in all material
          mentioning or referencing the derived work.

          RSA Data Security, Inc. makes no representations concerning either
          the merchantability of this software or the suitability of this
          software for any particular purpose. It is provided "as is"
          without express or implied warranty of any kind.

          These notices must be retained in any copies of any part of this
          documentation and/or software.

       This copyright does not prohibit distribution of any version of Perl
       containing this extension under the terms of the GNU or Artistic
       licences.

AUTHOR
       File::RsyncP::Digest was written by Craig Barratt
       <cbarratt@users.sourceforge.net>, based on Digest::MD4.  The Adler32
       implementation is based on rsync 2.5.5.

       Digest::MD4 was adapted by Mike McCauley ("mikem@open.com.au"), based
       entirely on MD5-1.7, written by Neil Winton
       ("N.Winton@axion.bt.co.uk").

       Rsync was written by Andrew Tridgell <tridge@samba.org> and Paul
       Mackerras.  It is available under a GPL license.  See
       <http://rsync.samba.org>.

SEE ALSO
       See <http://perlrsync.sourceforge.net> for File::RsyncP's SourceForge
       home page.

       See File::RsyncP, File::RsyncP::FileIO and File::RsyncP::FileList.

perl v5.8.8                       2006-11-19                         Digest(3)