Digest(3)             User Contributed Perl Documentation            Digest(3)

NAME

       File::RsyncP::Digest - Perl interface to rsync message digest
       algorithms

SYNOPSIS

           use File::RsyncP::Digest;

           $rsDigest = new File::RsyncP::Digest;

           # specify rsync protocol version (default is <= 26 -> buggy digests).
           $rsDigest->protocol(version);

           # file MD4 digests
           $rsDigest->reset();
           $rsDigest->add(LIST);
           $rsDigest->addfile(HANDLE);

           $digest = $rsDigest->digest();
           $string = $rsDigest->hexdigest();

           # Return 32 byte pair of digests (protocol <= 26 and >= 27).
           $digestPair = $rsDigest->digest2();

           $digest = File::RsyncP::Digest->hash(SCALAR);
           $string = File::RsyncP::Digest->hexhash(SCALAR);

           # block digests
           $digests = $rsDigest->blockDigest($data, $blockSize, $md4DigestLen,
                                             $checksumSeed);

           $digests = $rsDigest->blockDigestUpdate($state, $blockSize,
                                       $blockLastLen, $md4DigestLen, $checksumSeed);

           $digests2 = $rsDigest->blockDigestExtract($digests16, $md4DigestLen);

DESCRIPTION

       The File::RsyncP::Digest module allows you to compute rsync digests,
       including the RSA Data Security Inc. MD4 Message Digest algorithm, and
       Adler32 checksums from within Perl programs.

       Rsync Digests

       Rsync uses two main digests (or checksums), for checking with very high
       probability that the underlying data is identical, without the need to
       exchange the underlying data.

       The server (remote) side of rsync generates a checksumSeed (usually
       unix time()) that is exchanged during the protocol startup.  This seed
       is used in both the file and block MD4 checksum calculations.  This
       causes the block and file checksums to change every time rsync is run.

       File Digest
           This is an MD4 digest of the checksum seed, followed by the entire
           file's contents.  This digest is 128 bits long.  The file digest is
           sent at the end of a file's deltas to ensure that the reconstructed
           file is correct.  This digest is also optionally computed and sent
           as part of the file list if the --checksum option is specified to
           rsync.

       Block Digest
           Each file is divided into blocks of default length 700 bytes.  The
           digest of each block is formed by computing the Adler32 checksum of
           the block, and also the MD4 digest of the block followed by the
           checksum seed.  During phase 1, just the first two bytes of the MD4
           digest are sent, meaning the total digest is 6 bytes or 48 bits (4
           bytes for Adler32 and the first 2 bytes of the MD4 digest).  During
           phase 2 (which is necessary for received files that have an
           incorrect file digest), the entire MD4 checksum is used (128 bits),
           meaning the block digest is 20 bytes or 160 bits.  (Prior to rsync
           protocol XXX, the full 20 byte digest was sent every time and there
           was only a single phase.)

       This module contains routines for computing file and block digests in a
       manner that is identical to rsync.

       Incidentally, rsync contains two bugs in its implementation of MD4 (up
       to and including rsync protocol version 26):

       ·   MD4Final() is not called when the data size (i.e., file or block
           size plus 4 bytes for the checksum seed) is a multiple of 64 bytes.

       ·   MD4 is not correct for total data sizes greater than 512MB (2^32
           bits).  Rsync's MD4 only maintains the data size using a 32 bit
           counter, so it overflows for file sizes bigger than 512MB.

       The effects of these bugs are benign: the MD4 digest should not be
       cryptographically weakened and both sides are consistent.
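
       As a rough illustration of when the two conditions above apply, the
       sketch below checks them for a given data length.  The helper names
       are hypothetical and are not part of File::RsyncP::Digest, and the
       sketch assumes a non-zero checksum seed that adds 4 bytes to the MD4
       input.

```perl
use strict;
use warnings;

# Hypothetical helpers (not part of File::RsyncP::Digest) that mirror
# the two conditions described in the text.

# Bug 1: MD4Final() is skipped when the data length plus the 4 byte
# checksum seed is a multiple of 64 bytes.
sub skips_md4_final {
    my ($dataLen, $seedLen) = @_;
    $seedLen = 4 unless defined $seedLen;   # assumption: non-zero seed
    return (($dataLen + $seedLen) % 64) == 0 ? 1 : 0;
}

# Bug 2: a 32 bit counter of bits cannot hold 2^32 or more, and
# 2^32 bits / 8 = 536870912 bytes (512MB).
sub overflows_bit_counter {
    my ($dataLen, $seedLen) = @_;
    $seedLen = 4 unless defined $seedLen;   # assumption: non-zero seed
    return (($dataLen + $seedLen) * 8 >= 2**32) ? 1 : 0;
}

# 700 + 4 = 704 = 11 * 64, so a full default-size block with a seed
# meets the first condition; a 600 byte block (604 bytes) does not.
printf "700 byte block skips MD4Final: %d\n", skips_md4_final(700);
printf "600 byte block skips MD4Final: %d\n", skips_md4_final(600);
printf "1GB file overflows counter:    %d\n",
    overflows_bit_counter(1024 * 1024 * 1024);
```

       Under these assumptions, every full block of the default 700 byte
       size falls into the first case whenever a seed is appended.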

       This module implements both versions of the MD4 digest: the buggy
       version for protocol versions <= 26 and the correct version for
       protocol versions >= 27.  The default mode is the buggy version
       (protocol versions <= 26).

       You can specify the rsync protocol version to determine which MD4
       version is used:

           # specify rsync protocol version (default is <= 26 -> buggy digests).
           $rsDigest->protocol(version);

       Also, you can get both digests in a single call.  The result is
       returned as a single 32 byte scalar: the first 16 bytes are the buggy
       digest and the second 16 bytes are the correct digest:

           # Return 32 byte pair of digests (protocol <= 26 and >= 27).
           $digestPair = $rsDigest->digest2();

       Usage

       A new rsync digest context object is created with the new operation.
       Multiple simultaneous digest contexts can be maintained, if desired.

       Computing Block Digests

       After a context is created, the function to compute block checksums is:

           $digests = $rsDigest->blockDigest($data, $blockSize, $md4DigestLen,
                                             $checksumSeed)

       The first argument is the data, which can contain as much raw data as
       you wish (i.e., multiple blocks).  Both the Adler32 checksum and the
       MD4 checksum are computed for each block in data.  The partial end
       block (if present) is also processed.  The 4 bytes of the integer
       checksumSeed are added at the end of each block digest calculation if
       it is non-zero.  The blockSize is specified in the second argument
       (default is 700).  The third argument, md4DigestLen, specifies how many
       bytes of the MD4 digest are included in the returned data.  Rsync uses
       a value of 2 for the first pass (meaning 6 bytes of total digests are
       returned per block), and all 16 bytes for the second pass (meaning 20
       bytes of total digests are returned per block).  The returned number of
       bytes is the number of bytes in each digest (Adler32 + partial/complete
       MD4) times the number of blocks:

           (4 + md4DigestLen) * ceil(length(data) / blockSize);
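
       The length formula above can be evaluated in plain Perl, for example
       to sanity check the size of a returned digest string.  This is only a
       sketch: expected_digest_length is a hypothetical helper, not part of
       the module, and it covers only a non-negative md4DigestLen (not the
       -1 state mode).

```perl
use strict;
use warnings;
use POSIX qw(ceil);

# Hypothetical helper: expected length in bytes of the string returned
# by blockDigest(), following the formula in the text.
sub expected_digest_length {
    my ($dataLen, $blockSize, $md4DigestLen) = @_;
    return (4 + $md4DigestLen) * ceil($dataLen / $blockSize);
}

# 2000 bytes in 700 byte blocks gives 3 blocks (700 + 700 + 600).
print expected_digest_length(2000, 700, 2),  "\n";   # first pass:  3 * 6  = 18
print expected_digest_length(2000, 700, 16), "\n";   # second pass: 3 * 20 = 60
```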

       To allow block checksums to be cached (when checksumSeed is unknown),
       and then quickly updated with the known checksumSeed, the checksum data
       should be first computed with a digest length of -1 and a checksumSeed
       of 0:

           $state = $rsDigest->blockDigest($data, $blockSize, -1, 0);

       The returned $state should be saved for later retrieval, together with
       the length of the last partial block (e.g., length($data) % $blockSize).
       The length of $state depends upon the number of blocks and the block
       size.  In addition to the 16 bytes of MD4 state, up to 63 bytes of
       unprocessed data per block are also saved in $state.  For each block,

           16 + ($blockSize % 64)

       bytes are saved in $state, so $state is most compact when $blockSize is
       a multiple of 64.  (The last, partial, block might have a smaller block
       size, requiring up to 63 bytes of state even if $blockSize is a
       multiple of 64.)
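
       To get a feel for how the block size affects the size of the cached
       state, the per-block formula above can be summed over all blocks.
       This is an estimate based only on that description;
       estimated_state_length is a hypothetical helper, and the length of
       the $state actually returned by blockDigest may differ.

```perl
use strict;
use warnings;

# Hypothetical helper: estimate the length of $state by applying
# 16 + (length % 64) to each full block and to the partial last block.
sub estimated_state_length {
    my ($dataLen, $blockSize) = @_;
    my $fullBlocks = int($dataLen / $blockSize);
    my $lastLen    = $dataLen % $blockSize;

    my $len = $fullBlocks * (16 + $blockSize % 64);
    $len += 16 + $lastLen % 64 if $lastLen > 0;
    return $len;
}

# 700 is not a multiple of 64, so each full block carries 60 extra bytes.
print estimated_state_length(2000, 700), "\n";   # 2 * 76 + 40 = 192
# 704 is a multiple of 64, so only the partial last block has extra bytes.
print estimated_state_length(2000, 704), "\n";   # 2 * 16 + 32 = 64
```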

       Once the checksumSeed is known, the updated checksums can then be
       computed using:

           $digests = $rsDigest->blockDigestUpdate($state, $blockSize,
                                       $blockLastLen, $md4DigestLen, $checksumSeed);

       The first argument is the cached state ($state) returned by
       blockDigest.  The third argument is the length of the (partial) last
       block.

       Alternatively, I hope to add a --checksum-seed=n option to rsync that
       allows the checksum seed to be set to 0.  This causes the checksum seed
       to be omitted from the MD4 calculation and it makes caching the
       checksums much easier.  A zero checksum seed does not weaken the block
       digest.  I'm not sure whether or not it weakens the file digest (the
       checksum seed is applied at the start of the file digest and end of the
       block digest).  In this case, the full 16 byte checksums should be
       computed using:

           $digests16 = $rsDigest->blockDigest($data, $blockSize, 16, 0);

       and for phase 1 the 2 byte MD4 substrings can be extracted with:

           $digests2  = $rsDigest->blockDigestExtract($digests16, 2);

       The original $digests16 does not need any additional processing for
       phase 2.
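
       Conceptually, the extraction step keeps the Adler32 bytes and a prefix
       of the MD4 bytes from each 20 byte record.  The pure-Perl sketch below
       illustrates that idea only; extract_short_digests is a hypothetical
       stand-in, not the module's implementation, and it assumes each record
       is 4 bytes of Adler32 followed by 16 bytes of MD4.

```perl
use strict;
use warnings;

# Hypothetical stand-in for blockDigestExtract(): from each 20 byte
# record keep the first 4 bytes (assumed Adler32) plus the first
# $md4DigestLen bytes of the MD4 digest that follows.
sub extract_short_digests {
    my ($digests16, $md4DigestLen) = @_;
    my $out = '';
    for my $record (unpack("(a20)*", $digests16)) {
        $out .= substr($record, 0, 4 + $md4DigestLen);
    }
    return $out;
}

# Two made-up 20 byte records, for illustration only (not real digests).
my $digests16 = ("A" x 4) . ("m" x 16) . ("B" x 4) . ("n" x 16);
my $digests2  = extract_short_digests($digests16, 2);

print length($digests2), "\n";   # 2 records * 6 bytes = 12
print $digests2, "\n";           # AAAAmmBBBBnn
```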

       Computing File Digests

       In addition, functions identical to Digest::MD4 are provided that allow
       rsync's MD4 file digest to be computed.  The checksum seed, if
       non-zero, is included at the start of the data, before the file's
       contents are added.

       The context is updated with the add operation which adds the strings
       contained in the LIST parameter. Note, however, that "add('foo',
       'bar')", "add('foo')" followed by "add('bar')" and "add('foobar')"
       should all give the same result.
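
       The same incremental add behaviour can be seen with the core
       Digest::MD5 module, which offers a similar new/add/hexdigest
       interface.  Digest::MD5 is used here purely as a stand-in so the
       sketch runs without File::RsyncP installed; it computes MD5, not
       rsync's MD4, so the digest values themselves are unrelated.

```perl
use strict;
use warnings;
use Digest::MD5;

# Stand-in illustration using Digest::MD5 (not File::RsyncP::Digest):
# the three ways of adding the same bytes yield the same digest.
my $listed = Digest::MD5->new;
$listed->add('foo', 'bar');

my $split = Digest::MD5->new;
$split->add('foo');
$split->add('bar');

my $joined = Digest::MD5->new;
$joined->add('foobar');

my @hex = map { $_->hexdigest } ($listed, $split, $joined);
print "all three digests match\n"
    if $hex[0] eq $hex[1] && $hex[1] eq $hex[2];
```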

       The final MD4 message digest value is returned by the digest operation
       as a 16-byte binary string. This operation delivers the result of add
       operations since the last new or reset operation. Note that the digest
       operation is effectively a destructive, read-once operation. Once it
       has been performed, the context must be reset before being used to
       calculate another digest value.

       Several convenience functions are also provided. The addfile operation
       takes an open file-handle and reads it until end-of-file in 1024 byte
       blocks, adding the contents to the context. The file-handle can either
       be specified by name or passed as a type-glob reference. The hexdigest
       operation calls digest and returns the result as a printable string of
       hexadecimal digits. This is exactly the same operation as performed by
       the unpack operation in the examples below.

       The hash operation can act as either a static member function (i.e.,
       you invoke it on the class, as in the synopsis above) or as a normal
       virtual function. In both cases it performs the complete MD4 cycle
       (reset, add, digest) on the supplied scalar value. This is convenient
       for handling small quantities of data. When invoked on the class a
       temporary context is created. When invoked through an already created
       context object, this context is used. The latter form is slightly more
       efficient. The hexhash operation is analogous to hexdigest.

EXAMPLES

           use File::RsyncP::Digest;

           my $rsDigest = new File::RsyncP::Digest;
           $rsDigest->add('foo', 'bar');
           $rsDigest->add('baz');
           my $digest = $rsDigest->digest();

           print("Rsync MD4 Digest is " . unpack("H*", $digest) . "\n");

       The above example would print out the message

           Rsync MD4 Digest is 6df23dc03f9b54cc38a0fc1483df6e21

       To compute the rsync phase 1 block checksums (4 + 2 = 6 bytes per
       block) for a 2000 byte file containing 700 a's, 700 b's and 600 c's,
       with a checksum seed of 0x12345678:

           use File::RsyncP::Digest;

           my $rsDigest = new File::RsyncP::Digest;
           my $data = ("a" x 700) . ("b" x 700) . ("c" x 600);
           my $digest = $rsDigest->blockDigest($data, 700, 2, 0x12345678);

           print("Rsync block checksums are " . unpack("H*", $digest) . "\n");

       This will print:

           Rsync block checksums are 3c09a624641bf80b0ce3abd208e8645d5b49
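
       The 18 bytes printed above are three 6 byte records, one per block.
       The sketch below splits that hex string back into per-block parts
       using only pack and unpack.  It assumes each record holds the 4
       Adler32 bytes followed by the 2 MD4 bytes, and it shows the bytes in
       stored order rather than as integers.

```perl
use strict;
use warnings;

# Hex string copied from the example output above.
my $hex    = "3c09a624641bf80b0ce3abd208e8645d5b49";
my $digest = pack("H*", $hex);

# Split into 6 byte records, then into the assumed 4 byte Adler32 part
# and 2 byte MD4 part.
my @records;
for my $record (unpack("(a6)*", $digest)) {
    my ($adler, $md4) = unpack("a4 a2", $record);
    push @records, [ unpack("H*", $adler), unpack("H*", $md4) ];
}

printf "block %d: adler32 %s  md4 %s\n", $_ + 1, @{ $records[$_] }
    for 0 .. $#records;
```

       This prints one line per block, for example "block 1: adler32
       3c09a624  md4 641b" for the first record.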

       The same result can be achieved in two steps by saving the state, and
       then finishing the calculation:

           my $state = $rsDigest->blockDigest($data, 700, -1, 0);

           my $digest = $rsDigest->blockDigestUpdate($state, 700,
                                           length($data) % 700, 2, 0x12345678);

       or by computing full-length MD4 digests, and extracting the 2 byte
       version:

           my $digest16 = $rsDigest->blockDigest($data, 700, 16, 0x12345678);
           my $digest   = $rsDigest->blockDigestExtract($digest16, 2);

LICENSE

       This program is free software; you can redistribute it and/or modify it
       under the terms of the GNU General Public License as published by the
       Free Software Foundation; either version 2 of the License, or (at your
       option) any later version.

       This program is distributed in the hope that it will be useful, but
       WITHOUT ANY WARRANTY; without even the implied warranty of
       MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
       General Public License for more details.

       You should have received a copy of the GNU General Public License in
       the LICENSE file along with this program; if not, write to the Free
       Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA
       02111-1307 USA.

       The MD4 algorithm is defined in RFC1320. The basic C code implementing
       the algorithm is derived from that in the RFC and is covered by the
       following copyright:

          MD4 is Copyright (C) 1990-2, RSA Data Security, Inc. All rights
          reserved.

          License to copy and use this software is granted provided that it
          is identified as the "RSA Data Security, Inc. MD4 Message-Digest
          Algorithm" in all material mentioning or referencing this software
          or this function.

          License is also granted to make and use derivative works provided
          that such works are identified as "derived from the RSA Data
          Security, Inc. MD4 Message-Digest Algorithm" in all material
          mentioning or referencing the derived work.

          RSA Data Security, Inc. makes no representations concerning either
          the merchantability of this software or the suitability of this
          software for any particular purpose. It is provided "as is"
          without express or implied warranty of any kind.

          These notices must be retained in any copies of any part of this
          documentation and/or software.

       This copyright does not prohibit distribution of any version of Perl
       containing this extension under the terms of the GNU or Artistic
       licences.

AUTHOR

       File::RsyncP::Digest was written by Craig Barratt
       <cbarratt@users.sourceforge.net> based on Digest::MD4 and the Adler32
       implementation was based on rsync 2.5.5.

       Digest::MD4 was adapted by Mike McCauley ("mikem@open.com.au"), based
       entirely on MD5-1.7, written by Neil Winton
       ("N.Winton@axion.bt.co.uk").

       Rsync was written by Andrew Tridgell <tridge@samba.org> and Paul
       Mackerras.  It is available under a GPL license.  See
       <http://rsync.samba.org>.

SEE ALSO

       See <http://perlrsync.sourceforge.net> for File::RsyncP's SourceForge
       home page.

       See File::RsyncP, File::RsyncP::FileIO and File::RsyncP::FileList.

perl v5.8.8                       2006-11-19                         Digest(3)