Digest(3)             User Contributed Perl Documentation            Digest(3)

NAME

       File::RsyncP::Digest - Perl interface to rsync message digest
       algorithms

SYNOPSIS

           use File::RsyncP::Digest;

           $rsDigest = new File::RsyncP::Digest;

           # specify rsync protocol version (default is <= 26 -> buggy digests).
           $rsDigest->protocol(version);

           # file MD4 digests
           $rsDigest->reset();
           $rsDigest->add(LIST);
           $rsDigest->addfile(HANDLE);

           $digest = $rsDigest->digest();
           $string = $rsDigest->hexdigest();

           # Return 32 byte pair of digests (protocol <= 26 and >= 27).
           $digestPair = $rsDigest->digest2();

           $digest = File::RsyncP::Digest->hash(SCALAR);
           $string = File::RsyncP::Digest->hexhash(SCALAR);

           # block digests
           $digests = $rsDigest->blockDigest($data, $blockSize, $md4DigestLen,
                                             $checksumSeed);

           $digests = $rsDigest->blockDigestUpdate($state, $blockSize,
                                       $blockLastLen, $md4DigestLen, $checksumSeed);

           $digests2 = $rsDigest->blockDigestExtract($digests16, $md4DigestLen);

DESCRIPTION

       The File::RsyncP::Digest module allows you to compute rsync digests,
       including the RSA Data Security Inc. MD4 Message Digest algorithm, and
       Adler32 checksums from within Perl programs.

   Rsync Digests
       Rsync uses two main digests (or checksums), for checking with very high
       probability that the underlying data is identical, without the need to
       exchange the underlying data.

       The server (remote) side of rsync generates a checksumSeed (usually
       unix time()) that is exchanged during the protocol startup.  This seed
       is used in both the file and block MD4 digest calculations.  This
       causes the block and file checksums to change every time rsync is run.

       File Digest
           This is an MD4 digest of the checksum seed, followed by the entire
           file's contents.  This digest is 128 bits long.  The file digest is
           sent at the end of a file's deltas to ensure that the reconstructed
           file is correct.  This digest is also optionally computed and sent
           as part of the file list if the --checksum option is specified to
           rsync.

       Block Digest
           Each file is divided into blocks of default length 700 bytes.  The
           digest of each block is formed by computing the Adler32 checksum of
           the block, and also the MD4 digest of the block followed by the
           checksum seed.  During phase 1, just the first two bytes of the MD4
           digest are sent, meaning the total digest is 6 bytes or 48 bits (4
           bytes for Adler32 and the first 2 bytes of the MD4 digest).  During
           phase 2 (which is necessary for received files that have an
           incorrect file digest), the entire MD4 checksum is used (128 bits)
           meaning the block digest is 20 bytes or 160 bits.  (Prior to rsync
           protocol XXX, the full 20 byte digest was sent every time and there
           was only a single phase.)

       This module contains routines for computing file and block digests in a
       manner that is identical to rsync.

       Incidentally, rsync contains two bugs in its implementation of MD4 (up
       to and including rsync protocol version 26):

       •   MD4Final() is not called when the data size (ie: file or block size
           plus 4 bytes for the checksum seed) is a multiple of 64.

       •   MD4 is not correct for total data sizes greater than 512MB (2^32
           bits).  Rsync's MD4 only maintains the data size using a 32 bit
           counter, so it overflows for file sizes bigger than 512MB.

       The effects of these bugs are benign: the MD4 digest should not be
       cryptographically weakened and both sides are consistent.
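
       As an illustration only, the two conditions above can be written as
       small predicates.  The helper names md4_final_skipped and
       md4_counter_overflows are hypothetical and are not part of this
       module; the sketch assumes a 4 byte checksum seed is appended to the
       data.  Note that the default 700 byte block plus a 4 byte seed is 704
       bytes, which is a multiple of 64.

```perl
use strict;
use warnings;

# Hypothetical helper: true when the total MD4 input length (data plus
# seed bytes) is a multiple of 64, the case in which rsync protocol
# versions <= 26 do not call MD4Final().
sub md4_final_skipped {
    my ($dataLen, $seedLen) = @_;
    return (($dataLen + $seedLen) % 64) == 0 ? 1 : 0;
}

# Hypothetical helper: true when the input length in bits no longer fits
# in a 32 bit counter (which holds at most 2**32 - 1), i.e. once the
# input reaches 2**29 bytes (512MB).
sub md4_counter_overflows {
    my ($totalLen) = @_;
    return ($totalLen * 8) >= 2**32 ? 1 : 0;
}

printf("700 byte block + 4 byte seed:  skipped=%d\n", md4_final_skipped(700, 4));
printf("1000 byte block + 4 byte seed: skipped=%d\n", md4_final_skipped(1000, 4));
printf("600MB file: overflow=%d\n", md4_counter_overflows(600 * 1024 * 1024));
```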

       This module implements both versions of the MD4 digest: the buggy
       version for protocol versions <= 26 and the correct version for
       protocol versions >= 27.  The default mode is the buggy version
       (protocol versions <= 26).

       You can specify the rsync protocol version to determine which MD4
       version is used:

           # specify rsync protocol version (default is <= 26 -> buggy digests).
           $rsDigest->protocol(version);

       Also, you can get both digests in a single call.  The result is
       returned as a single 32 byte scalar: the first 16 bytes are the buggy
       digest and the second 16 bytes are the correct digest:

           # Return 32 byte pair of digests (protocol <= 26 and >= 27).
           $digestPair = $rsDigest->digest2();
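
       A minimal sketch of splitting such a pair into its two halves with
       unpack.  To keep the sketch self-contained, $digestPair here is a
       placeholder 32 byte string rather than the result of a real digest2()
       call, and the variable names $digestOld and $digestNew are
       illustrative only.

```perl
use strict;
use warnings;

# Placeholder standing in for the 32 byte value returned by digest2();
# with the module loaded this would be: $digestPair = $rsDigest->digest2();
my $digestPair = ("\x11" x 16) . ("\x22" x 16);

# First 16 bytes: protocol <= 26 (buggy) digest.
# Last 16 bytes:  protocol >= 27 (correct) digest.
my ($digestOld, $digestNew) = unpack("a16 a16", $digestPair);

print("protocol <= 26: " . unpack("H*", $digestOld) . "\n");
print("protocol >= 27: " . unpack("H*", $digestNew) . "\n");
```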

   Usage
       A new rsync digest context object is created with the new operation.
       Multiple simultaneous digest contexts can be maintained, if desired.

   Computing Block Digests
       After a context is created, the function to compute block checksums is:

           $digests = $rsDigest->blockDigest($data, $blockSize, $md4DigestLen,
                                             $checksumSeed)

       The first argument is the data, which can contain as much raw data as
       you wish (ie: multiple blocks).  Both the Adler32 checksum and the MD4
       checksum are computed for each block in data.  The partial end block
       (if present) is also processed.  The 4 bytes of the integer
       checksumSeed are added at the end of each block digest calculation if
       it is non-zero.  The blockSize is specified in the second argument
       (default is 700).  The third argument, md4DigestLen, specifies how many
       bytes of the MD4 digest are included in the returned data.  Rsync uses
       a value of 2 for the first pass (meaning 6 bytes of total digests are
       returned per block), and all 16 bytes for the second pass (meaning 20
       bytes of total digests are returned per block).  The returned number of
       bytes is the number of bytes in each digest (Adler32 + partial/complete
       MD4) times the number of blocks:

           (4 + md4DigestLen) * ceil(length(data) / blockSize);
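
       The formula above can be checked with a few lines of plain Perl.  The
       function name block_digest_length is hypothetical (it is not provided
       by this module), and the ceiling is computed with integer arithmetic
       to avoid floating point rounding.

```perl
use strict;
use warnings;

# Hypothetical helper: expected length in bytes of the string returned by
# blockDigest() for $dataLen bytes of data, per the formula above.
sub block_digest_length {
    my ($dataLen, $blockSize, $md4DigestLen) = @_;
    # Integer ceiling of $dataLen / $blockSize.
    my $numBlocks = int(($dataLen + $blockSize - 1) / $blockSize);
    return (4 + $md4DigestLen) * $numBlocks;
}

# 2000 bytes in 700 byte blocks is 3 blocks (the last one partial).
print(block_digest_length(2000, 700, 2), "\n");     # prints 18
print(block_digest_length(2000, 700, 16), "\n");    # prints 60
```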

       To allow block checksums to be cached (when checksumSeed is unknown),
       and then quickly updated with the known checksumSeed, the checksum data
       should be first computed with a digest length of -1 and a checksumSeed
       of 0:

           $state = $rsDigest->blockDigest($data, $blockSize, -1, 0);

       The returned $state should be saved for later retrieval, together with
       the length of the last partial block (eg: length($data) % $blockSize).
       The length of $state depends upon the number of blocks and the block
       size.  In addition to the 16 bytes of MD4 state, up to 63 bytes of
       unprocessed data per block are also saved in $state.  For each block,

           16 + ($blockSize % 64)

       bytes are saved in $state, so $state is most compact when $blockSize is
       a multiple of 64.  (The last, partial, block might have a smaller block
       size, requiring up to 63 bytes of state even if $blockSize is a
       multiple of 64.)
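
       As a rough sketch, the storage needed for $state can be estimated
       from the per-block figure above.  The function name
       block_state_length is hypothetical, and the result is derived only
       from the formula in this section (it also assumes the last, partial,
       block follows the same rule with its own length); it has not been
       checked against the length the module actually returns.

```perl
use strict;
use warnings;

# Hypothetical helper: estimated number of bytes of cached state for
# $dataLen bytes of data, using 16 + (blockLen % 64) bytes per block.
sub block_state_length {
    my ($dataLen, $blockSize) = @_;
    my $fullBlocks = int($dataLen / $blockSize);
    my $lastLen    = $dataLen % $blockSize;

    my $total = $fullBlocks * (16 + $blockSize % 64);
    # Assumption: a partial last block stores its own leftover bytes.
    $total += 16 + $lastLen % 64 if $lastLen > 0;
    return $total;
}

# 700 byte blocks leave 60 unprocessed bytes each; 1024 byte blocks none.
print(block_state_length(2000, 700), "\n");     # prints 192
print(block_state_length(2048, 1024), "\n");    # prints 32
```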

       Once the checksumSeed is known the updated checksums can then be
       computed using:

           $digests = $rsDigest->blockDigestUpdate($state, $blockSize,
                                       $blockLastLen, $md4DigestLen, $checksumSeed);

       The first argument is the cached checksums from blockDigest.  The third
       argument is the length of the (partial) last block.

       Alternatively, I hope to add a --checksum-seed=n option to rsync that
       allows the checksum seed to be set to 0.  This causes the checksum seed
       to be omitted from the MD4 calculation and it makes caching the
       checksums much easier.  A zero checksum seed does not weaken the block
       digest.  I'm not sure whether or not it weakens the file digest (the
       checksum seed is applied at the start of the file digest and end of the
       block digest).  In this case, the full 16 byte checksums should be
       computed using:

           $digests16 = $rsDigest->blockDigest($data, $blockSize, 16, 0);

       and for phase 1 the 2 byte MD4 substrings can be extracted with:

           $digests2  = $rsDigest->blockDigestExtract($digests16, 2);

       The original $digests16 does not need any additional processing for
       phase 2.
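
       To make the extraction step concrete, here is a plain Perl sketch of
       what it amounts to, assuming each 20 byte record holds the 4 byte
       Adler32 checksum followed by the 16 byte MD4 digest.  The function
       extract_digests is a hypothetical stand-in written for illustration,
       not the module's blockDigestExtract, and the input records are
       made-up bytes rather than real digests.

```perl
use strict;
use warnings;

# Hypothetical stand-in: keep the 4 byte checksum and the first
# $md4DigestLen bytes of MD4 from each 20 byte record.
sub extract_digests {
    my ($digests16, $md4DigestLen) = @_;
    my $out = "";
    for (my $off = 0; $off + 20 <= length($digests16); $off += 20) {
        $out .= substr($digests16, $off, 4 + $md4DigestLen);
    }
    return $out;
}

# Three made-up records: a 4 byte counter followed by 16 filler bytes.
my $digests16 = join("", map { pack("N", $_) . (chr(0x40 + $_) x 16) } 1 .. 3);
my $digests2  = extract_digests($digests16, 2);

print(unpack("H*", $digests2) . "\n");
```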

   Computing File Digests
       In addition, functions identical to Digest::MD4 are provided that allow
       rsync's MD4 file digest to be computed.  The checksum seed, if non-
       zero, is included at the start of the data, before the file's contents
       are added.

       The context is updated with the add operation which adds the strings
       contained in the LIST parameter. Note, however, that "add('foo',
       'bar')", "add('foo')" followed by "add('bar')" and "add('foobar')"
       should all give the same result.
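
       The equivalence of these three call patterns is a general property of
       incremental digest interfaces.  Because File::RsyncP::Digest may not
       be installed everywhere, the sketch below demonstrates it with the
       core Digest::MD5 module as a stand-in, which offers the same new, add
       and hexdigest operations.  It shows the property only; the digest
       values are MD5, not rsync's MD4.

```perl
use strict;
use warnings;
use Digest::MD5;    # stand-in for File::RsyncP::Digest (assumption)

# One add() call with a list of two strings.
my $ctxList = Digest::MD5->new;
$ctxList->add('foo', 'bar');
my $hexList = $ctxList->hexdigest;

# Two separate add() calls.
my $ctxSplit = Digest::MD5->new;
$ctxSplit->add('foo');
$ctxSplit->add('bar');
my $hexSplit = $ctxSplit->hexdigest;

# One add() call with the concatenated string.
my $ctxWhole = Digest::MD5->new;
$ctxWhole->add('foobar');
my $hexWhole = $ctxWhole->hexdigest;

print(($hexList eq $hexSplit && $hexSplit eq $hexWhole)
      ? "all three digests match\n" : "digests differ\n");
```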

       The final MD4 message digest value is returned by the digest operation
       as a 16-byte binary string. This operation delivers the result of add
       operations since the last new or reset operation. Note that the digest
       operation is effectively a destructive, read-once operation. Once it
       has been performed, the context must be reset before being used to
       calculate another digest value.

       Several convenience functions are also provided. The addfile operation
       takes an open file-handle and reads it until end-of-file in 1024 byte
       blocks, adding the contents to the context. The file-handle can either
       be specified by name or passed as a type-glob reference. The hexdigest
       operation calls digest and returns the result as a printable string of
       hexadecimal digits. This is exactly the same operation as performed by
       the unpack operation in the examples below.

       The hash operation can act as either a static member function (ie: you
       invoke it on the File::RsyncP::Digest class as in the synopsis above)
       or as a normal method of a context object. In both cases it performs
       the complete MD4 cycle (reset, add, digest) on the supplied scalar
       value. This is convenient for handling small quantities of data. When
       invoked on the class a temporary context is created. When invoked
       through an already created context object, this context is used. The
       latter form is slightly more efficient. The hexhash operation is
       analogous to hexdigest.

EXAMPLES

           use File::RsyncP::Digest;

           my $rsDigest = new File::RsyncP::Digest;
           $rsDigest->add('foo', 'bar');
           $rsDigest->add('baz');
           my $digest = $rsDigest->digest();

           print("Rsync MD4 Digest is " . unpack("H*", $digest) . "\n");

       The above example would print out the message

           Rsync MD4 Digest is 6df23dc03f9b54cc38a0fc1483df6e21

       To compute the rsync phase 1 block checksums (4 + 2 = 6 bytes per
       block) for a 2000 byte file containing 700 a's, 700 b's and 600 c's,
       with a checksum seed of 0x12345678:

           use File::RsyncP::Digest;

           my $rsDigest = new File::RsyncP::Digest;
           my $data = ("a" x 700) . ("b" x 700) . ("c" x 600);
           my $digest = $rsDigest->blockDigest($data, 700, 2, 0x12345678);

           print("Rsync block checksums are " . unpack("H*", $digest) . "\n");

       This will print:

           Rsync block checksums are 3c09a624641bf80b0ce3abd208e8645d5b49
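
       The 18 bytes printed above are three 6 byte records, one per block.
       A short sketch of splitting them apart, assuming each record is the
       4 byte Adler32 checksum followed by the first 2 bytes of the MD4
       digest.  The fields are shown as hex strings only, since the byte
       order of the Adler32 field is not stated here.

```perl
use strict;
use warnings;

# The phase 1 block checksums printed by the example above.
my $digest = pack("H*", "3c09a624641bf80b0ce3abd208e8645d5b49");

my $md4DigestLen = 2;
my $recLen       = 4 + $md4DigestLen;

# Split the string into one fixed-length record per block.
my @records = unpack("(a$recLen)*", $digest);

for my $i (0 .. $#records) {
    my ($adler, $md4) = unpack("a4 a$md4DigestLen", $records[$i]);
    printf("block %d: adler32=%s md4=%s\n",
           $i, unpack("H*", $adler), unpack("H*", $md4));
}
```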

       The same result can be achieved in two steps by saving the state, and
       then finishing the calculation:

           my $state = $rsDigest->blockDigest($data, 700, -1, 0);

           my $digest = $rsDigest->blockDigestUpdate($state, 700,
                                           length($data) % 700, 2, 0x12345678);

       or by computing full-length MD4 digests, and extracting the 2 byte
       version:

           my $digest16 = $rsDigest->blockDigest($data, 700, 16, 0x12345678);
           my $digest   = $rsDigest->blockDigestExtract($digest16, 2);

LICENSE

       This program is free software: you can redistribute it and/or modify it
       under the terms of the GNU General Public License as published by the
       Free Software Foundation, either version 3 of the License, or (at your
       option) any later version.

       This program is distributed in the hope that it will be useful, but
       WITHOUT ANY WARRANTY; without even the implied warranty of
       MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
       General Public License for more details.

       You should have received a copy of the GNU General Public License along
       with this program.  If not, see <http://www.gnu.org/licenses/>.

       The MD4 algorithm is defined in RFC1320. The basic C code implementing
       the algorithm is derived from that in the RFC and is covered by the
       following copyright:

          MD4 is Copyright (C) 1990-2, RSA Data Security, Inc. All rights
          reserved.

          License to copy and use this software is granted provided that it
          is identified as the "RSA Data Security, Inc. MD4 Message-Digest
          Algorithm" in all material mentioning or referencing this software
          or this function.

          License is also granted to make and use derivative works provided
          that such works are identified as "derived from the RSA Data
          Security, Inc. MD4 Message-Digest Algorithm" in all material
          mentioning or referencing the derived work.

          RSA Data Security, Inc. makes no representations concerning either
          the merchantability of this software or the suitability of this
          software for any particular purpose. It is provided "as is"
          without express or implied warranty of any kind.

          These notices must be retained in any copies of any part of this
          documentation and/or software.

       This copyright does not prohibit distribution of any version of Perl
       containing this extension under the terms of the GNU or Artistic
       licences.

AUTHOR

       File::RsyncP::Digest was written by Craig Barratt
       <cbarratt@users.sourceforge.net> based on Digest::MD4.  The Adler32
       implementation was based on rsync 2.5.5.

       Digest::MD4 was adapted by Mike McCauley ("mikem@open.com.au"), based
       entirely on MD5-1.7, written by Neil Winton
       ("N.Winton@axion.bt.co.uk").

       Rsync was written by Andrew Tridgell <tridge@samba.org> and Paul
       Mackerras.  It is available under a GPL license.  See
       <http://rsync.samba.org>.

SEE ALSO

       See <http://perlrsync.sourceforge.net> for File::RsyncP's SourceForge
       home page.

       See File::RsyncP, File::RsyncP::FileIO and File::RsyncP::FileList.

perl v5.36.0                      2022-07-22                         Digest(3)