Convert::BinHex(3)    User Contributed Perl Documentation    Convert::BinHex(3)
2
3
4
6 Convert::BinHex - extract data from Macintosh BinHex files
7
8 ALPHA WARNING: this code is currently in its Alpha release. Things may
9 change drastically until the interface is hammered out: if you have
10 suggestions or objections, please speak up now!
11
13 Simple functions:
14
15 use Convert::BinHex qw(binhex_crc macbinary_crc);
16
17 # Compute HQX7-style CRC for data, pumping in old CRC if desired:
18 $crc = binhex_crc($data, $crc);
19
20 # Compute the MacBinary-II-style CRC for the data:
21 $crc = macbinary_crc($data, $crc);
22
23 Hex to bin, low-level interface. Conversion is actually done via an
24 object ("Convert::BinHex::Hex2Bin") which keeps internal conversion
25 state:
26
# Create and use a "translator" object:
my $H2B = Convert::BinHex->hex2bin;    # get a converter object
while (<STDIN>) {
    print STDOUT $H2B->next($_);       # convert some more input
}
print STDOUT $H2B->done;               # no more input: finish up
33
34 Hex to bin, OO interface. The following operations must be done in the
35 order shown!
36
# Read data in piecemeal:
$HQX = Convert::BinHex->open(FH=>\*STDIN) || die "open: $!";
$HQX->read_header;                  # read header info
@data = $HQX->read_data;            # read in all the data
@rsrc = $HQX->read_resource;        # read in all the resource
42
43 Bin to hex, low-level interface. Conversion is actually done via an
44 object ("Convert::BinHex::Bin2Hex") which keeps internal conversion
45 state:
46
# Create and use a "translator" object:
my $B2H = Convert::BinHex->bin2hex;    # get a converter object
while (<STDIN>) {
    print STDOUT $B2H->next($_);       # convert some more input
}
print STDOUT $B2H->done;               # no more input: finish up
53
54 Bin to hex, file interface. Yes, you can convert to BinHex as well as
55 from it!
56
57 # Create new, empty object:
58 my $HQX = Convert::BinHex->new;
59
60 # Set header attributes:
61 $HQX->filename("logo.gif");
62 $HQX->type("GIFA");
63 $HQX->creator("CNVS");
64
65 # Give it the data and resource forks (either can be absent):
66 $HQX->data(Path => "/path/to/data"); # here, data is on disk
67 $HQX->resource(Data => $resourcefork); # here, resource is in core
68
69 # Output as a BinHex stream, complete with leading comment:
70 $HQX->encode(\*STDOUT);
71
PLANNED!!!! Bin to hex, "CAP" interface. Thanks to Ken Lunde for
suggesting this.
74
75 # Create new, empty object from CAP tree:
76 my $HQX = Convert::BinHex->from_cap("/path/to/root/file");
77 $HQX->encode(\*STDOUT);
78
BinHex is a format used by Macintosh for transporting Mac files safely
through electronic mail, as short-lined, 7-bit, semi-compressed data
streams. This module provides a means of converting those data streams
back into binary data.
84
86 (Some text taken from RFC-1741.) Files on the Macintosh consist of two
87 parts, called forks:
88
89 Data fork
90 The actual data included in the file. The Data fork is typically
91 the only meaningful part of a Macintosh file on a non-Macintosh
92 computer system. For example, if a Macintosh user wants to send a
93 file of data to a user on an IBM-PC, she would only send the Data
94 fork.
95
96 Resource fork
97 Contains a collection of arbitrary attribute/value pairs, including
98 program segments, icon bitmaps, and parametric values.
99
100 Additional information regarding Macintosh files is stored by the
101 Finder in a hidden file, called the "Desktop Database".
102
103 Because of the complications in storing different parts of a Macintosh
104 file in a non-Macintosh filesystem that only handles consecutive data
105 in one part, it is common to convert the Macintosh file into some other
106 format before transferring it over the network. The BinHex format
107 squashes that data into transmittable ASCII as follows:
108
109 1. The file is output as a byte stream consisting of some basic header
110 information (filename, type, creator), then the data fork, then the
111 resource fork.
112
2. The byte stream is compressed by looking for series of duplicated
bytes and representing them using a special binary escape sequence
(of course, any occurrences of the escape character must also be
escaped).
117
118 3. The compressed stream is encoded via the "6/8 hemiola" common to
119 base64 and uuencode: each group of three 8-bit bytes (24 bits) is
120 chopped into four 6-bit numbers, which are used as indexes into an
121 ASCII "alphabet". (I assume that leftover bytes are zero-padded;
122 documentation is thin).
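
Step 3 can be illustrated with a minimal pure-Perl sketch. It only shows how 8-bit bytes are regrouped into 6-bit indexes; it does not map them onto the BinHex alphabet, and it is not the module's own code. The helper name six_bit_groups is hypothetical, and the zero-padding of the final group follows the assumption stated above rather than a confirmed rule.

```perl
use strict;
use warnings;

# six_bit_groups is a hypothetical helper, not part of Convert::BinHex.
# It splits a byte string into 6-bit values, zero-padding the last group
# (the padding rule is the assumption made in the text above).
sub six_bit_groups {
    my ($bytes) = @_;
    my $bits = unpack('B*', $bytes);           # bytes as a string of 0s and 1s
    my $pad  = (6 - length($bits) % 6) % 6;    # bits needed to fill the last group
    $bits .= '0' x $pad;
    return map { oct("0b$_") } $bits =~ /(.{6})/g;
}

my @idx = six_bit_groups("Man");    # three 8-bit bytes become four 6-bit indexes
print "@idx\n";                     # prints "19 22 5 46"
```

Each resulting number would then be used as an index into the 64-character ASCII alphabet.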
123
125 CRC computation
126
127 macbinary_crc DATA, SEED
128 Compute the MacBinary-II-style CRC for the given DATA, with the CRC
129 seeded to SEED. Normally, you start with a SEED of 0, and you pump
130 in the previous CRC as the SEED if you're handling a lot of data
131 one chunk at a time. That is:
132
133 $crc = 0;
134 while (<STDIN>) {
135 $crc = macbinary_crc($_, $crc);
136 }
137
138 Note: Extracted from the mcvert utility (Doug Moore, April '87),
139 using a "magic array" algorithm by Jim Van Verth for efficiency.
140 Converted to Perl5 by Eryq. Untested.
141
142 binhex_crc DATA, SEED
143 Compute the HQX-style CRC for the given DATA, with the CRC seeded
144 to SEED. Normally, you start with a SEED of 0, and you pump in the
145 previous CRC as the SEED if you're handling a lot of data one chunk
146 at a time. That is:
147
148 $crc = 0;
149 while (<STDIN>) {
150 $crc = binhex_crc($_, $crc);
151 }
152
153 Note: Extracted from the mcvert utility (Doug Moore, April '87),
154 using a "magic array" algorithm by Jim Van Verth for efficiency.
155 Converted to Perl5 by Eryq.
156
158 Conversion
159
160 bin2hex
Class method, constructor. Return a converter object. Just creates
a new instance of "Convert::BinHex::Bin2Hex"; see that class
for details.
164
165 hex2bin
Class method, constructor. Return a converter object. Just creates
a new instance of "Convert::BinHex::Hex2Bin"; see that class
for details.
169
170 Construction
171
172 new PARAMHASH
Class method, constructor. Return a handle on a BinHex'able
entity. In general, the data and resource forks for such an entity
are stored in native (binary) format.
176
177 Parameters in the PARAMHASH are the same as header-oriented method
178 names, and may be used to set attributes:
179
180 $HQX = new Convert::BinHex filename => "icon.gif",
181 type => "GIFB",
182 creator => "CNVS";
183
184 open PARAMHASH
185 Class method, constructor. Return a handle on a new BinHex'ed
186 stream, for parsing. Params are:
187
188 Data
189 Input a HEX stream from the given data. This can be a scalar,
190 or a reference to an array of scalars.
191
192 Expr
193 Input a HEX stream from any open()able expression. It will be
194 opened and binmode'd, and the filehandle will be closed either
195 on a "close()" or when the object is destructed.
196
197 FH Input a HEX stream from the given filehandle.
198
199 NoComment
If true, the parser should not attempt to skip a leading "(This
file...)" comment. That means that the first nonwhite characters
encountered must be the binhex'ed data.
203
204 Get/set header information
205
206 creator [VALUE]
207 Instance method. Get/set the creator of the file. This is a four-
208 character string (though I don't know if it's guaranteed to be
209 printable ASCII!) that serves as part of the Macintosh's version
210 of a MIME "content-type".
211
212 For example, a document created by "Canvas" might have creator
213 "CNVS".
214
215 data [PARAMHASH]
216 Instance method. Get/set the data fork. Any arguments are passed
217 into the new() method of "Convert::BinHex::Fork".
218
219 filename [VALUE]
220 Instance method. Get/set the name of the file.
221
222 flags [VALUE]
Instance method. Return the flags, as an integer. Use bitmasking
to get the values you need.
225
226 header_as_string
227 Return a stringified version of the header that you might use for
228 logging/debugging purposes. It looks like this:
229
230 X-HQX-Software: BinHex 4.0 (Convert::BinHex 1.102)
231 X-HQX-Filename: Something_new.eps
232 X-HQX-Version: 0
233 X-HQX-Type: EPSF
234 X-HQX-Creator: ART5
235 X-HQX-Data-Length: 49731
236 X-HQX-Rsrc-Length: 23096
237
238 As some of you might have guessed, this is RFC-822-style, and may
239 be easily plunked down into the middle of a mail header, or split
240 into lines, etc.
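
As a rough illustration of "split into lines", the sketch below parses such a header string back into a hash. The sample text is copied from the example above; parse_hqx_header is a hypothetical helper written for this illustration, not a method of the module.

```perl
use strict;
use warnings;

# Sample text copied from the header_as_string() example above.
my $header = <<'END';
X-HQX-Software: BinHex 4.0 (Convert::BinHex 1.102)
X-HQX-Filename: Something_new.eps
X-HQX-Version: 0
X-HQX-Type: EPSF
X-HQX-Creator: ART5
X-HQX-Data-Length: 49731
X-HQX-Rsrc-Length: 23096
END

# parse_hqx_header is a hypothetical helper, not part of Convert::BinHex.
# It returns a hashref keyed by the lowercased field name without "X-HQX-".
sub parse_hqx_header {
    my ($text) = @_;
    my %field;
    for my $line (split /\n/, $text) {
        next unless $line =~ /^X-HQX-([\w-]+):\s*(.*)$/;
        $field{lc $1} = $2;
    }
    return \%field;
}

my $info = parse_hqx_header($header);
printf "%s: %d data bytes, %d resource bytes\n",
    @{$info}{qw(filename data-length rsrc-length)};
```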
241
242 requires [VALUE]
243 Instance method. Get/set the software version required to convert
244 this file, as extracted from the comment that preceded the actual
245 binhex'ed data; e.g.:
246
247 (This file must be converted with BinHex 4.0)
248
249 In this case, after parsing in the comment, the code:
250
251 $HQX->requires;
252
253 would get back "4.0".
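
For illustration only, the version number can be pulled out of such a comment with a regular expression. This is a sketch of the idea, not the pattern the module actually uses; required_version is a hypothetical helper name.

```perl
use strict;
use warnings;

# required_version is a hypothetical helper, not part of Convert::BinHex.
# It returns the version named in the leading comment, or undef if no
# such comment is found in the text.
sub required_version {
    my ($text) = @_;
    return $text =~ /\(This file must be converted with BinHex\s+([\d.]+)\)/
        ? $1
        : undef;
}

my $mail = "Some mail text\n(This file must be converted with BinHex 4.0)\n";
print required_version($mail), "\n";    # prints "4.0"
```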
254
255 resource [PARAMHASH]
256 Instance method. Get/set the resource fork. Any arguments are
257 passed into the new() method of "Convert::BinHex::Fork".
258
259 type [VALUE]
260 Instance method. Get/set the type of the file. This is a four-
261 character string (though I don't know if it's guaranteed to be
262 printable ASCII!) that serves as part of the Macintosh's version
263 of a MIME "content-type".
264
265 For example, a GIF89a file might have type "GF89".
266
267 version [VALUE]
268 Instance method. Get/set the version, as an integer.
269
270 Decode, high-level
271
272 read_comment
273 Instance method. Skip past the opening comment in the file, which
274 is of the form:
275
276 (This file must be converted with BinHex 4.0)
277
278 As per RFC-1741, this comment must immediately precede the BinHex
279 data, and any text before it will be ignored.
280
281 You don't need to invoke this method yourself; "read_header()" will
282 do it for you. After the call, the version number in the comment
283 is accessible via the "requires()" method.
284
285 read_header
286 Instance method. Read in the BinHex file header. You must do this
287 first!
288
289 read_data [NBYTES]
290 Instance method. Read information from the data fork. Use it in
291 an array context to slurp all the data into an array of scalars:
292
293 @data = $HQX->read_data;
294
295 Or use it in a scalar context to get the data piecemeal:
296
297 while (defined($data = $HQX->read_data)) {
298 # do stuff with $data
299 }
300
301 The NBYTES to read defaults to 2048.
302
303 read_resource [NBYTES]
304 Instance method. Read in all/some of the resource fork. See
305 "read_data()" for usage.
306
307 Encode, high-level
308
309 encode OUT
310 Encode the object as a BinHex stream to the given output handle
311 OUT. OUT can be a filehandle, or any blessed object that responds
312 to a "print()" message.
313
314 The leading comment is output, using the "requires()" attribute.
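
As a sketch of what "any blessed object that responds to a print() message" could look like, the class below collects everything printed to it in a string. The class name StringSink is hypothetical and exists only for this example; the call to encode() appears only in a comment, since it needs a populated Convert::BinHex object.

```perl
use strict;
use warnings;

package StringSink;    # hypothetical class name, used only in this example

# The object is a blessed reference to the string being accumulated.
sub new       { my ($class) = @_; my $buf = ''; return bless \$buf, $class }
sub print     { my $self = shift; $$self .= join('', @_); return 1 }
sub as_string { my ($self) = @_; return $$self }

package main;

my $sink = StringSink->new;
$sink->print("(This file must be converted with BinHex 4.0)\n");
$sink->print(":", "...", "\n");
print length($sink->as_string), " bytes collected\n";

# Given an object $HQX set up as in the synopsis, the same sink could
# presumably be passed as OUT:
#     $HQX->encode($sink);
```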
315
317 Convert::BinHex::Bin2Hex
318
319 A BINary-to-HEX converter. This kind of conversion requires a certain
320 amount of state information; it cannot be done by just calling a simple
321 function repeatedly. Use it like this:
322
323 # Create and use a "translator" object:
324 my $B2H = Convert::BinHex->bin2hex; # get a converter object
325 while (<STDIN>) {
326 print STDOUT $B2H->next($_); # convert some more input
327 }
328 print STDOUT $B2H->done; # no more input: finish up
329
330 # Re-use the object:
331 $B2H->rewind; # ready for more action!
332 while (<MOREIN>) { ...
333
On each iteration, "next()" (and "done()") may return either a decent-
sized non-empty string (indicating that more converted data is ready
for you) or an empty string (indicating that the converter is waiting
to amass more input in its private buffers before handing you more
stuff to output).
339
340 Note that "done()" always converts and hands you whatever is left.
341
This may have been a good approach. It may not. Someday, the
converter may also allow you to give it an object that responds to
read(), or a FileHandle, and it will do all the nasty buffer-filling
on its own, serving you stuff line by line:
346
347 # Someday, maybe...
348 my $B2H = Convert::BinHex->bin2hex(\*STDIN);
349 while (defined($_ = $B2H->getline)) {
350 print STDOUT $_;
351 }
352
353 Someday, maybe. Feel free to voice your opinions.
354
355 Convert::BinHex::Hex2Bin
356
357 A HEX-to-BINary converter. This kind of conversion requires a certain
358 amount of state information; it cannot be done by just calling a simple
359 function repeatedly. Use it like this:
360
361 # Create and use a "translator" object:
362 my $H2B = Convert::BinHex->hex2bin; # get a converter object
363 while (<STDIN>) {
364 print STDOUT $H2B->next($_); # convert some more input
365 }
366 print STDOUT $H2B->done; # no more input: finish up
367
368 # Re-use the object:
369 $H2B->rewind; # ready for more action!
370 while (<MOREIN>) { ...
371
On each iteration, "next()" (and "done()") may return either a decent-
sized non-empty string (indicating that more converted data is ready
for you) or an empty string (indicating that the converter is waiting
to amass more input in its private buffers before handing you more
stuff to output).
377
378 Note that "done()" always converts and hands you whatever is left.
379
380 Note that this converter does not find the initial "BinHex version"
381 comment. You have to skip that yourself. It only handles data between
382 the opening and closing ":".
383
384 Convert::BinHex::Fork
385
386 A fork in a Macintosh file.
387
388 # How to get them...
389 $data_fork = $HQX->data; # get the data fork
390 $rsrc_fork = $HQX->resource; # get the resource fork
391
392 # Make a new fork:
393 $FORK = Convert::BinHex::Fork->new(Path => "/tmp/file.data");
394 $FORK = Convert::BinHex::Fork->new(Data => $scalar);
395 $FORK = Convert::BinHex::Fork->new(Data => \@array_of_scalars);
396
397 # Get/set the length of the data fork:
398 $len = $FORK->length;
399 $FORK->length(170); # this overrides the REAL value: be careful!
400
401 # Get/set the path to the underlying data (if in a disk file):
402 $path = $FORK->path;
403 $FORK->path("/tmp/file.data");
404
405 # Get/set the in-core data itself, which may be a scalar or an arrayref:
406 $data = $FORK->data;
407 $FORK->data($scalar);
408 $FORK->data(\@array_of_scalars);
409
410 # Get/set the CRC:
411 $crc = $FORK->crc;
412 $FORK->crc($crc);
413
415 Design issues
416
417 BinHex needs a stateful parser
418 Unlike its cousins base64 and uuencode, BinHex format is not
419 amenable to being parsed line-by-line. There appears to be no
420 guarantee that lines contain 4n encoded characters... and even if
421 there is one, the BinHex compression algorithm interferes: even
422 when you can decode one line at a time, you can't necessarily
423 decompress a line at a time.
424
425 For example: a decoded line ending with the byte "\x90" (the escape
426 or "mark" character) is ambiguous: depending on the next decoded
427 byte, it could mean a literal "\x90" (if the next byte is a
428 "\x00"), or it could mean n-1 more repetitions of the previous
429 character (if the next byte is some nonzero "n").
430
431 For this reason, a BinHex parser has to be somewhat stateful: you
432 cannot have code like this:
433
434 #### NO! #### NO! #### NO! #### NO! #### NO! ####
435 while (<STDIN>) { # read HEX
436 print hexbin($_); # convert and write BIN
437 }
438
439 unless something is happening "behind the scenes" to keep track of
440 what was last done. The dangerous thing, however, is that this
441 approach will seem to work, if you only test it on BinHex files
442 which do not use compression and which have 4n HEX characters on
443 each line.
444
445 Since we have to be stateful anyway, we use the parser object to
446 keep our state.
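
The kind of state involved can be sketched with a closure that remembers whether the previous chunk ended on the "\x90" mark. This is a simplified illustration of the decompression layer only, following the rule described above. It is not the module's own implementation, and make_decompressor is a hypothetical name.

```perl
use strict;
use warnings;

# make_decompressor is a hypothetical helper, not part of Convert::BinHex.
# It returns a closure that expands run-length-compressed chunks and
# keeps its state between calls.
sub make_decompressor {
    my $last;                # last byte written to the output
    my $pending_mark = 0;    # true if a "\x90" is awaiting its count byte
    return sub {
        my ($chunk) = @_;
        my $out = '';
        for my $byte (split //, $chunk) {
            if ($pending_mark) {
                $pending_mark = 0;
                my $n = ord $byte;
                if ($n == 0) {               # "\x90\x00" is a literal "\x90"
                    $out .= "\x90";
                    $last = "\x90";
                }
                elsif (defined $last) {      # n-1 more copies of previous byte
                    $out .= $last x ($n - 1);
                }
            }
            elsif ($byte eq "\x90") {
                $pending_mark = 1;           # meaning depends on the next byte
            }
            else {
                $out .= $byte;
                $last = $byte;
            }
        }
        return $out;
    };
}

my $decomp = make_decompressor();
my $out = $decomp->("ab\x90");    # chunk ends on the mark: only "ab" so far
$out   .= $decomp->("\x03c");     # count arrives in the next chunk
print "$out\n";                   # prints "abbbc"
```

A stateless function given the same two chunks would have no way to tell what the trailing "\x90" meant.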
447
We need to handle large input files
449 Solutions that demand reading everything into core don't cut it in
450 my book. The first MPEG file that comes along can louse up your
451 whole day. So, there are no size limitations in this module: the
452 data is read on-demand, and filehandles are always an option.
453
454 Boy, is this slow!
A lot of the byte-level manipulation that has to go on,
particularly the CRC computing (which involves intensive bit-shifting
and masking) slows this module down significantly. What is needed
perhaps is an optional extension library where the slow pieces can be
done more quickly... a Convert::BinHex::CRC, if you will.
Volunteers, anyone?
461
462 Even considering that, however, it's slower than I'd like. I'm
463 sure many improvements can be made in the HEX-to-BIN end of things.
464 No doubt I'll attempt some as time goes on...
465
466 How it works
467
468 Since BinHex is a layered format, consisting of...
469
470 A Macintosh file [the "BIN"]...
471 Encoded as a structured 8-bit bytestream, then...
472 Compressed to reduce duplicate bytes, then...
473 Encoded as 7-bit ASCII [the "HEX"]
474
...there is a layered parsing algorithm to reverse the process.
Basically, it works in a similar fashion to stdio's fread():
477
478 0. There is an internal buffer of decompressed (BIN) data,
479 initially empty.
480 1. Application asks to read() n bytes of data from object
2. If the buffer is not full enough to accommodate the request:
482 2a. The read() method grabs the next available chunk of input
483 data (the HEX).
484 2b. HEX data is converted and decompressed into as many BIN
485 bytes as possible.
486 2c. BIN bytes are added to the read() buffer.
487 2d. Go back to step 2a. until the buffer is full enough
488 or we hit end-of-input.
489
490 The conversion-and-decompression algorithms need their own internal
491 buffers and state (since the next input chunk may not contain all the
492 data needed for a complete conversion/decompression operation). These
493 are maintained in the object, so parsing two different input streams
494 simultaneously is possible.
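
The buffering loop in steps 0 through 2d can be sketched in pure Perl as follows. Here the source of decompressed data is a plain callback standing in for the HEX reading and conversion layers, so this shows only the read() buffering, not the module's actual code. The name make_reader and the callback convention (return undef at end-of-input) are assumptions made for this example.

```perl
use strict;
use warnings;

# make_reader is a hypothetical helper, not part of Convert::BinHex.
# $next_chunk is a callback returning the next piece of BIN data, or
# undef at end-of-input; it stands in for steps 2a and 2b.
sub make_reader {
    my ($next_chunk) = @_;
    my $buffer = '';    # step 0: internal buffer, initially empty
    my $eof    = 0;
    return sub {
        my ($n) = @_;   # step 1: the caller asks for n bytes
        # Steps 2a-2d: refill until full enough or end-of-input.
        while (!$eof && length($buffer) < $n) {
            my $chunk = $next_chunk->();
            if (defined $chunk) { $buffer .= $chunk }    # step 2c
            else                { $eof = 1 }
        }
        return if $eof && $buffer eq '';      # nothing left: undef
        return substr($buffer, 0, $n, '');    # remove and return up to n bytes
    };
}

my @chunks = ("abc", "", "defgh", "ij");
my $read   = make_reader(sub { shift @chunks });
while (defined(my $data = $read->(4))) {
    print "[$data]\n";    # prints [abcd], [efgh], [ij] on separate lines
}
```

Because the buffer and end-of-input flag live inside each closure, two readers built this way do not interfere with each other, which mirrors the point about parsing two streams simultaneously.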
495
497 Only handles "Hqx7" files, as per RFC-1741.
498
499 Remember that Macintosh text files use "\r" as end-of-line: this means
500 that if you want a textual file to look normal on a non-Mac system, you
501 probably want to do this to the data:
502
503 # Get the data, and output it according to normal conventions:
504 foreach ($HQX->read_data) { s/\r/\n/g; print }
505
507 Current version: $Id: BinHex.pm,v 1.119 1997/06/28 05:12:42 eryq Exp $
508
509 Version 1.118
510 Ready to go public (with Paul's version, patched for native Mac
511 support)! Warnings have been suppressed in a few places where
512 undefined values appear.
513
514 Version 1.115
515 Fixed another bug in comp2bin, related to the MARK falling on a
516 boundary between inputs. Added testing code.
517
518 Version 1.114
519 Added BIN-to-HEX conversion. Eh. It's a start. Also, a lot of
520 documentation additions and cleanups. Some methods were also
521 renamed.
522
523 Version 1.103
524 Fixed bug in decompression (wasn't saving last character). Fixed
525 "NoComment" bug.
526
527 Version 1.102
528 Initial release.
529
531 Written by Eryq, http://www.enteract.com/~eryq / eryq@enteract.com
532
533 Support for native-Mac conversion, plus invaluable contributions in
534 Alpha Testing, plus a few patches, plus the baseline binhex/debinhex
535 programs, were provided by Paul J. Schinder (NASA/GSFC).
536
537 Ken Lunde (Adobe) suggested incorporating the CAP file representation.
538
540 Copyright (c) 1997 by Eryq. All rights reserved. This program is free
541 software; you can redistribute it and/or modify it under the same terms
542 as Perl itself.
543
544 This software comes with NO WARRANTY of any kind. See the COPYING file
545 in the distribution for details.
546

perl v5.8.8                       1997-06-28                Convert::BinHex(3)