Convert::BinHex(3)    User Contributed Perl Documentation   Convert::BinHex(3)

NAME

       Convert::BinHex - extract data from Macintosh BinHex files

       ALPHA WARNING: this code is currently in its Alpha release.  Things may
       change drastically until the interface is hammered out: if you have
       suggestions or objections, please speak up now!

SYNOPSIS

       Simple functions:

           use Convert::BinHex qw(binhex_crc macbinary_crc);

           # Compute HQX7-style CRC for data, pumping in old CRC if desired:
           $crc = binhex_crc($data, $crc);

           # Compute the MacBinary-II-style CRC for the data:
           $crc = macbinary_crc($data, $crc);

       Hex to bin, low-level interface.  Conversion is actually done via an
       object ("Convert::BinHex::Hex2Bin") which keeps internal conversion
       state:

           # Create and use a "translator" object:
           my $H2B = Convert::BinHex->hex2bin;    # get a converter object
           while (<STDIN>) {
               print STDOUT $H2B->next($_);       # convert some more input
           }
           print STDOUT $H2B->done;               # no more input: finish up

       Hex to bin, OO interface.  The following operations must be done in the
       order shown!

           # Read data in piecemeal:
           $HQX = Convert::BinHex->open(FH=>\*STDIN) || die "open: $!";
           $HQX->read_header;                  # read header info
           @data = $HQX->read_data;            # read in all the data
           @rsrc = $HQX->read_resource;        # read in all the resource

       Bin to hex, low-level interface.  Conversion is actually done via an
       object ("Convert::BinHex::Bin2Hex") which keeps internal conversion
       state:

           # Create and use a "translator" object:
           my $B2H = Convert::BinHex->bin2hex;    # get a converter object
           while (<STDIN>) {
               print STDOUT $B2H->next($_);       # convert some more input
           }
           print STDOUT $B2H->done;               # no more input: finish up

       Bin to hex, file interface.  Yes, you can convert to BinHex as well as
       from it!

           # Create new, empty object:
           my $HQX = Convert::BinHex->new;

           # Set header attributes:
           $HQX->filename("logo.gif");
           $HQX->type("GIFA");
           $HQX->creator("CNVS");

           # Give it the data and resource forks (either can be absent):
           $HQX->data(Path => "/path/to/data");       # here, data is on disk
           $HQX->resource(Data => $resourcefork);     # here, resource is in core

           # Output as a BinHex stream, complete with leading comment:
           $HQX->encode(\*STDOUT);

       PLANNED!!!! Bin to hex, "CAP" interface.  Thanks to Ken Lunde for
       suggesting this.

           # Create new, empty object from CAP tree:
           my $HQX = Convert::BinHex->from_cap("/path/to/root/file");
           $HQX->encode(\*STDOUT);

DESCRIPTION

       BinHex is a format used by Macintosh for transporting Mac files safely
       through electronic mail, as short-lined, 7-bit, semi-compressed data
       streams.  This module provides a means of converting those data streams
       back into binary data.

FORMAT

       (Some text taken from RFC-1741.)  Files on the Macintosh consist of two
       parts, called forks:

       Data fork
           The actual data included in the file.  The Data fork is typically
           the only meaningful part of a Macintosh file on a non-Macintosh
           computer system.  For example, if a Macintosh user wants to send a
           file of data to a user on an IBM-PC, she would only send the Data
           fork.

       Resource fork
           Contains a collection of arbitrary attribute/value pairs, including
           program segments, icon bitmaps, and parametric values.

       Additional information regarding Macintosh files is stored by the
       Finder in a hidden file, called the "Desktop Database".

       Because of the complications in storing different parts of a Macintosh
       file in a non-Macintosh filesystem that only handles consecutive data
       in one part, it is common to convert the Macintosh file into some other
       format before transferring it over the network.  The BinHex format
       squashes that data into transmittable ASCII as follows:

       1.  The file is output as a byte stream consisting of some basic header
           information (filename, type, creator), then the data fork, then the
           resource fork.

       2.  The byte stream is compressed by looking for series of duplicated
           bytes and representing them using a special binary escape sequence
           (of course, any occurrences of the escape character must also be
           escaped).

       3.  The compressed stream is encoded via the "6/8 hemiola" common to
           base64 and uuencode: each group of three 8-bit bytes (24 bits) is
           chopped into four 6-bit numbers, which are used as indexes into an
           ASCII "alphabet".  (I assume that leftover bytes are zero-padded;
           documentation is thin).

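       As a rough illustration of step 3, the sketch below chops bytes into
       6-bit numbers, three bytes at a time.  The function name
       six_bit_groups() is hypothetical and is not part of this module; the
       zero-padding of a short final group follows the assumption stated
       above, and the mapping from each number to a character of the BinHex
       alphabet is deliberately left out.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Convert::BinHex): split a byte string
# into 6-bit numbers, taking three 8-bit bytes (24 bits) at a time.
sub six_bit_groups {
    my ($bytes) = @_;
    my @indexes;
    for (my $i = 0; $i < length($bytes); $i += 3) {
        my $group = substr($bytes, $i, 3);
        $group .= "\0" x (3 - length($group));        # zero-pad (assumption)
        my ($b0, $b1, $b2) = unpack('C3', $group);
        my $word = ($b0 << 16) | ($b1 << 8) | $b2;    # one 24-bit word
        # Four 6-bit slices, most significant first:
        push @indexes, map { ($word >> $_) & 0x3F } (18, 12, 6, 0);
    }
    return @indexes;
}

print join(',', six_bit_groups("Man")), "\n";
```

       Each returned number would then be used as an index into the 64-
       character alphabet to produce one output character.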

FUNCTIONS

   CRC computation
       macbinary_crc DATA, SEED
           Compute the MacBinary-II-style CRC for the given DATA, with the CRC
           seeded to SEED.  Normally, you start with a SEED of 0, and you pump
           in the previous CRC as the SEED if you're handling a lot of data
           one chunk at a time.  That is:

               $crc = 0;
               while (<STDIN>) {
                   $crc = macbinary_crc($_, $crc);
               }

           Note: Extracted from the mcvert utility (Doug Moore, April '87),
           using a "magic array" algorithm by Jim Van Verth for efficiency.
           Converted to Perl5 by Eryq.  Untested.

       binhex_crc DATA, SEED
           Compute the HQX-style CRC for the given DATA, with the CRC seeded
           to SEED.  Normally, you start with a SEED of 0, and you pump in the
           previous CRC as the SEED if you're handling a lot of data one chunk
           at a time.  That is:

               $crc = 0;
               while (<STDIN>) {
                   $crc = binhex_crc($_, $crc);
               }

           Note: Extracted from the mcvert utility (Doug Moore, April '87),
           using a "magic array" algorithm by Jim Van Verth for efficiency.
           Converted to Perl5 by Eryq.
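       To see why feeding the previous CRC back in as the SEED works, here is
       a minimal sketch using a generic bit-by-bit CRC-16.  It is an
       assumption-laden stand-in, not this module's code: crc16_sketch() is a
       hypothetical name, it assumes the CCITT polynomial 0x1021, and whether
       its results match binhex_crc() or macbinary_crc() bit for bit has not
       been checked here.  What it does show is that chunked and one-shot
       computation agree.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT the module's binhex_crc(): a plain bit-by-bit
# CRC-16 over the (assumed) CCITT polynomial 0x1021, taking (DATA, SEED).
sub crc16_sketch {
    my ($data, $seed) = @_;
    my $crc = defined $seed ? $seed : 0;
    for my $byte (unpack('C*', $data)) {
        $crc ^= $byte << 8;                  # fold the byte into the top bits
        for (1 .. 8) {
            $crc = ($crc & 0x8000) ? (($crc << 1) ^ 0x1021) : ($crc << 1);
            $crc &= 0xFFFF;                  # keep the register at 16 bits
        }
    }
    return $crc;
}

# One shot over the whole string...
my $whole = crc16_sketch("123456789", 0);

# ...versus chunk by chunk, pumping the old CRC back in as the seed:
my $piecewise = 0;
$piecewise = crc16_sketch($_, $piecewise) for ("1234", "56", "789");

printf "whole=%04X piecewise=%04X\n", $whole, $piecewise;
```

       The CRC register is the only state carried between calls, which is why
       a single integer SEED is enough to resume.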

OO INTERFACE

   Conversion
       bin2hex
           Class method, constructor.  Return a converter object.  Just
           creates a new instance of "Convert::BinHex::Bin2Hex"; see that
           class for details.

       hex2bin
           Class method, constructor.  Return a converter object.  Just
           creates a new instance of "Convert::BinHex::Hex2Bin"; see that
           class for details.

   Construction
       new PARAMHASH
           Class method, constructor.  Return a handle on a BinHex'able
           entity.  In general, the data and resource forks for such an entity
           are stored in native (binary) format.

           Parameters in the PARAMHASH are the same as header-oriented method
           names, and may be used to set attributes:

               $HQX = Convert::BinHex->new(filename => "icon.gif",
                                           type     => "GIFB",
                                           creator  => "CNVS");

       open PARAMHASH
           Class method, constructor.  Return a handle on a new BinHex'ed
           stream, for parsing.  Params are:

           Data
               Input a HEX stream from the given data.  This can be a scalar,
               or a reference to an array of scalars.

           Expr
               Input a HEX stream from any open()able expression.  It will be
               opened and binmode'd, and the filehandle will be closed either
               on a "close()" or when the object is destructed.

           FH  Input a HEX stream from the given filehandle.

           NoComment
               If true, the parser should not attempt to skip a leading "(This
               file...)"  comment.  That means that the first nonwhite
               characters encountered must be the binhex'ed data.

   Get/set header information
       creator [VALUE]
           Instance method.  Get/set the creator of the file.  This is a
           four-character string (though I don't know if it's guaranteed to
           be printable ASCII!) that serves as part of the Macintosh's version
           of a MIME "content-type".

           For example, a document created by "Canvas" might have creator
           "CNVS".

       data [PARAMHASH]
           Instance method.  Get/set the data fork.  Any arguments are passed
           into the new() method of "Convert::BinHex::Fork".

       filename [VALUE]
           Instance method.  Get/set the name of the file.

       flags [VALUE]
           Instance method.  Return the flags, as an integer.  Use bitmasking
           to get the values you need.

       header_as_string
           Return a stringified version of the header that you might use for
           logging/debugging purposes.  It looks like this:

               X-HQX-Software: BinHex 4.0 (Convert::BinHex 1.102)
               X-HQX-Filename: Something_new.eps
               X-HQX-Version: 0
               X-HQX-Type: EPSF
               X-HQX-Creator: ART5
               X-HQX-Data-Length: 49731
               X-HQX-Rsrc-Length: 23096

           As some of you might have guessed, this is RFC-822-style, and may
           be easily plunked down into the middle of a mail header, or split
           into lines, etc.

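           Splitting that string into fields might look like the sketch
           below.  The helper parse_hqx_header() is hypothetical, not a
           method of this module, and the sample text is simply the example
           output shown above rather than the result of a real call.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Convert::BinHex): turn RFC-822-style
# "Name: value" lines into a hash reference.
sub parse_hqx_header {
    my ($text) = @_;
    my %field;
    for my $line (split /\n/, $text) {
        # Skip anything that is not a "Name: value" line.
        next unless $line =~ /^\s*([^:\s]+):\s*(.*?)\s*$/;
        $field{$1} = $2;
    }
    return \%field;
}

# Sample text copied from the header_as_string example above:
my $sample = <<'END';
X-HQX-Software: BinHex 4.0 (Convert::BinHex 1.102)
X-HQX-Filename: Something_new.eps
X-HQX-Version: 0
X-HQX-Type: EPSF
X-HQX-Creator: ART5
X-HQX-Data-Length: 49731
X-HQX-Rsrc-Length: 23096
END

my $info = parse_hqx_header($sample);
print "$info->{'X-HQX-Filename'}: $info->{'X-HQX-Data-Length'} data bytes\n";
```
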
       requires [VALUE]
           Instance method.  Get/set the software version required to convert
           this file, as extracted from the comment that preceded the actual
           binhex'ed data; e.g.:

               (This file must be converted with BinHex 4.0)

           In this case, after parsing in the comment, the code:

               $HQX->requires;

           would get back "4.0".

       resource [PARAMHASH]
           Instance method.  Get/set the resource fork.  Any arguments are
           passed into the new() method of "Convert::BinHex::Fork".

       type [VALUE]
           Instance method.  Get/set the type of the file.  This is a
           four-character string (though I don't know if it's guaranteed to
           be printable ASCII!) that serves as part of the Macintosh's version
           of a MIME "content-type".

           For example, a GIF89a file might have type "GF89".

       version [VALUE]
           Instance method.  Get/set the version, as an integer.

   Decode, high-level
       read_comment
           Instance method.  Skip past the opening comment in the file, which
           is of the form:

              (This file must be converted with BinHex 4.0)

           As per RFC-1741, this comment must immediately precede the BinHex
           data, and any text before it will be ignored.

           You don't need to invoke this method yourself; "read_header()" will
           do it for you.  After the call, the version number in the comment
           is accessible via the "requires()" method.

       read_header
           Instance method.  Read in the BinHex file header.  You must do this
           first!

       read_data [NBYTES]
           Instance method.  Read information from the data fork.  Use it in
           an array context to slurp all the data into an array of scalars:

               @data = $HQX->read_data;

           Or use it in a scalar context to get the data piecemeal:

               while (defined($data = $HQX->read_data)) {
                  # do stuff with $data
               }

           The NBYTES to read defaults to 2048.

       read_resource [NBYTES]
           Instance method.  Read in all/some of the resource fork.  See
           "read_data()" for usage.

   Encode, high-level
       encode OUT
           Encode the object as a BinHex stream to the given output handle
           OUT.  OUT can be a filehandle, or any blessed object that responds
           to a "print()" message.

           The leading comment is output, using the "requires()" attribute.

SUBMODULES

   Convert::BinHex::Bin2Hex
       A BINary-to-HEX converter.  This kind of conversion requires a certain
       amount of state information; it cannot be done by just calling a simple
       function repeatedly.  Use it like this:

           # Create and use a "translator" object:
           my $B2H = Convert::BinHex->bin2hex;    # get a converter object
           while (<STDIN>) {
               print STDOUT $B2H->next($_);       # convert some more input
           }
           print STDOUT $B2H->done;               # no more input: finish up

           # Re-use the object:
           $B2H->rewind;                 # ready for more action!
           while (<MOREIN>) { ...

       On each iteration, "next()" (and "done()") may return either a
       decent-sized non-empty string (indicating that more converted data is
       ready for you) or an empty string (indicating that the converter is
       waiting to amass more input in its private buffers before handing you
       more stuff to output).

       Note that "done()" always converts and hands you whatever is left.

       This may have been a good approach.  It may not.  Someday, the
       converter may also allow you to give it an object that responds to
       read(), or a FileHandle, and it will do all the nasty buffer-filling on
       its own, serving you stuff line by line:

           # Someday, maybe...
           my $B2H = Convert::BinHex->bin2hex(\*STDIN);
           while (defined($_ = $B2H->getline)) {
               print STDOUT $_;
           }

       Someday, maybe.  Feel free to voice your opinions.

   Convert::BinHex::Hex2Bin
       A HEX-to-BINary converter.  This kind of conversion requires a certain
       amount of state information; it cannot be done by just calling a simple
       function repeatedly.  Use it like this:

           # Create and use a "translator" object:
           my $H2B = Convert::BinHex->hex2bin;    # get a converter object
           while (<STDIN>) {
               print STDOUT $H2B->next($_);       # convert some more input
           }
           print STDOUT $H2B->done;               # no more input: finish up

           # Re-use the object:
           $H2B->rewind;                 # ready for more action!
           while (<MOREIN>) { ...

       On each iteration, "next()" (and "done()") may return either a
       decent-sized non-empty string (indicating that more converted data is
       ready for you) or an empty string (indicating that the converter is
       waiting to amass more input in its private buffers before handing you
       more stuff to output).

       Note that "done()" always converts and hands you whatever is left.

       Note that this converter does not find the initial "BinHex version"
       comment.  You have to skip that yourself.  It only handles data between
       the opening and closing ":".

   Convert::BinHex::Fork
       A fork in a Macintosh file.

           # How to get them...
           $data_fork = $HQX->data;      # get the data fork
           $rsrc_fork = $HQX->resource;  # get the resource fork

           # Make a new fork:
           $FORK = Convert::BinHex::Fork->new(Path => "/tmp/file.data");
           $FORK = Convert::BinHex::Fork->new(Data => $scalar);
           $FORK = Convert::BinHex::Fork->new(Data => \@array_of_scalars);

           # Get/set the length of the data fork:
           $len = $FORK->length;
           $FORK->length(170);        # this overrides the REAL value: be careful!

           # Get/set the path to the underlying data (if in a disk file):
           $path = $FORK->path;
           $FORK->path("/tmp/file.data");

           # Get/set the in-core data itself, which may be a scalar or an arrayref:
           $data = $FORK->data;
           $FORK->data($scalar);
           $FORK->data(\@array_of_scalars);

           # Get/set the CRC:
           $crc = $FORK->crc;
           $FORK->crc($crc);

UNDER THE HOOD

   Design issues
       BinHex needs a stateful parser
           Unlike its cousins base64 and uuencode, BinHex format is not
           amenable to being parsed line-by-line.  There appears to be no
           guarantee that lines contain 4n encoded characters... and even if
           there is one, the BinHex compression algorithm interferes: even
           when you can decode one line at a time, you can't necessarily
           decompress a line at a time.

           For example: a decoded line ending with the byte "\x90" (the escape
           or "mark" character) is ambiguous: depending on the next decoded
           byte, it could mean a literal "\x90" (if the next byte is a
           "\x00"), or it could mean n-1 more repetitions of the previous
           character (if the next byte is some nonzero "n").

           For this reason, a BinHex parser has to be somewhat stateful: you
           cannot have code like this:

               #### NO! #### NO! #### NO! #### NO! #### NO! ####
               while (<STDIN>) {            # read HEX
                   print hexbin($_);          # convert and write BIN
               }

           unless something is happening "behind the scenes" to keep track of
           what was last done.  The dangerous thing, however, is that this
           approach will seem to work, if you only test it on BinHex files
           which do not use compression and which have 4n HEX characters on
           each line.

           Since we have to be stateful anyway, we use the parser object to
           keep our state.

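           The trailing-mark ambiguity described above can be sketched with a
           small stateful expander.  The class RleExpander and its feed()
           method are hypothetical names made up for this illustration; this
           is not the module's own decompressor, only the rule as stated
           above: "\x90\x00" is a literal mark, and "\x90" followed by a
           nonzero n means n-1 more copies of the previous byte.

```perl
use strict;
use warnings;

{
    # Hypothetical class for illustration; not part of Convert::BinHex.
    package RleExpander;

    sub new {
        my ($class) = @_;
        return bless { last => undef, mark_seen => 0 }, $class;
    }

    # Expand one chunk, remembering a mark that falls on the chunk boundary.
    sub feed {
        my ($self, $chunk) = @_;
        my $out = '';
        for my $byte (split //, $chunk) {
            if ($self->{mark_seen}) {
                $self->{mark_seen} = 0;
                my $n = ord $byte;
                if ($n == 0) {                   # "\x90\x00": a literal mark
                    $out .= "\x90";
                    $self->{last} = "\x90";
                }
                else {                           # n-1 more copies of last byte
                    die "run with no previous byte\n"
                        unless defined $self->{last};
                    $out .= $self->{last} x ($n - 1);
                }
            }
            elsif ($byte eq "\x90") {
                $self->{mark_seen} = 1;          # meaning depends on next byte
            }
            else {
                $out .= $byte;
                $self->{last} = $byte;
            }
        }
        return $out;
    }
}

# The mark ends the first chunk; its meaning only arrives with the second.
my $rle      = RleExpander->new;
my $expanded = $rle->feed("ab\x90");
$expanded   .= $rle->feed("\x03c");
print "$expanded\n";
```

           A stateless function called once per chunk would have to guess at
           the first call; keeping "mark_seen" and "last" in the object is
           what lets the second call finish the job.
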
       We need to handle large input files
           Solutions that demand reading everything into core don't cut it in
           my book.  The first MPEG file that comes along can louse up your
           whole day.  So, there are no size limitations in this module: the
           data is read on-demand, and filehandles are always an option.

       Boy, is this slow!
           A lot of the byte-level manipulation that has to go on,
           particularly the CRC computing (which involves intensive
           bit-shifting and masking) slows this module down significantly.
           What is needed perhaps is an optional extension library where the
           slow pieces can be done more quickly... a Convert::BinHex::CRC, if
           you will.  Volunteers, anyone?

           Even considering that, however, it's slower than I'd like.  I'm
           sure many improvements can be made in the HEX-to-BIN end of things.
           No doubt I'll attempt some as time goes on...

   How it works
       Since BinHex is a layered format, consisting of...

             A Macintosh file [the "BIN"]...
                Encoded as a structured 8-bit bytestream, then...
                   Compressed to reduce duplicate bytes, then...
                      Encoded as 7-bit ASCII [the "HEX"]

       ...there is a layered parsing algorithm to reverse the process.
       Basically, it works in a similar fashion to stdio's fread():

              0. There is an internal buffer of decompressed (BIN) data,
                 initially empty.
              1. Application asks to read() n bytes of data from object
              2. If the buffer is not full enough to accommodate the request:
                   2a. The read() method grabs the next available chunk of input
                       data (the HEX).
                   2b. HEX data is converted and decompressed into as many BIN
                       bytes as possible.
                   2c. BIN bytes are added to the read() buffer.
                   2d. Go back to step 2a. until the buffer is full enough
                       or we hit end-of-input.

       The conversion-and-decompression algorithms need their own internal
       buffers and state (since the next input chunk may not contain all the
       data needed for a complete conversion/decompression operation).  These
       are maintained in the object, so parsing two different input streams
       simultaneously is possible.

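       The buffer-filling loop in steps 0 through 2d can be sketched as
       follows.  Everything here is a stand-in chosen for illustration: the
       class name BufferedReader, the "source" callback that hands back the
       next input chunk (or undef at end-of-input), and the "convert"
       callback that plays the role of HEX-to-BIN conversion.  It is not the
       module's actual implementation.

```perl
use strict;
use warnings;

{
    # Hypothetical class for illustration; not part of Convert::BinHex.
    package BufferedReader;

    sub new {
        my ($class, %args) = @_;
        my $self = {
            source  => $args{source},     # returns next chunk, undef at end
            convert => $args{convert},    # stand-in for HEX-to-BIN conversion
            buf     => '',                # step 0: buffer starts out empty
            eof     => 0,
        };
        return bless $self, $class;
    }

    # Step 1: the application asks for up to $n bytes.
    sub read {
        my ($self, $n) = @_;
        # Step 2: refill until the buffer can satisfy the request, or EOF.
        while (length($self->{buf}) < $n and not $self->{eof}) {
            my $chunk = $self->{source}->();                     # 2a
            if (defined $chunk) {
                $self->{buf} .= $self->{convert}->($chunk);      # 2b, 2c
            }
            else {
                $self->{eof} = 1;
            }
        }
        return if $self->{eof} and $self->{buf} eq '';   # nothing left
        return substr($self->{buf}, 0, $n, '');          # remove and return
    }
}

my @chunks = ("abc", "de", "fghij");
my $reader = BufferedReader->new(
    source  => sub { shift @chunks },
    convert => sub { uc $_[0] },          # pretend conversion
);
while (defined(my $piece = $reader->read(4))) {
    print "$piece\n";
}
```

       Because the buffer and end-of-input flag live in the object, two such
       readers can run side by side without interfering with each other.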

WARNINGS

       Only handles "Hqx7" files, as per RFC-1741.

       Remember that Macintosh text files use "\r" as end-of-line: this means
       that if you want a textual file to look normal on a non-Mac system, you
       probably want to do this to the data:

           # Get the data, and output it according to normal conventions:
           foreach ($HQX->read_data) { s/\r/\n/g; print }

CHANGE LOG

       Current version: $Id: BinHex.pm,v 1.119 1997/06/28 05:12:42 eryq Exp $

       Version 1.118
           Ready to go public (with Paul's version, patched for native Mac
           support)!  Warnings have been suppressed in a few places where
           undefined values appear.

       Version 1.115
           Fixed another bug in comp2bin, related to the MARK falling on a
           boundary between inputs.  Added testing code.

       Version 1.114
           Added BIN-to-HEX conversion.  Eh.  It's a start.  Also, a lot of
           documentation additions and cleanups.  Some methods were also
           renamed.

       Version 1.103
           Fixed bug in decompression (wasn't saving last character).  Fixed
           "NoComment" bug.

       Version 1.102
           Initial release.

AUTHOR AND CREDITS

       Written by Eryq, http://www.enteract.com/~eryq / eryq@enteract.com

       Support for native-Mac conversion, plus invaluable contributions in
       Alpha Testing, plus a few patches, plus the baseline binhex/debinhex
       programs, were provided by Paul J. Schinder (NASA/GSFC).

       Ken Lunde (Adobe) suggested incorporating the CAP file representation.

TERMS AND CONDITIONS

       Copyright (c) 1997 by Eryq.  All rights reserved.  This program is free
       software; you can redistribute it and/or modify it under the same terms
       as Perl itself.

       This software comes with NO WARRANTY of any kind.  See the COPYING file
       in the distribution for details.



perl v5.10.1                      1997-06-28                Convert::BinHex(3)