IO::Compress::FAQ(3pm)

IO::Compress::FAQ(3)  User Contributed Perl Documentation IO::Compress::FAQ(3)

NAME

       IO::Compress::FAQ -- Frequently Asked Questions about IO::Compress

DESCRIPTION

       Common questions answered.

GENERAL

   Compatibility with Unix compress/uncompress.
       Although "Compress::Zlib" has a pair of functions called "compress" and
       "uncompress", they are not related to the Unix programs of the same
       name. The "Compress::Zlib" module is not compatible with Unix
       "compress".

       If you have the "uncompress" program available, you can use this to
       read compressed files

           open F, "uncompress -c $filename |"
               or die "Cannot run uncompress: $!\n";
           while (<F>)
           {
               ...
           }

       Alternatively, if you have the "gunzip" program available, you can use
       this to read compressed files

           open F, "gunzip -c $filename |"
               or die "Cannot run gunzip: $!\n";
           while (<F>)
           {
               ...
           }

       and this to write compressed files, if you have the "compress" program
       available

           open F, "| compress -c >$filename"
               or die "Cannot run compress: $!\n";
           print F "data";
           ...
           close F ;
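
       The pipe opens above hand $filename to the shell, so a filename
       containing spaces or shell metacharacters can misbehave. As a minimal
       sketch, the list form of "open" runs the program without a shell. The
       helper name "open_pipe_from" is hypothetical, and the sketch assumes a
       platform where list-form pipe opens are supported.

```perl
use strict;
use warnings;

# Hypothetical helper: run an external command and return a filehandle
# connected to its standard output. The list form of open does not involve
# the shell, so the arguments are passed to the program unchanged.
sub open_pipe_from {
    my @command = @_;
    open(my $fh, '-|', @command)
        or die "Cannot run '$command[0]': $!\n";
    return $fh;
}

# Usage: perl script.pl file.Z   (assumes "uncompress" is on the PATH)
if (@ARGV) {
    my $fh = open_pipe_from('uncompress', '-c', $ARGV[0]);
    print while <$fh>;
    close $fh or die "uncompress reported an error\n";
}
```

       Replacing 'uncompress' with 'gunzip' gives the second variant shown
       above.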

   Accessing .tar.Z files
       The "Archive::Tar" module can optionally use "Compress::Zlib" (via the
       "IO::Zlib" module) to access tar files that have been compressed with
       "gzip". Unfortunately tar files compressed with the Unix "compress"
       utility cannot be read by "Compress::Zlib" and so cannot be directly
       accessed by "Archive::Tar".
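
       For comparison, the gzip case that "Archive::Tar" does handle directly
       might look like the sketch below. It assumes "IO::Zlib" is installed;
       the file and member names are made up for the example.

```perl
use strict;
use warnings;
use Archive::Tar;
use File::Temp qw(tempdir);

# Work in a scratch directory; the file and member names are made up.
my $dir     = tempdir(CLEANUP => 1);
my $tarfile = "$dir/example.tar.gz";

# Create a gzip-compressed tar file holding one small member
my $out = Archive::Tar->new();
$out->add_data('hello.txt', "hello world\n");
$out->write($tarfile, COMPRESS_GZIP)
    or die "Cannot write $tarfile: ", $out->error, "\n";

# Read it back; Archive::Tar works out that the file is gzip-compressed
my $in = Archive::Tar->new($tarfile)
    or die "Cannot read $tarfile: ", Archive::Tar->error, "\n";
my $content = $in->get_content('hello.txt');
print $content;
```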

       If the "uncompress" or "gunzip" programs are available, you can use one
       of these workarounds to read ".tar.Z" files from "Archive::Tar"

       Firstly with "uncompress"

           use strict;
           use warnings;
           use Archive::Tar;

           open F, "uncompress -c $filename |"
               or die "Cannot run uncompress: $!\n";
           my $tar = Archive::Tar->new(*F);
           ...

       and this with "gunzip"

           use strict;
           use warnings;
           use Archive::Tar;

           open F, "gunzip -c $filename |"
               or die "Cannot run gunzip: $!\n";
           my $tar = Archive::Tar->new(*F);
           ...

       Similarly, if the "compress" program is available, you can use this to
       write a ".tar.Z" file

           use strict;
           use warnings;
           use Archive::Tar;
           use IO::File;

           my $fh = new IO::File "| compress -c >$filename"
               or die "Cannot run compress: $!\n";
           my $tar = Archive::Tar->new();
           ...
           $tar->write($fh);
           $fh->close ;

   How do I recompress using a different compression?
       This is easier than you might expect if you realise that all the
       "IO::Compress::*" objects are derived from "IO::File" and that all the
       "IO::Uncompress::*" modules can read from an "IO::File" filehandle.

       So, for example, say you have a file compressed with gzip that you want
       to recompress with bzip2. Here is all that is needed to carry out the
       recompression.

           use IO::Uncompress::Gunzip ':all';
           use IO::Compress::Bzip2 ':all';

           my $gzipFile = "somefile.gz";
           my $bzipFile = "somefile.bz2";

           my $gunzip = new IO::Uncompress::Gunzip $gzipFile
               or die "Cannot gunzip $gzipFile: $GunzipError\n" ;

           bzip2 $gunzip => $bzipFile
               or die "Cannot bzip2 to $bzipFile: $Bzip2Error\n" ;

       Note that this technique has a limitation. Some compression file
       formats store extra information along with the compressed data payload.
       For example, gzip can optionally store the original filename and Zip
       stores a lot of information about the original file. If the original
       compressed file contains any of this extra information, it will not be
       transferred to the new compressed file using the technique above.
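
       Where the target format has matching fields, the extra information can
       be copied by hand. The sketch below recompresses gzip data to gzip at a
       higher level and carries the stored filename and timestamp across using
       "getHeaderInfo". The helper name "recompress_keeping_header" and the
       sample data are hypothetical; other header fields are not copied.

```perl
use strict;
use warnings;
use IO::Compress::Gzip     qw(gzip $GzipError);
use IO::Uncompress::Gunzip qw($GunzipError);

# Hypothetical helper: recompress gzip data at maximum compression while
# copying the original filename and timestamp from the old header.
# $in and $out may be filenames or references to scalar buffers.
sub recompress_keeping_header {
    my ($in, $out) = @_;

    my $gunzip = IO::Uncompress::Gunzip->new($in)
        or die "Cannot gunzip: $GunzipError\n";

    # getHeaderInfo returns a hash reference describing the gzip header
    my $hdr  = $gunzip->getHeaderInfo();
    my @opts = (Level => 9);
    push @opts, Name => $hdr->{Name} if defined $hdr->{Name};
    push @opts, Time => $hdr->{Time} if defined $hdr->{Time};

    my $gzip = IO::Compress::Gzip->new($out, @opts)
        or die "Cannot gzip: $GzipError\n";

    # Copy the uncompressed data across one block at a time
    my ($buffer, $status);
    while (($status = $gunzip->read($buffer)) > 0) {
        $gzip->print($buffer);
    }
    die "Read error: $GunzipError\n" if $status < 0;

    $gunzip->close;
    $gzip->close;
    return 1;
}

# Demonstration with in-memory buffers
my $text = "hello world\n" x 100;
my ($original, $recompressed);
gzip \$text => \$original, Name => 'orig.txt', Time => 1234567890
    or die "gzip failed: $GzipError\n";
recompress_keeping_header(\$original, \$recompressed);
```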

ZIP

   What Compression Types do IO::Compress::Zip & IO::Uncompress::Unzip
       support?
       The following compression formats are supported by "IO::Compress::Zip"
       and "IO::Uncompress::Unzip"

       ·    Store (method 0)

            No compression at all.

       ·    Deflate (method 8)

            This is the default compression used when creating a zip file with
            "IO::Compress::Zip".

       ·    Bzip2 (method 12)

            Only supported if the "IO-Compress-Bzip2" module is installed.

       ·    Lzma (method 14)

            Only supported if the "IO-Compress-Lzma" module is installed.
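
       As a rough illustration, the method can be chosen with the "Method"
       option and the "ZIP_CM_*" constants exported by "IO::Compress::Zip".
       The sketch below stores and deflates the same buffer and compares the
       sizes; the member name and sample data are made up.

```perl
use strict;
use warnings;
use IO::Compress::Zip     qw(:all);
use IO::Uncompress::Unzip qw(:all);

# Highly repetitive sample data, so Deflate has something to work with
my $data = "abc" x 1000;
my ($stored, $deflated, $roundtrip);

# Method 0: the data is placed in the zip file uncompressed
zip \$data => \$stored, Name => 'data.txt', Method => ZIP_CM_STORE
    or die "zip (store) failed: $ZipError\n";

# Method 8: the default, given explicitly here for comparison
zip \$data => \$deflated, Name => 'data.txt', Method => ZIP_CM_DEFLATE
    or die "zip (deflate) failed: $ZipError\n";

printf "stored: %d bytes, deflated: %d bytes\n",
    length($stored), length($deflated);

# Reading needs no option; the method is recorded in the zip headers
unzip \$stored => \$roundtrip
    or die "unzip failed: $UnzipError\n";
print "round trip ok\n" if $roundtrip eq $data;
```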

   Can I Read/Write Zip files larger than 4 Gig?
       Yes, both the "IO::Compress::Zip" and "IO::Uncompress::Unzip" modules
       support the zip feature called Zip64. That allows them to read/write
       files/buffers larger than 4 Gig.

       If you are creating a Zip file using the one-shot interface, and any of
       the input files is greater than 4 Gig, a zip64 compliant zip file will
       be created.

           zip "really-large-file" => "my.zip";

       Similarly with the one-shot interface, if the input is a buffer larger
       than 4 Gig, a zip64 compliant zip file will be created.

           zip \$really_large_buffer => "my.zip";

       The one-shot interface allows you to force the creation of a zip64 zip
       file by including the "Zip64" option.

           zip $filehandle => "my.zip", Zip64 => 1;

       If you want to create a zip64 zip file with the OO interface you must
       specify the "Zip64" option.

           my $zip = new IO::Compress::Zip "whatever", Zip64 => 1;

       When uncompressing with "IO::Uncompress::Unzip", it will automatically
       detect if the zip file is zip64.

       If you intend to manipulate the Zip64 zip files created with
       "IO::Compress::Zip" using an external zip/unzip, make sure that it
       supports Zip64.

       In particular, if you are using Info-Zip you need to have zip version
       3.x or better to update a Zip64 archive and unzip version 6.x to read a
       zip64 archive.

   Can I write more than 64K entries in a Zip file?
       Yes. Zip64 allows this. See previous question.

   Zip Resources
       The primary reference for zip files is the "appnote" document available
       at <http://www.pkware.com/documents/casestudies/APPNOTE.TXT>

       An alternative is the Info-Zip appnote. This is available from
       <ftp://ftp.info-zip.org/pub/infozip/doc/>

GZIP

   Gzip Resources
       The primary reference for gzip files is RFC 1952
       <http://www.faqs.org/rfcs/rfc1952.html>

       The primary site for gzip is http://www.gzip.org.

   Dealing with Concatenated gzip files
       If the gunzip program encounters a file containing multiple gzip files
       concatenated together it will automatically uncompress them all.  The
       example below illustrates this behaviour

           $ echo abc | gzip -c >x.gz
           $ echo def | gzip -c >>x.gz
           $ gunzip -c x.gz
           abc
           def

       By default "IO::Uncompress::Gunzip" will not behave like the gunzip
       program. It will only uncompress the first gzip data stream in the
       file, as shown below

           $ perl -MIO::Uncompress::Gunzip=:all -e 'gunzip "x.gz" => \*STDOUT'
           abc

       To force "IO::Uncompress::Gunzip" to uncompress all the gzip data
       streams, include the "MultiStream" option, as shown below

           $ perl -MIO::Uncompress::Gunzip=:all -e 'gunzip "x.gz" => \*STDOUT, MultiStream => 1'
           abc
           def
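
       The same comparison can be made inside a script without touching the
       filesystem. This is a minimal sketch using in-memory buffers in place
       of x.gz; the variable names are made up.

```perl
use strict;
use warnings;
use IO::Compress::Gzip     qw(gzip $GzipError);
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

# Build the equivalent of x.gz in memory: two gzip streams back to back
my ($abc, $def) = ("abc\n", "def\n");
my ($first, $second);
gzip \$abc => \$first
    or die "gzip failed: $GzipError\n";
gzip \$def => \$second
    or die "gzip failed: $GzipError\n";
my $concatenated = $first . $second;

# Default behaviour: stop at the end of the first gzip data stream
my $default;
gunzip \$concatenated => \$default
    or die "gunzip failed: $GunzipError\n";

# With MultiStream every gzip data stream in the input is uncompressed
my $multi;
gunzip \$concatenated => \$multi, MultiStream => 1
    or die "gunzip failed: $GunzipError\n";

print "default:\n$default";
print "multistream:\n$multi";
```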

ZLIB

   Zlib Resources
       The primary site for the zlib compression library is
       http://www.zlib.org.

BZIP2

   Bzip2 Resources
       The primary site for bzip2 is http://www.bzip.org.

   Dealing with Concatenated bzip2 files
       If the bunzip2 program encounters a file containing multiple bzip2
       files concatenated together it will automatically uncompress them all.
       The example below illustrates this behaviour

           $ echo abc | bzip2 -c >x.bz2
           $ echo def | bzip2 -c >>x.bz2
           $ bunzip2 -c x.bz2
           abc
           def

       By default "IO::Uncompress::Bunzip2" will not behave like the bunzip2
       program. It will only uncompress the first bzip2 data stream in the
       file, as shown below

           $ perl -MIO::Uncompress::Bunzip2=:all -e 'bunzip2 "x.bz2" => \*STDOUT'
           abc

       To force "IO::Uncompress::Bunzip2" to uncompress all the bzip2 data
       streams, include the "MultiStream" option, as shown below

           $ perl -MIO::Uncompress::Bunzip2=:all -e 'bunzip2 "x.bz2" => \*STDOUT, MultiStream => 1'
           abc
           def

   Interoperating with Pbzip2
       Pbzip2 (<http://compression.ca/pbzip2/>) is a parallel implementation
       of bzip2. The output from pbzip2 consists of a series of concatenated
       bzip2 data streams.

       By default "IO::Uncompress::Bunzip2" will only uncompress the first
       bzip2 data stream in a pbzip2 file. To uncompress the complete pbzip2
       file you must include the "MultiStream" option, like this.

           bunzip2 $input => \$output, MultiStream => 1
               or die "bunzip2 failed: $Bunzip2Error\n";
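
       The "MultiStream" option is also accepted by the OO interface. The
       sketch below reads line by line across stream boundaries. It does not
       run pbzip2; it imitates pbzip2 output by joining separately compressed
       chunks, which is an assumption made to keep the example self-contained.

```perl
use strict;
use warnings;
use IO::Compress::Bzip2     qw(bzip2 $Bzip2Error);
use IO::Uncompress::Bunzip2 qw($Bunzip2Error);

# Stand-in for pbzip2 output: independent bzip2 streams joined together
my @chunks = ("abc\n", "def\n", "ghi\n");
my $input  = '';
for my $chunk (@chunks) {
    my $compressed;
    bzip2 \$chunk => \$compressed
        or die "bzip2 failed: $Bzip2Error\n";
    $input .= $compressed;
}

# MultiStream makes the object carry on into each following stream
my $z = IO::Uncompress::Bunzip2->new(\$input, MultiStream => 1)
    or die "bunzip2 failed: $Bunzip2Error\n";

my @lines;
while (my $line = <$z>) {
    push @lines, $line;
}
$z->close;

print @lines;
```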

HTTP & NETWORK

   Apache::GZip Revisited
       Below is a mod_perl Apache compression module, called "Apache::GZip",
       taken from
       http://perl.apache.org/docs/tutorials/tips/mod_perl_tricks/mod_perl_tricks.html#On_the_Fly_Compression

         package Apache::GZip;
         #File: Apache::GZip.pm

         use strict vars;
         use Apache::Constants ':common';
         use Compress::Zlib;
         use IO::File;
         use constant GZIP_MAGIC => 0x1f8b;
         use constant OS_MAGIC => 0x03;

         sub handler {
             my $r = shift;
             my ($fh,$gz);
             my $file = $r->filename;
             return DECLINED unless $fh=IO::File->new($file);
             $r->header_out('Content-Encoding'=>'gzip');
             $r->send_http_header;
             return OK if $r->header_only;

             tie *STDOUT,'Apache::GZip',$r;
             print($_) while <$fh>;
             untie *STDOUT;
             return OK;
         }

         sub TIEHANDLE {
             my($class,$r) = @_;
             # initialize a deflation stream
             my $d = deflateInit(-WindowBits=>-MAX_WBITS()) || return undef;

             # gzip header -- don't ask how I found out
             $r->print(pack("nccVcc",GZIP_MAGIC,Z_DEFLATED,0,time(),0,OS_MAGIC));

             return bless { r   => $r,
                            crc =>  crc32(undef),
                            d   => $d,
                            l   =>  0
                          },$class;
         }

         sub PRINT {
             my $self = shift;
             foreach (@_) {
               # deflate the data
               my $data = $self->{d}->deflate($_);
               $self->{r}->print($data);
               # keep track of its length and crc
               $self->{l} += length($_);
               $self->{crc} = crc32($_,$self->{crc});
             }
         }

         sub DESTROY {
            my $self = shift;

            # flush the output buffers
            my $data = $self->{d}->flush;
            $self->{r}->print($data);

            # print the CRC and the total length (uncompressed)
            $self->{r}->print(pack("LL",@{$self}{qw/crc l/}));
         }

         1;

       Here's the Apache configuration entry you'll need to make use of it.
       Once set, everything in the /compressed directory will be compressed
       automagically.

         <Location /compressed>
            SetHandler  perl-script
            PerlHandler Apache::GZip
         </Location>

       Although at first sight there seems to be quite a lot going on in
       "Apache::GZip", you could sum up what the code was doing as follows --
       read the contents of the file in "$r->filename", compress it and write
       the compressed data to standard output. That's all.

       This code has to jump through a few hoops to achieve this because

       1.  The gzip support in "Compress::Zlib" version 1.x can only work with
           a real filesystem filehandle. The filehandles used by Apache
           modules are not associated with the filesystem.

       2.  That means all the gzip support has to be done by hand - in this
           case by creating a tied filehandle to deal with creating the gzip
           header and trailer.

       "IO::Compress::Gzip" doesn't have that filehandle limitation (this was
       one of the reasons for writing it in the first place). So if
       "IO::Compress::Gzip" is used instead of "Compress::Zlib" the whole tied
       filehandle code can be removed. Here is the rewritten code.

         package Apache::GZip;

         use strict vars;
         use Apache::Constants ':common';
         use IO::Compress::Gzip;
         use IO::File;

         sub handler {
             my $r = shift;
             my $fh;
             my $file = $r->filename;
             return DECLINED unless $fh=IO::File->new($file);
             $r->header_out('Content-Encoding'=>'gzip');
             $r->send_http_header;
             return OK if $r->header_only;

             # '-' means write the compressed data to standard output
             my $gz = new IO::Compress::Gzip '-', Minimal => 1
                 or return DECLINED ;

             print $gz $_ while <$fh>;

             return OK;
         }

         1;

       or even more succinctly, like this, using a one-shot gzip

         package Apache::GZip;

         use strict vars;
         use Apache::Constants ':common';
         use IO::Compress::Gzip qw(gzip);

         sub handler {
             my $r = shift;
             $r->header_out('Content-Encoding'=>'gzip');
             $r->send_http_header;
             return OK if $r->header_only;

             gzip $r->filename => '-', Minimal => 1
               or return DECLINED ;

             return OK;
         }

         1;

       The use of one-shot "gzip" above just reads from "$r->filename" and
       writes the compressed data to standard output.

       Note the use of the "Minimal" option in the code above. When using gzip
       for Content-Encoding you should always use this option. In the example
       above it will prevent the filename being included in the gzip header
       and make the size of the gzip data stream slightly smaller.
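
       The effect of "Minimal" can be seen outside Apache with a small
       experiment. This sketch compresses the same buffer twice, once with an
       explicit "Name" and once with "Minimal", and then inspects the minimal
       header. The sample body and the name 'index.html' are made up.

```perl
use strict;
use warnings;
use IO::Compress::Gzip     qw(gzip $GzipError);
use IO::Uncompress::Gunzip qw($GunzipError);

my $body = "<html><body>hello</body></html>\n";
my ($named, $minimal);

# A header that records a filename, as happens when compressing a file
gzip \$body => \$named, Name => 'index.html'
    or die "gzip failed: $GzipError\n";

# The smallest legal gzip header: no filename or other optional fields
gzip \$body => \$minimal, Minimal => 1
    or die "gzip failed: $GzipError\n";

printf "with name: %d bytes, minimal: %d bytes\n",
    length($named), length($minimal);

# Confirm that the minimal stream carries no filename in its header
my $z = IO::Uncompress::Gunzip->new(\$minimal)
    or die "gunzip failed: $GunzipError\n";
my $name = $z->getHeaderInfo()->{Name};
print "name in minimal header: ", (defined $name ? $name : '(none)'), "\n";
```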

   Compressed files and Net::FTP
       The "Net::FTP" module provides two low-level methods called "stor" and
       "retr" that both return filehandles. These filehandles can be used with
       the "IO::Compress/Uncompress" modules to compress or uncompress files
       read from or written to an FTP Server on the fly, without having to
       create a temporary file.

       Firstly, here is code that uses "retr" to uncompress a file as it is
       read from the FTP Server.

           use Net::FTP;
           use IO::Uncompress::Gunzip qw(:all);

           my $ftp = new Net::FTP ...

           my $retr_fh = $ftp->retr($compressed_filename);
           gunzip $retr_fh => $outFilename, AutoClose => 1
               or die "Cannot uncompress '$compressed_filename': $GunzipError\n";

       and this to compress a file as it is written to the FTP Server

           use Net::FTP;
           use IO::Compress::Gzip qw(:all);

           my $stor_fh = $ftp->stor($filename);
           gzip $filename => $stor_fh, AutoClose => 1
               or die "Cannot compress '$filename': $GzipError\n";

MISC

   Using "InputLength" to uncompress data embedded in a larger file/buffer.
       A fairly common use-case is where compressed data is embedded in a
       larger file/buffer and you want to read both.

       As an example consider the structure of a zip file. This is a
       well-defined file format that mixes both compressed and uncompressed
       sections of data in a single file.

       For the purposes of this discussion you can think of a zip file as a
       sequence of compressed data streams, each of which is prefixed by an
       uncompressed local header. The local header contains information about
       the compressed data stream, including the name of the compressed file
       and, in particular, the length of the compressed data stream.

       To illustrate how to use "InputLength" here is a script that walks a
       zip file and prints out how many lines are in each compressed file (if
       you intend to write code to walk through a zip file for real see
       "Walking through a zip file" in IO::Uncompress::Unzip). Also, although
       this example uses the zlib-based compression, the technique can be used
       by the other "IO::Uncompress::*" modules.

           use strict;
           use warnings;

           use IO::File;
           use IO::Uncompress::RawInflate qw(:all);

           use constant ZIP_LOCAL_HDR_SIG  => 0x04034b50;
           use constant ZIP_LOCAL_HDR_LENGTH => 30;

           my $file = $ARGV[0] ;

           my $fh = new IO::File "<$file"
                       or die "Cannot open '$file': $!\n";

           while (1)
           {
               my $buffer;

               $fh->read($buffer, ZIP_LOCAL_HDR_LENGTH) == ZIP_LOCAL_HDR_LENGTH
                   or die "Truncated file: $!\n";

               my $signature = unpack ("V", substr($buffer, 0, 4));

               last unless $signature == ZIP_LOCAL_HDR_SIG;

               # Read Local Header
               my $gpFlag             = unpack ("v", substr($buffer, 6, 2));
               my $compressedMethod   = unpack ("v", substr($buffer, 8, 2));
               my $compressedLength   = unpack ("V", substr($buffer, 18, 4));
               my $uncompressedLength = unpack ("V", substr($buffer, 22, 4));
               my $filename_length    = unpack ("v", substr($buffer, 26, 2));
               my $extra_length       = unpack ("v", substr($buffer, 28, 2));

               my $filename ;
               $fh->read($filename, $filename_length) == $filename_length
                   or die "Truncated file\n";

               $fh->read($buffer, $extra_length) == $extra_length
                   or die "Truncated file\n";

               if ($compressedMethod != 8 && $compressedMethod != 0)
               {
                   warn "Skipping file '$filename' - not deflated $compressedMethod\n";
                   $fh->read($buffer, $compressedLength) == $compressedLength
                       or die "Truncated file\n";
                   next;
               }

               # Bit 3 of the general purpose flag marks a streamed entry.
               # The parentheses matter: "==" binds tighter than "&".
               if ($compressedMethod == 0 && ($gpFlag & 8) == 8)
               {
                   die "Streamed Stored not supported for '$filename'\n";
               }

               next if $compressedLength == 0;

               # Done reading the Local Header

               my $inf = new IO::Uncompress::RawInflate $fh,
                                   Transparent => 1,
                                   InputLength => $compressedLength
                 or die "Cannot uncompress $file [$filename]: $RawInflateError\n"  ;

               my $line_count = 0;

               while (<$inf>)
               {
                   ++ $line_count;
               }

               print "$filename: $line_count\n";
           }

       The majority of the code above is concerned with reading the zip local
       header data. The code that I want to focus on is at the bottom.

           while (1) {

               # read local zip header data
               # get $filename
               # get $compressedLength

               my $inf = new IO::Uncompress::RawInflate $fh,
                                   Transparent => 1,
                                   InputLength => $compressedLength
                 or die "Cannot uncompress $file [$filename]: $RawInflateError\n"  ;

               my $line_count = 0;

               while (<$inf>)
               {
                   ++ $line_count;
               }

               print "$filename: $line_count\n";
           }

       The call to "IO::Uncompress::RawInflate" creates a new filehandle $inf
       that can be used to read from the parent filehandle $fh, uncompressing
       it as it goes. The use of the "InputLength" option will guarantee that
       at most $compressedLength bytes of compressed data will be read from
       the $fh filehandle (the only exception is an error case such as a
       truncated file or a corrupt data stream).

       This means that once RawInflate is finished $fh will be left at the
       byte directly after the compressed data stream.
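
       That property can be checked with a much smaller container than a zip
       file. The sketch below uses a toy layout invented for this example (a
       4-byte length, raw deflate data, then a trailer) and shows that the
       trailer can be read from $fh straight after inflating. The helper names
       "write_container" and "read_container" are hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use IO::Compress::RawDeflate   qw(rawdeflate $RawDeflateError);
use IO::Uncompress::RawInflate qw($RawInflateError);

# Toy container, not a real file format: a 4-byte big-endian length,
# that many bytes of raw deflate data, then an uncompressed trailer.
sub write_container {
    my ($fh, $payload, $trailer) = @_;
    my $compressed;
    rawdeflate \$payload => \$compressed
        or die "rawdeflate failed: $RawDeflateError\n";
    print {$fh} pack("N", length $compressed), $compressed, $trailer;
}

sub read_container {
    my ($fh) = @_;
    read($fh, my $len_bytes, 4) == 4
        or die "Truncated container\n";
    my $compressed_length = unpack "N", $len_bytes;

    # InputLength stops the inflater reading past the compressed data
    my $inf = IO::Uncompress::RawInflate->new($fh,
                                  InputLength => $compressed_length)
        or die "Cannot inflate: $RawInflateError\n";
    my $payload = do { local $/; <$inf> };
    $inf->close;

    # $fh should now sit just past the compressed data, at the trailer
    my $trailer = do { local $/; <$fh> };
    return ($payload, $trailer);
}

my ($fh, $path) = tempfile(UNLINK => 1);
binmode $fh;
write_container($fh, "line one\nline two\n", "TAIL");
seek($fh, 0, 0) or die "Cannot rewind: $!\n";

my ($payload, $trailer) = read_container($fh);
print $payload, "trailer: $trailer\n";
```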

       Now consider what the code looks like without "InputLength"

           while (1) {

               # read local zip header data
               # get $filename
               # get $compressedLength

               # read all the compressed data into $data
               my $data;
               read($fh, $data, $compressedLength) == $compressedLength
                   or die "Truncated file\n";

               my $inf = new IO::Uncompress::RawInflate \$data,
                                   Transparent => 1
                 or die "Cannot uncompress $file [$filename]: $RawInflateError\n"  ;

               my $line_count = 0;

               while (<$inf>)
               {
                   ++ $line_count;
               }

               print "$filename: $line_count\n";
           }

       The difference here is the addition of the temporary variable $data.
       This is used to store a copy of the compressed data while it is being
       uncompressed.

       If you know that $compressedLength isn't that big then using temporary
       storage won't be a problem. But if $compressedLength is very large or
       you are writing an application that other people will use, and so have
       no idea how big $compressedLength will be, it could be an issue.

       Using "InputLength" avoids the use of temporary storage and means the
       application can cope with large compressed data streams.

       One final point -- "InputLength" can only be used when you know the
       length of the compressed data beforehand, like here with a zip file.

AUTHOR

       This module was written by Paul Marquess, pmqs@cpan.org.

MODIFICATION HISTORY

       See the Changes file.

COPYRIGHT AND LICENSE

       Copyright (c) 2005-2013 Paul Marquess. All rights reserved.

       This program is free software; you can redistribute it and/or modify it
       under the same terms as Perl itself.

perl v5.16.3                      2013-05-19              IO::Compress::FAQ(3)