IO::Compress::FAQ(3pm)

IO::Compress::FAQ(3)  User Contributed Perl Documentation IO::Compress::FAQ(3)

NAME

       IO::Compress::FAQ -- Frequently Asked Questions about IO::Compress

DESCRIPTION

       Common questions answered.

GENERAL

   Compatibility with Unix compress/uncompress.
       Although "Compress::Zlib" has a pair of functions called "compress" and
       "uncompress", they are not related to the Unix programs of the same
       name. The "Compress::Zlib" module is not compatible with Unix
       "compress".

       If you have the "uncompress" program available, you can use this to
       read compressed files:

           open F, "uncompress -c $filename |";
           while (<F>)
           {
               ...

       Alternatively, if you have the "gunzip" program available, you can use
       this to read compressed files:

           open F, "gunzip -c $filename |";
           while (<F>)
           {
               ...

       and this to write compressed files, if you have the "compress" program
       available:

           # compress reads the piped data and the shell redirects its
           # output into $filename
           open F, "| compress -c >$filename";
           print F "data";
           ...
           close F ;

   Accessing .tar.Z files
       The "Archive::Tar" module can optionally use "Compress::Zlib" (via the
       "IO::Zlib" module) to access tar files that have been compressed with
       "gzip". Unfortunately tar files compressed with the Unix "compress"
       utility cannot be read by "Compress::Zlib" and so cannot be directly
       accessed by "Archive::Tar".

       If the "uncompress" or "gunzip" programs are available, you can use one
       of these workarounds to read ".tar.Z" files with "Archive::Tar".

       Firstly with "uncompress":

           use strict;
           use warnings;
           use Archive::Tar;

           open F, "uncompress -c $filename |";
           my $tar = Archive::Tar->new(*F);
           ...

       and this with "gunzip":

           use strict;
           use warnings;
           use Archive::Tar;

           open F, "gunzip -c $filename |";
           my $tar = Archive::Tar->new(*F);
           ...

       Similarly, if the "compress" program is available, you can use this to
       write a ".tar.Z" file:

           use strict;
           use warnings;
           use Archive::Tar;
           use IO::File;

           my $fh = IO::File->new( "| compress -c >$filename" );
           my $tar = Archive::Tar->new();
           ...
           $tar->write($fh);
           $fh->close ;

   How do I recompress using a different compression?
       This is easier than you might expect if you realise that all the
       "IO::Compress::*" objects are derived from "IO::File" and that all the
       "IO::Uncompress::*" modules can read from an "IO::File" filehandle.

       So, for example, say you have a file compressed with gzip that you want
       to recompress with bzip2. Here is all that is needed to carry out the
       recompression.

           use IO::Uncompress::Gunzip ':all';
           use IO::Compress::Bzip2 ':all';

           my $gzipFile = "somefile.gz";
           my $bzipFile = "somefile.bz2";

           my $gunzip = IO::Uncompress::Gunzip->new( $gzipFile )
               or die "Cannot gunzip $gzipFile: $GunzipError\n" ;

           bzip2 $gunzip => $bzipFile
               or die "Cannot bzip2 to $bzipFile: $Bzip2Error\n" ;

       Note that this technique has a limitation. Some compression file
       formats store extra information along with the compressed data payload.
       For example, gzip can optionally store the original filename and Zip
       stores a lot of information about the original file. If the original
       compressed file contains any of this extra information, it will not be
       transferred to the new compressed file using the technique above.

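       The limitation can be seen in a minimal in-memory sketch of the same
       recompression. The sample data and the name "hello.txt" are made up
       for illustration. The filename is read back from the gzip header with
       "getHeaderInfo"; only the payload reaches the bzip2 output.

```perl
use strict;
use warnings;
use IO::Compress::Gzip      qw(gzip $GzipError);
use IO::Uncompress::Gunzip  qw($GunzipError);
use IO::Compress::Bzip2     qw(bzip2 $Bzip2Error);
use IO::Uncompress::Bunzip2 qw(bunzip2 $Bunzip2Error);

# Illustrative data and filename (assumptions, not from the FAQ).
my $original = "hello world\n" x 10;

# Create a gzip stream that records an original filename in its header.
my $gz;
gzip \$original => \$gz, Name => "hello.txt"
    or die "gzip failed: $GzipError\n";

# Open the gzip stream and look at the extra header information.
my $gunzip = IO::Uncompress::Gunzip->new(\$gz)
    or die "Cannot gunzip: $GunzipError\n";
my $stored_name = $gunzip->getHeaderInfo()->{Name};

# Recompress: the uncompressed payload flows from $gunzip into bzip2.
my $bz;
bzip2 $gunzip => \$bz
    or die "Cannot bzip2: $Bzip2Error\n";

# Check that the payload survived the recompression.
my $roundtrip;
bunzip2 \$bz => \$roundtrip
    or die "Cannot bunzip2: $Bunzip2Error\n";

print "name stored in gzip header: $stored_name\n";
print "payload preserved: ", ($roundtrip eq $original ? "yes" : "no"), "\n";
```

       If the header information matters, it has to be copied by hand, for
       example by saving the value returned by "getHeaderInfo" elsewhere.
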
ZIP

   What Compression Types do IO::Compress::Zip & IO::Uncompress::Unzip
       support?
       The following compression formats are supported by "IO::Compress::Zip"
       and "IO::Uncompress::Unzip":

       •    Store (method 0)

            No compression at all.

       •    Deflate (method 8)

            This is the default compression used when creating a zip file with
            "IO::Compress::Zip".

       •    Bzip2 (method 12)

            Only supported if the "IO-Compress-Bzip2" module is installed.

       •    Lzma (method 14)

            Only supported if the "IO-Compress-Lzma" module is installed.

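       As a rough sketch of how a compression type is chosen, the "Method"
       option of "IO::Compress::Zip" takes one of the "ZIP_CM_*" constants.
       The member name "data.txt" and the sample data below are
       illustrative assumptions.

```perl
use strict;
use warnings;
use IO::Compress::Zip     qw(:all);    # zip, $ZipError and the ZIP_CM_* constants
use IO::Uncompress::Unzip qw(unzip $UnzipError);

# Illustrative, highly repetitive data so the two methods differ clearly.
my $data = "some highly repetitive text\n" x 100;

my ($stored, $deflated);

# Method 0: the data is copied into the archive without compression.
zip \$data => \$stored, Name => "data.txt", Method => ZIP_CM_STORE
    or die "zip (store) failed: $ZipError\n";

# Method 8: deflate, which is also what you get with no Method option.
zip \$data => \$deflated, Name => "data.txt", Method => ZIP_CM_DEFLATE
    or die "zip (deflate) failed: $ZipError\n";

# Both archives unpack to the same content.
my ($from_stored, $from_deflated);
unzip \$stored => \$from_stored
    or die "unzip failed: $UnzipError\n";
unzip \$deflated => \$from_deflated
    or die "unzip failed: $UnzipError\n";

printf "stored: %d bytes, deflated: %d bytes\n",
    length($stored), length($deflated);
```
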
   Can I Read/Write Zip files larger than 4 Gig?
       Yes, both the "IO::Compress::Zip" and "IO::Uncompress::Unzip" modules
       support the zip feature called Zip64. That allows them to read/write
       files/buffers larger than 4 Gig.

       If you are creating a Zip file using the one-shot interface, and any of
       the input files is greater than 4 Gig, a Zip64-compliant zip file will
       be created.

           zip "really-large-file" => "my.zip";

       Similarly with the one-shot interface, if the input is a buffer larger
       than 4 Gig, a Zip64-compliant zip file will be created.

           zip \$really_large_buffer => "my.zip";

       The one-shot interface allows you to force the creation of a Zip64 zip
       file by including the "Zip64" option.

           zip $filehandle => "my.zip", Zip64 => 1;

       If you want to create a Zip64 zip file with the OO interface you must
       specify the "Zip64" option.

           my $zip = IO::Compress::Zip->new( "whatever", Zip64 => 1 );

       When uncompressing, "IO::Uncompress::Unzip" will automatically detect
       whether the zip file is Zip64.

       If you intend to manipulate the Zip64 zip files created with
       "IO::Compress::Zip" using an external zip/unzip, make sure that it
       supports Zip64.

       In particular, if you are using Info-Zip you need to have zip version
       3.x or better to update a Zip64 archive and unzip version 6.x to read a
       Zip64 archive.

   Can I write more than 64K entries in a Zip file?
       Yes. Zip64 allows this. See the previous question.

   Zip Resources
       The primary reference for zip files is the "appnote" document available
       at <http://www.pkware.com/documents/casestudies/APPNOTE.TXT>

       An alternative is the Info-Zip appnote. This is available from
       <ftp://ftp.info-zip.org/pub/infozip/doc/>
174   Can I write more that 64K entries is a Zip files?
175       Yes. Zip64 allows this. See previous question.
176
177   Zip Resources
178       The primary reference for zip files is the "appnote" document available
179       at <http://www.pkware.com/documents/casestudies/APPNOTE.TXT>
180
181       An alternatively is the Info-Zip appnote. This is available from
182       <ftp://ftp.info-zip.org/pub/infozip/doc/>
183

GZIP

   Gzip Resources
       The primary reference for gzip files is RFC 1952
       <https://datatracker.ietf.org/doc/html/rfc1952>

       The primary site for gzip is <http://www.gzip.org>.

   Dealing with concatenated gzip files
       If the gunzip program encounters a file containing multiple gzip files
       concatenated together it will automatically uncompress them all.  The
       example below illustrates this behaviour

           $ echo abc | gzip -c >x.gz
           $ echo def | gzip -c >>x.gz
           $ gunzip -c x.gz
           abc
           def

       By default "IO::Uncompress::Gunzip" will not behave like the gunzip
       program. It will only uncompress the first gzip data stream in the
       file, as shown below

           $ perl -MIO::Uncompress::Gunzip=:all -e 'gunzip "x.gz" => \*STDOUT'
           abc

       To force "IO::Uncompress::Gunzip" to uncompress all the gzip data
       streams, include the "MultiStream" option, as shown below

           $ perl -MIO::Uncompress::Gunzip=:all -e 'gunzip "x.gz" => \*STDOUT, MultiStream => 1'
           abc
           def

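       The same behaviour can be sketched in a script without touching the
       filesystem. The variable names are illustrative; the two buffers
       stand in for the two gzip runs in the shell example.

```perl
use strict;
use warnings;
use IO::Compress::Gzip     qw(gzip $GzipError);
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

# Build two independent gzip data streams and concatenate them.
my ($abc, $def) = ("abc\n", "def\n");
my ($part1, $part2);
gzip \$abc => \$part1 or die "gzip failed: $GzipError\n";
gzip \$def => \$part2 or die "gzip failed: $GzipError\n";
my $concatenated = $part1 . $part2;

# Default behaviour: stop after the first gzip data stream.
my $first_only;
gunzip \$concatenated => \$first_only
    or die "gunzip failed: $GunzipError\n";

# MultiStream: keep going until every data stream has been read.
my $everything;
gunzip \$concatenated => \$everything, MultiStream => 1
    or die "gunzip failed: $GunzipError\n";

print "default:     $first_only";
print "MultiStream: $everything";
```
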
   Reading bgzip files with IO::Uncompress::Gunzip
       A "bgzip" file consists of a series of valid gzip-compliant data
       streams concatenated together. To read a file created by "bgzip" with
       "IO::Uncompress::Gunzip" use the "MultiStream" option as shown in the
       previous section.

       See the section titled "The BGZF compression format" in
       <http://samtools.github.io/hts-specs/SAMv1.pdf> for a definition of
       "bgzip".

ZLIB

   Zlib Resources
       The primary site for the zlib compression library is
       <http://www.zlib.org>.

Bzip2

   Bzip2 Resources
       The primary site for bzip2 is <http://www.bzip.org>.

   Dealing with Concatenated bzip2 files
       If the bunzip2 program encounters a file containing multiple bzip2
       files concatenated together it will automatically uncompress them all.
       The example below illustrates this behaviour

           $ echo abc | bzip2 -c >x.bz2
           $ echo def | bzip2 -c >>x.bz2
           $ bunzip2 -c x.bz2
           abc
           def

       By default "IO::Uncompress::Bunzip2" will not behave like the bunzip2
       program. It will only uncompress the first bzip2 data stream in the
       file, as shown below

           $ perl -MIO::Uncompress::Bunzip2=:all -e 'bunzip2 "x.bz2" => \*STDOUT'
           abc

       To force "IO::Uncompress::Bunzip2" to uncompress all the bzip2 data
       streams, include the "MultiStream" option, as shown below

           $ perl -MIO::Uncompress::Bunzip2=:all -e 'bunzip2 "x.bz2" => \*STDOUT, MultiStream => 1'
           abc
           def

   Interoperating with Pbzip2
       Pbzip2 (<http://compression.ca/pbzip2/>) is a parallel implementation
       of bzip2. The output from pbzip2 consists of a series of concatenated
       bzip2 data streams.

       By default "IO::Uncompress::Bunzip2" will only uncompress the first
       bzip2 data stream in a pbzip2 file. To uncompress the complete pbzip2
       file you must include the "MultiStream" option, like this.

           bunzip2 $input => \$output, MultiStream => 1
               or die "bunzip2 failed: $Bunzip2Error\n";

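       The "MultiStream" option can also be passed to the OO interface. The
       sketch below fakes pbzip2-style input by concatenating two bzip2 data
       streams in memory (pbzip2 itself is not run, and the variable names
       are illustrative), then reads the result line by line.

```perl
use strict;
use warnings;
use IO::Compress::Bzip2     qw(bzip2 $Bzip2Error);
use IO::Uncompress::Bunzip2 qw($Bunzip2Error);

# Stand-in for pbzip2 output: two bzip2 data streams back to back.
my ($abc, $def) = ("abc\n", "def\n");
my ($stream1, $stream2);
bzip2 \$abc => \$stream1 or die "bzip2 failed: $Bzip2Error\n";
bzip2 \$def => \$stream2 or die "bzip2 failed: $Bzip2Error\n";
my $pbzip2_like = $stream1 . $stream2;

# With MultiStream the object reads across the stream boundary.
my $bunzip = IO::Uncompress::Bunzip2->new(\$pbzip2_like, MultiStream => 1)
    or die "Cannot bunzip2: $Bunzip2Error\n";

my @lines;
while (defined(my $line = <$bunzip>)) {
    push @lines, $line;
}
$bunzip->close;

print "read ", scalar(@lines), " lines\n";
```
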
HTTP & NETWORK

   Apache::GZip Revisited
       Below is a mod_perl Apache compression module, called "Apache::GZip",
       taken from
       <http://perl.apache.org/docs/tutorials/tips/mod_perl_tricks/mod_perl_tricks.html#On_the_Fly_Compression>

         package Apache::GZip;
         #File: Apache::GZip.pm

         use strict vars;
         use Apache::Constants ':common';
         use Compress::Zlib;
         use IO::File;
         use constant GZIP_MAGIC => 0x1f8b;
         use constant OS_MAGIC => 0x03;

         sub handler {
             my $r = shift;
             my ($fh,$gz);
             my $file = $r->filename;
             return DECLINED unless $fh=IO::File->new($file);
             $r->header_out('Content-Encoding'=>'gzip');
             $r->send_http_header;
             return OK if $r->header_only;

             tie *STDOUT,'Apache::GZip',$r;
             print($_) while <$fh>;
             untie *STDOUT;
             return OK;
         }

         sub TIEHANDLE {
             my($class,$r) = @_;
             # initialize a deflation stream
             my $d = deflateInit(-WindowBits=>-MAX_WBITS()) || return undef;

             # gzip header -- don't ask how I found out
             $r->print(pack("nccVcc",GZIP_MAGIC,Z_DEFLATED,0,time(),0,OS_MAGIC));

             return bless { r   => $r,
                            crc =>  crc32(undef),
                            d   => $d,
                            l   =>  0
                          },$class;
         }

         sub PRINT {
             my $self = shift;
             foreach (@_) {
               # deflate the data
               my $data = $self->{d}->deflate($_);
               $self->{r}->print($data);
               # keep track of its length and crc
               $self->{l} += length($_);
               $self->{crc} = crc32($_,$self->{crc});
             }
         }

         sub DESTROY {
            my $self = shift;

            # flush the output buffers
            my $data = $self->{d}->flush;
            $self->{r}->print($data);

            # print the CRC and the total length (uncompressed)
            $self->{r}->print(pack("LL",@{$self}{qw/crc l/}));
         }

         1;

       Here's the Apache configuration entry you'll need to make use of it.
       Once set, everything in the /compressed directory will be compressed
       automagically.

         <Location /compressed>
            SetHandler  perl-script
            PerlHandler Apache::GZip
         </Location>

       Although at first sight there seems to be quite a lot going on in
       "Apache::GZip", you could sum up what the code was doing as follows --
       read the contents of the file in "$r->filename", compress it and write
       the compressed data to standard output. That's all.

       This code has to jump through a few hoops to achieve this because:

       1.  The gzip support in "Compress::Zlib" version 1.x can only work with
           a real filesystem filehandle. The filehandles used by Apache
           modules are not associated with the filesystem.

       2.  That means all the gzip support has to be done by hand - in this
           case by creating a tied filehandle to deal with creating the gzip
           header and trailer.

       "IO::Compress::Gzip" doesn't have that filehandle limitation (this was
       one of the reasons for writing it in the first place). So if
       "IO::Compress::Gzip" is used instead of "Compress::Zlib" the whole tied
       filehandle code can be removed. Here is the rewritten code.

         package Apache::GZip;

         use strict vars;
         use Apache::Constants ':common';
         use IO::Compress::Gzip;
         use IO::File;

         sub handler {
             my $r = shift;
             my $fh;
             my $file = $r->filename;
             return DECLINED unless $fh=IO::File->new($file);
             $r->header_out('Content-Encoding'=>'gzip');
             $r->send_http_header;
             return OK if $r->header_only;

             my $gz = IO::Compress::Gzip->new( '-', Minimal => 1 )
                 or return DECLINED ;

             print $gz $_ while <$fh>;

             return OK;
         }

       or even more succinctly, like this, using a one-shot gzip

         package Apache::GZip;

         use strict vars;
         use Apache::Constants ':common';
         use IO::Compress::Gzip qw(gzip);

         sub handler {
             my $r = shift;
             $r->header_out('Content-Encoding'=>'gzip');
             $r->send_http_header;
             return OK if $r->header_only;

             gzip $r->filename => '-', Minimal => 1
               or return DECLINED ;

             return OK;
         }

         1;

       The use of one-shot "gzip" above just reads from "$r->filename" and
       writes the compressed data to standard output.

       Note the use of the "Minimal" option in the code above. When using gzip
       for Content-Encoding you should always use this option. In the example
       above it will prevent the filename being included in the gzip header
       and make the gzip data stream slightly smaller.

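       The effect of "Minimal" on the size of the output can be checked with
       a small sketch that needs no web server. The response body and the
       name "index.html" are illustrative assumptions. The first call
       stores a filename in the gzip header, the second uses "Minimal".

```perl
use strict;
use warnings;
use IO::Compress::Gzip     qw(gzip $GzipError);
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

# Illustrative response body; any data would do.
my $body = "Hello from a compressed response\n" x 20;

# A gzip stream whose header carries a filename.
my $with_name;
gzip \$body => \$with_name, Name => "index.html"
    or die "gzip failed: $GzipError\n";

# A gzip stream with the smallest possible header.
my $minimal;
gzip \$body => \$minimal, Minimal => 1
    or die "gzip failed: $GzipError\n";

# The minimal stream still uncompresses to the same body.
my $check;
gunzip \$minimal => \$check
    or die "gunzip failed: $GunzipError\n";

printf "with Name: %d bytes, Minimal: %d bytes\n",
    length($with_name), length($minimal);
```
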
   Compressed files and Net::FTP
       The "Net::FTP" module provides two low-level methods called "stor" and
       "retr" that both return filehandles. These filehandles can be used with
       the "IO::Compress/Uncompress" modules to compress or uncompress files
       read from or written to an FTP Server on the fly, without having to
       create a temporary file.

       Firstly, here is code that uses "retr" to uncompress a file as it is
       read from the FTP Server.

           use Net::FTP;
           use IO::Uncompress::Gunzip qw(:all);

           my $ftp = Net::FTP->new( ... );

           my $retr_fh = $ftp->retr($compressed_filename);
           gunzip $retr_fh => $outFilename, AutoClose => 1
               or die "Cannot uncompress '$compressed_filename': $GunzipError\n";

       and this to compress a file as it is written to the FTP Server

           use Net::FTP;
           use IO::Compress::Gzip qw(:all);

           my $stor_fh = $ftp->stor($filename);
           gzip $filename => $stor_fh, AutoClose => 1
               or die "Cannot compress '$filename': $GzipError\n";

MISC

   Using "InputLength" to uncompress data embedded in a larger file/buffer.
       A fairly common use-case is where compressed data is embedded in a
       larger file/buffer and you want to read both.

       As an example consider the structure of a zip file. This is a
       well-defined file format that mixes both compressed and uncompressed
       sections of data in a single file.

       For the purposes of this discussion you can think of a zip file as a
       sequence of compressed data streams, each of which is prefixed by an
       uncompressed local header. The local header contains information about
       the compressed data stream, including the name of the compressed file
       and, in particular, the length of the compressed data stream.

       To illustrate how to use "InputLength" here is a script that walks a
       zip file and prints out how many lines are in each compressed file (if
       you intend to write code to walk through a zip file for real, see
       "Walking through a zip file" in IO::Uncompress::Unzip). Also, although
       this example uses the zlib-based compression, the technique can be used
       by the other "IO::Uncompress::*" modules.

           use strict;
           use warnings;

           use IO::File;
           use IO::Uncompress::RawInflate qw(:all);

           use constant ZIP_LOCAL_HDR_SIG  => 0x04034b50;
           use constant ZIP_LOCAL_HDR_LENGTH => 30;

           my $file = $ARGV[0] ;

           my $fh = IO::File->new( "<$file" )
                       or die "Cannot open '$file': $!\n";

           while (1)
           {
               my $sig;
               my $buffer;

               my $x ;
               ($x = $fh->read($buffer, ZIP_LOCAL_HDR_LENGTH)) == ZIP_LOCAL_HDR_LENGTH
                   or die "Truncated file: $!\n";

               my $signature = unpack ("V", substr($buffer, 0, 4));

               last unless $signature == ZIP_LOCAL_HDR_SIG;

               # Read Local Header
               my $gpFlag             = unpack ("v", substr($buffer, 6, 2));
               my $compressedMethod   = unpack ("v", substr($buffer, 8, 2));
               my $compressedLength   = unpack ("V", substr($buffer, 18, 4));
               my $uncompressedLength = unpack ("V", substr($buffer, 22, 4));
               my $filename_length    = unpack ("v", substr($buffer, 26, 2));
               my $extra_length       = unpack ("v", substr($buffer, 28, 2));

               my $filename ;
               $fh->read($filename, $filename_length) == $filename_length
                   or die "Truncated file\n";

               $fh->read($buffer, $extra_length) == $extra_length
                   or die "Truncated file\n";

               if ($compressedMethod != 8 && $compressedMethod != 0)
               {
                   warn "Skipping file '$filename' - not deflated $compressedMethod\n";
                   $fh->read($buffer, $compressedLength) == $compressedLength
                       or die "Truncated file\n";
                   next;
               }

               # Bit 3 of the general purpose flag marks a streamed entry.
               # The parentheses are needed because == binds tighter than &.
               if ($compressedMethod == 0 && ($gpFlag & 8) == 8)
               {
                   die "Streamed Stored not supported for '$filename'\n";
               }

               next if $compressedLength == 0;

               # Done reading the Local Header

               my $inf = IO::Uncompress::RawInflate->new( $fh,
                                   Transparent => 1,
                                   InputLength => $compressedLength )
                 or die "Cannot uncompress $file [$filename]: $RawInflateError\n"  ;

               my $line_count = 0;

               while (<$inf>)
               {
                   ++ $line_count;
               }

               print "$filename: $line_count\n";
           }

       The majority of the code above is concerned with reading the zip local
       header data. The code that I want to focus on is at the bottom.

           while (1) {

               # read local zip header data
               # get $filename
               # get $compressedLength

               my $inf = IO::Uncompress::RawInflate->new( $fh,
                                   Transparent => 1,
                                   InputLength => $compressedLength )
                 or die "Cannot uncompress $file [$filename]: $RawInflateError\n"  ;

               my $line_count = 0;

               while (<$inf>)
               {
                   ++ $line_count;
               }

               print "$filename: $line_count\n";
           }

       The call to "IO::Uncompress::RawInflate" creates a new filehandle $inf
       that can be used to read from the parent filehandle $fh, uncompressing
       it as it goes. The use of the "InputLength" option will guarantee that
       at most $compressedLength bytes of compressed data will be read from
       the $fh filehandle (the only exception is for an error case like a
       truncated file or a corrupt data stream).

       This means that once RawInflate is finished $fh will be left at the
       byte directly after the compressed data stream.

       Now consider what the code looks like without "InputLength"

           while (1) {

               # read local zip header data
               # get $filename
               # get $compressedLength

               # read all the compressed data into $data
               read($fh, $data, $compressedLength);

               my $inf = IO::Uncompress::RawInflate->new( \$data,
                                   Transparent => 1 )
                 or die "Cannot uncompress $file [$filename]: $RawInflateError\n"  ;

               my $line_count = 0;

               while (<$inf>)
               {
                   ++ $line_count;
               }

               print "$filename: $line_count\n";
           }

       The difference here is the addition of the temporary variable $data.
       This is used to store a copy of the compressed data while it is being
       uncompressed.

       If you know that $compressedLength isn't that big then using temporary
       storage won't be a problem. But if $compressedLength is very large or
       you are writing an application that other people will use, and so have
       no idea how big $compressedLength will be, it could be an issue.

       Using "InputLength" avoids the use of temporary storage and means the
       application can cope with large compressed data streams.

       One final point -- obviously "InputLength" can only be used whenever
       you know the length of the compressed data beforehand, like here with a
       zip file.

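       For experimenting with "InputLength" without a zip file, here is a
       self-contained sketch. The container layout (a 4-byte little-endian
       length, the raw deflate data, then the literal text "TRAILER") is a
       made-up format used only for this illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use IO::File;
use IO::Compress::RawDeflate   qw(rawdeflate $RawDeflateError);
use IO::Uncompress::RawInflate qw($RawInflateError);

# Compress a small payload so there is something to embed.
my $payload = "line one\nline two\nline three\n";
my $compressed;
rawdeflate \$payload => \$compressed
    or die "rawdeflate failed: $RawDeflateError\n";

# Hypothetical container: length prefix, compressed data, plain trailer.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
binmode $tmp_fh;
print {$tmp_fh} pack("V", length $compressed), $compressed, "TRAILER";
close $tmp_fh or die "Cannot close '$tmp_name': $!\n";

my $fh = IO::File->new($tmp_name, "r")
    or die "Cannot open '$tmp_name': $!\n";
$fh->binmode;

# Read the uncompressed prefix to learn the compressed length.
my $len_bytes;
$fh->read($len_bytes, 4) == 4
    or die "Truncated file\n";
my $compressed_length = unpack("V", $len_bytes);

# Uncompress straight from $fh, reading at most $compressed_length bytes.
my $inf = IO::Uncompress::RawInflate->new($fh,
                    InputLength => $compressed_length)
    or die "Cannot uncompress: $RawInflateError\n";

my $line_count = 0;
while (<$inf>)
{
    ++ $line_count;
}

# $fh should now sit directly after the compressed data.
my $trailer;
$fh->read($trailer, 7);

print "lines: $line_count, trailer: $trailer\n";
```
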
SUPPORT

       General feedback/questions/bug reports should be sent to
       <https://github.com/pmqs//issues> (preferred) or
       <https://rt.cpan.org/Public/Dist/Display.html?Name=>.

AUTHOR

       This module was written by Paul Marquess, "pmqs@cpan.org".

MODIFICATION HISTORY

       See the Changes file.

COPYRIGHT AND LICENSE

       Copyright (c) 2005-2023 Paul Marquess. All rights reserved.

       This program is free software; you can redistribute it and/or modify it
       under the same terms as Perl itself.

perl v5.38.0                      2023-07-26              IO::Compress::FAQ(3)