IO::Compress::FAQ(3) User Contributed Perl Documentation IO::Compress::FAQ(3)
2
3
4
6 IO::Compress::FAQ -- Frequently Asked Questions about IO::Compress
7
9 Common questions answered.
10
12 Compatibility with Unix compress/uncompress.
13 Although "Compress::Zlib" has a pair of functions called "compress" and
14 "uncompress", they are not related to the Unix programs of the same
15 name. The "Compress::Zlib" module is not compatible with Unix
16 "compress".
17
If you have the "uncompress" program available, you can use it to read
compressed files

    open F, "uncompress -c $filename |"
        or die "Cannot run uncompress: $!\n";
    while (<F>)
    {
        ...
    }
    close F;

Alternatively, if you have the "gunzip" program available, you can use
it to read compressed files

    open F, "gunzip -c $filename |"
        or die "Cannot run gunzip: $!\n";
    while (<F>)
    {
        ...
    }
    close F;

and this to write compressed files, if you have the "compress" program
available

    # compress reads standard input and its output is redirected to the file
    open F, "| compress -c >$filename"
        or die "Cannot run compress: $!\n";
    print F "data";
    ...
    close F ;
41
42 Accessing .tar.Z files
43 The "Archive::Tar" module can optionally use "Compress::Zlib" (via the
44 "IO::Zlib" module) to access tar files that have been compressed with
45 "gzip". Unfortunately tar files compressed with the Unix "compress"
46 utility cannot be read by "Compress::Zlib" and so cannot be directly
47 accessed by "Archive::Tar".
48
49 If the "uncompress" or "gunzip" programs are available, you can use one
50 of these workarounds to read ".tar.Z" files from "Archive::Tar"
51
52 Firstly with "uncompress"
53
54 use strict;
55 use warnings;
56 use Archive::Tar;
57
58 open F, "uncompress -c $filename |";
59 my $tar = Archive::Tar->new(*F);
60 ...
61
62 and this with "gunzip"
63
64 use strict;
65 use warnings;
66 use Archive::Tar;
67
68 open F, "gunzip -c $filename |";
69 my $tar = Archive::Tar->new(*F);
70 ...
71
72 Similarly, if the "compress" program is available, you can use this to
73 write a ".tar.Z" file
74
75 use strict;
76 use warnings;
77 use Archive::Tar;
78 use IO::File;
79
80 my $fh = new IO::File "| compress -c >$filename";
81 my $tar = Archive::Tar->new();
82 ...
83 $tar->write($fh);
84 $fh->close ;
85
86 How do I recompress using a different compression?
This is easier than you might expect if you realise that all the
"IO::Compress::*" objects are derived from "IO::File" and that all the
"IO::Uncompress::*" modules can read from an "IO::File" filehandle.
90
91 So, for example, say you have a file compressed with gzip that you want
92 to recompress with bzip2. Here is all that is needed to carry out the
93 recompression.
94
95 use IO::Uncompress::Gunzip ':all';
96 use IO::Compress::Bzip2 ':all';
97
98 my $gzipFile = "somefile.gz";
99 my $bzipFile = "somefile.bz2";
100
101 my $gunzip = new IO::Uncompress::Gunzip $gzipFile
102 or die "Cannot gunzip $gzipFile: $GunzipError\n" ;
103
104 bzip2 $gunzip => $bzipFile
105 or die "Cannot bzip2 to $bzipFile: $Bzip2Error\n" ;
106
Note that this technique has a limitation. Some compressed file formats
store extra information along with the compressed data payload. For
example, gzip can optionally store the original filename and Zip stores
a lot of information about the original file. If the original
compressed file contains any of this extra information, it will not be
transferred to the new compressed file using the technique above.
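When source and target are both gzip, the header fields can be carried
over by hand. The following is a minimal sketch rather than part of the
original answer: it assumes in-memory buffers instead of files, and the
sample text, the name "hello.txt" and the timestamp are made up for
illustration. It reads the "Name" and "Time" fields with
"getHeaderInfo" and passes them to the new "IO::Compress::Gzip" object.

```perl
use strict;
use warnings;
use IO::Compress::Gzip     qw($GzipError);
use IO::Uncompress::Gunzip qw($GunzipError);

# Build a small gzip stream in memory that carries a name and a timestamp.
# (The name and the time value are illustrative only.)
my $text = "hello world\n" x 100;
my $original;
my $gz = IO::Compress::Gzip->new(\$original,
                                 Name => "hello.txt",
                                 Time => 1_000_000_000)
    or die "Cannot gzip: $GzipError\n";
$gz->print($text);
$gz->close;

# Open it for reading and fetch the header fields before recompressing.
my $gunzip = IO::Uncompress::Gunzip->new(\$original)
    or die "Cannot gunzip: $GunzipError\n";
my $hdr = $gunzip->getHeaderInfo;

# Recompress at a different level, passing the saved fields on by hand.
my $recompressed;
my $out = IO::Compress::Gzip->new(\$recompressed,
                                  Name  => $hdr->{Name},
                                  Time  => $hdr->{Time},
                                  Level => 9)
    or die "Cannot gzip: $GzipError\n";

# Copy the uncompressed data across one block at a time.
my $chunk;
while ((my $status = $gunzip->read($chunk)) > 0) {
    $out->print($chunk);
}
$out->close;
$gunzip->close;
```

Formats that have no equivalent field (bzip2, for example) cannot carry
the name at all, so this only helps when the target format has somewhere
to put it.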
113
115 What Compression Types do IO::Compress::Zip & IO::Uncompress::Unzip
116 support?
117 The following compression formats are supported by "IO::Compress::Zip"
118 and "IO::Uncompress::Unzip"
119
120 · Store (method 0)
121
122 No compression at all.
123
124 · Deflate (method 8)
125
126 This is the default compression used when creating a zip file with
127 "IO::Compress::Zip".
128
129 · Bzip2 (method 12)
130
131 Only supported if the "IO-Compress-Bzip2" module is installed.
132
133 · Lzma (method 14)
134
135 Only supported if the "IO-Compress-Lzma" module is installed.
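As a rough illustration of selecting a method, the sketch below writes
the same data twice, once stored and once deflated, and reads both back.
It assumes in-memory buffers and a made-up member name "data.txt"; the
"Method" option and the "ZIP_CM_STORE" and "ZIP_CM_DEFLATE" constants
come from "IO::Compress::Zip".

```perl
use strict;
use warnings;
use IO::Compress::Zip     qw(zip $ZipError :zip_method);
use IO::Uncompress::Unzip qw(unzip $UnzipError);

my $data = "abc" x 1000;    # highly repetitive, so it deflates well

# Same member written with Store (method 0) and Deflate (method 8).
my ($stored, $deflated);
zip \$data => \$stored,   Name => "data.txt", Method => ZIP_CM_STORE
    or die "zip (store) failed: $ZipError\n";
zip \$data => \$deflated, Name => "data.txt", Method => ZIP_CM_DEFLATE
    or die "zip (deflate) failed: $ZipError\n";

printf "stored: %d bytes, deflated: %d bytes\n",
       length($stored), length($deflated);

# The method is recorded in the archive, so reading needs no option.
my ($from_stored, $from_deflated);
unzip \$stored   => \$from_stored
    or die "unzip failed: $UnzipError\n";
unzip \$deflated => \$from_deflated
    or die "unzip failed: $UnzipError\n";
```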
136
Can I Read/Write Zip files larger than 4 Gig?
Yes, both the "IO::Compress::Zip" and "IO::Uncompress::Unzip" modules
support the zip feature called Zip64. That allows them to read/write
files/buffers larger than 4 Gig.

If you are creating a Zip file using the one-shot interface, and any of
the input files is greater than 4 Gig, a zip64 compliant zip file will
be created.

    zip "really-large-file" => "my.zip";

Similarly with the one-shot interface, if the input is a buffer larger
than 4 Gig, a zip64 compliant zip file will be created.
150
151 zip \$really_large_buffer => "my.zip";
152
153 The one-shot interface allows you to force the creation of a zip64 zip
154 file by including the "Zip64" option.
155
156 zip $filehandle => "my.zip", Zip64 => 1;
157
158 If you want to create a zip64 zip file with the OO interface you must
159 specify the "Zip64" option.
160
161 my $zip = new IO::Compress::Zip "whatever", Zip64 => 1;
162
When uncompressing, "IO::Uncompress::Unzip" will automatically detect
if the zip file is zip64.

If you intend to manipulate the Zip64 zip files created with
"IO::Compress::Zip" using an external zip/unzip, make sure that it
supports Zip64.

In particular, if you are using Info-Zip you need to have zip version
3.x or better to update a Zip64 archive and unzip version 6.x or better
to read a zip64 archive.
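The "Zip64" option can be tried out without a multi-gigabyte input. The
sketch below is only an illustration under stated assumptions: it uses
small in-memory buffers and a made-up member name "small.txt", forces
Zip64 on one archive, and reads it back without any extra option.

```perl
use strict;
use warnings;
use IO::Compress::Zip     qw(zip $ZipError);
use IO::Uncompress::Unzip qw(unzip $UnzipError);

my $data = "some data\n" x 10;

# Force Zip64 even though the payload is tiny.
my $zip64;
zip \$data => \$zip64, Name => "small.txt", Zip64 => 1
    or die "zip failed: $ZipError\n";

# Same input without the option, for comparison.
my $plain;
zip \$data => \$plain, Name => "small.txt"
    or die "zip failed: $ZipError\n";

printf "zip64: %d bytes, plain: %d bytes\n",
       length($zip64), length($plain);

# Nothing special is needed on the reading side.
my $roundtrip;
unzip \$zip64 => \$roundtrip
    or die "unzip failed: $UnzipError\n";
```

The Zip64 archive comes out larger because of the additional Zip64
records, which is one reason not to force the option when it is not
needed.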
173
Can I write more than 64K entries in a Zip file?
Yes. Zip64 allows this. See the previous question.
176
177 Zip Resources
178 The primary reference for zip files is the "appnote" document available
179 at <http://www.pkware.com/documents/casestudies/APPNOTE.TXT>
180
An alternative is the Info-Zip appnote. This is available from
<ftp://ftp.info-zip.org/pub/infozip/doc/>
183
185 Gzip Resources
186 The primary reference for gzip files is RFC 1952
187 <http://www.faqs.org/rfcs/rfc1952.html>
188
189 The primary site for gzip is http://www.gzip.org.
190
191 Dealing with Concatenated gzip files
192 If the gunzip program encounters a file containing multiple gzip files
193 concatenated together it will automatically uncompress them all. The
194 example below illustrates this behaviour
195
196 $ echo abc | gzip -c >x.gz
197 $ echo def | gzip -c >>x.gz
198 $ gunzip -c x.gz
199 abc
200 def
201
By default "IO::Uncompress::Gunzip" will not behave like the gunzip
program. It will only uncompress the first gzip data stream in the
file, as shown below
205
206 $ perl -MIO::Uncompress::Gunzip=:all -e 'gunzip "x.gz" => \*STDOUT'
207 abc
208
209 To force "IO::Uncompress::Gunzip" to uncompress all the gzip data
210 streams, include the "MultiStream" option, as shown below
211
212 $ perl -MIO::Uncompress::Gunzip=:all -e 'gunzip "x.gz" => \*STDOUT, MultiStream => 1'
213 abc
214 def
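The same comparison can be made from a script. This is a small sketch
that assumes in-memory buffers in place of the x.gz file used above; it
builds two gzip streams, concatenates them, and uncompresses the result
with and without "MultiStream".

```perl
use strict;
use warnings;
use IO::Compress::Gzip     qw(gzip $GzipError);
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

# Build two independent gzip streams and join them back to back.
my ($abc, $def) = ("abc\n", "def\n");
my ($first, $second);
gzip \$abc => \$first
    or die "gzip failed: $GzipError\n";
gzip \$def => \$second
    or die "gzip failed: $GzipError\n";
my $concatenated = $first . $second;

# Default behaviour: stop at the end of the first stream.
my $single;
gunzip \$concatenated => \$single
    or die "gunzip failed: $GunzipError\n";

# With MultiStream every stream in the buffer is uncompressed.
my $all;
gunzip \$concatenated => \$all, MultiStream => 1
    or die "gunzip failed: $GunzipError\n";

print "default:     ", length($single), " bytes\n";
print "MultiStream: ", length($all), " bytes\n";
```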
215
217 Zlib Resources
218 The primary site for the zlib compression library is
219 http://www.zlib.org.
220
222 Bzip2 Resources
223 The primary site for bzip2 is http://www.bzip.org.
224
225 Dealing with Concatenated bzip2 files
226 If the bunzip2 program encounters a file containing multiple bzip2
227 files concatenated together it will automatically uncompress them all.
228 The example below illustrates this behaviour
229
230 $ echo abc | bzip2 -c >x.bz2
231 $ echo def | bzip2 -c >>x.bz2
232 $ bunzip2 -c x.bz2
233 abc
234 def
235
By default "IO::Uncompress::Bunzip2" will not behave like the bunzip2
program. It will only uncompress the first bzip2 data stream in the
file, as shown below
239
240 $ perl -MIO::Uncompress::Bunzip2=:all -e 'bunzip2 "x.bz2" => \*STDOUT'
241 abc
242
243 To force "IO::Uncompress::Bunzip2" to uncompress all the bzip2 data
244 streams, include the "MultiStream" option, as shown below
245
246 $ perl -MIO::Uncompress::Bunzip2=:all -e 'bunzip2 "x.bz2" => \*STDOUT, MultiStream => 1'
247 abc
248 def
249
250 Interoperating with Pbzip2
251 Pbzip2 (<http://compression.ca/pbzip2/>) is a parallel implementation
252 of bzip2. The output from pbzip2 consists of a series of concatenated
253 bzip2 data streams.
254
By default "IO::Uncompress::Bunzip2" will only uncompress the first
bzip2 data stream in a pbzip2 file. To uncompress the complete pbzip2
file you must include the "MultiStream" option, like this.
258
259 bunzip2 $input => \$output, MultiStream => 1
260 or die "bunzip2 failed: $Bunzip2Error\n";
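"MultiStream" is also accepted by the OO interface, which is handy when
the data should be processed line by line. In the sketch below the
pbzip2 output is only simulated, by concatenating three bzip2 streams
created in memory; that stand-in and the sample lines are assumptions
made so that the example needs no external program.

```perl
use strict;
use warnings;
use IO::Compress::Bzip2     qw(bzip2 $Bzip2Error);
use IO::Uncompress::Bunzip2 qw($Bunzip2Error);

# Stand-in for pbzip2 output: independent bzip2 streams, back to back.
my $pbzip2_like = '';
for my $block ("one\n", "two\n", "three\n") {
    my $stream;
    bzip2 \$block => \$stream
        or die "bzip2 failed: $Bzip2Error\n";
    $pbzip2_like .= $stream;
}

# Read across all the streams as if they were one file.
my $bz = IO::Uncompress::Bunzip2->new(\$pbzip2_like, MultiStream => 1)
    or die "bunzip2 failed: $Bunzip2Error\n";

my @lines;
while (my $line = <$bz>) {
    push @lines, $line;
}
$bz->close;

print scalar(@lines), " lines read\n";
```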
261
263 Apache::GZip Revisited
264 Below is a mod_perl Apache compression module, called "Apache::GZip",
265 taken from
266 http://perl.apache.org/docs/tutorials/tips/mod_perl_tricks/mod_perl_tricks.html#On_the_Fly_Compression
267
268 package Apache::GZip;
269 #File: Apache::GZip.pm
270
271 use strict vars;
272 use Apache::Constants ':common';
273 use Compress::Zlib;
274 use IO::File;
275 use constant GZIP_MAGIC => 0x1f8b;
276 use constant OS_MAGIC => 0x03;
277
278 sub handler {
279 my $r = shift;
280 my ($fh,$gz);
281 my $file = $r->filename;
282 return DECLINED unless $fh=IO::File->new($file);
283 $r->header_out('Content-Encoding'=>'gzip');
284 $r->send_http_header;
285 return OK if $r->header_only;
286
287 tie *STDOUT,'Apache::GZip',$r;
288 print($_) while <$fh>;
289 untie *STDOUT;
290 return OK;
291 }
292
293 sub TIEHANDLE {
294 my($class,$r) = @_;
295 # initialize a deflation stream
296 my $d = deflateInit(-WindowBits=>-MAX_WBITS()) || return undef;
297
298 # gzip header -- don't ask how I found out
299 $r->print(pack("nccVcc",GZIP_MAGIC,Z_DEFLATED,0,time(),0,OS_MAGIC));
300
301 return bless { r => $r,
302 crc => crc32(undef),
303 d => $d,
304 l => 0
305 },$class;
306 }
307
308 sub PRINT {
309 my $self = shift;
310 foreach (@_) {
311 # deflate the data
312 my $data = $self->{d}->deflate($_);
313 $self->{r}->print($data);
314 # keep track of its length and crc
315 $self->{l} += length($_);
316 $self->{crc} = crc32($_,$self->{crc});
317 }
318 }
319
320 sub DESTROY {
321 my $self = shift;
322
323 # flush the output buffers
324 my $data = $self->{d}->flush;
325 $self->{r}->print($data);
326
327 # print the CRC and the total length (uncompressed)
328 $self->{r}->print(pack("LL",@{$self}{qw/crc l/}));
329 }
330
331 1;
332
Here's the Apache configuration entry you'll need to make use of it.
Once set, everything in the /compressed directory will be compressed
automagically.
336
337 <Location /compressed>
338 SetHandler perl-script
339 PerlHandler Apache::GZip
340 </Location>
341
342 Although at first sight there seems to be quite a lot going on in
343 "Apache::GZip", you could sum up what the code was doing as follows --
344 read the contents of the file in "$r->filename", compress it and write
345 the compressed data to standard output. That's all.
346
347 This code has to jump through a few hoops to achieve this because
348
349 1. The gzip support in "Compress::Zlib" version 1.x can only work with
350 a real filesystem filehandle. The filehandles used by Apache
351 modules are not associated with the filesystem.
352
353 2. That means all the gzip support has to be done by hand - in this
354 case by creating a tied filehandle to deal with creating the gzip
355 header and trailer.
356
357 "IO::Compress::Gzip" doesn't have that filehandle limitation (this was
358 one of the reasons for writing it in the first place). So if
359 "IO::Compress::Gzip" is used instead of "Compress::Zlib" the whole tied
360 filehandle code can be removed. Here is the rewritten code.
361
    package Apache::GZip;

    use strict vars;
    use Apache::Constants ':common';
    use IO::Compress::Gzip;
    use IO::File;

    sub handler {
        my $r = shift;
        my $fh;
        my $file = $r->filename;
        return DECLINED unless $fh=IO::File->new($file);
        $r->header_out('Content-Encoding'=>'gzip');
        $r->send_http_header;
        return OK if $r->header_only;

        # '-' makes IO::Compress::Gzip write to standard output
        my $gz = new IO::Compress::Gzip '-', Minimal => 1
            or return DECLINED ;

        print $gz $_ while <$fh>;

        # write the gzip trailer before returning
        $gz->close;

        return OK;
    }
385
386 or even more succinctly, like this, using a one-shot gzip
387
388 package Apache::GZip;
389
390 use strict vars;
391 use Apache::Constants ':common';
392 use IO::Compress::Gzip qw(gzip);
393
394 sub handler {
395 my $r = shift;
396 $r->header_out('Content-Encoding'=>'gzip');
397 $r->send_http_header;
398 return OK if $r->header_only;
399
400 gzip $r->filename => '-', Minimal => 1
401 or return DECLINED ;
402
403 return OK;
404 }
405
406 1;
407
408 The use of one-shot "gzip" above just reads from "$r->filename" and
409 writes the compressed data to standard output.
410
411 Note the use of the "Minimal" option in the code above. When using gzip
412 for Content-Encoding you should always use this option. In the example
413 above it will prevent the filename being included in the gzip header
414 and make the size of the gzip data stream a slight bit smaller.
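To see what "Minimal" changes, the header of each stream can be
inspected with "getHeaderInfo". The sketch below is illustrative only:
it assumes in-memory buffers, and the HTML body and the name
"index.html" are invented. A buffer has no filename of its own, so the
name is supplied with the "Name" option to imitate compressing a named
file.

```perl
use strict;
use warnings;
use IO::Compress::Gzip     qw(gzip $GzipError);
use IO::Uncompress::Gunzip qw($GunzipError);

my $body = "<html><body>Hello</body></html>\n";

# One stream that records a filename, one with the smallest legal header.
my ($named, $minimal);
gzip \$body => \$named, Name => "index.html"
    or die "gzip failed: $GzipError\n";
gzip \$body => \$minimal, Minimal => 1
    or die "gzip failed: $GzipError\n";

# Inspect the header of each stream.
my $named_gz = IO::Uncompress::Gunzip->new(\$named)
    or die "gunzip failed: $GunzipError\n";
my $minimal_gz = IO::Uncompress::Gunzip->new(\$minimal)
    or die "gunzip failed: $GunzipError\n";
my $named_hdr   = $named_gz->getHeaderInfo;
my $minimal_hdr = $minimal_gz->getHeaderInfo;

printf "named:   %d bytes in total, %d byte header\n",
       length($named), $named_hdr->{HeaderLength};
printf "minimal: %d bytes in total, %d byte header\n",
       length($minimal), $minimal_hdr->{HeaderLength};
```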
415
416 Compressed files and Net::FTP
The "Net::FTP" module provides two low-level methods called "stor" and
"retr" that both return filehandles. These filehandles can be used with
the "IO::Compress/Uncompress" modules to compress or uncompress files
read from or written to an FTP Server on the fly, without having to
create a temporary file.

Firstly, here is code that uses "retr" to uncompress a file as it is
read from the FTP Server.
425
    use Net::FTP;
    use IO::Uncompress::Gunzip qw(:all);

    my $ftp = new Net::FTP ...

    my $retr_fh = $ftp->retr($compressed_filename);
    gunzip $retr_fh => $outFilename, AutoClose => 1
        or die "Cannot uncompress '$compressed_filename': $GunzipError\n";

and this to compress a file as it is written to the FTP Server

    use Net::FTP;
    use IO::Compress::Gzip qw(:all);

    my $stor_fh = $ftp->stor($filename);
    gzip $filename => $stor_fh, AutoClose => 1
        or die "Cannot compress '$filename': $GzipError\n";
443
445 Using "InputLength" to uncompress data embedded in a larger file/buffer.
446 A fairly common use-case is where compressed data is embedded in a
447 larger file/buffer and you want to read both.
448
449 As an example consider the structure of a zip file. This is a well-
450 defined file format that mixes both compressed and uncompressed
451 sections of data in a single file.
452
For the purposes of this discussion you can think of a zip file as a
sequence of compressed data streams, each of which is prefixed by an
uncompressed local header. The local header contains information about
the compressed data stream, including the name of the compressed file
and, in particular, the length of the compressed data stream.

To illustrate how to use "InputLength" here is a script that walks a
zip file and prints out how many lines are in each compressed file (if
you intend to write code to walk through a zip file for real see
"Walking through a zip file" in IO::Uncompress::Unzip). Also, although
this example uses the zlib-based compression, the technique can be used
by the other "IO::Uncompress::*" modules.
465
    use strict;
    use warnings;

    use IO::File;
    use IO::Uncompress::RawInflate qw(:all);

    use constant ZIP_LOCAL_HDR_SIG    => 0x04034b50;
    use constant ZIP_LOCAL_HDR_LENGTH => 30;

    my $file = $ARGV[0] ;

    my $fh = new IO::File "<$file"
        or die "Cannot open '$file': $!\n";

    while (1)
    {
        my $buffer;

        $fh->read($buffer, ZIP_LOCAL_HDR_LENGTH) == ZIP_LOCAL_HDR_LENGTH
            or die "Truncated file: $!\n";

        my $signature = unpack ("V", substr($buffer, 0, 4));

        # Any other signature means there are no more local headers
        last unless $signature == ZIP_LOCAL_HDR_SIG;

        # Read Local Header
        my $gpFlag             = unpack ("v", substr($buffer, 6, 2));
        my $compressedMethod   = unpack ("v", substr($buffer, 8, 2));
        my $compressedLength   = unpack ("V", substr($buffer, 18, 4));
        my $uncompressedLength = unpack ("V", substr($buffer, 22, 4));
        my $filename_length    = unpack ("v", substr($buffer, 26, 2));
        my $extra_length       = unpack ("v", substr($buffer, 28, 2));

        my $filename ;
        $fh->read($filename, $filename_length) == $filename_length
            or die "Truncated file\n";

        $fh->read($buffer, $extra_length) == $extra_length
            or die "Truncated file\n";

        # Only Store (0) and Deflate (8) are handled here
        if ($compressedMethod != 8 && $compressedMethod != 0)
        {
            warn "Skipping file '$filename' - not deflated $compressedMethod\n";
            $fh->read($buffer, $compressedLength) == $compressedLength
                or die "Truncated file\n";
            next;
        }

        # Bit 3 of the general purpose flag marks a streamed entry.
        # The parentheses are needed because == binds tighter than &.
        if ($compressedMethod == 0 && ($gpFlag & 8) == 8)
        {
            die "Streamed Stored not supported for '$filename'\n";
        }

        next if $compressedLength == 0;

        # Done reading the Local Header

        my $inf = new IO::Uncompress::RawInflate $fh,
                            Transparent => 1,
                            InputLength => $compressedLength
            or die "Cannot uncompress $file [$filename]: $RawInflateError\n" ;

        my $line_count = 0;

        while (<$inf>)
        {
            ++ $line_count;
        }

        print "$filename: $line_count\n";
    }
539
540 The majority of the code above is concerned with reading the zip local
541 header data. The code that I want to focus on is at the bottom.
542
543 while (1) {
544
545 # read local zip header data
546 # get $filename
547 # get $compressedLength
548
549 my $inf = new IO::Uncompress::RawInflate $fh,
550 Transparent => 1,
551 InputLength => $compressedLength
552 or die "Cannot uncompress $file [$filename]: $RawInflateError\n" ;
553
554 my $line_count = 0;
555
556 while (<$inf>)
557 {
558 ++ $line_count;
559 }
560
561 print "$filename: $line_count\n";
562 }
563
564 The call to "IO::Uncompress::RawInflate" creates a new filehandle $inf
565 that can be used to read from the parent filehandle $fh, uncompressing
566 it as it goes. The use of the "InputLength" option will guarantee that
567 at most $compressedLength bytes of compressed data will be read from
568 the $fh filehandle (The only exception is for an error case like a
569 truncated file or a corrupt data stream).
570
571 This means that once RawInflate is finished $fh will be left at the
572 byte directly after the compressed data stream.
573
574 Now consider what the code looks like without "InputLength"
575
576 while (1) {
577
578 # read local zip header data
579 # get $filename
580 # get $compressedLength
581
        # read all the compressed data into $data
        my $data;
        read($fh, $data, $compressedLength) == $compressedLength
            or die "Truncated file\n";

        my $inf = new IO::Uncompress::RawInflate \$data,
                            Transparent => 1
            or die "Cannot uncompress $file [$filename]: $RawInflateError\n" ;
588
589 my $line_count = 0;
590
591 while (<$inf>)
592 {
593 ++ $line_count;
594 }
595
596 print "$filename: $line_count\n";
597 }
598
599 The difference here is the addition of the temporary variable $data.
600 This is used to store a copy of the compressed data while it is being
601 uncompressed.
602
603 If you know that $compressedLength isn't that big then using temporary
604 storage won't be a problem. But if $compressedLength is very large or
605 you are writing an application that other people will use, and so have
606 no idea how big $compressedLength will be, it could be an issue.
607
608 Using "InputLength" avoids the use of temporary storage and means the
609 application can cope with large compressed data streams.
610
611 One final point -- obviously "InputLength" can only be used whenever
612 you know the length of the compressed data beforehand, like here with a
613 zip file.
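The effect of "InputLength" on the parent filehandle can be checked
with a much smaller container than a zip file. The sketch below is not
from the original script. It assumes a made-up layout (a 4-byte
big-endian length, a raw deflate payload, then the literal text
"TRAILER") held in an in-memory filehandle, and it reads the trailer
from $fh once the payload has been uncompressed.

```perl
use strict;
use warnings;
use IO::File;
use IO::Compress::RawDeflate   qw(rawdeflate $RawDeflateError);
use IO::Uncompress::RawInflate qw($RawInflateError);

# Hypothetical container: length prefix, raw deflate payload, trailer.
my $text = "line one\nline two\nline three\n";
my $payload;
rawdeflate \$text => \$payload
    or die "rawdeflate failed: $RawDeflateError\n";
my $container = pack("N", length($payload)) . $payload . "TRAILER";

# Treat the container as a file by opening a filehandle on the scalar.
open my $fh, '<', \$container
    or die "Cannot open in-memory file: $!\n";
binmode $fh;

# Read the length prefix, just as the zip script reads the local header.
my $header;
read($fh, $header, 4) == 4
    or die "Truncated container\n";
my $compressed_length = unpack("N", $header);

# Limit the uncompressor to the payload bytes only.
my $inf = IO::Uncompress::RawInflate->new($fh,
                                          InputLength => $compressed_length)
    or die "Cannot uncompress: $RawInflateError\n";

my $line_count = 0;
while (<$inf>)
{
    ++ $line_count;
}

# Whatever follows the payload is still unread on the parent filehandle.
my $trailer;
read($fh, $trailer, 7);

print "lines: $line_count\n";
print "next bytes on \$fh: $trailer\n";
```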
614
616 Compress::Zlib, IO::Compress::Gzip, IO::Uncompress::Gunzip,
617 IO::Compress::Deflate, IO::Uncompress::Inflate,
618 IO::Compress::RawDeflate, IO::Uncompress::RawInflate,
619 IO::Compress::Bzip2, IO::Uncompress::Bunzip2, IO::Compress::Lzma,
620 IO::Uncompress::UnLzma, IO::Compress::Xz, IO::Uncompress::UnXz,
621 IO::Compress::Lzop, IO::Uncompress::UnLzop, IO::Compress::Lzf,
622 IO::Uncompress::UnLzf, IO::Uncompress::AnyInflate,
623 IO::Uncompress::AnyUncompress
624
625 IO::Compress::FAQ
626
627 File::GlobMapper, Archive::Zip, Archive::Tar, IO::Zlib
628
630 This module was written by Paul Marquess, pmqs@cpan.org.
631
633 See the Changes file.
634
636 Copyright (c) 2005-2013 Paul Marquess. All rights reserved.
637
638 This program is free software; you can redistribute it and/or modify it
639 under the same terms as Perl itself.
640
641
642
perl v5.16.3 2013-05-19 IO::Compress::FAQ(3)