IO::Compress::FAQ(3)  User Contributed Perl Documentation IO::Compress::FAQ(3)



NAME
       IO::Compress::FAQ -- Frequently Asked Questions about IO::Compress

DESCRIPTION
       Common questions answered.

GENERAL
   Compatibility with Unix compress/uncompress.
       Although "Compress::Zlib" has a pair of functions called "compress" and
       "uncompress", they are not related to the Unix programs of the same
       name. The "Compress::Zlib" module is not compatible with Unix
       "compress".

       If you have the "uncompress" program available, you can use this to
       read compressed files

           open F, "uncompress -c $filename |"
               or die "Cannot run uncompress: $!\n";
           while (<F>)
           {
               ...
           }
           close F;

       Alternatively, if you have the "gunzip" program available, you can use
       this to read compressed files

           open F, "gunzip -c $filename |"
               or die "Cannot run gunzip: $!\n";
           while (<F>)
           {
               ...
           }
           close F;

       and this to write compressed files, if you have the "compress" program
       available

           # "compress -c" writes to its STDOUT, so redirect that to $filename
           open F, "| compress -c >$filename"
               or die "Cannot run compress: $!\n";
           print F "data";
           ...
           close F ;
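
       The pipe opens above pass the whole command string to a shell, so a
       $filename containing spaces or shell metacharacters can break them. A
       minimal sketch of a safer variant, assuming a Unix-like system, uses
       the list form of "open" and checks the program's exit status when the
       pipe is closed. The helper name "read_via_pipe" is hypothetical and is
       not part of IO::Compress.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of IO::Compress): run an external
# decompressor such as "uncompress -c" or "gunzip -c" and return all of
# its standard output. The list form of open() passes each argument
# straight to the program without a shell, so filenames containing
# spaces or shell metacharacters are handled safely.
sub read_via_pipe {
    my @command = @_;

    open(my $fh, '-|', @command)
        or die "Cannot run '@command': $!\n";
    binmode $fh;               # the output may be binary data

    local $/;                  # slurp mode: read everything at once
    my $data = <$fh>;
    $data = '' unless defined $data;

    # For a piped open, close() returns false when the program exits with
    # a non-zero status, which is how a corrupt or missing file shows up.
    close($fh)
        or die "'@command' failed (exit status " . ($? >> 8) . ")\n";

    return $data;
}
```

       For example, read_via_pipe('uncompress', '-c', $filename) returns the
       uncompressed contents of $filename, or dies if "uncompress" fails.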

   Accessing .tar.Z files
       The "Archive::Tar" module can optionally use "Compress::Zlib" (via the
       "IO::Zlib" module) to access tar files that have been compressed with
       "gzip". Unfortunately, tar files compressed with the Unix "compress"
       utility cannot be read by "Compress::Zlib" and so cannot be directly
       accessed by "Archive::Tar".

       If the "uncompress" or "gunzip" programs are available, you can use one
       of these workarounds to read ".tar.Z" files from "Archive::Tar".

       Firstly with "uncompress"

           use strict;
           use warnings;
           use Archive::Tar;

           open F, "uncompress -c $filename |"
               or die "Cannot run uncompress: $!\n";
           my $tar = Archive::Tar->new(\*F);
           ...

       and this with "gunzip"

           use strict;
           use warnings;
           use Archive::Tar;

           open F, "gunzip -c $filename |"
               or die "Cannot run gunzip: $!\n";
           my $tar = Archive::Tar->new(\*F);
           ...

       Similarly, if the "compress" program is available, you can use this to
       write a ".tar.Z" file

           use strict;
           use warnings;
           use Archive::Tar;
           use IO::File;

           my $fh = IO::File->new( "| compress -c >$filename" )
               or die "Cannot run compress: $!\n";
           my $tar = Archive::Tar->new();
           ...
           $tar->write($fh);
           $fh->close ;

   How do I recompress using a different compression?
       This is easier than you might expect if you realise that all the
       "IO::Compress::*" objects are derived from "IO::File" and that all the
       "IO::Uncompress::*" modules can read from an "IO::File" filehandle.

       So, for example, say you have a file compressed with gzip that you want
       to recompress with bzip2. Here is all that is needed to carry out the
       recompression.

           use IO::Uncompress::Gunzip ':all';
           use IO::Compress::Bzip2 ':all';

           my $gzipFile = "somefile.gz";
           my $bzipFile = "somefile.bz2";

           my $gunzip = IO::Uncompress::Gunzip->new( $gzipFile )
               or die "Cannot gunzip $gzipFile: $GunzipError\n" ;

           bzip2 $gunzip => $bzipFile
               or die "Cannot bzip2 to $bzipFile: $Bzip2Error\n" ;

       Note that there is a limitation to this technique. Some compression
       file formats store extra information along with the compressed data
       payload. For example, gzip can optionally store the original filename,
       and Zip stores a lot of information about the original file. If the
       original compressed file contains any of this extra information, it
       will not be transferred to the new compressed file using the technique
       above.
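
       The same chaining also works entirely in memory. The sketch below
       assumes the gzip data is already held in a scalar rather than a file;
       the helper name "gzip_to_bzip2" is hypothetical. As noted above, header
       details such as a stored filename are not carried across.

```perl
use strict;
use warnings;

use IO::Compress::Gzip      qw(gzip $GzipError);
use IO::Uncompress::Gunzip  qw($GunzipError);
use IO::Compress::Bzip2     qw(bzip2 $Bzip2Error);
use IO::Uncompress::Bunzip2 qw(bunzip2 $Bunzip2Error);

# Hypothetical helper: recompress gzip bytes held in a scalar as bzip2.
sub gzip_to_bzip2 {
    my ($gzipped) = @_;

    # The Gunzip object is itself a filehandle, so it can be passed
    # directly as the input of the one-shot bzip2 function.
    my $gunzip = IO::Uncompress::Gunzip->new(\$gzipped)
        or die "Cannot gunzip: $GunzipError\n";

    my $bzipped;
    bzip2 $gunzip => \$bzipped
        or die "Cannot bzip2: $Bzip2Error\n";

    return $bzipped;
}

# Usage: gzip some text, recompress it as bzip2, then check the result
# by uncompressing the bzip2 data again.
my $original = "The quick brown fox\n" x 100;
my $gzipped;
gzip \$original => \$gzipped
    or die "gzip failed: $GzipError\n";

my $bzipped = gzip_to_bzip2($gzipped);

my $roundtrip;
bunzip2 \$bzipped => \$roundtrip
    or die "bunzip2 failed: $Bunzip2Error\n";
```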

ZIP
   What Compression Types do IO::Compress::Zip & IO::Uncompress::Unzip support?
       The following compression formats are supported by "IO::Compress::Zip"
       and "IO::Uncompress::Unzip":

       •   Store (method 0)

           No compression at all.

       •   Deflate (method 8)

           This is the default compression used when creating a zip file with
           "IO::Compress::Zip".

       •   Bzip2 (method 12)

           Only supported if the "IO-Compress-Bzip2" module is installed.

       •   Lzma (method 14)

           Only supported if the "IO-Compress-Lzma" module is installed.
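
       The method is chosen per member with the "Method" option. A sketch,
       assuming the "ZIP_CM_STORE" and "ZIP_CM_DEFLATE" constants exported by
       the ":zip_method" tag of IO::Compress::Zip; the member names and the
       "make_zip" helper are made up for illustration. "newStream" starts a
       second member in the same archive.

```perl
use strict;
use warnings;

use IO::Compress::Zip     qw($ZipError :zip_method);
use IO::Uncompress::Unzip qw(unzip $UnzipError);

# Hypothetical helper: build an in-memory zip with one stored member
# (method 0) and one deflated member (method 8).
sub make_zip {
    my ($stored_text, $deflated_text) = @_;

    my $zipdata;
    my $zip = IO::Compress::Zip->new(\$zipdata,
                                     Name   => "stored.txt",
                                     Method => ZIP_CM_STORE)
        or die "Cannot create zip: $ZipError\n";
    print $zip $stored_text;

    # Close the first member and start a second one.
    $zip->newStream(Name   => "deflated.txt",
                    Method => ZIP_CM_DEFLATE)
        or die "Cannot add member: $ZipError\n";
    print $zip $deflated_text;

    $zip->close()
        or die "Cannot close zip: $ZipError\n";

    return $zipdata;
}

# Usage: build the archive, then read each member back by name.
my $stored_text   = "this text is stored as-is\n";
my $deflated_text = "this text is deflated\n" x 50;
my $zipdata = make_zip($stored_text, $deflated_text);

my ($stored_out, $deflated_out);
unzip \$zipdata => \$stored_out, Name => "stored.txt"
    or die "unzip failed: $UnzipError\n";
unzip \$zipdata => \$deflated_out, Name => "deflated.txt"
    or die "unzip failed: $UnzipError\n";
```

       Because Store applies no compression, the stored member's bytes appear
       unchanged inside the zip data.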

   Can I Read/Write Zip files larger than 4 Gig?
       Yes, both the "IO::Compress::Zip" and "IO::Uncompress::Unzip" modules
       support the zip feature called Zip64. That allows them to read/write
       files/buffers larger than 4Gig.

       If you are creating a Zip file using the one-shot interface, and any of
       the input files is greater than 4Gig, a zip64 compliant zip file will
       be created.

           zip "really-large-file" => "my.zip";

       Similarly with the one-shot interface, if the input is a buffer larger
       than 4 Gig, a zip64 compliant zip file will be created.

           zip \$really_large_buffer => "my.zip";

       The one-shot interface allows you to force the creation of a zip64 zip
       file by including the "Zip64" option.

           zip $filehandle => "my.zip", Zip64 => 1;

       If you want to create a zip64 zip file with the OO interface you must
       specify the "Zip64" option.

           my $zip = IO::Compress::Zip->new( "whatever", Zip64 => 1 );

       When uncompressing with "IO::Uncompress::Unzip", it will automatically
       detect if the zip file is zip64.

       If you intend to manipulate the Zip64 zip files created with
       "IO::Compress::Zip" using an external zip/unzip, make sure that it
       supports Zip64.

       In particular, if you are using Info-Zip you need to have zip version
       3.x or better to update a Zip64 archive and unzip version 6.x to read a
       zip64 archive.
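
       As a quick check that the reading side needs no extra option, the
       sketch below forces a tiny Zip64 archive in memory with the OO
       interface and reads it back with the one-shot "unzip" function. The
       member name "small.txt" is illustrative.

```perl
use strict;
use warnings;

use IO::Compress::Zip     qw($ZipError);
use IO::Uncompress::Unzip qw(unzip $UnzipError);

my $payload = "small payload\n";
my $zipdata;

# Zip64 => 1 forces the zip64 format even though the data is tiny.
my $zip = IO::Compress::Zip->new(\$zipdata, Name => "small.txt", Zip64 => 1)
    or die "Cannot create zip: $ZipError\n";
print $zip $payload;
$zip->close()
    or die "Cannot close zip: $ZipError\n";

# No Zip64 option here: IO::Uncompress::Unzip detects it by itself.
my $unzipped;
unzip \$zipdata => \$unzipped
    or die "Cannot unzip: $UnzipError\n";
```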

   Can I write more than 64K entries in a Zip file?
       Yes. Zip64 allows this. See previous question.

   Zip Resources
       The primary reference for zip files is the "appnote" document available
       at <http://www.pkware.com/documents/casestudies/APPNOTE.TXT>.

       An alternative is the Info-Zip appnote. This is available from
       <ftp://ftp.info-zip.org/pub/infozip/doc/>.

GZIP
   Gzip Resources
       The primary reference for gzip files is RFC 1952
       <https://datatracker.ietf.org/doc/html/rfc1952>.

       The primary site for gzip is <http://www.gzip.org>.

   Dealing with concatenated gzip files
       If the gunzip program encounters a file containing multiple gzip files
       concatenated together it will automatically uncompress them all. The
       example below illustrates this behaviour:

           $ echo abc | gzip -c >x.gz
           $ echo def | gzip -c >>x.gz
           $ gunzip -c x.gz
           abc
           def

       By default "IO::Uncompress::Gunzip" will not behave like the gunzip
       program. It will only uncompress the first gzip data stream in the
       file, as shown below:

           $ perl -MIO::Uncompress::Gunzip=:all -e 'gunzip "x.gz" => \*STDOUT'
           abc

       To force "IO::Uncompress::Gunzip" to uncompress all the gzip data
       streams, include the "MultiStream" option, as shown below:

           $ perl -MIO::Uncompress::Gunzip=:all -e 'gunzip "x.gz" => \*STDOUT, MultiStream => 1'
           abc
           def
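
       The same behaviour can be reproduced without the shell. This sketch
       builds the equivalent of x.gz in memory by concatenating two
       independent gzip data streams, then uncompresses it with and without
       "MultiStream".

```perl
use strict;
use warnings;

use IO::Compress::Gzip     qw(gzip $GzipError);
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

# Two separate gzip data streams, like the two "gzip -c" runs above.
my ($abc, $def) = ("abc\n", "def\n");
my ($first, $second);
gzip \$abc => \$first  or die "gzip failed: $GzipError\n";
gzip \$def => \$second or die "gzip failed: $GzipError\n";
my $concatenated = $first . $second;

# Default behaviour: only the first gzip data stream is uncompressed.
my $first_only;
gunzip \$concatenated => \$first_only
    or die "gunzip failed: $GunzipError\n";

# With MultiStream => 1 every gzip data stream is uncompressed.
my $everything;
gunzip \$concatenated => \$everything, MultiStream => 1
    or die "gunzip failed: $GunzipError\n";
```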

   Reading bgzip files with IO::Uncompress::Gunzip
       A "bgzip" file consists of a series of valid gzip-compliant data
       streams concatenated together. To read a file created by "bgzip" with
       "IO::Uncompress::Gunzip", use the "MultiStream" option as shown in the
       previous section.

       See the section titled "The BGZF compression format" in
       <http://samtools.github.io/hts-specs/SAMv1.pdf> for a definition of
       "bgzip".

ZLIB
   Zlib Resources
       The primary site for the zlib compression library is
       <http://www.zlib.org>.

Bzip2
   Bzip2 Resources
       The primary site for bzip2 is <http://www.bzip.org>.

   Dealing with Concatenated bzip2 files
       If the bunzip2 program encounters a file containing multiple bzip2
       files concatenated together it will automatically uncompress them all.
       The example below illustrates this behaviour:

           $ echo abc | bzip2 -c >x.bz2
           $ echo def | bzip2 -c >>x.bz2
           $ bunzip2 -c x.bz2
           abc
           def

       By default "IO::Uncompress::Bunzip2" will not behave like the bunzip2
       program. It will only uncompress the first bzip2 data stream in the
       file, as shown below:

           $ perl -MIO::Uncompress::Bunzip2=:all -e 'bunzip2 "x.bz2" => \*STDOUT'
           abc

       To force "IO::Uncompress::Bunzip2" to uncompress all the bzip2 data
       streams, include the "MultiStream" option, as shown below:

           $ perl -MIO::Uncompress::Bunzip2=:all -e 'bunzip2 "x.bz2" => \*STDOUT, MultiStream => 1'
           abc
           def

   Interoperating with Pbzip2
       Pbzip2 (<http://compression.ca/pbzip2/>) is a parallel implementation
       of bzip2. The output from pbzip2 consists of a series of concatenated
       bzip2 data streams.

       By default "IO::Uncompress::Bunzip2" will only uncompress the first
       bzip2 data stream in a pbzip2 file. To uncompress the complete pbzip2
       file you must include the "MultiStream" option, like this.

           bunzip2 $input => \$output, MultiStream => 1
               or die "bunzip2 failed: $Bunzip2Error\n";
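
       "MultiStream" is also accepted by the OO interface, which suits
       line-by-line processing. In the sketch below, two bzip2 data streams
       concatenated in a buffer stand in for real pbzip2 output (an
       assumption based on the layout described above); the helper name
       "count_lines_multistream" is hypothetical.

```perl
use strict;
use warnings;

use IO::Compress::Bzip2     qw(bzip2 $Bzip2Error);
use IO::Uncompress::Bunzip2 qw($Bunzip2Error);

# Hypothetical helper: count the lines across all bzip2 data streams in
# $input, which may be a filename or a reference to a buffer.
sub count_lines_multistream {
    my ($input) = @_;

    my $bunzip2 = IO::Uncompress::Bunzip2->new($input, MultiStream => 1)
        or die "Cannot open bzip2 input: $Bunzip2Error\n";

    my $lines = 0;
    ++$lines while <$bunzip2>;

    $bunzip2->close();
    return $lines;
}

# Usage: build a pbzip2-style buffer from two separate bzip2 streams.
my ($text1, $text2) = ("one\ntwo\n", "three\n");
my ($part1, $part2);
bzip2 \$text1 => \$part1 or die "bzip2 failed: $Bzip2Error\n";
bzip2 \$text2 => \$part2 or die "bzip2 failed: $Bzip2Error\n";
my $pbzip2_like = $part1 . $part2;

my $total_lines = count_lines_multistream(\$pbzip2_like);
```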

HTTP & NETWORK
   Apache::GZip Revisited
       Below is a mod_perl Apache compression module, called "Apache::GZip",
       taken from
       <http://perl.apache.org/docs/tutorials/tips/mod_perl_tricks/mod_perl_tricks.html#On_the_Fly_Compression>

           package Apache::GZip;
           #File: Apache::GZip.pm

           use strict vars;
           use Apache::Constants ':common';
           use Compress::Zlib;
           use IO::File;
           use constant GZIP_MAGIC => 0x1f8b;
           use constant OS_MAGIC => 0x03;

           sub handler {
               my $r = shift;
               my ($fh,$gz);
               my $file = $r->filename;
               return DECLINED unless $fh=IO::File->new($file);
               $r->header_out('Content-Encoding'=>'gzip');
               $r->send_http_header;
               return OK if $r->header_only;

               tie *STDOUT,'Apache::GZip',$r;
               print($_) while <$fh>;
               untie *STDOUT;
               return OK;
           }

           sub TIEHANDLE {
               my($class,$r) = @_;
               # initialize a deflation stream
               my $d = deflateInit(-WindowBits=>-MAX_WBITS()) || return undef;

               # gzip header -- don't ask how I found out
               $r->print(pack("nccVcc",GZIP_MAGIC,Z_DEFLATED,0,time(),0,OS_MAGIC));

               return bless { r   => $r,
                              crc => crc32(undef),
                              d   => $d,
                              l   => 0
                            },$class;
           }

           sub PRINT {
               my $self = shift;
               foreach (@_) {
                   # deflate the data
                   my $data = $self->{d}->deflate($_);
                   $self->{r}->print($data);
                   # keep track of its length and crc
                   $self->{l} += length($_);
                   $self->{crc} = crc32($_,$self->{crc});
               }
           }

           sub DESTROY {
               my $self = shift;

               # flush the output buffers
               my $data = $self->{d}->flush;
               $self->{r}->print($data);

               # print the CRC and the total length (uncompressed)
               $self->{r}->print(pack("LL",@{$self}{qw/crc l/}));
           }

           1;

       Here's the Apache configuration entry you'll need to make use of it.
       Once set, it will result in everything in the /compressed directory
       being compressed automagically.

           <Location /compressed>
               SetHandler  perl-script
               PerlHandler Apache::GZip
           </Location>

       Although at first sight there seems to be quite a lot going on in
       "Apache::GZip", you could sum up what the code was doing as follows:
       read the contents of the file in "$r->filename", compress it and write
       the compressed data to standard output. That's all.

       This code has to jump through a few hoops to achieve this because

       1.  The gzip support in "Compress::Zlib" version 1.x can only work with
           a real filesystem filehandle. The filehandles used by Apache
           modules are not associated with the filesystem.

       2.  That means all the gzip support has to be done by hand; in this
           case, by creating a tied filehandle to deal with creating the gzip
           header and trailer.

       "IO::Compress::Gzip" doesn't have that filehandle limitation (this was
       one of the reasons for writing it in the first place). So if
       "IO::Compress::Gzip" is used instead of "Compress::Zlib" the whole tied
       filehandle code can be removed. Here is the rewritten code.

           package Apache::GZip;

           use strict vars;
           use Apache::Constants ':common';
           use IO::Compress::Gzip;
           use IO::File;

           sub handler {
               my $r = shift;
               my $fh;
               my $file = $r->filename;
               return DECLINED unless $fh=IO::File->new($file);
               $r->header_out('Content-Encoding'=>'gzip');
               $r->send_http_header;
               return OK if $r->header_only;

               # '-' tells IO::Compress::Gzip to write to STDOUT
               my $gz = IO::Compress::Gzip->new( '-', Minimal => 1 )
                   or return DECLINED ;

               print $gz $_ while <$fh>;

               # close() flushes the compressed data and writes the gzip trailer
               $gz->close;

               return OK;
           }

           1;

       or even more succinctly, like this, using a one-shot gzip

           package Apache::GZip;

           use strict vars;
           use Apache::Constants ':common';
           use IO::Compress::Gzip qw(gzip);

           sub handler {
               my $r = shift;
               $r->header_out('Content-Encoding'=>'gzip');
               $r->send_http_header;
               return OK if $r->header_only;

               gzip $r->filename => '-', Minimal => 1
                   or return DECLINED ;

               return OK;
           }

           1;

       The use of one-shot "gzip" above just reads from "$r->filename" and
       writes the compressed data to standard output.

       Note the use of the "Minimal" option in the code above. When using gzip
       for Content-Encoding you should always use this option. In the example
       above it will prevent the filename being included in the gzip header
       and make the size of the gzip data stream slightly smaller.
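
       The effect of "Minimal" can be seen directly in the header bytes. This
       sketch compresses the same buffer once with an explicit "Name" (the
       value "index.html" is illustrative) and once with "Minimal", then
       inspects the FLG byte, which RFC 1952 places at offset 3; bit 0x08
       (FNAME) marks that a filename follows the fixed 10-byte header.

```perl
use strict;
use warnings;

use IO::Compress::Gzip     qw(gzip $GzipError);
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

my $page = "<html><body>hello</body></html>\n";

my ($with_name, $minimal);
gzip \$page => \$with_name, Name => "index.html"
    or die "gzip failed: $GzipError\n";
gzip \$page => \$minimal, Minimal => 1
    or die "gzip failed: $GzipError\n";

# Byte 3 of a gzip header is FLG; bit 0x08 (FNAME) means a filename follows.
my $flags_with_name = ord substr($with_name, 3, 1);
my $flags_minimal   = ord substr($minimal,   3, 1);

printf "with Name: %d bytes, FNAME %s\n",
    length $with_name, ($flags_with_name & 0x08) ? "set" : "clear";
printf "Minimal:   %d bytes, FNAME %s\n",
    length $minimal,   ($flags_minimal & 0x08)   ? "set" : "clear";

# Either way the payload itself is unchanged.
my $check;
gunzip \$minimal => \$check
    or die "gunzip failed: $GunzipError\n";
```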

   Compressed files and Net::FTP
       The "Net::FTP" module provides two low-level methods called "stor" and
       "retr" that both return filehandles. These filehandles can be used with
       the "IO::Compress/Uncompress" modules to compress or uncompress files
       read from or written to an FTP Server on the fly, without having to
       create a temporary file.

       Firstly, here is code that uses "retr" to uncompress a file as it is
       read from the FTP Server.

           use Net::FTP;
           use IO::Uncompress::Gunzip qw(:all);

           my $ftp = Net::FTP->new( ... )
               or die "Cannot connect: $@\n";
           $ftp->binary;   # compressed data must not be altered in transit

           my $retr_fh = $ftp->retr($compressed_filename);
           gunzip $retr_fh => $outFilename, AutoClose => 1
               or die "Cannot uncompress '$compressed_filename': $GunzipError\n";

       and this to compress a file as it is written to the FTP Server

           use Net::FTP;
           use IO::Compress::Gzip qw(:all);

           my $ftp = Net::FTP->new( ... )
               or die "Cannot connect: $@\n";
           $ftp->binary;   # compressed data must not be altered in transit

           my $stor_fh = $ftp->stor($filename);
           gzip "filename" => $stor_fh, AutoClose => 1
               or die "Cannot compress '$filename': $GzipError\n";

MISC
   Using "InputLength" to uncompress data embedded in a larger file/buffer.
       A fairly common use-case is where compressed data is embedded in a
       larger file/buffer and you want to read both.

       As an example consider the structure of a zip file. This is a
       well-defined file format that mixes both compressed and uncompressed
       sections of data in a single file.

       For the purposes of this discussion you can think of a zip file as a
       sequence of compressed data streams, each of which is prefixed by an
       uncompressed local header. The local header contains information about
       the compressed data stream, including the name of the compressed file
       and, in particular, the length of the compressed data stream.

       To illustrate how to use "InputLength" here is a script that walks a
       zip file and prints out how many lines are in each compressed file (if
       you intend to write code to walk through a zip file for real, see
       "Walking through a zip file" in IO::Uncompress::Unzip). Also, although
       this example uses zlib-based compression, the technique can be used
       with the other "IO::Uncompress::*" modules.

           use strict;
           use warnings;

           use IO::File;
           use IO::Uncompress::RawInflate qw(:all);

           use constant ZIP_LOCAL_HDR_SIG => 0x04034b50;
           use constant ZIP_LOCAL_HDR_LENGTH => 30;

           my $file = $ARGV[0]
               or die "Usage: $0 zipfile\n";

           my $fh = IO::File->new( "<$file" )
               or die "Cannot open '$file': $!\n";
           binmode $fh;    # zip files are binary

           while (1)
           {
               my $buffer;

               $fh->read($buffer, ZIP_LOCAL_HDR_LENGTH) == ZIP_LOCAL_HDR_LENGTH
                   or die "Truncated file\n";

               my $signature = unpack ("V", substr($buffer, 0, 4));

               # Stop at the first record that is not a local header
               # (normally the start of the central directory).
               last unless $signature == ZIP_LOCAL_HDR_SIG;

               # Read Local Header
               my $gpFlag             = unpack ("v", substr($buffer, 6, 2));
               my $compressedMethod   = unpack ("v", substr($buffer, 8, 2));
               my $compressedLength   = unpack ("V", substr($buffer, 18, 4));
               my $uncompressedLength = unpack ("V", substr($buffer, 22, 4));
               my $filename_length    = unpack ("v", substr($buffer, 26, 2));
               my $extra_length       = unpack ("v", substr($buffer, 28, 2));

               my $filename ;
               $fh->read($filename, $filename_length) == $filename_length
                   or die "Truncated file\n";

               $fh->read($buffer, $extra_length) == $extra_length
                   or die "Truncated file\n";

               if ($compressedMethod != 8 && $compressedMethod != 0)
               {
                   warn "Skipping file '$filename' - not deflated $compressedMethod\n";
                   $fh->read($buffer, $compressedLength) == $compressedLength
                       or die "Truncated file\n";
                   next;
               }

               # Bit 3 of the flags means the sizes are stored after the
               # data, so the length of stored data is not known up front.
               # The parentheses matter: "==" binds tighter than "&".
               if ($compressedMethod == 0 && ($gpFlag & 8) == 8)
               {
                   die "Streamed Stored not supported for '$filename'\n";
               }

               next if $compressedLength == 0;

               # Done reading the Local Header

               my $inf = IO::Uncompress::RawInflate->new( $fh,
                                   Transparent => 1,
                                   InputLength => $compressedLength )
                 or die "Cannot uncompress $file [$filename]: $RawInflateError\n" ;

               my $line_count = 0;

               while (<$inf>)
               {
                   ++ $line_count;
               }

               print "$filename: $line_count\n";
           }

       The majority of the code above is concerned with reading the zip local
       header data. The code that I want to focus on is at the bottom.

           while (1) {

               # read local zip header data
               # get $filename
               # get $compressedLength

               my $inf = IO::Uncompress::RawInflate->new( $fh,
                                   Transparent => 1,
                                   InputLength => $compressedLength )
                 or die "Cannot uncompress $file [$filename]: $RawInflateError\n" ;

               my $line_count = 0;

               while (<$inf>)
               {
                   ++ $line_count;
               }

               print "$filename: $line_count\n";
           }

       The call to "IO::Uncompress::RawInflate" creates a new filehandle $inf
       that can be used to read from the parent filehandle $fh, uncompressing
       it as it goes. The use of the "InputLength" option will guarantee that
       at most $compressedLength bytes of compressed data will be read from
       the $fh filehandle (the only exception is for an error case like a
       truncated file or a corrupt data stream).

       This means that once RawInflate is finished, $fh will be left at the
       byte directly after the compressed data stream.
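
       A self-contained way to observe this is to build a small file in a
       made-up format (not zip): a 4-byte big-endian length, that many bytes
       of raw deflate data, then some uncompressed trailing bytes. The sketch
       below, which relies only on the core IO::Compress and File::Temp
       modules, uncompresses the middle section with "InputLength" and then
       reads the trailer straight from the parent filehandle.

```perl
use strict;
use warnings;

use File::Temp qw(tempfile);
use IO::File;
use IO::Compress::RawDeflate   qw(rawdeflate $RawDeflateError);
use IO::Uncompress::RawInflate qw($RawInflateError);

my $text    = "line 1\nline 2\nline 3\n";
my $trailer = "UNCOMPRESSED TRAILER";

my $compressed;
rawdeflate \$text => \$compressed
    or die "rawdeflate failed: $RawDeflateError\n";

# Made-up layout: length prefix, raw deflate data, uncompressed trailer.
my ($tmp, $tmpname) = tempfile(UNLINK => 1);
binmode $tmp;
print $tmp pack("N", length $compressed), $compressed, $trailer;
close $tmp or die "Cannot write '$tmpname': $!\n";

my $fh = IO::File->new($tmpname, "r")
    or die "Cannot open '$tmpname': $!\n";
binmode $fh;

# Read the uncompressed length prefix.
my $header;
$fh->read($header, 4) == 4 or die "Truncated file\n";
my $compressedLength = unpack("N", $header);

# Uncompress at most $compressedLength bytes from $fh.
my $inf = IO::Uncompress::RawInflate->new($fh, InputLength => $compressedLength)
    or die "Cannot uncompress: $RawInflateError\n";

my $line_count = 0;
++$line_count while <$inf>;
$inf->close;    # AutoClose is off by default, so $fh stays open

# $fh should now be at the first byte after the compressed data.
my $rest;
$fh->read($rest, length $trailer);

printf "%d lines, then '%s'\n", $line_count, $rest;
```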

       Now consider what the code looks like without "InputLength"

           while (1) {

               # read local zip header data
               # get $filename
               # get $compressedLength

               # read all the compressed data into $data
               my $data;
               read($fh, $data, $compressedLength) == $compressedLength
                   or die "Truncated file\n";

               my $inf = IO::Uncompress::RawInflate->new( \$data,
                                   Transparent => 1 )
                 or die "Cannot uncompress $file [$filename]: $RawInflateError\n" ;

               my $line_count = 0;

               while (<$inf>)
               {
                   ++ $line_count;
               }

               print "$filename: $line_count\n";
           }

       The difference here is the addition of the temporary variable $data.
       This is used to store a copy of the compressed data while it is being
       uncompressed.

       If you know that $compressedLength isn't that big then using temporary
       storage won't be a problem. But if $compressedLength is very large, or
       you are writing an application that other people will use and so have
       no idea how big $compressedLength will be, it could be an issue.

       Using "InputLength" avoids the use of temporary storage and means the
       application can cope with large compressed data streams.

       One final point: obviously "InputLength" can only be used when you
       know the length of the compressed data beforehand, like here with a
       zip file.

SUPPORT
       General feedback/questions/bug reports should be sent to
       <https://github.com/pmqs//issues> (preferred) or
       <https://rt.cpan.org/Public/Dist/Display.html?Name=>.

SEE ALSO
       Compress::Zlib, IO::Compress::Gzip, IO::Uncompress::Gunzip,
       IO::Compress::Deflate, IO::Uncompress::Inflate,
       IO::Compress::RawDeflate, IO::Uncompress::RawInflate,
       IO::Compress::Bzip2, IO::Uncompress::Bunzip2, IO::Compress::Lzma,
       IO::Uncompress::UnLzma, IO::Compress::Xz, IO::Uncompress::UnXz,
       IO::Compress::Lzip, IO::Uncompress::UnLzip, IO::Compress::Lzop,
       IO::Uncompress::UnLzop, IO::Compress::Lzf, IO::Uncompress::UnLzf,
       IO::Compress::Zstd, IO::Uncompress::UnZstd, IO::Uncompress::AnyInflate,
       IO::Uncompress::AnyUncompress

       IO::Compress::FAQ

       File::GlobMapper, Archive::Zip, Archive::Tar, IO::Zlib

AUTHOR
       This module was written by Paul Marquess, "pmqs@cpan.org".

MODIFICATION HISTORY
       See the Changes file.

COPYRIGHT AND LICENSE
       Copyright (c) 2005-2023 Paul Marquess. All rights reserved.

       This program is free software; you can redistribute it and/or modify it
       under the same terms as Perl itself.



perl v5.38.0                      2023-07-26             IO::Compress::FAQ(3)