DICTZIP(1)                                                    DICTZIP(1)

NAME
dictzip, dictunzip - compress (or expand) files, allowing random access

SYNOPSIS
dictzip [options] name
dictunzip [options] name

DESCRIPTION
dictzip compresses files using the gzip(1) algorithm (LZ77) in a manner
which is completely compatible with the gzip file format. An extension
to the gzip file format (Extra Field, described in section 2.3.1.1 of
RFC 1952) allows extra data to be stored in the header of a compressed
file. Programs like gzip and zcat will ignore this extra data. However,
dictd(8), the DICT protocol dictionary server, will make use of this
data to perform pseudo-random access on the file. Files in the dictzip
format should end in ".dz" so that they may be distinguished from
common gzip files that do not contain the special header information.

From RFC 1952, the extra field is specified as follows:

    If the FLG.FEXTRA bit is set, an "extra field" is present in the
    header, with total length XLEN bytes. It consists of a series
    of subfields, each of the form:

    +---+---+---+---+==================================+
    |SI1|SI2|  LEN  |... LEN bytes of subfield data ...|
    +---+---+---+---+==================================+

    SI1 and SI2 provide a subfield ID, typically two ASCII letters
    with some mnemonic value. Jean-Loup Gailly
    <gzip@prep.ai.mit.edu> is maintaining a registry of subfield
    IDs; please send him any subfield ID you wish to use. Subfield
    IDs with SI2 = 0 are reserved for future use.

    LEN gives the length of the subfield data, excluding the 4
    initial bytes.

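The subfield layout quoted above can be walked with a few lines of
Python. This is a minimal sketch, not code from dictzip: the function
name iter_extra_subfields and the "XY" subfield ID are made up for
illustration, and the header is assumed to be held in memory as bytes.

```python
import struct

def iter_extra_subfields(header: bytes):
    """Yield (subfield_id, data) pairs from the extra field of a gzip header."""
    # Fixed 10-byte header: ID1, ID2, CM, FLG, MTIME (4 bytes), XFL, OS.
    id1, id2, _cm, flg = struct.unpack_from("<BBBB", header, 0)
    if (id1, id2) != (0x1F, 0x8B):
        raise ValueError("not a gzip member")
    if not flg & 0x04:                      # FLG.FEXTRA is bit 2
        return
    (xlen,) = struct.unpack_from("<H", header, 10)
    pos, end = 12, 12 + xlen
    while pos + 4 <= end:
        si1, si2, length = struct.unpack_from("<BBH", header, pos)
        pos += 4                            # LEN excludes these 4 bytes
        yield bytes((si1, si2)), header[pos:pos + length]
        pos += length

# Hypothetical header carrying one subfield with the made-up ID "XY".
payload = b"hello"
extra = b"XY" + struct.pack("<H", len(payload)) + payload
header = (bytes([0x1F, 0x8B, 8, 0x04]) + bytes(6)
          + struct.pack("<H", len(extra)) + extra)
print(list(iter_extra_subfields(header)))   # [(b'XY', b'hello')]
```

A reader that does not recognize a subfield ID can skip it using LEN
alone, which is how the extra data can be ignored by other programs.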
The dictzip program uses 'R' for SI1, and 'A' for SI2 (i.e., "Random
Access"). After the LEN field, the data is arranged as follows:

    +---+---+---+---+---+---+===============================+
    |  VER  | CHLEN | CHCNT |  ... CHCNT words of data ...  |
    +---+---+---+---+---+---+===============================+

As per RFC 1952, all data is stored least-significant byte first. For
VER 1 of the data, all values are unsigned integers 16 bits (2 bytes)
long.

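As an illustration of the layout above, the payload of an 'RA'
subfield (the bytes following LEN) could be decoded as follows. This
is a sketch under the VER 1 rules stated here; the function name
parse_ra_subfield and the sample values are hypothetical.

```python
import struct

def parse_ra_subfield(data: bytes):
    """Decode VER, CHLEN, CHCNT and the per-chunk compressed sizes."""
    # Three unsigned 16-bit little-endian values lead the payload.
    ver, chlen, chcnt = struct.unpack_from("<HHH", data, 0)
    if ver != 1:
        raise ValueError(f"unsupported RA version {ver}")
    # CHCNT further 16-bit words: the compressed length of each chunk.
    sizes = struct.unpack_from(f"<{chcnt}H", data, 6)
    return chlen, list(sizes)

# Made-up payload: version 1, 1000-byte chunks, three compressed sizes.
payload = struct.pack("<HHH3H", 1, 1000, 3, 400, 380, 125)
print(parse_ra_subfield(payload))           # (1000, [400, 380, 125])
```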
XLEN (which is specified earlier in the header) is a two-byte integer,
so the extra field can be 0xffff bytes long, 2 bytes of which are used
for the subfield ID (SI1 and SI2), and 2 bytes of which are used for
the subfield length (LEN). This leaves 0xfffb bytes (0x7ffd 2-byte
entries or 0x3ffe 4-byte entries). Given that the zip output buffer
must be 10% + 12 bytes larger than the input buffer, we can store 58969
bytes per entry, or about 1.8GB if the 2-byte entries are used. If
this becomes a limiting factor, another format version can be selected
and defined for 4-byte entries.

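The capacity estimate above can be reproduced with integer arithmetic.
The sketch below takes the figure of 58969 bytes per entry as given
and, like the estimate in the text, does not subtract the 6 bytes
occupied by VER, CHLEN and CHCNT. The variable names are illustrative.

```python
XLEN_MAX = 0xFFFF                 # largest value of the 2-byte XLEN
SUBFIELD_OVERHEAD = 2 + 2         # SI1/SI2 plus LEN
available = XLEN_MAX - SUBFIELD_OVERHEAD

entries_2byte = available // 2    # whole 2-byte entries that fit
entries_4byte = available // 4    # whole 4-byte entries that fit

CHUNK_BYTES = 58969               # bytes per entry, as quoted in the text
max_bytes = entries_2byte * CHUNK_BYTES

print(hex(available), hex(entries_2byte), hex(entries_4byte))
# 0xfffb 0x7ffd 0x3ffe
print(max_bytes, round(max_bytes / 2**30, 2))
# 1932119285 1.8
```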
For compression, the file is divided up into "chunks" of data. Each
chunk is less than 64kB, and can be compressed into an area that is
also less than 64kB long (taking incompressible data into account;
usually the data is compressed into a block that is much smaller than
the original). The CHLEN field specifies the length of a "chunk" of
data. The CHCNT field specifies how many chunks are present, and the
CHCNT words of data specify how long each chunk is after compression
(i.e., in the current compressed file).

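One way to produce independently decodable chunks is sketched below.
It assumes that each chunk is ended with a zlib full flush, which is
an assumption of this example; the text above does not say how chunk
boundaries are created. The function name compress_chunks and the
4096-byte chunk length are chosen only for the demonstration.

```python
import zlib

def compress_chunks(data: bytes, chlen: int):
    """Deflate data in chlen-sized chunks; return (stream, compressed sizes)."""
    comp = zlib.compressobj(9, zlib.DEFLATED, -zlib.MAX_WBITS)  # raw deflate
    pieces = []
    for start in range(0, len(data), chlen):
        piece = comp.compress(data[start:start + chlen])
        # A full flush byte-aligns the output and resets the compression
        # state, so the next chunk does not refer back to this one.
        piece += comp.flush(zlib.Z_FULL_FLUSH)
        pieces.append(piece)
    tail = comp.flush()                     # final block ending the stream
    return b"".join(pieces) + tail, [len(p) for p in pieces]

data = b"dictionary entry text\n" * 500     # 11000 bytes of sample input
stream, sizes = compress_chunks(data, 4096)
print(len(sizes), zlib.decompress(stream, -zlib.MAX_WBITS) == data)
# 3 True
```

The list of sizes plays the role of the CHCNT words of data: summing
the sizes of the preceding chunks gives the position of any chunk in
the compressed stream.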
To perform random access on the data, the offset and length of the data
are provided to library routines. These routines determine the chunk
in which the desired data begins, and decompress that chunk.
Consecutive chunks are decompressed as necessary.

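The lookup described above might be sketched as follows. This is not
the library's actual routine: read_range is a hypothetical name,
stream stands for the compressed data that follows the gzip header,
and the chunks are assumed to have been written with a full flush
after each one so that each can be inflated on its own.

```python
import zlib
from itertools import accumulate

def read_range(stream: bytes, chlen: int, sizes: list,
               offset: int, size: int) -> bytes:
    """Return size uncompressed bytes starting at offset."""
    if size <= 0:
        return b""
    first = offset // chlen                          # chunk holding offset
    last = min((offset + size - 1) // chlen, len(sizes) - 1)
    starts = [0, *accumulate(sizes)]                 # compressed chunk offsets
    out = bytearray()
    for i in range(first, last + 1):
        raw = stream[starts[i]:starts[i] + sizes[i]]
        out += zlib.decompressobj(-zlib.MAX_WBITS).decompress(raw)
    skip = offset - first * chlen                    # bytes before offset
    return bytes(out[skip:skip + size])

# Build a small test stream with one full flush per 1000-byte chunk.
data = bytes(range(256)) * 40
chlen = 1000
comp = zlib.compressobj(9, zlib.DEFLATED, -zlib.MAX_WBITS)
pieces = [comp.compress(data[i:i + chlen]) + comp.flush(zlib.Z_FULL_FLUSH)
          for i in range(0, len(data), chlen)]
stream = b"".join(pieces) + comp.flush()
sizes = [len(p) for p in pieces]

print(read_range(stream, chlen, sizes, 2500, 1200) == data[2500:3700])
# True
```

Only the chunks that overlap the requested range are inflated (here
chunks 2 and 3); the rest of the stream is never touched.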
TRADEOFFS
Speed   True random file access is not realized, since any access, even
        for a single byte, requires that a 64kB chunk be read and
        decompressed. This is slower than accessing a flat text file,
        but is much, much faster than performing serial access on a
        fully compressed file.

Space   For the textual dictionary databases we are working with, the
        use of 64kB chunks and maximal LZ77 compression realizes a file
        which is only about 4% larger than the same file compressed all
        at once.

OPTIONS
-d or --decompress
    Decompress. This is the default if the executable is called
    dictunzip.

-c or --stdout
    Write output on standard output; keep original files unchanged.
    This is only available when decompressing (because parts of the
    header must be updated after a write when compressing).

-f or --force
    Force compression or decompression even if the output file
    already exists.

-h or --help
    Display help.

-k or --keep
    Do not delete the original file.

-l or --list
    For each compressed file, list the following fields:

        type:           dzip, gzip, or text (includes files in
                        unknown formats)
        crc:            CRC checksum
        date and time:  from header
        chunks:         number of chunks in file
        size:           size of each uncompressed chunk
        compr.:         compressed size
        uncompr.:       uncompressed size
        ratio:          compression ratio (0.0% if unknown)
        name:           name of uncompressed file

    Unlike gzip, the compression method is not detected.

-L or --license
    Display the dictzip license and quit.

-t or --test
    Check the compressed file integrity. This option is not
    implemented. Instead, it will list the header information.

-v or --verbose
    Verbose. Display extra information during compression.

-V or --version
    Version. Display the version number and compilation options, then
    quit.

-s start or --start start
    Specify the offset to start decompression, using decimal numbers.
    The default is at the beginning of the file.

-e size or --size size
    Specify the size of the portion of the file to decompress, using
    decimal numbers. The default is the whole file.

-S start or --Start start
    Specify the offset to start decompression, using base64 numbers.
    The default is at the beginning of the file.

-E size or --Size size
    Specify the size of the portion of the file to decompress, using
    base64 numbers. The default is the whole file.

-p prefilter or --pre prefilter
    Specify a shell command to execute as a filter before compression
    or decompression of a chunk. The pre- and post-compression
    filters can be used to provide additional compression or output
    formatting. The filters may not increase the buffer size
    significantly. The pre- and post-compression filters were
    designed to provide the most general interface possible.

-P postfilter or --post postfilter
    Specify a shell command to execute as a filter after compression
    or decompression.

CREDITS
dictzip was written by Rik Faith (faith@cs.unc.edu) and is distributed
under the terms of the GNU General Public License. If you need to
distribute under other terms, write to the author.

The main libraries used by this program (zlib, regex, libmaa) are
distributed under different terms, so you may be able to use the
libraries for applications which are incompatible with the GPL. Please
see the copyright notices and license information that come with the
libraries for more information, and consult with your attorney to
resolve these issues.

SEE ALSO
dict(1), dictd(8), gzip(1), gunzip(1), zcat(1)

22 Jun 1997                                                   DICTZIP(1)