lbzip2(1)

LBZIP2(1)                        User commands                       LBZIP2(1)

NAME

       lbzip2 - parallel bzip2 utility

SYNOPSIS

       lbzip2|bzip2 [-n WTHRS] [-k|-c|-t] [-d] [-1 .. -9] [-f] [-s] [-u] [-v]
       [-S] [ FILE ... ]

       lbunzip2|bunzip2 [-n WTHRS] [-k|-c|-t] [-z] [-f] [-s] [-u] [-v] [-S]
       [ FILE ... ]

       lbzcat|bzcat [-n WTHRS] [-z] [-f] [-s] [-u] [-v] [-S] [ FILE ... ]

       lbzip2|bzip2|lbunzip2|bunzip2|lbzcat|bzcat -h

DESCRIPTION

       Compress or decompress FILE operands or standard input to regular files
       or standard output using the Burrows-Wheeler block-sorting text
       compression algorithm. The lbzip2 utility employs multiple threads and
       an input-bound splitter even when decompressing .bz2 files created by
       standard bzip2.

       Compression is generally considerably better than that achieved by more
       conventional LZ77/LZ78-based compressors, and competitive with all but
       the best of the PPM family of statistical compressors.

       Compression is always performed, even if the compressed file is
       slightly larger than the original. The worst-case expansion is for
       files of zero length, which expand to fourteen bytes. Random data
       (including the output of most file compressors) is coded with an
       asymptotic expansion of around 0.5%.

       The command-line options are deliberately very similar to those of
       bzip2 and gzip, but they are not identical.
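
       The expansion figures above can be checked with any encoder that
       writes the bzip2 format. The sketch below uses Python's bz2 module
       (libbz2) as a stand-in; it does not invoke lbzip2, and the one
       megabyte sample size is an arbitrary choice for illustration.

```python
import bz2
import os

# Assumption: Python's bz2 module (libbz2) stands in for lbzip2 here.
# Both write the .bz2 stream format, but lbzip2 itself is not run.
empty = bz2.compress(b"")
print(len(empty))  # size in bytes of a compressed zero-length input

# Incompressible input: compare sizes before and after compression.
plain = os.urandom(1_000_000)
packed = bz2.compress(plain, compresslevel=9)
expansion = len(packed) / len(plain) - 1.0
print(f"expansion on random data: {expansion:.2%}")
```

       The measured expansion on a finite sample only approximates the
       asymptotic figure quoted above.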

INVOCATION

       The default mode of operation is compression. If the utility is invoked
       as lbunzip2 or bunzip2, the mode is switched to decompression. Calling
       the utility as lbzcat or bzcat selects decompression, with the
       decompressed byte-stream written to standard output.
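
       The selection by invocation name can be sketched as follows. The
       function select_mode and its return values are hypothetical names
       used for illustration; this is not lbzip2's own code.

```python
import os

def select_mode(argv0):
    """Return (mode, to_stdout) for a given invocation name.

    Hypothetical helper that mirrors the rules in INVOCATION.
    """
    name = os.path.basename(argv0)
    if name in ("lbunzip2", "bunzip2"):
        return ("decompress", False)
    if name in ("lbzcat", "bzcat"):
        # Decompress and write the byte-stream to standard output.
        return ("decompress", True)
    # Any other name selects the default mode: compression.
    return ("compress", False)

print(select_mode("/usr/bin/lbzcat"))
```

       The -d and -z options described under OPTIONS override the mode
       chosen this way.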

OPTIONS

       -n WTHRS
              Set the number of (de)compressor threads to WTHRS. If this
              option is not specified, lbzip2 tries to query the system for
              the number of online processors (if both the compilation
              environment and the execution environment support that), or
              exits with an error (if it's unable to determine the number of
              processors online).

       -k, --keep
              Don't remove FILE operands after successful (de)compression.
              Open regular input files with more than one link.

       -c, --stdout
              Write output to standard output, even when FILE operands are
              present. Implies -k and excludes -t.

       -t, --test
              Test decompression; discard output instead of writing it to
              files or standard output. Implies -k and excludes -c. Roughly
              equivalent to passing -c and redirecting standard output to the
              bit bucket.

       -d, --decompress
              Force decompression over the mode of operation selected by the
              invocation name.

       -z, --compress
              Force compression over the mode of operation selected by the
              invocation name.

       -1 .. -9
              Set the compression block size to 100K .. 900K, in 100K
              increments. Ignored during decompression. See also the BLOCK
              SIZE section below.

       --fast Alias for -1.

       --best Alias for -9. This is the default.

       -f, --force
              Open non-regular input files. Open input files with more than
              one link, breaking links when -k isn't specified in addition.
              Try to remove each output file before opening it. By default
              lbzip2 will not overwrite existing files; if you want this to
              happen, you should specify -f. If -c and -d are also given,
              don't reject files not in bzip2 format, just copy them without
              change; without -f lbzip2 would stop after reaching a file that
              is not in bzip2 format.

       -s, --small
              Reduce memory usage at the cost of performance.

       -u, --sequential
              Split input blocks sequentially. This may improve the
              compression ratio and decrease CPU usage, but will degrade
              scalability.

       -v, --verbose
              Be more verbose. Print more detailed information about
              (de)compression progress to standard error: before processing
              each file, print a message stating the names of input and
              output files; during (de)compression, print a rough percentage
              of completeness and estimated time of arrival (only if standard
              error is connected to a terminal); after processing each file,
              print a message showing compression ratio, space savings, total
              compression time (wall time) and average (de)compression speed
              (bytes of plain data processed per second).

       -S     Print condition variable statistics to standard error for each
              completed (de)compression operation. Useful in profiling.

       -q, --quiet, --repetitive-fast, --repetitive-best, --exponential
              Accepted for compatibility with bzip2, otherwise ignored.

       -h, --help
              Print help on command-line usage on standard output and exit
              successfully.

       -L, --license, -V, --version
              Print license and version information on standard output and
              exit successfully.
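
       The default for -n described above amounts to asking the system for
       its online processor count and giving up if that fails. A minimal
       sketch, assuming a POSIX-style sysconf query with a portable
       fallback; default_worker_threads is a hypothetical name, and
       lbzip2's actual query mechanism is not stated in this page.

```python
import os

def default_worker_threads():
    """Return the number of online processors, or exit with an error."""
    count = None
    if hasattr(os, "sysconf"):
        try:
            # Assumption: the platform knows this sysconf name.
            count = os.sysconf("SC_NPROCESSORS_ONLN")
        except (ValueError, OSError):
            count = None
    if count is None or count < 1:
        count = os.cpu_count()  # portable fallback, may return None
    if count is None or count < 1:
        raise SystemExit("cannot determine processor count; pass -n WTHRS")
    return count

print(default_worker_threads())
```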

ENVIRONMENT

       LBZIP2, BZIP2, BZIP
              Before parsing the command line, lbzip2 inserts the contents of
              these variables, in the order specified, between the invocation
              name and the rest of the command line. Tokens are separated by
              spaces and tabs, which cannot be escaped.
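
       The resulting argument list can be modelled as below. This is a
       sketch of the rule as stated, not lbzip2's parser; effective_argv
       is a hypothetical name.

```python
import re

def effective_argv(argv, environ):
    """Insert LBZIP2, BZIP2 and BZIP contents after the invocation name."""
    extra = []
    for var in ("LBZIP2", "BZIP2", "BZIP"):
        value = environ.get(var, "")
        # Split on spaces and tabs only; there is no quoting or escaping.
        extra.extend(tok for tok in re.split(r"[ \t]+", value) if tok)
    return argv[:1] + extra + argv[1:]

print(effective_argv(["lbzip2", "data.txt"], {"LBZIP2": "-n 4\t-k"}))
```

       Because separators cannot be escaped, a value such as a file name
       containing a space cannot be passed through these variables as a
       single token.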

OPERANDS

       FILE   Specify files to compress or decompress.

              FILEs with .bz2, .tbz, .tbz2 and .tz2 name suffixes will be
              skipped when compressing. When decompressing, .bz2 suffixes
              will be removed in output filenames; .tbz, .tbz2 and .tz2
              suffixes will be replaced by .tar; other filenames will be
              suffixed with .out. If an INT or TERM signal is delivered to
              lbzip2, then it removes the regular output file currently open
              before exiting.

              If no FILE is given, lbzip2 works as a filter, processing
              standard input to standard output. In this case, lbzip2 will
              decline to write compressed output to a terminal (or read
              compressed input from a terminal), as this would be entirely
              incomprehensible and therefore pointless.
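
       The suffix rules above can be written out as a small sketch. The
       function names are hypothetical and the code only models the naming
       rules; it does not touch any files.

```python
SKIP_SUFFIXES = (".bz2", ".tbz", ".tbz2", ".tz2")

def skipped_when_compressing(name):
    """True if a FILE operand would be skipped in compression mode."""
    return name.endswith(SKIP_SUFFIXES)

def decompressed_name(name):
    """Derive the output filename used when decompressing `name`."""
    if name.endswith(".bz2"):
        return name[:-len(".bz2")]            # strip the suffix
    for suffix in (".tbz2", ".tbz", ".tz2"):
        if name.endswith(suffix):
            return name[:-len(suffix)] + ".tar"  # tarball shorthands
    return name + ".out"                       # unrecognised suffix

print(decompressed_name("backup.tbz2"))
```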

EXIT STATUS

       0      if lbzip2 finishes successfully. This presumes that whenever it
              tries, lbzip2 never fails to write to standard error.

       1      if lbzip2 encounters a fatal error.

       4      if lbzip2 issues warnings without encountering a fatal error.
              This presumes that whenever it tries, lbzip2 never fails to
              write to standard error.

       SIGPIPE, SIGXFSZ
              if lbzip2 intends to exit with status 1 due to any fatal error,
              but any such signal with inherited SIG_DFL action was generated
              for lbzip2 previously, then lbzip2 terminates by way of one of
              said signals, after cleaning up any interrupted output file.

       SIGABRT
              if a runtime assertion fails (i.e. lbzip2 detects a bug in
              itself). Hopefully whoever compiled your binary wasn't bold
              enough to #define NDEBUG.

       SIGINT, SIGTERM
              lbzip2 catches these signals so that it can remove an
              interrupted output file. In such cases, lbzip2 exits by
              re-raising (one of) the received signal(s).
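
       A caller that runs lbzip2 as a child process can tell these outcomes
       apart. The sketch below assumes the convention of Python's
       subprocess module, where termination by signal N is reported as a
       return code of -N; describe_status is a hypothetical helper.

```python
import signal

def describe_status(returncode):
    """Interpret a subprocess-style return code from lbzip2."""
    if returncode < 0:
        # subprocess reports death by signal N as -N.
        return "terminated by " + signal.Signals(-returncode).name
    meanings = {0: "success", 1: "fatal error", 4: "warnings issued"}
    return meanings.get(returncode, f"unexpected status {returncode}")

print(describe_status(4))
```

       A typical caller would pass in the returncode attribute of the
       object returned by subprocess.run.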

BLOCK SIZE

       lbzip2 compresses large files in blocks. It can operate at various
       block sizes, ranging from 100k to 900k in 100k steps, and it allocates
       only as much memory as it needs to. The block size affects both the
       compression ratio achieved, and the amount of memory needed both for
       compression and decompression. Compression and decompression speed is
       virtually unaffected by block size, provided that the file being
       processed is large enough to be split among all worker threads.

       The flags -1 through -9 specify the block size to be 100,000 bytes
       through 900,000 bytes (the default) respectively. At decompression
       time, the block size used for compression is read from the compressed
       file -- the flags -1 to -9 are irrelevant to and so ignored during
       decompression.

       Larger block sizes give rapidly diminishing marginal returns; most of
       the compression comes from the first two or three hundred k of block
       size, a fact worth bearing in mind when using lbzip2 on small machines.
       It is also important to appreciate that the decompression memory
       requirement is set at compression time by the choice of block size. In
       general you should try to use the largest block size memory
       constraints allow.

       Another significant point applies to small files. By design, only one
       of lbzip2's worker threads can work on a single block. This means that
       if the number of blocks in the compressed file is less than the number
       of processors online, then some of the worker threads will remain idle
       for the entire time. Compressing small files with smaller block sizes
       can therefore significantly increase both compression and decompression
       speed. The speed difference is more noticeable as the number of CPU
       cores grows.
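
       The small-file effect can be estimated with simple arithmetic. The
       sketch below assumes that every block holds exactly the nominal
       block size of input, which is a simplification; real block
       boundaries can differ. Both function names are hypothetical.

```python
import math

def block_size_bytes(level):
    """Map the -1 .. -9 flags to block sizes of 100,000 .. 900,000 bytes."""
    if not 1 <= level <= 9:
        raise ValueError("level must be between 1 and 9")
    return level * 100_000

def busy_workers(file_size, level, threads):
    """Rough estimate of how many workers receive at least one block."""
    # Simplification: each block is assumed to hold exactly the nominal
    # block size of input data.
    blocks = math.ceil(file_size / block_size_bytes(level))
    return min(blocks, threads)

# A 2,000,000-byte file on 8 worker threads:
print(busy_workers(2_000_000, 9, 8))  # few large blocks
print(busy_workers(2_000_000, 1, 8))  # many small blocks
```

       Under this estimate the file yields only three blocks at -9, so five
       of eight workers stay idle, while at -1 it yields twenty blocks and
       every worker has something to do.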

ERROR HANDLING

       Dealing with error conditions is the least satisfactory aspect of
       lbzip2. The policy is to try to leave the filesystem in a consistent
       state, then quit, even if it means not processing some of the files
       mentioned in the command line.

       `A consistent state' means that a file exists either in its compressed
       or uncompressed form, but not both. This boils down to the rule `delete
       the output file if an error condition occurs, leaving the input
       intact'. Input files are only deleted when we can be pretty sure the
       output file has been written and closed successfully.

RESOURCE ALLOCATION

       lbzip2 needs various kinds of system resources to operate. Those
       include memory, threads, mutexes and condition variables. The policy is
       to simply give up if a resource allocation failure occurs.

       Resource consumption grows linearly with the number of worker threads.
       If lbzip2 fails because of a lack of some resource, decreasing the
       number of worker threads may help. It would be possible for lbzip2 to
       try to reduce the number of worker threads (and hence the resource
       consumption), or to move on to subsequent files in the hope that some
       might need fewer resources, but the complications of doing this seem
       more trouble than they are worth.

DAMAGED FILES

       lbzip2 attempts to compress data by performing several non-trivial
       transformations on it. Every compression of a file implies an
       assumption that the compressed file can be decompressed to reproduce
       the original. Great efforts in design, coding and testing have been
       made to ensure that this program works correctly. However, the
       complexity of the algorithms, and, in particular, the presence of
       various special cases in the code which occur with very low but
       non-zero probability make it very difficult to rule out the possibility
       of bugs remaining in the program. That is not to say this program is
       inherently unreliable. Indeed, I very much hope the opposite is true --
       lbzip2 has been carefully constructed and extensively tested.

       As a self-check for your protection, lbzip2 uses 32-bit CRCs to make
       sure that the decompressed version of a file is identical to the
       original. This guards against corruption of the compressed data, and
       against undiscovered bugs in lbzip2 (hopefully unlikely). The chance of
       data corruption going undetected is microscopic, about one chance in
       four billion for each file processed. Be aware, though, that the check
       occurs upon decompression, so it can only tell you that something is
       wrong.

       CRCs can only detect corrupted files; they can't help you recover the
       original, uncompressed data. However, because of the block nature of
       the compression algorithm, it may be possible to recover some parts of
       the damaged file, even if some blocks are destroyed.
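
       The CRC check can be observed on any bzip2-format stream. The sketch
       below uses Python's bz2 module (libbz2) rather than lbzip2, and it
       assumes the usual stream layout in which the first block's stored
       CRC follows the 4-byte stream header and the 6-byte block magic.

```python
import bz2

original = b"lbzip2 man page example\n" * 100
stream = bytearray(bz2.compress(original))

# Layout assumption for the first block: 4-byte stream header ("BZh9"),
# 6-byte block magic, then the 4-byte block CRC at offsets 10..13.
stream[10] ^= 0x01  # flip one bit of the stored block CRC

try:
    bz2.decompress(bytes(stream))
    detected = False
except OSError:
    # libbz2 reports the CRC mismatch as a data error.
    detected = True

print(detected)
print(1 / 2**32)  # chance that a random 32-bit CRC matches by accident
```

       The last line is where the figure of about one in four billion comes
       from: a 32-bit check value has 2**32 possible states.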

BUGS

       Separate input files don't share worker threads; at most one input file
       is worked on at any moment.

AUTHORS

       lbzip2 was originally written by Laszlo Ersek <lacos@caesar.elte.hu>,
       http://lacos.hu/. Versions 2.0 and later were written by Mikolaj
       Izdebski.

COPYRIGHT

       Copyright (C) 2011, 2012, 2013 Mikolaj Izdebski
       Copyright (C) 2008, 2009, 2010 Laszlo Ersek
       Copyright (C) 1996 Julian Seward

       This manual page is part of lbzip2, version 2.5. lbzip2 is free
       software: you can redistribute it and/or modify it under the terms of
       the GNU General Public License as published by the Free Software
       Foundation, either version 3 of the License, or (at your option) any
       later version.

       lbzip2 is distributed in the hope that it will be useful, but WITHOUT
       ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
       FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
       for more details.

       You should have received a copy of the GNU General Public License along
       with lbzip2. If not, see <http://www.gnu.org/licenses/>.

THANKS

       Adam Maulis at ELTE IIG; Julian Seward; Paul Sladen; Michael Thomas
       from Caltech HEP; Bryan Stillwell; Zsolt Bartos-Elekes; Imre Csatlos;
       Gabor Kovesdan; Paul Wise; Paolo Bonzini; Department of Electrical and
       Information Engineering at the University of Oulu; Yuta Mori.
347