bgzip(1) - f38

bgzip(1)                     Bioinformatics tools                     bgzip(1)

NAME

       bgzip - Block compression/decompression utility

SYNOPSIS

       bgzip [-cdfghikrt] [-b virtualOffset] [-I index_name]
             [-l compression_level] [-s size] [-@ threads] [file]

DESCRIPTION

       Bgzip compresses files in a manner similar to, and compatible with,
       gzip(1). The file is compressed into a series of small (less than 64K)
       'BGZF' blocks. This allows indexes to be built against the compressed
       file and used to retrieve portions of the data without having to
       decompress the entire file.

       If no files are specified on the command line, bgzip will compress (or
       decompress if the -d option is used) standard input to standard output.
       If a file is specified, it will be compressed (or decompressed with
       -d). If the -c option is used, the result is written to standard
       output; otherwise, when compressing, bgzip writes to a new file with a
       .gz suffix and removes the original. When decompressing, the input
       file must have a .gz suffix, which is removed to form the output name.
       As with compression, the input file is removed once decompression
       completes.

OPTIONS

       -b, --offset INT
                 Decompress to standard output from virtual file position
                 (0-based uncompressed offset). Implies -c and -d.

       -c, --stdout
                 Write to standard output, keep original files unchanged.

       -d, --decompress
                 Decompress.

       -f, --force
                 Overwrite files without asking, or decompress files that
                 don't have a known compression filename extension (e.g., .gz)
                 without asking. Use --force twice to do both without asking.

       -g, --rebgzip
                 Try to use an existing index to create a compressed file with
                 matching block offsets. Note that this assumes that the same
                 compression library and level are in use as when making the
                 original file. Don't use it unless you know what you're
                 doing.

       -h, --help
                 Displays a help message.

       -i, --index
                 Create a BGZF index while compressing. Unless the -I option
                 is used, this will have the name of the compressed file with
                 .gzi appended to it.

       -I, --index-name FILE
                 Index file name.

       -k, --keep
                 Do not delete input file during operation.

       -l, --compress-level INT
                 Compression level to use when compressing. From 0 to 9, or
                 -1 for the default level set by the compression library. [-1]

       -r, --reindex
                 Rebuild the index on an existing compressed file.

       -s, --size INT
                 Decompress INT bytes (uncompressed size) to standard output.
                 Implies -c.

       -t, --test
                 Test the integrity of the compressed file.

       -@, --threads INT
                 Number of threads to use [1].

BGZF FORMAT

       The BGZF format written by bgzip is described in the SAM format
       specification, available from
       http://samtools.github.io/hts-specs/SAMv1.pdf.

       It makes use of a gzip feature which allows compressed files to be
       concatenated. The input data is divided into blocks which are no larger
       than 64 kilobytes both before and after compression (including
       compression headers). Each block is compressed into its own gzip file.
       The gzip header of each block includes an extra sub-field with
       identifier 'BC' that records the length of the compressed block,
       including all headers (stored as that length minus one).
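
       The block layout can be illustrated with a short Python sketch. It is
       not the htslib implementation: make_bgzf_block, iter_blocks and
       MAX_INPUT are hypothetical names, and the byte layout used here (a
       12-byte fixed gzip header, a 6-byte 'BC' sub-field holding the total
       block length minus one, raw deflate data, then CRC32 and input size)
       is taken from the SAM specification rather than from this page.

```python
import gzip
import struct
import zlib

MAX_INPUT = 0xFF00  # assumed per-block input cap, leaving room for headers


def make_bgzf_block(data: bytes, level: int = 6) -> bytes:
    """Wrap one chunk of data in a gzip member carrying a 'BC' extra sub-field."""
    if len(data) > MAX_INPUT:
        raise ValueError("chunk too large for one block")
    deflater = zlib.compressobj(level, zlib.DEFLATED, -15)  # -15 = raw deflate
    cdata = deflater.compress(data) + deflater.flush()
    total = 12 + 6 + len(cdata) + 8  # fixed header + extra field + data + trailer
    if total > 0x10000:
        raise ValueError("compressed block exceeds 64 KiB")
    # ID1, ID2, CM=deflate, FLG=FEXTRA, MTIME, XFL, OS=unknown, XLEN
    header = struct.pack("<BBBBIBBH", 31, 139, 8, 4, 0, 0, 255, 6)
    # SI1='B', SI2='C', SLEN=2, BSIZE = total block length minus one
    extra = struct.pack("<BBHH", ord("B"), ord("C"), 2, total - 1)
    trailer = struct.pack("<II", zlib.crc32(data), len(data))
    return header + extra + cdata + trailer


def iter_blocks(stream: bytes):
    """Yield (compressed_offset, block_length) for each block in a BGZF stream."""
    pos = 0
    while pos < len(stream):
        id1, id2, cm, flg, _mtime, _xfl, _os, xlen = struct.unpack_from(
            "<BBBBIBBH", stream, pos)
        if (id1, id2, cm) != (31, 139, 8) or not flg & 4:
            raise ValueError(f"no gzip header with extra field at offset {pos}")
        extra = stream[pos + 12:pos + 12 + xlen]
        off, bsize = 0, None
        while off + 4 <= len(extra):  # scan the sub-fields for 'BC'
            si1, si2, slen = struct.unpack_from("<BBH", extra, off)
            if (si1, si2, slen) == (ord("B"), ord("C"), 2):
                (bsize,) = struct.unpack_from("<H", extra, off + 4)
                break
            off += 4 + slen
        if bsize is None:
            raise ValueError(f"no 'BC' sub-field at offset {pos}")
        yield pos, bsize + 1
        pos += bsize + 1


if __name__ == "__main__":
    payload = b"ACGT" * 50000
    chunks = [payload[i:i + MAX_INPUT] for i in range(0, len(payload), MAX_INPUT)]
    stream = b"".join(make_bgzf_block(c) for c in chunks)
    # Each block is a complete gzip member, so a plain gzip reader accepts it.
    assert gzip.decompress(stream) == payload
    for offset, length in iter_blocks(stream):
        print(offset, length)
```

       Because the stored length lets a reader jump from one block header to
       the next without inflating anything, iter_blocks can list block
       offsets by reading headers only.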

GZI FORMAT

       The index format is a binary file listing pairs of compressed and
       uncompressed offsets in a BGZF file. Each compressed offset points to
       the start of a BGZF block. The uncompressed offset is the
       corresponding location in the uncompressed data stream.

       All values are stored as little-endian 64-bit unsigned integers.

       The file contents are:

           uint64_t number_entries

       followed by number_entries pairs of:

           uint64_t compressed_offset
           uint64_t uncompressed_offset
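
       As an illustration of this layout, the following Python sketch decodes
       and re-encodes an index held in memory. The names parse_gzi and
       build_gzi are hypothetical helpers, not part of bgzip or htslib, and
       the sketch assumes the whole index fits in a single bytes object.

```python
import struct


def parse_gzi(raw: bytes):
    """Return the (compressed_offset, uncompressed_offset) pairs in a .gzi image."""
    if len(raw) < 8:
        raise ValueError("index too short to hold the entry count")
    (count,) = struct.unpack_from("<Q", raw, 0)  # little-endian uint64
    if len(raw) < 8 + 16 * count:
        raise ValueError("index is truncated")
    # Each entry is two consecutive little-endian uint64 values.
    return [struct.unpack_from("<QQ", raw, 8 + 16 * i) for i in range(count)]


def build_gzi(entries) -> bytes:
    """Serialise pairs back into the same layout (the inverse of parse_gzi)."""
    out = struct.pack("<Q", len(entries))
    for compressed_offset, uncompressed_offset in entries:
        out += struct.pack("<QQ", compressed_offset, uncompressed_offset)
    return out


if __name__ == "__main__":
    sample = [(18000, 65280), (35000, 130560)]  # made-up offsets for the demo
    image = build_gzi(sample)
    for compressed_offset, uncompressed_offset in parse_gzi(image):
        print(compressed_offset, uncompressed_offset)
```

       To inspect a real index, the bytes of a .gzi file read in binary mode
       can be passed to parse_gzi.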

EXAMPLES

           # Compress stdin to stdout
           bgzip < /usr/share/dict/words > /tmp/words.gz

           # Make a .gzi index
           bgzip -r /tmp/words.gz

           # Extract part of the data using the index
           bgzip -b 367635 -s 4 /tmp/words.gz

           # Uncompress the whole file, removing the compressed copy
           bgzip -d /tmp/words.gz
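
       The indexed extraction above (bgzip -b OFFSET -s SIZE) can be
       approximated in Python as follows. This is a sketch of the idea, not
       of how bgzip implements it: read_range is a hypothetical helper, the
       stream and index are assumed to be in memory, and the block at
       compressed offset 0 is assumed to be possibly absent from the index,
       so it is always added to the lookup table.

```python
import bisect
import gzip
import zlib


def read_range(stream: bytes, entries, uoffset: int, size: int) -> bytes:
    """Return `size` uncompressed bytes starting at uncompressed offset `uoffset`."""
    if uoffset < 0 or size < 0:
        raise ValueError("offset and size must be non-negative")
    # Assumption: the first block (0, 0) may not be listed, so always include it.
    table = [(0, 0)] + [tuple(e) for e in entries if tuple(e) != (0, 0)]
    starts = [u for _, u in table]
    # Last block whose uncompressed start is at or before the requested offset.
    i = bisect.bisect_right(starts, uoffset) - 1
    pos, block_start = table[i]
    skip = uoffset - block_start
    out = bytearray()
    while len(out) < skip + size and pos < len(stream):
        member = zlib.decompressobj(wbits=31)  # 31 = expect a gzip header
        out += member.decompress(stream[pos:])
        if not member.eof:
            break  # truncated final block
        pos = len(stream) - len(member.unused_data)  # start of the next block
    return bytes(out[skip:skip + size])


if __name__ == "__main__":
    # Plain gzip members stand in for BGZF blocks in this demo.
    parts = [gzip.compress(b"a" * 1000), gzip.compress(b"b" * 1000)]
    index = [(len(parts[0]), 1000)]
    print(read_range(b"".join(parts), index, 998, 4))
```

       Only the blocks from the one containing the offset onwards are
       inflated, which is what makes the index useful on large files.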

AUTHOR

       The BGZF library was originally implemented by Bob Handsaker and
       modified by Heng Li for remote file access and in-memory caching.