bgzip(1)                     Bioinformatics tools                     bgzip(1)

NAME
       bgzip - Block compression/decompression utility

SYNOPSIS
       bgzip [-cdfghikrt] [-b virtualOffset] [-I index_name]
             [-l compression_level] [-s size] [-@ threads] [file]

DESCRIPTION
       Bgzip compresses files in a similar manner to, and compatible with,
       gzip(1).  The file is compressed into a series of small (less than 64K)
       'BGZF' blocks.  This allows indexes to be built against the compressed
       file and used to retrieve portions of the data without having to
       decompress the entire file.

       If no files are specified on the command line, bgzip will compress (or
       decompress if the -d option is used) standard input to standard output.
       If a file is specified, it will be compressed (or decompressed with
       -d).  If the -c option is used, the result will be written to standard
       output; otherwise, when compressing, bgzip will write to a new file
       with a .gz suffix and remove the original unless -k is given.  When
       decompressing, the input file must have a .gz suffix, which will be
       removed to make the output name.  As with compression, the input file
       is removed once decompression completes unless -k is given.

OPTIONS
       -b, --offset INT
              Decompress to standard output from virtual file position
              (0-based uncompressed offset).  Implies -c and -d.

       -c, --stdout
              Write to standard output, keep original files unchanged.

       -d, --decompress
              Decompress.

       -f, --force
              Overwrite files without asking, or decompress files that
              don't have a known compression filename extension (e.g., .gz)
              without asking.  Use --force twice to do both without asking.

       -g, --rebgzip
              Try to use an existing index to create a compressed file with
              matching block offsets.  Note that this assumes that the same
              compression library and level are in use as when making the
              original file.  Don't use it unless you know what you're
              doing.

       -h, --help
              Displays a help message.

       -i, --index
              Create a BGZF index while compressing.  Unless the -I option
              is used, this will have the name of the compressed file with
              .gzi appended to it.

       -I, --index-name FILE
              Index file name.

       -k, --keep
              Do not delete input file during operation.

       -l, --compress-level INT
              Compression level to use when compressing.  From 0 to 9, or
              -1 for the default level set by the compression library.  [-1]

       -r, --reindex
              Rebuild the index on an existing compressed file.

       -s, --size INT
              Decompress INT bytes (uncompressed size) to standard output.
              Implies -c.

       -t, --test
              Test the integrity of the compressed file.

       -@, --threads INT
              Number of threads to use [1].

BGZF FORMAT
       The BGZF format written by bgzip is described in the SAM format
       specification available from
       http://samtools.github.io/hts-specs/SAMv1.pdf.

       It makes use of a gzip feature which allows compressed files to be
       concatenated.  The input data is divided into blocks which are no
       larger than 64 kilobytes both before and after compression (including
       compression headers).  Each block is compressed into its own complete
       gzip stream.  The gzip header includes an extra sub-field with
       identifier 'BC' and the length of the compressed block, including all
       headers.
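
As an illustration of the layout described above, the following Python sketch builds BGZF-style blocks by hand and checks that an ordinary gzip reader accepts their concatenation. The field layout (an extra field of 6 bytes whose 'BC' payload holds the total block length minus one) follows the SAM specification; the helper name `bgzf_block` and the default compression level are assumptions made for the example, and this is not a description of how bgzip itself is implemented.

```python
import gzip
import struct
import zlib

MAX_BLOCK_SIZE = 65536  # 64 KiB limit, before and after compression


def bgzf_block(data: bytes, level: int = 6) -> bytes:
    """Wrap `data` in one gzip member carrying a 'BC' extra sub-field.

    Hypothetical helper, for illustration only.
    """
    # Raw DEFLATE stream (negative wbits: no zlib or gzip wrapper).
    compressor = zlib.compressobj(level, zlib.DEFLATED, -15)
    cdata = compressor.compress(data) + compressor.flush()

    # 12-byte fixed header + 6-byte extra field + payload + 8-byte trailer.
    block_size = 12 + 6 + len(cdata) + 8
    if len(data) > MAX_BLOCK_SIZE or block_size > MAX_BLOCK_SIZE:
        raise ValueError("block would exceed the 64 KiB BGZF limit")

    header = struct.pack(
        "<BBBBIBBH",
        0x1F, 0x8B,  # gzip magic bytes
        8,           # CM: deflate
        4,           # FLG: FEXTRA set
        0,           # MTIME
        0,           # XFL
        255,         # OS: unknown
        6,           # XLEN: length of the extra field
    )
    # Sub-field 'B','C' with a 2-byte payload: total block size minus 1.
    extra = struct.pack("<BBHH", ord("B"), ord("C"), 2, block_size - 1)
    # gzip trailer: CRC-32 and length of the uncompressed data.
    trailer = struct.pack("<II", zlib.crc32(data), len(data))
    return header + extra + cdata + trailer


first = bgzf_block(b"hello ")
second = bgzf_block(b"world\n")

# Concatenated blocks form a valid multi-member gzip file.
assert gzip.decompress(first + second) == b"hello world\n"

# The 'BC' payload sits at byte 16 and gives the block length minus one.
(bsize,) = struct.unpack_from("<H", first, 16)
assert bsize + 1 == len(first)
```

The sketch stops at individual blocks. The SAM specification also defines an empty end-of-file marker block, which is not produced here.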

GZI FORMAT
       The index format is a binary file listing pairs of compressed and
       uncompressed offsets in a BGZF file.  Each compressed offset points to
       the start of a BGZF block.  The uncompressed offset is the
       corresponding location in the uncompressed data stream.

       All values are stored as little-endian 64-bit unsigned integers.

       The file contents are:

           uint64_t number_entries

       followed by number_entries pairs of:

           uint64_t compressed_offset
           uint64_t uncompressed_offset
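
The layout above maps directly onto fixed-width little-endian reads. A minimal Python sketch of a decoder, assuming the whole index has already been read into memory; `parse_gzi` is a hypothetical name and the offsets in the sample are made up:

```python
import struct


def parse_gzi(raw: bytes) -> list[tuple[int, int]]:
    """Decode .gzi bytes into (compressed_offset, uncompressed_offset) pairs."""
    if len(raw) < 8:
        raise ValueError("index is too short to hold number_entries")
    # First value: the number of offset pairs that follow.
    (number_entries,) = struct.unpack_from("<Q", raw, 0)
    expected = 8 + 16 * number_entries
    if len(raw) < expected:
        raise ValueError("index is truncated")
    # Each pair is two little-endian unsigned 64-bit integers (16 bytes).
    return [
        struct.unpack_from("<QQ", raw, 8 + 16 * i)
        for i in range(number_entries)
    ]


# Build a two-entry index in memory (made-up offsets) and decode it again.
sample = (
    struct.pack("<Q", 2)
    + struct.pack("<QQ", 18000, 65280)
    + struct.pack("<QQ", 35500, 130560)
)
entries = parse_gzi(sample)
print(entries)  # [(18000, 65280), (35500, 130560)]
```

To inspect a real index, pass in the bytes of the index file, for example `parse_gzi(open("/tmp/words.gz.gzi", "rb").read())`.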
112
113
114
116 # Compress stdin to stdout
117 bgzip < /usr/share/dict/words > /tmp/words.gz
118
119 # Make a .gzi index
120 bgzip -r /tmp/words.gz
121
122 # Extract part of the data using the index
123 bgzip -b 367635 -s 4 /tmp/words.gz
124
125 # Uncompress the whole file, removing the compressed copy
126 bgzip -d /tmp/words.gz
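
The indexed extraction in the third example can be pictured as a search over the index entries: find the last block whose uncompressed offset does not exceed the requested position, seek to its compressed offset, and skip the remaining bytes inside that block. The Python sketch below illustrates that idea only. `locate` is a hypothetical helper, the index entries are invented, and the sketch adds an entry for the first block at (0, 0) in case the index does not list it; it is not a description of bgzip's internal code.

```python
import bisect


def locate(entries, position):
    """Return (compressed_offset, bytes_to_skip) for an uncompressed position.

    `entries` is a list of (compressed_offset, uncompressed_offset) pairs
    sorted by offset, as stored in a .gzi index.
    """
    if position < 0:
        raise ValueError("position must be non-negative")
    # Assumption: make sure the first block, starting at (0, 0), is present.
    if not entries or entries[0] != (0, 0):
        entries = [(0, 0)] + list(entries)
    uncompressed_starts = [u for _, u in entries]
    # Index of the last block that starts at or before `position`.
    i = bisect.bisect_right(uncompressed_starts, position) - 1
    compressed_offset, block_start = entries[i]
    return compressed_offset, position - block_start


# Invented index: blocks start at uncompressed offsets 65280, 130560, ...
index = [(18000, 65280), (35500, 130560), (52800, 195840)]
print(locate(index, 140000))  # (35500, 9440)
print(locate(index, 100))     # (0, 100)
```

A reader would then seek to the returned compressed offset, decompress that block, and discard the first `bytes_to_skip` bytes before returning data.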

AUTHOR
       The BGZF library was originally implemented by Bob Handsaker and
       modified by Heng Li for remote file access and in-memory caching.

SEE ALSO
       gzip(1), tabix(1)

htslib-1.15.1                    7 April 2022                         bgzip(1)