samtools-view(1) Bioinformatics tools samtools-view(1)

NAME
samtools view - views and converts SAM/BAM/CRAM files

SYNOPSIS
samtools view [options] in.sam|in.bam|in.cram [region...]

DESCRIPTION
With no options or regions specified, prints all alignments in the
specified input alignment file (in SAM, BAM, or CRAM format) to
standard output in SAM format (with no header).

You may specify one or more space-separated region specifications after
the input filename to restrict output to only those alignments which
overlap the specified region(s). Use of region specifications requires
a coordinate-sorted and indexed input file (in BAM or CRAM format).

The -b, -C, -1, -u, -h, -H, and -c options change the output format
from the default of headerless SAM, and the -o and -U options set the
output file name(s).

The -t and -T options provide additional reference data. One of these
two options is required when SAM input does not contain @SQ headers,
and the -T option is required whenever writing CRAM output.

The -L, -M, -N, -r, -R, -d, -D, -s, -q, -l, -m, -e, -f, -F, and -G
options filter the alignments that will be included in the output to
only those alignments that match certain criteria.

The -x, -B, --add-flags, and --remove-flags options modify the data
that is contained in each alignment.

The -X option lets the user specify the location of the index file(s)
explicitly when the data folder does not contain an index file. See the
EXAMPLES section for sample usage.

Finally, the -@ option can be used to allocate additional threads to be
used for compression, and the -? option requests a long help message.

REGIONS:
Regions can be specified as RNAME[:STARTPOS[-ENDPOS]]; all position
coordinates are 1-based.

Important note: when multiple regions are given, some alignments may be
output multiple times if they overlap more than one of the specified
regions.

Examples of region specifications:

chr1 Output all alignments mapped to the reference sequence named
`chr1' (i.e. @SQ SN:chr1).

chr2:1000000
The region on chr2 beginning at base position 1,000,000 and ending at
the end of the chromosome.

chr3:1000-2000
The 1001bp region on chr3 beginning at base position 1,000 and ending
at base position 2,000 (including both end positions).

'*' Output the unmapped reads at the end of the file. (This does not
include any unmapped reads placed on a reference sequence alongside
their mapped mates.)

. Output all alignments. (Mostly unnecessary as not specifying a region
at all has the same effect.)
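
The region syntax above can be taken apart with plain shell parameter
expansion. The following is a minimal sketch only: parse_region and
region_length are hypothetical helpers, not part of samtools, and the
sketch assumes that RNAME contains no colon and that positions are
plain digits.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of samtools.

# Split RNAME[:STARTPOS[-ENDPOS]] into "RNAME START END".
# Parts that are absent are printed as "-".
# Assumption: RNAME contains no ':' and positions are plain digits.
parse_region() {
    region=$1
    case $region in
        *:*)
            rname=${region%%:*}
            range=${region#*:}
            case $range in
                *-*) start=${range%%-*}; end=${range#*-} ;;
                *)   start=$range; end= ;;
            esac
            ;;
        *)
            rname=$region; start=; end=
            ;;
    esac
    printf '%s %s %s\n' "$rname" "${start:--}" "${end:--}"
}

# Number of bases in the closed, 1-based interval [START, END].
region_length() {
    echo $(( $2 - $1 + 1 ))
}
```

With these definitions, parse_region chr3:1000-2000 prints
"chr3 1000 2000" and region_length 1000 2000 prints 1001, which matches
the 1001bp example given above.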

OPTIONS
-b, --bam Output in the BAM format.

-C, --cram
Output in the CRAM format (requires -T).

-1, --fast
Enable fast BAM compression (implies -b).

-u, --uncompressed
Output uncompressed BAM. This option saves time spent on
compression/decompression and is thus preferred when the output is
piped to another samtools command.

-h, --with-header
Include the header in the output.

-H, --header-only
Output the header only.

--no-header
When producing SAM format, output alignment records but not headers.
This is the default; the option can be used to reset the effect of
-h/-H.

-c, --count
Instead of printing the alignments, only count them and print the
total number. All filter options, such as -f, -F, and -q, are taken
into account.

-?, --help
Output long help and exit immediately.

-o FILE, --output FILE
Output to FILE [stdout].

-U FILE, --unoutput FILE, --output-unselected FILE
Write alignments that are not selected by the various filter options
to FILE. When this option is used, all alignments (or all alignments
intersecting the regions specified) are written to either the output
file or this file, but never both.

-t FILE, --fai-reference FILE
A tab-delimited FILE. Each line must contain the reference name in the
first column and the length of the reference in the second column,
with one line for each distinct reference. Any additional fields
beyond the second column are ignored. This file also defines the order
of the reference sequences in sorting. If you run `samtools faidx
<ref.fa>', the resulting index file <ref.fa>.fai can be used as this
FILE.
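
The layout that -t expects can be checked before use. The sketch below
is an illustration under stated assumptions: check_ref_list is a
hypothetical helper, not part of samtools, and it tests only the rules
stated above (a name in column 1, a positive integer length in column
2, one line per distinct reference, extra columns ignored).

```shell
#!/bin/sh
# check_ref_list is a hypothetical helper; not part of samtools.
# usage: check_ref_list FILE
# Exit status 0 if every line has a non-empty name in column 1 and a
# positive integer length in column 2, and no name occurs twice.
# Columns after the second are not examined.
check_ref_list() {
    awk -F '\t' '
        NF < 2 || $1 == "" || $2 !~ /^[0-9]+$/ || $2 + 0 <= 0 || seen[$1]++ {
            bad = 1
        }
        END { exit bad }
    ' "$1"
}
```

A file that passes this check has the shape -t describes; whether
samtools accepts it still depends on the names matching those used in
the SAM input.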

-T FILE, --reference FILE
A FASTA format reference FILE, optionally compressed by bgzip and
ideally indexed by samtools faidx. If an index is not present one will
be generated for you, if the reference file is local.

If the reference file is not local, but is accessed instead via an
https://, s3:// or other URL, the index file will need to be supplied
by the server alongside the reference. It is possible to have the
reference and index files in different locations by supplying both to
this option separated by the string "##idx##", for example:

-T ftp://x.com/ref.fa##idx##ftp://y.com/index.fa.fai

However, note that only the location of the reference will be stored
in the output file header. If this method is used to make CRAM files,
the cram reader may not be able to find the index, and may not be able
to decode the file unless it can get the references it needs using a
different method.

-L FILE, --target-file FILE, --targets-file FILE
Only output alignments overlapping the input BED FILE [null].

-M, --use-index
Use the multi-region iterator on the union of a BED file and
command-line region arguments. This avoids re-reading the same regions
of files so can sometimes be much faster. Note that this also removes
duplicate output: without it, an alignment that overlaps multiple
regions specified on the command line will be reported multiple times.
The use of a BED file is optional; its path has to be preceded by the
-L option.

--region-file FILE, --regions-file FILE
Use an index and multi-region iterator to only output alignments
overlapping the input BED FILE. Equivalent to -M -L FILE or
--use-index --target-file FILE.

-N FILE, --qname-file FILE
Output only alignments with read names listed in FILE.

-r STR, --read-group STR
Output alignments in read group STR [null]. Note that records with no
RG tag will also be output when using this option. This behaviour may
change in a future release.

-R FILE, --read-group-file FILE
Output alignments in read groups listed in FILE [null]. Note that
records with no RG tag will also be output when using this option.
This behaviour may change in a future release.

-d STR1[:STR2], --tag STR1[:STR2]
Only output alignments with tag STR1 and associated value STR2, which
can be a string or an integer [null]. The value can be omitted, in
which case only the tag is considered.

-D STR:FILE, --tag-file STR:FILE
Only output alignments with tag STR and associated values listed in
FILE [null].
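
The selection made by -D can be pictured on SAM text with awk. This is
an illustration only, not a description of samtools internals. It
assumes tab-separated SAM records whose optional fields start at column
12 in TAG:TYPE:VALUE form, and it compares values as plain strings.
filter_by_tag_values is a hypothetical helper name.

```shell
#!/bin/sh
# filter_by_tag_values is a hypothetical helper; not part of samtools.
# usage: filter_by_tag_values TAG VALUES_FILE < records.sam
# Prints alignment lines whose TAG value appears (one value per line)
# in VALUES_FILE. Values are compared as strings.
filter_by_tag_values() {
    awk -F '\t' -v tag="$1" -v list="$2" '
        BEGIN { while ((getline v < list) > 0) wanted[v] = 1 }
        /^@/ { next }                       # header lines are not alignments
        {
            for (i = 12; i <= NF; i++) {    # optional fields: TAG:TYPE:VALUE
                if (substr($i, 1, 3) == (tag ":") && (substr($i, 6) in wanted)) {
                    print
                    next
                }
            }
        }
    '
}
```

Records that lack the tag altogether are dropped by this sketch, which
is the same outcome the -d RG:grp2 example in this page describes for
records without an RG tag.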

-q INT, --min-MQ INT
Skip alignments with MAPQ smaller than INT [0].

-l STR, --library STR
Only output alignments in library STR [null].

-m INT, --min-qlen INT
Only output alignments with number of CIGAR bases consuming query
sequence ≥ INT [0].
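
The quantity that -m compares against INT can be computed from a CIGAR
string. The sketch below assumes the SAM specification's definition of
query-consuming operations (M, I, S, = and X); cigar_query_length is a
hypothetical helper, not part of samtools.

```shell
#!/bin/sh
# cigar_query_length is a hypothetical helper; not part of samtools.
# usage: cigar_query_length CIGAR
# Sums the lengths of the operations that consume query bases
# (M, I, S, = and X). D, N, H and P do not consume the query.
cigar_query_length() {
    printf '%s\n' "$1" | awk '{
        total = 0
        cigar = $0
        while (match(cigar, /^[0-9]+[MIDNSHP=X]/)) {
            len = substr(cigar, 1, RLENGTH - 1) + 0
            op = substr(cigar, RLENGTH, 1)
            if (op ~ /[MIS=X]/) total += len
            cigar = substr(cigar, RLENGTH + 1)
        }
        print total
    }'
}
```

For instance, cigar_query_length 5S90M2I3M prints 100, while the
deletion in 10M5D10M is not counted and the result is 20.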

-e STR, --expr STR
Only include alignments that match the filter expression STR. The
syntax for these expressions is described in the main samtools(1) man
page under the FILTER EXPRESSIONS heading.

-f FLAG, --require-flags FLAG
Only output alignments with all bits set in FLAG present in the FLAG
field. FLAG can be specified in hex by beginning with `0x' (i.e.
/^0x[0-9A-F]+/), in octal by beginning with `0' (i.e. /^0[0-7]+/), as
a decimal number not beginning with `0' or as a comma-separated list
of flag names.

For a list of flag names see samtools-flags(1).

-F FLAG, --excl-flags FLAG, --exclude-flags FLAG
Do not output alignments with any bits set in FLAG present in the FLAG
field. FLAG can be specified in hex by beginning with `0x' (i.e.
/^0x[0-9A-F]+/), in octal by beginning with `0' (i.e. /^0[0-7]+/), as
a decimal number not beginning with `0' or as a comma-separated list
of flag names.

-G FLAG Do not output alignments with all bits set in FLAG present in
the FLAG field. This is the complement of -f: every alignment is
selected by exactly one of -f12 and -G12, so between them the two
outputs cover the whole input. FLAG can be specified in hex by
beginning with `0x' (i.e. /^0x[0-9A-F]+/), in octal by beginning with
`0' (i.e. /^0[0-7]+/), as a decimal number not beginning with `0' or
as a comma-separated list of flag names.
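
The three flag tests described above reduce to bitwise arithmetic. The
following is a sketch under stated assumptions: passes_flag_filters is
a hypothetical helper, not part of samtools, and it accepts only
numeric values (decimal, `0x' hex or `0' octal, which shell arithmetic
reads the same way), not flag names.

```shell
#!/bin/sh
# passes_flag_filters is a hypothetical helper; not part of samtools.
# usage: passes_flag_filters FLAG REQUIRE EXCL_ANY EXCL_ALL
# REQUIRE, EXCL_ANY and EXCL_ALL stand for the -f, -F and -G values;
# pass 0 for a filter that is not in use.
# Numeric values only (decimal, 0x hex, 0 octal); flag names are not handled.
passes_flag_filters() {
    flag=$(( $1 )); req=$(( $2 )); excl_any=$(( $3 )); excl_all=$(( $4 ))
    # -f: every bit in REQUIRE must be set in FLAG
    [ $(( flag & req )) -eq "$req" ] || return 1
    # -F: no bit in EXCL_ANY may be set in FLAG
    [ $(( flag & excl_any )) -eq 0 ] || return 1
    # -G: reject only when all bits in EXCL_ALL are set in FLAG
    if [ "$excl_all" -ne 0 ] && [ $(( flag & excl_all )) -eq "$excl_all" ]; then
        return 1
    fi
    return 0
}
```

With a FLAG of 77 (both bits of 12 set), the -f 12 test passes and the
-G 12 test rejects; with a FLAG of 99 (neither bit set) the outcome is
reversed. Each record is therefore kept by exactly one of the two.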

-x STR, --remove-tag STR
Read tag to exclude from output (repeatable) [null]

-B, --remove-B
Collapse the backward CIGAR operation.

--add-flags FLAG
Adds flag(s) to read. FLAG can be specified in hex by beginning with
`0x' (i.e. /^0x[0-9A-F]+/), in octal by beginning with `0' (i.e.
/^0[0-7]+/), as a decimal number not beginning with '0' or as a
comma-separated list of flag names.

--remove-flags FLAG
Remove flag(s) from read. FLAG is specified in the same way as with
the --add-flags option.

--subsample FLOAT
Output only a proportion of the input alignments, as specified by
0.0 ≤ FLOAT ≤ 1.0, which gives the fraction of templates/pairs to be
kept. This subsampling acts in the same way on all of the alignment
records in the same template or read pair, so it never keeps a read
but not its mate.

--subsample-seed INT
Subsampling seed used to influence which subset of reads is kept. When
subsampling data that has previously been subsampled, be sure to use a
different seed value from those used previously; otherwise more reads
will be retained than expected. [0]

-s FLOAT Subsampling shorthand option: -s INT.FRAC is equivalent to
--subsample-seed INT --subsample 0.FRAC.
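
The two ideas in the subsampling options can be sketched in shell. Both
helpers below are hypothetical and not part of samtools.
split_subsample_arg shows how an INT.FRAC value divides into seed and
fraction. keep_template shows one way to obtain the "never a read
without its mate" behaviour: the decision depends only on the read name
and the seed. cksum is used here as a stand-in hash and is an
assumption, not the hash samtools uses.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of samtools.

# Split the -s argument INT.FRAC into "SEED FRACTION".
# Assumption: the argument contains a decimal point.
split_subsample_arg() {
    printf '%s 0.%s\n' "${1%%.*}" "${1#*.}"
}

# usage: keep_template QNAME SEED PERCENT
# Succeeds if the template should be kept. PERCENT is the fraction to
# keep written as an integer from 0 to 100. The decision is a function
# of QNAME and SEED only, so every record sharing a QNAME gets the same
# answer. cksum is a stand-in hash chosen for this sketch.
keep_template() {
    hash=$(printf '%s:%s' "$2" "$1" | cksum | cut -d ' ' -f 1)
    [ $(( hash % 100 )) -lt "$3" ]
}
```

For instance, split_subsample_arg 42.25 prints "42 0.25". Because the
decision repeats for a given name and seed, subsampling an already
subsampled file with the same seed keeps the same templates again,
which is why a different seed is advised above.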

-@ INT, --threads INT
Number of BAM compression threads to use in addition to main thread
[0].

-S Ignored for compatibility with previous samtools versions.
Previously this option was required if input was in SAM format, but
now the correct format is automatically detected by examining the
first few characters of input.

-X, --customized-index
Include a customized index file as part of the arguments, given after
the input file name. See the EXAMPLES section for sample usage.

--no-PG Do not add a @PG line to the header of the output file.

EXAMPLES
o Import SAM to BAM when @SQ lines are present in the header:

samtools view -bo aln.bam aln.sam

If @SQ lines are absent:

samtools faidx ref.fa
samtools view -bt ref.fa.fai -o aln.bam aln.sam

where ref.fa.fai is generated automatically by the faidx command.

o Convert a BAM file to a CRAM file using a local reference sequence.

samtools view -C -T ref.fa -o aln.cram aln.bam

o Convert a BAM file to a CRAM with NM and MD tags stored verbatim
rather than calculated on the fly during CRAM decode, so that mixed
data sets with MD/NM only on some records, or NM calculated using
different definitions of mismatch, can be decoded without change. The
second command demonstrates how to decode such a file. The request to
not decode MD here is turning off auto-generation of both MD and NM; it
will still emit the MD/NM tags on records that had these stored
verbatim.

samtools view -C --output-fmt-option store_md=1 --output-fmt-option store_nm=1 -o aln.cram aln.bam
samtools view --input-fmt-option decode_md=0 -o aln.new.bam aln.cram

o An alternative way of achieving the above is listing multiple options
after the --output-fmt or -O option. The commands below are equivalent
to the two above.

samtools view -O cram,store_md=1,store_nm=1 -o aln.cram aln.bam
samtools view --input-fmt cram,decode_md=0 -o aln.new.bam aln.cram

o Include customized index file as a part of arguments.

samtools view [options] -X /data_folder/data.bam /index_folder/data.bai chrM:1-10

o Output alignments in read group grp2 (records with no RG tag will
also be in the output).

samtools view -r grp2 -o /data_folder/data.rg2.bam /data_folder/data.bam

o Only keep reads with tag BC where the barcode matches one of the
barcodes listed in the barcode file.

samtools view -D BC:barcodes.txt -o /data_folder/data.barcodes.bam /data_folder/data.bam

o Only keep reads with tag RG and read group grp2. This does almost the
same as -r grp2 but will not keep records without the RG tag.

samtools view -d RG:grp2 -o /data_folder/data.rg2_only.bam /data_folder/data.bam

o Remove the actions of samtools markdup. Clear the duplicate flag and
remove the dt tag, keep the header.

samtools view -h --remove-flags DUP -x dt -o /data_folder/dat.no_dup_markings.bam /data_folder/data.bam

AUTHOR
Written by Heng Li from the Sanger Institute.

SEE ALSO
samtools(1), samtools-tview(1), sam(5)

Samtools website: <http://www.htslib.org/>

samtools-1.13 7 July 2021 samtools-view(1)