samtools-view(1)             Bioinformatics tools             samtools-view(1)

NAME

       samtools view - views and converts SAM/BAM/CRAM files

SYNOPSIS

       samtools view [options] in.sam|in.bam|in.cram [region...]

DESCRIPTION

       With no options or regions specified, prints all alignments in the
       specified input alignment file (in SAM, BAM, or CRAM format) to
       standard output in SAM format (with no header).

       You may specify one or more space-separated region specifications after
       the input filename to restrict output to only those alignments which
       overlap the specified region(s). Use of region specifications requires
       a coordinate-sorted and indexed input file (in BAM or CRAM format).

       The -b, -C, -1, -u, -h, -H, and -c options change the output format
       from the default of headerless SAM, and the -o and -U options set the
       output file name(s).

       The -t and -T options provide additional reference data. One of these
       two options is required when SAM input does not contain @SQ headers,
       and the -T option is required whenever writing CRAM output.

       The -L, -M, -N, -r, -R, -d, -D, -s, -q, -l, -m, -f, -F, -G, and --rf
       options filter the alignments that will be included in the output to
       only those alignments that match certain criteria.

       The -p option sets the UNMAP flag on alignments that fail the filters,
       then writes them to the output file.

       The -x, -B, --add-flags, and --remove-flags options modify the data
       which is contained in each alignment.

       The -X option allows the user to specify the location of the index
       file(s) explicitly when the data folder does not contain an index
       file. See the EXAMPLES section for sample usage.

       Finally, the -@ option can be used to allocate additional threads to be
       used for compression, and the -? option requests a long help message.

       REGIONS:
              Regions can be specified as: RNAME[:STARTPOS[-ENDPOS]] and all
              position coordinates are 1-based.

              Important note: when multiple regions are given, some alignments
              may be output multiple times if they overlap more than one of
              the specified regions.

              Examples of region specifications:

              chr1      Output all alignments mapped to the reference sequence
                        named `chr1' (i.e. @SQ SN:chr1).

              chr2:1000000
                        The region on chr2 beginning at base position
                        1,000,000 and ending at the end of the chromosome.

              chr3:1000-2000
                        The 1001bp region on chr3 beginning at base position
                        1,000 and ending at base position 2,000 (including
                        both end positions).

              '*'       Output the unmapped reads at the end of the file.
                        (This does not include any unmapped reads placed on a
                        reference sequence alongside their mapped mates.)

              .         Output all alignments. (Mostly unnecessary as not
                        specifying a region at all has the same effect.)
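
              The region syntax above can be sketched in POSIX shell. The
              helper names parse_region and region_length are hypothetical
              and not part of samtools, and the sketch assumes the reference
              name itself contains no colon:

```sh
# Hypothetical helpers (not part of samtools). Assumes RNAME has no colon.
# parse_region prints "name start end", using "-" for an omitted field.
parse_region() (
    region=$1
    case $region in
        *:*) name=${region%:*}; range=${region##*:} ;;
        *)   name=$region; range= ;;
    esac
    case $range in
        '')  start=-; end=- ;;
        *-*) start=${range%%-*}; end=${range#*-} ;;
        *)   start=$range; end=- ;;
    esac
    echo "$name $start $end"
)

# region_length prints the number of bases covered when both ends are given.
# Coordinates are 1-based and inclusive, hence the "+ 1".
region_length() (
    set -f                      # keep a region such as '*' from globbing
    set -- $(parse_region "$1")
    if [ "$2" != "-" ] && [ "$3" != "-" ]; then
        echo $(( $3 - $2 + 1 ))
    fi
)

parse_region chr2:1000000
region_length chr3:1000-2000
```

              With these definitions, region_length chr3:1000-2000 prints
              1001, matching the inclusive 1-based count described above.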

OPTIONS

       -b, --bam Output in the BAM format.

       -C, --cram
                 Output in the CRAM format (requires -T).

       -1, --fast
                 Enable fast compression. This also changes the default
                 output format to BAM, but this can be overridden by the
                 explicit format options or using a filename with a known
                 suffix.

       -u, --uncompressed
                 Output uncompressed data. This also changes the default
                 output format to BAM, but this can be overridden by the
                 explicit format options or using a filename with a known
                 suffix.

                 This option saves time spent on compression/decompression and
                 is thus preferred when the output is piped to another
                 samtools command.

       -h, --with-header
                 Include the header in the output.

       -H, --header-only
                 Output the header only.

       --no-header
                 When producing SAM format, output alignment records but not
                 headers. This is the default; the option can be used to
                 reset the effect of -h/-H.

       -c, --count
                 Instead of printing the alignments, only count them and print
                 the total number. All filter options, such as -f, -F, and -q,
                 are taken into account. The -p option is ignored in this
                 mode.

       -?, --help
                 Output long help and exit immediately.

       -o FILE, --output FILE
                 Output to FILE [stdout].

       -U FILE, --unoutput FILE, --output-unselected FILE
                 Write alignments that are not selected by the various filter
                 options to FILE. When this option is used, all alignments
                 (or all alignments intersecting the regions specified) are
                 written to either the output file or this file, but never
                 both.
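
                 The "either file, never both" rule can be illustrated with
                 a toy stand-in that splits headerless SAM text on MAPQ, in
                 the spirit of -q 30 combined with -o and -U. The function
                 partition_by_mapq and the three records are made up for
                 illustration; this is a sketch of the partitioning idea,
                 not of how samtools is implemented:

```sh
# Toy stand-in for "samtools view -q MIN -o SELECTED -U UNSELECTED":
# split headerless SAM text from stdin on MAPQ (column 5). It imitates the
# partitioning rule only; it is not how samtools is implemented.
partition_by_mapq() (
    awk -F '\t' -v min="$1" -v sel="$2" -v unsel="$3" '
        { if ($5 + 0 >= min + 0) print > sel; else print > unsel }'
)

# Three made-up records with MAPQ 60, 10 and 30.
tmp=$(mktemp -d)
printf 'r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\n'  > "$tmp/in.sam"
printf 'r2\t0\tchr1\t200\t10\t4M\t*\t0\t0\tACGT\tIIII\n' >> "$tmp/in.sam"
printf 'r3\t0\tchr1\t300\t30\t4M\t*\t0\t0\tACGT\tIIII\n' >> "$tmp/in.sam"

partition_by_mapq 30 "$tmp/sel.sam" "$tmp/unsel.sam" < "$tmp/in.sam"

# Every input record lands in exactly one of the two files.
n_in=$(( $(wc -l < "$tmp/in.sam") ))
n_sel=$(( $(wc -l < "$tmp/sel.sam") ))
n_unsel=$(( $(wc -l < "$tmp/unsel.sam") ))
echo "$n_in = $n_sel + $n_unsel"
rm -rf "$tmp"
```

                 The two output counts always add up to the input count,
                 which is the property the -U description guarantees.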

       -p, --unmap
                 Set the UNMAP flag on alignments that are not selected by the
                 filter options. These alignments are then written to the
                 normal output. This is not compatible with -U.

       -t FILE, --fai-reference FILE
                 A tab-delimited FILE. Each line must contain the reference
                 name in the first column and the length of the reference in
                 the second column, with one line for each distinct reference.
                 Any additional fields beyond the second column are ignored.
                 This file also defines the order of the reference sequences
                 in sorting. If you run: `samtools faidx <ref.fa>', the
                 resulting index file <ref.fa>.fai can be used as this FILE.
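
                 As a rough illustration of what the first two columns
                 carry, the sketch below turns such a table into the
                 name/length pairs that @SQ header lines hold. The helper
                 table_to_sq and the sample values are hypothetical, and
                 the output is only an approximation of what a real header
                 would contain:

```sh
# Hypothetical helper: print @SQ-style lines from a table of reference
# names and lengths, reading only the first two tab-separated columns.
table_to_sq() (
    awk -F '\t' '{ printf "@SQ\tSN:%s\tLN:%s\n", $1, $2 }' "$1"
)

# A made-up .fai-style file: name, length, then three fields that the
# sketch (like the -t option) ignores.
tmp=$(mktemp)
printf 'chr1\t1000\t6\t60\t61\n'    > "$tmp"
printf 'chr2\t500\t1030\t60\t61\n' >> "$tmp"

sq_lines=$(table_to_sq "$tmp")
rm -f "$tmp"
printf '%s\n' "$sq_lines"
```

                 The order of lines in the table is preserved, in keeping
                 with the note above that the file defines reference order.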

       -T FILE, --reference FILE
                 A FASTA format reference FILE, optionally compressed by bgzip
                 and ideally indexed by samtools faidx. If an index is not
                 present one will be generated for you, if the reference file
                 is local.

                 If the reference file is not local, but is accessed instead
                 via an https://, s3:// or other URL, the index file will need
                 to be supplied by the server alongside the reference. It is
                 possible to have the reference and index files in different
                 locations by supplying both to this option separated by the
                 string "##idx##", for example:

                 -T ftp://x.com/ref.fa##idx##ftp://y.com/index.fa.fai

                 However, note that only the location of the reference will be
                 stored in the output file header. If this method is used to
                 make CRAM files, the cram reader may not be able to find the
                 index, and may not be able to decode the file unless it can
                 get the references it needs using a different method.
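
                 The "##idx##" convention can be pictured as a plain string
                 split. The helper split_ref_arg below is hypothetical and
                 only shows how the two locations are separated; it does
                 not fetch anything:

```sh
# Hypothetical helper: split a -T argument on the literal string "##idx##".
# Prints the reference location, then the index location (empty if absent).
split_ref_arg() (
    sep='##idx##'
    case $1 in
        *"$sep"*)
            ref=${1%%"$sep"*}    # text before the first separator
            idx=${1#*"$sep"}     # text after the first separator
            ;;
        *)
            ref=$1
            idx=
            ;;
    esac
    printf 'reference=%s\nindex=%s\n' "$ref" "$idx"
)

split_ref_arg 'ftp://x.com/ref.fa##idx##ftp://y.com/index.fa.fai'
```

                 Only the first of the two values corresponds to what is
                 recorded in the output header, as noted above.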

       -L FILE, --target-file FILE, --targets-file FILE
                 Only output alignments overlapping the input BED FILE [null].

       -M, --use-index
                 Use the multi-region iterator on the union of a BED file and
                 command-line region arguments. This avoids re-reading the
                 same regions of files so can sometimes be much faster. Note
                 this also removes duplicate sequences. Without this a
                 sequence that overlaps multiple regions specified on the
                 command line will be reported multiple times. The usage of a
                 BED file is optional and its path has to be preceded by the
                 -L option.

       --region-file FILE, --regions-file FILE
                 Use an index and multi-region iterator to only output
                 alignments overlapping the input BED FILE. Equivalent to -M
                 -L FILE or --use-index --target-file FILE.

       -N FILE, --qname-file FILE
                 Output only alignments with read names listed in FILE.

       -r STR, --read-group STR
                 Output alignments in read group STR [null]. Note that
                 records with no RG tag will also be output when using this
                 option. This behaviour may change in a future release.

       -R FILE, --read-group-file FILE
                 Output alignments in read groups listed in FILE [null]. Note
                 that records with no RG tag will also be output when using
                 this option. This behaviour may change in a future release.

       -d STR1[:STR2], --tag STR1[:STR2]
                 Only output alignments with tag STR1 and associated value
                 STR2, which can be a string or an integer [null]. The value
                 can be omitted, in which case only the tag is considered.

       -D STR:FILE, --tag-file STR:FILE
                 Only output alignments with tag STR and associated values
                 listed in FILE [null].

       -q INT, --min-MQ INT
                 Skip alignments with MAPQ smaller than INT [0].

       -l STR, --library STR
                 Only output alignments in library STR [null].

       -m INT, --min-qlen INT
                 Only output alignments with number of CIGAR bases consuming
                 query sequence ≥ INT [0].
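
                 To make "CIGAR bases consuming query sequence" concrete,
                 the sketch below sums the lengths of the M, I, S, = and X
                 operations, which are the query-consuming operations in
                 the SAM specification. The function cigar_query_len is a
                 hypothetical illustration, not samtools code:

```sh
# Hypothetical helper: sum the lengths of CIGAR operations that consume
# query bases (M, I, S, =, X). D, N, H, P and the legacy B operation are
# recognised but not counted. An unavailable CIGAR ("*") yields 0.
cigar_query_len() (
    cigar=$1
    total=0
    while [ -n "$cigar" ]; do
        len=${cigar%%[MIDNSHP=XB]*}   # digits before the next operation
        rest=${cigar#"$len"}          # operation letter and what follows
        tail=${rest#?}                # everything after the letter
        op=${rest%"$tail"}            # the operation letter itself
        cigar=$tail
        case $op in
            M|I|S|=|X) total=$(( total + ${len:-0} )) ;;
        esac
    done
    echo "$total"
)

cigar_query_len 5S90M2I3D5H
```

                 For 5S90M2I3D5H this prints 97 (5 + 90 + 2), so the record
                 would pass -m 97 but not -m 98 under this reading.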

       -e STR, --expr STR
                 Only include alignments that match the filter expression STR.
                 The syntax for these expressions is described in the main
                 samtools(1) man page under the FILTER EXPRESSIONS heading.

       -f FLAG, --require-flags FLAG
                 Only output alignments with all bits set in FLAG present in
                 the FLAG field. FLAG can be specified in hex by beginning
                 with `0x' (i.e. /^0x[0-9A-F]+/), in octal by beginning with
                 `0' (i.e. /^0[0-7]+/), as a decimal number not beginning with
                 '0' or as a comma-separated list of flag names.

                 For a list of flag names see samtools-flags(1).
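
                 The three numeric spellings of FLAG map onto the C-style
                 constants that POSIX shell arithmetic already understands.
                 The helper flag_value below is hypothetical; it covers
                 only the numeric forms and deliberately rejects flag
                 names, which need the table in samtools-flags(1):

```sh
# Hypothetical helper: convert a numeric FLAG spelling to decimal.
# Accepts hex (0x...), octal (leading 0) and plain decimal; anything else,
# including flag names, is rejected with a non-zero exit status.
flag_value() (
    v=$1
    case $v in
        '')     exit 1 ;;
        0[xX]*) case ${v#0[xX]} in ''|*[!0-9a-fA-F]*) exit 1 ;; esac ;;
        0?*)    case $v in *[!0-7]*) exit 1 ;; esac ;;
        *)      case $v in *[!0-9]*) exit 1 ;; esac ;;
    esac
    echo $(( $v ))   # shell arithmetic parses hex and octal constants
)

flag_value 0x10
flag_value 020
flag_value 16
```

                 All three calls print 16, so -f 0x10, -f 020 and -f 16
                 name the same bit.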

       -F FLAG, --excl-flags FLAG, --exclude-flags FLAG
                 Do not output alignments with any bits set in FLAG present in
                 the FLAG field. FLAG can be specified in hex by beginning
                 with `0x' (i.e. /^0x[0-9A-F]+/), in octal by beginning with
                 `0' (i.e. /^0[0-7]+/), as a decimal number not beginning with
                 '0' or as a comma-separated list of flag names.

       --rf FLAG, --incl-flags FLAG, --include-flags FLAG
                 Only output alignments with any bit set in FLAG present in
                 the FLAG field. FLAG can be specified in hex by beginning
                 with `0x' (i.e. /^0x[0-9A-F]+/), in octal by beginning with
                 `0' (i.e. /^0[0-7]+/), as a decimal number not beginning with
                 '0' or as a comma-separated list of flag names.

       -G FLAG   Do not output alignments with all bits set in FLAG present in
                 the FLAG field. This is the opposite of -f: every alignment
                 kept by -f12 is dropped by -G12 and vice versa, so the
                 outputs of the two taken together cover the whole input.
                 FLAG can be specified in hex by beginning with `0x' (i.e.
                 /^0x[0-9A-F]+/), in octal by beginning with `0' (i.e.
                 /^0[0-7]+/), as a decimal number not beginning with '0' or
                 as a comma-separated list of flag names.
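
                 The four flag filters differ only in the bit test applied
                 to the FLAG field. A minimal sketch, using hypothetical
                 predicate names and assuming a non-zero mask, might look
                 like this:

```sh
# Hypothetical predicates: each succeeds when a record whose FLAG field is
# $1 would be kept by the option with mask $2 (mask assumed non-zero).
keep_f()  { [ $(( $1 & $2 )) -eq $(( $2 )) ]; }   # -f:   all mask bits set
keep_F()  { [ $(( $1 & $2 )) -eq 0 ]; }           # -F:   no mask bit set
keep_rf() { [ $(( $1 & $2 )) -ne 0 ]; }           # --rf: at least one set
keep_G()  { [ $(( $1 & $2 )) -ne $(( $2 )) ]; }   # -G:   not all bits set

# Mask 12 = UNMAP (4) + MUNMAP (8), tried against three example FLAG values.
for flag in 77 73 65; do
    keep_f  "$flag" 12 && r_f=keep  || r_f=drop
    keep_F  "$flag" 12 && r_F=keep  || r_F=drop
    keep_rf "$flag" 12 && r_rf=keep || r_rf=drop
    keep_G  "$flag" 12 && r_G=keep  || r_G=drop
    echo "FLAG $flag: -f $r_f, -F $r_F, --rf $r_rf, -G $r_G"
done
```

                 For FLAG 77, which has both UNMAP and MUNMAP set, -f 12
                 keeps the record and -G 12 drops it; for every FLAG value
                 exactly one of the two keeps it.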

       -x STR, --remove-tag STR
                 Read tag(s) to exclude from output (repeatable) [null]. This
                 can be a single tag or a comma-separated list. Alternatively
                 the option itself can be repeated multiple times.

                 If the list starts with a `^' then it is negated and treated
                 as a request to remove all tags except those in STR. The list
                 may be empty, so -x ^ will remove all tags.

                 Note that tags will only be removed from reads that pass
                 filtering.

       --keep-tag STR
                 This keeps only tags listed in STR and is directly equivalent
                 to --remove-tag ^STR. Specifying an empty list will remove
                 all tags. If both --keep-tag and --remove-tag are specified
                 then --keep-tag has precedence.

                 Note that tags will only be removed from reads that pass
                 filtering.
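
                 The negation rule for the tag list can be sketched as
                 follows. The predicate tag_is_removed is hypothetical and
                 models a single -x list only, not the interaction between
                 --keep-tag and --remove-tag:

```sh
# Hypothetical predicate: succeed when tag $1 would be removed by the
# -x/--remove-tag list $2. A leading "^" negates the list.
tag_is_removed() (
    tag=$1
    list=$2
    case $list in
        ^*) negate=1; list=${list#^} ;;
        *)  negate=0 ;;
    esac
    case ",$list," in
        *",$tag,"*) listed=1 ;;
        *)          listed=0 ;;
    esac
    # Plain list: remove listed tags. Negated list: remove unlisted tags.
    [ "$listed" -ne "$negate" ]
)

for tag in NM MD RG; do
    if tag_is_removed "$tag" '^NM,MD'; then
        echo "$tag removed"
    else
        echo "$tag kept"
    fi
done
```

                 With the list ^NM,MD only RG is removed, which is the same
                 outcome the text above describes for --keep-tag NM,MD.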

       -B, --remove-B
                 Collapse the backward CIGAR operation.

       --add-flags FLAG
                 Adds flag(s) to read. FLAG can be specified in hex by
                 beginning with `0x' (i.e. /^0x[0-9A-F]+/), in octal by
                 beginning with `0' (i.e. /^0[0-7]+/), as a decimal number
                 not beginning with '0' or as a comma-separated list of flag
                 names.

       --remove-flags FLAG
                 Remove flag(s) from read. FLAG is specified in the same way
                 as with the --add-flags option.

       --subsample FLOAT
                 Output only a proportion of the input alignments, as
                 specified by 0.0 ≤ FLOAT ≤ 1.0, which gives the fraction of
                 templates/pairs to be kept. This subsampling acts in the
                 same way on all of the alignment records in the same template
                 or read pair, so it never keeps a read but not its mate.

       --subsample-seed INT
                 Subsampling seed used to influence which subset of reads is
                 kept. When subsampling data that has previously been
                 subsampled, be sure to use a different seed value from those
                 used previously; otherwise more reads will be retained than
                 expected. [0]

       -s FLOAT  Subsampling shorthand option: -s INT.FRAC is equivalent to
                 --subsample-seed INT --subsample 0.FRAC.
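
                 The shorthand can be unpacked mechanically. The helper
                 split_subsample below is hypothetical and simply rewrites
                 an INT.FRAC value into the two long options; it treats a
                 missing integer part as seed 0 and rejects values without
                 a decimal point, both of which are assumptions of the
                 sketch:

```sh
# Hypothetical helper: rewrite a "-s INT.FRAC" value as the equivalent
# --subsample-seed/--subsample pair. Values without a "." are rejected.
split_subsample() (
    case $1 in
        *.*) seed=${1%%.*}; frac=${1#*.} ;;
        *)   echo "expected INT.FRAC, got '$1'" >&2; exit 1 ;;
    esac
    echo "--subsample-seed ${seed:-0} --subsample 0.$frac"
)

split_subsample 42.1
split_subsample 0.25
```

                 So -s 42.1 keeps about 10% of templates using seed 42,
                 and -s 0.25 keeps about 25% using the default seed 0.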

       -@ INT, --threads INT
                 Number of BAM compression threads to use in addition to main
                 thread [0].

       -P, --fetch-pairs
                 Retrieve pairs even when the mate is outside of the requested
                 region. Enabling this option also turns on the multi-region
                 iterator (-M). A region to search must be specified, either
                 on the command-line, or using the -L option. The input file
                 must be an indexed regular file.

                 This option first scans the requested region, using the RNEXT
                 and PNEXT fields of the records that have the PAIRED flag set
                 and pass other filtering options to find where paired reads
                 are located. These locations are used to build an expanded
                 region list, and a set of QNAMEs to allow from the new
                 regions. It will then make a second pass, collecting all reads
                 from the originally-specified region list together with reads
                 from additional locations that match the allowed set of
                 QNAMEs. Any other filtering options used will be applied to
                 all reads found during this second pass.

                 As this option links reads using RNEXT and PNEXT, it is
                 important that these fields are set accurately. Use 'samtools
                 fixmate' to correct them if necessary.

                 Note that this option does not work with the -c, --count; -U,
                 --output-unselected; or -p, --unmap options.

       -S        Ignored for compatibility with previous samtools versions.
                 Previously this option was required if input was in SAM
                 format, but now the correct format is automatically detected
                 by examining the first few characters of input.

       -X, --customized-index
                 Accept a customized index file as an additional argument,
                 given after the input file. See the EXAMPLES section for
                 sample usage.

       --no-PG   Do not add a @PG line to the header of the output file.

EXAMPLES

       o Import SAM to BAM when @SQ lines are present in the header:

           samtools view -bo aln.bam aln.sam

         If @SQ lines are absent:

           samtools faidx ref.fa
           samtools view -bt ref.fa.fai -o aln.bam aln.sam

         where ref.fa.fai is generated automatically by the faidx command.

       o Convert a BAM file to a CRAM file using a local reference sequence.

           samtools view -C -T ref.fa -o aln.cram aln.bam

       o Convert a BAM file to a CRAM with NM and MD tags stored verbatim
         rather than calculating on the fly during CRAM decode, so that mixed
         data sets with MD/NM only on some records, or NM calculated using
         different definitions of mismatch, can be decoded without change.
         The second command demonstrates how to decode such a file. The
         request to not decode MD here is turning off auto-generation of both
         MD and NM; it will still emit the MD/NM tags on records that had
         these stored verbatim.

           samtools view -C --output-fmt-option store_md=1 --output-fmt-option store_nm=1 -o aln.cram aln.bam
           samtools view --input-fmt-option decode_md=0 -o aln.new.bam aln.cram

       o An alternative way of achieving the above is listing multiple options
         after the --output-fmt or -O option. The commands below are
         equivalent to the two above.

           samtools view -O cram,store_md=1,store_nm=1 -o aln.cram aln.bam
           samtools view --input-fmt cram,decode_md=0 -o aln.new.bam aln.cram

       o Include customized index file as a part of arguments.

           samtools view [options] -X /data_folder/data.bam /index_folder/data.bai chrM:1-10

       o Output alignments in read group grp2 (records with no RG tag will
         also be in the output).

           samtools view -r grp2 -o /data_folder/data.rg2.bam /data_folder/data.bam

       o Only keep reads with tag BC where the barcode matches one of the
         barcodes listed in the barcode file.

           samtools view -D BC:barcodes.txt -o /data_folder/data.barcodes.bam /data_folder/data.bam

       o Only keep reads with tag RG and read group grp2. This does almost
         the same as -r grp2 but will not keep records without the RG tag.

           samtools view -d RG:grp2 -o /data_folder/data.rg2_only.bam /data_folder/data.bam

       o Undo the actions of samtools markdup: clear the duplicate flag,
         remove the dt tag, and keep the header.

           samtools view -h --remove-flags DUP -x dt -o /data_folder/dat.no_dup_markings.bam /data_folder/data.bam

AUTHOR

       Written by Heng Li from the Sanger Institute.

SEE ALSO

       samtools(1), samtools-tview(1), sam(5)

       Samtools website: <http://www.htslib.org/>

samtools-1.15.1                  7 April 2022                 samtools-view(1)