samtools-fasta(1)             Bioinformatics tools             samtools-fasta(1)

NAME
samtools fasta / fastq - converts a SAM/BAM/CRAM file to FASTA or FASTQ

SYNOPSIS
samtools fastq [options] in.bam
samtools fasta [options] in.bam

DESCRIPTION
Converts a SAM, BAM or CRAM file into either FASTQ or FASTA format,
depending on the command invoked. Output files are compressed
automatically if their names have a .gz or .bgzf extension.

If the input contains read pairs that are to be interleaved or written
to separate files in the same order, the input should first be collated
by name. Use samtools collate or samtools sort -n to ensure this.

For each different QNAME, the input records are categorised according
to the state of the READ1 and READ2 flag bits. The three categories
used are:

1 : Only READ1 is set.

2 : Only READ2 is set.

0 : Either both READ1 and READ2 are set; or neither is set.

The exact meaning of these categories depends on the sequencing
technology used. It is expected that ordinary single and paired-end
sequencing reads will be in categories 1 and 2 (in the case of
paired-end reads, one read of the pair will be in category 1, the other
in category 2). Category 0 is essentially a “catch-all” for reads that
do not fit into a simple paired-end sequencing model.

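The categorisation above can be sketched in plain shell arithmetic. This is
only an illustration, not samtools code: read_category is a hypothetical
helper, and it assumes the SAM specification's FLAG values for READ1 (0x40)
and READ2 (0x80).

```shell
# Hypothetical helper (not part of samtools): print the category 0, 1 or 2
# for a numeric SAM FLAG value.
read_category() {
    r1=$(( ($1 >> 6) & 1 ))   # 1 if READ1 (0x40) is set
    r2=$(( ($1 >> 7) & 1 ))   # 1 if READ2 (0x80) is set
    if [ "$r1" -eq 1 ] && [ "$r2" -eq 0 ]; then
        echo 1                # only READ1 is set
    elif [ "$r1" -eq 0 ] && [ "$r2" -eq 1 ]; then
        echo 2                # only READ2 is set
    else
        echo 0                # both set, or neither set
    fi
}

read_category 99     # prints 1 (READ1 only)
read_category 147    # prints 2 (READ2 only)
read_category 0      # prints 0 (neither bit set)
read_category 192    # prints 0 (both bits set)
```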
For each category only one sequence will be written for a given QNAME.
If more than one record is available for a given QNAME and category,
the first in input file order that has quality values will be used. If
none of the candidate records has quality values, then the first in
input file order will be used instead.

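The selection rule above can be illustrated with a small awk filter. This is
a sketch under stated assumptions, not the samtools implementation:
pick_records is a hypothetical helper, and its input is a simplified
tab-separated table (QNAME, category, record label, quality string) in which
a quality of "*" means no quality values are stored, as in the SAM QUAL
field.

```shell
# Hypothetical helper: keep one record per (QNAME, category) pair.
# Prefer the first record that has qualities; otherwise keep the first seen.
pick_records() {
    awk -F '\t' '
    {
        key = $1 SUBSEP $2
        if (!(key in first)) {          # first record for this QNAME/category
            first[key] = $0
            order[++n] = key
        }
        if (!(key in withq) && $4 != "*")
            withq[key] = $0             # first record that has qualities
    }
    END {
        for (i = 1; i <= n; i++) {
            k = order[i]
            if (k in withq)
                print withq[k]
            else
                print first[k]
        }
    }'
}

# Category 1 has a later record with qualities; category 2 has none at all.
printf 'r1\t1\trecA\t*\nr1\t1\trecB\tIIII\nr1\t2\trecC\t*\nr1\t2\trecD\t*\n' |
    pick_records
```

For this input the filter keeps recB for category 1 and recC for category 2.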
Sequences will be written to standard output unless one of the -1, -2,
-o, or -0 options is used, in which case sequences for that category
will be written to the specified file. The same filename may be
specified with multiple options, in which case the sequences will be
multiplexed in order of occurrence.

If a singleton file is specified using the -s option, then only paired
sequences will be output for categories 1 and 2; paired meaning that
for a given QNAME there are sequences for both category 1 and 2. If
there is a sequence for only one of categories 1 or 2, then it will be
diverted into the specified singletons file. This can be used to
prepare FASTQ files for programs that cannot handle a mixture of paired
and singleton reads.

The -s option only affects category 1 and 2 records. The output for
category 0 will be the same irrespective of the use of this option.

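The effect of -s described above can be sketched as a routing decision per
QNAME. This is an illustration only: route_with_singletons is a
hypothetical helper, its input is a simplified "QNAME category" table that
already holds at most one record per QNAME and category, and the
destination labels are made up for the example.

```shell
# Hypothetical helper: label each record with where it would go when a
# singleton file is in use.
route_with_singletons() {
    awk '
    {
        name[NR] = $1
        kind[NR] = $2
        if ($2 == 1 || $2 == 2)
            seen[$1, $2] = 1            # note which of categories 1/2 exist
    }
    END {
        for (i = 1; i <= NR; i++) {
            q = name[i]
            c = kind[i]
            if (c == 0)
                dest = "category0"      # unaffected by -s
            else if (((q, 1) in seen) && ((q, 2) in seen))
                dest = "paired"         # both mates present
            else
                dest = "singleton"      # only one of category 1 or 2
            print q, c, dest
        }
    }'
}

printf 'a 1\na 2\nb 1\nc 0\n' | route_with_singletons
```

Here both records of "a" are labelled paired, the lone record of "b" is
labelled singleton, and the category 0 record of "c" is left alone.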
OPTIONS
-n      By default, either '/1' or '/2' is added to the end of read names
        where the corresponding READ1 or READ2 FLAG bit is set. Using -n
        causes read names to be left as they are.

-N      Always add either '/1' or '/2' to the end of read names, even when
        they are written to different files.

-O      Use quality values from OQ tags in preference to the standard
        quality string, if available.

-s FILE Write singleton reads to FILE.

-t      Copy RG, BC and QT tags to the FASTQ header line, if they exist.

-T TAGLIST
        Specify a comma-separated list of tags to copy to the FASTQ header
        line, if they exist.

-1 FILE Write reads with the READ1 FLAG set (and READ2 not set) to FILE
        instead of standard output. If the -s option is used, only paired
        reads will be written to this file.

-2 FILE Write reads with the READ2 FLAG set (and READ1 not set) to FILE
        instead of standard output. If the -s option is used, only paired
        reads will be written to this file.

-o FILE Write reads with either the READ1 or the READ2 FLAG set to FILE
        instead of standard output. This is equivalent to -1 FILE -2 FILE.

-0 FILE Write reads where the READ1 and READ2 FLAG bits are either both
        set or both unset to FILE instead of standard output.

-f INT  Only output alignments that have all of the bits in INT set in the
        FLAG field. INT can be specified in hex by beginning with '0x'
        (i.e. /^0x[0-9A-F]+/) or in octal by beginning with '0' (i.e.
        /^0[0-7]+/) [0].

-F INT  Do not output alignments that have any of the bits in INT set in
        the FLAG field. INT can be specified in hex by beginning with '0x'
        (i.e. /^0x[0-9A-F]+/) or in octal by beginning with '0' (i.e.
        /^0[0-7]+/) [0x900]. The default, 0x900, filters out secondary and
        supplementary alignments.

-G INT  Exclude only reads that have all of the bits in INT set in the
        FLAG field. INT can be specified in hex by beginning with '0x'
        (i.e. /^0x[0-9A-F]+/) or in octal by beginning with '0' (i.e.
        /^0[0-7]+/) [0].

-i      Add an Illumina Casava 1.8 format entry to the header (e.g.
        1:N:0:ATCACG).

-c [0..9]
        Set the compression level when writing gz or bgzf FASTQ files.

--i1 FILE
        Write first index reads to FILE.

--i2 FILE
        Write second index reads to FILE.

--barcode-tag TAG
        Aux tag in which to find the index reads [default: BC].

--quality-tag TAG
        Aux tag in which to find the index quality [default: QT].

-@, --threads INT
        Number of input/output compression threads to use in addition to
        the main thread [0].

--index-format STR
        String describing how to parse the barcode and quality tags. For
        example:

        i14i8   The first 14 characters are index 1, the next 8 characters
                are index 2.

        n8i14   Ignore the first 8 characters, and use the next 14
                characters for index 1.

        If the tag contains a separator, then the numeric part can be
        replaced with '*' to mean 'read until the separator or end of
        tag', for example:

        n*i*    Ignore the left part of the tag until the separator, then
                use the second part.

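How such a format string walks along a tag value can be sketched in POSIX
shell. This is an illustration, not the samtools parser: split_index is a
hypothetical helper, and it assumes that the separator is a hyphen and that
the separator itself is skipped after a '*' field.

```shell
# Hypothetical helper: split_index FORMAT TAGVALUE
# Prints each extracted index (every "i" field) on its own line.
split_index() {
    fmt=$1
    tag=$2
    while [ -n "$fmt" ]; do
        code=${fmt%"${fmt#?}"}      # first character: i (index) or n (skip)
        fmt=${fmt#?}
        len=${fmt%%[in]*}           # digits, or *, up to the next code
        fmt=${fmt#"$len"}
        if [ "$len" = "*" ]; then
            part=${tag%%-*}         # up to the separator or end of tag
            rest=${tag#"$part"}
            tag=${rest#-}           # skip the separator (assumed to be "-")
        else
            part=$(printf '%s' "$tag" | cut -c "1-$len")
            tag=$(printf '%s' "$tag" | cut -c "$((len + 1))-")
        fi
        [ "$code" = "i" ] && printf '%s\n' "$part"
    done
    return 0
}

split_index i4i2 ACGTTG          # prints ACGT, then TG
split_index n2i4 GGACGT          # prints ACGT
split_index 'n*i*' ACGT-TTGCA    # prints TTGCA
```

Short lengths are used here to keep the example readable; i14i8 and n8i14
behave the same way with longer fields.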
EXAMPLES
Starting from a coordinate sorted file, output paired reads to separate
files, discarding singletons, supplementary and secondary reads. The
resulting files can be used with, for example, the bwa aligner.

    samtools collate -u -O in_pos.bam | \
    samtools fastq -1 paired1.fq -2 paired2.fq -0 /dev/null -s /dev/null -n

Starting with a name collated file, output paired and singleton reads
in a single file, discarding supplementary and secondary reads. To get
all of the reads in a single file, it is necessary to redirect the
output of samtools fastq. The output file is suitable for use with bwa
mem -p, which understands interleaved files containing a mixture of
paired and singleton reads.

    samtools fastq -0 /dev/null in_name.bam > all_reads.fq

Output paired reads in a single file, discarding supplementary and
secondary reads. Save any singletons in a separate file. Append /1 and
/2 to read names. This format is suitable for use by NextGenMap when
using its -p and -q options. With this aligner, paired reads must be
mapped separately from the singletons.

    samtools fastq -0 /dev/null -s single.fq -N in_name.bam > paired.fq

BUGS
o  The way of specifying output files is far too complicated and easy to
   get wrong.

AUTHOR
Written by Heng Li, with modifications by Martin Pollard and Jennifer
Liddle, all from the Sanger Institute.

SEE ALSO
samtools(1), samtools-faidx(1), samtools-fqidx(1), samtools-import(1)

Samtools website: <http://www.htslib.org/>

samtools-1.15.1                  7 April 2022                samtools-fasta(1)