samtools-fasta(1)            Bioinformatics tools            samtools-fasta(1)


NAME

       samtools fasta / fastq - converts a SAM/BAM/CRAM file to FASTA or FASTQ

SYNOPSIS

       samtools fastq [options] in.bam
       samtools fasta [options] in.bam

DESCRIPTION

       Converts a SAM, BAM or CRAM file into either FASTQ or FASTA format,
       depending on the command invoked. The output files will be
       automatically compressed if their names have a .gz or .bgzf extension.

       If the input contains read pairs which are to be interleaved or
       written to separate files in the same order, then the input should
       first be collated by name. Use samtools collate or samtools sort -n
       to ensure this.

       For each different QNAME, the input records are categorised according
       to the state of the READ1 and READ2 flag bits. The three categories
       used are:

       1 : Only READ1 is set.

       2 : Only READ2 is set.

       0 : Either both READ1 and READ2 are set, or neither is set.

       The exact meaning of these categories depends on the sequencing
       technology used. It is expected that ordinary single and paired-end
       sequencing reads will be in categories 1 and 2 (in the case of
       paired-end reads, one read of the pair will be in category 1, the
       other in category 2). Category 0 is essentially a “catch-all” for
       reads that do not fit into a simple paired-end sequencing model.
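
       The categorisation above can be sketched with shell arithmetic. This
       is only an illustration, not the samtools implementation: the
       function name read_category is hypothetical, and the bit values
       (READ1 = 0x40, READ2 = 0x80) are taken from the SAM specification
       rather than from this page.

```shell
# Sketch only: classify a SAM FLAG value into category 1, 2 or 0.
# Assumes READ1 is FLAG bit 0x40 and READ2 is FLAG bit 0x80 (SAM specification).
# read_category is a hypothetical helper, not part of samtools.
read_category() {
    flag=$1
    r1=$(( ($flag & 0x40) != 0 ))   # 1 if the READ1 bit is set
    r2=$(( ($flag & 0x80) != 0 ))   # 1 if the READ2 bit is set
    if [ "$r1" -eq 1 ] && [ "$r2" -eq 0 ]; then
        echo 1
    elif [ "$r1" -eq 0 ] && [ "$r2" -eq 1 ]; then
        echo 2
    else
        echo 0                      # both set, or neither set
    fi
}

read_category 99     # 0x63: paired, proper pair, mate reverse, READ1 -> 1
read_category 147    # 0x93: paired, proper pair, reverse, READ2      -> 2
read_category 4      # unmapped, neither READ1 nor READ2              -> 0
```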

       For each category only one sequence will be written for a given
       QNAME. If more than one record is available for a given QNAME and
       category, the first in input file order that has quality values will
       be used. If none of the candidate records has quality values, then
       the first in input file order will be used instead.

       Sequences will be written to standard output unless one of the -1,
       -2, -o, or -0 options is used, in which case sequences for that
       category will be written to the specified file. The same filename
       may be specified with multiple options, in which case the sequences
       will be multiplexed in order of occurrence.

       If a singleton file is specified using the -s option then only paired
       sequences will be output for categories 1 and 2; paired meaning that
       for a given QNAME there are sequences for both category 1 and 2. If
       there is a sequence for only one of categories 1 or 2 then it will be
       diverted into the specified singletons file. This can be used to
       prepare fastq files for programs that cannot handle a mixture of
       paired and singleton reads.

       The -s option only affects category 1 and 2 records. The output for
       category 0 will be the same irrespective of the use of this option.
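
       The effect of -s on routing can be illustrated with a small awk
       sketch. It is a simplified model under stated assumptions, not what
       samtools does internally: the input is a hypothetical two-column
       list of "QNAME category" lines with at most one line per QNAME and
       category, and the labels read1, read2, singleton and other merely
       stand in for the -1, -2, -s and -0 destinations.

```shell
# Sketch only: decide where each "QNAME category" line would go when -s is used.
# route_reads and its output labels are hypothetical, for illustration.
route_reads() {
    awk '
    {
        name[NR] = $1
        categ[NR] = $2
        if ($2 == 1 || $2 == 2) seen[$1, $2] = 1   # remember category 1 and 2 reads
    }
    END {
        for (i = 1; i <= NR; i++) {
            q = name[i]
            c = categ[i] + 0
            if (c == 0)
                dest = "other"          # category 0 is unaffected by -s
            else if (((q, 1) in seen) && ((q, 2) in seen))
                dest = "read" c         # both mates present: a true pair
            else
                dest = "singleton"      # only one of category 1 or 2 present
            print q, c, dest
        }
    }'
}

printf 'r1 1\nr1 2\nr2 1\nr3 0\n' | route_reads
```

       Here r1 has both mates and stays paired, r2 has only a category 1
       read and is treated as a singleton, and r3 falls into category 0.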

OPTIONS

       -n      By default, either '/1' or '/2' is added to the end of read
               names where the corresponding READ1 or READ2 FLAG bit is set.
               Using -n causes read names to be left as they are.

       -N      Always add either '/1' or '/2' to the end of read names, even
               when they are written to different files.

       -O      Use quality values from OQ tags in preference to the standard
               quality string, if available.

       -s FILE Write singleton reads to FILE.

       -t      Copy RG, BC and QT tags to the FASTQ header line, if they
               exist.

       -T TAGLIST
               Specify a comma-separated list of tags to copy to the FASTQ
               header line, if they exist.

       -1 FILE Write reads with the READ1 FLAG set (and READ2 not set) to
               FILE instead of writing them to standard output. If the -s
               option is used, only paired reads will be written to this
               file.

       -2 FILE Write reads with the READ2 FLAG set (and READ1 not set) to
               FILE instead of writing them to standard output. If the -s
               option is used, only paired reads will be written to this
               file.

       -o FILE Write reads with either the READ1 FLAG or the READ2 FLAG set
               to FILE instead of writing them to standard output. This is
               equivalent to -1 FILE -2 FILE.

       -0 FILE Write reads where the READ1 and READ2 FLAG bits are either
               both set or both unset to FILE instead of writing them to
               standard output.

       -f INT  Only output alignments with all bits set in INT present in the
               FLAG field. INT can be specified in hex by beginning with `0x'
               (i.e. /^0x[0-9A-F]+/) or in octal by beginning with `0' (i.e.
               /^0[0-7]+/) [0].

       -F INT  Do not output alignments with any bits set in INT present in
               the FLAG field. INT can be specified in hex by beginning with
               `0x' (i.e. /^0x[0-9A-F]+/) or in octal by beginning with `0'
               (i.e. /^0[0-7]+/) [0x900]. The default of 0x900 filters out
               secondary and supplementary alignments.

       -G INT  Only EXCLUDE reads with all of the bits set in INT present in
               the FLAG field. INT can be specified in hex by beginning with
               `0x' (i.e. /^0x[0-9A-F]+/) or in octal by beginning with `0'
               (i.e. /^0[0-7]+/) [0].
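
       The three FLAG filters can be sketched as bit tests. This follows the
       wording of -f, -F and -G above and is not taken from the samtools
       source; the function name flag_passes is hypothetical, and treating
       a -G value of 0 as "filter disabled" is an assumption based on its
       default.

```shell
# Sketch only: succeed (exit status 0) if a record with FLAG $1 would be kept.
#   $2 = -f value (all bits required)
#   $3 = -F value (any bit excludes)
#   $4 = -G value (excluded only if all bits are set; 0 assumed to disable)
# Shell arithmetic accepts the same 0x (hex) and leading-0 (octal) notation.
flag_passes() {
    flag=$1 req=$2 excl_any=$3 excl_all=$4
    # -f: every bit of INT must be present in FLAG
    [ $(( $flag & $req )) -eq $(( $req )) ] || return 1
    # -F: no bit of INT may be present in FLAG
    [ $(( $flag & $excl_any )) -eq 0 ] || return 1
    # -G: exclude only when all bits of INT are present in FLAG
    if [ $(( $excl_all )) -ne 0 ] &&
       [ $(( $flag & $excl_all )) -eq $(( $excl_all )) ]; then
        return 1
    fi
    return 0
}

# With the defaults (-f 0 -F 0x900 -G 0):
flag_passes 99 0 0x900 0   && echo "99 kept"
flag_passes 2147 0 0x900 0 || echo "2147 dropped (supplementary, 0x800)"
```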

       -i      Add an Illumina Casava 1.8 format entry to the header (e.g.
               1:N:0:ATCACG).

       -c [0..9]
               Set the compression level when writing gz or bgzf fastq files.

       --i1 FILE
               Write first index reads to FILE.

       --i2 FILE
               Write second index reads to FILE.

       --barcode-tag TAG
               Aux tag to find index reads in [default: BC].

       --quality-tag TAG
               Aux tag to find index quality in [default: QT].

       -@, --threads INT
               Number of input/output compression threads to use in addition
               to the main thread [0].

       --index-format STR
               String describing how to parse the barcode and quality tags.
               For example:

               i14i8   the first 14 characters are index 1, the next 8
                       characters are index 2

               n8i14   ignore the first 8 characters, and use the next 14
                       characters for index 1

               If the tag contains a separator, then the numeric part can be
               replaced with '*' to mean 'read until the separator or end of
               tag', for example:

               n*i*    ignore the left part of the tag until the separator,
                       then use the second part
155

EXAMPLES

       Starting from a coordinate-sorted file, output paired reads to
       separate files, discarding singletons, supplementary and secondary
       reads. The resulting files can be used with, for example, the bwa
       aligner.

           samtools collate -u -O in_pos.bam | \
           samtools fastq -1 paired1.fq -2 paired2.fq -0 /dev/null -s /dev/null -n

       Starting with a name-collated file, output paired and singleton reads
       in a single file, discarding supplementary and secondary reads. To
       get all of the reads in a single file, it is necessary to redirect
       the output of samtools fastq. The output file is suitable for use
       with bwa mem -p, which understands interleaved files containing a
       mixture of paired and singleton reads.

           samtools fastq -0 /dev/null in_name.bam > all_reads.fq

       Output paired reads in a single file, discarding supplementary and
       secondary reads. Save any singletons in a separate file. Append /1
       and /2 to read names. This format is suitable for use by NextGenMap
       when using its -p and -q options. With this aligner, paired reads
       must be mapped separately from the singletons.

           samtools fastq -0 /dev/null -s single.fq -N in_name.bam > paired.fq

BUGS

       o The way of specifying output files is far too complicated and easy to
         get wrong.

AUTHOR

       Written by Heng Li, with modifications by Martin Pollard and Jennifer
       Liddle, all from the Sanger Institute.

SEE ALSO

       samtools(1), samtools-faidx(1), samtools-fqidx(1), samtools-import(1)

       Samtools website: <http://www.htslib.org/>

samtools-1.15.1                  7 April 2022                samtools-fasta(1)