samtools-import(1) Bioinformatics tools samtools-import(1)

NAME
samtools import - converts FASTQ files to unmapped SAM/BAM/CRAM

SYNOPSIS
samtools import [options] [ fastq_file ... ]

DESCRIPTION
Reads one or more FASTQ files and converts them to unmapped SAM, BAM or
CRAM. The input files may be automatically decompressed if they have a
.gz extension.

The simplest usage in the absence of any other command line options is
to provide one or two input files.

If a single file is given, it will be interpreted as single-ended
sequencing data unless the read names end with /1 and /2, in which case
the records will be labelled as PAIRED with the READ1 or READ2 BAM flag
set. If a pair of filenames is given, the files will be read
alternately to produce an interleaved output file, also setting the
PAIRED and READ1 / READ2 flags.

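The PAIRED, READ1 and READ2 flags named above are bits in the SAM FLAG
field. As a minimal sketch for inspecting them, the helper describe_flag
below decodes a few FLAG bits using the bit values from the SAM
specification. describe_flag is a hypothetical name and not part of
samtools, and out.bam is an assumed output file name.

```shell
#!/bin/sh
# describe_flag: hypothetical helper, not a samtools command.
# Prints the names of selected SAM FLAG bits that are set in $1.
describe_flag() {
    flag=$1
    names=""
    # bit:name pairs, bit values as defined in the SAM specification
    for pair in 1:PAIRED 2:PROPER_PAIR 4:UNMAP 8:MUNMAP \
                64:READ1 128:READ2 512:QCFAIL; do
        bit=${pair%%:*}
        name=${pair#*:}
        if [ $(( $flag & $bit )) -ne 0 ]; then
            names="$names $name"
        fi
    done
    echo "${names# }"
}

# Assumed usage: list each distinct FLAG value found in out.bam
# (an assumed file name) together with its decoded bits.
if command -v samtools >/dev/null 2>&1 && [ -f out.bam ]; then
    samtools view out.bam | cut -f 2 | sort -n | uniq |
    while read -r f; do
        printf '%s\t%s\n' "$f" "$(describe_flag "$f")"
    done
fi
```

The samtools flags subcommand performs a similar conversion and is
likely the better choice when samtools is installed; the helper only
makes the bit arithmetic explicit.
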
The filenames may be explicitly labelled using -1 and -2 for READ1 and
READ2 data files, -s for an interleaved paired file (or one half of a
paired-end run), and -0 for unpaired data. Explicit index files are
specified with --i1 and --i2. These correspond to typical output
produced by Illumina bcl2fastq and match the output from samtools
fastq. The index files will set both the BC barcode tag and its
associated QT quality tag.

The Illumina CASAVA identifiers may also be processed when the -i
option is given. These identifiers supply the READ1 / READ2 status,
whether or not the read failed processing (QCFAIL flag), and the
barcode sequence, which will be added to the BC tag. This can be an
alternative to explicitly specifying the index files, although note
that doing so will not fill out the barcode quality tag.

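To make the CASAVA fields concrete, the sketch below splits the comment
part of a CASAVA 1.8 style read header into its four colon-separated
fields: read number, filter flag, control number and index sequence.
The header value is illustrative, and the mapping to flags in the
comments follows the description above rather than the samtools source.

```shell
#!/bin/sh
# Example CASAVA 1.8 style read header (illustrative values).
header='@EAS139:136:FC706VJ:2:2104:15343:197393 1:Y:18:ATCACG'

# The comment is everything after the first space.
comment=${header#* }

# Split the comment on ':' into its four fields.
IFS=: read -r read_no filtered control barcode <<EOF
$comment
EOF

# Expected mapping under -i, per the text above (an assumption about
# the exact behaviour): read_no 1 or 2 selects READ1 or READ2,
# filtered=Y marks the read as QCFAIL, barcode goes to the BC tag.
echo "read number : $read_no"
echo "filtered    : $filtered"
echo "control     : $control"
echo "barcode     : $barcode"
```
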

OPTIONS
-s FILE Import paired interleaved data from FILE.

-0 FILE Import single-ended (unpaired) data from FILE.

Operationally there is no difference between the -s and -0 options:
given an interleaved file with /1 and /2 read name endings, both will
correctly set the PAIRED, READ1 and READ2 flags, and given data with no
suffixes and no CASAVA identifiers being processed, both will leave the
data as unpaired. They are both provided to allow more descriptive
command lines and to improve the header comment describing the samtools
fastq decode command.

-1 FILE, -2 FILE
Import paired data from a pair of FILEs. The BAM flag PAIRED will be
set, but not PROPER_PAIR as it has not been aligned. READ1 and READ2
will be stored in their original, unmapped, orientation.

--i1 FILE, --i2 FILE
Specifies index barcodes associated with the -1 and -2 files. These
will be appended to READ1 and READ2 records in the barcode (BC) and
quality (QT) tags.

-i Specifies that the Illumina CASAVA identifiers should be processed.
This may set the READ1, READ2 and QCFAIL flags and add a barcode tag.

-N, --name2
Assume the read names are encoded in the SRA and ENA formats where the
first word is an automatically generated name with the second field
being the original name. This option extracts that second field
instead.

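As an illustration of which field -N refers to, the sketch below pulls
the second whitespace-separated word out of an SRA-style read header.
The header value and the file names in.fq and out.bam are assumptions,
and the awk call only mimics the selection described above; it is not
how samtools itself parses the name.

```shell
#!/bin/sh
# Example SRA-style header (illustrative values): the generated name
# comes first, the original instrument name second.
header='@SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36'

# Take the second whitespace-separated field.
name2=$(printf '%s\n' "$header" | awk '{ print $2 }')
echo "$name2"

# Assumed usage with an SRA-derived file in.fq (assumed file name).
if command -v samtools >/dev/null 2>&1 && [ -f in.fq ]; then
    samtools import -N in.fq -o out.bam
fi
```
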
--barcode-tag TAG
Changes the auxiliary tag used for barcode sequence. Defaults to BC.

--quality-tag TAG
Changes the auxiliary tag used for barcode quality. Defaults to QT.

-o FILE Output to FILE. By default output will be written to stdout.

--order TAG
When outputting a SAM record, also output an integer tag containing the
Nth record number. This may be useful if the data is to be sorted or
collated in some manner and we wish this to be reversible. In this case
the tag may be used with samtools sort -t TAG to regenerate the
original input order.

-r RG_line, --rg-line RG_line
A complete @RG header line may be specified, with or without the
initial "@RG" component. If specified this will also use the ID field
from RG_line in each SAM record's RG auxiliary tag.

If specified multiple times this appends to the RG line, automatically
adding tabs between invocations.

-R RG_ID, --rg RG_ID
This is a shorter form of the option above, equivalent to --rg-line
ID:RG_ID. If both are specified then this option is ignored.

-u Output BAM or CRAM as uncompressed data.

-T TAGLIST
This looks for any SAM-format auxiliary tags in the comment field of a
fastq read name. These must match the
<alpha-num><alpha-num>:<type>:<data> pattern as specified in the SAM
specification. TAGLIST can be blank or * to indicate all tags should be
copied to the output, otherwise it is a comma-separated list of tag
types to include with all others being discarded.

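The tag pattern used by -T can be previewed with ordinary text tools.
The sketch below is a rough approximation: it assumes the comment
fields are tab-separated and uses a regular expression modelled on the
SAM specification's TAG:TYPE:VALUE layout. The comment value and the
file names in.fq and out.bam are assumptions.

```shell
#!/bin/sh
# Example FASTQ comment with tab-separated fields (illustrative values).
comment=$(printf 'BC:Z:ACGT\tRG:Z:grp1\tlength=36')

# Keep only the fields shaped like TAG:TYPE:VALUE, where TAG is a
# letter followed by a letter or digit and TYPE is one of AifZHB.
matches=$(printf '%s\n' "$comment" | tr '\t' '\n' |
          grep -E '^[A-Za-z][A-Za-z0-9]:[AifZHB]:.')
echo "$matches"

# Assumed usage: copy only the BC and RG tags into the output.
if command -v samtools >/dev/null 2>&1 && [ -f in.fq ]; then
    samtools import -T BC,RG in.fq -o out.bam
fi
```

In this example the length=36 field does not match the pattern, so it
would not be treated as an auxiliary tag.
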
EXAMPLES
Convert a single-ended fastq file to an unmapped CRAM. Both of these
commands perform the same action.

    samtools import -0 in.fastq -o out.cram
    samtools import in.fastq > out.cram

Convert a pair of Illumina fastqs containing CASAVA identifiers to BAM,
adding the barcode information to the BC auxiliary tag.

    samtools import -i -1 in_1.fastq -2 in_2.fastq -o out.bam
    samtools import -i in_[12].fastq > out.bam

Specify the read group. These commands are equivalent.

    samtools import -r "$(echo -e 'ID:xyz\tPL:ILLUMINA')" in.fq
    samtools import -r "$(echo -e '@RG\tID:xyz\tPL:ILLUMINA')" in.fq
    samtools import -r ID:xyz -r PL:ILLUMINA in.fq

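The echo -e form used in the read group commands is not handled the
same way by every shell. As a hedged alternative, the tab-separated
line can be built with printf, which should behave consistently across
POSIX shells. The variable name rg_line and the file names in.fq and
out.bam are assumptions.

```shell
#!/bin/sh
# Build a tab-separated read group line with printf instead of echo -e.
rg_line=$(printf 'ID:%s\tPL:%s' xyz ILLUMINA)

# Show the result with the tab made visible as a literal "\t".
printf '%s\n' "$rg_line" | sed 's/	/\\t/g'

# Assumed usage: pass the line to samtools import.
if command -v samtools >/dev/null 2>&1 && [ -f in.fq ]; then
    samtools import -r "$rg_line" in.fq -o out.bam
fi
```
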
Create an unmapped BAM file from a set of 4 Illumina fastqs from
bcl2fastq, consisting of two read and two index files. The CASAVA
identifier is used only for setting QC pass / failure status.

    samtools import -i -1 R1.fq -2 R2.fq --i1 I1.fq --i2 I2.fq -o out.bam

Convert a pair of CASAVA barcoded fastq files to unmapped CRAM with an
incremental record counter, then sort this by minimiser in order to
reduce file size. The reversal process is also shown using samtools
sort and samtools fastq.

    samtools import -i in_1.fq in_2.fq --order ro -O bam,level=0 | \
        samtools sort -@4 -M -o out.srt.cram -

    samtools sort -@4 -O bam -u -t ro out.srt.cram | \
        samtools fastq -1 out_1.fq -2 out_2.fq -i --index-format "i*i*"

AUTHOR
Written by James Bonfield of the Wellcome Sanger Institute.

SEE ALSO
samtools(1), samtools-fastq(1)

Samtools website: <http://www.htslib.org/>

samtools-1.15.1 7 April 2022 samtools-import(1)