samtools-import(1)           Bioinformatics tools           samtools-import(1)

NAME

       samtools import - converts FASTQ files to unmapped SAM/BAM/CRAM

SYNOPSIS

       samtools import [options] [ fastq_file ... ]

DESCRIPTION

       Reads one or more FASTQ files and converts them to unmapped SAM, BAM or
       CRAM.  Input files with a .gz extension are decompressed automatically.

       The simplest usage, in the absence of any other command line options,
       is to provide one or two input files.

       If a single file is given, it will be interpreted as single-ended
       sequencing data unless the read names end with /1 and /2, in which
       case the reads will be labelled as PAIRED with the READ1 or READ2 BAM
       flag set.  If a pair of filenames is given, the files will be read
       alternately to produce an interleaved output file, also setting the
       PAIRED and READ1 / READ2 flags.
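
       The flag logic described above can be sketched in shell.  This is an
       illustration only, not the samtools implementation: flag_for_name is
       a hypothetical helper, the bit values come from the SAM
       specification, and it is an assumption that the unmapped (0x4) and
       mate-unmapped (0x8) bits accompany PAIRED on imported records.

```shell
#!/bin/sh
# SAM FLAG bits (values from the SAM specification).
PAIRED=1     # 0x1
UNMAP=4      # 0x4,  read unmapped
MUNMAP=8     # 0x8,  mate unmapped (assumed to be set for paired reads)
READ1=64     # 0x40
READ2=128    # 0x80

# Hypothetical helper: print the FLAG for an unmapped read, chosen from
# the /1 or /2 suffix of its FASTQ read name.
flag_for_name() {
    case "$1" in
        */1) echo $((PAIRED | UNMAP | MUNMAP | READ1)) ;;
        */2) echo $((PAIRED | UNMAP | MUNMAP | READ2)) ;;
        *)   echo $((UNMAP)) ;;
    esac
}

flag_for_name "read7/1"   # 77
flag_for_name "read7/2"   # 141
flag_for_name "read7"     # 4
```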

       The filenames may be explicitly labelled using -1 and -2 for READ1 and
       READ2 data files, -s for an interleaved paired file (or one half of a
       paired-end run) and -0 for unpaired data.  Index files may be given
       explicitly with --i1 and --i2.  These correspond to typical output
       produced by Illumina bcl2fastq and match the output from samtools
       fastq.  The index files set both the BC barcode tag and its associated
       QT quality tag.

       The Illumina CASAVA identifiers may also be processed when the -i
       option is given.  The identifier is parsed for the READ1 / READ2
       status, whether or not the read failed processing (QCFAIL flag), and
       the barcode sequence, which is added to the BC tag.  This can be an
       alternative to explicitly specifying the index files, although note
       that doing so will not fill out the barcode quality tag.
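
       As a rough illustration of what -i extracts, the sketch below splits
       a CASAVA 1.8 style comment of the form
       <read>:<is filtered>:<control number>:<barcode>.  parse_casava is a
       hypothetical helper and its output layout is invented for this
       example; it does not reproduce samtools' own parsing.

```shell
#!/bin/sh
# Hypothetical helper: split a CASAVA comment such as "1:N:0:ATCACG".
parse_casava() {
    comment=$1
    read_no=${comment%%:*};  rest=${comment#*:}   # 1 or 2
    filtered=${rest%%:*};    rest=${rest#*:}      # Y = read failed filtering
    control=${rest%%:*}                           # control number (unused here)
    barcode=${rest#*:}                            # index sequence

    case "$read_no" in
        1) which=READ1 ;;
        2) which=READ2 ;;
        *) which=UNKNOWN ;;
    esac
    case "$filtered" in
        Y) qc=QCFAIL ;;
        *) qc=PASS ;;
    esac
    echo "$which $qc BC:Z:$barcode"
}

parse_casava "1:N:0:ATCACG"   # READ1 PASS BC:Z:ATCACG
parse_casava "2:Y:0:ATCACG"   # READ2 QCFAIL BC:Z:ATCACG
```

       Only the barcode sequence is available in the identifier, which is
       why no quality tag can be filled in this way.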

OPTIONS

       -s FILE Import paired interleaved data from FILE.

       -0 FILE Import single-ended (unpaired) data from FILE.

               Operationally there is no difference between the -s and -0
               options: given an interleaved file with /1 and /2 read name
               endings, both will correctly set the PAIRED, READ1 and READ2
               flags, and given data with no suffixes and no CASAVA
               identifiers being processed, both will leave the data as
               unpaired.  Both options are provided to allow more
               descriptive command lines and to improve the header comment
               describing the samtools fastq decode command.

       -1 FILE, -2 FILE
               Import paired data from a pair of FILEs.  The BAM flag PAIRED
               will be set, but not PROPER_PAIR as it has not been aligned.
               READ1 and READ2 will be stored in their original, unmapped,
               orientation.

       --i1 FILE, --i2 FILE
               Specifies index barcodes associated with the -1 and -2 files.
               These will be appended to READ1 and READ2 records in the
               barcode (BC) and quality (QT) tags.

       -i      Specifies that the Illumina CASAVA identifiers should be
               processed.  This may set the READ1, READ2 and QCFAIL flags and
               add a barcode tag.

       -N, --name2
               Assume the read names are encoded in the SRA and ENA formats,
               where the first word is an automatically generated name and
               the second field is the original name.  This option extracts
               that second field instead.

       --barcode-tag TAG
               Changes the auxiliary tag used for the barcode sequence.
               Defaults to BC.

       --quality-tag TAG
               Changes the auxiliary tag used for the barcode quality.
               Defaults to QT.

       -o FILE Output to FILE.  By default output will be written to stdout.

       --order TAG
               When outputting a SAM record, also output an integer tag TAG
               holding the record's ordinal number in the input.  This is
               useful if the data is to be sorted or collated in some manner
               and the operation needs to be reversible: the tag may then be
               used with samtools sort -t TAG to regenerate the original
               input order.

       -r RG_line, --rg-line RG_line
               A complete @RG header line may be specified, with or without
               the initial "@RG" component.  If specified, the ID field from
               RG_line will also be used as the RG auxiliary tag of each SAM
               record.

               If specified multiple times, each value is appended to the RG
               line, with tabs automatically added between invocations.

       -R RG_ID, --rg RG_ID
               This is a shorter form of the option above, equivalent to
               --rg-line ID:RG_ID.  If both are specified then this option is
               ignored.

       -u      Output BAM or CRAM as uncompressed data.

       -T TAGLIST
               This looks for any SAM-format auxiliary tags in the comment
               field of a fastq read name.  These must match the
               <alpha-num><alpha-num>:<type>:<data> pattern as specified in
               the SAM specification.  TAGLIST can be blank or * to indicate
               that all tags should be copied to the output; otherwise it is
               a comma-separated list of tag types to include, with all
               others being discarded.
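
       The selection rule for -T can be sketched as follows.  filter_tags is
       a hypothetical helper and is not how samtools implements the option.
       The tag shape follows the SAM specification (a letter, then a letter
       or digit, then a type from A, i, f, Z, H, B).  Treating TAGLIST
       entries as two-character tag names such as BC or RG is an assumption
       of this sketch.

```shell
#!/bin/sh
# Hypothetical helper: filter_tags TAGLIST FIELD...
# Prints the comment fields that look like SAM auxiliary tags and are
# selected by TAGLIST ("" or "*" selects every tag).
filter_tags() {
    taglist=$1
    shift
    kept=""
    for field in "$@"; do
        # Skip anything not shaped like XX:T:data.
        if ! printf '%s\n' "$field" |
                grep -Eq '^[A-Za-z][A-Za-z0-9]:[AifZHB]:'; then
            continue
        fi
        tag=${field%%:*}
        case "$taglist" in
            ""|"*") keep=yes ;;
            *)
                case ",$taglist," in
                    *",$tag,"*) keep=yes ;;
                    *)          keep=no ;;
                esac
                ;;
        esac
        if [ "$keep" = yes ]; then
            kept="${kept:+$kept }$field"
        fi
    done
    printf '%s\n' "$kept"
}

filter_tags "*"  "BC:Z:ACGT" "RG:Z:grp1" "length=36"   # BC:Z:ACGT RG:Z:grp1
filter_tags "RG" "BC:Z:ACGT" "RG:Z:grp1" "length=36"   # RG:Z:grp1
```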

EXAMPLES

       Convert a single-ended fastq file to an unmapped CRAM.  Both of these
       commands perform the same action.

           samtools import -0 in.fastq -o out.cram
           samtools import in.fastq > out.cram

       Convert a pair of Illumina fastqs containing CASAVA identifiers to BAM,
       adding the barcode information to the BC auxiliary tag.

           samtools import -i -1 in_1.fastq -2 in_2.fastq -o out.bam
           samtools import -i in_[12].fastq > out.bam

       Specify the read group.  These commands are equivalent.

           samtools import -r "$(echo -e 'ID:xyz\tPL:ILLUMINA')" in.fq
           samtools import -r "$(echo -e '@RG\tID:xyz\tPL:ILLUMINA')" in.fq
           samtools import -r ID:xyz -r PL:ILLUMINA in.fq
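
       Why the three forms above agree can be shown with a small sketch that
       joins repeated -r values with tabs and adds the "@RG" prefix only
       when it is missing.  build_rg_line is a hypothetical helper written
       for this illustration and is not part of samtools.

```shell
#!/bin/sh
# Hypothetical helper: build one @RG header line from one or more
# fragments, mimicking repeated -r options.
build_rg_line() {
    tab=$(printf '\t')
    line=""
    for part in "$@"; do
        line="${line:+$line$tab}$part"    # tab-join the fragments
    done
    case "$line" in
        "@RG$tab"*) ;;                    # record type already present
        *) line="@RG$tab$line" ;;
    esac
    printf '%s\n' "$line"
}

# All three calls print the same tab-separated header line.
build_rg_line "$(printf 'ID:xyz\tPL:ILLUMINA')"
build_rg_line "$(printf '@RG\tID:xyz\tPL:ILLUMINA')"
build_rg_line ID:xyz PL:ILLUMINA
```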

       Create an unmapped BAM file from a set of 4 Illumina fastqs from
       bcl2fastq, consisting of two read files and two index files.  The
       CASAVA identifier is used only for setting QC pass / failure status.

           samtools import -i -1 R1.fq -2 R2.fq --i1 I1.fq --i2 I2.fq -o out.bam

       Convert a pair of CASAVA barcoded fastq files to unmapped CRAM with an
       incremental record counter, then sort this by minimiser in order to
       reduce file size.  The reversal process is also shown using samtools
       sort and samtools fastq.

           samtools import -i in_1.fq in_2.fq --order ro -O bam,level=0 | \
               samtools sort -@4 -M -o out.srt.cram -

           samtools sort -@4 -O bam -u -t ro out.srt.cram | \
               samtools fastq -1 out_1.fq -2 out_2.fq -i --index-format "i*i*"
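
       The idea behind the record counter can be shown without samtools:
       number each record on input, reorder freely, then sort on the number
       to recover the input order.  The sketch below uses plain text lines
       in place of reads; tag_order and restore_order are hypothetical
       helpers standing in for --order and samtools sort -t respectively.

```shell
#!/bin/sh
# Prefix each input line with its 1-based position and a tab.
tag_order() {
    awk '{ print NR "\t" $0 }'
}

# Sort numerically on the position column, then drop it again.
restore_order() {
    sort -n -k1,1 | cut -f2-
}

# Reorder by name (a stand-in for any sort or collate step), then
# restore the original readC, readA, readB order.
printf '%s\n' readC readA readB |
    tag_order |
    sort -k2,2 |
    restore_order
```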

AUTHOR

       Written by James Bonfield of the Wellcome Sanger Institute.

SEE ALSO

       samtools(1), samtools-fastq(1)

       Samtools website: <http://www.htslib.org/>

samtools-1.15.1                  7 April 2022               samtools-import(1)