samtools-sort(1)              Bioinformatics tools              samtools-sort(1)

NAME
samtools sort - sorts SAM/BAM/CRAM files

SYNOPSIS
samtools sort [-l level] [-u] [-m maxMem] [-o out.bam] [-O format] [-M]
              [-K kmerLen] [-n] [-t tag] [-T tmpprefix] [-@ threads]
              [in.sam|in.bam|in.cram]

DESCRIPTION
Sort alignments by leftmost coordinates, or by read name when -n is used.
An appropriate @HD-SO sort order header tag will be added or an existing
one updated if necessary.

The sorted output is written to standard output by default, or to the
specified file (out.bam) when -o is used. This command will also create
temporary files tmpprefix.%d.bam as needed when the entire alignment data
cannot fit into memory (as controlled via the -m option).

Consider using samtools collate instead if you need name collated data
without a full lexicographical sort.

Note that if the sorted output file is to be indexed with samtools index,
the default coordinate sort must be used. Thus the -n and -t options are
incompatible with samtools index.

OPTIONS
-K INT     Set the k-mer size to be used by the -M option. [20]

-l INT     Set the desired compression level for the final output file,
           ranging from 0 (uncompressed) or 1 (fastest but minimal
           compression) to 9 (best compression but slowest to write),
           similarly to gzip(1)'s compression level setting.

           If -l is not used, the default compression level will apply.

-u         Set the compression level to 0, for uncompressed output.
           This is a synonym for -l 0.

-m INT     Approximately the maximum required memory per thread,
           specified either in bytes or with a K, M, or G suffix. [768 MiB]

           To prevent sort from creating a huge number of temporary
           files, it enforces a minimum value of 1M for this setting.
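
The -m value syntax can be illustrated with a minimal Python sketch. This
is not samtools' own parser. It assumes that the K, M, and G suffixes are
binary multiples (2^10, 2^20, 2^30), that lower-case suffixes are also
accepted, and that a value below the 1M floor is rejected with an error;
the function name parse_max_mem is hypothetical.

```python
# Illustrative only: not samtools' own option parser.
MIN_MEM = 1 << 20  # the documented 1M floor

# Assumption: suffixes are binary multiples (K = 2**10, M = 2**20, G = 2**30).
_SUFFIX_SHIFT = {"K": 10, "M": 20, "G": 30}


def parse_max_mem(text: str) -> int:
    """Convert a -m style value such as '768M' or '1048576' to bytes."""
    text = text.strip()
    shift = 0
    if text and text[-1].upper() in _SUFFIX_SHIFT:
        shift = _SUFFIX_SHIFT[text[-1].upper()]
        text = text[:-1]
    value = int(text) << shift
    if value < MIN_MEM:
        # Assumption: a too-small value is rejected rather than silently raised.
        raise ValueError(f"-m value of {value} bytes is below the 1M minimum")
    return value
```

Under these assumptions, parse_max_mem("768M") yields the documented
default of 768 MiB expressed in bytes.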

-M         Sort unmapped reads (those in chromosome "*") by their sequence
           minimiser (Schleimer et al., 2003; Roberts et al., 2004), also
           reverse complementing as appropriate. This has the effect of
           collating some similar data together, improving the
           compressibility of the unmapped sequence. The minimiser kmer
           size is adjusted using the -K option. Note data compressed in
           this manner may need to be name collated prior to conversion
           back to fastq.

           Mapped sequences are sorted by chromosome and position.
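
To make the idea of a sequence minimiser concrete, here is a small Python
sketch. It is an illustration of the general technique, not the -M
implementation: it assumes the minimiser is the lexicographically smallest
k-mer found on either strand, whereas a real implementation may rank
k-mers differently (for example by a hash). The function names are
hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def minimiser(seq: str, k: int = 20):
    """Return (kmer, strand) for the smallest k-mer on either strand.

    strand is '+' if the k-mer came from seq and '-' if it came from the
    reverse complement. Returns None when seq is shorter than k.
    """
    best = None
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for i in range(len(s) - k + 1):
            candidate = (s[i:i + k], strand)
            if best is None or candidate < best:
                best = candidate
    return best
```

With k = 3, the reads TTTACG and CGTAAA share the minimiser AAA (found on
the '-' strand of the first and the '+' strand of the second), so sorting
on it would place them next to each other once the first is reverse
complemented.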

-n         Sort by read names (i.e., the QNAME field) rather than by
           chromosomal coordinates.

-t TAG     Sort first by the value in the alignment tag TAG, then by
           position or name (if also using -n).

-o FILE    Write the final sorted output to FILE, rather than to standard
           output.

-O FORMAT  Write the final output as sam, bam, or cram.

           By default, samtools tries to select a format based on the -o
           filename extension; if output is to standard output or no
           format can be deduced, bam is selected.
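
The format selection described for -O can be sketched in Python as
follows. This is a reading of the text above rather than samtools' code:
the function name choose_output_format is hypothetical, and the sketch
assumes that only the exact lower-case extensions .sam, .bam, and .cram
are recognised.

```python
import os

# Assumption: only these exact lower-case extensions are recognised.
_KNOWN = {".sam": "sam", ".bam": "bam", ".cram": "cram"}


def choose_output_format(out_path=None, explicit=None):
    """Pick sam, bam, or cram following the documented precedence."""
    if explicit is not None:         # -O FORMAT was given
        return explicit
    if out_path is None:             # writing to standard output
        return "bam"
    ext = os.path.splitext(out_path)[1]
    return _KNOWN.get(ext, "bam")    # no format deduced: fall back to bam
```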

-T PREFIX  Write temporary files to PREFIX.nnnn.bam, or if the specified
           PREFIX is an existing directory, to
           PREFIX/samtools.mmm.mmm.tmp.nnnn.bam, where mmm is unique to
           this invocation of the sort command.

           By default, any temporary files are written alongside the
           output file, as out.bam.tmp.nnnn.bam, or if output is to
           standard output, in the current directory as
           samtools.mmm.mmm.tmp.nnnn.bam.

-@ INT     Set number of sorting and compression threads. By default,
           operation is single-threaded.

--no-PG    Do not add a @PG line to the header of the output file.

Ordering Rules

The following rules are used for ordering records.

If option -t is in use, records are first sorted by the value of the given
alignment tag, and then by position or name (if using -n). For example,
“-t RG” will make read group the primary sort key. The rules for ordering
by tag are:

• Records that do not have the tag are sorted before ones that do.

• If the types of the tags are different, they will be sorted so that
  single character tags (type A) come before array tags (type B), then
  string tags (types H and Z), then numeric tags (types f and i).

• Numeric tags (types f and i) are compared by value. Note that
  comparisons of floating-point values are subject to issues of rounding
  and precision.

• String tags (types H and Z) are compared based on the binary contents of
  the tag using the C strcmp(3) function.

• Character tags (type A) are compared by binary character value.

• No attempt is made to compare tags of other types; notably, type B array
  values will not be compared.
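
The tag rules above can be expressed as a sort key. The following Python
sketch is an illustration, not samtools' comparator: it assumes each tag
is available as a (type, value) pair, with None standing for a missing
tag, and the name tag_sort_key is hypothetical. Records whose keys compare
equal would then fall through to the position or name ordering.

```python
# Rank of each tag type; lower sorts first. A missing tag sorts before all.
_TYPE_RANK = {"A": 0, "B": 1, "H": 2, "Z": 2, "f": 3, "i": 3}


def tag_sort_key(tag):
    """Build a sort key from a (type, value) pair, or None if the tag is absent."""
    if tag is None:
        return (-1, 0)
    tag_type, value = tag
    rank = _TYPE_RANK[tag_type]
    if tag_type == "A":
        return (rank, ord(value))       # binary character value
    if tag_type == "B":
        return (rank, 0)                # arrays are not compared with each other
    if tag_type in ("H", "Z"):
        return (rank, value.encode())   # byte-wise, in the manner of strcmp(3)
    return (rank, value)                # numeric: compare by value
```

For example, sorting the tags ("i", 5), ("Z", "grpB"), None, ("A", "x"),
and ("Z", "grpA") with this key places the missing tag first, then the
character tag, then the two strings in byte order, then the integer.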

When the -n option is present, records are sorted by name. Names are
compared so as to give a “natural” ordering, i.e. sections consisting of
digits are compared numerically while all other sections are compared
based on their binary representation. This means “a1” will come before
“b1” and “a9” will come before “a10”. Records with the same name will be
ordered according to the values of the READ1 and READ2 flags (see samtools
flags).
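
A natural ordering of this kind can be sketched in Python by splitting
each name into digit and non-digit runs. This is an illustration under
stated assumptions, not samtools' comparator: names are assumed to be
ASCII, details such as leading zeros are not modelled, and READ1 is
assumed to sort before READ2 because its flag bit (0x40) is numerically
smaller than that of READ2 (0x80). The function names are hypothetical.

```python
import re

_CHUNKS = re.compile(r"([0-9]+)")
READ1, READ2 = 0x40, 0x80  # SAM flag bits for first and last segment


def natural_key(name: str):
    """Split a name into digit and non-digit runs for natural comparison."""
    parts = _CHUNKS.split(name)
    # Odd indices hold the digit runs captured by the pattern; even indices
    # hold the (possibly empty) text between them.
    return [int(p) if i % 2 else p.encode() for i, p in enumerate(parts)]


def name_sort_key(name: str, flag: int):
    """Order by natural name, then by the READ1/READ2 flag bits."""
    return (natural_key(name), flag & (READ1 | READ2))
```

Sorting the names a10, b1, a9, and a1 with natural_key gives a1, a9, a10,
b1, matching the examples in the paragraph above.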

When the -n option is not present, reads are sorted by reference
(according to the order of the @SQ header records), then by position in
the reference, and then by the REVERSE flag.
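
The default coordinate ordering can likewise be written as a sort key.
This Python sketch covers mapped records only and is not samtools' code:
the name coordinate_key is hypothetical, and placing forward-strand
records before reverse-strand ones at the same position is an assumption
about how the REVERSE flag (0x10) is compared.

```python
REVERSE = 0x10  # SAM flag bit for a reverse-strand alignment


def coordinate_key(ref_order, ref_name, pos, flag):
    """Sort key: @SQ index, then position, then the REVERSE flag.

    ref_order maps each reference name to its index among the @SQ lines.
    Assumption: forward-strand records sort before reverse-strand ones.
    """
    return (ref_order[ref_name], pos, 1 if flag & REVERSE else 0)
```

Note that references follow the header order rather than alphabetical
order: with @SQ lines listing chr2 before chr1, every chr2 record sorts
ahead of every chr1 record.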

Note

Historically samtools sort also accepted a less flexible way of specifying
the final and temporary output filenames:

samtools sort [-f] [-o] in.bam out.prefix

This has now been removed. The previous out.prefix argument (and -f
option, if any) should be changed to an appropriate combination of -T
PREFIX and -o FILE. The previous -o option should be removed, as output
defaults to standard output.

AUTHOR
Written by Heng Li from the Sanger Institute with numerous subsequent
modifications.

SEE ALSO
samtools(1), samtools-collate(1), samtools-merge(1)

Samtools website: <http://www.htslib.org/>

samtools-1.13                    7 July 2021                    samtools-sort(1)