samtools-sort(1)             Bioinformatics tools             samtools-sort(1)

NAME

       samtools sort - sorts SAM/BAM/CRAM files

SYNOPSIS

       samtools sort [-l level] [-u] [-m maxMem] [-o out.bam] [-O format] [-M]
       [-K kmerLen] [-n] [-t tag] [-T tmpprefix] [-@ threads]
       [in.sam|in.bam|in.cram]

DESCRIPTION

       Sort alignments by leftmost coordinates, or by read name when -n is
       used. An appropriate @HD SO (sort order) header tag will be added, or
       an existing one updated, if necessary.

       The sorted output is written to standard output by default, or to the
       specified file (out.bam) when -o is used. This command will also
       create temporary files named tmpprefix.nnnn.bam (see -T) as needed
       when the alignment data cannot all fit into the memory allowed by the
       -m option.

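       The spilling behaviour described above follows the general external
       merge sort pattern. The Python sketch below illustrates that technique
       only and is not samtools' implementation: it sorts plain strings, and
       its max_in_memory parameter is a hypothetical record count standing in
       for the byte budget that -m sets. Sorted runs are written to temporary
       files and then combined with a k-way merge (heapq.merge).

```python
import heapq
import tempfile
from typing import IO, Iterable, Iterator, List


def _spill(chunk: List[str]) -> IO[str]:
    """Sort one in-memory chunk and write it to an anonymous temporary file."""
    run = tempfile.TemporaryFile(mode="w+")
    for record in sorted(chunk):
        run.write(record + "\n")
    run.seek(0)  # rewind so the merge phase can read the run back
    return run


def _read_run(run: IO[str]) -> Iterator[str]:
    """Yield the records of one sorted run, without trailing newlines."""
    for line in run:
        yield line.rstrip("\n")


def external_sort(records: Iterable[str], max_in_memory: int) -> Iterator[str]:
    """Sort newline-free strings, holding at most max_in_memory of them at once.

    max_in_memory is a hypothetical record-count stand-in for the -m budget.
    """
    runs: List[IO[str]] = []
    chunk: List[str] = []
    for record in records:
        chunk.append(record)
        if len(chunk) >= max_in_memory:
            runs.append(_spill(chunk))  # memory "full": spill a sorted run
            chunk = []
    if not runs:
        # Everything fitted in memory, so no temporary files were needed.
        yield from sorted(chunk)
        return
    if chunk:
        runs.append(_spill(chunk))
    try:
        # k-way merge: heapq.merge lazily interleaves the already-sorted runs.
        yield from heapq.merge(*(_read_run(run) for run in runs))
    finally:
        for run in runs:
            run.close()


names = ["read7", "read3", "read9", "read1", "read5"]
print(list(external_sort(names, max_in_memory=2)))
# ['read1', 'read3', 'read5', 'read7', 'read9']
```

       With max_in_memory=2 the five names are split into three sorted runs
       on disk before being merged, which mirrors why a small -m value leads
       to many temporary files.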
       Consider using samtools collate instead if you only need name-collated
       data (records with the same read name grouped together) rather than a
       full sort by name.

       Note that if the sorted output file is to be indexed with samtools
       index, the default coordinate sort must be used. Thus the -n and -t
       options are incompatible with samtools index.

OPTIONS

       -K INT     Set the kmer size used by the -M option. [20]

       -l INT     Set the desired compression level for the final output file,
                  ranging from 0 (uncompressed) or 1 (fastest but minimal
                  compression) to 9 (best compression but slowest to write),
                  similar to gzip(1)'s compression level setting.

                  If -l is not used, the default compression level will apply.

       -u         Set the compression level to 0, for uncompressed output.
                  This is a synonym for -l 0.

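       The 0 to 9 scale is the DEFLATE level scale shared by gzip(1) and
       zlib; BAM output uses BGZF, a blocked gzip variant. As a rough
       illustration only (plain zlib rather than BGZF, and not samtools
       itself), the Python sketch below compresses one made-up, repetitive
       payload at levels 0, 1 and 9; level 0 is what -u selects.

```python
import zlib

# Made-up, highly repetitive payload; real BAM records compress differently.
payload = b"ACGTACGTTTGACCA" * 2000

# Level 0 stores the data uncompressed; higher levels trade speed for size.
sizes = {level: len(zlib.compress(payload, level)) for level in (0, 1, 9)}
for level, size in sizes.items():
    print(f"level {level}: {size} bytes")
```

       Exact sizes depend on the zlib build and on the data; the reliable
       point is that level 0 output is at least as large as the input.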
       -m INT     Set the approximate maximum memory to use per thread,
                  specified either in bytes or with a K, M, or G suffix.
                  [768 MiB]

                  To prevent sort from creating a huge number of temporary
                  files, it enforces a minimum value of 1M for this setting.

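       A minimal sketch of how an -m value might be interpreted, written in
       Python. It assumes binary (1024-based) K/M/G multipliers accepted in
       either case, and it simply rejects values below the 1M floor; how
       samtools itself reports such values may differ. The names parse_mem
       and total_budget are hypothetical. Because the limit applies per
       thread, total memory use grows roughly with the -@ thread count.

```python
MIN_PER_THREAD = 1 << 20        # the 1M floor described above
DEFAULT_PER_THREAD = 768 << 20  # the documented [768 MiB] default

# Assumption: binary multipliers, suffix accepted in either case.
_SUFFIXES = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def parse_mem(value: str) -> int:
    """Convert an -m style value such as '500M' or '2G' to bytes (hypothetical)."""
    text = value.strip()
    if not text:
        raise ValueError("empty memory setting")
    multiplier = _SUFFIXES.get(text[-1].upper(), 1)
    if multiplier != 1:
        text = text[:-1]  # drop the suffix letter
    nbytes = int(text) * multiplier
    if nbytes < MIN_PER_THREAD:
        raise ValueError(f"-m value of {nbytes} bytes is below the 1M minimum")
    return nbytes


def total_budget(per_thread: int, threads: int) -> int:
    """Rough overall memory use, since the -m amount applies to each thread."""
    return per_thread * max(threads, 1)


print(total_budget(parse_mem("2G"), threads=4))
# 8589934592 (8 GiB)
```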
       -M         Sort unmapped reads (those in chromosome "*") by their
                  sequence minimiser (Schleimer et al., 2003; Roberts et al.,
                  2004), also reverse complementing as appropriate. This has
                  the effect of collating some similar data together,
                  improving the compressibility of the unmapped sequence. The
                  minimiser kmer size is adjusted using the -K option. Note
                  that data compressed in this manner may need to be name
                  collated prior to conversion back to FASTQ.

                  Mapped sequences are sorted by chromosome and position.

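       The Python sketch below shows the idea behind -M under stated
       assumptions: each kmer is compared with its reverse complement, the
       smaller (canonical) form is kept, and the smallest canonical kmer in
       the read is its minimiser. Plain lexicographic order stands in for the
       ordering samtools actually uses, which may differ (for example, a
       hash); reads shorter than k use a hypothetical fallback, and the
       function names are illustrative only. Reads whose minimiser lies on
       the reverse strand are reverse complemented before sorting.

```python
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical_minimiser(seq: str, k: int = 20) -> Tuple[str, bool]:
    """Return (minimiser, found_on_reverse_strand) for one read."""
    if len(seq) < k:
        # Hypothetical fallback: treat the whole read as a single kmer.
        rc = revcomp(seq)
        return (rc, True) if rc < seq else (seq, False)
    best = None
    for i in range(len(seq) - k + 1):
        fwd = seq[i:i + k]
        rev = revcomp(fwd)
        # Canonical form: the smaller of the kmer and its reverse complement.
        cand = (rev, True) if rev < fwd else (fwd, False)
        if best is None or cand[0] < best[0]:
            best = cand
    return best


def minimiser_order(reads: List[str], k: int = 20) -> List[str]:
    """Sort unmapped reads by minimiser, flipping reverse-strand hits."""
    keyed = []
    for seq in reads:
        mini, on_reverse = canonical_minimiser(seq, k)
        oriented = revcomp(seq) if on_reverse else seq
        keyed.append((mini, oriented))
    keyed.sort()
    return [seq for _, seq in keyed]


print(minimiser_order(["GGGTTT", "CCCAAA", "TTTGGG"], k=3))
# ['AAACCC', 'CCCAAA', 'CCCAAA']
```

       Here TTTGGG is the reverse complement of CCCAAA; after orientation the
       two reads become identical and adjacent, which is the kind of
       collation that helps compression.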
       -n         Sort by read names (i.e., the QNAME field) rather than by
                  chromosomal coordinates.

       -t TAG     Sort first by the value in the alignment tag TAG, then by
                  position or name (if also using -n).

       -o FILE    Write the final sorted output to FILE, rather than to
                  standard output.

       -O FORMAT  Write the final output as sam, bam, or cram.

                  By default, samtools tries to select a format based on the
                  -o filename extension; if output is to standard output or no
                  format can be deduced, bam is selected.

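       The extension rule can be sketched in Python as follows. The function
       name is hypothetical, "-" is assumed to mean standard output, and only
       the three extensions named above are recognised; htslib's real format
       detection may accept more.

```python
import os
from typing import Optional

_FORMAT_BY_EXTENSION = {".sam": "sam", ".bam": "bam", ".cram": "cram"}


def output_format(out_path: Optional[str], explicit: Optional[str] = None) -> str:
    """Choose sam, bam or cram following the -O description (hypothetical)."""
    if explicit is not None:
        return explicit  # an explicit -O FORMAT takes precedence
    if out_path is None or out_path == "-":
        return "bam"  # writing to standard output: default to BAM
    extension = os.path.splitext(out_path)[1]
    return _FORMAT_BY_EXTENSION.get(extension, "bam")  # unknown: BAM


for path in ("sorted.cram", "sorted.sam", "sorted.out", None):
    print(path, "->", output_format(path))
```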
       -T PREFIX  Write temporary files to PREFIX.nnnn.bam, or if the
                  specified PREFIX is an existing directory, to
                  PREFIX/samtools.mmm.mmm.tmp.nnnn.bam, where mmm is unique to
                  this invocation of the sort command.

                  By default, any temporary files are written alongside the
                  output file, as out.bam.tmp.nnnn.bam, or if output is to
                  standard output, in the current directory as
                  samtools.mmm.mmm.tmp.nnnn.bam.

       -@ INT     Set the number of sorting and compression threads. By
                  default, operation is single-threaded.

       --no-PG    Do not add a @PG line to the header of the output file.

   Ordering Rules

       The following rules are used for ordering records.

       If option -t is in use, records are first sorted by the value of the
       given alignment tag, and then by position or name (if using -n). For
       example, “-t RG” will make read group the primary sort key. The rules
       for ordering by tag are:

       •   Records that do not have the tag are sorted before ones that do.

       •   If the types of the tags are different, they will be sorted so that
           single character tags (type A) come before array tags (type B),
           then string tags (types H and Z), then numeric tags (types f and
           i).

       •   Numeric tags (types f and i) are compared by value. Note that
           comparisons of floating-point values are subject to issues of
           rounding and precision.

       •   String tags (types H and Z) are compared based on the binary
           contents of the tag using the C strcmp(3) function.

       •   Character tags (type A) are compared by binary character value.

       •   No attempt is made to compare tags of other types; notably, type B
           array values will not be compared.

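       The tag rules above can be expressed as a sort key. In this Python
       sketch a tag is a hypothetical (type, value) pair, or None when the
       record lacks the tag; the secondary key is the read name compared as a
       plain string, standing in for position or natural name order. Python's
       bytes comparison matches strcmp(3) for strings without embedded NUL
       bytes.

```python
from typing import List, Optional, Tuple, Union

# A tag is a hypothetical (SAM type code, value) pair; None means "absent".
Tag = Optional[Tuple[str, Union[str, int, float, List[int]]]]

# Type precedence from the rules above: A, then B, then H/Z, then f/i.
_TYPE_RANK = {"A": 0, "B": 1, "H": 2, "Z": 2, "f": 3, "i": 3}


def tag_sort_key(tag: Tag) -> tuple:
    """Primary sort key for one record under -t."""
    if tag is None:
        return (0,)  # records without the tag sort first
    tag_type, value = tag
    rank = _TYPE_RANK[tag_type]
    if tag_type in ("f", "i"):
        return (1, rank, value)  # numeric: compare by value
    if tag_type in ("A", "H", "Z"):
        return (1, rank, value.encode())  # byte-wise, like strcmp(3)
    return (1, rank)  # type B: array values are never compared


# Hypothetical (read name, tag) records; the read name is the secondary key.
records = [
    ("r1", ("i", 5)), ("r2", None), ("r3", ("Z", "grp")), ("r4", ("A", "x")),
    ("r5", ("f", 2.5)), ("r6", ("B", [1, 2])), ("r7", ("Z", "abc")),
]
ordered = sorted(records, key=lambda rec: (tag_sort_key(rec[1]), rec[0]))
print([name for name, _ in ordered])
# ['r2', 'r4', 'r6', 'r7', 'r3', 'r5', 'r1']
```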
       When the -n option is present, records are sorted by name. Names are
       compared so as to give a “natural” ordering, i.e. sections consisting
       of digits are compared numerically while all other sections are
       compared based on their binary representation. This means “a1” will
       come before “b1” and “a9” will come before “a10”. Records with the
       same name will be ordered according to the values of the READ1 and
       READ2 flags (see flags).

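       A Python sketch of the natural name comparison and the READ1/READ2
       tie-break follows. It walks both names character by character and
       compares runs of ASCII digits numerically, ignoring leading zeros;
       that treatment of leading zeros, and the absence of any further
       tie-breaks, are assumptions rather than a copy of samtools' code.
       Records are hypothetical (QNAME, FLAG) pairs.

```python
from functools import cmp_to_key
from typing import Tuple

READ1, READ2 = 0x40, 0x80  # SAM FLAG bits for the first/last segment


def _is_digit(c: str) -> bool:
    return "0" <= c <= "9"


def natural_cmp(a: str, b: str) -> int:
    """Compare names, treating runs of ASCII digits as numbers."""
    i = j = 0
    while i < len(a) and j < len(b):
        if _is_digit(a[i]) and _is_digit(b[j]):
            # Skip leading zeros (assumption: "x007" and "x7" tie here).
            while i < len(a) and a[i] == "0":
                i += 1
            while j < len(b) and b[j] == "0":
                j += 1
            start_a, start_b = i, j
            while i < len(a) and _is_digit(a[i]):
                i += 1
            while j < len(b) and _is_digit(b[j]):
                j += 1
            num_a, num_b = a[start_a:i], b[start_b:j]
            # Without leading zeros, the longer digit run is the larger number.
            if len(num_a) != len(num_b):
                return -1 if len(num_a) < len(num_b) else 1
            if num_a != num_b:
                return -1 if num_a < num_b else 1
        else:
            if a[i] != b[j]:
                return -1 if a[i] < b[j] else 1  # code-point (binary) order
            i += 1
            j += 1
    # Whichever name still has characters left sorts later.
    return (i < len(a)) - (j < len(b))


def name_order_cmp(rec_a: Tuple[str, int], rec_b: Tuple[str, int]) -> int:
    """Order (QNAME, FLAG) records by name, then by the READ1/READ2 bits."""
    by_name = natural_cmp(rec_a[0], rec_b[0])
    if by_name:
        return by_name
    bits_a = rec_a[1] & (READ1 | READ2)
    bits_b = rec_b[1] & (READ1 | READ2)
    return (bits_a > bits_b) - (bits_a < bits_b)


print(sorted(["a10", "b1", "a9", "a1"], key=cmp_to_key(natural_cmp)))
# ['a1', 'a9', 'a10', 'b1']
```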
       When the -n option is not present, reads are sorted by reference
       (according to the order of the @SQ header records), then by position
       in the reference, and then by the REVERSE flag, so that forward-strand
       reads precede reverse-strand reads at the same position.

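       The coordinate rule amounts to a three-part key. In this Python sketch
       records are hypothetical (RNAME, POS, FLAG) triples, the reference rank
       comes from the @SQ order rather than from name order, and placing reads
       with no reference ("*") after all others is an assumption consistent
       with the usual layout of coordinate-sorted files.

```python
from typing import Callable, Dict, List, Tuple

BAM_FREVERSE = 0x10  # SAM FLAG bit: read is on the reverse strand

Record = Tuple[str, int, int]  # hypothetical (RNAME, POS, FLAG) triple


def coordinate_key(sq_order: List[str]) -> Callable[[Record], Tuple[int, int, bool]]:
    """Build a sort key from the reference order of the @SQ header lines."""
    rank: Dict[str, int] = {name: i for i, name in enumerate(sq_order)}
    unplaced = len(sq_order)  # assumption: reads with no reference go last

    def key(rec: Record) -> Tuple[int, int, bool]:
        rname, pos, flag = rec
        # False (forward) sorts before True (reverse) at the same position.
        return (rank.get(rname, unplaced), pos, bool(flag & BAM_FREVERSE))

    return key


# Header lists chr2 before chr1, so chr2 records come first.
records = [("chr1", 100, 0), ("chr2", 500, 0), ("chr1", 100, 16),
           ("*", 0, 4), ("chr1", 50, 16)]
print(sorted(records, key=coordinate_key(["chr2", "chr1"])))
```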
   Note

       Historically samtools sort also accepted a less flexible way of
       specifying the final and temporary output filenames:

              samtools sort [-f] [-o] in.bam out.prefix

       This has now been removed. The previous out.prefix argument (and -f
       option, if any) should be changed to an appropriate combination of -T
       PREFIX and -o FILE. The previous -o option should be removed, as output
       defaults to standard output.

AUTHOR

       Written by Heng Li from the Sanger Institute with numerous subsequent
       modifications.

SEE ALSO

       samtools(1), samtools-collate(1), samtools-merge(1)

       Samtools website: <http://www.htslib.org/>

samtools-1.13                     7 July 2021                 samtools-sort(1)