samtools-collate(1)

samtools-collate(1)          Bioinformatics tools          samtools-collate(1)

NAME

       samtools collate - shuffles and groups reads together by their names

SYNOPSIS

       samtools collate [options] in.sam|in.bam|in.cram [<prefix>]

DESCRIPTION

       Shuffles and groups reads together by their names.  A faster alternative
       to a full query name sort, collate ensures that reads of the same name
       are grouped together in contiguous groups, but doesn't make any
       guarantees about the order of read names between groups.
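
       The grouping guarantee can be stated as a small check.  The sketch
       below is illustrative only and is not part of samtools: the helper
       name is_collated and the read names are hypothetical.  It treats a
       stream of read names as collated when each name occupies exactly one
       contiguous run, whatever the order of the runs.

```python
def is_collated(read_names):
    """Return True if every read name occupies one contiguous run."""
    finished = set()   # names whose run has already ended
    previous = None    # name of the run currently being read
    for name in read_names:
        if name != previous:
            if name in finished:
                return False  # the name reappeared after its run ended
            if previous is not None:
                finished.add(previous)
            previous = name
    return True


# Grouped but not sorted by name: acceptable collate-style ordering.
grouped = ["r7", "r7", "r2", "r2", "r9"]
# r2 is split across two runs, so this stream is not collated.
split = ["r2", "r7", "r2"]
print(is_collated(grouped), is_collated(split))
```

       A name-sorted stream also passes this check; collate simply does not
       promise that stronger ordering.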

       The output from this command should be suitable for any operation that
       requires all reads from the same template to be grouped together.

       If present, <prefix> is used to name the temporary files that collate
       uses when sorting the data.  If neither the -O nor the -o option is
       used, <prefix> must be present and collate will use it to make an
       output file name by appending a suffix depending on the format written
       (.bam by default).

       If either the -O or -o option is used, <prefix> is optional.  If
       <prefix> is absent, collate will write the temporary files to a
       system-dependent location (/tmp on UNIX).
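
       The rules for <prefix>, -O and -o can be summarised in a short
       wrapper.  This is a minimal sketch, not part of samtools: the function
       collate_argv, its parameter names and the file names are hypothetical.
       It only builds an argument list from the options documented here and
       does not run anything.

```python
def collate_argv(in_file, prefix=None, out_file=None, to_stdout=False):
    """Build a 'samtools collate' argument list (hypothetical helper)."""
    if out_file is not None and to_stdout:
        # -O and -o are documented as mutually exclusive.
        raise ValueError("-o and -O cannot be used together")
    if out_file is None and not to_stdout and prefix is None:
        # Without -O or -o the prefix also names the output file.
        raise ValueError("<prefix> is required when neither -O nor -o is used")
    argv = ["samtools", "collate"]
    if to_stdout:
        argv.append("-O")
    if out_file is not None:
        argv += ["-o", out_file]
    argv.append(in_file)
    if prefix is not None:
        argv.append(prefix)  # optional here; names the temporary files
    return argv


# Prefix only: output goes to <prefix>.bam by default.
print(collate_argv("in.bam", prefix="collated"))
# Explicit output file; temporary files go to the system default location.
print(collate_argv("in.bam", out_file="out.bam"))
```

       The returned list could be passed to subprocess.run on a system where
       samtools is installed.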

       Using -f for fast mode will output only primary alignments that have
       either the READ1 or READ2 flags set (but not both).  Any other
       alignment records will be filtered out.  The collation will only work
       correctly if there are no more than two reads for any given QNAME after
       filtering.
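
       The fast mode filter can be expressed as a test on the SAM FLAG field.
       The sketch below restates the rule as described above; it is not taken
       from the samtools source.  The function name kept_in_fast_mode is
       hypothetical, and "primary" is assumed to mean that neither the
       secondary (0x100) nor the supplementary (0x800) bit is set.

```python
# SAM FLAG bits as defined in the SAM specification.
READ1 = 0x40
READ2 = 0x80
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def kept_in_fast_mode(flag):
    """Return True if a record with this FLAG passes the -f filter as described."""
    is_primary = not (flag & (SECONDARY | SUPPLEMENTARY))
    is_read1 = bool(flag & READ1)
    is_read2 = bool(flag & READ2)
    # Exactly one of READ1 / READ2 must be set.
    return is_primary and (is_read1 != is_read2)


# 99 and 147 are a typical properly paired READ1 / READ2 combination.
for flag in (99, 147, 0, 99 | SECONDARY, 99 | SUPPLEMENTARY, READ1 | READ2):
    print(flag, kept_in_fast_mode(flag))
```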

       Fast mode keeps a buffer of alignments in memory so that it can write
       out most pairs as soon as they are found instead of storing them in
       temporary files.  This allows collate to avoid some work and so finish
       more quickly than the standard mode.  The number of alignments held can
       be changed using -r; storing more alignments uses more memory but
       increases the number of pairs that can be written early.
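
       The trade-off controlled by -r can be illustrated with a toy model.
       This is not how samtools implements its buffer: the function
       pair_with_buffer, the oldest-first eviction policy and the read names
       are all assumptions made for the illustration.  The model only shows
       why a larger buffer lets more pairs be matched in memory.

```python
from collections import OrderedDict


def pair_with_buffer(names, capacity):
    """Toy model: return (pairs matched in memory, reads pushed out of memory)."""
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    waiting = OrderedDict()  # reads held in memory, oldest first
    early = []               # pairs completed while both mates were in memory
    spilled = []             # reads that would fall back to temporary files
    for name in names:
        if name in waiting:
            del waiting[name]        # mate found: the pair can be written now
            early.append(name)
        else:
            if len(waiting) >= capacity:
                oldest, _ = waiting.popitem(last=False)  # assumed eviction policy
                spilled.append(oldest)
            waiting[name] = True
    spilled.extend(waiting)          # reads still unmatched at end of input
    return early, spilled


stream = ["a", "b", "a", "c", "b", "c"]
print(pair_with_buffer(stream, 2))  # room for both pending reads
print(pair_with_buffer(stream, 1))  # too small: every read is pushed out
```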

       While collate normally randomises the ordering of read pairs, fast mode
       does not.  Position-dependent biases that would normally be broken up
       can remain in the fast collate output.  It is therefore not a good idea
       to use fast mode when preparing data for programs that expect randomly
       ordered paired reads.  For example, using fast collate instead of the
       standard mode may lead to significantly different results from aligners
       that estimate library insert sizes on batches of reads.

OPTIONS

       -O      Output to stdout.  This option cannot be used with '-o'.

       -o FILE Write output to FILE.  This option cannot be used with '-O'.

       -u      Write uncompressed BAM output.

       -l INT  Compression level.  [1]

       -n INT  Number of temporary files to use.  [64]

       -f      Fast mode (primary alignments only).

       -r INT  Number of reads to store in memory (for use with -f).  [10000]

       --no-PG Do not add a @PG line to the header of the output file.

       -@, --threads INT
               Number of input/output compression threads to use in addition
               to main thread [0].

AUTHOR

       Written by Heng Li from the Sanger Institute and extended by Andrew
       Whitwham.
