scramble(1)                      Staden io_lib                     scramble(1)

NAME
       scramble - Converts between the SAM, BAM and CRAM file formats.

SYNOPSIS
       scramble [options] [input_file [output_file]]

DESCRIPTION
       scramble converts between various next-gen sequencing alignment file
       formats, including SAM, BAM and CRAM. It can act either as a pipe,
       reading stdin and writing to stdout, or on named files.

       When operating as a pipe, the input type defaults to SAM or BAM; use
       the -I cram option to indicate that the input is in CRAM format. The
       output defaults to BAM, but can be changed with the -O format option.
       When given filenames, the file type is chosen automatically based on
       the filename suffix.
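
       The two modes described above might be wrapped as follows. This is a
       minimal sketch only: the function names and file names are
       illustrative assumptions and are not part of scramble itself.

```shell
# Named files: input and output formats are chosen from the filename
# suffixes, so no -I or -O is given.
#   $1 = reference fasta, $2 = input .bam file, $3 = output .cram file
bam_to_cram() {
    scramble -r "$1" "$2" "$3"
}

# Pipe: -I bam avoids format detection on stdin, and -O sam overrides
# the default BAM output on stdout.
bam_to_sam_pipe() {
    scramble -I bam -O sam
}
```

       For example, "bam_to_cram ref.fa in.bam out.cram" writes CRAM because
       of the .cram suffix, while "some_command | bam_to_sam_pipe > out.sam"
       never inspects a filename.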

OPTIONS
       -I format
              Selects the input format, where format is one of sam, bam or
              cram. Use this when reading via a pipe to avoid consuming
              input bytes while attempting to detect whether the input is in
              SAM or BAM format.

       -O format
              Selects the output format, where format is one of sam, bam or
              cram.

       -1 to -9
              Sets the compression level from 1 (low compression, fast) to 9
              (high compression, slow). This is only used when writing BAM
              or CRAM format.

       -0 or -u
              Writes uncompressed data. In BAM this still uses BGZF
              containers, but with no internal compression. In CRAM it
              stores blocks in RAW format instead. The option has no effect
              on SAM output.

       -j     CRAM encoding only. Add bzip2 to the list of compression codecs
              potentially used during CRAM creation.

       -Z     CRAM encoding only. Add lzma to the list of compression codecs
              potentially used during CRAM creation. Because lzma compresses
              slowly, it is normally used only where it gives a significant
              advantage over zlib or bzip2. At higher compression levels
              (-7) this weighting is ignored, as LZMA decompression speed is
              acceptable, albeit still slower than zlib.

       -m     CRAM decoding only. Generate MD:Z: and NM:I: auxiliary fields
              based on the reference-based compression.

       -M     CRAM encoding only. Forcibly pack sequences from multiple
              references into the same slice. Normally CRAM will start a new
              slice when changing from one reference to another, but will
              still automatically switch to multi-reference slices if the
              number of sequences per slice becomes too small.

       -R range
              Currently for CRAM input only, but SAM/BAM support is pending.
              The range is a reference sequence name, optionally followed by
              a start and end location within that reference, using the
              syntax ref_name or ref_name:start-end. For efficient operation
              the CRAM file needs a .crai format index (built using the
              cram_index program).

       -r ref.fa
              CRAM encoding only. Use this to specify the reference fasta
              file. Note that if the input SAM or BAM file has a file: or
              local file system based URI specified in the @SQ headers then
              this option may not be necessary.

       -s number
              CRAM encoding only. Specifies the number of sequences per
              slice. Defaults to 10000.

       -S number
              CRAM encoding only. Specifies the number of slices per
              container. Defaults to 1.

       -t number
              BAM and CRAM only. Specifies the number of compression or
              decompression threads, adaptively shared between both encoding
              and decoding. Defaults to 1 (no threading).

       -V version_string
              CRAM encoding only. Sets the CRAM file format version.
              Supported values are "2.0", "2.1" and "3.0".

       -e     CRAM encoding only. Embed snippets of the reference sequence in
              every slice. This means the files can be decoded without
              needing to specify the reference fasta file.

       -x     CRAM encoding only. Omit reference-based compression and
              instead store details of every base verbatim.

       -B     Experimental, encoding only. When storing quality values, bin
              into 8 discrete values (plus 0), as typically used by modern
              Illumina instruments. (Note that the bins may not be precisely
              the same ranges.)

       -!     CRAM v3.0 and above decoding only. Do not check CRCs. This
              option should only be used when attempting to recover from
              data corruption.

EXAMPLES
       To convert a BAM file from stdin to CRAM on stdout, using reference
       MT.fa:

              some_command | scramble -I bam -O cram -r MT.fa | some_command
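
       The pipeline above can be combined with the compression and threading
       options. The following is a rough sketch under stated assumptions:
       the function name cram_archive and its argument order are
       hypothetical, and the option mix simply trades speed for size as
       described under the -1 to -9, -j, -Z and -t options.

```shell
# Hypothetical helper: slow, high-compression BAM -> CRAM conversion on a pipe.
#   $1 = number of threads, $2 = reference fasta
#   -9     highest compression level
#   -j -Z  add bzip2 and lzma to the candidate compression codecs
cram_archive() {
    scramble -I bam -O cram -9 -j -Z -t "$1" -r "$2"
}
```

       It might be used as "some_command | cram_archive 4 MT.fa > out.cram".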

       The default CRAM output format is version 3.0, so no version needs to
       be specified when converting from 2.1 to 3.0. To perform the reverse
       use:

              scramble -V 2.1 in.cram out.cram
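
       A further sketch, for the -R option. The function name cram_region is
       hypothetical, and the cram_index invocation shown (a single CRAM
       filename argument, producing a .crai file alongside it) is an
       assumption; check the cram_index documentation before relying on it.

```shell
# Hypothetical helper: write one region of an indexed CRAM file as SAM.
#   $1 = CRAM file, $2 = ref_name or ref_name:start-end, $3 = output .sam file
cram_region() {
    # Build the index only if it is missing. Both the invocation and the
    # "$1.crai" naming are assumptions, not taken from this manual page.
    [ -f "$1.crai" ] || cram_index "$1" || return 1
    scramble -R "$2" "$1" "$3"
}
```

       It might be used as "cram_region in.cram MT:1-5000 region.sam", where
       the .sam suffix selects SAM output.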

AUTHOR
       James Bonfield, Wellcome Trust Sanger Institute

March 19 2013                                                      scramble(1)