esl-reformat(1)                 Easel Manual                 esl-reformat(1)

NAME
       esl-reformat - convert sequence file formats

SYNOPSIS
       esl-reformat [options] format seqfile

DESCRIPTION
       esl-reformat reads the sequence file seqfile in any supported format,
       reformats it into a new format specified by format, and writes the
       reformatted text to standard output (or to a file; see -o).

       The format argument must match a supported sequence file format; the
       match is case-insensitive (fasta or FASTA both work). Common choices
       for format include: fasta, embl, genbank. If seqfile is an alignment
       file, alignment output formats also work. Common choices include:
       stockholm, a2m, afa, psiblast, clustal, phylip. For more information,
       and for codes for some less common formats, see the main
       documentation.

       Unaligned format files cannot be reformatted to aligned formats.
       However, aligned formats can be reformatted to unaligned formats, in
       which case gap characters are simply stripped out.
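
       Stripping gaps amounts to deleting every gap symbol from each aligned
       row. The Python sketch below is only an illustration, not Easel's C
       implementation. The function name dealign is hypothetical, and so is
       the symbol set: it assumes '-', '.' and '_' are gaps and '~' marks
       missing data. The real set depends on the sequence alphabet.

```python
# Hypothetical sketch of aligned -> unaligned conversion; not Easel's code.
# Assumed symbol set: '-', '.', '_' are gaps and '~' is missing data.
GAP_AND_MISSING = set("-._~")


def dealign(aligned_seq: str) -> str:
    """Return the row with all gap and missing-data characters removed."""
    return "".join(c for c in aligned_seq if c not in GAP_AND_MISSING)


if __name__ == "__main__":
    print(dealign("AC-GU..A~"))  # -> ACGUA
```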

OPTIONS
       -d     DNA; convert U's to T's, to make sure a nucleic acid sequence is
              shown as DNA, not RNA. See -r.

       -h     Print brief help; includes version number and summary of all
              options, including expert options.

       -l     Lowercase; convert all sequence residues to lower case. See -u.

       -n     For DNA/RNA sequences, convert any character that is not
              unambiguous DNA/RNA (i.e. ACGTU/acgtu) to an N. Used to convert
              IUPAC ambiguity codes to N's, for software that can't handle all
              IUPAC codes (some public RNA folding codes, for example). If the
              file is an alignment, gap characters are left unchanged. If the
              sequences are not nucleic acid sequences, this option will
              corrupt the data in a predictable fashion.
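
              The -n rule above reduces to a fixed per-character test. This
              is a minimal Python sketch, not Easel's implementation. It
              assumes the gap and missing-data symbols are '-', '.', '_' and
              '~', and that every other non-ACGTU character becomes an
              uppercase N. The name ambig_to_n is hypothetical.

```python
# Hypothetical sketch of esl-reformat -n; not Easel's implementation.
UNAMBIGUOUS = set("ACGTUacgtu")
GAPS = set("-._~")  # assumed gap/missing-data symbols, left unchanged


def ambig_to_n(seq: str) -> str:
    """Replace every character that is neither ACGTU nor a gap with 'N'."""
    return "".join(c if c in UNAMBIGUOUS or c in GAPS else "N" for c in seq)


if __name__ == "__main__":
    print(ambig_to_n("ACRYGN-U"))  # -> ACNNGN-U (IUPAC R and Y become N)
    print(ambig_to_n("MKLV"))      # -> NNNN (protein input is corrupted)
```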

       -o <f> Send output to file <f> instead of stdout.

       -r     RNA; convert T's to U's, to make sure a nucleic acid sequence is
              shown as RNA, not DNA. See -d.
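
              The -d and -r conversions are one-to-one character swaps. This
              Python sketch assumes that both upper- and lowercase residues
              are converted and that case is preserved. The names to_dna and
              to_rna are hypothetical, not Easel functions.

```python
# Hypothetical sketch of -d (U -> T) and -r (T -> U); case is preserved.
DNA_TABLE = str.maketrans("Uu", "Tt")
RNA_TABLE = str.maketrans("Tt", "Uu")


def to_dna(seq: str) -> str:
    """Show a nucleic acid sequence as DNA (-d)."""
    return seq.translate(DNA_TABLE)


def to_rna(seq: str) -> str:
    """Show a nucleic acid sequence as RNA (-r)."""
    return seq.translate(RNA_TABLE)


if __name__ == "__main__":
    print(to_dna("ACGUu-"))  # -> ACGTt-
    print(to_rna("ACGTt-"))  # -> ACGUu-
```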

       -u     Uppercase; convert all sequence residues to upper case. See -l.

       -x     For DNA sequences, convert non-IUPAC characters (such as X's) to
              N's. This is for compatibility with benighted people who insist
              on using X instead of the IUPAC ambiguity character N. (X is for
              ambiguity in an amino acid residue.)

              Warning: like the -n option, the code doesn't check that you are
              actually giving it DNA. It simply converts non-IUPAC DNA symbols
              to N. So if you accidentally give it protein sequence, it will
              happily convert many amino acid residues (every one that is not
              also an IUPAC nucleotide symbol) to an N.
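
              This is a minimal Python sketch of the -x rule, not Easel's
              code. It assumes the IUPAC nucleotide codes ACGTURYSWKMBDHVN
              are accepted in either case and that '-', '.', '_' and '~' are
              gap symbols to be left alone. The name x_to_n is hypothetical.
              The second example shows why the option is unsafe on protein.
              M, K and V happen to be IUPAC nucleotide codes and pass through
              unchanged, while E and F become N.

```python
# Hypothetical sketch of esl-reformat -x; not Easel's implementation.
IUPAC_NT = set("ACGTURYSWKMBDHVN")  # IUPAC nucleotide codes (uppercase)
GAPS = set("-._~")                  # assumed gap symbols, left unchanged


def x_to_n(seq: str) -> str:
    """Replace any character that is not an IUPAC nucleotide code with 'N'."""
    return "".join(
        c if c.upper() in IUPAC_NT or c in GAPS else "N" for c in seq
    )


if __name__ == "__main__":
    print(x_to_n("ACXXGT"))  # -> ACNNGT
    print(x_to_n("MKVEF"))   # -> MKVNN (protein is silently mangled)
```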

EXPERT OPTIONS
       --gapsym <c>
              Convert all gap characters to <c>. Used to prepare alignment
              files for programs with strict requirements for gap symbols.
              Only makes sense if the input seqfile is an alignment.
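
              A Python sketch of --gapsym applied to one aligned row. It
              assumes the gap characters are '-', '.' and '_' and that the
              missing-data symbol '~' is not touched. The name set_gapsym is
              hypothetical.

```python
# Hypothetical sketch of --gapsym; the gap set is an assumption.
GAP_CHARS = "-._"


def set_gapsym(aligned_seq: str, gapsym: str) -> str:
    """Rewrite every gap character in an aligned row as gapsym."""
    if len(gapsym) != 1:
        raise ValueError("gap symbol must be a single character")
    table = str.maketrans(GAP_CHARS, gapsym * len(GAP_CHARS))
    return aligned_seq.translate(table)


if __name__ == "__main__":
    print(set_gapsym("AC.G_U-", "-"))  # -> AC-G-U-
```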

       --informat <s>
              Assert that input seqfile is in format <s>, bypassing format
              autodetection. Common choices for <s> include: fasta, embl,
              genbank. Alignment formats also work; common choices include:
              stockholm, a2m, afa, psiblast, clustal, phylip. For more
              information, and for codes for some less common formats, see
              the main documentation. The string <s> is case-insensitive
              (fasta or FASTA both work).

       --mingap
              If seqfile is an alignment, remove any columns that contain 100%
              gap or missing data characters, minimizing the overall length of
              the alignment. (Often useful if you've extracted a subset of
              aligned sequences from a larger alignment.)
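
              --mingap is a column filter. A column survives if at least one
              row has a residue in it. This Python sketch represents the
              alignment as a list of equal-length strings and assumes '-',
              '.', '_' and '~' are the gap and missing-data symbols. The
              name mingap is hypothetical.

```python
# Hypothetical sketch of --mingap on rows of equal length.
GAP_OR_MISSING = set("-._~")  # assumed gap and missing-data symbols


def mingap(aln: list[str]) -> list[str]:
    """Drop columns in which every row has a gap or missing-data symbol."""
    ncol = len(aln[0])
    keep = [
        j for j in range(ncol)
        if any(row[j] not in GAP_OR_MISSING for row in aln)
    ]
    return ["".join(row[j] for j in keep) for row in aln]


if __name__ == "__main__":
    print(mingap(["AC--G", "A-.-G"]))  # -> ['ACG', 'A-G']
```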

       --keeprf
              When used in combination with --mingap, never remove a column
              that is not a gap in the reference (#=GC RF) annotation, even if
              the column contains 100% gap characters in all aligned
              sequences. By default with --mingap, nongap RF columns that are
              100% gaps in all sequences are removed.
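
              A Python sketch of --mingap combined with --keeprf. A column is
              kept if any row has a residue there, or if the RF annotation is
              nongap there. The RF line is passed in as a plain string. Two
              things here are assumptions, not Easel behavior: the name
              mingap_keeprf, and treating '.', '-', '_' and '~' as RF gaps.

```python
# Hypothetical sketch of --mingap combined with --keeprf.
GAP_OR_MISSING = set("-._~")  # assumed gap and missing-data symbols


def mingap_keeprf(aln: list[str], rf: str) -> tuple[list[str], str]:
    """Drop all-gap columns unless the RF annotation is nongap there."""
    keep = [
        j for j in range(len(rf))
        if rf[j] not in GAP_OR_MISSING
        or any(row[j] not in GAP_OR_MISSING for row in aln)
    ]
    rows = ["".join(row[j] for j in keep) for row in aln]
    return rows, "".join(rf[j] for j in keep)


if __name__ == "__main__":
    # Columns 2 and 3 are all gaps; only column 2 is also a gap in RF.
    print(mingap_keeprf(["AC--G", "A-.-G"], "xx.xx"))
    # -> (['AC-G', 'A--G'], 'xxxx')
```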

       --nogap
              Remove any aligned columns that contain any gap or missing data
              symbols at all. Useful as a prelude to phylogenetic analyses,
              where you only want to analyze columns containing 100% residues,
              so you want to strip out any columns with gaps in them. Only
              makes sense if the file is an alignment file.
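
              --nogap is the strict counterpart of --mingap. A column
              survives only if every row has a residue in it. This Python
              sketch uses the same assumptions as the other sketches here:
              rows are equal-length strings, and '-', '.', '_' and '~' are
              the gap and missing-data symbols. The name nogap is
              hypothetical.

```python
# Hypothetical sketch of --nogap on rows of equal length.
GAP_OR_MISSING = set("-._~")  # assumed gap and missing-data symbols


def nogap(aln: list[str]) -> list[str]:
    """Keep only columns in which every row has a residue."""
    ncol = len(aln[0])
    keep = [
        j for j in range(ncol)
        if all(row[j] not in GAP_OR_MISSING for row in aln)
    ]
    return ["".join(row[j] for j in keep) for row in aln]


if __name__ == "__main__":
    print(nogap(["AC-GU", "ACAG~"]))  # -> ['ACG', 'ACG']
```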

       --wussify
              Convert RNA secondary structure annotation strings (both
              consensus and individual) from old "KHS" format, ><, to the new
              WUSS notation, <>. If the notation is already in WUSS format,
              this option will corrupt it, without warning. Only SELEX and
              Stockholm format files have secondary structure markup at
              present.
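
              For simple base-pair brackets, the conversion above is a swap
              of '>' and '<'. This Python sketch assumes only those two
              characters change; Easel's real conversion may handle other
              symbols too. The name wussify is hypothetical. The second print
              shows why running it on annotation that is already WUSS breaks
              it: the swap undoes itself.

```python
# Hypothetical sketch: swap KHS '>'/'<' for WUSS '<'/'>'.
KHS_TO_WUSS = str.maketrans("><", "<>")


def wussify(ss: str) -> str:
    """Convert simple KHS base-pair brackets to WUSS brackets."""
    return ss.translate(KHS_TO_WUSS)


if __name__ == "__main__":
    khs = ">>>...<<<"
    print(wussify(khs))           # -> <<<...>>>
    print(wussify(wussify(khs)))  # -> >>>...<<< (back to KHS)
```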

       --dewuss
              Convert RNA secondary structure annotation strings from the new
              WUSS notation, <>, back to the old KHS format, ><. If the
              annotation is already in KHS, this option will corrupt it,
              without warning. Only SELEX and Stockholm format files have
              secondary structure markup.

       --fullwuss
              Convert RNA secondary structure annotation strings from simple
              (input) WUSS notation to full (output) WUSS notation.

       --replace <s>
              <s> must be in the format <s1>:<s2>, with equal numbers of
              characters in <s1> and <s2>, separated by a ":" symbol. Each
              character from <s1> in the input file will be replaced by its
              counterpart (at the same position) from <s2>. Note that special
              characters in <s> (such as "~") may need to be prefixed by a "\"
              character.
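
              A Python sketch of how a <s1>:<s2> argument could be turned
              into a per-character substitution. The names parse_replace and
              the example strings are hypothetical. Because <s1> and <s2>
              must be the same length, the sketch checks that ':' sits
              exactly in the middle rather than splitting on the first ':'.
              Whether Easel itself accepts ':' inside <s1> or <s2> is not
              stated here.

```python
# Hypothetical sketch of --replace <s1>:<s2>; not Easel's parser.


def parse_replace(arg: str) -> dict[int, int]:
    """Build a translation table from an '<s1>:<s2>' argument."""
    n = len(arg)
    # With len(s1) == len(s2), the ':' separator must sit in the middle.
    if n < 3 or n % 2 == 0 or arg[n // 2] != ":":
        raise ValueError("expected <s1>:<s2> with equal-length halves")
    s1, s2 = arg[: n // 2], arg[n // 2 + 1:]
    return str.maketrans(s1, s2)


if __name__ == "__main__":
    table = parse_replace("~.:-x")
    print("A~C.G".translate(table))  # -> A-CxG
```

              str.translate maps each character once, so the substitutions
              apply simultaneously rather than chaining into one another.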

       --small
              Operate in small memory mode for input alignment files in Pfam
              format. Without --small, each alignment is stored in memory, so
              the required memory will be roughly the size of the largest
              alignment in the input file. With --small, input alignments are
              not stored in memory. This option only works in combination
              with --informat pfam and output format pfam or afa.

SEE ALSO
       http://bioeasel.org/

COPYRIGHT
       Copyright (C) 2020 Howard Hughes Medical Institute.
       Freely distributed under the BSD open source license.

AUTHOR
       http://eddylab.org

Easel 0.48                        Nov 2020                   esl-reformat(1)