esl-reformat(1)                  Easel Manual                  esl-reformat(1)

NAME

       esl-reformat - convert sequence file formats

SYNOPSIS

       esl-reformat [options] format seqfile

DESCRIPTION

       esl-reformat reads the sequence file seqfile in any supported format,
       reformats it into a new format specified by format, then outputs the
       reformatted text.

       The format argument must match a supported sequence file format;
       matching is case-insensitive (fasta or FASTA both work). Common
       choices for format include: fasta, embl, genbank. If seqfile is an
       alignment file, alignment output formats also work. Common choices
       include: stockholm, a2m, afa, psiblast, clustal, phylip. For more
       information, and for codes for some less common formats, see the main
       documentation.

       Unaligned format files cannot be reformatted to aligned formats.
       However, aligned formats can be reformatted to unaligned formats, in
       which case gap characters are simply stripped out.
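       The aligned-to-unaligned conversion described above can be pictured
       with a minimal Python sketch. This is not Easel's code; the function
       name strip_gaps and the gap symbol set "-._~" are assumptions made for
       illustration.

```python
# Assumed gap / missing-data symbols; the set Easel recognizes may differ.
GAP_CHARS = "-._~"


def strip_gaps(aligned_seq: str, gap_chars: str = GAP_CHARS) -> str:
    """Return the unaligned sequence with every gap character removed."""
    return aligned_seq.translate(str.maketrans("", "", gap_chars))


# A toy two-row alignment (hypothetical data).
alignment = {"seq1": "ACG--UACG.", "seq2": "AC-GGUA-GA"}
unaligned = {name: strip_gaps(row) for name, row in alignment.items()}
print(unaligned)
```

       After stripping, the rows generally differ in length, which is why
       the reverse conversion (unaligned to aligned) is not possible.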

OPTIONS

       -d     DNA; convert U's to T's, to make sure a nucleic acid sequence is
              shown as DNA not RNA. See -r.

       -h     Print brief help; includes version number and summary of all
              options, including expert options.

       -l     Lowercase; convert all sequence residues to lower case. See -u.

       -n     For DNA/RNA sequences, convert any character that is not an
              unambiguous DNA/RNA residue (that is, not one of ACGTU/acgtu)
              to an N. Used to convert IUPAC ambiguity codes to N's, for
              software that can't handle all IUPAC codes (some public RNA
              folding codes, for example). If the file is an alignment, gap
              characters are left unchanged. If sequences are not nucleic
              acid sequences, this option will corrupt the data in a
              predictable fashion.
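              As a rough illustration of the -n rule as worded above, here
              is a small Python sketch. It follows this text only, not the
              Easel source: the name mask_ambiguous, the is_alignment flag
              and the gap set "-._~" are assumptions, and every masked
              character becomes an uppercase N.

```python
UNAMBIGUOUS = set("ACGTUacgtu")
GAP_CHARS = set("-._~")  # assumed gap symbols, not taken from Easel


def mask_ambiguous(seq: str, is_alignment: bool = False) -> str:
    """Replace anything that is not A/C/G/T/U (either case) with 'N'.

    Gap characters are preserved only when the input is an alignment.
    """
    out = []
    for ch in seq:
        if ch in UNAMBIGUOUS or (is_alignment and ch in GAP_CHARS):
            out.append(ch)
        else:
            out.append("N")
    return "".join(out)


print(mask_ambiguous("ACGRYUN"))                   # IUPAC codes R, Y masked
print(mask_ambiguous("AC-GR", is_alignment=True))  # gap kept, R masked
print(mask_ambiguous("MKVLAT"))                    # protein input is mangled
```

              The last call shows the failure mode the text warns about: a
              peptide passes through without any error, and only residues
              that happen to be A, C, G, T or U survive.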

       -o <f> Send output to file <f> instead of stdout.

       -r     RNA; convert T's to U's, to make sure a nucleic acid sequence is
              shown as RNA not DNA. See -d.

       -u     Uppercase; convert all sequence residues to upper case. See -l.

       -x     For DNA sequences, convert non-IUPAC characters (such as X's) to
              N's. This is for compatibility with benighted people who insist
              on using X instead of the IUPAC ambiguity character N. (X is for
              ambiguity in an amino acid residue.)

              Warning: like the -n option, the code doesn't check that you are
              actually giving it DNA. It simply converts non-IUPAC DNA symbols
              to N. So if you accidentally give it protein sequence, it will
              happily convert most every amino acid residue to an N.

EXPERT OPTIONS

       --gapsym <c>
              Convert all gap characters to <c>. Used to prepare alignment
              files for programs with strict requirements for gap symbols.
              Only makes sense if the input seqfile is an alignment.
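              A minimal Python sketch of the idea behind --gapsym, under the
              assumption that the recognized gap symbols are "-._~" (the
              actual set is defined by Easel and may differ). The function
              name set_gap_symbol is hypothetical.

```python
def set_gap_symbol(row: str, new_gap: str, gap_chars: str = "-._~") -> str:
    """Map every assumed gap symbol in an aligned row to a single symbol."""
    if len(new_gap) != 1:
        raise ValueError("the gap symbol must be exactly one character")
    table = str.maketrans({g: new_gap for g in gap_chars})
    return row.translate(table)


print(set_gap_symbol("AC..G-U~", "-"))
```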

       --informat <s>
              Assert that input seqfile is in format <s>, bypassing format
              autodetection. Common choices for <s> include: fasta, embl,
              genbank. Alignment formats also work; common choices include:
              stockholm, a2m, afa, psiblast, clustal, phylip. For more
              information, and for codes for some less common formats, see
              the main documentation. The string <s> is case-insensitive
              (fasta or FASTA both work).

       --mingap
              If seqfile is an alignment, remove any columns that contain 100%
              gap or missing data characters, minimizing the overall length of
              the alignment. (Often useful if you've extracted a subset of
              aligned sequences from a larger alignment.)

       --keeprf
              When used in combination with --mingap, never remove a column
              that is not a gap in the reference (#=GC RF) annotation, even if
              the column contains 100% gap characters in all aligned
              sequences. By default with --mingap, nongap RF columns that are
              100% gaps in all sequences are removed.
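              The interaction between --mingap and --keeprf can be sketched
              in Python as follows. This is an illustration of the column
              rule stated above, not Easel's implementation; the function
              name, the keep_rf parameter and the gap set "-._~" are
              assumptions.

```python
def remove_all_gap_columns(rows, rf=None, keep_rf=False, gap_chars="-._~"):
    """Drop columns that are gaps in every row.

    rows    : list of equal-length aligned sequence strings
    rf      : optional reference annotation line of the same length
    keep_rf : if True, never drop a column whose RF character is a nongap
    """
    keep = []
    for j in range(len(rows[0])):
        all_gap = all(row[j] in gap_chars for row in rows)
        rf_nongap = rf is not None and rf[j] not in gap_chars
        if not all_gap or (keep_rf and rf_nongap):
            keep.append(j)
    new_rows = ["".join(row[j] for j in keep) for row in rows]
    new_rf = None if rf is None else "".join(rf[j] for j in keep)
    return new_rows, new_rf


rows = ["AC--G", "A---G", "AU--G"]   # columns 2 and 3 are all-gap
rf = "xx.xx"                         # column 3 is a nongap RF column

print(remove_all_gap_columns(rows, rf))                # both columns removed
print(remove_all_gap_columns(rows, rf, keep_rf=True))  # column 3 survives
```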

       --nogap
              Remove any aligned columns that contain any gap or missing data
              symbols at all. Useful as a prelude to phylogenetic analyses,
              where you only want to analyze columns containing 100% residues,
              so you want to strip out any columns with gaps in them. Only
              makes sense if the file is an alignment file.
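              For contrast with --mingap, a short Python sketch of the
              stricter --nogap rule: a column is dropped as soon as any one
              row has a gap there. The function name and the gap set "-._~"
              are assumptions for illustration only.

```python
def remove_gapped_columns(rows, gap_chars="-._~"):
    """Keep only the columns in which every row has a residue."""
    keep = [
        j
        for j in range(len(rows[0]))
        if not any(row[j] in gap_chars for row in rows)
    ]
    return ["".join(row[j] for j in keep) for row in rows]


rows = ["ACG-U", "A-GCU", "ACGCU"]
print(remove_gapped_columns(rows))
```

              Every surviving row has the same length and contains no gap
              symbols, which is the property the phylogenetic use case
              needs.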

       --wussify
              Convert RNA secondary structure annotation strings (both
              consensus and individual) from old "KHS" format, ><, to the new
              WUSS notation, <>. If the annotation is already in WUSS
              notation, this option will corrupt it, without warning. Only
              SELEX and Stockholm format files have secondary structure
              markup at present.

       --dewuss
              Convert RNA secondary structure annotation strings from the new
              WUSS notation, <>, back to the old KHS format, ><. If the
              annotation is already in KHS, this option will corrupt it,
              without warning. Only SELEX and Stockholm format files have
              secondary structure markup.
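              The reason both options silently damage annotation that is
              already in the target notation can be seen in a small Python
              sketch. It covers only the angle-bracket swap named in the
              text; whatever else the real conversion does to other
              structure symbols is not modeled, and the function name is
              hypothetical.

```python
# Swap the direction of base-pair brackets: '>' becomes '<' and vice versa.
_SWAP = str.maketrans("<>", "><")


def swap_pair_direction(ss: str) -> str:
    """Exchange '<' and '>' in a secondary structure annotation string."""
    return ss.translate(_SWAP)


khs = ">>>..<<<"                    # old KHS style: '>' opens, '<' closes
wuss = swap_pair_direction(khs)     # WUSS style: '<' opens, '>' closes
print(wuss)
print(swap_pair_direction(wuss))    # applying the swap again undoes it
```

              Because the swap is its own inverse, the program has no way to
              tell from the brackets alone which notation it was given, so a
              second application simply flips the annotation back.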

       --fullwuss
              Convert RNA secondary structure annotation strings from simple
              (input) WUSS notation to full (output) WUSS notation.

       --replace <s>
              Replace characters in the input. <s> must be in the format
              <s1>:<s2>, with equal numbers of characters in <s1> and <s2>
              separated by a ":" symbol. Each character from <s1> in the
              input file will be replaced by its counterpart (at the same
              position) from <s2>. Note that special characters in <s> (such
              as "~") may need to be prefixed by a "\" character.
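              The <s1>:<s2> mapping can be sketched in Python as a
              position-by-position translation table. This is an
              illustration under stated assumptions, not Easel's parser: the
              function name is hypothetical, the string is split at the
              first ":", and shell escaping with "\" is not modeled.

```python
def parse_replace_spec(spec: str) -> dict:
    """Turn 's1:s2' into a translation table mapping s1[i] to s2[i]."""
    s1, sep, s2 = spec.partition(":")
    if not sep or not s1 or len(s1) != len(s2):
        raise ValueError("expected <s1>:<s2> with equal, nonzero lengths")
    return str.maketrans(s1, s2)


table = parse_replace_spec("~.:--")   # '~' becomes '-', '.' becomes '-'
print("AC~G.U".translate(table))
```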

       --small
              Operate in small memory mode for input alignment files in Pfam
              format. If not used, each alignment is stored in memory so the
              required memory will be roughly the size of the largest
              alignment in the input file. With --small, input alignments are
              not stored in memory. This option only works in combination
              with --informat pfam and output format pfam or afa.

COPYRIGHT

       Copyright (C) 2020 Howard Hughes Medical Institute.
       Freely distributed under the BSD open source license.

AUTHOR

       http://eddylab.org

Easel 0.48                         Nov 2020                    esl-reformat(1)