esl-alimerge(1) Easel Manual esl-alimerge(1)

NAME
esl-alimerge - merge alignments based on their reference (RF) annotation

SYNOPSIS
esl-alimerge [options] alifile1 alifile2
(merge two alignment files)

esl-alimerge --list [options] listfile
(merge many alignment files listed in a file)

DESCRIPTION
esl-alimerge reads two or more input alignments, merges them into a
single alignment, and outputs it.

The input alignments must all be in Stockholm format. All alignments
must have reference ('#=GC RF') annotation. Further, the RF annotation
must be identical in all alignments once gap characters in the RF
annotation ('.', '-', '_') have been removed. This requirement allows
alignments with different numbers of total columns to be merged
together based on consistent RF annotation, such as alignments created
by successive runs of the cmalign program of the INFERNAL package using
the same CM. Columns that have a gap character in the RF annotation are
called 'insert' columns.
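
The RF requirement above can be illustrated with a minimal Python
sketch. It is not esl-alimerge's own code: the function names degap_rf
and rf_compatible are hypothetical, and the only gap characters assumed
are the three listed above.

```python
RF_GAP_CHARS = ".-_"  # gap characters in RF annotation, as listed above


def degap_rf(rf: str) -> str:
    """Return the RF string with all gap characters removed."""
    return "".join(c for c in rf if c not in RF_GAP_CHARS)


def rf_compatible(rf_lines: list[str]) -> bool:
    """True if all RF strings are identical once gaps are removed."""
    return len({degap_rf(rf) for rf in rf_lines}) <= 1


# Different total column counts, same degapped RF: mergeable.
print(rf_compatible(["xx.xx", "x..x.xx"]))  # True
# Different number of non-gap RF columns: not mergeable.
print(rf_compatible(["xx.xx", "xxx.x.x"]))  # False
```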

All sequence data in all input alignments will be included in the
output alignment regardless of the output format (see the --outformat
option below). However, sequences in the merged alignment will usually
contain more gaps ('.') than they did in their respective input
alignments. This is because esl-alimerge must add 100% gap columns to
each individual input alignment so that insert columns in the other
input alignments can be accommodated in the merged alignment.
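
As a rough illustration of why gaps are added, the following Python
sketch merges toy alignments by padding every insert region to the
widest such region seen in any input. It is a simplified model, not the
program's actual algorithm: the names split_by_rf and merge are
hypothetical, padding is simply appended to the right of existing
insert characters, the inputs are assumed to have already passed the RF
check, and sequence names are assumed to be unique across inputs.

```python
RF_GAP_CHARS = ".-_"


def split_by_rf(rf, row):
    """Split a row into insert runs and consensus characters, guided by RF.

    inserts[i] is the run of insert-column characters that precedes
    consensus column i; the final entry follows the last consensus column.
    """
    inserts, consensus, run = [], [], []
    for rf_char, c in zip(rf, row):
        if rf_char in RF_GAP_CHARS:
            run.append(c)
        else:
            inserts.append("".join(run))
            consensus.append(c)
            run = []
    inserts.append("".join(run))
    return inserts, consensus


def merge(alignments):
    """alignments: list of (rf, {name: aligned_seq}). Returns (rf, {name: seq})."""
    # Widest insert run at each position across all input alignments.
    ins_lens = [[len(s) for s in split_by_rf(rf, rf)[0]] for rf, _ in alignments]
    widths = [max(col) for col in zip(*ins_lens)]

    def pad(rf, row):
        inserts, consensus = split_by_rf(rf, row)
        out = []
        for i, ins in enumerate(inserts):
            out.append(ins.ljust(widths[i], "."))  # add all-gap columns
            if i < len(consensus):
                out.append(consensus[i])
        return "".join(out)

    merged_rf = pad(alignments[0][0], alignments[0][0])
    merged = {}
    for rf, seqs in alignments:
        for name, row in seqs.items():
            merged[name] = pad(rf, row)
    return merged_rf, merged


rf, merged = merge([("xx.x", {"s1": "ACgU"}), ("x.xx", {"s2": "GaCU"})])
print(rf)            # x.x.x
print(merged["s1"])  # A.CgU
print(merged["s2"])  # GaC.U
```

Each input contributes one insert column at a different position, so
both sequences gain one gap in the five-column result.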

If the output format is Stockholm or Pfam, annotation will be
transferred from the input alignments to the merged alignment as
follows. All per-sequence ('#=GS') and per-residue ('#=GR') annotation
is transferred. Per-file ('#=GF') annotation is transferred if it is
present and identical in all alignments. Per-column ('#=GC') annotation
is transferred if it is present and identical in all alignments once
all insert positions have been removed, and if it contains no non-gap
characters in insert columns.
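
The per-column rule can be sketched in Python as follows. This is only
an interpretation of the paragraph above, not the program's code: the
function name gc_transferable is hypothetical, and the sketch assumes
that the gap characters for '#=GC' annotation are the same three
characters used for RF gaps.

```python
RF_GAP_CHARS = ".-_"  # assumed to apply to '#=GC' annotation as well


def gc_transferable(alignments):
    """alignments: list of (rf, gc) pairs, where gc is one '#=GC' line
    (or None if that alignment lacks it). Returns True if the annotation
    meets the conditions described above."""
    consensus_views = set()
    for rf, gc in alignments:
        if gc is None:
            return False  # must be present in every alignment
        for rf_char, c in zip(rf, gc):
            # Insert columns may only hold gap characters.
            if rf_char in RF_GAP_CHARS and c not in RF_GAP_CHARS:
                return False
        consensus_views.add(
            "".join(c for rf_char, c in zip(rf, gc) if rf_char not in RF_GAP_CHARS)
        )
    # Identical in all alignments once insert positions are removed.
    return len(consensus_views) == 1


print(gc_transferable([("xx.xx", "<<.>>"), ("x.xxx", "<.<>>")]))  # True
print(gc_transferable([("xx.xx", "<<a>>"), ("x.xxx", "<.<>>")]))  # False
```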

With the --list <f> option, <f> is a file listing alignment files to
merge. In the list file, blank lines and lines that start with '#'
(comments) are ignored. Each data line contains a single word: the name
of an alignment file to be merged. All alignments in each file will be
merged.
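
The list file rules can be expressed as a short Python sketch. The
function name read_list_file and the file names in the example are
hypothetical, and rejecting lines with more than one word is a choice
made for this sketch rather than documented behavior of the program.

```python
def read_list_file(text: str) -> list[str]:
    """Return the alignment file names named in a list file's text."""
    names = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip() or line.startswith("#"):
            continue  # blank lines and comment lines are ignored
        words = line.split()
        if len(words) != 1:
            raise ValueError(f"line {lineno}: expected a single file name")
        names.append(words[0])
    return names


example = """# alignments from successive cmalign runs
run1.sto

run2.sto
"""
print(read_list_file(example))  # ['run1.sto', 'run2.sto']
```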

With the --small option, esl-alimerge operates in memory-saving mode:
the RAM required for the merge is minimal (it should be only a few MB)
and independent of the alignment sizes. To use --small, all alignments
must be in Pfam format (non-interleaved, 1 line/sequence Stockholm
format). You can reformat alignments to Pfam using the esl-reformat
Easel miniapp. Without --small, the required RAM will be roughly equal
to the size of the final merged alignment file, which will necessarily
be at least the summed size of all of the input alignment files to be
merged and sometimes several times larger. If you are merging large
alignments or experiencing very slow performance of esl-alimerge, try
reformatting to Pfam and using --small.
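
Before using --small, it can help to check whether a file is already
non-interleaved. The Python heuristic below is only a sketch: the
function name looks_like_pfam is hypothetical, it assumes sequence names
are unique within each alignment, and it does not validate the file in
any other way.

```python
from collections import Counter


def looks_like_pfam(stockholm_text: str) -> bool:
    """True if no sequence name occurs on more than one line per alignment."""
    counts = Counter()
    for line in stockholm_text.splitlines():
        if line.startswith("//"):
            counts.clear()  # end of one alignment; a file may hold several
            continue
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and markup lines
        name = line.split()[0]
        counts[name] += 1
        if counts[name] > 1:
            return False  # same sequence continued in a later block
    return True


interleaved = """# STOCKHOLM 1.0

s1 ACGU
s2 AC-U
#=GC RF xxxx

s1 GGCA
s2 GGCA
#=GC RF xxxx
//
"""
pfam = """# STOCKHOLM 1.0
s1 ACGUGGCA
s2 AC-UGGCA
#=GC RF xxxxxxxx
//
"""
print(looks_like_pfam(interleaved))  # False
print(looks_like_pfam(pfam))         # True
```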
OPTIONS
-h     Print brief help; includes version number and summary of all
       options, including expert options.

-o <f> Output merged alignment to file <f> instead of to stdout.

-v     Be verbose; print information on the size of the alignments
       being merged, and the annotation transferred to the merged
       alignment, to stdout. This option can only be used in
       combination with the -o option (so that the printed info doesn't
       corrupt the output alignment file).

--small
       Operate in memory-saving mode. Required RAM will be independent
       of the sizes of the alignments to merge, instead of roughly the
       size of the eventual merged alignment. When enabled, all
       alignments must be in Pfam Stockholm (non-interleaved, 1
       line/seq) format; see esl-reformat(1). The output alignment will
       be in Pfam format.

--rfonly
       Only include columns that are not gaps in the GC RF annotation
       in the merged alignment.

--outformat <s>
       Write the output alignment in format <s>. Common choices for
       <s> include: stockholm, a2m, afa, psiblast, clustal, phylip.
       The string <s> is case-insensitive (a2m or A2M both work).
       Default is stockholm.

--rna  Specify that the input alignments are RNA alignments. By default
       esl-alimerge will try to autodetect the alphabet, but if the
       alignment is sufficiently small it may be ambiguous. This option
       defines the alphabet as RNA.

--dna  Specify that the input alignments are DNA alignments.

--amino
       Specify that the input alignments are protein alignments.
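
The effect of the --rfonly option described above can be pictured with
a small Python sketch. It only models the column filter, not the
program's implementation; the function name keep_rf_columns is
hypothetical, and the RF gap characters are the three named in the
description.

```python
RF_GAP_CHARS = ".-_"


def keep_rf_columns(rf: str, row: str) -> str:
    """Keep only the characters of row that sit in non-gap RF columns."""
    return "".join(c for rf_char, c in zip(rf, row) if rf_char not in RF_GAP_CHARS)


rf = "x.x.x"
print(keep_rf_columns(rf, "A.CgU"))  # ACU
print(keep_rf_columns(rf, "GaC.U"))  # GCU
```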
SEE ALSO
http://bioeasel.org/

COPYRIGHT
Copyright (C) 2020 Howard Hughes Medical Institute.
Freely distributed under the BSD open source license.

AUTHOR
http://eddylab.org

Easel 0.48 Nov 2020 esl-alimerge(1)