esl-alimap(1) Easel Manual esl-alimap(1)

NAME
esl-alimap - map two alignments to each other

SYNOPSIS
esl-alimap [options] msafile1 msafile2

DESCRIPTION
esl-alimap is a highly specialized application that determines the
optimal mapping of columns between two alignments of the same sequences.
An alignment mapping defines, for each column in alignment 1, a matching
column in alignment 2. The residues of the aligned sequences that occur in
both of two matched columns are considered 'shared' by those two columns.

For example, if the nth residue of sequence i occurs in column x of
alignment 1 and in column y of alignment 2, then only a mapping that
matches column x to column y correctly maps, and therefore shares, that
residue.

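The idea of shared residues can be illustrated with a small Python sketch. It is not esl-alimap's implementation: it assumes each alignment is a list of aligned sequence strings in the same order, treats '-', '.', '_' and '~' as gap characters (an assumption), and uses hypothetical helper names. For each sequence it pairs the column of the nth residue in alignment 1 with the column of the nth residue in alignment 2, then counts how many residues every (x, y) column pair shares.

```python
# Sketch only: alignments are lists of aligned strings, same sequences in the same order.
# The gap alphabet below is an assumption, not Easel's definition.
GAPS = set("-._~")


def residue_columns(aln):
    """For each sequence, the 0-based column of each of its residues, in order."""
    return [[col for col, ch in enumerate(row) if ch not in GAPS] for row in aln]


def shared_matrix(aln1, aln2):
    """shared[x][y] = number of residues in column x of aln1 that are in column y of aln2."""
    ncol1, ncol2 = len(aln1[0]), len(aln2[0])
    shared = [[0] * ncol2 for _ in range(ncol1)]
    for cols1, cols2 in zip(residue_columns(aln1), residue_columns(aln2)):
        # The nth residue of this sequence sits in column cols1[n] of aln1 and cols2[n] of aln2.
        for x, y in zip(cols1, cols2):
            shared[x][y] += 1
    return shared


print(shared_matrix(["AC-G", "A-CG"], ["ACG", "ACG"]))  # [[2, 0, 0], [0, 1, 0], [0, 1, 0], [0, 0, 2]]
```

Every residue lands in exactly one cell, so the cells sum to the total number of residues.
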
The optimal mapping of the two alignments is the one that maximizes the
total number of shared residues, summed over all pairs of matched columns.
The fraction of all residues that are shared is reported as the coverage
in the esl-alimap output.

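Given a table of shared-residue counts for every pair of columns and a candidate mapping, the coverage is the shared total divided by the total number of residues. The Python sketch below assumes a hypothetical representation (a nested list `shared`, and a list `mapping` holding the matched alignment-2 column for each alignment-1 column, or None); it is not esl-alimap's code.

```python
# Sketch only. shared[x][y] counts residues in aln1 column x that are in aln2 column y
# (every residue is counted in exactly one cell); mapping[x] is the aln2 column
# matched to aln1 column x, or None if column x is unmatched.


def coverage(shared, mapping):
    """Fraction of all residues that are shared by the matched column pairs."""
    total = sum(sum(row) for row in shared)
    kept = sum(shared[x][y] for x, y in enumerate(mapping) if y is not None)
    return kept / total if total else 0.0


shared = [[2, 0, 0], [0, 1, 0], [0, 1, 0], [0, 0, 2]]
print(coverage(shared, [0, 1, None, 2]))  # 5 of 6 residues shared
```
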
Only the first alignment in msafile1 and the first alignment in msafile2
are mapped to each other. If either file contains more than one
alignment, all alignments after the first are ignored.

The two alignments (one from each file) must contain exactly the same
sequences in precisely the same order: with all gaps removed, the two
alignments would be identical. Both alignments must also be in Stockholm
format.

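The same-sequences requirement can be checked by removing gaps and comparing the sequences pairwise. This Python sketch assumes alignments are lists of aligned strings, assumes the gap alphabet '-', '.', '_', '~', and compares residues case-insensitively (also an assumption); the function names are hypothetical and this is not how esl-alimap validates its input.

```python
# Sketch only: the gap alphabet and the case-insensitive comparison are assumptions.
GAPS = set("-._~")


def ungap(row):
    """Remove gap characters from one aligned sequence."""
    return "".join(ch for ch in row if ch not in GAPS)


def check_same_sequences(aln1, aln2):
    """Raise ValueError unless both alignments hold the same ungapped sequences in the same order."""
    if len(aln1) != len(aln2):
        raise ValueError(f"sequence counts differ: {len(aln1)} vs {len(aln2)}")
    for i, (row1, row2) in enumerate(zip(aln1, aln2)):
        if ungap(row1).upper() != ungap(row2).upper():
            raise ValueError(f"sequence {i} differs once gaps are removed")


check_same_sequences(["AC-G", "A-CG"], ["ACG", "ACG"])  # passes silently
```
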
The output of esl-alimap depends on whether one or both of the
alignments contain reference (#=GC RF) annotation. If either does, the
coverage for residues in nongap RF positions is reported separately from
the total coverage.

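One plausible reading of the RF-restricted coverage is sketched below in Python: both the shared count and the total are limited to residues that sit in nongap RF columns of alignment 1. This definition, the gap alphabet, and the function name are assumptions; the manual does not spell out how esl-alimap computes the RF coverage, or how it handles RF annotation present only in alignment 2.

```python
# Sketch only: restricts coverage to aln1 columns whose #=GC RF character is not a gap.
# The gap alphabet and this exact definition of "RF coverage" are assumptions.
GAPS = set("-._~")


def rf_coverage(shared, mapping, rf1):
    """Coverage over residues in nongap RF columns of alignment 1.

    shared[x][y]: residues in aln1 column x that are in aln2 column y.
    mapping[x]: aln2 column matched to aln1 column x, or None.
    rf1: the #=GC RF line of alignment 1, one character per column.
    """
    rf_cols = [x for x, ch in enumerate(rf1) if ch not in GAPS]
    total = sum(sum(shared[x]) for x in rf_cols)
    kept = sum(shared[x][mapping[x]] for x in rf_cols if mapping[x] is not None)
    return kept / total if total else 0.0


shared = [[2, 0, 0], [0, 1, 0], [0, 1, 0], [0, 0, 2]]
print(rf_coverage(shared, [0, 1, None, 2], "x.xx"))  # 4 of 5 RF residues shared
```
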
esl-alimap uses a dynamic programming algorithm to compute the optimal
mapping. The algorithm is similar to the Needleman-Wunsch-Sellers
algorithm, but the score used at each step of the recursion is not a
residue-residue comparison score; it is the number of residues shared
between two columns.

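A recursion of this kind can be sketched in Python. The sketch assumes that the mapping is order-preserving and one-to-one, that columns may be left unmatched at no cost, and that the score of matching two columns is their shared-residue count; this is a weighted longest-common-subsequence form of Needleman-Wunsch. esl-alimap's exact recursion and tie-breaking may differ, and the function name is hypothetical.

```python
# Sketch only: a weighted longest-common-subsequence form of Needleman-Wunsch.
# Assumptions (not confirmed by this manual): the mapping is order-preserving and
# one-to-one, columns may stay unmatched at no cost, and there are no gap penalties.


def optimal_mapping(shared):
    """Return (score, mapping) maximizing total shared residues.

    shared[x][y]: residues in aln1 column x that are in aln2 column y.
    mapping[x]: aln2 column matched to aln1 column x, or None if unmatched.
    """
    n1 = len(shared)
    n2 = len(shared[0]) if n1 else 0
    # S[i][j] = best total using the first i aln1 columns and the first j aln2 columns.
    S = [[0] * (n2 + 1) for _ in range(n1 + 1)]
    for i in range(1, n1 + 1):
        for j in range(1, n2 + 1):
            S[i][j] = max(
                S[i - 1][j - 1] + shared[i - 1][j - 1],  # match column i-1 with j-1
                S[i - 1][j],  # leave aln1 column i-1 unmatched
                S[i][j - 1],  # leave aln2 column j-1 unmatched
            )
    # Traceback, preferring a match whenever it shares at least one residue.
    mapping = [None] * n1
    i, j = n1, n2
    while i > 0 and j > 0:
        s = shared[i - 1][j - 1]
        if s > 0 and S[i][j] == S[i - 1][j - 1] + s:
            mapping[i - 1] = j - 1
            i, j = i - 1, j - 1
        elif S[i][j] == S[i - 1][j]:
            i -= 1
        else:
            j -= 1
    return S[n1][n2], mapping


print(optimal_mapping([[2, 0, 0], [0, 1, 0], [0, 1, 0], [0, 0, 2]]))  # (5, [0, None, 1, 2])
```

The table takes O(n1 * n2) time and memory for alignments of n1 and n2 columns. In the example, alignment-1 columns 2 and 3 (1-based) each share one residue with alignment-2 column 2, so under the one-to-one assumption only one of them can be matched and the best score is 5 of 6 residues.
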
The --mask-a2a <f>, --mask-a2rf <f>, --mask-rf2a <f>, and --mask-rf2rf <f>
options save 'mask' files that describe the optimal mapping in slightly
different ways. A mask file consists of a single line of only '0' and '1'
characters. These denote which positions of the alignment from msafile1
map to positions of the alignment from msafile2, as described below for
each of the four masking options. Such a mask can be given to the
esl-alimask miniapp to extract only those columns of the msafile1
alignment that optimally map to columns of the msafile2 alignment. To
extract the corresponding set of columns from the msafile2 alignment
(those that optimally map to columns of the msafile1 alignment), rerun
esl-alimap with the order of the two alignment files reversed, save new
masks, and run esl-alimask again.

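Following the definitions given under OPTIONS for --mask-a2a, --mask-a2rf, --mask-rf2a, and --mask-rf2rf, all four masks can be derived from one column mapping. This Python sketch assumes `mapping[x]` holds the 0-based alignment-2 column matched to alignment-1 column x (or None), that each #=GC RF line is given as a string, and that '-', '.', '_' and '~' are gap characters; the function names are hypothetical and this is not esl-alimap's code.

```python
# Sketch only: mask semantics follow the option descriptions; the gap alphabet is assumed.
GAPS = set("-._~")


def nongap_rf(rf):
    """0-based alignment columns whose #=GC RF character is not a gap."""
    return [x for x, ch in enumerate(rf) if ch not in GAPS]


def make_masks(mapping, rf1, rf2):
    """Build the a2a, a2rf, rf2a and rf2rf mask strings from a column mapping."""
    rf2_cols = set(nongap_rf(rf2))
    # One character per aln1 column: mapped at all / mapped to a nongap RF column of aln2.
    a2a = "".join("1" if y is not None else "0" for y in mapping)
    a2rf = "".join("1" if y in rf2_cols else "0" for y in mapping)
    # One character per nongap RF position of aln1.
    rf1_cols = nongap_rf(rf1)
    rf2a = "".join(a2a[x] for x in rf1_cols)
    rf2rf = "".join(a2rf[x] for x in rf1_cols)
    return {"a2a": a2a, "a2rf": a2rf, "rf2a": rf2a, "rf2rf": rf2rf}


print(make_masks([0, None, 1, 2], "xx.x", "x.x"))
# {'a2a': '1011', 'a2rf': '1001', 'rf2a': '101', 'rf2rf': '101'}
```

The a2a and a2rf masks are as long as the msafile1 alignment, while the rf2a and rf2rf masks are as long as its number of nongap RF positions.
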
OPTIONS
-h Print brief help; includes version number and summary of all
options.

-q Be quiet; don't print information about the optimal mapping of each
column. Only report the coverage and, if requested, save masks to the
optional output files.

--mask-a2a <f>
Save a mask of '0's and '1's to file <f>. A '1' at position x means that
position x of the alignment from msafile1 maps to an alignment position
in the alignment from msafile2 in the optimal mapping.

--mask-a2rf <f>
Save a mask of '0's and '1's to file <f>. A '1' at position x means that
position x of the alignment from msafile1 maps to a nongap RF position in
the alignment from msafile2 in the optimal mapping.

--mask-rf2a <f>
Save a mask of '0's and '1's to file <f>. A '1' at position x means that
nongap RF position x of the alignment from msafile1 maps to an alignment
position in the alignment from msafile2 in the optimal mapping.

--mask-rf2rf <f>
Save a mask of '0's and '1's to file <f>. A '1' at position x means that
nongap RF position x of the alignment from msafile1 maps to a nongap RF
position in the alignment from msafile2 in the optimal mapping.

--submap <f>
Specify that all of the columns of the alignment from msafile2 exist
identically (contain the same residues from all sequences) in the
alignment from msafile1; not all columns of msafile1 need to exist in
msafile2. This makes the task of mapping trivial. Save the mask to file
<f>. A '1' at position x of the mask means that position x of the
alignment from msafile1 is the same as position y of the alignment from
msafile2, where y is the number of '1's that occur at positions <= x in
the mask.

--amino
Assert that msafile1 and msafile2 contain protein sequences.

--dna Assert that msafile1 and msafile2 contain DNA sequences.

--rna Assert that msafile1 and msafile2 contain RNA sequences.
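For --submap, one way to picture the mask is a left-to-right scan that matches each column of the msafile2 alignment, in order, to the next identical column of the msafile1 alignment. This Python sketch assumes both alignments are lists of aligned strings holding the same sequences in the same order, and that "identical" means the column strings match character for character; esl-alimap may compare columns differently, and the function name is hypothetical.

```python
# Sketch only: greedy in-order matching of aln2 columns to identical aln1 columns.


def submap_mask(aln1, aln2):
    """Mask over aln1 columns: '1' where the column is the next unmatched aln2 column."""
    cols1 = ["".join(row[x] for row in aln1) for x in range(len(aln1[0]))]
    cols2 = ["".join(row[y] for row in aln2) for y in range(len(aln2[0]))]
    mask = []
    y = 0  # index of the next aln2 column still waiting for a match
    for col in cols1:
        if y < len(cols2) and col == cols2[y]:
            mask.append("1")
            y += 1
        else:
            mask.append("0")
    if y != len(cols2):
        raise ValueError("some aln2 columns do not occur, in order, in aln1")
    return "".join(mask)


print(submap_mask(["AC-G", "A-CG"], ["AG", "AG"]))  # 1001
```

The 1-based alignment-2 column for a '1' at mask position x is the count of '1's at positions <= x, as in the --submap description. If the msafile1 alignment contains several identical columns, this greedy scan picks the leftmost candidate.
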

SEE ALSO
http://bioeasel.org/

COPYRIGHT
Copyright (C) 2020 Howard Hughes Medical Institute.
Freely distributed under the BSD open source license.

AUTHOR
http://eddylab.org

Easel 0.48 Nov 2020 esl-alimap(1)