esl-alimap(1)                    Easel Manual                    esl-alimap(1)

NAME

       esl-alimap - map two alignments to each other

SYNOPSIS

       esl-alimap [options] msafile1 msafile2

DESCRIPTION

       esl-alimap is a highly specialized application that determines the
       optimal alignment mapping of columns between two alignments of the same
       sequences. An alignment mapping defines for each column in alignment 1
       a matching column in alignment 2. The residues of the aligned sequences
       that occur in both of two matched columns are considered 'shared' by
       those two columns.

       For example, if the nth residue of sequence i occurs in alignment 1
       column x and alignment 2 column y, then only a mapping of alignment 1
       and 2 that includes column x mapping to column y would correctly map
       and share the residue.

       The optimal mapping of the two alignments is the mapping that maximizes
       the sum of shared residues between all pairs of matching columns. The
       fraction of total residues that are shared is reported as the coverage
       in the esl-alimap output.

       Only the first alignments in msafile1 and msafile2 will be mapped to
       each other. If the files contain more than one alignment, all
       alignments after the first will be ignored.

       The two alignments (one from each file) must contain exactly the same
       sequences (if they were unaligned, they'd be identical) in precisely
       the same order. They must also be in Stockholm format.

       The output of esl-alimap differs depending on whether one or both of
       the alignments contain reference (#=GC RF) annotation. If so, the
       coverage for residues from nongap RF positions will be reported
       separately from the total coverage.

       esl-alimap uses a dynamic programming algorithm to compute the optimal
       mapping. The algorithm is similar to the Needleman-Wunsch-Sellers
       algorithm, but the scores used at each step of the recursion are not
       residue-residue comparison scores but rather the number of residues
       shared between two columns.

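
       The recursion described above can be illustrated with a minimal Python
       sketch. It is not the esl-alimap source: the function names, the set of
       gap characters, the tie-breaking in the traceback, and the use of plain
       strings instead of Stockholm files are all assumptions made for the
       illustration.

```python
# Illustrative sketch only; names and gap characters are assumptions.
GAPS = set("-._~")


def residue_columns(aligned_seq):
    """Return the 0-based column of each residue, in sequence order."""
    return [col for col, ch in enumerate(aligned_seq) if ch not in GAPS]


def shared_counts(aln1, aln2):
    """share[x][y] = residues found in column x of aln1 and column y of aln2."""
    share = [[0] * len(aln2[0]) for _ in range(len(aln1[0]))]
    total = 0
    for seq1, seq2 in zip(aln1, aln2):
        cols1, cols2 = residue_columns(seq1), residue_columns(seq2)
        if len(cols1) != len(cols2):
            raise ValueError("alignments must contain the same sequences")
        for x, y in zip(cols1, cols2):
            share[x][y] += 1
        total += len(cols1)
    return share, total


def optimal_map(aln1, aln2):
    """Return ({aln1 column: aln2 column}, coverage) for the best mapping."""
    share, total = shared_counts(aln1, aln2)
    n, m = len(share), len(share[0])
    # best[i][j]: maximum shared residues using the first i and j columns.
    best = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best[i][j] = max(best[i - 1][j],
                             best[i][j - 1],
                             best[i - 1][j - 1] + share[i - 1][j - 1])
    # Trace back, recording only column pairs that actually share residues.
    mapping = {}
    i, j = n, m
    while i > 0 and j > 0:
        s = share[i - 1][j - 1]
        if s > 0 and best[i][j] == best[i - 1][j - 1] + s:
            mapping[i - 1] = j - 1
            i, j = i - 1, j - 1
        elif best[i][j] == best[i - 1][j]:
            i -= 1
        else:
            j -= 1
    coverage = best[n][m] / total if total else 0.0
    return mapping, coverage
```

       In this sketch the coverage is the maximized sum of shared residues
       divided by the total number of residues, matching the definition given
       above. Columns are 0-based here, whereas the mask options below are
       described in terms of positions.
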
       The --mask-a2a <f>, --mask-a2rf <f>, --mask-rf2a <f>, and --mask-rf2rf
       <f> options create 'mask' files that pertain to the optimal mapping in
       slightly different ways. A mask file consists of a single line of only
       '0' and '1' characters. These denote which positions of the alignment
       from msafile1 map to positions of the alignment from msafile2, as
       described below for each of the four masking options. These masks can
       be used with the esl-alimask miniapp to extract only those columns of
       the msafile1 alignment that optimally map to columns of the msafile2
       alignment. To extract the corresponding set of columns from msafile2
       (those that optimally map to columns of the alignment from msafile1),
       it is necessary to rerun the program with the order of the two msafiles
       reversed, save new masks, and use esl-alimask again.
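
       To make the meaning of a mask concrete, the following Python sketch
       keeps only the columns marked '1'. It is an illustration, not a
       replacement for esl-alimask: the function name is hypothetical, the
       alignment is given as plain aligned strings rather than a Stockholm
       file, and the mask is assumed to have one character per alignment
       column (as for --mask-a2a and --mask-a2rf).

```python
# Illustrative sketch only; apply_mask is a hypothetical helper.
def apply_mask(mask, aligned_seqs):
    """Keep the columns of each aligned sequence where the mask has a '1'."""
    mask = mask.strip()
    if set(mask) - {"0", "1"}:
        raise ValueError("mask must contain only '0' and '1' characters")
    keep = [i for i, bit in enumerate(mask) if bit == "1"]
    extracted = []
    for seq in aligned_seqs:
        if len(seq) != len(mask):
            raise ValueError("mask length must equal alignment length")
        extracted.append("".join(seq[i] for i in keep))
    return extracted
```
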

OPTIONS

       -h     Print brief help; includes version number and summary of all
              options.

       -q     Be quiet; don't print information on the optimal mapping of each
              column, only report coverage and, if requested, save masks to
              optional output files.

       --mask-a2a <f>
              Save a mask of '0's and '1's to file <f>. A '1' at position x
              means that position x of the alignment from msafile1 maps to an
              alignment position in the alignment from msafile2 in the optimal
              map.

       --mask-a2rf <f>
              Save a mask of '0's and '1's to file <f>. A '1' at position x
              means that position x of the alignment from msafile1 maps to a
              nongap RF position in the alignment from msafile2 in the optimal
              map.

       --mask-rf2a <f>
              Save a mask of '0's and '1's to file <f>. A '1' at position x
              means that nongap RF position x of the alignment from msafile1
              maps to an alignment position in the alignment from msafile2 in
              the optimal map.

       --mask-rf2rf <f>
              Save a mask of '0's and '1's to file <f>. A '1' at position x
              means that nongap RF position x of the alignment from msafile1
              maps to a nongap RF position in the alignment from msafile2 in
              the optimal map.

       --submap <f>
              Specify that all of the columns from the alignment from msafile2
              exist identically (contain the same residues from all sequences)
              in the alignment from msafile1. This makes the task of mapping
              trivial. However, not all columns of msafile1 must exist in
              msafile2. Save the mask to file <f>. A '1' at position x of
              the mask means that position x of the alignment from msafile1 is
              the same as position y of msafile2, where y is the number of
              '1's that occur at positions <= x in the mask.

       --amino
              Assert that msafile1 and msafile2 contain protein sequences.

       --dna  Assert that msafile1 and msafile2 contain DNA sequences.

       --rna  Assert that msafile1 and msafile2 contain RNA sequences.
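
       The position rule given for --submap can be worked through with a short
       Python sketch. The function name is hypothetical and positions are
       1-based, as in the option description; this only restates the rule
       "y is the number of '1's at positions <= x" and is not taken from the
       esl-alimap source.

```python
# Illustrative sketch only; submap_positions is a hypothetical helper.
def submap_positions(mask):
    """Map each 1-based msafile1 position x with a '1' to its msafile2 position y."""
    mapping = {}
    ones_seen = 0
    for x, bit in enumerate(mask.strip(), start=1):
        if bit == "1":
            ones_seen += 1          # y = number of '1's at positions <= x
            mapping[x] = ones_seen
    return mapping
```

       Positions of msafile1 that carry a '0' in the mask have no counterpart
       in msafile2 and are absent from the returned dictionary.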

SEE ALSO

       http://bioeasel.org/

COPYRIGHT

       Copyright (C) 2020 Howard Hughes Medical Institute.
       Freely distributed under the BSD open source license.

AUTHOR

       http://eddylab.org

Easel 0.48                         Nov 2020                      esl-alimap(1)