esl-compalign(1)               Easel Manual               esl-compalign(1)

NAME
esl-compalign - compare two multiple sequence alignments

SYNOPSIS
esl-compalign [options] trusted_file test_file

DESCRIPTION
esl-compalign evaluates the accuracy of a predicted multiple sequence
alignment with respect to a trusted alignment of the same sequences.

The trusted_file and test_file must contain the same number of
alignments. Each predicted alignment in the test_file is compared
against a single trusted alignment from the trusted_file: the first
alignments in each file correspond to each other and are compared, the
second alignments in each file correspond to each other and are
compared, and so on. Each corresponding pair of alignments must contain
the same sequences (i.e. if they were unaligned they would be
identical) in the same order in both files. Further, both alignment
files must be in Stockholm format and contain 'reference' annotation,
which appears as "#=GC RF" per-column markup for each alignment. The
number of nongap characters (those other than '.') in the reference
(RF) annotation must be identical between all corresponding alignments
in the two files.

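As an illustration only, the input requirements above can be checked
with a short Python sketch. The function names (rf_length, unaligned,
check_pair) are hypothetical and are not part of Easel. The sketch
assumes each alignment has already been parsed into an RF string plus a
list of (name, aligned sequence) pairs, treats '.' as the only RF gap
character as stated above, and assumes that any non-letter in a
sequence row is a gap and that letter case is not significant.

```python
RF_GAP = "."  # per the text above, '.' marks a gap in the RF annotation


def rf_length(rf):
    """Count the nongap (non-'.') characters in an RF annotation string."""
    return sum(1 for ch in rf if ch != RF_GAP)


def unaligned(aligned_seq):
    """Drop gap characters from an aligned row.

    Assumption: every non-letter is a gap, and case is ignored.
    """
    return "".join(ch for ch in aligned_seq if ch.isalpha()).upper()


def check_pair(trusted_rf, trusted_seqs, test_rf, test_seqs):
    """Return a list of problems for one trusted/test alignment pair.

    trusted_seqs and test_seqs are lists of (name, aligned_sequence).
    An empty list means the pair satisfies the stated requirements.
    """
    problems = []
    if rf_length(trusted_rf) != rf_length(test_rf):
        problems.append("RF annotations differ in nongap length")
    if len(trusted_seqs) != len(test_seqs):
        problems.append("alignments differ in number of sequences")
    # Sequences must be identical, and in the same order, once unaligned.
    for (t_name, t_seq), (p_name, p_seq) in zip(trusted_seqs, test_seqs):
        if unaligned(t_seq) != unaligned(p_seq):
            problems.append(f"{t_name} / {p_name}: unaligned sequences differ")
    return problems
```

Calling check_pair once per corresponding pair of alignments, in file
order, mirrors the first-with-first, second-with-second pairing
described above.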
esl-compalign reads an alignment from each file and compares them
based on their 'reference' annotation. The number of correctly
predicted residues for each sequence is computed as follows. A residue
that is in the Nth nongap RF column in the trusted alignment must also
appear in the Nth nongap RF column in the predicted alignment to be
counted as 'correct'; otherwise it is 'incorrect'. A residue that
appears in a gap RF column in the trusted alignment between nongap RF
columns N and N+1 must also appear in a gap RF column in the predicted
alignment between nongap RF columns N and N+1 to be counted as
'correct'; otherwise it is 'incorrect'.

The default output of esl-compalign lists each sequence and the number
of correctly and incorrectly predicted residues for that sequence.
These counts are broken down into counts for residues in the predicted
alignments that occur in 'match' columns and 'insert' columns. A
'match' column is one for which the RF annotation does not contain a
gap. An 'insert' column is one for which the RF annotation does contain
a gap.

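The scoring rule and the match/insert breakdown described above can be
sketched in Python as follows. This is a reading of the description,
not the Easel source code: residue_positions and score_sequence are
hypothetical names, '.' is taken as the only RF gap character, and any
letter in a sequence row is assumed to be a residue. A residue is
labelled by its position relative to the RF annotation, and it is
'correct' when that label is the same in both alignments.

```python
from collections import Counter

RF_GAP = "."  # per the text above, '.' marks a gap in the RF annotation


def residue_positions(rf, aligned_seq):
    """Return one (kind, n) label per residue, in sequence order.

    kind is 'match' if the residue sits in the n-th nongap RF column
    (1-based), or 'insert' if it sits in a gap RF column that follows
    nongap column n (n == 0 before the first nongap column).
    """
    if len(rf) != len(aligned_seq):
        raise ValueError("RF and sequence row must have the same length")
    labels = []
    n = 0
    for rf_ch, res in zip(rf, aligned_seq):
        is_match_col = rf_ch != RF_GAP
        if is_match_col:
            n += 1
        if res.isalpha():  # assumption: letters are residues, the rest gaps
            labels.append(("match" if is_match_col else "insert", n))
    return labels


def score_sequence(trusted_rf, trusted_seq, test_rf, test_seq):
    """Count correct/incorrect residues, split by predicted column kind."""
    trusted = residue_positions(trusted_rf, trusted_seq)
    predicted = residue_positions(test_rf, test_seq)
    if len(trusted) != len(predicted):
        raise ValueError("sequences differ once unaligned")
    counts = Counter()
    for t, p in zip(trusted, predicted):
        outcome = "correct" if t == p else "incorrect"
        # The breakdown uses the column kind in the predicted alignment.
        counts[(p[0], outcome)] += 1
    return counts
```

For example, score_sequence("xx.x", "ACgU", "x.xx", "AcGU") counts the
first and last residues as correct match residues, the second residue
as an incorrect insert residue, and the third as an incorrect match
residue.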
OPTIONS
-h     Print brief help; includes version number and summary of all
       options.

-c     Print per-column statistics instead of per-sequence statistics.

-p     Print statistics on accuracy versus posterior probability
       values. The test_file must be annotated with posterior
       probabilities (#=GR PP) for this option to work.

--p-mask <f>
       This option may only be used in combination with the -p option.
       Read a "mask" from file <f>. The mask file must consist of a
       single line containing only '0' and '1' characters. There must
       be exactly RFLEN characters, where RFLEN is the number of nongap
       characters in the RF annotation of all alignments in both
       trusted_file and test_file. Positions of the mask that are '1'
       characters indicate that the corresponding nongap RF position is
       included by the mask. The posterior probability accuracy
       statistics for match columns will only pertain to positions that
       are included by the mask; those that are excluded will be
       omitted from the accuracy calculation.

--c2dfile <f>
       Save a 'draw file' to file <f> which can be read into the
       esl-ssdraw miniapp. This draw file will define two PostScript
       pages for esl-ssdraw. The first page will depict the frequency
       of errors per match position and the frequency of gaps per match
       position, indicated by magenta and yellow, respectively: the
       darker the magenta, the more errors, and the darker the yellow,
       the more gaps. The second page will depict the frequency of
       errors in insert positions in shades of magenta: the darker the
       magenta, the more errors in inserts after each position. See the
       esl-ssdraw documentation for more information on these diagrams.

--amino
       Assert that trusted_file and test_file contain protein
       sequences.

--dna  Assert that trusted_file and test_file contain DNA sequences.

--rna  Assert that the trusted_file and test_file contain RNA
       sequences.

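To make the -p and --p-mask descriptions above more concrete, here is a
rough Python sketch of validating a mask and tallying accuracy by
posterior probability character. It does not reproduce esl-compalign's
actual output or internals: parse_mask and tally_by_pp are hypothetical
names, the per-residue correctness flags are assumed to come from a
separate scoring pass, PP characters are used as opaque labels, and
only match-column residues are tallied.

```python
from collections import defaultdict

RF_GAP = "."  # per the description, '.' marks a gap in the RF annotation


def parse_mask(text, rflen):
    """Validate mask file contents: one line of '0'/'1', rflen long.

    Returns a list of booleans, True where the position is included.
    """
    lines = text.splitlines()
    if len(lines) != 1:
        raise ValueError("mask must consist of a single line")
    mask = lines[0]
    if set(mask) - {"0", "1"}:
        raise ValueError("mask may contain only '0' and '1'")
    if len(mask) != rflen:
        raise ValueError("mask length must equal RFLEN")
    return [ch == "1" for ch in mask]


def tally_by_pp(test_rf, test_seq, test_pp, correct_flags, mask=None):
    """Tally [correct, total] per PP character for match-column residues.

    correct_flags holds one boolean per residue of test_seq, in order.
    If a mask is given, excluded nongap RF positions are skipped.
    """
    tallies = defaultdict(lambda: [0, 0])
    flags = iter(correct_flags)
    n = 0  # index of the current nongap RF column (1-based)
    for rf_ch, res, pp in zip(test_rf, test_seq, test_pp):
        is_match_col = rf_ch != RF_GAP
        if is_match_col:
            n += 1
        if not res.isalpha():  # assumption: non-letters are gaps
            continue
        ok = next(flags)
        if not is_match_col:
            continue  # this sketch leaves insert residues out
        if mask is not None and not mask[n - 1]:
            continue  # position excluded by the mask
        tallies[pp][1] += 1
        if ok:
            tallies[pp][0] += 1
    return dict(tallies)
```

With the mask "101", a residue in the second nongap RF column is left
out of the tally, while residues in the first and third columns are
still counted.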
SEE ALSO
http://bioeasel.org/

COPYRIGHT
Copyright (C) 2020 Howard Hughes Medical Institute.
Freely distributed under the BSD open source license.

AUTHOR
http://eddylab.org

Easel 0.48                        Nov 2020                esl-compalign(1)