esl-compalign(1)                 Easel Manual                 esl-compalign(1)

NAME

       esl-compalign - compare two multiple sequence alignments

SYNOPSIS

       esl-compalign [options] trusted_file test_file

DESCRIPTION

       esl-compalign evaluates the accuracy of a predicted multiple sequence
       alignment with respect to a trusted alignment of the same sequences.

       The trusted_file and test_file must contain the same number of
       alignments. Each predicted alignment in the test_file is compared
       against a single trusted alignment from the trusted_file: the first
       alignments in each file are compared with each other, then the
       second alignments, and so on. Each corresponding pair of alignments
       must contain the same sequences (i.e. if they were unaligned they
       would be identical) in the same order in both files. Further, both
       alignment files must be in Stockholm format and contain 'reference'
       annotation, which appears as "#=GC RF" per-column markup for each
       alignment. The number of nongap characters (characters other than
       '.') in the reference (RF) annotation must be identical between all
       corresponding alignments in the two files.

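       The RF requirement above can be checked before running the program.
       The following is a minimal Python sketch, not part of Easel: the
       helper names rf_annotation and rf_nongap_length and the two toy
       alignments are hypothetical, and it assumes one alignment per input
       string with "#=GC RF" markup and '.' as the only RF gap character.

```python
def rf_annotation(stockholm_text):
    """Return the concatenated '#=GC RF' markup of one Stockholm alignment."""
    parts = []
    for line in stockholm_text.splitlines():
        fields = line.split()
        # A per-column RF line has the form: "#=GC RF <markup>".
        # Interleaved alignments repeat it once per block, so concatenate.
        if len(fields) == 3 and fields[0] == "#=GC" and fields[1] == "RF":
            parts.append(fields[2])
    return "".join(parts)


def rf_nongap_length(rf):
    """Count nongap RF characters (anything other than '.')."""
    return sum(1 for ch in rf if ch != ".")


# Toy alignments, invented for illustration only.
TRUSTED = """# STOCKHOLM 1.0
seq1         AC.GU
seq2         ACAGU
#=GC RF      xx.xx
//
"""

TEST = """# STOCKHOLM 1.0
seq1         A.CGU
seq2         ACAGU
#=GC RF      x.xxx
//
"""

trusted_rflen = rf_nongap_length(rf_annotation(TRUSTED))
test_rflen = rf_nongap_length(rf_annotation(TEST))
if trusted_rflen != test_rflen:
    raise SystemExit("RF nongap lengths differ; the files are not comparable")
```

       Here both alignments have four nongap RF columns, so they satisfy
       the requirement even though their gap columns fall in different
       places.
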
       esl-compalign reads an alignment from each file and compares them
       based on their 'reference' annotation. The number of correctly
       predicted residues for each sequence is computed as follows. A
       residue that is in the Nth nongap RF column in the trusted alignment
       must also appear in the Nth nongap RF column in the predicted
       alignment to be counted as 'correct'; otherwise it is 'incorrect'. A
       residue that appears in a gap RF column in the trusted alignment
       between nongap RF columns N and N+1 must also appear in a gap RF
       column in the predicted alignment between nongap RF columns N and
       N+1 to be counted as 'correct'; otherwise it is 'incorrect'.

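       The scoring rule described above can be sketched in Python as
       follows. This is an illustration of the rule as stated in this
       page, not the Easel implementation: the function names are
       hypothetical, and it assumes that '.' and '-' are the gap
       characters in aligned sequences and that '.' marks a gap RF column.

```python
GAP_CHARS = ".-"  # assumption: characters treated as gaps in aligned sequences


def residue_positions(aligned_seq, rf):
    """Map each residue, in order, to ('match', N) or ('insert', N).

    ('match', N)  : the residue sits in the Nth nongap RF column (1-based).
    ('insert', N) : the residue sits in a gap RF column between nongap
                    RF columns N and N+1 (N = 0 before the first one).
    """
    positions = []
    n = 0  # number of nongap RF columns seen so far
    for res, rf_char in zip(aligned_seq, rf):
        is_match_col = rf_char != "."
        if is_match_col:
            n += 1
        if res not in GAP_CHARS:
            positions.append(("match" if is_match_col else "insert", n))
    return positions


def count_correct(trusted_seq, trusted_rf, test_seq, test_rf):
    """Return (correct, incorrect) residue counts for one sequence."""
    trusted = residue_positions(trusted_seq, trusted_rf)
    predicted = residue_positions(test_seq, test_rf)
    if len(trusted) != len(predicted):
        raise ValueError("sequences differ once unaligned")
    # A residue is correct when it occupies the same match column, or the
    # same insert region, in both alignments.
    correct = sum(1 for t, p in zip(trusted, predicted) if t == p)
    return correct, len(trusted) - correct


# Toy example: the second and third residues swap between match and insert.
print(count_correct("ACAGU", "xx.xx", "ACAGU", "x.xxx"))  # (3, 2)
```

       In the toy example the first, fourth and fifth residues keep their
       match columns and are correct; the second moves from match column 2
       into an insert, and the third moves from an insert into match
       column 2, so both are incorrect.
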
       The default output of esl-compalign lists each sequence and the
       number of correctly and incorrectly predicted residues for that
       sequence. These counts are broken down into counts for residues in
       the predicted alignments that occur in 'match' columns and 'insert'
       columns. A 'match' column is one for which the RF annotation does
       not contain a gap. An 'insert' column is one for which the RF
       annotation does contain a gap.

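       The match/insert distinction can be illustrated with a short Python
       sketch. The function names are hypothetical, and the sketch assumes
       '.' is the RF gap character and that '.' and '-' are sequence gap
       characters; it only tallies residues per column type and does not
       reproduce the program's output format.

```python
from collections import Counter


def column_types(rf):
    """Label each alignment column 'match' (nongap RF) or 'insert' (gap RF)."""
    return ["match" if ch != "." else "insert" for ch in rf]


def residues_by_column_type(aligned_seq, rf, gap_chars=".-"):
    """Count the residues of one aligned sequence in match and insert columns."""
    counts = Counter({"match": 0, "insert": 0})
    for res, kind in zip(aligned_seq, column_types(rf)):
        if res not in gap_chars:
            counts[kind] += 1
    return counts


counts = residues_by_column_type("ACAGU", "x.xxx")
print(counts["match"], counts["insert"])  # 4 1
```
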

OPTIONS

       -h     Print brief help; includes version number and summary of all
              options.

       -c     Print per-column statistics instead of per-sequence statistics.

       -p     Print statistics on accuracy versus posterior probability
              values. The test_file must be annotated with posterior
              probabilities (#=GR PP) for this option to work.

EXPERT OPTIONS

       --p-mask <f>
              This option may only be used in combination with the -p option.
              Read a "mask" from file <f>. The mask file must consist of a
              single line containing only '0' and '1' characters. There must
              be exactly RFLEN characters, where RFLEN is the number of
              nongap characters in the RF annotation of all alignments in
              both trusted_file and test_file. A '1' at a mask position
              indicates that the corresponding nongap RF position is included
              by the mask. The posterior probability accuracy statistics for
              match columns pertain only to positions that are included by
              the mask; excluded positions are ignored in the accuracy
              calculation.

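              The mask file constraints described above can be checked
              with a small Python sketch. The function name parse_mask is
              hypothetical and this is not how Easel reads the file; it
              only restates the rules given here (one line, '0' and '1'
              characters only, exactly RFLEN characters).

```python
def parse_mask(mask_text, rflen):
    """Validate a mask and return the 1-based nongap RF positions it includes."""
    lines = mask_text.splitlines()
    if len(lines) != 1:
        raise ValueError("mask file must consist of a single line")
    mask = lines[0]
    if set(mask) - {"0", "1"}:
        raise ValueError("mask may contain only '0' and '1' characters")
    if len(mask) != rflen:
        raise ValueError(f"mask has {len(mask)} characters, expected {rflen}")
    # Positions marked '1' are included; positions marked '0' are excluded.
    return [i + 1 for i, ch in enumerate(mask) if ch == "1"]


print(parse_mask("1011\n", rflen=4))  # [1, 3, 4]
```

              With this mask, the second nongap RF position would be left
              out of the match-column accuracy statistics.
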
       --c2dfile <f>
              Save a 'draw file' to file <f> which can be read into the
              esl-ssdraw miniapp. This draw file will define two postscript
              pages for esl-ssdraw. The first page will depict the frequency
              of errors per match position and the frequency of gaps per
              match position, indicated by magenta and yellow, respectively.
              The darker the magenta, the more errors; the darker the yellow,
              the more gaps. The second page will depict the frequency of
              errors in insert positions in shades of magenta: the darker the
              magenta, the more errors in inserts after each position. See
              the esl-ssdraw documentation for more information on these
              diagrams.

       --amino
              Assert that trusted_file and test_file contain protein
              sequences.

       --dna  Assert that trusted_file and test_file contain DNA sequences.

       --rna  Assert that trusted_file and test_file contain RNA sequences.

SEE ALSO

       http://bioeasel.org/

COPYRIGHT

       Copyright (C) 2020 Howard Hughes Medical Institute.
       Freely distributed under the BSD open source license.

AUTHOR

       http://eddylab.org

Easel 0.48                         Nov 2020                   esl-compalign(1)