esl-alimerge(1)

esl-alimerge(1)                  Easel Manual                  esl-alimerge(1)

NAME

       esl-alimerge - merge alignments based on their reference (RF) annotation

SYNOPSIS

       esl-alimerge [options] alifile1 alifile2
         (merge two alignment files)

       esl-alimerge --list [options] listfile
         (merge many alignment files listed in a file)

DESCRIPTION

       esl-alimerge reads two or more input alignments, merges them into a
       single alignment, and outputs it.

       The input alignments must all be in Stockholm format. All alignments
       must have reference ('#=GC RF') annotation. Further, the RF annotation
       must be identical in all alignments once gap characters in the RF
       annotation ('.', '-', '_') have been removed. This requirement allows
       alignments with different numbers of total columns to be merged
       together based on consistent RF annotation, such as alignments created
       by successive runs of the cmalign program of the INFERNAL package using
       the same CM. Columns that have a gap character in the RF annotation are
       called 'insert' columns.

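       The RF requirement above can be illustrated with a short Python sketch.
       This is not esl-alimerge's own code; the function names are
       hypothetical, and the RF strings are made-up sample data.

```python
RF_GAP_CHARS = "._-"  # gap characters named in the text above


def strip_rf_gaps(rf: str) -> str:
    """Return the RF line with gap (insert) columns removed."""
    return "".join(ch for ch in rf if ch not in RF_GAP_CHARS)


def rf_compatible(rf_lines: list[str]) -> bool:
    """True if all RF lines are identical once gap characters are removed."""
    return len({strip_rf_gaps(rf) for rf in rf_lines}) <= 1


# Different total widths, same consensus columns: mergeable.
print(rf_compatible(["xx..xx", "x.x.x.x"]))  # True
# Different number of consensus columns: not mergeable.
print(rf_compatible(["xxxx", "xxx."]))       # False
```
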
       All sequence data in all input alignments will be included in the
       output alignment regardless of the output format (see the --outformat
       option below). However, sequences in the merged alignment will usually
       contain more gaps ('.') than they did in their respective input
       alignments. This is because esl-alimerge must add columns consisting
       entirely of gaps to each individual input alignment so that insert
       columns in the other input alignments can be accommodated in the merged
       alignment.

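       The padding described above can be sketched in Python as follows. This
       is a minimal illustration under stated assumptions, not the program's
       actual algorithm: the function names are hypothetical, inserts are
       left-justified within each gap run (the real placement may differ), and
       sequence names are assumed to be unique across inputs.

```python
RF_GAP_CHARS = "._-"


def split_by_rf(rf, row):
    """Split a row into (inserts, consensus). inserts[k] holds the insert
    characters before consensus column k; inserts[-1] holds trailing ones."""
    inserts, consensus, pending = [], [], []
    for rf_ch, ch in zip(rf, row):
        if rf_ch in RF_GAP_CHARS:
            pending.append(ch)
        else:
            inserts.append("".join(pending))
            consensus.append(ch)
            pending = []
    inserts.append("".join(pending))
    return inserts, consensus


def merge_by_rf(alignments):
    """alignments: list of (rf, {name: aligned_row}).
    Returns (merged_rf, {name: merged_row})."""
    stripped = {"".join(c for c in rf if c not in RF_GAP_CHARS)
                for rf, _ in alignments}
    if len(stripped) != 1:
        raise ValueError("RF annotation differs after gap removal")

    # Widest insert run needed before each consensus column, over all inputs.
    per_input = [[len(s) for s in split_by_rf(rf, rf)[0]]
                 for rf, _ in alignments]
    slot_widths = [max(widths) for widths in zip(*per_input)]

    def expand(rf, row):
        inserts, consensus = split_by_rf(rf, row)
        out = []
        for k, width in enumerate(slot_widths):
            out.append(inserts[k].ljust(width, "."))  # pad with gap columns
            if k < len(consensus):
                out.append(consensus[k])
        return "".join(out)

    merged_rows = {name: expand(rf, row)
                   for rf, rows in alignments
                   for name, row in rows.items()}
    first_rf = alignments[0][0]
    return expand(first_rf, first_rf), merged_rows


merged_rf, rows = merge_by_rf([
    ("xx.x", {"seq1": "ACgU"}),
    ("x..xx", {"seq2": "AggCU"}),
])
print(merged_rf)     # x..x.x
print(rows["seq1"])  # A..CgU
print(rows["seq2"])  # AggC.U
```

       In this toy example each input gains gap columns where the other input
       has inserts, so both sequences end up six columns wide.
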
       If the output format is Stockholm or Pfam, annotation will be
       transferred from the input alignments to the merged alignment as
       follows. All per-sequence ('#=GS') and per-residue ('#=GR') annotation
       is transferred. Per-file ('#=GF') annotation is transferred if it is
       present and identical in all alignments. Per-column ('#=GC') annotation
       is transferred if it is present and identical in all alignments once
       all insert positions have been removed, and the '#=GC' annotation
       contains no non-gap characters in insert columns.

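       The '#=GF' and '#=GC' transfer rules above can be sketched in Python.
       This is an illustration only: the function names and sample values are
       hypothetical, and the gap set for '#=GC' lines is assumed to be the
       same one used for RF.

```python
GAP_CHARS = "._-"  # assumption: same gap set as for RF annotation


def transferable_gf(gf_per_alignment):
    """gf_per_alignment: one {tag: value} dict per input alignment.
    Keep a tag only if every alignment has it with the same value."""
    if not gf_per_alignment:
        return {}
    first, rest = gf_per_alignment[0], gf_per_alignment[1:]
    return {tag: value for tag, value in first.items()
            if all(other.get(tag) == value for other in rest)}


def transferable_gc(rf_gc_pairs):
    """rf_gc_pairs: one (rf, gc) pair of equal-length strings per alignment.
    Returns the consensus-column GC string if it can be transferred,
    otherwise None."""
    versions = set()
    for rf, gc in rf_gc_pairs:
        kept = []
        for rf_ch, gc_ch in zip(rf, gc):
            if rf_ch in GAP_CHARS:
                if gc_ch not in GAP_CHARS:
                    return None  # non-gap annotation in an insert column
            else:
                kept.append(gc_ch)
        versions.add("".join(kept))
    return versions.pop() if len(versions) == 1 else None


gf = transferable_gf([{"AU": "someone", "DE": "run 1"},
                      {"AU": "someone", "DE": "run 2"}])
print(gf)  # {'AU': 'someone'}
print(transferable_gc([("xx.x", "<-.>"), ("x..xx", "<..->")]))  # <->
print(transferable_gc([("xx.x", "<-a>")]))                      # None
```

       The string returned by transferable_gc covers consensus columns only; a
       merged alignment would still need gap characters added in its insert
       columns.
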
       With the --list <f> option, <f> is a file listing alignment files to
       merge. In the list file, blank lines and lines that start with '#'
       (comments) are ignored. Each data line contains a single word: the name
       of an alignment file to be merged. All alignments in each file will be
       merged.

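       The list file rules above are simple enough to sketch in Python. The
       function name and file names are hypothetical, and ignoring any extra
       words on a data line is an assumption; the program itself may treat
       such a line as an error.

```python
def read_alignment_list(text):
    """Return the alignment file names found in the text of a list file."""
    names = []
    for line in text.splitlines():
        if line.startswith("#"):   # comment line
            continue
        words = line.split()
        if not words:              # blank line
            continue
        names.append(words[0])     # assumption: extra words are ignored
    return names


example = "# alignments to merge\n\naln1.sto\naln2.sto\n"
print(read_alignment_list(example))  # ['aln1.sto', 'aln2.sto']
```
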
       With the --small option, esl-alimerge will operate in memory-saving
       mode, and the RAM required for the merge will be minimal (should be
       only a few MB) and independent of the alignment sizes. To use --small,
       all alignments must be in Pfam format (non-interleaved, 1 line/sequence
       Stockholm format). You can reformat alignments to Pfam using the
       esl-reformat Easel miniapp. Without --small, the required RAM will be
       roughly equal to the size of the final merged alignment file, which
       will necessarily be at least the summed size of all of the input
       alignment files to be merged, and sometimes several times larger. If
       you're merging large alignments or you're experiencing very slow
       performance of esl-alimerge, try reformatting to Pfam and using --small.

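       Before using --small, it can help to confirm that an input is in Pfam
       (one line per sequence) form. The Python sketch below is a rough check
       under stated assumptions, not a Stockholm parser: the function name is
       hypothetical, and it simply flags any sequence name that appears on
       more than one sequence line within the same alignment.

```python
def is_pfam_style(stockholm_text):
    """True if no sequence name repeats within any alignment in the text."""
    seen = set()
    for line in stockholm_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue               # blank line, header, or '#=' markup
        if line.strip() == "//":
            seen.clear()           # end of one alignment; start the next
            continue
        name = line.split()[0]
        if name in seen:
            return False           # same name twice: interleaved blocks
        seen.add(name)
    return True


pfam = "# STOCKHOLM 1.0\nseq1 ACGU\nseq2 ACGU\n//\n"
interleaved = "# STOCKHOLM 1.0\nseq1 AC\nseq2 AC\n\nseq1 GU\nseq2 GU\n//\n"
print(is_pfam_style(pfam))         # True
print(is_pfam_style(interleaved))  # False
```
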

OPTIONS

       -h     Print brief help; includes version number and summary of all
              options, including expert options.

       -o <f> Output merged alignment to file <f> instead of to stdout.

       -v     Be verbose; print to stdout information on the size of the
              alignments being merged and on the annotation transferred to
              the merged alignment. This option can only be used in
              combination with the -o option (so that the printed info doesn't
              corrupt the output alignment file).

       --small
              Operate in memory-saving mode. Required RAM will be independent
              of the sizes of the alignments to merge, instead of roughly the
              size of the eventual merged alignment. When enabled, all
              alignments must be in Pfam Stockholm (non-interleaved, 1
              line/seq) format; see esl-reformat(1). The output alignment will
              be in Pfam format.

       --rfonly
              Include in the merged alignment only those columns that are not
              gaps in the '#=GC RF' annotation.

       --outformat <s>
              Write the output alignment in format <s>. Common choices for
              <s> include: stockholm, a2m, afa, psiblast, clustal, phylip.
              The string <s> is case-insensitive (a2m or A2M both work).
              Default is stockholm.

       --rna  Specify that the input alignments are RNA alignments. By default
              esl-alimerge will try to autodetect the alphabet, but if the
              alignment is sufficiently small the alphabet may be ambiguous.
              This option defines the alphabet as RNA.

       --dna  Specify that the input alignments are DNA alignments.

       --amino
              Specify that the input alignments are protein alignments.


COPYRIGHT

       Copyright (C) 2020 Howard Hughes Medical Institute.
       Freely distributed under the BSD open source license.

AUTHOR

       http://eddylab.org



Easel 0.48                         Nov 2020                    esl-alimerge(1)
