alimask(1)                       HMMER Manual                       alimask(1)

NAME

       alimask - Add mask line to a multiple sequence alignment

SYNOPSIS

       alimask [options] <msafile> <postmsafile>

DESCRIPTION

       alimask is used to apply a mask line to a multiple sequence alignment,
       based on provided alignment or model coordinates. When hmmbuild
       receives a masked alignment as input, it produces a profile model in
       which the emission probabilities at masked positions are set to match
       the background frequency, rather than being set based on observed
       frequencies in the alignment. Position-specific insertion and deletion
       rates are not altered, even in masked regions. alimask autodetects
       input format, and produces masked alignments in Stockholm format.
       <msafile> may contain only one sequence alignment.

       A common motivation for masking a region in an alignment is that the
       region contains a simple tandem repeat that is observed to cause an
       unacceptably high rate of false positive hits.

       In the simplest case, a mask range is given in coordinates relative to
       the input alignment, using --alirange <s>. However it is more often
       the case that the region to be masked has been identified in
       coordinates relative to the profile model (e.g. based on recognizing a
       simple repeat pattern in false hit alignments or in the HMM logo). Not
       all alignment columns are converted to match state positions in the
       profile (see the --symfrac flag for hmmbuild for discussion), so model
       positions do not necessarily match up to alignment column positions.
       To remove the burden of converting model positions to alignment
       positions, alimask accepts the mask range input in model coordinates
       as well, using --modelrange <s>. When using this flag, alimask
       determines which alignment positions would be identified by hmmbuild
       as match states, a process that requires that all hmmbuild flags
       impacting that decision be supplied to alimask. It is for this reason
       that many of the hmmbuild flags are also used by alimask.
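       The relationship between model coordinates and alignment coordinates
       described above can be illustrated with a minimal Python sketch. It
       is not alimask's implementation: the function name and the
       is_match_column list are hypothetical, and the sketch assumes the
       match/insert decision for each column has already been made.

```python
def model_to_alignment_range(is_match_column, model_start, model_end):
    """Map a 1-based inclusive model range to a 1-based inclusive alignment range.

    is_match_column[i] is True when alignment column i+1 would become a
    match state (consensus column) in the profile. This input is assumed
    to be known already; deciding it is a separate step.
    """
    # Alignment column (1-based) of each model position, in model order.
    match_columns = [
        col for col, is_match in enumerate(is_match_column, start=1) if is_match
    ]
    if not (1 <= model_start <= model_end <= len(match_columns)):
        raise ValueError("model range outside 1..%d" % len(match_columns))
    return match_columns[model_start - 1], match_columns[model_end - 1]


# Toy alignment of 8 columns in which columns 3 and 6 are insert columns.
flags = [True, True, False, True, True, False, True, True]
print(model_to_alignment_range(flags, 2, 5))  # (2, 7)
```

       In this toy case the model has 6 positions, and model positions 2-5
       span alignment columns 2-7 because two insert columns fall inside
       the range.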

OPTIONS

       -h     Help; print a brief reminder of command line usage and all
              available options.

       -o <f> Direct the summary output to file <f>, rather than to stdout.

OPTIONS FOR SPECIFYING MASK RANGE

       A single mask range is given as a dash-separated pair, like
       --modelrange 10-20, and multiple ranges may be submitted as a
       comma-separated list, --modelrange 10-20,30-42.

       --modelrange <s>
              Supply the given range(s) in model coordinates.

       --alirange <s>
              Supply the given range(s) in alignment coordinates.

       --apendmask
              Add to the existing mask found with the alignment. The default
              is to overwrite any existing mask.

       --model2ali <s>
              Rather than actually produce the masked alignment, simply print
              alignment range(s) corresponding to input model range(s).

       --ali2model <s>
              Rather than actually produce the masked alignment, simply print
              model range(s) corresponding to input alignment range(s).
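       The range syntax used by these options (dash-separated pairs in a
       comma-separated list) can be illustrated with a short Python sketch.
       The function names are hypothetical, and the 'm' and '.' characters
       used to draw the mask are illustrative choices made for this
       example, not a statement about alimask's output.

```python
def parse_ranges(spec):
    """Parse a string such as "10-20,30-42" into a list of (start, end) tuples."""
    ranges = []
    for part in spec.split(","):
        start_text, end_text = part.split("-")
        start, end = int(start_text), int(end_text)
        if start < 1 or end < start:
            raise ValueError("bad range: %r" % part)
        ranges.append((start, end))
    return ranges


def draw_mask(ranges, length):
    """Return a string of the given length with 'm' at every masked position."""
    chars = ["."] * length
    for start, end in ranges:
        # Ranges are 1-based and inclusive; clip them to the given length.
        for pos in range(start, min(end, length) + 1):
            chars[pos - 1] = "m"
    return "".join(chars)


print(parse_ranges("10-20,30-42"))          # [(10, 20), (30, 42)]
print(draw_mask(parse_ranges("2-4,7-7"), 8))  # .mmm..m.
```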

OPTIONS FOR SPECIFYING THE ALPHABET

       The alphabet type (amino, DNA, or RNA) is autodetected by default, by
       looking at the composition of the msafile. Autodetection is normally
       quite reliable, but occasionally alphabet type may be ambiguous and
       autodetection can fail (for instance, on tiny toy alignments of just a
       few residues). To avoid this, or to increase robustness in automated
       analysis pipelines, you may specify the alphabet type of msafile with
       these options.

       --amino
              Specify that all sequences in msafile are proteins.

       --dna  Specify that all sequences in msafile are DNAs.

       --rna  Specify that all sequences in msafile are RNAs.

OPTIONS CONTROLLING PROFILE CONSTRUCTION

       These options control how consensus columns are defined in an
       alignment.

       --fast Define consensus columns as those that have a fraction >=
              symfrac of residues as opposed to gaps. (See below for the
              --symfrac option.) This is the default.

       --hand Define consensus columns using the reference annotation of
              the multiple alignment. This allows you to define any
              consensus columns you like.

       --symfrac <x>
              Define the residue fraction threshold necessary to define a
              consensus column when using the --fast option. The default is
              0.5. The symbol fraction in each column is calculated after
              taking relative sequence weighting into account, and ignoring
              gap characters corresponding to ends of sequence fragments (as
              opposed to internal insertions/deletions). Setting this to 0.0
              means that every alignment column will be assigned as
              consensus, which may be useful in some cases. Setting it to
              1.0 means that only columns that include 0 gaps (internal
              insertions/deletions) will be assigned as consensus.

       --fragthresh <x>
              We only want to count terminal gaps as deletions if the aligned
              sequence is known to be full-length, not if it is a fragment
              (for instance, because only part of it was sequenced). HMMER
              uses a simple rule to infer fragments: if the sequence length L
              is less than or equal to a fraction <x> times the alignment
              length in columns, then the sequence is handled as a fragment.
              The default is 0.5. Setting --fragthresh 0 will define no
              (nonempty) sequence as a fragment; you might want to do this if
              you know you've got a carefully curated alignment of
              full-length sequences. Setting --fragthresh 1 will define all
              sequences as fragments; you might want to do this if you know
              your alignment is entirely composed of fragments, such as
              translated short reads in metagenomic shotgun data.
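       The fragment rule and the --fast consensus rule described in this
       section can be sketched together in Python. This is a simplified
       illustration, not HMMER's code: it assumes uniform sequence weights
       (the text above says relative weights are taken into account), it
       treats only '-' and '.' as gap characters, and the function names
       are hypothetical.

```python
GAP_CHARS = "-."


def is_fragment(aligned_seq, fragthresh=0.5):
    """Fragment rule from the text: residue count L <= fragthresh * columns."""
    length = sum(ch not in GAP_CHARS for ch in aligned_seq)
    return length <= fragthresh * len(aligned_seq)


def consensus_columns(alignment, symfrac=0.5, fragthresh=0.5):
    """Return one boolean per column: True if the column is consensus."""
    ncols = len(alignment[0])
    residues = [0] * ncols  # residues seen in each column
    counted = [0] * ncols   # sequences that count toward each column
    for seq in alignment:
        first, last = 0, ncols - 1
        if is_fragment(seq, fragthresh):
            # Ignore the terminal gaps of a fragment: only columns between
            # its first and last residue count.
            positions = [i for i, ch in enumerate(seq) if ch not in GAP_CHARS]
            if not positions:
                continue
            first, last = positions[0], positions[-1]
        for col in range(first, last + 1):
            counted[col] += 1
            if seq[col] not in GAP_CHARS:
                residues[col] += 1
    return [
        counted[c] > 0 and residues[c] / counted[c] >= symfrac
        for c in range(ncols)
    ]


toy = ["ACDEFGHK", "AC-EFGHK", "A--EFG-K", "---EF---"]
print(consensus_columns(toy, symfrac=0.6))
print(consensus_columns(toy, symfrac=0.6, fragthresh=0))
```

       In the toy alignment the last sequence is inferred to be a fragment
       at the default threshold, so its terminal gaps are ignored and
       columns 2 and 7 pass the 0.6 threshold. With fragthresh set to 0 the
       same gaps count against those columns and they are no longer
       consensus.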

OPTIONS CONTROLLING RELATIVE WEIGHTS

       HMMER uses an ad hoc sequence weighting algorithm to downweight closely
       related sequences and upweight distantly related ones. This has the
       effect of making models less biased by uneven phylogenetic
       representation. For example, two identical sequences would typically
       each receive half the weight that one sequence would. These options
       control which algorithm gets used.

       --wpb  Use the Henikoff position-based sequence weighting scheme
              [Henikoff and Henikoff, J. Mol. Biol. 243:574, 1994]. This is
              the default.

       --wgsc Use the Gerstein/Sonnhammer/Chothia weighting algorithm
              [Gerstein et al., J. Mol. Biol. 235:1067, 1994].

       --wblosum
              Use the same clustering scheme that was used to weight data in
              calculating BLOSUM substitution matrices [Henikoff and
              Henikoff, Proc. Natl. Acad. Sci. 89:10915, 1992]. Sequences are
              single-linkage clustered at an identity threshold (default
              0.62; see --wid) and within each cluster of c sequences, each
              sequence gets relative weight 1/c.

       --wnone
              No relative weights. All sequences are assigned uniform weight.

       --wid <x>
              Sets the identity threshold used by single-linkage clustering
              when using --wblosum. Invalid with any other weighting scheme.
              Default is 0.62.
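       The --wblosum scheme (single-linkage clustering at an identity
       threshold, then weight 1/c within a cluster of c sequences) can be
       sketched in Python as follows. This is an illustration under stated
       assumptions, not HMMER's implementation: the pairwise identity
       definition (identical residues over columns where both sequences
       have a residue), the use of >= at the threshold, and all function
       names are assumptions made for this example.

```python
def fractional_identity(a, b):
    """Identity over columns where both aligned sequences have a residue."""
    pairs = [(x, y) for x, y in zip(a, b) if x not in "-." and y not in "-."]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def blosum_style_weights(alignment, wid=0.62):
    """Single-linkage cluster at identity >= wid; weight 1/c per cluster member."""
    n = len(alignment)
    parent = list(range(n))  # union-find forest over sequence indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Single linkage: any pair at or above the threshold joins two clusters.
    for i in range(n):
        for j in range(i + 1, n):
            if fractional_identity(alignment[i], alignment[j]) >= wid:
                parent[find(i)] = find(j)

    sizes = {}
    for i in range(n):
        root = find(i)
        sizes[root] = sizes.get(root, 0) + 1
    return [1.0 / sizes[find(i)] for i in range(n)]


toy = ["ACDEFGHIKL", "ACDEFGHIKL", "ACDEFGHIKV", "MNPQRSTVWY"]
print(blosum_style_weights(toy))
```

       Here the first three sequences fall into one cluster and each gets
       weight 1/3, while the unrelated fourth sequence keeps weight 1.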

OTHER OPTIONS

       --informat <s>
              Declare that the input msafile is in format <s>. Currently the
              accepted multiple alignment sequence file formats include
              Stockholm, Aligned FASTA, Clustal, NCBI PSI-BLAST, PHYLIP,
              Selex, and UCSC SAM A2M. Default is to autodetect the format of
              the file.

       --seed <n>
              Seed the random number generator with <n>, an integer >= 0. If
              <n> is nonzero, any stochastic simulations will be reproducible;
              the same command will give the same results. If <n> is 0, the
              random number generator is seeded arbitrarily, and stochastic
              simulations will vary from run to run of the same command. The
              default seed is 42.

SEE ALSO

       See hmmer(1) for a master man page with a list of all the individual
       man pages for programs in the HMMER package.

       For complete documentation, see the user guide that came with your
       HMMER distribution (Userguide.pdf); or see the HMMER web page ().

COPYRIGHT

       Copyright (C) 2015 Howard Hughes Medical Institute.
       Freely distributed under the GNU General Public License (GPLv3).

       For additional information on copyright and licensing, see the file
       called COPYRIGHT in your HMMER source distribution, or see the HMMER
       web page ().

AUTHOR

       Eddy/Rivas Laboratory
       Janelia Farm Research Campus
       19700 Helix Drive
       Ashburn VA 20147 USA
       http://eddylab.org

HMMER 3.1b2                      February 2015                      alimask(1)