alimask(1)

alimask(1)                       HMMER Manual                       alimask(1)

NAME

       alimask - calculate and add column mask to a multiple sequence
       alignment

SYNOPSIS

       alimask [options] msafile postmsafile

DESCRIPTION

       alimask is used to apply a mask line to a multiple sequence alignment,
       based on provided alignment or model coordinates. When hmmbuild
       receives a masked alignment as input, it produces a profile model in
       which the emission probabilities at masked positions are set to match
       the background frequency, rather than being set based on observed
       frequencies in the alignment. Position-specific insertion and deletion
       rates are not altered, even in masked regions. alimask autodetects the
       input format and, by default, produces masked alignments in Stockholm
       format. msafile may contain only one sequence alignment.

       A common motivation for masking a region in an alignment is that the
       region contains a simple tandem repeat that is observed to cause an
       unacceptably high rate of false positive hits.

       In the simplest case, a mask range is given in coordinates relative to
       the input alignment, using --alirange <s>. However, it is more often
       the case that the region to be masked has been identified in
       coordinates relative to the profile model (e.g. based on recognizing a
       simple repeat pattern in false hit alignments or in the HMM logo). Not
       all alignment columns are converted to match state positions in the
       profile (see the --symfrac flag for hmmbuild for discussion), so model
       positions do not necessarily match up to alignment column positions.
       To remove the burden of converting model positions to alignment
       positions, alimask accepts the mask range input in model coordinates
       as well, using --modelrange <s>. When using this flag, alimask
       determines which alignment positions would be identified by hmmbuild
       as match states, a process that requires that all hmmbuild flags
       impacting that decision be supplied to alimask. It is for this reason
       that many of the hmmbuild flags are also used by alimask.
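
       The relationship between model coordinates and alignment coordinates
       can be pictured with a small Python sketch. It is illustrative only
       and is not alimask's implementation: the function name
       model_to_alignment and the list match_columns (the 1-based alignment
       columns assumed to become match states) are hypothetical, and
       reporting the converted range as its first and last alignment column
       is an assumption.

```python
def model_to_alignment(match_columns, start, end):
    """Map a 1-based, inclusive model range to alignment columns.

    match_columns[k - 1] is the alignment column assumed to hold model
    position k (hypothetical input, not produced by alimask here).
    """
    if not (1 <= start <= end <= len(match_columns)):
        raise ValueError("model range must lie within 1..%d" % len(match_columns))
    # The first and last model positions give the bounds in alignment
    # coordinates; non-match columns in between fall inside the span.
    return match_columns[start - 1], match_columns[end - 1]


# Hypothetical alignment of 8 columns where columns 3, 6 and 7 are
# not match states, so the model has 5 positions.
match_columns = [1, 2, 4, 5, 8]
print(model_to_alignment(match_columns, 2, 4))  # (2, 5)
```

       In this toy case model positions 2 through 4 correspond to alignment
       columns 2 through 5, because alignment column 3 is not a match state.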

OPTIONS

       -h     Help; print a brief reminder of command line usage and all
              available options.

       -o <f> Direct the summary output to file <f>, rather than to stdout.

OPTIONS FOR SPECIFYING MASK RANGE

       A single mask range is given as a dash-separated pair, such as
       --modelrange 10-20. Multiple ranges may be submitted as a
       comma-separated list, such as --modelrange 10-20,30-42.
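
       The range syntax above can be read as in the following Python sketch.
       The function name parse_ranges is hypothetical, and the validity
       checks (coordinates start at 1, and the end is not before the start)
       are assumptions rather than documented alimask behavior.

```python
def parse_ranges(spec):
    """Parse a string such as "10-20,30-42" into (start, end) tuples."""
    ranges = []
    for part in spec.split(","):
        start_text, dash, end_text = part.strip().partition("-")
        if not dash:
            raise ValueError("expected a dash-separated pair, got %r" % part)
        start, end = int(start_text), int(end_text)
        # Assumed sanity checks; alimask's own validation may differ.
        if start < 1 or end < start:
            raise ValueError("invalid range %r" % part)
        ranges.append((start, end))
    return ranges


print(parse_ranges("10-20"))        # [(10, 20)]
print(parse_ranges("10-20,30-42"))  # [(10, 20), (30, 42)]
```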

       --modelrange <s>
              Supply the given range(s) in model coordinates.

       --alirange <s>
              Supply the given range(s) in alignment coordinates.

       --apendmask
              Add to the existing mask found with the alignment. The default
              is to overwrite any existing mask.

       --model2ali <s>
              Rather than actually produce the masked alignment, simply print
              alignment range(s) corresponding to input model range(s).

       --ali2model <s>
              Rather than actually produce the masked alignment, simply print
              model range(s) corresponding to input alignment range(s).

OPTIONS FOR SPECIFYING THE ALPHABET

       --amino
              Assert that sequences in msafile are protein, bypassing alphabet
              autodetection.

       --dna  Assert that sequences in msafile are DNA, bypassing alphabet
              autodetection.

       --rna  Assert that sequences in msafile are RNA, bypassing alphabet
              autodetection.

OPTIONS CONTROLLING PROFILE CONSTRUCTION

       These options control how consensus columns are defined in an
       alignment.

       --fast Define consensus columns as those that have a fraction >=
              symfrac of residues as opposed to gaps. (See below for the
              --symfrac option.) This is the default.

       --hand Define consensus columns in the profile using reference
              annotation to the multiple alignment. This allows you to define
              any consensus columns you like.

       --symfrac <x>
              Define the residue fraction threshold necessary to define a
              consensus column when using the --fast option. The default is
              0.5. The symbol fraction in each column is calculated after
              taking relative sequence weighting into account, and ignoring
              gap characters corresponding to ends of sequence fragments (as
              opposed to internal insertions/deletions). Setting this to 0.0
              means that every alignment column will be assigned as consensus,
              which may be useful in some cases. Setting it to 1.0 means that
              only columns that contain no gaps (internal
              insertions/deletions) will be assigned as consensus.

       --fragthresh <x>
              We only want to count terminal gaps as deletions if the aligned
              sequence is known to be full-length, not if it is a fragment
              (for instance, because only part of it was sequenced). HMMER
              uses a simple rule to infer fragments: if the sequence length L
              is less than or equal to a fraction <x> times the alignment
              length in columns, then the sequence is handled as a fragment.
              The default is 0.5. Setting --fragthresh 0 will define no
              (nonempty) sequence as a fragment; you might want to do this if
              you know you've got a carefully curated alignment of full-length
              sequences. Setting --fragthresh 1 will define all sequences as
              fragments; you might want to do this if you know your alignment
              is entirely composed of fragments, such as translated short
              reads in metagenomic shotgun data.
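
       The --symfrac and --fragthresh rules described above can be combined
       in a short Python sketch. It is a simplified illustration, not HMMER's
       code: the names is_fragment and consensus_columns are hypothetical,
       gap characters are assumed to be "-" and ".", and sequences are
       weighted uniformly unless weights are passed in.

```python
GAPS = set("-.")  # assumed gap characters


def is_fragment(aligned_seq, fragthresh=0.5):
    """Fragment rule: residue count L <= fragthresh * alignment length."""
    residues = sum(1 for ch in aligned_seq if ch not in GAPS)
    return residues <= fragthresh * len(aligned_seq)


def consensus_columns(alignment, symfrac=0.5, fragthresh=0.5, weights=None):
    """Return 1-based columns whose weighted residue fraction >= symfrac.

    Terminal gaps of sequences judged to be fragments are ignored;
    all other gaps count against the column.
    """
    ncols = len(alignment[0])
    if weights is None:
        weights = [1.0] * len(alignment)  # simplification: uniform weights
    residue_weight = [0.0] * ncols
    total_weight = [0.0] * ncols
    for seq, w in zip(alignment, weights):
        first, last = 0, ncols - 1
        if is_fragment(seq, fragthresh):
            positions = [i for i, ch in enumerate(seq) if ch not in GAPS]
            if not positions:
                continue  # an empty sequence contributes nothing
            first, last = positions[0], positions[-1]
        for i in range(first, last + 1):
            total_weight[i] += w
            if seq[i] not in GAPS:
                residue_weight[i] += w
    selected = []
    for i in range(ncols):
        frac = residue_weight[i] / total_weight[i] if total_weight[i] > 0 else 0.0
        if frac >= symfrac:
            selected.append(i + 1)
    return selected


alignment = ["ACDEFG", "AC-EFG", "A--EF-", "--DE--"]
print(consensus_columns(alignment, symfrac=0.6))                # [1, 2, 4, 5, 6]
print(consensus_columns(alignment, symfrac=0.6, fragthresh=0))  # [1, 4, 5]
```

       With the default fragment threshold, the last two toy sequences are
       treated as fragments and their terminal gaps are ignored; with
       fragthresh set to 0 those gaps count against their columns, so fewer
       columns reach the threshold.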

OPTIONS CONTROLLING RELATIVE WEIGHTS

       HMMER uses an ad hoc sequence weighting algorithm to downweight closely
       related sequences and upweight distantly related ones. This has the
       effect of making models less biased by uneven phylogenetic
       representation. For example, two identical sequences would typically
       each receive half the weight that one sequence would. These options
       control which algorithm gets used.

       --wpb  Use the Henikoff position-based sequence weighting scheme
              [Henikoff and Henikoff, J. Mol. Biol. 243:574, 1994]. This is
              the default.

       --wgsc Use the Gerstein/Sonnhammer/Chothia weighting algorithm
              [Gerstein et al., J. Mol. Biol. 235:1067, 1994].

       --wblosum
              Use the same clustering scheme that was used to weight data in
              calculating BLOSUM substitution matrices [Henikoff and Henikoff,
              Proc. Natl. Acad. Sci. 89:10915, 1992]. Sequences are
              single-linkage clustered at an identity threshold (default 0.62;
              see --wid) and within each cluster of c sequences, each sequence
              gets relative weight 1/c.

       --wnone
              No relative weights. All sequences are assigned uniform weight.

       --wid <x>
              Sets the identity threshold used by single-linkage clustering
              when using --wblosum. Invalid with any other weighting scheme.
              Default is 0.62.
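
       The clustering rule described for --wblosum can be sketched in Python
       as follows. This is a toy illustration, not HMMER's implementation:
       the names pairwise_identity and blosum_weights are hypothetical, and
       the identity measure used here (identical residues divided by the
       number of columns where both sequences have a residue) is an
       assumption.

```python
GAPS = set("-.")  # assumed gap characters


def pairwise_identity(a, b):
    """Fraction of identical residues over columns aligned in both."""
    pairs = [(x, y) for x, y in zip(a, b) if x not in GAPS and y not in GAPS]
    if not pairs:
        return 0.0
    return sum(1 for x, y in pairs if x == y) / len(pairs)


def blosum_weights(alignment, wid=0.62):
    """Single-linkage cluster at identity >= wid; weight is 1/c per sequence."""
    n = len(alignment)
    parent = list(range(n))  # union-find forest over sequence indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if pairwise_identity(alignment[i], alignment[j]) >= wid:
                parent[find(i)] = find(j)  # one link is enough to merge

    roots = [find(i) for i in range(n)]
    sizes = {r: roots.count(r) for r in set(roots)}
    return [1.0 / sizes[r] for r in roots]


alignment = ["ACDEFGHIKL", "ACDEFGHIKV", "MNPQRSTVWY"]
print(blosum_weights(alignment))  # [0.5, 0.5, 1.0]
```

       The first two toy sequences are 90% identical and fall into one
       cluster of size 2, so each gets weight 1/2, while the unrelated third
       sequence keeps weight 1.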

OTHER OPTIONS

       --informat <s>
              Assert that input msafile is in alignment format <s>, bypassing
              format autodetection. Common choices for <s> include: stockholm,
              a2m, afa, psiblast, clustal, phylip. For more information, and
              for codes for some less common formats, see the main
              documentation. The string <s> is case-insensitive (a2m or A2M
              both work).

       --outformat <s>
              Write the output postmsafile in alignment format <s>. Common
              choices for <s> include: stockholm, a2m, afa, psiblast, clustal,
              phylip. The string <s> is case-insensitive (a2m or A2M both
              work). Default is stockholm.

       --seed <n>
              Seed the random number generator with <n>, an integer >= 0. If
              <n> is nonzero, any stochastic simulations will be reproducible;
              the same command will give the same results. If <n> is 0, the
              random number generator is seeded arbitrarily, and stochastic
              simulations will vary from run to run of the same command. The
              default seed is 42.

COPYRIGHT

       Copyright (C) 2020 Howard Hughes Medical Institute.
       Freely distributed under the BSD open source license.

       For additional information on copyright and licensing, see the file
       called COPYRIGHT in your HMMER source distribution, or see the HMMER
       web page (http://hmmer.org/).

AUTHOR

       http://eddylab.org

HMMER 3.3.2                        Nov 2020                         alimask(1)