hmmsearch(1)

hmmsearch(1)                     HMMER Manual                     hmmsearch(1)

NAME

       hmmsearch - search profile(s) against a sequence database

SYNOPSIS

       hmmsearch [options] <hmmfile> <seqdb>

DESCRIPTION

       hmmsearch is used to search one or more profiles against a sequence
       database. For each profile in <hmmfile>, use that query profile to
       search the target database of sequences in <seqdb>, and output ranked
       lists of the sequences with the most significant matches to the
       profile.

       The <hmmfile> may contain more than one profile. To build profiles from
       multiple alignments, see hmmbuild.

       The output format is designed to be human-readable, but is often so
       voluminous that reading it is impractical, and parsing it is a pain.
       The --tblout and --domtblout options save output in simple tabular
       formats that are concise and easier to parse. The -o option allows
       redirecting the main output, including throwing it away in /dev/null.
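
       As an illustration of the invocation described above, the following
       Python sketch assembles an hmmsearch argument list that discards the
       main output and keeps only the per-target table. The helper name
       build_hmmsearch_cmd and the file names are hypothetical; only the
       flags -o, --tblout, and --domtblout come from this page.

```python
import shlex


def build_hmmsearch_cmd(hmmfile, seqdb, tblout=None, domtblout=None,
                        discard_main_output=False):
    """Assemble an hmmsearch argument list (hypothetical helper)."""
    cmd = ["hmmsearch"]
    if discard_main_output:
        # -o redirects the main human-readable output; /dev/null drops it.
        cmd += ["-o", "/dev/null"]
    if tblout is not None:
        cmd += ["--tblout", tblout]
    if domtblout is not None:
        cmd += ["--domtblout", domtblout]
    # Options come first, then <hmmfile> and <seqdb>, as in the SYNOPSIS.
    cmd += [hmmfile, seqdb]
    return cmd


# File names below are placeholders, not files shipped with HMMER.
cmd = build_hmmsearch_cmd("query.hmm", "targets.fasta",
                          tblout="hits.tbl", discard_main_output=True)
print(shlex.join(cmd))
```

       On a machine where hmmsearch is installed, the resulting list could be
       passed to subprocess.run(cmd, check=True).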

OPTIONS

       -h     Help; print a brief reminder of command line usage and all
              available options.

OPTIONS FOR CONTROLLING OUTPUT

       -o <f> Direct the main human-readable output to a file <f> instead of
              the default stdout.

       -A <f> Save a multiple alignment of all significant hits (those
              satisfying inclusion thresholds) to the file <f>.

       --tblout <f>
              Save a simple tabular (space-delimited) file summarizing the
              per-target output, with one data line per homologous target
              sequence found.

       --domtblout <f>
              Save a simple tabular (space-delimited) file summarizing the
              per-domain output, with one data line per homologous domain
              detected in a target sequence for each homologous model.

       --acc  Use accessions instead of names in the main output, where
              available for profiles and/or sequences.

       --noali
              Omit the alignment section from the main output. This can
              greatly reduce the output volume.

       --notextw
              Unlimit the length of each line in the main output. The default
              is a limit of 120 characters per line, which helps in displaying
              the output cleanly on terminals and in editors, but can truncate
              target sequence description lines.

       --textw <n>
              Set the main output's line length limit to <n> characters per
              line. The default is 120.
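
       The space-delimited file written by --tblout can be read with a few
       lines of code. The sketch below is a minimal reader, not a reference
       parser. It assumes the column order given in the HMMER User's Guide
       (target name, target accession, query name, query accession,
       full-sequence E-value, full-sequence score, then further numeric
       columns and a free-text description as field 19) and that comment
       lines start with "#". The sample row is made up for illustration.

```python
def parse_tblout(lines):
    """Yield one dict per data line of a --tblout file.

    Assumed layout: '#' starts a comment line; fields are separated by
    whitespace; the 19th field (description) may itself contain spaces.
    """
    for line in lines:
        line = line.rstrip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(maxsplit=18)
        yield {
            "target": fields[0],
            "query": fields[2],
            "evalue": float(fields[4]),   # full-sequence E-value
            "score": float(fields[5]),    # full-sequence bit score
            "description": fields[18] if len(fields) > 18 else "",
        }


# Made-up example input, not real hmmsearch output.
sample = [
    "# target name  accession  query name  accession  E-value  score ...",
    "seq1 - globin - 1.2e-30 105.3 0.1 1.5e-30 105.0 0.1 "
    "1.0 1 0 0 1 1 1 1 made-up example protein",
]
hits = list(parse_tblout(sample))
print(hits[0]["target"], hits[0]["evalue"])
```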

OPTIONS CONTROLLING REPORTING THRESHOLDS

       Reporting thresholds control which hits are reported in output files
       (the main output, --tblout, and --domtblout). Sequence hits and domain
       hits are ranked by statistical significance (E-value) and output is
       generated in two sections called per-target and per-domain output. In
       per-target output, by default, all sequence hits with an E-value <= 10
       are reported. In the per-domain output, for each target that has passed
       per-target reporting thresholds, all domains satisfying per-domain
       reporting thresholds are reported. By default, these are domains with
       conditional E-values of <= 10. The following options allow you to
       change the default E-value reporting thresholds, or to use bit score
       thresholds instead.

       -E <x> In the per-target output, report target sequences with an
              E-value of <= <x>. The default is 10.0, meaning that on average,
              about 10 false positives will be reported per query, so you can
              see the top of the noise and decide for yourself if it's really
              noise.

       -T <x> Instead of thresholding per-target output on E-value, report
              target sequences with a bit score of >= <x>.

       --domE <x>
              In the per-domain output, for target sequences that have already
              satisfied the per-target reporting threshold, report individual
              domains with a conditional E-value of <= <x>. The default is
              10.0. A conditional E-value means the expected number of
              additional false positive domains in the smaller search space of
              those comparisons that already satisfied the per-target
              reporting threshold (and thus must have at least one homologous
              domain already).

       --domT <x>
              Instead of thresholding per-domain output on E-value, report
              domains with a bit score of >= <x>.
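
       The difference between a per-target E-value and a conditional
       per-domain E-value can be seen with a little arithmetic. The sketch
       below assumes the usual relationship E = P x N, where P is the P-value
       of a score and N is the size of the search space. The function name
       and all numbers are illustrative assumptions, not HMMER output.

```python
def expected_false_positives(pvalue, search_space_size):
    """E-value as P-value times the number of comparisons (assumed model)."""
    return pvalue * search_space_size


# Illustrative numbers only.
pvalue = 1e-5

# Per-target: every sequence in a database of 1,000,000 targets counts.
per_target_e = expected_false_positives(pvalue, 1_000_000)

# Conditional per-domain: only the targets that already passed the
# per-target reporting threshold count (here, 50 of them).
conditional_e = expected_false_positives(pvalue, 50)

print(per_target_e, conditional_e)
```

       The same P-value gives a much smaller conditional E-value because the
       search space is restricted to targets that were already reported.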

OPTIONS FOR INCLUSION THRESHOLDS

       Inclusion thresholds are stricter than reporting thresholds. Inclusion
       thresholds control which hits are considered to be reliable enough to
       be included in an output alignment or a subsequent search round, or
       marked as significant ("!") as opposed to questionable ("?") in domain
       output.

       --incE <x>
              Use an E-value of <= <x> as the per-target inclusion threshold.
              The default is 0.01, meaning that on average, about 1 false
              positive would be expected in every 100 searches with different
              query sequences.

       --incT <x>
              Instead of using E-values for setting the inclusion threshold,
              use a bit score of >= <x> as the per-target inclusion
              threshold. By default this option is unset.

       --incdomE <x>
              Use a conditional E-value of <= <x> as the per-domain inclusion
              threshold, in targets that have already satisfied the overall
              per-target inclusion threshold. The default is 0.01.

       --incdomT <x>
              Instead of using E-values, use a bit score of >= <x> as the
              per-domain inclusion threshold.
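
       How reporting and inclusion thresholds interact can be sketched as a
       small decision function. This is a simplified illustration using only
       E-value thresholds with the defaults stated on this page (10.0 for
       reporting, 0.01 for inclusion). The function name classify_hit is
       hypothetical and the logic is not taken from the HMMER source.

```python
def classify_hit(evalue, report_e=10.0, include_e=0.01):
    """Return None (not reported), '?' (reported only) or '!' (included)."""
    if evalue > report_e:
        return None          # fails the reporting threshold entirely
    if evalue <= include_e:
        return "!"           # reliable enough to be included
    return "?"               # reported, but questionable


for e in (1e-5, 0.5, 50.0):
    print(e, classify_hit(e))
```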

OPTIONS FOR MODEL-SPECIFIC SCORE THRESHOLDING

       Curated profile databases may define specific bit score thresholds for
       each profile, superseding any thresholding based on statistical
       significance alone.

       To use these options, the profile must contain the appropriate (GA, TC,
       and/or NC) optional score threshold annotation; this is picked up by
       hmmbuild from Stockholm format alignment files. Each thresholding
       option has two scores: the per-sequence threshold <x1> and the
       per-domain threshold <x2>. These act as if -T <x1> --incT <x1>
       --domT <x2> --incdomT <x2> has been applied specifically using each
       model's curated thresholds.

       --cut_ga
              Use the GA (gathering) bit scores in the model to set
              per-sequence (GA1) and per-domain (GA2) reporting and inclusion
              thresholds. GA thresholds are generally considered to be the
              reliable curated thresholds defining family membership; for
              example, in Pfam, these thresholds define what gets included in
              Pfam Full alignments based on searches with Pfam Seed models.

       --cut_nc
              Use the NC (noise cutoff) bit score thresholds in the model to
              set per-sequence (NC1) and per-domain (NC2) reporting and
              inclusion thresholds. NC thresholds are generally considered to
              be the score of the highest-scoring known false positive.

       --cut_tc
              Use the TC (trusted cutoff) bit score thresholds in the model to
              set per-sequence (TC1) and per-domain (TC2) reporting and
              inclusion thresholds. TC thresholds are generally considered to
              be the score of the lowest-scoring known true positive that is
              above all known false positives.
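
       The equivalence stated above between a curated cutoff pair and the
       four explicit bit score options can be written out as follows. The
       helper name and the example scores (25.0 and 20.0) are made up for
       illustration; real values come from each model's GA, NC, or TC
       annotation.

```python
def equivalent_threshold_flags(x1, x2):
    """Expand a (per-sequence, per-domain) cutoff pair into explicit flags.

    x1 is the per-sequence threshold, x2 the per-domain threshold.
    """
    return [
        "-T", str(x1),          # per-target reporting
        "--incT", str(x1),      # per-target inclusion
        "--domT", str(x2),      # per-domain reporting
        "--incdomT", str(x2),   # per-domain inclusion
    ]


# Hypothetical cutoffs for a single model.
flags = equivalent_threshold_flags(25.0, 20.0)
print(" ".join(flags))
```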

OPTIONS CONTROLLING THE ACCELERATION PIPELINE

       HMMER3 searches are accelerated in a three-step filter pipeline: the
       MSV filter, the Viterbi filter, and the Forward filter. The first
       filter is the fastest and most approximate; the last is the full
       Forward scoring algorithm. There is also a bias filter step between
       MSV and Viterbi. Targets that pass all the steps in the acceleration
       pipeline are then subjected to postprocessing -- domain identification
       and scoring using the Forward/Backward algorithm.

       Changing filter thresholds only removes or includes targets from
       consideration; changing filter thresholds does not alter bit scores,
       E-values, or alignments, all of which are determined solely in
       postprocessing.

       --max  Turn off all filters, including the bias filter, and run full
              Forward/Backward postprocessing on every target. This increases
              sensitivity somewhat, at a large cost in speed.

       --F1 <x>
              Set the P-value threshold for the MSV filter step. The default
              is 0.02, meaning that roughly 2% of the highest scoring
              nonhomologous targets are expected to pass the filter.

       --F2 <x>
              Set the P-value threshold for the Viterbi filter step. The
              default is 0.001.

       --F3 <x>
              Set the P-value threshold for the Forward filter step. The
              default is 1e-5.

       --nobias
              Turn off the bias filter. This increases sensitivity somewhat,
              but can come at a high cost in speed, especially if the query
              has biased residue composition (such as a repetitive sequence
              region, or if it is a membrane protein with large regions of
              hydrophobicity). Without the bias filter, too many sequences may
              pass the filter with biased queries, leading to slower than
              expected performance as the computationally intensive
              Forward/Backward algorithms shoulder an abnormally heavy load.
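
       To get a feel for how much work each filter removes, the default
       P-value thresholds can be turned into rough expected counts. The
       sketch below assumes the P-values are accurate, that every target is
       nonhomologous, and it ignores the bias filter. It is a
       back-of-the-envelope estimate, not a model of the real pipeline, and
       the function name is hypothetical.

```python
# Default thresholds as stated for --F1, --F2 and --F3.
DEFAULT_FILTERS = (("MSV", 0.02), ("Viterbi", 0.001), ("Forward", 1e-5))


def expected_survivors(n_targets, filters=DEFAULT_FILTERS):
    """Rough expected number of nonhomologous targets passing each step."""
    return [(name, n_targets * pvalue) for name, pvalue in filters]


# One million nonhomologous targets is an arbitrary illustrative size.
for name, count in expected_survivors(1_000_000):
    print(f"{name:8s} ~{count:,.0f} targets pass")
```

       Raising a threshold lets more targets through to the slower steps;
       --max corresponds to skipping the filters altogether.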

OTHER OPTIONS

       --nonull2
              Turn off the null2 score corrections for biased composition.

       -Z <x> Assert that the total number of targets in your searches is <x>,
              for the purposes of per-sequence E-value calculations, rather
              than the actual number of targets seen.

       --domZ <x>
              Assert that the total number of targets in your searches is <x>,
              for the purposes of per-domain conditional E-value calculations,
              rather than the number of targets that passed the reporting
              thresholds.

       --seed <n>
              Set the random number seed to <n>. Some steps in postprocessing
              require Monte Carlo simulation. The default is to use a fixed
              seed (42), so that results are exactly reproducible. Any other
              positive integer will give different (but also reproducible)
              results. A choice of 0 uses a randomly chosen seed.

       --qformat <s>
              Assert that the query sequence file is in format <s>. Accepted
              formats include fasta, embl, genbank, ddbj, uniprot, stockholm,
              pfam, a2m, and afa. The default is to autodetect the format of
              the file.

       --cpu <n>
              Set the number of parallel worker threads to <n>. By default,
              HMMER sets this to the number of CPU cores it detects in your
              machine - that is, it tries to maximize the use of your
              available processor cores. Setting <n> higher than the number of
              available cores is of little if any value, but you may want to
              set it to something less. You can also control this number by
              setting an environment variable, HMMER_NCPU.

              This option is only available if HMMER was compiled with POSIX
              threads support. This is the default, but it may have been
              turned off at compile-time for your site or machine for some
              reason.

       --stall
              For debugging the MPI master/worker version: pause after start,
              to enable the developer to attach debuggers to the running
              master and worker(s) processes. Send SIGCONT signal to release
              the pause. (Under gdb: (gdb) signal SIGCONT) (Only available if
              optional MPI support was enabled at compile-time.)

       --mpi  Run in MPI master/worker mode, using mpirun. (Only available if
              optional MPI support was enabled at compile-time.)
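
       One situation where -Z matters is a database searched in separate
       chunks, where each run only sees part of the targets. The sketch below
       shows how an E-value from one chunk relates to the E-value that
       asserting the full database size would give. It assumes E-values
       scale linearly with the number of targets, and the function name and
       numbers are illustrative only.

```python
def rescale_evalue(evalue, targets_seen, targets_asserted):
    """Convert an E-value between search space sizes (assumes E = P x Z)."""
    return evalue * targets_asserted / targets_seen


# A hit scored E = 0.001 in a chunk of 10,000 sequences; the whole
# database holds 1,000,000 sequences (illustrative numbers).
full_db_evalue = rescale_evalue(0.001, 10_000, 1_000_000)
print(full_db_evalue)
```

       Running each chunk with -Z set to the full database size would avoid
       the need for this conversion after the fact.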

COPYRIGHT

       @HMMER_COPYRIGHT@
       @HMMER_LICENSE@

       For additional information on copyright and licensing, see the file
       called COPYRIGHT in your HMMER source distribution, or see the HMMER
       web page (@HMMER_URL@).

AUTHOR

       Eddy/Rivas Laboratory
       Janelia Farm Research Campus
       19700 Helix Drive
       Ashburn VA 20147 USA
       http://eddylab.org

HMMER @HMMER_VERSION@            @HMMER_DATE@                     hmmsearch(1)