hmmsearch(1)

hmmsearch(1)                     HMMER Manual                     hmmsearch(1)

NAME

       hmmsearch - search profile(s) against a sequence database

SYNOPSIS

       hmmsearch [options] hmmfile seqdb

DESCRIPTION

       hmmsearch is used to search one or more profiles against a sequence
       database. For each profile in hmmfile, use that query profile to
       search the target database of sequences in seqdb, and output ranked
       lists of the sequences with the most significant matches to the
       profile. To build profiles from multiple alignments, see hmmbuild.

       Either the query hmmfile or the target seqdb may be '-' (a dash
       character), in which case the query profile or target database input
       will be read from a stdin pipe instead of from a file. Only one input
       source can come through stdin, not both. An exception is that if the
       hmmfile contains more than one profile query, then seqdb cannot come
       from stdin, because we can't rewind the streaming target database to
       search it with another profile.

       The output format is designed to be human-readable, but is often so
       voluminous that reading it is impractical, and parsing it is a pain.
       The --tblout and --domtblout options save output in simple tabular
       formats that are concise and easier to parse. The -o option allows
       redirecting the main output, including throwing it away in /dev/null.
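       As an illustration of the invocation described above, the following
       Python sketch assembles an hmmsearch command line that saves the
       per-target table with --tblout and discards the main output with
       -o /dev/null. The helper name build_hmmsearch_argv and the file names
       are hypothetical; only the options and the argument order come from
       this page. The sketch also rejects the one stdin combination that is
       ruled out above (both inputs given as '-').

```python
from typing import List, Optional


def build_hmmsearch_argv(hmmfile: str, seqdb: str,
                         tblout: Optional[str] = None,
                         discard_main_output: bool = False) -> List[str]:
    """Assemble an hmmsearch command line (hypothetical helper, not part of HMMER)."""
    # Only one input may come from stdin ('-'), never both.
    if hmmfile == "-" and seqdb == "-":
        raise ValueError("hmmfile and seqdb cannot both be read from stdin")
    argv = ["hmmsearch"]
    if tblout is not None:
        argv += ["--tblout", tblout]
    if discard_main_output:
        argv += ["-o", "/dev/null"]
    # Positional arguments follow the options: hmmfile first, then seqdb.
    argv += [hmmfile, seqdb]
    return argv


if __name__ == "__main__":
    # File names below are placeholders for illustration only.
    print(" ".join(build_hmmsearch_argv("query.hmm", "targets.fasta",
                                        tblout="hits.tbl",
                                        discard_main_output=True)))
```

       The resulting list could be handed to subprocess.run() on a system
       where hmmsearch is installed; the sketch itself does not run it.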

OPTIONS

       -h     Help; print a brief reminder of command line usage and all
              available options.

OPTIONS FOR CONTROLLING OUTPUT

       -o <f> Direct the main human-readable output to a file <f> instead of
              the default stdout.

       -A <f> Save a multiple alignment of all significant hits (those
              satisfying inclusion thresholds) to the file <f>.

       --tblout <f>
              Save a simple tabular (space-delimited) file summarizing the
              per-target output, with one data line per homologous target
              sequence found.

       --domtblout <f>
              Save a simple tabular (space-delimited) file summarizing the
              per-domain output, with one data line per homologous domain
              detected in a target sequence for each query profile.

       --acc  Use accessions instead of names in the main output, where
              available for profiles and/or sequences.

       --noali
              Omit the alignment section from the main output. This can
              greatly reduce the output volume.

       --notextw
              Unlimit the length of each line in the main output. The default
              is a limit of 120 characters per line, which helps in displaying
              the output cleanly on terminals and in editors, but can truncate
              target sequence description lines.

       --textw <n>
              Set the main output's line length limit to <n> characters per
              line. The default is 120.
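       Because the --tblout file described above is space-delimited, it is
       straightforward to read from a script. The Python sketch below is one
       way to do so. It assumes the column layout documented in the HMMER
       User's Guide (comment lines start with '#'; each data line has 18
       space-delimited fields followed by a free-text description), which
       this page does not spell out. The function name parse_tblout and the
       sample line in the demo are made up for illustration.

```python
from typing import Dict, Iterable, Iterator, Union

Row = Dict[str, Union[str, float]]


def parse_tblout(lines: Iterable[str]) -> Iterator[Row]:
    """Yield one dict per data line of a --tblout file (assumed column layout)."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and '#' comment lines
        # Assumption: 18 space-delimited fields, then a free-text description
        # that may itself contain spaces.
        fields = line.strip().split(None, 18)
        yield {
            "target": fields[0],
            "query": fields[2],
            "evalue": float(fields[4]),  # full-sequence E-value
            "score": float(fields[5]),   # full-sequence bit score
            "description": fields[18] if len(fields) > 18 else "",
        }


if __name__ == "__main__":
    # Fabricated sample line, for illustration only.
    sample = [
        "# comment line\n",
        "seqA - modelX - 1.2e-30 105.3 0.1 1.5e-30 105.0 0.1 "
        "1.0 1 0 0 1 1 1 1 example protein\n",
    ]
    for row in parse_tblout(sample):
        print(row["target"], row["evalue"], row["description"])
```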

OPTIONS CONTROLLING REPORTING THRESHOLDS

       Reporting thresholds control which hits are reported in output files
       (the main output, --tblout, and --domtblout). Sequence hits and domain
       hits are ranked by statistical significance (E-value) and output is
       generated in two sections called per-target and per-domain output. In
       per-target output, by default, all sequence hits with an E-value <= 10
       are reported. In the per-domain output, for each target that has passed
       per-target reporting thresholds, all domains satisfying per-domain
       reporting thresholds are reported. By default, these are domains with
       conditional E-values of <= 10. The following options allow you to
       change the default E-value reporting thresholds, or to use bit score
       thresholds instead.

       -E <x> In the per-target output, report target sequences with an
              E-value of <= <x>. The default is 10.0, meaning that on average,
              about 10 false positives will be reported per query, so you can
              see the top of the noise and decide for yourself if it's really
              noise.

       -T <x> Instead of thresholding per-target output on E-value, report
              target sequences with a bit score of >= <x>.

       --domE <x>
              In the per-domain output, for target sequences that have already
              satisfied the per-target reporting threshold, report individual
              domains with a conditional E-value of <= <x>. The default is
              10.0. A conditional E-value means the expected number of
              additional false positive domains in the smaller search space of
              those comparisons that already satisfied the per-target
              reporting threshold (and thus must have at least one homologous
              domain already).

       --domT <x>
              Instead of thresholding per-domain output on E-value, report
              domains with a bit score of >= <x>.
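       The relationship between the E-value options (-E, --domE) and the bit
       score options (-T, --domT) above can be modeled as a single test in
       which a bit score threshold, when given, replaces the E-value
       threshold. The Python sketch below is a reading of that description,
       not HMMER's own code; the function and parameter names are
       hypothetical.

```python
from typing import Optional


def passes_reporting(evalue: float, bitscore: float,
                     max_evalue: float = 10.0,
                     min_bitscore: Optional[float] = None) -> bool:
    """Model of a reporting test: a bit score threshold overrides the E-value one."""
    if min_bitscore is not None:
        # Corresponds to -T <x> or --domT <x>: report if score >= x.
        return bitscore >= min_bitscore
    # Corresponds to -E <x> or --domE <x>: report if E-value <= x.
    return evalue <= max_evalue


if __name__ == "__main__":
    print(passes_reporting(evalue=5.0, bitscore=12.0))
    print(passes_reporting(evalue=5.0, bitscore=12.0, min_bitscore=25.0))
```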

OPTIONS FOR INCLUSION THRESHOLDS

       Inclusion thresholds are stricter than reporting thresholds. Inclusion
       thresholds control which hits are considered to be reliable enough to
       be included in an output alignment or a subsequent search round, or
       marked as significant ("!") as opposed to questionable ("?") in domain
       output.

       --incE <x>
              Use an E-value of <= <x> as the per-target inclusion threshold.
              The default is 0.01, meaning that on average, about 1 false
              positive would be expected in every 100 searches with different
              query profiles.

       --incT <x>
              Instead of using E-values for setting the inclusion threshold,
              use a bit score of >= <x> as the per-target inclusion
              threshold. By default this option is unset.

       --incdomE <x>
              Use a conditional E-value of <= <x> as the per-domain inclusion
              threshold, in targets that have already satisfied the overall
              per-target inclusion threshold. The default is 0.01.

       --incdomT <x>
              Instead of using E-values, use a bit score of >= <x> as the
              per-domain inclusion threshold.
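       The "!" and "?" markers mentioned above can be thought of as a
       three-way classification of a domain by its conditional E-value. The
       Python sketch below uses the default thresholds given on this page
       (10 for reporting, 0.01 for inclusion). It is a simplified
       illustration under the assumption that the target has already passed
       the per-target thresholds; the function name is hypothetical.

```python
from typing import Optional


def domain_marker(cond_evalue: float,
                  report_max: float = 10.0,
                  include_max: float = 0.01) -> Optional[str]:
    """Return '!' (included), '?' (reported only), or None (not reported)."""
    if cond_evalue <= include_max:
        return "!"   # satisfies the stricter inclusion threshold
    if cond_evalue <= report_max:
        return "?"   # reported, but not considered reliable enough to include
    return None      # fails the reporting threshold, so it is not listed


if __name__ == "__main__":
    for e in (1e-8, 0.5, 50.0):
        print(e, domain_marker(e))
```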

OPTIONS FOR MODEL-SPECIFIC SCORE THRESHOLDING

       Curated profile databases may define specific bit score thresholds for
       each profile, superseding any thresholding based on statistical
       significance alone.

       To use these options, the profile must contain the appropriate (GA, TC,
       and/or NC) optional score threshold annotation; this is picked up by
       hmmbuild from Stockholm format alignment files. Each thresholding
       option has two scores: the per-sequence threshold <x1> and the
       per-domain threshold <x2>. These act as if -T <x1> --incT <x1>
       --domT <x2> --incdomT <x2> has been applied specifically using each
       model's curated thresholds.

       --cut_ga
              Use the GA (gathering) bit scores in the model to set
              per-sequence (GA1) and per-domain (GA2) reporting and inclusion
              thresholds. GA thresholds are generally considered to be the
              reliable curated thresholds defining family membership; for
              example, in Pfam, these thresholds define what gets included in
              Pfam Full alignments based on searches with Pfam Seed models.

       --cut_nc
              Use the NC (noise cutoff) bit score thresholds in the model to
              set per-sequence (NC1) and per-domain (NC2) reporting and
              inclusion thresholds. NC thresholds are generally considered to
              be the score of the highest-scoring known false positive.

       --cut_tc
              Use the TC (trusted cutoff) bit score thresholds in the model to
              set per-sequence (TC1) and per-domain (TC2) reporting and
              inclusion thresholds. TC thresholds are generally considered to
              be the score of the lowest-scoring known true positive that is
              above all known false positives.
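       The equivalence stated above (a curated pair of scores acting like
       -T <x1> --incT <x1> --domT <x2> --incdomT <x2>) can be written out
       explicitly. The Python sketch below expands a pair of thresholds into
       that option list. The helper name and the example scores are
       hypothetical; real values come from each model's GA, TC, or NC
       annotation.

```python
from typing import List


def equivalent_threshold_options(x1: float, x2: float) -> List[str]:
    """Expand a curated (per-sequence, per-domain) score pair into explicit options."""
    return [
        "-T", str(x1),        # per-sequence reporting threshold
        "--incT", str(x1),    # per-sequence inclusion threshold
        "--domT", str(x2),    # per-domain reporting threshold
        "--incdomT", str(x2), # per-domain inclusion threshold
    ]


if __name__ == "__main__":
    # Made-up scores standing in for one model's GA1 and GA2 values.
    print(" ".join(equivalent_threshold_options(25.0, 22.0)))
```

       Note that the explicit options apply one pair of scores to every
       query profile, whereas the --cut_* options take the pair from each
       model separately.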

OPTIONS CONTROLLING THE ACCELERATION PIPELINE

       HMMER3 searches are accelerated in a three-step filter pipeline: the
       MSV filter, the Viterbi filter, and the Forward filter. The first
       filter is the fastest and most approximate; the last is the full
       Forward scoring algorithm. There is also a bias filter step between MSV
       and Viterbi. Targets that pass all the steps in the acceleration
       pipeline are then subjected to postprocessing -- domain identification
       and scoring using the Forward/Backward algorithm.

       Changing filter thresholds only removes or includes targets from
       consideration; changing filter thresholds does not alter bit scores,
       E-values, or alignments, all of which are determined solely in
       postprocessing.

       --max  Turn off all filters, including the bias filter, and run full
              Forward/Backward postprocessing on every target. This increases
              sensitivity somewhat, at a large cost in speed.

       --F1 <x>
              Set the P-value threshold for the MSV filter step. The default
              is 0.02, meaning that roughly 2% of the highest scoring
              nonhomologous targets are expected to pass the filter.

       --F2 <x>
              Set the P-value threshold for the Viterbi filter step. The
              default is 0.001.

       --F3 <x>
              Set the P-value threshold for the Forward filter step. The
              default is 1e-5.

       --nobias
              Turn off the bias filter. This increases sensitivity somewhat,
              but can come at a high cost in speed, especially if the query
              has biased residue composition (such as a repetitive sequence
              region, or if it is a membrane protein with large regions of
              hydrophobicity). Without the bias filter, too many sequences may
              pass the filter with biased queries, leading to slower than
              expected performance as the computationally intensive
              Forward/Backward algorithms shoulder an abnormally heavy load.
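       To get a feel for how the default filter thresholds above reduce the
       workload, the Python sketch below estimates how many nonhomologous
       targets would be expected to survive each stage. It rests on
       simplifying assumptions that are not stated on this page: every
       target is nonhomologous, each P-value threshold is read as the
       fraction of the full database expected to pass that stage, and the
       bias filter is ignored. The function name and database size are
       illustrative only.

```python
from typing import Dict


def expected_survivors(n_targets: int,
                       f1: float = 0.02,
                       f2: float = 0.001,
                       f3: float = 1e-5) -> Dict[str, float]:
    """Rough expected counts of nonhomologous targets passing each filter."""
    # Each threshold is a P-value, so about n_targets * P random targets
    # are expected to score at least that well at the given stage.
    return {
        "MSV": n_targets * f1,
        "Viterbi": n_targets * f2,
        "Forward": n_targets * f3,
    }


if __name__ == "__main__":
    for stage, count in expected_survivors(1_000_000).items():
        print(f"{stage:8s} ~{count:,.0f} targets")
```

       Under these assumptions, only a small fraction of a large database
       reaches the Forward/Backward postprocessing step, which is where the
       speedup comes from.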

OTHER OPTIONS

       --nonull2
              Turn off the null2 score corrections for biased composition.

       -Z <x> Assert that the total number of targets in your searches is <x>,
              for the purposes of per-sequence E-value calculations, rather
              than the actual number of targets seen.

       --domZ <x>
              Assert that the total number of targets in your searches is <x>,
              for the purposes of per-domain conditional E-value calculations,
              rather than the number of targets that passed the reporting
              thresholds.

       --seed <n>
              Set the random number seed to <n>. Some steps in postprocessing
              require Monte Carlo simulation. The default is to use a fixed
              seed (42), so that results are exactly reproducible. Any other
              positive integer will give different (but also reproducible)
              results. A choice of 0 uses a randomly chosen seed.

       --tformat <s>
              Assert that target sequence file seqdb is in format <s>,
              bypassing format autodetection. Common choices for <s> include:
              fasta, embl, genbank. Alignment formats also work; common
              choices include: stockholm, a2m, afa, psiblast, clustal, phylip.
              For more information, and for codes for some less common
              formats, see the main documentation. The string <s> is
              case-insensitive (fasta or FASTA both work).

       --cpu <n>
              Set the number of parallel worker threads to <n>. On multicore
              machines, the default is 2. You can also control this number by
              setting an environment variable, HMMER_NCPU. There is also a
              master thread, so the actual number of threads that HMMER spawns
              is <n>+1.

              This option is not available if HMMER was compiled with POSIX
              threads support turned off.

       --stall
              For debugging the MPI master/worker version: pause after start,
              to enable the developer to attach debuggers to the running
              master and worker(s) processes. Send SIGCONT signal to release
              the pause. (Under gdb: (gdb) signal SIGCONT) (Only available if
              optional MPI support was enabled at compile-time.)

       --mpi  Run under MPI control with master/worker parallelization (using
              mpirun, for example, or equivalent). Only available if optional
              MPI support was enabled at compile-time.
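       The -Z option above changes the database size used in per-sequence
       E-value calculations. Assuming that an E-value is the P-value
       multiplied by the number of targets (E = P * Z), an E-value obtained
       against one database size can be rescaled to another. The Python
       sketch below shows that arithmetic; the function name and the numbers
       in the demo are hypothetical, and it is not a substitute for
       rerunning the search with -Z.

```python
def rescale_evalue(evalue: float, z_old: float, z_new: float) -> float:
    """Rescale an E-value from search space size z_old to z_new (assumes E = P * Z)."""
    if z_old <= 0 or z_new <= 0:
        raise ValueError("database sizes must be positive")
    # P = E / z_old is independent of database size; multiply by the new size.
    return evalue * z_new / z_old


if __name__ == "__main__":
    # A hit with E = 0.5 against 1,000 targets, restated for 10,000 targets.
    print(rescale_evalue(0.5, 1000, 10000))
```

       Fixing Z this way makes E-values comparable across searches run
       against databases of different sizes.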

COPYRIGHT

       Copyright (C) 2020 Howard Hughes Medical Institute.
       Freely distributed under the BSD open source license.

       For additional information on copyright and licensing, see the file
       called COPYRIGHT in your HMMER source distribution, or see the HMMER
       web page (http://hmmer.org/).

AUTHOR

       http://eddylab.org

HMMER 3.3.2                        Nov 2020                       hmmsearch(1)