hmmscan(1)

hmmscan(1)                       HMMER Manual                       hmmscan(1)

NAME

       hmmscan - search sequence(s) against a profile database

SYNOPSIS

       hmmscan [options] hmmdb seqfile

DESCRIPTION

       hmmscan is used to search protein sequences against collections of
       protein profiles. For each sequence in seqfile, hmmscan uses that query
       sequence to search the target database of profiles in hmmdb and
       outputs ranked lists of the profiles with the most significant matches
       to the sequence.

       The seqfile may contain more than one query sequence. Each will be
       searched in turn against hmmdb.

       The hmmdb must be pressed with hmmpress before it can be searched with
       hmmscan. Pressing creates four binary files alongside hmmdb, with the
       suffixes .h3f, .h3i, .h3m, and .h3p.
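       A quick pre-flight check can catch an unpressed database before a
       search is launched. The Python sketch below assumes that hmmpress names
       its outputs by appending .h3f, .h3i, .h3m, and .h3p to the hmmdb
       filename; missing_press_files() is a hypothetical helper, not part of
       HMMER.

```python
import os

# Assumed naming: hmmpress writes <hmmdb>.h3f, .h3i, .h3m and .h3p.
PRESS_SUFFIXES = (".h3f", ".h3i", ".h3m", ".h3p")


def missing_press_files(hmmdb, exists=os.path.exists):
    """Return the pressed files for hmmdb that are not present.

    `exists` is injectable so the check can be tested without touching disk.
    """
    return [hmmdb + suffix for suffix in PRESS_SUFFIXES
            if not exists(hmmdb + suffix)]
```

       For example, a non-empty result from missing_press_files("Pfam-A.hmm")
       means hmmpress still has to be run on that (hypothetical) database.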

       The query seqfile may be '-' (a dash character), in which case the
       query sequences are read from a stdin pipe instead of from a file. The
       hmmdb cannot be read from a stdin stream, because it requires the four
       auxiliary binary files generated by hmmpress.
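       Because seqfile may be '-', queries can be streamed in from another
       program. The sketch below assumes Python 3.7+ and an hmmscan binary on
       PATH; it builds the argument list in the order shown in SYNOPSIS
       (options, then hmmdb, then seqfile) and pipes FASTA text to stdin. The
       helpers hmmscan_argv() and scan_stdin() are hypothetical; the flags
       they emit are the ones documented on this page.

```python
import subprocess


def hmmscan_argv(hmmdb, seqfile="-", tblout=None, domtblout=None, cpu=None):
    """Build an hmmscan command line; seqfile "-" reads queries from stdin."""
    argv = ["hmmscan"]
    if tblout is not None:
        argv += ["--tblout", tblout]
    if domtblout is not None:
        argv += ["--domtblout", domtblout]
    if cpu is not None:
        argv += ["--cpu", str(cpu)]
    # Options come first, then the two positional arguments.
    argv += [hmmdb, seqfile]
    return argv


def scan_stdin(hmmdb, fasta_text, **options):
    """Pipe FASTA text to hmmscan and return its main (stdout) output."""
    result = subprocess.run(hmmscan_argv(hmmdb, "-", **options),
                            input=fasta_text, capture_output=True,
                            text=True, check=True)
    return result.stdout
```

       Only seqfile is ever set to "-" here; hmmdb must always be a real,
       pressed file path.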

       The output format is designed to be human-readable, but is often so
       voluminous that reading it is impractical, and parsing it is a pain.
       The --tblout and --domtblout options save output in simple tabular
       formats that are concise and easier to parse. The -o option redirects
       the main output to a file; directing it to /dev/null throws it away.
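       As a hedged sketch of parsing the per-target table, the Python function
       below reads --tblout lines. It assumes the layout described in the
       HMMER User's Guide: 18 whitespace-separated fields (target name and
       accession, query name and accession, full-sequence E-value, score, and
       bias, then best-domain and domain-number estimation columns), followed
       by a free-text target description that may itself contain spaces.
       Comment lines begin with '#'. Check the header of your own output
       before relying on these column positions.

```python
from typing import List, NamedTuple


class TargetHit(NamedTuple):
    target: str
    target_acc: str
    query: str
    query_acc: str
    evalue: float        # full-sequence E-value
    score: float         # full-sequence bit score
    bias: float
    description: str


def parse_tblout(lines) -> List[TargetHit]:
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue                      # skip blank and comment lines
        fields = line.split(maxsplit=18)  # 18 fixed columns + description
        description = fields[18].strip() if len(fields) > 18 else ""
        hits.append(TargetHit(*fields[:4], float(fields[4]),
                              float(fields[5]), float(fields[6]),
                              description))
    return hits
```

       Usage, with a hypothetical file name: open "hits.tbl" and pass the file
       object straight to parse_tblout(), since it accepts any iterable of
       lines.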

OPTIONS

       -h     Help; print a brief reminder of command line usage and all
              available options.

OPTIONS FOR CONTROLLING OUTPUT

       -o <f> Direct the main human-readable output to a file <f> instead of
              the default stdout.

       --tblout <f>
              Save a simple tabular (space-delimited) file summarizing the
              per-target output, with one data line per homologous target
              model found.

       --domtblout <f>
              Save a simple tabular (space-delimited) file summarizing the
              per-domain output, with one data line per homologous domain
              detected in a query sequence for each homologous model.

       --pfamtblout <f>
              Save an especially succinct tabular (space-delimited) file
              summarizing the per-target output, with one data line per
              homologous target model found.

       --acc  Use accessions instead of names in the main output, where
              available for profiles and/or sequences.

       --noali
              Omit the alignment section from the main output. This can
              greatly reduce the output volume.

       --notextw
              Remove the limit on the length of each line in the main output.
              The default is a limit of 120 characters per line, which helps
              in displaying the output cleanly on terminals and in editors,
              but can truncate target profile description lines.

       --textw <n>
              Set the main output's line length limit to <n> characters per
              line. The default is 120.
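       The per-domain table can be read the same way. This sketch assumes the
       --domtblout layout described in the HMMER User's Guide: 22 fixed
       fields (target name, accession, tlen, query name, accession, qlen,
       full-sequence E-value, score, bias, domain number "#", domain count
       "of", conditional E-value, independent E-value, domain score and bias,
       HMM from/to, alignment from/to, envelope from/to, and an accuracy
       value) followed by the free-text description. The DomainHit field
       names are hypothetical and keep only a subset of the columns.

```python
from typing import List, NamedTuple


class DomainHit(NamedTuple):
    target: str
    query: str
    dom_index: int       # "#": which domain of this target/query pair
    dom_count: int       # "of": number of domains found for the pair
    c_evalue: float      # conditional E-value (what --domE thresholds)
    i_evalue: float      # independent E-value
    score: float         # domain bit score
    hmm_from: int        # coordinates on the profile
    hmm_to: int
    ali_from: int        # alignment coordinates on the query sequence
    ali_to: int


def parse_domtblout(lines) -> List[DomainHit]:
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue                  # skip blank and comment lines
        f = line.split(maxsplit=22)   # 22 fixed columns + description
        hits.append(DomainHit(f[0], f[3], int(f[9]), int(f[10]),
                              float(f[11]), float(f[12]), float(f[13]),
                              int(f[15]), int(f[16]), int(f[17]), int(f[18])))
    return hits
```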

OPTIONS FOR REPORTING THRESHOLDS

       Reporting thresholds control which hits are reported in output files
       (the main output, --tblout, and --domtblout).

       -E <x> In the per-target output, report target profiles with an E-value
              of <= <x>. The default is 10.0, meaning that on average, about
              10 false positives will be reported per query, so you can see
              the top of the noise and decide for yourself if it's really
              noise.

       -T <x> Instead of thresholding per-profile output on E-value, report
              target profiles with a bit score of >= <x>.

       --domE <x>
              In the per-domain output, for target profiles that have already
              satisfied the per-profile reporting threshold, report individual
              domains with a conditional E-value of <= <x>. The default is
              10.0. A conditional E-value is the expected number of additional
              false positive domains in the smaller search space of those
              comparisons that already satisfied the per-profile reporting
              threshold (and thus must already contain at least one
              homologous domain).

       --domT <x>
              Instead of thresholding per-domain output on E-value, report
              domains with a bit score of >= <x>.
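       The two-level logic of these reporting options can be sketched in
       Python. The data classes and field names are hypothetical stand-ins
       for HMMER's internal hit records; the sketch only shows that a
       bit-score cutoff (-T, --domT) replaces the matching E-value cutoff
       (-E, --domE), and that domains are considered only for targets that
       passed the per-target threshold.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Domain:
    c_evalue: float          # conditional E-value
    score: float             # domain bit score


@dataclass
class Target:
    name: str
    evalue: float            # per-target E-value
    score: float             # per-target bit score
    domains: List[Domain] = field(default_factory=list)


def passes(evalue, score, e_cut, t_cut):
    """A bit-score cutoff, when given, replaces the E-value cutoff."""
    return score >= t_cut if t_cut is not None else evalue <= e_cut


def reported(targets, E=10.0, T=None, domE=10.0, domT=None):
    """Map each reported target name to its reported domains."""
    out = {}
    for t in targets:
        if not passes(t.evalue, t.score, E, T):
            continue         # domains of unreported targets are never shown
        out[t.name] = [d for d in t.domains
                       if passes(d.c_evalue, d.score, domE, domT)]
    return out
```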

OPTIONS FOR INCLUSION THRESHOLDS

       Inclusion thresholds are stricter than reporting thresholds. Inclusion
       thresholds control which hits are considered to be reliable enough to
       be included in an output alignment or a subsequent search round. In
       hmmscan, which has neither alignment output (like hmmsearch or phmmer)
       nor iterative search steps (like jackhmmer), inclusion thresholds have
       little effect. They only affect which domains are marked as
       significant (!) or questionable (?) in the domain output.

       --incE <x>
              Use an E-value of <= <x> as the per-target inclusion threshold.
              The default is 0.01, meaning that on average, about 1 false
              positive would be expected in every 100 searches with different
              query sequences.

       --incT <x>
              Instead of using E-values for setting the inclusion threshold,
              use a bit score of >= <x> as the per-target inclusion threshold.
              It would be unusual to use bit score thresholds with hmmscan,
              because you don't expect a single score threshold to work for
              different profiles; different profiles have slightly different
              expected score distributions.

       --incdomE <x>
              Use a conditional E-value of <= <x> as the per-domain inclusion
              threshold, in targets that have already satisfied the overall
              per-target inclusion threshold. The default is 0.01.

       --incdomT <x>
              Instead of using E-values, use a bit score of >= <x> as the
              per-domain inclusion threshold. As with --incT above, it would
              be unusual to use a single bit score threshold in hmmscan.
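       A minimal sketch of how these options could decide between the ! and ?
       marks for a domain that has already been reported. It assumes, as the
       --incdomE description says, that a domain is only included when its
       target also meets the per-target inclusion threshold; the function
       and its tuple arguments are hypothetical.

```python
def meets(evalue, score, e_cut, t_cut):
    """A bit-score cutoff, when given, replaces the E-value cutoff."""
    return score >= t_cut if t_cut is not None else evalue <= e_cut


def domain_mark(target, domain, incE=0.01, incT=None,
                incdomE=0.01, incdomT=None):
    """Return "!" for an included domain, "?" for a reported-only one.

    target: (per-target E-value, per-target bit score)
    domain: (conditional E-value, domain bit score)
    """
    target_included = meets(target[0], target[1], incE, incT)
    domain_included = meets(domain[0], domain[1], incdomE, incdomT)
    return "!" if target_included and domain_included else "?"
```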

OPTIONS FOR MODEL-SPECIFIC SCORE THRESHOLDING

       Curated profile databases may define specific bit score thresholds for
       each profile, superseding any thresholding based on statistical
       significance alone.

       To use these options, the profile must contain the appropriate (GA, TC,
       and/or NC) optional score threshold annotation; this is picked up by
       hmmbuild from Stockholm format alignment files. Each thresholding
       option has two scores: the per-sequence threshold <x1> and the
       per-domain threshold <x2>. These act as if -T <x1> --incT <x1> --domT
       <x2> --incdomT <x2> had been applied specifically using each model's
       curated thresholds.

       --cut_ga
              Use the GA (gathering) bit scores in the model to set
              per-sequence (GA1) and per-domain (GA2) reporting and inclusion
              thresholds. GA thresholds are generally considered to be the
              reliable curated thresholds defining family membership; for
              example, in Pfam, these thresholds define what gets included in
              Pfam Full alignments based on searches with Pfam Seed models.

       --cut_nc
              Use the NC (noise cutoff) bit score thresholds in the model to
              set per-sequence (NC1) and per-domain (NC2) reporting and
              inclusion thresholds. NC thresholds are generally considered to
              be the score of the highest-scoring known false positive.

       --cut_tc
              Use the TC (trusted cutoff) bit score thresholds in the model to
              set per-sequence (TC1) and per-domain (TC2) reporting and
              inclusion thresholds. TC thresholds are generally considered to
              be the score of the lowest-scoring known true positive that is
              above all known false positives.
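       In hmmscan the targets are profiles, so each profile can carry its own
       pair of curated cutoffs. The sketch below mimics the "as if -T <x1>
       --domT <x2>" behaviour for reporting, model by model (inclusion would
       use the same two numbers). The dictionaries are a hypothetical
       representation of hit scores and of the GA, TC, or NC annotation, not
       a HMMER API.

```python
def apply_model_cutoffs(hits, cutoffs):
    """Report hits using each model's own (x1, x2) bit score cutoffs.

    hits:    {model name: (per-sequence score, [per-domain scores])}
    cutoffs: {model name: (x1, x2)}, e.g. the model's GA1 and GA2 values.
    A model with no entry raises KeyError in this sketch.
    """
    kept = {}
    for model, (seq_score, dom_scores) in hits.items():
        x1, x2 = cutoffs[model]
        if seq_score >= x1:                                   # like -T <x1>
            kept[model] = [s for s in dom_scores if s >= x2]  # like --domT <x2>
    return kept
```

       With hits {"A": (30.0, [28.0, 12.0]), "B": (30.0, [29.0])} and cutoffs
       {"A": (25.0, 20.0), "B": (35.0, 20.0)}, the same score of 30.0 is
       reported for model A but not for model B, which is the point of
       per-model thresholds.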

CONTROL OF THE ACCELERATION PIPELINE

       HMMER3 searches are accelerated in a three-step filter pipeline: the
       MSV filter, the Viterbi filter, and the Forward filter. The first
       filter is the fastest and most approximate; the last is the full
       Forward scoring algorithm. There is also a bias filter step between
       MSV and Viterbi. Targets that pass all the steps in the acceleration
       pipeline are then subjected to postprocessing: domain identification
       and scoring using the Forward/Backward algorithm.

       Changing filter thresholds only removes targets from, or admits
       targets to, consideration; it does not alter bit scores, E-values, or
       alignments, all of which are determined solely in postprocessing.

       --max  Turn off all filters, including the bias filter, and run full
              Forward/Backward postprocessing on every target. This increases
              sensitivity somewhat, at a large cost in speed.

       --F1 <x>
              Set the P-value threshold for the MSV filter step. The default
              is 0.02, meaning that roughly 2% of the highest-scoring
              nonhomologous targets are expected to pass the filter.

       --F2 <x>
              Set the P-value threshold for the Viterbi filter step. The
              default is 0.001.

       --F3 <x>
              Set the P-value threshold for the Forward filter step. The
              default is 1e-5.

       --nobias
              Turn off the bias filter. This increases sensitivity somewhat,
              but can come at a high cost in speed, especially if the query
              has biased residue composition (for example, if it contains a
              repetitive sequence region, or if it is a membrane protein with
              large hydrophobic regions). Without the bias filter, too many
              targets may pass the filter with biased queries, leading to
              slower than expected performance as the computationally
              intensive Forward/Backward algorithms shoulder an abnormally
              heavy load.
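       The filter cascade can be pictured as a series of P-value gates. The
       Python sketch below takes hypothetical, precomputed per-stage P-values
       for one target (HMMER itself derives them from filter scores) and
       applies the documented defaults for --F1, --F2, and --F3. It also
       assumes that the bias-filter step is tested against the same F1
       threshold as MSV; treat that as an assumption about the
       implementation, not a documented fact.

```python
from dataclasses import dataclass


@dataclass
class StagePvalues:
    """Hypothetical P-values one target would receive at each filter stage."""
    msv: float
    bias: float       # MSV P-value after the bias-filter correction
    viterbi: float
    forward: float


def reaches_postprocessing(p, F1=0.02, F2=1e-3, F3=1e-5,
                           bias_filter=True, max_mode=False):
    """Return True if a target survives the acceleration pipeline."""
    if max_mode:                       # --max: all filters off
        return True
    if p.msv > F1:                     # MSV filter
        return False
    if bias_filter and p.bias > F1:    # bias filter (--nobias skips it)
        return False
    if p.viterbi > F2:                 # Viterbi filter
        return False
    return p.forward <= F3             # Forward filter
```

       As a rough worked figure: with the default --F1 of 0.02, a database
       of, say, 20,000 profiles would let about 0.02 * 20,000 = 400
       nonhomologous profiles through the MSV stage for each query.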

OTHER OPTIONS

       --nonull2
              Turn off the null2 score corrections for biased composition.

       -Z <x> Assert that the total number of targets in your searches is <x>,
              for the purposes of per-sequence E-value calculations, rather
              than the actual number of targets seen.

       --domZ <x>
              Assert that the total number of targets in your searches is <x>,
              for the purposes of per-domain conditional E-value calculations,
              rather than the number of targets that passed the reporting
              thresholds.
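       The effect of -Z and --domZ is a change of search-space size in the
       E-value arithmetic. Assuming the standard relation E = P * N, where N
       is the number of comparisons, a minimal sketch follows; the P-values
       are hypothetical inputs, since HMMER derives them internally from
       score distributions, and evalues() is not a HMMER function.

```python
def evalues(target_p, domain_p, n_targets, n_reported, Z=None, domZ=None):
    """Return (per-target E-value, conditional per-domain E-value).

    Default search spaces follow the -Z/--domZ descriptions: the number of
    targets actually searched, and the number that passed the reporting
    thresholds. Passing Z or domZ overrides them.
    """
    z = n_targets if Z is None else Z
    dom_z = n_reported if domZ is None else domZ
    return target_p * z, domain_p * dom_z
```

       For example, a target P-value of 1e-6 against 20,000 profiles gives an
       E-value of about 0.02; asserting -Z 1000 would shrink it to about
       0.001.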

       --seed <n>
              Set the random number seed to <n>. Some steps in postprocessing
              require Monte Carlo simulation. The default is to use a fixed
              seed (42), so that results are exactly reproducible. Any other
              positive integer will give different (but also reproducible)
              results. A choice of 0 uses an arbitrarily chosen seed.
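       The seeding convention can be illustrated with Python's own random
       module. This is only an analogy: HMMER uses its own random number
       generator, so the same seed will not reproduce HMMER's internal draws.
       make_rng() is hypothetical, and rejecting negative seeds is a choice
       made in this sketch, not documented HMMER behaviour.

```python
import random
import secrets


def make_rng(seed=42):
    """Return a generator following the --seed convention.

    seed > 0: fixed, reproducible stream (42 mirrors the documented default).
    seed == 0: an arbitrarily chosen seed, different from run to run.
    """
    if seed < 0:
        raise ValueError("seed must be a non-negative integer")
    if seed == 0:
        seed = secrets.randbits(32)   # arbitrary seed from the OS entropy source
    return random.Random(seed)
```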

       --qformat <s>
              Assert that input seqfile is in format <s>, bypassing format
              autodetection. Common choices for <s> include: fasta, embl,
              genbank. Alignment formats also work; common choices include:
              stockholm, a2m, afa, psiblast, clustal, phylip. For more
              information, and for codes for some less common formats, see the
              main documentation. The string <s> is case-insensitive (fasta or
              FASTA both work).

       --cpu <n>
              Set the number of parallel worker threads to <n>. On multicore
              machines, the default is 2. You can also control this number by
              setting an environment variable, HMMER_NCPU. There is also a
              master thread, so the actual number of threads that HMMER spawns
              is <n>+1.

              This option is not available if HMMER was compiled with POSIX
              threads support turned off.
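       The thread-count rule above reduces to a few lines. This sketch assumes
       that an explicit --cpu value takes precedence over HMMER_NCPU, which in
       turn overrides the multicore default of 2; hmmer_thread_count() is
       hypothetical and does not model builds without thread support.

```python
import os


def hmmer_thread_count(cpu=None, env=None, default=2):
    """Return (worker threads, total threads including the master).

    cpu: value given with --cpu, or None if the option was not used.
    env: mapping to read HMMER_NCPU from (defaults to os.environ).
    """
    env = os.environ if env is None else env
    if cpu is not None:
        workers = cpu
    elif env.get("HMMER_NCPU"):
        workers = int(env["HMMER_NCPU"])
    else:
        workers = default
    return workers, workers + 1   # one extra master thread
```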

       --stall
              For debugging the MPI master/worker version: pause after start,
              to enable the developer to attach debuggers to the running
              master and worker(s) processes. Send SIGCONT signal to release
              the pause. (Under gdb: (gdb) signal SIGCONT)

              (Only available if optional MPI support was enabled at
              compile-time.)

       --mpi  Run under MPI control with master/worker parallelization (using
              mpirun, for example, or equivalent). Only available if optional
              MPI support was enabled at compile-time.

COPYRIGHT

       Copyright (C) 2020 Howard Hughes Medical Institute.
       Freely distributed under the BSD open source license.

       For additional information on copyright and licensing, see the file
       called COPYRIGHT in your HMMER source distribution, or see the HMMER
       web page (http://hmmer.org/).

AUTHOR

       http://eddylab.org
HMMER 3.3.2                        Nov 2020                         hmmscan(1)