hmmscan(1)                      HMMER Manual                      hmmscan(1)

NAME
hmmscan - search sequence(s) against a profile database

SYNOPSIS
hmmscan [options] <hmmdb> <seqfile>

DESCRIPTION
hmmscan searches sequences against collections of profiles. For each
sequence in <seqfile>, it uses that query sequence to search the target
database of profiles in <hmmdb>, and outputs ranked lists of the
profiles with the most significant matches to the sequence.

The <seqfile> may contain more than one query sequence. It can be in
FASTA format, in several other common sequence file formats (genbank,
embl, and uniprot, among others), or in alignment file formats
(stockholm, aligned fasta, and others). See the --qformat option for a
complete list.

The <hmmdb> needs to be pressed using hmmpress before it can be
searched with hmmscan. This creates four binary files, suffixed
.h3{fimp} (that is, .h3f, .h3i, .h3m, and .h3p).

The output format is designed to be human-readable, but is often so
voluminous that reading it is impractical and parsing it is tedious.
The --tblout and --domtblout options save output in simple tabular
formats that are concise and easier to parse. The -o option redirects
the main output to a file, which can be /dev/null to discard it.
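As an illustration of the usage described above, the following Python
sketch assembles an hmmscan command line that discards the main output
and keeps only the tabular summary. The file names (profiles.hmm,
queries.fasta, hits.tbl) and the helper name build_hmmscan_argv are
hypothetical; only the options named in this page are used, and nothing
is executed.

```python
import shlex


def build_hmmscan_argv(hmmdb, seqfile, tblout=None, domtblout=None,
                       discard_main=True):
    """Assemble an hmmscan argument list; nothing is executed here."""
    argv = ["hmmscan"]
    if discard_main:
        # -o redirects the main human-readable output.
        argv += ["-o", "/dev/null"]
    if tblout:
        argv += ["--tblout", tblout]
    if domtblout:
        argv += ["--domtblout", domtblout]
    # Positional arguments: the pressed profile database, then the queries.
    argv += [hmmdb, seqfile]
    return argv


# Hypothetical file names, for illustration only.
argv = build_hmmscan_argv("profiles.hmm", "queries.fasta", tblout="hits.tbl")
print(shlex.join(argv))
```

The resulting list could be handed to subprocess.run(argv, check=True)
once <hmmdb> has been prepared with hmmpress.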

OPTIONS
-h
Help; print a brief reminder of command line usage and all available
options.

OPTIONS FOR CONTROLLING OUTPUT
-o <f>
Direct the main human-readable output to a file <f> instead of the
default stdout.

--tblout <f>
Save a simple tabular (space-delimited) file summarizing the per-target
output, with one data line per homologous target model found.

--domtblout <f>
Save a simple tabular (space-delimited) file summarizing the per-domain
output, with one data line per homologous domain detected in a query
sequence for each homologous model.

--acc
Use accessions instead of names in the main output, where available for
profiles and/or sequences.

--noali
Omit the alignment section from the main output. This can greatly
reduce the output volume.

--notextw
Remove the limit on the length of each line in the main output. The
default is a limit of 120 characters per line, which helps in
displaying the output cleanly on terminals and in editors, but can
truncate target profile description lines.

--textw <n>
Set the main output's line length limit to <n> characters per line. The
default is 120.
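A minimal Python sketch of reading a --tblout file follows. It assumes
the column layout documented in the HMMER user guide, which this page
does not spell out: comment lines start with '#', and each data line has
18 whitespace-delimited fields (target name, target accession, query
name, query accession, full-sequence E-value, full-sequence score, and
so on) followed by a free-text description. The sample line is invented
for illustration and is not real search output.

```python
def parse_tblout_line(line):
    """Return a dict for one data line, or None for comments and blanks."""
    if not line.strip() or line.startswith("#"):
        return None
    # Assumed layout: 18 fixed columns, then a free-text description.
    fields = line.split(None, 18)
    return {
        "target": fields[0],
        "target_acc": fields[1],
        "query": fields[2],
        "evalue": float(fields[4]),   # full-sequence E-value
        "score": float(fields[5]),    # full-sequence bit score
        "description": fields[18].strip() if len(fields) > 18 else "",
    }


# Invented sample line (hypothetical names and numbers).
sample = ("ExampleFam XF00001.1 query1 - 1.2e-30 105.3 0.1 "
          "1.5e-30 105.0 0.1 1.0 1 0 0 1 1 1 1 Example family")
record = parse_tblout_line(sample)
print(record["target"], record["evalue"], record["description"])
```

Splitting with a maximum field count keeps a multi-word description in
one piece, which a plain split() would break apart.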

OPTIONS FOR REPORTING THRESHOLDS
Reporting thresholds control which hits are reported in output files
(the main output, --tblout, and --domtblout).

-E <x>
In the per-target output, report target profiles with an E-value of
<= <x>. The default is 10.0, meaning that on average about 10 false
positives will be reported per query, so you can see the top of the
noise and decide for yourself if it's really noise.

-T <x>
Instead of thresholding per-profile output on E-value, report target
profiles with a bit score of >= <x>.

--domE <x>
In the per-domain output, for target profiles that have already
satisfied the per-profile reporting threshold, report individual
domains with a conditional E-value of <= <x>. The default is 10.0. A
conditional E-value means the expected number of additional false
positive domains in the smaller search space of those comparisons that
already satisfied the per-profile reporting threshold (and thus must
have at least one homologous domain already).

--domT <x>
Instead of thresholding per-domain output on E-value, report domains
with a bit score of >= <x>.
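The per-target reporting rule above can be sketched in Python as
follows. The function name is_reported and its parameter names are
hypothetical; the sketch only restates the rule that a bit score
threshold, when given, replaces the E-value threshold.

```python
def is_reported(evalue, score, E=10.0, T=None):
    """Per-target reporting test, mirroring -E and -T.

    If a bit score threshold T is supplied it is used instead of the
    E-value threshold E (default 10.0).
    """
    if T is not None:
        return score >= T
    return evalue <= E


# A weak hit: reported under the permissive default, but not under a
# stricter E-value threshold or a bit score threshold of 25.
print(is_reported(evalue=3.5, score=12.0))
print(is_reported(evalue=3.5, score=12.0, E=0.001))
print(is_reported(evalue=3.5, score=12.0, T=25.0))
```

The same comparison applies per domain with --domE and --domT, using the
conditional E-value and the domain bit score.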

OPTIONS FOR INCLUSION THRESHOLDS
Inclusion thresholds are stricter than reporting thresholds. Inclusion
thresholds control which hits are considered to be reliable enough to
be included in an output alignment or a subsequent search round. In
hmmscan, which has neither alignment output (unlike hmmsearch or
phmmer) nor iterative search steps (unlike jackhmmer), inclusion
thresholds have little effect. They only affect which domains get
marked as significant (!) or questionable (?) in domain output.

--incE <x>
Use an E-value of <= <x> as the per-target inclusion threshold. The
default is 0.01, meaning that on average about 1 false positive would
be expected in every 100 searches with different query sequences.

--incT <x>
Instead of using E-values for setting the inclusion threshold, use a
bit score of >= <x> as the per-target inclusion threshold. It would be
unusual to use bit score thresholds with hmmscan, because a single
score threshold is not expected to work for different profiles;
different profiles have slightly different expected score
distributions.

--incdomE <x>
Use a conditional E-value of <= <x> as the per-domain inclusion
threshold, in targets that have already satisfied the overall
per-target inclusion threshold. The default is 0.01.

--incdomT <x>
Instead of using E-values, use a bit score of >= <x> as the per-domain
inclusion threshold. As with --incT above, it would be unusual to use a
single bit score threshold in hmmscan.
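The effect of the inclusion thresholds on the domain markers can be
sketched as below. This is an interpretation of the text above, not
HMMER's actual code: it assumes a domain is marked significant (!) only
when its target satisfies --incE and the domain itself satisfies
--incdomE, and questionable (?) otherwise. The function and parameter
names are hypothetical.

```python
def domain_marker(target_evalue, domain_cond_evalue,
                  incE=0.01, incdomE=0.01):
    """Return '!' for an included domain, '?' for a questionable one.

    Assumption: per-domain inclusion is only considered for targets
    that already satisfy the per-target inclusion threshold.
    """
    target_included = target_evalue <= incE
    domain_included = domain_cond_evalue <= incdomE
    return "!" if (target_included and domain_included) else "?"


print(domain_marker(1e-20, 1e-18))  # strong target, strong domain
print(domain_marker(1e-20, 0.5))    # strong target, weak extra domain
print(domain_marker(2.0, 0.001))    # target itself not included
```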

OPTIONS FOR MODEL-SPECIFIC SCORE THRESHOLDING
Curated profile databases may define specific bit score thresholds for
each profile, superseding any thresholding based on statistical
significance alone.

To use these options, the profile must contain the appropriate (GA, TC,
and/or NC) optional score threshold annotation; this is picked up by
hmmbuild from Stockholm format alignment files. Each thresholding
option has two scores: the per-sequence threshold <x1> and the
per-domain threshold <x2>. These act as if -T <x1> --incT <x1>
--domT <x2> --incdomT <x2> had been applied specifically using each
model's curated thresholds.

--cut_ga
Use the GA (gathering) bit scores in the model to set per-sequence
(GA1) and per-domain (GA2) reporting and inclusion thresholds. GA
thresholds are generally considered to be the reliable curated
thresholds defining family membership; for example, in Pfam, these
thresholds define what gets included in Pfam Full alignments based on
searches with Pfam Seed models.

--cut_nc
Use the NC (noise cutoff) bit score thresholds in the model to set
per-sequence (NC1) and per-domain (NC2) reporting and inclusion
thresholds. NC thresholds are generally considered to be the score of
the highest-scoring known false positive.

--cut_tc
Use the TC (trusted cutoff) bit score thresholds in the model to set
per-sequence (TC1) and per-domain (TC2) reporting and inclusion
thresholds. TC thresholds are generally considered to be the score of
the lowest-scoring known true positive that is above all known false
positives.
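The equivalence stated above between a curated cutoff pair and the four
explicit threshold options can be written out as a short Python sketch.
The helper name and the example scores (25.0 and 22.0) are hypothetical;
real values come from each model's GA, NC, or TC annotation.

```python
def cutoff_equivalent_options(x1, x2):
    """Expand a curated (per-sequence, per-domain) bit score pair into
    the explicit options it is described as being equivalent to."""
    return [
        "-T", str(x1),        # per-sequence reporting threshold
        "--incT", str(x1),    # per-sequence inclusion threshold
        "--domT", str(x2),    # per-domain reporting threshold
        "--incdomT", str(x2), # per-domain inclusion threshold
    ]


# Hypothetical GA1/GA2 values for a single model.
print(" ".join(cutoff_equivalent_options(25.0, 22.0)))
```

Because the cutoffs differ from model to model, this expansion applies
per profile; it is not a single option set for the whole database.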

CONTROL OF THE ACCELERATION PIPELINE
HMMER3 searches are accelerated in a three-step filter pipeline: the
MSV filter, the Viterbi filter, and the Forward filter. The first
filter is the fastest and most approximate; the last is the full
Forward scoring algorithm. There is also a bias filter step between MSV
and Viterbi. Targets that pass all the steps in the acceleration
pipeline are then subjected to postprocessing: domain identification
and scoring using the Forward/Backward algorithm.

Changing filter thresholds only removes or includes targets from
consideration; changing filter thresholds does not alter bit scores,
E-values, or alignments, all of which are determined solely in
postprocessing.

--max
Turn off all filters, including the bias filter, and run full
Forward/Backward postprocessing on every target. This increases
sensitivity somewhat, at a large cost in speed.

--F1 <x>
Set the P-value threshold for the MSV filter step. The default is 0.02,
meaning that roughly 2% of the highest-scoring nonhomologous targets
are expected to pass the filter.

--F2 <x>
Set the P-value threshold for the Viterbi filter step. The default is
0.001.

--F3 <x>
Set the P-value threshold for the Forward filter step. The default is
1e-5.

--nobias
Turn off the bias filter. This increases sensitivity somewhat, but can
come at a high cost in speed, especially if the query has biased
residue composition (such as a repetitive sequence region, or if it is
a membrane protein with large regions of hydrophobicity). Without the
bias filter, too many sequences may pass the filter with biased
queries, leading to slower than expected performance as the
computationally intensive Forward/Backward algorithms shoulder an
abnormally heavy load.
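To give a feel for the default filter thresholds above, this Python
sketch estimates how many nonhomologous targets would be expected to
survive each step. It is a rough back-of-envelope model, not HMMER's
implementation: it assumes each P-value threshold is approximately the
fraction of all nonhomologous targets still passing at that step, and it
ignores the bias filter. The database size of 20000 profiles is
hypothetical.

```python
# Default P-value thresholds for --F1, --F2, and --F3.
DEFAULT_THRESHOLDS = {"MSV": 0.02, "Viterbi": 1e-3, "Forward": 1e-5}


def expected_nonhomolog_survivors(n_targets, thresholds=DEFAULT_THRESHOLDS):
    """Expected number of nonhomologous targets passing each filter.

    Assumption: the P-value threshold of a step approximates the
    fraction of all nonhomologous targets that pass it.
    """
    return {step: n_targets * p for step, p in thresholds.items()}


survivors = expected_nonhomolog_survivors(20000)  # hypothetical size
for step, count in survivors.items():
    print(f"{step:8s} ~{count:g} targets")
```

Under these assumptions only a small fraction of comparisons ever reach
the expensive Forward/Backward postprocessing, which is why --max costs
so much in speed.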

OTHER OPTIONS
--nonull2
Turn off the null2 score corrections for biased composition.

-Z <x>
Assert that the total number of targets in your searches is <x>, for
the purposes of per-sequence E-value calculations, rather than the
actual number of targets seen.

--domZ <x>
Assert that the total number of targets in your searches is <x>, for
the purposes of per-domain conditional E-value calculations, rather
than the number of targets that passed the reporting thresholds.
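The effect of -Z on per-sequence E-values can be illustrated with a
short Python sketch. It assumes the usual relationship in which an
E-value is the per-comparison P-value multiplied by the number of
targets, so changing the asserted target count rescales the E-value
linearly; this page does not state the formula, and the function name
and numbers are hypothetical.

```python
def rescale_evalue(evalue, z_actual, z_asserted):
    """Rescale an E-value computed for z_actual targets to the value it
    would have if the search space were asserted to be z_asserted.

    Assumption: E-value = P-value * Z, so E scales linearly with Z.
    """
    if z_actual <= 0 or z_asserted <= 0:
        raise ValueError("target counts must be positive")
    return evalue * z_asserted / z_actual


# A hit with E = 0.5 in a 1000-profile search, re-expressed as if the
# database held 20000 profiles.
print(rescale_evalue(0.5, z_actual=1000, z_asserted=20000))
```

Asserting a fixed <x> makes E-values comparable between searches run
against profile databases of different sizes.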

--seed <n>
Set the random number seed to <n>. Some steps in postprocessing require
Monte Carlo simulation. The default is to use a fixed seed (42), so
that results are exactly reproducible. Any other positive integer will
give different (but also reproducible) results. A choice of 0 uses an
arbitrarily chosen seed.

--qformat <s>
Assert that the query sequence file is in format <s>. Accepted formats
include fasta, embl, genbank, ddbj, uniprot, stockholm, pfam, a2m, and
afa.

--cpu <n>
Set the number of parallel worker threads to <n>. By default, HMMER
sets this to the number of CPU cores it detects in your machine; that
is, it tries to maximize the use of your available processor cores.
Setting <n> higher than the number of available cores is of little if
any value, but you may want to set it to something less. You can also
control this number by setting an environment variable, HMMER_NCPU.

This option is only available if HMMER was compiled with POSIX threads
support. This is the default, but it may have been turned off for your
site or machine for some reason.

--stall
For debugging the MPI master/worker version: pause after start, to
enable the developer to attach debuggers to the running master and
worker processes. Send a SIGCONT signal to release the pause. (Under
gdb: (gdb) signal SIGCONT)

(Only available if optional MPI support was enabled at compile-time.)

--mpi
Run in MPI master/worker mode, using mpirun.

(Only available if optional MPI support was enabled at compile-time.)

SEE ALSO
See hmmer(1) for a master man page with a list of all the individual
man pages for programs in the HMMER package.

For complete documentation, see the user guide that came with your
HMMER distribution (Userguide.pdf); or see the HMMER web page
(@HMMER_URL@).

COPYRIGHT
@HMMER_COPYRIGHT@
@HMMER_LICENSE@

For additional information on copyright and licensing, see the file
called COPYRIGHT in your HMMER source distribution, or see the HMMER
web page (@HMMER_URL@).

AUTHOR
Eddy/Rivas Laboratory
Janelia Farm Research Campus
19700 Helix Drive
Ashburn VA 20147 USA
http://eddylab.org

HMMER @HMMER_VERSION@              @HMMER_DATE@              hmmscan(1)