hmmsearch(1)                     HMMER Manual                     hmmsearch(1)

NAME
       hmmsearch - search profile(s) against a sequence database

SYNOPSIS
       hmmsearch [options] <hmmfile> <seqdb>

DESCRIPTION
       hmmsearch is used to search one or more profiles against a sequence
       database. For each profile in <hmmfile>, use that query profile to
       search the target database of sequences in <seqdb>, and output ranked
       lists of the sequences with the most significant matches to the
       profile. To build profiles from multiple alignments, see hmmbuild.

       Either the query <hmmfile> or the target <seqdb> may be '-' (a dash
       character), in which case the query profile or target database input
       will be read from a <stdin> pipe instead of from a file. Only one input
       source can come through <stdin>, not both. In addition, if <hmmfile>
       contains more than one query profile, <seqdb> cannot come from
       <stdin>, because the streaming target database can't be rewound to
       search it with another profile.
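
       The <stdin> rules above can be expressed as a small pre-flight check,
       for example in a wrapper script. This is a minimal Python sketch, not
       HMMER's own code. The function check_stdin_inputs is hypothetical, and
       so is its n_profiles argument, which holds the number of query
       profiles in <hmmfile>.

```python
def check_stdin_inputs(hmmfile: str, seqdb: str, n_profiles: int) -> None:
    """Hypothetical pre-flight check (not part of HMMER).

    Raises ValueError if the '-' (stdin) usage would be rejected.
    n_profiles is the number of query profiles in <hmmfile>.
    """
    if hmmfile == "-" and seqdb == "-":
        # Only one of the two inputs may be piped through stdin.
        raise ValueError("only one of <hmmfile> and <seqdb> may be '-'")
    if seqdb == "-" and n_profiles > 1:
        # A streamed target database cannot be rewound for a second query.
        raise ValueError("<seqdb> cannot be '-' when <hmmfile> has "
                         "more than one profile")


check_stdin_inputs("-", "targets.fa", n_profiles=3)   # OK: profiles piped in
check_stdin_inputs("query.hmm", "-", n_profiles=1)    # OK: one query, targets piped
```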

       The output format is designed to be human-readable, but is often so
       voluminous that reading it is impractical, and parsing it is a pain.
       The --tblout and --domtblout options save output in simple tabular
       formats that are concise and easier to parse. The -o option allows
       redirecting the main output, including throwing it away in /dev/null.

OPTIONS
       -h     Help; print a brief reminder of command line usage and all
              available options.

OPTIONS FOR CONTROLLING OUTPUT
       -o <f> Direct the main human-readable output to a file <f> instead of
              the default stdout.

       -A <f> Save a multiple alignment of all significant hits (those
              satisfying inclusion thresholds) to the file <f>.

       --tblout <f>
              Save a simple tabular (space-delimited) file summarizing the
              per-target output, with one data line per homologous target
              sequence found.
61
62
63 --domtblout <f>
64 Save a simple tabular (space-delimited) file summarizing the
65 per-domain output, with one data line per homologous domain
66 detected in a query sequence for each homologous model.
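
       A minimal Python sketch of reading --tblout output. It assumes the
       per-target table layout described in the HMMER user guide: lines
       starting with '#' are comments, and each data line has 18
       space-delimited fields followed by a free-text target description
       that may itself contain spaces. The names TargetHit and parse_tblout
       are hypothetical, and only the first seven fields are kept here. The
       --domtblout table has a different column layout and would need its
       own parser.

```python
from typing import Iterable, Iterator, NamedTuple


class TargetHit(NamedTuple):
    """One per-target line of --tblout output (hypothetical record type)."""
    target: str
    target_acc: str
    query: str
    query_acc: str
    evalue: float   # full-sequence E-value
    score: float    # full-sequence bit score
    bias: float     # full-sequence biased-composition correction
    description: str


def parse_tblout(lines: Iterable[str]) -> Iterator[TargetHit]:
    """Yield TargetHit records from --tblout text, skipping comments."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # blank line, or '#' header/comment line
        # 18 space-delimited fields, then a description that may contain
        # spaces: split at most 18 times so the description stays whole.
        fields = line.split(maxsplit=18)
        description = fields[18].rstrip() if len(fields) > 18 else ""
        yield TargetHit(fields[0], fields[1], fields[2], fields[3],
                        float(fields[4]), float(fields[5]), float(fields[6]),
                        description)


# Usage, assuming a file written with: hmmsearch --tblout hits.tbl ...
# with open("hits.tbl") as fh:
#     for hit in parse_tblout(fh):
#         print(hit.target, hit.evalue)
```

       Splitting with maxsplit=18 keeps any spaces inside the description
       intact. A plain whitespace split would break the description into
       separate fields.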
67
68
69 --acc Use accessions instead of names in the main output, where avail‐
70 able for profiles and/or sequences.
71
72
73 --noali
74 Omit the alignment section from the main output. This can
75 greatly reduce the output volume.
76
77
78 --notextw
79 Unlimit the length of each line in the main output. The default
80 is a limit of 120 characters per line, which helps in displaying
81 the output cleanly on terminals and in editors, but can truncate
82 target profile description lines.
83
84
85 --textw <n>
86 Set the main output's line length limit to <n> characters per
87 line. The default is 120.
88
89
90
91
93 Reporting thresholds control which hits are reported in output files
94 (the main output, --tblout, and --domtblout). Sequence hits and domain
95 hits are ranked by statistical significance (E-value) and output is
96 generated in two sections called per-target and per-domain output. In
97 per-target output, by default, all sequence hits with an E-value <= 10
98 are reported. In the per-domain output, for each target that has passed
99 per-target reporting thresholds, all domains satisfying per-domain
100 reporting thresholds are reported. By default, these are domains with
101 conditional E-values of <= 10. The following options allow you to
102 change the default E-value reporting thresholds, or to use bit score
103 thresholds instead.

       -E <x> In the per-target output, report target sequences with an
              E-value of <= <x>. The default is 10.0, meaning that on average,
              about 10 false positives will be reported per query, so you can
              see the top of the noise and decide for yourself if it's really
              noise.

       -T <x> Instead of thresholding per-target output on E-value, report
              target sequences with a bit score of >= <x>.

       --domE <x>
              In the per-domain output, for target sequences that have already
              satisfied the per-target reporting threshold, report individual
              domains with a conditional E-value of <= <x>. The default is
              10.0. A conditional E-value means the expected number of
              additional false positive domains in the smaller search space of
              those comparisons that already satisfied the per-target
              reporting threshold (and thus must have at least one homologous
              domain already).

       --domT <x>
              Instead of thresholding per-domain output on E-value, report
              domains with a bit score of >= <x>.
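
       The two-level reporting scheme described above can be sketched in
       Python as follows. The Target and Domain records are hypothetical
       stand-ins for hmmsearch's internal hit lists. The sketch shows the
       filtering order: targets are filtered first, then domains are
       filtered only within reported targets. It also shows that a bit-score
       threshold, when set, replaces the corresponding E-value threshold.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Domain:
    c_evalue: float  # conditional E-value
    score: float     # domain bit score


@dataclass
class Target:
    name: str
    evalue: float    # per-target E-value
    score: float     # per-target bit score
    domains: List[Domain] = field(default_factory=list)


def report(targets: List[Target], E: float = 10.0, T: Optional[float] = None,
           domE: float = 10.0, domT: Optional[float] = None
           ) -> List[Tuple[str, List[Domain]]]:
    """Hypothetical sketch of -E/-T and --domE/--domT reporting.

    A bit-score threshold (T, domT), when given, replaces the
    corresponding E-value threshold (E, domE).
    """
    def target_ok(t: Target) -> bool:
        return t.score >= T if T is not None else t.evalue <= E

    def domain_ok(d: Domain) -> bool:
        return d.score >= domT if domT is not None else d.c_evalue <= domE

    reported = []
    # Rank targets by E-value and keep those passing the per-target
    # threshold. Then filter domains only within those reported targets.
    for t in sorted(targets, key=lambda x: x.evalue):
        if target_ok(t):
            reported.append((t.name, [d for d in t.domains if domain_ok(d)]))
    return reported
```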

OPTIONS CONTROLLING INCLUSION THRESHOLDS
       Inclusion thresholds are stricter than reporting thresholds. Inclusion
       thresholds control which hits are considered to be reliable enough to
       be included in an output alignment or a subsequent search round, or
       marked as significant ("!") as opposed to questionable ("?") in domain
       output.
144
145
146 --incE <x>
147 Use an E-value of <= <x> as the per-target inclusion threshold.
148 The default is 0.01, meaning that on average, about 1 false pos‐
149 itive would be expected in every 100 searches with different
150 query sequences.
151
152
153 --incT <x>
154 Instead of using E-values for setting the inclusion threshold,
155 instead use a bit score of >= <x> as the per-target inclusion
156 threshold. By default this option is unset.

       --incdomE <x>
              Use a conditional E-value of <= <x> as the per-domain inclusion
              threshold, in targets that have already satisfied the overall
              per-target inclusion threshold. The default is 0.01.

       --incdomT <x>
              Instead of using E-values, use a bit score of >= <x> as the
              per-domain inclusion threshold.
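
       A minimal Python sketch of the "!" versus "?" marking for a domain
       that has already been reported. It assumes a domain can only be
       marked "!" when its target has also met the per-target inclusion
       threshold. The function names are hypothetical. The second function
       shows the arithmetic behind the --incE default: an E-value is an
       expected count of false positives per search, so expected counts add
       up across searches.

```python
from typing import Optional


def domain_mark(c_evalue: float, score: float, target_included: bool,
                incdomE: float = 0.01, incdomT: Optional[float] = None) -> str:
    """Hypothetical sketch: '!' if a reported domain meets inclusion, else '?'.

    Assumes a domain can only be included when its target has already met
    the per-target inclusion threshold (--incE/--incT).
    """
    if incdomT is not None:
        passes = score >= incdomT        # --incdomT replaces --incdomE
    else:
        passes = c_evalue <= incdomE
    return "!" if (target_included and passes) else "?"


def expected_false_inclusions(incE: float, n_searches: int) -> float:
    """Expected false positives included at threshold incE over n searches.

    An E-value is an expected count per search, so counts add linearly.
    """
    return incE * n_searches
```

       For example, expected_false_inclusions(0.01, 100) returns 1.0. This
       is the one false positive per 100 searches quoted above for --incE.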
168
169
170
171
173 Curated profile databases may define specific bit score thresholds for
174 each profile, superseding any thresholding based on statistical signif‐
175 icance alone.
176
177 To use these options, the profile must contain the appropriate (GA, TC,
178 and/or NC) optional score threshold annotation; this is picked up by
179 hmmbuild from Stockholm format alignment files. Each thresholding
180 option has two scores: the per-sequence threshold <x1> and the per-
181 domain threshold <x2> These act as if -T<x1> --incT<x1> --domT<x2>
182 --incdomT<x2> has been applied specifically using each model's curated
183 thresholds.

       --cut_ga
              Use the GA (gathering) bit scores in the model to set
              per-sequence (GA1) and per-domain (GA2) reporting and inclusion
              thresholds. GA thresholds are generally considered to be the
              reliable curated thresholds defining family membership; for
              example, in Pfam, these thresholds define what gets included in
              Pfam Full alignments based on searches with Pfam Seed models.

       --cut_nc
              Use the NC (noise cutoff) bit score thresholds in the model to
              set per-sequence (NC1) and per-domain (NC2) reporting and
              inclusion thresholds. NC thresholds are generally considered to
              be the score of the highest-scoring known false positive.

       --cut_tc
              Use the TC (trusted cutoff) bit score thresholds in the model to
              set per-sequence (TC1) and per-domain (TC2) reporting and
              inclusion thresholds. TC thresholds are generally considered to
              be the score of the lowest-scoring known true positive that is
              above all known false positives.
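
       The behaviour of --cut_ga, --cut_nc and --cut_tc, as described above,
       amounts to expanding a model's (per-sequence, per-domain) cutoff pair
       into four bit-score thresholds. A minimal Python sketch; the function
       name and the dictionary used to hold a model's cutoffs are
       hypothetical, and the example scores are made up.

```python
from typing import Dict, Tuple


def curated_thresholds(cutoffs: Dict[str, Tuple[float, float]],
                       which: str) -> Dict[str, float]:
    """Hypothetical sketch of --cut_ga / --cut_nc / --cut_tc.

    cutoffs maps "GA", "NC" or "TC" to a (per-sequence, per-domain) pair
    of bit scores, as read from the model's optional annotation.
    """
    if which not in cutoffs:
        # The model must carry the requested annotation for the option to work.
        raise KeyError(f"model has no {which} cutoffs")
    x1, x2 = cutoffs[which]
    # Equivalent to: -T x1 --incT x1 --domT x2 --incdomT x2
    return {"-T": x1, "--incT": x1, "--domT": x2, "--incdomT": x2}
```

       For example, curated_thresholds({"GA": (25.0, 20.0)}, "GA") sets -T
       and --incT to 25.0 and --domT and --incdomT to 20.0.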
208
209
210
211
212
214 HMMER3 searches are accelerated in a three-step filter pipeline: the
215 MSV filter, the Viterbi filter, and the Forward filter. The first fil‐
216 ter is the fastest and most approximate; the last is the full Forward
217 scoring algorithm. There is also a bias filter step between MSV and
218 Viterbi. Targets that pass all the steps in the acceleration pipeline
219 are then subjected to postprocessing -- domain identification and scor‐
220 ing using the Forward/Backward algorithm.
221
222 Changing filter thresholds only removes or includes targets from con‐
223 sideration; changing filter thresholds does not alter bit scores, E-
224 values, or alignments, all of which are determined solely in postpro‐
225 cessing.
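
       A minimal Python sketch of the filter cascade described above. The
       per-stage P-value functions (msv_p, vit_p, fwd_p) and the bias test
       (bias_ok) are hypothetical callables. They stand in for HMMER's
       vectorized filter implementations. The defaults for F1, F2 and F3 are
       the documented --F1, --F2 and --F3 defaults. The sketch is
       structural: each stage can only discard a target, and nothing in it
       computes the scores that are reported.

```python
from typing import Callable, Iterable, List


def run_pipeline(targets: Iterable[str],
                 msv_p: Callable[[str], float],
                 bias_ok: Callable[[str], bool],
                 vit_p: Callable[[str], float],
                 fwd_p: Callable[[str], float],
                 F1: float = 0.02, F2: float = 1e-3,
                 F3: float = 1e-5) -> List[str]:
    """Hypothetical sketch: return targets that survive every filter stage.

    Survivors would then go on to postprocessing, which is where bit
    scores, E-values and alignments are actually computed.
    """
    survivors = []
    for t in targets:
        if msv_p(t) > F1:      # MSV filter: fastest, most approximate
            continue
        if not bias_ok(t):     # bias filter, between MSV and Viterbi
            continue
        if vit_p(t) > F2:      # Viterbi filter
            continue
        if fwd_p(t) > F3:      # full Forward score
            continue
        survivors.append(t)
    return survivors
```

       Passing F1=1.0, F2=1.0 and F3=1.0, together with a bias_ok that
       always returns True, lets every target through to postprocessing.
       This is roughly what --max does.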

       --max  Turn off all filters, including the bias filter, and run full
              Forward/Backward postprocessing on every target. This increases
              sensitivity somewhat, at a large cost in speed.

       --F1 <x>
              Set the P-value threshold for the MSV filter step. The default
              is 0.02, meaning that roughly 2% of the highest-scoring
              nonhomologous targets are expected to pass the filter.

       --F2 <x>
              Set the P-value threshold for the Viterbi filter step. The
              default is 0.001.

       --F3 <x>
              Set the P-value threshold for the Forward filter step. The
              default is 1e-5.
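
       Each filter threshold is a P-value. A rough estimate of how many
       nonhomologous targets reach each stage is therefore the number of
       targets times that stage's threshold. A minimal Python sketch,
       assuming the filter P-values are well calibrated; the function name
       is hypothetical.

```python
from typing import Dict


def expected_nonhomologous_survivors(n_targets: int, F1: float = 0.02,
                                     F2: float = 1e-3,
                                     F3: float = 1e-5) -> Dict[str, float]:
    """Hypothetical estimate of unrelated targets passing each filter stage.

    Assumes each stage's P-values are well calibrated, so about
    n_targets * F nonhomologous targets score well enough to pass it.
    A later stage cannot pass more targets than the stage before it.
    """
    msv = n_targets * F1
    viterbi = min(msv, n_targets * F2)
    forward = min(viterbi, n_targets * F3)
    return {"MSV": msv, "Viterbi": viterbi, "Forward": forward}
```

       Consider a hypothetical database of 1,000,000 sequences. The defaults
       give roughly 20,000 unrelated targets passing the MSV stage, 1,000
       passing the Viterbi stage and 10 passing the Forward stage.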

       --nobias
              Turn off the bias filter. This increases sensitivity somewhat,
              but can come at a high cost in speed, especially if the query
              has biased residue composition (for example, a repetitive
              sequence region, or the large hydrophobic regions of a membrane
              protein). Without the bias filter, too many sequences may pass
              the filter with biased queries, leading to slower than expected
              performance as the computationally intensive Forward/Backward
              algorithms shoulder an abnormally heavy load.

OTHER OPTIONS
       --nonull2
              Turn off the null2 score corrections for biased composition.

       -Z <x> Assert that the total number of targets in your searches is <x>,
              for the purposes of per-sequence E-value calculations, rather
              than the actual number of targets seen.

       --domZ <x>
              Assert that the total number of targets in your searches is <x>,
              for the purposes of per-domain conditional E-value calculations,
              rather than the number of targets that passed the reporting
              thresholds.
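
       The difference between -Z and --domZ is which search-space size
       multiplies a P-value. A minimal Python sketch of the conventional
       relationship E-value = P-value * Z, assuming hmmsearch applies it in
       this way. The function names are hypothetical, and the P-value
       calculation itself is not shown.

```python
# Hypothetical helpers illustrating E = P * Z; not HMMER's code.


def evalue(pvalue: float, Z: float) -> float:
    """Per-sequence E-value: P-value times the search-space size Z (-Z)."""
    return pvalue * Z


def conditional_evalue(pvalue: float, domZ: float) -> float:
    """Per-domain conditional E-value: the same P-value times the smaller
    search space domZ (--domZ), i.e. the targets that passed reporting."""
    return pvalue * domZ
```

       Take a P-value of 1e-6. Against 500,000 targets it gives an E-value
       of 0.5. If only 20 targets passed reporting, the same P-value gives a
       conditional E-value of about 2e-5. When a database is searched in
       pieces, setting -Z to the same value for every run keeps the E-values
       comparable.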

       --seed <n>
              Set the random number seed to <n>. Some steps in postprocessing
              require Monte Carlo simulation. The default is to use a fixed
              seed (42), so that results are exactly reproducible. Any other
              positive integer will give different (but also reproducible)
              results. A choice of 0 uses a randomly chosen seed.
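
       The --seed rule has three cases: a fixed default, reproducible
       results for any positive value, and a random seed for 0. It is a
       common pattern. Here it is sketched with Python's random module
       purely for illustration. HMMER uses its own random number generator,
       and make_rng is a hypothetical name.

```python
import random


def make_rng(seed: int = 42) -> random.Random:
    """Hypothetical sketch of the --seed policy using Python's random module.

    A positive seed gives a reproducible stream; 0 means pick a seed at
    random, so results differ from run to run.
    """
    if seed < 0:
        raise ValueError("seed must be >= 0")
    if seed == 0:
        return random.Random()        # seeded from OS entropy or the clock
    return random.Random(seed)        # same seed, same sequence every run
```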

       --tformat <s>
              Assert that the target sequence database file is in format <s>.
              Accepted formats include fasta, embl, genbank, ddbj, uniprot,
              stockholm, pfam, a2m, and afa. The default is to autodetect the
              format of the file.

       --cpu <n>
              Set the number of parallel worker threads to <n>. By default,
              HMMER sets this to the number of CPU cores it detects in your
              machine; that is, it tries to maximize the use of your available
              processor cores. Setting <n> higher than the number of available
              cores is of little if any value, but you may want to set it to
              something less. You can also control this number by setting an
              environment variable, HMMER_NCPU.

              This option is only available if HMMER was compiled with POSIX
              threads support. This is the default, but it may have been
              turned off at compile-time for your site or machine for some
              reason.
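
       A minimal Python sketch of choosing a thread count from --cpu,
       HMMER_NCPU, and the detected number of cores. The order of precedence
       (command line, then environment variable, then core count) is an
       assumption based on common command-line practice. This page does not
       state it. resolve_ncpu is a hypothetical name.

```python
import os
from typing import Mapping, Optional


def resolve_ncpu(cli_cpu: Optional[int] = None,
                 environ: Mapping[str, str] = os.environ) -> int:
    """Hypothetical sketch: --cpu, else HMMER_NCPU, else detected cores.

    The precedence order here is an assumption, not documented behaviour.
    """
    if cli_cpu is not None:
        return cli_cpu                 # explicit --cpu <n> wins
    env = environ.get("HMMER_NCPU")
    if env is not None:
        return int(env)                # environment variable next
    return os.cpu_count() or 1         # cpu_count() may return None
```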

       --stall
              For debugging the MPI master/worker version: pause after start,
              to enable the developer to attach debuggers to the running
              master and worker(s) processes. Send a SIGCONT signal to release
              the pause. (Under gdb: (gdb) signal SIGCONT) (Only available if
              optional MPI support was enabled at compile-time.)

       --mpi  Run in MPI master/worker mode, using mpirun. (Only available if
              optional MPI support was enabled at compile-time.)

SEE ALSO
       See hmmer(1) for a master man page with a list of all the individual
       man pages for programs in the HMMER package.

       For complete documentation, see the user guide that came with your
       HMMER distribution (Userguide.pdf); or see the HMMER web page ().
336
337
338
339
341 Copyright (C) 2015 Howard Hughes Medical Institute.
342 Freely distributed under the GNU General Public License (GPLv3).
343
344 For additional information on copyright and licensing, see the file
345 called COPYRIGHT in your HMMER source distribution, or see the HMMER
346 web page ().

AUTHOR
       Eddy/Rivas Laboratory
       Janelia Farm Research Campus
       19700 Helix Drive
       Ashburn VA 20147 USA
       http://eddylab.org

HMMER 3.1b2                      February 2015                    hmmsearch(1)