hmmsearch(1)                     HMMER Manual                     hmmsearch(1)

NAME
       hmmsearch - search profile(s) against a sequence database

SYNOPSIS
       hmmsearch [options] <hmmfile> <seqdb>

DESCRIPTION
       hmmsearch is used to search one or more profiles against a sequence
       database. For each profile in <hmmfile>, use that query profile to
       search the target database of sequences in <seqdb>, and output ranked
       lists of the sequences with the most significant matches to the
       profile.

       The <hmmfile> may contain more than one profile. To build profiles
       from multiple alignments, see hmmbuild.

       The output format is designed to be human-readable, but is often so
       voluminous that reading it is impractical, and parsing it is a pain.
       The --tblout and --domtblout options save output in simple tabular
       formats that are concise and easier to parse. The -o option allows
       redirecting the main output, including throwing it away in /dev/null.

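As a rough illustration of the advice above, the following Python sketch
assembles a command line that discards the main output and keeps only the
tabular files. The helper name build_hmmsearch_argv and all file names are
hypothetical; the sketch only builds the argument list and does not run
hmmsearch.

```python
import shlex


def build_hmmsearch_argv(hmmfile, seqdb, tblout=None, domtblout=None,
                         discard_main_output=True):
    """Assemble an hmmsearch argument list (hypothetical helper).

    Only options described in this manual page are used: -o, --tblout,
    and --domtblout. Nothing is executed here.
    """
    argv = ["hmmsearch"]
    if discard_main_output:
        # Throw away the voluminous human-readable output.
        argv += ["-o", "/dev/null"]
    if tblout is not None:
        argv += ["--tblout", tblout]
    if domtblout is not None:
        argv += ["--domtblout", domtblout]
    # Positional arguments follow the options: <hmmfile>, then <seqdb>.
    argv += [hmmfile, seqdb]
    return argv


# File names below are placeholders, not files shipped with HMMER.
argv = build_hmmsearch_argv("query.hmm", "targets.fasta",
                            tblout="hits.tbl", domtblout="domains.tbl")
print(shlex.join(argv))
```

The resulting list could be handed to subprocess.run() on a machine where
hmmsearch is installed; keeping it as a list avoids shell quoting problems
with unusual file names.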
OPTIONS
       -h     Help; print a brief reminder of command line usage and all
              available options.

OPTIONS FOR CONTROLLING OUTPUT
       -o <f> Direct the main human-readable output to a file <f> instead of
              the default stdout.

       -A <f> Save a multiple alignment of all significant hits (those
              satisfying inclusion thresholds) to the file <f>.

       --tblout <f>
              Save a simple tabular (space-delimited) file summarizing the
              per-target output, with one data line per homologous target
              sequence found.

       --domtblout <f>
              Save a simple tabular (space-delimited) file summarizing the
              per-domain output, with one data line per homologous domain
              detected in a target sequence for each homologous model.

       --acc  Use accessions instead of names in the main output, where
              available for profiles and/or sequences.

       --noali
              Omit the alignment section from the main output. This can
              greatly reduce the output volume.

       --notextw
              Unlimit the length of each line in the main output. The default
              is a limit of 120 characters per line, which helps in displaying
              the output cleanly on terminals and in editors, but can truncate
              target sequence description lines.

       --textw <n>
              Set the main output's line length limit to <n> characters per
              line. The default is 120.

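The --tblout file can be read with a few lines of code. The Python sketch
below is one possible reader, not part of HMMER. It assumes the layout used
by HMMER3 tabular output: comment lines start with "#", and each data line
has 18 whitespace-separated fields followed by a free-text description. The
class and function names are hypothetical, and the sample rows are made up
for illustration.

```python
from typing import Iterator, NamedTuple


class TargetHit(NamedTuple):
    target: str
    query: str
    evalue: float
    score: float
    description: str


def parse_tblout(lines) -> Iterator[TargetHit]:
    """Yield one TargetHit per data line of a --tblout file.

    Assumed field positions (0-based): 0 target name, 2 query name,
    4 full-sequence E-value, 5 full-sequence bit score, 18 description.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        # Split at most 18 times so the description keeps its spaces.
        fields = line.split(None, 18)
        description = fields[18] if len(fields) > 18 else ""
        yield TargetHit(fields[0], fields[2], float(fields[4]),
                        float(fields[5]), description)


# Made-up rows for illustration only; not output from a real search.
sample = """\
# target name  accession  query name  accession  E-value  score  bias ...
seqA  -  fam1  -  1.2e-30  105.3  0.1  2.0e-30  104.6  0.1  1.2  1  0  0  1  1  1  1  example protein A
seqB  -  fam1  -  0.5  12.1  0.0  0.7  11.6  0.0  1.1  1  0  0  1  1  1  0  example protein B
"""
hits = list(parse_tblout(sample.splitlines()))
for hit in hits:
    print(hit.target, hit.evalue, hit.score, hit.description)
```

Splitting with a maximum count, rather than on every run of spaces, is what
keeps multi-word descriptions intact in the last field.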
OPTIONS CONTROLLING REPORTING THRESHOLDS
       Reporting thresholds control which hits are reported in output files
       (the main output, --tblout, and --domtblout). Sequence hits and domain
       hits are ranked by statistical significance (E-value) and output is
       generated in two sections called per-target and per-domain output. In
       per-target output, by default, all sequence hits with an E-value <= 10
       are reported. In the per-domain output, for each target that has
       passed per-target reporting thresholds, all domains satisfying
       per-domain reporting thresholds are reported. By default, these are
       domains with conditional E-values of <= 10. The following options
       allow you to change the default E-value reporting thresholds, or to
       use bit score thresholds instead.

       -E <x> In the per-target output, report target sequences with an
              E-value of <= <x>. The default is 10.0, meaning that on average,
              about 10 false positives will be reported per query, so you can
              see the top of the noise and decide for yourself if it's really
              noise.

       -T <x> Instead of thresholding per-target output on E-value, report
              target sequences with a bit score of >= <x>.

       --domE <x>
              In the per-domain output, for target sequences that have
              already satisfied the per-target reporting threshold, report
              individual domains with a conditional E-value of <= <x>. The
              default is 10.0. A conditional E-value means the expected
              number of additional false positive domains in the smaller
              search space of those comparisons that already satisfied the
              per-target reporting threshold (and thus must have at least one
              homologous domain already).

       --domT <x>
              Instead of thresholding per-domain output on E-value, report
              domains with a bit score of >= <x>.

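The two-level reporting rule described above can be sketched in Python as
follows. This is a simplified model of the decision, not HMMER's code: the
function names and the dictionary layout are hypothetical, and the example
numbers are invented.

```python
def target_is_reported(evalue, score, E=10.0, T=None):
    """Per-target test: use the bit score if T is set, else the E-value."""
    if T is not None:
        return score >= T
    return evalue <= E


def domain_is_reported(c_evalue, score, domE=10.0, domT=None):
    """Per-domain test on the conditional E-value or the bit score."""
    if domT is not None:
        return score >= domT
    return c_evalue <= domE


def reported_domains(target, E=10.0, T=None, domE=10.0, domT=None):
    """Return the domains to report for one target (a plain dict here)."""
    # Domains are only considered once the target itself is reported.
    if not target_is_reported(target["evalue"], target["score"], E, T):
        return []
    return [d for d in target["domains"]
            if domain_is_reported(d["c_evalue"], d["score"], domE, domT)]


# Invented example: one strong domain and one weak one.
hit = {"evalue": 3e-8, "score": 41.2, "domains": [
    {"c_evalue": 1e-9, "score": 38.0},
    {"c_evalue": 25.0, "score": 4.1},
]}
print(len(reported_domains(hit)))            # default E-value thresholds
print(len(reported_domains(hit, domT=3.0)))  # bit score threshold for domains
print(len(reported_domains(hit, T=50.0)))    # target fails its bit score threshold
```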
OPTIONS CONTROLLING INCLUSION THRESHOLDS
       Inclusion thresholds are stricter than reporting thresholds. Inclusion
       thresholds control which hits are considered to be reliable enough to
       be included in an output alignment or a subsequent search round, or
       marked as significant ("!") as opposed to questionable ("?") in domain
       output.

       --incE <x>
              Use an E-value of <= <x> as the per-target inclusion threshold.
              The default is 0.01, meaning that on average, about 1 false
              positive would be expected in every 100 searches with different
              query profiles.

       --incT <x>
              Instead of using E-values for setting the inclusion threshold,
              use a bit score of >= <x> as the per-target inclusion
              threshold. By default this option is unset.

       --incdomE <x>
              Use a conditional E-value of <= <x> as the per-domain inclusion
              threshold, in targets that have already satisfied the overall
              per-target inclusion threshold. The default is 0.01.

       --incdomT <x>
              Instead of using E-values, use a bit score of >= <x> as the
              per-domain inclusion threshold.

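A minimal Python sketch of how the "!" and "?" marks relate to the default
E-value inclusion thresholds is given below. It covers only the E-value
case (--incE and --incdomE), and the function name is hypothetical.

```python
def domain_mark(target_evalue, domain_c_evalue, incE=0.01, incdomE=0.01):
    """Return "!" if the domain satisfies inclusion thresholds, else "?".

    A domain counts as included only when its target passes the
    per-target threshold and the domain passes the per-domain one.
    """
    target_ok = target_evalue <= incE
    domain_ok = domain_c_evalue <= incdomE
    return "!" if (target_ok and domain_ok) else "?"


print(domain_mark(1e-20, 1e-18))  # both thresholds satisfied
print(domain_mark(1e-20, 0.5))    # weak domain in a strong target
print(domain_mark(2.0, 1e-5))     # target itself is below inclusion
```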
OPTIONS FOR MODEL-SPECIFIC SCORE THRESHOLDING
       Curated profile databases may define specific bit score thresholds for
       each profile, superseding any thresholding based on statistical
       significance alone.

       To use these options, the profile must contain the appropriate (GA,
       TC, and/or NC) optional score threshold annotation; this is picked up
       by hmmbuild from Stockholm format alignment files. Each thresholding
       option has two scores: the per-sequence threshold <x1> and the
       per-domain threshold <x2>. These act as if -T <x1> --incT <x1>
       --domT <x2> --incdomT <x2> had been applied specifically using each
       model's curated thresholds.

       --cut_ga
              Use the GA (gathering) bit scores in the model to set
              per-sequence (GA1) and per-domain (GA2) reporting and inclusion
              thresholds. GA thresholds are generally considered to be the
              reliable curated thresholds defining family membership; for
              example, in Pfam, these thresholds define what gets included in
              Pfam Full alignments based on searches with Pfam Seed models.

       --cut_nc
              Use the NC (noise cutoff) bit score thresholds in the model to
              set per-sequence (NC1) and per-domain (NC2) reporting and
              inclusion thresholds. NC thresholds are generally considered to
              be the score of the highest-scoring known false positive.

       --cut_tc
              Use the TC (trusted cutoff) bit score thresholds in the model to
              set per-sequence (TC1) and per-domain (TC2) reporting and
              inclusion thresholds. TC thresholds are generally considered to
              be the score of the lowest-scoring known true positive that is
              above all known false positives.

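The equivalence stated above between a --cut_* option and four explicit
bit score options can be written out as a small Python sketch. The function
names and the cutoff values are hypothetical; a real profile carries its
own GA, TC, and NC annotation.

```python
# Which annotation each option reads, per the descriptions above.
FLAG_TO_ANNOTATION = {"--cut_ga": "GA", "--cut_tc": "TC", "--cut_nc": "NC"}


def equivalent_threshold_options(x1, x2):
    """Explicit options that one --cut_* flag stands in for."""
    return ["-T", str(x1), "--incT", str(x1),
            "--domT", str(x2), "--incdomT", str(x2)]


def expand_cut_flag(flag, annotation):
    """Expand a --cut_* flag using one model's (x1, x2) annotation."""
    key = FLAG_TO_ANNOTATION[flag]
    if key not in annotation:
        # The option is only usable when the model carries the annotation.
        raise ValueError(f"model has no {key} thresholds")
    x1, x2 = annotation[key]
    return equivalent_threshold_options(x1, x2)


# Invented thresholds for a single model: (per-sequence, per-domain).
model_cutoffs = {"GA": (25.0, 20.0), "TC": (26.5, 21.0)}
print(" ".join(expand_cut_flag("--cut_ga", model_cutoffs)))
```

Because the thresholds come from each model, a search with many profiles
applies a different pair of scores per query, which a single -T value on
the command line cannot express.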
OPTIONS CONTROLLING THE ACCELERATION PIPELINE
       HMMER3 searches are accelerated in a three-step filter pipeline: the
       MSV filter, the Viterbi filter, and the Forward filter. The first
       filter is the fastest and most approximate; the last is the full
       Forward scoring algorithm. There is also a bias filter step between
       MSV and Viterbi. Targets that pass all the steps in the acceleration
       pipeline are then subjected to postprocessing -- domain identification
       and scoring using the Forward/Backward algorithm.

       Changing filter thresholds only removes or includes targets from
       consideration; changing filter thresholds does not alter bit scores,
       E-values, or alignments, all of which are determined solely in
       postprocessing.

       --max  Turn off all filters, including the bias filter, and run full
              Forward/Backward postprocessing on every target. This increases
              sensitivity somewhat, at a large cost in speed.

       --F1 <x>
              Set the P-value threshold for the MSV filter step. The default
              is 0.02, meaning that roughly 2% of the highest scoring
              nonhomologous targets are expected to pass the filter.

       --F2 <x>
              Set the P-value threshold for the Viterbi filter step. The
              default is 0.001.

       --F3 <x>
              Set the P-value threshold for the Forward filter step. The
              default is 1e-5.

       --nobias
              Turn off the bias filter. This increases sensitivity somewhat,
              but can come at a high cost in speed, especially if the query
              has biased residue composition (such as a repetitive sequence
              region, or if it is a membrane protein with large regions of
              hydrophobicity). Without the bias filter, too many sequences
              may pass the filter with biased queries, leading to slower than
              expected performance as the computationally intensive
              Forward/Backward algorithms shoulder an abnormally heavy load.

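The filter cascade can be pictured with the short Python sketch below. It
models only the three P-value thresholds (--F1, --F2, --F3) and --max, and
it leaves out the bias filter entirely. The function name is hypothetical
and the P-values are invented.

```python
def first_failed_filter(p_msv, p_vit, p_fwd,
                        F1=0.02, F2=1e-3, F3=1e-5, use_max=False):
    """Return the first filter a target fails, or None if it passes all.

    A target passes a step when its P-value at that step is at or below
    the step's threshold. With use_max (--max), no filter is applied.
    """
    if use_max:
        return None
    stages = [("MSV", p_msv, F1), ("Viterbi", p_vit, F2),
              ("Forward", p_fwd, F3)]
    for name, p_value, threshold in stages:
        if p_value > threshold:
            return name
    return None


print(first_failed_filter(0.5, 1e-9, 1e-9))     # rejected at the first step
print(first_failed_filter(0.01, 1e-4, 1e-3))    # rejected at the last step
print(first_failed_filter(0.01, 1e-4, 1e-6))    # goes on to postprocessing
```

The sketch returns only a pass or fail decision, which mirrors the point
made above: filter thresholds decide which targets reach postprocessing and
do not change the scores computed there.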
OTHER OPTIONS
       --nonull2
              Turn off the null2 score corrections for biased composition.

       -Z <x> Assert that the total number of targets in your searches is
              <x>, for the purposes of per-sequence E-value calculations,
              rather than the actual number of targets seen.

       --domZ <x>
              Assert that the total number of targets in your searches is
              <x>, for the purposes of per-domain conditional E-value
              calculations, rather than the number of targets that passed the
              reporting thresholds.

       --seed <n>
              Set the random number seed to <n>. Some steps in postprocessing
              require Monte Carlo simulation. The default is to use a fixed
              seed (42), so that results are exactly reproducible. Any other
              positive integer will give different (but also reproducible)
              results. A choice of 0 uses a randomly chosen seed.

       --tformat <s>
              Assert that the target sequence database file is in format
              <s>. Accepted formats include fasta, embl, genbank, ddbj,
              uniprot, stockholm, pfam, a2m, and afa. The default is to
              autodetect the format of the file.

       --cpu <n>
              Set the number of parallel worker threads to <n>. By default,
              HMMER sets this to the number of CPU cores it detects in your
              machine - that is, it tries to maximize the use of your
              available processor cores. Setting <n> higher than the number
              of available cores is of little if any value, but you may want
              to set it to something less. You can also control this number
              by setting an environment variable, HMMER_NCPU.

              This option is only available if HMMER was compiled with POSIX
              threads support. This is the default, but it may have been
              turned off at compile-time for your site or machine for some
              reason.

       --stall
              For debugging the MPI master/worker version: pause after start,
              to enable the developer to attach debuggers to the running
              master and worker(s) processes. Send SIGCONT signal to release
              the pause. (Under gdb: (gdb) signal SIGCONT) (Only available if
              optional MPI support was enabled at compile-time.)

       --mpi  Run in MPI master/worker mode, using mpirun. (Only available if
              optional MPI support was enabled at compile-time.)

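The effect of -Z on reported E-values can be estimated with the Python
sketch below. It assumes the usual relation that an E-value is a P-value
multiplied by the number of targets, which this page does not state
explicitly. The function name and the numbers are hypothetical.

```python
import math


def rescale_evalue(evalue, z_seen, z_asserted):
    """Rescale an E-value from one search-space size to another.

    Assumes E-value = P-value * Z, so changing Z scales the E-value
    linearly while the underlying P-value and bit score stay the same.
    """
    if z_seen <= 0 or z_asserted <= 0:
        raise ValueError("Z must be positive")
    return evalue * z_asserted / z_seen


# A hit found in a 10,000-sequence subset, restated for a database of
# one million sequences (both sizes are invented for this example).
e_full = rescale_evalue(0.002, z_seen=10_000, z_asserted=1_000_000)
print(e_full)
print(math.isclose(e_full, 0.2))
```

Under this assumption, asserting a larger Z makes every hit look less
significant by the same factor, which is why comparing E-values across
searches of different sizes benefits from a fixed -Z.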
SEE ALSO
       See hmmer(1) for a master man page with a list of all the individual
       man pages for programs in the HMMER package.

       For complete documentation, see the user guide that came with your
       HMMER distribution (Userguide.pdf); or see the HMMER web page
       (@HMMER_URL@).

COPYRIGHT
       @HMMER_COPYRIGHT@
       @HMMER_LICENSE@

       For additional information on copyright and licensing, see the file
       called COPYRIGHT in your HMMER source distribution, or see the HMMER
       web page (@HMMER_URL@).

AUTHOR
       Eddy/Rivas Laboratory
       Janelia Farm Research Campus
       19700 Helix Drive
       Ashburn VA 20147 USA
       http://eddylab.org

HMMER @HMMER_VERSION@             @HMMER_DATE@                     hmmsearch(1)