hmmsearch(1)                      HMMER Manual                      hmmsearch(1)

NAME
       hmmsearch - search profile(s) against a sequence database

SYNOPSIS
       hmmsearch [options] hmmfile seqdb

DESCRIPTION
       hmmsearch is used to search one or more profiles against a sequence
       database. For each profile in hmmfile, use that query profile to
       search the target database of sequences in seqdb, and output ranked
       lists of the sequences with the most significant matches to the
       profile. To build profiles from multiple alignments, see hmmbuild.

       Either the query hmmfile or the target seqdb may be '-' (a dash
       character), in which case the query profile or target database input
       will be read from a stdin pipe instead of from a file. Only one input
       source can come through stdin, not both. A further restriction is
       that if hmmfile contains more than one profile query, then seqdb
       cannot come from stdin, because the streaming target database cannot
       be rewound to search it with another profile.

       The output format is designed to be human-readable, but is often so
       voluminous that reading it is impractical, and parsing it is a pain.
       The --tblout and --domtblout options save output in simple tabular
       formats that are concise and easier to parse. The -o option allows
       redirecting the main output, including throwing it away in /dev/null.
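
       The stdin rules and the output redirection described above can be
       sketched in Python. This is only an illustration:
       build_hmmsearch_argv, its parameters, and the file names are
       hypothetical and not part of HMMER, and the caller is assumed to know
       how many profiles hmmfile holds. The helper only assembles a command
       line following the synopsis; it does not run hmmsearch.

```python
def build_hmmsearch_argv(hmmfile, seqdb, n_profiles=1, tblout=None,
                         discard_main=False):
    """Assemble 'hmmsearch [options] hmmfile seqdb' as an argv list.

    Hypothetical helper that enforces the stdin rules from the description.
    """
    if hmmfile == "-" and seqdb == "-":
        raise ValueError("only one of hmmfile and seqdb may come from stdin")
    if seqdb == "-" and n_profiles > 1:
        # A streamed target database cannot be rewound for a second profile.
        raise ValueError("seqdb cannot be stdin when hmmfile has several profiles")

    argv = ["hmmsearch"]
    if discard_main:
        argv += ["-o", "/dev/null"]    # throw away the main output
    if tblout is not None:
        argv += ["--tblout", tblout]   # keep the per-target table
    return argv + [hmmfile, seqdb]


# File names are placeholders for illustration.
argv = build_hmmsearch_argv("query.hmm", "-", tblout="hits.tbl",
                            discard_main=True)
print(" ".join(argv))
```

       The resulting list could be handed to a process launcher such as
       Python's subprocess.run, with the sequence data supplied on stdin.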

OPTIONS
       -h     Help; print a brief reminder of command line usage and all
              available options.

OPTIONS FOR CONTROLLING OUTPUT
       -o <f> Direct the main human-readable output to a file <f> instead of
              the default stdout.

       -A <f> Save a multiple alignment of all significant hits (those
              satisfying inclusion thresholds) to the file <f>.

       --tblout <f>
              Save a simple tabular (space-delimited) file summarizing the
              per-target output, with one data line per homologous target
              sequence found.

       --domtblout <f>
              Save a simple tabular (space-delimited) file summarizing the
              per-domain output, with one data line per homologous domain
              detected in a target sequence for each homologous model.

       --acc  Use accessions instead of names in the main output, where
              available for profiles and/or sequences.

       --noali
              Omit the alignment section from the main output. This can
              greatly reduce the output volume.

       --notextw
              Remove the limit on the length of each line in the main output.
              The default is a limit of 120 characters per line, which helps
              in displaying the output cleanly on terminals and in editors,
              but can truncate target sequence description lines.

       --textw <n>
              Set the main output's line length limit to <n> characters per
              line. The default is 120.
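
       A --tblout file can be read with a few lines of Python. The sketch
       below assumes the column layout described in the HMMER user guide
       (18 whitespace-delimited fields followed by a free-text description,
       with comment lines starting with '#'); that layout is not stated on
       this page, so check it against your HMMER version. The function name
       is hypothetical and the sample line is made up for illustration.

```python
def parse_tblout(lines):
    """Yield one dict per data line of a per-target table.

    Assumed layout: 18 whitespace-delimited fields, then a description.
    """
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        # At most 18 splits, so the description keeps its internal spaces.
        fields = line.rstrip("\n").split(None, 18)
        yield {
            "target": fields[0],
            "target_acc": fields[1],
            "query": fields[2],
            "query_acc": fields[3],
            "evalue": float(fields[4]),   # full-sequence E-value
            "score": float(fields[5]),    # full-sequence bit score
            "bias": float(fields[6]),
            "description": fields[18] if len(fields) > 18 else "",
        }


# Made-up example data, not real search output.
sample = [
    "# target name  accession  query name  accession  E-value  score  bias ...",
    "seq1 - myprofile PF00000.0 1.2e-30 105.3 0.1 2.0e-30 104.6 0.1 "
    "1.0 1 0 0 1 1 1 1 hypothetical protein",
]
hits = list(parse_tblout(sample))
print(hits[0]["target"], hits[0]["evalue"], hits[0]["description"])
```

       Splitting on runs of whitespace, rather than on fixed column
       positions, keeps the parser tolerant of long names that widen a
       column.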

OPTIONS CONTROLLING REPORTING THRESHOLDS
       Reporting thresholds control which hits are reported in output files
       (the main output, --tblout, and --domtblout). Sequence hits and domain
       hits are ranked by statistical significance (E-value) and output is
       generated in two sections called per-target and per-domain output. In
       per-target output, by default, all sequence hits with an E-value <= 10
       are reported. In the per-domain output, for each target that has
       passed per-target reporting thresholds, all domains satisfying
       per-domain reporting thresholds are reported. By default, these are
       domains with conditional E-values of <= 10. The following options
       allow you to change the default E-value reporting thresholds, or to
       use bit score thresholds instead.

       -E <x> In the per-target output, report target sequences with an
              E-value of <= <x>. The default is 10.0, meaning that on average,
              about 10 false positives will be reported per query, so you can
              see the top of the noise and decide for yourself if it's really
              noise.

       -T <x> Instead of thresholding per-target output on E-value, report
              target sequences with a bit score of >= <x>.

       --domE <x>
              In the per-domain output, for target sequences that have already
              satisfied the per-target reporting threshold, report individual
              domains with a conditional E-value of <= <x>. The default is
              10.0. A conditional E-value means the expected number of
              additional false positive domains in the smaller search space of
              those comparisons that already satisfied the per-target
              reporting threshold (and thus must have at least one homologous
              domain already).

       --domT <x>
              Instead of thresholding per-domain output on E-value, report
              domains with a bit score of >= <x>.
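
       The two-stage reporting logic can be sketched as follows. This is a
       simplified illustration, not HMMER's implementation: the function
       names and the dictionary layout are hypothetical, the hit values are
       made up, and a bit score threshold, when given, is assumed to replace
       the E-value threshold as the -T and --domT entries describe.

```python
def passes(evalue, score, max_evalue, min_score=None):
    """True if a hit meets a threshold: bit score if given, else E-value."""
    if min_score is not None:
        return score >= min_score
    return evalue <= max_evalue


def reported(targets, E=10.0, T=None, domE=10.0, domT=None):
    """Return (target name, reported domains) pairs.

    Domains are only considered for targets that pass the per-target test.
    """
    out = []
    for t in targets:
        if not passes(t["evalue"], t["score"], E, T):
            continue
        doms = [d for d in t["domains"]
                if passes(d["c_evalue"], d["score"], domE, domT)]
        out.append((t["name"], doms))
    return out


# Made-up hits: seqA is strong, seqB is above the default E-value cutoff.
targets = [
    {"name": "seqA", "evalue": 1e-20, "score": 80.0,
     "domains": [{"c_evalue": 1e-22, "score": 78.0},
                 {"c_evalue": 25.0, "score": 2.0}]},
    {"name": "seqB", "evalue": 40.0, "score": 5.0,
     "domains": [{"c_evalue": 0.5, "score": 5.0}]},
]
for name, doms in reported(targets):
    print(name, len(doms))
```

       With the defaults only seqA is reported, with one of its two domains.
       Passing T=4.0 would report both targets, because the bit score test
       then takes the place of the E-value test.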

OPTIONS CONTROLLING INCLUSION THRESHOLDS
       Inclusion thresholds are stricter than reporting thresholds. Inclusion
       thresholds control which hits are considered to be reliable enough to
       be included in an output alignment or a subsequent search round, or
       marked as significant ("!") as opposed to questionable ("?") in domain
       output.

       --incE <x>
              Use an E-value of <= <x> as the per-target inclusion threshold.
              The default is 0.01, meaning that on average, about 1 false
              positive would be expected in every 100 searches with different
              query profiles.

       --incT <x>
              Instead of using E-values for setting the inclusion threshold,
              use a bit score of >= <x> as the per-target inclusion threshold.
              By default this option is unset.

       --incdomE <x>
              Use a conditional E-value of <= <x> as the per-domain inclusion
              threshold, in targets that have already satisfied the overall
              per-target inclusion threshold. The default is 0.01.

       --incdomT <x>
              Instead of using E-values, use a bit score of >= <x> as the
              per-domain inclusion threshold.
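
       As a rough illustration of how inclusion relates to the "!" and "?"
       marks, the sketch below labels a domain "!" when both the per-target
       and per-domain inclusion thresholds are met and "?" otherwise. The
       function and argument names are hypothetical, only the E-value forms
       (--incE, --incdomE) are modeled, and the domain is assumed to have
       already passed the reporting thresholds.

```python
def domain_mark(target_evalue, domain_c_evalue, incE=0.01, incdomE=0.01):
    """Return '!' for an included (significant) domain, '?' otherwise."""
    target_included = target_evalue <= incE
    domain_included = domain_c_evalue <= incdomE
    return "!" if (target_included and domain_included) else "?"


print(domain_mark(1e-8, 1e-6))   # both thresholds met
print(domain_mark(1e-8, 0.5))    # domain too weak for inclusion
print(domain_mark(2.0, 1e-6))    # target itself is not included
```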

OPTIONS FOR MODEL-SPECIFIC SCORE THRESHOLDING
       Curated profile databases may define specific bit score thresholds for
       each profile, superseding any thresholding based on statistical
       significance alone.

       To use these options, the profile must contain the appropriate (GA,
       TC, and/or NC) optional score threshold annotation; this is picked up
       by hmmbuild from Stockholm format alignment files. Each thresholding
       option has two scores: the per-sequence threshold <x1> and the
       per-domain threshold <x2>. These act as if -T <x1> --incT <x1>
       --domT <x2> --incdomT <x2> had been applied specifically using each
       model's curated thresholds.

       --cut_ga
              Use the GA (gathering) bit scores in the model to set
              per-sequence (GA1) and per-domain (GA2) reporting and inclusion
              thresholds. GA thresholds are generally considered to be the
              reliable curated thresholds defining family membership; for
              example, in Pfam, these thresholds define what gets included in
              Pfam Full alignments based on searches with Pfam Seed models.

       --cut_nc
              Use the NC (noise cutoff) bit score thresholds in the model to
              set per-sequence (NC1) and per-domain (NC2) reporting and
              inclusion thresholds. NC thresholds are generally considered to
              be the score of the highest-scoring known false positive.

       --cut_tc
              Use the TC (trusted cutoff) bit score thresholds in the model to
              set per-sequence (TC1) and per-domain (TC2) reporting and
              inclusion thresholds. TC thresholds are generally considered to
              be the score of the lowest-scoring known true positive that is
              above all known false positives.
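
       The equivalence between a curated cutoff pair and the explicit bit
       score options can be written out directly. In this sketch the
       function name is hypothetical and the two scores are illustrative
       values, not thresholds taken from any real model; it only shows which
       explicit options a pair of curated scores (<x1>, <x2>) corresponds
       to.

```python
def equivalent_options(x1, x2):
    """Explicit options matching a curated (per-sequence, per-domain) pair."""
    return ["-T", str(x1), "--incT", str(x1),
            "--domT", str(x2), "--incdomT", str(x2)]


# Illustrative GA1/GA2 values only.
print(" ".join(equivalent_options(25.0, 22.5)))
```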

OPTIONS CONTROLLING THE ACCELERATION PIPELINE
       HMMER3 searches are accelerated in a three-step filter pipeline: the
       MSV filter, the Viterbi filter, and the Forward filter. The first
       filter is the fastest and most approximate; the last is the full
       Forward scoring algorithm. There is also a bias filter step between
       MSV and Viterbi. Targets that pass all the steps in the acceleration
       pipeline are then subjected to postprocessing -- domain identification
       and scoring using the Forward/Backward algorithm.

       Changing filter thresholds only removes or includes targets from
       consideration; changing filter thresholds does not alter bit scores,
       E-values, or alignments, all of which are determined solely in
       postprocessing.

       --max  Turn off all filters, including the bias filter, and run full
              Forward/Backward postprocessing on every target. This increases
              sensitivity somewhat, at a large cost in speed.

       --F1 <x>
              Set the P-value threshold for the MSV filter step. The default
              is 0.02, meaning that roughly 2% of the highest scoring
              nonhomologous targets are expected to pass the filter.

       --F2 <x>
              Set the P-value threshold for the Viterbi filter step. The
              default is 0.001.

       --F3 <x>
              Set the P-value threshold for the Forward filter step. The
              default is 1e-5.

       --nobias
              Turn off the bias filter. This increases sensitivity somewhat,
              but can come at a high cost in speed, especially if the query
              has biased residue composition (such as a repetitive sequence
              region, or if it is a membrane protein with large regions of
              hydrophobicity). Without the bias filter, too many sequences may
              pass the filter with biased queries, leading to slower than
              expected performance as the computationally intensive
              Forward/Backward algorithms shoulder an abnormally heavy load.
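
       To get a feel for the default filter thresholds, the sketch below
       estimates how many nonhomologous targets would be expected to reach
       each stage. It is a back-of-the-envelope model, not HMMER's code: it
       treats each P-value threshold as the fraction of all nonhomologous
       targets expected to survive up to that filter, ignores the bias
       filter, and uses a made-up database size. The function name is
       hypothetical.

```python
def expected_survivors(n_targets, F1=0.02, F2=0.001, F3=1e-5):
    """Expected nonhomologous targets passing the MSV, Viterbi and Forward
    filters, under the simplifying assumptions stated above."""
    return {
        "MSV": n_targets * F1,
        "Viterbi": n_targets * F2,
        "Forward": n_targets * F3,
    }


# One million targets is an arbitrary illustrative database size.
for stage, n in expected_survivors(1_000_000).items():
    print(f"{stage}: about {n:.0f} targets")
```

       Under these assumptions only a small fraction of nonhomologous
       targets reaches the expensive Forward/Backward postprocessing, which
       is why loosening the thresholds (or using --max) costs speed.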

OTHER OPTIONS
       --nonull2
              Turn off the null2 score corrections for biased composition.

       -Z <x> Assert that the total number of targets in your searches is <x>,
              for the purposes of per-sequence E-value calculations, rather
              than the actual number of targets seen.

       --domZ <x>
              Assert that the total number of targets in your searches is <x>,
              for the purposes of per-domain conditional E-value calculations,
              rather than the number of targets that passed the reporting
              thresholds.

       --seed <n>
              Set the random number seed to <n>. Some steps in postprocessing
              require Monte Carlo simulation. The default is to use a fixed
              seed (42), so that results are exactly reproducible. Any other
              positive integer will give different (but also reproducible)
              results. A choice of 0 uses a randomly chosen seed.

       --tformat <s>
              Assert that the target sequence file seqdb is in format <s>,
              bypassing format autodetection. Common choices for <s> include:
              fasta, embl, genbank. Alignment formats also work; common
              choices include: stockholm, a2m, afa, psiblast, clustal, phylip.
              For more information, and for codes for some less common
              formats, see the main documentation. The string <s> is
              case-insensitive (fasta or FASTA both work).

       --cpu <n>
              Set the number of parallel worker threads to <n>. On multicore
              machines, the default is 2. You can also control this number by
              setting an environment variable, HMMER_NCPU. There is also a
              master thread, so the actual number of threads that HMMER spawns
              is <n>+1.

              This option is not available if HMMER was compiled with POSIX
              threads support turned off.

       --stall
              For debugging the MPI master/worker version: pause after start,
              to enable the developer to attach debuggers to the running
              master and worker processes. Send a SIGCONT signal to release
              the pause. (Under gdb: (gdb) signal SIGCONT) (Only available if
              optional MPI support was enabled at compile-time.)

       --mpi  Run under MPI control with master/worker parallelization (using
              mpirun, for example, or equivalent). Only available if optional
              MPI support was enabled at compile-time.
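
       The effect of -Z on per-sequence E-values can be illustrated with a
       short calculation. The sketch assumes that an E-value is the
       per-comparison P-value multiplied by the number of targets, which is
       how the HMMER user guide describes it but is not stated on this page.
       The function name and the numbers are hypothetical.

```python
def rescale_evalue(evalue, z_old, z_new):
    """Convert an E-value computed for z_old targets to a search space of
    z_new targets, assuming E = P * Z."""
    pvalue = evalue / z_old
    return pvalue * z_new


# A hit with E = 0.001 in a 10,000-sequence search, re-expressed as if
# the search had been run with -Z 1000000.
print(rescale_evalue(1e-3, 10_000, 1_000_000))
```

       Fixing -Z to a constant in this way makes E-values comparable across
       searches of databases of different sizes.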

SEE ALSO
       See hmmer(1) for a master man page with a list of all the individual
       man pages for programs in the HMMER package.

       For complete documentation, see the user guide that came with your
       HMMER distribution (Userguide.pdf); or see the HMMER web page
       (http://hmmer.org/).

COPYRIGHT
       Copyright (C) 2020 Howard Hughes Medical Institute.
       Freely distributed under the BSD open source license.

       For additional information on copyright and licensing, see the file
       called COPYRIGHT in your HMMER source distribution, or see the HMMER
       web page (http://hmmer.org/).

AUTHOR
       http://eddylab.org

HMMER 3.3.2                         Nov 2020                        hmmsearch(1)