phmmer(1) HMMER Manual phmmer(1)

phmmer - search protein sequence(s) against a protein sequence database

phmmer [options] <seqfile> <seqdb>

phmmer is used to search one or more query protein sequences against a
protein sequence database. For each query sequence in <seqfile>, use
that sequence to search the target database of sequences in <seqdb>,
and output ranked lists of the sequences with the most significant
matches to the query.

The output format is designed to be human-readable, but is often so
voluminous that reading it is impractical, and parsing it is a pain.
The --tblout and --domtblout options save output in simple tabular
formats that are concise and easier to parse. The -o option allows
redirecting the main output, including throwing it away in /dev/null.

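As an illustration of the options just described, the following Python
sketch assembles a phmmer command line that discards the main output
and keeps only the tabular files. The helper name build_phmmer_cmd and
the file names are hypothetical; only the flags -o, --tblout, and
--domtblout come from this page.

```python
import shlex


def build_phmmer_cmd(seqfile, seqdb, tblout=None, domtblout=None,
                     discard_main=False):
    """Assemble a phmmer argument list (hypothetical helper)."""
    cmd = ["phmmer"]
    if discard_main:
        # -o redirects the main human-readable output; /dev/null drops it.
        cmd += ["-o", "/dev/null"]
    if tblout is not None:
        cmd += ["--tblout", tblout]        # per-target table
    if domtblout is not None:
        cmd += ["--domtblout", domtblout]  # per-domain table
    # Positional arguments follow the options: <seqfile> <seqdb>
    cmd += [seqfile, seqdb]
    return cmd


# File names below are placeholders, not files shipped with HMMER.
cmd = build_phmmer_cmd("query.fa", "targets.fa",
                       tblout="hits.tbl", discard_main=True)
print(shlex.join(cmd))
```

On a system where phmmer is installed, the resulting list could be
passed to subprocess.run(); the sketch itself only builds the list.
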
-h Help; print a brief reminder of command line usage and all
available options.

-o <f> Direct the main human-readable output to a file <f> instead of
the default stdout.

-A <f> Save a multiple alignment of all significant hits (those
satisfying inclusion thresholds) to the file <f> in Stockholm format.

--tblout <f>
Save a simple tabular (space-delimited) file summarizing the
per-target output, with one data line per homologous target
sequence found.

--domtblout <f>
Save a simple tabular (space-delimited) file summarizing the
per-domain output, with one data line per homologous domain
detected in a target sequence for each query.

--acc Use accessions instead of names in the main output, where
available for profiles and/or sequences.

--noali
Omit the alignment section from the main output. This can
greatly reduce the output volume.

--notextw
Remove the limit on the length of each line in the main output.
The default is a limit of 120 characters per line, which helps in
displaying the output cleanly on terminals and in editors, but can
truncate target description lines.

--textw <n>
Set the main output's line length limit to <n> characters per
line. The default is 120.

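A minimal Python sketch of reading a --tblout file follows. It assumes
that comment lines begin with "#" and that the first seven
whitespace-separated columns are target name, target accession, query
name, query accession, full-sequence E-value, bit score, and bias; that
column layout is not stated on this page and should be checked against
the user guide. The function name parse_tblout is hypothetical.

```python
def parse_tblout(lines):
    """Parse per-target table lines into dicts (assumed column layout)."""
    hits = []
    for line in lines:
        # Skip blank lines and "#" comment/header lines (assumed convention).
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 7:
            continue  # not a data line under the assumed layout
        hits.append({
            "target": fields[0],
            "query": fields[2],
            "evalue": float(fields[4]),
            "score": float(fields[5]),
            "bias": float(fields[6]),
        })
    return hits


# Made-up example lines, not real phmmer output.
sample = [
    "# target name  accession  query name  accession  E-value  score  bias",
    "seqA - query1 - 1.2e-30 105.3 0.1 1.5e-30 105.0 0.1 1.0 1 0 0 1 1 1 1 example",
    "seqB - query1 - 3.4 12.0 0.0 5.0 11.5 0.0 1.2 1 0 0 1 1 0 0 another",
]
hits = parse_tblout(sample)
for hit in hits:
    print(hit["target"], hit["evalue"], hit["score"])
```
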
The probability model in phmmer is constructed by inferring residue
probabilities from a standard 20x20 substitution score matrix, plus two
additional parameters for position-independent gap open and gap extend
probabilities.

--popen <x>
Set the gap open probability for a single sequence query model
to <x>. The default is 0.02. <x> must be >= 0 and < 0.5.

--pextend <x>
Set the gap extend probability for a single sequence query model
to <x>. The default is 0.4. <x> must be >= 0 and < 1.0.

--mxfile <mxfile>
Obtain residue alignment probabilities from the substitution
matrix in file <mxfile>. The default score matrix is BLOSUM62
(this matrix is internal to HMMER and does not have to be
available as a file). The format of a substitution matrix <mxfile>
is the standard format accepted by BLAST, FASTA, and other
sequence analysis software.

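The allowed ranges for --popen and --pextend can be checked before
launching a search, as in the Python sketch below. The function name
check_gap_params is hypothetical. The mean gap length it returns
assumes a plain geometric gap model, where a gap is extended with
probability pextend at each step; that is a simplifying assumption
for illustration, not a statement about HMMER internals.

```python
def check_gap_params(popen=0.02, pextend=0.4):
    """Validate gap probabilities against the documented ranges.

    Returns the mean gap length under an assumed geometric model.
    """
    if not (0.0 <= popen < 0.5):
        raise ValueError("popen must be >= 0 and < 0.5")
    if not (0.0 <= pextend < 1.0):
        raise ValueError("pextend must be >= 0 and < 1.0")
    # Assumption: gap length is geometric, so the mean is 1 / (1 - pextend).
    return 1.0 / (1.0 - pextend)


mean_len = check_gap_params()  # documented defaults: 0.02 and 0.4
print(f"mean gap length under the geometric assumption: {mean_len:.2f}")
```
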
Reporting thresholds control which hits are reported in output files
(the main output, --tblout, and --domtblout). Sequence hits and domain
hits are ranked by statistical significance (E-value) and output is
generated in two sections called per-target and per-domain output. In
per-target output, by default, all sequence hits with an E-value <= 10
are reported. In the per-domain output, for each target that has passed
per-target reporting thresholds, all domains satisfying per-domain
reporting thresholds are reported. By default, these are domains with
conditional E-values of <= 10. The following options allow you to
change the default E-value reporting thresholds, or to use bit score
thresholds instead.

-E <x> In the per-target output, report target sequences with an
E-value of <= <x>. The default is 10.0, meaning that on average,
about 10 false positives will be reported per query, so you can
see the top of the noise and decide for yourself if it's really
noise.

-T <x> Instead of thresholding per-target output on E-value, report
target sequences with a bit score of >= <x>.

--domE <x>
In the per-domain output, for target sequences that have already
satisfied the per-target reporting threshold, report individual
domains with a conditional E-value of <= <x>. The default is
10.0. A conditional E-value means the expected number of
additional false positive domains in the smaller search space of
those comparisons that already satisfied the per-target
reporting threshold (and thus must have at least one homologous
domain already).

--domT <x>
Instead of thresholding per-domain output on E-value, report
domains with a bit score of >= <x>.

Inclusion thresholds are stricter than reporting thresholds. They
control which hits are included in any output multiple alignment (the -A
option) and which domains are marked as significant ("!") as opposed to
questionable ("?") in domain output.

--incE <x>
Use an E-value of <= <x> as the per-target inclusion threshold.
The default is 0.01, meaning that on average, about 1 false
positive would be expected in every 100 searches with different
query sequences.

--incT <x>
Instead of using E-values for setting the inclusion threshold,
use a bit score of >= <x> as the per-target inclusion
threshold. By default this option is unset.

--incdomE <x>
Use a conditional E-value of <= <x> as the per-domain inclusion
threshold, in targets that have already satisfied the overall
per-target inclusion threshold. The default is 0.01.

--incdomT <x>
Instead of using E-values, use a bit score of >= <x> as the
per-domain inclusion threshold. By default this option is unset.

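The relationship between the reporting and inclusion thresholds can be
sketched in Python as below, using the documented E-value defaults
(10.0 for reporting, 0.01 for inclusion). The function names and the
returned labels are hypothetical, and the sketch covers E-value
thresholds only, not the bit score alternatives.

```python
def classify_target(evalue, report_e=10.0, include_e=0.01):
    """Label a per-target hit by the thresholds it satisfies."""
    if evalue <= include_e:
        return "included"       # reported and eligible for the -A alignment
    if evalue <= report_e:
        return "reported only"  # shown in output, but not included
    return "not reported"


def domain_mark(cond_evalue, incdom_e=0.01):
    """Return "!" for a domain meeting the inclusion threshold, else "?"."""
    return "!" if cond_evalue <= incdom_e else "?"


for e in (1e-20, 0.5, 50.0):
    print(e, classify_target(e))
print(domain_mark(1e-5), domain_mark(0.3))
```
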
HMMER3 searches are accelerated in a three-step filter pipeline: the
MSV filter, the Viterbi filter, and the Forward filter. The first
filter is the fastest and most approximate; the last is the full Forward
scoring algorithm, slowest but most accurate. There is also a bias
filter step between MSV and Viterbi. Targets that pass all the steps in
the acceleration pipeline are then subjected to postprocessing: domain
identification and scoring using the Forward/Backward algorithm.

Essentially the only free parameters that control HMMER's heuristic
filters are the P-value thresholds controlling the expected fraction of
nonhomologous sequences that pass the filters. Setting the thresholds
higher than the defaults will pass a higher proportion of nonhomologous
sequences, increasing sensitivity at the expense of speed; conversely,
setting lower P-value thresholds will pass a smaller proportion,
decreasing sensitivity and increasing speed. Setting a filter's P-value
threshold to 1.0 means it will pass all sequences, which effectively
disables the filter.

Changing filter thresholds only removes or includes targets from
consideration; changing filter thresholds does not alter bit scores,
E-values, or alignments, all of which are determined solely in
postprocessing.

--max Maximum sensitivity. Turn off all filters, including the bias
filter, and run full Forward/Backward postprocessing on every
target. This increases sensitivity slightly, at a large cost in
speed.

--F1 <x>
First filter threshold; set the P-value threshold for the MSV
filter step. The default is 0.02, meaning that roughly 2% of
the highest scoring nonhomologous targets are expected to pass
the filter.

--F2 <x>
Second filter threshold; set the P-value threshold for the
Viterbi filter step. The default is 0.001.

--F3 <x>
Third filter threshold; set the P-value threshold for the
Forward filter step. The default is 1e-5.

--nobias
Turn off the bias filter. This increases sensitivity somewhat,
but can come at a high cost in speed, especially if the query
has biased residue composition (such as a repetitive sequence
region, or if it is a membrane protein with large regions of
hydrophobicity). Without the bias filter, too many sequences may
pass the filter with biased queries, leading to slower than
expected performance as the computationally intensive
Forward/Backward algorithms shoulder an abnormally heavy load.

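To give a feel for what the default filter thresholds mean, the Python
sketch below estimates how many nonhomologous targets would be expected
to pass each filter for a database of a given size. It treats each
P-value threshold as the expected passing fraction, as described above,
and ignores the bias filter and any true homologs, so the numbers are
rough estimates only. The function name is hypothetical.

```python
def expected_filter_survivors(n_targets, f1=0.02, f2=0.001, f3=1e-5):
    """Expected nonhomologous targets passing each filter (rough estimate)."""
    return {
        "MSV": n_targets * f1,      # --F1 default 0.02
        "Viterbi": n_targets * f2,  # --F2 default 0.001
        "Forward": n_targets * f3,  # --F3 default 1e-5
    }


# Example with a hypothetical database of one million sequences.
for step, count in expected_filter_survivors(1_000_000).items():
    print(f"{step}: about {count:.0f} nonhomologous targets")
```

With a threshold of 1.0 the estimate equals the database size, which
matches the statement that such a setting disables the filter.
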
Estimating the location parameters for the expected score distributions
for MSV filter scores, Viterbi filter scores, and Forward scores
requires three short random sequence simulations.

--EmL <n>
Sets the sequence length in simulation that estimates the
location parameter mu for MSV filter E-values. Default is 200.

--EmN <n>
Sets the number of sequences in simulation that estimates the
location parameter mu for MSV filter E-values. Default is 200.

--EvL <n>
Sets the sequence length in simulation that estimates the
location parameter mu for Viterbi filter E-values. Default is 200.

--EvN <n>
Sets the number of sequences in simulation that estimates the
location parameter mu for Viterbi filter E-values. Default is
200.

--EfL <n>
Sets the sequence length in simulation that estimates the
location parameter tau for Forward E-values. Default is 100.

--EfN <n>
Sets the number of sequences in simulation that estimates the
location parameter tau for Forward E-values. Default is 200.

--Eft <x>
Sets the tail mass fraction to fit in the simulation that
estimates the location parameter tau for Forward E-values. Default
is 0.04.

--nonull2
Turn off the null2 score corrections for biased composition.

-Z <x> Assert that the total number of targets in your searches is <x>,
for the purposes of per-sequence E-value calculations, rather
than the actual number of targets seen.

--domZ <x>
Assert that the total number of targets in your searches is <x>,
for the purposes of per-domain conditional E-value calculations,
rather than the number of targets that passed the reporting
thresholds.

--seed <n>
Seed the random number generator with <n>, an integer >= 0. If
<n> is >0, any stochastic simulations will be reproducible; the
same command will give the same results. If <n> is 0, the
random number generator is seeded arbitrarily, and stochastic
simulations will vary from run to run of the same command. The
default seed is 42.

--qformat <s>
Declare that the input <seqfile> is in format <s>. Accepted
formats include fasta, embl, genbank, ddbj, uniprot, stockholm,
pfam, a2m, and afa. The default is to autodetect the format of
the file.

--tformat <s>
Declare that the input <seqdb> is in format <s>. Accepted
formats include fasta, embl, genbank, ddbj, uniprot, stockholm,
pfam, a2m, and afa. The default is to autodetect the format of
the file.

--cpu <n>
Set the number of parallel worker threads to <n>. By default,
HMMER sets this to the number of CPU cores it detects in your
machine - that is, it tries to maximize the use of your
available processor cores. Setting <n> higher than the number of
available cores is of little if any value, but you may want to
set it to something less. You can also control this number by
setting an environment variable, HMMER_NCPU.

This option is only available if HMMER was compiled with POSIX
threads support. This is the default, but it may have been
turned off at compile-time for your site or machine for some
reason.

--stall For debugging the MPI master/worker version: pause after
start, to enable the developer to attach debuggers to the
running master and worker(s) processes. Send a SIGCONT signal to
release the pause. (Under gdb: (gdb) signal SIGCONT) (Only
available if optional MPI support was enabled at compile-time.)

--mpi Run in MPI master/worker mode, using mpirun. (Only available if
optional MPI support was enabled at compile-time.)

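The effect of the -Z option described above can be approximated after
the fact. The Python sketch below assumes that a per-sequence E-value
is proportional to the number of targets (E-value = P-value x Z), so an
E-value computed against one database size can be rescaled to another.
The function name and the example numbers are hypothetical.

```python
def rescale_evalue(evalue, z_seen, z_asserted):
    """Rescale an E-value from z_seen targets to z_asserted targets.

    Assumes the E-value is linear in the number of targets.
    """
    if z_seen <= 0 or z_asserted <= 0:
        raise ValueError("target counts must be positive")
    return evalue * (z_asserted / z_seen)


# A hit with E-value 0.001 in a 10,000-sequence search, re-expressed as
# if the search had covered 1,000,000 sequences.
print(rescale_evalue(0.001, 10_000, 1_000_000))
```

Rescaling like this is useful when results from searches against
databases of different sizes need to be compared on a common footing.
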
See hmmer(1) for a master man page with a list of all the individual
man pages for programs in the HMMER package.

For complete documentation, see the user guide that came with your
HMMER distribution (Userguide.pdf); or see the HMMER web page
(@HMMER_URL@).

@HMMER_COPYRIGHT@
@HMMER_LICENSE@

For additional information on copyright and licensing, see the file
called COPYRIGHT in your HMMER source distribution, or see the HMMER
web page (@HMMER_URL@).

Eddy/Rivas Laboratory
Janelia Farm Research Campus
19700 Helix Drive
Ashburn VA 20147 USA
http://eddylab.org

HMMER @HMMER_VERSION@ @HMMER_DATE@ phmmer(1)