phmmer(1)                        HMMER Manual                        phmmer(1)

phmmer - search protein sequence(s) against a protein sequence database

phmmer [options] <seqfile> <seqdb>

phmmer is used to search one or more query protein sequences against a
protein sequence database. For each query sequence in <seqfile>, use
that sequence to search the target database of sequences in <seqdb>,
and output ranked lists of the sequences with the most significant
matches to the query.

Either the query <seqfile> or the target <seqdb> may be '-' (a dash
character), in which case the query sequences or target database input
will be read from a <stdin> pipe instead of from a file. Only one input
source can come through <stdin>, not both. An exception is that if the
<seqfile> contains more than one query sequence, then <seqdb> cannot
come from <stdin>, because we can't rewind the streaming target
database to search it with another query.
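
The stdin rules above can be checked before a search is launched. The
following is a minimal sketch, not part of HMMER: the function name
check_stdin_usage and the n_queries argument are hypothetical, and the
caller is assumed to know how many query sequences <seqfile> holds.

```python
def check_stdin_usage(seqfile: str, seqdb: str, n_queries: int = 1) -> None:
    """Raise ValueError if the '-' (stdin) placeholders are combined illegally.

    Hypothetical helper: it only encodes the two rules stated in the text.
    """
    # Rule 1: only one input source may come through stdin.
    if seqfile == "-" and seqdb == "-":
        raise ValueError("only one of <seqfile> and <seqdb> may be '-'")
    # Rule 2: a streamed target database cannot be rewound for a second query.
    if seqdb == "-" and n_queries > 1:
        raise ValueError(
            "<seqdb> cannot be '-' when <seqfile> holds more than one query"
        )


# Allowed: queries from stdin, database from a file.
check_stdin_usage("-", "targets.fa")
# Allowed: a single query from a file, database from stdin.
check_stdin_usage("query.fa", "-", n_queries=1)
```

Calling check_stdin_usage("query.fa", "-", n_queries=3) raises
ValueError, which mirrors the exception described in the text.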

The output format is designed to be human-readable, but is often so
voluminous that reading it is impractical, and parsing it is a pain.
The --tblout and --domtblout options save output in simple tabular
formats that are concise and easier to parse. The -o option allows
redirecting the main output, including throwing it away in /dev/null.
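
As an illustration of combining these options, the sketch below
assembles an argument list that discards the main output and keeps only
the tabular files. The helper name build_phmmer_command and the file
names (query.fa, targets.fa, hits.tbl) are hypothetical; only the flags
-o, --tblout and --domtblout come from this page. Nothing is executed.

```python
import shlex


def build_phmmer_command(seqfile, seqdb, tblout=None, domtblout=None,
                         discard_main_output=False):
    """Return an argv list for phmmer; the command is not run here."""
    cmd = ["phmmer"]
    if discard_main_output:
        cmd += ["-o", "/dev/null"]          # throw away the human-readable output
    if tblout is not None:
        cmd += ["--tblout", tblout]         # per-target table
    if domtblout is not None:
        cmd += ["--domtblout", domtblout]   # per-domain table
    cmd += [seqfile, seqdb]                 # positional arguments come last
    return cmd


cmd = build_phmmer_command("query.fa", "targets.fa",
                           tblout="hits.tbl", discard_main_output=True)
print(shlex.join(cmd))
# prints: phmmer -o /dev/null --tblout hits.tbl query.fa targets.fa
```

The resulting list can be handed to subprocess.run on a machine where
phmmer is installed.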

-h     Help; print a brief reminder of command line usage and all
       available options.

-o <f> Direct the main human-readable output to a file <f> instead of
       the default stdout.

-A <f> Save a multiple alignment of all significant hits (those
       satisfying inclusion thresholds) to the file <f> in Stockholm
       format.

--tblout <f>
       Save a simple tabular (space-delimited) file summarizing the
       per-target output, with one data line per homologous target
       sequence found.

--domtblout <f>
       Save a simple tabular (space-delimited) file summarizing the
       per-domain output, with one data line per homologous domain
       detected in a target sequence for each query.

--acc  Use accessions instead of names in the main output, where
       available for profiles and/or sequences.

--noali
       Omit the alignment section from the main output. This can
       greatly reduce the output volume.

--notextw
       Remove the limit on the length of each line in the main output.
       The default is a limit of 120 characters per line, which helps
       in displaying the output cleanly on terminals and in editors,
       but can truncate target sequence description lines.

--textw <n>
       Set the main output's line length limit to <n> characters per
       line. The default is 120.
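
A file written by --tblout can be read with a few lines of code. The
sketch below is an assumption-laden example rather than an official
parser: it assumes that comment lines start with '#', and that the
first six whitespace-separated columns are target name, target
accession, query name, query accession, full-sequence E-value and
full-sequence bit score, as laid out in the HMMER user guide. The
function name parse_tblout and the sample line are made up.

```python
def parse_tblout(lines):
    """Yield (target, query, evalue, score) tuples from --tblout lines.

    Assumed column layout (see lead-in): 0=target name, 2=query name,
    4=full-sequence E-value, 5=full-sequence bit score.
    """
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue                      # skip blank and comment lines
        # The free-text description is the last field and may contain
        # spaces, so split at most 18 times.
        fields = line.split(maxsplit=18)
        yield fields[0], fields[2], float(fields[4]), float(fields[5])


# Synthetic example line, not real phmmer output.
sample = [
    "# target name  accession  query name  accession  E-value  score ...",
    "targetA - query1 - 1.2e-30 105.3 0.1 1.5e-30 105.0 0.1 "
    "1.0 1 0 0 1 1 1 1 hypothetical protein A",
]
hits = list(parse_tblout(sample))
```

With a real file, pass an open file handle instead of the list, for
example parse_tblout(open("hits.tbl")).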

The probability model in phmmer is constructed by inferring residue
probabilities from a standard 20x20 substitution score matrix, plus two
additional parameters for position-independent gap open and gap extend
probabilities.

--popen <x>
       Set the gap open probability for a single sequence query model
       to <x>. The default is 0.02. <x> must be >= 0 and < 0.5.

--pextend <x>
       Set the gap extend probability for a single sequence query model
       to <x>. The default is 0.4. <x> must be >= 0 and < 1.0.

--mx <s>
       Obtain residue alignment probabilities from the built-in
       substitution matrix named <s>. Several standard matrices are
       built-in, and do not need to be read from files. The matrix name
       <s> can be PAM30, PAM70, PAM120, PAM240, BLOSUM45, BLOSUM50,
       BLOSUM62, BLOSUM80, or BLOSUM90. Only one of the --mx and
       --mxfile options may be used.

--mxfile <mxfile>
       Obtain residue alignment probabilities from the substitution
       matrix in file <mxfile>. The default score matrix is BLOSUM62
       (this matrix is internal to HMMER and does not have to be
       available as a file). The format of a substitution matrix
       <mxfile> is the standard format accepted by BLAST, FASTA, and
       other sequence analysis software. Only one of the --mx and
       --mxfile options may be used.
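
The constraints on the scoring options above (the allowed ranges for
--popen and --pextend, the list of built-in matrix names, and the
mutual exclusion of --mx and --mxfile) can be sketched as a validation
step. The helper name scoring_args is hypothetical; the numeric limits
and matrix names are taken from the option descriptions.

```python
BUILTIN_MATRICES = {
    "PAM30", "PAM70", "PAM120", "PAM240",
    "BLOSUM45", "BLOSUM50", "BLOSUM62", "BLOSUM80", "BLOSUM90",
}


def scoring_args(popen=0.02, pextend=0.4, mx=None, mxfile=None):
    """Validate scoring options and return them as a list of arguments."""
    if not 0 <= popen < 0.5:
        raise ValueError("--popen must be >= 0 and < 0.5")
    if not 0 <= pextend < 1.0:
        raise ValueError("--pextend must be >= 0 and < 1.0")
    if mx is not None and mxfile is not None:
        raise ValueError("only one of --mx and --mxfile may be used")
    if mx is not None and mx not in BUILTIN_MATRICES:
        raise ValueError("unknown built-in matrix: %s" % mx)

    args = ["--popen", str(popen), "--pextend", str(pextend)]
    if mx is not None:
        args += ["--mx", mx]
    if mxfile is not None:
        args += ["--mxfile", mxfile]
    return args


args = scoring_args(popen=0.01, mx="BLOSUM45")
```

Passing both mx and mxfile, or a gap probability outside its range,
raises ValueError before any search is attempted.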

Reporting thresholds control which hits are reported in output files
(the main output, --tblout, and --domtblout). Sequence hits and domain
hits are ranked by statistical significance (E-value) and output is
generated in two sections called per-target and per-domain output. In
per-target output, by default, all sequence hits with an E-value <= 10
are reported. In the per-domain output, for each target that has passed
per-target reporting thresholds, all domains satisfying per-domain
reporting thresholds are reported. By default, these are domains with
conditional E-values of <= 10. The following options allow you to
change the default E-value reporting thresholds, or to use bit score
thresholds instead.

-E <x> In the per-target output, report target sequences with an
       E-value of <= <x>. The default is 10.0, meaning that on average,
       about 10 false positives will be reported per query, so you can
       see the top of the noise and decide for yourself if it's really
       noise.

-T <x> Instead of thresholding per-target output on E-value, report
       target sequences with a bit score of >= <x>.

--domE <x>
       In the per-domain output, for target sequences that have already
       satisfied the per-target reporting threshold, report individual
       domains with a conditional E-value of <= <x>. The default is
       10.0. A conditional E-value means the expected number of
       additional false positive domains in the smaller search space of
       those comparisons that already satisfied the per-target
       reporting threshold (and thus must have at least one homologous
       domain already).

--domT <x>
       Instead of thresholding per-domain output on E-value, report
       domains with a bit score of >= <x>.
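
The two-stage logic of the reporting thresholds (targets first, then
domains only within targets that passed) can be sketched as follows.
This illustrates the rule as described above and is not HMMER's
implementation; the function name, the dictionary keys and the example
values are hypothetical.

```python
def apply_reporting_thresholds(hits, E=10.0, domE=10.0):
    """Return the hits that would be reported, ranked by E-value.

    hits: list of dicts with keys 'target', 'evalue' and 'domains',
    where 'domains' is a list of dicts with key 'c_evalue'
    (the conditional E-value). All names are illustrative.
    """
    reported = []
    for hit in sorted(hits, key=lambda h: h["evalue"]):
        if hit["evalue"] > E:
            continue                      # target fails -E: no domains reported
        domains = [d for d in hit["domains"] if d["c_evalue"] <= domE]
        reported.append({"target": hit["target"],
                         "evalue": hit["evalue"],
                         "domains": domains})
    return reported


example_hits = [
    {"target": "t2", "evalue": 3.0,
     "domains": [{"c_evalue": 0.5}, {"c_evalue": 40.0}]},
    {"target": "t1", "evalue": 1e-20, "domains": [{"c_evalue": 1e-22}]},
    {"target": "t3", "evalue": 25.0, "domains": [{"c_evalue": 0.1}]},
]
reported = apply_reporting_thresholds(example_hits)
```

In this made-up example, t3 is dropped entirely even though one of its
domains has a small conditional E-value, because the target itself did
not pass the per-target threshold.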

Inclusion thresholds are stricter than reporting thresholds. They
control which hits are included in any output multiple alignment (the
-A option) and which domains are marked as significant ("!") as opposed
to questionable ("?") in domain output.

--incE <x>
       Use an E-value of <= <x> as the per-target inclusion threshold.
       The default is 0.01, meaning that on average, about 1 false
       positive would be expected in every 100 searches with different
       query sequences.

--incT <x>
       Instead of using E-values for setting the inclusion threshold,
       use a bit score of >= <x> as the per-target inclusion threshold.
       By default this option is unset.

--incdomE <x>
       Use a conditional E-value of <= <x> as the per-domain inclusion
       threshold, in targets that have already satisfied the overall
       per-target inclusion threshold. The default is 0.01.

--incdomT <x>
       Instead of using E-values, use a bit score of >= <x> as the
       per-domain inclusion threshold. By default this option is unset.
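
The "!" versus "?" marking can be illustrated with a small function.
This sketch assumes E-value thresholds are in use (not --incT or
--incdomT) and that a domain is marked "!" only when both its target
and the domain itself meet the inclusion thresholds, which is how the
--incdomE description reads. The function name inclusion_mark is
hypothetical.

```python
def inclusion_mark(target_evalue, domain_c_evalue, incE=0.01, incdomE=0.01):
    """Return '!' for a significant (included) domain, '?' otherwise."""
    target_included = target_evalue <= incE          # per-target test
    domain_included = domain_c_evalue <= incdomE     # per-domain test
    return "!" if (target_included and domain_included) else "?"


marks = [
    inclusion_mark(1e-12, 1e-10),   # strong target, strong domain
    inclusion_mark(1e-12, 0.5),     # strong target, weak domain
    inclusion_mark(0.5, 1e-10),     # target itself not included
]
```

Domains marked "?" are still reported if they pass the reporting
thresholds; they are only excluded from the -A alignment.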

HMMER3 searches are accelerated in a three-step filter pipeline: the
MSV filter, the Viterbi filter, and the Forward filter. The first
filter is the fastest and most approximate; the last is the full
Forward scoring algorithm, slowest but most accurate. There is also a
bias filter step between MSV and Viterbi. Targets that pass all the
steps in the acceleration pipeline are then subjected to
postprocessing: domain identification and scoring using the
Forward/Backward algorithm.

Essentially the only free parameters that control HMMER's heuristic
filters are the P-value thresholds controlling the expected fraction of
nonhomologous sequences that pass the filters. Setting the thresholds
higher than the defaults will pass a higher proportion of nonhomologous
sequences, increasing sensitivity at the expense of speed; conversely,
setting lower P-value thresholds will pass a smaller proportion,
decreasing sensitivity and increasing speed. Setting a filter's P-value
threshold to 1.0 means it will pass all sequences, which effectively
disables the filter.

Changing filter thresholds only removes or includes targets from
consideration; it does not alter bit scores, E-values, or alignments,
all of which are determined solely in postprocessing.
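
To get a feel for what the P-value thresholds mean in practice, the
sketch below estimates how many nonhomologous targets would be expected
to survive through each stage. It is back-of-the-envelope arithmetic
under a simplifying assumption: each threshold is treated as the
approximate fraction of all nonhomologous targets that gets past that
stage. The defaults are the ones documented for --F1, --F2 and --F3;
the function name expected_survivors is hypothetical.

```python
def expected_survivors(n_nonhomologous, F1=0.02, F2=0.001, F3=1e-5):
    """Expected number of nonhomologous targets passing each filter stage."""
    thresholds = {"MSV": F1, "Viterbi": F2, "Forward": F3}
    for name, p in thresholds.items():
        if not 0.0 <= p <= 1.0:
            raise ValueError("%s threshold must be a probability" % name)
    # Expected count = number of targets times the pass fraction.
    return {name: n_nonhomologous * p for name, p in thresholds.items()}


# One million unrelated targets with the default thresholds.
counts = expected_survivors(1_000_000)
for stage, n in counts.items():
    print("%-8s ~%.0f targets" % (stage, n))
```

Raising a threshold to 1.0 makes the expected count equal to the number
of targets, which matches the statement that such a filter is
effectively disabled.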

--max  Maximum sensitivity. Turn off all filters, including the bias
       filter, and run full Forward/Backward postprocessing on every
       target. This increases sensitivity slightly, at a large cost in
       speed.

--F1 <x>
       First filter threshold; set the P-value threshold for the MSV
       filter step. The default is 0.02, meaning that roughly 2% of
       the highest scoring nonhomologous targets are expected to pass
       the filter.

--F2 <x>
       Second filter threshold; set the P-value threshold for the
       Viterbi filter step. The default is 0.001.

--F3 <x>
       Third filter threshold; set the P-value threshold for the
       Forward filter step. The default is 1e-5.

--nobias
       Turn off the bias filter. This increases sensitivity somewhat,
       but can come at a high cost in speed, especially if the query
       has biased residue composition (such as a repetitive sequence
       region, or if it is a membrane protein with large regions of
       hydrophobicity). Without the bias filter, too many sequences may
       pass the filter with biased queries, leading to slower than
       expected performance as the computationally intensive
       Forward/Backward algorithms shoulder an abnormally heavy load.

Estimating the location parameters for the expected score distributions
for MSV filter scores, Viterbi filter scores, and Forward scores
requires three short random sequence simulations.

--EmL <n>
       Sets the sequence length in simulation that estimates the
       location parameter mu for MSV filter E-values. Default is 200.

--EmN <n>
       Sets the number of sequences in simulation that estimates the
       location parameter mu for MSV filter E-values. Default is 200.

--EvL <n>
       Sets the sequence length in simulation that estimates the
       location parameter mu for Viterbi filter E-values. Default is
       200.

--EvN <n>
       Sets the number of sequences in simulation that estimates the
       location parameter mu for Viterbi filter E-values. Default is
       200.

--EfL <n>
       Sets the sequence length in simulation that estimates the
       location parameter tau for Forward E-values. Default is 100.

--EfN <n>
       Sets the number of sequences in simulation that estimates the
       location parameter tau for Forward E-values. Default is 200.

--Eft <x>
       Sets the tail mass fraction to fit in the simulation that
       estimates the location parameter tau for Forward E-values.
       Default is 0.04.

--nonull2
       Turn off the null2 score corrections for biased composition.

-Z <x> Assert that the total number of targets in your searches is <x>,
       for the purposes of per-sequence E-value calculations, rather
       than the actual number of targets seen.

--domZ <x>
       Assert that the total number of targets in your searches is <x>,
       for the purposes of per-domain conditional E-value calculations,
       rather than the number of targets that passed the reporting
       thresholds.

--seed <n>
       Seed the random number generator with <n>, an integer >= 0. If
       <n> is >0, any stochastic simulations will be reproducible; the
       same command will give the same results. If <n> is 0, the random
       number generator is seeded arbitrarily, and stochastic
       simulations will vary from run to run of the same command. The
       default seed is 42.

--qformat <s>
       Declare that the input <seqfile> is in format <s>. Accepted
       formats include fasta, embl, genbank, ddbj, uniprot, stockholm,
       pfam, a2m, and afa. The default is to autodetect the format of
       the file.

--tformat <s>
       Declare that the input <seqdb> is in format <s>. Accepted
       formats include fasta, embl, genbank, ddbj, uniprot, stockholm,
       pfam, a2m, and afa. The default is to autodetect the format of
       the file.

--cpu <n>
       Set the number of parallel worker threads to <n>. By default,
       HMMER sets this to the number of CPU cores it detects in your
       machine - that is, it tries to maximize the use of your
       available processor cores. Setting <n> higher than the number of
       available cores is of little if any value, but you may want to
       set it to something less. You can also control this number by
       setting an environment variable, HMMER_NCPU.

       This option is only available if HMMER was compiled with POSIX
       threads support. This is the default, but it may have been
       turned off at compile-time for your site or machine for some
       reason.

--stall
       For debugging the MPI master/worker version: pause after start,
       to enable the developer to attach debuggers to the running
       master and worker(s) processes. Send SIGCONT signal to release
       the pause. (Under gdb: (gdb) signal SIGCONT) (Only available if
       optional MPI support was enabled at compile-time.)

--mpi  Run in MPI master/worker mode, using mpirun. (Only available if
       optional MPI support was enabled at compile-time.)
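
The effect of -Z and --domZ on reported E-values can be approximated by
simple rescaling. The sketch below assumes that an E-value is the
P-value of a hit multiplied by the search space size, so that it scales
linearly with the number of targets; this is a stated assumption of the
example, and the function name rescale_evalue is hypothetical.

```python
def rescale_evalue(evalue, z_seen, z_asserted):
    """Rescale an E-value from the observed search space to an asserted one.

    Assumes E = P * Z, so changing Z multiplies E by z_asserted / z_seen.
    """
    if z_seen <= 0 or z_asserted <= 0:
        raise ValueError("search space sizes must be positive")
    return evalue * z_asserted / z_seen


# A hit with E-value 0.5 against 1000 targets, re-expressed as if the
# search had been run against 2000 targets.
rescaled = rescale_evalue(0.5, z_seen=1000, z_asserted=2000)
print(rescaled)
```

This kind of rescaling is useful for comparing results from searches
against databases of different sizes, which is the situation -Z is
meant to address.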

See hmmer(1) for a master man page with a list of all the individual
man pages for programs in the HMMER package.

For complete documentation, see the user guide that came with your
HMMER distribution (Userguide.pdf); or see the HMMER web page.

Copyright (C) 2015 Howard Hughes Medical Institute.
Freely distributed under the GNU General Public License (GPLv3).

For additional information on copyright and licensing, see the file
called COPYRIGHT in your HMMER source distribution, or see the HMMER
web page.

Eddy/Rivas Laboratory
Janelia Farm Research Campus
19700 Helix Drive
Ashburn VA 20147 USA
http://eddylab.org

HMMER 3.1b2                       February 2015                      phmmer(1)