phmmer(1) HMMER Manual phmmer(1)

phmmer - search protein sequence(s) against a protein sequence database

phmmer [options] seqfile seqdb

phmmer is used to search one or more query protein sequences against a
protein sequence database. For each query sequence in seqfile, phmmer
uses that sequence to search the target database of sequences in seqdb,
and outputs a ranked list of the sequences with the most significant
matches to the query.

Either the query seqfile or the target seqdb may be '-' (a dash
character), in which case the query sequences or target database input
will be read from a <stdin> pipe instead of from a file. Only one input
source can come through <stdin>, not both. In addition, if seqfile
contains more than one query sequence, then seqdb cannot come from
<stdin>, because we can't rewind the streaming target database to
search it with another query.

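The <stdin> rules above can be expressed as a small pre-flight check
for scripts that build phmmer command lines. This is a minimal Python
sketch, not part of HMMER: the function name check_stdin_sources() is
hypothetical, and it assumes the caller already knows how many query
sequences seqfile holds.

```python
def check_stdin_sources(seqfile: str, seqdb: str, n_queries: int) -> None:
    """Raise ValueError if a planned phmmer invocation breaks the <stdin> rules.

    Hypothetical helper; n_queries is the number of query sequences the
    caller expects in seqfile (phmmer does not report this up front).
    """
    if seqfile == "-" and seqdb == "-":
        raise ValueError("only one of seqfile and seqdb may be read from <stdin>")
    if seqdb == "-" and n_queries > 1:
        # A streamed target database cannot be rewound for the next query.
        raise ValueError("seqdb cannot come from <stdin> when seqfile has more than one query")
```

For example, check_stdin_sources("queries.fa", "-", 3) raises
ValueError, whereas check_stdin_sources("-", "targets.fa", 3) is
accepted, because a multi-query file may itself be piped in (the file
names are placeholders).
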
The output format is designed to be human-readable, but is often so
voluminous that reading it is impractical, and parsing it is a pain. The
--tblout and --domtblout options save output in simple tabular formats
that are concise and easier to parse. The -o option allows redirecting
the main output, including throwing it away in /dev/null.

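When phmmer is driven from a script, a common pattern is to discard the
main output with -o /dev/null and keep only the tabular files. The
Python sketch below only assembles such an argument list; the helper
name phmmer_argv() and the file names are hypothetical, and running the
list assumes a phmmer binary on your PATH.

```python
from typing import List, Optional

def phmmer_argv(seqfile: str, seqdb: str,
                tblout: Optional[str] = None,
                domtblout: Optional[str] = None,
                main_out: Optional[str] = "/dev/null") -> List[str]:
    """Build a phmmer command line (hypothetical helper; nothing is executed)."""
    argv = ["phmmer"]
    if main_out is not None:
        argv += ["-o", main_out]            # redirect or discard the human-readable output
    if tblout is not None:
        argv += ["--tblout", tblout]        # one line per target sequence
    if domtblout is not None:
        argv += ["--domtblout", domtblout]  # one line per domain
    argv += [seqfile, seqdb]                # options first, as in the SYNOPSIS
    return argv
```

The resulting list can be passed to subprocess.run(argv, check=True)
from the Python standard library.
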
-h     Help; print a brief reminder of command line usage and all
       available options.

-o <f> Direct the main human-readable output to a file <f> instead of
       the default stdout.

-A <f> Save a multiple alignment of all significant hits (those
       satisfying inclusion thresholds) to the file <f> in Stockholm
       format.

--tblout <f>
       Save a simple tabular (space-delimited) file summarizing the
       per-target output, with one data line per homologous target
       sequence found.

--domtblout <f>
       Save a simple tabular (space-delimited) file summarizing the
       per-domain output, with one data line per homologous domain
       detected in a target sequence for each query sequence.

--acc  Use accessions instead of names in the main output, where
       available for profiles and/or sequences.

--noali
       Omit the alignment section from the main output. This can
       greatly reduce the output volume.

--notextw
       Remove the limit on the length of each line in the main output.
       The default is a limit of 120 characters per line, which helps
       in displaying the output cleanly on terminals and in editors,
       but can truncate target sequence description lines.

--textw <n>
       Set the main output's line length limit to <n> characters per
       line. The default is 120.

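The --tblout file is meant to be read by scripts. The Python sketch
below assumes the HMMER3 column layout documented in the user guide
rather than on this page: lines starting with '#' are comments, and each
data line has 18 space-delimited fields (target name, target accession,
query name, query accession, full-sequence E-value, score and bias,
best-domain E-value, score and bias, then eight domain-count columns)
followed by a free-text target description that may contain spaces.
The TblHit class and parse_tblout() function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator

@dataclass
class TblHit:
    target: str
    query: str
    evalue: float      # full-sequence E-value
    score: float       # full-sequence bit score
    description: str

def parse_tblout(lines: Iterable[str]) -> Iterator[TblHit]:
    """Yield one TblHit per data line of a --tblout file (assumed layout)."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and '#' comment/header lines
        # Split off the 18 fixed fields; whatever remains is the description.
        fields = line.rstrip("\n").split(maxsplit=18)
        if len(fields) < 18:
            raise ValueError(f"unexpected --tblout line: {line!r}")
        yield TblHit(target=fields[0], query=fields[2],
                     evalue=float(fields[4]), score=float(fields[5]),
                     description=fields[18] if len(fields) > 18 else "")
```

Because parse_tblout() accepts any iterable of lines, an open file
handle can be passed directly and the hits filtered on evalue or score.
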
The probability model in phmmer is constructed by inferring residue
probabilities from a standard 20x20 substitution score matrix, plus two
additional parameters for position-independent gap open and gap extend
probabilities.

--popen <x>
       Set the gap open probability for a single sequence query model
       to <x>. The default is 0.02. <x> must be >= 0 and < 0.5.

--pextend <x>
       Set the gap extend probability for a single sequence query model
       to <x>. The default is 0.4. <x> must be >= 0 and < 1.0.

--mx <s>
       Obtain residue alignment probabilities from the built-in
       substitution matrix named <s>. Several standard matrices are
       built in and do not need to be read from files. The matrix name
       <s> can be PAM30, PAM70, PAM120, PAM240, BLOSUM45, BLOSUM50,
       BLOSUM62, BLOSUM80, or BLOSUM90. Only one of the --mx and
       --mxfile options may be used.

--mxfile mxfile
       Obtain residue alignment probabilities from the substitution
       matrix in file mxfile. The default score matrix is BLOSUM62
       (this matrix is internal to HMMER and does not have to be
       available as a file). The format of a substitution matrix mxfile
       is the standard format accepted by BLAST, FASTA, and other
       sequence analysis software. See ftp.ncbi.nlm.nih.gov/blast/matrices/
       for example files. (The only exception: we require matrices to
       be square, so for DNA, use files like NCBI's NUC.4.4, not
       NUC.4.2.)

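One way to see why --popen must stay below 0.5 is to write out the
transition probabilities of a single query position. The Python sketch
below assumes (this page does not say so) that the gap open probability
is used for both the match-to-insert and match-to-delete transitions,
and the gap extend probability for both the insert and delete
self-transitions. Under that assumption the match-to-match probability
is 1 - 2*popen, which stays positive only if popen < 0.5, and gap
lengths are geometric with mean 1/(1 - pextend), about 1.67 residues at
the default of 0.4. The function names are hypothetical.

```python
def single_seq_transitions(popen: float = 0.02, pextend: float = 0.4) -> dict:
    """Transition probabilities for one query position (assumed model, see text)."""
    if not 0.0 <= popen < 0.5:
        raise ValueError("popen must be >= 0 and < 0.5")
    if not 0.0 <= pextend < 1.0:
        raise ValueError("pextend must be >= 0 and < 1.0")
    return {
        "MM": 1.0 - 2.0 * popen,             # what is left after opening an insert or a delete
        "MI": popen, "MD": popen,            # gap open
        "II": pextend, "IM": 1.0 - pextend,  # insert: extend or close
        "DD": pextend, "DM": 1.0 - pextend,  # delete: extend or close
    }

def mean_gap_length(pextend: float) -> float:
    """Mean of the geometric length P(k) = pextend**(k-1) * (1 - pextend), k >= 1."""
    return 1.0 / (1.0 - pextend)
```
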
Reporting thresholds control which hits are reported in output files
(the main output, --tblout, and --domtblout). Sequence hits and domain
hits are ranked by statistical significance (E-value), and output is
generated in two sections called per-target and per-domain output. In
per-target output, by default, all sequence hits with an E-value <= 10
are reported. In the per-domain output, for each target that has passed
per-target reporting thresholds, all domains satisfying per-domain
reporting thresholds are reported. By default, these are domains with
conditional E-values of <= 10. The following options allow you to
change the default E-value reporting thresholds, or to use bit score
thresholds instead.

-E <x> In the per-target output, report target sequences with an
       E-value of <= <x>. The default is 10.0, meaning that on average,
       about 10 false positives will be reported per query, so you can
       see the top of the noise and decide for yourself if it's really
       noise.

-T <x> Instead of thresholding per-target output on E-value, report
       target sequences with a bit score of >= <x>.

--domE <x>
       In the per-domain output, for target sequences that have already
       satisfied the per-target reporting threshold, report individual
       domains with a conditional E-value of <= <x>. The default is
       10.0. A conditional E-value means the expected number of
       additional false positive domains in the smaller search space of
       those comparisons that already satisfied the per-target
       reporting threshold (and thus must have at least one homologous
       domain already).

--domT <x>
       Instead of thresholding per-domain output on E-value, report
       domains with a bit score of >= <x>.

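The two-stage reporting logic described above can be sketched in
Python. The Target and Domain classes and the report() function are
hypothetical; they assume that E-values and conditional E-values have
already been computed, and that a bit score threshold (-T or --domT),
when set, replaces the corresponding E-value threshold, as the option
descriptions state.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class Domain:
    cond_evalue: float  # conditional E-value
    score: float        # bit score

@dataclass
class Target:
    name: str
    evalue: float
    score: float
    domains: List[Domain] = field(default_factory=list)

def report(targets: List[Target], E: float = 10.0, T: Optional[float] = None,
           domE: float = 10.0, domT: Optional[float] = None
           ) -> List[Tuple[Target, List[Domain]]]:
    """Apply per-target, then per-domain, reporting thresholds."""
    out = []
    for t in sorted(targets, key=lambda x: x.evalue):  # rank by E-value
        # A bit score threshold, if given, replaces the E-value threshold.
        if not (t.score >= T if T is not None else t.evalue <= E):
            continue
        kept = [d for d in t.domains
                if (d.score >= domT if domT is not None else d.cond_evalue <= domE)]
        out.append((t, kept))
    return out
```
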
Inclusion thresholds are stricter than reporting thresholds. They
control which hits are included in any output multiple alignment (the
-A option) and which domains are marked as significant ("!") as opposed
to questionable ("?") in domain output.

--incE <x>
       Use an E-value of <= <x> as the per-target inclusion threshold.
       The default is 0.01, meaning that on average, about 1 false
       positive would be expected in every 100 searches with different
       query sequences.

--incT <x>
       Instead of using E-values for setting the inclusion threshold,
       use a bit score of >= <x> as the per-target inclusion threshold.
       By default this option is unset.

--incdomE <x>
       Use a conditional E-value of <= <x> as the per-domain inclusion
       threshold, in targets that have already satisfied the overall
       per-target inclusion threshold. The default is 0.01.

--incdomT <x>
       Instead of using E-values, use a bit score of >= <x> as the
       per-domain inclusion threshold. By default this option is unset.

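The inclusion marks can be sketched the same way. In this hypothetical
Python helper, a domain gets "!" only if its target passes the
per-target inclusion threshold and the domain passes the per-domain
one, following the --incdomE description; any other reported domain
gets "?". The parameter names mirror the options, but the function is
not part of HMMER.

```python
from typing import Optional

def domain_mark(target_evalue: float, target_score: float,
                dom_cond_evalue: float, dom_score: float,
                incE: float = 0.01, incT: Optional[float] = None,
                incdomE: float = 0.01, incdomT: Optional[float] = None) -> str:
    """Return '!' for an included domain, '?' for a reported but questionable one."""
    # Bit score thresholds, when set, replace the E-value thresholds.
    target_in = target_score >= incT if incT is not None else target_evalue <= incE
    domain_in = dom_score >= incdomT if incdomT is not None else dom_cond_evalue <= incdomE
    # A domain is only marked significant inside a target that is itself included.
    return "!" if (target_in and domain_in) else "?"
```
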
HMMER3 searches are accelerated in a three-step filter pipeline: the
MSV filter, the Viterbi filter, and the Forward filter. The first filter
is the fastest and most approximate; the last is the full Forward
scoring algorithm, slowest but most accurate. There is also a bias
filter step between MSV and Viterbi. Targets that pass all the steps in
the acceleration pipeline are then subjected to postprocessing: domain
identification and scoring using the Forward/Backward algorithm.

Essentially the only free parameters that control HMMER's heuristic
filters are the P-value thresholds controlling the expected fraction of
nonhomologous sequences that pass the filters. Setting the default
thresholds higher will pass a higher proportion of nonhomologous
sequences, increasing sensitivity at the expense of speed; conversely,
setting lower P-value thresholds will pass a smaller proportion,
decreasing sensitivity and increasing speed. Setting a filter's P-value
threshold to 1.0 means it will pass all sequences, which effectively
disables the filter.

Changing filter thresholds only removes or includes targets from
consideration; changing filter thresholds does not alter bit scores,
E-values, or alignments, all of which are determined solely in
postprocessing.

--max  Maximum sensitivity. Turn off all filters, including the bias
       filter, and run full Forward/Backward postprocessing on every
       target. This increases sensitivity slightly, at a large cost in
       speed.

--F1 <x>
       First filter threshold; set the P-value threshold for the MSV
       filter step. The default is 0.02, meaning that roughly 2% of the
       highest scoring nonhomologous targets are expected to pass the
       filter.

--F2 <x>
       Second filter threshold; set the P-value threshold for the
       Viterbi filter step. The default is 0.001.

--F3 <x>
       Third filter threshold; set the P-value threshold for the
       Forward filter step. The default is 1e-5.

--nobias
       Turn off the bias filter. This increases sensitivity somewhat,
       but can come at a high cost in speed, especially if the query
       has biased residue composition (such as a repetitive sequence
       region, or if it is a membrane protein with large regions of
       hydrophobicity). Without the bias filter, too many sequences may
       pass the filter with biased queries, leading to slower than
       expected performance as the computationally intensive
       Forward/Backward algorithms shoulder an abnormally heavy load.

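The filter cascade can be sketched as a series of early exits. In the
Python sketch below, the four P-value functions are hypothetical
stand-ins for HMMER's internal filter scores, and reusing F1 for the
bias filter step is an assumption rather than something this page
states. Because a P-value never exceeds 1.0, a threshold of 1.0 lets
every target through, which is why it disables that filter. As rough
arithmetic, with the default F1 = 0.02, a database of 1,000,000 mostly
nonhomologous sequences would send about 20,000 of them on to the next
stage.

```python
from typing import Callable

PValueFn = Callable[[str], float]  # hypothetical: maps a target to a filter P-value

def passes_pipeline(target: str, msv_p: PValueFn, bias_p: PValueFn,
                    vit_p: PValueFn, fwd_p: PValueFn,
                    F1: float = 0.02, F2: float = 1e-3, F3: float = 1e-5,
                    nobias: bool = False, max_mode: bool = False) -> bool:
    """Return True if the target would reach Forward/Backward postprocessing."""
    if max_mode:
        return True                      # like --max: every filter is skipped
    if msv_p(target) > F1:
        return False                     # MSV filter
    if not nobias and bias_p(target) > F1:
        return False                     # bias filter (assumed to reuse F1)
    if vit_p(target) > F2:
        return False                     # Viterbi filter
    if fwd_p(target) > F3:
        return False                     # Forward filter
    return True
```
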
Estimating the location parameters for the expected score distributions
for MSV filter scores, Viterbi filter scores, and Forward scores
requires three short random sequence simulations.

--EmL <n>
       Sets the sequence length in simulation that estimates the
       location parameter mu for MSV filter E-values. Default is 200.

--EmN <n>
       Sets the number of sequences in simulation that estimates the
       location parameter mu for MSV filter E-values. Default is 200.

--EvL <n>
       Sets the sequence length in simulation that estimates the
       location parameter mu for Viterbi filter E-values. Default is
       200.

--EvN <n>
       Sets the number of sequences in simulation that estimates the
       location parameter mu for Viterbi filter E-values. Default is
       200.

--EfL <n>
       Sets the sequence length in simulation that estimates the
       location parameter tau for Forward E-values. Default is 100.

--EfN <n>
       Sets the number of sequences in simulation that estimates the
       location parameter tau for Forward E-values. Default is 200.

--Eft <x>
       Sets the tail mass fraction to fit in the simulation that
       estimates the location parameter tau for Forward E-values.
       Default is 0.04.

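As an illustration of what such a calibration fits, the Python sketch
below estimates the location parameter mu of a Gumbel distribution by
maximum likelihood with the slope lambda held fixed. It assumes,
following the published HMMER3 statistics rather than this page, that
MSV and Viterbi bit scores are Gumbel-distributed with lambda fixed at
log 2, and it draws scores directly from a Gumbel as a stand-in for
scoring simulated random sequences. Setting the derivative of the
log-likelihood with respect to mu to zero gives
mu = (1/lambda) * log(N / sum_i exp(-lambda * x_i)). Seeding
random.Random(42) echoes the idea behind --seed but is unrelated to
HMMER's own random number generator. The function names are
hypothetical.

```python
import math
import random
from typing import List, Sequence

def sample_gumbel(rng: random.Random, mu: float, lam: float, n: int) -> List[float]:
    """Draw n Gumbel(mu, lam) scores by inverting F(x) = exp(-exp(-lam*(x - mu)))."""
    out = []
    for _ in range(n):
        u = rng.random()
        while u == 0.0:          # random() is in [0, 1); log(0) is undefined
            u = rng.random()
        out.append(mu - math.log(-math.log(u)) / lam)
    return out

def fit_gumbel_mu(scores: Sequence[float], lam: float) -> float:
    """Maximum-likelihood mu with lambda fixed: mu = log(N / sum(exp(-lam*x))) / lam."""
    return math.log(len(scores) / sum(math.exp(-lam * x) for x in scores)) / lam

lam = math.log(2.0)                                    # assumed fixed slope for bit scores
rng = random.Random(42)
scores = sample_gumbel(rng, mu=-5.0, lam=lam, n=200)   # 200 scores, as with the default --EmN
print(round(fit_gumbel_mu(scores, lam), 2))
```

With N = 200 scores, the standard error of this estimate is
approximately 1/(lambda * sqrt(N)), about 0.1 bits.
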
--nonull2
       Turn off the null2 score corrections for biased composition.

-Z <x> Assert that the total number of targets in your searches is <x>,
       for the purposes of per-sequence E-value calculations, rather
       than the actual number of targets seen.

--domZ <x>
       Assert that the total number of targets in your searches is <x>,
       for the purposes of per-domain conditional E-value calculations,
       rather than the number of targets that passed the reporting
       thresholds.

--seed <n>
       Seed the random number generator with <n>, an integer >= 0. If
       <n> is > 0, any stochastic simulations will be reproducible; the
       same command will give the same results. If <n> is 0, the random
       number generator is seeded arbitrarily, and stochastic
       simulations will vary from run to run of the same command. The
       default seed is 42.

--qformat <s>
       Assert that input seqfile is in format <s>, bypassing format
       autodetection. Common choices for <s> include: fasta, embl,
       genbank. Alignment formats also work; common choices include:
       stockholm, a2m, afa, psiblast, clustal, phylip. phmmer always
       uses a single sequence query to start its search, so when the
       input seqfile is an alignment, phmmer reads it one unaligned
       query sequence at a time, not as an alignment. For more
       information, and for codes for some less common formats, see the
       main documentation. The string <s> is case-insensitive (fasta or
       FASTA both work).

--tformat <s>
       Assert that target sequence database seqdb is in format <s>,
       bypassing format autodetection. See --qformat above for the list
       of accepted format codes for <s>.

--cpu <n>
       Set the number of parallel worker threads to <n>. On multicore
       machines, the default is 2. You can also control this number by
       setting an environment variable, HMMER_NCPU. There is also a
       master thread, so the actual number of threads that HMMER spawns
       is <n>+1.

       This option is not available if HMMER was compiled with POSIX
       threads support turned off.

--stall
       For debugging the MPI master/worker version: pause after start,
       to enable the developer to attach debuggers to the running
       master and worker(s) processes. Send SIGCONT signal to release
       the pause. (Under gdb: (gdb) signal SIGCONT) (Only available if
       optional MPI support was enabled at compile-time.)

--mpi  Run under MPI control with master/worker parallelization (using
       mpirun, for example, or equivalent). Only available if optional
       MPI support was enabled at compile-time.

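The -Z and --domZ options matter because an E-value is a P-value
multiplied by the size of the search space: the number of target
sequences for per-sequence E-values, and the number of targets that
passed the reporting thresholds for conditional domain E-values. The
short Python sketch below shows that arithmetic; the function name is
hypothetical, and the P-values are assumed to come from the calibrated
score distributions.

```python
def evalue(pvalue: float, search_space: int) -> float:
    """Expected number of false positives scoring at least this well."""
    return pvalue * search_space

# A per-sequence P-value of 1e-6 against 500,000 targets gives an E-value
# of about 0.5; asserting -Z 1000000 instead doubles it to about 1.0.
print(evalue(1e-6, 500_000), evalue(1e-6, 1_000_000))

# Conversely, the default reporting threshold -E 10 corresponds to a
# per-sequence P-value cutoff of 10 / Z.
print(10.0 / 500_000)
```
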
See hmmer(1) for a master man page with a list of all the individual
man pages for programs in the HMMER package.

For complete documentation, see the user guide that came with your
HMMER distribution (Userguide.pdf); or see the HMMER web page
(http://hmmer.org/).

Copyright (C) 2020 Howard Hughes Medical Institute.
Freely distributed under the BSD open source license.

For additional information on copyright and licensing, see the file
called COPYRIGHT in your HMMER source distribution, or see the HMMER
web page (http://hmmer.org/).

http://eddylab.org

HMMER 3.3.2 Nov 2020 phmmer(1)