phmmer(1)

phmmer(1)                        HMMER Manual                        phmmer(1)

NAME

       phmmer - search protein sequence(s) against a protein sequence database

SYNOPSIS

       phmmer [options] <seqfile> <seqdb>

DESCRIPTION

       phmmer searches one or more query protein sequences against a protein
       sequence database. Each query sequence in <seqfile> is used to search
       the target database of sequences in <seqdb>, and phmmer outputs a
       ranked list of the target sequences with the most significant matches
       to that query.

       The output format is designed to be human-readable, but it is often so
       voluminous that reading it is impractical and parsing it is tedious.
       The --tblout and --domtblout options save output in simple tabular
       formats that are concise and easier to parse. The -o option redirects
       the main output to a file, which can be /dev/null to discard it.
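
       As an illustration of how the options mentioned above combine, the
       following Python sketch assembles a phmmer argument list. The helper
       name build_phmmer_command and the file names are hypothetical; only
       the -o, --tblout, and --domtblout options and the <seqfile> <seqdb>
       argument order are taken from this page. The sketch only builds and
       prints the command; it does not run phmmer.

```python
import shlex


def build_phmmer_command(seqfile, seqdb, tblout=None, domtblout=None,
                         main_output=None):
    """Return an argument list for phmmer (hypothetical helper)."""
    cmd = ["phmmer"]
    if main_output is not None:
        cmd += ["-o", main_output]          # redirect the main output
    if tblout is not None:
        cmd += ["--tblout", tblout]         # per-target table
    if domtblout is not None:
        cmd += ["--domtblout", domtblout]   # per-domain table
    cmd += [seqfile, seqdb]                 # positional arguments come last
    return cmd


# Hypothetical file names, for illustration only.
cmd = build_phmmer_command("query.fa", "targets.fa",
                           tblout="hits.tbl", main_output="/dev/null")
print(shlex.join(cmd))
```

       On a machine where phmmer is installed, the returned list could be
       passed to subprocess.run(cmd, check=True).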

OPTIONS

       -h     Help; print a brief reminder of command line usage and all
              available options.

OPTIONS FOR CONTROLLING OUTPUT

       -o <f> Direct the main human-readable output to a file <f> instead of
              the default stdout.

       -A <f> Save a multiple alignment of all significant hits (those
              satisfying inclusion thresholds) to the file <f> in Stockholm
              format.

       --tblout <f>
              Save a simple tabular (space-delimited) file summarizing the
              per-target output, with one data line per homologous target
              sequence found.

       --domtblout <f>
              Save a simple tabular (space-delimited) file summarizing the
              per-domain output, with one data line per homologous domain
              detected in a target sequence for each query.

       --acc  Use accessions instead of names in the main output, where
              available for profiles and/or sequences.

       --noali
              Omit the alignment section from the main output. This can
              greatly reduce the output volume.

       --notextw
              Remove the limit on the length of each line in the main output.
              The default is a limit of 120 characters per line, which helps
              in displaying the output cleanly on terminals and in editors,
              but can truncate target profile description lines.

       --textw <n>
              Set the main output's line length limit to <n> characters per
              line. The default is 120.
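
       A file written with --tblout can be read with a few lines of Python.
       The sketch below is not part of HMMER. It assumes the column layout
       commonly produced by HMMER3 (comment lines starting with "#", then 18
       whitespace-delimited columns followed by a free-text description,
       with the target name in column 1, the query name in column 3, and the
       full-sequence E-value and bit score in columns 5 and 6). This page
       does not specify that layout, so check it against your own output.
       The sample line is made up for illustration.

```python
def parse_tblout(lines):
    """Yield one dict per data line of a --tblout file.

    ASSUMPTION: 18 whitespace-delimited columns, then a free-text
    description; lines starting with '#' are comments.
    """
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        # Split at most 18 times so the description keeps its spaces.
        fields = line.split(None, 18)
        yield {
            "target": fields[0],
            "query": fields[2],
            "evalue": float(fields[4]),   # full-sequence E-value
            "score": float(fields[5]),    # full-sequence bit score
            "description": fields[18].strip() if len(fields) > 18 else "",
        }


# Made-up data line, built field by field so the column count is explicit.
sample_fields = ["seqA", "-", "query1", "-", "1.2e-30", "105.3", "0.1",
                 "1.5e-30", "105.0", "0.1", "1.0", "1", "0", "0", "1", "1",
                 "1", "1", "hypothetical protein"]
sample = ["# comment line\n", " ".join(sample_fields) + "\n"]

hits = list(parse_tblout(sample))
print(hits[0]["target"], hits[0]["evalue"], hits[0]["description"])
```

       For a real file, pass an open file object instead of the sample list,
       for example parse_tblout(open("hits.tbl")).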

OPTIONS CONTROLLING SCORING SYSTEM

       The probability model in phmmer is constructed by inferring residue
       probabilities from a standard 20x20 substitution score matrix, plus two
       additional parameters for position-independent gap open and gap extend
       probabilities.

       --popen <x>
              Set the gap open probability for a single sequence query model
              to <x>. The default is 0.02. <x> must be >= 0 and < 0.5.

       --pextend <x>
              Set the gap extend probability for a single sequence query
              model to <x>. The default is 0.4. <x> must be >= 0 and < 1.0.

       --mxfile <mxfile>
              Obtain residue alignment probabilities from the substitution
              matrix in file <mxfile>. The default score matrix is BLOSUM62
              (this matrix is internal to HMMER and does not have to be
              available as a file). The format of a substitution matrix
              <mxfile> is the standard format accepted by BLAST, FASTA, and
              other sequence analysis software.
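
       The allowed ranges for --popen and --pextend can be checked before a
       search is launched. The following Python sketch is a hypothetical
       wrapper-side check, not HMMER's own validation code; the function
       name gap_options is an assumption, while the ranges and defaults are
       the ones documented above.

```python
def gap_options(popen=0.02, pextend=0.4):
    """Validate gap probabilities and return phmmer arguments for them."""
    if not (0.0 <= popen < 0.5):
        raise ValueError("--popen must be >= 0 and < 0.5")
    if not (0.0 <= pextend < 1.0):
        raise ValueError("--pextend must be >= 0 and < 1.0")
    return ["--popen", str(popen), "--pextend", str(pextend)]


print(gap_options())             # documented defaults
print(gap_options(0.01, 0.5))    # stricter gap open, looser gap extend
```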

OPTIONS CONTROLLING REPORTING THRESHOLDS

       Reporting thresholds control which hits are reported in output files
       (the main output, --tblout, and --domtblout). Sequence hits and domain
       hits are ranked by statistical significance (E-value), and output is
       generated in two sections called per-target and per-domain output. In
       the per-target output, by default, all sequence hits with an E-value
       <= 10 are reported. In the per-domain output, for each target that has
       passed per-target reporting thresholds, all domains satisfying
       per-domain reporting thresholds are reported. By default, these are
       domains with conditional E-values of <= 10. The following options
       allow you to change the default E-value reporting thresholds, or to
       use bit score thresholds instead.

       -E <x> In the per-target output, report target sequences with an
              E-value of <= <x>. The default is 10.0, meaning that on average
              about 10 false positives will be reported per query, so you can
              see the top of the noise and decide for yourself if it's really
              noise.

       -T <x> Instead of thresholding per-target output on E-value, report
              target sequences with a bit score of >= <x>.

       --domE <x>
              In the per-domain output, for target sequences that have
              already satisfied the per-target reporting threshold, report
              individual domains with a conditional E-value of <= <x>. The
              default is 10.0. A conditional E-value is the expected number
              of additional false positive domains in the smaller search
              space of those comparisons that already satisfied the
              per-target reporting threshold (and thus must have at least one
              homologous domain already).

       --domT <x>
              Instead of thresholding per-domain output on E-value, report
              domains with a bit score of >= <x>.

OPTIONS CONTROLLING INCLUSION THRESHOLDS

       Inclusion thresholds are stricter than reporting thresholds. They
       control which hits are included in any output multiple alignment (the
       -A option) and which domains are marked as significant ("!") as
       opposed to questionable ("?") in domain output.

       --incE <x>
              Use an E-value of <= <x> as the per-target inclusion threshold.
              The default is 0.01, meaning that on average about 1 false
              positive would be expected in every 100 searches with different
              query sequences.

       --incT <x>
              Instead of using E-values, use a bit score of >= <x> as the
              per-target inclusion threshold. By default this option is
              unset.

       --incdomE <x>
              Use a conditional E-value of <= <x> as the per-domain inclusion
              threshold, in targets that have already satisfied the overall
              per-target inclusion threshold. The default is 0.01.

       --incdomT <x>
              Instead of using E-values, use a bit score of >= <x> as the
              per-domain inclusion threshold. By default this option is
              unset.
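
       The relationship between reporting and inclusion thresholds can be
       sketched in Python. This is a simplified illustration, not HMMER's
       implementation: it looks only at a domain's conditional E-value, uses
       the documented defaults for --domE (10.0) and --incdomE (0.01), and
       ignores both the per-target thresholds and the bit score
       alternatives. The function name domain_mark is hypothetical.

```python
def domain_mark(cond_evalue, dom_e=10.0, incdom_e=0.01):
    """Return '!' (included), '?' (reported only), or None (not reported)."""
    if cond_evalue <= incdom_e:
        return "!"      # satisfies the stricter inclusion threshold
    if cond_evalue <= dom_e:
        return "?"      # reported, but below inclusion significance
    return None         # fails the reporting threshold


for e in (1e-20, 0.5, 50.0):
    print(e, domain_mark(e))
```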

OPTIONS CONTROLLING THE ACCELERATION PIPELINE

       HMMER3 searches are accelerated in a three-step filter pipeline: the
       MSV filter, the Viterbi filter, and the Forward filter. The first
       filter is the fastest and most approximate; the last is the full
       Forward scoring algorithm, slowest but most accurate. There is also a
       bias filter step between MSV and Viterbi. Targets that pass all the
       steps in the acceleration pipeline are then subjected to
       postprocessing: domain identification and scoring using the
       Forward/Backward algorithm.

       Essentially the only free parameters that control HMMER's heuristic
       filters are the P-value thresholds controlling the expected fraction
       of nonhomologous sequences that pass the filters. Raising a threshold
       above its default passes a higher proportion of nonhomologous
       sequences, increasing sensitivity at the expense of speed; conversely,
       lowering a P-value threshold passes a smaller proportion, decreasing
       sensitivity and increasing speed. Setting a filter's P-value threshold
       to 1.0 means it will pass all sequences, which effectively disables
       the filter.

       Changing filter thresholds only removes or includes targets from
       consideration; changing filter thresholds does not alter bit scores,
       E-values, or alignments, all of which are determined solely in
       postprocessing.

       --max  Maximum sensitivity. Turn off all filters, including the bias
              filter, and run full Forward/Backward postprocessing on every
              target. This increases sensitivity slightly, at a large cost in
              speed.

       --F1 <x>
              First filter threshold; set the P-value threshold for the MSV
              filter step. The default is 0.02, meaning that roughly 2% of
              the highest scoring nonhomologous targets are expected to pass
              the filter.

       --F2 <x>
              Second filter threshold; set the P-value threshold for the
              Viterbi filter step. The default is 0.001.

       --F3 <x>
              Third filter threshold; set the P-value threshold for the
              Forward filter step. The default is 1e-5.

       --nobias
              Turn off the bias filter. This increases sensitivity somewhat,
              but can come at a high cost in speed, especially if the query
              has biased residue composition (such as a repetitive sequence
              region, or a membrane protein with large regions of
              hydrophobicity). Without the bias filter, too many sequences
              may pass the filter with biased queries, leading to slower than
              expected performance as the computationally intensive
              Forward/Backward algorithms shoulder an abnormally heavy load.
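
       To get a feel for how much work each filter removes, the default
       P-value thresholds documented above can be multiplied by the number
       of nonhomologous targets. The Python sketch below is rough arithmetic
       under two stated assumptions: each threshold is read as the expected
       fraction of all nonhomologous targets scoring at or below that
       P-value, and the bias filter is ignored. The function name
       expected_survivors and the database size are illustrative.

```python
# Default P-value thresholds for --F1, --F2, and --F3.
DEFAULT_THRESHOLDS = (("MSV", 0.02), ("Viterbi", 0.001), ("Forward", 1e-5))


def expected_survivors(n_nonhomologous, thresholds=DEFAULT_THRESHOLDS):
    """Expected count of nonhomologous targets passing each filter step."""
    return [(name, n_nonhomologous * p) for name, p in thresholds]


# Illustrative database of one million nonhomologous target sequences.
for name, count in expected_survivors(1_000_000):
    print(f"{name:8s} about {count:,.0f} targets expected to pass")
```

       For one million nonhomologous targets this gives on the order of
       20,000 sequences after MSV, 1,000 after Viterbi, and 10 after
       Forward, which is why loosening a threshold costs speed.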

OPTIONS CONTROLLING E-VALUE CALIBRATION

       Estimating the location parameters for the expected score
       distributions for MSV filter scores, Viterbi filter scores, and
       Forward scores requires three short random sequence simulations.

       --EmL <n>
              Sets the sequence length in the simulation that estimates the
              location parameter mu for MSV filter E-values. Default is 200.

       --EmN <n>
              Sets the number of sequences in the simulation that estimates
              the location parameter mu for MSV filter E-values. Default is
              200.

       --EvL <n>
              Sets the sequence length in the simulation that estimates the
              location parameter mu for Viterbi filter E-values. Default is
              200.

       --EvN <n>
              Sets the number of sequences in the simulation that estimates
              the location parameter mu for Viterbi filter E-values. Default
              is 200.

       --EfL <n>
              Sets the sequence length in the simulation that estimates the
              location parameter tau for Forward E-values. Default is 100.

       --EfN <n>
              Sets the number of sequences in the simulation that estimates
              the location parameter tau for Forward E-values. Default is
              200.

       --Eft <x>
              Sets the tail mass fraction to fit in the simulation that
              estimates the location parameter tau for Forward E-values.
              Default is 0.04.

OTHER OPTIONS

       --nonull2
              Turn off the null2 score corrections for biased composition.

       -Z <x> Assert that the total number of targets in your searches is
              <x>, for the purposes of per-sequence E-value calculations,
              rather than the actual number of targets seen.

       --domZ <x>
              Assert that the total number of targets in your searches is
              <x>, for the purposes of per-domain conditional E-value
              calculations, rather than the number of targets that passed
              the reporting thresholds.

       --seed <n>
              Seed the random number generator with <n>, an integer >= 0. If
              <n> is > 0, any stochastic simulations will be reproducible;
              the same command will give the same results. If <n> is 0, the
              random number generator is seeded arbitrarily, and stochastic
              simulations will vary from run to run of the same command. The
              default seed is 42.

       --qformat <s>
              Declare that the input <seqfile> is in format <s>. Accepted
              formats include fasta, embl, genbank, ddbj, uniprot, stockholm,
              pfam, a2m, and afa. The default is to autodetect the format of
              the file.

       --tformat <s>
              Declare that the input <seqdb> is in format <s>. Accepted
              formats include fasta, embl, genbank, ddbj, uniprot, stockholm,
              pfam, a2m, and afa. The default is to autodetect the format of
              the file.

       --cpu <n>
              Set the number of parallel worker threads to <n>. By default,
              HMMER sets this to the number of CPU cores it detects in your
              machine; that is, it tries to maximize the use of your
              available processor cores. Setting <n> higher than the number
              of available cores is of little if any value, but you may want
              to set it to something less. You can also control this number
              by setting an environment variable, HMMER_NCPU.

              This option is only available if HMMER was compiled with POSIX
              threads support. This is the default, but it may have been
              turned off at compile-time for your site or machine for some
              reason.

       --stall
              For debugging the MPI master/worker version: pause after start,
              to enable the developer to attach debuggers to the running
              master and worker processes. Send a SIGCONT signal to release
              the pause. (Under gdb: (gdb) signal SIGCONT) (Only available if
              optional MPI support was enabled at compile-time.)

       --mpi  Run in MPI master/worker mode, using mpirun. (Only available if
              optional MPI support was enabled at compile-time.)
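
       The effect of the -Z option described above can be illustrated with a
       short Python sketch. It assumes that a per-sequence E-value is the
       hit's P-value multiplied by the number of targets, a relation this
       page does not state explicitly, so that an E-value computed for one
       database size can be rescaled to another. The function name
       rescale_evalue and the numbers are illustrative only.

```python
def rescale_evalue(evalue, z_actual, z_asserted):
    """Rescale an E-value from z_actual targets to z_asserted targets.

    ASSUMPTION: E-value = P-value * Z, so the E-value scales linearly
    with the asserted number of targets.
    """
    if z_actual <= 0 or z_asserted <= 0:
        raise ValueError("target counts must be positive")
    return evalue * z_asserted / z_actual


# A hit with E-value 0.005 in a 1,000-sequence search, re-expressed as if
# the search had covered 100,000 sequences.
print(rescale_evalue(0.005, 1_000, 100_000))
```

       Under that assumption, a hit that looks significant in a small search
       may fall below the default inclusion threshold of 0.01 once a larger
       search space is asserted with -Z.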

COPYRIGHT

       @HMMER_COPYRIGHT@
       @HMMER_LICENSE@

       For additional information on copyright and licensing, see the file
       called COPYRIGHT in your HMMER source distribution, or see the HMMER
       web page (@HMMER_URL@).

AUTHOR

       Eddy/Rivas Laboratory
       Janelia Farm Research Campus
       19700 Helix Drive
       Ashburn VA 20147 USA
       http://eddylab.org

HMMER @HMMER_VERSION@            @HMMER_DATE@                        phmmer(1)