hmmpfam(1)

hmmpfam(1)                       HMMER Manual                       hmmpfam(1)

NAME

       hmmpfam - search one or more sequences against an HMM database

SYNOPSIS

       hmmpfam [options] hmmfile seqfile

DESCRIPTION

       hmmpfam reads a sequence file seqfile and compares each sequence in it,
       one at a time, against all the HMMs in hmmfile, looking for
       significantly similar sequence matches.

       hmmfile is looked for first in the current working directory, then in
       the directory named by the environment variable HMMERDB. This lets
       administrators install HMM libraries such as Pfam in a common location.
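       The lookup order described above can be sketched in Python. This is
       an illustration only, not HMMER's own code: the function name
       find_hmmfile is hypothetical, and it assumes HMMERDB names a single
       directory, as the text states.

```python
import os
from typing import Mapping, Optional


def find_hmmfile(name: str, cwd: str, env: Mapping[str, str]) -> Optional[str]:
    """Return the first existing path for `name`, or None if it is not found.

    Search order: the current working directory first, then the directory
    named by the HMMERDB environment variable.
    """
    candidates = [os.path.join(cwd, name)]
    hmmerdb = env.get("HMMERDB")
    if hmmerdb:
        # Assumption: HMMERDB holds one directory, not a list of directories.
        candidates.append(os.path.join(hmmerdb, name))
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None
```

       A caller would typically pass os.getcwd() and os.environ; taking them
       as arguments keeps the sketch easy to test.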
       
       There is a separate output report for each sequence in seqfile. This
       report consists of three sections: a ranked list of the best scoring
       HMMs, a list of the best scoring domains in order of their occurrence
       in the sequence, and alignments for all the best scoring domains. A
       sequence score may be higher than a domain score for the same sequence
       if there is more than one domain in the sequence; the sequence score
       takes into account all the domains. All HMMs that pass the -E and -T
       cutoffs are shown in the first list, then every domain found for the
       HMMs in this list is shown in the second list of domain hits. If
       desired, E-value and bit score thresholds may also be applied to the
       domain list using the --domE and --domT options.
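       The relationship between the two hit lists can be sketched as
       follows. This is a simplified model, not HMMER's implementation: the
       Hit class and the report function are hypothetical, hits lying
       exactly on a cutoff are assumed to pass, and the first list is
       assumed to be ranked by descending bit score. The default cutoff
       values are the ones documented under OPTIONS and EXPERT OPTIONS.

```python
from dataclasses import dataclass
from typing import List, Tuple

INF = float("inf")


@dataclass
class Hit:
    name: str      # name of the HMM that produced the hit
    score: float   # bit score
    evalue: float  # E-value


def passes(hit: Hit, e_cutoff: float, t_cutoff: float) -> bool:
    # A hit is reported only if it satisfies both cutoffs.
    # Assumption: a hit exactly at a cutoff is kept.
    return hit.evalue <= e_cutoff and hit.score >= t_cutoff


def report(seq_hits: List[Hit], domains: List[Hit],
           e: float = 10.0, t: float = -INF,
           dom_e: float = INF, dom_t: float = -INF) -> Tuple[List[Hit], List[Hit]]:
    """Return (first list, second list) for one query sequence.

    `domains` must already be in order of occurrence in the sequence.
    """
    first = sorted((h for h in seq_hits if passes(h, e, t)),
                   key=lambda h: h.score, reverse=True)
    shown = {h.name for h in first}
    # Only domains of HMMs in the first list are candidates for the second.
    second = [d for d in domains if d.name in shown and passes(d, dom_e, dom_t)]
    return first, second
```

       With the default domain cutoffs every domain of a reported HMM
       appears in the second list, which is what keeps the two lists
       consistent.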

OPTIONS

       -h     Print brief help; includes version number and summary of all
              options, including expert options.

       -n     Specify that models and sequence are nucleic acid, not protein.
              Other HMMER programs autodetect this, but because of the order
              in which hmmpfam accesses data, it can't reliably determine the
              correct "alphabet" by itself.

       -A <n> Limit the alignment output to the <n> best scoring domains.
              -A0 shuts off the alignment output and can be used to reduce
              the size of output files.

       -E <x> Set the E-value cutoff for the per-sequence ranked hit list to
              <x>, where <x> is a positive real number. The default is 10.0.
              Hits with E-values better than (less than) this threshold will
              be shown.

       -T <x> Set the bit score cutoff for the per-sequence ranked hit list to
              <x>, where <x> is a real number. The default is negative
              infinity; by default, the threshold is controlled by E-value and
              not by bit score. Hits with bit scores better than (greater
              than) this threshold will be shown.

       -Z <n> Calculate the E-value scores as if we had seen a sequence
              database of <n> sequences. The default is arbitrarily set to
              59021, the size of Swissprot 34.
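       Because an E-value is the expected number of false hits in a search
       of <n> sequences, it scales linearly with the value given to -Z. The
       sketch below converts an E-value reported under one database size to
       the value it would have under another. The function name
       rescale_evalue is hypothetical; it assumes the E-value is the
       per-comparison P-value multiplied by the database size.

```python
DEFAULT_Z = 59021  # default database size documented for -Z


def rescale_evalue(evalue: float, z_old: int, z_new: int) -> float:
    """Convert an E-value computed for z_old sequences to z_new sequences."""
    if z_old <= 0 or z_new <= 0:
        raise ValueError("database sizes must be positive")
    # Assumption: E = P * Z, so the underlying P-value is evalue / z_old.
    return evalue * z_new / z_old
```

       For example, doubling the assumed database size doubles every
       E-value, so a hit that passed -E 10 at the default size may no longer
       pass.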

EXPERT OPTIONS

       --acc  Report HMM accessions instead of names in the output reports.
              Useful for high-throughput annotation, where the data are being
              parsed for storage in a relational database.

       --compat
              Use the output format of HMMER 2.1.1, the 1998-2001 public
              release; provided so 2.1.1 parsers don't have to be rewritten.

       --cpu <n>
              Sets the maximum number of CPUs that the program will run on.
              The default is to use all CPUs in the machine. Overrides the
              HMMER_NCPU environment variable. Only affects threaded versions
              of HMMER (the default on most systems).
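       The precedence among --cpu, HMMER_NCPU, and the default can be
       sketched as below. The function name resolve_ncpu is hypothetical,
       and the sketch assumes HMMER_NCPU holds a plain integer.

```python
import os
from typing import Mapping, Optional


def resolve_ncpu(cli_cpu: Optional[int], env: Mapping[str, str]) -> int:
    """Return the maximum number of CPUs to use."""
    if cli_cpu is not None:
        # --cpu overrides the HMMER_NCPU environment variable.
        return cli_cpu
    if "HMMER_NCPU" in env:
        # Assumption: the variable contains a decimal integer.
        return int(env["HMMER_NCPU"])
    # Default: all CPUs in the machine (fall back to 1 if undetectable).
    return os.cpu_count() or 1
```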
       
       --cut_ga
              Use Pfam GA (gathering threshold) score cutoffs. Equivalent to
              -T <GA1> --domT <GA2>, but the GA1 and GA2 cutoffs are read
              from each HMM in hmmfile individually. hmmbuild puts these
              cutoffs there if the alignment file was annotated in a
              Pfam-friendly alignment format (extended SELEX or Stockholm
              format) and the optional GA annotation line was present. If
              these cutoffs are not set in the HMM file, --cut_ga doesn't
              work.

       --cut_tc
              Use Pfam TC (trusted cutoff) score cutoffs. Equivalent to
              -T <TC1> --domT <TC2>, but the TC1 and TC2 cutoffs are read
              from each HMM in hmmfile individually. hmmbuild puts these
              cutoffs there if the alignment file was annotated in a
              Pfam-friendly alignment format (extended SELEX or Stockholm
              format) and the optional TC annotation line was present. If
              these cutoffs are not set in the HMM file, --cut_tc doesn't
              work.

       --cut_nc
              Use Pfam NC (noise cutoff) score cutoffs. Equivalent to
              -T <NC1> --domT <NC2>, but the NC1 and NC2 cutoffs are read
              from each HMM in hmmfile individually. hmmbuild puts these
              cutoffs there if the alignment file was annotated in a
              Pfam-friendly alignment format (extended SELEX or Stockholm
              format) and the optional NC annotation line was present. If
              these cutoffs are not set in the HMM file, --cut_nc doesn't
              work.
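       The three --cut_* options differ only in which pair of per-model
       cutoffs they select. A minimal sketch of that selection follows. The
       dictionary layout and the function name thresholds_for are
       hypothetical, and raising an error when the cutoffs are missing is
       an assumption standing in for "doesn't work".

```python
from typing import Dict, Tuple

# Hypothetical in-memory form of the cutoffs stored in one HMM, e.g.
# {"GA": (GA1, GA2), "TC": (TC1, TC2), "NC": (NC1, NC2)}
Cutoffs = Dict[str, Tuple[float, float]]

FLAG_TO_KEY = {"--cut_ga": "GA", "--cut_tc": "TC", "--cut_nc": "NC"}


def thresholds_for(flag: str, model_cutoffs: Cutoffs) -> Tuple[float, float]:
    """Return (per-sequence bit score cutoff, per-domain bit score cutoff)."""
    key = FLAG_TO_KEY[flag]
    if key not in model_cutoffs:
        # Assumption: treat missing cutoffs as an error for this model.
        raise ValueError(f"{flag}: HMM has no {key} cutoffs set")
    return model_cutoffs[key]
```

       The lookup is done once per HMM, which is why the thresholds can
       differ from model to model within a single run.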
       
       --domE <x>
              Set the E-value cutoff for the per-domain ranked hit list to
              <x>, where <x> is a positive real number. The default is
              infinity; by default, all domains in the sequences that passed
              the first threshold will be reported in the second list, so
              that the number of domains reported in the per-sequence list is
              consistent with the number that appear in the per-domain list.

       --domT <x>
              Set the bit score cutoff for the per-domain ranked hit list to
              <x>, where <x> is a real number. The default is negative
              infinity; by default, all domains in the sequences that passed
              the first threshold will be reported in the second list, so
              that the number of domains reported in the per-sequence list is
              consistent with the number that appear in the per-domain list.
              Important note: only one domain in a sequence is absolutely
              controlled by this parameter, or by --domE. The second and
              subsequent domains in a sequence have a de facto bit score
              threshold of 0 because of the details of how HMMER works. HMMER
              requires at least one pass through the main model per sequence;
              to do more than one pass (more than one domain) the multidomain
              alignment must have a better score than the single domain
              alignment, and hence the extra domains must contribute positive
              score. See the User's Guide for more detail.

       --forward
              Use the Forward algorithm instead of the Viterbi algorithm to
              determine the per-sequence scores. Per-domain scores are still
              determined by the Viterbi algorithm. Some have argued that
              Forward is a more sensitive algorithm for detecting remote
              sequence homologues; my experiments with HMMER have not
              confirmed this, however.

       --informat <s>
              Assert that the input seqfile is in format <s>; do not run
              Babelfish format autodetection. This increases the reliability
              of the program somewhat, because the Babelfish can make
              mistakes; particularly recommended for unattended,
              high-throughput runs of HMMER. Valid format strings include
              FASTA, GENBANK, EMBL, GCG, PIR, STOCKHOLM, SELEX, MSF, CLUSTAL,
              and PHYLIP. See the User's Guide for a complete list.
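       For unattended runs, a wrapper script might assemble the command
       line with the input format stated explicitly. The sketch below only
       builds the argument list and does not run anything. The function
       name build_hmmpfam_argv and the file names in the usage note are
       hypothetical; the flags are the ones documented on this page.

```python
from typing import List, Optional


def build_hmmpfam_argv(hmmfile: str, seqfile: str,
                       informat: str = "FASTA",
                       e_cutoff: Optional[float] = None,
                       use_acc: bool = False) -> List[str]:
    """Build an hmmpfam argument list with the input format asserted."""
    argv = ["hmmpfam", "--informat", informat]
    if use_acc:
        argv.append("--acc")            # report accessions instead of names
    if e_cutoff is not None:
        argv += ["-E", str(e_cutoff)]   # per-sequence E-value cutoff
    # Positional arguments follow the SYNOPSIS order: hmmfile, then seqfile.
    argv += [hmmfile, seqfile]
    return argv
```

       On a machine where hmmpfam is installed, the resulting list could be
       passed to subprocess.run(argv, check=True), for example
       build_hmmpfam_argv("Pfam", "seqs.fa", e_cutoff=0.01, use_acc=True).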
       
       --null2
              Turn off the post hoc second null model. By default, each
              alignment is rescored by a postprocessing step that takes into
              account possible biased composition in either the HMM or the
              target sequence. This is almost essential in database searches,
              especially with local alignment models. There is a very small
              chance that this postprocessing might remove real matches, and
              in these cases --null2 may improve sensitivity at the expense of
              reducing specificity by letting biased composition hits through.

       --pvm  Run on a Parallel Virtual Machine (PVM). The PVM must already be
              running. The client program hmmpfam-pvm must be installed on all
              the PVM nodes. The HMM database hmmfile and an associated GSI
              index file hmmfile.gsi must also be installed on all the PVM
              nodes. (The GSI index is produced by the program hmmindex.)
              Because the PVM implementation is I/O bound, it is highly
              recommended that each node have a local copy of hmmfile rather
              than NFS mounting a shared copy. Optional PVM support must have
              been compiled into HMMER for --pvm to function.

       --xnu  Turn on XNU filtering of target protein sequences. Has no effect
              on nucleic acid sequences. In trial experiments, --xnu appears
              to perform less well than the default post hoc null2 model.

COPYRIGHT

       Copyright (C) 1992-2003 HHMI/Washington University School of Medicine.
       Freely distributed under the GNU General Public License (GPL).
       See the file COPYING in your distribution for details on redistribution
       conditions.

AUTHOR

       Sean Eddy
       HHMI/Dept. of Genetics
       Washington Univ. School of Medicine
       4566 Scott Ave.
       St Louis, MO 63110 USA
       http://www.genetics.wustl.edu/eddy/

HMMER 2.3.2                        Oct 2003                         hmmpfam(1)