hmmscan(1)                      HMMER Manual                      hmmscan(1)

NAME
hmmscan - search sequence(s) against a profile database

SYNOPSIS
hmmscan [options] hmmdb seqfile

DESCRIPTION
hmmscan is used to search protein sequences against collections of
protein profiles. For each sequence in seqfile, use that query sequence
to search the target database of profiles in hmmdb, and output ranked
lists of the profiles with the most significant matches to the
sequence.

The seqfile may contain more than one query sequence. Each will be
searched in turn against hmmdb.

The hmmdb needs to be pressed using hmmpress before it can be searched
with hmmscan. This creates four binary files, suffixed .h3{fimp} (that
is, .h3f, .h3i, .h3m, and .h3p).

The query seqfile may be '-' (a dash character), in which case the
query sequences are read from a stdin pipe instead of from a file. The
hmmdb cannot be read from a stdin stream, because it needs to have
those four auxiliary binary files generated by hmmpress.
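
As an illustration only, the pressing requirement above can be checked
from a script before hmmscan is launched. The following Python sketch is
not part of HMMER; the function names are hypothetical. It only tests
for the four .h3f, .h3i, .h3m, and .h3p files next to hmmdb and builds
argument lists, without running anything.

```python
from pathlib import Path

# The four auxiliary suffixes named in the text above (.h3{fimp}).
PRESSED_SUFFIXES = (".h3f", ".h3i", ".h3m", ".h3p")


def missing_pressed_files(hmmdb):
    """Return the hmmpress auxiliary files that do not yet exist for hmmdb."""
    candidates = [Path(str(hmmdb) + suffix) for suffix in PRESSED_SUFFIXES]
    return [path for path in candidates if not path.exists()]


def build_commands(hmmdb, seqfile="-"):
    """Build argv lists: hmmpress first if any index is missing, then hmmscan.

    seqfile defaults to '-', meaning the queries are read from stdin.
    The hmmdb itself must always be a real file path.
    """
    commands = []
    if missing_pressed_files(hmmdb):
        commands.append(["hmmpress", str(hmmdb)])
    commands.append(["hmmscan", str(hmmdb), str(seqfile)])
    return commands
```

Each returned list could be passed to subprocess.run in order. Nothing
here verifies that the existing index files are up to date with hmmdb;
it checks for their presence only.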

The output format is designed to be human-readable, but is often so
voluminous that reading it is impractical, and parsing it is a pain.
The --tblout and --domtblout options save output in simple tabular
formats that are concise and easier to parse. The -o option allows
redirecting the main output, including throwing it away in /dev/null.
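
To show why the tabular formats are easier to parse, here is a minimal
Python sketch of a --tblout reader. It assumes the usual HMMER3 layout
of 18 whitespace-separated fields followed by a free-text description,
with comment lines starting with '#'. That column layout is not
specified on this page, so treat the field positions as an assumption
and check them against the user guide. The function name is
hypothetical.

```python
def parse_tblout(lines):
    """Yield one dict per data line of a --tblout file.

    Assumed layout (not stated in this man page): target name, target
    accession, query name, query accession, full-sequence E-value, score
    and bias, 11 further numeric fields, then a free-text description.
    """
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        # Split at most 18 times so the description keeps its spaces.
        fields = line.split(None, 18)
        yield {
            "target": fields[0],
            "target_acc": fields[1],
            "query": fields[2],
            "query_acc": fields[3],
            "evalue": float(fields[4]),
            "score": float(fields[5]),
            "bias": float(fields[6]),
            "description": fields[18].strip() if len(fields) > 18 else "",
        }
```

A typical use would be to open the file given to --tblout and iterate
over parse_tblout(handle), keeping only rows whose "evalue" is below a
cutoff of your choosing.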

OPTIONS
-h     Help; print a brief reminder of command line usage and all
       available options.

OPTIONS FOR CONTROLLING OUTPUT
-o <f> Direct the main human-readable output to a file <f> instead of
       the default stdout.

--tblout <f>
       Save a simple tabular (space-delimited) file summarizing the
       per-target output, with one data line per homologous target
       model found.

--domtblout <f>
       Save a simple tabular (space-delimited) file summarizing the
       per-domain output, with one data line per homologous domain
       detected in a query sequence for each homologous model.

--pfamtblout <f>
       Save an especially succinct tabular (space-delimited) file
       summarizing the per-target output, with one data line per
       homologous target model found.

--acc  Use accessions instead of names in the main output, where
       available for profiles and/or sequences.

--noali
       Omit the alignment section from the main output. This can
       greatly reduce the output volume.

--notextw
       Unlimit the length of each line in the main output. The default
       is a limit of 120 characters per line, which helps in displaying
       the output cleanly on terminals and in editors, but can truncate
       target profile description lines.

--textw <n>
       Set the main output's line length limit to <n> characters per
       line. The default is 120.

OPTIONS FOR REPORTING THRESHOLDS
Reporting thresholds control which hits are reported in output files
(the main output, --tblout, and --domtblout).

-E <x> In the per-target output, report target profiles with an E-value
       of <= <x>. The default is 10.0, meaning that on average, about
       10 false positives will be reported per query, so you can see
       the top of the noise and decide for yourself if it's really
       noise.

-T <x> Instead of thresholding per-profile output on E-value, report
       target profiles with a bit score of >= <x>.

--domE <x>
       In the per-domain output, for target profiles that have already
       satisfied the per-profile reporting threshold, report individual
       domains with a conditional E-value of <= <x>. The default is
       10.0. A conditional E-value means the expected number of
       additional false positive domains in the smaller search space of
       those comparisons that already satisfied the per-profile
       reporting threshold (and thus must have at least one homologous
       domain already).

--domT <x>
       Instead of thresholding per-domain output on E-value, report
       domains with a bit score of >= <x>.
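
The relationship between the E-value threshold and the expected number
of false positives can be sketched in a few lines of Python. This is an
illustration, not HMMER code: it assumes the standard relation in which
an E-value is the P-value of a score multiplied by the number of
comparisons made, and the function names are hypothetical.

```python
def evalue(pvalue, n_targets):
    """Expected number of false positives scoring at least this well.

    Assumes E = P * Z, where Z is the number of target profiles compared
    against the query.
    """
    return pvalue * n_targets


def is_reported(target_evalue, bit_score, E=10.0, T=None):
    """Per-target reporting decision.

    If a bit score threshold T is given it replaces the E-value test, as
    described for -T above; otherwise the E-value must be <= E.
    """
    if T is not None:
        return bit_score >= T
    return target_evalue <= E
```

With the default E of 10.0, a hit with a P-value of 1e-3 against 10,000
profiles sits right at the reporting threshold, which is the sense in
which the default shows "the top of the noise".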

OPTIONS FOR INCLUSION THRESHOLDS
Inclusion thresholds are stricter than reporting thresholds. Inclusion
thresholds control which hits are considered to be reliable enough to
be included in an output alignment or a subsequent search round.
hmmscan produces no alignment output (unlike hmmsearch or phmmer) and
has no iterative search steps (unlike jackhmmer), so in hmmscan
inclusion thresholds have little effect. They only affect what domains
get marked as significant (!) or questionable (?) in domain output.

--incE <x>
       Use an E-value of <= <x> as the per-target inclusion threshold.
       The default is 0.01, meaning that on average, about 1 false
       positive would be expected in every 100 searches with different
       query sequences.

--incT <x>
       Instead of using E-values for setting the inclusion threshold,
       use a bit score of >= <x> as the per-target inclusion threshold.
       It would be unusual to use bit score thresholds with hmmscan,
       because you don't expect a single score threshold to work for
       different profiles; different profiles have slightly different
       expected score distributions.

--incdomE <x>
       Use a conditional E-value of <= <x> as the per-domain inclusion
       threshold, in targets that have already satisfied the overall
       per-target inclusion threshold. The default is 0.01.

--incdomT <x>
       Instead of using E-values, use a bit score of >= <x> as the
       per-domain inclusion threshold. As with --incT above, it would
       be unusual to use a single bit score threshold in hmmscan.
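
The effect of the inclusion thresholds on the (!) and (?) marks can be
pictured with the following Python sketch. It is a simplification based
only on the descriptions above, not on HMMER's source: it assumes a
domain is marked '!' when its target satisfies the per-target inclusion
threshold and the domain itself satisfies the per-domain one, and '?'
otherwise. The function name is hypothetical.

```python
def domain_mark(target_evalue, domain_cevalue, incE=0.01, incdomE=0.01):
    """Return '!' for a domain that passes both inclusion thresholds.

    target_evalue  -- per-target E-value of the profile hit
    domain_cevalue -- conditional E-value of the individual domain
    Defaults mirror the documented defaults of --incE and --incdomE.
    """
    target_ok = target_evalue <= incE
    domain_ok = domain_cevalue <= incdomE
    return "!" if (target_ok and domain_ok) else "?"
```

Under these assumptions a domain with a conditional E-value of 0.5 in
an otherwise strong hit would still be reported (it is below the
default --domE of 10.0) but would carry the '?' mark.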

OPTIONS FOR MODEL-SPECIFIC SCORE THRESHOLDING
Curated profile databases may define specific bit score thresholds for
each profile, superseding any thresholding based on statistical
significance alone.

To use these options, the profile must contain the appropriate (GA, TC,
and/or NC) optional score threshold annotation; this is picked up by
hmmbuild from Stockholm format alignment files. Each thresholding
option has two scores: the per-sequence threshold <x1> and the
per-domain threshold <x2>. These act as if -T <x1> --incT <x1>
--domT <x2> --incdomT <x2> has been applied specifically using each
model's curated thresholds.

--cut_ga
       Use the GA (gathering) bit scores in the model to set
       per-sequence (GA1) and per-domain (GA2) reporting and inclusion
       thresholds. GA thresholds are generally considered to be the
       reliable curated thresholds defining family membership; for
       example, in Pfam, these thresholds define what gets included in
       Pfam Full alignments based on searches with Pfam Seed models.

--cut_nc
       Use the NC (noise cutoff) bit score thresholds in the model to
       set per-sequence (NC1) and per-domain (NC2) reporting and
       inclusion thresholds. NC thresholds are generally considered to
       be the score of the highest-scoring known false positive.

--cut_tc
       Use the TC (trusted cutoff) bit score thresholds in the model to
       set per-sequence (TC1) and per-domain (TC2) reporting and
       inclusion thresholds. TC thresholds are generally considered to
       be the score of the lowest-scoring known true positive that is
       above all known false positives.
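
The equivalence stated above, that a cutoff option behaves like
-T <x1> --incT <x1> --domT <x2> --incdomT <x2> applied per model, can be
written out as a small Python sketch. The function name and the shape
of the cutoffs mapping are hypothetical, chosen only for illustration;
hmmscan reads the real values from each profile's annotation.

```python
def equivalent_flags(cutoffs, kind):
    """Expand one model's curated cutoff pair into explicit threshold flags.

    cutoffs -- mapping from 'GA', 'TC' or 'NC' to a (x1, x2) pair, where
               x1 is the per-sequence and x2 the per-domain bit score
    kind    -- which annotation to use
    """
    if kind not in cutoffs:
        # Mirrors the requirement that the profile carry the annotation.
        raise ValueError(f"profile has no {kind} threshold annotation")
    x1, x2 = cutoffs[kind]
    return ["-T", str(x1), "--incT", str(x1),
            "--domT", str(x2), "--incdomT", str(x2)]
```

The point of the cutoff options is that this expansion happens
separately for every profile in hmmdb, which a single set of
command-line thresholds cannot do.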

CONTROL OF THE ACCELERATION PIPELINE
HMMER3 searches are accelerated in a three-step filter pipeline: the
MSV filter, the Viterbi filter, and the Forward filter. The first
filter is the fastest and most approximate; the last is the full
Forward scoring algorithm. There is also a bias filter step between MSV
and Viterbi. Targets that pass all the steps in the acceleration
pipeline are then subjected to postprocessing: domain identification
and scoring using the Forward/Backward algorithm.

Changing filter thresholds only removes or includes targets from
consideration; changing filter thresholds does not alter bit scores,
E-values, or alignments, all of which are determined solely in
postprocessing.

--max  Turn off all filters, including the bias filter, and run full
       Forward/Backward postprocessing on every target. This increases
       sensitivity somewhat, at a large cost in speed.

--F1 <x>
       Set the P-value threshold for the MSV filter step. The default
       is 0.02, meaning that roughly 2% of the highest scoring
       nonhomologous targets are expected to pass the filter.

--F2 <x>
       Set the P-value threshold for the Viterbi filter step. The
       default is 0.001.

--F3 <x>
       Set the P-value threshold for the Forward filter step. The
       default is 1e-5.

--nobias
       Turn off the bias filter. This increases sensitivity somewhat,
       but can come at a high cost in speed, especially if the query
       has biased residue composition (such as a repetitive sequence
       region, or if it is a membrane protein with large regions of
       hydrophobicity). Without the bias filter, too many sequences may
       pass the filter with biased queries, leading to slower than
       expected performance as the computationally intensive
       Forward/Backward algorithms shoulder an abnormally heavy load.
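
To get a feel for the filter thresholds, the following Python sketch
estimates how many nonhomologous comparisons would be expected to score
above each filter's P-value threshold. It is back-of-envelope
arithmetic, not a model of the real pipeline: it assumes the P-values
are well calibrated and ignores the bias filter. The names are
hypothetical; the default values are those documented for --F1, --F2,
and --F3.

```python
# Documented default P-value thresholds for the three filters.
DEFAULT_THRESHOLDS = (("MSV", 0.02), ("Viterbi", 1e-3), ("Forward", 1e-5))


def expected_passing(n_nonhomologous, thresholds=DEFAULT_THRESHOLDS):
    """Expected number of nonhomologous comparisons above each threshold.

    A P-value threshold p means about a fraction p of nonhomologous
    comparisons score that well by chance, so the expectation is n * p.
    """
    return {name: n_nonhomologous * p for name, p in thresholds}
```

For 20,000 nonhomologous comparisons this gives roughly 400 for MSV,
20 for Viterbi, and 0.2 for Forward, which shows why the expensive
Forward/Backward postprocessing is reached by only a small fraction of
comparisons when the filters are left on.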

OTHER OPTIONS
--nonull2
       Turn off the null2 score corrections for biased composition.

-Z <x> Assert that the total number of targets in your searches is <x>,
       for the purposes of per-sequence E-value calculations, rather
       than the actual number of targets seen.

--domZ <x>
       Assert that the total number of targets in your searches is <x>,
       for the purposes of per-domain conditional E-value calculations,
       rather than the number of targets that passed the reporting
       thresholds.

--seed <n>
       Set the random number seed to <n>. Some steps in postprocessing
       require Monte Carlo simulation. The default is to use a fixed
       seed (42), so that results are exactly reproducible. Any other
       positive integer will give different (but also reproducible)
       results. A choice of 0 uses an arbitrarily chosen seed.

--qformat <s>
       Assert that input seqfile is in format <s>, bypassing format
       autodetection. Common choices for <s> include: fasta, embl,
       genbank. Alignment formats also work; common choices include:
       stockholm, a2m, afa, psiblast, clustal, phylip. For more
       information, and for codes for some less common formats, see
       main documentation. The string <s> is case-insensitive (fasta or
       FASTA both work).

--cpu <n>
       Set the number of parallel worker threads to <n>. On multicore
       machines, the default is 2. You can also control this number by
       setting an environment variable, HMMER_NCPU. There is also a
       master thread, so the actual number of threads that HMMER spawns
       is <n>+1.

       This option is not available if HMMER was compiled with POSIX
       threads support turned off.

--stall
       For debugging the MPI master/worker version: pause after start,
       to enable the developer to attach debuggers to the running
       master and worker(s) processes. Send SIGCONT signal to release
       the pause. (Under gdb: (gdb) signal SIGCONT)

       (Only available if optional MPI support was enabled at
       compile-time.)

--mpi  Run under MPI control with master/worker parallelization (using
       mpirun, for example, or equivalent). Only available if optional
       MPI support was enabled at compile-time.

SEE ALSO
See hmmer(1) for a master man page with a list of all the individual
man pages for programs in the HMMER package.

For complete documentation, see the user guide that came with your
HMMER distribution (Userguide.pdf); or see the HMMER web page
(http://hmmer.org/).

COPYRIGHT
Copyright (C) 2020 Howard Hughes Medical Institute.
Freely distributed under the BSD open source license.

For additional information on copyright and licensing, see the file
called COPYRIGHT in your HMMER source distribution, or see the HMMER
web page (http://hmmer.org/).

AUTHOR
http://eddylab.org

HMMER 3.3.2                       Nov 2020                       hmmscan(1)