makehmmerdb(1)

makehmmerdb(1)                   HMMER Manual                   makehmmerdb(1)

NAME

       makehmmerdb - build nhmmer database from a sequence file

SYNOPSIS

       makehmmerdb [options] seqfile binaryfile

DESCRIPTION

       makehmmerdb creates a binary file from a DNA sequence file. This
       binary file may be used as a target database for the DNA search tool
       nhmmer. With default nhmmer settings, this yields a roughly 10-fold
       acceleration with a small loss of sensitivity on benchmarks.
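
       As a minimal sketch of the workflow described above, the script below
       builds a database and then searches it. The file names genome.fa,
       genome.fm, and query.hmm are hypothetical placeholders, and the sketch
       assumes that nhmmer accepts the binary file directly as its target
       argument; see the nhmmer man page for its target format options.

```shell
#!/bin/sh
# Hypothetical file names; substitute your own.
seqfile="genome.fa"      # DNA sequence file to index
binaryfile="genome.fm"   # binary database to create (the extension is arbitrary)
query="query.hmm"        # query to give nhmmer

status="skipped"
# Only attempt the build when the tool and the input are actually available.
if command -v makehmmerdb >/dev/null 2>&1 && [ -r "$seqfile" ]; then
    if makehmmerdb "$seqfile" "$binaryfile"; then
        status="built"
        # Search the new database if nhmmer and the query are present.
        if command -v nhmmer >/dev/null 2>&1 && [ -r "$query" ]; then
            nhmmer "$query" "$binaryfile"
        fi
    else
        status="failed"
    fi
fi
echo "database build: $status"
```

       The guards keep the script harmless on a machine without HMMER
       installed; in that case it only reports that the build was skipped.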

OPTIONS

       -h     Help; print a brief reminder of command line usage and all
              available options.

OTHER OPTIONS

       --informat <s>
              Assert that input seqfile is in format <s>, bypassing format
              autodetection. Common choices for <s> include: fasta, embl,
              genbank. Alignment formats also work; common choices include:
              stockholm, a2m, afa, psiblast, clustal, phylip. For more
              information, and for codes for some less common formats, see
              the main documentation. The string <s> is case-insensitive
              (fasta or FASTA both work).

       --bin_length <n>
              Bin length. The binary file depends on a data structure called
              the FM index, which organizes a permuted copy of the sequence in
              bins of length <n>. A longer bin length leads to smaller files
              (because data is stored for each bin, and longer bins mean
              fewer bins) and possibly slower query time. The default is 256.
              Values much larger than 512 may noticeably reduce speed.

       --sa_freq <n>
              Suffix array sample rate. The FM index structure also samples
              from the underlying suffix array for the sequence database. More
              frequent sampling (a smaller value for <n>) yields a larger
              file and faster search (until the file becomes large enough
              for I/O to be a bottleneck). <n> must be a power of 2. The
              default is 8.

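       Because --sa_freq must be a power of 2, a wrapper script may want to
       check the value before starting a long build. The following is a
       sketch only: is_power_of_two is a hypothetical helper, not part of
       HMMER, the file names are placeholders, and the script prints the
       command line rather than running it.

```shell
#!/bin/sh
# Succeed if $1 is a positive power of 2 (1, 2, 4, 8, ...).
# Note: plain POSIX sh has no local variables, so n is a global here.
is_power_of_two() {
    n=$1
    # Reject zero, negatives, and non-numeric input.
    [ "$n" -gt 0 ] 2>/dev/null || return 1
    # Halve while even; a power of 2 ends at exactly 1.
    while [ $((n % 2)) -eq 0 ]; do
        n=$((n / 2))
    done
    [ "$n" -eq 1 ]
}

sa_freq=4        # hypothetical choice; the documented default is 8
bin_length=256   # the documented default

if is_power_of_two "$sa_freq"; then
    echo "makehmmerdb --sa_freq $sa_freq --bin_length $bin_length genome.fa genome.fm"
else
    echo "--sa_freq must be a power of 2, got: $sa_freq" >&2
fi
```

       The check only mirrors the constraint stated above; whether very
       small or very large powers of 2 are accepted is for makehmmerdb
       itself to decide.
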
       --block_size <n>
              The input sequence is broken into blocks of <n> million
              letters. An FM index is built for each block, rather than for
              the entire sequence database. The default is 50. Larger blocks
              do not seem to yield a substantial speed increase.
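
       To get a feel for how --block_size relates to database size, the
       sketch below estimates the number of blocks as the total letter count
       divided by <n> million, rounded up. estimate_blocks is a hypothetical
       helper and the 3.1 billion letter database is an invented example;
       the count makehmmerdb actually produces may differ, since this
       estimate ignores how blocks are placed relative to sequence
       boundaries.

```shell
#!/bin/sh
# Rough block count: ceil(total_letters / (block_size * 1,000,000)).
# $1 = total letters in the database, $2 = --block_size value (millions).
estimate_blocks() {
    total_letters=$1
    block_letters=$(($2 * 1000000))
    # Integer ceiling division.
    echo $(( (total_letters + block_letters - 1) / block_letters ))
}

# Hypothetical 3.1 billion letter database with the default --block_size 50.
estimate_blocks 3100000000 50
```

       The arithmetic relies on the shell using 64-bit integers, which is
       typical on current systems.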

COPYRIGHT

       Copyright (C) 2020 Howard Hughes Medical Institute.
       Freely distributed under the BSD open source license.

       For additional information on copyright and licensing, see the file
       called COPYRIGHT in your HMMER source distribution, or see the HMMER
       web page (http://hmmer.org/).

AUTHOR

       http://eddylab.org

HMMER 3.3.2                        Nov 2020                     makehmmerdb(1)