esl-mixdchlet(1) Easel Manual esl-mixdchlet(1)

NAME
esl-mixdchlet - fitting mixture Dirichlets to count data

SYNOPSIS
esl-mixdchlet fit [options] Q K in_countfile out_mixdchlet
(train a new mixture Dirichlet)

esl-mixdchlet score [options] mixdchlet_file counts_file
(calculate log likelihood of count data, given a mixture Dirichlet)

esl-mixdchlet gen [options] mixdchlet_file
(generate synthetic count data from a mixture Dirichlet)

esl-mixdchlet sample [options]
(sample a random mixture Dirichlet for testing)

DESCRIPTION
The esl-mixdchlet miniapp is for training mixture Dirichlet priors,
such as the priors used in HMMER and Infernal. It has four subcommands:
fit, score, gen, and sample. The most important is fit, which fits a
new mixture Dirichlet distribution to a collection of count vectors
(for example, emission or transition count vectors from Pfam or Rfam
training sets).

Specifically, esl-mixdchlet fit fits a new mixture Dirichlet
distribution with Q mixture components to the count vectors (of
alphabet size K) in the input file in_countfile, and saves the mixture
Dirichlet to the output file out_mixdchlet.

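To make the fitting problem concrete, here is a sketch of the objective, assuming (this page does not say so) that fit is a maximum-likelihood fit. Writing |c| for the total count of a vector c and |alpha_k| for the sum of alpha[k][a] over a, the fitted parameters would maximize the total log likelihood of the N count vectors c_1 ... c_N, with each component contributing the Dirichlet-multinomial probability of a count vector:

```latex
\hat{q}, \hat{\alpha} = \operatorname*{arg\,max}_{q,\,\alpha}
  \sum_{i=1}^{N} \log \sum_{k=1}^{Q} q_k \, P(c_i \mid \alpha_k),
  \qquad q_k \ge 0, \quad \sum_{k=1}^{Q} q_k = 1, \quad \alpha_{k,a} > 0

P(c \mid \alpha_k) =
  \frac{\Gamma(|c| + 1)}{\prod_{a=1}^{K} \Gamma(c_a + 1)}
  \cdot \frac{\Gamma(|\alpha_k|)}{\Gamma(|c| + |\alpha_k|)}
  \cdot \prod_{a=1}^{K} \frac{\Gamma(c_a + \alpha_{k,a})}{\Gamma(\alpha_{k,a})}
```

Gamma functions stand in for factorials so that weighted (non-integer) counts are allowed. The first factor depends only on the data, so including or omitting it does not change the fitted parameters; it only shifts the log likelihood by a constant.
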
The input count vector file in_countfile contains one count vector of
K fields per line, for any number of lines. Blank lines and lines
starting with # (comments) are ignored. Fields are nonnegative real
values; they need not be integers, because they may be weighted
counts.

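A minimal Python sketch of a reader for this count-vector format, handy for checking a file before running fit. The function name read_counts is hypothetical, and two details are assumptions rather than documented behavior: leading whitespace is stripped before testing for #, and a wrong number of fields or a negative value is treated as an error.

```python
def read_counts(lines, K):
    """Parse count vectors: one vector of K nonnegative reals per line.

    Blank lines and lines starting with '#' (after stripping leading
    whitespace, an assumption) are skipped.
    """
    vectors = []
    for lineno, line in enumerate(lines, start=1):
        s = line.strip()
        if not s or s.startswith("#"):
            continue  # blank line or comment
        fields = s.split()
        if len(fields) != K:
            raise ValueError(f"line {lineno}: expected {K} fields, got {len(fields)}")
        vec = [float(f) for f in fields]  # weighted counts may be non-integer
        if any(x < 0 for x in vec):
            raise ValueError(f"line {lineno}: counts must be nonnegative")
        vectors.append(vec)
    return vectors


if __name__ == "__main__":
    example = "# weighted counts, K = 4\n3 0 1.5 2\n\n0 0 0 7\n"
    print(read_counts(example.splitlines(), K=4))
    # [[3.0, 0.0, 1.5, 2.0], [0.0, 0.0, 0.0, 7.0]]
```
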
The mixture Dirichlet file format (as written to out_mixdchlet) is as
follows. The first line has two fields, K Q, where K is the alphabet
size and Q is the number of mixture components; note that this is the
reverse of the Q K argument order of fit. Each of the next Q lines has
K+1 fields: the first is the mixture coefficient q_k of component k,
followed by the K Dirichlet parameters alpha[k][a] for that component.

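The file layout can be read and written with a short Python sketch. read_mixdchlet and write_mixdchlet are hypothetical names; skipping blank lines and writing numbers with Python's shortest round-trip repr are assumptions, and Easel itself may print a fixed number of digits.

```python
def read_mixdchlet(text):
    """Parse 'K Q' on the first line, then Q lines of: q_k alpha[k][0] ... alpha[k][K-1]."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    K, Q = int(rows[0][0]), int(rows[0][1])  # note: file order is K then Q
    if len(rows) != Q + 1:
        raise ValueError(f"expected {Q} component lines, got {len(rows) - 1}")
    q, alpha = [], []
    for k, row in enumerate(rows[1:]):
        if len(row) != K + 1:
            raise ValueError(f"component {k}: expected {K + 1} fields, got {len(row)}")
        q.append(float(row[0]))                    # mixture coefficient q_k
        alpha.append([float(x) for x in row[1:]])  # Dirichlet parameters alpha[k][a]
    return q, alpha


def write_mixdchlet(q, alpha):
    """Format mixture coefficients q and parameters alpha[k][a] in the same layout."""
    K, Q = len(alpha[0]), len(q)
    lines = [f"{K} {Q}"]
    for qk, ak in zip(q, alpha):
        lines.append(" ".join(repr(x) for x in [qk] + ak))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(write_mixdchlet([1.0], [[1.0, 1.0]]), end="")
    # 2 1
    # 1.0 1.0 1.0
```
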
The esl-mixdchlet score subcommand calculates the log likelihood of
the count vector data in counts_file, given the mixture Dirichlet in
mixdchlet_file.

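A Python sketch of the quantity score reports, assuming it is the mixture Dirichlet-multinomial log likelihood summed over all count vectors. Assumptions: natural logarithms (this page does not say which units score prints), and the multinomial coefficient is included; if Easel omits it, the results differ by a constant that depends only on the data. The function names are hypothetical.

```python
import math


def log_dirichlet_multinomial(c, alpha):
    """log P(c | alpha) for one count vector c under one Dirichlet component.

    lgamma replaces log-factorials, so weighted (non-integer) counts work.
    """
    n, A = sum(c), sum(alpha)
    lp = math.lgamma(n + 1) - sum(math.lgamma(x + 1) for x in c)  # multinomial coefficient
    lp += math.lgamma(A) - math.lgamma(n + A)
    lp += sum(math.lgamma(x + a) - math.lgamma(a) for x, a in zip(c, alpha))
    return lp


def log_mixture(c, q, alpha):
    """log sum_k q_k P(c | alpha_k), using log-sum-exp to avoid underflow."""
    terms = [math.log(qk) + log_dirichlet_multinomial(c, ak)
             for qk, ak in zip(q, alpha) if qk > 0]
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))


def score(counts, q, alpha):
    """Total log likelihood (natural log) of all count vectors."""
    return sum(log_mixture(c, q, alpha) for c in counts)


if __name__ == "__main__":
    # Uniform Dirichlet over K = 2: every split of 3 counts is equally likely.
    print(score([[2, 1]], [1.0], [[1.0, 1.0]]))  # approximately -1.386 (= log 1/4)
```
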
The esl-mixdchlet gen subcommand generates synthetic count data, given
the mixture Dirichlet in mixdchlet_file.

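A sketch of how synthetic counts can be generated from a mixture Dirichlet, assuming gen follows the standard generative model (this page does not describe its algorithm): choose a component k with probability q_k, draw a probability vector p from Dirichlet(alpha[k]) by normalizing Gamma draws, then draw M observations from p. The defaults M = 100 and N = 1000 mirror the documented -M and -N options; the function names and the use of Python's random module are assumptions.

```python
import random


def sample_dirichlet(rng, alpha):
    """Draw p ~ Dirichlet(alpha) by normalizing independent Gamma(alpha_a, 1) draws."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(g)
    return [x / total for x in g]


def gen_counts(q, alpha, M=100, N=1000, seed=None):
    """Generate N synthetic count vectors, each summing to M.

    For each vector: pick component k with probability q_k, draw
    p ~ Dirichlet(alpha[k]), then tally M draws from p.
    """
    rng = random.Random(seed)
    K = len(alpha[0])
    data = []
    for _ in range(N):
        k = rng.choices(range(len(q)), weights=q)[0]
        p = sample_dirichlet(rng, alpha[k])
        c = [0] * K
        for a in rng.choices(range(K), weights=p, k=M):
            c[a] += 1
        data.append(c)
    return data


if __name__ == "__main__":
    for c in gen_counts([0.5, 0.5], [[1.0, 1.0, 1.0], [5.0, 1.0, 1.0]], N=3, seed=1):
        print(*c)
```

The output lines have the same shape as a count vector file, so they could be fed back to a reader or to fit.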
The esl-mixdchlet sample subcommand creates a random mixture Dirichlet
distribution and writes it to standard output.

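For experimenting outside Easel, a random mixture Dirichlet in the documented file layout can be produced as below. The sampling scheme is made up for illustration and is not necessarily what sample does: q is drawn uniformly from the probability simplex and each alpha[k][a] uniformly from [0.1, 2.0]. The defaults K = 20 and Q = 9 follow the documented -K and -Q options; the function names are hypothetical.

```python
import random


def sample_mixdchlet(K=20, Q=9, seed=None):
    """Return (q, alpha) for a random Q-component mixture Dirichlet over K symbols.

    Illustrative scheme only: q ~ Dirichlet(1, ..., 1), i.e. uniform on the
    simplex (normalized exponential draws), and alpha[k][a] ~ Uniform[0.1, 2.0].
    """
    rng = random.Random(seed)
    e = [rng.expovariate(1.0) for _ in range(Q)]
    total = sum(e)
    q = [x / total for x in e]
    alpha = [[rng.uniform(0.1, 2.0) for _ in range(K)] for _ in range(Q)]
    return q, alpha


def format_mixdchlet(q, alpha):
    """Lay out 'K Q', then one 'q_k alpha[k][0] ... alpha[k][K-1]' line per component."""
    K, Q = len(alpha[0]), len(q)
    lines = [f"{K} {Q}"]
    for qk, ak in zip(q, alpha):
        lines.append(" ".join(f"{x:.6g}" for x in [qk] + ak))
    return "\n".join(lines)


if __name__ == "__main__":
    # Like sample, write the result to standard output.
    print(format_mixdchlet(*sample_mixdchlet(seed=42)))
```
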
OPTIONS FOR FIT SUBCOMMAND
-h Print brief help specific to the fit subcommand.

-s <seed>
Set the random number generator seed to nonnegative integer <seed>.
The default is 0, which means use an arbitrary quasirandom seed.
Values >0 give reproducible results.

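Scripts that drive or imitate these subcommands can adopt the same seed convention, which fit, gen, and sample share. This hypothetical Python helper mimics the documented rule (0 means an arbitrary seed, values >0 are reproducible) using Python's random module; it will not reproduce Easel's own random number streams.

```python
import random


def make_rng(seed=0):
    """Return a random.Random that follows the -s convention described above."""
    if seed < 0:
        raise ValueError("seed must be a nonnegative integer")
    if seed == 0:
        # No fixed seed: Python seeds from OS randomness (or the time), so runs differ.
        return random.Random()
    # Fixed positive seed: the same seed always yields the same sequence.
    return random.Random(seed)
```
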
OPTIONS FOR SCORE SUBCOMMAND
-h Print brief help specific to the score subcommand.

OPTIONS FOR GEN SUBCOMMAND
-h Print brief help specific to the gen subcommand.

-s <seed>
Set the random number generator seed to nonnegative integer <seed>.
The default is 0, which means use an arbitrary quasirandom seed.
Values >0 give reproducible results.

-M <M> Generate <M> counts per sampled vector. (Default is 100.)

-N <N> Generate <N> count vectors. (Default is 1000.)

OPTIONS FOR SAMPLE SUBCOMMAND
-h Print brief help specific to the sample subcommand.

-s <seed>
Set the random number generator seed to nonnegative integer <seed>.
The default is 0, which means use an arbitrary quasirandom seed.
Values >0 give reproducible results.

-K <K> Set the alphabet size to <K>. (Default is 20, for amino acids.)

-Q <Q> Set the number of mixture components to <Q>. (Default is 9.)

SEE ALSO
http://bioeasel.org/

COPYRIGHT
Copyright (C) 2020 Howard Hughes Medical Institute.
Freely distributed under the BSD open source license.

AUTHOR
http://eddylab.org

Easel 0.48 Nov 2020 esl-mixdchlet(1)