esl-mixdchlet(1)                 Easel Manual                 esl-mixdchlet(1)

NAME

       esl-mixdchlet - fitting mixture Dirichlets to count data

SYNOPSIS

       esl-mixdchlet fit [options] Q K in_countfile out_mixdchlet
         (train a new mixture Dirichlet)

       esl-mixdchlet score [options] mixdchlet_file counts_file
         (calculate log likelihood of count data, given mixture Dirichlet)

       esl-mixdchlet gen [options] mixdchlet_file
         (generate synthetic count data from mixture Dirichlet)

       esl-mixdchlet sample [options]
         (sample a random mixture Dirichlet for testing)

DESCRIPTION

       The esl-mixdchlet miniapp is for training mixture Dirichlet priors,
       such as the priors used in HMMER and Infernal. It has four subcommands:
       fit, score, gen, and sample. The most important is fit, which fits a
       new mixture Dirichlet distribution to a collection of count vectors
       (for example, emission or transition count vectors from Pfam or Rfam
       training sets).

       Specifically, esl-mixdchlet fit fits a new mixture Dirichlet
       distribution with Q mixture components to the count vectors (of
       alphabet size K) in input file in_countfile, and saves the mixture
       Dirichlet to output file out_mixdchlet.

       The input count vector file in_countfile contains one count vector of
       K fields per line, for any number of lines. Blank lines and lines
       starting with # (comments) are ignored. Fields are nonnegative real
       values; they do not have to be integers, because they can be weighted
       counts.

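       As an illustration of the count file format described above, the
       following Python sketch reads such a file. The function name
       read_count_vectors is hypothetical and not part of Easel, and
       splitting fields on whitespace is an assumption: the description
       above does not name the field separator.

```python
def read_count_vectors(lines, K):
    """Parse count vectors of length K from an iterable of text lines."""
    vectors = []
    for lineno, line in enumerate(lines, start=1):
        stripped = line.strip()
        # Blank lines and comment lines starting with '#' are ignored.
        if not stripped or stripped.startswith("#"):
            continue
        # Assumption: fields are separated by whitespace.
        fields = [float(f) for f in stripped.split()]
        if len(fields) != K:
            raise ValueError(f"line {lineno}: expected {K} fields, got {len(fields)}")
        if any(f < 0 for f in fields):
            raise ValueError(f"line {lineno}: counts must be nonnegative")
        vectors.append(fields)
    return vectors


# Made-up example data: two weighted count vectors over an alphabet of size 4.
example = """\
# weighted counts, K=4
3 0 1.5 2

0.25 10 0 0
"""
vectors = read_count_vectors(example.splitlines(), K=4)
print(vectors)
```

       Real-valued fields are kept as floats, so weighted counts survive
       the round trip unchanged.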
       The format of a mixture Dirichlet file out_mixdchlet is as follows.
       The first line has two fields, K Q, where K is the alphabet size and Q
       is the number of mixture components. Each of the next Q lines has K+1
       fields: the mixture coefficient q_k for component k, followed by the K
       Dirichlet parameters alpha[k][a] for that component.

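       The layout above can be made concrete with a small Python sketch
       that parses a mixture Dirichlet file into its parts. The function
       name parse_mixdchlet and the example parameter values are
       hypothetical, and whitespace-separated fields are again an
       assumption.

```python
def parse_mixdchlet(text):
    """Return (K, Q, q, alpha) parsed from the text of a mixture Dirichlet file."""
    # Assumption: fields are whitespace-separated; blank lines are skipped.
    rows = [ln.split() for ln in text.splitlines() if ln.strip()]
    if not rows or len(rows[0]) != 2:
        raise ValueError("first line must have two fields: K Q")
    K, Q = int(rows[0][0]), int(rows[0][1])
    if len(rows) != Q + 1:
        raise ValueError(f"expected {Q} component lines, got {len(rows) - 1}")
    q, alpha = [], []
    for k, fields in enumerate(rows[1:]):
        if len(fields) != K + 1:
            raise ValueError(f"component {k}: expected {K + 1} fields, got {len(fields)}")
        q.append(float(fields[0]))                    # mixture coefficient q_k
        alpha.append([float(x) for x in fields[1:]])  # alpha[k][0..K-1]
    return K, Q, q, alpha


# Made-up example: alphabet size K=4, Q=2 components.
example = """\
4 2
0.7 1.0 1.0 1.0 1.0
0.3 0.5 2.0 0.5 2.0
"""
K, Q, q, alpha = parse_mixdchlet(example)
print(K, Q, q, alpha)
```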
       The esl-mixdchlet score subcommand calculates the log likelihood of the
       count vector data in counts_file, given the mixture Dirichlet in
       mixdchlet_file.

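       To show what a log likelihood of count data under a mixture
       Dirichlet can look like, here is a Python sketch. It assumes the
       standard Dirichlet-multinomial form for each component, including
       the multinomial coefficient, and sums the per-vector log
       probabilities. Whether esl-mixdchlet score uses exactly this formula
       is not stated above, so treat it as an illustration only. All
       function names are hypothetical.

```python
import math


def dirichlet_logp_counts(c, alpha):
    """log P(c | alpha) for one component, assuming the Dirichlet-multinomial form."""
    n = sum(c)
    a0 = sum(alpha)
    # lgamma handles real-valued (weighted) counts as well as integers.
    logp = math.lgamma(n + 1.0) + math.lgamma(a0) - math.lgamma(n + a0)
    for ca, aa in zip(c, alpha):
        logp += math.lgamma(ca + aa) - math.lgamma(ca + 1.0) - math.lgamma(aa)
    return logp


def mixdchlet_logp_counts(c, q, alpha):
    """log sum_k q[k] * P(c | alpha[k]), computed stably with log-sum-exp."""
    terms = [math.log(qk) + dirichlet_logp_counts(c, ak)
             for qk, ak in zip(q, alpha) if qk > 0]
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))


def total_log_likelihood(vectors, q, alpha):
    """Sum of per-vector log probabilities (vectors assumed independent)."""
    return sum(mixdchlet_logp_counts(c, q, alpha) for c in vectors)


# Made-up example: one uniform Dirichlet component over an alphabet of size 2.
q = [1.0]
alpha = [[1.0, 1.0]]
print(total_log_likelihood([[1, 1], [2, 0]], q, alpha))
```

       Working in log space with the log-sum-exp step avoids underflow when
       count vectors are long and individual component probabilities are
       tiny.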
       The esl-mixdchlet gen subcommand generates synthetic count data, given
       a mixture Dirichlet.

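       One plausible way to generate synthetic counts from a mixture
       Dirichlet is sketched below in Python: pick a component according to
       the mixture coefficients, draw a probability vector from that
       component's Dirichlet, then draw M counts from it. This procedure is
       an assumption for illustration; the sampling method used by
       esl-mixdchlet gen is not described above. The defaults N=1000 and
       M=100 mirror the documented -N and -M defaults. The function name
       and the seed handling are hypothetical (here every seed is
       reproducible, unlike the tool's seed 0).

```python
import random


def generate_counts(q, alpha, N=1000, M=100, seed=42):
    """Return N count vectors, each holding M counts, from a mixture Dirichlet."""
    rng = random.Random(seed)
    K = len(alpha[0])
    vectors = []
    for _ in range(N):
        # Choose mixture component k with probability q[k].
        k = rng.choices(range(len(q)), weights=q)[0]
        # Sample p ~ Dirichlet(alpha[k]) by normalizing gamma variates.
        g = [rng.gammavariate(a, 1.0) for a in alpha[k]]
        total = sum(g)
        p = [x / total for x in g]
        # Draw M symbols from p and tally them into a count vector.
        counts = [0] * K
        for a in rng.choices(range(K), weights=p, k=M):
            counts[a] += 1
        vectors.append(counts)
    return vectors


# Made-up mixture: alphabet size 4, two components.
q = [0.7, 0.3]
alpha = [[1.0, 1.0, 1.0, 1.0], [0.5, 2.0, 0.5, 2.0]]
vectors = generate_counts(q, alpha, N=5, M=100, seed=42)
for v in vectors:
    print(" ".join(str(x) for x in v))
```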
       The esl-mixdchlet sample subcommand creates a random mixture Dirichlet
       distribution and outputs it to standard output.

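       For testing without the tool at hand, a random mixture Dirichlet in
       the file format described earlier can be produced with a Python
       sketch like the following. The distributions used here (normalized
       exponential variates for the mixture coefficients, uniform values in
       [0.1, 2.0] for the alpha parameters) are arbitrary choices for
       illustration and are not taken from esl-mixdchlet sample. The
       function name is hypothetical. The defaults K=20 and Q=9 mirror the
       documented -K and -Q defaults.

```python
import random


def sample_mixdchlet_text(K=20, Q=9, seed=1):
    """Return the text of a random mixture Dirichlet file with Q components."""
    rng = random.Random(seed)
    # Mixture coefficients: normalized positive variates, so they sum to 1.
    w = [rng.expovariate(1.0) for _ in range(Q)]
    total = sum(w)
    q = [x / total for x in w]
    lines = [f"{K} {Q}"]  # first line: alphabet size, number of components
    for k in range(Q):
        # Arbitrary illustrative range for the Dirichlet parameters.
        alpha = [rng.uniform(0.1, 2.0) for _ in range(K)]
        fields = [f"{q[k]:.4f}"] + [f"{a:.4f}" for a in alpha]
        lines.append(" ".join(fields))
    return "\n".join(lines) + "\n"


text = sample_mixdchlet_text()
print(text, end="")
```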

OPTIONS FOR FIT SUBCOMMAND

       -h     Print brief help specific to the fit subcommand.

       -s <seed>
              Set random number generator seed to nonnegative integer <seed>.
              Default is 0, which means to use a quasirandom arbitrary seed.
              Values >0 give reproducible results.

OPTIONS FOR SCORE SUBCOMMAND

       -h     Print brief help specific to the score subcommand.

OPTIONS FOR GEN SUBCOMMAND

       -h     Print brief help specific to the gen subcommand.

       -s <seed>
              Set random number generator seed to nonnegative integer <seed>.
              Default is 0, which means to use a quasirandom arbitrary seed.
              Values >0 give reproducible results.

       -M <M> Generate <M> counts per sampled vector. (Default 100.)

       -N <N> Generate <N> count vectors. (Default 1000.)

OPTIONS FOR SAMPLE SUBCOMMAND

       -h     Print brief help specific to the sample subcommand.

       -s <seed>
              Set random number generator seed to nonnegative integer <seed>.
              Default is 0, which means to use a quasirandom arbitrary seed.
              Values >0 give reproducible results.

       -K <K> Set the alphabet size to <K>.  (Default is 20, for amino acids.)

       -Q <Q> Set the number of mixture components to <Q>.  (Default is 9.)

SEE ALSO

       http://bioeasel.org/

       Copyright (C) 2020 Howard Hughes Medical Institute.
       Freely distributed under the BSD open source license.

AUTHOR

       http://eddylab.org



Easel 0.48                         Nov 2020                   esl-mixdchlet(1)