mlpack_gmm_train(1)              User Commands             mlpack_gmm_train(1)


NAME
       mlpack_gmm_train - gaussian mixture model (gmm) training

SYNOPSIS
       mlpack_gmm_train -g int -i string [-d bool] [-m unknown] [-n int]
       [-P bool] [-N double] [-p double] [-r bool] [-S int] [-s int]
       [-T double] [-t int] [-V bool] [-M unknown] [-h -v]

DESCRIPTION
       This program fits a parametric Gaussian mixture model (GMM) using
       the EM algorithm to find the maximum likelihood estimate.  The
       model may be saved and reused by other mlpack GMM tools.

       The input data to train on must be specified with the
       '--input_file (-i)' parameter, and the number of Gaussians in the
       model must be specified with the '--gaussians (-g)' parameter.
       Optionally, many trials with different random initializations may
       be run, and the result with the highest log-likelihood on the
       training data will be taken.  The number of trials to run is
       specified with the '--trials (-t)' parameter.  By default, only
       one trial is run.

       The tolerance for convergence and the maximum number of iterations
       of the EM algorithm are specified with the '--tolerance (-T)' and
       '--max_iterations (-n)' parameters, respectively.  The GMM may be
       initialized for training with another model, specified with the
       '--input_model_file (-m)' parameter.  Otherwise, the model is
       initialized by running k-means on the data.  The k-means
       clustering initialization can be controlled with the
       '--refined_start (-r)', '--samplings (-S)', and '--percentage
       (-p)' parameters.  If '--refined_start (-r)' is specified, then
       the Bradley-Fayyad refined start initialization will be used.
       This can often lead to better clustering results.

       The '--diagonal_covariance (-d)' flag will cause the learned
       covariances to be diagonal matrices.  This significantly
       simplifies the model itself and makes training faster, but
       restricts the ability to fit more complex GMMs.

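       As a minimal sketch (plain NumPy, not mlpack's internals), a
       diagonal covariance keeps only the per-dimension variances of the
       full sample covariance; inverting it reduces to reciprocating the
       diagonal, which is part of why training with '-d' is faster:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))  # 200 points in 4 dimensions

full_cov = np.cov(X, rowvar=False)             # full 4x4 covariance
diag_cov = np.diag(np.var(X, axis=0, ddof=1))  # diagonal approximation

# Inverting a diagonal matrix is just the elementwise reciprocal of
# its diagonal -- no general matrix inversion needed.
inv_diag = np.diag(1.0 / np.diag(diag_cov))
print(np.allclose(inv_diag @ diag_cov, np.eye(4)))  # True
```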
       If GMM training fails with an error indicating that a covariance
       matrix could not be inverted, make sure that the
       '--no_force_positive (-P)' parameter is not specified.
       Alternately, adding a small amount of Gaussian noise (using the
       '--noise (-N)' parameter) to the entire dataset may help prevent
       Gaussians with zero variance in a particular dimension, which is
       usually the cause of non-invertible covariance matrices.

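       A small sketch (plain NumPy, not mlpack code) of why a
       zero-variance dimension yields a non-invertible covariance, and
       how zero-mean noise avoids it:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X[:, 2] = 5.0  # one dimension is constant: zero variance

cov = np.cov(X, rowvar=False)
print(np.linalg.matrix_rank(cov))  # 2 -- singular, cannot be inverted

# Adding zero-mean Gaussian noise restores full rank.  Note that
# '--noise' takes a variance, while NumPy's scale= is a standard
# deviation, so scale=0.1 corresponds roughly to --noise 0.01.
X_noisy = X + rng.normal(scale=0.1, size=X.shape)
cov_noisy = np.cov(X_noisy, rowvar=False)
print(np.linalg.matrix_rank(cov_noisy))  # 3 -- invertible
```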
       The '--no_force_positive (-P)' parameter, if set, will avoid the
       checks after each iteration of the EM algorithm which ensure that
       the covariance matrices are positive definite.  Specifying the
       flag can give faster runtime, but may also produce non-positive
       definite covariance matrices, which will cause the program to
       crash.

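       The kind of check being skipped can be sketched with a Cholesky
       factorization, which succeeds exactly for symmetric positive
       definite matrices (an illustration, not mlpack's actual
       implementation):

```python
import numpy as np

def is_positive_definite(cov):
    """Return True if the symmetric matrix cov is positive definite."""
    try:
        np.linalg.cholesky(cov)  # raises LinAlgError if not pos. def.
        return True
    except np.linalg.LinAlgError:
        return False

print(is_positive_definite(np.array([[2.0, 0.5], [0.5, 1.0]])))  # True
print(is_positive_definite(np.array([[1.0, 2.0], [2.0, 1.0]])))  # False
```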
       As an example, to train a 6-Gaussian GMM on the data in
       'data.csv' with a maximum of 100 iterations of EM and 3 trials,
       saving the trained GMM to 'gmm.bin', the following command can be
       used:

       $ mlpack_gmm_train --input_file data.csv --gaussians 6 --trials 3
             --max_iterations 100 --output_model_file gmm.bin

       To re-train that GMM on another set of data 'data2.csv', the
       following command may be used:

       $ mlpack_gmm_train --input_model_file gmm.bin --input_file
             data2.csv --gaussians 6 --output_model_file new_gmm.bin

REQUIRED INPUT OPTIONS
       --gaussians (-g) [int]
              Number of Gaussians in the GMM.

       --input_file (-i) [string]
              The training data on which the model will be fit.

OPTIONAL INPUT OPTIONS
       --diagonal_covariance (-d) [bool]
              Force the covariance of the Gaussians to be diagonal.
              This can significantly accelerate training.

       --help (-h) [bool]
              Default help info.

       --info [string]
              Get help on a specific module or option.  Default value ''.

       --input_model_file (-m) [unknown]
              Initial input GMM model to start training with.  Default
              value ''.

       --max_iterations (-n) [int]
              Maximum number of iterations of the EM algorithm (passing
              0 will run until convergence).  Default value 250.

       --no_force_positive (-P) [bool]
              Do not force the covariance matrices to be positive
              definite.

       --noise (-N) [double]
              Variance of zero-mean Gaussian noise to add to the data.
              Default value 0.

       --percentage (-p) [double]
              If using --refined_start, specify the percentage of the
              dataset used for each sampling (should be between 0.0 and
              1.0).  Default value 0.02.

       --refined_start (-r) [bool]
              During initialization, use refined initial positions for
              k-means clustering (Bradley and Fayyad, 1998).

       --samplings (-S) [int]
              If using --refined_start, specify the number of samplings
              used for initial points.  Default value 100.

       --seed (-s) [int]
              Random seed.  If 0, 'std::time(NULL)' is used.  Default
              value 0.

       --tolerance (-T) [double]
              Tolerance for convergence of EM.  Default value 1e-10.

       --trials (-t) [int]
              Number of trials to perform when training the GMM.
              Default value 1.

       --verbose (-v) [bool]
              Display informational messages and the full list of
              parameters and timers at the end of execution.

       --version (-V) [bool]
              Display the version of mlpack.

OPTIONAL OUTPUT OPTIONS
       --output_model_file (-M) [unknown]
              Output file for the trained GMM model.  Default value ''.

ADDITIONAL INFORMATION
       For further information, including relevant papers, citations,
       and theory, consult the documentation found at
       http://www.mlpack.org or included with your distribution of
       mlpack.


mlpack-3.0.4                  21 February 2019             mlpack_gmm_train(1)