mlpack_gmm_train(1) General Commands Manual mlpack_gmm_train(1)

mlpack_gmm_train - Gaussian mixture model (GMM) training

mlpack_gmm_train [-h] [-v]

This program computes a parametric estimate of a Gaussian mixture model
(GMM), using the EM algorithm to find the maximum likelihood estimate.
The model may be saved to a file, which will contain information about
each Gaussian.

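As an illustration of the EM procedure described above, the following is a
minimal one-dimensional sketch in Python. It is not the mlpack
implementation: the function name em_gmm_1d, the quantile-based
initialization, and the variance floor are assumptions made for this example
only. The max_iterations and tolerance arguments mirror the defaults
documented for --max_iterations and --tolerance.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def em_gmm_1d(data, k, max_iterations=250, tolerance=1e-10):
    """Fit a k-component 1-D GMM with EM (illustrative sketch only)."""
    n = len(data)
    ordered = sorted(data)
    # Assumption: start the means at evenly spaced quantiles of the data.
    means = [ordered[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n
    variances = [overall_var] * k
    weights = [1.0 / k] * k

    prev_ll = -math.inf
    log_likelihood = prev_ll
    iteration = 0
    # A max_iterations of 0 means "run until convergence".
    while max_iterations == 0 or iteration < max_iterations:
        # E-step: responsibility of each component for each point.
        resp = []
        log_likelihood = 0.0
        for x in data:
            p = [w * normal_pdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
            total = sum(p)
            log_likelihood += math.log(total)
            resp.append([value / total for value in p])

        # M-step: re-estimate weights, means, and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            # Assumption: a small floor keeps the variance strictly positive.
            variances[j] = max(var, 1e-6)

        iteration += 1
        if abs(log_likelihood - prev_ll) < tolerance:
            break
        prev_ll = log_likelihood

    # log_likelihood is the value computed in the final E-step.
    return weights, means, variances, log_likelihood
```

Calling em_gmm_1d(data, 2) on a list of floats returns the mixture weights,
means, variances, and the log-likelihood from the last E-step.
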
If GMM training fails with an error indicating that a covariance matrix
could not be inverted, make sure that the --no_force_positive flag is
not specified. Alternatively, adding a small amount of Gaussian noise
(using the --noise parameter) to the entire dataset may help prevent
Gaussians with zero variance in a particular dimension, which is usually
the cause of non-invertible covariance matrices.

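The effect of adding noise can be sketched as follows. This is an
illustration only, not mlpack code: the helper names add_gaussian_noise and
column_variances and the sample points are hypothetical. The second
dimension of the sample data is constant, so its variance is exactly zero
until zero-mean Gaussian noise with the given variance is added.

```python
import math
import random


def add_gaussian_noise(data, noise_variance, seed=1):
    """Return a copy of data with zero-mean Gaussian noise added to every value."""
    rng = random.Random(seed)
    sigma = math.sqrt(noise_variance)  # random.gauss expects a standard deviation
    return [[x + rng.gauss(0.0, sigma) for x in point] for point in data]


def column_variances(data):
    """Population variance of each dimension of a list of points."""
    n = len(data)
    variances = []
    for d in range(len(data[0])):
        column = [point[d] for point in data]
        mean = sum(column) / n
        variances.append(sum((x - mean) ** 2 for x in column) / n)
    return variances


# Hypothetical data: the second dimension is constant.
points = [[0.0, 5.0], [1.0, 5.0], [2.0, 5.0], [3.0, 5.0]]
noisy = add_gaussian_noise(points, 0.01)

print(column_variances(points))  # second entry is zero
print(column_variances(noisy))   # second entry is now positive
```
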
The --no_force_positive flag, if set, skips the checks after each
iteration of the EM algorithm that ensure the covariance matrices
are positive definite. Specifying the flag can reduce runtime,
but may also produce non-positive definite covariance matrices, which
will cause the program to crash.

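To make the idea of a positive definiteness check concrete, here is one
common approach, sketched in Python: attempt a Cholesky factorization and,
if it fails, add an increasing amount to the diagonal. This is an assumption
chosen for illustration and is not necessarily how mlpack enforces the
constraint; the function names and the jitter schedule are hypothetical.

```python
import math


def is_positive_definite(matrix):
    """Return True if a symmetric matrix admits a Cholesky factorization."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                pivot = matrix[i][i] - partial
                if pivot <= 0.0:
                    return False  # a non-positive pivot means not positive definite
                lower[i][j] = math.sqrt(pivot)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return True


def force_positive_definite(matrix, jitter=1e-10):
    """Add growing amounts to the diagonal until the Cholesky check passes."""
    fixed = [row[:] for row in matrix]
    while not is_positive_definite(fixed):
        for i in range(len(fixed)):
            fixed[i][i] += jitter
        jitter *= 10.0  # assumption: geometric growth of the correction
    return fixed


singular = [[1.0, 0.0], [0.0, 0.0]]  # zero variance in the second dimension
print(is_positive_definite(singular))
print(is_positive_definite(force_positive_definite(singular)))
```

Running such a check after every EM iteration costs extra time, which is
consistent with the trade-off described above.
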
Optionally, multiple trials may be performed by specifying the --trials
option. The model with the greatest log-likelihood will be kept.

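The selection rule for multiple trials can be sketched as below. The
function best_of_trials and the train_once callback are hypothetical names
for this example; train_once stands in for one complete training run and is
assumed to return a (model, log_likelihood) pair.

```python
def best_of_trials(train_once, trials):
    """Run train_once(trial_index) several times and keep the best model."""
    best_model = None
    best_ll = float("-inf")
    for trial in range(trials):
        model, log_likelihood = train_once(trial)
        # Keep the model whose log-likelihood is greatest so far.
        if log_likelihood > best_ll:
            best_model, best_ll = model, log_likelihood
    return best_model, best_ll


# Stand-in for real training: fixed, made-up log-likelihoods per trial.
fake_results = {0: -120.5, 1: -98.2, 2: -101.7}


def fake_train(trial):
    return "model-%d" % trial, fake_results[trial]


print(best_of_trials(fake_train, 3))
```
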
--gaussians (-g) [int]
Number of Gaussians in the GMM.

--input_file (-i) [string]
File containing the data on which the model will be fit.

--help (-h)
Default help info.

--info [string]
Get help on a specific module or option. Default value ''.

--input_model_file (-m) [string]
File containing initial input GMM model. Default value ''.

--max_iterations (-n) [int]
Maximum number of iterations of EM algorithm (passing 0 will run
until convergence). Default value 250.

--no_force_positive (-P)
Do not force the covariance matrices to be positive definite.

--noise (-N) [double]
Variance of zero-mean Gaussian noise to add to data. Default
value 0.

--percentage (-p) [double]
If using --refined_start, specify the percentage of the dataset
used for each sampling (should be between 0.0 and 1.0). Default
value 0.02.

--refined_start (-r)
During the initialization, use refined initial positions for
k-means clustering (Bradley and Fayyad, 1998).

--samplings (-S) [int]
If using --refined_start, specify the number of samplings used
for initial points. Default value 100.

--seed (-s) [int]
Random seed. If 0, 'std::time(NULL)' is used. Default value 0.

--tolerance (-T) [double]
Tolerance for convergence of EM. Default value 1e-10.

--trials (-t) [int]
Number of trials to perform in training GMM. Default value 1.

--verbose (-v)
Display informational messages and the full list of parameters
and timers at the end of execution.

--version (-V)
Display the version of mlpack.

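To show how the --percentage and --samplings options listed above relate to
the size of the dataset, the following sketch draws the subsamples that a
refined start would work from. It covers only the subsampling step, not the
full Bradley and Fayyad refinement. The function name draw_samplings, the
rounding of the subsample size, and sampling without replacement are
assumptions made for this example.

```python
import random


def draw_samplings(data, percentage=0.02, samplings=100, seed=1):
    """Draw `samplings` random subsets, each holding `percentage` of the data."""
    if not 0.0 < percentage <= 1.0:
        raise ValueError("percentage should be between 0.0 and 1.0")
    rng = random.Random(seed)
    # Assumption: round down, but always keep at least one point.
    size = max(1, int(percentage * len(data)))
    return [rng.sample(data, size) for _ in range(samplings)]


dataset = list(range(10000))        # stand-in for 10000 data points
subsets = draw_samplings(dataset)   # documented defaults: 0.02 and 100
print(len(subsets), len(subsets[0]))
```

With the documented defaults and 10000 points, each of the 100 samplings
contains 200 points under these assumptions.
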
--output_model_file (-M) [string]
File to save trained GMM model to. Default value ''.

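As a usage illustration, the sketch below assembles a command line from the
options documented on this page. It only builds and prints the command; it
does not run the program. The helper name build_gmm_train_command and the
file names data.csv and gmm.xml are hypothetical, and the file formats
accepted by a given mlpack build are not specified here.

```python
import shlex


def build_gmm_train_command(input_file, gaussians, output_model_file="",
                            trials=1, noise=0.0, verbose=False):
    """Build an argument list for mlpack_gmm_train from documented options."""
    argv = ["mlpack_gmm_train",
            "--input_file", input_file,
            "--gaussians", str(gaussians)]
    # Optional flags are added only when they differ from the documented defaults.
    if output_model_file:
        argv += ["--output_model_file", output_model_file]
    if trials != 1:
        argv += ["--trials", str(trials)]
    if noise != 0.0:
        argv += ["--noise", str(noise)]
    if verbose:
        argv.append("--verbose")
    return argv


# Hypothetical file names, for illustration only.
command = build_gmm_train_command("data.csv", 3,
                                  output_model_file="gmm.xml", trials=5)
print(shlex.join(command))
```

The resulting list can be passed to subprocess.run if the program is
installed and on the PATH.
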
For further information, including relevant papers, citations, and
theory, consult the documentation found at http://www.mlpack.org or
included with your distribution of mlpack.

mlpack_gmm_train(1)