mlpack_logistic_regression(1) User Commands mlpack_logistic_regression(1)

NAME
mlpack_logistic_regression - L2-regularized logistic regression and
prediction

SYNOPSIS
mlpack_logistic_regression [-b int] [-d double] [-m unknown] [-l string] [-L double] [-n int] [-O string] [-s double] [-T string] [-e double] [-t string] [-V bool] [-o string] [-M unknown] [-p string] [-h -v]

DESCRIPTION
An implementation of L2-regularized logistic regression using either
the L-BFGS optimizer or SGD (stochastic gradient descent). This solves
the regression problem

y = 1 / (1 + e^-(X * b))

where y takes values 0 or 1.
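The formula above can be evaluated directly for a single point. The
following Python sketch is illustrative only: the function name
logistic_probability is hypothetical, it is not part of mlpack, and it
assumes any intercept term is already folded into b.

```python
import math


def logistic_probability(x, b):
    """Return 1 / (1 + e^-(x . b)) for one point x and coefficients b."""
    if len(x) != len(b):
        raise ValueError("x and b must have the same length")
    z = sum(xi * bi for xi, bi in zip(x, b))
    # Branch on the sign of z so math.exp never receives a large
    # positive argument, which would raise OverflowError.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)
```

The result is a value strictly between 0 and 1 for moderate inputs; it
is turned into a 0 or 1 label only when a threshold is applied.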

This program allows loading a logistic regression model (via the
'--input_model_file (-m)' parameter), training a logistic regression
model given training data (specified with the '--training_file (-t)'
parameter), or both at once. In addition, this program allows
classification on a test dataset (specified with the '--test_file (-T)'
parameter), and the classification results may be saved with the
'--output_file (-o)' output parameter. The trained logistic regression
model may be saved using the '--output_model_file (-M)' output
parameter.

The training data, if specified, may have class labels as its last
dimension. Alternatively, the '--labels_file (-l)' parameter may be
used to specify a separate matrix of labels.

When a model is being trained, several options are available. L2
regularization (to prevent overfitting) can be specified with the
'--lambda (-L)' option, and the optimizer used to train the model can
be specified with the '--optimizer (-O)' parameter. Available
optimizers are 'sgd' (stochastic gradient descent) and 'lbfgs' (the
L-BFGS optimizer). There are also various parameters for the optimizer:
the '--max_iterations (-n)' parameter specifies the maximum number of
allowed iterations, and the '--tolerance (-e)' parameter specifies the
tolerance for convergence. For the SGD optimizer, the '--step_size
(-s)' parameter controls the step size taken at each iteration, and the
'--batch_size (-b)' parameter controls the batch size. If the objective
function for your data is oscillating between Inf and 0, the step size
is probably too large. The optimizers have more parameters, but these
are accessible only through the C++ interface.
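To make the roles of '--lambda (-L)', '--step_size (-s)', and
'--batch_size (-b)' concrete, here is a minimal Python sketch of one
mini-batch SGD update. It is not mlpack's implementation: the names
sgd_step and sigmoid are hypothetical, and the penalty is assumed to
take the common form (lambda / 2) * ||b||^2, with the data gradient
averaged over the batch.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sgd_step(b, batch, step_size, lam):
    """Return updated coefficients after one mini-batch update.

    b         -- current coefficients (list of floats)
    batch     -- list of (x, y) pairs, y in {0, 1}; its length is the batch size
    step_size -- step taken along the negative gradient
    lam       -- L2 penalty weight (assumed penalty: lam / 2 * ||b||^2)
    """
    if not batch:
        raise ValueError("batch must contain at least one point")
    # Gradient of the assumed L2 penalty.
    grad = [lam * bj for bj in b]
    # Add the averaged gradient of the negative log-likelihood.
    for x, y in batch:
        error = sigmoid(sum(xj * bj for xj, bj in zip(x, b))) - y
        for j, xj in enumerate(x):
            grad[j] += error * xj / len(batch)
    return [bj - step_size * gj for bj, gj in zip(b, grad)]
```

In this sketch a larger step_size moves b further on every update,
which is how an overly large value can make successive objective
values swing rather than settle.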

For SGD, an iteration refers to a single point. So to take a single
pass over the dataset with SGD, '--max_iterations (-n)' should be set
to the number of points in the dataset.
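As a worked example of this rule, the sketch below computes a value for
'--max_iterations (-n)' from a number of passes. It assumes that the
training data is text with one point per non-empty line and no header
line; count_points and iterations_for_passes are hypothetical helper
names, not mlpack functions.

```python
def count_points(lines):
    """Count points, assuming one point per non-empty line and no header."""
    return sum(1 for line in lines if line.strip())


def iterations_for_passes(num_points, passes=1):
    """Return num_points * passes, the iteration count for that many passes."""
    if num_points < 1 or passes < 1:
        raise ValueError("num_points and passes must both be at least 1")
    return num_points * passes
```

For instance, count_points can be given an open file handle for
'data.csv', and the result passed to iterations_for_passes with
passes=1 for a single pass.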

Optionally, the model can be used to predict the responses for another
matrix of data points, if '--test_file (-T)' is specified. The
'--test_file (-T)' parameter can be specified without the
'--training_file (-t)' parameter, so long as an existing logistic
regression model is given with the '--input_model_file (-m)' parameter.
The output predictions from the logistic regression model may be saved
with the '--output_file (-o)' parameter.
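How a predicted probability becomes a class label is governed by the
'--decision_boundary (-d)' option documented in this page: values below
the boundary map to class 0, and all other values map to class 1. A
minimal Python sketch of that rule follows; classify is a hypothetical
name used only for illustration.

```python
def classify(probability, decision_boundary=0.5):
    """Map a logistic-function value to class 0 or 1.

    Values strictly below the boundary give class 0; values equal to
    or above it give class 1. The default boundary of 0.5 matches the
    documented default of --decision_boundary (-d).
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    return 0 if probability < decision_boundary else 1
```

Raising the boundary makes class 1 predictions rarer, and lowering it
makes them more common.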

This implementation of logistic regression does not support the general
multi-class case but instead only the two-class case. Any labels must
be either 0 or 1. For more classes, see the softmax_regression program.

As an example, to train a logistic regression model on the data
'data.csv' with labels 'labels.csv' with L2 regularization of 0.1,
saving the model to 'lr_model.bin', the following command may be used:

$ mlpack_logistic_regression --training_file data.csv --labels_file labels.csv \
    --lambda 0.1 --output_model_file lr_model.bin

Then, to use that model to predict classes for the dataset 'test.csv',
storing the output predictions in 'predictions.csv', the following
command may be used:

$ mlpack_logistic_regression --input_model_file lr_model.bin --test_file test.csv \
    --output_file predictions.csv

OPTIONAL INPUT OPTIONS
--batch_size (-b) [int]
Batch size for SGD. Default value 64.

--decision_boundary (-d) [double]
Decision boundary for prediction; if the logistic function for a
point is less than the boundary, the class is taken to be 0;
otherwise, the class is 1. Default value 0.5.

--help (-h) [bool]
Default help info.

--info [string]
Get help on a specific module or option. Default value ''.

--input_model_file (-m) [unknown]
Existing model (parameters). Default value ''.

--labels_file (-l) [string]
A matrix containing labels (0 or 1) for the points in the training
set (y). Default value ''.

--lambda (-L) [double]
L2-regularization parameter for training. Default value 0.

--max_iterations (-n) [int]
Maximum iterations for optimizer (0 indicates no limit). Default
value 10000.

--optimizer (-O) [string]
Optimizer to use for training ('lbfgs' or 'sgd'). Default value
'lbfgs'.

--step_size (-s) [double]
Step size for SGD optimizer. Default value 0.01.

--test_file (-T) [string]
Matrix containing test dataset. Default value ''.

--tolerance (-e) [double]
Convergence tolerance for optimizer. Default value 1e-10.

--training_file (-t) [string]
A matrix containing the training set (the matrix of predictors,
X). Default value ''.

--verbose (-v) [bool]
Display informational messages and the full list of parameters
and timers at the end of execution.

--version (-V) [bool]
Display the version of mlpack.

OPTIONAL OUTPUT OPTIONS
--output_file (-o) [string]
If test data is specified, this matrix is where the predictions
for the test set will be saved. Default value ''.

--output_model_file (-M) [unknown]
Output for trained logistic regression model. Default value ''.

--output_probabilities_file (-p) [string]
If test data is specified, this matrix is where the class
probabilities for the test set will be saved. Default value ''.

ADDITIONAL INFORMATION
For further information, including relevant papers, citations, and
theory, consult the documentation found at http://www.mlpack.org or
included with your distribution of mlpack.

mlpack-3.0.4 21 February 2019 mlpack_logistic_regression(1)