mlpack_logistic_regression(1)   General Commands Manual   mlpack_logistic_regression(1)

mlpack_logistic_regression - L2-regularized logistic regression and
prediction

mlpack_logistic_regression [-h] [-v]

An implementation of L2-regularized logistic regression using either
the L-BFGS optimizer or SGD (stochastic gradient descent). This solves
the regression problem

y = 1 / (1 + e^-(X * b))

where y takes values 0 or 1.

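As a rough illustration of the formula above, the following Python
sketch evaluates the logistic function for a single data point. It is
not mlpack code: the name logistic is hypothetical, and the sketch
assumes b is a plain list of coefficients with any intercept already
folded into x as a constant feature.

```python
import math


def logistic(x, b):
    """Return 1 / (1 + e^-(x . b)) for one point x and coefficients b."""
    z = sum(xi * bi for xi, bi in zip(x, b))
    # Branch on the sign of z so that math.exp never overflows.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)
```

The result is a value between 0 and 1 that can be read as the estimated
probability that the point belongs to class 1.
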
This program allows loading a logistic regression model from a file
(-m), training a logistic regression model given training data (-t),
or both at once. In addition, this program allows classification on a
test dataset (-T) and will save the classification results to the
given output file (-o). The logistic regression model itself may be
saved to a file specified with the -M option.

The training data given with the -t option should have class labels as
its last dimension (so, if the training data is in CSV format, labels
should be the last column). Alternatively, the -l (--labels_file)
option may be used to specify a separate file of labels.

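To make the expected layout concrete, here is a minimal Python sketch
that separates a CSV training set into predictors and labels, taking
the last column as the label. The function name split_features_labels
and the sample data are made up for this example, and the sketch
assumes a purely numeric CSV with no header row.

```python
import csv
import io


def split_features_labels(csv_text):
    """Split CSV rows into (features, labels); the last column is the label."""
    features, labels = [], []
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines
        values = [float(v) for v in row]
        if values[-1] not in (0.0, 1.0):
            raise ValueError(f"label must be 0 or 1, got {row[-1]}")
        features.append(values[:-1])
        labels.append(int(values[-1]))
    return features, labels


# Hypothetical two-point training set: two predictors, then the label.
example = "1.0,2.0,0\n3.5,0.5,1\n"
X, y = split_features_labels(example)
```
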
When a model is being trained, there are many options. L2
regularization (to prevent overfitting) can be specified with the
--lambda (-L) option, and the optimizer used to train the model can be
specified with the --optimizer (-O) option. Available optimizers are
'sgd' (stochastic gradient descent), 'lbfgs' (the L-BFGS optimizer),
and 'minibatch-sgd' (mini-batch stochastic gradient descent). There are
also various parameters for the optimizer; the --max_iterations (-n)
parameter specifies the maximum number of allowed iterations, and the
--tolerance (-e) parameter specifies the tolerance for convergence. For
the SGD and mini-batch SGD optimizers, the --step_size (-s) parameter
controls the step size taken at each iteration by the optimizer. The
batch size for mini-batch SGD is controlled with the --batch_size (-b)
parameter. If the objective function for your data is oscillating
between Inf and 0, the step size is probably too large. There are more
parameters for the optimizers, but the C++ interface must be used to
access these.

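The roles of the step size and the L2 parameter can be sketched with a
single SGD update on one point. This is an illustration only, not
mlpack's implementation: the names sigmoid and sgd_step are
hypothetical, and the sketch assumes the per-point objective is the
negative log-likelihood plus (lam / 2) * ||b||^2, which this page does
not specify.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def sgd_step(b, x, y, step_size, lam):
    """One SGD update of coefficients b on a single point (x, y).

    Assumed objective for the point (an assumption of this sketch):
    -log-likelihood + (lam / 2) * ||b||^2.
    """
    p = sigmoid(sum(bi * xi for bi, xi in zip(b, x)))
    # Gradient: data term (p - y) * x plus the L2 penalty term lam * b.
    grad = [(p - y) * xi + lam * bi for bi, xi in zip(b, x)]
    return [bi - step_size * gi for bi, gi in zip(b, grad)]
```

Each update moves b against the gradient by a distance proportional to
step_size, so a larger step size produces larger jumps between
iterations.
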
For SGD, an iteration refers to a single point, and for mini-batch SGD,
an iteration refers to a single batch. So to take a single pass over
the dataset with SGD, --max_iterations should be set to the number of
points in the dataset.

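The arithmetic above can be sketched as a small Python helper that
computes a value for --max_iterations. The name iterations_for_passes
is hypothetical. The sketch also assumes that a final, partially filled
mini-batch counts as one iteration, which this page does not state.

```python
import math


def iterations_for_passes(num_points, passes=1, batch_size=None):
    """Iterations needed for `passes` passes over `num_points` points.

    batch_size=None models plain SGD (one point per iteration).
    Otherwise one iteration is one mini-batch of up to batch_size points.
    """
    if num_points <= 0 or passes <= 0:
        raise ValueError("num_points and passes must be positive")
    if batch_size is None:
        return passes * num_points
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    # Assumption: a trailing partial batch still costs one iteration.
    return passes * math.ceil(num_points / batch_size)
```

For example, under these assumptions a dataset of 1000 points needs
1000 iterations per pass with SGD, and 20 iterations per pass with
mini-batch SGD at the default batch size of 50.
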
Optionally, the model can be used to predict the responses for another
matrix of data points, if --test_file is specified. The --test_file
option can be specified without --training_file, so long as an existing
logistic regression model is given with --input_model_file. The output
predictions from the logistic regression model are stored in the file
given with --output_file.

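Predictions are class labels obtained by comparing the value of the
logistic function for each point with the decision boundary (see
--decision_boundary). A minimal Python sketch of that rule follows; the
function name classify and the sample probabilities are invented for
illustration.

```python
def classify(probability, decision_boundary=0.5):
    """Return 0 if probability is below the boundary, otherwise 1."""
    return 0 if probability < decision_boundary else 1


# Hypothetical logistic function values for three test points.
probabilities = [0.10, 0.50, 0.93]
predictions = [classify(p) for p in probabilities]
strict = [classify(p, decision_boundary=0.95) for p in probabilities]
```

A value exactly equal to the boundary is assigned to class 1, because
only values strictly less than the boundary map to class 0.
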
This implementation of logistic regression does not support the general
multi-class case but instead only the two-class case. Any responses
must be either 0 or 1.

--batch_size (-b) [int]
Batch size for mini-batch SGD. Default value 50.

--decision_boundary (-d) [double]
Decision boundary for prediction; if the logistic function for a point
is less than the boundary, the class is taken to be 0; otherwise, the
class is 1. Default value 0.5.

--help (-h)
Default help info.

--info [string]
Get help on a specific module or option. Default value ''.

--input_model_file (-m) [string]
File containing existing model (parameters). Default value ''.

--labels_file (-l) [string]
A file containing labels (0 or 1) for the points in the training set
(y). Default value ''.

--lambda (-L) [double]
L2-regularization parameter for training. Default value 0.

--max_iterations (-n) [int]
Maximum iterations for optimizer (0 indicates no limit). Default value
10000.

--optimizer (-O) [string]
Optimizer to use for training ('lbfgs', 'sgd', or 'minibatch-sgd').
Default value 'lbfgs'.

--step_size (-s) [double]
Step size for SGD and mini-batch SGD optimizers. Default value 0.01.

--test_file (-T) [string]
File containing test dataset. Default value ''.

--tolerance (-e) [double]
Convergence tolerance for optimizer. Default value 1e-10.

--training_file (-t) [string]
A file containing the training set (the matrix of predictors, X).
Default value ''.

--verbose (-v)
Display informational messages and the full list of parameters and
timers at the end of execution.

--version (-V)
Display the version of mlpack.

--output_file (-o) [string]
If --test_file is specified, this file is where the predictions for the
test set will be saved. Default value ''.

--output_model_file (-M) [string]
File to save trained logistic regression model to. Default value ''.

--output_probabilities_file (-p) [string]
If --test_file is specified, this file is where the class probabilities
for the test set will be saved. Default value ''.

For further information, including relevant papers, citations, and
theory, consult the documentation found at http://www.mlpack.org or
included with your distribution of mlpack.


mlpack_logistic_regression(1)