mlpack_logistic_regression(1)    User Commands   mlpack_logistic_regression(1)

NAME

       mlpack_logistic_regression - L2-regularized logistic regression and
       prediction

SYNOPSIS

        mlpack_logistic_regression [-b int] [-d double] [-m unknown] [-l string] [-L double] [-n int] [-O string] [-s double] [-T string] [-e double] [-t string] [-V bool] [-o string] [-M unknown] [-p string] [-h -v]

DESCRIPTION

       An implementation of L2-regularized logistic regression using either
       the L-BFGS optimizer or SGD (stochastic gradient descent). This solves
       the regression problem

         y = 1 / (1 + e^-(X * b))

       where y takes values 0 or 1.
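
       As a minimal sketch, the formula above can be evaluated for a single
       point in Python. The function name logistic and the sample values are
       hypothetical and not part of mlpack; x stands for one row of X, and any
       intercept term a trained model may carry is ignored here.

```python
import math

def logistic(x, b):
    """Return 1 / (1 + e^-(x . b)) for one point x and parameter vector b."""
    # Dot product of the point with the parameters.
    z = sum(xi * bi for xi, bi in zip(x, b))
    # Not numerically hardened: very negative z would overflow math.exp.
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical values, for illustration only: x . b = 0.5 - 0.5 = 0.
p = logistic([1.0, 2.0], [0.5, -0.25])
```

       With x . b equal to 0 the function returns 0.5; larger values of x . b
       push the result toward 1 and smaller values push it toward 0.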

       This program allows loading a logistic regression model (via the
       '--input_model_file (-m)' parameter) or training a logistic regression
       model given training data (specified with the '--training_file (-t)'
       parameter), or both at once. In addition, this program allows
       classification on a test dataset (specified with the '--test_file (-T)'
       parameter), and the classification results may be saved with the
       '--output_file (-o)' output parameter. The trained logistic regression
       model may be saved using the '--output_model_file (-M)' output
       parameter.

       The training data, if specified, may have class labels as its last
       dimension. Alternatively, the '--labels_file (-l)' parameter may be
       used to specify a separate matrix of labels.

       When a model is being trained, there are many options. L2
       regularization (to prevent overfitting) can be specified with the
       '--lambda (-L)' option, and the optimizer used to train the model can
       be specified with the '--optimizer (-O)' parameter. Available options
       are 'sgd' (stochastic gradient descent) and 'lbfgs' (the L-BFGS
       optimizer). There are also various parameters for the optimizer; the
       '--max_iterations (-n)' parameter specifies the maximum number of
       allowed iterations, and the '--tolerance (-e)' parameter specifies the
       tolerance for convergence. For the SGD optimizer, the '--step_size
       (-s)' parameter controls the step size taken at each iteration by the
       optimizer. The batch size for SGD is controlled with the '--batch_size
       (-b)' parameter. If the objective function for your data is
       oscillating between Inf and 0, the step size is probably too large.
       There are more parameters for the optimizers, but the C++ interface
       must be used to access these.

       For SGD, an iteration refers to a single point. So to take a single
       pass over the dataset with SGD, '--max_iterations (-n)' should be set
       to the number of points in the dataset.
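
       The arithmetic above can be sketched in Python. This is an
       illustration only, not part of mlpack: the helper names count_points
       and sgd_max_iterations are hypothetical, and the sketch assumes a CSV
       file with one point per line and no header row.

```python
import io

def count_points(csv_stream):
    """Count non-empty lines; assumes one point per line and no header row."""
    return sum(1 for line in csv_stream if line.strip())

def sgd_max_iterations(num_points, passes=1):
    """Value for --max_iterations giving the requested number of full passes."""
    return num_points * passes

# Stand-in for the contents of a training file (hypothetical data).
sample = io.StringIO("0.1,0.2,0\n0.3,0.4,1\n0.5,0.6,1\n")

# Three points, two passes over the data.
n = sgd_max_iterations(count_points(sample), passes=2)
```

       Here n is 6, which would be the value passed to '--max_iterations
       (-n)' for two passes over a three-point dataset.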

       Optionally, the model can be used to predict the responses for another
       matrix of data points, if '--test_file (-T)' is specified. The
       '--test_file (-T)' parameter can be specified without the
       '--training_file (-t)' parameter, so long as an existing logistic
       regression model is given with the '--input_model_file (-m)'
       parameter. The output predictions from the logistic regression model
       may be saved with the '--output_file (-o)' parameter.

       This implementation of logistic regression does not support the general
       multi-class case but instead only the two-class case. Any labels must
       be either 0 or 1. For more classes, see the softmax_regression program.

       As an example, to train a logistic regression model on the data
       'data.csv' with labels 'labels.csv' with L2 regularization of 0.1,
       saving the model to 'lr_model.bin', the following command may be
       used:

       $ mlpack_logistic_regression --training_file data.csv \
           --labels_file labels.csv --lambda 0.1 \
           --output_model_file lr_model.bin

       Then, to use that model to predict classes for the dataset
       'test.csv', storing the output predictions in 'predictions.csv',
       the following command may be used:

       $ mlpack_logistic_regression --input_model_file lr_model.bin \
           --test_file test.csv --output_file predictions.csv

OPTIONAL INPUT OPTIONS

       --batch_size (-b) [int]
              Batch size for SGD. Default value 64.

       --decision_boundary (-d) [double]
              Decision boundary for prediction; if the logistic function for a
              point is less than the boundary, the class is taken to be 0;
              otherwise, the class is 1. Default value 0.5.

       --help (-h) [bool]
              Default help info.

       --info [string]
              Get help on a specific module or option. Default value ''.

       --input_model_file (-m) [unknown]
              Existing model (parameters). Default value ''.

       --labels_file (-l) [string]
              A matrix containing labels (0 or 1) for the points in the
              training set (y). Default value ''.

       --lambda (-L) [double]
              L2-regularization parameter for training. Default value 0.

       --max_iterations (-n) [int]
              Maximum iterations for optimizer (0 indicates no limit). Default
              value 10000.

       --optimizer (-O) [string]
              Optimizer to use for training ('lbfgs' or 'sgd'). Default value
              'lbfgs'.

       --step_size (-s) [double]
              Step size for SGD optimizer. Default value 0.01.

       --test_file (-T) [string]
              Matrix containing test dataset. Default value ''.

       --tolerance (-e) [double]
              Convergence tolerance for optimizer. Default value 1e-10.

       --training_file (-t) [string]
              A matrix containing the training set (the matrix of predictors,
              X). Default value ''.

       --verbose (-v) [bool]
              Display informational messages and the full list of parameters
              and timers at the end of execution.

       --version (-V) [bool]
              Display the version of mlpack.
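
       The rule described for '--decision_boundary (-d)' can be sketched in
       Python as follows. The function name classify and the sample values
       are hypothetical and not part of mlpack; the sketch only mirrors the
       comparison stated in the option description.

```python
def classify(probability, decision_boundary=0.5):
    """Class 0 if the logistic value is below the boundary; otherwise class 1."""
    return 0 if probability < decision_boundary else 1

# Hypothetical logistic function values for three test points.
labels = [classify(p) for p in [0.2, 0.5, 0.9]]

# Raising the boundary moves borderline points into class 0.
strict = classify(0.6, decision_boundary=0.7)
```

       A value exactly equal to the boundary falls in class 1, because only
       values strictly less than the boundary are assigned class 0.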

OPTIONAL OUTPUT OPTIONS

       --output_file (-o) [string]
              If test data is specified, this matrix is where the predictions
              for the test set will be saved. Default value ''.

       --output_model_file (-M) [unknown]
              Output for trained logistic regression model. Default value ''.

       --output_probabilities_file (-p) [string]
              If test data is specified, this matrix is where the class
              probabilities for the test set will be saved. Default value ''.

ADDITIONAL INFORMATION

       For further information, including relevant papers, citations, and
       theory, consult the documentation found at http://www.mlpack.org or
       included with your distribution of mlpack.

mlpack-3.0.4                   21 February 2019  mlpack_logistic_regression(1)