mlpack_logistic_regression(1)

mlpack_logistic_regression(1)  General Commands Manual  mlpack_logistic_regression(1)

NAME

       mlpack_logistic_regression - L2-regularized logistic regression and
       prediction

SYNOPSIS

       mlpack_logistic_regression [-h] [-v]

DESCRIPTION

       An implementation of L2-regularized logistic regression using either
       the L-BFGS optimizer or SGD (stochastic gradient descent). This solves
       the regression problem

         y = 1 / (1 + e^-(X * b))

       where y takes values 0 or 1.

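       As a rough illustration of the model above, the following Python
       sketch evaluates the logistic function for a single point. The names
       logistic, x, and b are illustrative only and are not part of mlpack;
       an intercept term is omitted for brevity.

```python
import math

def logistic(x, b):
    """Return 1 / (1 + e^-(x . b)) for one point x and coefficients b."""
    z = sum(xi * bi for xi, bi in zip(x, b))
    # Branch on the sign of z so that exp() is never called on a large
    # positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

print(logistic([0.0, 0.0], [1.5, -2.0]))  # 0.5, because x . b = 0
```
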
       This program allows loading a logistic regression model from a file
       (-m), training a logistic regression model given training data (-t),
       or both at once. In addition, this program allows classification on a
       test dataset (-T) and will save the classification results to the
       given output file (-o). The logistic regression model itself may be
       saved to a file specified using the -M option.

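       One way to drive the workflow above from a script is to build the
       argument list in Python. The sketch below only uses long option names
       that appear in the option lists of this page; the helper name and all
       file names are placeholders, and the command is printed rather than
       executed.

```python
import shlex

def train_and_classify_command(training_file, test_file, output_file,
                               output_model_file):
    """Build an argument list that trains a model, classifies a test set,
    and saves both the predictions and the trained model."""
    return [
        "mlpack_logistic_regression",
        "--training_file", training_file,
        "--test_file", test_file,
        "--output_file", output_file,
        "--output_model_file", output_model_file,
    ]

# All file names here are hypothetical placeholders.
cmd = train_and_classify_command("train.csv", "test.csv",
                                 "predictions.csv", "model.xml")
print(shlex.join(cmd))
# To actually run it: import subprocess; subprocess.run(cmd, check=True)
```
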
       The training data given with the -t option should have class labels as
       its last dimension (so, if the training data is in CSV format, labels
       should be the last column). Alternatively, the -l (--labels_file)
       option may be used to specify a separate file of labels.

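       The two layouts described above can be produced with a short Python
       sketch. It assumes one point per CSV row and, for the separate labels
       file, one label per line; the function and file names are made up for
       illustration.

```python
import csv
import os
import tempfile

points = [[0.2, 1.1], [1.9, 0.3], [2.5, 2.2]]   # one row per point
labels = [0, 1, 1]                              # one 0/1 label per point

def write_training_csv(path, points, labels):
    """Write each point as one CSV row with its label as the last column."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        for point, label in zip(points, labels):
            writer.writerow(list(point) + [label])

def write_separate_files(data_path, labels_path, points, labels):
    """Write predictors and labels to two files, for use with --labels_file."""
    with open(data_path, "w", newline="") as f:
        csv.writer(f).writerows(points)
    with open(labels_path, "w", newline="") as f:
        csv.writer(f).writerows([[label] for label in labels])

out_dir = tempfile.mkdtemp()
write_training_csv(os.path.join(out_dir, "train.csv"), points, labels)
write_separate_files(os.path.join(out_dir, "data.csv"),
                     os.path.join(out_dir, "labels.csv"), points, labels)
```
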
       When a model is being trained, there are many options. L2
       regularization (to prevent overfitting) can be specified with the -L
       (--lambda) option, and the optimizer used to train the model can be
       specified with the --optimizer option. Available optimizers are 'sgd'
       (stochastic gradient descent), 'lbfgs' (the L-BFGS optimizer), and
       'minibatch-sgd' (mini-batch stochastic gradient descent). There are
       also various parameters for the optimizer: the --max_iterations
       parameter specifies the maximum number of allowed iterations, and the
       --tolerance (-e) parameter specifies the tolerance for convergence.
       For the SGD and mini-batch SGD optimizers, the --step_size parameter
       controls the step size taken at each iteration by the optimizer. The
       batch size for mini-batch SGD is controlled with the --batch_size (-b)
       parameter. If the objective function for your data is oscillating
       between Inf and 0, the step size is probably too large. There are more
       parameters for the optimizers, but the C++ interface must be used to
       access these.

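       To make the roles of the step size, iteration limit, and
       regularization parameter concrete, here is a minimal plain-SGD sketch
       in Python. It is not mlpack's implementation: the cyclic visiting
       order, the penalty form 0.5 * lam * ||b||^2, and the unregularized
       intercept are all assumptions made for this example.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def sgd_logistic_regression(points, labels, step_size=0.01,
                            max_iterations=10000, lam=0.0):
    """Fit [intercept, b_1, ..., b_d] by visiting one point per iteration."""
    dim = len(points[0])
    b = [0.0] * (dim + 1)
    for it in range(max_iterations):
        i = it % len(points)          # cycle through the points in order
        x, y = points[i], labels[i]
        z = b[0] + sum(bj * xj for bj, xj in zip(b[1:], x))
        err = sigmoid(z) - y          # gradient of the log loss w.r.t. z
        b[0] -= step_size * err       # intercept (assumed unregularized)
        for j in range(dim):
            # Assumed penalty 0.5 * lam * ||b||^2 has gradient lam * b_j.
            b[j + 1] -= step_size * (err * x[j] + lam * b[j + 1])
    return b

points = [[0.0], [1.0], [2.0], [3.0]]
labels = [0, 0, 1, 1]
b = sgd_logistic_regression(points, labels, step_size=0.1,
                            max_iterations=20000)
print([round(sigmoid(b[0] + b[1] * x[0]), 3) for x in points])
```

       Raising step_size makes each update larger, which speeds up early
       progress but can produce the oscillation described above.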
       For SGD, an iteration refers to a single point, and for mini-batch SGD,
       an iteration refers to a single batch. So to take a single pass over
       the dataset with SGD, --max_iterations should be set to the number of
       points in the dataset.

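       The arithmetic above can be written as a small Python helper. The
       function name is hypothetical, and counting a final partial batch as
       one full iteration is an assumption of this sketch.

```python
import math

def max_iterations_for_passes(num_points, passes=1, batch_size=None):
    """Return the --max_iterations value needed for `passes` full passes.

    With batch_size=None (plain SGD) each iteration visits one point;
    otherwise each iteration visits one batch of up to batch_size points.
    """
    if batch_size is None:
        per_pass = num_points
    else:
        per_pass = math.ceil(num_points / batch_size)
    return passes * per_pass

print(max_iterations_for_passes(1000))                           # 1000
print(max_iterations_for_passes(1000, batch_size=50))            # 20
print(max_iterations_for_passes(1000, passes=3, batch_size=64))  # 48
```
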
       Optionally, the model can be used to predict the responses for another
       matrix of data points, if --test_file is specified. The --test_file
       option can be specified without --training_file, so long as an
       existing logistic regression model is given with --input_model_file.
       The output predictions from the logistic regression model are stored
       in the file given with --output_file.

       This implementation of logistic regression does not support the general
       multi-class case but instead only the two-class case. Any responses
       must be either 0 or 1.

OPTIONAL INPUT OPTIONS

       --batch_size (-b) [int]
              Batch size for mini-batch SGD. Default value 50.

       --decision_boundary (-d) [double]
              Decision boundary for prediction; if the logistic function for
              a point is less than the boundary, the class is taken to be 0;
              otherwise, the class is 1. Default value 0.5.

       --help (-h)
              Default help info.

       --info [string]
              Get help on a specific module or option. Default value ''.

       --input_model_file (-m) [string]
              File containing existing model (parameters). Default value ''.

       --labels_file (-l) [string]
              A file containing labels (0 or 1) for the points in the training
              set (y). Default value ''.

       --lambda (-L) [double]
              L2-regularization parameter for training. Default value 0.

       --max_iterations (-n) [int]
              Maximum iterations for optimizer (0 indicates no limit). Default
              value 10000.

       --optimizer (-O) [string]
              Optimizer to use for training ('lbfgs', 'sgd', or
              'minibatch-sgd'). Default value 'lbfgs'.

       --step_size (-s) [double]
              Step size for SGD and mini-batch SGD optimizers. Default value
              0.01.

       --test_file (-T) [string]
              File containing test dataset. Default value ''.

       --tolerance (-e) [double]
              Convergence tolerance for optimizer. Default value 1e-10.

       --training_file (-t) [string]
              A file containing the training set (the matrix of predictors,
              X). Default value ''.

       --verbose (-v)
              Display informational messages and the full list of parameters
              and timers at the end of execution.

       --version (-V)
              Display the version of mlpack.

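       The rule stated for --decision_boundary in the list above can be
       sketched in Python as follows. The function name and the sample
       probabilities are invented for illustration.

```python
def classify(probabilities, decision_boundary=0.5):
    """Map each probability to class 0 if it is below the boundary, else 1."""
    return [0 if p < decision_boundary else 1 for p in probabilities]

probs = [0.10, 0.49, 0.50, 0.93]
print(classify(probs))        # [0, 0, 1, 1]
print(classify(probs, 0.95))  # [0, 0, 0, 0]
```

       A probability exactly equal to the boundary is assigned class 1,
       because only values strictly less than the boundary map to class 0.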

OPTIONAL OUTPUT OPTIONS

       --output_file (-o) [string]
              If --test_file is specified, this file is where the predictions
              for the test set will be saved. Default value ''.

       --output_model_file (-M) [string]
              File to save trained logistic regression model to. Default
              value ''.

       --output_probabilities_file (-p) [string]
              If --test_file is specified, this file is where the class
              probabilities for the test set will be saved. Default value ''.

ADDITIONAL INFORMATION

       For further information, including relevant papers, citations, and
       theory, consult the documentation found at http://www.mlpack.org or
       included with your distribution of mlpack.