mlpack_random_forest(1)          User Commands         mlpack_random_forest(1)

NAME

       mlpack_random_forest - random forests

SYNOPSIS

       mlpack_random_forest [-m unknown] [-l string] [-n int] [-N int] [-a bool] [-T string] [-L string] [-t string] [-V bool] [-M unknown] [-p string] [-P string] [-h -v]

DESCRIPTION

       This program is an implementation of the standard random forest
       classification algorithm by Leo Breiman. A random forest can be
       trained and saved for later use, or a random forest may be loaded
       and predictions or class probabilities for points may be generated.

       The training set and associated labels are specified with the
       '--training_file (-t)' and '--labels_file (-l)' parameters,
       respectively. The labels should be in the range
       [0, num_classes - 1]. If '--labels_file (-l)' is not specified, the
       labels are assumed to be the last dimension of the training dataset.

       When a model is trained, the '--output_model_file (-M)' output
       parameter may be used to save the trained model. A model may be
       loaded for predictions with the '--input_model_file (-m)' parameter.
       The '--input_model_file (-m)' parameter may not be specified when
       the '--training_file (-t)' parameter is specified. The
       '--minimum_leaf_size (-n)' parameter specifies the minimum number of
       training points that must fall into each leaf for it to be split.
       The '--num_trees (-N)' parameter controls the number of trees in the
       random forest. If '--print_training_accuracy (-a)' is specified, the
       calculated accuracy on the training set will be printed; as noted
       under that option below, '--verbose (-v)' must also be specified.

       Test data may be specified with the '--test_file (-T)' parameter,
       and if performance measures are desired for that test set, labels
       for the test points may be specified with the '--test_labels_file
       (-L)' parameter. Predictions for each test point may be saved via
       the '--predictions_file (-p)' output parameter. Class probabilities
       for each prediction may be saved with the '--probabilities_file
       (-P)' output parameter.

       For example, to train a random forest with a minimum leaf size of 20
       using 10 trees on the dataset contained in 'data.csv' with labels
       'labels.csv', saving the output random forest to 'rf_model.bin' and
       printing the training accuracy, one could call

       $ mlpack_random_forest --training_file data.csv \
           --labels_file labels.csv --minimum_leaf_size 20 --num_trees 10 \
           --output_model_file rf_model.bin --print_training_accuracy \
           --verbose

       Then, to use that model to classify points in 'test_set.csv' and
       print the test accuracy given the labels 'test_labels.csv', while
       saving the predictions for each point to 'predictions.csv', one
       could call

       $ mlpack_random_forest --input_model_file rf_model.bin \
           --test_file test_set.csv --test_labels_file test_labels.csv \
           --predictions_file predictions.csv
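
       The test accuracy reported when '--test_labels_file (-L)' is given
       can be cross-checked by hand from the saved predictions. The sketch
       below is not part of mlpack. It assumes that both CSV files hold one
       numeric class label per line and have the same number of lines; the
       function name 'accuracy' is purely illustrative.

```shell
# accuracy PRED_FILE LABEL_FILE
# Hypothetical helper (not an mlpack tool): prints the fraction of lines
# whose labels agree, to four decimals.
# Assumption: one numeric label per line, same line count in both files.
accuracy() {
  paste -d, "$1" "$2" | awk -F, '
    # Adding 0 forces a numeric comparison, so "1" and "1.0e+00" match.
    { total++; if ($1 + 0 == $2 + 0) correct++ }
    END {
      if (total > 0) printf "%.4f\n", correct / total
      else print "nan"
    }'
}
```

       With the file names from the example above, this could be called as
       'accuracy predictions.csv test_labels.csv'.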

OPTIONAL INPUT OPTIONS

       --help (-h) [bool]
              Default help info.

       --info [string]
              Get help on a specific module or option. Default value ''.

       --input_model_file (-m) [unknown]
              Pre-trained random forest to use for classification. Default
              value ''.

       --labels_file (-l) [string]
              Labels for training dataset. Default value ''.

       --minimum_leaf_size (-n) [int]
              Minimum number of points in each leaf node. Default value 20.

       --num_trees (-N) [int]
              Number of trees in the random forest. Default value 10.

       --print_training_accuracy (-a) [bool]
              If set, then the accuracy of the model on the training set
              will be printed (verbose must also be specified).

       --test_file (-T) [string]
              Test dataset to produce predictions for. Default value ''.

       --test_labels_file (-L) [string]
              Test dataset labels, if accuracy calculation is desired.
              Default value ''.

       --training_file (-t) [string]
              Training dataset. Default value ''.

       --verbose (-v) [bool]
              Display informational messages and the full list of
              parameters and timers at the end of execution.

       --version (-V) [bool]
              Display the version of mlpack.
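
       Labels supplied with '--labels_file (-l)' are expected to lie in the
       range [0, num_classes - 1]. A quick sanity check before training
       might look like the sketch below. It is not part of mlpack, the
       function name 'check_labels' is hypothetical, and it assumes a label
       file with one numeric label per line and no header.

```shell
# check_labels LABEL_FILE
# Hypothetical helper (not an mlpack tool).
# Assumption: one numeric label per line, no header row.
# On success, prints the implied number of classes (largest label + 1);
# returns non-zero if a label is negative, fractional, or missing.
check_labels() {
  awk '
    {
      v = $1 + 0
      # Reject fields that do not start with a digit, and fractional values.
      if ($1 !~ /^[0-9]/ || v != int(v)) { bad = 1; exit }
      if (v > max) max = v
    }
    END {
      if (bad || NR == 0) exit 1
      print max + 1
    }' "$1"
}
```

       For instance, 'check_labels labels.csv' prints 3 for a file whose
       labels are 0, 1 and 2.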

OPTIONAL OUTPUT OPTIONS

       --output_model_file (-M) [unknown]
              Model to save trained random forest to. Default value ''.

       --predictions_file (-p) [string]
              Predicted classes for each point in the test set. Default
              value ''.

       --probabilities_file (-P) [string]
              Predicted class probabilities for each point in the test set.
              Default value ''.
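
       The output of '--probabilities_file (-P)' can be reduced to hard
       class assignments by taking the most probable class for each point.
       The sketch below is not part of mlpack. It assumes that the file has
       one row per test point, with comma-separated probabilities listed in
       class order; that layout and the function name 'argmax_classes' are
       assumptions. Ties go to the lowest class index here, which may not
       match what mlpack does.

```shell
# argmax_classes PROB_FILE
# Hypothetical helper (not an mlpack tool).
# Assumption: one CSV row per point, one column per class, in class order.
# Prints the zero-based index of the largest value in each row.
argmax_classes() {
  awk -F, '{
    best = 1
    for (i = 2; i <= NF; i++)
      if ($i + 0 > $best + 0) best = i   # strict >, so ties keep the lower index
    print best - 1
  }' "$1"
}
```

       Under those assumptions, the result could be compared against the
       file written by '--predictions_file (-p)'.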

ADDITIONAL INFORMATION

       For further information, including relevant papers, citations, and
       theory, consult the documentation found at http://www.mlpack.org or
       included with your distribution of mlpack.



mlpack-3.0.4                   21 February 2019        mlpack_random_forest(1)