mlpack_random_forest(1)          User Commands          mlpack_random_forest(1)

NAME
mlpack_random_forest - random forests

SYNOPSIS
mlpack_random_forest [-m unknown] [-l string] [-n int] [-N int] [-a bool] [-T string] [-L string] [-t string] [-V bool] [-M unknown] [-p string] [-P string] [-h -v]

DESCRIPTION
This program is an implementation of the standard random forest
classification algorithm by Leo Breiman. A random forest can be trained
and saved for later use, or a random forest may be loaded and predictions
or class probabilities for points may be generated.

The training set and associated labels are specified with the
'--training_file (-t)' and '--labels_file (-l)' parameters, respectively.
The labels should be in the range [0, num_classes - 1]. If
'--labels_file (-l)' is not specified, the labels are assumed to be the
last dimension of the training dataset.
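
As a rough illustration of the two label conventions above, the sketch
below uses standard POSIX tools (not part of mlpack) to report the number
of classes implied by a labels file and to append the labels as the last
column of a dataset. The function names label_classes and append_labels
are hypothetical, and both files are assumed to be headerless CSV with one
point per line and plain integer labels.

```shell
# label_classes FILE: print the number of classes implied by the labels
# (largest label + 1); fail if any label is not a non-negative integer.
# Assumption: one label per line, written as a plain integer.
label_classes() {
  awk '
    $0 !~ /^[0-9]+$/ { bad = 1 }
    $0 + 0 > max     { max = $0 + 0 }
    END { if (bad || NR == 0) exit 1; print max + 1 }
  ' "$1"
}

# append_labels DATA LABELS: print DATA with LABELS added as the last column.
# Assumption: both files list the points in the same order.
append_labels() {
  paste -d, "$1" "$2"
}
```

With the labels appended this way, the combined file can be passed as
'--training_file (-t)' and '--labels_file (-l)' can be left out.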

When a model is trained, the '--output_model_file (-M)' output parameter
may be used to save the trained model. A model may be loaded for
predictions with the '--input_model_file (-m)' parameter. The
'--input_model_file (-m)' parameter may not be specified when the
'--training_file (-t)' parameter is specified. The '--minimum_leaf_size
(-n)' parameter specifies the minimum number of training points that
must fall into each leaf for it to be split. The '--num_trees (-N)'
parameter controls the number of trees in the random forest. If
'--print_training_accuracy (-a)' is specified (together with
'--verbose (-v)'), the calculated accuracy on the training set will be
printed.
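
The rule that '--training_file (-t)' and '--input_model_file (-m)' may not
be given together can be checked in a wrapper script before the program is
invoked. A minimal sketch in POSIX shell follows; the function name
check_mode is hypothetical, and rejecting the case where neither parameter
is given is an assumption that this page does not state.

```shell
# check_mode TRAINING_FILE INPUT_MODEL_FILE
# Pass an empty string for a parameter that is not being used.
check_mode() {
  if [ -n "$1" ] && [ -n "$2" ]; then
    echo "error: give --training_file or --input_model_file, not both" >&2
    return 1
  fi
  if [ -z "$1" ] && [ -z "$2" ]; then
    # Assumption: with neither, there is nothing to train or to load.
    echo "error: give --training_file or --input_model_file" >&2
    return 1
  fi
  return 0
}
```

For instance, check_mode "data.csv" "" succeeds, while
check_mode "data.csv" "rf_model.bin" reports an error and returns 1.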

Test data may be specified with the '--test_file (-T)' parameter, and
if performance measures are desired for that test set, labels for the
test points may be specified with the '--test_labels_file (-L)'
parameter. Predictions for each test point may be saved via the
'--predictions_file (-p)' output parameter. Class probabilities for each
prediction may be saved with the '--probabilities_file (-P)' output
parameter.
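
One possible use of the saved probabilities is to flag test points the
forest is unsure about. The sketch below assumes, without confirmation
from this page, that the probabilities file is headerless CSV with one
line per test point and one column per class, in class order starting at
0. The function name uncertain_points and the threshold value are
illustrative only.

```shell
# uncertain_points FILE THRESHOLD: print "line best_class best_probability"
# for every point whose highest class probability is below THRESHOLD.
uncertain_points() {
  awk -F, -v t="$2" '
    {
      best = 1                                   # column of the largest value
      for (i = 2; i <= NF; i++)
        if ($i + 0 > $best + 0) best = i
      if ($best + 0 < t + 0)
        printf "%d %d %s\n", NR, best - 1, $best # classes are counted from 0
    }
  ' "$1"
}
```

Calling uncertain_points probabilities.csv 0.7 would then list every point
whose most likely class has a probability under 0.7.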

For example, to train a random forest with a minimum leaf size of 20
using 10 trees on the dataset contained in 'data.csv' with labels
'labels.csv', saving the output random forest to 'rf_model.bin' and
printing the training accuracy, one could call

$ mlpack_random_forest --training_file data.csv --labels_file labels.csv \
    --minimum_leaf_size 20 --num_trees 10 --output_model_file rf_model.bin \
    --print_training_accuracy --verbose

Then, to use that model to classify points in 'test_set.csv' and print
the test accuracy given the labels 'test_labels.csv', while saving the
predictions for each point to 'predictions.csv', one could call

$ mlpack_random_forest --input_model_file rf_model.bin \
    --test_file test_set.csv --test_labels_file test_labels.csv \
    --predictions_file predictions.csv
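
The saved predictions can also be checked independently of the program. A
minimal sketch, assuming 'predictions.csv' and 'test_labels.csv' each hold
one label per line, have the same number of lines, and list the points in
the same order (an assumption about file layout that this page does not
confirm); the function name accuracy is hypothetical.

```shell
# accuracy PREDICTIONS LABELS: print the fraction of lines whose predicted
# label equals the true label, to four decimal places.
accuracy() {
  paste -d, "$1" "$2" | awk -F, '
    { total++; if ($1 + 0 == $2 + 0) correct++ }   # compare numerically
    END { if (total == 0) exit 1; printf "%.4f\n", correct / total }
  '
}
```

Running accuracy predictions.csv test_labels.csv prints a value between 0
and 1 that can be compared with the accuracy the program reports.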

OPTIONAL INPUT OPTIONS
--help (-h) [bool]
    Default help info.

--info [string]
    Get help on a specific module or option. Default value ''.

--input_model_file (-m) [unknown]
    Pre-trained random forest to use for classification. Default
    value ''.

--labels_file (-l) [string]
    Labels for training dataset. Default value ''.

--minimum_leaf_size (-n) [int]
    Minimum number of points in each leaf node. Default value 20.

--num_trees (-N) [int]
    Number of trees in the random forest. Default value 10.

--print_training_accuracy (-a) [bool]
    If set, then the accuracy of the model on the training set will
    be printed (verbose must also be specified).

--test_file (-T) [string]
    Test dataset to produce predictions for. Default value ''.

--test_labels_file (-L) [string]
    Test dataset labels, if accuracy calculation is desired. Default
    value ''.

--training_file (-t) [string]
    Training dataset. Default value ''.

--verbose (-v) [bool]
    Display informational messages and the full list of parameters
    and timers at the end of execution.

--version (-V) [bool]
    Display the version of mlpack.

OPTIONAL OUTPUT OPTIONS
--output_model_file (-M) [unknown]
    Model to save trained random forest to. Default value ''.

--predictions_file (-p) [string]
    Predicted classes for each point in the test set. Default value
    ''.

--probabilities_file (-P) [string]
    Predicted class probabilities for each point in the test set.
    Default value ''.

ADDITIONAL INFORMATION
For further information, including relevant papers, citations, and
theory, consult the documentation found at http://www.mlpack.org or
included with your distribution of mlpack.

mlpack-3.0.4                   21 February 2019         mlpack_random_forest(1)