mlpack_decision_tree(1)          User Commands          mlpack_decision_tree(1)

NAME
mlpack_decision_tree - decision tree

SYNOPSIS
mlpack_decision_tree [-m unknown] [-l string] [-g double] [-n int] [-e bool] [-T string] [-L string] [-t string] [-V bool] [-w string] [-M unknown] [-p string] [-P string] [-h -v]

DESCRIPTION
Train and evaluate using a decision tree. Given a dataset containing
numeric or categorical features, and associated labels for each point
in the dataset, this program can train a decision tree on that data.

The training set and associated labels are specified with the
'--training_file (-t)' and '--labels_file (-l)' parameters, respectively.
The labels should be in the range [0, num_classes - 1]. If
'--labels_file (-l)' is not specified, the labels are assumed to be the
last dimension of the training dataset.
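The label requirements above can be illustrated with a short Python sketch. It is not part of mlpack; the helper names split_labels and normalize_labels are hypothetical, and it assumes one point per row, so that the last dimension is the last column. It separates the labels from the features and maps arbitrary label values onto the range [0, num_classes - 1].

```python
def split_labels(rows):
    """Separate the last column of each row as the label (assumed layout)."""
    features = [row[:-1] for row in rows]
    labels = [row[-1] for row in rows]
    return features, labels


def normalize_labels(labels):
    """Map arbitrary label values onto 0 .. num_classes - 1."""
    classes = sorted(set(labels))
    index = {value: i for i, value in enumerate(classes)}
    return [index[value] for value in labels], classes


# Toy dataset: two numeric features followed by the label.
rows = [
    [5.1, 3.5, "setosa"],
    [6.2, 2.9, "versicolor"],
    [4.9, 3.0, "setosa"],
    [6.5, 3.0, "virginica"],
]
features, raw_labels = split_labels(rows)
labels, classes = normalize_labels(raw_labels)
print(labels)   # [0, 1, 0, 2]
print(classes)  # ['setosa', 'versicolor', 'virginica']
```

The returned classes list keeps the mapping back to the original label values, which is useful when reading the predictions later.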

When a model is trained, the '--output_model_file (-M)' output
parameter may be used to save the trained model. A model may be loaded
for predictions with the '--input_model_file (-m)' parameter. The
'--input_model_file (-m)' parameter may not be specified when the
'--training_file (-t)' parameter is specified. The '--minimum_leaf_size
(-n)' parameter specifies the minimum number of training points that
must fall into each leaf for it to be split. The '--minimum_gain_split
(-g)' parameter specifies the minimum gain that is needed for the node
to split. If '--print_training_error (-e)' is specified, the training
error will be printed.
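To make the two thresholds concrete, here is a generic Python sketch of how a leaf-size limit and a gain limit can gate a split. It is not mlpack's implementation: the use of Gini impurity, the rule that both children must reach the minimum size, and the function names are all assumptions made for illustration. Only the default values (20 and 1e-07) come from this page.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_gain(parent, left, right):
    """Impurity reduction obtained by splitting parent into left and right."""
    n = len(parent)
    weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted


def should_split(parent, left, right,
                 minimum_leaf_size=20, minimum_gain_split=1e-7):
    """Assumed rule: both children are large enough and the gain is large enough."""
    if len(left) < minimum_leaf_size or len(right) < minimum_leaf_size:
        return False
    return split_gain(parent, left, right) > minimum_gain_split


# A perfectly separating split of 60 points into two pure halves.
parent = [0] * 30 + [1] * 30
print(split_gain(parent, [0] * 30, [1] * 30))    # 0.5
print(should_split(parent, [0] * 30, [1] * 30))  # True

# The same kind of split is rejected when one child is too small.
small_parent = [0] * 10 + [1] * 30
print(should_split(small_parent, [0] * 10, [1] * 30))  # False
```

Raising either threshold makes the sketch accept fewer splits and therefore produces a smaller tree.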

Test data may be specified with the '--test_file (-T)' parameter, and
if performance numbers are desired for that test set, labels may be
specified with the '--test_labels_file (-L)' parameter. Predictions for
each test point may be saved via the '--predictions_file (-p)' output
parameter. Class probabilities for each prediction may be saved with
the '--probabilities_file (-P)' output parameter.
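As an illustration of the performance numbers mentioned above, the following Python sketch computes a test error from saved predictions and the true test labels. It is not part of mlpack; the helper names are hypothetical, and the file layout (one class label per line) is an assumption that should be checked against the files actually produced.

```python
import csv
import io


def read_labels(text):
    """Parse one class label per line (assumed file layout)."""
    return [int(float(row[0])) for row in csv.reader(io.StringIO(text)) if row]


def error_rate(predictions, truth):
    """Fraction of test points whose predicted class differs from the label."""
    if not truth:
        raise ValueError("no test labels given")
    if len(predictions) != len(truth):
        raise ValueError("predictions and labels differ in length")
    wrong = sum(p != t for p, t in zip(predictions, truth))
    return wrong / len(truth)


# Inline text stands in for the contents of the two files.
predictions = read_labels("0\n1\n1\n2\n")
truth = read_labels("0\n1\n2\n2\n")
print(error_rate(predictions, truth))  # 0.25
```

With real files, the text passed to read_labels would come from reading the predictions file and the test labels file.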

For example, to train a decision tree with a minimum leaf size of 20
and a minimum gain of 0.001 on the dataset contained in 'data.csv' with
labels 'labels.csv', saving the output model to 'tree.bin' and printing
the training error, one could call

    $ mlpack_decision_tree --training_file data.csv --labels_file labels.csv \
        --output_model_file tree.bin --minimum_leaf_size 20 \
        --minimum_gain_split 0.001 --print_training_error

Then, to use that model to classify points in 'test_set.csv' and print
the test error given the labels 'test_labels.csv', while saving the
predictions for each point to 'predictions.csv', one could call

    $ mlpack_decision_tree --input_model_file tree.bin --test_file test_set.csv \
        --test_labels_file test_labels.csv --predictions_file predictions.csv
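When these invocations are scripted, it can help to assemble the argument list in one place and reject the one combination this page forbids. The Python sketch below is an illustration only: build_command is a hypothetical helper, not part of mlpack, and it uses only the long option names documented on this page. It does not run the program.

```python
def build_command(training_file=None, labels_file=None, input_model_file=None,
                  output_model_file=None, test_file=None, test_labels_file=None,
                  predictions_file=None, minimum_leaf_size=None,
                  minimum_gain_split=None, print_training_error=False):
    """Return an argument list for mlpack_decision_tree (hypothetical helper)."""
    # The only rule enforced here is the one stated in the description.
    if training_file is not None and input_model_file is not None:
        raise ValueError(
            "--input_model_file may not be combined with --training_file")
    options = [
        ("--training_file", training_file),
        ("--labels_file", labels_file),
        ("--input_model_file", input_model_file),
        ("--output_model_file", output_model_file),
        ("--test_file", test_file),
        ("--test_labels_file", test_labels_file),
        ("--predictions_file", predictions_file),
        ("--minimum_leaf_size", minimum_leaf_size),
        ("--minimum_gain_split", minimum_gain_split),
    ]
    argv = ["mlpack_decision_tree"]
    for flag, value in options:
        if value is not None:
            argv += [flag, str(value)]
    if print_training_error:
        argv.append("--print_training_error")
    return argv


train = build_command(training_file="data.csv", labels_file="labels.csv",
                      output_model_file="tree.bin", minimum_leaf_size=20,
                      minimum_gain_split=0.001, print_training_error=True)
predict = build_command(input_model_file="tree.bin", test_file="test_set.csv",
                        test_labels_file="test_labels.csv",
                        predictions_file="predictions.csv")
print(" ".join(train))
print(" ".join(predict))
```

Either list can be handed to subprocess.run(argv, check=True) on a system where mlpack_decision_tree is installed.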

OPTIONAL INPUT OPTIONS
--help (-h) [bool]
    Default help info.

--info [string]
    Get help on a specific module or option. Default value ''.

--input_model_file (-m) [unknown]
    Pre-trained decision tree, to be used with test points. Default
    value ''.

--labels_file (-l) [string]
    Training labels. Default value ''.

--minimum_gain_split (-g) [double]
    Minimum gain for node splitting. Default value 1e-07.

--minimum_leaf_size (-n) [int]
    Minimum number of points in a leaf. Default value 20.

--print_training_error (-e) [bool]
    Print the training error.

--test_file (-T) [string]
    Testing dataset (may be categorical). Default value ''.

--test_labels_file (-L) [string]
    Test point labels, if accuracy calculation is desired. Default
    value ''.

--training_file (-t) [string]
    Training dataset (may be categorical). Default value ''.

--verbose (-v) [bool]
    Display informational messages and the full list of parameters
    and timers at the end of execution.

--version (-V) [bool]
    Display the version of mlpack.

--weights_file (-w) [string]
    The weight of labels. Default value ''.

OPTIONAL OUTPUT OPTIONS
--output_model_file (-M) [unknown]
    Output for trained decision tree. Default value ''.

--predictions_file (-p) [string]
    Class predictions for each test point. Default value ''.

--probabilities_file (-P) [string]
    Class probabilities for each test point. Default value ''.

ADDITIONAL INFORMATION
For further information, including relevant papers, citations, and
theory, consult the documentation found at http://www.mlpack.org or
included with your distribution of mlpack.

mlpack-3.0.4          21 February 2019          mlpack_decision_tree(1)