mlpack_hoeffding_tree(1) User Commands mlpack_hoeffding_tree(1)

NAME
mlpack_hoeffding_tree - Hoeffding trees

SYNOPSIS
mlpack_hoeffding_tree [-b bool] [-B int] [-c double] [-m unknown] [-l string] [-n int] [-I int] [-N string] [-o int] [-s int] [-T string] [-L string] [-t string] [-V bool] [-M unknown] [-p string] [-P string] [-h -v]

DESCRIPTION
This program implements Hoeffding trees, a form of streaming decision
tree best suited to large (or streaming) datasets. The program supports
both categorical and numeric data. Given an input dataset, it can train
a tree with numerous training options and save the model to a file. It
can also use a newly trained model, or a model loaded from file, to
predict classes for a given test set.

The training file and associated labels are specified with the
'--training_file (-t)' and '--labels_file (-l)' parameters,
respectively. If '--labels_file (-l)' is not specified, the labels are
assumed to be the last dimension of the training dataset.

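The two ways of supplying labels can be sketched as follows. The toy
data, the file names, and the temporary directory are hypothetical, and
the sketch assumes that each CSV row is one point, so that the last
column is the "last dimension". The training commands run only if
mlpack_hoeffding_tree is installed.

```shell
#!/bin/sh
# Hypothetical toy data: two numeric features, class label in the last column.
workdir=$(mktemp -d)
cat > "$workdir/train.csv" <<'EOF'
0.1,1.2,0
0.3,0.9,0
2.4,3.1,1
2.8,2.7,1
EOF

# Split the last column off into a separate labels file (one label per line).
cut -d, -f1-2 "$workdir/train.csv" > "$workdir/features.csv"
cut -d, -f3 "$workdir/train.csv" > "$workdir/labels.csv"

if command -v mlpack_hoeffding_tree >/dev/null 2>&1; then
  # Variant 1: labels are read from the last dimension of the training file.
  mlpack_hoeffding_tree --training_file "$workdir/train.csv" \
    --output_model_file "$workdir/tree_a.bin"
  # Variant 2: labels are supplied separately with --labels_file.
  mlpack_hoeffding_tree --training_file "$workdir/features.csv" \
    --labels_file "$workdir/labels.csv" \
    --output_model_file "$workdir/tree_b.bin"
else
  echo "mlpack_hoeffding_tree not found; data prepared in $workdir"
fi
```

Both variants should describe the same training problem; the separate
labels file is mainly convenient when features and labels are produced
by different tools.
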
The training may be performed in batch mode (like a typical decision
tree algorithm) by specifying the '--batch_mode (-b)' option. Because
batch mode costs more memory and runtime, it may not be the best option
for large datasets.

When a model is trained, it may be saved via the '--output_model_file
(-M)' output parameter. A model may be loaded from file for further
training or testing with the '--input_model_file (-m)' parameter.

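Saving and reloading a model makes it possible to train on a stream in
pieces. The following is a minimal sketch under stated assumptions: the
toy stream, the chunk size of two lines, and all file names are
hypothetical, every chunk is arranged to contain both classes, and
passing '--input_model_file' together with '--training_file' is taken
to mean "continue training the loaded tree". The commands run only if
mlpack_hoeffding_tree is installed.

```shell
#!/bin/sh
# Hypothetical stream: six labeled points, class label in the last column.
workdir=$(mktemp -d)
printf '%s\n' 0.1,1.2,0 2.4,3.1,1 0.3,0.9,0 2.8,2.7,1 0.2,1.0,0 2.6,2.9,1 \
  > "$workdir/stream.csv"

# Cut the stream into chunks of two lines: chunk_00.csv, chunk_01.csv, ...
awk -v d="$workdir" '{
  f = sprintf("%s/chunk_%02d.csv", d, int((NR - 1) / 2))
  print > f
}' "$workdir/stream.csv"

model="$workdir/tree.bin"
chunks=0
for chunk in "$workdir"/chunk_*.csv; do
  chunks=$((chunks + 1))
  if command -v mlpack_hoeffding_tree >/dev/null 2>&1; then
    if [ -f "$model" ]; then
      # Later chunks: load the saved tree, train further, save it again.
      mlpack_hoeffding_tree --input_model_file "$model" \
        --training_file "$chunk" --output_model_file "$model"
    else
      # First chunk: train a new tree and save it.
      mlpack_hoeffding_tree --training_file "$chunk" \
        --output_model_file "$model"
    fi
  fi
done
echo "processed $chunks chunks"
```

Real chunks would normally be far larger than two points; the tiny size
here only keeps the sketch short.
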
Test data may be specified with the '--test_file (-T)' parameter, and
if performance statistics are desired for that test set, labels may be
specified with the '--test_labels_file (-L)' parameter. Predictions for
each test point may be saved with the '--predictions_file (-p)' output
parameter, and class probabilities for each prediction may be saved
with the '--probabilities_file (-P)' output parameter.

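Saved predictions can also be checked against known labels outside the
program. The sketch below uses hypothetical stand-in files and assumes
that both the predictions file and the test labels file hold one class
label per line, in the same order as the test points; that layout is an
assumption, not something this page specifies.

```shell
#!/bin/sh
# Hypothetical stand-ins for the files named by --predictions_file and
# --test_labels_file: one class label per line, in test point order.
workdir=$(mktemp -d)
printf '%s\n' 0 1 1 0 1 > "$workdir/predictions.csv"
printf '%s\n' 0 1 0 0 1 > "$workdir/test_labels.csv"

# Pair the two files line by line and count the matching labels.
accuracy=$(paste -d, "$workdir/predictions.csv" "$workdir/test_labels.csv" |
  awk -F, '$1 + 0 == $2 + 0 { hit++ } END { printf "%.2f", hit / NR }')
echo "accuracy: $accuracy"
```

With these toy files, four of the five predictions match, so the script
prints "accuracy: 0.80".
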
For example, to train a Hoeffding tree with confidence 0.99 on the data
in 'dataset.arff', saving the trained tree to 'tree.bin', the following
command may be used:

$ mlpack_hoeffding_tree --training_file dataset.arff --confidence 0.99 \
    --output_model_file tree.bin

Then, this tree may be used to make predictions on the test set
'test_set.arff', saving the predictions into 'predictions.csv' and the
class probabilities into 'class_probs.csv', with the following command:

$ mlpack_hoeffding_tree --input_model_file tree.bin --test_file test_set.arff \
    --predictions_file predictions.csv --probabilities_file class_probs.csv

OPTIONAL INPUT OPTIONS
--batch_mode (-b) [bool]
       If true, samples will be considered in batch instead of as a
       stream. This generally results in better trees but at the cost
       of memory usage and runtime.

--bins (-B) [int]
       If the 'domingos' split strategy is used, this specifies the
       number of bins for each numeric split. Default value 10.

--confidence (-c) [double]
       Confidence before splitting (between 0 and 1). Default value
       0.95.

--help (-h) [bool]
       Default help info.

--info [string]
       Get help on a specific module or option. Default value ''.

--info_gain (-i) [bool]
       If set, information gain is used instead of Gini impurity for
       calculating Hoeffding bounds.

--input_model_file (-m) [unknown]
       Input trained Hoeffding tree model. Default value ''.

--labels_file (-l) [string]
       Labels for training dataset. Default value ''.

--max_samples (-n) [int]
       Maximum number of samples before splitting. Default value 5000.

--min_samples (-I) [int]
       Minimum number of samples before splitting. Default value 100.

--numeric_split_strategy (-N) [string]
       The splitting strategy to use for numeric features: 'domingos'
       or 'binary'. Default value 'binary'.

--observations_before_binning (-o) [int]
       If the 'domingos' split strategy is used, this specifies the
       number of samples observed before binning is performed. Default
       value 100.

--passes (-s) [int]
       Number of passes to take over the dataset. Default value 1.

--test_file (-T) [string]
       Testing dataset (may be categorical). Default value ''.

--test_labels_file (-L) [string]
       Labels of test data. Default value ''.

--training_file (-t) [string]
       Training dataset (may be categorical). Default value ''.

--verbose (-v) [bool]
       Display informational messages and the full list of parameters
       and timers at the end of execution.

--version (-V) [bool]
       Display the version of mlpack.

OPTIONAL OUTPUT OPTIONS
--output_model_file (-M) [unknown]
       Output for trained Hoeffding tree model. Default value ''.

--predictions_file (-p) [string]
       Matrix to output label predictions for test data into. Default
       value ''.

--probabilities_file (-P) [string]
       In addition to predicting labels, provide prediction
       probabilities in this matrix. Default value ''.

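The interplay of '--confidence', '--min_samples' and '--max_samples'
can be illustrated with the standard Hoeffding bound: with probability
1 - delta, the mean of a quantity with range R observed over n samples
lies within eps = sqrt(R^2 * ln(1 / delta) / (2 * n)) of its true mean.
The sketch below assumes, without confirmation from this page, that the
confidence value plays the role of 1 - delta; the helper name
hoeffding_epsilon and the range R = 1 are hypothetical.

```shell
#!/bin/sh
# hoeffding_epsilon CONFIDENCE SAMPLES RANGE  (hypothetical helper)
# Prints eps = sqrt(R^2 * ln(1 / (1 - confidence)) / (2 * n)).
hoeffding_epsilon() {
  awk -v c="$1" -v n="$2" -v r="$3" \
    'BEGIN { printf "%.4f\n", sqrt(r * r * log(1 / (1 - c)) / (2 * n)) }'
}

# The bound tightens as more samples are seen at a fixed confidence.
for n in 100 1000 5000; do
  echo "n=$n eps=$(hoeffding_epsilon 0.95 "$n" 1)"
done
```

Under these assumptions, a higher confidence widens the bound at a given
sample count, so a node typically needs more samples before one split
can be preferred over another.
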
ADDITIONAL INFORMATION
For further information, including relevant papers, citations, and
theory, consult the documentation found at http://www.mlpack.org or
included with your distribution of mlpack.

mlpack-3.0.4 21 February 2019 mlpack_hoeffding_tree(1)