mlpack_hoeffding_tree(1) General Commands Manual mlpack_hoeffding_tree(1)

mlpack_hoeffding_tree - Hoeffding trees

mlpack_hoeffding_tree [-h] [-v]

This program implements Hoeffding trees, a form of streaming decision
tree best suited to large (or streaming) datasets. This program supports
both categorical and numeric data stored in the ARFF format. Given an
input dataset, this program can train the tree with numerous training
options and save the model to a file. The program can also use a trained
model, or a model loaded from a file, to predict classes for a given
test set.

The training file and associated labels are specified with the
--training_file (-t) and --labels_file (-l) options, respectively. The
training file must be in ARFF format. Training may be performed in batch
mode (like a typical decision tree algorithm) by specifying the
--batch_mode (-b) option, but this costs more memory and runtime, so it
may not be the best choice for large datasets.

When a model is trained, it may be saved to a file with the
--output_model_file (-M) option. A model may be loaded from a file for
further training or testing with the --input_model_file (-m) option.

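As a rough illustration of the training options described above, the following Python sketch assembles an argument list for a training run. It only uses flags documented in this page; the helper name build_train_command and the file names are hypothetical, and nothing is executed.

```python
def build_train_command(training_file, labels_file, output_model_file,
                        batch_mode=False, passes=1):
    """Return the argument list for a training run (nothing is executed)."""
    cmd = [
        "mlpack_hoeffding_tree",
        "--training_file", training_file,        # must be in ARFF format
        "--labels_file", labels_file,
        "--output_model_file", output_model_file,
        "--passes", str(passes),
    ]
    if batch_mode:
        # Boolean flag: present means batch training instead of streaming.
        cmd.append("--batch_mode")
    return cmd


# Hypothetical file names, used only for illustration.
example = build_train_command("train.arff", "train_labels.csv", "tree.bin")
print(" ".join(example))
```

Assuming the mlpack_hoeffding_tree binary is on the PATH and the files exist, the returned list could be handed to subprocess.run(cmd, check=True) to perform the run.
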
A test file may be specified with the --test_file (-T) option, and if
performance numbers are desired for that test set, labels may be
specified with the --test_labels_file (-L) option. Predictions for each
test point will be stored in the file specified by --predictions_file
(-p), and probabilities for each prediction will be stored in the file
specified by the --probabilities_file (-P) option.

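To make the idea of performance numbers concrete, here is a minimal Python sketch that compares predicted labels against true labels and reports accuracy. It is an offline illustration, not the program's own reporting code. The helpers parse_labels and accuracy are hypothetical, and the one-integer-label-per-line layout is an assumption about the file contents rather than something this page specifies.

```python
def parse_labels(text):
    """Parse labels from text, assuming one integer label per non-empty line."""
    return [int(line) for line in text.splitlines() if line.strip()]


def accuracy(predicted, actual):
    """Return the fraction of predictions that match the true labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not predicted:
        raise ValueError("at least one label is required")
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(predicted)


# Inline stand-ins for the contents of a predictions file and a labels file.
predicted = parse_labels("0\n1\n1\n0\n")
actual = parse_labels("0\n1\n0\n0\n")
print(accuracy(predicted, actual))
```
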
--batch_mode (-b)
If true, samples will be considered in batch instead of as a
stream. This generally results in better trees but at the cost
of memory usage and runtime.

--bins (-B) [int]
If the 'domingos' split strategy is used, this specifies the
number of bins for each numeric split. Default value 10.

--confidence (-c) [double]
Confidence before splitting (between 0 and 1). Default value
0.95.

--help (-h)
Default help info.

--info [string]
Get help on a specific module or option. Default value ''.

--info_gain (-i)
If set, information gain is used instead of Gini impurity for
calculating Hoeffding bounds.

--input_model_file (-m) [string]
File to load trained tree from. Default value ''.

--labels_file (-l) [string]
Labels for training dataset. Default value ''.

--max_samples (-n) [int]
Maximum number of samples before splitting. Default value 5000.

--min_samples (-I) [int]
Minimum number of samples before splitting. Default value 100.

--numeric_split_strategy (-N) [string]
The splitting strategy to use for numeric features: 'domingos' or
'binary'. Default value 'binary'.

--observations_before_binning (-o) [int]
If the 'domingos' split strategy is used, this specifies the number of
samples observed before binning is performed. Default value 100.

--passes (-s) [int]
Number of passes to take over the dataset. Default value 1.

--test_file (-T) [string]
File of testing data. Default value ''.

--test_labels_file (-L) [string]
Labels of test data. Default value ''.

--training_file (-t) [string]
Training dataset file. Default value ''.

--verbose (-v)
Display informational messages and the full list of parameters
and timers at the end of execution.

--version (-V)
Display the version of mlpack.

--output_model_file (-M) [string]
File to save trained tree to. Default value ''.

--predictions_file (-p) [string]
File to output label predictions for test data into. Default value ''.

--probabilities_file (-P) [string]
In addition to predicting labels, provide prediction probabilities in
this file. Default value ''.

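The --confidence, --min_samples, and --max_samples options all govern when a node splits. The Python sketch below shows the textbook Hoeffding bound and one plausible way these three settings could interact. It is an illustration under stated assumptions, not mlpack's implementation: the function names are hypothetical, value_range stands for the range R of the split criterion (for example log2 of the number of classes for information gain), and treating --max_samples as a forced split is an assumption.

```python
import math


def hoeffding_epsilon(value_range, confidence, n_samples):
    """Textbook Hoeffding bound: sqrt(R^2 * ln(1 / delta) / (2 * n))."""
    delta = 1.0 - confidence  # allowed probability of a wrong decision
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta)
                     / (2.0 * n_samples))


def should_split(best_gain, second_gain, value_range, confidence, n_samples,
                 min_samples=100, max_samples=5000):
    """Decide whether to split, assuming the interaction described above."""
    if n_samples < min_samples:
        return False  # too few samples seen to consider a split
    if n_samples >= max_samples:
        return True   # assumption: reaching the maximum forces a split
    # Split once the best candidate beats the runner-up by more than epsilon.
    epsilon = hoeffding_epsilon(value_range, confidence, n_samples)
    return (best_gain - second_gain) > epsilon


print(hoeffding_epsilon(1.0, 0.95, 100))
print(should_split(0.30, 0.10, 1.0, 0.95, 100))
```

The bound shrinks as more samples arrive, so a small gap between the two best candidate splits needs more samples before it is trusted, while a higher confidence widens the bound and delays splitting.
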
For further information, including relevant papers, citations, and
theory, consult the documentation found at http://www.mlpack.org or
included with your distribution of mlpack.

mlpack_hoeffding_tree(1)