mlpack_hoeffding_tree(1)         User Commands        mlpack_hoeffding_tree(1)

NAME

       mlpack_hoeffding_tree - Hoeffding trees

SYNOPSIS

       mlpack_hoeffding_tree [-b bool] [-B int] [-c double] [-i bool] [-m unknown] [-l string] [-n int] [-I int] [-N string] [-o int] [-s int] [-T string] [-L string] [-t string] [-V bool] [-M unknown] [-p string] [-P string] [-h -v]

DESCRIPTION

       This program implements Hoeffding trees, a form of streaming decision
       tree best suited for large (or streaming) datasets. This program
       supports both categorical and numeric data. Given an input dataset,
       this program can train the tree with numerous training options and
       save the model to a file. The program can also use a trained model or
       a model loaded from file to predict classes for a given test set.

       The training file and associated labels are specified with the
       '--training_file (-t)' and '--labels_file (-l)' parameters,
       respectively. If '--labels_file (-l)' is not specified, the labels
       are assumed to be the last dimension of the training dataset.

       The training may be performed in batch mode (like a typical decision
       tree algorithm) by specifying the '--batch_mode (-b)' option, but this
       may not be the best option for large datasets.

       When a model is trained, it may be saved via the
       '--output_model_file (-M)' output parameter. A model may be loaded
       from file for further training or testing with the
       '--input_model_file (-m)' parameter.

       Test data may be specified with the '--test_file (-T)' parameter, and
       if performance statistics are desired for that test set, labels may be
       specified with the '--test_labels_file (-L)' parameter. Predictions
       for each test point may be saved with the '--predictions_file (-p)'
       output parameter, and class probabilities for each prediction may be
       saved with the '--probabilities_file (-P)' output parameter.

       For example, to train a Hoeffding tree with confidence 0.99 on the
       data 'dataset.arff', saving the trained tree to 'tree.bin', the
       following command may be used:

       $ mlpack_hoeffding_tree --training_file dataset.arff --confidence 0.99 --output_model_file tree.bin

       Then, this tree may be used to make predictions on the test set
       'test_set.arff', saving the predictions into 'predictions.csv' and the
       class probabilities into 'class_probs.csv' with the following command:

       $ mlpack_hoeffding_tree --input_model_file tree.bin --test_file test_set.arff --predictions_file predictions.csv --probabilities_file class_probs.csv
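
       The separate labels file, continued training, and test-set evaluation
       described above can be sketched as small shell functions. This is a
       minimal sketch, not part of mlpack: the function names and argument
       order are hypothetical, and only options documented on this page are
       passed. Adding '--verbose (-v)' to the evaluation step is an
       assumption, made on the basis that performance statistics are
       reported as informational messages.

```shell
# Hypothetical helpers; only the mlpack_hoeffding_tree options come from this page.

# Train with labels kept in a separate file.
# $1 = training data, $2 = training labels, $3 = output model
hoeffding_train() {
  mlpack_hoeffding_tree \
    --training_file "$1" \
    --labels_file "$2" \
    --output_model_file "$3"
}

# Continue training a previously saved model on further data.
# $1 = existing model, $2 = additional training data, $3 = updated model
hoeffding_update() {
  mlpack_hoeffding_tree \
    --input_model_file "$1" \
    --training_file "$2" \
    --output_model_file "$3"
}

# Evaluate a saved model on a labeled test set.
# $1 = model, $2 = test data, $3 = test labels
# --verbose is an assumption: it is expected to make the statistics visible.
hoeffding_evaluate() {
  mlpack_hoeffding_tree \
    --input_model_file "$1" \
    --test_file "$2" \
    --test_labels_file "$3" \
    --verbose
}
```

       With placeholder file names, a call such as 'hoeffding_update tree.bin
       more_data.arff tree_updated.bin' would load 'tree.bin', train further
       on 'more_data.arff', and save the result to 'tree_updated.bin'.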

OPTIONAL INPUT OPTIONS

       --batch_mode (-b) [bool]
              If true, samples will be considered in batch instead of as a
              stream. This generally results in better trees but at the cost
              of memory usage and runtime.

       --bins (-B) [int]
              If the 'domingos' split strategy is used, this specifies the
              number of bins for each numeric split. Default value 10.

       --confidence (-c) [double]
              Confidence before splitting (between 0 and 1). Default value
              0.95.

       --help (-h) [bool]
              Default help info.

       --info [string]
              Get help on a specific module or option. Default value ''.

       --info_gain (-i) [bool]
              If set, information gain is used instead of Gini impurity for
              calculating Hoeffding bounds.

       --input_model_file (-m) [unknown]
              Input trained Hoeffding tree model. Default value ''.

       --labels_file (-l) [string]
              Labels for training dataset. Default value ''.

       --max_samples (-n) [int]
              Maximum number of samples before splitting. Default value 5000.

       --min_samples (-I) [int]
              Minimum number of samples before splitting. Default value 100.

       --numeric_split_strategy (-N) [string]
              The splitting strategy to use for numeric features: 'domingos'
              or 'binary'. Default value 'binary'.

       --observations_before_binning (-o) [int]
              If the 'domingos' split strategy is used, this specifies the
              number of samples observed before binning is performed. Default
              value 100.

       --passes (-s) [int]
              Number of passes to take over the dataset. Default value 1.

       --test_file (-T) [string]
              Testing dataset (may be categorical). Default value ''.

       --test_labels_file (-L) [string]
              Labels of test data. Default value ''.

       --training_file (-t) [string]
              Training dataset (may be categorical). Default value ''.

       --verbose (-v) [bool]
              Display informational messages and the full list of parameters
              and timers at the end of execution.

       --version (-V) [bool]
              Display the version of mlpack.
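
       As an illustration of how the options above combine, the following
       sketch trains with the 'domingos' numeric split strategy. The
       function name, the argument order, and every numeric value are
       hypothetical choices for illustration, not recommendations.

```shell
# Hypothetical helper: train with the 'domingos' numeric split strategy.
# $1 = training data, $2 = output model
# All numeric values below are illustrative only.
hoeffding_train_domingos() {
  mlpack_hoeffding_tree \
    --training_file "$1" \
    --numeric_split_strategy domingos \
    --bins 20 \
    --observations_before_binning 200 \
    --confidence 0.99 \
    --min_samples 200 \
    --max_samples 10000 \
    --passes 2 \
    --output_model_file "$2"
}
```

       The '--bins (-B)' and '--observations_before_binning (-o)' options
       apply only because 'domingos' is selected; with the default 'binary'
       strategy they would be left out.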

OPTIONAL OUTPUT OPTIONS

       --output_model_file (-M) [unknown]
              Output for trained Hoeffding tree model. Default value ''.

       --predictions_file (-p) [string]
              Matrix to output label predictions for test data into. Default
              value ''.

       --probabilities_file (-P) [string]
              In addition to predicting labels, provide prediction
              probabilities in this matrix. Default value ''.
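
       A saved predictions file can be summarized with standard tools. The
       sketch below counts how often each label was predicted. It assumes
       the file holds one predicted label per line, which is not stated on
       this page and should be checked against the actual output; the
       function name is hypothetical.

```shell
# Hypothetical helper: count predictions per label.
# Assumption: the predictions file has one predicted label per line.
# $1 = predictions file; prints "label count" pairs sorted by label.
count_predictions() {
  awk 'NF { count[$1]++ } END { for (label in count) print label, count[label] }' "$1" | sort
}
```

       For example, 'count_predictions predictions.csv' would print one line
       per distinct label followed by the number of test points assigned to
       it.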

ADDITIONAL INFORMATION

       For further information, including relevant papers, citations, and
       theory, consult the documentation found at http://www.mlpack.org or
       included with your distribution of mlpack.

mlpack-3.0.4                   21 February 2019       mlpack_hoeffding_tree(1)