mlpack_hoeffding_tree(1)    General Commands Manual   mlpack_hoeffding_tree(1)

NAME

       mlpack_hoeffding_tree - Hoeffding trees

SYNOPSIS

       mlpack_hoeffding_tree [-h] [-v]

DESCRIPTION

       This program implements Hoeffding trees, a form of streaming decision
       tree best suited for large (or streaming) datasets. This program
       supports both categorical and numeric data stored in the ARFF format.
       Given an input dataset, this program is able to train the tree with
       numerous training options, and save the model to a file. The program is
       also able to use a trained model or a model from file in order to
       predict classes for a given test set.

       The training file and associated labels are specified with the
       --training_file (-t) and --labels_file (-l) options, respectively. The
       training file must be in ARFF format. Training may be performed in
       batch mode (like a typical decision tree algorithm) by specifying the
       --batch_mode (-b) option, but this may not be the best option for large
       datasets.

       When a model is trained, it may be saved to a file with the
       --output_model_file (-M) option. A model may be loaded from file for
       further training or testing with the --input_model_file (-m) option.

       A test file may be specified with the --test_file (-T) option, and if
       performance numbers are desired for that test set, labels may be
       specified with the --test_labels_file (-L) option. Predictions for each
       test point will be stored in the file specified by --predictions_file
       (-p), and probabilities for each prediction will be stored in the file
       specified by the --probabilities_file (-P) option.
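
       The workflow described above (train, save the model, predict on a test
       set) can be sketched as a small Python helper that assembles the
       command line. This is only an illustration: the helper name
       hoeffding_tree_command and all file names are hypothetical, and only
       options documented in this page are used.

```python
import shlex


def hoeffding_tree_command(training_file, labels_file, output_model_file=None,
                           test_file=None, predictions_file=None,
                           batch_mode=False):
    """Build an argument list for mlpack_hoeffding_tree (hypothetical helper)."""
    args = ["mlpack_hoeffding_tree",
            "--training_file", training_file,
            "--labels_file", labels_file]
    if batch_mode:
        args.append("--batch_mode")
    if output_model_file:
        args += ["--output_model_file", output_model_file]
    if test_file:
        args += ["--test_file", test_file]
    if predictions_file:
        args += ["--predictions_file", predictions_file]
    return args


# Hypothetical file names, for illustration only.
cmd = hoeffding_tree_command("train.arff", "train_labels.csv",
                             output_model_file="tree.bin",
                             test_file="test.arff",
                             predictions_file="predictions.csv")
print(shlex.join(cmd))
```

       The resulting list could be passed to subprocess.run(cmd, check=True)
       on a system where the program is installed.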

OPTIONAL INPUT OPTIONS

       --batch_mode (-b)
              If true, samples will be considered in batch instead of as a
              stream. This generally results in better trees but at the cost
              of memory usage and runtime.

       --bins (-B) [int]
              If the 'domingos' split strategy is used, this specifies the
              number of bins for each numeric split. Default value 10.

       --confidence (-c) [double]
              Confidence before splitting (between 0 and 1). Default value
              0.95.

       --help (-h)
              Default help info.

       --info [string]
              Get help on a specific module or option. Default value ''.

       --info_gain (-i)
              If set, information gain is used instead of Gini impurity for
              calculating Hoeffding bounds.

       --input_model_file (-m) [string]
              File to load trained tree from. Default value ''.

       --labels_file (-l) [string]
              Labels for training dataset. Default value ''.

       --max_samples (-n) [int]
              Maximum number of samples before splitting. Default value 5000.

       --min_samples (-I) [int]
              Minimum number of samples before splitting. Default value 100.

       --numeric_split_strategy (-N) [string]
              The splitting strategy to use for numeric features: 'domingos'
              or 'binary'. Default value 'binary'.

       --observations_before_binning (-o) [int]
              If the 'domingos' split strategy is used, this specifies the
              number of samples observed before binning is performed. Default
              value 100.

       --passes (-s) [int]
              Number of passes to take over the dataset. Default value 1.

       --test_file (-T) [string]
              File of testing data. Default value ''.

       --test_labels_file (-L) [string]
              Labels of test data. Default value ''.

       --training_file (-t) [string]
              Training dataset file. Default value ''.

       --verbose (-v)
              Display informational messages and the full list of parameters
              and timers at the end of execution.

       --version (-V)
              Display the version of mlpack.
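
       To illustrate how --confidence, --min_samples, --max_samples and
       --info_gain might interact, the following Python sketch applies the
       standard Hoeffding bound to the gap between the two best candidate
       splits. It is a sketch under stated assumptions, not the program's
       actual code: the function names are hypothetical, the ranges used for
       Gini impurity and information gain are the usual textbook ones, and
       the handling of the sample limits is one plausible reading of the
       option descriptions above.

```python
import math


def hoeffding_epsilon(value_range, confidence, num_samples):
    """Standard Hoeffding bound: sqrt(R^2 * ln(1 / delta) / (2 * n))."""
    delta = 1.0 - confidence  # allowed probability of a wrong decision
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta)
                     / (2.0 * num_samples))


def gain_range(num_classes, info_gain=False):
    """Assumed range R of the split criterion for the given class count."""
    if info_gain:
        return math.log2(num_classes)   # information gain
    return 1.0 - 1.0 / num_classes      # Gini impurity


def should_split(best_gain, second_gain, num_classes, num_samples,
                 confidence=0.95, min_samples=100, max_samples=5000,
                 info_gain=False):
    """Decide whether to split (assumed reading of the sample limits)."""
    if num_samples < min_samples:
        return False                    # too few samples seen so far
    if num_samples >= max_samples:
        return True                     # assumed: force a split at the cap
    eps = hoeffding_epsilon(gain_range(num_classes, info_gain),
                            confidence, num_samples)
    return best_gain - second_gain > eps


print(hoeffding_epsilon(gain_range(2), 0.95, 1000))
print(should_split(0.30, 0.25, num_classes=2, num_samples=1000))
```

       A higher confidence or a wider criterion range enlarges the bound, so
       more samples are needed before the best split can be separated from
       the runner-up.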

OPTIONAL OUTPUT OPTIONS

       --output_model_file (-M) [string]
              File to save trained tree to. Default value ''.

       --predictions_file (-p) [string]
              File to output label predictions for test data into. Default
              value ''.

       --probabilities_file (-P) [string]
              In addition to predicting labels, provide prediction
              probabilities in this file. Default value ''.

ADDITIONAL INFORMATION

       For further information, including relevant papers, citations, and
       theory, consult the documentation found at http://www.mlpack.org or
       included with your distribution of mlpack.

                                                      mlpack_hoeffding_tree(1)