mlpack_preprocess_split(1)            User Commands            mlpack_preprocess_split(1)

NAME
mlpack_preprocess_split - split data

SYNOPSIS
mlpack_preprocess_split -i string [-I string] [-s int] [-r double] [-V bool] [-T string] [-L string] [-t string] [-l string] [-h -v]

DESCRIPTION
This utility takes a dataset and, optionally, labels, and splits them into
a training set and a test set. Before the split, the points in the
dataset are randomly reordered. The fraction of the dataset to be used
as the test set can be specified with the '--test_ratio (-r)' parameter;
the default is 0.2 (20%).

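To get a feel for how '--test_ratio (-r)' translates into set sizes, the
arithmetic can be sketched in shell. This is only a rough sketch: it
assumes the test set size is the ratio times the number of points,
rounded down, with the remaining points going to the training set. The
rounding rule is an assumption, not something stated on this page, and
'split_sizes' is a hypothetical helper, not part of mlpack.

```shell
#!/bin/sh
# split_sizes N RATIO: print "<train> <test>" point counts.
# Assumption: test size = floor(N * RATIO); the rest is training data.
split_sizes() {
  n=$1
  ratio=$2
  n_test=$(awk -v n="$n" -v r="$ratio" 'BEGIN { printf "%d", int(n * r) }')
  echo "$((n - n_test)) $n_test"
}

split_sizes 1000 0.2   # default ratio: 20% test set
split_sizes 10 0.4     # 40% test set
```

Under that assumption, 1000 points with the default ratio would give 800
training points and 200 test points.
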
The output training and test matrices may be saved with the
'--training_file (-t)' and '--test_file (-T)' output parameters.

Optionally, labels can also be split along with the data by specifying
the '--input_labels_file (-I)' parameter. Splitting labels works the
same way as splitting the data. The output training and test labels may
be saved with the '--training_labels_file (-l)' and
'--test_labels_file (-L)' output parameters, respectively.

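Because the data and the labels are reordered and cut together, each row
of the output labels still describes the matching row of the output data.
The following is only an emulation of that idea with standard POSIX
tools, not mlpack's implementation. 'paired_split' and its arguments are
hypothetical, and the sketch assumes one point per line, one label per
line, and no tab or '|' characters in either file.

```shell
#!/bin/sh
# paired_split DATA LABELS N_TEST SEED PREFIX
# Emulation only: glue each label to its data row, shuffle the glued rows,
# cut them into test and training parts, then separate data and labels.
paired_split() {
  data=$1; labels=$2; n_test=$3; seed=$4; prefix=$5

  # Prefix every glued row with a random key and sort on it to shuffle.
  paste -d '|' "$labels" "$data" |
    awk -v seed="$seed" 'BEGIN { srand(seed) } { print rand() "\t" $0 }' |
    sort -n | cut -f 2- > "$prefix.shuffled"

  # The first N_TEST shuffled rows form the test set, the rest the training set.
  head -n "$n_test" "$prefix.shuffled" > "$prefix.test"
  tail -n "+$((n_test + 1))" "$prefix.shuffled" > "$prefix.train"

  # Split each part back into a label file and a data file.
  for part in train test; do
    cut -d '|' -f 1  "$prefix.$part" > "${prefix}_y_$part.csv"
    cut -d '|' -f 2- "$prefix.$part" > "${prefix}_X_$part.csv"
  done
  rm -f "$prefix.shuffled" "$prefix.train" "$prefix.test"
}
```

For instance, 'paired_split X.csv y.csv 30 42 demo' would write
'demo_X_train.csv', 'demo_y_train.csv', 'demo_X_test.csv', and
'demo_y_test.csv', with 30 rows in each test file.
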
As a simple example, to split the dataset 'X.csv' into 'X_train.csv' and
'X_test.csv', with 60% of the data in the training set and 40% in the
test set, we could run

$ mlpack_preprocess_split --input_file X.csv --training_file X_train.csv \
    --test_file X_test.csv --test_ratio 0.4

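After a run like the one above, a quick sanity check is to confirm that
the two output files together contain as many points as the input. This
is a minimal sketch assuming CSV files with one point per line, no header
row, and a newline at the end of every line. 'check_split' is a
hypothetical helper, and the mlpack call is skipped when the binary or
'X.csv' is not present.

```shell
#!/bin/sh
# check_split INPUT TRAIN TEST: succeed only if TRAIN and TEST together
# have exactly as many lines as INPUT.
check_split() {
  n_in=$(wc -l < "$1")
  n_train=$(wc -l < "$2")
  n_test=$(wc -l < "$3")
  echo "input=$((n_in)) train=$((n_train)) test=$((n_test))"
  [ "$((n_train + n_test))" -eq "$((n_in))" ]
}

# Only run the real split when mlpack and the input file are available.
if command -v mlpack_preprocess_split >/dev/null 2>&1 && [ -f X.csv ]; then
  mlpack_preprocess_split --input_file X.csv --training_file X_train.csv \
      --test_file X_test.csv --test_ratio 0.4
  check_split X.csv X_train.csv X_test.csv
fi
```
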
If we had a dataset 'X.csv' and associated labels 'y.csv', and we
wanted to split these into 'X_train.csv', 'y_train.csv', 'X_test.csv',
and 'y_test.csv', with 30% of the data in the test set, we could run

$ mlpack_preprocess_split --input_file X.csv --input_labels_file y.csv \
    --test_ratio 0.3 --training_file X_train.csv \
    --training_labels_file y_train.csv --test_file X_test.csv \
    --test_labels_file y_test.csv

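Since the points are reordered randomly, two runs will generally produce
different splits unless the seed is fixed. The sketch below passes a
non-zero '--seed (-s)' so that the run can be repeated, and then checks
that each data file has as many lines as its label file. The seed value
42 and the helper 'same_rows' are illustrative, and the expectation that
a fixed seed reproduces the same split on the same machine and mlpack
version is an assumption rather than a guarantee from this page.

```shell
#!/bin/sh
# same_rows A B: succeed only if files A and B have the same number of lines.
same_rows() {
  a=$(wc -l < "$1")
  b=$(wc -l < "$2")
  [ "$((a))" -eq "$((b))" ]
}

# Only run the real split when mlpack and both input files are available.
if command -v mlpack_preprocess_split >/dev/null 2>&1 &&
   [ -f X.csv ] && [ -f y.csv ]; then
  mlpack_preprocess_split --input_file X.csv --input_labels_file y.csv \
      --test_ratio 0.3 --seed 42 \
      --training_file X_train.csv --training_labels_file y_train.csv \
      --test_file X_test.csv --test_labels_file y_test.csv
  same_rows X_train.csv y_train.csv && same_rows X_test.csv y_test.csv &&
    echo "data and labels line up"
fi
```
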
REQUIRED INPUT OPTIONS
--input_file (-i) [string]
       Matrix containing data.

OPTIONAL INPUT OPTIONS
--help (-h) [bool]
       Default help info.

--info [string]
       Get help on a specific module or option. Default value ''.

--input_labels_file (-I) [string]
       Matrix containing labels. Default value ''.

--seed (-s) [int]
       Random seed (0 for std::time(NULL)). Default value 0.

--test_ratio (-r) [double]
       Fraction of the dataset to use as the test set. Default value
       0.2.

--verbose (-v) [bool]
       Display informational messages and the full list of parameters
       and timers at the end of execution.

--version (-V) [bool]
       Display the version of mlpack.

OPTIONAL OUTPUT OPTIONS
--test_file (-T) [string]
       Matrix to save test data to. Default value ''.

--test_labels_file (-L) [string]
       Matrix to save test labels to. Default value ''.

--training_file (-t) [string]
       Matrix to save training data to. Default value ''.

--training_labels_file (-l) [string]
       Matrix to save training labels to. Default value ''.

ADDITIONAL INFORMATION
For further information, including relevant papers, citations, and
theory, consult the documentation found at http://www.mlpack.org or
included with your distribution of mlpack.

mlpack-3.0.4                     21 February 2019        mlpack_preprocess_split(1)