mlpack_preprocess_split(1) General Commands Manual mlpack_preprocess_split(1)

mlpack_preprocess_split - split data

mlpack_preprocess_split [-h] [-v]

This utility takes a dataset and, optionally, labels, and splits them into
a training set and a test set. Before the split, the points in the
dataset are randomly reordered. The fraction of the dataset to be
used as the test set can be specified with the --test_ratio (-r)
option; the default is 0.2 (20%).
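
The shuffle-then-split behaviour described above can be illustrated with
a small shell sketch built from standard tools (GNU shuf, head, tail)
rather than mlpack itself. The function name split_rows and the rounding
rule (test set size = number of rows times the ratio, rounded down) are
assumptions made for illustration; this is not mlpack's implementation,
and it assumes one point per line with no header line.

```shell
#!/bin/sh
# Conceptual sketch only: mimics "shuffle, then cut at the test ratio"
# with standard tools. This is NOT how mlpack implements the split.
split_rows() {
  input=$1; train_out=$2; test_out=$3; ratio=$4
  total=$(wc -l < "$input")
  # Assumption: test set size is the row count times the ratio, rounded down.
  n_test=$(awk -v n="$total" -v r="$ratio" 'BEGIN { printf "%d", n * r }')
  shuffled=$(mktemp)
  # Randomly reorder the rows before cutting them into two pieces.
  shuf "$input" > "$shuffled"
  # The first n_test shuffled rows form the test set, the rest the training set.
  head -n "$n_test" "$shuffled" > "$test_out"
  tail -n +"$((n_test + 1))" "$shuffled" > "$train_out"
  rm -f "$shuffled"
}
```

Calling split_rows dataset.csv train.csv test.csv 0.2 would leave
dataset.csv untouched and write the two pieces to the named files.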

The program does not modify the original file; instead, it writes the
training and test sets to separate files. You must specify the output
file names with --training_file (-t) and --test_file (-T).

Optionally, labels can also be split along with the data by specifying
the --input_labels_file (-I) option. Splitting labels works the
same way as splitting the data. The output training and test labels
will be saved to the files specified by --training_labels_file (-l) and
--test_labels_file (-L), respectively.
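
For a split with labels to be useful, each label has to stay with its
point through the reordering. A minimal shell sketch of that idea, using
standard tools (paste, GNU shuf, cut) rather than mlpack, is given here.
The function name split_with_labels, the fixed output file names, and the
round-down rule for the test set size are assumptions for illustration
only, and the sketch assumes one point and one label per line with no
tab characters in either file.

```shell
#!/bin/sh
# Conceptual sketch only, not mlpack's implementation.
# Assumptions: one point per line, one label per line in the same order,
# and no tab characters in either file.
split_with_labels() {
  data=$1; labels=$2; ratio=$3; out_dir=$4
  total=$(wc -l < "$data")
  # Assumption: test set size is the row count times the ratio, rounded down.
  n_test=$(awk -v n="$total" -v r="$ratio" 'BEGIN { printf "%d", n * r }')
  paired=$(mktemp)
  # Glue each row to its label with a tab so both are shuffled together.
  paste "$data" "$labels" | shuf > "$paired"
  # Column 1 is the data row, column 2 is its label.
  head -n "$n_test" "$paired" | cut -f1 > "$out_dir/test_set.csv"
  head -n "$n_test" "$paired" | cut -f2 > "$out_dir/test_labels.csv"
  tail -n +"$((n_test + 1))" "$paired" | cut -f1 > "$out_dir/training_set.csv"
  tail -n +"$((n_test + 1))" "$paired" | cut -f2 > "$out_dir/training_labels.csv"
  rm -f "$paired"
}
```

Because data and labels are shuffled as one joined file, line k of
training_labels.csv always belongs to line k of training_set.csv.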

For example, to split dataset.csv into train.csv and test.csv with 60%
of the data in the training set and 40% in the test set, we could run

$ mlpack_preprocess_split -i dataset.csv -t train.csv -T test.csv -r 0.4
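
After a split like this, a quick sanity check is to confirm that the two
output files together hold as many rows as the input. The helper below is
a hypothetical sketch, not part of mlpack; the name check_split is made
up for illustration, and it assumes one point per line and no header line
in any of the files.

```shell
#!/bin/sh
# Hypothetical helper, not part of mlpack: checks that the two output files
# together contain as many lines as the input file.
# Assumption: one point per line and no header line in any of the files.
check_split() {
  n_in=$(wc -l < "$1")
  n_train=$(wc -l < "$2")
  n_test=$(wc -l < "$3")
  echo "input=$n_in train=$n_train test=$n_test"
  # Succeeds (exit status 0) only when the counts add up.
  [ "$((n_train + n_test))" -eq "$n_in" ]
}
```

It could be used as: check_split dataset.csv train.csv test.csv && echo "row counts match"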

If we had a dataset in dataset.csv and associated labels in labels.csv,
and we wanted to split these into training_set.csv, training_labels.csv,
test_set.csv, and test_labels.csv, with 30% of the data
in the test set, we could run

$ mlpack_preprocess_split -i dataset.csv -I labels.csv -r 0.3 \
    -t training_set.csv -l training_labels.csv -T test_set.csv \
    -L test_labels.csv

--input_file (-i) [string]
File containing data.

--help (-h)
Default help info.

--info [string]
Get help on a specific module or option. Default value ''.

--input_labels_file (-I) [string]
File containing labels. Default value ''.

--seed (-s) [int]
Random seed (0 for std::time(NULL)). Default value 0.

--test_ratio (-r) [double]
Ratio of the dataset to use as the test set. Default value 0.2.

--verbose (-v)
Display informational messages and the full list of parameters
and timers at the end of execution.

--version (-V)
Display the version of mlpack.

--test_file (-T) [string]
File name to save test data. Default value ''.

--test_labels_file (-L) [string]
File name to save test labels. Default value ''.

--training_file (-t) [string]
File name to save training data. Default value ''.

--training_labels_file (-l) [string]
File name to save training labels. Default value ''.

For further information, including relevant papers, citations, and theory,
consult the documentation found at http://www.mlpack.org or included with
your distribution of mlpack.

mlpack_preprocess_split(1)