mlpack_preprocess_split(1)  General Commands Manual  mlpack_preprocess_split(1)

NAME

       mlpack_preprocess_split - split data

SYNOPSIS

       mlpack_preprocess_split [-h] [-v]

DESCRIPTION

       This utility takes a dataset, and optionally its labels, and splits
       them into a training set and a test set. Before the split, the points
       in the dataset are randomly reordered. The fraction of the dataset to
       use as the test set can be specified with the --test_ratio (-r)
       option; the default is 0.2 (20%).

       The program does not modify the original file; it writes the training
       and test sets to separate files, whose names you must specify with
       --training_file (-t) and --test_file (-T).

       Optionally, labels can be split along with the data by specifying the
       --input_labels_file (-I) option. Splitting labels works the same way
       as splitting the data. The output training and test labels are saved
       to the files specified by --training_labels_file (-l) and
       --test_labels_file (-L), respectively.

       For example, to split dataset.csv into train.csv and test.csv, with
       60% of the data in the training set and 40% in the test set, run

       $ mlpack_preprocess_split -i dataset.csv -t train.csv -T test.csv -r 0.4

       Given a dataset in dataset.csv and associated labels in labels.csv, to
       split them into training_set.csv, training_labels.csv, test_set.csv,
       and test_labels.csv, with 30% of the data in the test set, run

       $ mlpack_preprocess_split -i dataset.csv -I labels.csv -r 0.3 \
             -t training_set.csv -l training_labels.csv \
             -T test_set.csv -L test_labels.csv
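
       The shuffle-then-split behavior described above can be illustrated
       with a short Python sketch. This is not mlpack code: the function
       name shuffle_split and its parameters are hypothetical, the random
       number generator is Python's rather than mlpack's, and truncating
       the test-set size to a whole number of points is an assumption made
       for illustration. The sketch only shows why data and labels stay
       paired when both are reordered by the same permutation.

```python
import random


def shuffle_split(points, labels=None, test_ratio=0.2, seed=None):
    """Shuffle points (and labels, if given) with one permutation, then split."""
    if not 0.0 <= test_ratio <= 1.0:
        raise ValueError("test_ratio must be in [0, 1]")
    if labels is not None and len(labels) != len(points):
        raise ValueError("points and labels must have the same length")

    # One permutation drives both data and labels, so pairs stay aligned.
    order = list(range(len(points)))
    random.Random(seed).shuffle(order)

    # Assumption: the test-set size is truncated to a whole number of points.
    n_test = int(len(points) * test_ratio)
    test_idx, train_idx = order[:n_test], order[n_test:]

    split = {
        "train": [points[i] for i in train_idx],
        "test": [points[i] for i in test_idx],
    }
    if labels is not None:
        split["train_labels"] = [labels[i] for i in train_idx]
        split["test_labels"] = [labels[i] for i in test_idx]
    return split


if __name__ == "__main__":
    # Ten toy points with alternating labels; 30% go to the test set.
    data = [[float(i), float(i) * 2.0] for i in range(10)]
    tags = [i % 2 for i in range(10)]
    parts = shuffle_split(data, tags, test_ratio=0.3, seed=42)
    print(len(parts["train"]), len(parts["test"]))
```

       With ten points and a test ratio of 0.3, the sketch places three
       points in the test set and seven in the training set, and each label
       lands in the same set as its point.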

REQUIRED INPUT OPTIONS

       --input_file (-i) [string]
              File containing data.

OPTIONAL INPUT OPTIONS

       --help (-h)
              Default help info.

       --info [string]
              Get help on a specific module or option. Default value ''.

       --input_labels_file (-I) [string]
              File containing labels. Default value ''.

       --seed (-s) [int]
              Random seed (0 for std::time(NULL)). Default value 0.

       --test_ratio (-r) [double]
              Fraction of the dataset to use as the test set. Default value
              0.2.

       --verbose (-v)
              Display informational messages and the full list of parameters
              and timers at the end of execution.

       --version (-V)
              Display the version of mlpack.
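
       The --seed convention above (0 means "seed from the current time")
       can be sketched in Python as follows. The helper names resolve_seed
       and shuffled_indices are hypothetical, and Python's generator is not
       the one mlpack uses, so the permutations will not match those of the
       utility. The point is only that a fixed non-zero seed gives a
       repeatable reordering, while the default of 0 generally does not.

```python
import random
import time


def resolve_seed(seed=0):
    """Return the seed to use: 0 means 'seed from the current time'."""
    return int(time.time()) if seed == 0 else seed


def shuffled_indices(n, seed=0):
    """Return a permutation of range(n) drawn from the resolved seed."""
    order = list(range(n))
    random.Random(resolve_seed(seed)).shuffle(order)
    return order


if __name__ == "__main__":
    # The same non-zero seed reproduces the same reordering.
    print(shuffled_indices(8, seed=7) == shuffled_indices(8, seed=7))
```

       To obtain the same training and test sets on every run, pass a fixed
       non-zero value to --seed (-s).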

OPTIONAL OUTPUT OPTIONS

       --test_file (-T) [string]
              File name to save test data. Default value ''.

       --test_labels_file (-L) [string]
              File name to save test labels. Default value ''.

       --training_file (-t) [string]
              File name to save training data. Default value ''.

       --training_labels_file (-l) [string]
              File name to save training labels. Default value ''.

ADDITIONAL INFORMATION

       For further information, including relevant papers, citations, and
       theory, consult the documentation found at http://www.mlpack.org or
       included with your distribution of mlpack.

                                                    mlpack_preprocess_split(1)