mlpack_preprocess_split(1)       User Commands      mlpack_preprocess_split(1)

NAME

       mlpack_preprocess_split - split a dataset (and optional labels) into training and test sets

SYNOPSIS

       mlpack_preprocess_split -i string [-I string] [-s int] [-r double] [-V bool] [-T string] [-L string] [-t string] [-l string] [-h -v]

DESCRIPTION

       This utility takes a dataset, and optionally labels, and splits them
       into a training set and a test set. Before the split, the points in the
       dataset are randomly reordered. The fraction of the dataset to be used
       as the test set can be specified with the '--test_ratio (-r)'
       parameter; the default is 0.2 (20%).

       The output training and test matrices may be saved with the
       '--training_file (-t)' and '--test_file (-T)' output parameters.

       Optionally, labels can also be split along with the data by specifying
       the '--input_labels_file (-I)' parameter. Splitting labels works the
       same way as splitting the data. The output training and test labels may
       be saved with the '--training_labels_file (-l)' and
       '--test_labels_file (-L)' output parameters, respectively.
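
       The shuffle-then-split behaviour described above can be illustrated
       with a minimal Python sketch. It is not mlpack's implementation: the
       function name 'shuffle_split' is hypothetical, Python's random number
       generator stands in for mlpack's, and rounding the test set size down
       is an assumption.

```python
import random


def shuffle_split(points, labels, test_ratio, seed):
    """Shuffle points and labels together, then split them (illustrative)."""
    if not 0.0 <= test_ratio <= 1.0:
        raise ValueError("test_ratio must lie in [0, 1]")
    if labels is not None and len(labels) != len(points):
        raise ValueError("points and labels must have the same length")

    # A single permutation drives both data and labels, so each point
    # keeps its own label after the reordering.
    order = list(range(len(points)))
    random.Random(seed).shuffle(order)

    # Assumption: the test set size is rounded down.
    n_test = int(len(points) * test_ratio)
    test_idx, train_idx = order[:n_test], order[n_test:]

    def pick(seq, idx):
        return [seq[i] for i in idx]

    result = {"train": pick(points, train_idx), "test": pick(points, test_idx)}
    if labels is not None:
        result["train_labels"] = pick(labels, train_idx)
        result["test_labels"] = pick(labels, test_idx)
    return result
```

       Because data and labels share one permutation, the i-th training label
       always belongs to the i-th training point, which is the sense in which
       splitting labels "works the same way" as splitting the data.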

       As a simple example, to split the dataset 'X.csv' into 'X_train.csv'
       and 'X_test.csv' with 60% of the data in the training set and 40% of
       the data in the test set, we could run

       $ mlpack_preprocess_split --input_file X.csv --training_file X_train.csv
       --test_file X_test.csv --test_ratio 0.4

       If we had a dataset 'X.csv' and associated labels 'y.csv', and we
       wanted to split these into 'X_train.csv', 'y_train.csv', 'X_test.csv',
       and 'y_test.csv', with 30% of the data in the test set, we could run

       $ mlpack_preprocess_split --input_file X.csv --input_labels_file y.csv
       --test_ratio 0.3 --training_file X_train.csv --training_labels_file
       y_train.csv --test_file X_test.csv --test_labels_file y_test.csv
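
       When the utility is driven from a script, the command lines above can
       be assembled programmatically. The following Python sketch builds the
       argument list using only options documented on this page; the helper
       name 'build_split_command' is hypothetical, and the sketch does not run
       the tool itself.

```python
import shlex


def build_split_command(input_file, training_file, test_file, test_ratio=0.2,
                        input_labels_file=None, training_labels_file=None,
                        test_labels_file=None, seed=None):
    """Return the argument list for one mlpack_preprocess_split run."""
    argv = ["mlpack_preprocess_split",
            "--input_file", input_file,
            "--training_file", training_file,
            "--test_file", test_file,
            "--test_ratio", str(test_ratio)]
    # Label options are only added when the caller supplies them.
    if input_labels_file is not None:
        argv += ["--input_labels_file", input_labels_file]
    if training_labels_file is not None:
        argv += ["--training_labels_file", training_labels_file]
    if test_labels_file is not None:
        argv += ["--test_labels_file", test_labels_file]
    if seed is not None:
        argv += ["--seed", str(seed)]
    return argv


# Rebuild the first example from this page and show it as a shell command.
argv = build_split_command("X.csv", "X_train.csv", "X_test.csv", test_ratio=0.4)
print(shlex.join(argv))
```

       On a system where mlpack is installed, the returned list could be
       passed to subprocess.run(argv, check=True). Passing a list rather than
       a single string avoids shell quoting problems with unusual file names.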

REQUIRED INPUT OPTIONS

       --input_file (-i) [string]
              Matrix containing data.

OPTIONAL INPUT OPTIONS

       --help (-h) [bool]
              Default help info.

       --info [string]
              Get help on a specific module or option.  Default value ''.

       --input_labels_file (-I) [string]
              Matrix containing labels. Default value ''.

       --seed (-s) [int]
              Random seed (0 for std::time(NULL)). Default value 0.

       --test_ratio (-r) [double]
              Fraction of the dataset to use as the test set. Default value
              0.2.

       --verbose (-v) [bool]
              Display informational messages and the full list of parameters
              and timers at the end of execution.

       --version (-V) [bool]
              Display the version of mlpack.
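
       The '--seed (-s)' rule (0 means "seed from the current time") decides
       whether two runs produce the same split. The Python sketch below
       mirrors that rule only; the names 'effective_seed' and
       'shuffled_indices' are hypothetical, and Python's generator will not
       reproduce the permutations that mlpack itself produces.

```python
import random
import time


def effective_seed(seed=0):
    """Mirror the documented rule: 0 means 'seed from the current time'."""
    return int(time.time()) if seed == 0 else seed


def shuffled_indices(n, seed=0):
    """Return a permutation of range(n) driven by the effective seed."""
    rng = random.Random(effective_seed(seed))
    order = list(range(n))
    rng.shuffle(order)
    return order
```

       With a nonzero seed the permutation is repeatable from run to run; with
       the default of 0 it generally changes whenever the clock value does, so
       a fixed nonzero seed is the safer choice when a split must be
       reproduced later.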

OPTIONAL OUTPUT OPTIONS

       --test_file (-T) [string]
              Matrix to save test data to. Default value ''.

       --test_labels_file (-L) [string]
              Matrix to save test labels to. Default value ''.

       --training_file (-t) [string]
              Matrix to save training data to. Default value ''.

       --training_labels_file (-l) [string]
              Matrix to save training labels to. Default value ''.

ADDITIONAL INFORMATION

       For further information, including relevant papers, citations, and
       theory, consult the documentation found at http://www.mlpack.org or
       included with your distribution of mlpack.



mlpack-3.0.4                   21 February 2019     mlpack_preprocess_split(1)