mlpack_kernel_pca(1)

mlpack_kernel_pca(1)        General Commands Manual       mlpack_kernel_pca(1)

NAME

       mlpack_kernel_pca - kernel principal components analysis

SYNOPSIS

       mlpack_kernel_pca [-h] [-v]

DESCRIPTION

       This program performs Kernel Principal Components Analysis (KPCA) on
       the specified dataset with the specified kernel. This will transform
       the data onto the kernel principal components, and optionally reduce
       the dimensionality by ignoring the kernel principal components with the
       smallest eigenvalues.

       For the case where a linear kernel is used, this reduces to regular
       PCA.

       For example, the following will perform KPCA on the 'input.csv' file
       using the gaussian kernel and store the transformed data in the
       'transformed.csv' file.

       $ mlpack_kernel_pca -i input.csv -k gaussian -o transformed.csv

       The kernels that are supported are listed below:

              ·  'linear': the standard linear dot product (same as normal
                 PCA): K(x, y) = x^T y

              ·  'gaussian': a Gaussian kernel; requires bandwidth: K(x, y) =
                 exp(-(|| x - y || ^ 2) / (2 * (bandwidth ^ 2)))

              ·  'polynomial': polynomial kernel; requires offset and degree:
                 K(x, y) = (x^T y + offset) ^ degree

              ·  'hyptan': hyperbolic tangent kernel; requires scale and
                 offset: K(x, y) = tanh(scale * (x^T y) + offset)

              ·  'laplacian': Laplacian kernel; requires bandwidth: K(x, y) =
                 exp(-(|| x - y ||) / bandwidth)

              ·  'epanechnikov': Epanechnikov kernel; requires bandwidth:
                 K(x, y) = max(0, 1 - || x - y ||^2 / bandwidth^2)

              ·  'cosine': cosine distance: K(x, y) = 1 - (x^T y) / (|| x || *
                 || y ||)
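
       As an illustration only, the kernel formulas listed above can be
       written out in plain Python. This is a minimal sketch that transcribes
       the formulas as documented here; it is not mlpack's implementation,
       and the function and helper names (dot, norm, dist, and the kernel
       functions themselves) are hypothetical. Default parameter values
       mirror the documented defaults of --bandwidth, --kernel_scale,
       --offset, and --degree.

```python
import math


def dot(x, y):
    """Dot product x^T y of two equal-length sequences."""
    return sum(a * b for a, b in zip(x, y))


def norm(x):
    """Euclidean norm || x ||."""
    return math.sqrt(dot(x, x))


def dist(x, y):
    """Euclidean distance || x - y ||."""
    return norm([a - b for a, b in zip(x, y)])


def linear(x, y):
    # K(x, y) = x^T y
    return dot(x, y)


def gaussian(x, y, bandwidth=1.0):
    # K(x, y) = exp(-(|| x - y || ^ 2) / (2 * (bandwidth ^ 2)))
    return math.exp(-dist(x, y) ** 2 / (2 * bandwidth ** 2))


def polynomial(x, y, offset=0.0, degree=1.0):
    # K(x, y) = (x^T y + offset) ^ degree
    return (dot(x, y) + offset) ** degree


def hyptan(x, y, scale=1.0, offset=0.0):
    # K(x, y) = tanh(scale * (x^T y) + offset)
    return math.tanh(scale * dot(x, y) + offset)


def laplacian(x, y, bandwidth=1.0):
    # K(x, y) = exp(-(|| x - y ||) / bandwidth)
    return math.exp(-dist(x, y) / bandwidth)


def epanechnikov(x, y, bandwidth=1.0):
    # K(x, y) = max(0, 1 - || x - y ||^2 / bandwidth^2)
    return max(0.0, 1 - dist(x, y) ** 2 / bandwidth ** 2)


def cosine(x, y):
    # K(x, y) = 1 - (x^T y) / (|| x || * || y ||), as written above
    return 1 - dot(x, y) / (norm(x) * norm(y))
```

       Each function takes two points as sequences of floats, for example
       gaussian([1.0, 0.0], [0.0, 1.0], bandwidth=2.0).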

       The parameters for each of the kernels should be specified with the
       options --bandwidth, --kernel_scale, --offset, or --degree (or a
       combination of those options).

       Optionally, the Nyström method ("Using the Nystroem method to speed up
       kernel machines", 2001) can be used to calculate the kernel matrix by
       specifying the --nystroem_method (-n) option. This approach works by
       using a subset of the data as a basis to reconstruct the kernel matrix.
       The sampling scheme is specified with the --sampling parameter and can
       be one of: kmeans, random, ordered.
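
       The idea of reconstructing the kernel matrix from a subset of the data
       can be sketched in plain Python. The sketch below uses the standard
       Nyström approximation K ≈ C W^-1 C^T, where C holds kernel values
       between all points and the chosen subset (the "landmarks") and W holds
       kernel values among the landmarks. It assumes W is invertible and is
       not mlpack's implementation; the names gaussian, solve, and nystroem
       are hypothetical, and picking the first two points as landmarks is
       only a guess at what an 'ordered' scheme might select.

```python
import math


def gaussian(x, y, bandwidth=1.0):
    """Gaussian kernel, as in the kernel list of this page."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2 * bandwidth ** 2))


def solve(a, b):
    """Solve a * z = b for the matrix z (Gauss-Jordan, partial pivoting).

    Assumes a is square and invertible.
    """
    n = len(a)
    m = [row_a[:] + row_b[:] for row_a, row_b in zip(a, b)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [row[n:] for row in m]


def nystroem(data, landmarks, kernel):
    """Approximate the full kernel matrix as C * W^-1 * C^T."""
    # C: kernel between every point and every landmark (n x m).
    c = [[kernel(x, l) for l in landmarks] for x in data]
    # W: kernel among the landmarks (m x m).
    w = [[kernel(a, b) for b in landmarks] for a in landmarks]
    # Z = W^-1 * C^T (m x n), computed without forming W^-1 explicitly.
    z = solve(w, [list(col) for col in zip(*c)])
    n, m = len(data), len(landmarks)
    return [[sum(c[i][k] * z[k][j] for k in range(m)) for j in range(n)]
            for i in range(n)]


# Example: approximate a 4 x 4 Gaussian kernel matrix from 2 landmarks.
data = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 1.0]]
approx = nystroem(data, data[:2], gaussian)
```

       Rows and columns that correspond to the landmarks themselves are
       reproduced exactly (up to rounding); the remaining entries are
       approximations whose quality depends on how the landmarks are chosen.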

REQUIRED INPUT OPTIONS

       --input_file (-i) [string]
              Input dataset to perform KPCA on.

       --kernel (-k) [string]
              The kernel to use; see the above documentation for the list of
              usable kernels.

OPTIONAL INPUT OPTIONS

       --bandwidth (-b) [double]
              Bandwidth, for 'gaussian' and 'laplacian' kernels. Default value
              1.

       --center (-c)
              If set, the transformed data will be centered about the origin.

       --degree (-D) [double]
              Degree of polynomial, for 'polynomial' kernel. Default value 1.

       --help (-h)
              Default help info.

       --info [string]
              Get help on a specific module or option. Default value ''.

       --kernel_scale (-S) [double]
              Scale, for 'hyptan' kernel. Default value 1.

       --new_dimensionality (-d) [int]
              If not 0, reduce the dimensionality of the output dataset by
              ignoring the dimensions with the smallest eigenvalues. Default
              value 0.

       --nystroem_method (-n)
              If set, the nystroem method will be used.

       --offset (-O) [double]
              Offset, for 'hyptan' and 'polynomial' kernels. Default value 0.

       --sampling (-s) [string]
              Sampling scheme to use for the nystroem method: 'kmeans',
              'random', 'ordered'. Default value 'kmeans'.

       --verbose (-v)
              Display informational messages and the full list of parameters
              and timers at the end of execution.

       --version (-V)
              Display the version of mlpack.

OPTIONAL OUTPUT OPTIONS

       --output_file (-o) [string]
              File to save modified dataset to. Default value ''.

ADDITIONAL INFORMATION

       For further information, including relevant papers, citations, and
       theory, consult the documentation found at http://www.mlpack.org or
       included with your distribution of mlpack.