mlpack_kernel_pca(1)             User Commands            mlpack_kernel_pca(1)

NAME

       mlpack_kernel_pca - kernel principal components analysis

SYNOPSIS

        mlpack_kernel_pca -i string -k string [-b double] [-c bool] [-D double] [-S double] [-d int] [-n bool] [-O double] [-s string] [-V bool] [-o string] [-h -v]

DESCRIPTION

       This program performs Kernel Principal Components Analysis (KPCA) on
       the specified dataset with the specified kernel. This will transform
       the data onto the kernel principal components, and optionally reduce
       the dimensionality by ignoring the kernel principal components with the
       smallest eigenvalues.

       For the case where a linear kernel is used, this reduces to regular
       PCA.
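       As a rough illustration of the transformation described above, the
       following Python sketch builds a kernel matrix, centers it, and finds
       the leading kernel principal component by power iteration. It is not
       the program's implementation: the function names, the centering step,
       the power-iteration solver, and the sqrt(eigenvalue) scaling of the
       scores are assumptions made for illustration, and the program's output
       may be scaled differently. The sketch also assumes a positive
       semi-definite kernel.

```python
import math


def linear_kernel(x, y):
    # K(x, y) = x^T y
    return sum(a * b for a, b in zip(x, y))


def centered_kernel_matrix(data, kernel):
    # Kernel matrix K[i][j] = kernel(data[i], data[j]), then double-centered
    # so that the data has zero mean in the kernel's feature space.
    n = len(data)
    K = [[kernel(x, y) for y in data] for x in data]
    row_mean = [sum(row) / n for row in K]
    total_mean = sum(row_mean) / n
    # K is symmetric, so the column means equal the row means.
    return [[K[i][j] - row_mean[i] - row_mean[j] + total_mean
             for j in range(n)] for i in range(n)]


def matvec(K, v):
    return [sum(k * x for k, x in zip(row, v)) for row in K]


def kpca_top_scores(data, kernel, iterations=200):
    # Power iteration for the largest eigenpair of the centered kernel
    # matrix (hypothetical helper, assumes a positive semi-definite kernel).
    K = centered_kernel_matrix(data, kernel)
    n = len(data)
    v = [float(i + 1) for i in range(n)]  # arbitrary non-constant start
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = matvec(K, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # centered kernel matrix is all zeros
            break
        v = [x / norm for x in w]
    eigenvalue = sum(a * b for a, b in zip(v, matvec(K, v)))
    # Score of point i on the leading component: sqrt(eigenvalue) * v[i].
    scale = math.sqrt(max(eigenvalue, 0.0))
    return eigenvalue, [scale * x for x in v]


# Three collinear points; with the linear kernel this is ordinary PCA.
data = [[-1.0, -1.0], [0.0, 0.0], [1.0, 1.0]]
eigenvalue, scores = kpca_top_scores(data, linear_kernel)
```

       For these three points the leading scores have magnitudes sqrt(2), 0,
       and sqrt(2), which is what projecting the points onto the direction
       (1, 1) / sqrt(2) gives in ordinary PCA; the overall sign is arbitrary.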

       For example, the following command will perform KPCA on the dataset
       'input.csv' using the Gaussian kernel and save the transformed data
       to 'transformed.csv':

       $ mlpack_kernel_pca --input_file input.csv --kernel gaussian --output_file transformed.csv

       The kernels that are supported are listed below:

              ·  'linear': the standard linear dot product (same as normal
                 PCA): K(x, y) = x^T y

              ·  'gaussian': a Gaussian kernel; requires bandwidth: K(x, y) =
                 exp(-(|| x - y || ^ 2) / (2 * (bandwidth ^ 2)))

              ·  'polynomial': polynomial kernel; requires offset and degree:
                 K(x, y) = (x^T y + offset) ^ degree

              ·  'hyptan': hyperbolic tangent kernel; requires scale and
                 offset: K(x, y) = tanh(scale * (x^T y) + offset)

              ·  'laplacian': Laplacian kernel; requires bandwidth: K(x, y) =
                 exp(-(|| x - y ||) / bandwidth)

              ·  'epanechnikov': Epanechnikov kernel; requires bandwidth:
                 K(x, y) = max(0, 1 - || x - y ||^2 / bandwidth^2)

              ·  'cosine': cosine distance: K(x, y) = 1 - (x^T y) / (|| x || *
                 || y ||)
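       The formulas listed above can be written out directly. The following
       Python sketch is a plain transcription of them, useful for checking
       values by hand. The function and parameter names are chosen here for
       illustration and are not part of mlpack's interface; the default
       arguments mirror the option defaults documented on this page.

```python
import math


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def dist(x, y):
    # Euclidean distance || x - y ||
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def linear(x, y):
    return dot(x, y)


def gaussian(x, y, bandwidth=1.0):
    return math.exp(-dist(x, y) ** 2 / (2 * bandwidth ** 2))


def polynomial(x, y, offset=0.0, degree=1.0):
    return (dot(x, y) + offset) ** degree


def hyptan(x, y, scale=1.0, offset=0.0):
    return math.tanh(scale * dot(x, y) + offset)


def laplacian(x, y, bandwidth=1.0):
    return math.exp(-dist(x, y) / bandwidth)


def epanechnikov(x, y, bandwidth=1.0):
    return max(0.0, 1 - dist(x, y) ** 2 / bandwidth ** 2)


def cosine(x, y):
    # As written on this page: 1 - (x^T y) / (|| x || * || y ||)
    return 1 - dot(x, y) / (math.sqrt(dot(x, x)) * math.sqrt(dot(y, y)))
```

       For instance, with x = (1, 0) and y = (0, 1) the squared distance is 2,
       so gaussian(x, y) evaluates to exp(-1) with the default bandwidth of 1,
       and epanechnikov(x, y, bandwidth=2.0) evaluates to 0.5.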

       The parameters for each of the kernels should be specified with the
       options '--bandwidth (-b)', '--kernel_scale (-S)', '--offset (-O)', or
       '--degree (-D)' (or a combination of those parameters).

       Optionally, the Nyström method ("Using the Nystroem method to speed up
       kernel machines", 2001) can be used to approximate the kernel matrix by
       specifying the '--nystroem_method (-n)' parameter. This approach works
       by using a subset of the data as a basis to reconstruct the kernel
       matrix; to specify the sampling scheme, the '--sampling (-s)' parameter
       is used. The sampling scheme for the Nyström method can be chosen from
       the following list: 'kmeans', 'random', 'ordered'.
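       To illustrate the idea of rebuilding the kernel matrix from a subset
       of the data, the following Python sketch forms the standard Nyström
       approximation K ~ C W^-1 C^T, where W is the kernel matrix of the
       selected points and C holds the kernel values between every point and
       the selected points. It is a sketch under assumptions, not the
       program's code: the helper names are hypothetical, W is assumed to be
       invertible, and taking the first m points is only this sketch's
       reading of what an 'ordered' scheme might do.

```python
import math


def gaussian(x, y, bandwidth=1.0):
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2 * bandwidth ** 2))


def solve(A, B):
    # Solve A X = B by Gauss-Jordan elimination with partial pivoting.
    m = len(A)
    aug = [list(A[i]) + list(B[i]) for i in range(m)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("kernel matrix of the selected points is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(m):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[m:] for row in aug]


def nystroem_kernel_matrix(data, selected, kernel):
    # C: n x m kernel values between all points and the selected points.
    C = [[kernel(x, data[j]) for j in selected] for x in data]
    # W: m x m kernel matrix of the selected points.
    W = [[kernel(data[i], data[j]) for j in selected] for i in selected]
    Ct = [list(col) for col in zip(*C)]
    X = solve(W, Ct)  # X = W^-1 C^T, shape m x n
    n, m = len(data), len(selected)
    return [[sum(C[i][k] * X[k][j] for k in range(m)) for j in range(n)]
            for i in range(n)]


data = [[0.0], [1.0], [2.0], [3.0]]
selected = list(range(2))  # assumption: "ordered" read as the first m points
approx = nystroem_kernel_matrix(data, selected, gaussian)
```

       Rows and columns belonging to the selected points are reproduced
       exactly; the remaining entries are approximations whose quality
       depends on how well the selected points represent the data.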

REQUIRED INPUT OPTIONS

       --input_file (-i) [string]
              Input dataset to perform KPCA on.

       --kernel (-k) [string]
              The kernel to use; see the above documentation for the list of
              usable kernels.

OPTIONAL INPUT OPTIONS

       --bandwidth (-b) [double]
              Bandwidth, for 'gaussian', 'laplacian', and 'epanechnikov'
              kernels. Default value 1.

       --center (-c) [bool]
              If set, the transformed data will be centered about the origin.

       --degree (-D) [double]
              Degree of polynomial, for 'polynomial' kernel. Default value 1.

       --help (-h) [bool]
              Default help info.

       --info [string]
              Get help on a specific module or option. Default value ''.

       --kernel_scale (-S) [double]
              Scale, for 'hyptan' kernel. Default value 1.

       --new_dimensionality (-d) [int]
              If not 0, reduce the dimensionality of the output dataset by
              ignoring the dimensions with the smallest eigenvalues. Default
              value 0.

       --nystroem_method (-n) [bool]
              If set, the Nyström method will be used.

       --offset (-O) [double]
              Offset, for 'hyptan' and 'polynomial' kernels. Default value 0.

       --sampling (-s) [string]
              Sampling scheme to use for the Nyström method: 'kmeans',
              'random', 'ordered'. Default value 'kmeans'.

       --verbose (-v) [bool]
              Display informational messages and the full list of parameters
              and timers at the end of execution.

       --version (-V) [bool]
              Display the version of mlpack.

OPTIONAL OUTPUT OPTIONS

       --output_file (-o) [string]
              Matrix to save modified dataset to. Default value ''.

ADDITIONAL INFORMATION

       For further information, including relevant papers, citations, and
       theory, consult the documentation found at http://www.mlpack.org or
       included with your distribution of mlpack.

mlpack-3.0.4                   21 February 2019           mlpack_kernel_pca(1)