mlpack_kernel_pca(1)            User Commands            mlpack_kernel_pca(1)

mlpack_kernel_pca - kernel principal components analysis

mlpack_kernel_pca -i string -k string [-b double] [-c bool] [-D double] [-S double] [-d int] [-n bool] [-O double] [-s string] [-V bool] [-o string] [-h -v]

This program performs Kernel Principal Components Analysis (KPCA) on
the specified dataset with the specified kernel. It transforms the
data onto the kernel principal components and optionally reduces the
dimensionality by ignoring the kernel principal components with the
smallest eigenvalues.

When a linear kernel is used, this reduces to regular PCA.

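The reduction to regular PCA can be checked numerically. The sketch below is illustrative only and is not mlpack's implementation: it assumes the standard KPCA formulation in which the kernel matrix is centered in feature space, and all function names and the toy data are made up for this example. With the linear kernel, the centered kernel matrix equals the Gram matrix of the mean-centered data, which is the matrix regular PCA works with.

```python
def linear_kernel(x, y):
    """K(x, y) = x^T y, as listed for the 'linear' kernel."""
    return sum(a * b for a, b in zip(x, y))


def kernel_matrix(points, kernel):
    """Evaluate the kernel on every pair of points."""
    return [[kernel(p, q) for q in points] for p in points]


def center_kernel_matrix(K):
    """Double-center K: subtract row and column means, add the grand mean."""
    n = len(K)
    row_means = [sum(row) / n for row in K]
    col_means = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    grand_mean = sum(row_means) / n
    return [[K[i][j] - row_means[i] - col_means[j] + grand_mean
             for j in range(n)] for i in range(n)]


def center_points(points):
    """Subtract the mean point from every point."""
    n = len(points)
    dim = len(points[0])
    mean = [sum(p[d] for p in points) / n for d in range(dim)]
    return [[p[d] - mean[d] for d in range(dim)] for p in points]


# Toy data (hypothetical), one point per row.
points = [[1.0, 2.0], [3.0, 0.5], [-1.0, 4.0], [2.0, 2.0]]

centered_kernel = center_kernel_matrix(kernel_matrix(points, linear_kernel))
gram_of_centered = kernel_matrix(center_points(points), linear_kernel)

# Largest entrywise difference between the two matrices.
max_difference = max(
    abs(centered_kernel[i][j] - gram_of_centered[i][j])
    for i in range(len(points))
    for j in range(len(points))
)
```

Here max_difference is zero up to floating-point rounding, so both routes lead to the same eigenvalue problem.
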
For example, the following command performs KPCA on the dataset
'input.csv' using the Gaussian kernel and saves the transformed data
to 'transformed.csv':

$ mlpack_kernel_pca --input_file input.csv --kernel gaussian --output_file transformed.csv

The supported kernels are listed below:

· 'linear': the standard linear dot product (same as normal PCA):
  K(x, y) = x^T y

· 'gaussian': a Gaussian kernel; requires bandwidth:
  K(x, y) = exp(-(|| x - y || ^ 2) / (2 * (bandwidth ^ 2)))

· 'polynomial': polynomial kernel; requires offset and degree:
  K(x, y) = (x^T y + offset) ^ degree

· 'hyptan': hyperbolic tangent kernel; requires scale and offset:
  K(x, y) = tanh(scale * (x^T y) + offset)

· 'laplacian': Laplacian kernel; requires bandwidth:
  K(x, y) = exp(-(|| x - y ||) / bandwidth)

· 'epanechnikov': Epanechnikov kernel; requires bandwidth:
  K(x, y) = max(0, 1 - || x - y ||^2 / bandwidth^2)

· 'cosine': cosine distance:
  K(x, y) = 1 - (x^T y) / (|| x || * || y ||)

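For readers who want to evaluate the formulas above by hand, here is a minimal Python transcription. It is a sketch of the formulas exactly as listed, not mlpack code; the function names and the keyword-argument defaults (chosen to mirror the documented option defaults) are assumptions of this example.

```python
import math


def dot(x, y):
    """x^T y for two equal-length sequences."""
    return sum(a * b for a, b in zip(x, y))


def squared_distance(x, y):
    """|| x - y ||^2."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def linear(x, y):
    return dot(x, y)


def gaussian(x, y, bandwidth=1.0):
    return math.exp(-squared_distance(x, y) / (2.0 * bandwidth ** 2))


def polynomial(x, y, offset=0.0, degree=1.0):
    # A negative base with a fractional degree is not handled in this sketch.
    return (dot(x, y) + offset) ** degree


def hyptan(x, y, scale=1.0, offset=0.0):
    return math.tanh(scale * dot(x, y) + offset)


def laplacian(x, y, bandwidth=1.0):
    return math.exp(-math.sqrt(squared_distance(x, y)) / bandwidth)


def epanechnikov(x, y, bandwidth=1.0):
    return max(0.0, 1.0 - squared_distance(x, y) / bandwidth ** 2)


def cosine(x, y):
    # Undefined for zero-length vectors (division by zero).
    return 1.0 - dot(x, y) / (math.sqrt(dot(x, x)) * math.sqrt(dot(y, y)))


# Two orthogonal unit vectors as a quick check.
x = [1.0, 0.0]
y = [0.0, 1.0]
gaussian_value = gaussian(x, y)                     # exp(-2 / 2) = exp(-1)
epanechnikov_value = epanechnikov(x, y, bandwidth=2.0)  # 1 - 2 / 4
```
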
The parameters for each kernel should be specified with the options
'--bandwidth (-b)', '--kernel_scale (-S)', '--offset (-O)', or
'--degree (-D)' (or a combination of those options).

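The pairing of kernels and parameter options can be written down as a small lookup table. The helper below is hypothetical (neither build_kpca_argv nor KERNEL_PARAMETERS is part of mlpack); it only assembles an argument list from the long option names documented on this page. Rejecting options that a kernel does not use is a choice made in this sketch, not documented behavior of the program.

```python
# Which parameter options each kernel uses, per the kernel list on this page.
KERNEL_PARAMETERS = {
    "linear": (),
    "gaussian": ("bandwidth",),
    "polynomial": ("offset", "degree"),
    "hyptan": ("kernel_scale", "offset"),
    "laplacian": ("bandwidth",),
    "epanechnikov": ("bandwidth",),
    "cosine": (),
}


def build_kpca_argv(input_file, kernel, output_file, **parameters):
    """Return an argument list for mlpack_kernel_pca (hypothetical helper)."""
    if kernel not in KERNEL_PARAMETERS:
        raise ValueError("unknown kernel: " + kernel)
    allowed = KERNEL_PARAMETERS[kernel]
    unexpected = sorted(set(parameters) - set(allowed))
    if unexpected:
        raise ValueError(
            "kernel '%s' does not use: %s" % (kernel, ", ".join(unexpected)))
    argv = ["mlpack_kernel_pca", "--input_file", input_file,
            "--kernel", kernel]
    # Emit the kernel parameters in a fixed order for reproducibility.
    for name in allowed:
        if name in parameters:
            argv += ["--" + name, str(parameters[name])]
    argv += ["--output_file", output_file]
    return argv


polynomial_argv = build_kpca_argv(
    "input.csv", "polynomial", "transformed.csv", degree=2, offset=1)
```

The resulting list could be passed to subprocess.run on a system where mlpack_kernel_pca is installed.
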
Optionally, the Nyström method ("Using the Nystroem method to speed up
kernel machines", 2001) can be used to approximate the kernel matrix by
specifying the '--nystroem_method (-n)' parameter. This approach uses a
subset of the data as a basis to reconstruct the kernel matrix; the
'--sampling (-s)' parameter specifies the sampling scheme, which can be
one of 'kmeans', 'random', or 'ordered'.

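The idea behind the Nyström method can be sketched in a few lines of Python. This is a textbook-style illustration, not mlpack's implementation: the function names and toy data are invented here, the landmarks are simply the first three points (only an assumed reading of what an 'ordered' selection might do), and a plain matrix inverse stands in for the pseudo-inverse that a robust implementation would use. The kernel matrix K is approximated as C * W^-1 * C^T, where C holds kernel values between all points and the landmarks and W holds kernel values among the landmarks.

```python
import math


def gaussian(x, y, bandwidth=1.0):
    squared = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared / (2.0 * bandwidth ** 2))


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def nystroem_approximation(points, landmarks, kernel):
    """Approximate the full kernel matrix as C * W^-1 * C^T."""
    C = [[kernel(p, l) for l in landmarks] for p in points]     # n x m
    W = [[kernel(a, b) for b in landmarks] for a in landmarks]  # m x m
    C_transposed = [list(col) for col in zip(*C)]
    return matmul(matmul(C, invert(W)), C_transposed)


# Toy data (hypothetical); the first three points serve as landmarks.
points = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [1.0, 1.5]]
exact = [[gaussian(p, q) for q in points] for p in points]
approx = nystroem_approximation(points, points[:3], gaussian)
```

Entries of approx that involve at least one landmark match exact up to rounding; the remaining entries are approximations whose quality depends on how well the landmarks represent the data.
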
--input_file (-i) [string]
    Input dataset to perform KPCA on.

--kernel (-k) [string]
    The kernel to use; see the above documentation for the list of
    usable kernels.

--bandwidth (-b) [double]
    Bandwidth, for 'gaussian', 'laplacian', and 'epanechnikov'
    kernels. Default value 1.

--center (-c) [bool]
    If set, the transformed data will be centered about the origin.

--degree (-D) [double]
    Degree of polynomial, for 'polynomial' kernel. Default value 1.

--help (-h) [bool]
    Default help info.

--info [string]
    Get help on a specific module or option. Default value ''.

--kernel_scale (-S) [double]
    Scale, for 'hyptan' kernel. Default value 1.

--new_dimensionality (-d) [int]
    If not 0, reduce the dimensionality of the output dataset by
    ignoring the dimensions with the smallest eigenvalues. Default
    value 0.

--nystroem_method (-n) [bool]
    If set, the Nyström method will be used.

--offset (-O) [double]
    Offset, for 'hyptan' and 'polynomial' kernels. Default value 0.

--sampling (-s) [string]
    Sampling scheme to use for the Nyström method: 'kmeans',
    'random', or 'ordered'. Default value 'kmeans'.

--verbose (-v) [bool]
    Display informational messages and the full list of parameters
    and timers at the end of execution.

--version (-V) [bool]
    Display the version of mlpack.

--output_file (-o) [string]
    Matrix to save modified dataset to. Default value ''.

For further information, including relevant papers, citations, and
theory, consult the documentation found at http://www.mlpack.org or
included with your distribution of mlpack.

mlpack-3.0.4                  21 February 2019           mlpack_kernel_pca(1)