mlpack_approx_kfn(1) User Commands mlpack_approx_kfn(1)

mlpack_approx_kfn - approximate furthest neighbor search

mlpack_approx_kfn [-a string] [-e bool] [-x string] [-m unknown] [-k int] [-p int] [-t int] [-q string] [-r string] [-V bool] [-d string] [-n string] [-M unknown] [-h -v]

This program implements two strategies for approximate furthest
neighbor search:

· The 'qdafn' algorithm from "Approximate Furthest Neighbor in
High Dimensions" by R. Pagh, F. Silvestri, J. Sivertsen, and
M. Skala, in Similarity Search and Applications 2015 (SISAP).

· The 'DrusillaSelect' algorithm ('ds') from "Fast approximate
furthest neighbors with data-dependent candidate selection" by
R.R. Curtin and A.B. Gardner, in Similarity Search and
Applications 2016 (SISAP).

Both strategies give approximate results for the furthest neighbor
search problem and can be used as fast replacements for other furthest
neighbor techniques such as those found in the mlpack_kfn program. The
'ds' algorithm typically requires far fewer tables and projections than
the 'qdafn' algorithm.

Specify a reference set (the set to search in) with '--reference_file
(-r)' and a query set with '--query_file (-q)'. The algorithm
parameters may be set with '--num_tables (-t)' and '--num_projections
(-p)'; if they are omitted, defaults are used. The algorithm itself
(either 'ds', the default, or 'qdafn') may be specified with
'--algorithm (-a)'. Specify the number of neighbors to search for with
'--k (-k)'.

If no query set is specified, the reference set is used as the query
set. The '--output_model_file (-M)' output parameter may be used to
store the built model, and a previously saved model may be loaded with
the '--input_model_file (-m)' option instead of specifying a reference
set.
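As a sketch of the options described above, the following builds a
'qdafn' model with explicit table and projection counts and saves it for
later use. The file names and the values 10, 20, and 3 are hypothetical
and chosen only for illustration; the script prints the command instead
of running it when the program or the input file is not available.

```shell
# Hypothetical file names; substitute your own data.
reference=reference_set.csv
model=model.bin

# 'qdafn' with 10 tables of 20 projections each (illustrative values only),
# saving the built model for later reuse.
cmd="mlpack_approx_kfn --algorithm qdafn --num_tables 10 --num_projections 20 \
--reference_file $reference --k 3 --neighbors_file neighbors.csv \
--output_model_file $model"

# Run only when the binary and the input file are present;
# otherwise just print the command that would be run.
if command -v mlpack_approx_kfn >/dev/null 2>&1 && [ -f "$reference" ]; then
  $cmd
else
  echo "$cmd"
fi
```

Since no query set is given here, the reference set also serves as the
query set.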

Results for each query point can be stored with the '--neighbors_file
(-n)' and '--distances_file (-d)' output parameters. Each row of these
output matrices corresponds to one query point and holds its k neighbor
indices or its k distances, respectively.
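To illustrate the row layout described above, the sketch below pairs the
two output files line by line. The file contents are made-up stand-ins
for two query points with k = 3, not real program output, and numbering
the query points from 0 is a choice made here for the printout.

```shell
# Made-up stand-ins for real output files: 2 query points, k = 3.
tmp=$(mktemp -d)
printf '7,2,9\n4,0,1\n' > "$tmp/neighbors.csv"
printf '5.1,4.8,4.2\n6.0,5.5,5.3\n' > "$tmp/distances.csv"

# Row i of each file belongs to query point i. Join the rows side by side,
# then print the first-listed neighbor index and its distance.
summary=$(paste -d, "$tmp/neighbors.csv" "$tmp/distances.csv" |
  awk -F, -v k=3 '{ printf "query %d: neighbor %s at distance %s\n", NR - 1, $1, $(k + 1) }')
echo "$summary"
rm -rf "$tmp"
```

After the join, columns 1 to k of each line hold the neighbor indices
and columns k+1 to 2k hold the matching distances.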

For example, to find the 5 approximate furthest neighbors with
'reference_set.csv' as the reference set and 'query_set.csv' as the
query set using DrusillaSelect, storing the furthest neighbor indices to
'neighbors.csv' and the furthest neighbor distances to 'distances.csv',
one could call

$ mlpack_approx_kfn --query_file query_set.csv \
    --reference_file reference_set.csv --k 5 --algorithm ds \
    --neighbors_file neighbors.csv --distances_file distances.csv

Similarly, to perform approximate all-furthest-neighbors search with
k=1 on the set 'data.csv', storing only the furthest neighbor distances
to 'distances.csv', one could call

$ mlpack_approx_kfn --reference_file data.csv --k 1 \
    --distances_file distances.csv

A trained model can be reused. If a model was previously saved to
'model.bin', then the 3 approximate furthest neighbors of each point in
a query set 'new_query_set.csv' can be found with that model, and the
furthest neighbor indices stored in 'neighbors.csv', by calling

$ mlpack_approx_kfn --input_model_file model.bin \
    --query_file new_query_set.csv --k 3 --neighbors_file neighbors.csv
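The '--calculate_error (-e)' and '--exact_distances_file (-x)' options
can be combined with an exact search, as in the following sketch. It
assumes that mlpack_kfn accepts the same '--reference_file',
'--query_file', '--k', and '--distances_file' options, and that the
average error is reported among the informational messages enabled by
'--verbose'. The file names are hypothetical; the script prints the two
commands instead of running them when the programs or input files are
not available.

```shell
# Hypothetical file names; substitute your own data.
reference=reference_set.csv
query=query_set.csv

# Step 1: exact furthest neighbor distances (k = 1) from mlpack_kfn.
exact_cmd="mlpack_kfn --reference_file $reference --query_file $query --k 1 \
--distances_file exact_distances.csv"

# Step 2: approximate search that compares its first furthest neighbor
# distances against the precomputed exact distances.
approx_cmd="mlpack_approx_kfn --reference_file $reference --query_file $query --k 1 \
--calculate_error --exact_distances_file exact_distances.csv \
--distances_file distances.csv --verbose"

# Run only when both binaries and both input files are present;
# otherwise just print the commands that would be run.
if command -v mlpack_kfn >/dev/null 2>&1 &&
   command -v mlpack_approx_kfn >/dev/null 2>&1 &&
   [ -f "$reference" ] && [ -f "$query" ]; then
  $exact_cmd && $approx_cmd
else
  printf '%s\n' "$exact_cmd" "$approx_cmd"
fi
```

Supplying the exact distances avoids recomputing them on every run,
which helps when several parameter settings are compared on the same
data.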

--algorithm (-a) [string]
    Algorithm to use: 'ds' or 'qdafn'. Default value 'ds'.

--calculate_error (-e) [bool]
    If set, calculate the average distance error for the first
    furthest neighbor only.

--exact_distances_file (-x) [string]
    Matrix containing exact distances to furthest neighbors; this
    can be used to avoid explicit calculation when
    --calculate_error is set. Default value ''.

--help (-h) [bool]
    Default help info.

--info [string]
    Get help on a specific module or option. Default value ''.

--input_model_file (-m) [unknown]
    File containing input model. Default value ''.

--k (-k) [int]
    Number of furthest neighbors to search for. Default value 0.

--num_projections (-p) [int]
    Number of projections to use in each hash table. Default value 5.

--num_tables (-t) [int]
    Number of hash tables to use. Default value 5.

--query_file (-q) [string]
    Matrix containing query points. Default value ''.

--reference_file (-r) [string]
    Matrix containing the reference dataset. Default value ''.

--verbose (-v) [bool]
    Display informational messages and the full list of parameters
    and timers at the end of execution.

--version (-V) [bool]
    Display the version of mlpack.

--distances_file (-d) [string]
    Matrix to save furthest neighbor distances to. Default value ''.

--neighbors_file (-n) [string]
    Matrix to save neighbor indices to. Default value ''.

--output_model_file (-M) [unknown]
    File to save output model to. Default value ''.

For further information, including relevant papers, citations, and
theory, consult the documentation found at http://www.mlpack.org or
included with your distribution of mlpack.

mlpack-3.0.4 21 February 2019 mlpack_approx_kfn(1)