mlpack_krann(1)                  User Commands                 mlpack_krann(1)



NAME
       mlpack_krann - k-rank-approximate-nearest-neighbors (krann)

SYNOPSIS
       mlpack_krann [-a double] [-X bool] [-m string] [-k int] [-l int]
       [-N bool] [-q string] [-R bool] [-r string] [-L bool] [-s int]
       [-S bool] [-z int] [-T double] [-t string] [-V bool] [-d string]
       [-n string] [-M string] [-h] [-v]

DESCRIPTION
       This program will calculate the k rank-approximate-nearest-neighbors
       of a set of points. You may specify a separate set of reference
       points and query points, or just a reference set which will be used
       as both the reference and query set. You must specify the rank
       approximation (in %) (and optionally the success probability).

       For example, the following will return 5 neighbors from the top 0.1%
       of the data (with probability 0.95) for each point in 'input.csv' and
       store the distances in 'distances.csv' and the neighbors in
       'neighbors.csv':

       $ mlpack_krann --reference_file input.csv --k 5 --distances_file
         distances.csv --neighbors_file neighbors.csv --tau 0.1

       Note that tau must be set such that the number of points in the
       corresponding percentile of the data is greater than k. Thus, if we
       choose tau = 0.1 with a dataset of 1000 points and k = 5, then we are
       attempting to choose 5 nearest neighbors out of the closest 1 point
       -- this is invalid, and the program will terminate with an error
       message.

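       This constraint can be checked before running: tau percent of the N
       reference points must cover more than k candidates, so tau must
       exceed 100*k/N. A quick shell check (illustrative; awk performs the
       floating-point division):

```shell
# Smallest tau (in percent) at which tau% of the N reference points
# covers k candidates; any valid tau must be strictly larger than this.
N=1000
k=5
awk -v n="$N" -v k="$k" 'BEGIN { printf "%.1f\n", 100 * k / n }'   # prints 0.5
```

       So for 1000 points and k = 5, any tau above 0.5 satisfies the
       constraint.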
       The output matrices are organized such that row i and column j in
       the neighbors output file corresponds to the index of the point in
       the reference set which is the i'th nearest neighbor of the point in
       the query set with index j. Row i and column j in the distances
       output file corresponds to the distance between those two points.
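
       As a concrete illustration of this layout (assuming comma-separated
       output, as produced for a .csv extension), the following pulls one
       entry out of a small hand-written neighbors matrix:

```shell
# Toy neighbors matrix: two rows (i = 1st and 2nd nearest neighbor),
# three columns (query points with indices j = 1, 2, 3).
printf '7,3,9\n1,8,2\n' > neighbors.csv

# Reference-set index of the 2nd nearest neighbor (row i = 2) of the
# query point with index j = 3 (column 3):
awk -F',' 'NR == 2 { print $3 }' neighbors.csv   # prints 2
```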

INPUT OPTIONS
       --alpha (-a) [double]
              The desired success probability. Default value 0.95.

       --first_leaf_exact (-X) [bool]
              The flag to trigger sampling only after exactly exploring the
              first leaf.

       --help (-h) [bool]
              Default help info.

       --info [string]
              Get help on a specific module or option. Default value ''.

       --input_model_file (-m) [string]
              Pre-trained kNN model. Default value ''.

       --k (-k) [int]
              Number of nearest neighbors to find. Default value 0.

       --leaf_size (-l) [int]
              Leaf size for tree building (used for kd-trees, UB trees, R
              trees, R* trees, X trees, Hilbert R trees, R+ trees, R++
              trees, and octrees). Default value 20.

       --naive (-N) [bool]
              If true, sampling will be done without using a tree.

       --query_file (-q) [string]
              Matrix containing query points (optional). Default value ''.

       --random_basis (-R) [bool]
              Before tree-building, project the data onto a random
              orthogonal basis.

       --reference_file (-r) [string]
              Matrix containing the reference dataset. Default value ''.

       --sample_at_leaves (-L) [bool]
              The flag to trigger sampling at leaves.

       --seed (-s) [int]
              Random seed (if 0, std::time(NULL) is used). Default value 0.

       --single_mode (-S) [bool]
              If true, single-tree search is used (as opposed to dual-tree
              search).

       --single_sample_limit (-z) [int]
              The limit on the maximum number of samples (and hence the
              largest node you can approximate). Default value 20.

       --tau (-T) [double]
              The allowed rank-error in terms of the percentile of the
              data. Default value 5.

       --tree_type (-t) [string]
              Type of tree to use: 'kd', 'ub', 'cover', 'r', 'x', 'r-star',
              'hilbert-r', 'r-plus', 'r-plus-plus', 'oct'. Default value
              'kd'.

       --verbose (-v) [bool]
              Display informational messages and the full list of
              parameters and timers at the end of execution.

       --version (-V) [bool]
              Display the version of mlpack.

OUTPUT OPTIONS
       --distances_file (-d) [string]
              Matrix to output distances into. Default value ''.

       --neighbors_file (-n) [string]
              Matrix to output neighbors into. Default value ''.

       --output_model_file (-M) [string]
              If specified, the kNN model will be output here. Default
              value ''.
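
       The model options above allow the search tree to be built once and
       reused against later query sets. A sketch of that workflow, using
       only the flags documented on this page (the file names 'ref.csv',
       'queries.csv', and 'model.bin' are placeholders):

```shell
# Build the tree from the reference set, answer queries against it,
# and save the model for later reuse.
mlpack_krann --reference_file ref.csv --k 5 --tau 1 \
    --neighbors_file nbrs.csv --distances_file dists.csv \
    --output_model_file model.bin

# Later: load the saved model and search with a separate query set,
# skipping the tree-building step.
mlpack_krann --input_model_file model.bin --query_file queries.csv \
    --k 5 --neighbors_file query_nbrs.csv \
    --distances_file query_dists.csv
```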

ADDITIONAL INFORMATION
       For further information, including relevant papers, citations, and
       theory, consult the documentation found at http://www.mlpack.org or
       included with your distribution of mlpack.



mlpack-3.0.4                  21 February 2019                 mlpack_krann(1)