RECOLLINDEX(1)            General Commands Manual            RECOLLINDEX(1)

NAME
recollindex - indexing command for the Recoll full text search system

SYNOPSIS
recollindex -h
recollindex [ -c <cfdir>] [ -z|-Z ] [ -k ] [ --diagsfile <diagpath> ]
recollindex [ -c <cfdir>] -m [ -w <secs>] [ -D ] [ -x ] [ -C ] [ -n|-k ]
recollindex [ -c <cfdir>] -i [ -Z -k -f -P ] [<path [path ...]>]
recollindex [ -c <cfdir>] -r [ -Z -K -e -f ] [ -p pattern ] <dirpath>
recollindex [ -c <cfdir>] -e [<path [path ...]>]
recollindex [ -c <cfdir>] -l|-S|-E
recollindex [ -c <cfdir>] -s <lang>
recollindex [ -c <cfdir>] --webcache-compact
recollindex [ -c <cfdir>] --webcache-burst <destdir>
recollindex [ -c <cfdir>] --notindexed [path [path ...]]

DESCRIPTION
The recollindex command is the Recoll indexer.

As indexing can sometimes take a long time, the command can be
interrupted by sending an interrupt (Ctrl-C, SIGINT) or terminate
(SIGTERM) signal. Some time may elapse before the process exits, because
it needs to properly flush and close the index. Indexing can also be
stopped from the recoll GUI (menu entry: File/Stop_Indexing). After such
an interruption, the index will be somewhat inconsistent because some
operations which are normally performed at the end of the indexing pass
will have been skipped (for example, the stemming and spelling databases
will be missing or out of date). You just need to restart indexing at a
later time to restore consistency. The indexing will restart at the
interruption point (the full file tree will be traversed, but files that
were indexed up to the interruption and for which the index is still up
to date will not need to be reindexed).

The -c option specifies the configuration directory name, overriding the
default or $RECOLL_CONFDIR.

There are several modes of operation.

The normal mode will index the set of files described in the
configuration file recoll.conf. This will incrementally update the
database with files that changed since the last run. If option -z is
given, the database will be erased before starting. If option -Z is
given, the database will not be reset, but all files will be considered
as needing reindexing (in place reset).

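As an illustration of the three normal-mode variants, here is a minimal
sketch of a wrapper function. The function name run_index and its mode
names are hypothetical and only serve this example; the -c, -z and -Z
options are the ones described above.

```shell
# Hypothetical wrapper around the normal indexing mode.
# Usage: run_index <mode> [<cfdir>]
#   mode: update | reset-in-place | erase
run_index() {
    mode=$1
    cfdir=${2:-}    # optional configuration directory, passed to -c

    case "$mode" in
        update)         flag= ;;      # incremental update, no extra option
        reset-in-place) flag=-Z ;;    # keep the database, reindex all files
        erase)          flag=-z ;;    # erase the database before indexing
        *) echo "unknown mode: $mode" >&2; return 2 ;;
    esac

    # ${var:+...} expands to nothing when the variable is empty, so no
    # empty argument is passed to recollindex.
    recollindex ${cfdir:+-c "$cfdir"} ${flag:+"$flag"}
}
```

For example, run_index reset-in-place "$HOME/.recoll-work" would run
recollindex -c "$HOME/.recoll-work" -Z (the directory name is made up).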
As of version 1.21, recollindex usually does not retry files which
previously failed to index (for example because of a missing helper
program). If option -k is given, recollindex will try again to process
all failed files. Please note that recollindex may also decide to retry
failed files if the auxiliary checking script defined by the
"checkneedretryindexscript" configuration variable indicates that this
should happen.

If option --diagsfile is given, the path given as parameter will be
truncated and indexing diagnostics will be written to it. Each line in
the file will have a diagnostic type (reason for the file not to be
indexed), the file path, and a possible additional piece of information,
which can be the MIME type or the archive internal path depending on the
issue. The following diagnostic types are currently defined:

Skipped : the path matches an element of skippedPaths or skippedNames.

NoContentSuffix : the file name suffix is found in the noContentSuffixes
list.

MissingHelper : a helper program is missing.

Error : general error (see the log).

NoHandler : no handler is defined for the MIME type.

ExcludedMime : the MIME type is part of the excludedmimetypes list.

NotIncludedMime : the onlymimetypes list is not empty and the MIME type
is not in it.

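A diagnostics file can be summarized with standard tools. The sketch
below counts the entries per diagnostic type. The function name
summarize_diags is hypothetical, and the exact field layout of the file
is not specified here: the code assumes that the diagnostic type is the
first whitespace-separated field of each line, which should be checked
against a real file.

```shell
# Hypothetical helper: count diagnostics per type in a --diagsfile output.
# ASSUMPTION: the diagnostic type is the first whitespace-separated field
# of each line.
summarize_diags() {
    awk 'NF { count[$1]++ }
         END { for (t in count) printf "%s %d\n", t, count[t] }' "$1" |
        sort
}
```

It could be used after a run such as recollindex --diagsfile
/tmp/recoll-diags.txt (the path is an example), by calling
summarize_diags /tmp/recoll-diags.txt.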
If option -m is given, recollindex is started for real time monitoring,
using the file system monitoring package it was configured for (either
fam, gamin, or inotify). This mode must have been explicitly configured
when building the package; it is not available by default. The program
will normally detach from the controlling terminal and become a daemon.
If option -D is given, it will stay in the foreground. Option -w
<seconds> can be used to specify that the program should sleep for the
specified time before indexing begins. The default value is 60 seconds.
The daemon normally monitors the X11 session and exits when it is reset.
Option -x disables this X11 session monitoring (the daemon will stay
alive even if it cannot connect to the X11 server). You also need to use
this option if you run the daemon without an X11 context. You can use
option -n to skip the initial incremental pass which is normally
performed before monitoring starts. Once monitoring is started, the
daemon normally monitors the configuration and restarts from scratch if
a change is made. You can disable this with option -C.

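For instance, on a machine without an X11 session, the monitor could be
started as sketched below. The function name start_headless_monitor is
hypothetical; keeping the process in the foreground with -D is a choice
made here on the assumption that a service manager supervises it.

```shell
# Hypothetical helper: start the real time monitor without an X11 context.
# Usage: start_headless_monitor [<delay-seconds>]
start_headless_monitor() {
    delay=${1:-60}    # 60 is the documented default for -w
    # -m: real time monitoring   -D: stay in the foreground
    # -x: no X11 session monitoring   -w: sleep before indexing begins
    recollindex -m -D -x -w "$delay"
}
```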
recollindex -i will index individual files into the database. The stem
expansion and aspell databases will not be updated. The skippedPaths and
skippedNames configuration variables will be used, so that some files
may be skipped. You can tell recollindex to ignore skippedPaths and
skippedNames by setting the -f option. This allows fully custom file
selection for a given subtree, for which you would add the top directory
to skippedPaths, and use any custom tool to generate the file list (e.g.
a tool from a source code control system). When run this way, the
indexer normally does not perform the deleted files purge pass, because
it cannot be sure to have seen all the existing files. You can force a
purge pass with -P.

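As a sketch of such a custom selection, the function below feeds the
files tracked by Git in a work tree to recollindex -i -f on stdin. The
function name index_tracked_files is hypothetical, Git is only one
possible source for the list, and the code assumes file names without
newlines or characters that Git would quote in its output. Paths are
made absolute here, as in the find examples of this page.

```shell
# Hypothetical helper: index only the files tracked by Git in a work tree
# whose top directory is listed in skippedPaths.
# ASSUMPTION: no newlines or Git-quoted characters in the file names.
index_tracked_files() {
    repo=$1    # absolute path to the top of the work tree
    git -C "$repo" ls-files |
        while IFS= read -r rel; do
            printf '%s/%s\n' "$repo" "$rel"    # make each path absolute
        done |
        recollindex -i -f    # -f: ignore skippedPaths and skippedNames
}
```

The purge option -P is deliberately left out, because the list only
covers one subtree.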
recollindex -e will erase data for individual files from the database.
The stem expansion databases will not be updated.

Options -i and -e can be combined. This will first perform the purge,
then the indexing.

With options -i or -e, if no file names are given on the command line,
they will be read from stdin, so that you could for example run:

    find /path/to/dir -print | recollindex -e -i

to force the reindexing of a directory tree (which has to exist inside
the file system area defined by topdirs in recoll.conf). You could
mostly accomplish the same thing with

    find /path/to/dir -print | recollindex -Z -i

The latter will perform a less thorough job of purging stale
sub-documents though.

recollindex -r mostly works like -i, but the parameter is a single
directory, which will be recursively updated. This mostly does nothing
more than find topdir | recollindex -i but it may be more convenient to
use when started from another program. This mode retries failed files by
default; use option -K to change this. One or several -p options can be
used to set shell-type selection patterns (e.g.: *.pdf).

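When recollindex -r is started from a script, the -p options can be
built from a list of patterns, as in the following sketch. The function
name index_dir_matching is hypothetical. Patterns must be quoted by the
caller so that the shell does not expand them first.

```shell
# Hypothetical helper: recursively index a directory, restricted to the
# given shell-type patterns.
# Usage: index_dir_matching <dirpath> <pattern> [<pattern> ...]
index_dir_matching() {
    dir=$1; shift
    # Rewrite the positional parameters as "-p pat1 -p pat2 ...".
    for pat in "$@"; do
        shift
        set -- "$@" -p "$pat"
    done
    recollindex -r "$@" "$dir"
}
```

For example, index_dir_matching /path/to/dir '*.pdf' '*.odt' runs
recollindex -r -p '*.pdf' -p '*.odt' /path/to/dir.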
recollindex -l will list the names of available language stemmers.

recollindex -s will build the stem expansion database for a given
language, which may or may not be part of the list in the configuration
file. If the language is not part of the configuration, the stem
expansion database will be deleted at the end of the next normal
indexing run. You can get the list of stemmer names from the recollindex
-l command. Note that this is mostly for experimental use; the normal
way to add a stemming language is to set it in the configuration, either
by editing "recoll.conf" or by using the GUI indexing configuration
dialog.

At the time of this writing, the following languages are recognized
(out of Xapian's stem.h):

• danish

• dutch

• english Martin Porter's 2002 revision of his stemmer

• english_lovins Lovins' stemmer

• english_porter Porter's stemmer as described in his 1980 paper

• finnish

• french

• german

• italian

• norwegian

• portuguese

• russian

• spanish

• swedish

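The -l and -s options can be combined in a script to avoid asking for a
stemmer which does not exist. This is only a sketch: the function name
build_stemdb is hypothetical, and the output format of -l is not
described here, so the code assumes that the names are printed as
whitespace-separated words.

```shell
# Hypothetical helper: build a stem expansion database only if the stemmer
# name appears in the output of "recollindex -l".
# ASSUMPTION: -l prints the stemmer names as whitespace-separated words.
build_stemdb() {
    lang=$1
    # grep -w matches whole words, so "english" does not match
    # "english_lovins".
    if recollindex -l | grep -qw -- "$lang"; then
        recollindex -s "$lang"
    else
        echo "no stemmer named '$lang'" >&2
        return 1
    fi
}
```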
recollindex -S will rebuild the phonetic/orthographic index. This
feature uses the aspell package, which must be installed on the system.

recollindex -E will check that topdirs and the other relevant paths in
the configuration file exist (to help catch typos).

recollindex --webcache-compact will recover the space wasted by erased
page instances inside the Web cache. It may temporarily need to use
twice the disk space used by the Web cache.

recollindex --webcache-burst <destdir> will extract all entries from the
Web cache to files created inside <destdir>. Each cache entry is
extracted as two files, for the data and metadata.

recollindex --notindexed [path [path ...]] will check each path and
print out those which are absent from the index (with an "ABSENT"
prefix), or caused an indexing error (with an "ERROR" prefix). If no
paths are given on the command line, the command will read them, one per
line, from stdin.

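The output of --notindexed can be filtered to obtain a plain list of
missing files, as sketched below. The function name list_absent is
hypothetical, and the separator between the prefix and the path is not
specified here: the code assumes that the prefix is followed by
whitespace and then the path.

```shell
# Hypothetical helper: print the files under a directory which are absent
# from the index.
# ASSUMPTION: each reported line is the prefix, whitespace, then the path.
list_absent() {
    find "$1" -type f -print |
        recollindex --notindexed |
        sed -n 's/^ABSENT[[:space:]]*//p'    # keep ABSENT lines, drop prefix
}
```

The resulting list could then be passed to recollindex -i on stdin.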
SEE ALSO
recoll(1) recoll.conf(5)

8 January 2006                                            RECOLLINDEX(1)