RECOLLINDEX(1) General Commands Manual RECOLLINDEX(1)

NAME
recollindex - indexing command for the Recoll full text search system

SYNOPSIS
recollindex -h
recollindex [ -z|-Z ] [ -k ] [ --nopurge ] [ -P ] [ --diagsfile <diagpath> ]
recollindex -m [ -w <secs> ] [ -D ] [ -x ] [ -C ] [ -n|-k ]
recollindex -i [ -Z -k -f -P ] [<path [path ...]>]
recollindex -r [ -Z -K -e -f ] [ -p pattern ] <dirpath>
recollindex -e [<path [path ...]>]
recollindex -l|-S|-E
recollindex -s <lang>
recollindex --webcache-compact
recollindex --webcache-burst <destdir>
recollindex --notindexed [path [path ...]]

DESCRIPTION
Create or update a Recoll index.

There are several modes of operation. All modes support an optional
-c <cfgdir> option to specify the configuration directory, overriding
$RECOLL_CONFDIR and the built-in default ($HOME/.recoll).

The normal mode indexes the set of files described in the
configuration, incrementally updating the index with files that changed
since the last run. If option -z is given, the index is erased before
starting. If option -Z is given, the index is not erased, but all files
are treated as needing reindexing (in-place reset).

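As an illustration of the three variants of the normal mode, here is a
minimal sketch. The function name and the mode keywords are made up for
this example; only the -z and -Z options come from this manual.

```shell
# Hypothetical helper: run a normal indexing pass in one of three modes.
# The function name and mode keywords are illustrative only.
run_index() {
    case "$1" in
        incremental) recollindex ;;     # only files changed since the last run
        erase)       recollindex -z ;;  # erase the index, then index everything
        inplace)     recollindex -Z ;;  # keep the index, reindex every file
        *) echo "usage: run_index incremental|erase|inplace" >&2
           return 2 ;;
    esac
}
```

For example, "run_index inplace" would run recollindex -Z.
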
recollindex does not reprocess files which previously failed to index
(for example because of a missing helper program). If option -k is
given, recollindex will retry all failed files. Note that recollindex
may also decide to retry failed files if the auxiliary checking script
defined by the "checkneedretryindexscript" configuration variable
indicates that this should happen.

The --nopurge option disables the normal removal of deleted documents
from the index. This can be useful in special cases (for example, when
part of the document set is known to be temporarily inaccessible).

The -P option will force the purge pass. This is useful only if the
idxnoautopurge parameter is set in the configuration file.

If the option --diagsfile is given, the file at the path given as
parameter is truncated and indexing diagnostics are written to it. Each
line in the file holds a diagnostic type (the reason why the file was
not indexed), the file path, and possibly an additional piece of
information, which can be the MIME type or the archive internal path
depending on the issue. The following diagnostic types are currently
defined:

Skipped : the path matches an element of skippedPaths or skippedNames.

NoContentSuffix : the file name suffix is found in the
noContentSuffixes list.

MissingHelper : a helper program is missing.

Error : general error (see the log).

NoHandler : no handler is defined for the MIME type.

ExcludedMime : the MIME type is part of the excludedmimetypes list.

NotIncludedMime : the onlymimetypes list is not empty and the MIME type
is not in it.

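A diagnostics file can be summarized by counting lines per type. The
following is only a sketch: it assumes that the diagnostic type is the
first whitespace-separated field of each line (a trailing colon is
tolerated), which should be checked against a real file. The function
name is made up for this example.

```shell
# Hypothetical helper: count diagnostics per type in a --diagsfile output.
# ASSUMPTION: the type is the first whitespace-separated field of a line.
# $1 = path of the diagnostics file.
summarize_diags() {
    awk 'NF { t = $1; sub(/:$/, "", t); n[t]++ }
         END { for (t in n) print t, n[t] }' "$1" | sort
}
```

Typical use would be to run recollindex with --diagsfile pointing at a
scratch file, then call summarize_diags on that file.
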
If option -m is given, recollindex is started for real time monitoring,
using the file system monitoring package it was configured for (either
fam, gamin, or inotify). This mode must have been explicitly configured
when building the package; it is not available by default. The program
will normally detach from the controlling terminal and become a daemon.
If option -D is given, it will stay in the foreground. Option
-w <seconds> can be used to specify that the program should sleep for
the specified time before indexing begins. The default value is 60. The
daemon normally monitors the X11 session and exits when it is reset.
Option -x disables this X11 session monitoring (the daemon will stay
alive even if it cannot connect to the X11 server). You also need to
use this option if you run the daemon without an X11 context. You can
use option -n to skip the initial incremental pass which is normally
performed before monitoring starts. Once monitoring is started, the
daemon normally monitors the configuration and restarts from scratch if
a change is made. You can disable this with option -C.

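As a sketch of how these options combine, the following hypothetical
helper starts the monitor in the foreground and without X11 session
monitoring, which is the sort of invocation a service manager would
typically need. The function name is an assumption of this example.

```shell
# Hypothetical helper: start real time monitoring in the foreground (-D),
# without X11 session monitoring (-x).
# $1 = seconds to sleep before indexing begins (defaults to 60).
start_monitor() {
    recollindex -m -D -x -w "${1:-60}"
}
```
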
recollindex -i will index individual files into the index. The stem
expansion and aspell databases will not be updated. The skippedPaths
and skippedNames configuration variables will be used, so that some
files may be skipped. You can tell recollindex to ignore skippedPaths
and skippedNames by setting the -f option. This allows fully custom
file selection for a given subtree: add the top directory to
skippedPaths, and use any custom tool to generate the file list (e.g. a
tool from a source code control system). When run this way, the indexer
normally does not perform the deleted files purge pass, because it
cannot be sure to have seen all the existing files. You can force a
purge pass with -P.

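The custom selection described above could be sketched as follows. The
function name and the idea of storing the list in a file are
assumptions of this example; the list is passed on standard input,
which -i reads when no paths are given on the command line.

```shell
# Hypothetical helper: index exactly the paths listed in a file (one per
# line), ignoring skippedPaths and skippedNames because of -f.
# $1 = list file produced by any tool of your choice.
index_listed() {
    recollindex -i -f < "$1"
}
```
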
recollindex -e will erase data for individual files from the index. The
stem expansion databases will not be updated.

Options -i and -e can be combined. This will first perform the purge,
then the indexing.

With options -i or -e, if no file names are given on the command line,
they will be read from stdin, so that you could for example run:

    find /path/to/dir -print | recollindex -e -i

to force the reindexing of a directory tree (which has to exist inside
the file system area defined by topdirs in recoll.conf). You could
mostly accomplish the same thing with:

    find /path/to/dir -print | recollindex -Z -i

The latter will perform a less thorough job of purging stale
sub-documents though.

recollindex -r mostly works like -i, but the parameter is a single
directory, which will be recursively updated. This does little more
than find topdir | recollindex -i, but it may be more convenient to use
when started from another program. Failed files are retried by default;
use option -K to disable this. One or more -p options can be used to
set shell-type selection patterns (e.g. *.pdf).

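Since -p can be repeated, a wrapper has to build one -p per pattern.
The following sketch shows one way to do this in a POSIX shell. The
function name is made up; patterns must be quoted by the caller so that
the shell does not expand them.

```shell
# Hypothetical helper: recursively update one directory, restricted to
# files matching shell-type patterns.
# $1 = directory, remaining arguments = patterns (quoted by the caller).
index_tree_matching() {
    dir=$1
    shift
    # Rewrite the positional parameters as: -p pat1 -p pat2 ...
    for pat in "$@"; do
        set -- "$@" -p "$pat"
        shift
    done
    recollindex -r "$@" "$dir"
}
```

For example, index_tree_matching /path/to/dir '*.pdf' '*.odt' would run
recollindex -r -p '*.pdf' -p '*.odt' /path/to/dir.
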
recollindex -l will list the names of available language stemmers.

recollindex -s will build the stem expansion database for a given
language, which may or may not be part of the list in the configuration
file. If the language is not part of the configuration, the stem
expansion database will be deleted at the end of the next normal
indexing run. You can get the list of stemmer names from the
recollindex -l command. Note that this is mostly for experimental use.
The normal way to add a stemming language is to set it in the
configuration, either by editing "recoll.conf" or by using the GUI
indexing configuration dialog.

At the time of this writing, the following languages are recognized
(out of Xapian's stem.h):

• danish

• dutch

• english: Martin Porter's 2002 revision of his stemmer

• english_lovins: Lovins' stemmer

• english_porter: Porter's stemmer as described in his 1980 paper

• finnish

• french

• german

• italian

• norwegian

• portuguese

• russian

• spanish

• swedish

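The -l and -s options can be combined to avoid asking for a stemmer
which does not exist. This is only a sketch: it assumes that
recollindex -l prints the stemmer names separated by whitespace, which
this manual does not specify. The function name is made up.

```shell
# Hypothetical helper: build the stem expansion database for a language,
# but only if `recollindex -l` reports a stemmer with that exact name.
# ASSUMPTION: -l prints stemmer names separated by spaces, tabs or newlines.
build_stem_db() {
    if recollindex -l | tr -s ' \t' '\n\n' | grep -qxF -- "$1"; then
        recollindex -s "$1"
    else
        echo "no stemmer named '$1'" >&2
        return 1
    fi
}
```
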
recollindex -S will rebuild the phonetic/orthographic index. This
feature uses the aspell package, which must be installed on the system.

recollindex -E will check that topdirs and the other relevant paths
named in the configuration file exist (to help catch typos).

recollindex --webcache-compact will recover the space wasted by erased
page instances inside the Web cache. It may temporarily need to use
twice the disk space used by the Web cache.

recollindex --webcache-burst <destdir> will extract all entries from
the Web cache to files created inside <destdir>. Each cache entry is
extracted as two files, one for the data and one for the metadata.

recollindex --notindexed [path [path ...]] will check each path and
print out those which are absent from the index (with an "ABSENT"
prefix), or which caused an indexing error (with an "ERROR" prefix). If
no paths are given on the command line, the command will read them, one
per line, from stdin.

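For example, the files of a directory tree which are missing from the
index could be listed with a sketch like the following. The function
name is made up; the filter relies only on the "ABSENT" prefix
described above.

```shell
# Hypothetical helper: list the regular files under a directory which
# are absent from the index, using the documented "ABSENT" line prefix.
# $1 = directory to check. Returns non-zero if nothing is absent.
list_absent() {
    find "$1" -type f -print | recollindex --notindexed | grep '^ABSENT'
}
```
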
Interrupting the command: as indexing can sometimes take a long time,
the command can be interrupted by sending an interrupt (Ctrl-C, SIGINT)
or terminate (SIGTERM) signal. Some time may elapse before the process
exits, because it needs to properly flush and close the index. This can
also be done from the recoll GUI (menu entry: File/Stop_Indexing).
After such an interruption, the index will be somewhat inconsistent
because some operations which are normally performed at the end of the
indexing pass will have been skipped (for example, the stemming and
spelling databases will be missing or out of date). You just need to
restart indexing at a later time to restore consistency. The indexing
will restart at the interruption point (the full file tree will be
traversed, but files that were indexed up to the interruption and for
which the index is still up to date will not need to be reindexed).

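A script which stops the indexer should allow for the flush delay
mentioned above. The following sketch sends SIGTERM, then polls until
the process has exited. The function name, the polling approach and the
default timeout are assumptions of this example.

```shell
# Hypothetical helper: ask a running indexer to stop, then wait while it
# flushes and closes the index.
# $1 = PID of the recollindex process
# $2 = maximum number of seconds to wait (default 300, an arbitrary value)
stop_indexer() {
    kill -TERM "$1" 2>/dev/null || return 2   # could not send the signal
    tries=${2:-300}
    while kill -0 "$1" 2>/dev/null; do
        [ "$tries" -gt 0 ] || return 1        # still running, give up waiting
        tries=$((tries - 1))
        sleep 1
    done
}
```

The sketch never escalates to SIGKILL, so that the index is always
given the chance to be closed properly.
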
SEE ALSO
recoll(1) recoll.conf(5)

8 January 2006 RECOLLINDEX(1)