RECOLLINDEX(1)              General Commands Manual             RECOLLINDEX(1)

NAME

       recollindex - indexing command for the Recoll full text search system

SYNOPSIS

       recollindex -h
       recollindex [ -z|-Z ] [ -k ] [ --nopurge ] [ -P ]
              [ --diagsfile <diagpath> ]
       recollindex -m [ -w <secs> ] [ -D ] [ -x ] [ -C ] [ -n|-k ]
       recollindex -i [ -Z -k -f -P ] [<path [path ...]>]
       recollindex -r [ -Z -K -e -f ] [ -p pattern ] <dirpath>
       recollindex -e [<path [path ...]>]
       recollindex -l|-S|-E
       recollindex -s <lang>
       recollindex --webcache-compact
       recollindex --webcache-burst <destdir>
       recollindex --notindexed [path [path ...]]

DESCRIPTION

       Create or update a Recoll index.

       There are several modes of operation. All modes support an optional -c
       <cfgdir> option to specify the configuration directory, overriding
       $RECOLL_CONFDIR or the default location ($HOME/.recoll).

       The normal mode will index the set of files described in the
       configuration. This will incrementally update the index with files
       that changed since the last run. If option -z is given, the index will
       be erased before starting. If option -Z is given, the index will not
       be reset, but all files will be considered as needing reindexing (in
       place reset).

       By default, recollindex does not retry files which previously failed
       to index (for example because of a missing helper program). If option
       -k is given, recollindex will try again to process all failed files.
       Note that recollindex may also decide to retry failed files if the
       auxiliary checking script defined by the "checkneedretryindexscript"
       configuration variable indicates that this should happen.

       The --nopurge option will disable the normal erasure of deleted
       documents from the index. This can be useful in special cases (when it
       is known that part of the document set is temporarily not accessible).

       The -P option will force the purge pass. This is useful only if the
       idxnoautopurge parameter is set in the configuration file.

       If the option --diagsfile is given, the file at the path given as
       parameter will be truncated and indexing diagnostics will be written
       to it. Each line in the file will have a diagnostic type (reason for
       the file not to be indexed), the file path, and a possible additional
       piece of information, which can be the MIME type or the archive
       internal path depending on the issue. The following diagnostic types
       are currently defined:

              Skipped : the path matches an element of skippedPaths or
              skippedNames.

              NoContentSuffix : the file name suffix is found in the
              noContentSuffixes list.

              MissingHelper : a helper program is missing.

              Error : general error (see the log).

              NoHandler : no handler is defined for the MIME type.

              ExcludedMime : the MIME type is part of the excludedmimetypes
              list.

              NotIncludedMime : the onlymimetypes list is not empty and the
              MIME type is not in it.
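
       As an illustration, the diagnostics file can be summarized with a
       small shell function. This is only a sketch: summarize_diags is a
       hypothetical helper, not part of Recoll, and it assumes that the
       diagnostic type is the first whitespace-delimited field on each line,
       which this page does not state explicitly.

```shell
# Hypothetical helper (not part of Recoll): count the lines of a
# --diagsfile output per diagnostic type.
# Assumption: the type is the first whitespace-delimited field of each line.
summarize_diags() {
    # Skip empty lines, tally the first field, print "type count" sorted by type.
    awk 'NF { count[$1]++ } END { for (t in count) print t, count[t] }' "$1" | sort
}
```

       For example, after running recollindex --diagsfile with an arbitrary
       path such as /tmp/recoll-diags.txt, calling summarize_diags on that
       path prints one line per diagnostic type found, followed by its count.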

       If option -m is given, recollindex is started for real time monitoring,
       using the file system monitoring package it was configured for (either
       fam, gamin, or inotify). This mode must have been explicitly configured
       when building the package; it is not available by default. The program
       will normally detach from the controlling terminal and become a daemon.
       If option -D is given, it will stay in the foreground. Option -w
       <seconds> can be used to specify that the program should sleep for the
       specified time before indexing begins. The default value is 60 seconds.
       The daemon normally monitors the X11 session and exits when it is
       reset. Option -x disables this X11 session monitoring (the daemon will
       stay alive even if it cannot connect to the X11 server). You also need
       to use this option if you run the daemon without an X11 context. You
       can use option -n to skip the initial incremental pass which is
       normally performed before monitoring starts. Once monitoring is
       started, the daemon normally monitors the configuration and restarts
       from scratch if a change is made. You can disable this with option -C.

       recollindex -i will index individual files into the index. The stem
       expansion and aspell databases will not be updated. The skippedPaths
       and skippedNames configuration variables will be used, so that some
       files may be skipped. You can tell recollindex to ignore skippedPaths
       and skippedNames by setting the -f option. This allows fully custom
       file selection for a given subtree, for which you would add the top
       directory to skippedPaths, and use any custom tool to generate the file
       list (e.g. a tool from a source code control system). When run this
       way, the indexer normally does not perform the deleted files purge
       pass, because it cannot be sure to have seen all the existing files.
       You can force a purge pass with -P.

       recollindex -e will erase data for individual files from the index. The
       stem expansion databases will not be updated.

       Options -i and -e can be combined. This will first perform the purge,
       then the indexing.

       With options -i or -e, if no file names are given on the command line,
       they will be read from stdin, so that you could for example run:

       find /path/to/dir -print | recollindex -e -i

       to force the reindexing of a directory tree (which has to exist inside
       the file system area defined by topdirs in recoll.conf). You could
       mostly accomplish the same thing with

       find /path/to/dir -print | recollindex -Z -i

       The latter will perform a less thorough job of purging stale
       sub-documents though.

       recollindex -r mostly works like -i, but the parameter is a single
       directory, which will be recursively updated. This mostly does nothing
       more than find topdir | recollindex -i but it may be more convenient to
       use when started from another program. This retries failed files by
       default; use option -K to change this. One or multiple -p options can
       be used to set shell-type selection patterns (e.g.: *.pdf).

       recollindex -l will list the names of available language stemmers.

       recollindex -s will build the stem expansion database for a given
       language, which may or may not be part of the list in the configuration
       file. If the language is not part of the configuration, the stem
       expansion database will be deleted at the end of the next normal
       indexing run. You can get the list of stemmer names from the
       recollindex -l command. Note that this is mostly for experimental use;
       the normal way to add a stemming language is to set it in the
       configuration, either by editing "recoll.conf" or by using the GUI
       indexing configuration dialog.
       At the time of this writing, the following languages are recognized
       (out of Xapian's stem.h):

       •      danish

       •      dutch

       •      english : Martin Porter's 2002 revision of his stemmer

       •      english_lovins : Lovins' stemmer

       •      english_porter : Porter's stemmer as described in his 1980 paper

       •      finnish

       •      french

       •      german

       •      italian

       •      norwegian

       •      portuguese

       •      russian

       •      spanish

       •      swedish

       recollindex -S will rebuild the phonetic/orthographic index. This
       feature uses the aspell package, which must be installed on the system.

       recollindex -E will check that topdirs and the other relevant paths in
       the configuration file exist (to help catch typos).

       recollindex --webcache-compact will recover the space wasted by erased
       page instances inside the Web cache. It may temporarily need to use
       twice the disk space used by the Web cache.

       recollindex --webcache-burst <destdir> will extract all entries from
       the Web cache to files created inside <destdir>. Each cache entry is
       extracted as two files, for the data and metadata.

       recollindex --notindexed [path [path ...]] will check each path and
       print out those which are absent from the index (with an "ABSENT"
       prefix), or caused an indexing error (with an "ERROR" prefix). If no
       paths are given on the command line, the command will read them, one
       per line, from stdin.
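
       The prefixes make this output easy to filter. The following sketch
       assumes that each output line begins with its prefix, which this page
       does not state explicitly; filter_notindexed is a hypothetical helper,
       not part of Recoll.

```shell
# Hypothetical helper (not part of Recoll): read --notindexed output on
# stdin and keep only the lines starting with the given prefix
# (ABSENT or ERROR).
# Assumption: every output line begins with its prefix.
filter_notindexed() {
    # "|| true" keeps the exit status at zero when nothing matches.
    grep -- "^$1" || true
}
```

       For example, to list the files of a tree which are absent from the
       index:

       find /path/to/dir -type f -print | recollindex --notindexed |
           filter_notindexed ABSENT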

       Interrupting the command: as indexing can sometimes take a long time,
       the command can be interrupted by sending an interrupt (Ctrl-C, SIGINT)
       or terminate (SIGTERM) signal. Some time may elapse before the process
       exits, because it needs to properly flush and close the index. This can
       also be done from the recoll GUI (menu entry: File/Stop_Indexing).
       After such an interruption, the index will be somewhat inconsistent
       because some operations which are normally performed at the end of the
       indexing pass will have been skipped (for example, the stemming and
       spelling databases will be missing or out of date). You just need to
       restart indexing at a later time to restore consistency. The indexing
       will restart at the interruption point (the full file tree will be
       traversed, but files that were indexed up to the interruption and for
       which the index is still up to date will not need to be reindexed).

SEE ALSO

       recoll(1), recoll.conf(5)

                                8 January 2006                  RECOLLINDEX(1)