RECOLL.CONF(5) File Formats Manual RECOLL.CONF(5)

       recoll.conf - main personal configuration file for Recoll

       This file defines the index configuration for the Recoll full-text
       search system.

       The system-wide configuration file is normally located inside
       /usr/[local]/share/recoll/examples. Any parameter set in the common
       file may be overridden by setting it in the personal configuration
       file, by default: $HOME/.recoll/recoll.conf

       Please note that while I try to keep this manual page reasonably up
       to date, it will frequently lag behind the current state of the
       software. The best source of information about the configuration is
       the comments in the system-wide configuration file, or the user
       manual, which you can access from the recoll GUI help menu or on the
       recoll web site.

       A short extract of the file might look as follows:

           # Space-separated list of directories to index.
           topdirs = ~/docs /usr/share/doc

           [~/somedirectory-with-utf8-txt-files]
           defaultcharset = utf-8

       There are three kinds of lines:

       • Comment or empty

       • Parameter assignment

       • Section definition

       Empty lines or lines beginning with # are ignored.

       Assignment lines are in the form 'name = value'.

       Section lines allow redefining a parameter for a directory subtree.
       Some of the parameters used for indexing are looked up
       hierarchically, from the most to the least specific. Not all
       parameters can be meaningfully redefined; this is specified for each
       in the next section.
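
       As an illustration only, the hierarchical lookup can be sketched in
       Python as below. This is not Recoll's own (C++) code: the lookup()
       function and the representation of the parsed file as a dictionary
       of sections are assumptions. The most specific section containing
       the file wins; otherwise the top-level value is used.

```python
import os

def lookup(config, name, path):
    """Return the value of parameter `name` that applies to `path`.

    `config` maps a section directory ("" for the top level) to a dict
    of parameters; this representation is hypothetical.
    """
    path = os.path.normpath(os.path.expanduser(path))
    best_dir, best_value = None, None
    for section, params in config.items():
        if not section or name not in params:
            continue
        top = os.path.normpath(os.path.expanduser(section))
        # The section applies if path is the section directory or below it.
        if os.path.commonpath([path, top]) == top:
            # Keep the deepest (most specific) matching section.
            if best_dir is None or len(top) > len(best_dir):
                best_dir, best_value = top, params[name]
    if best_dir is not None:
        return best_value
    # No section matched: fall back to the top-level value, if any.
    return config.get("", {}).get(name)

config = {
    "": {"defaultcharset": "iso-8859-1"},
    "/home/me/utf8docs": {"defaultcharset": "utf-8"},
}
print(lookup(config, "defaultcharset", "/home/me/utf8docs/notes/a.txt"))
print(lookup(config, "defaultcharset", "/home/me/other/b.txt"))
```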

       The tilde character (~) is expanded in file names to the name of the
       user's home directory.

       Where values are lists, white space is used for separation, and
       elements with embedded spaces can be quoted with double quotes.
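
       A minimal sketch of how such a list value could be split, assuming
       that Python's shlex.split is a close enough stand-in for Recoll's
       own tokenizer (it also interprets backslashes and single quotes,
       which this page does not document for Recoll). The parse_list()
       name is hypothetical.

```python
import os
import shlex

def parse_list(value):
    """Split a list value on white space, keeping double-quoted elements
    together, and expand ~ at the start of each element."""
    return [os.path.expanduser(item) for item in shlex.split(value)]

items = parse_list('~/docs "/mnt/My Documents" /usr/share/doc')
print(items)
```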

       topdirs = string
              Space-separated list of files or directories to recursively
              index. Defaults to ~ (indexes $HOME). You can use symbolic
              links in the list; they will be followed, independently of the
              value of the followLinks variable.

       monitordirs = string
              Space-separated list of files or directories to monitor for
              updates. When running the real-time indexer, this allows
              monitoring only a subset of the whole indexed area. The
              elements must be included in the tree defined by the 'topdirs'
              members.

       skippedNames = string
              Files and directories which should be ignored. White space
              separated list of wildcard patterns (simple ones, not paths;
              they must contain no /), which will be tested against file and
              directory names. The list in the default configuration does
              not exclude hidden directories (names beginning with a dot),
              which means that it may index quite a few things that you do
              not want. On the other hand, email user agents like
              Thunderbird usually store messages in hidden directories, and
              you probably want this indexed. One possible solution is to
              have ".*" in "skippedNames", and add things like
              "~/.thunderbird" "~/.evolution" to "topdirs". Not even the file
              names are indexed for patterns in this list; see the
              "noContentSuffixes" variable for an alternative approach which
              indexes the file names. Can be redefined for any subtree.
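
              The name matching can be pictured with this Python sketch,
              which assumes case-sensitive matching, as with fnmatch(3).
              The is_skipped() helper and the pattern list are hypothetical
              examples, not the default configuration.

```python
import fnmatch

def is_skipped(name, skipped_names):
    """Return True if a bare file or directory name (not a path) matches
    one of the skippedNames wildcard patterns."""
    return any(fnmatch.fnmatchcase(name, pat) for pat in skipped_names)

# Hypothetical pattern list: hidden entries and editor backup files.
skipped = [".*", "*~", "#*#"]
for name in [".thunderbird", "notes.txt", "draft.txt~"]:
    print(name, is_skipped(name, skipped))
```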

       skippedNames- = string
              List of name endings to remove from the default skippedNames
              list.

       skippedNames+ = string
              List of name endings to add to the default skippedNames list.

       onlyNames = string
              Regular file name filter patterns. If this is set, only the
              file names not in skippedNames and matching one of the
              patterns will be considered for indexing. Can be redefined per
              subtree. Does not apply to directories.

       noContentSuffixes = string
              List of name endings (not necessarily dot-separated suffixes)
              for which we don't try MIME type identification, and don't
              uncompress or index content. Only the names will be indexed.
              This complements the now obsolete recoll_noindex list from the
              mimemap file, which will go away in a future release (the move
              from mimemap to recoll.conf allows editing the list through
              the GUI). This is different from skippedNames because these
              are name ending matches only (not wildcard patterns), and the
              file name itself gets indexed normally. This can be redefined
              for subdirectories.
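
              To contrast this with skippedNames, the sketch below uses a
              plain string comparison on the end of the name, so a '*' in
              an ending is an ordinary character. The names_only() helper
              and the example list are hypothetical, and case-sensitive
              comparison is an assumption.

```python
def names_only(name, no_content_suffixes):
    """Return True if only the name should be indexed: the name ends with
    one of the listed endings (plain comparison, no wildcards)."""
    return any(name.endswith(ending) for ending in no_content_suffixes)

# Hypothetical list; endings need not start with a dot.
suffixes = [".o", ".pyc", "~", "*b"]
print(names_only("module.pyc", suffixes))  # content skipped, name indexed
print(names_only("README", suffixes))
print(names_only("ab", suffixes))          # '*' is not a wildcard here
```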

       noContentSuffixes- = string
              List of name endings to remove from the default
              noContentSuffixes list.

       noContentSuffixes+ = string
              List of name endings to add to the default noContentSuffixes
              list.

       skippedPaths = string
              Absolute paths we should not go into. Space-separated list of
              wildcard expressions for absolute filesystem paths. Must be
              defined at the top level of the configuration file, not in a
              subsection. Can contain files and directories. The database
              and configuration directories will automatically be added. The
              expressions are matched using 'fnmatch(3)' with the
              FNM_PATHNAME flag set by default. This means that '/'
              characters must be matched explicitly. You can set
              'skippedPathsFnmPathname' to 0 to disable the use of
              FNM_PATHNAME (meaning that '/*/dir3' will match
              '/dir1/dir2/dir3'). The default value contains the usual mount
              point for removable media to remind you that it is a bad idea
              to have Recoll work on these (especially with the monitor:
              media gets indexed on mount, and all data gets erased on
              unmount). Explicitly adding '/media/xxx' to the 'topdirs'
              variable will override this.

       skippedPathsFnmPathname = bool
              Set to 0 to override use of FNM_PATHNAME for matching skipped
              paths.
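
              The effect of FNM_PATHNAME can be shown with a small Python
              translation of wildcards to regular expressions. Python's own
              fnmatch module has no FNM_PATHNAME equivalent, so this sketch
              handles only '*' and '?' (no bracket expressions); the
              function names are hypothetical.

```python
import re

def wildcard_to_regex(pattern, pathname=True):
    """Translate a wildcard using only '*' and '?' into a compiled regex.
    With pathname=True the wildcards never match '/', as with
    fnmatch(3) and the FNM_PATHNAME flag."""
    any_char = "[^/]" if pathname else "."
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(any_char + "*")
        elif ch == "?":
            parts.append(any_char)
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))

def path_skipped(path, patterns, pathname=True):
    """True if the whole path matches one of the patterns."""
    return any(wildcard_to_regex(p, pathname).fullmatch(path)
               for p in patterns)

print(path_skipped("/dir1/dir2/dir3", ["/*/dir3"]))
print(path_skipped("/dir1/dir2/dir3", ["/*/dir3"], pathname=False))
```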

       nowalkfn = string
              File name which will cause its parent directory to be skipped.
              Any directory containing a file with this name will be skipped
              as if it was part of the skippedPaths list. Example:
              .recoll-noindex
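
              A sketch of how a directory walk could honour nowalkfn, using
              Python's os.walk and pruning the list of subdirectories in
              place. The walk_indexable() and demo() functions are
              hypothetical; .recoll-noindex is used as the marker name, as
              in the example above.

```python
import os
import tempfile

def walk_indexable(top, nowalkfn=".recoll-noindex"):
    """Yield file paths under top, skipping every directory that contains
    a file named nowalkfn, together with its whole subtree."""
    for dirpath, dirnames, filenames in os.walk(top):
        if nowalkfn in filenames:
            dirnames[:] = []  # prune: os.walk will not descend further
            continue          # and this directory's files are not yielded
        for name in filenames:
            yield os.path.join(dirpath, name)

def demo():
    """Build a small tree in a temporary directory and walk it."""
    with tempfile.TemporaryDirectory() as top:
        os.makedirs(os.path.join(top, "keep"))
        os.makedirs(os.path.join(top, "skip", "sub"))
        for rel in ("keep/a.txt", "skip/.recoll-noindex", "skip/sub/b.txt"):
            with open(os.path.join(top, *rel.split("/")), "w"):
                pass
        return sorted(os.path.relpath(p, top) for p in walk_indexable(top))

print(demo())
```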

       daemSkippedPaths = string
              skippedPaths equivalent specific to real-time indexing. This
              enables having parts of the tree which are initially indexed
              but not monitored. If daemSkippedPaths is not set, the daemon
              uses skippedPaths.

       zipUseSkippedNames = bool
              Use skippedNames inside Zip archives. Fetched directly by the
              rclzip.py handler. Skip the patterns defined by skippedNames
              inside Zip archives. Can be redefined for subdirectories. See
              https://www.lesbonscomptes.com/recoll/faqsandhowtos/FilteringOutZipArchiveMembers.html

       zipSkippedNames = string
              Space-separated list of wildcard expressions for names that
              should be ignored inside zip archives. This is used directly
              by the zip handler. If zipUseSkippedNames is not set,
              zipSkippedNames defines the patterns to be skipped inside
              archives. If zipUseSkippedNames is set, the two lists are
              concatenated and used. Can be redefined for subdirectories.
              See
              https://www.lesbonscomptes.com/recoll/faqsandhowtos/FilteringOutZipArchiveMembers.html

       followLinks = bool
              Follow symbolic links during indexing. The default is to
              ignore symbolic links to avoid multiple indexing of linked
              files. No effort is made to avoid duplication when this option
              is set to true. This option can be set individually for each
              of the 'topdirs' members by using sections. It cannot be
              changed below the 'topdirs' level. Links in the 'topdirs' list
              itself are always followed.

       indexedmimetypes = string
              Restrictive list of indexed MIME types. Normally not set (in
              which case all supported types are indexed). If it is set,
              only the types from the list will have their contents indexed.
              The names will be indexed anyway if indexallfilenames is set
              (the default). MIME type names should be taken from the
              mimemap file (the values may be different from xdg-mime or
              file -i output in some cases). Can be redefined for subtrees.

       excludedmimetypes = string
              List of excluded MIME types. Lets you exclude some types from
              indexing. MIME type names should be taken from the mimemap
              file (the values may be different from xdg-mime or file -i
              output in some cases). Can be redefined for subtrees.

       nomd5types = string
              Don't compute MD5 for these types. MD5 checksums are used only
              for deduplicating results, and can be very expensive to
              compute on multimedia or other big files. This list lets you
              turn off MD5 computation for selected types. It is global (no
              redefinition for subtrees). At the moment, it only has an
              effect for external handlers (exec and execm). The file types
              can be specified by listing either MIME types (e.g.
              audio/mpeg) or handler names (e.g. rclaudio.py).

       compressedfilemaxkbs = int
              Size limit for compressed files. We need to decompress these
              in a temporary directory for identification, which can be
              wasteful in some cases. Limit the waste. Negative means no
              limit. 0 results in no processing of any compressed file.
              Default 50 MB.

       textfilemaxmbs = int
              Size limit for text files. Mostly for skipping monster logs.
              Default 20 MB.

       indexallfilenames = bool
              Index the file names of unprocessed files. Index the names of
              files the contents of which we don't index because of an
              excluded or unsupported MIME type.

       usesystemfilecommand = bool
              Use a system command for file MIME type guessing as a final
              step in file type identification. This is generally useful,
              but will usually cause the indexing of many bogus 'text'
              files. See 'systemfilecommand' for the command used.

       systemfilecommand = string
              Command used to guess MIME types if the internal methods fail.
              This should be a "file -i" workalike. The file path will be
              added as a last parameter to the command line. "xdg-mime"
              works better than the traditional "file" command, and is now
              the configured default (with a hard-coded fallback to "file").

       processwebqueue = bool
              Decide if we process the Web queue. The queue is a directory
              where the Recoll Web browser plugins create the copies of
              visited pages.

       textfilepagekbs = int
              Page size for text files. If this is set, text/plain files
              will be divided into documents of approximately this size.
              This reduces memory usage at index time and helps with loading
              data in the preview window at query time. Particularly useful
              with very big files, such as application or system logs. Also
              see textfilemaxmbs and compressedfilemaxkbs.
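
              The following Python sketch shows one way of dividing a text
              into pages of approximately textfilepagekbs kilobytes. It
              assumes that pages are cut at the first line end after the
              size limit, and it counts characters rather than bytes;
              Recoll's actual splitting rules may differ.

```python
def split_pages(text, page_kbs):
    """Divide text into pages of roughly page_kbs KB (counted here in
    characters), extending each page to the next line end."""
    limit = page_kbs * 1024
    pages = []
    start = 0
    while start < len(text):
        end = start + limit
        if end < len(text):
            newline = text.find("\n", end)
            end = len(text) if newline == -1 else newline + 1
        pages.append(text[start:end])
        start = end
    return pages

log = ("x" * 1000 + "\n") * 5     # five 1001-character lines
pages = split_pages(log, 1)
print([len(p) for p in pages])
```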

       membermaxkbs = int
              Size limit for archive members. This is passed to the filters
              in the environment as RECOLL_FILTER_MAXMEMBERKB.

       indexStripChars = bool
              Decide if we store character case and diacritics in the index.
              If we do, searches sensitive to case and diacritics can be
              performed, but the index will be bigger, and some marginal
              weirdness may sometimes occur. The default is a stripped
              index. When using multiple indexes for a search, this
              parameter must be defined identically for all. Changing the
              value implies an index reset.

       indexStoreDocText = bool
              Decide if we store the documents' text content in the index.
              Storing the text allows extracting snippets from it at query
              time, instead of building them from index position data.
              Newer Xapian index formats have rendered our use of position
              lists unacceptably slow in some cases. The last Xapian index
              format with good performance for the old method is Chert,
              which is the default for Xapian 1.2, still supported but not
              the default in 1.4, and will be dropped in 1.6. The stored
              document text is translated from its original format to UTF-8
              plain text, but not stripped of upper-case, diacritics, or
              punctuation signs. Storing it typically increases the index
              size by 10-20%, but also allows for nicer snippets, so it may
              be worth enabling it even if not strictly needed for
              performance if you can afford the space. The variable only has
              an effect when creating an index, meaning that the xapiandb
              directory must not exist yet. Its exact effect depends on the
              Xapian version. For Xapian 1.4, if the variable is set to 0,
              the Chert format will be used, and the text will not be
              stored. If the variable is 1, Glass will be used, and the text
              stored. For Xapian 1.2, and for 1.5 and newer, the index format
              is always the default, but the variable controls if the text
              is stored or not, and the abstract generation method. With
              Xapian 1.5 and later, and the variable set to 0, abstract
              generation may be very slow, but this setting may still be
              useful to save space if you do not use abstract generation at
              all.


       nonumbers = bool
              Decides if terms will be generated for numbers. For example
              "123", "1.5e6", "192.168.1.4" would not be indexed if
              nonumbers is set ("value123" would still be). Numbers are often
              quite interesting to search for, and this should probably not
              be set except for special situations, e.g. scientific
              documents with huge amounts of numbers in them, where setting
              nonumbers will reduce the index size. This can only be set for
              a whole index, not for a subtree.

       dehyphenate = bool
              Determines if we index 'coworker' also when the input is
              'co-worker'. This is new in version 1.22, and on by default.
              Setting the variable to off allows restoring the previous
              behaviour.

       backslashasletter = bool
              Process backslash as a normal letter. This may make sense for
              people wanting to index TeX commands as such, but is not of
              much general use.

       underscoreasletter = bool
              Process underscore as a normal letter. This makes sense in so
              many cases that one wonders if it should not be the default.

       maxtermlength = int
              Maximum term length. Words longer than this will be discarded.
              The default is 40 and used to be hard-coded, but it can now be
              adjusted. You need an index reset if you change the value.

       nocjk = bool
              Decides if specific East Asian (Chinese, Korean, Japanese)
              character/word splitting is turned off. This will save a small
              amount of CPU if you have no CJK documents. If your document
              base does include such text but you are not interested in
              searching it, setting nocjk may be a significant time and
              space saver.

       cjkngramlen = int
              This lets you adjust the size of n-grams used for indexing CJK
              text. The default value of 2 is probably appropriate in most
              cases. A value of 3 would allow more precision and efficiency
              on longer words, but the index will be approximately twice as
              large.

       indexstemminglanguages = string
              Languages for which to create stemming expansion data. Stemmer
              names can be found by executing 'recollindex -l', or this can
              also be set from a list in the GUI. The values are full
              language names, e.g. english, french...

       defaultcharset = string
              Default character set. This is used for files which do not
              contain a character set definition (e.g.: text/plain). Values
              found inside files, e.g. a 'charset' tag in HTML documents,
              will override it. If this is not set, the default character
              set is the one defined by the NLS environment ($LC_ALL,
              $LC_CTYPE, $LANG), or ultimately iso-8859-1 (cp-1252 in fact).
              If for some reason you want a general default which does not
              match your LANG and is not 8859-1, use this variable. This can
              be redefined for any sub-directory.

       unac_except_trans = string
              A list of characters, encoded in UTF-8, which should be
              handled specially when converting text to unaccented
              lowercase. For example, in Swedish, the letter a with
              diaeresis has full alphabet citizenship and should not be
              turned into an a. Each element in the space-separated list has
              the special character as first element and the translation
              following. The handling of both the lowercase and upper-case
              versions of a character should be specified, as membership in
              the list will turn off both standard accent and case
              processing. The value is global and affects both indexing and
              querying. Examples:
              Swedish:
              unac_except_trans = ää Ää öö Öö üü Üü ßss œoe Œoe æae Æae ﬀff ﬁfi ﬂfl åå Åå
              German:
              unac_except_trans = ää Ää öö Öö üü Üü ßss œoe Œoe æae Æae ﬀff ﬁfi ﬂfl
              French: you probably want to decompose oe and ae, and nobody
              would type a German ß:
              unac_except_trans = ßss œoe Œoe æae Æae ﬀff ﬁfi ﬂfl
              The default for all languages, until someone protests,
              follows. These decompositions are not performed by unac, but
              it is unlikely that someone would type the composed forms in a
              search.
              unac_except_trans = ßss œoe Œoe æae Æae ﬀff ﬁfi ﬂfl
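
              The Python sketch below shows how a value in this format could
              be parsed and applied. It approximates unac by decomposing
              characters (Unicode NFD) and dropping combining marks, which
              is an assumption, not the unac library's actual tables.
              Characters found in the list bypass both accent and case
              processing, as described above. The function names are
              hypothetical.

```python
import unicodedata

def parse_exceptions(value):
    """Map each element's first character to the rest of the element."""
    return {item[0]: item[1:] for item in value.split()}

def unac_lower(text, exceptions):
    """Lowercase and strip accents, except for characters in exceptions,
    which are replaced by their listed translation instead."""
    out = []
    for ch in text:
        if ch in exceptions:
            out.append(exceptions[ch])
        else:
            decomposed = unicodedata.normalize("NFD", ch)
            base = "".join(c for c in decomposed
                           if not unicodedata.combining(c))
            out.append(base.lower())
    return "".join(out)

german = parse_exceptions("ää Ää öö Öö üü Üü ßss")
print(unac_lower("Äpfel Straße", german))
print(unac_lower("Äpfel Straße", {}))
```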

       maildefcharset = string
              Overrides the default character set for email messages which
              don't specify one. This is mainly useful for readpst (libpst)
              dumps, which are UTF-8 but do not say so.

       localfields = string
              Set fields on all files (usually of a specific file system
              area). The syntax is the usual one: name = value ; attr1 =
              val1 ; [...]. Here the main value is empty, so the setting
              needs an initial semicolon. This is useful, e.g., for setting
              the rclaptg field for application selection inside mimeview.
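
              A minimal parsing sketch for this syntax, in Python. The
              parse_localfields() helper is hypothetical, and so are the
              "gnumeric" value used for rclaptg and the "author" field in
              the example.

```python
def parse_localfields(value):
    """Parse 'main ; attr1 = val1 ; attr2 = val2' and return the
    attributes; with localfields the main value is empty."""
    _main, *attrs = value.split(";")
    fields = {}
    for attr in attrs:
        name, sep, val = attr.partition("=")
        if sep:
            fields[name.strip()] = val.strip()
    return fields

# Hypothetical setting for one subtree.
print(parse_localfields("; rclaptg = gnumeric ; author = me"))
```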

       testmodifusemtime = bool
              Use mtime instead of ctime to test if a file has been
              modified. The time is used in addition to the size, which is
              always used. Setting this can reduce re-indexing on systems
              where extended attributes are used (by some other
              application), but not indexed, because changing extended
              attributes only affects ctime. Notes:

              • This may prevent detection of change in some marginal file
                rename cases (the target would need to have the same size
                and mtime).

              • You should probably also set noxattrfields to 1 in this
                case, except if you still prefer to perform xattr indexing,
                for example if the local file update pattern makes it of
                value (in general, there is a risk that pure extended
                attribute updates without file modification go undetected).

              Perform a full index reset after changing this.

       noxattrfields = bool
              Disable extended attributes conversion to metadata fields.
              This probably needs to be set if testmodifusemtime is set.

       metadatacmds = string
              Define commands to gather external metadata, e.g. tmsu tags.
              There can be several entries, separated by semicolons, each
              defining which field name the data goes into and the command
              to use. Don't forget the initial semicolon. All the field
              names must be different. You can use aliases in the "fields"
              file if necessary. As a not too pretty hack conceded to
              convenience, any field name beginning with "rclmulti" will be
              taken as an indication that the command returns multiple field
              values inside a text blob formatted as a recoll configuration
              file ("fieldname = fieldvalue" lines). The rclmultixx name will
              be ignored, and field names and values will be parsed from the
              data. Example:
              metadatacmds = ; tags = tmsu tags %f; rclmulti1 = cmdOutputsConf %f
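
              As a rough sketch (hypothetical helper names, not Recoll's
              code), the value could be split into field/command pairs,
              with %f presumably replaced by the file path, and the output
              of an rclmulti command parsed as "name = value" lines.
              Splitting commands with shlex and ignoring '#' comment lines
              in the output are assumptions.

```python
import shlex

def parse_metadatacmds(value):
    """Return (field, argv) pairs from '; field1 = cmd1 %f; field2 = ...'."""
    entries = []
    for part in value.split(";"):
        field, sep, cmd = part.partition("=")
        if sep:
            entries.append((field.strip(), shlex.split(cmd)))
    return entries

def command_for(argv, path):
    """Replace the %f placeholder with the file path."""
    return [path if arg == "%f" else arg for arg in argv]

def parse_multi(output):
    """For rclmulti* fields: read 'fieldname = fieldvalue' lines."""
    fields = {}
    for line in output.splitlines():
        name, sep, val = line.partition("=")
        if sep and not line.lstrip().startswith("#"):
            fields[name.strip()] = val.strip()
    return fields

cmds = parse_metadatacmds("; tags = tmsu tags %f; rclmulti1 = cmdOutputsConf %f")
for field, argv in cmds:
    print(field, command_for(argv, "/home/me/doc.pdf"))
print(parse_multi("author = Me\nrating = 5\n"))
```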

       cachedir = dfn
              Top directory for Recoll data. Recoll data directories are
              normally located relative to the configuration directory (e.g.
              ~/.recoll/xapiandb, ~/.recoll/mboxcache). If 'cachedir' is
              set, the directories are stored under the specified value
              instead (e.g. if cachedir is ~/.cache/recoll, the default dbdir
              would be ~/.cache/recoll/xapiandb). This affects dbdir,
              webcachedir, mboxcachedir, aspellDicDir, which can still be
              individually specified to override cachedir. Note that if you
              have multiple configurations, each must have a different
              cachedir: there is no automatic computation of a subpath under
              cachedir.

       maxfsoccuppc = int
              Maximum file system occupation over which we stop indexing.
              The value is a percentage, corresponding to what the
              "Capacity" df output column shows. The default value is 0,
              meaning no checking.
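
              A Python sketch of the check, assuming that the percentage is
              computed like GNU df's Capacity column (used space divided by
              used plus available space, rounded up) and that indexing stops
              when the value goes above maxfsoccuppc. The function names are
              hypothetical.

```python
import shutil

def occupation_percent(used, available):
    """Like df's Capacity column: used / (used + available), rounded up,
    computed with integers to avoid floating point issues."""
    total = used + available
    return (100 * used + total - 1) // total

def fs_too_full(path, maxfsoccuppc):
    """True if the file system holding path is fuller than maxfsoccuppc
    percent. 0 (the default) means no checking; treating negative values
    the same way is an assumption."""
    if maxfsoccuppc <= 0:
        return False
    usage = shutil.disk_usage(path)
    return occupation_percent(usage.used, usage.free) > maxfsoccuppc

print(occupation_percent(1, 199))
print(fs_too_full("/", 0))
```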

       dbdir = dfn
              Xapian database directory location. This will be created on
              first indexing. If the value is not an absolute path, it will
              be interpreted as relative to cachedir if set, or the
              configuration directory (-c argument or $RECOLL_CONFDIR). If
              nothing is specified, the default is then ~/.recoll/xapiandb/

       idxstatusfile = fn
              Name of the scratch file where the indexer process updates its
              status. Default: idxstatus.txt inside the configuration
              directory.

       mboxcachedir = dfn
              Directory location for storing mbox message offsets cache
              files. This is normally 'mboxcache' under cachedir if set, or
              else under the configuration directory, but it may be useful
              to share a directory between different configurations.

       mboxcacheminmbs = int
              Minimum mbox file size over which we cache the offsets. There
              is really no sense in caching offsets for small files. The
              default is 5 MB.

       mboxmaxmsgmbs = int
              Maximum mbox member message size in megabytes. Size over which
              we assume that the mbox format is bad or we misinterpreted it,
              at which point we just stop processing the file.

       webcachedir = dfn
              Directory where we store the archived web pages. This is only
              used by the web history indexing code. Default:
              cachedir/webcache if cachedir is set, else
              $RECOLL_CONFDIR/webcache

       webcachemaxmbs = int
              Maximum size in MB of the Web archive. This is only used by
              the web history indexing code. Default: 40 MB. Reducing the
              size will not physically truncate the file.

       webqueuedir = fn
              The path to the Web indexing queue. This used to be hard-coded
              in the old plugin as ~/.recollweb/ToIndex, so there was no
              need or possibility to change it, but the WebExtensions plugin
              now downloads the files to the user Downloads directory, and a
              script moves them to webqueuedir. The script reads this value
              from the config, so it has become possible to change it.

       webdownloadsdir = fn
              The path to the browser downloads directory. This is where the
              new browser add-on extension has to create the files. They are
              then moved by a script to webqueuedir.

       aspellDicDir = dfn
              Aspell dictionary storage directory location. The aspell
              dictionary (aspdict.(lang).rws) is normally stored in the
              directory specified by cachedir if set, or under the
              configuration directory.

       filtersdir = dfn
              Directory location for executable input handlers. If
              RECOLL_FILTERSDIR is set in the environment, we use it
              instead. Defaults to $prefix/share/recoll/filters. Can be
              redefined for subdirectories.

       iconsdir = dfn
              Directory location for icons. The only reason to change this
              would be if you want to change the icons displayed in the
              result list. Defaults to $prefix/share/recoll/images

       idxflushmb = int
              Threshold (megabytes of new data) where we flush from memory
              to disk index. Setting this allows some control over memory
              usage by the indexer process. A value of 0 means no explicit
              flushing, which lets Xapian perform its own thing, meaning
              flushing every $XAPIAN_FLUSH_THRESHOLD documents created,
              modified or deleted: as memory usage depends on average
              document size, not only document count, the Xapian approach is
              not very useful, and you should let Recoll manage the flushes.
              The compiled-in default value is 0. The configured default
              value (from this file) is now 50 MB, and should be OK in many
              cases. You can set it as low as 10 to conserve memory, but if
              you are looking for maximum speed, you may want to experiment
              with values between 20 and 200. In my experience, values
              beyond this are always counterproductive. If you find
              otherwise, please drop me a note.

       filtermaxseconds = int
              Maximum external filter execution time in seconds. Default
              1200 (20 minutes). Set to 0 for no limit. This is mainly to
              avoid infinite loops in PostScript files (loop.ps).

       filtermaxmbytes = int
              Maximum virtual memory space for filter processes
              (setrlimit(RLIMIT_AS)), in megabytes. Note that this includes
              any mapped libs (there is no reliable Linux way to limit the
              data space only), so we need to be a bit generous here.
              Anything over 2000 will be ignored on 32-bit machines. The
              previous default value of 2000 would prevent Java pdftk from
              working when executed from Python rclpdf.py.

       thrQSizes = string
              Stage input queues configuration. There are three internal
              queues in the indexing pipeline stages (file data extraction,
              terms generation, index update). This parameter defines the
              queue depths for each stage (three integer values). If a value
              of -1 is given for a given stage, no queue is used, and the
              thread will go on performing the next stage. In practice, deep
              queues have not been shown to increase performance. Default: a
              value of 0 for the first queue tells Recoll to perform
              autoconfiguration based on the detected number of CPUs (no
              need for the two other values in this case). Use
              'thrQSizes = -1 -1 -1' to disable multithreading entirely.

       thrTCounts = string
              Number of threads used for each indexing stage. The three
              stages are: file data extraction, terms generation, index
              update. The use of the counts is also controlled by some
              special values in thrQSizes: if the first queue depth is 0,
              all counts are ignored (autoconfigured); if a value of -1 is
              used for a queue depth, the corresponding thread count is
              ignored. It makes no sense to use a value other than 1 for the
              last stage because updating the Xapian index is necessarily
              single-threaded (and protected by a mutex).

       loglevel = int
              Log file verbosity 1-6. A value of 2 will print only errors and
              warnings. 3 will print information like document updates, 4 is
              quite verbose and 6 very verbose.

       logfilename = fn
              Log file destination. Use 'stderr' (default) to write to the
              console.

       idxloglevel = int
              Override loglevel for the indexer.

       idxlogfilename = fn
              Override logfilename for the indexer.

       daemloglevel = int
              Override loglevel for the indexer in real-time mode. The
              default is to use the idx... values if set, else the log...
              values.

       daemlogfilename = fn
              Override logfilename for the indexer in real-time mode. The
              default is to use the idx... values if set, else the log...
              values.

       pyloglevel = int
              Override loglevel for the Python module.

       pylogfilename = fn
              Override logfilename for the Python module.
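
              The fallback order can be summarised with the following
              sketch. The daemon chain (daem..., then idx..., then log...)
              is the one described above; the assumption that the py...
              values fall back directly to the log... values, and the
              effective_log_setting() name, are illustrative only.

```python
def effective_log_setting(config, suffix, process, default=None):
    """Resolve a logging parameter.

    suffix is "level" or "filename"; process is "daem" (real-time
    indexer), "idx" (batch indexer), "py" (Python module) or anything
    else for the plain log... values.
    """
    prefixes = {
        "daem": ["daem", "idx", ""],
        "idx": ["idx", ""],
        "py": ["py", ""],
    }.get(process, [""])
    for prefix in prefixes:
        name = prefix + "log" + suffix
        if name in config:
            return config[name]
    return default

config = {"loglevel": 2, "idxloglevel": 4, "logfilename": "/tmp/recoll.log"}
print(effective_log_setting(config, "level", "daem"))
print(effective_log_setting(config, "filename", "daem"))
```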

       orgidxconfdir = dfn
              Original location of the configuration directory. This is used
              exclusively for movable datasets. Locating the configuration
              directory inside the directory tree makes it possible to
              provide automatic query time path translations once the data
              set has moved (for example, because it has been mounted on
              another location).

       curidxconfdir = dfn
              Current location of the configuration directory. Complements
              orgidxconfdir for movable datasets. This should be used if the
              configuration directory has been copied from the dataset to
              another location, either because the dataset is read-only and
              an r/w copy is desired, or for performance reasons. This
              records the original moved location before the copy, to allow
              path translation computations. For example, if a dataset
              originally indexed as '/home/me/mydata/config' has been
              mounted to '/media/me/mydata', and the GUI is running from a
              copied configuration, orgidxconfdir would be
              '/home/me/mydata/config', and curidxconfdir (as set in the
              copied configuration) would be '/media/me/mydata/config'.

       idxrundir = dfn
              Indexing process current directory. The input handlers
              sometimes leave temporary files in the current directory, so
              it makes sense to have recollindex chdir to some temporary
              directory. If the value is empty, the current directory is not
              changed. If the value is (literal) tmp, we use the temporary
              directory as set by the environment (RECOLL_TMPDIR else TMPDIR
              else /tmp). If the value is an absolute path to a directory,
              we go there.
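
              The rules above can be sketched in Python as follows. The
              indexer_run_dir() name is hypothetical, and returning None for
              values that are neither empty, tmp, nor absolute paths is an
              assumption, as this page does not cover that case.

```python
import os

def indexer_run_dir(idxrundir):
    """Return the directory recollindex would chdir to, or None if the
    current directory is left unchanged."""
    if not idxrundir:
        return None
    if idxrundir == "tmp":
        # RECOLL_TMPDIR, else TMPDIR, else /tmp.
        return (os.environ.get("RECOLL_TMPDIR")
                or os.environ.get("TMPDIR")
                or "/tmp")
    if os.path.isabs(idxrundir):
        return idxrundir
    return None  # not covered by this page; assumed to mean "no change"

print(indexer_run_dir("/var/tmp/recollrun"))
```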

       checkneedretryindexscript = fn
              Script used to heuristically check if we need to retry
              indexing files which previously failed. The default script
              checks the modified dates on /usr/bin and /usr/local/bin. A
              relative path will be looked up in the filters dirs, then in
              the path. Use an absolute path to do otherwise.

       recollhelperpath = string
              Additional places to search for helper executables. This is
              only used on Windows for now.

       idxabsmlen = int
              Length of abstracts we store while indexing. Recoll stores an
              abstract for each indexed file. The text can come from an
              actual 'abstract' section in the document or will just be the
              beginning of the document. It is stored in the index so that
              it can be displayed inside the result lists without decoding
              the original file. The idxabsmlen parameter defines the size
              of the stored abstract. The default value is 250 bytes. The
              search interface gives you the choice to display this stored
              text or a synthetic abstract built by extracting text around
              the search terms. If you always prefer the synthetic abstract,
              you can reduce this value and save a little space.

       idxmetastoredlen = int
              Truncation length of stored metadata fields. This does not
              affect indexing (the whole field is processed anyway), just
              the amount of data stored in the index for the purpose of
              displaying fields inside result lists or previews. The default
              value is 150 bytes, which may be too low if you have custom
              fields.

       idxtexttruncatelen = int
              Truncation length for all document texts. Only index the
              beginning of documents. This is not recommended except if you
              are sure that the interesting keywords are at the top and have
              severe disk space issues.
654
655 aspellLanguage = string
656 Language definitions to use when creating the aspell dictionary.
657 The value must match a set of aspell language definition files.
658 You can type "aspell dicts" to see a list The default if this is
659 not set is to use the NLS environment to guess the value. The
660 values are the 2-letter language codes (e.g. 'en', 'fr'...)
661
662 aspellAddCreateParam = string
663 Additional option and parameter to aspell dictionary creation
664 command. Some aspell packages may need an additional option
665 (e.g. on Debian Jessie: --local-data-dir=/usr/lib/aspell). See
666 Debian bug 772415.
667
668 aspellKeepStderr = bool
669 Set this to have a look at aspell dictionary creation errors.
670 There are always many, so this is mostly for debugging.
671
672 noaspell = bool
673 Disable aspell use. The aspell dictionary generation takes time,
674 and some combinations of aspell version, language, and local
675 terms, result in aspell crashing, so it sometimes makes sense to
676 just disable the thing.
677
678 monauxinterval = int
679 Auxiliary database update interval. The real time indexer only
680 updates the auxiliary databases (stemdb, aspell) periodically,
681 because it would be too costly to do it for every document
682 change. The default period is one hour.
683
monixinterval = int
Minimum interval (seconds) between processing runs of the indexing
queue. The real time indexer does not process each event as it
comes in, but lets the queue accumulate, to reduce overhead and to
aggregate multiple events affecting the same file. Default: 30
seconds.

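The accumulate-then-process behaviour can be illustrated with the Python sketch below. It is not Recoll's code. The (path, kind) event tuples and the coalesce_events helper are hypothetical. Keeping only the most recent event for each path is an assumption made for illustration.

```python
from collections import OrderedDict


def coalesce_events(events):
    """Hypothetical sketch: collapse queued (path, kind) events.

    Events accumulated during one monixinterval period are reduced to a
    single entry per path; the most recent kind wins (an assumption).
    """
    latest = OrderedDict()
    for path, kind in events:
        # Re-insert so the path moves to the position of its last event
        latest.pop(path, None)
        latest[path] = kind
    return list(latest.items())


queue = [
    ("/home/me/notes.txt", "modified"),
    ("/home/me/report.odt", "created"),
    ("/home/me/notes.txt", "modified"),
    ("/home/me/report.odt", "deleted"),
]
print(coalesce_events(queue))
```

Here four queued events reduce to two units of work. This is the saving a longer monixinterval buys, at the cost of index freshness.
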
mondelaypatterns = string
Timing parameters for the real time indexing. Definitions for
files which get a longer delay before reindexing is allowed.
This is for fast-changing files, which should only be reindexed
once in a while. The value is a list of wildcardPattern:seconds
pairs. The patterns are matched with fnmatch(pattern, path, 0).
You can quote entries containing white space with double quotes
(quote the whole entry, not just the pattern). The default is
empty. Example:
mondelaypatterns = *.log:20 "*with spaces.*:30"

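The value format can be checked with the Python sketch below, which parses the example above and looks up the delay for a path. The shlex split mirrors the quoting rule. fnmatch.fnmatchcase approximates the C call fnmatch(pattern, path, 0): '*' also matches '/' and matching is case-sensitive. One difference is that Python does not honour backslash escapes. The two function names are hypothetical. Returning the first matching pattern is an assumption, because the text above does not say which pattern wins when several match.

```python
import fnmatch
import shlex


def parse_delay_patterns(value):
    """Split a mondelaypatterns value into (pattern, seconds) pairs.

    shlex handles double-quoted entries that contain white space.
    """
    pairs = []
    for entry in shlex.split(value):
        # Split on the last colon so that a pattern may contain ':'
        pattern, _, seconds = entry.rpartition(":")
        pairs.append((pattern, int(seconds)))
    return pairs


def reindex_delay(path, pairs):
    """Return the delay of the first matching pattern (assumed), else None."""
    for pattern, seconds in pairs:
        if fnmatch.fnmatchcase(path, pattern):
            return seconds
    return None


pairs = parse_delay_patterns('*.log:20 "*with spaces.*:30"')
print(pairs)  # [('*.log', 20), ('*with spaces.*', 30)]
print(reindex_delay("/var/log/app.log", pairs))  # 20
```

Because the pattern is matched against the full path, a pattern like `*.log` also catches log files in any subdirectory.
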
idxniceprio = int
"nice" process priority for the indexing processes. Default: 19
(lowest). Appeared with 1.26.5; prior versions were fixed at 19.

monioniceclass = int
ionice class for the indexing process. Despite the misleading
name, and on platforms where this is supported, this affects all
indexing processes, not only the real time/monitoring ones. The
default value is 3 (use the lowest, "Idle", priority).

monioniceclassdata = string
ionice class level parameter, if the class supports it. The
default is empty, as the default "Idle" class has no levels.

autodiacsens = bool
Auto-trigger diacritics sensitivity (raw index only). If the index
is not stripped (see indexStripChars), decide whether to
automatically trigger diacritics sensitivity when the search term
has accented characters (not in unac_except_trans). Otherwise you
need to use the query language and the "D" modifier to specify
diacritics sensitivity. Default is no.

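The test "the search term has accented characters" can be approximated in Python as below. A character counts as accented when its canonical (NFD) decomposition contains a combining mark. This is not necessarily how Recoll's unac-based code decides. The except_chars argument is a hypothetical stand-in for the characters covered by unac_except_trans.

```python
import unicodedata


def has_diacritics(term, except_chars=frozenset()):
    """Return True if term contains an accented character (sketch).

    Characters in except_chars (standing in for unac_except_trans
    entries) are ignored.
    """
    for ch in term:
        if ch in except_chars:
            continue
        # "é" decomposes to "e" + U+0301 COMBINING ACUTE ACCENT
        if any(unicodedata.combining(c) for c in unicodedata.normalize("NFD", ch)):
            return True
    return False


print(has_diacritics("resume"), has_diacritics("résumé"))  # False True
```

With autodiacsens set, a search for "résumé" would then be diacritics-sensitive, while "resume" would still match both spellings.
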
autocasesens = bool
Auto-trigger case sensitivity (raw index only). If the index is
not stripped (see indexStripChars), decide whether to
automatically trigger character case sensitivity when the search
term has upper-case characters in any but the first position.
Otherwise you need to use the query language and the "C" modifier
to specify character-case sensitivity. Default is yes.

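The trigger rule described above (upper-case characters in any but the first position) reduces to a one-line check. The Python sketch below shows it. The function name is hypothetical, and the sketch ignores other query-language details.

```python
def triggers_case_sensitivity(term):
    """True if term has an upper-case character after the first position.

    An initial capital alone, as in "Recoll", does not trigger it.
    """
    return any(ch.isupper() for ch in term[1:])


for t in ("recoll", "Recoll", "iPhone", "NASA"):
    print(t, triggers_case_sensitivity(t))
```

So "Recoll" stays case-insensitive, while "iPhone" and "NASA" would only match terms with the same capitalization.
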
maxTermExpand = int
Maximum query expansion count for a single term (e.g. when using
wildcards). This only affects queries, not indexing. Older
versions did not limit this at all (except for file names, where
the limit was too low at 1000), but that is unreasonable with a
big index. Default: 10000.

maxXapianClauses = int
Maximum number of clauses added to a single Xapian query. This
only affects queries, not indexing. In some cases, the result of
term expansion can be multiplicative, and we want to avoid eating
all the memory. Default: 50000.

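The Python sketch below shows how the two limits relate. expand_term caps the wildcard expansion of one term, which is the role of maxTermExpand. clause_count takes the product of the expansion sizes as an assumed worst case for the "multiplicative" situation mentioned above. Whether a given query really grows this way depends on how Recoll builds it. The vocabulary list and both function names are hypothetical, and the real expansion runs against the index term list, not a Python list.

```python
import fnmatch
import math


def expand_term(pattern, vocabulary, max_expand=10000):
    """Expand a wildcard term against a term list, stopping at max_expand."""
    matches = []
    for term in vocabulary:
        if fnmatch.fnmatchcase(term, pattern):
            matches.append(term)
            if len(matches) >= max_expand:
                break
    return matches


def clause_count(expansion_sizes):
    """Assumed worst case: every combination becomes its own clause."""
    return math.prod(expansion_sizes)


vocabulary = ["index", "indexer", "indexing", "indices", "query", "queries"]
print(expand_term("index*", vocabulary))  # ['index', 'indexer', 'indexing']
print(expand_term("index*", vocabulary, max_expand=2))  # ['index', 'indexer']
# Three wildcard terms expanding to 100 terms each already exceed 50000
print(clause_count([100, 100, 100]) > 50000)  # True
```

This is why a per-term cap alone is not enough, and a separate per-query clause limit is also needed.
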
snippetMaxPosWalk = int
Maximum number of positions walked while populating a snippet for
the result list. The default of 1,000,000 may be insufficient for
very big documents; the consequence would be snippets with
possibly meaning-altering missing words.

pdfocr = bool
Attempt OCR of PDF files with no text content. This can be defined
in subdirectories. The default is off because OCR is very slow.

pdfattach = bool
Enable PDF attachment extraction by executing pdftk (if
available). This is disabled by default because it slows down PDF
indexing somewhat, even if no attachment is ever found.

pdfextrameta = string
Extract text from selected XMP metadata tags. This is a
space-separated list of qualified XMP tag names. Each element can
also include a translation to a Recoll field name, separated by a
'|' character. If the second element is absent, the tag name is
used as the Recoll field name. You will also need to add
specifications to the "fields" file to direct processing of the
extracted data.

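The value format described above can be checked with the Python parser below. It is a sketch, not Recoll's own parser. The tag names in the example are illustrative, and parse_extrameta is a hypothetical name.

```python
def parse_extrameta(value):
    """Split a pdfextrameta value into (xmp_tag, recoll_field) pairs.

    An element may be 'tag|field'; without a field part, the tag name
    itself is used as the Recoll field name.
    """
    pairs = []
    for element in value.split():
        tag, _, field = element.partition("|")
        pairs.append((tag, field or tag))
    return pairs


print(parse_extrameta("bibtex:location|location bibtex:pages"))
# [('bibtex:location', 'location'), ('bibtex:pages', 'bibtex:pages')]
```

Each resulting field name still needs a matching entry in the "fields" file, as noted above.
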
pdfextrametafix = fn
Name of the XMP field editing script. This defines the name of a
script to be loaded for editing XMP field values. The script
should define a 'MetaFixer' class with a metafix() method, which
will be called with the qualified tag name and value of each
selected field, for editing or erasing. A new instance is created
for each document, so that the object can keep state, e.g. for
eliminating duplicate values.

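A minimal Python script of this kind could look like the sketch below. It assumes that the string returned by metafix() replaces the field value and that an empty string erases it. Check the sample script shipped with Recoll before relying on this. The white space normalization and duplicate removal are only examples of edits such a script might make.

```python
class MetaFixer(object):
    """Sketch of an XMP field editing class for pdfextrametafix.

    Assumption: metafix() returns the value to index, and an empty
    string erases the field. One instance exists per document, so
    self.seen only tracks values within the current document.
    """

    def __init__(self):
        self.seen = set()

    def metafix(self, nm, txt):
        # Collapse runs of white space inside the value
        txt = " ".join(txt.split())
        # Erase exact duplicates of a value already seen for this tag
        if (nm, txt) in self.seen:
            return ""
        self.seen.add((nm, txt))
        return txt


fixer = MetaFixer()
print(fixer.metafix("dc:creator", "Jane  Doe"))  # Jane Doe
print(repr(fixer.metafix("dc:creator", "Jane Doe")))  # ''
```

Keeping the duplicate set on the instance relies on the one-instance-per-document rule stated above.
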
ocrprogs = string
OCR modules to try. The top OCR script will try to load the
corresponding modules in order and use the first one which reports
being capable of performing OCR on the input file. Modules for
tesseract (tesseract) and ABBYY FineReader (abbyy) are present in
the standard distribution. For compatibility with the previous
version, if this is not defined at all, the default value is
"tesseract". Use an explicit empty value if needed. A value of
"abbyy tesseract" will try everything.

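The selection rule, including the difference between an undefined and an explicitly empty value, is sketched in Python below. The capability predicates stand in for whatever check each real module performs. Their interface and the function name choose_ocr_module are hypothetical.

```python
def choose_ocr_module(ocrprogs, capable):
    """Pick the first OCR module able to process the input (sketch).

    ocrprogs is the raw parameter value, or None when it is not defined
    at all. capable maps a module name to a hypothetical predicate that
    reports whether that module can OCR the current file.
    """
    if ocrprogs is None:
        # Compatibility default when the parameter is absent
        ocrprogs = "tesseract"
    for name in ocrprogs.split():
        check = capable.get(name)
        if check is not None and check():
            return name
    # Explicit empty value, or no capable module: no OCR
    return None


capable = {"abbyy": lambda: False, "tesseract": lambda: True}
print(choose_ocr_module("abbyy tesseract", capable))  # tesseract
print(choose_ocr_module(None, capable))  # tesseract
print(choose_ocr_module("", capable))  # None
```
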
ocrcachedir = dfn
Location for caching OCR data. If this is empty or undefined, the
default is to store the cached OCR data under
$RECOLL_CONFDIR/ocrcache.

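The default can be expressed as a small Python resolution function. Here confdir stands for the value of $RECOLL_CONFDIR and is passed in explicitly as an assumption. Expanding a leading ~ follows the general rule for file names in this file. The function name is hypothetical.

```python
import os


def ocr_cache_dir(value, confdir):
    """Resolve ocrcachedir: empty or undefined means confdir/ocrcache."""
    if not value:
        return os.path.join(confdir, "ocrcache")
    # A leading ~ is expanded, as for other file name parameters
    return os.path.expanduser(value)


print(ocr_cache_dir("", "/home/me/.recoll"))  # on POSIX: /home/me/.recoll/ocrcache
```
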
tesseractlang = string
Language to assume for tesseract OCR. Important for improving the
OCR accuracy. This can also be set through the contents of a file
in the currently processed directory. See the rclocrtesseract.py
script. Example values: eng, fra... See the tesseract
documentation.

tesseractcmd = fn
Path for the tesseract command. Do not quote. This is mostly
useful on Windows, or for specifying a non-default tesseract
command. Example, on Windows:
tesseractcmd = C:/Program Files (x86)/Tesseract-OCR/tesseract.exe

abbyylang = string
Language to assume for abbyy OCR. Important for improving the OCR
accuracy. This can also be set through the contents of a file in
the currently processed directory. See the rclocrabbyy.py script.
Typical values: English, French... See the ABBYY documentation.

abbyycmd = fn
Path for the abbyy command. The ABBYY directory is usually not in
the PATH, so you should set this.

mhmboxquirks = string
Enable Thunderbird/Mozilla-SeaMonkey mbox format quirks. Set this
for the directory where the email mbox files are stored.

recollindex(1) recoll(1)

14 November 2012 RECOLL.CONF(5)