1 RECOLL.CONF(5) File Formats Manual RECOLL.CONF(5)
2
3
4
6 recoll.conf - main personal configuration file for Recoll
7
9 This file defines the index configuration for the Recoll full-text
10 search system.
11
12 The system-wide configuration file is normally located inside
13 /usr/[local]/share/recoll/examples. Any parameter set in the common
14 file may be overridden by setting it in the personal configuration
15 file, by default: $HOME/.recoll/recoll.conf
16
17 Please note that while I try to keep this manual page reasonably up to
18 date, it will frequently lag the current state of the software. The best
19 sources of information about the configuration are the comments in the
20 system-wide configuration file and the user manual, which you can access
21 from the recoll GUI help menu or on the recoll web site.
22
23
24 A short extract of the file might look as follows:
25
26 # Space-separated list of directories to index.
27 topdirs = ~/docs /usr/share/doc
28
29 [~/somedirectory-with-utf8-txt-files]
30 defaultcharset = utf-8
31
32
33 There are three kinds of lines:
34
35 · Comment or empty
36
37 · Parameter assignment
38
39 · Section definition
40
41 Empty lines or lines beginning with # are ignored.
42
43 Assignment lines are in the form 'name = value'.
44
45 Section lines allow redefining a parameter for a directory subtree.
46 Some of the parameters used for indexing are looked up hierarchically
47 from the more to the less specific. Not all parameters can be
48 meaningfully redefined; this is specified for each in the next section.
49
50 The tilde character (~) is expanded in file names to the name of the
51 user's home directory.
52
53 Where values are lists, white space is used for separation, and ele‐
54 ments with embedded spaces can be quoted with double-quotes.
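
As an illustration of the three kinds of lines and of list quoting, a fragment might look like the following sketch. The directory names are hypothetical, and the comments only restate the rules given above.

```ini
# A comment line: ignored, like empty lines.

# Assignment at the top level. The second list element contains a
# space, so it is enclosed in double quotes.
topdirs = ~/docs "~/my documents"

# Section line: the assignment below applies to this subtree only.
[~/docs/legacy]
defaultcharset = iso-8859-1
```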
55
57 topdirs = string
58 Space-separated list of files or directories to recursively
59 index. Defaults to ~ (indexes $HOME). You can use symbolic links
60 in the list; they will be followed, independently of the value
61 of the followLinks variable.
62
63 monitordirs = string
64 Space-separated list of files or directories to monitor for
65 updates. When running the real-time indexer, this allows moni‐
66 toring only a subset of the whole indexed area. The elements
67 must be included in the tree defined by the 'topdirs' members.
68
69 skippedNames = string
70 Files and directories which should be ignored. White space sep‐
71 arated list of wildcard patterns (simple ones, not paths, must
72 contain no / ), which will be tested against file and directory
73 names. The list in the default configuration does not exclude
74 hidden directories (names beginning with a dot), which means
75 that it may index quite a few things that you do not want. On
76 the other hand, email user agents like Thunderbird usually store
77 messages in hidden directories, and you probably want this
78 indexed. One possible solution is to have ".*" in "skipped‐
79 Names", and add things like "~/.thunderbird" "~/.evolution" to
80 "topdirs". Not even the file names are indexed for patterns in
81 this list, see the "noContentSuffixes" variable for an alterna‐
82 tive approach which indexes the file names. Can be redefined for
83 any subtree.
84
85 skippedNames- = string
86 List of patterns to remove from the default skippedNames
87 list.
88
89 skippedNames+ = string
90 List of patterns to add to the default skippedNames list.
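
The approach suggested under skippedNames (skip hidden entries, but still index mail stores) could be sketched as follows. The extra patterns are illustrative, and the skippedNames- line assumes that the default list actually contains the pattern being removed.

```ini
# Add patterns to the default skippedNames list: all hidden files and
# directories, plus two hypothetical backup name patterns.
skippedNames+ = .* *.bak *.orig

# Remove a pattern from the default list (assumed to be present there).
skippedNames- = *~

# Hidden mail directories are listed explicitly so that they are
# still indexed.
topdirs = ~ ~/.thunderbird ~/.evolution
```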
91
92 onlyNames = string
93 Regular file name filter patterns. If this is set, only the file
94 names not in skippedNames and matching one of the patterns will
95 be considered for indexing. Can be redefined per subtree. Does
96 not apply to directories.
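
A minimal sketch of a per-subtree onlyNames setting follows. The directory and the patterns are hypothetical, and wildcard patterns are assumed, as for skippedNames.

```ini
# Hypothetical subtree where only PDF and text files are considered.
# Directories are not affected, so the whole subtree is still walked.
[~/papers]
onlyNames = *.pdf *.txt
```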
97
98 noContentSuffixes = string
99 List of name endings (not necessarily dot-separated suffixes)
100 for which we don't try MIME type identification, and don't
101 uncompress or index content. Only the names will be indexed.
102 This complements the now obsoleted recoll_noindex list from the
103 mimemap file, which will go away in a future release (the move
104 from mimemap to recoll.conf allows editing the list through the
105 GUI). This is different from skippedNames because these are name
106 ending matches only (not wildcard patterns), and the file name
107 itself gets indexed normally. This can be redefined for subdi‐
108 rectories.
109
110 noContentSuffixes- = string
111 List of name endings to remove from the default noContentSuf‐
112 fixes list.
113
114 noContentSuffixes+ = string
115 List of name endings to add to the default noContentSuffixes
116 list.
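
For example, the following sketch adds two name endings to the default list, so that only the names of matching files are indexed. The endings are illustrative and may or may not already be in the default list.

```ini
# No MIME type identification, decompression or content indexing for
# files whose names end this way; the file names are still indexed.
noContentSuffixes+ = .iso .vmdk
```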
117
118 skippedPaths = string
119 Absolute paths we should not go into. Space-separated list of
120 wildcard expressions for absolute filesystem paths. Must be
121 defined at the top level of the configuration file, not in a
122 subsection. Can contain files and directories. The database and
123 configuration directories will automatically be added. The
124 expressions are matched using 'fnmatch(3)' with the FNM_PATHNAME
125 flag set by default. This means that '/' characters must be
126 matched explicitly. You can set 'skippedPathsFnmPathname' to 0
127 to disable the use of FNM_PATHNAME (meaning that '/*/dir3' will
128 match '/dir1/dir2/dir3'). The default value contains the usual
129 mount point for removable media to remind you that it is a bad
130 idea to have Recoll work on these (esp. with the monitor: media
131 gets indexed on mount, all data gets erased on unmount). Explic‐
132 itly adding '/media/xxx' to the 'topdirs' variable will override
133 this.
134
135 skippedPathsFnmPathname = bool
136 Set to 0 to override use of FNM_PATHNAME for matching skipped
137 paths.
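
The effect of FNM_PATHNAME on skippedPaths can be illustrated with a small sketch. The paths are hypothetical; note also that assigning skippedPaths is assumed here to replace the default value rather than extend it.

```ini
# Top level only: skippedPaths cannot be set inside a section.
# With FNM_PATHNAME in effect (the default), '*' does not match '/',
# so this matches /home/me/cache but not /home/me/projects/cache.
skippedPaths = /home/*/cache

# With FNM_PATHNAME disabled, '*' may also match '/' characters, and
# the same pattern would match /home/me/projects/cache as well.
#skippedPathsFnmPathname = 0
```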
138
139 nowalkfn = string
140 File name which will cause its parent directory to be skipped.
141 Any directory containing a file with this name will be skipped
142 as if it was part of the skippedPaths list. Ex: .recoll-noindex
143
144 daemSkippedPaths = string
145 skippedPaths equivalent specific to real time indexing. This
146 enables having parts of the tree which are initially indexed but
147 not monitored. If daemSkippedPaths is not set, the daemon uses
148 skippedPaths.
149
150 zipUseSkippedNames = bool
151 Use skippedNames inside Zip archives. Fetched directly by the
152 rclzip handler. Skip the patterns defined by skippedNames inside
153 Zip archives. Can be redefined for subdirectories. See
154 https://www.lesbonscomptes.com/recoll/faqsandhowtos/FilteringOutZipArchiveMembers.html
156
157
158 zipSkippedNames = string
159 Space-separated list of wildcard expressions for names that
160 should be ignored inside zip archives. This is used directly by
161 the zip handler. If zipUseSkippedNames is not set, zipSkipped‐
162 Names defines the patterns to be skipped inside archives. If
163 zipUseSkippedNames is set, the two lists are concatenated and
164 used. Can be redefined for subdirectories. See
165 https://www.lesbonscomptes.com/recoll/faqsandhowtos/FilteringOutZipArchiveMembers.html
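
The two Zip-related parameters might be combined as in the sketch below. The subtree and the member name patterns are hypothetical.

```ini
# Hypothetical subtree holding software archives.
[~/archives]
# Apply the skippedNames patterns to Zip archive members...
zipUseSkippedNames = 1
# ...and skip these member names too. Because zipUseSkippedNames is
# set, the two lists are concatenated.
zipSkippedNames = *.class *.o
```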
167
168
169 followLinks = bool
170 Follow symbolic links during indexing. The default is to ignore
171 symbolic links to avoid multiple indexing of linked files. No
172 effort is made to avoid duplication when this option is set to
173 true. This option can be set individually for each of the
174 'topdirs' members by using sections. It can not be changed below
175 the 'topdirs' level. Links in the 'topdirs' list itself are
176 always followed.
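
As a sketch of the per-member setting described above, with hypothetical directories:

```ini
topdirs = ~/docs ~/shared

# Follow symbolic links under this 'topdirs' member only. The value
# cannot be changed again for directories below it.
[~/shared]
followLinks = 1
```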
177
178 indexedmimetypes = string
179 Restrictive list of indexed mime types. Normally not set (in
180 which case all supported types are indexed). If it is set, only
181 the types from the list will have their contents indexed. The
182 names will be indexed anyway if indexallfilenames is set
183 (default). MIME type names should be taken from the mimemap file
184 (the values may be different from xdg-mime or file -i output in
185 some cases). Can be redefined for subtrees.
186
187 excludedmimetypes = string
188 List of excluded MIME types. Lets you exclude some types from
189 indexing. MIME type names should be taken from the mimemap file
190 (the values may be different from xdg-mime or file -i output in
191 some cases). Can be redefined for subtrees.
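
The two MIME type lists might be used as in the following sketch. The type names and the subtree are illustrative; the exact names should be checked against the mimemap file.

```ini
# Do not index the contents of these types anywhere.
excludedmimetypes = image/jpeg video/mp4

# Hypothetical subtree where only two types have their contents
# indexed. Other file names are still indexed if indexallfilenames
# is set.
[~/docs/reports]
indexedmimetypes = application/pdf text/plain
```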
192
193 nomd5types = string
194 Don't compute md5 for these types. md5 checksums are used only
195 for deduplicating results, and can be very expensive to compute
196 on multimedia or other big files. This list lets you turn off
197 md5 computation for selected types. It is global (no redefini‐
198 tion for subtrees). At the moment, it only has an effect for
199 external handlers (exec and execm). The file types can be speci‐
200 fied by listing either MIME types (e.g. audio/mpeg) or handler
201 names (e.g. rclaudio).
202
203 compressedfilemaxkbs = int
204 Size limit for compressed files. We need to decompress these in
205 a temporary directory for identification, which can be wasteful
206 in some cases. Limit the waste. Negative means no limit. 0
207 results in no processing of any compressed file. Default 50 MB.
208
209 textfilemaxmbs = int
210 Size limit for text files. Mostly for skipping monster logs.
211 Default 20 MB.
212
213 indexallfilenames = bool
214 Index the file names of unprocessed files. Index the names of
215 files the contents of which we don't index because of an
216 excluded or unsupported MIME type.
217
218 usesystemfilecommand = bool
219 Use a system command for file MIME type guessing as a final step
220 in file type identification. This is generally useful, but will
221 usually cause the indexing of many bogus 'text' files. See 'sys‐
222 temfilecommand' for the command used.
223
224 systemfilecommand = string
225 Command used to guess MIME types if the internal methods fail.
226 This should be a "file -i" workalike. The file path will be
227 added as a last parameter to the command line. "xdg-mime" works
228 better than the traditional "file" command, and is now the
229 configured default (with a hard-coded fallback to "file").
230
231 processwebqueue = bool
232 Decide if we process the Web queue. The queue is a directory
233 where the Recoll Web browser plugins create the copies of vis‐
234 ited pages.
235
236 textfilepagekbs = int
237 Page size for text files. If this is set, text/plain files will
238 be divided into documents of approximately this size. Will
239 reduce memory usage at index time and help with loading data in
240 the preview window at query time. Particularly useful with very
241 big files, such as application or system logs. Also see
242 textfilemaxmbs and compressedfilemaxkbs.
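
A possible combination of the text file size parameters is sketched below. The values are illustrative, not recommendations.

```ini
# Skip text files larger than 50 MB altogether.
textfilemaxmbs = 50
# Divide the remaining text/plain files into documents of
# approximately 1000 KB each.
textfilepagekbs = 1000
```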
243
244 membermaxkbs = int
245 Size limit for archive members. This is passed to the filters in
246 the environment as RECOLL_FILTER_MAXMEMBERKB.
247
248 indexStripChars = bool
249 Decide if we store character case and diacritics in the index.
250 If we do, searches sensitive to case and diacritics can be per‐
251 formed, but the index will be bigger, and some marginal weird‐
252 ness may sometimes occur. The default is a stripped index. When
253 using multiple indexes for a search, this parameter must be
254 defined identically for all. Changing the value implies an index
255 reset.
256
257 indexStoreDocText = bool
258 Decide if we store the documents' text content in the index.
259 Storing the text allows extracting snippets from it at query
260 time, instead of building them from index position data. Newer
261 Xapian index formats have rendered our use of positions list
262 unacceptably slow in some cases. The last Xapian index format
263 with good performance for the old method is Chert, which is
264 default for 1.2, still supported but not default in 1.4 and will
265 be dropped in 1.6. The stored document text is translated from
266 its original format to UTF-8 plain text, but not stripped of
267 upper-case, diacritics, or punctuation signs. Storing it
268 increases the index size by 10-20% typically, but also allows
269 for nicer snippets, so it may be worth enabling it even if not
270 strictly needed for performance if you can afford the space.
271 The variable only has an effect when creating an index, meaning
272 that the xapiandb directory must not exist yet. Its exact effect
273 depends on the Xapian version. For Xapian 1.4, if the variable
274 is set to 0, the Chert format will be used, and the text will
275 not be stored. If the variable is 1, Glass will be used, and the
276 text stored. For Xapian 1.2, and for versions 1.5 and
277 newer, the index format is always the default, but the variable
278 controls if the text is stored or not, and the abstract genera‐
279 tion method. With Xapian 1.5 and later, and the variable set to
280 0, abstract generation may be very slow, but this setting may
281 still be useful to save space if you do not use abstract genera‐
282 tion at all.
283
284
285 nonumbers = bool
286 Decides if terms will be generated for numbers. For example
287 "123", "1.5e6", "192.168.1.4", would not be indexed if nonumbers
288 is set ("value123" would still be). Numbers are often quite
289 interesting to search for, and this should probably not be set
290 except for special situations, i.e., scientific documents with
291 huge amounts of numbers in them, where setting nonumbers will
292 reduce the index size. This can only be set for a whole index,
293 not for a subtree.
294
295 dehyphenate = bool
296 Determines if we index 'coworker' also when the input is 'co-
297 worker'. This is new in version 1.22, and on by default. Setting
298 the variable to off allows restoring the previous behaviour.
299
300 backslashasletter = bool
301 Process backslash as normal letter. This may make sense for peo‐
302 ple wanting to index TeX commands as such but is not of much
303 general use.
304
305 underscoreasletter = bool
306 Process underscore as normal letter. This makes sense in so many
307 cases that one wonders if it should not be the default.
308
309 maxtermlength = int
310 Maximum term length. Words longer than this will be discarded.
311 The default is 40 and used to be hard-coded, but it can now be
312 adjusted. You need an index reset if you change the value.
313
314 nocjk = bool
315 Decides if specific East Asian (Chinese Korean Japanese) charac‐
316 ters/word splitting is turned off. This will save a small amount
317 of CPU if you have no CJK documents. If your document base does
318 include such text but you are not interested in searching it,
319 setting nocjk may be a significant time and space saver.
320
321 cjkngramlen = int
322 This lets you adjust the size of n-grams used for indexing CJK
323 text. The default value of 2 is probably appropriate in most
324 cases. A value of 3 would allow more precision and efficiency on
325 longer words, but the index will be approximately twice as
326 large.
327
328 indexstemminglanguages = string
329 Languages for which to create stemming expansion data. Stemmer
330 names can be found by executing 'recollindex -l', or this can
331 also be set from a list in the GUI. The values are full language
332 names, e.g. english, french...
333
334 defaultcharset = string
335 Default character set. This is used for files which do not con‐
336 tain a character set definition (e.g.: text/plain). Values found
337 inside files, e.g. a 'charset' tag in HTML documents, will over‐
338 ride it. If this is not set, the default character set is the
339 one defined by the NLS environment ($LC_ALL, $LC_CTYPE, $LANG),
340 or ultimately iso-8859-1 (cp-1252 in fact). If for some reason
341 you want a general default which does not match your LANG and is
342 not 8859-1, use this variable. This can be redefined for any
343 sub-directory.
344
345 unac_except_trans = string
346 A list of characters, encoded in UTF-8, which should be handled
347 specially when converting text to unaccented lowercase. For
348 example, in Swedish, the letter a with diaeresis has full alpha‐
349 bet citizenship and should not be turned into an a. Each ele‐
350 ment in the space-separated list has the special character as
351 first element and the translation following. The handling of
352 both the lowercase and upper-case versions of a character should
353 be specified, as membership in the list will turn off both
354 standard accent and case processing. The value is global and
355 affects both indexing and querying. Examples:
356 Swedish:
357 unac_except_trans = ää Ää öö Öö üü Üü ßss œoe Œoe æae Æae ﬀff
358 ﬁfi ﬂfl åå Åå
359 German:
360 unac_except_trans = ää Ää öö Öö üü Üü ßss œoe Œoe æae Æae ﬀff
361 ﬁfi ﬂfl
362 French: you probably want to decompose oe and ae and nobody
363 would type a German ß
364 unac_except_trans = ßss œoe Œoe æae Æae ﬀff ﬁfi ﬂfl
365 The default for all until someone protests follows. These decom‐
366 positions are not performed by unac, but it is unlikely that
367 someone would type the composed forms in a search.
368 unac_except_trans = ßss œoe Œoe æae Æae ﬀff ﬁfi ﬂfl
369
370 maildefcharset = string
371 Overrides the default character set for email messages which
372 don't specify one. This is mainly useful for readpst (libpst)
373 dumps, which are utf-8 but do not say so.
374
375 localfields = string
376 Set fields on all files (usually of a specific fs area). Syntax
377 is the usual: name = value ; attr1 = val1 ; [...]. The value is
378 empty, so this needs an initial semi-colon. This is useful, e.g.,
379 for setting the rclaptg field for application selection inside
380 mimeview.
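
Following the syntax described above, a localfields setting for a file system area might look like the sketch below. The directory and the rclaptg value are hypothetical.

```ini
# Hypothetical area holding mail folders. The initial semi-colon
# stands for the empty value part.
[~/mail/lists]
localfields = ; rclaptg = gnus
```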
381
382 testmodifusemtime = bool
383 Use mtime instead of ctime to test if a file has been modified.
384 The time is used in addition to the size, which is always used.
385 Setting this can reduce re-indexing on systems where extended
386 attributes are used (by some other application), but not
387 indexed, because changing extended attributes only affects
388 ctime. Notes: - This may prevent detection of change in some
389 marginal file rename cases (the target would need to have the
390 same size and mtime). - You should probably also set noxattr‐
391 fields to 1 in this case, except if you still prefer to perform
392 xattr indexing, for example if the local file update pattern
393 makes it of value (as in general, there is a risk for pure
394 extended attributes updates without file modification to go
395 undetected). Perform a full index reset after changing this.
396
397
398 noxattrfields = bool
399 Disable extended attributes conversion to metadata fields. This
400 probably needs to be set if testmodifusemtime is set.
401
402 metadatacmds = string
403 Define commands to gather external metadata, e.g. tmsu tags.
404 There can be several entries, separated by semi-colons, each
405 defining which field name the data goes into and the command to
406 use. Don't forget the initial semi-colon. All the field names
407 must be different. You can use aliases in the "fields" file if
408 necessary. As a not too pretty hack conceded to convenience,
409 any field name beginning with "rclmulti" will be taken as an
410 indication that the command returns multiple field values inside
411 a text blob formatted as a recoll configuration file ("fieldname
412 = fieldvalue" lines). The rclmultixx name will be ignored, and
413 field names and values will be parsed from the data. Example:
414 metadatacmds = ; tags = tmsu tags %f; rclmulti1 = cmdOutputsConf
415 %f
416
417
418 cachedir = dfn
419 Top directory for Recoll data. Recoll data directories are nor‐
420 mally located relative to the configuration directory (e.g.
421 ~/.recoll/xapiandb, ~/.recoll/mboxcache). If 'cachedir' is set,
422 the directories are stored under the specified value instead
423 (e.g. if cachedir is ~/.cache/recoll, the default dbdir would be
424 ~/.cache/recoll/xapiandb). This affects dbdir, webcachedir,
425 mboxcachedir, aspellDicDir, which can still be individually
426 specified to override cachedir. Note that if you have multiple
427 configurations, each must have a different cachedir, there is no
428 automatic computation of a subpath under cachedir.
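
The example from the text could be written as follows. The mboxcachedir path is hypothetical and only shows that an individual directory can still be set explicitly.

```ini
# Store Recoll data under ~/.cache/recoll instead of the configuration
# directory. The default dbdir then becomes ~/.cache/recoll/xapiandb.
cachedir = ~/.cache/recoll

# An individually specified directory overrides cachedir.
mboxcachedir = /var/tmp/recoll-mboxcache
```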
429
430 maxfsoccuppc = int
431 Maximum file system occupation over which we stop indexing. The
432 value is a percentage, corresponding to what the "Capacity" df
433 output column shows. The default value is 0, meaning no check‐
434 ing.
435
436 dbdir = dfn
437 Xapian database directory location. This will be created on
438 first indexing. If the value is not an absolute path, it will be
439 interpreted as relative to cachedir if set, or the configuration
440 directory (-c argument or $RECOLL_CONFDIR). If nothing is spec‐
441 ified, the default is then ~/.recoll/xapiandb/
442
443 idxstatusfile = fn
444 Name of the scratch file where the indexer process updates its
445 status. Default: idxstatus.txt inside the configuration direc‐
446 tory.
447
448 mboxcachedir = dfn
449 Directory location for storing mbox message offsets cache files.
450 This is normally 'mboxcache' under cachedir if set, or else
451 under the configuration directory, but it may be useful to share
452 a directory between different configurations.
453
454 mboxcacheminmbs = int
455 Minimum mbox file size over which we cache the offsets. There is
456 really no sense in caching offsets for small files. The default
457 is 5 MB.
458
459 mboxmaxmsgmbs = int
460 Maximum mbox member message size in megabytes. Size over which
461 we assume that the mbox format is bad or we misinterpreted it,
462 at which point we just stop processing the file.
463
464 webcachedir = dfn
465 Directory where we store the archived web pages. This is only
466 used by the web history indexing code. Default: cachedir/webcache
467 if cachedir is set, else $RECOLL_CONFDIR/webcache
468
469 webcachemaxmbs = int
470 Maximum size in MB of the Web archive. This is only used by the
471 web history indexing code. Default: 40 MB. Reducing the size
472 will not physically truncate the file.
473
474 webqueuedir = fn
475 The path to the Web indexing queue. This used to be hard-coded
476 in the old plugin as ~/.recollweb/ToIndex so there would be no
477 need or possibility to change it, but the WebExtensions plugin
478 now downloads the files to the user Downloads directory, and a
479 script moves them to webqueuedir. The script reads this value
480 from the config so it has become possible to change it.
481
482 webdownloadsdir = fn
483 The path to browser downloads directory. This is where the new
484 browser add-on extension has to create the files. They are then
485 moved by a script to webqueuedir.
486
487 aspellDicDir = dfn
488 Aspell dictionary storage directory location. The aspell dictio‐
489 nary (aspdict.(lang).rws) is normally stored in the directory
490 specified by cachedir if set, or under the configuration direc‐
491 tory.
492
493 filtersdir = dfn
494 Directory location for executable input handlers. If RECOLL_FIL‐
495 TERSDIR is set in the environment, we use it instead. Defaults
496 to $prefix/share/recoll/filters. Can be redefined for subdirec‐
497 tories.
498
499 iconsdir = dfn
500 Directory location for icons. The only reason to change this
501 would be if you want to change the icons displayed in the result
502 list. Defaults to $prefix/share/recoll/images
503
504 idxflushmb = int
505 Threshold (megabytes of new data) where we flush from memory to
506 disk index. Setting this allows some control over memory usage
507 by the indexer process. A value of 0 means no explicit flushing,
508 which lets Xapian perform its own thing, meaning flushing every
509 $XAPIAN_FLUSH_THRESHOLD documents created, modified or deleted:
510 as memory usage depends on average document size, not only
511 document count, the Xapian approach is not very useful, and you
512 should let Recoll manage the flushes. The program compiled value
513 is 0. The configured default value (from this file) is now 50
514 MB, and should be ok in many cases. You can set it as low as 10
515 to conserve memory, but if you are looking for maximum speed,
516 you may want to experiment with values between 20 and 200. In my
517 experience, values beyond this are always counterproductive. If
518 you find otherwise, please drop me a note.
519
520 filtermaxseconds = int
521 Maximum external filter execution time in seconds. Default 1200
522 (20 minutes). Set to 0 for no limit. This is mainly to avoid infinite
523 loops in postscript files (loop.ps).
524
525 filtermaxmbytes = int
526 Maximum virtual memory space for filter processes (setr‐
527 limit(RLIMIT_AS)), in megabytes. Note that this includes any
528 mapped libs (there is no reliable Linux way to limit the data
529 space only), so we need to be a bit generous here. Anything over
530 2000 will be ignored on 32-bit machines. The previous default
531 value of 2000 would prevent java pdftk from working when executed
532 from Python rclpdf.py.
533
534 thrQSizes = string
535 Stage input queues configuration. There are three internal
536 queues in the indexing pipeline stages (file data extraction,
537 terms generation, index update). This parameter defines the
538 queue depths for each stage (three integer values). If a value
539 of -1 is given for a given stage, no queue is used, and the
540 thread will go on performing the next stage. In practice, deep
541 queues have not been shown to increase performance. Default: a
542 value of 0 for the first queue tells Recoll to perform autocon‐
543 figuration based on the detected number of CPUs (no need for the
544 two other values in this case). Use thrQSizes = -1 -1 -1 to
545 disable multithreading entirely.
546
547 thrTCounts = string
548 Number of threads used for each indexing stage. The three stages
549 are: file data extraction, terms generation, index update. The
550 use of the counts is also controlled by some special values in
551 thrQSizes: if the first queue depth is 0, all counts are ignored
552 (autoconfigured); if a value of -1 is used for a queue depth,
553 the corresponding thread count is ignored. It makes no sense to
554 use a value other than 1 for the last stage because updating the
555 Xapian index is necessarily single-threaded (and protected by a
556 mutex).
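
The interaction between thrQSizes and thrTCounts can be sketched as three alternative setups. The depths and counts in the manual variant are illustrative values only.

```ini
# Autoconfiguration based on the detected number of CPUs (default).
# thrTCounts is ignored in this case.
thrQSizes = 0

# Manual setup: a queue depth of 2 for each stage, with 4 threads for
# file data extraction, 2 for terms generation, and 1 for the index
# update, which is necessarily single-threaded.
#thrQSizes = 2 2 2
#thrTCounts = 4 2 1

# No queues at all: multithreading is disabled entirely.
#thrQSizes = -1 -1 -1
```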
557
558 loglevel = int
559 Log file verbosity 1-6. A value of 2 will print only errors and
560 warnings. 3 will print information like document updates, 4 is
561 quite verbose and 6 very verbose.
562
563 logfilename = fn
564 Log file destination. Use 'stderr' (default) to write to the
565 console.
566
567 idxloglevel = int
568 Override loglevel for the indexer.
569
570 idxlogfilename = fn
571 Override logfilename for the indexer.
572
573 daemloglevel = int
574 Override loglevel for the indexer in real time mode. The default
575 is to use the idx... values if set, else the log... values.
576
577 daemlogfilename = fn
578 Override logfilename for the indexer in real time mode. The
579 default is to use the idx... values if set, else the log... val‐
580 ues.
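
The way the logging parameters override each other might be illustrated as follows. The log file path is hypothetical.

```ini
# General settings: errors and warnings only, written to the console.
loglevel = 2
logfilename = stderr

# The indexer is more verbose and writes to its own file.
idxloglevel = 3
idxlogfilename = /tmp/recollindex.log

# daemloglevel and daemlogfilename are not set here, so the real time
# indexer would use the idx... values above.
```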
581
582 pyloglevel = int
583 Override loglevel for the python module.
584
585 pylogfilename = fn
586 Override logfilename for the python module.
587
588 orgidxconfdir = dfn
589 Original location of the configuration directory. This is used
590 exclusively for movable datasets. Locating the configuration
591 directory inside the directory tree makes it possible to provide
592 automatic query time path translations once the data set has
593 moved (for example, because it has been mounted on another loca‐
594 tion).
595
596 curidxconfdir = dfn
597 Current location of the configuration directory. Complement
598 orgidxconfdir for movable datasets. This should be used if the
599 configuration directory has been copied from the dataset to
600 another location, either because the dataset is readonly and an
601 r/w copy is desired, or for performance reasons. This records
602 the original moved location before copy, to allow path transla‐
603 tion computations. For example if a dataset originally indexed
604 as '/home/me/mydata/config' has been mounted to
605 '/media/me/mydata', and the GUI is running from a copied config‐
606 uration, orgidxconfdir would be '/home/me/mydata/config', and
607 curidxconfdir (as set in the copied configuration) would be
608
609 idxrundir = dfn
610 Indexing process current directory. The input handlers sometimes
611 leave temporary files in the current directory, so it makes
612 sense to have recollindex chdir to some temporary directory. If
613 the value is empty, the current directory is not changed. If the
614 value is (literal) tmp, we use the temporary directory as set by
615 the environment (RECOLL_TMPDIR else TMPDIR else /tmp). If the
616 value is an absolute path to a directory, we go there.
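
For instance, to have recollindex change to the temporary directory defined by the environment:

```ini
# Literal value 'tmp': use RECOLL_TMPDIR, else TMPDIR, else /tmp.
idxrundir = tmp
```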
617
618 checkneedretryindexscript = fn
619 Script used to heuristically check if we need to retry indexing
620 files which previously failed. The default script checks the
621 modified dates on /usr/bin and /usr/local/bin. A relative path
622 will be looked up in the filters dirs, then in the path. Use an
623 absolute path to do otherwise.
624
625 recollhelperpath = string
626 Additional places to search for helper executables. This is only
627 used on Windows for now.
628
629 idxabsmlen = int
630 Length of abstracts we store while indexing. Recoll stores an
631 abstract for each indexed file. The text can come from an
632 actual 'abstract' section in the document or will just be the
633 beginning of the document. It is stored in the index so that it
634 can be displayed inside the result lists without decoding the
635 original file. The idxabsmlen parameter defines the size of the
636 stored abstract. The default value is 250 bytes. The search
637 interface gives you the choice to display this stored text or a
638 synthetic abstract built by extracting text around the search
639 terms. If you always prefer the synthetic abstract, you can
640 reduce this value and save a little space.
641
642 idxmetastoredlen = int
643 Truncation length of stored metadata fields. This does not
644 affect indexing (the whole field is processed anyway), just the
645 amount of data stored in the index for the purpose of displaying
646 fields inside result lists or previews. The default value is 150
647 bytes which may be too low if you have custom fields.
648
649 idxtexttruncatelen = int
650 Truncation length for all document texts. Only index the begin‐
651 ning of documents. This is not recommended except if you are
652 sure that the interesting keywords are at the top and have
653 severe disk space issues.
654
655 aspellLanguage = string
656 Language definitions to use when creating the aspell dictionary.
657 The value must match a set of aspell language definition files.
658 You can type "aspell dicts" to see a list. The default if this is
659 not set is to use the NLS environment to guess the value. The
660 values are the 2-letter language codes (e.g. 'en', 'fr'...)
661
662 aspellAddCreateParam = string
663 Additional option and parameter to aspell dictionary creation
664 command. Some aspell packages may need an additional option
665 (e.g. on Debian Jessie: --local-data-dir=/usr/lib/aspell). See
666 Debian bug 772415.
667
668 aspellKeepStderr = bool
669 Set this to have a look at aspell dictionary creation errors.
670 There are always many, so this is mostly for debugging.
671
672 noaspell = bool
673 Disable aspell use. The aspell dictionary generation takes time,
674 and some combinations of aspell version, language, and local
675 terms, result in aspell crashing, so it sometimes makes sense to
676 just disable the thing.
677
678 monauxinterval = int
679 Auxiliary database update interval. The real time indexer only
680 updates the auxiliary databases (stemdb, aspell) periodically,
681 because it would be too costly to do it for every document
682 change. The default period is one hour.
683
684 monixinterval = int
685 Minimum interval (seconds) between processings of the indexing
686 queue. The real time indexer does not process each event when it
687 comes in, but lets the queue accumulate, to diminish overhead
688 and to aggregate multiple events affecting the same file.
689 Default: 30 seconds.
690
691 mondelaypatterns = string
692 Timing parameters for the real time indexing. Definitions for
693 files which get a longer delay before reindexing is allowed.
694 This is for fast-changing files, that should only be reindexed
695 once in a while. A list of wildcardPattern:seconds pairs. The
696 patterns are matched with fnmatch(pattern, path, 0). You can
697 quote entries containing white space with double quotes (quote
698 the whole entry, not the pattern). The default is empty.
699 Example: mondelaypatterns = *.log:20 "*with spaces.*:30"
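
The three real time indexing parameters above could be combined roughly
as follows. This is a sketch under assumptions: the paths are invented
for the example, and monauxinterval is assumed to be expressed in
seconds like monixinterval. Because the patterns are matched with a
flags value of 0, a '*' also matches '/' characters, so '*.log' applies
to log files in any directory.

```
# Refresh the stemming and aspell databases every two hours
# (assumed unit: seconds)
monauxinterval = 7200
# Let file events accumulate for at least one minute before processing
monixinterval = 60
# Reindex log files at most every 5 minutes, and one busy file every hour
mondelaypatterns = *.log:300 "*/my notes/journal.txt:3600"
```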
700
701 idxniceprio = int
702 "nice" process priority for the indexing processes. Default: 19
703 (lowest). Appeared with 1.26.5. Prior versions were fixed at 19.
704
705 monioniceclass = int
706 ionice class for the indexing process. Despite the misleading
707 name, and on platforms where this is supported, this affects all
708 indexing processes, not only the real time/monitoring ones. The
709 default value is 3 (use lowest "Idle" priority).
710
711 monioniceclassdata = string
712 ionice class level parameter if the class supports it. The
713 default is empty, as the default "Idle" class has no levels.
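
For a machine where indexing should finish sooner at the cost of more
interference with other work, the priority parameters above might be
set as in this sketch. The values are examples only. The class numbers
and levels follow the usual ionice(1) conventions (2 is "Best-effort",
with levels from 0 to 7, a higher number meaning a lower priority).

```
# Less polite CPU scheduling than the default of 19 (needs 1.26.5 or later)
idxniceprio = 10
# "Best-effort" I/O class instead of the default "Idle" class
monioniceclass = 2
# Lowest level inside the best-effort class
monioniceclassdata = 7
```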
714
715 autodiacsens = bool
716 Auto-trigger diacritics sensitivity (raw index only). If the
717 index is not stripped, decide if we automatically trigger dia‐
718 critics sensitivity if the search term has accented characters
719 (not in unac_except_trans). Else you need to use the query lan‐
720 guage and the "D" modifier to specify diacritics sensitivity.
721 Default is no.
722
723 autocasesens = bool
724 Auto-trigger case sensitivity (raw index only). If the index is
725 not stripped (see indexStripChars), decide if we automatically
726 trigger character case sensitivity if the search term has upper-
727 case characters in any but the first position. Else you need to
728 use the query language and the "C" modifier to specify charac‐
729 ter-case sensitivity. Default is yes.
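
A possible combination of the two parameters above for a raw index is
sketched here, assuming that booleans are written as 0 or 1. With these
values, a search term containing accented characters would trigger
diacritics sensitivity automatically, while case sensitivity would only
be obtained through the "C" modifier.

```
# Only meaningful if the index is not stripped (see indexStripChars)
autodiacsens = 1
autocasesens = 0
```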
730
731 maxTermExpand = int
732 Maximum query expansion count for a single term (e.g. when
733 using wildcards). This only affects queries, not indexing. We
734 used to not limit this at all (except for filenames where the
735 limit was too low at 1000), but having no limit is unreasonable
736 with a big index. Default: 10000.
737
738 maxXapianClauses = int
739 Maximum number of clauses we add to a single Xapian query. This
740 only affects queries, not indexing. In some cases, the result of
741 term expansion can be multiplicative, and we want to avoid eat‐
742 ing all the memory. Default 50000.
743
744 snippetMaxPosWalk = int
745 Maximum number of positions we walk while populating a snippet
746 for the result list. The default of 1,000,000 may be
747 insufficient for very big documents; the consequence would be
748 snippets with possibly meaning-altering missing words.
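
The query-side limits above could be adjusted as in the following
sketch. The numbers are arbitrary illustrations, not recommendations;
they only show that each limit is a plain integer.

```
# Allow wider wildcard expansions than the default of 10000
maxTermExpand = 20000
# Keep the default cap on the number of Xapian query clauses
maxXapianClauses = 50000
# Walk more positions when building snippets for very big documents
snippetMaxPosWalk = 5000000
```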
749
750 pdfocr = bool
751 Attempt OCR of PDF files with no text content. This can be
752 defined in subdirectories. The default is off because OCR is so
753 very slow.
754
755 pdfattach = bool
756 Enable PDF attachment extraction by executing pdftk (if avail‐
757 able). This is normally disabled, because it does slow down PDF
758 indexing a bit even if not one attachment is ever found.
759
760 pdfextrameta = string
761 Extract text from selected XMP metadata tags. This is a space-
762 separated list of qualified XMP tag names. Each element can also
763 include a translation to a Recoll field name, separated by a '|'
764 character. If the second element is absent, the tag name is used
765 as the Recoll field name. You will also need to add
766 specifications to the "fields" file to direct processing of the
767 extracted data.
768
769 pdfextrametafix = fn
770 Define name of XMP field editing script. This defines the name
771 of a script to be loaded for editing XMP field values. The
772 script should define a 'MetaFixer' class with a metafix() method
773 which will be called with the qualified tag name and value of
774 each selected field, for editing or erasing. A new instance is
775 created for each document, so that the object can keep state
776 for, e.g. eliminating duplicate values.
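
The PDF parameters above might be used together as in this sketch. The
XMP tag names, the script path and the directory name are hypothetical
values chosen for the example, and booleans are assumed to be written
as 0 or 1. The section comes last because every parameter following a
section line applies to that subtree.

```
# Extract attachments with pdftk for all indexed PDF files
pdfattach = 1
# XMP tags to extract; the first is stored in the Recoll field "location"
pdfextrameta = bibtex:location|location bibtex:booktitle bibtex:pages
# Hypothetical script defining the MetaFixer class
pdfextrametafix = /home/me/.recoll/pdfmetafix.py

# Attempt OCR on text-less PDF files only under this (hypothetical) subtree
[~/scans]
pdfocr = 1
```

The extracted fields still need matching entries in the "fields" file,
as noted above.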
777
778 ocrprogs = string
779 OCR modules to try. The top OCR script will try to load the cor‐
780 responding modules in order and use the first which reports
781 being capable of performing OCR on the input file. Modules for
782 tesseract (tesseract) and ABBYY FineReader (abbyy) are present
783 in the standard distribution. For compatibility with the previ‐
784 ous version, if this is not defined at all, the default value is
785 "tesseract". Use an explicit empty value if needed. A value of
786 "abbyy tesseract" will try everything.
787
788 ocrcachedir = dfn
789 Location for caching OCR data. The default if this is empty or
790 undefined is to store the cached OCR data under
791 $RECOLL_CONFDIR/ocrcache.
792
793 tesseractlang = string
794 Language to assume for tesseract OCR. Important for improving
795 the OCR accuracy. This can also be set through the contents of a
796 file in the currently processed directory. See the
797 rclocrtesseract.py script. Example values: eng, fra... See the
798 tesseract documentation.
799
800 tesseractcmd = fn
801 Path for the tesseract command. Do not quote. This is mostly
802 useful on Windows, or for specifying a non-default tesseract
803 command. E.g. on Windows:
804 tesseractcmd = C:/Program Files (x86)/Tesseract-OCR/tesseract.exe
805
806
807 abbyylang = string
808 Language to assume for abbyy OCR. Important for improving the
809 OCR accuracy. This can also be set through the contents of a
810 file in the currently processed directory. See the
811 rclocrabbyy.py script. Typical values: English, French... See
812 the ABBYY documentation.
813
814
815 abbyycmd = fn
816 Path for the abbyy command. The ABBYY directory is usually not
817 in the path, so you should set this.
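
Taken together, the OCR parameters above might look like the following
sketch. All paths are hypothetical and only illustrate the syntax; the
language values are the ones mentioned above.

```
# Try ABBYY first, then fall back to tesseract
ocrprogs = abbyy tesseract
# Hypothetical cache location outside the configuration directory
ocrcachedir = /var/tmp/recoll-ocrcache
# Documents are mostly in French
tesseractlang = fra
abbyylang = French
# Hypothetical command locations; note the absence of quotes
tesseractcmd = /opt/tesseract/bin/tesseract
abbyycmd = /opt/abbyy/abbyyocr
```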
818
819
820 mhmboxquirks = string
821 Enable thunderbird/mozilla-seamonkey mbox format quirks. Set this
822 for the directory where the email mbox files are stored.
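
Since this parameter is meant to be set for a specific directory, it
would normally appear inside a section, as in the sketch below. The
directory is only an example. The value "tbird" is an assumption based
on the sample configuration distributed with Recoll; please check the
comments in the system-wide configuration file for the exact keyword.

```
# Apply the mbox quirks only to the Thunderbird mail folders
[~/.thunderbird]
mhmboxquirks = tbird
```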
823
824
825
827 recollindex(1) recoll(1)
828
829
830
831 14 November 2012 RECOLL.CONF(5)