RECOLL.CONF(5)              File Formats Manual             RECOLL.CONF(5)



NAME
       recoll.conf - main personal configuration file for Recoll

DESCRIPTION
       This file defines the index configuration for the Recoll full-text
       search system.

       The system-wide configuration file is normally located inside
       /usr/[local]/share/recoll/examples. Any parameter set in the common
       file may be overridden by setting it in the personal configuration
       file, by default: $HOME/.recoll/recoll.conf

       Please note that while I try to keep this manual page reasonably up
       to date, it will frequently lag behind the current state of the
       software. The best sources of information about the configuration
       are the comments in the system-wide configuration file, or the user
       manual, which you can access from the recoll GUI help menu or on the
       recoll web site.


       A short extract of the file might look as follows:

              # Space-separated list of directories to index.
              topdirs = ~/docs /usr/share/doc

              [~/somedirectory-with-utf8-txt-files]
              defaultcharset = utf-8


       There are three kinds of lines:

       ·  Comment or empty

       ·  Parameter assignment

       ·  Section definition

       Empty lines and lines beginning with # are ignored.

       Assignment lines are in the form 'name = value'.

       Section lines allow redefining a parameter for a directory subtree.
       Some of the parameters used for indexing are looked up hierarchically,
       from the most to the least specific section. Not all parameters can be
       meaningfully redefined; this is specified for each in the next section.
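
       The hierarchical lookup can be pictured with the following Python
       sketch. It is an illustration, not Recoll's own code. The 'sections'
       dictionary layout (with the key "" holding the top-level assignments)
       and the lookup() function are hypothetical. The sketch also ignores
       the fact that some parameters cannot be redefined per subtree.

```python
import os

def lookup(sections, name, path):
    """Return the value of `name` for `path`, most specific section first.

    `sections` maps a section path to a dict of assignments; the key ""
    holds the top-level (global) assignments.
    """
    # Normalize section keys once (tilde expansion, trailing slashes).
    norm = {(os.path.normpath(os.path.expanduser(k)) if k else ""): v
            for k, v in sections.items()}
    current = os.path.normpath(os.path.expanduser(path))
    while True:
        if name in norm.get(current, {}):
            return norm[current][name]
        parent = os.path.dirname(current)
        if parent == current:          # reached the filesystem root
            break
        current = parent
    # Fall back to the top-level value, or None if the name is not set.
    return norm.get("", {}).get(name)
```

       For example, with a section for /data/utf8, a file anywhere below
       that directory gets the section's defaultcharset, and every other
       file gets the top-level value.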

       The tilde character (~) is expanded in file names to the name of the
       user's home directory.

       Where values are lists, white space is used for separation, and
       elements with embedded spaces can be quoted with double quotes.
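
       Here is a minimal Python sketch of this list splitting. The function
       name is hypothetical. Two behaviours are assumptions, not documented
       ones: backslashes are kept literally, and an unbalanced quote makes
       the rest of the value a single element.

```python
def split_list(value):
    """Split a list value on white space, honouring double-quoted elements."""
    items, current, in_quotes, have_item = [], [], False, False
    for ch in value:
        if ch == '"':
            in_quotes = not in_quotes
            have_item = True              # "" still yields an (empty) element
        elif ch.isspace() and not in_quotes:
            if have_item:                 # end of the current element
                items.append("".join(current))
                current, have_item = [], False
        else:
            current.append(ch)            # ordinary char, or space inside quotes
            have_item = True
    if have_item:
        items.append("".join(current))
    return items
```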

       topdirs = string
              Space-separated list of files or directories to recursively
              index. Defaults to ~ (indexes $HOME). You can use symbolic links
              in the list; they will be followed, independently of the value
              of the followLinks variable.

       monitordirs = string
              Space-separated list of files or directories to monitor for
              updates. When running the real-time indexer, this allows
              monitoring only a subset of the whole indexed area. The elements
              must be included in the tree defined by the 'topdirs' members.

       skippedNames = string
              Files and directories which should be ignored. White space
              separated list of wildcard patterns (simple ones, not paths;
              they must contain no /), which will be tested against file and
              directory names. The list in the default configuration does not
              exclude hidden directories (names beginning with a dot), which
              means that it may index quite a few things that you do not want.
              On the other hand, email user agents like Thunderbird usually
              store messages in hidden directories, and you probably want these
              indexed. One possible solution is to have ".*" in "skippedNames",
              and add things like "~/.thunderbird" "~/.evolution" to
              "topdirs". Not even the file names are indexed for patterns in
              this list; see the "noContentSuffixes" variable for an
              alternative approach which indexes the file names. Can be
              redefined for any subtree.

       skippedNames- = string
              List of patterns to remove from the default skippedNames list.

       skippedNames+ = string
              List of patterns to add to the default skippedNames list.
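
       The following Python sketch shows two things: how a name could be
       tested against the skippedNames patterns, and how the skippedNames-
       and skippedNames+ lists could modify the default list. Python's
       fnmatchcase() stands in for the C library matcher. The order in
       which the two modifier lists are applied is an assumption, and the
       function names are hypothetical.

```python
from fnmatch import fnmatchcase

def effective_skipped_names(default, minus=(), plus=()):
    """Apply skippedNames- then skippedNames+ to the default list."""
    removed = set(minus)
    result = [p for p in default if p not in removed]
    result += [p for p in plus if p not in result]
    return result

def is_skipped(name, patterns):
    """True if a file or directory *name* (not a path) matches any pattern."""
    return any(fnmatchcase(name, p) for p in patterns)
```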

       noContentSuffixes = string
              List of name endings (not necessarily dot-separated suffixes)
              for which we don't try MIME type identification, and don't
              uncompress or index content. Only the names will be indexed.
              This complements the now obsoleted recoll_noindex list from the
              mimemap file, which will go away in a future release (the move
              from mimemap to recoll.conf allows editing the list through the
              GUI). This is different from skippedNames because these are name
              ending matches only (not wildcard patterns), and the file name
              itself gets indexed normally. This can be redefined for
              subdirectories.

       noContentSuffixes- = string
              List of name endings to remove from the default
              noContentSuffixes list.

       noContentSuffixes+ = string
              List of name endings to add to the default noContentSuffixes
              list.
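
       A minimal sketch (hypothetical function name) makes the difference
       with skippedNames explicit: entries are plain string endings, not
       wildcard patterns. An entry like '*.log' would therefore only match
       names that literally end with those five characters.

```python
def content_skipped(name, suffixes):
    """True if only the name is indexed: a plain string-ending test."""
    # Empty entries are ignored, otherwise they would match every name.
    return any(name.endswith(s) for s in suffixes if s)
```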

       skippedPaths = string
              Absolute paths we should not go into. Space-separated list of
              wildcard expressions for absolute filesystem paths. Must be
              defined at the top level of the configuration file, not in a
              subsection. Can contain files and directories. The database and
              configuration directories will automatically be added. The
              expressions are matched using 'fnmatch(3)' with the FNM_PATHNAME
              flag set by default. This means that '/' characters must be
              matched explicitly. You can set 'skippedPathsFnmPathname' to 0
              to disable the use of FNM_PATHNAME (meaning that '/*/dir3' will
              match '/dir1/dir2/dir3'). The default value contains the usual
              mount point for removable media to remind you that it is a bad
              idea to have Recoll work on these (especially with the monitor:
              media gets indexed on mount, all data gets erased on unmount).
              Explicitly adding '/media/xxx' to the 'topdirs' variable will
              override this.

       skippedPathsFnmPathname = bool
              Set to 0 to override use of FNM_PATHNAME for matching skipped
              paths.
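
       Python's fnmatch module has no FNM_PATHNAME flag. This sketch
       emulates it by translating '*' and '?' into regular expressions that
       do not match '/'. It only illustrates the difference between the two
       modes: bracket expressions and backslash escapes are not handled,
       and the function names are hypothetical.

```python
import re
from fnmatch import fnmatchcase

def _pathname_regex(pattern):
    """Translate '*' and '?' so that they never match '/' (FNM_PATHNAME-like)."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append("[^/]*")
        elif ch == "?":
            parts.append("[^/]")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts) + r"\Z")

def path_matches(pattern, path, fnm_pathname=True):
    if fnm_pathname:
        return _pathname_regex(pattern).match(path) is not None
    # Without FNM_PATHNAME, '*' may also match '/' characters.
    return fnmatchcase(path, pattern)
```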

       nowalkfn = string
              File name which will cause its parent directory to be skipped.
              Any directory containing a file with this name will be skipped
              as if it was part of the skippedPaths list. Example:
              .recoll-noindex
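
       The effect of nowalkfn on a directory walk can be sketched in Python
       by pruning os.walk(). The sketch assumes that both the marked
       directory's own files and all of its subdirectories are skipped.
       The function name is hypothetical.

```python
import os

def walk_indexable(top, nowalkfn=".recoll-noindex"):
    """Yield file paths under top, pruning directories that contain nowalkfn."""
    for dirpath, dirnames, filenames in os.walk(top):
        if nowalkfn in filenames:
            dirnames[:] = []      # tell os.walk() not to descend any further
            continue              # and skip this directory's own files
        for fn in filenames:
            yield os.path.join(dirpath, fn)
```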

       daemSkippedPaths = string
              skippedPaths equivalent specific to real-time indexing. This
              enables having parts of the tree which are initially indexed but
              not monitored. If daemSkippedPaths is not set, the daemon uses
              skippedPaths.

       zipUseSkippedNames = bool
              Use skippedNames inside Zip archives. Fetched directly by the
              rclzip handler. Skip the patterns defined by skippedNames inside
              Zip archives. Can be redefined for subdirectories. See
              https://www.lesbonscomptes.com/recoll/faqsandhowtos/FilteringOutZipArchiveMembers.html

       zipSkippedNames = string
              Space-separated list of wildcard expressions for names that
              should be ignored inside zip archives. This is used directly by
              the zip handler. If zipUseSkippedNames is not set,
              zipSkippedNames defines the patterns to be skipped inside
              archives. If zipUseSkippedNames is set, the two lists are
              concatenated and used. Can be redefined for subdirectories. See
              https://www.lesbonscomptes.com/recoll/faqsandhowtos/FilteringOutZipArchiveMembers.html
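
       The combination rule for the two Zip settings can be written as a
       small Python sketch. The function names are hypothetical. Matching a
       member name with Python's fnmatchcase() is an assumption about how
       the rclzip handler applies the patterns.

```python
from fnmatch import fnmatchcase

def zip_skip_patterns(zip_skipped_names, skipped_names, zip_use_skipped_names):
    """Patterns applied inside Zip archives, following the rule above."""
    patterns = list(zip_skipped_names)
    if zip_use_skipped_names:
        patterns += skipped_names      # the two lists are concatenated
    return patterns

def member_skipped(member_name, patterns):
    return any(fnmatchcase(member_name, p) for p in patterns)
```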

       followLinks = bool
              Follow symbolic links during indexing. The default is to ignore
              symbolic links to avoid multiple indexing of linked files. No
              effort is made to avoid duplication when this option is set to
              true. This option can be set individually for each of the
              'topdirs' members by using sections. It cannot be changed below
              the 'topdirs' level. Links in the 'topdirs' list itself are
              always followed.

       indexedmimetypes = string
              Restrictive list of indexed MIME types. Normally not set (in
              which case all supported types are indexed). If it is set, only
              the types from the list will have their contents indexed. The
              names will be indexed anyway if indexallfilenames is set
              (default). MIME type names should be taken from the mimemap file
              (the values may be different from xdg-mime or file -i output in
              some cases). Can be redefined for subtrees.

       excludedmimetypes = string
              List of excluded MIME types. Lets you exclude some types from
              indexing. MIME type names should be taken from the mimemap file
              (the values may be different from xdg-mime or file -i output in
              some cases). Can be redefined for subtrees.
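
       Here is one possible reading, as a Python sketch, of how
       indexedmimetypes, excludedmimetypes and indexallfilenames combine.
       Two points are assumptions: the behaviour when both lists are set,
       and leaving out the "supported type" check. The function name is
       hypothetical.

```python
def index_decision(mimetype, indexed=(), excluded=(), indexallfilenames=True):
    """Return (index_content, index_name) for a document of this MIME type."""
    # An empty indexedmimetypes list means "all types".
    content = (not indexed or mimetype in indexed) and mimetype not in excluded
    # The name is indexed with the content, or anyway if indexallfilenames.
    name = content or indexallfilenames
    return content, name
```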

       nomd5types = string
              Don't compute MD5 checksums for these types. MD5 checksums are
              used only for deduplicating results, and can be very expensive
              to compute on multimedia or other big files. This list lets you
              turn off MD5 computation for selected types. It is global (no
              redefinition for subtrees). At the moment, it only has an effect
              for external handlers (exec and execm). The file types can be
              specified by listing either MIME types (e.g. audio/mpeg) or
              handler names (e.g. rclaudio).

       compressedfilemaxkbs = int
              Size limit for compressed files. We need to decompress these in
              a temporary directory for identification, which can be wasteful
              in some cases. This limits the waste. Negative means no limit. 0
              results in no processing of any compressed file. Default 50 MB.

       textfilemaxmbs = int
              Size limit for text files. Mostly for skipping monster logs.
              Default 20 MB.

       indexallfilenames = bool
              Index the file names of unprocessed files: index the names of
              files the contents of which we don't index because of an
              excluded or unsupported MIME type.

       usesystemfilecommand = bool
              Use a system command for file MIME type guessing as a final step
              in file type identification. This is generally useful, but will
              usually cause the indexing of many bogus 'text' files. See
              'systemfilecommand' for the command used.

       systemfilecommand = string
              Command used to guess MIME types if the internal methods fail.
              This should be a "file -i" workalike. The file path will be
              added as a last parameter to the command line. "xdg-mime" works
              better than the traditional "file" command, and is now the
              configured default (with a hard-coded fallback to "file").

       processwebqueue = bool
              Decide if we process the Web queue. The queue is a directory
              where the Recoll Web browser plugins create the copies of
              visited pages.

       textfilepagekbs = int
              Page size for text files. If this is set, text/plain files will
              be divided into documents of approximately this size. This will
              reduce memory usage at index time and help with loading data in
              the preview window at query time. Particularly useful with very
              big files, such as application or system logs. Also see
              textfilemaxmbs and compressedfilemaxkbs.
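
       The paging idea can be sketched in Python as follows. The function
       name is hypothetical. So are two choices made here: cutting only at
       line boundaries, and counting 1 KB as 1024 bytes of UTF-8. Recoll's
       actual splitting may differ.

```python
def split_pages(text, page_kbs):
    """Split text into chunks of roughly page_kbs kilobytes, at line breaks.

    Sizes are measured in UTF-8 bytes; a single line longer than a page
    becomes a page of its own.
    """
    limit = page_kbs * 1024
    pages, current, size = [], [], 0
    for line in text.splitlines(keepends=True):
        n = len(line.encode("utf-8"))
        if current and size + n > limit:   # this line would overflow the page
            pages.append("".join(current))
            current, size = [], 0
        current.append(line)
        size += n
    if current:
        pages.append("".join(current))
    return pages
```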

       membermaxkbs = int
              Size limit for archive members. This is passed to the filters in
              the environment as RECOLL_FILTER_MAXMEMBERKB.

       indexStripChars = bool
              Decide if we strip character case and diacritics from the
              index. If they are kept, searches sensitive to case and
              diacritics can be performed, but the index will be bigger, and
              some marginal weirdness may sometimes occur. The default is a
              stripped index. When using multiple indexes for a search, this
              parameter must be defined identically for all. Changing the
              value implies an index reset.

       indexStoreDocText = bool
              Decide if we store the documents' text content in the index.
              Storing the text allows extracting snippets from it at query
              time, instead of building them from index position data. Newer
              Xapian index formats have rendered our use of positions lists
              unacceptably slow in some cases. The last Xapian index format
              with good performance for the old method is Chert, which is the
              default for 1.2, still supported but not default in 1.4, and
              will be dropped in 1.6. The stored document text is translated
              from its original format to UTF-8 plain text, but not stripped
              of upper-case, diacritics, or punctuation signs. Storing it
              typically increases the index size by 10-20%, but also allows
              for nicer snippets, so it may be worth enabling it even if not
              strictly needed for performance if you can afford the space.
              The variable only has an effect when creating an index, meaning
              that the xapiandb directory must not exist yet. Its exact effect
              depends on the Xapian version. For Xapian 1.4, if the variable
              is set to 0, the Chert format will be used, and the text will
              not be stored. If the variable is 1, Glass will be used, and the
              text stored. For Xapian 1.2, and for versions 1.5 and newer, the
              index format is always the default, but the variable controls
              whether the text is stored or not, and the abstract generation
              method. With Xapian 1.5 and later, and the variable set to 0,
              abstract generation may be very slow, but this setting may still
              be useful to save space if you do not use abstract generation at
              all.

       nonumbers = bool
              Decides if terms will be generated for numbers. For example
              "123", "1.5e6", "192.168.1.4" would not be indexed if nonumbers
              is set ("value123" would still be). Numbers are often quite
              interesting to search for, and this should probably not be set
              except for special situations, for example scientific documents
              with huge amounts of numbers in them, where setting nonumbers
              will reduce the index size. This can only be set for a whole
              index, not for a subtree.

       dehyphenate = bool
              Determines if we index 'coworker' also when the input is
              'co-worker'. This is new in version 1.22, and on by default.
              Setting the variable to off allows restoring the previous
              behaviour.

       backslashasletter = bool
              Process backslash as a normal letter. This may make sense for
              people wanting to index TeX commands as such, but is not of much
              general use.

       maxtermlength = int
              Maximum term length. Words longer than this will be discarded.
              The default is 40 and used to be hard-coded, but it can now be
              adjusted. You need an index reset if you change the value.

       nocjk = bool
              Decides if specific East Asian (Chinese, Korean, Japanese)
              character/word splitting is turned off. This will save a small
              amount of CPU if you have no CJK documents. If your document
              base does include such text but you are not interested in
              searching it, setting nocjk may be a significant time and space
              saver.

       cjkngramlen = int
              This lets you adjust the size of n-grams used for indexing CJK
              text. The default value of 2 is probably appropriate in most
              cases. A value of 3 would allow more precision and efficiency on
              longer words, but the index will be approximately twice as
              large.
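
       To illustrate what the n-gram length means, this Python sketch splits
       a run of CJK characters into overlapping n-grams. It is not Recoll's
       splitter. The function name and the handling of runs shorter than n
       are assumptions.

```python
def ngrams(run, n=2):
    """Overlapping n-grams of a run of CJK characters (no spaces between words)."""
    if len(run) <= n:
        return [run] if run else []    # a short run is kept whole
    return [run[i:i + n] for i in range(len(run) - n + 1)]
```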

       indexstemminglanguages = string
              Languages for which to create stemming expansion data. Stemmer
              names can be found by executing 'recollindex -l', or this can
              also be set from a list in the GUI.

       defaultcharset = string
              Default character set. This is used for files which do not
              contain a character set definition (e.g.: text/plain). Values
              found inside files, e.g. a 'charset' tag in HTML documents, will
              override it. If this is not set, the default character set is
              the one defined by the NLS environment ($LC_ALL, $LC_CTYPE,
              $LANG), or ultimately iso-8859-1 (cp-1252 in fact). If for some
              reason you want a general default which does not match your LANG
              and is not 8859-1, use this variable. This can be redefined for
              any sub-directory.

       unac_except_trans = string
              A list of characters, encoded in UTF-8, which should be handled
              specially when converting text to unaccented lowercase. For
              example, in Swedish, the letter a with diaeresis has full
              alphabet citizenship and should not be turned into an a. Each
              element in the space-separated list has the special character as
              first element and the translation following. The handling of
              both the lowercase and upper-case versions of a character should
              be specified, as membership in the list will turn off both
              standard accent and case processing. The value is global and
              affects both indexing and querying. Examples:

              Swedish:
              unac_except_trans = ää Ää öö Öö üü Üü ßss œoe Œoe æae Æae ﬀff ﬁfi ﬂfl åå Åå

              German:
              unac_except_trans = ää Ää öö Öö üü Üü ßss œoe Œoe æae Æae ﬀff ﬁfi ﬂfl

              In French, you probably want to decompose oe and ae, and nobody
              would type a German ß:
              unac_except_trans = ßss œoe Œoe æae Æae ﬀff ﬁfi ﬂfl

              The default for all languages follows. These decompositions are
              not performed by unac, but it is unlikely that someone would type
              the composed forms in a search.
              unac_except_trans = ßss œoe Œoe æae Æae ﬀff ﬁfi ﬂfl
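
       The following Python sketch shows how an exception list changes the
       result of stripping. It approximates the stripping process, using
       Unicode NFD decomposition in place of the unac library. The function
       names are hypothetical. The parser assumes precomposed (NFC)
       characters in the value.

```python
import unicodedata

def parse_except_trans(value):
    """'ää Ää ßss' -> {'ä': 'ä', 'Ä': 'ä', 'ß': 'ss'}: first char, then translation."""
    return {item[0]: item[1:] for item in value.split()}

def fold(text, exceptions):
    """Lowercase and strip diacritics, except for characters in `exceptions`."""
    out = []
    for ch in text:
        if ch in exceptions:
            out.append(exceptions[ch])    # no accent or case processing at all
            continue
        decomposed = unicodedata.normalize("NFD", ch)
        base = "".join(c for c in decomposed if not unicodedata.combining(c))
        out.append(base.lower())
    return "".join(out)
```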

       maildefcharset = string
              Overrides the default character set for email messages which
              don't specify one. This is mainly useful for readpst (libpst)
              dumps, which are UTF-8 but do not say so.

       localfields = string
              Set fields on all files (usually of a specific filesystem area).
              The syntax is the usual one: name = value ; attr1 = val1 ; [...]
              Here the value part is empty, so the setting needs an initial
              semicolon. This is useful, e.g., for setting the rclaptg field
              for application selection inside mimeview.

       testmodifusemtime = bool
              Use mtime instead of ctime to test if a file has been modified.
              The time is used in addition to the size, which is always used.
              Setting this can reduce re-indexing on systems where extended
              attributes are used (by some other application), but not
              indexed, because changing extended attributes only affects
              ctime. Notes:
              - This may prevent detection of change in some marginal file
                rename cases (the target would need to have the same size and
                mtime).
              - You should probably also set noxattrfields to 1 in this case,
                except if you still prefer to perform xattr indexing, for
                example if the local file update pattern makes it of value
                (as in general, there is a risk for pure extended attributes
                updates without file modification to go undetected).
              Perform a full index reset after changing this.
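
       The comparison described above can be sketched in Python. The
       function names are hypothetical. The sketch only shows why an
       extended-attribute change, which updates ctime but not mtime, stops
       triggering re-indexing once testmodifusemtime is set.

```python
def signature(st, use_mtime):
    """What is compared: the size plus either mtime or ctime."""
    return (st.st_size, st.st_mtime if use_mtime else st.st_ctime)

def needs_reindex(stored_sig, st, use_mtime=False):
    """`st` is any os.stat()-like object; stored_sig comes from signature()."""
    return signature(st, use_mtime) != stored_sig
```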

       noxattrfields = bool
              Disable extended attributes conversion to metadata fields. This
              probably needs to be set if testmodifusemtime is set.

       metadatacmds = string
              Define commands to gather external metadata, e.g. tmsu tags.
              There can be several entries, separated by semicolons, each
              defining which field name the data goes into and the command to
              use. Don't forget the initial semicolon. All the field names
              must be different. You can use aliases in the "fields" file if
              necessary. As a not too pretty hack conceded to convenience, any
              field name beginning with "rclmulti" will be taken as an
              indication that the command returns multiple field values inside
              a text blob formatted as a recoll configuration file ("fieldname
              = fieldvalue" lines). The rclmultixx name will be ignored, and
              field names and values will be parsed from the data. Example:
              metadatacmds = ; tags = tmsu tags %f; rclmulti1 = cmdOutputsConf %f
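
       Here is a Python sketch of how such a value could be parsed and used.
       The function names are hypothetical. Splitting the command with
       shlex and replacing a '%f' argument with the file path are
       assumptions about the command syntax.

```python
import shlex

def parse_metadatacmds(value):
    """'; tags = tmsu tags %f; rclmulti1 = cmd %f' -> {'tags': 'tmsu tags %f', ...}"""
    cmds = {}
    for entry in value.split(";"):
        if "=" not in entry:
            continue              # e.g. the empty part before the initial semicolon
        field, cmd = (s.strip() for s in entry.split("=", 1))
        cmds[field] = cmd
    return cmds

def command_line(cmd, path):
    """Argument list for the command, with %f replaced by the file path."""
    return [path if arg == "%f" else arg for arg in shlex.split(cmd)]

def fields_from_output(field, output):
    """Plain field: whole output is the value; rclmulti*: parse name = value lines."""
    if not field.startswith("rclmulti"):
        return {field: output.strip()}
    result = {}
    for line in output.splitlines():
        if "=" in line:
            name, val = (s.strip() for s in line.split("=", 1))
            result[name] = val
    return result
```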

       cachedir = dfn
              Top directory for Recoll data. Recoll data directories are
              normally located relative to the configuration directory (e.g.
              ~/.recoll/xapiandb, ~/.recoll/mboxcache). If 'cachedir' is set,
              the directories are stored under the specified value instead
              (e.g. if cachedir is ~/.cache/recoll, the default dbdir would be
              ~/.cache/recoll/xapiandb). This affects dbdir, webcachedir,
              mboxcachedir and aspellDicDir, which can still be individually
              specified to override cachedir. Note that if you have multiple
              configurations, each must have a different cachedir: there is
              no automatic computation of a subpath under cachedir.

       maxfsoccuppc = int
              Maximum file system occupation over which we stop indexing. The
              value is a percentage, corresponding to what the "Capacity" df
              output column shows. The default value is 0, meaning no
              checking.

       dbdir = dfn
              Xapian database directory location. This will be created on
              first indexing. If the value is not an absolute path, it will be
              interpreted as relative to cachedir if set, or the configuration
              directory (-c argument or $RECOLL_CONFDIR). If nothing is
              specified, the default is then ~/.recoll/xapiandb/
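
       Here are the dbdir resolution rules above, as a Python sketch with a
       hypothetical function name. Environment handling ($RECOLL_CONFDIR,
       -c) is left to the caller, which passes the configuration directory
       in.

```python
import os

def resolve_dbdir(confdir, cachedir=None, dbdir=None):
    """Compute the Xapian database directory following the rules above."""
    base = os.path.expanduser(cachedir) if cachedir else os.path.expanduser(confdir)
    value = os.path.expanduser(dbdir) if dbdir else "xapiandb"
    # os.path.join() returns `value` unchanged when it is already absolute.
    return os.path.normpath(os.path.join(base, value))
```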

       idxstatusfile = fn
              Name of the scratch file where the indexer process updates its
              status. Default: idxstatus.txt inside the configuration
              directory.

       mboxcachedir = dfn
              Directory location for storing mbox message offsets cache files.
              This is normally 'mboxcache' under cachedir if set, or else
              under the configuration directory, but it may be useful to share
              a directory between different configurations.

       mboxcacheminmbs = int
              Minimum mbox file size over which we cache the offsets. There is
              really no sense in caching offsets for small files. The default
              is 5 MB.

       webcachedir = dfn
              Directory where we store the archived web pages. This is only
              used by the web history indexing code. Default:
              cachedir/webcache if cachedir is set, else
              $RECOLL_CONFDIR/webcache

       webcachemaxmbs = int
              Maximum size in MB of the Web archive. This is only used by the
              web history indexing code. Default: 40 MB. Reducing the size
              will not physically truncate the file.

       webqueuedir = fn
              The path to the Web indexing queue. This used to be hard-coded
              in the old plugin as ~/.recollweb/ToIndex so there would be no
              need or possibility to change it, but the WebExtensions plugin
              now downloads the files to the user Downloads directory, and a
              script moves them to webqueuedir. The script reads this value
              from the config so it has become possible to change it.

       webdownloadsdir = fn
              The path to the browser downloads directory. This is where the
              new browser add-on extension has to create the files. They are
              then moved by a script to webqueuedir.

       aspellDicDir = dfn
              Aspell dictionary storage directory location. The aspell
              dictionary (aspdict.(lang).rws) is normally stored in the
              directory specified by cachedir if set, or under the
              configuration directory.

       filtersdir = dfn
              Directory location for executable input handlers. If
              RECOLL_FILTERSDIR is set in the environment, we use it instead.
              Defaults to $prefix/share/recoll/filters. Can be redefined for
              subdirectories.

       iconsdir = dfn
              Directory location for icons. The only reason to change this
              would be if you want to change the icons displayed in the result
              list. Defaults to $prefix/share/recoll/images

       idxflushmb = int
              Threshold (megabytes of new data) where we flush from memory to
              disk index. Setting this allows some control over memory usage
              by the indexer process. A value of 0 means no explicit flushing,
              which lets Xapian perform its own thing, meaning flushing every
              $XAPIAN_FLUSH_THRESHOLD documents created, modified or deleted:
              as memory usage depends on average document size, not only
              document count, the Xapian approach is not very useful, and you
              should let Recoll manage the flushes. The compiled-in value is
              0. The configured default value (from this file) is now 50 MB,
              and should be OK in many cases. You can set it as low as 10 to
              conserve memory, but if you are looking for maximum speed, you
              may want to experiment with values between 20 and 200. In my
              experience, values beyond this are always counterproductive. If
              you find otherwise, please drop me a note.
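
       The flushing policy can be pictured with this Python sketch. The
       class name and the callback are hypothetical, and 1 MB is taken as
       2^20 bytes. Xapian's own document-count based flushing (used when
       the value is 0) is not modelled.

```python
class FlushingWriter:
    """Accumulate new document data and flush once idxflushmb MB are pending."""

    def __init__(self, idxflushmb, flush):
        self.limit = idxflushmb * 1024 * 1024   # 0 means: never flush explicitly
        self.flush = flush                      # callback that commits to disk
        self.pending = 0

    def add(self, nbytes):
        self.pending += nbytes
        if self.limit and self.pending >= self.limit:
            self.flush()
            self.pending = 0
```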

       filtermaxseconds = int
              Maximum external filter execution time in seconds. Default 1200
              (20 minutes). Set to 0 for no limit. This is mainly to avoid
              infinite loops in PostScript files (loop.ps).

       filtermaxmbytes = int
              Maximum virtual memory space for filter processes
              (setrlimit(RLIMIT_AS)), in megabytes. Note that this includes
              any mapped libs (there is no reliable Linux way to limit the
              data space only), so we need to be a bit generous here.
              Anything over 2000 will be ignored on 32-bit machines.

       thrQSizes = string
              Stage input queues configuration. There are three internal
              queues in the indexing pipeline stages (file data extraction,
              terms generation, index update). This parameter defines the
              queue depths for each stage (three integer values). If a value
              of -1 is given for a given stage, no queue is used, and the
              thread will go on performing the next stage. In practice, deep
              queues have not been shown to increase performance. Default: a
              value of 0 for the first queue tells Recoll to perform
              autoconfiguration based on the detected number of CPUs (no need
              for the two other values in this case). Use thrQSizes = -1 -1 -1
              to disable multithreading entirely.

       thrTCounts = string
              Number of threads used for each indexing stage. The three stages
              are: file data extraction, terms generation, index update. The
              use of the counts is also controlled by some special values in
              thrQSizes: if the first queue depth is 0, all counts are ignored
              (autoconfigured); if a value of -1 is used for a queue depth,
              the corresponding thread count is ignored. It makes no sense to
              use a value other than 1 for the last stage because updating the
              Xapian index is necessarily single-threaded (and protected by a
              mutex).
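
       A Python sketch can summarize how the special values of the two
       parameters interact. It only interprets the values; it does not run
       any threads. The function name and the wording of the returned plan
       are hypothetical.

```python
STAGES = ("file data extraction", "terms generation", "index update")

def thread_plan(qsizes, tcounts):
    """Interpret thrQSizes / thrTCounts values as described above."""
    depths = [int(x) for x in qsizes.split()]
    counts = [int(x) for x in tcounts.split()]
    if depths and depths[0] == 0:
        return "autoconfigure from CPU count"   # all counts are ignored
    plan = []
    for stage, depth, count in zip(STAGES, depths, counts):
        if depth == -1:
            # No input queue: the upstream thread goes on with this stage,
            # and the configured thread count is ignored.
            plan.append((stage, "no input queue"))
        else:
            plan.append((stage, f"queue depth {depth}, {count} thread(s)"))
    return plan
```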

       loglevel = int
              Log file verbosity 1-6. A value of 2 will print only errors and
              warnings. 3 will print information like document updates, 4 is
              quite verbose and 6 very verbose.

       logfilename = fn
              Log file destination. Use 'stderr' (default) to write to the
              console.

       idxloglevel = int
              Override loglevel for the indexer.

       idxlogfilename = fn
              Override logfilename for the indexer.

       daemloglevel = int
              Override loglevel for the indexer in real-time mode. The default
              is to use the idx... values if set, else the log... values.

       daemlogfilename = fn
              Override logfilename for the indexer in real-time mode. The
              default is to use the idx... values if set, else the log...
              values.
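
       Here is the override chain for the logging parameters, as a Python
       sketch. The function name and the mode names ('other', 'index',
       'daemon') are hypothetical.

```python
def effective(conf, name, mode):
    """Value of loglevel or logfilename for a program running in `mode`."""
    chain = {"other": [name],
             "index": ["idx" + name, name],
             "daemon": ["daem" + name, "idx" + name, name]}[mode]
    for key in chain:                  # most specific override first
        if key in conf:
            return conf[key]
    return None                        # not set anywhere: built-in default
```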

       orgidxconfdir = dfn
              Original location of the configuration directory. This is used
              exclusively for movable datasets. Locating the configuration
              directory inside the directory tree makes it possible to provide
              automatic query time path translations once the data set has
              moved (for example, because it has been mounted on another
              location).

       curidxconfdir = dfn
              Current location of the configuration directory. Complements
              orgidxconfdir for movable datasets. This should be used if the
              configuration directory has been copied from the dataset to
              another location, either because the dataset is read-only and an
              r/w copy is desired, or for performance reasons. This records
              the original moved location before copy, to allow path
              translation computations. For example, if a dataset originally
              indexed as '/home/me/mydata/config' has been mounted to
              '/media/me/mydata', and the GUI is running from a copied
              configuration, orgidxconfdir would be '/home/me/mydata/config',
              and curidxconfdir (as set in the copied configuration) would be

       idxrundir = dfn
              Indexing process current directory. The input handlers sometimes
              leave temporary files in the current directory, so it makes
              sense to have recollindex chdir to some temporary directory. If
              the value is empty, the current directory is not changed. If the
              value is (literal) tmp, we use the temporary directory as set by
              the environment (RECOLL_TMPDIR else TMPDIR else /tmp). If the
              value is an absolute path to a directory, we go there.
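
       Here is a Python sketch of how the idxrundir value could be turned
       into a directory. The function name is hypothetical. Treating an
       empty environment value as unset is an assumption, and relative
       values (not described here) simply return None.

```python
import os

def rundir(value, environ=os.environ):
    """Directory recollindex would chdir to, or None to stay where it is."""
    if not value:
        return None
    if value == "tmp":
        return environ.get("RECOLL_TMPDIR") or environ.get("TMPDIR") or "/tmp"
    if os.path.isabs(value):
        return value
    return None   # relative values: behaviour not documented here (assumption)
```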

       checkneedretryindexscript = fn
              Script used to heuristically check if we need to retry indexing
              files which previously failed. The default script checks the
              modification dates on /usr/bin and /usr/local/bin. A relative
              path will be looked up in the filters dirs, then in the PATH.
              Use an absolute path to do otherwise.

       recollhelperpath = string
              Additional places to search for helper executables. This is only
              used on Windows for now.

       idxabsmlen = int
              Length of abstracts we store while indexing. Recoll stores an
              abstract for each indexed file. The text can come from an
              actual 'abstract' section in the document or will just be the
              beginning of the document. It is stored in the index so that it
              can be displayed inside the result lists without decoding the
              original file. The idxabsmlen parameter defines the size of the
              stored abstract. The default value is 250 bytes. The search
              interface gives you the choice to display this stored text or a
              synthetic abstract built by extracting text around the search
              terms. If you always prefer the synthetic abstract, you can
              reduce this value and save a little space.

       idxmetastoredlen = int
              Truncation length of stored metadata fields. This does not
              affect indexing (the whole field is processed anyway), just the
              amount of data stored in the index for the purpose of displaying
              fields inside result lists or previews. The default value is 150
              bytes, which may be too low if you have custom fields.

       idxtexttruncatelen = int
              Truncation length for all document texts. Only index the
              beginning of documents. This is not recommended except if you
              are sure that the interesting keywords are at the top and have
              severe disk space issues.

       aspellLanguage = string
              Language definitions to use when creating the aspell dictionary.
              The value must match a set of aspell language definition files.
              You can type "aspell dicts" to see a list. The default if this
              is not set is to use the NLS environment to guess the value.

       aspellAddCreateParam = string
              Additional option and parameter to the aspell dictionary
              creation command. Some aspell packages may need an additional
              option (e.g. on Debian Jessie: --local-data-dir=/usr/lib/aspell).
              See Debian bug 772415.

       aspellKeepStderr = bool
              Set this to have a look at aspell dictionary creation errors.
              There are always many, so this is mostly for debugging.

       noaspell = bool
              Disable aspell use. The aspell dictionary generation takes time,
              and some combinations of aspell version, language, and local
              terms result in aspell crashing, so it sometimes makes sense to
              just disable the thing.

       monauxinterval = int
              Auxiliary database update interval. The real-time indexer only
              updates the auxiliary databases (stemdb, aspell) periodically,
              because it would be too costly to do it for every document
              change. The default period is one hour.

       monixinterval = int
              Minimum interval (seconds) between processings of the indexing
              queue. The real-time indexer does not process each event when it
              comes in, but lets the queue accumulate, to diminish overhead
              and to aggregate multiple events affecting the same file.
              Default 30 seconds.

       mondelaypatterns = string
              Timing parameters for the real-time indexing. Definitions for
              files which get a longer delay before reindexing is allowed.
              This is for fast-changing files, that should only be reindexed
              once in a while. A list of wildcardPattern:seconds pairs. The
              patterns are matched with fnmatch(pattern, path, 0). You can
              quote entries containing white space with double quotes (quote
              the whole entry, not the pattern). The default is empty.
              Example: mondelaypatterns = *.log:20 "*with spaces.*:30"
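
       Here is a Python sketch of parsing and applying mondelaypatterns. It
       rests on three substitutions and assumptions. Python's shlex stands
       in for the quote handling; it also honours backslashes and single
       quotes, which may differ from Recoll. fnmatchcase() stands in for
       fnmatch(3) with flags 0. Returning the first match, or 0 when
       nothing matches, is an assumption.

```python
import shlex
from fnmatch import fnmatchcase

def parse_delays(value):
    """'*.log:20 "*with spaces.*:30"' -> [('*.log', 20), ('*with spaces.*', 30)]"""
    pairs = []
    for entry in shlex.split(value):           # removes the double quotes
        pattern, _, seconds = entry.rpartition(":")
        pairs.append((pattern, int(seconds)))
    return pairs

def reindex_delay(path, pairs, default=0):
    """Delay for the first matching pattern; flags 0, so '*' also matches '/'."""
    for pattern, seconds in pairs:
        if fnmatchcase(path, pattern):
            return seconds
    return default
```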

       monioniceclass = int
              ionice class for the real-time indexing process, on platforms
              where this is supported. The default value is 3.

       monioniceclassdata = string
              ionice class parameter for the real-time indexing process, on
              platforms where this is supported. The default is empty.

       autodiacsens = bool
              Auto-trigger diacritics sensitivity (raw index only). If the
              index is not stripped, decide if we automatically trigger
              diacritics sensitivity if the search term has accented
              characters (not in unac_except_trans). Otherwise you need to use
              the query language and the "D" modifier to specify diacritics
              sensitivity. Default is no.

       autocasesens = bool
              Auto-trigger case sensitivity (raw index only). If the index is
              not stripped (see indexStripChars), decide if we automatically
              trigger character case sensitivity if the search term has
              upper-case characters in any but the first position. Otherwise
              you need to use the query language and the "C" modifier to
              specify character-case sensitivity. Default is yes.
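
       The two automatic triggers can be expressed as small Python
       predicates. The function names are hypothetical. Detecting accented
       characters through Unicode NFD decomposition is an assumption,
       standing in for Recoll's own test.

```python
import unicodedata

def wants_case_sensitivity(term):
    """autocasesens rule: an upper-case character anywhere but first position."""
    return any(ch.isupper() for ch in term[1:])

def wants_diacritics_sensitivity(term, except_chars=""):
    """autodiacsens rule: an accented character not listed in unac_except_trans."""
    for ch in term:
        if ch in except_chars:
            continue
        if any(unicodedata.combining(c) for c in unicodedata.normalize("NFD", ch)):
            return True
    return False
```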

       maxTermExpand = int
              Maximum query expansion count for a single term (e.g.: when
              using wildcards). This only affects queries, not indexing. We
              used not to limit this at all (except for filenames, where the
              limit was too low at 1000), but it is unreasonable with a big
              index. Default 10000.

       maxXapianClauses = int
              Maximum number of clauses we add to a single Xapian query. This
              only affects queries, not indexing. In some cases, the result of
              term expansion can be multiplicative, and we want to avoid
              eating all the memory. Default 50000.

       snippetMaxPosWalk = int
              Maximum number of positions we walk while populating a snippet
              for the result list. The default of 1,000,000 may be
              insufficient for very big documents; the consequence would be
              snippets with possibly meaning-altering missing words.

       pdfocr = bool
              Attempt OCR of PDF files with no text content if both tesseract
              and pdftoppm are installed. The default is off because OCR is so
              very slow.

       pdfocrlang = string
              Language to assume for PDF OCR. This is very important for
              having a reasonable rate of errors with tesseract. This can also
              be set through a configuration variable or directory-local
              parameters. See the rclpdf.py script.

       pdfattach = bool
              Enable PDF attachment extraction by executing pdftk (if
              available). This is normally disabled, because it does slow down
              PDF indexing a bit even if not one attachment is ever found.

       pdfextrameta = string
              Extract text from selected XMP metadata tags. This is a
              space-separated list of qualified XMP tag names. Each element
              can also include a translation to a Recoll field name, separated
              by a '|' character. If the second element is absent, the tag
              name is used as the Recoll field name. You will also need to add
              specifications to the "fields" file to direct processing of the
              extracted data.
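
       Here is a Python sketch of how the list could be parsed into
       (XMP tag, Recoll field) pairs. The function name and the
       'bibtex:pages' tag used in the example are hypothetical.

```python
def parse_extrameta(value):
    """'bibtex:pages|pages xmp:CreatorTool' -> [('bibtex:pages', 'pages'), ...]"""
    result = []
    for item in value.split():
        tag, _, field = item.partition("|")
        result.append((tag, field or tag))   # no '|': the tag name is the field
    return result
```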

       pdfextrametafix = fn
              Define the name of the XMP field editing script. This defines
              the name of a script to be loaded for editing XMP field values.
              The script should define a 'MetaFixer' class with a metafix()
              method which will be called with the qualified tag name and
              value of each selected field, for editing or erasing. A new
              instance is created for each document, so that the object can
              keep state for, e.g., eliminating duplicate values.
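
       Here is a minimal sketch of such a script. It keeps per-document
       state in order to drop duplicate values. Several details are
       assumptions: the metafix(nm, txt) argument names, the convention
       that the returned string replaces the value and an empty string
       erases it, and the whitespace clean-up. Check the rclpdf.py handler
       for the exact contract.

```python
import re

class MetaFixer:
    """Illustrative editing script: one instance is created per document."""

    def __init__(self):
        self.seen = set()   # (tag, value) pairs already emitted for this document

    def metafix(self, nm, txt):
        # Assumption: the returned string replaces the value, "" erases it.
        txt = re.sub(r"\s+", " ", txt).strip()
        if (nm, txt) in self.seen:
            return ""           # duplicate value for this tag: drop it
        self.seen.add((nm, txt))
        return txt
```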

       mhmboxquirks = string
              Enable Thunderbird/Mozilla-SeaMonkey mbox format quirks. Set
              this for the directory where the email mbox files are stored.

SEE ALSO
       recollindex(1) recoll(1)



                                14 November 2012                RECOLL.CONF(5)