RECOLL.CONF(5)                File Formats Manual               RECOLL.CONF(5)

NAME

       recoll.conf - main personal configuration file for Recoll

DESCRIPTION

       This file defines the index configuration for the Recoll full-text
       search system.

       The system-wide configuration file is normally located inside
       /usr/[local]/share/recoll/examples. Any parameter set in the common
       file may be overridden by setting it in the personal configuration
       file, by default: $HOME/.recoll/recoll.conf

       Please note that while I try to keep this manual page reasonably up to
       date, it will frequently lag behind the current state of the software.
       The best sources of information about the configuration are the
       comments in the system-wide configuration file and the user manual,
       which you can access from the recoll GUI help menu or on the recoll
       web site.
22
23
       A short extract of the file might look as follows:

              # Space-separated list of directories to index.
              topdirs =  ~/docs /usr/share/doc

              [~/somedirectory-with-utf8-txt-files]
              defaultcharset = utf-8

       There are three kinds of lines:

              •      Comment or empty

              •      Parameter assignment

              •      Section definition

       Empty lines or lines beginning with # are ignored.

       Assignment lines are in the form 'name = value'.

       Section lines allow redefining a parameter for a directory subtree.
       Some of the parameters used for indexing are looked up hierarchically
       from the more to the less specific. Not all parameters can be
       meaningfully redefined; this is specified for each in the next
       section.

       The tilde character (~) is expanded in file names to the name of the
       user's home directory.

       Where values are lists, white space is used for separation, and
       elements with embedded spaces can be quoted with double quotes.
55
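       As a sketch of the three kinds of lines used together (the paths and
       values are illustrative assumptions, not defaults):

              # A comment line. The second list element contains a space,
              # so it is enclosed in double quotes.
              topdirs = ~/docs "/home/me/My Documents"

              # Section line: the setting below applies to this subtree.
              [~/docs/legacy]
              defaultcharset = iso-8859-1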

OPTIONS

       topdirs = string
              Space-separated list of files or directories to recursively
              index. Defaults to ~ (indexes $HOME). You can use symbolic
              links in the list; they will be followed, independently of the
              value of the followLinks variable.
62
63       monitordirs = string
64              Space-separated  list of files or directories to monitor for up‐
65              dates. When running the real-time indexer, this allows  monitor‐
66              ing  only  a subset of the whole indexed area. The elements must
67              be included in the tree defined by the 'topdirs' members.
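
              For instance, assuming two illustrative directories, the
              following indexes both trees but monitors only one of them:

                     topdirs = ~/docs ~/projects
                     monitordirs = ~/projects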
68
69       skippedNames = string
70              Files and directories which should be ignored.  White space sep‐
71              arated  list  of wildcard patterns (simple ones, not paths, must
72              contain no / ), which will be tested against file and  directory
73              names.   The  list in the default configuration does not exclude
74              hidden directories (names beginning with  a  dot),  which  means
75              that  it  may  index quite a few things that you do not want. On
76              the other hand, email user agents like Thunderbird usually store
77              messages  in  hidden directories, and you probably want this in‐
78              dexed. One possible solution is to have ".*" in  "skippedNames",
79              and   add   things   like   "~/.thunderbird"  "~/.evolution"  to
80              "topdirs".  Not even the file names are indexed for patterns  in
81              this  list, see the "noContentSuffixes" variable for an alterna‐
82              tive approach which indexes the file names. Can be redefined for
83              any subtree.
84
85       skippedNames- = string
86              List  of  name  endings  to remove from the default skippedNames
87              list.
88
89       skippedNames+ = string
90              List of name endings to add to the default skippedNames list.
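
              As a sketch of the hidden-directory approach described under
              skippedNames (the directory names are illustrative and depend
              on the mail clients in use):

                     # Add the hidden-name pattern to the default list.
                     skippedNames+ = .*
                     # Still index selected hidden directories explicitly.
                     topdirs = ~ ~/.thunderbird ~/.evolution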
91
       onlyNames = string
              Regular file name filter patterns. If this is set, only the
              file names not in skippedNames and matching one of the patterns
              will be considered for indexing. Can be redefined per subtree.
              Does not apply to directories.
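
              For example, to consider only two kinds of files below one
              (hypothetical) directory:

                     [~/projects/papers]
                     onlyNames = *.pdf *.tex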
97
98       noContentSuffixes = string
99              List  of  name  endings (not necessarily dot-separated suffixes)
100              for which we don't try MIME type identification, and  don't  un‐
101              compress  or index content. Only the names will be indexed. This
102              complements the  now  obsoleted  recoll_noindex  list  from  the
103              mimemap  file,  which will go away in a future release (the move
104              from mimemap to recoll.conf allows editing the list through  the
105              GUI). This is different from skippedNames because these are name
106              ending matches only (not wildcard patterns), and the  file  name
107              itself  gets  indexed normally. This can be redefined for subdi‐
108              rectories.
109
110       noContentSuffixes- = string
111              List of name endings to remove from  the  default  noContentSuf‐
112              fixes list.
113
114       noContentSuffixes+ = string
115              List  of  name  endings  to add to the default noContentSuffixes
116              list.
117
118       skippedPaths = string
119              Absolute paths we should not go into.  Space-separated  list  of
120              wildcard  expressions for absolute filesystem paths. Must be de‐
121              fined at the top level of the configuration file, not in a  sub‐
122              section.  Can  contain  files  and directories. The database and
123              configuration directories will automatically be added.  The  ex‐
124              pressions  are  matched using 'fnmatch(3)' with the FNM_PATHNAME
125              flag set by default. This means  that  '/'  characters  must  be
126              matched  explicitly.  You can set 'skippedPathsFnmPathname' to 0
127              to disable the use of FNM_PATHNAME (meaning that '/*/dir3'  will
128              match  '/dir1/dir2/dir3').  The default value contains the usual
129              mount point for removable media to remind you that it is  a  bad
130              idea  to have Recoll work on these (esp. with the monitor: media
131              gets indexed on mount, all data gets erased on unmount). Explic‐
132              itly adding '/media/xxx' to the 'topdirs' variable will override
133              this.
134
135       skippedPathsFnmPathname = bool
136              Set to 0 to override use of FNM_PATHNAME  for  matching  skipped
137              paths.
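
              A sketch of how the two parameters interact (the paths are
              illustrative):

                     # With FNM_PATHNAME (the default), '*' does not match
                     # '/', so '/*/dir3' matches '/dir1/dir3' but not
                     # '/dir1/dir2/dir3'.
                     skippedPaths = /media /*/dir3

                     # Let '*' match '/' too, so that '/*/dir3' also
                     # matches '/dir1/dir2/dir3'.
                     skippedPathsFnmPathname = 0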
138
139       nowalkfn = string
140              File  name  which will cause its parent directory to be skipped.
141              Any directory containing a file with this name will  be  skipped
142              as if it was part of the skippedPaths list. Ex: .recoll-noindex
143
144       daemSkippedPaths = string
145              skippedPaths equivalent specific to real time indexing. This en‐
146              ables having parts of the tree which are initially  indexed  but
147              not  monitored.  If daemSkippedPaths is not set, the daemon uses
148              skippedPaths.
149
       zipUseSkippedNames = bool
              Use skippedNames inside Zip archives. Fetched directly by the
              rclzip.py handler. Skip the patterns defined by skippedNames
              inside Zip archives. Can be redefined for subdirectories. See
              https://www.lesbonscomptes.com/recoll/faqsandhowtos/FilteringOutZipArchiveMembers.html

       zipSkippedNames = string
              Space-separated list of wildcard expressions for names that
              should be ignored inside zip archives. This is used directly by
              the zip handler. If zipUseSkippedNames is not set,
              zipSkippedNames defines the patterns to be skipped inside
              archives. If zipUseSkippedNames is set, the two lists are
              concatenated and used. Can be redefined for subdirectories. See
              https://www.lesbonscomptes.com/recoll/faqsandhowtos/FilteringOutZipArchiveMembers.html
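
              For example, to apply the skippedNames patterns inside
              archives and add two more (the patterns are illustrative):

                     zipUseSkippedNames = 1
                     zipSkippedNames = *.bak *.tmp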
167
168
169       followLinks = bool
170              Follow  symbolic links during indexing. The default is to ignore
171              symbolic links to avoid multiple indexing of  linked  files.  No
172              effort  is  made to avoid duplication when this option is set to
173              true. This option can  be  set  individually  for  each  of  the
174              'topdirs' members by using sections. It can not be changed below
175              the 'topdirs' level. Links in the 'topdirs' list itself are  al‐
176              ways followed.
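
              A minimal sketch of per-member settings, assuming two
              illustrative topdirs entries:

                     topdirs = ~/docs ~/projects
                     # Global value: do not follow symbolic links.
                     followLinks = 0

                     # Follow links below this topdirs member only.
                     [~/projects]
                     followLinks = 1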
177
       indexedmimetypes = string
              Restrictive list of indexed MIME types. Normally not set (in
              which case all supported types are indexed). If it is set, only
              the types from the list will have their contents indexed. The
              names will be indexed anyway if indexallfilenames is set
              (default). MIME type names should be taken from the mimemap
              file (the values may be different from xdg-mime or file -i
              output in some cases). Can be redefined for subtrees.

       excludedmimetypes = string
              List of excluded MIME types. Lets you exclude some types from
              indexing. MIME type names should be taken from the mimemap file
              (the values may be different from xdg-mime or file -i output in
              some cases). Can be redefined for subtrees.
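
              As an illustration, the following restricts content indexing
              to three types. The type names are assumed to match entries in
              your mimemap file; check them there before use:

                     indexedmimetypes = text/plain text/html application/pdf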
192
193       nomd5types = string
194              Don't compute md5 for these types. md5 checksums are  used  only
195              for  deduplicating results, and can be very expensive to compute
196              on multimedia or other big files. This list lets  you  turn  off
197              md5  computation  for selected types. It is global (no redefini‐
198              tion for subtrees). At the moment, it only has an effect for ex‐
199              ternal  handlers  (exec and execm). The file types can be speci‐
200              fied by listing either MIME types (e.g. audio/mpeg)  or  handler
201              names (e.g. rclaudio.py).
202
203       compressedfilemaxkbs = int
204              Size  limit for compressed files. We need to decompress these in
205              a temporary directory for identification, which can be  wasteful
206              in  some  cases. Limit the waste. Negative means no limit. 0 re‐
207              sults in no processing of any compressed file. Default 50 MB.
208
209       textfilemaxmbs = int
210              Size limit for text files. Mostly for skipping monster logs. De‐
211              fault 20 MB.
212
       indexallfilenames = bool
              Index the file names of unprocessed files. Index the names of
              files the contents of which we don't index because of an
              excluded or unsupported MIME type.

       usesystemfilecommand = bool
              Use a system command for file MIME type guessing as a final
              step in file type identification. This is generally useful, but
              will usually cause the indexing of many bogus 'text' files. See
              'systemfilecommand' for the command used.

       systemfilecommand = string
              Command used to guess MIME types if the internal methods fail.
              This should be a "file -i" workalike. The file path will be
              added as a last parameter to the command line. "xdg-mime" works
              better than the traditional "file" command, and is now the
              configured default (with a hard-coded fallback to "file").
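
              For example, to go back to the traditional command mentioned
              above instead of the configured default:

                     usesystemfilecommand = 1
                     systemfilecommand = file -i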
230
231       processwebqueue = bool
232              Decide if we process the Web queue. The  queue  is  a  directory
233              where  the  Recoll Web browser plugins create the copies of vis‐
234              ited pages.
235
236       textfilepagekbs = int
237              Page size for text files. If this is set, text/plain files  will
238              be  divided  into documents of approximately this size. Will re‐
239              duce memory usage at index time and help with  loading  data  in
240              the  preview window at query time. Particularly useful with very
241              big  files,  such  as  application  or  system  logs.  Also  see
242              textfilemaxmbs and compressedfilemaxkbs.
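
              The three size parameters can be combined as in this sketch.
              The values are illustrative, not recommendations. Note that
              the units differ, as the parameter names suggest (KB for
              compressedfilemaxkbs and textfilepagekbs, MB for
              textfilemaxmbs):

                     # Do not decompress files bigger than about 50 MB.
                     compressedfilemaxkbs = 50000
                     # Skip text files bigger than 20 MB.
                     textfilemaxmbs = 20
                     # Split remaining text files into pages of about 1 MB.
                     textfilepagekbs = 1000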
243
244       membermaxkbs = int
245              Size limit for archive members. This is passed to the filters in
246              the environment as RECOLL_FILTER_MAXMEMBERKB.
247
248       indexStripChars = bool
249              Decide if we store character case and diacritics in  the  index.
250              If  we do, searches sensitive to case and diacritics can be per‐
251              formed, but the index will be bigger, and some  marginal  weird‐
252              ness  may sometimes occur. The default is a stripped index. When
253              using multiple indexes for a search, this parameter must be  de‐
254              fined  identically  for all. Changing the value implies an index
255              reset.
256
       indexStoreDocText = bool
              Decide if we store the documents' text content in the index.
              Storing the text allows extracting snippets from it at query
              time, instead of building them from index position data.

              Newer Xapian index formats have rendered our use of position
              lists unacceptably slow in some cases. The last Xapian index
              format with good performance for the old method is Chert, which
              is the default for 1.2, still supported but not the default in
              1.4, and will be dropped in 1.6.

              The stored document text is translated from its original format
              to UTF-8 plain text, but not stripped of upper-case,
              diacritics, or punctuation signs. Storing it typically
              increases the index size by 10-20%, but also allows for nicer
              snippets, so it may be worth enabling it even if not strictly
              needed for performance if you can afford the space.

              The variable only has an effect when creating an index, meaning
              that the xapiandb directory must not exist yet. Its exact
              effect depends on the Xapian version.

              For Xapian 1.4, if the variable is set to 0, the Chert format
              will be used, and the text will not be stored. If the variable
              is 1, Glass will be used, and the text stored.

              For Xapian 1.2, and for versions 1.5 and newer, the index
              format is always the default, but the variable controls if the
              text is stored or not, and the abstract generation method. With
              Xapian 1.5 and later, and the variable set to 0, abstract
              generation may be very slow, but this setting may still be
              useful to save space if you do not use abstract generation at
              all.
282
283
       nonumbers = bool
              Decides if terms will be generated for numbers. For example,
              "123", "1.5e6" and "192.168.1.4" would not be indexed if
              nonumbers is set ("value123" would still be). Numbers are often
              quite interesting to search for, and this should probably not
              be set except for special situations, e.g. scientific documents
              with huge amounts of numbers in them, where setting nonumbers
              will reduce the index size. This can only be set for a whole
              index, not for a subtree.
293
294       dehyphenate = bool
295              Determines if we index 'coworker' also when the  input  is  'co-
296              worker'. This is new in version 1.22, and on by default. Setting
297              the variable to off allows restoring the previous behaviour.
298
299       backslashasletter = bool
300              Process backslash as normal letter. This may make sense for peo‐
301              ple  wanting  to  index  TeX commands as such but is not of much
302              general use.
303
304       underscoreasletter = bool
305              Process underscore as normal letter. This makes sense in so many
306              cases that one wonders if it should not be the default.
307
308       maxtermlength = int
309              Maximum  term  length. Words longer than this will be discarded.
310              The default is 40 and used to be hard-coded, but it can  now  be
311              adjusted. You need an index reset if you change the value.
312
313       nocjk = bool
314              Decides if specific East Asian (Chinese Korean Japanese) charac‐
315              ters/word splitting is turned off. This will save a small amount
316              of  CPU if you have no CJK documents. If your document base does
317              include such text but you are not interested  in  searching  it,
318              setting nocjk may be a significant time and space saver.
319
320       cjkngramlen = int
321              This  lets  you adjust the size of n-grams used for indexing CJK
322              text. The default value of 2 is  probably  appropriate  in  most
323              cases. A value of 3 would allow more precision and efficiency on
324              longer words, but the  index  will  be  approximately  twice  as
325              large.
326
327       indexstemminglanguages = string
328              Languages  for  which to create stemming expansion data. Stemmer
329              names can be found by executing 'recollindex -l',  or  this  can
330              also be set from a list in the GUI. The values are full language
331              names, e.g. english, french...
332
333       defaultcharset = string
334              Default character set. This is used for files which do not  con‐
335              tain a character set definition (e.g.: text/plain). Values found
336              inside files, e.g. a 'charset' tag in HTML documents, will over‐
337              ride  it.  If  this is not set, the default character set is the
338              one defined by the NLS environment ($LC_ALL, $LC_CTYPE,  $LANG),
339              or  ultimately iso-8859-1 (cp-1252 in fact).  If for some reason
340              you want a general default which does not match your LANG and is
341              not  8859-1,  use  this  variable. This can be redefined for any
342              sub-directory.
343
       unac_except_trans = string
              A list of characters, encoded in UTF-8, which should be handled
              specially when converting text to unaccented lowercase. For
              example, in Swedish, the letter a with diaeresis has full
              alphabet citizenship and should not be turned into an a. Each
              element in the space-separated list has the special character
              as first element and the translation following. The handling of
              both the lowercase and upper-case versions of a character
              should be specified, as membership in the list will turn off
              both standard accent and case processing. The value is global
              and affects both indexing and querying. Examples:

              Swedish:
                     unac_except_trans = ää Ää öö Öö üü Üü ßss œoe Œoe æae Æae ﬀff ﬁfi ﬂfl åå Åå

              German:
                     unac_except_trans = ää Ää öö Öö üü Üü ßss œoe Œoe æae Æae ﬀff ﬁfi ﬂfl

              French: you probably want to decompose oe and ae and nobody
              would type a German ß
                     unac_except_trans = ßss œoe Œoe æae Æae ﬀff ﬁfi ﬂfl

              The default for all until someone protests follows. These
              decompositions are not performed by unac, but it is unlikely
              that someone would type the composed forms in a search.
                     unac_except_trans = ßss œoe Œoe æae Æae ﬀff ﬁfi ﬂfl
368
369       maildefcharset = string
370              Overrides the default character set  for  email  messages  which
371              don't  specify  one.  This is mainly useful for readpst (libpst)
372              dumps, which are utf-8 but do not say so.
373
374       localfields = string
375              Set fields on all files (usually of a specific fs area).  Syntax
376              is  the  usual:  name  =  value ; attr1 = val1 ; [...]  value is
377              empty so this needs an initial semi-colon. This is useful, e.g.,
378              for  setting  the rclaptg field for application selection inside
379              mimeview.
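
              For example, following the syntax described above (the
              directory and the field value are hypothetical):

                     [~/mail/work]
                     localfields = ; rclaptg = workmail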
380
       testmodifusemtime = bool
              Use mtime instead of ctime to test if a file has been modified.
              The time is used in addition to the size, which is always used.
              Setting this can reduce re-indexing on systems where extended
              attributes are used (by some other application), but not
              indexed, because changing extended attributes only affects
              ctime. Notes:

              •      This may prevent detection of change in some marginal
                     file rename cases (the target would need to have the
                     same size and mtime).

              •      You should probably also set noxattrfields to 1 in this
                     case, except if you still prefer to perform xattr
                     indexing, for example if the local file update pattern
                     makes it of value (as in general, there is a risk for
                     pure extended attributes updates without file
                     modification to go undetected).

              Perform a full index reset after changing this.
395
396
397       noxattrfields = bool
398              Disable extended attributes conversion to metadata fields.  This
399              probably needs to be set if testmodifusemtime is set.
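
              The combination suggested above would be written as:

                     testmodifusemtime = 1
                     noxattrfields = 1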
400
       metadatacmds = string
              Define commands to gather external metadata, e.g. tmsu tags.
              There can be several entries, separated by semi-colons, each
              defining which field name the data goes into and the command to
              use. Don't forget the initial semi-colon. All the field names
              must be different. You can use aliases in the "fields" file if
              necessary. As a not too pretty hack conceded to convenience,
              any field name beginning with "rclmulti" will be taken as an
              indication that the command returns multiple field values
              inside a text blob formatted as a recoll configuration file
              ("fieldname = fieldvalue" lines). The rclmultixx name will be
              ignored, and field names and values will be parsed from the
              data. Example:

                     metadatacmds = ; tags = tmsu tags %f; rclmulti1 = cmdOutputsConf %f
415
416
417       cachedir = dfn
418              Top  directory for Recoll data. Recoll data directories are nor‐
419              mally located relative  to  the  configuration  directory  (e.g.
420              ~/.recoll/xapiandb,  ~/.recoll/mboxcache). If 'cachedir' is set,
421              the directories are stored under  the  specified  value  instead
422              (e.g. if cachedir is ~/.cache/recoll, the default dbdir would be
423              ~/.cache/recoll/xapiandb).   This  affects  dbdir,  webcachedir,
424              mboxcachedir,  aspellDicDir,  which  can  still  be individually
425              specified to override cachedir.  Note that if you have  multiple
426              configurations, each must have a different cachedir, there is no
427              automatic computation of a subpath under cachedir.
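
              For example, with the value used in the description above
              (the dbdir path is an illustrative assumption):

                     cachedir = ~/.cache/recoll
                     # Without the next line, dbdir would default to
                     # ~/.cache/recoll/xapiandb. An individual setting
                     # still overrides cachedir:
                     dbdir = /data/recoll/xapiandb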
428
429       maxfsoccuppc = int
430              Maximum file system occupation over which we stop indexing.  The
431              value  is  a percentage, corresponding to what the "Capacity" df
432              output column shows. The default value is 0, meaning  no  check‐
433              ing.
434
435       dbdir = dfn
436              Xapian  database  directory  location.  This  will be created on
437              first indexing. If the value is not an absolute path, it will be
438              interpreted as relative to cachedir if set, or the configuration
439              directory (-c argument or $RECOLL_CONFDIR).  If nothing is spec‐
440              ified, the default is then ~/.recoll/xapiandb/
441
442       idxstatusfile = fn
443              Name  of  the scratch file where the indexer process updates its
444              status. Default: idxstatus.txt inside the  configuration  direc‐
445              tory.
446
447       mboxcachedir = dfn
448              Directory location for storing mbox message offsets cache files.
449              This is normally 'mboxcache' under cachedir if set, or else  un‐
450              der the configuration directory, but it may be useful to share a
451              directory between different configurations.
452
453       mboxcacheminmbs = int
454              Minimum mbox file size over which we cache the offsets. There is
455              really  no sense in caching offsets for small files. The default
456              is 5 MB.
457
458       mboxmaxmsgmbs = int
459              Maximum mbox member message size in megabytes. Size  over  which
460              we  assume  that the mbox format is bad or we misinterpreted it,
461              at which point we just stop processing the file.
462
       webcachedir = dfn
              Directory where we store the archived web pages. This is only
              used by the web history indexing code. Default:
              cachedir/webcache if cachedir is set, else
              $RECOLL_CONFDIR/webcache.
467
468       webcachemaxmbs = int
469              Maximum size in MB of the Web archive. This is only used by  the
470              web  history  indexing code.  Default: 40 MB.  Reducing the size
471              will not physically truncate the file.
472
473       webqueuedir = fn
474              The path to the Web indexing queue. This used to  be  hard-coded
475              in  the  old plugin as ~/.recollweb/ToIndex so there would be no
476              need or possibility to change it, but the  WebExtensions  plugin
477              now  downloads  the files to the user Downloads directory, and a
478              script moves them to webqueuedir. The script  reads  this  value
479              from the config so it has become possible to change it.
480
481       webdownloadsdir = fn
482              The  path  to browser downloads directory. This is where the new
483              browser add-on extension has to create the files. They are  then
484              moved by a script to webqueuedir.
485
486       aspellDicDir = dfn
487              Aspell dictionary storage directory location. The aspell dictio‐
488              nary (aspdict.(lang).rws) is normally stored  in  the  directory
489              specified  by cachedir if set, or under the configuration direc‐
490              tory.
491
492       filtersdir = dfn
493              Directory location for executable input handlers. If RECOLL_FIL‐
494              TERSDIR  is  set in the environment, we use it instead. Defaults
495              to $prefix/share/recoll/filters. Can be redefined for  subdirec‐
496              tories.
497
498       iconsdir = dfn
499              Directory  location  for  icons.  The only reason to change this
500              would be if you want to change the icons displayed in the result
501              list. Defaults to $prefix/share/recoll/images
502
       idxflushmb = int
              Threshold (megabytes of new data) where we flush from memory to
              disk index. Setting this allows some control over memory usage
              by the indexer process. A value of 0 means no explicit
              flushing, which lets Xapian perform its own thing, meaning
              flushing every $XAPIAN_FLUSH_THRESHOLD documents created,
              modified or deleted. As memory usage depends on average
              document size, not only document count, the Xapian approach is
              not very useful, and you should let Recoll manage the flushes.
              The program compiled value is 0. The configured default value
              (from this file) is now 50 MB, and should be ok in many cases.
              You can set it as low as 10 to conserve memory, but if you are
              looking for maximum speed, you may want to experiment with
              values between 20 and 200. In my experience, values beyond this
              are always counterproductive. If you find otherwise, please
              drop me a note.

       filtermaxseconds = int
              Maximum external filter execution time in seconds. Default 1200
              (20 min). Set to 0 for no limit. This is mainly to avoid
              infinite loops in postscript files (loop.ps).

       filtermaxmbytes = int
              Maximum virtual memory space for filter processes
              (setrlimit(RLIMIT_AS)), in megabytes. Note that this includes
              any mapped libs (there is no reliable Linux way to limit the
              data space only), so we need to be a bit generous here.
              Anything over 2000 will be ignored on 32-bit machines. The
              previous default value of 2000 would prevent java pdftk from
              working when executed from the Python rclpdf.py.
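
              As a sketch of tuning the indexer resource parameters above
              (the values are illustrative starting points, not
              recommendations):

                     # Flush to disk after about 100 MB of new data.
                     idxflushmb = 100
                     # Give up on an external filter after 10 minutes.
                     filtermaxseconds = 600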
532
533       thrQSizes = string
534              Stage input  queues  configuration.  There  are  three  internal
535              queues  in  the  indexing pipeline stages (file data extraction,
536              terms generation, index  update).  This  parameter  defines  the
537              queue  depths  for each stage (three integer values). If a value
538              of -1 is given for a given stage, no  queue  is  used,  and  the
539              thread  will  go on performing the next stage. In practise, deep
540              queues have not been shown to increase performance.  Default:  a
541              value  of 0 for the first queue tells Recoll to perform autocon‐
542              figuration based on the detected number of CPUs (no need for the
543              two  other  values  in  this case).  Use thrQSizes = -1 -1 -1 to
544              disable multithreading entirely.
545
546       thrTCounts = string
547              Number of threads used for each indexing stage. The three stages
548              are:  file data extraction, terms generation, index update). The
549              use of the counts is also controlled by some special  values  in
550              thrQSizes: if the first queue depth is 0, all counts are ignored
551              (autoconfigured); if a value of -1 is used for  a  queue  depth,
552              the  corresponding thread count is ignored. It makes no sense to
553              use a value other than 1 for the last stage because updating the
554              Xapian  index is necessarily single-threaded (and protected by a
555              mutex).
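
                 As an illustration of how thrQSizes and thrTCounts combine,
                 the fragment below is a sketch only: the specific depths and
                 counts are assumed values for a multi-core machine, not
                 recommended settings. It uses three queues of depth 2, with
                 4 threads for data extraction, 2 for terms generation and 1
                 for the index update.

```conf
# Illustrative values only (assumption): queue depth 2 for each stage.
thrQSizes = 2 2 2
# 4 extraction threads, 2 term generation threads, 1 index update thread.
thrTCounts = 4 2 1

# Alternative: disable multithreading entirely.
# thrQSizes = -1 -1 -1
```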
556
557       loglevel = int
558              Log file verbosity 1-6. A value of 2 will print only errors  and
559              warnings.  3  will print information like document updates, 4 is
560              quite verbose and 6 very verbose.
561
562       logfilename = fn
563              Log file destination. Use 'stderr' (default)  to  write  to  the
564              console.
565
566       idxloglevel = int
567              Override loglevel for the indexer.
568
569       idxlogfilename = fn
570              Override logfilename for the indexer.
571
572       daemloglevel = int
573              Override loglevel for the indexer in real time mode. The default
574              is to use the idx... values if set, else the log... values.
575
576       daemlogfilename = fn
577              Override logfilename for the indexer in real time mode. The  de‐
578              fault  is  to use the idx... values if set, else the log... val‐
579              ues.
580
581       pyloglevel = int
582              Override loglevel for the python module.
583
584       pylogfilename = fn
585              Override logfilename for the python module.
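
                 The logging parameters above could be combined as in the
                 following sketch. The log file path is a hypothetical example,
                 and the chosen levels are arbitrary.

```conf
# GUI and general logging: information level, to the console.
loglevel = 3
logfilename = stderr

# More verbose logging for the indexer, to a file.
# The path below is a hypothetical example.
idxloglevel = 4
idxlogfilename = /tmp/recollindex.log
```

                 With these settings the real time indexer would also use the
                 idx... values, since no daem... values are set.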
586
587       orgidxconfdir = dfn
588              Original location of the configuration directory. This  is  used
589              exclusively for movable datasets. Locating the configuration di‐
590              rectory inside the directory tree makes it possible  to  provide
591              automatic  query  time  path  translations once the data set has
592              moved (for example, because it has been mounted on another loca‐
593              tion).
594
595       curidxconfdir = dfn
596              Current  location  of  the  configuration directory. Complements
597              orgidxconfdir for movable datasets. This should be used  if  the
598              configuration  directory  has  been  copied  from the dataset to
599              another location, either because the dataset  is  read-only  and
600              an r/w copy is desired, or for performance reasons. This records
601              the original moved location  before  copy,  to  allow  path
602              translation computations. For example, if a dataset originally
603              indexed as '/home/me/mydata/config' has been mounted to
604              '/media/me/mydata', and the GUI is running from a copied
605              configuration, orgidxconfdir would be '/home/me/mydata/config',
606              and curidxconfdir (as set in the copied configuration) would be
607              '/media/me/mydata/config'.
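
                 Written out as configuration lines, the movable dataset
                 example above would look roughly like this in the copied
                 configuration (paths taken from the example; this is a sketch,
                 not a tested setup):

```conf
# Where the configuration directory was when the dataset was indexed.
orgidxconfdir = /home/me/mydata/config
# Where the dataset's configuration directory is now, before the copy.
curidxconfdir = /media/me/mydata/config
```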
608
609       idxrundir = dfn
610              Indexing process current directory. The input handlers sometimes
611              leave  temporary  files  in  the  current directory, so it makes
612              sense to have recollindex chdir to some temporary directory.  If
613              the value is empty, the current directory is not changed. If the
614              value is (literal) tmp, we use the temporary directory as set by
615              the  environment  (RECOLL_TMPDIR  else TMPDIR else /tmp). If the
616              value is an absolute path to a directory, we go there.
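
                 A minimal sketch of the three possible forms of the value
                 (the absolute path is a hypothetical example):

```conf
# Use the temporary directory from the environment
# (RECOLL_TMPDIR, else TMPDIR, else /tmp).
idxrundir = tmp

# Alternative: an explicit absolute path (hypothetical).
# idxrundir = /var/tmp/recoll-run

# Alternative: empty value, do not change directory.
# idxrundir =
```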
617
618       checkneedretryindexscript = fn
619              Script used to heuristically check if we need to retry  indexing
620              files  which  previously  failed.  The default script checks the
621              modified dates on /usr/bin and /usr/local/bin. A  relative  path
622              will  be looked up in the filters dirs, then in the path. Use an
623              absolute path to do otherwise.
624
625       recollhelperpath = string
626              Additional places to search for helper executables. This is only
627              used on Windows for now.
628
629       idxabsmlen = int
630              Length  of  abstracts  we store while indexing. Recoll stores an
631              abstract for each indexed file.  The text can come from  an  ac‐
632              tual  'abstract' section in the document or will just be the be‐
633              ginning of the document. It is stored in the index  so  that  it
634              can  be  displayed  inside the result lists without decoding the
635              original file. The idxabsmlen parameter defines the size of  the
636              stored  abstract. The default value is 250 bytes. The search in‐
637              terface gives you the choice to display this stored  text  or  a
638              synthetic  abstract  built  by extracting text around the search
639              terms. If you always prefer the synthetic abstract, you can  re‐
640              duce this value and save a little space.
641
642       idxmetastoredlen = int
643              Truncation  length  of stored metadata fields. This does not af‐
644              fect indexing (the whole field is processed  anyway),  just  the
645              amount of data stored in the index for the purpose of displaying
646              fields inside result lists or previews. The default value is 150
647              bytes which may be too low if you have custom fields.
648
649       idxtexttruncatelen = int
650              Truncation  length for all document texts. Only index the begin‐
651              ning of documents. This is not recommended  except  if  you  are
652              sure  that  the interesting keywords are at the top and have se‐
653              vere disk space issues.
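
                 As a sketch of how the stored-text parameters above might be
                 tuned, the fragment below shrinks the stored abstract and
                 raises the metadata truncation length. Both values are
                 arbitrary assumptions chosen for illustration.

```conf
# Smaller stored abstract, for users who always display synthetic abstracts.
# The default is 250.
idxabsmlen = 100

# Keep more of each stored metadata field, e.g. for long custom fields.
# The default is 150.
idxmetastoredlen = 500
```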
654
655       aspellLanguage = string
656              Language definitions to use when creating the aspell dictionary.
657              The  value must match a set of aspell language definition files.
658              You can type "aspell dicts" to see a list. The default, if this
659              is not set, is to use the NLS environment to guess the value.
660              The values are the 2-letter language codes (e.g. 'en', 'fr'...).
661
662       aspellAddCreateParam = string
663              Additional option and parameter to  aspell  dictionary  creation
664              command.  Some  aspell  packages  may  need an additional option
665              (e.g. on Debian Jessie:  --local-data-dir=/usr/lib/aspell).  See
666              Debian bug 772415.
667
668       aspellKeepStderr = bool
669              Set  this  to  have a look at aspell dictionary creation errors.
670              There are always many, so this is mostly for debugging.
671
672       noaspell = bool
673              Disable aspell use. The aspell dictionary generation takes time,
674              and  some  combinations  of  aspell version, language, and local
675              terms, result in aspell crashing, so it sometimes makes sense to
676              just disable the thing.
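
                 A possible aspell setup is sketched below. It assumes that
                 boolean parameters are written as 0 or 1, which is not stated
                 in this page.

```conf
# Build the aspell dictionary with the English language definitions.
aspellLanguage = en

# Alternative: turn aspell off entirely
# (assumption: booleans are written as 0/1).
# noaspell = 1
```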
677
678       monauxinterval = int
679              Auxiliary  database  update interval. The real time indexer only
680              updates the auxiliary databases (stemdb,  aspell)  periodically,
681              because  it  would  be  too  costly  to do it for every document
682              change. The default period is one hour.
683
684       monixinterval = int
685              Minimum interval (seconds) between processings of  the  indexing
686              queue. The real time indexer does not process each event when it
687              comes in, but lets the queue accumulate,  to  diminish  overhead
688              and  to  aggregate  multiple  events  affecting  the  same file.
689              Default: 30 seconds.
690
691       mondelaypatterns = string
692              Timing parameters for the real time  indexing.  Definitions  for
693              files  which  get  a  longer delay before reindexing is allowed.
694              This is for fast-changing files, that should only  be  reindexed
695              once  in  a  while. A list of wildcardPattern:seconds pairs. The
696              patterns are matched with fnmatch(pattern,  path,  0).  You  can
697              quote  entries  containing white space with double quotes (quote
698              the whole entry, not the pattern). The default is empty.   Exam‐
699              ple: mondelaypatterns = *.log:20 "*with spaces.*:30"
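
                 The real time indexing timing parameters could be set together
                 as in the sketch below. The monauxinterval value assumes that
                 the unit is seconds (one hour being the stated default), which
                 this page does not say explicitly.

```conf
# Update the auxiliary databases once an hour (assumption: value in seconds).
monauxinterval = 3600
# Process the indexing queue at most every 30 seconds.
monixinterval = 30
# Longer delays for fast-changing files: pattern:seconds pairs.
mondelaypatterns = *.log:20 "*with spaces.*:30"
```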
700
701       idxniceprio = int
702              "nice"  process priority for the indexing processes. Default: 19
703              (lowest). Appeared with 1.26.5. Prior versions were fixed at 19.
704
705       monioniceclass = int
706              ionice class for the indexing process.  Despite  the  misleading
707              name, and on platforms where this is supported, this affects all
708              indexing processes, not only the real time/monitoring ones.  The
709              default value is 3 (use lowest "Idle" priority).
710
711       monioniceclassdata = string
712              ionice  class  level parameter if the class supports it. The de‐
713              fault is empty, as the default "Idle" class has no levels.
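
                 The priority parameters above, written out with their
                 documented defaults. The commented alternative assumes the
                 usual Linux ionice numbering (class 2 is "Best-effort", with
                 levels 0 to 7), which comes from ionice itself and not from
                 this page.

```conf
# Lowest CPU priority for the indexing processes.
idxniceprio = 19
# "Idle" I/O class, which has no levels.
monioniceclass = 3
monioniceclassdata =

# Alternative (assumption: Linux ionice numbering):
# best-effort class at its lowest level.
# monioniceclass = 2
# monioniceclassdata = 7
```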
714
715       autodiacsens = bool
716              Auto-trigger diacritics sensitivity (raw index only).  If  the
717              index is not stripped, decide if we automatically trigger
718              diacritics sensitivity if the search term has accented characters (not
719              in  unac_except_trans).  Else you need to use the query language
720              and the "D" modifier to specify diacritics sensitivity.  Default
721              is no.
722
723       autocasesens = bool
724              Auto-trigger  case sensitivity (raw index only). If the index is
725              not stripped (see indexStripChars), decide if  we  automatically
726              trigger character case sensitivity if the search term has upper-
727              case characters in any but the first position. Else you need  to
728              use  the  query language and the "C" modifier to specify charac‐
729              ter-case sensitivity. Default is yes.
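
                 A sketch of the two sensitivity switches with their documented
                 defaults. It assumes that booleans are written as 0 or 1, and
                 it only has an effect on a raw (unstripped) index.

```conf
# Only meaningful if the index is not stripped (see indexStripChars).
# Default "no": accented search terms do not trigger diacritics sensitivity.
autodiacsens = 0
# Default "yes": upper-case characters after the first position
# trigger case sensitivity.
autocasesens = 1
```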
730
731       maxTermExpand = int
732              Maximum query expansion count for a single term (e.g.: when  us‐
733              ing wildcards). This only affects queries, not indexing. We used
734              to not limit this at all (except for filenames where  the  limit
735              was  too  low at 1000), but it is unreasonable with a big index.
736              Default 10000.
737
738       maxXapianClauses = int
739              Maximum number of clauses we add to a single Xapian query.  This
740              only affects queries, not indexing. In some cases, the result of
741              term expansion can be multiplicative, and we want to avoid  eat‐
742              ing all the memory. Default 50000.
743
744       snippetMaxPosWalk = int
745              Maximum  number  of positions we walk while populating a snippet
746              for the result list. The default of 1,000,000 may be  insuffi‐
747              cient  for very big documents; the consequence would be snippets
748              with possibly meaning-altering missing words.
749
750       pdfocr = bool
751              Attempt OCR of PDF files with no text content. This can  be  de‐
752              fined  in  subdirectories.  The default is off because OCR is so
753              very slow.
754
755       pdfattach = bool
756              Enable PDF attachment extraction by executing pdftk  (if  avail‐
757              able).  This is normally disabled, because it does slow down PDF
758              indexing a bit even if not one attachment is ever found.
759
760       pdfextrameta = string
761              Extract text from selected XMP metadata tags. This is  a  space-
762              separated list of qualified XMP tag names. Each element can also
763              include a translation to a Recoll field name, separated by a '|'
764              character. If the second element is absent, the tag name is used
765              as the Recoll field name. You will also need to  add  specifica‐
766              tions to the "fields" file to direct processing of the extracted
767              data.
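
                 To make the list syntax concrete, here is a sketch of a
                 pdfextrameta value. The XMP tag names are illustrative
                 assumptions; use the qualified names actually present in your
                 documents. The first element is translated to the Recoll field
                 "location", the other two keep the tag name as field name.

```conf
# Space-separated list of qualified XMP tags, optional |fieldname translation.
# Tag names below are illustrative only.
pdfextrameta = bibtex:location|location bibtex:booktitle bibtex:pages
```

                 The corresponding fields would still have to be declared in
                 the "fields" file.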
768
769       pdfextrametafix = fn
770              Define name of XMP field editing script. This defines  the  name
771              of  a  script  to  be  loaded  for editing XMP field values. The
772              script should define a 'MetaFixer' class with a metafix() method
773              which  will  be  called with the qualified tag name and value of
774              each selected field, for editing or erasing. A new  instance  is
775              created  for  each  document,  so that the object can keep state
776              for, e.g. eliminating duplicate values.
777
778       ocrprogs = string
779              OCR modules to try. The top OCR script will try to load the cor‐
780              responding  modules in order and use the first which reports be‐
781              ing capable of performing OCR on the  input  file.  Modules  for
782              tesseract  (tesseract)  and ABBYY FineReader (abbyy) are present
783              in the standard distribution. For compatibility with the  previ‐
784              ous version, if this is not defined at all, the default value is
785              "tesseract". Use an explicit empty value if needed. A  value  of
786              "abbyy tesseract" will try everything.
787
788       ocrcachedir = dfn
789              Location  for  caching OCR data. The default if this is empty or
790              undefined  is  to  store  the  cached  OCR  data   under
791              $RECOLL_CONFDIR/ocrcache.
792
793       tesseractlang = string
794              Language  to  assume  for tesseract OCR. Important for improving
795              the OCR accuracy. This can also be set through the contents of a
796              file in the currently processed directory. See the
797              rclocrtesseract.py script. Example values: eng, fra... See the
798              tesseract documentation.
799
800       tesseractcmd = fn
801              Path  for  the  tesseract  command. Do not quote. This is mostly
802              useful on Windows, or for  specifying  a  non-default  tesseract
803              command. E.g. on Windows:
804              tesseractcmd = C:/Program Files (x86)/Tesseract-OCR/tesseract.exe
805
806
807       abbyylang = string
808              Language to assume for abbyy OCR. Important  for  improving  the
809              OCR  accuracy.  This  can  also be set through the contents of a
810              file in the currently processed directory. See the
811              rclocrabbyy.py script. Typical values: English, French... See the
812              ABBYY documentation.
813
814
815       abbyycmd = fn
816              Path for the abbyy command. The ABBYY directory is  usually  not
817              in the path, so you should set this.
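
                 The OCR parameters above might be combined as in the following
                 sketch, which enables PDF OCR for a single directory subtree
                 only. The directory name is hypothetical, and writing the
                 boolean as 1 is an assumption.

```conf
# Try tesseract only, and assume English text.
ocrprogs = tesseract
tesseractlang = eng

# Enable OCR of text-less PDFs only under this (hypothetical) directory.
[~/scans]
pdfocr = 1
```

                 Keeping pdfocr inside a section limits the slow OCR step to
                 the files that are likely to need it.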
818
819
820       mhmboxquirks = string
821              Enable thunderbird/mozilla-seamonkey mbox format quirks. Set this
822              for the directory where the email mbox files are stored.
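
                 A sketch of a per-directory setting. The value "tbird" and
                 the directory are assumptions based on the sample
                 configuration shipped with Recoll; check your system-wide
                 configuration file for the values it actually uses.

```conf
# Directory where Thunderbird keeps its mbox files (assumed location).
[~/.thunderbird]
# Assumed value, as used in the shipped sample configuration.
mhmboxquirks = tbird
```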
823
824
825

SEE ALSO

827       recollindex(1) recoll(1)
828
829
830
831                               14 November 2012                 RECOLL.CONF(5)