RECOLL.CONF(5)                File Formats Manual               RECOLL.CONF(5)

NAME

       recoll.conf - main personal configuration file for Recoll

DESCRIPTION

       This file defines the index configuration for the Recoll full-text
       search system.

       The system-wide configuration file is normally located inside
       /usr/[local]/share/recoll/examples. Any parameter set in the common
       file may be overridden by setting it in the personal configuration
       file, by default: $HOME/.recoll/recoll.conf

       Please note that while I try to keep this manual page reasonably up
       to date, it will frequently lag behind the current state of the
       software. The best sources of information about the configuration are
       the comments in the system-wide configuration file and the user
       manual, which you can access from the recoll GUI help menu or on the
       recoll web site.

       A short extract of the file might look as follows:

              # Space-separated list of directories to index.
              topdirs = ~/docs /usr/share/doc

              [~/somedirectory-with-utf8-txt-files]
              defaultcharset = utf-8

       There are three kinds of lines:

              ·      Comment or empty

              ·      Parameter assignment

              ·      Section definition

       Empty lines or lines beginning with # are ignored.

       Assignment lines are in the form 'name = value'.

       Section lines allow redefining a parameter for a directory subtree.
       Some of the parameters used for indexing are looked up hierarchically
       from the more to the less specific. Not all parameters can be
       meaningfully redefined; this is specified for each in the next section.

       The tilde character (~) is expanded in file names to the name of the
       user's home directory.

       Where values are lists, white space is used for separation, and
       elements with embedded spaces can be quoted with double-quotes.

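       The three kinds of lines can be seen together in the following
       sketch. The directory names are made up for the illustration and are
       not defaults.

              # A list value: the second element contains a space, so it
              # is quoted.
              topdirs = ~/docs "/home/me/My Documents"

              # Section: the assignment below applies to this subtree only.
              [~/docs/legacy]
              defaultcharset = iso-8859-1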

OPTIONS

       topdirs = string
              Space-separated list of files or directories to recursively
              index. Defaults to ~ (indexes $HOME). You can use symbolic links
              in the list; they will be followed, independently of the value
              of the followLinks variable.

       monitordirs = string
              Space-separated list of files or directories to monitor for
              updates. When running the real-time indexer, this allows
              monitoring only a subset of the whole indexed area. The elements
              must be included in the tree defined by the 'topdirs' members.

       skippedNames = string
              Files and directories which should be ignored. White
              space-separated list of simple wildcard patterns (not paths:
              they must contain no '/'), which will be tested against file
              and directory names. The list in the default configuration does
              not exclude hidden directories (names beginning with a dot),
              which means that it may index quite a few things that you do
              not want. On the other hand, email user agents like Thunderbird
              usually store messages in hidden directories, and you probably
              want these indexed. One possible solution is to have ".*" in
              "skippedNames", and add things like "~/.thunderbird"
              "~/.evolution" to "topdirs". Not even the file names are indexed
              for patterns in this list; see the "noContentSuffixes" variable
              for an alternative approach which indexes the file names. Can be
              redefined for any subtree.

       skippedNames- = string
              List of name patterns to remove from the default skippedNames
              list.

       skippedNames+ = string
              List of name patterns to add to the default skippedNames list.

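              The approach suggested under skippedNames for hidden
              directories could be sketched with the '+' variant, which
              leaves the rest of the default list alone. The choice of
              ~/.thunderbird is only an illustration.

                     # Add the "hidden name" pattern to the default list
                     skippedNames+ = .*
                     # List the mail store explicitly so that it gets indexed
                     topdirs = ~ ~/.thunderbird
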
       noContentSuffixes = string
              List of name endings (not necessarily dot-separated suffixes)
              for which we don't try MIME type identification, and don't
              uncompress or index content. Only the names will be indexed.
              This complements the now obsolete recoll_noindex list from the
              mimemap file, which will go away in a future release (the move
              from mimemap to recoll.conf allows editing the list through the
              GUI). This is different from skippedNames because these are name
              ending matches only (not wildcard patterns), and the file name
              itself gets indexed normally. This can be redefined for
              subdirectories.

       noContentSuffixes- = string
              List of name endings to remove from the default
              noContentSuffixes list.

       noContentSuffixes+ = string
              List of name endings to add to the default noContentSuffixes
              list.

       skippedPaths = string
              Absolute paths we should not go into. Space-separated list of
              wildcard expressions for absolute filesystem paths. Must be
              defined at the top level of the configuration file, not in a
              subsection. Can contain files and directories. The database and
              configuration directories will automatically be added. The
              expressions are matched using 'fnmatch(3)' with the FNM_PATHNAME
              flag set by default. This means that '/' characters must be
              matched explicitly. You can set 'skippedPathsFnmPathname' to 0
              to disable the use of FNM_PATHNAME (meaning that '/*/dir3' will
              match '/dir1/dir2/dir3'). The default value contains the usual
              mount point for removable media to remind you that it is a bad
              idea to have Recoll work on these (especially with the monitor:
              media gets indexed on mount, all data gets erased on unmount).
              Explicitly adding '/media/xxx' to the 'topdirs' variable will
              override this.

       skippedPathsFnmPathname = bool
              Set to 0 to disable the use of FNM_PATHNAME for matching skipped
              paths.

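              The effect of FNM_PATHNAME can be illustrated with the sketch
              below. The paths are hypothetical and only serve to show the
              matching rules described above.

                     # With FNM_PATHNAME (the default), '*' does not match '/':
                     # this skips /home/me/a/cache but not /home/me/a/b/cache.
                     skippedPaths = /media /home/me/*/cache

                     # Adding this line disables the flag, so that '*' may span
                     # several directory levels and /home/me/a/b/cache is
                     # skipped as well.
                     skippedPathsFnmPathname = 0
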
       nowalkfn = string
              File name which will cause its parent directory to be skipped.
              Any directory containing a file with this name will be skipped
              as if it were part of the skippedPaths list. Example:
              .recoll-noindex

       daemSkippedPaths = string
              skippedPaths equivalent specific to real-time indexing. This
              enables having parts of the tree which are initially indexed but
              not monitored. If daemSkippedPaths is not set, the daemon uses
              skippedPaths.

       zipUseSkippedNames = bool
              Use skippedNames inside Zip archives. The value is fetched
              directly by the rclzip handler, which then skips the patterns
              defined by skippedNames inside Zip archives. Can be redefined
              for subdirectories. See
              https://www.lesbonscomptes.com/recoll/faqsandhowtos/FilteringOutZipArchiveMembers.html

       zipSkippedNames = string
              Space-separated list of wildcard expressions for names that
              should be ignored inside Zip archives. This is used directly by
              the Zip handler. If zipUseSkippedNames is not set,
              zipSkippedNames defines the patterns to be skipped inside
              archives. If zipUseSkippedNames is set, the two lists are
              concatenated and used. Can be redefined for subdirectories. See
              https://www.lesbonscomptes.com/recoll/faqsandhowtos/FilteringOutZipArchiveMembers.html

       followLinks = bool
              Follow symbolic links during indexing. The default is to ignore
              symbolic links to avoid multiple indexing of linked files. No
              effort is made to avoid duplication when this option is set to
              true. This option can be set individually for each of the
              'topdirs' members by using sections. It cannot be changed below
              the 'topdirs' level. Links in the 'topdirs' list itself are
              always followed.

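              Setting the option for a single 'topdirs' member might look
              like the sketch below. The directory names are hypothetical.

                     topdirs = ~/docs ~/projects

                     # Follow links under ~/projects only; ~/docs keeps the
                     # default (links ignored).
                     [~/projects]
                     followLinks = 1
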
       indexedmimetypes = string
              Restrictive list of indexed MIME types. Normally not set (in
              which case all supported types are indexed). If it is set, only
              the types from the list will have their contents indexed. The
              names will be indexed anyway if indexallfilenames is set
              (default). MIME type names should be taken from the mimemap file
              (the values may be different from xdg-mime or file -i output in
              some cases). Can be redefined for subtrees.

       excludedmimetypes = string
              List of excluded MIME types. Lets you exclude some types from
              indexing. MIME type names should be taken from the mimemap file
              (the values may be different from xdg-mime or file -i output in
              some cases). Can be redefined for subtrees.

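              As an illustration, the sketch below excludes one type
              everywhere and restricts a hypothetical subtree to two types.
              The type names are assumed to match the spelling used in your
              mimemap file; check it before copying them.

                     excludedmimetypes = audio/mpeg

                     [~/docs/reports]
                     indexedmimetypes = application/pdf text/plain
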
       nomd5types = string
              Don't compute md5 for these types. md5 checksums are used only
              for deduplicating results, and can be very expensive to compute
              on multimedia or other big files. This list lets you turn off
              md5 computation for selected types. It is global (no
              redefinition for subtrees). At the moment, it only has an effect
              for external handlers (exec and execm). The file types can be
              specified by listing either MIME types (e.g. audio/mpeg) or
              handler names (e.g. rclaudio).

       compressedfilemaxkbs = int
              Size limit for compressed files. We need to decompress these in
              a temporary directory for identification, which can be wasteful
              in some cases. This limits the waste. A negative value means no
              limit. 0 results in no processing of any compressed file.
              Default 50 MB.

       textfilemaxmbs = int
              Size limit for text files. Mostly for skipping monster logs.
              Default 20 MB.

       indexallfilenames = bool
              Index the file names of unprocessed files. Index the names of
              files the contents of which we don't index because of an
              excluded or unsupported MIME type.

       usesystemfilecommand = bool
              Use a system command for file MIME type guessing as a final step
              in file type identification. This is generally useful, but will
              usually cause the indexing of many bogus 'text' files. See
              'systemfilecommand' for the command used.

       systemfilecommand = string
              Command used to guess MIME types if the internal methods fail.
              This should be a "file -i" workalike. The file path will be
              added as a last parameter to the command line. "xdg-mime" works
              better than the traditional "file" command, and is now the
              configured default (with a hard-coded fallback to "file").

       processwebqueue = bool
              Decide if we process the Web queue. The queue is a directory
              where the Recoll Web browser plugins create the copies of
              visited pages.

       textfilepagekbs = int
              Page size for text files. If this is set, text/plain files will
              be divided into documents of approximately this size. This will
              reduce memory usage at index time and help with loading data in
              the preview window at query time. Particularly useful with very
              big files, such as application or system logs. Also see
              textfilemaxmbs and compressedfilemaxkbs.

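              A possible setup for a machine with large log files is
              sketched below. The values are illustrative, not
              recommendations; the units follow the parameter names
              (megabytes for textfilemaxmbs, kilobytes for textfilepagekbs).

                     # Skip text files bigger than 100 MB altogether
                     textfilemaxmbs = 100
                     # Split the remaining ones into documents of about 1 MB
                     textfilepagekbs = 1000
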
       membermaxkbs = int
              Size limit for archive members. This is passed to the filters in
              the environment as RECOLL_FILTER_MAXMEMBERKB.

       indexStripChars = bool
              Decide if we store character case and diacritics in the index.
              If we do, searches sensitive to case and diacritics can be
              performed, but the index will be bigger, and some marginal
              weirdness may sometimes occur. The default is a stripped index.
              When using multiple indexes for a search, this parameter must be
              defined identically for all. Changing the value implies an index
              reset.

       indexStoreDocText = bool
              Decide if we store the documents' text content in the index.
              Storing the text allows extracting snippets from it at query
              time, instead of building them from index position data.

              Newer Xapian index formats have rendered our use of position
              lists unacceptably slow in some cases. The last Xapian index
              format with good performance for the old method is Chert, which
              is the default for 1.2, still supported but not the default in
              1.4, and will be dropped in 1.6.

              The stored document text is translated from its original format
              to UTF-8 plain text, but not stripped of upper-case, diacritics,
              or punctuation signs. Storing it typically increases the index
              size by 10-20%, but also allows for nicer snippets, so it may be
              worth enabling it even if not strictly needed for performance,
              if you can afford the space.

              The variable only has an effect when creating an index, meaning
              that the xapiandb directory must not exist yet. Its exact effect
              depends on the Xapian version. For Xapian 1.4, if the variable
              is set to 0, the Chert format will be used, and the text will
              not be stored. If the variable is 1, Glass will be used, and the
              text stored. For Xapian 1.2, and for versions 1.5 and newer, the
              index format is always the default, but the variable controls
              whether the text is stored, and the abstract generation method.
              With Xapian 1.5 and later, and the variable set to 0, abstract
              generation may be very slow, but this setting may still be
              useful to save space if you do not use abstract generation at
              all.

       nonumbers = bool
              Decides if terms will be generated for numbers. For example
              "123", "1.5e6" and "192.168.1.4" would not be indexed if
              nonumbers is set ("value123" would still be). Numbers are often
              quite interesting to search for, and this should probably not be
              set except for special situations, i.e. scientific documents
              with huge amounts of numbers in them, where setting nonumbers
              will reduce the index size. This can only be set for a whole
              index, not for a subtree.

       dehyphenate = bool
              Determines if we index 'coworker' also when the input is
              'co-worker'. This is new in version 1.22, and on by default.
              Setting the variable to off allows restoring the previous
              behaviour.

       backslashasletter = bool
              Process backslash as a normal letter. This may make sense for
              people wanting to index TeX commands as such but is not of much
              general use.

       maxtermlength = int
              Maximum term length. Words longer than this will be discarded.
              The default is 40 and used to be hard-coded, but it can now be
              adjusted. You need an index reset if you change the value.

       nocjk = bool
              Decides if specific East Asian (Chinese, Korean, Japanese)
              character/word splitting is turned off. This will save a small
              amount of CPU if you have no CJK documents. If your document
              base does include such text but you are not interested in
              searching it, setting nocjk may be a significant time and space
              saver.

       cjkngramlen = int
              This lets you adjust the size of n-grams used for indexing CJK
              text. The default value of 2 is probably appropriate in most
              cases. A value of 3 would allow more precision and efficiency on
              longer words, but the index will be approximately twice as
              large.

       indexstemminglanguages = string
              Languages for which to create stemming expansion data. Stemmer
              names can be found by executing 'recollindex -l', or this can
              also be set from a list in the GUI.

       defaultcharset = string
              Default character set. This is used for files which do not
              contain a character set definition (e.g.: text/plain). Values
              found inside files, e.g. a 'charset' tag in HTML documents, will
              override it. If this is not set, the default character set is
              the one defined by the NLS environment ($LC_ALL, $LC_CTYPE,
              $LANG), or ultimately iso-8859-1 (cp-1252 in fact). If for some
              reason you want a general default which does not match your LANG
              and is not 8859-1, use this variable. This can be redefined for
              any sub-directory.

       unac_except_trans = string
              A list of characters, encoded in UTF-8, which should be handled
              specially when converting text to unaccented lowercase. For
              example, in Swedish, the letter a with diaeresis has full
              alphabet citizenship and should not be turned into an a. Each
              element in the space-separated list has the special character as
              first element and the translation following. The handling of
              both the lowercase and upper-case versions of a character should
              be specified, as membership in the list will turn off both
              standard accent and case processing. The value is global and
              affects both indexing and querying. Examples:

              Swedish:
                     unac_except_trans = ää Ää öö Öö üü Üü ßss œoe Œoe æae Æae ﬀff ﬁfi ﬂfl åå Åå

              German:
                     unac_except_trans = ää Ää öö Öö üü Üü ßss œoe Œoe æae Æae ﬀff ﬁfi ﬂfl

              In French, you probably want to decompose oe and ae, and nobody
              would type a German ß:
                     unac_except_trans = ßss œoe Œoe æae Æae ﬀff ﬁfi ﬂfl

              The default follows. These decompositions are not performed by
              unac, but it is unlikely that someone would type the composed
              forms in a search:
                     unac_except_trans = ßss œoe Œoe æae Æae ﬀff ﬁfi ﬂfl

       maildefcharset = string
              Overrides the default character set for email messages which
              don't specify one. This is mainly useful for readpst (libpst)
              dumps, which are utf-8 but do not say so.

       localfields = string
              Set fields on all files (usually of a specific fs area). Syntax
              is the usual: name = value ; attr1 = val1 ; [...]. The value is
              empty, so this needs an initial semi-colon. This is useful,
              e.g., for setting the rclaptg field for application selection
              inside mimeview.

       testmodifusemtime = bool
              Use mtime instead of ctime to test if a file has been modified.
              The time is used in addition to the size, which is always used.
              Setting this can reduce re-indexing on systems where extended
              attributes are used (by some other application), but not
              indexed, because changing extended attributes only affects
              ctime. Notes:

              ·      This may prevent detection of change in some marginal
                     file rename cases (the target would need to have the same
                     size and mtime).

              ·      You should probably also set noxattrfields to 1 in this
                     case, except if you still prefer to perform xattr
                     indexing, for example if the local file update pattern
                     makes it of value (as in general, there is a risk for
                     pure extended attributes updates without file
                     modification to go undetected).

              Perform a full index reset after changing this.

       noxattrfields = bool
              Disable extended attributes conversion to metadata fields. This
              probably needs to be set if testmodifusemtime is set.

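              Following the two entries above, a configuration which relies
              on mtime and gives up extended attributes indexing could be
              sketched as:

                     testmodifusemtime = 1
                     noxattrfields = 1

              A full index reset is still needed after the change, as noted
              under testmodifusemtime.
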
       metadatacmds = string
              Define commands to gather external metadata, e.g. tmsu tags.
              There can be several entries, separated by semi-colons, each
              defining which field name the data goes into and the command to
              use. Don't forget the initial semi-colon. All the field names
              must be different. You can use aliases in the "field" file if
              necessary. As a not too pretty hack conceded to convenience,
              any field name beginning with "rclmulti" will be taken as an
              indication that the command returns multiple field values inside
              a text blob formatted as a recoll configuration file ("fieldname
              = fieldvalue" lines). The rclmultixx name will be ignored, and
              field names and values will be parsed from the data. Example:

                     metadatacmds = ; tags = tmsu tags %f; rclmulti1 = cmdOutputsConf %f

       cachedir = dfn
              Top directory for Recoll data. Recoll data directories are
              normally located relative to the configuration directory (e.g.
              ~/.recoll/xapiandb, ~/.recoll/mboxcache). If 'cachedir' is set,
              the directories are stored under the specified value instead
              (e.g. if cachedir is ~/.cache/recoll, the default dbdir would be
              ~/.cache/recoll/xapiandb). This affects dbdir, webcachedir,
              mboxcachedir and aspellDicDir, which can still be individually
              specified to override cachedir. Note that if you have multiple
              configurations, each must have a different cachedir: there is no
              automatic computation of a subpath under cachedir.

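              The interaction with the dependent directories can be sketched
              as follows. The /data/recoll-mbox path is a made-up example.

                     # dbdir, webcachedir and aspellDicDir now default to
                     # locations under ~/.cache/recoll, e.g.
                     # ~/.cache/recoll/xapiandb for dbdir.
                     cachedir = ~/.cache/recoll

                     # An individually specified value still overrides cachedir.
                     mboxcachedir = /data/recoll-mbox
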
       maxfsoccuppc = int
              Maximum file system occupation over which we stop indexing. The
              value is a percentage, corresponding to what the "Capacity" df
              output column shows. The default value is 0, meaning no
              checking.

       dbdir = dfn
              Xapian database directory location. This will be created on
              first indexing. If the value is not an absolute path, it will be
              interpreted as relative to cachedir if set, or the configuration
              directory (-c argument or $RECOLL_CONFDIR). If nothing is
              specified, the default is then ~/.recoll/xapiandb/

       idxstatusfile = fn
              Name of the scratch file where the indexer process updates its
              status. Default: idxstatus.txt inside the configuration
              directory.

       mboxcachedir = dfn
              Directory location for storing mbox message offsets cache files.
              This is normally 'mboxcache' under cachedir if set, or else
              under the configuration directory, but it may be useful to share
              a directory between different configurations.

       mboxcacheminmbs = int
              Minimum mbox file size over which we cache the offsets. There is
              really no sense in caching offsets for small files. The default
              is 5 MB.

       webcachedir = dfn
              Directory where we store the archived web pages. This is only
              used by the web history indexing code. Default:
              cachedir/webcache if cachedir is set, else
              $RECOLL_CONFDIR/webcache

       webcachemaxmbs = int
              Maximum size in MB of the Web archive. This is only used by the
              web history indexing code. Default: 40 MB. Reducing the size
              will not physically truncate the file.

       webqueuedir = fn
              The path to the Web indexing queue. This used to be hard-coded
              in the old plugin as ~/.recollweb/ToIndex so there was no need
              or possibility to change it, but the WebExtensions plugin now
              downloads the files to the user Downloads directory, and a
              script moves them to webqueuedir. The script reads this value
              from the config so it has become possible to change it.

       webdownloadsdir = fn
              The path to the browser downloads directory. This is where the
              new browser add-on extension has to create the files. They are
              then moved by a script to webqueuedir.

       aspellDicDir = dfn
              Aspell dictionary storage directory location. The aspell
              dictionary (aspdict.(lang).rws) is normally stored in the
              directory specified by cachedir if set, or under the
              configuration directory.

       filtersdir = dfn
              Directory location for executable input handlers. If
              RECOLL_FILTERSDIR is set in the environment, we use it instead.
              Defaults to $prefix/share/recoll/filters. Can be redefined for
              subdirectories.

       iconsdir = dfn
              Directory location for icons. The only reason to change this
              would be if you want to change the icons displayed in the result
              list. Defaults to $prefix/share/recoll/images

       idxflushmb = int
              Threshold (megabytes of new data) where we flush from memory to
              disk index. Setting this allows some control over memory usage
              by the indexer process. A value of 0 means no explicit flushing,
              which lets Xapian perform its own thing, meaning flushing every
              $XAPIAN_FLUSH_THRESHOLD documents created, modified or deleted.
              As memory usage depends on average document size, not only
              document count, the Xapian approach is not very useful, and you
              should let Recoll manage the flushes. The program compiled value
              is 0. The configured default value (from this file) is now 50
              MB, and should be ok in many cases. You can set it as low as 10
              to conserve memory, but if you are looking for maximum speed,
              you may want to experiment with values between 20 and 200. In my
              experience, values beyond this are always counterproductive. If
              you find otherwise, please drop me a note.

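              For instance, on a machine where memory is scarce, the lower
              bound mentioned above could be used. This is only a sketch;
              the right value depends on your data and hardware.

                     # Flush to disk after about 10 MB of new data
                     idxflushmb = 10
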
       filtermaxseconds = int
              Maximum external filter execution time in seconds. Default 1200
              (20 minutes). Set to 0 for no limit. This is mainly to avoid
              infinite loops in PostScript files (loop.ps).

       filtermaxmbytes = int
              Maximum virtual memory space for filter processes
              (setrlimit(RLIMIT_AS)), in megabytes. Note that this includes
              any mapped libs (there is no reliable Linux way to limit the
              data space only), so we need to be a bit generous here. Anything
              over 2000 will be ignored on 32-bit machines.

       thrQSizes = string
              Stage input queues configuration. There are three internal
              queues in the indexing pipeline stages (file data extraction,
              terms generation, index update). This parameter defines the
              queue depths for each stage (three integer values). If a value
              of -1 is given for a given stage, no queue is used, and the
              thread will go on performing the next stage. In practice, deep
              queues have not been shown to increase performance. Default: a
              value of 0 for the first queue tells Recoll to perform
              autoconfiguration based on the detected number of CPUs (no need
              for the two other values in this case). Use thrQSizes = -1 -1 -1
              to disable multithreading entirely.

       thrTCounts = string
              Number of threads used for each indexing stage. The three stages
              are: file data extraction, terms generation, index update. The
              use of the counts is also controlled by some special values in
              thrQSizes: if the first queue depth is 0, all counts are ignored
              (autoconfigured); if a value of -1 is used for a queue depth,
              the corresponding thread count is ignored. It makes no sense to
              use a value other than 1 for the last stage because updating the
              Xapian index is necessarily single-threaded (and protected by a
              mutex).

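              A manual configuration of the two parameters above might look
              like the following sketch. The depths and counts are
              illustrative values, not tuned recommendations.

                     # Shallow queues for the three stages; a first value
                     # other than 0 turns autoconfiguration off.
                     thrQSizes = 2 2 2
                     # Four extraction threads, two term generation threads,
                     # and a single index update thread.
                     thrTCounts = 4 2 1
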
534       loglevel = int
535              Log file verbosity 1-6. A value of 2 will print only errors  and
536              warnings.  3  will print information like document updates, 4 is
537              quite verbose and 6 very verbose.
538
539       logfilename = fn
540              Log file destination. Use 'stderr' (default)  to  write  to  the
541              console.
542
543       idxloglevel = int
544              Override loglevel for the indexer.
545
546       idxlogfilename = fn
547              Override logfilename for the indexer.
548
549       daemloglevel = int
550              Override loglevel for the indexer in real time mode. The default
551              is to use the idx... values if set, else the log... values.
552
553       daemlogfilename = fn
554              Override logfilename for the indexer  in  real  time  mode.  The
555              default is to use the idx... values if set, else the log... val‐
556              ues.
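
              The fallback order between the daem..., idx... and log... param‐
              eters can be sketched as follows. This is a simplified model,
              not the actual Recoll lookup code: the function name
              resolve_log_setting and the mode names "monitor" and "index" are
              assumptions chosen for the example.

```python
from typing import Any, Mapping


def resolve_log_setting(conf: Mapping[str, Any], base: str, mode: str) -> Any:
    """Return the effective value of 'loglevel' or 'logfilename'.

    base is 'loglevel' or 'logfilename'. mode is a hypothetical label:
    'monitor' for the real time indexer, 'index' for the batch indexer,
    anything else for other programs.
    """
    if mode == "monitor":
        candidates = ["daem" + base, "idx" + base, base]
    elif mode == "index":
        candidates = ["idx" + base, base]
    else:
        candidates = [base]
    # The first parameter that is actually set wins.
    for name in candidates:
        if name in conf:
            return conf[name]
    return None
```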
557
558       orgidxconfdir = dfn
559              Original location of the configuration directory. This  is  used
560              exclusively  for  movable  datasets.  Locating the configuration
561              directory inside the directory tree makes it possible to provide
562              automatic  query  time  path  translations once the data set has
563              moved (for example, because it has been mounted on another loca‐
564              tion).
565
566       curidxconfdir = dfn
567              Current location of the configuration directory. Complements
568              orgidxconfdir for movable datasets. This should be used if the
569              configuration directory has been copied from the dataset to
570              another location, either because the dataset is read-only and a
571              read-write copy is desired, or for performance reasons. This records
572              the original moved location before copy, to allow path  transla‐
573              tion  computations.  For example if a dataset originally indexed
574              as    '/home/me/mydata/config'    has    been     mounted     to
575              '/media/me/mydata', and the GUI is running from a copied config‐
576              uration, orgidxconfdir would  be  '/home/me/mydata/config',  and
577              curidxconfdir (as set in the copied configuration) would be
578
579       idxrundir = dfn
580              Indexing process current directory. The input handlers sometimes
581              leave temporary files in the  current  directory,  so  it  makes
582              sense  to have recollindex chdir to some temporary directory. If
583              the value is empty, the current directory is not changed. If the
584              value is (literal) tmp, we use the temporary directory as set by
585              the environment (RECOLL_TMPDIR else TMPDIR else  /tmp).  If  the
586              value is an absolute path to a directory, we go there.
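
              The three cases for idxrundir can be summarised in a small
              Python sketch. It only models the rules stated above. The func‐
              tion name resolve_rundir is hypothetical, and rejecting a rela‐
              tive path is an assumption, since this page does not say how
              such a value is handled.

```python
import os
from typing import Mapping, Optional


def resolve_rundir(value: str, env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the directory to chdir to, or None to stay where we are."""
    if not value:
        # Empty value: the current directory is not changed.
        return None
    if value == "tmp":
        # Literal 'tmp': RECOLL_TMPDIR, else TMPDIR, else /tmp.
        return env.get("RECOLL_TMPDIR") or env.get("TMPDIR") or "/tmp"
    if os.path.isabs(value):
        return value
    # Assumption: other values are not described, so the sketch refuses them.
    raise ValueError("idxrundir must be empty, 'tmp', or an absolute path")
```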
587
588       checkneedretryindexscript = fn
589              Script  used to heuristically check if we need to retry indexing
590              files which previously failed.  The default  script  checks  the
591              modification dates of /usr/bin and /usr/local/bin. A relative path
592              will be looked up in the filter directories, then in the PATH. Use an
593              absolute path to do otherwise.
594
595       recollhelperpath = string
596              Additional places to search for helper executables. This is only
597              used on Windows for now.
598
599       idxabsmlen = int
600              Length of abstracts we store while indexing.  Recoll  stores  an
601              abstract  for  each  indexed  file.   The  text can come from an
602              actual 'abstract' section in the document or will  just  be  the
603              beginning  of the document. It is stored in the index so that it
604              can be displayed inside the result lists  without  decoding  the
605              original  file. The idxabsmlen parameter defines the size of the
606              stored abstract. The default value  is  250  bytes.  The  search
607              interface  gives you the choice to display this stored text or a
608              synthetic abstract built by extracting text  around  the  search
609              terms.  If  you  always  prefer  the synthetic abstract, you can
610              reduce this value and save a little space.
611
612       idxmetastoredlen = int
613              Truncation length of  stored  metadata  fields.  This  does  not
614              affect  indexing (the whole field is processed anyway), just the
615              amount of data stored in the index for the purpose of displaying
616              fields inside result lists or previews. The default value is 150
617              bytes which may be too low if you have custom fields.
618
619       idxtexttruncatelen = int
620              Truncation length for all document texts. Only index the  begin‐
621              ning  of  documents.  This  is not recommended except if you are
622              sure that the interesting keywords  are  at  the  top  and  have
623              severe disk space issues.
624
625       aspellLanguage = string
626              Language definitions to use when creating the aspell dictionary.
627              The value must match a set of aspell language definition  files.
628              You can type "aspell dicts" to see a list. If this is not set,
629              the default is to use the NLS environment to guess the value.
630
631       aspellAddCreateParam = string
632              Additional option and parameter to  aspell  dictionary  creation
633              command.  Some  aspell  packages  may  need an additional option
634              (e.g. on Debian Jessie:  --local-data-dir=/usr/lib/aspell).  See
635              Debian bug 772415.
636
637       aspellKeepStderr = bool
638              Set  this  to  have a look at aspell dictionary creation errors.
639              There are always many, so this is mostly for debugging.
640
641       noaspell = bool
642              Disable aspell use. The aspell dictionary generation takes time,
643              and  some  combinations  of  aspell version, language, and local
644              terms, result in aspell crashing, so it sometimes makes sense to
645              just disable the thing.
646
647       monauxinterval = int
648              Auxiliary  database  update interval. The real time indexer only
649              updates the auxiliary databases (stemdb,  aspell)  periodically,
650              because  it  would  be  too  costly  to do it for every document
651              change. The default period is one hour.
652
653       monixinterval = int
654              Minimum interval (seconds) between processings of  the  indexing
655              queue. The real time indexer does not process each event when it
656              comes in, but lets the queue accumulate,  to  diminish  overhead
657              and  to  aggregate  multiple  events  affecting  the  same file.
658              Default: 30 seconds.
659
660       mondelaypatterns = string
661              Timing parameters for the real time  indexing.  Definitions  for
662              files  which  get  a  longer delay before reindexing is allowed.
663              This is for fast-changing files, that should only  be  reindexed
664              once  in  a  while. A list of wildcardPattern:seconds pairs. The
665              patterns are matched with fnmatch(pattern, path, 0). You can
666              quote  entries  containing white space with double quotes (quote
667              the whole entry, not the pattern). The default is empty.   Exam‐
668              ple: mondelaypatterns = *.log:20 "*with spaces.*:30"
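
              The following Python sketch shows how such a value could be
              split into pairs and matched against a path. It is an approxima‐
              tion, not Recoll's implementation: Python's fnmatchcase() stands
              in for the C fnmatch() call, the helper names are hypothetical,
              and returning the first matching entry is an assumption.

```python
import shlex
from fnmatch import fnmatchcase
from typing import List, Optional, Tuple


def parse_delay_patterns(value: str) -> List[Tuple[str, int]]:
    """Split a mondelaypatterns value into (pattern, seconds) pairs."""
    pairs = []
    # shlex handles the double quotes around entries containing spaces.
    for entry in shlex.split(value):
        # The delay follows the last colon of the entry.
        pattern, _, seconds = entry.rpartition(":")
        pairs.append((pattern, int(seconds)))
    return pairs


def delay_for(path: str, pairs: List[Tuple[str, int]]) -> Optional[int]:
    """Return the delay of the first pattern matching path, else None."""
    for pattern, seconds in pairs:
        if fnmatchcase(path, pattern):
            return seconds
    return None
```

              Applied to the example value above, a path ending in .log would
              get a delay of 20 seconds in this model.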
669
670       monioniceclass = int
671              ionice class for the real time indexing process, on platforms
672              where this is supported. The default value is 3.
673
674       monioniceclassdata = string
675              ionice class parameter for the real time indexing process, on
676              platforms where this is supported. The default is empty.
677
678       autodiacsens = bool
679              Auto-trigger diacritics sensitivity (raw index only). If the
680              index is not stripped, decide if we automatically  trigger  dia‐
681              critics  sensitivity  if the search term has accented characters
682              (not in unac_except_trans). Else you need to use the query  lan‐
683              guage  and  the  "D" modifier to specify diacritics sensitivity.
684              Default is no.
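
              As a rough illustration of the test involved, the Python sketch
              below detects whether a term carries accents. It is not Recoll's
              code: the function name has_diacritics is hypothetical, and the
              sketch deliberately ignores the unac_except_trans exceptions.

```python
import unicodedata


def has_diacritics(term: str) -> bool:
    """True if any character of term decomposes into a combining mark."""
    decomposed = unicodedata.normalize("NFD", term)
    return any(unicodedata.combining(ch) for ch in decomposed)
```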
685
686       autocasesens = bool
687              Auto-trigger case sensitivity (raw index only). If the index is
688              not  stripped  (see indexStripChars), decide if we automatically
689              trigger character case sensitivity if the search term has upper-
690              case  characters in any but the first position. Else you need to
691              use the query language and the "C" modifier to  specify  charac‐
692              ter-case sensitivity. Default is yes.
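
              The rule "upper-case characters in any but the first position"
              can be expressed as a one-line test. The Python sketch below is
              only an illustration of that rule, with a hypothetical function
              name. It is not taken from Recoll.

```python
def has_inner_uppercase(term: str) -> bool:
    """True if term has an upper-case character after the first position."""
    return any(ch.isupper() for ch in term[1:])
```

              Under this rule a capitalized word such as "Recoll" would not
              trigger case sensitivity, while "reCoLL" would.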
693
694       maxTermExpand = int
695              Maximum  query  expansion  count  for  a single term (e.g.: when
696              using wildcards). This only affects queries,  not  indexing.  We
697              used  to  not  limit this at all (except for filenames where the
698              limit was too low at 1000), but it is unreasonable  with  a  big
699              index. Default 10000.
700
701       maxXapianClauses = int
702              Maximum  number of clauses we add to a single Xapian query. This
703              only affects queries, not indexing. In some cases, the result of
704              term  expansion can be multiplicative, and we want to avoid eat‐
705              ing all the memory. Default 50000.
706
707       snippetMaxPosWalk = int
708              Maximum number of positions we walk while populating  a  snippet
709              for the result list. The default of 1,000,000 may be insufficient
710              for very big documents; the consequence would be snippets with
711              possibly meaning-altering missing words.
712
713       pdfocr = bool
714              Attempt  OCR of PDF files with no text content if both tesseract
715              and pdftoppm are installed. The default is off because OCR is so
716              very slow.
717
718       pdfocrlang = string
719              Language  to assume for PDF OCR. This is very important for hav‐
720              ing a reasonable rate of errors with tesseract. This can also be
721              set  through a configuration variable or directory-local parame‐
722              ters. See the rclpdf.py script.
723
724       pdfattach = bool
725              Enable PDF attachment extraction by executing pdftk  (if  avail‐
726              able).  This is normally disabled, because it does slow down PDF
727              indexing a bit even if not one attachment is ever found.
728
729       pdfextrameta = string
730              Extract text from selected XMP metadata tags. This is  a  space-
731              separated list of qualified XMP tag names. Each element can also
732              include a translation to a Recoll field name, separated by a '|'
733              character. If the second element is absent, the tag name is used
734              as the Recoll field name. You will also need to add
735              specifications to the "fields" file to direct processing of the
736              extracted data.
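
              The list syntax can be illustrated with a short Python sketch
              that turns a value into a tag-to-field mapping. The function
              name parse_pdfextrameta and the tag names used in the usage note
              are hypothetical and only serve the example.

```python
from typing import Dict


def parse_pdfextrameta(value: str) -> Dict[str, str]:
    """Map each qualified XMP tag name to the Recoll field name to use."""
    mapping = {}
    for element in value.split():
        tag, sep, field = element.partition("|")
        # Without a '|' translation, the tag name doubles as the field name.
        mapping[tag] = field if sep and field else tag
    return mapping
```

              For instance, the (made-up) value "dc:creator|author
              pdf:Keywords" would map dc:creator to the author field and leave
              pdf:Keywords under its own name.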
737
738       pdfextrametafix = fn
739              Define name of XMP field editing script. This defines  the  name
740              of  a  script  to  be  loaded  for editing XMP field values. The
741              script should define a 'MetaFixer' class with a metafix() method
742              which  will  be  called with the qualified tag name and value of
743              each selected field, for editing or erasing. A new  instance  is
744              created  for  each  document,  so that the object can keep state
745              for, e.g. eliminating duplicate values.
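
              A minimal sketch of such a script is shown below. The class and
              method names come from the description above. The argument
              names, and the convention of returning the edited value (or an
              empty string to erase it), are assumptions and should be checked
              against the rclpdf.py script.

```python
class MetaFixer:
    """Example editor that trims values and drops repeated ones."""

    def __init__(self):
        # One instance is created per document, so this state is
        # naturally reset between documents.
        self.seen = set()

    def metafix(self, tag, value):
        # tag: qualified XMP tag name; value: extracted text.
        cleaned = value.strip()
        key = (tag, cleaned)
        if key in self.seen:
            # Assumption: an empty string erases the duplicate value.
            return ""
        self.seen.add(key)
        return cleaned
```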
746
747       mhmboxquirks = string
748              Enable thunderbird/mozilla-seamonkey mbox format quirks. Set this
749              for the directory where the email mbox files are stored.
750
751
752

SEE ALSO

754       recollindex(1), recoll(1)
755
756
757
758                               14 November 2012                 RECOLL.CONF(5)