RECOLL.CONF(5)                File Formats Manual               RECOLL.CONF(5)

NAME

       recoll.conf - main personal configuration file for Recoll

DESCRIPTION

       This file defines the index configuration for the Recoll full-text
       search system.

       The system-wide configuration file is normally located inside
       /usr/[local]/share/recoll/examples. Any parameter set in the common
       file may be overridden by setting it in the personal configuration
       file, by default: $HOME/.recoll/recoll.conf

       Please note that while I try to keep this manual page reasonably up
       to date, it will frequently lag the current state of the software.
       The best sources of information about the configuration are the
       comments in the system-wide configuration file and the user manual,
       which you can access from the recoll GUI help menu or on the recoll
       web site.

       A short extract of the file might look as follows:

              # Space-separated list of directories to index.
              topdirs =  ~/docs /usr/share/doc

              [~/somedirectory-with-utf8-txt-files]
              defaultcharset = utf-8

       There are three kinds of lines:

              ·      Comment or empty

              ·      Parameter assignment

              ·      Section definition

       Empty lines or lines beginning with # are ignored.

       Assignment lines are in the form 'name = value'.

       Section lines allow redefining a parameter for a directory subtree.
       Some of the parameters used for indexing are looked up hierarchically
       from the more to the less specific. Not all parameters can be
       meaningfully redefined; this is specified for each in the next
       section.

       The tilde character (~) is expanded in file names to the name of the
       user's home directory.

       Where values are lists, white space is used for separation, and
       elements with embedded spaces can be quoted with double-quotes.
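
       The three kinds of lines and the list quoting rule can be combined as
       in the following sketch. The directory names are hypothetical and
       only serve as an illustration.

```
# Comment lines start with '#'.
# The second list element contains a space, so it is double-quoted.
topdirs = ~/docs "~/My Documents" /usr/share/doc

# Section definition: the assignment below applies to this subtree only.
[~/docs/legacy]
defaultcharset = iso-8859-1
```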

OPTIONS

       topdirs = string
              Space-separated list of files or directories to recursively
              index. Defaults to ~ (indexes $HOME). You can use symbolic
              links in the list; they will be followed, independently of the
              value of the followLinks variable.
62
63       monitordirs = string
64              Space-separated  list  of  files  or  directories to monitor for
65              updates. When running the real-time indexer, this  allows  moni‐
66              toring  only  a  subset  of the whole indexed area. The elements
67              must be included in the tree defined by the 'topdirs' members.
68
       skippedNames = string
              Files and directories which should be ignored. White space
              separated list of wildcard patterns (simple ones, not paths,
              must contain no / ), which will be tested against file and
              directory names. The list in the default configuration does
              not exclude hidden directories (names beginning with a dot),
              which means that it may index quite a few things that you do
              not want. On the other hand, email user agents like
              Thunderbird usually store messages in hidden directories, and
              you probably want this indexed. One possible solution is to
              have ".*" in "skippedNames", and add things like
              "~/.thunderbird" "~/.evolution" to "topdirs". Not even the
              file names are indexed for patterns in this list; see the
              "noContentSuffixes" variable for an alternative approach
              which indexes the file names. Can be redefined for any
              subtree.
84
85       skippedNames- = string
86              List  of  name  endings  to remove from the default skippedNames
87              list.
88
89       skippedNames+ = string
90              List of name endings to add to the default skippedNames list.
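
              The hidden-directory approach suggested under skippedNames
              might look like the sketch below. It assumes that
              skippedNames+ accepts the same kind of wildcard pattern as
              skippedNames itself; the mail directories are the ones named
              in the text above.

```
# Add hidden files and directories to the default skippedNames list...
skippedNames+ = .*

# ...but index the mail stores explicitly by listing them in topdirs.
topdirs = ~ ~/.thunderbird ~/.evolution
```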
91
       onlyNames = string
              Regular file name filter patterns. If this is set, only the
              file names not in skippedNames and matching one of the
              patterns will be considered for indexing. Can be redefined per
              subtree. Does not apply to directories.
97
       noContentSuffixes = string
              List of name endings (not necessarily dot-separated suffixes)
              for which we don't try MIME type identification, and don't
              uncompress or index content. Only the names will be indexed.
              This complements the now obsolete recoll_noindex list from the
              mimemap file, which will go away in a future release (the move
              from mimemap to recoll.conf allows editing the list through
              the GUI). This is different from skippedNames because these
              are name ending matches only (not wildcard patterns), and the
              file name itself gets indexed normally. This can be redefined
              for subdirectories.

       noContentSuffixes- = string
              List of name endings to remove from the default
              noContentSuffixes list.

       noContentSuffixes+ = string
              List of name endings to add to the default noContentSuffixes
              list.
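
              As an illustration, the following sketch adds two name
              endings to the default list for one subtree only. The
              directory and the endings are hypothetical; files matching
              them would have their names indexed but not their contents.

```
[~/vm-images]
# Name ending matches, not wildcard patterns.
noContentSuffixes+ = .iso .vmdk
```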
117
118       skippedPaths = string
119              Absolute paths we should not go into.  Space-separated  list  of
120              wildcard  expressions  for  absolute  filesystem  paths. Must be
121              defined at the top level of the configuration  file,  not  in  a
122              subsection.  Can contain files and directories. The database and
123              configuration  directories  will  automatically  be  added.  The
124              expressions are matched using 'fnmatch(3)' with the FNM_PATHNAME
125              flag set by default. This means  that  '/'  characters  must  be
126              matched  explicitly.  You can set 'skippedPathsFnmPathname' to 0
127              to disable the use of FNM_PATHNAME (meaning that '/*/dir3'  will
128              match  '/dir1/dir2/dir3').  The default value contains the usual
129              mount point for removable media to remind you that it is  a  bad
130              idea  to have Recoll work on these (esp. with the monitor: media
131              gets indexed on mount, all data gets erased on unmount). Explic‐
132              itly adding '/media/xxx' to the 'topdirs' variable will override
133              this.
134
135       skippedPathsFnmPathname = bool
136              Set to 0 to override use of FNM_PATHNAME  for  matching  skipped
137              paths.
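
              The effect of FNM_PATHNAME on skippedPaths can be sketched as
              follows, reusing the '/*/dir3' pattern from the text above.
              The second path is a hypothetical example.

```
# Top level only: skippedPaths cannot be set inside a section.
skippedPaths = /*/dir3 /var/tmp

# With the default (FNM_PATHNAME set), '*' does not match '/', so
# '/*/dir3' matches /dir1/dir3 but not /dir1/dir2/dir3.
# Uncomment the next line to let '*' match '/' characters too, so that
# '/*/dir3' also matches /dir1/dir2/dir3:
#skippedPathsFnmPathname = 0
```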
138
       nowalkfn = string
              File name which will cause its parent directory to be skipped.
              Any directory containing a file with this name will be skipped
              as if it were part of the skippedPaths list. Example:
              .recoll-noindex
143
144       daemSkippedPaths = string
145              skippedPaths  equivalent  specific  to  real time indexing. This
146              enables having parts of the tree which are initially indexed but
147              not  monitored.  If daemSkippedPaths is not set, the daemon uses
148              skippedPaths.
149
       zipUseSkippedNames = bool
              Use skippedNames inside Zip archives. Fetched directly by the
              rclzip handler. Skip the patterns defined by skippedNames
              inside Zip archives. Can be redefined for subdirectories. See
              https://www.lesbonscomptes.com/recoll/faqsandhowtos/FilteringOutZipArchiveMembers.html

       zipSkippedNames = string
              Space-separated list of wildcard expressions for names that
              should be ignored inside zip archives. This is used directly
              by the zip handler. If zipUseSkippedNames is not set,
              zipSkippedNames defines the patterns to be skipped inside
              archives. If zipUseSkippedNames is set, the two lists are
              concatenated and used. Can be redefined for subdirectories.
              See
              https://www.lesbonscomptes.com/recoll/faqsandhowtos/FilteringOutZipArchiveMembers.html
168
       followLinks = bool
              Follow symbolic links during indexing. The default is to
              ignore symbolic links to avoid multiple indexing of linked
              files. No effort is made to avoid duplication when this option
              is set to true. This option can be set individually for each
              of the 'topdirs' members by using sections. It cannot be
              changed below the 'topdirs' level. Links in the 'topdirs' list
              itself are always followed.
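
              Setting the option for a single 'topdirs' member could look
              like this sketch. The directory names are hypothetical; the
              section name must be one of the 'topdirs' members.

```
topdirs = ~/docs ~/projects

# Follow symbolic links under ~/projects only; ~/docs keeps the default
# behaviour (links ignored).
[~/projects]
followLinks = 1
```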
177
       indexedmimetypes = string
              Restrictive list of indexed MIME types. Normally not set (in
              which case all supported types are indexed). If it is set,
              only the types from the list will have their contents indexed.
              The names will be indexed anyway if indexallfilenames is set
              (default). MIME type names should be taken from the mimemap
              file (the values may be different from xdg-mime or file -i
              output in some cases). Can be redefined for subtrees.

       excludedmimetypes = string
              List of excluded MIME types. Lets you exclude some types from
              indexing. MIME type names should be taken from the mimemap
              file (the values may be different from xdg-mime or file -i
              output in some cases). Can be redefined for subtrees.
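
              A possible combination of the two lists is sketched below.
              The directory is hypothetical, and the MIME type names are
              assumed to match the ones used in your mimemap file; check
              that file before copying them.

```
# Everywhere: do not index the contents of these types.
excludedmimetypes = audio/mpeg video/mp4

# In this subtree: only index the contents of PDF and HTML files.
[~/docs/reference]
indexedmimetypes = application/pdf text/html
```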
192
193       nomd5types = string
194              Don't compute md5 for these types. md5 checksums are  used  only
195              for  deduplicating results, and can be very expensive to compute
196              on multimedia or other big files. This list lets  you  turn  off
197              md5  computation  for selected types. It is global (no redefini‐
198              tion for subtrees). At the moment, it only  has  an  effect  for
199              external handlers (exec and execm). The file types can be speci‐
200              fied by listing either MIME types (e.g. audio/mpeg)  or  handler
201              names (e.g. rclaudio).
202
203       compressedfilemaxkbs = int
204              Size  limit for compressed files. We need to decompress these in
205              a temporary directory for identification, which can be  wasteful
206              in  some  cases.  Limit  the  waste.  Negative means no limit. 0
207              results in no processing of any compressed file. Default 50 MB.
208
209       textfilemaxmbs = int
210              Size limit for text files. Mostly  for  skipping  monster  logs.
211              Default 20 MB.
212
       indexallfilenames = bool
              Index the file names of unprocessed files. Index the names of
              files the contents of which we don't index because of an
              excluded or unsupported MIME type.

       usesystemfilecommand = bool
              Use a system command for file MIME type guessing as a final
              step in file type identification. This is generally useful,
              but will usually cause the indexing of many bogus 'text'
              files. See 'systemfilecommand' for the command used.

       systemfilecommand = string
              Command used to guess MIME types if the internal methods
              fail. This should be a "file -i" workalike. The file path will
              be added as a last parameter to the command line. "xdg-mime"
              works better than the traditional "file" command, and is now
              the configured default (with a hard-coded fallback to "file").
230
231       processwebqueue = bool
232              Decide if we process the Web queue. The  queue  is  a  directory
233              where  the  Recoll Web browser plugins create the copies of vis‐
234              ited pages.
235
236       textfilepagekbs = int
237              Page size for text files. If this is set, text/plain files  will
238              be  divided  into  documents  of  approximately  this size. Will
239              reduce memory usage at index time and help with loading data  in
240              the  preview window at query time. Particularly useful with very
241              big  files,  such  as  application  or  system  logs.  Also  see
242              textfilemaxmbs and compressedfilemaxkbs.
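
              For a tree holding large log files, the text file limits
              might be combined as in this sketch. The values are
              illustrative only, not recommendations.

```
# Skip text files larger than 50 MB entirely.
textfilemaxmbs = 50

# Split the remaining text/plain files into documents of about 1000 KB.
textfilepagekbs = 1000
```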
243
244       membermaxkbs = int
245              Size limit for archive members. This is passed to the filters in
246              the environment as RECOLL_FILTER_MAXMEMBERKB.
247
248       indexStripChars = bool
249              Decide if we store character case and diacritics in  the  index.
250              If  we do, searches sensitive to case and diacritics can be per‐
251              formed, but the index will be bigger, and some  marginal  weird‐
252              ness  may sometimes occur. The default is a stripped index. When
253              using multiple indexes for a  search,  this  parameter  must  be
254              defined identically for all. Changing the value implies an index
255              reset.
256
       indexStoreDocText = bool
              Decide if we store the documents' text content in the index.
              Storing the text allows extracting snippets from it at query
              time, instead of building them from index position data.

              Newer Xapian index formats have rendered our use of position
              lists unacceptably slow in some cases. The last Xapian index
              format with good performance for the old method is Chert,
              which is default for 1.2, still supported but not default in
              1.4 and will be dropped in 1.6.

              The stored document text is translated from its original
              format to UTF-8 plain text, but not stripped of upper-case,
              diacritics, or punctuation signs. Storing it increases the
              index size by 10-20% typically, but also allows for nicer
              snippets, so it may be worth enabling it even if not strictly
              needed for performance if you can afford the space.

              The variable only has an effect when creating an index,
              meaning that the xapiandb directory must not exist yet. Its
              exact effect depends on the Xapian version.

              For Xapian 1.4, if the variable is set to 0, the Chert format
              will be used, and the text will not be stored. If the variable
              is 1, Glass will be used, and the text stored.

              For Xapian 1.2, and for versions 1.5 and newer, the index
              format is always the default, but the variable controls if the
              text is stored or not, and the abstract generation method.
              With Xapian 1.5 and later, and the variable set to 0, abstract
              generation may be very slow, but this setting may still be
              useful to save space if you do not use abstract generation at
              all.
284
       nonumbers = bool
              Decides if terms will be generated for numbers. For example
              "123", "1.5e6", "192.168.1.4" would not be indexed if
              nonumbers is set ("value123" would still be). Numbers are
              often quite interesting to search for, and this should
              probably not be set except for special situations, e.g.
              scientific documents with huge amounts of numbers in them,
              where setting nonumbers will reduce the index size. This can
              only be set for a whole index, not for a subtree.
294
295       dehyphenate = bool
296              Determines  if  we  index 'coworker' also when the input is 'co-
297              worker'. This is new in version 1.22, and on by default. Setting
298              the variable to off allows restoring the previous behaviour.
299
300       backslashasletter = bool
301              Process backslash as normal letter. This may make sense for peo‐
302              ple wanting to index TeX commands as such but  is  not  of  much
303              general use.
304
305       underscoreasletter = bool
306              Process underscore as normal letter. This makes sense in so many
307              cases that one wonders if it should not be the default.
308
309       maxtermlength = int
310              Maximum term length. Words longer than this will  be  discarded.
311              The  default  is 40 and used to be hard-coded, but it can now be
312              adjusted. You need an index reset if you change the value.
313
314       nocjk = bool
315              Decides if specific East Asian (Chinese Korean Japanese) charac‐
316              ters/word splitting is turned off. This will save a small amount
317              of CPU if you have no CJK documents. If your document base  does
318              include  such  text  but you are not interested in searching it,
319              setting nocjk may be a significant time and space saver.
320
321       cjkngramlen = int
322              This lets you adjust the size of n-grams used for  indexing  CJK
323              text.  The  default  value  of 2 is probably appropriate in most
324              cases. A value of 3 would allow more precision and efficiency on
325              longer  words,  but  the  index  will  be approximately twice as
326              large.
327
328       indexstemminglanguages = string
329              Languages for which to create stemming expansion  data.  Stemmer
330              names  can  be  found by executing 'recollindex -l', or this can
331              also be set from a list in the GUI. The values are full language
332              names, e.g. english, french...
333
334       defaultcharset = string
335              Default  character set. This is used for files which do not con‐
336              tain a character set definition (e.g.: text/plain). Values found
337              inside files, e.g. a 'charset' tag in HTML documents, will over‐
338              ride it. If this is not set, the default character  set  is  the
339              one  defined by the NLS environment ($LC_ALL, $LC_CTYPE, $LANG),
340              or ultimately iso-8859-1 (cp-1252 in fact).  If for some  reason
341              you want a general default which does not match your LANG and is
342              not 8859-1, use this variable. This can  be  redefined  for  any
343              sub-directory.
344
       unac_except_trans = string
              A list of characters, encoded in UTF-8, which should be
              handled specially when converting text to unaccented
              lowercase. For example, in Swedish, the letter a with
              diaeresis has full alphabet citizenship and should not be
              turned into an a. Each element in the space-separated list has
              the special character as first element and the translation
              following. The handling of both the lowercase and upper-case
              versions of a character should be specified, as membership in
              the list turns off both standard accent and case processing.
              The value is global and affects both indexing and querying.
              Examples:

              Swedish:
              unac_except_trans = ää Ää öö Öö üü Üü ßss œoe Œoe æae Æae ﬀff ﬁfi ﬂfl åå Åå

              German:
              unac_except_trans = ää Ää öö Öö üü Üü ßss œoe Œoe æae Æae ﬀff ﬁfi ﬂfl

              French: you probably want to decompose oe and ae and nobody
              would type a German ß
              unac_except_trans = ßss œoe Œoe æae Æae ﬀff ﬁfi ﬂfl

              The default for all until someone protests follows. These
              decompositions are not performed by unac, but it is unlikely
              that someone would type the composed forms in a search.
              unac_except_trans = ßss œoe Œoe æae Æae ﬀff ﬁfi ﬂfl
369
370       maildefcharset = string
371              Overrides  the  default  character  set for email messages which
372              don't specify one. This is mainly useful  for  readpst  (libpst)
373              dumps, which are utf-8 but do not say so.
374
       localfields = string
              Set fields on all files (usually of a specific fs area).
              Syntax is the usual: name = value ; attr1 = val1 ; [...]. The
              value is empty, so this needs an initial semi-colon. This is
              useful, e.g., for setting the rclaptg field for application
              selection inside mimeview.
381
       testmodifusemtime = bool
              Use mtime instead of ctime to test if a file has been
              modified. The time is used in addition to the size, which is
              always used. Setting this can reduce re-indexing on systems
              where extended attributes are used (by some other
              application), but not indexed, because changing extended
              attributes only affects ctime. Notes:

              - This may prevent detection of change in some marginal file
                rename cases (the target would need to have the same size
                and mtime).

              - You should probably also set noxattrfields to 1 in this
                case, except if you still prefer to perform xattr indexing,
                for example if the local file update pattern makes it of
                value (as in general, there is a risk for pure extended
                attributes updates without file modification to go
                undetected).

              Perform a full index reset after changing this.

       noxattrfields = bool
              Disable extended attributes conversion to metadata fields.
              This probably needs to be set if testmodifusemtime is set.
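
              The pairing recommended above amounts to the following two
              lines, shown here as a minimal sketch. A full index reset is
              still needed after changing testmodifusemtime.

```
# Detect modifications with mtime + size instead of ctime + size.
testmodifusemtime = 1
# Do not convert extended attributes to metadata fields.
noxattrfields = 1
```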
401
       metadatacmds = string
              Define commands to gather external metadata, e.g. tmsu tags.
              There can be several entries, separated by semi-colons, each
              defining which field name the data goes into and the command
              to use. Don't forget the initial semi-colon. All the field
              names must be different. You can use aliases in the "field"
              file if necessary. As a not too pretty hack conceded to
              convenience, any field name beginning with "rclmulti" will be
              taken as an indication that the command returns multiple field
              values inside a text blob formatted as a recoll configuration
              file ("fieldname = fieldvalue" lines). The rclmultixx name
              will be ignored, and field names and values will be parsed
              from the data. Example:

              metadatacmds = ; tags = tmsu tags %f; rclmulti1 = cmdOutputsConf %f
417
       cachedir = dfn
              Top directory for Recoll data. Recoll data directories are
              normally located relative to the configuration directory (e.g.
              ~/.recoll/xapiandb, ~/.recoll/mboxcache). If 'cachedir' is
              set, the directories are stored under the specified value
              instead (e.g. if cachedir is ~/.cache/recoll, the default
              dbdir would be ~/.cache/recoll/xapiandb). This affects dbdir,
              webcachedir, mboxcachedir, aspellDicDir, which can still be
              individually specified to override cachedir. Note that if you
              have multiple configurations, each must have a different
              cachedir: there is no automatic computation of a subpath under
              cachedir.
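
              A sketch of moving the data out of the configuration
              directory while overriding one location individually. The
              cachedir value comes from the example in the text; the
              mboxcachedir path is hypothetical.

```
cachedir = ~/.cache/recoll
# With no other setting, dbdir becomes ~/.cache/recoll/xapiandb.

# An individual directory can still be set explicitly:
mboxcachedir = ~/.cache/recoll-shared/mboxcache
```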
429
430       maxfsoccuppc = int
431              Maximum  file system occupation over which we stop indexing. The
432              value is a percentage, corresponding to what the  "Capacity"  df
433              output  column  shows. The default value is 0, meaning no check‐
434              ing.
435
436       dbdir = dfn
437              Xapian database directory location.  This  will  be  created  on
438              first indexing. If the value is not an absolute path, it will be
439              interpreted as relative to cachedir if set, or the configuration
440              directory (-c argument or $RECOLL_CONFDIR).  If nothing is spec‐
441              ified, the default is then ~/.recoll/xapiandb/
442
443       idxstatusfile = fn
444              Name of the scratch file where the indexer process  updates  its
445              status.  Default:  idxstatus.txt inside the configuration direc‐
446              tory.
447
448       mboxcachedir = dfn
449              Directory location for storing mbox message offsets cache files.
450              This  is  normally  'mboxcache'  under  cachedir if set, or else
451              under the configuration directory, but it may be useful to share
452              a directory between different configurations.
453
454       mboxcacheminmbs = int
455              Minimum mbox file size over which we cache the offsets. There is
456              really no sense in caching offsets for small files. The  default
457              is 5 MB.
458
459       mboxmaxmsgmbs = int
460              Maximum  mbox  member message size in megabytes. Size over which
461              we assume that the mbox format is bad or we  misinterpreted  it,
462              at which point we just stop processing the file.
463
       webcachedir = dfn
              Directory where we store the archived web pages. This is only
              used by the web history indexing code. Default:
              cachedir/webcache if cachedir is set, else
              $RECOLL_CONFDIR/webcache
468
469       webcachemaxmbs = int
470              Maximum  size in MB of the Web archive. This is only used by the
471              web history indexing code.  Default: 40 MB.  Reducing  the  size
472              will not physically truncate the file.
473
474       webqueuedir = fn
475              The  path  to the Web indexing queue. This used to be hard-coded
476              in the old plugin as ~/.recollweb/ToIndex so there would  be  no
477              need  or  possibility to change it, but the WebExtensions plugin
478              now downloads the files to the user Downloads directory,  and  a
479              script  moves  them  to webqueuedir. The script reads this value
480              from the config so it has become possible to change it.
481
482       webdownloadsdir = fn
483              The path to browser downloads directory. This is where  the  new
484              browser  add-on extension has to create the files. They are then
485              moved by a script to webqueuedir.
486
487       aspellDicDir = dfn
488              Aspell dictionary storage directory location. The aspell dictio‐
489              nary  (aspdict.(lang).rws)  is  normally stored in the directory
490              specified by cachedir if set, or under the configuration  direc‐
491              tory.
492
493       filtersdir = dfn
494              Directory location for executable input handlers. If RECOLL_FIL‐
495              TERSDIR is set in the environment, we use it  instead.  Defaults
496              to  $prefix/share/recoll/filters. Can be redefined for subdirec‐
497              tories.
498
499       iconsdir = dfn
500              Directory location for icons. The only  reason  to  change  this
501              would be if you want to change the icons displayed in the result
502              list. Defaults to $prefix/share/recoll/images
503
       idxflushmb = int
              Threshold (megabytes of new data) where we flush from memory
              to disk index. Setting this allows some control over memory
              usage by the indexer process. A value of 0 means no explicit
              flushing, which lets Xapian perform its own thing, meaning
              flushing every $XAPIAN_FLUSH_THRESHOLD documents created,
              modified or deleted. As memory usage depends on average
              document size, not only document count, the Xapian approach is
              not very useful, and you should let Recoll manage the flushes.
              The program compiled value is 0. The configured default value
              (from this file) is now 50 MB, and should be ok in many cases.
              You can set it as low as 10 to conserve memory, but if you are
              looking for maximum speed, you may want to experiment with
              values between 20 and 200. In my experience, values beyond
              this are always counterproductive. If you find otherwise,
              please drop me a note.
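
              An illustrative setting within the range discussed above; the
              value 100 is an arbitrary choice for experimentation, not a
              recommendation.

```
# Flush to the disk index after about 100 MB of new data
# (configured default: 50, compiled-in value: 0).
idxflushmb = 100
```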
519
       filtermaxseconds = int
              Maximum external filter execution time in seconds. Default
              1200 (20 min). Set to 0 for no limit. This is mainly to avoid
              infinite loops in PostScript files (loop.ps).

       filtermaxmbytes = int
              Maximum virtual memory space for filter processes
              (setrlimit(RLIMIT_AS)), in megabytes. Note that this includes
              any mapped libs (there is no reliable Linux way to limit the
              data space only), so we need to be a bit generous here.
              Anything over 2000 will be ignored on 32-bit machines. The
              previous default value of 2000 would prevent java pdftk from
              working when executed from Python rclpdf.py.
533
534       thrQSizes = string
535              Stage  input  queues  configuration.  There  are  three internal
536              queues in the indexing pipeline stages  (file  data  extraction,
537              terms  generation,  index  update).  This  parameter defines the
538              queue depths for each stage (three integer values). If  a  value
539              of  -1  is  given  for  a given stage, no queue is used, and the
540              thread will go on performing the next stage. In  practise,  deep
541              queues  have  not been shown to increase performance. Default: a
542              value of 0 for the first queue tells Recoll to perform  autocon‐
543              figuration based on the detected number of CPUs (no need for the
544              two other values in this case).  Use thrQSizes =  -1  -1  -1  to
545              disable multithreading entirely.
546
547       thrTCounts = string
548              Number of threads used for each indexing stage. The three stages
549              are: file data extraction, terms generation, index update).  The
550              use  of  the counts is also controlled by some special values in
551              thrQSizes: if the first queue depth is 0, all counts are ignored
552              (autoconfigured);  if  a  value of -1 is used for a queue depth,
553              the corresponding thread count is ignored. It makes no sense  to
554              use a value other than 1 for the last stage because updating the
555              Xapian index is necessarily single-threaded (and protected by  a
556              mutex).
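
              As an illustration, a manual configuration for a machine with
              several cores might look as follows. The specific depths and
              counts are assumptions chosen for the example, not recommended
              values:

                     # Queue depth of 2 for each of the three stages
                     thrQSizes = 2 2 2
                     # 4 extraction threads, 2 terms generation threads, and
                     # a single index update thread (the Xapian update is
                     # single-threaded)
                     thrTCounts = 4 2 1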
557
558       loglevel = int
559              Log  file verbosity 1-6. A value of 2 will print only errors and
560              warnings. 3 will print information like document updates,  4  is
561              quite verbose and 6 very verbose.
562
563       logfilename = fn
564              Log  file  destination.  Use  'stderr' (default) to write to the
565              console.
566
567       idxloglevel = int
568              Override loglevel for the indexer.
569
570       idxlogfilename = fn
571              Override logfilename for the indexer.
572
573       daemloglevel = int
574              Override loglevel for the indexer in real time mode. The default
575              is to use the idx... values if set, else the log... values.
576
577       daemlogfilename = fn
578              Override  logfilename  for  the  indexer  in real time mode. The
579              default is to use the idx... values if set, else the log... val‐
580              ues.
581
582       pyloglevel = int
583              Override loglevel for the python module.
584
585       pylogfilename = fn
586              Override logfilename for the python module.
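
              For example, the following sketch keeps errors and warnings on
              stderr in general, while giving the indexer its own, more
              verbose log. The log file path is a placeholder chosen for the
              example:

                     loglevel = 2
                     logfilename = stderr
                     # Indexer only: also log information like document updates
                     idxloglevel = 3
                     idxlogfilename = /tmp/recollindex.log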
587
588       orgidxconfdir = dfn
589              Original  location  of the configuration directory. This is used
590              exclusively for movable  datasets.  Locating  the  configuration
591              directory inside the directory tree makes it possible to provide
592              automatic query time path translations once  the  data  set  has
593              moved (for example, because it has been mounted on another loca‐
594              tion).
595
596       curidxconfdir = dfn
597              Current location of the configuration directory. Complements
598              orgidxconfdir for movable datasets. This should be used if the
599              configuration directory has been copied from the dataset to
600              another location, either because the dataset is read-only and an
601              r/w copy is desired, or for performance reasons. This records
602              the original moved location before copy, to allow path
603              translation computations. For example, if a dataset originally indexed
604              as     '/home/me/mydata/config'     has    been    mounted    to
605              '/media/me/mydata', and the GUI is running from a copied config‐
606              uration,  orgidxconfdir  would  be '/home/me/mydata/config', and
607              curidxconfdir (as set in the copied configuration) would be
608
609       idxrundir = dfn
610              Indexing process current directory. The input handlers sometimes
611              leave  temporary  files  in  the  current directory, so it makes
612              sense to have recollindex chdir to some temporary directory.  If
613              the value is empty, the current directory is not changed. If the
614              value is (literal) tmp, we use the temporary directory as set by
615              the  environment  (RECOLL_TMPDIR  else TMPDIR else /tmp). If the
616              value is an absolute path to a directory, we go there.
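
              For example, to have recollindex change to the temporary
              directory set by the environment before indexing:

                     idxrundir = tmp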
617
618       checkneedretryindexscript = fn
619              Script used to heuristically check if we need to retry  indexing
620              files which previously failed. The default script checks the
621              modification dates on /usr/bin and /usr/local/bin. A relative
622              path will be looked up in the filter directories, then in the
623              PATH. Use an absolute path to do otherwise.
624
625       recollhelperpath = string
626              Additional places to search for helper executables. This is only
627              used on Windows for now.
628
629       idxabsmlen = int
630              Length  of  abstracts  we store while indexing. Recoll stores an
631              abstract for each indexed file.   The  text  can  come  from  an
632              actual  'abstract'  section  in the document or will just be the
633              beginning of the document. It is stored in the index so that  it
634              can  be  displayed  inside the result lists without decoding the
635              original file. The idxabsmlen parameter defines the size of  the
636              stored  abstract.  The  default  value  is 250 bytes. The search
637              interface gives you the choice to display this stored text or  a
638              synthetic  abstract  built  by extracting text around the search
639              terms. If you always prefer  the  synthetic  abstract,  you  can
640              reduce this value and save a little space.
641
642       idxmetastoredlen = int
643              Truncation  length  of  stored  metadata  fields.  This does not
644              affect indexing (the whole field is processed anyway), just  the
645              amount of data stored in the index for the purpose of displaying
646              fields inside result lists or previews. The default value is 150
647              bytes which may be too low if you have custom fields.
648
649       idxtexttruncatelen = int
650              Truncation  length for all document texts. Only index the begin‐
651              ning of documents. This is not recommended  except  if  you  are
652              sure  that  the  interesting  keywords  are  at the top and have
653              severe disk space issues.
654
655       aspellLanguage = string
656              Language definitions to use when creating the aspell dictionary.
657              The  value must match a set of aspell language definition files.
658              You can type "aspell dicts" to see a list. The default if this is
659              not set is to use the NLS environment to guess the value. The
660              values are the 2-letter language codes (e.g. 'en', 'fr'...).
661
662       aspellAddCreateParam = string
663              Additional option and parameter to  aspell  dictionary  creation
664              command.  Some  aspell  packages  may  need an additional option
665              (e.g. on Debian Jessie:  --local-data-dir=/usr/lib/aspell).  See
666              Debian bug 772415.
667
668       aspellKeepStderr = bool
669              Set  this  to  have a look at aspell dictionary creation errors.
670              There are always many, so this is mostly for debugging.
671
672       noaspell = bool
673              Disable aspell use. The aspell dictionary generation takes time,
674              and  some  combinations  of  aspell version, language, and local
675              terms, result in aspell crashing, so it sometimes makes sense to
676              just disable the thing.
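
              A minimal sketch of the two usual choices, assuming that the
              French aspell definitions are installed and that boolean values
              are written as 0 or 1. Only one of the two settings would
              normally be used:

                     # Build the aspell dictionary from the French definitions
                     aspellLanguage = fr

                     # Or, alternatively, disable aspell use altogether
                     # noaspell = 1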
677
678       monauxinterval = int
679              Auxiliary  database  update interval. The real time indexer only
680              updates the auxiliary databases (stemdb,  aspell)  periodically,
681              because  it  would  be  too  costly  to do it for every document
682              change. The default period is one hour.
683
684       monixinterval = int
685              Minimum interval (seconds) between processings of  the  indexing
686              queue. The real time indexer does not process each event when it
687              comes in, but lets the queue accumulate,  to  diminish  overhead
688              and  to  aggregate  multiple  events  affecting  the  same file.
689              Default: 30 seconds.
690
691       mondelaypatterns = string
692              Timing parameters for the real time  indexing.  Definitions  for
693              files  which  get  a  longer delay before reindexing is allowed.
694              This is for fast-changing files, that should only  be  reindexed
695              once  in  a  while. A list of wildcardPattern:seconds pairs. The
696              patterns are matched with fnmatch(pattern, path, 0). You can
697              quote  entries  containing white space with double quotes (quote
698              the whole entry, not the pattern). The default is empty.   Exam‐
699              ple: mondelaypatterns = *.log:20 "*with spaces.*:30"
700
701       idxniceprio = int
702              "nice" process priority for the indexing processes. Default: 19
703              (lowest). Appeared with 1.26.5. Prior versions were fixed at 19.
704
705       monioniceclass = int
706              ionice class for the indexing process.  Despite  the  misleading
707              name, and on platforms where this is supported, this affects all
708              indexing processes, not only the real time/monitoring ones.  The
709              default value is 3 (use lowest "Idle" priority).
710
711       monioniceclassdata = string
712              ionice  class  level  parameter  if  the  class supports it. The
713              default is empty, as the default "Idle" class has no levels.
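
              For example, to run the indexing processes at a less extreme
              "nice" priority while keeping the "Idle" ionice class. The
              value 10 is an arbitrary choice for the example, and idxniceprio
              only has an effect from version 1.26.5:

                     idxniceprio = 10
                     monioniceclass = 3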
714
715       autodiacsens = bool
716              Auto-trigger diacritics sensitivity (raw index only). If the
717              index  is  not stripped, decide if we automatically trigger dia‐
718              critics sensitivity if the search term has  accented  characters
719              (not  in unac_except_trans). Else you need to use the query lan‐
720              guage and the "D" modifier to  specify  diacritics  sensitivity.
721              Default is no.
722
723       autocasesens = bool
724              Auto-trigger case sensitivity (raw index only). If the index is
725              not stripped (see indexStripChars), decide if  we  automatically
726              trigger character case sensitivity if the search term has upper-
727              case characters in any but the first position. Else you need  to
728              use  the  query language and the "C" modifier to specify charac‐
729              ter-case sensitivity. Default is yes.
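
              As a sketch, on a raw (unstripped) index, the following enables
              both automatic triggers. Writing the boolean values as 1 is an
              assumption of this example:

                     autodiacsens = 1
                     autocasesens = 1

              With these settings, a search term containing accented
              characters would be matched with diacritics sensitivity, and a
              term such as myVariable would be matched case-sensitively. A
              term such as Paris would not, because its only upper-case
              character is in the first position.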
730
731       maxTermExpand = int
732              Maximum query expansion count for  a  single  term  (e.g.:  when
733              using  wildcards).  This  only affects queries, not indexing. We
734              used to not limit this at all (except for  filenames  where  the
735              limit  was  too  low at 1000), but it is unreasonable with a big
736              index. Default 10000.
737
738       maxXapianClauses = int
739              Maximum number of clauses we add to a single Xapian query.  This
740              only affects queries, not indexing. In some cases, the result of
741              term expansion can be multiplicative, and we want to avoid  eat‐
742              ing all the memory. Default 50000.
743
744       snippetMaxPosWalk = int
745              Maximum  number  of positions we walk while populating a snippet
746              for the result list. The default of 1,000,000 may be insufficient
747              for very big documents; the consequence would be snippets
748              with possibly meaning-altering missing words.
749
750       pdfocr = bool
751              Attempt OCR of PDF files with  no  text  content.  This  can  be
752              defined  in subdirectories. The default is off because OCR is so
753              very slow.
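
              Since this can be defined in subdirectories, OCR can be left
              off globally and enabled only where scanned documents are kept.
              The directory name below is hypothetical, and writing the
              boolean values as 0 and 1 is an assumption of this example:

                     pdfocr = 0

                     [~/scans]
                     pdfocr = 1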
754
755       pdfattach = bool
756              Enable PDF attachment extraction by executing pdftk  (if  avail‐
757              able).  This is normally disabled, because it does slow down PDF
758              indexing a bit even if not one attachment is ever found.
759
760       pdfextrameta = string
761              Extract text from selected XMP metadata tags. This is  a  space-
762              separated list of qualified XMP tag names. Each element can also
763              include a translation to a Recoll field name, separated by a '|'
764              character. If the second element is absent, the tag name is used
765              as the Recoll field name. You will also need to add specifications
766              to the "fields" file to direct processing of the extracted
767              data.
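
              For illustration, assuming PDF files which carry the XMP tags
              shown (the tag names are only examples), the first element
              below translates a tag to the Recoll field name 'location', and
              the second uses the tag name itself as the field name:

                     pdfextrameta = bibtex:location|location bibtex:booktitle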
768
769       pdfextrametafix = fn
770              Define name of XMP field editing script. This defines  the  name
771              of  a  script  to  be  loaded  for editing XMP field values. The
772              script should define a 'MetaFixer' class with a metafix() method
773              which  will  be  called with the qualified tag name and value of
774              each selected field, for editing or erasing. A new  instance  is
775              created  for  each  document,  so that the object can keep state
776              for, e.g. eliminating duplicate values.
777
778       ocrprogs = string
779              OCR modules to try. The top OCR script will try to load the cor‐
780              responding  modules  in  order  and  use the first which reports
781              being capable of performing OCR on the input file.  Modules  for
782              tesseract  (tesseract)  and ABBYY FineReader (abbyy) are present
783              in the standard distribution. For compatibility with the  previ‐
784              ous version, if this is not defined at all, the default value is
785              "tesseract". Use an explicit empty value if needed. A  value  of
786              "abbyy tesseract" will try everything.
787
788       ocrcachedir = dfn
789              Location  for  caching OCR data. The default if this is empty or
790              undefined  is  to  store  the  cached  OCR  data   under   $REC‐
791              OLL_CONFDIR/ocrcache.
792
793       tesseractlang = string
794              Language  to  assume  for tesseract OCR. Important for improving
795              the OCR accuracy. This can also be set through the contents of a
796              file in the currently processed directory. See the rclocrtesser‐
797              act.py script. Example values: eng,  fra...  See  the  tesseract
798              documentation.
799
800       tesseractcmd = fn
801              Path  for  the  tesseract  command. Do not quote. This is mostly
802              useful on Windows, or for  specifying  a  non-default  tesseract
803              command. E.g. on Windows:
804              tesseractcmd = C:/Program Files (x86)/Tesseract-OCR/tesseract.exe
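
              Putting the OCR parameters together, a configuration using only
              tesseract on English-language documents might look as follows.
              This is a sketch: it assumes that tesseract is installed and
              found in the path, so that tesseractcmd is not needed, and that
              boolean values are written as 1:

                     pdfocr = 1
                     ocrprogs = tesseract
                     tesseractlang = eng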
805
806
807       abbyylang = string
808              Language to assume for abbyy OCR. Important  for  improving  the
809              OCR  accuracy.  This  can  also be set through the contents of a
810              file in the currently processed  directory.  See  the  rclocrab‐
811              byy.py  script. Typical values: English, French... See the ABBYY
812              documentation.
813
814
815       abbyycmd = fn
816              Path for the abbyy command. The ABBYY directory is usually not in
817              the path, so you should set this.
818
819
820       mhmboxquirks = string
821              Enable thunderbird/mozilla-seamonkey mbox format quirks. Set this
822              for the directory where the email mbox files are stored.
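
              As a sketch, this would typically be set inside a section for
              the mail directory. The value 'tbird' is the one used in the
              sample configuration as far as I know, and the directory is an
              assumption about where the mbox files are stored:

                     [~/.thunderbird]
                     mhmboxquirks = tbird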
823
824
825

SEE ALSO

827       recollindex(1), recoll(1)
828
829
830
831                               14 November 2012                 RECOLL.CONF(5)