RECOLL.CONF(5)                File Formats Manual               RECOLL.CONF(5)

NAME

       recoll.conf - main personal configuration file for Recoll

DESCRIPTION

       This file defines the indexing configuration for the Recoll full-text
       search system.

       The system-wide configuration file is normally located inside
       /usr/[local]/share/recoll/examples. Any parameter set in the
       system-wide file may be overridden by setting it in the personal
       configuration file, by default: $HOME/.recoll/recoll.conf

       Please note that while we try to keep this manual page reasonably up
       to date, it will frequently lag behind the current state of the
       software. The best source of information about the configuration is
       the comments in the configuration file.


       A short extract of the file might look as follows:

              # Space-separated list of directories to index.
              topdirs =  ~/docs /usr/share/doc

              [~/somedirectory-with-utf8-txt-files]
              defaultcharset = utf-8


       There are three kinds of lines:

              ·      Comment or empty

              ·      Parameter assignment

              ·      Section definition

       Empty lines and lines beginning with # are ignored.

       Assignment lines are in the form 'name = value'.

       Section lines allow redefining a parameter for a directory subtree.
       Some of the parameters used for indexing are looked up hierarchically,
       from the most specific section to the least specific. Not all
       parameters can be meaningfully redefined; this is specified for each
       parameter in the next section.
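
       As a sketch of the hierarchical lookup, the fragment below sets a
       global defaultcharset and overrides it for one subtree. The directory
       name and the charset values are hypothetical, chosen only for
       illustration:

              # Global value, used wherever no section applies.
              defaultcharset = iso-8859-1

              # More specific value for files below this directory.
              [~/docs/utf8-notes]
              defaultcharset = utf-8

       With these assumed settings, a file under ~/docs/utf8-notes would be
       read as utf-8, and files elsewhere as iso-8859-1.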

       The tilde character (~) is expanded in file names to the name of the
       user's home directory.

       Where values are lists, white space is used for separation, and
       elements with embedded spaces can be quoted with double quotes.
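
       For instance, a list value containing a directory whose name has an
       embedded space could be written as follows (both paths are
       hypothetical):

              topdirs = ~/docs "/home/me/My Documents"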

OPTIONS

       topdirs = directories
              Specifies the list of directories to index (recursively).

       dbdir = directory
              The name of the Xapian database directory. It will be created if
              needed when the database is initialized. If this is not an
              absolute pathname, it will be taken relative to the
              configuration directory.

       skippedNames = patterns
              A space-separated list of patterns for names of files or
              directories that should be completely ignored. The list defined
              in the default file is:

              *~ #* bin CVS  Cache caughtspam  tmp

              The list can be redefined for subdirectories, but is only
              actually changed for the top level ones in topdirs.
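
              As an illustration, the fragment below repeats the default list
              and adds one pattern. The *.bak pattern is hypothetical, and the
              sketch assumes that a new value replaces the default list rather
              than extending it:

              skippedNames = *~ #* bin CVS Cache caughtspam tmp *.bak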

       skippedPaths = patterns
              A space-separated list of patterns for paths the indexer should
              not descend into. Together with topdirs, this allows pruning the
              indexed tree to one's liking. daemSkippedPaths can be used to
              define a specific value for the real-time indexing monitor.
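
              A possible use, with hypothetical paths, is to index one
              subtree while keeping the indexer out of two of its
              subdirectories, and to give the real-time monitor a longer
              list of its own:

              topdirs = ~/docs
              skippedPaths = ~/docs/archive ~/docs/private
              daemSkippedPaths = ~/docs/archive ~/docs/private ~/docs/build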

       followLinks = boolean
              Specifies whether the indexer should follow symbolic links while
              walking the file tree. The default is to ignore symbolic links,
              to avoid indexing linked files multiple times. No effort is made
              to avoid duplication when this option is set to true. This
              option can be set individually for each of the topdirs members
              by using sections. It cannot be changed below the topdirs
              level.
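
              A sketch of a per-directory setting, with hypothetical
              directory names. It assumes that boolean values are written as
              0 or 1, which this page does not specify:

              topdirs = ~/docs ~/projects

              [~/projects]
              followLinks = 1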

       loglevel = value
              Verbosity level for recoll and recollindex. A value of 4 lists
              quite a lot of debug/information messages. A value of 3 lists
              only errors. daemloglevel can be used to specify a different
              value for the real-time indexing daemon.

       logfilename = file
              Where the messages should go. 'stderr' can be used as a special
              value. daemlogfilename can be used to specify a different value
              for the real-time indexing daemon.
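
              For example, the following restricts recoll and recollindex to
              error messages on stderr, while the real-time daemon logs more
              verbosely to a file. The log file path is hypothetical:

              loglevel = 3
              logfilename = stderr
              daemloglevel = 4
              daemlogfilename = /tmp/recollindex-daemon.log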

       indexstemminglanguages = languages
              A list of languages for which the stem expansion databases will
              be built. See recollindex(1) for possible values.

       defaultcharset = charset
              The name of the character set used for files that do not contain
              a character set definition (e.g. plain text files). This can be
              redefined for any subdirectory.

       maxfsoccuppc = percentnumber
              Maximum file system occupation before we stop indexing. The
              value is a percentage, corresponding to what the "Capacity"
              column of the df output shows. The default value is 0, meaning
              no checking.
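
              For example, to stop indexing once df reports the file system
              as 90% full (the threshold is an arbitrary illustration):

              maxfsoccuppc = 90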

       idxflushmb = megabytes
              Threshold (megabytes of new text data) at which the index is
              flushed from memory to disk. Setting this can help control
              memory usage. A value of 0 means no explicit flushing, letting
              Xapian use its own default, which is to flush every 10000
              documents (or XAPIAN_FLUSH_THRESHOLD), meaning that memory
              usage depends on the average document size. The default value
              is 10.

       filtersdir = directory
              A directory to search for the external filter scripts used to
              index some types of files. The value should not be changed
              unless you want to modify one of the default scripts. The
              value can be redefined for any subdirectory.

       iconsdir = directory
              The name of the directory where recoll result list icons are
              stored. You can change this if you want different images.

       guesscharset = boolean
              Try to guess the character set of files if no internal value is
              available (e.g. for plain text files). This does not work well
              in general, and should probably not be used.

       usesystemfilecommand = boolean
              Decides whether the file -i system command is used as a final
              step for determining the mime type of a file (the main
              procedure uses suffix associations as defined in the mimemap
              file). This can be useful for files with suffixless names, but
              it will also cause many bogus "text" files to be indexed.

       indexedmimetypes = list
              Recoll normally indexes any file which it knows how to read.
              This list lets you restrict the indexed mime types to what you
              specify. If the variable is unspecified or the list is empty
              (the default), all supported types are processed.
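
              For example, to restrict indexing to HTML, PDF and plain text
              documents (the selection of types is only an illustration):

              indexedmimetypes = text/html application/pdf text/plain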

       compressedfilemaxkbs = value
              Size limit for compressed (.gz or .bz2) files. These need to be
              decompressed in a temporary directory for identification, which
              can be very wasteful if 'uninteresting' big compressed files are
              present. A negative value means no limit, 0 means no processing
              of any compressed file. Defaults to -1.
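
              For example, assuming that the value is expressed in kilobytes
              (as the parameter name suggests, although this page does not
              say so), the following would skip compressed files larger than
              about 20 MB:

              compressedfilemaxkbs = 20000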

       indexallfilenames = boolean
              Recoll indexes file names into a special section of the database
              to allow searching for specific file names using wildcards.
              This parameter decides whether file name indexing is performed
              only for files with mime types that would qualify them for
              full-text indexing, or for all files inside the selected
              subtrees, independently of mime type.

       idxabsmlen = value
              Recoll stores an abstract for each indexed file inside the
              database. The text can come from an actual 'abstract' section in
              the document or will just be the beginning of the document. It
              is stored in the index so that it can be displayed inside the
              result lists without decoding the original file. The idxabsmlen
              parameter defines the size of the stored abstract. The default
              value is 250 bytes. The search interface gives you the choice
              to display this stored text or a synthetic abstract built by
              extracting text around the search terms. If you always prefer
              the synthetic abstract, you can reduce this value and save a
              little space.

       aspellLanguage = lang
              Language definitions to use when creating the aspell dictionary.
              The value must match a set of aspell language definition files.
              You can type "aspell config" to see where these are installed
              (look for data-dir). If the variable is not set, the default is
              to use your desktop national language environment to guess the
              value.

       noaspell = boolean
              If this is set, the aspell dictionary generation is turned off.
              This is useful when you don't need the functionality, or when
              it is unusable because aspell crashes during dictionary
              generation.

       nocjk = boolean
              If this is set to true, specific East Asian (Chinese, Korean,
              Japanese) character/word splitting is turned off. This will
              save a small amount of CPU time if you have no CJK documents.
              If your document base does include such text but you are not
              interested in searching it, setting nocjk may be a significant
              time and space saver.

       cjkngramlen = value
              This lets you adjust the size of n-grams used for indexing CJK
              text. The default value of 2 is probably appropriate in most
              cases. A value of 3 would allow more precision and efficiency on
              longer words, but the index will be approximately twice as
              large.
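
              The two CJK settings would typically be used as alternatives.
              As a sketch, a document base with a lot of CJK text might use
              longer n-grams:

              cjkngramlen = 3

              whereas one with no CJK text of interest might turn the
              splitting off (this assumes that boolean values are written as
              0 or 1, which this page does not specify):

              nocjk = 1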

SEE ALSO

       recollindex(1), recoll(1)



                                8 January 2006                  RECOLL.CONF(5)