RECOLL.CONF(5)              File Formats Manual              RECOLL.CONF(5)

recoll.conf - main personal configuration file for Recoll

This file defines the indexing configuration for the Recoll full-text
search system.

The system-wide configuration file is normally located inside
/usr/[local]/share/recoll/examples. Any parameter set in the common
file may be overridden by setting it in the personal configuration
file, which is by default $HOME/.recoll/recoll.conf.

Please note that while we try to keep this manual page reasonably up
to date, it will frequently lag behind the current state of the
software. The best source of information about the configuration is
the comments in the configuration file itself.

A short extract of the file might look as follows:

    # Space-separated list of directories to index.
    topdirs = ~/docs /usr/share/doc

    [~/somedirectory-with-utf8-txt-files]
    defaultcharset = utf-8

There are three kinds of lines:

· Comment or empty

· Parameter assignment

· Section definition

Empty lines or lines beginning with # are ignored.

Assignment lines are in the form 'name = value'.

Section lines, written as a directory path between square brackets,
allow redefining a parameter for a directory subtree. Some of the
parameters used for indexing are looked up hierarchically, from the
more to the less specific. Not all parameters can be meaningfully
redefined; this is specified for each parameter below.

The tilde character (~) is expanded in file names to the name of the
user's home directory.

Where values are lists, white space is used for separation, and
elements with embedded spaces can be quoted with double quotes.

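As an illustration of the three kinds of lines, a hypothetical fragment
might read as follows. The directory names are invented for the
example, and iso-8859-1 is only an assumed charset value.

```conf
# Comment: the list below has two elements, the second one quoted
# because it contains an embedded space.
topdirs = ~/docs "/srv/shared files"
defaultcharset = utf-8

# Section: from here on, values apply to the ~/docs/legacy subtree only.
[~/docs/legacy]
defaultcharset = iso-8859-1
```

With the hierarchical lookup described above, a file under
~/docs/legacy would be read as iso-8859-1, and any other file as utf-8.
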
topdirs = directories
    Specifies the list of directories to index (recursively).

dbdir = directory
    The name of the Xapian database directory. It will be created if
    needed when the database is initialized. If this is not an
    absolute pathname, it will be taken relative to the configuration
    directory.

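A minimal sketch of a relative dbdir value, assuming the default
configuration directory $HOME/.recoll. The name myindex is
hypothetical.

```conf
# Relative path: resolved against the configuration directory,
# so this would designate $HOME/.recoll/myindex.
dbdir = myindex
```
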
skippedNames = patterns
    A space-separated list of patterns for names of files or
    directories that should be completely ignored. The list defined
    in the default file is:

        *~ #* bin CVS Cache caughtspam tmp

    The list can be redefined for subdirectories, but is only
    actually changed for the top level ones in topdirs.

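The following sketch adds two patterns to the default list. It assumes
that an assignment replaces the default list rather than adding to it,
which is why the defaults are repeated. The two extra patterns are
chosen purely for illustration.

```conf
# Default list, followed by two hypothetical extra patterns.
skippedNames = *~ #* bin CVS Cache caughtspam tmp *.o node_modules
```
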
skippedPaths = patterns
    A space-separated list of patterns for paths the indexer should
    not descend into. Together with topdirs, this allows pruning the
    indexed tree as needed. daemSkippedPaths can be used to define a
    specific value for the real time indexing monitor.

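A possible combination of topdirs and skippedPaths is sketched below.
All paths are hypothetical.

```conf
topdirs = ~/docs

# Do not descend into these subtrees.
skippedPaths = ~/docs/archive ~/docs/*/build

# A separate list for the real time indexing monitor.
daemSkippedPaths = ~/docs/archive ~/docs/scratch
```
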
followLinks = boolean
    Specifies if the indexer should follow symbolic links while
    walking the file tree. The default is to ignore symbolic links,
    to avoid multiple indexing of linked files. No effort is made to
    avoid duplication when this option is set to true. This option
    can be set individually for each of the topdirs members by using
    sections. It cannot be changed below the topdirs level.

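Setting followLinks for a single topdirs member could look like the
sketch below. The directories are hypothetical, and the spelling of
boolean values as 1 and 0 is an assumption, since this page does not
list the accepted forms.

```conf
topdirs = ~/docs ~/projects

# Default for all trees: ignore symbolic links.
followLinks = 0

# Follow links under this topdirs member only.
[~/projects]
followLinks = 1
```
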
loglevel = value
    Verbosity level for recoll and recollindex. A value of 4 lists
    quite a lot of debug/information messages. A value of 3 lists
    only errors. daemloglevel can be used to specify a different
    value for the real-time indexing daemon.

logfilename = file
    Where the messages should go. 'stderr' can be used as a special
    value. daemlogfilename can be used to specify a different value
    for the real-time indexing daemon.

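The four logging parameters might be combined as in this sketch. The
log file path is hypothetical.

```conf
# Errors only for recoll and recollindex, sent to stderr.
loglevel = 3
logfilename = stderr

# More verbose output for the real-time indexing daemon, in its own file.
daemloglevel = 4
daemlogfilename = /tmp/recoll-monitor.log
```
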
indexstemminglanguages = languages
    A list of languages for which the stem expansion databases will
    be built. See recollindex(1) for possible values.

defaultcharset = charset
    The name of the character set used for files that do not contain
    a character set definition (e.g. plain text files). This can be
    redefined for any subdirectory.

maxfsoccuppc = percentnumber
    Maximum file system occupation before indexing stops. The value
    is a percentage, corresponding to what the "Capacity" column of
    df output shows. The default value is 0, meaning no checking.

idxflushmb = megabytes
    Threshold (megabytes of new text data) at which the index is
    flushed from memory to disk. Setting this can help control memory
    usage. A value of 0 means no explicit flushing, letting Xapian
    use its own default, which is flushing every 10000 documents (or
    XAPIAN_FLUSH_THRESHOLD), meaning that memory usage depends on
    average document size. The default value is 10.

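As a sketch, the two resource limits above could be set as follows.
The values 90 and 50 are arbitrary examples, not recommendations.

```conf
# Stop indexing once the file system is 90% full.
maxfsoccuppc = 90

# Flush to disk after every 50 megabytes of new text data.
idxflushmb = 50
```
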
filtersdir = directory
    A directory to search for the external filter scripts used to
    index some types of files. The value should not be changed,
    except if you want to modify one of the default scripts. The
    value can be redefined for any subdirectory.

iconsdir = directory
    The name of the directory where recoll result list icons are
    stored. You can change this if you want different images.

guesscharset = boolean
    Try to guess the character set of files if no internal value is
    available (e.g. for plain text files). This does not work well in
    general, and should probably not be used.

usesystemfilecommand = boolean
    Decides whether the file -i system command is used as a final
    step for determining the mime type of a file (the main procedure
    uses suffix associations as defined in the mimemap file). This
    can be useful for files with suffixless names, but it will also
    cause the indexing of many bogus "text" files.

indexedmimetypes = list
    Recoll normally indexes any file which it knows how to read.
    This list lets you restrict the indexed mime types to what you
    specify. If the variable is unspecified or the list empty (the
    default), all supported types are processed.

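A restricted list might look like the sketch below, assuming the usual
white space separation for list values. The three types are only an
example selection.

```conf
# Index only these three types.
indexedmimetypes = text/plain text/html application/pdf
```
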
compressedfilemaxkbs = value
    Size limit for compressed (.gz or .bz2) files. These need to be
    decompressed in a temporary directory for identification, which
    can be very wasteful if 'uninteresting' big compressed files are
    present. Negative means no limit, 0 means no processing of any
    compressed file. Defaults to -1.

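For instance, a limit could be set as below. The unit is assumed to be
kilobytes, as the parameter name suggests, and the figure itself is
arbitrary.

```conf
# Skip compressed files larger than about 20 MB.
compressedfilemaxkbs = 20000
```
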
indexallfilenames = boolean
    Recoll indexes file names into a special section of the database
    to allow file name searches using wildcards. This parameter
    decides if file name indexing is performed only for files with
    mime types that would qualify them for full text indexing, or
    for all files inside the selected subtrees, independent of mime
    type.

idxabsmlen = value
    Recoll stores an abstract for each indexed file inside the
    database. The text can come from an actual 'abstract' section in
    the document or will just be the beginning of the document. It is
    stored in the index so that it can be displayed inside the result
    lists without decoding the original file. The idxabsmlen
    parameter defines the size of the stored abstract. The default
    value is 250 bytes. The search interface gives you the choice to
    display this stored text or a synthetic abstract built by
    extracting text around the search terms. If you always prefer the
    synthetic abstract, you can reduce this value and save a little
    space.

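If the synthetic abstract is always preferred, the stored one could be
shortened as in this sketch. The value 100 is an arbitrary example.

```conf
# Store shorter abstracts than the 250-byte default.
idxabsmlen = 100
```
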
aspellLanguage = lang
    Language definitions to use when creating the aspell dictionary.
    The value must match a set of aspell language definition files.
    You can type "aspell config" to see where these are installed
    (look for data-dir). The default if the variable is not set is
    to use your desktop national language environment to guess the
    value.

noaspell = boolean
    If this is set, the aspell dictionary generation is turned off.
    Useful for cases where you don't need the functionality or when
    it is unusable because aspell crashes during dictionary
    generation.

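The two aspell parameters could be used as sketched below. The value
en assumes that English aspell definition files are installed, and the
boolean spelling 1 is an assumption.

```conf
# Build the aspell dictionary from the English definitions.
aspellLanguage = en

# Alternatively, uncomment the next line to turn dictionary
# generation off entirely.
#noaspell = 1
```
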
nocjk = boolean
    If this is set to true, the specific East Asian (Chinese, Korean,
    Japanese) character/word splitting is turned off. This will save
    a small amount of CPU if you have no CJK documents. If your
    document base does include such text but you are not interested
    in searching it, setting nocjk may be a significant time and
    space saver.

cjkngramlen = value
    This lets you adjust the size of n-grams used for indexing CJK
    text. The default value of 2 is probably appropriate in most
    cases. A value of 3 would allow more precision and efficiency on
    longer words, but the index will be approximately twice as large.

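The CJK settings above might be applied as in this sketch. The boolean
spelling 1 is an assumption.

```conf
# Trade index size for precision on longer CJK words.
cjkngramlen = 3

# Alternatively, with no interest in searching CJK text:
#nocjk = 1
```
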
recollindex(1), recoll(1)

8 January 2006                                            RECOLL.CONF(5)