MU-INDEX(1)             General Commands Manual             MU-INDEX(1)

NAME
mu index - index e-mail messages stored in Maildirs

SYNOPSIS
mu index [options]

DESCRIPTION
mu index is the mu command for scanning the contents of Maildir
directories and storing the results in a Xapian database. The data can
then be queried using mu-find(1).

mu index understands Maildirs as defined by Daniel Bernstein for
qmail(7). In addition, it understands recursive Maildirs (Maildirs
within Maildirs) and Maildir++. It can also deal with VFAT-based
Maildirs, which use '!' as the separator instead of ':'.
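
The separator matters because it splits a Maildir file name into its
unique part and its "info" suffix, which carries the message flags. The
POSIX shell sketch below illustrates this; maildir_flags is a
hypothetical helper, not part of mu, and it assumes the usual "2,"
info prefix followed by flag letters (for example R for replied and S
for seen).

```shell
#!/bin/sh
# Hypothetical helper (not part of mu): print the flag letters from the
# info suffix of a Maildir file name. Standard Maildirs separate the
# unique name from the info with ':'; VFAT-based ones use '!', so '!'
# is mapped to ':' first (a simplification that assumes '!' does not
# occur in the unique part).
maildir_flags() {
    name=$(printf '%s\n' "${1##*/}" | tr '!' ':')
    case $name in
        *:2,*) printf '%s\n' "${name##*:2,}" ;;
        *)     printf '\n' ;;  # no info suffix, typical for messages in new/
    esac
}

maildir_flags 'cur/1456789012.M1P2.host:2,RS'   # prints RS
maildir_flags 'cur/1456789012.M1P2.host!2,RS'   # prints RS (VFAT style)
```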

E-mail messages which are not stored in something resembling a maildir
leaf-directory (cur and new) are ignored, as are the cache directories
for notmuch and Gnus, and any dot-directory.

Symlinks are not followed.

If there is a file called .noindex in a directory, the contents of that
directory and all of its subdirectories will be ignored. This can be
useful to exclude certain directories from the indexing process, for
example directories with spam messages.

If there is a file called .noupdate in a directory, the contents of
that directory and all of its subdirectories will be ignored, unless we
do a full rebuild (with --rebuild). This can be useful to speed things
up if you have some maildirs that never change. Note that you can still
search for these messages; this only affects updating the database.
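
The marker files are ordinary empty files, so they can be created with
touch. The sketch below shows how the two rules combine for a given
directory; marker_for is a hypothetical helper (mu performs this check
internally while scanning), and the demo tree with its spam, archive
and inbox folders is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of mu): print "noindex", "noupdate" or
# "index" for directory $1, looking at $1 and every parent directory up
# to the maildir root $2. A .noindex anywhere on that path wins.
marker_for() {
    d=$1 root=$2 result=index
    while :; do
        if [ -e "$d/.noindex" ]; then
            result=noindex                        # ignored entirely
            break
        fi
        [ -e "$d/.noupdate" ] && result=noupdate  # skipped unless --rebuild
        { [ "$d" = "$root" ] || [ "$d" = / ]; } && break
        d=$(dirname "$d")
    done
    printf '%s\n' "$result"
}

# Demo in a throw-away directory tree.
root=$(mktemp -d)
mkdir -p "$root/spam/cur" "$root/archive/2015/cur" "$root/inbox/cur"
touch "$root/spam/.noindex"       # never index spam
touch "$root/archive/.noupdate"   # archived mail never changes
marker_for "$root/spam/cur" "$root"           # prints noindex
marker_for "$root/archive/2015/cur" "$root"   # prints noupdate
marker_for "$root/inbox/cur" "$root"          # prints index
```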

There is also the --lazy-check option, which can greatly speed up
indexing; see below for details.

The first run of mu index may take a few minutes if you have a lot of
mail (tens of thousands of messages). Fortunately, such a full scan
needs to be done only once; after that it suffices to index the
changes, which goes much faster. See 'A note on performance (i)',
'(ii)' and '(iii)' below for more information.

The optional 'phase two' of the indexing process is the removal of
messages from the database for which there is no longer a corresponding
file in the Maildir. If you do not want this, you can use -n,
--nocleanup.
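
The idea behind this cleanup phase can be sketched in a few lines of
shell: an entry is stale when the file it points to no longer exists.
In this sketch the "database" is a hypothetical text file with one
message path per line; mu's real database is Xapian, and stale_entries
is not a mu command.

```shell
#!/bin/sh
# Hypothetical helper: print every path listed in file $1 whose message
# file no longer exists, i.e. the entries a cleanup would remove.
stale_entries() {
    while IFS= read -r path; do
        [ -e "$path" ] || printf '%s\n' "$path"
    done < "$1"
}

# Demo: two indexed messages, one of which has since been deleted.
dir=$(mktemp -d)
mkdir -p "$dir/cur"
touch "$dir/cur/1:2,S" "$dir/cur/2:2,S"
printf '%s\n' "$dir/cur/1:2,S" "$dir/cur/2:2,S" > "$dir/index.txt"
rm "$dir/cur/2:2,S"
stale_entries "$dir/index.txt"    # prints the path of message 2 only
```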

When mu index catches one of the signals SIGINT, SIGHUP or SIGTERM
(e.g., when you press Ctrl-C during the indexing process), it tries to
shut down gracefully: it tries to save and commit data, close the
database, etc. If it receives another signal (e.g., when pressing
Ctrl-C once more), mu index will terminate immediately.

OPTIONS
Note, some of the general options are described in the mu(1) man page
and not here, as they apply to multiple mu commands.

-m, --maildir=<maildir>
starts searching at <maildir>. By default, mu uses whatever the
MAILDIR environment variable is set to; if it is not set, it tries
~/Maildir. See the note on mixing sub-maildirs below.

--my-address=<my-email-address>
specifies that some e-mail address is 'my-address' (--my-address can
be used multiple times). This is used by mu cfind: any e-mail address
found in the address fields of a message which also has
<my-email-address> in one of its address fields is considered a
personal e-mail address. This allows you, for example, to filter out
(mu cfind --personal) addresses which were merely seen in mailing list
messages.
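
The rule for personal addresses can be sketched with awk. This is an
illustration of the rule as stated above, not mu's implementation: the
input format (one message per line, listing that message's addresses
separated by spaces) and the helper name personal_addresses are
assumptions, and matching is exact and case-sensitive for simplicity.

```shell
#!/bin/sh
# Hypothetical helper: the arguments are "my" addresses; stdin holds one
# message per line with its addresses. Print (sorted) every address that
# occurs in a message which also contains one of my addresses.
personal_addresses() {
    awk -v mine="$*" '
        BEGIN { n = split(mine, m, " "); for (i = 1; i <= n; i++) me[m[i]] = 1 }
        {
            hit = 0
            for (i = 1; i <= NF; i++) if ($i in me) hit = 1
            if (hit) for (i = 1; i <= NF; i++) seen[$i] = 1
        }
        END { for (a in seen) print a }
    ' | LC_ALL=C sort
}

# The mailing-list message does not involve me, so carol and the list
# address are not considered personal.
printf '%s\n' \
    'jane@example.com bob@example.net' \
    'list@lists.example.org carol@example.net' \
    'alice@example.org jane.doe@example.org' |
    personal_addresses jane@example.com jane.doe@example.org
```

With the sample input, this prints alice@example.org, bob@example.net,
jane.doe@example.org and jane@example.com, one per line.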

--lazy-check
in lazy-check mode, mu does not consider messages for which the
time-stamp (ctime) of the directory they reside in has not changed
since the previous indexing run. This is much faster than the non-lazy
check, but won't update messages that have changed (rather than having
been added or removed), since merely editing a message does not update
the directory time-stamp. Of course, you can run mu index occasionally
without --lazy-check to pick up such messages.
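
Why an edited message escapes a lazy check can be demonstrated with
find's -cnewer test (supported by GNU and BSD find), using a stamp file
to stand in for "the previous indexing run". This is only a sketch of
the underlying idea; mu keeps its own per-directory time-stamps, and
the directory and file names here are made up.

```shell
#!/bin/sh
# Demo: which directories changed (by ctime) since a reference point?
root=$(mktemp -d)
mkdir -p "$root/inbox/cur" "$root/inbox/new" "$root/archive/cur"
echo "Subject: old" > "$root/archive/cur/1:2,S"
sleep 1
stamp=$(mktemp)            # stands in for "the previous indexing run"
sleep 1
# New mail arrives: adding a file updates the directory's ctime.
echo "Subject: new" > "$root/inbox/new/2"
# Editing a message in place does NOT touch its directory, so a lazy
# check would not notice this change.
echo "edited" >> "$root/archive/cur/1:2,S"
# Directories a lazy check would rescan: only $root/inbox/new
find "$root" -type d -cnewer "$stamp"
```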

--nocleanup
disables the database cleanup that mu does by default after indexing.

--rebuild
clear all messages from the database before indexing. --rebuild
guarantees that after the indexing has finished, there are no 'old'
messages in the database anymore, which is not true with --reindex when
indexing only part of the messages (using --maildir). For this reason,
it is necessary to run mu index --rebuild when there is an upgrade in
the database format; mu index will issue a warning about this.

--autoupgrade
automatically use -y, --empty when mu notices that the database version
is not up-to-date. This option is for use in cron scripts and the like,
so they won't require any user interaction, even when mu introduces a
new database version.
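
A cron script might combine --autoupgrade with --lazy-check and still
do an occasional full check, as suggested for --lazy-check above. The
function below is a hypothetical sketch: its name, the weekly schedule
(full check on Sunday) and the MU override used for dry runs are
assumptions, and only options documented in this page or used in its
examples are passed to mu.

```shell
#!/bin/sh
# Hypothetical cron helper: lazy check on most days, a full check once
# a week to pick up messages edited in place, and --autoupgrade so that
# a new database version never waits for user interaction.
# $1: day of the week, 1-7 with Monday = 1 (defaults to `date +%u`).
# Set MU to override the mu binary, e.g. MU=echo for a dry run.
cron_index() {
    day=${1:-$(date +%u)}
    if [ "$day" -eq 7 ]; then
        "${MU:-mu}" index --quiet --autoupgrade
    else
        "${MU:-mu}" index --quiet --autoupgrade --lazy-check
    fi
}

MU=echo cron_index 7    # prints: index --quiet --autoupgrade
MU=echo cron_index 3    # prints: index --quiet --autoupgrade --lazy-check
```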

--xbatchsize=<batch size>
set the maximum number of messages to process in a single Xapian
transaction. In practice, this option is only useful if you find that
mu is running out of memory while indexing; in that case, you can set
the batch size to (for example) 1000, which will reduce memory
consumption, but also substantially reduce the indexing performance.

--max-msg-size=<max msg size>
set the maximum size (in bytes) for messages. The default maximum
(currently 500 MB) should be enough in most cases, but if you encounter
warnings from mu about ignoring messages because they are too big, you
may want to increase this. Note that the reason for having a maximum
size is that big messages require big memory allocations, which may
lead to problems.
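
Before raising the limit, it can help to see which messages exceed a
given size. The sketch below uses standard find; oversized is a
hypothetical helper, and restricting the search to cur/ and new/
directories is an assumption based on the leaf-directory rule described
above.

```shell
#!/bin/sh
# Hypothetical helper: list message files under $1 larger than $2 bytes
# (find's -size +Nc means "more than N bytes").
oversized() {
    find "$1" -type f \( -path '*/cur/*' -o -path '*/new/*' \) -size +"$2"c
}

# Demo: one 2000-byte and one 12-byte message, with a 1000-byte limit.
root=$(mktemp -d)
mkdir -p "$root/inbox/cur"
dd if=/dev/zero of="$root/inbox/cur/big:2,S" bs=2000 count=1 2>/dev/null
printf 'Subject: hi\n' > "$root/inbox/cur/small:2,S"
oversized "$root" 1000    # prints only the path of big:2,S
```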

NOTE: It is not recommended to mix maildirs and sub-maildirs within the
hierarchy in the same database; for example, it's better not to index
both with --maildir=~/MyMaildir and --maildir=~/MyMaildir/foo, as this
may lead to unexpected results when searching with the 'maildir:'
search parameter (see mu-find(1)).

A note on performance (i)
As a non-scientific benchmark, a simple test on the author's machine (a
ThinkPad X61s laptop using Linux 2.6.35 and an ext3 file system) with
no existing database, and a maildir with 27273 messages:

$ sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'
$ time mu index --quiet
66,65s user 6,05s system 27% cpu 4:24,20 total
(about 103 messages per second)

A second run, which is the more typical use case when there is a
database already, goes much faster:

$ sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'
$ time mu index --quiet
0,48s user 0,76s system 10% cpu 11,796 total
(more than 56818 messages per second of user CPU time; about 2312
messages per second of total elapsed time)

Note that each test flushes the caches first. A more common use case
might be to run mu index when new mail has arrived, in which case the
cache may stay quite 'warm':

$ time mu index --quiet
0,33s user 0,40s system 80% cpu 0,905 total
(more than 30000 messages per second)
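
The per-second figures can be derived from the message count and the
"total" (elapsed) time printed by time(1). A minimal sketch, assuming
times of the form [[h:]m:]s with either ',' or '.' as decimal
separator; to_seconds and rate are hypothetical helper names.

```shell
#!/bin/sh
# Convert a time(1) "total" such as 4:24,20 or 0,905 into seconds.
to_seconds() {
    printf '%s\n' "$1" | tr ',' '.' |
        awk -F: '{ s = 0; for (i = 1; i <= NF; i++) s = s * 60 + $i; print s }'
}

# Messages per second: $1 = message count, $2 = elapsed time.
rate() {
    awk -v n="$1" -v t="$(to_seconds "$2")" 'BEGIN { printf "%.0f\n", n / t }'
}

rate 27273 4:24,20    # first run:  prints 103
rate 27273 0,905      # warm cache: prints 30136
```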

A note on performance (ii)
As of June 2012, we did the same non-scientific benchmark, this time
with an Intel i5-2500 CPU @ 3.30GHz, an ext4 file system and a maildir
with 22589 messages. We start without an existing database.

$ sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'
$ time mu index --quiet
27,79s user 2,17s system 48% cpu 1:01,47 total
(about 813 messages per second of user CPU time; about 367 messages
per second of total elapsed time)

A second run, which is the more typical use case when there is a
database already, goes much faster:

$ sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'
$ time mu index --quiet
0,13s user 0,30s system 19% cpu 2,162 total
(more than 173000 messages per second of user CPU time; about 10448
messages per second of total elapsed time)

A note on performance (iii)
As of July 2016, we did the same non-scientific benchmark, again with
the Intel i5-2500 CPU @ 3.30GHz and an ext4 file system. This time, the
maildir contains 72525 messages.

$ sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'
$ time mu index --quiet
40,34s user 2,56s system 64% cpu 1:06,17 total
(about 1096 messages per second)

As shown, mu has been getting faster with each release, even with
relatively expensive new features such as text normalization (for
case-insensitive/accent-insensitive matching). The profiles are now
dominated by operations in the Xapian database.

FILES
By default, mu index stores its message database in ~/.mu/xapian; the
database has an embedded version number, and mu will automatically
update it when it notices a different version. This allows for
automatic updating of mu versions, without the need to clear out any
old databases.

However, note that versions of mu before 0.7 used a different scheme,
which puts the database in ~/.mu/xapian-<version>. These older
databases can safely be deleted. Starting from version 0.7, this manual
cleanup should no longer be needed.

mu stores logs of its operations and queries in <muhome>/mu.log (by
default, this is ~/.mu/mu.log). Upon startup, mu checks the size of
this log file. If it exceeds 1 MB, it will be moved to
~/.mu/mu.log.old, overwriting any existing file of that name, and mu
will start with an empty log file. This scheme allows for continued use
of mu without the need for any manual maintenance of log files.
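
The rotation rule above can be sketched in shell. This is an
illustration, not mu's code: rotate_log is a hypothetical helper, and
because the exact byte count mu uses for "1 MB" is not stated here, the
limit is a parameter (defaulting to 1048576 bytes, an assumption).

```shell
#!/bin/sh
# Hypothetical helper: if log $1 is larger than $2 bytes, move it to
# $1.old (replacing any previous .old file) and start an empty log.
rotate_log() {
    log=$1 limit=${2:-1048576}
    [ -f "$log" ] || return 0
    size=$(wc -c < "$log" | tr -d ' ')
    if [ "$size" -gt "$limit" ]; then
        mv -f "$log" "$log.old"
        : > "$log"
    fi
}

# Demo with a small limit so that rotation actually happens.
dir=$(mktemp -d)
dd if=/dev/zero of="$dir/mu.log" bs=2000 count=1 2>/dev/null
echo stale > "$dir/mu.log.old"
rotate_log "$dir/mu.log" 1000
# mu.log.old now holds the 2000-byte log; mu.log exists and is empty.
```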

ENVIRONMENT
mu index uses MAILDIR to find the user's Maildir if it has not been
specified explicitly with --maildir=<maildir>. If MAILDIR is not set,
mu index will try ~/Maildir.
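
The lookup order can be summarized as a small shell function. This is
a hypothetical sketch of the documented order, not mu's code; note
that ${MAILDIR:-...} treats an empty MAILDIR the same as an unset one,
which may differ from mu's exact behavior.

```shell
#!/bin/sh
# Hypothetical helper: $1 is the --maildir value (may be empty).
# Order: explicit --maildir, then $MAILDIR, then ~/Maildir.
resolve_maildir() {
    if [ -n "$1" ]; then
        printf '%s\n' "$1"
    else
        printf '%s\n' "${MAILDIR:-$HOME/Maildir}"
    fi
}

resolve_maildir /srv/mail/jane                      # prints /srv/mail/jane
(MAILDIR=/var/mail/jane; resolve_maildir)           # prints /var/mail/jane
(unset MAILDIR; HOME=/home/jane; resolve_maildir)   # prints /home/jane/Maildir
```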

RETURN VALUE
mu index returns 0 upon successful completion; any other value
(greater than 0) signals an error.
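
Scripts can rely on this exit status. A minimal wrapper sketch follows;
index_mail is a hypothetical name, and the MU variable is an assumed
override (handy for testing) rather than anything mu itself reads.

```shell
#!/bin/sh
# Hypothetical wrapper: run "mu index" with any extra options and report
# a non-zero exit status, which signals an error.
index_mail() {
    "${MU:-mu}" index "$@"
    status=$?
    if [ "$status" -ne 0 ]; then
        echo "mu index failed (exit status $status)" >&2
    fi
    return "$status"
}
```

For example, index_mail --quiet --lazy-check || exit 1 stops a script
when indexing fails.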

BUGS
Please report bugs if you find them: https://github.com/djcb/mu/issues

AUTHOR
Dirk-Jan C. Binnema <djcb@djcbsoftware.nl>

SEE ALSO
maildir(5), mu(1), mu-find(1), mu-cfind(1)

User Manuals                  July 2016                  MU-INDEX(1)