MU-INDEX(1)                 General Commands Manual                MU-INDEX(1)

NAME

       mu index - index e-mail messages stored in Maildirs

SYNOPSIS

       mu index [options]

DESCRIPTION

       mu index is the mu command for scanning the contents of Maildir
       directories and storing the results in a Xapian database. The data
       can then be queried using mu-find(1).

       mu index understands Maildirs as defined by Daniel Bernstein for
       qmail(7). In addition, it understands recursive Maildirs (Maildirs
       within Maildirs) and Maildir++. It can also deal with VFAT-based
       Maildirs, which use '!' as the separator instead of ':'.
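
       As an illustration of the two separators, the sketch below extracts
       the flags from a message file name. It assumes the usual Maildir
       convention that the flags follow "2," after the separator; the
       function name maildir_flags and the sample file names are
       hypothetical, and this is not how mu itself parses names.

```shell
# Print the flags part of a Maildir message file name.
# Handles both the standard ':' separator and the VFAT-friendly '!'.
maildir_flags() {
    name=${1##*/}                 # drop any leading directory components
    case $name in
        *:2,*) printf '%s\n' "${name##*:2,}" ;;
        *!2,*) printf '%s\n' "${name##*!2,}" ;;
        *)     printf '\n' ;;     # no info part, e.g. a message still in new/
    esac
}

maildir_flags 'cur/1284974130.M5P6.host:2,RS'
maildir_flags 'cur/1284974130.M5P6.host!2,S'
```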

       E-mail messages which are not stored in something resembling a
       maildir leaf-directory (cur and new) are ignored, as are the cache
       directories for notmuch and gnus, and any dot-directory.

       Symlinks are not followed.

       If there is a file called .noindex in a directory, the contents of
       that directory and all of its subdirectories will be ignored. This
       can be useful to exclude certain directories from the indexing
       process, for example directories with spam messages.

       If there is a file called .noupdate in a directory, the contents of
       that directory and all of its subdirectories will be ignored, unless
       we do a full rebuild (with --rebuild). This can be useful to speed
       things up if you have some maildirs that never change. Note that you
       can still search for these messages; this only affects updating the
       database.
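
       The marker files are ordinary empty files, so they can be created
       with touch. The following sketch sets up a scratch directory to show
       the idea; the folder names spam and archive are assumptions chosen
       for illustration.

```shell
# Hypothetical layout under a scratch directory; adjust paths for a real Maildir.
root=$(mktemp -d)
mkdir -p "$root/Maildir/spam/cur" "$root/Maildir/archive/cur"

# Exclude the spam folder (and everything below it) from indexing.
touch "$root/Maildir/spam/.noindex"

# Skip the archive folder on normal runs; a --rebuild run still scans it.
touch "$root/Maildir/archive/.noupdate"

# List which directories carry a marker file.
find "$root/Maildir" \( -name .noindex -o -name .noupdate \) | sort
```

       Removing a marker file again makes the directory eligible for the
       next indexing run.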

       There is also the --lazy-check option, which can greatly speed up
       indexing; see below for details.

       The first run of mu index may take a few minutes if you have a lot
       of mail (tens of thousands of messages). Fortunately, such a full
       scan needs to be done only once; after that it suffices to index the
       changes, which goes much faster. See the 'Note on performance
       (i,ii,iii)' below for more information.

       The optional 'phase two' of the indexing process is the removal of
       messages from the database for which there is no longer a
       corresponding file in the Maildir. If you do not want this, you can
       use -n, --nocleanup.

       When mu index catches one of the signals SIGINT, SIGHUP or SIGTERM
       (e.g., when you press Ctrl-C during the indexing process), it tries
       to shut down gracefully: it tries to save and commit data, close the
       database, and so on. If it receives another signal (e.g., when you
       press Ctrl-C once more), mu index terminates immediately.
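
       The two-stage signal behaviour can be pictured with a small shell
       sketch. This only illustrates the pattern (first signal requests a
       clean stop, second signal exits at once); it is not mu's
       implementation, and the names on_signal and process_all are
       hypothetical.

```shell
interrupted=0

# First signal: request a graceful stop. Second signal: exit immediately.
on_signal() {
    if [ "$interrupted" -eq 1 ]; then
        exit 130
    fi
    interrupted=1
}
trap on_signal INT HUP TERM

# Hypothetical work loop: handle one item at a time, stop early once a
# signal has been seen, and always finish with a "commit" step.
process_all() {
    for item in "$@"; do
        if [ "$interrupted" -eq 1 ]; then
            break
        fi
        printf 'processed %s\n' "$item"
    done
    printf 'committed\n'
}
```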

OPTIONS

       Note that some of the general options are described in the mu(1)
       man page and not here, as they apply to multiple mu commands.

       -m, --maildir=<maildir>
              starts searching at <maildir>. By default, mu uses whatever
              the MAILDIR environment variable is set to; if it is not set,
              it tries ~/Maildir. See the note on mixing sub-maildirs below.

       --my-address=<my-email-address>
              specifies that some e-mail address is 'my-address'
              (--my-address can be used multiple times). This is used by mu
              cfind: any e-mail address found in the address fields of a
              message which also has <my-email-address> in one of its
              address fields is considered a personal e-mail address. This
              allows you, for example, to filter out (mu cfind --personal)
              addresses which were merely seen in mailing list messages.

       --lazy-check
              in lazy-check mode, mu does not consider messages for which
              the time-stamp (ctime) of the directory they reside in has not
              changed since the previous indexing run. This is much faster
              than the non-lazy check, but won't update messages that have
              changed (rather than having been added or removed), since
              merely editing a message does not update the directory
              time-stamp. Of course, you can run mu index occasionally
              without --lazy-check, to pick up such messages.

       -n, --nocleanup
              disables the database cleanup that mu does by default after
              indexing.

       --rebuild
              clear all messages from the database before indexing.
              --rebuild guarantees that after the indexing has finished,
              there are no 'old' messages in the database anymore, which is
              not true with --reindex when indexing only a part of the
              messages (using --maildir). For this reason, it is necessary
              to run mu index --rebuild when there is an upgrade in the
              database format. mu index will issue a warning about this.

       --autoupgrade
              automatically use -y, --empty when mu notices that the
              database version is not up-to-date. This option is for use in
              cron scripts and the like, so they won't require any user
              interaction, even when mu introduces a new database version.

       --xbatchsize=<batch size>
              set the maximum number of messages to process in a single
              Xapian transaction. In practice, this option is only useful if
              you find that mu is running out of memory while indexing; in
              that case, you can set the batch size to (for example) 1000,
              which will reduce memory consumption, but also substantially
              reduce the indexing performance.

       --max-msg-size=<max msg size>
              set the maximum size (in bytes) for messages. The default
              maximum (currently 500 MB) should be enough in most cases, but
              if you encounter warnings from mu about ignoring messages
              because they are too big, you may want to increase this. Note
              that the reason for having a maximum size is that big messages
              require big memory allocations, which may lead to problems.

       NOTE: It is not recommended to mix maildirs and sub-maildirs within
       the hierarchy in the same database; for example, it's better not to
       index both with --maildir=~/MyMaildir and --maildir=~/MyMaildir/foo,
       as this may lead to unexpected results when searching with the
       'maildir:' search parameter (see mu-find(1)).
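
       The reasoning behind --lazy-check (editing a message in place does
       not touch the directory time-stamp, while adding one does) can be
       checked with a small experiment. This sketch uses a scratch
       directory with hypothetical paths, and compares modification times
       with find -newer as a stand-in for the ctime check described above;
       adding or removing a directory entry updates both time-stamps.

```shell
# Scratch layout; all paths here are hypothetical.
root=$(mktemp -d)
mkdir -p "$root/Maildir/cur"
printf 'Subject: test\n\nhello\n' > "$root/Maildir/cur/msg1"

# Reference point standing in for "the previous indexing run".
touch "$root/last-run"
sleep 1

# Print "yes" if the directory's mtime is newer than the reference file.
dir_changed() {
    if [ -n "$(find "$1" -prune -newer "$2")" ]; then
        echo yes
    else
        echo no
    fi
}

# Editing a message in place leaves the directory time-stamp alone.
printf 'edited\n' >> "$root/Maildir/cur/msg1"
after_edit=$(dir_changed "$root/Maildir/cur" "$root/last-run")

# Adding a message updates the directory time-stamp.
printf 'Subject: new\n\nhi\n' > "$root/Maildir/cur/msg2"
after_add=$(dir_changed "$root/Maildir/cur" "$root/last-run")

echo "after edit: $after_edit, after add: $after_add"
```

       A lazy check would therefore skip the directory after the edit and
       rescan it after the addition, which is why an occasional run without
       --lazy-check is needed to pick up edited messages.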

   A note on performance (i)
       As a non-scientific benchmark, a simple test on the author's machine
       (a Thinkpad X61s laptop using Linux 2.6.35 and an ext3 file system)
       with no existing database, and a maildir with 27273 messages:

        $ sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'
        $ time mu index --quiet
        66,65s user 6,05s system 27% cpu 4:24,20 total
       (about 103 messages per second)

       A second run, which is the more typical use case when there is a
       database already, goes much faster:

        $ sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'
        $ time mu index --quiet
        0,48s user 0,76s system 10% cpu 11,796 total
       (more than 56818 messages per second of user CPU time, or about 2312
       per second of elapsed time)

       Note that each test flushes the caches first. A more common use case
       might be to run mu index when new mail has arrived; the cache may
       stay quite 'warm' in that case:

        $ time mu index --quiet
        0,33s user 0,40s system 80% cpu 0,905 total
       which is more than 30000 messages per second.
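
       The rate quoted for the first run follows from dividing the message
       count by the elapsed time reported by time. A minimal sketch of that
       arithmetic, with hypothetical helper names to_seconds and rate, and
       assuming the decimal-comma format shown in the output above:

```shell
# Convert a time such as "4:24,20" or "11,796" (decimal comma) to seconds.
to_seconds() {
    printf '%s\n' "$1" | tr ',' '.' |
        LC_ALL=C awk -F: '{ s = 0
                            for (i = 1; i <= NF; i++) s = s * 60 + $i
                            printf "%.2f\n", s }'
}

# Messages per second, rounded to the nearest whole number.
rate() {
    LC_ALL=C awk -v n="$1" -v t="$2" 'BEGIN { printf "%.0f\n", n / t }'
}

elapsed=$(to_seconds "4:24,20")
echo "$(rate 27273 "$elapsed") messages per second"
```

       For the first run this gives 103 messages per second (27273
       messages in 264.20 seconds).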

   A note on performance (ii)
       As of June 2012, we did the same non-scientific benchmark, this time
       with an Intel i5-2500 CPU @ 3.30GHz, an ext4 file system and a
       maildir with 22589 messages. We start without an existing database.

        $ sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'
        $ time mu index --quiet
        27,79s user 2,17s system 48% cpu 1:01,47 total
       (about 813 messages per second of user CPU time, or about 367 per
       second of elapsed time)

       A second run, which is the more typical use case when there is a
       database already, goes much faster:

        $ sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'
        $ time mu index --quiet
        0,13s user 0,30s system 19% cpu 2,162 total
       (more than 173000 messages per second of user CPU time, or about
       10448 per second of elapsed time)

   A note on performance (iii)
       As of July 2016, we did the same non-scientific benchmark, again
       with the Intel i5-2500 CPU @ 3.30GHz and an ext4 file system. This
       time, the maildir contains 72525 messages.

        $ sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'
        $ time mu index --quiet
        40,34s user 2,56s system 64% cpu 1:06,17 total
       (about 1096 messages per second of elapsed time).

       As shown, mu has been getting faster with each release, even with
       relatively expensive new features such as text-normalization (for
       case-insensitive/accent-insensitive matching). The profiles are now
       dominated by operations in the Xapian database.

FILES

       By default, mu index stores its message database in ~/.mu/xapian;
       the database has an embedded version number, and mu will
       automatically update it when it notices a different version. This
       allows for automatic updating of mu versions, without the need to
       clear out any old databases.

       However, note that versions of mu before 0.7 used a different
       scheme, which put the database in ~/.mu/xapian-<version>. These
       older databases can safely be deleted. Starting from version 0.7,
       this manual cleanup should no longer be needed.

       mu stores logs of its operations and queries in <muhome>/mu.log (by
       default, this is ~/.mu/mu.log). Upon startup, mu checks the size of
       this log file. If it exceeds 1 MB, it is moved to ~/.mu/mu.log.old,
       overwriting any existing file of that name, and mu starts with an
       empty log file. This scheme allows for continued use of mu without
       the need for any manual maintenance of log files.
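
       The log rotation scheme described above can be sketched in a few
       lines of shell. This illustrates the described behaviour only and is
       not mu's own code; the function name rotate_log is hypothetical, and
       the default limit assumes that "1 MB" means 1,000,000 bytes.

```shell
# rotate_log FILE [LIMIT_BYTES]
# Move FILE to FILE.old when it is larger than the limit, then start afresh.
rotate_log() {
    log=$1
    limit=${2:-1000000}   # assumption: "1 MB" taken as 1,000,000 bytes
    [ -f "$log" ] || return 0
    size=$(( $(wc -c < "$log") ))
    if [ "$size" -gt "$limit" ]; then
        mv -f "$log" "$log.old"   # overwrites any existing .old file
        : > "$log"                # continue with an empty log file
    fi
}
```

       Because only one old file is kept, the space used by the logs stays
       bounded at roughly twice the limit.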

ENVIRONMENT

       mu index uses MAILDIR to find the user's Maildir if it has not been
       specified explicitly with --maildir=<maildir>. If MAILDIR is not
       set, mu index will try ~/Maildir.
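
       The lookup order can be mirrored in a wrapper script, for example to
       log which directory is about to be indexed. A minimal sketch; the
       function name resolve_maildir is hypothetical, and treating an empty
       MAILDIR the same as an unset one is an assumption.

```shell
# Print the Maildir that would be used: an explicit argument first,
# then the MAILDIR environment variable, then ~/Maildir.
resolve_maildir() {
    if [ -n "${1:-}" ]; then
        printf '%s\n' "$1"
    elif [ -n "${MAILDIR:-}" ]; then
        printf '%s\n' "$MAILDIR"
    else
        printf '%s\n' "$HOME/Maildir"
    fi
}

resolve_maildir
```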

RETURN VALUE

       mu index returns 0 upon successful completion; any other number
       greater than 0 signals an error.
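
       In scripts, the return value can be checked directly. The sketch
       below wraps mu index --quiet in a function that reports a failure on
       standard error; the function name run_index and the MU override
       variable are hypothetical, the latter only so that the wrapper can
       be tried without mu installed.

```shell
# Run "mu index --quiet" with any extra arguments and report failures.
# MU is a hypothetical override for the command name (defaults to mu).
run_index() {
    if "${MU:-mu}" index --quiet "$@"; then
        return 0
    else
        status=$?
        printf 'mu index failed with exit status %s\n' "$status" >&2
        return "$status"
    fi
}
```

       A cron job could then call run_index --lazy-check and rely on the
       message on standard error to be mailed to the user on failure.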

BUGS

       Please report bugs if you find them: https://github.com/djcb/mu/issues

AUTHOR

       Dirk-Jan C. Binnema <djcb@djcbsoftware.nl>

SEE ALSO

       maildir(5), mu(1), mu-find(1), mu-cfind(1)

User Manuals                       July 2016                       MU-INDEX(1)