MU-INDEX(1)                 General Commands Manual                MU-INDEX(1)

NAME

       mu index - index e-mail messages stored in Maildirs

SYNOPSIS

       mu index [options]

DESCRIPTION

       mu index is the mu command for scanning the contents of Maildir
       directories and storing the results in a Xapian database. The data can
       then be queried using mu-find(1).

       Before you run mu index for the first time, you must run mu init to
       initialize the database.

       mu index understands Maildirs as defined by Daniel Bernstein for
       qmail(7). In addition, it understands recursive Maildirs (Maildirs
       within Maildirs) and Maildir++. It can also deal with VFAT-based
       Maildirs which use '!' or ';' as the separators instead of ':'.

       E-mail messages which are not stored in something resembling a maildir
       leaf-directory (cur and new) are ignored, as are the cache directories
       for notmuch and gnus, and any dot-directory.

       Starting with mu 1.5.x, symlinks are followed, and can be spread over
       multiple filesystems; note, however, that moving files around is much
       faster when multiple filesystems are not involved.

       If there is a file called .noindex in a directory, the contents of that
       directory and all of its subdirectories will be ignored. This can be
       useful to exclude certain directories from the indexing process, for
       example directories with spam messages.

       If there is a file called .noupdate in a directory, the contents of
       that directory and all of its subdirectories will be ignored, unless we
       do a full rebuild (with mu init). This can be useful to speed things up
       if you have some maildirs that never change. Note that you can still
       search for these messages; this only affects updating the database.
       .noupdate is ignored when you start indexing with an empty database
       (such as directly after mu init).
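
       As a minimal sketch of the two marker files described above, the
       following creates a throwaway Maildir tree and marks one folder with
       .noindex and another with .noupdate. The folder names Spam and Archive
       and the temporary location are assumptions for illustration only; in
       practice you would create the marker files in your real Maildir.

```shell
# Hypothetical layout: a scratch Maildir with two folders.
maildir=$(mktemp -d)
mkdir -p "$maildir/Spam/cur" "$maildir/Spam/new" \
         "$maildir/Archive/cur" "$maildir/Archive/new"

# Never index Spam or anything below it.
touch "$maildir/Spam/.noindex"

# Skip Archive during incremental runs; it is still indexed when
# starting from an empty database.
touch "$maildir/Archive/.noupdate"

# List the marker files that were created.
(cd "$maildir" && find . -name '.no*' | sort)
```

       The marker files can be empty; according to the description above,
       only their presence in the directory matters.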

       There is also the --lazy-check option, which can greatly speed up
       indexing; see below for details.

       The first run of mu index may take a few minutes if you have a lot of
       mail (tens of thousands of messages). Fortunately, such a full scan
       needs to be done only once; after that it suffices to index the
       changes, which goes much faster. See the notes on performance (i, ii,
       iii, iv) below for more information.

       The optional 'phase two' of the indexing process is the removal of
       messages from the database for which there is no longer a corresponding
       file in the Maildir. If you do not want this, you can use -n,
       --nocleanup.

       When mu index catches one of the signals SIGINT, SIGHUP or SIGTERM
       (e.g., when you press Ctrl-C during the indexing process), it tries to
       shut down gracefully: it tries to save and commit data, close the
       database, and so on. If it receives another signal (e.g., when you
       press Ctrl-C once more), mu index will terminate immediately.

OPTIONS

       Some of the general options are described in the mu(1) man-page and not
       here, as they apply to multiple mu commands.

       --lazy-check
              in lazy-check mode, mu does not consider messages for which the
              time-stamp (ctime) of the directory they reside in has not
              changed since the previous indexing run. This is much faster
              than the non-lazy check, but won't update messages that have
              changed (rather than having been added or removed), since merely
              editing a message does not update the directory time-stamp. Of
              course, you can run mu index occasionally without --lazy-check,
              to pick up such messages.
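
       One way to combine the two modes is to run the lazy check most of the
       time and a full check occasionally. The sketch below picks the
       arguments by day of the week; the helper name index_args and the
       choice of Sunday are assumptions for illustration, not part of mu.

```shell
# Hypothetical helper: print the mu arguments to use for a given day.
# $1 is the day of the week as printed by `date +%u` (1 = Monday, 7 = Sunday).
index_args() {
    if [ "$1" -eq 7 ]; then
        # Full check once a week, to pick up messages edited in place.
        echo "index"
    else
        # Fast path: skip directories whose time-stamp is unchanged.
        echo "index --lazy-check"
    fi
}

# Show what would be run today; drop the `echo` to actually run mu.
echo mu $(index_args "$(date +%u)")
```

       How often the full check is worthwhile depends on how often messages
       are edited in place rather than added or removed.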

       --nocleanup
              disables the database cleanup that mu does by default after
              indexing.

   A note on performance (i)
       As a non-scientific benchmark, a simple test on the author's machine (a
       Thinkpad X61s laptop using Linux 2.6.35 and an ext3 file system) with
       no existing database, and a maildir with 27273 messages:

        $ sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'
        $ time mu index --quiet
        66,65s user 6,05s system 27% cpu 4:24,20 total
       (about 103 messages per second)

       A second run, which is the more typical use case when there is a
       database already, goes much faster:

        $ sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'
        $ time mu index --quiet
        0,48s user 0,76s system 10% cpu 11,796 total
       (more than 56818 messages per second, counting user CPU time only)

       Note that each test flushes the caches first; a more common use case
       might be to run mu index when new mail has arrived; the cache may stay
       quite 'warm' in that case:

        $ time mu index --quiet
        0,33s user 0,40s system 80% cpu 0,905 total
       which is more than 30000 messages per second.
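
       The rates quoted above are the message count divided by an elapsed
       time. A minimal sketch of that arithmetic follows; the helper name
       rate is hypothetical, and the times are rewritten with '.' as the
       decimal mark because the time output above uses ','.

```shell
# Hypothetical helper: messages per second, rounded to a whole number.
# $1 = number of messages, $2 = elapsed seconds ('.' as the decimal mark).
rate() {
    LC_ALL=C awk -v n="$1" -v s="$2" 'BEGIN { printf "%.0f\n", n / s }'
}

rate 27273 264.20   # first run, wall-clock time 4:24,20
rate 27273 0.48     # second run, user CPU time
rate 27273 0.905    # warm-cache run, wall-clock time
```

       The first call reproduces the 'about 103' figure. For the second run,
       it is the user time (0,48 s) rather than the wall-clock time that
       yields a value near 56818, so the quoted rates are not all measured
       against the same clock.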

   A note on performance (ii)
       As of June 2012, we did the same non-scientific benchmark, this time
       with an Intel i5-2500 CPU @ 3.30GHz, an ext4 file system and a maildir
       with 22589 messages. We start without an existing database.

        $ sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'
        $ time mu index --quiet
        27,79s user 2,17s system 48% cpu 1:01,47 total
       (about 813 messages per second, counting user CPU time only)

       A second run, which is the more typical use case when there is a
       database already, goes much faster:

        $ sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'
        $ time mu index --quiet
        0,13s user 0,30s system 19% cpu 2,162 total
       (more than 173000 messages per second, counting user CPU time only)

   A note on performance (iii)
       As of July 2016, we did the same non-scientific benchmark, again with
       the Intel i5-2500 CPU @ 3.30GHz and an ext4 file system. This time, the
       maildir contains 72525 messages.

        $ sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'
        $ time mu index --quiet
        40,34s user 2,56s system 64% cpu 1:06,17 total
       (about 1099 messages per second)

   A note on performance (iv)
       A few years later, it is June 2022. There is a lot more happening
       during indexing, but indexing became multi-threaded and machines are
       faster; e.g. this is with an AMD Ryzen Threadripper 1950X (32) @
       3.399GHz.

       The instructions are a little different since we have a proper
       repeatable benchmark now. After building:

        $ sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'
        $ THREAD_NUM=4 build/lib/tests/bench-indexer -m perf
        # random seed: R02Sf5c50e4851ec51adaf301e0e054bd52b
        1..1
        # Start of bench tests
        # Start of indexer tests
        indexed 5000 messages in 20 maildirs in 3763ms; 752 μs/message; 1328 messages/s (4 thread(s))
        ok 1 /bench/indexer/4-cores
        # End of indexer tests
        # End of bench tests

       Things are again a little faster, even though the index does a lot more
       now (text-normalization, and pre-generating message-sexps). A faster
       machine helps, too!

RETURN VALUE

       mu index returns 0 upon successful completion; any other number signals
       an error.
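
       In a script, the return value can be checked directly. The sketch
       below wraps an arbitrary command and reports its status; the helper
       name report_status and the messages it prints are assumptions for
       illustration.

```shell
# Hypothetical helper: run the given command and report its exit status.
report_status() {
    "$@"
    status=$?
    if [ "$status" -eq 0 ]; then
        echo "indexing succeeded"
    else
        echo "indexing failed with status $status"
    fi
    return "$status"
}

# Usage, assuming mu is installed and the database has been initialized:
#   report_status mu index --quiet
```

       Passing the command as arguments keeps the helper testable without mu,
       for example with report_status true.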

BUGS

       Please report bugs if you find any: https://github.com/djcb/mu/issues

AUTHOR

       Dirk-Jan C. Binnema <djcb@djcbsoftware.nl>

SEE ALSO

       maildir(5), mu(1), mu-init(1), mu-find(1), mu-cfind(1)

User Manuals                       June 2022                       MU-INDEX(1)