MAIRIX(1)                   General Commands Manual                  MAIRIX(1)

NAME

       mairix - index and search mail folders

SYNOPSIS

   Indexing
       mairix [ -v|--verbose ] [ -p|--purge ] [ -f|--rcfile mairixrc ] [
       -F|--fast-index ]

   Searching
       mairix [ -v|--verbose ] [ -f|--rcfile mairixrc ] [ -r|--raw-output ] [
       -x|--excerpt-output ] [ -o|--mfolder mfolder ] [ -a|--augment ] [
       -t|--threads ] search-patterns

   Other
       mairix [ -h|--help ]

       mairix [ -V|--version ]

       mairix [ -d|--dump ]

DESCRIPTION

       mairix indexes and searches a collection of email messages. The
       folders containing the messages to be indexed are defined in the
       configuration file. The indexing stage produces a database file, which
       provides rapid access to details of the indexed messages during
       search operations. A search normally produces a folder (a so-called
       mfolder) containing the matched messages. However, a raw mode (-r)
       exists which just lists the matched messages instead.

       mairix can operate with the following folder types:

       *      maildir

       *      MH (compatible with the MH folder formats used by xmh, sylpheed,
              claws-mail, nnml (Gnus) and evolution)

       *      mbox (including mboxes that have been compressed with gzip or
              bzip2)

       If maildir or MH source folders are used, and a search outputs its
       matches to an mfolder in maildir or MH format, symbolic links are used
       to reference the original messages inside the mfolder. However, if
       mbox folders are involved, copies of messages are made instead.

OPTIONS

       mairix decides whether indexing or searching is required by looking for
       the presence of any search-patterns on the command line.

   Special modes
       -h, --help
              Show usage summary and exit.

       -V, --version
              Show program version and exit.

       -d, --dump
              Dump the database's contents in human-readable form to stdout.

   General options
       -f mairixrc
       --rcfile mairixrc
              Specify an alternative configuration file to use. The default
              configuration file is ~/.mairixrc.

       -v, --verbose
              Make the output more verbose.

       -Q, --no-integrity-checks
              Normally mairix will do some internal integrity tests on the
              database. The -Q option removes these checks, making mairix run
              faster, but it will be less likely to detect internal problems
              if any bugs creep in.

              The nochecks directive in the rc file has the same effect.

       --unlock
              mairix locks its database file during any indexing or searching
              operation to prevent multiple indexing runs interfering with
              each other, or an indexing run interfering with search runs.
              The --unlock option removes the lockfile before doing the
              requested indexing or searching operation. This is a convenient
              way of cleaning up a stale lockfile if an earlier run crashed
              for some reason or was aborted.

   Indexing options
       -p, --purge
              Cause stale (dead) messages to be purged from the database
              during an indexing run. (Normally, stale messages are left in
              the database because of the additional cost of compacting away
              the storage that they take up.)

       -F, --fast-index
              When processing maildir and MH folders, mairix normally compares
              the mtime and size of each message against the values stored in
              the database. If they have changed, the message will be
              rescanned. This check requires each message file to be stat'ed.
              For large numbers of messages in these folder types, this can be
              a sizeable overhead.

              This option tells mairix to assume that a message currently on
              disc is unchanged if its name matches one already in the
              database.

              A later indexing run without this option will fix up any
              rescans that were missed due to its use.
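
       The change check described for -F can be pictured with the following
       Python sketch. It is only an illustration under stated assumptions,
       not mairix's actual code: the function name needs_rescan and its
       parameters are hypothetical, and the stored values are assumed to be
       an integer mtime and a size in bytes for a message whose name is
       already in the database.

```python
import os

def needs_rescan(path, stored_mtime, stored_size, fast_index=False):
    """Decide whether an already-indexed message file should be rescanned.

    Hypothetical helper: stored_mtime and stored_size stand for the values
    recorded in the database when the message was last scanned.
    """
    if fast_index:
        # Fast-index assumption: a message whose name is already known is
        # taken to be unchanged, so the file is never stat'ed.
        return False
    st = os.stat(path)  # this per-file stat is the overhead that -F avoids
    return int(st.st_mtime) != stored_mtime or st.st_size != stored_size
```

       With fast_index set, a file that really has changed is reported as
       unchanged, which is why a later run without the option is needed to
       catch it.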

   Search options
       -a, --augment
              Append newly matched messages to the current mfolder instead of
              creating the mfolder from scratch.

       -t, --threads
              As well as returning the matched messages, also return every
              message in the same thread as one of the real matches.

       -r, --raw-output
              Instead of creating an mfolder containing the matched messages,
              just show their paths on stdout.

       -x, --excerpt-output
              Instead of creating an mfolder containing the matched messages,
              display an excerpt from their headers on stdout. The excerpt
              shows To, Cc, From, Subject and Date.

       -o mfolder
       --mfolder mfolder
              Specify a temporary alternative path for the mfolder to use,
              overriding the mfolder directive in the rc file.

              mairix will refuse to output search results into any folder that
              appears to be amongst those that are indexed. This is to
              prevent accidental deletion of emails.

   Search patterns
       t:word
              Match word in the To: header.

       c:word
              Match word in the Cc: header.

       f:word
              Match word in the From: header.

       s:word
              Match word in the Subject: header.

       m:word
              Match word in the Message-ID: header.

       b:word
              Match word in the message body.

              Message body is taken to mean any body part of type text/plain
              or text/html. For text/html, text within meta tags is ignored.
              In particular, the URLs inside <A HREF="..."> tags are not
              currently indexed. Non-text attachments are ignored. If there's
              an attachment of type message/rfc822, this is parsed and the
              match is performed on this sub-message too. If a hit occurs,
              the enclosing message is treated as having a hit.

       d:[start-datespec]-[end-datespec]
              Match messages with Date: headers lying in the specified range.

       z:[low-size]-[high-size]
              Match messages whose size lies in the specified range. If the
              low-size argument is omitted it defaults to zero. If the
              high-size argument is omitted it defaults to infinite size.

              For example, to match messages between 10 kilobytes and 20
              kilobytes in size, the following search term can be used:

                   mairix z:10k-20k

              The suffix 'k' on a number means multiply by 1024, and the
              suffix 'M' on a number means multiply by 1024*1024.
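
       The size arithmetic above can be sketched in Python as follows. This
       is a minimal illustration, not mairix's own parser: the names
       parse_size and parse_size_range are hypothetical, and error handling
       is kept to a minimum.

```python
import math

# Multipliers for the two suffixes described for z: terms.
SUFFIXES = {"k": 1024, "M": 1024 * 1024}

def parse_size(text):
    """Convert '10k', '2M' or a plain number into a byte count."""
    if text and text[-1] in SUFFIXES:
        return int(text[:-1]) * SUFFIXES[text[-1]]
    return int(text)

def parse_size_range(term):
    """Turn 'z:low-high' into a (low, high) pair of byte counts."""
    if not term.startswith("z:"):
        raise ValueError("expected a term of the form z:low-high")
    low, sep, high = term[2:].partition("-")
    if not sep:
        raise ValueError("expected a '-' between the two sizes")
    # An omitted low size defaults to zero, an omitted high size to infinity.
    return (parse_size(low) if low else 0,
            parse_size(high) if high else math.inf)
```

       For instance, parse_size_range("z:10k-20k") gives (10240, 20480).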

       n:word
              Match word occurring as the name of an attachment in the
              message. Since attachment names are usually long, this option
              would usually be used in the substring form. So

                   mairix n:mairix=

              would match all messages which have attachments whose names
              contain the substring mairix.

              The attachment name is determined from the name=xxx or
              filename=xxx qualifiers on the Content-Type: and
              Content-Disposition: headers respectively.

       F:flags
              Match messages with particular flag settings. The available
              flags are 's' meaning seen, 'r' meaning replied, and 'f' meaning
              flagged. The flags are case-insensitive. A flag letter may be
              prefixed by a '-' to negate its sense. Thus

                   mairix F:-s d:1w-

              would match any unread message less than a week old, and

                   mairix F:f-r d:-1m

              would match any flagged message older than a month which you
              haven't replied to yet.

              Note that the flag characters and their meanings agree with
              those used as the suffix letters on message filenames in maildir
              folders.
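
       One way to read an F: term is as two sets of flags: those a message
       must have and those it must not have. The Python sketch below
       illustrates that reading. The names parse_flags and flags_match are
       hypothetical, and the sketch is not taken from mairix itself.

```python
# Flag letters and their meanings as described for F: terms.
FLAG_NAMES = {"s": "seen", "r": "replied", "f": "flagged"}

def parse_flags(term):
    """Split 'F:flags' into (required, forbidden) sets of flag names."""
    if not term.startswith("F:"):
        raise ValueError("expected a term of the form F:flags")
    required, forbidden = set(), set()
    negate = False
    for ch in term[2:]:
        if ch == "-":
            negate = True  # applies to the next flag letter only
            continue
        name = FLAG_NAMES.get(ch.lower())  # flag letters are case-insensitive
        if name is None:
            raise ValueError(f"unknown flag letter: {ch!r}")
        (forbidden if negate else required).add(name)
        negate = False
    if negate:
        raise ValueError("'-' must be followed by a flag letter")
    return required, forbidden

def flags_match(term, message_flags):
    """True if a message carrying message_flags satisfies the F: term."""
    required, forbidden = parse_flags(term)
    return required <= message_flags and not (forbidden & message_flags)
```

       Under this reading, F:f-r requires "flagged" and forbids "replied".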

   Searching for a match amongst more than one part of a message
       Multiple message parts may be grouped together, if a match in any of
       them is sought. Common examples follow.

       tc:word
              Match word in either the To: or Cc: headers (or both).

       bs:word
              Match word in either the Subject: header or the message body (or
              both).

       The a: search pattern is an abbreviation for tcf:; i.e. match the word
       in the To:, Cc: or From: headers. ("a" stands for "address" in this
       case.)
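
       The grouping of prefix letters can be sketched in Python as below.
       This is an illustration only: the function name message_parts is
       hypothetical, and the sketch assumes that any of the letters t, c, f,
       s, m and b may be combined freely, which goes beyond the examples
       given here.

```python
# Single-letter prefixes and the message part each one selects.
PARTS = {
    "t": "To", "c": "Cc", "f": "From",
    "s": "Subject", "m": "Message-ID", "b": "body",
}
ALIASES = {"a": "tcf"}  # a: is an abbreviation for tcf:

def message_parts(term):
    """Return the message parts named by the letters before the colon."""
    prefix, sep, _word = term.partition(":")
    if not sep:
        raise ValueError("expected a term of the form letters:word")
    parts = []
    for letter in prefix:
        for expanded in ALIASES.get(letter, letter):
            if expanded not in PARTS:
                raise ValueError(f"unknown part letter: {expanded!r}")
            if PARTS[expanded] not in parts:
                parts.append(PARTS[expanded])
    return parts
```

       A hit in any one of the returned parts counts as a match for the
       term.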

   Match words
       The word argument to the search strings can take various forms.

       ~word
              Match messages not containing the word.

       word1,word2
              This matches if both the words are matched in the specified
              message part.

       word1/word2
              This matches if either of the words is matched in the specified
              message part.

       substring=
              Match any word containing substring as a substring.

       substring=N
              Match any word containing substring, allowing up to N errors in
              the match. For example, if N is 1, a single error is allowed,
              where an error can be

              *      a missing letter

              *      an extra letter

              *      a different letter.
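
       The three kinds of error correspond to the operations of an edit
       distance. The Python sketch below uses a standard approximate
       substring search (dynamic programming in which a match may begin
       anywhere in the word) to show the idea. It is not claimed to be the
       algorithm mairix uses, and the function names are hypothetical.

```python
def approx_substring_distance(pattern, word):
    """Smallest number of errors between pattern and any substring of word."""
    # prev[j]: best cost of matching the pattern prefix seen so far against
    # some substring of word that ends at position j. The empty pattern
    # matches anywhere at no cost, so the first row is all zeros.
    prev = [0] * (len(word) + 1)
    for i, pc in enumerate(pattern, start=1):
        cur = [i]  # against an empty substring, every pattern letter is missing
        for j, wc in enumerate(word, start=1):
            cur.append(min(
                prev[j] + 1,               # a missing letter
                cur[j - 1] + 1,            # an extra letter
                prev[j - 1] + (pc != wc),  # same letter, or a different letter
            ))
        prev = cur
    return min(prev)

def substring_matches(pattern, word, max_errors=0):
    """True if word contains pattern with at most max_errors errors."""
    return approx_substring_distance(pattern, word) <= max_errors
```

       With max_errors left at 0 this reduces to a plain substring test, as
       in the substring= form.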

       ^substring=
              Match any word containing substring as a substring, with the
              requirement that substring occurs at the beginning of the
              matched word.

   Precedence matters
       The binding order of the constructions is:

       1.     Individual command line arguments define separate conditions
              which are AND-ed together.

       2.     Within a single argument, the letters before the colon define
              which message parts the expression applies to. If there is no
              colon, the expression applies to all the headers listed earlier
              and the body.

       3.     After the colon, slashes delineate separate disjuncts, which are
              OR-ed together.

       4.     Each disjunct may contain separate conjuncts, which are
              separated by commas. These conditions are AND-ed together.

       5.     Each conjunct may start with a tilde to negate it, and may be
              followed by an equals sign to indicate a substring match,
              optionally followed by an integer to define the maximum number
              of errors allowed.

   Date specification
       This section describes the syntax used for specifying dates when
       searching using the `d:' option.

       Dates are specified as a range. The start and end of the range can
       both be specified. Alternatively, if the start is omitted, it is
       treated as being the beginning of time. If the end is omitted, it is
       treated as the current time.

       There are 4 basic formats:

       d:start-end
              Specify both start and end explicitly.

       d:start-
              Specify start; end is the current time.

       d:-end Specify end; start is 'a long time ago' (i.e. early enough to
              include any message).

       d:period
              Specify start and end implicitly, as the start and end of the
              period given.

       The start and end can each be specified as either absolute or
       relative. A relative endpoint is given as a number followed by a
       single letter defining the scaling:

       ┌────────┬─────────────┬───────────┬───────────────────────┐
       │letter  │  short for  │  example  │  meaning              │
       ├────────┼─────────────┼───────────┼───────────────────────┤
       │d       │  days       │  3d       │  3 days               │
       │w       │  weeks      │  2w       │  2 weeks (14 days)    │
       │m       │  months     │  5m       │  5 months (150 days)  │
       │y       │  years      │  4y       │  4 years (4*365 days) │
       └────────┴─────────────┴───────────┴───────────────────────┘

       Months are always treated as 30 days, and years as 365 days, for this
       purpose.
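
       The scaling can be sketched in Python as follows, assuming that a
       relative endpoint counts back from the current date. The name
       relative_date is hypothetical, and this is not mairix's own date
       parser.

```python
from datetime import date, timedelta

# Days per scaling letter: months are always 30 days, years always 365.
DAYS_PER_UNIT = {"d": 1, "w": 7, "m": 30, "y": 365}

def relative_date(spec, today):
    """Turn a relative endpoint such as '6w' into a date before today."""
    count, unit = int(spec[:-1]), spec[-1]
    return today - timedelta(days=count * DAYS_PER_UNIT[unit])
```

       With today taken as date(2003, 5, 18), relative_date("6w", today)
       gives 6 April 2003 and relative_date("2w", today) gives 4 May 2003.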

       Absolute times can be specified in many forms. Some forms have a
       different meaning when they define a start date from the one they have
       when they define an end date. Where a single expression specifies both
       the start and end (i.e. where the argument to d: doesn't contain a
       `-'), it will usually have different interpretations in the two cases.

       In the examples below, suppose the current date is Sunday May 18th,
       2003 (when I started to write this material).

       ┌─────────────────────┬──────────────────────┬───────────────────────┬─────────────────────────────────┐
       │Example              │  Start date          │  End date             │  Notes                          │
       ├─────────────────────┼──────────────────────┼───────────────────────┼─────────────────────────────────┤
       │d:20030301-20030425  │  March 1st, 2003     │  April 25th, 2003     │                                 │
       │d:030301-030425      │  March 1st, 2003     │  April 25th, 2003     │  century assumed                │
       │d:mar1-apr25         │  March 1st, 2003     │  April 25th, 2003     │                                 │
       │d:Mar1-Apr25         │  March 1st, 2003     │  April 25th, 2003     │  case insensitive               │
       │d:MAR1-APR25         │  March 1st, 2003     │  April 25th, 2003     │  case insensitive               │
       │d:1mar-25apr         │  March 1st, 2003     │  April 25th, 2003     │  date and month in either order │
       │d:2002               │  January 1st, 2002   │  December 31st, 2002  │  whole year                     │
       │d:mar                │  March 1st, 2003     │  March 31st, 2003     │  most recent March              │
       │d:oct                │  October 1st, 2002   │  October 31st, 2002   │  most recent October            │
       │d:21oct-mar          │  October 21st, 2002  │  March 31st, 2003     │  start before end               │
       │d:21apr-mar          │  April 21st, 2002    │  March 31st, 2003     │  start before end               │
       │d:21apr-             │  April 21st, 2003    │  May 18th, 2003       │  end omitted                    │
       │d:-21apr             │  January 1st, 1900   │  April 21st, 2003     │  start omitted                  │
       │d:6w-2w              │  April 6th, 2003     │  May 4th, 2003        │  both dates relative            │
       │d:21apr-1w           │  April 21st, 2003    │  May 11th, 2003       │  one date relative              │
       │d:21apr-2y           │  April 21st, 2001    │  May 18th, 2001       │  start before end               │
       │d:99-11              │  January 1st, 1999   │  May 11th, 2003       │ 2 digits are a day of the month │
       │                     │                      │                       │ if possible, otherwise a year   │
       │d:99oct-1oct         │  October 1st, 1999   │  October 1st, 2002    │ end before now, single digit is │
       │                     │                      │                       │ a day of the month              │
       │d:99oct-01oct        │  October 1st, 1999   │  October 31st, 2001   │ 2  digits  starting  with  zero │
       │                     │                      │                       │ treated as a year               │
       │d:oct99-oct1         │  October 1st, 1999   │  October 1st, 2002    │ day and month in either order   │
       │d:oct99-oct01        │  October 1st, 1999   │  October 31st, 2001   │ year and month in either order  │
       └─────────────────────┴──────────────────────┴───────────────────────┴─────────────────────────────────┘

       The principles in the table work as follows.

       ·      When the expression defines a period of more than a day (i.e. if
              a month or year is specified), the earliest day in the period is
              taken when the start date is defined, and the last day in the
              period if the end of the range is being defined.

       ·      The end date is always taken to be on or before the current
              date.

       ·      The start date is always taken to be on or before the end date.

SETTING UP THE MATCH FOLDER

       If the match folder does not exist when running in search mode, it is
       automatically created. For 'mformat=maildir' (the default), this
       should be all you need to do. If you use 'mformat=mh', you may have to
       run some commands before your mailer will recognize the folder, e.g.
       for mutt, you could do

              mkdir -p /home/richard/Mail/mfolder
              touch /home/richard/Mail/mfolder/.mh_sequences

       which seems to work. Alternatively, within mutt, you could set
       MBOX_TYPE to 'mh' in advance.

       If you use Sylpheed, the best way seems to be to create the new folder
       from within Sylpheed before letting mairix write into it.

EXAMPLES

       Suppose my email address is <richard@doesnt.exist>.

       Either of the following will match all messages newer than 3 months
       from me with the word 'chrony' in the subject line:

              mairix d:3m- f:richard,doesnt,exist s:chrony
              mairix d:3m- f:richard@doesnt.exist s:chrony

       Suppose I don't mind a few spurious matches on the address, I want a
       wider date range, and I suspect that some messages I replied to might
       have had the subject keyword spelt wrongly (let's allow up to 2
       errors):

              mairix d:6m- f:richard s:chrony=2

NOTES

       mairix works exclusively in terms of words. The index that's built in
       indexing mode contains a table of which words occur in which messages.
       Hence, the search capability is based on finding messages that contain
       particular words. mairix defines a word as any string of alphanumeric
       characters and underscores. Any whitespace, punctuation, hyphens etc.
       are treated as word boundaries.

       mairix has special handling for the To:, Cc: and From: headers.
       Besides the normal word scan, these headers are scanned a second time,
       where the characters '@', '-' and '.' are also treated as word
       characters. This allows most (if not all) email addresses to appear in
       the database as single words. So if you have a mail from
       wibble@foobar.zzz, it will match on both these searches:

              mairix f:foobar
              mairix f:wibble@foobar.zzz
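
       The two scans can be pictured with the Python sketch below. It is an
       illustration of the word definition given above, not mairix's
       tokenizer: the name index_words is hypothetical, and the sketch
       assumes ASCII letters and digits only.

```python
import re

# Normal scan: a word is a run of alphanumeric characters and underscores.
WORD_RE = re.compile(r"[A-Za-z0-9_]+")
# Second scan for To:, Cc: and From:, where '@', '-' and '.' also count as
# word characters.
ADDRESS_WORD_RE = re.compile(r"[A-Za-z0-9_@.\-]+")

def index_words(text, address_header=False):
    """Return the set of words that would be recorded for a piece of text."""
    words = set(WORD_RE.findall(text))
    if address_header:
        words |= set(ADDRESS_WORD_RE.findall(text))
    return words
```

       For a From: header containing wibble@foobar.zzz, the result includes
       both foobar (from the normal scan) and wibble@foobar.zzz (from the
       second scan), which is why both searches above succeed.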

       It should be clear by now that the searching cannot be used to find
       messages matching general regular expressions. This has never been
       much of a limitation. Most searches are for particular keywords that
       were in the messages, or details of the recipients, or the approximate
       date.

       It's also worth pointing out that there is no 'locality' information
       stored, so you can't search for messages that have one word 'close' to
       some other word. For every message and every word, there is a simple
       yes/no condition stored - whether the message contains the word in a
       particular header or in the body. So far this has proved to be
       adequate. mairix has a similar feel to using an Internet search engine.

FILES

       ~/.mairixrc

AUTHOR

       Copyright (C) 2002-2006 Richard P. Curnow <rc@rc0.org.uk>

SEE ALSO

       mairixrc(5)

BUGS

       We need a plugin scheme to allow more types of attachment to be scanned
       and indexed.

                                 January 2006                        MAIRIX(1)