MboxParser(3)         User Contributed Perl Documentation        MboxParser(3)

NAME

       Mail::MboxParser - read-only access to UNIX-mailboxes

SYNOPSIS

           use Mail::MboxParser;

           my $parseropts = {
               enable_cache    => 1,
               enable_grep     => 1,
               cache_file_name => 'mail/cache-file',
           };
           my $mb = Mail::MboxParser->new('some_mailbox',
                                           decode     => 'ALL',
                                           parseropts => $parseropts);

           # -----------

           # slurping
           for my $msg ($mb->get_messages) {
               print $msg->header->{subject}, "\n";
               $msg->store_all_attachments(path => '/tmp');
           }

           # iterating
           while (my $msg = $mb->next_message) {
               print $msg->header->{subject}, "\n";
               # ...
           }

           # we forgot to do something with the messages
           $mb->rewind;
           while (my $msg = $mb->next_message) {
               # iterate again
               # ...
           }

           # subscripting one message after the other
           for my $idx (0 .. $mb->nmsgs - 1) {
               my $msg = $mb->get_message($idx);
           }

DESCRIPTION

       This module attempts to provide simplified access to standard UNIX-
       mailboxes.  It offers only a subset of methods to get 'straight to the
       point'. More sophisticated things can still be done by invoking any
       method from MIME::Tools on the appropriate return values.

       Mail::MboxParser has not been derived from Mail::Box and thus isn't
       acquainted with it in any way. It, however, incorporates some
       invaluable hints by the author of Mail::Box, Mark Overmeer.

METHODS

       See also the section ERROR-HANDLING further below.

       In addition, see the relevant manpages of Mail::MboxParser::Mail,
       Mail::MboxParser::Mail::Body and Mail::MboxParser::Mail::Convertable
       for a description of the methods for these objects.

       new(mailbox, options)
       new(scalar-ref, options)
       new(array-ref, options)
       new(filehandle, options)
           This creates a new MboxParser-object opening the specified
           'mailbox' with either absolute or relative path.

           new() can also take a reference to a variable containing the
           mailbox either as one string (reference to a scalar) or linewise
           (reference to an array), or a filehandle from which to read the
           mailbox.

           The following option(s) may be useful. The value in brackets below
           the key is the default if none given.

               key:      | value:     | description:
               ==========|============|===============================
               decode    | 'NEVER'    | never decode transfer-encoded
               (NEVER)   |            | data
                         |------------|-------------------------------
                         | 'BODY'     | will decode body into a human-
                         |            | readable format
                         |------------|-------------------------------
                         | 'HEADER'   | will decode header fields if
                         |            | any is encoded
                         |------------|-------------------------------
                         | 'ALL'      | decode any data
               ==========|============|===============================
               uudecode  | 1          | enable extraction of uuencoded
               (0)       |            | attachments in MIME::Parser
                         |------------|-------------------------------
                         | 0          | uuencoded attachments are
                         |            | treated as plain body text
               ==========|============|===============================
               newline   | 'UNIX'     | UNIXish line-endings
               (AUTO)    |            | ("\n" aka \012)
                         |------------|-------------------------------
                         | 'WIN'      | Win32 line-endings
                         |            | ("\r\n" aka \015\012)
                         |------------|-------------------------------
                         | 'AUTO'     | try to do autodetection
                         |------------|-------------------------------
                         | custom     | a user-given value for totally
                         |            | borked mailboxes
               ==========|============|===============================
               oldparser | 1          | uses the old (and slower)
               (0)       |            | parser (but guaranteed to show
                         |            | the old behaviour)
                         |------------|-------------------------------
                         | 0          | uses Mail::Mbox::MessageParser
               ==========|============|===============================
               parseropts|            | see "Specifying parser options"
                         |            | below
               ==========|============|===============================

           The newline option comes in handy if you have an mbox-file that
           happens to not conform to the rules of your operating-system's
           character semantics one way or another. One such scenario: You are
           using the module under Win but deliberately have mailboxes with
           UNIX-newlines (or the other way round). If you do not give this
           option, 'AUTO' is assumed and some basic tests on the mailbox are
           performed. This autodetection is of course not capable of detecting
           cases where you use something like '#DELIMITER' as line-ending. It
           can as yet only distinguish between UNIX and Win32ish newlines.
           You may be lucky and it even works for Macintoshes. If you have
           more extravagant wishes, pass a custom value:

               my $mb = Mail::MboxParser->new("mbox", newline => '#DELIMITER');

           You can't use regexes here since internally this relies on the $/
           var ($INPUT_RECORD_SEPARATOR, that is).

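           The literal-string behaviour of $/ can be seen with core Perl
           alone. The following is only a sketch using an in-memory
           filehandle and made-up sample data; it is not the module's own
           reading code.

```perl
use strict;
use warnings;

# In-memory "file" whose records end in a literal custom delimiter
# (sample data invented for this illustration).
my $data = "first record#DELIMITERsecond record#DELIMITER";
open my $fh, '<', \$data or die "cannot open in-memory file: $!";

my @records;
{
    local $/ = '#DELIMITER';   # treated as a literal string, never as a regex
    while (my $rec = <$fh>) {
        chomp $rec;            # chomp removes the current value of $/
        push @records, $rec;
    }
}
close $fh;

print "$_\n" for @records;
```

           Because $/ is compared literally, a value such as '#DELIM.*' would
           only match records that really end in those exact characters.
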
           When passing either a scalar-ref, an array-ref or \*STDIN as first
           argument, an anonymous tmp-file is created to hold the data. This
           procedure is hidden away from the user so there is no need to worry
           about it. Since a tmp-file acts just like an ordinary mailbox-file,
           you don't need to be concerned about loss of data once you have
           walked through the mailbox-data. No data will be lost.

   Specifying parser options
       When available, the module will use "Mail::Mbox::MessageParser" to do
       the parsing. To get the most speed out of it, you can tweak some of its
       options.  Arguably, you even have to do that in order to make it use
       caching. Options for the parser are given via the parseropts switch
       that expects a reference to a hash as its value. The options you can
       specify are:

       enable_cache
               When set to a true value, caching is used but only if you gave
               cache_file_name. There is no default value here!

       cache_file_name
               The file used for caching. This option is mandatory if
               enable_cache is true.

       enable_grep
               When set to a true value (which is the default), the external
               grep(1) is used to speed up parsing. If your system does not
               provide a usable grep implementation, it silently falls back to
               the pure Perl parser.

       When the module was unable to create a "Mail::Mbox::MessageParser"
       object, it will fall back to the old parser in the hope that the
       construction of the object then succeeds.

       open(source, options)
           Takes exactly the same arguments as new() does, except that it can
           be used to change the characteristics of a mailbox on the fly.

       get_messages
           Returns an array containing all messages in the mailbox
           represented as Mail::MboxParser::Mail objects. This method is
           _minimally_ quicker than iterating over the mailbox using
           "next_message" but eats much more memory.  Memory-usage will grow
           linearly with each new message detected since this method creates
           a huge array containing all messages. After creating this array,
           it will be returned.

       get_message(n)
           Returns the n-th message (first message has index 0) in a mailbox.
           Examine "$mb->error" which contains an error-string if the message
           does not exist.  In this case, "get_message" returns undef.

       next_message
           This lets you iterate over a mailbox one mail after another. The
           great advantage over "get_messages" is the very low memory
           consumption. It will be at a constant level throughout the
           execution of your script. Secondly, it almost instantly begins
           spitting out Mail::MboxParser::Mail-objects since it doesn't have
           to slurp in all mails before returning them.

       set_pos(n)
       rewind
       current_pos
           These three methods deal with the position of the internal
           filehandle backing the mailbox. Once you have iterated over the
           whole mailbox using "next_message", MboxParser has reached the end
           of the mailbox and you have to do repositioning if you want to
           iterate again. You could do this with either "set_pos" or "rewind".

               $mb->rewind;  # equivalent to
               $mb->set_pos(0);

           "current_pos" reveals the current position in the mailbox and can
           be used to later return to this position if you want to do tricky
           things. Note that "current_pos" does *not* return the current line
           but rather the current character as returned by Perl's tell()
           function.

               my $last_pos;
               while (my $msg = $mb->next_message) {
                   # ...
                   if ($msg->header->{subject} eq 'I was looking for this') {
                       $last_pos = $mb->current_pos;
                       last; # bail out here and do something else
                   }
               }

               # ...
               # ...

               # now continue where we stopped:
               $mb->set_pos($last_pos);
               while (my $msg = $mb->next_message) {
                   # ...
               }

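           To make the difference between a line number and a tell()
           position concrete, here is a core-Perl sketch on an in-memory
           filehandle. The sample data is invented, and the code only shows
           how tell() and seek() behave; it does not use or reproduce
           Mail::MboxParser itself.

```perl
use strict;
use warnings;

my $data = "line one\nline two\nline three\n";
open my $fh, '<', \$data or die "cannot open in-memory file: $!";

my $first  = <$fh>;        # consumes "line one\n", which is 9 bytes long
my $pos    = tell $fh;     # a byte offset into the file, not a line count
my $second = <$fh>;        # reads "line two\n"

seek($fh, $pos, 0) or die "seek failed: $!";   # 0 means SEEK_SET
my $again = <$fh>;         # reads "line two\n" a second time
close $fh;

print "offset $pos, re-read: $again";   # prints "offset 9, re-read: line two"
```
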
           WARNING:  Be very careful with these methods when using the parser
           of "Mail::Mbox::MessageParser". This parser maintains its own state
           and you shouldn't expect it to always be in sync with the state of
           "Mail::MboxParser".  If you need some finer control over the
           parsing, consider using the public interface as described in the
           manpage of Mail::Mbox::MessageParser. Use parser() to get the
           underlying parser object.

           This however may expose you to the same problems turned around:
           "Mail::MboxParser" may lose its sync with its parser when you do
           that.

           Therefore: Just avoid any of the above for now and wait till
           "Mail::Mbox::MessageParser" has a stable interface.

       make_index
           You can force the creation of a message-index with this method. The
           message-index is a mapping between the index-number of a message (0
           ..  $mb->nmsgs - 1) and the byte-position of the filehandle. This
           is usually done automatically for you once you call "get_message",
           hence the first call for a particular message will be a little
           slower since the message-index first has to be built. This is,
           however, done rather quickly.

           You can have a peek at the index if you are interested. The
           following produces a nicely padded table (suitable for mailboxes up
           to 9.9999...GB ;-).

               $mb->make_index;
               for (0 .. $mb->nmsgs - 1) {
                   printf "%5.5d => %10.10d\n",
                           $_, $mb->get_pos($_);
               }

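           The idea of such an index (message number mapped to byte offset)
           can be sketched in core Perl. This is a deliberately simplified
           illustration and an assumption about the general technique, not
           the module's implementation: it treats any line starting with
           "From " that follows a blank line (or the start of the file) as a
           message boundary, and the two sample messages are invented.

```perl
use strict;
use warnings;

# Tiny invented mailbox held in memory.
my $mbox = join '',
    "From alice\@example.com Mon Jan  1 00:00:00 2001\n",
    "Subject: first\n\nbody one\n\n",
    "From bob\@example.com Tue Jan  2 00:00:00 2001\n",
    "Subject: second\n\nbody two\n";
open my $fh, '<', \$mbox or die "cannot open in-memory mailbox: $!";

my @index;              # $index[$n] is the byte offset of message $n
my $prev_blank = 1;     # the start of the file counts as a boundary
while (1) {
    my $pos  = tell $fh;            # offset of the line about to be read
    my $line = <$fh>;
    last unless defined $line;
    push @index, $pos if $prev_blank && $line =~ /^From /;
    $prev_blank = ($line eq "\n");
}
close $fh;

# Same table layout as in the make_index example above.
printf "%5.5d => %10.10d\n", $_, $index[$_] for 0 .. $#index;
```

           With such a table, jumping to message n is a single seek() to
           $index[n] instead of a scan from the beginning of the file.
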
       get_pos(n)
           This method takes the index-number of a certain message within the
           mailbox and returns the corresponding position of the filehandle
           that represents the start of that message.

           It is mainly used by get_message() and you wouldn't really have to
           bother using it yourself except for statistical purposes as
           demonstrated above along with make_index.

       nmsgs
           Returns the number of messages in a mailbox. You could naturally
           also call get_messages in scalar-context, but this one won't create
           new objects. It just counts them and thus it is much quicker and
           won't eat a lot of memory.

       parser
           Returns the bare "Mail::Mbox::MessageParser" object. If no such
           object exists, returns "undef".

           You can use this method to check whether the module actually uses
           the old or new parser. If "parser" returns a false value, it is
           using the old parsing routines.

   METHODS SHARED BY ALL OBJECTS
       error
           Call this immediately after one of the methods above that mention a
           possible error-message.

       log Internal oddities of any sort are recorded here. Again, only the
           last event is saved.

ERROR-HANDLING

       Mail::MboxParser provides a mechanism for you to figure out why some
       methods did not function as you expected. There are four classes of
       unexpected behavior:

       (1) bad arguments
           In this case you called a method with arguments that did not make
           sense, hence you confused Mail::MboxParser. Example:

             $mail->store_entity_body;           # wrong, needs two arguments
             $mail->store_entity_body(0);        # wrong, still needs one more

           In either of the two cases above, you'll get an error message and
           your script will exit. The message will, however, tell you in which
           line of your script this error occurred.

       (2) correct arguments but...
           Consider this line:

             $mail->store_entity_body(50, \*FH); # could be wrong

           Obviously you did call store_entity_body with the correct number of
           arguments.  That's good because now your script won't just exit.
           Unfortunately, your program can't know in advance whether the
           particular mail ($mail) has a 51st entity.

           So, what to do?

           Just be brave: Write the above line and do the error-checking
           afterwards by calling $mail->error immediately after
           store_entity_body:

                   $mail->store_entity_body(50, \*FH);
                   if ($mail->error) {
                           print "Oops, something wrong: ", $mail->error;
                   }

           In the description of the available methods above, you always find
           a remark when you could use $mail->error. It always returns a
           string that you can print out and investigate any further.

       (3) errors that never become visible
           Well, they exist. When you handle MIME-stuff a lot such as
           attachments etc., Mail::MboxParser internally calls a lot of
           methods provided by the MIME::Tools package. These work splendidly
           in most cases, but the MIME::Tools may fail to produce something
           sensible if you have a very queer or even screwed up mailbox.

           If this happens you might find information on that when calling
           $mail->log.  This will give you the more or less unfiltered error-
           messages produced by MIME::Tools.

           My advice: Ignore them! If there really is something in $mail->log
           it is either because your mails are totally weird (there is
           nothing you can do about that then) or these errors are smoothly
           caught inside Mail::MboxParser in which case all should be fine
           for you.

       (4) the apocalypse
           If nothing seems to work the way it should and $mail->error is
           empty, then the worst case has set in: Mail::MboxParser has a bug.

           Needless to say, there is no way to work around this. In this case
           you should contact me and I'll examine that.

CAVEATS

       I have been working hard on making Mail::MboxParser eat less memory and
       run as quickly as possible. Due to that, two time and memory consuming
       matters are now done on demand. That is, parsing out the MIME-parts
       and turning the raw header into a hash have become closures.

       The drawback of that is that it may get inefficient if you often call

        $mail->header->{field}

       In this case you should probably save the return value of $mail->header
       (a hashref) into a variable since each time you call it the raw header
       is parsed.

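       The effect of saving the hashref can be sketched without a real
       mailbox. FakeMail below is a hypothetical stand-in class invented for
       this illustration; it is not part of Mail::MboxParser. It only mimics
       the described behaviour that every header() call parses the raw header
       again, and counts those parses.

```perl
use strict;
use warnings;

{
    # Hypothetical stand-in for a mail object (not the module's class).
    package FakeMail;
    sub new { bless { raw => "Subject: hi\nFrom: me\n", parses => 0 }, shift }
    sub header {
        my $self = shift;
        $self->{parses}++;                      # one full parse per call
        my %h;
        for my $line (split /\n/, $self->{raw}) {
            $h{lc $1} = $2 if $line =~ /^([^:]+):\s*(.*)$/;
        }
        return \%h;
    }
}

my $mail = FakeMail->new;

# Repeated calls: the raw header is parsed for every field access.
my @slow = ($mail->header->{subject}, $mail->header->{from});
my $parses_uncached = $mail->{parses};

# Saved hashref: parsed once, then reused.
$mail->{parses} = 0;
my $header = $mail->header;
my @fast   = ($header->{subject}, $header->{from});
my $parses_cached = $mail->{parses};

print "uncached: $parses_uncached parses, cached: $parses_cached parse\n";
```
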
       On the other hand, if you have a mailbox of, say, 25MB, and hold each
       header of each message in memory, you'll quickly run out of that. So,
       you can now choose between more performance and more memory.

       This all does not happen if you just parse a mailbox to extract one
       header-field (e.g. subject), work with that and exit. In this case it
       both needs less memory and is still considerably quicker. :-)

BUGS

       Some mailers have a fancy idea of how a "To: "- or "Cc: "-line should
       look. I have seen things like:

               To: "\"John Doe"\" <john.doe@example.com>

       The splitting into name and email, however, does still work here, but
       you have to remove these silly double-quotes and backslashes yourself.

       The way of counting the messages and detecting them now complies with
       RFC 822.  This is, however, no guarantee that it all works seamlessly.
       There are just so many mailboxes that get screwed up by malformed
       mails.

TODO

       Apart from new bugs that almost certainly have been introduced with
       this release, the following things still need to be done:

       Transfer-Encoding
           Still, only quoted-printable encoding is correctly handled.

       Tests
           Clean-up of the test-scripts is desperately needed. Now they
           represent rather an arbitrary selection of tested functions. Some
           are tested several times while others don't show up at all in the
           suites.

THANKS

       Thanks to a number of people who gave me invaluable hints that helped
       me with Mail::Box, notably Mark Overmeer for his hints on more object-
       orientedness.

       Kenn Frankel (kenn AT kenn DOT cc) kindly patched the broken split-
       header routine and added get_field().

       David Coppit for making me aware of "Mail::Mbox::MessageParser" and
       designing it the way I needed to make it work for my module.

VERSION

       This is version 0.55.

AUTHOR AND COPYRIGHT

       Tassilo von Parseval <tassilo.von.parseval@rwth-aachen.de>

       Copyright (c)  2001-2005 Tassilo von Parseval.  This program is free
       software; you can redistribute it and/or modify it under the same terms
       as Perl itself.

SEE ALSO

       MIME::Entity

       Mail::MboxParser::Mail, Mail::MboxParser::Mail::Body,
       Mail::MboxParser::Mail::Convertable

       Mail::Mbox::MessageParser

perl v5.38.0                      2023-07-20                     MboxParser(3)