Mail::Mbox::MessageParser(3)  User Contributed Perl Documentation  Mail::Mbox::MessageParser(3)

NAME

       Mail::Mbox::MessageParser - A fast and simple mbox folder reader

SYNOPSIS

         #!/usr/bin/perl

         use FileHandle;
         use Mail::Mbox::MessageParser;

         # Compression support: the parser detects the compression type
         # of an already opened file handle
         my $file_name = 'mail/saved-mail.xz';
         my $file_handle = FileHandle->new($file_name);

         # Set up cache. (Not necessary if enable_cache is false.)
         Mail::Mbox::MessageParser::SETUP_CACHE(
           { 'file_name' => '/tmp/cache' } );

         my $folder_reader =
           Mail::Mbox::MessageParser->new( {
             'file_name' => $file_name,
             'file_handle' => $file_handle,
             'enable_cache' => 1,
             'enable_grep' => 1,
           } );

         # On failure the constructor returns an error string, not an object
         die $folder_reader unless ref $folder_reader;

         # Any newlines or such before the start of the first email
         my $prologue = $folder_reader->prologue;
         print $prologue;

         # This is the main loop. It's executed once for each email
         while(!$folder_reader->end_of_file())
         {
           my $email = $folder_reader->read_next_email();
           print $$email;
         }

DESCRIPTION

       This module implements a fast but simple mbox folder reader. One of
       three implementations (Cache, Grep, Perl) will be used depending on the
       wishes of the user and the system configuration. The first
       implementation is a cache-based one which stores information about
       mailboxes on the file system. Subsequent accesses will be faster
       because no analysis of the mailbox will be needed. The second
       implementation is one based on GNU grep, and is significantly faster
       than the Perl version for mailboxes which contain very large (10MB)
       emails. The final implementation is a fast Perl-based one which should
       always be applicable.

       The Cache implementation is about 6 times faster than the standard Perl
       implementation. The Grep implementation is about 4 times faster than
       the standard Perl implementation. If you have GNU grep, it's best to
       enable both the Cache and Grep implementations. If the cache
       information is available, you'll get very fast speeds. Otherwise,
       you'll take about a 1/3 performance hit when the Grep version is used
       instead.

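       The 1/3 figure follows from the two speedups quoted above: falling
       back from a 6x implementation to a 4x one loses 1 - 4/6 of the
       throughput. A minimal sketch of that arithmetic, where the %speedup
       table and the fallback_hit() helper are illustrative names and not
       part of the module:

```perl
use strict;
use warnings;

# Speedups relative to the pure-Perl implementation, as quoted in the text.
my %speedup = ( Cache => 6, Grep => 4, Perl => 1 );

# Fraction of throughput lost when falling back from $from to $to.
sub fallback_hit {
    my ( $from, $to ) = @_;
    return 1 - $speedup{$to} / $speedup{$from};
}

printf "Cache -> Grep: %.2f\n", fallback_hit( 'Cache', 'Grep' );    # 0.33
printf "Cache -> Perl: %.2f\n", fallback_hit( 'Cache', 'Perl' );    # 0.83
```
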
       The overriding requirement for this module is speed. If you want more
       sophisticated parsing, use Mail::MboxParser (which is based on this
       module) or Mail::Box.

   METHODS AND FUNCTIONS
       SETUP_CACHE(...)
             SETUP_CACHE( { 'file_name' => <cache file name> } );

             <cache file name> - the file name of the cache

           Call this function once to set up the cache before creating any
           parsers. You must provide the location of the cache file. There is
           no default value.

       new(...)
             new( { 'file_name' => <mailbox file name>,
               'file_handle' => <mailbox file handle>,
               'enable_cache' => <1 or 0>,
               'enable_grep' => <1 or 0>,
               'force_processing' => <1 or 0>,
               'debug' => <1 or 0>,
             } );

             <mailbox file name> - the file name of the mailbox
             <mailbox file handle> - the already opened file handle for the mailbox
             <enable_cache> - true to attempt to use the cache implementation
             <enable_grep> - true to attempt to use the grep implementation
             <force_processing> - true to force processing of files that look invalid
             <debug> - true to print some debugging information to STDERR

           The constructor takes either a file name or a file handle, or both.
           If the file handle is not defined, Mail::Mbox::MessageParser will
           attempt to open the file using the file name. You should always
           pass the file name if you have it, so that the parser can cache the
           mailbox information.

           This module will automatically decompress the mailbox as necessary.
           If a file name is available but the file handle is undef, the module
           will call bzip, bzip2, gzip, lzip, or xz to decompress the file in
           memory if the file name ends with the appropriate suffix. If the
           file handle is defined, it will detect the type of compression and
           apply the correct decompression program.

           The Cache, Grep, or Perl implementation of the parser will be
           loaded, whichever is most appropriate. For example, the first time
           you use caching, there will be no cache. In this case, the grep
           implementation can be used instead. The cache will be updated in
           memory as the grep implementation parses the mailbox, and the cache
           will be written after the program exits. The file name is optional;
           if it is omitted, enable_cache and enable_grep must both be false.

           force_processing will cause the module to process folders that look
           to be binary, or whose text data doesn't look like a mailbox.

           Returns a reference to a Mail::Mbox::MessageParser object on
           success, and a scalar describing an error on failure ("Not a
           mailbox", "Can't open <filename>: <system error>", "Can't execute
           <uncompress command> for file <filename>").

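           Because failure is reported as a plain scalar and not as an
           exception, callers need to test the return value with ref before
           using it. The following is a minimal sketch of one way to wrap
           that check. The reader_or_die() helper is a hypothetical name and
           not part of the module, and the mailbox path is taken from the
           command line.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the module: pass through a parser
# object, or die with the error string the constructor returned.
sub reader_or_die {
    my ( $result, $file_name ) = @_;
    return $result if ref $result;
    die "Cannot read mailbox '$file_name': $result\n";
}

# Only touches a real mailbox when a path is given on the command line.
if (@ARGV) {
    require Mail::Mbox::MessageParser;
    my $file_name = $ARGV[0];

    # File name only: the module opens (and decompresses) the file itself.
    my $reader = reader_or_die(
        Mail::Mbox::MessageParser->new( {
            'file_name'    => $file_name,
            'enable_cache' => 0,
            'enable_grep'  => 0,
        } ),
        $file_name,
    );
    print $reader->prologue;
}
```

           Both enable_cache and enable_grep are switched off here so that
           the sketch does not depend on SETUP_CACHE or on GNU grep being
           available.
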
       reset()
           Reset the filehandle and all internal state. Note that this will
           not work with filehandles which are streams. If there is enough
           demand, I may add the ability to store the previously read stream
           data internally so that reset() will work correctly.

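           One use for reset() is a two-pass read: count the emails first,
           then rewind and process them with the total known. A sketch under
           the assumption that the mailbox is a regular file (not a stream)
           and that number() restarts from 1 after reset(). The two_pass()
           helper and its callback signature are illustrative only.

```perl
use strict;
use warnings;

# Hypothetical helper: the first pass counts the emails, the second pass
# hands each one to $callback together with its number and the total.
sub two_pass {
    my ( $reader, $callback ) = @_;

    my $total = 0;
    while ( !$reader->end_of_file() ) {
        my $email = $reader->read_next_email();
        last unless defined $email;
        $total++;
    }

    # Rewind to the start of the mailbox for the second pass.
    $reader->reset();

    while ( !$reader->end_of_file() ) {
        my $email = $reader->read_next_email();
        last unless defined $email;
        $callback->( $reader->number(), $total, $email );
    }
    return $total;
}

# Only touches a real mailbox when a path is given on the command line.
if (@ARGV) {
    require Mail::Mbox::MessageParser;
    my $reader = Mail::Mbox::MessageParser->new( {
        'file_name'    => $ARGV[0],
        'enable_cache' => 0,
        'enable_grep'  => 0,
    } );
    die $reader unless ref $reader;
    two_pass( $reader, sub {
        my ( $number, $total, $email ) = @_;
        printf "email %d of %d (%d bytes)\n", $number, $total, length $$email;
    } );
}
```
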
       endline()
           Returns "\n" or "\r\n", depending on the file format.

       prologue()
           Returns any newlines or other content at the start of the mailbox
           prior to the first email.

       end_of_file()
           Returns true if the end of the file has been encountered.

       line_number()
           Returns the line number for the start of the last email read.

       number()
           Returns the number of the last email read (i.e., the first email
           has number 1).

       length()
           Returns the length of the last email read.

       offset()
           Returns the byte offset of the last email read.

       read_next_email()
           Returns a reference to a scalar holding the text of the next email
           in the mailbox, or undef at the end of the file.

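           The positional accessors above all describe the email most
           recently returned by read_next_email(), so they are read right
           after each call. A minimal sketch that builds an index of the
           folder this way. The email_index() helper and the hash keys it
           uses are illustrative names, not part of the module.

```perl
use strict;
use warnings;

# Hypothetical helper: read every email and record where it was found.
sub email_index {
    my ($reader) = @_;
    my @index;
    while ( !$reader->end_of_file() ) {
        my $email = $reader->read_next_email();
        last unless defined $email;

        # Each accessor reports on the email that was just read.
        push @index, {
            number => $reader->number(),
            line   => $reader->line_number(),
            offset => $reader->offset(),
            length => $reader->length(),
        };
    }
    return \@index;
}

# Only touches a real mailbox when a path is given on the command line.
if (@ARGV) {
    require Mail::Mbox::MessageParser;
    my $reader = Mail::Mbox::MessageParser->new( {
        'file_name'    => $ARGV[0],
        'enable_cache' => 0,
        'enable_grep'  => 0,
    } );
    die $reader unless ref $reader;
    printf "%d: line %d, offset %d, length %d\n",
        @{$_}{qw(number line offset length)}
        for @{ email_index($reader) };
}
```
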

BUGS

       No known bugs.

       Contact david@coppit.org for bug reports and suggestions.

AUTHOR

       David Coppit <david@coppit.org>.

LICENSE

       This code is distributed under the GNU General Public License (GPL)
       Version 2.  See the file LICENSE in the distribution for details.

HISTORY

       This code was originally part of the grepmail distribution. See
       http://grepmail.sf.net/ for previous versions of grepmail which
       included early versions of this code.

SEE ALSO

       Mail::MboxParser, Mail::Box

perl v5.38.0                      2023-07-20      Mail::Mbox::MessageParser(3)