Mail::Mbox::MessageParser(3) User Contributed Perl Documentation Mail::Mbox::MessageParser(3)

NAME
Mail::Mbox::MessageParser - A fast and simple mbox folder reader

SYNOPSIS
    #!/usr/bin/perl

    use strict;
    use warnings;

    use FileHandle;
    use Mail::Mbox::MessageParser;

    # Compression support: the .xz mailbox is decompressed automatically
    my $file_name = 'mail/saved-mail.xz';
    my $file_handle = FileHandle->new($file_name)
      or die "Can't open $file_name: $!";

    # Set up cache. (Not necessary if enable_cache is false.)
    Mail::Mbox::MessageParser::SETUP_CACHE(
      { 'file_name' => '/tmp/cache' } );

    my $folder_reader =
      Mail::Mbox::MessageParser->new( {
        'file_name'    => $file_name,
        'file_handle'  => $file_handle,
        'enable_cache' => 1,
        'enable_grep'  => 1,
      } );

    # On failure, new() returns an error string instead of an object
    die $folder_reader unless ref $folder_reader;

    # Any newlines or such before the start of the first email
    my $prologue = $folder_reader->prologue;
    print $prologue;

    # This is the main loop. It's executed once for each email
    while(!$folder_reader->end_of_file())
    {
      my $email = $folder_reader->read_next_email();
      print $$email;
    }

DESCRIPTION
This module implements a fast but simple mbox folder reader. One of
three implementations (Cache, Grep, Perl) will be used, depending on
the wishes of the user and the system configuration. The first is a
cache-based implementation that stores information about the emails in
each mailbox on the file system. Subsequent accesses are faster because
the mailbox does not need to be analyzed again. The second is based on
GNU grep and is significantly faster than the Perl version for
mailboxes that contain very large emails (around 10 MB). The final
implementation is a fast Perl-based one that should always be
applicable.

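As a rough illustration of what a Perl-based reader has to do, the sketch below splits an in-memory mbox string into a prologue and a list of messages at lines that begin with "From ". This is a simplified assumption, not this module's actual algorithm: it ignores From_-line validation, escaped ">From " lines, and Content-Length headers. The function name split_mbox is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split an mbox-formatted string into a prologue and
# a list of messages. A message starts at any line beginning with "From "
# (a simplification; real mbox parsing needs more careful checks).
sub split_mbox {
    my ($text) = @_;
    my ($prologue, @emails) = ('');
    my $current;

    # Split into lines while keeping each line's terminator.
    for my $line (split /^/m, $text) {
        if ($line =~ /^From /) {
            push @emails, $current if defined $current;
            $current = $line;        # start a new message
        }
        elsif (defined $current) {
            $current .= $line;       # continue the current message
        }
        else {
            $prologue .= $line;      # content before the first message
        }
    }
    push @emails, $current if defined $current;
    return ($prologue, @emails);
}

my $mbox = "\n"
  . "From alice\@example.com Mon Jan  2 10:00:00 2023\n"
  . "Subject: one\n\nHello\n\n"
  . "From bob\@example.com Mon Jan  2 11:00:00 2023\n"
  . "Subject: two\n\nWorld\n";

my ($prologue, @emails) = split_mbox($mbox);
printf "prologue: %d byte(s), emails: %d\n", length $prologue, scalar @emails;
```

The leading blank line in the sample ends up in the prologue, which corresponds to what prologue() is documented to return.
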
The Cache implementation is about 6 times faster than the standard Perl
implementation, and the Grep implementation is about 4 times faster. If
you have GNU grep, it's best to enable both the Cache and Grep
implementations: when cache information is available you'll get very
fast speeds, and otherwise the Grep implementation is used instead, at
a cost of roughly one third of the performance.

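The one-third figure follows from the two speedups quoted above. Taking the Perl implementation's running time on some mailbox as T (an illustrative symbol, not a measured value):

```latex
t_{\text{Cache}} \approx \frac{T}{6}, \qquad
t_{\text{Grep}} \approx \frac{T}{4}, \qquad
\frac{\text{speed}_{\text{Grep}}}{\text{speed}_{\text{Cache}}}
  = \frac{t_{\text{Cache}}}{t_{\text{Grep}}}
  \approx \frac{T/6}{T/4} = \frac{4}{6} = \frac{2}{3}
```

So falling back to Grep delivers roughly two thirds of the Cache throughput, a loss of about one third; equivalently, each run takes about 1.5 times as long.
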
The overriding requirement for this module is speed. If you need more
sophisticated parsing, use Mail::MboxParser (which is based on this
module) or Mail::Box.

METHODS AND FUNCTIONS
SETUP_CACHE(...)
    SETUP_CACHE( { 'file_name' => <cache file name> } );

    <cache file name> - the file name of the cache

Call this function once to set up the cache before creating any
parsers. You must provide the location of the cache file; there is no
default value.

new(...)
    new( { 'file_name'        => <mailbox file name>,
           'file_handle'      => <mailbox file handle>,
           'enable_cache'     => <1 or 0>,
           'enable_grep'      => <1 or 0>,
           'force_processing' => <1 or 0>,
           'debug'            => <1 or 0>,
         } );

    <mailbox file name>   - the file name of the mailbox
    <mailbox file handle> - the already opened file handle for the mailbox
    <enable_cache>        - true to attempt to use the cache implementation
    <enable_grep>         - true to attempt to use the grep implementation
    <force_processing>    - true to force processing of files that look invalid
    <debug>               - true to print some debugging information to STDERR

The constructor takes a file name, a file handle, or both. If the file
handle is not defined, Mail::Mbox::MessageParser will attempt to open
the file using the file name. You should always pass the file name if
you have it, so that the parser can cache the mailbox information.

This module automatically decompresses the mailbox as necessary. If a
file name is available but the file handle is undef, the module will
call bzip, bzip2, gzip, lzip, or xz to decompress the file in memory,
provided the file name ends with the appropriate suffix. If the file
handle is defined, the module will detect the type of compression and
apply the correct decompression program.

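The text above says the module detects the compression type of an open file handle but does not say how. A common technique is to inspect the leading "magic" bytes. The sketch below assumes that approach and uses hypothetical helpers compression_type and peek_header. The signatures shown are the standard ones for gzip, bzip2, xz, and lzip; the original bzip format is left out because its signature is not covered here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: guess the compression format from the first bytes
# of a file. Returns undef for data that matches none of the signatures.
my %MAGIC = (
    gzip  => "\x1f\x8b",
    bzip2 => "BZh",
    xz    => "\xfd7zXZ\x00",
    lzip  => "LZIP",
);

sub compression_type {
    my ($header) = @_;
    for my $type (sort keys %MAGIC) {
        my $sig = $MAGIC{$type};
        return $type if substr($header, 0, length $sig) eq $sig;
    }
    return undef;
}

# Hypothetical helper: read the first few bytes of a handle, then rewind
# so the decompressor (or parser) still sees the whole stream. This only
# works for seekable handles, not for pipes.
sub peek_header {
    my ($fh) = @_;
    my $n = read($fh, my $header, 6);
    die "read failed: $!" unless defined $n;
    seek($fh, 0, 0) or die "seek failed: $!";
    return $header;
}

# An in-memory handle standing in for an opened gzip-compressed mailbox.
my $fake = "\x1f\x8b\x08\x00rest-of-gzip-data";
open(my $fh, '<', \$fake) or die $!;
binmode $fh;

my $type = compression_type(peek_header($fh));
print defined $type ? "compressed with $type\n" : "not compressed\n";
```
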
Whichever of the Cache, Grep, or Perl implementations is most
appropriate will be loaded. For example, the first time you use
caching there will be no cache, so the Grep implementation can be used
instead. The cache is updated in memory as the Grep implementation
parses the mailbox, and it is written out after the program exits. The
file name is optional; if it is omitted, enable_cache and enable_grep
must both be false.

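The selection rule described above can be sketched as a small decision function. This is an assumption based only on this documentation, not the module's internal logic: choose_implementation and its inputs (whether a usable cache entry exists and whether GNU grep was found) are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical decision function mirroring the documented preference
# order: Cache if enabled and usable, then Grep, then the pure-Perl parser.
sub choose_implementation {
    my (%opt) = @_;

    # Without a file name there is nothing to cache or to grep, so this
    # sketch rejects the enable_* flags in that case.
    if (!defined $opt{file_name}) {
        die "enable_cache and enable_grep must be false without a file name\n"
          if $opt{enable_cache} || $opt{enable_grep};
        return 'Perl';
    }

    return 'Cache' if $opt{enable_cache} && $opt{cache_entry_exists};
    return 'Grep'  if $opt{enable_grep}  && $opt{grep_available};
    return 'Perl';
}

# First run: caching is enabled but no cache exists yet, so Grep is used.
print choose_implementation(
    file_name          => 'mail/saved-mail',
    enable_cache       => 1,
    enable_grep        => 1,
    cache_entry_exists => 0,
    grep_available     => 1,
), "\n";
```
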
force_processing causes the module to process folders that appear to
be binary, or whose text data doesn't look like a mailbox.

Returns a reference to a Mail::Mbox::MessageParser object on success,
and a scalar describing the error on failure: "Not a mailbox", "Can't
open <filename>: <system error>", or "Can't execute <uncompress
command> for file <filename>".

reset()
Resets the filehandle and all internal state. Note that this will not
work with filehandles that are streams. If there is enough demand, I
may add the ability to store the previously read stream data
internally so that reset() will work correctly.

endline()
Returns "\n" or "\r\n", depending on the file format.

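endline() reports which line terminator the mailbox uses. One plausible way to decide, assumed here rather than taken from the module, is to look at how the first line ends; detect_endline is a hypothetical helper.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return "\r\n" if the first line of the data ends
# with CRLF, otherwise "\n".
sub detect_endline {
    my ($data) = @_;
    return $data =~ /\A[^\n]*\r\n/ ? "\r\n" : "\n";
}

my $dos  = "From a\@example.com Mon Jan  2 10:00:00 2023\r\nSubject: x\r\n";
my $unix = "From a\@example.com Mon Jan  2 10:00:00 2023\nSubject: x\n";

for my $data ($dos, $unix) {
    my $eol = detect_endline($data);
    print $eol eq "\r\n" ? "CRLF\n" : "LF\n";
}
```
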
prologue()
Returns any newlines or other content at the start of the mailbox
prior to the first email.

end_of_file()
Returns true if the end of the file has been encountered.

line_number()
Returns the line number of the start of the last email read.

number()
Returns the number of the last email read. (The first email has
number 1.)

length()
Returns the length of the last email read.

offset()
Returns the byte offset of the last email read.

read_next_email()
Returns a reference to a scalar holding the text of the next email in
the mailbox, or undef at the end of the file.

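To make the bookkeeping behind number(), line_number(), offset(), and length() concrete, here is a self-contained toy reader over an in-memory string that follows the same read loop as the SYNOPSIS. The class ToyMboxReader is hypothetical and is not part of Mail::Mbox::MessageParser. It splits messages naively at "From " lines, and it assumes 1-based line numbers and 0-based byte offsets, which the documentation above does not state.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package ToyMboxReader;

# Hypothetical toy class: precomputes message boundaries from a string.
sub new {
    my ($class, $text) = @_;
    my (@starts, @lines);
    my ($pos, $line_no) = (0, 1);
    for my $line (split /^/m, $text) {
        if ($line =~ /^From /) {
            push @starts, $pos;       # 0-based byte offset of the message
            push @lines,  $line_no;   # 1-based line number of its From line
        }
        $pos += CORE::length($line);
        $line_no++;
    }
    return bless {
        text => $text, starts => \@starts, lines => \@lines,
        next => 0,                    # index of the next message to read
        number => 0, line_number => 0, offset => 0, length => 0,
    }, $class;
}

sub end_of_file { my $s = shift; return $s->{next} >= @{ $s->{starts} } }

sub read_next_email {
    my $s = shift;
    return undef if $s->end_of_file;
    my $i     = $s->{next}++;
    my $start = $s->{starts}[$i];
    my $end   = $i + 1 < @{ $s->{starts} } ? $s->{starts}[$i + 1]
                                           : CORE::length($s->{text});
    @$s{qw(number line_number offset length)} =
        ($i + 1, $s->{lines}[$i], $start, $end - $start);
    my $email = substr($s->{text}, $start, $end - $start);
    return \$email;                   # a scalar reference, as documented
}

sub number      { $_[0]{number} }
sub line_number { $_[0]{line_number} }
sub offset      { $_[0]{offset} }
sub length      { $_[0]{length} }

package main;

my $reader = ToyMboxReader->new(
    "From a\@example.com Mon Jan  2 10:00:00 2023\nSubject: one\n\nHi\n\n"
  . "From b\@example.com Mon Jan  2 11:00:00 2023\nSubject: two\n\nBye\n");

my @seen;
while (!$reader->end_of_file) {
    my $email = $reader->read_next_email;
    push @seen, [ $reader->number, $reader->line_number,
                  $reader->offset, $reader->length ];
    printf "email %d: line %d, offset %d, %d bytes\n", @{ $seen[-1] };
}
```
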
BUGS
No known bugs.

Contact david@coppit.org for bug reports and suggestions.

AUTHOR
David Coppit <david@coppit.org>.

LICENSE
This code is distributed under the GNU General Public License (GPL)
Version 2. See the file LICENSE in the distribution for details.

HISTORY
This code was originally part of the grepmail distribution. See
http://grepmail.sf.net/ for previous versions of grepmail which
included early versions of this code.

SEE ALSO
Mail::MboxParser, Mail::Box

perl v5.36.0 2023-01-20 Mail::Mbox::MessageParser(3)