Mail::Mbox::MessageParser(3)  User Contributed Perl Documentation  Mail::Mbox::MessageParser(3)

NAME
       Mail::Mbox::MessageParser - A fast and simple mbox folder reader

SYNOPSIS
         #!/usr/bin/perl

         use FileHandle;
         use Mail::Mbox::MessageParser;

         my $file_name = 'mail/saved-mail';
         my $file_handle = FileHandle->new($file_name)
           or die "Can't open $file_name: $!";

         # Set up cache. (Not necessary if enable_cache is false.)
         Mail::Mbox::MessageParser::SETUP_CACHE(
           { 'file_name' => '/tmp/cache' } );

         my $folder_reader =
           Mail::Mbox::MessageParser->new( {
             'file_name' => $file_name,
             'file_handle' => $file_handle,
             'enable_cache' => 1,
             'enable_grep' => 1,
           } );

         # On failure, new() returns an error string instead of an object
         die $folder_reader unless ref $folder_reader;

         # Any newlines or such before the start of the first email
         my $prologue = $folder_reader->prologue;
         print $prologue;

         # This is the main loop. It's executed once for each email
         while(!$folder_reader->end_of_file())
         {
           my $email = $folder_reader->read_next_email();
           print $$email;
         }

DESCRIPTION
       This module implements a fast but simple mbox folder reader. One of
       three implementations (Cache, Grep, Perl) will be used depending on the
       wishes of the user and the system configuration. The first
       implementation is a cache-based one which stores information about
       the emails in each mailbox on the file system. Subsequent accesses
       will be faster because no analysis of the mailbox will be needed. The
       second implementation is based on GNU grep, and is significantly
       faster than the Perl version for mailboxes which contain very large
       (10MB) emails. The final implementation is a fast Perl-based one which
       should always be applicable.

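       To make the idea of a Perl-based reader concrete, here is a minimal
       sketch of splitting mbox text into a prologue and individual emails.
       It is not this module's implementation: it assumes that any line
       starting with "From " begins a new email, and split_mbox is a
       hypothetical name used only for illustration.

```perl
use strict;
use warnings;

# Split mbox text into a prologue string and a list of email strings.
# ASSUMPTION: every line beginning with "From " starts a new email.
# Real mbox readers validate the separator line more carefully.
sub split_mbox {
    my ($text) = @_;
    my $prologue = '';
    my @emails;

    # split /^/ breaks the text into lines and keeps the line endings
    for my $line (split /^/, $text) {
        if ($line =~ /^From /) {
            push @emails, $line;       # start a new email
        }
        elsif (@emails) {
            $emails[-1] .= $line;      # continue the current email
        }
        else {
            $prologue .= $line;        # content before the first email
        }
    }
    return ($prologue, @emails);
}
```

       A real reader works on a file handle rather than an in-memory string,
       which is what keeps memory use low for large mailboxes.
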
       The Cache implementation is about 6 times faster than the standard Perl
       implementation. The Grep implementation is about 4 times faster than
       the standard Perl implementation. If you have GNU grep, it's best to
       enable both the Cache and Grep implementations. If the cache
       information is available, you'll get very fast speeds. Otherwise,
       you'll take about a 1/3 performance hit when the Grep version is used
       instead.

       The overriding requirement for this module is speed. If you want more
       sophisticated parsing, use Mail::MboxParser (which is based on this
       module) or Mail::Box.

   METHODS AND FUNCTIONS
       SETUP_CACHE(...)
             SETUP_CACHE( { 'file_name' => <cache file name> } );

             <cache file name> - the file name of the cache

           Call this function once to set up the cache before creating any
           parsers. You must provide the location of the cache file. There is
           no default value.

       new(...)
             new( { 'file_name' => <mailbox file name>,
                    'file_handle' => <mailbox file handle>,
                    'enable_cache' => <1 or 0>,
                    'enable_grep' => <1 or 0>,
                    'force_processing' => <1 or 0>,
                    'debug' => <1 or 0>,
                  } );

             <mailbox file name> - the file name of the mailbox
             <mailbox file handle> - the already opened file handle for the mailbox
             <enable_cache> - true to attempt to use the cache implementation
             <enable_grep> - true to attempt to use the grep implementation
             <force_processing> - true to force processing of files that look invalid
             <debug> - true to print some debugging information to STDERR

           The constructor takes either a file name or a file handle, or both.
           If the file handle is not defined, Mail::Mbox::MessageParser will
           attempt to open the file using the file name. You should always
           pass the file name if you have it, so that the parser can cache the
           mailbox information.

           This module will automatically decompress the mailbox as necessary.
           If a file name is available but the file handle is undef, the
           module will choose a decompression program (for example bzip2 or
           gzip) based on the file name extension (.tz, .bz2, or .gz) and
           decompress the file in memory. If the file handle is defined, it
           will detect the type of compression and apply the correct
           decompression program.

           The Cache, Grep, or Perl implementation of the parser will be
           loaded, whichever is most appropriate. For example, the first time
           you use caching, there will be no cache. In this case, the grep
           implementation can be used instead. The cache will be updated in
           memory as the grep implementation parses the mailbox, and the cache
           will be written after the program exits. The file name is optional;
           if it is omitted, enable_cache and enable_grep must both be false.

           force_processing will cause the module to process folders that
           appear to be binary, or whose text data doesn't look like a
           mailbox.

           Returns a reference to a Mail::Mbox::MessageParser object on
           success, and a scalar describing an error on failure ("Not a
           mailbox", "Can't open <filename>: <system error>", or "Can't
           execute <uncompress command> for file <filename>").

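           Because new() reports failure by returning a plain string rather
           than by dying, callers need to test the result with ref. The
           sketch below wraps that check in a small helper; reader_or_die is
           a hypothetical name and is not part of this module.

```perl
use strict;
use warnings;

# Return the parser object, or die with the error string from new().
# 'reader_or_die' is a hypothetical helper, not part of the module.
sub reader_or_die {
    my ($result) = @_;
    return $result if ref $result;           # success: an object reference
    die "Cannot read mailbox: $result\n";    # failure: a plain error string
}

# Intended use (requires Mail::Mbox::MessageParser to be installed):
#
#   my $reader = reader_or_die(
#       Mail::Mbox::MessageParser->new( {
#           'file_name'    => 'mail/saved-mail',
#           'enable_cache' => 0,
#           'enable_grep'  => 0,
#       } ) );
```
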
       reset()
           Reset the file handle and all internal state. Note that this will
           not work with file handles which are streams. If there is enough
           demand, I may add the ability to store the previously read stream
           data internally so that reset() will work correctly.

       endline()
           Returns "\n" or "\r\n", depending on the file format.

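           As an illustration of what "depending on the file format" means,
           the following sketch guesses the line ending from the first line
           of some mailbox text. It is only an assumption about one
           reasonable way to do this, not a description of how the module
           decides; guess_endline is a hypothetical name.

```perl
use strict;
use warnings;

# Guess the line ending used by some mailbox text.
# ASSUMPTION: the terminator of the first line is representative
# of the whole file.
sub guess_endline {
    my ($text) = @_;
    return $text =~ /\A[^\n]*\r\n/ ? "\r\n" : "\n";
}
```
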
       prologue()
           Returns any newlines or other content at the start of the mailbox
           prior to the first email.

       end_of_file()
           Returns true if the end of the file has been encountered.

       line_number()
           Returns the line number for the start of the last email read.

       number()
           Returns the number of the last email read. (The first email is
           number 1.)

       length()
           Returns the length of the last email read.

       offset()
           Returns the byte offset of the last email read.

       read_next_email()
           Returns a reference to a scalar holding the text of the next email
           in the mailbox, or undef at the end of the file.

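       The accessor methods above all describe the most recently read email,
       so they are meant to be called after each read_next_email(). The
       sketch below collects them into a simple index. index_mailbox is a
       hypothetical helper, not part of this module; it accepts any object
       that provides the documented methods.

```perl
use strict;
use warnings;

# Build one { number, line, offset, length } record per email.
# $reader is any object offering the methods documented above;
# 'index_mailbox' is a hypothetical helper, not part of the module.
sub index_mailbox {
    my ($reader) = @_;
    my @index;

    until ($reader->end_of_file()) {
        my $email = $reader->read_next_email();
        last unless defined $email;          # undef signals end of file

        push @index, {
            number => $reader->number(),
            line   => $reader->line_number(),
            offset => $reader->offset(),
            length => $reader->length(),
        };
    }
    return \@index;
}
```

       With such an index, a later pass could seek directly to an email's
       offset in an uncompressed mailbox file instead of parsing the file
       again.
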
BUGS
       No known bugs.

       Contact david@coppit.org for bug reports and suggestions.

AUTHOR
       David Coppit <david@coppit.org>.

LICENSE
       This software is distributed under the terms of the GPL. See the file
       "LICENSE" for more information.

HISTORY
       This code was originally part of the grepmail distribution. See
       http://grepmail.sf.net/ for previous versions of grepmail which
       included early versions of this code.

SEE ALSO
       Mail::MboxParser, Mail::Box

perl v5.12.0                      2010-06-01      Mail::Mbox::MessageParser(3)