MIME::Parser::Filer(3)  User Contributed Perl Documentation  MIME::Parser::Filer(3)

NAME
MIME::Parser::Filer - manage file-output of the parser

SYNOPSIS
Before reading further, you should see MIME::Parser to make sure that
you understand where this module fits into the grand scheme of things.
Go on, do it now. I'll wait.

Ready? Ok... now read "DESCRIPTION" below, and everything else should
make sense.

Public interface

### Create a "filer" of the desired class:
my $filer = MIME::Parser::FileInto->new($dir);
my $filer = MIME::Parser::FileUnder->new($basedir);
...

### Want added security? Don't let outsiders name your files:
$filer->ignore_filename(1);

### Prepare for the parsing of a new top-level message:
$filer->init_parse;

### Return the path where this message's data should be placed:
$path = $filer->output_path($head);

Semi-public interface

These methods might be overridden or ignored in some subclasses, so they
don't all make sense in all circumstances:

### Tweak the mapping from content-type to extension:
$emap = $filer->output_type_ext;
$emap->{"text/html"} = ".htm";

DESCRIPTION
How this class is used when parsing

When a MIME::Parser decides that it wants to output a file to disk, it
uses its "Filer" object, an instance of a MIME::Parser::Filer subclass,
to determine where to put the file.

Every parser has a single Filer object, which it uses for all parsing.
You can get the Filer for a given $parser like this:

$filer = $parser->filer;

At the beginning of each "parse()", the filer's internal state is reset
by the parser:

$parser->filer->init_parse;

The parser can then get a path for each entity in the message by
handing that entity's header (a MIME::Head) to the filer and having it
do the work, like this:

$new_file = $parser->filer->output_path($head);

Since it's nice to be able to clean up after a parse (especially a
failed parse), the parser tells the filer when it has actually used a
path:

$parser->filer->purgeable($new_file);

Then, if you want to clean up the files which were created for a
particular parse (and also any directories that the Filer created), you
would do this:

$parser->filer->purge;

Writing your own subclasses

There are two standard "Filer" subclasses (see below):
MIME::Parser::FileInto, which throws all files from all parses into the
same directory, and MIME::Parser::FileUnder (preferred), which creates
a subdirectory for each message. Hopefully, these will be sufficient
for most uses, but just in case...

The only method you have to override is output_path():

$filer->output_path($head);

This method is invoked by MIME::Parser when it wants to put a decoded
message body in an output file. The method should return a path to the
file to create. Failure is indicated by throwing an exception.

The path returned by "output_path()" should be "ready for open()": any
necessary parent directories need to exist at that point. These
directories can be created by the Filer, of course, and they should be
marked as purgeable() if a purge should delete them.

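The following is a minimal sketch of what "ready for open()" can involve. It is a standalone helper named ready_for_open, which is hypothetical and not part of MIME::Parser::Filer. The helper creates any missing parent directories with File::Path::make_path. It also records the directories it actually created, parents first, in a caller-supplied list. A real filer would pass those directories to purgeable() instead.

```perl
use strict;
use warnings;
use File::Path qw(make_path);
use File::Spec;

### Hypothetical helper, NOT part of MIME::Parser::Filer:
### make sure DIR exists, remember the directories we had to create
### (make_path returns only those, parents first), and return DIR/FILENAME.
sub ready_for_open {
    my ($dir, $filename, $purgeable) = @_;
    my @created = make_path($dir);     ### croaks if a directory cannot be made
    push @$purgeable, @created;        ### a real filer would call purgeable() here
    return File::Spec->catfile($dir, $filename);
}
```

Because only newly created directories are recorded, a later purge will not remove a directory that existed before the parse.
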
Actually, if your issue is more where the files go than what they're
named, you can use the default output_path() method and just override
one of its components:

$dir = $filer->output_dir($head);
$name = $filer->output_filename($head);
...

MIME::Parser::Filer

This is the abstract superclass of all "filer" objects.

new INITARGS...
Class method, constructor. Create a new outputter. Any arguments are
given to init(), which subclasses should override for their own use
(the default init does nothing).

results RESULTS
Instance method. Link this filer to a MIME::Parser::Results object,
which will tally the messages. Notice that we link it to the results
object rather than to the parser, to avoid a circular reference!

init_parse
Instance method. Prepare to start parsing a new message. Subclasses
should always be sure to invoke the inherited method.

evil_filename FILENAME
Instance method. Is this an evil filename; i.e., one which should
not be used in generating a disk file name? It is if any of these
are true:

* it is empty
* it is a string of dots: ".", "..", etc.
* it contains characters not in the set: "A" - "Z", "a" - "z",
  "0" - "9", "-", "_", "+", "=", ".", ",", "@", "#",
  "$", and " ".
* it is too long

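To make the four rules above concrete, here is a minimal standalone sketch of the same checks. The function name looks_evil and the 80-character limit are assumptions chosen for illustration. The method you would actually override is the module's evil_filename(), and its length limit may differ.

```perl
use strict;
use warnings;

### Hypothetical stand-alone version of the evil_filename rules.
### MAX_LEN is an ASSUMED limit; the real filer uses its own setting.
use constant MAX_LEN => 80;

sub looks_evil {
    my ($name) = @_;
    return 1 if !defined($name) || $name eq '';         ### it is empty
    return 1 if $name =~ /\A\.+\z/;                     ### ".", "..", ...
    return 1 if $name =~ /[^A-Za-z0-9\-_+=.,\@#\$ ]/;   ### outside the allowed set
    return 1 if length($name) > MAX_LEN;                ### it is too long
    return 0;
}
```

Note that "/" and "\" are not in the allowed set, so any name that tries to climb out of the output directory is rejected by the character rule alone.
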
If you just want to change this behavior, you should override this
method in the subclass of MIME::Parser::Filer that you use.

Warning: at the time this method is invoked, the FILENAME has
already been unmime'd into the local character set. If you're
using any character set other than ASCII, ISO-8859-*, or UTF-8, the
interpretation of the "path" characters might be very different,
and you will probably need to override this method. See "unmime"
in MIME::WordDecoder for more details.

Note: subclasses of MIME::Parser::Filer which override output_path()
might not consult this method; note, however, that the built-in
subclasses do consult it.

Thanks to Andrew Pimlott for finding a real dumb bug in the original
version. Thanks to Nickolay Saukh for noting that evil is in the eye
of the beholder.

exorcise_filename FILENAME
Instance method. If a given filename is evil (see "evil_filename"),
we try to rescue it by performing some basic operations (shortening
it, removing bad characters, etc.) and checking each result against
evil_filename().

Returns the exorcised filename (which is guaranteed not to be evil),
or undef if it could not be salvaged.

Warning: at the time this method is invoked, the FILENAME has
already been unmime'd into the local character set. If you're
using any character set other than ASCII, ISO-8859-*, or UTF-8, the
interpretation of the "path" characters might be very different,
and you will probably need to override this method. See "unmime"
in MIME::WordDecoder for more details.

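Here is a minimal sketch of the rescue idea, under stated assumptions. The function exorcise, its $max_len parameter, and the order of repairs are all hypothetical. The sketch first replaces bad characters and then shortens the name while keeping a short extension. This is not the module's actual algorithm. The evil test is passed in as a code reference, so the sketch can be paired with any evil_filename-style predicate.

```perl
use strict;
use warnings;

### Hypothetical sketch, NOT the module's actual algorithm: try a few
### cheap repairs in turn and return the first candidate that the
### supplied predicate accepts, or undef if none does.
sub exorcise {
    my ($name, $is_evil, $max_len) = @_;
    return $name unless $is_evil->($name);

    ### Repair 1: replace each run of disallowed characters with "_".
    (my $clean = $name) =~ s/[^A-Za-z0-9\-_+=.,\@#\$ ]+/_/g;
    return $clean unless $is_evil->($clean);

    ### Repair 2: shorten the root, keeping a short final extension.
    my ($root, $ext) = $clean =~ /\A(.*?)((?:\.\w{1,5})?)\z/;
    my $short = substr($root, 0, $max_len - length($ext)) . $ext;
    return $short unless $is_evil->($short);

    return undef;    ### could not be salvaged
}
```

Each candidate is re-checked with the same predicate, which is what lets the result be "guaranteed not to be evil" by that predicate's standard.
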
find_unused_path DIR, FILENAME
Instance method, subclasses only. We have decided on an output
directory and tentative filename, but a file by that name might
already exist. Keep adding a numeric suffix "-1", "-2", etc. to the
filename until an unused path is found, and then return that path.

The suffix is actually added before the first "." in the filename
if there is one; for example:

picture.gif       archive.tar.gz       readme
picture-1.gif     archive-1.tar.gz     readme-1
picture-2.gif     archive-2.tar.gz     readme-2
...               ...                  ...
picture-10.gif
...

This can be a costly operation, and risky if you don't want files
renamed, so it is in your best interest to minimize situations
where these kinds of collisions occur. Unfortunately, if a multipart
message gives all of its parts the same recommended filename, and
you are placing them all in the same directory, this method might
be unavoidable.

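The suffix rule above can be sketched as a standalone function. The name unused_path is hypothetical. The function only mirrors the documented behavior (the suffix goes before the first ".") and is not the module's own code.

```perl
use strict;
use warnings;
use File::Spec;

### Hypothetical re-implementation of the documented suffix rule:
### insert "-1", "-2", ... before the first "." until the path is unused.
sub unused_path {
    my ($dir, $filename) = @_;
    my $path = File::Spec->catfile($dir, $filename);
    return $path unless -e $path;

    ### Split at the first dot: "archive.tar.gz" -> ("archive", ".tar.gz").
    my ($root, $ext) = $filename =~ /\A([^.]*)(.*)\z/s;
    for (my $n = 1; ; $n++) {
        $path = File::Spec->catfile($dir, "$root-$n$ext");
        return $path unless -e $path;
    }
}
```

Checking with -e and creating the file later leaves a small window in which another process could claim the same name. That is tolerable in a sketch, but code that must be race-free would create the file with sysopen and O_CREAT|O_EXCL instead.
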
ignore_filename [YESNO]
Instance method. Return true if we should always ignore recommended
filenames in messages, choosing instead to always generate our own
filenames. With argument, sets this value.

Note: subclasses of MIME::Parser::Filer which override output_path()
might not honor this setting; note, however, that the built-in
subclasses honor it.

output_dir HEAD
Instance method. Return the output directory for the given header.
The default method returns ".".

output_filename HEAD
Instance method, subclasses only. Called when the recommended
filename was either not given or judged to be evil. Return a fake
name, possibly using information in the message HEADer. Note that
this is just the filename, not the full path.

Used by output_path(). If you're using the default output_path(),
you probably don't need to worry about avoiding collisions with
existing files; we take care of that in find_unused_path().

output_prefix [PREFIX]
Instance method. Get the short string that all filenames for
extracted body-parts will begin with (assuming that there is no
better "recommended filename"). The default is "msg".

If PREFIX is not given, the current output prefix is returned. If
PREFIX is given, the output prefix is set to the new value, and the
previous value is returned.

Used by output_filename().

Note: subclasses of MIME::Parser::Filer which override output_path()
or output_filename() might not honor this setting; note, however,
that the built-in subclasses honor it.

output_type_ext
Instance method. Return a reference to the hash used by the
default output_filename() for mapping from content-types to
extensions when there is no default extension to use.

$emap = $filer->output_type_ext;
$emap->{'text/plain'} = '.txt';
$emap->{'text/html'}  = '.html';
$emap->{'text/*'}     = '.txt';
$emap->{'*/*'}        = '.dat';

Note: subclasses of MIME::Parser::Filer which override output_path()
or output_filename() might not consult this hash; note, however,
that the built-in subclasses consult it.

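The wildcard keys in the example above suggest a lookup that goes from the most specific key to the least specific one. Here is a sketch only, assuming that order: the exact type first, then "type/*", then "*/*". It shows how a hypothetical ext_for_type helper could consult such a hash. The helper name, the case-folding, and the empty-string fallback are illustrative choices, not documented behavior.

```perl
use strict;
use warnings;

### Hypothetical lookup over an output_type_ext-style hash.
### ASSUMED order: "type/subtype", then "type/*", then "*/*".
sub ext_for_type {
    my ($emap, $mime_type) = @_;
    my ($type, $subtype) = split m{/}, lc($mime_type), 2;
    $subtype = '' unless defined $subtype;
    for my $key ("$type/$subtype", "$type/*", "*/*") {
        return $emap->{$key} if defined $emap->{$key};
    }
    return '';    ### no extension known
}
```

With the map shown above, "text/csv" would fall through to the "text/*" entry and get ".txt", while "image/png" would get ".dat" from "*/*".
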
output_path HEAD
Instance method, subclasses only. Given a MIME head for a file to
be extracted, come up with a good output pathname for the extracted
file. This is the only method you need to worry about if you are
building a custom filer.

The default implementation does a lot of work; subclass implementers
really should try to just override its components instead of the
whole thing. It works basically as follows:

$directory = $self->output_dir($head);

$filename = $head->recommended_filename();
if (!$filename or
    $self->ignore_filename() or
    $self->evil_filename($filename)) {
    $filename = $self->output_filename($head);
}

return $self->find_unused_path($directory, $filename);

Note: There are many, many, many ways you might want to control the
naming of files, based on your application. If you don't like the
behavior of this function, you can easily define your own subclass
of MIME::Parser::Filer and override it there.

Note: Nickolay Saukh pointed out that, given the subjective nature
of what is "evil", this function really shouldn't warn about an
evil filename, but maybe just issue a debug message. I considered
that, but then I thought: if debugging were off, people wouldn't
know why (or even if) a given filename had been ignored. In mail
robots that depend on externally-provided filenames, this could
cause hard-to-diagnose problems. So, the message is still a warning.

Thanks to Laurent Amon for pointing out problems with the original
implementation, and for making some good suggestions. Thanks also
to Achim Bohnet for pointing out that there should be a hookless,
OO way of overriding the output path.

purge
Instance method, final. Purge all files/directories created by the
last parse. This method simply goes through the purgeable list in
reverse order (see "purgeable") and removes all existing
files/directories in it. You should not need to override this
method.

purgeable [FILE]
Instance method, final. Add FILE to the list of "purgeable"
files/directories (those which will be removed if you do a
"purge()"). You should not need to override this method.

If FILE is not given, the "purgeable" list is returned. This may
be used for more-sophisticated purging.

As a special case, invoking this method with a FILE that is an
arrayref will replace the purgeable list with a copy of the array's
contents, so [] may be used to clear the list.

Note that the "purgeable" list is cleared when a parser begins a
new parse; therefore, if you want to use purge() to do cleanup, you
must do so before starting a new parse!

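The following is a hypothetical, minimal stand-in for the purgeable()/purge() pair. It is written only to illustrate the documented semantics:

* append a path;
* return the list when called with no argument;
* replace the list with a copy of an arrayref;
* remove entries in reverse order.

The package name PurgeList is an assumption for illustration. The real methods live in MIME::Parser::Filer and should not need overriding.

```perl
use strict;
use warnings;

### Hypothetical stand-in, NOT MIME::Parser::Filer itself.
package PurgeList;

sub new { my $class = shift; return bless { list => [] }, $class }

### No argument: return the list.  Arrayref: replace the list with a
### copy of its contents.  Anything else: append it to the list.
sub purgeable {
    my ($self, $file) = @_;
    return @{ $self->{list} } if @_ == 1;
    if (ref $file eq 'ARRAY') { $self->{list} = [@$file] }
    else                      { push @{ $self->{list} }, $file }
    return;
}

### Remove entries in reverse order, so files created inside a directory
### are unlinked before rmdir (which only removes empty directories).
sub purge {
    my ($self) = @_;
    for my $path (reverse @{ $self->{list} }) {
        if    (-d $path) { rmdir $path }
        elsif (-e $path) { unlink $path }
    }
    return;
}

package main;
```

Reverse order matters because a directory is normally recorded before the files created inside it. Removing the files first leaves the directory empty, so rmdir can succeed.
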
MIME::Parser::FileInto

This concrete subclass of MIME::Parser::Filer supports filing into a
given directory.

init DIRECTORY
Instance method, initializer. Set the directory where all files
will go.

MIME::Parser::FileUnder

This concrete subclass of MIME::Parser::Filer supports filing under a
given directory, using one subdirectory per message, but with all
message parts in the same directory.

init BASEDIR, OPTSHASH...
Instance method, initializer. Set the base directory which will
contain the message directories. If used, each parse begins by
creating a new subdirectory of BASEDIR where the actual parts of the
message are placed. OPTSHASH can contain the following:

DirName
Explicitly set the name of the subdirectory which is created.
The default is to use the time, process id, and a sequence
number, but you might want a predictable directory.

Purge
Automatically purge the contents of the directory (including
all subdirectories) before each parse. This is really only
needed if using an explicit DirName, and is provided as a
convenience only. Currently we use the 1-arg form of
File::Path::rmtree; you should familiarize yourself with the
caveats therein.

The output_dir() method will return the path to this
message-specific directory until the next parse is begun, so you
can do this:

use MIME::Parser;
use File::Path;

my $parser = MIME::Parser->new;
$parser->output_under("/tmp");
my $ent = eval { $parser->parse_open($msg) };   ### parse ($msg: message file)
if (!$ent) {                                    ### parse failed
    rmtree($parser->filer->output_dir);
    die "parse failed: $@";
}
else {                                          ### parse succeeded
    ### ...do stuff...
}

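The default subdirectory name is described above only as being built from the time, the process id, and a sequence number. A hypothetical generator in that spirit might look like the following. The function name default_dir_name and the exact format, including the "msg-" prefix and the "-" separators, are assumptions. They do not necessarily match what FileUnder actually produces.

```perl
use strict;
use warnings;

### Hypothetical default-DirName generator: time, process id, and a
### per-process sequence number.  The format is ILLUSTRATIVE only.
{
    my $seq = 0;
    sub default_dir_name {
        $seq++;
        return sprintf("msg-%d-%d-%d", time, $$, $seq);
    }
}
```

The sequence number separates parses within one process in the same second, and the process id separates concurrent processes. An explicit DirName gives up both, which is why the Purge option is mainly useful alongside it.
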
AUTHOR
Eryq (eryq@zeegee.com), ZeeGee Software Inc (http://www.zeegee.com).

All rights reserved. This program is free software; you can
redistribute it and/or modify it under the same terms as Perl itself.

VERSION
$Revision: 1.6 $

perl v5.8.8                  2006-03-17             MIME::Parser::Filer(3)