MIME::Parser(3)       User Contributed Perl Documentation      MIME::Parser(3)

6 MIME::Parser - experimental class for parsing MIME streams
7
9 Before reading further, you should see MIME::Tools to make sure that
10 you understand where this module fits into the grand scheme of things.
11 Go on, do it now. I'll wait.
12
13 Ready? Ok...
14
15 Basic usage examples
16
    ### Create a new parser object:
    my $parser = new MIME::Parser;

    ### Tell it where to put things:
    $parser->output_under("/tmp");

    ### Parse an input filehandle:
    $entity = $parser->parse(\*STDIN);

    ### Congratulations: you now have a (possibly multipart) MIME entity!
    $entity->dump_skeleton;          # for debugging
28
29 Examples of input
30
    ### Parse from filehandles:
    $entity = $parser->parse(\*STDIN);
    $entity = $parser->parse(IO::File->new("some command|"));

    ### Parse from any object that supports getline() and read():
    $entity = $parser->parse($myHandle);

    ### Parse an in-core MIME message:
    $entity = $parser->parse_data($message);

    ### Parse a MIME message in a file:
    $entity = $parser->parse_open("/some/file.msg");

    ### Parse a MIME message out of a pipeline:
    $entity = $parser->parse_open("gunzip - < file.msg.gz |");

    ### Parse already-split input (as "deliver" would give it to you):
    $entity = $parser->parse_two("msg.head", "msg.body");
49
50 Examples of output control
51
    ### Keep parsed message bodies in core (default outputs to disk):
    $parser->output_to_core(1);

    ### Output each message body to a one-per-message directory:
    $parser->output_under("/tmp");

    ### Output each message body to the same directory:
    $parser->output_dir("/tmp");

    ### Change how nameless message-component files are named:
    $parser->output_prefix("msg");
63
64 Examples of error recovery
65
    ### Normal mechanism:
    eval { $entity = $parser->parse(\*STDIN) };
    if ($@) {
        $results     = $parser->results;
        $decapitated = $parser->last_head;  ### get last top-level head
    }

    ### Ultra-tolerant mechanism:
    $parser->ignore_errors(1);
    $entity = eval { $parser->parse(\*STDIN) };
    $error = ($@ || $parser->last_error);

    ### Clean up all files created by the parse:
    eval { $entity = $parser->parse(\*STDIN) };
    ...
    $parser->filer->purge;
82
83 Examples of parser options
84
    ### Automatically attempt to RFC-1522-decode the MIME headers?
    $parser->decode_headers(1);             ### default is false

    ### Parse contained "message/rfc822" objects as nested MIME streams?
    $parser->extract_nested_messages(0);    ### default is true

    ### Look for uuencode in "text" messages, and extract it?
    $parser->extract_uuencode(1);           ### default is false

    ### Should we forgive normally-fatal errors?
    $parser->ignore_errors(0);              ### default is true
96
97 Miscellaneous examples
98
    ### Convert a Mail::Internet object to a MIME::Entity:
    @lines = (@{$mail->header}, "\n", @{$mail->body});
    $entity = $parser->parse_data(\@lines);
102
104 You can inherit from this class to create your own subclasses that
105 parse MIME streams into MIME::Entity objects.
106
108 Construction
109
110 new ARGS...
111 Class method. Create a new parser object. Once you do this, you
112 can then set up various parameters before doing the actual parsing.
113 For example:
114
    my $parser = new MIME::Parser;
    $parser->output_dir("/tmp");
    $parser->output_prefix("msg1");
    my $entity = $parser->parse(\*STDIN);
119
120 Any arguments are passed into "init()". Don't override this in
121 your subclasses; override init() instead.
122
123 init ARGS...
Instance method. Initialize a new MIME::Parser object. This is
automatically sent to a new object; you may want to override it.
If you override this, be sure to invoke the inherited method.
127
128 init_parse
129 Instance method. Invoked automatically whenever one of the top-
130 level parse() methods is called, to reset the parser to a "ready"
131 state.
132
133 Altering how messages are parsed
134
135 decode_headers [YESNO]
136 Instance method. Controls whether the parser will attempt to
137 decode all the MIME headers (as per RFC-1522) the moment it sees
138 them. This is not advisable for two very important reasons:
139
140 * It screws up the extraction of information from MIME fields.
141 If you fully decode the headers into bytes, you can inadver‐
142 tently transform a parseable MIME header like this:
143
144 Content-type: text/plain; filename="=?ISO-8859-1?Q?Hi=22Ho?="
145
146 into unparseable gobbledygook; in this case:
147
148 Content-type: text/plain; filename="Hi"Ho"
149
150 * It is information-lossy. An encoded string which contains both
151 Latin-1 and Cyrillic characters will be turned into a binary
152 mishmosh which simply can't be rendered.
153
154 History. This method was once the only out-of-the-box way to deal
155 with attachments whose filenames had non-ASCII characters. How‐
156 ever, since MIME-tools 5.4xx this is no longer necessary.
157
158 Parameters. If YESNO is true, decoding is done. However, you will
159 get a warning unless you use one of the special "true" values:
160
161 "I_NEED_TO_FIX_THIS"
162 Just shut up and do it. Not recommended.
163 Provided only for those who need to keep old scripts functioning.
164
165 "I_KNOW_WHAT_I_AM_DOING"
166 Just shut up and do it. Not recommended.
167 Provided for those who REALLY know what they are doing.
168
169 If YESNO is false (the default), no attempt at decoding will be
170 done. With no argument, just returns the current setting. Remem‐
171 ber: you can always decode the headers after the parsing has com‐
172 pleted (see MIME::Head::decode()), or decode the words on demand
173 (see MIME::Words).
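
The decode-on-demand approach can be sketched as follows. This is a minimal illustration, not a recommended recipe: the sample message and the variable names are made up for the example, and it assumes MIME::Words (part of MIME-tools) is installed. The header is left encoded in the parsed entity, and only a display copy is decoded.

```perl
use strict;
use warnings;
use MIME::Parser;
use MIME::Words qw(decode_mimewords);

# A tiny in-core message (invented for this sketch) whose Subject
# carries an encoded word.
my $message = join "\n",
    'Subject: =?ISO-8859-1?Q?Hi=22Ho?=',
    'Content-Type: text/plain',
    '',
    'Hello',
    '';

my $parser = MIME::Parser->new;
$parser->output_to_core(1);       # keep the body in memory for this demo
$parser->decode_headers(0);       # the default: leave headers encoded

my $entity = $parser->parse_data($message);

# The stored header is still in its encoded, parseable form...
my $raw = $entity->head->get('Subject');
chomp $raw;

# ...and a copy is decoded only at the point where it is displayed.
my $display = decode_mimewords($raw);

print "raw:     $raw\n";
print "display: $display\n";
```

Because the entity itself is never modified, later code that inspects MIME fields still sees the original header text.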
174
175 extract_nested_messages OPTION
Instance method. Some MIME messages will contain a part of type
"message/rfc822", "message/partial" or "message/external-body":
literally, the text of an embedded mail/news/whatever message.
This option controls whether (and how) we parse that embedded
message.
181
182 If the OPTION is false, we treat such a message just as if it were
183 a "text/plain" document, without attempting to decode its contents.
184
If the OPTION is true (the default), the body of the
"message/rfc822" or "message/partial" part is parsed by this parser,
creating an entity object. What happens then is determined by the
actual OPTION:
189
190 NEST or 1
191 The default setting. The contained message becomes the sole
192 "part" of the "message/rfc822" entity (as if the containing
193 message were a special kind of "multipart" message). You can
194 recover the sub-entity by invoking the parts() method on the
195 "message/rfc822" entity.
196
197 REPLACE
198 The contained message replaces the "message/rfc822" entity, as
199 though the "message/rfc822" "container" never existed.
200
201 Warning: notice that, with this option, all the header informa‐
202 tion in the "message/rfc822" header is lost. This might seri‐
203 ously bother you if you're dealing with a top-level message,
204 and you've just lost the sender's address and the subject line.
205 ":-/".
206
207 Thanks to Andreas Koenig for suggesting this method.
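
The difference between NEST and REPLACE can be seen with a small in-core message. This is only a sketch: the sample message and the helper parse_with() are invented for the illustration, and bodies are kept in core to avoid touching the disk.

```perl
use strict;
use warnings;
use MIME::Parser;

# An outer message/rfc822 wrapper around a small inner text message
# (sample data invented for this sketch).
my $message = join "\n",
    'Subject: Outer',
    'Content-Type: message/rfc822',
    '',
    'Subject: Inner',
    'Content-Type: text/plain',
    '',
    'Hello',
    '';

# Hypothetical helper: parse the sample under a given OPTION.
sub parse_with {
    my ($option) = @_;
    my $parser = MIME::Parser->new;
    $parser->output_to_core(1);
    $parser->extract_nested_messages($option);
    return $parser->parse_data($message);
}

# NEST: the wrapper survives, and the inner message is its only part.
my $nested  = parse_with('NEST');
my ($inner) = $nested->parts;
chomp(my $outer_subject = $nested->head->get('Subject'));
chomp(my $inner_subject = $inner->head->get('Subject'));

# REPLACE: the wrapper is gone, so its Subject is no longer reachable.
my $replaced = parse_with('REPLACE');
chomp(my $top_subject = $replaced->head->get('Subject'));

print "NEST:    top=$outer_subject, part=$inner_subject\n";
print "REPLACE: top=$top_subject\n";
```

With REPLACE the outer "Subject: Outer" header is not available from the returned entity, which is the loss the warning above describes.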
208
209 extract_uuencode [YESNO]
210 Instance method. If set true, then whenever we are confronted with
211 a message whose effective content-type is "text/plain" and whose
212 encoding is 7bit/8bit/binary, we scan the encoded body to see if it
213 contains uuencoded data (generally given away by a "begin XXX"
214 line).
215
216 If it does, we explode the uuencoded message into a multipart,
217 where the text before the first "begin XXX" becomes the first part,
218 and all "begin...end" sections following become the subsequent
219 parts. The filename (if given) is accessible through the normal
220 means.
221
222 ignore_errors [YESNO]
223 Instance method. Controls whether the parser will attempt to
224 ignore normally-fatal errors, treating them as warnings and contin‐
225 uing with the parse.
226
227 If YESNO is true (the default), many syntax errors are tolerated.
228 If YESNO is false, fatal errors throw exceptions. With no argu‐
229 ment, just returns the current setting.
230
231 decode_bodies [YESNO]
232 Instance method. Controls whether the parser should decode entity
233 bodies or not. If this is set to a false value (default is true),
234 all entity bodies will be kept as-is in the original content-trans‐
235 fer encoding.
236
To prevent double encoding on the output side, MIME::Body->is_encoded
is set, which tells MIME::Body not to encode the data again if encoded
data is requested. This is particularly useful when the content must
not be modified, e.g. if you want to calculate OpenPGP signatures
from it.

WARNING: the semantics change significantly if you parse MIME
messages with this option set, because MIME::Entity and MIME::Body
*always* see encoded data now, whereas the default behaviour is to
work with *decoded* data (and to encode it only if you request it).
If you want decoded data, you must decode it yourself.

So use this option only if you know exactly what you're doing and
are sure that you really need it.
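
As a rough illustration of the change in semantics, the sketch below parses the same base64 message with decode_bodies on and off. The sample message and the helper body_with_decode_bodies() are assumptions made up for the example.

```perl
use strict;
use warnings;
use MIME::Parser;

# "Hello" in base64, declared as such in the header (sample data
# invented for this sketch).
my $message = join "\n",
    'Content-Type: text/plain',
    'Content-Transfer-Encoding: base64',
    '',
    'SGVsbG8=',
    '';

# Hypothetical helper: return the stored body under a given setting.
sub body_with_decode_bodies {
    my ($yesno) = @_;
    my $parser = MIME::Parser->new;
    $parser->output_to_core(1);
    $parser->decode_bodies($yesno);
    my $entity = $parser->parse_data($message);
    return $entity->bodyhandle->as_string;
}

my $decoded = body_with_decode_bodies(1);   # default: body is decoded
my $encoded = body_with_decode_bodies(0);   # body kept as transmitted

print "decode_bodies(1): $decoded\n";
print "decode_bodies(0): $encoded";
```

In the second case the caller is responsible for any base64 decoding, since the body handle now holds the transfer-encoded text.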
252
253 Parsing an input source
254
255 parse_data DATA
256 Instance method. Parse a MIME message that's already in core. You
257 may supply the DATA in any of a number of ways...
258
259 * A scalar which holds the message.
260
261 * A ref to a scalar which holds the message. This is an effi‐
262 ciency hack.
263
264 * A ref to an array of scalars. They are treated as a stream
265 which (conceptually) consists of simply concatenating the
266 scalars.
267
268 Returns the parsed MIME::Entity on success.
269
270 parse INSTREAM
271 Instance method. Takes a MIME-stream and splits it into its compo‐
272 nent entities.
273
274 The INSTREAM can be given as a readable FileHandle, an IO::File, a
275 globref filehandle (like "\*STDIN"), or as any blessed object con‐
276 forming to the IO:: interface (which minimally implements getline()
277 and read()).
278
279 Returns the parsed MIME::Entity on success. Throws exception on
280 failure. If the message contained too many parts (as set by
281 max_parts), returns undef.
282
283 parse_open EXPR
284 Instance method. Convenience front-end onto "parse()". Simply
285 give this method any expression that may be sent as the second
286 argument to open() to open a filehandle for reading.
287
288 Returns the parsed MIME::Entity on success. Throws exception on
289 failure.
290
291 parse_two HEADFILE, BODYFILE
292 Instance method. Convenience front-end onto "parse_open()",
293 intended for programs running under mail-handlers like deliver,
294 which splits the incoming mail message into a header file and a
295 body file. Simply give this method the paths to the respective
296 files.
297
298 Warning: it is assumed that, once the files are cat'ed together,
299 there will be a blank line separating the head part and the body
300 part.
301
302 Warning: new implementation slurps files into line array for porta‐
303 bility, instead of using 'cat'. May be an issue if your messages
304 are large.
305
306 Returns the parsed MIME::Entity on success. Throws exception on
307 failure.
308
309 Specifying output destination
310
Warning: in 5.212 and before, this was done by methods of MIME::Parser.
However, since many users have requested fine-tuned control over how
this is done, the logic has been split off from the parser into its own
class, MIME::Parser::Filer. Every MIME::Parser maintains an instance of
a MIME::Parser::Filer subclass to manage disk output (see
MIME::Parser::Filer for details).
317
318 The benefit to this is that the MIME::Parser code won't be confounded
319 with a lot of garbage related to disk output. The drawback is that the
320 way you override the default behavior will change.
321
322 For now, all the normal public-interface methods are still provided,
323 but many are only stubs which create or delegate to the underlying
324 MIME::Parser::Filer object.
325
326 filer [FILER]
327 Instance method. Get/set the FILER object used to manage the out‐
328 put of files to disk. This will be some subclass of
329 MIME::Parser::Filer.
330
331 output_dir DIRECTORY
332 Instance method. Causes messages to be filed directly into the
333 given DIRECTORY. It does this by setting the underlying filer() to
334 a new instance of MIME::Parser::FileInto, and passing the arguments
335 into that class' new() method.
336
Note: Since this method replaces the underlying filer, you must
invoke it before changing any attributes of the filer, like the
output prefix; otherwise those changes will be lost.
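
The ordering requirement can be sketched like this. The scratch directory and the prefix "part" are arbitrary choices for the illustration; the same reasoning applies to output_under().

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use MIME::Parser;

my $dir = tempdir(CLEANUP => 1);   # scratch directory for the demo

# Wrong order: the prefix is set on the current filer, which
# output_dir() then replaces with a fresh one.
my $lost = MIME::Parser->new;
$lost->filer->output_prefix('part');
$lost->output_dir($dir);

# Right order: install the filer first, then adjust its attributes.
my $kept = MIME::Parser->new;
$kept->output_dir($dir);
$kept->filer->output_prefix('part');

printf "set before output_dir(): %s\n", $lost->filer->output_prefix;
printf "set after  output_dir(): %s\n", $kept->filer->output_prefix;
```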
340
341 output_under BASEDIR, OPTS...
342 Instance method. Causes messages to be filed directly into subdi‐
343 rectories of the given BASEDIR, one subdirectory per message. It
344 does this by setting the underlying filer() to a new instance of
345 MIME::Parser::FileUnder, and passing the arguments into that class'
346 new() method.
347
Note: Since this method replaces the underlying filer, you must
invoke it before changing any attributes of the filer, like the
output prefix; otherwise those changes will be lost.
351
352 output_path HEAD
353 Instance method, DEPRECATED. Given a MIME head for a file to be
354 extracted, come up with a good output pathname for the extracted
355 file. Identical to the preferred form:
356
357 $parser->filer->output_path(...args...);
358
359 We just delegate this to the underlying filer() object.
360
361 output_prefix [PREFIX]
362 Instance method, DEPRECATED. Get/set the short string that all
363 filenames for extracted body-parts will begin with (assuming that
364 there is no better "recommended filename"). Identical to the pre‐
365 ferred form:
366
367 $parser->filer->output_prefix(...args...);
368
369 We just delegate this to the underlying filer() object.
370
371 evil_filename NAME
372 Instance method, DEPRECATED. Identical to the preferred form:
373
374 $parser->filer->evil_filename(...args...);
375
376 We just delegate this to the underlying filer() object.
377
378 max_parts NUM
379 Instance method. Limits the number of MIME parts we will parse.
380
381 Normally, instances of this class parse a message to the bitter
382 end. Messages with many MIME parts can cause excessive memory con‐
383 sumption. If you invoke this method, parsing will abort with a
384 die() if a message contains more than NUM parts.
385
386 If NUM is set to -1 (the default), then no maximum limit is
387 enforced.
388
With no argument, returns the current setting as an integer.
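
A minimal sketch of guarding a parse with a part limit follows. The five-part sample message is invented for the example. The call is wrapped in eval and the result is tested for undef, so that the caller copes whether an over-limit message raises an exception or yields an undefined entity.

```perl
use strict;
use warnings;
use MIME::Parser;

# A multipart message with five small text parts (sample data
# invented for this sketch).
my @lines = ('Content-Type: multipart/mixed; boundary="XYZ"', '');
for my $n (1 .. 5) {
    push @lines, '--XYZ', 'Content-Type: text/plain', '', "part $n";
}
push @lines, '--XYZ--', '';
my $message = join "\n", @lines;

my $parser = MIME::Parser->new;
$parser->output_to_core(1);
$parser->max_parts(2);            # refuse anything with more than 2 parts

# Covers both a die() and an undef return for an over-limit message.
my $entity = eval { $parser->parse_data($message) };

if (defined $entity) {
    print "parsed ", scalar($entity->parts), " parts\n";
}
else {
    print "message rejected: too many parts\n";
}
```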
390
391 output_to_core YESNO
392 Instance method. Normally, instances of this class output all
393 their decoded body data to disk files (via MIME::Body::File). How‐
394 ever, you can change this behaviour by invoking this method before
395 parsing:
396
397 If YESNO is false (the default), then all body data goes to disk
398 files.
399
If YESNO is true, then all body data goes to in-core data
structures. This is a little risky (what if someone emails you an
MPEG or a tar file, hmmm?) but people seem to want this bit of
noose-shaped rope, so I'm providing it. Note that setting this
attribute true does not mean that parser-internal temporary files
are avoided! Use tmp_to_core() for that.
406
407 With no argument, returns the current setting as a boolean.
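
For a small message, in-core output might be used as in the sketch below. The sample message is invented for the example; the option is set before parsing, as required above.

```perl
use strict;
use warnings;
use MIME::Parser;

# Sample message invented for this sketch.
my $message = join "\n",
    'Content-Type: text/plain',
    '',
    'Hello',
    '';

my $parser = MIME::Parser->new;
$parser->output_to_core(1);       # must be set before parsing

my $entity = $parser->parse_data($message);
my $body   = $entity->bodyhandle; # an in-core body object in this mode

print ref($body), "\n";
print $body->as_string;
```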
408
409 tmp_recycling [YESNO]
410 Instance method. Normally, tmpfiles are created when needed during
411 parsing, and destroyed automatically when they go out of scope.
412 But for efficiency, you might prefer for your parser to attempt to
413 rewind and reuse the same file until the parser itself is
414 destroyed.
415
416 If YESNO is true (the default), we allow recycling; tmpfiles per‐
417 sist until the parser itself is destroyed. If YESNO is false, we
418 do not allow recycling; tmpfiles persist only as long as they are
419 needed during the parse. With no argument, just returns the cur‐
420 rent setting.
421
422 tmp_to_core [YESNO]
423 Instance method. Should new_tmpfile() create real temp files, or
424 use fake in-core ones? Normally we allow the creation of temporary
425 disk files, since this allows us to handle huge attachments even
426 when core is limited.
427
428 If YESNO is true, we implement new_tmpfile() via in-core handles.
429 If YESNO is false (the default), we use real tmpfiles. With no
430 argument, just returns the current setting.
431
432 use_inner_files [YESNO]
433 Instance method. If you are parsing from a handle which supports
434 seek() and tell(), then we can avoid tmpfiles completely by using
435 IO::InnerFile, if so desired: basically, we simulate a temporary
436 file via pointers to virtual start- and end-positions in the input
437 stream.
438
439 If YESNO is false (the default), then we will not use IO::Inner‐
440 File. If YESNO is true, we use IO::InnerFile if we can. With no
441 argument, just returns the current setting.
442
443 Note: inner files are slower than real tmpfiles, but possibly
444 faster than in-core tmpfiles... so your choice for this option will
445 probably depend on your choice for tmp_to_core() and the kind of
446 input streams you are parsing.
447
448 Specifying classes to be instantiated
449
450 interface ROLE,[VALUE]
451 Instance method. During parsing, the parser normally creates
452 instances of certain classes, like MIME::Entity. However, you may
453 want to create a parser subclass that uses your own experimental
454 head, entity, etc. classes (for example, your "head" class may pro‐
455 vide some additional MIME-field-oriented methods).
456
457 If so, then this is the method that your subclass should invoke
458 during init. Use it like this:
459
    package MyParser;
    @ISA = qw(MIME::Parser);
    ...
    sub init {
        my $self = shift;
        $self->SUPER::init(@_);        ### do my parent's init
        $self->interface(ENTITY_CLASS => 'MIME::MyEntity');
        $self->interface(HEAD_CLASS   => 'MIME::MyHead');
        $self;                         ### return
    }
470
471 With no VALUE, returns the VALUE currently associated with that
472 ROLE.
473
474 new_body_for HEAD
475 Instance method. Based on the HEAD of a part we are parsing,
476 return a new body object (any desirable subclass of MIME::Body) for
477 receiving that part's data.
478
479 If you set the "output_to_core" option to false before parsing (the
480 default), then we call "output_path()" and create a new
481 MIME::Body::File on that filename.
482
483 If you set the "output_to_core" option to true before parsing, then
484 you get a MIME::Body::InCore instead.
485
486 If you want the parser to do something else entirely, you can over‐
487 ride this method in a subclass.
488
489 new_tmpfile [RECYCLE]
490 Instance method. Return an IO handle to be used to hold temporary
491 data during a parse. The default uses the standard
492 IO::File->new_tmpfile() method unless tmp_to_core() dictates other‐
493 wise, but you can override this. You shouldn't need to.
494
495 If you do override this, make certain that the object you return is
496 set for binmode(), and is able to handle the following methods:
497
498 read(BUF, NBYTES)
499 getline()
500 getlines()
501 print(@ARGS)
502 flush()
503 seek(0, 0)
504
505 Fatal exception if the stream could not be established.
506
507 If RECYCLE is given, it is an object returned by a previous invoca‐
508 tion of this method; to recycle it, this method must effectively
509 rewind and truncate it, and return the same object. If you don't
510 want to support recycling, just ignore it and always return a new
511 object.
512
513 Parse results and error recovery
514
515 last_error
516 Instance method. Return the error (if any) that we ignored in the
517 last parse.
518
519 last_head
520 Instance method. Return the top-level MIME header of the last
521 stream we attempted to parse. This is useful for replying to peo‐
522 ple who sent us bad MIME messages.
523
    ### Parse an input stream:
    eval { $entity = $parser->parse(\*STDIN) };
    if (!$entity) {    ### parse failed!
        my $decapitated = $parser->last_head;
        ...
    }
530
531 results
532 Instance method. Return an object containing lots of info from the
533 last entity parsed. This will be an instance of class
534 MIME::Parser::Results.
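
A short sketch of inspecting the results object after a parse is given below. It assumes the errors(), warnings() and top_head() accessors documented in MIME::Parser::Results; the sample message is invented for the example.

```perl
use strict;
use warnings;
use MIME::Parser;

# Sample message invented for this sketch.
my $message = join "\n",
    'Subject: Greetings',
    'Content-Type: text/plain',
    '',
    'Hello',
    '';

my $parser = MIME::Parser->new;
$parser->output_to_core(1);

my $entity  = eval { $parser->parse_data($message) };
my $results = $parser->results;

# Messages logged during the parse, split by severity.
my @errors   = $results->errors;
my @warnings = $results->warnings;

# The topmost header, available even when the parse itself failed.
my $top_head = $results->top_head;
chomp(my $subject = $top_head->get('Subject'));

printf "errors: %d, warnings: %d, subject: %s\n",
    scalar(@errors), scalar(@warnings), $subject;
```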
535
537 Maximizing speed
538
539 Optimum input mechanisms:
540
    parse()         YES (if you give it a globref or a
                        subclass of IO::File)
    parse_open()    YES
    parse_data()    NO  (see below)
    parse_two()     NO  (see below)
546
547 Optimum settings:
548
    decode_headers()            *** (no real difference; 0 is slightly faster)
    extract_nested_messages()   0   (may be slightly faster, but in
                                     general you want it set to 1)
    output_to_core()            0   (will be MUCH faster)
    tmp_recycling()             1?  (probably, but should be investigated)
    tmp_to_core()               0   (will be MUCH faster)
    use_inner_files()           0   (if tmp_to_core() is 0;
                                     use 1 otherwise)
557
558 File I/O is much faster than in-core I/O. Although it seems like
559 slurping a message into core and processing it in-core should be
560 faster... it isn't. Reason: Perl's filehandle-based I/O translates
561 directly into native operating-system calls, whereas the in-core I/O is
562 implemented in Perl.
563
564 Inner files are slower than real tmpfiles, but faster than in-core
565 ones. If speed is your concern, that's why you should set
566 use_inner_files(true) if you set tmp_to_core(true): so that we can
567 bypass the slow in-core tmpfiles if the input stream permits.
568
Native I/O is much faster than object-oriented I/O. It's much faster
to use <$foo> than $foo->getline. For backwards compatibility, this
module must continue to use object-oriented I/O in most places, but if
you use parse() with a "real" filehandle (string, globref, or subclass
of IO::File) then MIME::Parser is able to perform some crucial
optimizations.
575
The parse_two() call is very inefficient. Currently this is just a
front-end onto parse_data(). If your OS supports it, you're far better
off doing something like:

    $parser->parse_open("/bin/cat msg.head msg.body |");
581
582 Minimizing memory
583
584 Optimum input mechanisms:
585
    parse()         YES
    parse_open()    YES
    parse_data()    NO  (in-core I/O will burn core)
    parse_two()     NO  (in-core I/O will burn core)
590
591 Optimum settings:
592
    decode_headers()            *** (no real difference)
    extract_nested_messages()   *** (no real difference)
    output_to_core()            0   (will use MUCH less memory)
    tmp_recycling()             0?  (promotes faster GC if
                                     tmp_to_core is 1)
    tmp_to_core()               0   (will use MUCH less memory)
    use_inner_files()           *** (no real difference, but set it to 1
                                     if you *must* have tmp_to_core set to 1,
                                     so that you avoid in-core tmpfiles)
602
603 Maximizing tolerance of bad MIME
604
605 Optimum input mechanisms:
606
    parse()         *** (doesn't matter)
    parse_open()    *** (doesn't matter)
    parse_data()    *** (doesn't matter)
    parse_two()     *** (doesn't matter)
611
612 Optimum settings:
613
    decode_headers()            0   (sidesteps problem of bad hdr encodings)
    extract_nested_messages()   0   (sidesteps problems of bad nested messages,
                                     but often you want it set to 1 anyway)
    output_to_core()            *** (doesn't matter)
    tmp_recycling()             *** (doesn't matter)
    tmp_to_core()               *** (doesn't matter)
    use_inner_files()           *** (doesn't matter)
621
622 Avoiding disk-based temporary files
623
624 Optimum input mechanisms:
625
    parse()         YES (if you give it a seekable handle)
    parse_open()    YES (becomes a seekable handle)
    parse_data()    NO  (unless you set tmp_to_core(1))
    parse_two()     NO  (unless you set tmp_to_core(1))
630
631 Optimum settings:
632
    decode_headers()            *** (doesn't matter)
    extract_nested_messages()   *** (doesn't matter)
    output_to_core()            *** (doesn't matter)
    tmp_recycling()             1   (restricts created files to 1 per parser)
    tmp_to_core()               1
    use_inner_files()           1
639
640 If we can use them, inner files avoid most tmpfiles. If you parse from
641 a seekable-and-tellable filehandle, then the internal
642 process_to_bound() doesn't need to extract each part into a temporary
643 buffer; it can use IO::InnerFile (warning: this will slow down the
644 parsing of messages with large attachments).
645
646 You can veto tmpfiles entirely. If you might not be parsing from a
647 seekable-and-tellable filehandle, you can set tmp_to_core() true: this
648 will always use in-core I/O for the buffering (warning: this will slow
649 down the parsing of messages with large attachments).
650
651 Final resort. You can always override new_tmpfile() in a subclass.
652
654 Multipart messages are always read line-by-line
655 Multipart document parts are read line-by-line, so that the encap‐
656 sulation boundaries may easily be detected. However, bad MIME com‐
657 position agents (for example, naive CGI scripts) might return mul‐
658 tipart documents where the parts are, say, unencoded bitmap
659 files... and, consequently, where such "lines" might be
660 veeeeeeeeery long indeed.
661
662 A better solution for this case would be to set up some form of
663 state machine for input processing. This will be left for future
664 versions.
665
666 Multipart parts read into temp files before decoding
667 In my original implementation, the MIME::Decoder classes had to be
668 aware of encapsulation boundaries in multipart MIME documents.
669 While this decode-while-parsing approach obviated the need for tem‐
670 porary files, it resulted in inflexible and complex decoder imple‐
671 mentations.
672
673 The revised implementation uses a temporary file (a la "tmpfile()")
674 during parsing to hold the encoded portion of the current MIME doc‐
675 ument or part. This file is deleted automatically after the cur‐
676 rent part is decoded and the data is written to the "body stream"
677 object; you'll never see it, and should never need to worry about
678 it.
679
Some folks have asked for the ability to bypass this temp-file
mechanism, I suppose because they assume it would slow down their
application. I considered accommodating this wish, but the
temp-file approach solves a lot of thorny problems in parsing, and
it also protects against hidden bugs in user applications (what if
you've directed the encoded part into a scalar, and someone
unexpectedly sends you a 6 MB tar file?). Finally, I'm just not
convinced that the temp-file use adds significant overhead.
688
689 Fuzzing of CRLF and newline on input
690 RFC-1521 dictates that MIME streams have lines terminated by CRLF
691 ("\r\n"). However, it is extremely likely that folks will want to
692 parse MIME streams where each line ends in the local newline char‐
693 acter "\n" instead.
694
695 An attempt has been made to allow the parser to handle both CRLF
696 and newline-terminated input.
697
698 Fuzzing of CRLF and newline on output
699 The "7bit" and "8bit" decoders will decode both a "\n" and a "\r\n"
700 end-of-line sequence into a "\n".
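
The end-of-line mapping described for these two decoders amounts to something like the following sketch. The helper normalize_eol() is hypothetical and is not the module's own code; it only illustrates the stated behaviour.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of MIME-tools: map a trailing CRLF to
# "\n" and leave a bare "\n" (or any interior "\r") untouched.
sub normalize_eol {
    my ($line) = @_;
    $line =~ s/\r\n\z/\n/;
    return $line;
}

my @wire_lines  = ("Hello\r\n", "World\n");
my @local_lines = map { normalize_eol($_) } @wire_lines;

print @local_lines;
```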
701
702 The "binary" decoder (default if no encoding specified) still out‐
703 puts stuff verbatim... so a MIME message with CRLFs and no explicit
704 encoding will be output as a text file that, on many systems, will
705 have an annoying ^M at the end of each line... but this is as it
706 should be.
707
708 Inability to handle multipart boundaries that contain newlines
709 First, let's get something straight: this is an evil, EVIL prac‐
710 tice, and is incompatible with RFC-1521... hence, it's not valid
711 MIME.
712
If your mailer creates multipart boundary strings that contain
newlines when they appear in the message body, give it two weeks'
notice and find another one. If your mail robot receives MIME mail
like this, regard it as syntactically incorrect MIME, which it is.
717
718 Why do I say that? Well, in RFC-1521, the syntax of a boundary is
719 given quite clearly:
720
    boundary := 0*69<bchars> bcharsnospace

    bchars := bcharsnospace / " "

    bcharsnospace :=    DIGIT / ALPHA / "'" / "(" / ")" / "+" / "_"
                      / "," / "-" / "." / "/" / ":" / "=" / "?"
727
728 All of which means that a valid boundary string cannot have new‐
729 lines in it, and any newlines in such a string in the message
730 header are expected to be solely the result of folding the string
731 (i.e., inserting to-be-removed newlines for readability and line-
732 shortening only).
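
The grammar translates fairly directly into a regular expression. The sketch below is one possible rendering; the helper is_valid_boundary() is hypothetical and is not something MIME::Parser provides.

```perl
use strict;
use warnings;

# Character classes taken from the RFC-1521 grammar quoted above.
my $bcharsnospace = qr{[0-9A-Za-z'()+_,\-./:=?]};
my $bchars        = qr{(?:$bcharsnospace|[ ])};

# Hypothetical helper: true if $boundary matches
#   boundary := 0*69<bchars> bcharsnospace
sub is_valid_boundary {
    my ($boundary) = @_;
    return $boundary =~ /\A(?:$bchars){0,69}$bcharsnospace\z/ ? 1 : 0;
}

for my $candidate ('----ABC-123----', "----ABC-\n123----") {
    (my $shown = $candidate) =~ s/\n/\\n/g;   # make the newline visible
    printf "%-22s %s\n", $shown,
        is_valid_boundary($candidate) ? 'valid' : 'INVALID';
}
```

The pattern is anchored with \A and \z rather than ^ and $, so that a trailing newline cannot slip through.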
733
734 Yet, there is at least one brain-damaged user agent out there that
735 composes mail like this:
736
    MIME-Version: 1.0
    Content-type: multipart/mixed; boundary="----ABC-
     123----"
    Subject: Hi... I'm a dork!

    This is a multipart MIME message (yeah, right...)

    ----ABC-
     123----

    Hi there!
748
749 We have got to discourage practices like this (and the recent file
750 upload idiocy where binary files that are part of a multipart MIME
751 message aren't base64-encoded) if we want MIME to stay relatively
752 simple, and MIME parsers to be relatively robust.
753
754 Thanks to Andreas Koenig for bringing a baaaaaaaaad user agent to
755 my attention.
756
758 Eryq (eryq@zeegee.com), ZeeGee Software Inc (http://www.zeegee.com).
759 David F. Skoll (dfs@roaringpenguin.com) http://www.roaringpenguin.com
760
761 All rights reserved. This program is free software; you can redis‐
762 tribute it and/or modify it under the same terms as Perl itself.
763
765 $Revision: 1.20 $ $Date: 2006/03/17 21:03:23 $
766
767
768
769perl v5.8.8 2006-03-17 MIME::Parser(3)