MIME::WordDecoder(3pm)

MIME::WordDecoder(3)  User Contributed Perl Documentation MIME::WordDecoder(3)

NAME

       MIME::WordDecoder - decode RFC-1522 encoded words to a local
       representation

SYNOPSIS

       See MIME::Words for the basics of encoded words.  See "DESCRIPTION" for
       how this class works.

           use MIME::WordDecoder;

           ### Get the default word-decoder (used by unmime()):
           $wd = default MIME::WordDecoder;

           ### Get a word-decoder which maps to ISO-8859-1 (Latin1):
           $wd = supported MIME::WordDecoder "ISO-8859-1";

           ### Decode a MIME string (e.g., into Latin1) via the decoder in $wd:
           $str = $wd->decode('To: =?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?= <keld>');

           ### Decode a string using the default decoder, non-OO style:
           $str = unmime('To: =?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?= <keld>');

DESCRIPTION

       A MIME::WordDecoder consists, fundamentally, of a hash which maps a
       character set name (US-ASCII, ISO-8859-1, etc.) to a subroutine which
       knows how to take bytes in that character set and turn them into the
       target string representation.  Ideally, this target representation
       would be Unicode, but we don't want to overspecify the translation that
       takes place: if you want to convert MIME strings directly to Big5,
       that's your own decision.

       The subroutine will be invoked with DATA (the data in the given
       character set) and CHARSET (the upcased character set name), followed
       by the decoder object itself; see handler() under "PUBLIC INTERFACE".

       For example:

           ### Keep 7-bit characters as-is, convert 8-bit characters to '#':
           sub keep7bit {
               local $_ = shift;
               tr/\x00-\x7F/#/c;
               $_;
           }

       Here's a decoder which uses that:

           ### Construct a decoder (new() takes an array reference of pairs):
           $wd = MIME::WordDecoder->new(['US-ASCII'   => "KEEP",   ### sub { $_[0] }
                                         'ISO-8859-1' => \&keep7bit,
                                         'ISO-8859-2' => \&keep7bit,
                                         'Big5'       => "WARN",
                                         '*'          => "DIE"]);

           ### Convert some MIME text to a pure ASCII string...
           $ascii = $wd->decode('To: =?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?= <keld>');

           ### ...which will now hold: "To: Keld J#rn Simonsen <keld>"

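       To make the flow concrete, the following stand-alone sketch mimics
       what happens to a single encoded word before the handler sees it.  It
       does not use MIME::WordDecoder: decode_q_words() and q_word() are
       hypothetical helpers that understand only Q-encoded words, and they
       ignore details such as B encoding and whitespace between adjacent
       encoded words.

```perl
use strict;
use warnings;

### The handler from the example above: 8-bit characters become '#'.
sub keep7bit {
    local $_ = shift;
    tr/\x00-\x7F/#/c;
    $_;
}

### Hypothetical helper: undo Q encoding for one word, then hand the
### resulting bytes and the upcased charset name to the handler.
sub q_word {
    my ($charset, $data, $handler) = @_;
    $data =~ tr/_/ /;                               ### "_" stands for a space
    $data =~ s/=([0-9A-Fa-f]{2})/chr(hex($1))/ge;   ### "=XX" stands for a byte
    return $handler->($data, uc $charset);
}

### Hypothetical helper: replace each "=?charset?Q?data?=" word in a
### string with the handler's output; other text passes through.
sub decode_q_words {
    my ($str, $handler) = @_;
    $str =~ s/=\?([^?]+)\?[Qq]\?([^?]*)\?=/q_word($1, $2, $handler)/ge;
    return $str;
}

print decode_q_words(
    'To: =?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?= <keld>', \&keep7bit), "\n";
### prints: To: Keld J#rn Simonsen <keld>
```

       The handler only ever sees the decoded bytes and the character set
       name; the "=?...?=" wrapper is already gone by the time it is called.
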

PUBLIC INTERFACE

       default [DECODER]
           Class method.  Get/set the default DECODER object.

       supported CHARSET, [DECODER]
           Class method.  If just CHARSET is given, returns a decoder object
           which maps data into that character set (the character set is
           forced to all-uppercase).

               $wd = supported MIME::WordDecoder "ISO-8859-1";

           If DECODER is given, installs such an object:

               MIME::WordDecoder->supported("ISO-8859-1" =>
                                            (new MIME::WordDecoder::ISO_8859 "1"));

           You should not override this method.

       new [\@HANDLERS]
           Class method, constructor.  If \@HANDLERS is given, then @HANDLERS
           is passed to handler() to initialize the internal map.

       handler CHARSET=>\&SUBREF, ...
           Instance method.  Set the handler SUBREF for a given CHARSET, for
           as many pairs as you care to supply.

           When performing the translation of a MIME-encoded string, a given
           SUBREF will be invoked when translating a block of text in
           character set CHARSET.  The subroutine will be invoked with the
           following arguments:

               DATA    - the data in the given character set.
               CHARSET - the upcased character set name, which may prove useful
                         if you are using the same SUBREF for multiple CHARSETs.
               DECODER - the decoder itself, if it contains configuration information
                         that your handler function needs.

           For example:

               $wd = new MIME::WordDecoder;
               $wd->handler('US-ASCII'   => "KEEP");
               $wd->handler('ISO-8859-1' => \&handle_latin1,
                            'ISO-8859-2' => \&handle_latin1,
                            '*'          => "DIE");

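           A handler shared between character sets can branch on its CHARSET
           argument.  The stand-alone sketch below calls such a handler
           directly, the way a decoder would for one block of text;
           handle_latin() and its placeholder characters are made up for
           illustration.

```perl
use strict;
use warnings;

### Hypothetical handler shared by several charsets.  It keeps 7-bit
### characters and replaces each 8-bit character with a placeholder
### chosen from CHARSET, the second argument.
sub handle_latin {
    my ($data, $charset, $decoder) = @_;    ### $decoder is unused here
    my $mark = ($charset eq 'ISO-8859-1') ? '?' : '#';
    (my $out = $data) =~ s/[^\x00-\x7F]/$mark/g;
    return $out;
}

### Call it directly with (DATA, CHARSET):
print handle_latin("Keld J\xF8rn", 'ISO-8859-1'), "\n";   ### Keld J?rn
print handle_latin("Dvo\xF8\xE1k", 'ISO-8859-2'), "\n";   ### Dvo##k
```
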
           Notice that, much as with %SIG, the SUBREF can also be taken from a
           set of special keywords:

              KEEP     Pass data through unchanged.
              IGNORE   Ignore data in this character set, without warning.
              WARN     Ignore data in this character set, with warning.
              DIE      Fatal exception with "can't handle character set" message.

           The subroutine for the special CHARSET of 'raw' is used for raw
           (non-MIME-encoded) text, which is supposed to be US-ASCII.  The
           handler for 'raw' defaults to whatever was specified for 'US-ASCII'
           at the time of construction.

           The subroutine for the special CHARSET of '*' is used for any
           unrecognized character set.  The default action for '*' is WARN.

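           The keyword lookup and the '*' fallback can be pictured with a
           small dispatch table.  This is a sketch of the behaviour described
           above, not the module's code: %KEYWORD, resolve_handler() and
           dispatch() are hypothetical names, and the exact return values and
           messages of IGNORE, WARN and DIE are assumptions.

```perl
use strict;
use warnings;

### Assumed behaviour of each keyword, based on the table above.
my %KEYWORD = (
    KEEP   => sub { $_[0] },
    IGNORE => sub { '' },
    WARN   => sub { warn "ignoring text in character set $_[1]\n"; '' },
    DIE    => sub { die "can't handle character set $_[1]\n" },
);

### Turn a keyword or a code reference into a code reference.
sub resolve_handler {
    my ($h) = @_;
    return $h if ref($h) eq 'CODE';
    return $KEYWORD{$h} || die "unknown handler keyword: $h\n";
}

### Pick the handler for CHARSET, falling back to '*' and then to WARN
### (the documented default for '*'), and apply it to DATA.
sub dispatch {
    my ($map, $data, $charset) = @_;
    $charset = uc $charset;
    my $h = exists $map->{$charset} ? $map->{$charset}
          : exists $map->{'*'}      ? $map->{'*'}
          :                           'WARN';
    return resolve_handler($h)->($data, $charset);
}

my %map = (
    'US-ASCII'   => 'KEEP',
    'ISO-8859-1' => sub { uc $_[0] },
    'BIG5'       => 'IGNORE',
);
print dispatch(\%map, "abc", 'us-ascii'), "\n";          ### abc
print dispatch(\%map, "abc", 'iso-8859-1'), "\n";        ### ABC
print "[", dispatch(\%map, "abc", 'Big5'), "]\n";        ### []
```
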
       decode STRING
           Instance method.  Decode a STRING which might contain MIME-encoded
           components into a local representation (e.g., UTF-8).

       unmime STRING
           Function, exported.  Decode the given STRING using the default()
           decoder.  See default().

SUBCLASSES

       MIME::WordDecoder::ISO_8859
           A simple decoder which keeps US-ASCII and the 7-bit characters of
           ISO-8859 character sets and UTF8, and also keeps 8-bit characters
           from the indicated character set.

               ### Construct:
               $wd = new MIME::WordDecoder::ISO_8859 2;    ### ISO-8859-2

               ### What to translate unknown characters to (can also use empty):
               ### Default is "?".
               $wd->unknown("?");

               ### Collapse runs of unknown characters to a single unknown()?
               ### Default is false.
               $wd->collapse(1);

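           The combined effect of unknown() and collapse() can be pictured
           with a stand-alone sketch.  map_unknown() is a hypothetical helper,
           not part of this class, and for simplicity it treats every 8-bit
           character as unknown, whereas the real decoder keeps 8-bit
           characters that belong to its own character set.

```perl
use strict;
use warnings;

### Hypothetical helper: replace 8-bit characters with $unknown.  With
### $collapse true, each run of them becomes a single $unknown.
sub map_unknown {
    my ($data, $unknown, $collapse) = @_;
    if ($collapse) {
        $data =~ s/[^\x00-\x7F]+/$unknown/g;    ### one replacement per run
    } else {
        $data =~ s/[^\x00-\x7F]/$unknown/g;     ### one per character
    }
    return $data;
}

print map_unknown("a\xE9\xE8b", "?", 0), "\n";   ### a??b
print map_unknown("a\xE9\xE8b", "?", 1), "\n";   ### a?b
print map_unknown("a\xE9\xE8b", "",  0), "\n";   ### ab
```
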
           According to http://czyborra.com/charsets/iso8859.html (ca.
           November 2000):

           ISO 8859 is a full series of 10 (and soon even more) standardized
           multilingual single-byte coded (8bit) graphic character sets for
           writing in alphabetic languages:

               1. Latin1 (West European)
               2. Latin2 (East European)
               3. Latin3 (South European)
               4. Latin4 (North European)
               5. Cyrillic
               6. Arabic
               7. Greek
               8. Hebrew
               9. Latin5 (Turkish)
              10. Latin6 (Nordic)

           The ISO 8859 charsets are not even remotely as complete as the
           truly great Unicode but they have been around and usable for quite
           a while (first registered Internet charsets for use with MIME) and
           have already offered a major improvement over the plain 7bit
           US-ASCII.

           Characters 0 to 127 are always identical with US-ASCII and the
           positions 128 to 159 hold some less used control characters: the
           so-called C1 set from ISO 6429.

       MIME::WordDecoder::US_ASCII
           A subclass of the ISO-8859-1 decoder which discards 8-bit
           characters.  You're probably better off using ISO-8859-1.

AUTHOR

       Eryq (eryq@zeegee.com), ZeeGee Software Inc (http://www.zeegee.com).
       David F. Skoll (dfs@roaringpenguin.com) http://www.roaringpenguin.com

VERSION

       $Revision: 1.3 $ $Date: 2005/04/19 16:23:40 $

perl v5.8.8                       2006-03-17              MIME::WordDecoder(3)