MIME::WordDecoder(3pm)

MIME::WordDecoder(3)  User Contributed Perl Documentation MIME::WordDecoder(3)

NAME

       MIME::WordDecoder - decode RFC 2047 encoded words to a local
       representation

       WARNING: Most of this module is deprecated and may disappear.  The only
       function you should use for MIME decoding is "mime_to_perl_string".

SYNOPSIS

       See MIME::Words for the basics of encoded words.  See "DESCRIPTION" for
       how this class works.

           use MIME::WordDecoder;

           ### Get the default word-decoder (used by unmime()):
           $wd = default MIME::WordDecoder;

           ### Get a word-decoder which maps to ISO-8859-1 (Latin1):
           $wd = supported MIME::WordDecoder "ISO-8859-1";

           ### Decode a MIME string (e.g., into Latin1) via a decoder object:
           $str = $wd->decode('To: =?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?= <keld>');

           ### Decode a string using the default decoder, non-OO style:
           $str = unmime('To: =?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?= <keld>');

           ### Decode a string to an internal Perl string, non-OO style.
           ### The result is likely to have the UTF8 flag ON.
           $str = mime_to_perl_string('To: =?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?= <keld>');

DESCRIPTION

       WARNING: Most of this module is deprecated and may disappear.  It
       duplicates (badly) the function of the standard 'Encode' module.  The
       only function you should rely on is mime_to_perl_string.

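       Since the warning above points to the standard Encode module, here is
       a minimal sketch of decoding the same header with Encode alone.  It
       assumes the "MIME-Header" encoding that ships with the core Encode
       distribution; the variable names are illustrative only and nothing
       here is part of MIME::WordDecoder.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Raw header text containing one RFC 2047 encoded word.
my $header = 'To: =?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?= <keld>';

# 'MIME-Header' is an encoding name supplied by the core Encode
# distribution.  The result is a Perl character string, not bytes.
my $decoded = decode('MIME-Header', $header);

# Encode explicitly before writing to a byte-oriented handle.
print encode('UTF-8', $decoded), "\n";
```

       The explicit encode() on output matters because the decoded value is
       a character string; printing it unencoded may produce a "Wide
       character" warning or Latin1 bytes, depending on the content.
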
       A MIME::WordDecoder consists, fundamentally, of a hash which maps a
       character set name (US-ASCII, ISO-8859-1, etc.) to a subroutine which
       knows how to take bytes in that character set and turn them into the
       target string representation.  Ideally, this target representation
       would be Unicode, but we don't want to overspecify the translation that
       takes place: if you want to convert MIME strings directly to Big5,
       that's your own decision.

       The subroutine will be invoked with three arguments: DATA (the data in
       the given character set), CHARSET (the upcased character set name), and
       DECODER (the decoder object itself).  See "handler" under "PUBLIC
       INTERFACE" for details.

       For example:

           ### Keep 7-bit characters as-is, convert 8-bit characters to '#':
           sub keep7bit {
               local $_ = shift;
               tr/\x00-\x7F/#/c;
               $_;
           }

       Here's a decoder which uses that:

           ### Construct a decoder.  Note that new() takes a reference to an
           ### array of CHARSET => HANDLER pairs (see "new" below):
           $wd = MIME::WordDecoder->new(['US-ASCII'   => "KEEP",   ### sub { $_[0] }
                                         'ISO-8859-1' => \&keep7bit,
                                         'ISO-8859-2' => \&keep7bit,
                                         'Big5'       => "WARN",
                                         '*'          => "DIE"]);

           ### Convert some MIME text to a pure ASCII string...
           $ascii = $wd->decode('To: =?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?= <keld>');

           ### ...which will now hold: "To: Keld J#rn Simonsen <keld>"

       The UTF-8 built-in decoder decodes everything into Perl's internal
       string format, possibly turning on the internal UTF8 flag.  Use it like
       this:

           $wd = supported MIME::WordDecoder 'UTF-8';
           $perl_string = $wd->decode('To: =?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?= <keld>');
           # $perl_string is now a Perl character string, likely with the
           # "UTF8" flag set.

       Generally, you should use the UTF-8 decoder in preference to "unmime".

PUBLIC INTERFACE

       default [DECODER]
           Class method.  Get/set the default DECODER object.

       supported CHARSET, [DECODER]
           Class method.  If just CHARSET is given, returns a decoder object
           which maps data into that character set (the character set is
           forced to all-uppercase).

               $wd = supported MIME::WordDecoder "ISO-8859-1";

           If DECODER is given, installs such an object:

               MIME::WordDecoder->supported("ISO-8859-1" =>
                                            (new MIME::WordDecoder::ISO_8859 "1"));

           You should not override this method.

       new [\@HANDLERS]
           Class method, constructor.  If \@HANDLERS is given, then @HANDLERS
           is passed to handler() to initialize the internal map.

       handler CHARSET=>\&SUBREF, ...
           Instance method.  Set the handler SUBREF for a given CHARSET, for
           as many pairs as you care to supply.

           When performing the translation of a MIME-encoded string, a given
           SUBREF will be invoked when translating a block of text in
           character set CHARSET.  The subroutine will be invoked with the
           following arguments:

               DATA    - the data in the given character set.
               CHARSET - the upcased character set name, which may prove useful
                         if you are using the same SUBREF for multiple CHARSETs.
               DECODER - the decoder itself, if it contains configuration information
                         that your handler function needs.

           For example:

               $wd = new MIME::WordDecoder;
               $wd->handler('US-ASCII'   => "KEEP");
               $wd->handler('ISO-8859-1' => \&handle_latin1,
                            'ISO-8859-2' => \&handle_latin1,
                            '*'          => "DIE");

           Notice that, much as with %SIG, the SUBREF can also be taken from a
           set of special keywords:

              KEEP     Pass data through unchanged.
              IGNORE   Ignore data in this character set, without warning.
              WARN     Ignore data in this character set, with warning.
              DIE      Fatal exception with "can't handle character set" message.

           The subroutine for the special CHARSET of 'raw' is used for raw
           (non-MIME-encoded) text, which is supposed to be US-ASCII.  The
           handler for 'raw' defaults to whatever was specified for 'US-ASCII'
           at the time of construction.

           The subroutine for the special CHARSET of '*' is used for any
           unrecognized character set.  The default action for '*' is WARN.

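           The example above refers to a handle_latin1 subroutine without
           defining it.  The following is one hypothetical body for such a
           handler, written only to illustrate the documented argument
           order (DATA, CHARSET, DECODER).  The %marker_for table and the
           replacement policy are assumptions made for this sketch, not
           something the module provides.

```perl
use strict;
use warnings;

# Hypothetical per-charset replacement markers (an assumption for this
# sketch; the module defines no such table).
my %marker_for = ('ISO-8859-1' => '#', 'ISO-8859-2' => '?');

# Handler in the documented calling convention: DATA, CHARSET, DECODER.
sub handle_latin1 {
    my ($data, $charset, $decoder) = @_;

    # CHARSET arrives upcased, so one sub can serve several charsets.
    my $marker = exists $marker_for{$charset} ? $marker_for{$charset} : '*';

    # Keep 7-bit characters, replace everything else with the marker.
    (my $out = $data) =~ s/[^\x00-\x7F]/$marker/g;
    return $out;
}

# Call the handler directly, the way a decoder would for one block of
# text.  DECODER is unused in this sketch, so undef stands in for it.
print handle_latin1("J\xF8rn", 'ISO-8859-1', undef), "\n";
```

           A handler that needs per-decoder configuration would read it from
           the third argument instead of a file-scoped table.
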
       decode STRING
           Instance method.  Decode a STRING which might contain MIME-encoded
           components into a local representation (e.g., UTF-8, etc.).

       unmime STRING
           Function, exported.  Decode the given STRING using the default()
           decoder.  See default().

           You should consider using the UTF-8 decoder instead.  It decodes
           MIME strings into Perl's internal string format.

       mime_to_perl_string STRING
           Function, exported.  Decode the given STRING into an internal Perl
           Unicode string.  You should use this function in preference to all
           others.

           The result of mime_to_perl_string is likely to have Perl's UTF8
           flag set.

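           To make the remark about the UTF8 flag concrete, here is a small
           sketch that uses only the core Encode module and the built-in
           utf8::is_utf8 function.  It does not call mime_to_perl_string; it
           only shows what a decoded character string looks like and why it
           should be encoded explicitly on output.  The variable names are
           illustrative.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# The bytes that Q-decoding "J=F8rn" would yield, in ISO-8859-1.
my $bytes = "J\xF8rn";

# Decoding yields a character string; Encode typically sets the UTF8 flag.
my $chars = decode('ISO-8859-1', $bytes);

# length() counts characters, whatever the internal representation is.
printf "characters: %d, flagged: %s\n",
    length($chars), (utf8::is_utf8($chars) ? 'yes' : 'no');

# Encode explicitly when writing to a byte-oriented destination.
my $utf8_bytes = encode('UTF-8', $chars);
printf "UTF-8 bytes: %d\n", length($utf8_bytes);
```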

SUBCLASSES

       MIME::WordDecoder::ISO_8859
           A simple decoder which keeps US-ASCII and the 7-bit characters of
           ISO-8859 character sets and UTF8, and also keeps 8-bit characters
           from the indicated character set.

               ### Construct:
               $wd = new MIME::WordDecoder::ISO_8859 2;    ### ISO-8859-2

               ### What to translate unknown characters to (can also use empty):
               ### Default is "?".
               $wd->unknown("?");

               ### Collapse runs of unknown characters to a single unknown()?
               ### Default is false.
               $wd->collapse(1);

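           The effect of the unknown() and collapse() settings can be
           pictured with a standalone sketch.  The replace_unknown helper
           below is hypothetical and is not the module's code; for
           simplicity it treats every 8-bit character as unknown, whereas
           the real decoder keeps 8-bit characters from its own character
           set.

```perl
use strict;
use warnings;

# Hypothetical helper: replace characters outside the 7-bit range with
# $unknown, optionally collapsing each run of them into one marker.
sub replace_unknown {
    my ($text, $unknown, $collapse) = @_;
    if ($collapse) {
        $text =~ s/[^\x00-\x7F]+/$unknown/g;   # one marker per run
    } else {
        $text =~ s/[^\x00-\x7F]/$unknown/g;    # one marker per character
    }
    return $text;
}

# Two adjacent 8-bit characters, without and with collapsing.
print replace_unknown("a\xE9\xE8b", "?", 0), "\n";
print replace_unknown("a\xE9\xE8b", "?", 1), "\n";
```
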
           According to http://czyborra.com/charsets/iso8859.html (ca.
           November 2000):

           ISO 8859 is a full series of 10 (and soon even more) standardized
           multilingual single-byte coded (8bit) graphic character sets for
           writing in alphabetic languages:

               1. Latin1 (West European)
               2. Latin2 (East European)
               3. Latin3 (South European)
               4. Latin4 (North European)
               5. Cyrillic
               6. Arabic
               7. Greek
               8. Hebrew
               9. Latin5 (Turkish)
              10. Latin6 (Nordic)

           The ISO 8859 charsets are not even remotely as complete as the
           truly great Unicode but they have been around and usable for quite
           a while (first registered Internet charsets for use with MIME) and
           have already offered a major improvement over the plain 7bit
           US-ASCII.

           Characters 0 to 127 are always identical with US-ASCII and the
           positions 128 to 159 hold some less used control characters: the
           so-called C1 set from ISO 6429.

       MIME::WordDecoder::US_ASCII
           A subclass of the ISO-8859-1 decoder which discards 8-bit
           characters.  You're probably better off using ISO-8859-1.

AUTHOR

       Eryq (eryq@zeegee.com), ZeeGee Software Inc (http://www.zeegee.com).
       Dianne Skoll (dfs@roaringpenguin.com) http://www.roaringpenguin.com

perl v5.28.0                      2015-06-19              MIME::WordDecoder(3)
