MIME::WordDecoder(3)  User Contributed Perl Documentation  MIME::WordDecoder(3)

NAME
       MIME::WordDecoder - decode RFC-1522 encoded words to a local
       representation

SYNOPSIS
       See MIME::Words for the basics of encoded words.  See "DESCRIPTION" for
       how this class works.

           use MIME::WordDecoder;

           ### Get the default word-decoder (used by unmime()):
           $wd = MIME::WordDecoder->default;

           ### Get a word-decoder which maps to ISO-8859-1 (Latin1):
           $wd = MIME::WordDecoder->supported("ISO-8859-1");

           ### Decode a MIME string (e.g., into Latin1) via that decoder object:
           $str = $wd->decode('To: =?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?= <keld>');

           ### Decode a string using the default decoder, non-OO style:
           $str = unmime('To: =?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?= <keld>');

DESCRIPTION
       A MIME::WordDecoder consists, fundamentally, of a hash which maps a
       character set name (US-ASCII, ISO-8859-1, etc.) to a subroutine which
       knows how to take bytes in that character set and turn them into the
       target string representation.  Ideally, this target representation
       would be Unicode, but we don't want to overspecify the translation that
       takes place: if you want to convert MIME strings directly to Big5,
       that's your own decision.

       The subroutine will be invoked with DATA (the data in the given
       character set) and CHARSET (the upcased character set name) as its
       first two arguments; the decoder object itself is passed as a third
       argument (see handler()).

       For example:

           ### Keep 7-bit characters as-is, convert 8-bit characters to '#':
           sub keep7bit {
               local $_ = shift;
               tr/\x00-\x7F/#/c;
               $_;
           }

       Here's a decoder which uses that:

           ### Construct a decoder (new() takes a reference to an array
           ### of CHARSET => HANDLER pairs):
           $wd = MIME::WordDecoder->new(['US-ASCII'   => "KEEP",      ### sub { $_[0] }
                                         'ISO-8859-1' => \&keep7bit,
                                         'ISO-8859-2' => \&keep7bit,
                                         'Big5'       => "WARN",
                                         '*'          => "DIE"]);

           ### Convert some MIME text to a pure ASCII string...
           $ascii = $wd->decode('To: =?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?= <keld>');

           ### ...which will now hold: "To: Keld J#rn Simonsen <keld>"

PUBLIC INTERFACE
       default [DECODER]
           Class method.  Get/set the default DECODER object.

       supported CHARSET, [DECODER]
           Class method.  If just CHARSET is given, returns a decoder object
           which maps data into that character set (the character set is
           forced to all-uppercase).

               $wd = MIME::WordDecoder->supported("ISO-8859-1");

           If DECODER is given, installs such an object:

               MIME::WordDecoder->supported("ISO-8859-1" =>
                                            MIME::WordDecoder::ISO_8859->new("1"));

           You should not override this method.

       new [\@HANDLERS]
           Class method, constructor.  If \@HANDLERS is given, then @HANDLERS
           is passed to handler() to initialize the internal map.

       handler CHARSET=>\&SUBREF, ...
           Instance method.  Set the handler SUBREF for a given CHARSET, for
           as many pairs as you care to supply.

           When performing the translation of a MIME-encoded string, a given
           SUBREF will be invoked when translating a block of text in
           character set CHARSET.  The subroutine will be invoked with the
           following arguments:

               DATA    - the data in the given character set.
               CHARSET - the upcased character set name, which may prove useful
                         if you are using the same SUBREF for multiple CHARSETs.
               DECODER - the decoder itself, if it contains configuration
                         information that your handler function needs.

           For example:

               $wd = MIME::WordDecoder->new;
               $wd->handler('US-ASCII'   => "KEEP");
               $wd->handler('ISO-8859-1' => \&handle_latin1,
                            'ISO-8859-2' => \&handle_latin1,
                            '*'          => "DIE");

           Notice that, much as with %SIG, the SUBREF can also be taken from a
           set of special keywords:

               KEEP     Pass data through unchanged.
               IGNORE   Ignore data in this character set, without warning.
               WARN     Ignore data in this character set, with warning.
               DIE      Fatal exception with "can't handle character set" message.

           The subroutine for the special CHARSET of 'raw' is used for raw
           (non-MIME-encoded) text, which is supposed to be US-ASCII.  The
           handler for 'raw' defaults to whatever was specified for 'US-ASCII'
           at the time of construction.

           The subroutine for the special CHARSET of '*' is used for any
           unrecognized character set.  The default action for '*' is WARN.
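
           As a rough illustration of the three handler arguments listed
           above, the following sketch installs one subroutine that labels
           each decoded chunk with the CHARSET it was invoked for.  The name
           tag_charset and the bracketed output format are invented for this
           example and are not part of MIME::WordDecoder; 'raw' is set
           explicitly so that unencoded text is passed through unchanged.

```perl
use strict;
use warnings;
use MIME::WordDecoder;

### Hypothetical handler (not part of MIME::WordDecoder): wrap each
### decoded chunk in brackets, prefixed by the charset it came from.
sub tag_charset {
    my ($data, $charset, $decoder) = @_;   ### $decoder is unused here
    return "[$charset:$data]";
}

### Handlers are given at construction time; 'raw' covers the
### unencoded text surrounding the encoded words.
my $wd = MIME::WordDecoder->new(['raw'        => "KEEP",
                                 'US-ASCII'   => "KEEP",
                                 'ISO-8859-1' => \&tag_charset,
                                 '*'          => "WARN"]);

my $tagged = $wd->decode('To: =?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?= <keld>');
print "$tagged\n";
```

           Because the same SUBREF may be registered for several CHARSETs,
           the CHARSET argument is what lets a shared handler tell them
           apart.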

       decode STRING
           Instance method.  Decode a STRING which might contain MIME-encoded
           components into a local representation (e.g., UTF-8).

       unmime STRING
           Function, exported.  Decode the given STRING using the default()
           decoder.  See default().
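
           A small sketch of how default() and unmime() fit together,
           assuming it is acceptable to replace the default decoder for the
           whole program (the default is a class-wide setting).  The keep7bit
           subroutine is the illustrative handler from the DESCRIPTION,
           repeated here so that the example is self-contained.

```perl
use strict;
use warnings;
use MIME::WordDecoder;

### Keep 7-bit characters as-is, convert 8-bit characters to '#':
sub keep7bit {
    local $_ = shift;
    tr/\x00-\x7F/#/c;
    $_;
}

### Build a decoder and install it as the class-wide default:
my $wd = MIME::WordDecoder->new(['US-ASCII'   => "KEEP",
                                 'ISO-8859-1' => \&keep7bit,
                                 '*'          => "WARN"]);
MIME::WordDecoder->default($wd);

### unmime() now decodes through $wd:
my $ascii = unmime('To: =?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?= <keld>');
print "$ascii\n";
```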

SUBCLASSES
       MIME::WordDecoder::ISO_8859
           A simple decoder which keeps US-ASCII and the 7-bit characters of
           ISO-8859 character sets and UTF8, and also keeps 8-bit characters
           from the indicated character set.

               ### Construct:
               $wd = MIME::WordDecoder::ISO_8859->new(2);    ### ISO-8859-2

               ### What to translate unknown characters to (can also use empty):
               ### Default is "?".
               $wd->unknown("?");

               ### Collapse runs of unknown characters to a single unknown()?
               ### Default is false.
               $wd->collapse(1);
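
           The options above might be combined as in the following sketch.
           It decodes the same header once with a decoder for the matching
           character set and once with a decoder for a different one.  The
           second call is expected to substitute the unknown() string for
           the 8-bit character; the exact output is deliberately not shown,
           since it has not been verified against a particular release.  The
           variable names and the "_" replacement string are arbitrary
           choices for the example.

```perl
use strict;
use warnings;
use MIME::WordDecoder;

my $header = 'To: =?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?= <keld>';

### Decoder for the same character set as the encoded word:
### the 8-bit character \xF8 should be kept.
my $latin1 = MIME::WordDecoder::ISO_8859->new(1);
my $kept   = $latin1->decode($header);

### Decoder for a different ISO-8859 character set, with a custom
### replacement string and collapsing of repeated replacements.
my $latin2 = MIME::WordDecoder::ISO_8859->new(2);
$latin2->unknown("_");
$latin2->collapse(1);
my $replaced = $latin2->decode($header);

print "$kept\n$replaced\n";
```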

           According to http://czyborra.com/charsets/iso8859.html (ca.
           November 2000):

               ISO 8859 is a full series of 10 (and soon even more) standardized
               multilingual single-byte coded (8bit) graphic character sets for
               writing in alphabetic languages:

                   1. Latin1 (West European)
                   2. Latin2 (East European)
                   3. Latin3 (South European)
                   4. Latin4 (North European)
                   5. Cyrillic
                   6. Arabic
                   7. Greek
                   8. Hebrew
                   9. Latin5 (Turkish)
                  10. Latin6 (Nordic)

               The ISO 8859 charsets are not even remotely as complete as the
               truly great Unicode but they have been around and usable for quite
               a while (first registered Internet charsets for use with MIME) and
               have already offered a major improvement over the plain 7bit
               US-ASCII.

               Characters 0 to 127 are always identical with US-ASCII and the
               positions 128 to 159 hold some less used control characters: the
               so-called C1 set from ISO 6429.

       MIME::WordDecoder::US_ASCII
           A subclass of the ISO-8859-1 decoder which discards 8-bit
           characters.  You're probably better off using ISO-8859-1.

AUTHOR
       Eryq (eryq@zeegee.com), ZeeGee Software Inc (http://www.zeegee.com).
       David F. Skoll (dfs@roaringpenguin.com) http://www.roaringpenguin.com

VERSION
       $Revision: 1.3 $ $Date: 2005/04/19 16:23:40 $

perl v5.8.8                       2006-03-17              MIME::WordDecoder(3)