MIME::WordDecoder(3) User Contributed Perl Documentation MIME::WordDecoder(3)

NAME

MIME::WordDecoder - decode RFC 2047 encoded words to a local
representation

WARNING: Most of this module is deprecated and may disappear. The only
function you should use for MIME decoding is "mime_to_perl_string".

SYNOPSIS

See MIME::Words for the basics of encoded words. See "DESCRIPTION" for
how this class works.

    use MIME::WordDecoder;

    ### Get the default word-decoder (used by unmime()):
    $wd = MIME::WordDecoder->default;

    ### Get a word-decoder which maps to ISO-8859-1 (Latin1):
    $wd = MIME::WordDecoder->supported("ISO-8859-1");

    ### Decode a MIME string (e.g., into Latin1) via the decoder in $wd:
    $str = $wd->decode('To: =?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?= <keld>');

    ### Decode a string using the default decoder, non-OO style:
    $str = unmime('To: =?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?= <keld>');

    ### Decode a string to an internal Perl string, non-OO style.
    ### The result is likely to have the UTF8 flag ON.
    $str = mime_to_perl_string('To: =?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?= <keld>');

DESCRIPTION

WARNING: Most of this module is deprecated and may disappear. It
duplicates (badly) the function of the standard 'Encode' module. The
only function you should rely on is mime_to_perl_string.

A MIME::WordDecoder consists, fundamentally, of a hash which maps a
character set name (US-ASCII, ISO-8859-1, etc.) to a subroutine which
knows how to take bytes in that character set and turn them into the
target string representation. Ideally, this target representation
would be Unicode, but we don't want to overspecify the translation that
takes place: if you want to convert MIME strings directly to Big5,
that's your own decision.

The subroutine will be invoked with three arguments: DATA (the data in
the given character set), CHARSET (the upcased character set name), and
DECODER (the decoder object itself). See handler() for details.

For example:

    ### Keep 7-bit characters as-is, convert 8-bit characters to '#':
    sub keep7bit {
        local $_ = shift;
        tr/\x00-\x7F/#/c;
        $_;
    }

Here's a decoder which uses that:

    ### Construct a decoder (new() is documented as taking \@HANDLERS,
    ### so the pairs are passed in an array reference):
    $wd = MIME::WordDecoder->new(['US-ASCII'   => "KEEP",      ### sub { $_[0] }
                                  'ISO-8859-1' => \&keep7bit,
                                  'ISO-8859-2' => \&keep7bit,
                                  'Big5'       => "WARN",
                                  '*'          => "DIE"]);

    ### Convert some MIME text to a pure ASCII string...
    $ascii = $wd->decode('To: =?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?= <keld>');

    ### ...which will now hold: "To: Keld J#rn Simonsen <keld>"

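The charset-to-handler map described above can be pictured as a plain
dispatch table. The following is a rough model in core Perl only, not
the module's actual implementation; the names %handler_for and
apply_handler are hypothetical and exist purely for illustration.

```perl
use strict;
use warnings;

### Illustrative model only (not the module's code): charset => handler,
### with '*' as the fallback for unrecognized character sets.
my %handler_for = (
    'US-ASCII'   => sub { $_[0] },      ### behaves like "KEEP"
    'ISO-8859-1' => sub {               ### same effect as keep7bit()
        my ($data) = @_;
        $data =~ tr/\x00-\x7F/#/c;
        return $data;
    },
    '*'          => sub { '' },         ### behaves like "IGNORE"
);

### Hypothetical helper: upcase CHARSET, look up its handler, apply it.
sub apply_handler {
    my ($data, $charset) = @_;
    $charset = uc $charset;
    my $sub = $handler_for{$charset} || $handler_for{'*'};
    return $sub->($data, $charset);
}

my $ascii = apply_handler("Keld J\xF8rn Simonsen", 'iso-8859-1');
### $ascii now holds "Keld J#rn Simonsen"
```

The model covers only the lookup and fallback. Splitting a header into
encoded words and raw text is left to the real decoder.
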
The UTF-8 built-in decoder decodes everything into Perl's internal
string format, possibly turning on the internal UTF8 flag. Use it like
this:

    $wd = MIME::WordDecoder->supported('UTF-8');
    $perl_string = $wd->decode('To: =?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?= <keld>');
    ### $perl_string is now a Perl character string, likely with the
    ### "UTF8" flag set.

Generally, you should use the UTF-8 decoder in preference to "unmime".

PUBLIC INTERFACE

default [DECODER]
    Class method. Get/set the default DECODER object.

supported CHARSET, [DECODER]
    Class method. If just CHARSET is given, returns a decoder object
    which maps data into that character set (the character set is
    forced to all-uppercase).

        $wd = MIME::WordDecoder->supported("ISO-8859-1");

    If DECODER is given, installs such an object:

        MIME::WordDecoder->supported("ISO-8859-1" =>
                                     MIME::WordDecoder::ISO_8859->new("1"));

    You should not override this method.

new [\@HANDLERS]
    Class method, constructor. If \@HANDLERS is given, then @HANDLERS
    is passed to handler() to initialize the internal map.

handler CHARSET=>\&SUBREF, ...
    Instance method. Set the handler SUBREF for a given CHARSET, for
    as many pairs as you care to supply.

    When performing the translation of a MIME-encoded string, a given
    SUBREF will be invoked when translating a block of text in
    character set CHARSET. The subroutine will be invoked with the
    following arguments:

        DATA    - the data in the given character set.
        CHARSET - the upcased character set name, which may prove useful
                  if you are using the same SUBREF for multiple CHARSETs.
        DECODER - the decoder itself, if it contains configuration information
                  that your handler function needs.

    For example:

        $wd = MIME::WordDecoder->new;
        $wd->handler('US-ASCII'   => "KEEP");
        $wd->handler('ISO-8859-1' => \&handle_latin1,
                     'ISO-8859-2' => \&handle_latin1,
                     '*'          => "DIE");

    Notice that, much as with %SIG, the SUBREF can also be taken from a
    set of special keywords:

        KEEP     Pass data through unchanged.
        IGNORE   Ignore data in this character set, without warning.
        WARN     Ignore data in this character set, with warning.
        DIE      Fatal exception with "can't handle character set" message.

    The subroutine for the special CHARSET of 'raw' is used for raw
    (non-MIME-encoded) text, which is supposed to be US-ASCII. The
    handler for 'raw' defaults to whatever was specified for 'US-ASCII'
    at the time of construction.

    The subroutine for the special CHARSET of '*' is used for any
    unrecognized character set. The default action for '*' is WARN.

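The handler() example refers to a handle_latin1 subroutine without
showing one. Here is one way such a handler might look, assuming you
want Perl character strings as the target representation and are happy
to lean on the core Encode module. The body is a sketch of ours, not
code shipped with MIME::WordDecoder.

```perl
use strict;
use warnings;
use Encode ();

### Hypothetical handler: turn DATA (bytes in CHARSET) into a Perl
### character string. DECODER is accepted but unused in this sketch.
sub handle_latin1 {
    my ($data, $charset, $decoder) = @_;
    return Encode::decode($charset, $data);
}

### Called directly, with the DATA and CHARSET arguments a decoder
### would pass for one chunk of text:
my $name = handle_latin1("J\xF8rn", 'ISO-8859-1');
### $name is now the 4-character string "J\x{F8}rn"
```

Because the handler receives CHARSET, the same subroutine can serve both
ISO-8859-1 and ISO-8859-2, as in the handler() example. Note that
Encode::decode croaks on a character set name it does not know, so a
handler like this should only be registered for names Encode supports.
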
decode STRING
    Instance method. Decode a STRING which might contain MIME-encoded
    components into a local representation (e.g., UTF-8, etc.).

unmime STRING
    Function, exported. Decode the given STRING using the default()
    decoder. See default().

    You should consider using the UTF-8 decoder instead. It decodes
    MIME strings into Perl's internal string format.

mime_to_perl_string STRING
    Function, exported. Decode the given STRING into an internal Perl
    Unicode string. You should use this function in preference to all
    others.

    The result of mime_to_perl_string is likely to have Perl's UTF8
    flag set.

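Since mime_to_perl_string returns a character string rather than bytes,
it usually needs an explicit encoding step before being written to a
file or socket. The sketch below uses only the core Encode module; to
stay self-contained it starts from a literal that stands in for the
value mime_to_perl_string would be expected to return for the header
used throughout this page.

```perl
use strict;
use warnings;
use Encode ();

### Stand-in for:
###   my $str = mime_to_perl_string(
###       'To: =?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?= <keld>');
my $str = "To: Keld J\x{F8}rn Simonsen <keld>";

### Encode explicitly when bytes are needed (files, sockets, etc.).
my $utf8_bytes   = Encode::encode('UTF-8', $str);
my $latin1_bytes = Encode::encode('ISO-8859-1', $str);

### The o-slash takes two bytes in UTF-8 and one byte in ISO-8859-1.
printf "%d chars, %d UTF-8 bytes, %d Latin1 bytes\n",
    length($str), length($utf8_bytes), length($latin1_bytes);
```

Keeping the decoded value as a character string and encoding only at the
output boundary is the usual Perl practice, and it is what makes
mime_to_perl_string easier to use correctly than the charset-specific
decoders.
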
SUBCLASSES

MIME::WordDecoder::ISO_8859
    A simple decoder which keeps US-ASCII and the 7-bit characters of
    ISO-8859 character sets and UTF8, and also keeps 8-bit characters
    from the indicated character set.

        ### Construct:
        $wd = MIME::WordDecoder::ISO_8859->new(2);    ### ISO-8859-2

        ### What to translate unknown characters to (can also use empty):
        ### Default is "?".
        $wd->unknown("?");

        ### Collapse runs of unknown characters to a single unknown()?
        ### Default is false.
        $wd->collapse(1);

    According to http://czyborra.com/charsets/iso8859.html (ca.
    November 2000):

        ISO 8859 is a full series of 10 (and soon even more) standardized
        multilingual single-byte coded (8bit) graphic character sets for
        writing in alphabetic languages:

             1. Latin1 (West European)
             2. Latin2 (East European)
             3. Latin3 (South European)
             4. Latin4 (North European)
             5. Cyrillic
             6. Arabic
             7. Greek
             8. Hebrew
             9. Latin5 (Turkish)
            10. Latin6 (Nordic)

        The ISO 8859 charsets are not even remotely as complete as the
        truly great Unicode but they have been around and usable for quite
        a while (first registered Internet charsets for use with MIME) and
        have already offered a major improvement over the plain 7bit
        US-ASCII.

        Characters 0 to 127 are always identical with US-ASCII and the
        positions 128 to 159 hold some less used control characters: the
        so-called C1 set from ISO 6429.

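The difference between the unknown() and collapse() settings of the
ISO_8859 decoder is easiest to see on a small input. The following is a
simplified model in core Perl, not the decoder's own code: it treats
every 8-bit character as unknown, and replace_unknown is a hypothetical
name used only here.

```perl
use strict;
use warnings;

### Illustrative model only: replace characters outside 7-bit ASCII with
### $unknown, optionally collapsing each run into a single replacement.
sub replace_unknown {
    my ($data, $unknown, $collapse) = @_;
    if ($collapse) {
        $data =~ s/[^\x00-\x7F]+/$unknown/g;
    } else {
        $data =~ s/[^\x00-\x7F]/$unknown/g;
    }
    return $data;
}

my $each = replace_unknown("a\xE9\xE8b", "?", 0);   ### "a??b"
my $run  = replace_unknown("a\xE9\xE8b", "?", 1);   ### "a?b"
```

Passing an empty string as the replacement drops the unknown characters
altogether, which corresponds to the "can also use empty" remark in the
unknown() comment.
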
MIME::WordDecoder::US_ASCII
    A subclass of the ISO-8859-1 decoder which discards 8-bit
    characters. You're probably better off using ISO-8859-1.

SEE ALSO

MIME::Tools

AUTHOR

Eryq (eryq@zeegee.com), ZeeGee Software Inc (http://www.zeegee.com).
Dianne Skoll (dfs@roaringpenguin.com) http://www.roaringpenguin.com

perl v5.34.0 2021-07-22 MIME::WordDecoder(3)