Charset(3)            User Contributed Perl Documentation           Charset(3)



NAME
       MIME::Charset - Charset Information for MIME

SYNOPSIS
           use MIME::Charset;

           $charset = MIME::Charset->new("euc-jp");

       Getting charset information:

           $benc = $charset->body_encoding; # e.g. "Q"
           $cset = $charset->as_string; # e.g. "US-ASCII"
           $henc = $charset->header_encoding; # e.g. "S"
           $cset = $charset->output_charset; # e.g. "ISO-2022-JP"

       Translating text data:

           ($text, $cset, $encoding) =
               $charset->header_encode(
                  "\xc9\xc2\xc5\xaa\xc0\xde\xc3\xef\xc5\xaa".
                  "\xc7\xd1\xca\xaa\xbd\xd0\xce\xcf\xb4\xef");
               # ...returns e.g. (<converted>, "ISO-2022-JP", "B");

           ($text, $cset, $encoding) =
               $charset->body_encode(
                   "Collectioneur path\xe9tiquement ".
                   "\xe9clectique de d\xe9chets");
               # ...returns e.g. (<original>, "ISO-8859-1", "QUOTED-PRINTABLE");

           $len = $charset->encoded_header_len(
               "Perl\xe8\xa8\x80\xe8\xaa\x9e", "b"); # e.g. 28

       Manipulating module defaults:

           use MIME::Charset;

           MIME::Charset::alias("csEUCKR", "euc-kr");
           MIME::Charset::default("iso-8859-1");
           MIME::Charset::fallback("us-ascii");

       Non-OO functions (may be deprecated in the near future):

           use MIME::Charset qw(:info);

           $benc = body_encoding("iso-8859-2"); # "Q"
           $cset = canonical_charset("ANSI X3.4-1968"); # "US-ASCII"
           $henc = header_encoding("utf-8"); # "S"
           $cset = output_charset("shift_jis"); # "ISO-2022-JP"

           use MIME::Charset qw(:trans);

           ($text, $charset, $encoding) =
               header_encode(
                  "\xc9\xc2\xc5\xaa\xc0\xde\xc3\xef\xc5\xaa".
                  "\xc7\xd1\xca\xaa\xbd\xd0\xce\xcf\xb4\xef",
                  "euc-jp");
               # ...returns (<converted>, "ISO-2022-JP", "B");

           ($text, $charset, $encoding) =
               body_encode(
                   "Collectioneur path\xe9tiquement ".
                   "\xe9clectique de d\xe9chets",
                   "latin1");
               # ...returns (<original>, "ISO-8859-1", "QUOTED-PRINTABLE");

           $len = encoded_header_len(
               "Perl\xe8\xa8\x80\xe8\xaa\x9e", "b", "utf-8"); # 28

DESCRIPTION
       MIME::Charset provides information about character sets used for MIME
       messages on the Internet.

   Definitions
       The charset is the ``character set'' used in MIME to refer to a method
       of converting a sequence of octets into a sequence of characters.  It
       covers both the ``coded character set'' (CCS) and the ``character
       encoding scheme'' (CES) concepts of ISO/IEC.

       The encoding is what MIME uses to refer to a method of representing a
       body part or a header body as one or more sequences of printable
       US-ASCII characters.
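
       The distinction can be shown with Perl's core Encode and MIME::Base64
       modules.  This sketch does not use MIME::Charset itself.  A charset
       says which character an octet stands for.  An encoding only
       re-expresses the octets as printable US-ASCII.

```perl
use strict;
use warnings;
use Encode       qw(decode);
use MIME::Base64 qw(encode_base64);

# A single octet, 0xE9, taken from some ISO-8859-1 text.
my $octets = "\xe9";

# Charset: says which character the octets stand for.
# In ISO-8859-1, 0xE9 is U+00E9 LATIN SMALL LETTER E WITH ACUTE.
my $chars = decode('ISO-8859-1', $octets);
printf "character: U+%04X\n", ord $chars;

# Encoding: re-expresses the same octets as printable US-ASCII,
# regardless of which characters they represent.
my $b64 = encode_base64($octets, '');   # '' = no trailing newline
print "base64:    $b64\n";
```

       Running it prints "character: U+00E9" and "base64:    6Q==".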

   Constructor
       $charset = MIME::Charset->new([CHARSET [, OPTS]])
           Creates a charset object.

           OPTS may accept the following key-value pairs.  NOTE: When
           Unicode/multibyte support is disabled (see "USE_ENCODE"),
           conversion is not performed, so these options have no effect.

           Mapping => MAPTYPE
               Specifies the extended mappings actually used for charset
               names.  "EXTENDED" uses extended mappings; "STANDARD" uses
               standardized strict mappings.  The default is "EXTENDED".

   Getting Information About Charsets
       $charset->body_encoding
       body_encoding CHARSET
           Gets the recommended transfer-encoding of CHARSET for the message
           body.

           The returned value is one of "B" (BASE64), "Q"
           (QUOTED-PRINTABLE) or "undef" (need not be transfer-encoded;
           either 7BIT or 8BIT).  This may differ from the encoding for the
           message header.

       $charset->as_string
       canonical_charset CHARSET
           Gets the canonical name of the charset.

       $charset->decoder
           Gets an "Encode::Encoding" object that decodes strings to Unicode
           according to the charset.

       $charset->dup
           Gets a copy of the charset object.

       $charset->encoder([CHARSET])
           Gets an "Encode::Encoding" object that encodes Unicode strings
           using the compatible charset recommended for messages on the
           Internet.

           If the optional CHARSET is specified, the encoder (and the output
           charset name) of the $charset object are replaced with those of
           CHARSET.  The $charset object then becomes a converter between
           the original charset and the new CHARSET.

       $charset->header_encoding
       header_encoding CHARSET
           Gets the recommended encoding scheme of CHARSET for the message
           header.

           The returned value is one of "B", "Q", "S" (whichever of "B" and
           "Q" gives the shorter result) or "undef" (need not be encoded).
           This may differ from the encoding for the message body.

       $charset->output_charset
       output_charset CHARSET
           Gets a charset that is compatible with the given CHARSET and is
           recommended for MIME messages on the Internet (if this module
           knows one).

           When Unicode/multibyte support is disabled (see "USE_ENCODE"),
           this function simply returns the result of "canonical_charset".
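
           A short sketch of querying charset profiles through both
           interfaces.  It assumes MIME::Charset is installed and that
           Unicode/multibyte support is enabled (see "USE_ENCODE").  The
           functional-interface values in the comments are the ones stated
           in the SYNOPSIS.  The object-interface results are only printed.

```perl
use strict;
use warnings;
use MIME::Charset qw(:info);

# Object interface: look up the profile of one charset.
my $charset = MIME::Charset->new('euc-jp');
for my $method (qw(as_string output_charset header_encoding body_encoding)) {
    my $value = $charset->$method;
    printf "%-16s %s\n", $method, defined $value ? $value : 'undef';
}

# Functional interface; expected values as given in the SYNOPSIS.
my %info = (
    body   => scalar body_encoding('iso-8859-2'),          # "Q"
    canon  => scalar canonical_charset('ANSI X3.4-1968'),  # "US-ASCII"
    header => scalar header_encoding('utf-8'),             # "S"
    output => scalar output_charset('shift_jis'),          # "ISO-2022-JP"
);
print "$_ => $info{$_}\n" for sort keys %info;
```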

   Translating Text Data
       $charset->body_encode(STRING [, OPTS])
       body_encode STRING, CHARSET [, OPTS]
           Gets the converted (if needed) data of STRING and the recommended
           transfer-encoding of that data for the message body.  CHARSET is
           the charset in which STRING is encoded.

           OPTS may accept the following key-value pairs.  NOTE: When
           Unicode/multibyte support is disabled (see "USE_ENCODE"),
           conversion is not performed, so these options have no effect.

           Detect7bit => YESNO
               Tries to auto-detect a 7-bit charset when CHARSET is not
               given.  The default is "YES".

           Replacement => REPLACEMENT
               Specifies the error handling scheme.  See "Error Handling".

           A 3-item list of (converted string, charset for output,
           transfer-encoding) is returned.  The transfer-encoding is one of
           "BASE64", "QUOTED-PRINTABLE", "7BIT" or "8BIT".  If the charset
           for output cannot be determined and the converted string contains
           non-ASCII byte(s), the charset for output is "undef" and the
           transfer-encoding is "BASE64".  The charset for output is
           "US-ASCII" if and only if the string does not contain any
           non-ASCII bytes.
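
           The transfer-encoding choice can be approximated in a few lines
           of core Perl.  This is a simplified sketch, not MIME::Charset's
           implementation.  The helpers pick_transfer_encoding and
           apply_transfer_encoding are hypothetical.  The body-encoding
           profile ("B", "Q" or undef, as returned by "body_encoding") is
           passed in by hand, and line-length limits for 7BIT/8BIT data are
           ignored.

```perl
use strict;
use warnings;
use MIME::Base64      qw(encode_base64);
use MIME::QuotedPrint qw(encode_qp);

# Hypothetical helper (not part of MIME::Charset): map the octets to be
# sent and a body-encoding profile ("B", "Q" or undef, as returned by
# body_encoding) onto a Content-Transfer-Encoding value.
sub pick_transfer_encoding {
    my ($octets, $profile) = @_;
    return '7BIT' unless $octets =~ /[^\x00-\x7F]/;    # no non-ASCII bytes
    return 'BASE64'           if defined $profile && $profile eq 'B';
    return 'QUOTED-PRINTABLE' if defined $profile && $profile eq 'Q';
    return '8BIT';                                     # profile is undef
}

# Hypothetical helper: apply the chosen transfer-encoding to the octets.
sub apply_transfer_encoding {
    my ($octets, $cte) = @_;
    return encode_base64($octets) if $cte eq 'BASE64';
    return encode_qp($octets)     if $cte eq 'QUOTED-PRINTABLE';
    return $octets;                                    # 7BIT / 8BIT: as-is
}

# Latin-1 text from the SYNOPSIS; "Q" is assumed as its body profile.
my $latin1 = "Collectioneur path\xe9tiquement \xe9clectique de d\xe9chets";
my $cte    = pick_transfer_encoding($latin1, 'Q');
my $body   = apply_transfer_encoding($latin1, $cte);
print "Content-Transfer-Encoding: $cte\n\n$body";
```

           With the "Q" profile, the Latin-1 sample comes out as
           QUOTED-PRINTABLE, and each \xe9 is written as "=E9".  This
           matches the SYNOPSIS result for "latin1".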

       $charset->decode(STRING [,CHECK])
           Decodes STRING to Unicode.

           Note: When Unicode/multibyte support is disabled (see
           "USE_ENCODE"), this function dies.

       $charset->encode(STRING [,CHECK])
           Encodes STRING (Unicode or non-Unicode) using the compatible
           charset recommended for messages on the Internet (if this module
           knows one).  Note that the string is decoded to Unicode and then
           encoded even if the compatible charset is the same as the
           original charset.

           Note: When Unicode/multibyte support is disabled (see
           "USE_ENCODE"), this function dies.

       $charset->encoded_header_len(STRING [, ENCODING])
       encoded_header_len STRING, ENCODING, CHARSET
           Gets the length of STRING once encoded for a message header
           (without folding).

           ENCODING may be one of "B", "Q" or "S" (whichever of "B" and "Q"
           gives the shorter result).
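
           The length of a single RFC 2047 encoded-word,
           "=?CHARSET?X?payload?=", can be estimated as sketched below.  This
           illustrates the arithmetic under stated assumptions and does not
           reproduce the module's own counting.  The Q rule used here keeps
           letters, digits, "!*+-/" and space as one character and writes
           everything else as "=XX".  It is the most restrictive RFC 2047
           rule, and MIME::Charset may count some characters differently.

```perl
use strict;
use warnings;

# Overhead of an encoded-word "=?CHARSET?X?payload?=": "=?", "?X?" and
# "?=" add 7 characters to the length of the charset name.
sub b_len {
    my ($octets, $charset) = @_;
    # Base64 emits 4 characters for each started group of 3 octets.
    return length($charset) + 7 + 4 * int((length($octets) + 2) / 3);
}

sub q_len {
    my ($octets, $charset) = @_;
    my $payload = 0;
    for my $c (split //, $octets) {
        # Assumed rule: letters, digits, "!*+-/" and space (sent as "_")
        # take one character; any other octet becomes "=XX".
        $payload += $c =~ m{[A-Za-z0-9!*+\-/ ]} ? 1 : 3;
    }
    return length($charset) + 7 + $payload;
}

# "S": whichever of "B" and "Q" gives the shorter result.
sub s_len {
    my ($blen, $qlen) = (b_len(@_), q_len(@_));
    return $blen < $qlen ? $blen : $qlen;
}

my $text = "Perl\xe8\xa8\x80\xe8\xaa\x9e";   # "Perl" plus two UTF-8 characters
printf "B=%d Q=%d S=%d\n",
    b_len($text, 'UTF-8'), q_len($text, 'UTF-8'), s_len($text, 'UTF-8');
# prints B=28 Q=34 S=28
```

           For the SYNOPSIS string with charset "UTF-8", the B figure is 28.
           This agrees with the value shown in the SYNOPSIS.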

       $charset->header_encode(STRING [, OPTS])
       header_encode STRING, CHARSET [, OPTS]
           Gets the converted (if needed) data of STRING and the recommended
           encoding scheme of that data for message headers.  CHARSET is the
           charset in which STRING is encoded.

           OPTS may accept the following key-value pairs.  NOTE: When
           Unicode/multibyte support is disabled (see "USE_ENCODE"),
           conversion is not performed, so these options have no effect.

           Detect7bit => YESNO
               Tries to auto-detect a 7-bit charset when CHARSET is not
               given.  The default is "YES".

           Replacement => REPLACEMENT
               Specifies the error handling scheme.  See "Error Handling".

           A 3-item list of (converted string, charset for output, encoding
           scheme) is returned.  The encoding scheme is one of "B", "Q" or
           "undef" (need not be encoded).  If the charset for output cannot
           be determined and the converted string contains non-ASCII
           byte(s), the charset for output is "8BIT" and the encoding scheme
           is "undef" (should not be encoded).  "8BIT" is not a charset
           name but a special value representing unencodable data.  The
           charset for output is "US-ASCII" if and only if the string does
           not contain any non-ASCII bytes.
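
           "header_encode" returns the pieces of an encoded-word rather than
           the word itself.  A caller might assemble them as sketched below.
           make_encoded_word is a hypothetical helper that uses only the
           core MIME::Base64 module.  It applies the restrictive RFC 2047 Q
           rule and does not fold long words.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64);

# Hypothetical helper (not part of MIME::Charset): build one RFC 2047
# encoded-word from the (text, charset, encoding) triple that
# header_encode returns.  Long words are not folded.
sub make_encoded_word {
    my ($octets, $charset, $encoding) = @_;

    # undef encoding, or the special "8BIT" charset: leave the text as-is.
    return $octets
        unless defined $encoding && defined $charset && $charset ne '8BIT';

    my $scheme = uc $encoding;
    my $payload;
    if ($scheme eq 'B') {
        $payload = encode_base64($octets, '');
    }
    else {
        # "Q": keep letters, digits, "!*+-/"; spaces become "_";
        # every other octet becomes "=XX".
        ($payload = $octets) =~ s{([^A-Za-z0-9!*+\-/ ])}{sprintf '=%02X', ord $1}ge;
        $payload =~ tr/ /_/;
    }
    return "=?$charset?$scheme?$payload?=";
}

print make_encoded_word("Perl\xe8\xa8\x80\xe8\xaa\x9e", 'UTF-8', 'B'), "\n";
# prints =?UTF-8?B?UGVybOiogOiqng==?=
print make_encoded_word("caf\xe9", 'ISO-8859-1', 'Q'), "\n";
# prints =?ISO-8859-1?Q?caf=E9?=
```

           The B result for the SYNOPSIS string is 28 characters long.  This
           matches the length that "encoded_header_len" reports there.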

       $charset->undecode(STRING [,CHECK])
           Encodes the Unicode string STRING to a byte string using the input
           charset of $charset.  This is equivalent to
           "$charset->decoder->encode()".

           Note: When Unicode/multibyte support is disabled (see
           "USE_ENCODE"), this function dies.
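
           "undecode" is described as "$charset->decoder->encode()".  The
           decode/undecode pair therefore amounts to a round trip through
           an "Encode::Encoding" object.  The sketch below shows that round
           trip with core Encode alone.  For illustration, it assumes that
           the decoder of an ISO-8859-1 charset object behaves like the
           object returned by Encode::find_encoding("ISO-8859-1").

```perl
use strict;
use warnings;
use Encode qw(find_encoding);

# Assumption for illustration: the decoder of an ISO-8859-1 charset
# object behaves like this Encode::Encoding object.
my $decoder = find_encoding('ISO-8859-1')
    or die "ISO-8859-1 is not known to Encode\n";

my $octets  = "path\xe9tiquement";
my $unicode = $decoder->decode($octets);   # bytes -> Unicode (cf. decode)
my $back    = $decoder->encode($unicode);  # Unicode -> bytes (cf. undecode)

printf "%d characters; round trip %s\n",
    length $unicode, $back eq $octets ? 'ok' : 'failed';
```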

   Manipulating Module Defaults
       alias ALIAS [, CHARSET]
           Gets/sets a charset alias for the canonical names determined by
           "canonical_charset".

           If CHARSET is given and isn't false, ALIAS is assigned as an alias
           of CHARSET; otherwise, the alias is not changed.  In both cases,
           the charset name to which ALIAS is currently assigned is returned.

       default [CHARSET]
           Gets/sets the default charset.

           The default charset is used by this module when the charset
           context is unknown.  Modules using this module are recommended to
           use this charset when the charset context is unknown or an
           implicit default is expected.  By default, it is "US-ASCII".

           If CHARSET is given and isn't false, it becomes the default
           charset; otherwise, the default charset is not changed.  In both
           cases, the current default charset is returned.

           NOTE: The default charset should not be changed.

       fallback [CHARSET]
           Gets/sets the fallback charset.

           The fallback charset is used by this module when conversion by the
           given charset fails and the "FALLBACK" error handling scheme is
           specified.  Modules using this module may use this charset as the
           last resort for conversion.  By default, it is "UTF-8".

           If CHARSET is given and isn't false, it becomes the fallback
           charset.  If CHARSET is "NONE", the fallback charset becomes
           undefined.  Otherwise, the fallback charset is not changed.  In
           any case, the current fallback charset is returned.

           NOTE: It is useful to specify "US-ASCII" as the fallback charset,
           since the result of conversion will then be readable without
           charset information.
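
           A sketch of the get/set calling convention, assuming MIME::Charset
           is installed.  Calling each function without a CHARSET only reads
           the current value.  The snippet saves the fallback charset and
           restores it afterwards, using "NONE" if it was undefined, so the
           change stays local.

```perl
use strict;
use warnings;
use MIME::Charset;

# alias: register "csEUCKR" as another name for EUC-KR.
# The charset name the alias now points to is returned.
my $target = MIME::Charset::alias('csEUCKR', 'euc-kr');

# default/fallback without an argument only read the current value.
my $default = MIME::Charset::default();

# Switch the fallback to US-ASCII temporarily, then restore it.
# "NONE" is used to restore an undefined fallback.
my $saved = MIME::Charset::fallback();
MIME::Charset::fallback('us-ascii');
my $now = MIME::Charset::fallback();
MIME::Charset::fallback(defined $saved ? $saved : 'NONE');

print "csEUCKR -> $target\n";
print "default  = $default\n";
print "fallback = $now (restored afterwards)\n";
```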

       recommended CHARSET [, HEADERENC, BODYENC [, ENCCHARSET]]
           Gets/sets charset profiles.

           If optional arguments are given and any of them is not false, the
           profiles for CHARSET are set from those arguments; otherwise, the
           profiles are not changed.  In both cases, the current profiles for
           CHARSET are returned as a 3-item list of (HEADERENC, BODYENC,
           ENCCHARSET).

           HEADERENC is the recommended encoding scheme for the message
           header.  It may be one of "B", "Q", "S" (whichever of "B" and "Q"
           gives the shorter result) or "undef" (need not be encoded).

           BODYENC is the recommended transfer-encoding for the message body.
           It may be one of "B", "Q" or "undef" (need not be
           transfer-encoded).

           ENCCHARSET is a charset that is compatible with the given CHARSET
           and is recommended for MIME messages on the Internet.  If
           conversion is not needed (or this module doesn't know an
           appropriate charset), ENCCHARSET is "undef".

           NOTE: In future releases this function may accept more optional
           arguments (for example, properties to handle character widths,
           line folding behavior, ...), so the format of the returned value
           may change.  Use "header_encoding", "body_encoding" or
           "output_charset" to get a particular profile.

   Constants
       USE_ENCODE
           Unicode/multibyte support flag.  It is set to a non-empty string
           when Unicode and multibyte support is enabled.  Currently, this
           flag is non-empty on Perl 5.8.1 or later and an empty string on
           earlier versions of Perl.

   Error Handling
       "body_encode" and "header_encode" accept the following "Replacement"
       options:

       "DEFAULT"
           Puts a substitution character in place of a malformed character.
           For UCM-based encodings, <subchar> is used.

       "FALLBACK"
           Tries the "DEFAULT" scheme using the fallback charset (see
           "fallback").  When the fallback charset is undefined and
           conversion causes an error, the code dies with an error message.

       "CROAK"
           The code dies immediately on error with an error message.
           Therefore, you should trap the fatal error with eval{} unless you
           really want it to die on error.  "STRICT" is a synonym.

       "PERLQQ"
       "HTMLCREF"
       "XMLCREF"
           Uses the "FB_PERLQQ", "FB_HTMLCREF" or "FB_XMLCREF" scheme defined
           by the Encode module.

       numeric values
           Numeric values are also allowed.  For more details see "Handling
           Malformed Data" in Encode.

       If the error handling scheme is not specified or an unknown scheme is
       specified, "DEFAULT" is assumed.
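
       The "PERLQQ", "HTMLCREF", "XMLCREF" and "CROAK" schemes correspond to
       Encode's FB_* constants.  The sketch below calls Encode directly
       rather than through MIME::Charset.  It shows what each scheme does
       with a character that ISO-8859-1 cannot represent.  The "DEFAULT"
       line shows Encode's own substitution character for that encoding.

```perl
use strict;
use warnings;
use Encode qw(encode FB_PERLQQ FB_HTMLCREF FB_XMLCREF FB_CROAK);

# U+263A WHITE SMILING FACE has no mapping in ISO-8859-1.
my $smile = "\x{263A}";
my %out;

# CHECK omitted: the substitution character is used ("?" here).
$out{DEFAULT}  = encode('ISO-8859-1', $smile);
$out{PERLQQ}   = encode('ISO-8859-1', $smile, FB_PERLQQ);
$out{HTMLCREF} = encode('ISO-8859-1', $smile, FB_HTMLCREF);
$out{XMLCREF}  = encode('ISO-8859-1', $smile, FB_XMLCREF);

# FB_CROAK dies on the first unmappable character, so trap it with eval.
# A copy is passed because encode() may modify its source argument
# when CHECK lacks the LEAVE_SRC bit.
my $copy = $smile;
$out{CROAK} = eval { encode('ISO-8859-1', $copy, FB_CROAK); 1 } ? 'no error' : 'died';

printf "%-8s %s\n", $_, $out{$_} for sort keys %out;
```

       The expected results are as follows: CROAK reports "died", DEFAULT
       gives "?", HTMLCREF gives "&#9786;", PERLQQ gives "\x{263a}" and
       XMLCREF gives "&#x263a;".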

   Configuration File
       Built-in defaults for option parameters can be overridden by the
       configuration file MIME/Charset/Defaults.pm.  For more details, read
       MIME/Charset/Defaults.pm.sample.

VERSION
       Consult the $VERSION variable.

       Development versions of this module may be found at
       <http://hatuka.nezumi.nu/repos/MIME-Charset/>.

SEE ALSO
       Multipurpose Internet Mail Extensions (MIME).

AUTHOR
       Copyright (C) 2006-2008 Hatuka*nezumi - IKEDA Soji
       <hatuka(at)nezumi.nu>.

       All rights reserved.  This program is free software; you can
       redistribute it and/or modify it under the same terms as Perl itself.

perl v5.12.0                      2008-04-17                      Charset(3)