MIME::Charset(3pm)

Charset(3)            User Contributed Perl Documentation           Charset(3)

NAME

       MIME::Charset - Charset Information for MIME

SYNOPSIS

           use MIME::Charset;

           $charset = MIME::Charset->new("euc-jp");

       Getting charset information:

           $benc = $charset->body_encoding; # e.g. "Q"
           $cset = $charset->canonical_charset; # e.g. "US-ASCII"
           $henc = $charset->header_encoding; # e.g. "S"
           $cset = $charset->output_charset; # e.g. "ISO-2022-JP"

       Translating text data:

           # The output charset goes into $cset so that the $charset
           # object is not overwritten.
           ($text, $cset, $encoding) =
               $charset->header_encode(
                  "\xc9\xc2\xc5\xaa\xc0\xde\xc3\xef\xc5\xaa".
                  "\xc7\xd1\xca\xaa\xbd\xd0\xce\xcf\xb4\xef");
           # ...returns e.g. (<converted>, "ISO-2022-JP", "B");

           ($text, $cset, $encoding) =
               $charset->body_encode(
                   "Collectioneur path\xe9tiquement ".
                   "\xe9clectique de d\xe9chets");
           # ...returns e.g. (<original>, "ISO-8859-1", "QUOTED-PRINTABLE");

           $len = $charset->encoded_header_len(
               "Perl\xe8\xa8\x80\xe8\xaa\x9e", "b"); # e.g. 28

       Manipulating module defaults:

           use MIME::Charset;

           MIME::Charset::alias("csEUCKR", "euc-kr");
           MIME::Charset::default("iso-8859-1");
           MIME::Charset::fallback("us-ascii");

       Non-OO functions (may be deprecated in the near future):

           use MIME::Charset qw(:info);

           $benc = body_encoding("iso-8859-2"); # "Q"
           $cset = canonical_charset("ANSI X3.4-1968"); # "US-ASCII"
           $henc = header_encoding("utf-8"); # "S"
           $cset = output_charset("shift_jis"); # "ISO-2022-JP"

           use MIME::Charset qw(:trans);

           ($text, $charset, $encoding) =
               header_encode(
                  "\xc9\xc2\xc5\xaa\xc0\xde\xc3\xef\xc5\xaa".
                  "\xc7\xd1\xca\xaa\xbd\xd0\xce\xcf\xb4\xef",
                  "euc-jp");
           # ...returns (<converted>, "ISO-2022-JP", "B");

           ($text, $charset, $encoding) =
               body_encode(
                   "Collectioneur path\xe9tiquement ".
                   "\xe9clectique de d\xe9chets",
                   "latin1");
           # ...returns (<original>, "ISO-8859-1", "QUOTED-PRINTABLE");

           $len = encoded_header_len(
               "Perl\xe8\xa8\x80\xe8\xaa\x9e", "b", "utf-8"); # 28

DESCRIPTION

       MIME::Charset provides information about character sets used for MIME
       messages on the Internet.

   Definitions
       The charset is the ``character set'' used in MIME to refer to a method
       of converting a sequence of octets into a sequence of characters.  It
       covers both the ``coded character set'' (CCS) and the ``character
       encoding scheme'' (CES) concepts of ISO/IEC.

       The encoding is the term used in MIME to refer to a method of
       representing a body part or a header body as sequence(s) of printable
       US-ASCII characters.

   Constructor
       $charset = MIME::Charset->new([CHARSET [, OPTS]])
           Create a charset object.

           OPTS may contain the following key-value pairs.  NOTE: When
           Unicode/multibyte support is disabled (see "USE_ENCODE"),
           conversion is not performed, so these options have no effect.

           Mapping => MAPTYPE
               Specify which mappings are actually used for charset names.
               "EXTENDED" uses extended mappings.  "STANDARD" uses
               standardized strict mappings.  Default is "EXTENDED".

   Getting Information about Charsets
       $charset->body_encoding
       body_encoding CHARSET
           Get the recommended transfer-encoding of CHARSET for the message
           body.

           The returned value is one of "B" (BASE64), "Q" (QUOTED-PRINTABLE)
           or "undef" (may be sent without transfer-encoding; either 7BIT or
           8BIT).  This may differ from the encoding for the message header.

       $charset->as_string
       canonical_charset CHARSET
           Get the canonical name of the charset.

       $charset->decoder
           Get the "Encode::Encoding" object used to decode strings in this
           charset to Unicode.

       $charset->dup
           Get a copy of the charset object.

       $charset->encoder([CHARSET])
           Get the "Encode::Encoding" object used to encode Unicode strings
           in the compatible charset recommended for messages on the
           Internet.

           If the optional CHARSET is specified, the encoder (and output
           charset name) of the $charset object is replaced with those of
           CHARSET.  The $charset object then acts as a converter between
           the original charset and the new CHARSET.

       $charset->header_encoding
       header_encoding CHARSET
           Get the recommended encoding scheme of CHARSET for the message
           header.

           The returned value is one of "B", "Q", "S" (whichever of "B" or
           "Q" is shorter) or "undef" (may be left unencoded).  This may
           differ from the encoding for the message body.

       $charset->output_charset
       output_charset CHARSET
           Get a charset that is compatible with the given CHARSET and is
           recommended for MIME messages on the Internet (if this module
           knows one).

           When Unicode/multibyte support is disabled (see "USE_ENCODE"), this
           function simply returns the result of "canonical_charset".

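       The query methods above can be combined as in the following minimal
       sketch.  The charset names are the ones used in the SYNOPSIS; the
       helper show() is hypothetical and only makes "undef" results
       printable.  The values printed depend on the installed version of
       this module and on whether Unicode/multibyte support is enabled.

```perl
use strict;
use warnings;
use MIME::Charset;

# Hypothetical helper: render undef ("no encoding recommended") visibly.
sub show { return defined $_[0] ? $_[0] : "(none)" }

for my $name ("iso-8859-2", "utf-8", "shift_jis") {
    my $charset = MIME::Charset->new($name);

    printf "%-12s canonical=%s header=%s body=%s output=%s\n",
        $name,
        show($charset->as_string),        # canonical charset name
        show($charset->header_encoding),  # "B", "Q", "S" or undef
        show($charset->body_encoding),    # "B", "Q" or undef
        show($charset->output_charset);   # charset recommended for output
}
```

       Creating one object per charset keeps the lookups together and
       avoids the non-OO functions, which may be deprecated.
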
   Translating Text Data
       $charset->body_encode(STRING [, OPTS])
       body_encode STRING, CHARSET [, OPTS]
           Get the converted (if needed) data of STRING and the recommended
           transfer-encoding of that data for the message body.  CHARSET is
           the charset in which STRING is encoded.

           OPTS may contain the following key-value pairs.  NOTE: When
           Unicode/multibyte support is disabled (see "USE_ENCODE"),
           conversion is not performed, so these options have no effect.

           Detect7bit => YESNO
               Try to auto-detect a 7-bit charset when CHARSET is not given.
               Default is "YES".

           Replacement => REPLACEMENT
               Specifies the error handling scheme.  See "Error Handling".

           A 3-item list of (converted string, charset for output,
           transfer-encoding) is returned.  The transfer-encoding is one of
           "BASE64", "QUOTED-PRINTABLE", "7BIT" or "8BIT".  If the charset
           for output cannot be determined and the converted string contains
           non-ASCII byte(s), the charset for output is "undef" and the
           transfer-encoding is "BASE64".  The charset for output is
           "US-ASCII" if and only if the string does not contain any
           non-ASCII bytes.

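           body_encode only recommends a transfer-encoding; it does not
           apply it.  The following sketch shows one way a caller might act
           on the returned list.  It assumes the core modules MIME::Base64
           and MIME::QuotedPrint for the actual encoding; the %apply table
           and the variable names are illustrative and not part of this
           module.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64);
use MIME::QuotedPrint qw(encode_qp);
use MIME::Charset;

# Latin-1 octets, as in the SYNOPSIS example.
my $body = "Collectioneur path\xe9tiquement \xe9clectique de d\xe9chets";

my $charset = MIME::Charset->new("iso-8859-1");
my ($text, $out_charset, $cte) = $charset->body_encode($body);

# Illustrative dispatch table: one routine per transfer-encoding name
# that body_encode is documented to return.
my %apply = (
    "BASE64"           => sub { encode_base64($_[0]) },
    "QUOTED-PRINTABLE" => sub { encode_qp($_[0]) },
    "7BIT"             => sub { $_[0] },
    "8BIT"             => sub { $_[0] },
);
my $encoded = $apply{$cte}->($text);

# The output charset is undef when it could not be determined.
print "Content-Type: text/plain; charset=$out_charset\n"
    if defined $out_charset;
print "Content-Transfer-Encoding: ", lc($cte), "\n\n";
print $encoded, "\n";
```

           A lookup table keeps the four documented transfer-encodings in
           one place, so a name outside that set fails at the call instead
           of passing unnoticed.
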
       $charset->decode(STRING [,CHECK])
           Decode STRING to Unicode.

           Note: When Unicode/multibyte support is disabled (see
           "USE_ENCODE"), this function will die.

       $charset->encode(STRING [,CHECK])
           Encode STRING (Unicode or non-Unicode) in the compatible charset
           recommended for messages on the Internet (if this module knows
           one).  Note that the string is decoded to Unicode and then
           encoded again even if the compatible charset is the same as the
           original charset.

           Note: When Unicode/multibyte support is disabled (see
           "USE_ENCODE"), this function will die.

       $charset->encoded_header_len(STRING [, ENCODING])
       encoded_header_len STRING, ENCODING, CHARSET
           Get the length of the encoded STRING for the message header
           (without folding).

           ENCODING may be one of "B", "Q" or "S" (whichever of "B" or "Q"
           is shorter).

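           The value 28 in the SYNOPSIS can be reproduced by hand.  The
           sketch below is not the module's implementation: the helper
           sketch_encoded_header_len is hypothetical, it measures a single
           unfolded encoded-word of the form "=?CHARSET?X?payload?=", and
           its "Q" rule (letters, digits and a few marks stay as they are,
           every other octet costs three characters) is a simplifying
           assumption.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64);

# Hypothetical helper, not part of MIME::Charset.
sub sketch_encoded_header_len {
    my ($bytes, $encoding, $charset) = @_;

    # "=?" + charset + "?" + one encoding letter + "?" + "?="
    my $overhead = length("=?$charset?X??=");

    # "B": base64 yields 4 characters per 3 octets, padded.
    my $b_len = length(encode_base64($bytes, ""));

    # "Q" (assumed rule): count octets that stay literal, charge 3
    # characters ("=XX") for each of the others.
    my $literal = () = $bytes =~ /[A-Za-z0-9!*+\-\/]/g;
    my $q_len   = $literal + 3 * (length($bytes) - $literal);

    my %payload = (
        B => $b_len,
        Q => $q_len,
        S => ($b_len < $q_len ? $b_len : $q_len),   # shorter of the two
    );
    return $overhead + $payload{ uc $encoding };
}

my $octets = "Perl\xe8\xa8\x80\xe8\xaa\x9e";   # 10 octets of UTF-8
print sketch_encoded_header_len($octets, "b", "utf-8"), "\n";   # 28
print sketch_encoded_header_len($octets, "q", "utf-8"), "\n";   # 34
print sketch_encoded_header_len($octets, "s", "utf-8"), "\n";   # 28
```

           Here the 10 octets become 16 base64 characters, and the
           delimiters plus the charset name add 12, which gives 28.
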
       $charset->header_encode(STRING [, OPTS])
       header_encode STRING, CHARSET [, OPTS]
           Get the converted (if needed) data of STRING and the recommended
           encoding scheme of that data for message headers.  CHARSET is the
           charset in which STRING is encoded.

           OPTS may contain the following key-value pairs.  NOTE: When
           Unicode/multibyte support is disabled (see "USE_ENCODE"),
           conversion is not performed, so these options have no effect.

           Detect7bit => YESNO
               Try to auto-detect a 7-bit charset when CHARSET is not given.
               Default is "YES".

           Replacement => REPLACEMENT
               Specifies the error handling scheme.  See "Error Handling".

           A 3-item list of (converted string, charset for output, encoding
           scheme) is returned.  The encoding scheme is one of "B", "Q" or
           "undef" (may be left unencoded).  If the charset for output
           cannot be determined and the converted string contains non-ASCII
           byte(s), the charset for output is "8BIT" (this is not a charset
           name but a special value that represents unencodable data) and
           the encoding scheme is "undef" (should not be encoded).  The
           charset for output is "US-ASCII" if and only if the string does
           not contain any non-ASCII bytes.

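           Because the encoding scheme may be "undef", callers usually
           branch on it.  The following sketch shows that branch under the
           return values described above.  The header values and the
           messages printed are made up for illustration; the actual scheme
           reported for a charset depends on the installed version of this
           module.

```perl
use strict;
use warnings;
use MIME::Charset;

my $charset = MIME::Charset->new("iso-8859-1");

# One pure ASCII value and one containing a Latin-1 octet.
for my $value ("Plain subject", "D\xe9chets") {
    my ($text, $out_charset, $scheme) = $charset->header_encode($value);

    if (defined $scheme) {
        # "B" or "Q": the caller still has to build the encoded-word.
        print "$out_charset: encode with scheme $scheme\n";
    }
    elsif ($out_charset eq "8BIT") {
        # Special marker, not a charset name: data could not be encoded.
        print "unencodable data, leave as is\n";
    }
    else {
        # No encoding recommended, e.g. the output charset is US-ASCII.
        print "$out_charset: no encoding needed\n";
    }
}
```
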
       $charset->undecode(STRING [,CHECK])
           Encode the Unicode string STRING to a byte string in the input
           charset of $charset.  This is equivalent to
           "$charset->decoder->encode()".

           Note: When Unicode/multibyte support is disabled (see
           "USE_ENCODE"), this function will die.

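           decode and undecode are inverses for text that the input charset
           can represent.  The following sketch round-trips one character
           through a UTF-8 charset object.  It assumes that the USE_ENCODE
           constant can be called as MIME::Charset::USE_ENCODE(), which is
           how Perl constants are normally reached from outside a package.

```perl
use strict;
use warnings;
use MIME::Charset;

# decode, encode and undecode die without Unicode/multibyte support,
# so check the flag first (assumed to be callable as a constant sub).
die "Unicode/multibyte support is disabled\n"
    unless MIME::Charset::USE_ENCODE();

my $charset = MIME::Charset->new("utf-8");

my $octets  = "\xc3\xa9";                      # UTF-8 octets for U+00E9
my $unicode = $charset->decode($octets);       # one Unicode character
my $again   = $charset->undecode($unicode);    # octets in the input charset

printf "characters=%d code point=U+%04X round trip %s\n",
    length($unicode), ord($unicode),
    $again eq $octets ? "ok" : "differs";
```
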
   Manipulating Module Defaults
       alias ALIAS [, CHARSET]
           Get/set a charset alias for the canonical names determined by
           "canonical_charset".

           If CHARSET is given and isn't false, ALIAS is assigned as an
           alias of CHARSET.  Otherwise, the alias is not changed.  In both
           cases, the charset name that ALIAS currently refers to is
           returned.

       default [CHARSET]
           Get/set the default charset.

           The default charset is used by this module when the charset
           context is unknown.  Modules using this module are recommended to
           use this charset when the charset context is unknown or an
           implicit default is expected.  By default, it is "US-ASCII".

           If CHARSET is given and isn't false, it becomes the default
           charset.  Otherwise, the default charset is not changed.  In both
           cases, the current default charset is returned.

           NOTE: Default charset should not be changed.

       fallback [CHARSET]
           Get/set the fallback charset.

           The fallback charset is used by this module when conversion by
           the given charset fails and the "FALLBACK" error handling scheme
           is specified.  Modules using this module may use this charset as
           a last resort for conversion.  By default, it is "UTF-8".

           If CHARSET is given and isn't false, it becomes the fallback
           charset.  If CHARSET is "NONE", the fallback charset becomes
           undefined.  Otherwise, the fallback charset is not changed.  In
           all cases, the current fallback charset is returned.

           NOTE: Specifying "US-ASCII" as the fallback charset is useful,
           since the result of conversion will be readable without charset
           information.

       recommended CHARSET [, HEADERENC, BODYENC [, ENCCHARSET]]
           Get/set charset profiles.

           If optional arguments are given and any of them is not false,
           the profiles for CHARSET are set from those arguments.
           Otherwise, the profiles are not changed.  In both cases, the
           current profiles for CHARSET are returned as a 3-item list of
           (HEADERENC, BODYENC, ENCCHARSET).

           HEADERENC is the recommended encoding scheme for the message
           header.  It may be one of "B", "Q", "S" (whichever of "B" or "Q"
           is shorter) or "undef" (may be left unencoded).

           BODYENC is the recommended transfer-encoding for the message
           body.  It may be one of "B", "Q" or "undef" (may be sent without
           transfer-encoding).

           ENCCHARSET is a charset that is compatible with the given CHARSET
           and is recommended for MIME messages on the Internet.  If
           conversion is not needed (or this module doesn't know an
           appropriate charset), ENCCHARSET is "undef".

           NOTE: In future releases this function may accept more optional
           arguments (for example, properties to handle character widths,
           line folding behavior, ...), so the format of the returned value
           may change.  Use "header_encoding", "body_encoding" or
           "output_charset" to get a particular profile.

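       Since these settings are global to the module, code that changes one
       may want to restore it afterwards.  The following sketch reads the
       current values, registers the alias from the SYNOPSIS, and switches
       the fallback charset only temporarily.  The variable names are
       illustrative.

```perl
use strict;
use warnings;
use MIME::Charset;

# Without a CHARSET argument both functions only report the setting.
my $default  = MIME::Charset::default();
my $fallback = MIME::Charset::fallback();
printf "default=%s fallback=%s\n",
    $default, defined $fallback ? $fallback : "(undefined)";

# Register an alias; the charset it now refers to is returned.
my $target = MIME::Charset::alias("csEUCKR", "euc-kr");
print "csEUCKR is an alias of $target\n";

# Use US-ASCII as the fallback for a while ...
MIME::Charset::fallback("us-ascii");
# ... conversions with Replacement => "FALLBACK" would go here ...

# ... then restore the earlier setting ("NONE" undefines it).
MIME::Charset::fallback(defined $fallback ? $fallback : "NONE");
```
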
   Constants
       USE_ENCODE
           Unicode/multibyte support flag.  It is a non-empty string when
           Unicode and multibyte support is enabled.  Currently, this flag
           is non-empty on Perl 5.8.1 or later and an empty string on
           earlier versions of Perl.

   Error Handling
       "body_encode" and "header_encode" accept the following "Replacement"
       options:

       "DEFAULT"
           Put a substitution character in place of a malformed character.
           For UCM-based encodings, <subchar> will be used.

       "FALLBACK"
           Try the "DEFAULT" scheme using the fallback charset (see
           "fallback").  When the fallback charset is undefined and
           conversion causes an error, the code will die with an error
           message.

       "CROAK"
           The code will die immediately on error with an error message.
           Therefore, you should trap the fatal error with eval{} unless you
           really want to let it die.  A synonym is "STRICT".

       "PERLQQ"
       "HTMLCREF"
       "XMLCREF"
           Use the "FB_PERLQQ", "FB_HTMLCREF" or "FB_XMLCREF" scheme defined
           by the Encode module.

       numeric values
           Numeric values are also allowed.  For more details see "Handling
           Malformed Data" in Encode.

       If no error handling scheme is specified or an unknown scheme is
       specified, "DEFAULT" will be assumed.

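       Several of these schemes are named after CHECK values of the Encode
       module, so their effect can be previewed with Encode alone.  The
       following sketch does not call MIME::Charset; it encodes a character
       that US-ASCII cannot represent under each CHECK value.  The helper
       to_ascii is hypothetical, and the "FALLBACK" scheme, which is
       specific to this module, is not covered.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: encode "caf" plus U+00E9 into US-ASCII under the
# given CHECK value.  A fresh copy is used on every call because Encode
# may modify its argument when CHECK is set.
sub to_ascii {
    my ($check) = @_;
    my $copy = "caf\x{e9}";
    return Encode::encode("ascii", $copy, $check);
}

print to_ascii(Encode::FB_DEFAULT()),  "\n";   # substitution character: caf?
print to_ascii(Encode::FB_PERLQQ()),   "\n";   # Perl-style \x{...} escape
print to_ascii(Encode::FB_HTMLCREF()), "\n";   # decimal reference: caf&#233;
print to_ascii(Encode::FB_XMLCREF()),  "\n";   # hexadecimal character reference

# FB_CROAK dies on the first unmappable character, so trap it with eval
# as advised for the "CROAK" scheme.
my $ok = eval { to_ascii(Encode::FB_CROAK()); 1 };
print "conversion died: $@" unless $ok;
```
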
   Configuration File
       Built-in defaults for option parameters can be overridden by the
       configuration file MIME/Charset/Defaults.pm.  For more details, read
       MIME/Charset/Defaults.pm.sample.

VERSION

       Consult the $VERSION variable.

       Development versions of this module may be found at
       <http://hatuka.nezumi.nu/repos/MIME-Charset/>.

AUTHORS

       Copyright (C) 2006-2008 Hatuka*nezumi - IKEDA Soji
       <hatuka(at)nezumi.nu>.

       All rights reserved.  This program is free software; you can
       redistribute it and/or modify it under the same terms as Perl itself.

perl v5.12.0                      2008-04-17                        Charset(3)
