Locale::Recode(3)     User Contributed Perl Documentation    Locale::Recode(3)

NAME
       Locale::Recode - Object-Oriented Portable Charset Conversion

SYNOPSIS
           use Locale::Recode;

           $cd = Locale::Recode->new (from => 'UTF-8',
                                      to   => 'ISO-8859-1');

           die $cd->getError if $cd->getError;

           $cd->recode ($text) or die $cd->getError;

           $mime_name = Locale::Recode->resolveAlias ('latin-1');

           $supported = Locale::Recode->getSupported;

           $complete = Locale::Recode->getCharsets;

DESCRIPTION
       This module provides routines that convert textual data from one
       codeset to another in a portable way.  The module was started before
       Encode(3) was written.  Its main purpose today is to provide charset
       conversion even when Encode(3) is not available on the system.  It
       should also work with older Perl versions that lack Unicode support.

       Internally, Locale::Recode(3) uses Encode(3) whenever possible, which
       allows for faster conversion and a wider range of supported charsets.
       It falls back to the pure Perl implementation only when Encode(3) is
       not available or does not support a particular charset that
       Locale::Recode(3) does.

       Locale::Recode(3) is part of libintl-perl, and its main purpose is
       actually to implement a portable charset conversion framework for the
       message translation facilities described in Locale::TextDomain(3).

CONSTRUCTOR
       The constructor "new()" requires two named arguments:

       from
           The encoding of the original data.  Case does not matter, and
           aliases are resolved.

       to  The target encoding.  Again, case does not matter, and aliases are
           resolved.

       The constructor will never fail.  In case of an error, the object's
       internal state is set to bad and it will refuse to do any conversions.
       You can query the reason for the failure with the method getError().
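       Because the constructor returns an object even when something went
       wrong, callers have to check getError() themselves.  The following is
       a minimal sketch of that pattern; the charset name 'NO-SUCH-CHARSET'
       is made up and serves only to provoke an error, and the wording of
       the error message is not specified here.

```perl
use strict;
use warnings;
use Locale::Recode;

# 'NO-SUCH-CHARSET' is a made-up name, used here only to provoke an error.
my $cd = Locale::Recode->new (from => 'UTF-8',
                              to   => 'NO-SUCH-CHARSET');

# new() still returns an object; the failure shows up in getError().
my $error = $cd->getError;
if ($error) {
    print "cannot convert: $error\n";
} else {
    print "converter is ready\n";
}
```

       Checking getError() right after construction reports the problem at
       the point where the encodings were named, rather than at the first
       call to recode().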

OBJECT METHODS
       The following object methods are available.

       recode (STRING)
           Converts STRING from the source encoding into the destination
           encoding.  Returns a true value on success and false otherwise.
           You can query the reason for a failure with the method getError().

       getError
           Returns false if the object is not in an error state, or an error
           message otherwise.
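       The two methods are typically used together.  The sketch below wraps
       them in a small helper; convert_copy() is a hypothetical name and not
       part of Locale::Recode.  It assumes, as the synopsis suggests, that
       recode() rewrites the string passed to it rather than returning the
       converted text.

```perl
use strict;
use warnings;
use Locale::Recode;

# Hypothetical helper (not part of Locale::Recode): returns a converted copy
# of $text and leaves the caller's string untouched.
sub convert_copy {
    my ($text, $from, $to) = @_;

    my $cd = Locale::Recode->new (from => $from, to => $to);
    die $cd->getError if $cd->getError;

    # Assumption: recode() rewrites its argument in place, so we hand it
    # our own copy of the string.
    $cd->recode ($text) or die $cd->getError;
    return $text;
}

my $latin1 = "caf\xe9";    # "cafe" with an acute accent, in ISO-8859-1
my $utf8   = convert_copy ($latin1, 'ISO-8859-1', 'UTF-8');

printf "%d byte(s) in, %d byte(s) out\n", length $latin1, length $utf8;
```

       Working on a copy keeps the original data available in case the
       conversion fails halfway through.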

CLASS METHODS
       The class also provides the following class methods:

       getSupported
           Returns a reference to a list of all supported charsets.  This may
           implicitly load additional Encode(3) conversions such as
           Encode::HanExtra(3), which may put considerable load on your
           system.

           The method is therefore not intended for regular use, but rather
           for retrieving or displaying a list of available encodings once.

           The members of the list are all converted to uppercase!

       getCharsets
           Like getSupported() but also returns all available aliases.
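       Since getSupported() can be expensive, it makes sense to call it once
       and keep the result.  The following sketch builds a lookup table from
       the list; the variable names are illustrative, and it assumes that
       getCharsets() returns an array reference just as getSupported() does.

```perl
use strict;
use warnings;
use Locale::Recode;

# Fetch the list once and keep it; the call may load many Encode modules.
my $supported = Locale::Recode->getSupported;

# Names come back in uppercase, so normalise the name we look for.
my %is_supported = map { $_ => 1 } @$supported;
my $wanted = uc 'iso-8859-1';

printf "%d charsets supported\n", scalar @$supported;
print "$wanted is ", ($is_supported{$wanted} ? '' : 'not '), "supported\n";

# Assumption: getCharsets() also returns an array reference.
my $complete = Locale::Recode->getCharsets;
printf "%d names including aliases\n", scalar @$complete;
```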
84
86 The range of supported charsets is system-dependent. The following
87 somewhat special charsets are always available:
88
89 UTF-8
90 UTF-8 is available independently of your Perl version. For Perl
91 5.6 or better or in the presence of Encode(3), conversions are not
92 done in Perl but with the interfaces provided by these facilities
93 which are written in C, hence much faster.
94
95 Encoding data into UTF-8 is fast, even if it is done in Perl.
96 Decoding it in Perl may become quite slow. If you frequently have
97 to decode UTF-8 with Locale::Recode you will probably want to make
98 sure that you do that with Perl 5.6 or beter, or install Encode(3)
99 to speed up things.
100
101 INTERNAL
102 UTF-8 is fast to write but hard to read for applications. It is
103 therefore not the worst for internal string representation but not
104 far from that. Locale::Recode(3) stores strings internally as a
105 reference to an array of integer values like most programming lan‐
106 guages (Perl is an exception) do, trading memory for performance.
107
108 The integer values are the UCS-4 codes of the characters in host
109 byte order.
110
111 The encoding INTERNAL is directly availabe via Locale::Recode(3)
112 but of course you should not really use it for data exchange,
113 unless you know what you are doing.
114
115 Locale::Recode(3) has native support for a plethora of other encodings,
116 most of them 8 bit encodings that are fast to decode, including most
117 encodings used on popular micros like the ISO-8859-* series of encod‐
118 ings, most Windows-* encodings (also known as CP*), Macintosh, Atari,
119 etc.
120
       Each charset or encoding is available internally under a unique name.
       Whenever the information was available, the preferred MIME name (see
       <http://www.iana.org/assignments/character-sets/>) was chosen as the
       internal name.

       Alias handling is quite strict.  The module does not make wild guesses
       at what you mean ("What's the meaning of the acronym JIS" is a valid
       alias for "7bit-jis" in Encode(3) ...) but aims at providing common
       aliases only.  The same applies to so-called aliases that are really
       mistakes, like "utf8" for UTF-8.

       The module knows all aliases that are listed in the IANA character
       set registry (<http://www.iana.org/assignments/character-sets/>), plus
       those known to libiconv version 1.8, and a number of additional ones.
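       To see which internal name a given spelling maps to, the class method
       resolveAlias() from the synopsis can be called directly.  The sketch
       below probes a few names: 'latin-1' is taken from the synopsis, the
       other two are arbitrary probes.  It assumes that resolveAlias()
       returns a false value for names it does not know.

```perl
use strict;
use warnings;
use Locale::Recode;

# 'latin-1' comes from the synopsis; the other names are arbitrary probes
# and may or may not be known on a given system.
foreach my $name ('latin-1', 'ISO_8859-1', 'no-such-charset') {
    # Assumption: an unknown name yields a false value.
    my $canonical = Locale::Recode->resolveAlias ($name);
    if ($canonical) {
        print "$name => $canonical\n";
    } else {
        print "$name is not a known alias\n";
    }
}
```

       Resolving a user-supplied name up front makes it possible to report
       an unknown charset before any converter object is created.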

CONVERSION TABLES
       The conversion tables have been taken from official sources like the
       IANA or the Unicode Consortium, from Bruno Haible's libiconv, or from
       the sources of the GNU libc.  The regression tests for libintl-perl
       check for conformance with these sources.  For some encodings this
       data differs from Encode(3)'s data, which would cause these tests to
       fail.  In these cases, the module does not invoke the Encode(3)
       methods, but falls back to the internal implementation for the sake
       of consistency.

       The few encodings that are affected are so simple that you will not
       experience any real performance penalty unless you convert large
       chunks of data.  The package is not really intended for such use
       anyway, and since Encode(3) is relatively new, I rather think that the
       differences are bugs in Encode which will be fixed soon.

BUGS
       The module should provide fallback conversions for other Unicode
       encoding schemes like UCS-2 and UCS-4 (big- and little-endian).

       The pure Perl UTF-8 decoder will not always handle corrupt UTF-8
       correctly, especially at the beginning and at the end of the string.
       This is not likely to be fixed, since the module is not intended to be
       a consistency checker for UTF-8 data.

AUTHOR
       Copyright (C) 2002-2004, Guido Flohr <guido@imperia.net>, all rights
       reserved.  See the source code for details.

       This software is contributed to the Perl community by Imperia
       (<http://www.imperia.net/>).

SEE ALSO
       Encode(3), iconv(3), iconv(1), recode(1), perl(1)

perl v5.8.8                       2006-08-28                 Locale::Recode(3)