Unicode::MapUTF8(3)   User Contributed Perl Documentation   Unicode::MapUTF8(3)



NAME
       Unicode::MapUTF8 - Conversions to and from arbitrary character sets and
       UTF8

SYNOPSIS
        use Unicode::MapUTF8 qw(to_utf8 from_utf8 utf8_supported_charset
                                utf8_charset_alias);

        # Convert a string in 'ISO-8859-1' to 'UTF8'
        my $output = to_utf8({ -string => 'An example', -charset => 'ISO-8859-1' });

        # Convert a string in 'UTF8' encoding to encoding 'ISO-8859-1'
        my $other = from_utf8({ -string => 'Other text', -charset => 'ISO-8859-1' });

        # List available character set encodings
        my @character_sets = utf8_supported_charset;

        # Add a character set alias
        utf8_charset_alias({ 'ms-japanese' => 'sjis' });

        # Convert between two arbitrary (but largely compatible) charset encodings
        # (SJIS to EUC-JP)
        my $utf8_string   = to_utf8({ -string => $sjis_string, -charset => 'sjis' });
        my $euc_jp_string = from_utf8({ -string => $utf8_string, -charset => 'euc-jp' });

        # Verify that a specific character set is supported
        if (utf8_supported_charset('ISO-8859-1')) {
            # Yes
        }

DESCRIPTION
       Provides an adapter layer between core routines for converting to and
       from UTF8 and other encodings. In essence, it gives multiple existing
       Unicode modules a single common interface, so you don't have to know
       the underlying implementations to perform simple conversions between
       UTF8 and other character set encodings. As such, it wraps the
       Unicode::String, Unicode::Map8, Unicode::Map and Jcode modules in a
       standardized and simple API.

       This also provides a general character set conversion operation based
       on UTF8: it is possible to convert between any two compatible and
       supported character sets via a simple two step chaining of
       conversions.
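
       The two step chaining can be sketched as a small helper. This is an
       illustrative sketch only: the convert_charset name is hypothetical
       (it is not provided by the module), and the example assumes that
       Unicode::MapUTF8 and the backend modules for both charsets (Jcode for
       the Japanese encodings used here) are installed.

```perl
use strict;
use warnings;
use Unicode::MapUTF8 qw(to_utf8 from_utf8);

# Hypothetical helper (not part of Unicode::MapUTF8): convert $string from
# $source_charset to $target_charset by passing through UTF8.
sub convert_charset {
    my ($string, $source_charset, $target_charset) = @_;

    # Step 1: source charset -> UTF8
    my $utf8 = to_utf8({ -string => $string, -charset => $source_charset });

    # Step 2: UTF8 -> target charset
    return from_utf8({ -string => $utf8, -charset => $target_charset });
}

# Three kanji ("nihongo") as Shift_JIS bytes.
my $sjis_string   = "\x93\xFA\x96\x7B\x8C\xEA";
my $euc_jp_string = convert_charset($sjis_string, 'sjis', 'euc-jp');

# Print the converted bytes as hex for inspection.
print join(' ', map { sprintf '%02X', ord } split //, $euc_jp_string), "\n";
```

       Characters that exist in the source charset but not in the target
       charset cannot survive the second step, which is why the text above
       restricts chaining to compatible character sets.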

       As with most things Perlish, if you give it a few big chunks of text
       to chew on instead of lots of small ones it will handle many more
       characters per second.

       By design, it can be easily extended to encompass any new charset
       encoding conversion modules that arrive on the scene.

       This module is intended to provide good Unicode support to versions of
       Perl prior to 5.8. If you are using Perl 5.8.0 or later, you probably
       want to be using the Encode module instead. This module does work with
       Perl 5.8, but Encode is the preferred method in that environment.
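
       For readers on Perl 5.8.0 or later, comparable conversions with the
       core Encode module might look like the following. This is a minimal
       sketch; the variable names are illustrative, and 'shiftjis' and
       'euc-jp' are Encode's own charset names, not names taken from this
       module.

```perl
use strict;
use warnings;
use Encode qw(decode encode from_to);

# ISO-8859-1 bytes -> UTF-8 bytes (comparable to to_utf8).
my $latin1_bytes = "caf\xE9";
my $utf8_bytes   = encode('UTF-8', decode('ISO-8859-1', $latin1_bytes));

# UTF-8 bytes -> ISO-8859-1 bytes (comparable to from_utf8).
my $round_trip = encode('ISO-8859-1', decode('UTF-8', $utf8_bytes));

# Direct conversion between two charsets. from_to() converts its first
# argument in place instead of returning the converted string.
my $japanese = "\x93\xFA\x96\x7B\x8C\xEA";    # Shift_JIS bytes
from_to($japanese, 'shiftjis', 'euc-jp');

print "round trip ", ($round_trip eq $latin1_bytes ? 'ok' : 'failed'), "\n";
```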

CHANGES
       1.11 2005.10.10 - Documentation changes. Addition of Build.PL support.
                         Added various build tests, LICENSE,
                         Artistic_License.txt, GPL_License.txt. Split
                         documentation into separate .pod file. Added
                         Japanese translation of POD.

       1.10 2005.05.22 - Fixed bug in conversion of ISO-2022-JP to UTF-8.
                         Problem and fix found by Masahiro HONMA
                         <masahiro.honma@tsutaya.co.jp>.

                         Similar bugs in conversions of shift_jis and euc-jp
                         to UTF-8 fixed as well.

       1.09 2001.08.22 - Fixed multiple typo occurrences of 'uft'
                         where 'utf' was meant in code. Problem affected
                         utf16 and utf7 encodings. Problem found
                         by devon smith <devon@taller.PSCL.cwru.edu>.

       1.08 2000.11.06 - Added 'utf8_charset_alias' function to
                         allow for runtime setting of character
                         set aliases. Added several alternate
                         names for 'sjis' (shiftjis, shift-jis,
                         shift_jis, s-jis, and s_jis).

                         Corrected 'croak' messages for
                         'from_utf8' functions to appropriate
                         function name.

                         Tightened up initialization encapsulation.

                         Corrected fatal problem in jcode from
                         unicode internals. Problem and fix
                         found by Brian Wisti <wbrian2@uswest.net>.

       1.07 2000.11.01 - Added 'croak' to use Carp declaration to
                         fix error messages. Problem and fix
                         found by Brian Wisti
                         <wbrian2@uswest.net>.

       1.06 2000.10.30 - Fix to handle change in stringification
                         of overloaded objects between Perl 5.005
                         and 5.6. Problem noticed by Brian Wisti
                         <wbrian2@uswest.net>.

       1.05 2000.10.23 - Error in conversions from UTF8 to
                         multibyte encodings corrected.

       1.04 2000.10.23 - Additional diagnostic messages added
                         for internal error conditions.

       1.03 2000.10.22 - Bug fix for load time autodetection of
                         Unicode::Map8 encodings.

       1.02 2000.10.22 - Added load time autodetection of
                         Unicode::Map8 supported character set
                         encodings.

                         Fixed internal calling error for some
                         character sets with 'from_utf8'. Thanks
                         go to Ilia Lobsanov
                         <ilia@lobsanov.com> for reporting this
                         problem.

       1.01 2000.10.02 - Fixed handling of empty strings and
                         added more identification for error
                         messages.

       1.00 2000.09.29 - Pre-release version

FUNCTIONS
       utf8_charset_alias({ $alias => $charset });
           Used for runtime assignment of character set aliases.

           Called with no parameters, returns a hash of defined aliases and
           the character sets they map to.

           Example:

               my $aliases     = utf8_charset_alias;
               my @alias_names = keys %$aliases;

           If called with ONE parameter, returns the name of the 'real'
           charset if the alias is defined. Returns undef if it is not found
           in the aliases.

           Example:

               if (! utf8_charset_alias('VISCII')) {
                   # No alias for this
               }

           If called with a list of 'alias' => 'charset' pairs, defines those
           aliases for use.

           Example:

               utf8_charset_alias({ 'japanese' => 'sjis', 'japan' => 'sjis' });

           Note: It will croak if a passed pair does not map to a character
           set defined in the predefined set of character encodings. It is
           NOT allowed to alias something to another alias.

           Multiple character set aliases can be set with a single call.

           To clear an alias, pass a character set mapping of undef.

           Example:

               utf8_charset_alias({ 'japanese' => undef });

           While an alias is set, the 'utf8_supported_charset' function will
           return the alias as if it were a predefined charset.

           Overriding a base defined character encoding with an alias will
           generate a warning message to STDERR.
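
           The calling forms described above can be combined into one short
           alias lifecycle: define, look up, check, clear. This is a sketch
           that assumes Unicode::MapUTF8 and its Jcode backend are installed
           and that 'japanese' is not already a predefined charset name; the
           variable names are illustrative.

```perl
use strict;
use warnings;
use Unicode::MapUTF8 qw(utf8_charset_alias utf8_supported_charset);

# Define an alias, then look up the 'real' charset it maps to.
utf8_charset_alias({ 'japanese' => 'sjis' });
my $real_charset = utf8_charset_alias('japanese');

# While the alias is set it is reported as a supported charset.
my $supported_while_set = utf8_supported_charset('japanese') ? 1 : 0;

# Clear the alias by mapping it to undef; the lookup then returns undef.
utf8_charset_alias({ 'japanese' => undef });
my $after_clear = utf8_charset_alias('japanese');

print "alias resolved to: ", (defined $real_charset ? $real_charset : '(none)'), "\n";
print "supported while set: $supported_while_set\n";
print "defined after clear: ", (defined $after_clear ? 'yes' : 'no'), "\n";
```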

       utf8_supported_charset($charset_name);
           Returns true if the named charset is supported (including user
           defined aliases).

           Returns false if it is not.

           Example:

               if (! utf8_supported_charset('VISCII')) {
                   # No support yet
               }

           If called in a list context with no parameters, it will return a
           list of all supported character set names (including user defined
           aliases).

           Example:

               my @charsets = utf8_supported_charset;

       to_utf8({ -string => $string, -charset => $source_charset });
           Returns the string converted to UTF8 from the specified source
           charset.

       from_utf8({ -string => $string, -charset => $target_charset });
           Returns the string converted from UTF8 to the specified target
           charset.
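
           A guarded round trip through to_utf8 and from_utf8 might look
           like the following. This is a sketch under assumptions: it
           requires Unicode::MapUTF8 and its Unicode::Map8 backend to be
           installed, the variable names are illustrative, and the eval
           wrapper reflects the CHANGES entries above, which indicate that
           errors are reported with croak.

```perl
use strict;
use warnings;
use Unicode::MapUTF8 qw(to_utf8 from_utf8 utf8_supported_charset);

my $charset = 'ISO-8859-1';
my $latin1  = "caf\xE9";    # 'cafe' with e-acute, as ISO-8859-1 bytes

# Check support first so an unknown charset gives a clear message.
die "Charset '$charset' is not supported\n"
    unless utf8_supported_charset($charset);

# Trap errors raised by the conversion functions.
my ($utf8, $back);
my $ok = eval {
    $utf8 = to_utf8({ -string => $latin1, -charset => $charset });
    $back = from_utf8({ -string => $utf8, -charset => $charset });
    1;
};
die "Conversion failed: $@" unless $ok;

print "round trip ", ($back eq $latin1 ? 'preserved' : 'changed'), " the string\n";
```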

VERSION
       1.11 2005.10.10

TODO
       Regression tests for Jcode, 2-byte encodings and encoding aliases

SEE ALSO
       Unicode::String Unicode::Map8 Unicode::Map Jcode Encode

COPYRIGHT
       Copyright 2000-2005, Benjamin Franz. All rights reserved.

AUTHOR
       Benjamin Franz <snowhare@nihongo.org>

LICENSE
       This program is free software; you can redistribute it and/or modify it
       under the same terms and conditions as Perl itself.

       This means that you can, at your option, redistribute it and/or modify
       it under either the terms of the GNU Public License (GPL) version 1 or
       later, or under the Perl Artistic License.

       See http://dev.perl.org/licenses/

DISCLAIMER
       THIS SOFTWARE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED
       WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
       MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.

       Use of this software in any way or in any form, source or binary, is
       not allowed in any country which prohibits disclaimers of any implied
       warranties of merchantability or fitness for a particular purpose or
       any disclaimers of a similar nature.

       IN NO EVENT SHALL I BE LIABLE TO ANY PARTY FOR DIRECT, INDIRECT,
       SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OF
       THIS SOFTWARE AND ITS DOCUMENTATION (INCLUDING, BUT NOT LIMITED TO,
       LOST PROFITS) EVEN IF I HAVE BEEN ADVISED OF THE POSSIBILITY OF SUCH
       DAMAGE.



perl v5.8.8                       2006-10-29               Unicode::MapUTF8(3)