Unicode::MapUTF8(3)   User Contributed Perl Documentation   Unicode::MapUTF8(3)

NAME
    Unicode::MapUTF8 - Conversions to and from arbitrary character sets and
    UTF8

SYNOPSIS
    use Unicode::MapUTF8 qw(to_utf8 from_utf8 utf8_supported_charset utf8_charset_alias);

    # Convert a string in 'ISO-8859-1' to 'UTF8'
    my $output = to_utf8({ -string => 'An example', -charset => 'ISO-8859-1' });

    # Convert a string in 'UTF8' encoding to encoding 'ISO-8859-1'
    my $other = from_utf8({ -string => 'Other text', -charset => 'ISO-8859-1' });

    # List available character set encodings
    my @character_sets = utf8_supported_charset;

    # Add a character set alias
    utf8_charset_alias({ 'ms-japanese' => 'sjis' });

    # Convert between two arbitrary (but largely compatible) charset encodings
    # (SJIS to EUC-JP)
    my $utf8_string   = to_utf8({ -string => $sjis_string, -charset => 'sjis' });
    my $euc_jp_string = from_utf8({ -string => $utf8_string, -charset => 'euc-jp' });

    # Verify that a specific character set is supported
    if (utf8_supported_charset('ISO-8859-1')) {
        # Yes
    }

DESCRIPTION
    Provides an adapter layer between core routines for converting to and
    from UTF8 and other encodings. In essence, it gives multiple existing
    Unicode modules a single common interface, so you don't have to know
    the underlying implementations to do simple conversions between UTF8
    and other character set encodings. As such, it wraps the
    Unicode::String, Unicode::Map8, Unicode::Map and Jcode modules in a
    standardized and simple API.

    This also provides a general character set conversion operation based
    on UTF8: it is possible to convert between any two compatible and
    supported character sets via a simple two-step chaining of conversions.
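
    The two-step chaining described above can be sketched as a small
    helper. The name convert_charset is hypothetical and not part of
    Unicode::MapUTF8; the sketch assumes the module and its Jcode backend
    are installed and uses only the functions documented on this page.

```perl
use strict;
use warnings;
use Unicode::MapUTF8 qw(to_utf8 from_utf8 utf8_supported_charset);

# Hypothetical helper, not part of Unicode::MapUTF8: converts $string
# from charset $from to charset $to by going through UTF8.
sub convert_charset {
    my ($string, $from, $to) = @_;

    # Fail early with a clear message if either charset is unknown
    for my $charset ($from, $to) {
        die "Unsupported charset: $charset\n"
            unless utf8_supported_charset($charset);
    }

    # Step 1: source charset to UTF8. Step 2: UTF8 to target charset.
    my $utf8 = to_utf8({ -string => $string, -charset => $from });
    return from_utf8({ -string => $utf8, -charset => $to });
}

# Hiragana 'a' (U+3042) as Shift_JIS octets
my $sjis_string   = "\x82\xA0";
my $euc_jp_string = convert_charset($sjis_string, 'sjis', 'euc-jp');
```

    Characters that exist in the source charset but not in the target
    cannot survive the second step, which is why the two charsets need to
    be largely compatible.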

    As with most things Perlish, if you give it a few big chunks of text
    to chew on instead of lots of small ones, it will handle many more
    characters per second.
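
    As an illustration of the point about chunk size, the sketch below
    converts the same data once per line and once as a single buffer. It
    assumes a stateless single-byte charset such as ISO-8859-1, where
    joining the pieces before conversion gives the same result as
    converting each piece separately; that may not hold for every
    encoding.

```perl
use strict;
use warnings;
use Unicode::MapUTF8 qw(to_utf8);

my @lines = ("first line\n", "second line\n", "caf\xE9\n");

# Many small conversions: one call per line
my @per_line = map {
    to_utf8({ -string => $_, -charset => 'ISO-8859-1' })
} @lines;

# One large conversion: join first, then convert once
my $whole = to_utf8({
    -string  => join('', @lines),
    -charset => 'ISO-8859-1',
});
```

    Both approaches should yield the same octets here; the second simply
    pays the per-call overhead once.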

    By design, it can be easily extended to encompass any new charset
    encoding conversion modules that arrive on the scene.

    This module is intended to provide good Unicode support to versions of
    Perl prior to 5.8. If you are using Perl 5.8.0 or later, you probably
    want to use the Encode module instead. This module does work with
    Perl 5.8, but Encode is the preferred method in that environment.
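
    For readers on Perl 5.8 or later, a rough equivalent using the core
    Encode module might look like the sketch below. It is not a drop-in
    replacement: Encode distinguishes decoded character strings from
    encoded octets, so each direction is written here as a decode followed
    by an encode.

```perl
use strict;
use warnings;
use Encode qw(decode encode from_to);

my $latin1_bytes = "caf\xE9";   # ISO-8859-1 octets

# Roughly what to_utf8 does: source octets in, UTF-8 octets out
my $utf8_bytes = encode('UTF-8', decode('ISO-8859-1', $latin1_bytes));

# Roughly what from_utf8 does: UTF-8 octets in, target octets out
my $round_trip = encode('ISO-8859-1', decode('UTF-8', $utf8_bytes));

# from_to converts a buffer of octets in place between two encodings
my $buffer = $latin1_bytes;
from_to($buffer, 'ISO-8859-1', 'UTF-8');
```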

CHANGES
    1.14 2020.09.27  Fixing POD breakage in EUC-JP version of POD

    1.13 2020.09.27  Fixing MANIFEST.SKIP error

    1.12 2020.09.27  Build tool updates. Maintainer updates. POD error
                     fixes. Relicensed under MIT license.

    1.11 2005.10.10  Documentation changes. Addition of Build.PL support.
                     Added various build tests, LICENSE,
                     Artistic_License.txt, GPL_License.txt. Split
                     documentation into separate .pod file. Added Japanese
                     translation of POD.

    1.10 2005.05.22  Fixed bug in conversion of ISO-2022-JP to UTF-8.
                     Problem and fix found by Masahiro HONMA
                     <masahiro.honma@tsutaya.co.jp>.

                     Similar bugs in conversions of shift_jis and euc-jp
                     to UTF-8 corrected as well.

    1.09 2001.08.22  Fixed multiple typo occurrences of 'uft' where 'utf'
                     was meant in code. Problem affected utf16 and utf7
                     encodings. Problem found by devon smith
                     <devon@taller.PSCL.cwru.edu>

    1.08 2000.11.06  Added 'utf8_charset_alias' function to allow for
                     runtime setting of character set aliases. Added
                     several alternate names for 'sjis' (shiftjis,
                     shift-jis, shift_jis, s-jis, and s_jis).

                     Corrected 'croak' messages for 'from_utf8' functions
                     to appropriate function name.

                     Corrected fatal problem in jcode-unicode internals.
                     Problem and fix found by Brian Wisti
                     <wbrian2@uswest.net>.

    1.07 2000.11.01  Added 'croak' to use Carp declaration to fix error
                     messages. Problem and fix found by
                     <wbrian2@uswest.net>.

    1.06 2000.10.30  Fix to handle change in stringification of overloaded
                     objects between Perl 5.005 and 5.6. Problem noticed
                     by Brian Wisti <wbrian2@uswest.net>.

    1.05 2000.10.23  Error in conversions from UTF8 to multibyte encodings
                     corrected

    1.04 2000.10.23  Additional diagnostic error messages added for
                     internal errors

    1.03 2000.10.22  Bug fix for load time Unicode::Map encoding detection

    1.02 2000.10.22  Bug fix to 'from_utf8' method and load time detection
                     of Unicode::Map8 supported character set encodings

    1.01 2000.10.02  Initial public release

FUNCTIONS
    utf8_charset_alias({ $alias => $charset });
        Used for runtime assignment of character set aliases.

        Called with no parameters, returns a reference to a hash of the
        defined aliases and the character sets they map to.

        Example:

            my $aliases     = utf8_charset_alias;
            my @alias_names = keys %$aliases;

        If called with ONE parameter (an alias name), returns the name of
        the 'real' charset if the alias is defined. Returns undef if it is
        not found in the aliases.

        Example:

            if (! utf8_charset_alias('VISCII')) {
                # No alias for this
            }

        If called with a hash reference of 'alias' => 'charset' pairs,
        defines those aliases for use.

        Example:

            utf8_charset_alias({ 'japanese' => 'sjis', 'japan' => 'sjis' });

        Note: It will croak if a passed pair does not map to a character
        set defined in the predefined set of character encodings. It is
        NOT allowed to alias something to another alias.

        Multiple character set aliases can be set with a single call.

        To clear an alias, pass a character set mapping of undef.

        Example:

            utf8_charset_alias({ 'japanese' => undef });

        While an alias is set, the 'utf8_supported_charset' function will
        return the alias as if it were a predefined charset.

        Overriding a base defined character encoding with an alias will
        generate a warning message to STDERR.
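
        Because an invalid pair makes utf8_charset_alias croak, callers
        that take alias names from configuration or user input may want to
        trap the error. The sketch below is one way to do that; try_alias
        and the charset name 'no-such-charset' are hypothetical and only
        serve the illustration.

```perl
use strict;
use warnings;
use Unicode::MapUTF8 qw(utf8_charset_alias);

# Hypothetical helper, not part of Unicode::MapUTF8: tries to define an
# alias and returns 1 on success, 0 if utf8_charset_alias croaked.
sub try_alias {
    my ($alias, $charset) = @_;
    my $ok = eval { utf8_charset_alias({ $alias => $charset }); 1 };
    warn "Could not alias '$alias' to '$charset': $@" unless $ok;
    return $ok ? 1 : 0;
}

try_alias('ms-japanese', 'sjis');        # target is a predefined charset
try_alias('bogus', 'no-such-charset');   # expected to croak; trapped by eval

# Look up the real charset behind an alias (undef if it is not defined)
my $real = utf8_charset_alias('ms-japanese');

# Remove the alias again
utf8_charset_alias({ 'ms-japanese' => undef });
```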

    utf8_supported_charset($charset_name);
        Returns true if the named charset is supported (including user
        defined aliases).

        Returns false if it is not.

        Example:

            if (! utf8_supported_charset('VISCII')) {
                # No support yet
            }

        If called in a list context with no parameters, it will return a
        list of all supported character set names (including user defined
        aliases).

        Example:

            my @charsets = utf8_supported_charset;
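
        Since the set of supported charsets depends on which backend
        modules are installed, a caller can probe a list of candidates
        and fall back gracefully. A minimal sketch, assuming nothing
        beyond the function documented above; first_supported is a
        hypothetical helper name.

```perl
use strict;
use warnings;
use Unicode::MapUTF8 qw(utf8_supported_charset);

# Hypothetical helper, not part of Unicode::MapUTF8: returns the first
# candidate name that is supported, or undef if none is.
sub first_supported {
    my @candidates = @_;
    for my $name (@candidates) {
        return $name if utf8_supported_charset($name);
    }
    return;
}

my $charset = first_supported('VISCII', 'ISO-8859-1');
print "Using $charset\n" if defined $charset;

# List context with no parameters: every supported name, aliases included
my @charsets = utf8_supported_charset;
my @sorted   = sort { $a cmp $b } @charsets;
print scalar(@sorted), " charsets available\n";
```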

    to_utf8({ -string => $string, -charset => $source_charset });
        Returns the string converted to UTF8 from the specified source
        charset.

    from_utf8({ -string => $string, -charset => $target_charset });
        Returns the string converted from UTF8 to the specified target
        charset.
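
        The two functions are inverses for text that the target charset
        can represent. A small round-trip sketch, assuming ISO-8859-1
        support is available through the installed backend modules:

```perl
use strict;
use warnings;
use Unicode::MapUTF8 qw(to_utf8 from_utf8);

my $latin1 = "na\xEFve caf\xE9";   # ISO-8859-1 octets

# Source charset to UTF8, then UTF8 back to the same charset
my $utf8 = to_utf8({ -string => $latin1, -charset => 'ISO-8859-1' });
my $back = from_utf8({ -string => $utf8, -charset => 'ISO-8859-1' });

print "round trip ", ($back eq $latin1 ? "ok" : "changed the data"), "\n";
```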

VERSION
    1.14 2020.09.27

TODO
    Regression tests for Jcode, 2-byte encodings and encoding aliases

SEE ALSO
    Unicode::String Unicode::Map8 Unicode::Map Jcode Encode

COPYRIGHT
    Copyright 2000-2020, Jerilyn Franz. All rights reserved.

AUTHOR
    Jerilyn Franz <cpan@jerilyn.info>

LICENSE
    MIT License

    Copyright (c) 2020 Jerilyn Franz

    Permission is hereby granted, free of charge, to any person obtaining a
    copy of this software and associated documentation files (the
    "Software"), to deal in the Software without restriction, including
    without limitation the rights to use, copy, modify, merge, publish,
    distribute, sublicense, and/or sell copies of the Software, and to
    permit persons to whom the Software is furnished to do so, subject to
    the following conditions:

    The above copyright notice and this permission notice shall be included
    in all copies or substantial portions of the Software.

    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
    OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
    MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
    IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
    CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
    TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
    SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

perl v5.38.0                      2023-07-21               Unicode::MapUTF8(3)