Unicode::MapUTF8(3)   User Contributed Perl Documentation  Unicode::MapUTF8(3)



NAME

       Unicode::MapUTF8 - Conversions to and from arbitrary character sets and
       UTF8


SYNOPSIS

        use Unicode::MapUTF8 qw(to_utf8 from_utf8 utf8_supported_charset
                                utf8_charset_alias);

        # Convert a string in 'ISO-8859-1' to 'UTF8'
        my $output = to_utf8({ -string => 'An example', -charset => 'ISO-8859-1' });

        # Convert a string in 'UTF8' encoding to encoding 'ISO-8859-1'
        my $other  = from_utf8({ -string => 'Other text', -charset => 'ISO-8859-1' });

        # List available character set encodings
        my @character_sets = utf8_supported_charset;

        # Add a character set alias
        utf8_charset_alias({ 'ms-japanese' => 'sjis' });

        # Convert between two arbitrary (but largely compatible) charset encodings
        # (SJIS to EUC-JP); $sjis_string holds Shift_JIS-encoded bytes
        my $utf8_string   = to_utf8({ -string => $sjis_string, -charset => 'sjis' });
        my $euc_jp_string = from_utf8({ -string => $utf8_string, -charset => 'euc-jp' });

        # Verify that a specific character set is supported
        if (utf8_supported_charset('ISO-8859-1')) {
            # Yes
        }


DESCRIPTION

       Provides an adapter layer between core routines for converting to and
       from UTF8 and other encodings. In essence, it gives multiple existing
       Unicode modules a single common interface, so you don't have to know
       the underlying implementations to do simple conversions between UTF8
       and other character set encodings. To that end, it wraps the
       Unicode::String, Unicode::Map8, Unicode::Map and Jcode modules in a
       standardized and simple API.

       It also provides general character set conversion based on UTF8: you
       can convert between any two compatible, supported character sets by
       chaining two conversions, to_utf8 from the source charset followed by
       from_utf8 to the target charset.

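       For comparison, the same two-step pivot through Unicode can be
       sketched with the core Encode module on Perl 5.8 or later. This is an
       Encode-based stand-in, not Unicode::MapUTF8 itself; the helper name
       convert_charset is hypothetical, and 'shiftjis' and 'euc-jp' are
       Encode's own encoding names.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: convert a byte string between two encodings by
# decoding it into Perl's internal Unicode form, then encoding it again.
sub convert_charset {
    my ($bytes, $from, $to) = @_;
    my $characters = decode($from, $bytes);    # step 1: source -> Unicode
    return encode($to, $characters);           # step 2: Unicode -> target
}

# HIRAGANA LETTER A (U+3042) as Shift_JIS bytes
my $sjis_string   = "\x82\xA0";
my $euc_jp_string = convert_charset($sjis_string, 'shiftjis', 'euc-jp');

printf "%vX\n", $euc_jp_string;    # prints A4.A2
```

       Encode also provides from_to($octets, $from, $to), which performs the
       same two steps in place on $octets.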
       As with most things Perlish, it processes many more characters per
       second when given a few large chunks of text rather than many small
       ones.

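       A minimal sketch of that advice: rather than converting text line by
       line, collect lines into larger chunks and convert each chunk in one
       call. The helper convert_in_batches and the chunk size of 64 lines are
       hypothetical; the converter is passed in as a code reference so it
       could wrap to_utf8 or any other routine. It assumes line-oriented text
       in an encoding where every line boundary is also a character boundary.

```perl
use strict;
use warnings;

# Hypothetical helper: join lines into chunks of $batch_size and hand each
# chunk to $convert in a single call instead of one call per line.
sub convert_in_batches {
    my ($convert, $lines, $batch_size) = @_;
    my @output;
    for (my $i = 0; $i < @$lines; $i += $batch_size) {
        my $last  = $i + $batch_size - 1;
        $last     = $#$lines if $last > $#$lines;
        my $chunk = join '', @{$lines}[$i .. $last];
        push @output, $convert->($chunk);
    }
    return join '', @output;
}

# Usage with a stand-in converter that counts how often it is called.
my $calls = 0;
my $upper = sub { $calls++; return uc $_[0] };
my @lines = map { "line $_\n" } 1 .. 200;

my $result = convert_in_batches($upper, \@lines, 64);
print "$calls calls\n";    # prints "4 calls" (200 lines, 64 per chunk)
```

       With Unicode::MapUTF8 installed, the converter could be
       sub { to_utf8({ -string => $_[0], -charset => 'ISO-8859-1' }) }.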
       By design, it can be easily extended to encompass any new charset
       encoding conversion modules that arrive on the scene.

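       One way to picture such a design is a dispatch table that maps each
       charset name to a backend handler, so a new conversion module only
       needs to register its handlers. This is a hypothetical sketch, not the
       module's internal code; the names %backend_for, register_backend and
       dispatch_to_utf8, and the use of Encode as the example backend, are
       all assumptions.

```perl
use strict;
use warnings;
use Carp qw(croak);
use Encode qw(decode encode);

# Hypothetical registry: charset name => code ref converting bytes to UTF-8.
my %backend_for;

# A new backend module would call this once per charset it can handle.
sub register_backend {
    my ($charset, $to_utf8) = @_;
    $backend_for{$charset} = $to_utf8;
    return;
}

# Common entry point: look up the backend and delegate to it.
sub dispatch_to_utf8 {
    my ($string, $charset) = @_;
    my $handler = $backend_for{$charset}
        or croak "dispatch_to_utf8: charset '$charset' is not supported";
    return $handler->($string);
}

# Example registration, using Encode as the backend.
register_backend('ISO-8859-1',
    sub { encode('UTF-8', decode('ISO-8859-1', $_[0])) });

my $utf8 = dispatch_to_utf8("caf\xE9", 'ISO-8859-1');
printf "%vX\n", $utf8;    # prints 63.61.66.C3.A9
```

       Callers only ever see the common entry point, so adding a backend does
       not change the public interface.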
       This module is intended to provide good Unicode support to versions of
       Perl prior to 5.8. If you are using Perl 5.8.0 or later, you probably
       want to use the Encode module instead. This module does work with Perl
       5.8, but Encode is the preferred method in that environment.


CHANGES

       1.14 2020.09.27   Fixing POD breakage in EUC-JP version of POD

       1.13 2020.09.27   Fixing MANIFEST.SKIP error

       1.12 2020.09.27   Build tool updates. Maintainer updates. POD error
                         fixes. Relicensed under MIT license.

       1.11 2005.10.10   Documentation changes. Addition of Build.PL support.
                         Added various build tests, LICENSE,
                         Artistic_License.txt, GPL_License.txt. Split
                         documentation into separate .pod file. Added
                         Japanese translation of POD.

       1.10 2005.05.22 - Fixed bug in conversion of ISO-2022-JP to UTF-8.
                         Problem and fix found by Masahiro HONMA
                         <masahiro.honma@tsutaya.co.jp>.

                         Similar bugs in conversions of shift_jis and euc-jp
                         to UTF-8 corrected as well.

       1.09 2001.08.22 - Fixed multiple typo occurrences of 'uft'
                         where 'utf' was meant in code. Problem affected
                         utf16 and utf7 encodings. Problem found
                         by devon smith <devon@taller.PSCL.cwru.edu>

       1.08 2000.11.06 Added 'utf8_charset_alias' function to allow for
                       runtime setting of character set aliases. Added
                       several alternate names for 'sjis' (shiftjis,
                       shift-jis, shift_jis, s-jis, and s_jis).

                       Corrected 'croak' messages for 'from_utf8' functions to
                       report the appropriate function name.

                       Corrected fatal problem in jcode-unicode internals. Problem
                       and fix found by Brian Wisti <wbrian2@uswest.net>.

       1.07 2000.11.01 Added 'croak' to the 'use Carp' declaration to fix error
                       messages. Problem and fix found by
                       <wbrian2@uswest.net>.

       1.06 2000.10.30 Fix to handle change in stringification of overloaded
                       objects between Perl 5.005 and 5.6.
                       Problem noticed by Brian Wisti <wbrian2@uswest.net>.

       1.05 2000.10.23 Error in conversions from UTF8 to multibyte encodings
                       corrected

       1.04 2000.10.23 Additional diagnostic error messages added for
                       internal errors

       1.03 2000.10.22 Bug fix for load time Unicode::Map encoding
                       detection

       1.02 2000.10.22 Bug fix to 'from_utf8' method and load time
                       detection of Unicode::Map8 supported character
                       set encodings

       1.01 2000.10.02 Initial public release


FUNCTIONS

       utf8_charset_alias({ $alias => $charset });
           Used for runtime assignment of character set aliases.

           Called with no parameters, returns a reference to a hash of the
           defined aliases and the character sets they map to.

           Example:

               my $aliases     = utf8_charset_alias;
               my @alias_names = keys %$aliases;

           If called with ONE parameter, returns the name of the 'real'
           charset if the alias is defined. Returns undef if it is not found
           in the aliases.

           Example:

               if (! utf8_charset_alias('VISCII')) {
                   # No alias for this
               }

           If called with a hash reference of 'alias' => 'charset' pairs,
           defines those aliases for use.

           Example:

               utf8_charset_alias({ 'japanese' => 'sjis', 'japan' => 'sjis' });

           Note: It will croak if a passed pair does not map to a character
           set defined in the predefined set of character encodings. It is
           NOT allowed to alias something to another alias.

           Multiple character set aliases can be set with a single call.

           To clear an alias, pass a character set mapping of undef.

           Example:

               utf8_charset_alias({ 'japanese' => undef });

           While an alias is set, the 'utf8_supported_charset' function will
           return the alias as if it were a predefined charset.

           Overriding a predefined character encoding with an alias will
           generate a warning message to STDERR.

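           The alias rules above can be modeled in a few lines of plain
           Perl. This is a hypothetical sketch of the documented behavior,
           not the module's implementation: the names %predefined, %alias and
           charset_alias, and the short predefined list, are assumptions.

```perl
use strict;
use warnings;
use Carp qw(croak carp);

# Hypothetical stand-ins for the module's internal tables.
my %predefined = map { $_ => 1 } qw(ISO-8859-1 sjis euc-jp);
my %alias;

# Mirrors the three documented call forms of utf8_charset_alias.
sub charset_alias {
    my ($arg) = @_;
    return +{ %alias } unless defined $arg;     # no args: copy of alias table
    return $alias{$arg} unless ref $arg;        # one name: real charset or undef

    for my $name (keys %$arg) {
        my $target = $arg->{$name};
        if (!defined $target) {                 # undef clears the alias
            delete $alias{$name};
            next;
        }
        # Only predefined charsets are valid targets, so alias-to-alias fails.
        croak "charset_alias: '$target' is not a predefined charset"
            unless $predefined{$target};
        # Overriding a predefined name warns on STDERR.
        carp "charset_alias: '$name' overrides a predefined charset"
            if $predefined{$name};
        $alias{$name} = $target;
    }
    return;
}

charset_alias({ 'japanese' => 'sjis', 'japan' => 'sjis' });
my $real = charset_alias('japan');
print "$real\n";                                 # prints sjis

charset_alias({ 'japanese' => undef });          # clears 'japanese'
my @names = sort keys %{ charset_alias() };
print "@names\n";                                # prints japan
```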
       utf8_supported_charset($charset_name);
           Returns true if the named charset is supported (including user
           defined aliases).

           Returns false if it is not.

           Example:

               if (! utf8_supported_charset('VISCII')) {
                   # No support yet
               }

           If called in a list context with no parameters, it will return a
           list of all supported character set names (including user defined
           aliases).

           Example:

               my @charsets = utf8_supported_charset;

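           The context-dependent return described above can be pictured with
           wantarray. This hypothetical sketch assumes the supported names
           live in two hashes, %predefined and %alias; the names, contents
           and sorted output order are illustrative, not the module's real
           tables or behavior.

```perl
use strict;
use warnings;

# Hypothetical tables: predefined charsets and user defined aliases.
my %predefined = map { $_ => 1 } qw(ISO-8859-1 sjis euc-jp);
my %alias      = ('japanese' => 'sjis');

sub supported_charset {
    # No argument in list context: every supported name, aliases included.
    if (!@_ && wantarray) {
        my @names = (keys %predefined, keys %alias);
        return sort @names;
    }
    my ($name) = @_;
    return 0 unless defined $name;
    # With an argument: true if it is a predefined charset or an alias.
    return ($predefined{$name} || exists $alias{$name}) ? 1 : 0;
}

my @charsets = supported_charset();
print "@charsets\n";    # prints ISO-8859-1 euc-jp japanese sjis

my $answer = supported_charset('VISCII') ? 'yes' : 'no';
print "$answer\n";      # prints no
```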
       to_utf8({ -string => $string, -charset => $source_charset });
           Returns the string converted to UTF8 from the specified source
           charset.

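           On Perl 5.8 or later, roughly the same conversion can be written
           with the core Encode module. This sketch assumes the caller wants
           a string of UTF-8 bytes back rather than Perl characters; the name
           encode_to_utf8 is hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical Encode-based counterpart of to_utf8: takes bytes in
# $source_charset and returns the same text as UTF-8 bytes.
sub encode_to_utf8 {
    my ($string, $source_charset) = @_;
    my $characters = decode($source_charset, $string);
    return encode('UTF-8', $characters);
}

my $latin1 = "caf\xE9";                        # 'cafe' with e-acute, ISO-8859-1
my $utf8   = encode_to_utf8($latin1, 'ISO-8859-1');
printf "%vX\n", $utf8;                         # prints 63.61.66.C3.A9
```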
       from_utf8({ -string => $string, -charset => $target_charset });
           Returns the string converted from UTF8 to the specified target
           charset.

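           The reverse direction can be sketched with Encode in the same
           way. The name encode_from_utf8 is hypothetical, and passing
           Encode::FB_CROAK (so that characters the target charset cannot
           represent raise an error instead of being substituted) is a choice
           made for this sketch, not documented behavior of from_utf8.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical Encode-based counterpart of from_utf8: takes UTF-8 bytes and
# returns them re-encoded in $target_charset. FB_CROAK makes encode() die
# on characters the target charset cannot represent.
sub encode_from_utf8 {
    my ($string, $target_charset) = @_;
    my $characters = decode('UTF-8', $string);
    return encode($target_charset, $characters, Encode::FB_CROAK());
}

my $latin1 = encode_from_utf8("caf\xC3\xA9", 'ISO-8859-1');
printf "%vX\n", $latin1;                              # prints 63.61.66.E9

# HIRAGANA LETTER A has no ISO-8859-1 equivalent, so this dies.
my $ok = eval { encode_from_utf8("\xE3\x81\x82", 'ISO-8859-1'); 1 };
print $ok ? "converted\n" : "not representable\n";   # prints not representable
```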

VERSION

       1.14 2020.09.27


TODO

       Regression tests for Jcode, 2-byte encodings and encoding aliases


SEE ALSO

       Unicode::String, Unicode::Map8, Unicode::Map, Jcode, Encode


COPYRIGHT

       Copyright 2000-2020, Jerilyn Franz. All rights reserved.


AUTHOR

       Jerilyn Franz <cpan@jerilyn.info>


LICENSE

       MIT License

       Copyright (c) 2020 Jerilyn Franz

       Permission is hereby granted, free of charge, to any person obtaining a
       copy of this software and associated documentation files (the
       "Software"), to deal in the Software without restriction, including
       without limitation the rights to use, copy, modify, merge, publish,
       distribute, sublicense, and/or sell copies of the Software, and to
       permit persons to whom the Software is furnished to do so, subject to
       the following conditions:

       The above copyright notice and this permission notice shall be included
       in all copies or substantial portions of the Software.

       THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
       OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
       MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
       IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
       CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
       TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
       SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.



perl v5.34.0                      2022-01-21               Unicode::MapUTF8(3)