Unicode::MapUTF8(3)   User Contributed Perl Documentation  Unicode::MapUTF8(3)

NAME

       Unicode::MapUTF8 - Conversions to and from arbitrary character sets and
       UTF8

SYNOPSIS

        use Unicode::MapUTF8 qw(to_utf8 from_utf8 utf8_supported_charset
                                utf8_charset_alias);

        # Convert a string in 'ISO-8859-1' to 'UTF8'
        my $output = to_utf8({ -string => 'An example', -charset => 'ISO-8859-1' });

        # Convert a string in 'UTF8' encoding to encoding 'ISO-8859-1'
        my $other  = from_utf8({ -string => 'Other text', -charset => 'ISO-8859-1' });

        # List available character set encodings
        my @character_sets = utf8_supported_charset;

        # Add a character set alias
        utf8_charset_alias({ 'ms-japanese' => 'sjis' });

        # Convert between two arbitrary (but largely compatible) charset encodings
        # (SJIS to EUC-JP)
        my $utf8_string   = to_utf8({ -string => $sjis_string, -charset => 'sjis' });
        my $euc_jp_string = from_utf8({ -string => $utf8_string, -charset => 'euc-jp' });

        # Verify that a specific character set is supported
        if (utf8_supported_charset('ISO-8859-1')) {
            # Yes
        }

DESCRIPTION

       Provides an adapter layer between core routines for converting to and
       from UTF8 and other encodings. In essence, it gives multiple existing
       Unicode modules a single common interface, so you don't have to know
       the underlying implementations to do simple conversions between UTF8
       and other character set encodings. To that end, it wraps the
       Unicode::String, Unicode::Map8, Unicode::Map and Jcode modules in a
       standardized and simple API.

       This also provides a general character set conversion operation based
       on UTF8: it is possible to convert between any two compatible and
       supported character sets via a simple two-step chaining of conversions.

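       The two-step chaining described above might be wrapped as follows.
       This is only a sketch: convert_charset is a hypothetical helper, not a
       function exported by this module, and the sample bytes assume that the
       Jcode-backed 'sjis' and 'euc-jp' charsets are available on your system.

```perl
use strict;
use warnings;
use Unicode::MapUTF8 qw(to_utf8 from_utf8 utf8_supported_charset);

# Hypothetical helper (not part of Unicode::MapUTF8): convert between two
# supported charsets by using UTF8 as the intermediate form.
sub convert_charset {
    my ($string, $from, $to) = @_;

    # Fail early with a readable message if either charset is unknown.
    for my $charset ($from, $to) {
        die "Unsupported charset: $charset\n"
            unless utf8_supported_charset($charset);
    }

    my $utf8 = to_utf8({ -string => $string, -charset => $from });
    return from_utf8({ -string => $utf8, -charset => $to });
}

# The two kanji of "Nihon" (Japan) as Shift_JIS bytes.
my $sjis_string   = "\x93\xFA\x96\x7B";
my $euc_jp_string = convert_charset($sjis_string, 'sjis', 'euc-jp');

# Show the converted bytes in hex.
print join(' ', map { sprintf '%02X', ord } split //, $euc_jp_string), "\n";
```

       Checking both charsets before converting keeps the failure message in
       the caller's terms instead of relying on whichever wrapped module
       happens to complain first.
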
       As with most things Perlish, if you give it a few big chunks of text to
       chew on instead of lots of small ones, it will handle many more
       characters per second.

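       One way to act on that advice is to join small strings, convert once,
       and split the result. The sketch below assumes an ASCII-compatible
       source charset (here 'ISO-8859-1') and input lines that contain no
       newline of their own, so the "\n" separator survives the conversion
       unchanged; the sample data is made up for illustration.

```perl
use strict;
use warnings;
use Unicode::MapUTF8 qw(to_utf8);

# Sample ISO-8859-1 byte strings (illustrative data only).
my @lines = ("caf\xE9", "na\xEFve", "d\xE9j\xE0 vu");

# Many small conversions: one call per line.
my @per_line =
    map { to_utf8({ -string => $_, -charset => 'ISO-8859-1' }) } @lines;

# One large conversion: join, convert once, split again.
my $joined  = join "\n", @lines;
my @batched =
    split /\n/, to_utf8({ -string => $joined, -charset => 'ISO-8859-1' }), -1;

print "Same result either way\n" if "@per_line" eq "@batched";
```
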
       By design, it can be easily extended to encompass any new charset
       encoding conversion modules that arrive on the scene.

       This module is intended to provide good Unicode support to versions of
       Perl prior to 5.8. If you are using Perl 5.8.0 or later, you probably
       want to be using the Encode module instead. This module does work with
       Perl 5.8, but Encode is the preferred method in that environment.

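       For readers on Perl 5.8.0 or later, the following sketch shows roughly
       how the same conversions look with the core Encode module. It uses
       Encode's own decode, encode and from_to functions, not anything from
       Unicode::MapUTF8, and the sample string is illustrative.

```perl
use strict;
use warnings;
use Encode qw(decode encode from_to);

my $latin1_bytes = "caf\xE9";    # "cafe" with an acute e, as ISO-8859-1 octets

# Rough counterpart of to_utf8(): charset octets -> UTF-8 octets.
my $utf8_bytes = encode('UTF-8', decode('ISO-8859-1', $latin1_bytes));

# Rough counterpart of from_utf8(): UTF-8 octets -> charset octets.
my $back = encode('ISO-8859-1', decode('UTF-8', $utf8_bytes));

# from_to() converts a buffer of octets in place.
my $buffer = $latin1_bytes;
from_to($buffer, 'ISO-8859-1', 'UTF-8');

print "round trip ok\n" if $back eq $latin1_bytes;
print "from_to matches encode/decode\n" if $buffer eq $utf8_bytes;
```
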

CHANGES

        1.11 2005.10.10 - Documentation changes. Addition of Build.PL support.
                          Added various build tests, LICENSE, Artistic_License.txt,
                          GPL_License.txt. Split documentation into separate
                          .pod file. Added Japanese translation of POD.

        1.10 2005.05.22 - Fixed bug in conversion of ISO-2022-JP to UTF-8.
                          Problem and fix found by Masahiro HONMA
                          <masahiro.honma@tsutaya.co.jp>.

                          Similar bugs in conversions of shift_jis and euc-jp
                          to UTF-8 fixed as well.

        1.09 2001.08.22 - Fixed multiple occurrences of the typo 'uft'
                          where 'utf' was meant in code. Problem affected
                          utf16 and utf7 encodings. Problem found
                          by devon smith <devon@taller.PSCL.cwru.edu>.

        1.08 2000.11.06 - Added 'utf8_charset_alias' function to
                          allow for runtime setting of character
                          set aliases. Added several alternate
                          names for 'sjis' (shiftjis, shift-jis,
                          shift_jis, s-jis, and s_jis).

                          Corrected 'croak' messages for
                          'from_utf8' functions to appropriate
                          function name.

                          Tightened up initialization encapsulation.

                          Corrected fatal problem in jcode from
                          unicode internals. Problem and fix
                          found by Brian Wisti <wbrian2@uswest.net>.

        1.07 2000.11.01 - Added 'croak' to use Carp declaration to
                          fix error messages. Problem and fix
                          found by Brian Wisti
                          <wbrian2@uswest.net>.

        1.06 2000.10.30 - Fix to handle change in stringification
                          of overloaded objects between Perl 5.005
                          and 5.6. Problem noticed by Brian Wisti
                          <wbrian2@uswest.net>.

        1.05 2000.10.23 - Error in conversions from UTF8 to
                          multibyte encodings corrected.

        1.04 2000.10.23 - Additional diagnostic messages added
                          for internal error conditions.

        1.03 2000.10.22 - Bug fix for load time autodetection of
                          Unicode::Map8 encodings.

        1.02 2000.10.22 - Added load time autodetection of
                          Unicode::Map8 supported character set
                          encodings.

                          Fixed internal calling error for some
                          character sets with 'from_utf8'. Thanks
                          go to Ilia Lobsanov
                          <ilia@lobsanov.com> for reporting this
                          problem.

        1.01 2000.10.02 - Fixed handling of empty strings and
                          added more identification for error
                          messages.

        1.00 2000.09.29 - Pre-release version

FUNCTIONS

       utf8_charset_alias({ $alias => $charset });
           Used for runtime assignment of character set aliases.

           Called with no parameters, returns a reference to a hash of the
           defined aliases and the character sets they map to.

           Example:

             my $aliases     = utf8_charset_alias;
             my @alias_names = keys %$aliases;

           If called with ONE parameter, returns the name of the 'real'
           charset if the alias is defined. Returns undef if it is not found
           in the aliases.

           Example:

               if (! utf8_charset_alias('VISCII')) {
                   # No alias for this
               }

           If called with a set of 'alias' => 'charset' pairs, defines those
           aliases for use.

           Example:

               utf8_charset_alias({ 'japanese' => 'sjis', 'japan' => 'sjis' });

           Note: It will croak if a passed pair does not map to a character
           set in the predefined set of character encodings. Aliasing
           something to another alias is NOT allowed.

           Multiple character set aliases can be set with a single call.

           To clear an alias, pass a character set mapping of undef.

           Example:

               utf8_charset_alias({ 'japanese' => undef });

           While an alias is set, the 'utf8_supported_charset' function will
           return the alias as if it were a predefined charset.

           Overriding a base defined character encoding with an alias will
           generate a warning message to STDERR.

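           The calling forms described above can be combined into one short
           session. This is a sketch that reuses the 'japanese' => 'sjis'
           pair from the examples and assumes 'sjis' is among the predefined
           charsets on your installation; the variable names are illustrative.

```perl
use strict;
use warnings;
use Unicode::MapUTF8 qw(utf8_charset_alias utf8_supported_charset);

# Define an alias (croaks if 'sjis' is not a predefined charset).
utf8_charset_alias({ 'japanese' => 'sjis' });

# One-parameter form: look up the real charset behind the alias.
my $real = utf8_charset_alias('japanese');
print "japanese -> ", (defined $real ? $real : 'undef'), "\n";

# While it is set, the alias counts as a supported charset.
my $supported_while_set = utf8_supported_charset('japanese');
print "supported while set: ", ($supported_while_set ? 'yes' : 'no'), "\n";

# Clear the alias by mapping it to undef, then look it up again.
utf8_charset_alias({ 'japanese' => undef });
my $after_clear = utf8_charset_alias('japanese');
print "after clearing: ", (defined $after_clear ? $after_clear : 'undef'), "\n";
```
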
       utf8_supported_charset($charset_name);
           Returns true if the named charset is supported (including user
           defined aliases).

           Returns false if it is not.

           Example:

               if (! utf8_supported_charset('VISCII')) {
                   # No support yet
               }

           If called in a list context with no parameters, it will return a
           list of all supported character set names (including user defined
           aliases).

           Example:

               my @charsets = utf8_supported_charset;

       to_utf8({ -string => $string, -charset => $source_charset });
           Returns the string converted to UTF8 from the specified source
           charset.

       from_utf8({ -string => $string, -charset => $target_charset });
           Returns the string converted from UTF8 to the specified target
           charset.

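           As an illustration of the two conversion functions used together,
           the sketch below converts a string to UTF8 and back again. It
           assumes 'ISO-8859-1' is supported on your installation, and the
           sample string is made up for the example.

```perl
use strict;
use warnings;
use Unicode::MapUTF8 qw(to_utf8 from_utf8);

# "naive" with a diaeresis on the i, as ISO-8859-1 bytes.
my $latin1 = "na\xEFve";

# Source charset -> UTF8.
my $utf8 = to_utf8({ -string => $latin1, -charset => 'ISO-8859-1' });

# UTF8 -> target charset (here the same charset, to check the round trip).
my $round_trip = from_utf8({ -string => $utf8, -charset => 'ISO-8859-1' });

print $round_trip eq $latin1
    ? "round trip preserved the string\n"
    : "round trip changed the string\n";
```
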

VERSION

       1.11 2005.10.10

TODO

       Regression tests for Jcode, 2-byte encodings and encoding aliases

SEE ALSO

       Unicode::String, Unicode::Map8, Unicode::Map, Jcode, Encode

COPYRIGHT

       Copyright 2000-2005, Benjamin Franz. All rights reserved.

AUTHOR

       Benjamin Franz <snowhare@nihongo.org>

LICENSE

       This program is free software; you can redistribute it and/or modify it
       under the same terms and conditions as Perl itself.

       This means that you can, at your option, redistribute it and/or modify
       it under either the terms of the GNU Public License (GPL) version 1 or
       later, or under the Perl Artistic License.

       See http://dev.perl.org/licenses/

DISCLAIMER

       THIS SOFTWARE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED
       WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
       MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.

       Use of this software in any way or in any form, source or binary, is
       not allowed in any country which prohibits disclaimers of any implied
       warranties of merchantability or fitness for a particular purpose or
       any disclaimers of a similar nature.

       IN NO EVENT SHALL I BE LIABLE TO ANY PARTY FOR DIRECT, INDIRECT,
       SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OF
       THIS SOFTWARE AND ITS DOCUMENTATION (INCLUDING, BUT NOT LIMITED TO,
       LOST PROFITS) EVEN IF I HAVE BEEN ADVISED OF THE POSSIBILITY OF SUCH
       DAMAGE.

perl v5.8.8                       2006-10-29               Unicode::MapUTF8(3)