Encode::HanExtra(3)   User Contributed Perl Documentation  Encode::HanExtra(3)

NAME

       Encode::HanExtra - Extra sets of Chinese encodings

VERSION

       This document describes version 0.23 of Encode::HanExtra, released
       November 10, 2007.

SYNOPSIS

           use Encode;

           # Traditional Chinese
           $euc_tw = encode("euc-tw", $utf8);   # loads Encode::HanExtra implicitly
           $utf8   = decode("euc-tw", $euc_tw); # ditto

           # Simplified Chinese
           $gb18030 = encode("gb18030", $utf8);    # loads Encode::HanExtra implicitly
           $utf8    = decode("gb18030", $gb18030); # ditto
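
       With the default check mode, encode() and decode() substitute a
       replacement character for anything the target encoding cannot
       represent. The following is a minimal sketch of a stricter round
       trip, assuming a plain true/false answer is wanted. round_trips is a
       hypothetical helper name, not part of Encode or Encode::HanExtra; it
       relies only on Encode's FB_CROAK and LEAVE_SRC check flags.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: true only if $text survives encode + decode unchanged.
# Returns false for unmappable characters and for unknown encodings alike.
sub round_trips {
    my ($encoding, $text) = @_;
    my $check = Encode::FB_CROAK() | Encode::LEAVE_SRC();
    my $ok = eval {
        my $bytes = Encode::encode($encoding, $text, $check);
        my $back  = Encode::decode($encoding, $bytes, $check);
        $back eq $text;
    };
    return $ok ? 1 : 0;
}

my $text = "\x{4e2d}\x{6587}";    # sample Perl character string (two Han characters)
for my $name (qw(euc-tw gb18030)) {
    printf "%s: %s\n", $name,
        round_trips($name, $text) ? "ok" : "failed or unavailable";
}
```

       The result for "euc-tw" and "gb18030" depends on whether
       Encode::HanExtra is installed, so no output is shown here.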

DESCRIPTION

       Perl 5.7.3 and later ship with an adequate set of Chinese encodings,
       including the commonly used "CP950", "CP936" (also known as "GBK"),
       "Big5" (alias for "Big5-Eten"), "Big5-HKSCS", "EUC-CN", "HZ", and
       "ISO-IR-165".

       However, the number of Chinese encodings is staggering, and complete
       coverage would easily increase the size of the perl distribution by
       several megabytes; hence, this CPAN module tries to provide the rest
       of them.

       If you are using Perl 5.8 or later, Encode::CN and Encode::TW will
       automatically load the extra encodings for you, so there's no need to
       explicitly write "use Encode::HanExtra" if you are using one of them
       already.
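
       Because the extra encodings are loaded on demand, a missing
       Encode::HanExtra installation only shows up when a name fails to
       resolve. The following is a minimal sketch of an up-front check,
       assuming Encode's find_encoding(), which returns an encoding object
       or a false value for names it cannot resolve. encoding_available is
       a hypothetical helper name.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: true if Encode can resolve $name to an encoding object.
sub encoding_available {
    my ($name) = @_;
    return Encode::find_encoding($name) ? 1 : 0;
}

for my $name (qw(big5-hkscs euc-tw gb18030 cccii)) {
    printf "%-12s %s\n", $name,
        encoding_available($name) ? "available" : "not available";
}
```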

ENCODINGS

       This version includes the following encoding tables:

         Canonical   Alias                             Description
         -----------------------------------------------------------------------------
         big5-1984   /\b(tca-)?big5-?(19)?84$/i        TCA's original Big5-1984
         big5-2003   /\b(cmex-)?big5-?(20)?03$/i       Big5-2003 (national standard)
         big5ext     /\b(cmex-)?big5-?e(xt)?$/i        CMEX's Big5e Extension
         big5plus    /\b(cmex-)?big5-?p(lus)?$/i       CMEX's Big5+ Extension
                     /\b(cmex-)?big5\+$/i
         cccii       /\b(ccag-)?cccii$/i               Chinese Character Code for
                                                       Information Interchange
         cns11643-1  /\bCNS[-_ ]?11643[-_]1$/i         Taiwan's CNS map, plane 1
         cns11643-2  /\bCNS[-_ ]?11643[-_]2$/i         Taiwan's CNS map, plane 2
         cns11643-3  /\bCNS[-_ ]?11643[-_]3$/i         Taiwan's CNS map, plane 3
         cns11643-4  /\bCNS[-_ ]?11643[-_]4$/i         Taiwan's CNS map, plane 4
         cns11643-5  /\bCNS[-_ ]?11643[-_]5$/i         Taiwan's CNS map, plane 5
         cns11643-6  /\bCNS[-_ ]?11643[-_]6$/i         Taiwan's CNS map, plane 6
         cns11643-7  /\bCNS[-_ ]?11643[-_]7$/i         Taiwan's CNS map, plane 7
         cns11643-f  /\bCNS[-_ ]?11643[-_]f$/i         Taiwan's CNS map, plane F
         euc-tw      /\beuc.*tw$/i                     EUC (Extended Unix Code)
                     /\btw.*euc$/i
         gb18030     /\bGB[-_ ]?18030$/i               GBK with Traditional Characters
         unisys      /\bunisys$/i                      Unisys Traditional Chinese
         unisys-sosi1                                  Unisys SOSI1 transport encoding
         unisys-sosi2                                  Unisys SOSI2 transport encoding

       Detailed descriptions are as follows:

       BIG5-1984
           This is the original Big5 encoding made by TCA Taiwan.

       BIG5-2003
           This revised encoding is now a national standard, published as an
           appendix of CNS11643.

       BIG5PLUS
           This encoding, while not heavily used, is an attempt by CMEX
           Taiwan to bring all of Taiwan's conflicting internal-use encodings
           together and fit them into an extension of the widely deployed
           Big5 range.

       BIG5EXT
           CMEX's second (and less ambitious) attempt at unifying the most
           commonly used characters not covered by Big5, without spilling
           outside the 94x94 arrangement as BIG5PLUS did.

       CCCII
           The earliest (and most sophisticated) Traditional Chinese encoding,
           with a three-byte raw character map, made in 1980 by the Chinese
           Character Analysis Group (CCAG), used mostly in library systems.

       EUC-TW
           The EUC transport version of "CNS11643" (planes 1-7), the
           comprehensive character set used by the Taiwan government.

       CNS11643-*
           The raw character maps extracted from the Unihan database,
           including plane F, which isn't included in "EUC-TW".

       GB18030
           An extension to GBK, this encoding lists most Han characters (both
           simplified and traditional), as well as some other scripts used by
           other peoples in China.

       UNISYS
           Unisys System's internal Chinese mapping.
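
       The Alias column in the table at the top of this section can be read
       as Perl regular expressions matched against the name given to
       encode() or decode(). The sketch below copies a few of those patterns
       into a stand-alone lookup to show which spellings they accept.
       canonical_name is a hypothetical helper for illustration only; it is
       not the mechanism Encode itself uses to register aliases.

```perl
use strict;
use warnings;

# Patterns copied from the Alias column, paired with their canonical names.
my @alias_table = (
    [ qr/\b(tca-)?big5-?(19)?84$/i  => 'big5-1984'  ],
    [ qr/\b(cmex-)?big5-?p(lus)?$/i => 'big5plus'   ],
    [ qr/\bCNS[-_ ]?11643[-_]1$/i   => 'cns11643-1' ],
    [ qr/\beuc.*tw$/i               => 'euc-tw'     ],
    [ qr/\bGB[-_ ]?18030$/i         => 'gb18030'    ],
);

# Hypothetical helper: canonical name for $name, or undef if nothing matches.
sub canonical_name {
    my ($name) = @_;
    for my $entry (@alias_table) {
        my ($pattern, $canonical) = @$entry;
        return $canonical if $name =~ $pattern;
    }
    return undef;
}

for my $name ('TCA-Big5-84', 'Big5Plus', 'CNS 11643-1', 'EUC_TW', 'GB-18030', 'Big5') {
    printf "%-12s => %s\n", $name, canonical_name($name) // '(no alias)';
}
```

       Plain "Big5" matches none of these patterns, which is consistent with
       it being served by the encodings bundled with Perl rather than by
       this module.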

NOTES

       If you are looking for ways to transliterate between Simplified and
       Traditional Chinese, please take a look at Encode::HanConvert. Note
       that the direct mapping via Unicode is lossy, and usually doesn't work
       at all.

       Please send me suggestions if you want to see more encodings added,
       such as "BIG5-GCCS" (superseded by "BIG5-HKSCS").  Other suggestions
       are welcome, too.
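
       To see why the direct mapping via Unicode mentioned above is lossy,
       one can list the characters of a Traditional Chinese string that a
       Simplified-only encoding cannot represent. The following is a minimal
       sketch, assuming the "euc-cn" encoding from Perl's bundled
       Encode::CN. unmappable_chars is a hypothetical helper, and the sample
       string is only an illustration.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: the characters of $text that $encoding cannot encode.
sub unmappable_chars {
    my ($encoding, $text) = @_;
    my $enc = Encode::find_encoding($encoding)
        or die "Unknown encoding '$encoding'\n";
    my $check = Encode::FB_CROAK() | Encode::LEAVE_SRC();
    return grep {
        my $char = $_;
        # encode() croaks on an unmappable character; treat that as "lost".
        !eval { $enc->encode($char, $check); 1 };
    } split //, $text;
}

# U+9AD4 is the Traditional form of U+4F53; U+4E2D is shared by both scripts.
my @lost = unmappable_chars('euc-cn', "\x{9ad4}\x{4e2d}");
printf "U+%04X has no euc-cn mapping\n", ord($_) for @lost;
```

       A character reported here is not converted to its Simplified
       counterpart; it is simply absent from the target encoding, which is
       why a dedicated module such as Encode::HanConvert is suggested.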

SEE ALSO

       Encode, Encode::HanConvert

ACKNOWLEDGEMENTS

       Some of the maps here are generated from GNU libiconv's test files,
       with kind permission from Bruno Haible.

       Map for "BIG5PLUS" is generated from the BIG52UCS.TXT file, courtesy of
       CMEX Taiwan (Chinese Microcomputer Extended Foundation,
       <http://www.cmex.org.tw/>).

       Map for "BIG5-1984" is supplied by imacat.

       Map for "CCCII" is supplied by the Koha Taiwan project.

AUTHORS

       Audrey Tang <audreyt@audreyt.org>

COPYRIGHT

       Copyright 2002-2007 by Audrey Tang <audreyt@audreyt.org>.

       This software is released under the MIT license cited below.

   The "MIT" License
       Permission is hereby granted, free of charge, to any person obtaining a
       copy of this software and associated documentation files (the
       "Software"), to deal in the Software without restriction, including
       without limitation the rights to use, copy, modify, merge, publish,
       distribute, sublicense, and/or sell copies of the Software, and to
       permit persons to whom the Software is furnished to do so, subject to
       the following conditions:

       The above copyright notice and this permission notice shall be included
       in all copies or substantial portions of the Software.

       THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
       OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
       MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
       IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
       CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
       TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
       SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.



perl v5.28.1                      2007-11-10               Encode::HanExtra(3)