Encode::HanExtra(3) User Contributed Perl Documentation Encode::HanExtra(3)

Encode::HanExtra - Extra sets of Chinese encodings

This document describes version 0.23 of Encode::HanExtra, released
November 10, 2007.

    use Encode;

    # Traditional Chinese
    $euc_tw = encode("euc-tw", $utf8);    # loads Encode::HanExtra implicitly
    $utf8   = decode("euc-tw", $euc_tw);  # ditto

    # Simplified Chinese
    $gb18030 = encode("gb18030", $utf8);     # loads Encode::HanExtra implicitly
    $utf8    = decode("gb18030", $gb18030);  # ditto

Perl 5.7.3 and later ship with an adequate set of Chinese encodings,
including the commonly used "CP950", "CP936" (also known as "GBK"),
"Big5" (alias for "Big5-Eten"), "Big5-HKSCS", "EUC-CN", "HZ", and
"ISO-IR-165".

However, the number of Chinese encodings is staggering, and complete
coverage would easily increase the size of the perl distribution by
several megabytes; hence, this CPAN module tries to provide the rest of
them.

If you are using Perl 5.8 or later, Encode::CN and Encode::TW will
automatically load the extra encodings for you, so there is no need to
write "use Encode::HanExtra" explicitly if you are already using one of
them.

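To see which of these names actually resolve on a given installation, one option is to ask Encode directly. The following is a minimal sketch, not part of this module: available_encodings() is a hypothetical helper name, and the result depends on whether Encode::HanExtra is installed alongside the core Encode.

```perl
use strict;
use warnings;
use Encode qw(find_encoding);

# available_encodings() is a hypothetical helper: it returns the subset of
# the given names that Encode can resolve on this installation.
# find_encoding() returns undef for names it cannot resolve.
sub available_encodings {
    my @names = @_;
    return grep { defined find_encoding($_) } @names;
}

# "euc-cn" and "big5-eten" ship with perl; "euc-tw" and "gb18030" are
# expected to resolve only when Encode::HanExtra is installed.
my @found = available_encodings(qw(euc-cn big5-eten euc-tw gb18030));
print "Available: @found\n";
```

Names that are missing from the output are simply not installed; no exception is raised by the lookup itself.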
This version includes the following encoding tables:

Canonical     Alias                        Description
----------------------------------------------------------------------------
big5-1984     /\b(tca-)?big5-?(19)?84$/i   TCA's original Big5-1984
big5-2003     /\b(cmex-)?big5-?(20)?03$/i  Big5-2003 (national standard)
big5ext       /\b(cmex-)?big5-?e(xt)?$/i   CMEX's Big5e Extension
big5plus      /\b(cmex-)?big5-?p(lus)?$/i  CMEX's Big5+ Extension
              /\b(cmex-)?big5\+$/i
cccii         /\b(ccag-)?cccii$/i          Chinese Character Code for
                                           Information Interchange
cns11643-1    /\bCNS[-_ ]?11643[-_]1$/i    Taiwan's CNS map, plane 1
cns11643-2    /\bCNS[-_ ]?11643[-_]2$/i    Taiwan's CNS map, plane 2
cns11643-3    /\bCNS[-_ ]?11643[-_]3$/i    Taiwan's CNS map, plane 3
cns11643-4    /\bCNS[-_ ]?11643[-_]4$/i    Taiwan's CNS map, plane 4
cns11643-5    /\bCNS[-_ ]?11643[-_]5$/i    Taiwan's CNS map, plane 5
cns11643-6    /\bCNS[-_ ]?11643[-_]6$/i    Taiwan's CNS map, plane 6
cns11643-7    /\bCNS[-_ ]?11643[-_]7$/i    Taiwan's CNS map, plane 7
cns11643-f    /\bCNS[-_ ]?11643[-_]f$/i    Taiwan's CNS map, plane F
euc-tw        /\beuc.*tw$/i                EUC (Extended Unix Code)
              /\btw.*euc$/i
gb18030       /\bGB[-_ ]?18030$/i          GBK with Traditional Characters
unisys        /\bunisys$/i                 Unisys Traditional Chinese
unisys-sosi1                               Unisys SOSI1 transport encoding
unisys-sosi2                               Unisys SOSI2 transport encoding

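The alias patterns in the table are ordinary Perl regular expressions, so they can be tried against candidate names directly. The sketch below copies a subset of the patterns (the CNS11643 rows are left out for brevity); @ALIASES and canonical_name() are hypothetical names used only for illustration, and this is not necessarily how Encode resolves aliases internally.

```perl
use strict;
use warnings;

# Patterns copied from the table above. The lookup helper is hypothetical
# and is not part of Encode or Encode::HanExtra.
my @ALIASES = (
    [ 'big5-1984' => qr/\b(tca-)?big5-?(19)?84$/i ],
    [ 'big5-2003' => qr/\b(cmex-)?big5-?(20)?03$/i ],
    [ 'big5ext'   => qr/\b(cmex-)?big5-?e(xt)?$/i ],
    [ 'big5plus'  => qr/\b(cmex-)?big5-?p(lus)?$/i ],
    [ 'big5plus'  => qr/\b(cmex-)?big5\+$/i ],
    [ 'cccii'     => qr/\b(ccag-)?cccii$/i ],
    [ 'euc-tw'    => qr/\beuc.*tw$/i ],
    [ 'euc-tw'    => qr/\btw.*euc$/i ],
    [ 'gb18030'   => qr/\bGB[-_ ]?18030$/i ],
    [ 'unisys'    => qr/\bunisys$/i ],
);

# Return the canonical name whose pattern matches $name, or nothing if
# no pattern matches. Patterns are tried in table order.
sub canonical_name {
    my ($name) = @_;
    for my $pair (@ALIASES) {
        my ($canonical, $pattern) = @$pair;
        return $canonical if $name =~ $pattern;
    }
    return;
}

for my $name ('Big5-84', 'big5+', 'tw-euc', 'GB 18030') {
    my $canonical = canonical_name($name);
    printf "%-10s => %s\n", $name, defined $canonical ? $canonical : '(no match)';
}
```

Because the patterns are anchored only at the end of the name, a prefix such as "tca-" or "cmex-" is optional rather than required.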
Detailed descriptions are as follows:

BIG5-1984
    This is the original Big5 encoding made by TCA Taiwan.

BIG5-2003
    This revised encoding is now a national standard, published as an
    appendix of CNS11643.

BIG5PLUS
    This encoding by CMEX Taiwan, while not heavily used, is an attempt
    to bring all of Taiwan's conflicting internal-use encodings together
    and fit them into an extension of the widely deployed Big5 range.

BIG5EXT
    CMEX's second (and less ambitious) attempt at unifying the most
    commonly used characters not covered by Big5, without spilling
    outside the 94x94 arrangement as BIG5PLUS did.

CCCII
    The earliest (and most sophisticated) Traditional Chinese encoding,
    with a three-byte raw character map, made in 1980 by the Chinese
    Character Analysis Group (CCAG) and used mostly in library systems.

EUC-TW
    The EUC transport version of "CNS11643" (planes 1-7), the
    comprehensive character set used by the Taiwan government.

CNS11643-*
    The raw character maps extracted from the Unihan database, including
    plane F, which is not included in "EUC-TW".

GB18030
    An extension to GBK, this encoding lists most Han characters (both
    simplified and traditional), as well as some scripts used by other
    peoples in China.

UNISYS
    Unisys System's internal Chinese mapping.

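As an illustration of what "EUC transport version" means for EUC-TW, the sketch below splits a byte string into characters by inspecting each lead byte. It does not use this module and is not taken from it. It assumes the commonly documented EUC-TW layout, which this document does not spell out: ASCII takes one byte, plane 1 takes two bytes in the range 0xA1-0xFE, and the other planes take four bytes (0x8E, a plane byte where 0xA1 means plane 1, then two bytes). split_euc_tw() is a hypothetical helper name, and the sketch does no validation of trail bytes.

```perl
use strict;
use warnings;

# split_euc_tw() is a hypothetical helper. Assumed layout (not stated in
# this document): 1 byte for ASCII, 2 bytes for plane 1 (lead 0xA1-0xFE),
# 4 bytes for the single-shift form (lead 0x8E, then plane byte, then 2 bytes).
sub split_euc_tw {
    my ($bytes) = @_;
    my @chars;
    my $pos = 0;
    while ($pos < length $bytes) {
        my $lead = ord substr($bytes, $pos, 1);
        my $len  = $lead == 0x8E                   ? 4
                 : $lead >= 0xA1 && $lead <= 0xFE  ? 2
                 :                                   1;
        push @chars, substr($bytes, $pos, $len);
        $pos += $len;
    }
    return @chars;
}

# One ASCII byte, one two-byte sequence, one four-byte sequence.
my @chars = split_euc_tw("A\xC4\xA1\x8E\xA2\xA1\xA1");
printf "%s\n", join ' ', map { unpack 'H*', $_ } @chars;
```

For real decoding, decode("euc-tw", ...) as shown in the synopsis is the intended route; the sketch only shows how character boundaries fall under the assumed layout.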
If you are looking for ways to transliterate between Simplified and
Traditional Chinese, please take a look at Encode::HanConvert. Note
that the direct mapping via Unicode is lossy, and usually doesn't work
at all.

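The loss can be seen with the encodings that ship with perl alone. The following is a minimal sketch: can_encode() is a hypothetical helper, and the two code points are an illustrative pair (the simplified and traditional forms of the character for "body"), chosen on the assumption that "EUC-CN" covers only the simplified form.

```perl
use strict;
use warnings;
use Encode qw(encode);

# can_encode() is a hypothetical helper, not part of Encode. It reports
# whether every character of $text has a mapping in $encoding.
sub can_encode {
    my ($encoding, $text) = @_;
    my $copy = $text;    # encode() may modify its argument when CHECK is set
    return eval { encode($encoding, $copy, Encode::FB_CROAK); 1 } ? 1 : 0;
}

my $simplified  = "\x{4F53}";    # simplified form
my $traditional = "\x{9AD4}";    # traditional form of the same character

for my $text ($simplified, $traditional) {
    printf "U+%04X in euc-cn: %s\n", ord $text,
        can_encode("euc-cn", $text) ? "mappable" : "not mappable";
}
```

Going through Unicode only re-encodes the same code point; it never substitutes the simplified form for the traditional one, which is the job of a converter such as Encode::HanConvert.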
Please send me suggestions if you want to see more encodings added, such
as "BIG5-GCCS" (superseded by "BIG5-HKSCS"). Other suggestions are
welcome, too.

Encode, Encode::HanConvert

Some of the maps here are generated from GNU libiconv's test files,
with kind permission from Bruno Haible.

The map for "BIG5PLUS" is generated from the BIG52UCS.TXT file, courtesy
of CMEX Taiwan (Chinese Microcomputer Extended Foundation,
<http://www.cmex.org.tw/>).

The map for "BIG5-1984" is supplied by imacat.

The map for "CCCII" is supplied by the Koha Taiwan project.

Audrey Tang <audreyt@audreyt.org>

Copyright 2002-2007 by Audrey Tang <audreyt@audreyt.org>.

This software is released under the MIT license cited below.

The "MIT" License
Permission is hereby granted, free of charge, to any person obtaining a
copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:

The above copyright notice and this permission notice shall be included
in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

perl v5.28.0 2007-11-10 Encode::HanExtra(3)