iconv_unicode(5)      Standards, Environments, and Macros     iconv_unicode(5)

NAME

       iconv_unicode - code set conversion tables for Unicode

DESCRIPTION

       The following code set conversions are supported:

                             CODE SET CONVERSIONS SUPPORTED
                             ------------------------------
         FROM Code Set                             TO Code Set
         Code Set                FROM              Target Code Set     TO
                                 Filename                              Filename
                                 Element                               Element

         ISO 8859-1 (Latin 1)    8859-1            UTF-8               UTF-8
         ISO 8859-2 (Latin 2)    8859-2            UTF-8               UTF-8
         ISO 8859-3 (Latin 3)    8859-3            UTF-8               UTF-8
         ISO 8859-4 (Latin 4)    8859-4            UTF-8               UTF-8
         ISO 8859-5 (Cyrillic)   8859-5            UTF-8               UTF-8
         ISO 8859-6 (Arabic)     8859-6            UTF-8               UTF-8
         ISO 8859-7 (Greek)      8859-7            UTF-8               UTF-8
         ISO 8859-8 (Hebrew)     8859-8            UTF-8               UTF-8
         ISO 8859-9 (Latin 5)    8859-9            UTF-8               UTF-8
         ISO 8859-10 (Latin 6)   8859-10           UTF-8               UTF-8
         Japanese EUC            eucJP             UTF-8               UTF-8
         Chinese/PRC EUC         gb2312            UTF-8               UTF-8
         (GB 2312-1980)
         ISO-2022                iso2022           UTF-8               UTF-8
         Korean EUC              ko_KR-euc         Korean UTF-8        ko_KR-UTF-8
         ISO-2022-KR             ko_KR-iso2022-7   Korean UTF-8        ko_KR-UTF-8
         Korean Johap            ko_KR-johap       Korean UTF-8        ko_KR-UTF-8
         (KS C 5601-1987)
         Korean Johap            ko_KR-johap92     Korean UTF-8        ko_KR-UTF-8
         (KS C 5601-1992)
         Korean UTF-8            ko_KR-UTF-8       Korean EUC          ko_KR-euc
         Korean UTF-8            ko_KR-UTF-8       Korean Johap        ko_KR-johap
                                                   (KS C 5601-1987)
         Korean UTF-8            ko_KR-UTF-8       Korean Johap        ko_KR-johap92
                                                   (KS C 5601-1992)
         KOI8-R (Cyrillic)       KOI8-R            UCS-2               UCS-2
         KOI8-R (Cyrillic)       KOI8-R            UTF-8               UTF-8
         PC Kanji (SJIS)         PCK               UTF-8               UTF-8
         PC Kanji (SJIS)         SJIS              UTF-8               UTF-8
         UCS-2                   UCS-2             KOI8-R (Cyrillic)   KOI8-R
         UCS-2                   UCS-2             UCS-4               UCS-4

                             CODE SET CONVERSIONS SUPPORTED
                             ------------------------------
         FROM Code Set                      TO Code Set
         Code Set           FROM            Target Code Set         TO
                            Filename                                Filename
                            Element                                 Element

         UCS-2              UCS-2           UTF-7                   UTF-7
         UCS-2              UCS-2           UTF-8                   UTF-8
         UCS-4              UCS-4           UCS-2                   UCS-2
         UCS-4              UCS-4           UTF-16                  UTF-16
         UCS-4              UCS-4           UTF-7                   UTF-7
         UCS-4              UCS-4           UTF-8                   UTF-8
         UTF-16             UTF-16          UCS-4                   UCS-4
         UTF-16             UTF-16          UTF-8                   UTF-8
         UTF-7              UTF-7           UCS-2                   UCS-2
         UTF-7              UTF-7           UCS-4                   UCS-4
         UTF-7              UTF-7           UTF-8                   UTF-8
         UTF-8              UTF-8           ISO 8859-1 (Latin 1)    8859-1
         UTF-8              UTF-8           ISO 8859-2 (Latin 2)    8859-2
         UTF-8              UTF-8           ISO 8859-3 (Latin 3)    8859-3
         UTF-8              UTF-8           ISO 8859-4 (Latin 4)    8859-4
         UTF-8              UTF-8           ISO 8859-5 (Cyrillic)   8859-5
         UTF-8              UTF-8           ISO 8859-6 (Arabic)     8859-6
         UTF-8              UTF-8           ISO 8859-7 (Greek)      8859-7
         UTF-8              UTF-8           ISO 8859-8 (Hebrew)     8859-8
         UTF-8              UTF-8           ISO 8859-9 (Latin 5)    8859-9
         UTF-8              UTF-8           ISO 8859-10 (Latin 6)   8859-10
         UTF-8              UTF-8           Japanese EUC            eucJP
         UTF-8              UTF-8           Chinese/PRC EUC         gb2312
                                            (GB 2312-1980)
         UTF-8              UTF-8           ISO-2022                iso2022
         UTF-8              UTF-8           KOI8-R (Cyrillic)       KOI8-R
         UTF-8              UTF-8           PC Kanji (SJIS)         PCK
         UTF-8              UTF-8           PC Kanji (SJIS)         SJIS
         UTF-8              UTF-8           UCS-2                   UCS-2
         UTF-8              UTF-8           UCS-4                   UCS-4
         UTF-8              UTF-8           UTF-16                  UTF-16
         UTF-8              UTF-8           UTF-7                   UTF-7
         UTF-8              UTF-8           Chinese/PRC EUC         zh_CN.euc
                                            (GB 2312-1980)

                             CODE SET CONVERSIONS SUPPORTED
                             ------------------------------
         FROM Code Set                           TO Code Set
         Code Set              FROM              Target Code Set       TO
                               Filename                                Filename
                               Element                                 Element

         UTF-8                 UTF-8             ISO 2022-CN           zh_CN.iso2022-7
         UTF-8                 UTF-8             Chinese/Taiwan Big5   zh_TW-big5
         UTF-8                 UTF-8             Chinese/Taiwan EUC    zh_TW-euc
                                                 (CNS 11643-1992)
         UTF-8                 UTF-8             ISO 2022-TW           zh_TW-iso2022-7
         Chinese/PRC EUC       zh_CN.euc         UTF-8                 UTF-8
         (GB 2312-1980)
         ISO 2022-CN           zh_CN.iso2022-7   UTF-8                 UTF-8
         Chinese/Taiwan Big5   zh_TW-big5        UTF-8                 UTF-8
         Chinese/Taiwan EUC    zh_TW-euc         UTF-8                 UTF-8
         (CNS 11643-1992)
         ISO 2022-TW           zh_TW-iso2022-7   UTF-8                 UTF-8
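
       As a rough illustration of what one row of these tables means, the
       sketch below re-encodes KOI8-R bytes as UTF-8. It does not use the
       /usr/lib/iconv modules: Python's built-in codecs stand in for them,
       and the codec names ("koi8_r", "utf_8") are Python's own names, not
       the filename elements listed above. The helper name convert is
       hypothetical.

```python
def convert(data: bytes, from_codec: str, to_codec: str) -> bytes:
    """Re-encode data from one code set to another by way of Unicode."""
    # Decode the source bytes to Unicode text, then encode to the target.
    return data.decode(from_codec).encode(to_codec)


# KOI8-R byte 0xC1 is CYRILLIC SMALL LETTER A (U+0430).
koi8_bytes = b"\xc1"
utf8_bytes = convert(koi8_bytes, "koi8_r", "utf_8")
print(utf8_bytes.hex())  # d0b0
```

       The single KOI8-R byte becomes a two-byte UTF-8 sequence, which is
       why converted files can change size.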

EXAMPLES

       Example 1 The library module filename

       In the conversion library, /usr/lib/iconv (see iconv(3C)), the library
       module filename is composed of two symbolic elements separated by the
       percent sign (%). The first element names the code set being
       converted; the second names the target code set, that is, the code set
       to which the first is converted.

       In the conversion tables above, the first element is termed the "FROM
       Filename Element". The second element, representing the target code
       set, is the "TO Filename Element".

       For example, the library module filename to convert from the Korean
       EUC code set to the Korean UTF-8 code set is

           ko_KR-euc%ko_KR-UTF-8
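
       The naming rule can be sketched in a few lines of Python. The function
       names here are hypothetical and are not part of any iconv interface.
       The ".so" suffix in module_path is an assumption taken from the
       /usr/lib/iconv/*.so pattern under FILES; this page does not spell out
       the full on-disk name.

```python
import posixpath
from typing import Tuple

ICONV_DIR = "/usr/lib/iconv"  # conversion library directory named in this page


def module_name(from_element: str, to_element: str) -> str:
    """Join the FROM and TO filename elements with a percent sign."""
    return f"{from_element}%{to_element}"


def split_module_name(name: str) -> Tuple[str, str]:
    """Split a module name back into its (FROM, TO) filename elements."""
    from_element, sep, to_element = name.partition("%")
    if not sep or not from_element or not to_element:
        raise ValueError(f"not a FROM%TO module name: {name!r}")
    return from_element, to_element


def module_path(from_element: str, to_element: str, suffix: str = ".so") -> str:
    """Build a candidate module path; the suffix is an assumption (see FILES)."""
    return posixpath.join(ICONV_DIR, module_name(from_element, to_element) + suffix)


print(module_name("ko_KR-euc", "ko_KR-UTF-8"))  # ko_KR-euc%ko_KR-UTF-8
print(split_module_name("UCS-2%UTF-8"))         # ('UCS-2', 'UTF-8')
```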

FILES

       /usr/lib/iconv/*.so    conversion modules

SEE ALSO

       iconv(1), iconv(3C), iconv(5)

       Chernov, A., Registration of a Cyrillic Character Set, RFC 1489, RELCOM
       Development Team, July 1993.

       Chon, K., H. Je Park, and U. Choi, Korean Character Encoding for
       Internet Messages, RFC 1557, Solvit Chosun Media, December 1993.

       Goldsmith, D., and M. Davis, UTF-7 - A Mail-Safe Transformation Format
       of Unicode, RFC 1642, Taligent, Inc., July 1994.

       Lee, F., HZ - A Data Format for Exchanging Files of Arbitrarily Mixed
       Chinese and ASCII characters, RFC 1843, Stanford University, August
       1995.

       Murai, J., M. Crispin, and E. van der Poel, Japanese Character Encoding
       for Internet Messages, RFC 1468, Keio University, Panda Programming,
       June 1993.

       Nussbacher, H., and Y. Bourvine, Hebrew Character Encoding for Internet
       Messages, RFC 1555, Israeli Inter-University, Hebrew University,
       December 1993.

       Ohta, M., Character Sets ISO-10646 and ISO-10646-J-1, RFC 1815, Tokyo
       Institute of Technology, July 1995.

       Ohta, M., and K. Handa, ISO-2022-JP-2: Multilingual Extension of
       ISO-2022-JP, RFC 1554, Tokyo Institute of Technology, December 1993.

       Reynolds, J., and J. Postel, ASSIGNED NUMBERS, RFC 1700, University of
       Southern California/Information Sciences Institute, October 1994.

       Simonsen, K., Character Mnemonics & Character Sets, RFC 1345, Rationel
       Almen Planlaegning, June 1992.

       Spinellis, D., Greek Character Encoding for Electronic Mail Messages,
       RFC 1947, SENA S.A., May 1996.

       The Unicode Consortium, The Unicode Standard, Version 2.0, Addison
       Wesley Developers Press, July 1996.

       Wei, Y., Y. Zhang, J. Li, J. Ding, and Y. Jiang, ASCII Printable
       Characters-Based Chinese Character Encoding for Internet Messages, RFC
       1842, AsiaInfo Services Inc., Harvard University, Rice University,
       University of Maryland, August 1995.

       Yergeau, F., UTF-8, a transformation format of Unicode and ISO 10646,
       RFC 2044, Alis Technologies, October 1996.

       Zhu, H., D. Hu, Z. Wang, T. Kao, W. Chang, and M. Crispin, Chinese
       Character Encoding for Internet Messages, RFC 1922, Tsinghua
       University, China Information Technology Standardization Technical
       Committee (CITS), Institute for Information Industry (III), University
       of Washington, March 1996.

NOTES

       ISO 8859 character sets using Latin alphabetic characters are
       distinguished as follows:

       ISO 8859-1 (Latin 1)     For most West European languages, including:

                                Albanian             Finnish               Italian
                                Catalan              French                Norwegian
                                Danish               German                Portuguese
                                Dutch                Galician              Spanish
                                English              Irish                 Swedish
                                Faeroese             Icelandic

       ISO 8859-2 (Latin 2)     For most Latin-written Slavic and Central
                                European languages:

                                Czech                Polish                Slovak
                                German               Rumanian              Slovene
                                Hungarian            Croatian

       ISO 8859-3 (Latin 3)     Popularly used for Esperanto, Galician,
                                Maltese, and Turkish.

       ISO 8859-4 (Latin 4)     Introduces letters for Estonian, Latvian, and
                                Lithuanian. It is an incomplete predecessor of
                                ISO 8859-10 (Latin 6).

       ISO 8859-9 (Latin 5)     Replaces the rarely needed Icelandic letters
                                in ISO 8859-1 (Latin 1) with the Turkish ones.

       ISO 8859-10 (Latin 6)    Adds the last Inuit (Greenlandic) and Sami
                                (Lappish) letters that were not included in
                                ISO 8859-4 (Latin 4) to complete coverage of
                                the Nordic area.
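
       The relationship between Latin 1 and Latin 5 described above can be
       checked with a short sketch. It assumes Python's built-in "iso8859_1"
       and "iso8859_9" codecs as stand-ins for the conversion modules, and
       the helper name differing_bytes is hypothetical. Characters are
       printed as code points so the output does not depend on the
       terminal's encoding.

```python
def differing_bytes(codec_a: str, codec_b: str, start: int = 0xA0, stop: int = 0x100):
    """Return (byte, char_a, char_b) for each byte the two codecs decode differently."""
    diffs = []
    for value in range(start, stop):
        raw = bytes([value])
        char_a = raw.decode(codec_a)
        char_b = raw.decode(codec_b)
        if char_a != char_b:
            diffs.append((value, char_a, char_b))
    return diffs


# List the positions where Latin 5 departs from Latin 1.
for value, latin1_char, latin5_char in differing_bytes("iso8859_1", "iso8859_9"):
    print(f"0x{value:02X}: U+{ord(latin1_char):04X} -> U+{ord(latin5_char):04X}")
```

       Only six byte values differ; each holds an Icelandic letter in Latin 1
       and a Turkish letter in Latin 5.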
SunOS 5.11                        18 Apr 1997                 iconv_unicode(5)
