iconv_unicode(5)      Standards, Environments, and Macros     iconv_unicode(5)

NAME
iconv_unicode - code set conversion tables for Unicode

DESCRIPTION
The following code set conversions are supported:

                        CODE SET CONVERSIONS SUPPORTED
                        ------------------------------
FROM Code Set                           TO Code Set
Code                   FROM             Target Code            TO
                       Filename                                Filename
                       Element                                 Element

ISO 8859-1 (Latin 1)   8859-1           UTF-8                  UTF-8
ISO 8859-2 (Latin 2)   8859-2           UTF-8                  UTF-8
ISO 8859-3 (Latin 3)   8859-3           UTF-8                  UTF-8
ISO 8859-4 (Latin 4)   8859-4           UTF-8                  UTF-8
ISO 8859-5 (Cyrillic)  8859-5           UTF-8                  UTF-8
ISO 8859-6 (Arabic)    8859-6           UTF-8                  UTF-8
ISO 8859-7 (Greek)     8859-7           UTF-8                  UTF-8
ISO 8859-8 (Hebrew)    8859-8           UTF-8                  UTF-8
ISO 8859-9 (Latin 5)   8859-9           UTF-8                  UTF-8
ISO 8859-10 (Latin 6)  8859-10          UTF-8                  UTF-8
Japanese EUC           eucJP            UTF-8                  UTF-8
Chinese/PRC EUC
  (GB 2312-1980)       gb2312           UTF-8                  UTF-8
ISO-2022               iso2022          UTF-8                  UTF-8
Korean EUC             ko_KR-euc        Korean UTF-8           ko_KR-UTF-8
ISO-2022-KR            ko_KR-iso2022-7  Korean UTF-8           ko_KR-UTF-8
Korean Johap
  (KS C 5601-1987)     ko_KR-johap      Korean UTF-8           ko_KR-UTF-8
Korean Johap
  (KS C 5601-1992)     ko_KR-johap92    Korean UTF-8           ko_KR-UTF-8
Korean UTF-8           ko_KR-UTF-8      Korean EUC             ko_KR-euc
Korean UTF-8           ko_KR-UTF-8      Korean Johap           ko_KR-johap
                                          (KS C 5601-1987)
Korean UTF-8           ko_KR-UTF-8      Korean Johap           ko_KR-johap92
                                          (KS C 5601-1992)
KOI8-R (Cyrillic)      KOI8-R           UCS-2                  UCS-2
KOI8-R (Cyrillic)      KOI8-R           UTF-8                  UTF-8
PC Kanji (SJIS)        PCK              UTF-8                  UTF-8
PC Kanji (SJIS)        SJIS             UTF-8                  UTF-8
UCS-2                  UCS-2            KOI8-R (Cyrillic)      KOI8-R
UCS-2                  UCS-2            UCS-4                  UCS-4
UCS-2                  UCS-2            UTF-7                  UTF-7
UCS-2                  UCS-2            UTF-8                  UTF-8
UCS-4                  UCS-4            UCS-2                  UCS-2
UCS-4                  UCS-4            UTF-16                 UTF-16
UCS-4                  UCS-4            UTF-7                  UTF-7
UCS-4                  UCS-4            UTF-8                  UTF-8
UTF-16                 UTF-16           UCS-4                  UCS-4
UTF-16                 UTF-16           UTF-8                  UTF-8
UTF-7                  UTF-7            UCS-2                  UCS-2
UTF-7                  UTF-7            UCS-4                  UCS-4
UTF-7                  UTF-7            UTF-8                  UTF-8
UTF-8                  UTF-8            ISO 8859-1 (Latin 1)   8859-1
UTF-8                  UTF-8            ISO 8859-2 (Latin 2)   8859-2
UTF-8                  UTF-8            ISO 8859-3 (Latin 3)   8859-3
UTF-8                  UTF-8            ISO 8859-4 (Latin 4)   8859-4
UTF-8                  UTF-8            ISO 8859-5 (Cyrillic)  8859-5
UTF-8                  UTF-8            ISO 8859-6 (Arabic)    8859-6
UTF-8                  UTF-8            ISO 8859-7 (Greek)     8859-7
UTF-8                  UTF-8            ISO 8859-8 (Hebrew)    8859-8
UTF-8                  UTF-8            ISO 8859-9 (Latin 5)   8859-9
UTF-8                  UTF-8            ISO 8859-10 (Latin 6)  8859-10
UTF-8                  UTF-8            Japanese EUC           eucJP
UTF-8                  UTF-8            Chinese/PRC EUC        gb2312
                                          (GB 2312-1980)
UTF-8                  UTF-8            ISO-2022               iso2022
UTF-8                  UTF-8            KOI8-R (Cyrillic)      KOI8-R
UTF-8                  UTF-8            PC Kanji (SJIS)        PCK
UTF-8                  UTF-8            PC Kanji (SJIS)        SJIS
UTF-8                  UTF-8            UCS-2                  UCS-2
UTF-8                  UTF-8            UCS-4                  UCS-4
UTF-8                  UTF-8            UTF-16                 UTF-16
UTF-8                  UTF-8            UTF-7                  UTF-7
UTF-8                  UTF-8            Chinese/PRC EUC        zh_CN.euc
                                          (GB 2312-1980)
UTF-8                  UTF-8            ISO 2022-CN            zh_CN.iso2022-7
UTF-8                  UTF-8            Chinese/Taiwan Big5    zh_TW-big5
UTF-8                  UTF-8            Chinese/Taiwan EUC     zh_TW-euc
                                          (CNS 11643-1992)
UTF-8                  UTF-8            ISO 2022-TW            zh_TW-iso2022-7
Chinese/PRC EUC        zh_CN.euc        UTF-8                  UTF-8
  (GB 2312-1980)
ISO 2022-CN            zh_CN.iso2022-7  UTF-8                  UTF-8
Chinese/Taiwan Big5    zh_TW-big5       UTF-8                  UTF-8
Chinese/Taiwan EUC     zh_TW-euc        UTF-8                  UTF-8
  (CNS 11643-1992)
ISO 2022-TW            zh_TW-iso2022-7  UTF-8                  UTF-8

EXAMPLES
Example 1 The library module filename

In the conversion library, /usr/lib/iconv (see iconv(3C)), the library
module filename is composed of two symbolic elements separated by the
percent sign (%). The first symbol specifies the code set that is being
converted; the second symbol specifies the target code, that is, the
code set to which the first one is being converted.

In the conversion table above, the first symbol is termed the "FROM
Filename Element". The second symbol, representing the target code set,
is the "TO Filename Element".

For example, the library module filename to convert from the Korean EUC
code set to the Korean UTF-8 code set is

    ko_KR-euc%ko_KR-UTF-8
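
The naming rule above can be sketched in a few lines of Python. This is
an illustration only and not part of the iconv library: the helper names
module_filename and split_module_filename are hypothetical, and the only
thing taken from this page is the FROM%TO composition rule.

```python
from typing import Tuple


def module_filename(from_element: str, to_element: str) -> str:
    """Join the FROM and TO filename elements with a percent sign."""
    for element in (from_element, to_element):
        # An empty element or an embedded '%' would make the name ambiguous.
        if not element or "%" in element:
            raise ValueError(f"invalid filename element: {element!r}")
    return f"{from_element}%{to_element}"


def split_module_filename(name: str) -> Tuple[str, str]:
    """Split a module filename back into its (FROM, TO) elements."""
    from_element, sep, to_element = name.partition("%")
    if not sep or not from_element or not to_element or "%" in to_element:
        raise ValueError(f"not a FROM%TO module filename: {name!r}")
    return from_element, to_element


# Korean EUC to Korean UTF-8, using the elements from the table above.
print(module_filename("ko_KR-euc", "ko_KR-UTF-8"))
```

Splitting on the first percent sign is sufficient here because none of
the filename elements listed in the table contains one.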

FILES
/usr/lib/iconv/*.so    conversion modules
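
As a rough sketch, the conversions installed on a system could be
enumerated by scanning that directory. The code below assumes that each
module file is named with the FROM%TO filename followed by the .so
suffix; this page does not state that explicitly, so treat it as an
assumption. The function name list_conversions is hypothetical.

```python
from pathlib import Path
from typing import List, Tuple


def list_conversions(module_dir: str = "/usr/lib/iconv") -> List[Tuple[str, str]]:
    """Return sorted (FROM, TO) pairs for files named FROM%TO.so in module_dir."""
    directory = Path(module_dir)
    if not directory.is_dir():
        return []
    pairs = []
    for path in directory.glob("*.so"):
        # path.stem drops only the final ".so", so "zh_CN.euc" stays intact.
        from_element, sep, to_element = path.stem.partition("%")
        if sep and from_element and to_element:
            pairs.append((from_element, to_element))
    return sorted(pairs)


if __name__ == "__main__":
    for from_element, to_element in list_conversions():
        print(f"{from_element} -> {to_element}")
```

Files that do not follow the assumed FROM%TO.so pattern are skipped
rather than reported as errors.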

SEE ALSO
iconv(1), iconv(3C), iconv(5)

Chernov, A., Registration of a Cyrillic Character Set, RFC 1489, RELCOM
Development Team, July 1993.

Chon, K., H. Je Park, and U. Choi, Korean Character Encoding for
Internet Messages, RFC 1557, Solvit Chosun Media, December 1993.

Goldsmith, D., and M. Davis, UTF-7 - A Mail-Safe Transformation Format
of Unicode, RFC 1642, Taligent, Inc., July 1994.

Lee, F., HZ - A Data Format for Exchanging Files of Arbitrarily Mixed
Chinese and ASCII characters, RFC 1843, Stanford University, August
1995.

Murai, J., M. Crispin, and E. van der Poel, Japanese Character Encoding
for Internet Messages, RFC 1468, Keio University, Panda Programming,
June 1993.

Nussbacher, H., and Y. Bourvine, Hebrew Character Encoding for Internet
Messages, RFC 1555, Israeli Inter-University, Hebrew University,
December 1993.

Ohta, M., Character Sets ISO-10646 and ISO-10646-J-1, RFC 1815, Tokyo
Institute of Technology, July 1995.

Ohta, M., and K. Handa, ISO-2022-JP-2: Multilingual Extension of
ISO-2022-JP, RFC 1554, Tokyo Institute of Technology, December 1993.

Reynolds, J., and J. Postel, ASSIGNED NUMBERS, RFC 1700, University of
Southern California/Information Sciences Institute, October 1994.

Simonsen, K., Character Mnemonics & Character Sets, RFC 1345, Rationel
Almen Planlaegning, June 1992.

Spinellis, D., Greek Character Encoding for Electronic Mail Messages,
RFC 1947, SENA S.A., May 1996.

The Unicode Consortium, The Unicode Standard, Version 2.0, Addison
Wesley Developers Press, July 1996.

Wei, Y., Y. Zhang, J. Li, J. Ding, and Y. Jiang, ASCII Printable
Characters-Based Chinese Character Encoding for Internet Messages, RFC
1842, AsiaInfo Services Inc., Harvard University, Rice University,
University of Maryland, August 1995.

Yergeau, F., UTF-8, a transformation format of Unicode and ISO 10646,
RFC 2044, Alis Technologies, October 1996.

Zhu, H., D. Hu, Z. Wang, T. Kao, W. Chang, and M. Crispin, Chinese
Character Encoding for Internet Messages, RFC 1922, Tsinghua
University, China Information Technology Standardization Technical
Committee (CITS), Institute for Information Industry (III), University
of Washington, March 1996.

NOTES
ISO 8859 character sets using Latin alphabetic characters are
distinguished as follows:

ISO 8859-1 (Latin 1)     For most West European languages, including:

                             Albanian     Finnish      Italian
                             Catalan      French       Norwegian
                             Danish       German       Portuguese
                             Dutch        Galician     Spanish
                             English      Irish        Swedish
                             Faeroese     Icelandic

ISO 8859-2 (Latin 2)     For most Latin-written Slavic and Central
                         European languages:

                             Czech        Polish       Slovak
                             German       Rumanian     Slovene
                             Hungarian    Croatian

ISO 8859-3 (Latin 3)     Popularly used for Esperanto, Galician,
                         Maltese, and Turkish.

ISO 8859-4 (Latin 4)     Introduces letters for Estonian, Latvian, and
                         Lithuanian. It is an incomplete predecessor of
                         ISO 8859-10 (Latin 6).

ISO 8859-9 (Latin 5)     Replaces the rarely needed Icelandic letters
                         in ISO 8859-1 (Latin 1) with the Turkish ones.

ISO 8859-10 (Latin 6)    Adds the last Inuit (Greenlandic) and Sami
                         (Lappish) letters that were not included in
                         ISO 8859-4 (Latin 4) to complete coverage of
                         the Nordic area.

SunOS 5.11                       18 Apr 1997                  iconv_unicode(5)