Map(3)            User Contributed Perl Documentation            Map(3)

Unicode::Map V0.112 - maps character sets from and to UTF-16 Unicode

    use Unicode::Map();

    $Map = new Unicode::Map("ISO-8859-1");

    $utf16 = $Map -> to_unicode ("Hello world!");
      => $utf16 eq "\0H\0e\0l\0l\0o\0 \0w\0o\0r\0l\0d\0!"

    $locale = $Map -> from_unicode ($utf16);
      => $locale eq "Hello world!"

A more detailed description follows below.

To do: short note about Perl's Unicode perspectives.

This module converts strings to and from the 2-byte Unicode UCS-2
format. All mappings go through 2-byte UTF-16 encodings, not through
UTF-8 with its 1-byte code units. To convert between UTF-8 and UTF-16,
use Unicode::String.

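As a rough illustration of the 2-byte format, the sketch below builds
the UTF-16 form of an ISO-8859-1 string with core Perl only. It assumes
big-endian byte order, as the synopsis output suggests, and relies on
ISO-8859-1 byte values being equal to the corresponding Unicode code
points. It does not call Unicode::Map, and latin1_to_utf16be is a
hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: widen each ISO-8859-1 byte to one big-endian
# 16-bit unit. This works only because Latin-1 byte values equal the
# corresponding Unicode code points.
sub latin1_to_utf16be {
    my ($bytes) = @_;
    return pack("n*", unpack("C*", $bytes));
}

my $utf16 = latin1_to_utf16be("Hello world!");
printf "%d bytes: %s\n", length($utf16), unpack("H*", $utf16);
```

For any other character set a real mapping table is needed, which is
what the binary mapfiles provide.
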
For historical reasons this module coexists with Unicode::Map8. Please
use Unicode::Map8 unless you need to handle two-byte character sets,
e.g. Chinese GB2312. If you stick to the basic functionality (see the
documentation), you can use both modules interchangeably.

In practice this module will become obsolete sooner or later, because
Unicode mapping support needs to get into Perl's core somehow. If you
would like to work in this field, please don't hesitate to contact
Gisle Aas!

This module cannot deal with UTF-8 directly. Use Unicode::String to
convert UTF-8 to UTF-16 and vice versa.

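The text above recommends Unicode::String for this conversion. As an
alternative that needs no extra module, the sketch below swaps in the
core Encode module (bundled with Perl since 5.8). The helper names are
hypothetical, and "UTF-16BE" (big-endian, no byte order mark) is an
assumption chosen to match the synopsis output.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helpers built on the core Encode module, not on
# Unicode::String. "UTF-16BE" is big-endian without a byte order mark.
sub utf8_to_utf16be {
    my ($utf8_bytes) = @_;
    return encode("UTF-16BE", decode("UTF-8", $utf8_bytes));
}

sub utf16be_to_utf8 {
    my ($utf16_bytes) = @_;
    return encode("UTF-8", decode("UTF-16BE", $utf16_bytes));
}

my $utf8  = "Gr\xc3\xbc\xc3\x9fe";      # "Gruesse" with u-umlaut and sharp s, as UTF-8 bytes
my $utf16 = utf8_to_utf16be($utf8);
print unpack("H*", $utf16), "\n";
```

The resulting UTF-16 string could then be passed to from_unicode.
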
Character mapping follows the data in the binary mapfiles of the
Unicode::Map hierarchy. Binary mapfiles can also be created with this
module, which lets you install your own character sets. Refer to
mkmapfile or to the file REGISTRY in the Unicode::Map hierarchy.

These are probably the only methods you will need from this module.
Their usage is compatible with Unicode::Map8.

new
    $Map = new Unicode::Map("GB2312-80")

    Returns a new Map object for the GB2312-80 encoding.

from_unicode
    $dest = $Map -> from_unicode ($src)

    Creates a string in the local character set (the one $Map was
    created for) from the UTF-16 encoded string $src.

to_unicode
    $dest = $Map -> to_unicode ($src)

    Creates a string in UTF-16 representation from $src.

to8
    Alias for from_unicode. For compatibility with Unicode::Map8.

to16
    Alias for to_unicode. For compatibility with Unicode::Map8.

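A minimal sketch of a round trip through these methods follows. The
helper roundtrip is hypothetical; it assumes that new returns a false
value when no map object can be created, which the text above does not
state, and it returns nothing when Unicode::Map is not installed.

```perl
use strict;
use warnings;

# Hypothetical helper: convert $text to UTF-16 and back using only the
# methods documented above. Returns nothing if Unicode::Map is missing
# or (assumption) new yields a false value.
sub roundtrip {
    my ($charset, $text) = @_;
    eval { require Unicode::Map; 1 } or return;
    my $map = Unicode::Map->new($charset) or return;
    my $utf16 = $map->to_unicode($text);
    return $map->from_unicode($utf16);
}

my $back = roundtrip("ISO-8859-1", "Hello world!");
print defined $back ? "round trip: $back\n" : "no result\n";
```
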
You can ask Unicode::Map to issue warnings on deprecated or
incompatible usage with the constants WARN_DEFAULT, WARN_DEPRECATION
or WARN_COMPATIBILITY. The latter two can be ORed together.

No special warnings:
    $Unicode::Map::WARNINGS = Unicode::Map::WARN_DEFAULT

Warnings for deprecated usage:
    $Unicode::Map::WARNINGS = Unicode::Map::WARN_DEPRECATION

Warnings for incompatible usage:
    $Unicode::Map::WARNINGS = Unicode::Map::WARN_COMPATIBILITY

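Combining the two warning classes might look like the fragment below.
It assumes the constants are callable as subroutines (hence the
explicit parentheses) and that Unicode::Map is installed.

```perl
use strict;
use warnings;
use Unicode::Map;

# Ask for deprecation and compatibility warnings at the same time.
$Unicode::Map::WARNINGS =
    Unicode::Map::WARN_DEPRECATION() | Unicode::Map::WARN_COMPATIBILITY();
```
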
Note: These methods are solely for the maintenance of Unicode::Map.
Using any of these methods will lead to programs incompatible with
Unicode::Map8.

alias
    @list = $Map -> alias ($csid)

    Returns a list of alias names of character set $csid.

mapping
    $path = $Map -> mapping ($csid)

    Returns the absolute path of the binary character mapping for
    character set $csid according to the REGISTRY file of
    Unicode::Map.

id
    $real_id || "" = $Map -> id ($test_id)

    Returns a valid character set identifier $real_id if $test_id is
    a valid character set name or alias name according to the REGISTRY
    file of Unicode::Map.

ids
    @ids = $Map -> ids()

    Returns a list of all character set names defined in the REGISTRY
    file.

read_text_mapping
    1 || 0 = $Map -> read_text_mapping ($csid, $path, $style)

    Reads a text mapping of style $style for character set $csid from
    the file $path. The mapping can then be saved to a file with the
    method write_binary_mapping. $style can be:

    style       description

    "unicode"   A text mapping as of ftp://ftp.unicode.org/MAPPINGS/
    ""          Same as "unicode"
    "reverse"   Similar to unicode, but both columns are switched
    "keld"      A text mapping as of ftp://dkuug.dk/i18n/charmaps/

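To make the "unicode" and "reverse" styles more concrete, here is a
small core-Perl sketch of a parser for such text mappings. It assumes
the usual layout of the files at ftp.unicode.org (two hexadecimal
columns written as 0x..., then an optional # comment) and does not
cover the "keld" style. parse_text_mapping is a hypothetical helper,
not part of Unicode::Map.

```perl
use strict;
use warnings;

# Hypothetical parser for the two-column text format. Returns a hash
# reference mapping charset code => Unicode code point (as integers).
sub parse_text_mapping {
    my ($text, $style) = @_;
    $style = "unicode" if !defined $style || $style eq "";
    my %map;
    for my $line (split /\n/, $text) {
        $line =~ s/#.*//;                       # strip trailing comment
        my ($col1, $col2) =
            $line =~ /^\s*0x([0-9A-Fa-f]+)\s+0x([0-9A-Fa-f]+)/
            or next;                            # skip lines without two codes
        ($col1, $col2) = ($col2, $col1) if $style eq "reverse";
        $map{ hex $col1 } = hex $col2;
    }
    return \%map;
}

my $sample = <<'END';
# charset code, Unicode code point, comment
0x41  0x0041  # LATIN CAPITAL LETTER A
0xE4  0x00E4  # LATIN SMALL LETTER A WITH DIAERESIS
END

my $map = parse_text_mapping($sample, "unicode");
printf "0x%02X => U+%04X\n", $_, $map->{$_}
    for sort { $a <=> $b } keys %$map;
```
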
src
    $path = $Map -> src ($csid)

    Returns the path of the textual character mapping for character
    set $csid according to the REGISTRY file of Unicode::Map.

style
    $style = $Map -> style ($csid)

    Returns the style of the textual character mapping for character
    set $csid according to the REGISTRY file of Unicode::Map.

write_binary_mapping
    1 || 0 = $Map -> write_binary_mapping ($csid, $path)

    Stores a mapping that has been loaded via the method
    read_text_mapping in the file $path.

Some functionality is no longer recommended.

noise
    Deprecated! Do not use any longer.

reverse_unicode
    Deprecated! Use Unicode::String::byteswap instead.

Structure of binary mapfiles

Unicode character mapping tables contain runs of sequential key codes
paired with sequential value codes. This property is used to compress
("crunch") the maps easily. n (0<n<256) sequential characters are
represented as a byte count n and the first character code key_start.
For these subsequences the corresponding value sequences are crunched
together as well. The value 0 is used to start an extended information
block (which is only partially implemented, though).

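The crunching idea can be sketched in core Perl as follows. This is a
simplified illustration, not the module's own code: it starts a new run
whenever either the key or the value stops being sequential, whereas
the actual format lets one key run carry several value runs.
crunch_runs is a hypothetical helper.

```perl
use strict;
use warnings;

# Hypothetical helper: split a key => value map into runs in which both
# keys and values increase by one, capped at 255 entries per run.
# Each run is returned as [key_start, val_start, n].
sub crunch_runs {
    my ($map) = @_;
    my @runs;
    for my $key (sort { $a <=> $b } keys %$map) {
        my $val  = $map->{$key};
        my $last = $runs[-1];
        if ($last
            && $last->[2] < 255
            && $key == $last->[0] + $last->[2]
            && $val == $last->[1] + $last->[2]) {
            $last->[2]++;                       # extend the current run
        } else {
            push @runs, [$key, $val, 1];        # start a new run
        }
    }
    return \@runs;
}

my %map = (0x41 => 0x0041, 0x42 => 0x0042, 0x43 => 0x0043, 0x80 => 0x20AC);
printf "key 0x%02X, value 0x%04X, n=%d\n", @$_ for @{ crunch_runs(\%map) };
```
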
There are two conceivable ways to lay out a binary mapfile. The first
would be to write a list of all key codes, followed by a list of all
value codes. The second, used here, appends to each partial key code
list the corresponding crunched value code lists. This keeps value
codes a little closer to their key codes.

Note: the file format is still very much in flux. Do not rely on it
staying as it is, on this description being free of bugs, or on all
features being implemented.

STRUCTURE:

<main>:
    offset  structure   value

    0x00    word        0x27b8 (magic)
    0x02    @(<extended> || <submapping>)

    The mapfile ends with extended mode <end> in main stream.

<submapping>:
    0x00    byte != 0   charsize1 (bits)
    0x01    byte        n1    number of chars for one entry
    0x02    byte        charsize2 (bits)
    0x03    byte        n2    number of chars for one entry
    0x04    @(<extended> || <key_seq> || <key_val_seq>)

    bs1=int((charsize1+7)/8), bs2=int((charsize2+7)/8)

    One submapping ends when <mapend> entry occurs.

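The <submapping> header and the derived byte sizes can be read with a
single unpack call, as in this sketch. parse_submapping_header is a
hypothetical helper; the field names follow the table above.

```perl
use strict;
use warnings;

# Hypothetical helper: decode the 4-byte <submapping> header and derive
# the byte sizes bs1 and bs2 from the bit sizes.
sub parse_submapping_header {
    my ($buf) = @_;
    my ($charsize1, $n1, $charsize2, $n2) = unpack("C4", $buf);
    return {
        charsize1 => $charsize1, n1 => $n1,
        charsize2 => $charsize2, n2 => $n2,
        bs1 => int(($charsize1 + 7) / 8),
        bs2 => int(($charsize2 + 7) / 8),
    };
}

# An 8-bit character set mapped to 16-bit Unicode.
my $hdr = parse_submapping_header(pack("C4", 8, 1, 16, 1));
print "bs1=$hdr->{bs1} bs2=$hdr->{bs2}\n";
```
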
<key_val_seq>:
    0x00    size=0|1|2|4    n, number of sequential characters
    size    bs1             key1
    +bs1    bs2             value1
    +bs2    bs1             key2
    +bs1    bs2             value2
    ...

    key_val_seq ends when either the file ends (n = infinite mode) or
    n pairs have been read.

<key_seq>:
    0x00    byte    n, number of sequential characters
    0x01    bs1     key_start, first character of sequence
    1+bs1   @(<extended> || <val_seq>)

    A key sequence starts with a byte count telling how long the
    sequence is. It is followed by the key start code. After this
    comes a list of value sequences. The list of value sequences ends
    when sum(m) equals n.

<val_seq>:
    0x00    byte    m, number of sequential characters
    0x01    bs2     val_start, first character of sequence

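A decoder for a single <key_seq> might be sketched as below. The
description does not state the byte order of multi-byte codes, so
big-endian is an assumption here, and <extended> blocks inside the
sequence are not handled. Both helper names are hypothetical.

```perl
use strict;
use warnings;

# Read one unsigned integer of $size bytes at position $$pos_ref and
# advance the position. Big-endian byte order is an assumption.
sub read_uint {
    my ($buf, $pos_ref, $size) = @_;
    my $value = 0;
    $value = ($value << 8) | ord(substr($buf, $$pos_ref++, 1)) for 1 .. $size;
    return $value;
}

# Hypothetical decoder for one <key_seq> that contains only <val_seq>
# entries. Returns a hash reference of key => value.
sub decode_key_seq {
    my ($buf, $bs1, $bs2) = @_;
    my $pos  = 0;
    my $n    = read_uint($buf, \$pos, 1);
    my $key  = read_uint($buf, \$pos, $bs1);
    my $done = 0;
    my %map;
    while ($done < $n) {
        my $m = read_uint($buf, \$pos, 1);
        last if $m == 0;        # would start an <extended> block
        my $val = read_uint($buf, \$pos, $bs2);
        $map{ $key++ } = $val++ for 1 .. $m;
        $done += $m;            # the list ends when sum(m) equals n
    }
    return \%map;
}

# n=4 keys starting at 0x41; values: 3 from 0x0041, then 1 from 0x20AC.
my $buf = pack("C C C n C n", 4, 0x41, 3, 0x0041, 1, 0x20AC);
my $map = decode_key_seq($buf, 1, 2);
printf "0x%02X => 0x%04X\n", $_, $map->{$_} for sort { $a <=> $b } keys %$map;
```
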
<extended>:
    0x00    byte    0
    0x01    byte    ftype
    0x02    byte    fsize, size of following structure
    0x03    fsize bytes    something

    For future extensions or private use, byte streams of 1..255
    bytes can be inserted here. ftype can have the values 30..255;
    the values 0..29 are reserved. The modes are not yet fully defined
    and may change. They will be explained later.

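Because an <extended> block carries its own size, a reader that does
not understand a given ftype can still skip it. A minimal sketch,
assuming only the layout in the table above (parse_extended is a
hypothetical helper):

```perl
use strict;
use warnings;

# Hypothetical helper: parse an <extended> block starting at $pos.
# Returns (ftype, payload, position after the block), or an empty list
# if there is no complete extended block at $pos.
sub parse_extended {
    my ($buf, $pos) = @_;
    return if $pos + 3 > length $buf;               # header incomplete
    my ($marker, $ftype, $fsize) = unpack("C3", substr($buf, $pos, 3));
    return if $marker != 0;                         # not an <extended> block
    return if $pos + 3 + $fsize > length $buf;      # payload truncated
    return ($ftype, substr($buf, $pos + 3, $fsize), $pos + 3 + $fsize);
}

# A private block of type 30 with a 5-byte payload, then other data.
my $buf = pack("C3 a5", 0, 30, 5, "hello") . "\x03";
my ($ftype, $payload, $next) = parse_extended($buf, 0);
print "ftype=$ftype payload=$payload next=$next\n";
```
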
- Something clever when a character has no translation.

- Direct charset -> charset mapping.

- Better performance.

- Support for mappings according to RFC 1345.

- File "REGISTRY" and binary mappings in directory "Unicode/Map" of
  your Perl library path

- recode(1), map(1), mkmapfile(1), Unicode::Map(3), Unicode::Map8(3),
  Unicode::String(3), Unicode::CharName(3), mirrorMappings(1)

- RFC 1345

- Mappings at the Unicode Consortium: ftp://ftp.unicode.org/MAPPINGS/

- Registered Internet character sets: ftp://dkuug.dk/i18n/charmaps/

- To do: more references

Martin Schwartz <martin@nacho.de>

perl v5.8.8                       2002-03-19                       Map(3)