Map(3)            User Contributed Perl Documentation            Map(3)

Unicode::Map V0.112 - maps charsets from and to UTF-16 Unicode

    use Unicode::Map();

    my $Map = Unicode::Map->new("ISO-8859-1");

    my $utf16 = $Map->to_unicode("Hello world!");
    # $utf16 eq "\0H\0e\0l\0l\0o\0 \0w\0o\0r\0l\0d\0!"

    my $locale = $Map->from_unicode($utf16);
    # $locale eq "Hello world!"
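For comparison, a similar ISO-8859-1 to UTF-16 round trip can be sketched with the Encode module that ships with Perl 5.8 and later. This is not Unicode::Map's API; it only reproduces the byte layout that to_unicode is shown producing above (UTF-16 big-endian, two bytes per character, no byte order mark).

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# ISO-8859-1 bytes, as they would be passed to to_unicode.
my $bytes = "Hello world!";

# Locale bytes -> Perl characters -> UTF-16 big-endian bytes (no BOM).
my $utf16 = encode("UTF-16BE", decode("ISO-8859-1", $bytes));

# UTF-16BE bytes -> Perl characters -> ISO-8859-1 bytes.
my $back = encode("ISO-8859-1", decode("UTF-16BE", $utf16));

# prints "24 bytes of UTF-16, round trip ok"
printf "%d bytes of UTF-16, round trip %s\n",
    length $utf16, ($back eq $bytes ? "ok" : "failed");
```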

A more detailed description follows below.

To do: short note about Perl's Unicode perspectives.

This module converts strings from and to the 2-byte Unicode UCS-2
format. All mappings go through the 2-byte UTF-16 encoding, not
through the variable-length UTF-8 encoding. To convert between the
two, use Unicode::String.

For historical reasons this module coexists with Unicode::Map8. Please
use Unicode::Map8 unless you need to handle two-byte character sets,
e.g. Chinese GB2312. If you stick to the basic functionality (the
conversion methods described below), you can use either module the
same way.

In practice this module will disappear sooner or later, as Unicode
mapping support needs to get into Perl's core in some form. If you
would like to work in this field, please don't hesitate to contact
Gisle Aas!

This module can't deal with UTF-8 directly. Use Unicode::String to
convert UTF-8 to UTF-16 and vice versa.
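Unicode::String is the recommended tool for this. To show what such a conversion does at the byte level, here is a pure-Perl sketch that uses only built-ins (utf8::decode, utf8::encode, pack and unpack). The function names utf8_to_ucs2 and ucs2_to_utf8 are hypothetical, and the sketch handles only characters up to U+FFFF, which is all that UCS-2 can represent.

```perl
use strict;
use warnings;

# Hypothetical helper: UTF-8 bytes -> UCS-2 big-endian bytes.
sub utf8_to_ucs2 {
    my ($bytes) = @_;
    my $chars = $bytes;                      # work on a copy
    utf8::decode($chars) or die "invalid UTF-8\n";
    my @code_points = map { ord } split //, $chars;
    die "code point above U+FFFF cannot be stored in UCS-2\n"
        if grep { $_ > 0xFFFF } @code_points;
    return pack("n*", @code_points);         # 16-bit big-endian units
}

# Hypothetical helper: UCS-2 big-endian bytes -> UTF-8 bytes.
sub ucs2_to_utf8 {
    my ($ucs2) = @_;
    my $chars = join "", map { chr } unpack("n*", $ucs2);
    utf8::encode($chars);                    # characters -> UTF-8 bytes
    return $chars;
}
```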

Character mapping follows the data in the binary mapfiles of the
Unicode::Map hierarchy. Binary mapfiles can also be created with this
module, which lets you install your own character sets. Refer to
mkmapfile or to the file REGISTRY in the Unicode::Map hierarchy.

These are probably the only methods you will need from this module.
Their usage is compatible with Unicode::Map8.

new $Map = Unicode::Map->new("GB2312-80")

Returns a new Map object for the GB2312-80 encoding.

from_unicode
$dest = $Map -> from_unicode ($src)

Creates a string in the locale charset representation from the
UTF-16 encoded string $src.

to_unicode
$dest = $Map -> to_unicode ($src)

Creates a UTF-16 encoded string from $src, which is given in the
locale charset representation.

to8 Alias for from_unicode. For compatibility with Unicode::Map8.

to16
Alias for to_unicode. For compatibility with Unicode::Map8.
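To illustrate how the Unicode::Map8-compatible aliases relate to the main methods, here is a hypothetical toy class, My::Latin1Map, that handles only ISO-8859-1 and installs to8 and to16 as aliases through typeglob assignment. It is not how Unicode::Map is implemented; in particular, replacing unmappable characters with "?" is a choice made for this sketch only.

```perl
use strict;
use warnings;

# Hypothetical toy class with the same method names as Unicode::Map.
package My::Latin1Map;

sub new { my ($class) = @_; return bless {}, $class }

# ISO-8859-1 bytes -> UTF-16BE: every byte becomes a 16-bit code unit.
sub to_unicode {
    my ($self, $src) = @_;
    return pack("n*", unpack("C*", $src));
}

# UTF-16BE -> ISO-8859-1; code units above 0xFF have no Latin-1
# equivalent and become "?" in this sketch.
sub from_unicode {
    my ($self, $src) = @_;
    return pack("C*", map { ($_ > 0xFF) ? ord("?") : $_ } unpack("n*", $src));
}

# Unicode::Map8-style aliases: the same code under a second name.
{
    no warnings 'once';
    *to16 = \&to_unicode;
    *to8  = \&from_unicode;
}

package main;

my $map  = My::Latin1Map->new;
my $u16  = $map->to16("Hi");     # "\0H\0i"
my $back = $map->to8($u16);      # "Hi"
```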

You can ask Unicode::Map to issue warnings on deprecated or
incompatible usage by setting $Unicode::Map::WARNINGS to one of the
constants WARN_DEFAULT, WARN_DEPRECATION or WARN_COMPATIBILITY. The
latter two can be ORed together.

No special warnings:

    $Unicode::Map::WARNINGS = Unicode::Map::WARN_DEFAULT;

Warnings for deprecated usage:

    $Unicode::Map::WARNINGS = Unicode::Map::WARN_DEPRECATION;

Warnings for incompatible usage:

    $Unicode::Map::WARNINGS = Unicode::Map::WARN_COMPATIBILITY;
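Enabling both kinds of warnings would presumably read
$Unicode::Map::WARNINGS = Unicode::Map::WARN_DEPRECATION | Unicode::Map::WARN_COMPATIBILITY;
The numeric values of these constants are not documented here, so the sketch below defines hypothetical stand-ins only to show how ORed flags are combined and how an individual flag is tested with a bitwise AND.

```perl
use strict;
use warnings;

# Hypothetical flag values; Unicode::Map's real constants may differ.
use constant {
    WARN_DEFAULT       => 0,
    WARN_DEPRECATION   => 1 << 0,
    WARN_COMPATIBILITY => 1 << 1,
};

# Enable both kinds of warnings by ORing the flags together.
my $warnings = WARN_DEPRECATION | WARN_COMPATIBILITY;

# Returns 1 if $flag is set in $setting, 0 otherwise.
sub wants_warning {
    my ($setting, $flag) = @_;
    return ($setting & $flag) ? 1 : 0;
}
```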

Note: These methods are solely for the maintenance of Unicode::Map.
Using any of them will make your programs incompatible with
Unicode::Map8.

alias
@list = $Map -> alias ($csid)

Returns a list of alias names of character set $csid.

mapping
$path = $Map -> mapping ($csid)

Returns the absolute path of the binary character mapping for
character set $csid according to the REGISTRY file of Unicode::Map.

id $real_id||"" = $Map -> id ($test_id)

Returns a valid character set identifier $real_id if $test_id is a
valid character set name or alias name according to the REGISTRY file
of Unicode::Map, and "" otherwise.

ids @ids = $Map -> ids()

Returns a list of all character set names defined in the REGISTRY
file.

read_text_mapping
1||0 = $Map -> read_text_mapping ($csid, $path, $style)

Reads a text mapping of style $style named $csid from the file
$path. The mapping can then be saved to a file with the method
write_binary_mapping. $style can be:

style       description

"unicode"   A text mapping in the format of ftp://ftp.unicode.org/MAPPINGS/
""          Same as "unicode"
"reverse"   Similar to "unicode", but both columns are switched
"keld"      A text mapping in the format of ftp://dkuug.dk/i18n/charmaps/
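A minimal sketch of reading the "unicode" and "reverse" styles into a hash, assuming the layout of the unicode.org MAPPINGS files: two whitespace-separated hexadecimal columns (charset code first, then Unicode code) with "#" starting a comment. parse_text_mapping is a hypothetical helper, not read_text_mapping itself, and the "keld" style is not handled.

```perl
use strict;
use warnings;

# Hypothetical helper: parse a "unicode"- or "reverse"-style text mapping
# from an open filehandle into a hash of charset code => Unicode code.
sub parse_text_mapping {
    my ($fh, $style) = @_;
    $style = "unicode" if !defined $style || $style eq "";   # "" means "unicode"
    die "style '$style' is not supported by this sketch\n"
        unless $style eq "unicode" || $style eq "reverse";

    my %map;
    while (my $line = <$fh>) {
        $line =~ s/#.*//;                        # strip comments
        my ($local, $uni) = split ' ', $line;    # whitespace-separated columns
        next unless defined $uni;                # skip blank or unmapped lines
        ($local, $uni) = ($uni, $local) if $style eq "reverse";
        $map{ hex $local } = hex $uni;           # hex() accepts a "0x" prefix
    }
    return \%map;
}
```

An in-memory filehandle (open my $fh, '<', \$text) is a convenient way to try it on a few sample lines.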

src $path = $Map -> src ($csid)

Returns the path of the textual character mapping for character set
$csid according to the REGISTRY file of Unicode::Map.

style
$style = $Map -> style ($csid)

Returns the style of the textual character mapping for character set
$csid according to the REGISTRY file of Unicode::Map.

write_binary_mapping
1||0 = $Map -> write_binary_mapping ($csid, $path)

Stores a mapping that has been loaded with the method
read_text_mapping in the file $path.

Some functionality is no longer promoted.

noise
Deprecated! Don't use it any longer.

reverse_unicode
Deprecated! Use Unicode::String::byteswap instead.

Structure of binary Mapfiles

Unicode character mapping tables contain runs of consecutive key
codes whose value codes are consecutive as well. This property is used
to crunch the maps easily. A run of n (0<n<256) consecutive characters
is represented by a byte count n and the first character code,
key_start. The corresponding value sequences of these subsequences are
crunched together as well. The value 0 starts an extended information
block (which is only partially implemented, though).
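A minimal sketch of the crunching step, assuming the input is a plain hash of key code to value code. crunch is a hypothetical helper, not part of Unicode::Map; the field names key_start, n, val_start and m follow the <key_seq> and <val_seq> structures described below. Key runs are capped at 255 characters so that n fits the byte count.

```perl
use strict;
use warnings;

# Hypothetical helper: crunch a key => value mapping into key runs
# (<key_seq>), each holding one or more value runs (<val_seq>).
sub crunch {
    my (%map) = @_;
    my @key_seqs;
    for my $key (sort { $a <=> $b } keys %map) {
        my $val = $map{$key};
        my $ks  = $key_seqs[-1];

        if ($ks && $key == $ks->{key_start} + $ks->{n} && $ks->{n} < 255) {
            # Key extends the current key run (n stays below 256).
            $ks->{n}++;
            my $vs = $ks->{vals}[-1];
            if ($val == $vs->{val_start} + $vs->{m}) {
                $vs->{m}++;                          # value run continues too
            }
            else {
                push @{ $ks->{vals} }, { val_start => $val, m => 1 };
            }
        }
        else {
            # Gap in the keys (or run full): start a new key run.
            push @key_seqs, {
                key_start => $key,
                n         => 1,
                vals      => [ { val_start => $val, m => 1 } ],
            };
        }
    }
    return @key_seqs;
}

my @key_seqs = crunch(
    0x41 => 0x0041, 0x42 => 0x0042, 0x43 => 0x0043,   # one run of 3
    0xA4 => 0x20AC, 0xA5 => 0x00A5,                   # keys consecutive, values not
);
```

For every key run this produces, the value run counts m add up to n, which is the termination condition of a <key_seq>.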

There are two obvious ways to lay out a binary mapfile. The first
would be to write a list of all key codes, followed by a list of all
value codes. The second, used here, appends the corresponding crunched
value code lists to each partial key code list. This keeps value codes
a little closer to their key codes.

Note: the file format is still very much in flux. Do not rely on it
staying as it is, on this description being bug-free, or on all
features being implemented.

STRUCTURE:

<main>:
offset  structure                        value

0x00    word                             0x27b8 (magic)
0x02    @(<extended> || <submapping>)

The mapfile ends with the extended mode <end> in the main stream.

<submapping>:
0x00    byte != 0                        charsize1 (bits)
0x01    byte                             n1, number of chars for one entry
0x02    byte                             charsize2 (bits)
0x03    byte                             n2, number of chars for one entry
0x04    @(<extended> || <key_seq> || <key_val_seq>)

bs1 = int((charsize1+7)/8), bs2 = int((charsize2+7)/8)

A submapping ends when a <mapend> entry occurs.
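A sketch of checking the magic word and reading a submapping header, assuming that the magic word is stored big-endian (the description above does not state the byte order). read_header is hypothetical and only looks at the first item after the magic; a real reader would loop over the whole main stream, where a leading 0 byte means an <extended> block instead.

```perl
use strict;
use warnings;

use constant MAGIC => 0x27b8;

# Hypothetical helper: validate the magic word and decode the
# <submapping> header that follows it.
sub read_header {
    my ($buf) = @_;
    die "file too short\n" if length($buf) < 6;

    # Assumption: "word" means a 16-bit big-endian value.
    my $magic = unpack("n", substr($buf, 0, 2));
    die sprintf("bad magic 0x%04x\n", $magic) if $magic != MAGIC;

    my ($charsize1, $n1, $charsize2, $n2) = unpack("C4", substr($buf, 2, 4));
    die "extended block, not a submapping\n" if $charsize1 == 0;

    return {
        charsize1 => $charsize1, n1 => $n1,
        charsize2 => $charsize2, n2 => $n2,
        bs1 => int(($charsize1 + 7) / 8),   # bytes per key code
        bs2 => int(($charsize2 + 7) / 8),   # bytes per value code
    };
}
```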

<key_val_seq>:
0x00    size=0|1|2|4    n, number of sequential characters
size    bs1             key1
+bs1    bs2             value1
+bs2    bs1             key2
+bs1    bs2             value2
...

A key_val_seq ends either when the file ends (n = infinite mode) or
when n pairs have been read.

<key_seq>:
0x00    byte    n, number of sequential characters
0x01    bs1     key_start, first character of sequence
1+bs1   @(<extended> || <val_seq>)

A key sequence starts with a byte count telling how long the sequence
is. It is followed by the key start code and then by a list of value
sequences. The list of value sequences ends when the sum of their
counts m equals n.

<val_seq>:
0x00    byte    m, number of sequential characters
0x01    bs2     val_start, first character of sequence
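A sketch of decoding one <key_seq> with its <val_seq> list, assuming multi-byte codes are stored big-endian. decode_key_seq and read_uint are hypothetical helpers; an <extended> block inside the value list (a 0 byte where m is expected) is rejected rather than interpreted.

```perl
use strict;
use warnings;

# Hypothetical helper: read a $size-byte big-endian unsigned integer.
sub read_uint {
    my ($buf, $pos, $size) = @_;
    die "truncated mapfile at offset $pos\n" if $pos + $size > length $buf;
    my $v = 0;
    $v = ($v << 8) | ord(substr($buf, $pos + $_, 1)) for 0 .. $size - 1;
    return $v;
}

# Decode one <key_seq> at $pos; returns (\%key_to_value, $next_pos).
sub decode_key_seq {
    my ($buf, $pos, $bs1, $bs2) = @_;
    my $n = read_uint($buf, $pos, 1);
    die "byte 0 starts an <extended> block, not a <key_seq>\n" if $n == 0;
    my $key_start = read_uint($buf, $pos + 1, $bs1);
    $pos += 1 + $bs1;

    my %map;
    my $done = 0;                         # sum of m over val_seqs read so far
    while ($done < $n) {
        my $m = read_uint($buf, $pos, 1);
        die "unsupported <extended> block at offset $pos\n" if $m == 0;
        die "val_seq overruns key_seq at offset $pos\n" if $done + $m > $n;
        my $val_start = read_uint($buf, $pos + 1, $bs2);
        $pos += 1 + $bs2;
        # Key key_start+done+i maps to value val_start+i.
        $map{ $key_start + $done + $_ } = $val_start + $_ for 0 .. $m - 1;
        $done += $m;
    }
    return (\%map, $pos);
}
```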

<extended>:
0x00    byte          0
0x01    byte          ftype
0x02    byte          fsize, size of following structure
0x03    fsize bytes   payload

For future extensions or private use, streams of 1..255 bytes can be
inserted here. ftype can have values 30..255; values 0..29 are
reserved. The modes are not fully defined yet and could change. They
will be explained later.
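Because every <extended> block carries its own length, a reader can skip blocks it does not understand. A minimal sketch, with read_extended and is_private_ftype as hypothetical helpers; interpreting the payload is left to the caller, since the modes are not fully defined.

```perl
use strict;
use warnings;

# Hypothetical helper: parse one <extended> block at $pos
# (0x00, ftype, fsize, then fsize payload bytes).
# Returns (ftype, payload, position just after the block).
sub read_extended {
    my ($buf, $pos) = @_;
    die "truncated extended header at offset $pos\n" if $pos + 3 > length $buf;
    my ($zero, $ftype, $fsize) = unpack("C3", substr($buf, $pos, 3));
    die "not an extended block at offset $pos\n" if $zero != 0;
    die "truncated extended payload at offset $pos\n"
        if $pos + 3 + $fsize > length $buf;
    return ($ftype, substr($buf, $pos + 3, $fsize), $pos + 3 + $fsize);
}

# ftype 0..29 is reserved; 30..255 is available for extensions.
sub is_private_ftype { my ($ftype) = @_; return $ftype >= 30 ? 1 : 0 }
```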

- Something clever when a character has no translation.

- Direct charset -> charset mapping.

- Better performance.

- Support for mappings according to RFC 1345.

- File "REGISTRY" and binary mappings in directory "Unicode/Map" of
your perl library path

- recode(1), map(1), mkmapfile(1), Unicode::Map(3), Unicode::Map8(3),
Unicode::String(3), Unicode::CharName(3), mirrorMappings(1)

- RFC 1345

- Mappings at the Unicode Consortium ftp://ftp.unicode.org/MAPPINGS/

- Registered Internet character sets ftp://dkuug.dk/i18n/charmaps/

- To do: more references

Martin Schwartz <martin@nacho.de>

perl v5.32.1                    2021-01-27                       Map(3)