Map(3)                User Contributed Perl Documentation               Map(3)

NAME

       Unicode::Map V0.112 - maps charsets to and from UTF-16 Unicode

SYNOPSIS

           use Unicode::Map();

           $Map = new Unicode::Map("ISO-8859-1");

           $utf16 = $Map -> to_unicode ("Hello world!");
             => $utf16 eq "\0H\0e\0l\0l\0o\0 \0w\0o\0r\0l\0d\0!"

           $locale = $Map -> from_unicode ($utf16);
             => $locale eq "Hello world!"

       A more detailed description follows below.

       To do: a short note about Perl's Unicode perspectives.
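
       The byte layout shown for $utf16 in the synopsis can be reproduced
       without the module, because ISO-8859-1 byte values coincide with the
       first 256 Unicode code points. The following is a minimal sketch in
       core Perl, not Unicode::Map's implementation; the helper names
       latin1_to_utf16be and utf16be_to_latin1 are made up for this
       illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: widen each ISO-8859-1 byte to a 16-bit big-endian unit.
sub latin1_to_utf16be {
    my ($bytes) = @_;
    return pack("n*", unpack("C*", $bytes));
}

# Hypothetical helper: the reverse direction. Code points above 0xFF have no
# ISO-8859-1 equivalent and are replaced by "?" in this sketch.
sub utf16be_to_latin1 {
    my ($utf16) = @_;
    return pack("C*", map { $_ > 0xFF ? ord("?") : $_ } unpack("n*", $utf16));
}

my $utf16  = latin1_to_utf16be("Hello world!");
my $locale = utf16be_to_latin1($utf16);

printf "%d bytes, round trip: %s\n", length($utf16), $locale;
```

       For other character sets the code values differ from the Unicode
       values, which is where the mapping tables of Unicode::Map come in.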

DESCRIPTION

       This module converts strings to and from the 2-byte Unicode UCS-2
       format. All mappings go through 2-byte UTF-16 encodings, not through
       the byte-oriented UTF-8 encoding. To convert between UTF-8 and UTF-16,
       use Unicode::String.

       For historical reasons this module coexists with Unicode::Map8. Please
       use Unicode::Map8 unless you need to handle two-byte character sets,
       e.g. Chinese GB2312. If you stick to the basic functionality (see
       CONVERSION METHODS), you can use both modules interchangeably.

       In practice this module will become obsolete sooner or later, as
       Unicode mapping support needs to find its way into Perl's core. If you
       would like to work in this field, please don't hesitate to contact
       Gisle Aas!

       This module cannot deal with UTF-8 directly. Use Unicode::String to
       convert UTF-8 to UTF-16 and vice versa.

       Character mapping follows the data in the binary mapfiles of the
       Unicode::Map hierarchy. Binary mapfiles can also be created with this
       module, which lets you install your own character sets. Refer to
       mkmapfile or the file REGISTRY in the Unicode::Map hierarchy.
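
       Unicode::String is the module recommended above for the UTF-8 step.
       Where it is not installed, the same conversion can be sketched with
       the core Encode module (shipped with Perl since 5.8). This is a
       substitute for illustration only and not part of Unicode::Map; the
       sample string and variable names are assumptions.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# UTF-8 encoded input bytes for "Gruesse" with umlaut and sharp s:
# u-umlaut is C3 BC, sharp s is C3 9F.
my $utf8_bytes = "Gr\xC3\xBC\xC3\x9Fe";

# UTF-8 bytes -> Perl character string -> big-endian UTF-16 bytes,
# i.e. the 2-byte layout that to_unicode and from_unicode work with.
my $utf16 = encode("UTF-16BE", decode("UTF-8", $utf8_bytes));

# And back again: UTF-16BE bytes -> UTF-8 bytes.
my $back = encode("UTF-8", decode("UTF-16BE", $utf16));

# Show the UTF-16 bytes in hex.
print join(" ", map { sprintf "%02X", $_ } unpack("C*", $utf16)), "\n";
```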

CONVERSION METHODS

       These are probably the only methods you will need from this module.
       Their usage is compatible with Unicode::Map8.

       new $Map = new Unicode::Map("GB2312-80")

           Returns a new Map object for the GB2312-80 encoding.

       from_unicode
           $dest = $Map -> from_unicode ($src)

           Creates a string in the locale charset representation from the
           UTF-16 encoded string $src.

       to_unicode
           $dest = $Map -> to_unicode ($src)

           Creates a string in UTF-16 representation from $src.

       to8 Alias for from_unicode. For compatibility with Unicode::Map8.

       to16
           Alias for to_unicode. For compatibility with Unicode::Map8.
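
       A round trip through the methods above might look like the following
       sketch. It assumes that Unicode::Map is installed and that a mapping
       for "ISO-8859-1" is registered; the helper name roundtrip is
       hypothetical, and the eval guard is only there so that the script
       exits quietly where the module is missing.

```perl
use strict;
use warnings;

# Hypothetical helper: convert $text to UTF-16 and back with one Map object.
# Returns undef if Unicode::Map cannot be loaded.
sub roundtrip {
    my ($charset, $text) = @_;
    return unless eval { require Unicode::Map; 1 };

    my $Map   = Unicode::Map->new($charset);  # same as: new Unicode::Map($charset)
    my $utf16 = $Map->to_unicode($text);      # to16 is documented as an alias
    return $Map->from_unicode($utf16);        # to8 is documented as an alias
}

my $result = roundtrip("ISO-8859-1", "Hello world!");
print defined $result ? "round trip: $result\n" : "Unicode::Map not available\n";
```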

WARNINGS

       You can ask Unicode::Map to issue warnings on deprecated or
       incompatible usage by setting $Unicode::Map::WARNINGS to one of the
       constants WARN_DEFAULT, WARN_DEPRECATION or WARN_COMPATIBILITY. The
       latter two can be ORed together.

       No special warnings:
           $Unicode::Map::WARNINGS = Unicode::Map::WARN_DEFAULT

       Warnings for deprecated usage:
           $Unicode::Map::WARNINGS = Unicode::Map::WARN_DEPRECATION

       Warnings for incompatible usage:
           $Unicode::Map::WARNINGS = Unicode::Map::WARN_COMPATIBILITY
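
       Combining the two warning classes, as described above, could be
       written as in this sketch. It assumes that the WARN_* constants are
       ordinary constant subs that can be called with parentheses, and it
       only touches the setting when Unicode::Map can be loaded. The variable
       $enabled is just a flag for this example.

```perl
use strict;
use warnings;
no warnings 'once';    # $Unicode::Map::WARNINGS is named only once in this file

my $enabled = 0;
if (eval { require Unicode::Map; 1 }) {
    # Ask for both deprecation and compatibility warnings.
    $Unicode::Map::WARNINGS = Unicode::Map::WARN_DEPRECATION()
                            | Unicode::Map::WARN_COMPATIBILITY();
    $enabled = 1;
}
print $enabled ? "warnings enabled\n" : "Unicode::Map not available\n";
```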

MAINTENANCE METHODS

       Note: These methods are solely for the maintenance of Unicode::Map.
       Using any of these methods will lead to programs incompatible with
       Unicode::Map8.

       alias
           @list = $Map -> alias ($csid)

           Returns a list of alias names of character set $csid.

       mapping
           $path = $Map -> mapping ($csid)

           Returns the absolute path of the binary character mapping for
           character set $csid according to the REGISTRY file of Unicode::Map.

       id  $real_id||"" = $Map -> id ($test_id)

           Returns a valid character set identifier $real_id if $test_id is a
           valid character set name or alias name according to the REGISTRY
           file of Unicode::Map; otherwise returns an empty string.

       ids @ids = $Map -> ids()

           Returns a list of all character set names defined in the REGISTRY
           file.

       read_text_mapping
           1||0 = $Map -> read_text_mapping ($csid, $path, $style)

           Reads a text mapping of style $style named $csid from the file
           $path. The mapping can then be saved to a file with the method
           write_binary_mapping. $style can be:

            style        description

            "unicode"    A text mapping as of ftp://ftp.unicode.org/MAPPINGS/
            ""           Same as "unicode"
            "reverse"    Similar to "unicode", but with both columns switched
            "keld"       A text mapping as of ftp://dkuug.dk/i18n/charmaps/

       src $path = $Map -> src ($csid)

           Returns the path of the textual character mapping for character
           set $csid according to the REGISTRY file of Unicode::Map.

       style
           $style = $Map -> style ($csid)

           Returns the style of the textual character mapping for character
           set $csid according to the REGISTRY file of Unicode::Map.

       write_binary_mapping
           1||0 = $Map -> write_binary_mapping ($csid, $path)

           Stores a mapping that has been loaded via the method
           read_text_mapping in the file $path.
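
       The "unicode" and "reverse" styles accepted by read_text_mapping refer
       to plain-text tables with two hexadecimal columns and an optional "#"
       comment, as published under ftp://ftp.unicode.org/MAPPINGS/. The
       sketch below shows how such lines could be read in plain Perl. It
       illustrates the text format only and is not the parser used by
       Unicode::Map; the sub name parse_text_mapping and the sample lines
       (written in the style of an ISO-8859-15 table) are assumptions.

```perl
use strict;
use warnings;

# Hypothetical helper: parse lines such as "0xA4\t0x20AC\t# EURO SIGN".
# With style "unicode" (or "") column 1 is the charset code and column 2 the
# Unicode code point; with style "reverse" the two columns are switched.
sub parse_text_mapping {
    my ($style, @lines) = @_;
    my %to_unicode;
    for my $line (@lines) {
        $line =~ s/#.*//;                       # strip trailing comment
        my ($code, $uni) =
            $line =~ /^\s*0x([0-9A-Fa-f]+)\s+0x([0-9A-Fa-f]+)/
            or next;                            # skip blank or incomplete lines
        ($code, $uni) = ($uni, $code) if $style eq "reverse";
        $to_unicode{hex $code} = hex $uni;
    }
    return \%to_unicode;
}

my @sample = (
    "# sample lines, not a complete table",
    "0x41\t0x0041\t# LATIN CAPITAL LETTER A",
    "0xA4\t0x20AC\t# EURO SIGN",
);
my $map = parse_text_mapping("unicode", @sample);
printf "0x%02X -> U+%04X\n", $_, $map->{$_} for sort { $a <=> $b } keys %$map;
```

       The "keld" style uses a different layout and is not covered by this
       sketch.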

DEPRECATED METHODS

       Some functionality is no longer promoted.

       noise
           Deprecated! Do not use any longer.

       reverse_unicode
           Deprecated! Use Unicode::String::byteswap instead.

BINARY MAPPINGS

       Structure of binary Mapfiles

       Unicode character mapping tables contain runs of sequential key codes
       and sequential value codes. This property is used to compress the maps
       easily. A run of n (0<n<256) sequential characters is represented as a
       byte count n and the first character code key_start. For these runs
       the corresponding value sequences are compressed in the same way. The
       value 0 is used to start an extended information block (which is only
       partially implemented, though).

       One could think of two ways to build a binary mapfile. The first
       method would be to write a list of all key codes first, and then a
       list of all value codes. The second method, used here, appends to each
       partial key code list the corresponding compressed value code lists.
       This keeps value codes a little closer to their key codes.

       Note: the file format is still in flux. Do not rely on it staying as
       described here, on this description being free of errors, or on all
       features being implemented.

       STRUCTURE:

       <main>:
              offset  structure     value

              0x00    word          0x27b8   (magic)
              0x02    @(<extended> || <submapping>)

           The mapfile ends with extended mode <end> in main stream.

       <submapping>:
              0x00    byte != 0     charsize1 (bits)
              0x01    byte          n1 number of chars for one entry
              0x02    byte          charsize2 (bits)
              0x03    byte          n2 number of chars for one entry
              0x04    @(<extended> || <key_seq> || <key_val_seq>)

              bs1=int((charsize1+7)/8), bs2=int((charsize2+7)/8)

           One submapping ends when <mapend> entry occurs.

       <key_val_seq>:
              0x00    size=0|1|2|4  n, number of sequential characters
              size    bs1           key1
              +bs1    bs2           value1
              +bs2    bs1           key2
              +bs1    bs2           value2
              ...

           key_val_seq ends if either the file ends (n = infinite mode) or n
           pairs have been read.

       <key_seq>:
              0x00    byte          n, number of sequential characters
              0x01    bs1           key_start, first character of sequence
              1+bs1   @(<extended> || <val_seq>)

           A key sequence starts with a byte count telling how long the
           sequence is. It is followed by the key start code. After this comes
           a list of value sequences. The list of value sequences ends when
           sum(m) equals n.

       <val_seq>:
              0x00    byte          m, number of sequential characters
              0x01    bs2           val_start, first character of sequence

       <extended>:
              0x00    byte          0
              0x01    byte          ftype
              0x02    byte          fsize, size of following structure
              0x03    fsize bytes   something

           For future extensions or private use one can insert streams of
           1..255 bytes here. ftype can have values 30..255; values 0..29 are
           reserved. Modes are not fully defined yet and could change. They
           will be explained later.
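
       To make the <key_seq>/<val_seq> layout concrete, the following sketch
       expands one hand-built key sequence into key/value pairs. It reflects
       one reading of the description above and is not taken from the source
       of Unicode::Map. It assumes big-endian byte order for multi-byte codes
       (the description does not state the byte order) and that a <val_seq>
       of length m covers m consecutive values, and it does not handle
       <extended> blocks. The names byte_size, read_code and expand_key_seq
       are hypothetical.

```perl
use strict;
use warnings;

# bs = int((charsize + 7) / 8): bytes needed for a code of charsize bits.
sub byte_size { my ($bits) = @_; return int(($bits + 7) / 8) }

# Read a $bs-byte unsigned integer at offset $$pos and advance $$pos.
# Big-endian byte order is an assumption of this sketch.
sub read_code {
    my ($buf, $pos, $bs) = @_;
    my $code = 0;
    $code = ($code << 8) | ord(substr($buf, $$pos++, 1)) for 1 .. $bs;
    return $code;
}

# Expand one <key_seq> starting at offset 0 of $buf into a key => value hash.
sub expand_key_seq {
    my ($buf, $bs1, $bs2) = @_;
    my $pos  = 0;
    my $n    = read_code($buf, \$pos, 1);        # number of sequential keys
    my $key  = read_code($buf, \$pos, $bs1);     # key_start
    my $seen = 0;
    my %map;
    while ($seen < $n) {                         # list ends when sum(m) == n
        my $m = read_code($buf, \$pos, 1);       # length of this <val_seq>
        die "extended blocks are not handled in this sketch\n" if $m == 0;
        my $val = read_code($buf, \$pos, $bs2);  # val_start
        $map{$key++} = $val++ for 1 .. $m;
        $seen += $m;
    }
    return \%map;
}

# charsize1 = 8 bits, charsize2 = 16 bits
my ($bs1, $bs2) = (byte_size(8), byte_size(16));
# n=3, key_start=0x41; <val_seq> m=2 from 0x0041; <val_seq> m=1 from 0x20AC
my $key_seq = pack("C C C n C n", 3, 0x41, 2, 0x0041, 1, 0x20AC);
my $pairs   = expand_key_seq($key_seq, $bs1, $bs2);
printf "0x%02X -> 0x%04X\n", $_, $pairs->{$_} for sort { $a <=> $b } keys %$pairs;
```

       In this example three sequential keys are covered by two value runs,
       so the eight bytes of the key sequence stand for three key/value
       pairs.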

TO BE DONE

       -   Something clever when a character has no translation.

       -   Direct charset -> charset mapping.

       -   Better performance.

       -   Support for mappings according to RFC 1345.

SEE ALSO

       -   File "REGISTRY" and binary mappings in directory "Unicode/Map" of
           your perl library path

       -   recode(1), map(1), mkmapfile(1), Unicode::Map(3), Unicode::Map8(3),
           Unicode::String(3), Unicode::CharName(3), mirrorMappings(1)

       -   RFC 1345

       -   Mappings at the Unicode Consortium: ftp://ftp.unicode.org/MAPPINGS/

       -   Registered Internet character sets: ftp://dkuug.dk/i18n/charmaps/

       -   To do: more references

AUTHOR

       Martin Schwartz <martin@nacho.de>

perl v5.8.8                       2002-03-19                            Map(3)