Map(3)                User Contributed Perl Documentation               Map(3)

NAME

       Unicode::Map V0.112 - maps charsets from and to utf16 unicode

SYNOPSIS

           use Unicode::Map();

           my $Map = Unicode::Map->new("ISO-8859-1");

           my $utf16 = $Map->to_unicode("Hello world!");
           # $utf16 eq "\0H\0e\0l\0l\0o\0 \0w\0o\0r\0l\0d\0!"

           my $locale = $Map->from_unicode($utf16);
           # $locale eq "Hello world!"

       A more detailed description follows below.

       To do: a short note about Perl's Unicode perspectives.

DESCRIPTION

       This module converts strings to and from the 2-byte Unicode UCS-2
       format. All mappings go through 2-byte UTF-16 encodings, not through
       the byte-oriented UTF-8 encoding. To convert between the two, use
       Unicode::String.

       For historical reasons this module coexists with Unicode::Map8. Please
       use Unicode::Map8 unless you need to handle two-byte character sets,
       e.g. Chinese GB2312. If you stick to the basic functionality (see
       CONVERSION METHODS below), both modules can be used interchangeably.

       In practice this module will become obsolete sooner or later, as
       Unicode mapping support needs to find its way into Perl's core. If
       you would like to work in this field, please don't hesitate to
       contact Gisle Aas!

       This module cannot deal with UTF-8 directly. Use Unicode::String to
       convert UTF-8 to UTF-16 and vice versa.
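
       The UTF-8 to UTF-16 round trip mentioned above can be sketched as
       follows. This is only an illustration: it uses Perl's core Encode
       module as a stand-in for Unicode::String so that it runs without
       extra modules, and it assumes that the UTF-16 strings are big-endian
       without a byte order mark, as the SYNOPSIS output suggests.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# UTF-8 encoded input bytes: "Gr", U+00FC (two bytes in UTF-8), "n".
my $utf8_bytes = "Gr\xC3\xBCn";

# UTF-8 -> UTF-16 (big-endian, no byte order mark; an assumption here).
my $utf16 = encode("UTF-16BE", decode("UTF-8", $utf8_bytes));

# UTF-16 -> UTF-8, the way back.
my $back = encode("UTF-8", decode("UTF-16BE", $utf16));

# Show the UTF-16 string byte by byte: two bytes per character.
print join(" ", map { sprintf "%02x", $_ } unpack "C*", $utf16), "\n";
```

       A string converted this way has the two-bytes-per-character layout
       that to_unicode returns and from_unicode expects.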

       Characters are mapped according to the binary mapfiles in the
       Unicode::Map hierarchy. Binary mapfiles can also be created with this
       module, which lets you install your own character sets. Refer to
       mkmapfile or to the file REGISTRY in the Unicode::Map hierarchy.

CONVERSION METHODS

       These are probably the only methods you will need from this module.
       Their usage is compatible with Unicode::Map8.

       new $Map = Unicode::Map->new("GB2312-80")

           Returns a new Map object for the GB2312-80 encoding.

       from_unicode
           $dest = $Map -> from_unicode ($src)

           Creates a string in the locale charset representation (the
           character set the Map object was created for) from the UTF-16
           encoded string $src.

       to_unicode
           $dest = $Map -> to_unicode ($src)

           Creates a string in UTF-16 representation from $src.

       to8 Alias for from_unicode. For compatibility with Unicode::Map8.

       to16
           Alias for to_unicode. For compatibility with Unicode::Map8.

WARNINGS

       You can ask Unicode::Map to issue warnings on deprecated or
       incompatible usage with the constants WARN_DEFAULT, WARN_DEPRECATION
       and WARN_COMPATIBILITY. The latter two can be ORed together.

       No special warnings:

           $Unicode::Map::WARNINGS = Unicode::Map::WARN_DEFAULT

       Warnings for deprecated usage:

           $Unicode::Map::WARNINGS = Unicode::Map::WARN_DEPRECATION

       Warnings for incompatible usage:

           $Unicode::Map::WARNINGS = Unicode::Map::WARN_COMPATIBILITY

MAINTENANCE METHODS

       Note: These methods are solely for the maintenance of Unicode::Map.
       Using any of these methods will lead to programs incompatible with
       Unicode::Map8.

       alias
           @list = $Map -> alias ($csid)

           Returns a list of alias names of character set $csid.

       mapping
           $path = $Map -> mapping ($csid)

           Returns the absolute path of the binary character mapping for
           character set $csid according to the REGISTRY file of
           Unicode::Map.

       id  $real_id||"" = $Map -> id ($test_id)

           Returns a valid character set identifier $real_id if $test_id is a
           valid character set name or alias name according to the REGISTRY
           file of Unicode::Map.

       ids @ids = $Map -> ids()

           Returns a list of all character set names defined in the REGISTRY
           file.

       read_text_mapping
           1||0 = $Map -> read_text_mapping ($csid, $path, $style)

           Reads a text mapping of style $style named $csid from the file
           $path. The mapping can then be saved to a file with the method
           write_binary_mapping. $style can be:

            style        description

            "unicode"    A text mapping as of ftp://ftp.unicode.org/MAPPINGS/
            ""           Same as "unicode"
            "reverse"    Similar to "unicode", but both columns are switched
            "keld"       A text mapping as of ftp://dkuug.dk/i18n/charmaps/
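
           The difference between the "unicode" and "reverse" styles can be
           sketched as follows. This is not the module's own parser: the
           helper parse_text_mapping is hypothetical, the "keld" style is
           not handled, and the line layout (hex charset code, whitespace,
           hex Unicode code, optional "#" comment) is assumed from the
           tables published at ftp.unicode.org.

```perl
use strict;
use warnings;

# Parse mapping lines into a hash: charset code => Unicode code point.
# $style is "unicode" (or "") or "reverse"; "keld" is not handled here.
sub parse_text_mapping {
    my ($style, @lines) = @_;
    my %map;
    for my $line (@lines) {
        $line =~ s/#.*//;                        # strip trailing comment
        my ($first, $second) =
            $line =~ /^\s*0x([0-9A-Fa-f]+)\s+0x([0-9A-Fa-f]+)/
            or next;                             # skip blank or undefined entries
        my ($code, $unicode) = (hex $first, hex $second);
        # In "reverse" style the Unicode column comes first.
        ($code, $unicode) = ($unicode, $code) if $style eq "reverse";
        $map{$code} = $unicode;
    }
    return \%map;
}

# Illustrative sample lines, not taken from a real mapping file.
my @sample = (
    "# sample lines in the style of the unicode.org tables",
    "0x41\t0x0041\t# LATIN CAPITAL LETTER A",
    "0xA4\t0x20AC\t# EURO SIGN",
);
my $map = parse_text_mapping("unicode", @sample);
printf "0x%02X -> U+%04X\n", $_, $map->{$_}
    for sort { $a <=> $b } keys %$map;
```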

       src $path = $Map -> src ($csid)

           Returns the path of the textual character mapping for character
           set $csid according to the REGISTRY file of Unicode::Map.

       style
           $style = $Map -> style ($csid)

           Returns the style of the textual character mapping for character
           set $csid according to the REGISTRY file of Unicode::Map.

       write_binary_mapping
           1||0 = $Map -> write_binary_mapping ($csid, $path)

           Stores a mapping that has been loaded via the method
           read_text_mapping in the file $path.

DEPRECATED METHODS

       Some functionality is no longer promoted.

       noise
           Deprecated! Don't use any longer.

       reverse_unicode
           Deprecated! Use Unicode::String::byteswap instead.

BINARY MAPPINGS

       Structure of binary mapfiles

       Unicode character mapping tables contain runs of sequential key codes
       that map to sequential value codes. This property is used to compress
       ("crunch") the maps easily. n (0<n<256) sequential characters are
       represented as a byte count n and the first character code key_start.
       For these subsequences the corresponding value sequences are crunched
       in the same way. A count of 0 is used to start an extended
       information block (which is only partially implemented, though).
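
       The crunching idea can be sketched as follows. This is a simplified
       illustration, not the module's implementation: the helper crunch_runs
       is hypothetical, and it only groups entries whose keys and values
       are both sequential, whereas the file format lets one key sequence
       carry several value sequences.

```perl
use strict;
use warnings;

# Split a mapping (key code => value code) into runs in which both the keys
# and the values are sequential. Each run is [key_start, val_start, n].
sub crunch_runs {
    my ($map) = @_;
    my @runs;
    for my $key (sort { $a <=> $b } keys %$map) {
        my $val  = $map->{$key};
        my $last = $runs[-1];
        if ($last
            && $last->[2] < 255                     # n must stay below 256
            && $key == $last->[0] + $last->[2]      # next key in sequence
            && $val == $last->[1] + $last->[2]) {   # next value in sequence
            $last->[2]++;
        } else {
            push @runs, [$key, $val, 1];
        }
    }
    return @runs;
}

# Illustrative data: three sequential letters and one isolated entry.
my %map = (0x41 => 0x0041, 0x42 => 0x0042, 0x43 => 0x0043, 0xA4 => 0x20AC);
printf "key_start=0x%02X val_start=0x%04X n=%d\n", @$_
    for crunch_runs(\%map);
```

       Four table entries shrink to two runs here, which is the saving the
       format relies on.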

       One could think of two ways to build a binary mapfile. The first
       method would be to write a list of all key codes first and then a
       list of all value codes. The second method, used here, appends to
       each partial key code list the corresponding crunched value code
       lists. This keeps value codes a little closer to their key codes.

       Note: the file format is still in a very fluid state. Do not rely on
       it staying as it is, on this description being free of bugs, or on
       all features being implemented.

       STRUCTURE:

       <main>:
              offset  structure     value

              0x00    word          0x27b8   (magic)
              0x02    @(<extended> || <submapping>)

           The mapfile ends with extended mode <end> in main stream.

       <submapping>:
              0x00    byte != 0     charsize1 (bits)
              0x01    byte          n1 number of chars for one entry
              0x02    byte          charsize2 (bits)
              0x03    byte          n2 number of chars for one entry
              0x04    @(<extended> || <key_seq> || <key_val_seq>)

              bs1=int((charsize1+7)/8), bs2=int((charsize2+7)/8)

           One submapping ends when <mapend> entry occurs.
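
           As a minimal sketch, the four header bytes of a <submapping> and
           the derived byte sizes bs1 and bs2 could be read like this. The
           helper submapping_header is hypothetical and not part of
           Unicode::Map.

```perl
use strict;
use warnings;

# Decode the four header bytes of a <submapping> and derive bs1 and bs2.
sub submapping_header {
    my ($bytes) = @_;
    my ($charsize1, $n1, $charsize2, $n2) = unpack "C4", $bytes;
    return {
        charsize1 => $charsize1, n1 => $n1,
        charsize2 => $charsize2, n2 => $n2,
        bs1 => int(($charsize1 + 7) / 8),   # bytes per key code
        bs2 => int(($charsize2 + 7) / 8),   # bytes per value code
    };
}

# Example header: 8-bit keys mapped to 16-bit values, one char per entry.
my $h = submapping_header(pack("C4", 8, 1, 16, 1));
print "bs1=$h->{bs1} bs2=$h->{bs2}\n";
```

           The rounding matters for sizes that are not multiples of 8: a
           7-bit charsize still needs 1 byte, a 12-bit charsize 2 bytes.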

       <key_val_seq>:
              0x00    size=0|1|2|4  n, number of sequential characters
              size    bs1           key1
              +bs1    bs2           value1
              +bs2    bs1           key2
              +bs1    bs2           value2
              ...

           key_val_seq ends if either the file ends (n = infinite mode) or n
           pairs have been read.

       <key_seq>:
              0x00    byte          n, number of sequential characters
              0x01    bs1           key_start, first character of sequence
              1+bs1   @(<extended> || <val_seq>)

           A key sequence starts with a byte count telling how long the
           sequence is. It is followed by the key start code. After this comes
           a list of value sequences. The list of value sequences ends when
           sum(m) equals n.

       <val_seq>:
              0x00    byte          m, number of sequential characters
              0x01    bs2           val_start, first character of sequence
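
           Decoding one <key_seq> with its value sequences could look
           roughly like this. The sketch rests on assumptions the
           description does not confirm: codes are taken to be stored
           big-endian, embedded <extended> blocks are rejected instead of
           parsed, and the helpers read_uint and decode_key_seq are
           hypothetical.

```perl
use strict;
use warnings;

# Read an unsigned integer of $size bytes at $$pos and advance $$pos.
# Big-endian byte order is an assumption of this sketch.
sub read_uint {
    my ($buf, $pos, $size) = @_;
    my $value = 0;
    $value = ($value << 8) | ord(substr($buf, $$pos++, 1)) for 1 .. $size;
    return $value;
}

# Decode one <key_seq> starting at $$pos into a key => value hash.
sub decode_key_seq {
    my ($buf, $pos, $bs1, $bs2) = @_;
    my $n   = read_uint($buf, $pos, 1);      # number of sequential keys
    my $key = read_uint($buf, $pos, $bs1);   # key_start
    my %map;
    my $seen = 0;
    while ($seen < $n) {                     # value sequences until sum(m) == n
        my $m = read_uint($buf, $pos, 1);
        die "extended blocks are not handled in this sketch\n" if $m == 0;
        my $val = read_uint($buf, $pos, $bs2);
        $map{ $key++ } = $val++ for 1 .. $m;
        $seen += $m;
    }
    return \%map;
}

# n=4 keys from 0x41; values: 3 from 0x0041, then 1 at 0x20AC (made-up data).
my $buf = pack "C C C n C n", 4, 0x41, 3, 0x0041, 1, 0x20AC;
my $pos = 0;
my $seq = decode_key_seq($buf, \$pos, 1, 2);
printf "0x%02X -> 0x%04X\n", $_, $seq->{$_}
    for sort { $a <=> $b } keys %$seq;
```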

       <extended>:
              0x00    byte          0
              0x01    byte          ftype
              0x02    byte          fsize, size of following structure
              0x03    fsize bytes   something

           For future extensions or private use, byte streams of 1..255
           bytes can be inserted here. ftype can have the values 30..255;
           values 0..29 are reserved. The modes are not fully defined yet
           and could change. They will be explained later.
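
           A private-use <extended> block could be written and read back
           as in the following sketch. The helpers pack_extended and
           unpack_extended are hypothetical; the sketch only covers the
           layout given above and refuses the reserved ftype values.

```perl
use strict;
use warnings;

# Build an <extended> block: a 0 byte, ftype, fsize, then fsize payload bytes.
sub pack_extended {
    my ($ftype, $payload) = @_;
    die "ftype must be 30..255\n" unless $ftype >= 30 && $ftype <= 255;
    die "payload must be 1..255 bytes\n"
        unless length($payload) >= 1 && length($payload) <= 255;
    # "C/a*" writes a one-byte length followed by the payload itself.
    return pack "C C C/a*", 0, $ftype, $payload;
}

# Split an <extended> block back into ftype and payload.
sub unpack_extended {
    my ($block) = @_;
    my ($zero, $ftype, $payload) = unpack "C C C/a*", $block;
    die "not an extended block\n" unless defined $zero && $zero == 0;
    return ($ftype, $payload);
}

my $block = pack_extended(30, "note");
my ($ftype, $payload) = unpack_extended($block);
print "ftype=$ftype payload=$payload length=", length($block), "\n";
```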

TO BE DONE

       -   Something clever when a character has no translation.

       -   Direct charset -> charset mapping.

       -   Better performance.

       -   Support for mappings according to RFC 1345.

SEE ALSO

       -   File "REGISTRY" and binary mappings in directory "Unicode/Map" of
           your Perl library path

       -   recode(1), map(1), mkmapfile(1), Unicode::Map(3), Unicode::Map8(3),
           Unicode::String(3), Unicode::CharName(3), mirrorMappings(1)

       -   RFC 1345

       -   Mappings at the Unicode Consortium: ftp://ftp.unicode.org/MAPPINGS/

       -   Registered Internet character sets: ftp://dkuug.dk/i18n/charmaps/

       -   To do: more references

AUTHOR

       Martin Schwartz <martin@nacho.de>

POD ERRORS

       Hey! The above document had some coding errors, which are explained
       below:

       Around line 1112:
           You can't have =items (as at line 1118) unless the first thing
           after the =over is an =item

perl v5.32.0                      2020-07-28                            Map(3)