Map8(3)               User Contributed Perl Documentation              Map8(3)

NAME

       Unicode::Map8 - Mapping table between 8-bit chars and Unicode

SYNOPSIS

        require Unicode::Map8;
        my $no_map = Unicode::Map8->new("ISO646-NO") || die;
        my $l1_map = Unicode::Map8->new("latin1")    || die;

        my $ustr = $no_map->to16("V}re norske tegn b|r {res\n");
        my $lstr = $l1_map->to8($ustr);
        print $lstr;

        print $no_map->tou("V}re norske tegn b|r {res\n")->utf8;

DESCRIPTION

       The Unicode::Map8 class implements efficient mapping tables between
       8-bit character sets and 16-bit character sets like Unicode.  The
       tables are efficient both in terms of space allocated and translation
       speed.  The 16-bit strings are assumed to use network byte order.

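       As an illustration of what "16-bit string in network byte order"
       means, the following sketch builds such a string by hand with
       Perl's built-in pack() and unpack().  It does not use Unicode::Map8
       itself; the variable names are made up for the example.

```perl
use strict;
use warnings;

# Unicode code values for the four characters of "Våre".
my @codes = (0x0056, 0x00E5, 0x0072, 0x0065);

# "n" is pack's 16-bit big-endian (network order) format, so each
# character becomes two bytes with the most significant byte first.
my $ustr = pack("n*", @codes);

printf "%d bytes: %s\n", length($ustr), unpack("H*", $ustr);

# unpack("n*") recovers the original code values.
my @back = unpack("n*", $ustr);
```

       Every character occupies exactly two bytes, which is why such a
       string is twice as long as its 8-bit counterpart.
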
       The following methods are available:

       $m = Unicode::Map8->new( [$charset] )
           The object constructor creates new instances of the Unicode::Map8
           class.  It takes an optional argument that specifies the name of
           an 8-bit character set to initialize mappings from.  The argument
           can also be the name of a mapping file.  If the charset/file
           cannot be located, then the constructor returns undef.

           If you omit the argument, then an empty mapping table is
           constructed.  You must then add mapping pairs to it using the
           addpair() method described below.

       $m->addpair( $u8, $u16 );
           Adds a new mapping pair to the mapping object.  It takes two
           arguments.  The first is the code value in the 8-bit character set
           and the second is the corresponding code value in the 16-bit
           character set.  The same codes can be used multiple times (but
           repeating an identical pair has no effect).  The first definition
           for a code is the one that is used.

           Consider the following example:

             $m->addpair(0x20, 0x0020);
             $m->addpair(0x20, 0x00A0);
             $m->addpair(0xA0, 0x00A0);

           It means that the characters 0x20 and 0xA0 in the 8-bit charset
           map to themselves in the 16-bit set, but in the 16-bit character
           set 0x00A0 maps to 0x20.

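           The "first definition wins" rule can be modelled with two plain
           hashes, one per direction.  This is only a sketch of the rule as
           described above, not the module's implementation; the names
           %to16, %to8 and the stand-alone addpair() function are
           hypothetical.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a mapping object: one table per direction.
my (%to16, %to8);

sub addpair {
    my ($u8, $u16) = @_;
    # "//=" assigns only when the slot is still undefined, so the first
    # definition for a code wins, independently in each direction.
    $to16{$u8} //= $u16;
    $to8{$u16} //= $u8;
}

addpair(0x20, 0x0020);
addpair(0x20, 0x00A0);   # 0x20 already maps to 0x0020; only 0x00A0 -> 0x20 is new
addpair(0xA0, 0x00A0);   # 0x00A0 already maps to 0x20; only 0xA0 -> 0x00A0 is new

printf "8-bit  0x%02X -> 0x%04X\n", $_, $to16{$_} for sort { $a <=> $b } keys %to16;
printf "16-bit 0x%04X -> 0x%02X\n", $_, $to8{$_}  for sort { $a <=> $b } keys %to8;
```

           The resulting tables match the outcome stated above: 0x20 and
           0xA0 map to themselves, while 16-bit 0x00A0 maps back to 0x20.
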
       $m->default_to8( $u8 )
           Set the code of the default character to use when mapping from
           16-bit to 8-bit strings.  If there is no mapping pair defined for a
           character then this default is substituted by to8() and recode8().

       $m->default_to16( $u16 )
           Set the code of the default character to use when mapping from
           8-bit to 16-bit strings.  If there is no mapping pair defined for a
           character then this default is used by to16(), tou() and recode8().

       $m->nostrict;
           Replaces all undefined mappings with the identity mapping.
           Without this call, undefined characters are simply removed (or
           replaced with the default, if one is defined) when converting
           between character sets.

       $m->to8( $ustr );
           Converts a 16-bit character string to the corresponding string in
           the 8-bit character set.

       $m->to16( $str );
           Converts an 8-bit character string to the corresponding string in
           the 16-bit character set.

       $m->tou( $str );
           Same as to16(), but returns a Unicode::String object instead of a
           plain UCS-2 string.

       $m->recode8($m2, $str);
           Maps the string $str from one 8-bit character set ($m) to another
           one ($m2).  Since we assume we know the mappings towards the common
           16-bit encoding we can use this to convert between any of the 8-bit
           character sets.

       $m->to_char16( $u8 )
           Maps a single 8-bit character code to a 16-bit code.  If the 8-bit
           character is unmapped then the constant NOCHAR is returned.  The
           default is not used and the callback method is not invoked.

       $m->to_char8( $u16 )
           Maps a single 16-bit character code to an 8-bit code.  If the
           16-bit character is unmapped then the constant NOCHAR is returned.
           The default is not used and the callback method is not invoked.

       The following callback methods are available.  You can override these
       methods by creating a subclass of Unicode::Map8.

       $m->unmapped_to8
           When mapping to an 8-bit character string and there is no mapping
           defined (and no default either), then this method is called as the
           last resort.  It is called with a single integer argument which is
           the code of the unmapped 16-bit character.  It is expected to
           return a string that will be incorporated in the 8-bit string.  The
           default version of this method always returns an empty string.

           Example:

            package MyMapper;
            require Unicode::Map8;
            our @ISA = qw(Unicode::Map8);

            # Render each unmapped character as its Unicode name in brackets.
            sub unmapped_to8
            {
               my($self, $code) = @_;
               require Unicode::CharName;
               "<" . Unicode::CharName::uname($code) . ">";
            }

       $m->unmapped_to16
           Likewise, when mapping to a 16-bit character string and no mapping
           is defined, then this method is called.  It should return a 16-bit
           string with the bytes in network byte order.  The default version
           of this method always returns an empty string.
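
       The fallback order described above for the 16-bit to 8-bit direction
       (mapping pair first, then the default, then the callback as the last
       resort) can be sketched in plain Perl.  This is a model of the
       documented behaviour, not the module's code; %to8, $default_to8 and
       the stand-alone functions are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical miniature mapping table: 'A', and a-ring as in ISO646-NO.
my %to8 = (0x0041 => 0x41, 0x00E5 => 0x7D);
my $default_to8;                       # undef means "no default configured"

# Last-resort callback: receives the unmapped code, returns a string.
sub unmapped_to8 { my ($code) = @_; return sprintf "<U+%04X>", $code }

sub to8 {
    my ($ustr) = @_;
    my $out = "";
    for my $code (unpack "n*", $ustr) {
        if    (defined $to8{$code})  { $out .= chr $to8{$code} }      # mapping pair
        elsif (defined $default_to8) { $out .= chr $default_to8 }     # default
        else                         { $out .= unmapped_to8($code) }  # callback
    }
    return $out;
}

my $ustr = pack "n*", 0x0041, 0x00E5, 0x20AC;   # 0x20AC has no mapping here

my $without_default = to8($ustr);
$default_to8 = ord "?";
my $with_default = to8($ustr);

print "$without_default\n$with_default\n";
```

       Once a default is set, the callback is no longer reached for that
       direction.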

FILES

       The Unicode::Map8 constructor can parse two different file formats: a
       binary format and a textual format.

       The binary format is simple.  It consists of a sequence of 16-bit
       integer pairs in network byte order.  The first pair should contain the
       magic value 0xFFFE, 0x0001.  Of each pair, the first value is the code
       of an 8-bit character and the second is the code of the 16-bit
       character.  It follows from this that the first value should be less
       than 256.

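       The layout described above can be produced and read back with pack()
       and unpack().  The following is a minimal sketch that works on an
       in-memory string rather than a file; the sample pairs and variable
       names are made up, and the module's own reader may differ in its
       details.

```perl
use strict;
use warnings;

my @pairs = ([0x20, 0x0020], [0xA0, 0x00A0]);

# Magic pair first, then one (8-bit code, 16-bit code) pair per mapping.
# "n" packs each value as a 16-bit integer in network byte order.
my $bin = pack "n2", 0xFFFE, 0x0001;
$bin .= pack "n2", @$_ for @pairs;

# Reading it back: check the magic pair, then consume two values at a time.
my ($magic1, $magic2, @values) = unpack "n*", $bin;
die "bad magic" unless $magic1 == 0xFFFE && $magic2 == 0x0001;

my @decoded;
while (my ($u8, $u16) = splice @values, 0, 2) {
    die "8-bit code out of range" if $u8 > 255;
    push @decoded, sprintf "0x%02X 0x%04X", $u8, $u16;
}
print "$_\n" for @decoded;
```

       Each mapping costs four bytes, so the two pairs above plus the magic
       value give a 12-byte string.
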
       The textual format consists of lines, each of which is either a
       comment (first non-blank character is '#'), a completely blank line
       or a line with two hexadecimal numbers.  The hexadecimal numbers must
       be preceded by "0x" as in C and Perl.  This is the same format used
       by the Unicode mapping files available from
       <URL:ftp://ftp.unicode.org/Public>.

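       A parser for this line format might look like the sketch below.  The
       sample lines are invented for the example, and ignoring whatever
       follows the two numbers on a line is an assumption rather than
       documented behaviour.

```perl
use strict;
use warnings;

my @lines = (
    "# sample mapping lines (made up for this example)",
    "",
    "0x20\t0x0020",
    "  0xA0 0x00A0",
);

my @pairs;
for my $line (@lines) {
    next if $line =~ /^\s*(?:#|$)/;      # comment or blank line

    # Two hexadecimal numbers, each introduced by "0x".
    my ($u8, $u16) = $line =~ /^\s*0x([0-9A-Fa-f]+)\s+0x([0-9A-Fa-f]+)/
        or next;                          # skip anything else
    push @pairs, [hex $u8, hex $u16];
}

printf "0x%02X 0x%04X\n", @$_ for @pairs;
```

       The resulting list of pairs is in the form that addpair() expects:
       the 8-bit code first, then the 16-bit code.
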
       The mapping table files are installed in the Unicode/Map8/maps
       directory somewhere in the Perl @INC path.  The variable
       $Unicode::Map8::MAPS_DIR is the complete path name to this directory.
       Binary mapping files are stored within this directory with the suffix
       .bin.  Textual mapping files are stored with the suffix .txt.

       The scripts map8_bin2txt and map8_txt2bin can translate between these
       mapping file formats.

       A special file called aliases within $MAPS_DIR specifies all the alias
       names that can be used to denote the various character sets.  The first
       name on each line is the real file name and the rest are alias names
       separated by spaces.

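       Reading lines in that layout into a lookup table is straightforward.
       In the sketch below the two sample lines and the hash name %file_for
       are made up; the real aliases file may contain different names.

```perl
use strict;
use warnings;

# Made-up lines: real file name first, alias names after it.
my @lines = (
    "ISO_8859-1 latin1 l1",
    "ISO646-NO NS_4551-1 no",
);

my %file_for;
for my $line (@lines) {
    # split ' ' splits on runs of whitespace and ignores leading blanks.
    my ($file, @aliases) = split ' ', $line;
    next unless defined $file;            # skip empty lines
    # The real name resolves to itself, every alias to the real name.
    $file_for{$_} = $file for $file, @aliases;
}

print "$file_for{latin1}\n";
```
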
       The "umap --list" command can be used to list the character sets
       supported.

BUGS

       Does not handle Unicode surrogate pairs as a single character.

SEE ALSO

       umap(1), Unicode::String

COPYRIGHT

       Copyright 1998 Gisle Aas.

       This library is free software; you can redistribute it and/or modify it
       under the same terms as Perl itself.

perl v5.16.3                      2010-01-18                           Map8(3)