charnames(3pm) Perl Programmers Reference Guide charnames(3pm)

NAME
charnames - define character names for "\N{named}" string literal escapes

SYNOPSIS
    use charnames ':full';
    print "\N{GREEK SMALL LETTER SIGMA} is called sigma.\n";

    use charnames ':short';
    print "\N{greek:Sigma} is an upper-case sigma.\n";

    use charnames qw(cyrillic greek);
    print "\N{sigma} is Greek sigma, and \N{be} is Cyrillic b.\n";

    use charnames ":full", ":alias" => {
        e_ACUTE => "LATIN SMALL LETTER E WITH ACUTE",
    };
    print "\N{e_ACUTE} is a small letter e with an acute.\n";

    use charnames ();
    print charnames::viacode(0x1234); # prints "ETHIOPIC SYLLABLE SEE"
    printf "%04X", charnames::vianame("GOTHIC LETTER AHSA"); # prints "10330"

DESCRIPTION
The "use charnames" pragma supports the arguments ":full" and ":short",
script names, and custom aliases. If ":full" is present, then when
"\N{CHARNAME}" is expanded, the string "CHARNAME" is first looked up in
the list of standard Unicode character names. If ":short" is present
and "CHARNAME" has the form "SCRIPT:CNAME", then "CNAME" is looked up
as a letter in script "SCRIPT". If the pragma is given script names as
arguments, then for "\N{CHARNAME}" the name "CHARNAME" is looked up as
a letter in the given scripts, in the order specified. Custom aliases
are explained in "CUSTOM ALIASES".
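
As a minimal sketch of the three lookup modes described above, the
following script puts each "use charnames" in its own block. Because
the pragma is lexically scoped, each block sees only the options it
declares. It relies only on standard Unicode names already quoted in
this page.

```perl
use strict;
use warnings;

my @ords;
{
    use charnames ':full';     # standard Unicode names only
    push @ords, ord "\N{GREEK SMALL LETTER SIGMA}";
}
{
    use charnames ':short';    # SCRIPT:CNAME form
    push @ords, ord "\N{greek:Sigma}";
}
{
    use charnames qw(greek);   # bare letter names, searched in the Greek script
    push @ords, ord "\N{sigma}";
}
printf "U+%04X\n", $_ for @ords;   # prints U+03C3, U+03A3, U+03C3
```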

For the lookup of "CHARNAME" inside a given script "SCRIPTNAME", this
pragma looks for the names

    SCRIPTNAME CAPITAL LETTER CHARNAME
    SCRIPTNAME SMALL LETTER CHARNAME
    SCRIPTNAME LETTER CHARNAME

in the table of standard Unicode names. If "CHARNAME" is lowercase,
then the "CAPITAL" variant is ignored; otherwise the "SMALL" variant is
ignored.
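
The case rule can be checked directly. A minimal sketch, assuming the
Greek script argument shown in the SYNOPSIS; the hash %got exists only
for illustration:

```perl
use strict;
use warnings;
use charnames qw(greek);   # look bare names up as Greek letters

# A name with no upper-case letters selects the SMALL variant;
# any upper-case letter selects the CAPITAL variant.
my %got = (
    alpha => ord "\N{alpha}",   # GREEK SMALL LETTER ALPHA
    Alpha => ord "\N{Alpha}",   # GREEK CAPITAL LETTER ALPHA
    ALPHA => ord "\N{ALPHA}",   # also GREEK CAPITAL LETTER ALPHA
);
printf "%-5s U+%04X\n", $_, $got{$_} for sort keys %got;
```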

Note that "\N{...}" is resolved at compile time: it is a special form of
string constant used inside double-quoted strings. In other words, you
cannot use variables inside the "\N{...}". If you want similar run-time
functionality, use charnames::vianame().
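
A minimal sketch of the run-time route, for a name held in a variable.
The helper char_from_name() is hypothetical, not part of charnames. It
assumes an ordinary name, not one of the "U+..." form, for which
vianame() returns a character rather than a number (see "BUGS").

```perl
use strict;
use warnings;
use charnames ();

# Hypothetical helper: turn a standard Unicode name that is only known
# at run time into the character itself, or undef if the name is unknown.
sub char_from_name {
    my ($name) = @_;
    my $cp = charnames::vianame($name);   # a code point number for ordinary names
    return defined $cp ? chr $cp : undef;
}

my $name = "GREEK SMALL LETTER SIGMA";    # e.g. read from input at run time
my $char = char_from_name($name);
printf "%s -> U+%04X\n", $name, ord $char;
```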

As of Unicode 3.1, the C0 and C1 control characters (U+0000..U+001F,
U+0080..U+009F) have no official Unicode names, but you can use the
ISO 6429 names (LINE FEED, ESCAPE, and so forth) instead. In Unicode
3.2 (as of Perl 5.8) some naming changes took place because ISO 6429
was updated; see "ALIASES". Also note that U+0080, U+0081, U+0084, and
U+0099 do not have names even in ISO 6429.
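
The ISO 6429 control names can be used like any other name under
":full". A minimal sketch, using only control names that appear in
this page:

```perl
use strict;
use warnings;
use charnames ':full';

# ISO 6429 names for C0 controls, expanded at compile time.
my %ord = (
    'LINE FEED'            => ord "\N{LINE FEED}",
    'ESCAPE'               => ord "\N{ESCAPE}",
    'CHARACTER TABULATION' => ord "\N{CHARACTER TABULATION}",
);
printf "%-21s 0x%02X\n", $_, $ord{$_} for sort keys %ord;
```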

Since the Unicode standard uses "U+HHHH", so can you: "\N{U+263a}" is
the Unicode smiley face, equivalent to "\N{WHITE SMILING FACE}".
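
A quick check, as a minimal sketch, that both spellings produce the
same character:

```perl
use strict;
use warnings;
use charnames ':full';

my $by_code = "\N{U+263a}";               # code point notation
my $by_name = "\N{WHITE SMILING FACE}";   # standard Unicode name
print $by_code eq $by_name ? "same character\n" : "different\n";
```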

ALIASES
A few aliases have been defined for convenience: instead of having to
use the official names

    LINE FEED (LF)
    FORM FEED (FF)
    CARRIAGE RETURN (CR)
    NEXT LINE (NEL)

(yes, with parentheses), one can use

    LINE FEED
    FORM FEED
    CARRIAGE RETURN
    NEXT LINE
    LF
    FF
    CR
    NEL

One can also use

    BYTE ORDER MARK
    BOM

for ZERO WIDTH NO-BREAK SPACE (U+FEFF), and

    ZWNJ
    ZWJ

for ZERO WIDTH NON-JOINER and ZERO WIDTH JOINER.
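
A minimal sketch that expands each short alias and prints its code
point. The code points in the comments are the standard Unicode values
of the full names listed above.

```perl
use strict;
use warnings;
use charnames ':full';

my %alias = (
    LF   => "\N{LF}",     # LINE FEED, U+000A
    FF   => "\N{FF}",     # FORM FEED, U+000C
    CR   => "\N{CR}",     # CARRIAGE RETURN, U+000D
    NEL  => "\N{NEL}",    # NEXT LINE, U+0085
    BOM  => "\N{BOM}",    # ZERO WIDTH NO-BREAK SPACE, U+FEFF
    ZWNJ => "\N{ZWNJ}",   # ZERO WIDTH NON-JOINER, U+200C
    ZWJ  => "\N{ZWJ}",    # ZERO WIDTH JOINER, U+200D
);
printf "%-4s U+%04X\n", $_, ord $alias{$_} for sort keys %alias;
```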

For backward compatibility, one can use the old names for certain C0
and C1 controls:

    old                         new

    HORIZONTAL TABULATION       CHARACTER TABULATION
    VERTICAL TABULATION         LINE TABULATION
    FILE SEPARATOR              INFORMATION SEPARATOR FOUR
    GROUP SEPARATOR             INFORMATION SEPARATOR THREE
    RECORD SEPARATOR            INFORMATION SEPARATOR TWO
    UNIT SEPARATOR              INFORMATION SEPARATOR ONE
    PARTIAL LINE DOWN           PARTIAL LINE FORWARD
    PARTIAL LINE UP             PARTIAL LINE BACKWARD

However, in addition to giving the character, the old names also
produce a deprecation warning.

CUSTOM ALIASES
This version of charnames supports three mechanisms for adding local or
custom aliases to the standard Unicode naming conventions (":full").

Note that an alias should not be something that is a legal
curly-brace-enclosed quantifier (see "QUANTIFIERS" in perlreref). For
example, "\N{123}" in a regular expression means to match 123
non-newline characters, and is not treated as an alias. It is best for
aliases to begin with an alphabetic character and to contain only
alphanumerics, spaces, dashes, colons, parentheses, and underscores.
Currently they must be ASCII.
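
A minimal sketch of both points, assuming a made-up alias named SMILEY:
an alias that starts with a letter is expanded as usual, while "\N{3}"
inside a regular expression is a quantifier applied to "\N" (any
character except a newline), not a name.

```perl
use strict;
use warnings;
use charnames ':full', ':alias' => {
    SMILEY => "WHITE SMILING FACE",   # hypothetical alias that follows the naming advice
};

my $face    = "\N{SMILEY}";               # expands through the custom alias
my ($three) = "ab\ncdef" =~ /(\N{3})/;    # quantifier: 3 non-newline characters
printf "U+%04X %s\n", ord $face, $three;  # prints "U+263A cde"
```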

Anonymous hashes
        use charnames ":full", ":alias" => {
            e_ACUTE => "LATIN SMALL LETTER E WITH ACUTE",
        };
        my $str = "\N{e_ACUTE}";

Alias file
        use charnames ":full", ":alias" => "pro";

    will try to read "unicore/pro_alias.pl" from the @INC path. This
    file should return a list in plain Perl:

        (
        A_GRAVE   => "LATIN CAPITAL LETTER A WITH GRAVE",
        A_CIRCUM  => "LATIN CAPITAL LETTER A WITH CIRCUMFLEX",
        A_DIAERES => "LATIN CAPITAL LETTER A WITH DIAERESIS",
        A_TILDE   => "LATIN CAPITAL LETTER A WITH TILDE",
        A_BREVE   => "LATIN CAPITAL LETTER A WITH BREVE",
        A_RING    => "LATIN CAPITAL LETTER A WITH RING ABOVE",
        A_MACRON  => "LATIN CAPITAL LETTER A WITH MACRON",
        );

Alias shortcut
        use charnames ":alias" => ":pro";

    works exactly like the alias file form, except that ":full" is
    inserted automatically as the first argument (if no other argument
    is given).
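
A minimal, self-contained sketch of the alias file and alias shortcut
forms. Rather than rely on an existing "unicore/pro_alias.pl", it
writes a hypothetical "unicore/demo_alias.pl" into a temporary
directory at compile time and puts that directory at the front of
@INC, so both "use charnames" lines below can find it.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Runs at compile time, before the "use charnames" lines are compiled.
# The file name "demo" and its two aliases are hypothetical.
BEGIN {
    my $dir = tempdir(CLEANUP => 1);
    mkdir "$dir/unicore" or die "mkdir: $!";
    open my $fh, '>', "$dir/unicore/demo_alias.pl" or die "open: $!";
    print {$fh} q{(
        A_GRAVE => "LATIN CAPITAL LETTER A WITH GRAVE",
        A_RING  => "LATIN CAPITAL LETTER A WITH RING ABOVE",
    );};
    close $fh or die "close: $!";
    unshift @INC, $dir;
}

my @ords;
{
    use charnames ':full', ':alias' => 'demo';   # alias file form
    push @ords, ord "\N{A_GRAVE}";
}
{
    use charnames ':alias' => ':demo';           # shortcut: ':full' is added
    push @ords, ord "\N{A_RING}", ord "\N{LATIN SMALL LETTER A}";
}
printf "U+%04X\n", $_ for @ords;
```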

charnames::viacode(code)
Returns the full name of the character indicated by the numeric code.
For example,

    print charnames::viacode(0x2722);

prints "FOUR TEARDROP-SPOKED ASTERISK".

Returns undef if no name is known for the code.

This works only for the standard names and does not yet apply to custom
translators.

Notice that the name returned for U+FEFF is "ZERO WIDTH NO-BREAK
SPACE", not "BYTE ORDER MARK".
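
A minimal sketch that looks up a few code points, including U+FEFF:

```perl
use strict;
use warnings;
use charnames ();

my %name_of;
for my $cp (0x0041, 0x2722, 0xFEFF) {
    $name_of{$cp} = charnames::viacode($cp);   # undef if no name is known
    printf "U+%04X  %s\n", $cp,
        defined $name_of{$cp} ? $name_of{$cp} : "(no name known)";
}
```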

charnames::vianame(name)
Returns the code point indicated by the name. For example,

    printf "%04X", charnames::vianame("FOUR TEARDROP-SPOKED ASTERISK");

prints "2722".

Returns undef if the name is unknown.

This works only for the standard names and does not yet apply to custom
translators.
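
A minimal sketch of a round trip through vianame() and viacode(), plus
a lookup of an unknown name; "NO SUCH CHARACTER NAME" is simply assumed
not to exist:

```perl
use strict;
use warnings;
use charnames ();

my %cp_of;
for my $name ("LATIN CAPITAL LETTER A", "FOUR TEARDROP-SPOKED ASTERISK") {
    my $cp = charnames::vianame($name);   # a number for ordinary names
    $cp_of{$name} = $cp;
    printf "%-30s U+%04X  back: %s\n", $name, $cp, charnames::viacode($cp);
}

my $missing = charnames::vianame("NO SUCH CHARACTER NAME");
print defined $missing ? "unexpectedly found\n" : "unknown name gives undef\n";
```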

CUSTOM TRANSLATORS
The mechanism for translating "\N{...}" escapes is general and not
hardwired into charnames.pm. A module can install custom translations
(inside the scope that "use"s the module) with the following magic
incantation:

    sub import {
        shift;
        $^H{charnames} = \&translator;
    }

Here translator() is a subroutine that takes "CHARNAME" as an argument
and returns the text to insert into the string in place of the
"\N{CHARNAME}" escape. Since the text to insert should differ inside and
outside "bytes" mode, the function should check the current state of the
"bytes" flag, as in:

    use bytes ();    # for $bytes::hint_bits
    sub translator {
        if ($^H & $bytes::hint_bits) {
            # Under "use bytes": bytes_translator(), supplied by the
            # module, returns a byte string.
            return bytes_translator(@_);
        }
        else {
            # Otherwise utf8_translator(), also supplied by the module,
            # returns a character string.
            return utf8_translator(@_);
        }
    }

See "CUSTOM ALIASES" above for restrictions on "CHARNAME".
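
A minimal, self-contained sketch of a custom translator. The package
My::Symbols and its two names are hypothetical. The BEGIN block calls
import() directly, which affects the scope being compiled just as
"use My::Symbols;" would. The sketch skips the "bytes" check shown
above and always returns character strings, so it assumes "use bytes"
is not in effect.

```perl
use strict;
use warnings;

# Hypothetical module that resolves a few home-made names.
package My::Symbols;

sub translator {
    my ($name) = @_;             # the CHARNAME from \N{CHARNAME}
    # The translator runs while the string literal is being compiled,
    # so the table is built inside the sub: file-scope assignments
    # would not have run yet.
    my %table = (
        arrow => "\x{2192}",     # RIGHTWARDS ARROW
        check => "\x{2713}",     # CHECK MARK
    );
    return exists $table{$name} ? $table{$name} : "\x{FFFD}";
}

sub import {
    shift;
    $^H{charnames} = \&translator;   # install for the scope being compiled
}

package main;

BEGIN { My::Symbols->import }    # same effect as "use My::Symbols;"

my $line = "done \N{check} next \N{arrow}";
printf "U+%04X U+%04X\n", ord substr($line, 5, 1), ord substr($line, -1);
```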
211
213 If you ask by name for a character that does not exist, a warning is
214 given and the Unicode replacement character "\x{FFFD}" is returned.
215
216 If you ask by code for a character that is unassigned, no warning is
217 given and "undef" is returned. (Though if you ask for a code point
218 past U+10FFFF you do get a warning.) See "BUGS" below.
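
A minimal sketch of the by-code case, assuming U+0378 (currently
unassigned) as the test code point. Any warnings are captured so that
their absence can be checked.

```perl
use strict;
use warnings;
use charnames ();

# Collect any warnings raised during the lookup.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

my $name = charnames::viacode(0x0378);   # U+0378 is unassigned
printf "name: %s, warnings: %d\n",
    defined $name ? $name : "undef", scalar @warnings;
```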

BUGS
viacode should return an empty string for unassigned in-range Unicode
code points, as that is their correct current name.

viacode(0) doesn't return "NULL", but "undef".

vianame returns a chr if the input name is of the form "U+...", and an
ord otherwise. It is planned to change this to always return an ord.
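
Code that may receive both forms of name can normalize the result
itself. A minimal sketch, with a hypothetical helper ord_via_name()
that applies the same "U+..." test as vianame():

```perl
use strict;
use warnings;
use charnames ();

# Hypothetical helper: always return a code point number. vianame()
# returns a character (chr) for "U+..." names and a number (ord)
# for every other name.
sub ord_via_name {
    my ($name) = @_;
    my $v = charnames::vianame($name);
    return undef unless defined $v;
    return $name =~ /^U\+[0-9A-Fa-f]+$/ ? ord($v) : $v;
}

printf "U+%04X\n", ord_via_name($_) for "U+263A", "WHITE SMILING FACE", "U+0031";
```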

None of the functions work for almost all of the Hangul syllable and CJK
Unicode characters that have their code points as part of their names.

Names must consist of ASCII characters only.

Unicode standard named sequences, such as "LATIN CAPITAL LETTER A WITH
MACRON AND GRAVE" (which should mean "LATIN CAPITAL LETTER A WITH
MACRON" with an additional "COMBINING GRAVE ACCENT"), are not
recognized.

Since the translation function is evaluated in the middle of
compilation (of a string literal), it should not do any "eval"s or
"require"s. This restriction should be lifted in a future version of
Perl.

perl v5.12.4 2011-06-07 charnames(3pm)