charnames(3pm)         Perl Programmers Reference Guide         charnames(3pm)

NAME
charnames - define character names for "\N{named}" string literal escapes

SYNOPSIS
    use charnames ':full';
    print "\N{GREEK SMALL LETTER SIGMA} is called sigma.\n";

    use charnames ':short';
    print "\N{greek:Sigma} is an upper-case sigma.\n";

    use charnames qw(cyrillic greek);
    print "\N{sigma} is Greek sigma, and \N{be} is Cyrillic b.\n";

    use charnames ":full", ":alias" => {
        e_ACUTE => "LATIN SMALL LETTER E WITH ACUTE",
    };
    print "\N{e_ACUTE} is a small letter e with an acute.\n";

    use charnames ();
    print charnames::viacode(0x1234); # prints "ETHIOPIC SYLLABLE SEE"
    printf "%04X", charnames::vianame("GOTHIC LETTER AHSA"); # prints "10330"

DESCRIPTION
The "use charnames" pragma supports the arguments ":full", ":short",
script names, and customized aliases. If ":full" is present, then for
expansion of "\N{CHARNAME}" the string "CHARNAME" is first looked up in
the list of standard Unicode character names. If ":short" is present
and "CHARNAME" has the form "SCRIPT:CNAME", then "CNAME" is looked up as
a letter in script "SCRIPT". If "use charnames" is given script name
arguments, then for "\N{CHARNAME}" the name "CHARNAME" is looked up as a
letter in the given scripts (in the specified order). Customized
aliases are explained in "CUSTOM ALIASES".

For lookup of "CHARNAME" inside a given script "SCRIPTNAME" this pragma
looks for the names

    SCRIPTNAME CAPITAL LETTER CHARNAME
    SCRIPTNAME SMALL LETTER CHARNAME
    SCRIPTNAME LETTER CHARNAME

in the table of standard Unicode names. If "CHARNAME" is lowercase,
then the "CAPITAL" variant is ignored; otherwise the "SMALL" variant is
ignored.
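The case rule above can be sketched as follows. This is a minimal illustration, assuming the Greek script argument shown in the synopsis; the variable names are only for the example.

```perl
use strict;
use warnings;
use charnames qw(greek);   # look up short names as Greek letters

# All-lowercase name: the SMALL variant (then plain LETTER) is tried.
my $small = "\N{sigma}";

# Name containing an uppercase letter: the CAPITAL variant is tried instead.
my $capital = "\N{Sigma}";

# Print code points rather than the characters to avoid wide-character output.
printf "U+%04X U+%04X\n", ord($small), ord($capital);   # U+03C3 U+03A3
```

Printing the code points instead of the characters keeps the sketch independent of the output encoding.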

Note that "\N{...}" is a compile-time construct: it is a special form
of string constant used inside double-quoted strings. In other words,
you cannot use variables inside the "\N{...}". If you want similar
run-time functionality, use charnames::vianame().
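When the name is only known at run time, a lookup along the following lines may serve. This is a sketch, not the only way to do it; the variable names are illustrative, and the result is checked because vianame() returns undef for an unknown name.

```perl
use strict;
use warnings;
use charnames ();   # load the functions without enabling any "\N{...}" names

# The name is held in a variable, so "\N{...}" cannot be used here.
my $name = "GREEK SMALL LETTER SIGMA";

my $code = charnames::vianame($name);            # code point, or undef if unknown
my $char = defined $code ? chr($code) : undef;   # turn the code point into a character

printf "%s is U+%04X\n", $name, $code if defined $code;
```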

For the C0 and C1 control characters (U+0000..U+001F, U+0080..U+009F)
as of Unicode 3.1, there are no official Unicode names, but you can use
the ISO 6429 names instead (LINE FEED, ESCAPE, and so forth). In
Unicode 3.2 (as of Perl 5.8) some naming changes take place; ISO 6429
has been updated, see "ALIASES". Also note that U+0080, U+0081,
U+0084, and U+0099 do not have names even in ISO 6429.

Since the Unicode standard uses "U+HHHH", so can you: "\N{U+263a}" is
the Unicode smiley face, or "\N{WHITE SMILING FACE}".
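As a small illustration of the two spellings, the following sketch compares them. It assumes ":full" is in effect so that the long name resolves.

```perl
use strict;
use warnings;
use charnames ':full';

my $by_code = "\N{U+263a}";              # code point notation
my $by_name = "\N{WHITE SMILING FACE}";  # standard Unicode name

print "same character\n" if $by_code eq $by_name;
printf "U+%04X\n", ord($by_code);        # U+263A
```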

CUSTOM TRANSLATORS
The mechanism of translation of "\N{...}" escapes is general and not
hardwired into charnames.pm. A module can install custom translations
(inside the scope which "use"s the module) with the following magic
incantation:

    use charnames ();      # for $charnames::hint_bits
    sub import {
        shift;
        $^H |= $charnames::hint_bits;
        $^H{charnames} = \&translator;
    }

Here translator() is a subroutine which takes "CHARNAME" as an
argument and returns the text to insert into the string in place of the
"\N{CHARNAME}" escape. Since the text to insert should be different in
"bytes" mode and out of it, the function should check the current state
of the "bytes" flag, as in:

    use bytes ();          # for $bytes::hint_bits
    sub translator {
        if ($^H & $bytes::hint_bits) {
            return bytes_translator(@_);
        }
        else {
            return utf8_translator(@_);
        }
    }

CUSTOM ALIASES
This version of charnames supports three mechanisms for adding local or
customized aliases to the standard Unicode naming conventions (:full):

Anonymous hashes

        use charnames ":full", ":alias" => {
            e_ACUTE => "LATIN SMALL LETTER E WITH ACUTE",
        };
        my $str = "\N{e_ACUTE}";

Alias file

        use charnames ":full", ":alias" => "pro";

    will try to read "unicore/pro_alias.pl" from the @INC path. This
    file should return a list in plain perl:

        (
        A_GRAVE   => "LATIN CAPITAL LETTER A WITH GRAVE",
        A_CIRCUM  => "LATIN CAPITAL LETTER A WITH CIRCUMFLEX",
        A_DIAERES => "LATIN CAPITAL LETTER A WITH DIAERESIS",
        A_TILDE   => "LATIN CAPITAL LETTER A WITH TILDE",
        A_BREVE   => "LATIN CAPITAL LETTER A WITH BREVE",
        A_RING    => "LATIN CAPITAL LETTER A WITH RING ABOVE",
        A_MACRON  => "LATIN CAPITAL LETTER A WITH MACRON",
        );

Alias shortcut

        use charnames ":alias" => ":pro";

    works exactly the same as the alias pairs, only this time
    ":full" is inserted automatically as the first argument (if no
    other argument is given).

charnames::viacode(code)
Returns the full name of the character indicated by the numeric code.
The example

    print charnames::viacode(0x2722);

prints "FOUR TEARDROP-SPOKED ASTERISK".

Returns undef if no name is known for the code.

This works only for the standard names, and does not yet apply to
custom translators.

Notice that the name returned for U+FEFF is "ZERO WIDTH NO-BREAK
SPACE", not "BYTE ORDER MARK".

charnames::vianame(name)
Returns the code point indicated by the name. The example

    printf "%04X", charnames::vianame("FOUR TEARDROP-SPOKED ASTERISK");

prints "2722".

Returns undef if the name is unknown.

This works only for the standard names, and does not yet apply to
custom translators.
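The two functions are inverses for standard names, which the following sketch illustrates with the code point from the examples above. The variable names are illustrative only.

```perl
use strict;
use warnings;
use charnames ();

# Code point to name, then name back to code point.
my $name = charnames::viacode(0x2722);   # "FOUR TEARDROP-SPOKED ASTERISK"
my $code = charnames::vianame($name);    # 0x2722 again
printf "%s <-> U+%04X\n", $name, $code;

# U+FEFF is reported under its standard name, not as "BYTE ORDER MARK".
my $feff = charnames::viacode(0xFEFF);
print "$feff\n";
```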

ALIASES
A few aliases have been defined for convenience: instead of having to
use the official names

    LINE FEED (LF)
    FORM FEED (FF)
    CARRIAGE RETURN (CR)
    NEXT LINE (NEL)

(yes, with parentheses) one can use

    LINE FEED
    FORM FEED
    CARRIAGE RETURN
    NEXT LINE
    LF
    FF
    CR
    NEL

One can also use

    BYTE ORDER MARK
    BOM

for U+FEFF (whose standard name is ZERO WIDTH NO-BREAK SPACE), and

    ZWNJ
    ZWJ

for ZERO WIDTH NON-JOINER and ZERO WIDTH JOINER.
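A quick way to see what the convenience aliases resolve to is to print their code points. The sketch below assumes an ASCII-based platform (so that LINE FEED is U+000A); the hash name is only for the example.

```perl
use strict;
use warnings;
use charnames ':full';

# Each value is a one-character string produced by an alias.
my %alias = (
    "LF"        => "\N{LF}",
    "LINE FEED" => "\N{LINE FEED}",
    "BOM"       => "\N{BOM}",
    "ZWNJ"      => "\N{ZWNJ}",
    "ZWJ"       => "\N{ZWJ}",
);

# Show the code point behind each alias.
printf "%-10s U+%04X\n", $_, ord($alias{$_}) for sort keys %alias;
```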

For backward compatibility one can use the old names for certain C0 and
C1 controls:

    old                      new

    HORIZONTAL TABULATION    CHARACTER TABULATION
    VERTICAL TABULATION      LINE TABULATION
    FILE SEPARATOR           INFORMATION SEPARATOR FOUR
    GROUP SEPARATOR          INFORMATION SEPARATOR THREE
    RECORD SEPARATOR         INFORMATION SEPARATOR TWO
    UNIT SEPARATOR           INFORMATION SEPARATOR ONE
    PARTIAL LINE DOWN        PARTIAL LINE FORWARD
    PARTIAL LINE UP          PARTIAL LINE BACKWARD

The old names still give the character, but they also produce a
deprecation warning.

ILLEGAL CHARACTERS
If you ask by name for a character that does not exist, a warning is
given and the Unicode replacement character "\x{FFFD}" is returned.

If you ask by code for a character that does not exist, no warning is
given and "undef" is returned. (Though if you ask for a code point
past U+10FFFF you do get a warning.)
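Because a by-code lookup can return undef silently, callers may want to guard the result. The sketch below uses U+FFFE, a permanent noncharacter, on the assumption that it has no entry in the name table; the fallback label format is hypothetical.

```perl
use strict;
use warnings;
use charnames ();

# U+FFFE is a noncharacter, so no standard name is expected for it.
my $name = charnames::viacode(0xFFFE);
print defined $name ? "named: $name\n" : "no name for U+FFFE\n";

# Fall back to a "U+HHHH" label when no name is available.
my $label = defined $name ? $name : sprintf("U+%04X", 0xFFFE);
print "$label\n";
```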
214
216 Since evaluation of the translation function happens in a middle of
217 compilation (of a string literal), the translation function should not
218 do any "eval"s or "require"s. This restriction should be lifted in a
219 future version of Perl.
220
221
222
223perl v5.8.8 2001-09-21 charnames(3pm)