charmap(5)        Standards, Environments, and Macros        charmap(5)

NAME
charmap - character set description file

DESCRIPTION
A character set description file or charmap defines characteristics for
a coded character set. Other information about the coded character set
may also be in the file. Coded character set character values are
defined using symbolic character names followed by character encoding
values.

The character set description file provides:

o  The capability to describe character set attributes (such as
   collation order or character classes) independent of character
   set encoding, and using only the characters in the portable
   character set. This makes it possible to create generic
   localedef(1) source files for all codesets that share the
   portable character set.

o  Standardized symbolic names for all characters in the portable
   character set, making it possible to refer to any such character
   regardless of encoding.

Symbolic Names
Each symbolic name is included in the file and is mapped to a unique
encoding value (except for those symbolic names that are shown with
identical glyphs). If the control characters commonly associated with
the symbolic names in the following table are supported by the
implementation, the symbolic names and their corresponding encoding
values are included in the file. Some of the encodings associated with
the symbolic names in this table may be the same as characters in the
portable character set table.

┌────────────────────────────────────────────────┐
│<ACK>   <DC2>   <ENQ>   <FS>    <IS4>   <SOH>   │
│<BEL>   <DC3>   <EOT>   <GS>    <LF>    <STX>   │
│<BS>    <DC4>   <ESC>   <HT>    <NAK>   <SUB>   │
│<CAN>   <DEL>   <ETB>   <IS1>   <RS>    <SYN>   │
│<CR>    <DLE>   <ETX>   <IS2>   <SI>    <US>    │
│<DC1>   <EM>    <FF>    <IS3>   <SO>    <VT>    │
└────────────────────────────────────────────────┘

Declarations
The following declarations can precede the character definitions. Each
must consist of the symbol shown in the following list, starting in
column 1, including the surrounding brackets, followed by one or more
blank characters, followed by the value to be assigned to the symbol.

<code_set_name>  The name of the coded character set for which the
                 character set description file is defined.

<mb_cur_max>     The maximum number of bytes in a multi-byte
                 character. This defaults to 1.

<mb_cur_min>     An unsigned positive integer value that defines the
                 minimum number of bytes in a character for the
                 encoded character set.

<escape_char>    The escape character used to indicate that the
                 characters following will be interpreted in a
                 special way, as defined later in this section. This
                 defaults to backslash ('\'), which is the character
                 glyph used in all the following text and examples,
                 unless otherwise noted.

<comment_char>   The character that, when placed in column 1 of a
                 charmap line, indicates that the line is to be
                 ignored. The default character is the number sign
                 (#).

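The declaration rules above can be sketched in Python. This is only an
illustrative reading of the text, not the parser used by localedef(1).
The function name parse_declarations and the returned dictionary layout
are assumptions made for the example. Because the text states no default
for <mb_cur_min>, the sketch leaves it unset.

```python
# Hypothetical helper: the names below are illustrative, not part of charmap(5).
INT_SYMBOLS = ("mb_cur_max", "mb_cur_min")


def parse_declarations(lines):
    """Collect the <symbol> value declarations that precede the CHARMAP line."""
    decls = {
        "code_set_name": None,
        "mb_cur_max": 1,        # documented default
        "mb_cur_min": None,     # no default stated in the text; left unset
        "escape_char": "\\",    # documented default
        "comment_char": "#",    # documented default
    }
    for line in lines:
        if line.startswith("CHARMAP"):
            break  # declarations end where the mapping section begins
        if not line.strip() or line.startswith(decls["comment_char"]):
            continue  # empty line or comment line
        if not line.startswith("<") or ">" not in line:
            continue  # the symbol must start in column 1, with its brackets
        symbol, _, value = line[1:].partition(">")
        value = value.strip()
        if symbol in INT_SYMBOLS:
            decls[symbol] = int(value)
        elif symbol in decls:
            decls[symbol] = value
    return decls
```

Any symbol that the file does not declare keeps the default listed in
the text, which is why the dictionary is seeded before the loop.
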
Format
The character set mapping definitions are all the lines immediately
following an identifier line containing the string CHARMAP starting in
column 1, and preceding a trailer line containing the string END
CHARMAP starting in column 1. Empty lines and lines containing a
<comment_char> in the first column are ignored. Each non-comment line
of the character set mapping definition (that is, each line between
the CHARMAP and END CHARMAP lines of the file) must be in either of
two forms:

    "%s %s %s\n",<symbolic-name>,<encoding>,<comments>

or

    "%s...%s %s %s\n",<symbolic-name>,<symbolic-name>,<encoding>,\
        <comments>

In the first format, the line in the character set mapping definition
defines a single symbolic name and a corresponding encoding. A
character following an escape character is interpreted as itself; for
example, the sequence "<\\\>>" represents the symbolic name "\>"
enclosed between angle brackets.

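As a rough illustration of the first format and of the escape rule, the
following Python sketch splits one mapping line into its three fields.
The function name split_mapping_line is hypothetical, the sketch handles
only the single-name form, and the encoding field is returned as
uninterpreted text.

```python
def split_mapping_line(line, escape_char="\\"):
    """Split a single-name mapping line into (name, encoding, comment)."""
    if not line.startswith("<"):
        raise ValueError("symbolic name must start in column 1")
    name = []
    i = 1
    while i < len(line):
        ch = line[i]
        if ch == escape_char and i + 1 < len(line):
            # A character following the escape character stands for itself.
            name.append(line[i + 1])
            i += 2
        elif ch == ">":
            break  # unescaped '>' closes the symbolic name
        else:
            name.append(ch)
            i += 1
    else:
        raise ValueError("unterminated symbolic name")
    fields = line[i + 1:].split(None, 1)
    if not fields:
        raise ValueError("missing encoding")
    encoding = fields[0]
    comment = fields[1].strip() if len(fields) > 1 else ""  # comment is optional
    return "".join(name), encoding, comment
```

With the default escape character, the sequence "<\\\>>" yields the
two-character name consisting of a backslash followed by a greater-than
sign, as described in the text.
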
In the second format, the line in the character set mapping definition
defines a range of one or more symbolic names. In this form, the
symbolic names must consist of zero or more non-numeric characters,
followed by an integer formed by one or more decimal digits. The
characters preceding the integer must be identical in the two symbolic
names, and the integer formed by the digits in the second symbolic name
must be equal to or greater than the integer formed by the digits in
the first name. This is interpreted as a series of symbolic names
formed from the common part and each of the integers between the first
and the second integer, inclusive. As an example, <j0101>...<j0104> is
interpreted as the symbolic names <j0101>, <j0102>, <j0103>, and
<j0104>, in that order.

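The range rule can be sketched as follows. The function name
expand_symbolic_range is hypothetical. The text does not say how
leading zeros are handled, so padding each generated integer to the
digit count of the first name is an assumption, chosen because it
reproduces the <j0101>...<j0104> example.

```python
import re

# Zero or more non-digit characters followed by one or more decimal digits.
_RANGE_NAME = re.compile(r"<([^0-9]*)([0-9]+)>")


def expand_symbolic_range(first, last):
    """Expand '<j0101>', '<j0104>' into the list of names in the range."""
    m1 = _RANGE_NAME.fullmatch(first)
    m2 = _RANGE_NAME.fullmatch(last)
    if m1 is None or m2 is None:
        raise ValueError("range names must end in decimal digits")
    prefix1, digits1 = m1.groups()
    prefix2, digits2 = m2.groups()
    if prefix1 != prefix2:
        raise ValueError("the non-numeric parts must be identical")
    start, stop = int(digits1), int(digits2)
    if stop < start:
        raise ValueError("second integer must not be less than the first")
    width = len(digits1)  # assumption: keep the first name's zero padding
    return [f"<{prefix1}{n:0{width}d}>" for n in range(start, stop + 1)]
```
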
A character set mapping definition line must exist for all symbolic
names and must define the coded character value that corresponds to the
character glyph indicated in the table, or the coded character value
that corresponds with the control character symbolic name. If the
control characters commonly associated with the symbolic names are
supported by the implementation, the symbolic name and the
corresponding encoding value must be included in the file. Additional
unique symbolic names may be included. A coded character value can be
represented by more than one symbolic name.

The encoding part is expressed as one (for single-byte character
values) or more concatenated decimal, octal or hexadecimal constants in
the following formats:

    "%cd%d",<escape_char>,<decimal byte value>

    "%cx%x",<escape_char>,<hexadecimal byte value>

    "%c%o",<escape_char>,<octal byte value>

Decimal Constants
Decimal constants must be represented by two or three decimal digits,
preceded by the escape character and the lower-case letter d; for
example, \d05, \d97, or \d143. Hexadecimal constants must be
represented by two hexadecimal digits, preceded by the escape character
and the lower-case letter x; for example, \x05, \x61, or \x8f. Octal
constants must be represented by two or three octal digits, preceded by
the escape character; for example, \05, \141, or \217. In a portable
charmap file, each constant must represent an 8-bit byte.
Implementations supporting other byte sizes may allow constants to
represent values larger than those that can be represented in 8-bit
bytes, and may allow additional digits in constants. When constants are
concatenated for multi-byte character values, they must be of the same
type, and are interpreted in byte order from first to last, with the
least significant byte of the multi-byte character specified by the
last constant.

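A minimal Python sketch of these constant rules follows, assuming a
portable charmap in which every constant is one 8-bit byte. The function
name parse_encoding is hypothetical, and the larger byte sizes that an
implementation may allow are not modelled.

```python
import re


def parse_encoding(text, escape_char="\\"):
    """Convert concatenated constants such as r'\\d129\\d254' into bytes."""
    esc = re.escape(escape_char)
    token = re.compile(
        rf"{esc}(?:d(?P<dec>[0-9]{{2,3}})"
        rf"|x(?P<hex>[0-9A-Fa-f]{{2}})"
        rf"|(?P<oct>[0-7]{{2,3}}))"
    )
    bases = {"dec": 10, "hex": 16, "oct": 8}
    values, kinds, pos = [], set(), 0
    while pos < len(text):
        m = token.match(text, pos)
        if m is None:
            raise ValueError(f"malformed constant at offset {pos}")
        kind = m.lastgroup  # which of dec / hex / oct matched
        value = int(m.group(kind), bases[kind])
        if value > 255:
            raise ValueError("constant does not fit in an 8-bit byte")
        values.append(value)
        kinds.add(kind)
        pos = m.end()
    if not values:
        raise ValueError("empty encoding")
    if len(kinds) > 1:
        raise ValueError("concatenated constants must be of the same type")
    # First constant is the first (most significant) byte, last is the least.
    return bytes(values)
```

Under these assumptions, \d97, \x61, and \141 all decode to the same
single byte, and mixing types within one encoding is rejected.
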
Ranges of Symbolic Names
In lines defining ranges of symbolic names, the encoded value is the
value for the first symbolic name in the range (the symbolic name
preceding the ellipsis). Subsequent symbolic names defined by the range
have encoding values in increasing order. Bytes are treated as unsigned
octets and carry is propagated between the bytes as necessary to
represent the range. However, a range whose expansion requires such a
carry should not be specified, because the carry produces a null byte
in the second or subsequent bytes of a character. For example, the line

    <j0101>...<j0104>   \d129\d254

is interpreted as:

    <j0101>   \d129\d254
    <j0102>   \d129\d255
    <j0103>   \d130\d00
    <j0104>   \d130\d01

The expanded declaration of the symbol <j0103> in the above example is
an invalid specification, because it contains a null byte in the second
byte of a character.

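The carry behaviour in this example can be reproduced with a short
Python sketch. The function name expand_range_encodings is hypothetical.
Treating the bytes as one big-endian unsigned integer is one way to
model "carry is propagated between the bytes", and each result is
flagged when it contains a null byte after the first byte.

```python
def expand_range_encodings(first_encoding, count):
    """Return (encoding, has_null_byte) pairs for a range of `count` names."""
    width = len(first_encoding)
    start = int.from_bytes(first_encoding, "big")  # bytes as unsigned octets
    result = []
    for offset in range(count):
        # to_bytes raises OverflowError if the carry runs off the first byte.
        encoding = (start + offset).to_bytes(width, "big")
        has_null = 0 in encoding[1:]  # null in second or subsequent byte
        result.append((encoding, has_null))
    return result
```

For the first encoding \d129\d254 and four names, the third result is
the byte pair 130, 0, which is flagged, matching the invalid <j0103>
entry discussed above.
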
The comment field of a character set mapping definition line is
optional.

Width Specification
The following declarations can follow the character set mapping
definitions (after the "END CHARMAP" statement). Each consists of the
keyword shown in the following list, starting in column 1, followed by
the value(s) to be associated with the keyword, as defined below.

WIDTH            A non-negative integer value defining the column
                 width for the printable characters in the coded
                 character set mapping definitions. Coded character
                 set character values are defined using symbolic
                 character names followed by column width values.
                 Defining a character with more than one WIDTH
                 produces undefined results. The END WIDTH keyword is
                 used to terminate the WIDTH definitions. Specifying
                 the width of a non-printable character in a WIDTH
                 declaration produces undefined results.

WIDTH_DEFAULT    A non-negative integer value defining the default
                 column width for any printable character not listed
                 by one of the WIDTH keywords. If no WIDTH_DEFAULT
                 keyword is included in the charmap, the default
                 character width is 1.

Example:

After the "END CHARMAP" statement, a width definition could look like
this:

    WIDTH
    <A>               1
    <B>               1
    <C>...<Z>         1
    ...
    <fool>...<foon>   2
    ...
    END WIDTH

In this example, the numerical code point values represented by the
symbols <A> and <B> are assigned a width of 1. The code point values
<C> to <Z> inclusive, that is, <C>, <D>, <E>, and so on, are also
assigned a width of 1. Using <A>...<Z> would have required fewer lines,
but the alternative was shown to demonstrate flexibility. The keyword
WIDTH_DEFAULT could have been added as appropriate.

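One possible reading of the width rules is sketched below in Python.
The function name parse_width_section and the charmap argument (a
dictionary from symbolic names to integer code point values, built
elsewhere from the CHARMAP section) are assumptions for illustration.
The sketch strips leading blanks rather than enforcing column 1, and it
does not handle the elided "..." lines shown in the example.

```python
def parse_width_section(lines, charmap):
    """Return (widths by code point, default width) from WIDTH declarations."""
    widths = {}
    default = 1  # documented default when WIDTH_DEFAULT is absent
    in_width = False
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line == "WIDTH":
            in_width = True
        elif line == "END WIDTH":
            in_width = False
        elif line.startswith("WIDTH_DEFAULT"):
            default = int(line.split()[1])
        elif in_width:
            names, value = line.split()
            first, sep, last = names.partition("...")
            low = charmap[first]
            high = charmap[last] if sep else low
            # A range covers every code point value from first to last.
            for code in range(low, high + 1):
                widths[code] = int(value)
    return widths, default
```

A caller would then look up a printable character's column width with
widths.get(code, default), so that unlisted characters fall back to the
default width.
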
SEE ALSO
locale(1), localedef(1), nl_langinfo(3C), extensions(5), locale(5)

SunOS 5.11                    1 Dec 2003                    charmap(5)