charmap(5)            Standards, Environments, and Macros           charmap(5)

NAME

       charmap - character set description file

DESCRIPTION

       A character set description file, or charmap, defines characteristics
       for a coded character set. Other information about the coded character
       set may also be in the file. Coded character set character values are
       defined using symbolic character names followed by character encoding
       values.

       The character set description file provides:

           o      The capability to describe character set attributes (such as
                  collation order or character classes) independent of
                  character set encoding, and using only the characters in the
                  portable character set. This makes it possible to create
                  generic localedef(1) source files for all codesets that
                  share the portable character set.

           o      Standardized symbolic names for all characters in the
                  portable character set, making it possible to refer to any
                  such character regardless of encoding.

   Symbolic Names
       Each symbolic name is included in the file and is mapped to a unique
       encoding value (except for those symbolic names that are shown with
       identical glyphs). If the control characters commonly associated with
       the symbolic names in the following table are supported by the
       implementation, the symbolic names and their corresponding encoding
       values are included in the file. Some of the encodings associated with
       the symbolic names in this table may be the same as characters in the
       portable character set table.

       ┌───────────────────────────────────────────────────────────────────────┐
       │<ACK>       <DC2>       <ENQ>       <FS>         <IS4>       <SOH>     │
       │<BEL>       <DC3>       <EOT>       <GS>         <LF>        <STX>     │
       │<BS>        <DC4>       <ESC>       <HT>         <NAK>       <SUB>     │
       │<CAN>       <DEL>       <ETB>       <IS1>        <RS>        <SYN>     │
       │<CR>        <DLE>       <ETX>       <IS2>        <SI>        <US>      │
       │<DC1>       <EM>        <FF>        <IS3>        <SO>        <VT>      │
       └───────────────────────────────────────────────────────────────────────┘

   Declarations
       The following declarations can precede the character definitions. Each
       must consist of the symbol shown in the following list, starting in
       column 1, including the surrounding brackets, followed by one or more
       blank characters, followed by the value to be assigned to the symbol.

       <code_set_name>    The name of the coded character set for which the
                          character set description file is defined.

       <mb_cur_max>       The maximum number of bytes in a multi-byte
                          character. This defaults to 1.

       <mb_cur_min>       An unsigned positive integer value that defines the
                          minimum number of bytes in a character for the
                          encoded character set.

       <escape_char>      The escape character used to indicate that the
                          characters following will be interpreted in a
                          special way, as defined later in this section. This
                          defaults to backslash ('\'), which is the character
                          glyph used in all the following text and examples,
                          unless otherwise noted.

       <comment_char>     The character that, when placed in column 1 of a
                          charmap line, indicates that the line is to be
                          ignored. The default character is the number sign
                          (#).

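       The declaration rules above can be sketched as a small reader. This is
       a minimal illustration, not part of any charmap tool: the function name
       parse_declarations and the DEFAULTS table are hypothetical, and only
       the defaults stated above (1, backslash, number sign) are filled in.

```python
# Hypothetical helper: collect charmap declarations that precede CHARMAP.
# Defaults follow the text above; entries with no stated default are None.
DEFAULTS = {
    "code_set_name": None,
    "mb_cur_max": "1",
    "mb_cur_min": None,
    "escape_char": "\\",
    "comment_char": "#",
}


def parse_declarations(lines):
    """Return a dict of declaration values found before the CHARMAP line."""
    decls = dict(DEFAULTS)
    for line in lines:
        if line.startswith("CHARMAP"):
            break  # character definitions start here
        if not line.startswith("<"):
            continue  # declarations must start in column 1 with '<'
        name, _, value = line.partition(">")
        key = name[1:]
        if key in decls:
            # The value follows one or more blank characters.
            decls[key] = value.strip()
    return decls
```

       For example, a file that begins with "<code_set_name> ISO8859-1" and
       "<escape_char> /" yields those two values and keeps the remaining
       defaults.
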
   Format
       The character set mapping definitions will be all the lines immediately
       following an identifier line containing the string CHARMAP starting in
       column 1, and preceding a trailer line containing the string END
       CHARMAP starting in column 1. Empty lines and lines containing a
       <comment_char> in the first column will be ignored. Each non-comment
       line of the character set mapping definition (that is, between the
       CHARMAP and END CHARMAP lines of the file) must be in either of two
       forms:

         "%s %s %s\n",<symbolic-name>,<encoding>,<comments>

       or

         "%s...%s %s %s\n",<symbolic-name>,<symbolic-name>,<encoding>,\
                  <comments>

       In the first format, the line in the character set mapping definition
       defines a single symbolic name and a corresponding encoding. A
       character following an escape character is interpreted as itself; for
       example, the sequence "<\\\>>" represents the symbolic name "\>"
       enclosed between angle brackets.

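       The escape rule for symbolic names can be sketched as follows. The
       function name parse_symbolic_name is hypothetical; the sketch assumes
       the name starts at the beginning of the text it is given.

```python
def parse_symbolic_name(text, escape_char="\\"):
    """Return (name, rest) for a '<...>' token at the start of text.

    A character following the escape character is taken as itself, so an
    escaped '>' does not end the name.
    """
    if not text.startswith("<"):
        raise ValueError("symbolic name must start with '<'")
    name = []
    i = 1
    while i < len(text):
        ch = text[i]
        if ch == escape_char:
            if i + 1 >= len(text):
                raise ValueError("escape character at end of input")
            name.append(text[i + 1])  # escaped character stands for itself
            i += 2
        elif ch == ">":
            return "".join(name), text[i + 1:]
        else:
            name.append(ch)
            i += 1
    raise ValueError("unterminated symbolic name")
```

       With the default escape character, the input <\\\>> produces the name
       \> and an empty remainder.
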
       In the second format, the line in the character set mapping definition
       defines a range of one or more symbolic names. In this form, the
       symbolic names must consist of zero or more non-numeric characters,
       followed by an integer formed by one or more decimal digits. The
       characters preceding the integer must be identical in the two symbolic
       names, and the integer formed by the digits in the second symbolic name
       must be equal to or greater than the integer formed by the digits in
       the first name. This is interpreted as a series of symbolic names
       formed from the common part and each of the integers between the first
       and the second integer, inclusive. As an example, <j0101>...<j0104> is
       interpreted as the symbolic names <j0101>, <j0102>, <j0103>, and
       <j0104>, in that order.

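       The range rule for symbolic names can be sketched as follows. The
       function name expand_name_range is hypothetical, the names are passed
       without their angle brackets, and padding each generated integer to
       the digit count of the first name is an assumption drawn from the
       <j0101>...<j0104> example.

```python
import re

# Zero or more non-digit characters followed by one or more decimal digits.
_RANGE_NAME = re.compile(r"^([^0-9]*)([0-9]+)$")


def expand_name_range(first, last):
    """Expand 'j0101', 'j0104' into the list of names in the range."""
    m1, m2 = _RANGE_NAME.match(first), _RANGE_NAME.match(last)
    if not m1 or not m2:
        raise ValueError("name must be non-digits followed by digits")
    if m1.group(1) != m2.group(1):
        raise ValueError("characters preceding the integer must be identical")
    start, end = int(m1.group(2)), int(m2.group(2))
    if end < start:
        raise ValueError("second integer must not be less than the first")
    width = len(m1.group(2))  # assumption: keep the first name's zero padding
    prefix = m1.group(1)
    return [f"{prefix}{n:0{width}d}" for n in range(start, end + 1)]
```
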
       A character set mapping definition line must exist for all symbolic
       names and must define the coded character value that corresponds to the
       character glyph indicated in the table, or the coded character value
       that corresponds with the control character symbolic name. If the
       control characters commonly associated with the symbolic names are
       supported by the implementation, the symbolic name and the
       corresponding encoding value must be included in the file. Additional
       unique symbolic names may be included. A coded character value can be
       represented by more than one symbolic name.

       The encoding part is expressed as one (for single-byte character
       values) or more concatenated decimal, octal or hexadecimal constants in
       the following formats:

         "%cd%d",<escape_char>,<decimal byte value>

         "%cx%x",<escape_char>,<hexadecimal byte value>

         "%c%o",<escape_char>,<octal byte value>

   Decimal Constants
       Decimal constants must be represented by two or three decimal digits,
       preceded by the escape character and the lower-case letter d; for
       example, \d05, \d97, or \d143. Hexadecimal constants must be
       represented by two hexadecimal digits, preceded by the escape character
       and the lower-case letter x; for example, \x05, \x61, or \x8f. Octal
       constants must be represented by two or three octal digits, preceded by
       the escape character; for example, \05, \141, or \217. In a portable
       charmap file, each constant must represent an 8-bit byte.
       Implementations supporting other byte sizes may allow constants to
       represent values larger than those that can be represented in 8-bit
       bytes, and may allow additional digits in constants. When constants are
       concatenated for multi-byte character values, they must be of the same
       type, and are interpreted in byte order from first to last, with the
       least significant byte of the multi-byte character specified by the
       last constant.

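       The constant formats above can be sketched as a small decoder. The
       function name parse_encoding is hypothetical, and the sketch assumes a
       portable charmap, so every constant must fit in an 8-bit byte.

```python
import re


def parse_encoding(text, escape_char="\\"):
    """Convert concatenated constants such as \\d129\\d254 into bytes."""
    esc = re.escape(escape_char)
    token = re.compile(
        esc
        + r"(?:d(?P<dec>[0-9]{2,3})"      # decimal: two or three digits
        + r"|x(?P<hex>[0-9A-Fa-f]{2})"    # hexadecimal: two digits
        + r"|(?P<oct>[0-7]{2,3}))"        # octal: two or three digits
    )
    bases = {"dec": 10, "hex": 16, "oct": 8}
    kinds = set()
    out = bytearray()
    pos = 0
    while pos < len(text):
        m = token.match(text, pos)
        if not m:
            raise ValueError(f"bad constant at offset {pos}")
        kind = m.lastgroup
        value = int(m.group(kind), bases[kind])
        if value > 0xFF:
            raise ValueError("constant does not fit in an 8-bit byte")
        kinds.add(kind)
        out.append(value)
        pos = m.end()
    if not out:
        raise ValueError("empty encoding")
    if len(kinds) > 1:
        raise ValueError("concatenated constants must be of the same type")
    return bytes(out)  # first constant first, least significant byte last
```

       Under these assumptions \d97, \x61, and \141 all decode to the same
       single byte.
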
   Ranges of Symbolic Names
       In lines defining ranges of symbolic names, the encoded value is the
       value for the first symbolic name in the range (the symbolic name
       preceding the ellipsis). Subsequent symbolic names defined by the range
       will have encoding values in increasing order. Bytes are treated as
       unsigned octets and carry is propagated between the bytes as necessary
       to represent the range. However, a range whose carry produces a null
       byte in the second or a subsequent byte of a character should not be
       specified. For example, the line

         <j0101>...<j0104>     \d129\d254

       is interpreted as:

         <j0101>                \d129\d254
         <j0102>                \d129\d255
         <j0103>                \d130\d00
         <j0104>                \d130\d01

       The expanded declaration of the symbol <j0103> in the above example is
       an invalid specification, because it contains a null byte in the second
       byte of a character.

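       The carry behavior in the example above can be reproduced with a short
       sketch. The function name expand_range_encoding is hypothetical; it
       treats the encoding as an unsigned big-endian integer and flags any
       expansion that has a null byte after the first byte.

```python
def expand_range_encoding(first_bytes, count):
    """Return [(encoding, has_null_byte), ...] for a range of count names."""
    width = len(first_bytes)
    # Last constant is the least significant byte, so read big-endian.
    start = int.from_bytes(first_bytes, "big")
    result = []
    for offset in range(count):
        # Carry propagates between bytes; overflow raises OverflowError.
        encoded = (start + offset).to_bytes(width, "big")
        has_null = 0 in encoded[1:]  # null in second or subsequent byte
        result.append((encoded, has_null))
    return result
```

       Called with the bytes 129 and 254 and a count of 4, the third entry is
       the byte pair 130, 0 and is the only one flagged, matching <j0103>
       above.
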
       In both forms of a character set mapping definition line, the comment
       is optional.

   Width Specification
       The following declarations can follow the character set mapping
       definitions (after the "END CHARMAP" statement). Each consists of the
       keyword shown in the following list, starting in column 1, followed by
       the value(s) to be associated with the keyword, as defined below.

       WIDTH            A non-negative integer value defining the column width
                        for the printable characters in the coded character
                        set mapping definitions. Coded character set character
                        values are defined using symbolic character names
                        followed by column width values. Defining a character
                        with more than one WIDTH produces undefined results.
                        The END WIDTH keyword is used to terminate the WIDTH
                        definitions. Specifying the width of a non-printable
                        character in a WIDTH declaration produces undefined
                        results.

       WIDTH_DEFAULT    A non-negative integer value defining the default
                        column width for any printable character not listed by
                        one of the WIDTH keywords. If no WIDTH_DEFAULT keyword
                        is included in the charmap, the default character
                        width is 1.

       Example:

       After the "END CHARMAP" statement, a syntax for a width definition
       would be:

         WIDTH
         <A>             1
         <B>             1
         <C>...<Z>       1
         ...
         <foo1>...<foon> 2
         ...
         END WIDTH

       In this example, the numerical code point values represented by the
       symbols <A> and <B> are assigned a width of 1. The code point values
       <C> to <Z> inclusive, that is, <C>, <D>, <E>, and so on, are also
       assigned a width of 1. Using <A>...<Z> would have required fewer lines,
       but the alternative was shown to demonstrate flexibility. The keyword
       WIDTH_DEFAULT could have been added as appropriate.

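       A reader for the width declarations can be sketched as follows. The
       function name parse_widths is hypothetical. The sketch keeps range
       entries such as <C>...<Z> as written rather than expanding them, and
       it assumes symbolic names contain no blank characters.

```python
def parse_widths(lines):
    """Return (widths, default) from the lines after END CHARMAP."""
    widths = {}
    default = 1  # used when no WIDTH_DEFAULT keyword is present
    in_width = False
    for raw in lines:
        line = raw.rstrip()
        if not line:
            continue
        if line == "WIDTH":
            in_width = True
        elif line == "END WIDTH":
            in_width = False
        elif line.startswith("WIDTH_DEFAULT"):
            default = int(line.split()[1])
        elif in_width:
            name, value = line.split()
            widths[name] = int(value)  # range entries are stored unexpanded
    return widths, default
```
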

SEE ALSO

       locale(1), localedef(1), nl_langinfo(3C), extensions(5), locale(5)

SunOS 5.11                        1 Dec 2003                        charmap(5)