u8_textprep_str(3C)      Standard C Library Functions      u8_textprep_str(3C)

NAME
       u8_textprep_str - string-based UTF-8 text preparation function

SYNOPSIS
       #include <sys/u8_textprep.h>

       size_t u8_textprep_str(char *inarray, size_t *inlen,
            char *outarray, size_t *outlen, int flag,
            size_t unicode_version, int *errnum);

PARAMETERS
       inarray            A pointer to a byte array containing the sequence
                          of UTF-8 character bytes to be prepared.

       inlen              On input, the number of bytes in inarray to be
                          prepared. On output, the number of bytes in
                          inarray not yet consumed.

       outarray           A pointer to a byte array where the prepared UTF-8
                          character bytes are saved.

       outlen             On input, the number of bytes available at
                          outarray for saving prepared character bytes. On
                          output, the number of bytes still available at
                          outarray after the conversion.

       flag               The preparation options, constructed by a
                          bitwise-inclusive-OR of the following values:

                          U8_TEXTPREP_IGNORE_NULL

                              Normally, u8_textprep_str() stops the
                              preparation when it encounters a null byte,
                              even if the value pointed to by inlen is
                              still greater than zero.

                              With this option, a null byte does not stop
                              the preparation. Preparation continues until
                              all inlen bytes of inarray have been consumed
                              or an error occurs.

                          U8_TEXTPREP_IGNORE_INVALID

                              Normally, u8_textprep_str() stops the
                              preparation when it encounters an illegal or
                              incomplete character and sets the
                              corresponding errnum value.

                              With this option, u8_textprep_str() does not
                              stop the preparation and instead treats such
                              characters as needing no preparation.

                          U8_TEXTPREP_TOUPPER

                              Map lowercase characters to uppercase
                              characters, if applicable.

                          U8_TEXTPREP_TOLOWER

                              Map uppercase characters to lowercase
                              characters, if applicable.

                          U8_TEXTPREP_NFD

                              Apply Unicode Normalization Form D.

                          U8_TEXTPREP_NFC

                              Apply Unicode Normalization Form C.

                          U8_TEXTPREP_NFKD

                              Apply Unicode Normalization Form KD.

                          U8_TEXTPREP_NFKC

                              Apply Unicode Normalization Form KC.

                          Only one case folding option is allowed. Only one
                          Unicode Normalization option is allowed.

                          When a case folding option and a Unicode
                          Normalization option are specified together, case
                          folding is done first, followed by Unicode
                          Normalization.

                          If no option is specified, no processing occurs
                          except the simple copying of bytes from input to
                          output.

       unicode_version    The version of Unicode data to use during UTF-8
                          text preparation. The following values are
                          supported:

                          U8_UNICODE_320

                              Use Unicode 3.2.0 data during text
                              preparation.

                          U8_UNICODE_500

                              Use Unicode 5.0.0 data during text
                              preparation.

                          U8_UNICODE_LATEST

                              Use the latest Unicode version data available,
                              which is currently Unicode 5.0.0.

       errnum             The error value set when preparation is not
                          completed or fails. The following values are
                          supported:

                          E2BIG     Text preparation stopped due to lack of
                                    space in the output array.

                          EBADF     The specified option values conflict
                                    and cannot be supported.

                          EILSEQ    Text preparation stopped due to an
                                    input byte that does not belong to
                                    UTF-8.

                          EINVAL    Text preparation stopped due to an
                                    incomplete UTF-8 character at the end
                                    of the input array.

                          ERANGE    The specified Unicode version value is
                                    not a supported version.
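
       The rule that flag may name at most one case folding option and at
       most one Unicode Normalization option can be checked before the call.
       The sketch below is illustrative only: textprep_flag_ok() is a
       hypothetical helper, and the DEMO_ constants are stand-ins with
       made-up values so that the block compiles without
       <sys/u8_textprep.h>. In real code, substitute the U8_TEXTPREP_
       constants from that header; their numeric values are not assumed
       here.

```c
#include <stdbool.h>

/* Stand-in option values for illustration; NOT the real header values. */
#define	DEMO_TOUPPER	0x01
#define	DEMO_TOLOWER	0x02
#define	DEMO_NFD	0x10
#define	DEMO_NFC	0x20
#define	DEMO_NFKD	0x40
#define	DEMO_NFKC	0x80

/*
 * Hypothetical helper: returns true when flag names at most one case
 * folding option and at most one normalization option. Each group is
 * masked out and compared against the individual permitted values.
 */
static bool
textprep_flag_ok(int flag)
{
	int fold = flag & (DEMO_TOUPPER | DEMO_TOLOWER);
	int norm = flag & (DEMO_NFD | DEMO_NFC | DEMO_NFKD | DEMO_NFKC);

	bool fold_ok = (fold == 0 || fold == DEMO_TOUPPER ||
	    fold == DEMO_TOLOWER);
	bool norm_ok = (norm == 0 || norm == DEMO_NFD || norm == DEMO_NFC ||
	    norm == DEMO_NFKD || norm == DEMO_NFKC);

	return (fold_ok && norm_ok);
}
```

       A caller could run such a check first and treat a false result the
       same way as an EBADF failure from u8_textprep_str().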


DESCRIPTION
       The u8_textprep_str() function prepares the sequence of UTF-8
       characters in the array specified by inarray and stores the
       corresponding prepared UTF-8 characters in the array specified by
       outarray. The inarray argument points to the first character in the
       input array, and inlen indicates the number of bytes from there to
       the end of the array to be converted. The outarray argument points to
       the first available byte in the output array, and outlen indicates
       the number of available bytes from there to the end of the array.
       Unless U8_TEXTPREP_IGNORE_NULL is set in flag, u8_textprep_str()
       stops when it encounters a null byte in the input array, regardless
       of the current inlen value.

       If U8_TEXTPREP_IGNORE_INVALID is not set in flag and a sequence of
       input bytes does not form a valid UTF-8 character, preparation stops
       after the previous successfully prepared character. Likewise, if
       U8_TEXTPREP_IGNORE_INVALID is not set and the input array ends with
       an incomplete UTF-8 character, preparation stops after the previous
       successfully prepared bytes. If the output array is not large enough
       to hold the entire prepared text, preparation stops just prior to the
       input bytes that would cause the output array to overflow. The value
       pointed to by inlen is decremented to reflect the number of bytes
       still not prepared in the input array. The value pointed to by outlen
       is decremented to reflect the number of bytes still available in the
       output array.
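
       Because inlen is left holding the count of unprepared input bytes
       when the output array fills up, a caller can resume from that point
       with fresh output space. The following is a minimal sketch of that
       pattern, not a definitive implementation. The names prep_fn,
       copy_prep(), and prep_in_chunks() are hypothetical. copy_prep() is a
       stand-in that only copies bytes and reports E2BIG, so the sketch
       runs without the library; on a system that provides it,
       u8_textprep_str() could be passed in its place.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Same parameter list as u8_textprep_str(), per the SYNOPSIS. */
typedef size_t (*prep_fn)(char *, size_t *, char *, size_t *, int, size_t,
    int *);

/* Stand-in: plain byte copy that reports E2BIG when the output fills. */
static size_t
copy_prep(char *in, size_t *inlen, char *out, size_t *outlen, int flag,
    size_t version, int *errnum)
{
	size_t n = (*inlen < *outlen) ? *inlen : *outlen;

	(void) flag;
	(void) version;
	memcpy(out, in, n);
	*inlen -= n;
	*outlen -= n;
	if (*inlen != 0) {
		*errnum = E2BIG;
		return ((size_t)-1);
	}
	return (0);
}

/*
 * Prepare in[0..inlen) through a small scratch buffer, appending each
 * prepared chunk to dst. Returns 0 on success or an errno value.
 */
static int
prep_in_chunks(prep_fn prep, char *in, size_t inlen, int flag,
    size_t version, char *dst, size_t dstcap, size_t *dstlen)
{
	char chunk[8];

	assert(prep != NULL && dstlen != NULL);
	*dstlen = 0;
	for (;;) {
		size_t il = inlen;
		size_t ol = sizeof (chunk);
		int err = 0;
		size_t ret = prep(in, &il, chunk, &ol, flag, version, &err);
		size_t produced = sizeof (chunk) - ol;

		if (produced > dstcap - *dstlen)
			return (E2BIG);		/* destination is full */
		memcpy(dst + *dstlen, chunk, produced);
		*dstlen += produced;

		if (ret != (size_t)-1)
			return (0);
		if (err != E2BIG || il == inlen)
			return (err);	/* other error, or no progress */

		in += inlen - il;	/* skip the bytes already consumed */
		inlen = il;
	}
}
```

       The no-progress check matters with a real preparation function: if a
       single character does not fit in the scratch buffer, retrying with
       the same buffer size would loop forever, so the sketch gives up and
       returns E2BIG instead.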

RETURN VALUES
       The u8_textprep_str() function updates the values pointed to by the
       inlen and outlen arguments to reflect the extent of the preparation.
       When U8_TEXTPREP_IGNORE_INVALID is specified, u8_textprep_str()
       returns the number of illegal or incomplete characters found during
       the text preparation. When U8_TEXTPREP_IGNORE_INVALID is not
       specified and the text preparation is entirely successful, the
       function returns 0. If the entire string in the input array is
       prepared, the value pointed to by inlen is 0. If the text preparation
       stops due to any of the conditions described above, the value pointed
       to by inlen is non-zero and errnum is set to indicate the error. If
       such an error or any other error occurs, u8_textprep_str() returns
       (size_t)-1 and sets errnum to indicate the error.
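
       The return convention and the documented errnum values can be
       wrapped in small helpers for reporting. This is an illustrative
       sketch: textprep_failed() and textprep_errmsg() are hypothetical
       names, not part of the library, and the message strings paraphrase
       the errnum descriptions in PARAMETERS.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true when the return value signals failure. */
static bool
textprep_failed(size_t ret)
{
	return (ret == (size_t)-1);
}

/* Hypothetical helper: map a documented errnum value to a short message. */
static const char *
textprep_errmsg(int errnum)
{
	switch (errnum) {
	case E2BIG:
		return ("output array too small");
	case EBADF:
		return ("conflicting option values");
	case EILSEQ:
		return ("input byte that does not belong to UTF-8");
	case EINVAL:
		return ("incomplete UTF-8 character at end of input");
	case ERANGE:
		return ("unsupported Unicode version");
	default:
		return ("unrecognized error value");
	}
}
```

       Any return value other than (size_t)-1 is a count: 0 on a clean run,
       or the number of illegal or incomplete characters found when
       U8_TEXTPREP_IGNORE_INVALID was specified.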

EXAMPLES
       Example 1 Simple UTF-8 text preparation

           #include <sys/param.h>          /* MAXPATHLEN */
           #include <sys/u8_textprep.h>
           #include <errno.h>              /* E2BIG, EBADF, ERANGE */
           #include <string.h>             /* strlcpy(), strlen() */
           .
           .
           .
           size_t ret;
           char ib[MAXPATHLEN];
           char ob[MAXPATHLEN];
           size_t il, ol;
           int err;
           .
           .
           .
           /*
            * We got a UTF-8 pathname from somewhere.
            *
            * Calculate the length of the input string, including the
            * terminating null byte, and prepare the other arguments.
            */
           (void) strlcpy(ib, pathname, MAXPATHLEN);
           il = strlen(ib) + 1;
           ol = MAXPATHLEN;

           /*
            * Fold to uppercase, apply Unicode Normalization Form D,
            * do not stop at null bytes, and ignore any illegal or
            * incomplete characters.
            */
           ret = u8_textprep_str(ib, &il, ob, &ol,
               (U8_TEXTPREP_IGNORE_NULL|U8_TEXTPREP_IGNORE_INVALID|
               U8_TEXTPREP_TOUPPER|U8_TEXTPREP_NFD), U8_UNICODE_LATEST, &err);
           if (ret == (size_t)-1) {
               if (err == E2BIG)
                   return (-1);
               if (err == EBADF)
                   return (-2);
               if (err == ERANGE)
                   return (-3);
               return (-4);
           }

ATTRIBUTES
       See attributes(5) for descriptions of the following attributes:

       ┌─────────────────────────────┬─────────────────────────────┐
       │ ATTRIBUTE TYPE              │ ATTRIBUTE VALUE             │
       ├─────────────────────────────┼─────────────────────────────┤
       │Interface Stability          │Committed                    │
       ├─────────────────────────────┼─────────────────────────────┤
       │MT-Level                     │MT-Safe                      │
       └─────────────────────────────┴─────────────────────────────┘

SEE ALSO
       u8_strcmp(3C), u8_validate(3C), attributes(5), u8_strcmp(9F),
       u8_textprep_str(9F), u8_validate(9F)

       The Unicode Standard (http://www.unicode.org)

NOTES
       After the text preparation, the number of prepared UTF-8 characters
       and the total number of bytes may be smaller or larger than the
       corresponding numbers for the input buffer.

       Case conversions are performed using Unicode data of the
       corresponding version. No locale-specific case conversions are
       performed.

SunOS 5.11                        18 Sep 2007              u8_textprep_str(3C)