u8_textprep_str(3C)      Standard C Library Functions      u8_textprep_str(3C)

NAME

       u8_textprep_str - string-based UTF-8 text preparation function

SYNOPSIS

       #include <sys/u8_textprep.h>

       size_t u8_textprep_str(char *inarray, size_t *inlen,
            char *outarray, size_t *outlen, int flag,
            size_t unicode_version, int *errnum);

PARAMETERS

       inarray             A pointer to a byte array containing a sequence of
                           UTF-8 character bytes to be prepared.

       inlen               On input, the number of bytes to be prepared in
                           inarray. On output, the number of bytes in inarray
                           not yet consumed.

       outarray            A pointer to a byte array where the prepared UTF-8
                           character bytes are stored.

       outlen              On input, the number of bytes available at outarray
                           for storing prepared character bytes. On output,
                           the number of bytes still available at outarray.

       flag                The preparation options, constructed by a
                           bitwise-inclusive-OR of the following values:

                           U8_TEXTPREP_IGNORE_NULL

                               Normally u8_textprep_str() stops the
                               preparation when it encounters a null byte,
                               even if the value pointed to by inlen is still
                               greater than zero.

                               With this option, a null byte does not stop the
                               preparation, which continues until all inlen
                               bytes of inarray have been consumed or an error
                               occurs.

                           U8_TEXTPREP_IGNORE_INVALID

                               Normally u8_textprep_str() stops the
                               preparation when it encounters an illegal or
                               incomplete character and sets the corresponding
                               errnum value.

                               With this option, u8_textprep_str() does not
                               stop and instead treats such characters as
                               needing no preparation.

                           U8_TEXTPREP_TOUPPER

                               Map lowercase characters to uppercase
                               characters if applicable.

                           U8_TEXTPREP_TOLOWER

                               Map uppercase characters to lowercase
                               characters if applicable.

                           U8_TEXTPREP_NFD

                               Apply Unicode Normalization Form D.

                           U8_TEXTPREP_NFC

                               Apply Unicode Normalization Form C.

                           U8_TEXTPREP_NFKD

                               Apply Unicode Normalization Form KD.

                           U8_TEXTPREP_NFKC

                               Apply Unicode Normalization Form KC.

                           Only one case folding option is allowed. Only one
                           Unicode Normalization option is allowed.

                           When a case folding option and a Unicode
                           Normalization option are specified together, case
                           folding is done first, followed by Unicode
                           Normalization.

                           If no option is specified, no processing occurs
                           other than a simple copy of bytes from input to
                           output.

       unicode_version     The version of Unicode data to use during UTF-8
                           text preparation. The following values are
                           supported:

                           U8_UNICODE_320

                               Use Unicode 3.2.0 data during text preparation.

                           U8_UNICODE_500

                               Use Unicode 5.0.0 data during text preparation.

                           U8_UNICODE_LATEST

                               Use the latest Unicode version data available,
                               which is currently Unicode 5.0.0.

       errnum              The error value set when preparation is not
                           completed or fails. The following values are
                           supported:

                           E2BIG     Text preparation stopped due to lack of
                                     space in the output array.

                           EBADF     The specified option values conflict and
                                     cannot be supported.

                           EILSEQ    Text preparation stopped due to an input
                                     byte that does not belong to UTF-8.

                           EINVAL    Text preparation stopped due to an
                                     incomplete UTF-8 character at the end of
                                     the input array.

                           ERANGE    The specified Unicode version value is
                                     not a supported version.
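       The rule that at most one case folding option and at most one
       Unicode Normalization option may be set can be checked before the
       call. The following is a minimal sketch of such a check, not part
       of the library. The DEMO_* names and their numeric values are
       stand-ins invented for illustration; a real program would use the
       U8_TEXTPREP_* values from <sys/u8_textprep.h>. The helper name
       check_prep_flags() is likewise hypothetical.

```c
#include <errno.h>

/*
 * Stand-in option bits for illustration only. Real programs take the
 * U8_TEXTPREP_* values from <sys/u8_textprep.h>; the numeric values
 * below are assumptions, not the header's definitions.
 */
enum {
    DEMO_TOUPPER = 0x01,
    DEMO_TOLOWER = 0x02,
    DEMO_NFD     = 0x10,
    DEMO_NFC     = 0x20,
    DEMO_NFKD    = 0x40,
    DEMO_NFKC    = 0x80
};

/*
 * Return 0 if at most one case folding option and at most one
 * normalization option are set, otherwise EBADF (the error the
 * manual page documents for conflicting options).
 */
int
check_prep_flags(int flag)
{
    int fold = flag & (DEMO_TOUPPER | DEMO_TOLOWER);
    int norm = flag & (DEMO_NFD | DEMO_NFC | DEMO_NFKD | DEMO_NFKC);

    /* Both case folding bits set at once is a conflict. */
    if (fold != 0 && fold != DEMO_TOUPPER && fold != DEMO_TOLOWER)
        return (EBADF);

    /* More than one normalization form set at once is a conflict. */
    if (norm != 0 && norm != DEMO_NFD && norm != DEMO_NFC &&
        norm != DEMO_NFKD && norm != DEMO_NFKC)
        return (EBADF);

    return (0);
}
```

       The check compares each option group against its allowed single
       values instead of counting bits, so it does not depend on every
       option being a single distinct bit.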

DESCRIPTION

       The u8_textprep_str() function prepares the sequence of UTF-8
       characters in the array specified by inarray and stores the
       corresponding prepared UTF-8 characters in the array specified by
       outarray. The inarray argument points to the first character in the
       input array, and inlen indicates the number of bytes from there to
       the end of the array to be prepared. The outarray argument points to
       the first available byte in the output array, and outlen indicates
       the number of available bytes from there to the end of the array.
       Unless flag includes U8_TEXTPREP_IGNORE_NULL, u8_textprep_str() stops
       when it encounters a null byte in the input array, regardless of the
       current inlen value.

       Unless flag includes U8_TEXTPREP_IGNORE_INVALID, preparation stops
       after the previous successfully prepared character if a sequence of
       input bytes does not form a valid UTF-8 character, or if the input
       array ends with an incomplete UTF-8 character. If the output array is
       not large enough to hold the entire prepared text, preparation stops
       just prior to the input bytes that would cause the output array to
       overflow. The value pointed to by inlen is decremented to reflect the
       number of bytes still not prepared in the input array. The value
       pointed to by outlen is decremented to reflect the number of bytes
       still available in the output array.
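       Because inlen and outlen are decremented in place, a caller that
       needs the number of bytes consumed and produced has to save the
       original values before the call and subtract afterwards. The
       following is a small sketch of that arithmetic. The struct and
       function names are hypothetical and not part of the library.

```c
#include <assert.h>
#include <stddef.h>

/* Byte counts derived from the before and after values of the lengths. */
struct prep_progress {
    size_t consumed;    /* input bytes already prepared */
    size_t produced;    /* output bytes written */
};

/*
 * in_before and out_before are the values stored in *inlen and *outlen
 * before the call; in_after and out_after are the values found there
 * after the call returns.
 */
struct prep_progress
prep_progress_of(size_t in_before, size_t in_after,
    size_t out_before, size_t out_after)
{
    struct prep_progress p;

    /* The lengths are only ever decremented. */
    assert(in_after <= in_before);
    assert(out_after <= out_before);

    p.consumed = in_before - in_after;
    p.produced = out_before - out_after;
    return (p);
}
```

       With these counts, the first unprepared input byte is at
       inarray + consumed and the prepared text occupies the first
       produced bytes of outarray.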

RETURN VALUES

       The u8_textprep_str() function updates the values pointed to by the
       inlen and outlen arguments to reflect the extent of the preparation.
       When U8_TEXTPREP_IGNORE_INVALID is specified, u8_textprep_str()
       returns the number of illegal or incomplete characters found during
       the text preparation. When U8_TEXTPREP_IGNORE_INVALID is not
       specified and the text preparation is entirely successful, the
       function returns 0. If the entire string in the input array is
       prepared, the value pointed to by inlen will be 0. If the text
       preparation is stopped due to any of the conditions mentioned above,
       the value pointed to by inlen will be non-zero. If such an error or
       any other error occurs, u8_textprep_str() returns (size_t)-1 and sets
       errnum to indicate the error.
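       The return value and errnum together determine what happened. One
       way a caller might sort the outcomes is sketched below. The enum
       and the function classify_prep_result() are hypothetical names
       introduced for illustration; only the (size_t)-1 convention and
       the errno values come from this manual page.

```c
#include <errno.h>
#include <stddef.h>

enum prep_outcome {
    PREP_OK,            /* ret counts any illegal/incomplete characters */
    PREP_OUTPUT_FULL,   /* E2BIG: output array too small */
    PREP_BAD_FLAGS,     /* EBADF: conflicting option values */
    PREP_ILLEGAL_BYTE,  /* EILSEQ: input byte not valid in UTF-8 */
    PREP_INCOMPLETE,    /* EINVAL: incomplete character at end of input */
    PREP_BAD_VERSION,   /* ERANGE: unsupported Unicode version */
    PREP_OTHER_ERROR    /* any errnum value not listed above */
};

/*
 * ret is the value returned by u8_textprep_str() and err is the value
 * it stored through errnum. err is only examined when ret signals
 * failure.
 */
enum prep_outcome
classify_prep_result(size_t ret, int err)
{
    if (ret != (size_t)-1)
        return (PREP_OK);

    switch (err) {
    case E2BIG:
        return (PREP_OUTPUT_FULL);
    case EBADF:
        return (PREP_BAD_FLAGS);
    case EILSEQ:
        return (PREP_ILLEGAL_BYTE);
    case EINVAL:
        return (PREP_INCOMPLETE);
    case ERANGE:
        return (PREP_BAD_VERSION);
    default:
        return (PREP_OTHER_ERROR);
    }
}
```

       A non-failure return is reported as PREP_OK even when it is greater
       than zero, since with U8_TEXTPREP_IGNORE_INVALID that value is a
       count of skipped characters and not an error.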

EXAMPLES

       Example 1 Simple UTF-8 text preparation

         #include <sys/param.h>
         #include <sys/u8_textprep.h>
         #include <errno.h>
         #include <string.h>
         .
         .
         .
         size_t ret;
         char ib[MAXPATHLEN];
         char ob[MAXPATHLEN];
         size_t il, ol;
         int err;
         .
         .
         .
         /*
          * We got a UTF-8 pathname from somewhere.
          *
          * Calculate the length of the input string, including the
          * terminating null byte, and prepare the other arguments.
          */
         (void) strlcpy(ib, pathname, MAXPATHLEN);
         il = strlen(ib) + 1;
         ol = MAXPATHLEN;

         /*
          * Fold to uppercase, apply Unicode Normalization Form D,
          * ignore null bytes, and ignore any illegal/incomplete characters.
          */
         ret = u8_textprep_str(ib, &il, ob, &ol,
             (U8_TEXTPREP_IGNORE_NULL|U8_TEXTPREP_IGNORE_INVALID|
             U8_TEXTPREP_TOUPPER|U8_TEXTPREP_NFD), U8_UNICODE_LATEST, &err);
         if (ret == (size_t)-1) {
             if (err == E2BIG)
                 return (-1);
             if (err == EBADF)
                 return (-2);
             if (err == ERANGE)
                 return (-3);
             return (-4);
         }

ATTRIBUTES

       See attributes(5) for descriptions of the following attributes:

       ┌─────────────────────────────┬─────────────────────────────┐
       │      ATTRIBUTE TYPE         │      ATTRIBUTE VALUE        │
       ├─────────────────────────────┼─────────────────────────────┤
       │Interface Stability          │Committed                    │
       ├─────────────────────────────┼─────────────────────────────┤
       │MT-Level                     │MT-Safe                      │
       └─────────────────────────────┴─────────────────────────────┘

SEE ALSO

       u8_strcmp(3C), u8_validate(3C), attributes(5), u8_strcmp(9F),
       u8_textprep_str(9F), u8_validate(9F)

       The Unicode Standard (http://www.unicode.org)

NOTES

       After the text preparation, the number of prepared UTF-8 characters
       and the total number of bytes may be smaller or larger than the
       corresponding numbers for the input buffer.

       Case conversions are performed using Unicode data of the
       corresponding version. No locale-specific case conversions are
       performed.

SunOS 5.11                        18 Sep 2007              u8_textprep_str(3C)