TICKIT_UTF8_COUNT(3) Library Functions Manual TICKIT_UTF8_COUNT(3)

NAME
tickit_utf8_count, tickit_utf8_countmore - count characters in Unicode
strings

SYNOPSIS
#include <tickit.h>

typedef struct {
    size_t bytes;
    int    codepoints;
    int    graphemes;
    int    columns;
} TickitStringPos;

size_t tickit_utf8_count(const char *str, TickitStringPos *pos,
    const TickitStringPos *limit);
size_t tickit_utf8_countmore(const char *str, TickitStringPos *pos,
    const TickitStringPos *limit);

size_t tickit_utf8_ncount(const char *str, size_t len,
    TickitStringPos *pos, const TickitStringPos *limit);
size_t tickit_utf8_ncountmore(const char *str, size_t len,
    TickitStringPos *pos, const TickitStringPos *limit);

Link with -ltickit.

DESCRIPTION
tickit_utf8_count() counts characters in the given Unicode string,
which must be in UTF-8 encoding. It starts at the beginning of the
string and counts forward over codepoints and graphemes, incrementing
the counters in pos until it reaches a limit. It will not go further
than any of the limits given by the limit structure (where the value
-1 indicates no limit of that type). It will never split a codepoint in
the middle of a UTF-8 sequence, nor will it split a grapheme between
its codepoints; it is therefore possible that the function returns
before any of the limits have been reached, if the next whole grapheme
would involve going past at least one of the specified limits. The
function will also stop when it reaches the end of str. It returns the
total number of bytes it has counted over.

The bytes member counts UTF-8 bytes which encode individual codepoints.
For example, the Unicode character U+00E9 is encoded by the two bytes
0xc3, 0xa9; it would increment the bytes counter by 2 and the
codepoints counter by 1.

The codepoints member counts individual Unicode codepoints.

The graphemes member counts whole composed graphical clusters of
codepoints, where combining accents which count as individual
codepoints do not count as separate graphemes. For example, the
codepoint sequence U+0065 U+0301 would increment the codepoints counter
by 2 and the graphemes counter by 1.

The columns member counts the number of screen columns consumed by the
graphemes. Most graphemes consume only 1 column, but some are defined
in Unicode to consume 2.

tickit_utf8_countmore() is similar to tickit_utf8_count() except it
will not zero any of the counters before it starts. It can continue
counting where a previous call finished. In particular, it will assume
that it is starting at the beginning of a UTF-8 sequence that begins a
new grapheme; it will not check these facts and the behavior is
undefined if these assumptions do not hold. It will begin at the offset
given by pos.bytes.

The tickit_utf8_ncount() and tickit_utf8_ncountmore() variants are
similar except that they read no more than len bytes from the string
and do not require it to be NUL terminated. They will still stop at a
NUL byte if one is found before len bytes have been read.

These functions will all immediately abort if any C0 or C1 control byte
other than NUL is encountered, returning the value -1. In this
circumstance, the pos structure will still be updated with the progress
so far.

USAGE
Typically, these functions are used in one of two ways.

When given a value in limit.bytes (or no limit, relying simply on
string termination), tickit_utf8_count() will yield the width of the
given string in terminal columns, in the pos.columns field.

When given a value in limit.columns, tickit_utf8_count() will yield the
number of bytes of that string that will fit in the given space on the
terminal.

RETURN VALUE
tickit_utf8_count() and tickit_utf8_countmore() return the number of
bytes they have skipped over in this call, or -1 (converted to size_t)
if they encounter a C0 or C1 control byte other than NUL.

SEE ALSO
tickit_stringpos_zero(3), tickit_stringpos_limit_bytes(3),
tickit_utf8_mbswidth(3), tickit(7)

TICKIT_UTF8_COUNT(3)