tickit_utf8_count(3)

TICKIT_UTF8_COUNT(3)       Library Functions Manual       TICKIT_UTF8_COUNT(3)

NAME

       tickit_utf8_count, tickit_utf8_countmore, tickit_utf8_ncount,
       tickit_utf8_ncountmore - count characters in Unicode strings

SYNOPSIS

       #include <tickit.h>

       typedef struct {
           size_t bytes;
           int    codepoints;
           int    graphemes;
           int    columns;
       } TickitStringPos;

       size_t tickit_utf8_count(const char *str, TickitStringPos *pos,
           const TickitStringPos *limit);
       size_t tickit_utf8_countmore(const char *str, TickitStringPos *pos,
           const TickitStringPos *limit);

       size_t tickit_utf8_ncount(const char *str, size_t len,
           TickitStringPos *pos, const TickitStringPos *limit);
       size_t tickit_utf8_ncountmore(const char *str, size_t len,
           TickitStringPos *pos, const TickitStringPos *limit);

       Link with -ltickit.

DESCRIPTION

       tickit_utf8_count() counts characters in the given Unicode string,
       which must be UTF-8 encoded. It first zeroes the counters in pos,
       then starts at the beginning of the string and counts forward over
       codepoints and graphemes, incrementing those counters until it
       reaches a limit. It will not go further than any of the limits given
       by the limit structure (a value of -1 in a field means no limit of
       that type). It never splits a codepoint in the middle of a UTF-8
       sequence, nor a grapheme between its codepoints; the function may
       therefore return before any of the limits has been reached, if the
       next whole grapheme would go past at least one of them. It also stops
       when it reaches the end of str. It returns the total number of bytes
       it has counted over.

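       The stopping rule above can be illustrated with a minimal sketch. It
       is not the libtickit implementation: SketchPos, sketch_seq_len() and
       sketch_count() are hypothetical names, only the bytes and codepoints
       counters are modelled, every codepoint is treated as its own
       grapheme, and the input is assumed to be well-formed UTF-8.

```c
#include <stddef.h>

/* Hypothetical, cut-down stand-in for TickitStringPos: only the bytes and
 * codepoints counters are modelled. A field set to -1 means "no limit". */
typedef struct {
    size_t bytes;
    int    codepoints;
} SketchPos;

/* Length of the UTF-8 sequence introduced by lead byte b (1..4).
 * Assumes well-formed input; an invalid lead byte is treated as length 1. */
static size_t sketch_seq_len(unsigned char b)
{
    if (b < 0x80) return 1;
    if ((b & 0xE0) == 0xC0) return 2;
    if ((b & 0xF0) == 0xE0) return 3;
    if ((b & 0xF8) == 0xF0) return 4;
    return 1;
}

/* Zero the counters, then advance one whole codepoint at a time, stopping
 * at NUL or just before any limit would be exceeded. The position never
 * ends up in the middle of a UTF-8 sequence. */
static size_t sketch_count(const char *str, SketchPos *pos,
                           const SketchPos *limit)
{
    pos->bytes = 0;
    pos->codepoints = 0;

    while (str[pos->bytes] != '\0') {
        size_t len = sketch_seq_len((unsigned char)str[pos->bytes]);

        /* bytes is a size_t, so its "-1" is (size_t)-1 */
        if (limit->bytes != (size_t)-1 && pos->bytes + len > limit->bytes)
            break;
        if (limit->codepoints != -1 && pos->codepoints + 1 > limit->codepoints)
            break;

        pos->bytes += len;
        pos->codepoints += 1;
    }
    return pos->bytes;
}
```

       With limit.bytes set to 2, counting "h\xc3\xa9llo" returns 1: the
       two bytes of U+00E9 would take the count past the limit, so the
       sketch stops after "h" rather than splitting the sequence.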
       The bytes member counts the UTF-8 bytes that encode the individual
       codepoints. For example, the Unicode character U+00E9 is encoded by
       the two bytes 0xc3 0xa9; it would increment the bytes counter by 2
       and the codepoints counter by 1.

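       The arithmetic behind the U+00E9 example can be shown with a small
       encoder written for illustration (utf8_encode() is a hypothetical
       helper, not a libtickit function). The number of bytes it writes for
       a codepoint is the amount by which the bytes counter would advance,
       while the codepoints counter advances by 1.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode codepoint as UTF-8 into buf (room for 4 bytes).
 * Returns the number of bytes written. This sketch does not reject
 * surrogates or values above U+10FFFF. */
static size_t utf8_encode(uint32_t cp, unsigned char buf[4])
{
    if (cp < 0x80) {                      /* 0xxxxxxx */
        buf[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* 110xxxxx 10xxxxxx */
        buf[0] = (unsigned char)(0xC0 | (cp >> 6));
        buf[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                   /* 1110xxxx 10xxxxxx 10xxxxxx */
        buf[0] = (unsigned char)(0xE0 | (cp >> 12));
        buf[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        buf[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    /* 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
    buf[0] = (unsigned char)(0xF0 | (cp >> 18));
    buf[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    buf[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    buf[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}
```

       For U+00E9, cp >> 6 is 3 and cp & 0x3f is 0x29, giving 0xc3 0xa9.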
       The codepoints member counts individual Unicode codepoints.

       The graphemes member counts whole composed graphical clusters of
       codepoints, where combining accents, which count as individual
       codepoints, do not count as separate graphemes. For example, the
       codepoint sequence U+0065 U+0301 would increment the codepoints
       counter by 2 and the graphemes counter by 1.

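       As a rough sketch of how the two counters diverge, the hypothetical
       count_graphemes() below treats only the Combining Diacritical Marks
       block (U+0300 to U+036F) as extending the previous grapheme. This is
       a simplifying assumption: full grapheme cluster segmentation has many
       more rules, and libtickit's own behavior is not reproduced here.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-8 sequence at s into *cp; return its length in bytes.
 * Assumes well-formed input. */
static size_t decode_utf8(const unsigned char *s, uint32_t *cp)
{
    if (s[0] < 0x80) { *cp = s[0]; return 1; }
    if ((s[0] & 0xE0) == 0xC0) {
        *cp = ((uint32_t)(s[0] & 0x1F) << 6) | (s[1] & 0x3F);
        return 2;
    }
    if ((s[0] & 0xF0) == 0xE0) {
        *cp = ((uint32_t)(s[0] & 0x0F) << 12) | ((uint32_t)(s[1] & 0x3F) << 6)
            | (s[2] & 0x3F);
        return 3;
    }
    *cp = ((uint32_t)(s[0] & 0x07) << 18) | ((uint32_t)(s[1] & 0x3F) << 12)
        | ((uint32_t)(s[2] & 0x3F) << 6) | (s[3] & 0x3F);
    return 4;
}

/* Simplification: only U+0300..U+036F attach to the previous grapheme. */
static int is_combining(uint32_t cp)
{
    return cp >= 0x0300 && cp <= 0x036F;
}

static void count_graphemes(const char *str, int *codepoints, int *graphemes)
{
    const unsigned char *s = (const unsigned char *)str;

    *codepoints = 0;
    *graphemes = 0;
    while (*s) {
        uint32_t cp;
        s += decode_utf8(s, &cp);
        (*codepoints)++;
        /* A combining mark does not start a new grapheme, unless there is
         * nothing before it to attach to. */
        if (!is_combining(cp) || *graphemes == 0)
            (*graphemes)++;
    }
}
```

       The decomposed "e\xcc\x81" (U+0065 U+0301) gives 2 codepoints and 1
       grapheme, while the precomposed "\xc3\xa9" (U+00E9) gives 1 of each.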
       The columns member counts the number of screen columns consumed by
       the graphemes. Most graphemes consume only 1 column, but some are
       defined in Unicode to consume 2.

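       A column counter needs a per-codepoint width. The hypothetical
       sketch_columns() below approximates one with a few hard-coded ranges
       (some wide CJK and Hangul blocks count 2, combining marks 0,
       everything else 1); these ranges are an illustrative assumption and
       not libtickit's width table.

```c
#include <stdint.h>

/* Rough stand-in for a Unicode East Asian Width lookup. Real tables are
 * far larger; this only covers a few common wide blocks. */
static int sketch_columns(uint32_t cp)
{
    if (cp >= 0x0300 && cp <= 0x036F)                   /* combining marks */
        return 0;
    if ((cp >= 0x1100 && cp <= 0x115F) ||              /* Hangul Jamo initials */
        (cp >= 0x2E80 && cp <= 0xA4CF && cp != 0x303F) || /* CJK .. Yi */
        (cp >= 0xAC00 && cp <= 0xD7A3) ||              /* Hangul syllables */
        (cp >= 0xF900 && cp <= 0xFAFF) ||              /* CJK compatibility */
        (cp >= 0xFF00 && cp <= 0xFF60) ||              /* fullwidth forms */
        (cp >= 0xFFE0 && cp <= 0xFFE6))                /* fullwidth signs */
        return 2;
    return 1;
}
```

       Summing these widths over each grapheme's codepoints gives 1 for
       "A", 2 for U+4E2D, and 1 for U+0065 U+0301.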
       tickit_utf8_countmore() is similar to tickit_utf8_count() except that
       it does not zero any of the counters before it starts, so it can
       continue counting where a previous call finished. It begins at the
       offset given by pos.bytes, and assumes that this offset is the start
       of a UTF-8 sequence that also begins a new grapheme. It does not
       check these assumptions, and the behavior is undefined if they do not
       hold.

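       The relationship between the two calls can be sketched with a
       hypothetical bytes-and-codepoints counter: mini_count() zeroes its
       state and then delegates to mini_countmore(), which resumes at
       pos->bytes and returns only the bytes covered by that call. MiniPos
       and both functions are illustrative names; only a byte limit is
       modelled, and the resume offset is assumed to sit at the start of a
       UTF-8 sequence.

```c
#include <stddef.h>

/* Hypothetical mini counting state (bytes and codepoints only). */
typedef struct {
    size_t bytes;
    int    codepoints;
} MiniPos;

/* "countmore"-style: does NOT reset *pos. Resumes at pos->bytes and
 * advances by whole codepoints until NUL, or until limit_bytes would be
 * exceeded. Returns the bytes covered by this call only. */
static size_t mini_countmore(const char *str, MiniPos *pos, size_t limit_bytes)
{
    size_t start = pos->bytes;

    for (;;) {
        unsigned char b = (unsigned char)str[pos->bytes];
        size_t len;

        if (b == 0)
            break;
        len = b < 0x80 ? 1 : (b & 0xE0) == 0xC0 ? 2 : (b & 0xF0) == 0xE0 ? 3 : 4;
        if (pos->bytes + len > limit_bytes)
            break;
        pos->bytes += len;
        pos->codepoints++;
    }
    return pos->bytes - start;
}

/* "count"-style: zero the counters, then continue from there. */
static size_t mini_count(const char *str, MiniPos *pos, size_t limit_bytes)
{
    pos->bytes = 0;
    pos->codepoints = 0;
    return mini_countmore(str, pos, limit_bytes);
}
```

       Counting "a\xc3\xa9z" with a 2-byte limit covers 1 byte; resuming
       with no limit covers 3 more, leaving the counters at 4 bytes and 3
       codepoints, the same totals as a single unlimited count.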
       The tickit_utf8_ncount() and tickit_utf8_ncountmore() variants are
       similar, except that they read no more than len bytes from the string
       and do not require it to be NUL-terminated. They still stop at a NUL
       byte if one is found before len bytes have been read.

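       A hypothetical bytes-only helper, bounded_bytes(), sketches how the
       len bound and the NUL check could combine with the rule against
       splitting a UTF-8 sequence. Excluding a sequence that would cross the
       len boundary is an assumption carried over from that rule, not
       something this page states for the n variants.

```c
#include <stddef.h>

/* Read at most len bytes, stop early at a NUL, and never include a UTF-8
 * sequence that would run past len. The buffer need not be NUL-terminated.
 * Assumes well-formed UTF-8 within the first len bytes. */
static size_t bounded_bytes(const char *str, size_t len)
{
    size_t i = 0;

    while (i < len && str[i] != '\0') {
        unsigned char b = (unsigned char)str[i];
        size_t seq = b < 0x80 ? 1 : (b & 0xE0) == 0xC0 ? 2
                   : (b & 0xF0) == 0xE0 ? 3 : 4;

        if (i + seq > len)          /* whole sequence would not fit */
            break;
        i += seq;
    }
    return i;
}
```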
       All of these functions abort immediately if they encounter any C0 or
       C1 control byte other than NUL, returning the value -1 (that is,
       (size_t)-1, since the return type is size_t). In this circumstance,
       the pos structure is still updated with the progress made so far.
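       The following sketch, scan_no_controls() (a hypothetical name, with
       a hypothetical ProgressPos type), shows the abort-and-report pattern:
       it returns (size_t)-1 and leaves pos->bytes at the offending
       character. It assumes that "C1" refers to the codepoints U+0080 to
       U+009F, which UTF-8 encodes as 0xc2 0x80 to 0xc2 0x9f; how libtickit
       itself classifies these is not verified here. A caller of the real
       functions would likewise compare the result against (size_t)-1
       before using it as a length.

```c
#include <stddef.h>

/* Hypothetical progress record (bytes only). */
typedef struct {
    size_t bytes;
} ProgressPos;

/* Stop at NUL; on a C0 control (0x01..0x1F) or an assumed C1 control
 * (UTF-8 0xC2 0x80..0xC2 0x9F) return (size_t)-1, leaving pos->bytes at
 * the offset of that character. Assumes well-formed UTF-8. */
static size_t scan_no_controls(const char *str, ProgressPos *pos)
{
    const unsigned char *s = (const unsigned char *)str;

    pos->bytes = 0;
    while (s[pos->bytes] != 0) {
        unsigned char b = s[pos->bytes];
        size_t len;

        if (b < 0x20)
            return (size_t)-1;                         /* C0 control */
        if (b == 0xC2 && s[pos->bytes + 1] >= 0x80 && s[pos->bytes + 1] <= 0x9F)
            return (size_t)-1;                         /* C1 control */
        len = b < 0x80 ? 1 : (b & 0xE0) == 0xC0 ? 2 : (b & 0xF0) == 0xE0 ? 3 : 4;
        pos->bytes += len;
    }
    return pos->bytes;
}
```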

USAGE

       Typically, these functions are used in one of two ways.

       When given a value in limit.bytes (or no limit, simply relying on
       string termination), tickit_utf8_count() yields the width of the
       given string in terminal columns, in the pos.columns field.

       When given a value in limit.columns, tickit_utf8_count() yields the
       number of bytes of that string that will fit within the given space
       on the terminal.
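       Both patterns can be sketched with a simplified width rule.
       string_columns() and bytes_for_columns() are hypothetical helpers,
       not libtickit functions, and the assumption that only CJK Unified
       Ideographs (U+4E00 to U+9FFF) are 2 columns wide is made purely for
       illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one well-formed UTF-8 sequence into *cp; return its length. */
static size_t decode_utf8(const unsigned char *s, uint32_t *cp)
{
    if (s[0] < 0x80) { *cp = s[0]; return 1; }
    if ((s[0] & 0xE0) == 0xC0) {
        *cp = ((uint32_t)(s[0] & 0x1F) << 6) | (s[1] & 0x3F);
        return 2;
    }
    if ((s[0] & 0xF0) == 0xE0) {
        *cp = ((uint32_t)(s[0] & 0x0F) << 12) | ((uint32_t)(s[1] & 0x3F) << 6)
            | (s[2] & 0x3F);
        return 3;
    }
    *cp = ((uint32_t)(s[0] & 0x07) << 18) | ((uint32_t)(s[1] & 0x3F) << 12)
        | ((uint32_t)(s[2] & 0x3F) << 6) | (s[3] & 0x3F);
    return 4;
}

/* Assumed width rule for this sketch only. */
static int cp_width(uint32_t cp)
{
    return (cp >= 0x4E00 && cp <= 0x9FFF) ? 2 : 1;
}

/* First way: no column limit; total width of the string. */
static int string_columns(const char *str)
{
    const unsigned char *s = (const unsigned char *)str;
    int cols = 0;

    while (*s) {
        uint32_t cp;
        s += decode_utf8(s, &cp);
        cols += cp_width(cp);
    }
    return cols;
}

/* Second way: column budget; bytes of the prefix that fits. A wide
 * character that would straddle the budget is left out entirely. */
static size_t bytes_for_columns(const char *str, int max_cols)
{
    const unsigned char *s = (const unsigned char *)str;
    size_t bytes = 0;
    int cols = 0;

    while (s[bytes]) {
        uint32_t cp;
        size_t len = decode_utf8(s + bytes, &cp);

        if (cols + cp_width(cp) > max_cols)
            break;
        cols += cp_width(cp);
        bytes += len;
    }
    return bytes;
}
```

       For "ab" followed by U+4E2D U+6587 (8 bytes, 6 columns), a 3-column
       budget fits only the 2 bytes of "ab", because the first wide
       character would need columns 3 and 4. With the real library, the
       first case would read pos.columns after the call and the second would
       use the return value.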

RETURN VALUE

       All four functions return the number of bytes they have counted over
       during this call, or -1 (that is, (size_t)-1) if they encounter a C0
       or C1 control byte other than NUL.
