unicode::wordbreak(3)

UNICODE::WORDBREAK(3)       Courier Unicode Library      UNICODE::WORDBREAK(3)

NAME

       unicode::wordbreak_callback_base, unicode::wordbreak - Unicode
       word-breaking rules

SYNOPSIS

       #include <courier-unicode.h>

       class wordbreak : public unicode::wordbreak_callback_base {

       public:

           using unicode::wordbreak_callback_base::operator<<;
           using unicode::wordbreak_callback_base::operator();
           int callback(bool flag)
           {
               // ...
           }
       };

       char32_t c;
       std::u32string buf;

       wordbreak compute_wordbreak;

       compute_wordbreak << c;

       compute_wordbreak(buf);
       compute_wordbreak(buf.begin(), buf.end());

       compute_wordbreak.finish();

       // ...

       unicode::wordbreakscan scan;

       scan << c;

       size_t nchars=scan.finish();

DESCRIPTION

       unicode::wordbreak_callback_base is a C++ binding for the Unicode
       word-breaking rule implementation described in unicode_word_break(3).

       Subclass unicode::wordbreak_callback_base and implement the virtual
       callback() method that it declares. callback() receives the output
       values from the word-breaking algorithm, namely a bool indicating
       whether a word break exists before the corresponding Unicode character
       in the input sequence.

       callback() should return 0. A non-zero return value reports an error
       and stops the word-breaking algorithm. See unicode_word_break(3) for
       more information.

       The input Unicode characters for the word-breaking algorithm are
       provided by the << operator, one character at a time, or by the ()
       operator, which takes either a container or a beginning and an ending
       iterator for an input sequence of Unicode characters. finish()
       indicates the end of the Unicode character sequence.

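       The subclass-and-callback pattern described above can be tried out
       without the library installed. The following is a sketch only:
       sketch::wordbreak_callback_base is a hypothetical stand-in that copies
       the interface shape shown in the SYNOPSIS, and its break rule is a toy
       (a break before the first character and wherever a space meets a
       non-space), not the Unicode rules. collect_breaks, its max_chars cap
       and break_offsets() are likewise names invented for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

namespace sketch {

// Hypothetical stand-in for unicode::wordbreak_callback_base so that this
// example compiles on its own. NOT the library's code: the break rule is a
// toy (first character, or a space next to a non-space), not the Unicode one.
class wordbreak_callback_base {
    bool have_prev = false;   // false until the first character arrives
    bool prev_space = false;  // was the previous character a space?
    bool failed = false;      // set once callback() returns non-zero

public:
    virtual ~wordbreak_callback_base() = default;

    // Receives true when a word break precedes the current character.
    virtual int callback(bool flag) = 0;

    // Feed one character.
    wordbreak_callback_base &operator<<(char32_t c)
    {
        bool is_space = (c == U' ');
        bool flag = !have_prev || is_space != prev_space;

        have_prev = true;
        prev_space = is_space;
        if (!failed && callback(flag) != 0)
            failed = true;    // non-zero return: stop calling back
        return *this;
    }

    // Feed an iterator range.
    template<typename iter_type>
    wordbreak_callback_base &operator()(iter_type b, iter_type e)
    {
        for (; b != e; ++b)
            *this << *b;
        return *this;
    }

    // Feed a whole container.
    template<typename container_type>
    wordbreak_callback_base &operator()(const container_type &c)
    {
        return (*this)(c.begin(), c.end());
    }

    // End of input; the stand-in only resets its state.
    void finish() { have_prev = prev_space = failed = false; }
};

} // namespace sketch

// The subclass mirrors the shape shown in the SYNOPSIS.
class collect_breaks : public sketch::wordbreak_callback_base {
    std::size_t max_chars;    // hypothetical cap, to show the error return

public:
    std::vector<bool> flags;  // one flag per character seen

    explicit collect_breaks(std::size_t max_chars) : max_chars(max_chars) {}

    using sketch::wordbreak_callback_base::operator<<;
    using sketch::wordbreak_callback_base::operator();

    int callback(bool flag) override
    {
        if (flags.size() >= max_chars)
            return -1;        // non-zero reports an error and stops
        flags.push_back(flag);
        return 0;             // 0: no error, keep going
    }
};

// Offsets of the characters that a word break precedes.
std::vector<std::size_t> break_offsets(const std::u32string &text,
                                       std::size_t max_chars = SIZE_MAX)
{
    collect_breaks wb(max_chars);

    wb(text);                 // same as wb(text.begin(), text.end())
    wb.finish();

    std::vector<std::size_t> offsets;
    for (std::size_t i = 0; i < wb.flags.size(); ++i)
        if (wb.flags[i])
            offsets.push_back(i);
    return offsets;
}
```

       With the real library, the subclass would presumably derive from
       unicode::wordbreak_callback_base after including <courier-unicode.h>,
       and the reported breaks would follow the Unicode rules rather than
       the toy rule used here.
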
       unicode::wordbreakscan is a C++ binding for the unicode_wbscan_init(),
       unicode_wbscan_next() and unicode_wbscan_end() functions described in
       unicode_word_break(3). Its << operator takes one Unicode character at a
       time and returns a bool indicating whether the first word break has
       already been found, in which case further calls are not necessary.
       finish() returns the number of characters before the first Unicode
       word break.

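       The scanning loop described above, which stops feeding characters as
       soon as << returns true, can be sketched as follows. This is an
       illustration under stated assumptions: sketch::wordbreakscan is a
       hypothetical stand-in with the same two operations, its toy rule
       places the first word break in front of the first space character,
       and first_word_length() is a helper invented for this example.

```cpp
#include <cstddef>
#include <string>

namespace sketch {

// Hypothetical stand-in for unicode::wordbreakscan so that this example
// compiles on its own. NOT the library's code: the toy rule places the
// first word break in front of the first space character.
class wordbreakscan {
    std::size_t count = 0;    // characters seen before the first break
    bool found = false;       // has the first break been found?

public:
    // Feed one character; returns true once the first break is known.
    bool operator<<(char32_t c)
    {
        if (!found) {
            if (c == U' ')
                found = true;
            else
                ++count;
        }
        return found;
    }

    // Number of characters before the first break; resets the scanner.
    std::size_t finish()
    {
        std::size_t n = count;

        count = 0;
        found = false;
        return n;
    }
};

} // namespace sketch

// Length, in characters, of the first word of text under the toy rule.
std::size_t first_word_length(const std::u32string &text)
{
    sketch::wordbreakscan scan;

    for (char32_t c : text)
        if (scan << c)
            break;            // first break found, no need to feed more

    return scan.finish();
}
```

       The early exit from the loop is the point of the bool returned by <<:
       once it is true, feeding further characters cannot change the value
       that finish() returns.
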

AUTHOR

       Sam Varshavchik
           Author

Courier Unicode Library           04/16/2022             UNICODE::WORDBREAK(3)
