UNICODE::WORDBREAK(3)       Courier Unicode Library      UNICODE::WORDBREAK(3)

NAME

       unicode::wordbreak_callback_base, unicode::wordbreak - unicode
       word-breaking rules

SYNOPSIS

       #include <courier-unicode.h>

       class wordbreak : public unicode::wordbreak_callback_base {

       public:

           using unicode::wordbreak_callback_base::operator<<;
           using unicode::wordbreak_callback_base::operator();
           int callback(bool flag)
           {
               // ...
               return 0;
           }
       };

       char32_t c;
       std::u32string buf;

       wordbreak compute_wordbreak;

       compute_wordbreak << c;

       compute_wordbreak(buf);
       compute_wordbreak(buf.begin(), buf.end());

       compute_wordbreak.finish();

       // ...

       unicode::wordbreakscan scan;

       scan << c;

       size_t nchars=scan.finish();

DESCRIPTION

       unicode::wordbreak_callback_base is a C++ binding for the unicode
       word-breaking rule implementation described in unicode_word_break(3).

       Subclass unicode::wordbreak_callback_base and implement its virtual
       callback() method. callback() receives the output of the
       word-breaking algorithm: for each unicode character in the input
       sequence, a bool indicating whether a word break exists before that
       character.

       callback() should return 0. A non-zero return value reports an error,
       which stops the word-breaking algorithm. See unicode_word_break(3) for
       more information.

       The input unicode characters for the word-breaking algorithm are
       provided either by the << operator, one unicode character at a time,
       or by the () operator, which takes either a container or a beginning
       and an ending iterator for a sequence of unicode characters. finish()
       indicates the end of the unicode character sequence. Because some
       word-breaking rules need to look ahead, callback() may be invoked for
       the last characters only when finish() is called.

       unicode::wordbreakscan is a C++ binding for the unicode_wbscan_init(),
       unicode_wbscan_next() and unicode_wbscan_end() functions described in
       unicode_word_break(3). Its << operator takes the unicode characters one
       at a time, and returns true once the first word break has been found,
       so that further characters need not be supplied. finish() returns the
       number of characters before the first word break.


SEE ALSO

       courier-unicode(7), unicode_word_break(3).

AUTHOR

       Sam Varshavchik
           Author

Courier Unicode Library           04/16/2022             UNICODE::WORDBREAK(3)