UNICODE::WORDBREAK(3)      Courier Unicode Library      UNICODE::WORDBREAK(3)

NAME
unicode::wordbreak_callback_base, unicode::wordbreakscan - Unicode
word-breaking rules

SYNOPSIS
#include <courier-unicode.h>

class wordbreak : public unicode::wordbreak_callback_base {

public:

    using unicode::wordbreak_callback_base::operator<<;
    using unicode::wordbreak_callback_base::operator();

    // Receives a flag for an input character: true if a word break
    // exists before that character.
    int callback(bool flag)
    {
        // ...
        return 0;    // 0 continues; non-zero reports an error
    }
};

char32_t c;
std::u32string buf;

wordbreak compute_wordbreak;

compute_wordbreak << c;

compute_wordbreak(buf);
compute_wordbreak(buf.begin(), buf.end());

compute_wordbreak.finish();

// ...

unicode::wordbreakscan scan;

scan << c;

size_t nchars=scan.finish();

DESCRIPTION
unicode::wordbreak_callback_base is a C++ binding for the Unicode
word-breaking rule implementation described in unicode_word_break(3).

Subclass unicode::wordbreak_callback_base and implement callback(), a
virtual function declared by unicode::wordbreak_callback_base.
callback() receives the output of the word-breaking algorithm: a bool
indicating whether a word break exists before the corresponding Unicode
character in the input sequence.

callback() should return 0. A non-zero return value reports an error
and stops the word-breaking algorithm. See unicode_word_break(3) for
more information.

The input Unicode characters for the word-breaking algorithm are
provided by the << operator, one character at a time; or by the ()
operator, which takes either a container or a pair of beginning and
ending iterators for an input sequence of Unicode characters. finish()
indicates the end of the character sequence.
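
The subclass-and-feed pattern described above can be sketched as
follows. So that it compiles without the library, the sketch uses a
hypothetical stand-in base class, mock_wordbreak_base, in place of
unicode::wordbreak_callback_base. The stand-in treats every transition
between a space and a non-space as a break, which is NOT the Unicode
word-breaking algorithm, and it reports each flag immediately. The
names collect_breaks and break_offsets are likewise made up for
illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for unicode::wordbreak_callback_base.
// Break rule: a space/non-space transition (an assumption made for
// this sketch only, not the Unicode word-breaking rules).
class mock_wordbreak_base {
    bool have_prev = false;
    bool prev_space = false;
    bool failed = false;

public:
    virtual ~mock_wordbreak_base() = default;

    // Feed one character and pass the resulting flag to callback().
    mock_wordbreak_base &operator<<(char32_t c)
    {
        if (failed)
            return *this;          // a non-zero callback() stopped us

        bool is_space = (c == U' ');
        bool brk = have_prev && is_space != prev_space;

        have_prev = true;
        prev_space = is_space;

        if (callback(brk) != 0)
            failed = true;
        return *this;
    }

    // Feed an iterator range.
    template<typename iter_type>
    mock_wordbreak_base &operator()(iter_type b, iter_type e)
    {
        while (b != e)
            *this << *b++;
        return *this;
    }

    // Feed a whole container.
    template<typename container_type>
    mock_wordbreak_base &operator()(const container_type &c)
    {
        return (*this)(c.begin(), c.end());
    }

    void finish() {}               // nothing is buffered in the stand-in

private:
    virtual int callback(bool flag) = 0;
};

// The subclass only records each flag it receives.
class collect_breaks : public mock_wordbreak_base {
public:
    using mock_wordbreak_base::operator<<;
    using mock_wordbreak_base::operator();

    std::vector<bool> flags;

private:
    int callback(bool flag) override
    {
        flags.push_back(flag);
        return 0;                  // 0: keep going
    }
};

// Offsets of the characters that have a break before them.
std::vector<std::size_t> break_offsets(const std::u32string &text)
{
    collect_breaks cb;

    cb(text);                      // or: cb(text.begin(), text.end());
    cb.finish();

    assert(cb.flags.size() == text.size());

    std::vector<std::size_t> offsets;
    for (std::size_t i = 0; i < cb.flags.size(); ++i)
        if (cb.flags[i])
            offsets.push_back(i);
    return offsets;
}
```

With this stand-in rule, break_offsets(U"hi there") yields offsets 2
and 3, the positions on either side of the space. The real class
applies the full Unicode rules, so its results will generally differ.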

unicode::wordbreakscan is a C++ binding for the unicode_wbscan_init(),
unicode_wbscan_next() and unicode_wbscan_end() functions described in
unicode_word_break(3). Its << operator feeds in Unicode characters one
at a time, and finish() returns the number of characters before the
first word break. The << operator returns a bool that is true once the
first word break has been found, so further calls are not necessary.
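
The scanning loop described above might look like the following
sketch. It again avoids the library: mock_wordbreakscan is a
hypothetical stand-in for unicode::wordbreakscan that places the first
word break at the first space, an assumption made for illustration
rather than the Unicode rules. The helper first_word_length is also
made up.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for unicode::wordbreakscan.
// Assumption for this sketch: the first word break is the first space.
class mock_wordbreakscan {
    std::size_t count = 0;
    bool found = false;

public:
    // Returns true once the first word break has been found.
    bool operator<<(char32_t c)
    {
        if (!found) {
            if (c == U' ')
                found = true;
            else
                ++count;
        }
        return found;
    }

    // Number of characters before the first word break.
    std::size_t finish() { return count; }
};

std::size_t first_word_length(const std::u32string &text)
{
    mock_wordbreakscan scan;

    for (char32_t c : text)
        if (scan << c)
            break;                 // break found: stop feeding input

    std::size_t n = scan.finish();
    assert(n <= text.size());
    return n;
}
```

The early exit from the loop is the point of the bool returned by <<:
once it is true, the remaining input does not need to be scanned.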

SEE ALSO
courier-unicode(7), unicode_word_break(3).

AUTHOR
Sam Varshavchik
    Author

Courier Unicode Library           03/11/2017           UNICODE::WORDBREAK(3)