UNICODE_LINE_BREAK(3)       Courier Unicode Library      UNICODE_LINE_BREAK(3)

NAME

       unicode_line_break, unicode_lb_init, unicode_lb_set_opts,
       unicode_lb_next, unicode_lb_next_cnt, unicode_lb_end, unicode_lbc_init,
       unicode_lbc_set_opts, unicode_lbc_next, unicode_lbc_next_cnt,
       unicode_lbc_end - calculate mandatory or allowed line breaks

SYNOPSIS

       #include <courier-unicode.h>

       unicode_lb_info_t unicode_lb_init(int (*cb_func)(int, void *),
                                         void *cb_arg);

       void unicode_lb_set_opts(unicode_lb_info_t lb, int opts);

       int unicode_lb_next(unicode_lb_info_t lb, char32_t c);

       int unicode_lb_next_cnt(unicode_lb_info_t lb, const char32_t *cptr,
                               size_t cnt);

       int unicode_lb_end(unicode_lb_info_t lb);

       unicode_lbc_info_t unicode_lbc_init(int (*cb_func)(int, char32_t, void *),
                                           void *cb_arg);

       void unicode_lbc_set_opts(unicode_lbc_info_t lb, int opts);

       int unicode_lbc_next(unicode_lbc_info_t lb, char32_t c);

       int unicode_lbc_next_cnt(unicode_lbc_info_t lb, const char32_t *cptr,
                                size_t cnt);

       int unicode_lbc_end(unicode_lbc_info_t lb);

DESCRIPTION

       These functions implement the Unicode line breaking algorithm. Invoke
       unicode_lb_init() to initialize the line breaking algorithm. The first
       parameter is a callback function. The second parameter is an opaque
       pointer. The callback function gets invoked with two parameters. The
       first parameter is one of three values: UNICODE_LB_MANDATORY,
       UNICODE_LB_NONE, or UNICODE_LB_ALLOWED, as described below. The second
       parameter is the opaque pointer that was passed to unicode_lb_init();
       these functions do not interpret the opaque pointer in any way.

       unicode_lb_init() returns an opaque handle. Repeated invocations of
       unicode_lb_next(), each passing the handle and one Unicode character,
       define the sequence of Unicode characters over which the line breaking
       calculation takes place. unicode_lb_next_cnt() is a shortcut for
       invoking unicode_lb_next() repeatedly over an array cptr containing
       cnt Unicode characters.

       unicode_lb_end() denotes the end of the Unicode character sequence.
       After the call to unicode_lb_end() the unicode_lb_info_t handle is no
       longer valid.

       Between the call to unicode_lb_init() and the return from
       unicode_lb_end(), the callback function gets invoked exactly once for
       each Unicode character given to unicode_lb_next() or
       unicode_lb_next_cnt(). Usually each call to unicode_lb_next() invokes
       the callback function immediately, but this is not guaranteed. A call
       to unicode_lb_next() may return without invoking the callback
       function, and a subsequent call to unicode_lb_next() (or
       unicode_lb_end()) may then invoke the callback function more than
       once, to catch up. The contract is that before unicode_lb_end()
       returns, the callback function gets invoked exactly as many times as
       there are characters in the Unicode sequence defined by the
       intervening calls to unicode_lb_next() and unicode_lb_next_cnt(),
       unless an error occurs.

       Each call to the callback function reports the calculated line breaking
       status of the corresponding character in the Unicode character
       sequence:

       UNICODE_LB_MANDATORY
           A line break is MANDATORY before the corresponding character.

       UNICODE_LB_NONE
           A line break is PROHIBITED before the corresponding character.

       UNICODE_LB_ALLOWED
           A line break is OPTIONAL before the corresponding character.

       The callback function should return 0. A non-zero value indicates to
       the line breaking algorithm that an error has occurred.
       unicode_lb_next() and unicode_lb_next_cnt() return zero either if they
       never invoked the callback function, or if each call to the callback
       function returned zero. A non-zero return from the callback function
       results in unicode_lb_next() and unicode_lb_next_cnt() immediately
       returning the same value.

       unicode_lb_end() must be invoked to destroy the line breaking handle
       even if unicode_lb_next() or unicode_lb_next_cnt() returned an error
       indication. Even under normal circumstances, unicode_lb_end() may
       invoke the callback function one or more times. The return value from
       unicode_lb_end() has the same meaning as from unicode_lb_next() and
       unicode_lb_next_cnt(); however, in all cases the line breaking handle
       is no longer valid after unicode_lb_end() returns.

   Alternative callback function
       unicode_lbc_init(), unicode_lbc_next(), unicode_lbc_next_cnt(), and
       unicode_lbc_end() are alternative functions that implement the same
       algorithm. The only difference is that the callback function receives
       an extra parameter: the Unicode character to which the line breaking
       status applies, passed through from the input Unicode character
       sequence.

   Options
       unicode_lb_set_opts() and unicode_lbc_set_opts() enable non-default
       options for the line breaking algorithm. These functions must be called
       immediately after unicode_lb_init() or unicode_lbc_init(), and before
       any other function. opts is a bitmask that can contain the following
       values:

       UNICODE_LB_OPT_PRBREAK
           Enables a modified LB24 rule. This prevents a term such as “C++”
           from being broken at its plus signs. This flag adds the following
           rules to the LB24 rule:

                              PR x PR

                              AL x PR

                              ID x PR

       UNICODE_LB_OPT_SYBREAK
           Tailored breaking rules for the “/” character. This prevents
           breaking after the “/” character (think URLs), and includes an
           exception to the “x SY” rule in LB13. This flag adds the following
           rules to the LB24 rule:

                              SY x EX

                              SY x AL

                              SY x ID

                              SP ÷ SY, which takes precedence over "x SY".

       UNICODE_LB_OPT_DASHWJ
           This flag reclassifies U+2013 (en dash) and U+2014 (em dash) as
           class WJ, prohibiting breaks before and after these characters.

SEE ALSO

       courier-unicode(7), unicode::linebreak(3), TR-14[1]

AUTHOR

       Sam Varshavchik
           Author

NOTES

        1. TR-14
           https://www.unicode.org/reports/tr14/tr14-45.html

Courier Unicode Library           04/16/2022             UNICODE_LINE_BREAK(3)