1Prima::Drawable::Glyphs(3) User Contributed Perl Documentation Prima::Drawable::Glyphs(3)
2
3
4
5NAME
6 Prima::Drawable::Glyphs - helper routines for bi-directional text input
7 and complex scripts output
8
9SYNOPSIS
10 use Prima qw(Application);
11 $::application-> begin_paint;
12 $::application-> text_shape_out('אפס123', 0,0);
13
14 123ספא
15
16DESCRIPTION
17 The class implements an abstraction over a set of glyphs that can be
18 rendered to represent text strings. Objects of the class are created
19 and returned from "Prima::Drawable::text_shape" calls, see more in
20 "text_shape" in Prima::Drawable. An object is a blessed array reference
21 that can contain either two or four packed arrays with 16-bit integers,
22 representing, correspondingly, a set of glyph indexes, a set of
23 character indexes, a set of glyph advances, and a set of glyph position
24 offsets per glyph. Additionally, the class implements several sets of
25 helper routines that aim to address common tasks when displaying glyph-
26 based strings.
27
28 Structure
29 Each array is an instance of "Prima::array", an efficient plain memory
30 structure that provides a standard Perl interface over a string scalar
31 filled with fixed-width integers.
32
33 The following methods provide read-only access to these arrays:
34
35 glyphs
36 Contains a set of unsigned 16-bit integers where each is a glyph
37 number corresponding to the font that was used when shaping the
38 text. These glyph numbers are only applicable to that font. Zero is
39 usually treated as a default glyph in vector fonts, when shaping
40 cannot map a character; in bitmap fonts this number is usually a
41 "defaultChar".
42
43 This array is recognized as a special case when it is passed to
44 "text_out" or "get_text_width", which can process it without the other
45 arrays. In this case, however, no special advances or glyph positions
46 are taken into account.
47
48 Each glyph is not necessarily mapped to a character, and quite
49 often it is not, even in English left-to-right texts. For example,
50 character combinations like "ff", "fi", "fl" can be mapped as
51 single ligature glyphs. When right-to-left (RTL) text direction is
52 taken into account, the glyph positions may change, too. See
53 "indexes" below, which addresses the mapping of glyphs to characters.
54
55 indexes
56 Contains a set of unsigned 16-bit integers where each is an offset
57 into the text that was used in shaping. Each glyph position
58 thus points to the first character in the text that maps to the
59 glyph.
60
61 There can be more than one character per glyph, such as the above
62 example with the "ff" ligature. There can also be cases with more
63 than one character per more than one glyph, as is the case in
64 Indic scripts. In these cases it is easier to operate neither by
65 character offsets nor glyph offsets, but rather by clusters, where
66 each is an individual syntax unit that contains one or more
67 characters per one or more glyphs.
68
69 In addition to the text offset, each index value can be flagged
70 with a "to::RTL" bit, signifying that the character in question has
71 RTL direction. It is not only Semitic characters from RTL
72 languages that have this attribute set; spaces in these
73 languages are normally assigned the RTL bit too, and sometimes also
74 numbers. Use of explicit direction control characters from the U+20XX
75 block can result in any character being assigned or not assigned
76 the RTL bit.
77
78 The array has an extra item added to its end, the length of the
79 text that was used in the shaping. This makes it easy to calculate
80 the cluster length in characters, especially of the last one, since
81 the difference between indexes is, basically, the cluster length.
82
83 The array is not used for text drawing or calculation, but only for
84 conversion between character, glyph, and cluster coordinates (see
85 "Coordinates" below).
86
87 advances
88 Contains a set of unsigned 16-bit integers where each is the pixel
89 distance that the glyph occupies. When the advances array is not
90 present, or is filled by the "advances" option of "text_shape", the
91 advance is basically the sum of the a, b, and c widths of a glyph.
92 However, there are cases when, depending on the shaping input,
93 these values can differ.
94
95 One of those cases is combining graphemes, where text consisting of
96 two characters, "A" and the combining grave accent U+0300, should be
97 drawn as a single "À" symbol, but the font doesn't have that single
98 glyph but rather two individual glyphs "A" and "`". The
99 grave glyph has its own advance for standalone usage, but in this case
100 that advance should be ignored, which is achieved by setting the
101 advance of the "`" to zero.
102
103 The array content is respected by "text_out" and "get_text_width",
104 and its content can be changed at will to produce gaps in the text
105 quite easily. For example, "Prima::Edit" uses that to display tab
106 characters as spaces with 8x advance.
107
108 positions
109 Contains a set of pairs of signed 16-bit integers where each is an X
110 and Y pixel offset for each glyph. As in the previous example
111 with the "À" symbol, the grave glyph "`" may be positioned
112 differently along the vertical, for example on the "À" and "à" graphemes.
113
114 The array is respected by "text_out" (but not by "get_text_width").
115
116 fonts
117 Contains a set of unsigned 16-bit integers where each is an index in
118 the font substitution list (see "fontMapperPalette" in
119 Prima::Drawable). Zero means the current font.
120
121 The font substitution is applied by "text_shape" when the "polyfont"
122 option is set (it is by default), and when the shaper cannot match
123 all fonts. If the current font contains all needed glyphs, this
124 entry is not present at all.
125
126 The array is respected by "text_out" and "get_text_width".
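
The layout of the "indexes" array described above can be illustrated with a small pure-Perl sketch that does not load Prima at all. It assumes the RTL flag is the high bit of each 16-bit value (the local constant "RTL_BIT" is a stand-in, not imported from Prima), and "describe_indexes" is a hypothetical helper, not a method of this class. It strips the flag and uses the trailing text length to derive how many characters each glyph's cluster covers.

```perl
use strict;
use warnings;

# Assumption: the direction flag is the high bit of a 16-bit index value.
use constant RTL_BIT     => 0x8000;
use constant OFFSET_MASK => 0x7FFF;

# Hypothetical helper: takes an indexes-like list whose last item is the
# length of the shaped text, and returns one [offset, length, is_rtl]
# triple per glyph.
sub describe_indexes {
    my @raw      = @_;
    my $text_len = pop @raw;                        # trailing item: text length
    my @offsets  = map { $_ & OFFSET_MASK } @raw;   # strip the direction flag

    # Sorted unique offsets plus the text length are the cluster boundaries.
    my %seen;
    my @bounds = sort { $a <=> $b } grep { !$seen{$_}++ } @offsets, $text_len;

    # Map every boundary to the one that follows it.
    my %next;
    $next{ $bounds[$_] } = $bounds[ $_ + 1 ] for 0 .. $#bounds - 1;

    my @result;
    for my $i ( 0 .. $#raw ) {
        my $off = $offsets[$i];
        my $rtl = ( $raw[$i] & RTL_BIT ) ? 1 : 0;
        push @result, [ $off, $next{$off} - $off, $rtl ];
    }
    return @result;
}

# Two glyphs for a three-character text: an "ff" ligature and one more glyph.
for my $g ( describe_indexes( 0, 2, 3 ) ) {
    printf "offset=%d length=%d rtl=%d\n", @$g;
}
```

Here the first glyph covers two characters and the second covers one, which is the difference between neighboring offsets that the trailing text length makes computable for the last glyph as well.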
127
128 Coordinates
129 In addition to natural character coordinates, where each index is an
130 offset that can be directly used in the "substr" Perl function, this class
131 offers two additional coordinate systems that help abstract the object
132 data for display and navigation.
133
134 The glyph coordinate is a rather straightforward copy of the character
135 coordinates, where each number is an offset in the "glyphs" array.
136 Similarly, these offsets can be used to address individual glyphs,
137 indexes, advances, and positions. However, these are not easy to use
138 when one needs, for example, to select a grapheme with a mouse, or
139 break a set of glyphs in such a way that a grapheme is not broken.
140 These can be managed more easily in the cluster coordinate system.
141
142 The cluster coordinates are a virtually superimposed set of offsets where
143 each corresponds to a set of one or more characters displayed by one
144 or more glyphs. Most useful functions below operate in this system.
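
The relation between character and glyph coordinates can be sketched in plain Perl. The two helpers below are hypothetical and do not call Prima. They assume one character per glyph, and an index list with the RTL flags and the trailing text length already removed. The first inverts the glyph-to-character list into a character-to-glyph map, the second reads the text in glyph order.

```perl
use strict;
use warnings;

# Hypothetical helper: invert a glyph-to-character list into a
# character-to-glyph list (one character per glyph is assumed).
sub char_to_glyph_map {
    my @glyph_to_char = @_;
    my @char_to_glyph;
    $char_to_glyph[ $glyph_to_char[$_] ] = $_ for 0 .. $#glyph_to_char;
    return @char_to_glyph;
}

# Hypothetical helper: pick characters of $text in glyph (visual) order.
sub visual_order {
    my ( $text, @glyph_to_char ) = @_;
    return join '', map { substr( $text, $_, 1 ) } @glyph_to_char;
}

my @indexes = ( 3, 4, 5, 2, 1, 0 );
my @map     = char_to_glyph_map(@indexes);
print "char to glyph: @map\n";                            # 5 4 3 0 1 2
print "visual: ", visual_order( 'ABC123', @indexes ), "\n";    # 123CBA
```

With real shaping results a cluster may span several characters and glyphs, so code that needs to be robust would work in cluster coordinates instead.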
145
146 Selection
147 Practically, the most useful coordinates for implementing selection
148 are either character or cluster, but not glyph. Character-based
149 selection makes extraction or replacement of the selected text
150 trivial, while cluster-based selection makes it easier to manipulate
151 the selection itself (for example, with Shift-arrow keys).
152
153 The class supports both, by operating on selection maps or selection
154 chunks, where each represents the same information but in different ways.
155 For example, consider an embedded number in a bidi text. For the sake of
156 clarity I'll use Latin characters here. Let's have a text scalar
157 containing these characters:
158
159 ABC123
160
161 where ABC is right-to-left text, and which, when rendered on screen,
162 should be displayed as
163
164 123CBA
165
166 (and the index array is (3,4,5,2,1,0)).
167
168 Next, the user clicks the mouse between A and B (in text offset 1),
169 drags the mouse then to the left, and finally stops between characters
170 2 and 3 (text offset 4). The resulting selection then should not be, as
171 one might naively expect, this:
172
173 123CBA
174 __^^^_
175
176 but this instead:
177
178 123CBA
179 ^^_^^_
180
181 because the next character after C is 1, and the range of the selected
182 sub-text is from characters 1 to 4.
183
184 The class offers to encode such information in a map, i.e. array of
185 integers "1,1,0,1,1,0", where each entry is either 0 or 1 depending on
186 whether the cluster is or is not selected. Alternatively, the same
187 information can be encoded in chunks, or RLE sets, as array
188 "0,2,1,2,1", where the first integer signifies number of non-selected
189 clusters to display, the second - number of selected clusters, the
190 third the non-selected again, etc. If the first character belongs to
191 the selected chunk, the first integer in the result is set to 0.
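
The two encodings above can be reproduced with a short pure-Perl sketch. Both helpers are hypothetical and independent of Prima. The inclusive character range 1..4 is an assumption chosen so that the result matches the map shown in the example, and one character per cluster is assumed.

```perl
use strict;
use warnings;

# Hypothetical helper: 1 for every cluster whose character offset lies
# within the inclusive range $from .. $to, 0 otherwise.
sub selection_map {
    my ( $from, $to, @indexes ) = @_;
    return map { ( $_ >= $from && $_ <= $to ) ? 1 : 0 } @indexes;
}

# Hypothetical helper: run-length encode a 0/1 map. The first chunk always
# counts non-selected items, so it stays 0 when the map starts with 1.
sub map_to_chunks {
    my @map    = @_;
    my @chunks = (0);
    my $state  = 0;    # kind of chunk being counted: 0 = non-selected
    for my $bit (@map) {
        if ( $bit != $state ) {
            push @chunks, 0;
            $state = $bit;
        }
        $chunks[-1]++;
    }
    return @chunks;
}

my @map    = selection_map( 1, 4, 3, 4, 5, 2, 1, 0 );
my @chunks = map_to_chunks(@map);
print "map:    @map\n";       # 1 1 0 1 1 0
print "chunks: @chunks\n";    # 0 2 1 2 1
```

The map is convenient for per-cluster lookups, while the chunks are convenient for drawing runs of clusters with the same colors.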
192
193 Bidi input
194 When sending input to a widget in order to type in text, the otherwise
195 trivial case of figuring out at which position the text should be
196 inserted (or removed, for that matter), becomes interesting when there
197 are characters with mixed direction.
198
199 For example, it is indeed trivial, when the Latin text is "AB" and the cursor
200 is positioned between "A" and "B", to figure out that whenever the user
201 types "C", the result should become "ACB". Likewise, when the text is
202 RTL and both text and input are Arabic, the result is the same. However,
203 when, for example, the text is "A1", which is displayed as "1A" because of RTL
204 shaping, and the cursor is positioned between "1" (LTR) and "A" (RTL), it
205 is not clear whether that means the new input should be appended after
206 "1" and become "A1C", or after "A", and become, correspondingly, "AC1".
207
208 There is no easy solution for this problem, and different programs
209 approach it differently; some go as far as to provide two cursors
210 for both directions. The class offers its own solution that uses some
211 primitive heuristics to detect whether the cursor belongs to the left or to
212 the right glyph. This is an area that can be enhanced, and any help
213 from native users of RTL languages would be greatly appreciated.
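
The ambiguity described above can be made concrete with a small pure-Perl sketch. It does not reproduce the heuristics of this class; it only computes the two candidate insertion offsets implied by the glyph on each side of the cursor. "candidate_offsets" is a hypothetical helper, and each visual cluster is modeled as a [character offset, RTL flag] pair with one character per cluster.

```perl
use strict;
use warnings;

# Hypothetical helper: for a cursor placed before visual position $pos,
# return the insertion offset suggested by the left neighbor and the one
# suggested by the right neighbor (undef when there is no such neighbor).
sub candidate_offsets {
    my ( $pos, @visual ) = @_;
    my ( $left, $right );
    if ( $pos > 0 ) {
        my ( $off, $rtl ) = @{ $visual[ $pos - 1 ] };
        # The right edge of an LTR glyph is its logical end,
        # the right edge of an RTL glyph is its logical start.
        $left = $rtl ? $off : $off + 1;
    }
    if ( $pos < @visual ) {
        my ( $off, $rtl ) = @{ $visual[$pos] };
        # The left edge of an RTL glyph is its logical end.
        $right = $rtl ? $off + 1 : $off;
    }
    return ( $left, $right );
}

# Text "A1" with an RTL "A" is displayed as "1A":
# visual clusters are "1" (offset 1, LTR) and "A" (offset 0, RTL).
my ( $after_one, $after_a ) = candidate_offsets( 1, [ 1, 0 ], [ 0, 1 ] );
for my $at ( $after_one, $after_a ) {
    my $text = 'A1';
    substr( $text, $at, 0, 'C' );
    print "insert at $at: $text\n";
}
```

The two neighbors disagree (offsets 2 and 1, giving "A1C" and "AC1"), which is exactly the case where some tie-breaking rule or a preferred direction is needed.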
214
216 abc $CANVAS, $INDEX
217 Returns a, b, c metrics from the glyph $INDEX
218
219 advances
220 Read-only accessor to the advances array, see Structure above.
221
222 clone
223 Clones the object
224
225 cluster2glyph $FROM, $LENGTH
226 Maps a range of clusters starting with $FROM with size $LENGTH into
227 the corresponding range of glyphs. Undefined $LENGTH calculates the
228 range from $FROM till the object end.
229
230 cluster2index $CLUSTER
231 Returns character offset of the first character in cluster
232 $CLUSTER.
233
234 Note: result may contain "to::RTL" flag.
235
236 cluster2range $CLUSTER
237 Returns character offset of the first character in cluster $CLUSTER
238 and how many characters are there in the cluster.
239
240 clusters
241 Returns an array of integers where each is the offset of the first
242 character of a cluster.
243
244 cursor2offset $AT_CLUSTER, $PREFERRED_RTL
245 Given a cursor positioned next to the cluster $AT_CLUSTER, runs
246 simple heuristics to see what character offset it corresponds to.
247 $PREFERRED_RTL is used when object data are not enough.
248
249 See "Bidi input" above.
250
251 def $CANVAS, $INDEX
252 Returns d, e, f metrics from the glyph $INDEX
253
254 fonts
255 Read-only accessor to the font indexes, see Structure above.
256
257 get_box $CANVAS
258 Return box metrics of the glyph object.
259
260 See "get_text_box" in Prima::Drawable.
261
262 get_sub $FROM, $LENGTH
263 Extracts and clones a new object that contains data from cluster
264 offset $FROM, with cluster length $LENGTH.
265
266 get_sub_box $CANVAS, $FROM, $LENGTH
267 Calculate box metrics of a glyph string from the cluster $FROM with
268 size $LENGTH.
269
270 get_sub_width $CANVAS, $FROM, $LENGTH
271 Calculate pixel width of a glyph string from the cluster $FROM with
272 size $LENGTH.
273
274 get_width $CANVAS, $WITH_OVERHANGS
275 Returns the width of the glyph object, with overhangs if requested.
276
277 glyph2cluster $GLYPH
278 Return the cluster that contains $GLYPH.
279
280 glyphs
281 Read-only accessor to the glyph indexes, see Structure above.
282
283 glyph_lengths
284 Returns array where each glyph position is set to a number showing
285 how many glyphs the cluster occupies at this position
286
287 index2cluster $INDEX
288 Returns the cluster that contains the character offset $INDEX.
289
290 indexes
291 Read-only accessor to the indexes, see Structure above.
292
293 index_lengths
294 Returns array where each glyph position is set to a number showing
295 how many characters the cluster occupies at this position
296
297 left_overhang
298 First integer from the "overhangs" result.
299
300 log2vis
301 Returns a map of integers where each character position corresponds
302 to a glyph position. The name is a holdover from pure fribidi
303 shaping, where "log2vis" and "vis2log" were mapper functions with
304 the same functionality.
305
306 n_clusters
307 Calculates how many clusters the object contains.
308
309 new @ARRAYS
310 Create new object. Not used directly, but rather from inside
311 "text_shape" calls.
312
313 new_array NAME
314 Creates an array suitable for the object for direct insertion, if
315 manual construction of the object is needed. For example, one may set
316 a missing "fonts" array like this:
317
318 $obj->[ Prima::Drawable::Glyphs::FONTS() ] = $obj->new_array('fonts');
319 $obj->fonts->[0] = 1;
320
321 The newly created array is filled with zeros.
322
323 new_empty
324 Creates a new empty object.
325
326 overhangs
327 Calculates two pixel widths for overhangs in the beginning and in
328 the end of the glyph string. This is used in emulation of a
329 "get_text_width" call with the "to::AddOverhangs" flag.
330
331 positions
332 Read-only accessor to the positions array, see Structure above.
333
334 reorder_text TEXT
335 Returns a visual representation of "TEXT" assuming it was the input
336 of the "text_shape" call that created the object.
337
338 reverse
339 Creates a new object that has all arrays reversed. Used for
340 calculation of pixel offset from the right end of a glyph string.
341
342 right_overhang
343 Second integer from the "overhangs" result.
344
345 selection2range $CLUSTER_START, $CLUSTER_END
346 Converts a cluster selection range into a text selection range.
347
348 selection_chunks_clusters, selection_chunks_glyphs $START, $END
349 Calculates a set of chunks of text that, given a text selection
350 from positions $START to $END, each represent either a set of
351 selected or non-selected clusters/glyphs.
352
353 selection_diff $OLD, $NEW
354 Given two chunk lists, in the format returned by
355 "selection_chunks_clusters" or "selection_chunks_glyphs",
356 calculates the list of chunks affected by the selection change. Can
357 be used for efficient repaints when the user interactively changes
358 text selection, to redraw only the changed regions.
359
360 selection_map_clusters, selection_map_glyphs $START, $END
361 Same as "selection_chunks_XXX", but instead of RLE chunks returns
362 full array for each cluster/glyph, where each entry is a boolean
363 value corresponding to whether that cluster/glyph is to be
364 displayed as selected, or not.
365
366 selection_walk $CHUNKS, $FROM, $TO = length, $SUB
367 Walks the selection chunks array, as returned by "selection_chunks_XXX",
368 between $FROM and $TO clusters/glyphs, and for each chunk calls the
369 provided "$SUB->($offset, $length, $selected)", where each call
370 receives two integers, the chunk offset and length, and a boolean flag
371 telling whether the chunk is selected or not.
372
373 Can also be used on a result of "selection_diff", in which case
374 $selected flag is irrelevant.
375
376 sub_text_out $CANVAS, $FROM, $LENGTH, $X, $Y
377 Optimized version of "$CANVAS->text_out( $self->get_sub($FROM,
378 $LENGTH), $X, $Y )".
379
380 sub_text_wrap $CANVAS, $FROM, $LENGTH, $WIDTH, $OPT, $TABS
381 Optimized version of "$CANVAS->text_wrap( $self->get_sub($FROM,
382 $LENGTH), $WIDTH, $OPT, $TABS )". The result is also converted to
383 chunks.
384
385 text_length
386 Returns the length of the text that was shaped and that produced
387 the object.
388
389 x2cluster $CANVAS, $X, $FROM, $LENGTH
390 Given sub-cluster from $FROM with size $LENGTH, calculates how many
391 clusters would fit in width $X.
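
The chunk format consumed by "selection_walk" can be illustrated with a pure-Perl stand-in. "walk_chunks" below is hypothetical and is not the method itself: it has no $FROM/$TO clipping, and skipping zero-length chunks is a choice made for this sketch. It shows how offsets and the selected flag follow from the alternating chunk lengths.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a chunk walker: calls $sub with the offset,
# the length, and the selected flag of every non-empty chunk.
sub walk_chunks {
    my ( $chunks, $sub ) = @_;
    my ( $offset, $selected ) = ( 0, 0 );    # first chunk is non-selected
    for my $length (@$chunks) {
        $sub->( $offset, $length, $selected ) if $length > 0;
        $offset += $length;
        $selected = $selected ? 0 : 1;       # chunks alternate
    }
}

# Chunks for the "123CBA" example: clusters 0-1 and 3-4 are selected.
walk_chunks(
    [ 0, 2, 1, 2, 1 ],
    sub {
        my ( $offset, $length, $selected ) = @_;
        printf "%s clusters %d..%d\n",
            $selected ? 'selected' : 'plain   ',
            $offset, $offset + $length - 1;
    }
);
```

A drawing routine would typically switch colors inside the callback and paint the clusters of each chunk in one call.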
392
394 This section is only there to test proper rendering
395
396 Latin
397 Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do
398 eiusmod tempor incididunt ut labore et dolore magna aliqua.
399
400 Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
401
402 Latin combining
403 D̍üi̔s͙ a̸u̵t͏eͬ ịr͡u̍r͜e̥ d͎ǒl̋o̻rͫ i̮n̓
404 r͐e̔p͊rͨe̾h̍e͐n̔ḋe͠r̕i̾t̅ ịn̷ vͅo̖lͦuͦpͧt̪ątͅe̪
405
406 v̰e̷l̳i̯t̽ e̵s̼s̈e̮ ċi̵l͟l͙u͆m͂ d̿o̙lͭo͕r̀e̯ ḛu̅ fͩuͧg̦iͩa̓ť n̜u̼lͩl͠a̒ p̏a̽r̗i͆a͆t̳űr̀
407
408 Cyrillic
409 Lorem Ipsum используют потому, что тот обеспечивает более или менее
410 стандартное заполнение шаблона.
411
412 а также реальное распределение букв и пробелов в абзацах
413
414 Hebrew
415 זוהי עובדה מבוססת שדעתו של הקורא תהיה מוסחת על ידי טקטס קריא כאשר
416 הוא יביט בפריסתו.
417
418 המטרה בשימוש ב-Lorem Ipsum הוא שיש לו פחות או יותר תפוצה של אותיות, בניגוד למלל
419
420 Arabic
421 العديد من برامح النشر المكتبي وبرامح تحرير صفحات الويب تستخدم لوريم
422 إيبسوم بشكل إفتراضي
423
424 كنموذج عن النص، وإذا قمت بإدخال "lorem ipsum" في أي محرك بحث ستظهر العديد من
425
426 Hindi
427 Lorem Ipsum के अंश कई रूप में उपलब्ध हैं, लेकिन बहुमत को किसी अन्य
428 रूप में परिवर्तन का सामना करना पड़ा है, हास्य डालना या क्रमरहित
429 शब्द ,
430
431 जो तनिक भी विश्वसनीय नहीं लग रहे हो. यदि आप Lorem Ipsum के एक अनुच्छेद का उपयोग करने जा रहे हैं, तो आप को यकीन दिला दें कि पाठ के मध्य में वहाँ कुछ भी शर्मनाक छिपा हुआ नहीं है.
432
433 Chinese
434 无可否认,当读者在浏览一个页面的排版时,难免会被可阅读的内容所分散注意力。
435
436 Lorem Ipsum的目的就是为了保持字母多多少少标准及平
437
438 Largest well-known grapheme cluster in Unicode
439 ཧྐྵྨླྺྼྻྂ
440
441 <http://archives.miloush.net/michkap/archive/2010/04/28/10002896.html>.
442
443AUTHOR
444 Dmitry Karasik, <dmitry@karasik.eu.org>.
445
446SEE ALSO
447 examples/bidi.pl
448
449
450
451perl v5.32.0 2020-07-28 Prima::Drawable::Glyphs(3)