1 Prima::Drawable::Glyphs(3) User Contributed Perl Documentation Prima::Drawable::Glyphs(3)
2
3
4
5 NAME
6 Prima::Drawable::Glyphs - helper routines for bi-directional text input
7 and complex scripts output
8
9 SYNOPSIS
10 use Prima;
11 $::application-> begin_paint;
12 $::application-> text_shape_out('אפס123', 0,0);
13
14 123ספא
15
16 DESCRIPTION
17 The class implements an abstraction over a set of glyphs that can be
18 rendered to represent text strings. Objects of the class are created
19 and returned from "Prima::Drawable::text_shape" calls, see more in
20 "text_shape" in Prima::Drawable. A "Prima::Drawable::Glyphs" object is
21 a blessed array reference that can contain either two, four, or five
22 packed arrays with 16-bit integers, representing, correspondingly, a
23 set of glyph indexes, a set of character indexes, a set of glyph
24 advances, a set of glyph position offsets per glyph, and a font index.
25 Additionally, the class implements several sets of helper routines that
26 aim to address common tasks when displaying glyph-based strings.
27
28 Structure
29 Each sub-array is an instance of "Prima::array", an efficient plain
30 memory structure that provides a standard Perl array interface over a
31 string scalar filled with fixed-width integers.
32
33 The following methods provide read-only access to these arrays:
34
35 glyphs
36 Contains a set of unsigned 16-bit integers where each is a glyph
37 number corresponding to the font that was used for shaping the
38 text. These glyph numbers are only applicable to that font. Zero is
39 usually treated as a default glyph in vector fonts, when shaping
40 cannot map a character; in bitmap fonts this number is usually the
41 same as "defaultChar".
42
43 This array is recognized as a special case when it is sent to
44 "text_out" or "get_text_width", which can process it without the
45 other arrays. In this case, however, no special advances and glyph
46 positions are taken into account.
47
48 A glyph is not necessarily mapped to exactly one character, and
49 quite often is not, even in English left-to-right texts. For example,
50 character combinations like "ff", "fi", "fl" may be mapped to single
51 ligature glyphs. When right-to-left (RTL) text direction is taken into
52 account, the glyph positions may change, too. See "indexes" below,
53 which addresses the mapping of glyphs to characters.
54
55 indexes
56 Contains a set of unsigned 16-bit integers where each is a text
57 offset into the text that was used in shaping. Each glyph position
58 thus points to the first character in the text that maps to the
59 glyph.
60
61 There can be more than one character per glyph, such as the above
62 example with an "ff" ligature. There can also be cases with more than
63 one character per more than one glyph, for example in Indic scripts.
64 In these cases it is easier to operate neither by character offsets
65 nor by glyph offsets, but rather by clusters, where each cluster is
66 an individual syntax unit that contains one or more characters per
67 one or more glyphs.
68
69 In addition to the text offset, each index value can be flagged
70 with a "to::RTL" bit, signifying that the character in question has
71 RTL direction. It is not only Semitic characters from RTL languages
72 that have this attribute set; spaces in these languages are
73 normally assigned the RTL bit too, and sometimes numbers as well.
74 Use of explicit direction control characters from the U+20XX block
75 can result in any character being assigned or not assigned the RTL
76 bit.
77
78 The array has an extra item added to its end, the length of the
79 text that was used for the shaping. This makes it easy to calculate
80 the cluster length in characters, especially for the last cluster,
81 since the difference between indexes is, basically, the cluster length.
82
83 The array is not used for text drawing or calculation, but only for
84 conversion between character, glyph, and cluster coordinates (see
85 "Coordinates" below).
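
The role of the trailing text-length item can be shown with a small
sketch that works on plain Perl lists rather than on a real glyph object.
The helper name "cluster_lengths" is made up for this illustration, and
the value 0x8000 for the "to::RTL" bit is an assumption, not taken from
this page.

```perl
use strict;
use warnings;

# Assumption: to::RTL is the high bit of a 16-bit index value.
use constant RTL_FLAG => 0x8000;

# Takes an indexes list in glyph order, with the text length appended as
# its last item. Returns a hash that maps the first character offset of
# each cluster to the cluster length in characters.
sub cluster_lengths {
    my @indexes     = @_;
    my $text_length = pop @indexes;

    # Strip the direction flag and collect distinct offsets in text order.
    my %seen;
    my @offsets = sort { $a <=> $b }
                  grep { !$seen{$_}++ }
                  map  { $_ & ~RTL_FLAG } @indexes;

    my %length;
    for my $i (0 .. $#offsets) {
        my $next = $i < $#offsets ? $offsets[$i + 1] : $text_length;
        $length{ $offsets[$i] } = $next - $offsets[$i];
    }
    return %length;
}

# "office" shaped with an "ffi" ligature: glyphs o, ffi, c, e
my %ltr = cluster_lengths(0, 1, 4, 5, 6);

# "ABC123" where ABC is flagged as RTL and displayed as "123CBA"
my %rtl = cluster_lengths(3, 4, 5, 2 | RTL_FLAG, 1 | RTL_FLAG, 0 | RTL_FLAG, 6);

printf "cluster at offset %d has %d character(s)\n", $_, $ltr{$_}
    for sort { $a <=> $b } keys %ltr;
```

In the first call the cluster starting at offset 1 gets length 3, and
the last cluster gets its length only because the text length 6 is
known. The class methods "cluster2range" and "index_lengths" are the
supported way to get this information from a real object.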
86
87 advances
88 Contains a set of unsigned 16-bit integers where each is the pixel
89 distance that the corresponding glyph occupies. Where the advances
90 array is not present, or was force-filled by the "advances" option
91 of "text_shape", a glyph advance value is basically the sum of the
92 a, b, and c widths of the corresponding glyph. However, there are
93 cases when, depending on the shaping input, these values can
94 differ.
95
96 One of those cases is combining graphemes, where a text consisting
97 of two characters, "A" and the combining grave accent U+0300, should
98 be drawn as a single "À" symbol, and where the font doesn't have
99 that single glyph but rather two individual glyphs, "A" and "`".
100 The grave glyph has its own advance for standalone usage, but in
101 this case that advance should be ignored, which the shaper
102 achieves by setting the advance of the "`" glyph to
103 zero.
104
105 The array content is respected by "text_out" and "get_text_width",
106 and its content can be changed at will to produce gaps in the text
107 quite easily. For example, "Prima::Edit" uses that to display tab
108 characters as spaces with 8x advance.
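
A minimal sketch of how advances add up, using a plain Perl list instead
of a real "advances" array. The pixel values are made up, and the sketch
assumes that the width of a glyph string is roughly the sum of its
advances, ignoring overhangs.

```perl
use strict;
use warnings;
use List::Util qw(sum0);

# Made-up advances, in pixels, for the glyphs "A", "`", "B". The shaper
# is assumed to have zeroed the advance of the combining grave.
my @advances = (10, 0, 9);
my $width    = sum0(@advances);

# Produce a 5-pixel gap before "B" by extending the advance of the glyph
# that precedes it; everything drawn after that glyph moves to the right.
$advances[1] += 5;
my $spaced = sum0(@advances);

print "width before: $width, after: $spaced\n";
```

On a real object the same kind of change would be made through the
array returned by the "advances" accessor, after which "text_out" and
"get_text_width" are documented to respect the new values.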
109
110 positions
111 Contains a set of pairs of signed 16-bit integers where each pair
112 is an X and Y pixel offset for a glyph. As in the previous example
113 with the "À" symbol, the grave glyph "`" may be positioned
114 differently on the vertical axis in the "À" and "à" graphemes, for
115 example.
116
117 The array is respected by "text_out" (but not by "get_text_width").
118
119 fonts
120 Contains a set of unsigned 16-bit integers where each is an index
121 in the font substitution list (see "font_mapper" in
122 Prima::Drawable). Zero means the current font.
123
124 The font substitution is applied by "text_shape" when the "polyfont"
125 option is set (it is by default), and when the shaper cannot match
126 all glyphs in the current font. If the current font contains all
127 needed glyphs, this entry is not present at all.
128
129 The array is respected by "text_out" and "get_text_width".
130
131 Coordinates
132 In addition to the natural character coordinates, where each index is a
133 text offset that can be directly used in the Perl "substr" function, the
134 "Prima::Drawable::Glyphs" class offers two additional coordinate
135 systems that help abstract the object data for display and navigation.
136
137 The glyph coordinate system is a rather straightforward copy of the
138 character coordinate system, where each number is an offset in the
139 "glyphs" array. Similarly, these offsets can be used to address
140 individual glyphs, indexes, advances, and positions. However, these are
141 not easy to use when one needs, for example, to select a grapheme with
142 a mouse, or to break a set of glyphs in such a way that a grapheme is
143 not broken. These can be managed more easily in the cluster coordinate system.
144
145 The cluster coordinates represent a virtually superimposed set of
146 offsets where each corresponds to a set of one or more characters
147 displayed by one or more glyphs. Most of the useful functions below
148 operate in this system.
149
150 Selection
151 In practice, the most useful coordinates for implementing selection
152 are either character or cluster coordinates, but not glyph ones. A
153 character-based selection makes extraction or replacement of the
154 selected text trivial, while a cluster-based one makes it easier to
155 manipulate the selection itself (for example with Shift-arrow keys).
156
157 The class supports both, by operating on selection maps or selection
158 chunks, which represent the same information in different ways.
159 For example, consider an embedded number in a bidi text. For the sake
160 of clarity I'll use Latin characters here. Let's have a text scalar
161 containing these characters:
162
163 ABC123
164
165 where ABC is right-to-left text, and which, when rendered on screen,
166 should be displayed as
167
168 123CBA
169
170 (and the indexes array is (3,4,5,2,1,0)).
171
172 Next, the user clicks the mouse between A and B (at text offset 1),
173 then drags the mouse to the left, and finally stops between characters
174 2 and 3 (text offset 4). The resulting selection then should not be, as
175 one might naively expect, this:
176
177 123CBA
178 __^^^_
179
180 but this instead:
181
182 123CBA
183 ^^_^^_
184
185 because the next character after C is 1, and the range of the selected
186 sub-text is from characters 1 to 4.
187
188 The class offers to encode such information in a map, i.e. an array of
189 integers "1,1,0,1,1,0", where each entry is either 1 or 0 depending on
190 whether the cluster is or is not selected. Alternatively, the same
191 information can be encoded in chunks, or RLE sets, as the array
192 "0,2,1,2,1", where the first integer signifies the number of non-selected
193 clusters to display, the second the number of selected clusters, the
194 third the non-selected again, etc. If the first cluster belongs to
195 a selected chunk, the first integer in the result is set to 0.
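
The two encodings can be reproduced with a short sketch on plain Perl
lists. The helpers "selection_map" and "map_to_chunks" are hypothetical
names used only here; they assume one glyph per cluster and an inclusive
range of selected text offsets, as in the ABC123 example. This is not
the implementation used by the class.

```perl
use strict;
use warnings;

# Build a per-cluster selection map from an indexes list and an inclusive
# range of selected text offsets (one glyph per cluster is assumed).
sub selection_map {
    my ($indexes, $from, $to) = @_;
    return map { ($_ >= $from && $_ <= $to) ? 1 : 0 } @$indexes;
}

# Run-length encode a map. The first count is always for non-selected
# clusters, so it stays 0 when the map starts with a selected one.
sub map_to_chunks {
    my @map    = @_;
    my @chunks = (0);
    my $state  = 0;            # the state the current chunk is counting
    for my $selected (@map) {
        if ($selected != $state) {
            push @chunks, 0;
            $state = $selected;
        }
        $chunks[-1]++;
    }
    return @chunks;
}

my @indexes = (3, 4, 5, 2, 1, 0);   # "ABC123" displayed as "123CBA"
my @map     = selection_map(\@indexes, 1, 4);
my @chunks  = map_to_chunks(@map);

print "map:    @map\n";      # map:    1 1 0 1 1 0
print "chunks: @chunks\n";   # chunks: 0 2 1 2 1
```

The map is easier to index by cluster, while the chunks are convenient
for drawing, because each chunk is one run that can be painted with a
single color.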
196
197 Bidi input
198 When sending input to a widget in order to type in text, the otherwise
199 trivial case of figuring out at which position the text should be
200 inserted (or removed, for that matter), becomes interesting when there
201 are characters with mixed direction.
202
203 For example, it is indeed trivial, when the Latin text is "AB" and the
204 cursor is positioned between "A" and "B", to figure out that whenever
205 the user types "C", the result should become "ACB". Likewise, when the
206 text is RTL and both text and input are Arabic, the result is the same.
207 However, when for example the text is "A1", which is displayed as "1A"
208 because of RTL shaping, and the cursor is positioned between "1" (LTR)
209 and "A" (RTL), it is not clear whether the new input should be appended
210 after "1" and become "A1C", or after "A" and become, correspondingly, "AC1".
211
212 There is no easy solution for this problem, and different programs
213 approach it differently; some go as far as to provide two cursors,
214 one for each direction. The class offers its own solution that uses some
215 primitive heuristics to detect whether the cursor belongs to the left or
216 to the right glyph. This is an area that can be enhanced, and any help
217 from native users of RTL languages would be greatly appreciated.
218
220 abc $CANVAS, $INDEX
221 Returns the a, b, c metrics of the glyph $INDEX.
222
223 advances
224 Read-only accessor to the advances array, see Structure above.
225
226 clone
227 Clones the object
228
229 cluster2glyph $FROM, $LENGTH
230 Maps a range of clusters starting with $FROM with size $LENGTH into
231 the corresponding range of glyphs. Undefined $LENGTH calculates the
232 range from $FROM till the object end.
233
234 cluster2index $CLUSTER
235 Returns character offset of the first character in cluster
236 $CLUSTER.
237
238 Note: result may contain "to::RTL" flag.
239
240 cluster2range $CLUSTER
241 Returns the character offset of the first character in cluster
242 $CLUSTER and the number of characters in the cluster.
243
244 clusters
245 Returns an array of integers where each is the first character
246 offset of a cluster.
247
248 cursor2offset $AT_CLUSTER, $PREFERRED_RTL
249 Given a cursor positioned next to the cluster $AT_CLUSTER, runs
250 simple heuristics to see what character offset it corresponds to.
251 $PREFERRED_RTL is used when object data are not enough.
252
253 See "Bidi input" above.
254
255 def $CANVAS, $INDEX
256 Returns the d, e, f metrics of the glyph $INDEX.
257
258 fonts
259 Read-only accessor to the font indexes, see Structure above.
260
261 get_box $CANVAS
262 Return box metrics of the glyph object.
263
264 See "get_text_box" in Prima::Drawable.
265
266 get_sub $FROM, $LENGTH
267 Extracts and clones a new object that contains data from cluster
268 offset $FROM, with cluster length $LENGTH.
269
270 get_sub_box $CANVAS, $FROM, $LENGTH
271 Calculate box metrics of a glyph string from the cluster $FROM with
272 size $LENGTH.
273
274 get_sub_width $CANVAS, $FROM, $LENGTH
275 Calculate pixel width of a glyph string from the cluster $FROM with
276 size $LENGTH.
277
278 get_width $CANVAS, $WITH_OVERHANGS
279 Returns the width of the glyph object, with overhangs if requested.
280
281 glyph2cluster $GLYPH
282 Return the cluster that contains $GLYPH.
283
284 glyphs
285 Read-only accessor to the glyph indexes, see Structure above.
286
287 glyph_lengths
288 Returns an array where each glyph position is set to a number showing
289 how many glyphs the cluster occupies at this position.
290
291 index2cluster $INDEX
292 Returns the cluster that contains the character offset $INDEX.
293
294 indexes
295 Read-only accessor to the indexes, see Structure above.
296
297 index_lengths
298 Returns an array where each glyph position is set to a number showing
299 how many characters the cluster occupies at this position.
300
301 justify CANVAS, TEXT, WIDTH, %OPTIONS
302 Umbrella call for "justify_interspace" if $OPTIONS{letter} or
303 $OPTIONS{word} is set; for "justify_arabic" if $OPTIONS{kashida} is
304 set; and for "justify_tabs" if $OPTIONS{tabs} is set.
305
306 Returns a boolean flag whether the glyph object was changed or not.
307
308 justify_arabic CANVAS, TEXT, WIDTH, %OPTIONS
309 Performs justification of Arabic TEXT with kashida to the given
310 WIDTH; returns either a success flag, or a new text with explicit
311 tatweel characters inserted.
312
313 my $text = "\x{6a9}\x{634}\x{6cc}\x{62f}\x{647}";
314 my $g = $canvas->text_shape($text) or return;
315 $canvas->text_out($g, 10, 50);
316 $g->justify_arabic($canvas, $text, 200) or return;
317 $canvas->text_out($g, 10, 10);
318
319 Inserts tatweels only between Arabic letters that did not form any
320 ligatures in the glyph object, at most one tatweel set per word (if
321 any). Does not apply the justification if the letters in the word
322 are rendered as LTR due to embedding or explicit shaping options;
323 only does justification on RTL letters. If for some reason the newly
324 inserted tatweels do not form a monotonically increasing series
325 after shaping, skips the justification in that word.
326
327 Note: Does not use the JSTF font table; on Windows, results may
328 differ from native rendering.
329
330 Options:
331
332 If justification is found to be needed, eventual ligatures with
333 newly inserted tatweel glyphs are resolved via a call to
334 "text_shape(%OPTIONS)" - so any needed shaping options, such as
335 "language", may be passed there.
336
337 as_text BOOL = 0
338 If set, returns new text with inserted tatweels, or undef if no
339 justification is possible.
340
341 If unset, runs inplace justification on the caller glyph
342 object, and returns the boolean success flag.
343
344 min_kashida INTEGER = 0
345 Specifies minimal width of a kashida strike to be inserted.
346
347 kashida_width INTEGER
348 During the calculation a width of a tatweel glyph is needed -
349 unless supplied by this option, it is calculated dynamically.
350 Also, when called in list context, and succeeded, returns " 1,
351 kashida_width " that can be reused in subsequent calls.
352
353 justify_interspace CANVAS, TEXT, WIDTH, %OPTIONS
354 Performs in-place inter-letter and/or inter-word justification of
355 TEXT to the given WIDTH. Returns either a boolean flag telling
356 whether any change was made, or a new text with explicit space
357 characters inserted.
358
359 Options:
360
361 as_text BOOL = 0
362 If set, returns new text with inserted spaces, or undef if no
363 justification is possible.
364
365 If unset, runs inplace justification on the caller glyph
366 object, and returns the boolean success flag.
367
368 letter BOOL = 1
369 If set, runs an inter-letter spacing on all glyphs.
370
371 max_interletter FLOAT = 1.05
372 When the inter-letter spacing is applied, it is applied first,
373 and can take up to "$OPTIONS{max_interletter} * glyph_width"
374 space.
375
376 Inter-word spacing does not have such a limit, and in the worst
377 case can produce two words moved to the left and to the right edges
378 of the enclosing 0 - WIDTH-1 rectangle.
379
380 space_width INTEGER
381 "as_text" mode: during the calculation the width of space glyph
382 may be needed - unless supplied by $OPTIONS{space_width}, it is
383 calculated dynamically. Also, when called in list context, and
384 succeeded, returns " 1, space_width " that can be reused in
385 subsequent calls.
386
387 word BOOL = 1
388 If set, runs an inter-word spacing by extending advances on all
389 space glyphs.
390
391 min_text_to_space_ratio FLOAT = 0.75
392 If "word" is set, does not run inter-word justification if the
393 text to space ratio is too small (i.e. doesn't spread text too thin).
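
The idea behind the "word" option, extending advances on space glyphs,
can be sketched on plain Perl lists. The helper "stretch_spaces" and all
pixel values are made up for this illustration. The sketch does not
implement "max_interletter" or "min_text_to_space_ratio", and it is not
the algorithm used by the class.

```perl
use strict;
use warnings;
use List::Util qw(sum0);

# Stretch the advances of space glyphs so that the total reaches $width.
# $advances is a reference to a list of pixel advances, $is_space is a
# parallel list of 0/1 flags. Returns true if anything was changed.
sub stretch_spaces {
    my ($advances, $is_space, $width) = @_;
    my @spaces = grep { $is_space->[$_] } 0 .. $#$advances;
    my $extra  = $width - sum0(@$advances);
    return 0 if !@spaces || $extra <= 0;

    my $share     = int($extra / @spaces);
    my $remainder = $extra % @spaces;
    for my $n (0 .. $#spaces) {
        # Hand out the remainder one pixel at a time to the first spaces.
        $advances->[ $spaces[$n] ] += $share + ($n < $remainder ? 1 : 0);
    }
    return 1;
}

# "ab cd ef": letters are 8 pixels wide, spaces 4 (made-up values)
my @advances = (8, 8, 4, 8, 8, 4, 8, 8);
my @is_space = (0, 0, 1, 0, 0, 1, 0, 0);
my $changed  = stretch_spaces(\@advances, \@is_space, 63);

print "changed: $changed, advances: @advances\n";
```

Here the original width is 56 pixels, so 7 extra pixels are split
between the two spaces, 4 to the first and 3 to the second.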
394
395 justify_tabs CANVAS, TEXT, %OPTIONS
396 Expands tabs as $OPTIONS{tabs} (default: 8) spaces.
397
398 Needs the glyph ID and the advance of the space glyph to replace
399 the tab glyph. If $OPTIONS{glyph} and $OPTIONS{width} are not
400 specified, calculates them.
401
402 Returns a boolean flag telling whether any change was made. On
403 success, if called in list context, also returns the space glyph ID
404 and the space glyph width for eventual use in later calls.
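
A rough model of the tab expansion described above, written against
plain Perl lists. The helper "expand_tabs" is hypothetical, the glyph
IDs and widths are made up, and the sketch assumes one glyph per
character; the real method also needs a canvas to calculate the space
glyph when it is not supplied.

```perl
use strict;
use warnings;

# Replace every glyph that stands for a tab character with the space
# glyph, and give it the advance of $opt{tabs} spaces (8 by default).
# Returns true if anything was changed.
sub expand_tabs {
    my ($text, $indexes, $glyphs, $advances, %opt) = @_;
    my $tabs    = $opt{tabs} // 8;
    my $changed = 0;
    for my $i (0 .. $#$glyphs) {
        next unless substr($text, $indexes->[$i], 1) eq "\t";
        $glyphs->[$i]   = $opt{glyph};
        $advances->[$i] = $opt{width} * $tabs;
        $changed        = 1;
    }
    return $changed;
}

my $text     = "a\tb";
my @indexes  = (0, 1, 2);      # one glyph per character
my @glyphs   = (68, 0, 69);    # made-up glyph IDs; the tab has no glyph
my @advances = (7, 0, 7);      # made-up pixel advances
my $changed  = expand_tabs($text, \@indexes, \@glyphs, \@advances,
                           glyph => 3, width => 4);

print "glyphs: @glyphs, advances: @advances\n";
```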
405
406 left_overhang
407 First integer from the "overhangs" result.
408
409 log2vis
410 Returns a map of integers where each character position corresponds
411 to a glyph position. The name is a leftover from pure fribidi
412 shaping, where "log2vis" and "vis2log" were mapper functions with
413 the same functionality.
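
For the simplest case such a map is the inverse of the indexes array.
The sketch below assumes one character per glyph, an indexes list
without the trailing text length, and 0x8000 as the value of the
"to::RTL" bit; "char_to_glyph_map" is a made-up helper, not a class
method.

```perl
use strict;
use warnings;

use constant RTL_FLAG => 0x8000;   # assumed value of to::RTL

# Invert an indexes list (glyph position -> character offset) into a
# character offset -> glyph position map.
sub char_to_glyph_map {
    my @indexes = @_;
    my @map;
    $map[ $indexes[$_] & ~RTL_FLAG ] = $_ for 0 .. $#indexes;
    return @map;
}

# "ABC123" displayed as "123CBA"
my @log2vis = char_to_glyph_map(3, 4, 5, 2, 1, 0);
print "@log2vis\n";   # 5 4 3 0 1 2
```

Character 0 ("A") is drawn by the last glyph, and character 3 ("1") by
the first one.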
414
415 n_clusters
416 Calculates how many clusters the object contains.
417
418 new @ARRAYS
419 Create new object. Not used directly, but rather from inside
420 "text_shape" calls.
421
422 new_array NAME
423 Creates an array suitable for the object for direct insertion, if
424 manual construction of the object is needed. F ex one may set
425 missing "fonts" array like this:
426
427 $obj->[ Prima::Drawable::Glyphs::FONTS() ] = $obj->new_array('fonts');
428 $obj->fonts->[0] = 1;
429
430 The newly created array is filled with zeros.
431
432 new_empty
433 Creates a new empty object.
434
435 overhangs
436 Calculates two pixel widths for overhangs in the beginning and in
437 the end of the glyph string. This is used in emulation of a
438 "get_text_width" call with the "to::AddOverhangs" flag.
439
440 positions
441 Read-only accessor to the positions array, see Structure above.
442
443 reorder_text TEXT
444 Returns a visual representation of "TEXT" assuming it was the input
445 of the "text_shape" call that created the object.
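
Under the same simplifying assumptions (one character per glyph, indexes
without direction flags and without the trailing text length), the
visual string can be modelled in one line. "visual_text" is a made-up
helper for illustration only.

```perl
use strict;
use warnings;

# Pick the characters of $text in glyph order.
sub visual_text {
    my ($text, @indexes) = @_;
    return join '', map { substr($text, $_, 1) } @indexes;
}

my $visual = visual_text('ABC123', 3, 4, 5, 2, 1, 0);
print "$visual\n";   # 123CBA
```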
446
447 reverse
448 Creates a new object that has all arrays reversed. Used for
449 calculation of the pixel offset from the right end of a glyph string.
450
451 right_overhang
452 Second integer from the "overhangs" result.
453
454 selection2range $CLUSTER_START, $CLUSTER_END
455 Converts a cluster selection range into a text selection range.
456
457 selection_chunks_clusters, selection_chunks_glyphs $START, $END
458 Calculates a set of chunks of text that, given a text selection
459 from positions $START to $END, each represent a run of either
460 selected or non-selected clusters/glyphs.
461
462 selection_diff $OLD, $NEW
463 Given two chunk lists, in the format returned by
464 "selection_chunks_clusters" or "selection_chunks_glyphs",
465 calculates the list of chunks affected by the selection change. Can
466 be used for efficient repaints when the user interactively changes
467 text selection, to redraw only the changed regions.
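
The format of the "selection_diff" result is not spelled out here, so
the following is only a sketch of the idea. It assumes that the changed
regions are themselves RLE-encoded, with the count of unchanged clusters
first, and that both lists cover the same number of clusters. All helper
names are hypothetical.

```perl
use strict;
use warnings;

# Expand RLE chunks (non-selected count first) into a 0/1 map.
sub chunks_to_map {
    my @map;
    my $state = 0;
    for my $length (@_) {
        push @map, ($state) x $length;
        $state = $state ? 0 : 1;
    }
    return @map;
}

# Run-length encode a 0/1 map, starting with the count of zeros.
sub map_to_chunks {
    my @chunks = (0);
    my $state  = 0;
    for my $flag (@_) {
        if ($flag != $state) {
            push @chunks, 0;
            $state = $flag;
        }
        $chunks[-1]++;
    }
    return @chunks;
}

# Mark the clusters whose selection state differs between two chunk lists.
sub changed_chunks {
    my ($old, $new) = @_;
    my @old = chunks_to_map(@$old);
    my @new = chunks_to_map(@$new);
    return map_to_chunks(map { ($old[$_] xor $new[$_]) ? 1 : 0 } 0 .. $#old);
}

# The selection grows by one cluster: "^^_^^_" becomes "^^^^^_".
my @diff = changed_chunks([0, 2, 1, 2, 1], [0, 5, 1]);
print "@diff\n";   # 2 1 3
```

Only the third cluster changed its state, so a repaint can be limited
to that single cluster.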
468
469 selection_map_clusters, selection_map_glyphs $START, $END
470 Same as "selection_chunks_XXX", but instead of RLE chunks returns
471 full array for each cluster/glyph, where each entry is a boolean
472 value corresponding to whether that cluster/glyph is to be
473 displayed as selected, or not.
474
475 selection_walk $CHUNKS, $FROM, $TO = length, $SUB
476 Walks the selection chunks array, returned by "selection_chunks_XXX",
477 between $FROM and $TO clusters/glyphs, and for each chunk calls the
478 provided "$SUB->($offset, $length, $selected)", where each call
479 receives two integers, the chunk offset and length, and a boolean
480 flag telling whether the chunk is selected or not.
481
482 Can also be used on the result of "selection_diff", in which case
483 the $selected flag is irrelevant.
484
485 sub_text_out $CANVAS, $FROM, $LENGTH, $X, $Y
486 Optimized version of "$CANVAS->text_out( $self->get_sub($FROM,
487 $LENGTH), $X, $Y )".
488
489 sub_text_wrap $CANVAS, $FROM, $LENGTH, $WIDTH, $OPT, $TABS
490 Optimized version of "$CANVAS->text_wrap( $self->get_sub($FROM,
491 $LENGTH), $WIDTH, $OPT, $TABS )". The result is also converted to
492 chunks.
493
494 text_length
495 Returns the length of the text that was shaped and that produced
496 the object.
497
498 x2cluster $CANVAS, $X, $FROM, $LENGTH
499 Given a cluster sub-range starting at $FROM with size $LENGTH,
500 calculates how many clusters would fit in width $X.
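
A rough model of this calculation, given a plain list of per-cluster
pixel widths. The helper "clusters_that_fit" and the widths are made up,
and the sketch counts only clusters that fit completely; how the real
method treats a partially fitting cluster is not described here.

```perl
use strict;
use warnings;

# Count how many leading clusters fit into $x pixels.
sub clusters_that_fit {
    my ($widths, $x) = @_;
    my ($used, $count) = (0, 0);
    for my $w (@$widths) {
        last if $used + $w > $x;
        $used += $w;
        $count++;
    }
    return $count;
}

# 10 + 9 + 4 = 23 pixels fit into 25, the fourth cluster does not
my $n = clusters_that_fit([10, 9, 4, 12], 25);
print "$n\n";   # 3
```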
501
502 _debug
503 Dumps glyph object content in a readable format.
504
506 This section is only there to test proper rendering
507
508 Latin
509 Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do
510 eiusmod tempor incididunt ut labore et dolore magna aliqua.
511
512 Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
513
514 Latin combining
515 D̍üi̔s͙ a̸u̵t͏eͬ ịr͡u̍r͜e̥ d͎ǒl̋o̻rͫ i̮n̓
516 r͐e̔p͊rͨe̾h̍e͐n̔ḋe͠r̕i̾t̅ ịn̷ vͅo̖lͦuͦpͧt̪ątͅe̪
517
518 v̰e̷l̳i̯t̽ e̵s̼s̈e̮ ċi̵l͟l͙u͆m͂ d̿o̙lͭo͕r̀e̯ ḛu̅ fͩuͧg̦iͩa̓ť n̜u̼lͩl͠a̒ p̏a̽r̗i͆a͆t̳űr̀
519
520 Cyrillic
521 Lorem Ipsum используют потому, что тот обеспечивает более или менее
522 стандартное заполнение шаблона.
523
524 а также реальное распределение букв и пробелов в абзацах
525
526 Hebrew
527 זוהי עובדה מבוססת שדעתו של הקורא תהיה מוסחת על ידי טקטס קריא כאשר
528 הוא יביט בפריסתו.
529
530 המטרה בשימוש ב-Lorem Ipsum הוא שיש לו פחות או יותר תפוצה של אותיות, בניגוד למלל
531
532 Arabic
533 العديد من برامح النشر المكتبي وبرامح تحرير صفحات الويب تستخدم لوريم
534 إيبسوم بشكل إفتراضي
535
536 كنموذج عن النص، وإذا قمت بإدخال "lorem ipsum" في أي محرك بحث ستظهر العديد من
537
538 Hindi
539 Lorem Ipsum के अंश कई रूप में उपलब्ध हैं, लेकिन बहुमत को किसी अन्य
540 रूप में परिवर्तन का सामना करना पड़ा है, हास्य डालना या क्रमरहित
541 शब्द ,
542
543 जो तनिक भी विश्वसनीय नहीं लग रहे हो. यदि आप Lorem Ipsum के एक अनुच्छेद का उपयोग करने जा रहे हैं, तो आप को यकीन दिला दें कि पाठ के मध्य में वहाँ कुछ भी शर्मनाक छिपा हुआ नहीं है.
544
545 Chinese
546 无可否认,当读者在浏览一个页面的排版时,难免会被可阅读的内容所分散注意力。
547
548 Lorem Ipsum的目的就是为了保持字母多多少少标准及平
549
550 Thai
551 มีหลักฐานที่เป็นข้อเท็จจริงยืนยันมานานแล้ว
552 ว่าเนื้อหาที่อ่านรู้เรื่องนั้นจะไปกวนสมาธิของคนอ่านให้เขวไปจากส่วนที้เป็น
553 Layout เรานำ Lorem Ipsum
554 มาใช้เพราะความที่มันมีการกระจายของตัวอักษรธรรมดาๆ แบบพอประมาณ
555 ซึ่งเอามาใช้แทนการเขียนว่า ‘ตรงนี้เป็นเนื้อหา, ตรงนี้เป็นเนื้อหา'
556 ได้ และยังทำให้มองดูเหมือนกับภาษาอังกฤษที่อ่านได้ปกติ
557 ปัจจุบันมีแพ็กเกจของซอฟท์แวร์การทำสื่อสิ่งพิมพ์
558 และซอฟท์แวร์การสร้างเว็บเพจ
559
560 กวนสมาธิของคนอ่านให้เขวไปจากส่วนที้เป็น Layout เรานำ Lorem Ipsum
561
562 (Note: libthai is required for text wrapping by the word boundary)
563
564 Largest well-known grapheme cluster in Unicode
565 ཧྐྵྨླྺྼྻྂ
566
567 <http://archives.miloush.net/michkap/archive/2010/04/28/10002896.html>.
568
569 AUTHOR
570 Dmitry Karasik, <dmitry@karasik.eu.org>.
571
572 SEE ALSO
573 examples/bidi.pl
574
575
576
577perl v5.34.1 2022-04-20 Prima::Drawable::Glyphs(3)