Prima::Drawable::Glyphs(3)  User Contributed Perl Documentation  Prima::Drawable::Glyphs(3)

NAME

       Prima::Drawable::Glyphs - helper routines for bi-directional text input
       and complex script output

SYNOPSIS

          use Prima;
          $::application-> begin_paint;
          ‭$::application-> text_shape_out('אפס123', 0,0);

          ‭123ספא

DESCRIPTION

       The class implements an abstraction over a set of glyphs that can be
       rendered to represent text strings. Objects of the class are created
       and returned from "Prima::Drawable::text_shape" calls, see more in
       "text_shape" in Prima::Drawable. A "Prima::Drawable::Glyphs" object is
       a blessed array reference that can contain either two, four, or five
       packed arrays with 16-bit integers, representing, correspondingly, a
       set of glyph indexes, a set of character indexes, a set of glyph
       advances, a set of glyph position offsets per glyph, and a set of font
       indexes. Additionally, the class implements several sets of helper
       routines that aim to address common tasks when displaying glyph-based
       strings.

   Structure
       Each sub-array is an instance of "Prima::array", an efficient plain
       memory structure that provides the standard Perl array interface over
       a string scalar filled with fixed-width integers.

       The following methods provide read-only access to these arrays:

       glyphs
           Contains a set of unsigned 16-bit integers where each is a glyph
           number corresponding to the font that was used for shaping the
           text. These glyph numbers are only applicable to that font. Zero is
           usually treated as the default glyph in vector fonts, used when
           shaping cannot map a character; in bitmap fonts this number is
           usually the same as "defaultChar".

           This array is recognized as a special case when it is sent to
           "text_out" or "get_text_width", which can process it without the
           other arrays. In this case, however, no special advances and glyph
           positions are taken into account.

           A glyph is not necessarily mapped to a single character, and quite
           often is not, even in English left-to-right texts. For example,
           character combinations like "ff", "fi", "fl" may be mapped to
           single ligature glyphs. When right-to-left (RTL) text direction is
           taken into account, the glyph positions may change, too. See
           "indexes" below, which addresses the mapping of glyphs to
           characters.

       indexes
           Contains a set of unsigned 16-bit integers where each is a text
           offset corresponding to the text that was used in shaping. Each
           glyph position thus points to the first character in the text that
           maps to the glyph.

           There can be more than one character per glyph, such as the above
           example with the "ff" ligature. There can also be cases with more
           than one character per more than one glyph, for example in Indic
           scripts. In these cases it is easier to operate neither by
           character offsets nor by glyph offsets, but rather by clusters,
           where each cluster is an individual syntax unit that contains one
           or more characters per one or more glyphs.

           In addition to the text offset, each index value can be flagged
           with a "to::RTL" bit, signifying that the character in question has
           RTL direction. It is not only Semitic characters from RTL
           languages that have that attribute set; spaces in these languages
           are normally attributed the RTL bit too, and sometimes also
           numbers. Use of explicit direction control characters from the
           U+20XX block can result in any character being assigned or not
           assigned the RTL bit.

           The array has an extra item added to its end, the length of the
           text that was used for the shaping. This helps with easy
           calculation of the cluster length in characters, especially of the
           last one, where the difference between indexes is, basically, the
           cluster length.

           The array is not used for text drawing or calculation, but only for
           conversion between character, glyph, and cluster coordinates (see
           "Coordinates" below).

       advances
           Contains a set of unsigned 16-bit integers where each is the pixel
           distance that the corresponding glyph occupies. Where the advances
           array is not present, or was force-filled by the "advances" option
           in "text_shape", a glyph advance value is basically the sum of the
           a, b, and c widths of the corresponding glyph. However, there are
           cases when, depending on the shaping input, these values can
           differ.

           One of those cases is combining graphemes, where a text
           consisting of two characters, "A" and the combining grave accent
           U+300, should be drawn as a single "À" symbol, and where the font
           doesn't have that single glyph but rather two individual glyphs,
           "A" and "`". The grave glyph has its own advance for standalone
           usage, but in this case it should be ignored, and that is achieved
           by the shaper setting the advance of the "`" to zero.

           The array content is respected by "text_out" and "get_text_width",
           and its content can be changed at will to produce gaps in the text
           quite easily. For example, "Prima::Edit" uses that to display tab
           characters as spaces with 8x advance.

       positions
           Contains a set of pairs of signed 16-bit integers where each pair
           is the X and Y pixel offset for a glyph. As in the previous
           example with the "À" symbol, the grave glyph "`" may be positioned
           differently on the vertical axis in the "À" and "à" graphemes.

           The array is respected by "text_out" (but not by "get_text_width").

       fonts
           Contains a set of unsigned 16-bit integers where each is an index
           in the font substitution list (see "font_mapper" in
           Prima::Drawable). Zero means the current font.

           The font substitution is applied by "text_shape" when the
           "polyfont" option is set (it is by default), and when the shaper
           cannot find all needed glyphs in the current font. If the current
           font contains all needed glyphs, this entry is not present at all.

           The array is respected by "text_out" and "get_text_width".

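       The role of the extra trailing item and of the "to::RTL" bit in the
       "indexes" array can be sketched in plain Perl, without Prima. The
       index values below are made up for illustration, RTL_BIT is a stand-in
       for "to::RTL" (assumed here to be 0x8000; real code should use the
       constant itself), and cluster_lengths() is a hypothetical helper, not
       a method of the class:

```perl
use strict;
use warnings;

# Assumption: stand-in for Prima's to::RTL flag; real code should use to::RTL.
use constant RTL_BIT => 0x8000;

# Hypothetical helper: given the content of an indexes array (one entry per
# glyph plus the trailing text length), returns for each glyph the number of
# characters in the cluster the glyph belongs to.
sub cluster_lengths {
    my @indexes     = @_;
    my $text_length = pop @indexes;
    my @offsets     = map { $_ & ~RTL_BIT } @indexes;

    # The length of a cluster is the distance from its text offset to the
    # next larger offset, or to the end of the text for the last cluster.
    my %seen;
    my @sorted = sort { $a <=> $b } grep { !$seen{$_}++ } (@offsets, $text_length);
    my %length;
    $length{ $sorted[$_] } = $sorted[ $_ + 1 ] - $sorted[$_] for 0 .. $#sorted - 1;
    return map { $length{$_} } @offsets;
}

# "office" shaped with an "ffi" ligature: glyphs o, ffi, c, e
my @ligature = cluster_lengths(0, 1, 4, 5, 6);

# "ABC123" where "ABC" is RTL, in visual order 1 2 3 C B A
my @bidi_indexes = (3, 4, 5, 2 | RTL_BIT, 1 | RTL_BIT, 0 | RTL_BIT, 6);
my @bidi         = cluster_lengths(@bidi_indexes);
my @is_rtl       = map { (($_ & RTL_BIT) ? 1 : 0) } @bidi_indexes[0 .. $#bidi_indexes - 1];

print "@ligature\n";    # 1 3 1 1
print "@bidi\n";        # 1 1 1 1 1 1
print "@is_rtl\n";      # 0 0 0 1 1 1
```

       With a real object, the same values would come from "$glyphs->indexes",
       and the "index_lengths" method listed in the API section reports the
       per-glyph character counts directly.
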
   Coordinates
       In addition to the natural character coordinates, where each index is a
       text offset that can be directly used in the "substr" Perl function,
       the "Prima::Drawable::Glyphs" class offers two additional coordinate
       systems that help abstract the object data for display and navigation.

       The glyph coordinate system is a rather straightforward copy of the
       character coordinate system, where each number is an offset in the
       "glyphs" array. Similarly, these offsets can be used to address
       individual glyphs, indexes, advances, and positions. However, these
       are not easy to use when one needs, for example, to select a grapheme
       with a mouse, or to break a set of glyphs in such a way that a
       grapheme is not broken. These tasks are easier to manage in the
       cluster coordinate system.

       The cluster coordinates represent a virtually superimposed set of
       offsets where each corresponds to a set of one or more characters
       displayed by one or more glyphs. Most of the useful functions below
       operate in this system.

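       As a rough illustration of how the three coordinate systems relate, the
       following plain-Perl sketch groups glyph offsets into clusters. It
       assumes that consecutive glyphs reporting the same text offset belong
       to the same cluster; the sample index values and the helpers
       group_clusters() and glyph_to_cluster() are hypothetical and not part
       of the class:

```perl
use strict;
use warnings;

# Hypothetical helper: takes the content of an indexes array (without RTL
# flags, with the trailing text length) and returns one record per cluster.
# Assumption: consecutive glyphs with the same text offset form one cluster.
sub group_clusters {
    my @indexes = @_;
    pop @indexes;    # drop the trailing text length
    my @clusters;
    for my $glyph (0 .. $#indexes) {
        if (@clusters && $clusters[-1]{index} == $indexes[$glyph]) {
            $clusters[-1]{glyphs}++;    # same cluster, one more glyph
        }
        else {
            push @clusters,
                { index => $indexes[$glyph], first_glyph => $glyph, glyphs => 1 };
        }
    }
    return @clusters;
}

# Hypothetical helper: maps a glyph offset to a cluster offset, in the
# spirit of the glyph2cluster method.
sub glyph_to_cluster {
    my ($glyph, @clusters) = @_;
    for my $c (0 .. $#clusters) {
        my $first = $clusters[$c]{first_glyph};
        return $c if $glyph >= $first && $glyph < $first + $clusters[$c]{glyphs};
    }
    return;
}

# "A" + combining grave accent + "b", shaped into three glyphs: A, `, b
my @clusters = group_clusters(0, 0, 2, 3);

printf "cluster %d: text offset %d, first glyph %d, %d glyph(s)\n",
    $_, @{ $clusters[$_] }{qw(index first_glyph glyphs)}
    for 0 .. $#clusters;
```

       Here the three glyphs form two clusters: the first covers text offsets
       0 and 1 with two glyphs, the second covers text offset 2 with one.
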
   Selection
       In practice, the most useful coordinates for implementing selection
       are either character or cluster coordinates, but not glyph
       coordinates. Character-based selection makes extraction or
       replacement of the selected text trivial, while cluster-based
       selection makes it easier to manipulate the selection itself (for
       example with Shift-arrow keys).

       The class supports both, by operating on selection maps or selection
       chunks, which represent the same information in different ways.
       For example, consider an embedded number in a bidi text. For the sake
       of clarity I'll use Latin characters here. Let's have a text scalar
       containing these characters:

          ABC123

       where ABC is right-to-left text, and which, when rendered on screen,
       should be displayed as

          123CBA

       (and the index array is (3,4,5,2,1,0) ).

       Next, the user clicks the mouse between A and B (in text offset 1),
       then drags the mouse to the left, and finally stops between characters
       2 and 3 (text offset 4). The resulting selection then should not be, as
       one might naively expect, this:

          123CBA
          __^^^_

       but this instead:

          123CBA
          ^^_^^_

       because the next character after C is 1, and the range of the selected
       sub-text is from characters 1 to 4.

       The class offers to encode such information in a map, i.e. an array
       of integers "1,1,0,1,1,0", where each entry is 1 if the cluster is
       selected and 0 if it is not. Alternatively, the same information can
       be encoded in chunks, or RLE sets, as the array "0,2,1,2,1", where the
       first integer signifies the number of non-selected clusters to
       display, the second the number of selected clusters, the third the
       non-selected again, etc. If the first cluster belongs to the selected
       chunk, the first integer in the result is set to 0.

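       The relation between the map and the chunk encodings can be reproduced
       with a few lines of plain Perl. This is only a sketch of the two data
       formats, not the class implementation: selection_map(),
       map_to_chunks() and chunks_to_map() are hypothetical helpers, and the
       selected range is taken as the inclusive character offsets 1 to 4 from
       the example above:

```perl
use strict;
use warnings;

# Visual-order text offsets for "ABC123" rendered as "123CBA"
my @indexes = (3, 4, 5, 2, 1, 0);

# Hypothetical helper: a cluster is selected when its text offset lies within
# the inclusive character range $first .. $last.
sub selection_map {
    my ($first, $last, @indexes) = @_;
    return map { ($_ >= $first && $_ <= $last) ? 1 : 0 } @indexes;
}

# Hypothetical helper: run-length encodes a map, starting with a
# (possibly empty) run of non-selected clusters.
sub map_to_chunks {
    my @map    = @_;
    my @chunks = (0);
    my $state  = 0;
    for my $selected (@map) {
        if ($selected != $state) {
            push @chunks, 0;
            $state = $selected;
        }
        $chunks[-1]++;
    }
    return @chunks;
}

# Hypothetical helper: the inverse, expands chunks back into a map.
sub chunks_to_map {
    my @chunks = @_;
    my $state  = 0;
    my @map;
    for my $count (@chunks) {
        push @map, ($state) x $count;
        $state = 1 - $state;
    }
    return @map;
}

my @map    = selection_map(1, 4, @indexes);
my @chunks = map_to_chunks(@map);

print "@map\n";       # 1 1 0 1 1 0
print "@chunks\n";    # 0 2 1 2 1
```

       With a real object, the "selection_map_XXX" and "selection_chunks_XXX"
       methods listed in the API section are the ones to use.
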
   Bidi input
       When sending input to a widget in order to type in text, the otherwise
       trivial case of figuring out at which position the text should be
       inserted (or removed, for that matter) becomes interesting when there
       are characters with mixed direction.

       For example, it is indeed trivial, when the Latin text is "AB" and the
       cursor is positioned between "A" and "B", to figure out that whenever
       the user types "C", the result should become "ACB". Likewise, when the
       text is RTL and both the text and the input are Arabic, the result is
       the same. However, when for example the text is "A1", which is
       displayed as "1A" because of RTL shaping, and the cursor is positioned
       between "1" (LTR) and "A" (RTL), it is not clear whether the new input
       should be appended after "1" and become "A1C", or after "A" and
       become, correspondingly, "AC1".

       There is no easy solution for this problem, and different programs
       approach it differently; some go as far as to provide two cursors,
       one for each direction. The class offers its own solution that uses
       some primitive heuristics to detect whether the cursor belongs to the
       left or to the right glyph. This is an area that can be enhanced, and
       any help from native users of RTL languages would be greatly
       appreciated.

API

       abc $CANVAS, $INDEX
           Returns the a, b, c metrics of the glyph $INDEX.

       advances
           Read-only accessor to the advances array, see Structure above.

       clone
           Clones the object.

       cluster2glyph $FROM, $LENGTH
           Maps a range of clusters starting with $FROM with size $LENGTH into
           the corresponding range of glyphs. If $LENGTH is undefined, the
           range is calculated from $FROM to the end of the object.

       cluster2index $CLUSTER
           Returns the character offset of the first character in cluster
           $CLUSTER.

           Note: the result may contain the "to::RTL" flag.

       cluster2range $CLUSTER
           Returns the character offset of the first character in cluster
           $CLUSTER and the number of characters in the cluster.

       clusters
           Returns an array of integers where each is the first character
           offset of a cluster.

       cursor2offset $AT_CLUSTER, $PREFERRED_RTL
           Given a cursor positioned next to the cluster $AT_CLUSTER, runs
           simple heuristics to see what character offset it corresponds to.
           $PREFERRED_RTL is used when the object data are not enough.

           See "Bidi input" above.

       def $CANVAS, $INDEX
           Returns the d, e, f metrics of the glyph $INDEX.

       fonts
           Read-only accessor to the font indexes, see Structure above.

       get_box $CANVAS
           Returns the box metrics of the glyph object.

           See "get_text_box" in Prima::Drawable.

       get_sub $FROM, $LENGTH
           Extracts and clones a new object that contains data from cluster
           offset $FROM, with cluster length $LENGTH.

       get_sub_box $CANVAS, $FROM, $LENGTH
           Calculates the box metrics of a glyph string from the cluster $FROM
           with size $LENGTH.

       get_sub_width $CANVAS, $FROM, $LENGTH
           Calculates the pixel width of a glyph string from the cluster $FROM
           with size $LENGTH.

       get_width $CANVAS, $WITH_OVERHANGS
           Returns the width of the glyph object, with overhangs if requested.

       glyph2cluster $GLYPH
           Returns the cluster that contains $GLYPH.

       glyphs
           Read-only accessor to the glyph indexes, see Structure above.

       glyph_lengths
           Returns an array where each glyph position is set to a number
           showing how many glyphs the cluster occupies at this position.

       index2cluster $INDEX, $ADVANCE = 0
           Returns the cluster that contains the character offset $INDEX.

           Set $ADVANCE to 1 if the RTL-dependent advance needs to be added to
           the resulting cluster.

       indexes
           Read-only accessor to the indexes, see Structure above.

       index_lengths
           Returns an array where each glyph position is set to a number
           showing how many characters the cluster occupies at this position.

       justify CANVAS, TEXT, WIDTH, %OPTIONS
           Umbrella call for "justify_interspace" if $OPTIONS{letter} or
           $OPTIONS{word} is set; for "justify_arabic" if $OPTIONS{kashida} is
           set; and for "justify_tabs" if $OPTIONS{tabs} is set.

           Returns a boolean flag indicating whether the glyph object was
           changed or not.

       justify_arabic CANVAS, TEXT, WIDTH, %OPTIONS
           Performs justification of Arabic TEXT with kashida to the given
           WIDTH; returns either a success flag, or a new text with explicit
           tatweel characters inserted.

              my $text = "\x{6a9}\x{634}\x{6cc}\x{62f}\x{647}";
              my $g = $canvas->text_shape($text) or return;
              $canvas->text_out($g, 10, 50);
              $g->justify_arabic($canvas, $text, 200) or return;
              $canvas->text_out($g, 10, 10);

           Inserts tatweels only between Arabic letters that did not form any
           ligatures in the glyph object, max one tatweel set per word (if
           any). Does not apply the justification if the letters in the word
           are rendered as LTR due to embedding or explicit shaping options;
           only does justification on RTL letters. If for some reason newly
           inserted tatweels do not form a monotonically increasing series
           after shaping, skips the justification in that word.

           Note: Does not use the JSTF font table; on Windows the results may
           be different from the native rendering.

           Options:

           If justification is found to be needed, eventual ligatures with
           newly inserted tatweel glyphs are resolved via a call to
           text_shape(%OPTIONS), so any needed shaping options, such as
           "language", may be passed there.

           as_text BOOL = 0
               If set, returns a new text with inserted tatweels, or undef if
               no justification is possible.

               If unset, runs in-place justification on the caller glyph
               object, and returns the boolean success flag.

           min_kashida INTEGER = 0
               Specifies the minimal width of a kashida strike to be inserted.

           kashida_width INTEGER
               During the calculation the width of a tatweel glyph is needed;
               unless supplied by this option, it is calculated dynamically.
               Also, when called in list context and succeeded, returns " 1,
               kashida_width " that can be reused in subsequent calls.

       justify_interspace CANVAS, TEXT, WIDTH, %OPTIONS
           Performs in-place inter-letter and/or inter-word justification of
           TEXT to the given WIDTH. Returns either a boolean flag indicating
           whether any change was made, or a new text with explicit space
           characters inserted.

           Options:

           as_text BOOL = 0
               If set, returns a new text with inserted spaces, or undef if no
               justification is possible.

               If unset, runs in-place justification on the caller glyph
               object, and returns the boolean success flag.

           letter BOOL = 1
               If set, runs inter-letter spacing on all glyphs.

           max_interletter FLOAT = 1.05
               When inter-letter spacing is enabled, it is applied first,
               and can take up to "$OPTIONS{max_interletter} * glyph_width"
               space.

               Inter-word spacing does not have such a limit, and in the worst
               case can produce two words moved to the left and to the right
               edges of the enclosing 0 - WIDTH-1 rectangle.

           space_width INTEGER
               "as_text" mode: during the calculation the width of the space
               glyph may be needed; unless supplied by $OPTIONS{space_width},
               it is calculated dynamically. Also, when called in list
               context and succeeded, returns " 1, space_width " that can be
               reused in subsequent calls.

           word BOOL = 1
               If set, runs inter-word spacing by extending the advances of
               all space glyphs.

           min_text_to_space_ratio FLOAT = 0.75
               If "word" is set, does not run inter-word justification if the
               text to space ratio is too small (i.e. does not spread the text
               too thin).

       justify_tabs CANVAS, TEXT, %OPTIONS
           Expands tabs as $OPTIONS{tabs} (default: 8) spaces.

           Needs the glyph and the advance of the space glyph to replace the
           tab glyph. If $OPTIONS{glyph} and $OPTIONS{width} are not
           specified, calculates them.

           Returns a boolean flag indicating whether any change was made. On
           success, if called in list context, also returns the space glyph ID
           and the space glyph width for eventual use in later calls.

       left_overhang
           First integer from the "overhangs" result.

       log2vis
           Returns a map of integers where each character position corresponds
           to a glyph position. The name is a vestige of pure fribidi
           shaping, where "log2vis" and "vis2log" were mapper functions with
           the same functionality.

       n_clusters
           Calculates how many clusters the object contains.

       new @ARRAYS
           Creates a new object. Not used directly, but rather from inside
           "text_shape" calls.

       new_array NAME
           Creates an array suitable for direct insertion into the object, if
           manual construction of the object is needed. For example, one may
           set a missing "fonts" array like this:

              $obj->[ Prima::Drawable::Glyphs::FONTS() ] = $obj->new_array('fonts');
              $obj->fonts->[0] = 1;

           The newly created array is filled with zeros.

       new_empty
           Creates a new empty object.

       overhangs
           Calculates two pixel widths for overhangs at the beginning and at
           the end of the glyph string. This is used in the emulation of a
           "get_text_width" call with the "to::AddOverhangs" flag.

       positions
           Read-only accessor to the positions array, see Structure above.

       reorder_text TEXT
           Returns a visual representation of "TEXT" assuming it was the input
           of the "text_shape" call that created the object.

       reverse
           Creates a new object that has all arrays reversed. Used for
           calculation of the pixel offset from the right end of a glyph
           string.

       right_overhang
           Second integer from the "overhangs" result.

       selection2range $CLUSTER_START, $CLUSTER_END
           Converts a cluster selection range into a text selection range.

       selection_chunks_clusters, selection_chunks_glyphs $START, $END
           Calculates a set of chunks of text that, given a text selection
           from positions $START to $END, each represent either a set of
           selected or a set of non-selected clusters/glyphs.

       selection_diff $OLD, $NEW
           Given a set of two chunk lists, in the format returned by
           "selection_chunks_clusters" or "selection_chunks_glyphs",
           calculates the list of chunks affected by the selection change. Can
           be used for efficient repaints when the user interactively changes
           the text selection, to redraw only the changed regions.

       selection_map_clusters, selection_map_glyphs $START, $END
           Same as "selection_chunks_XXX", but instead of RLE chunks returns
           a full array for each cluster/glyph, where each entry is a boolean
           value corresponding to whether that cluster/glyph is to be
           displayed as selected, or not.

       selection_walk $CHUNKS, $FROM, $TO = length, $SUB
           Walks the selection chunks array, as returned by
           "selection_chunks_XXX", between the $FROM and $TO clusters/glyphs,
           and for each chunk calls the provided "$SUB->($offset, $length,
           $selected)", where each call is passed two integers, the chunk
           offset and length, and a boolean flag indicating whether the chunk
           is selected or not.

           Can also be used on a result of "selection_diff", in which case
           the $selected flag is irrelevant.

       sub_text_out $CANVAS, $FROM, $LENGTH, $X, $Y
           Optimized version of "$CANVAS->text_out( $self->get_sub($FROM,
           $LENGTH), $X, $Y )".

       sub_text_wrap $CANVAS, $FROM, $LENGTH, $WIDTH, $OPT, $TABS
           Optimized version of "$CANVAS->text_wrap( $self->get_sub($FROM,
           $LENGTH), $WIDTH, $OPT, $TABS )". The result is also converted to
           chunks.

       text_length
           Returns the length of the text that was shaped and that produced
           the object.

       x2cluster $CANVAS, $X, $FROM, $LENGTH
           Given a sub-cluster from $FROM with size $LENGTH, calculates how
           many clusters would fit in width $X.

       _debug
           Dumps the glyph object content in a readable format.

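       The calling convention of "selection_walk" can be illustrated with a
       plain-Perl stand-in that walks an RLE chunk list. walk_chunks() below
       is hypothetical: it ignores the $FROM and $TO arguments of the real
       method and assumes that zero-length chunks are skipped, which the
       description above does not specify:

```perl
use strict;
use warnings;

# Hypothetical stand-in for selection_walk: calls $sub->($offset, $length,
# $selected) for every non-empty chunk. Assumption: empty chunks are skipped.
sub walk_chunks {
    my ($chunks, $sub) = @_;
    my ($offset, $selected) = (0, 0);    # chunk lists start with a non-selected run
    for my $length (@$chunks) {
        $sub->($offset, $length, $selected) if $length > 0;
        $offset += $length;
        $selected = 1 - $selected;
    }
    return;
}

# Chunks for the "123CBA" selection example: clusters 0-1 and 3-4 are selected
my @calls;
walk_chunks(
    [0, 2, 1, 2, 1],
    sub {
        my ($offset, $length, $selected) = @_;
        push @calls, sprintf '%d+%d:%s', $offset, $length,
            $selected ? 'selected' : 'plain';
    }
);

print "$_\n" for @calls;
```

       In a paint handler, such a callback would typically switch colors
       according to $selected and draw the chunk, for example with
       "sub_text_out".
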

EXAMPLES

       This section exists only to test proper rendering.

       Latin
           Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do
           eiusmod tempor incididunt ut labore et dolore magna aliqua.

              Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.

       Latin combining
           D̍üi̔s͙ a̸u̵t͏eͬ ịr͡u̍r͜e̥ d͎ǒl̋o̻rͫ i̮n̓
           r͐e̔p͊rͨe̾h̍e͐n̔ḋe͠r̕i̾t̅ ịn̷ vͅo̖lͦuͦpͧt̪ątͅe̪

              v̰e̷l̳i̯t̽ e̵s̼s̈e̮ ċi̵l͟l͙u͆m͂ d̿o̙lͭo͕r̀e̯ ḛu̅ fͩuͧg̦iͩa̓ť n̜u̼lͩl͠a̒ p̏a̽r̗i͆a͆t̳űr̀

       Cyrillic
           Lorem Ipsum используют потому, что тот обеспечивает более или менее
           стандартное заполнение шаблона.

              а также реальное распределение букв и пробелов в абзацах

       Hebrew
           זוהי עובדה מבוססת שדעתו של הקורא תהיה מוסחת על ידי טקטס קריא כאשר
           הוא יביט בפריסתו.

             המטרה בשימוש ב-Lorem Ipsum הוא שיש לו פחות או יותר תפוצה של אותיות, בניגוד למלל

       Arabic
           العديد من برامح النشر المكتبي وبرامح تحرير صفحات الويب تستخدم لوريم
           إيبسوم بشكل إفتراضي

             كنموذج عن النص، وإذا قمت بإدخال "lorem ipsum" في أي محرك بحث ستظهر العديد من

       Hindi
           Lorem Ipsum के अंश कई रूप में उपलब्ध हैं, लेकिन बहुमत को किसी अन्य
           रूप में परिवर्तन का सामना करना पड़ा है, हास्य डालना या क्रमरहित
           शब्द ,

             जो तनिक भी विश्वसनीय नहीं लग रहे हो. यदि आप Lorem Ipsum के एक अनुच्छेद का उपयोग करने जा रहे हैं, तो आप को यकीन दिला दें कि पाठ के मध्य में वहाँ कुछ भी शर्मनाक छिपा हुआ नहीं है.

       Chinese
           无可否认,当读者在浏览一个页面的排版时,难免会被可阅读的内容所分散注意力。

             Lorem Ipsum的目的就是为了保持字母多多少少标准及平

       Thai
           มีหลักฐานที่เป็นข้อเท็จจริงยืนยันมานานแล้ว
           ว่าเนื้อหาที่อ่านรู้เรื่องนั้นจะไปกวนสมาธิของคนอ่านให้เขวไปจากส่วนที้เป็น
           Layout เรานำ Lorem Ipsum
           มาใช้เพราะความที่มันมีการกระจายของตัวอักษรธรรมดาๆ แบบพอประมาณ
           ซึ่งเอามาใช้แทนการเขียนว่า ‘ตรงนี้เป็นเนื้อหา, ตรงนี้เป็นเนื้อหา'
           ได้ และยังทำให้มองดูเหมือนกับภาษาอังกฤษที่อ่านได้ปกติ
           ปัจจุบันมีแพ็กเกจของซอฟท์แวร์การทำสื่อสิ่งพิมพ์
           และซอฟท์แวร์การสร้างเว็บเพจ

              กวนสมาธิของคนอ่านให้เขวไปจากส่วนที้เป็น Layout เรานำ Lorem Ipsum

           (Note: libthai is required for text wrapping by the word boundary)

       Largest well-known grapheme cluster in Unicode
           ཧྐྵྨླྺྼྻྂ

           <http://archives.miloush.net/michkap/archive/2010/04/28/10002896.html>.

AUTHOR

       Dmitry Karasik, <dmitry@karasik.eu.org>.

SEE ALSO

       examples/bidi.pl

perl v5.36.0                      2023-03-20        Prima::Drawable::Glyphs(3)