Prima::Drawable::Glyphs(3)  User Contributed Perl Documentation  Prima::Drawable::Glyphs(3)

NAME

       Prima::Drawable::Glyphs - helper routines for bi-directional text input
       and complex scripts output

SYNOPSIS

          use Prima;
          $::application-> begin_paint;
          ‭$::application-> text_shape_out('אפס123', 0,0);

          ‭123ספא

DESCRIPTION

       The class implements an abstraction over a set of glyphs that can be
       rendered to represent text strings. Objects of the class are created
       and returned by "Prima::Drawable::text_shape" calls; see "text_shape"
       in Prima::Drawable for details. An object is a blessed array reference
       that can contain either two or four packed arrays of 16-bit integers,
       representing, respectively, a set of glyph indexes, a set of character
       indexes, a set of glyph advances, and a set of glyph position offsets
       per glyph. Additionally, the class implements several sets of helper
       routines that address common tasks when displaying glyph-based
       strings.

   Structure
       Each array is an instance of "Prima::array", an efficient plain-memory
       structure that provides a standard Perl interface over a string scalar
       filled with fixed-width integers.

       The following methods provide read-only access to these arrays:

       glyphs
           Contains a set of unsigned 16-bit integers where each is a glyph
           number corresponding to the font that was used when shaping the
           text. These glyph numbers are only applicable to that font. In
           vector fonts, zero is usually treated as the default glyph, used
           when shaping cannot map a character; in bitmap fonts this number is
           usually a "defaultChar".

           This array is recognized as a special case when it is passed to
           "text_out" or "get_text_width", which can process it without the
           other arrays. In this case, however, no special advances and glyph
           positions are taken into account.

           A glyph is not necessarily mapped to a single character, and quite
           often it is not, even in English left-to-right texts. For example,
           character combinations like "ff", "fi", "fl" can be mapped to
           single ligature glyphs. When right-to-left (RTL) text direction is
           taken into account, the glyph positions may change, too. See
           "indexes" below, which addresses the mapping of glyphs to
           characters.

       indexes
           Contains a set of unsigned 16-bit integers where each is an offset
           into the text that was used in shaping. Each glyph position thus
           points to the first character in the text that maps to the glyph.

           There can be more than one character per glyph, as in the above
           example with the "ff" ligature. There can also be cases with more
           than one character per more than one glyph, as in Indic scripts.
           In these cases it is easier to operate neither by character
           offsets nor by glyph offsets, but rather by clusters, where each
           is an individual syntax unit that contains one or more characters
           per one or more glyphs.

           In addition to the text offset, each index value can be flagged
           with the "to::RTL" bit, signifying that the character in question
           has RTL direction. It is not only Semitic characters from RTL
           languages that have this attribute set; spaces in these languages
           are normally assigned the RTL bit too, and sometimes numbers as
           well. Use of explicit direction control characters from the U+20XX
           block can result in any character being assigned or not assigned
           the RTL bit.

           The array has an extra item added to its end, the length of the
           text that was used in the shaping. This allows easy calculation of
           cluster lengths in characters, especially for the last cluster,
           because the difference between indexes is, basically, the cluster
           length.

           The array is not used for text drawing or calculation, but only for
           conversion between character, glyph, and cluster coordinates (see
           "Coordinates" below).

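           The cluster length arithmetic can be sketched in plain Perl,
           without Prima. This is only an illustration under stated
           assumptions: "RTL_FLAG" is a hypothetical stand-in for the
           "to::RTL" bit (assumed here to be the high bit of a 16-bit value;
           real code should use the constant itself), glyphs of one cluster
           are assumed to carry the same offset, and "cluster_lengths" is a
           made-up helper, not a method of the class.

```perl
use strict;
use warnings;

# Assumption: stand-in for Prima's to::RTL bit; use to::RTL in real code.
use constant RTL_FLAG => 0x8000;

# Takes the content of an "indexes" array, including the trailing text
# length, and returns [ offset, length ] pairs, one per cluster, in the
# order the clusters appear in the array.
sub cluster_lengths {
    my @indexes     = @_;
    my $text_length = pop @indexes;                     # the extra last item
    my @offsets     = map { $_ & ~RTL_FLAG } @indexes;  # drop the RTL bit

    # A cluster ends where the next cluster (in text order) begins,
    # or at the end of the text for the last one.
    my %seen;
    my @starts = sort { $a <=> $b } grep { !$seen{$_}++ } @offsets;
    my %length;
    for my $i ( 0 .. $#starts ) {
        my $next = $i < $#starts ? $starts[ $i + 1 ] : $text_length;
        $length{ $starts[$i] } = $next - $starts[$i];
    }

    # Glyphs that belong to one cluster share the same offset; report once.
    my ( @result, $prev );
    for my $offset (@offsets) {
        next if defined $prev && $prev == $offset;
        push @result, [ $offset, $length{$offset} ];
        $prev = $offset;
    }
    return @result;
}

# "office" shaped as o + ffi + c + e: the ligature covers 3 characters.
my @ltr = cluster_lengths( 0, 1, 4, 5, 6 );
print join( ' ', map { "$_->[0]:$_->[1]" } @ltr ), "\n";   # 0:1 1:3 4:1 5:1
```

           Sorting the offsets first is what makes the same subtraction work
           for RTL runs, where the array is in visual rather than text order.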
       advances
           Contains a set of unsigned 16-bit integers where each is the pixel
           distance that the glyph occupies. When the advances array is not
           present, or is filled through the "advances" option of
           "text_shape", an advance is basically the sum of the a, b, and c
           widths of a glyph. However, there are cases when, depending on the
           shaping input, these values can differ.

           One of those cases is combining graphemes, where text consisting of
           two characters, "A" and the combining grave accent U+0300, should
           be drawn as a single "À" symbol, but the font doesn't have that
           single glyph, only the two individual glyphs "A" and "`". The
           grave glyph has its own advance for standalone usage, but in this
           case it should be ignored, which is achieved by setting the advance
           of the "`" glyph to zero.

           The array content is respected by "text_out" and "get_text_width",
           and it can be changed at will to produce gaps in the text quite
           easily. For example, "Prima::Edit" uses that to display tab
           characters as spaces with 8x advance.

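           The effect of editing advances can be shown with plain Perl arrays
           standing in for the packed ones. This is a sketch only: the
           numbers are invented, "run_width" and "widen_tabs" are
           hypothetical helpers rather than methods of the class, and the
           indexes are assumed to carry no flag bits.

```perl
use strict;
use warnings;
use List::Util qw(sum0);

# Total pen movement of a glyph run is the sum of its advances.
sub run_width { return sum0(@_) }

# Returns a copy of the advances where every glyph that maps to a tab
# character gets the advance of $n spaces. $indexes must not carry
# flag bits here (mask them off first if they do).
sub widen_tabs {
    my ( $text, $indexes, $advances, $space_advance, $n ) = @_;
    my @result = @$advances;
    for my $i ( 0 .. $#result ) {
        $result[$i] = $n * $space_advance
            if substr( $text, $indexes->[$i], 1 ) eq "\t";
    }
    return @result;
}

# Hypothetical advances for the glyphs "A", "`" and "B"; zeroing the
# second one keeps the accent from moving the pen.
my @advances = ( 10, 6, 9 );
$advances[1] = 0;
print run_width(@advances), "\n";    # 19
```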
       positions
           Contains a set of pairs of signed 16-bit integers where each pair
           is an X and Y pixel offset for a glyph. As in the previous example
           with the "À" symbol, the grave glyph "`" may be positioned
           differently on the vertical axis, for example in the "À" and "à"
           graphemes.

           The array is respected by "text_out" (but not by "get_text_width").

       fonts
           Contains a set of unsigned 16-bit integers where each is an index
           in the font substitution list (see "fontMapperPalette" in
           Prima::Drawable). Zero means the current font.

           The font substitution is applied by "text_shape" when the
           "polyfont" option is set (it is by default), and when the shaper
           cannot match all fonts. If the current font contains all needed
           glyphs, this entry is not present at all.

           The array is respected by "text_out" and "get_text_width".

   Coordinates
       In addition to natural character coordinates, where each index is an
       offset that can be used directly in the Perl "substr" function, this
       class offers two additional coordinate systems that help abstract the
       object data for display and navigation.

       The glyph coordinate system is a rather straightforward copy of the
       character coordinate system, where each number is an offset in the
       "glyphs" array. Similarly, these offsets can be used to address
       individual glyphs, indexes, advances, and positions. However, these
       are not easy to use when one needs, for example, to select a grapheme
       with the mouse, or to break a set of glyphs in such a way that a
       grapheme is not broken. These tasks are easier to manage in the
       cluster coordinate system.

       The cluster coordinates are a virtually superimposed set of offsets
       where each corresponds to a set of one or more characters displayed by
       one or more glyphs. The most useful functions below operate in this
       system.

   Selection
       Practically, the most useful coordinates for implementing selection
       are either character or cluster coordinates, but not glyph
       coordinates. Character-based selection makes extraction or replacement
       of the selected text trivial, while cluster-based selection makes it
       easier to manipulate the selection itself (for example, with
       Shift-arrow keys).

       The class supports both, by operating on selection maps or selection
       chunks, which represent the same information in different ways. For
       example, consider a number embedded in bidi text. For the sake of
       clarity I'll use Latin characters here. Let's have a text scalar
       containing these characters:

          ABC123

       where ABC is right-to-left text, and which, when rendered on screen,
       should be displayed as

          123CBA

       (and the index array is (3,4,5,2,1,0) ).

       Next, the user clicks the mouse between A and B (at text offset 1),
       then drags the mouse to the left, and finally stops between characters
       2 and 3 (text offset 4). The resulting selection should then not be, as
       one might naively expect, this:

          123CBA
          __^^^_

       but this instead:

          123CBA
          ^^_^^_

       because the next character after C is 1, and the range of the selected
       sub-text is from characters 1 to 4.

       The class offers to encode such information in a map, i.e. an array of
       integers "1,1,0,1,1,0", where each entry is either 1 or 0 depending on
       whether the cluster is or is not selected. Alternatively, the same
       information can be encoded in chunks, or RLE sets, as the array
       "0,2,1,2,1", where the first integer is the number of non-selected
       clusters to display, the second is the number of selected clusters,
       the third is the number of non-selected clusters again, and so on. If
       the first cluster belongs to a selected chunk, the first integer in
       the result is set to 0.

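       The two encodings for the ABC123 example can be reproduced in a few
       lines of plain Perl. This is a sketch for illustration, not the code
       the class uses: "selection_map" and "map_to_chunks" are hypothetical
       helpers, and the index array is assumed to hold one glyph per cluster
       with no flag bits and no trailing text length.

```perl
use strict;
use warnings;

# Marks every entry whose character offset lies in [$from, $to].
sub selection_map {
    my ( $indexes, $from, $to ) = @_;
    return map { ( $_ >= $from && $_ <= $to ) ? 1 : 0 } @$indexes;
}

# Run-length encodes a map; the first chunk always counts non-selected
# entries, so it is 0 when the map starts with a selected one.
sub map_to_chunks {
    my @chunks   = (0);
    my $selected = 0;
    for my $flag (@_) {
        if ( ( $flag ? 1 : 0 ) == $selected ) {
            $chunks[-1]++;
        }
        else {
            push @chunks, 1;
            $selected = 1 - $selected;
        }
    }
    return @chunks;
}

my @map    = selection_map( [ 3, 4, 5, 2, 1, 0 ], 1, 4 );
my @chunks = map_to_chunks(@map);
print "@map\n";       # 1 1 0 1 1 0
print "@chunks\n";    # 0 2 1 2 1
```

       The map is convenient for per-cluster lookups, while the chunks are
       convenient for drawing, because each chunk is one contiguous run that
       can be painted in a single call.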
   Bidi input
       When sending input to a widget in order to type in text, the otherwise
       trivial task of figuring out at which position the text should be
       inserted (or removed, for that matter) becomes interesting when there
       are characters with mixed direction.

       For example, it is indeed trivial, when the Latin text is "AB" and the
       cursor is positioned between "A" and "B", to figure out that whenever
       the user types "C", the result should become "ACB". Likewise, when the
       text is RTL and both the text and the input are Arabic, the result is
       the same. However, when for example the text is "A1", which is
       displayed as "1A" because of RTL shaping, and the cursor is positioned
       between "1" (LTR) and "A" (RTL), it is not clear whether the new input
       should be appended after "1" and become "A1C", or after "A" and
       become, correspondingly, "AC1".

       There is no easy solution to this problem, and different programs
       approach it differently; some go as far as to provide two cursors, one
       for each direction. The class offers its own solution that uses some
       primitive heuristics to detect whether the cursor belongs to the left
       or to the right glyph. This is an area that can be enhanced, and any
       help from native users of RTL languages would be greatly appreciated.

API

       abc $CANVAS, $INDEX
           Returns the a, b, c metrics of the glyph $INDEX.

       advances
           Read-only accessor to the advances array, see Structure above.

       clone
           Clones the object.

       cluster2glyph $FROM, $LENGTH
           Maps a range of clusters starting at $FROM with size $LENGTH into
           the corresponding range of glyphs. If $LENGTH is undefined, the
           range extends from $FROM to the end of the object.

       cluster2index $CLUSTER
           Returns the character offset of the first character in cluster
           $CLUSTER.

           Note: the result may contain the "to::RTL" flag.

       cluster2range $CLUSTER
           Returns the character offset of the first character in cluster
           $CLUSTER and the number of characters in the cluster.

       clusters
           Returns an array of integers where each is the offset of the first
           character of a cluster.

       cursor2offset $AT_CLUSTER, $PREFERRED_RTL
           Given a cursor positioned next to the cluster $AT_CLUSTER, runs
           simple heuristics to see what character offset it corresponds to.
           $PREFERRED_RTL is used when the object data are not enough.

           See "Bidi input" above.

       def $CANVAS, $INDEX
           Returns the d, e, f metrics of the glyph $INDEX.

       fonts
           Read-only accessor to the font indexes, see Structure above.

       get_box $CANVAS
           Returns the box metrics of the glyph object.

           See "get_text_box" in Prima::Drawable.

       get_sub $FROM, $LENGTH
           Extracts and clones a new object that contains data from cluster
           offset $FROM, with cluster length $LENGTH.

       get_sub_box $CANVAS, $FROM, $LENGTH
           Calculates the box metrics of a glyph string from the cluster
           $FROM with size $LENGTH.

       get_sub_width $CANVAS, $FROM, $LENGTH
           Calculates the pixel width of a glyph string from the cluster
           $FROM with size $LENGTH.

       get_width $CANVAS, $WITH_OVERHANGS
           Returns the width of the glyph object, with overhangs if requested.

       glyph2cluster $GLYPH
           Returns the cluster that contains $GLYPH.

       glyphs
           Read-only accessor to the glyph indexes, see Structure above.

       glyph_lengths
           Returns an array where each glyph position is set to a number
           showing how many glyphs the cluster occupies at this position.

       index2cluster $INDEX
           Returns the cluster that contains the character offset $INDEX.

       indexes
           Read-only accessor to the indexes, see Structure above.

       index_lengths
           Returns an array where each glyph position is set to a number
           showing how many characters the cluster occupies at this position.

       left_overhang
           The first integer from the "overhangs" result.

       log2vis
           Returns a map of integers where each character position corresponds
           to a glyph position. The name is a holdover from pure fribidi
           shaping, where "log2vis" and "vis2log" were mapper functions with
           the same functionality.

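           The idea of a character-to-glyph map can be illustrated by
           inverting an index array in plain Perl. This is a sketch, not the
           implementation of "log2vis": "char_to_glyph_map" is a hypothetical
           helper, "RTL_FLAG" is a stand-in for "to::RTL" (assumed to be the
           high bit of a 16-bit value), and the input is assumed to exclude
           the trailing text length.

```perl
use strict;
use warnings;

# Assumption: stand-in for Prima's to::RTL bit; use to::RTL in real code.
use constant RTL_FLAG => 0x8000;

# For every character offset named in the index array, records the
# position of the first glyph that points at it.
sub char_to_glyph_map {
    my @indexes = @_;
    my @map;
    for my $glyph ( 0 .. $#indexes ) {
        my $offset = $indexes[$glyph] & ~RTL_FLAG;
        $map[$offset] //= $glyph;    # keep the first glyph of a cluster
    }
    return @map;
}

my @map = char_to_glyph_map( 3, 4, 5, 2, 1, 0 );
print "@map\n";    # 5 4 3 0 1 2
```

           In this sketch, characters that are not the first in their cluster
           (for example the second "f" of an "ff" ligature) are left
           undefined; the real method may fill such positions differently.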
       n_clusters
           Calculates how many clusters the object contains.

       new @ARRAYS
           Creates a new object. Not used directly, but rather from inside
           "text_shape" calls.

       new_array NAME
           Creates an array suitable for direct insertion into the object, if
           manual construction of the object is needed. For example, one may
           set a missing "fonts" array like this:

              $obj->[ Prima::Drawable::Glyphs::FONTS() ] = $obj->new_array('fonts');
              $obj->fonts->[0] = 1;

           The newly created array is filled with zeros.

       new_empty
           Creates a new empty object.

       overhangs
           Calculates two pixel widths for overhangs at the beginning and at
           the end of the glyph string. This is used in emulation of a
           "get_text_width" call with the "to::AddOverhangs" flag.

       positions
           Read-only accessor to the positions array, see Structure above.

       reorder_text TEXT
           Returns a visual representation of "TEXT" assuming it was the input
           of the "text_shape" call that created the object.

       reverse
           Creates a new object that has all arrays reversed. Used for
           calculation of pixel offsets from the right end of a glyph string.

       right_overhang
           The second integer from the "overhangs" result.

       selection2range $CLUSTER_START, $CLUSTER_END
           Converts a cluster selection range into a text selection range.

       selection_chunks_clusters, selection_chunks_glyphs $START, $END
           Calculates a set of chunks of text that, given a text selection
           from positions $START to $END, each represent a set of either
           selected or non-selected clusters/glyphs.

       selection_diff $OLD, $NEW
           Given two chunk lists, in the format returned by
           "selection_chunks_clusters" or "selection_chunks_glyphs",
           calculates the list of chunks affected by the selection change. Can
           be used for efficient repaints when the user interactively changes
           the text selection, to redraw only the changed regions.

       selection_map_clusters, selection_map_glyphs $START, $END
           Same as "selection_chunks_XXX", but instead of RLE chunks returns
           a full array for each cluster/glyph, where each entry is a boolean
           value corresponding to whether that cluster/glyph is to be
           displayed as selected or not.

       selection_walk $CHUNKS, $FROM, $TO = length, $SUB
           Walks the selection chunks array, as returned by
           "selection_chunks_XXX", between the $FROM and $TO clusters/glyphs,
           and for each chunk calls the provided "$SUB->($offset, $length,
           $selected)", where each call receives two integers, the chunk
           offset and length, and a boolean flag telling whether the chunk is
           selected or not.

           Can also be used on the result of "selection_diff", in which case
           the $selected flag is irrelevant.

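           What walking a chunk list means can be sketched in plain Perl.
           This is a simplified illustration, not the method itself:
           "walk_chunks" is a hypothetical helper that ignores the $FROM and
           $TO clipping, and skipping zero-length chunks is an assumption of
           this sketch.

```perl
use strict;
use warnings;

# Calls $sub->($offset, $length, $selected) for every non-empty chunk.
# Chunks alternate between non-selected and selected, starting with
# non-selected.
sub walk_chunks {
    my ( $chunks, $sub ) = @_;
    my ( $offset, $selected ) = ( 0, 0 );
    for my $length (@$chunks) {
        $sub->( $offset, $length, $selected ) if $length > 0;
        $offset += $length;
        $selected = 1 - $selected;
    }
}

my @calls;
walk_chunks( [ 0, 2, 1, 2, 1 ], sub { push @calls, "@_" } );
print join( '; ', @calls ), "\n";    # 0 2 1; 2 1 0; 3 2 1; 5 1 0
```

           A drawing routine would typically use the callback to paint each
           run with either the normal or the highlighted colors.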
       sub_text_out $CANVAS, $FROM, $LENGTH, $X, $Y
           Optimized version of "$CANVAS->text_out( $self->get_sub($FROM,
           $LENGTH), $X, $Y )".

       sub_text_wrap $CANVAS, $FROM, $LENGTH, $WIDTH, $OPT, $TABS
           Optimized version of "$CANVAS->text_wrap( $self->get_sub($FROM,
           $LENGTH), $WIDTH, $OPT, $TABS )". The result is also converted to
           chunks.

       text_length
           Returns the length of the text that was shaped and that produced
           the object.

       x2cluster $CANVAS, $X, $FROM, $LENGTH
           Given a sub-cluster from $FROM with size $LENGTH, calculates how
           many clusters would fit in width $X.

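           The fitting calculation can be pictured with a plain list of
           per-cluster pixel widths. This is a sketch under assumptions: the
           widths are invented, "clusters_fitting" is a hypothetical helper,
           and the real method may treat edge cases (such as overhangs or a
           partially fitting cluster) differently.

```perl
use strict;
use warnings;

# Counts how many consecutive clusters, starting at $from and looking
# at no more than $length of them, fit into $x pixels.
sub clusters_fitting {
    my ( $widths, $x, $from, $length ) = @_;
    $from   //= 0;
    $length //= @$widths - $from;
    my ( $used, $count ) = ( 0, 0 );
    for my $i ( $from .. $from + $length - 1 ) {
        last if $used + $widths->[$i] > $x;
        $used += $widths->[$i];
        $count++;
    }
    return $count;
}

print clusters_fitting( [ 10, 10, 4, 12 ], 25 ), "\n";    # 3
```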

EXAMPLES

       This section is here only to test proper rendering.

       Latin
           Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do
           eiusmod tempor incididunt ut labore et dolore magna aliqua.

              Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.

       Latin combining
           D̍üi̔s͙ a̸u̵t͏eͬ ịr͡u̍r͜e̥ d͎ǒl̋o̻rͫ i̮n̓
           r͐e̔p͊rͨe̾h̍e͐n̔ḋe͠r̕i̾t̅ ịn̷ vͅo̖lͦuͦpͧt̪ątͅe̪

              v̰e̷l̳i̯t̽ e̵s̼s̈e̮ ċi̵l͟l͙u͆m͂ d̿o̙lͭo͕r̀e̯ ḛu̅ fͩuͧg̦iͩa̓ť n̜u̼lͩl͠a̒ p̏a̽r̗i͆a͆t̳űr̀

       Cyrillic
           Lorem Ipsum используют потому, что тот обеспечивает более или менее
           стандартное заполнение шаблона.

           а также реальное распределение букв и пробелов в абзацах

       Hebrew
           זוהי עובדה מבוססת שדעתו של הקורא תהיה מוסחת על ידי טקטס קריא כאשר
           הוא יביט בפריסתו.

             המטרה בשימוש ב-Lorem Ipsum הוא שיש לו פחות או יותר תפוצה של אותיות, בניגוד למלל

       Arabic
           العديد من برامح النشر المكتبي وبرامح تحرير صفحات الويب تستخدم لوريم
           إيبسوم بشكل إفتراضي

             كنموذج عن النص، وإذا قمت بإدخال "lorem ipsum" في أي محرك بحث ستظهر العديد من

       Hindi
           Lorem Ipsum के अंश कई रूप में उपलब्ध हैं, लेकिन बहुमत को किसी अन्य
           रूप में परिवर्तन का सामना करना पड़ा है, हास्य डालना या क्रमरहित
           शब्द ,

             जो तनिक भी विश्वसनीय नहीं लग रहे हो. यदि आप Lorem Ipsum के एक अनुच्छेद का उपयोग करने जा रहे हैं, तो आप को यकीन दिला दें कि पाठ के मध्य में वहाँ कुछ भी शर्मनाक छिपा हुआ नहीं है.

       Chinese
           无可否认,当读者在浏览一个页面的排版时,难免会被可阅读的内容所分散注意力。

             Lorem Ipsum的目的就是为了保持字母多多少少标准及平

       Largest well-known grapheme cluster in Unicode
           ཧྐྵྨླྺྼྻྂ

           <http://archives.miloush.net/michkap/archive/2010/04/28/10002896.html>.

AUTHOR

       Dmitry Karasik, <dmitry@karasik.eu.org>.

SEE ALSO

       examples/bidi.pl

perl v5.32.0                      2020-07-28        Prima::Drawable::Glyphs(3)