Util(3) User Contributed Perl Documentation Util(3)

NAME
       Lingua::KO::Hangul::Util - utility functions for Hangul in Unicode

SYNOPSIS
         use Lingua::KO::Hangul::Util qw(:all);

         decomposeSyllable("\x{AC00}");        # "\x{1100}\x{1161}"
         composeSyllable("\x{1100}\x{1161}");  # "\x{AC00}"
         decomposeJamo("\x{1101}");            # "\x{1100}\x{1100}"
         composeJamo("\x{1100}\x{1100}");      # "\x{1101}"

         getHangulName(0xAC00);                  # "HANGUL SYLLABLE GA"
         parseHangulName("HANGUL SYLLABLE GA");  # 0xAC00

DESCRIPTION
       A Hangul syllable consists of Hangul jamo (Hangul letters).

       Hangul letters are classified into three classes:

         CHOSEONG  (the initial sound) as a leading consonant (L),
         JUNGSEONG (the medial sound)  as a vowel (V),
         JONGSEONG (the final sound)   as a trailing consonant (T).

       Any Hangul syllable is a composition of (i) L + V, or (ii) L + V + T.
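
The L + V (+ T) structure maps onto code points by simple arithmetic in the
Unicode Standard. The following is a minimal sketch of that arithmetic in
plain Perl; it does not use this module, and compose_code_point is a
hypothetical helper name chosen for illustration, not one of the module's
functions.

```perl
use strict;
use warnings;

# Constants from the Unicode Hangul syllable composition arithmetic.
use constant {
    SBASE  => 0xAC00,   # first precomposed syllable (GA)
    LBASE  => 0x1100,   # first leading consonant
    VBASE  => 0x1161,   # first vowel
    TBASE  => 0x11A7,   # one before the first trailing consonant
    VCOUNT => 21,       # number of vowels
    TCOUNT => 28,       # number of trailing consonants, plus "none"
};

# Hypothetical helper (an assumption for illustration only):
# compute the syllable code point for L + V, or L + V + T.
sub compose_code_point {
    my ($l, $v, $t) = @_;
    my $l_index = $l - LBASE;
    my $v_index = $v - VBASE;
    my $t_index = defined $t ? $t - TBASE : 0;   # 0 means no trailing jamo
    return SBASE + ($l_index * VCOUNT + $v_index) * TCOUNT + $t_index;
}

printf "U+%04X\n", compose_code_point(0x1100, 0x1161);          # L + V
printf "U+%04X\n", compose_code_point(0x1100, 0x1173, 0x11AF);  # L + V + T
```

The two calls correspond to HANGUL SYLLABLE GA (U+AC00) and HANGUL SYLLABLE
GEUL (U+AE00), the same examples used elsewhere in this document.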

   Composition and Decomposition
       "$resultant_string = decomposeSyllable($string)"
           It decomposes a precomposed syllable ("LV" or "LVT") into a
           sequence of conjoining jamo ("L + V" or "L + V + T") and returns
           the result as a string.

           Any characters other than Hangul syllables are not affected.

       "$resultant_string = composeSyllable($string)"
           It composes a sequence of conjoining jamo ("L + V" or "L + V + T")
           into a precomposed syllable ("LV" or "LVT") if possible, and
           returns the result as a string. A syllable "LV" followed by a
           final jamo "T" is also composed.

           Any characters other than Hangul jamo and syllables are not
           affected.
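
A short usage sketch of the two functions above, assuming the module is
installed. The helper code_points is hypothetical and exists only to print
strings as code points, which avoids wide-character warnings on output.

```perl
use strict;
use warnings;
use Lingua::KO::Hangul::Util qw(decomposeSyllable composeSyllable);

# Hypothetical display helper: format a string as "U+XXXX U+XXXX ...".
sub code_points {
    my ($string) = @_;
    return join ' ', map { sprintf('U+%04X', ord($_)) } split(//, $string);
}

my $syllables = "\x{D55C}\x{AE00}";             # HAN, GEUL (precomposed)
my $jamo      = decomposeSyllable($syllables);  # conjoining jamo
my $roundtrip = composeSyllable($jamo);         # back to precomposed form
my $lv_plus_t = composeSyllable("\x{AC00}\x{11A8}");  # LV syllable + T jamo

print code_points($jamo),      "\n";
print code_points($roundtrip), "\n";
print code_points($lv_plus_t), "\n";
```

Composing the decomposed string yields the original syllables again, and the
last call shows the "LV" plus "T" case described for composeSyllable.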

       "$resultant_string = decomposeJamo($string)"
           It decomposes a complex jamo into a sequence of simple jamo if
           possible, and returns the result as a string. Any characters
           other than complex jamo are not affected.

             e.g.
               CHOSEONG SIOS-PIEUP  to  CHOSEONG SIOS + PIEUP
               JUNGSEONG AE         to  JUNGSEONG A + I
               JUNGSEONG WE         to  JUNGSEONG U + EO + I
               JONGSEONG SSANGSIOS  to  JONGSEONG SIOS + SIOS

       "$resultant_string = composeJamo($string)"
           It composes a sequence of simple jamo ("L1 + L2", "V1 + V2 + V3",
           etc.) into a complex jamo if possible, and returns the result as
           a string. Any characters other than simple jamo are not affected.

             e.g.
               CHOSEONG SIOS + PIEUP   to  CHOSEONG SIOS-PIEUP
               JUNGSEONG A + I         to  JUNGSEONG AE
               JUNGSEONG U + EO + I    to  JUNGSEONG WE
               JONGSEONG SIOS + SIOS   to  JONGSEONG SSANGSIOS
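
The JUNGSEONG WE example can be tried as follows. This is a sketch that
assumes the module is installed; the code points (U+1170 for WE, and U+116E,
U+1165, U+1175 for U, EO, I) come from the Unicode character names rather
than from this document.

```perl
use strict;
use warnings;
use Lingua::KO::Hangul::Util qw(decomposeJamo composeJamo);

my $we      = "\x{1170}";             # HANGUL JUNGSEONG WE (complex jamo)
my $simple  = decomposeJamo($we);     # expected to be U + EO + I
my $complex = composeJamo($simple);   # expected to be WE again

printf "%d simple jamo after decomposition\n", length $simple;
print $complex eq $we ? "recomposed to WE\n" : "not recomposed\n";
```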

       "$resultant_string = decomposeFull($string)"
           It decomposes a syllable or a complex jamo into a sequence of
           simple jamo. Equivalent to
           "decomposeJamo(decomposeSyllable($string))".
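
The stated equivalence can be checked directly. A minimal sketch, assuming
the module is installed; U+AE4C (HANGUL SYLLABLE GGA) is chosen here because
its leading consonant is a complex jamo.

```perl
use strict;
use warnings;
use Lingua::KO::Hangul::Util qw(decomposeSyllable decomposeJamo decomposeFull);

my $gga      = "\x{AE4C}";    # HANGUL SYLLABLE GGA: SSANGKIYEOK + A
my $one_step = decomposeFull($gga);
my $two_step = decomposeJamo(decomposeSyllable($gga));

print $one_step eq $two_step ? "same result\n" : "different result\n";
printf "%d simple jamo\n", length $one_step;
```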

   Composition and Decomposition (Old interface, deprecated!)
       "$string_decomposed = decomposeHangul($code_point)"
       "@codepoints = decomposeHangul($code_point)"
           If the specified code point is a Hangul syllable, it returns the
           decomposition as a list of code points (in list context) or as a
           string (in scalar context).

               decomposeHangul(0xAC00) # U+AC00 is HANGUL SYLLABLE GA.
               # returns "\x{1100}\x{1161}" or (0x1100, 0x1161);

               decomposeHangul(0xAE00) # U+AE00 is HANGUL SYLLABLE GEUL.
               # returns "\x{1100}\x{1173}\x{11AF}" or (0x1100, 0x1173, 0x11AF);

           Otherwise, it returns false (an empty string or an empty list).

               decomposeHangul(0x0041) # outside Hangul syllables
               # returns an empty string or an empty list.

       "$string_composed = composeHangul($src_string)"
       "@code_points_composed = composeHangul($src_string)"
           Any sequence of an initial jamo "L" and a medial jamo "V" is
           composed into a syllable "LV"; then any sequence of a syllable
           "LV" and a final jamo "T" is composed into a syllable "LVT".

           Any characters other than Hangul jamo and syllables are not
           affected.

               composeHangul("\x{1100}\x{1173}\x{11AF}.")
               # returns "\x{AE00}." or (0xAE00,0x2E);

       "$code_point_composite = getHangulComposite($code_point_here,
       $code_point_next)"
           It returns the code point of the composite if the two code
           points, $code_point_here and $code_point_next, are both Hangul
           and composable.

           Otherwise, it returns "undef".
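
One way getHangulComposite might be used is to fold a list of code points
pairwise. The sketch below assumes the module is installed;
compose_pairwise is a hypothetical helper written for this example and is
not provided by the module.

```perl
use strict;
use warnings;
use Lingua::KO::Hangul::Util qw(getHangulComposite);

# Hypothetical helper: merge each code point into the previous one
# whenever getHangulComposite reports that the pair is composable.
sub compose_pairwise {
    my @out;
    for my $cp (@_) {
        my $composite = @out ? getHangulComposite($out[-1], $cp) : undef;
        if (defined $composite) { $out[-1] = $composite }
        else                    { push @out, $cp }
    }
    return @out;
}

# L + V + T followed by a full stop, as in the composeHangul example.
my @composed = compose_pairwise(0x1100, 0x1173, 0x11AF, 0x2E);
print join(' ', map { sprintf('U+%04X', $_) } @composed), "\n";
```

The first two code points compose into an "LV" syllable, which then composes
with the final jamo; the full stop is not composable and is kept as is.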

   Hangul Syllable Name
       The following functions handle only precomposed Hangul syllables
       (from "U+AC00" to "U+D7A3"), not Hangul jamo or other Hangul-related
       characters.

       Names of Hangul syllables have the format "HANGUL SYLLABLE %s".

       "$name = getHangulName($code_point)"
           If the specified code point is a Hangul syllable, it returns its
           name; otherwise it returns undef.

               getHangulName(0xAC00) returns "HANGUL SYLLABLE GA";
               getHangulName(0x0041) returns undef.

       "$codepoint = parseHangulName($name)"
           If the specified name is the name of a Hangul syllable, it
           returns its code point; otherwise it returns undef.

               parseHangulName("HANGUL SYLLABLE GEUL") returns 0xAE00;

               parseHangulName("LATIN SMALL LETTER A") returns undef;

               parseHangulName("HANGUL SYLLABLE PERL") returns undef;
               # Regrettably, HANGUL SYLLABLE PERL does not exist :-)
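
The "%s" part of a syllable name is the concatenation of the short names of
its jamo, as defined by the Unicode Standard. The following plain-Perl
sketch shows that rule without using this module; hangul_name is a
hypothetical stand-in written for illustration and is not claimed to be how
getHangulName is implemented.

```perl
use strict;
use warnings;

# Jamo short names from the Unicode Hangul syllable naming rule.
my @L_NAME = (qw(G GG N D DD R M B BB S SS), '', qw(J JJ C K T P H));
my @V_NAME = qw(A AE YA YAE EO E YEO YE O WA WAE OE YO U WEO WE WI YU EU YI I);
my @T_NAME = ('', qw(G GG GS N NJ NH D L LG LM LB LS LT LP LH
                     M B BS S SS NG J C K T P H));

# Hypothetical helper (an assumption for illustration only).
sub hangul_name {
    my ($cp) = @_;
    return undef if $cp < 0xAC00 || $cp > 0xD7A3;   # not a precomposed syllable
    my $index = $cp - 0xAC00;
    my $l = int($index / (21 * 28));        # leading consonant index
    my $v = int($index % (21 * 28) / 28);   # vowel index
    my $t = $index % 28;                    # trailing consonant index (0 = none)
    return 'HANGUL SYLLABLE ' . $L_NAME[$l] . $V_NAME[$v] . $T_NAME[$t];
}

my $geul = hangul_name(0xAE00);
my $hih  = hangul_name(0xD7A3);
print "$geul\n$hih\n";
```

With the module installed, getHangulName and parseHangulName cover both
directions of this mapping.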

   Standard Korean Syllable Block
       A standard Korean syllable block consists of "L+ V+ T*" (a sequence
       of one or more L, one or more V, and zero or more T) according to the
       conjoining jamo behavior revised in Unicode 3.2 (cf. UAX #28). A
       sequence of "L" followed by "T" is not a syllable block without "V",
       but consists of two nonstandard syllable blocks: one without "V", and
       another without "L" and "V".

       "$bool = isStandardForm($string)"
           It returns a boolean indicating whether the string is encoded in
           the standard form: true only if the string contains no
           nonstandard sequence.

       "$resultant_string = insertFiller($string)"
           It transforms the string into standard form by inserting fillers
           into each nonstandard syllable block, and returns the result as a
           string. A Choseong filler ("Lf", "U+115F") is inserted into a
           syllable block without "L". A Jungseong filler ("Vf", "U+1160")
           is inserted into a syllable block without "V".
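
A usage sketch of the two functions above, assuming the module is
installed. It only reports whether each string is in standard form and does
not assume the exact filler sequence that insertFiller produces.

```perl
use strict;
use warnings;
use Lingua::KO::Hangul::Util qw(isStandardForm insertFiller);

my $standard    = "\x{1100}\x{1161}\x{11A8}";   # L V T
my $nonstandard = "\x{1100}\x{11A8}";           # L T, with no V in between
my $repaired    = insertFiller($nonstandard);

for my $case ([standard    => $standard],
              [nonstandard => $nonstandard],
              [repaired    => $repaired]) {
    my ($label, $string) = @$case;
    printf "%-12s %s\n", $label,
        isStandardForm($string) ? 'standard form' : 'not standard form';
}
```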

       "$type = getSyllableType($code_point)"
           It returns the Hangul syllable type (cf. HangulSyllableType.txt)
           for the specified code point as a string: "L" for leading jamo,
           "V" for vowel jamo, "T" for trailing jamo, "LV" for LV syllables,
           "LVT" for LVT syllables, and "NA" for other code points (as Not
           Applicable).
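
A short sketch of getSyllableType with one sample code point per type,
assuming the module is installed. The sample code points are chosen for this
example.

```perl
use strict;
use warnings;
use Lingua::KO::Hangul::Util qw(getSyllableType);

# One sample per type: L, V, T, LV, LVT, and a non-Hangul character.
my @samples = (0x1100, 0x1161, 0x11A8, 0xAC00, 0xAC01, 0x0041);
my %type_of = map { ($_ => getSyllableType($_)) } @samples;

printf "U+%04X %s\n", $_, $type_of{$_} for @samples;
```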

EXPORT
       By default:

           decomposeHangul
           composeHangul
           getHangulName
           parseHangulName
           getHangulComposite

       On request:

           decomposeSyllable
           composeSyllable
           decomposeJamo
           composeJamo
           decomposeFull
           isStandardForm
           insertFiller
           getSyllableType
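
The difference between the two lists shows up in the import line. A minimal
sketch, assuming the module is installed:

```perl
use strict;
use warnings;

# Default import: only the functions listed under "By default".
use Lingua::KO::Hangul::Util;

# On-request functions must be named explicitly (or imported with :all).
use Lingua::KO::Hangul::Util qw(decomposeFull getSyllableType);

my $name = getHangulName(0xAC00);     # exported by default
my $type = getSyllableType(0xAC00);   # exported on request
print "$name is of type $type\n";
```

Importing with ":all", as in the synopsis, makes both groups available at
once.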

CAVEAT
       This module does not support Hangul jamo assigned in Unicode 5.2.0
       (2009).

       A list of Hangul characters this module supports:

           1100..1159 ; 1.1 #    [90] HANGUL CHOSEONG KIYEOK..HANGUL CHOSEONG YEORINHIEUH
           115F..11A2 ; 1.1 #    [68] HANGUL CHOSEONG FILLER..HANGUL JUNGSEONG SSANGARAEA
           11A8..11F9 ; 1.1 #    [82] HANGUL JONGSEONG KIYEOK..HANGUL JONGSEONG YEORINHIEUH
           AC00..D7A3 ; 2.0 # [11172] HANGUL SYLLABLE GA..HANGUL SYLLABLE HIH

AUTHOR
       SADAHIRO Tomoyuki <SADAHIRO@cpan.org>

       Copyright(C) 2001, 2003, 2005, SADAHIRO Tomoyuki. Japan. All rights
       reserved.

       This module is free software; you can redistribute it and/or modify
       it under the same terms as Perl itself.

SEE ALSO
       Unicode Normalization Forms (UAX #15)
           <http://www.unicode.org/reports/tr15/>

       Conjoining Jamo Behavior (revision) in UAX #28
           <http://www.unicode.org/reports/tr28/#3_11_conjoining_jamo_behavior>

       Hangul Syllable Type
           <http://www.unicode.org/Public/UNIDATA/HangulSyllableType.txt>

       Jamo Decomposition in Old Unicode
           <http://www.unicode.org/Public/2.1-Update3/UnicodeData-2.1.8.txt>

       ISO/IEC JTC1/SC22/WG20 N954
           Paper by K. KIM: New canonical decomposition and composition
           processes for Hangeul

           <http://std.dkuug.dk/JTC1/SC22/WG20/docs/N954.PDF>

           (summary: <http://std.dkuug.dk/JTC1/SC22/WG20/docs/N953.PDF>)
           (cf. <http://std.dkuug.dk/JTC1/SC22/WG20/docs/documents.html>)



perl v5.36.0 2023-01-20 Util(3)