Unicode::Collate(3pm)  Perl Programmers Reference Guide  Unicode::Collate(3pm)

NAME
       Unicode::Collate - Unicode Collation Algorithm

SYNOPSIS
         use Unicode::Collate;

         #construct
         $Collator = Unicode::Collate->new(%tailoring);

         #sort
         @sorted = $Collator->sort(@not_sorted);

         #compare
         $result = $Collator->cmp($a, $b); # returns 1, 0, or -1.

         # If %tailoring is false (i.e. empty),
         # $Collator should do the default collation.

DESCRIPTION
       This module is an implementation of Unicode Technical Standard #10
       (a.k.a. UTS #10) - Unicode Collation Algorithm (a.k.a. UCA).

   Constructor and Tailoring
       The "new" method returns a collator object.

          $Collator = Unicode::Collate->new(
             UCA_Version => $UCA_Version,
             alternate => $alternate, # deprecated: use of 'variable' is recommended.
             backwards => $levelNumber, # or \@levelNumbers
             entry => $element,
             hangul_terminator => $term_primary_weight,
             ignoreName => qr/$ignoreName/,
             ignoreChar => qr/$ignoreChar/,
             katakana_before_hiragana => $bool,
             level => $collationLevel,
             normalization => $normalization_form,
             overrideCJK => \&overrideCJK,
             overrideHangul => \&overrideHangul,
             preprocess => \&preprocess,
             rearrange => \@charList,
             table => $filename,
             undefName => qr/$undefName/,
             undefChar => qr/$undefChar/,
             upper_before_lower => $bool,
             variable => $variable,
          );

       UCA_Version
           If the tracking version number of UCA is given, the behavior of
           that tracking version is emulated when collating. If omitted, the
           return value of "UCA_Version()" is used. "UCA_Version()" should
           return the latest tracking version supported.

           Supported tracking versions: 8, 9, 11, and 14.

                UCA       Unicode Standard         DUCET (@version)
                ---------------------------------------------------
                 8              3.1                3.0.1 (3.0.1d9)
                 9     3.1 with Corrigendum 3      3.1.1 (3.1.1)
                11              4.0                4.0.0 (4.0.0)
                14             4.1.0               4.1.0 (4.1.0)

           Note: Recent UTS #10 renames "Tracking Version" to "Revision."

       alternate
           -- see 3.2.2 Alternate Weighting, version 8 of UTS #10

           For backward compatibility, "alternate" (old name) can be used as
           an alias for "variable".

       backwards
           -- see 3.1.2 French Accents, UTS #10.

                backwards => $levelNumber or \@levelNumbers

           Weights at the given level(s) are compared in reverse order; e.g.
           level 2 (diacritic ordering) in French. If omitted, all levels are
           compared forwards.
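
           A minimal sketch of the French case, assuming the default table
           (DUCET) is available. The two collators differ only in
           "backwards"; the variable names are illustrative, and the test
           strings are "cote" with an acute on the final "e" and "cote" with
           a circumflex on the "o", both written with combining accents.

```perl
use strict;
use warnings;
use Unicode::Collate;

# Default collator: level 2 (diacritics) is compared forwards.
my $forward = Unicode::Collate->new();

# French-style collator: level 2 is compared backwards.
my $french = Unicode::Collate->new(backwards => 2);

my $cote_acute = "cote\x{301}";   # c o t e + COMBINING ACUTE ACCENT
my $cote_circ  = "co\x{302}te";   # c o + COMBINING CIRCUMFLEX ACCENT t e

my $forward_result = $forward->cmp($cote_acute, $cote_circ);
my $french_result  = $french->cmp($cote_acute, $cote_circ);

print "forwards:  $forward_result\n";
print "backwards: $french_result\n";
```

           With forwards comparison the first secondary difference decides
           (the circumflex on "o"), so the acute form sorts first; with
           "backwards => 2" the last difference decides (the acute on the
           final "e"), so the circumflex form sorts first.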

       entry
           -- see 3.1 Linguistic Features; 3.2.1 File Format, UTS #10.

           If the same character (or sequence of characters) already exists
           in the collation element table loaded through "table", its mapping
           to collation elements is overridden. If it does not exist, the
           mapping is added.

             entry => <<'ENTRY', # for DUCET v4.0.0 (allkeys-4.0.0.txt)
           0063 0068 ; [.0E6A.0020.0002.0063] # ch
           0043 0068 ; [.0E6A.0020.0007.0043] # Ch
           0043 0048 ; [.0E6A.0020.0008.0043] # CH
           006C 006C ; [.0F4C.0020.0002.006C] # ll
           004C 006C ; [.0F4C.0020.0007.004C] # Ll
           004C 004C ; [.0F4C.0020.0008.004C] # LL
           00F1      ; [.0F7B.0020.0002.00F1] # n-tilde
           006E 0303 ; [.0F7B.0020.0002.00F1] # n-tilde
           00D1      ; [.0F7B.0020.0008.00D1] # N-tilde
           004E 0303 ; [.0F7B.0020.0008.00D1] # N-tilde
           ENTRY

             entry => <<'ENTRY', # for DUCET v4.0.0 (allkeys-4.0.0.txt)
           00E6 ; [.0E33.0020.0002.00E6][.0E8B.0020.0002.00E6] # ae ligature as <a><e>
           00C6 ; [.0E33.0020.0008.00C6][.0E8B.0020.0008.00C6] # AE ligature as <A><E>
           ENTRY

           NOTE: The code point in the UCA file format (before ';') must be a
           Unicode code point (written in hexadecimal), not a native code
           point. So 0063 always denotes "U+0063", not the character "\x63".

           Weighting may vary depending on the collation element table, so
           ensure that the weights defined in "entry" are consistent with
           those in the collation element table loaded via "table".

           In DUCET v4.0.0, the primary weight of "C" is "0E60" and that of
           "D" is "0E6D". So setting the primary weight of "CH" to "0E6A" (a
           value between "0E60" and "0E6D") yields the ordering "C < CH < D".
           Strictly speaking, DUCET already has some characters between "C"
           and "D": "small capital C" ("U+1D04") with primary weight "0E64",
           "c-hook/C-hook" ("U+0188/U+0187") with "0E65", and "c-curl"
           ("U+0255") with "0E69". The primary weight "0E6A" therefore places
           "CH" between "c-curl" and "D".
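
           The contraction idea can be tried without any table file. The
           following is a sketch only: the weights are made up for
           illustration (they are not DUCET values), and "table => undef" and
           "normalization => undef" are used so that nothing beyond the
           "entry" text is consulted.

```perl
use strict;
use warnings;
use Unicode::Collate;

# A tiny hand-written table with hypothetical weights.
# "ch" is a contraction whose primary weight lies between "c" and "d".
my $Collator = Unicode::Collate->new(
    table         => undef,   # read no table file
    normalization => undef,   # no Unicode::Normalize needed
    entry         => <<'ENTRY',
0063 ; [.0103.0020.0002.0063] # c
0063 0068 ; [.0104.0020.0002.0063] # ch
0064 ; [.0105.0020.0002.0064] # d
0068 ; [.0106.0020.0002.0068] # h
007A ; [.0107.0020.0002.007A] # z
ENTRY
);

my @sorted = $Collator->sort(qw(d ch cz c));
print join(" ", @sorted), "\n";
```

           Because "ch" is treated as one collation element with a primary
           weight above that of "c", "cz" sorts before "ch" here, and both
           sort before "d".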

       hangul_terminator
           -- see 7.1.4 Trailing Weights, UTS #10.

           If a true value is given (non-zero, and it should be positive), it
           is added as a terminator primary weight to the end of every
           standard Hangul syllable. Secondary and any higher weights for the
           terminator are set to zero. If the value is false or the
           "hangul_terminator" key does not exist, no terminator weights are
           inserted.

           Boundaries of Hangul syllables are determined according to
           conjoining Jamo behavior in the Unicode Standard and
           HangulSyllableType.txt.

           Implementation Note: (1) For an expansion mapping (a Unicode
           character mapped to a sequence of collation elements), a
           terminator is not added between collation elements, even if a
           Hangul syllable boundary exists there. A terminator is added only
           at the position following the last collation element.

           (2) Non-conjoining Hangul letters (Compatibility Jamo, halfwidth
           Jamo, and enclosed letters) are not automatically terminated with
           a terminator primary weight. These characters may need a
           terminator included in the collation element table beforehand.

       ignoreChar
       ignoreName
           -- see 3.2.2 Variable Weighting, UTS #10.

           Makes the entry in the table completely ignorable; i.e. as if the
           weights were zero at all levels.

           Through "ignoreChar", any character matching "qr/$ignoreChar/"
           will be ignored. Through "ignoreName", any character whose name
           (given in the "table" file as a comment) matches
           "qr/$ignoreName/" will be ignored.

           E.g. when 'a' and 'e' are ignorable, 'element' is equal to
           'lament' (or 'lmnt').

       katakana_before_hiragana
           -- see 7.3.1 Tertiary Weight Table, UTS #10.

           By default, hiragana is before katakana. If the parameter is made
           true, this is reversed.

           NOTE: This parameter simplemindedly assumes that any
           hiragana/katakana distinctions occur at level 3, and that their
           weights at level 3 are the same as those mentioned in 7.3.1, UTS
           #10. If you define collation elements that violate this
           requirement, this parameter does not work properly.

       level
           -- see 4.3 Form Sort Key, UTS #10.

           Sets the maximum level. Any levels higher than the specified one
           are ignored.

             Level 1: alphabetic ordering
             Level 2: diacritic ordering
             Level 3: case ordering
             Level 4: tie-breaking (e.g. in the case when variable is 'shifted')

             ex. level => 2,

           If omitted, the maximum is the 4th.
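
           To illustrate how the maximum level changes equality, here is a
           short sketch assuming the default table (DUCET) and
           Unicode::Normalize are available. The collator and string names
           are illustrative.

```perl
use strict;
use warnings;
use Unicode::Collate;

my $level1 = Unicode::Collate->new(level => 1);  # letters only
my $level2 = Unicode::Collate->new(level => 2);  # letters + diacritics
my $level3 = Unicode::Collate->new(level => 3);  # letters + diacritics + case

my $plain    = "resume";
my $accented = "re\x{301}sume\x{301}";   # both "e"s carry a combining acute

# Accents are a level 2 difference.
my $accent_eq_l1 = $level1->eq($plain, $accented);
my $accent_eq_l2 = $level2->eq($plain, $accented);

# Case is a level 3 difference.
my $case_eq_l2 = $level2->eq("perl", "PERL");
my $case_eq_l3 = $level3->eq("perl", "PERL");

printf "accents equal at level 1: %s\n", $accent_eq_l1 ? "yes" : "no";
printf "accents equal at level 2: %s\n", $accent_eq_l2 ? "yes" : "no";
printf "case equal at level 2:    %s\n", $case_eq_l2   ? "yes" : "no";
printf "case equal at level 3:    %s\n", $case_eq_l3   ? "yes" : "no";
```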

       normalization
           -- see 4.1 Normalize, UTS #10.

           If specified, strings are normalized before sort keys are
           prepared.

           Any form name that "Unicode::Normalize::normalize()" accepts may
           be given as $normalization_form. Acceptable names include 'NFD',
           'NFC', 'NFKD', and 'NFKC'. See "Unicode::Normalize::normalize()"
           for details. If omitted, 'NFD' is used.

           "normalization" is performed after "preprocess" (if defined).

           Furthermore, the special values "undef" and "prenormalized" can be
           used, though they do not involve
           "Unicode::Normalize::normalize()".

           If "undef" (not the string "undef") is passed explicitly as the
           value for this key, no normalization is carried out (this may make
           tailoring easier if normalization is not desired). Under
           "(normalization => undef)", only contiguous contractions are
           resolved; e.g. even if "A-ring" (and "A-ring-cedilla") is ordered
           after "Z", "A-cedilla-ring" would be primary equal to "A". In this
           respect, "(normalization => undef, preprocess => sub {
           NFD(shift) })" is not equivalent to "(normalization => 'NFD')".

           In the case of "(normalization => "prenormalized")", no
           normalization is performed, but non-contiguous contractions with
           combining characters are resolved. Therefore "(normalization =>
           'prenormalized', preprocess => sub { NFD(shift) })" is equivalent
           to "(normalization => 'NFD')". If the source strings are already
           normalized, "(normalization => 'prenormalized')" may save the time
           spent on normalization.

           Except for "(normalization => undef)", Unicode::Normalize is
           required (see also CAVEATS).
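
           The effect of normalization can be seen with two canonically
           equivalent strings whose combining marks are stored in different
           orders. This is a sketch assuming the default table (DUCET) and
           Unicode::Normalize are available; the variable names are
           illustrative.

```perl
use strict;
use warnings;
use Unicode::Collate;

my $with_nfd = Unicode::Collate->new();                        # default: 'NFD'
my $without  = Unicode::Collate->new(normalization => undef);  # no normalization

# The same letter with the same two marks, in different storage order.
my $ring_first    = "a\x{30A}\x{327}";   # a, RING ABOVE, CEDILLA
my $cedilla_first = "a\x{327}\x{30A}";   # a, CEDILLA, RING ABOVE

my $equal_with_nfd = $with_nfd->eq($ring_first, $cedilla_first);
my $equal_without  = $without->eq($ring_first, $cedilla_first);

printf "equal under NFD:   %s\n", $equal_with_nfd ? "yes" : "no";
printf "equal under undef: %s\n", $equal_without  ? "yes" : "no";
```

           NFD puts the two marks into canonical order before sort keys are
           formed, so the strings compare equal. Without normalization their
           secondary weights appear in different orders, so they do not.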

       overrideCJK
           -- see 7.1 Derived Collation Elements, UTS #10.

           By default, CJK Unified Ideographs are ordered in Unicode
           codepoint order, but "CJK Unified Ideographs" (if "UCA_Version" is
           8 to 11, its range is "U+4E00..U+9FA5"; if "UCA_Version" is 14,
           its range is "U+4E00..U+9FBB") are less than "CJK Unified
           Ideographs Extension" (its range is "U+3400..U+4DB5" and
           "U+20000..U+2A6D6").

           Through "overrideCJK", the ordering of CJK Unified Ideographs can
           be overridden.

           ex. CJK Unified Ideographs in the JIS code point order.

             overrideCJK => sub {
                 my $u = shift;             # get a Unicode codepoint
                 my $b = pack('n', $u);     # to UTF-16BE
                 my $s = your_unicode_to_sjis_converter($b); # convert
                 my $n = unpack('n', $s);   # convert sjis to short
                 [ $n, 0x20, 0x2, $u ];     # return the collation element
             },

           ex. ignores all CJK Unified Ideographs.

             overrideCJK => sub {()}, # CODEREF returning empty list

              # where ->eq("Pe\x{4E00}rl", "Perl") is true
              # as U+4E00 is a CJK Unified Ideograph and to be ignorable.

           If "undef" is passed explicitly as the value for this key, weights
           for CJK Unified Ideographs are treated as undefined. But
           assignment of weights for CJK Unified Ideographs in the table or
           "entry" is still valid.

       overrideHangul
           -- see 7.1 Derived Collation Elements, UTS #10.

           By default, Hangul Syllables are decomposed into Hangul Jamo, even
           if "(normalization => undef)". But the mapping of Hangul Syllables
           may be overridden.

           This parameter works like "overrideCJK", so see there for
           examples.

           If you want to override the mapping of Hangul Syllables, NFD,
           NFKD, and FCD are not appropriate, since they will decompose
           Hangul Syllables before overriding.

           If "undef" is passed explicitly as the value for this key, the
           weight for Hangul Syllables is treated as undefined without
           decomposition into Hangul Jamo. But definition of weights for
           Hangul Syllables in the table or "entry" is still valid.

       preprocess
           -- see 5.1 Preprocessing, UTS #10.

           If specified, the coderef is used to preprocess strings before the
           formation of sort keys.

           ex. dropping English articles, such as "a" or "the". Then, "the
           pen" is before "a pencil".

                preprocess => sub {
                      my $str = shift;
                      $str =~ s/\b(?:an?|the)\s+//gi;
                      return $str;
                   },

           "preprocess" is performed before "normalization" (if defined).
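
           The article-dropping callback can be exercised as a complete
           script. This is a sketch assuming the default table (DUCET) is
           available; the sample titles and variable names are illustrative.

```perl
use strict;
use warnings;
use Unicode::Collate;

my $default = Unicode::Collate->new();

# Same collator, but leading English articles are removed before
# sort keys are formed. The original strings are returned unchanged.
my $no_articles = Unicode::Collate->new(
    preprocess => sub {
        my $str = shift;
        $str =~ s/\b(?:an?|the)\s+//gi;
        return $str;
    },
);

my @titles = ("the pen", "a pencil");

my @default_order     = $default->sort(@titles);
my @no_articles_order = $no_articles->sort(@titles);

print join(" | ", @default_order), "\n";
print join(" | ", @no_articles_order), "\n";
```

           With preprocessing the keys are built from "pen" and "pencil", so
           "the pen" comes first; without it the articles take part in the
           comparison and "a pencil" comes first.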

       rearrange
           -- see 3.1.3 Rearrangement, UTS #10.

           Characters that are not coded in logical order and are to be
           rearranged. If "UCA_Version" is less than or equal to 11, the
           default is:

               rearrange => [ 0x0E40..0x0E44, 0x0EC0..0x0EC4 ],

           If you want to disallow any rearrangement, pass "undef" or "[]" (a
           reference to an empty list) as the value for this key.

           If "UCA_Version" is equal to 14, the default is "[]" (i.e. no
           rearrangement).

           According to version 9 of the UCA, this parameter should not be
           used, but no warning is issued at present.

       table
           -- see 3.2 Default Unicode Collation Element Table, UTS #10.

           You can use another collation element table if desired.

           The table file should be located in the Unicode/Collate directory
           on @INC. Say, if the filename is Foo.txt, the table file is
           searched for as Unicode/Collate/Foo.txt in @INC.

           By default, allkeys.txt (the filename of DUCET) is used. If you
           prepare your own table file, a name other than allkeys.txt is
           preferable, to avoid a namespace conflict.

           If "undef" is passed explicitly as the value for this key, no file
           is read (but you can define collation elements via "entry").

           A typical way to define a collation element table without any
           table file:

              $onlyABC = Unicode::Collate->new(
                  table => undef,
                  entry => << 'ENTRIES',
           0061 ; [.0101.0020.0002.0061] # LATIN SMALL LETTER A
           0041 ; [.0101.0020.0008.0041] # LATIN CAPITAL LETTER A
           0062 ; [.0102.0020.0002.0062] # LATIN SMALL LETTER B
           0042 ; [.0102.0020.0008.0042] # LATIN CAPITAL LETTER B
           0063 ; [.0103.0020.0002.0063] # LATIN SMALL LETTER C
           0043 ; [.0103.0020.0008.0043] # LATIN CAPITAL LETTER C
           ENTRIES
               );

           If "ignoreName" or "undefName" is used, character names should be
           specified as a comment (following "#") on each line.

       undefChar
       undefName
           -- see 6.3.4 Reducing the Repertoire, UTS #10.

           Undefines the collation element as if it were unassigned in the
           table. This reduces the size of the table. If an unassigned
           character appears in the string to be collated, the sort key is
           made from its codepoint as a single-character collation element,
           and it is greater than any assigned collation element (unassigned
           characters are ordered among themselves by codepoint). But it
           would be better to ignore characters that are unfamiliar to you
           and perhaps never used.

           Through "undefChar", any character matching "qr/$undefChar/" will
           be undefined. Through "undefName", any character whose name (given
           in the "table" file as a comment) matches "qr/$undefName/" will be
           undefined.

           ex. Collation weights for beyond-BMP characters are not stored in
           the object:

               undefChar => qr/[^\0-\x{fffd}]/,

       upper_before_lower
           -- see 6.6 Case Comparisons, UTS #10.

           By default, lowercase is before uppercase. If the parameter is
           made true, this is reversed.

           NOTE: This parameter simplemindedly assumes that any
           lowercase/uppercase distinctions occur at level 3, and that their
           weights at level 3 are the same as those mentioned in 7.3.1, UTS
           #10. If you define collation elements that differ from this
           requirement, this parameter does not work properly.
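
           A short sketch of the case reversal, assuming the default table
           (DUCET) is available. The word list and variable names are
           illustrative.

```perl
use strict;
use warnings;
use Unicode::Collate;

my $lower_first = Unicode::Collate->new();
my $upper_first = Unicode::Collate->new(upper_before_lower => 1);

my @letters = qw(B a b A);

my @default_order = $lower_first->sort(@letters);
my @upper_order   = $upper_first->sort(@letters);

print "default:            @default_order\n";
print "upper_before_lower: @upper_order\n";
```

           Case is only a level 3 difference, so the letters still group by
           primary weight ("a" and "A" before "b" and "B"); the parameter
           only changes which case comes first within each group.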

       variable
           -- see 3.2.2 Variable Weighting, UTS #10.

           This key selects the weighting applied to variable collation
           elements, which are marked with an ASTERISK in the table (NOTE:
           Many punctuation marks and symbols are variable in allkeys.txt).

              variable => 'blanked', 'non-ignorable', 'shifted', or 'shift-trimmed'.

           These names are case-insensitive. By default (if the specification
           is omitted), 'shifted' is adopted.

              'Blanked'        Variable elements are made ignorable at levels 1
                               through 3; considered at the 4th level.

              'Non-Ignorable'  Variable elements are not reset to ignorable.

              'Shifted'        Variable elements are made ignorable at levels 1
                               through 3; their level 4 weight is replaced by
                               the old level 1 weight. Level 4 weight for
                               Non-Variable elements is 0xFFFF.

              'Shift-Trimmed'  Same as 'shifted', but all FFFF's at the 4th
                               level are trimmed.
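
           The difference between 'shifted' and 'non-ignorable' shows up as
           soon as a string contains a space, which is a variable element in
           DUCET. The following sketch assumes the default table is
           available; the word list and variable names are illustrative.

```perl
use strict;
use warnings;
use Unicode::Collate;

my $shifted       = Unicode::Collate->new();   # default: variable => 'shifted'
my $non_ignorable = Unicode::Collate->new(variable => 'non-ignorable');

my @words = ("de luge", "deal");

# 'shifted': the space is ignorable at levels 1 to 3,
# so "de luge" is compared like "deluge".
my @shifted_order = $shifted->sort(@words);

# 'non-ignorable': the space keeps its own primary weight,
# which is lower than that of any letter.
my @non_ignorable_order = $non_ignorable->sort(@words);

print join(" | ", @shifted_order), "\n";
print join(" | ", @non_ignorable_order), "\n";
```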

   Methods for Collation
       "@sorted = $Collator->sort(@not_sorted)"
           Sorts a list of strings.

       "$result = $Collator->cmp($a, $b)"
           Returns 1 (when $a is greater than $b), 0 (when $a is equal to
           $b), or -1 (when $a is less than $b).

       "$result = $Collator->eq($a, $b)"
       "$result = $Collator->ne($a, $b)"
       "$result = $Collator->lt($a, $b)"
       "$result = $Collator->le($a, $b)"
       "$result = $Collator->gt($a, $b)"
       "$result = $Collator->ge($a, $b)"
           They work like the Perl operators of the same names.

              eq : whether $a is equal to $b.
              ne : whether $a is not equal to $b.
              lt : whether $a is less than $b.
              le : whether $a is less than $b or equal to $b.
              gt : whether $a is greater than $b.
              ge : whether $a is greater than $b or equal to $b.

       "$sortKey = $Collator->getSortKey($string)"
           -- see 4.3 Form Sort Key, UTS #10.

           Returns a sort key.

           You compare the sort keys using a binary comparison and get the
           result of the comparison of the strings using UCA.

              $Collator->getSortKey($a) cmp $Collator->getSortKey($b)

                 is equivalent to

              $Collator->cmp($a, $b)
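
           Since sort keys are compared with plain "cmp", they can be
           computed once per string and reused, for instance when the same
           list is sorted or searched repeatedly. A minimal sketch, assuming
           the default table (DUCET) is available; the word list and variable
           names are illustrative.

```perl
use strict;
use warnings;
use Unicode::Collate;

my $Collator = Unicode::Collate->new();
my @words = qw(pear Apple banana apple);

# Compute each sort key once, sort on the keys with a binary
# comparison, then recover the original strings.
my @by_key =
    map  { $_->[1] }
    sort { $a->[0] cmp $b->[0] }
    map  { [ $Collator->getSortKey($_), $_ ] } @words;

# For comparison: the collator's own sort method.
my @by_method = $Collator->sort(@words);

print "by key:    @by_key\n";
print "by method: @by_method\n";
```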

       "$sortKeyForm = $Collator->viewSortKey($string)"
           Returns the sort key of $string in a human-readable form. If
           "UCA_Version" is 8, the output is slightly different.

              use Unicode::Collate;
              my $c = Unicode::Collate->new();
              print $c->viewSortKey("Perl"),"\n";

              # output:
              # [0B67 0A65 0B7F 0B03 | 0020 0020 0020 0020 | 0008 0002 0002 0002 | FFFF FFFF FFFF FFFF]
              #  Level 1               Level 2               Level 3               Level 4

   Methods for Searching
       DISCLAIMER: If the "preprocess" or "normalization" parameter is true
       for $Collator, calling these methods ("index", "match", "gmatch",
       "subst", "gsubst") croaks, as the position and the length might differ
       from those in the specified string. (The "rearrange" and
       "hangul_terminator" parameters are ignored.)

       The "match", "gmatch", "subst", "gsubst" methods work like "m//",
       "m//g", "s///", "s///g", respectively, but they do not interpret any
       pattern; they match only a literal substring.

       "$position = $Collator->index($string, $substring[, $position])"
       "($position, $length) = $Collator->index($string, $substring[,
       $position])"
           If $substring matches a part of $string, returns the position of
           the first occurrence of the matching part in scalar context; in
           list context, returns a two-element list of the position and the
           length of the matching part.

           If $substring does not match any part of $string, returns "-1" in
           scalar context and an empty list in list context.

           e.g. you say

             my $Collator = Unicode::Collate->new( normalization => undef, level => 1 );
                                                # (normalization => undef) is REQUIRED.
             my $str = "Ich muss studieren Perl.";
             my $sub = "MUeSS";
             my $match;
             if (my($pos,$len) = $Collator->index($str, $sub)) {
                 $match = substr($str, $pos, $len);
             }

           and get "muss" in $match since "muss" is primary equal to "MUeSS".

       "$match_ref = $Collator->match($string, $substring)"
       "($match) = $Collator->match($string, $substring)"
           If $substring matches a part of $string, in scalar context,
           returns a reference to the first occurrence of the matching part
           ($match_ref is always true if matches, since every reference is
           true); in list context, returns the first occurrence of the
           matching part.

           If $substring does not match any part of $string, returns "undef"
           in scalar context and an empty list in list context.

           e.g.

               if ($match_ref = $Collator->match($str, $sub)) { # scalar context
                   print "matches [$$match_ref].\n";
               } else {
                   print "doesn't match.\n";
               }

                or

               if (($match) = $Collator->match($str, $sub)) { # list context
                   print "matches [$match].\n";
               } else {
                   print "doesn't match.\n";
               }

       "@match = $Collator->gmatch($string, $substring)"
           If $substring matches a part of $string, returns all the matching
           parts (or matching count in scalar context).

           If $substring does not match any part of $string, returns an empty
           list.
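
           A short sketch of "gmatch" in both contexts, assuming the default
           table (DUCET) is available. As with "index", "normalization =>
           undef" is required; "level => 1" makes the search ignore case.
           The sample string and variable names are illustrative.

```perl
use strict;
use warnings;
use Unicode::Collate;

my $Collator = Unicode::Collate->new(normalization => undef, level => 1);

my $str = "Camel donkey camel CAMEL";

# List context: every matching part, as it appears in $str.
my @matches = $Collator->gmatch($str, "camel");

# Scalar context: the number of matches.
my $count = $Collator->gmatch($str, "camel");

print "found $count: @matches\n";
```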

       "$count = $Collator->subst($string, $substring, $replacement)"
           If $substring matches a part of $string, the first occurrence of
           the matching part is replaced by $replacement ($string is
           modified) and $count (always equal to 1) is returned.

           $replacement can be a "CODEREF", taking the matching part as an
           argument, and returning a string to replace the matching part (a
           bit similar to "s/(..)/$coderef->($1)/e").

       "$count = $Collator->gsubst($string, $substring, $replacement)"
           If $substring matches a part of $string, all the occurrences of
           the matching part are replaced by $replacement ($string is
           modified) and $count is returned.

           $replacement can be a "CODEREF", taking the matching part as an
           argument, and returning a string to replace the matching part (a
           bit similar to "s/(..)/$coderef->($1)/eg").

           e.g.

             my $Collator = Unicode::Collate->new( normalization => undef, level => 1 );
                                                # (normalization => undef) is REQUIRED.
             my $str = "Camel donkey zebra came\x{301}l CAMEL horse cAm\0E\0L...";
             $Collator->gsubst($str, "camel", sub { "<b>$_[0]</b>" });

             # now $str is "<b>Camel</b> donkey zebra <b>came\x{301}l</b> <b>CAMEL</b> horse <b>cAm\0E\0L</b>...";
             # i.e., all the camels are made bold-faced.

   Other Methods
       "%old_tailoring = $Collator->change(%new_tailoring)"
           Changes the values of the specified keys and returns the changed
           part.

               $Collator = Unicode::Collate->new(level => 4);

               $Collator->eq("perl", "PERL"); # false

               %old = $Collator->change(level => 2); # returns (level => 4).

               $Collator->eq("perl", "PERL"); # true

               $Collator->change(%old); # returns (level => 2).

               $Collator->eq("perl", "PERL"); # false

           Not all "(key,value)"s are allowed to be changed. See also
           @Unicode::Collate::ChangeOK and @Unicode::Collate::ChangeNG.

           In scalar context, returns the modified collator (but it is not a
           clone of the original).

               $Collator->change(level => 2)->eq("perl", "PERL"); # true

               $Collator->eq("perl", "PERL"); # true; now max level is 2nd.

               $Collator->change(level => 4)->eq("perl", "PERL"); # false

       "$version = $Collator->version()"
           Returns the version number (a string) of the Unicode Standard
           which the "table" file used by the collator object is based on. If
           the table does not include a version line (starting with
           @version), returns "unknown".

       "UCA_Version()"
           Returns the tracking version number of UTS #10 this module
           consults.

       "Base_Unicode_Version()"
           Returns the version number of UTS #10 this module consults.

EXPORT
       No method will be exported.

INSTALL
       Though this module can be used without any "table" file, to use this
       module easily, it is recommended to install a table file in the UCA
       format, by copying it under the directory <a place in
       @INC>/Unicode/Collate.

       The most preferable one is "The Default Unicode Collation Element
       Table" (aka DUCET), available from the Unicode Consortium's website:

          http://www.unicode.org/Public/UCA/

          http://www.unicode.org/Public/UCA/latest/allkeys.txt (latest version)

       If DUCET is not installed, it is recommended to copy the file from
       http://www.unicode.org/Public/UCA/latest/allkeys.txt to <a place in
       @INC>/Unicode/Collate/allkeys.txt manually.

CAVEATS
       Normalization
           Use of the "normalization" parameter requires the
           Unicode::Normalize module (see Unicode::Normalize).

           If you do not need it (say, when you need not handle any combining
           characters), assign "normalization => undef" explicitly.

           -- see 6.5 Avoiding Normalization, UTS #10.

       Conformance Test
           The Conformance Test for the UCA is available under
           <http://www.unicode.org/Public/UCA/>.

           For CollationTest_SHIFTED.txt, a collator via
           "Unicode::Collate->new( )" should be used; for
           CollationTest_NON_IGNORABLE.txt, a collator via
           "Unicode::Collate->new(variable => "non-ignorable", level => 3)".

           Unicode::Normalize is required to try the Conformance Test.

AUTHOR, COPYRIGHT AND LICENSE
       The Unicode::Collate module for perl was written by SADAHIRO Tomoyuki,
       <SADAHIRO@cpan.org>. This module is Copyright(C) 2001-2005, SADAHIRO
       Tomoyuki. Japan. All rights reserved.

       This module is free software; you can redistribute it and/or modify it
       under the same terms as Perl itself.

       The file Unicode/Collate/allkeys.txt was copied directly from
       <http://www.unicode.org/Public/UCA/4.1.0/allkeys.txt>. This file is
       Copyright (c) 1991-2005 Unicode, Inc. All rights reserved. Distributed
       under the Terms of Use in <http://www.unicode.org/copyright.html>.

SEE ALSO
       Unicode Collation Algorithm - UTS #10
           <http://www.unicode.org/reports/tr10/>

       The Default Unicode Collation Element Table (DUCET)
           <http://www.unicode.org/Public/UCA/latest/allkeys.txt>

       The conformance test for the UCA
           <http://www.unicode.org/Public/UCA/latest/CollationTest.html>

           <http://www.unicode.org/Public/UCA/latest/CollationTest.zip>

       Hangul Syllable Type
           <http://www.unicode.org/Public/UNIDATA/HangulSyllableType.txt>

       Unicode Normalization Forms - UAX #15
           <http://www.unicode.org/reports/tr15/>

perl v5.12.4                      2011-06-07             Unicode::Collate(3pm)