1Unicode::Collate(3pm) Perl Programmers Reference Guide Unicode::Collate(3pm)
2
3
4
6 Unicode::Collate - Unicode Collation Algorithm
7
    use Unicode::Collate;

    # construct
    $Collator = Unicode::Collate->new(%tailoring);

    # sort
    @sorted = $Collator->sort(@not_sorted);

    # compare
    $result = $Collator->cmp($a, $b); # returns 1, 0, or -1.

    # If %tailoring is false (i.e. empty),
    # $Collator should do the default collation.
22
24 This module is an implementation of Unicode Technical Standard #10
25 (a.k.a. UTS #10) - Unicode Collation Algorithm (a.k.a. UCA).
26
27 Constructor and Tailoring
28
29 The "new" method returns a collator object.
30
31 $Collator = Unicode::Collate->new(
32 UCA_Version => $UCA_Version,
33 alternate => $alternate, # deprecated: use of 'variable' is recommended.
34 backwards => $levelNumber, # or \@levelNumbers
35 entry => $element,
36 hangul_terminator => $term_primary_weight,
37 ignoreName => qr/$ignoreName/,
38 ignoreChar => qr/$ignoreChar/,
39 katakana_before_hiragana => $bool,
40 level => $collationLevel,
41 normalization => $normalization_form,
42 overrideCJK => \&overrideCJK,
43 overrideHangul => \&overrideHangul,
44 preprocess => \&preprocess,
45 rearrange => \@charList,
46 table => $filename,
47 undefName => qr/$undefName/,
48 undefChar => qr/$undefChar/,
49 upper_before_lower => $bool,
50 variable => $variable,
51 );
52
53 UCA_Version
54 If the tracking version number of UCA is given, behavior of that
55 tracking version is emulated on collating. If omitted, the return
56 value of "UCA_Version()" is used. "UCA_Version()" should return
57 the latest tracking version supported.
58
Supported tracking versions: 8, 9, 11, and 14.

    UCA   Unicode Standard            DUCET (@version)
    -----------------------------------------------------
     8    3.1                         3.0.1 (3.0.1d9)
     9    3.1 with Corrigendum 3      3.1.1 (3.1.1)
    11    4.0                         4.0.0 (4.0.0)
    14    4.1.0                       4.1.0 (4.1.0)
67
68 Note: Recent UTS #10 renames "Tracking Version" to "Revision."
69
70 alternate
71 -- see 3.2.2 Alternate Weighting, version 8 of UTS #10
72
73 For backward compatibility, "alternate" (old name) can be used as
74 an alias for "variable".
75
76 backwards
77 -- see 3.1.2 French Accents, UTS #10.
78
79 backwards => $levelNumber or \@levelNumbers
80
Compares the weights at the given level(s) in reverse order; e.g.
level 2 (diacritic ordering) in French. If omitted, all levels are
compared forwards.
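As a sketch of the effect, the following compares four French words with and without "backwards => 2". It assumes the default collation element table bundled with the module. The accented letters are written as a base letter plus a combining mark ("\x{301}" acute, "\x{302}" circumflex) only to keep the source ASCII.

```perl
use strict;
use warnings;
use Unicode::Collate;

# cote, cote + acute, co + circumflex + te, and the word with both marks
my @words = ("cote\x{301}", "co\x{302}te\x{301}", "cote", "co\x{302}te");

my $forward = Unicode::Collate->new();                # forwards at all levels
my $french  = Unicode::Collate->new(backwards => 2);  # level 2 read from the end

my @fwd = $forward->sort(@words);   # accents compared left to right
my @fr  = $french->sort(@words);    # accents compared right to left

binmode STDOUT, ':encoding(UTF-8)';
print "forwards:  @fwd\n";
print "backwards: @fr\n";
```

With forward comparison the first accent difference decides; with backward comparison the last one does, so the word with only a trailing acute moves after the word with only a circumflex.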
83
84 entry
85 -- see 3.1 Linguistic Features; 3.2.1 File Format, UTS #10.
86
If the same character (or sequence of characters) already exists in
the collation element table loaded through "table", its mapping to
collation elements is overridden. If it does not exist, the mapping
is added.
91
92 entry => <<'ENTRY', # for DUCET v4.0.0 (allkeys-4.0.0.txt)
93 0063 0068 ; [.0E6A.0020.0002.0063] # ch
94 0043 0068 ; [.0E6A.0020.0007.0043] # Ch
95 0043 0048 ; [.0E6A.0020.0008.0043] # CH
96 006C 006C ; [.0F4C.0020.0002.006C] # ll
97 004C 006C ; [.0F4C.0020.0007.004C] # Ll
98 004C 004C ; [.0F4C.0020.0008.004C] # LL
99 00F1 ; [.0F7B.0020.0002.00F1] # n-tilde
100 006E 0303 ; [.0F7B.0020.0002.00F1] # n-tilde
101 00D1 ; [.0F7B.0020.0008.00D1] # N-tilde
102 004E 0303 ; [.0F7B.0020.0008.00D1] # N-tilde
103 ENTRY
104
105 entry => <<'ENTRY', # for DUCET v4.0.0 (allkeys-4.0.0.txt)
106 00E6 ; [.0E33.0020.0002.00E6][.0E8B.0020.0002.00E6] # ae ligature as <a><e>
107 00C6 ; [.0E33.0020.0008.00C6][.0E8B.0020.0008.00C6] # AE ligature as <A><E>
108 ENTRY
109
110 NOTE: The code point in the UCA file format (before ';') must be a
111 Unicode code point (defined as hexadecimal), but not a native code
112 point. So 0063 must always denote "U+0063", but not a character of
113 "\x63".
114
115 Weighting may vary depending on collation element table. So ensure
116 the weights defined in "entry" will be consistent with those in the
117 collation element table loaded via "table".
118
In DUCET v4.0.0, the primary weight of "C" is "0E60" and that of "D"
is "0E6D". Setting the primary weight of "CH" to "0E6A" (a value
between "0E60" and "0E6D") therefore yields the ordering "C < CH < D".
Strictly speaking, DUCET already has some characters between "C" and
"D": "small capital C" ("U+1D04") with primary weight "0E64",
"c-hook/C-hook" ("U+0188/U+0187") with "0E65", and "c-curl"
("U+0255") with "0E69". The primary weight "0E6A" thus places "CH"
between "c-curl" and "D".
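The same idea can be tried without depending on any particular DUCET version. The sketch below builds a tiny table purely from "entry" (with "table => undef"); the primary weights 0101 to 0105 are invented for illustration and are not DUCET values.

```perl
use strict;
use warnings;
use Unicode::Collate;

# Hypothetical weights: c < ch < d < h < z. "ch" is a contraction,
# i.e. the two-character sequence maps to one collation element.
my $collator = Unicode::Collate->new(
    table => undef,
    entry => <<'ENTRY',
0063      ; [.0101.0020.0002.0063] # c
0063 0068 ; [.0102.0020.0002.0063] # ch
0064      ; [.0103.0020.0002.0064] # d
0068      ; [.0104.0020.0002.0068] # h
007A      ; [.0105.0020.0002.007A] # z
ENTRY
);

my @sorted = $collator->sort("d", "ch", "cz", "c");
print "@sorted\n";   # prints "c cz ch d"
```

Because "ch" is weighted as a single element after "c", "cz" sorts before "ch" even though "h" alone would sort before "z".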
127
128 hangul_terminator
129 -- see 7.1.4 Trailing Weights, UTS #10.
130
131 If a true value is given (non-zero but should be positive), it will
132 be added as a terminator primary weight to the end of every stan‐
133 dard Hangul syllable. Secondary and any higher weights for termina‐
134 tor are set to zero. If the value is false or "hangul_terminator"
135 key does not exist, insertion of terminator weights will not be
136 performed.
137
138 Boundaries of Hangul syllables are determined according to conjoin‐
139 ing Jamo behavior in the Unicode Standard and HangulSyllable‐
140 Type.txt.
141
Implementation Note: (1) For expansion mapping (Unicode character
mapped to a sequence of collation elements), a terminator will not
be added between collation elements, even if a Hangul syllable
boundary exists there. A terminator is added only after the last
collation element.
147
148 (2) Non-conjoining Hangul letters (Compatibility Jamo, halfwidth
149 Jamo, and enclosed letters) are not automatically terminated with a
150 terminator primary weight. These characters may need terminator
151 included in a collation element table beforehand.
152
153 ignoreChar
154 ignoreName
155 -- see 3.2.2 Variable Weighting, UTS #10.
156
Makes the entry in the table completely ignorable; i.e. as if the
weights were zero at all levels.
159
160 Through "ignoreChar", any character matching "qr/$ignoreChar/" will
161 be ignored. Through "ignoreName", any character whose name (given
162 in the "table" file as a comment) matches "qr/$ignoreName/" will be
163 ignored.
164
165 E.g. when 'a' and 'e' are ignorable, 'element' is equal to 'lament'
166 (or 'lmnt').
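A minimal sketch of the 'element'/'lament' case follows. To stay self-contained it uses "table => undef" and a six-letter table given through "entry"; the weights are invented for illustration, and the pattern "qr/^[ae]$/" is just one possible way to select the two letters.

```perl
use strict;
use warnings;
use Unicode::Collate;

# Hypothetical table covering only the letters used below.
my %args = (
    table => undef,
    entry => <<'ENTRY',
0061 ; [.0101.0020.0002.0061] # a
0065 ; [.0102.0020.0002.0065] # e
006C ; [.0103.0020.0002.006C] # l
006D ; [.0104.0020.0002.006D] # m
006E ; [.0105.0020.0002.006E] # n
0074 ; [.0106.0020.0002.0074] # t
ENTRY
);

my $plain  = Unicode::Collate->new(%args);
my $ignore = Unicode::Collate->new(%args, ignoreChar => qr/^[ae]$/);

print "plain:  ", ($plain->eq("element", "lament")  ? "equal" : "different"), "\n";
print "ignore: ", ($ignore->eq("element", "lament") ? "equal" : "different"), "\n";
```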
167
168 katakana_before_hiragana
169 -- see 7.3.1 Tertiary Weight Table, UTS #10.
170
171 By default, hiragana is before katakana. If the parameter is made
172 true, this is reversed.
173
NOTE: This parameter naively assumes that all hiragana/katakana
distinctions occur at level 3, and that their level 3 weights are
the same as those given in 7.3.1, UTS #10. If you define collation
elements that violate this assumption, this parameter will not work
correctly.
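For illustration, the sketch below compares HIRAGANA LETTER A (U+3042) with KATAKANA LETTER A (U+30A2) under both settings, assuming the default table bundled with the module.

```perl
use strict;
use warnings;
use Unicode::Collate;

my $hiragana_a = "\x{3042}";
my $katakana_a = "\x{30A2}";

my $default       = Unicode::Collate->new();
my $katakana_1st  = Unicode::Collate->new(katakana_before_hiragana => 1);

# Report code points rather than the characters themselves.
for my $pair ([ 'default', $default ], [ 'katakana first', $katakana_1st ]) {
    my ($label, $collator) = @$pair;
    my @order = $collator->sort($katakana_a, $hiragana_a);
    printf "%-15s %s\n", $label, join(" < ", map { sprintf "U+%04X", ord } @order);
}
```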
179
180 level
181 -- see 4.3 Form Sort Key, UTS #10.
182
183 Set the maximum level. Any higher levels than the specified one
184 are ignored.
185
186 Level 1: alphabetic ordering
187 Level 2: diacritic ordering
188 Level 3: case ordering
189 Level 4: tie-breaking (e.g. in the case when variable is 'shifted')
190
191 ex.level => 2,
192
193 If omitted, the maximum is the 4th.
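The effect of the maximum level can be sketched with three spellings of one word, assuming the default table. The accented form is written with combining acute accents ("\x{301}").

```perl
use strict;
use warnings;
use Unicode::Collate;

my $level1 = Unicode::Collate->new(level => 1);  # letters only
my $level2 = Unicode::Collate->new(level => 2);  # letters and diacritics
my $level3 = Unicode::Collate->new(level => 3);  # letters, diacritics, case

my ($plain, $accent, $upper) = ("resume", "re\x{301}sume\x{301}", "RESUME");

my @report = (
    [ 'level 1, accent', $level1->eq($plain, $accent) ],  # equal
    [ 'level 1, case',   $level1->eq($plain, $upper)  ],  # equal
    [ 'level 2, accent', $level2->eq($plain, $accent) ],  # different
    [ 'level 2, case',   $level2->eq($plain, $upper)  ],  # equal
    [ 'level 3, case',   $level3->eq($plain, $upper)  ],  # different
);
printf "%-16s %s\n", $_->[0], ($_->[1] ? "equal" : "different") for @report;
```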
194
195 normalization
196 -- see 4.1 Normalize, UTS #10.
197
198 If specified, strings are normalized before preparation of sort
199 keys (the normalization is executed after preprocess).
200
Any form name accepted by "Unicode::Normalize::normalize()" can be
given as $normalization_form. Acceptable names include 'NFD',
'NFC', 'NFKD', and 'NFKC'. See "Unicode::Normalize::normalize()"
for details. If omitted, 'NFD' is used.
205
206 "normalization" is performed after "preprocess" (if defined).
207
Furthermore, the special values "undef" and "prenormalized" can be
used, though they are not passed to
"Unicode::Normalize::normalize()".

If "undef" (not the string "undef") is passed explicitly as the value
for this key, no normalization is carried out (this may make
tailoring easier if normalization is not desired). Under
"(normalization => undef)", only contiguous contractions are resolved;
e.g. even if "A-ring" (and "A-ring-cedilla") is ordered after "Z",
"A-cedilla-ring" would be primary equal to "A". In this respect,
"(normalization => undef, preprocess => sub { NFD(shift) })" is not
equivalent to "(normalization => 'NFD')".
220
In the case of "(normalization => 'prenormalized')", no normalization
is performed, but non-contiguous contractions with combining
characters are resolved. Therefore "(normalization =>
'prenormalized', preprocess => sub { NFD(shift) })" is equivalent
to "(normalization => 'NFD')". If the source strings are already
normalized, "(normalization => 'prenormalized')" may save the time
spent on normalization.

Except under "(normalization => undef)", Unicode::Normalize is
required (see also CAVEAT).
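One visible consequence of switching normalization off can be sketched with two canonically equivalent strings that differ only in the order of their combining marks. The example assumes the default table and that Unicode::Normalize is installed; the choice of U+0323 (dot below) and U+0307 (dot above) is just an illustration.

```perl
use strict;
use warnings;
use Unicode::Collate;

# Same letter and same marks, in two different orders.
my $ordered = "a\x{323}\x{307}";   # dot below, then dot above
my $swapped = "a\x{307}\x{323}";   # dot above, then dot below

my $nfd  = Unicode::Collate->new();                        # default: 'NFD'
my $none = Unicode::Collate->new(normalization => undef);  # no normalization

print "NFD:   ", ($nfd->eq($ordered, $swapped)  ? "equal" : "different"), "\n";
print "undef: ", ($none->eq($ordered, $swapped) ? "equal" : "different"), "\n";
```

With NFD both strings are reordered to the same sequence before sort keys are built; without normalization the marks are weighted in the order given.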
231
232 overrideCJK
233 -- see 7.1 Derived Collation Elements, UTS #10.
234
By default, CJK Unified Ideographs are ordered in Unicode codepoint
order, but "CJK Unified Ideographs" (if "UCA_Version" is 8 to 11,
its range is "U+4E00..U+9FA5"; if "UCA_Version" is 14, its range is
"U+4E00..U+9FBB") sort before "CJK Unified Ideographs Extension"
(its range is "U+3400..U+4DB5" and "U+20000..U+2A6D6").

Through "overrideCJK", the ordering of CJK Unified Ideographs can be
overridden.
243
244 ex. CJK Unified Ideographs in the JIS code point order.
245
246 overrideCJK => sub {
247 my $u = shift; # get a Unicode codepoint
248 my $b = pack('n', $u); # to UTF-16BE
249 my $s = your_unicode_to_sjis_converter($b); # convert
250 my $n = unpack('n', $s); # convert sjis to short
251 [ $n, 0x20, 0x2, $u ]; # return the collation element
252 },
253
254 ex. ignores all CJK Unified Ideographs.
255
256 overrideCJK => sub {()}, # CODEREF returning empty list
257
258 # where ->eq("Pe\x{4E00}rl", "Perl") is true
259 # as U+4E00 is a CJK Unified Ideograph and to be ignorable.
260
261 If "undef" is passed explicitly as the value for this key, weights
262 for CJK Unified Ideographs are treated as undefined. But assign‐
263 ment of weight for CJK Unified Ideographs in table or "entry" is
264 still valid.
265
266 overrideHangul
267 -- see 7.1 Derived Collation Elements, UTS #10.
268
By default, Hangul Syllables are decomposed into Hangul Jamo, even
under "(normalization => undef)". But the mapping of Hangul
Syllables may be overridden.
272
273 This parameter works like "overrideCJK", so see there for examples.
274
275 If you want to override the mapping of Hangul Syllables, NFD, NFKD,
276 and FCD are not appropriate, since they will decompose Hangul Syl‐
277 lables before overriding.
278
279 If "undef" is passed explicitly as the value for this key, weight
280 for Hangul Syllables is treated as undefined without decomposition
281 into Hangul Jamo. But definition of weight for Hangul Syllables in
282 table or "entry" is still valid.
283
284 preprocess
285 -- see 5.1 Preprocessing, UTS #10.
286
287 If specified, the coderef is used to preprocess before the forma‐
288 tion of sort keys.
289
290 ex. dropping English articles, such as "a" or "the". Then, "the
291 pen" is before "a pencil".
292
    preprocess => sub {
        my $str = shift;
        $str =~ s/\b(?:an?|the)\s+//gi;   # drop "a", "an", "the" and following space
        return $str;
    },
298
299 "preprocess" is performed before "normalization" (if defined).
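A runnable sketch of the article-dropping example, assuming the default table. Note that "preprocess" only affects the sort keys; the strings returned by "sort" are the originals.

```perl
use strict;
use warnings;
use Unicode::Collate;

my $collator = Unicode::Collate->new(
    preprocess => sub {
        my $str = shift;
        $str =~ s/\b(?:an?|the)\s+//gi;   # strip leading English articles
        return $str;
    },
);

my @titles = $collator->sort("the pen", "a pencil", "an apple");
print join(", ", @titles), "\n";   # prints "an apple, the pen, a pencil"
```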
300
301 rearrange
302 -- see 3.1.3 Rearrangement, UTS #10.
303
Characters that are not coded in logical order and are to be
rearranged. If "UCA_Version" is 11 or lower, the default is:
307
308 rearrange => [ 0x0E40..0x0E44, 0x0EC0..0x0EC4 ],
309
310 If you want to disallow any rearrangement, pass "undef" or "[]" (a
311 reference to empty list) as the value for this key.
312
313 If "UCA_Version" is equal to 14, default is "[]" (i.e. no re‐
314 arrangement).
315
According to version 9 of the UCA, this parameter shall not be
used; at present, however, no warning is issued.
318
319 table
320 -- see 3.2 Default Unicode Collation Element Table, UTS #10.
321
322 You can use another collation element table if desired.
323
The table file should be located in the Unicode/Collate directory on
@INC. For example, if the filename is Foo.txt, the table file is
searched for as Unicode/Collate/Foo.txt in @INC.

By default, allkeys.txt (the filename of DUCET) is used. If you
prepare your own table file, a name other than allkeys.txt is
preferable, to avoid a name conflict.
331
332 If "undef" is passed explicitly as the value for this key, no file
333 is read (but you can define collation elements via "entry").
334
335 A typical way to define a collation element table without any file
336 of table:
337
338 $onlyABC = Unicode::Collate->new(
339 table => undef,
340 entry => << 'ENTRIES',
341 0061 ; [.0101.0020.0002.0061] # LATIN SMALL LETTER A
342 0041 ; [.0101.0020.0008.0041] # LATIN CAPITAL LETTER A
343 0062 ; [.0102.0020.0002.0062] # LATIN SMALL LETTER B
344 0042 ; [.0102.0020.0008.0042] # LATIN CAPITAL LETTER B
345 0063 ; [.0103.0020.0002.0063] # LATIN SMALL LETTER C
346 0043 ; [.0103.0020.0008.0043] # LATIN CAPITAL LETTER C
347 ENTRIES
348 );
349
350 If "ignoreName" or "undefName" is used, character names should be
351 specified as a comment (following "#") on each line.
352
353 undefChar
354 undefName
355 -- see 6.3.4 Reducing the Repertoire, UTS #10.
356
357 Undefines the collation element as if it were unassigned in the ta‐
358 ble. This reduces the size of the table. If an unassigned charac‐
359 ter appears in the string to be collated, the sort key is made from
360 its codepoint as a single-character collation element, as it is
361 greater than any other assigned collation elements (in the code‐
362 point order among the unassigned characters). But, it'd be better
363 to ignore characters unfamiliar to you and maybe never used.
364
365 Through "undefChar", any character matching "qr/$undefChar/" will
366 be undefined. Through "undefName", any character whose name (given
367 in the "table" file as a comment) matches "qr/$undefName/" will be
368 undefined.
369
370 ex. Collation weights for beyond-BMP characters are not stored in
371 object:
372
373 undefChar => qr/[^\0-\x{fffd}]/,
374
375 upper_before_lower
376 -- see 6.6 Case Comparisons, UTS #10.
377
378 By default, lowercase is before uppercase. If the parameter is
379 made true, this is reversed.
380
NOTE: This parameter naively assumes that all lowercase/uppercase
distinctions occur at level 3, and that their level 3 weights are
the same as those given in 7.3.1, UTS #10. If you define collation
elements that differ from this assumption, this parameter will not
work correctly.
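A short sketch of the two orderings, assuming the default table:

```perl
use strict;
use warnings;
use Unicode::Collate;

my @letters = ("b", "A", "a", "B");

my @lower_first = Unicode::Collate->new()->sort(@letters);
my @upper_first = Unicode::Collate->new(upper_before_lower => 1)->sort(@letters);

print "default:            @lower_first\n";   # prints "a A b B"
print "upper_before_lower: @upper_first\n";   # prints "A a B b"
```

Case is only a level 3 difference, so the letters still group alphabetically first; the option merely flips the order within each pair.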
386
387 variable
388 -- see 3.2.2 Variable Weighting, UTS #10.
389
This key specifies how variable collation elements, which are marked
with an ASTERISK in the table, are weighted (NOTE: Many punctuation
marks and symbols are variable in allkeys.txt).
393
394 variable => 'blanked', 'non-ignorable', 'shifted', or 'shift-trimmed'.
395
396 These names are case-insensitive. By default (if specification is
397 omitted), 'shifted' is adopted.
398
    'Blanked'        Variable elements are made ignorable at levels 1 through 3;
                     considered at the 4th level.

    'Non-Ignorable'  Variable elements are not reset to ignorable.

    'Shifted'        Variable elements are made ignorable at levels 1 through 3;
                     their level 4 weight is replaced by the old level 1 weight.
                     The level 4 weight for non-variable elements is 0xFFFF.

    'Shift-Trimmed'  Same as 'shifted', but all FFFF's at the 4th level
                     are trimmed.
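The difference between 'shifted' and 'non-ignorable' can be sketched with a hyphen, which is a variable element in DUCET. The example assumes the default table.

```perl
use strict;
use warnings;
use Unicode::Collate;

my @words = ("blackbird", "black-birds", "black-bird");

my $shifted = Unicode::Collate->new(variable => 'shifted');
my $non_ign = Unicode::Collate->new(variable => 'non-ignorable');

# shifted: the hyphen is ignored at levels 1 to 3 and only breaks ties at level 4
my @by_shifted = $shifted->sort(@words);
# non-ignorable: the hyphen has a primary weight lower than any letter
my @by_non_ign = $non_ign->sort(@words);

print "shifted:       @by_shifted\n";  # prints "black-bird blackbird black-birds"
print "non-ignorable: @by_non_ign\n";  # prints "black-bird black-birds blackbird"

# With level => 3 the level 4 tie-breaker is dropped, so the hyphen vanishes entirely.
my $shifted_l3 = Unicode::Collate->new(variable => 'shifted', level => 3);
print "equal at level 3\n" if $shifted_l3->eq("black-bird", "blackbird");
```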
410
411 Methods for Collation
412
413 "@sorted = $Collator->sort(@not_sorted)"
414 Sorts a list of strings.
415
416 "$result = $Collator->cmp($a, $b)"
Returns 1 (when $a is greater than $b), 0 (when $a is equal to $b),
or -1 (when $a is less than $b).
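As a small illustration of "sort" and "cmp" together, the sketch below contrasts the collator with Perl's built-in codepoint sort, assuming the default table.

```perl
use strict;
use warnings;
use Unicode::Collate;

my $collator = Unicode::Collate->new();
my @words    = ("banana", "Cherry", "apple");

my @by_uca       = $collator->sort(@words);  # alphabetic, case is a minor difference
my @by_codepoint = sort @words;              # uppercase letters come first

print "UCA:       @by_uca\n";        # prints "apple banana Cherry"
print "codepoint: @by_codepoint\n";  # prints "Cherry apple banana"

# cmp can also be used inside Perl's own sort
my @also_uca = sort { $collator->cmp($a, $b) } @words;
```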
419
420 "$result = $Collator->eq($a, $b)"
421 "$result = $Collator->ne($a, $b)"
422 "$result = $Collator->lt($a, $b)"
423 "$result = $Collator->le($a, $b)"
424 "$result = $Collator->gt($a, $b)"
425 "$result = $Collator->ge($a, $b)"
They work like the Perl operators of the same names.

    eq : whether $a is equal to $b.
    ne : whether $a is not equal to $b.
    lt : whether $a is less than $b.
    le : whether $a is less than $b or equal to $b.
    gt : whether $a is greater than $b.
    ge : whether $a is greater than $b or equal to $b.
434
435 "$sortKey = $Collator->getSortKey($string)"
436 -- see 4.3 Form Sort Key, UTS #10.
437
438 Returns a sort key.
439
440 You compare the sort keys using a binary comparison and get the
441 result of the comparison of the strings using UCA.
442
443 $Collator->getSortKey($a) cmp $Collator->getSortKey($b)
444
445 is equivalent to
446
447 $Collator->cmp($a, $b)
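This equivalence makes it possible to compute each sort key once and then sort with a plain binary comparison. A minimal sketch, assuming the default table (the hash %key is just a local cache, not part of the module):

```perl
use strict;
use warnings;
use Unicode::Collate;

my $collator = Unicode::Collate->new();
my @names    = ("delta", "Alpha", "charlie", "bravo", "alpha");

# Compute every sort key exactly once.
my %key = map { $_ => $collator->getSortKey($_) } @names;

# Binary comparison of the cached keys gives the UCA order.
my @by_key = sort { $key{$a} cmp $key{$b} } @names;
my @direct = $collator->sort(@names);

print "@by_key\n";   # prints "alpha Alpha bravo charlie delta"
```

Caching keys this way may help when the same strings are compared many times, for example when sorting records by one field.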
448
449 "$sortKeyForm = $Collator->viewSortKey($string)"
450 Converts a sorting key into its representation form. If "UCA_Ver‐
451 sion" is 8, the output is slightly different.
452
    use Unicode::Collate;
    my $c = Unicode::Collate->new();
    print $c->viewSortKey("Perl"),"\n";

    # output:
    # [0B67 0A65 0B7F 0B03 | 0020 0020 0020 0020 | 0008 0002 0002 0002 | FFFF FFFF FFFF FFFF]
    #  Level 1               Level 2               Level 3               Level 4
460
461 Methods for Searching
462
DISCLAIMER: If the "preprocess" or "normalization" parameter is true
for $Collator, calling these methods ("index", "match", "gmatch",
"subst", "gsubst") croaks, since the position and the length might
differ from those in the specified string. (The "rearrange" and
"hangul_terminator" parameters are ignored.)
468
The "match", "gmatch", "subst", "gsubst" methods work like "m//",
"m//g", "s///", "s///g", respectively, except that they match only a
literal substring and do not interpret any pattern.
472
473 "$position = $Collator->index($string, $substring[, $position])"
474 "($position, $length) = $Collator->index($string, $substring[, $posi‐
475 tion])"
476 If $substring matches a part of $string, returns the position of
477 the first occurrence of the matching part in scalar context; in
478 list context, returns a two-element list of the position and the
479 length of the matching part.
480
481 If $substring does not match any part of $string, returns "-1" in
482 scalar context and an empty list in list context.
483
484 e.g. you say
485
486 my $Collator = Unicode::Collate->new( normalization => undef, level => 1 );
487 # (normalization => undef) is REQUIRED.
488 my $str = "Ich muß studieren Perl.";
489 my $sub = "MÜSS";
490 my $match;
491 if (my($pos,$len) = $Collator->index($str, $sub)) {
492 $match = substr($str, $pos, $len);
493 }
494
495 and get "muß" in $match since "muß" is primary equal to "MÜSS".
496
497 "$match_ref = $Collator->match($string, $substring)"
498 "($match) = $Collator->match($string, $substring)"
499 If $substring matches a part of $string, in scalar context, returns
500 a reference to the first occurrence of the matching part
501 ($match_ref is always true if matches, since every reference is
502 true); in list context, returns the first occurrence of the match‐
503 ing part.
504
505 If $substring does not match any part of $string, returns "undef"
506 in scalar context and an empty list in list context.
507
508 e.g.
509
510 if ($match_ref = $Collator->match($str, $sub)) { # scalar context
511 print "matches [$$match_ref].\n";
512 } else {
513 print "doesn't match.\n";
514 }
515
516 or
517
518 if (($match) = $Collator->match($str, $sub)) { # list context
519 print "matches [$match].\n";
520 } else {
521 print "doesn't match.\n";
522 }
523
524 "@match = $Collator->gmatch($string, $substring)"
525 If $substring matches a part of $string, returns all the matching
526 parts (or matching count in scalar context).
527
528 If $substring does not match any part of $string, returns an empty
529 list.
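A minimal sketch of "gmatch" and "index" at level 1, assuming the default table. The sample text is a shortened variant of the one used for "gsubst" in this section.

```perl
use strict;
use warnings;
use Unicode::Collate;

# (normalization => undef) is required for the searching methods.
my $collator = Unicode::Collate->new(normalization => undef, level => 1);

my $text = "Camel donkey zebra came\x{301}l CAMEL horse";

my @hits  = $collator->gmatch($text, "camel");   # list context: matching parts
my $count = $collator->gmatch($text, "camel");   # scalar context: number of matches
my ($pos, $len) = $collator->index($text, "camel");

binmode STDOUT, ':encoding(UTF-8)';
print "found $count matches: @hits\n";
print "first match at position $pos, length $len\n";
```

At level 1 case and accents are ignored, so all three spellings match the lowercase search string.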
530
531 "$count = $Collator->subst($string, $substring, $replacement)"
If $substring matches a part of $string, the first occurrence of
the matching part is replaced by $replacement ($string is modified)
and $count (always equal to 1) is returned.
535
536 $replacement can be a "CODEREF", taking the matching part as an
537 argument, and returning a string to replace the matching part (a
538 bit similar to "s/(..)/$coderef->($1)/e").
539
540 "$count = $Collator->gsubst($string, $substring, $replacement)"
If $substring matches a part of $string, all occurrences of the
matching part are replaced by $replacement ($string is modified) and
$count is returned.
544
545 $replacement can be a "CODEREF", taking the matching part as an
546 argument, and returning a string to replace the matching part (a
547 bit similar to "s/(..)/$coderef->($1)/eg").
548
549 e.g.
550
551 my $Collator = Unicode::Collate->new( normalization => undef, level => 1 );
552 # (normalization => undef) is REQUIRED.
553 my $str = "Camel donkey zebra came\x{301}l CAMEL horse cAm\0E\0L...";
554 $Collator->gsubst($str, "camel", sub { "<b>$_[0]</b>" });
555
556 # now $str is "<b>Camel</b> donkey zebra <b>came\x{301}l</b> <b>CAMEL</b> horse <b>cAm\0E\0L</b>...";
557 # i.e., all the camels are made bold-faced.
558
559 Other Methods
560
561 "%old_tailoring = $Collator->change(%new_tailoring)"
Changes the values of the specified keys and returns the previous
values of the changed keys.
563
564 $Collator = Unicode::Collate->new(level => 4);
565
566 $Collator->eq("perl", "PERL"); # false
567
568 %old = $Collator->change(level => 2); # returns (level => 4).
569
570 $Collator->eq("perl", "PERL"); # true
571
572 $Collator->change(%old); # returns (level => 2).
573
574 $Collator->eq("perl", "PERL"); # false
575
576 Not all "(key,value)"s are allowed to be changed. See also @Uni‐
577 code::Collate::ChangeOK and @Unicode::Collate::ChangeNG.
578
579 In the scalar context, returns the modified collator (but it is not
580 a clone from the original).
581
582 $Collator->change(level => 2)->eq("perl", "PERL"); # true
583
584 $Collator->eq("perl", "PERL"); # true; now max level is 2nd.
585
586 $Collator->change(level => 4)->eq("perl", "PERL"); # false
587
588 "$version = $Collator->version()"
589 Returns the version number (a string) of the Unicode Standard which
590 the "table" file used by the collator object is based on. If the
591 table does not include a version line (starting with @version),
592 returns "unknown".
593
594 "UCA_Version()"
595 Returns the tracking version number of UTS #10 this module con‐
596 sults.
597
598 "Base_Unicode_Version()"
599 Returns the version number of UTS #10 this module consults.
600
602 No method will be exported.
603
605 Though this module can be used without any "table" file, to use this
606 module easily, it is recommended to install a table file in the UCA
607 format, by copying it under the directory <a place in @INC>/Uni‐
608 code/Collate.
609
610 The most preferable one is "The Default Unicode Collation Element Ta‐
611 ble" (aka DUCET), available from the Unicode Consortium's website:
612
613 http://www.unicode.org/Public/UCA/
614
615 http://www.unicode.org/Public/UCA/latest/allkeys.txt (latest version)
616
617 If DUCET is not installed, it is recommended to copy the file from
618 http://www.unicode.org/Public/UCA/latest/allkeys.txt to <a place in
619 @INC>/Unicode/Collate/allkeys.txt manually.
620
622 Normalization
623 Use of the "normalization" parameter requires the Unicode::Normal‐
624 ize module (see Unicode::Normalize).
625
If you do not need it (say, when you do not need to handle any
combining characters), assign "normalization => undef" explicitly.
628
629 -- see 6.5 Avoiding Normalization, UTS #10.
630
631 Conformance Test
632 The Conformance Test for the UCA is available under
633 <http://www.unicode.org/Public/UCA/>.
634
635 For CollationTest_SHIFTED.txt, a collator via "Unicode::Col‐
636 late->new( )" should be used; for CollationTest_NON_IGNORABLE.txt,
637 a collator via "Unicode::Collate->new(variable => "non-ignorable",
638 level => 3)".
639
640 Unicode::Normalize is required to try The Conformance Test.
641
643 The Unicode::Collate module for perl was written by SADAHIRO Tomoyuki,
644 <SADAHIRO@cpan.org>. This module is Copyright(C) 2001-2005, SADAHIRO
645 Tomoyuki. Japan. All rights reserved.
646
647 This module is free software; you can redistribute it and/or modify it
648 under the same terms as Perl itself.
649
650 The file Unicode/Collate/allkeys.txt was copied directly from
651 <http://www.unicode.org/Public/UCA/4.1.0/allkeys.txt>. This file is
652 Copyright (c) 1991-2005 Unicode, Inc. All rights reserved. Distributed
653 under the Terms of Use in <http://www.unicode.org/copyright.html>.
654
656 Unicode Collation Algorithm - UTS #10
657 <http://www.unicode.org/reports/tr10/>
658
659 The Default Unicode Collation Element Table (DUCET)
660 <http://www.unicode.org/Public/UCA/latest/allkeys.txt>
661
662 The conformance test for the UCA
663 <http://www.unicode.org/Public/UCA/latest/CollationTest.html>
664
665 <http://www.unicode.org/Public/UCA/latest/CollationTest.zip>
666
667 Hangul Syllable Type
668 <http://www.unicode.org/Public/UNIDATA/HangulSyllableType.txt>
669
670 Unicode Normalization Forms - UAX #15
671 <http://www.unicode.org/reports/tr15/>
672
673
674
675perl v5.8.8 2001-09-21 Unicode::Collate(3pm)