Unicode::CaseFold(3) User Contributed Perl Documentation Unicode::CaseFold(3)

Unicode::CaseFold - Unicode case-folding for case-insensitive lookups.

version 1.01

    use Unicode::CaseFold;

    my $folded = fc $string;

What is Case-Folding?
In non-Unicode contexts, a common idiom to compare two strings
case-insensitively is "lc($this) eq lc($that)". Before comparing two strings
we normalize them to an all-lowercase version. "Hello", "HELLO", and
"HeLlO" all have the same lowercase form ("hello"), so it doesn't
matter which one we start with; they are all equal to one another after
"lc".

In Unicode, things aren't so simple. A Unicode character might have
mappings for uppercase, lowercase, and titlecase, and the lowercase
mapping of the uppercase mapping of a given character might not be the
character that you started with. For example,
"lc(uc("\N{LATIN SMALL LETTER SHARP S}"))" is "ss", not the eszett we
started with. Case-folding is a part of the Unicode standard that allows
any two strings that differ from one another only by case to map to the
same "case-folded" form, even when those strings include characters with
complex case-mappings.
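
The round trip described above can be reproduced in a few lines. This is a minimal sketch, assuming Unicode::CaseFold is installed and full folding is in effect; the variable names are illustrative only and are not part of the module.

```perl
use strict;
use warnings;
use charnames ':full';    # needed for \N{...} on perls older than 5.16
use Unicode::CaseFold;

my $eszett = "\N{LATIN SMALL LETTER SHARP S}";

# Upper-casing and then lower-casing does not return the original character.
my $round_trip = lc(uc($eszett));

# Comparing with lc fails, because lc leaves the eszett unchanged.
my $lc_equal = lc($eszett) eq lc("SS") ? 1 : 0;

# Comparing with fc succeeds when full case-folding is in effect.
my $fc_equal = fc($eszett) eq fc("SS") ? 1 : 0;

print "lc(uc(eszett)) = $round_trip\n";
print "equal under lc: $lc_equal, equal under fc: $fc_equal\n";
```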

Use for Case-insensitive Comparison
Simply write "fc($this) eq fc($that)" instead of "lc($this) eq
lc($that)". You can also use "index" on case-folded strings for
substring search.
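
Both uses can be wrapped in small helpers. The sketch below assumes Unicode::CaseFold is installed; "equals_ignoring_case" and "contains_ignoring_case" are hypothetical names chosen for illustration, not functions provided by the module.

```perl
use strict;
use warnings;
use Unicode::CaseFold;

# Hypothetical helper: true if the two strings are equal ignoring case.
sub equals_ignoring_case {
    my ($this, $that) = @_;
    return fc($this) eq fc($that);
}

# Hypothetical helper: true if $needle occurs in $haystack ignoring case.
sub contains_ignoring_case {
    my ($haystack, $needle) = @_;
    return index(fc($haystack), fc($needle)) >= 0;
}

my $same  = equals_ignoring_case("HeLlO", "hello");
my $found = contains_ignoring_case("Say HELLO to everyone", "hello");
```

Note that the position returned by "index" refers to the folded string. Folding can change the length of a string, so that position may not line up with the original.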

Use for String Lookups
Frequently we want to store data in a hash, or a database, or an
external file for later retrieval. Sometimes we want to be able to
match the keys in this data case-insensitively; that is, we should be
able to store some data under the key "hello" and later retrieve it
with the key "HELLO". Some databases have complete support for
collation, but in other databases the support is missing or broken, and
Perl hashes don't support it at all. By making case-folding part of the
process you use to normalize your keys before using them to access a
database or data structure, you get case-insensitive lookup.

    $roles{fc "Samuel L. Jackson"} = ["Gin Rummy", "Nick Fury", "Mace Windu"];

    $roles = $roles{fc "Samuel l. JACKSON"}; # Gets the data.
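
Because the folded key no longer shows how the name was originally written, it can be useful to keep the original spelling next to the data. The following is one possible sketch under that assumption; "store_roles" and "fetch_roles" and the record layout are hypothetical and not part of the module.

```perl
use strict;
use warnings;
use Unicode::CaseFold;

my %roles;    # keyed by the case-folded name

# Hypothetical helper: fold the key on every write and keep the original name.
sub store_roles {
    my ($name, @roles) = @_;
    $roles{ fc($name) } = { name => $name, roles => [@roles] };
    return;
}

# Hypothetical helper: fold the key on every read; empty list if not found.
sub fetch_roles {
    my ($name) = @_;
    my $entry = $roles{ fc($name) } or return;
    return @{ $entry->{roles} };
}

store_roles("Samuel L. Jackson", "Gin Rummy", "Nick Fury", "Mace Windu");

my @found   = fetch_roles("Samuel l. JACKSON");
my @missing = fetch_roles("Somebody Else");
```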

This module provides Unicode case-folding for Perl. Case-folding is a
tool that allows a program to make case-insensitive string comparisons
or do case-insensitive lookups.

fc($str)
Exported by default when you use the module. Write
"use Unicode::CaseFold ()" or "use Unicode::CaseFold qw(case_fold !fc)"
if you don't want it to be exported.

Returns the case-folded version of $str. This function is prototyped to
act as much as possible like the built-ins "lc" and "uc"; it imposes a
scalar context on its argument, and if called with no argument it will
return the case-folded version of $_.
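
The behaviour described above can be sketched as follows, assuming Unicode::CaseFold is installed with its default exports. The sample data and variable names are made up for illustration.

```perl
use strict;
use warnings;
use Unicode::CaseFold;

my @names = ("Alice", "BOB", "alice", "Bob", "ALICE");

# Called with no argument, fc folds $_, just as lc would.
my @folded = map { fc } @names;

# Keep the first spelling of each name, ignoring case.
my %seen;
my @unique = grep { !$seen{ fc($_) }++ } @names;

# Scalar context on the argument: this folds the element count,
# not the elements themselves.
my $count = fc(@names);
```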

case_fold($str)
Exported on request. Just like "fc", except that it has no prototype
and won't case-fold $_ if called without an argument.
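
A short sketch of importing "case_fold" on its own, using the import list shown earlier for suppressing "fc". It assumes Unicode::CaseFold is installed; the sample strings are illustrative.

```perl
use strict;
use warnings;
use Unicode::CaseFold qw(case_fold !fc);    # import case_fold only

# case_fold has no prototype, so the argument is always passed explicitly.
my @keys = map { case_fold($_) } ("Hello", "HELLO", "HeLlO");

my $all_same = ($keys[0] eq $keys[1] && $keys[1] eq $keys[2]) ? 1 : 0;
```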

$Unicode::CaseFold::XS
Indicates whether the XS extension is in use. The pure-perl
implementation is 5-10 times slower than the XS extension, and on
versions of perl before 5.10.0 it uses simple case-folding instead of
full case-folding (see below).

$Unicode::CaseFold::SIMPLE_FOLDING
Set to true if the perl version is prior to 5.10.0 and the XS
extension is not available. In this case, "fc" performs simple
case-folding instead of full case-folding. Although relatively few
characters are affected, strings case-folded using simple folding might
not compare equal to the corresponding strings case-folded with full
folding, which may cause compatibility issues.

Furthermore, when simple folding is in use, some strings that would
have case-folded to the same value when using full folding will instead
case-fold to different values. For example, "fc("Wei\x{df}")" and
"fc("Weiss")" both produce "weiss" when full folding is in effect, but
the former produces "wei\x{df}" when using simple folding.

If you want to check for this potentially dangerous situation, consult
the $Unicode::CaseFold::SIMPLE_FOLDING variable.
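
One way to consult both variables at program start is sketched below. It assumes Unicode::CaseFold is installed; the messages and variable names are illustrative, and the eszett is written with a named character here rather than "\x{df}".

```perl
use strict;
use warnings;
use charnames ':full';    # needed for \N{...} on perls older than 5.16
use Unicode::CaseFold;

# Report which implementation and folding mode are active.
my $implementation = $Unicode::CaseFold::XS ? 'XS' : 'pure-perl';
my $folding = $Unicode::CaseFold::SIMPLE_FOLDING ? 'simple' : 'full';
print "Unicode::CaseFold: $implementation implementation, $folding folding\n";

if ($Unicode::CaseFold::SIMPLE_FOLDING) {
    warn "Simple case-folding in effect; folded keys may not match "
       . "data folded with full folding\n";
}

# These two agree under full folding and differ under simple folding.
my $agree = fc("Wei\N{LATIN SMALL LETTER SHARP S}") eq fc("Weiss") ? 1 : 0;
```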

· "Unicode::CaseFold" requires Perl 5.8.1 or newer.

· Different versions of perl include different versions of the
  Unicode database, which is revised over time. If you are likely to
  be comparing strings that have been folded using different versions
  of perl, you may need to consult the changes for intervening
  Unicode standard versions to find out whether your code will work
  correctly.

· "Unicode::CaseFold" uses "simple" rather than "full" case-folding
  when operating in pure-perl mode on perl versions prior to
  5.10.0. For compatibility implications, see
  "$Unicode::CaseFold::SIMPLE_FOLDING".
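
If folded strings are stored and later compared by a different perl, recording the Unicode version alongside them makes a mismatch detectable. This is only a sketch of that idea: the record layout is hypothetical, and it relies on "Unicode::UCD::UnicodeVersion()" from the core Unicode::UCD module to report the version of the Unicode database bundled with the running perl.

```perl
use strict;
use warnings;
use Unicode::UCD ();
use Unicode::CaseFold;

# Hypothetical record layout: keep the versions next to the folded key.
my $record = {
    key             => fc("Samuel L. Jackson"),
    unicode_version => Unicode::UCD::UnicodeVersion(),
    perl_version    => sprintf("%vd", $^V),
    simple_folding  => $Unicode::CaseFold::SIMPLE_FOLDING ? 1 : 0,
};

print "folded with Unicode $record->{unicode_version} ",
      "on perl $record->{perl_version}\n";
```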

· <http://unicode.org/reports/tr21/tr21-5.html>: Unicode Standard
  Annex #21: Case Mappings

Andrew Rodland <arodland@cpan.org>

This software is copyright (c) 2017 by Andrew Rodland.

This is free software; you can redistribute it and/or modify it under
the same terms as the Perl 5 programming language system itself.

perl v5.32.0 2020-07-28 Unicode::CaseFold(3)