Unicode::Stringprep(3)  User Contributed Perl Documentation  Unicode::Stringprep(3)

NAME
       Unicode::Stringprep - Preparation of Internationalized Strings
       (RFC 3454)

SYNOPSIS
           use Unicode::Stringprep;
           use Unicode::Stringprep::Mapping;
           use Unicode::Stringprep::Prohibited;

           my $prepper = Unicode::Stringprep->new(
             3.2,
             [ { 32 => '<SPACE>'}, ],
             'KC',
             [ @Unicode::Stringprep::Prohibited::C12, @Unicode::Stringprep::Prohibited::C22,
               @Unicode::Stringprep::Prohibited::C3, @Unicode::Stringprep::Prohibited::C4,
               @Unicode::Stringprep::Prohibited::C5, @Unicode::Stringprep::Prohibited::C6,
               @Unicode::Stringprep::Prohibited::C7, @Unicode::Stringprep::Prohibited::C8,
               @Unicode::Stringprep::Prohibited::C9 ],
             1, 0 );
           $output = $prepper->($input);

DESCRIPTION
       This module implements the stringprep framework for preparing Unicode
       text strings in order to increase the likelihood that string input and
       string comparison work in ways that make sense for typical users
       throughout the world. The stringprep protocol is useful for protocol
       identifier values, company and personal names, internationalized domain
       names, and other text strings.

       The stringprep framework does not specify how protocols should prepare
       text strings. Protocols must create profiles of stringprep in order to
       fully specify the processing options.

FUNCTIONS
       This module provides a single function, "new", that creates a Perl
       function implementing a stringprep profile.

       This module exports nothing.

       new($unicode_version, $mapping_tables, $unicode_normalization,
       $prohibited_tables, $bidi_check, $unassigned_check)
           Creates a "bless"ed function reference that implements a stringprep
           profile.

           This function takes the following parameters:

           $unicode_version
               The Unicode version specified by the stringprep profile.

               Currently, this parameter must be 3.2 (numeric).

           $mapping_tables
               The mapping tables used for stringprep.

               The parameter may be a reference to a hash or an array, or
               "undef". A hash must map Unicode codepoints (as integers, e.g.
               0x0020 for U+0020) to replacement strings (as Perl strings).
               An array may contain pairs of Unicode codepoints and
               replacement strings as well as references to nested hashes and
               arrays.

               Unicode::Stringprep::Mapping provides the tables from RFC 3454,
               Appendix B.

               For further information on the mapping step, see RFC 3454,
               section 3.

           $unicode_normalization
               The Unicode normalization to be used.

               Currently, only "undef" or '' (no normalization) and 'KC'
               (compatibility composed) are specified for stringprep.

               For further information on the normalization step, see
               RFC 3454, section 4.

               Normalization form KC also enables checks for some problem
               sequences for which the normalization can't be implemented in
               an interoperable way.

               For more information, see "CAVEATS" below.

           $prohibited_tables
               The list of prohibited output characters for stringprep.

               The parameter may be a reference to an array, or "undef". The
               array contains pairs of codepoints, which define the start and
               end of a Unicode character range (as integers). The end
               character may be "undef", specifying a single-character range.
               The array may also contain references to nested arrays.

               Unicode::Stringprep::Prohibited provides the tables from
               RFC 3454, Appendix C.

               For further information on the prohibition checking step, see
               RFC 3454, section 5.

           $bidi_check
               Whether to employ checks for confusing bidirectional text. A
               boolean value.

               For further information on the bidi checking step, see
               RFC 3454, section 6.

           $unassigned_check
               Whether to check for and prohibit unassigned characters. A
               boolean value.

               The check must be used when creating stored strings. It should
               not be used for query strings; skipping it there increases the
               chance that newly assigned characters work as expected.

               For further information on stored and query strings, see
               RFC 3454, section 7.

           The returned function takes a single parameter, the string to be
           prepared, and returns the prepared string. It dies if the input
           cannot be prepared successfully because the output would be
           invalid, so wrap the call in "eval" if necessary.

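           As an illustration of this calling convention and of the
           stored/query distinction, the following sketch builds two
           throwaway profiles that differ only in $unassigned_check and
           wraps the stricter one in "eval". The variable names are made up
           for this example; U+0221 is used because RFC 3454, Appendix A.1
           lists it as unassigned.

```perl
use strict;
use warnings;
use Unicode::Stringprep;

# Two illustrative profiles: no mapping, no normalization, no prohibited
# tables, no bidi check. Only $unassigned_check differs.
my $stored = Unicode::Stringprep->new(3.2, undef, '', undef, 0, 1);
my $query  = Unicode::Stringprep->new(3.2, undef, '', undef, 0, 0);

# U+0221 is unassigned in Unicode 3.2 (RFC 3454, Appendix A.1).
my $input = "caf\x{221}";

# The query profile has nothing to change or reject here.
my $as_query = $query->($input);

# The stored profile dies, so trap the exception with eval.
my $as_stored = eval { $stored->($input) };
my $error     = $@;

print "query:  ", ($as_query eq $input ? "unchanged" : "changed"), "\n";
print "stored: ", (defined $as_stored ? "accepted" : "rejected"), "\n";
```

           The wording of the exception text is not described in this
           document, so the sketch only checks whether the call died.
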
           For performance reasons, it is strongly recommended to call the
           "new" function as few times as possible, i.e. exactly once per
           stringprep profile. It may also be better not to use this module
           directly but to use (or write) a module implementing a profile,
           such as Authen::SASL::SASLprep.

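           The table formats accepted for $mapping_tables and
           $prohibited_tables can be mixed within one array. A minimal
           sketch, using made-up table contents that do not correspond to
           any real profile:

```perl
use strict;
use warnings;
use Unicode::Stringprep;

my $prep = Unicode::Stringprep->new(
    3.2,
    [
        { 0x0020 => '_' },     # hash: codepoint => replacement string
        0x00DF, 'ss',          # flat pair: U+00DF is replaced by "ss"
        [ 0x00AD, '' ],        # nested array: U+00AD is mapped to nothing
    ],
    '',                        # no normalization
    [
        0x0021, undef,         # single-character range: U+0021
        0x0030, 0x0039,        # range U+0030..U+0039 (ASCII digits)
        [ 0x0040, undef ],     # nested array: U+0040
    ],
    0, 0,                      # no bidi check, no unassigned check
);

# Contains only mappable characters and nothing from the prohibited list.
my $mapped = $prep->("stra\x{DF}e \x{AD}one");

# Contains ASCII digits, which this illustrative profile prohibits.
my $rejected = eval { $prep->("room 101") };

print "mapped:   $mapped\n";
print "rejected: ", (defined $rejected ? "no" : "yes"), "\n";
```

           Building the profile once and reusing $prep for every input
           follows the recommendation above.
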
IMPLEMENTING PROFILES
       You can easily implement a stringprep profile without subclassing:

           package ACME::ExamplePrep;

           use Unicode::Stringprep;

           use Unicode::Stringprep::Mapping;
           use Unicode::Stringprep::Prohibited;

           *exampleprep = Unicode::Stringprep->new(
             3.2,
             [ \@Unicode::Stringprep::Mapping::B1, ],
             '',
             [ \@Unicode::Stringprep::Prohibited::C12,
               \@Unicode::Stringprep::Prohibited::C22, ],
             1,
           );

       This binds "ACME::ExamplePrep::exampleprep" to the function created by
       "Unicode::Stringprep->new".

       Subclassing this module is usually unnecessary and is not recommended.

DATA TABLES
       The following modules contain the data tables from RFC 3454. These
       modules are automatically loaded when loading "Unicode::Stringprep".

       •   Unicode::Stringprep::Unassigned

               @Unicode::Stringprep::Unassigned::A1 # Appendix A.1

       •   Unicode::Stringprep::Mapping

               @Unicode::Stringprep::Mapping::B1 # Appendix B.1
               @Unicode::Stringprep::Mapping::B2 # Appendix B.2
               @Unicode::Stringprep::Mapping::B3 # Appendix B.3

       •   Unicode::Stringprep::Prohibited

               @Unicode::Stringprep::Prohibited::C11 # Appendix C.1.1
               @Unicode::Stringprep::Prohibited::C12 # Appendix C.1.2
               @Unicode::Stringprep::Prohibited::C21 # Appendix C.2.1
               @Unicode::Stringprep::Prohibited::C22 # Appendix C.2.2
               @Unicode::Stringprep::Prohibited::C3  # Appendix C.3
               @Unicode::Stringprep::Prohibited::C4  # Appendix C.4
               @Unicode::Stringprep::Prohibited::C5  # Appendix C.5
               @Unicode::Stringprep::Prohibited::C6  # Appendix C.6
               @Unicode::Stringprep::Prohibited::C7  # Appendix C.7
               @Unicode::Stringprep::Prohibited::C8  # Appendix C.8
               @Unicode::Stringprep::Prohibited::C9  # Appendix C.9

       •   Unicode::Stringprep::BiDi

               @Unicode::Stringprep::BiDi::D1 # Appendix D.1
               @Unicode::Stringprep::BiDi::D2 # Appendix D.2

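       The tables can also be inspected directly. The sketch below assumes
       that the shipped prohibited tables use the (start, end) pair layout
       described for $prohibited_tables, which is how the SYNOPSIS passes
       them; "in_table" is a hypothetical helper written for this example
       and is not part of the module.

```perl
use strict;
use warnings;
use Unicode::Stringprep;
use Unicode::Stringprep::Prohibited;

# Hypothetical helper: returns 1 if $cp falls into any (start, end) range
# of the table. An undefined end means a single-character range, and
# nested array references are searched recursively.
sub in_table {
    my ($cp, @table) = @_;
    while (@table) {
        my $start = shift @table;
        if (ref $start eq 'ARRAY') {
            return 1 if in_table($cp, @$start);
            next;
        }
        my $end = shift @table;
        $end = $start unless defined $end;
        return 1 if $cp >= $start && $cp <= $end;
    }
    return 0;
}

# RFC 3454 table C.1.1 holds the ASCII space; C.1.2 the non-ASCII spaces.
my @checks = (
    [ 0x0020, 'C.1.1', \@Unicode::Stringprep::Prohibited::C11 ],
    [ 0x0041, 'C.1.1', \@Unicode::Stringprep::Prohibited::C11 ],
    [ 0x00A0, 'C.1.2', \@Unicode::Stringprep::Prohibited::C12 ],
);

for my $check (@checks) {
    my ($cp, $name, $table) = @$check;
    printf "U+%04X in %s: %s\n", $cp, $name,
        in_table($cp, @$table) ? 'yes' : 'no';
}
```
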
CAVEATS
       In Unicode 3.2 to 4.0.1, the specification of UAX #15: Unicode
       Normalization Forms for forms NFC and NFKC is not logically
       self-consistent. This has been fixed in Corrigendum #5
       (<http://unicode.org/versions/corrigendum5.html>).

       Unfortunately, this yields two ways to implement NFC and NFKC in
       Unicode 3.2, on which the Stringprep standard is based: one based on a
       literal interpretation of the original specification and one based on
       the corrected specification. The output of these implementations
       differs for a small class of strings, none of which can appear in
       meaningful text. See UAX #15, section 19
       <http://unicode.org/reports/tr15/#Stability_Prior_to_Unicode41> for
       details.

       This module checks for these strings and, if normalization is done,
       prohibits them in output, as it is not possible to interoperate under
       these circumstances.

       Please note that, because of this, the normalization step may cause
       the preparation to fail. That is, the preparation function may die
       even if there are no prohibited characters and no checks for bidi
       sequences and unassigned characters, which may be surprising.

AUTHOR
       Claus Faerber <CFAERBER@cpan.org>

LICENSE
       Copyright 2007-2009 Claus Faerber.

       This library is free software; you can redistribute it and/or modify it
       under the same terms as Perl itself.

SEE ALSO
       Unicode::Normalize, RFC 3454 (<http://www.ietf.org/rfc/rfc3454.txt>)

perl v5.34.0                      2022-01-21            Unicode::Stringprep(3)