Unicode::Stringprep(3pm)

Unicode::Stringprep(3pm)   User Contributed Perl Documentation   Unicode::Stringprep(3pm)

NAME

       Unicode::Stringprep - Preparation of Internationalized Strings
       (RFC 3454)

SYNOPSIS

         use Unicode::Stringprep;
         use Unicode::Stringprep::Mapping;
         use Unicode::Stringprep::Prohibited;

         my $prepper = Unicode::Stringprep->new(
           3.2,                            # Unicode version
           [ { 32 => '<SPACE>'},  ],       # mapping: U+0020 => '<SPACE>'
           'KC',                           # normalization form KC
           [ @Unicode::Stringprep::Prohibited::C12, @Unicode::Stringprep::Prohibited::C22,
             @Unicode::Stringprep::Prohibited::C3, @Unicode::Stringprep::Prohibited::C4,
             @Unicode::Stringprep::Prohibited::C5, @Unicode::Stringprep::Prohibited::C6,
             @Unicode::Stringprep::Prohibited::C7, @Unicode::Stringprep::Prohibited::C8,
             @Unicode::Stringprep::Prohibited::C9 ],   # prohibited output
           1, 0 );                         # bidi check on, unassigned check off
         my $output = $prepper->($input);  # dies if $input cannot be prepared

DESCRIPTION

       This module implements the stringprep framework for preparing Unicode
       text strings in order to increase the likelihood that string input and
       string comparison work in ways that make sense for typical users
       throughout the world. The stringprep protocol is useful for protocol
       identifier values, company and personal names, internationalized domain
       names, and other text strings.

       The stringprep framework does not specify how protocols should prepare
       text strings. Protocols must create profiles of stringprep in order to
       fully specify the processing options.

FUNCTIONS

       This module provides a single function, "new", that creates a Perl
       function implementing a stringprep profile.

       This module exports nothing.

       new($unicode_version, $mapping_tables, $unicode_normalization,
       $prohibited_tables, $bidi_check, $unassigned_check)
           Creates a "bless"ed function reference that implements a stringprep
           profile.

           This function takes the following parameters:

           $unicode_version
               The Unicode version specified by the stringprep profile.

               Currently, this parameter must be 3.2 (numeric).

           $mapping_tables
               The mapping tables used for stringprep.

               The parameter may be a reference to a hash or an array, or
               "undef". A hash must map Unicode codepoints (as integers, e.g.
               0x0020 for U+0020) to replacement strings (as Perl strings).
               An array may contain pairs of Unicode codepoints and
               replacement strings as well as references to nested hashes and
               arrays.

               Unicode::Stringprep::Mapping provides the tables from RFC 3454,
               Appendix B.

               For further information on the mapping step, see RFC 3454,
               section 3.

           $unicode_normalization
               The Unicode normalization to be used.

               Currently, only "undef"/'' (no normalization) and 'KC'
               (compatibility composition, NFKC) are specified for stringprep.

               For further information on the normalization step, see
               RFC 3454, section 4.

               Normalization form KC also enables checks for some problem
               sequences for which the normalization cannot be implemented in
               an interoperable way.

               For more information, see "CAVEATS" below.

           $prohibited_tables
               The list of prohibited output characters for stringprep.

               The parameter may be a reference to an array, or "undef". The
               array contains pairs of codepoints, which define the start and
               end of a Unicode character range (as integers). The end
               character may be "undef", specifying a single-character range.
               The array may also contain references to nested arrays.

               Unicode::Stringprep::Prohibited provides the tables from
               RFC 3454, Appendix C.

               For further information on the prohibition checking step, see
               RFC 3454, section 5.

           $bidi_check
               Whether to employ checks for confusing bidirectional text. A
               boolean value.

               For further information on the bidi checking step, see
               RFC 3454, section 6.

           $unassigned_check
               Whether to check for and prohibit unassigned characters. A
               boolean value.

               The check must be used when creating stored strings. It should
               not be used for query strings: leaving it off for queries
               increases the chance that newly assigned characters work as
               expected.

               For further information on stored and query strings, see
               RFC 3454, section 7.

           The returned function takes a single parameter, the string to be
           prepared, and returns the prepared string. It dies if the input
           string cannot be successfully prepared because the result would
           contain invalid output (so use "eval" if necessary).

           For performance reasons, it is strongly recommended to call the
           "new" function as few times as possible, i.e. exactly once per
           stringprep profile. It might also be better not to use this module
           directly but to use (or write) a module implementing a profile,
           such as Authen::SASL::SASLprep.
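
       The following is a sketch of each table shape described above. It
       builds a small, hypothetical profile. The name $toyprep and its table
       entries are illustrative only and belong to no standard profile. The
       mapping tables use a hash, a codepoint/replacement pair and a nested
       array reference. The prohibited tables use a start/end range, a
       single-character range and a nested array. The sketch assumes these
       shapes behave as documented, and it uses "eval" to catch the "die"
       raised for prohibited output.

```perl
use strict;
use warnings;
use Unicode::Stringprep;
use Unicode::Stringprep::Mapping;

# Hypothetical toy profile; built once and reused for every call.
my $toyprep = Unicode::Stringprep->new(
  3.2,
  [                                        # mapping tables
    { 0x00DF => 'ss' },                    # hash: codepoint => replacement
    0x2126, "\x{03C9}",                    # pair: OHM SIGN => small omega
    \@Unicode::Stringprep::Mapping::B1,    # nested array: RFC 3454 table B.1
  ],
  undef,                                   # no normalization
  [                                        # prohibited tables
    0x0000, 0x001F,                        # range U+0000..U+001F
    0x007F, undef,                         # single character U+007F
    [ 0x00A0, undef ],                     # nested array: U+00A0
  ],
  0,                                       # no bidi check
  0,                                       # no unassigned check
);

# Mapping only: U+00DF is replaced, nothing here is prohibited.
my $mapped = $toyprep->("Stra\x{00DF}e");

# TAB (U+0009) falls inside the prohibited range, so the call dies.
my $ok = eval { $toyprep->("a\tb"); 1 };
print "rejected: $@" unless $ok;
```

       Constructing $toyprep once and reusing it for every call follows the
       performance advice above.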

IMPLEMENTING PROFILES

       You can easily implement a stringprep profile without subclassing:

         package ACME::ExamplePrep;

         use Unicode::Stringprep;

         use Unicode::Stringprep::Mapping;
         use Unicode::Stringprep::Prohibited;

         *exampleprep = Unicode::Stringprep->new(
           3.2,                                         # Unicode version
           [ \@Unicode::Stringprep::Mapping::B1, ],     # map using table B.1
           '',                                          # no normalization
           [ \@Unicode::Stringprep::Prohibited::C12,    # prohibit C.1.2
             \@Unicode::Stringprep::Prohibited::C22, ], # and C.2.2
           1,                                           # bidi check on
         );                                             # unassigned check off

       This binds "ACME::ExamplePrep::exampleprep" to the function created by
       "Unicode::Stringprep->new".

       Subclassing this module is usually unnecessary and is not recommended.
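
       Once defined, the profile is an ordinary Perl function. The sketch
       below repeats the hypothetical ACME::ExamplePrep package inline so that
       it is self-contained, and then calls the function. According to
       RFC 3454, table B.1 maps U+00AD SOFT HYPHEN to nothing, and table C.1.2
       lists U+00A0 NO-BREAK SPACE. The first call should therefore remove the
       soft hyphen, and the second should die.

```perl
use strict;
use warnings;

{
  package ACME::ExamplePrep;
  use Unicode::Stringprep;
  use Unicode::Stringprep::Mapping;
  use Unicode::Stringprep::Prohibited;

  # Same hypothetical profile as in the example above.
  *exampleprep = Unicode::Stringprep->new(
    3.2,
    [ \@Unicode::Stringprep::Mapping::B1, ],
    '',
    [ \@Unicode::Stringprep::Prohibited::C12,
      \@Unicode::Stringprep::Prohibited::C22, ],
    1,
  );
}

# The soft hyphen is removed by the B.1 mapping step.
my $clean = ACME::ExamplePrep::exampleprep("foo\x{00AD}bar");

# A no-break space is prohibited by C.1.2, so this call dies.
my $rejected = !eval { ACME::ExamplePrep::exampleprep("foo\x{00A0}bar"); 1 };
```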

DATA TABLES

       The following modules contain the data tables from RFC 3454. These
       modules are automatically loaded when loading "Unicode::Stringprep".

       •   Unicode::Stringprep::Unassigned

             @Unicode::Stringprep::Unassigned::A1  # Appendix A.1

       •   Unicode::Stringprep::Mapping

             @Unicode::Stringprep::Mapping::B1     # Appendix B.1
             @Unicode::Stringprep::Mapping::B2     # Appendix B.2
             @Unicode::Stringprep::Mapping::B3     # Appendix B.3

       •   Unicode::Stringprep::Prohibited

             @Unicode::Stringprep::Prohibited::C11 # Appendix C.1.1
             @Unicode::Stringprep::Prohibited::C12 # Appendix C.1.2
             @Unicode::Stringprep::Prohibited::C21 # Appendix C.2.1
             @Unicode::Stringprep::Prohibited::C22 # Appendix C.2.2
             @Unicode::Stringprep::Prohibited::C3  # Appendix C.3
             @Unicode::Stringprep::Prohibited::C4  # Appendix C.4
             @Unicode::Stringprep::Prohibited::C5  # Appendix C.5
             @Unicode::Stringprep::Prohibited::C6  # Appendix C.6
             @Unicode::Stringprep::Prohibited::C7  # Appendix C.7
             @Unicode::Stringprep::Prohibited::C8  # Appendix C.8
             @Unicode::Stringprep::Prohibited::C9  # Appendix C.9

       •   Unicode::Stringprep::BiDi

             @Unicode::Stringprep::BiDi::D1        # Appendix D.1
             @Unicode::Stringprep::BiDi::D2        # Appendix D.2
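
       These tables can also be inspected directly. This sketch assumes that
       they use the same flat start/end pair format that "new" accepts for
       prohibited tables, which the SYNOPSIS relies on when it flattens them
       into one list. The helper in_table is hypothetical and is not provided
       by the module.

```perl
use strict;
use warnings;
use Unicode::Stringprep::Prohibited;

# Hypothetical helper (not part of the module): returns 1 if codepoint $cp
# lies in @table, a list of (start, end) pairs in which end may be undef
# (single character) and which may contain nested array references.
sub in_table {
  my ($cp, @table) = @_;
  while (@table) {
    my $start = shift @table;
    if (ref($start) eq 'ARRAY') {          # nested table: search it too
      return 1 if in_table($cp, @$start);
      next;
    }
    my $end = shift @table;
    $end = $start unless defined $end;     # single-character range
    return 1 if $cp >= $start && $cp <= $end;
  }
  return 0;
}

# U+00A0 NO-BREAK SPACE is listed in table C.1.2 (non-ASCII space
# characters); U+0020 SPACE is ASCII and therefore is not.
my $nbsp  = in_table(0x00A0, @Unicode::Stringprep::Prohibited::C12);
my $space = in_table(0x0020, @Unicode::Stringprep::Prohibited::C12);
```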

CAVEATS

       In Unicode 3.2 to 4.0.1, the specification of UAX #15: Unicode
       Normalization Forms for forms NFC and NFKC is not logically
       self-consistent. This has been fixed in Corrigendum #5
       (<http://unicode.org/versions/corrigendum5.html>).

       Unfortunately, this yields two ways to implement NFC and NFKC in
       Unicode 3.2, on which the Stringprep standard is based: one based on a
       literal interpretation of the original specification and one based on
       the corrected specification. The output of these implementations
       differs for a small class of strings, none of which can appear in
       meaningful text. See UAX #15, section 19
       <http://unicode.org/reports/tr15/#Stability_Prior_to_Unicode41> for
       details.

       This module checks for these strings and, if normalization is done,
       prohibits them in the output, because it is not possible to
       interoperate under these circumstances.

       Note that, as a consequence, the normalization step may cause the
       preparation to fail. The preparation function may die even if there
       are no prohibited characters and the checks for bidi sequences and
       unassigned characters are disabled, which may be surprising.
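
       In practice, a profile that uses normalization form KC should
       therefore wrap its calls in "eval", even if it enables none of the
       other checks. The following minimal sketch assumes a hypothetical
       KC-only profile; $kcprep is an illustrative name. For ordinary input
       the profile simply applies NFKC. For example, U+FB01 LATIN SMALL
       LIGATURE FI becomes "fi".

```perl
use strict;
use warnings;
use Unicode::Stringprep;

# Hypothetical KC-only profile: no mapping, no prohibited tables, no bidi
# or unassigned checks. It can still die on the problem sequences above.
my $kcprep = Unicode::Stringprep->new(3.2, undef, 'KC', undef, 0, 0);

# U+FB01 LATIN SMALL LIGATURE FI normalizes to "fi" under NFKC.
my $result = eval { $kcprep->("\x{FB01}le") };
print "could not prepare input: $@" unless defined $result;
```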

AUTHOR

       Claus Färber <CFAERBER@cpan.org>

LICENSE

       Copyright 2007-2009 Claus Färber.

       This library is free software; you can redistribute it and/or modify it
       under the same terms as Perl itself.