Lucy::Analysis::Normalizer(3)  User Contributed Perl Documentation  Lucy::Analysis::Normalizer(3)

NAME
       Lucy::Analysis::Normalizer - Unicode normalization, case folding and
       accent stripping.

SYNOPSIS
           my $normalizer = Lucy::Analysis::Normalizer->new;

           my $polyanalyzer = Lucy::Analysis::PolyAnalyzer->new(
               analyzers => [ $tokenizer, $normalizer, $stemmer ],
           );

DESCRIPTION
       Normalizer is an Analyzer which normalizes tokens to one of the Unicode
       normalization forms. Optionally, it performs Unicode case folding and
       converts accented characters to their base character.

       If you use highlighting, Normalizer should be run after tokenization
       because it might add or remove characters.
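
       The three operations described above can be illustrated with core
       Perl alone. This is a minimal sketch, not Lucy's implementation: the
       helper approx_normalize and its option names are hypothetical, and it
       assumes that accent stripping amounts to removing nonspacing marks
       after decomposition.

```perl
use strict;
use warnings;
use feature 'fc';
use Unicode::Normalize qw(NFKC NFD);

# Hypothetical helper (not part of Lucy): approximates normalization,
# optional case folding, and optional accent stripping on a string.
sub approx_normalize {
    my ($text, %opt) = @_;
    my $out = NFKC($text);                 # compatibility composition
    $out = fc($out) if $opt{case_fold};    # Unicode case folding
    if ($opt{strip_accents}) {
        $out = NFD($out);                  # split base letters from marks
        $out =~ s/\p{Mn}//g;               # drop nonspacing marks (assumption)
        $out = NFKC($out);                 # recompose what remains
    }
    return $out;
}

# Fullwidth "C", then "af", then precomposed e-acute.
my $token = "\x{FF23}af\x{E9}";
print approx_normalize($token, case_fold => 1, strip_accents => 1), "\n";
```

       The output string can differ in length from the input, which is why
       character offsets used for highlighting should be fixed by the
       tokenizer before normalization runs.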

CONSTRUCTORS
   new
           my $normalizer = Lucy::Analysis::Normalizer->new(
               normalization_form => 'NFKC',
               case_fold          => 1,
               strip_accents      => 0,
           );

       Create a new Normalizer.

       •   normalization_form - Unicode normalization form, can be one of
           "NFC", "NFKC", "NFD", "NFKD". Defaults to "NFKC".

       •   case_fold - Perform case folding, default is true.

       •   strip_accents - Strip accents, default is false.
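
       To make the choice of normalization_form concrete, the following
       sketch compares the four forms on one short string. It uses the core
       Unicode::Normalize module rather than Lucy itself, so it shows the
       behaviour of the Unicode forms, not Lucy's internals; the sample
       string and the %len hash are illustrative only.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFC NFD NFKC NFKD);

# U+FB01 (the "fi" ligature) followed by "e" and U+0301 (combining acute).
my $input = "\x{FB01}e\x{301}";

# Length in characters of the input under each normalization form.
my %len = (
    NFC  => length(NFC($input)),   # ligature kept, e + accent composed
    NFD  => length(NFD($input)),   # ligature kept, e + accent separate
    NFKC => length(NFKC($input)),  # ligature expanded, e + accent composed
    NFKD => length(NFKD($input)),  # ligature expanded, e + accent separate
);

printf "%-5s %d\n", $_, $len{$_} for qw(NFC NFD NFKC NFKD);
```

       The compatibility forms (NFKC, NFKD) fold presentation variants such
       as ligatures into plain letters, which is usually what a search index
       wants; the canonical forms (NFC, NFD) leave them untouched.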

METHODS
   transform
           my $inversion = $normalizer->transform($inversion);

       Takes a single Inversion as input and returns an Inversion, either the
       same one (presumably transformed in some way), or a new one.

       •   inversion - An inversion.

INHERITANCE
       Lucy::Analysis::Normalizer isa Lucy::Analysis::Analyzer isa
       Clownfish::Obj.

perl v5.32.1                      2021-01-27     Lucy::Analysis::Normalizer(3)