Lucy::Analysis::StandardTokenizer(3)   User Contributed Perl Documentation   Lucy::Analysis::StandardTokenizer(3)

NAME
Lucy::Analysis::StandardTokenizer - Split a string into tokens.

SYNOPSIS
    my $tokenizer = Lucy::Analysis::StandardTokenizer->new;

    # Then... once you have a tokenizer, put it into a PolyAnalyzer.
    # ($normalizer and $stemmer are other analyzers, assumed to have been
    # constructed elsewhere.)
    my $polyanalyzer = Lucy::Analysis::PolyAnalyzer->new(
        analyzers => [ $tokenizer, $normalizer, $stemmer ],
    );

DESCRIPTION
Generically, "tokenizing" is a process of breaking up a string into an
array of "tokens". For instance, the string "three blind mice" might
be tokenized into "three", "blind", "mice".

Lucy::Analysis::StandardTokenizer breaks up the text at the word
boundaries defined in Unicode Standard Annex #29. It then returns those
words that contain alphabetic or numeric characters.
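
As a rough illustration of the two steps described above (split at word
boundaries, then keep only the pieces that contain a letter or digit), the
following sketch uses Perl's built-in \b{wb} assertion, available since
Perl 5.22. It does not use Lucy at all: sketch_tokenize is a hypothetical
helper, and Perl's \b{wb} is a slightly tailored version of the UAX #29
rules, so its output may differ from StandardTokenizer's in edge cases.

```perl
use strict;
use warnings;
use 5.022;    # \b{wb} requires Perl 5.22 or later

# Hypothetical helper, not part of Lucy: approximates the behaviour
# described above using core Perl only.
sub sketch_tokenize {
    my ($text) = @_;

    # Step 1: break the string at Perl's tailoring of the UAX #29
    # word boundaries. This yields words, whitespace runs and punctuation.
    my @pieces = split /\b{wb}/, $text;

    # Step 2: keep only the pieces that contain at least one alphabetic
    # or numeric character, which drops whitespace and punctuation.
    return grep { /[\p{Alphabetic}\p{N}]/ } @pieces;
}

# prints: three|blind|mice
print join( '|', sketch_tokenize('three blind mice') ), "\n";
```

The filter in step 2 is what removes the space-only pieces that the split
in step 1 produces between the words.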

CONSTRUCTORS
new
    my $tokenizer = Lucy::Analysis::StandardTokenizer->new;

Constructor. Takes no arguments.

METHODS
transform
    my $inversion = $standard_tokenizer->transform($inversion);

Takes a single Inversion as input and returns an Inversion, either the
same one (presumably transformed in some way), or a new one.

• inversion - An inversion.
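
Because transform operates on an Inversion object, a quick experiment may
be easier with a plain string. The sketch below assumes that Lucy is
installed and that the split convenience method inherited from
Lucy::Analysis::Analyzer is available in your version; it is expected to
take a string and return a reference to an array of token texts.
lucy_token_texts is a hypothetical wrapper that returns nothing when Lucy
cannot be loaded.

```perl
use strict;
use warnings;

# Hypothetical wrapper: tokenizes $text with StandardTokenizer if Lucy is
# installed, and returns undef (in scalar context) otherwise.
sub lucy_token_texts {
    my ($text) = @_;

    # Load Lucy lazily so the sketch still runs on systems without it.
    return unless eval { require Lucy::Analysis::StandardTokenizer; 1 };

    my $tokenizer = Lucy::Analysis::StandardTokenizer->new;

    # Assumption: split() is inherited from Lucy::Analysis::Analyzer and
    # returns an array reference of token texts.
    return $tokenizer->split($text);
}

my $tokens = lucy_token_texts('three blind mice');
print $tokens ? join( '|', @$tokens ) . "\n" : "Lucy is not installed\n";
```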

INHERITANCE
Lucy::Analysis::StandardTokenizer isa Lucy::Analysis::Analyzer isa
Clownfish::Obj.

perl v5.36.0                         2022-07-22                         Lucy::Analysis::StandardTokenizer(3)