Lucy::Analysis::StandardTokenizer(3)  User Contributed Perl Documentation  Lucy::Analysis::StandardTokenizer(3)

NAME

       Lucy::Analysis::StandardTokenizer - Split a string into tokens.

SYNOPSIS

           my $tokenizer = Lucy::Analysis::StandardTokenizer->new;

           # Then... once you have a tokenizer, put it into a PolyAnalyzer.
           # ($normalizer and $stemmer are assumed to be analyzers created elsewhere.)
           my $polyanalyzer = Lucy::Analysis::PolyAnalyzer->new(
               analyzers => [ $tokenizer, $normalizer, $stemmer ],
           );

DESCRIPTION

       Generically, "tokenizing" is a process of breaking up a string into an
       array of "tokens".  For instance, the string "three blind mice" might
       be tokenized into "three", "blind", "mice".

       Lucy::Analysis::StandardTokenizer breaks up the text at the word
       boundaries defined in Unicode Standard Annex #29. It then returns those
       words that contain alphabetic or numeric characters.

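       The two steps described above (split at word boundaries, then keep the
       pieces containing letters or digits) can be approximated in plain Perl.
       This is only an illustrative sketch, not Lucy's implementation: it
       assumes Perl 5.22 or later for the "\b{wb}" assertion, whose rules are
       based on Unicode Standard Annex #29 but tailored by Perl, and the
       "tokenize" helper is a hypothetical name that is not part of Lucy.

```perl
use strict;
use warnings;
use v5.22;    # \b{wb} requires Perl 5.22 or later

# Hypothetical helper, not part of Lucy: split the text at Unicode word
# boundaries, then keep only the pieces that contain a letter or a digit.
sub tokenize {
    my ($text) = @_;
    my @pieces = split /\b{wb}/, $text;
    return grep { /[\p{L}\p{N}]/ } @pieces;
}

print join( "|", tokenize("three blind mice") ), "\n";    # three|blind|mice
```

       Whitespace and punctuation pieces are discarded by the filter, which
       is why only the three words remain.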

CONSTRUCTORS

   new
           my $tokenizer = Lucy::Analysis::StandardTokenizer->new;

       Constructor.  Takes no arguments.

METHODS

   transform
           my $inversion = $standard_tokenizer->transform($inversion);

       Takes a single Inversion as input and returns an Inversion, either the
       same one (presumably transformed in some way), or a new one.

       •   inversion - An inversion.

INHERITANCE

       Lucy::Analysis::StandardTokenizer isa Lucy::Analysis::Analyzer isa
       Clownfish::Obj.

perl v5.34.0                      2022-01-21      Lucy::Analysis::StandardTokenizer(3)