Lucy::Docs::Tutorial::AnalysisTutorial(3pm)

Lucy::Docs::Tutorial::AnalysisTutorial(3)  User Contributed Perl Documentation  Lucy::Docs::Tutorial::AnalysisTutorial(3)

NAME

       Lucy::Docs::Tutorial::AnalysisTutorial - How to choose and use
       Analyzers.

DESCRIPTION

       Try swapping out the EasyAnalyzer in our Schema for a
       StandardTokenizer:

           my $tokenizer = Lucy::Analysis::StandardTokenizer->new;
           my $type = Lucy::Plan::FullTextType->new(
               analyzer => $tokenizer,
           );

       Search for "senate", "Senate", and "Senator" before and after making
       the change and re-indexing.

       Under EasyAnalyzer, the results are identical for all three searches,
       but under StandardTokenizer, searches are case-sensitive, and the
       result sets for "Senate" and "Senator" are distinct.

   EasyAnalyzer
       What's happening is that EasyAnalyzer is performing more aggressive
       processing than StandardTokenizer.  In addition to tokenizing, it's
       also converting all text to lower case so that searches are
       case-insensitive, and using a "stemming" algorithm to reduce related
       words to a common stem ("senat", in this case).

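       One way to see the difference directly, without re-indexing, is to
       run the same text through both Analyzers and print the tokens each
       one produces.  The following is a minimal sketch that relies on the
       split() method of Lucy::Analysis::Analyzer; the sample text is made
       up for illustration and is not part of the tutorial corpus.

```perl
use strict;
use warnings;
use Lucy::Analysis::EasyAnalyzer;
use Lucy::Analysis::StandardTokenizer;

# Hypothetical sample text, chosen to mirror the three search terms above.
my $text = 'Senate Senator senate';

my $easy      = Lucy::Analysis::EasyAnalyzer->new( language => 'en' );
my $tokenizer = Lucy::Analysis::StandardTokenizer->new;

# split() returns an array reference holding the text of each token.
my $easy_tokens      = $easy->split($text);
my $tokenizer_tokens = $tokenizer->split($text);

print "EasyAnalyzer:      @$easy_tokens\n";
print "StandardTokenizer: @$tokenizer_tokens\n";
```

       If the two lines of output differ in case or in word endings, that
       difference is the extra processing described above.
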
       EasyAnalyzer is actually multiple Analyzers wrapped up in a single
       package.  In this case, it's three-in-one, since specifying an
       EasyAnalyzer with "language => 'en'" is equivalent to this snippet,
       which creates a PolyAnalyzer:

           my $tokenizer    = Lucy::Analysis::StandardTokenizer->new;
           my $normalizer   = Lucy::Analysis::Normalizer->new;
           my $stemmer      = Lucy::Analysis::SnowballStemmer->new( language => 'en' );
           my $polyanalyzer = Lucy::Analysis::PolyAnalyzer->new(
               analyzers => [ $tokenizer, $normalizer, $stemmer ],
           );

       You can add or subtract Analyzers from there if you like.  Try adding a
       fourth Analyzer, a SnowballStopFilter for suppressing "stopwords" like
       "the", "if", and "maybe".

           my $stopfilter = Lucy::Analysis::SnowballStopFilter->new(
               language => 'en',
           );
           my $polyanalyzer = Lucy::Analysis::PolyAnalyzer->new(
               analyzers => [ $tokenizer, $normalizer, $stopfilter, $stemmer ],
           );

       Also, try removing the SnowballStemmer.

           my $polyanalyzer = Lucy::Analysis::PolyAnalyzer->new(
               analyzers => [ $tokenizer, $normalizer ],
           );

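       To compare these variations side by side, each chain can be given the
       same input and its tokens printed.  This is only a sketch: the sample
       sentence and the variable names $no_stem and $with_stop are
       illustrative assumptions, and split() is used purely for inspection
       rather than for indexing.

```perl
use strict;
use warnings;
use Lucy::Analysis::StandardTokenizer;
use Lucy::Analysis::Normalizer;
use Lucy::Analysis::SnowballStemmer;
use Lucy::Analysis::SnowballStopFilter;
use Lucy::Analysis::PolyAnalyzer;

my $tokenizer  = Lucy::Analysis::StandardTokenizer->new;
my $normalizer = Lucy::Analysis::Normalizer->new;
my $stemmer    = Lucy::Analysis::SnowballStemmer->new( language => 'en' );
my $stopfilter = Lucy::Analysis::SnowballStopFilter->new( language => 'en' );

# Chain without stemming: tokenize and normalize only.
my $no_stem = Lucy::Analysis::PolyAnalyzer->new(
    analyzers => [ $tokenizer, $normalizer ],
);

# Chain with stopword removal placed before stemming.
my $with_stop = Lucy::Analysis::PolyAnalyzer->new(
    analyzers => [ $tokenizer, $normalizer, $stopfilter, $stemmer ],
);

# Hypothetical sample sentence, not taken from the tutorial corpus.
my $text = 'The senators voted if the bill passed';

my $no_stem_tokens   = $no_stem->split($text);
my $with_stop_tokens = $with_stop->split($text);

print "tokenize + normalize:  @$no_stem_tokens\n";
print "with stop filter/stem: @$with_stop_tokens\n";
```
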
       The original choice of a stock English EasyAnalyzer probably still
       yields the best results for this document collection, but you get the
       idea: sometimes you want a different Analyzer.

   When the best Analyzer is no Analyzer
       Sometimes you don't want an Analyzer at all.  That was true for our
       "url" field because we didn't need it to be searchable, but it's also
       true for certain types of searchable fields.  For instance, "category"
       fields are often set up to match exactly or not at all, as are fields
       like "last_name" (because you may not want to conflate results for
       "Humphrey" and "Humphries").

       To specify that there should be no analysis performed at all, use
       StringType:

           my $type = Lucy::Plan::StringType->new;
           $schema->spec_field( name => 'category', type => $type );

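       Putting the pieces together, a Schema can mix analyzed and unanalyzed
       fields.  The sketch below assumes three fields named "content",
       "category", and "url"; the exact field set of your own Schema may
       differ, and the "indexed => 0" setting for "url" is an assumption
       about how a non-searchable field would be declared.

```perl
use strict;
use warnings;
use Lucy::Plan::Schema;
use Lucy::Plan::FullTextType;
use Lucy::Plan::StringType;
use Lucy::Analysis::EasyAnalyzer;

my $schema = Lucy::Plan::Schema->new;

# Analyzed full-text field: tokenized, case-folded, and stemmed.
my $easyanalyzer  = Lucy::Analysis::EasyAnalyzer->new( language => 'en' );
my $fulltext_type = Lucy::Plan::FullTextType->new(
    analyzer => $easyanalyzer,
);

# Unanalyzed but searchable: matches exactly or not at all.
my $category_type = Lucy::Plan::StringType->new;

# Unanalyzed and not searchable (assumed setup for the "url" field).
my $url_type = Lucy::Plan::StringType->new( indexed => 0 );

$schema->spec_field( name => 'content',  type => $fulltext_type );
$schema->spec_field( name => 'category', type => $category_type );
$schema->spec_field( name => 'url',      type => $url_type );

# List the registered field names as a quick sanity check.
my $fields = $schema->all_fields;
print join( ', ', sort @$fields ), "\n";
```
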
   Highlighting up next
       In our next tutorial chapter, HighlighterTutorial, we'll add
       highlighted excerpts from the "content" field to our search results.

perl v5.32.1                      2021-01-27  Lucy::Docs::Tutorial::AnalysisTutorial(3)