Lucy::Index::Similarity(3)  User Contributed Perl Documentation  Lucy::Index::Similarity(3)

NAME

       Lucy::Index::Similarity - Judge how well a document matches a query.

SYNOPSIS

           package MySimilarity;
           use base qw( Lucy::Index::Similarity );

           sub length_norm { return 1.0 }    # disable length normalization

           package MyFullTextType;
           use base qw( Lucy::Plan::FullTextType );

           sub make_similarity { MySimilarity->new }

DESCRIPTION

       After determining whether a document matches a given query, a score
       must be calculated which indicates how well the document matches the
       query.  The Similarity class is used to judge how "similar" the query
       and the document are to each other; the closer the resemblance, the
       higher the document scores.

       The default implementation uses Lucene's modified cosine similarity
       measure.  Subclasses might tweak the existing algorithms, or might be
       used in conjunction with custom Query subclasses to implement arbitrary
       scoring schemes.

       Most of the methods operate on single fields, but some are used to
       combine scores from multiple fields.

CONSTRUCTORS

   new
           my $sim = Lucy::Index::Similarity->new;

       Constructor. Takes no arguments.

METHODS

   length_norm
           my $float = $similarity->length_norm($num_tokens);

       Dampen the scores of long documents.

       After a field is broken up into terms at index-time, each term must be
       assigned a weight.  One of the factors in calculating this weight is
       the number of tokens that the original field was broken into.

       Typically, we assume that the more tokens in a field, the less
       important any one of them is, so that, e.g., 5 mentions of "Kafka" in a
       short article are given more heft than 5 mentions of "Kafka" in an
       entire book.  The default implementation of length_norm expresses this
       using an inverted square root.

       However, the inverted square root has a tendency to reward very short
       fields highly, which isn't always appropriate for fields you expect to
       have a lot of tokens on average.
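
       The trade-off described above can be sketched in plain Perl, without
       loading Lucy.  This is an illustration only: it assumes that "inverted
       square root" means 1 / sqrt($num_tokens), and the helper names
       inverse_sqrt_norm and floored_norm, the zero-token handling, and the
       100-token floor are all hypothetical choices made for this example.

```perl
use strict;
use warnings;

# Assumed reading of "inverted square root": 1 / sqrt(number of tokens).
# Returning 0 for an empty field is an illustrative choice, not taken
# from the Lucy documentation.
sub inverse_sqrt_norm {
    my ($num_tokens) = @_;
    return 0 if $num_tokens <= 0;
    return 1 / sqrt($num_tokens);
}

# Hypothetical variant for fields expected to be long: every field
# shorter than $floor tokens is treated as if it had $floor tokens,
# so very short fields are no longer rewarded.
sub floored_norm {
    my ( $num_tokens, $floor ) = @_;
    $floor = 100 unless defined $floor;
    $num_tokens = $floor if $num_tokens < $floor;
    return 1 / sqrt($num_tokens);
}

# Compare the two curves for a short, a medium, and a long field.
for my $n ( 4, 100, 10_000 ) {
    printf "%6d tokens: inverse sqrt %.4f, floored %.4f\n",
        $n, inverse_sqrt_norm($n), floored_norm($n);
}
```

       Under these assumptions a 4-token field receives five times the
       weight of a 100-token field with the plain inverse square root, while
       the floored variant weights them equally.  To try such a curve in a
       real index, the body of floored_norm could be moved into a
       length_norm method of a Similarity subclass, with $num_tokens taken
       as the second argument after $self.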

INHERITANCE

       Lucy::Index::Similarity isa Clownfish::Obj.
perl v5.34.0                      2021-07-22        Lucy::Index::Similarity(3)