Lucy::Analysis::Token(3)  User Contributed Perl Documentation  Lucy::Analysis::Token(3)

NAME

       Lucy::Analysis::Token - Unit of text.

SYNOPSIS

               my $token = Lucy::Analysis::Token->new(
                   text         => 'blind',
                   start_offset => 8,
                   end_offset   => 13,
               );

               $token->set_text('mice');

DESCRIPTION

       Token is the fundamental unit used by Apache Lucy’s Analyzer
       subclasses.  Each Token has 5 attributes: "text", "start_offset",
       "end_offset", "boost", and "pos_inc".

       The "text" attribute is a Unicode string encoded as UTF-8.

       "start_offset" is the start point of the token text, measured in
       Unicode code points from the top of the stored field; "end_offset"
       delimits the corresponding closing boundary.  "start_offset" and
       "end_offset" locate the Token within a larger context, even if the
       Token’s text attribute gets modified – by stemming, for instance.  The
       Token for “beating” in the text “beating a dead horse” begins life with
       a start_offset of 0 and an end_offset of 7; after stemming, the text is
       “beat”, but the start_offset is still 0 and the end_offset is still 7.
       This allows “beating” to be highlighted correctly after a search
       matches “beat”.

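       The offset behaviour described above can be sketched in plain Perl.
       This is a minimal illustration, not Lucy code: the %token hash and the
       highlight() helper are hypothetical stand-ins (a real Token is an
       object with accessor methods), and end_offset is assumed to be
       exclusive, as the 0 and 7 figures for “beating” imply.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a Token: a plain hash using the attribute names
# from this page.  A real Lucy::Analysis::Token is an object.
my $field = 'beating a dead horse';
my %token = (text => 'beating', start_offset => 0, end_offset => 7);

# A stemmer rewrites the text but leaves the offsets untouched.
$token{text} = 'beat';

# Hypothetical helper: wrap the original span in <b> tags.  Offsets count
# code points, so substr() on a character string recovers the span.
# Assumes end_offset is exclusive.
sub highlight {
    my ($field, $start, $end) = @_;
    my $len  = $end - $start;
    my $span = substr($field, $start, $len);
    substr($field, $start, $len, "<b>$span</b>");   # edits our local copy
    return $field;
}

my $highlighted = highlight($field, $token{start_offset}, $token{end_offset});
print "$highlighted\n";   # <b>beating</b> a dead horse
```

       Because the offsets still describe the original span, the full word
       “beating” is wrapped even though the token text is now “beat”.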
       "boost" is a per-token weight.  Use this when you want to assign more
       or less importance to a particular token, as you might for emboldened
       text within an HTML document, for example.  (Note: The field this token
       belongs to must be spec’d to use a posting of type RichPosting.)

       "pos_inc" is the POSition INCrement, measured in Tokens.  This
       attribute, which defaults to 1, is an advanced tool for manipulating
       phrase matching.  Ordinarily, Tokens are assigned consecutive position
       numbers: 0, 1, and 2 for "three blind mice".  However, if you set the
       position increment for “blind” to, say, 1000, then the three tokens
       will end up assigned to positions 0, 1, and 1001 – and will no longer
       produce a phrase match for the query "three blind mice".
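       The position numbers quoted above are consistent with a simple rule:
       each token takes the current position, and the counter then advances
       by that token’s pos_inc.  The sketch below assumes that rule; the
       assign_positions() helper and the plain hashes are hypothetical
       illustrations, not part of the Lucy API.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Lucy: give each token the current
# position, then advance the counter by that token's pos_inc.
sub assign_positions {
    my @tokens = @_;
    my $pos = 0;
    my @positions;
    for my $token (@tokens) {
        push @positions, $pos;
        $pos += $token->{pos_inc} // 1;   # pos_inc defaults to 1
    }
    return @positions;
}

my @default = assign_positions(
    { text => 'three' },
    { text => 'blind' },
    { text => 'mice'  },
);

my @gapped = assign_positions(
    { text => 'three' },
    { text => 'blind', pos_inc => 1000 },
    { text => 'mice'  },
);

print "@default\n";   # 0 1 2
print "@gapped\n";    # 0 1 1001
```

       Under this rule the large increment on “blind” pushes “mice” to
       position 1001, so “blind” and “mice” are no longer adjacent and the
       phrase query fails.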

CONSTRUCTORS

   new
           my $token = Lucy::Analysis::Token->new(
               text         => $text,          # required
               start_offset => $start_offset,  # required
               end_offset   => $end_offset,    # required
               boost        => 1.0,            # optional
               pos_inc      => 1,              # optional
           );

       •   text - A string.

       •   start_offset - Start offset into the original document in Unicode
           code points.

       •   end_offset - End offset into the original document in Unicode
           code points.

       •   boost - Per-token weight.

       •   pos_inc - Position increment for phrase matching.

METHODS

   get_text
           my $text = $token->get_text;

       Get the token's text.

   set_text
           $token->set_text($text);

       Set the token's text.

   get_start_offset
           my $int = $token->get_start_offset();

   get_end_offset
           my $int = $token->get_end_offset();

   get_boost
           my $float = $token->get_boost();

   get_pos_inc
           my $int = $token->get_pos_inc();

   get_len
           my $int = $token->get_len();

INHERITANCE

       Lucy::Analysis::Token isa Clownfish::Obj.

perl v5.36.0                      2023-01-20          Lucy::Analysis::Token(3)