KinoSearch1::Analysis::Token(3)  User Contributed Perl Documentation  KinoSearch1::Analysis::Token(3)

KinoSearch1::Analysis::Token - unit of text

    # private class - no public API

You can't actually instantiate a Token object at the Perl level --
however, you can affect individual Tokens within a TokenBatch by way of
TokenBatch's (experimental) API.

Token is the fundamental unit used by KinoSearch1's Analyzer
subclasses. Each Token has 4 attributes: text, start_offset,
end_offset, and pos_inc (for position increment).

The text of a token is a string.

A Token's start_offset and end_offset locate it within a larger text,
even if the Token's text attribute gets modified -- by stemming, for
instance. The Token for "beating" in the text "beating a dead horse"
begins life with a start_offset of 0 and an end_offset of 7; after
stemming, the text is "beat", but the end_offset is still 7.
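The offset behavior described above can be sketched in plain Perl. This is an illustration only, not KinoSearch1 code: since Tokens cannot be instantiated from Perl, each token is modeled here as an ordinary hash carrying the four attribute names, and toy_stem() is a hypothetical stand-in that merely strips a trailing "ing" rather than running a real stemmer.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a Token: a plain hash with the four attributes
# named above. Real Tokens live inside a TokenBatch and cannot be built
# like this.
my $source = "beating a dead horse";

my @tokens;
while ( $source =~ /(\w+)/g ) {
    push @tokens, {
        text         => $1,
        start_offset => $-[1],    # offset where the captured word starts
        end_offset   => $+[1],    # offset just past the end of the word
        pos_inc      => 1,        # default position increment
    };
}

# Toy "stemmer" (an assumption for illustration): strips a trailing "ing".
# Only the text attribute is touched; the offsets are left alone.
sub toy_stem {
    my ($token) = @_;
    $token->{text} =~ s/ing\z//;
    return $token;
}

toy_stem($_) for @tokens;

# The unchanged offsets still recover the original word from the source.
for my $t (@tokens) {
    my $length = $t->{end_offset} - $t->{start_offset};
    printf "%-6s %2d %2d  source: '%s'\n",
        $t->{text}, $t->{start_offset}, $t->{end_offset},
        substr( $source, $t->{start_offset}, $length );
}
```

After the toy stemming step, the first token's text is "beat" while its offsets remain 0 and 7, so the original "beating" can still be located in the source text, for example when highlighting a match.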

The position increment, which defaults to 1, is an advanced tool for
manipulating phrase matching. Ordinarily, Tokens are assigned
consecutive position numbers: 0, 1, and 2 for "three blind mice".
However, if you set the position increment for "blind" to, say, 1000,
then the three tokens will end up assigned to positions 0, 1, and 1001
-- and will no longer produce a phrase match for the query '"three
blind mice"'.
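The position arithmetic in the example above can be sketched as follows. This is a minimal model inferred from the numbers in the text, not KinoSearch1's implementation: assign_positions() and is_phrase_match() are hypothetical helpers, and the phrase test is simplified to "each term sits exactly one position after the previous term".

```perl
use strict;
use warnings;

# Assign positions the way the example above reads: each token takes the
# current position, and its pos_inc then advances the counter for the
# token that follows. (An assumption drawn from the 0, 1, 1001 example.)
sub assign_positions {
    my @pos_incs = @_;
    my $position = 0;
    my @positions;
    for my $inc (@pos_incs) {
        push @positions, $position;
        $position += $inc;
    }
    return @positions;
}

# Simplified phrase test: every term must sit exactly one position after
# the term before it.
sub is_phrase_match {
    my @positions = @_;
    for my $i ( 1 .. $#positions ) {
        return 0 unless $positions[$i] == $positions[ $i - 1 ] + 1;
    }
    return 1;
}

# Tokens in order: three, blind, mice.
my @default  = assign_positions( 1, 1,    1 );
my @modified = assign_positions( 1, 1000, 1 );    # pos_inc of "blind" raised

print "default:  @default  match=",  is_phrase_match(@default),  "\n";
print "modified: @modified match=", is_phrase_match(@modified), "\n";
```

With the default increments the positions come out as 0, 1, 2 and the simplified phrase test succeeds; with the increment for "blind" raised to 1000 they come out as 0, 1, 1001 and the test fails, because "mice" no longer directly follows "blind".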

Copyright 2006-2010 Marvin Humphrey

See KinoSearch1 version 1.01.

perl v5.36.0                      2023-01-20    KinoSearch1::Analysis::Token(3)