KinoSearch1::Analysis::Token(3pm)

KinoSearch1::Analysis::Token(3)  User Contributed Perl Documentation KinoSearch1::Analysis::Token(3)

NAME

       KinoSearch1::Analysis::Token - unit of text

SYNOPSIS

           # private class - no public API

PRIVATE CLASS

       You can't instantiate a Token object at the Perl level.  However, you
       can affect individual Tokens within a TokenBatch by way of
       TokenBatch's (experimental) API.

DESCRIPTION

       Token is the fundamental unit used by KinoSearch1's Analyzer
       subclasses.  Each Token has four attributes: text, start_offset,
       end_offset, and pos_inc (for position increment).

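       Because Token has no public Perl API, the following is only a
       plain-Perl sketch of the four attributes, assuming a hash as a
       stand-in for a Token.  The make_token() helper and the hash layout
       are hypothetical and are not part of KinoSearch1.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a Token: a plain hash holding the four
# attributes named above. This is NOT KinoSearch1's API; real Token
# objects cannot be created from Perl.
sub make_token {
    my (%args) = @_;
    return {
        text         => $args{text},
        start_offset => $args{start_offset},
        end_offset   => $args{end_offset},
        pos_inc      => $args{pos_inc} // 1,    # position increment defaults to 1
    };
}

my $token = make_token( text => 'beating', start_offset => 0, end_offset => 7 );

# Hash slice over the hashref: text, start, end, increment.
printf "%s %d %d %d\n",
    @{$token}{qw(text start_offset end_offset pos_inc)};    # prints "beating 0 7 1"
```

       Using a hash keeps the sketch dependency-free; it says nothing about
       how KinoSearch1 stores Tokens internally.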
       The text of a token is a string.

       A Token's start_offset and end_offset locate it within a larger text,
       even if the Token's text attribute gets modified, for instance by
       stemming.  The Token for "beating" in the text "beating a dead horse"
       begins life with a start_offset of 0 and an end_offset of 7; after
       stemming, the text is "beat", but the end_offset is still 7.

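       The offset behavior described above can be sketched in plain Perl.
       The hash-based token and the toy suffix-stripping "stemmer" are
       hypothetical illustrations, not KinoSearch1 code; the point is only
       that rewriting the text leaves the offsets pointing at the original
       word in the source string.

```perl
use strict;
use warnings;

my $source = 'beating a dead horse';

# Token for "beating" (hypothetical plain-hash stand-in, not a real
# KinoSearch1 object).
my %token = ( text => 'beating', start_offset => 0, end_offset => 7, pos_inc => 1 );

# Toy "stemmer" for illustration only: strip a trailing "ing".
# Real stemmers (e.g. Snowball) are far more involved.
( my $stemmed = $token{text} ) =~ s/ing\z//;
$token{text} = $stemmed;

# The offsets were not touched, so they still locate the original word.
my $original = substr( $source, $token{start_offset},
    $token{end_offset} - $token{start_offset} );

print "$token{text} <- $original\n";    # prints "beat <- beating"
```

       Note that end_offset (7) is one past the last character of
       "beating", which is why end_offset - start_offset gives the length.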
       The position increment, which defaults to 1, is an advanced tool for
       manipulating phrase matching.  Ordinarily, Tokens are assigned
       consecutive position numbers: 0, 1, and 2 for "three blind mice".
       However, if you set the position increment for "blind" to, say, 1000,
       then the three tokens will end up assigned to positions 0, 1, and 1001,
       and will no longer produce a phrase match for the query '"three
       blind mice"'.

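       A rough sketch of how position increments could turn into
       positions.  It assumes each token is recorded at the running
       position, which then advances by that token's pos_inc; this
       assumption is inferred from the numbers in the example above (0, 1,
       and 1001), not taken from KinoSearch1's source.  The phrase check is
       likewise a simplified, hypothetical stand-in for real phrase
       matching.

```perl
use strict;
use warnings;

# Each token is [text, pos_inc]. ASSUMPTION: a token is recorded at the
# running position, and the running position then advances by that
# token's pos_inc. This reproduces the numbers in the text above; it
# is not KinoSearch1's actual indexing code.
sub positions {
    my @tokens   = @_;
    my $position = 0;
    my @out;
    for my $token (@tokens) {
        push @out, $position;
        $position += $token->[1];
    }
    return @out;
}

# Simplified phrase check: every term must sit exactly one position
# after the previous one.
sub is_consecutive {
    my @pos = @_;
    for my $i ( 1 .. $#pos ) {
        return 0 unless $pos[$i] == $pos[ $i - 1 ] + 1;
    }
    return 1;
}

my @ordinary = positions( [ three => 1 ], [ blind => 1 ],    [ mice => 1 ] );
my @gapped   = positions( [ three => 1 ], [ blind => 1000 ], [ mice => 1 ] );

print "@ordinary\n";    # prints "0 1 2"
print "@gapped\n";      # prints "0 1 1001"
print is_consecutive(@ordinary) ? "match\n" : "no match\n";    # prints "match"
print is_consecutive(@gapped)   ? "match\n" : "no match\n";    # prints "no match"
```

       The large increment opens a gap between "blind" and "mice", so the
       three terms are no longer adjacent and the phrase no longer matches.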

COPYRIGHT

       Copyright 2006-2010 Marvin Humphrey

LICENSE, DISCLAIMER, BUGS, etc.

       See KinoSearch1 version 1.00.
perl v5.12.2                      2010-10-05   KinoSearch1::Analysis::Token(3)