Lucy::Index::Indexer(3)    User Contributed Perl Documentation    Lucy::Index::Indexer(3)

NAME
Lucy::Index::Indexer - Build inverted indexes.

SYNOPSIS
    my $indexer = Lucy::Index::Indexer->new(
        schema => $schema,
        index  => '/path/to/index',
        create => 1,
    );
    while ( my ( $title, $content ) = each %source_docs ) {
        $indexer->add_doc({
            title   => $title,
            content => $content,
        });
    }
    $indexer->commit;

DESCRIPTION
The Indexer class is Apache Lucy’s primary tool for managing the
content of inverted indexes, which may later be searched using
IndexSearcher.

In general, only one Indexer at a time may write to an index safely.
If a write lock cannot be secured, new() will throw an exception.
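
Because new() throws when it cannot obtain the write lock, callers that expect contention may want to trap the exception. The following is a minimal sketch, not part of the Lucy API: try_open_indexer() is a hypothetical helper name, and it does not distinguish a lock failure from other constructor errors such as a missing index.

```perl
use strict;
use warnings;
use Lucy::Index::Indexer;

# Hypothetical helper: returns an Indexer, or undef if new() threw
# for any reason (lock contention, missing index, and so on).
sub try_open_indexer {
    my ($index_path) = @_;
    my $indexer = eval { Lucy::Index::Indexer->new( index => $index_path ) };
    if ( !$indexer ) {
        warn "Could not open '$index_path' for writing: $@";
        return;
    }
    return $indexer;
}

# '/path/to/index' is a placeholder path.
my $indexer = try_open_indexer('/path/to/index');
```

A caller can then retry later or report the failure instead of dying.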

If an index is located on a shared volume, each writer application must
identify itself by supplying an IndexManager with a unique "host" id to
Indexer’s constructor or index corruption will occur. See FileLocking
for a detailed discussion.
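
As a sketch of the shared-volume setup described above, the machine's hostname can serve as the "host" id, assuming hostnames are unique among the machines that write to the index. The helper name open_shared_indexer() is hypothetical.

```perl
use strict;
use warnings;
use Sys::Hostname qw( hostname );
use Lucy::Index::IndexManager;
use Lucy::Index::Indexer;

# Assumption: the hostname is unique among all writers of this index.
my $hostname = hostname() or die "Can't get unique hostname";
my $manager  = Lucy::Index::IndexManager->new( host => $hostname );

# Hypothetical helper: open an Indexer that identifies itself by host.
sub open_shared_indexer {
    my ($index_path) = @_;
    return Lucy::Index::Indexer->new(
        index   => $index_path,
        manager => $manager,
    );
}
```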

Note: at present, delete_by_term() and delete_by_query() only affect
documents that were previously committed to the index, not documents
added during the current indexing session but not yet committed. This
may change in a future update.
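
One consequence of this behavior is a simple "replace a document" pattern: add the new version and delete the old one by a unique key in the same session. The sketch below assumes an in-memory index (Lucy::Store::RAMFolder) and a hypothetical schema with a unique "id" field and a "rev" field.

```perl
use strict;
use warnings;
use Lucy::Plan::Schema;
use Lucy::Plan::StringType;
use Lucy::Store::RAMFolder;
use Lucy::Index::Indexer;
use Lucy::Index::IndexReader;

# Hypothetical schema: "id" is a unique key, "rev" a revision label.
my $schema = Lucy::Plan::Schema->new;
my $string = Lucy::Plan::StringType->new;
$schema->spec_field( name => 'id',  type => $string );
$schema->spec_field( name => 'rev', type => $string );
my $folder = Lucy::Store::RAMFolder->new;

# Session 1: commit the original version.
my $indexer = Lucy::Index::Indexer->new( index => $folder, schema => $schema );
$indexer->add_doc( { id => '42', rev => 'old' } );
$indexer->commit;

# Session 2: add the replacement and delete by key. Per the note above,
# the deletion should only hit the copy committed in session 1.
$indexer = Lucy::Index::Indexer->new( index => $folder );
$indexer->add_doc( { id => '42', rev => 'new' } );
$indexer->delete_by_term( field => 'id', term => '42' );
$indexer->commit;

my $reader     = Lucy::Index::IndexReader->open( index => $folder );
my $live_count = $reader->doc_count;
print "live documents: $live_count\n";
```

If the behavior changes in a future release as the note warns, this pattern would need to delete before adding, or use separate sessions.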

CONSTRUCTORS
new
    my $indexer = Lucy::Index::Indexer->new(
        schema   => $schema,             # required at index creation
        index    => '/path/to/index',    # required
        create   => 1,                   # default: 0
        truncate => 1,                   # default: 0
        manager  => $manager             # default: created internally
    );

• schema - A Schema. Required when index is being created; if not
  supplied, will be extracted from the index folder.

• index - Either a filepath to an index or a Folder.

• create - If true and the index directory does not exist, attempt to
  create it.

• truncate - If true, proceed with the intention of discarding all
  previous indexing data. The old data will remain intact and
  visible until commit() succeeds.

• manager - An IndexManager.
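
The interplay of "schema" and "truncate" can be sketched with three sessions against one index. This example assumes an in-memory Folder (Lucy::Store::RAMFolder) passed as "index" and a hypothetical one-field schema; with a filesystem path, "create => 1" would also be needed in the first session.

```perl
use strict;
use warnings;
use Lucy::Plan::Schema;
use Lucy::Plan::StringType;
use Lucy::Store::RAMFolder;
use Lucy::Index::Indexer;
use Lucy::Index::IndexReader;

# Hypothetical schema with a single "id" field.
my $schema = Lucy::Plan::Schema->new;
$schema->spec_field( name => 'id', type => Lucy::Plan::StringType->new );
my $folder = Lucy::Store::RAMFolder->new;

# Session 1: the index is new, so a schema must be supplied.
my $indexer = Lucy::Index::Indexer->new( index => $folder, schema => $schema );
$indexer->add_doc( { id => 'a' } );
$indexer->commit;

# Session 2: no schema given; it is read back from the index.
$indexer = Lucy::Index::Indexer->new( index => $folder );
$indexer->add_doc( { id => 'b' } );
$indexer->commit;

# Session 3: truncate discards the earlier content once commit() succeeds.
$indexer = Lucy::Index::Indexer->new( index => $folder, truncate => 1 );
$indexer->add_doc( { id => 'c' } );
$indexer->commit;

my $doc_count = Lucy::Index::IndexReader->open( index => $folder )->doc_count;
print "documents after truncate: $doc_count\n";
```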

METHODS
add_doc
    $indexer->add_doc($doc);
    $indexer->add_doc( { field_name => $field_value } );
    $indexer->add_doc(
        doc   => { field_name => $field_value },
        boost => 2.5,    # default: 1.0
    );

Add a document to the index. Accepts either a single argument or
labeled params.

• doc - Either a Lucy::Document::Doc object, or a hashref (which will
  be attached to a Lucy::Document::Doc object internally).

• boost - A floating point weight which affects how this document
  scores.
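
The three calling conventions can be exercised side by side. This is a sketch under assumptions: the "title" field, the English EasyAnalyzer, and the in-memory RAMFolder are illustrative choices, not requirements of add_doc().

```perl
use strict;
use warnings;
use Lucy::Plan::Schema;
use Lucy::Plan::FullTextType;
use Lucy::Analysis::EasyAnalyzer;
use Lucy::Document::Doc;
use Lucy::Store::RAMFolder;
use Lucy::Index::Indexer;
use Lucy::Index::IndexReader;

# Hypothetical schema: one full-text "title" field.
my $schema = Lucy::Plan::Schema->new;
my $type   = Lucy::Plan::FullTextType->new(
    analyzer => Lucy::Analysis::EasyAnalyzer->new( language => 'en' ),
);
$schema->spec_field( name => 'title', type => $type );

my $folder  = Lucy::Store::RAMFolder->new;
my $indexer = Lucy::Index::Indexer->new( index => $folder, schema => $schema );

# 1. A plain hashref.
$indexer->add_doc( { title => 'Plain hashref' } );

# 2. An explicit Doc object.
my $doc = Lucy::Document::Doc->new( fields => { title => 'Doc object' } );
$indexer->add_doc($doc);

# 3. Labeled params, with a boost above the default of 1.0.
$indexer->add_doc(
    doc   => { title => 'Boosted document' },
    boost => 2.5,
);
$indexer->commit;

my $doc_count = Lucy::Index::IndexReader->open( index => $folder )->doc_count;
print "documents indexed: $doc_count\n";
```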

add_index
    $indexer->add_index($index);

Absorb an existing index into this one. The two indexes must have
matching Schemas.

• index - Either an index path name or a Folder.
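
A minimal sketch of merging two indexes follows. It assumes both indexes are in-memory RAMFolders built from the same hypothetical one-field schema, which satisfies the matching-Schemas requirement; build_index() is an illustrative helper, not a Lucy function.

```perl
use strict;
use warnings;
use Lucy::Plan::Schema;
use Lucy::Plan::StringType;
use Lucy::Store::RAMFolder;
use Lucy::Index::Indexer;
use Lucy::Index::IndexReader;

# One schema shared by both indexes.
my $schema = Lucy::Plan::Schema->new;
$schema->spec_field( name => 'id', type => Lucy::Plan::StringType->new );

# Hypothetical helper: build a small in-memory index from a list of ids.
sub build_index {
    my @ids     = @_;
    my $folder  = Lucy::Store::RAMFolder->new;
    my $indexer = Lucy::Index::Indexer->new( index => $folder, schema => $schema );
    $indexer->add_doc( { id => $_ } ) for @ids;
    $indexer->commit;
    return $folder;
}

my $main_folder  = build_index(qw( a b ));
my $other_folder = build_index(qw( c d e ));

# Absorb the second index into the first.
my $indexer = Lucy::Index::Indexer->new( index => $main_folder );
$indexer->add_index($other_folder);
$indexer->commit;

my $doc_count = Lucy::Index::IndexReader->open( index => $main_folder )->doc_count;
print "documents after merge: $doc_count\n";
```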

delete_by_term
    $indexer->delete_by_term(
        field => $field,    # required
        term  => $term,     # required
    );

Mark documents which contain the supplied term as deleted, so that they
will be excluded from search results and eventually removed altogether.
The change is not apparent to search apps until after commit()
succeeds.

• field - The name of an indexed field. If the field is not specified
  as "indexed", an error will occur.

• term - The term which identifies docs to be marked as deleted. If
  "field" is associated with an Analyzer, "term" will be processed
  automatically (so don’t pre-process it yourself).
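
To illustrate the automatic term processing, the sketch below deletes by a term whose case differs from the indexed text. It assumes a hypothetical "content" field using the English EasyAnalyzer and an in-memory RAMFolder; the raw term is passed as-is and the field's Analyzer is left to process it.

```perl
use strict;
use warnings;
use Lucy::Plan::Schema;
use Lucy::Plan::FullTextType;
use Lucy::Analysis::EasyAnalyzer;
use Lucy::Store::RAMFolder;
use Lucy::Index::Indexer;
use Lucy::Index::IndexReader;

# Hypothetical schema: an analyzed full-text "content" field.
my $schema = Lucy::Plan::Schema->new;
my $type   = Lucy::Plan::FullTextType->new(
    analyzer => Lucy::Analysis::EasyAnalyzer->new( language => 'en' ),
);
$schema->spec_field( name => 'content', type => $type );
my $folder = Lucy::Store::RAMFolder->new;

# Session 1: commit two documents.
my $indexer = Lucy::Index::Indexer->new( index => $folder, schema => $schema );
$indexer->add_doc( { content => 'Obsolete draft' } );
$indexer->add_doc( { content => 'Final report' } );
$indexer->commit;

# Session 2: pass the raw term; the field's Analyzer processes it.
$indexer = Lucy::Index::Indexer->new( index => $folder );
$indexer->delete_by_term( field => 'content', term => 'OBSOLETE' );
$indexer->commit;

my $doc_count = Lucy::Index::IndexReader->open( index => $folder )->doc_count;
print "documents remaining: $doc_count\n";
```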

delete_by_query
    $indexer->delete_by_query($query);

Mark documents which match the supplied Query as deleted.

• query - A Query.
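
Any Query object can drive the deletion. As a sketch, the example below uses a Lucy::Search::TermQuery against a hypothetical "category" field in an in-memory index; other Query subclasses should work the same way.

```perl
use strict;
use warnings;
use Lucy::Plan::Schema;
use Lucy::Plan::StringType;
use Lucy::Store::RAMFolder;
use Lucy::Index::Indexer;
use Lucy::Index::IndexReader;
use Lucy::Search::TermQuery;

# Hypothetical schema: an unanalyzed "category" field.
my $schema = Lucy::Plan::Schema->new;
$schema->spec_field( name => 'category', type => Lucy::Plan::StringType->new );
my $folder = Lucy::Store::RAMFolder->new;

# Session 1: commit three documents.
my $indexer = Lucy::Index::Indexer->new( index => $folder, schema => $schema );
$indexer->add_doc( { category => $_ } ) for qw( spam ham spam );
$indexer->commit;

# Session 2: delete everything in the "spam" category.
my $query = Lucy::Search::TermQuery->new( field => 'category', term => 'spam' );
$indexer = Lucy::Index::Indexer->new( index => $folder );
$indexer->delete_by_query($query);
$indexer->commit;

my $doc_count = Lucy::Index::IndexReader->open( index => $folder )->doc_count;
print "documents remaining: $doc_count\n";
```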

delete_by_doc_id
    $indexer->delete_by_doc_id($doc_id);

Mark the document identified by the supplied document ID as deleted.

• doc_id - A document id.
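
One way to obtain a document id is from a search hit. The sketch below assumes an in-memory index with a hypothetical "id" field, and it looks up the hit immediately before opening the Indexer, on the assumption that internal document ids are only meaningful for the index state they were read from.

```perl
use strict;
use warnings;
use Lucy::Plan::Schema;
use Lucy::Plan::StringType;
use Lucy::Store::RAMFolder;
use Lucy::Index::Indexer;
use Lucy::Index::IndexReader;
use Lucy::Search::IndexSearcher;
use Lucy::Search::TermQuery;

# Hypothetical schema with a single "id" field.
my $schema = Lucy::Plan::Schema->new;
$schema->spec_field( name => 'id', type => Lucy::Plan::StringType->new );
my $folder = Lucy::Store::RAMFolder->new;

my $indexer = Lucy::Index::Indexer->new( index => $folder, schema => $schema );
$indexer->add_doc( { id => $_ } ) for qw( a b c );
$indexer->commit;

# Find the internal document id of the document whose "id" field is "b".
my $searcher = Lucy::Search::IndexSearcher->new( index => $folder );
my $hits     = $searcher->hits(
    query => Lucy::Search::TermQuery->new( field => 'id', term => 'b' ),
);
my $hit    = $hits->next or die "document 'b' not found";
my $doc_id = $hit->get_doc_id;

# Delete that document by its internal id.
$indexer = Lucy::Index::Indexer->new( index => $folder );
$indexer->delete_by_doc_id($doc_id);
$indexer->commit;

my $doc_count = Lucy::Index::IndexReader->open( index => $folder )->doc_count;
print "documents remaining: $doc_count\n";
```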

optimize
    $indexer->optimize();

Optimize the index for search-time performance. This may take a while,
as it can involve rewriting large amounts of data.

Every Indexer session which changes index content and ends in a
commit() creates a new segment. Once written, segments are never
modified. However, they are periodically recycled by feeding their
content into the segment currently being written.

The optimize() method causes all existing index content to be fed back
into the Indexer. When commit() completes after an optimize(), the
index will consist of one segment. So optimize() must be called before
commit(). Also, optimizing a fresh index created from scratch has no
effect.

Historically, there was a significant search-time performance benefit
to collapsing down to a single segment versus even two segments. Now
the effect of collapsing is much less significant, and calling
optimize() is rarely justified.
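
The effect on segment count can be observed through an IndexReader. This sketch assumes an in-memory index and a hypothetical "id" field, and it counts segments via IndexReader's seg_readers(). The segment count before optimize() is not asserted, since ordinary sessions may already recycle small segments.

```perl
use strict;
use warnings;
use Lucy::Plan::Schema;
use Lucy::Plan::StringType;
use Lucy::Store::RAMFolder;
use Lucy::Index::Indexer;
use Lucy::Index::IndexReader;

# Hypothetical schema with a single "id" field.
my $schema = Lucy::Plan::Schema->new;
$schema->spec_field( name => 'id', type => Lucy::Plan::StringType->new );
my $folder = Lucy::Store::RAMFolder->new;

# Three separate sessions, each ending in a commit.
for my $n ( 1 .. 3 ) {
    my $indexer = Lucy::Index::Indexer->new( index => $folder, schema => $schema );
    $indexer->add_doc( { id => "doc$n" } );
    $indexer->commit;
}

# A final session that only optimizes; optimize() comes before commit().
my $indexer = Lucy::Index::Indexer->new( index => $folder );
$indexer->optimize;
$indexer->commit;

my $reader    = Lucy::Index::IndexReader->open( index => $folder );
my $seg_count = scalar @{ $reader->seg_readers };
my $doc_count = $reader->doc_count;
print "segments: $seg_count, documents: $doc_count\n";
```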

commit
    $indexer->commit();

Commit any changes made to the index. Until this is called, none of
the changes made during an indexing session are permanent.

Calling commit() invalidates the Indexer, so if you want to make more
changes you’ll need a new one.
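
Since an Indexer cannot be reused after commit(), a common arrangement is one Indexer per batch. The sketch below wraps that in a hypothetical helper, index_batch(), and assumes an in-memory index with a single "id" field.

```perl
use strict;
use warnings;
use Lucy::Plan::Schema;
use Lucy::Plan::StringType;
use Lucy::Store::RAMFolder;
use Lucy::Index::Indexer;
use Lucy::Index::IndexReader;

# Hypothetical schema with a single "id" field.
my $schema = Lucy::Plan::Schema->new;
$schema->spec_field( name => 'id', type => Lucy::Plan::StringType->new );
my $folder = Lucy::Store::RAMFolder->new;

# Hypothetical helper: one fresh Indexer per batch, discarded after commit().
sub index_batch {
    my ( $index, @docs ) = @_;
    my $indexer = Lucy::Index::Indexer->new( index => $index, schema => $schema );
    $indexer->add_doc($_) for @docs;
    $indexer->commit;
    return scalar @docs;
}

index_batch( $folder, { id => 'a' }, { id => 'b' } );
index_batch( $folder, { id => 'c' } );

my $doc_count = Lucy::Index::IndexReader->open( index => $folder )->doc_count;
print "documents indexed: $doc_count\n";
```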

prepare_commit
    $indexer->prepare_commit();

Perform the expensive setup for commit() in advance, so that commit()
completes quickly. (If prepare_commit() is not called explicitly by
the user, commit() will call it internally.)
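
Splitting the two calls can be useful when the final commit() should happen at a controlled moment, for example alongside another system's transaction. The following sketch assumes an in-memory index and a hypothetical "id" field; the coordination step is only a comment.

```perl
use strict;
use warnings;
use Lucy::Plan::Schema;
use Lucy::Plan::StringType;
use Lucy::Store::RAMFolder;
use Lucy::Index::Indexer;
use Lucy::Index::IndexReader;

# Hypothetical schema with a single "id" field.
my $schema = Lucy::Plan::Schema->new;
$schema->spec_field( name => 'id', type => Lucy::Plan::StringType->new );
my $folder = Lucy::Store::RAMFolder->new;

my $indexer = Lucy::Index::Indexer->new( index => $folder, schema => $schema );
$indexer->add_doc( { id => $_ } ) for qw( a b );

# Do the expensive work up front.
$indexer->prepare_commit;

# ... application-specific coordination could happen here ...

# The remaining step should now complete quickly.
$indexer->commit;

my $doc_count = Lucy::Index::IndexReader->open( index => $folder )->doc_count;
print "documents indexed: $doc_count\n";
```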

get_schema
    my $schema = $indexer->get_schema();

Accessor for schema.

INHERITANCE
Lucy::Index::Indexer isa Clownfish::Obj.

perl v5.36.0                      2023-01-20           Lucy::Index::Indexer(3)