Lucy::Docs::Cookbook::FastUpdates(3)    User Contributed Perl Documentation    Lucy::Docs::Cookbook::FastUpdates(3)

NAME
Lucy::Docs::Cookbook::FastUpdates - Near real-time index updates

DESCRIPTION
While index updates are fast on average, worst-case update performance
may be significantly slower. To make index updates consistently quick,
we must manually intervene to control the process of index segment
consolidation.

The problem
Ordinarily, modifying an index is cheap. New data is added to new
segments, and the time to write a new segment scales more or less
linearly with the number of documents added during the indexing
session.

Deletions are also cheap most of the time, because we don't remove
documents immediately but instead mark them as deleted, and adding the
deletion mark is cheap.

However, as new segments are added and the deletion rate for existing
segments increases, search-time performance slowly begins to degrade.
At some point, it becomes necessary to consolidate existing segments,
rewriting their data into a new segment.

If the recycled segments are small, the time it takes to rewrite them
may not be significant. Every once in a while, though, a large amount
of data must be rewritten.

Procrastinating and playing catch-up
The simplest way to force fast index updates is to avoid rewriting
anything.

Indexer relies upon IndexManager's recycle() method to tell it which
segments should be consolidated. If we subclass IndexManager and
override the method so that it always returns an empty array, we get
consistently quick performance:

    package NoMergeManager;
    use base qw( Lucy::Index::IndexManager );
    sub recycle { [] }

    package main;
    my $indexer = Lucy::Index::Indexer->new(
        index   => '/path/to/index',
        manager => NoMergeManager->new,
    );
    ...
    $indexer->commit;

However, we can't procrastinate forever. Eventually, we'll have to run
an ordinary, uncontrolled indexing session, potentially triggering a
large rewrite of lots of small and/or degraded segments:

    my $indexer = Lucy::Index::Indexer->new(
        index => '/path/to/index',
        # manager => NoMergeManager->new,
    );
    ...
    $indexer->commit;

Acceptable worst-case update time, slower degradation
Never merging anything at all in the main indexing process is probably
overkill. Small segments are relatively cheap to merge; we just need
to guard against the big rewrites.

Setting a ceiling on the number of documents in the segments to be
recycled allows us to avoid a mass proliferation of tiny,
single-document segments, while still offering decent worst-case
update speed:

    package LightMergeManager;
    use base qw( Lucy::Index::IndexManager );

    sub recycle {
        my $self = shift;
        my $seg_readers = $self->SUPER::recycle(@_);
        # Only recycle segments holding fewer than 10 documents.
        @$seg_readers = grep { $_->doc_max < 10 } @$seg_readers;
        return $seg_readers;
    }

However, we still have to consolidate every once in a while, and while
that happens, content updates will be locked out.
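To see which candidates a document ceiling lets through, here is a
minimal, self-contained sketch of the same grep filter. It does not use
Lucy at all: MockSegReader, its name() accessor, and
filter_small_segments() are hypothetical stand-ins invented for
illustration, and only the doc_max() comparison mirrors the recycle()
override above.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a segment reader; only doc_max() is modeled.
package MockSegReader;

sub new {
    my ( $class, %args ) = @_;
    my $self = {%args};
    return bless $self, $class;
}
sub doc_max { return $_[0]{doc_max} }
sub name    { return $_[0]{name} }

package main;

# Keep only the candidates strictly below the document ceiling.
sub filter_small_segments {
    my ( $seg_readers, $ceiling ) = @_;
    return [ grep { $_->doc_max < $ceiling } @$seg_readers ];
}

my @candidates = (
    MockSegReader->new( name => 'seg_1', doc_max => 50_000 ),
    MockSegReader->new( name => 'seg_2', doc_max => 7 ),
    MockSegReader->new( name => 'seg_3', doc_max => 1 ),
    MockSegReader->new( name => 'seg_4', doc_max => 10 ),
);

my $kept = filter_small_segments( \@candidates, 10 );
print join( ", ", map { $_->name } @$kept ), "\n";    # prints: seg_2, seg_3
```

Because the comparison is strict, a segment with exactly 10 documents is
left alone along with the large one; both wait for a later, unrestricted
consolidation.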

Background merging
If it's not acceptable to lock out updates while the index
consolidation process runs, the alternative is to move the
consolidation process out of band, using BackgroundMerger.

It's never safe to have more than one Indexer attempting to modify the
content of an index at the same time, but a BackgroundMerger and an
Indexer can operate simultaneously:

    # Indexing process.
    use Scalar::Util qw( blessed );
    my $retries = 0;
    while (1) {
        eval {
            my $indexer = Lucy::Index::Indexer->new(
                index   => '/path/to/index',
                manager => LightMergeManager->new,
            );
            $indexer->add_doc($doc);
            $indexer->commit;
        };
        last unless $@;
        if ( blessed($@) and $@->isa("Lucy::Store::LockErr") ) {
            # Catch LockErr.
            warn "Couldn't get lock ($retries retries)";
            $retries++;
        }
        else {
            die "Write failed: $@";
        }
    }

    # Background merge process.
    my $manager = Lucy::Index::IndexManager->new;
    $manager->set_write_lock_timeout(60_000);
    my $bg_merger = Lucy::Index::BackgroundMerger->new(
        index   => '/path/to/index',
        manager => $manager,
    );
    $bg_merger->commit;

The exception handling code becomes useful once you have more than one
index modification process happening simultaneously. By default,
Indexer tries several times to acquire a write lock over the span of
one second, then holds it until commit() completes. BackgroundMerger
handles most of its work without the write lock, but it does need it
briefly once at the beginning and once again near the end. Under
normal loads, the internal retry logic will resolve conflicts, but if
it's not acceptable to miss an insert, you probably want to catch
LockErr exceptions thrown by Indexer. In contrast, a LockErr from
BackgroundMerger probably just needs to be logged.
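The indexing loop above retries without limit. Where a bounded number
of attempts with a growing pause between them is preferable, the same
catch-and-retry pattern might be wrapped as sketched below. This is
only an illustration under stated assumptions: with_lock_retries() and
its attempt, max_retries, delay, and lock_class parameters are
hypothetical names, not part of Lucy, and the demo uses a made-up
Demo::LockErr class so that it runs without an index. In real use the
attempt callback would create the Indexer, add documents, and commit,
and lock_class would be left at its Lucy::Store::LockErr default.

```perl
use strict;
use warnings;
use Scalar::Util qw( blessed );
use Time::HiRes qw( sleep );

# Hypothetical helper: run the callback, retrying only on lock errors.
# Returns the number of retries that were needed before success.
sub with_lock_retries {
    my (%args) = @_;
    my $attempt     = $args{attempt};
    my $max_retries = $args{max_retries} // 5;
    my $delay       = $args{delay}       // 0.1;    # seconds
    my $lock_class  = $args{lock_class}  // 'Lucy::Store::LockErr';

    for my $try ( 0 .. $max_retries ) {
        my $ok = eval { $attempt->(); 1 };
        return $try if $ok;
        my $err = $@;

        # Anything other than a lock error is fatal.
        die "Write failed: $err"
            unless blessed($err) && $err->isa($lock_class);

        last if $try == $max_retries;
        sleep( $delay * 2**$try );    # back off: delay, 2*delay, 4*delay, ...
    }
    die "Gave up after $max_retries retries\n";
}

# Demo: the first two attempts fail with a simulated lock error.
my $calls   = 0;
my $retries = with_lock_retries(
    attempt => sub {
        $calls++;
        die bless( {}, 'Demo::LockErr' ) if $calls < 3;
    },
    lock_class => 'Demo::LockErr',
    delay      => 0.01,
);
print "succeeded after $retries retries\n";    # prints: succeeded after 2 retries
```

Capping the retries turns a persistently held lock into a visible
failure rather than an endless loop, which matters when missing an
insert is not acceptable and the caller needs to queue the document or
raise an alert.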

perl v5.36.0    2022-07-22    Lucy::Docs::Cookbook::FastUpdates(3)