Lucy::Docs::Cookbook::FastUpdates(3pm)

Lucy::Docs::Cookbook::FastUpdates(3)  User Contributed Perl Documentation  Lucy::Docs::Cookbook::FastUpdates(3)

NAME

       Lucy::Docs::Cookbook::FastUpdates - Near real-time index updates

DESCRIPTION

       While index updates are fast on average, worst-case update performance
       may be significantly slower.  To make index updates consistently quick,
       we must manually intervene to control the process of index segment
       consolidation.

   The problem
       Ordinarily, modifying an index is cheap.  New data is added to new
       segments, and the time to write a new segment scales more or less
       linearly with the number of documents added during the indexing
       session.

       Deletions are also cheap most of the time, because we don't remove
       documents immediately but instead mark them as deleted, and adding the
       deletion mark is cheap.
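
       The idea of marking rather than removing can be sketched with a toy
       model.  The %segment structure and the mark_deleted() and live_docs()
       helpers below are hypothetical and do not reflect how Lucy stores
       segments; they only show that a deletion leaves the stored data in
       place and that readers must skip the marked documents.

```perl
use strict;
use warnings;

# Toy model only: a "segment" is an array of documents plus a hash of
# deletion marks.  This is an assumption for illustration, not Lucy's format.
my %segment = (
    docs    => [ "apple pie", "banana bread", "cherry tart" ],
    deleted => {},
);

# Deleting a document records a mark; the stored data is left untouched.
sub mark_deleted {
    my ( $seg, $doc_id ) = @_;
    $seg->{deleted}{$doc_id} = 1;
    return;
}

# Readers skip marked documents, so every mark adds a little read-time work.
sub live_docs {
    my ($seg) = @_;
    return [ grep { !$seg->{deleted}{$_} } 0 .. $#{ $seg->{docs} } ];
}

mark_deleted( \%segment, 1 );
my $live = live_docs( \%segment );
print "live doc ids: @$live\n";                                  # prints "live doc ids: 0 2"
print "docs still stored: ", scalar @{ $segment{docs} }, "\n";   # prints "docs still stored: 3"
```

       In this model the space held by a marked document is only reclaimed
       when the segment is rewritten.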

       However, as new segments are added and the deletion rate for existing
       segments increases, search-time performance slowly begins to degrade.
       At some point, it becomes necessary to consolidate existing segments,
       rewriting their data into a new segment.

       If the recycled segments are small, the time it takes to rewrite them
       may not be significant.  Every once in a while, though, a large amount
       of data must be rewritten.

   Procrastinating and playing catch-up
       The simplest way to force fast index updates is to avoid rewriting
       anything.

       Indexer relies upon IndexManager's recycle() method to tell it which
       segments should be consolidated.  If we subclass IndexManager and
       override the method so that it always returns an empty array, we get
       consistently quick performance:

           package NoMergeManager;
           use base qw( Lucy::Index::IndexManager );
           sub recycle { [] }

           package main;
           my $indexer = Lucy::Index::Indexer->new(
               index   => '/path/to/index',
               manager => NoMergeManager->new,
           );
           ...
           $indexer->commit;

       However, we can't procrastinate forever.  Eventually, we'll have to run
       an ordinary, uncontrolled indexing session, potentially triggering a
       large rewrite of lots of small and/or degraded segments:

           my $indexer = Lucy::Index::Indexer->new(
               index => '/path/to/index',
               # manager => NoMergeManager->new,
           );
           ...
           $indexer->commit;

   Acceptable worst-case update time, slower degradation
       Never merging anything at all in the main indexing process is probably
       overkill.  Small segments are relatively cheap to merge; we just need
       to guard against the big rewrites.

       Setting a ceiling on the number of documents in the segments to be
       recycled allows us to avoid a mass proliferation of tiny,
       single-document segments, while still offering decent worst-case
       update speed:

           package LightMergeManager;
           use base qw( Lucy::Index::IndexManager );

           sub recycle {
               my $self = shift;
               my $seg_readers = $self->SUPER::recycle(@_);
               @$seg_readers = grep { $_->doc_max < 10 } @$seg_readers;
               return $seg_readers;
           }
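
       To see the effect of the ceiling in isolation, the following sketch
       applies the same grep to stand-in objects.  StubSegReader and
       keep_small_segments() are hypothetical names, not part of Lucy; the
       stub only mimics the doc_max() accessor used above, and the document
       counts are made up.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a segment reader; it only mimics doc_max().
package StubSegReader;
sub new     { my ( $class, %args ) = @_; return bless {%args}, $class }
sub doc_max { return $_[0]->{doc_max} }

package main;

# Same filter as LightMergeManager::recycle, applied to a plain array ref.
sub keep_small_segments {
    my ( $seg_readers, $ceiling ) = @_;
    return [ grep { $_->doc_max < $ceiling } @$seg_readers ];
}

my @candidates = map { StubSegReader->new( doc_max => $_ ) } ( 1, 3, 9, 10, 5000 );
my $to_merge   = keep_small_segments( \@candidates, 10 );
print join( ", ", map { $_->doc_max } @$to_merge ), "\n";    # prints "1, 3, 9"
```

       The segments holding 10 and 5000 documents are left alone, so the
       indexing session never pays for rewriting them.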

       However, we still have to consolidate every once in a while, and while
       that happens content updates will be locked out.

   Background merging
       If it's not acceptable to lock out updates while the index
       consolidation process runs, the alternative is to move the
       consolidation process out of band, using BackgroundMerger.

       It's never safe to have more than one Indexer attempting to modify the
       content of an index at the same time, but a BackgroundMerger and an
       Indexer can operate simultaneously:

           # Indexing process.
           use Scalar::Util qw( blessed );
           my $retries = 0;
           while (1) {
               eval {
                   my $indexer = Lucy::Index::Indexer->new(
                       index   => '/path/to/index',
                       manager => LightMergeManager->new,
                   );
                   $indexer->add_doc($doc);
                   $indexer->commit;
               };
               last unless $@;
               if ( blessed($@) and $@->isa("Lucy::Store::LockErr") ) {
                   # Catch LockErr.
                   warn "Couldn't get lock ($retries retries)";
                   $retries++;
               }
               else {
                   die "Write failed: $@";
               }
           }

           # Background merge process.
           my $manager = Lucy::Index::IndexManager->new;
           $manager->set_write_lock_timeout(60_000);
           my $bg_merger = Lucy::Index::BackgroundMerger->new(
               index   => '/path/to/index',
               manager => $manager,
           );
           $bg_merger->commit;

       The exception handling code becomes useful once you have more than one
       index modification process happening simultaneously.  By default,
       Indexer tries several times to acquire a write lock over the span of
       one second, then holds it until commit() completes.  BackgroundMerger
       handles most of its work without the write lock, but it does need it
       briefly once at the beginning and once again near the end.  Under
       normal loads, the internal retry logic will resolve conflicts, but if
       it's not acceptable to miss an insert, you probably want to catch
       LockErr exceptions thrown by Indexer.  In contrast, a LockErr from
       BackgroundMerger probably just needs to be logged.
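
       The indexing loop shown earlier retries indefinitely.  A minimal
       sketch of a bounded variant follows.  The helper name
       retry_on_lock_error(), the attempt limit, and the pause between
       attempts are assumptions chosen for illustration and are not part of
       Lucy; the helper expects a code reference that would create the
       Indexer, add documents, and commit.

```perl
use strict;
use warnings;
use Scalar::Util qw( blessed );
use Time::HiRes qw( sleep );

# Hypothetical helper: run $write, retrying only when it throws a LockErr.
# Returns the number of the attempt that succeeded.
sub retry_on_lock_error {
    my ( $write, %opts ) = @_;
    my $max_attempts = $opts{max_attempts} // 5;       # assumed limit
    my $pause        = $opts{pause}        // 0.25;    # assumed pause, in seconds

    for my $attempt ( 1 .. $max_attempts ) {
        my $ok = eval { $write->(); 1 };
        return $attempt if $ok;

        my $err = $@;
        # Anything other than a lock failure is rethrown unchanged.
        die $err unless blessed($err) && $err->isa("Lucy::Store::LockErr");

        warn "Couldn't get lock (attempt $attempt of $max_attempts)\n";
        sleep($pause) if $attempt < $max_attempts;
    }
    die "Gave up after $max_attempts attempts to acquire the write lock\n";
}
```

       A caller would pass the body of the eval block from the indexing
       process as the code reference.  Whether to give up, queue the
       document, or keep retrying once the limit is reached depends on how
       costly a missed insert is for the application.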

perl v5.32.0                      2020-07-28        Lucy::Docs::Cookbook::FastUpdates(3)