Search::Elasticsearch::Client::7_0::Scroll(3)  User Contributed Perl Documentation  Search::Elasticsearch::Client::7_0::Scroll(3)

NAME
       Search::Elasticsearch::Client::7_0::Scroll - A helper module for
       scrolled searches

VERSION
       version 7.717

SYNOPSIS
           use feature 'say';
           use Search::Elasticsearch;

           my $es = Search::Elasticsearch->new;

           my $scroll = $es->scroll_helper(
               index => 'my_index',
               body  => {
                   query => {...},
                   size  => 1000,
                   sort  => '_doc'
               }
           );

           say "Total hits: " . $scroll->total;

           while (my $doc = $scroll->next) {
               # do something
           }

DESCRIPTION
       A scrolled search is a search that allows you to keep pulling results
       until there are no more matching results, much like a cursor in an SQL
       database.

       Unlike paginating through results (with the "from" parameter in
       search()), scrolled searches take a snapshot of the current state of
       the index. Even if you keep adding new documents to the index or
       updating existing documents, a scrolled search will only see the index
       as it was when the search began.

       This module is a helper utility that wraps the functionality of the
       search() and scroll() methods to make them easier to use.

       This class does Search::Elasticsearch::Client::7_0::Role::Scroll and
       Search::Elasticsearch::Role::Is_Sync.

USE CASES
       There are two primary use cases:

   Pulling enough results
       Perhaps you want to group your results by some field, and you don't
       know exactly how many results you will need in order to return 10
       grouped results. With a scrolled search you can keep pulling more
       results until you have enough. For instance, you can search emails in
       a mailing list, and return results grouped by "thread_id":

           my (%groups, @results);

           my $scroll = $es->scroll_helper(
               index => 'my_emails',
               type  => 'email',
               body  => { query => {... some query ... }}
           );

           my $doc;
           while (@results < 10 and $doc = $scroll->next) {

               my $thread = $doc->{_source}{thread_id};

               unless ($groups{$thread}) {
                   $groups{$thread} = [];
                   push @results, $groups{$thread};
               }
               push @{$groups{$thread}}, $doc;

           }

   Extracting all documents
       Often you will want to extract all (or a subset of) documents in an
       index. If you want to change your type mappings, you will need to
       reindex all of your data. Or perhaps you want to move a subset of the
       data in one index into a new dedicated index. In these cases, you don't
       care about sort order; you just want to retrieve all documents which
       match a query, and do something with them. For instance, to retrieve
       all the docs for a particular "client_id":

           my $scroll = $es->scroll_helper(
               index => 'my_index',
               size  => 1000,
               body  => {
                   query => {
                       match => {
                           client_id => 123
                       }
                   },
                   sort => '_doc'
               }
           );

           while (my $doc = $scroll->next) {
               # do something
           }

       Very often the something that you will want to do with these results
       involves bulk-indexing them into a new index. The easiest way to do
       this is to use the reindex functionality built into Elasticsearch,
       available through "reindex()" in
       Search::Elasticsearch::Client::7_0::Direct.
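
       As a rough sketch of the reindex route, the helper below builds a
       reindex request that copies only the documents matching a query into
       a new index. The sub name "reindex_subset" and the index names in the
       usage comment are illustrative assumptions; $es is assumed to be a
       Search::Elasticsearch client, and the body follows the source/dest
       layout of the Elasticsearch reindex API.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Search::Elasticsearch): copy the
# documents in $source that match $query into $dest with the server-side
# reindex API, so no documents have to pass through the client.
sub reindex_subset {
    my ( $es, $source, $dest, $query ) = @_;

    return $es->reindex(
        body => {
            source => { index => $source, query => $query },
            dest   => { index => $dest },
        }
    );
}

# Usage, assuming $es = Search::Elasticsearch->new(...):
#   reindex_subset( $es, 'my_index', 'client_123',
#       { match => { client_id => 123 } } );
```

       A scrolled search remains the better fit when each document has to
       be inspected or transformed in Perl before it is written elsewhere.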

METHODS
   "new()"
           use Search::Elasticsearch;

           my $es = Search::Elasticsearch->new(...);
           my $scroll = $es->scroll_helper(
               scroll => '1m',    # optional
               %search_params
           );

       The "scroll_helper()" method in
       Search::Elasticsearch::Client::7_0::Direct loads the
       Search::Elasticsearch::Client::7_0::Scroll class and calls "new()",
       passing in any arguments.

       You can specify a "scroll" duration (which defaults to "1m"). Any
       other parameters are passed directly to "search()" in
       Search::Elasticsearch::Client::7_0::Direct.

       The "scroll" duration tells Elasticsearch how long it should keep the
       scroll alive. Note: this duration doesn't need to be long enough to
       process all results, just long enough to process a single batch of
       results. The expiry gets renewed for another "scroll" period every
       time a new batch of results is retrieved from the cluster.

       By default, the "scroll_id" is passed as the "body" to the scroll
       request.

       The "scroll" request uses "GET" by default. To use "POST" instead, set
       "send_get_body_as" to "POST".
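
       To illustrate the point that the duration only has to cover one
       batch, here is a small sketch that derives a "scroll" value from an
       estimate of how long a single batch takes to process. The helper
       name "scroll_duration_for" and the safety factor of 2 are assumptions
       made for this example, not part of the module.

```perl
use strict;
use warnings;
use POSIX qw(ceil);

# Hypothetical helper: return a scroll duration string in seconds that
# covers one batch, given an estimate of the seconds needed per batch.
# The factor of 2 is an arbitrary safety margin.
sub scroll_duration_for {
    my ($seconds_per_batch) = @_;
    my $seconds = ceil( $seconds_per_batch * 2 );
    $seconds = 1 if $seconds < 1;    # never ask for a zero-length scroll
    return $seconds . 's';
}

# Usage, assuming a reachable cluster:
#   my $es = Search::Elasticsearch->new( send_get_body_as => 'POST' );
#   my $scroll = $es->scroll_helper(
#       scroll => scroll_duration_for(45),    # "90s"
#       index  => 'my_index',
#       body   => { query => { match_all => {} } },
#   );
```

       Keeping the duration close to the real per-batch cost avoids holding
       search contexts open on the cluster longer than necessary.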

   "next()"
           $doc  = $scroll->next;
           @docs = $scroll->next($num);

       The "next()" method returns the next result, or the next $num results
       (pulling more results if required). If all results have been
       exhausted, it returns an empty list.
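
       The list form of "next()" lends itself to processing documents in
       fixed-size chunks. The sketch below assumes only the behaviour
       described above (up to $num documents per call, an empty list when
       exhausted); the helper name "process_in_chunks" is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: pull documents in chunks of $chunk_size and pass
# each chunk (as an array ref) to $callback. Returns the number of
# documents seen. The final chunk may be shorter than $chunk_size.
sub process_in_chunks {
    my ( $scroll, $chunk_size, $callback ) = @_;
    my $seen = 0;
    while ( my @docs = $scroll->next($chunk_size) ) {
        $callback->( \@docs );
        $seen += @docs;
    }
    return $seen;
}

# Usage, assuming $scroll came from $es->scroll_helper(...):
#   process_in_chunks( $scroll, 500, sub { my ($docs) = @_; ... } );
```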

   "drain_buffer()"
           @docs = $scroll->drain_buffer;

       The "drain_buffer()" method returns all of the documents currently in
       the buffer, without fetching any more from the cluster.

   "refill_buffer()"
           $total = $scroll->refill_buffer;

       The "refill_buffer()" method fetches the next batch of results from the
       cluster, stores them in the buffer, and returns the total number of
       docs currently in the buffer.

   "buffer_size()"
           $total = $scroll->buffer_size;

       The "buffer_size()" method returns the total number of docs currently
       in the buffer.
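
       Taken together, the three buffer methods allow whole batches to be
       handled at once instead of one document at a time. This is a minimal
       sketch that relies only on the return values documented above; the
       helper name "each_batch" is an assumption for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: hand each buffered batch to $callback as an array
# ref. buffer_size() and refill_buffer() both report how many docs are in
# the buffer, so the loop ends once a refill leaves the buffer empty.
sub each_batch {
    my ( $scroll, $callback ) = @_;
    my $batches = 0;
    while ( $scroll->buffer_size || $scroll->refill_buffer ) {
        my @docs = $scroll->drain_buffer;
        $callback->( \@docs );
        $batches++;
    }
    return $batches;
}

# Usage, assuming $scroll came from $es->scroll_helper(...):
#   each_batch( $scroll, sub { my ($docs) = @_; ... } );
```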

   "finish()"
           $scroll->finish;

       The "finish()" method clears out the buffer, sets "is_finished()" to
       "true" and tries to clear the "scroll_id" on Elasticsearch. The
       "clear_scroll" API is only supported since v0.90.6, but the call to
       "clear_scroll" is wrapped in an "eval" so the "finish()" method can be
       safely called with any version of Elasticsearch.

       When the $scroll instance goes out of scope, "finish()" is called
       automatically if required.

   "is_finished()"
           $bool = $scroll->is_finished;

       A flag which returns "true" if all results have been processed or
       "finish()" has been called.
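
       Calling "finish()" explicitly is useful when you stop before the
       results are exhausted. The sketch below returns the first document
       accepted by a caller-supplied test and releases the scroll straight
       away; the helper name "first_matching" is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first document for which
# $wanted->($doc) is true, or nothing if no document qualifies.
sub first_matching {
    my ( $scroll, $wanted ) = @_;
    until ( $scroll->is_finished ) {
        my $doc = $scroll->next;
        next unless defined $doc && $wanted->($doc);
        $scroll->finish;    # stop early and clear the scroll_id
        return $doc;
    }
    return;
}

# Usage, assuming $scroll came from $es->scroll_helper(...):
#   my $doc = first_matching( $scroll,
#       sub { $_[0]{_source}{client_id} == 123 } );
```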

INFO ACCESSORS
       The information from the original search is returned via the following
       accessors:

       "total"
           The total number of documents that matched your query.

       "max_score"
           The maximum score of any document that matched your query.

       "aggregations"
           Any aggregations that were specified, or "undef".

       "facets"
           Any facets that were specified, or "undef".

       "suggest"
           Any suggestions that were specified, or "undef".

       "took"
           How long the original search took, in milliseconds.

       "took_total"
           How long the original search plus all subsequent batches took, in
           milliseconds.
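
       As an example of putting these accessors to use, the sketch below
       formats a one-line progress report from "total" and "took_total".
       The helper name "scroll_progress" and the message format are
       assumptions for illustration; the caller keeps its own count of
       processed documents.

```perl
use strict;
use warnings;

# Hypothetical helper: build a progress line from the info accessors.
# $seen is the number of documents the caller has processed so far.
sub scroll_progress {
    my ( $scroll, $seen ) = @_;
    my $total = $scroll->total;
    my $pct   = $total ? 100 * $seen / $total : 100;
    return sprintf '%d/%d docs (%.1f%%), %d ms spent searching',
        $seen, $total, $pct, $scroll->took_total;
}

# Usage, assuming $scroll came from $es->scroll_helper(...):
#   my $seen = 0;
#   while ( my $doc = $scroll->next ) {
#       $seen++;
#       print scroll_progress( $scroll, $seen ), "\n" unless $seen % 1000;
#   }
```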

SEE ALSO
       •   "search()" in Search::Elasticsearch::Client::7_0::Direct

       •   "scroll()" in Search::Elasticsearch::Client::7_0::Direct

       •   "reindex()" in Search::Elasticsearch::Client::7_0::Direct

AUTHOR
       Enrico Zimuel <enrico.zimuel@elastic.co>

COPYRIGHT AND LICENSE
       This software is Copyright (c) 2022 by Elasticsearch BV.

       This is free software, licensed under:

         The Apache License, Version 2.0, January 2004

perl v5.36.0                      2022-07-31  Search::Elasticsearch::Client::7_0::Scroll(3)