Search::Elasticsearch::Client::6_0::Scroll(3)    User Contributed Perl Documentation

NAME
       Search::Elasticsearch::Client::6_0::Scroll - A helper module for
       scrolled searches

VERSION
       version 6.00

SYNOPSIS
           use Search::Elasticsearch;

           my $es     = Search::Elasticsearch->new;

           my $scroll = $es->scroll_helper(
               index => 'my_index',
               body  => {
                   query => {...},
                   size  => 1000,
                   sort  => '_doc'
               }
           );

           say "Total hits: ". $scroll->total;

           while (my $doc = $scroll->next) {
               # do something
           }

DESCRIPTION
       A scrolled search is a search that allows you to keep pulling results
       until there are no more matching results, much like a cursor in an SQL
       database.

       Unlike paginating through results (with the "from" parameter in
       search()), scrolled searches take a snapshot of the current state of
       the index. Even if you keep adding new documents to the index or
       updating existing documents, a scrolled search will only see the index
       as it was when the search began.

       This module is a helper utility that wraps the functionality of the
       search() and scroll() methods to make them easier to use.

       This class does Search::Elasticsearch::Client::6_0::Role::Scroll and
       Search::Elasticsearch::Role::Is_Sync.

USE CASES
       There are two primary use cases:

   Pulling enough results
       Perhaps you want to group your results by some field, and you don't
       know exactly how many results you will need in order to return 10
       grouped results. With a scrolled search you can keep pulling more
       results until you have enough. For instance, you can search emails in
       a mailing list, and return results grouped by "thread_id":

           my (%groups,@results);

           my $scroll = $es->scroll_helper(
               index => 'my_emails',
               type  => 'email',
               body  => { query => {... some query ... }}
           );

           my $doc;
           while (@results < 10 and $doc = $scroll->next) {

               my $thread = $doc->{_source}{thread_id};

               unless ($groups{$thread}) {
                   $groups{$thread} = [];
                   push @results, $groups{$thread};
               }
               push @{$groups{$thread}},$doc;

           }

   Extracting all documents
       Often you will want to extract all (or a subset of) documents in an
       index. If you want to change your type mappings, you will need to
       reindex all of your data. Or perhaps you want to move a subset of the
       data in one index into a new dedicated index. In these cases, you
       don't care about sort order; you just want to retrieve all documents
       that match a query and do something with them. For instance, to
       retrieve all the docs for a particular "client_id":

           my $scroll = $es->scroll_helper(
               index => 'my_index',
               size  => 1000,
               body  => {
                   query => {
                       match => {
                           client_id => 123
                       }
                   },
                   sort  => '_doc'
               }
           );

           while (my $doc = $scroll->next) {
               # do something
           }

       Very often the something that you will want to do with these results
       involves bulk-indexing them into a new index. The easiest way to do
       this is to use the built-in reindex functionality provided by
       Elasticsearch, which is available via "reindex()" in
       Search::Elasticsearch::Client::6_0::Direct.
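
       As a rough illustration of the reindex route, the sketch below builds
       the request body that the Elasticsearch reindex API expects. The
       helper name "reindex_body" and the index name "client_123_index" are
       hypothetical; only the "source"/"dest" body layout and the client's
       "reindex()" method come from Elasticsearch and this distribution.

```perl
use strict;
use warnings;

# Hypothetical helper: build the body for an Elasticsearch reindex request.
# The optional query restricts which documents are copied.
sub reindex_body {
    my ( $source_index, $dest_index, $query ) = @_;

    my %source = ( index => $source_index );
    $source{query} = $query if $query;    # omit the query to copy everything

    return {
        source => \%source,
        dest   => { index => $dest_index },
    };
}

# Usage, assuming $es is a Search::Elasticsearch 6_0 client created elsewhere
# and that 'client_123_index' (a made-up name) is the destination index:
#
#   $es->reindex(
#       body => reindex_body(
#           'my_index', 'client_123_index',
#           { match => { client_id => 123 } },
#       )
#   );
```

       With this approach the documents are copied inside the cluster, so no
       scrolled search is needed on the client side.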

METHODS
   "new()"
           use Search::Elasticsearch;

           my $es     = Search::Elasticsearch->new(...);
           my $scroll = $es->scroll_helper(
               scroll       => '1m',     # optional
               scroll_in_qs => 0|1,      # optional
               %search_params
           );

       The "scroll_helper()" method in
       Search::Elasticsearch::Client::6_0::Direct loads the
       Search::Elasticsearch::Client::6_0::Scroll class and calls "new()",
       passing in any arguments.

       You can specify a "scroll" duration (which defaults to "1m") and
       "scroll_in_qs" (which defaults to "false"). Any other parameters are
       passed directly to "search()" in
       Search::Elasticsearch::Client::6_0::Direct.

       The "scroll" duration tells Elasticsearch how long it should keep the
       scroll alive. Note: this duration doesn't need to be long enough to
       process all results, just long enough to process a single batch of
       results. The expiry gets renewed for another "scroll" period every
       time a new batch of results is retrieved from the cluster.

       By default, the "scroll_id" is passed as the "body" to the scroll
       request. To send it in the query string instead, set "scroll_in_qs" to
       a true value, but be aware: when querying very many indices, the scroll
       ID can become too long for intervening proxies.

       The "scroll" request uses "GET" by default. To use "POST" instead, set
       send_get_body_as to "POST".
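
       The options above might be combined as in the following sketch. The
       wrapper name "open_scroll" and the "2m" duration are illustrative
       assumptions; $es is assumed to be a client created elsewhere.

```perl
use strict;
use warnings;

# Hypothetical wrapper: open a scrolled search with explicit scroll settings.
# $es is assumed to be a Search::Elasticsearch 6_0 client.
sub open_scroll {
    my ( $es, %search_params ) = @_;

    return $es->scroll_helper(
        scroll       => '2m',    # only needs to cover one batch
        scroll_in_qs => 0,       # keep the scroll_id in the request body
        %search_params,          # everything else is passed to search()
    );
}

# Usage:
#
#   # To send scroll requests as POST, configure the client accordingly:
#   my $es = Search::Elasticsearch->new( send_get_body_as => 'POST' );
#
#   my $scroll = open_scroll(
#       $es,
#       index => 'my_index',
#       body  => { query => { match_all => {} }, size => 500, sort => '_doc' },
#   );
```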

   "next()"
           $doc  = $scroll->next;
           @docs = $scroll->next($num);

       The "next()" method returns the next result, or the next $num results
       (pulling more results if required). If all results have been
       exhausted, it returns an empty list.
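
       Calling "next($num)" in list context lends itself to processing
       results in fixed-size chunks. The following is a minimal sketch; the
       helper name "process_in_batches" is hypothetical, and it relies only
       on "next()" returning an empty list once the results are exhausted.

```perl
use strict;
use warnings;

# Hypothetical helper: hand documents to a callback in chunks of $batch_size.
# Works with any object whose next($n) returns up to $n docs, or an empty
# list when nothing is left. Returns the number of documents seen.
sub process_in_batches {
    my ( $scroll, $batch_size, $callback ) = @_;

    my $seen = 0;
    while ( my @docs = $scroll->next($batch_size) ) {
        $callback->( \@docs );    # the final chunk may be smaller
        $seen += @docs;
    }
    return $seen;
}

# Usage, assuming $scroll came from $es->scroll_helper(...):
#
#   my $count = process_in_batches( $scroll, 500, sub {
#       my ($docs) = @_;
#       print scalar(@$docs), " docs in this chunk\n";
#   } );
```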

   "drain_buffer()"
           @docs = $scroll->drain_buffer;

       The "drain_buffer()" method returns all of the documents currently in
       the buffer, without fetching any more from the cluster.

   "refill_buffer()"
           $total = $scroll->refill_buffer;

       The "refill_buffer()" method fetches the next batch of results from the
       cluster, stores them in the buffer, and returns the total number of
       docs currently in the buffer.

   "buffer_size()"
           $total = $scroll->buffer_size;

       The "buffer_size()" method returns the total number of docs currently
       in the buffer.
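
       The buffer methods can be combined to process results one cluster
       batch at a time. The sketch below assumes that the buffer may already
       hold documents when the loop starts, and that "refill_buffer()"
       returns 0 once nothing more can be fetched; the helper name
       "each_batch" is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: call $callback once per non-empty batch of documents.
# Returns the number of batches processed.
sub each_batch {
    my ( $scroll, $callback ) = @_;

    my $batches = 0;
    while (1) {
        # Take whatever is buffered without contacting the cluster.
        my @docs = $scroll->drain_buffer;
        if (@docs) {
            $callback->( \@docs );
            $batches++;
        }

        last if $scroll->is_finished;

        # Fetch the next batch; 0 buffered docs means there is nothing left.
        last unless $scroll->refill_buffer;
    }
    return $batches;
}

# Usage, assuming $scroll came from $es->scroll_helper(...):
#
#   each_batch( $scroll, sub { my ($docs) = @_; ... } );
```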

   "finish()"
           $scroll->finish;

       The "finish()" method clears out the buffer, sets "is_finished()" to
       "true" and tries to clear the "scroll_id" on Elasticsearch. The
       clear-scroll API has only been supported since Elasticsearch v0.90.6,
       but the call to "clear_scroll" is wrapped in an "eval", so the
       "finish()" method can be safely called with any version of
       Elasticsearch.

       When the $scroll instance goes out of scope, "finish()" is called
       automatically if required.
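
       Calling "finish()" explicitly is mainly useful when you stop reading
       before the results are exhausted and the $scroll object will stay in
       scope for a while. A minimal sketch, with the hypothetical helper name
       "first_matching":

```perl
use strict;
use warnings;

# Hypothetical helper: return the first document accepted by $predicate and
# release the scroll straight away instead of waiting for scope exit.
sub first_matching {
    my ( $scroll, $predicate ) = @_;

    while ( my $doc = $scroll->next ) {
        next unless $predicate->($doc);
        $scroll->finish;    # clears the buffer and the server-side scroll
        return $doc;
    }
    return;                 # results exhausted without a match
}

# Usage, assuming $scroll came from $es->scroll_helper(...):
#
#   my $doc = first_matching( $scroll, sub { $_[0]{_source}{client_id} == 123 } );
```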

   "is_finished()"
           $bool = $scroll->is_finished;

       A flag which returns "true" if all results have been processed or
       "finish()" has been called.

INFO ACCESSORS
       The information from the original search is returned via the following
       accessors:

   "total"
       The total number of documents that matched your query.

   "max_score"
       The maximum score of any document matching your query.

   "aggregations"
       Any aggregations that were specified, or "undef".

   "facets"
       Any facets that were specified, or "undef".

   "suggest"
       Any suggestions that were specified, or "undef".

   "took"
       How long the original search took, in milliseconds.

   "took_total"
       How long the original search plus all subsequent batches took, in
       milliseconds.
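
       These accessors are available as soon as the helper has been created,
       so they can be used for logging before any documents are processed.
       The sketch below uses only "total", "max_score" and "took"; the helper
       name "scroll_summary" is hypothetical, and it assumes "max_score" may
       be undefined (for instance when results are not sorted by score).

```perl
use strict;
use warnings;

# Hypothetical helper: one-line summary of the original search.
sub scroll_summary {
    my ($scroll) = @_;

    my $max = $scroll->max_score;
    return sprintf(
        '%d hits, max score %s, initial search took %d ms',
        $scroll->total,
        defined $max ? $max : 'n/a',
        $scroll->took,
    );
}

# Usage, assuming $scroll came from $es->scroll_helper(...):
#
#   print scroll_summary($scroll), "\n";
```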

SEE ALSO
       ·   "search()" in Search::Elasticsearch::Client::6_0::Direct

       ·   "scroll()" in Search::Elasticsearch::Client::6_0::Direct

       ·   "reindex()" in Search::Elasticsearch::Client::6_0::Direct

AUTHOR
       Clinton Gormley <drtech@cpan.org>

COPYRIGHT AND LICENSE
       This software is Copyright (c) 2017 by Elasticsearch BV.

       This is free software, licensed under:

         The Apache License, Version 2.0, January 2004

perl v5.28.1                      2017-11-14    Search::Elasticsearch::Client::6_0::Scroll(3)