INDEXER(1)                       Sphinxsearch                       INDEXER(1)

NAME

       indexer - Sphinxsearch fulltext index generator

SYNOPSIS

       indexer [--config CONFIGFILE] [--rotate] [--noprogress | --quiet]
               [--all | INDEX | ...]

       indexer --buildstops OUTPUTFILE COUNT [--config CONFIGFILE]
               [--noprogress | --quiet] [--all | INDEX | ...]

       indexer --merge MAIN_INDEX DELTA_INDEX [--config CONFIGFILE] [--rotate]
               [--noprogress | --quiet]

DESCRIPTION

       Sphinx is a collection of programs that aim to provide high quality
       fulltext search.

       indexer is the first of the two principal tools that make up Sphinx.
       Invoked either directly from the command line or as part of a larger
       script, indexer is solely responsible for gathering the data that will
       be searchable.

       The calling syntax for indexer is as follows:

           $ indexer [OPTIONS] [indexname1 [indexname2 [...]]]

       The indexes that you later want to make available for searching are
       defined in sphinx.conf, so at a minimum you must tell indexer which
       index (or indexes) to build.

       If sphinx.conf contained details on two indexes, mybigindex and
       mysmallindex, you could do the following:

           $ indexer mybigindex
           $ indexer mysmallindex mybigindex

       You can reindex a single index ad hoc, process all indexes at once
       with --all, or name any combination of the indexes defined in
       sphinx.conf.

OPTIONS

       Most of the settings for indexer are given in the configuration file.
       However, some options that affect how the indexing operation is
       performed may need to be specified on the command line. These options
       are:

       --all
           Tells indexer to update every index listed in sphinx.conf, instead
           of listing individual indexes. This is useful in small
           configurations, or in cron-type or maintenance jobs where the
           entire index set is rebuilt each day, each week, or at whatever
           interval suits best.

           Example usage:

               $ indexer --config /home/myuser/sphinx.conf --all

       --buildstops outfile.txt NUM
           Reviews the index source, as if it were indexing the data, and
           produces a list of the terms that are being indexed. In other
           words, it produces a list of all the searchable terms that are
           becoming part of the index. Note that it does not update the index
           in question; it simply processes the data as if it were indexing,
           including running queries defined with sql_query_pre or
           sql_query_post. outfile.txt will contain the list of words, one
           per line, sorted by frequency with the most frequent first, and
           NUM specifies the maximum number of words that will be listed. If
           NUM is large enough to encompass every word in the index, only as
           many words as the index contains will be returned. Such a
           dictionary list could be used for client application features
           around "Did you mean..." functionality, usually in conjunction
           with --buildfreqs, below.

           Example:

               $ indexer myindex --buildstops word_freq.txt 1000

           This would produce a file in the current directory, word_freq.txt,
           with the 1,000 most common words in 'myindex', ordered by most
           common first. Note that when multiple indexes or --all are
           specified, the file will pertain to the last index indexed (i.e.
           the last one listed in the configuration file).

       --buildfreqs
           Used together with --buildstops (and ignored if --buildstops is
           not specified). Whereas --buildstops provides the list of words
           used within the index, --buildfreqs adds the number of times each
           word occurs in the index, which is useful in establishing whether
           certain words should be considered stopwords because they are too
           prevalent. It also helps with developing "Did you mean..."
           features, where you need to know how much more common a given word
           is compared to another, similar one.

           Example:

               $ indexer myindex --buildstops word_freq.txt 1000 --buildfreqs

           This would produce word_freq.txt as above, but each word would be
           followed by the number of times it occurred in the index in
           question.

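           A frequency file of this kind can be post-processed with standard
           tools. The following is only a sketch under stated assumptions: it
           assumes each line holds a word and its count separated by
           whitespace, it uses made-up sample data in place of real indexer
           output, and the helper name stopword_candidates is hypothetical.

```shell
#!/bin/sh
# Sample data standing in for word_freq.txt. The "word count" layout and
# the numbers themselves are assumptions made for illustration.
freq_file=$(mktemp)
cat > "$freq_file" <<'EOF'
the 9120
search 870
sphinx 412
indexer 37
EOF

# Hypothetical helper: print every word whose count is at least MIN_COUNT.
# Usage: stopword_candidates FILE MIN_COUNT
stopword_candidates() {
    awk -v min="$2" '$2 + 0 >= min + 0 { print $1 }' "$1"
}

# With the sample data, only "the" and "search" reach the threshold.
stopword_candidates "$freq_file" 500
```

           The threshold of 500 is arbitrary; a suitable value depends on the
           size of the indexed data.
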
       --config CONFIGFILE, -c CONFIGFILE
           Use the given file as configuration. Normally, indexer will look
           for sphinx.conf in the installation directory (e.g.
           /usr/local/sphinx/etc/sphinx.conf if installed into
           /usr/local/sphinx), followed by the current directory you are in
           when calling indexer from the shell. This option is most useful in
           shared environments where the binary files are installed somewhere
           like /usr/local/sphinx/ but you want to give users the ability to
           make their own custom Sphinx set-ups, or if you want to run
           multiple instances on a single server. In cases like those you
           could allow them to create their own sphinx.conf files and pass
           them to indexer with this option.

           For example:

               $ indexer --config /home/myuser/sphinx.conf myindex

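           A wrapper script may want to mirror the default lookup order
           described above before deciding which file to pass to --config.
           This is a minimal sketch, not indexer's own logic: the function
           name find_sphinx_conf is hypothetical, and the etc/sphinx.conf
           layout under the installation prefix is taken from the example
           path above.

```shell
#!/bin/sh
# Hypothetical helper mirroring the documented lookup order: first
# PREFIX/etc/sphinx.conf, then sphinx.conf in the current directory.
# Prints the first file found; returns non-zero if neither exists.
# Usage: find_sphinx_conf PREFIX
find_sphinx_conf() {
    for candidate in "$1/etc/sphinx.conf" "./sphinx.conf"; do
        if [ -f "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Dry run: report what would be executed rather than calling indexer.
if conf=$(find_sphinx_conf /usr/local/sphinx); then
    echo "would run: indexer --config $conf --all"
else
    echo "no sphinx.conf found"
fi
```
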
       --dump-rows FILE
           Dumps rows fetched by SQL source(s) into the specified file, in a
           MySQL compatible syntax. The resulting dumps are an exact
           representation of the data as received by indexer and help to
           reproduce indexing-time issues.

       --merge DST-INDEX SRC-INDEX
           Physically merges two indexes together. For example, in a
           main+delta scheme, where the main index rarely changes but the
           delta index is rebuilt frequently, --merge would be used to
           combine the two. The operation moves from right to left: the
           contents of SRC-INDEX are examined and physically combined with
           the contents of DST-INDEX, and the result is left in DST-INDEX. In
           pseudo-code, it might be expressed as: DST-INDEX += SRC-INDEX

           An example:

               $ indexer --merge main delta --rotate

           In the above example, where main is the master, rarely modified
           index and delta is the more frequently modified one, indexer
           combines the contents of delta into the main index and rotates the
           indexes.

       --merge-dst-range ATTR MIN MAX
           Applies the given range filter while merging. This option is used
           as part of --merge and is ignored if --merge is not specified. As
           the merge is applied to the destination index, indexer also
           filters the documents ending up in it, and only documents that
           pass the filter will end up in the final index. This could be
           used, for example, in an index where there is a 'deleted'
           attribute, where 0 means 'not deleted'. Such an index could be
           merged with:

               $ indexer --merge main delta --merge-dst-range deleted 0 0

           Any documents marked as deleted (value 1) would be removed from the
           newly-merged destination index. The option can be given several
           times on the command line to add successive filters to the merge,
           all of which must be met in order for a document to become part of
           the final index.

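           When several filters are needed, the repeated options can be
           assembled by a script. The sketch below only prints the resulting
           command line and does not run indexer; the helper name
           build_merge_cmd and the 'published' attribute are hypothetical and
           used purely for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print an indexer --merge command line with one
# --merge-dst-range option per ATTR MIN MAX triple. Nothing is executed.
# Usage: build_merge_cmd DST SRC [ATTR MIN MAX]...
build_merge_cmd() {
    cmd="indexer --merge $1 $2"
    shift 2
    # Consume the remaining arguments three at a time.
    while [ "$#" -ge 3 ]; do
        cmd="$cmd --merge-dst-range $1 $2 $3"
        shift 3
    done
    printf '%s\n' "$cmd"
}

# "published" is a made-up attribute used only for illustration.
build_merge_cmd main delta deleted 0 0 published 1 1
```

           Since every filter must be met, a document would need both
           deleted = 0 and published = 1 to remain in the merged index.
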
       --merge-killlists, --merge-klists
           Used together with --merge. Normally, when merging, indexer uses
           the kill-list of the source index (i.e. the one being merged into
           the destination) as a filter to wipe out the matching documents
           from the destination index, while the kill-list of the destination
           itself is not touched at all. With --merge-killlists (or its
           shorter form, --merge-klists), indexer does not filter the
           destination index documents with the source index kill-list;
           instead it merges the two kill-lists together, so the resulting
           index has a kill-list containing the merged kill-lists of both
           input indexes.

       --noprogress
           Don't display progress details as they occur; instead, the final
           status details (such as documents indexed, speed of indexing and so
           on) are only reported at completion of indexing. In instances where
           the script is not being run on a console (or 'tty'), this will be
           on by default.

           Example usage:

               $ indexer --rotate --all --noprogress

       --print-queries
           Prints out SQL queries that indexer sends to the database, along
           with SQL connection and disconnection events. This is useful to
           diagnose and fix problems with SQL sources.

       --quiet
           Tells indexer not to output anything unless there is an error.
           Again, this is mostly used for cron-type or other scripted jobs
           where the output is irrelevant or unnecessary, except in the event
           of some kind of error.

           Example usage:

               $ indexer --rotate --all --quiet

       --rotate
           Used for rotating indexes. Unless you can take the search function
           offline without troubling users, you will almost certainly need to
           keep search running whilst indexing new documents. --rotate
           creates a second index, parallel to the first (in the same place,
           simply including .new in the filenames). Once complete, indexer
           notifies searchd by sending it the SIGHUP signal, and searchd will
           attempt to rename the indexes (renaming the existing ones to
           include .old and renaming the .new ones to replace them), and then
           start serving from the newer files. Depending on the setting of
           seamless_rotate, there may be a slight delay in being able to
           search the newer indexes.

           Example usage:

               $ indexer --rotate --all

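           The renaming step described above can be pictured with a small
           simulation. This is an illustration only and is not what searchd
           executes: it operates on empty placeholder files in a scratch
           directory, the helper name rotate_files is hypothetical, and the
           .spd extension and the exact position of .new and .old in the
           file names are assumptions.

```shell
#!/bin/sh
# Simulation only: mimics the renaming step on empty placeholder files in
# a scratch directory. The .spd extension and the exact placement of
# ".new"/".old" in the names are assumptions for illustration.
dir=$(mktemp -d)
: > "$dir/myindex.spd"        # file currently being served
: > "$dir/myindex.new.spd"    # file freshly built with --rotate

# Hypothetical helper: keep the current file as .old, then promote .new.
# Usage: rotate_files DIR INDEXNAME
rotate_files() {
    mv "$1/$2.spd" "$1/$2.old.spd" &&
    mv "$1/$2.new.spd" "$1/$2.spd"
}

rotate_files "$dir" myindex
ls "$dir"
```
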
       --sighup-each
           Useful when you are rebuilding many big indexes and want each one
           rotated into searchd as soon as possible. With --sighup-each,
           indexer will send a SIGHUP signal to searchd after successfully
           completing the work on each index. (The default behavior is to send
           a single SIGHUP after all the indexes are built.)

       --verbose
           Guarantees that every row that caused problems during indexing
           (duplicate, zero, or missing document ID; file field I/O issues;
           etc.) will be reported. By default, this option is off, and
           problem summaries may be reported instead.

AUTHOR

       Andrey Aksenoff (shodan@sphinxsearch.com). This manual page is written
       by Alexey Vinogradov (klirichek@sphinxsearch.com), using the one
       written by Christian Hofstaedtler (ch+debian-packages@zeha.at) for the
       Debian system (but may be used by others). Permission is granted to
       copy, distribute and/or modify this document under the terms of the GNU
       General Public License, Version 2 or any later version published by
       the Free Software Foundation.

       On Debian systems, the complete text of the GNU General Public License
       can be found in /usr/share/common-licenses/GPL.

SEE ALSO

       searchd(1), search(1), indextool(1), spelldump(1)

       Sphinx and its programs are documented fully by the Sphinx reference
       manual available in /usr/share/doc/sphinxsearch.

2.2.11-release                    07/19/2016                        INDEXER(1)