PUBLIC-INBOX-INDEX(1)      public-inbox user manual      PUBLIC-INBOX-INDEX(1)

NAME

       public-inbox-index - create and update search indices

SYNOPSIS

       public-inbox-index [OPTIONS] INBOX_DIR...

       public-inbox-index [OPTIONS] --all

DESCRIPTION

       public-inbox-index creates and updates the search, overview and NNTP
       article number databases used by the read-only public-inbox HTTP and
       NNTP interfaces.  Currently, this requires the DBD::SQLite and DBI Perl
       modules.  Search::Xapian is optional, only to support the PSGI search
       interface.

       Once the initial indices are created by public-inbox-index,
       public-inbox-mda(1) and public-inbox-watch(1) will automatically
       maintain them.

       Running this manually to update indices is only required if relying on
       git-fetch(1) to mirror an existing public-inbox, or when upgrading to a
       new version of public-inbox (with the "--reindex" option).

       Having the overview and article number databases is essential to running
       the NNTP interface, and strongly recommended for the HTTP interface as
       they provide thread grouping in addition to normal search functionality.

OPTIONS

       -j JOBS
       --jobs=JOBS
           Influences the number of Xapian indexing shards in a
           (public-inbox-v2-format(5)) inbox.

           See "--jobs" in public-inbox-init(1) for a full description of
           sharding.

           "--jobs=0" is accepted as of public-inbox 1.6.0 to disable parallel
           indexing regardless of the number of pre-existing shards.

           If the inbox has not been indexed or initialized, "JOBS - 1" shards
           will be created (one job is always needed for indexing the overview
           and article number mapping).

           Default: the number of existing Xapian shards

       -c
       --compact
           Compacts the Xapian DBs after indexing.  This is recommended when
           using "--reindex" to avoid running out of disk space while indexing
           multiple inboxes.

           While this option takes a negligible amount of time compared to
           "--reindex", it requires temporarily duplicating the entire
           contents of the Xapian DB.

           This switch may be specified twice, in which case compaction
           happens both before and after indexing to minimize the temporal
           footprint of the (re)indexing operation.

           Available since public-inbox 1.4.0.

       --reindex
           Forces a re-index of all messages in the inbox.  This can be used
           for in-place upgrades and bugfixes while NNTP/HTTP server processes
           are utilizing the index.  Keep in mind this roughly doubles the
           size of the already-large Xapian database.  Using this with
           "--compact" or running public-inbox-compact(1) afterwards is
           recommended to release free space.

           public-inbox protects writes to various indices with flock(2), so
           it is safe to reindex (and rethread) while public-inbox-watch(1),
           public-inbox-mda(1) or public-inbox-learn(1) run.

           This does not touch the NNTP article number database.  It does not
           affect threading unless "--rethread" is used.

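           As a hedged illustration, the sketch below assembles a full
           reindex command line in Python using only options documented in
           this manual.  The helper name reindex_argv(), its parameters and
           the path "/path/to/inbox" are hypothetical; the call that would
           actually run the command is left commented out.

```python
import subprocess


def reindex_argv(inbox_dirs, compact=1, rethread=False):
    """Build an argv list for a full reindex (hypothetical helper)."""
    argv = ["public-inbox-index", "--reindex"]
    # "--compact" may be repeated: twice compacts before and after indexing
    argv += ["--compact"] * compact
    if rethread:
        # documented as available in public-inbox 1.6.0+
        argv.append("--rethread")
    if inbox_dirs:
        argv += list(inbox_dirs)
    else:
        # no directories given: index every inbox in the config file
        argv.append("--all")
    return argv


argv = reindex_argv(["/path/to/inbox"], compact=2, rethread=True)
print(" ".join(argv))
# subprocess.run(argv, check=True)  # uncomment to actually run the command
```

           Passing compact=2 repeats "--compact", so compaction would happen
           both before and after indexing.
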
       --all
           Index all inboxes configured in ~/.public-inbox/config.  This is an
           alternative to specifying individual inbox directories on the
           command-line.

       --rethread
           Regenerate internal THREADID and message thread associations when
           reindexing.

           This fixes some bugs in older versions of public-inbox.  While it
           is possible to use this without "--reindex", it makes little sense
           to do so.

           Available in public-inbox 1.6.0+.

       --prune
           Run git-gc(1) to prune and expire reflogs if discontiguous history
           is detected.  This is intended to be used in mirrors after running
           public-inbox-edit(1) or public-inbox-purge(1) to ensure data is
           expunged from mirrors.

           Available since public-inbox 1.2.0.

       --max-size SIZE
           Sets or overrides "publicinbox.indexMaxSize" on a per-invocation
           basis.  See "publicinbox.indexMaxSize" below.

           Available since public-inbox 1.5.0.

       --batch-size SIZE
           Sets or overrides "publicinbox.indexBatchSize" on a per-invocation
           basis.  See "publicinbox.indexBatchSize" below.

           When using rotational storage but abundant RAM, using a large value
           (e.g. "500m") with "--sequential-shard" can significantly speed up
           and reduce fragmentation during the initial index and full
           "--reindex" invocations (but not incremental updates).

           Available in public-inbox 1.6.0+.

       --no-fsync
           Disables fsync(2) and fdatasync(2) operations on SQLite and Xapian.
           This is only effective with Xapian 1.4+.  This is primarily
           intended for systems with low RAM and the small (default)
           "--batch-size=1m".  Users of large "--batch-size" may even find
           disabling fdatasync(2) causes too much dirty data to accumulate,
           resulting in latency spikes from writeback.

           Available in public-inbox 1.6.0+.

       --dangerous
           Speed up initial index by using in-place updates and denying
           support for concurrent readers.  This is only effective with Xapian
           1.4+.

           Available in public-inbox 1.8.0+.

       --sequential-shard
           Sets or overrides "publicinbox.indexSequentialShard" on a per-
           invocation basis.  See "publicinbox.indexSequentialShard" below.

           Available in public-inbox 1.6.0+.

       --skip-docdata
           Stop storing document data in Xapian on an existing inbox.

           See "--skip-docdata" in public-inbox-init(1) for description and
           caveats.

           Available in public-inbox 1.6.0+.

       -E EXTINDEX
       --update-extindex=EXTINDEX
           Update the given external index (public-inbox-extindex-format(5)).
           Either the configured section name (e.g. "all") or a directory name
           may be specified.

           Defaults to "all" if "[extindex "all"]" is configured, otherwise no
           external indices are updated.

           May be specified multiple times in rare cases where multiple
           external indices are configured.

       --no-update-extindex
           Do not update the "all" external index by default.  This negates
           all uses of "-E" / "--update-extindex=" on the command-line.

       --since=DATESTRING
       --after=DATESTRING
       --until=DATESTRING
       --before=DATESTRING
           Passed directly to git-log(1) to limit changes for "--reindex".

FILES

       For v1 (ssoma) repositories described in public-inbox-v1-format(5),
       all public-inbox-specific files are contained within the
       "$GIT_DIR/public-inbox/" directory.

       v2 inboxes are described in public-inbox-v2-format(5).

CONFIGURATION

       publicinbox.indexMaxSize
               Prevents indexing of messages larger than the specified size
               value.  A single suffix modifier of "k", "m" or "g" is
               supported, thus a value of "1m" prevents indexing of
               messages larger than one megabyte.

               This is useful for avoiding memory exhaustion in mirrors via
               git.  It does not prevent public-inbox-mda(1) or
               public-inbox-watch(1) from importing (and indexing) a message.

               This option is only available in public-inbox 1.5 or later.

               Default: none

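               The size check described above can be sketched in Python as
               follows.  This is an illustration only, not public-inbox
               code: the function names are hypothetical, and the
               multipliers are assumed to be powers of 1024, as with
               git-config(1) integer values.

```python
import re

# Assumption: suffixes are powers of 1024, as in git-config(1) integers.
_MULT = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_size(value):
    """Convert a size such as "1m" or "512k" to a number of bytes."""
    m = re.fullmatch(r"([0-9]+)([kmg]?)", value)
    if m is None:
        raise ValueError(f"invalid size: {value!r}")
    return int(m.group(1)) * _MULT[m.group(2)]


def should_index(message_bytes, index_max_size=None):
    """Return False only when the message is larger than the limit.

    index_max_size=None mirrors the documented default of no limit.
    """
    if index_max_size is None:
        return True
    return message_bytes <= parse_size(index_max_size)
```
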
       publicinbox.indexBatchSize
               Flushes changes to the filesystem and releases locks after
               indexing the given number of bytes.  The default value of "1m"
               (one megabyte) is low to minimize memory use and reduce
               contention with parallel invocations of public-inbox-mda(1),
               public-inbox-learn(1), and public-inbox-watch(1).

               Increase this value on powerful systems to improve throughput
               at the expense of memory use.  The reduction of lock
               granularity may not be noticeable on fast systems.  With SSDs,
               values above "4m" have little benefit.

               For public-inbox-v2-format(5) inboxes, this value is multiplied
               by the number of Xapian shards.  Thus a typical v2 inbox with 3
               shards will flush every 3 megabytes by default unless
               parallelism is disabled via "--sequential-shard" or "--jobs=0".

               This influences memory usage of Xapian, but it is not exact.
               The actual memory used by Xapian and Perl has been observed in
               excess of 10x this value.

               This option is available in public-inbox 1.6 or later.  public-
               inbox 1.5 and earlier used the current default, "1m".

               Default: 1m (one megabyte)

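               The multiplication by shard count can be made concrete with
               a small Python sketch.  The function name and parameters are
               hypothetical, one megabyte is assumed to mean 1024 * 1024
               bytes, and it is assumed that the plain batch size applies
               once parallelism is disabled.

```python
def flush_interval_bytes(batch_size_bytes=1024 ** 2, shards=1, parallel=True):
    """Approximate number of bytes indexed between flushes for a v2 inbox.

    parallel=False stands for "--sequential-shard" or "--jobs=0"; the
    assumption here is that the batch size is then used unmultiplied.
    """
    if parallel:
        return batch_size_bytes * shards
    return batch_size_bytes


# A typical v2 inbox with 3 shards and the default "1m" batch size
print(flush_interval_bytes(shards=3))
```
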
       publicinbox.indexSequentialShard
               For public-inbox-v2-format(5) inboxes, setting this to "true"
               allows indexing Xapian shards in multiple passes.  This speeds
               up indexing on rotational storage with high seek latency by
               allowing individual shards to fit into the kernel page cache.

               Using a higher-than-normal number of "--jobs" with
               public-inbox-init(1) may be required to ensure individual
               shards are small enough to fit into cache.

               Warning: interrupting "public-inbox-index(1)" while this option
               is in use may leave the search indices out-of-date with respect
               to SQLite databases.  WWW and IMAP users may notice incomplete
               search results, but it is otherwise non-fatal.  Using
               "--reindex" will bring everything back up-to-date.

               Available in public-inbox 1.6.0+.

               This is ignored on public-inbox-v1-format(5) inboxes.

               Default: false, shards are indexed in parallel

       publicinbox.<name>.indexSequentialShard
               Identical to "publicinbox.indexSequentialShard", but only
               affects the inbox matching <name>.

ENVIRONMENT

       PI_CONFIG
               Used to override the default "~/.public-inbox/config" value.

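               A wrapper script might resolve the effective config path the
               same way.  The following Python sketch is an assumption
               about how such a lookup could be written, not code from
               public-inbox; config_path() is a hypothetical name.

```python
import os


def config_path(environ=None):
    """Return PI_CONFIG if set, else the default ~/.public-inbox/config."""
    env = os.environ if environ is None else environ
    default = os.path.expanduser("~/.public-inbox/config")
    return env.get("PI_CONFIG", default)


print(config_path())
```
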
       XAPIAN_FLUSH_THRESHOLD
               The number of documents to update before committing changes to
               disk.  This environment variable is handled directly by Xapian;
               refer to the Xapian API documentation for more details.

               For public-inbox 1.6 and later, use
               "publicinbox.indexBatchSize" instead.

               Setting "XAPIAN_FLUSH_THRESHOLD" or
               "publicinbox.indexBatchSize" for a large "--reindex" may cause
               public-inbox-mda(1), public-inbox-learn(1) and
               public-inbox-watch(1) tasks to wait long and unpredictable
               periods of time during "--reindex".

               Default: none, uses "publicinbox.indexBatchSize"

UPGRADING

       Occasionally, public-inbox will update its schema version and require
       a full index by running this command.

CONTACT

       Feedback welcome via plain-text mail to <mailto:meta@public-inbox.org>

       The mail archives are hosted at <https://public-inbox.org/meta/> and
       <http://4uok3hntl7oi7b4uf4rtfwefqeexfzil2w6kgk2jn5z2f764irre7byd.onion/meta/>

COPYRIGHT

       Copyright all contributors <mailto:meta@public-inbox.org>

       License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>

SEE ALSO

       Search::Xapian, DBD::SQLite, public-inbox-extindex-format(5)

public-inbox.git                  1993-10-02             PUBLIC-INBOX-INDEX(1)