PUBLIC-INBOX-INDEX(1)      public-inbox user manual      PUBLIC-INBOX-INDEX(1)

NAME
       public-inbox-index - create and update search indices

SYNOPSIS
       public-inbox-index [OPTIONS] INBOX_DIR...

       public-inbox-index [OPTIONS] --all

DESCRIPTION
       public-inbox-index creates and updates the search, overview and NNTP
       article number database used by the read-only public-inbox HTTP and
       NNTP interfaces. Currently, this requires the DBD::SQLite and DBI Perl
       modules. Search::Xapian is optional and only needed to support the
       PSGI search interface.

       Once the initial indices are created by public-inbox-index,
       public-inbox-mda(1) and public-inbox-watch(1) will automatically
       maintain them.

       Running this manually to update indices is only required if relying on
       git-fetch(1) to mirror an existing public-inbox, or if upgrading to a
       new version of public-inbox using the "--reindex" option.
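
       For a mirror kept current with git-fetch(1), the manual update
       described above might look like the following minimal sketch. It
       assumes a v1 inbox, where INBOX_DIR is itself a bare git repository
       with a fetch remote already configured. The function name
       update_v1_mirror and the example path are hypothetical.

```shell
# Sketch: refresh a v1 mirror, then update its indices.
# ASSUMPTION: "$1" is a bare git repository (v1 inbox) with a remote configured.
update_v1_mirror() {
    inbox_dir=$1
    # Fetch new messages from the upstream inbox.
    git --git-dir="$inbox_dir" fetch || return 1
    # Index whatever the fetch brought in.
    public-inbox-index "$inbox_dir"
}

# Example (hypothetical path):
#   update_v1_mirror /path/to/inbox.git
```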

       Having the overview and article number database is essential to running
       the NNTP interface, and strongly recommended for the HTTP interface as
       it provides thread grouping in addition to normal search functionality.

OPTIONS
       -j JOBS
       --jobs=JOBS
           Influences the number of Xapian indexing shards in a
           (public-inbox-v2-format(5)) inbox.

           See "--jobs" in public-inbox-init(1) for a full description of
           sharding.

           "--jobs=0" is accepted as of public-inbox 1.6.0 to disable parallel
           indexing regardless of the number of pre-existing shards.

           If the inbox has not been indexed or initialized, "JOBS - 1" shards
           will be created (one job is always needed for indexing the overview
           and article number mapping).

           Default: the number of existing Xapian shards
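
           The "JOBS - 1" rule can be written out as a small calculation.
           This is only an illustration of the arithmetic stated above, not
           code taken from public-inbox; the function name
           initial_shard_count is hypothetical, and the sketch only covers
           JOBS values of 2 or more.

```shell
# Sketch of the "JOBS - 1" rule for an inbox that has never been indexed:
# one job is reserved for the overview and article number mapping.
# ASSUMPTION: only meaningful for JOBS >= 2; "--jobs=0" (disable parallel
# indexing) is a separate case and is not modelled here.
initial_shard_count() {
    njobs=$1
    echo $((njobs - 1))
}

initial_shard_count 4    # prints 3
```

           Under this rule, "--jobs=4" on a fresh v2 inbox would leave three
           Xapian shards.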

       -c
       --compact
           Compacts the Xapian DBs after indexing. This is recommended when
           using "--reindex" to avoid running out of disk space while indexing
           multiple inboxes.

           While this option takes a negligible amount of time compared to
           "--reindex", it requires temporarily duplicating the entire
           contents of the Xapian DB.

           This switch may be specified twice, in which case compaction
           happens both before and after indexing to minimize the temporary
           disk footprint of the (re)indexing operation.

           Available since public-inbox 1.4.0.

       --reindex
           Forces a re-index of all messages in the inbox. This can be used
           for in-place upgrades and bugfixes while NNTP/HTTP server processes
           are utilizing the index. Keep in mind this roughly doubles the
           size of the already-large Xapian database. Using this with
           "--compact" or running public-inbox-compact(1) afterwards is
           recommended to release free space.

           public-inbox protects writes to various indices with flock(2), so
           it is safe to reindex (and rethread) while public-inbox-watch(1),
           public-inbox-mda(1) or public-inbox-learn(1) run.

           This does not touch the NNTP article number database. It does not
           affect threading unless "--rethread" is used.
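
           A reindex combined with compaction, as recommended above, might be
           wrapped as follows. This is a minimal sketch; the function names
           and the inbox path are hypothetical, and only options documented
           on this page are used.

```shell
# Sketch: in-place reindex, compacting afterwards to release free space.
reindex_and_compact() {
    public-inbox-index --reindex --compact "$1"
}

# Variant that also regenerates thread associations
# ("--rethread" requires public-inbox 1.6.0+).
reindex_and_rethread() {
    public-inbox-index --reindex --rethread --compact "$1"
}

# Example (hypothetical path):
#   reindex_and_compact /path/to/inbox
```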

       --all
           Index all inboxes configured in ~/.public-inbox/config. This is an
           alternative to specifying individual inbox directories on the
           command-line.

       --rethread
           Regenerate internal THREADID and message thread associations when
           reindexing.

           This fixes some bugs in older versions of public-inbox. While it
           is possible to use this without "--reindex", it makes little sense
           to do so.

           Available in public-inbox 1.6.0+.

       --prune
           Run git-gc(1) to prune and expire reflogs if discontiguous history
           is detected. This is intended to be used in mirrors after running
           public-inbox-edit(1) or public-inbox-purge(1) to ensure data is
           expunged from mirrors.

           Available since public-inbox 1.2.0.

       --max-size SIZE
           Sets or overrides "publicinbox.indexMaxSize" on a per-invocation
           basis. See "publicinbox.indexMaxSize" below.

           Available since public-inbox 1.5.0.

       --batch-size SIZE
           Sets or overrides "publicinbox.indexBatchSize" on a per-invocation
           basis. See "publicinbox.indexBatchSize" below.

           When using rotational storage but abundant RAM, using a large value
           (e.g. "500m") with "--sequential-shard" can significantly speed up
           and reduce fragmentation during the initial index and full
           "--reindex" invocations (but not incremental updates).

           Available in public-inbox 1.6.0+.
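
           The rotational-storage advice above could be applied to an initial
           index like this. It is a sketch under the stated conditions (slow
           disks, abundant RAM); the function name and path are hypothetical,
           and "500m" is simply the example value mentioned above.

```shell
# Sketch: initial index on rotational storage with plenty of RAM.
# A large batch size is paired with sequential shard indexing.
initial_index_rotational() {
    public-inbox-index --batch-size=500m --sequential-shard "$1"
}

# Example (hypothetical path):
#   initial_index_rotational /path/to/inbox
```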

       --no-fsync
           Disables fsync(2) and fdatasync(2) operations on SQLite and Xapian.
           This is only effective with Xapian 1.4+. This is primarily
           intended for systems with low RAM and the small (default)
           "--batch-size=1m". Users of large "--batch-size" may even find
           disabling fdatasync(2) causes too much dirty data to accumulate,
           resulting in latency spikes from writeback.

           Available in public-inbox 1.6.0+.

       --dangerous
           Speed up initial index by using in-place updates and denying
           support for concurrent readers. This is only effective with Xapian
           1.4+.

           Available in public-inbox 1.8.0+.

       --sequential-shard
           Sets or overrides "publicinbox.indexSequentialShard" on a
           per-invocation basis. See "publicinbox.indexSequentialShard" below.

           Available in public-inbox 1.6.0+.

       --skip-docdata
           Stop storing document data in Xapian on an existing inbox.

           See "--skip-docdata" in public-inbox-init(1) for description and
           caveats.

           Available in public-inbox 1.6.0+.

       -E EXTINDEX
       --update-extindex=EXTINDEX
           Update the given external index (public-inbox-extindex-format(5)).
           Either the configured section name (e.g. "all") or a directory name
           may be specified.

           Defaults to "all" if "[extindex "all"]" is configured, otherwise no
           external indices are updated.

           May be specified multiple times in rare cases where multiple
           external indices are configured.

       --no-update-extindex
           Do not update the "all" external index by default. This negates
           all uses of "-E" / "--update-extindex=" on the command-line.

       --since=DATESTRING
       --after=DATESTRING
       --until=DATESTRING
       --before=DATESTRING
           Passed directly to git-log(1) to limit changes for "--reindex".
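
           A date-limited reindex might be invoked as in the sketch below.
           The function name, the date and the path are hypothetical; the
           date string is handed to git-log(1) unchanged, so any format
           git-log(1) accepts should work.

```shell
# Sketch: reindex only changes made since a given date.
# $1 = DATESTRING understood by git-log(1), $2 = inbox directory
reindex_since() {
    since=$1
    inbox_dir=$2
    public-inbox-index --reindex --since="$since" "$inbox_dir"
}

# Example (hypothetical values):
#   reindex_since 2020-01-01 /path/to/inbox
```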

FILES
       For v1 (ssoma) repositories described in public-inbox-v1-format(5),
       all public-inbox-specific files are contained within the
       "$GIT_DIR/public-inbox/" directory.

       v2 inboxes are described in public-inbox-v2-format(5).

CONFIGURATION
       publicinbox.indexMaxSize
           Prevents indexing of messages larger than the specified size
           value. A single suffix modifier of "k", "m" or "g" is
           supported, thus a value of "1m" prevents indexing of
           messages larger than one megabyte.

           This is useful for avoiding memory exhaustion in mirrors via
           git. It does not prevent public-inbox-mda(1) or
           public-inbox-watch(1) from importing (and indexing) a message.

           This option is only available in public-inbox 1.5 or later.

           Default: none
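
           To make the suffix notation concrete, the sketch below converts a
           size value to bytes. It is an illustration only, not the parser
           public-inbox uses. It assumes the suffixes are binary multiples
           (k = 1024), as in git-config(1); the function name size_to_bytes
           is hypothetical.

```shell
# Sketch: convert a size such as "1m" to a byte count.
# ASSUMPTION: "k", "m" and "g" are powers of 1024, as in git-config(1).
size_to_bytes() {
    case $1 in
        *k) echo $(( ${1%?} * 1024 )) ;;
        *m) echo $(( ${1%?} * 1024 * 1024 )) ;;
        *g) echo $(( ${1%?} * 1024 * 1024 * 1024 )) ;;
        *)  echo "$1" ;;   # no suffix: already a byte count
    esac
}

size_to_bytes 1m    # prints 1048576
```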

       publicinbox.indexBatchSize
           Flushes changes to the filesystem and releases locks after
           indexing the given number of bytes. The default value of "1m"
           (one megabyte) is low to minimize memory use and reduce
           contention with parallel invocations of public-inbox-mda(1),
           public-inbox-learn(1), and public-inbox-watch(1).

           Increase this value on powerful systems to improve throughput
           at the expense of memory use. The reduction of lock
           granularity may not be noticeable on fast systems. With SSDs,
           values above "4m" have little benefit.

           For public-inbox-v2-format(5) inboxes, this value is multiplied
           by the number of Xapian shards. Thus a typical v2 inbox with 3
           shards will flush every 3 megabytes by default unless
           parallelism is disabled via "--sequential-shard" or "--jobs=0".
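
           The multiplication described above can be checked with a short
           calculation. This is a sketch of the arithmetic only; the function
           name is hypothetical, and it assumes one megabyte means 1048576
           bytes.

```shell
# Sketch: effective flush interval for a v2 inbox indexed in parallel.
# $1 = batch size in bytes, $2 = number of Xapian shards
# ASSUMPTION: "1m" corresponds to 1048576 bytes.
flush_interval_bytes() {
    batch_bytes=$1
    shards=$2
    echo $(( batch_bytes * shards ))
}

flush_interval_bytes 1048576 3    # prints 3145728 (3 megabytes)
```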

           This influences memory usage of Xapian, but it is not exact.
           The actual memory used by Xapian and Perl has been observed in
           excess of 10x this value.

           This option is available in public-inbox 1.6 or later.
           public-inbox 1.5 and earlier used the current default, "1m".

           Default: 1m (one megabyte)

       publicinbox.indexSequentialShard
           For public-inbox-v2-format(5) inboxes, setting this to "true"
           allows indexing Xapian shards in multiple passes. This speeds
           up indexing on rotational storage with high seek latency by
           allowing individual shards to fit into the kernel page cache.

           Using a higher-than-normal number of "--jobs" with
           public-inbox-init(1) may be required to ensure individual
           shards are small enough to fit into cache.

           Warning: interrupting "public-inbox-index(1)" while this option
           is in use may leave the search indices out-of-date with respect
           to SQLite databases. WWW and IMAP users may notice incomplete
           search results, but it is otherwise non-fatal. Using
           "--reindex" will bring everything back up-to-date.

           Available in public-inbox 1.6.0+.

           This is ignored on public-inbox-v1-format(5) inboxes.

           Default: false, shards are indexed in parallel

       publicinbox.<name>.indexSequentialShard
           Identical to "publicinbox.indexSequentialShard", but only
           affects the inbox matching <name>.

ENVIRONMENT
       PI_CONFIG
           Used to override the default "~/.public-inbox/config" value.
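
           For example, an alternate configuration file could be selected
           for a single run as sketched below. The function name and the
           config path are hypothetical.

```shell
# Sketch: index every configured inbox using an alternate config file.
# PI_CONFIG is set only for this one invocation.
index_all_with_config() {
    PI_CONFIG=$1 public-inbox-index --all
}

# Example (hypothetical path):
#   index_all_with_config /etc/public-inbox/config
```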

       XAPIAN_FLUSH_THRESHOLD
           The number of documents to update before committing changes to
           disk. This environment variable is handled directly by Xapian;
           refer to the Xapian API documentation for more details.

           For public-inbox 1.6 and later, use
           "publicinbox.indexBatchSize" instead.

           Setting "XAPIAN_FLUSH_THRESHOLD" or
           "publicinbox.indexBatchSize" for a large "--reindex" may cause
           public-inbox-mda(1), public-inbox-learn(1) and
           public-inbox-watch(1) tasks to wait long and unpredictable
           periods of time during "--reindex".

           Default: none, uses "publicinbox.indexBatchSize"

UPGRADING
       Occasionally, public-inbox will update its schema version and require
       a full index by running this command.

CONTACT
       Feedback welcome via plain-text mail to <mailto:meta@public-inbox.org>

       The mail archives are hosted at <https://public-inbox.org/meta/> and
       <http://4uok3hntl7oi7b4uf4rtfwefqeexfzil2w6kgk2jn5z2f764irre7byd.onion/meta/>

COPYRIGHT
       Copyright all contributors <mailto:meta@public-inbox.org>

       License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>

SEE ALSO
       Search::Xapian, DBD::SQLite, public-inbox-extindex-format(5)

public-inbox.git                  1993-10-02             PUBLIC-INBOX-INDEX(1)