PUBLIC-INBOX-TUNING(7)       public-inbox user manual      PUBLIC-INBOX-TUNING(7)


NAME
       public-inbox-tuning - tuning public-inbox

DESCRIPTION
       public-inbox intends to support a wide variety of hardware.  While
       we strive to provide the best out-of-the-box performance possible,
       tuning knobs are an unfortunate necessity in some cases.

       1. New inboxes: public-inbox-init -V2

       2. Optional Inline::C use

       3. Performance on rotational hard disk drives

       4. Btrfs (and possibly other copy-on-write filesystems)

       5. Performance on solid state drives

       6. Read-only daemons

       7. Other OS tuning knobs

       8. Scalability to many inboxes

   New inboxes: public-inbox-init -V2
       If you're starting a new inbox (and not mirroring an existing one),
       the "-V2" format requires DBD::SQLite, but is orders of magnitude
       more scalable than the original "-V1" format.
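       For example, a fresh "-V2" inbox could be created like this (the
       inbox name, path, URL, and address below are all hypothetical):

```shell
# create a new -V2 inbox; the name, path, URL and address are examples
public-inbox-init -V2 meta /srv/public-inbox/meta \
        https://example.com/meta/ meta@example.com
```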

   Optional Inline::C use
       Our optional use of Inline::C speeds up subprocess spawning from
       large daemon processes.

       To enable Inline::C, either set the "PERL_INLINE_DIRECTORY"
       environment variable to point to a writable directory, or create
       "~/.cache/public-inbox/inline-c" for any user(s) running
       public-inbox processes.
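       For example (the "PERL_INLINE_DIRECTORY" value below is an
       arbitrary writable path, not a required location):

```shell
# per-user cache directory checked by public-inbox:
mkdir -p ~/.cache/public-inbox/inline-c

# or point PERL_INLINE_DIRECTORY at any writable directory
# (the path here is just an example):
export PERL_INLINE_DIRECTORY="$HOME/.cache/inline-c-example"
mkdir -p "$PERL_INLINE_DIRECTORY"
```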

       If libgit2 development files are installed and Inline::C is enabled
       (described above), per-inbox "git cat-file --batch" processes are
       replaced with a single perl(1) process running
       "PublicInbox::Gcf2::loop" in read-only daemons.  libgit2 use will
       be available in public-inbox 1.7.0+.

       More (optional) Inline::C use will be introduced in the future to
       lower memory use and improve scalability.

       Note: Inline::C is required for lei(1), but not public-inbox-*

   Performance on rotational hard disk drives
       Random I/O performance is poor on rotational HDDs.  Xapian indexing
       performance degrades significantly as DBs grow larger than
       available RAM.  Attempts to parallelize random I/O on HDDs lead to
       pathological slowdowns as inboxes grow.

       While "-V2" introduced Xapian shards as a parallelization mechanism
       for SSDs, enabling "publicInbox.indexSequentialShard" repurposes
       sharding as a mechanism to reduce the kernel page cache footprint
       when indexing on HDDs.

       Initializing a mirror with a high "--jobs" count to create more
       shards (in "-V2" inboxes) will keep each shard smaller and reduce
       its kernel page cache footprint.  Keep in mind excessive sharding
       imposes a performance penalty for read-only queries.

       Users with large amounts of RAM are advised to set a large value
       for "publicinbox.indexBatchSize" as documented in
       public-inbox-index(1).

       "dm-crypt" users on Linux 4.0+ are advised to try the
       "--perf-same_cpu_crypt" and "--perf-submit_from_crypt_cpus"
       switches of cryptsetup(8) to reduce I/O contention from kernel
       workqueue threads.
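       A sketch with cryptsetup 2.x (the device and mapping names are
       hypothetical; run as root):

```shell
# open a LUKS device with the same_cpu_crypt and submit_from_crypt_cpus
# dm-crypt performance flags enabled; /dev/sdb1 and "inboxes" are examples
cryptsetup open --type luks \
        --perf-same_cpu_crypt --perf-submit_from_crypt_cpus \
        /dev/sdb1 inboxes
```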

   Btrfs (and possibly other copy-on-write filesystems)
       btrfs(5) performance degrades from fragmentation when using large
       databases and random writes.  The Xapian + SQLite indices used by
       public-inbox are no exception to that.

       public-inbox 1.6.0+ disables copy-on-write (CoW) on Xapian and
       SQLite indices on btrfs to achieve acceptable performance (even on
       SSD).  Disabling copy-on-write also disables checksumming, thus
       "raid1" (or higher) configurations may be corrupt after unsafe
       shutdowns.

       Fortunately, these SQLite and Xapian indices are designed to be
       recoverable from git if missing.

       Disabling CoW does not prevent all fragmentation.  Large values of
       "publicInbox.indexBatchSize" also limit fragmentation during the
       initial index.

       Avoid snapshotting subvolumes containing Xapian and/or SQLite
       indices.  Snapshots use CoW despite our efforts to disable it,
       resulting in fragmentation.

       filefrag(8) can be used to monitor fragmentation, and "btrfs
       filesystem defragment -fr $INBOX_DIR" may be necessary.
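       A sketch for monitoring and defragmenting ("$INBOX_DIR" is a
       placeholder; the file names assume a "-V2" inbox layout):

```shell
# inspect extent counts of the SQLite indices (paths are examples)
filefrag "$INBOX_DIR"/msgmap.sqlite3 "$INBOX_DIR"/over.sqlite3

# defragment the inbox directory if fragmentation is high (needs root)
btrfs filesystem defragment -fr "$INBOX_DIR"
```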

       Large filesystems benefit significantly from the "space_cache=v2"
       mount option documented in btrfs(5).
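       For example (the device and mountpoint are hypothetical):

```shell
# example /etc/fstab entry for a btrfs filesystem holding inboxes:
#   /dev/sdb1  /srv/inboxes  btrfs  noatime,space_cache=v2  0  2

# or a one-off mount (needs root):
mount -o space_cache=v2 /dev/sdb1 /srv/inboxes
```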

       Older, non-CoW filesystems generally work well out-of-the-box for
       our Xapian and SQLite indices.

   Performance on solid state drives
       While SSD read performance is generally good, SSD write performance
       degrades as the drive ages and/or gets full.  Issuing "TRIM"
       commands via fstrim(8) or similar is required to sustain write
       performance.
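       For example (the mountpoint is hypothetical; "fstrim.timer" is
       shipped by util-linux on systemd-based distributions):

```shell
# one-off TRIM of the filesystem holding inboxes (needs root):
fstrim -v /srv/inboxes

# or enable the weekly timer for all capable filesystems:
systemctl enable --now fstrim.timer
```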

       Users of the Flash-Friendly File System F2FS
       <https://en.wikipedia.org/wiki/F2FS> may benefit from optimizations
       found in SQLite 3.21.0+.  Benchmarks are greatly appreciated.

   Read-only daemons
       public-inbox-httpd(1), public-inbox-imapd(1), and
       public-inbox-nntpd(1) are all designed for C10K (or higher) levels
       of concurrency from a single process.  SMP systems may use
       "--worker-processes=NUM" as documented in public-inbox-daemon(8)
       for parallelism.
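       A sketch (the listen address and worker count are arbitrary
       examples; see public-inbox-daemon(8) for the shared options):

```shell
# HTTP daemon with 4 worker processes; address and count are examples
public-inbox-httpd -l 0.0.0.0:8080 --worker-processes=4
```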

       The open file descriptor limit ("RLIMIT_NOFILE", "ulimit -n" in
       sh(1), "LimitNOFILE=" in systemd.exec(5)) may need to be raised to
       accommodate many concurrent clients.
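       For example (the limit value and the systemd unit name are
       illustrative):

```shell
# raise the soft limit for the current shell before spawning a daemon
# (raising it may require a matching hard limit or root):
ulimit -n 65536

# for systemd, a drop-in such as
#   /etc/systemd/system/public-inbox-httpd@.service.d/nofile.conf
# containing:
#   [Service]
#   LimitNOFILE=65536
```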

       Transport Layer Security (IMAPS, NNTPS, or via STARTTLS)
       significantly increases memory use of client sockets; be sure to
       account for that in capacity planning.

   Other OS tuning knobs
       Linux users: the "sys.vm.max_map_count" sysctl may need to be
       increased if handling thousands of inboxes (with
       public-inbox-extindex(1)) to avoid out-of-memory errors from git.
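       For example (the value shown is only an illustration; the sysctl
       name on Linux is "vm.max_map_count"):

```shell
# check the current limit:
sysctl vm.max_map_count

# raise it (needs root):
sysctl -w vm.max_map_count=262144

# persist across reboots:
echo vm.max_map_count=262144 >/etc/sysctl.d/90-public-inbox.conf
```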

       Other OSes may have similar tuning knobs (patches appreciated).

   Scalability to many inboxes
       public-inbox-extindex(1) allows any number of public-inboxes to
       share the same Xapian indices.

       git 2.33+ startup time is orders-of-magnitude faster and uses less
       memory when dealing with thousands of alternates required for
       thousands of inboxes with public-inbox-extindex(1).

       Frequent packing (via git-gc(1)) both improves performance and
       reduces the need to increase "sys.vm.max_map_count".
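       A sketch of packing every epoch of a "-V2" inbox ("$INBOX_DIR" is a
       placeholder; "-V2" epochs live under "git/*.git"):

```shell
# pack each epoch repository of a -V2 inbox; $INBOX_DIR is an example
INBOX_DIR=/srv/public-inbox/meta
for epoch in "$INBOX_DIR"/git/*.git; do
        if [ -d "$epoch" ]; then
                git --git-dir="$epoch" gc --quiet
        fi
done
```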

CONTACT
       Feedback encouraged via plain-text mail to
       <mailto:meta@public-inbox.org>

       Information for *BSDs and non-traditional filesystems is especially
       welcome.

       Our archives are hosted at <https://public-inbox.org/meta/>,
       <http://4uok3hntl7oi7b4uf4rtfwefqeexfzil2w6kgk2jn5z2f764irre7byd.onion/meta/>,
       and other places.

COPYRIGHT
       Copyright all contributors <mailto:meta@public-inbox.org>

       License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>



public-inbox.git                  1993-10-02           PUBLIC-INBOX-TUNING(7)