PUBLIC-INBOX-V1-FORMAT(5)    public-inbox user manual    PUBLIC-INBOX-V1-FORMAT(5)

public-inbox-v1-format - git repository and tree description (aka
"ssoma")

WARNING: this does NOT describe the scalable v2 format used by
public-inbox. Use of ssoma is not recommended for new installations
due to scalability problems.

ssoma uses a git repository to store each email as a git blob. The
tree filename of the blob is based on the SHA-1 hexdigest of the first
Message-ID header. A commit is made for each message delivered. The
commit SHA-1 identifier is used by ssoma clients to track
synchronization state.

A Message-ID may be extremely long and may also contain slashes, so
using it as a path name is challenging. Instead we use the SHA-1
hexdigest of the Message-ID (excluding the leading "<" and trailing
">") to generate a path name. Leading and trailing white space in the
Message-ID header is ignored for hashing.

A message with Message-ID of: <20131106023245.GA20224@dcvr.yhbt.net>

Would be stored as: f2/8c6cfd2b0a65f994c3e1be266105413b3d3f63

Thus it is easy to look up the contents of a message matching a given
Message-ID.

public-inbox v1 repositories currently do not resolve conflicting
Message-IDs or messages with multiple Message-IDs.

The Message-ID header is required. "Bytes", "Lines" and
"Content-Length" headers are stripped and not allowed because they can
interfere with further processing. When using ssoma with
public-inbox-mda, the "Status" mbox header is also stripped, as that
header makes no sense in a public archive.

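As a rough illustration of these header rules, the following Python
sketch uses the standard email package. The function name
prepare_for_archive and its from_mda parameter are hypothetical and
exist only for this example; the actual tools are not written this
way.

```python
from email import message_from_bytes
from email.message import Message

# Headers that are always stripped according to the text above.
STRIPPED = ("Bytes", "Lines", "Content-Length")


def prepare_for_archive(raw: bytes, from_mda: bool = True) -> Message:
    """Apply the header rules to one raw message (illustrative sketch)."""
    msg = message_from_bytes(raw)
    # The Message-ID header is required.
    if not str(msg.get("Message-ID", "")).strip():
        raise ValueError("Message-ID header is required")
    # "Status" is only stripped when delivering via public-inbox-mda.
    names = STRIPPED + (("Status",) if from_mda else ())
    for name in names:
        # Deletes every occurrence; no error if the header is absent.
        del msg[name]
    return msg
```

Header names are matched case-insensitively by the email package, so
"content-length" is removed as well.
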
flock(2) locking exclusively locks the empty $GIT_DIR/ssoma.lock file
for all non-atomic operations.

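A minimal sketch of this locking scheme in Python, assuming a Unix
system where the fcntl module is available. The helper name ssoma_lock
is hypothetical; only the lock file name comes from this document.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def ssoma_lock(git_dir: str):
    """Hold an exclusive flock(2) on $GIT_DIR/ssoma.lock (sketch)."""
    path = os.path.join(git_dir, "ssoma.lock")
    # The lock file is created if missing and is never written to.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o666)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks until the lock is ours
        yield
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)
```

A caller would wrap every non-atomic update, for example
"with ssoma_lock(git_dir): ...", so that concurrent processes take
turns.
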
1. Message is delivered to a mail transport agent (MTA)

1a. (optional) reject/discard spam; this should run before ssoma-mda

1b. (optional) reject/strip unwanted attachments

ssoma-mda handles all steps once invoked.

2. Mail transport agent invokes ssoma-mda

3. reads message via stdin, extracting Message-ID

4. acquires exclusive flock lock on $GIT_DIR/ssoma.lock

5. creates or updates the blob of associated 2/38 SHA-1 path

6. updates the index and commits

7. releases $GIT_DIR/ssoma.lock

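Steps 3 through 7 above could look roughly like the following Python
sketch, which drives git plumbing commands through subprocess. It is
not how ssoma-mda is implemented. The branch name refs/heads/master,
the commit identity, the commit subject and the helper names git and
deliver are all assumptions made for illustration; the use of
$GIT_DIR/ssoma.index as the index file follows the description of that
file in this document.

```python
import fcntl
import hashlib
import os
import subprocess
from email import message_from_bytes

REF = "refs/heads/master"  # assumed branch name


def git(git_dir, *args, stdin=None, index=None, check=True):
    """Run one git command against a bare repository, return its stdout."""
    env = dict(os.environ, GIT_DIR=git_dir,
               GIT_AUTHOR_NAME="mda", GIT_AUTHOR_EMAIL="mda@example.invalid",
               GIT_COMMITTER_NAME="mda",
               GIT_COMMITTER_EMAIL="mda@example.invalid")  # made-up identity
    if index:
        env["GIT_INDEX_FILE"] = index
    proc = subprocess.run(("git",) + args, input=stdin, env=env,
                          stdout=subprocess.PIPE, check=check)
    return proc.stdout.decode().strip()


def deliver(git_dir, raw):
    """Store one raw message and commit it (steps 3 to 7, sketch)."""
    git_dir = os.path.abspath(git_dir)
    # Step 3: extract the Message-ID and derive the 2/38 path.
    mid = str(message_from_bytes(raw).get("Message-ID", "")).strip()
    if not mid:
        raise ValueError("Message-ID header is required")
    if mid.startswith("<"):
        mid = mid[1:]
    if mid.endswith(">"):
        mid = mid[:-1]
    digest = hashlib.sha1(mid.encode("utf-8")).hexdigest()
    path = digest[:2] + "/" + digest[2:]
    index = os.path.join(git_dir, "ssoma.index")
    lock_fd = os.open(os.path.join(git_dir, "ssoma.lock"),
                      os.O_RDWR | os.O_CREAT, 0o666)
    try:
        fcntl.flock(lock_fd, fcntl.LOCK_EX)  # step 4
        # Step 5: write the message as a blob.
        blob = git(git_dir, "hash-object", "-w", "--stdin", stdin=raw)
        # Step 6: update the index, write a tree, commit, move the ref.
        git(git_dir, "update-index", "--add", "--cacheinfo",
            "100644,%s,%s" % (blob, path), index=index)
        tree = git(git_dir, "write-tree", index=index)
        parent = git(git_dir, "rev-parse", "--verify", "--quiet", REF,
                     check=False)  # empty on the first delivery
        args = ["commit-tree", tree, "-m", "deliver " + path]
        if parent:
            args += ["-p", parent]
        commit = git(git_dir, *args)
        git(git_dir, "update-ref", REF, commit)
        return commit
    finally:
        fcntl.flock(lock_fd, fcntl.LOCK_UN)  # step 7
        os.close(lock_fd)
```

An MTA hook would pass the message read from standard input, for
example deliver(git_dir, sys.stdin.buffer.read()) after importing sys.
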
ssoma-mda can also be used as an inotify(7) trigger to monitor
maildirs, and the ability to monitor IMAP mailboxes using IDLE will be
available in the future.

ssoma uses bare git repositories on both servers and clients.

Using the git-init(1) command with --bare is the recommended method of
creating a git repository on a server:

    git init --bare /path/to/wherever/you/want.git

There are no standardized paths for servers; administrators make all
the choices regarding git repository locations.

Special files in $GIT_DIR on the server:

$GIT_DIR/ssoma.lock
    An empty file for flock(2) locking. This is necessary to ensure
    the index and commits are updated consistently and multiple
    processes running MDA do not step on each other.

$GIT_DIR/public-inbox/msgmap.sqlite3
    SQLite3 database maintaining a stable mapping of Message-IDs to
    NNTP article numbers. Used by public-inbox-nntpd(1) and created
    and updated by public-inbox-index(1).

    Users of the PublicInbox::WWW interface will find it useful for
    attempting recovery from copy-paste truncations of URLs containing
    long Message-IDs.

    Automatically updated by public-inbox-mda(1), public-inbox-learn(1)
    and public-inbox-watch(1).

    Losing or damaging this file will cause synchronization problems
    for NNTP clients. This file is expected to be stable and require
    no updates to its schema.

    Requires DBD::SQLite.

$GIT_DIR/public-inbox/xapian$N/
    Xapian database for search indices in the PSGI web UI.

    $N is the value of PublicInbox::Search::SCHEMA_VERSION, and
    installations may have parallel versions on disk during upgrades or
    to roll back upgrades.

    This is created and updated by public-inbox-index(1).

    Automatically updated by public-inbox-mda(1), public-inbox-learn(1)
    and public-inbox-watch(1).

    This directory can always be regenerated with
    public-inbox-index(1) if lost or damaged. There is no need to
    back it up unless the CPU/memory cost of regenerating it outweighs
    the storage/transfer cost.

    Since SCHEMA_VERSION 15 and the development of the v2 format, the
    "overview" DB also exists in the xapian directory for v1
    repositories. See "OVERVIEW DB" in public-inbox-v2-format(5).

    Our use of the "OVERVIEW DB" requires Xapian document IDs to remain
    stable. Using the public-inbox-compact(1) and public-inbox-xcpdb(1)
    wrappers is recommended over tools provided by Xapian.

    This directory is large, often two to three times the size of the
    objects stored in a packed git repository.

$GIT_DIR/ssoma.index
    This file is no longer used or created by public-inbox, but it is
    updated if it exists to remain compatible with ssoma installations.

    A git index file used for MDA updates. The normal git index (in
    $GIT_DIR/index) is not used at all as there is typically no working
    tree.

Each client $GIT_DIR may have multiple mbox/maildir/command targets.
It is possible for a client to extract the mail stored in the git
repository to multiple mboxes for compatibility with a variety of
different tools.

It is NOT recommended to check out the working directory of a git
repository, as there may be many files.

It is impossible to completely expunge messages, even spam, as git
retains full history. Projects may (with adequate notice) cycle to new
repositories/branches with history cleaned up via git-filter-repo(1) or
git-filter-branch(1). This is up to the administrators.

Copyright 2013-2021 all contributors <mailto:meta@public-inbox.org>

License: AGPL-3.0+ <http://www.gnu.org/licenses/agpl-3.0.txt>

gitrepository-layout(5), ssoma(1)

public-inbox.git    1993-10-02    PUBLIC-INBOX-V1-FORMAT(5)