PUBLIC-INBOX-V1-FORMAT(5)  public-inbox user manual  PUBLIC-INBOX-V1-FORMAT(5)

NAME

       public-inbox-v1-format - git repository and tree description (aka
       "ssoma")

DESCRIPTION

       WARNING: this does NOT describe the scalable v2 format used by
       public-inbox.  Use of ssoma is not recommended for new installations
       due to scalability problems.

       ssoma uses a git repository to store each email as a git blob.  The
       tree filename of the blob is based on the SHA-1 hexdigest of the first
       Message-ID header.  A commit is made for each message delivered.  The
       commit SHA-1 identifier is used by ssoma clients to track
       synchronization state.

PATHNAMES IN TREES

       A Message-ID may be extremely long and may also contain slashes, so
       using one as a path name is challenging.  Instead we use the SHA-1
       hexdigest of the Message-ID (excluding the leading "<" and trailing
       ">") to generate a path name.  Leading and trailing white space in the
       Message-ID header is ignored for hashing.

       A message with Message-ID of: <20131106023245.GA20224@dcvr.yhbt.net>

       Would be stored as: f2/8c6cfd2b0a65f994c3e1be266105413b3d3f63

       Thus it is easy to look up the contents of a message matching a given
       Message-ID.
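
       The path derivation described above can be sketched in Python.  This
       is a minimal illustration, not the ssoma implementation: the function
       name ssoma_path is hypothetical, and encoding the Message-ID as UTF-8
       before hashing is an assumption.

```python
import hashlib


def ssoma_path(message_id_header: str) -> str:
    """Return the 2/38 tree path for a raw Message-ID header value."""
    # Leading and trailing white space is ignored for hashing.
    mid = message_id_header.strip()
    # Exclude the leading "<" and trailing ">" if present.
    if mid.startswith("<"):
        mid = mid[1:]
    if mid.endswith(">"):
        mid = mid[:-1]
    # Assumption: the Message-ID is hashed as UTF-8 encoded bytes.
    digest = hashlib.sha1(mid.encode("utf-8")).hexdigest()
    # The first two hex digits name the directory, the other 38 the file.
    return f"{digest[:2]}/{digest[2:]}"


if __name__ == "__main__":
    print(ssoma_path(" <20131106023245.GA20224@dcvr.yhbt.net> "))
```

       If the assumptions hold, running the sketch on the Message-ID from the
       example should print the path shown above.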

MESSAGE-ID CONFLICTS

       public-inbox v1 repositories currently do not resolve conflicting
       Message-IDs or messages with multiple Message-IDs.

HEADERS

       The Message-ID header is required.  "Bytes", "Lines" and
       "Content-Length" headers are stripped and not allowed because they can
       interfere with further processing.  When using ssoma with
       public-inbox-mda, the "Status" mbox header is also stripped as that
       header makes no sense in a public archive.
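
       A rough sketch of these header rules, using Python's standard email
       package.  The names STRIPPED and prepare_message are hypothetical, and
       the real tools may treat edge cases differently.

```python
from email import message_from_bytes
from email.message import Message

# Headers the format says are stripped.  "Status" is included here on the
# assumption that the message is being handled as public-inbox-mda would.
STRIPPED = ("Bytes", "Lines", "Content-Length", "Status")


def prepare_message(raw: bytes) -> Message:
    """Parse a raw message, require Message-ID, and drop stripped headers."""
    msg = message_from_bytes(raw)
    if not str(msg.get("Message-ID", "")).strip():
        raise ValueError("Message-ID header is required")
    for name in STRIPPED:
        # Deleting removes every occurrence and is a no-op when absent.
        del msg[name]
    return msg
```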

LOCKING

       An exclusive flock(2) lock is taken on the empty $GIT_DIR/ssoma.lock
       file for all non-atomic operations.
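
       For illustration, an exclusive flock(2) lock on that file could be
       taken from Python as follows.  The ssoma_lock helper is hypothetical,
       and creating the lock file on demand is an assumption.  The fcntl
       module is available on Unix-like systems only.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def ssoma_lock(git_dir: str):
    """Hold an exclusive flock(2) lock on $GIT_DIR/ssoma.lock."""
    path = os.path.join(git_dir, "ssoma.lock")
    # Assumption: create the empty lock file if it does not exist yet.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o666)
    try:
        # Blocks until no other process holds the lock.
        fcntl.flock(fd, fcntl.LOCK_EX)
        yield
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)
```

       A caller would wrap each non-atomic update in "with
       ssoma_lock(git_dir):" so that concurrent writers are serialized.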

EXAMPLE INPUT FLOW (SERVER-SIDE MDA)

       1. Message is delivered to a mail transport agent (MTA)

       1a. (optional) reject/discard spam; this should run before ssoma-mda

       1b. (optional) reject/strip unwanted attachments

       ssoma-mda handles all steps once invoked.

       2. Mail transport agent invokes ssoma-mda

       3. reads message via stdin, extracting Message-ID

       4. acquires exclusive flock lock on $GIT_DIR/ssoma.lock

       5. creates or updates the blob of associated 2/38 SHA-1 path

       6. updates the index and commits

       7. releases $GIT_DIR/ssoma.lock

       ssoma-mda can also be used as an inotify(7) trigger to monitor
       maildirs, and the ability to monitor IMAP mailboxes using IDLE will be
       available in the future.
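
       Steps 3 through 7 of the flow above can be sketched as a single
       Python function.  This is not how ssoma-mda is implemented: the
       deliver function is hypothetical, the git work of steps 5 and 6 is
       delegated to a caller-supplied store callback, and hashing the
       Message-ID as UTF-8 is an assumption.

```python
import fcntl
import hashlib
import os
from email import message_from_bytes
from typing import Callable


def deliver(raw: bytes, git_dir: str,
            store: Callable[[str, bytes], None]) -> str:
    """Run steps 3-7 for one message and return its 2/38 tree path."""
    # Step 3: read the message and extract the Message-ID.
    mid = str(message_from_bytes(raw).get("Message-ID", "")).strip()
    if not mid:
        raise ValueError("Message-ID header is required")
    if mid.startswith("<"):
        mid = mid[1:]
    if mid.endswith(">"):
        mid = mid[:-1]
    digest = hashlib.sha1(mid.encode("utf-8")).hexdigest()
    path = f"{digest[:2]}/{digest[2:]}"

    # Step 4: acquire the exclusive lock on $GIT_DIR/ssoma.lock.
    lock = os.path.join(git_dir, "ssoma.lock")
    fd = os.open(lock, os.O_RDWR | os.O_CREAT, 0o666)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)
        # Steps 5 and 6: the hypothetical callback writes the blob at
        # `path`, updates the index, and commits.
        store(path, raw)
    finally:
        # Step 7: release the lock.
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)
    return path
```

       Passing the git work in as a callback keeps the sketch independent of
       any particular way of writing blobs and commits.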

GIT REPOSITORIES (SERVERS)

       ssoma uses bare git repositories on both servers and clients.

       Using the git-init(1) command with --bare is the recommended method of
       creating a git repository on a server:

               git init --bare /path/to/wherever/you/want.git

       There are no standardized paths for servers; administrators make all
       the choices regarding git repository locations.

       Special files in $GIT_DIR on the server:

       $GIT_DIR/ssoma.lock
           An empty file for flock(2) locking.  This is necessary to ensure
           the index and commits are updated consistently and multiple
           processes running MDA do not step on each other.

       $GIT_DIR/public-inbox/msgmap.sqlite3
           SQLite3 database maintaining a stable mapping of Message-IDs to
           NNTP article numbers.  Used by public-inbox-nntpd(1) and created
           and updated by public-inbox-index(1).

           Users of the PublicInbox::WWW interface will find it useful for
           attempting recovery from copy-paste truncations of URLs containing
           long Message-IDs.

           Automatically updated by public-inbox-mda(1), public-inbox-learn(1)
           and public-inbox-watch(1).

           Losing or damaging this file will cause synchronization problems
           for NNTP clients.  This file is expected to be stable and require
           no updates to its schema.

           Requires DBD::SQLite.

       $GIT_DIR/public-inbox/xapian$N/
           Xapian database for search indices in the PSGI web UI.

           $N is the value of PublicInbox::Search::SCHEMA_VERSION, and
           installations may have parallel versions on disk during upgrades or
           to roll back upgrades.

           This is created and updated by public-inbox-index(1).

           Automatically updated by public-inbox-mda(1), public-inbox-learn(1)
           and public-inbox-watch(1).

           This directory can always be regenerated with
           public-inbox-index(1) if it is lost or damaged.  There is no need
           to back it up unless the CPU/memory cost of regenerating it
           outweighs the storage/transfer cost.

           Since SCHEMA_VERSION 15 and the development of the v2 format, the
           "overview" DB also exists in the xapian directory for v1
           repositories.  See "OVERVIEW DB" in public-inbox-v2-format(5).

           Our use of the "OVERVIEW DB" requires Xapian document IDs to remain
           stable.  Using the public-inbox-compact(1) and
           public-inbox-xcpdb(1) wrappers is recommended over tools provided
           by Xapian.

           This directory is large, often two to three times the size of the
           objects stored in a packed git repository.

       $GIT_DIR/ssoma.index
           This file is no longer used or created by public-inbox, but it is
           updated if it exists to remain compatible with ssoma installations.

           A git index file used for MDA updates.  The normal git index (in
           $GIT_DIR/index) is not used at all as there is typically no working
           tree.

       Each client $GIT_DIR may have multiple mbox/maildir/command targets.
       It is possible for a client to extract the mail stored in the git
       repository to multiple mboxes for compatibility with a variety of
       different tools.
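
       To illustrate the kind of stable mapping msgmap.sqlite3 provides, the
       following sketch keeps Message-IDs and article numbers in SQLite.  The
       table layout and the names open_msgmap and article_number are
       hypothetical; this page does not document the actual schema, and the
       sketch should not be read as describing it.

```python
import sqlite3


def open_msgmap(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a toy Message-ID to article number database."""
    db = sqlite3.connect(path)
    # Hypothetical schema: AUTOINCREMENT ensures numbers are never reused.
    db.execute(
        "CREATE TABLE IF NOT EXISTS msgmap ("
        " num INTEGER PRIMARY KEY AUTOINCREMENT,"
        " mid TEXT UNIQUE NOT NULL)"
    )
    return db


def article_number(db: sqlite3.Connection, mid: str) -> int:
    """Return the article number for mid, assigning the next one if new."""
    with db:  # commit on success, roll back on error
        row = db.execute(
            "SELECT num FROM msgmap WHERE mid = ?", (mid,)
        ).fetchone()
        if row is not None:
            return row[0]
        cur = db.execute("INSERT INTO msgmap (mid) VALUES (?)", (mid,))
        return cur.lastrowid
```

       Looking up a known Message-ID returns the number it was first given,
       which is the property NNTP clients depend on.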

CAVEATS

       It is NOT recommended to check out the working directory of a git
       repository, as there may be many files.

       It is impossible to completely expunge messages, even spam, as git
       retains full history.  Projects may (with adequate notice) cycle to new
       repositories/branches with history cleaned up via git-filter-repo(1) or
       git-filter-branch(1).  This is up to the administrators.

COPYRIGHT

       Copyright 2013-2021 all contributors <mailto:meta@public-inbox.org>

       License: AGPL-3.0+ <http://www.gnu.org/licenses/agpl-3.0.txt>

SEE ALSO

       gitrepository-layout(5), ssoma(1)



public-inbox.git                  1993-10-02         PUBLIC-INBOX-V1-FORMAT(5)