libinn_dbz(3)

libinn_dbz(3)             InterNetNews Documentation             libinn_dbz(3)

NAME

       dbz - Database routines for InterNetNews

SYNOPSIS

           #include <inn/dbz.h>

           #define DBZMAXKEY              ...
           #define DBZ_INTERNAL_HASH_SIZE ...

           typedef enum
           {
               DBZSTORE_OK,
               DBZSTORE_EXISTS,
               DBZSTORE_ERROR
           } DBZSTORE_RESULT;

           typedef enum
           {
               INCORE_NO,
               INCORE_MEM,
               INCORE_MMAP
           } dbz_incore_val;

           typedef struct {
               bool writethrough;
               dbz_incore_val pag_incore;
               dbz_incore_val exists_incore;
               bool nonblock;
           } dbzoptions;

           typedef struct {
               char hash[DBZ_INTERNAL_HASH_SIZE];
           } __attribute__((__packed__)) erec;

           extern bool dbzinit(const char *name);
           extern bool dbzclose(void);

           extern bool dbzfresh(const char *name, off_t size);
           extern bool dbzagain(const char *name, const char *oldname);
           extern bool dbzexists(const HASH key);
           extern bool dbzfetch(const HASH key, off_t *value);
           extern DBZSTORE_RESULT dbzstore(const HASH key, off_t data);
           extern bool dbzsync(void);
           extern long dbzsize(off_t contents);
           extern void dbzsetoptions(const dbzoptions options);
           extern void dbzgetoptions(dbzoptions *options);

DESCRIPTION

       These functions provide an indexing system for rapid random access to a
       text file, hereafter named the base file.

       dbz stores offsets into the base file for rapid retrieval.  All
       retrievals are keyed on a hash value that is generated by the
       HashMessageID function in libinn(3).

       dbzinit opens a database, which is an index into the base file name.
       The database consists of the files name.dir, name.index, and name.hash,
       which must already exist.  (If the database is new, they should be
       zero-length files.)  Subsequent accesses go to that database until
       dbzclose is called to close it.  When the tagged hash format is used
       (that is, if --enable-tagged-hash was given at configure time), a
       name.pag file is used instead of the .index and .hash files.

       dbzfetch searches the database for the specified key.  If the key is
       found, the corresponding offset into the base file is assigned to
       value.

       dbzstore stores the key-data pair in the database.  It will return
       "DBZSTORE_EXISTS" for duplicates (already existing entries), and
       "DBZSTORE_OK" for success.  It will fail with "DBZSTORE_ERROR" if the
       database files are not writable or not opened, or if any other error
       occurs.
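
       The three dbzstore results can be told apart with a simple switch.
       The following is only a sketch: the enum is copied from the SYNOPSIS
       so that the fragment compiles without the INN headers, and
       store_result_text is a hypothetical helper name, not part of dbz.

```c
/* Copied from the SYNOPSIS so that this sketch compiles on its own; real
 * code would include <inn/dbz.h> instead of redeclaring the type. */
typedef enum
{
    DBZSTORE_OK,
    DBZSTORE_EXISTS,
    DBZSTORE_ERROR
} DBZSTORE_RESULT;

/* Hypothetical helper: map a dbzstore result to a log-friendly string. */
const char *
store_result_text(DBZSTORE_RESULT result)
{
    switch (result) {
    case DBZSTORE_OK:
        return "stored";
    case DBZSTORE_EXISTS:
        return "duplicate key, store refused";
    case DBZSTORE_ERROR:
        return "store failed";
    }
    return "unknown result";
}
```

       A caller with an open database might log
       store_result_text(dbzstore(key, offset)) after each store.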

       dbzexists checks whether the given hash exists in the database.  dbz
       is optimized for this operation, which may be significantly faster
       than dbzfetch.

       dbzfresh is a variant of dbzinit for creating a new database with more
       control over details.  The size parameter specifies the size of the
       first hash table within the database, in number of key-value pairs.
       Performance will be best if the number of key-value pairs stored in the
       database does not exceed about 2/3 of size.  (The dbzsize function,
       given the expected number of key-value pairs, will suggest a database
       size that meets these criteria.)  Assuming that an fseek offset is 4
       bytes, the .index file will be 4 * size bytes.  The .hash file will be
       "DBZ_INTERNAL_HASH_SIZE" * size bytes (the .dir file is tiny and
       roughly constant in size) until the number of key-value pairs exceeds
       about 80% of size.  (Nothing awful will happen if the database grows
       beyond 100% of size, but accesses will slow down quite a bit and the
       .index and .hash files will grow somewhat.)
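
       The arithmetic above can be worked through in a few lines of C.
       This is an illustration under stated assumptions, not the dbzsize
       algorithm: it assumes a 4-byte offset and 6 hash bytes per entry (the
       non-tagged default), and all three function names are hypothetical.

```c
#include <stdint.h>

/* Assumptions for illustration only: a 4-byte offset and the non-tagged
 * default of 6 hash bytes per entry. */
#define ASSUMED_OFFSET_BYTES 4
#define ASSUMED_HASH_BYTES   6

/* Expected size in bytes of the .index file for a table of `size` pairs. */
uint64_t
index_file_bytes(uint64_t size)
{
    return (uint64_t) ASSUMED_OFFSET_BYTES * size;
}

/* Expected size in bytes of the .hash file for a table of `size` pairs. */
uint64_t
hash_file_bytes(uint64_t size)
{
    return (uint64_t) ASSUMED_HASH_BYTES * size;
}

/* Largest number of stored pairs that stays within the 2/3 guideline. */
uint64_t
comfortable_entries(uint64_t size)
{
    return size * 2 / 3;
}
```

       For a size of 3,000,000, these give a 12,000,000-byte .index file, an
       18,000,000-byte .hash file, and a guideline of at most 2,000,000
       stored pairs.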

       dbz stores up to "DBZ_INTERNAL_HASH_SIZE" bytes (by default, 4 bytes if
       tagged hash format is used, 6 otherwise) of the Message-ID's hash in
       the .hash file to confirm a hit.  This eliminates the need to read the
       base file to handle collisions.
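
       Conceptually, confirming a hit is a comparison between the stored
       bytes and the hash of the key being looked up.  The sketch below
       assumes that the stored bytes are the leading bytes of the hash and
       that 6 of them are kept; neither detail is stated here, and the
       sketch_ names are hypothetical stand-ins rather than dbz internals.

```c
#include <stdbool.h>
#include <string.h>

#define SKETCH_HASH_SIZE 6 /* assumed: the non-tagged default */

/* Stand-in for the erec record shown in the SYNOPSIS. */
typedef struct {
    char hash[SKETCH_HASH_SIZE];
} sketch_erec;

/* Hypothetical check: do the stored bytes match the start of the key's
 * full hash?  A mismatch means the slot belongs to a different key. */
bool
sketch_confirm_hit(const sketch_erec *stored, const unsigned char *full_hash)
{
    return memcmp(stored->hash, full_hash, SKETCH_HASH_SIZE) == 0;
}
```

       Because the comparison uses only data held in the database, a
       mismatch can be rejected without seeking into the base file.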

       A size of 0 given to dbzfresh is synonymous with the local default; the
       normal default is suitable for tables of 5,000,000 key-value pairs.
       That default value is used by dbzinit.

       When databases are regenerated periodically, as is the case for the
       history file, it is simplest to pick the parameters for a new database
       based on the old one.  This also permits some memory of past sizes of
       the old database, so that a new database size can be chosen to cover
       expected fluctuations.  dbzagain is a variant of dbzinit for creating a
       new database as a new generation of an old database.  The database
       files for oldname must exist.  dbzagain is equivalent to calling
       dbzfresh with a size equal to the result of applying dbzsize to the
       largest number of entries in the oldname database and its previous 10
       generations.
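
       The selection step that dbzagain performs amounts to taking a maximum
       over the recorded entry counts.  The helper below is a hypothetical
       sketch of that step only; how dbz records the counts of previous
       generations is not described here, so they are passed in as a plain
       array.

```c
#include <stddef.h>

/* Hypothetical helper: return the largest entry count among the recorded
 * generations (the old database plus up to 10 previous generations). */
long
largest_recent_usage(const long *used, size_t count)
{
    long largest = 0;

    for (size_t i = 0; i < count; i++) {
        if (used[i] > largest)
            largest = used[i];
    }
    return largest;
}
```

       Passing the result through dbzsize and then to dbzfresh would mirror
       the equivalence described above.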

       When many accesses are being done by the same program, dbz is massively
       faster if its first hash table is in memory.  If the pag_incore flag is
       set to "INCORE_MEM", an attempt is made to read the table in when the
       database is opened, and dbzclose writes it out to disk again (if it was
       read successfully and has been modified).  dbzsetoptions can be used to
       set the pag_incore and exists_incore flags to different values, which
       should be "INCORE_NO" (read from disk), "INCORE_MEM" (read from memory)
       or "INCORE_MMAP" (read from a mmap'ed file) for the .hash and .index
       files separately; this does not affect the status of a database that
       has already been opened.  The default is "INCORE_NO" for the .index
       file and "INCORE_MMAP" for the .hash file.  The attempt to read the
       table in may fail due to memory shortage; in this case dbz fails with
       an error.  Stores to an in-memory database are not (in general) written
       out to the file until dbzclose or dbzsync, so if robustness in the
       presence of crashes or concurrent accesses is crucial, in-memory
       databases should probably be avoided, or the writethrough option should
       be set to true (which tells dbz to write systematically to the
       filesystem in addition to updating the in-memory database).
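
       A minimal sketch of the in-memory-with-writethrough combination
       follows.  The two types are copied from the SYNOPSIS so that the
       fragment compiles without the INN headers, and
       with_writethrough_incore is a hypothetical helper name.

```c
#include <stdbool.h>

/* Copied from the SYNOPSIS; real code would include <inn/dbz.h>. */
typedef enum
{
    INCORE_NO,
    INCORE_MEM,
    INCORE_MMAP
} dbz_incore_val;

typedef struct {
    bool writethrough;
    dbz_incore_val pag_incore;
    dbz_incore_val exists_incore;
    bool nonblock;
} dbzoptions;

/* Hypothetical helper: ask for an in-memory table, but also write every
 * store through to the filesystem.  Other fields are left untouched. */
dbzoptions
with_writethrough_incore(dbzoptions options)
{
    options.pag_incore = INCORE_MEM;
    options.writethrough = true;
    return options;
}
```

       In real code the current options would be read with dbzgetoptions,
       adjusted, and passed to dbzsetoptions before the database is opened,
       since a database that is already open is not affected.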

       If the nonblock option is true, then writes to the .hash and .index
       files will be done using non-blocking I/O.  This can be significantly
       faster if your platform supports non-blocking I/O with files.  It is
       only applicable if you're not mmap'ing the database.

       dbzsync causes all buffered data to be flushed out to the files.  It
       is typically used as a precaution against crashes or concurrent
       accesses when a dbz-using process will be running for a long time.  It
       is a somewhat expensive operation, especially for an in-memory
       database.

       Concurrent reading of databases is fairly safe, but there is no
       (inter)locking, so concurrent updating is not.

       An open database occupies three stdio streams and two file descriptors.
       Memory consumption is negligible except for in-memory databases (and
       stdio buffers).

DIAGNOSTICS

       Functions returning bool values return true for success, false for
       failure.

       dbzinit attempts to have errno set plausibly on return, but for the
       other functions this is not guaranteed.  An errno of "EDOM" from
       dbzinit indicates that the database did not appear to be in dbz format.
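
       A caller can single out the "EDOM" case when reporting a failed
       dbzinit.  The helper below is a hypothetical sketch: it takes the
       saved errno value as an argument, so it compiles without the INN
       headers.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical helper: turn the errno left by a failed dbzinit into text.
 * EDOM gets a dbz-specific message; anything else goes through strerror. */
const char *
dbzinit_error_text(int saved_errno)
{
    if (saved_errno == EDOM)
        return "database does not appear to be in dbz format";
    return strerror(saved_errno);
}
```

       The errno value should be saved immediately after dbzinit returns
       false, before any other library call can overwrite it.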

       If "DBZTEST" is defined at compile-time, then a main() function will be
       included.  This will run performance and integrity tests.

BUGS

       Unlike dbm, dbz will refuse to dbzstore with a key already in the
       database.  The user is responsible for avoiding this.

       The RFC5322 case mapper implements only a first approximation to the
       hideously-complex RFC5322 case rules.

       dbz no longer tries to be call-compatible with dbm in any way.

HISTORY

       The original dbz was written by Jon Zeeff
       <zeeff@b-tech.ann-arbor.mi.us>.  Later contributions by David Butler
       and Mark Moraes.  Extensive reworking, including this documentation, by
       Henry Spencer <henry@zoo.toronto.edu> as part of the C News project.
       MD5 code borrowed from RSA.  Extensive reworking to remove backwards
       compatibility and to add hashes into dbz files by Clayton O'Neill
       <coneill@oneill.net>.  Rewritten into POD by Julien Elie.
