DBZ(3)                    Library Functions Manual                    DBZ(3)

NAME
dbzinit, dbzfresh, dbzagain, dbzclose, dbzexists, dbzfetch, dbzstore,
dbzsync, dbzsize, dbzgetoptions, dbzsetoptions, dbzdebug - database
routines

SYNOPSIS
#include <dbz.h>

bool dbzinit(const char *base)

bool dbzclose(void)

bool dbzfresh(const char *base, long size)

bool dbzagain(const char *base, const char *oldbase)

bool dbzexists(const HASH key)

off_t dbzfetch(const HASH key)
bool dbzfetch(const HASH key, void *ivalue)

bool dbzstore(const HASH key, off_t offset)
bool dbzstore(const HASH key, void *ivalue)

bool dbzsync(void)

long dbzsize(long nentries)

void dbzgetoptions(dbzoptions *opt)

void dbzsetoptions(const dbzoptions opt)

DESCRIPTION
These functions provide an indexing system for rapid random access to a
text file (the base file).

Dbz stores offsets into the base text file for rapid retrieval. All
retrievals are keyed on a hash value that is generated by the
HashMessageID() function.

Dbzinit opens a database, an index into the base file base, consisting
of files base.dir, base.index, and base.hash, which must already
exist. (If the database is new, they should be zero-length files.)
Subsequent accesses go to that database until dbzclose is called to
close the database.

Dbzfetch searches the database for the specified key. If
--enable-tagged-hash was specified at configure time, it returns the
corresponding value, if any. Otherwise, it returns true on success and
sets the content of ivalue. Dbzstore stores the key-value pair in the
database if --enable-tagged-hash was specified at configure time;
otherwise, it stores the content of ivalue. Dbzstore will fail unless
the database files are writable. Dbzexists checks whether the given
hash exists in the database. Dbz is optimized for this operation, and
it may be significantly faster than dbzfetch().

Dbzfresh is a variant of dbzinit for creating a new database with more
control over details.

Dbzfresh's size parameter specifies the size of the first hash table
within the database, in key-value pairs. Performance will be best if
the number of key-value pairs stored in the database does not exceed
about 2/3 of size. (The dbzsize function, given the expected number of
key-value pairs, will suggest a database size that meets these
criteria.) Assuming that an fseek offset is 4 bytes, the .index file
will be 4 * size bytes and the .hash file will be
DBZ_INTERNAL_HASH_SIZE * size bytes until the number of key-value pairs
exceeds about 80% of size. (The .dir file is tiny and roughly constant
in size.) Nothing awful will happen if the database grows beyond 100%
of size, but accesses will slow down quite a bit and the .index and
.hash files will grow somewhat.

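The sizing arithmetic above can be sketched in a few lines of C. The
helper names below are hypothetical and are not part of dbz; in
particular, suggest_table_size only applies the 2/3 rule of thumb and
is not claimed to match what dbzsize actually computes. The hash_bytes
parameter stands in for DBZ_INTERNAL_HASH_SIZE, whose value is not
given here.

```c
#include <assert.h>

/* Hypothetical helper (not dbz): smallest table size such that
 * nentries does not exceed 2/3 of it, i.e. ceil(3 * nentries / 2).
 * May overflow for very large nentries where long is 32 bits. */
long suggest_table_size(long nentries)
{
    assert(nentries >= 0);
    return (3 * nentries + 1) / 2;
}

/* Estimated .index size in bytes, assuming 4-byte offsets as in the text. */
long index_file_bytes(long size)
{
    return 4 * size;
}

/* Estimated .hash size in bytes; hash_bytes is an assumed stand-in
 * for DBZ_INTERNAL_HASH_SIZE. */
long hash_file_bytes(long size, long hash_bytes)
{
    return hash_bytes * size;
}
```

For an expected 1,000,000 key-value pairs this rule suggests a size of
1,500,000, giving a .index file of about 6,000,000 bytes under the
4-byte-offset assumption.
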
Dbz stores up to DBZ_INTERNAL_HASH_SIZE bytes of the message-id's hash
in the .hash file to confirm a hit. This eliminates the need to read
the base file to handle collisions. This replaces the tagmask feature
in previous dbz releases.

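The idea of confirming a hit against stored hash bytes can be
illustrated as follows. This is only a sketch: confirm_hit is a
hypothetical name, and it assumes the stored bytes are the leading
bytes of the full hash, which the text above does not specify.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not dbz): return true if the first stored_len
 * bytes of full_hash match the bytes kept in the table slot.
 * Assumes the table keeps a leading prefix of the hash. */
bool confirm_hit(const unsigned char *full_hash,
                 const unsigned char *stored, size_t stored_len)
{
    return memcmp(full_hash, stored, stored_len) == 0;
}
```

Because the comparison uses only bytes already held in the table, a
mismatch can be rejected without reading the base file.
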
A size of ``0'' given to dbzfresh is synonymous with the local default;
the normal default is suitable for tables of 5,000,000 key-value pairs.
Calling dbzinit(name) with the empty name is equivalent to calling
dbzfresh(name, 0).

When databases are regenerated periodically, as in news, it is simplest
to pick the parameters for a new database based on the old one. This
also permits some memory of past sizes of the old database, so that a
new database size can be chosen to cover expected fluctuations.
Dbzagain is a variant of dbzinit for creating a new database as a new
generation of an old database. The database files for oldbase must
exist. Dbzagain is equivalent to calling dbzfresh with a size equal to
the result of applying dbzsize to the largest number of entries in the
oldbase database and its previous 10 generations.

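The "largest number of entries" step can be sketched as a simple
maximum over the recorded generations. The function name and the array
representation are assumptions made for illustration; how dbz records
past sizes is not described here.

```c
#include <stddef.h>

/* Hypothetical helper (not dbz): return the largest entry count among
 * n recorded generations, or 0 if none are recorded. With the old
 * database plus its previous 10 generations, n would be at most 11. */
long largest_generation(const long *counts, size_t n)
{
    long max = 0;
    for (size_t i = 0; i < n; i++) {
        if (counts[i] > max)
            max = counts[i];
    }
    return max;
}
```

Under this reading, the new database is sized for the peak of recent
history rather than for the most recent count, which is what lets it
absorb expected fluctuations.
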
When many accesses are being done by the same program, dbz is massively
faster if its first hash table is in memory. If the ``pag_incore''
flag is set to INCORE_MEM, an attempt is made to read the table in when
the database is opened, and dbzclose writes it out to disk again (if it
was read successfully and has been modified). Dbzsetoptions can be
used to set the pag_incore and exists_incore flags to new values, each
of which should be ``INCORE_NO'', ``INCORE_MEM'', or ``INCORE_MMAP'',
for the .hash and .index files separately; this does not affect the
status of a database that has already been opened. The default is
``INCORE_NO'' for the .index file and ``INCORE_MMAP'' for the .hash
file. The attempt to read the table in may fail due to memory
shortage; in this case dbz fails with an error. Stores to an in-memory
database are not (in general) written out to the file until dbzclose or
dbzsync, so if robustness in the presence of crashes or concurrent
accesses is crucial, in-memory databases should probably be avoided or
the writethrough option should be set to ``true''.

If the nonblock option is ``true'', then writes to the .hash and .index
files will be done using non-blocking I/O. This can be significantly
faster if your platform supports non-blocking I/O with files.

Dbzsync causes all buffers etc. to be flushed out to the files. It is
typically used as a precaution against crashes or concurrent accesses
when a dbz-using process will be running for a long time. It is a
somewhat expensive operation, especially for an in-memory database.

Concurrent reading of databases is fairly safe, but there is no
(inter)locking, so concurrent updating is not.

An open database occupies three stdio streams and two file descriptors.
Memory consumption is negligible (except for stdio buffers) except for
in-memory databases.

SEE ALSO
dbm(3), history(5), libinn(3)

DIAGNOSTICS
Functions returning bool values return ``true'' for success, ``false''
for failure. Functions returning off_t values return -1 for failure.
Dbzinit attempts to have errno set plausibly on return, but otherwise
this is not guaranteed. An errno of EDOM from dbzinit indicates that
the database did not appear to be in dbz format.

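A caller might turn the errno left by a failed dbzinit into a message
along these lines. This is a minimal sketch: describe_open_failure is a
hypothetical helper, not part of dbz, and it assumes the caller passes
in the errno value captured immediately after dbzinit returned false.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical helper (not dbz): map the errno saved after a failed
 * dbzinit to a human-readable message. EDOM is special-cased because
 * the text above gives it a dbz-specific meaning. */
const char *describe_open_failure(int err)
{
    if (err == EDOM)
        return "database does not appear to be in dbz format";
    return strerror(err);
}
```

Since errno is only set "plausibly", a message for any other value is
best treated as a hint rather than a precise diagnosis.
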
If DBZTEST is defined at compile-time then a main() function will be
included. This will do performance and integrity tests.

HISTORY
The original dbz was written by Jon Zeeff
(zeeff@b-tech.ann-arbor.mi.us). Later contributions by David Butler
and Mark Moraes. Extensive reworking, including this documentation, by
Henry Spencer (henry@zoo.toronto.edu) as part of the C News project.
MD5 code borrowed from RSA. Extensive reworking to remove backwards
compatibility and to add hashes into dbz files by Clayton O'Neill
(coneill@oneill.net).

BUGS
Unlike dbm, dbz will refuse to dbzstore with a key already in the
database. The user is responsible for avoiding this.

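The refuse-on-duplicate behaviour can be illustrated with a toy
in-memory index. Everything below is a hypothetical stand-in written
for this example (names, key width, capacity, and the linear scan); it
does not use or reproduce the dbz API or its on-disk format.

```c
#include <stdbool.h>
#include <string.h>

#define TOY_KEY_LEN  16   /* assumed key width, for illustration only */
#define TOY_CAPACITY 64   /* assumed fixed capacity */

/* Hypothetical in-memory stand-in for an index; not the dbz API. */
struct toy_index {
    unsigned char keys[TOY_CAPACITY][TOY_KEY_LEN];
    long offsets[TOY_CAPACITY];
    int used;
};

/* Return true if key is already present (linear scan). */
bool toy_exists(const struct toy_index *ix, const unsigned char *key)
{
    for (int i = 0; i < ix->used; i++) {
        if (memcmp(ix->keys[i], key, TOY_KEY_LEN) == 0)
            return true;
    }
    return false;
}

/* Store key -> offset, refusing a key that is already present
 * (or any key once the toy table is full). */
bool toy_store(struct toy_index *ix, const unsigned char *key, long offset)
{
    if (ix->used >= TOY_CAPACITY || toy_exists(ix, key))
        return false;
    memcpy(ix->keys[ix->used], key, TOY_KEY_LEN);
    ix->offsets[ix->used] = offset;
    ix->used++;
    return true;
}
```

A caller that cannot rule out duplicates by construction would check
for existence before storing and treat a false return from the store
as "already present or failed", not as a successful replacement.
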
The RFC822 case mapper implements only a first approximation to the
hideously complex RFC822 case rules.

Dbz no longer tries to be call-compatible with dbm in any way.

6 Sep 1997                                                            DBZ(3)