dbz(3) - f7

DBZ(3)                     Library Functions Manual                     DBZ(3)

NAME

       dbzinit, dbzfresh, dbzagain, dbzclose, dbzexists, dbzfetch, dbzstore,
       dbzsync, dbzsize, dbzgetoptions, dbzsetoptions, dbzdebug - database
       routines

SYNOPSIS

       #include <dbz.h>

       bool dbzinit(const char *base)

       bool dbzclose(void)

       bool dbzfresh(const char *base, long size)

       bool dbzagain(const char *base, const char *oldbase)

       bool dbzexists(const HASH key)

       off_t dbzfetch(const HASH key)
       bool dbzfetch(const HASH key, void *ivalue)

       bool dbzstore(const HASH key, off_t offset)
       bool dbzstore(const HASH key, void *ivalue)

       bool dbzsync(void)

       long dbzsize(long nentries)

       void dbzgetoptions(dbzoptions *opt)

       void dbzsetoptions(const dbzoptions opt)

DESCRIPTION

       These functions provide an indexing system for rapid random access to a
       text file (the base file).

       Dbz stores offsets into the base text file for rapid retrieval.  All
       retrievals are keyed on a hash value that is generated by the
       HashMessageID() function.

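       The idea of storing offsets can be illustrated without dbz itself: the
       C sketch below appends lines to a base file, records the offset at
       which each one starts (the kind of value a dbz index holds), and later
       seeks straight back to one of them.  append_line and read_line_at are
       hypothetical helpers, and the sketch uses standard C ftell/fseek with
       long offsets rather than any dbz call.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: append LINE plus a newline to the base file and
   return the offset at which it starts, or -1 on error. */
static long append_line(FILE *base, const char *line)
{
    if (fseek(base, 0L, SEEK_END) != 0)
        return -1;
    long off = ftell(base);
    if (off < 0 || fputs(line, base) == EOF || fputc('\n', base) == EOF)
        return -1;
    return off;
}

/* Hypothetical helper: seek directly to a recorded offset and read the
   line stored there, without scanning the file from the start.
   Returns 1 on success, 0 on failure. */
static int read_line_at(FILE *base, long off, char *buf, size_t len)
{
    assert(len > 0);
    if (fseek(base, off, SEEK_SET) != 0 || fgets(buf, (int) len, base) == NULL)
        return 0;
    buf[strcspn(buf, "\n")] = '\0';   /* drop the trailing newline */
    return 1;
}
```

       dbz itself works with off_t offsets; on POSIX systems fseeko and
       ftello are the off_t counterparts of fseek and ftell.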
       Dbzinit opens a database, an index into the base file base, consisting
       of the files base.dir, base.index, and base.hash, which must already
       exist.  (If the database is new, they should be zero-length files.)
       Subsequent accesses go to that database until dbzclose is called to
       close the database.

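       For a new database, the three files can be created empty before
       dbzinit is called.  The sketch below does this for a given base name.
       create_empty_dbz_files is a hypothetical helper, not part of the dbz
       API; because fopen mode "w" truncates, it must never be pointed at an
       existing database.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: create zero-length BASE.dir, BASE.index and
   BASE.hash so that a new, empty database can then be opened with
   dbzinit(BASE).  Existing files of those names are truncated. */
static bool create_empty_dbz_files(const char *base)
{
    static const char *const suffixes[] = { ".dir", ".index", ".hash" };
    char path[4096];

    assert(base != NULL);
    for (size_t i = 0; i < sizeof suffixes / sizeof suffixes[0]; i++) {
        int n = snprintf(path, sizeof path, "%s%s", base, suffixes[i]);
        if (n < 0 || (size_t) n >= sizeof path)
            return false;               /* name too long */
        FILE *f = fopen(path, "w");     /* create, or truncate to zero length */
        if (f == NULL || fclose(f) != 0)
            return false;
    }
    return true;
}
```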
       Dbzfetch searches the database for the specified key.  If
       --enable-tagged-hash was specified at configure time, it returns the
       corresponding value (an offset into the base file), or -1 on failure.
       If --enable-tagged-hash was not specified, it returns true and sets the
       contents of ivalue.  Dbzstore stores the key-value pair in the database
       if --enable-tagged-hash was specified at configure time; otherwise, it
       stores the contents of ivalue.  Dbzstore will fail unless the database
       files are writable.  Dbzexists checks whether the given hash exists in
       the database.  Dbz is optimized for this operation, and it may be
       significantly faster than dbzfetch().

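       The return conventions of the offset (tagged-hash) forms can be
       modelled with a small in-memory table.  In the sketch below, toy_store,
       toy_fetch, and toy_exists are hypothetical stand-ins for dbzstore,
       dbzfetch, and dbzexists: keys are assumed to be 16-byte digests, a miss
       in toy_fetch yields -1, and a second store of the same key is refused,
       as described under BUGS.  It models behaviour only, not dbz's file
       layout or its faster exists path.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins: a 16-byte key (like an MD5 digest) and a tiny
   fixed-size open-addressing table with linear probing. */
#define KEY_LEN    16
#define TABLE_SIZE 8

typedef struct { unsigned char b[KEY_LEN]; } toy_key;

typedef struct {
    bool    used[TABLE_SIZE];
    toy_key key[TABLE_SIZE];
    long    offset[TABLE_SIZE];   /* offset into the base file */
} toy_db;

/* Return the slot holding KEY (found = true) or the first empty slot
   (found = false); -1 if the table is full and KEY is absent. */
static int probe(const toy_db *db, const toy_key *key, bool *found)
{
    assert(db != NULL && key != NULL);
    unsigned start = key->b[0] % TABLE_SIZE;   /* digest byte picks the start slot */
    for (unsigned i = 0; i < TABLE_SIZE; i++) {
        unsigned s = (start + i) % TABLE_SIZE;
        if (!db->used[s]) { *found = false; return (int) s; }
        if (memcmp(db->key[s].b, key->b, KEY_LEN) == 0) { *found = true; return (int) s; }
    }
    *found = false;
    return -1;
}

static bool toy_exists(const toy_db *db, const toy_key *key)
{
    bool found;
    probe(db, key, &found);
    return found;
}

/* Offset form of a fetch: the stored offset, or -1 on a miss. */
static long toy_fetch(const toy_db *db, const toy_key *key)
{
    bool found;
    int s = probe(db, key, &found);
    return found ? db->offset[s] : -1;
}

/* Refuses a key that is already present, like dbzstore. */
static bool toy_store(toy_db *db, const toy_key *key, long offset)
{
    bool found;
    int s = probe(db, key, &found);
    if (found || s < 0)
        return false;
    db->used[s] = true;
    db->key[s] = *key;
    db->offset[s] = offset;
    return true;
}
```

       A caller that may see the same key twice would therefore check
       existence before storing, since the store itself reports a duplicate
       only as a failure.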
       Dbzfresh is a variant of dbzinit for creating a new database with more
       control over details.

       Dbzfresh's size parameter specifies the size of the first hash table
       within the database, in key-value pairs.  Performance will be best if
       the number of key-value pairs stored in the database does not exceed
       about 2/3 of size.  (The dbzsize function, given the expected number of
       key-value pairs, will suggest a database size that meets these
       criteria.)  Assuming that an fseek offset is 4 bytes, the .index file
       will be 4 * size bytes.  The .hash file will be DBZ_INTERNAL_HASH_SIZE *
       size bytes (the .dir file is tiny and roughly constant in size) until
       the number of key-value pairs exceeds about 80% of size.  (Nothing
       awful will happen if the database grows beyond 100% of size, but
       accesses will slow down quite a bit and the .index and .hash files will
       grow somewhat.)

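       The sizing rule above reduces to simple arithmetic: keeping n key-value
       pairs at or below 2/3 of size requires size >= 3n/2.  The sketch
       assumes 4-byte offsets, as the text does, and a stored hash prefix of 6
       bytes as a stand-in for DBZ_INTERNAL_HASH_SIZE, whose real value is
       fixed when dbz is built.  size_for_two_thirds is a hypothetical helper,
       not the dbzsize algorithm, which may round its result differently.

```c
#include <assert.h>

#define OFFSET_BYTES        4L   /* assumption taken from the text above */
#define ASSUMED_HASH_PREFIX 6L   /* hypothetical DBZ_INTERNAL_HASH_SIZE */

/* Smallest size that keeps NENTRIES at or below 2/3 occupancy,
   i.e. ceil(3 * nentries / 2).  Not the real dbzsize(). */
static long size_for_two_thirds(long nentries)
{
    assert(nentries >= 0);
    return (3 * nentries + 1) / 2;
}

/* File sizes implied by a table of SIZE slots, before it overflows. */
static long index_file_bytes(long size) { return OFFSET_BYTES * size; }
static long hash_file_bytes(long size)  { return ASSUMED_HASH_PREFIX * size; }
```

       For 5,000,000 pairs this gives a size of 7,500,000, hence a .index file
       of about 30 MB and, under the 6-byte assumption, a .hash file of about
       45 MB.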
       Dbz stores up to DBZ_INTERNAL_HASH_SIZE bytes of the message-id's hash
       in the .hash file to confirm a hit.  This eliminates the need to read
       the base file to handle collisions.  This replaces the tagmask feature
       in previous dbz releases.

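       Confirming a hit from the .hash file amounts to comparing the stored
       prefix of the hash with the same leading bytes of the hash being looked
       up.  In this sketch the full hash is assumed to be 16 bytes (as an MD5
       digest would be), and PREFIX_LEN is a hypothetical stand-in for
       DBZ_INTERNAL_HASH_SIZE.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define DIGEST_LEN 16   /* assumed length of the full message-id hash */
#define PREFIX_LEN  6   /* hypothetical stand-in for DBZ_INTERNAL_HASH_SIZE */

/* A slot is a hit if its stored prefix matches the leading bytes of the
   wanted hash; the base file is never read to settle a collision. */
static bool prefix_confirms(const unsigned char stored[PREFIX_LEN],
                            const unsigned char wanted[DIGEST_LEN])
{
    assert(stored != NULL && wanted != NULL);
    return memcmp(stored, wanted, PREFIX_LEN) == 0;
}
```

       Because only a prefix is kept, two different message-ids whose hashes
       share the first PREFIX_LEN bytes would be reported as a match; with
       well-distributed hashes the chance for a given pair is
       2^-(8 * PREFIX_LEN).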
       A size of ``0'' given to dbzfresh is synonymous with the local default;
       the normal default is suitable for tables of 5,000,000 key-value pairs.
       Calling dbzinit(name) with the empty name is equivalent to calling
       dbzfresh(name, 0).

       When databases are regenerated periodically, as in news, it is simplest
       to pick the parameters for a new database based on the old one.  This
       also permits some memory of past sizes of the old database, so that a
       new database size can be chosen to cover expected fluctuations.
       Dbzagain is a variant of dbzinit for creating a new database as a new
       generation of an old database.  The database files for oldbase must
       exist.  Dbzagain is equivalent to calling dbzfresh with a size equal to
       the result of applying dbzsize to the largest number of entries in the
       oldbase database and its previous 10 generations.

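       The size chosen by dbzagain can be approximated as follows: take the
       largest entry count among the old database and up to 10 earlier
       generations, then size for 2/3 occupancy.  size_from_history is a
       hypothetical helper, the ceil(3n/2) rule stands in for dbzsize (which
       may round differently), and how dbz records past sizes is not shown.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_GENERATIONS 11   /* the old database plus 10 earlier ones */

/* Hypothetical: counts[0] is the old database's entry count, the rest
   are earlier generations.  ceil(3 * peak / 2) stands in for dbzsize(). */
static long size_from_history(const long *counts, size_t ngen)
{
    assert(ngen <= MAX_GENERATIONS);
    long peak = 0;
    for (size_t i = 0; i < ngen; i++)
        if (counts[i] > peak)
            peak = counts[i];
    return (3 * peak + 1) / 2;
}
```

       Sizing from the peak rather than the latest count is what lets a new
       generation absorb the fluctuations mentioned above.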
       When many accesses are being done by the same program, dbz is massively
       faster if its first hash table is in memory.  If the ``pag_incore''
       flag is set to INCORE_MEM, an attempt is made to read the table in when
       the database is opened, and dbzclose writes it out to disk again (if it
       was read successfully and has been modified).  Dbzsetoptions can be
       used to set the pag_incore and exists_incore flags to new values, each
       of which should be ``INCORE_NO'', ``INCORE_MEM'', or ``INCORE_MMAP'',
       for the .hash and .index files separately; this does not affect the
       status of a database that has already been opened.  The default is
       ``INCORE_NO'' for the .index file and ``INCORE_MMAP'' for the .hash
       file.  The attempt to read the table in may fail due to memory
       shortage; in this case dbz fails with an error.  Stores to an in-memory
       database are not (in general) written out to the file until dbzclose or
       dbzsync, so if robustness in the presence of crashes or concurrent
       accesses is crucial, in-memory databases should probably be avoided or
       the writethrough option should be set to ``true''.

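       The difference the writethrough option makes can be shown with a toy
       model of an in-memory table backed by an on-disk copy.  The type and
       function names below are hypothetical; only the idea of deferring
       writes until a sync, unless writethrough is set, is taken from the text
       above.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define SLOTS 4   /* hypothetical table size */

typedef struct {
    long memory[SLOTS];   /* in-core copy that lookups would use */
    long disk[SLOTS];     /* what would survive a crash */
    bool writethrough;    /* mirror every store to disk immediately */
    bool dirty;           /* memory holds stores not yet on disk */
} toy_incore;

static void toy_put(toy_incore *t, int slot, long value)
{
    assert(slot >= 0 && slot < SLOTS);
    t->memory[slot] = value;
    if (t->writethrough)
        t->disk[slot] = value;   /* the store reaches the file at once */
    else
        t->dirty = true;         /* deferred until the next sync */
}

/* Stands in for dbzsync() or dbzclose() writing the table out. */
static void toy_sync(toy_incore *t)
{
    if (t->dirty) {
        memcpy(t->disk, t->memory, sizeof t->memory);
        t->dirty = false;
    }
}
```

       Without writethrough, a crash between toy_put and toy_sync loses the
       store; with it, each store costs an extra write but the disk copy is
       never behind.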
       If the nonblock option is ``true'', then writes to the .hash and .index
       files will be done using non-blocking I/O.  This can be significantly
       faster if your platform supports non-blocking I/O with files.

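       On POSIX systems a descriptor is usually switched to non-blocking mode
       with fcntl, as in the generic sketch below.  set_nonblocking is a
       hypothetical helper, and this is not necessarily how dbz implements the
       nonblock option; on many systems O_NONBLOCK has little or no effect on
       regular files, which is why the benefit depends on the platform.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>

/* Hypothetical helper: add O_NONBLOCK to an open descriptor's status
   flags, keeping the flags it already has. */
static bool set_nonblocking(int fd)
{
    assert(fd >= 0);
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return false;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) != -1;
}
```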
       Dbzsync causes all buffered data to be flushed out to the files.  It is
       typically used as a precaution against crashes or concurrent accesses
       when a dbz-using process will be running for a long time.  It is a
       somewhat expensive operation, especially for an in-memory database.

       Concurrent reading of databases is fairly safe, but there is no
       (inter)locking, so concurrent updating is not.

       An open database occupies three stdio streams and two file descriptors.
       Apart from stdio buffers, memory consumption is negligible unless the
       database is held in memory.

DIAGNOSTICS

       Functions returning bool values return ``true'' for success and
       ``false'' for failure.  Functions returning off_t values return -1 for
       failure.  Dbzinit attempts to have errno set plausibly on return, but
       otherwise this is not guaranteed.  An errno of EDOM from dbzinit
       indicates that the database did not appear to be in dbz format.

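       A caller might report a failed open along the lines below.
       describe_open_failure is a hypothetical helper: it treats EDOM as the
       not-in-dbz-format case described above, falls back to strerror for
       other errno values, and, since errno is only set plausibly, also
       handles a zero value.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical helper for reporting a failed dbzinit(), for example:
 *     if (!dbzinit("history"))
 *         fprintf(stderr, "dbzinit: %s\n", describe_open_failure(errno));
 * errno should be read immediately after the failing call. */
static const char *describe_open_failure(int err)
{
    if (err == EDOM)
        return "database is not in dbz format";
    if (err == 0)
        return "unknown error (errno not set)";
    return strerror(err);
}
```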
       If DBZTEST is defined at compile time, a main() function will be
       included.  This will run performance and integrity tests.

HISTORY

       The original dbz was written by Jon Zeeff
       (zeeff@b-tech.ann-arbor.mi.us).  Later contributions by David Butler
       and Mark Moraes.  Extensive reworking, including this documentation, by
       Henry Spencer (henry@zoo.toronto.edu) as part of the C News project.
       MD5 code borrowed from RSA.  Extensive reworking to remove backwards
       compatibility and to add hashes into dbz files by Clayton O'Neill
       (coneill@oneill.net).

BUGS

       Unlike dbm, dbz will refuse to dbzstore with a key already in the
       database.  The user is responsible for avoiding this.

       The RFC822 case mapper implements only a first approximation to the
       hideously-complex RFC822 case rules.

       Dbz no longer tries to be call-compatible with dbm in any way.

                                  6 Sep 1997                            DBZ(3)
