1QDBM(3) Quick Database Manager QDBM(3)
2
3
4
6 QDBM - quick database manager
7
8
10 QDBM is a library of routines for managing a database. The database is
11 a simple data file containing records, each is a pair of a key and a
12 value. Every key and value is serial bytes with variable length. Both
13 binary data and character string can be used as a key and a value.
14 There is neither concept of data tables nor data types. Records are
15 organized in hash table or B+ tree.
16
17 As for database of hash table, each key must be unique within a data‐
18 base, so it is impossible to store two or more records with a key over‐
19 laps. The following access methods are provided to the database: stor‐
20 ing a record with a key and a value, deleting a record by a key,
21 retrieving a record by a key. Moreover, traversal access to every key
22 are provided, although the order is arbitrary. These access methods
23 are similar to ones of DBM (or its followers: NDBM and GDBM) library
24 defined in the UNIX standard. QDBM is an alternative for DBM because
25 of its higher performance.
26
27 As for database of B+ tree, records whose keys are duplicated can be
28 stored. Access methods of storing, deleting, and retrieving are pro‐
29 vided as with the database of hash table. Records are stored in order
30 by a comparing function assigned by a user. It is possible to access
31 each record with the cursor in ascending or descending order. Accord‐
32 ing to this mechanism, forward matching search for strings and range
33 search for integers are realized. Moreover, transaction is available
34 in database of B+ tree.
35
36
38 QDBM is developed referring to GDBM for the purpose of the following
39 three points: higher processing speed, smaller size of a database file,
40 and simpler API. They have been achieved. Moreover, the following
41 three restrictions of traditional DBM: a process can handle only one
42 database, the size of a key and a value is bounded, a database file is
43 sparse, are cleared.
44
45 QDBM uses hash algorithm to retrieve records. If a bucket array has
46 sufficient number of elements, the time complexity of retrieval is
47 `O(1)'. That is, time required for retrieving a record is constant,
48 regardless of the scale of a database. It is also the same about stor‐
49 ing and deleting. Collision of hash values is managed by separate
50 chaining. Data structure of the chains is binary search tree. Even if
51 a bucket array has unusually scarce elements, the time complexity of
52 retrieval is `O(log n)'.
53
54 QDBM attains improvement in retrieval by loading RAM with the whole of
55 a bucket array. If a bucket array is on RAM, it is possible to access
56 a region of a target record by about one path of file operations. A
57 bucket array saved in a file is not read into RAM with the `read' call
58 but directly mapped to RAM with the `mmap' call. Therefore, prepara‐
59 tion time on connecting to a database is very short, and two or more
60 processes can share the same memory map.
61
62 If the number of elements of a bucket array is about half of records
63 stored within a database, although it depends on characteristic of the
64 input, the probability of collision of hash values is about 56.7%
65 (36.8% if the same, 21.3% if twice, 11.5% if four times, 6.0% if eight
66 times). In such case, it is possible to retrieve a record by two or
67 less paths of file operations. If it is made into a performance index,
68 in order to handle a database containing one million of records, a
69 bucket array with half a million of elements is needed. The size of
70 each element is 4 bytes. That is, if 2M bytes of RAM is available, a
71 database containing one million records can be handled.
72
73 QDBM provides two modes to connect to a database: `reader' and
74 `writer'. A reader can perform retrieving but neither storing nor
75 deleting. A writer can perform all access methods. Exclusion control
76 between processes is performed when connecting to a database by file
77 locking. While a writer is connected to a database, neither readers
78 nor writers can be connected. While a reader is connected to a data‐
79 base, other readers can be connect, but writers can not. According to
80 this mechanism, data consistency is guaranteed with simultaneous con‐
81 nections in multitasking environment.
82
83 Traditional DBM provides two modes of the storing operations: `insert'
84 and `replace'. In the case a key overlaps an existing record, the
85 insert mode keeps the existing value, while the replace mode transposes
86 it to the specified value. In addition to the two modes, QDBM provides
87 `concatenate' mode. In the mode, the specified value is concatenated
88 at the end of the existing value and stored. This feature is useful
89 when adding a element to a value as an array. Moreover, although DBM
90 has a method to fetch out a value from a database only by reading the
91 whole of a region of a record, QDBM has a method to fetch out a part of
92 a region of a value. When a value is treated as an array, this feature
93 is also useful.
94
95 Generally speaking, while succession of updating, fragmentation of
96 available regions occurs, and the size of a database grows rapidly.
97 QDBM deal with this problem by coalescence of dispensable regions and
98 reuse of them, and featuring of optimization of a database. When over‐
99 writing a record with a value whose size is greater than the existing
100 one, it is necessary to remove the region to another position of the
101 file. Because the time complexity of the operation depends on the size
102 of the region of a record, extending values successively is ineffi‐
103 cient. However, QDBM deal with this problem by alignment. If incre‐
104 ment can be put in padding, it is not necessary to remove the region.
105
106 As for many file systems, it is impossible to handle a file whose size
107 is more than 2GB. To deal with this problem, QDBM provides a directory
108 database containing multiple database files. Due to this feature, it
109 is possible to handle a database whose total size is up to 1TB in the‐
110 ory. Moreover, because database files can be deployed on multiple
111 disks, the speed of updating operations can be improved as with RAID-0
112 (striping). It is also possible for the database files to deploy on
113 multiple file servers using NFS and so on.
114
115
117 Although B+ tree database is slower than hash database, it features
118 ordering access to each record. The order can be assigned by users.
119 Records of B+ tree are sorted and arranged in logical pages. Sparse
120 index organized in B tree that is multiway balanced tree are maintained
121 for each page. Thus, the time complexity of retrieval and so on is
122 `O(log n)'. Cursor is provided to access each record in order. The
123 cursor can jump to a position specified by a key and can step forward
124 or backward from the current position. Because each page is arranged
125 as double linked list, the time complexity of stepping cursor is
126 `O(1)'.
127
128 B+ tree database is implemented, based on above hash database. Because
129 each page of B+ tree is stored as each record of hash database, B+ tree
130 database inherits efficiency of storage management of hash database.
131 Because the header of each record is smaller and alignment of each page
132 is calculated statistically, in most cases, the size of database file
133 is cut by half compared to one of hash database. Although operation of
134 many pages are required to update B+ tree, QDBM expedites the process
135 by caching pages and reducing file operations. In most cases, because
136 whole of the sparse index is cached on memory, it is possible to
137 retrieve a record by one or less path of file operations.
138
139 B+ tree database features transaction mechanism. It is possible to
140 commit a series of operations between the beginning and the end of the
141 transaction in a lump, or to abort the transaction and perform rollback
142 to the state before the transaction. Even if the process of an appli‐
143 cation is crushed while the transaction, the database file is not bro‐
144 ken.
145
146 In case that QDBM is built with ZLIB, LZO, or BZIP2 enabled, a lossless
147 data-compression library, the content of each page of B+ tree is com‐
148 pressed and stored in a file. Because each record in a page has simi‐
149 lar patterns, high efficiency of compression is expected due to the
150 Lempel-Ziv algorithm and the like. In case handling text data, the
151 size of a database is reduced to about 25%. If the scale of a database
152 is large and disk I/O is the bottleneck, featuring compression makes
153 the processing speed improved to a large extent.
154
155
157 QDBM provides very simple APIs. You can perform database I/O as usual
158 file I/O with `FILE' pointer defined in ANSI C. In the basic API of
159 QDBM, entity of a database is recorded as one file. In the extended
160 API, entity of a database is recorded as several files in one direc‐
161 tory. Because the two APIs are very similar with each other, porting
162 an application from one to the other is easy.
163
164 APIs which are compatible with NDBM and GDBM are also provided. As
165 there are a lot of applications using NDBM or GDBM, it is easy to port
166 them onto QDBM. In most cases, it is completed only by replacement of
167 header including (#include) and re-compiling. However, QDBM can not
168 handle database files made by the original NDBM or GDBM.
169
170 In order to handle records on memory easily, the utility API is pro‐
171 vided. It implements memory allocating functions, sorting functions,
172 extensible datum, array list, hash map, and so on. Using them, you can
173 handle records in C language cheaply as in such script languages as
174 Perl or Ruby.
175
176 B+ tree database is used with the advanced API. The advanced API is
177 implemented using the basic API and the utility API. Because the
178 advanced API is also similar to the basic API and the extended API, it
179 is easy to learn how to use it.
180
181 In order to handle an inverted index which is used by full-text search
182 systems, the inverted API is provided. If it is easy to handle an
183 inverted index of documents, an application can focus on text process‐
184 ing and natural language processing. Because this API does not depend
185 on character codes nor languages, it is possible to implement a
186 full-text search system which can respond to various requests from
187 users.
188
189 Along with APIs for C, QDBM provides APIs for C++, Java, Perl, and
190 Ruby. APIs for C are composed of seven kinds: the basic API, the
191 extended API, the NDBM-compatible API, the GDBM-compatible API, the
192 utility API, the advanced API, and the inverted API. Command line
193 interfaces corresponding to each API are also provided. They are use‐
194 ful for prototyping, testing, debugging, and so on. The C++ API encap‐
195 sulates database handling functions of the basic API, the extended API,
196 and the advanced API with class mechanism of C++. The Java API has
197 native methods calling the basic API, the extended API, and the
198 advanced API with Java Native Interface. The Perl API has methods
199 calling the basic API, the extended API, and the advanced API with XS
200 language. The Ruby API has method calling the basic API, the extended
201 API, and the advanced API as modules of Ruby. Moreover, CGI scripts
202 for administration of databases and full-text search are provided.
203
204
206 QDBM is implemented being based on syntax of ANSI C (C89) and using
207 only APIs defined in ANSI C or POSIX. Thus, QDBM works on most UNIX
208 and its compatible OSs. As for C API, checking operations have been
209 done at least on Linux 2.2, Linux 2.4, FreeBSD 4.8, FreeBSD 5.0, SunOS
210 5.7, SunOS 5.8, SunOS 5.9, HP-UX 11.00, Cygwin 1.3.10, Mac OS X 10.2,
211 and RISC OS 5.03. Although a database file created by QDBM depends on
212 byte order of the processor, to do with it, utilities to dump data in
213 format which is independent to byte orders are provided.
214
215
217 For building a program using QDBM, the program should be linked with a
218 library file `libqdbm.a' or `libqdbm.so'. For example, the following
219 command is executed to build `sample' from `sample.c'.
220
221 gcc -I/usr/local/include -o sample sample.c -L/usr/local/lib
222 -lqdbm
223
224
226 QDBM is written by Mikio Hirabayashi. You can contact the author by
227 e-mail to <mikio@fallabs.com>. Any suggestion or bug report is welcome
228 to the author.
229
230
232 Copyright(c) 2000-2003 Mikio Hirabayashi
233
234 QDBM is free software; you can redistribute it and/or modify it under
235 the terms of the GNU Lesser General Public License as published by the
236 Free Software Foundation; either version 2 of the License, or any later
237 version.
238
239 QDBM is distributed in the hope that it will be useful, but WITHOUT ANY
240 WARRANTY; without even the implied warranty of MERCHANTABILITY or FIT‐
241 NESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public
242 License for more details.
243
244 You should have received a copy of the GNU Lesser General Public
245 License along with QDBM; if not, write to the Free Software Foundation,
246 Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA.
247
248
250 depot(3), curia(3), relic(3), hovel(3), cabin(3), villa(3), odeum(3),
251 ndbm(3), gdbm(3)
252
253
254
255Man Page 2004-04-22 QDBM(3)