CDB_File(3pm)

CDB_File(3)           User Contributed Perl Documentation          CDB_File(3)

NAME

       CDB_File - Perl extension for access to cdb databases

SYNOPSIS

           use CDB_File;
           $c = tie %h, 'CDB_File', 'file.cdb' or die "tie failed: $!\n";

           $fh = $c->handle;
           sysseek $fh, $c->datapos, 0 or die ...;
           sysread $fh, $x, $c->datalen;
           undef $c;
           untie %h;

           $t = new CDB_File ('t.cdb', "t.$$") or die ...;
           $t->insert('key', 'value');
           $t->finish;

           CDB_File::create %t, $file, "$file.$$";

       or

           use CDB_File 'create';
           create %t, $file, "$file.$$";

DESCRIPTION

       CDB_File is a module which provides a Perl interface to Dan Bernstein's
       cdb package:

           cdb is a fast, reliable, lightweight package for creating and
           reading constant databases.

   Reading from a cdb
       After the "tie" shown above, accesses to %h will refer to the cdb file
       "file.cdb", as described in "tie" in perlfunc.

       Low level access to the database is provided by the three methods
       "handle", "datapos", and "datalen".  To use them, you must remember the
       "CDB_File" object returned by the "tie" call: $c in the example above.
       The "datapos" and "datalen" methods return the file offset position and
       length respectively of the most recently visited key (for example, via
       "exists").

       Beware that if you create an extra reference to the "CDB_File" object
       (like $c in the example above) you must destroy it (with "undef")
       before calling "untie" on the hash.  This ensures that the object's
       "DESTROY" method is called.  Note that "perl -w" will check this for
       you; see perltie for further details.

   Creating a cdb
       A cdb file is created in three steps.  First, call "new CDB_File
       ($final, $tmp)", where $final is the name of the database to be
       created, and $tmp is the name of a temporary file which can be
       atomically renamed to $final.  Second, call the "insert" method once
       for each (key, value) pair.  Finally, call the "finish" method to
       complete the creation and renaming of the cdb file.

       Alternatively, call the "insert()" method with multiple key/value
       pairs.  This can be significantly faster because it crosses the
       boundary between Perl and C code fewer times.  One simple way to do
       this is to pass in an entire hash, as in: "$cdbmaker->insert(%hash);".

       A simpler interface to cdb file creation is provided by
       "CDB_File::create %t, $final, $tmp".  This creates a cdb file named
       $final containing the contents of %t.  As before, $tmp must name a
       temporary file which can be atomically renamed to $final.
       "CDB_File::create" may be imported.
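
       The creation steps, the multi-pair "insert" call, and the low level
       read methods can be combined into one short program.  What follows is
       a minimal sketch, not a reference implementation: the file name
       example.cdb and the contents of %colours are invented for
       illustration, and the error checks simply mirror the ones used
       elsewhere on this page.

```perl
use strict;
use warnings;
use CDB_File;

# Hypothetical names and data, chosen only for this sketch.
my $final   = "example.cdb";
my $tmp     = "$final.$$";
my %colours = (red => '#ff0000', green => '#00ff00', blue => '#0000ff');

# Step 1: start a new database in the temporary file.
my $maker = CDB_File->new($final, $tmp)
    or die "$0: new CDB_File failed: $!\n";

# Step 2: hand over every pair in a single insert() call.
$maker->insert(%colours);

# Step 3: complete the file and rename $tmp to $final.
$maker->finish or die "$0: CDB_File finish failed: $!\n";

# Ordinary access goes through the tied hash.
my %db;
my $c = tie %db, 'CDB_File', $final
    or die "$0: can't tie to $final: $!\n";
my $red = $db{red};

# Low level access: visit the key, then read its value from the file.
exists $db{green} or die "$0: key 'green' is missing\n";
my $fh = $c->handle;
sysseek $fh, $c->datapos, 0 or die "$0: sysseek failed: $!\n";
my $got = sysread $fh, my $green, $c->datalen;
defined $got or die "$0: sysread failed: $!\n";

print "red=$red green=$green\n";

# Drop the extra reference before untie.
undef $c;
untie %db;
```

       Passing the whole hash to "insert" keeps the sketch short; for data
       that does not fit in memory, calling "insert" once per pair inside a
       loop is the more likely choice.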

EXAMPLES

       These are all complete programs.

       1. Convert a Berkeley DB (B-tree) database to cdb format.

           use CDB_File;
           use DB_File;

           tie %h, 'DB_File', $ARGV[0], O_RDONLY, undef, $DB_BTREE or
                   die "$0: can't tie to $ARGV[0]: $!\n";

           CDB_File::create %h, $ARGV[1], "$ARGV[1].$$" or
                   die "$0: can't create cdb: $!\n";

       2. Convert a flat file to cdb format.  In this example, the flat file
       consists of one key per line, separated by a colon from the value.
       Blank lines and lines beginning with # are skipped.

           use CDB_File;

           $cdb = new CDB_File("data.cdb", "data.$$") or
                   die "$0: new CDB_File failed: $!\n";
           while (<>) {
                   next if /^$/ or /^#/;
                   chomp;
                   ($k, $v) = split /:/, $_, 2;
                   if (defined $v) {
                           $cdb->insert($k, $v);
                   } else {
                           warn "bogus line: $_\n";
                   }
           }
           $cdb->finish or die "$0: CDB_File finish failed: $!\n";

       3. Perl version of cdbdump.

           use CDB_File;

           tie %data, 'CDB_File', $ARGV[0] or
                   die "$0: can't tie to $ARGV[0]: $!\n";
           while (($k, $v) = each %data) {
                   print '+', length $k, ',', length $v, ":$k->$v\n";
           }
           print "\n";

       4. For really enormous data values, you can use "handle", "datapos",
       and "datalen", in combination with "sysseek" and "sysread", to avoid
       reading the values into memory.  Here is the script bun-x.pl, which can
       extract uncompressed files and directories from a bun file.

           use CDB_File;

           sub unnetstrings {
               my($netstrings) = @_;
               my @result;
               while ($netstrings =~ s/^([0-9]+)://) {
                       push @result, substr($netstrings, 0, $1, '');
                       $netstrings =~ s/^,//;
               }
               return @result;
           }

           my $chunk = 8192;

           sub extract {
               my($file, $t, $b) = @_;
               my $head = $$b{"H$file"};
               my ($code, $type) = $head =~ m/^([0-9]+)(.)/;
               if ($type eq "/") {
                       mkdir $file, 0777;
               } elsif ($type eq "_") {
                       my ($total, $now, $got, $x);
                       open OUT, ">$file" or die "open for output: $!\n";
                       exists $$b{"D$code"} or die "corrupt bun file\n";
                       my $fh = $t->handle;
                       sysseek $fh, $t->datapos, 0;
                       $total = $t->datalen;
                       while ($total) {
                               $now = ($total > $chunk) ? $chunk : $total;
                               $got = sysread $fh, $x, $now;
                               if (not $got) { die "read error\n"; }
                               $total -= $got;
                               print OUT $x;
                       }
                       close OUT;
               } else {
                       print STDERR "warning: skipping unknown file type\n";
               }
           }

           die "usage\n" if @ARGV != 1;

           my (%b, $t);
           $t = tie %b, 'CDB_File', $ARGV[0] or die "tie: $!\n";
           map { extract $_, $t, \%b } unnetstrings $b{""};

       5. Although a cdb file is constant, you can simulate updating it in
       Perl.  This is an expensive operation, as you have to create a new
       database, and copy into it everything that's unchanged from the old
       database.  (As compensation, the update does not affect database
       readers.  The old database is available for them, till the moment the
       new one is "finish"ed.)

           use CDB_File;

           $file = 'data.cdb';
           $new = new CDB_File($file, "$file.$$") or
                   die "$0: new CDB_File failed: $!\n";

           # Add the new values; remember which keys we've seen.
           while (<>) {
                   chomp;
                   ($k, $v) = split;
                   $new->insert($k, $v);
                   $seen{$k} = 1;
           }

           # Add any old values that haven't been replaced.
           tie %old, 'CDB_File', $file or die "$0: can't tie to $file: $!\n";
           while (($k, $v) = each %old) {
                   $new->insert($k, $v) unless $seen{$k};
           }

           $new->finish or die "$0: CDB_File finish failed: $!\n";

REPEATED KEYS

       Most users can ignore this section.

       A cdb file can contain repeated keys.  If the "insert" method is called
       more than once with the same key during the creation of a cdb file,
       that key will be repeated.

       Here's an example.

           $cdb = new CDB_File ("$file.cdb", "$file.$$") or die ...;
           $cdb->insert('cat', 'gato');
           $cdb->insert('cat', 'chat');
           $cdb->finish;

       Normally, any attempt to access a key retrieves the first value stored
       under that key.  This code snippet always prints gato.

           $catref = tie %catalogue, 'CDB_File', "$file.cdb" or die ...;
           print "$catalogue{cat}";

       However, all the usual ways of iterating over a hash ("keys",
       "values", and "each") do the Right Thing, even in the presence of
       repeated keys.  This code snippet prints cat cat gato chat.

           print join(' ', keys %catalogue, values %catalogue);

       And these two both print cat:gato cat:chat, although the second is more
       efficient.

           foreach $key (keys %catalogue) {
                   print "$key:$catalogue{$key} ";
           }

           while (($key, $val) = each %catalogue) {
                   print "$key:$val ";
           }

       The "multi_get" method retrieves all the values associated with a key.
       It returns a reference to an array containing all the values.  This
       code prints gato chat.

           print "@{$catref->multi_get('cat')}";

       "multi_get" always returns an array reference.  If the key was not
       found in the database, it will be a reference to an empty array.  To
       test whether the key was found, you must test the array, and not the
       reference.

           $x = $catref->multi_get($key);
           warn "$key not found\n" unless $x; # WRONG; message never printed
           warn "$key not found\n" unless @$x; # Correct

       The "fetch_all" method returns a hashref of all keys with the first
       value in the cdb.  This is useful for quickly loading a cdb file where
       there is a 1:1 key mapping.  In practice it proved to be about 400%
       faster than iterating a tied hash.

           # Slow
           my %copy = %tied_cdb;

           # Much Faster
           my $copy_hashref = $catref->fetch_all();
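
       The fragments in this section can be assembled into a single program
       that stores a repeated key and then reads it back both ways.  This is
       a sketch under stated assumptions: the file name catalogue.cdb and the
       key dog (which is never inserted) are hypothetical and serve only to
       show the difference between a plain lookup and "multi_get".

```perl
use strict;
use warnings;
use CDB_File;

# Hypothetical file name for this sketch.
my $file = "catalogue.cdb";

# Store two values under the key 'cat'.
my $cdb = CDB_File->new($file, "$file.$$")
    or die "$0: new CDB_File failed: $!\n";
$cdb->insert('cat', 'gato');
$cdb->insert('cat', 'chat');
$cdb->finish or die "$0: CDB_File finish failed: $!\n";

my %catalogue;
my $catref = tie %catalogue, 'CDB_File', $file
    or die "$0: can't tie to $file: $!\n";

# A plain hash lookup sees only the first value.
my $first = $catalogue{cat};

# multi_get returns an array reference holding every value.
my @all = @{ $catref->multi_get('cat') };

# For a missing key the reference is still true, so test the array.
my $none      = $catref->multi_get('dog');
my $dog_found = @$none ? 1 : 0;

print "first=$first all=@all dog_found=$dog_found\n";

# Drop the extra reference before untie.
undef $catref;
untie %catalogue;
```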

RETURN VALUES

       The routines "tie", "new", and "finish" return undef if the attempted
       operation failed; $! contains the reason for failure.
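
       A failed "tie" can be handled without "die" by testing the return
       value and copying $! straight away, before another system call
       overwrites it.  The sketch below assumes that the hypothetical file
       named in $missing does not exist; it uses a plain truth test, as the
       other examples on this page do, rather than relying on the exact
       false value returned.

```perl
use strict;
use warnings;
use CDB_File;

# Hypothetical path that is assumed not to exist.
my $missing = "no-such-database-$$.cdb";

my %h;
my $obj = tie %h, 'CDB_File', $missing;

# Capture the reason immediately; later calls may change $!.
my $reason = $obj ? '' : "$!";

if ($obj) {
    print "tied to $missing\n";
    undef $obj;
    untie %h;
} else {
    warn "$0: can't tie to $missing: $reason\n";
}
```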

DIAGNOSTICS

       The following fatal errors may occur.  (See "eval" in perlfunc if you
       want to trap them.)

       Modification of a CDB_File attempted
           You attempted to modify a hash tied to a CDB_File.

       CDB database too large
           You attempted to create a cdb file larger than 4 gigabytes.

       [ Write to | Read of | Seek in ] CDB_File failed: <error string>
           If error string is Protocol error, you tried to "use CDB_File" to
           access something that isn't a cdb file.  Otherwise a serious
           OS-level problem occurred; for example, you have run out of disk
           space.
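
       Trapping one of these errors with "eval" might look like the sketch
       below, which provokes the first error in the list by storing into a
       tied hash.  The file name readonly.cdb and the contents of %t are
       hypothetical; the sketch builds its own small database first so that
       it does not depend on an existing file.

```perl
use strict;
use warnings;
use CDB_File;

# Hypothetical file name and contents for this sketch.
my $file = "readonly.cdb";
my %t    = (key => 'value');
CDB_File::create(%t, $file, "$file.$$")
    or die "$0: can't create cdb: $!\n";

tie my %h, 'CDB_File', $file
    or die "$0: can't tie to $file: $!\n";

# Storing into the tied hash is fatal; eval leaves the message in $@.
my $err = '';
eval { $h{key} = 'changed'; 1 } or $err = $@;

print "trapped: $err" if $err;

untie %h;
```

       Copying $@ into $err right after the "eval" keeps the message safe
       from any later code that resets $@.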

PERFORMANCE

       Sometimes you need to get the most performance possible out of a
       library. Rumour has it that perl's tie() interface is slow. In order to
       get around that you can use CDB_File in an object-oriented fashion,
       rather than via tie().

         my $cdb = CDB_File->TIEHASH('/path/to/cdbfile.cdb');

         if ($cdb->EXISTS('key')) {
             print "Value is: ", $cdb->FETCH('key'), "\n";
         }

       For more information on the methods available on tied hashes see
       perltie.

BUGS

       The "create()" interface could be done with "TIEHASH".

AUTHOR

       Tim Goodwin, <tjg@star.le.ac.uk>.  CDB_File began on 1997-01-08.

       Now maintained by Matt Sergeant, <matt@sergeant.org>

perl v5.30.1                      2020-01-27                       CDB_File(3)