DBD::CSV(3pm)

1DBD::CSV(3)           User Contributed Perl Documentation          DBD::CSV(3)
2
3
4

NAME

6       DBD::CSV - DBI driver for CSV files
7

SYNOPSIS

9           use DBI;
10           # See "Creating database handle" below
11           $dbh = DBI->connect ("dbi:CSV:") or
12               die "Cannot connect: $DBI::errstr";
13
14           # Simple statements
15           $dbh->do ("CREATE TABLE a (id INTEGER, name CHAR (10))") or
16               die "Cannot prepare: " . $dbh->errstr ();
17
18           # Selecting
19           $dbh->{RaiseError} = 1;
20           my $sth = $dbh->prepare ("select * from foo");
21           $sth->execute;
22           while (my @row = $sth->fetchrow_array) {
23               print "id: $row[0], name: $row[1]\n";
24               }
25
26           # Updates
27           my $sth = $dbh->prepare ("UPDATE a SET name = ? WHERE id = ?");
28           $sth->execute ("DBI rocks!", 1);
29           $sth->finish;
30
31           $dbh->disconnect;
32

DESCRIPTION

34       The DBD::CSV module is yet another driver for the DBI (Database
35       independent interface for Perl). This one is based on the SQL "engine"
36       SQL::Statement and the abstract DBI driver DBD::File and implements
37       access to so-called CSV files (Comma Separated Values). Such files are
38       often used for exporting MS Access and MS Excel data.
39
40       See DBI for details on DBI, SQL::Statement for details on
41       SQL::Statement and DBD::File for details on the base class DBD::File.
42
43   Prerequisites
44       The only system dependent feature that DBD::File uses, is the "flock
45       ()" function. Thus the module should run (in theory) on any system with
46       a working "flock ()", in particular on all Unix machines and on Windows
47       NT. Under Windows 95 and MacOS the use of "flock ()" is disabled, thus
48       the module should still be usable,
49
50       Unlike other DBI drivers, you don't need an external SQL engine or a
51       running server. All you need are the following Perl modules, available
52       from any CPAN mirror, for example
53
54         http://search.cpan.org/
55
56       DBI The DBI (Database independent interface for Perl), version 1.00 or
57           a later release
58
59       DBD::File
60           This is the base class for DBD::CSV, and it is part of the DBI
61           distribution. As DBD::CSV requires version 0.38 or newer for
62           DBD::File it effectively requires DBI version 1.611 or newer.
63
64       SQL::Statement
65           A simple SQL engine. This module defines all of the SQL syntax for
66           DBD::CSV, new SQL support is added with each release so you should
67           look for updates to SQL::Statement regularly.
68
69           It is possible to run "DBD::CSV" without this module if you define
70           the environment variable $DBI_SQL_NANO to 1. This will reduce the
71           SQL support a lot though. See DBI::SQL::Nano for more details. Note
72           that the test suite does not test in this mode!
73
74       Text::CSV_XS
75           This module is used for writing rows to or reading rows from CSV
76           files.
77
78   Installation
79       Installing this module (and the prerequisites from above) is quite
80       simple.  The simplest way is to install the bundle:
81
82           $ cpan Bundle::CSV
83
84       Alternatively, you can name them all
85
86           $ cpan Text::CSV_XS DBI DBD::CSV
87
88       or even trust "cpan" to resolve all dependencies for you:
89
90           $ cpan DBD::CSV
91
92       If you cannot, for whatever reason, use cpan, fetch all modules from
93       CPAN, and build with a sequence like:
94
95           gzip -d < DBD-CSV-0.28.tgz | tar xf -
96
97       (this is for Unix users, Windows users would prefer WinZip or something
98       similar) and then enter the following:
99
100           cd DBD-CSV-0.28
101           perl Makefile.PL
102           make test
103
104       If any tests fail, let us know. Otherwise go on with
105
106           make install UNINST=1
107
108       Note that you almost definitely need root or administrator permissions.
109       If you don't have them, read the ExtUtils::MakeMaker man page for
110       details on installing in your own directories. ExtUtils::MakeMaker.
111
112   Supported SQL Syntax
113       All SQL processing for DBD::CSV is done by the SQL::Statement module.
114       Features include joins, aliases, built-in and user-defined functions,
115       and more.  See SQL::Statement::Syntax for a description of the SQL
116       syntax supported in DBD::CSV.
117
118       Table names are case insensitive unless quoted.
119

Using DBD::CSV with DBI

121       For most things, DBD-CSV operates the same as any DBI driver.  See DBI
122       for detailed usage.
123
124   Creating a database handle (connect)
125       Creating a database handle usually implies connecting to a database
126       server.  Thus this command reads
127
128           use DBI;
129           my $dbh = DBI->connect ("dbi:CSV:", "", "", {
130               f_dir => "/home/user/folder",
131               });
132
133       The directory tells the driver where it should create or open tables
134       (a.k.a.  files). It defaults to the current directory, so the following
135       are equivalent:
136
137           $dbh = DBI->connect ("dbi:CSV:");
138           $dbh = DBI->connect ("dbi:CSV:", undef, undef, { f_dir => "." });
139           $dbh = DBI->connect ("dbi:CSV:f_dir=.");
140
141       We were told, that VMS might - for whatever reason - require:
142
143           $dbh = DBI->connect ("dbi:CSV:f_dir=");
144
145       The preferred way of passing the arguments is by driver attributes:
146
147           # specify most possible flags via driver flags
148           $dbh = DBI->connect ("dbi:CSV:", undef, undef, {
149               f_schema         => undef,
150               f_dir            => "data",
151               f_ext            => ".csv/r",
152               f_lock           => 2,
153               f_encoding       => "utf8",
154
155               csv_eol          => "\r\n",
156               csv_sep_char     => ",",
157               csv_quote_char   => '"',
158               csv_escape_char  => '"',
159               csv_class        => "Text::CSV_XS",
160               csv_null         => 1,
161               csv_tables       => {
162                   info => { file => "info.csv" }
163                   },
164
165               RaiseError       => 1,
166               PrintError       => 1,
167               FetchHashKeyName => "NAME_lc",
168               }) or die $DBI::errstr;
169
170       but you may set these attributes in the DSN as well, separated by
171       semicolons.  Pay attention to the semi-colon for "csv_sep_char" (as
172       seen in many CSV exports from MS Excel) is being escaped in below
173       example, as is would otherwise be seen as attribute separator:
174
175           $dbh = DBI->connect (
176               "dbi:CSV:f_dir=$ENV{HOME}/csvdb;f_ext=.csv;f_lock=2;" .
177               "f_encoding=utf8;csv_eol=\n;csv_sep_char=\\;;" .
178               "csv_quote_char=\";csv_escape_char=\\;csv_class=Text::CSV_XS;" .
179               "csv_null=1") or die $DBI::errstr;
180
181       Using attributes in the DSN is easier to use when the DSN is derived
182       from an outside source (environment variable, database entry, or
183       configure file), whereas using all entries in the attribute hash is
184       easier to read and to maintain.
185
186   Creating and dropping tables
187       You can create and drop tables with commands like the following:
188
189           $dbh->do ("CREATE TABLE $table (id INTEGER, name CHAR (64))");
190           $dbh->do ("DROP TABLE $table");
191
192       Note that currently only the column names will be stored and no other
193       data.  Thus all other information including column type (INTEGER or
194       CHAR (x), for example), column attributes (NOT NULL, PRIMARY KEY, ...)
195       will silently be discarded. This may change in a later release.
196
197       A drop just removes the file without any warning.
198
199       See DBI for more details.
200
201       Table names cannot be arbitrary, due to restrictions of the SQL syntax.
202       I recommend that table names are valid SQL identifiers: The first
203       character is alphabetic, followed by an arbitrary number of
204       alphanumeric characters. If you want to use other files, the file names
205       must start with "/", "./" or "../" and they must not contain white
206       space.
207
208   Inserting, fetching and modifying data
209       The following examples insert some data in a table and fetch it back:
210       First all data in the string:
211
212           $dbh->do ("INSERT INTO $table VALUES (1, ".
213                      $dbh->quote ("foobar") . ")");
214
215       Note the use of the quote method for escaping the word "foobar". Any
216       string must be escaped, even if it does not contain binary data.
217
218       Next an example using parameters:
219
220           $dbh->do ("INSERT INTO $table VALUES (?, ?)", undef, 2,
221                     "It's a string!");
222
223       Note that you don't need to use the quote method here, this is done
224       automatically for you. This version is particularly well designed for
225       loops. Whenever performance is an issue, I recommend using this method.
226
227       You might wonder about the "undef". Don't wonder, just take it as it
228       is. :-) It's an attribute argument that I have never ever used and will
229       be parsed to the prepare method as a second argument.
230
231       To retrieve data, you can use the following:
232
233           my $query = "SELECT * FROM $table WHERE id > 1 ORDER BY id";
234           my $sth   = $dbh->prepare ($query);
235           $sth->execute ();
236           while (my $row = $sth->fetchrow_hashref) {
237               print "Found result row: id = ", $row->{id},
238                     ", name = ", $row->{name};
239               }
240           $sth->finish ();
241
242       Again, column binding works: The same example again.
243
244           my $sth = $dbh->prepare (qq;
245               SELECT * FROM $table WHERE id > 1 ORDER BY id;
246               ;);
247           $sth->execute;
248           my ($id, $name);
249           $sth->bind_columns (undef, \$id, \$name);
250           while ($sth->fetch) {
251               print "Found result row: id = $id, name = $name\n";
252               }
253           $sth->finish;
254
255       Of course you can even use input parameters. Here's the same example
256       for the third time:
257
258           my $sth = $dbh->prepare ("SELECT * FROM $table WHERE id = ?");
259           $sth->bind_columns (undef, \$id, \$name);
260           for (my $i = 1; $i <= 2; $i++) {
261               $sth->execute ($id);
262               if ($sth->fetch) {
263                   print "Found result row: id = $id, name = $name\n";
264                   }
265               $sth->finish;
266               }
267
268       See DBI for details on these methods. See SQL::Statement for details on
269       the WHERE clause.
270
271       Data rows are modified with the UPDATE statement:
272
273           $dbh->do ("UPDATE $table SET id = 3 WHERE id = 1");
274
275       Likewise you use the DELETE statement for removing rows:
276
277           $dbh->do ("DELETE FROM $table WHERE id > 1");
278
279   Error handling
280       In the above examples we have never cared about return codes. Of
281       course, this cannot be recommended. Instead we should have written (for
282       example):
283
284           my $sth = $dbh->prepare ("SELECT * FROM $table WHERE id = ?") or
285               die "prepare: " . $dbh->errstr ();
286           $sth->bind_columns (undef, \$id, \$name) or
287               die "bind_columns: " . $dbh->errstr ();
288           for (my $i = 1; $i <= 2; $i++) {
289               $sth->execute ($id) or
290                   die "execute: " . $dbh->errstr ();
291               $sth->fetch and
292                   print "Found result row: id = $id, name = $name\n";
293               }
294           $sth->finish ($id) or die "finish: " . $dbh->errstr ();
295
296       Obviously this is tedious. Fortunately we have DBI's RaiseError
297       attribute:
298
299           $dbh->{RaiseError} = 1;
300           $@ = "";
301           eval {
302               my $sth = $dbh->prepare ("SELECT * FROM $table WHERE id = ?");
303               $sth->bind_columns (undef, \$id, \$name);
304               for (my $i = 1; $i <= 2; $i++) {
305                   $sth->execute ($id);
306                   $sth->fetch and
307                       print "Found result row: id = $id, name = $name\n";
308                   }
309               $sth->finish ($id);
310               };
311           $@ and die "SQL database error: $@";
312
313       This is not only shorter, it even works when using DBI methods within
314       subroutines.
315

DBI database handle attributes

317   Metadata
318       The following attributes are handled by DBI itself and not by
319       DBD::File, thus they all work as expected:
320
321           Active
322           ActiveKids
323           CachedKids
324           CompatMode             (Not used)
325           InactiveDestroy
326           Kids
327           PrintError
328           RaiseError
329           Warn                   (Not used)
330
331       The following DBI attributes are handled by DBD::File:
332
333       AutoCommit
334           Always on
335
336       ChopBlanks
337           Works
338
339       NUM_OF_FIELDS
340           Valid after "$sth->execute"
341
342       NUM_OF_PARAMS
343           Valid after "$sth->prepare"
344
345       NAME
346       NAME_lc
347       NAME_uc
348           Valid after "$sth->execute"; undef for Non-Select statements.
349
350       NULLABLE
351           Not really working. Always returns an array ref of one's, as
352           DBD::CSV does not verify input data. Valid after "$sth->execute";
353           undef for non-Select statements.
354
355       These attributes and methods are not supported:
356
357           bind_param_inout
358           CursorName
359           LongReadLen
360           LongTruncOk
361

DBD-CSV specific database handle attributes

363       In addition to the DBI attributes, you can use the following dbh
364       attributes:
365
366       f_dir
367           This attribute is used for setting the directory where CSV files
368           are opened. Usually you set it in the dbh, it defaults to the
369           current directory ("."). However, it is overwritable in the
370           statement handles.
371
372       f_ext
373           This attribute is used for setting the file extension.
374
375       f_schema
376           This attribute allows you to set the database schema name. The
377           default is to use the owner of "f_dir". "undef" is allowed, but not
378           in the DSN part.
379
380               my $dbh = DBI->connect ("dbi:CSV:", "", "", {
381                   f_schema => undef,
382                   f_dir    => "data",
383                   f_ext    => ".csv/r",
384                   }) or die $DBI::errstr;
385
386       f_encoding
387           This attribute allows you to set the encoding of the data. With
388           CSV, it is not possible to set (and remember) the encoding on a
389           per-field basis, but DBD::File now allows to set the encoding of
390           the underlying file. If this attribute is not set, or undef is
391           passed, the file will be seen as binary.
392
393       f_lock
394           With this attribute, you can force locking mode (if locking is
395           supported at all) for opening tables. By default, tables are opened
396           with a shared lock for reading, and with an exclusive lock for
397           writing. The supported modes are:
398
399           0 Force no locking at all.
400
401           1 Only shared locks will be used.
402
403           2 Only exclusive locks will be used.
404
405           But see "KNOWN BUGS" in DBD::File.
406
407       csv_eol
408       csv_sep_char
409       csv_quote_char
410       csv_escape_char
411       csv_class
412       csv_csv
413           The attributes csv_eol, csv_sep_char, csv_quote_char and
414           csv_escape_char are corresponding to the respective attributes of
415           the Text::CSV_XS object. You want to set these attributes if you
416           have unusual CSV files like /etc/passwd or MS Excel generated CSV
417           files with a semicolon as separator. Defaults are "\015\012", ';',
418           '"' and '"', respectively.
419
420           The csv_eol attribute defines the end-of-line pattern, which is
421           better known as a record separator pattern since it separates
422           records.  The default is windows-style end-of-lines "\015\012" for
423           output (writing) and unset for input (reading), so if on unix you
424           may want to set this to newline ("\n") like this:
425
426             $dbh->{csv_eol} = "\n";
427
428           It is also possible to use multi-character patterns as record
429           separators.  For example this file uses newlines as field
430           separators (sep_char) and the pattern "\n__ENDREC__\n" as the
431           record separators (eol):
432
433             name
434             city
435             __ENDREC__
436             joe
437             seattle
438             __ENDREC__
439             sue
440             portland
441             __ENDREC__
442
443           To handle this file, you'd do this:
444
445             $dbh->{eol}      = "\n__ENDREC__\n" ,
446             $dbh->{sep_char} = "\n"
447
448           The attributes are used to create an instance of the class
449           csv_class, by default Text::CSV_XS. Alternatively you may pass an
450           instance as csv_csv, the latter takes precedence. Note that the
451           binary attribute must be set to a true value in that case.
452
453           Additionally you may overwrite these attributes on a per-table base
454           in the csv_tables attribute.
455
456       csv_null
457           With this option set, all new statement handles will set
458           "always_quote" and "blank_is_undef" in the CSV parser and writer,
459           so it knows how to distinguish between the empty string and "undef"
460           or "NULL". You cannot reset it with a false value. You can pass it
461           to connect, or set it later:
462
463             $dbh = DBI->connect ("dbi:CSV:", "", "", { csv_null => 1 });
464
465             $dbh->{csv_null} = 1;
466
467       csv_tables
468           This hash ref is used for storing table dependent metadata. For any
469           table it contains an element with the table name as key and another
470           hash ref with the following attributes:
471
472       csv_*
473           All other attributes that start with "csv_" and are not described
474           above will be passed to "Text::CSV_XS" (without the "csv_" prefix).
475           these extra options are most likely to be only useful for reading
476           (select) handles. Examples:
477
478             $dbh->{csv_allow_whitespace}    = 1;
479             $dbh->{csv_allow_loose_quotes}  = 1;
480             $dbh->{csv_allow_loose_escapes} = 1;
481
482           See the "Text::CSV_XS" documentation for the full list and the
483           documentation.
484
485           file
486               The tables file name; defaults to
487
488                   "$dbh->{f_dir}/$table"
489
490           eol
491           sep_char
492           quote_char
493           escape_char
494           class
495           csv These correspond to the attributes csv_eol, csv_sep_char,
496               csv_quote_char, csv_escape_char, csv_class and csv_csv.  The
497               difference is that they work on a per-table base.
498
499           col_names
500           skip_first_row
501               By default DBD::CSV assumes that column names are stored in the
502               first row of the CSV file and sanitizes them (see "raw_header"
503               below). If this is not the case, you can supply an array ref of
504               table names with the col_names attribute. In that case the
505               attribute skip_first_row will be set to FALSE.
506
507               If you supply an empty array ref, the driver will read the
508               first row for you, count the number of columns and create
509               column names like "col0", "col1", ...
510
511           raw_header
512               Due to the SQL standard, field names cannot contain special
513               characters like a dot ("."). Following the approach of
514               mdb_tools, all these tokens are translated to an underscore
515               ("_") when reading the first line of the CSV file, so all field
516               names are `sanitized'. If you do not want this to happen, set
517               "raw_header" to a true value. DBD::CSV cannot guarantee that
518               any part in the toolchain will work if field names have those
519               characters, and the chances are high that the SQL statements
520               will fail.
521
522       It's strongly recommended to check the attributes supported by
523       "Metadata" in DBD::File.
524
525       Example: Suggest you want to use /etc/passwd as a CSV file. :-) There
526       simplest way is:
527
528           use DBI;
529           my $dbh = DBI->connect ("dbi:CSV:", undef, undef, {
530               f_dir           => "/etc",
531               csv_sep_char    => ":",
532               csv_quote_char  => undef,
533               csv_escape_char => undef,
534               });
535           $dbh->{csv_tables}{passwd} = {
536               col_names => [qw( login password uid gid realname
537                                 directory shell )];
538               };
539           $sth = $dbh->prepare ("SELECT * FROM passwd");
540
541       Another possibility where you leave all the defaults as they are and
542       overwrite them on a per table base:
543
544           require DBI;
545           my $dbh = DBI->connect ("dbi:CSV:");
546           $dbh->{csv_tables}{passwd} = {
547               eol         => "\n",
548               sep_char    => ":",
549               quote_char  => undef,
550               escape_char => undef,
551               file        => "/etc/passwd",
552               col_names   => [qw( login password uid gid
553                                   realname directory shell )],
554               };
555           $sth = $dbh->prepare ("SELECT * FROM passwd");
556
557   Driver private methods
558       These methods are inherited from DBD::File:
559
560       data_sources
561           The "data_sources" method returns a list of sub-directories of the
562           current directory in the form "dbi:CSV:directory=$dirname".
563
564           If you want to read the sub-directories of another directory, use
565
566               my $drh  = DBI->install_driver ("CSV");
567               my @list = $drh->data_sources (f_dir => "/usr/local/csv_data");
568
569       list_tables
570           This method returns a list of file names inside $dbh->{directory}.
571           Example:
572
573               my $dbh  = DBI->connect ("dbi:CSV:directory=/usr/local/csv_data");
574               my @list = $dbh->func ("list_tables");
575
576           Note that the list includes all files contained in the directory,
577           even those that have non-valid table names, from the view of SQL.
578           See "Creating and dropping tables" above.
579

KNOWN ISSUES

581       ·   The module is using flock () internally. However, this function is
582           not available on platforms. Using flock () is disabled on MacOS and
583           Windows 95: There's no locking at all (perhaps not so important on
584           these operating systems, as they are for single users anyways).
585

TODO

587       Tests
588           Aim for a full 100% code coverage
589
590            - eol      Make tests for different record separators.
591            - csv_xs   Test with a variety of combinations for
592                       sep_char, quote_char, and escape_char testing
593            - quoting  $dbh->do ("drop table $_") for DBI-tables ();
594            - errors   Make sure that all documented exceptions are tested.
595                       . write to write-protected file
596                       . read from badly formatted csv
597                       . pass bad arguments to csv parser while fetching
598
599           Add tests that specifically test DBD::File functionality where that
600           is useful.
601
602       RT  Attack all open DBD::CSV bugs in RT
603
604       CPAN::Forum
605           Attack all items in http://www.cpanforum.com/dist/DBD-CSV
606
607       Documentation
608           Expand on error-handling, and document all possible errors.  Use
609           Text::CSV_XS::error_diag () wherever possible.
610
611       Debugging
612           Implement and document dbd_verbose.
613
614       Data dictionary
615           Investigate the possibility to store the data dictionary in a file
616           like .sys$columns that can store the field attributes (type, key,
617           nullable).
618
619       Examples
620           Make more real-life examples from the docs in examples/
621

AUTHORS and MAINTAINERS

634       This module is currently maintained by
635
636           H.Merijn Brand <h.m.brand@xs4all.nl>
637
638       in close cooperation with and help from
639
640           Jens Rehsack <sno@NetBSD.org>
641
642       The original author is Jochen Wiedmann.  Previous maintainer was Jeff
643       Zucker
644

COPYRIGHT AND LICENSE

646       Copyright (C) 2009-2010 by H.Merijn Brand Copyright (C) 2004-2009 by
647       Jeff Zucker Copyright (C) 1998-2004 by Jochen Wiedmann
648
649       All rights reserved.
650
651       You may distribute this module under the terms of either the GNU
652       General Public License or the Artistic License, as specified in the
653       Perl README file.
654
655
656
657perl v5.12.1                      2010-07-09                       DBD::CSV(3)