1DBD::CSV(3) User Contributed Perl Documentation DBD::CSV(3)
2
3
4
6 DBD::CSV - DBI driver for CSV files
7
9 use DBI;
10 # See "Creating database handle" below
11 $dbh = DBI->connect ("dbi:CSV:") or
12 die "Cannot connect: $DBI::errstr";
13
14 # Simple statements
15 $dbh->do ("CREATE TABLE a (id INTEGER, name CHAR (10))") or
16 die "Cannot prepare: " . $dbh->errstr ();
17
18 # Selecting
19 $dbh->{RaiseError} = 1;
20 my $sth = $dbh->prepare ("select * from foo");
21 $sth->execute;
22 while (my @row = $sth->fetchrow_array) {
23 print "id: $row[0], name: $row[1]\n";
24 }
25
26 # Updates
27 my $sth = $dbh->prepare ("UPDATE a SET name = ? WHERE id = ?");
28 $sth->execute ("DBI rocks!", 1);
29 $sth->finish;
30
31 $dbh->disconnect;
32
34 The DBD::CSV module is yet another driver for the DBI (Database
35 independent interface for Perl). This one is based on the SQL "engine"
36 SQL::Statement and the abstract DBI driver DBD::File and implements
37 access to so-called CSV files (Comma Separated Values). Such files are
38 often used for exporting MS Access and MS Excel data.
39
40 See DBI for details on DBI, SQL::Statement for details on
41 SQL::Statement and DBD::File for details on the base class DBD::File.
42
43 Prerequisites
44 The only system dependent feature that DBD::File uses, is the "flock
45 ()" function. Thus the module should run (in theory) on any system with
46 a working "flock ()", in particular on all Unix machines and on Windows
47 NT. Under Windows 95 and MacOS the use of "flock ()" is disabled, thus
48 the module should still be usable,
49
50 Unlike other DBI drivers, you don't need an external SQL engine or a
51 running server. All you need are the following Perl modules, available
52 from any CPAN mirror, for example
53
54 http://search.cpan.org/
55
56 DBI The DBI (Database independent interface for Perl), version 1.00 or
57 a later release
58
59 DBD::File
60 This is the base class for DBD::CSV, and it is part of the DBI
61 distribution. As DBD::CSV requires version 0.38 or newer for
62 DBD::File it effectively requires DBI version 1.611 or newer.
63
64 SQL::Statement
65 A simple SQL engine. This module defines all of the SQL syntax for
66 DBD::CSV, new SQL support is added with each release so you should
67 look for updates to SQL::Statement regularly.
68
69 It is possible to run "DBD::CSV" without this module if you define
70 the environment variable $DBI_SQL_NANO to 1. This will reduce the
71 SQL support a lot though. See DBI::SQL::Nano for more details. Note
72 that the test suite does not test in this mode!
73
74 Text::CSV_XS
75 This module is used for writing rows to or reading rows from CSV
76 files.
77
78 Installation
79 Installing this module (and the prerequisites from above) is quite
80 simple. The simplest way is to install the bundle:
81
82 $ cpan Bundle::CSV
83
84 Alternatively, you can name them all
85
86 $ cpan Text::CSV_XS DBI DBD::CSV
87
88 or even trust "cpan" to resolve all dependencies for you:
89
90 $ cpan DBD::CSV
91
92 If you cannot, for whatever reason, use cpan, fetch all modules from
93 CPAN, and build with a sequence like:
94
95 gzip -d < DBD-CSV-0.28.tgz | tar xf -
96
97 (this is for Unix users, Windows users would prefer WinZip or something
98 similar) and then enter the following:
99
100 cd DBD-CSV-0.28
101 perl Makefile.PL
102 make test
103
104 If any tests fail, let us know. Otherwise go on with
105
106 make install UNINST=1
107
108 Note that you almost definitely need root or administrator permissions.
109 If you don't have them, read the ExtUtils::MakeMaker man page for
110 details on installing in your own directories. ExtUtils::MakeMaker.
111
112 Supported SQL Syntax
113 All SQL processing for DBD::CSV is done by the SQL::Statement module.
114 Features include joins, aliases, built-in and user-defined functions,
115 and more. See SQL::Statement::Syntax for a description of the SQL
116 syntax supported in DBD::CSV.
117
118 Table names are case insensitive unless quoted.
119
121 For most things, DBD-CSV operates the same as any DBI driver. See DBI
122 for detailed usage.
123
124 Creating a database handle (connect)
125 Creating a database handle usually implies connecting to a database
126 server. Thus this command reads
127
128 use DBI;
129 my $dbh = DBI->connect ("dbi:CSV:", "", "", {
130 f_dir => "/home/user/folder",
131 });
132
133 The directory tells the driver where it should create or open tables
134 (a.k.a. files). It defaults to the current directory, so the following
135 are equivalent:
136
137 $dbh = DBI->connect ("dbi:CSV:");
138 $dbh = DBI->connect ("dbi:CSV:", undef, undef, { f_dir => "." });
139 $dbh = DBI->connect ("dbi:CSV:f_dir=.");
140
141 We were told, that VMS might - for whatever reason - require:
142
143 $dbh = DBI->connect ("dbi:CSV:f_dir=");
144
145 The preferred way of passing the arguments is by driver attributes:
146
147 # specify most possible flags via driver flags
148 $dbh = DBI->connect ("dbi:CSV:", undef, undef, {
149 f_schema => undef,
150 f_dir => "data",
151 f_ext => ".csv/r",
152 f_lock => 2,
153 f_encoding => "utf8",
154
155 csv_eol => "\r\n",
156 csv_sep_char => ",",
157 csv_quote_char => '"',
158 csv_escape_char => '"',
159 csv_class => "Text::CSV_XS",
160 csv_null => 1,
161 csv_tables => {
162 info => { file => "info.csv" }
163 },
164
165 RaiseError => 1,
166 PrintError => 1,
167 FetchHashKeyName => "NAME_lc",
168 }) or die $DBI::errstr;
169
170 but you may set these attributes in the DSN as well, separated by
171 semicolons. Pay attention to the semi-colon for "csv_sep_char" (as
172 seen in many CSV exports from MS Excel) is being escaped in below
173 example, as is would otherwise be seen as attribute separator:
174
175 $dbh = DBI->connect (
176 "dbi:CSV:f_dir=$ENV{HOME}/csvdb;f_ext=.csv;f_lock=2;" .
177 "f_encoding=utf8;csv_eol=\n;csv_sep_char=\\;;" .
178 "csv_quote_char=\";csv_escape_char=\\;csv_class=Text::CSV_XS;" .
179 "csv_null=1") or die $DBI::errstr;
180
181 Using attributes in the DSN is easier to use when the DSN is derived
182 from an outside source (environment variable, database entry, or
183 configure file), whereas using all entries in the attribute hash is
184 easier to read and to maintain.
185
186 Creating and dropping tables
187 You can create and drop tables with commands like the following:
188
189 $dbh->do ("CREATE TABLE $table (id INTEGER, name CHAR (64))");
190 $dbh->do ("DROP TABLE $table");
191
192 Note that currently only the column names will be stored and no other
193 data. Thus all other information including column type (INTEGER or
194 CHAR (x), for example), column attributes (NOT NULL, PRIMARY KEY, ...)
195 will silently be discarded. This may change in a later release.
196
197 A drop just removes the file without any warning.
198
199 See DBI for more details.
200
201 Table names cannot be arbitrary, due to restrictions of the SQL syntax.
202 I recommend that table names are valid SQL identifiers: The first
203 character is alphabetic, followed by an arbitrary number of
204 alphanumeric characters. If you want to use other files, the file names
205 must start with "/", "./" or "../" and they must not contain white
206 space.
207
208 Inserting, fetching and modifying data
209 The following examples insert some data in a table and fetch it back:
210 First all data in the string:
211
212 $dbh->do ("INSERT INTO $table VALUES (1, ".
213 $dbh->quote ("foobar") . ")");
214
215 Note the use of the quote method for escaping the word "foobar". Any
216 string must be escaped, even if it does not contain binary data.
217
218 Next an example using parameters:
219
220 $dbh->do ("INSERT INTO $table VALUES (?, ?)", undef, 2,
221 "It's a string!");
222
223 Note that you don't need to use the quote method here, this is done
224 automatically for you. This version is particularly well designed for
225 loops. Whenever performance is an issue, I recommend using this method.
226
227 You might wonder about the "undef". Don't wonder, just take it as it
228 is. :-) It's an attribute argument that I have never ever used and will
229 be parsed to the prepare method as a second argument.
230
231 To retrieve data, you can use the following:
232
233 my $query = "SELECT * FROM $table WHERE id > 1 ORDER BY id";
234 my $sth = $dbh->prepare ($query);
235 $sth->execute ();
236 while (my $row = $sth->fetchrow_hashref) {
237 print "Found result row: id = ", $row->{id},
238 ", name = ", $row->{name};
239 }
240 $sth->finish ();
241
242 Again, column binding works: The same example again.
243
244 my $sth = $dbh->prepare (qq;
245 SELECT * FROM $table WHERE id > 1 ORDER BY id;
246 ;);
247 $sth->execute;
248 my ($id, $name);
249 $sth->bind_columns (undef, \$id, \$name);
250 while ($sth->fetch) {
251 print "Found result row: id = $id, name = $name\n";
252 }
253 $sth->finish;
254
255 Of course you can even use input parameters. Here's the same example
256 for the third time:
257
258 my $sth = $dbh->prepare ("SELECT * FROM $table WHERE id = ?");
259 $sth->bind_columns (undef, \$id, \$name);
260 for (my $i = 1; $i <= 2; $i++) {
261 $sth->execute ($id);
262 if ($sth->fetch) {
263 print "Found result row: id = $id, name = $name\n";
264 }
265 $sth->finish;
266 }
267
268 See DBI for details on these methods. See SQL::Statement for details on
269 the WHERE clause.
270
271 Data rows are modified with the UPDATE statement:
272
273 $dbh->do ("UPDATE $table SET id = 3 WHERE id = 1");
274
275 Likewise you use the DELETE statement for removing rows:
276
277 $dbh->do ("DELETE FROM $table WHERE id > 1");
278
279 Error handling
280 In the above examples we have never cared about return codes. Of
281 course, this cannot be recommended. Instead we should have written (for
282 example):
283
284 my $sth = $dbh->prepare ("SELECT * FROM $table WHERE id = ?") or
285 die "prepare: " . $dbh->errstr ();
286 $sth->bind_columns (undef, \$id, \$name) or
287 die "bind_columns: " . $dbh->errstr ();
288 for (my $i = 1; $i <= 2; $i++) {
289 $sth->execute ($id) or
290 die "execute: " . $dbh->errstr ();
291 $sth->fetch and
292 print "Found result row: id = $id, name = $name\n";
293 }
294 $sth->finish ($id) or die "finish: " . $dbh->errstr ();
295
296 Obviously this is tedious. Fortunately we have DBI's RaiseError
297 attribute:
298
299 $dbh->{RaiseError} = 1;
300 $@ = "";
301 eval {
302 my $sth = $dbh->prepare ("SELECT * FROM $table WHERE id = ?");
303 $sth->bind_columns (undef, \$id, \$name);
304 for (my $i = 1; $i <= 2; $i++) {
305 $sth->execute ($id);
306 $sth->fetch and
307 print "Found result row: id = $id, name = $name\n";
308 }
309 $sth->finish ($id);
310 };
311 $@ and die "SQL database error: $@";
312
313 This is not only shorter, it even works when using DBI methods within
314 subroutines.
315
317 Metadata
318 The following attributes are handled by DBI itself and not by
319 DBD::File, thus they all work as expected:
320
321 Active
322 ActiveKids
323 CachedKids
324 CompatMode (Not used)
325 InactiveDestroy
326 Kids
327 PrintError
328 RaiseError
329 Warn (Not used)
330
331 The following DBI attributes are handled by DBD::File:
332
333 AutoCommit
334 Always on
335
336 ChopBlanks
337 Works
338
339 NUM_OF_FIELDS
340 Valid after "$sth->execute"
341
342 NUM_OF_PARAMS
343 Valid after "$sth->prepare"
344
345 NAME
346 NAME_lc
347 NAME_uc
348 Valid after "$sth->execute"; undef for Non-Select statements.
349
350 NULLABLE
351 Not really working. Always returns an array ref of one's, as
352 DBD::CSV does not verify input data. Valid after "$sth->execute";
353 undef for non-Select statements.
354
355 These attributes and methods are not supported:
356
357 bind_param_inout
358 CursorName
359 LongReadLen
360 LongTruncOk
361
363 In addition to the DBI attributes, you can use the following dbh
364 attributes:
365
366 f_dir
367 This attribute is used for setting the directory where CSV files
368 are opened. Usually you set it in the dbh, it defaults to the
369 current directory ("."). However, it is overwritable in the
370 statement handles.
371
372 f_ext
373 This attribute is used for setting the file extension.
374
375 f_schema
376 This attribute allows you to set the database schema name. The
377 default is to use the owner of "f_dir". "undef" is allowed, but not
378 in the DSN part.
379
380 my $dbh = DBI->connect ("dbi:CSV:", "", "", {
381 f_schema => undef,
382 f_dir => "data",
383 f_ext => ".csv/r",
384 }) or die $DBI::errstr;
385
386 f_encoding
387 This attribute allows you to set the encoding of the data. With
388 CSV, it is not possible to set (and remember) the encoding on a
389 per-field basis, but DBD::File now allows to set the encoding of
390 the underlying file. If this attribute is not set, or undef is
391 passed, the file will be seen as binary.
392
393 f_lock
394 With this attribute, you can force locking mode (if locking is
395 supported at all) for opening tables. By default, tables are opened
396 with a shared lock for reading, and with an exclusive lock for
397 writing. The supported modes are:
398
399 0 Force no locking at all.
400
401 1 Only shared locks will be used.
402
403 2 Only exclusive locks will be used.
404
405 But see "KNOWN BUGS" in DBD::File.
406
407 csv_eol
408 csv_sep_char
409 csv_quote_char
410 csv_escape_char
411 csv_class
412 csv_csv
413 The attributes csv_eol, csv_sep_char, csv_quote_char and
414 csv_escape_char are corresponding to the respective attributes of
415 the Text::CSV_XS object. You want to set these attributes if you
416 have unusual CSV files like /etc/passwd or MS Excel generated CSV
417 files with a semicolon as separator. Defaults are "\015\012", ';',
418 '"' and '"', respectively.
419
420 The csv_eol attribute defines the end-of-line pattern, which is
421 better known as a record separator pattern since it separates
422 records. The default is windows-style end-of-lines "\015\012" for
423 output (writing) and unset for input (reading), so if on unix you
424 may want to set this to newline ("\n") like this:
425
426 $dbh->{csv_eol} = "\n";
427
428 It is also possible to use multi-character patterns as record
429 separators. For example this file uses newlines as field
430 separators (sep_char) and the pattern "\n__ENDREC__\n" as the
431 record separators (eol):
432
433 name
434 city
435 __ENDREC__
436 joe
437 seattle
438 __ENDREC__
439 sue
440 portland
441 __ENDREC__
442
443 To handle this file, you'd do this:
444
445 $dbh->{eol} = "\n__ENDREC__\n" ,
446 $dbh->{sep_char} = "\n"
447
448 The attributes are used to create an instance of the class
449 csv_class, by default Text::CSV_XS. Alternatively you may pass an
450 instance as csv_csv, the latter takes precedence. Note that the
451 binary attribute must be set to a true value in that case.
452
453 Additionally you may overwrite these attributes on a per-table base
454 in the csv_tables attribute.
455
456 csv_null
457 With this option set, all new statement handles will set
458 "always_quote" and "blank_is_undef" in the CSV parser and writer,
459 so it knows how to distinguish between the empty string and "undef"
460 or "NULL". You cannot reset it with a false value. You can pass it
461 to connect, or set it later:
462
463 $dbh = DBI->connect ("dbi:CSV:", "", "", { csv_null => 1 });
464
465 $dbh->{csv_null} = 1;
466
467 csv_tables
468 This hash ref is used for storing table dependent metadata. For any
469 table it contains an element with the table name as key and another
470 hash ref with the following attributes:
471
472 csv_*
473 All other attributes that start with "csv_" and are not described
474 above will be passed to "Text::CSV_XS" (without the "csv_" prefix).
475 these extra options are most likely to be only useful for reading
476 (select) handles. Examples:
477
478 $dbh->{csv_allow_whitespace} = 1;
479 $dbh->{csv_allow_loose_quotes} = 1;
480 $dbh->{csv_allow_loose_escapes} = 1;
481
482 See the "Text::CSV_XS" documentation for the full list and the
483 documentation.
484
485 file
486 The tables file name; defaults to
487
488 "$dbh->{f_dir}/$table"
489
490 eol
491 sep_char
492 quote_char
493 escape_char
494 class
495 csv These correspond to the attributes csv_eol, csv_sep_char,
496 csv_quote_char, csv_escape_char, csv_class and csv_csv. The
497 difference is that they work on a per-table base.
498
499 col_names
500 skip_first_row
501 By default DBD::CSV assumes that column names are stored in the
502 first row of the CSV file and sanitizes them (see "raw_header"
503 below). If this is not the case, you can supply an array ref of
504 table names with the col_names attribute. In that case the
505 attribute skip_first_row will be set to FALSE.
506
507 If you supply an empty array ref, the driver will read the
508 first row for you, count the number of columns and create
509 column names like "col0", "col1", ...
510
511 raw_header
512 Due to the SQL standard, field names cannot contain special
513 characters like a dot ("."). Following the approach of
514 mdb_tools, all these tokens are translated to an underscore
515 ("_") when reading the first line of the CSV file, so all field
516 names are `sanitized'. If you do not want this to happen, set
517 "raw_header" to a true value. DBD::CSV cannot guarantee that
518 any part in the toolchain will work if field names have those
519 characters, and the chances are high that the SQL statements
520 will fail.
521
522 It's strongly recommended to check the attributes supported by
523 "Metadata" in DBD::File.
524
525 Example: Suggest you want to use /etc/passwd as a CSV file. :-) There
526 simplest way is:
527
528 use DBI;
529 my $dbh = DBI->connect ("dbi:CSV:", undef, undef, {
530 f_dir => "/etc",
531 csv_sep_char => ":",
532 csv_quote_char => undef,
533 csv_escape_char => undef,
534 });
535 $dbh->{csv_tables}{passwd} = {
536 col_names => [qw( login password uid gid realname
537 directory shell )];
538 };
539 $sth = $dbh->prepare ("SELECT * FROM passwd");
540
541 Another possibility where you leave all the defaults as they are and
542 overwrite them on a per table base:
543
544 require DBI;
545 my $dbh = DBI->connect ("dbi:CSV:");
546 $dbh->{csv_tables}{passwd} = {
547 eol => "\n",
548 sep_char => ":",
549 quote_char => undef,
550 escape_char => undef,
551 file => "/etc/passwd",
552 col_names => [qw( login password uid gid
553 realname directory shell )],
554 };
555 $sth = $dbh->prepare ("SELECT * FROM passwd");
556
557 Driver private methods
558 These methods are inherited from DBD::File:
559
560 data_sources
561 The "data_sources" method returns a list of sub-directories of the
562 current directory in the form "dbi:CSV:directory=$dirname".
563
564 If you want to read the sub-directories of another directory, use
565
566 my $drh = DBI->install_driver ("CSV");
567 my @list = $drh->data_sources (f_dir => "/usr/local/csv_data");
568
569 list_tables
570 This method returns a list of file names inside $dbh->{directory}.
571 Example:
572
573 my $dbh = DBI->connect ("dbi:CSV:directory=/usr/local/csv_data");
574 my @list = $dbh->func ("list_tables");
575
576 Note that the list includes all files contained in the directory,
577 even those that have non-valid table names, from the view of SQL.
578 See "Creating and dropping tables" above.
579
581 ยท The module is using flock () internally. However, this function is
582 not available on platforms. Using flock () is disabled on MacOS and
583 Windows 95: There's no locking at all (perhaps not so important on
584 these operating systems, as they are for single users anyways).
585
587 Tests
588 Aim for a full 100% code coverage
589
590 - eol Make tests for different record separators.
591 - csv_xs Test with a variety of combinations for
592 sep_char, quote_char, and escape_char testing
593 - quoting $dbh->do ("drop table $_") for DBI-tables ();
594 - errors Make sure that all documented exceptions are tested.
595 . write to write-protected file
596 . read from badly formatted csv
597 . pass bad arguments to csv parser while fetching
598
599 Add tests that specifically test DBD::File functionality where that
600 is useful.
601
602 RT Attack all open DBD::CSV bugs in RT
603
604 CPAN::Forum
605 Attack all items in http://www.cpanforum.com/dist/DBD-CSV
606
607 Documentation
608 Expand on error-handling, and document all possible errors. Use
609 Text::CSV_XS::error_diag () wherever possible.
610
611 Debugging
612 Implement and document dbd_verbose.
613
614 Data dictionary
615 Investigate the possibility to store the data dictionary in a file
616 like .sys$columns that can store the field attributes (type, key,
617 nullable).
618
619 Examples
620 Make more real-life examples from the docs in examples/
621
623 DBI, Text::CSV_XS, SQL::Statement, DBI::SQL::Nano
624
625 For help on the use of DBD::CSV, see the DBI users mailing list:
626
627 http://lists.cpan.org/showlist.cgi?name=dbi-users
628
629 For general information on DBI see
630
631 http://dbi.perl.org/ and http://faq.dbi-support.com/
632
634 This module is currently maintained by
635
636 H.Merijn Brand <h.m.brand@xs4all.nl>
637
638 in close cooperation with and help from
639
640 Jens Rehsack <sno@NetBSD.org>
641
642 The original author is Jochen Wiedmann. Previous maintainer was Jeff
643 Zucker
644
646 Copyright (C) 2009-2010 by H.Merijn Brand Copyright (C) 2004-2009 by
647 Jeff Zucker Copyright (C) 1998-2004 by Jochen Wiedmann
648
649 All rights reserved.
650
651 You may distribute this module under the terms of either the GNU
652 General Public License or the Artistic License, as specified in the
653 Perl README file.
654
655
656
657perl v5.12.1 2010-07-09 DBD::CSV(3)