1CSV_XS(3) User Contributed Perl Documentation CSV_XS(3)
2
3
4
NAME
6 Text::CSV_XS - comma-separated values manipulation routines
7
SYNOPSIS
9 use Text::CSV_XS;
10
11 my @rows;
12 my $csv = Text::CSV_XS->new ({ binary => 1 }) or
die "Cannot use CSV: ".Text::CSV_XS->error_diag ();
14 open my $fh, "<:encoding(utf8)", "test.csv" or die "test.csv: $!";
15 while (my $row = $csv->getline ($fh)) {
16 $row->[2] =~ m/pattern/ or next; # 3rd field should match
17 push @rows, $row;
18 }
19 $csv->eof or $csv->error_diag ();
20 close $fh;
21
22 $csv->eol ("\r\n");
23 open $fh, ">:encoding(utf8)", "new.csv" or die "new.csv: $!";
24 $csv->print ($fh, $_) for @rows;
25 close $fh or die "new.csv: $!";
26
DESCRIPTION
28 Text::CSV_XS provides facilities for the composition and decomposition
29 of comma-separated values. An instance of the Text::CSV_XS class can
30 combine fields into a CSV string and parse a CSV string into fields.
31
32 The module accepts either strings or files as input and can utilize any
33 user-specified characters as delimiters, separators, and escapes so it
34 is perhaps better called ASV (anything separated values) rather than
35 just CSV.
36
37 Embedded newlines
38 Important Note: The default behavior is to only accept ascii
39 characters. This means that fields can not contain newlines. If your
40 data contains newlines embedded in fields, or characters above 0x7e
41 (tilde), or binary data, you *must* set "binary => 1" in the call to
42 "new ()". To cover the widest range of parsing options, you will
43 always want to set binary.
44
Setting binary is not enough by itself: you still have to pass a
complete record to the "parse ()" method, and the usual read loop
does not guarantee that:

my $csv = Text::CSV_XS->new ({ binary => 1, eol => $/ });
while (<>) { # WRONG!
    $csv->parse ($_);
    my @fields = $csv->fields ();
}

will break, because the "while (<>)" loop splits its input on line
endings without regard for quoting, so a field with an embedded newline
reaches "parse ()" as a broken, partial record. If you need to support
embedded newlines, the way to go is either

my $csv = Text::CSV_XS->new ({ binary => 1, eol => $/ });
while (my $row = $csv->getline (*ARGV)) {
    my @fields = @$row;
}

or, more safely in perl 5.6 and up

my $csv = Text::CSV_XS->new ({ binary => 1, eol => $/ });
open my $io, "<", $file or die "$file: $!";
while (my $row = $csv->getline ($io)) {
    my @fields = @$row;
}
68
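As a minimal sketch of the "getline ()" approach, the following reads two records from an in-memory file handle. The sample data and the variable names are made up for illustration; the first record carries a newline inside a quoted field.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# Hypothetical sample data: field 2 of the first record contains an
# embedded newline inside quotes.
my $data = qq{1,"first line\nsecond line",3\n4,five,6\n};

my $csv = Text::CSV_XS->new ({ binary => 1 })
    or die "Cannot use CSV: " . Text::CSV_XS->error_diag ();

# An in-memory file handle stands in for a real file here.
open my $io, "<", \$data or die "in-memory handle: $!";
my @rows;
while (my $row = $csv->getline ($io)) {
    push @rows, $row;
}
$csv->eof or $csv->error_diag ();
close $io;

print scalar @rows, " records read\n";
```

Because "getline ()" does the reading itself, it can keep consuming input until the quoted field is closed.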
69 Unicode (UTF8)
On parsing (both for "getline ()" and "parse ()"), if the source is
marked as UTF8, then all fields that are marked binary will also be
marked UTF8.
73
74 On combining ("print ()" and "combine ()"), if any of the combining
75 fields was marked UTF8, the resulting string will be marked UTF8.
76
77 For complete control over encoding, please use Text::CSV::Encoded:
78
79 use Text::CSV::Encoded;
80 my $csv = Text::CSV::Encoded->new ({
81 encoding_in => "iso-8859-1", # the encoding comes into Perl
82 encoding_out => "cp1252", # the encoding comes out of Perl
83 });
84
85 $csv = Text::CSV::Encoded->new ({ encoding => "utf8" });
86 # combine () and print () accept *literally* utf8 encoded data
87 # parse () and getline () return *literally* utf8 encoded data
88
89 $csv = Text::CSV::Encoded->new ({ encoding => undef }); # default
90 # combine () and print () accept UTF8 marked data
91 # parse () and getline () return UTF8 marked data
92
SPECIFICATION
94 While no formal specification for CSV exists, RFC 4180 1) describes a
95 common format and establishes "text/csv" as the MIME type registered
96 with the IANA.
97
98 Many informal documents exist that describe the CSV format. How To: The
99 Comma Separated Value (CSV) File Format 2) provides an overview of the
100 CSV format in the most widely used applications and explains how it can
101 best be used and supported.
102
103 1) http://tools.ietf.org/html/rfc4180
104 2) http://www.creativyst.com/Doc/Articles/CSV/CSV01.htm
105
106 The basic rules are as follows:
107
108 CSV is a delimited data format that has fields/columns separated by the
109 comma character and records/rows separated by newlines. Fields that
110 contain a special character (comma, newline, or double quote), must be
111 enclosed in double quotes. However, if a line contains a single entry
112 which is the empty string, it may be enclosed in double quotes. If a
113 field's value contains a double quote character it is escaped by
114 placing another double quote character next to it. The CSV file format
115 does not require a specific character encoding, byte order, or line
116 terminator format.
117
118 · Each record is one line terminated by a line feed (ASCII/LF=0x0A) or
119 a carriage return and line feed pair (ASCII/CRLF=0x0D 0x0A), however,
120 line-breaks can be embedded.
121
122 · Fields are separated by commas.
123
124 · Allowable characters within a CSV field include 0x09 (tab) and the
125 inclusive range of 0x20 (space) through 0x7E (tilde). In binary mode
126 all characters are accepted, at least in quoted fields.
127
· A field within CSV must be surrounded by double-quotes to contain
the separator character (comma).
130
Though this is the clearest and most restrictive definition,
Text::CSV_XS is far more liberal than this, and allows extensions:
133
134 · Line termination by a single carriage return is accepted by default
135
· The separation-, quote-, and escape- characters can be any ASCII
character in the range from 0x20 (space) to 0x7E (tilde). Characters
outside this range may or may not work as expected. Multibyte
characters, like U+060C (ARABIC COMMA), U+FF0C (FULLWIDTH COMMA),
U+241B (SYMBOL FOR ESCAPE), U+2424 (SYMBOL FOR NEWLINE), U+FF02
(FULLWIDTH QUOTATION MARK), and U+201C (LEFT DOUBLE QUOTATION MARK)
(to give some examples of what might look promising) are therefore not
allowed.
144
145 If you use perl-5.8.2 or higher, these three attributes are
146 utf8-decoded, to increase the likelihood of success. This way U+00FE
147 will be allowed as a quote character.
148
149 · A field within CSV must be surrounded by double-quotes to contain an
150 embedded double-quote, represented by a pair of consecutive double-
151 quotes. In binary mode you may additionally use the sequence ""0" for
152 representation of a NULL byte.
153
154 · Several violations of the above specification may be allowed by
155 passing options to the object creator.
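
The quoting rules above can be illustrated with a small round trip through "combine ()" and "parse ()". This is only a sketch with made-up field values; it assumes the default separator, quote, and escape characters and no "eol".

```perl
use strict;
use warnings;
use Text::CSV_XS;

my $csv = Text::CSV_XS->new ({ binary => 1 })
    or die "Cannot use CSV: " . Text::CSV_XS->error_diag ();

# Illustrative fields: one plain, one with the separator, one with a
# double quote, and one with an embedded newline.
my @in = ("plain", "with,comma", 'say "hi"', "two\nlines");

$csv->combine (@in)
    or die "combine () failed on: " . $csv->error_input ();
my $line = $csv->string ();
print "$line\n";

# Parsing the generated string should give back the original fields.
$csv->parse ($line)
    or die "parse () failed on: " . $csv->error_input ();
my @out = $csv->fields ();
```

Only the fields that contain a special character end up enclosed in double quotes, and the embedded double quotes are doubled.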
156
FUNCTIONS
158 version ()
159 (Class method) Returns the current module version.
160
161 new (\%attr)
162 (Class method) Returns a new instance of Text::CSV_XS. The objects
163 attributes are described by the (optional) hash ref "\%attr".
164 Currently the following attributes are available:
165
166 eol An end-of-line string to add to rows. "undef" is replaced with an
167 empty string. The default is "$\". Common values for "eol" are
168 "\012" (Line Feed) or "\015\012" (Carriage Return, Line Feed).
169 Cannot be longer than 7 (ASCII) characters.
170
If both $/ and "eol" equal "\015", lines that end on only a Carriage
Return without Line Feed will be parsed correctly. Line endings,
whether in $/ or "eol", other than "undef", "\n", "\r\n", or "\r" are
not (yet) supported for parsing.
175
176 sep_char
The char used for separating fields, by default a comma (",").
178 Limited to a single-byte character, usually in the range from 0x20
179 (space) to 0x7e (tilde).
180
181 The separation character can not be equal to the quote character.
182 The separation character can not be equal to the escape character.
183
184 See also CAVEATS
185
186 allow_whitespace
187 When this option is set to true, whitespace (TAB's and SPACE's)
188 surrounding the separation character is removed when parsing. If
189 either TAB or SPACE is one of the three major characters
190 "sep_char", "quote_char", or "escape_char" it will not be
191 considered whitespace.
192
193 So lines like:
194
195 1 , "foo" , bar , 3 , zapp
196
197 are now correctly parsed, even though it violates the CSV specs.
198
199 Note that all whitespace is stripped from start and end of each
200 field. That would make it more a feature than a way to be able to
201 parse bad CSV lines, as
202
203 1, 2.0, 3, ape , monkey
204
205 will now be parsed as
206
207 ("1", "2.0", "3", "ape", "monkey")
208
209 even if the original line was perfectly sane CSV.
210
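A short sketch of "allow_whitespace" in action, using the sample line from the text above:

```perl
use strict;
use warnings;
use Text::CSV_XS;

my $csv = Text::CSV_XS->new ({ allow_whitespace => 1 })
    or die "Cannot use CSV: " . Text::CSV_XS->error_diag ();

# Whitespace around the separators is stripped while parsing.
$csv->parse (q{1 , "foo" , bar , 3 , zapp})
    or die "parse () failed on: " . $csv->error_input ();
my @fields = $csv->fields ();

print join ("|", @fields), "\n";
```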
211 blank_is_undef
212 Under normal circumstances, CSV data makes no distinction between
213 quoted- and unquoted empty fields. They both end up in an empty
214 string field once read, so
215
216 1,"",," ",2
217
218 is read as
219
220 ("1", "", "", " ", "2")
221
222 When writing CSV files with "always_quote" set, the unquoted empty
223 field is the result of an undefined value. To make it possible to
224 also make this distinction when reading CSV data, the
225 "blank_is_undef" option will cause unquoted empty fields to be set
226 to undef, causing the above to be parsed as
227
228 ("1", "", undef, " ", "2")
229
230 empty_is_undef
231 Going one step further than "blank_is_undef", this attribute
232 converts all empty fields to undef, so
233
234 1,"",," ",2
235
236 is read as
237
238 (1, undef, undef, " ", 2)
239
Note that this only affects fields that are really empty, not fields
that are empty after stripping allowed whitespace. YMMV.
242
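The difference between "blank_is_undef" and "empty_is_undef" can be seen by parsing the same line with each attribute in turn. This is a sketch only; the %parsed hash and the display formatting are illustrative choices.

```perl
use strict;
use warnings;
use Text::CSV_XS;

my $line = q{1,"",," ",2};
my %parsed;

for my $attr (qw( blank_is_undef empty_is_undef )) {
    my $csv = Text::CSV_XS->new ({ $attr => 1 })
        or die "Cannot use CSV: " . Text::CSV_XS->error_diag ();
    $csv->parse ($line)
        or die "parse () failed on: " . $csv->error_input ();
    $parsed{$attr} = [ $csv->fields () ];

    # Show undef explicitly so it can be told apart from "".
    my @shown = map { defined $_ ? qq{"$_"} : "undef" } @{$parsed{$attr}};
    print "$attr: (", join (", ", @shown), ")\n";
}
```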
243 quote_char
244 The char used for quoting fields containing blanks, by default the
245 double quote character ("""). A value of undef suppresses quote
246 chars. (For simple cases only). Limited to a single-byte
247 character, usually in the range from 0x20 (space) to 0x7e (tilde).
248
249 The quote character can not be equal to the separation character.
250
251 allow_loose_quotes
252 By default, parsing fields that have "quote_char" characters inside
253 an unquoted field, like
254
255 1,foo "bar" baz,42
256
would result in a parse error. Though it is still bad practice to
allow this format, we cannot help that there are some vendors that
make their applications spit out lines styled like this.
260
261 In case there is really bad CSV data, like
262
263 1,"foo "bar" baz",42
264
265 or
266
267 1,""foo bar baz"",42
268
269 there is a way to get that parsed, and leave the quotes inside the
270 quoted field as-is. This can be achieved by setting
271 "allow_loose_quotes" AND making sure that the "escape_char" is not
272 equal to "quote_char".
273
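As a sketch of the first case above, the same line is parsed once with the default settings and once with "allow_loose_quotes" set. The variable names are illustrative.

```perl
use strict;
use warnings;
use Text::CSV_XS;

my $line = q{1,foo "bar" baz,42};

# Default settings: quotes inside an unquoted field are an error.
my $strict    = Text::CSV_XS->new ();
my $ok_strict = $strict->parse ($line);

# With allow_loose_quotes the quotes are kept as part of the field.
my $loose    = Text::CSV_XS->new ({ allow_loose_quotes => 1 });
my $ok_loose = $loose->parse ($line);
my @fields   = $ok_loose ? $loose->fields () : ();

print "strict: ", ($ok_strict ? "ok" : "failed"), "\n";
print "loose:  ", ($ok_loose  ? "ok" : "failed"), "\n";
```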
274 escape_char
275 The character used for escaping certain characters inside quoted
276 fields. Limited to a single-byte character, usually in the range
277 from 0x20 (space) to 0x7e (tilde).
278
279 The "escape_char" defaults to being the literal double-quote mark
280 (""") in other words, the same as the default "quote_char". This
281 means that doubling the quote mark in a field escapes it:
282
283 "foo","bar","Escape ""quote mark"" with two ""quote marks""","baz"
284
285 If you change the default quote_char without changing the default
286 escape_char, the escape_char will still be the quote mark. If
287 instead you want to escape the quote_char by doubling it, you will
288 need to change the escape_char to be the same as what you changed
289 the quote_char to.
290
291 The escape character can not be equal to the separation character.
292
293 allow_loose_escapes
294 By default, parsing fields that have "escape_char" characters that
295 escape characters that do not need to be escaped, like:
296
297 my $csv = Text::CSV_XS->new ({ escape_char => "\\" });
$csv->parse (q{1,"my bar\'s",baz,42});
299
300 would result in a parse error. Though it is still bad practice to
301 allow this format, this option enables you to treat all escape
302 character sequences equal.
303
304 binary
305 If this attribute is TRUE, you may use binary characters in quoted
306 fields, including line feeds, carriage returns and NULL bytes. (The
307 latter must be escaped as ""0".) By default this feature is off.
308
If a string is marked UTF8, binary will be turned on automatically
when binary characters other than CR or NL are encountered. Note
that a simple string like "\x{00a0}" might still be binary, but not
marked UTF8, so setting "{ binary => 1 }" is still a wise option.
313
314 types
315 A set of column types; this attribute is immediately passed to the
316 types method below. You must not set this attribute otherwise,
317 except for using the types method. For details see the description
318 of the types method below.
319
320 always_quote
By default the generated fields are quoted only if they need to be,
for example, if they contain the separator. If you set this
attribute to a TRUE value, then all fields will be quoted. This is
typically easier to handle in external applications. (Poor
creatures who aren't using Text::CSV_XS. :-)
326
327 quote_space
By default, a space in a field triggers quotation. As no rule in CSV
requires this, nor any rule forbids it, the default is true for
safety. You can exclude the space from this trigger by setting this
attribute to 0.
332
333 quote_null
By default, a NULL byte in a field would be escaped. This attribute
enables you to treat the NULL byte as a simple binary character in
binary mode (when "{ binary => 1 }" is set). The default is true.
You can prevent NULL escapes by setting this attribute to 0.
338
339 keep_meta_info
340 By default, the parsing of input lines is as simple and fast as
341 possible. However, some parsing information - like quotation of the
342 original field - is lost in that process. Set this flag to true to
343 be able to retrieve that information after parsing with the methods
344 "meta_info ()", "is_quoted ()", and "is_binary ()" described below.
345 Default is false.
346
347 verbatim
348 This is a quite controversial attribute to set, but it makes hard
349 things possible.
350
351 The basic thought behind this is to tell the parser that the
352 normally special characters newline (NL) and Carriage Return (CR)
353 will not be special when this flag is set, and be dealt with as
354 being ordinary binary characters. This will ease working with data
355 with embedded newlines.
356
357 When "verbatim" is used with "getline ()", "getline ()" auto-
358 chomp's every line.
359
360 Imagine a file format like
361
362 M^^Hans^Janssen^Klas 2\n2A^Ja^11-06-2007#\r\n
363
364 where, the line ending is a very specific "#\r\n", and the sep_char
365 is a ^ (caret). None of the fields is quoted, but embedded binary
366 data is likely to be present. With the specific line ending, that
367 shouldn't be too hard to detect.
368
369 By default, Text::CSV_XS' parse function however is instructed to
370 only know about "\n" and "\r" to be legal line endings, and so has
371 to deal with the embedded newline as a real end-of-line, so it can
372 scan the next line if binary is true, and the newline is inside a
373 quoted field. With this attribute however, we can tell parse () to
374 parse the line as if \n is just nothing more than a binary
375 character.
376
377 For parse () this means that the parser has no idea about line
378 ending anymore, and getline () chomps line endings on reading.
379
380 auto_diag
Setting this to true will cause "error_diag ()" to be called
automatically in void context upon errors.
383
384 In case of error "2012 - EOF", this call will be void.
385
386 If set to a value greater than 1, it will die on errors instead of
387 warn.
388
Future extensions to this feature will include more reliable
auto-detection of the "autodie" module being enabled, which will
raise the value of "auto_diag" by 1 at the moment the error is
detected.
393
394 To sum it up,
395
396 $csv = Text::CSV_XS->new ();
397
398 is equivalent to
399
400 $csv = Text::CSV_XS->new ({
401 quote_char => '"',
402 escape_char => '"',
403 sep_char => ',',
404 eol => $\,
405 always_quote => 0,
406 quote_space => 1,
407 quote_null => 1,
408 binary => 0,
409 keep_meta_info => 0,
410 allow_loose_quotes => 0,
411 allow_loose_escapes => 0,
412 allow_whitespace => 0,
413 blank_is_undef => 0,
414 empty_is_undef => 0,
415 verbatim => 0,
416 auto_diag => 0,
417 });
418
419 For all of the above mentioned flags, there is an accessor method
420 available where you can inquire for the current value, or change the
421 value
422
423 my $quote = $csv->quote_char;
424 $csv->binary (1);
425
426 It is unwise to change these settings halfway through writing CSV data
427 to a stream. If however, you want to create a new stream using the
428 available CSV object, there is no harm in changing them.
429
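A brief sketch of the accessor methods, changing settings before any data is written. The field values are illustrative.

```perl
use strict;
use warnings;
use Text::CSV_XS;

my $csv = Text::CSV_XS->new ({ binary => 1 })
    or die "Cannot use CSV: " . Text::CSV_XS->error_diag ();

# Without an argument an accessor returns the current value ...
my $old_sep = $csv->sep_char;

# ... and with an argument it changes the value.
$csv->sep_char (";");
$csv->always_quote (1);

$csv->combine ("a", "b;c", 3)
    or die "combine () failed on: " . $csv->error_input ();
my $line = $csv->string ();
print "was <$old_sep>, now: $line\n";
```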
430 If the "new ()" constructor call fails, it returns "undef", and makes
431 the fail reason available through the "error_diag ()" method.
432
433 $csv = Text::CSV_XS->new ({ ecs_char => 1 }) or
434 die "".Text::CSV_XS->error_diag ();
435
436 "error_diag ()" will return a string like
437
438 "INI - Unknown attribute 'ecs_char'"
439
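The failure path can be sketched as follows. The misspelled attribute is deliberate, and the exact wording of the message may differ between releases.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# "ecs_char" is a deliberate typo for "escape_char".
my $csv = Text::CSV_XS->new ({ ecs_char => 1 });

# On failure the object is undef and the reason is available by
# calling error_diag () as a class method.
my $reason = defined $csv ? "" : "" . Text::CSV_XS->error_diag ();
print "new () failed: $reason\n" unless defined $csv;
```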
440 print
441 $status = $csv->print ($io, $colref);
442
443 Similar to "combine () + string () + print", but more efficient. It
444 expects an array ref as input (not an array!) and the resulting string
445 is not really created, but immediately written to the $io object,
446 typically an IO handle or any other object that offers a print method.
447 Note, this implies that the following is wrong in perl 5.005_xx and
448 older:
449
450 open FILE, ">", "whatever";
451 $status = $csv->print (\*FILE, $colref);
452
453 as in perl 5.005 and older, the glob "\*FILE" is not an object, thus it
454 doesn't have a print method. The solution is to use an IO::File object
455 or to hide the glob behind an IO::Wrap object. See IO::File and
456 IO::Wrap for details.
457
For performance reasons the print method doesn't create a result
string. In particular the $csv->string (), $csv->status (),
$csv->fields () and $csv->error_input () methods are meaningless after
executing this method.
462
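A small sketch of "print ()" writing array refs to a handle. An in-memory file handle is used here so the result can be inspected; the rows are made-up sample data, and "eol" is set explicitly so every row ends in a newline.

```perl
use strict;
use warnings;
use IO::Handle;
use Text::CSV_XS;

my $csv = Text::CSV_XS->new ({ binary => 1, eol => "\n" })
    or die "Cannot use CSV: " . Text::CSV_XS->error_diag ();

my $buf = "";
open my $fh, ">", \$buf or die "in-memory handle: $!";

# Each row is passed as an array ref, not as a list.
my @rows = (
    [ "id", "name" ],
    [ 1, "Ada Lovelace" ],
    [ 2, 'Grace "Amazing" Hopper' ],
    );
for my $row (@rows) {
    $csv->print ($fh, $row) or $csv->error_diag ();
}
close $fh or die "in-memory handle: $!";

print $buf;
```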
463 combine
464 $status = $csv->combine (@columns);
465
466 This object function constructs a CSV string from the arguments,
467 returning success or failure. Failure can result from lack of
468 arguments or an argument containing an invalid character. Upon
469 success, "string ()" can be called to retrieve the resultant CSV
470 string. Upon failure, the value returned by "string ()" is undefined
471 and "error_input ()" can be called to retrieve an invalid argument.
472
473 string
474 $line = $csv->string ();
475
476 This object function returns the input to "parse ()" or the resultant
477 CSV string of "combine ()", whichever was called more recently.
478
479 getline
480 $colref = $csv->getline ($io);
481
482 This is the counterpart to print, like parse is the counterpart to
483 combine: It reads a row from the IO object $io using $io->getline ()
484 and parses this row into an array ref. This array ref is returned by
485 the function or undef for failure.
486
487 When fields are bound with "bind_columns ()", the return value is a
488 reference to an empty list.
489
490 The $csv->string (), $csv->fields () and $csv->status () methods are
491 meaningless, again.
492
493 parse
494 $status = $csv->parse ($line);
495
This object function decomposes a CSV string into fields, returning
success or failure. Failure can result from a lack of argument or from
the given CSV string being improperly formatted. Upon success,
"fields ()" can be called to retrieve the decomposed fields. Upon
failure, the value returned by "fields ()" is undefined and
"error_input ()" can be called to retrieve the invalid argument.
502
503 You may use the types () method for setting column types. See the
504 description below.
505
506 getline_hr
507 The "getline_hr ()" and "column_names ()" methods work together to
508 allow you to have rows returned as hashrefs. You must call
509 "column_names ()" first to declare your column names.
510
511 $csv->column_names (qw( code name price description ));
512 $hr = $csv->getline_hr ($io);
513 print "Price for $hr->{name} is $hr->{price} EUR\n";
514
515 "getline_hr ()" will croak if called before "column_names ()".
516
517 column_names
518 Set the keys that will be used in the "getline_hr ()" calls. If no keys
519 (column names) are passed, it'll return the current setting.
520
521 "column_names ()" accepts a list of scalars (the column names) or a
522 single array_ref, so you can pass "getline ()"
523
524 $csv->column_names ($csv->getline ($io));
525
526 "column_names ()" does no checking on duplicates at all, which might
527 lead to unwanted results. Undefined entries will be replaced with the
528 string "\cAUNDEF\cA", so
529
530 $csv->column_names (undef, "", "name", "name");
531 $hr = $csv->getline_hr ($io);
532
Will set "$hr->{"\cAUNDEF\cA"}" to the 1st field, "$hr->{""}" to the
2nd field, and "$hr->{name}" to the 4th field, discarding the 3rd
field.
536
537 "column_names ()" croaks on invalid arguments.
538
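Putting "column_names ()" and "getline_hr ()" together, a file with a header line might be read like this. The sketch uses made-up in-memory data in place of a real file.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# Hypothetical data: the first line holds the column names.
my $data = "code,name,price\nA1,Widget,9.95\nB2,Gadget,19.50\n";
open my $io, "<", \$data or die "in-memory handle: $!";

my $csv = Text::CSV_XS->new ({ binary => 1 })
    or die "Cannot use CSV: " . Text::CSV_XS->error_diag ();

# Use the header line as the keys for the hashrefs that follow.
$csv->column_names ($csv->getline ($io));

my @items;
while (my $hr = $csv->getline_hr ($io)) {
    push @items, $hr;
    print "Price for $hr->{name} is $hr->{price} EUR\n";
}
$csv->eof or $csv->error_diag ();
close $io;
```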
539 bind_columns
Takes a list of references to scalars in which to store the fields
fetched by "getline ()". When you don't pass enough references to
store the fetched fields in, "getline ()" will fail. If you pass more
than there are fields to return, the remaining references are left
untouched.
544
545 $csv->bind_columns (\$code, \$name, \$price, \$description);
546 while ($csv->getline ($io)) {
547 print "The price of a $name is \x{20ac} $price\n";
548 }
549
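A self-contained sketch of "bind_columns ()", again with made-up in-memory data. Note that the return value of "getline ()" is only used as a true/false indicator here; the fields arrive in the bound scalars.

```perl
use strict;
use warnings;
use Text::CSV_XS;

my $data = "A1,Widget,9.95\nB2,Gadget,19.50\n";
open my $io, "<", \$data or die "in-memory handle: $!";

my $csv = Text::CSV_XS->new ({ binary => 1 })
    or die "Cannot use CSV: " . Text::CSV_XS->error_diag ();

# Each getline () call stores the fields in these scalars.
my ($code, $name, $price);
$csv->bind_columns (\$code, \$name, \$price);

my @seen;
while ($csv->getline ($io)) {
    push @seen, "$code:$name:$price";
}
$csv->eof or $csv->error_diag ();
close $io;

print "$_\n" for @seen;
```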
550 eof
551 $eof = $csv->eof ();
552
553 If "parse ()" or "getline ()" was used with an IO stream, this method
554 will return true (1) if the last call hit end of file, otherwise it
555 will return false (''). This is useful to see the difference between a
556 failure and end of file.
557
558 types
559 $csv->types (\@tref);
560
561 This method is used to force that columns are of a given type. For
562 example, if you have an integer column, two double columns and a string
563 column, then you might do a
564
565 $csv->types ([Text::CSV_XS::IV (),
566 Text::CSV_XS::NV (),
567 Text::CSV_XS::NV (),
568 Text::CSV_XS::PV ()]);
569
570 Column types are used only for decoding columns, in other words by the
571 parse () and getline () methods.
572
573 You can unset column types by doing a
574
575 $csv->types (undef);
576
577 or fetch the current type settings with
578
579 $types = $csv->types ();
580
581 IV Set field type to integer.
582
583 NV Set field type to numeric/float.
584
585 PV Set field type to string.
586
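A hedged sketch of column types: the same text "007" is parsed once as an integer column and once as a string column. The input line is made up, and the output is formatted numerically so it does not depend on how the converted values stringify.

```perl
use strict;
use warnings;
use Text::CSV_XS;

my $csv = Text::CSV_XS->new ({ binary => 1 })
    or die "Cannot use CSV: " . Text::CSV_XS->error_diag ();

# Column 1 integer, column 2 numeric/float, column 3 string.
$csv->types ([
    Text::CSV_XS::IV (),
    Text::CSV_XS::NV (),
    Text::CSV_XS::PV (),
    ]);

$csv->parse ("007,1.50,007")
    or die "parse () failed on: " . $csv->error_input ();
my @fields = $csv->fields ();

printf "%d %.2f %s\n", @fields;
```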
587 fields
588 @columns = $csv->fields ();
589
This object function returns the input to "combine ()" or the resultant
decomposed fields of a successful "parse ()", whichever was called more
recently.
593
594 Note that the return value is undefined after using "getline ()", which
595 does not fill the data structures returned by "parse ()".
596
597 meta_info
598 @flags = $csv->meta_info ();
599
600 This object function returns the flags of the input to "combine ()" or
601 the flags of the resultant decomposed fields of "parse ()", whichever
602 was called more recently.
603
604 For each field, a meta_info field will hold flags that tell something
605 about the field returned by the "fields ()" method or passed to the
606 "combine ()" method. The flags are bitwise-or'd like:
607
608 0x0001
609 The field was quoted.
610
611 0x0002
612 The field was binary.
613
614 See the "is_*** ()" methods below.
615
616 is_quoted
617 my $quoted = $csv->is_quoted ($column_idx);
618
619 Where $column_idx is the (zero-based) index of the column in the last
620 result of "parse ()".
621
622 This returns a true value if the data in the indicated column was
623 enclosed in "quote_char" quotes. This might be important for data where
624 ",20070108," is to be treated as a numeric value, and where
625 ","20070108"," is explicitly marked as character string data.
626
627 is_binary
628 my $binary = $csv->is_binary ($column_idx);
629
630 Where $column_idx is the (zero-based) index of the column in the last
631 result of "parse ()".
632
633 This returns a true value if the data in the indicated column contained
634 any byte in the range [\x00-\x08,\x10-\x1F,\x7F-\xFF]
635
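The meta information methods only work when "keep_meta_info" is set. A minimal sketch, using the date-like value from the text above as sample data:

```perl
use strict;
use warnings;
use Text::CSV_XS;

my $csv = Text::CSV_XS->new ({ binary => 1, keep_meta_info => 1 })
    or die "Cannot use CSV: " . Text::CSV_XS->error_diag ();

# Same value twice: once unquoted, once quoted.
$csv->parse (q{20070108,"20070108",plain})
    or die "parse () failed on: " . $csv->error_input ();

my @fields = $csv->fields ();
my @flags  = $csv->meta_info ();

for my $idx (0 .. $#fields) {
    printf "%d: %-10s %s\n", $idx, $fields[$idx],
        $csv->is_quoted ($idx) ? "quoted" : "not quoted";
}
```

A caller could use this to treat the first column as a number and the second as character data, even though both hold the same text.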
636 status
637 $status = $csv->status ();
638
639 This object function returns success (or failure) of "combine ()" or
640 "parse ()", whichever was called more recently.
641
642 error_input
643 $bad_argument = $csv->error_input ();
644
645 This object function returns the erroneous argument (if it exists) of
646 "combine ()" or "parse ()", whichever was called more recently.
647
648 error_diag
649 Text::CSV_XS->error_diag ();
650 $csv->error_diag ();
651 $error_code = 0 + $csv->error_diag ();
652 $error_str = "" . $csv->error_diag ();
653 ($cde, $str, $pos) = $csv->error_diag ();
654
If (and only if) an error occurred, this function returns the
diagnostics of that error.
657
658 If called in void context, it will print the internal error code and
659 the associated error message to STDERR.
660
If called in list context, it will return the error code and the error
message in that order. If the last error was from parsing, the third
value returned is a best guess at the location within the line that was
being parsed. Its value is 1-based. See "examples/csv-check" for how
this can be used.
666
667 If called in scalar context, it will return the diagnostics in a single
668 scalar, a-la $!. It will contain the error code in numeric context, and
669 the diagnostics message in string context.
670
671 When called as a class method or a direct function call, the error diag
672 is that of the last "new ()" call.
673
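The different calling contexts can be sketched with a line that is known to be bad. The input is made up, with a quoted field that is never closed; the exact error code and message depend on the module version, so they are printed here without being assumed.

```perl
use strict;
use warnings;
use Text::CSV_XS;

my $csv = Text::CSV_XS->new ({ binary => 1 })
    or die "Cannot use CSV: " . Text::CSV_XS->error_diag ();

# The quoted field is never terminated, so parse () should fail.
my $bad = q{1,"unterminated};
my ($code, $str, $pos, $num, $txt);

unless ($csv->parse ($bad)) {
    # List context: code, message, and position in the line.
    ($code, $str, $pos) = $csv->error_diag ();

    # Scalar context: numeric and string views of the same value.
    $num = 0  + $csv->error_diag ();
    $txt = "" . $csv->error_diag ();

    print "error $code at position $pos: $str\n";
    print "offending input: ", $csv->error_input (), "\n";
}
```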
674 SetDiag
675 $csv->SetDiag (0);
676
677 Use to reset the diagnostics if you are dealing with errors.
678
INTERNALS
680 Combine (...)
681 Parse (...)
682
683 The arguments to these two internal functions are deliberately not
684 described or documented to enable the module author(s) to change it
685 when they feel the need for it and using them is highly discouraged as
686 the API may change in future releases.
687
EXAMPLES
689 Reading a CSV file line by line:
690
691 my $csv = Text::CSV_XS->new ({ binary => 1 });
692 open my $fh, "<", "file.csv" or die "file.csv: $!";
693 while (my $row = $csv->getline ($fh)) {
694 # do something with @$row
695 }
696 $csv->eof or $csv->error_diag;
697 close $fh or die "file.csv: $!";
698
699 Parsing CSV strings:
700
701 my $csv = Text::CSV_XS->new ({ keep_meta_info => 1, binary => 1 });
702
703 my $sample_input_string =
704 qq{"I said, ""Hi!""",Yes,"",2.34,,"1.09","\x{20ac}",};
705 if ($csv->parse ($sample_input_string)) {
706 my @field = $csv->fields;
707 foreach my $col (0 .. $#field) {
708 my $quo = $csv->is_quoted ($col) ? $csv->{quote_char} : "";
709 printf "%2d: %s%s%s\n", $col, $quo, $field[$col], $quo;
710 }
711 }
712 else {
713 print STDERR "parse () failed on argument: ",
714 $csv->error_input, "\n";
715 $csv->error_diag ();
716 }
717
718 An example for creating CSV files using the "print ()" method, like in
719 dumping the content of a database ($dbh) table ($tbl) to CSV:
720
721 my $csv = Text::CSV_XS->new ({ binary => 1, eol => $/ });
722 open my $fh, ">", "$tbl.csv" or die "$tbl.csv: $!";
723 my $sth = $dbh->prepare ("select * from $tbl");
724 $sth->execute;
725 $csv->print ($fh, $sth->{NAME_lc});
726 while (my $row = $sth->fetch) {
727 $csv->print ($fh, $row) or $csv->error_diag;
728 }
729 close $fh or die "$tbl.csv: $!";
730
731 or using the slower "combine ()" and "string ()" methods:
732
733 my $csv = Text::CSV_XS->new;
734
735 open my $csv_fh, ">", "hello.csv" or die "hello.csv: $!";
736
737 my @sample_input_fields = (
738 'You said, "Hello!"', 5.67,
739 '"Surely"', '', '3.14159');
740 if ($csv->combine (@sample_input_fields)) {
741 print $csv_fh $csv->string, "\n";
742 }
743 else {
744 print "combine () failed on argument: ",
745 $csv->error_input, "\n";
746 }
747 close $csv_fh or die "hello.csv: $!";
748
749 For more extended examples, see the "examples/" subdirectory in the
750 original distribution or the git repository at
751 http://repo.or.cz/w/Text-CSV_XS.git?a=tree;f=examples. The following
752 files can be found there:
753
754 parser-xs.pl
755 This can be used as a boilerplate to `fix' bad CSV and parse beyond
756 errors.
757
758 $ perl examples/parser-xs.pl bad.csv >good.csv
759
760 csv-check
761 This is a command-line tool that uses parser-xs.pl techniques to
762 check the CSV file and report on its content.
763
764 $ csv-check files/utf8.csv
765 Checked with examples/csv-check 1.2 using Text::CSV_XS 0.61
766 OK: rows: 1, columns: 2
767 sep = <,>, quo = <">, bin = <1>
768
769 csv2xls
770 A script to convert CSV to Microsoft Excel. This requires Date::Calc
771 and Spreadsheet::WriteExcel. The converter accepts various options
772 and can produce UTF-8 Excel files.
773
774 csvdiff
775 A script that provides colorized diff on sorted CSV files, assuming
776 first line is header and first field is the key. Output options
777 include colorized ANSI escape codes or HTML.
778
779 $ csvdiff --html --output=diff.html file1.csv file2.csv
780
CAVEATS
782 "Text::CSV_XS" is not designed to detect the characters used for field
783 separation and quoting. The parsing is done using predefined settings.
784 In the examples subdirectory, you can find scripts that demonstrate how
785 you can try to detect these characters yourself.
786
787 Microsoft Excel
788 The import/export from Microsoft Excel is a risky task, according to
789 the documentation in "Text::CSV::Separator". Microsoft uses the
790 system's default list separator defined in the regional settings, which
791 happens to be a semicolon for Dutch, German and Spanish (and probably
792 some others as well). For the English locale, the default is a comma.
793 In Windows however, the user is free to choose a predefined locale, and
794 then change every individual setting in it, so checking the locale is
795 no solution.
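
When the separator is known in advance, for example a semicolon, it can simply be passed to the constructor. A minimal sketch with a made-up line; nothing is detected automatically here:

```perl
use strict;
use warnings;
use Text::CSV_XS;

# The separator is chosen by the caller, not detected.
my $csv = Text::CSV_XS->new ({ binary => 1, sep_char => ";" })
    or die "Cannot use CSV: " . Text::CSV_XS->error_diag ();

# With ";" as separator a comma is just an ordinary character.
$csv->parse (q{Naam;"Amsterdam; Noord";12,50})
    or die "parse () failed on: " . $csv->error_input ();
my @fields = $csv->fields ();

print join ("|", @fields), "\n";
```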
796
TODO
798 More Errors & Warnings
799 New extensions ought to be clear and concise in reporting what error
800 occurred where and why, and possibly also tell a remedy to the
801 problem. error_diag is a (very) good start, but there is more work
802 to be done here.
803
804 Basic calls should croak or warn on illegal parameters. Errors should
805 be documented.
806
807 setting meta info
808 Future extensions might include extending the "meta_info ()",
809 "is_quoted ()", and "is_binary ()" to accept setting these flags for
810 fields, so you can specify which fields are quoted in the combine
811 ()/string () combination.
812
813 $csv->meta_info (0, 1, 1, 3, 0, 0);
814 $csv->is_quoted (3, 1);
815
816 combined methods
Requests for adding means (methods) that combine "combine ()" and
"string ()" in a single call will not be honored. Likewise for "parse
()" and "fields ()". Given the trouble with embedded newlines, using
"getline ()" and "print ()" instead is the preferred way to go.
821
822 Parse the whole file at once
Implement a new method that enables the parsing of a complete file
at once, returning a list of hashes. A possible extension to this
could be to enable a column selection on the call:
826
827 my @AoH = $csv->parse_file ($filename, { cols => [ 1, 4..8, 12 ]});
828
829 Returning something like
830
831 [ { fields => [ 1, 2, "foo", 4.5, undef, "", 8 ],
832 flags => [ ... ],
833 errors => [ ... ],
834 },
835 { fields => [ ... ],
836 .
837 .
838 },
839 ]
840
841 EBCDIC
The hard-coding of characters and character ranges makes this module
unusable on EBCDIC systems. Using some #ifdef structure could enable
these again without losing speed. Testing would be the hard part.
845
Release plan
847 No guarantees, but this is what I have in mind right now:
848
849 next
850 - This might very well be 1.00
- DIAGNOSTICS section in pod to *describe* the errors (see below)
852 - croak / carp
853
854 next + 1
855 - csv2csv - a script to regenerate a CSV file to follow standards
856 - EBCDIC support
857
DIAGNOSTICS
859 Still under construction ...
860
If an error occurred, "$csv->error_diag ()" can be used to get more
information on the cause of the failure. Note that for speed reasons,
the internal value is never cleared on success, so using the value
returned by "error_diag ()" in normal cases - when no error occurred -
may cause unexpected results.
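
Because the value is not cleared on success, the diagnostic is best
consulted only after a call has returned false. A minimal sketch,
assuming that "error_diag ()" in list context returns the error code
followed by the error message; the sample lines are made up for
illustration:

```perl
use strict;
use warnings;
use Text::CSV_XS;

my $csv = Text::CSV_XS->new ({ binary => 1 });

# One well-formed line and one with a quoted field that is never closed
my @lines = ('1,"ok",2', '1,"unterminated');
my @report;
for my $line (@lines) {
    if ($csv->parse ($line)) {
        my @f = $csv->fields ();
        push @report, "ok: " . scalar (@f) . " fields";
    }
    else {
        # Only now is the diagnostic meaningful
        my ($code, $msg) = $csv->error_diag ();
        push @report, "failed: $code - $msg";
    }
}
print "$_\n" for @report;
```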
866
If the constructor failed, the cause can be found using "error_diag ()"
as a class method, like "Text::CSV_XS->error_diag ()".
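
A small sketch of the class-method form. The attribute name
"ecs_char" is deliberately misspelled and unsupported; it is used
here only to make the constructor fail, on the assumption that an
unknown attribute causes "new ()" to return undef.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# "ecs_char" is not a supported attribute (it is a typo for
# "escape_char"), so the constructor is expected to return undef.
my $csv = Text::CSV_XS->new ({ ecs_char => ":" });

# No object exists to ask, so use the class method. Concatenation
# forces scalar context, which yields the error message.
my $reason = defined $csv ? "" : "" . Text::CSV_XS->error_diag ();
defined $csv or warn "Cannot use CSV: $reason\n";
```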
869
"$csv->error_diag ()" is automatically called upon error when the
constructor was called with "auto_diag" set to 1 or 2, or when "autodie"
is in effect. When set to 1, this will cause a "warn ()" with the
error message, when set to 2, it will "die ()". "2012 - EOF" is
excluded from "auto_diag" reports.
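
As an illustration of "auto_diag" set to 2, the sketch below wraps a
failing "parse ()" in "eval" so the "die ()" can be caught. The input
line is made up, and the exact text of the message is not relied
upon.

```perl
use strict;
use warnings;
use Text::CSV_XS;

my $csv = Text::CSV_XS->new ({ auto_diag => 2 });

# The quoted field is never closed, so parse () fails and, with
# auto_diag set to 2, the failure is raised as an exception.
my $ok   = eval { $csv->parse ('1,"unterminated') };
my $died = $@;

$died and print "caught: $died";
```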
875
Currently the errors described below are available. I've tried to
make the error itself explanatory enough, but more descriptions will
be added. For most of these errors, the first three capitals describe
the error category:
880
881 INI
882 Initialization error or option conflict.
883
884 ECR
885 Carriage-Return related parse error.
886
887 EOF
888 End-Of-File related parse error.
889
890 EIQ
891 Parse error inside quotation.
892
893 EIF
894 Parse error inside field.
895
896 ECB
897 Combine error.
898
899 EHR
900 HashRef parse related error.
901
902 1001 "INI - sep_char is equal to quote_char or escape_char"
903 The separation character cannot be equal to either the quotation
904 character or the escape character, as that will invalidate all
905 parsing rules.
906
907 1002 "INI - allow_whitespace with escape_char or quote_char SP or TAB"
908 Using "allow_whitespace" when either "escape_char" or "quote_char" is
909 equal to SPACE or TAB is too ambiguous to allow.
910
911 1003 "INI - \r or \n in main attr not allowed"
912 Using default "eol" characters in either "sep_char", "quote_char", or
913 "escape_char" is not allowed.
914
915 2010 "ECR - QUO char inside quotes followed by CR not part of EOL"
916 When "eol" has been set to something specific, other than the
917 default, like "\r\t\n", and the "\r" is following the second
918 (closing) "quote_char", where the characters following the "\r" do
919 not make up the "eol" sequence, this is an error.
920
921 2011 "ECR - Characters after end of quoted field"
922 Sequences like "1,foo,"bar"baz,2" are not allowed. "bar" is a quoted
923 field, and after the closing quote, there should be either a new-line
924 sequence or a separation character.
925
926 2012 "EOF - End of data in parsing input stream"
End-of-file was reached while parsing a stream. This can only happen
when reading from streams with "getline ()", as "parse ()" works on
strings that are not required to have a trailing "eol".
931
932 2021 "EIQ - NL char inside quotes, binary off"
933 Sequences like "1,"foo\nbar",2" are only allowed when the binary
934 option has been selected with the constructor.
935
936 2022 "EIQ - CR char inside quotes, binary off"
937 Sequences like "1,"foo\rbar",2" are only allowed when the binary
938 option has been selected with the constructor.
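
The effect of the binary option on errors 2021 and 2022 can be seen
by parsing the same string with and without it. This is a small
sketch; the variable names are only for illustration.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# A quoted field with an embedded newline
my $line = qq{1,"foo\nbar",2};

my $plain  = Text::CSV_XS->new ();                 # binary off (default)
my $binary = Text::CSV_XS->new ({ binary => 1 });

my $plain_ok  = $plain->parse ($line);    # rejected: NL inside quotes
my $binary_ok = $binary->parse ($line);   # accepted
my @fields    = $binary_ok ? $binary->fields () : ();

printf "binary off: %s, binary on: %s\n",
    $plain_ok  ? "parsed" : "failed",
    $binary_ok ? "parsed" : "failed";
```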
939
940 2023 "EIQ - QUO character not allowed"
941 Sequences like ""foo "bar" baz",quux" and "2023,",2008-04-05,"Foo,
942 Bar",\n" will cause this error.
943
944 2024 "EIQ - EOF cannot be escaped, not even inside quotes"
945 The escape character is not allowed as last character in an input
946 stream.
947
948 2025 "EIQ - Loose unescaped escape"
An escape character should escape only characters that need escaping.
Allowing the escape for other characters is possible with the
"allow_loose_escapes" attribute.
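
A short sketch of the difference, using a backslash as "escape_char"
and the attribute as it is spelled in the module,
"allow_loose_escapes". The input escapes a single quote, which does
not need escaping.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# q{} keeps the backslash literally, so the field contains \'
my $line = q{1,"my bar\'s",baz,42};

my $strict = Text::CSV_XS->new ({ escape_char => "\\" });
my $loose  = Text::CSV_XS->new ({ escape_char => "\\",
                                  allow_loose_escapes => 1 });

my $strict_ok = $strict->parse ($line);   # fails: loose escape
my $loose_ok  = $loose->parse ($line);    # escape is accepted
my @fields    = $loose_ok ? $loose->fields () : ();
```

Allowing this format is generally bad practice; the attribute is
meant for input you cannot fix at the source.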
952
953 2026 "EIQ - Binary character inside quoted field, binary off"
Binary characters are not allowed by default. Exceptions are fields
that contain valid UTF-8, which are automatically upgraded. Pass the
"binary" attribute with a true value to accept binary characters.
958
959 2027 "EIQ - Quoted field not terminated"
960 When parsing a field that started with a quotation character, the
961 field is expected to be closed with a quotation character. When the
962 parsed line is exhausted before the quote is found, that field is not
963 terminated.
964
965 2030 "EIF - NL char inside unquoted verbatim, binary off"
966 2031 "EIF - CR char is first char of field, not part of EOL"
967 2032 "EIF - CR char inside unquoted, not part of EOL"
968 2034 "EIF - Loose unescaped quote"
969 2035 "EIF - Escaped EOF in unquoted field"
970 2036 "EIF - ESC error"
971 2037 "EIF - Binary character in unquoted field, binary off"
972 2110 "ECB - Binary character in Combine, binary off"
973 2200 "EIO - print to IO failed. See errno"
974 3001 "EHR - Unsupported syntax for column_names ()"
975 3002 "EHR - getline_hr () called before column_names ()"
976 3003 "EHR - bind_columns () and column_names () fields count mismatch"
977 3004 "EHR - bind_columns () only accepts refs to scalars"
978 3006 "EHR - bind_columns () did not pass enough refs for parsed fields"
979 3007 "EHR - bind_columns needs refs to writeable scalars"
980 3008 "EHR - unexpected error in bound fields"
981
983 perl, IO::File, IO::Handle, IO::Wrap, Text::CSV, Text::CSV_PP,
984 Text::CSV::Encoded, Text::CSV::Separator, and Spreadsheet::Read.
985
987 Alan Citterman <alan@mfgrtl.com> wrote the original Perl module. Please
988 don't send mail concerning Text::CSV_XS to Alan, as he's not involved
989 in the C part which is now the main part of the module.
990
991 Jochen Wiedmann <joe@ispsoft.de> rewrote the encoding and decoding in C
992 by implementing a simple finite-state machine and added the variable
993 quote, escape and separator characters, the binary mode and the print
994 and getline methods. See ChangeLog releases 0.10 through 0.23.
995
996 H.Merijn Brand <h.m.brand@xs4all.nl> cleaned up the code, added the
997 field flags methods, wrote the major part of the test suite, completed
998 the documentation, fixed some RT bugs and added all the allow flags.
999 See ChangeLog releases 0.25 and on.
1000
1002 Copyright (C) 2007-2010 H.Merijn Brand for PROCURA B.V. Copyright (C)
1003 1998-2001 Jochen Wiedmann. All rights reserved. Portions Copyright (C)
1004 1997 Alan Citterman. All rights reserved.
1005
1006 This library is free software; you can redistribute it and/or modify it
1007 under the same terms as Perl itself.
1008
1009
1010
1011perl v5.12.0 2010-03-15 CSV_XS(3)