CSV_XS(3)             User Contributed Perl Documentation            CSV_XS(3)

NAME

       Text::CSV_XS - comma-separated values manipulation routines

SYNOPSIS

        use Text::CSV_XS;

        my @rows;
        my $csv = Text::CSV_XS->new ({ binary => 1 }) or
            die "Cannot use CSV: ".Text::CSV_XS->error_diag ();
        open my $fh, "<:encoding(utf8)", "test.csv" or die "test.csv: $!";
        while (my $row = $csv->getline ($fh)) {
            $row->[2] =~ m/pattern/ or next; # 3rd field should match
            push @rows, $row;
            }
        $csv->eof or $csv->error_diag ();
        close $fh;

        $csv->eol ("\r\n");
        open $fh, ">:encoding(utf8)", "new.csv" or die "new.csv: $!";
        $csv->print ($fh, $_) for @rows;
        close $fh or die "new.csv: $!";

DESCRIPTION

       Text::CSV_XS provides facilities for the composition and decomposition
       of comma-separated values.  An instance of the Text::CSV_XS class can
       combine fields into a CSV string and parse a CSV string into fields.

       The module accepts either strings or files as input and can utilize any
       user-specified characters as delimiters, separators, and escapes so it
       is perhaps better called ASV (anything separated values) rather than
       just CSV.

   Embedded newlines
       Important Note: The default behavior is to only accept ASCII
       characters.  This means that fields cannot contain newlines. If your
       data contains newlines embedded in fields, or characters above 0x7e
       (tilde), or binary data, you *must* set "binary => 1" in the call to
       "new ()".  To cover the widest range of parsing options, you will
       always want to set binary.

       But you still have to pass a complete record to the "parse ()"
       method, which is harder than the usual idiom suggests:

        my $csv = Text::CSV_XS->new ({ binary => 1, eol => $/ });
        while (<>) {           #  WRONG!
            $csv->parse ($_);
            my @fields = $csv->fields ();
            }

       will break, as "while (<>)" reads physical lines without regard to
       quoting, so "parse ()" may be handed only part of a record. If you
       need to support embedded newlines, the way to go is either

        my $csv = Text::CSV_XS->new ({ binary => 1, eol => $/ });
        while (my $row = $csv->getline (*ARGV)) {
            my @fields = @$row;
            }

       or, more safely in perl 5.6 and up

        my $csv = Text::CSV_XS->new ({ binary => 1, eol => $/ });
        open my $io, "<", $file or die "$file: $!";
        while (my $row = $csv->getline ($io)) {
            my @fields = @$row;
            }

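       The "getline ()" approach above can be tried without any files on
       disk. The following is a minimal sketch, assuming an in-memory
       filehandle and made-up sample data ($data and @rows are illustrative
       names, not part of the module):

```perl
use strict;
use warnings;
use IO::Handle;     # harmless on modern perls; older ones need it for $io->getline
use Text::CSV_XS;

# Two records; the second field of the first record spans two physical lines.
my $data = qq{1,"two\nlines",3\n4,5,6\n};

open my $io, "<", \$data or die "in-memory handle: $!";
my $csv = Text::CSV_XS->new ({ binary => 1 }) or
    die "Cannot use CSV: ".Text::CSV_XS->error_diag ();

my @rows;
while (my $row = $csv->getline ($io)) {
    push @rows, $row;           # getline () follows the quoting across lines
    }
$csv->eof or $csv->error_diag ();
close $io;

printf "%d records read\n", scalar @rows;
```

       Because "binary" is set and the newline sits inside a quoted field,
       the first record is returned as one row with the newline preserved in
       its second field.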
   Unicode (UTF8)
       On parsing (both for "getline ()" and "parse ()"), if the source is
       marked as being UTF8, then all fields that are marked binary will
       also be marked UTF8.

       On combining ("print ()" and "combine ()"), if any of the fields to
       combine was marked UTF8, the resulting string will be marked UTF8.

       For complete control over encoding, please use Text::CSV::Encoded:

           use Text::CSV::Encoded;
           my $csv = Text::CSV::Encoded->new ({
               encoding_in  => "iso-8859-1", # the encoding coming into Perl
               encoding_out => "cp1252",     # the encoding going out of Perl
               });

           $csv = Text::CSV::Encoded->new ({ encoding  => "utf8" });
           # combine () and print () accept *literally* utf8 encoded data
           # parse () and getline () return *literally* utf8 encoded data

           $csv = Text::CSV::Encoded->new ({ encoding  => undef }); # default
           # combine () and print () accept UTF8 marked data
           # parse () and getline () return UTF8 marked data

SPECIFICATION

       While no formal specification for CSV exists, RFC 4180 1) describes a
       common format and establishes "text/csv" as the MIME type registered
       with the IANA.

       Many informal documents exist that describe the CSV format. How To: The
       Comma Separated Value (CSV) File Format 2) provides an overview of the
       CSV format in the most widely used applications and explains how it can
       best be used and supported.

        1) http://tools.ietf.org/html/rfc4180
        2) http://www.creativyst.com/Doc/Articles/CSV/CSV01.htm

       The basic rules are as follows:

       CSV is a delimited data format that has fields/columns separated by the
       comma character and records/rows separated by newlines. Fields that
       contain a special character (comma, newline, or double quote) must be
       enclosed in double quotes.  However, if a line contains a single entry
       which is the empty string, it may be enclosed in double quotes. If a
       field's value contains a double quote character it is escaped by
       placing another double quote character next to it. The CSV file format
       does not require a specific character encoding, byte order, or line
       terminator format.

       · Each record is one line terminated by a line feed (ASCII/LF=0x0A) or
         a carriage return and line feed pair (ASCII/CRLF=0x0D 0x0A); however,
         line-breaks can be embedded.

       · Fields are separated by commas.

       · Allowable characters within a CSV field include 0x09 (tab) and the
         inclusive range of 0x20 (space) through 0x7E (tilde). In binary mode
         all characters are accepted, at least in quoted fields.

       · A field within CSV must be surrounded by double-quotes to contain
         the separator character (comma).

       Though this is the clearest and most restrictive definition,
       Text::CSV_XS is considerably more liberal than this, and allows
       extensions:

       · Line termination by a single carriage return is accepted by default.

       · The separation-, quote-, and escape- characters can be any ASCII
         character in the range from 0x20 (space) to 0x7E (tilde). Characters
         outside this range may or may not work as expected. Multibyte
         characters, like U+060c (ARABIC COMMA), U+FF0C (FULLWIDTH COMMA),
         U+241B (SYMBOL FOR ESCAPE), U+2424 (SYMBOL FOR NEWLINE), U+FF02
         (FULLWIDTH QUOTATION MARK), and U+201C (LEFT DOUBLE QUOTATION MARK)
         (to give some examples of what might look promising) are therefore
         not allowed.

         If you use perl-5.8.2 or higher, these three attributes are
         utf8-decoded, to increase the likelihood of success. This way U+00FE
         will be allowed as a quote character.

       · A field within CSV must be surrounded by double-quotes to contain an
         embedded double-quote, represented by a pair of consecutive
         double-quotes. In binary mode you may additionally use the sequence
         "0 (a double-quote followed by the digit 0) to represent a NULL byte.

       · Several violations of the above specification may be allowed by
         passing options to the object creator.

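       The quoting and escaping rules above can be observed with
       "combine ()" and "string ()". This is a small sketch with
       illustrative field values; it relies only on the default settings
       described in this document:

```perl
use strict;
use warnings;
use Text::CSV_XS;

my $csv = Text::CSV_XS->new ({ binary => 1 }) or
    die "Cannot use CSV: ".Text::CSV_XS->error_diag ();

# One plain field, one containing the separator, one containing quotes.
my @fields = ('plain', 'a,b', 'say "hi"');

$csv->combine (@fields) or
    die "combine () failed on argument: ".$csv->error_input ();
my $line = $csv->string ();
print "$line\n";

# Parsing the generated line should give the original fields back.
$csv->parse ($line) or die "parse () failed: ".$csv->error_diag ();
my @back = $csv->fields ();
```

       The field with the comma and the field with the embedded quotes are
       enclosed in double quotes, and each embedded double quote is doubled.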

FUNCTIONS

   version ()
       (Class method) Returns the current module version.

   new (\%attr)
       (Class method) Returns a new instance of Text::CSV_XS. The object's
       attributes are described by the (optional) hash ref "\%attr".
       Currently the following attributes are available:

       eol An end-of-line string to add to rows. "undef" is replaced with an
           empty string. The default is "$\". Common values for "eol" are
           "\012" (Line Feed) or "\015\012" (Carriage Return, Line Feed).
           Cannot be longer than 7 (ASCII) characters.

           If both $/ and "eol" equal "\015", lines that end on only a
           Carriage Return without Line Feed will be parsed correctly.  Line
           endings, whether in $/ or "eol", other than "undef", "\n", "\r\n",
           or "\r" are not (yet) supported for parsing.

       sep_char
           The char used for separating fields, by default a comma (",").
           Limited to a single-byte character, usually in the range from 0x20
           (space) to 0x7e (tilde).

           The separation character cannot be equal to the quote character.
           The separation character cannot be equal to the escape character.

           See also CAVEATS.

       allow_whitespace
           When this option is set to true, whitespace (TABs and SPACEs)
           surrounding the separation character is removed when parsing. If
           either TAB or SPACE is one of the three major characters
           "sep_char", "quote_char", or "escape_char" it will not be
           considered whitespace.

           So lines like:

             1 , "foo" , bar , 3 , zapp

           are now correctly parsed, even though they violate the CSV specs.

           Note that all whitespace is stripped from the start and end of
           each field. That makes it more a feature than a way to parse bad
           CSV lines, as

            1,   2.0,  3,   ape  , monkey

           will now be parsed as

            ("1", "2.0", "3", "ape", "monkey")

           even if the original line was perfectly sane CSV.

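           A short sketch of "allow_whitespace" in use, assuming the sample
           line from the description above (the variable names are
           illustrative):

```perl
use strict;
use warnings;
use Text::CSV_XS;

my $csv = Text::CSV_XS->new ({ allow_whitespace => 1 }) or
    die "Cannot use CSV: ".Text::CSV_XS->error_diag ();

# Whitespace around the separators and around the quoted field is dropped.
$csv->parse (q{1 , "foo" , bar , 3 , zapp}) or
    die "parse () failed: ".$csv->error_diag ();
my @fields = $csv->fields ();

print join ("|", @fields), "\n";
```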
       blank_is_undef
           Under normal circumstances, CSV data makes no distinction between
           quoted- and unquoted empty fields. They both end up in an empty
           string field once read, so

            1,"",," ",2

           is read as

            ("1", "", "", " ", "2")

           When writing CSV files with "always_quote" set, the unquoted empty
           field is the result of an undefined value. To make it possible to
           also make this distinction when reading CSV data, the
           "blank_is_undef" option will cause unquoted empty fields to be set
           to undef, causing the above to be parsed as

            ("1", "", undef, " ", "2")

       empty_is_undef
           Going one step further than "blank_is_undef", this attribute
           converts all empty fields to undef, so

            1,"",," ",2

           is read as

            (1, undef, undef, " ", 2)

           Note that this only affects fields that are really empty, not
           fields that are empty after stripping allowed whitespace. YMMV.

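           The difference between the two attributes can be made visible
           by parsing the same line with each of them. This is only a
           sketch; the %result hash and the output format are illustrative
           choices:

```perl
use strict;
use warnings;
use Text::CSV_XS;

my $line = q{1,"",," ",2};
my %result;

for my $attr (qw( blank_is_undef empty_is_undef )) {
    my $csv = Text::CSV_XS->new ({ $attr => 1 }) or
        die "Cannot use CSV: ".Text::CSV_XS->error_diag ();
    $csv->parse ($line) or die "parse () failed: ".$csv->error_diag ();

    # Show undefined fields as the bare word undef, others in quotes.
    my @shown = map { defined $_ ? qq{"$_"} : "undef" } $csv->fields ();
    $result{$attr} = join ", ", @shown;
    print "$attr: ($result{$attr})\n";
    }
```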
       quote_char
           The char used for quoting fields containing blanks, by default the
           double quote character ('"'). A value of undef suppresses quote
           chars. (For simple cases only).  Limited to a single-byte
           character, usually in the range from 0x20 (space) to 0x7e (tilde).

           The quote character cannot be equal to the separation character.

       allow_loose_quotes
           By default, parsing fields that have "quote_char" characters inside
           an unquoted field, like

            1,foo "bar" baz,42

           would result in a parse error. Though it is still bad practice to
           allow this format, we cannot help that some vendors make their
           applications spit out lines styled like this.

           In case there is really bad CSV data, like

            1,"foo "bar" baz",42

           or

            1,""foo bar baz"",42

           there is a way to get that parsed, and leave the quotes inside the
           quoted field as-is. This can be achieved by setting
           "allow_loose_quotes" AND making sure that the "escape_char" is not
           equal to "quote_char".

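           The first case above (quotes inside an unquoted field) can be
           checked with a strict and a lenient parser side by side. A
           minimal sketch; $strict and $loose are illustrative names:

```perl
use strict;
use warnings;
use Text::CSV_XS;

my $line = q{1,foo "bar" baz,42};

my $strict = Text::CSV_XS->new ();
my $loose  = Text::CSV_XS->new ({ allow_loose_quotes => 1 });

# The default parser rejects the line, the lenient one accepts it.
my $strict_ok = $strict->parse ($line) ? 1 : 0;
my $loose_ok  = $loose->parse ($line)  ? 1 : 0;
my @fields    = $loose_ok ? $loose->fields () : ();

print "strict: $strict_ok, loose: $loose_ok\n";
print "2nd field: $fields[1]\n" if $loose_ok;
```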
       escape_char
           The character used for escaping certain characters inside quoted
           fields.  Limited to a single-byte character, usually in the range
           from 0x20 (space) to 0x7e (tilde).

           The "escape_char" defaults to being the literal double-quote mark
           ('"'), in other words, the same as the default "quote_char". This
           means that doubling the quote mark in a field escapes it:

             "foo","bar","Escape ""quote mark"" with two ""quote marks""","baz"

           If you change the default quote_char without changing the default
           escape_char, the escape_char will still be the quote mark.  If
           instead you want to escape the quote_char by doubling it, you will
           need to change the escape_char to be the same as what you changed
           the quote_char to.

           The escape character cannot be equal to the separation character.

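           As a sketch of the last point, assume the single quote is
           wanted as quote character and should be escaped by doubling.
           Both attributes are then set to the same value (the sample
           fields are illustrative):

```perl
use strict;
use warnings;
use Text::CSV_XS;

# Change quote_char AND escape_char, so the quote is escaped by doubling.
my $csv = Text::CSV_XS->new ({ quote_char => "'", escape_char => "'" }) or
    die "Cannot use CSV: ".Text::CSV_XS->error_diag ();

$csv->combine ("it's", "plain") or
    die "combine () failed on argument: ".$csv->error_input ();
my $line = $csv->string ();
print "$line\n";

# The generated line parses back into the original fields.
$csv->parse ($line) or die "parse () failed: ".$csv->error_diag ();
my @back = $csv->fields ();
```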
       allow_loose_escapes
           By default, parsing fields that have "escape_char" characters that
           escape characters that do not need to be escaped, like:

            my $csv = Text::CSV_XS->new ({ escape_char => "\\" });
            $csv->parse (q{1,"my bar\'s",baz,42});

           would result in a parse error. Though it is still bad practice to
           allow this format, this option enables you to treat all escape
           character sequences equally.

       binary
           If this attribute is TRUE, you may use binary characters in quoted
           fields, including line feeds, carriage returns and NULL bytes. (The
           latter must be escaped as "0, a double-quote followed by the digit
           0.) By default this feature is off.

           If a string is marked UTF8, binary will be turned on automatically
           when binary characters other than CR or NL are encountered. Note
           that a simple string like "\x{00a0}" might still be binary, but not
           marked UTF8, so setting "{ binary => 1 }" is still a wise option.

       types
           A set of column types; this attribute is immediately passed to the
           types method below. You must not set this attribute otherwise,
           except for using the types method. For details see the description
           of the types method below.

       always_quote
           By default the generated fields are quoted only if they need to
           be, for example, if they contain the separator. If you set this
           attribute to a TRUE value, then all fields will be quoted. This is
           typically easier to handle in external applications. (Poor
           creatures who aren't using Text::CSV_XS. :-)

       quote_space
           By default, a space in a field would trigger quotation. As no rule
           exists that forces this in CSV, nor any for the opposite, the
           default is true for safety. You can exclude the space from this
           trigger by setting this attribute to 0.

       quote_null
           By default, a NULL byte in a field would be escaped. This attribute
           enables you to treat the NULL byte as a simple binary character in
           binary mode (when "{ binary => 1 }" is set). The default is true.
           You can prevent NULL escapes by setting this attribute to 0.

       keep_meta_info
           By default, the parsing of input lines is as simple and fast as
           possible. However, some parsing information - like quotation of the
           original field - is lost in that process. Set this flag to true to
           be able to retrieve that information after parsing with the methods
           "meta_info ()", "is_quoted ()", and "is_binary ()" described below.
           Default is false.

       verbatim
           This is a quite controversial attribute to set, but it makes hard
           things possible.

           The basic thought behind this is to tell the parser that the
           normally special characters newline (NL) and Carriage Return (CR)
           will not be special when this flag is set, and be dealt with as
           being ordinary binary characters. This will ease working with data
           with embedded newlines.

           When "verbatim" is used with "getline ()", "getline ()"
           auto-chomps every line.

           Imagine a file format like

             M^^Hans^Janssen^Klas 2\n2A^Ja^11-06-2007#\r\n

           where the line ending is a very specific "#\r\n", and the sep_char
           is a ^ (caret). None of the fields is quoted, but embedded binary
           data is likely to be present. With the specific line ending, that
           shouldn't be too hard to detect.

           By default, however, Text::CSV_XS' parse function only knows "\n"
           and "\r" as legal line endings, and so has to treat the embedded
           newline as a real end-of-line; it can only scan on into the next
           line if binary is true and the newline is inside a quoted field.
           With this attribute, however, we can tell parse () to parse the
           line as if \n is nothing more than a binary character.

           For parse () this means that the parser has no idea about line
           endings anymore, and getline () chomps line endings on reading.

       auto_diag
           Setting this to true will cause "error_diag ()" to be
           automatically called in void context upon errors.

           In case of error "2012 - EOF", this call will be void.

           If set to a value greater than 1, it will die on errors instead of
           warning.

           Future extensions to this feature will include more reliable
           auto-detection of the "autodie" module being enabled, which will
           raise the value of "auto_diag" by 1 at the moment the error is
           detected.

       To sum it up,

        $csv = Text::CSV_XS->new ();

       is equivalent to

        $csv = Text::CSV_XS->new ({
            quote_char          => '"',
            escape_char         => '"',
            sep_char            => ',',
            eol                 => $\,
            always_quote        => 0,
            quote_space         => 1,
            quote_null          => 1,
            binary              => 0,
            keep_meta_info      => 0,
            allow_loose_quotes  => 0,
            allow_loose_escapes => 0,
            allow_whitespace    => 0,
            blank_is_undef      => 0,
            empty_is_undef      => 0,
            verbatim            => 0,
            auto_diag           => 0,
            });

       For all of the above mentioned flags, there is an accessor method
       available where you can inquire for the current value, or change the
       value:

        my $quote = $csv->quote_char;
        $csv->binary (1);

       It is unwise to change these settings halfway through writing CSV data
       to a stream. If however, you want to create a new stream using the
       available CSV object, there is no harm in changing them.

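       A brief sketch of the accessors in use, changing two settings on an
       existing object before any output is generated (the sample fields
       are illustrative):

```perl
use strict;
use warnings;
use Text::CSV_XS;

my $csv = Text::CSV_XS->new ({ binary => 1 }) or
    die "Cannot use CSV: ".Text::CSV_XS->error_diag ();

my $old_sep = $csv->sep_char;   # inquire for the current value
$csv->sep_char (";");           # change the separator
$csv->always_quote (1);         # quote every field from now on

$csv->combine (1, "two") or
    die "combine () failed on argument: ".$csv->error_input ();
my $line = $csv->string;
print "separator was '$old_sep', new line is: $line\n";
```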
       If the "new ()" constructor call fails, it returns "undef", and makes
       the fail reason available through the "error_diag ()" method.

        $csv = Text::CSV_XS->new ({ ecs_char => 1 }) or
            die "".Text::CSV_XS->error_diag ();

       "error_diag ()" will return a string like

        "INI - Unknown attribute 'ecs_char'"

   print
        $status = $csv->print ($io, $colref);

       Similar to "combine () + string () + print", but more efficient. It
       expects an array ref as input (not an array!) and the resulting string
       is not really created, but immediately written to the $io object,
       typically an IO handle or any other object that offers a print method.
       Note, this implies that the following is wrong in perl 5.005_xx and
       older:

        open FILE, ">", "whatever";
        $status = $csv->print (\*FILE, $colref);

       as in perl 5.005 and older, the glob "\*FILE" is not an object, thus it
       doesn't have a print method. The solution is to use an IO::File object
       or to hide the glob behind an IO::Wrap object. See IO::File and
       IO::Wrap for details.

       For performance reasons the print method doesn't create a result
       string.  In particular the $csv->string (), $csv->status (),
       $csv->fields () and $csv->error_input () methods are meaningless after
       executing this method.

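       A minimal sketch of "print ()" writing rows to a handle. To keep it
       self-contained it assumes an in-memory filehandle instead of a real
       file; $out and the sample rows are illustrative:

```perl
use strict;
use warnings;
use IO::Handle;     # harmless on modern perls; older ones need it for $fh->print
use Text::CSV_XS;

my $csv = Text::CSV_XS->new ({ binary => 1, eol => "\n" }) or
    die "Cannot use CSV: ".Text::CSV_XS->error_diag ();

my $out = "";
open my $fh, ">", \$out or die "in-memory handle: $!";

# print () takes an array ref per row and appends the eol itself.
for my $row ([ "code", "name" ], [ 1, "a b" ]) {
    $csv->print ($fh, $row) or $csv->error_diag ();
    }
close $fh or die "in-memory handle: $!";

print $out;
```

       With the default "quote_space" setting, the field containing a space
       is quoted on output.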
   combine
        $status = $csv->combine (@columns);

       This object function constructs a CSV string from the arguments,
       returning success or failure.  Failure can result from lack of
       arguments or an argument containing an invalid character.  Upon
       success, "string ()" can be called to retrieve the resultant CSV
       string.  Upon failure, the value returned by "string ()" is undefined
       and "error_input ()" can be called to retrieve an invalid argument.

   string
        $line = $csv->string ();

       This object function returns the input to "parse ()" or the resultant
       CSV string of "combine ()", whichever was called more recently.

   getline
        $colref = $csv->getline ($io);

       This is the counterpart to print, like parse is the counterpart to
       combine: It reads a row from the IO object $io using $io->getline ()
       and parses this row into an array ref. The function returns this
       array ref, or undef on failure.

       When fields are bound with "bind_columns ()", the return value is a
       reference to an empty list.

       The $csv->string (), $csv->fields () and $csv->status () methods are
       meaningless, again.

   parse
        $status = $csv->parse ($line);

       This object function decomposes a CSV string into fields, returning
       success or failure.  Failure can result from a lack of argument or
       from an improperly formatted CSV string.  Upon success, "fields ()"
       can be called to retrieve the decomposed fields.  Upon failure, the
       value returned by "fields ()" is undefined and "error_input ()" can be
       called to retrieve the invalid argument.

       You may use the types () method for setting column types. See the
       description below.

   getline_hr
       The "getline_hr ()" and "column_names ()" methods work together to
       allow you to have rows returned as hashrefs. You must call
       "column_names ()" first to declare your column names.

        $csv->column_names (qw( code name price description ));
        $hr = $csv->getline_hr ($io);
        print "Price for $hr->{name} is $hr->{price} EUR\n";

       "getline_hr ()" will croak if called before "column_names ()".

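       A common pattern is to take the column names from the first line of
       the data itself. The following sketch assumes an in-memory
       filehandle with made-up rows; @items is an illustrative name:

```perl
use strict;
use warnings;
use IO::Handle;     # harmless on modern perls; older ones need it for $io->getline
use Text::CSV_XS;

my $data = "code,name,price\n1,apple,0.50\n2,pear,0.75\n";
open my $io, "<", \$data or die "in-memory handle: $!";

my $csv = Text::CSV_XS->new ({ binary => 1 }) or
    die "Cannot use CSV: ".Text::CSV_XS->error_diag ();

# The header line becomes the list of hash keys.
$csv->column_names ($csv->getline ($io));

my @items;
while (my $hr = $csv->getline_hr ($io)) {
    push @items, $hr;
    print "Price for $hr->{name} is $hr->{price} EUR\n";
    }
$csv->eof or $csv->error_diag ();
close $io;
```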
   column_names
       Set the keys that will be used in the "getline_hr ()" calls. If no keys
       (column names) are passed, it'll return the current setting.

       "column_names ()" accepts a list of scalars (the column names) or a
       single array_ref, so you can pass the return value of "getline ()":

         $csv->column_names ($csv->getline ($io));

       "column_names ()" does no checking on duplicates at all, which might
       lead to unwanted results. Undefined entries will be replaced with the
       string "\cAUNDEF\cA", so

         $csv->column_names (undef, "", "name", "name");
         $hr = $csv->getline_hr ($io);

       will set "$hr->{"\cAUNDEF\cA"}" to the 1st field, "$hr->{""}" to the
       2nd field, and "$hr->{name}" to the 4th field, discarding the 3rd
       field.

       "column_names ()" croaks on invalid arguments.

   bind_columns
       Takes a list of references to scalars in which to store the fields
       fetched by "getline ()". When you don't pass enough references to
       store the fetched fields in, "getline ()" will fail. If you pass more
       than there are fields to return, the remaining references are left
       untouched.

         $csv->bind_columns (\$code, \$name, \$price, \$description);
         while ($csv->getline ($io)) {
             print "The price of a $name is \x{20ac} $price\n";
             }

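       A self-contained sketch of "bind_columns ()", again assuming an
       in-memory filehandle and illustrative data. Each successful
       "getline ()" fills the bound scalars instead of returning the
       fields:

```perl
use strict;
use warnings;
use IO::Handle;     # harmless on modern perls; older ones need it for $io->getline
use Text::CSV_XS;

my $data = "1,apple,0.50\n2,pear,0.75\n";
open my $io, "<", \$data or die "in-memory handle: $!";

my $csv = Text::CSV_XS->new ({ binary => 1 }) or
    die "Cannot use CSV: ".Text::CSV_XS->error_diag ();

my ($code, $name, $price);
$csv->bind_columns (\$code, \$name, \$price);

my @seen;
while ($csv->getline ($io)) {
    # The bound variables now hold the fields of the current row.
    push @seen, "$code:$name:$price";
    }
$csv->eof or $csv->error_diag ();
close $io;

print "$_\n" for @seen;
```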
   eof
        $eof = $csv->eof ();

       If "parse ()" or "getline ()" was used with an IO stream, this method
       will return true (1) if the last call hit end of file, otherwise it
       will return false (''). This is useful to see the difference between a
       failure and end of file.

   types
        $csv->types (\@tref);

       This method is used to force columns to be of a given type. For
       example, if you have an integer column, two double columns and a string
       column, then you might do:

        $csv->types ([Text::CSV_XS::IV (),
                      Text::CSV_XS::NV (),
                      Text::CSV_XS::NV (),
                      Text::CSV_XS::PV ()]);

       Column types are used only for decoding columns, in other words by the
       parse () and getline () methods.

       You can unset column types with

        $csv->types (undef);

       or fetch the current type settings with

        $types = $csv->types ();

       IV  Set field type to integer.

       NV  Set field type to numeric/float.

       PV  Set field type to string.

   fields
        @columns = $csv->fields ();

       This object function returns the input to "combine ()" or the resultant
       decomposed fields of a successful "parse ()", whichever was called more
       recently.

       Note that the return value is undefined after using "getline ()", which
       does not fill the data structures returned by "parse ()".

   meta_info
        @flags = $csv->meta_info ();

       This object function returns the flags of the input to "combine ()" or
       the flags of the resultant decomposed fields of "parse ()", whichever
       was called more recently.

       For each field, a meta_info field will hold flags that tell something
       about the field returned by the "fields ()" method or passed to the
       "combine ()" method. The flags are bitwise-or'd like:

       0x0001
           The field was quoted.

       0x0002
           The field was binary.

       See the "is_*** ()" methods below.

   is_quoted
         my $quoted = $csv->is_quoted ($column_idx);

       Where $column_idx is the (zero-based) index of the column in the last
       result of "parse ()".

       This returns a true value if the data in the indicated column was
       enclosed in "quote_char" quotes. This might be important for data where
       ",20070108," is to be treated as a numeric value, and where
       ","20070108"," is explicitly marked as character string data.

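       The distinction described above can be sketched as follows. Note
       that "keep_meta_info" has to be set for "is_quoted ()" to have
       anything to report; the @quoted array is an illustrative name:

```perl
use strict;
use warnings;
use Text::CSV_XS;

my $csv = Text::CSV_XS->new ({ keep_meta_info => 1 }) or
    die "Cannot use CSV: ".Text::CSV_XS->error_diag ();

# Same digits twice: once bare, once explicitly quoted.
$csv->parse (q{20070108,"20070108"}) or
    die "parse () failed: ".$csv->error_diag ();

my @fields = $csv->fields ();
my @quoted = map { $csv->is_quoted ($_) ? 1 : 0 } 0 .. $#fields;

printf "%d: %s (%s)\n", $_, $fields[$_],
    $quoted[$_] ? "quoted" : "not quoted" for 0 .. $#fields;
```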
   is_binary
         my $binary = $csv->is_binary ($column_idx);

       Where $column_idx is the (zero-based) index of the column in the last
       result of "parse ()".

       This returns a true value if the data in the indicated column contained
       any byte in the range [\x00-\x08,\x10-\x1F,\x7F-\xFF].

   status
        $status = $csv->status ();

       This object function returns success (or failure) of "combine ()" or
       "parse ()", whichever was called more recently.

   error_input
        $bad_argument = $csv->error_input ();

       This object function returns the erroneous argument (if it exists) of
       "combine ()" or "parse ()", whichever was called more recently.

   error_diag
        Text::CSV_XS->error_diag ();
        $csv->error_diag ();
        $error_code   = 0  + $csv->error_diag ();
        $error_str    = "" . $csv->error_diag ();
        ($cde, $str, $pos) = $csv->error_diag ();

       If (and only if) an error occurred, this function returns the
       diagnostics of that error.

       If called in void context, it will print the internal error code and
       the associated error message to STDERR.

       If called in list context, it will return the error code and the error
       message in that order. If the last error was from parsing, the third
       value returned is a best guess at the location within the line that was
       being parsed. Its value is 1-based. See "examples/csv-check" for how
       this can be used.

       If called in scalar context, it will return the diagnostics in a single
       scalar, a la $!. It will contain the error code in numeric context, and
       the diagnostics message in string context.

       When called as a class method or a direct function call, the error diag
       is that of the last "new ()" call.

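       The calling contexts can be compared on a deliberately broken line.
       This is a sketch only; the input with an unterminated quoted field
       is an illustrative way to provoke a parse error, and the exact code
       and message depend on the module version:

```perl
use strict;
use warnings;
use Text::CSV_XS;

my $csv = Text::CSV_XS->new () or
    die "Cannot use CSV: ".Text::CSV_XS->error_diag ();

# The quoted field is never closed, so parse () is expected to fail.
my $ok = $csv->parse (q{1,"unterminated}) ? 1 : 0;

my ($code, $str, $pos) = $csv->error_diag ();      # list context
my $num = 0  + $csv->error_diag ();                # numeric context
my $msg = "" . $csv->error_diag ();                # string context

print "parse () ", ($ok ? "succeeded" : "failed"), "\n";
print "code $num: $msg\n" unless $ok;
```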
   SetDiag
        $csv->SetDiag (0);

       Use to reset the diagnostics if you are dealing with errors.

INTERNALS

       Combine (...)
       Parse (...)

       The arguments to these two internal functions are deliberately not
       described or documented, to enable the module author(s) to change
       them when they feel the need. Using these functions is highly
       discouraged, as the API may change in future releases.

EXAMPLES

       Reading a CSV file line by line:

         my $csv = Text::CSV_XS->new ({ binary => 1 });
         open my $fh, "<", "file.csv" or die "file.csv: $!";
         while (my $row = $csv->getline ($fh)) {
             # do something with @$row
             }
         $csv->eof or $csv->error_diag;
         close $fh or die "file.csv: $!";

       Parsing CSV strings:

         my $csv = Text::CSV_XS->new ({ keep_meta_info => 1, binary => 1 });

         my $sample_input_string =
             qq{"I said, ""Hi!""",Yes,"",2.34,,"1.09","\x{20ac}",};
         if ($csv->parse ($sample_input_string)) {
             my @field = $csv->fields;
             foreach my $col (0 .. $#field) {
                 my $quo = $csv->is_quoted ($col) ? $csv->{quote_char} : "";
                 printf "%2d: %s%s%s\n", $col, $quo, $field[$col], $quo;
                 }
             }
         else {
             print STDERR "parse () failed on argument: ",
                 $csv->error_input, "\n";
             $csv->error_diag ();
             }

       An example for creating CSV files using the "print ()" method, like in
       dumping the content of a database ($dbh) table ($tbl) to CSV:

         my $csv = Text::CSV_XS->new ({ binary => 1, eol => $/ });
         open my $fh, ">", "$tbl.csv" or die "$tbl.csv: $!";
         my $sth = $dbh->prepare ("select * from $tbl");
         $sth->execute;
         $csv->print ($fh, $sth->{NAME_lc});
         while (my $row = $sth->fetch) {
             $csv->print ($fh, $row) or $csv->error_diag;
             }
         close $fh or die "$tbl.csv: $!";

       or using the slower "combine ()" and "string ()" methods:

         my $csv = Text::CSV_XS->new;

         open my $csv_fh, ">", "hello.csv" or die "hello.csv: $!";

         my @sample_input_fields = (
             'You said, "Hello!"',   5.67,
             '"Surely"',   '',   '3.14159');
         if ($csv->combine (@sample_input_fields)) {
             print $csv_fh $csv->string, "\n";
             }
         else {
             print "combine () failed on argument: ",
                 $csv->error_input, "\n";
             }
         close $csv_fh or die "hello.csv: $!";

       For more extended examples, see the "examples/" subdirectory in the
       original distribution or the git repository at
       http://repo.or.cz/w/Text-CSV_XS.git?a=tree;f=examples. The following
       files can be found there:

       parser-xs.pl
         This can be used as a boilerplate to `fix' bad CSV and parse beyond
         errors.

           $ perl examples/parser-xs.pl bad.csv >good.csv

       csv-check
         This is a command-line tool that uses parser-xs.pl techniques to
         check the CSV file and report on its content.

           $ csv-check files/utf8.csv
           Checked with examples/csv-check 1.2 using Text::CSV_XS 0.61
           OK: rows: 1, columns: 2
               sep = <,>, quo = <">, bin = <1>

       csv2xls
         A script to convert CSV to Microsoft Excel. This requires Date::Calc
         and Spreadsheet::WriteExcel. The converter accepts various options
         and can produce UTF-8 Excel files.

       csvdiff
         A script that provides colorized diff on sorted CSV files, assuming
         first line is header and first field is the key. Output options
         include colorized ANSI escape codes or HTML.

           $ csvdiff --html --output=diff.html file1.csv file2.csv

CAVEATS

782       "Text::CSV_XS" is not designed to detect the characters used for field
783       separation and quoting. The parsing is done using predefined settings.
784       In the examples subdirectory, you can find scripts that demonstrate how
785       you can try to detect these characters yourself.
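
       As a rough illustration of such a detection attempt (not the approach
       taken by the scripts in the examples subdirectory), the sketch below
       counts candidate separators outside of quoted fields in one sample
       line. The helper name "guess_sep" and the candidate list are
       assumptions made for this example only.

```perl
use strict;
use warnings;

# Hypothetical helper: guess the separator used in a sample line.
# Returns undef when none of the candidates occurs outside quotes.
sub guess_sep {
    my ($line, @candidates) = @_;
    @candidates or @candidates = (",", ";", "\t", "|");

    # Drop double-quoted fields (with "" as escaped quote) so that
    # separators inside quotes are not counted.
    (my $bare = $line) =~ s/"(?:[^"]|"")*"//g;

    my ($best, $max) = (undef, 0);
    for my $cand (@candidates) {
        my $n = () = $bare =~ m/\Q$cand\E/g;    # number of occurrences
        ($best, $max) = ($cand, $n) if $n > $max;
    }
    return $best;
}

my $sep = guess_sep (qq{"Doe, John";42;"a;b";x});
```

       A heuristic like this can be fooled easily, so the guess is best
       treated as a default that the user can override.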
786
787   Microsoft Excel
788       The import/export from Microsoft Excel is a risky task, according to
789       the documentation in "Text::CSV::Separator". Microsoft uses the
790       system's default list separator defined in the regional settings, which
791       happens to be a semicolon for Dutch, German and Spanish (and probably
792       some others as well).  For the English locale, the default is a comma.
793       In Windows however, the user is free to choose a predefined locale, and
794       then change every individual setting in it, so checking the locale is
795       no solution.
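
       When the separator is known to be a semicolon, as in the locales
       mentioned above, it can simply be passed as "sep_char". A minimal
       sketch with a made-up input line:

```perl
use strict;
use warnings;
use Text::CSV_XS;

my $csv = Text::CSV_XS->new ({ binary => 1, sep_char => ";" })
    or die "Cannot use CSV: " . Text::CSV_XS->error_diag ();

# Illustrative line as a semicolon-based export might produce it:
# the comma in "1,5" is a decimal comma, not a separator.
my $line = qq{"Jansen; Piet";1,5;Utrecht};

my @fields = $csv->parse ($line) ? $csv->fields : ();
print join (" | ", @fields), "\n";
```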
796

TODO

798       More Errors & Warnings
         New extensions ought to be clear and concise in reporting what error
         occurred where and why, and possibly also suggest a remedy for the
         problem.  error_diag is a (very) good start, but there is more work
         to be done here.
803
804         Basic calls should croak or warn on illegal parameters. Errors should
805         be documented.
806
807       setting meta info
808         Future extensions might include extending the "meta_info ()",
809         "is_quoted ()", and "is_binary ()" to accept setting these flags for
810         fields, so you can specify which fields are quoted in the combine
811         ()/string () combination.
812
813           $csv->meta_info (0, 1, 1, 3, 0, 0);
814           $csv->is_quoted (3, 1);
815
816       combined methods
         Requests for adding means (methods) that combine "combine ()" and
         "string ()" in a single call will not be honored. Likewise for "parse
         ()" and "fields ()". Given the trouble with embedded newlines, using
         "getline ()" and "print ()" instead is the preferred way to go.
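
         To illustrate why "getline ()" is preferred, the sketch below reads
         records from an in-memory file handle in which one quoted field
         spans two physical lines. The sample data is made up; IO::Handle
         is loaded explicitly so the handle methods are available on older
         perls.

```perl
use strict;
use warnings;
use IO::Handle;
use Text::CSV_XS;

my $csv = Text::CSV_XS->new ({ binary => 1 })
    or die "Cannot use CSV: " . Text::CSV_XS->error_diag ();

# Two records; the second field of the first record contains a newline.
my $data = qq{1,"two\nlines",3\n4,5,6\n};
open my $in, "<", \$data or die "in-memory open: $!";

my @rows;
while (my $row = $csv->getline ($in)) {
    push @rows, $row;    # getline () keeps reading until the record ends
    }
$csv->eof or $csv->error_diag ();
close $in;
```

         Splitting the same data on newlines and feeding each piece to
         "parse ()" would hand the parser an incomplete first record.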
821
822       Parse the whole file at once
         Implement a new method that enables the parsing of a complete file
         at once, returning a list of hashes. A possible extension to this
         could be to enable a column selection on the call:
826
827            my @AoH = $csv->parse_file ($filename, { cols => [ 1, 4..8, 12 ]});
828
829         Returning something like
830
831            [ { fields => [ 1, 2, "foo", 4.5, undef, "", 8 ],
832                flags  => [ ... ],
833                errors => [ ... ],
834                },
835              { fields => [ ... ],
836                .
837                .
838                },
839              ]
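
         Until something like that exists, a similar result can be
         approximated with the existing "column_names ()" and "getline_hr ()"
         methods. The helper "slurp_csv" below is hypothetical and not part
         of the module; it assumes the first line is a header and returns
         plain hashes keyed by column name rather than the
         fields/flags/errors structure sketched above.

```perl
use strict;
use warnings;
use IO::Handle;
use Text::CSV_XS;

# Hypothetical helper: read a whole CSV file into an array of hashes,
# optionally keeping only the named columns.
sub slurp_csv {
    my ($file, @cols) = @_;

    my $csv = Text::CSV_XS->new ({ binary => 1 })
        or die "Cannot use CSV: " . Text::CSV_XS->error_diag ();
    open my $fh, "<", $file or die "$file: $!";

    my $hdr = $csv->getline ($fh) or die "$file: no header line";
    $csv->column_names (@$hdr);
    @cols or @cols = @$hdr;

    my @aoh;
    while (my $row = $csv->getline_hr ($fh)) {
        my %rec;
        @rec{@cols} = @{$row}{@cols};    # hash slice: selected columns only
        push @aoh, \%rec;
        }
    $csv->eof or $csv->error_diag ();
    close $fh;
    return \@aoh;
    }

# Usage, assuming a file "test.csv" with columns "id" and "name":
#   my $aoh = slurp_csv ("test.csv", "id", "name");
```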
840
841       EBCDIC
         The hard-coding of characters and character ranges makes this module
         unusable on EBCDIC systems. Using some #ifdef structure could enable
         these again without losing speed. Testing would be the hard part.
845

Release plan

847       No guarantees, but this is what I have in mind right now:
848
849       next
850          - This might very well be 1.00
          - DIAGNOSTICS section in pod to *describe* the errors (see below)
852          - croak / carp
853
854       next + 1
855          - csv2csv - a script to regenerate a CSV file to follow standards
856          - EBCDIC support
857

DIAGNOSTICS

859       Still under construction ...
860
       If an error occurred, "$csv->error_diag ()" can be used to get more
       information on the cause of the failure. Note that for speed reasons,
       the internal value is never cleared on success, so using the value
       returned by "error_diag ()" in normal cases - when no error occurred -
       may cause unexpected results.
866
       If the constructor failed, the cause can be found using "error_diag ()"
       as a class method, like "Text::CSV_XS->error_diag ()".
869
       "$csv->error_diag ()" is automatically called upon error when the
       constructor was called with "auto_diag" set to 1 or 2, or when
       "autodie" is in effect.  When set to 1, this will cause a "warn ()"
       with the error message; when set to 2, it will "die ()". "2012 - EOF"
       is excluded from "auto_diag" reports.
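
       As a minimal sketch of the manual approach, the code below parses a
       deliberately malformed line and, only after the failure, asks
       "error_diag ()" in list context for the error code, the message and
       the best-guess position. The input line and variable names are
       illustrative.

```perl
use strict;
use warnings;
use Text::CSV_XS;

my $csv = Text::CSV_XS->new ({ binary => 1 })
    or die "Cannot use CSV: " . Text::CSV_XS->error_diag ();

# Malformed on purpose: text follows the closing quote of a quoted field.
my $line = qq{1,foo,"bar"baz,2};

my ($code, $msg, $pos) = (0, "", undef);
unless ($csv->parse ($line)) {
    # Consult error_diag () only after a failure; it is not reset on success.
    ($code, $msg, $pos) = $csv->error_diag ();
    warn sprintf "parse () failed: %s - %s (near position %s)\n",
        $code, $msg, defined $pos ? $pos : "unknown";
    }
```

       Passing "auto_diag => 1" to the constructor would produce a similar
       warning without the explicit call.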
875
       The errors currently defined are listed below. I've tried to make
       the error itself explanatory enough, but more descriptions will be
       added. For most of these errors, the first three capitals describe the
       error category:
880
881       INI
882         Initialization error or option conflict.
883
884       ECR
885         Carriage-Return related parse error.
886
887       EOF
888         End-Of-File related parse error.
889
890       EIQ
891         Parse error inside quotation.
892
893       EIF
894         Parse error inside field.
895
896       ECB
897         Combine error.
898
899       EHR
900         HashRef parse related error.
901
902       1001 "INI - sep_char is equal to quote_char or escape_char"
903         The separation character cannot be equal to either the quotation
904         character or the escape character, as that will invalidate all
905         parsing rules.
906
907       1002 "INI - allow_whitespace with escape_char or quote_char SP or TAB"
908         Using "allow_whitespace" when either "escape_char" or "quote_char" is
909         equal to SPACE or TAB is too ambiguous to allow.
910
911       1003 "INI - \r or \n in main attr not allowed"
912         Using default "eol" characters in either "sep_char", "quote_char", or
913         "escape_char" is not allowed.
914
915       2010 "ECR - QUO char inside quotes followed by CR not part of EOL"
         When "eol" has been set to something other than the default, like
         "\r\t\n", and a "\r" follows the second (closing) "quote_char" but
         the characters after the "\r" do not complete the "eol" sequence,
         this is an error.
920
921       2011 "ECR - Characters after end of quoted field"
922         Sequences like "1,foo,"bar"baz,2" are not allowed. "bar" is a quoted
923         field, and after the closing quote, there should be either a new-line
924         sequence or a separation character.
925
926       2012 "EOF - End of data in parsing input stream"
         Self-explanatory. End-of-file was reached while parsing a stream.
         This can only happen when reading from streams with "getline ()",
         as "parse ()" works on strings that are not required to have a
         trailing "eol".
931
932       2021 "EIQ - NL char inside quotes, binary off"
933         Sequences like "1,"foo\nbar",2" are only allowed when the binary
934         option has been selected with the constructor.
935
936       2022 "EIQ - CR char inside quotes, binary off"
937         Sequences like "1,"foo\rbar",2" are only allowed when the binary
938         option has been selected with the constructor.
939
940       2023 "EIQ - QUO character not allowed"
941         Sequences like ""foo "bar" baz",quux" and "2023,",2008-04-05,"Foo,
942         Bar",\n" will cause this error.
943
944       2024 "EIQ - EOF cannot be escaped, not even inside quotes"
945         The escape character is not allowed as last character in an input
946         stream.
947
948       2025 "EIQ - Loose unescaped escape"
         An escape character should escape only characters that need escaping.
         Allowing the escape for other characters is possible with the
         "allow_loose_escapes" attribute.
952
953       2026 "EIQ - Binary character inside quoted field, binary off"
         Binary characters are not allowed by default. Exceptions are fields
         that contain valid UTF-8, which will automatically be upgraded.
         Pass the "binary" attribute with a true value to accept binary
         characters.
958
959       2027 "EIQ - Quoted field not terminated"
960         When parsing a field that started with a quotation character, the
961         field is expected to be closed with a quotation character. When the
962         parsed line is exhausted before the quote is found, that field is not
963         terminated.
964
965       2030 "EIF - NL char inside unquoted verbatim, binary off"
966       2031 "EIF - CR char is first char of field, not part of EOL"
967       2032 "EIF - CR char inside unquoted, not part of EOL"
968       2034 "EIF - Loose unescaped quote"
969       2035 "EIF - Escaped EOF in unquoted field"
970       2036 "EIF - ESC error"
971       2037 "EIF - Binary character in unquoted field, binary off"
972       2110 "ECB - Binary character in Combine, binary off"
973       2200 "EIO - print to IO failed. See errno"
974       3001 "EHR - Unsupported syntax for column_names ()"
975       3002 "EHR - getline_hr () called before column_names ()"
976       3003 "EHR - bind_columns () and column_names () fields count mismatch"
977       3004 "EHR - bind_columns () only accepts refs to scalars"
978       3006 "EHR - bind_columns () did not pass enough refs for parsed fields"
979       3007 "EHR - bind_columns needs refs to writeable scalars"
980       3008 "EHR - unexpected error in bound fields"
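
       Several of the errors above (2021, 2022, 2026 and others) depend only
       on whether the "binary" attribute was set. The sketch below parses the
       same made-up line, with a newline inside a quoted field, using two
       parser objects; the variable names are illustrative.

```perl
use strict;
use warnings;
use Text::CSV_XS;

my $line = qq{1,"foo\nbar",2};    # newline embedded in a quoted field

my $plain  = Text::CSV_XS->new ()
    or die "Cannot use CSV: " . Text::CSV_XS->error_diag ();
my $binary = Text::CSV_XS->new ({ binary => 1 })
    or die "Cannot use CSV: " . Text::CSV_XS->error_diag ();

# With binary off the embedded newline is rejected ...
my $plain_ok  = $plain->parse ($line)  ? 1 : 0;

# ... with binary on the line parses into three fields.
my $binary_ok = $binary->parse ($line) ? 1 : 0;
my @fields    = $binary_ok ? $binary->fields : ();

printf "binary off: %s, binary on: %s (%d fields)\n",
    $plain_ok  ? "ok" : "failed",
    $binary_ok ? "ok" : "failed",
    scalar @fields;
```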
981

SEE ALSO

983       perl, IO::File, IO::Handle, IO::Wrap, Text::CSV, Text::CSV_PP,
984       Text::CSV::Encoded, Text::CSV::Separator, and Spreadsheet::Read.
985

AUTHORS and MAINTAINERS

987       Alan Citterman <alan@mfgrtl.com> wrote the original Perl module. Please
988       don't send mail concerning Text::CSV_XS to Alan, as he's not involved
989       in the C part which is now the main part of the module.
990
991       Jochen Wiedmann <joe@ispsoft.de> rewrote the encoding and decoding in C
992       by implementing a simple finite-state machine and added the variable
993       quote, escape and separator characters, the binary mode and the print
994       and getline methods. See ChangeLog releases 0.10 through 0.23.
995
996       H.Merijn Brand <h.m.brand@xs4all.nl> cleaned up the code, added the
997       field flags methods, wrote the major part of the test suite, completed
998       the documentation, fixed some RT bugs and added all the allow flags.
999       See ChangeLog releases 0.25 and on.

COPYRIGHT AND LICENSE

1002       Copyright (C) 2007-2010 H.Merijn Brand for PROCURA B.V.  Copyright (C)
1003       1998-2001 Jochen Wiedmann. All rights reserved.  Portions Copyright (C)
1004       1997 Alan Citterman. All rights reserved.
1005
1006       This library is free software; you can redistribute it and/or modify it
1007       under the same terms as Perl itself.
1008
1009
1010
1011perl v5.12.0                      2010-03-15                         CSV_XS(3)