CSV_XS(3)             User Contributed Perl Documentation            CSV_XS(3)
2
3
4

NAME

6       Text::CSV_XS - comma-separated values manipulation routines
7

SYNOPSIS

9        # Functional interface
10        use Text::CSV_XS qw( csv );
11
12        # Read whole file in memory
13        my $aoa = csv (in => "data.csv");    # as array of array
14        my $aoh = csv (in => "data.csv",
15                       headers => "auto");   # as array of hash
16
17        # Write array of arrays as csv file
18        csv (in => $aoa, out => "file.csv", sep_char => ";");
19
20        # Only show lines where "code" is odd
21        csv (in => "data.csv", filter => { code => sub { $_ % 2 }});
22
23
24        # Object interface
25        use Text::CSV_XS;
26
27        my @rows;
28        # Read/parse CSV
29        my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1 });
30        open my $fh, "<:encoding(utf8)", "test.csv" or die "test.csv: $!";
31        while (my $row = $csv->getline ($fh)) {
32            $row->[2] =~ m/pattern/ or next; # 3rd field should match
33            push @rows, $row;
34            }
35        close $fh;
36
37        # and write as CSV
38        open $fh, ">:encoding(utf8)", "new.csv" or die "new.csv: $!";
39        $csv->say ($fh, $_) for @rows;
40        close $fh or die "new.csv: $!";
41

DESCRIPTION

43       Text::CSV_XS  provides facilities for the composition  and
44       decomposition of comma-separated values.  An instance of the
45       Text::CSV_XS class will combine fields into a "CSV" string and parse a
46       "CSV" string into fields.
47
       The module accepts either strings or files as input  and supports the
       use of user-specified characters for delimiters, separators, and
       escapes.
51
52   Embedded newlines
53       Important Note:  The default behavior is to accept only ASCII
54       characters in the range from 0x20 (space) to 0x7E (tilde).   This means
55       that the fields can not contain newlines. If your data contains
56       newlines embedded in fields, or characters above 0x7E (tilde), or
57       binary data, you must set "binary => 1" in the call to "new". To cover
58       the widest range of parsing options, you will always want to set
59       binary.
60
       But you still have the problem  that you have to pass a correct line
       to the "parse" method, which is harder than it may seem when reading
       the input line by line:
64
65        my $csv = Text::CSV_XS->new ({ binary => 1, eol => $/ });
66        while (<>) {           #  WRONG!
67            $csv->parse ($_);
68            my @fields = $csv->fields ();
69            }
70
       This will break, as the "while" might read partial records:  it does
       not care about the quoting. If you need to support embedded newlines,
       the way to go is to  not  pass "eol" to the parser  (it accepts "\n",
       "\r", and "\r\n" by default) and use "getline" instead:
75
76        my $csv = Text::CSV_XS->new ({ binary => 1 });
77        open my $fh, "<", $file or die "$file: $!";
78        while (my $row = $csv->getline ($fh)) {
79            my @fields = @$row;
80            }
81
82       The old(er) way of using global file handles is still supported
83
84        while (my $row = $csv->getline (*ARGV)) { ... }
85
86   Unicode
87       Unicode is only tested to work with perl-5.8.2 and up.
88
89       See also "BOM".
90
91       The simplest way to ensure the correct encoding is used for  in- and
92       output is by either setting layers on the filehandles, or setting the
93       "encoding" argument for "csv".
94
95        open my $fh, "<:encoding(UTF-8)", "in.csv"  or die "in.csv: $!";
96       or
97        my $aoa = csv (in => "in.csv",     encoding => "UTF-8");
98
99        open my $fh, ">:encoding(UTF-8)", "out.csv" or die "out.csv: $!";
100       or
101        csv (in => $aoa, out => "out.csv", encoding => "UTF-8");
102
       On parsing (both for  "getline" and  "parse"),  if the source is
       marked as being UTF8,  then all fields that are marked binary will
       also be marked UTF8.
106
       On combining ("print"  and  "combine"):  if any of the combining
       fields was marked UTF8, the resulting string will be marked as UTF8.
       Note however that any field  before  the first field marked UTF8 that
       contained 8-bit characters not upgraded to UTF8  will remain  "bytes"
       in the resulting string,  possibly causing unexpected errors.  If you
       pass data of different encodings,  or you don't know whether the
       encodings differ, force an upgrade before you pass the data on:
115
116        $csv->print ($fh, [ map { utf8::upgrade (my $x = $_); $x } @data ]);
117
118       For complete control over encoding, please use Text::CSV::Encoded:
119
120        use Text::CSV::Encoded;
121        my $csv = Text::CSV::Encoded->new ({
122            encoding_in  => "iso-8859-1", # the encoding comes into   Perl
123            encoding_out => "cp1252",     # the encoding comes out of Perl
124            });
125
126        $csv = Text::CSV::Encoded->new ({ encoding  => "utf8" });
127        # combine () and print () accept *literally* utf8 encoded data
128        # parse () and getline () return *literally* utf8 encoded data
129
130        $csv = Text::CSV::Encoded->new ({ encoding  => undef }); # default
131        # combine () and print () accept UTF8 marked data
132        # parse () and getline () return UTF8 marked data
133
134   BOM
135       BOM  (or Byte Order Mark)  handling is available only inside the
136       "header" method.   This method supports the following encodings:
137       "utf-8", "utf-1", "utf-32be", "utf-32le", "utf-16be", "utf-16le",
138       "utf-ebcdic", "scsu", "bocu-1", and "gb-18030". See Wikipedia
139       <https://en.wikipedia.org/wiki/Byte_order_mark>.
140
141       If a file has a BOM, the easiest way to deal with that is
142
143        my $aoh = csv (in => $file, detect_bom => 1);
144
       All records will be decoded according to the detected BOM.
146
147       This implies a call to the  "header"  method,  which defaults to also
148       set the "column_names". So this is not the same as
149
150        my $aoh = csv (in => $file, headers => "auto");
151
152       which only reads the first record to set  "column_names"  but ignores
       any BOM that might be present.
154
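       When using the object interface,  the same can be achieved with the
       "header" method described below;  a minimal sketch,  assuming the file
       has a header line:

        my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1 });
        open my $fh, "<", $file or die "$file: $!";
        $csv->header ($fh, { detect_bom => 1 });  # reads BOM, sep and header
        while (my $row = $csv->getline_hr ($fh)) {
            # ...
            }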

SPECIFICATION

156       While no formal specification for CSV exists, RFC 4180
157       <https://datatracker.ietf.org/doc/html/rfc4180> (1) describes the
158       common format and establishes  "text/csv" as the MIME type registered
159       with the IANA. RFC 7111 <https://datatracker.ietf.org/doc/html/rfc7111>
160       (2) adds fragments to CSV.
161
162       Many informal documents exist that describe the "CSV" format.   "How
163       To: The Comma Separated Value (CSV) File Format"
164       <http://creativyst.com/Doc/Articles/CSV/CSV01.shtml> (3)  provides an
165       overview of the  "CSV"  format in the most widely used applications and
166       explains how it can best be used and supported.
167
168        1) https://datatracker.ietf.org/doc/html/rfc4180
169        2) https://datatracker.ietf.org/doc/html/rfc7111
170        3) http://creativyst.com/Doc/Articles/CSV/CSV01.shtml
171
172       The basic rules are as follows:
173
174       CSV  is a delimited data format that has fields/columns separated by
175       the comma character and records/rows separated by newlines. Fields that
       contain a special character (comma, newline, or double quote)  must be
177       enclosed in double quotes. However, if a line contains a single entry
178       that is the empty string, it may be enclosed in double quotes.  If a
179       field's value contains a double quote character it is escaped by
180       placing another double quote character next to it. The "CSV" file
181       format does not require a specific character encoding, byte order, or
182       line terminator format.
183
184       • Each record is a single line ended by a line feed  (ASCII/"LF"=0x0A)
185         or a carriage return and line feed pair (ASCII/"CRLF"="0x0D 0x0A"),
186         however, line-breaks may be embedded.
187
188       • Fields are separated by commas.
189
190       • Allowable characters within a "CSV" field include 0x09 ("TAB") and
191         the inclusive range of 0x20 (space) through 0x7E (tilde).  In binary
192         mode all characters are accepted, at least in quoted fields.
193
194       • A field within  "CSV"  must be surrounded by  double-quotes to
195         contain  a separator character (comma).
196
197       Though this is the most clear and restrictive definition,  Text::CSV_XS
198       is way more liberal than this, and allows extension:
199
200       • Line termination by a single carriage return is accepted by default
201
       • The separation-, quotation-, and escape- characters can be any ASCII
         character in the range from  0x20 (space) to  0x7E (tilde).
         Characters outside this range may or may not work as expected.
         Multibyte characters, like UTF "U+060C" (ARABIC COMMA),   "U+FF0C"
         (FULLWIDTH COMMA),  "U+241B" (SYMBOL FOR ESCAPE), "U+2424" (SYMBOL
         FOR NEWLINE), "U+FF02" (FULLWIDTH QUOTATION MARK), and "U+201C" (LEFT
         DOUBLE QUOTATION MARK) (to give some examples of what might look
         promising) work for newer versions of perl for "sep_char" and
         "quote_char",  but not for "escape_char"  (see the sketch following
         this list).
211
212         If you use perl-5.8.2 or higher these three attributes are
213         utf8-decoded, to increase the likelihood of success. This way
214         "U+00FE" will be allowed as a quote character.
215
216       • A field in  "CSV"  must be surrounded by double-quotes to make an
217         embedded double-quote, represented by a pair of consecutive double-
218         quotes, valid. In binary mode you may additionally use the sequence
219         ""0" for representation of a NULL byte. Using 0x00 in binary mode is
220         just as valid.
221
222       • Several violations of the above specification may be lifted by
223         passing some options as attributes to the object constructor.
224
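       As an illustration of the multibyte separators mentioned above,  a
       minimal sketch that parses a file using the FULLWIDTH COMMA as
       separator (the file name and its encoding are assumptions):

        use Text::CSV_XS;

        my $csv = Text::CSV_XS->new ({
            binary => 1,
            sep    => "\N{FULLWIDTH COMMA}",  # U+FF0C, see "sep" below
            });
        open my $fh, "<:encoding(UTF-8)", "wide.csv" or die "wide.csv: $!";
        while (my $row = $csv->getline ($fh)) {
            # ...
            }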

METHODS

226   version
227       (Class method) Returns the current module version.
228
229   new
230       (Class method) Returns a new instance of class Text::CSV_XS. The
231       attributes are described by the (optional) hash ref "\%attr".
232
233        my $csv = Text::CSV_XS->new ({ attributes ... });
234
235       The following attributes are available:
236
237       eol
238
239        my $csv = Text::CSV_XS->new ({ eol => $/ });
240                  $csv->eol (undef);
241        my $eol = $csv->eol;
242
243       The end-of-line string to add to rows for "print" or the record
244       separator for "getline".
245
246       When not passed in a parser instance,  the default behavior is to
247       accept "\n", "\r", and "\r\n", so it is probably safer to not specify
248       "eol" at all. Passing "undef" or the empty string behave the same.
249
250       When not passed in a generating instance,  records are not terminated
251       at all, so it is probably wise to pass something you expect. A safe
252       choice for "eol" on output is either $/ or "\r\n".
253
254       Common values for "eol" are "\012" ("\n" or Line Feed),  "\015\012"
255       ("\r\n" or Carriage Return, Line Feed),  and "\015"  ("\r" or Carriage
256       Return). The "eol" attribute cannot exceed 7 (ASCII) characters.
257
       If both $/ and "eol" equal "\015", lines that end in only a Carriage
       Return without a Line Feed will be "parse"d correctly.
260
261       sep_char
262
263        my $csv = Text::CSV_XS->new ({ sep_char => ";" });
264                $csv->sep_char (";");
265        my $c = $csv->sep_char;
266
       The char used to separate fields, by default a comma (",").  Limited
268       to a single-byte character, usually in the range from 0x20 (space) to
269       0x7E (tilde). When longer sequences are required, use "sep".
270
271       The separation character can not be equal to the quote character  or to
272       the escape character.
273
274       See also "CAVEATS"
275
276       sep
277
278        my $csv = Text::CSV_XS->new ({ sep => "\N{FULLWIDTH COMMA}" });
279                  $csv->sep (";");
280        my $sep = $csv->sep;
281
282       The chars used to separate fields, by default undefined. Limited to 8
283       bytes.
284
285       When set, overrules "sep_char".  If its length is one byte it acts as
286       an alias to "sep_char".
287
288       See also "CAVEATS"
289
290       quote_char
291
292        my $csv = Text::CSV_XS->new ({ quote_char => "'" });
293                $csv->quote_char (undef);
294        my $c = $csv->quote_char;
295
296       The character to quote fields containing blanks or binary data,  by
297       default the double quote character (""").  A value of undef suppresses
298       quote chars (for simple cases only). Limited to a single-byte
299       character, usually in the range from  0x20 (space) to  0x7E (tilde).
300       When longer sequences are required, use "quote".
301
302       "quote_char" can not be equal to "sep_char".
303
304       quote
305
306        my $csv = Text::CSV_XS->new ({ quote => "\N{FULLWIDTH QUOTATION MARK}" });
307                    $csv->quote ("'");
308        my $quote = $csv->quote;
309
310       The chars used to quote fields, by default undefined. Limited to 8
311       bytes.
312
313       When set, overrules "quote_char". If its length is one byte it acts as
314       an alias to "quote_char".
315
316       This method does not support "undef".  Use "quote_char" to disable
317       quotation.
318
319       See also "CAVEATS"
320
321       escape_char
322
323        my $csv = Text::CSV_XS->new ({ escape_char => "\\" });
324                $csv->escape_char (":");
325        my $c = $csv->escape_char;
326
327       The character to  escape  certain characters inside quoted fields.
328       This is limited to a  single-byte  character,  usually  in the  range
329       from  0x20 (space) to 0x7E (tilde).
330
331       The "escape_char" defaults to being the double-quote mark ("""). In
332       other words the same as the default "quote_char". This means that
333       doubling the quote mark in a field escapes it:
334
335        "foo","bar","Escape ""quote mark"" with two ""quote marks""","baz"
336
337       If  you  change  the   "quote_char"  without  changing  the
338       "escape_char",  the  "escape_char" will still be the double-quote
339       (""").  If instead you want to escape the  "quote_char" by doubling it
340       you will need to also change the  "escape_char"  to be the same as what
341       you have changed the "quote_char" to.
342
343       Setting "escape_char" to "undef" or "" will completely disable escapes
344       and is greatly discouraged. This will also disable "escape_null".
345
346       The escape character can not be equal to the separation character.
347
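       For example,  a minimal sketch that switches to single-quote quoting
       and keeps the convention of escaping an embedded quote by doubling it:

        my $csv = Text::CSV_XS->new ({
            quote_char  => "'",
            escape_char => "'",  # escape the new quote by doubling it
            });
        $csv->combine ("It's", 42);
        print $csv->string, "\n";  # 'It''s',42
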
348       binary
349
350        my $csv = Text::CSV_XS->new ({ binary => 1 });
351                $csv->binary (0);
352        my $f = $csv->binary;
353
354       If this attribute is 1,  you may use binary characters in quoted
355       fields, including line feeds, carriage returns and "NULL" bytes. (The
356       latter could be escaped as ""0".) By default this feature is off.
357
358       If a string is marked UTF8,  "binary" will be turned on automatically
359       when binary characters other than "CR" and "NL" are encountered.   Note
360       that a simple string like "\x{00a0}" might still be binary, but not
361       marked UTF8, so setting "{ binary => 1 }" is still a wise option.
362
363       strict
364
365        my $csv = Text::CSV_XS->new ({ strict => 1 });
366                $csv->strict (0);
367        my $f = $csv->strict;
368
369       If this attribute is set to 1, any row that parses to a different
370       number of fields than the previous row will cause the parser to throw
371       error 2014.
372
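       A minimal sketch of combining this with "auto_diag" so an inconsistent
       record aborts the run (the file name is an assumption):

        my $csv = Text::CSV_XS->new ({ strict => 1, auto_diag => 2 });
        open my $fh, "<", "data.csv" or die "data.csv: $!";
        while (my $row = $csv->getline ($fh)) {
            # dies with error 2014 on a row with a different field count
            }
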
373       skip_empty_rows
374
375        my $csv = Text::CSV_XS->new ({ skip_empty_rows => 1 });
376                $csv->skip_empty_rows ("eof");
377        my $f = $csv->skip_empty_rows;
378
379       This attribute defines the behavior for empty rows:  an "eol"
380       immediately following the start of line. Default behavior is to return
381       one single empty field.
382
383       This attribute is only used in parsing.  This attribute is ineffective
384       when using "parse" and "fields".
385
386       Possible values for this attribute are
387
388       0 | undef
389          my $csv = Text::CSV_XS->new ({ skip_empty_rows => 0 });
390          $csv->skip_empty_rows (undef);
391
392         No special action is taken. The result will be one single empty
393         field.
394
395       1 | "skip"
396          my $csv = Text::CSV_XS->new ({ skip_empty_rows => 1 });
397          $csv->skip_empty_rows ("skip");
398
399         The row will be skipped.
400
401       2 | "eof" | "stop"
402          my $csv = Text::CSV_XS->new ({ skip_empty_rows => 2 });
403          $csv->skip_empty_rows ("eof");
404
405         The parsing will stop as if an "eof" was detected.
406
407       3 | "die"
408          my $csv = Text::CSV_XS->new ({ skip_empty_rows => 3 });
409          $csv->skip_empty_rows ("die");
410
411         The parsing will stop.  The internal error code will be set to 2015
412         and the parser will "die".
413
414       4 | "croak"
415          my $csv = Text::CSV_XS->new ({ skip_empty_rows => 4 });
416          $csv->skip_empty_rows ("croak");
417
418         The parsing will stop.  The internal error code will be set to 2015
419         and the parser will "croak".
420
421       5 | "error"
422          my $csv = Text::CSV_XS->new ({ skip_empty_rows => 5 });
423          $csv->skip_empty_rows ("error");
424
425         The parsing will fail.  The internal error code will be set to 2015.
426
427       callback
428          my $csv = Text::CSV_XS->new ({ skip_empty_rows => sub { [] } });
429          $csv->skip_empty_rows (sub { [ 42, $., undef, "empty" ] });
430
431         The callback is invoked and its result used instead.  If you want the
432         parse to stop after the callback, make sure to return a false value.
433
434         The returned value from the callback should be an array-ref. Any
435         other type will cause the parse to stop, so these are equivalent in
436         behavior:
437
438          csv (in => $fh, skip_empty_rows => "stop");
          csv (in => $fh, skip_empty_rows => sub { 0; });
440
441       Without arguments, the current value is returned: 0, 1, "eof", "die",
442       "croak" or the callback.
443
444       formula_handling
445
446       Alias for "formula"
447
448       formula
449
450        my $csv = Text::CSV_XS->new ({ formula => "none" });
451                $csv->formula ("none");
452        my $f = $csv->formula;
453
454       This defines the behavior of fields containing formulas. As formulas
455       are considered dangerous in spreadsheets, this attribute can define an
456       optional action to be taken if a field starts with an equal sign ("=").
457
458       For purpose of code-readability, this can also be written as
459
460        my $csv = Text::CSV_XS->new ({ formula_handling => "none" });
461                $csv->formula_handling ("none");
462        my $f = $csv->formula_handling;
463
464       Possible values for this attribute are
465
466       none
467         Take no specific action. This is the default.
468
469          $csv->formula ("none");
470
471       die
472         Cause the process to "die" whenever a leading "=" is encountered.
473
474          $csv->formula ("die");
475
476       croak
477         Cause the process to "croak" whenever a leading "=" is encountered.
478         (See Carp)
479
480          $csv->formula ("croak");
481
482       diag
483         Report position and content of the field whenever a leading  "=" is
484         found.  The value of the field is unchanged.
485
486          $csv->formula ("diag");
487
488       empty
489         Replace the content of fields that start with a "=" with the empty
490         string.
491
492          $csv->formula ("empty");
493          $csv->formula ("");
494
495       undef
496         Replace the content of fields that start with a "=" with "undef".
497
498          $csv->formula ("undef");
499          $csv->formula (undef);
500
501       a callback
         Replace the content of fields that start with a  "="  with the
         return value of the callback.  The original content of the field is
         available inside the callback as $_.
505
          # Replace all formulas with 42
507          $csv->formula (sub { 42; });
508
509          # same as $csv->formula ("empty") but slower
510          $csv->formula (sub { "" });
511
512          # Allow =4+12
513          $csv->formula (sub { s/^=(\d+\+\d+)$/$1/eer });
514
515          # Allow more complex calculations
516          $csv->formula (sub { eval { s{^=([-+*/0-9()]+)$}{$1}ee }; $_ });
517
       All other values will give a warning and then fall back to "diag".
519
520       decode_utf8
521
522        my $csv = Text::CSV_XS->new ({ decode_utf8 => 1 });
523                $csv->decode_utf8 (0);
524        my $f = $csv->decode_utf8;
525
       This attribute defaults to TRUE.
527
       While parsing,  fields that are valid UTF-8 are automatically set to
       be UTF-8, so that
530
531         $csv->parse ("\xC4\xA8\n");
532
533       results in
534
535         PV("\304\250"\0) [UTF8 "\x{128}"]
536
       Sometimes this might not be the desired behavior.  To prevent those
       upgrades, set this attribute to false, and the result will be
539
540         PV("\304\250"\0)
541
542       auto_diag
543
544        my $csv = Text::CSV_XS->new ({ auto_diag => 1 });
545                $csv->auto_diag (2);
546        my $l = $csv->auto_diag;
547
       Setting this attribute to a number between 1 and 9 causes
       "error_diag" to be automatically called in void context upon errors.
550
551       In case of error "2012 - EOF", this call will be void.
552
553       If "auto_diag" is set to a numeric value greater than 1, it will "die"
554       on errors instead of "warn".  If set to anything unrecognized,  it will
555       be silently ignored.
556
       Future extensions to this feature will include more reliable auto-
       detection of  "autodie"  being active in the scope in which the error
       occurred,  which will increment the value of "auto_diag" by 1 the
       moment the error is detected.
561
562       diag_verbose
563
564        my $csv = Text::CSV_XS->new ({ diag_verbose => 1 });
565                $csv->diag_verbose (2);
566        my $l = $csv->diag_verbose;
567
568       Set the verbosity of the output triggered by "auto_diag".   Currently
569       only adds the current  input-record-number  (if known)  to the
570       diagnostic output with an indication of the position of the error.
571
572       blank_is_undef
573
574        my $csv = Text::CSV_XS->new ({ blank_is_undef => 1 });
575                $csv->blank_is_undef (0);
576        my $f = $csv->blank_is_undef;
577
578       Under normal circumstances, "CSV" data makes no distinction between
579       quoted- and unquoted empty fields.  These both end up in an empty
580       string field once read, thus
581
582        1,"",," ",2
583
584       is read as
585
586        ("1", "", "", " ", "2")
587
588       When writing  "CSV" files with either  "always_quote" or  "quote_empty"
589       set, the unquoted  empty field is the result of an undefined value.
590       To enable this distinction when  reading "CSV"  data,  the
591       "blank_is_undef"  attribute will cause  unquoted empty fields to be set
592       to "undef", causing the above to be parsed as
593
594        ("1", "", undef, " ", "2")
595
596       Note that this is specifically important when loading  "CSV" fields
597       into a database that allows "NULL" values,  as the perl equivalent for
598       "NULL" is "undef" in DBI land.
599
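       A minimal sketch of such a database load,  assuming an open database
       handle $dbh, an open file handle $fh, and a hypothetical table layout:

        my $csv = Text::CSV_XS->new ({ binary => 1, blank_is_undef => 1 });
        my $sth = $dbh->prepare ("insert into foo values (?, ?, ?, ?, ?)");
        while (my $row = $csv->getline ($fh)) {
            $sth->execute (@$row);  # undef fields become SQL NULL
            }
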
600       empty_is_undef
601
602        my $csv = Text::CSV_XS->new ({ empty_is_undef => 1 });
603                $csv->empty_is_undef (0);
604        my $f = $csv->empty_is_undef;
605
606       Going one  step  further  than  "blank_is_undef",  this attribute
607       converts all empty fields to "undef", so
608
609        1,"",," ",2
610
611       is read as
612
613        (1, undef, undef, " ", 2)
614
615       Note that this affects only fields that are  originally  empty,  not
616       fields that are empty after stripping allowed whitespace. YMMV.
617
618       allow_whitespace
619
620        my $csv = Text::CSV_XS->new ({ allow_whitespace => 1 });
621                $csv->allow_whitespace (0);
622        my $f = $csv->allow_whitespace;
623
624       When this option is set to true,  the whitespace  ("TAB"'s and
625       "SPACE"'s) surrounding  the  separation character  is removed when
626       parsing.  If either "TAB" or "SPACE" is one of the three characters
627       "sep_char", "quote_char", or "escape_char" it will not be considered
628       whitespace.
629
630       Now lines like:
631
632        1 , "foo" , bar , 3 , zapp
633
634       are parsed as valid "CSV", even though it violates the "CSV" specs.
635
       Note that  all  whitespace is stripped from both the start and the end
       of each field.  That makes this more than just a feature for parsing
       bad "CSV" lines, as
639
640        1,   2.0,  3,   ape  , monkey
641
642       will now be parsed as
643
644        ("1", "2.0", "3", "ape", "monkey")
645
646       even if the original line was perfectly acceptable "CSV".
647
648       allow_loose_quotes
649
650        my $csv = Text::CSV_XS->new ({ allow_loose_quotes => 1 });
651                $csv->allow_loose_quotes (0);
652        my $f = $csv->allow_loose_quotes;
653
654       By default, parsing unquoted fields containing "quote_char" characters
655       like
656
657        1,foo "bar" baz,42
658
659       would result in parse error 2034.  Though it is still bad practice to
660       allow this format,  we  cannot  help  the  fact  that  some  vendors
661       make  their applications spit out lines styled this way.
662
663       If there is really bad "CSV" data, like
664
665        1,"foo "bar" baz",42
666
667       or
668
669        1,""foo bar baz"",42
670
671       there is a way to get this data-line parsed and leave the quotes inside
672       the quoted field as-is.  This can be achieved by setting
673       "allow_loose_quotes" AND making sure that the "escape_char" is  not
674       equal to "quote_char".
675
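       A minimal sketch of such a setup  (the choice of the backslash as
       escape is only an example):

        my $csv = Text::CSV_XS->new ({
            binary             => 1,
            allow_loose_quotes => 1,
            escape_char        => "\\",  # anything but the quote_char
            });
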
676       allow_loose_escapes
677
678        my $csv = Text::CSV_XS->new ({ allow_loose_escapes => 1 });
679                $csv->allow_loose_escapes (0);
680        my $f = $csv->allow_loose_escapes;
681
682       Parsing fields  that  have  "escape_char"  characters that escape
683       characters that do not need to be escaped, like:
684
685        my $csv = Text::CSV_XS->new ({ escape_char => "\\" });
686        $csv->parse (qq{1,"my bar\'s",baz,42});
687
688       would result in parse error 2025.   Though it is bad practice to allow
       this format,  this attribute enables you to treat all escape-character
       sequences equally.
691
692       allow_unquoted_escape
693
694        my $csv = Text::CSV_XS->new ({ allow_unquoted_escape => 1 });
695                $csv->allow_unquoted_escape (0);
696        my $f = $csv->allow_unquoted_escape;
697
       A backward compatibility issue where "escape_char" differs from
       "quote_char"  prevents  "escape_char"  from being in the first
       position of a field.  If "quote_char" is equal to the default """ and
       "escape_char" is set to "\", this would be illegal:
702
703        1,\0,2
704
705       Setting this attribute to 1  might help to overcome issues with
706       backward compatibility and allow this style.
707
708       always_quote
709
710        my $csv = Text::CSV_XS->new ({ always_quote => 1 });
711                $csv->always_quote (0);
712        my $f = $csv->always_quote;
713
714       By default the generated fields are quoted only if they need to be.
715       For example, if they contain the separator character. If you set this
716       attribute to 1 then all defined fields will be quoted. ("undef" fields
       are not quoted, see "blank_is_undef").  This quite often makes it
       easier to handle exported data in external applications.   (Poor
       creatures who would be better off using Text::CSV_XS. :)
720
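       A minimal sketch of the effect on output:

        my $csv = Text::CSV_XS->new ({ always_quote => 1 });
        $csv->combine ("foo", 1, undef, "");
        print $csv->string, "\n";  # "foo","1",,""
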
721       quote_space
722
723        my $csv = Text::CSV_XS->new ({ quote_space => 1 });
724                $csv->quote_space (0);
725        my $f = $csv->quote_space;
726
       By default,  a space in a field would trigger quotation.  As no rule
       in "CSV" requires this,  nor any forbids it,  the default is true for
       safety.   You can exclude the space  from this trigger  by setting
       this attribute to 0.
731
732       quote_empty
733
734        my $csv = Text::CSV_XS->new ({ quote_empty => 1 });
735                $csv->quote_empty (0);
736        my $f = $csv->quote_empty;
737
738       By default the generated fields are quoted only if they need to be.
739       An empty (defined) field does not need quotation. If you set this
740       attribute to 1 then empty defined fields will be quoted.  ("undef"
741       fields are not quoted, see "blank_is_undef"). See also "always_quote".
742
743       quote_binary
744
745        my $csv = Text::CSV_XS->new ({ quote_binary => 1 });
746                $csv->quote_binary (0);
747        my $f = $csv->quote_binary;
748
749       By default,  all "unsafe" bytes inside a string cause the combined
750       field to be quoted.  By setting this attribute to 0, you can disable
751       that trigger for bytes ">= 0x7F".
752
753       escape_null
754
755        my $csv = Text::CSV_XS->new ({ escape_null => 1 });
756                $csv->escape_null (0);
757        my $f = $csv->escape_null;
758
759       By default, a "NULL" byte in a field would be escaped. This option
760       enables you to treat the  "NULL"  byte as a simple binary character in
       binary mode (when "{ binary => 1 }" is set).  The default is true.  You
762       can prevent "NULL" escapes by setting this attribute to 0.
763
764       When the "escape_char" attribute is set to undefined,  this attribute
765       will be set to false.
766
767       The default setting will encode "=\x00=" as
768
769        "="0="
770
       With "escape_null" set to a false value, this will result in
772
773        "=\x00="
774
775       The default when using the "csv" function is "false".
776
777       For backward compatibility reasons,  the deprecated old name
778       "quote_null" is still recognized.
779
780       keep_meta_info
781
782        my $csv = Text::CSV_XS->new ({ keep_meta_info => 1 });
783                $csv->keep_meta_info (0);
784        my $f = $csv->keep_meta_info;
785
786       By default, the parsing of input records is as simple and fast as
787       possible.  However,  some parsing information - like quotation of the
788       original field - is lost in that process.  Setting this flag to true
789       enables retrieving that information after parsing with  the methods
790       "meta_info",  "is_quoted", and "is_binary" described below.  Default is
791       false for performance.
792
793       If you set this attribute to a value greater than 9,   then you can
       control output quotation style like it was used in the input of the
795       last parsed record (unless quotation was added because of other
796       reasons).
797
798        my $csv = Text::CSV_XS->new ({
799           binary         => 1,
800           keep_meta_info => 1,
801           quote_space    => 0,
802           });
803
        $csv->parse (q{1,,"", ," ",f,"g","h""h",help,"help"});
        my @row = $csv->fields;
805
806        $csv->print (*STDOUT, \@row);
807        # 1,,, , ,f,g,"h""h",help,help
808        $csv->keep_meta_info (11);
809        $csv->print (*STDOUT, \@row);
810        # 1,,"", ," ",f,"g","h""h",help,"help"
811
812       undef_str
813
814        my $csv = Text::CSV_XS->new ({ undef_str => "\\N" });
815                $csv->undef_str (undef);
816        my $s = $csv->undef_str;
817
818       This attribute optionally defines the output of undefined fields. The
819       value passed is not changed at all, so if it needs quotation, the
820       quotation needs to be included in the value of the attribute.  Use with
821       caution, as passing a value like  ",",,,,"""  will for sure mess up
822       your output. The default for this attribute is "undef", meaning no
823       special treatment.
824
825       This attribute is useful when exporting  CSV data  to be imported in
826       custom loaders, like for MySQL, that recognize special sequences for
827       "NULL" data.
828
829       This attribute has no meaning when parsing CSV data.
830
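       A minimal sketch,  writing undefined fields as the "\N" marker that
       MySQL's loader recognizes ($aoa is assumed to hold the data):

        csv (in => $aoa, out => "out.csv", undef_str => "\\N");
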
831       comment_str
832
833        my $csv = Text::CSV_XS->new ({ comment_str => "#" });
834                $csv->comment_str (undef);
835        my $s = $csv->comment_str;
836
837       This attribute optionally defines a string to be recognized as comment.
838       If this attribute is defined,   all lines starting with this sequence
839       will not be parsed as CSV but skipped as comment.
840
841       This attribute has no meaning when generating CSV.
842
843       Comment strings that start with any of the special characters/sequences
844       are not supported (so it cannot start with any of "sep_char",
845       "quote_char", "escape_char", "sep", "quote", or "eol").
846
847       For convenience, "comment" is an alias for "comment_str".
848
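       A minimal sketch,  skipping all lines that start with "#" (the file
       name is an assumption):

        my $aoa = csv (in => "data.csv", comment_str => "#");
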
849       verbatim
850
851        my $csv = Text::CSV_XS->new ({ verbatim => 1 });
852                $csv->verbatim (0);
853        my $f = $csv->verbatim;
854
855       This is a quite controversial attribute to set,  but makes some hard
856       things possible.
857
858       The rationale behind this attribute is to tell the parser that the
859       normally special characters newline ("NL") and Carriage Return ("CR")
860       will not be special when this flag is set,  and be dealt with  as being
861       ordinary binary characters. This will ease working with data with
862       embedded newlines.
863
864       When  "verbatim"  is used with  "getline",  "getline"  auto-"chomp"'s
865       every line.
866
867       Imagine a file format like
868
869        M^^Hans^Janssen^Klas 2\n2A^Ja^11-06-2007#\r\n
870
       where the line ending is a very specific "#\r\n", and the sep_char is
872       a "^" (caret).   None of the fields is quoted,   but embedded binary
873       data is likely to be present. With the specific line ending, this
874       should not be too hard to detect.
875
       By default,  Text::CSV_XS'  parse function is instructed to only know
       about "\n" and "\r"  as legal line endings,  and so has to treat the
       embedded newline as a real  "end-of-line",  so that it can scan the
       next line if binary is true and the newline is inside a quoted field.
       With this option, we tell "parse" to treat "\n" as nothing more than a
       binary character.
882
883       For "parse" this means that the parser has no more idea about line
884       ending and "getline" "chomp"s line endings on reading.
885
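       A minimal sketch of a parser for the format described above,  assuming
       "#\r\n" can safely be used as "eol" and $fh is an open handle:

        my $csv = Text::CSV_XS->new ({
            binary   => 1,
            verbatim => 1,
            eol      => "#\r\n",
            sep_char => "^",
            });
        while (my $row = $csv->getline ($fh)) {
            # embedded "\n" stays inside the fields
            }
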
886       types
887
888       A set of column types; the attribute is immediately passed to the
889       "types" method.
890
891       callbacks
892
893       See the "Callbacks" section below.
894
895       accessors
896
897       To sum it up,
898
899        $csv = Text::CSV_XS->new ();
900
901       is equivalent to
902
903        $csv = Text::CSV_XS->new ({
904            eol                   => undef, # \r, \n, or \r\n
905            sep_char              => ',',
906            sep                   => undef,
907            quote_char            => '"',
908            quote                 => undef,
909            escape_char           => '"',
910            binary                => 0,
911            decode_utf8           => 1,
912            auto_diag             => 0,
913            diag_verbose          => 0,
914            blank_is_undef        => 0,
915            empty_is_undef        => 0,
916            allow_whitespace      => 0,
917            allow_loose_quotes    => 0,
918            allow_loose_escapes   => 0,
919            allow_unquoted_escape => 0,
920            always_quote          => 0,
921            quote_empty           => 0,
922            quote_space           => 1,
923            escape_null           => 1,
924            quote_binary          => 1,
925            keep_meta_info        => 0,
926            strict                => 0,
927            skip_empty_rows       => 0,
928            formula               => 0,
929            verbatim              => 0,
930            undef_str             => undef,
931            comment_str           => undef,
932            types                 => undef,
933            callbacks             => undef,
934            });
935
936       For all of the above mentioned flags, an accessor method is available
937       where you can inquire the current value, or change the value
938
939        my $quote = $csv->quote_char;
940        $csv->binary (1);
941
942       It is not wise to change these settings halfway through writing "CSV"
943       data to a stream. If however you want to create a new stream using the
944       available "CSV" object, there is no harm in changing them.
945
946       If the "new" constructor call fails,  it returns "undef",  and makes
947       the fail reason available through the "error_diag" method.
948
949        $csv = Text::CSV_XS->new ({ ecs_char => 1 }) or
950            die "".Text::CSV_XS->error_diag ();
951
952       "error_diag" will return a string like
953
954        "INI - Unknown attribute 'ecs_char'"
955
956   known_attributes
957        @attr = Text::CSV_XS->known_attributes;
958        @attr = Text::CSV_XS::known_attributes;
959        @attr = $csv->known_attributes;
960
961       This method will return an ordered list of all the supported
962       attributes as described above.   This can be useful for knowing what
963       attributes are valid in classes that use or extend Text::CSV_XS.
964
965   print
966        $status = $csv->print ($fh, $colref);
967
968       Similar to  "combine" + "string" + "print",  but much more efficient.
969       It expects an array ref as input  (not an array!)  and the resulting
970       string is not really  created,  but  immediately  written  to the  $fh
971       object, typically an IO handle or any other object that offers a
972       "print" method.
973
974       For performance reasons  "print"  does not create a result string,  so
975       all "string", "status", "fields", and "error_input" methods will return
976       undefined information after executing this method.
977
978       If $colref is "undef"  (explicit,  not through a variable argument) and
979       "bind_columns"  was used to specify fields to be printed,  it is
980       possible to make performance improvements, as otherwise data would have
981       to be copied as arguments to the method call:
982
983        $csv->bind_columns (\($foo, $bar));
984        $status = $csv->print ($fh, undef);
985
986       A short benchmark
987
988        my @data = ("aa" .. "zz");
989        $csv->bind_columns (\(@data));
990
991        $csv->print ($fh, [ @data ]);   # 11800 recs/sec
992        $csv->print ($fh,  \@data  );   # 57600 recs/sec
993        $csv->print ($fh,   undef  );   # 48500 recs/sec
994
995   say
996        $status = $csv->say ($fh, $colref);
997
998       Like "print", but "eol" defaults to "$\".
999
1000   print_hr
1001        $csv->print_hr ($fh, $ref);
1002
1003       Provides an easy way  to print a  $ref  (as fetched with "getline_hr")
1004       provided the column names are set with "column_names".
1005
1006       It is just a wrapper method with basic parameter checks over
1007
1008        $csv->print ($fh, [ map { $ref->{$_} } $csv->column_names ]);
1009
1010   combine
1011        $status = $csv->combine (@fields);
1012
1013       This method constructs a "CSV" record from  @fields,  returning success
1014       or failure.   Failure can result from lack of arguments or an argument
1015       that contains an invalid character.   Upon success,  "string" can be
1016       called to retrieve the resultant "CSV" string.  Upon failure,  the
1017       value returned by "string" is undefined and "error_input" could be
1018       called to retrieve the invalid argument.
1019
1020   string
1021        $line = $csv->string ();
1022
1023       This method returns the input to  "parse"  or the resultant "CSV"
1024       string of "combine", whichever was called more recently.
1025
1026   getline
1027        $colref = $csv->getline ($fh);
1028
1029       This is the counterpart to  "print",  as "parse"  is the counterpart to
       "combine":  it reads a row from the $fh  handle using the "getline"
       method associated with $fh  and parses this row into an array ref.
1032       This array ref is returned by the function or "undef" for failure.
1033       When $fh does not support "getline", you are likely to hit errors.
1034
1035       When fields are bound with "bind_columns" the return value is a
1036       reference to an empty list.
1037
1038       The "string", "fields", and "status" methods are meaningless again.
1039
1040   getline_all
1041        $arrayref = $csv->getline_all ($fh);
1042        $arrayref = $csv->getline_all ($fh, $offset);
1043        $arrayref = $csv->getline_all ($fh, $offset, $length);
1044
1045       This will return a reference to a list of getline ($fh) results.  In
1046       this call, "keep_meta_info" is disabled.  If $offset is negative, as
1047       with "splice", only the last  "abs ($offset)" records of $fh are taken
1048       into consideration. Parameters $offset and $length are expected to be
1049       integers. Non-integer values are interpreted as integer without check.
1050
1051       Given a CSV file with 10 lines:
1052
1053        lines call
1054        ----- ---------------------------------------------------------
1055        0..9  $csv->getline_all ($fh)         # all
1056        0..9  $csv->getline_all ($fh,  0)     # all
1057        8..9  $csv->getline_all ($fh,  8)     # start at 8
1058        -     $csv->getline_all ($fh,  0,  0) # start at 0 first 0 rows
1059        0..4  $csv->getline_all ($fh,  0,  5) # start at 0 first 5 rows
1060        4..5  $csv->getline_all ($fh,  4,  2) # start at 4 first 2 rows
1061        8..9  $csv->getline_all ($fh, -2)     # last 2 rows
1062        6..7  $csv->getline_all ($fh, -4,  2) # first 2 of last  4 rows
1063
1064   getline_hr
1065       The "getline_hr" and "column_names" methods work together  to allow you
1066       to have rows returned as hashrefs.  You must call "column_names" first
1067       to declare your column names.
1068
1069        $csv->column_names (qw( code name price description ));
1070        $hr = $csv->getline_hr ($fh);
1071        print "Price for $hr->{name} is $hr->{price} EUR\n";
1072
1073       "getline_hr" will croak if called before "column_names".
1074
1075       Note that  "getline_hr"  creates a hashref for every row and will be
1076       much slower than the combined use of "bind_columns"  and "getline" but
1077       still offering the same easy to use hashref inside the loop:
1078
1079        my @cols = @{$csv->getline ($fh)};
1080        $csv->column_names (@cols);
1081        while (my $row = $csv->getline_hr ($fh)) {
1082            print $row->{price};
1083            }
1084
1085       Could easily be rewritten to the much faster:
1086
1087        my @cols = @{$csv->getline ($fh)};
1088        my $row = {};
1089        $csv->bind_columns (\@{$row}{@cols});
1090        while ($csv->getline ($fh)) {
1091            print $row->{price};
1092            }
1093
1094       Your mileage may vary for the size of the data and the number of rows.
1095       With perl-5.14.2 the comparison for a 100_000 line file with 14
1096       columns:
1097
1098                   Rate hashrefs getlines
1099        hashrefs 1.00/s       --     -76%
1100        getlines 4.15/s     313%       --
1101
1102   getline_hr_all
1103        $arrayref = $csv->getline_hr_all ($fh);
1104        $arrayref = $csv->getline_hr_all ($fh, $offset);
1105        $arrayref = $csv->getline_hr_all ($fh, $offset, $length);
1106
1107       This will return a reference to a list of   getline_hr ($fh) results.
1108       In this call, "keep_meta_info" is disabled.
1109
1110   parse
1111        $status = $csv->parse ($line);
1112
1113       This method decomposes a  "CSV"  string into fields,  returning success
1114       or failure.   Failure can result from a lack of argument  or the given
1115       "CSV" string is improperly formatted.   Upon success, "fields" can be
1116       called to retrieve the decomposed fields. Upon failure calling "fields"
1117       will return undefined data and  "error_input"  can be called to
1118       retrieve  the invalid argument.
1119
1120       You may use the "types"  method for setting column types.  See "types"'
1121       description below.
1122
       The $line argument is supposed to be a simple scalar.  Passing
       anything else will cause a croak and set error 1500.
1125
1126   fragment
1127       This function tries to implement RFC7111  (URI Fragment Identifiers for
1128       the text/csv Media Type) -
1129       https://datatracker.ietf.org/doc/html/rfc7111
1130
1131        my $AoA = $csv->fragment ($fh, $spec);
1132
1133       In specifications,  "*" is used to specify the last item, a dash ("-")
1134       to indicate a range.   All indices are 1-based:  the first row or
1135       column has index 1. Selections can be combined with the semi-colon
1136       (";").
1137
1138       When using this method in combination with  "column_names",  the
1139       returned reference  will point to a  list of hashes  instead of a  list
1140       of lists.  A disjointed  cell-based combined selection  might return
       rows with a different number of columns,  making the use of hashes
       unpredictable.
1143
1144        $csv->column_names ("Name", "Age");
1145        my $AoH = $csv->fragment ($fh, "col=3;8");
1146
1147       If the "after_parse" callback is active,  it is also called on every
1148       line parsed and skipped before the fragment.
1149
1150       row
1151          row=4
1152          row=5-7
1153          row=6-*
1154          row=1-2;4;6-*
1155
1156       col
1157          col=2
1158          col=1-3
1159          col=4-*
1160          col=1-2;4;7-*
1161
1162       cell
1163         In cell-based selection, the comma (",") is used to pair row and
1164         column
1165
1166          cell=4,1
1167
1168         The range operator ("-") using "cell"s can be used to define top-left
1169         and bottom-right "cell" location
1170
1171          cell=3,1-4,6
1172
1173         The "*" is only allowed in the second part of a pair
1174
1175          cell=3,2-*,2    # row 3 till end, only column 2
1176          cell=3,2-3,*    # column 2 till end, only row 3
1177          cell=3,2-*,*    # strip row 1 and 2, and column 1
1178
1179         Cells and cell ranges may be combined with ";", possibly resulting in
1180         rows with different numbers of columns
1181
1182          cell=1,1-2,2;3,3-4,4;1,4;4,1
1183
1184         Disjointed selections will only return selected cells.   The cells
1185         that are not  specified  will  not  be  included  in the  returned
1186         set,  not even as "undef".  As an example given a "CSV" like
1187
1188          11,12,13,...19
1189          21,22,...28,29
1190          :            :
1191          91,...97,98,99
1192
1193         with "cell=1,1-2,2;3,3-4,4;1,4;4,1" will return:
1194
1195          11,12,14
1196          21,22
1197          33,34
1198          41,43,44
1199
         Overlapping cell-specs will return those cells only once,  so
1201         "cell=1,1-3,3;2,2-4,4;2,3;4,2" will return:
1202
1203          11,12,13
1204          21,22,23,24
1205          31,32,33,34
1206          42,43,44
1207
1208       RFC7111 <https://datatracker.ietf.org/doc/html/rfc7111> does  not
1209       allow different types of specs to be combined   (either "row" or "col"
1210       or "cell").  Passing an invalid fragment specification will croak and
1211       set error 2013.
1212
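       A minimal sketch,  selecting everything but the first row,  e.g. to
       skip a header line:

        my $aoa = $csv->fragment ($fh, "row=2-*");
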
1213   column_names
1214       Set the "keys" that will be used in the  "getline_hr"  calls.  If no
1215       keys (column names) are passed, it will return the current setting as a
1216       list.
1217
1218       "column_names" accepts a list of scalars  (the column names)  or a
1219       single array_ref, so you can pass the return value from "getline" too:
1220
1221        $csv->column_names ($csv->getline ($fh));
1222
1223       "column_names" does no checking on duplicates at all, which might lead
1224       to unexpected results.   Undefined entries will be replaced with the
1225       string "\cAUNDEF\cA", so
1226
1227        $csv->column_names (undef, "", "name", "name");
1228        $hr = $csv->getline_hr ($fh);
1229
1230       will set "$hr->{"\cAUNDEF\cA"}" to the 1st field,  "$hr->{""}" to the
1231       2nd field, and "$hr->{name}" to the 4th field,  discarding the 3rd
1232       field.
1233
1234       "column_names" croaks on invalid arguments.
1235
1236   header
1237       This method does NOT work in perl-5.6.x
1238
1239       Parse the CSV header and set "sep", column_names and encoding.
1240
1241        my @hdr = $csv->header ($fh);
1242        $csv->header ($fh, { sep_set => [ ";", ",", "|", "\t" ] });
1243        $csv->header ($fh, { detect_bom => 1, munge_column_names => "lc" });
1244
1245       The first argument should be a file handle.
1246
1247       This method resets some object properties,  as it is supposed to be
1248       invoked only once per file or stream.  It will leave attributes
1249       "column_names" and "bound_columns" alone if setting column names is
       disabled. Reading headers on previously processed objects might fail on
1251       perl-5.8.0 and older.
1252
1253       Assuming that the file opened for parsing has a header, and the header
1254       does not contain problematic characters like embedded newlines,   read
1255       the first line from the open handle then auto-detect whether the header
1256       separates the column names with a character from the allowed separator
1257       list.
1258
1259       If any of the allowed separators matches,  and none of the other
1260       allowed separators match,  set  "sep"  to that  separator  for the
1261       current CSV_XS instance and use it to parse the first line, map those
1262       to lowercase, and use that to set the instance "column_names":
1263
1264        my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1 });
1265        open my $fh, "<", "file.csv";
1266        binmode $fh; # for Windows
1267        $csv->header ($fh);
1268        while (my $row = $csv->getline_hr ($fh)) {
1269            ...
1270            }
1271
1272       If the header is empty,  contains more than one unique separator out of
1273       the allowed set,  contains empty fields,   or contains identical fields
1274       (after folding), it will croak with error 1010, 1011, 1012, or 1013
1275       respectively.
1276
1277       If the header contains embedded newlines or is not valid  CSV  in any
1278       other way, this method will croak and leave the parse error untouched.
1279
1280       A successful call to "header"  will always set the  "sep"  of the $csv
1281       object. This behavior can not be disabled.
1282
1283       return value
1284
1285       On error this method will croak.
1286
1287       In list context,  the headers will be returned whether they are used to
1288       set "column_names" or not.
1289
1290       In scalar context, the instance itself is returned.  Note: the values
1291       as found in the header will effectively be  lost if  "set_column_names"
1292       is false.
1293
1294       Options
1295
1296       sep_set
1297          $csv->header ($fh, { sep_set => [ ";", ",", "|", "\t" ] });
1298
1299         The list of legal separators defaults to "[ ";", "," ]" and can be
1300         changed by this option.  As this is probably the most often used
1301         option,  it can be passed on its own as an unnamed argument:
1302
1303          $csv->header ($fh, [ ";", ",", "|", "\t", "::", "\x{2063}" ]);
1304
1305         Multi-byte  sequences are allowed,  both multi-character and
1306         Unicode.  See "sep".
1307
1308       detect_bom
1309          $csv->header ($fh, { detect_bom => 1 });
1310
1311         The default behavior is to detect if the header line starts with a
1312         BOM.  If the header has a BOM, use that to set the encoding of $fh.
1313         This default behavior can be disabled by passing a false value to
1314         "detect_bom".
1315
1316         Supported encodings from BOM are: UTF-8, UTF-16BE, UTF-16LE,
1317         UTF-32BE,  and UTF-32LE. BOM also supports UTF-1, UTF-EBCDIC, SCSU,
1318         BOCU-1,  and GB-18030 but Encode does not (yet). UTF-7 is not
1319         supported.
1320
1321         If a supported BOM was detected as start of the stream, it is stored
1322         in the object attribute "ENCODING".
1323
1324          my $enc = $csv->{ENCODING};
1325
1326         The encoding is used with "binmode" on $fh.
1327
1328         If the handle was opened in a (correct) encoding,  this method will
1329         not alter the encoding, as it checks the leading bytes of the first
1330         line. In case the stream starts with a decoded BOM ("U+FEFF"),
1331         "{ENCODING}" will be "" (empty) instead of the default "undef".
1332
1333       munge_column_names
1334         This option offers the means to modify the column names into
1335         something that is most useful to the application.   The default is to
1336         map all column names to lower case.
1337
1338          $csv->header ($fh, { munge_column_names => "lc" });
1339
1340         The following values are available:
1341
1342           lc     - lower case
1343           uc     - upper case
1344           db     - valid DB field names
1345           none   - do not change
1346           \%hash - supply a mapping
1347           \&cb   - supply a callback
1348
1349         Lower case
1350            $csv->header ($fh, { munge_column_names => "lc" });
1351
1352           The header is changed to all lower-case
1353
1354            $_ = lc;
1355
1356         Upper case
1357            $csv->header ($fh, { munge_column_names => "uc" });
1358
1359           The header is changed to all upper-case
1360
1361            $_ = uc;
1362
1363         Literal
1364            $csv->header ($fh, { munge_column_names => "none" });
1365
1366         Hash
            $csv->header ($fh, { munge_column_names => { foo => "sombrero" }});
1368
           If a value does not exist, the original value is used unchanged.
1370
1371         Database
1372            $csv->header ($fh, { munge_column_names => "db" });
1373
1374           - lower-case
1375
1376           - all sequences of non-word characters are replaced with an
1377             underscore
1378
1379           - all leading underscores are removed
1380
1381            $_ = lc (s/\W+/_/gr =~ s/^_+//r);
1382
1383         Callback
1384            $csv->header ($fh, { munge_column_names => sub { fc } });
1385            $csv->header ($fh, { munge_column_names => sub { "column_".$col++ } });
1386            $csv->header ($fh, { munge_column_names => sub { lc (s/\W+/_/gr) } });
1387
1388           As this callback is called in a "map", you can use $_ directly.
1389
1390       set_column_names
1391          $csv->header ($fh, { set_column_names => 1 });
1392
         The default is to set the instance's column names using
         "column_names" if the method is successful,  so subsequent calls to
         "getline_hr" can return a hash.  Setting the column names can be
         disabled by passing a false value for this option.
1397
1398         As described in "return value" above, content is lost in scalar
1399         context.
1400
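             A minimal sketch that reads the header line without setting the
             column names:

              my @hdr = $csv->header ($fh, { set_column_names => 0 });
              # @hdr holds the (munged) header fields;  "getline_hr" cannot
              # be used until "column_names" has been set
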
1401       Validation
1402
1403       When receiving CSV files from external sources,  this method can be
1404       used to protect against changes in the layout by restricting to known
1405       headers  (and typos in the header fields).
1406
1407        my %known = (
1408            "record key" => "c_rec",
1409            "rec id"     => "c_rec",
1410            "id_rec"     => "c_rec",
1411            "kode"       => "code",
1412            "code"       => "code",
1413            "vaule"      => "value",
1414            "value"      => "value",
1415            );
1416        my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1 });
1417        open my $fh, "<", $source or die "$source: $!";
1418        $csv->header ($fh, { munge_column_names => sub {
1419            s/\s+$//;
1420            s/^\s+//;
1421            $known{lc $_} or die "Unknown column '$_' in $source";
1422            }});
1423        while (my $row = $csv->getline_hr ($fh)) {
1424            say join "\t", $row->{c_rec}, $row->{code}, $row->{value};
1425            }
1426
1427   bind_columns
1428       Takes a list of scalar references to be used for output with  "print"
1429       or to store in the fields fetched by "getline".  When you do not pass
1430       enough references to store the fetched fields in, "getline" will fail
1431       with error 3006.  If you pass more than there are fields to return,
1432       the content of the remaining references is left untouched.
1433
1434        $csv->bind_columns (\$code, \$name, \$price, \$description);
1435        while ($csv->getline ($fh)) {
1436            print "The price of a $name is \x{20ac} $price\n";
1437            }
1438
1439       To reset or clear all column binding, call "bind_columns" with the
1440       single argument "undef". This will also clear column names.
1441
1442        $csv->bind_columns (undef);
1443
1444       If no arguments are passed at all, "bind_columns" will return the list
1445       of current bindings or "undef" if no binds are active.
1446
1447       Note that in parsing with  "bind_columns",  the fields are set on the
1448       fly.  That implies that if the third field of a row causes an error
1449       (or this row has just two fields where the previous row had more),  the
1450       first two fields already have been assigned the values of the current
1451       row, while the rest of the fields will still hold the values of the
1452       previous row.  If you want the parser to fail in these cases, use the
1453       "strict" attribute.
1454
1455   eof
1456        $eof = $csv->eof ();
1457
1458       If "parse" or  "getline"  was used with an IO stream,  this method will
1459       return true (1) if the last call hit end of file,  otherwise it will
1460       return false ('').  This is useful to see the difference between a
1461       failure and end of file.
1462
1463       Note that if the parsing of the last line caused an error,  "eof" is
1464       still true.  That means that if you are not using "auto_diag", an idiom
1465       like
1466
1467        while (my $row = $csv->getline ($fh)) {
1468            # ...
1469            }
1470        $csv->eof or $csv->error_diag;
1471
1472       will not report the error. You would have to change that to
1473
1474        while (my $row = $csv->getline ($fh)) {
1475            # ...
1476            }
1477        +$csv->error_diag and $csv->error_diag;
1478
1479   types
1480        $csv->types (\@tref);
1481
1482       This method is used to force that  (all)  columns are of a given type.
1483       For example, if you have an integer column,  two  columns  with
1484       doubles  and a string column, then you might do a
1485
1486        $csv->types ([Text::CSV_XS::IV (),
1487                      Text::CSV_XS::NV (),
1488                      Text::CSV_XS::NV (),
1489                      Text::CSV_XS::PV ()]);
1490
1491       Column types are used only for decoding columns while parsing,  in
1492       other words by the "parse" and "getline" methods.
1493
1494       You can unset column types by doing a
1495
1496        $csv->types (undef);
1497
1498       or fetch the current type settings with
1499
1500        $types = $csv->types ();
1501
1502       IV
1503       CSV_TYPE_IV
1504           Set field type to integer.
1505
1506       NV
1507       CSV_TYPE_NV
1508           Set field type to numeric/float.
1509
1510       PV
1511       CSV_TYPE_PV
1512           Set field type to string.
1513
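           The same type list can be written with the importable constants
           described in "IMPORTS/EXPORTS" below (a sketch):

            use Text::CSV_XS qw( :CONSTANTS );

            $csv->types ([ CSV_TYPE_IV, CSV_TYPE_NV,
                           CSV_TYPE_NV, CSV_TYPE_PV ]);
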
1514   fields
1515        @columns = $csv->fields ();
1516
1517       This method returns the input to   "combine"  or the resultant
1518       decomposed fields of a successful "parse", whichever was called more
1519       recently.
1520
1521       Note that the return value is undefined after using "getline", which
1522       does not fill the data structures returned by "parse".
1523
1524   meta_info
1525        @flags = $csv->meta_info ();
1526
1527       This method returns the "flags" of the input to "combine" or the flags
1528       of the resultant  decomposed fields of  "parse",   whichever was called
1529       more recently.
1530
1531       For each field,  a meta_info field will hold  flags that  inform
1532       something about  the  field  returned  by  the  "fields"  method or
1533       passed to  the "combine" method. The flags are bit-wise-"or"'d like:
1534
1535       0x0001
1536       "CSV_FLAGS_IS_QUOTED"
1537         The field was quoted.
1538
1539       0x0002
1540       "CSV_FLAGS_IS_BINARY"
1541         The field was binary.
1542
1543       0x0004
1544       "CSV_FLAGS_ERROR_IN_FIELD"
1545         The field was invalid.
1546
1547         Currently only used when "allow_loose_quotes" is active.
1548
1549       0x0010
1550       "CSV_FLAGS_IS_MISSING"
1551         The field was missing.
1552
1553       See the "is_***" methods below.
1554
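           A short sketch of checking these flags after a parse (assuming
           "keep_meta_info" is enabled on this instance):

            $csv->keep_meta_info (1);
            $csv->parse (q{1,"2",3}) or die "parse () failed";
            my @flags = $csv->meta_info;
            $flags[1] & 0x0001 and print "field 2 was quoted\n";
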
1555   is_quoted
1556        my $quoted = $csv->is_quoted ($column_idx);
1557
1558       where  $column_idx is the  (zero-based)  index of the column in the
1559       last result of "parse".
1560
1561       This returns a true value  if the data in the indicated column was
1562       enclosed in "quote_char" quotes.  This might be important for fields
1563       where content ",20070108," is to be treated as a numeric value,  and
1564       where ","20070108"," is explicitly marked as character string data.
1565
1566       This method is only valid when "keep_meta_info" is set to a true value.
1567
1568   is_binary
1569        my $binary = $csv->is_binary ($column_idx);
1570
1571       where  $column_idx is the  (zero-based)  index of the column in the
1572       last result of "parse".
1573
1574       This returns a true value if the data in the indicated column contained
1575       any byte in the range "[\x00-\x08,\x10-\x1F,\x7F-\xFF]".
1576
1577       This method is only valid when "keep_meta_info" is set to a true value.
1578
1579   is_missing
1580        my $missing = $csv->is_missing ($column_idx);
1581
1582       where  $column_idx is the  (zero-based)  index of the column in the
1583       last result of "getline_hr".
1584
1585        $csv->keep_meta_info (1);
1586        while (my $hr = $csv->getline_hr ($fh)) {
1587            $csv->is_missing (0) and next; # This was an empty line
1588            }
1589
1590       When using  "getline_hr",  it is impossible to tell if the  parsed
1591       fields are "undef" because they were not filled in the "CSV" stream
1592       or because they were not read at all, as all the fields defined by
1593       "column_names" are set in the hash-ref.    If you still need to know if
1594       all fields in each row are provided, you should enable "keep_meta_info"
1595       so you can check the flags.
1596
1597       If  "keep_meta_info"  is "false",  "is_missing"  will always return
1598       "undef", regardless of $column_idx being valid or not. If this
1599       attribute is "true" it will return either 0 (the field is present) or 1
1600       (the field is missing).
1601
1602       A special case is the empty line.  If the line is completely empty -
1603       after dealing with the flags - this is still a valid CSV line:  it is a
1604       record of just one single empty field. However, if "keep_meta_info" is
1605       set, invoking "is_missing" with index 0 will now return true.
1606
1607   status
1608        $status = $csv->status ();
1609
1610       This method returns the status of the last invoked "combine" or "parse"
1611       call. Status is success (true: 1) or failure (false: "undef" or 0).
1612
1613       Note that as this only keeps track of the status of above mentioned
1614       methods, you are probably looking for "error_diag" instead.
1615
1616   error_input
1617        $bad_argument = $csv->error_input ();
1618
1619       This method returns the erroneous argument (if it exists) of "combine"
1620       or "parse",  whichever was called more recently.  If the last
1621       invocation was successful, "error_input" will return "undef".
1622
1623       Depending on the type of error, it might also hold the data for the
1624       last error-input of "getline".
1625
1626   error_diag
1627        Text::CSV_XS->error_diag ();
1628        $csv->error_diag ();
1629        $error_code               = 0  + $csv->error_diag ();
1630        $error_str                = "" . $csv->error_diag ();
1631        ($cde, $str, $pos, $rec, $fld) = $csv->error_diag ();
1632
1633       If (and only if) an error occurred,  this function returns  the
1634       diagnostics of that error.
1635
1636       If called in void context,  this will print the internal error code and
1637       the associated error message to STDERR.
1638
1639       If called in list context,  this will return  the error code  and the
1640       error message in that order.  If the last error was from parsing, the
1641       rest of the values returned are a best guess at the location  within
1642       the line  that was being parsed. Their values are 1-based.  The
1643       position currently is the index of the byte at which the parsing
1644       failed in the current record.  It might change to be the index of the
1645       current character in a later release.  The record number is the index
1646       of the record parsed by the csv instance.  The field number is the
1647       index of the field the parser thinks it is trying to parse.  See
1648       examples/csv-check for how this can be used.
1649
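           A sketch of using the list-context return values in an error
           handler (assuming $line holds the line to be parsed):

            unless ($csv->parse ($line)) {
                my ($cde, $str, $pos, $rec, $fld) = $csv->error_diag;
                warn "Error $cde ($str) in record $rec, ",
                     "field $fld, near position $pos\n";
                }
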
1650       If called in  scalar context,  it will return  the diagnostics  in a
1651       single scalar, a-la $!.  It will contain the error code in numeric
1652       context, and the diagnostics message in string context.
1653
1654       When called as a class method or a  direct function call,  the
1655       diagnostics are that of the last "new" call.
1656
1657   record_number
1658        $recno = $csv->record_number ();
1659
1660       Returns the number of records parsed by this csv instance.  This
1661       value should be more accurate than $. when embedded newlines come
1662       into play.  Records written by this instance are not counted.
1663
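           For example (a sketch), to report progress while reading a large
           file:

            while (my $row = $csv->getline ($fh)) {
                $csv->record_number % 10000 or
                    warn "at record ", $csv->record_number, "\n";
                }
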
1664   SetDiag
1665        $csv->SetDiag (0);
1666
1667       Use to reset the diagnostics if you are dealing with errors.
1668

IMPORTS/EXPORTS

1670       By default none of these are exported.
1671
1672       csv
1673          use Text::CSV_XS qw( csv );
1674
1675         Import the "csv" function.  See below.
1676
1677       :CONSTANTS
1678          use Text::CSV_XS qw( :CONSTANTS );
1679
1680         Import module constants  "CSV_FLAGS_IS_QUOTED",
1681         "CSV_FLAGS_IS_BINARY", "CSV_FLAGS_ERROR_IN_FIELD",
1682         "CSV_FLAGS_IS_MISSING",   "CSV_TYPE_PV", "CSV_TYPE_IV", and
1683         "CSV_TYPE_NV". Each can be imported alone
1684
1685          use Text::CSV_XS qw( CSV_FLAGS_IS_BINARY CSV_TYPE_NV );
1686

FUNCTIONS

1688   csv
1689       This function is not exported by default and should be explicitly
1690       requested:
1691
1692        use Text::CSV_XS qw( csv );
1693
1694       This is a high-level function that aims at simple (user) interfaces.
1695       This can be used to read/parse a "CSV" file or stream (the default
1696       behavior) or to produce a file or write to a stream (define the  "out"
1697       attribute).  It returns an array- or hash-reference on parsing (or
1698       "undef" on fail) or the numeric value of  "error_diag"  on writing.
1699       When this function fails you can get to the error using the class call
1700       to "error_diag"
1701
1702        my $aoa = csv (in => "test.csv") or
1703            die Text::CSV_XS->error_diag;
1704
1705       This function takes the arguments as key-value pairs. This can be
1706       passed as a list or as an anonymous hash:
1707
1708        my $aoa = csv (  in => "test.csv", sep_char => ";");
1709        my $aoh = csv ({ in => $fh, headers => "auto" });
1710
1711       The arguments passed consist of two parts:  the arguments to "csv"
1712       itself and the optional attributes to the  "CSV"  object used inside
1713       the function as enumerated and explained in "new".
1714
1715       If not overridden, the default option used for CSV is
1716
1717        auto_diag   => 1
1718        escape_null => 0
1719
1720       The option that is always set and cannot be altered is
1721
1722        binary      => 1
1723
1724       As this function will likely be used in one-liners,  it allows  "quote"
1725       to be abbreviated as "quo",  and  "escape_char" to be abbreviated as
1726       "esc" or "escape".
1727
1728       Alternative invocations:
1729
1730        my $aoa = Text::CSV_XS::csv (in => "file.csv");
1731
1732        my $csv = Text::CSV_XS->new ();
1733        my $aoa = $csv->csv (in => "file.csv");
1734
1735       In the latter case, the object attributes are used from the existing
1736       object and the attribute arguments in the function call are ignored:
1737
1738        my $csv = Text::CSV_XS->new ({ sep_char => ";" });
1739        my $aoh = $csv->csv (in => "file.csv", bom => 1);
1740
1741       will parse using ";" as "sep_char", not ",".
1742
1743       in
1744
1745       Used to specify the source.  "in" can be a file name (e.g. "file.csv"),
1746       which will be  opened for reading  and closed when finished,  a file
1747       handle (e.g.  $fh or "FH"),  a reference to a glob (e.g. "\*ARGV"),
1748       the glob itself (e.g. *STDIN), or a reference to a scalar (e.g.
1749       "\q{1,2,"csv"}").
1750
1751       When used with "out", "in" should be a reference to a CSV structure
1752       (AoA or AoH)  or a CODE-ref that returns an array-reference or a hash-
1753       reference.  The code-ref will be invoked with no arguments.
1754
1755        my $aoa = csv (in => "file.csv");
1756
1757        open my $fh, "<", "file.csv";
1758        my $aoa = csv (in => $fh);
1759
1760        my $csv = [ [qw( Foo Bar )], [ 1, 2 ], [ 2, 3 ]];
1761        my $err = csv (in => $csv, out => "file.csv");
1762
1763       If called in void context without the "out" attribute, the resulting
1764       ref will be used as input to a subsequent call to csv:
1765
1766        csv (in => "file.csv", filter => { 2 => sub { length > 2 }})
1767
1768       will be a shortcut to
1769
1770        csv (in => csv (in => "file.csv", filter => { 2 => sub { length > 2 }}))
1771
1772       where, in the absence of the "out" attribute, this is a shortcut to
1773
1774        csv (in  => csv (in => "file.csv", filter => { 2 => sub { length > 2 }}),
1775             out => *STDOUT)
1776
1777       out
1778
1779        csv (in => $aoa, out => "file.csv");
1780        csv (in => $aoa, out => $fh);
1781        csv (in => $aoa, out =>   STDOUT);
1782        csv (in => $aoa, out =>  *STDOUT);
1783        csv (in => $aoa, out => \*STDOUT);
1784        csv (in => $aoa, out => \my $data);
1785        csv (in => $aoa, out =>  undef);
1786        csv (in => $aoa, out => \"skip");
1787
1788        csv (in => $fh,  out => \@aoa);
1789        csv (in => $fh,  out => \@aoh, bom => 1);
1790        csv (in => $fh,  out => \%hsh, key => "key");
1791
1792       In output mode, the default CSV options when producing CSV are
1793
1794        eol       => "\r\n"
1795
1796       The "fragment" attribute is ignored in output mode.
1797
1798       "out" can be a file name  (e.g.  "file.csv"),  which will be opened for
1799       writing and closed when finished,  a file handle (e.g. $fh or "FH"),  a
1800       reference to a glob (e.g. "\*STDOUT"),  the glob itself (e.g. *STDOUT),
1801       or a reference to a scalar (e.g. "\my $data").
1802
1803        csv (in => sub { $sth->fetch },            out => "dump.csv");
1804        csv (in => sub { $sth->fetchrow_hashref }, out => "dump.csv",
1805             headers => $sth->{NAME_lc});
1806
1807       When a code-ref is used for "in", the output is generated  per
1808       invocation, so no buffering is involved. This implies that there is no
1809       size restriction on the number of records. The "csv" function ends when
1810       the coderef returns a false value.
1811
1812       If "out" is set to a reference of the literal string "skip", the output
1813       will be suppressed completely,  which might be useful in combination
1814       with a filter for side effects only.
1815
1816        my %cache;
1817        csv (in    => "dump.csv",
1818             out   => \"skip",
1819             on_in => sub { $cache{$_[1][1]}++ });
1820
1821       Currently,  setting "out" to any false value  ("undef", "", 0) will be
1822       equivalent to "\"skip"".
1823
1824       If the "in" argument points to something to parse, and the "out" is set
1825       to a reference to an "ARRAY" or a "HASH", the output is appended to the
1826       data in the existing reference. The result of the parse should match
1827       what exists in the reference passed. This might come in handy when you
1828       have to parse a set of files with similar content (like data stored per
1829       period) and you want to collect that into a single data structure:
1830
1831        my %hash;
1832        csv (in => $_, out => \%hash, key => "id") for sort glob "foo-[0-9]*.csv";
1833
1834        my @list; # List of arrays
1835        csv (in => $_, out => \@list)              for sort glob "foo-[0-9]*.csv";
1836
1837        my @list; # List of hashes
1838        csv (in => $_, out => \@list, bom => 1)    for sort glob "foo-[0-9]*.csv";
1839
1840       encoding
1841
1842       If passed,  it should be an encoding accepted by the  :encoding()
1843       option to "open". There is no default value. This attribute does not
1844       work in perl 5.6.x.  "encoding" can be abbreviated to "enc" for ease of
1845       use in command line invocations.
1846
1847       If "encoding" is set to the literal value "auto", the method "header"
1848       will be invoked on the opened stream to check if there is a BOM and set
1849       the encoding accordingly.   This is equal to passing a true value in
1850       the option "detect_bom".
1851
1852       Encodings can be stacked, as supported by "binmode":
1853
1854        # Using PerlIO::via::gzip
1855        csv (in       => \@csv,
1856             out      => "test.csv:via.gz",
1857             encoding => ":via(gzip):encoding(utf-8)",
1858             );
1859        $aoa = csv (in => "test.csv:via.gz",  encoding => ":via(gzip)");
1860
1861        # Using PerlIO::gzip
1862        csv (in       => \@csv,
1863             out      => "test.csv:gzip.gz",
1864             encoding => ":gzip:encoding(utf-8)",
1865             );
1866        $aoa = csv (in => "test.csv:gzip.gz", encoding => ":gzip");
1867
1868       detect_bom
1869
1870       If  "detect_bom"  is given, the method  "header"  will be invoked on
1871       the opened stream to check if there is a BOM and set the encoding
1872       accordingly.
1873
1874       "detect_bom" can be abbreviated to "bom".
1875
1876       This is the same as setting "encoding" to "auto".
1877
1878       Note that as the method  "header" is invoked,  its default is to also
1879       set the headers.
1880
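           For example (a sketch):

            my $aoh = csv (in => "file.csv", bom => 1);
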
1881       headers
1882
1883       If this attribute is not given, the default behavior is to produce an
1884       array of arrays.
1885
1886       If "headers" is supplied,  it should be an anonymous list of column
1887       names, an anonymous hashref, a coderef, or a literal flag:  "auto",
1888       "lc", "uc", or "skip".
1889
1890       skip
1891         When "skip" is used, the header will not be included in the output.
1892
1893          my $aoa = csv (in => $fh, headers => "skip");
1894
1895         "skip" is invalid/ignored in combination with "detect_bom".
1896
1897       auto
1898         If "auto" is used, the first line of the "CSV" source will be read as
1899         the list of field headers and used to produce an array of hashes.
1900
1901          my $aoh = csv (in => $fh, headers => "auto");
1902
1903       lc
1904         If "lc" is used,  the first line of the  "CSV" source will be read as
1905         the list of field headers mapped to  lower case and used to produce
1906         an array of hashes. This is a variation of "auto".
1907
1908          my $aoh = csv (in => $fh, headers => "lc");
1909
1910       uc
1911         If "uc" is used,  the first line of the  "CSV" source will be read as
1912         the list of field headers mapped to  upper case and used to produce
1913         an array of hashes. This is a variation of "auto".
1914
1915          my $aoh = csv (in => $fh, headers => "uc");
1916
1917       CODE
1918         If a coderef is used,  the first line of the  "CSV" source will be
1919         read as the list of mangled field headers in which each field is
1920         passed as the only argument to the coderef. This list is used to
1921         produce an array of hashes.
1922
1923          my $aoh = csv (in      => $fh,
1924                         headers => sub { lc ($_[0]) =~ s/kode/code/gr });
1925
1926         this example is a variation of using "lc" where all occurrences of
1927         "kode" are replaced with "code".
1928
1929       ARRAY
1930         If  "headers"  is an anonymous list,  the entries in the list will be
1931         used as field names. The first line is considered data instead of
1932         headers.
1933
1934          my $aoh = csv (in => $fh, headers => [qw( Foo Bar )]);
1935          csv (in => $aoa, out => $fh, headers => [qw( code description price )]);
1936
1937       HASH
1938         If "headers" is a hash reference, this implies "auto", but header
1939         fields that exist as key in the hashref will be replaced by the value
1940         for that key. Given a CSV file like
1941
1942          post-kode,city,name,id number,fubble
1943          1234AA,Duckstad,Donald,13,"X313DF"
1944
1945         using
1946
1947          csv (headers => { "post-kode" => "pc", "id number" => "ID" }, ...
1948
1949         will return an entry like
1950
1951          { pc     => "1234AA",
1952            city   => "Duckstad",
1953            name   => "Donald",
1954            ID     => "13",
1955            fubble => "X313DF",
1956            }
1957
1958       See also "munge_column_names" and "set_column_names".
1959
1960       munge_column_names
1961
1962       If "munge_column_names" is set,  the method  "header"  is invoked on
1963       the opened stream with all matching arguments to detect and set the
1964       headers.
1965
1966       "munge_column_names" can be abbreviated to "munge".
1967
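           For example (a sketch), to get database-friendly column names:

            my $aoh = csv (in => "file.csv", munge => "db");
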
1968       key
1969
1970       If passed,  will default  "headers"  to "auto" and return a hashref
1971       instead of an array of hashes. Allowed values are simple scalars or
1972       array-references where the first element is the joiner and the rest are
1973       the fields to join to combine the key.
1974
1975        my $ref = csv (in => "test.csv", key => "code");
1976        my $ref = csv (in => "test.csv", key => [ ":" => "code", "color" ]);
1977
1978       with test.csv like
1979
1980        code,product,price,color
1981        1,pc,850,gray
1982        2,keyboard,12,white
1983        3,mouse,5,black
1984
1985       the first example will return
1986
1987         { 1   => {
1988               code    => 1,
1989               color   => 'gray',
1990               price   => 850,
1991               product => 'pc'
1992               },
1993           2   => {
1994               code    => 2,
1995               color   => 'white',
1996               price   => 12,
1997               product => 'keyboard'
1998               },
1999           3   => {
2000               code    => 3,
2001               color   => 'black',
2002               price   => 5,
2003               product => 'mouse'
2004               }
2005           }
2006
2007       the second example will return
2008
2009         { "1:gray"    => {
2010               code    => 1,
2011               color   => 'gray',
2012               price   => 850,
2013               product => 'pc'
2014               },
2015           "2:white"   => {
2016               code    => 2,
2017               color   => 'white',
2018               price   => 12,
2019               product => 'keyboard'
2020               },
2021           "3:black"   => {
2022               code    => 3,
2023               color   => 'black',
2024               price   => 5,
2025               product => 'mouse'
2026               }
2027           }
2028
2029       The "key" attribute can be combined with "headers" for "CSV" data that
2030       has no header line, like
2031
2032        my $ref = csv (
2033            in      => "foo.csv",
2034            headers => [qw( c_foo foo bar description stock )],
2035            key     =>     "c_foo",
2036            );
2037
2038       value
2039
2040       Used to create key-value hashes.
2041
2042       Only allowed when "key" is valid. A "value" can be either a single
2043       column label or an anonymous list of column labels.  In the first case,
2044       the value will be a simple scalar value, in the latter case, it will be
2045       a hashref.
2046
2047        my $ref = csv (in => "test.csv", key   => "code",
2048                                         value => "price");
2049        my $ref = csv (in => "test.csv", key   => "code",
2050                                         value => [ "product", "price" ]);
2051        my $ref = csv (in => "test.csv", key   => [ ":" => "code", "color" ],
2052                                         value => "price");
2053        my $ref = csv (in => "test.csv", key   => [ ":" => "code", "color" ],
2054                                         value => [ "product", "price" ]);
2055
2056       with test.csv like
2057
2058        code,product,price,color
2059        1,pc,850,gray
2060        2,keyboard,12,white
2061        3,mouse,5,black
2062
2063       the first example will return
2064
2065         { 1 => 850,
2066           2 =>  12,
2067           3 =>   5,
2068           }
2069
2070       the second example will return
2071
2072         { 1   => {
2073               price   => 850,
2074               product => 'pc'
2075               },
2076           2   => {
2077               price   => 12,
2078               product => 'keyboard'
2079               },
2080           3   => {
2081               price   => 5,
2082               product => 'mouse'
2083               }
2084           }
2085
2086       the third example will return
2087
2088         { "1:gray"    => 850,
2089           "2:white"   =>  12,
2090           "3:black"   =>   5,
2091           }
2092
2093       the fourth example will return
2094
2095         { "1:gray"    => {
2096               price   => 850,
2097               product => 'pc'
2098               },
2099           "2:white"   => {
2100               price   => 12,
2101               product => 'keyboard'
2102               },
2103           "3:black"   => {
2104               price   => 5,
2105               product => 'mouse'
2106               }
2107           }
2108
2109       keep_headers
2110
2111       When using hashes,  store the column names in the arrayref passed,  so
2112       all headers are available after the call in their original order.
2113
2114        my $aoh = csv (in => "file.csv", keep_headers => \my @hdr);
2115
2116       This attribute can be abbreviated to "kh" or passed as
2117       "keep_column_names".
2118
2119       This attribute implies a default of "auto" for the "headers" attribute.
2120
2121       The headers can also be kept internally to keep stable header order:
2122
2123        csv (in      => csv (in => "file.csv", kh => "internal"),
2124             out     => "new.csv",
2125             kh      => "internal");
2126
2127       where "internal" can also be 1, "yes", or "true". This is similar to
2128
2129        my @h;
2130        csv (in      => csv (in => "file.csv", kh => \@h),
2131             out     => "new.csv",
2132             headers => \@h);
2133
2134       fragment
2135
2136       Only output the fragment as defined in the "fragment" method. This
2137       option is ignored when generating "CSV". See "out".
2138
2139       Combining all of them could give something like
2140
2141        use Text::CSV_XS qw( csv );
2142        my $aoh = csv (
2143            in       => "test.txt",
2144            encoding => "utf-8",
2145            headers  => "auto",
2146            sep_char => "|",
2147            fragment => "row=3;6-9;15-*",
2148            );
2149        say $aoh->[15]{Foo};
2150
2151       sep_set
2152
2153       If "sep_set" is set, the method "header" is invoked on the opened
2154       stream to detect and set "sep_char" with the given set.
2155
2156       "sep_set" can be abbreviated to "seps". If neither "sep_set" nor "seps"
2157       is given, but "sep" is defined, "sep_set" defaults to "[ sep ]". This
2158       is only supported for perl version 5.10 and up.
2159
2160       Note that as the  "header" method is invoked,  its default is to also
2161       set the headers.
2162
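           For example (a sketch), to auto-detect the separator from a known
           set of candidates:

            my $aoh = csv (in => "file.csv", seps => [ ";", ",", "\t" ]);
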
2163       set_column_names
2164
2165       If  "set_column_names" is passed,  the method "header" is invoked on
2166       the opened stream with all arguments meant for "header".
2167
2168       If "set_column_names" is passed as a false value, the content of the
2169       first row is only preserved if the output is AoA:
2170
2171       With an input-file like
2172
2173        bAr,foo
2174        1,2
2175        3,4,5
2176
2177       This call
2178
2179        my $aoa = csv (in => $file, set_column_names => 0);
2180
2181       will result in
2182
2183        [[ "bar", "foo"     ],
2184         [ "1",   "2"       ],
2185         [ "3",   "4",  "5" ]]
2186
2187       and
2188
2189        my $aoa = csv (in => $file, set_column_names => 0, munge => "none");
2190
2191       will result in
2192
2193        [[ "bAr", "foo"     ],
2194         [ "1",   "2"       ],
2195         [ "3",   "4",  "5" ]]
2196
2197   Callbacks
2198       Callbacks enable actions triggered from the inside of Text::CSV_XS.
2199
2200       While most of what this enables  can easily be done in an  unrolled
2201       loop as described in the "SYNOPSIS",  callbacks can be used to meet
2202       special demands or enhance the "csv" function.
2203
2204       error
2205          $csv->callbacks (error => sub { $csv->SetDiag (0) });
2206
2207         the "error"  callback is invoked when an error occurs,  but  only
2208         when "auto_diag" is set to a true value. A callback is invoked with
2209         the values returned by "error_diag":
2210
2211          my ($c, $s);
2212
2213          sub ignore3006 {
2214              my ($err, $msg, $pos, $recno, $fldno) = @_;
2215              if ($err == 3006) {
2216                  # ignore this error
2217                  ($c, $s) = (undef, undef);
2218                  Text::CSV_XS->SetDiag (0);
2219                  }
2220              # Any other error
2221              return;
2222              } # ignore3006
2223
2224          $csv->callbacks (error => \&ignore3006);
2225          $csv->bind_columns (\$c, \$s);
2226          while ($csv->getline ($fh)) {
2227              # Error 3006 will not stop the loop
2228              }
2229
2230       after_parse
2231          $csv->callbacks (after_parse => sub { push @{$_[1]}, "NEW" });
2232          while (my $row = $csv->getline ($fh)) {
2233              $row->[-1] eq "NEW";
2234              }
2235
2236         This callback is invoked after parsing with  "getline"  only if no
2237         error occurred.  The callback is invoked with two arguments:   the
2238         current "CSV" parser object and an array reference to the fields
2239         parsed.
2240
2241         The return code of the callback is ignored  unless it is a reference
2242         to the string "skip", in which case the record will be skipped in
2243         "getline_all".
2244
2245          sub add_from_db {
2246              my ($csv, $row) = @_;
2247              $sth->execute ($row->[4]);
2248              push @$row, $sth->fetchrow_array;
2249              } # add_from_db
2250
2251          my $aoa = csv (in => "file.csv", callbacks => {
2252              after_parse => \&add_from_db });
2253
2254         This hook can be used for validation:
2255
2256         FAIL
2257           Die if any of the records does not validate a rule:
2258
2259            after_parse => sub {
2260                $_[1][4] =~ m/^[0-9]{4}\s?[A-Z]{2}$/ or
2261                    die "5th field does not have a valid Dutch zipcode";
2262                }
2263
2264         DEFAULT
2265           Replace invalid fields with a default value:
2266
2267            after_parse => sub { $_[1][2] =~ m/^\d+$/ or $_[1][2] = 0 }
2268
2269         SKIP
2270           Skip records that have invalid fields (only applies to
2271           "getline_all"):
2272
2273            after_parse => sub { $_[1][0] =~ m/^\d+$/ or return \"skip"; }
2274
2275       before_print
2276          my $idx = 1;
2277          $csv->callbacks (before_print => sub { $_[1][0] = $idx++ });
2278          $csv->print (*STDOUT, [ 0, $_ ]) for @members;
2279
2280         This callback is invoked  before printing with  "print"  only if no
2281         error occurred.  The callback is invoked with two arguments:  the
2282         current  "CSV" parser object and an array reference to the fields
2283         passed.
2284
2285         The return code of the callback is ignored.
2286
2287          sub max_4_fields {
2288              my ($csv, $row) = @_;
2289              @$row > 4 and splice @$row, 4;
2290              } # max_4_fields
2291
2292          csv (in => csv (in => "file.csv"), out => *STDOUT,
2293              callbacks => { before_print => \&max_4_fields });
2294
2295         This callback is not active for "combine".
2296
2297       Callbacks for csv ()
2298
2299       The "csv" function allows for some callbacks that do not integrate
2300       into the XS internals, but are only available to the "csv" function.
2301
2302         csv (in        => "file.csv",
2303              callbacks => {
2304                  filter       => { 6 => sub { $_ > 15 } },    # first
2305                  after_parse  => sub { say "AFTER PARSE";  }, # first
2306                  after_in     => sub { say "AFTER IN";     }, # second
2307                  on_in        => sub { say "ON IN";        }, # third
2308                  },
2309              );
2310
2311         csv (in        => $aoh,
2312              out       => "file.csv",
2313              callbacks => {
2314                  on_in        => sub { say "ON IN";        }, # first
2315                  before_out   => sub { say "BEFORE OUT";   }, # second
2316                  before_print => sub { say "BEFORE PRINT"; }, # third
2317                  },
2318              );
2319
2320       filter
2321         This callback can be used to filter records.  It is called just after
2322         a new record has been scanned.  The callback accepts a:
2323
2324         hashref
2325           The keys are the index to the row (the field name or field number,
2326           1-based) and the values are subs to return a true or false value.
2327
2328            csv (in => "file.csv", filter => {
2329                       3 => sub { m/a/ },       # third field should contain an "a"
2330                  5 => sub { length > 4 }, # 5th field must be longer than 4
2331                       });
2332
2333            csv (in => "file.csv", filter => { foo => sub { $_ > 4 }});
2334
2335           If the keys to the filter hash contain any character that is not a
2336           digit, it will also implicitly set "headers" to "auto"  unless
2337           "headers"  was already passed as argument.  When headers are
2338           active, returning an array of hashes, the filter is not applicable
2339           to the header itself.
2340
2341           All sub results should match, as in AND.
2342
2343           The context of the callback sets  $_ localized to the field
2344           indicated by the filter. The two arguments are as with all other
2345           callbacks, so the other fields in the current row can be seen:
2346
2347            filter => { 3 => sub { $_ > 100 ? $_[1][1] =~ m/A/ : $_[1][6] =~ m/B/ }}
2348
2349           If the context is set to return a list of hashes  ("headers" is
2350           defined), the current record will also be available in the
2351           localized %_:
2352
2353            filter => { 3 => sub { $_ > 100 && $_{foo} =~ m/A/ && $_{bar} < 1000  }}
2354
2355           If the filter is used to alter the content by changing $_,  make
2356           sure that the sub returns true in order not to have that record
2357           skipped:
2358
2359            filter => { 2 => sub { $_ = uc }}
2360
2361           will upper-case the second field, and then skip it if the resulting
2362           content evaluates to false. To always accept, end with truth:
2363
2364            filter => { 2 => sub { $_ = uc; 1 }}
2365
2366         coderef
2367            csv (in => "file.csv", filter => sub { $n++; 0; });
2368
2369           If the argument to "filter" is a coderef,  it is an alias or
2370           shortcut to a filter on column 0:
2371
2372            csv (filter => sub { $n++; 0 });
2373
2374           is equal to
2375
2376            csv (filter => { 0 => sub { $n++; 0 }});
2377
2378         filter-name
2379            csv (in => "file.csv", filter => "not_blank");
2380            csv (in => "file.csv", filter => "not_empty");
2381            csv (in => "file.csv", filter => "filled");
2382
2383           These are predefined filters
2384
2385           Given a file like (line numbers prefixed for doc purpose only):
2386
2387            1:1,2,3
2388            2:
2389            3:,
2390            4:""
2391            5:,,
2392            6:, ,
2393            7:"",
2394            8:" "
2395            9:4,5,6
2396
2397           not_blank
2398             Filter out the blank lines
2399
2400             This filter is a shortcut for
2401
2402              filter => { 0 => sub { @{$_[1]} > 1 or
2403                          defined $_[1][0] && $_[1][0] ne "" } }
2404
2405             Due to the implementation,  it is currently impossible to also
2406             filter lines that consist only of a quoted empty field. These
2407             lines are also considered blank lines.
2408
2409             With the given example, lines 2 and 4 will be skipped.
2410
2411           not_empty
2412             Filter out lines where all the fields are empty.
2413
2414             This filter is a shortcut for
2415
2416              filter => { 0 => sub { grep { defined && $_ ne "" } @{$_[1]} } }
2417
2418             A space is not regarded as being empty, so given the example data,
2419             lines 2, 3, 4, 5, and 7 are skipped.
2420
2421           filled
2422             Filter out lines that have no visible data
2423
2424             This filter is a shortcut for
2425
2426              filter => { 0 => sub { grep { defined && m/\S/ } @{$_[1]} } }
2427
2428             This filter rejects all lines that do not have at least one field
2429             that does not evaluate to the empty string.
2430
2431             With the given example data, this filter would skip lines 2
2432             through 8.
2433
2434         One could also use modules like Types::Standard:
2435
2436          use Types::Standard -types;
2437
2438          my $type   = Tuple[Str, Str, Int, Bool, Optional[Num]];
2439          my $check  = $type->compiled_check;
2440
2441          # filter with compiled check and warnings
2442          my $aoa = csv (
2443             in     => \$data,
2444             filter => {
2445                 0 => sub {
2446                     my $ok = $check->($_[1]) or
2447                         warn $type->get_message ($_[1]), "\n";
2448                     return $ok;
2449                     },
2450                 },
2451             );
2452
2453       after_in
2454         This callback is invoked for each record after all records have been
2455         parsed but before returning the reference to the caller.  The hook is
2456         invoked with two arguments:  the current  "CSV"  parser object  and a
2457         reference to the record.   The reference can be a reference to a
2458         HASH  or a reference to an ARRAY as determined by the arguments.
2459
2460         This callback can also be passed as  an attribute without the
2461         "callbacks" wrapper.
2462
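             For example (a sketch), passed directly as an attribute:

              my $aoa = csv (in       => "file.csv",
                             after_in => sub { push @{$_[1]}, "NEW" });
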
2463       before_out
2464         This callback is invoked for each record before the record is
2465         printed.  The hook is invoked with two arguments:  the current "CSV"
2466         parser object and a reference to the record.   The reference can be a
2467         reference to a  HASH or a reference to an ARRAY as determined by the
2468         arguments.
2469
2470         This callback can also be passed as an attribute  without the
2471         "callbacks" wrapper.
2472
2473         This callback makes the row available in %_ if the row is a hashref.
2474         In this case %_ is writable and will change the original row.
2475
2476       on_in
2477         This callback acts exactly as the "after_in" or the "before_out"
2478         hooks.
2479
2480         This callback can also be passed as an attribute  without the
2481         "callbacks" wrapper.
2482
2483         This callback makes the row available in %_ if the row is a hashref.
2484         In this case %_ is writable and will change the original row. So e.g.
2485         with
2486
2487           my $aoh = csv (
2488               in      => \"foo\n1\n2\n",
2489               headers => "auto",
2490               on_in   => sub { $_{bar} = 2; },
2491               );
2492
2493         $aoh will be:
2494
2495           [ { foo => 1,
2496               bar => 2,
2497               },
2498             { foo => 2,
2499               bar => 2,
2500               }
2501             ]
2502
2503       csv
2504         The function  "csv" can also be called as a method or with an
2505         existing Text::CSV_XS object.  This could help if the function is
2506         to be invoked many times,  as passing an existing instance prevents
2507         the overhead of creating the object internally over and over
2508         again.
2509
2510          my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1 });
2511
2512          my $aoa = $csv->csv (in => $fh);
2513          my $aoa = csv (in => $fh, csv => $csv);
2514
2515         both act the same. Running this 20000 times on a 20-line CSV file
2516         showed a 53% speedup.
2517

INTERNALS

2519       Combine (...)
2520       Parse (...)
2521
2522       The arguments to these internal functions are deliberately not
2523       described or documented, in order to enable the module authors to
2524       change them when they feel the need.  Using them is highly
2525       discouraged, as the API may change in future releases.
2526

EXAMPLES

2528   Reading a CSV file line by line:
2529        my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1 });
2530        open my $fh, "<", "file.csv" or die "file.csv: $!";
2531        while (my $row = $csv->getline ($fh)) {
2532            # do something with @$row
2533            }
2534        close $fh or die "file.csv: $!";
2535
2536       or
2537
2538        my $aoa = csv (in => "file.csv", on_in => sub {
2539            # do something with %_
2540            });
2541
2542       Reading only a single column
2543
2544        my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1 });
2545        open my $fh, "<", "file.csv" or die "file.csv: $!";
2546        # get only the 4th column
2547        my @column = map { $_->[3] } @{$csv->getline_all ($fh)};
2548        close $fh or die "file.csv: $!";
2549
2550       with "csv", you could do
2551
2552        my @column = map { $_->[0] }
2553            @{csv (in => "file.csv", fragment => "col=4")};
2554
2555   Parsing CSV strings:
2556        my $csv = Text::CSV_XS->new ({ keep_meta_info => 1, binary => 1 });
2557
2558        my $sample_input_string =
2559            qq{"I said, ""Hi!""",Yes,"",2.34,,"1.09","\x{20ac}",};
2560        if ($csv->parse ($sample_input_string)) {
2561            my @field = $csv->fields;
2562            foreach my $col (0 .. $#field) {
2563                my $quo = $csv->is_quoted ($col) ? $csv->{quote_char} : "";
2564                printf "%2d: %s%s%s\n", $col, $quo, $field[$col], $quo;
2565                }
2566            }
2567        else {
2568            print STDERR "parse () failed on argument: ",
2569                $csv->error_input, "\n";
2570            $csv->error_diag ();
2571            }
2572
2573       Parsing CSV from memory
2574
2575       Given a complete CSV data-set in scalar $data,  generate a list of
2576       lists to represent the rows and fields
2577
2578        # The data
2579        my $data = join "\r\n" => map { join "," => 0 .. 5 } 0 .. 5;
2580
2581        # in a loop
2582        my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1 });
2583        open my $fh, "<", \$data;
2584        my @foo;
2585        while (my $row = $csv->getline ($fh)) {
2586            push @foo, $row;
2587            }
2588        close $fh;
2589
2590        # a single call
2591        my $foo = csv (in => \$data);
2592
2593   Printing CSV data
2594       The fast way: using "print"
2595
2596       An example for creating "CSV" files using the "print" method:
2597
2598        my $csv = Text::CSV_XS->new ({ binary => 1, eol => $/ });
2599        open my $fh, ">", "foo.csv" or die "foo.csv: $!";
2600        for (1 .. 10) {
2601            $csv->print ($fh, [ $_, "$_" ]) or $csv->error_diag;
2602            }
2603        close $fh or die "foo.csv: $!";
2604
2605       The slow way: using "combine" and "string"
2606
2607       or using the slower "combine" and "string" methods:
2608
2609        my $csv = Text::CSV_XS->new;
2610
2611        open my $csv_fh, ">", "hello.csv" or die "hello.csv: $!";
2612
2613        my @sample_input_fields = (
2614            'You said, "Hello!"',   5.67,
2615            '"Surely"',   '',   '3.14159');
2616        if ($csv->combine (@sample_input_fields)) {
2617            print $csv_fh $csv->string, "\n";
2618            }
2619        else {
2620            print "combine () failed on argument: ",
2621                $csv->error_input, "\n";
2622            }
2623        close $csv_fh or die "hello.csv: $!";
2624
2625       Generating CSV into memory
2626
2627       Format a data-set (@foo) into a scalar value in memory ($data):
2628
2629        # The data
2630        my @foo = map { [ 0 .. 5 ] } 0 .. 3;
2631
2632        # in a loop
2633        my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1, eol => "\r\n" });
2634        open my $fh, ">", \my $data;
2635        $csv->print ($fh, $_) for @foo;
2636        close $fh;
2637
2638        # a single call
2639        csv (in => \@foo, out => \my $data);
2640
2641   Rewriting CSV
2642       Rewrite "CSV" files with ";" as separator character to well-formed
2643       "CSV":
2644
2645        use Text::CSV_XS qw( csv );
2646        csv (in => csv (in => "bad.csv", sep_char => ";"), out => *STDOUT);
2647
2648       As "STDOUT" is now default in "csv", a one-liner converting a UTF-16
2649       CSV file with BOM and TAB-separation to valid UTF-8 CSV could be:
2650
2651        $ perl -C3 -MText::CSV_XS=csv -we\
2652           'csv(in=>"utf16tab.csv",encoding=>"utf16",sep=>"\t")' >utf8.csv
2653
2654   Dumping database tables to CSV
2655       Dumping a database table can be as simple as this (TIMTOWTDI):
2656
2657        my $dbh = DBI->connect (...);
2658        my $sql = "select * from foo";
2659
2660        # using your own loop
2661        open my $fh, ">", "foo.csv" or die "foo.csv: $!\n";
2662        my $csv = Text::CSV_XS->new ({ binary => 1, eol => "\r\n" });
2663        my $sth = $dbh->prepare ($sql); $sth->execute;
2664        $csv->print ($fh, $sth->{NAME_lc});
2665        while (my $row = $sth->fetch) {
2666            $csv->print ($fh, $row);
2667            }
2668
2669        # using the csv function, all in memory
2670        csv (out => "foo.csv", in => $dbh->selectall_arrayref ($sql));
2671
2672        # using the csv function, streaming with callbacks
2673        my $sth = $dbh->prepare ($sql); $sth->execute;
2674        csv (out => "foo.csv", in => sub { $sth->fetch            });
2675        csv (out => "foo.csv", in => sub { $sth->fetchrow_hashref });
2676
2677       Note that this does not discriminate between "empty" values and NULL-
2678       values from the database,  as both will be the same empty field in CSV.
2679       To enable distinction between the two, use "quote_empty".
2680
2681        csv (out => "foo.csv", in => sub { $sth->fetch }, quote_empty => 1);
2682
2683       If the database import utility supports special sequences to insert
2684       "NULL" values into the database,  like MySQL/MariaDB supports "\N",
2685       use a filter or a map
2686
2687        csv (out => "foo.csv", in => sub { $sth->fetch },
2688                            on_in => sub { $_ //= "\\N" for @{$_[1]} });
2689
2690        while (my $row = $sth->fetch) {
2691            $csv->print ($fh, [ map { $_ // "\\N" } @$row ]);
2692            }
2693
2694       Note that this will not work as expected when choosing the backslash
2695       ("\") as "escape_char", as that will cause the "\" to need to be
2696       escaped by yet another "\",  which will cause the field to need
2697       quotation and thus end up as "\\N" instead of "\N". See also
2698       "undef_str".
2699
2700        csv (out => "foo.csv", in => sub { $sth->fetch }, undef_str => "\\N");
2701
2702       These special sequences are not recognized by  Text::CSV_XS  on parsing
2703       the CSV generated like this, but map and filter are your friends again
2704
2705        while (my $row = $csv->getline ($fh)) {
2706            $sth->execute (map { $_ eq "\\N" ? undef : $_ } @$row);
2707            }
2708
2709        csv (in => "foo.csv", filter => { 1 => sub {
2710            $sth->execute (map { $_ eq "\\N" ? undef : $_ } @{$_[1]}); 0; }});
2711
2712   Converting CSV to JSON
2713        use Text::CSV_XS qw( csv );
2714        use JSON; # or Cpanel::JSON::XS for better performance
2715
2716        # AoA (no header interpretation)
2717        say encode_json (csv (in => "file.csv"));
2718
2719        # AoH (convert to structures)
2720        say encode_json (csv (in => "file.csv", bom => 1));
2721
2722       Yes, it is that simple.
2723
2724   The examples folder
2725       For more extended examples, see the examples/ sub-directory 1. in the
2726       original distribution or the git repository 2.
2727
2728        1. https://github.com/Tux/Text-CSV_XS/tree/master/examples
2729        2. https://github.com/Tux/Text-CSV_XS
2730
2731       The following files can be found there:
2732
2733       parser-xs.pl
2734         This can be used as a boilerplate to parse invalid "CSV" and parse
2735         beyond (expected) errors, as an alternative to the "error" callback.
2736
2737          $ perl examples/parser-xs.pl bad.csv >good.csv
2738
2739       csv-check
2740         This is a command-line tool that uses parser-xs.pl  techniques to
2741         check the "CSV" file and report on its content.
2742
2743          $ csv-check files/utf8.csv
2744          Checked files/utf8.csv  with csv-check 1.9
2745          using Text::CSV_XS 1.32 with perl 5.26.0 and Unicode 9.0.0
2746          OK: rows: 1, columns: 2
2747              sep = <,>, quo = <">, bin = <1>, eol = <"\n">
2748
2749       csv-split
2750         This command splits "CSV" files into smaller files,  keeping (part
2751         of) the header.  Options include maximum number of (data) rows per
2752         file and maximum number of columns per file or a combination of the
2753         two.
2754
2755       csv2xls
2756         A script to convert "CSV" to Microsoft Excel ("XLS"). This requires
2757         extra modules Date::Calc and Spreadsheet::WriteExcel. The converter
2758         accepts various options and can produce UTF-8 compliant Excel files.
2759
2760       csv2xlsx
2761         A script to convert "CSV" to Microsoft Excel ("XLSX").  This requires
2762         the modules Date::Calc and Excel::Writer::XLSX.  The converter
2763         does accept various options including merging several "CSV" files
2764         into a single Excel file.
2765
2766       csvdiff
2767         A script that provides colorized diff on sorted CSV files,  assuming
2768         first line is header and first field is the key. Output options
2769         include colorized ANSI escape codes or HTML.
2770
2771          $ csvdiff --html --output=diff.html file1.csv file2.csv
2772
2773       rewrite.pl
2774         A script to rewrite (in)valid CSV into valid CSV files.  The script
2775         has options to generate confusing CSV files or CSV files that
2776         conform to Dutch MS-Excel exports (using ";" as separator).
2777
2778         By default the script honors the BOM and auto-detects the separator,
2779         converting the input to standard CSV with "," as separator.
2780

CAVEATS

2782       Text::CSV_XS  is not designed to detect the characters used to quote
2783       and separate fields.  The parsing is done using predefined  (default)
2784       settings.  In the examples  sub-directory,  you can find scripts  that
2785       demonstrate how you could try to detect these characters yourself.
2786
2787   Microsoft Excel
2788       The import/export from Microsoft Excel is a risky task, according to
2789       the documentation in "Text::CSV::Separator".  Microsoft uses the
2790       system's list separator defined in the regional settings, which happens
2791       to be a semicolon for Dutch, German and Spanish (and probably some
2792       others as well).   For the English locale,  the default is a comma.
2793       In Windows however,  the user is free to choose a  predefined locale,
2794       and then change  every  individual setting in it, so checking the
2795       locale is no solution.
2796
2797       As of version 1.17, a lone first line with just
2798
2799         sep=;
2800
2801       will be recognized and honored when parsing with "getline".
2802

TODO

       More Errors & Warnings
         New extensions ought to be  clear and concise  in reporting what
         error has occurred where and why, and maybe also offer a remedy to
         the problem.

         "error_diag" is a (very) good start, but there is more work to be
         done in this area.

         Basic calls  should croak or warn on  illegal parameters.  Errors
         should be documented.

       setting meta info
         Future extensions might include extending the "meta_info",
         "is_quoted", and  "is_binary"  to accept setting these  flags for
         fields,  so you can specify which fields are quoted in the
         "combine"/"string" combination.

          $csv->meta_info (0, 1, 1, 3, 0, 0);
          $csv->is_quoted (3, 1);

         Metadata Vocabulary for Tabular Data
         <http://w3c.github.io/csvw/metadata/> (a W3C editor's draft) could be
         an example for supporting more metadata.

       Parse the whole file at once
         Implement new methods or functions  that enable parsing of a
         complete file at once, returning a list of hashes. A possible
         extension could be to enable column selection on the call:

          my @AoH = $csv->parse_file ($filename, { cols => [ 1, 4..8, 12 ]});

         returning something like

          [ { fields => [ 1, 2, "foo", 4.5, undef, "", 8 ],
              flags  => [ ... ],
              },
            { fields => [ ... ],
              .
              },
            ]

         Note that the "csv" function already supports most of this,  but does
         not return flags. "getline_all" returns all rows for an open stream,
         but this will not return flags either.  "fragment"  can reduce the
         required  rows or columns, but cannot combine them.
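
         What the current interface already offers, for comparison  (a small
         sketch; the file name is just an example, and the last line assumes
         an already open handle $fh):

          my $rows = csv (in => "data.csv", fragment => "row=2-11");
          my $cols = csv (in => "data.csv", fragment => "col=2-5");
          my $all  = $csv->getline_all ($fh);   # all rows, but no flags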

       provider
          csv (in => $fh) vs csv (provider => sub { get_line });

         Whatever the attribute name ends up being,  this should make it
         easier to add input providers for parsing.   Currently most special
         variations for the "in" attribute are aimed at CSV generation: e.g. a
         callback is defined to return a reference to a record. This new
         attribute should enable passing data to parse, like getline.

         Suggested by Johan Vromans.

       Cookbook
         Write a document that has recipes for  most known  non-standard  (and
         maybe some standard)  "CSV" formats,  including formats that use
         "TAB",  ";", "|", or other non-comma separators.
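
         The existing attributes already cover most of these formats; an
         illustrative sketch:

          my $tsv  = Text::CSV_XS->new ({ binary => 1, sep_char => "\t" });
          my $pipe = Text::CSV_XS->new ({ binary => 1, sep_char => "|"  });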

         Examples could be taken from W3C's CSV on the Web: Use Cases and
         Requirements <http://w3c.github.io/csvw/use-cases-and-
         requirements/index.html>

       Steal
         Steal good new ideas and features from PapaParse
         <http://papaparse.com> or csvkit <http://csvkit.readthedocs.org>.

       Raku support
         Raku support can be found here <https://github.com/Tux/CSV>. The
         interface is richer than the Perl5 API, as Raku supports more types.

         The Raku version does not (yet) support pure binary CSV datasets.

   NOT TODO
       combined methods
         Requests for adding means (methods) that combine "combine" and
         "string" in a single call will not be honored (use "print" instead).
         Likewise for "parse" and "fields"  (use "getline" instead), given the
         problems with embedded newlines.
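
         The supported one-call alternatives look like this  (an illustrative
         sketch on an already open handle $fh):

          $csv->print ($fh, [ "foo", 42, "bar,baz" ]);  # not combine + string
          my $row = $csv->getline ($fh);                # not parse + fields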

   Release plan
       No guarantees, but this is what I had in mind some time ago:

       • DIAGNOSTICS section in pod to *describe* the errors (see below)

EBCDIC
       Everything should now work on native EBCDIC systems.   As the test does
       not cover all possible codepoints and Encode does not support
       "utf-ebcdic", there is no guarantee that all handling of Unicode is
       done correctly.

       Opening "EBCDIC" encoded files on  "ASCII"+  systems is likely to
       succeed using Encode's "cp37", "cp1047", or "posix-bc":

        open my $fh, "<:encoding(cp1047)", "ebcdic_file.csv" or die "...";

DIAGNOSTICS
       Still under construction ...

       If an error occurs,  "$csv->error_diag" can be used to get information
       on the cause of the failure. Note that for speed reasons the internal
       value is never cleared on success,  so using the value returned by
       "error_diag" in normal cases - when no error occurred - may cause
       unexpected results.

       If the constructor failed, the cause can be found using "error_diag" as
       a class method, like "Text::CSV_XS->error_diag".
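
       A minimal sketch  (the conflicting attributes are deliberate, so "new"
       fails and sets error 1001):

        my $csv = Text::CSV_XS->new ({ sep_char => ",", quote_char => "," })
            or die "new () failed: " . Text::CSV_XS->error_diag ();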

       The "$csv->error_diag" method is automatically invoked upon error when
       the constructor was called with  "auto_diag"  set to  1 or 2, or when
       autodie is in effect.  When set to 1, this will cause a "warn" with the
       error message,  when set to 2, it will "die". "2012 - EOF" is excluded
       from "auto_diag" reports.

       Errors can be (individually) caught using the "error" callback.
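
       A sketch of such a callback  (the handling inside the callback is just
       an example):

        my $csv = Text::CSV_XS->new ({
            binary    => 1,
            auto_diag => 1,
            callbacks => {
                error => sub {
                    my ($err, $msg, $pos, $recno, $fldno) = @_;
                    $err == 2012 and return;  # EOF is not an error here
                    warn "Error $err ($msg) in record $recno, field $fldno\n";
                    },
                },
            });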

       The errors as described below are available. I have tried to make the
       error itself explanatory enough, but more descriptions will be added.
       For most of these errors, the first three capitals describe the error
       category:

       • INI

         Initialization error or option conflict.

       • ECR

         Carriage-Return related parse error.

       • EOF

         End-Of-File related parse error.

       • EIQ

         Parse error inside quotation.

       • EIF

         Parse error inside field.

       • ECB

         Combine error.

       • EHR

         HashRef parse related error.

       And below should be the complete list of error codes that can be
       returned:

       • 1001 "INI - sep_char is equal to quote_char or escape_char"

         The  separation character  cannot be equal to  the quotation
         character or to the escape character,  as this would invalidate all
         parsing rules.

       • 1002 "INI - allow_whitespace with escape_char or quote_char SP or
         TAB"

         Using the  "allow_whitespace"  attribute  when either "quote_char" or
         "escape_char"  is equal to "SPACE" or "TAB" is too ambiguous to
         allow.

       • 1003 "INI - \r or \n in main attr not allowed"

         Using default "eol" characters in either "sep_char", "quote_char",
         or  "escape_char"  is  not allowed.

       • 1004 "INI - callbacks should be undef or a hashref"

         The "callbacks" attribute only accepts "undef" or a hash reference.

       • 1005 "INI - EOL too long"

         The value passed for EOL exceeds its maximum length (16).

       • 1006 "INI - SEP too long"

         The value passed for SEP exceeds its maximum length (16).

       • 1007 "INI - QUOTE too long"

         The value passed for QUOTE exceeds its maximum length (16).

       • 1008 "INI - SEP undefined"

         The value passed for SEP should be defined and not empty.

       • 1010 "INI - the header is empty"

         The header line parsed in the "header" is empty.

       • 1011 "INI - the header contains more than one valid separator"

         The header line parsed in the  "header"  contains more than one
         (unique) separator character out of the allowed set of separators.

       • 1012 "INI - the header contains an empty field"

         The header line parsed in the "header" contains an empty field.

       • 1013 "INI - the header contains non-unique fields"

         The header line parsed in the  "header"  contains at least  two
         identical fields.

       • 1014 "INI - header called on undefined stream"

         The header line cannot be parsed from an undefined source.

       • 1500 "PRM - Invalid/unsupported argument(s)"

         Function or method called with invalid argument(s) or parameter(s).

       • 1501 "PRM - The key attribute is passed as an unsupported type"

         The "key" attribute is of an unsupported type.

       • 1502 "PRM - The value attribute is passed without the key attribute"

         The "value" attribute is only allowed when a valid key is given.

       • 1503 "PRM - The value attribute is passed as an unsupported type"

         The "value" attribute is of an unsupported type.

       • 2010 "ECR - QUO char inside quotes followed by CR not part of EOL"

         When "eol" has been set to anything but the default, like "\r\t\n",
         and a "\r" follows the second (closing) "quote_char" while the
         characters after that "\r" do not make up the "eol" sequence, this is
         an error.

       • 2011 "ECR - Characters after end of quoted field"

         Sequences like "1,foo,"bar"baz,22,1" are not allowed. "bar" is a
         quoted field and after the closing double-quote, there should be
         either a new-line sequence or a separation character.

       • 2012 "EOF - End of data in parsing input stream"

         Self-explanatory: end-of-file was reached while still parsing a
         record. This can only happen when reading from streams with
         "getline", as "parse" is done on strings that are not required to
         have a trailing "eol".

       • 2013 "INI - Specification error for fragments RFC7111"

         Invalid specification for the URI "fragment".

       • 2014 "ENF - Inconsistent number of fields"

         Inconsistent number of fields under strict parsing.

       • 2015 "ERW - Empty row"

         An empty row was not allowed.

       • 2021 "EIQ - NL char inside quotes, binary off"

         Sequences like "1,"foo\nbar",22,1" are allowed only when the binary
         option has been selected with the constructor.

       • 2022 "EIQ - CR char inside quotes, binary off"

         Sequences like "1,"foo\rbar",22,1" are allowed only when the binary
         option has been selected with the constructor.

       • 2023 "EIQ - QUO character not allowed"

         Sequences like ""foo "bar" baz",qu" and "2023,",2008-04-05,"Foo,
         Bar",\n" will cause this error.

       • 2024 "EIQ - EOF cannot be escaped, not even inside quotes"

         The escape character is not allowed as last character in an input
         stream.

       • 2025 "EIQ - Loose unescaped escape"

         An escape character should escape only characters that need escaping.

         Allowing  the escape  for other characters  is possible  with the
         attribute "allow_loose_escapes".

       • 2026 "EIQ - Binary character inside quoted field, binary off"

         Binary characters are not allowed by default.  An exception is made
         for fields that contain valid UTF-8: those are automatically
         upgraded. Set "binary" to 1 to accept binary data.

       • 2027 "EIQ - Quoted field not terminated"

         When parsing a field that started with a quotation character,  the
         field is expected to be closed with a quotation character.   When the
         parsed line is exhausted before the quote is found, that field is not
         terminated.

       • 2030 "EIF - NL char inside unquoted verbatim, binary off"

       • 2031 "EIF - CR char is first char of field, not part of EOL"

       • 2032 "EIF - CR char inside unquoted, not part of EOL"

       • 2034 "EIF - Loose unescaped quote"

       • 2035 "EIF - Escaped EOF in unquoted field"

       • 2036 "EIF - ESC error"

       • 2037 "EIF - Binary character in unquoted field, binary off"

       • 2110 "ECB - Binary character in Combine, binary off"

       • 2200 "EIO - print to IO failed. See errno"

       • 3001 "EHR - Unsupported syntax for column_names ()"

       • 3002 "EHR - getline_hr () called before column_names ()"

       • 3003 "EHR - bind_columns () and column_names () fields count
         mismatch"

       • 3004 "EHR - bind_columns () only accepts refs to scalars"

       • 3006 "EHR - bind_columns () did not pass enough refs for parsed
         fields"

       • 3007 "EHR - bind_columns needs refs to writable scalars"

       • 3008 "EHR - unexpected error in bound fields"

       • 3009 "EHR - print_hr () called before column_names ()"

       • 3010 "EHR - print_hr () called with invalid arguments"
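
       The "EHR" errors all relate to the hash interface.  A minimal sketch of
       the expected calling order (the column name used below is just an
       assumption):

        $csv->column_names ($csv->getline ($fh)); # first row gives the names
        while (my $hr = $csv->getline_hr ($fh)) {
            print $hr->{code}, "\n";
            }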

SEE ALSO
       IO::File,  IO::Handle,  IO::Wrap,  Text::CSV,  Text::CSV_PP,
       Text::CSV::Encoded,  Text::CSV::Separator,  Text::CSV::Slurp,
       Spreadsheet::CSV and Spreadsheet::Read, and of course perl.

       If you are using Raku,  have a look at "Text::CSV" in the Raku
       ecosystem, offering the same features.

       non-perl

       A CSV parser in JavaScript,  also used by W3C <http://www.w3.org>,  is
       the multi-threaded in-browser PapaParse <http://papaparse.com/>.

       csvkit <http://csvkit.readthedocs.org> is a Python CSV parsing toolkit.

AUTHOR
       Alan Citterman <alan@mfgrtl.com> wrote the original Perl module.
       Please don't send mail concerning Text::CSV_XS to Alan, who is not
       involved in the C/XS part that is now the main part of the module.

       Jochen Wiedmann <joe@ispsoft.de> rewrote the en- and decoding in C by
       implementing a simple finite-state machine.   He added variable quote,
       escape and separator characters, the binary mode and the print and
       getline methods. See ChangeLog releases 0.10 through 0.23.

       H.Merijn Brand <hmbrand@cpan.org> cleaned up the code,  added the field
       flags methods,  wrote the major part of the test suite, completed the
       documentation,   fixed most RT bugs,  added all the allow flags and the
       "csv" function. See ChangeLog releases 0.25 and on.

COPYRIGHT AND LICENSE
        Copyright (C) 2007-2023 H.Merijn Brand.  All rights reserved.
        Copyright (C) 1998-2001 Jochen Wiedmann. All rights reserved.
        Copyright (C) 1997      Alan Citterman.  All rights reserved.

       This library is free software;  you can redistribute and/or modify it
       under the same terms as Perl itself.



perl v5.38.0                      2023-09-21                         CSV_XS(3)