CSV_XS(3)             User Contributed Perl Documentation            CSV_XS(3)

NAME

6       Text::CSV_XS - comma-separated values manipulation routines
7

SYNOPSIS

9        # Functional interface
10        use Text::CSV_XS qw( csv );
11
12        # Read whole file in memory
13        my $aoa = csv (in => "data.csv");    # as array of array
14        my $aoh = csv (in => "data.csv",
15                       headers => "auto");   # as array of hash
16
17        # Write array of arrays as csv file
18        csv (in => $aoa, out => "file.csv", sep_char=> ";");
19
20        # Only show lines where "code" is odd
21        csv (in => "data.csv", filter => { code => sub { $_ % 2 }});
22
23
24        # Object interface
25        use Text::CSV_XS;
26
27        my @rows;
28        # Read/parse CSV
29        my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1 });
30        open my $fh, "<:encoding(utf8)", "test.csv" or die "test.csv: $!";
31        while (my $row = $csv->getline ($fh)) {
32            $row->[2] =~ m/pattern/ or next; # 3rd field should match
33            push @rows, $row;
34            }
35        close $fh;
36
37        # and write as CSV
38        open $fh, ">:encoding(utf8)", "new.csv" or die "new.csv: $!";
39        $csv->say ($fh, $_) for @rows;
40        close $fh or die "new.csv: $!";
41

DESCRIPTION

43       Text::CSV_XS  provides facilities for the composition  and
44       decomposition of comma-separated values.  An instance of the
45       Text::CSV_XS class will combine fields into a "CSV" string and parse a
46       "CSV" string into fields.
47
48       The module accepts either strings or files as input  and supports the
49       use of user-specified characters for delimiters, separators, and
50       escapes.
51
52   Embedded newlines
53       Important Note:  The default behavior is to accept only ASCII
54       characters in the range from 0x20 (space) to 0x7E (tilde).   This means
55       that the fields can not contain newlines. If your data contains
56       newlines embedded in fields, or characters above 0x7E (tilde), or
57       binary data, you must set "binary => 1" in the call to "new". To cover
58       the widest range of parsing options, you will always want to set
59       binary.
60
61       But you still have the problem  that you have to pass a correct line to
62       the "parse" method,  which is more complicated than the usual way of
63       reading lines:
64
65        my $csv = Text::CSV_XS->new ({ binary => 1, eol => $/ });
66        while (<>) {           #  WRONG!
67            $csv->parse ($_);
68            my @fields = $csv->fields ();
69            }
70
71       this will break, as the "while" might read broken lines:  it does not
72       care about the quoting. If you need to support embedded newlines,  the
73       way to go is to  not  pass "eol" in the parser  (it accepts "\n", "\r",
74       and "\r\n" by default) and then
75
76        my $csv = Text::CSV_XS->new ({ binary => 1 });
77        open my $fh, "<", $file or die "$file: $!";
78        while (my $row = $csv->getline ($fh)) {
79            my @fields = @$row;
80            }
81
82       The old(er) way of using global file handles is still supported
83
84        while (my $row = $csv->getline (*ARGV)) { ... }
85
86   Unicode
87       Unicode is only tested to work with perl-5.8.2 and up.
88
89       See also "BOM".
90
91       The simplest way to ensure the correct encoding is used for  in- and
92       output is by either setting layers on the filehandles, or setting the
93       "encoding" argument for "csv".
94
95        open my $fh, "<:encoding(UTF-8)", "in.csv"  or die "in.csv: $!";
96       or
97        my $aoa = csv (in => "in.csv",     encoding => "UTF-8");
98
99        open my $fh, ">:encoding(UTF-8)", "out.csv" or die "out.csv: $!";
100       or
101        csv (in => $aoa, out => "out.csv", encoding => "UTF-8");
102
103       On parsing (both for  "getline" and  "parse"),  if the source is marked
104       being UTF8, then all fields that are marked binary will also be marked
105       UTF8.
106
107       On combining ("print"  and  "combine"):  if any of the combining fields
108       was marked UTF8, the resulting string will be marked as UTF8.  Note,
109       however, that any field  before  the first field marked UTF8  which
110       contains 8-bit characters that were not upgraded to UTF8  will remain
111       "bytes"  in the resulting string too, possibly causing unexpected
112       errors.  If you pass data of mixed encoding,  or you do not know
113       whether the encoding is consistent,  force it to be upgraded before
114       you pass it on:
115
116        $csv->print ($fh, [ map { utf8::upgrade (my $x = $_); $x } @data ]);
117
118       For complete control over encoding, please use Text::CSV::Encoded:
119
120        use Text::CSV::Encoded;
121        my $csv = Text::CSV::Encoded->new ({
122            encoding_in  => "iso-8859-1", # the encoding comes into   Perl
123            encoding_out => "cp1252",     # the encoding comes out of Perl
124            });
125
126        $csv = Text::CSV::Encoded->new ({ encoding  => "utf8" });
127        # combine () and print () accept *literally* utf8 encoded data
128        # parse () and getline () return *literally* utf8 encoded data
129
130        $csv = Text::CSV::Encoded->new ({ encoding  => undef }); # default
131        # combine () and print () accept UTF8 marked data
132        # parse () and getline () return UTF8 marked data
133
134   BOM
135       BOM  (or Byte Order Mark)  handling is available only inside the
136       "header" method.   This method supports the following encodings:
137       "utf-8", "utf-1", "utf-32be", "utf-32le", "utf-16be", "utf-16le",
138       "utf-ebcdic", "scsu", "bocu-1", and "gb-18030". See Wikipedia
139       <https://en.wikipedia.org/wiki/Byte_order_mark>.
140
141       If a file has a BOM, the easiest way to deal with that is
142
143        my $aoh = csv (in => $file, detect_bom => 1);
144
145       All records will be encoded based on the detected BOM.
146
147       This implies a call to the  "header"  method,  which defaults to also
148       set the "column_names". So this is not the same as
149
150        my $aoh = csv (in => $file, headers => "auto");
151
152       which only reads the first record to set  "column_names"  but ignores
153       the meaning of any BOM that might be present.
154
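       As a minimal sketch of the object-interface equivalent (the file name
       "bom.csv" is made up), the "header" method described below detects the
       BOM, sets the encoding on the handle, and records the detected encoding
       in the object attribute "ENCODING":

        my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1 });
        open my $fh, "<", "bom.csv" or die "bom.csv: $!";
        $csv->header ($fh, { detect_bom => 1 });  # also sets column_names
        my $enc = $csv->{ENCODING};               # encoding found in the BOM, if any
        while (my $row = $csv->getline_hr ($fh)) {
            # fields are decoded according to the detected BOM
            }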

SPECIFICATION

156       While no formal specification for CSV exists, RFC 4180
157       <https://datatracker.ietf.org/doc/html/rfc4180> (1) describes the
158       common format and establishes  "text/csv" as the MIME type registered
159       with the IANA. RFC 7111 <https://datatracker.ietf.org/doc/html/rfc7111>
160       (2) adds fragments to CSV.
161
162       Many informal documents exist that describe the "CSV" format.   "How
163       To: The Comma Separated Value (CSV) File Format"
164       <http://creativyst.com/Doc/Articles/CSV/CSV01.shtml> (3)  provides an
165       overview of the  "CSV"  format in the most widely used applications and
166       explains how it can best be used and supported.
167
168        1) https://datatracker.ietf.org/doc/html/rfc4180
169        2) https://datatracker.ietf.org/doc/html/rfc7111
170        3) http://creativyst.com/Doc/Articles/CSV/CSV01.shtml
171
172       The basic rules are as follows:
173
174       CSV  is a delimited data format that has fields/columns separated by
175       the comma character and records/rows separated by newlines. Fields that
176       contain a special character (comma, newline, or double quote),  must be
177       enclosed in double quotes. However, if a line contains a single entry
178       that is the empty string, it may be enclosed in double quotes.  If a
179       field's value contains a double quote character it is escaped by
180       placing another double quote character next to it. The "CSV" file
181       format does not require a specific character encoding, byte order, or
182       line terminator format.
183
184       • Each record is a single line ended by a line feed  (ASCII/"LF"=0x0A)
185         or a carriage return and line feed pair (ASCII/"CRLF"="0x0D 0x0A"),
186         however, line-breaks may be embedded.
187
188       • Fields are separated by commas.
189
190       • Allowable characters within a "CSV" field include 0x09 ("TAB") and
191         the inclusive range of 0x20 (space) through 0x7E (tilde).  In binary
192         mode all characters are accepted, at least in quoted fields.
193
194       • A field within  "CSV"  must be surrounded by  double-quotes to
195         contain  a separator character (comma).
196
197       Though this is the clearest and most restrictive definition,
198       Text::CSV_XS is way more liberal than this, and allows extension:
199
200       • Line termination by a single carriage return is accepted by default
201
202       • The separation-, quotation-, and escape-characters can be any ASCII
203         character in the range from  0x20 (space) to  0x7E (tilde).
204         Characters outside this range may or may not work as expected.
205         Multibyte characters, like UTF "U+060C" (ARABIC COMMA),   "U+FF0C"
206         (FULLWIDTH COMMA),  "U+241B" (SYMBOL FOR ESCAPE), "U+2424" (SYMBOL
207         FOR NEWLINE), "U+FF02" (FULLWIDTH QUOTATION MARK), and "U+201C" (LEFT
208         DOUBLE QUOTATION MARK) (to give some examples of what might look
209         promising) work for newer versions of perl for "sep_char", and
210         "quote_char" but not for "escape_char".
211
212         If you use perl-5.8.2 or higher these three attributes are
213         utf8-decoded, to increase the likelihood of success. This way
214         "U+00FE" will be allowed as a quote character.
215
216       • A field in  "CSV"  must be surrounded by double-quotes to make an
217         embedded double-quote, represented by a pair of consecutive double-
218         quotes, valid. In binary mode you may additionally use the sequence
219         ""0" for representation of a NULL byte. Using 0x00 in binary mode is
220         just as valid.
221
222       • Several violations of the above specification may be lifted by
223         passing some options as attributes to the object constructor.
224

METHODS

226   version
227       (Class method) Returns the current module version.
228
229   new
230       (Class method) Returns a new instance of class Text::CSV_XS. The
231       attributes are described by the (optional) hash ref "\%attr".
232
233        my $csv = Text::CSV_XS->new ({ attributes ... });
234
235       The following attributes are available:
236
237       eol
238
239        my $csv = Text::CSV_XS->new ({ eol => $/ });
240                  $csv->eol (undef);
241        my $eol = $csv->eol;
242
243       The end-of-line string to add to rows for "print" or the record
244       separator for "getline".
245
246       When not passed in a parser instance,  the default behavior is to
247       accept "\n", "\r", and "\r\n", so it is probably safer to not specify
248       "eol" at all.  Passing "undef" or the empty string behaves the same.
249
250       When not passed in a generating instance,  records are not terminated
251       at all, so it is probably wise to pass something you expect. A safe
252       choice for "eol" on output is either $/ or "\r\n".
253
254       Common values for "eol" are "\012" ("\n" or Line Feed),  "\015\012"
255       ("\r\n" or Carriage Return, Line Feed),  and "\015"  ("\r" or Carriage
256       Return). The "eol" attribute cannot exceed 7 (ASCII) characters.
257
258       If both $/ and "eol" equal "\015",  lines that end in only a Carriage
259       Return without a Line Feed will be "parse"d correctly.
260
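       A short sketch (file names are invented): generating CRLF-terminated
       records, and reading back a file that uses bare Carriage Returns as
       record terminators, as discussed above:

        my $csv = Text::CSV_XS->new ({ binary => 1, eol => "\r\n" });
        open my $fh, ">", "dos.csv" or die "dos.csv: $!";
        $csv->print ($fh, $_) for [ 1, "foo" ], [ 2, "bar" ];
        close $fh or die "dos.csv: $!";

        # bare-CR line endings: set both $/ and eol to "\015"
        $/ = "\015";
        my $mac = Text::CSV_XS->new ({ binary => 1, eol => "\015" });
        open my $in, "<", "mac.csv" or die "mac.csv: $!";
        while (my $row = $mac->getline ($in)) { ... }
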
261       sep_char
262
263        my $csv = Text::CSV_XS->new ({ sep_char => ";" });
264                $csv->sep_char (";");
265        my $c = $csv->sep_char;
266
267       The char used to separate fields, by default a comma. (",").  Limited
268       to a single-byte character, usually in the range from 0x20 (space) to
269       0x7E (tilde). When longer sequences are required, use "sep".
270
271       The separation character can not be equal to the quote character  or to
272       the escape character.
273
274       See also "CAVEATS"
275
276       sep
277
278        my $csv = Text::CSV_XS->new ({ sep => "\N{FULLWIDTH COMMA}" });
279                  $csv->sep (";");
280        my $sep = $csv->sep;
281
282       The chars used to separate fields, by default undefined. Limited to 8
283       bytes.
284
285       When set, overrules "sep_char".  If its length is one byte it acts as
286       an alias to "sep_char".
287
288       See also "CAVEATS"
289
290       quote_char
291
292        my $csv = Text::CSV_XS->new ({ quote_char => "'" });
293                $csv->quote_char (undef);
294        my $c = $csv->quote_char;
295
296       The character to quote fields containing blanks or binary data,  by
297       default the double quote character (""").  A value of undef suppresses
298       quote chars (for simple cases only). Limited to a single-byte
299       character, usually in the range from  0x20 (space) to  0x7E (tilde).
300       When longer sequences are required, use "quote".
301
302       "quote_char" can not be equal to "sep_char".
303
304       quote
305
306        my $csv = Text::CSV_XS->new ({ quote => "\N{FULLWIDTH QUOTATION MARK}" });
307                    $csv->quote ("'");
308        my $quote = $csv->quote;
309
310       The chars used to quote fields, by default undefined. Limited to 8
311       bytes.
312
313       When set, overrules "quote_char". If its length is one byte it acts as
314       an alias to "quote_char".
315
316       This method does not support "undef".  Use "quote_char" to disable
317       quotation.
318
319       See also "CAVEATS"
320
321       escape_char
322
323        my $csv = Text::CSV_XS->new ({ escape_char => "\\" });
324                $csv->escape_char (":");
325        my $c = $csv->escape_char;
326
327       The character to  escape  certain characters inside quoted fields.
328       This is limited to a  single-byte  character,  usually  in the  range
329       from  0x20 (space) to 0x7E (tilde).
330
331       The "escape_char" defaults to being the double-quote mark ("""). In
332       other words the same as the default "quote_char". This means that
333       doubling the quote mark in a field escapes it:
334
335        "foo","bar","Escape ""quote mark"" with two ""quote marks""","baz"
336
337       If  you  change  the   "quote_char"  without  changing  the
338       "escape_char",  the  "escape_char" will still be the double-quote
339       (""").  If instead you want to escape the  "quote_char" by doubling it
340       you will need to also change the  "escape_char"  to be the same as what
341       you have changed the "quote_char" to.
342
343       Setting "escape_char" to "undef" or "" will completely disable escapes
344       and is greatly discouraged. This will also disable "escape_null".
345
346       The escape character can not be equal to the separation character.
347
348       binary
349
350        my $csv = Text::CSV_XS->new ({ binary => 1 });
351                $csv->binary (0);
352        my $f = $csv->binary;
353
354       If this attribute is 1,  you may use binary characters in quoted
355       fields, including line feeds, carriage returns and "NULL" bytes. (The
356       latter could be escaped as ""0".) By default this feature is off.
357
358       If a string is marked UTF8,  "binary" will be turned on automatically
359       when binary characters other than "CR" and "NL" are encountered.   Note
360       that a simple string like "\x{00a0}" might still be binary, but not
361       marked UTF8, so setting "{ binary => 1 }" is still a wise option.
362
363       strict
364
365        my $csv = Text::CSV_XS->new ({ strict => 1 });
366                $csv->strict (0);
367        my $f = $csv->strict;
368
369       If this attribute is set to 1, any row that parses to a different
370       number of fields than the previous row will cause the parser to throw
371       error 2014.
372
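       A minimal sketch of the effect (the data is inlined via an in-memory
       handle for illustration): with "strict" set and a fatal "auto_diag",
       the short second row aborts parsing instead of being accepted silently:

        my $csv = Text::CSV_XS->new ({ strict => 1, auto_diag => 2 });
        open my $fh, "<", \"a,b,c\n1,2\n" or die $!;
        while (my $row = $csv->getline ($fh)) {
            # the first row parses fine; the second row dies with error 2014
            }
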
373       skip_empty_rows
374
375        my $csv = Text::CSV_XS->new ({ skip_empty_rows => 1 });
376                $csv->skip_empty_rows (0);
377        my $f = $csv->skip_empty_rows;
378
379       If this attribute is set to 1,  any row that has an  "eol" immediately
380       following the start of line will be skipped.  Default behavior is to
381       return one single empty field.
382
383       This attribute is only used in parsing.
384
385       formula_handling
386
387       Alias for "formula"
388
389       formula
390
391        my $csv = Text::CSV_XS->new ({ formula => "none" });
392                $csv->formula ("none");
393        my $f = $csv->formula;
394
395       This defines the behavior of fields containing formulas. As formulas
396       are considered dangerous in spreadsheets, this attribute can define an
397       optional action to be taken if a field starts with an equal sign ("=").
398
399       For the purpose of code readability, this can also be written as
400
401        my $csv = Text::CSV_XS->new ({ formula_handling => "none" });
402                $csv->formula_handling ("none");
403        my $f = $csv->formula_handling;
404
405       Possible values for this attribute are
406
407       none
408         Take no specific action. This is the default.
409
410          $csv->formula ("none");
411
412       die
413         Cause the process to "die" whenever a leading "=" is encountered.
414
415          $csv->formula ("die");
416
417       croak
418         Cause the process to "croak" whenever a leading "=" is encountered.
419         (See Carp)
420
421          $csv->formula ("croak");
422
423       diag
424         Report position and content of the field whenever a leading  "=" is
425         found.  The value of the field is unchanged.
426
427          $csv->formula ("diag");
428
429       empty
430         Replace the content of fields that start with a "=" with the empty
431         string.
432
433          $csv->formula ("empty");
434          $csv->formula ("");
435
436       undef
437         Replace the content of fields that start with a "=" with "undef".
438
439          $csv->formula ("undef");
440          $csv->formula (undef);
441
442       a callback
443         Modify the content of fields that start with a  "="  with the return-
444         value of the callback.  The original content of the field is
445         available inside the callback as $_;
446
447          # Replace all formulas with 42
448          $csv->formula (sub { 42; });
449
450          # same as $csv->formula ("empty") but slower
451          $csv->formula (sub { "" });
452
453          # Allow =4+12
454          $csv->formula (sub { s/^=(\d+\+\d+)$/$1/eer });
455
456          # Allow more complex calculations
457          $csv->formula (sub { eval { s{^=([-+*/0-9()]+)$}{$1}ee }; $_ });
458
459       All other values will give a warning and then fallback to "diag".
460
461       decode_utf8
462
463        my $csv = Text::CSV_XS->new ({ decode_utf8 => 1 });
464                $csv->decode_utf8 (0);
465        my $f = $csv->decode_utf8;
466
467       This attribute defaults to TRUE.
468
469       While parsing,  fields that are valid UTF-8, are automatically set to
470       be UTF-8, so that
471
472         $csv->parse ("\xC4\xA8\n");
473
474       results in
475
476         PV("\304\250"\0) [UTF8 "\x{128}"]
477
478       Sometimes this might not be the desired action.  To prevent those upgrades,
479       set this attribute to false, and the result will be
480
481         PV("\304\250"\0)
482
483       auto_diag
484
485        my $csv = Text::CSV_XS->new ({ auto_diag => 1 });
486                $csv->auto_diag (2);
487        my $l = $csv->auto_diag;
488
489       Setting this attribute to a number between 1 and 9 causes  "error_diag"
490       to be automatically called in void context upon errors.
491
492       In case of error "2012 - EOF", this call will be void.
493
494       If "auto_diag" is set to a numeric value greater than 1, it will "die"
495       on errors instead of "warn".  If set to anything unrecognized,  it will
496       be silently ignored.
497
498       Future extensions to this feature will include more reliable auto-
499       detection of  "autodie"  being active in the scope in which the error
500       occurred,  which will increment the value of "auto_diag" by 1 the
501       moment the error is detected.
502
503       diag_verbose
504
505        my $csv = Text::CSV_XS->new ({ diag_verbose => 1 });
506                $csv->diag_verbose (2);
507        my $l = $csv->diag_verbose;
508
509       Set the verbosity of the output triggered by "auto_diag".   Currently
510       only adds the current  input-record-number  (if known)  to the
511       diagnostic output with an indication of the position of the error.
512
513       blank_is_undef
514
515        my $csv = Text::CSV_XS->new ({ blank_is_undef => 1 });
516                $csv->blank_is_undef (0);
517        my $f = $csv->blank_is_undef;
518
519       Under normal circumstances, "CSV" data makes no distinction between
520       quoted- and unquoted empty fields.  These both end up in an empty
521       string field once read, thus
522
523        1,"",," ",2
524
525       is read as
526
527        ("1", "", "", " ", "2")
528
529       When writing  "CSV" files with either  "always_quote" or  "quote_empty"
530       set, the unquoted  empty field is the result of an undefined value.
531       To enable this distinction when  reading "CSV"  data,  the
532       "blank_is_undef"  attribute will cause  unquoted empty fields to be set
533       to "undef", causing the above to be parsed as
534
535        ("1", "", undef, " ", "2")
536
537       Note that this is specifically important when loading  "CSV" fields
538       into a database that allows "NULL" values,  as the perl equivalent for
539       "NULL" is "undef" in DBI land.
540
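       A hedged sketch of that use case ($dbh, the table, and the file name
       are assumptions, not part of this module):

        my $csv = Text::CSV_XS->new ({ binary => 1, blank_is_undef => 1 });
        open my $fh, "<", "items.csv" or die "items.csv: $!";
        my $sth = $dbh->prepare ("insert into items values (?, ?, ?)");
        while (my $row = $csv->getline ($fh)) {
            # unquoted empty fields arrive as undef and bind as SQL NULL
            $sth->execute (@$row);
            }
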
541       empty_is_undef
542
543        my $csv = Text::CSV_XS->new ({ empty_is_undef => 1 });
544                $csv->empty_is_undef (0);
545        my $f = $csv->empty_is_undef;
546
547       Going one  step  further  than  "blank_is_undef",  this attribute
548       converts all empty fields to "undef", so
549
550        1,"",," ",2
551
552       is read as
553
554        (1, undef, undef, " ", 2)
555
556       Note that this affects only fields that are  originally  empty,  not
557       fields that are empty after stripping allowed whitespace. YMMV.
558
559       allow_whitespace
560
561        my $csv = Text::CSV_XS->new ({ allow_whitespace => 1 });
562                $csv->allow_whitespace (0);
563        my $f = $csv->allow_whitespace;
564
565       When this option is set to true,  the whitespace  ("TAB"'s and
566       "SPACE"'s) surrounding  the  separation character  is removed when
567       parsing.  If either "TAB" or "SPACE" is one of the three characters
568       "sep_char", "quote_char", or "escape_char" it will not be considered
569       whitespace.
570
571       Now lines like:
572
573        1 , "foo" , bar , 3 , zapp
574
575       are parsed as valid "CSV", even though it violates the "CSV" specs.
576
577       Note that  all  whitespace is stripped from both the start and the end
578       of each field.  That makes it more than just a feature for parsing bad
579       "CSV" lines, as
580
581        1,   2.0,  3,   ape  , monkey
582
583       will now be parsed as
584
585        ("1", "2.0", "3", "ape", "monkey")
586
587       even if the original line was perfectly acceptable "CSV".
588
589       allow_loose_quotes
590
591        my $csv = Text::CSV_XS->new ({ allow_loose_quotes => 1 });
592                $csv->allow_loose_quotes (0);
593        my $f = $csv->allow_loose_quotes;
594
595       By default, parsing unquoted fields containing "quote_char" characters
596       like
597
598        1,foo "bar" baz,42
599
600       would result in parse error 2034.  Though it is still bad practice to
601       allow this format,  we  cannot  help  the  fact  that  some  vendors
602       make  their applications spit out lines styled this way.
603
604       If there is really bad "CSV" data, like
605
606        1,"foo "bar" baz",42
607
608       or
609
610        1,""foo bar baz"",42
611
612       there is a way to get this data-line parsed and leave the quotes inside
613       the quoted field as-is.  This can be achieved by setting
614       "allow_loose_quotes" AND making sure that the "escape_char" is  not
615       equal to "quote_char".
616
617       allow_loose_escapes
618
619        my $csv = Text::CSV_XS->new ({ allow_loose_escapes => 1 });
620                $csv->allow_loose_escapes (0);
621        my $f = $csv->allow_loose_escapes;
622
623       Parsing fields  that  have  "escape_char"  characters that escape
624       characters that do not need to be escaped, like:
625
626        my $csv = Text::CSV_XS->new ({ escape_char => "\\" });
627        $csv->parse (qq{1,"my bar\'s",baz,42});
628
629       would result in parse error 2025.   Though it is bad practice to allow
630       this format,  this attribute enables you to treat all escape character
631       sequences equally.
632
633       allow_unquoted_escape
634
635        my $csv = Text::CSV_XS->new ({ allow_unquoted_escape => 1 });
636                $csv->allow_unquoted_escape (0);
637        my $f = $csv->allow_unquoted_escape;
638
639       A backward compatibility issue where "escape_char" differs from
640       "quote_char"  prevents  "escape_char"  from being in the first position
641       of a field.  If "quote_char" is equal to the default """ and "escape_char"
642       is set to "\", this would be illegal:
643
644        1,\0,2
645
646       Setting this attribute to 1  might help to overcome issues with
647       backward compatibility and allow this style.
648
649       always_quote
650
651        my $csv = Text::CSV_XS->new ({ always_quote => 1 });
652                $csv->always_quote (0);
653        my $f = $csv->always_quote;
654
655       By default the generated fields are quoted only if they need to be.
656       For example, if they contain the separator character. If you set this
657       attribute to 1 then all defined fields will be quoted. ("undef" fields
658       are not quoted, see "blank_is_undef").  This often makes it easier to
659       handle exported data in external applications.   (Poor creatures who
660       would be better off using Text::CSV_XS. :)
661
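       A small sketch of the effect on output (values invented):

        my $csv = Text::CSV_XS->new ({ always_quote => 1, eol => "\n" });
        $csv->print (*STDOUT, [ 1, "foo bar", undef ]);
        # "1","foo bar",
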
662       quote_space
663
664        my $csv = Text::CSV_XS->new ({ quote_space => 1 });
665                $csv->quote_space (0);
666        my $f = $csv->quote_space;
667
668       By default,  a space in a field would trigger quotation.  As no rule
669       in "CSV" requires this,  nor any rule forbids the opposite,  the
670       default is true for safety.   You can exclude the space  from this
671       trigger  by setting this attribute to 0.
672
673       quote_empty
674
675        my $csv = Text::CSV_XS->new ({ quote_empty => 1 });
676                $csv->quote_empty (0);
677        my $f = $csv->quote_empty;
678
679       By default the generated fields are quoted only if they need to be.
680       An empty (defined) field does not need quotation. If you set this
681       attribute to 1 then empty defined fields will be quoted.  ("undef"
682       fields are not quoted, see "blank_is_undef"). See also "always_quote".
683
684       quote_binary
685
686        my $csv = Text::CSV_XS->new ({ quote_binary => 1 });
687                $csv->quote_binary (0);
688        my $f = $csv->quote_binary;
689
690       By default,  all "unsafe" bytes inside a string cause the combined
691       field to be quoted.  By setting this attribute to 0, you can disable
692       that trigger for bytes ">= 0x7F".
693
694       escape_null
695
696        my $csv = Text::CSV_XS->new ({ escape_null => 1 });
697                $csv->escape_null (0);
698        my $f = $csv->escape_null;
699
700       By default, a "NULL" byte in a field would be escaped. This option
701       enables you to treat the  "NULL"  byte as a simple binary character in
702       binary mode (when "{ binary => 1 }" is set).  The default is true.  You
703       can prevent "NULL" escapes by setting this attribute to 0.
704
705       When the "escape_char" attribute is set to undefined,  this attribute
706       will be set to false.
707
708       The default setting will encode "=\x00=" as
709
710        "="0="
711
712       With "escape_null" set to false, this will result in
713
714        "=\x00="
715
716       The default when using the "csv" function is "false".
717
718       For backward compatibility reasons,  the deprecated old name
719       "quote_null" is still recognized.
720
721       keep_meta_info
722
723        my $csv = Text::CSV_XS->new ({ keep_meta_info => 1 });
724                $csv->keep_meta_info (0);
725        my $f = $csv->keep_meta_info;
726
727       By default, the parsing of input records is as simple and fast as
728       possible.  However,  some parsing information - like quotation of the
729       original field - is lost in that process.  Setting this flag to true
730       enables retrieving that information after parsing with  the methods
731       "meta_info",  "is_quoted", and "is_binary" described below.  Default is
732       false for performance.
733
734       If you set this attribute to a value greater than 9,   then you can
735       control output quotation style as it was used in the input of the
736       last parsed record (unless quotation was added because of other
737       reasons).
738
739        my $csv = Text::CSV_XS->new ({
740           binary         => 1,
741           keep_meta_info => 1,
742           quote_space    => 0,
743           });
744
745        $csv->parse (q{1,,"", ," ",f,"g","h""h",help,"help"});
746        my @row = $csv->fields;
747        $csv->print (*STDOUT, \@row);
748        # 1,,, , ,f,g,"h""h",help,help
749        $csv->keep_meta_info (11);
750        $csv->print (*STDOUT, \@row);
751        # 1,,"", ," ",f,"g","h""h",help,"help"
752
753       undef_str
754
755        my $csv = Text::CSV_XS->new ({ undef_str => "\\N" });
756                $csv->undef_str (undef);
757        my $s = $csv->undef_str;
758
759       This attribute optionally defines the output of undefined fields. The
760       value passed is not changed at all, so if it needs quotation, the
761       quotation needs to be included in the value of the attribute.  Use with
762       caution, as passing a value like  ",",,,,"""  will for sure mess up
763       your output. The default for this attribute is "undef", meaning no
764       special treatment.
765
766       This attribute is useful when exporting  CSV data  to be imported in
767       custom loaders, like for MySQL, that recognize special sequences for
768       "NULL" data.
769
770       This attribute has no meaning when parsing CSV data.
771
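       A small sketch of the effect on output (values invented):

        my $csv = Text::CSV_XS->new ({ eol => "\n", undef_str => "\\N" });
        $csv->print (*STDOUT, [ 1, undef, "foo" ]);
        # 1,\N,foo
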
772       comment_str
773
774        my $csv = Text::CSV_XS->new ({ comment_str => "#" });
775                $csv->comment_str (undef);
776        my $s = $csv->comment_str;
777
778       This attribute optionally defines a string to be recognized as a comment.
779       If this attribute is defined,   all lines starting with this sequence
780       will not be parsed as CSV but skipped as comments.
781
782       This attribute has no meaning when generating CSV.
783
784       Comment strings that start with any of the special characters/sequences
785       are not supported (so it cannot start with any of "sep_char",
786       "quote_char", "escape_char", "sep", "quote", or "eol").
787
788       For convenience, "comment" is an alias for "comment_str".
789
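       A short sketch (data inlined via an in-memory handle for illustration):
       lines starting with "#" are skipped while parsing:

        my $csv = Text::CSV_XS->new ({ binary => 1, comment_str => "#" });
        open my $fh, "<", \"# generated nightly\n1,foo\n2,bar\n" or die $!;
        while (my $row = $csv->getline ($fh)) {
            # only the two data rows are returned
            }
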
790       verbatim
791
792        my $csv = Text::CSV_XS->new ({ verbatim => 1 });
793                $csv->verbatim (0);
794        my $f = $csv->verbatim;
795
796       This is a quite controversial attribute to set,  but makes some hard
797       things possible.
798
799       The rationale behind this attribute is to tell the parser that the
800       normally special characters newline ("NL") and Carriage Return ("CR")
801       will not be special when this flag is set,  and be dealt with  as being
802       ordinary binary characters. This will ease working with data with
803       embedded newlines.
804
805       When  "verbatim"  is used with  "getline",  "getline"  auto-"chomp"'s
806       every line.
807
808       Imagine a file format like
809
810        M^^Hans^Janssen^Klas 2\n2A^Ja^11-06-2007#\r\n
811
812       where the line ending is a very specific "#\r\n", and the sep_char is
813       a "^" (caret).   None of the fields is quoted,   but embedded binary
814       data is likely to be present. With the specific line ending, this
815       should not be too hard to detect.
816
817       By default,  Text::CSV_XS'  parse function is instructed to know only
818       about "\n" and "\r"  as legal line endings,  and so has to deal with
819       the embedded newline as a real "end-of-line",  so it can scan the next
820       line if binary is true, and the newline is inside a quoted field. With
821       this option, we tell "parse" to parse the line as if "\n" is just
822       nothing more than a binary character.
823
824       For "parse" this means that the parser no longer has any idea about
825       line endings and "getline" "chomp"s line endings on reading.
826
827       types
828
829       A set of column types; the attribute is immediately passed to the
830       "types" method.
831
832       callbacks
833
834       See the "Callbacks" section below.
835
836       accessors
837
838       To sum it up,
839
840        $csv = Text::CSV_XS->new ();
841
842       is equivalent to
843
844        $csv = Text::CSV_XS->new ({
845            eol                   => undef, # \r, \n, or \r\n
846            sep_char              => ',',
847            sep                   => undef,
848            quote_char            => '"',
849            quote                 => undef,
850            escape_char           => '"',
851            binary                => 0,
852            decode_utf8           => 1,
853            auto_diag             => 0,
854            diag_verbose          => 0,
855            blank_is_undef        => 0,
856            empty_is_undef        => 0,
857            allow_whitespace      => 0,
858            allow_loose_quotes    => 0,
859            allow_loose_escapes   => 0,
860            allow_unquoted_escape => 0,
861            always_quote          => 0,
862            quote_empty           => 0,
863            quote_space           => 1,
864            escape_null           => 1,
865            quote_binary          => 1,
866            keep_meta_info        => 0,
867            strict                => 0,
868            skip_empty_rows       => 0,
869            formula               => 0,
870            verbatim              => 0,
871            undef_str             => undef,
872            comment_str           => undef,
873            types                 => undef,
874            callbacks             => undef,
875            });
876
877       For all of the above mentioned flags, an accessor method is available
878       where you can inquire the current value, or change the value
879
880        my $quote = $csv->quote_char;
881        $csv->binary (1);
882
883       It is not wise to change these settings halfway through writing "CSV"
884       data to a stream. If however you want to create a new stream using the
885       available "CSV" object, there is no harm in changing them.
886
887       If the "new" constructor call fails,  it returns "undef",  and makes
888       the fail reason available through the "error_diag" method.
889
890        $csv = Text::CSV_XS->new ({ ecs_char => 1 }) or
891            die "".Text::CSV_XS->error_diag ();
892
893       "error_diag" will return a string like
894
895        "INI - Unknown attribute 'ecs_char'"
896
897   known_attributes
898        @attr = Text::CSV_XS->known_attributes;
899        @attr = Text::CSV_XS::known_attributes;
900        @attr = $csv->known_attributes;
901
902       This method will return an ordered list of all the supported
903       attributes as described above.   This can be useful for knowing what
904       attributes are valid in classes that use or extend Text::CSV_XS.
905
906   print
907        $status = $csv->print ($fh, $colref);
908
909       Similar to  "combine" + "string" + "print",  but much more efficient.
910       It expects an array ref as input  (not an array!)  and the resulting
911       string is not really  created,  but  immediately  written  to the  $fh
912       object, typically an IO handle or any other object that offers a
913       "print" method.
914
915       For performance reasons  "print"  does not create a result string,  so
916       all "string", "status", "fields", and "error_input" methods will return
917       undefined information after executing this method.
918
919       If $colref is "undef"  (explicit,  not through a variable argument) and
920       "bind_columns"  was used to specify fields to be printed,  it is
921       possible to make performance improvements, as otherwise data would have
922       to be copied as arguments to the method call:
923
924        $csv->bind_columns (\($foo, $bar));
925        $status = $csv->print ($fh, undef);
926
927       A short benchmark
928
929        my @data = ("aa" .. "zz");
930        $csv->bind_columns (\(@data));
931
932        $csv->print ($fh, [ @data ]);   # 11800 recs/sec
933        $csv->print ($fh,  \@data  );   # 57600 recs/sec
934        $csv->print ($fh,   undef  );   # 48500 recs/sec
935
936   say
937        $status = $csv->say ($fh, $colref);
938
939       Like "print", but "eol" defaults to "$\".
940
941   print_hr
942        $csv->print_hr ($fh, $ref);
943
944       Provides an easy way  to print a  $ref  (as fetched with "getline_hr")
945       provided the column names are set with "column_names".
946
947       It is just a wrapper method with basic parameter checks over
948
949        $csv->print ($fh, [ map { $ref->{$_} } $csv->column_names ]);
950
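       A short usage sketch (the column names and the open handles $in and
       $out are assumptions):

        $csv->column_names (qw( code name price ));
        while (my $hr = $csv->getline_hr ($in)) {
            $hr->{price} *= 1.21;           # adjust a column by name
            $csv->print_hr ($out, $hr);
            }
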
951   combine
952        $status = $csv->combine (@fields);
953
954       This method constructs a "CSV" record from  @fields,  returning success
955       or failure.   Failure can result from lack of arguments or an argument
956       that contains an invalid character.   Upon success,  "string" can be
957       called to retrieve the resultant "CSV" string.  Upon failure,  the
958       value returned by "string" is undefined and "error_input" could be
959       called to retrieve the invalid argument.
960
961   string
962        $line = $csv->string ();
963
964       This method returns the input to  "parse"  or the resultant "CSV"
965       string of "combine", whichever was called more recently.
966
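       A small sketch combining "combine" with "string" (field values
       invented):

        my $csv = Text::CSV_XS->new ();
        $csv->combine ("abc", "def,g", 42) or die "" . $csv->error_diag;
        print $csv->string, "\n";   # abc,"def,g",42
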
967   getline
968        $colref = $csv->getline ($fh);
969
970       This is the counterpart to  "print",  as "parse"  is the counterpart to
971       "combine":  it reads a row from the $fh  handle using the "getline"
972       method associated with $fh  and parses this row into an array ref.
973       This array ref is returned by the function or "undef" for failure.
974       When $fh does not support "getline", you are likely to hit errors.
975
976       When fields are bound with "bind_columns" the return value is a
977       reference to an empty list.
978
979       The "string", "fields", and "status" methods are meaningless again.
980
981   getline_all
982        $arrayref = $csv->getline_all ($fh);
983        $arrayref = $csv->getline_all ($fh, $offset);
984        $arrayref = $csv->getline_all ($fh, $offset, $length);
985
986       This will return a reference to a list of getline ($fh) results.  In
987       this call, "keep_meta_info" is disabled.  If $offset is negative, as
988       with "splice", only the last  "abs ($offset)" records of $fh are taken
989       into consideration.
990
991       Given a CSV file with 10 lines:
992
993        lines call
994        ----- ---------------------------------------------------------
995        0..9  $csv->getline_all ($fh)         # all
996        0..9  $csv->getline_all ($fh,  0)     # all
997        8..9  $csv->getline_all ($fh,  8)     # start at 8
998        -     $csv->getline_all ($fh,  0,  0) # start at 0 first 0 rows
999        0..4  $csv->getline_all ($fh,  0,  5) # start at 0 first 5 rows
1000        4..5  $csv->getline_all ($fh,  4,  2) # start at 4 first 2 rows
1001        8..9  $csv->getline_all ($fh, -2)     # last 2 rows
1002        6..7  $csv->getline_all ($fh, -4,  2) # first 2 of last  4 rows
1003
1004   getline_hr
1005       The "getline_hr" and "column_names" methods work together  to allow you
1006       to have rows returned as hashrefs.  You must call "column_names" first
1007       to declare your column names.
1008
1009        $csv->column_names (qw( code name price description ));
1010        $hr = $csv->getline_hr ($fh);
1011        print "Price for $hr->{name} is $hr->{price} EUR\n";
1012
1013       "getline_hr" will croak if called before "column_names".
1014
1015       Note that  "getline_hr"  creates a hashref for every row and will be
1016       much slower than the combined use of "bind_columns"  and "getline",  but
1017       still offers the same easy-to-use hashref inside the loop:
1018
1019        my @cols = @{$csv->getline ($fh)};
1020        $csv->column_names (@cols);
1021        while (my $row = $csv->getline_hr ($fh)) {
1022            print $row->{price};
1023            }
1024
1025       Could easily be rewritten to the much faster:
1026
1027        my @cols = @{$csv->getline ($fh)};
1028        my $row = {};
1029        $csv->bind_columns (\@{$row}{@cols});
1030        while ($csv->getline ($fh)) {
1031            print $row->{price};
1032            }
1033
1034       Your mileage may vary for the size of the data and the number of rows.
1035       With perl-5.14.2 the comparison for a 100_000 line file with 14
1036       columns:
1037
1038                   Rate hashrefs getlines
1039        hashrefs 1.00/s       --     -76%
1040        getlines 4.15/s     313%       --
1041
1042   getline_hr_all
1043        $arrayref = $csv->getline_hr_all ($fh);
1044        $arrayref = $csv->getline_hr_all ($fh, $offset);
1045        $arrayref = $csv->getline_hr_all ($fh, $offset, $length);
1046
1047       This will return a reference to a list of   getline_hr ($fh) results.
1048       In this call, "keep_meta_info" is disabled.
1049
1050   parse
1051        $status = $csv->parse ($line);
1052
1053       This method decomposes a  "CSV"  string into fields,  returning success
1054       or failure.   Failure can result from a lack of argument  or from the
1055       given "CSV" string being improperly formatted.   Upon success,  "fields" can be
1056       called to retrieve the decomposed fields. Upon failure calling "fields"
1057       will return undefined data and  "error_input"  can be called to
1058       retrieve  the invalid argument.
1059
1060       You may use the "types"  method for setting column types.  See "types"'
1061       description below.
1062
1063       The $line argument is supposed to be a simple scalar. Everything else
1064       is supposed to croak and set error 1500.
1065
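       A small sketch tying "parse" to "fields" and "error_diag" (the input is
       invented):

        my $csv = Text::CSV_XS->new ({ binary => 1 });
        if ($csv->parse (q{1,"foo bar",3})) {
            my @fields = $csv->fields;   # ("1", "foo bar", "3")
            }
        else {
            my ($code, $str, $pos) = $csv->error_diag;
            warn "parse failed: $code - $str at position $pos\n";
            }
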
1066   fragment
1067       This function tries to implement RFC7111  (URI Fragment Identifiers for
1068       the text/csv Media Type) -
1069       https://datatracker.ietf.org/doc/html/rfc7111
1070
1071        my $AoA = $csv->fragment ($fh, $spec);
1072
1073       In specifications,  "*" is used to specify the last item, a dash ("-")
1074       to indicate a range.   All indices are 1-based:  the first row or
1075       column has index 1. Selections can be combined with the semi-colon
1076       (";").
1077
1078       When using this method in combination with  "column_names",  the
1079       returned reference  will point to a  list of hashes  instead of a  list
1080       of lists.  A disjointed  cell-based combined selection  might return
1081       rows with different number of columns making the use of hashes
1082       rows with different numbers of columns, making the use of hashes
1083
1084        $csv->column_names ("Name", "Age");
1085        my $AoH = $csv->fragment ($fh, "col=3;8");
1086
1087       If the "after_parse" callback is active,  it is also called on every
1088       line parsed and skipped before the fragment.
1089
1090       row
1091          row=4
1092          row=5-7
1093          row=6-*
1094          row=1-2;4;6-*
1095
1096       col
1097          col=2
1098          col=1-3
1099          col=4-*
1100          col=1-2;4;7-*
1101
1102       cell
1103         In cell-based selection, the comma (",") is used to pair row and
1104         column
1105
1106          cell=4,1
1107
1108         The range operator ("-") using "cell"s can be used to define top-left
1109         and bottom-right "cell" location
1110
1111          cell=3,1-4,6
1112
1113         The "*" is only allowed in the second part of a pair
1114
1115          cell=3,2-*,2    # row 3 till end, only column 2
1116          cell=3,2-3,*    # column 2 till end, only row 3
1117          cell=3,2-*,*    # strip row 1 and 2, and column 1
1118
1119         Cells and cell ranges may be combined with ";", possibly resulting in
1120         rows with different numbers of columns
1121
1122          cell=1,1-2,2;3,3-4,4;1,4;4,1
1123
1124         Disjointed selections will only return selected cells.   The cells
1125         that are not  specified  will  not  be  included  in the  returned
1126         set,  not even as "undef".  As an example given a "CSV" like
1127
1128          11,12,13,...19
1129          21,22,...28,29
1130          :            :
1131          91,...97,98,99
1132
1133         with "cell=1,1-2,2;3,3-4,4;1,4;4,1" will return:
1134
1135          11,12,14
1136          21,22
1137          33,34
1138          41,43,44
1139
1140         Overlapping cell-specs will return those cells only once, so
1141         "cell=1,1-3,3;2,2-4,4;2,3;4,2" will return:
1142
1143          11,12,13
1144          21,22,23,24
1145          31,32,33,34
1146          42,43,44
1147
1148       RFC7111 <https://datatracker.ietf.org/doc/html/rfc7111> does  not
1149       allow different types of specs to be combined   (either "row" or "col"
1150       or "cell").  Passing an invalid fragment specification will croak and
1151       set error 2013.
1152
1153   column_names
1154       Set the "keys" that will be used in the  "getline_hr"  calls.  If no
1155       keys (column names) are passed, it will return the current setting as a
1156       list.
1157
1158       "column_names" accepts a list of scalars  (the column names)  or a
1159       single array_ref, so you can pass the return value from "getline" too:
1160
1161        $csv->column_names ($csv->getline ($fh));
1162
1163       "column_names" does no checking on duplicates at all, which might lead
1164       to unexpected results.   Undefined entries will be replaced with the
1165       string "\cAUNDEF\cA", so
1166
1167        $csv->column_names (undef, "", "name", "name");
1168        $hr = $csv->getline_hr ($fh);
1169
1170       will set "$hr->{"\cAUNDEF\cA"}" to the 1st field,  "$hr->{""}" to the
1171       2nd field, and "$hr->{name}" to the 4th field,  discarding the 3rd
1172       field.
1173
1174       "column_names" croaks on invalid arguments.
1175
1176   header
1177       This method does NOT work in perl-5.6.x
1178
1179       Parse the CSV header and set "sep", column_names and encoding.
1180
1181        my @hdr = $csv->header ($fh);
1182        $csv->header ($fh, { sep_set => [ ";", ",", "|", "\t" ] });
1183        $csv->header ($fh, { detect_bom => 1, munge_column_names => "lc" });
1184
1185       The first argument should be a file handle.
1186
1187       This method resets some object properties,  as it is supposed to be
1188       invoked only once per file or stream.  It will leave attributes
1189       "column_names" and "bound_columns" alone if setting column names is
1190       disabled.  Reading headers on previously processed objects might fail on
1191       perl-5.8.0 and older.
1192
1193       Assuming that the file opened for parsing has a header, and the header
1194       does not contain problematic characters like embedded newlines,   read
1195       the first line from the open handle then auto-detect whether the header
1196       separates the column names with a character from the allowed separator
1197       list.
1198
1199       If any of the allowed separators matches,  and none of the other
1200       allowed separators match,  set  "sep"  to that  separator  for the
1201       current CSV_XS instance and use it to parse the first line, map those
1202       to lowercase, and use that to set the instance "column_names":
1203
1204        my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1 });
1205        open my $fh, "<", "file.csv";
1206        binmode $fh; # for Windows
1207        $csv->header ($fh);
1208        while (my $row = $csv->getline_hr ($fh)) {
1209            ...
1210            }
1211
1212       If the header is empty,  contains more than one unique separator out of
1213       the allowed set,  contains empty fields,   or contains identical fields
1214       (after folding), it will croak with error 1010, 1011, 1012, or 1013
1215       respectively.
1216
1217       If the header contains embedded newlines or is not valid  CSV  in any
1218       other way, this method will croak and leave the parse error untouched.
1219
1220       A successful call to "header"  will always set the  "sep"  of the $csv
1221       object. This behavior can not be disabled.
1222
1223       return value
1224
1225       On error this method will croak.
1226
1227       In list context,  the headers will be returned whether they are used to
1228       set "column_names" or not.
1229
1230       In scalar context, the instance itself is returned.  Note: the values
1231       as found in the header will effectively be  lost if  "set_column_names"
1232       is false.
1233
1234       Options
1235
1236       sep_set
1237          $csv->header ($fh, { sep_set => [ ";", ",", "|", "\t" ] });
1238
1239         The list of legal separators defaults to "[ ";", "," ]" and can be
1240         changed by this option.  As this is probably the most often used
1241         option,  it can be passed on its own as an unnamed argument:
1242
1243          $csv->header ($fh, [ ";", ",", "|", "\t", "::", "\x{2063}" ]);
1244
1245         Multi-byte  sequences are allowed,  both multi-character and
1246         Unicode.  See "sep".
1247
1248       detect_bom
1249          $csv->header ($fh, { detect_bom => 1 });
1250
1251         The default behavior is to detect if the header line starts with a
1252         BOM.  If the header has a BOM, use that to set the encoding of $fh.
1253         This default behavior can be disabled by passing a false value to
1254         "detect_bom".
1255
1256         Supported encodings from BOM are: UTF-8, UTF-16BE, UTF-16LE,
1257         UTF-32BE,  and UTF-32LE. BOM also supports UTF-1, UTF-EBCDIC, SCSU,
1258         BOCU-1,  and GB-18030 but Encode does not (yet). UTF-7 is not
1259         supported.
1260
1261         If a supported BOM was detected as start of the stream, it is stored
1262         in the object attribute "ENCODING".
1263
1264          my $enc = $csv->{ENCODING};
1265
1266         The encoding is used with "binmode" on $fh.
1267
1268         If the handle was opened in a (correct) encoding,  this method will
1269         not alter the encoding, as it checks the leading bytes of the first
1270         line. In case the stream starts with a decoded BOM ("U+FEFF"),
1271         "{ENCODING}" will be "" (empty) instead of the default "undef".
1272
1273       munge_column_names
1274         This option offers the means to modify the column names into
1275         something that is most useful to the application.   The default is to
1276         map all column names to lower case.
1277
1278          $csv->header ($fh, { munge_column_names => "lc" });
1279
1280         The following values are available:
1281
1282           lc     - lower case
1283           uc     - upper case
1284           db     - valid DB field names
1285           none   - do not change
1286           \%hash - supply a mapping
1287           \&cb   - supply a callback
1288
1289         Lower case
1290            $csv->header ($fh, { munge_column_names => "lc" });
1291
1292           The header is changed to all lower-case
1293
1294            $_ = lc;
1295
1296         Upper case
1297            $csv->header ($fh, { munge_column_names => "uc" });
1298
1299           The header is changed to all upper-case
1300
1301            $_ = uc;
1302
1303         Literal
1304            $csv->header ($fh, { munge_column_names => "none" });
1305
1306         Hash
1307            $csv->header ($fh, { munge_column_names => { foo => "sombrero" }});
1308
1309           If a value does not exist, the original value is used unchanged.
1310
1311         Database
1312            $csv->header ($fh, { munge_column_names => "db" });
1313
1314           - lower-case
1315
1316           - all sequences of non-word characters are replaced with an
1317             underscore
1318
1319           - all leading underscores are removed
1320
1321            $_ = lc (s/\W+/_/gr =~ s/^_+//r);
1322
1323         Callback
1324            $csv->header ($fh, { munge_column_names => sub { fc } });
1325            $csv->header ($fh, { munge_column_names => sub { "column_".$col++ } });
1326            $csv->header ($fh, { munge_column_names => sub { lc (s/\W+/_/gr) } });
1327
1328           As this callback is called in a "map", you can use $_ directly.
1329
1330       set_column_names
1331          $csv->header ($fh, { set_column_names => 1 });
1332
1333         The default is to set the instance's column names using
1334         "column_names" if the method is successful,  so subsequent calls to
1335         "getline_hr" can return a hash.  Setting the column names can be
1336         disabled by passing a false value for this option.
1337
1338         As described in "return value" above, content is lost in scalar
1339         context.
1340
1341       Validation
1342
1343       When receiving CSV files from external sources,  this method can be
1344       used to protect against changes in the layout by restricting to known
1345       headers  (and typos in the header fields).
1346
1347        my %known = (
1348            "record key" => "c_rec",
1349            "rec id"     => "c_rec",
1350            "id_rec"     => "c_rec",
1351            "kode"       => "code",
1352            "code"       => "code",
1353            "vaule"      => "value",
1354            "value"      => "value",
1355            );
1356        my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1 });
1357        open my $fh, "<", $source or die "$source: $!";
1358        $csv->header ($fh, { munge_column_names => sub {
1359            s/\s+$//;
1360            s/^\s+//;
1361            $known{lc $_} or die "Unknown column '$_' in $source";
1362            }});
1363        while (my $row = $csv->getline_hr ($fh)) {
1364            say join "\t", $row->{c_rec}, $row->{code}, $row->{value};
1365            }
1366
1367   bind_columns
1368       Takes a list of scalar references to be used for output with  "print"
1369       or to store in the fields fetched by "getline".  When you do not pass
1370       enough references to store the fetched fields in, "getline" will fail
1371       with error 3006.  If you pass more than there are fields to return,
1372       the content of the remaining references is left untouched.
1373
1374        $csv->bind_columns (\$code, \$name, \$price, \$description);
1375        while ($csv->getline ($fh)) {
1376            print "The price of a $name is \x{20ac} $price\n";
1377            }
1378
1379       To reset or clear all column binding, call "bind_columns" with the
1380       single argument "undef". This will also clear column names.
1381
1382        $csv->bind_columns (undef);
1383
1384       If no arguments are passed at all, "bind_columns" will return the list
1385       of current bindings or "undef" if no binds are active.
1386
1387       Note that in parsing with  "bind_columns",  the fields are set on the
1388       fly.  That implies that if the third field of a row causes an error
1389       (or this row has just two fields where the previous row had more),  the
1390       first two fields already have been assigned the values of the current
1391       row, while the rest of the fields will still hold the values of the
1392       previous row.  If you want the parser to fail in these cases, use the
1393       "strict" attribute.
1394
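           A minimal sketch of enabling that stricter behavior (assuming an
           already opened handle $fh):

            my $csv = Text::CSV_XS->new ({ binary => 1, strict => 1, auto_diag => 1 });
            while (my $row = $csv->getline ($fh)) {
                # a row with a deviating number of fields now causes an error
                }
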
1395   eof
1396        $eof = $csv->eof ();
1397
1398       If "parse" or  "getline"  was used with an IO stream,  this method will
1399       return true (1) if the last call hit end of file,  otherwise it will
1400       return false ('').  This is useful to see the difference between a
1401       failure and end of file.
1402
1403       Note that if the parsing of the last line caused an error,  "eof" is
1404       still true.  That means that if you are not using "auto_diag", an idiom
1405       like
1406
1407        while (my $row = $csv->getline ($fh)) {
1408            # ...
1409            }
1410        $csv->eof or $csv->error_diag;
1411
1412       will not report the error. You would have to change that to
1413
1414        while (my $row = $csv->getline ($fh)) {
1415            # ...
1416            }
1417        +$csv->error_diag and $csv->error_diag;
1418
1419   types
1420        $csv->types (\@tref);
1421
1422       This method is used to force that  (all)  columns are of a given type.
1423       For example, if you have an integer column,  two  columns  with
1424       doubles  and a string column, then you might do a
1425
1426        $csv->types ([Text::CSV_XS::IV (),
1427                      Text::CSV_XS::NV (),
1428                      Text::CSV_XS::NV (),
1429                      Text::CSV_XS::PV ()]);
1430
1431       Column types are used only for decoding columns while parsing,  in
1432       other words by the "parse" and "getline" methods.
1433
1434       You can unset column types by doing a
1435
1436        $csv->types (undef);
1437
1438       or fetch the current type settings with
1439
1440        $types = $csv->types ();
1441
1442       IV
1443       CSV_TYPE_IV
1444           Set field type to integer.
1445
1446       NV
1447       CSV_TYPE_NV
1448           Set field type to numeric/float.
1449
1450       PV
1451       CSV_TYPE_PV
1452           Set field type to string.
1453
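           Using the importable constants (see "IMPORTS/EXPORTS" below), the
           same type map could also be written as:

            use Text::CSV_XS qw( :CONSTANTS );
            $csv->types ([ CSV_TYPE_IV, CSV_TYPE_NV, CSV_TYPE_NV, CSV_TYPE_PV ]);
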
1454   fields
1455        @columns = $csv->fields ();
1456
1457       This method returns the input to   "combine"  or the resultant
1458       decomposed fields of a successful "parse", whichever was called more
1459       recently.
1460
1461       Note that the return value is undefined after using "getline", which
1462       does not fill the data structures returned by "parse".
1463
1464   meta_info
1465        @flags = $csv->meta_info ();
1466
1467       This method returns the "flags" of the input to "combine" or the flags
1468       of the resultant  decomposed fields of  "parse",   whichever was called
1469       more recently.
1470
1471       For each field,  a meta_info field will hold  flags that  inform
1472       something about  the  field  returned  by  the  "fields"  method or
1473       passed to  the "combine" method. The flags are bit-wise-"or"'d like:
1474
1475       0x0001
1476       "CSV_FLAGS_IS_QUOTED"
1477         The field was quoted.
1478
1479       0x0002
1480       "CSV_FLAGS_IS_BINARY"
1481         The field was binary.
1482
1483       0x0004
1484       "CSV_FLAGS_ERROR_IN_FIELD"
1485         The field was invalid.
1486
1487         Currently only used when "allow_loose_quotes" is active.
1488
1489       0x0010
1490       "CSV_FLAGS_IS_MISSING"
1491         The field was missing.
1492
1493       See the "is_***" methods below.
1494
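           A minimal sketch (the parsed string is just an illustration):

            my $csv = Text::CSV_XS->new ({ keep_meta_info => 1 });
            $csv->parse (q{1,"2",3});
            my @flags = $csv->meta_info; # flag of the 2nd field has IS_QUOTED set
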
1495   is_quoted
1496        my $quoted = $csv->is_quoted ($column_idx);
1497
1498       where  $column_idx is the  (zero-based)  index of the column in the
1499       last result of "parse".
1500
1501       This returns a true value  if the data in the indicated column was
1502       enclosed in "quote_char" quotes.  This might be important for fields
1503       where content ",20070108," is to be treated as a numeric value,  and
1504       where ","20070108"," is explicitly marked as character string data.
1505
1506       This method is only valid when "keep_meta_info" is set to a true value.
1507
1508   is_binary
1509        my $binary = $csv->is_binary ($column_idx);
1510
1511       where  $column_idx is the  (zero-based)  index of the column in the
1512       last result of "parse".
1513
1514       This returns a true value if the data in the indicated column contained
1515       any byte in the range "[\x00-\x08,\x10-\x1F,\x7F-\xFF]".
1516
1517       This method is only valid when "keep_meta_info" is set to a true value.
1518
1519   is_missing
1520        my $missing = $csv->is_missing ($column_idx);
1521
1522       where  $column_idx is the  (zero-based)  index of the column in the
1523       last result of "getline_hr".
1524
1525        $csv->keep_meta_info (1);
1526        while (my $hr = $csv->getline_hr ($fh)) {
1527            $csv->is_missing (0) and next; # This was an empty line
1528            }
1529
1530       When using  "getline_hr",  it is impossible to tell if the  parsed
1531       fields are "undef" because they were not filled in the "CSV" stream
1532       or because they were not read at all, as all the fields defined by
1533       "column_names" are set in the hash-ref.    If you still need to know if
1534       all fields in each row are provided, you should enable "keep_meta_info"
1535       so you can check the flags.
1536
1537       If  "keep_meta_info"  is "false",  "is_missing"  will always return
1538       "undef", regardless of $column_idx being valid or not. If this
1539       attribute is "true" it will return either 0 (the field is present) or 1
1540       (the field is missing).
1541
1542       A special case is the empty line.  If the line is completely empty -
1543       after dealing with the flags - this is still a valid CSV line:  it is a
1544       record of just one single empty field. However, if "keep_meta_info" is
1545       set, invoking "is_missing" with index 0 will now return true.
1546
1547   status
1548        $status = $csv->status ();
1549
1550       This method returns the status of the last invoked "combine" or "parse"
1551       call. Status is success (true: 1) or failure (false: "undef" or 0).
1552
1553       Note that as this only keeps track of the status of above mentioned
1554       methods, you are probably looking for "error_diag" instead.
1555
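           For example, a small sketch:

            $csv->combine ("a", "b;c", 42);
            $csv->status or warn "combine () failed: " . $csv->error_diag . "\n";
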
1556   error_input
1557        $bad_argument = $csv->error_input ();
1558
1559       This method returns the erroneous argument (if it exists) of "combine"
1560       or "parse",  whichever was called more recently.  If the last
1561       invocation was successful, "error_input" will return "undef".
1562
1563       Depending on the type of error, it might also hold the data for the
1564       last error-input of "getline".
1565
1566   error_diag
1567        Text::CSV_XS->error_diag ();
1568        $csv->error_diag ();
1569        $error_code               = 0  + $csv->error_diag ();
1570        $error_str                = "" . $csv->error_diag ();
1571        ($cde, $str, $pos, $rec, $fld) = $csv->error_diag ();
1572
1573       If (and only if) an error occurred,  this function returns  the
1574       diagnostics of that error.
1575
1576       If called in void context,  this will print the internal error code and
1577       the associated error message to STDERR.
1578
1579       If called in list context,  this will return  the error code  and the
1580       error message in that order.  If the last error was from parsing, the
1581       rest of the values returned are a best guess at the location  within
1582       the line  that was being parsed. Their values are 1-based.  The
1583       position currently is the index of the byte at which the parsing
1584       failed in the current record. It might change to be the index of the
1585       current character in a later release. The record is the index of the
1586       record parsed by the csv instance. The field number is the index of
1587       the field the parser thinks it is currently trying to parse. See
1588       examples/csv-check for how this can be used.
1589
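           For example, a small sketch of reporting a parse failure using the
           list form:

            my ($err, $msg, $pos, $recno, $fldno) = $csv->error_diag;
            $err and warn "$err ($msg) at record $recno, field $fldno, pos $pos\n";
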
1590       If called in  scalar context,  it will return  the diagnostics  in a
1591       single scalar, a-la $!.  It will contain the error code in numeric
1592       context, and the diagnostics message in string context.
1593
1594       When called as a class method or a  direct function call,  the
1595       diagnostics are that of the last "new" call.
1596
1597   record_number
1598        $recno = $csv->record_number ();
1599
1600       Returns the number of records parsed by this csv instance.  This
1601       value should be more accurate than $. when embedded newlines come
1602       into play. Records written by this instance are not counted.
1603
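           For example (assuming an already opened handle $fh):

            while (my $row = $csv->getline ($fh)) {
                printf "%6d: %s\n", $csv->record_number, $row->[0];
                }
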
1604   SetDiag
1605        $csv->SetDiag (0);
1606
1607       Use this method to reset the diagnostics if you are dealing with errors.
1608

IMPORTS/EXPORTS

1610       By default none of these are exported.
1611
1612       csv
1613          use Text::CSV_XS qw( csv );
1614
1615         Import the "csv" function. See below.
1616
1617       :CONSTANTS
1618          use Text::CSV_XS qw( :CONSTANTS );
1619
1620         Import module constants  "CSV_FLAGS_IS_QUOTED",
1621         "CSV_FLAGS_IS_BINARY", "CSV_FLAGS_ERROR_IN_FIELD",
1622         "CSV_FLAGS_IS_MISSING",   "CSV_TYPE_PV", "CSV_TYPE_IV", and
1623         "CSV_TYPE_NV". Each can be imported alone
1624
1625          use Text::CSV_XS qw( CSV_FLAGS_IS_BINARY CSV_TYPE_NV );
1626

FUNCTIONS

1628   csv
1629       This function is not exported by default and should be explicitly
1630       requested:
1631
1632        use Text::CSV_XS qw( csv );
1633
1634       This is a high-level function that aims at simple (user) interfaces.
1635       This can be used to read/parse a "CSV" file or stream (the default
1636       behavior) or to produce a file or write to a stream (define the  "out"
1637       attribute).  It returns an array- or hash-reference on parsing (or
1638       "undef" on fail) or the numeric value of  "error_diag"  on writing.
1639       When this function fails you can get to the error using the class call
1640       to "error_diag"
1641
1642        my $aoa = csv (in => "test.csv") or
1643            die Text::CSV_XS->error_diag;
1644
1645       This function takes the arguments as key-value pairs. This can be
1646       passed as a list or as an anonymous hash:
1647
1648        my $aoa = csv (  in => "test.csv", sep_char => ";");
1649        my $aoh = csv ({ in => $fh, headers => "auto" });
1650
1651       The arguments passed consist of two parts:  the arguments to "csv"
1652       itself and the optional attributes to the  "CSV"  object used inside
1653       the function as enumerated and explained in "new".
1654
1655       If not overridden, the default option used for CSV is
1656
1657        auto_diag   => 1
1658        escape_null => 0
1659
1660       The option that is always set and cannot be altered is
1661
1662        binary      => 1
1663
1664       As this function will likely be used in one-liners,  it allows  "quote"
1665       to be abbreviated as "quo",  and  "escape_char" to be abbreviated as
1666       "esc" or "escape".
1667
1668       Alternative invocations:
1669
1670        my $aoa = Text::CSV_XS::csv (in => "file.csv");
1671
1672        my $csv = Text::CSV_XS->new ();
1673        my $aoa = $csv->csv (in => "file.csv");
1674
1675       In the latter case, the object attributes are used from the existing
1676       object and the attribute arguments in the function call are ignored:
1677
1678        my $csv = Text::CSV_XS->new ({ sep_char => ";" });
1679        my $aoh = $csv->csv (in => "file.csv", bom => 1);
1680
1681       will parse using ";" as "sep_char", not ",".
1682
1683       in
1684
1685       Used to specify the source.  "in" can be a file name (e.g. "file.csv"),
1686       which will be  opened for reading  and closed when finished,  a file
1687       handle (e.g.  $fh or "FH"),  a reference to a glob (e.g. "\*ARGV"),
1688       the glob itself (e.g. *STDIN), or a reference to a scalar (e.g.
1689       "\q{1,2,"csv"}").
1690
1691       When used with "out", "in" should be a reference to a CSV structure
1692       (AoA or AoH)  or a CODE-ref that returns an array-reference or a hash-
1693       reference.  The code-ref will be invoked with no arguments.
1694
1695        my $aoa = csv (in => "file.csv");
1696
1697        open my $fh, "<", "file.csv";
1698        my $aoa = csv (in => $fh);
1699
1700        my $csv = [ [qw( Foo Bar )], [ 1, 2 ], [ 2, 3 ]];
1701        my $err = csv (in => $csv, out => "file.csv");
1702
1703       If called in void context without the "out" attribute, the resulting
1704       ref will be used as input to a subsequent call to csv:
1705
1706        csv (in => "file.csv", filter => { 2 => sub { length > 2 }})
1707
1708       will be a shortcut to
1709
1710        csv (in => csv (in => "file.csv", filter => { 2 => sub { length > 2 }}))
1711
1712       where, in the absence of the "out" attribute, this is a shortcut to
1713
1714        csv (in  => csv (in => "file.csv", filter => { 2 => sub { length > 2 }}),
1715             out => *STDOUT)
1716
1717       out
1718
1719        csv (in => $aoa, out => "file.csv");
1720        csv (in => $aoa, out => $fh);
1721        csv (in => $aoa, out =>   STDOUT);
1722        csv (in => $aoa, out =>  *STDOUT);
1723        csv (in => $aoa, out => \*STDOUT);
1724        csv (in => $aoa, out => \my $data);
1725        csv (in => $aoa, out =>  undef);
1726        csv (in => $aoa, out => \"skip");
1727
1728        csv (in => $fh,  out => \@aoa);
1729        csv (in => $fh,  out => \@aoh, bom => 1);
1730        csv (in => $fh,  out => \%hsh, key => "key");
1731
1732       In output mode, the default CSV options when producing CSV are
1733
1734        eol       => "\r\n"
1735
1736       The "fragment" attribute is ignored in output mode.
1737
1738       "out" can be a file name  (e.g.  "file.csv"),  which will be opened for
1739       writing and closed when finished,  a file handle (e.g. $fh or "FH"),  a
1740       reference to a glob (e.g. "\*STDOUT"),  the glob itself (e.g. *STDOUT),
1741       or a reference to a scalar (e.g. "\my $data").
1742
1743        csv (in => sub { $sth->fetch },            out => "dump.csv");
1744        csv (in => sub { $sth->fetchrow_hashref }, out => "dump.csv",
1745             headers => $sth->{NAME_lc});
1746
1747       When a code-ref is used for "in", the output is generated  per
1748       invocation, so no buffering is involved. This implies that there is no
1749       size restriction on the number of records. The "csv" function ends when
1750       the coderef returns a false value.
1751
1752       If "out" is set to a reference of the literal string "skip", the output
1753       will be suppressed completely,  which might be useful in combination
1754       with a filter for side effects only.
1755
1756        my %cache;
1757        csv (in    => "dump.csv",
1758             out   => \"skip",
1759             on_in => sub { $cache{$_[1][1]}++ });
1760
1761       Currently,  setting "out" to any false value  ("undef", "", 0) will be
1762       equivalent to "\"skip"".
1763
1764       If the "in" argument point to something to parse, and the "out" is set
1765       to a reference to an "ARRAY" or a "HASH", the output is appended to the
1766       data in the existing reference. The result of the parse should match
1767       what exists in the reference passed. This might come in handy when you
1768       have to parse a set of files with similar content (like data stored per
1769       period) and you want to collect that into a single data structure:
1770
1771        my %hash;
1772        csv (in => $_, out => \%hash, key => "id") for sort glob "foo-[0-9]*.csv";
1773
1774        my @list; # List of arrays
1775        csv (in => $_, out => \@list)              for sort glob "foo-[0-9]*.csv";
1776
1777        my @list; # List of hashes
1778        csv (in => $_, out => \@list, bom => 1)    for sort glob "foo-[0-9]*.csv";
1779
1780       encoding
1781
1782       If passed,  it should be an encoding accepted by the  ":encoding()"
1783       option to "open". There is no default value. This attribute does not
1784       work in perl 5.6.x.  "encoding" can be abbreviated to "enc" for ease of
1785       use in command line invocations.
1786
1787       If "encoding" is set to the literal value "auto", the method "header"
1788       will be invoked on the opened stream to check if there is a BOM and set
1789       the encoding accordingly.   This is equal to passing a true value in
1790       the option "detect_bom".
1791
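           For example (a sketch; "file.csv" is a placeholder name):

            my $aoh = csv (in => "file.csv", encoding => "utf-8", headers => "auto");
            my $aoa = csv (in => "file.csv", enc      => "iso-8859-1");
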
1792       Encodings can be stacked, as supported by "binmode":
1793
1794        # Using PerlIO::via::gzip
1795        csv (in       => \@csv,
1796             out      => "test.csv:via.gz",
1797             encoding => ":via(gzip):encoding(utf-8)",
1798             );
1799        $aoa = csv (in => "test.csv:via.gz",  encoding => ":via(gzip)");
1800
1801        # Using PerlIO::gzip
1802        csv (in       => \@csv,
1803             out      => "test.csv:via.gz",
1804             encoding => ":gzip:encoding(utf-8)",
1805             );
1806        $aoa = csv (in => "test.csv:gzip.gz", encoding => ":gzip");
1807
1808       detect_bom
1809
1810       If  "detect_bom"  is given, the method  "header"  will be invoked on
1811       the opened stream to check if there is a BOM and set the encoding
1812       accordingly.
1813
1814       "detect_bom" can be abbreviated to "bom".
1815
1816       This is the same as setting "encoding" to "auto".
1817
1818       Note that as the method  "header" is invoked,  its default is to also
1819       set the headers.
1820
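           A short sketch ("file.csv" is a placeholder name):

            my $aoh = csv (in => "file.csv", detect_bom => 1);
            my $aoh = csv (in => "file.csv", bom        => 1); # same, abbreviated
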
1821       headers
1822
1823       If this attribute is not given, the default behavior is to produce an
1824       array of arrays.
1825
1826       If "headers" is supplied,  it should be an anonymous list of column
1827       names, an anonymous hashref, a coderef, or a literal flag:  "auto",
1828       "lc", "uc", or "skip".
1829
1830       skip
1831         When "skip" is used, the header will not be included in the output.
1832
1833          my $aoa = csv (in => $fh, headers => "skip");
1834
1835       auto
1836         If "auto" is used, the first line of the "CSV" source will be read as
1837         the list of field headers and used to produce an array of hashes.
1838
1839          my $aoh = csv (in => $fh, headers => "auto");
1840
1841       lc
1842         If "lc" is used,  the first line of the  "CSV" source will be read as
1843         the list of field headers mapped to  lower case and used to produce
1844         an array of hashes. This is a variation of "auto".
1845
1846          my $aoh = csv (in => $fh, headers => "lc");
1847
1848       uc
1849         If "uc" is used,  the first line of the  "CSV" source will be read as
1850         the list of field headers mapped to  upper case and used to produce
1851         an array of hashes. This is a variation of "auto".
1852
1853          my $aoh = csv (in => $fh, headers => "uc");
1854
1855       CODE
1856         If a coderef is used,  the first line of the  "CSV" source will be
1857         read as the list of mangled field headers in which each field is
1858         passed as the only argument to the coderef. This list is used to
1859         produce an array of hashes.
1860
1861          my $aoh = csv (in      => $fh,
1862                         headers => sub { lc ($_[0]) =~ s/kode/code/gr });
1863
1864         this example is a variation of using "lc" where all occurrences of
1865         "kode" are replaced with "code".
1866
1867       ARRAY
1868         If  "headers"  is an anonymous list,  the entries in the list will be
1869         used as field names. The first line is considered data instead of
1870         headers.
1871
1872          my $aoh = csv (in => $fh, headers => [qw( Foo Bar )]);
1873          csv (in => $aoa, out => $fh, headers => [qw( code description price )]);
1874
1875       HASH
1876         If "headers" is a hash reference, this implies "auto", but header
1877         fields that exist as a key in the hashref will be replaced by
1878         the value for that key. Given a CSV file like
1879
1880          post-kode,city,name,id number,fubble
1881          1234AA,Duckstad,Donald,13,"X313DF"
1882
1883         using
1884
1885          csv (headers => { "post-kode" => "pc", "id number" => "ID" }, ...
1886
1887         will return an entry like
1888
1889          { pc     => "1234AA",
1890            city   => "Duckstad",
1891            name   => "Donald",
1892            ID     => "13",
1893            fubble => "X313DF",
1894            }
1895
1896       See also "munge_column_names" and "set_column_names".
1897
1898       munge_column_names
1899
1900       If "munge_column_names" is set,  the method  "header"  is invoked on
1901       the opened stream with all matching arguments to detect and set the
1902       headers.
1903
1904       "munge_column_names" can be abbreviated to "munge".
1905
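           A short sketch, once with a literal value and once with a callback
           ("file.csv" is a placeholder name):

            my $aoh = csv (in => "file.csv", headers => "auto",
                           munge_column_names => "lc");
            my $aoh = csv (in => "file.csv", bom => 1,
                           munge => sub { s/\W+/_/gr });
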
1906       key
1907
1908       If passed,  will default  "headers"  to "auto" and return a hashref
1909       instead of an array of hashes. Allowed values are simple scalars or
1910       array-references where the first element is the joiner and the rest are
1911       the fields to join to combine the key.
1912
1913        my $ref = csv (in => "test.csv", key => "code");
1914        my $ref = csv (in => "test.csv", key => [ ":" => "code", "color" ]);
1915
1916       with test.csv like
1917
1918        code,product,price,color
1919        1,pc,850,gray
1920        2,keyboard,12,white
1921        3,mouse,5,black
1922
1923       the first example will return
1924
1925         { 1   => {
1926               code    => 1,
1927               color   => 'gray',
1928               price   => 850,
1929               product => 'pc'
1930               },
1931           2   => {
1932               code    => 2,
1933               color   => 'white',
1934               price   => 12,
1935               product => 'keyboard'
1936               },
1937           3   => {
1938               code    => 3,
1939               color   => 'black',
1940               price   => 5,
1941               product => 'mouse'
1942               }
1943           }
1944
1945       the second example will return
1946
1947         { "1:gray"    => {
1948               code    => 1,
1949               color   => 'gray',
1950               price   => 850,
1951               product => 'pc'
1952               },
1953           "2:white"   => {
1954               code    => 2,
1955               color   => 'white',
1956               price   => 12,
1957               product => 'keyboard'
1958               },
1959           "3:black"   => {
1960               code    => 3,
1961               color   => 'black',
1962               price   => 5,
1963               product => 'mouse'
1964               }
1965           }
1966
1967       The "key" attribute can be combined with "headers" for "CSV" date that
1968       has no header line, like
1969
1970        my $ref = csv (
1971            in      => "foo.csv",
1972            headers => [qw( c_foo foo bar description stock )],
1973            key     =>     "c_foo",
1974            );
1975
1976       value
1977
1978       Used to create key-value hashes.
1979
1980       Only allowed when "key" is valid. A "value" can be either a single
1981       column label or an anonymous list of column labels.  In the first case,
1982       the value will be a simple scalar value, in the latter case, it will be
1983       a hashref.
1984
1985        my $ref = csv (in => "test.csv", key   => "code",
1986                                         value => "price");
1987        my $ref = csv (in => "test.csv", key   => "code",
1988                                         value => [ "product", "price" ]);
1989        my $ref = csv (in => "test.csv", key   => [ ":" => "code", "color" ],
1990                                         value => "price");
1991        my $ref = csv (in => "test.csv", key   => [ ":" => "code", "color" ],
1992                                         value => [ "product", "price" ]);
1993
1994       with test.csv like
1995
1996        code,product,price,color
1997        1,pc,850,gray
1998        2,keyboard,12,white
1999        3,mouse,5,black
2000
2001       the first example will return
2002
2003         { 1 => 850,
2004           2 =>  12,
2005           3 =>   5,
2006           }
2007
2008       the second example will return
2009
2010         { 1   => {
2011               price   => 850,
2012               product => 'pc'
2013               },
2014           2   => {
2015               price   => 12,
2016               product => 'keyboard'
2017               },
2018           3   => {
2019               price   => 5,
2020               product => 'mouse'
2021               }
2022           }
2023
2024       the third example will return
2025
2026         { "1:gray"    => 850,
2027           "2:white"   =>  12,
2028           "3:black"   =>   5,
2029           }
2030
2031       the fourth example will return
2032
2033         { "1:gray"    => {
2034               price   => 850,
2035               product => 'pc'
2036               },
2037           "2:white"   => {
2038               price   => 12,
2039               product => 'keyboard'
2040               },
2041           "3:black"   => {
2042               price   => 5,
2043               product => 'mouse'
2044               }
2045           }
2046
2047       keep_headers
2048
2049       When using hashes,  keep the column names in the arrayref passed,  so
2050       all headers are available after the call in their original order.
2051
2052        my $aoh = csv (in => "file.csv", keep_headers => \my @hdr);
2053
2054       This attribute can be abbreviated to "kh" or passed as
2055       "keep_column_names".
2056
2057       This attribute implies a default of "auto" for the "headers" attribute.
2058
2059       The headers can also be kept internally to keep stable header order:
2060
2061        csv (in      => csv (in => "file.csv", kh => "internal"),
2062             out     => "new.csv",
2063             kh      => "internal");
2064
2065       where "internal" can also be 1, "yes", or "true". This is similar to
2066
2067        my @h;
2068        csv (in      => csv (in => "file.csv", kh => \@h),
2069             out     => "new.csv",
2070             headers => \@h);
2071
2072       fragment
2073
2074       Only output the fragment as defined in the "fragment" method. This
2075       option is ignored when generating "CSV". See "out".
2076
2077       Combining all of them could give something like
2078
2079        use Text::CSV_XS qw( csv );
2080        my $aoh = csv (
2081            in       => "test.txt",
2082            encoding => "utf-8",
2083            headers  => "auto",
2084            sep_char => "|",
2085            fragment => "row=3;6-9;15-*",
2086            );
2087        say $aoh->[15]{Foo};
2088
2089       sep_set
2090
2091       If "sep_set" is set, the method "header" is invoked on the opened
2092       stream to detect and set "sep_char" with the given set.
2093
2094       "sep_set" can be abbreviated to "seps".
2095
2096       Note that as the  "header" method is invoked,  its default is to also
2097       set the headers.
2098
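           A short sketch, letting "header" pick the separator from a given
           set ("file.csv" is a placeholder name):

            my $data = csv (in => "file.csv", sep_set => [ ";", ",", "\t", "|" ]);
            my $data = csv (in => "file.csv", seps    => [ ";", "," ]);
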
2099       set_column_names
2100
2101       If  "set_column_names" is passed,  the method "header" is invoked on
2102       the opened stream with all arguments meant for "header".
2103
2104       If "set_column_names" is passed as a false value, the content of the
2105       first row is only preserved if the output is AoA:
2106
2107       With an input-file like
2108
2109        bAr,foo
2110        1,2
2111        3,4,5
2112
2113       This call
2114
2115        my $aoa = csv (in => $file, set_column_names => 0);
2116
2117       will result in
2118
2119        [[ "bar", "foo"     ],
2120         [ "1",   "2"       ],
2121         [ "3",   "4",  "5" ]]
2122
2123       and
2124
2125        my $aoa = csv (in => $file, set_column_names => 0, munge => "none");
2126
2127       will result in
2128
2129        [[ "bAr", "foo"     ],
2130         [ "1",   "2"       ],
2131         [ "3",   "4",  "5" ]]
2132
2133   Callbacks
2134       Callbacks enable actions triggered from the inside of Text::CSV_XS.
2135
2136       While most of what this enables  can easily be done in an  unrolled
2137       loop as described in the "SYNOPSIS",  callbacks can be used to meet
2138       special demands or enhance the "csv" function.
2139
2140       error
2141          $csv->callbacks (error => sub { $csv->SetDiag (0) });
2142
2143         the "error"  callback is invoked when an error occurs,  but  only
2144         when "auto_diag" is set to a true value. A callback is invoked with
2145         the values returned by "error_diag":
2146
2147          my ($c, $s);
2148
2149          sub ignore3006 {
2150              my ($err, $msg, $pos, $recno, $fldno) = @_;
2151              if ($err == 3006) {
2152                  # ignore this error
2153                  ($c, $s) = (undef, undef);
2154                  Text::CSV_XS->SetDiag (0);
2155                  }
2156              # Any other error
2157              return;
2158              } # ignore3006
2159
2160          $csv->callbacks (error => \&ignore3006);
2161          $csv->bind_columns (\$c, \$s);
2162          while ($csv->getline ($fh)) {
2163              # Error 3006 will not stop the loop
2164              }
2165
2166       after_parse
2167          $csv->callbacks (after_parse => sub { push @{$_[1]}, "NEW" });
2168          while (my $row = $csv->getline ($fh)) {
2169              $row->[-1] eq "NEW";
2170              }
2171
2172         This callback is invoked after parsing with  "getline"  only if no
2173         error occurred.  The callback is invoked with two arguments:   the
2174         current "CSV" parser object and an array reference to the fields
2175         parsed.
2176
2177         The return code of the callback is ignored  unless it is a reference
2178         to the string "skip", in which case the record will be skipped in
2179         "getline_all".
2180
2181          sub add_from_db {
2182              my ($csv, $row) = @_;
2183              $sth->execute ($row->[4]);
2184              push @$row, $sth->fetchrow_array;
2185              } # add_from_db
2186
2187          my $aoa = csv (in => "file.csv", callbacks => {
2188              after_parse => \&add_from_db });
2189
2190         This hook can be used for validation:
2191
2192         FAIL
2193           Die if any of the records does not validate a rule:
2194
2195            after_parse => sub {
2196                $_[1][4] =~ m/^[0-9]{4}\s?[A-Z]{2}$/ or
2197                    die "5th field does not have a valid Dutch zipcode";
2198                }
2199
2200         DEFAULT
2201           Replace invalid fields with a default value:
2202
2203            after_parse => sub { $_[1][2] =~ m/^\d+$/ or $_[1][2] = 0 }
2204
2205         SKIP
2206           Skip records that have invalid fields (only applies to
2207           "getline_all"):
2208
2209            after_parse => sub { $_[1][0] =~ m/^\d+$/ or return \"skip"; }
2210
2211       before_print
2212          my $idx = 1;
2213          $csv->callbacks (before_print => sub { $_[1][0] = $idx++ });
2214          $csv->print (*STDOUT, [ 0, $_ ]) for @members;
2215
2216         This callback is invoked  before printing with  "print"  only if no
2217         error occurred.  The callback is invoked with two arguments:  the
2218         current  "CSV" parser object and an array reference to the fields
2219         passed.
2220
2221         The return code of the callback is ignored.
2222
2223          sub max_4_fields {
2224              my ($csv, $row) = @_;
2225              @$row > 4 and splice @$row, 4;
2226              } # max_4_fields
2227
2228          csv (in => csv (in => "file.csv"), out => *STDOUT,
2229              callbacks => { before_print => \&max_4_fields });
2230
2231         This callback is not active for "combine".
2232
2233       Callbacks for csv ()
2234
2235       The "csv" allows for some callbacks that do not integrate in XS
2236       internals but only feature the "csv" function.
2237
2238         csv (in        => "file.csv",
2239              callbacks => {
2240                  filter       => { 6 => sub { $_ > 15 } },    # first
2241                  after_parse  => sub { say "AFTER PARSE";  }, # first
2242                  after_in     => sub { say "AFTER IN";     }, # second
2243                  on_in        => sub { say "ON IN";        }, # third
2244                  },
2245              );
2246
2247         csv (in        => $aoh,
2248              out       => "file.csv",
2249              callbacks => {
2250                  on_in        => sub { say "ON IN";        }, # first
2251                  before_out   => sub { say "BEFORE OUT";   }, # second
2252                  before_print => sub { say "BEFORE PRINT"; }, # third
2253                  },
2254              );
2255
2256       filter
2257         This callback can be used to filter records.  It is called just after
2258         a new record has been scanned.  The callback accepts a:
2259
2260         hashref
2261           The keys are the index to the row (the field name or field number,
2262           1-based) and the values are subs to return a true or false value.
2263
2264            csv (in => "file.csv", filter => {
2265                       3 => sub { m/a/ },       # third field should contain an "a"
2266                       5 => sub { length > 4 }, # length of the 5th field minimal 5
2267                       });
2268
2269            csv (in => "file.csv", filter => { foo => sub { $_ > 4 }});
2270
2271           If the keys to the filter hash contain any character that is not a
2272           digit it will also implicitly set "headers" to "auto"  unless
2273           "headers"  was already passed as argument.  When headers are
2274           active, returning an array of hashes, the filter is not applicable
2275           to the header itself.
2276
2277           All sub results should match, as in AND.
2278
2279           The context of the callback sets  $_ localized to the field
2280           indicated by the filter. The two arguments are as with all other
2281           callbacks, so the other fields in the current row can be seen:
2282
2283            filter => { 3 => sub { $_ > 100 ? $_[1][1] =~ m/A/ : $_[1][6] =~ m/B/ }}
2284
2285           If the context is set to return a list of hashes  ("headers" is
2286           defined), the current record will also be available in the
2287           localized %_:
2288
2289            filter => { 3 => sub { $_ > 100 && $_{foo} =~ m/A/ && $_{bar} < 1000  }}
2290
2291           If the filter is used to alter the content by changing $_,  make
2292           sure that the sub returns true in order not to have that record
2293           skipped:
2294
2295            filter => { 2 => sub { $_ = uc }}
2296
2297           will upper-case the second field, and then skip it if the resulting
2298           content evaluates to false. To always accept, end with truth:
2299
2300            filter => { 2 => sub { $_ = uc; 1 }}
2301
2302         coderef
2303            csv (in => "file.csv", filter => sub { $n++; 0; });
2304
2305           If the argument to "filter" is a coderef,  it is an alias or
2306           shortcut to a filter on column 0:
2307
2308            csv (filter => sub { $n++; 0 });
2309
2310           is equal to
2311
2312            csv (filter => { 0 => sub { $n++; 0 });
2313
2314         filter-name
2315            csv (in => "file.csv", filter => "not_blank");
2316            csv (in => "file.csv", filter => "not_empty");
2317            csv (in => "file.csv", filter => "filled");
2318
2319           These are predefined filters.
2320
2321           Given a file like (line numbers prefixed for doc purpose only):
2322
2323            1:1,2,3
2324            2:
2325            3:,
2326            4:""
2327            5:,,
2328            6:, ,
2329            7:"",
2330            8:" "
2331            9:4,5,6
2332
2333           not_blank
2334             Filter out the blank lines
2335
2336             This filter is a shortcut for
2337
2338              filter => { 0 => sub { @{$_[1]} > 1 or
2339                          defined $_[1][0] && $_[1][0] ne "" } }
2340
2341             Due to the implementation,  it is currently impossible to also
2342             filter lines that consist only of a quoted empty field. These
2343             lines are also considered blank lines.
2344
2345             With the given example, lines 2 and 4 will be skipped.
2346
2347           not_empty
2348             Filter out lines where all the fields are empty.
2349
2350             This filter is a shortcut for
2351
2352              filter => { 0 => sub { grep { defined && $_ ne "" } @{$_[1]} } }
2353
2354             A space is not regarded as being empty, so given the example data,
2355             lines 2, 3, 4, 5, and 7 are skipped.
2356
2357           filled
2358             Filter out lines that have no visible data
2359
2360             This filter is a shortcut for
2361
2362              filter => { 0 => sub { grep { defined && m/\S/ } @{$_[1]} } }
2363
2364             This filter rejects all lines that do not have at least one
2365             field with visible (non-whitespace) content.
2366
2367             With the given example data, this filter would skip lines 2
2368             through 8.
2369
2370         One could also use modules like Types::Standard:
2371
2372          use Types::Standard -types;
2373
2374          my $type   = Tuple[Str, Str, Int, Bool, Optional[Num]];
2375          my $check  = $type->compiled_check;
2376
2377          # filter with compiled check and warnings
2378          my $aoa = csv (
2379             in     => \$data,
2380             filter => {
2381                 0 => sub {
2382                     my $ok = $check->($_[1]) or
2383                         warn $type->get_message ($_[1]), "\n";
2384                     return $ok;
2385                     },
2386                 },
2387             );
2388
2389       after_in
2390         This callback is invoked for each record after all records have been
2391         parsed but before returning the reference to the caller.  The hook is
2392         invoked with two arguments:  the current  "CSV"  parser object  and a
2393         reference to the record.   The reference can be a reference to a
2394         HASH  or a reference to an ARRAY as determined by the arguments.
2395
2396         This callback can also be passed as  an attribute without the
2397         "callbacks" wrapper.
2398
2399       before_out
2400         This callback is invoked for each record before the record is
2401         printed.  The hook is invoked with two arguments:  the current "CSV"
2402         parser object and a reference to the record.   The reference can be a
2403         reference to a  HASH or a reference to an ARRAY as determined by the
2404         arguments.
2405
2406         This callback can also be passed as an attribute  without the
2407         "callbacks" wrapper.
2408
2409         This callback makes the row available in %_ if the row is a hashref.
2410         In this case %_ is writable and will change the original row.
2411
2412       on_in
2413         This callback acts exactly as the "after_in" or the "before_out"
2414         hooks.
2415
2416         This callback can also be passed as an attribute  without the
2417         "callbacks" wrapper.
2418
2419         This callback makes the row available in %_ if the row is a hashref.
2420         In this case %_ is writable and will change the original row. So e.g.
2421         with
2422
2423           my $aoh = csv (
2424               in      => \"foo\n1\n2\n",
2425               headers => "auto",
2426               on_in   => sub { $_{bar} = 2; },
2427               );
2428
2429         $aoh will be:
2430
2431           [ { foo => 1,
2432               bar => 2,
2433               }
2434             { foo => 2,
2435               bar => 2,
2436               }
2437             ]
2438
2439       csv
2440         The function  "csv" can also be called as a method or with an
2441         existing Text::CSV_XS object.  This could help if the function is
2442         to be invoked many times,  avoiding the overhead of creating the
2443         object internally over and over again by passing in an existing
2444         instance.
2445
2446          my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1 });
2447
2448          my $aoa = $csv->csv (in => $fh);
2449          my $aoa = csv (in => $fh, csv => $csv);
2450
2451         both act the same. Running this 20000 times on a 20-line CSV file
2452         showed a 53% speedup.
2453

INTERNALS

2455       Combine (...)
2456       Parse (...)
2457
2458       The arguments to these internal functions are deliberately not
2459       described or documented in order to enable the module authors to make
2460       changes when they feel the need for it.  Using them is  highly
2461       discouraged  as  the  API may change in future releases.
2462

EXAMPLES

2464   Reading a CSV file line by line:
2465        my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1 });
2466        open my $fh, "<", "file.csv" or die "file.csv: $!";
2467        while (my $row = $csv->getline ($fh)) {
2468            # do something with @$row
2469            }
2470        close $fh or die "file.csv: $!";
2471
2472       or
2473
2474        my $aoa = csv (in => "file.csv", on_in => sub {
2475            # do something with %_
2476            });
2477
2478       Reading only a single column
2479
2480        my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1 });
2481        open my $fh, "<", "file.csv" or die "file.csv: $!";
2482        # get only the 4th column
2483        my @column = map { $_->[3] } @{$csv->getline_all ($fh)};
2484        close $fh or die "file.csv: $!";
2485
2486       with "csv", you could do
2487
2488        my @column = map { $_->[0] }
2489            @{csv (in => "file.csv", fragment => "col=4")};
2490
2491   Parsing CSV strings:
2492        my $csv = Text::CSV_XS->new ({ keep_meta_info => 1, binary => 1 });
2493
2494        my $sample_input_string =
2495            qq{"I said, ""Hi!""",Yes,"",2.34,,"1.09","\x{20ac}",};
2496        if ($csv->parse ($sample_input_string)) {
2497            my @field = $csv->fields;
2498            foreach my $col (0 .. $#field) {
2499                my $quo = $csv->is_quoted ($col) ? $csv->{quote_char} : "";
2500                printf "%2d: %s%s%s\n", $col, $quo, $field[$col], $quo;
2501                }
2502            }
2503        else {
2504            print STDERR "parse () failed on argument: ",
2505                $csv->error_input, "\n";
2506            $csv->error_diag ();
2507            }
2508
2509       Parsing CSV from memory
2510
2511       Given a complete CSV data-set in scalar $data,  generate a list of
2512       lists to represent the rows and fields
2513
2514        # The data
2515        my $data = join "\r\n" => map { join "," => 0 .. 5 } 0 .. 5;
2516
2517        # in a loop
2518        my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1 });
2519        open my $fh, "<", \$data;
2520        my @foo;
2521        while (my $row = $csv->getline ($fh)) {
2522            push @foo, $row;
2523            }
2524        close $fh;
2525
2526        # a single call
2527        my $foo = csv (in => \$data);
2528
2529   Printing CSV data
2530       The fast way: using "print"
2531
2532       An example for creating "CSV" files using the "print" method:
2533
2534        my $csv = Text::CSV_XS->new ({ binary => 1, eol => $/ });
2535        open my $fh, ">", "foo.csv" or die "foo.csv: $!";
2536        for (1 .. 10) {
2537            $csv->print ($fh, [ $_, "$_" ]) or $csv->error_diag;
2538            }
2539        close $fh or die "$tbl.csv: $!";
2540
2541       The slow way: using "combine" and "string"
2542
2543       or using the slower "combine" and "string" methods:
2544
2545        my $csv = Text::CSV_XS->new;
2546
2547        open my $csv_fh, ">", "hello.csv" or die "hello.csv: $!";
2548
2549        my @sample_input_fields = (
2550            'You said, "Hello!"',   5.67,
2551            '"Surely"',   '',   '3.14159');
2552        if ($csv->combine (@sample_input_fields)) {
2553            print $csv_fh $csv->string, "\n";
2554            }
2555        else {
2556            print "combine () failed on argument: ",
2557                $csv->error_input, "\n";
2558            }
2559        close $csv_fh or die "hello.csv: $!";
2560
2561       Generating CSV into memory
2562
2563       Format a data-set (@foo) into a scalar value in memory ($data):
2564
2565        # The data
2566        my @foo = map { [ 0 .. 5 ] } 0 .. 3;
2567
2568        # in a loop
2569        my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1, eol => "\r\n" });
2570        open my $fh, ">", \my $data;
2571        $csv->print ($fh, $_) for @foo;
2572        close $fh;
2573
2574        # a single call
2575        csv (in => \@foo, out => \my $data);
2576
2577   Rewriting CSV
2578       Rewrite "CSV" files with ";" as separator character to well-formed
2579       "CSV":
2580
2581        use Text::CSV_XS qw( csv );
2582        csv (in => csv (in => "bad.csv", sep_char => ";"), out => *STDOUT);
2583
2584       As "STDOUT" is now default in "csv", a one-liner converting a UTF-16
2585       CSV file with BOM and TAB-separation to valid UTF-8 CSV could be:
2586
2587        $ perl -C3 -MText::CSV_XS=csv -we\
2588           'csv(in=>"utf16tab.csv",encoding=>"utf16",sep=>"\t")' >utf8.csv
2589
2590   Dumping database tables to CSV
2591       Dumping a database table can be as simple as this (TIMTOWTDI):
2592
2593        my $dbh = DBI->connect (...);
2594        my $sql = "select * from foo";
2595
2596        # using your own loop
2597        open my $fh, ">", "foo.csv" or die "foo.csv: $!\n";
2598        my $csv = Text::CSV_XS->new ({ binary => 1, eol => "\r\n" });
2599        my $sth = $dbh->prepare ($sql); $sth->execute;
2600        $csv->print ($fh, $sth->{NAME_lc});
2601        while (my $row = $sth->fetch) {
2602            $csv->print ($fh, $row);
2603            }
2604
2605        # using the csv function, all in memory
2606        csv (out => "foo.csv", in => $dbh->selectall_arrayref ($sql));
2607
2608        # using the csv function, streaming with callbacks
2609        my $sth = $dbh->prepare ($sql); $sth->execute;
2610        csv (out => "foo.csv", in => sub { $sth->fetch            });
2611        csv (out => "foo.csv", in => sub { $sth->fetchrow_hashref });
2612
2613       Note that this does not discriminate between "empty" values and NULL-
2614       values from the database,  as both will be the same empty field in CSV.
2615       To enable distinction between the two, use "quote_empty".
2616
2617        csv (out => "foo.csv", in => sub { $sth->fetch }, quote_empty => 1);
2618
2619       If the database import utility supports special sequences to insert
2620       "NULL" values into the database,  like MySQL/MariaDB supports "\N",
2621       use a filter or a map
2622
2623        csv (out => "foo.csv", in => sub { $sth->fetch },
2624                            on_in => sub { $_ //= "\\N" for @{$_[1]} });
2625
2626        while (my $row = $sth->fetch) {
2627            $csv->print ($fh, [ map { $_ // "\\N" } @$row ]);
2628            }
2629
2630       Note that this will not work as expected when choosing the backslash
2631       ("\") as "escape_char", as that will cause the "\" to need to be
2632       escaped by yet another "\",  which will cause the field to need
2633       quotation and thus ending up as "\\N" instead of "\N". See also
2634       "undef_str".
2635
2636        csv (out => "foo.csv", in => sub { $sth->fetch }, undef_str => "\\N");
2637
2638       These special sequences are not recognized by  Text::CSV_XS  on parsing
2639       the CSV generated like this, but map and filter are your friends again
2640
2641        while (my $row = $csv->getline ($fh)) {
2642            $sth->execute (map { $_ eq "\\N" ? undef : $_ } @$row);
2643            }
2644
2645        csv (in => "foo.csv", filter => { 1 => sub {
2646            $sth->execute (map { $_ eq "\\N" ? undef : $_ } @{$_[1]}); 0; }});
2647
2648   Converting CSV to JSON
2649        use Text::CSV_XS qw( csv );
2650        use JSON; # or Cpanel::JSON::XS for better performance
2651
2652        # AoA (no header interpretation)
2653        say encode_json (csv (in => "file.csv"));
2654
2655        # AoH (convert to structures)
2656        say encode_json (csv (in => "file.csv", bom => 1));
2657
2658       Yes, it is that simple.
2659
2660   The examples folder
2661       For more extended examples, see the examples/ [1] sub-directory in
2662       the original distribution or the git repository [2].
2663
2664        1. https://github.com/Tux/Text-CSV_XS/tree/master/examples
2665        2. https://github.com/Tux/Text-CSV_XS
2666
2667       The following files can be found there:
2668
2669       parser-xs.pl
2670         This can be used as a boilerplate to parse invalid "CSV"  and parse
2671         beyond (expected) errors, as an alternative to the "error" callback.
2672
2673          $ perl examples/parser-xs.pl bad.csv >good.csv
2674
2675       csv-check
2676         This is a command-line tool that uses parser-xs.pl  techniques to
2677         check the "CSV" file and report on its content.
2678
2679          $ csv-check files/utf8.csv
2680          Checked files/utf8.csv  with csv-check 1.9
2681          using Text::CSV_XS 1.32 with perl 5.26.0 and Unicode 9.0.0
2682          OK: rows: 1, columns: 2
2683              sep = <,>, quo = <">, bin = <1>, eol = <"\n">
2684
2685       csv-split
2686         This command splits "CSV" files into smaller files,  keeping (part
2687         of) the header.  Options include maximum number of (data) rows per
2688         file and maximum number of columns per file or a combination of the
2689         two.
2690
2691       csv2xls
2692         A script to convert "CSV" to Microsoft Excel ("XLS"). This requires
2693         extra modules Date::Calc and Spreadsheet::WriteExcel. The converter
2694         accepts various options and can produce UTF-8 compliant Excel files.
2695
2696       csv2xlsx
2697         A script to convert "CSV" to Microsoft Excel ("XLSX").  This requires
2698         the modules Date::Calc and Excel::Writer::XLSX.  The converter
2699         does accept various options including merging several "CSV" files
2700         into a single Excel file.
2701
2702       csvdiff
2703         A script that provides colorized diff on sorted CSV files,  assuming
2704         the first line is the header and the first field is the key.
2705         Output options include colorized ANSI escape codes or HTML.
2706
2707          $ csvdiff --html --output=diff.html file1.csv file2.csv
2708
2709       rewrite.pl
2710         A script to rewrite (in)valid CSV into valid CSV files.  The script
2711         has options to generate confusing CSV files or CSV files that
2712         conform to Dutch MS-Excel exports (using ";" as separator).
2713
2714         By default the script honors a BOM  and auto-detects the separator,
2715         converting the input to standard CSV with "," as separator.
2716

CAVEATS

2718       Text::CSV_XS  is not designed to detect the characters used to quote
2719       and separate fields.  The parsing is done using predefined  (default)
2720       settings.  In the examples  sub-directory,  you can find scripts  that
2721       demonstrate how you could try to detect these characters yourself.
2722
2723   Microsoft Excel
2724       The import/export from Microsoft Excel is a risky task, according to
2725       the documentation in "Text::CSV::Separator".  Microsoft uses the
2726       system's list separator defined in the regional settings, which happens
2727       to be a semicolon for Dutch, German and Spanish (and probably some
2728       others as well).   For the English locale,  the default is a comma.
2729       In Windows however,  the user is free to choose a  predefined locale,
2730       and then change  every  individual setting in it, so checking the
2731       locale is no solution.
2732
2733       As of version 1.17, a lone first line with just
2734
2735         sep=;
2736
2737       will be recognized and honored when parsing with "getline".
2738

TODO

2740       More Errors & Warnings
2741         New extensions ought to be  clear and concise  in reporting what
2742         error has occurred where and why, and maybe also offer a remedy to
2743         the problem.
2744
2745         "error_diag" is a (very) good start, but there is more work to be
2746         done in this area.
2747
2748         Basic calls  should croak or warn on  illegal parameters.  Errors
2749         should be documented.
2750
2751       setting meta info
2752         Future extensions might include extending the "meta_info",
2753         "is_quoted", and  "is_binary"  to accept setting these  flags for
2754         fields,  so you can specify which fields are quoted in the
2755         "combine"/"string" combination.
2756
2757          $csv->meta_info (0, 1, 1, 3, 0, 0);
2758          $csv->is_quoted (3, 1);
2759
2760         Metadata Vocabulary for Tabular Data
2761         <http://w3c.github.io/csvw/metadata/> (a W3C editor's draft) could be
2762         an example for supporting more metadata.
2763
2764       Parse the whole file at once
2765         Implement new methods or functions  that enable parsing of a
2766         complete file at once, returning a list of hashes. Possible extension
2767         to this could be to enable a column selection on the call:
2768
2769          my @AoH = $csv->parse_file ($filename, { cols => [ 1, 4..8, 12 ]});
2770
2771         returning something like
2772
2773          [ { fields => [ 1, 2, "foo", 4.5, undef, "", 8 ],
2774              flags  => [ ... ],
2775              },
2776            { fields => [ ... ],
2777              .
2778              },
2779            ]
2780
2781         Note that the "csv" function already supports most of this,  but does
2782         not return flags. "getline_all" returns all rows for an open stream,
2783         but this will not return flags either.  "fragment"  can reduce the
2784         required  rows or columns, but cannot combine them.
2785
2786       Cookbook
2787         Write a document that has recipes for  most known  non-standard  (and
2788         maybe some standard)  "CSV" formats,  including formats that use
2789         "TAB",  ";", "|", or other non-comma separators.
2790
2791         Examples could be taken from W3C's CSV on the Web: Use Cases and
2792         Requirements <http://w3c.github.io/csvw/use-cases-and-
2793         requirements/index.html>
2794
2795       Steal
2796         Steal good new ideas and features from PapaParse
2797         <http://papaparse.com> or csvkit <http://csvkit.readthedocs.org>.
2798
2799       Raku support
2800         Raku support can be found here <https://github.com/Tux/CSV>. The
2801         interface is richer in support than the Perl5 API, as Raku supports
2802         more types.
2803
2804         The Raku version does not (yet) support pure binary CSV datasets.
2805
2806   NOT TODO
2807       combined methods
2808         Requests for adding means (methods) that combine "combine" and
2809         "string" in a single call will not be honored (use "print" instead).
2810         Likewise for "parse" and "fields"  (use "getline" instead), given the
2811         problems with embedded newlines.
2812
2813   Release plan
2814       No guarantees, but this is what I had in mind some time ago:
2815
2816       • DIAGNOSTICS section in pod to *describe* the errors (see below)
2817

EBCDIC

2819       Everything should now work on native EBCDIC systems.   As the tests do
2820       not cover all possible codepoints and Encode does not support
2821       "utf-ebcdic", there is no guarantee that all handling of Unicode is
2822       done correctly.
2823
2824       Opening "EBCDIC" encoded files on  "ASCII"+  systems is likely to
2825       succeed using Encode's "cp37", "cp1047", or "posix-bc":
2826
2827        open my $fh, "<:encoding(cp1047)", "ebcdic_file.csv" or die "...";
2828
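       The resulting handle can then be passed to "getline" as usual.  A
       minimal sketch, assuming the handle from the example above:

        my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1 });
        while (my $row = $csv->getline ($fh)) {
            # @$row holds the decoded fields of one record
            }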

DIAGNOSTICS

2830       Still under construction ...
2831
2832       If an error occurs,  "$csv->error_diag" can be used to get information
2833       on the cause of the failure. Note that for speed reasons the internal
2834       value is never cleared on success,  so using the value returned by
2835       "error_diag" in normal cases - when no error occurred - may cause
2836       unexpected results.
2837
2838       If the constructor failed, the cause can be found using "error_diag" as
2839       a class method, like "Text::CSV_XS->error_diag".
2840
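       A minimal sketch;  the conflicting "sep_char" below is chosen only to
       force error 1001 from the list further down:

        my $csv = Text::CSV_XS->new ({ sep_char => '"' })   # = quote_char
            or die "new () failed: " . Text::CSV_XS->error_diag ();
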
2841       The "$csv->error_diag" method is automatically invoked upon error when
2842       the constructor was called with  "auto_diag"  set to  1 or 2, or when
2843       autodie is in effect.  When set to 1, this will cause a "warn" with the
2844       error message,  when set to 2, it will "die". "2012 - EOF" is excluded
2845       from "auto_diag" reports.
2846
2847       Errors can be (individually) caught using the "error" callback.
2848
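       A sketch of catching errors individually with the "error" callback;
       here the callback simply re-reads "error_diag":

        my $csv;    # declared first so the callback can close over it
        $csv = Text::CSV_XS->new ({
            binary    => 1,
            auto_diag => 1,
            callbacks => {
                error => sub {
                    my ($err, $msg, $pos) = $csv->error_diag ();
                    warn "CSV error $err: $msg near position $pos\n";
                    return;
                    },
                },
            });
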
2849       The errors described below are available.  I have tried to make each
2850       error self-explanatory, but more descriptions will be added.
2851       For most of these errors, the first three capitals describe the error
2852       category:
2853
2854       • INI
2855
2856         Initialization error or option conflict.
2857
2858       • ECR
2859
2860         Carriage-Return related parse error.
2861
2862       • EOF
2863
2864         End-Of-File related parse error.
2865
2866       • EIQ
2867
2868         Parse error inside quotation.
2869
2870       • EIF
2871
2872         Parse error inside field.
2873
2874       • ECB
2875
2876         Combine error.
2877
2878       • EHR
2879
2880         HashRef parse related error.
2881
2882       And below should be the complete list of error codes that can be
2883       returned:
2884
2885       • 1001 "INI - sep_char is equal to quote_char or escape_char"
2886
2887         The  separation character  cannot be equal to  the quotation
2888         character or to the escape character,  as this would invalidate all
2889         parsing rules.
2890
2891       • 1002 "INI - allow_whitespace with escape_char or quote_char SP or
2892         TAB"
2893
2894         Using the  "allow_whitespace"  attribute  when either "quote_char" or
2895         "escape_char"  is equal to "SPACE" or "TAB" is too ambiguous to
2896         allow.
2897
2898       • 1003 "INI - \r or \n in main attr not allowed"
2899
2900         Using default "eol" characters in either "sep_char", "quote_char",
2901         or  "escape_char"  is  not allowed.
2902
2903       • 1004 "INI - callbacks should be undef or a hashref"
2904
2905         The "callbacks" attribute only accepts "undef" or a hash
2906         reference.
2907
2908       • 1005 "INI - EOL too long"
2909
2910         The value passed for EOL exceeds its maximum length (16).
2911
2912       • 1006 "INI - SEP too long"
2913
2914         The value passed for SEP exceeds its maximum length (16).
2915
2916       • 1007 "INI - QUOTE too long"
2917
2918         The value passed for QUOTE exceeds its maximum length (16).
2919
2920       • 1008 "INI - SEP undefined"
2921
2922         The value passed for SEP should be defined and not empty.
2923
2924       • 1010 "INI - the header is empty"
2925
2926         The header line parsed in the "header" is empty.
2927
2928       • 1011 "INI - the header contains more than one valid separator"
2929
2930         The header line parsed in the  "header"  contains more than one
2931         (unique) separator character out of the allowed set of separators.
2932
2933       • 1012 "INI - the header contains an empty field"
2934
2935         The header line parsed in the "header" contains an empty field.
2936
2937       • 1013 "INI - the header contains non-unique fields"
2938
2939         The header line parsed in the  "header"  contains at least  two
2940         identical fields.
2941
2942       • 1014 "INI - header called on undefined stream"
2943
2944         The header line cannot be parsed from an undefined source.
2945
2946       • 1500 "PRM - Invalid/unsupported argument(s)"
2947
2948         Function or method called with invalid argument(s) or parameter(s).
2949
2950       • 1501 "PRM - The key attribute is passed as an unsupported type"
2951
2952         The "key" attribute is of an unsupported type.
2953
2954       • 1502 "PRM - The value attribute is passed without the key attribute"
2955
2956         The "value" attribute is only allowed when a valid key is given.
2957
2958       • 1503 "PRM - The value attribute is passed as an unsupported type"
2959
2960         The "value" attribute is of an unsupported type.
2961
2962       • 2010 "ECR - QUO char inside quotes followed by CR not part of EOL"
2963
2964         When "eol" has been set to anything other than the default, such
2965         as "\r\t\n", and a "\r" follows the second (closing) "quote_char",
2966         but the characters after that "\r" do not make up the "eol"
2967         sequence, this is an error.
2968
2969       • 2011 "ECR - Characters after end of quoted field"
2970
2971         Sequences like "1,foo,"bar"baz,22,1" are not allowed. "bar" is a
2972         quoted field and after the closing double-quote, there should be
2973         either a new-line sequence or a separation character.
2974
2975       • 2012 "EOF - End of data in parsing input stream"
2976
2977         Self-explanatory:  end-of-file was reached while parsing a stream.
2978         This can happen only when reading from streams with "getline",  as
2979         "parse" works on strings that are not required to have a trailing
2980         "eol".
2981
2982       • 2013 "INI - Specification error for fragments RFC7111"
2983
2984         Invalid RFC 7111 specification for the URI "fragment".
2985
2986       • 2014 "ENF - Inconsistent number of fields"
2987
2988         Inconsistent number of fields under strict parsing.
2989
2990       • 2021 "EIQ - NL char inside quotes, binary off"
2991
2992         Sequences like "1,"foo\nbar",22,1" are allowed only when the binary
2993         option has been selected with the constructor.
2994
2995       • 2022 "EIQ - CR char inside quotes, binary off"
2996
2997         Sequences like "1,"foo\rbar",22,1" are allowed only when the binary
2998         option has been selected with the constructor.
2999
3000       • 2023 "EIQ - QUO character not allowed"
3001
3002         Sequences like ""foo "bar" baz",qu" and "2023,",2008-04-05,"Foo,
3003         Bar",\n" will cause this error.
3004
3005       • 2024 "EIQ - EOF cannot be escaped, not even inside quotes"
3006
3007         The escape character is not allowed as last character in an input
3008         stream.
3009
3010       • 2025 "EIQ - Loose unescaped escape"
3011
3012         An escape character should escape only characters that need escaping.
3013
3014         Allowing  the escape  for other characters  is possible  with the
3015         attribute "allow_loose_escapes".
3016
3017       • 2026 "EIQ - Binary character inside quoted field, binary off"
3018
3019         Binary characters are not allowed by default.  Fields that contain
3020         valid UTF-8 are an exception:  those will automatically be upgraded.
3021         Set "binary" to 1 to accept binary data.
3023
3024       • 2027 "EIQ - Quoted field not terminated"
3025
3026         When parsing a field that started with a quotation character,  the
3027         field is expected to be closed with a quotation character.   When the
3028         parsed line is exhausted before the quote is found, that field is not
3029         terminated.
3030
3031       • 2030 "EIF - NL char inside unquoted verbatim, binary off"
3032
3033       • 2031 "EIF - CR char is first char of field, not part of EOL"
3034
3035       • 2032 "EIF - CR char inside unquoted, not part of EOL"
3036
3037       • 2034 "EIF - Loose unescaped quote"
3038
3039       • 2035 "EIF - Escaped EOF in unquoted field"
3040
3041       • 2036 "EIF - ESC error"
3042
3043       • 2037 "EIF - Binary character in unquoted field, binary off"
3044
3045       • 2110 "ECB - Binary character in Combine, binary off"
3046
3047       • 2200 "EIO - print to IO failed. See errno"
3048
3049       • 3001 "EHR - Unsupported syntax for column_names ()"
3050
3051       • 3002 "EHR - getline_hr () called before column_names ()"
3052
3053       • 3003 "EHR - bind_columns () and column_names () fields count
3054         mismatch"
3055
3056       • 3004 "EHR - bind_columns () only accepts refs to scalars"
3057
3058       • 3006 "EHR - bind_columns () did not pass enough refs for parsed
3059         fields"
3060
3061       • 3007 "EHR - bind_columns needs refs to writable scalars"
3062
3063       • 3008 "EHR - unexpected error in bound fields"
3064
3065       • 3009 "EHR - print_hr () called before column_names ()"
3066
3067       • 3010 "EHR - print_hr () called with invalid arguments"
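
       As an illustration of how these codes surface,  a minimal sketch that
       feeds a field with an embedded newline to a parser that does not have
       "binary" enabled:

        my $csv = Text::CSV_XS->new ();              # binary not enabled
        $csv->parse (qq{1,"foo\nbar",22,1}) or do {
            my ($err, $msg, $pos) = $csv->error_diag ();
            warn "$err: $msg near position $pos\n";  # expect 2021 - EIQ
            };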
3068

SEE ALSO

3070       IO::File,  IO::Handle,  IO::Wrap,  Text::CSV,  Text::CSV_PP,
3071       Text::CSV::Encoded,     Text::CSV::Separator,    Text::CSV::Slurp,
3072       Spreadsheet::CSV and Spreadsheet::Read, and of course perl.
3073
3074       If you are using Raku,  have a look at "Text::CSV" in the Raku
3075       ecosystem, offering the same features.
3076
3077       non-perl
3078
3079       A CSV parser in JavaScript,  also used by W3C <http://www.w3.org>,  is
3080       the multi-threaded in-browser PapaParse <http://papaparse.com/>.
3081
3082       csvkit <http://csvkit.readthedocs.org> is a Python CSV parsing toolkit.
3083

AUTHOR

3085       Alan Citterman <alan@mfgrtl.com> wrote the original Perl module.
3086       Please don't send mail concerning Text::CSV_XS to Alan, who is not
3087       involved in the C/XS part that is now the main part of the module.
3088
3089       Jochen Wiedmann <joe@ispsoft.de> rewrote the en- and decoding in C by
3090       implementing a simple finite-state machine.   He added variable quote,
3091       escape and separator characters, the binary mode and the print and
3092       getline methods. See ChangeLog releases 0.10 through 0.23.
3093
3094       H.Merijn Brand <h.m.brand@xs4all.nl> cleaned up the code,  added the
3095       field flags methods,  wrote the major part of the test suite, completed
3096       the documentation,   fixed most RT bugs,  added all the allow flags and
3097       the "csv" function. See ChangeLog releases 0.25 and on.
3098

COPYRIGHT AND LICENSE

3100        Copyright (C) 2007-2022 H.Merijn Brand.  All rights reserved.
3101        Copyright (C) 1998-2001 Jochen Wiedmann. All rights reserved.
3102        Copyright (C) 1997      Alan Citterman.  All rights reserved.
3103
3104       This library is free software;  you can redistribute and/or modify it
3105       under the same terms as Perl itself.
3106
3107
3108
3109perl v5.36.0                      2022-07-22                         CSV_XS(3)