1CSV_XS(3)             User Contributed Perl Documentation            CSV_XS(3)
2
3
4

NAME

6       Text::CSV_XS - comma-separated values manipulation routines
7

SYNOPSIS

9        # Functional interface
10        use Text::CSV_XS qw( csv );
11
12        # Read whole file in memory
13        my $aoa = csv (in => "data.csv");    # as array of array
14        my $aoh = csv (in => "data.csv",
15                       headers => "auto");   # as array of hash
16
17        # Write array of arrays as csv file
18        csv (in => $aoa, out => "file.csv", sep_char=> ";");
19
20        # Only show lines where "code" is odd
21        csv (in => "data.csv", filter => { code => sub { $_ % 2 }});
22
23
24        # Object interface
25        use Text::CSV_XS;
26
27        my @rows;
28        # Read/parse CSV
29        my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1 });
30        open my $fh, "<:encoding(utf8)", "test.csv" or die "test.csv: $!";
31        while (my $row = $csv->getline ($fh)) {
32            $row->[2] =~ m/pattern/ or next; # 3rd field should match
33            push @rows, $row;
34            }
35        close $fh;
36
37        # and write as CSV
38        open $fh, ">:encoding(utf8)", "new.csv" or die "new.csv: $!";
39        $csv->say ($fh, $_) for @rows;
40        close $fh or die "new.csv: $!";
41

DESCRIPTION

43       Text::CSV_XS  provides facilities for the composition  and
44       decomposition of comma-separated values.  An instance of the
45       Text::CSV_XS class will combine fields into a "CSV" string and parse a
46       "CSV" string into fields.
47
48       The module accepts either strings or files as input  and supports the
49       use of user-specified characters for delimiters, separators, and
50       escapes.
51
52   Embedded newlines
53       Important Note:  The default behavior is to accept only ASCII
54       characters in the range from 0x20 (space) to 0x7E (tilde).   This means
55       that the fields cannot contain newlines. If your data contains
56       newlines embedded in fields, or characters above 0x7E (tilde), or
57       binary data, you must set "binary => 1" in the call to "new". To cover
58       the widest range of parsing options, you will always want to set
59       binary.
60
61       But you still have the problem  that you have to pass a correct line to
62       the "parse" method, which is more complicated than the usual way of
63       reading input:
64
65        my $csv = Text::CSV_XS->new ({ binary => 1, eol => $/ });
66        while (<>) {           #  WRONG!
67            $csv->parse ($_);
68            my @fields = $csv->fields ();
69            }
70
71       this will break, as the "while" might read broken lines:  it does not
72       care about the quoting. If you need to support embedded newlines,  the
73       way to go is to  not  pass "eol" in the parser  (it accepts "\n", "\r",
74       and "\r\n" by default) and then
75
76        my $csv = Text::CSV_XS->new ({ binary => 1 });
77        open my $fh, "<", $file or die "$file: $!";
78        while (my $row = $csv->getline ($fh)) {
79            my @fields = @$row;
80            }
81
82       The old(er) way of using global file handles is still supported
83
84        while (my $row = $csv->getline (*ARGV)) { ... }
85
86   Unicode
87       Unicode is only tested to work with perl-5.8.2 and up.
88
89       See also "BOM".
90
91       The simplest way to ensure the correct encoding is used for  in- and
92       output is by either setting layers on the filehandles, or setting the
93       "encoding" argument for "csv".
94
95        open my $fh, "<:encoding(UTF-8)", "in.csv"  or die "in.csv: $!";
96       or
97        my $aoa = csv (in => "in.csv",     encoding => "UTF-8");
98
99        open my $fh, ">:encoding(UTF-8)", "out.csv" or die "out.csv: $!";
100       or
101        csv (in => $aoa, out => "out.csv", encoding => "UTF-8");
102
103       On parsing (both for  "getline" and  "parse"),  if the source is marked
104       being UTF8, then all fields that are marked binary will also be marked
105       UTF8.
106
107       On combining ("print"  and  "combine"):  if any of the combining fields
108       was marked UTF8, the resulting string will be marked as UTF8.  Note
109       however that fields  before  the first field marked UTF8 which contain
110       8-bit characters that were not upgraded to UTF8  will remain  "bytes"
111       in the resulting string, possibly causing unexpected
112       errors.  If you pass data of mixed encoding,  or you don't know whether
113       the encodings  differ,  force it to be upgraded before you pass
114       them on:
115
116        $csv->print ($fh, [ map { utf8::upgrade (my $x = $_); $x } @data ]);
117
118       For complete control over encoding, please use Text::CSV::Encoded:
119
120        use Text::CSV::Encoded;
121        my $csv = Text::CSV::Encoded->new ({
122            encoding_in  => "iso-8859-1", # the encoding comes into   Perl
123            encoding_out => "cp1252",     # the encoding comes out of Perl
124            });
125
126        $csv = Text::CSV::Encoded->new ({ encoding  => "utf8" });
127        # combine () and print () accept *literally* utf8 encoded data
128        # parse () and getline () return *literally* utf8 encoded data
129
130        $csv = Text::CSV::Encoded->new ({ encoding  => undef }); # default
131        # combine () and print () accept UTF8 marked data
132        # parse () and getline () return UTF8 marked data
133
134   BOM
135       BOM  (or Byte Order Mark)  handling is available only inside the
136       "header" method.   This method supports the following encodings:
137       "utf-8", "utf-1", "utf-32be", "utf-32le", "utf-16be", "utf-16le",
138       "utf-ebcdic", "scsu", "bocu-1", and "gb-18030". See Wikipedia
139       <https://en.wikipedia.org/wiki/Byte_order_mark>.
140
141       If a file has a BOM, the easiest way to deal with that is
142
143        my $aoh = csv (in => $file, detect_bom => 1);
144
145       All records will be encoded based on the detected BOM.
146
147       This implies a call to the  "header"  method,  which defaults to also
148       set the "column_names". So this is not the same as
149
150        my $aoh = csv (in => $file, headers => "auto");
151
152       which only reads the first record to set  "column_names"  but ignores
153       the meaning of a possibly present BOM.
154

SPECIFICATION

156       While no formal specification for CSV exists, RFC 4180
157       <https://datatracker.ietf.org/doc/html/rfc4180> (1) describes the
158       common format and establishes  "text/csv" as the MIME type registered
159       with the IANA. RFC 7111 <https://datatracker.ietf.org/doc/html/rfc7111>
160       (2) adds fragments to CSV.
161
162       Many informal documents exist that describe the "CSV" format.   "How
163       To: The Comma Separated Value (CSV) File Format"
164       <http://creativyst.com/Doc/Articles/CSV/CSV01.shtml> (3)  provides an
165       overview of the  "CSV"  format in the most widely used applications and
166       explains how it can best be used and supported.
167
168        1) https://datatracker.ietf.org/doc/html/rfc4180
169        2) https://datatracker.ietf.org/doc/html/rfc7111
170        3) http://creativyst.com/Doc/Articles/CSV/CSV01.shtml
171
172       The basic rules are as follows:
173
174       CSV  is a delimited data format that has fields/columns separated by
175       the comma character and records/rows separated by newlines. Fields that
176       contain a special character (comma, newline, or double quote)  must be
177       enclosed in double quotes. However, if a line contains a single entry
178       that is the empty string, it may be enclosed in double quotes.  If a
179       field's value contains a double quote character it is escaped by
180       placing another double quote character next to it. The "CSV" file
181       format does not require a specific character encoding, byte order, or
182       line terminator format.
183
184       • Each record is a single line ended by a line feed  (ASCII/"LF"=0x0A)
185         or a carriage return and line feed pair (ASCII/"CRLF"="0x0D 0x0A"),
186         however, line-breaks may be embedded.
187
188       • Fields are separated by commas.
189
190       • Allowable characters within a "CSV" field include 0x09 ("TAB") and
191         the inclusive range of 0x20 (space) through 0x7E (tilde).  In binary
192         mode all characters are accepted, at least in quoted fields.
193
194       • A field within  "CSV"  must be surrounded by  double-quotes to
195         contain  a separator character (comma).
196
197       Though this is the clearest and most restrictive definition, Text::CSV_XS
198       is way more liberal than this, and allows extension:
199
200       • Line termination by a single carriage return is accepted by default
201
202       • The separation-, quote-, and escape- characters can be any ASCII
203         character in the range from  0x20 (space) to  0x7E (tilde).
204         Characters outside this range may or may not work as expected.
205         Multibyte characters, like UTF "U+060C" (ARABIC COMMA),   "U+FF0C"
206         (FULLWIDTH COMMA),  "U+241B" (SYMBOL FOR ESCAPE), "U+2424" (SYMBOL
207         FOR NEWLINE), "U+FF02" (FULLWIDTH QUOTATION MARK), and "U+201C" (LEFT
208         DOUBLE QUOTATION MARK) (to give some examples of what might look
209         promising) work for newer versions of perl for "sep_char", and
210         "quote_char" but not for "escape_char".
211
212         If you use perl-5.8.2 or higher these three attributes are
213         utf8-decoded, to increase the likelihood of success. This way
214         "U+00FE" will be allowed as a quote character.
215
216       • A field in  "CSV"  must be surrounded by double-quotes to make an
217         embedded double-quote, represented by a pair of consecutive double-
218         quotes, valid. In binary mode you may additionally use the sequence
219         ""0" for representation of a NULL byte. Using 0x00 in binary mode is
220         just as valid.
221
222       • Several violations of the above specification may be lifted by
223         passing some options as attributes to the object constructor.
224

METHODS

226   version
227       (Class method) Returns the current module version.
228
229   new
230       (Class method) Returns a new instance of class Text::CSV_XS. The
231       attributes are described by the (optional) hash ref "\%attr".
232
233        my $csv = Text::CSV_XS->new ({ attributes ... });
234
235       The following attributes are available:
236
237       eol
238
239        my $csv = Text::CSV_XS->new ({ eol => $/ });
240                  $csv->eol (undef);
241        my $eol = $csv->eol;
242
243       The end-of-line string to add to rows for "print" or the record
244       separator for "getline".
245
246       When not passed in a parser instance,  the default behavior is to
247       accept "\n", "\r", and "\r\n", so it is probably safer to not specify
248       "eol" at all. Passing "undef" or the empty string behaves the same.
249
250       When not passed in a generating instance,  records are not terminated
251       at all, so it is probably wise to pass something you expect. A safe
252       choice for "eol" on output is either $/ or "\r\n".
253
254       Common values for "eol" are "\012" ("\n" or Line Feed),  "\015\012"
255       ("\r\n" or Carriage Return, Line Feed),  and "\015"  ("\r" or Carriage
256       Return). The "eol" attribute cannot exceed 7 (ASCII) characters.
257
258       If both $/ and "eol" equal "\015", parsing lines that end on only a
259       Carriage Return without Line Feed will be "parse"d correctly.
260
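       As a sketch of the difference this makes on output  (assuming $fh is an
       already opened output handle):

        my $csv = Text::CSV_XS->new ({ binary => 1 });
        $csv->print ($fh, [ 1, "foo" ]);  # no record terminator added
        $csv->eol ("\r\n");
        $csv->print ($fh, [ 2, "bar" ]);  # record is terminated with CRLF
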
261       sep_char
262
263        my $csv = Text::CSV_XS->new ({ sep_char => ";" });
264                $csv->sep_char (";");
265        my $c = $csv->sep_char;
266
267       The char used to separate fields, by default a comma (",").  Limited
268       to a single-byte character, usually in the range from 0x20 (space) to
269       0x7E (tilde). When longer sequences are required, use "sep".
270
271       The separation character can not be equal to the quote character  or to
272       the escape character.
273
274       See also "CAVEATS"
275
276       sep
277
278        my $csv = Text::CSV_XS->new ({ sep => "\N{FULLWIDTH COMMA}" });
279                  $csv->sep (";");
280        my $sep = $csv->sep;
281
282       The chars used to separate fields, by default undefined. Limited to 8
283       bytes.
284
285       When set, overrules "sep_char".  If its length is one byte it acts as
286       an alias to "sep_char".
287
288       See also "CAVEATS"
289
290       quote_char
291
292        my $csv = Text::CSV_XS->new ({ quote_char => "'" });
293                $csv->quote_char (undef);
294        my $c = $csv->quote_char;
295
296       The character to quote fields containing blanks or binary data,  by
297       default the double quote character (""").  A value of undef suppresses
298       quote chars (for simple cases only). Limited to a single-byte
299       character, usually in the range from  0x20 (space) to  0x7E (tilde).
300       When longer sequences are required, use "quote".
301
302       "quote_char" can not be equal to "sep_char".
303
304       quote
305
306        my $csv = Text::CSV_XS->new ({ quote => "\N{FULLWIDTH QUOTATION MARK}" });
307                    $csv->quote ("'");
308        my $quote = $csv->quote;
309
310       The chars used to quote fields, by default undefined. Limited to 8
311       bytes.
312
313       When set, overrules "quote_char". If its length is one byte it acts as
314       an alias to "quote_char".
315
316       This method does not support "undef".  Use "quote_char" to disable
317       quotation.
318
319       See also "CAVEATS"
320
321       escape_char
322
323        my $csv = Text::CSV_XS->new ({ escape_char => "\\" });
324                $csv->escape_char (":");
325        my $c = $csv->escape_char;
326
327       The character to  escape  certain characters inside quoted fields.
328       This is limited to a  single-byte  character,  usually  in the  range
329       from  0x20 (space) to 0x7E (tilde).
330
331       The "escape_char" defaults to being the double-quote mark ("""). In
332       other words the same as the default "quote_char". This means that
333       doubling the quote mark in a field escapes it:
334
335        "foo","bar","Escape ""quote mark"" with two ""quote marks""","baz"
336
337       If  you  change  the   "quote_char"  without  changing  the
338       "escape_char",  the  "escape_char" will still be the double-quote
339       (""").  If instead you want to escape the  "quote_char" by doubling it
340       you will need to also change the  "escape_char"  to be the same as what
341       you have changed the "quote_char" to.
342
343       Setting "escape_char" to "undef" or "" will completely disable escapes
344       and is greatly discouraged. This will also disable "escape_null".
345
346       The escape character can not be equal to the separation character.
347
348       binary
349
350        my $csv = Text::CSV_XS->new ({ binary => 1 });
351                $csv->binary (0);
352        my $f = $csv->binary;
353
354       If this attribute is 1,  you may use binary characters in quoted
355       fields, including line feeds, carriage returns and "NULL" bytes. (The
356       latter could be escaped as ""0".) By default this feature is off.
357
358       If a string is marked UTF8,  "binary" will be turned on automatically
359       when binary characters other than "CR" and "NL" are encountered.   Note
360       that a simple string like "\x{00a0}" might still be binary, but not
361       marked UTF8, so setting "{ binary => 1 }" is still a wise option.
362
363       strict
364
365        my $csv = Text::CSV_XS->new ({ strict => 1 });
366                $csv->strict (0);
367        my $f = $csv->strict;
368
369       If this attribute is set to 1, any row that parses to a different
370       number of fields than the previous row will cause the parser to throw
371       error 2014.
372
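       As a sketch of how this can be combined with "auto_diag"  (the file name
       is only an example):

        my $csv = Text::CSV_XS->new ({ binary => 1, strict => 1, auto_diag => 2 });
        open my $fh, "<", "data.csv" or die "data.csv: $!";
        while (my $row = $csv->getline ($fh)) {
            # a row with a deviating number of fields dies with error 2014
            }
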
373       skip_empty_rows
374
375        my $csv = Text::CSV_XS->new ({ skip_empty_rows => 1 });
376                $csv->skip_empty_rows (0);
377        my $f = $csv->skip_empty_rows;
378
379       If this attribute is set to 1,  any row that has an  "eol" immediately
380       following the start of line will be skipped.  Default behavior is to
381       return one single empty field.
382
383       This attribute is only used in parsing.
384
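       A minimal sketch using the functional interface  (attributes that "csv"
       does not know are passed on to "new"):

        use Text::CSV_XS qw( csv );

        my $data = "a,b\n\nc,d\n";
        my $all  = csv (in => \$data);                       # [[a,b], [""], [c,d]]
        my $some = csv (in => \$data, skip_empty_rows => 1); # [[a,b], [c,d]]
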
385       formula_handling
386
387       Alias for "formula"
388
389       formula
390
391        my $csv = Text::CSV_XS->new ({ formula => "none" });
392                $csv->formula ("none");
393        my $f = $csv->formula;
394
395       This defines the behavior of fields containing formulas. As formulas
396       are considered dangerous in spreadsheets, this attribute can define an
397       optional action to be taken if a field starts with an equal sign ("=").
398
399       For purpose of code-readability, this can also be written as
400
401        my $csv = Text::CSV_XS->new ({ formula_handling => "none" });
402                $csv->formula_handling ("none");
403        my $f = $csv->formula_handling;
404
405       Possible values for this attribute are
406
407       none
408         Take no specific action. This is the default.
409
410          $csv->formula ("none");
411
412       die
413         Cause the process to "die" whenever a leading "=" is encountered.
414
415          $csv->formula ("die");
416
417       croak
418         Cause the process to "croak" whenever a leading "=" is encountered.
419         (See Carp)
420
421          $csv->formula ("croak");
422
423       diag
424         Report position and content of the field whenever a leading  "=" is
425         found.  The value of the field is unchanged.
426
427          $csv->formula ("diag");
428
429       empty
430         Replace the content of fields that start with a "=" with the empty
431         string.
432
433          $csv->formula ("empty");
434          $csv->formula ("");
435
436       undef
437         Replace the content of fields that start with a "=" with "undef".
438
439          $csv->formula ("undef");
440          $csv->formula (undef);
441
442       a callback
443         Modify the content of fields that start with a  "="  with the return-
444         value of the callback.  The original content of the field is
445         available inside the callback as $_;
446
447          # Replace all formulas with 42
448          $csv->formula (sub { 42; });
449
450          # same as $csv->formula ("empty") but slower
451          $csv->formula (sub { "" });
452
453          # Allow =4+12
454          $csv->formula (sub { s/^=(\d+\+\d+)$/$1/eer });
455
456          # Allow more complex calculations
457          $csv->formula (sub { eval { s{^=([-+*/0-9()]+)$}{$1}ee }; $_ });
458
459       All other values will give a warning and then fallback to "diag".
460
461       decode_utf8
462
463        my $csv = Text::CSV_XS->new ({ decode_utf8 => 1 });
464                $csv->decode_utf8 (0);
465        my $f = $csv->decode_utf8;
466
467       This attribute defaults to TRUE.
468
469       While parsing,  fields that are valid UTF-8, are automatically set to
470       be UTF-8, so that
471
472         $csv->parse ("\xC4\xA8\n");
473
474       results in
475
476         PV("\304\250"\0) [UTF8 "\x{128}"]
477
478       Sometimes this might not be the desired action.  To prevent those upgrades,
479       set this attribute to false, and the result will be
480
481         PV("\304\250"\0)
482
483       auto_diag
484
485        my $csv = Text::CSV_XS->new ({ auto_diag => 1 });
486                $csv->auto_diag (2);
487        my $l = $csv->auto_diag;
488
489       Setting this attribute to a number between 1 and 9 causes  "error_diag"
490       to be called automatically in void context upon errors.
491
492       In case of error "2012 - EOF", this call will be void.
493
494       If "auto_diag" is set to a numeric value greater than 1, it will "die"
495       on errors instead of "warn".  If set to anything unrecognized,  it will
496       be silently ignored.
497
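       A short sketch of the difference between the levels:

        my $csv = Text::CSV_XS->new ({ auto_diag => 1 });
        $csv->parse (q{1,"not terminated});   # warns and returns false

        $csv->auto_diag (2);
        $csv->parse (q{1,"not terminated});   # dies with the same diagnostic
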
498       Future extensions to this feature will include more reliable auto-
499       detection of  "autodie"  being active in the scope in which the error
500       occurred,  which will increase the value of "auto_diag" by  1 the
501       moment the error is detected.
502
503       diag_verbose
504
505        my $csv = Text::CSV_XS->new ({ diag_verbose => 1 });
506                $csv->diag_verbose (2);
507        my $l = $csv->diag_verbose;
508
509       Set the verbosity of the output triggered by "auto_diag".   Currently
510       only adds the current  input-record-number  (if known)  to the
511       diagnostic output with an indication of the position of the error.
512
513       blank_is_undef
514
515        my $csv = Text::CSV_XS->new ({ blank_is_undef => 1 });
516                $csv->blank_is_undef (0);
517        my $f = $csv->blank_is_undef;
518
519       Under normal circumstances, "CSV" data makes no distinction between
520       quoted- and unquoted empty fields.  These both end up in an empty
521       string field once read, thus
522
523        1,"",," ",2
524
525       is read as
526
527        ("1", "", "", " ", "2")
528
529       When writing  "CSV" files with either  "always_quote" or  "quote_empty"
530       set, the unquoted  empty field is the result of an undefined value.
531       To enable this distinction when  reading "CSV"  data,  the
532       "blank_is_undef"  attribute will cause  unquoted empty fields to be set
533       to "undef", causing the above to be parsed as
534
535        ("1", "", undef, " ", "2")
536
537       Note that this is specifically important when loading  "CSV" fields
538       into a database that allows "NULL" values,  as the perl equivalent for
539       "NULL" is "undef" in DBI land.
540
541       empty_is_undef
542
543        my $csv = Text::CSV_XS->new ({ empty_is_undef => 1 });
544                $csv->empty_is_undef (0);
545        my $f = $csv->empty_is_undef;
546
547       Going one  step  further  than  "blank_is_undef",  this attribute
548       converts all empty fields to "undef", so
549
550        1,"",," ",2
551
552       is read as
553
554        (1, undef, undef, " ", 2)
555
556       Note that this affects only fields that are  originally  empty,  not
557       fields that are empty after stripping allowed whitespace. YMMV.
558
559       allow_whitespace
560
561        my $csv = Text::CSV_XS->new ({ allow_whitespace => 1 });
562                $csv->allow_whitespace (0);
563        my $f = $csv->allow_whitespace;
564
565       When this option is set to true,  the whitespace  ("TAB"'s and
566       "SPACE"'s) surrounding  the  separation character  is removed when
567       parsing.  If either "TAB" or "SPACE" is one of the three characters
568       "sep_char", "quote_char", or "escape_char" it will not be considered
569       whitespace.
570
571       Now lines like:
572
573        1 , "foo" , bar , 3 , zapp
574
575       are parsed as valid "CSV", even though it violates the "CSV" specs.
576
577       Note that  all  whitespace is stripped from both  start and  end of
578       each field.  That makes it  more than just a feature for parsing
579       bad "CSV" lines, as
580
581        1,   2.0,  3,   ape  , monkey
582
583       will now be parsed as
584
585        ("1", "2.0", "3", "ape", "monkey")
586
587       even if the original line was perfectly acceptable "CSV".
588
589       allow_loose_quotes
590
591        my $csv = Text::CSV_XS->new ({ allow_loose_quotes => 1 });
592                $csv->allow_loose_quotes (0);
593        my $f = $csv->allow_loose_quotes;
594
595       By default, parsing unquoted fields containing "quote_char" characters
596       like
597
598        1,foo "bar" baz,42
599
600       would result in parse error 2034.  Though it is still bad practice to
601       allow this format,  we  cannot  help  the  fact  that  some  vendors
602       make  their applications spit out lines styled this way.
603
604       If there is really bad "CSV" data, like
605
606        1,"foo "bar" baz",42
607
608       or
609
610        1,""foo bar baz"",42
611
612       there is a way to get this data-line parsed and leave the quotes inside
613       the quoted field as-is.  This can be achieved by setting
614       "allow_loose_quotes" AND making sure that the "escape_char" is  not
615       equal to "quote_char".
616
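       A sketch of such a setup, keeping the stray quotes as part of the field:

        my $csv = Text::CSV_XS->new ({
            allow_loose_quotes => 1,
            escape_char        => "\\",
            });
        $csv->parse (q{1,"foo "bar" baz",42}) or die "" . $csv->error_diag ();
        my @fields = $csv->fields;   # ('1', 'foo "bar" baz', '42')
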
617       allow_loose_escapes
618
619        my $csv = Text::CSV_XS->new ({ allow_loose_escapes => 1 });
620                $csv->allow_loose_escapes (0);
621        my $f = $csv->allow_loose_escapes;
622
623       Parsing fields  that  have  "escape_char"  characters that escape
624       characters that do not need to be escaped, like:
625
626        my $csv = Text::CSV_XS->new ({ escape_char => "\\" });
627        $csv->parse (qq{1,"my bar\'s",baz,42});
628
629       would result in parse error 2025.   Though it is bad practice to allow
630       this format,  this attribute enables you to treat all escape character
631       sequences equal.
632
633       allow_unquoted_escape
634
635        my $csv = Text::CSV_XS->new ({ allow_unquoted_escape => 1 });
636                $csv->allow_unquoted_escape (0);
637        my $f = $csv->allow_unquoted_escape;
638
639       A backward compatibility issue where "escape_char" differs from
640       "quote_char"  prevents  "escape_char" from being in the first position of a
641       field.  If "quote_char" is equal to the default """ and "escape_char"
642       is set to "\", this would be illegal:
643
644        1,\0,2
645
646       Setting this attribute to 1  might help to overcome issues with
647       backward compatibility and allow this style.
648
649       always_quote
650
651        my $csv = Text::CSV_XS->new ({ always_quote => 1 });
652                $csv->always_quote (0);
653        my $f = $csv->always_quote;
654
655       By default the generated fields are quoted only if they need to be.
656       For example, if they contain the separator character. If you set this
657       attribute to 1 then all defined fields will be quoted. ("undef" fields
658       are not quoted, see "blank_is_undef"). This makes it quite often easier
659       to handle exported data in external applications.   (Poor creatures who
660       would be better off using Text::CSV_XS. :)
661
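       A small sketch of the effect on "combine":

        my $csv = Text::CSV_XS->new ();
        $csv->combine (1, "foo", "") and print $csv->string, "\n";  # 1,foo,

        $csv->always_quote (1);
        $csv->combine (1, "foo", "") and print $csv->string, "\n";  # "1","foo",""
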
662       quote_space
663
664        my $csv = Text::CSV_XS->new ({ quote_space => 1 });
665                $csv->quote_space (0);
666        my $f = $csv->quote_space;
667
668       By default,  a space in a field would trigger quotation.  As no rule
669       exists that this must be forced in "CSV",  nor any for the opposite, the
670       default is true for safety.   You can exclude the space  from this
671       trigger  by setting this attribute to 0.
672
673       quote_empty
674
675        my $csv = Text::CSV_XS->new ({ quote_empty => 1 });
676                $csv->quote_empty (0);
677        my $f = $csv->quote_empty;
678
679       By default the generated fields are quoted only if they need to be.
680       An empty (defined) field does not need quotation. If you set this
681       attribute to 1 then empty defined fields will be quoted.  ("undef"
682       fields are not quoted, see "blank_is_undef"). See also "always_quote".
683
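       For example (a sketch):

        my $csv = Text::CSV_XS->new ({ quote_empty => 1 });
        $csv->combine (1, "", undef, 2) and print $csv->string, "\n";  # 1,"",,2
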
684       quote_binary
685
686        my $csv = Text::CSV_XS->new ({ quote_binary => 1 });
687                $csv->quote_binary (0);
688        my $f = $csv->quote_binary;
689
690       By default,  all "unsafe" bytes inside a string cause the combined
691       field to be quoted.  By setting this attribute to 0, you can disable
692       that trigger for bytes ">= 0x7F".
693
694       escape_null
695
696        my $csv = Text::CSV_XS->new ({ escape_null => 1 });
697                $csv->escape_null (0);
698        my $f = $csv->escape_null;
699
700       By default, a "NULL" byte in a field would be escaped. This option
701       enables you to treat the  "NULL"  byte as a simple binary character in
702       binary mode (when "{ binary => 1 }" is set).  The default is true.  You
703       can prevent "NULL" escapes by setting this attribute to 0.
704
705       When the "escape_char" attribute is set to undefined,  this attribute
706       will be set to false.
707
708       The default setting will encode "=\x00=" as
709
710        "="0="
711
712       With "escape_null" set to a false value, this will result in
713
714        "=\x00="
715
716       The default when using the "csv" function is "false".
717
718       For backward compatibility reasons,  the deprecated old name
719       "quote_null" is still recognized.
720
721       keep_meta_info
722
723        my $csv = Text::CSV_XS->new ({ keep_meta_info => 1 });
724                $csv->keep_meta_info (0);
725        my $f = $csv->keep_meta_info;
726
727       By default, the parsing of input records is as simple and fast as
728       possible.  However,  some parsing information - like quotation of the
729       original field - is lost in that process.  Setting this flag to true
730       enables retrieving that information after parsing with  the methods
731       "meta_info",  "is_quoted", and "is_binary" described below.  Default is
732       false for performance.
733
734       If you set this attribute to a value greater than 9,   then you can
735       control output quotation style like it was used in the input of the
736       last parsed record (unless quotation was added because of other
737       reasons).
738
739        my $csv = Text::CSV_XS->new ({
740           binary         => 1,
741           keep_meta_info => 1,
742           quote_space    => 0,
743           });
744
745        $csv->parse (q{1,,"", ," ",f,"g","h""h",help,"help"});
           my @row = $csv->fields;
746
747        $csv->print (*STDOUT, \@row);
748        # 1,,, , ,f,g,"h""h",help,help
749        $csv->keep_meta_info (11);
750        $csv->print (*STDOUT, \@row);
751        # 1,,"", ," ",f,"g","h""h",help,"help"
752
753       undef_str
754
755        my $csv = Text::CSV_XS->new ({ undef_str => "\\N" });
756                $csv->undef_str (undef);
757        my $s = $csv->undef_str;
758
759       This attribute optionally defines the output of undefined fields. The
760       value passed is not changed at all, so if it needs quotation, the
761       quotation needs to be included in the value of the attribute.  Use with
762       caution, as passing a value like  ",",,,,"""  will for sure mess up
763       your output. The default for this attribute is "undef", meaning no
764       special treatment.
765
766       This attribute is useful when exporting  CSV data  to be imported in
767       custom loaders, like for MySQL, that recognize special sequences for
768       "NULL" data.
769
770       This attribute has no meaning when parsing CSV data.
771
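       A sketch of generating "NULL" markers for such a loader  (assuming $fh is
       an open output handle):

        my $csv = Text::CSV_XS->new ({ undef_str => "\\N", eol => "\n" });
        $csv->print ($fh, [ 1, undef, "foo" ]);   # 1,\N,foo
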
772       comment_str
773
774        my $csv = Text::CSV_XS->new ({ comment_str => "#" });
775                $csv->comment_str (undef);
776        my $s = $csv->comment_str;
777
778       This attribute optionally defines a string to be recognized as a
779       comment.  If this attribute is defined,  all lines starting with this
780       sequence will not be parsed as CSV but skipped as comments.
781
782       This attribute has no meaning when generating CSV.
783
784       Comment strings that start with any of the special characters/sequences
785       are not supported (so it cannot start with any of "sep_char",
786       "quote_char", "escape_char", "sep", "quote", or "eol").
787
788       For convenience, "comment" is an alias for "comment_str".
789
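       A minimal sketch using the functional interface:

        use Text::CSV_XS qw( csv );

        my $data = "# this line is skipped\nfoo,bar\n1,2\n";
        my $aoh  = csv (in => \$data, headers => "auto", comment_str => "#");
        # [{ foo => 1, bar => 2 }]
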
790       verbatim
791
792        my $csv = Text::CSV_XS->new ({ verbatim => 1 });
793                $csv->verbatim (0);
794        my $f = $csv->verbatim;
795
796       This is a quite controversial attribute to set,  but makes some hard
797       things possible.
798
799       The rationale behind this attribute is to tell the parser that the
800       normally special characters newline ("NL") and Carriage Return ("CR")
801       will not be special when this flag is set,  and be dealt with  as being
802       ordinary binary characters. This will ease working with data with
803       embedded newlines.
804
805       When  "verbatim"  is used with  "getline",  "getline"  auto-"chomp"'s
806       every line.
807
808       Imagine a file format like
809
810        M^^Hans^Janssen^Klas 2\n2A^Ja^11-06-2007#\r\n
811
812       where the line ending is a very specific "#\r\n", and the sep_char is
813       a "^" (caret).   None of the fields is quoted,   but embedded binary
814       data is likely to be present. With the specific line ending, this
815       should not be too hard to detect.
816
817       By default,  Text::CSV_XS'  parse function is instructed to only know
818       about "\n" and "\r"  to be legal line endings,  and so has to deal with
819       the embedded newline as a real "end-of-line",  so it can scan the next
820       line if binary is true, and the newline is inside a quoted field. With
821       this option, we tell "parse" to treat the line as if "\n" were
822       nothing more than a binary character.
823
824       For "parse" this means that the parser has no more idea about line
825       ending and "getline" "chomp"s line endings on reading.
826
827       types
828
829       A set of column types; the attribute is immediately passed to the
830       "types" method.
831
832       callbacks
833
834       See the "Callbacks" section below.
835
836       accessors
837
838       To sum it up,
839
840        $csv = Text::CSV_XS->new ();
841
842       is equivalent to
843
844        $csv = Text::CSV_XS->new ({
845            eol                   => undef, # \r, \n, or \r\n
846            sep_char              => ',',
847            sep                   => undef,
848            quote_char            => '"',
849            quote                 => undef,
850            escape_char           => '"',
851            binary                => 0,
852            decode_utf8           => 1,
853            auto_diag             => 0,
854            diag_verbose          => 0,
855            blank_is_undef        => 0,
856            empty_is_undef        => 0,
857            allow_whitespace      => 0,
858            allow_loose_quotes    => 0,
859            allow_loose_escapes   => 0,
860            allow_unquoted_escape => 0,
861            always_quote          => 0,
862            quote_empty           => 0,
863            quote_space           => 1,
864            escape_null           => 1,
865            quote_binary          => 1,
866            keep_meta_info        => 0,
867            strict                => 0,
868            skip_empty_rows       => 0,
869            formula               => 0,
870            verbatim              => 0,
871            undef_str             => undef,
872            comment_str           => undef,
873            types                 => undef,
874            callbacks             => undef,
875            });
876
877       For all of the above mentioned flags, an accessor method is available
878       where you can inquire the current value, or change the value
879
880        my $quote = $csv->quote_char;
881        $csv->binary (1);
882
883       It is not wise to change these settings halfway through writing "CSV"
884       data to a stream. If however you want to create a new stream using the
885       available "CSV" object, there is no harm in changing them.
886
887       If the "new" constructor call fails,  it returns "undef",  and makes
888       the fail reason available through the "error_diag" method.
889
890        $csv = Text::CSV_XS->new ({ ecs_char => 1 }) or
891            die "".Text::CSV_XS->error_diag ();
892
893       "error_diag" will return a string like
894
895        "INI - Unknown attribute 'ecs_char'"
896
897   known_attributes
898        @attr = Text::CSV_XS->known_attributes;
899        @attr = Text::CSV_XS::known_attributes;
900        @attr = $csv->known_attributes;
901
902       This method will return an ordered list of all the supported
903       attributes as described above.   This can be useful for knowing what
904       attributes are valid in classes that use or extend Text::CSV_XS.
905
906   print
907        $status = $csv->print ($fh, $colref);
908
909       Similar to  "combine" + "string" + "print",  but much more efficient.
910       It expects an array ref as input  (not an array!)  and the resulting
911       string is not really  created,  but  immediately  written  to the  $fh
912       object, typically an IO handle or any other object that offers a
913       "print" method.
914
915       For performance reasons  "print"  does not create a result string,  so
916       all "string", "status", "fields", and "error_input" methods will return
917       undefined information after executing this method.
918
919       If $colref is "undef"  (explicit,  not through a variable argument) and
920       "bind_columns"  was used to specify fields to be printed,  it is
921       possible to make performance improvements, as otherwise data would have
922       to be copied as arguments to the method call:
923
924        $csv->bind_columns (\($foo, $bar));
925        $status = $csv->print ($fh, undef);
926
927       A short benchmark
928
929        my @data = ("aa" .. "zz");
930        $csv->bind_columns (\(@data));
931
932        $csv->print ($fh, [ @data ]);   # 11800 recs/sec
933        $csv->print ($fh,  \@data  );   # 57600 recs/sec
934        $csv->print ($fh,   undef  );   # 48500 recs/sec
935
936   say
937        $status = $csv->say ($fh, $colref);
938
939       Like "print", but "eol" defaults to "$\".
940
941   print_hr
942        $csv->print_hr ($fh, $ref);
943
944       Provides an easy way  to print a  $ref  (as fetched with "getline_hr")
945       provided the column names are set with "column_names".
946
947       It is just a wrapper method with basic parameter checks over
948
949        $csv->print ($fh, [ map { $ref->{$_} } $csv->column_names ]);
950
951   combine
952        $status = $csv->combine (@fields);
953
954       This method constructs a "CSV" record from  @fields,  returning success
955       or failure.   Failure can result from lack of arguments or an argument
956       that contains an invalid character.   Upon success,  "string" can be
957       called to retrieve the resultant "CSV" string.  Upon failure,  the
958       value returned by "string" is undefined and "error_input" could be
959       called to retrieve the invalid argument.
960
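       For example (a sketch):

        my $csv = Text::CSV_XS->new ({ binary => 1 });
        if ($csv->combine ("code", "with, comma", 42)) {
            my $line = $csv->string;   # code,"with, comma",42
            }
        else {
            warn "combine () failed on: ", $csv->error_input, "\n";
            }
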
961   string
962        $line = $csv->string ();
963
964       This method returns the input to  "parse"  or the resultant "CSV"
965       string of "combine", whichever was called more recently.
966
967   getline
968        $colref = $csv->getline ($fh);
969
970       This is the counterpart to  "print",  as "parse"  is the counterpart to
971       "combine":  it parses a row from the $fh  handle using the "getline"
972       method associated with $fh  and parses this row into an array ref.
973       This array ref is returned by the function or "undef" for failure.
974       When $fh does not support "getline", you are likely to hit errors.
975
976       When fields are bound with "bind_columns" the return value is a
977       reference to an empty list.
978
979       The "string", "fields", and "status" methods are meaningless again.
980
981   getline_all
982        $arrayref = $csv->getline_all ($fh);
983        $arrayref = $csv->getline_all ($fh, $offset);
984        $arrayref = $csv->getline_all ($fh, $offset, $length);
985
986       This will return a reference to a list of getline ($fh) results.  In
987       this call, "keep_meta_info" is disabled.  If $offset is negative, as
988       with "splice", only the last  "abs ($offset)" records of $fh are taken
989       into consideration. Parameters $offset and $length are expected to be
990       integers.  Non-integer values are interpreted as integers without
991       being checked.
992
993       Given a CSV file with 10 lines:
994
995        lines call
996        ----- ---------------------------------------------------------
997        0..9  $csv->getline_all ($fh)         # all
998        0..9  $csv->getline_all ($fh,  0)     # all
999        8..9  $csv->getline_all ($fh,  8)     # start at 8
1000        -     $csv->getline_all ($fh,  0,  0) # start at 0 first 0 rows
1001        0..4  $csv->getline_all ($fh,  0,  5) # start at 0 first 5 rows
1002        4..5  $csv->getline_all ($fh,  4,  2) # start at 4 first 2 rows
1003        8..9  $csv->getline_all ($fh, -2)     # last 2 rows
1004        6..7  $csv->getline_all ($fh, -4,  2) # first 2 of last  4 rows
1005
1006   getline_hr
1007       The "getline_hr" and "column_names" methods work together  to allow you
1008       to have rows returned as hashrefs.  You must call "column_names" first
1009       to declare your column names.
1010
1011        $csv->column_names (qw( code name price description ));
1012        $hr = $csv->getline_hr ($fh);
1013        print "Price for $hr->{name} is $hr->{price} EUR\n";
1014
1015       "getline_hr" will croak if called before "column_names".
1016
1017       Note that  "getline_hr"  creates a hashref for every row and will be
1018       much slower than the combined use of "bind_columns"  and "getline",
1019       while still offering the same easy-to-use hashref inside the loop:
1020
1021        my @cols = @{$csv->getline ($fh)};
1022        $csv->column_names (@cols);
1023        while (my $row = $csv->getline_hr ($fh)) {
1024            print $row->{price};
1025            }
1026
1027       Could easily be rewritten to the much faster:
1028
1029        my @cols = @{$csv->getline ($fh)};
1030        my $row = {};
1031        $csv->bind_columns (\@{$row}{@cols});
1032        while ($csv->getline ($fh)) {
1033            print $row->{price};
1034            }
1035
1036       Your mileage may vary for the size of the data and the number of rows.
1037       With perl-5.14.2 the comparison for a 100_000 line file with 14
1038       columns:
1039
1040                   Rate hashrefs getlines
1041        hashrefs 1.00/s       --     -76%
1042        getlines 4.15/s     313%       --
1043
1044   getline_hr_all
1045        $arrayref = $csv->getline_hr_all ($fh);
1046        $arrayref = $csv->getline_hr_all ($fh, $offset);
1047        $arrayref = $csv->getline_hr_all ($fh, $offset, $length);
1048
1049       This will return a reference to a list of   getline_hr ($fh) results.
1050       In this call, "keep_meta_info" is disabled.
1051
1052   parse
1053        $status = $csv->parse ($line);
1054
1055       This method decomposes a  "CSV"  string into fields,  returning success
1056       or failure.   Failure can result from a lack of argument  or the given
1057       "CSV" string is improperly formatted.   Upon success, "fields" can be
1058       called to retrieve the decomposed fields. Upon failure calling "fields"
1059       will return undefined data and  "error_input"  can be called to
1060       retrieve  the invalid argument.
1061
1062       You may use the "types"  method for setting column types.  See "types"'
1063       description below.
1064
1065       The $line argument is supposed to be a simple scalar. Everything else
1066       is supposed to croak and set error 1500.
1067
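       For example (a sketch):

        my $csv = Text::CSV_XS->new ({ binary => 1 });
        if ($csv->parse (q{1,"foo bar",42})) {
            my @fields = $csv->fields;   # ("1", "foo bar", "42")
            }
        else {
            warn "parse () failed on: ", $csv->error_input, "\n";
            }
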
1068   fragment
1069       This function tries to implement RFC7111  (URI Fragment Identifiers for
1070       the text/csv Media Type) -
1071       https://datatracker.ietf.org/doc/html/rfc7111
1072
1073        my $AoA = $csv->fragment ($fh, $spec);
1074
1075       In specifications,  "*" is used to specify the last item, a dash ("-")
1076       to indicate a range.   All indices are 1-based:  the first row or
1077       column has index 1. Selections can be combined with the semi-colon
1078       (";").
1079
1080       When using this method in combination with  "column_names",  the
1081       returned reference  will point to a  list of hashes  instead of a  list
1082       of lists.  A disjointed  cell-based combined selection  might return
1083       rows with different number of columns making the use of hashes
1084       unpredictable.
1085
1086        $csv->column_names ("Name", "Age");
1087        my $AoH = $csv->fragment ($fh, "col=3;8");
1088
1089       If the "after_parse" callback is active,  it is also called on every
1090       line parsed and skipped before the fragment.
1091
1092       row
1093          row=4
1094          row=5-7
1095          row=6-*
1096          row=1-2;4;6-*
1097
1098       col
1099          col=2
1100          col=1-3
1101          col=4-*
1102          col=1-2;4;7-*
1103
1104       cell
1105         In cell-based selection, the comma (",") is used to pair row and
1106         column
1107
1108          cell=4,1
1109
1110         The range operator ("-") using "cell"s can be used to define top-left
1111         and bottom-right "cell" location
1112
1113          cell=3,1-4,6
1114
1115         The "*" is only allowed in the second part of a pair
1116
1117          cell=3,2-*,2    # row 3 till end, only column 2
1118          cell=3,2-3,*    # column 2 till end, only row 3
1119          cell=3,2-*,*    # strip row 1 and 2, and column 1
1120
1121         Cells and cell ranges may be combined with ";", possibly resulting in
1122         rows with different numbers of columns
1123
1124          cell=1,1-2,2;3,3-4,4;1,4;4,1
1125
1126         Disjointed selections will only return selected cells.   The cells
1127         that are not  specified  will  not  be  included  in the  returned
1128         set,  not even as "undef".  As an example given a "CSV" like
1129
1130          11,12,13,...19
1131          21,22,...28,29
1132          :            :
1133          91,...97,98,99
1134
1135         with "cell=1,1-2,2;3,3-4,4;1,4;4,1" will return:
1136
1137          11,12,14
1138          21,22
1139          33,34
1140          41,43,44
1141
1142         Overlapping cell-specs will return those cells only once, so
1143         "cell=1,1-3,3;2,2-4,4;2,3;4,2" will return:
1144
1145          11,12,13
1146          21,22,23,24
1147          31,32,33,34
1148          42,43,44
1149
1150       RFC7111 <https://datatracker.ietf.org/doc/html/rfc7111> does  not
1151       allow different types of specs to be combined   (either "row" or "col"
1152       or "cell").  Passing an invalid fragment specification will croak and
1153       set error 2013.
1154
1155   column_names
1156       Set the "keys" that will be used in the  "getline_hr"  calls.  If no
1157       keys (column names) are passed, it will return the current setting as a
1158       list.
1159
1160       "column_names" accepts a list of scalars  (the column names)  or a
1161       single array_ref, so you can pass the return value from "getline" too:
1162
1163        $csv->column_names ($csv->getline ($fh));
1164
1165       "column_names" does no checking on duplicates at all, which might lead
1166       to unexpected results.   Undefined entries will be replaced with the
1167       string "\cAUNDEF\cA", so
1168
1169        $csv->column_names (undef, "", "name", "name");
1170        $hr = $csv->getline_hr ($fh);
1171
1172       will set "$hr->{"\cAUNDEF\cA"}" to the 1st field,  "$hr->{""}" to the
1173       2nd field, and "$hr->{name}" to the 4th field,  discarding the 3rd
1174       field.
1175
1176       "column_names" croaks on invalid arguments.
1177
1178   header
1179       This method does NOT work in perl-5.6.x
1180
1181       Parse the CSV header and set "sep", column_names and encoding.
1182
1183        my @hdr = $csv->header ($fh);
1184        $csv->header ($fh, { sep_set => [ ";", ",", "|", "\t" ] });
1185        $csv->header ($fh, { detect_bom => 1, munge_column_names => "lc" });
1186
1187       The first argument should be a file handle.
1188
1189       This method resets some object properties,  as it is supposed to be
1190       invoked only once per file or stream.  It will leave attributes
1191       "column_names" and "bound_columns" alone if setting column names is
1192       disabled. Reading headers on previously processed objects might fail on
1193       perl-5.8.0 and older.
1194
1195       Assuming that the file opened for parsing has a header, and the header
1196       does not contain problematic characters like embedded newlines,   read
1197       the first line from the open handle then auto-detect whether the header
1198       separates the column names with a character from the allowed separator
1199       list.
1200
1201       If any of the allowed separators matches,  and none of the other
1202       allowed separators match,  set  "sep"  to that  separator  for the
1203       current CSV_XS instance and use it to parse the first line,  map the
1204       resulting names to lowercase, and use them to set "column_names":
1205
1206        my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1 });
1207        open my $fh, "<", "file.csv";
1208        binmode $fh; # for Windows
1209        $csv->header ($fh);
1210        while (my $row = $csv->getline_hr ($fh)) {
1211            ...
1212            }
1213
1214       If the header is empty,  contains more than one unique separator out of
1215       the allowed set,  contains empty fields,   or contains identical fields
1216       (after folding), it will croak with error 1010, 1011, 1012, or 1013
1217       respectively.
1218
1219       If the header contains embedded newlines or is not valid  CSV  in any
1220       other way, this method will croak and leave the parse error untouched.
1221
1222       A successful call to "header"  will always set the  "sep"  of the $csv
1223       object. This behavior can not be disabled.
1224
1225       return value
1226
1227       On error this method will croak.
1228
1229       In list context,  the headers will be returned whether they are used to
1230       set "column_names" or not.
1231
1232       In scalar context, the instance itself is returned.  Note: the values
1233       as found in the header will effectively be  lost if  "set_column_names"
1234       is false.
1235
1236       Options
1237
1238       sep_set
1239          $csv->header ($fh, { sep_set => [ ";", ",", "|", "\t" ] });
1240
1241         The list of legal separators defaults to "[ ";", "," ]" and can be
1242         changed by this option.  As this is probably the most often used
1243         option,  it can be passed on its own as an unnamed argument:
1244
1245          $csv->header ($fh, [ ";", ",", "|", "\t", "::", "\x{2063}" ]);
1246
1247         Multi-byte  sequences are allowed,  both multi-character and
1248         Unicode.  See "sep".
1249
1250       detect_bom
1251          $csv->header ($fh, { detect_bom => 1 });
1252
1253         The default behavior is to detect if the header line starts with a
1254         BOM.  If the header has a BOM, use that to set the encoding of $fh.
1255         This default behavior can be disabled by passing a false value to
1256         "detect_bom".
1257
1258         Supported encodings from BOM are: UTF-8, UTF-16BE, UTF-16LE,
1259         UTF-32BE,  and UTF-32LE. BOM also supports UTF-1, UTF-EBCDIC, SCSU,
1260         BOCU-1,  and GB-18030 but Encode does not (yet). UTF-7 is not
1261         supported.
1262
1263         If a supported BOM was detected as start of the stream, it is stored
1264         in the object attribute "ENCODING".
1265
1266          my $enc = $csv->{ENCODING};
1267
1268         The encoding is used with "binmode" on $fh.
1269
1270         If the handle was opened in a (correct) encoding,  this method will
1271         not alter the encoding, as it checks the leading bytes of the first
1272         line. In case the stream starts with a decoded BOM ("U+FEFF"),
1273         "{ENCODING}" will be "" (empty) instead of the default "undef".
1274
1275       munge_column_names
1276         This option offers the means to modify the column names into
1277         something that is most useful to the application.   The default is to
1278         map all column names to lower case.
1279
1280          $csv->header ($fh, { munge_column_names => "lc" });
1281
1282         The following values are available:
1283
1284           lc     - lower case
1285           uc     - upper case
1286           db     - valid DB field names
1287           none   - do not change
1288           \%hash - supply a mapping
1289           \&cb   - supply a callback
1290
1291         Lower case
1292            $csv->header ($fh, { munge_column_names => "lc" });
1293
1294           The header is changed to all lower-case
1295
1296            $_ = lc;
1297
1298         Upper case
1299            $csv->header ($fh, { munge_column_names => "uc" });
1300
1301           The header is changed to all upper-case
1302
1303            $_ = uc;
1304
1305         Literal
1306            $csv->header ($fh, { munge_column_names => "none" });
1307
1308         Hash
1309            $csv->header ($fh, { munge_column_names => { foo => "sombrero" } });
1310
1311           If a value does not exist, the original value is used unchanged
1312
1313         Database
1314            $csv->header ($fh, { munge_column_names => "db" });
1315
1316           - lower-case
1317
1318           - all sequences of non-word characters are replaced with an
1319             underscore
1320
1321           - all leading underscores are removed
1322
1323            $_ = lc (s/\W+/_/gr =~ s/^_+//r);
1324
1325         Callback
1326            $csv->header ($fh, { munge_column_names => sub { fc } });
1327            $csv->header ($fh, { munge_column_names => sub { "column_".$col++ } });
1328            $csv->header ($fh, { munge_column_names => sub { lc (s/\W+/_/gr) } });
1329
1330           As this callback is called in a "map", you can use $_ directly.
1331
1332       set_column_names
1333          $csv->header ($fh, { set_column_names => 1 });
1334
1335         The default is to set the instance's column names using
1336         "column_names" if the method is successful,  so subsequent calls to
1337         "getline_hr" can return a hash. Setting the column names can be
1338         disabled by passing a false value for this option.
1339
1340         As described in "return value" above, content is lost in scalar
1341         context.
1342
1343       Validation
1344
1345       When receiving CSV files from external sources,  this method can be
1346       used to protect against changes in the layout by restricting to known
1347       headers  (and typos in the header fields).
1348
1349        my %known = (
1350            "record key" => "c_rec",
1351            "rec id"     => "c_rec",
1352            "id_rec"     => "c_rec",
1353            "kode"       => "code",
1354            "code"       => "code",
1355            "vaule"      => "value",
1356            "value"      => "value",
1357            );
1358        my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1 });
1359        open my $fh, "<", $source or die "$source: $!";
1360        $csv->header ($fh, { munge_column_names => sub {
1361            s/\s+$//;
1362            s/^\s+//;
1363            $known{lc $_} or die "Unknown column '$_' in $source";
1364            }});
1365        while (my $row = $csv->getline_hr ($fh)) {
1366            say join "\t", $row->{c_rec}, $row->{code}, $row->{value};
1367            }
1368
1369   bind_columns
1370       Takes a list of scalar references to be used for output with  "print"
1371       or to store in the fields fetched by "getline".  When you do not pass
1372       enough references to store the fetched fields in, "getline" will fail
1373       with error 3006.  If you pass more than there are fields to return,
1374       the content of the remaining references is left untouched.
1375
1376        $csv->bind_columns (\$code, \$name, \$price, \$description);
1377        while ($csv->getline ($fh)) {
1378            print "The price of a $name is \x{20ac} $price\n";
1379            }
1380
1381       To reset or clear all column binding, call "bind_columns" with the
1382       single argument "undef". This will also clear column names.
1383
1384        $csv->bind_columns (undef);
1385
1386       If no arguments are passed at all, "bind_columns" will return the list
1387       of current bindings or "undef" if no binds are active.
1388
1389       Note that in parsing with  "bind_columns",  the fields are set on the
1390       fly.  That implies that if the third field of a row causes an error
1391       (or this row has just two fields where the previous row had more),  the
1392       first two fields have already been assigned the values of the current
1393       row, while the rest of the fields will still hold the values of the
1394       previous row.  If you want the parser to fail in these cases, use the
1395       "strict" attribute.
1396
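       As a minimal sketch (the open file handle $fh and the three column
       names are assumptions for illustration),  combining "bind_columns"
       with "strict" makes a row with a deviating number of fields fatal
       instead of silently leaving stale values in the bound variables:

        my $csv = Text::CSV_XS->new ({ binary => 1, strict => 1, auto_diag => 2 });
        my ($code, $name, $price);
        $csv->bind_columns (\$code, \$name, \$price);
        while ($csv->getline ($fh)) {
            # with strict and auto_diag => 2, a short or long row dies here
            print "$code: $name ($price)\n";
            }
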
1397   eof
1398        $eof = $csv->eof ();
1399
1400       If "parse" or  "getline"  was used with an IO stream,  this method will
1401       return true (1) if the last call hit end of file,  otherwise it will
1402       return false ('').  This is useful to see the difference between a
1403       failure and end of file.
1404
1405       Note that if the parsing of the last line caused an error,  "eof" is
1406       still true.  That means that if you are not using "auto_diag", an idiom
1407       like
1408
1409        while (my $row = $csv->getline ($fh)) {
1410            # ...
1411            }
1412        $csv->eof or $csv->error_diag;
1413
1414       will not report the error. You would have to change that to
1415
1416        while (my $row = $csv->getline ($fh)) {
1417            # ...
1418            }
1419        +$csv->error_diag and $csv->error_diag;
1420
1421   types
1422        $csv->types (\@tref);
1423
1424       This method is used to force that  (all)  columns are of a given type.
1425       For example, if you have an integer column,  two  columns  with
1426       doubles  and a string column, then you might do a
1427
1428        $csv->types ([Text::CSV_XS::IV (),
1429                      Text::CSV_XS::NV (),
1430                      Text::CSV_XS::NV (),
1431                      Text::CSV_XS::PV ()]);
1432
1433       Column types are used only for decoding columns while parsing,  in
1434       other words by the "parse" and "getline" methods.
1435
1436       You can unset column types by doing a
1437
1438        $csv->types (undef);
1439
1440       or fetch the current type settings with
1441
1442        $types = $csv->types ();
1443
1444       IV
1445       CSV_TYPE_IV
1446           Set field type to integer.
1447
1448       NV
1449       CSV_TYPE_NV
1450           Set field type to numeric/float.
1451
1452       PV
1453       CSV_TYPE_PV
1454           Set field type to string.
1455
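       With the ":CONSTANTS" import (see "IMPORTS/EXPORTS" below), a sketch
       of the same type assignment using the exported constants:

        use Text::CSV_XS qw( :CONSTANTS );
        $csv->types ([ CSV_TYPE_IV, CSV_TYPE_NV, CSV_TYPE_NV, CSV_TYPE_PV ]);
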
1456   fields
1457        @columns = $csv->fields ();
1458
1459       This method returns the input to   "combine"  or the resultant
1460       decomposed fields of a successful "parse", whichever was called more
1461       recently.
1462
1463       Note that the return value is undefined after using "getline", which
1464       does not fill the data structures returned by "parse".
1465
1466   meta_info
1467        @flags = $csv->meta_info ();
1468
1469       This method returns the "flags" of the input to "combine" or the flags
1470       of the resultant  decomposed fields of  "parse",   whichever was called
1471       more recently.
1472
1473       For each field,  a meta_info field will hold  flags that  give
1474       information about  the  field  returned  by  the  "fields"  method or
1475       passed to  the "combine" method. The flags are bit-wise-"or"'d like:
1476
1477       0x0001
1478       "CSV_FLAGS_IS_QUOTED"
1479         The field was quoted.
1480
1481       0x0002
1482       "CSV_FLAGS_IS_BINARY"
1483         The field was binary.
1484
1485       0x0004
1486       "CSV_FLAGS_ERROR_IN_FIELD"
1487         The field was invalid.
1488
1489         Currently only used when "allow_loose_quotes" is active.
1490
1491       0x0010
1492       "CSV_FLAGS_IS_MISSING"
1493         The field was missing.
1494
1495       See the "is_***" methods below.
1496
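       A short sketch of testing these flags directly.  This requires
       "keep_meta_info";  the ":CONSTANTS" import provides the flag names:

        use Text::CSV_XS qw( :CONSTANTS );
        my $csv = Text::CSV_XS->new ({ keep_meta_info => 1 });
        $csv->parse (q{1,"two",3});
        my @flags = $csv->meta_info;
        # the second field was quoted, so bit 0x0001 is set at index 1
        $flags[1] & CSV_FLAGS_IS_QUOTED and print "field 2 was quoted\n";
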
1497   is_quoted
1498        my $quoted = $csv->is_quoted ($column_idx);
1499
1500       where  $column_idx is the  (zero-based)  index of the column in the
1501       last result of "parse".
1502
1503       This returns a true value  if the data in the indicated column was
1504       enclosed in "quote_char" quotes.  This might be important for fields
1505       where content ",20070108," is to be treated as a numeric value,  and
1506       where ","20070108"," is explicitly marked as character string data.
1507
1508       This method is only valid when "keep_meta_info" is set to a true value.
1509
1510   is_binary
1511        my $binary = $csv->is_binary ($column_idx);
1512
1513       where  $column_idx is the  (zero-based)  index of the column in the
1514       last result of "parse".
1515
1516       This returns a true value if the data in the indicated column contained
1517       any byte in the range "[\x00-\x08,\x10-\x1F,\x7F-\xFF]".
1518
1519       This method is only valid when "keep_meta_info" is set to a true value.
1520
1521   is_missing
1522        my $missing = $csv->is_missing ($column_idx);
1523
1524       where  $column_idx is the  (zero-based)  index of the column in the
1525       last result of "getline_hr".
1526
1527        $csv->keep_meta_info (1);
1528        while (my $hr = $csv->getline_hr ($fh)) {
1529            $csv->is_missing (0) and next; # This was an empty line
1530            }
1531
1532       When using  "getline_hr",  it is impossible to tell if the  parsed
1533       fields are "undef" because they were not filled in the "CSV" stream
1534       or because they were not read at all, as all the fields defined by
1535       "column_names" are set in the hash-ref.    If you still need to know if
1536       all fields in each row are provided, you should enable "keep_meta_info"
1537       so you can check the flags.
1538
1539       If  "keep_meta_info"  is "false",  "is_missing"  will always return
1540       "undef", regardless of $column_idx being valid or not. If this
1541       attribute is "true" it will return either 0 (the field is present) or 1
1542       (the field is missing).
1543
1544       A special case is the empty line.  If the line is completely empty -
1545       after dealing with the flags - this is still a valid CSV line:  it is a
1546       record of just one single empty field. However, if "keep_meta_info" is
1547       set, invoking "is_missing" with index 0 will now return true.
1548
1549   status
1550        $status = $csv->status ();
1551
1552       This method returns the status of the last invoked "combine" or "parse"
1553       call. Status is success (true: 1) or failure (false: "undef" or 0).
1554
1555       Note that as this only keeps track of the status of above mentioned
1556       methods, you are probably looking for "error_diag" instead.
1557
1558   error_input
1559        $bad_argument = $csv->error_input ();
1560
1561       This method returns the erroneous argument (if it exists) of "combine"
1562       or "parse",  whichever was called more recently.  If the last
1563       invocation was successful, "error_input" will return "undef".
1564
1565       Depending on the type of error, it might also hold the data for the
1566       last error-input of "getline".
1567
1568   error_diag
1569        Text::CSV_XS->error_diag ();
1570        $csv->error_diag ();
1571        $error_code               = 0  + $csv->error_diag ();
1572        $error_str                = "" . $csv->error_diag ();
1573        ($cde, $str, $pos, $rec, $fld) = $csv->error_diag ();
1574
1575       If (and only if) an error occurred,  this function returns  the
1576       diagnostics of that error.
1577
1578       If called in void context,  this will print the internal error code and
1579       the associated error message to STDERR.
1580
1581       If called in list context,  this will return  the error code  and the
1582       error message in that order.  If the last error was from parsing, the
1583       rest of the values returned are a best guess at the location  within
1584       the line  that was being parsed. Their values are 1-based.  The
1585       position is currently the index of the byte at which the parsing
1586       failed in the current record.  It might change to be the index of the
1587       current character in a later release.  The record is the index of the
1588       record parsed by the csv instance.  The field number is the index of
1589       the field the parser thinks it is currently trying to parse. See
1590       examples/csv-check for how this can be used.
1591
1592       If called in  scalar context,  it will return  the diagnostics  in a
1593       single scalar, a-la $!.  It will contain the error code in numeric
1594       context, and the diagnostics message in string context.
1595
1596       When called as a class method or a  direct function call,  the
1597       diagnostics are that of the last "new" call.
1598
1599   record_number
1600        $recno = $csv->record_number ();
1601
1602       Returns the number of records parsed by this csv instance.  This
1603       value should be more accurate than $. when embedded newlines come
1604       into play.  Records written by this instance are not counted.
1605
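       A small sketch (using an in-memory handle for illustration):  a field
       with an embedded newline spans two physical lines,  but still counts
       as a single record:

        my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1 });
        open my $io, "<", \qq{a,"1\n2"\nb,3\n} or die $!;
        while ($csv->getline ($io)) {
            print "parsed record ", $csv->record_number, "\n"; # 1, then 2
            }
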
1606   SetDiag
1607        $csv->SetDiag (0);
1608
1609       Use this method to reset the diagnostics if you are dealing with errors.
1610

IMPORTS/EXPORTS

1612       By default none of these are exported.
1613
1614       csv
1615          use Text::CSV_XS qw( csv );
1616
1617         Import the "csv" function. See below.
1618
1619       :CONSTANTS
1620          use Text::CSV_XS qw( :CONSTANTS );
1621
1622         Import module constants  "CSV_FLAGS_IS_QUOTED",
1623         "CSV_FLAGS_IS_BINARY", "CSV_FLAGS_ERROR_IN_FIELD",
1624         "CSV_FLAGS_IS_MISSING",   "CSV_TYPE_PV", "CSV_TYPE_IV", and
1625         "CSV_TYPE_NV". Each can be imported alone
1626
1627          use Text::CSV_XS qw( CSV_FLAGS_IS_BINARY CSV_TYPE_NV );
1628

FUNCTIONS

1630   csv
1631       This function is not exported by default and should be explicitly
1632       requested:
1633
1634        use Text::CSV_XS qw( csv );
1635
1636       This is a high-level function that aims at simple (user) interfaces.
1637       This can be used to read/parse a "CSV" file or stream (the default
1638       behavior) or to produce a file or write to a stream (define the  "out"
1639       attribute).  It returns an array- or hash-reference on parsing (or
1640       "undef" on fail) or the numeric value of  "error_diag"  on writing.
1641       When this function fails you can get to the error using the class call
1642       to "error_diag"
1643
1644        my $aoa = csv (in => "test.csv") or
1645            die Text::CSV_XS->error_diag;
1646
1647       This function takes the arguments as key-value pairs. This can be
1648       passed as a list or as an anonymous hash:
1649
1650        my $aoa = csv (  in => "test.csv", sep_char => ";");
1651        my $aoh = csv ({ in => $fh, headers => "auto" });
1652
1653       The arguments passed consist of two parts:  the arguments to "csv"
1654       itself and the optional attributes to the  "CSV"  object used inside
1655       the function as enumerated and explained in "new".
1656
1657       If not overridden, the default options used for CSV are
1658
1659        auto_diag   => 1
1660        escape_null => 0
1661
1662       The option that is always set and cannot be altered is
1663
1664        binary      => 1
1665
1666       As this function will likely be used in one-liners,  it allows  "quote"
1667       to be abbreviated as "quo",  and  "escape_char" to be abbreviated as
1668       "esc" or "escape".
1669
1670       Alternative invocations:
1671
1672        my $aoa = Text::CSV_XS::csv (in => "file.csv");
1673
1674        my $csv = Text::CSV_XS->new ();
1675        my $aoa = $csv->csv (in => "file.csv");
1676
1677       In the latter case, the object attributes are used from the existing
1678       object and the attribute arguments in the function call are ignored:
1679
1680        my $csv = Text::CSV_XS->new ({ sep_char => ";" });
1681        my $aoh = $csv->csv (in => "file.csv", bom => 1);
1682
1683       will parse using ";" as "sep_char", not ",".
1684
1685       in
1686
1687       Used to specify the source.  "in" can be a file name (e.g. "file.csv"),
1688       which will be  opened for reading  and closed when finished,  a file
1689       handle (e.g.  $fh or "FH"),  a reference to a glob (e.g. "\*ARGV"),
1690       the glob itself (e.g. *STDIN), or a reference to a scalar (e.g.
1691       "\q{1,2,"csv"}").
1692
1693       When used with "out", "in" should be a reference to a CSV structure
1694       (AoA or AoH)  or a CODE-ref that returns an array-reference or a hash-
1695       reference.  The code-ref will be invoked with no arguments.
1696
1697        my $aoa = csv (in => "file.csv");
1698
1699        open my $fh, "<", "file.csv";
1700        my $aoa = csv (in => $fh);
1701
1702        my $csv = [ [qw( Foo Bar )], [ 1, 2 ], [ 2, 3 ]];
1703        my $err = csv (in => $csv, out => "file.csv");
1704
1705       If called in void context without the "out" attribute, the resulting
1706       ref will be used as input to a subsequent call to csv:
1707
1708        csv (in => "file.csv", filter => { 2 => sub { length > 2 }})
1709
1710       will be a shortcut to
1711
1712        csv (in => csv (in => "file.csv", filter => { 2 => sub { length > 2 }}))
1713
1714       where, in the absence of the "out" attribute, this is a shortcut to
1715
1716        csv (in  => csv (in => "file.csv", filter => { 2 => sub { length > 2 }}),
1717             out => *STDOUT)
1718
1719       out
1720
1721        csv (in => $aoa, out => "file.csv");
1722        csv (in => $aoa, out => $fh);
1723        csv (in => $aoa, out =>   STDOUT);
1724        csv (in => $aoa, out =>  *STDOUT);
1725        csv (in => $aoa, out => \*STDOUT);
1726        csv (in => $aoa, out => \my $data);
1727        csv (in => $aoa, out =>  undef);
1728        csv (in => $aoa, out => \"skip");
1729
1730        csv (in => $fh,  out => \@aoa);
1731        csv (in => $fh,  out => \@aoh, bom => 1);
1732        csv (in => $fh,  out => \%hsh, key => "key");
1733
1734       In output mode, the default CSV options when producing CSV are
1735
1736        eol       => "\r\n"
1737
1738       The "fragment" attribute is ignored in output mode.
1739
1740       "out" can be a file name  (e.g.  "file.csv"),  which will be opened for
1741       writing and closed when finished,  a file handle (e.g. $fh or "FH"),  a
1742       reference to a glob (e.g. "\*STDOUT"),  the glob itself (e.g. *STDOUT),
1743       or a reference to a scalar (e.g. "\my $data").
1744
1745        csv (in => sub { $sth->fetch },            out => "dump.csv");
1746        csv (in => sub { $sth->fetchrow_hashref }, out => "dump.csv",
1747             headers => $sth->{NAME_lc});
1748
1749       When a code-ref is used for "in", the output is generated  per
1750       invocation, so no buffering is involved. This implies that there is no
1751       size restriction on the number of records. The "csv" function ends when
1752       the coderef returns a false value.
1753
1754       If "out" is set to a reference of the literal string "skip", the output
1755       will be suppressed completely,  which might be useful in combination
1756       with a filter for side effects only.
1757
1758        my %cache;
1759        csv (in    => "dump.csv",
1760             out   => \"skip",
1761             on_in => sub { $cache{$_[1][1]}++ });
1762
1763       Currently,  setting "out" to any false value  ("undef", "", 0) will be
1764       equivalent to "\"skip"".
1765
1766       If the "in" argument points to something to parse, and the "out" is set
1767       to a reference to an "ARRAY" or a "HASH", the output is appended to the
1768       data in the existing reference. The result of the parse should match
1769       what exists in the reference passed.  This might come in handy when you
1770       have to parse a set of files with similar content (like data stored per
1771       period) and you want to collect that into a single data structure:
1772
1773        my %hash;
1774        csv (in => $_, out => \%hash, key => "id") for sort glob "foo-[0-9]*.csv";
1775
1776        my @list; # List of arrays
1777        csv (in => $_, out => \@list)              for sort glob "foo-[0-9]*.csv";
1778
1779        my @list; # List of hashes
1780        csv (in => $_, out => \@list, bom => 1)    for sort glob "foo-[0-9]*.csv";
1781
1782       encoding
1783
1784       If passed,  it should be an encoding accepted by the  :encoding()
1785       option to "open". There is no default value. This attribute does not
1786       work in perl 5.6.x.  "encoding" can be abbreviated to "enc" for ease of
1787       use in command line invocations.
1788
1789       If "encoding" is set to the literal value "auto", the method "header"
1790       will be invoked on the opened stream to check if there is a BOM and set
1791       the encoding accordingly.   This is equivalent to passing a true value
1792       the option "detect_bom".
1793
1794       Encodings can be stacked, as supported by "binmode":
1795
1796        # Using PerlIO::via::gzip
1797        csv (in       => \@csv,
1798             out      => "test.csv:via.gz",
1799             encoding => ":via(gzip):encoding(utf-8)",
1800             );
1801        $aoa = csv (in => "test.csv:via.gz",  encoding => ":via(gzip)");
1802
1803        # Using PerlIO::gzip
1804        csv (in       => \@csv,
1805             out      => "test.csv:gzip.gz",
1806             encoding => ":gzip:encoding(utf-8)",
1807             );
1808        $aoa = csv (in => "test.csv:gzip.gz", encoding => ":gzip");
1809
1810       detect_bom
1811
1812       If  "detect_bom"  is given, the method  "header"  will be invoked on
1813       the opened stream to check if there is a BOM and set the encoding
1814       accordingly.
1815
1816       "detect_bom" can be abbreviated to "bom".
1817
1818       This is the same as setting "encoding" to "auto".
1819
1820       Note that as the method  "header" is invoked,  its default is to also
1821       set the headers.
1822
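       A minimal sketch (assuming "file.csv" starts with an optional BOM
       followed by a header line):  this returns an array of hashes and
       picks the encoding from the BOM when one is present:

        my $aoh = csv (in => "file.csv", detect_bom => 1);
        my $aoh = csv (in => "file.csv", bom => 1);        # same, abbreviated
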
1823       headers
1824
1825       If this attribute is not given, the default behavior is to produce an
1826       array of arrays.
1827
1828       If "headers" is supplied,  it should be an anonymous list of column
1829       names, an anonymous hashref, a coderef, or a literal flag:  "auto",
1830       "lc", "uc", or "skip".
1831
1832       skip
1833         When "skip" is used, the header will not be included in the output.
1834
1835          my $aoa = csv (in => $fh, headers => "skip");
1836
1837       auto
1838         If "auto" is used, the first line of the "CSV" source will be read as
1839         the list of field headers and used to produce an array of hashes.
1840
1841          my $aoh = csv (in => $fh, headers => "auto");
1842
1843       lc
1844         If "lc" is used,  the first line of the  "CSV" source will be read as
1845         the list of field headers mapped to  lower case and used to produce
1846         an array of hashes. This is a variation of "auto".
1847
1848          my $aoh = csv (in => $fh, headers => "lc");
1849
1850       uc
1851         If "uc" is used,  the first line of the  "CSV" source will be read as
1852         the list of field headers mapped to  upper case and used to produce
1853         an array of hashes. This is a variation of "auto".
1854
1855          my $aoh = csv (in => $fh, headers => "uc");
1856
1857       CODE
1858         If a coderef is used,  the first line of the  "CSV" source will be
1859         read as the list of mangled field headers in which each field is
1860         passed as the only argument to the coderef. This list is used to
1861         produce an array of hashes.
1862
1863          my $aoh = csv (in      => $fh,
1864                         headers => sub { lc ($_[0]) =~ s/kode/code/gr });
1865
1866         this example is a variation of using "lc" where all occurrences of
1867         "kode" are replaced with "code".
1868
1869       ARRAY
1870         If  "headers"  is an anonymous list,  the entries in the list will be
1871         used as field names. The first line is considered data instead of
1872         headers.
1873
1874          my $aoh = csv (in => $fh, headers => [qw( Foo Bar )]);
1875          csv (in => $aoa, out => $fh, headers => [qw( code description price )]);
1876
1877       HASH
1878         If "headers" is a hash reference, this implies "auto", but header
1879         fields that exist as key in the hashref will be replaced by the value
1880         for that key. Given a CSV file like
1881
1882          post-kode,city,name,id number,fubble
1883          1234AA,Duckstad,Donald,13,"X313DF"
1884
1885         using
1886
1887          csv (headers => { "post-kode" => "pc", "id number" => "ID" }, ...
1888
1889         will return an entry like
1890
1891          { pc     => "1234AA",
1892            city   => "Duckstad",
1893            name   => "Donald",
1894            ID     => "13",
1895            fubble => "X313DF",
1896            }
1897
1898       See also "munge_column_names" and "set_column_names".
1899
1900       munge_column_names
1901
1902       If "munge_column_names" is set,  the method  "header"  is invoked on
1903       the opened stream with all matching arguments to detect and set the
1904       headers.
1905
1906       "munge_column_names" can be abbreviated to "munge".
1907
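       A brief sketch (assuming "file.csv" has a header line) that folds all
       column names to lower case while reading into hashes:

        my $aoh = csv (in => "file.csv", bom => 1, munge => "lc");
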
1908       key
1909
1910       If passed,  will default  "headers"  to "auto" and return a hashref
1911       instead of an array of hashes. Allowed values are simple scalars or
1912       array-references where the first element is the joiner and the rest are
1913       the fields to join to combine the key.
1914
1915        my $ref = csv (in => "test.csv", key => "code");
1916        my $ref = csv (in => "test.csv", key => [ ":" => "code", "color" ]);
1917
1918       with test.csv like
1919
1920        code,product,price,color
1921        1,pc,850,gray
1922        2,keyboard,12,white
1923        3,mouse,5,black
1924
1925       the first example will return
1926
1927         { 1   => {
1928               code    => 1,
1929               color   => 'gray',
1930               price   => 850,
1931               product => 'pc'
1932               },
1933           2   => {
1934               code    => 2,
1935               color   => 'white',
1936               price   => 12,
1937               product => 'keyboard'
1938               },
1939           3   => {
1940               code    => 3,
1941               color   => 'black',
1942               price   => 5,
1943               product => 'mouse'
1944               }
1945           }
1946
1947       the second example will return
1948
1949         { "1:gray"    => {
1950               code    => 1,
1951               color   => 'gray',
1952               price   => 850,
1953               product => 'pc'
1954               },
1955           "2:white"   => {
1956               code    => 2,
1957               color   => 'white',
1958               price   => 12,
1959               product => 'keyboard'
1960               },
1961           "3:black"   => {
1962               code    => 3,
1963               color   => 'black',
1964               price   => 5,
1965               product => 'mouse'
1966               }
1967           }
1968
1969       The "key" attribute can be combined with "headers" for "CSV" data that
1970       has no header line, like
1971
1972        my $ref = csv (
1973            in      => "foo.csv",
1974            headers => [qw( c_foo foo bar description stock )],
1975            key     =>     "c_foo",
1976            );
1977
1978       value
1979
1980       Used to create key-value hashes.
1981
1982       Only allowed when "key" is valid. A "value" can be either a single
1983       column label or an anonymous list of column labels.  In the first case,
1984       the value will be a simple scalar value;  in the latter case, it will be
1985       a hashref.
1986
1987        my $ref = csv (in => "test.csv", key   => "code",
1988                                         value => "price");
1989        my $ref = csv (in => "test.csv", key   => "code",
1990                                         value => [ "product", "price" ]);
1991        my $ref = csv (in => "test.csv", key   => [ ":" => "code", "color" ],
1992                                         value => "price");
1993        my $ref = csv (in => "test.csv", key   => [ ":" => "code", "color" ],
1994                                         value => [ "product", "price" ]);
1995
1996       with test.csv like
1997
1998        code,product,price,color
1999        1,pc,850,gray
2000        2,keyboard,12,white
2001        3,mouse,5,black
2002
2003       the first example will return
2004
2005         { 1 => 850,
2006           2 =>  12,
2007           3 =>   5,
2008           }
2009
2010       the second example will return
2011
2012         { 1   => {
2013               price   => 850,
2014               product => 'pc'
2015               },
2016           2   => {
2017               price   => 12,
2018               product => 'keyboard'
2019               },
2020           3   => {
2021               price   => 5,
2022               product => 'mouse'
2023               }
2024           }
2025
2026       the third example will return
2027
2028         { "1:gray"    => 850,
2029           "2:white"   =>  12,
2030           "3:black"   =>   5,
2031           }
2032
2033       the fourth example will return
2034
2035         { "1:gray"    => {
2036               price   => 850,
2037               product => 'pc'
2038               },
2039           "2:white"   => {
2040               price   => 12,
2041               product => 'keyboard'
2042               },
2043           "3:black"   => {
2044               price   => 5,
2045               product => 'mouse'
2046               }
2047           }
2048
2049       keep_headers
2050
2051       When using hashes,  keep the column names in the arrayref passed,  so
2052       all headers are available after the call in their original order.
2053
2054        my $aoh = csv (in => "file.csv", keep_headers => \my @hdr);
2055
2056       This attribute can be abbreviated to "kh" or passed as
2057       "keep_column_names".
2058
2059       This attribute implies a default of "auto" for the "headers" attribute.
2060
2061       The headers can also be kept internally to keep stable header order:
2062
2063        csv (in      => csv (in => "file.csv", kh => "internal"),
2064             out     => "new.csv",
2065             kh      => "internal");
2066
2067       where "internal" can also be 1, "yes", or "true". This is similar to
2068
2069        my @h;
2070        csv (in      => csv (in => "file.csv", kh => \@h),
2071             out     => "new.csv",
2072             headers => \@h);
2073
2074       fragment
2075
2076       Only output the fragment as defined in the "fragment" method. This
2077       option is ignored when generating "CSV". See "out".
2078
2079       Combining all of them could give something like
2080
2081        use Text::CSV_XS qw( csv );
2082        my $aoh = csv (
2083            in       => "test.txt",
2084            encoding => "utf-8",
2085            headers  => "auto",
2086            sep_char => "|",
2087            fragment => "row=3;6-9;15-*",
2088            );
2089        say $aoh->[15]{Foo};
2090
2091       sep_set
2092
2093       If "sep_set" is set, the method "header" is invoked on the opened
2094       stream to detect and set "sep_char" with the given set.
2095
2096       "sep_set" can be abbreviated to "seps". If neither "sep_set" nor "seps"
2097       is given, but "sep" is defined, "sep_set" defaults to "[ sep ]". This
2098       is only supported for perl version 5.10 and up.
2099
2100       Note that as the  "header" method is invoked,  its default is to also
2101       set the headers.
2102
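       A brief sketch (the file name is an assumption),  letting "header"
       pick the separator from a set of candidates:

        my $aoa = csv (in => "file.csv", sep_set => [ ";", ",", "|", "\t" ]);
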
2103       set_column_names
2104
2105       If  "set_column_names" is passed,  the method "header" is invoked on
2106       the opened stream with all arguments meant for "header".
2107
2108       If "set_column_names" is passed as a false value, the content of the
2109       first row is only preserved if the output is AoA:
2110
2111       With an input-file like
2112
2113        bAr,foo
2114        1,2
2115        3,4,5
2116
2117       This call
2118
2119        my $aoa = csv (in => $file, set_column_names => 0);
2120
2121       will result in
2122
2123        [[ "bar", "foo"     ],
2124         [ "1",   "2"       ],
2125         [ "3",   "4",  "5" ]]
2126
2127       and
2128
2129        my $aoa = csv (in => $file, set_column_names => 0, munge => "none");
2130
2131       will result in
2132
2133        [[ "bAr", "foo"     ],
2134         [ "1",   "2"       ],
2135         [ "3",   "4",  "5" ]]
2136
2137   Callbacks
2138       Callbacks enable actions triggered from the inside of Text::CSV_XS.
2139
2140       While most of what this enables  can easily be done in an  unrolled
2141       loop as described in the "SYNOPSIS",  callbacks can be used to meet
2142       special demands or enhance the "csv" function.
2143
2144       error
2145          $csv->callbacks (error => sub { $csv->SetDiag (0) });
2146
2147         the "error"  callback is invoked when an error occurs,  but  only
2148         when "auto_diag" is set to a true value. A callback is invoked with
2149         the values returned by "error_diag":
2150
2151          my ($c, $s);
2152
2153          sub ignore3006 {
2154              my ($err, $msg, $pos, $recno, $fldno) = @_;
2155              if ($err == 3006) {
2156                  # ignore this error
2157                  ($c, $s) = (undef, undef);
2158                  Text::CSV_XS->SetDiag (0);
2159                  }
2160              # Any other error
2161              return;
2162              } # ignore3006
2163
2164          $csv->callbacks (error => \&ignore3006);
2165          $csv->bind_columns (\$c, \$s);
2166          while ($csv->getline ($fh)) {
2167              # Error 3006 will not stop the loop
2168              }
2169
2170       after_parse
2171          $csv->callbacks (after_parse => sub { push @{$_[1]}, "NEW" });
2172          while (my $row = $csv->getline ($fh)) {
2173              $row->[-1] eq "NEW";
2174              }
2175
2176         This callback is invoked after parsing with  "getline"  only if no
2177         error occurred.  The callback is invoked with two arguments:   the
2178         current "CSV" parser object and an array reference to the fields
2179         parsed.
2180
2181         The return code of the callback is ignored  unless it is a reference
2182         to the string "skip", in which case the record will be skipped in
2183         "getline_all".
2184
2185          sub add_from_db {
2186              my ($csv, $row) = @_;
2187              $sth->execute ($row->[4]);
2188              push @$row, $sth->fetchrow_array;
2189              } # add_from_db
2190
2191          my $aoa = csv (in => "file.csv", callbacks => {
2192              after_parse => \&add_from_db });
2193
2194         This hook can be used for validation:
2195
2196         FAIL
2197           Die if any of the records does not validate a rule:
2198
2199            after_parse => sub {
2200                $_[1][4] =~ m/^[0-9]{4}\s?[A-Z]{2}$/ or
2201                    die "5th field does not have a valid Dutch zipcode";
2202                }
2203
2204         DEFAULT
2205           Replace invalid fields with a default value:
2206
2207            after_parse => sub { $_[1][2] =~ m/^\d+$/ or $_[1][2] = 0 }
2208
2209         SKIP
2210           Skip records that have invalid fields (only applies to
2211           "getline_all"):
2212
2213            after_parse => sub { $_[1][0] =~ m/^\d+$/ or return \"skip"; }
2214
2215       before_print
2216          my $idx = 1;
2217          $csv->callbacks (before_print => sub { $_[1][0] = $idx++ });
2218          $csv->print (*STDOUT, [ 0, $_ ]) for @members;
2219
2220         This callback is invoked  before printing with  "print"  only if no
2221         error occurred.  The callback is invoked with two arguments:  the
2222         current  "CSV" parser object and an array reference to the fields
2223         passed.
2224
2225         The return code of the callback is ignored.
2226
2227          sub max_4_fields {
2228              my ($csv, $row) = @_;
2229              @$row > 4 and splice @$row, 4;
2230              } # max_4_fields
2231
2232          csv (in => csv (in => "file.csv"), out => *STDOUT,
2233              callbacks => { before_print => \&max_4_fields });
2234
2235         This callback is not active for "combine".
2236
2237       Callbacks for csv ()
2238
2239       The "csv" function allows for some callbacks that do not integrate
2240       with the XS internals but only apply to the "csv" function itself.
2241
2242         csv (in        => "file.csv",
2243              callbacks => {
2244                  filter       => { 6 => sub { $_ > 15 } },    # first
2245                  after_parse  => sub { say "AFTER PARSE";  }, # first
2246                  after_in     => sub { say "AFTER IN";     }, # second
2247                  on_in        => sub { say "ON IN";        }, # third
2248                  },
2249              );
2250
2251         csv (in        => $aoh,
2252              out       => "file.csv",
2253              callbacks => {
2254                  on_in        => sub { say "ON IN";        }, # first
2255                  before_out   => sub { say "BEFORE OUT";   }, # second
2256                  before_print => sub { say "BEFORE PRINT"; }, # third
2257                  },
2258              );
2259
2260       filter
2261         This callback can be used to filter records.  It is called just after
2262         a new record has been scanned.  The callback accepts a:
2263
2264         hashref
2265           The keys are the index to the row (the field name or field number,
2266           1-based) and the values are subs to return a true or false value.
2267
2268            csv (in => "file.csv", filter => {
2269                       3 => sub { m/a/ },       # third field should contain an "a"
2270                       5 => sub { length > 4 }, # length of the 5th field at least 5
2271                       });
2272
2273            csv (in => "file.csv", filter => { foo => sub { $_ > 4 }});
2274
2275           If the keys to the filter hash contain any character that is not a
2276           digit,  it will also implicitly set "headers" to "auto"  unless
2277           "headers"  was already passed as argument.  When headers are
2278           active, returning an array of hashes, the filter is not applicable
2279           to the header itself.
2280
2281           All sub results should match, as in AND.
2282
2283           The context of the callback sets  $_ localized to the field
2284           indicated by the filter. The two arguments are as with all other
2285           callbacks, so the other fields in the current row can be seen:
2286
2287            filter => { 3 => sub { $_ > 100 ? $_[1][1] =~ m/A/ : $_[1][6] =~ m/B/ }}
2288
2289           If the context is set to return a list of hashes  ("headers" is
2290           defined), the current record will also be available in the
2291           localized %_:
2292
2293            filter => { 3 => sub { $_ > 100 && $_{foo} =~ m/A/ && $_{bar} < 1000  }}
2294
2295           If the filter is used to alter the content by changing $_,  make
2296           sure that the sub returns true in order not to have that record
2297           skipped:
2298
2299            filter => { 2 => sub { $_ = uc }}
2300
2301           will upper-case the second field, and then skip it if the resulting
2302           content evaluates to false. To always accept, end with truth:
2303
2304            filter => { 2 => sub { $_ = uc; 1 }}
2305
2306         coderef
2307            csv (in => "file.csv", filter => sub { $n++; 0; });
2308
2309           If the argument to "filter" is a coderef,  it is an alias or
2310           shortcut to a filter on column 0:
2311
2312            csv (filter => sub { $n++; 0 });
2313
2314           is equal to
2315
2316            csv (filter => { 0 => sub { $n++; 0 }});
2317
2318         filter-name
2319            csv (in => "file.csv", filter => "not_blank");
2320            csv (in => "file.csv", filter => "not_empty");
2321            csv (in => "file.csv", filter => "filled");
2322
2323           These are predefined filters
2324
2325           Given a file like (line numbers prefixed for doc purpose only):
2326
2327            1:1,2,3
2328            2:
2329            3:,
2330            4:""
2331            5:,,
2332            6:, ,
2333            7:"",
2334            8:" "
2335            9:4,5,6
2336
2337           not_blank
2338             Filter out the blank lines
2339
2340             This filter is a shortcut for
2341
2342              filter => { 0 => sub { @{$_[1]} > 1 or
2343                          defined $_[1][0] && $_[1][0] ne "" } }
2344
2345             Due to the implementation,  it is currently impossible to also
2346             filter lines that consists only of a quoted empty field. These
2347             filter lines that consist only of a quoted empty field. These
2348
2349             With the given example, lines 2 and 4 will be skipped.
2350
2351           not_empty
2352             Filter out lines where all the fields are empty.
2353
2354             This filter is a shortcut for
2355
2356              filter => { 0 => sub { grep { defined && $_ ne "" } @{$_[1]} } }
2357
2358             A space is not regarded as being empty,  so given the example data,
2359             lines 2, 3, 4, 5, and 7 are skipped.
2360
2361           filled
2362             Filter out lines that have no visible data
2363
2364             This filter is a shortcut for
2365
2366              filter => { 0 => sub { grep { defined && m/\S/ } @{$_[1]} } }
2367
2368             This filter rejects all lines that do not have at least one
2369             field containing a non-whitespace character.
2370
2371             With the given example data, this filter would skip lines 2
2372             through 8.
2373
2374         One could also use modules like Types::Standard:
2375
2376          use Types::Standard -types;
2377
2378          my $type   = Tuple[Str, Str, Int, Bool, Optional[Num]];
2379          my $check  = $type->compiled_check;
2380
2381          # filter with compiled check and warnings
2382          my $aoa = csv (
2383             in     => \$data,
2384             filter => {
2385                 0 => sub {
2386                     my $ok = $check->($_[1]) or
2387                         warn $type->get_message ($_[1]), "\n";
2388                     return $ok;
2389                     },
2390                 },
2391             );
2392
2393       after_in
2394         This callback is invoked for each record after all records have been
2395         parsed but before returning the reference to the caller.  The hook is
2396         invoked with two arguments:  the current  "CSV"  parser object  and a
2397         reference to the record.   The reference can be a reference to a
2398         HASH  or a reference to an ARRAY as determined by the arguments.
2399
2400         This callback can also be passed as  an attribute without the
2401         "callbacks" wrapper.
2402
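         A short sketch of "after_in" passed as a plain attribute,  appending
         the field count to every parsed row (AoA input assumed):

          my $aoa = csv (
              in       => "file.csv",
              after_in => sub { push @{$_[1]}, scalar @{$_[1]} },
              );
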
2403       before_out
2404         This callback is invoked for each record before the record is
2405         printed.  The hook is invoked with two arguments:  the current "CSV"
2406         parser object and a reference to the record.   The reference can be a
2407         reference to a  HASH or a reference to an ARRAY as determined by the
2408         arguments.
2409
2410         This callback can also be passed as an attribute  without the
2411         "callbacks" wrapper.
2412
2413         This callback makes the row available in %_ if the row is a hashref.
2414         In this case %_ is writable and will change the original row.
2415
2416       on_in
2417         This callback acts exactly as the "after_in" or the "before_out"
2418         hooks.
2419
2420         This callback can also be passed as an attribute  without the
2421         "callbacks" wrapper.
2422
2423         This callback makes the row available in %_ if the row is a hashref.
2424         In this case %_ is writable and will change the original row. So e.g.
2425         with
2426
2427           my $aoh = csv (
2428               in      => \"foo\n1\n2\n",
2429               headers => "auto",
2430               on_in   => sub { $_{bar} = 2; },
2431               );
2432
2433         $aoh will be:
2434
2435           [ { foo => 1,
2436               bar => 2,
2437               }
2438             { foo => 2,
2439               bar => 2,
2440               }
2441             ]
2442
2443       csv
2444         The function  "csv" can also be called as a method or with an
2445         existing Text::CSV_XS object. This could help if the function is to
2446         be invoked many times and the overhead of creating the object
2447         internally over and over again can be avoided by passing an
2448         existing instance.
2449
2450          my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1 });
2451
2452          my $aoa = $csv->csv (in => $fh);
2453          my $aoa = csv (in => $fh, csv => $csv);
2454
2455         both act the same. Running this 20000 times on a 20-line CSV file
2456         showed a 53% speedup.
2457

INTERNALS

2459       Combine (...)
2460       Parse (...)
2461
2462       The arguments to these internal functions are deliberately not
2463       described or documented in order to enable the module authors to
2464       change them when they feel the need.  Using them is highly
2465       discouraged as the API may change in future releases.
2466

EXAMPLES

2468   Reading a CSV file line by line:
2469        my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1 });
2470        open my $fh, "<", "file.csv" or die "file.csv: $!";
2471        while (my $row = $csv->getline ($fh)) {
2472            # do something with @$row
2473            }
2474        close $fh or die "file.csv: $!";
2475
2476       or
2477
2478        my $aoa = csv (in => "file.csv", on_in => sub {
2479            # do something with %_
2480            });
2481
2482       Reading only a single column
2483
2484        my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1 });
2485        open my $fh, "<", "file.csv" or die "file.csv: $!";
2486        # get only the 4th column
2487        my @column = map { $_->[3] } @{$csv->getline_all ($fh)};
2488        close $fh or die "file.csv: $!";
2489
2490       with "csv", you could do
2491
2492        my @column = map { $_->[0] }
2493            @{csv (in => "file.csv", fragment => "col=4")};
2494
2495   Parsing CSV strings:
2496        my $csv = Text::CSV_XS->new ({ keep_meta_info => 1, binary => 1 });
2497
2498        my $sample_input_string =
2499            qq{"I said, ""Hi!""",Yes,"",2.34,,"1.09","\x{20ac}",};
2500        if ($csv->parse ($sample_input_string)) {
2501            my @field = $csv->fields;
2502            foreach my $col (0 .. $#field) {
2503                my $quo = $csv->is_quoted ($col) ? $csv->{quote_char} : "";
2504                printf "%2d: %s%s%s\n", $col, $quo, $field[$col], $quo;
2505                }
2506            }
2507        else {
2508            print STDERR "parse () failed on argument: ",
2509                $csv->error_input, "\n";
2510            $csv->error_diag ();
2511            }
2512
2513       Parsing CSV from memory
2514
2515       Given a complete CSV data-set in scalar $data,  generate a list of
2516       lists to represent the rows and fields
2517
2518        # The data
2519        my $data = join "\r\n" => map { join "," => 0 .. 5 } 0 .. 5;
2520
2521        # in a loop
2522        my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1 });
2523        open my $fh, "<", \$data;
2524        my @foo;
2525        while (my $row = $csv->getline ($fh)) {
2526            push @foo, $row;
2527            }
2528        close $fh;
2529
2530        # a single call
2531        my $foo = csv (in => \$data);
2532
2533   Printing CSV data
2534       The fast way: using "print"
2535
2536       An example for creating "CSV" files using the "print" method:
2537
2538        my $csv = Text::CSV_XS->new ({ binary => 1, eol => $/ });
2539        open my $fh, ">", "foo.csv" or die "foo.csv: $!";
2540        for (1 .. 10) {
2541            $csv->print ($fh, [ $_, "$_" ]) or $csv->error_diag;
2542            }
2543        close $fh or die "foo.csv: $!";
2544
2545       The slow way: using "combine" and "string"
2546
2547       or using the slower "combine" and "string" methods:
2548
2549        my $csv = Text::CSV_XS->new;
2550
2551        open my $csv_fh, ">", "hello.csv" or die "hello.csv: $!";
2552
2553        my @sample_input_fields = (
2554            'You said, "Hello!"',   5.67,
2555            '"Surely"',   '',   '3.14159');
2556        if ($csv->combine (@sample_input_fields)) {
2557            print $csv_fh $csv->string, "\n";
2558            }
2559        else {
2560            print "combine () failed on argument: ",
2561                $csv->error_input, "\n";
2562            }
2563        close $csv_fh or die "hello.csv: $!";
2564
2565       Generating CSV into memory
2566
2567       Format a data-set (@foo) into a scalar value in memory ($data):
2568
2569        # The data
2570        my @foo = map { [ 0 .. 5 ] } 0 .. 3;
2571
2572        # in a loop
2573        my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1, eol => "\r\n" });
2574        open my $fh, ">", \my $data;
2575        $csv->print ($fh, $_) for @foo;
2576        close $fh;
2577
2578        # a single call
2579        csv (in => \@foo, out => \my $data);
2580
2581   Rewriting CSV
2582       Rewrite "CSV" files with ";" as separator character to well-formed
2583       "CSV":
2584
2585        use Text::CSV_XS qw( csv );
2586        csv (in => csv (in => "bad.csv", sep_char => ";"), out => *STDOUT);
2587
2588       As "STDOUT" is now default in "csv", a one-liner converting a UTF-16
2589       CSV file with BOM and TAB-separation to valid UTF-8 CSV could be:
2590
2591        $ perl -C3 -MText::CSV_XS=csv -we\
2592           'csv(in=>"utf16tab.csv",encoding=>"utf16",sep=>"\t")' >utf8.csv
2593
2594   Dumping database tables to CSV
2595       Dumping a database table can be as simple as this (TIMTOWTDI):
2596
2597        my $dbh = DBI->connect (...);
2598        my $sql = "select * from foo";
2599
2600        # using your own loop
2601        open my $fh, ">", "foo.csv" or die "foo.csv: $!\n";
2602        my $csv = Text::CSV_XS->new ({ binary => 1, eol => "\r\n" });
2603        my $sth = $dbh->prepare ($sql); $sth->execute;
2604        $csv->print ($fh, $sth->{NAME_lc});
2605        while (my $row = $sth->fetch) {
2606            $csv->print ($fh, $row);
2607            }
2608
2609        # using the csv function, all in memory
2610        csv (out => "foo.csv", in => $dbh->selectall_arrayref ($sql));
2611
2612        # using the csv function, streaming with callbacks
2613        my $sth = $dbh->prepare ($sql); $sth->execute;
2614        csv (out => "foo.csv", in => sub { $sth->fetch            });
2615        csv (out => "foo.csv", in => sub { $sth->fetchrow_hashref });
2616
2617       Note that this does not discriminate between "empty" values and NULL-
2618       values from the database,  as both will be the same empty field in CSV.
2619       To enable distinction between the two, use "quote_empty".
2620
2621        csv (out => "foo.csv", in => sub { $sth->fetch }, quote_empty => 1);
2622
2623       If the database import utility supports special sequences to insert
2624       "NULL" values into the database,  like "\N" as supported by MySQL/MariaDB,
2625       use a filter or a map
2626
2627        csv (out => "foo.csv", in => sub { $sth->fetch },
2628                            on_in => sub { $_ //= "\\N" for @{$_[1]} });
2629
2630        while (my $row = $sth->fetch) {
2631            $csv->print ($fh, [ map { $_ // "\\N" } @$row ]);
2632            }
2633
2634       Note that this will not work as expected when choosing the backslash
2635       ("\") as "escape_char", as that will cause the "\" to need to be
2636       escaped by yet another "\",  which will cause the field to need
2637       quotation and thus ending up as "\\N" instead of "\N". See also
2638       "undef_str".
2639
2640        csv (out => "foo.csv", in => sub { $sth->fetch }, undef_str => "\\N");
2641
2642       These special sequences are not recognized by  Text::CSV_XS  on parsing
2643       the CSV generated like this, but map and filter are your friends again
2644
2645        while (my $row = $csv->getline ($fh)) {
2646            $sth->execute (map { $_ eq "\\N" ? undef : $_ } @$row);
2647            }
2648
2649        csv (in => "foo.csv", filter => { 1 => sub {
2650            $sth->execute (map { $_ eq "\\N" ? undef : $_ } @{$_[1]}); 0; }});
2651
2652   Converting CSV to JSON
2653        use Text::CSV_XS qw( csv );
2654        use JSON; # or Cpanel::JSON::XS for better performance
2655
2656        # AoA (no header interpretation)
2657        say encode_json (csv (in => "file.csv"));
2658
2659        # AoH (convert to structures)
2660        say encode_json (csv (in => "file.csv", bom => 1));
2661
2662       Yes, it is that simple.
2663
2664   The examples folder
2665       For more extended examples, see the examples/ sub-directory [1] in the
2666       original distribution or the git repository [2].
2667
2668        1. https://github.com/Tux/Text-CSV_XS/tree/master/examples
2669        2. https://github.com/Tux/Text-CSV_XS
2670
2671       The following files can be found there:
2672
2673       parser-xs.pl
2674         This can be used as a boilerplate to parse invalid "CSV"  and parse
2675         beyond (expected) errors, as an alternative to using the "error" callback.
2676
2677          $ perl examples/parser-xs.pl bad.csv >good.csv
2678
2679       csv-check
2680         This is a command-line tool that uses parser-xs.pl  techniques to
2681         check the "CSV" file and report on its content.
2682
2683          $ csv-check files/utf8.csv
2684          Checked files/utf8.csv  with csv-check 1.9
2685          using Text::CSV_XS 1.32 with perl 5.26.0 and Unicode 9.0.0
2686          OK: rows: 1, columns: 2
2687              sep = <,>, quo = <">, bin = <1>, eol = <"\n">
2688
2689       csv-split
2690         This command splits "CSV" files into smaller files,  keeping (part
2691         of) the header.  Options include maximum number of (data) rows per
2692         file and maximum number of columns per file or a combination of the
2693         two.
2694
2695       csv2xls
2696         A script to convert "CSV" to Microsoft Excel ("XLS"). This requires
2697         extra modules Date::Calc and Spreadsheet::WriteExcel. The converter
2698         accepts various options and can produce UTF-8 compliant Excel files.
2699
2700       csv2xlsx
2701         A script to convert "CSV" to Microsoft Excel ("XLSX").  This requires
2702         the modules Date::Calc and Excel::Writer::XLSX.  The converter
2703         does accept various options including merging several "CSV" files
2704         into a single Excel file.
2705
2706       csvdiff
2707         A script that provides colorized diff on sorted CSV files,  assuming
2708         first line is header and first field is the key. Output options
2709         include colorized ANSI escape codes or HTML.
2710
2711          $ csvdiff --html --output=diff.html file1.csv file2.csv
2712
2713       rewrite.pl
2714         A script to rewrite (in)valid CSV into valid CSV files.  The script has
2715         options to generate confusing CSV files or CSV files that conform to
2716         Dutch MS-Excel exports (using ";" as separation).
2717
2718         By default, the script honors a BOM and auto-detects the separator,
2719         converting the input to standard CSV with "," as separator.
2720

CAVEATS

2722       Text::CSV_XS  is not designed to detect the characters used to quote
2723       and separate fields.  The parsing is done using predefined  (default)
2724       settings.  In the examples  sub-directory,  you can find scripts  that
2725       demonstrate how you could try to detect these characters yourself.
2726
2727   Microsoft Excel
2728       The import/export from Microsoft Excel is a risky task, according to
2729       the documentation in "Text::CSV::Separator".  Microsoft uses the
2730       system's list separator defined in the regional settings, which happens
2731       to be a semicolon for Dutch, German and Spanish (and probably some
2732       others as well).   For the English locale,  the default is a comma.
2733       In Windows however,  the user is free to choose a  predefined locale,
2734       and then change  every  individual setting in it, so checking the
2735       locale is no solution.
2736
2737       As of version 1.17, a lone first line with just
2738
2739         sep=;
2740
2741       will be recognized and honored when parsing with "getline".
2742
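       A short sketch (in-memory handle for illustration);  the hint line
       selects ";" as the separator for the rest of the stream:

        my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1 });
        open my $io, "<", \"sep=;\nfoo;bar\n1;2\n" or die $!;
        while (my $row = $csv->getline ($io)) {
            print join ("|", @$row), "\n";
            }
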

TODO

2744       More Errors & Warnings
2745         New extensions ought to be  clear and concise  in reporting what
2746         error has occurred where and why, and maybe also offer a remedy to
2747         the problem.
2748
2749         "error_diag" is a (very) good start, but there is more work to be
2750         done in this area.
2751
2752         Basic calls  should croak or warn on  illegal parameters.  Errors
2753         should be documented.
2754
2755       setting meta info
2756         Future extensions might include extending the "meta_info",
2757         "is_quoted", and  "is_binary"  to accept setting these  flags for
2758         fields,  so you can specify which fields are quoted in the
2759         "combine"/"string" combination.
2760
2761          $csv->meta_info (0, 1, 1, 3, 0, 0);
2762          $csv->is_quoted (3, 1);
2763
2764         Metadata Vocabulary for Tabular Data
2765         <http://w3c.github.io/csvw/metadata/> (a W3C editor's draft) could be
2766         an example for supporting more metadata.
2767
2768       Parse the whole file at once
2769         Implement new methods or functions  that enable parsing of a
2770         complete file at once, returning a list of hashes. Possible extension
2771         to this could be to enable a column selection on the call:
2772
2773          my @AoH = $csv->parse_file ($filename, { cols => [ 1, 4..8, 12 ]});
2774
2775         returning something like
2776
2777          [ { fields => [ 1, 2, "foo", 4.5, undef, "", 8 ],
2778              flags  => [ ... ],
2779              },
2780            { fields => [ ... ],
2781              .
2782              },
2783            ]
2784
2785         Note that the "csv" function already supports most of this,  but does
2786         not return flags. "getline_all" returns all rows for an open stream,
2787         but this will not return flags either.  "fragment"  can reduce the
2788         required  rows or columns, but cannot combine them.
2789
2790       Cookbook
2791         Write a document that has recipes for  most known  non-standard  (and
2792         maybe some standard)  "CSV" formats,  including formats that use
2793         "TAB",  ";", "|", or other non-comma separators.
2794
2795         Examples could be taken from W3C's CSV on the Web: Use Cases and
2796         Requirements <http://w3c.github.io/csvw/use-cases-and-
2797         requirements/index.html>
2798
2799       Steal
2800         Steal good new ideas and features from PapaParse
2801         <http://papaparse.com> or csvkit <http://csvkit.readthedocs.org>.
2802
2803       Raku support
2804         Raku support can be found here <https://github.com/Tux/CSV>. The
2805         interface is richer in support than the Perl5 API, as Raku supports
2806         more types.
2807
2808         The Raku version does not (yet) support pure binary CSV datasets.
2809
2810   NOT TODO
2811       combined methods
2812         Requests for adding means (methods) that combine "combine" and
2813         "string" in a single call will not be honored (use "print" instead).
2814         Likewise for "parse" and "fields"  (use "getline" instead), given the
2815         problems with embedded newlines.
2816
2817   Release plan
2818       No guarantees, but this is what I had in mind some time ago:
2819
2820       • DIAGNOSTICS section in pod to *describe* the errors (see below)
2821

EBCDIC

2823       Everything should now work on native EBCDIC systems.   As the test does
2824       not cover all possible codepoints and Encode does not support
2825       "utf-ebcdic", there is no guarantee that all handling of Unicode is
2826       done correctly.
2827
2828       Opening "EBCDIC" encoded files on  "ASCII"+  systems is likely to
2829       succeed using Encode's "cp37", "cp1047", or "posix-bc":
2830
2831        open my $fh, "<:encoding(cp1047)", "ebcdic_file.csv" or die "...";
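
           The functional interface can use the same encoding guess
           ("cp1047" and the file name are only examples):

            use Text::CSV_XS qw( csv );

            my $aoa = csv (in => "ebcdic_file.csv", encoding => "cp1047");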
2832

DIAGNOSTICS

2834       Still under construction ...
2835
2836       If an error occurs,  "$csv->error_diag" can be used to get information
2837       on the cause of the failure. Note that for speed reasons the internal
2838       value is never cleared on success,  so using the value returned by
2839       "error_diag" in normal cases - when no error occurred - may cause
2840       unexpected results.
2841
2842       If the constructor failed, the cause can be found using "error_diag" as
2843       a class method, like "Text::CSV_XS->error_diag".
2844
2845       The "$csv->error_diag" method is automatically invoked upon error when
2846       the constructor was called with "auto_diag" set to 1 or 2, or when
2847       autodie is in effect.  When set to 1, this will cause a "warn" with the
2848       error message;  when set to 2, it will "die". "2012 - EOF" is excluded
2849       from "auto_diag" reports.
2850
2851       Errors can be (individually) caught using the "error" callback.
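
           A minimal sketch of these calls ("data.csv" is only an example
           file name):

            use Text::CSV_XS;

            # on constructor failure, call error_diag as a class method
            my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1 })
                or die "new: " . Text::CSV_XS->error_diag ();

            open my $fh, "<", "data.csv" or die "data.csv: $!";
            while (my $row = $csv->getline ($fh)) {
                # process $row
                }

            # in list context: error code, message, position, record, field;
            # code 2012 only means the stream was exhausted
            my ($code, $msg, $pos, $rec, $fld) = $csv->error_diag ();
            $code && $code != 2012
                and warn "record $rec, pos $pos: $msg\n";
            close $fh;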
2852
2853       The errors as described below are available. I have tried to make the
2854       error itself explanatory enough, but more descriptions will be added.
2855       For most of these errors, the first three capitals describe the error
2856       category:
2857
2858       • INI
2859
2860         Initialization error or option conflict.
2861
2862       • ECR
2863
2864         Carriage-Return related parse error.
2865
2866       • EOF
2867
2868         End-Of-File related parse error.
2869
2870       • EIQ
2871
2872         Parse error inside quotation.
2873
2874       • EIF
2875
2876         Parse error inside field.
2877
2878       • ECB
2879
2880         Combine error.
2881
2882       • EHR
2883
2884         HashRef parse related error.
2885
2886       And below should be the complete list of error codes that can be
2887       returned:
2888
2889       • 1001 "INI - sep_char is equal to quote_char or escape_char"
2890
2891         The  separation character  cannot be equal to  the quotation
2892         character or to the escape character,  as this would invalidate all
2893         parsing rules.
2894
2895       • 1002 "INI - allow_whitespace with escape_char or quote_char SP or
2896         TAB"
2897
2898         Using the  "allow_whitespace"  attribute  when either "quote_char" or
2899         "escape_char"  is equal to "SPACE" or "TAB" is too ambiguous to
2900         allow.
2901
2902       • 1003 "INI - \r or \n in main attr not allowed"
2903
2904         Using default "eol" characters in either "sep_char", "quote_char",
2905         or  "escape_char"  is  not allowed.
2906
2907       • 1004 "INI - callbacks should be undef or a hashref"
2908
2909         The "callbacks"  attribute only accepts "undef" or a hash
2910         reference.
2911
2912       • 1005 "INI - EOL too long"
2913
2914         The value passed for EOL exceeds its maximum length (16).
2915
2916       • 1006 "INI - SEP too long"
2917
2918         The value passed for SEP exceeds its maximum length (16).
2919
2920       • 1007 "INI - QUOTE too long"
2921
2922         The value passed for QUOTE exceeds its maximum length (16).
2923
2924       • 1008 "INI - SEP undefined"
2925
2926         The value passed for SEP should be defined and not empty.
2927
2928       • 1010 "INI - the header is empty"
2929
2930         The header line parsed in the "header" is empty.
2931
2932       • 1011 "INI - the header contains more than one valid separator"
2933
2934         The header line parsed in the  "header"  contains more than one
2935         (unique) separator character out of the allowed set of separators.
2936
2937       • 1012 "INI - the header contains an empty field"
2938
2939         The header line parsed in the "header" contains an empty field.
2940
2941       • 1013 "INI - the header contains non-unique fields"
2942
2943         The header line parsed in the  "header"  contains at least  two
2944         identical fields.
2945
2946       • 1014 "INI - header called on undefined stream"
2947
2948         The header line cannot be parsed from an undefined source.
2949
2950       • 1500 "PRM - Invalid/unsupported argument(s)"
2951
2952         Function or method called with invalid argument(s) or parameter(s).
2953
2954       • 1501 "PRM - The key attribute is passed as an unsupported type"
2955
2956         The "key" attribute is of an unsupported type.
2957
2958       • 1502 "PRM - The value attribute is passed without the key attribute"
2959
2960         The "value" attribute is only allowed when a valid key is given.
2961
2962       • 1503 "PRM - The value attribute is passed as an unsupported type"
2963
2964         The "value" attribute is of an unsupported type.
2965
2966       • 2010 "ECR - QUO char inside quotes followed by CR not part of EOL"
2967
2968         When "eol" has been set to anything but the default, like "\r\t\n",
2969         and a "\r" follows the second (closing) "quote_char" but the
2970         characters following that "\r" do not make up the "eol" sequence,
2971         this is an error.
2972
2973       • 2011 "ECR - Characters after end of quoted field"
2974
2975         Sequences like "1,foo,"bar"baz,22,1" are not allowed. "bar" is a
2976         quoted field and after the closing double-quote, there should be
2977         either a new-line sequence or a separation character.
2978
2979       • 2012 "EOF - End of data in parsing input stream"
2980
2981         Self-explanatory:  end-of-file while parsing a stream.  This can
2982         happen only when reading from streams with "getline",  as using
2983         "parse" is done on strings that are not required to have a trailing
2984         "eol".
2985
2986       • 2013 "INI - Specification error for fragments RFC7111"
2987
2988         Invalid specification for a URI "fragment" (see RFC 7111).
2989
2990       • 2014 "ENF - Inconsistent number of fields"
2991
2992         Inconsistent number of fields under strict parsing.
2993
2994       • 2021 "EIQ - NL char inside quotes, binary off"
2995
2996         Sequences like "1,"foo\nbar",22,1" are allowed only when the binary
2997         option has been selected with the constructor.
2998
2999       • 2022 "EIQ - CR char inside quotes, binary off"
3000
3001         Sequences like "1,"foo\rbar",22,1" are allowed only when the binary
3002         option has been selected with the constructor.
3003
3004       • 2023 "EIQ - QUO character not allowed"
3005
3006         Sequences like ""foo "bar" baz",qu" and "2023,",2008-04-05,"Foo,
3007         Bar",\n" will cause this error.
3008
3009       • 2024 "EIQ - EOF cannot be escaped, not even inside quotes"
3010
3011         The escape character is not allowed as last character in an input
3012         stream.
3013
3014       • 2025 "EIQ - Loose unescaped escape"
3015
3016         An escape character should escape only characters that need escaping.
3017
3018         Allowing  the escape  for other characters  is possible  with the
3019         attribute "allow_loose_escapes".
3020
3021       • 2026 "EIQ - Binary character inside quoted field, binary off"
3022
3023         Binary characters are not allowed by default.  Exceptions are
3024         fields that contain valid UTF-8,  which will automatically be
3025         upgraded.  Set "binary" to 1 to accept binary
3026         data.
3027
3028       • 2027 "EIQ - Quoted field not terminated"
3029
3030         When parsing a field that started with a quotation character,  the
3031         field is expected to be closed with a quotation character.   When the
3032         parsed line is exhausted before the quote is found, that field is not
3033         terminated.
3034
3035       • 2030 "EIF - NL char inside unquoted verbatim, binary off"
3036
3037       • 2031 "EIF - CR char is first char of field, not part of EOL"
3038
3039       • 2032 "EIF - CR char inside unquoted, not part of EOL"
3040
3041       • 2034 "EIF - Loose unescaped quote"
3042
3043       • 2035 "EIF - Escaped EOF in unquoted field"
3044
3045       • 2036 "EIF - ESC error"
3046
3047       • 2037 "EIF - Binary character in unquoted field, binary off"
3048
3049       • 2110 "ECB - Binary character in Combine, binary off"
3050
3051       • 2200 "EIO - print to IO failed. See errno"
3052
3053       • 3001 "EHR - Unsupported syntax for column_names ()"
3054
3055       • 3002 "EHR - getline_hr () called before column_names ()"
3056
3057       • 3003 "EHR - bind_columns () and column_names () fields count
3058         mismatch"
3059
3060       • 3004 "EHR - bind_columns () only accepts refs to scalars"
3061
3062       • 3006 "EHR - bind_columns () did not pass enough refs for parsed
3063         fields"
3064
3065       • 3007 "EHR - bind_columns needs refs to writable scalars"
3066
3067       • 3008 "EHR - unexpected error in bound fields"
3068
3069       • 3009 "EHR - print_hr () called before column_names ()"
3070
3071       • 3010 "EHR - print_hr () called with invalid arguments"
3072

SEE ALSO

3074       IO::File,  IO::Handle,  IO::Wrap,  Text::CSV,  Text::CSV_PP,
3075       Text::CSV::Encoded,     Text::CSV::Separator,    Text::CSV::Slurp,
3076       Spreadsheet::CSV and Spreadsheet::Read, and of course perl.
3077
3078       If you are using Raku,  have a look at "Text::CSV" in the Raku
3079       ecosystem, offering the same features.
3080
3081       non-perl
3082
3083       A CSV parser in JavaScript,  also used by W3C <http://www.w3.org>,  is
3084       the multi-threaded in-browser PapaParse <http://papaparse.com/>.
3085
3086       csvkit <http://csvkit.readthedocs.org> is a Python CSV parsing toolkit.
3087

AUTHOR

3089       Alan Citterman <alan@mfgrtl.com> wrote the original Perl module.
3090       Please don't send mail concerning Text::CSV_XS to Alan, who is not
3091       involved in the C/XS part that is now the main part of the module.
3092
3093       Jochen Wiedmann <joe@ispsoft.de> rewrote the en- and decoding in C by
3094       implementing a simple finite-state machine.   He added variable quote,
3095       escape and separator characters, the binary mode and the print and
3096       getline methods. See ChangeLog releases 0.10 through 0.23.
3097
3098       H.Merijn Brand <h.m.brand@xs4all.nl> cleaned up the code,  added the
3099       field flags methods,  wrote the major part of the test suite, completed
3100       the documentation,   fixed most RT bugs,  added all the allow flags and
3101       the "csv" function. See ChangeLog releases 0.25 and on.
3102

COPYRIGHT AND LICENSE

3104        Copyright (C) 2007-2023 H.Merijn Brand.  All rights reserved.
3105        Copyright (C) 1998-2001 Jochen Wiedmann. All rights reserved.
3106        Copyright (C) 1997      Alan Citterman.  All rights reserved.
3107
3108       This library is free software;  you can redistribute and/or modify it
3109       under the same terms as Perl itself.
3110
3111
3112
3113perl v5.36.0                      2023-03-01                         CSV_XS(3)