Text::CSV(3pm)

Text::CSV(3)          User Contributed Perl Documentation         Text::CSV(3)

NAME

       Text::CSV - comma-separated values manipulator (using XS or PurePerl)

SYNOPSIS

       This section is taken from Text::CSV_XS.

        # Functional interface
        use Text::CSV qw( csv );

        # Read whole file in memory
        my $aoa = csv (in => "data.csv");    # as array of array
        my $aoh = csv (in => "data.csv",
                       headers => "auto");   # as array of hash

        # Write array of arrays as csv file
        csv (in => $aoa, out => "file.csv", sep_char => ";");

        # Only show lines where "code" is odd
        csv (in => "data.csv", filter => { code => sub { $_ % 2 }});

        # Object interface
        use Text::CSV;

        my @rows;
        # Read/parse CSV
        my $csv = Text::CSV->new ({ binary => 1, auto_diag => 1 });
        open my $fh, "<:encoding(utf8)", "test.csv" or die "test.csv: $!";
        while (my $row = $csv->getline ($fh)) {
            $row->[2] =~ m/pattern/ or next; # 3rd field should match
            push @rows, $row;
            }
        close $fh;

        # and write as CSV
        open $fh, ">:encoding(utf8)", "new.csv" or die "new.csv: $!";
        $csv->say ($fh, $_) for @rows;
        close $fh or die "new.csv: $!";

DESCRIPTION

       Text::CSV is a thin wrapper for Text::CSV_XS-compatible modules now.
       All the backend modules provide facilities for the composition and
       decomposition of comma-separated values. Text::CSV uses Text::CSV_XS by
       default, and when Text::CSV_XS is not available, falls back on
       Text::CSV_PP, which is bundled in the same distribution as this module.

CHOOSING BACKEND

       This module respects an environment variable called "PERL_TEXT_CSV"
       when it decides which backend module to use. If this environment
       variable is not set, it tries to load Text::CSV_XS, and if Text::CSV_XS
       is not available, falls back on Text::CSV_PP.

       If you never want it to fall back on Text::CSV_PP, set the variable
       like this ("export" may be "setenv", "set", or the like, depending on
       your environment):

         > export PERL_TEXT_CSV=Text::CSV_XS

       If you prefer Text::CSV_XS to Text::CSV_PP (default), then:

         > export PERL_TEXT_CSV=Text::CSV_XS,Text::CSV_PP

       You may also want to set this variable at the top of your test files,
       so that you are not bothered by incompatibilities between backends
       (you need to wrap this in "BEGIN", and set it before actually "use"-ing
       the Text::CSV module, as it decides its backend as soon as it is
       loaded):

         BEGIN { $ENV{PERL_TEXT_CSV}='Text::CSV_PP'; }
         use Text::CSV;

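       To confirm which backend was actually selected, the class can be
       asked directly. This is a minimal sketch that assumes the "backend"
       and "is_pp" class methods of Text::CSV, which are not described in
       this section:

```perl
use strict;
use warnings;

# Force the pure-Perl backend before Text::CSV is loaded
BEGIN { $ENV{PERL_TEXT_CSV} = 'Text::CSV_PP'; }
use Text::CSV;

# Name of the backend module that was selected at load time
my $backend = Text::CSV->backend;
my $is_pp   = Text::CSV->is_pp ? 1 : 0;

print "backend: $backend (pure Perl: $is_pp)\n";
```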

NOTES

       This section is also taken from Text::CSV_XS.

   Embedded newlines
       Important Note: The default behavior is to accept only ASCII
       characters in the range from 0x20 (space) to 0x7E (tilde). This means
       that the fields can not contain newlines. If your data contains
       newlines embedded in fields, or characters above 0x7E (tilde), or
       binary data, you must set "binary => 1" in the call to "new". To cover
       the widest range of parsing options, you will always want to set
       binary.

       But you still have the problem that you have to pass a correct line to
       the "parse" method, which is more complicated than it looks in typical
       usage:

        my $csv = Text::CSV->new ({ binary => 1, eol => $/ });
        while (<>) {           #  WRONG!
            $csv->parse ($_);
            my @fields = $csv->fields ();
            }

       This will break, as the "while" might read broken lines: it does not
       care about the quoting. If you need to support embedded newlines, the
       way to go is to not pass "eol" in the parser (it accepts "\n", "\r",
       and "\r\n" by default) and then

        my $csv = Text::CSV->new ({ binary => 1 });
        open my $fh, "<", $file or die "$file: $!";
        while (my $row = $csv->getline ($fh)) {
            my @fields = @$row;
            }

       The old(er) way of using global file handles is still supported

        while (my $row = $csv->getline (*ARGV)) { ... }

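       The "getline" approach can be tried without any file on disk. The
       sketch below reads from an in-memory filehandle; the sample data and
       variable names are made up for illustration:

```perl
use strict;
use warnings;
use Text::CSV;

# Two records; the second field of the first record spans two lines
my $data = qq{1,"line one\nline two",3\n4,five,6\n};
open my $fh, "<", \$data or die "in-memory open: $!";

my $csv = Text::CSV->new ({ binary => 1, auto_diag => 1 })
    or die "" . Text::CSV->error_diag ();

# getline () honours the quoting, so the embedded newline stays in the field
my @rows;
while (my $row = $csv->getline ($fh)) {
    push @rows, $row;
    }
close $fh;

printf "%d records, first record has %d fields\n",
    scalar @rows, scalar @{$rows[0]};
```
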
   Unicode
       Unicode is only tested to work with perl-5.8.2 and up.

       See also "BOM".

       The simplest way to ensure the correct encoding is used for in- and
       output is by either setting layers on the filehandles, or setting the
       "encoding" argument for "csv".

        open my $fh, "<:encoding(UTF-8)", "in.csv"  or die "in.csv: $!";
       or
        my $aoa = csv (in => "in.csv",     encoding => "UTF-8");

        open my $fh, ">:encoding(UTF-8)", "out.csv" or die "out.csv: $!";
       or
        csv (in => $aoa, out => "out.csv", encoding => "UTF-8");

       On parsing (both for "getline" and "parse"), if the source is marked
       as being UTF8, then all fields that are marked binary will also be
       marked UTF8.

       On combining ("print" and "combine"): if any of the combining fields
       was marked UTF8, the resulting string will be marked as UTF8. Note
       however that any field before the first field marked UTF8 that
       contains 8-bit characters that were not upgraded to UTF8 will be
       "bytes" in the resulting string too, possibly causing unexpected
       errors. If you pass data of different encodings, or you don't know
       whether the encodings differ, force the fields to be upgraded before
       you pass them on:

        $csv->print ($fh, [ map { utf8::upgrade (my $x = $_); $x } @data ]);

       For complete control over encoding, please use Text::CSV::Encoded:

        use Text::CSV::Encoded;
        my $csv = Text::CSV::Encoded->new ({
            encoding_in  => "iso-8859-1", # the encoding comes into   Perl
            encoding_out => "cp1252",     # the encoding comes out of Perl
            });

        $csv = Text::CSV::Encoded->new ({ encoding  => "utf8" });
        # combine () and print () accept *literally* utf8 encoded data
        # parse () and getline () return *literally* utf8 encoded data

        $csv = Text::CSV::Encoded->new ({ encoding  => undef }); # default
        # combine () and print () accept UTF8 marked data
        # parse () and getline () return UTF8 marked data

   BOM
       BOM (or Byte Order Mark) handling is available only inside the
       "header" method. This method supports the following encodings:
       "utf-8", "utf-1", "utf-32be", "utf-32le", "utf-16be", "utf-16le",
       "utf-ebcdic", "scsu", "bocu-1", and "gb-18030". See Wikipedia
       <https://en.wikipedia.org/wiki/Byte_order_mark>.

       If a file has a BOM, the easiest way to deal with that is

        my $aoh = csv (in => $file, detect_bom => 1);

       All records will be encoded based on the detected BOM.

       This implies a call to the "header" method, which defaults to also
       set the "column_names". So this is not the same as

        my $aoh = csv (in => $file, headers => "auto");

       which only reads the first record to set "column_names" but ignores
       the meaning of any BOM that may be present.

METHODS

       This section is also taken from Text::CSV_XS.

   version
       (Class method) Returns the current module version.

   new
       (Class method) Returns a new instance of class Text::CSV. The
       attributes are described by the (optional) hash ref "\%attr".

        my $csv = Text::CSV->new ({ attributes ... });

       The following attributes are available:

       eol

        my $csv = Text::CSV->new ({ eol => $/ });
                  $csv->eol (undef);
        my $eol = $csv->eol;

       The end-of-line string to add to rows for "print" or the record
       separator for "getline".

       When not passed in a parser instance, the default behavior is to
       accept "\n", "\r", and "\r\n", so it is probably safer to not specify
       "eol" at all. Passing "undef" or the empty string behaves the same.

       When not passed in a generating instance, records are not terminated
       at all, so it is probably wise to pass something you expect. A safe
       choice for "eol" on output is either $/ or "\r\n".

       Common values for "eol" are "\012" ("\n" or Line Feed), "\015\012"
       ("\r\n" or Carriage Return, Line Feed), and "\015" ("\r" or Carriage
       Return). The "eol" attribute cannot exceed 7 (ASCII) characters.

       If both $/ and "eol" equal "\015", lines that end on only a Carriage
       Return without Line Feed will be parsed correctly.

       sep_char

        my $csv = Text::CSV->new ({ sep_char => ";" });
                $csv->sep_char (";");
        my $c = $csv->sep_char;

       The char used to separate fields, by default a comma (","). Limited
       to a single-byte character, usually in the range from 0x20 (space) to
       0x7E (tilde). When longer sequences are required, use "sep".

       The separation character can not be equal to the quote character or to
       the escape character.

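       As a small illustration of a non-default separator, the sketch below
       combines and re-parses one record with ";" as "sep_char". The field
       values are invented for the example:

```perl
use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new ({ sep_char => ";" })
    or die "" . Text::CSV->error_diag ();

# A field that contains the separator itself is quoted on output
$csv->combine ("a", "b;c", "d") or die "combine failed";
my $line = $csv->string;
print "$line\n";                # a;"b;c";d

# Parsing the generated line gives the original fields back
$csv->parse ($line) or die "parse failed";
my @fields = $csv->fields;
```
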
       sep

        my $csv = Text::CSV->new ({ sep => "\N{FULLWIDTH COMMA}" });
                  $csv->sep (";");
        my $sep = $csv->sep;

       The chars used to separate fields, by default undefined. Limited to 8
       bytes.

       When set, overrules "sep_char". If its length is one byte it acts as
       an alias to "sep_char".

       quote_char

        my $csv = Text::CSV->new ({ quote_char => "'" });
                $csv->quote_char (undef);
        my $c = $csv->quote_char;

       The character to quote fields containing blanks or binary data, by
       default the double quote character ("""). A value of undef suppresses
       quote chars (for simple cases only). Limited to a single-byte
       character, usually in the range from 0x20 (space) to 0x7E (tilde).
       When longer sequences are required, use "quote".

       "quote_char" can not be equal to "sep_char".

       quote

        my $csv = Text::CSV->new ({ quote => "\N{FULLWIDTH QUOTATION MARK}" });
                    $csv->quote ("'");
        my $quote = $csv->quote;

       The chars used to quote fields, by default undefined. Limited to 8
       bytes.

       When set, overrules "quote_char". If its length is one byte it acts as
       an alias to "quote_char".

       escape_char

        my $csv = Text::CSV->new ({ escape_char => "\\" });
                $csv->escape_char (":");
        my $c = $csv->escape_char;

       The character to escape certain characters inside quoted fields.
       This is limited to a single-byte character, usually in the range
       from 0x20 (space) to 0x7E (tilde).

       The "escape_char" defaults to being the double-quote mark ("""). In
       other words the same as the default "quote_char". This means that
       doubling the quote mark in a field escapes it:

        "foo","bar","Escape ""quote mark"" with two ""quote marks""","baz"

       If you change the "quote_char" without changing the "escape_char",
       the "escape_char" will still be the double-quote ("""). If instead
       you want to escape the "quote_char" by doubling it you will need to
       also change the "escape_char" to be the same as what you have changed
       the "quote_char" to.

       Setting "escape_char" to "undef" or "" will disable escaping completely
       and is greatly discouraged. This will also disable "escape_null".

       The escape character can not be equal to the separation character.

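       The interaction between "quote_char" and "escape_char" can be seen
       in a short sketch. Here both are set to a single quote, so an embedded
       quote is escaped by doubling it; the sample fields are illustrative
       only:

```perl
use strict;
use warnings;
use Text::CSV;

# Change quote_char and escape_char together so that doubling still works
my $csv = Text::CSV->new ({ quote_char => "'", escape_char => "'" })
    or die "" . Text::CSV->error_diag ();

$csv->combine ("it's", "plain") or die "combine failed";
my $line = $csv->string;
print "$line\n";                # 'it''s',plain

# Reading the line back removes the doubling again
$csv->parse ($line) or die "parse failed";
my @fields = $csv->fields;
```
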
       binary

        my $csv = Text::CSV->new ({ binary => 1 });
                $csv->binary (0);
        my $f = $csv->binary;

       If this attribute is 1, you may use binary characters in quoted
       fields, including line feeds, carriage returns and "NULL" bytes. (The
       latter could be escaped as ""0".) By default this feature is off.

       If a string is marked UTF8, "binary" will be turned on automatically
       when binary characters other than "CR" and "NL" are encountered. Note
       that a simple string like "\x{00a0}" might still be binary, but not
       marked UTF8, so setting "{ binary => 1 }" is still a wise option.

       strict

        my $csv = Text::CSV->new ({ strict => 1 });
                $csv->strict (0);
        my $f = $csv->strict;

       If this attribute is set to 1, any row that parses to a different
       number of fields than the previous row will cause the parser to throw
       error 2014.

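       A hedged sketch of "strict" in action: two records with a different
       number of fields are read from an in-memory filehandle, and the second
       one is expected to be rejected. The data is invented, and "error_diag"
       is called in list context to obtain the numeric error code:

```perl
use strict;
use warnings;
use Text::CSV;

# First record has two fields, second record has three
my $data = "a,b\nc,d,e\n";
open my $fh, "<", \$data or die "in-memory open: $!";

my $csv = Text::CSV->new ({ strict => 1 })
    or die "" . Text::CSV->error_diag ();

my $first  = $csv->getline ($fh);   # sets the expected field count
my $second = $csv->getline ($fh);   # field count differs from the first row
my ($code, $message) = $csv->error_diag;
close $fh;

printf "first row: %d fields\n", scalar @$first;
print defined $second
    ? "second row accepted\n"
    : "second row rejected with error $code\n";
```
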
       formula_handling

       formula

        my $csv = Text::CSV->new ({ formula => "none" });
                $csv->formula ("none");
        my $f = $csv->formula;

       This defines the behavior of fields containing formulas. As formulas
       are considered dangerous in spreadsheets, this attribute can define an
       optional action to be taken if a field starts with an equal sign ("=").

       For purpose of code-readability, this can also be written as

        my $csv = Text::CSV->new ({ formula_handling => "none" });
                $csv->formula_handling ("none");
        my $f = $csv->formula_handling;

       Possible values for this attribute are

       none
         Take no specific action. This is the default.

          $csv->formula ("none");

       die
         Cause the process to "die" whenever a leading "=" is encountered.

          $csv->formula ("die");

       croak
         Cause the process to "croak" whenever a leading "=" is encountered.
         (See Carp)

          $csv->formula ("croak");

       diag
         Report position and content of the field whenever a leading "=" is
         found. The value of the field is unchanged.

          $csv->formula ("diag");

       empty
         Replace the content of fields that start with a "=" with the empty
         string.

          $csv->formula ("empty");
          $csv->formula ("");

       undef
         Replace the content of fields that start with a "=" with "undef".

          $csv->formula ("undef");
          $csv->formula (undef);

       All other values will give a warning and then fall back to "diag".

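       The "empty" action described above can be sketched as follows. The
       input line is made up; only the field that starts with "=" is expected
       to be blanked:

```perl
use strict;
use warnings;
use Text::CSV;

# Blank out any field that starts with an equal sign
my $csv = Text::CSV->new ({ formula => "empty" })
    or die "" . Text::CSV->error_diag ();

$csv->parse ("1,=2+3,4") or die "parse failed";
my @fields = $csv->fields;

print join ("|", @fields), "\n";
```
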
       decode_utf8

        my $csv = Text::CSV->new ({ decode_utf8 => 1 });
                $csv->decode_utf8 (0);
        my $f = $csv->decode_utf8;

       This attribute defaults to TRUE.

       While parsing, fields that are valid UTF-8 are automatically set to
       be UTF-8, so that

         $csv->parse ("\xC4\xA8\n");

       results in

         PV("\304\250"\0) [UTF8 "\x{128}"]

       Sometimes it might not be a desired action. To prevent those upgrades,
       set this attribute to false, and the result will be

         PV("\304\250"\0)

       auto_diag

        my $csv = Text::CSV->new ({ auto_diag => 1 });
                $csv->auto_diag (2);
        my $l = $csv->auto_diag;

       Setting this attribute to a number between 1 and 9 causes "error_diag"
       to be automatically called in void context upon errors.

       In case of error "2012 - EOF", this call will be void.

       If "auto_diag" is set to a numeric value greater than 1, it will "die"
       on errors instead of "warn". If set to anything unrecognized, it will
       be silently ignored.

       Future extensions to this feature will include more reliable
       auto-detection of "autodie" being active in the scope in which the
       error occurred, which will increment the value of "auto_diag" by 1 the
       moment the error is detected.

       diag_verbose

        my $csv = Text::CSV->new ({ diag_verbose => 1 });
                $csv->diag_verbose (2);
        my $l = $csv->diag_verbose;

       Set the verbosity of the output triggered by "auto_diag". Currently
       only adds the current input-record-number (if known) to the
       diagnostic output with an indication of the position of the error.

       blank_is_undef

        my $csv = Text::CSV->new ({ blank_is_undef => 1 });
                $csv->blank_is_undef (0);
        my $f = $csv->blank_is_undef;

       Under normal circumstances, "CSV" data makes no distinction between
       quoted- and unquoted empty fields. These both end up in an empty
       string field once read, thus

        1,"",," ",2

       is read as

        ("1", "", "", " ", "2")

       When writing "CSV" files with either "always_quote" or "quote_empty"
       set, the unquoted empty field is the result of an undefined value.
       To enable this distinction when reading "CSV" data, the
       "blank_is_undef" attribute will cause unquoted empty fields to be set
       to "undef", causing the above to be parsed as

        ("1", "", undef, " ", "2")

       Note that this is specifically important when loading "CSV" fields
       into a database that allows "NULL" values, as the perl equivalent for
       "NULL" is "undef" in DBI land.

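       The distinction can be checked directly with the sample line used
       above. This sketch only prints which fields come back as "undef":

```perl
use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new ({ blank_is_undef => 1 })
    or die "" . Text::CSV->error_diag ();

# Field 2 is a quoted empty string, field 3 is an unquoted empty field
$csv->parse (q{1,"",," ",2}) or die "parse failed";
my @fields = $csv->fields;

# Show undefined fields explicitly
print join (", ", map { defined $_ ? qq{"$_"} : "undef" } @fields), "\n";
```
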
       empty_is_undef

        my $csv = Text::CSV->new ({ empty_is_undef => 1 });
                $csv->empty_is_undef (0);
        my $f = $csv->empty_is_undef;

       Going one step further than "blank_is_undef", this attribute
       converts all empty fields to "undef", so

        1,"",," ",2

       is read as

        (1, undef, undef, " ", 2)

       Note that this affects only fields that are originally empty, not
       fields that are empty after stripping allowed whitespace. YMMV.

       allow_whitespace

        my $csv = Text::CSV->new ({ allow_whitespace => 1 });
                $csv->allow_whitespace (0);
        my $f = $csv->allow_whitespace;

       When this option is set to true, the whitespace ("TAB"'s and
       "SPACE"'s) surrounding the separation character is removed when
       parsing. If either "TAB" or "SPACE" is one of the three characters
       "sep_char", "quote_char", or "escape_char" it will not be considered
       whitespace.

       Now lines like:

        1 , "foo" , bar , 3 , zapp

       are parsed as valid "CSV", even though it violates the "CSV" specs.

       Note that all whitespace is stripped from both start and end of
       each field. That would make it more than a feature to enable parsing
       bad "CSV" lines, as

        1,   2.0,  3,   ape  , monkey

       will now be parsed as

        ("1", "2.0", "3", "ape", "monkey")

       even if the original line was perfectly acceptable "CSV".

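       A minimal sketch of the stripping behavior, using the first sample
       line from above:

```perl
use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new ({ allow_whitespace => 1 })
    or die "" . Text::CSV->error_diag ();

# Whitespace around the separators and around the quoted field is dropped
$csv->parse (q{1 , "foo" , bar , 3 , zapp}) or die "parse failed";
my @fields = $csv->fields;

print join ("|", @fields), "\n";
```
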
       allow_loose_quotes

        my $csv = Text::CSV->new ({ allow_loose_quotes => 1 });
                $csv->allow_loose_quotes (0);
        my $f = $csv->allow_loose_quotes;

       By default, parsing unquoted fields containing "quote_char" characters
       like

        1,foo "bar" baz,42

       would result in parse error 2034. Though it is still bad practice to
       allow this format, we cannot help the fact that some vendors make
       their applications spit out lines styled this way.

       If there is really bad "CSV" data, like

        1,"foo "bar" baz",42

       or

        1,""foo bar baz"",42

       there is a way to get this data-line parsed and leave the quotes inside
       the quoted field as-is. This can be achieved by setting
       "allow_loose_quotes" AND making sure that the "escape_char" is not
       equal to "quote_char".

       allow_loose_escapes

        my $csv = Text::CSV->new ({ allow_loose_escapes => 1 });
                $csv->allow_loose_escapes (0);
        my $f = $csv->allow_loose_escapes;

       Parsing fields that have "escape_char" characters that escape
       characters that do not need to be escaped, like:

        my $csv = Text::CSV->new ({ escape_char => "\\" });
        $csv->parse (qq{1,"my bar\'s",baz,42});

       would result in parse error 2025. Though it is bad practice to allow
       this format, this attribute enables you to treat all escape character
       sequences equally.

       allow_unquoted_escape

        my $csv = Text::CSV->new ({ allow_unquoted_escape => 1 });
                $csv->allow_unquoted_escape (0);
        my $f = $csv->allow_unquoted_escape;

       A backward compatibility issue where "escape_char" differs from
       "quote_char" prevents "escape_char" from being in the first position
       of a field. If "quote_char" is equal to the default """ and
       "escape_char" is set to "\", this would be illegal:

        1,\0,2

       Setting this attribute to 1 might help to overcome issues with
       backward compatibility and allow this style.

       always_quote

        my $csv = Text::CSV->new ({ always_quote => 1 });
                $csv->always_quote (0);
        my $f = $csv->always_quote;

       By default the generated fields are quoted only if they need to be.
       For example, if they contain the separator character. If you set this
       attribute to 1 then all defined fields will be quoted. ("undef" fields
       are not quoted, see "blank_is_undef"). This makes it quite often easier
       to handle exported data in external applications.

       quote_space

        my $csv = Text::CSV->new ({ quote_space => 1 });
                $csv->quote_space (0);
        my $f = $csv->quote_space;

       By default, a space in a field would trigger quotation. As no rule
       exists that forces this in "CSV", nor any for the opposite, the
       default is true for safety. You can exclude the space from this
       trigger by setting this attribute to 0.

       quote_empty

        my $csv = Text::CSV->new ({ quote_empty => 1 });
                $csv->quote_empty (0);
        my $f = $csv->quote_empty;

       By default the generated fields are quoted only if they need to be.
       An empty (defined) field does not need quotation. If you set this
       attribute to 1 then empty defined fields will be quoted. ("undef"
       fields are not quoted, see "blank_is_undef"). See also "always_quote".

       quote_binary

        my $csv = Text::CSV->new ({ quote_binary => 1 });
                $csv->quote_binary (0);
        my $f = $csv->quote_binary;

       By default, all "unsafe" bytes inside a string cause the combined
       field to be quoted. By setting this attribute to 0, you can disable
       that trigger for bytes >= 0x7F.

       escape_null

        my $csv = Text::CSV->new ({ escape_null => 1 });
                $csv->escape_null (0);
        my $f = $csv->escape_null;

       By default, a "NULL" byte in a field would be escaped. This option
       enables you to treat the "NULL" byte as a simple binary character in
       binary mode ("{ binary => 1 }" is set). The default is true. You
       can prevent "NULL" escapes by setting this attribute to 0.

       When the "escape_char" attribute is set to undefined, this attribute
       will be set to false.

       The default setting will encode "=\x00=" as

        "="0="

       With "escape_null" reset (set to 0), this will result in

        "=\x00="

       The default when using the "csv" function is "false".

       For backward compatibility reasons, the deprecated old name
       "quote_null" is still recognized.

       keep_meta_info

        my $csv = Text::CSV->new ({ keep_meta_info => 1 });
                $csv->keep_meta_info (0);
        my $f = $csv->keep_meta_info;

       By default, the parsing of input records is as simple and fast as
       possible. However, some parsing information - like quotation of the
       original field - is lost in that process. Setting this flag to true
       enables retrieving that information after parsing with the methods
       "meta_info", "is_quoted", and "is_binary" described below. Default is
       false for performance.

       If you set this attribute to a value greater than 9, then you can
       control output quotation style like it was used in the input of the
       last parsed record (unless quotation was added because of other
       reasons).

        my $csv = Text::CSV->new ({
           binary         => 1,
           keep_meta_info => 1,
           quote_space    => 0,
           });

        # parse () returns a status; the fields come from fields ()
        $csv->parse (q{1,,"", ," ",f,"g","h""h",help,"help"});
        my @row = $csv->fields;

        $csv->print (*STDOUT, \@row);
        # 1,,, , ,f,g,"h""h",help,help
        $csv->keep_meta_info (11);
        $csv->print (*STDOUT, \@row);
        # 1,,"", ," ",f,"g","h""h",help,"help"

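       For the retrieval side, a short sketch using "is_quoted" (one of the
       methods named above) shows which fields of a parsed line were quoted
       in the input. The input line is illustrative:

```perl
use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new ({ keep_meta_info => 1 })
    or die "" . Text::CSV->error_diag ();

$csv->parse (q{plain,"quoted",42}) or die "parse failed";

# is_quoted () takes the zero-based index of the field
my @was_quoted = map { $csv->is_quoted ($_) ? 1 : 0 } 0 .. 2;

print "quoted flags: @was_quoted\n";    # quoted flags: 0 1 0
```
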
       undef_str

        my $csv = Text::CSV->new ({ undef_str => "\\N" });
                $csv->undef_str (undef);
        my $s = $csv->undef_str;

       This attribute optionally defines the output of undefined fields. The
       value passed is not changed at all, so if it needs quotation, the
       quotation needs to be included in the value of the attribute. Use with
       caution, as passing a value like ",",,,,""" will for sure mess up
       your output. The default for this attribute is "undef", meaning no
       special treatment.

       This attribute is useful when exporting CSV data to be imported in
       custom loaders, like for MySQL, that recognize special sequences for
       "NULL" data.

       This attribute has no meaning when parsing CSV data.

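       As an illustration, the sketch below writes an undefined field as
       "\N", the sequence used in the attribute example above. The record
       itself is made up:

```perl
use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new ({ undef_str => "\\N" })
    or die "" . Text::CSV->error_diag ();

# The undefined middle field is written as the literal two characters \N
$csv->combine (1, undef, "x") or die "combine failed";
my $line = $csv->string;

print "$line\n";                # 1,\N,x
```
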
       verbatim

        my $csv = Text::CSV->new ({ verbatim => 1 });
                $csv->verbatim (0);
        my $f = $csv->verbatim;

       This is a quite controversial attribute to set, but makes some hard
       things possible.

       The rationale behind this attribute is to tell the parser that the
       normally special characters newline ("NL") and Carriage Return ("CR")
       will not be special when this flag is set, and be dealt with as being
       ordinary binary characters. This will ease working with data with
       embedded newlines.

       When "verbatim" is used with "getline", "getline" auto-"chomp"'s
       every line.

       Imagine a file format like

        M^^Hans^Janssen^Klas 2\n2A^Ja^11-06-2007#\r\n

       where the line ending is a very specific "#\r\n", and the sep_char is
       a "^" (caret). None of the fields is quoted, but embedded binary
       data is likely to be present. With the specific line ending, this
       should not be too hard to detect.

       By default, Text::CSV's parse function is instructed to recognize only
       "\n" and "\r" as legal line endings, and so has to deal with
       the embedded newline as a real "end-of-line", so it can scan the next
       line if binary is true, and the newline is inside a quoted field. With
       this option, we tell "parse" to parse the line as if "\n" is just
       nothing more than a binary character.

       For "parse" this means that the parser has no more idea about line
       ending and "getline" "chomp"s line endings on reading.

       types

       A set of column types; the attribute is immediately passed to the
       "types" method.

       callbacks

       See the "Callbacks" section below.

       accessors

       To sum it up,

        $csv = Text::CSV->new ();

       is equivalent to

        $csv = Text::CSV->new ({
            eol                   => undef, # \r, \n, or \r\n
            sep_char              => ',',
            sep                   => undef,
            quote_char            => '"',
            quote                 => undef,
            escape_char           => '"',
            binary                => 0,
            decode_utf8           => 1,
            auto_diag             => 0,
            diag_verbose          => 0,
            blank_is_undef        => 0,
            empty_is_undef        => 0,
            allow_whitespace      => 0,
            allow_loose_quotes    => 0,
            allow_loose_escapes   => 0,
            allow_unquoted_escape => 0,
            always_quote          => 0,
            quote_empty           => 0,
            quote_space           => 1,
            escape_null           => 1,
            quote_binary          => 1,
            keep_meta_info        => 0,
            strict                => 0,
            formula               => 0,
            verbatim              => 0,
            undef_str             => undef,
            types                 => undef,
            callbacks             => undef,
            });

       For all of the above mentioned flags, an accessor method is available
       where you can query the current value, or change the value

        my $quote = $csv->quote_char;
        $csv->binary (1);

       It is not wise to change these settings halfway through writing "CSV"
       data to a stream. If however you want to create a new stream using the
       available "CSV" object, there is no harm in changing them.

       If the "new" constructor call fails, it returns "undef", and makes
       the reason for the failure available through the "error_diag" method.

        $csv = Text::CSV->new ({ ecs_char => 1 }) or
            die "".Text::CSV->error_diag ();

       "error_diag" will return a string like

        "INI - Unknown attribute 'ecs_char'"

   known_attributes
        @attr = Text::CSV->known_attributes;
        @attr = Text::CSV::known_attributes;
        @attr = $csv->known_attributes;

       This method will return an ordered list of all the supported
       attributes as described above. This can be useful for knowing what
       attributes are valid in classes that use or extend Text::CSV.

   print
        $status = $csv->print ($fh, $colref);

       Similar to "combine" + "string" + "print", but much more efficient.
       It expects an array ref as input (not an array!) and the resulting
       string is not really created, but immediately written to the $fh
       object, typically an IO handle or any other object that offers a
       "print" method.

       For performance reasons "print" does not create a result string, so
       all "string", "status", "fields", and "error_input" methods will return
       undefined information after executing this method.

       If $colref is "undef" (explicit, not through a variable argument) and
       "bind_columns" was used to specify fields to be printed, it is
       possible to make performance improvements, as otherwise data would have
       to be copied as arguments to the method call:

        $csv->bind_columns (\($foo, $bar));
        $status = $csv->print ($fh, undef);

       A short benchmark

        my @data = ("aa" .. "zz");
        $csv->bind_columns (\(@data));

        $csv->print ($fh, [ @data ]);   # 11800 recs/sec
        $csv->print ($fh,  \@data  );   # 57600 recs/sec
        $csv->print ($fh,   undef  );   # 48500 recs/sec

832   say
833        $status = $csv->say ($fh, $colref);
834
835       Like "print", but "eol" defaults to "$\".
836
837   print_hr
838        $csv->print_hr ($fh, $ref);
839
840       Provides an easy way  to print a  $ref  (as fetched with "getline_hr")
841       provided the column names are set with "column_names".
842
843       It is just a wrapper method with basic parameter checks over
844
845        $csv->print ($fh, [ map { $ref->{$_} } $csv->column_names ]);
846
847   combine
848        $status = $csv->combine (@fields);
849
850       This method constructs a "CSV" record from  @fields,  returning success
851       or failure.   Failure can result from lack of arguments or an argument
852       that contains an invalid character.   Upon success,  "string" can be
853       called to retrieve the resultant "CSV" string.  Upon failure,  the
854       value returned by "string" is undefined and "error_input" could be
855       called to retrieve the invalid argument.
856
857   string
858        $line = $csv->string ();
859
860       This method returns the input to  "parse"  or the resultant "CSV"
861       string of "combine", whichever was called more recently.
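
       As an illustration of "combine" followed by "string", the sketch
       below builds one record from a list of fields. The field values are
       made up; default quoting attributes are assumed.

```perl
use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new ({ binary => 1 });

# A plain field, a field with a space, a field with embedded quotes, a number
my @fields = ("a", "b c", 'say "hi"', 1.5);

$csv->combine (@fields)
    or die "combine failed on: " . $csv->error_input ();

# No "eol" is set, so the record has no trailing line ending
my $line = $csv->string ();
print "$line\n";
```

       With the default settings, fields containing spaces or quotes are
       quoted and embedded quotes are doubled.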
862
863   getline
864        $colref = $csv->getline ($fh);
865
866       This is the counterpart to  "print",  as "parse"  is the counterpart to
867       "combine":  it parses a row from the $fh  handle using the "getline"
868       method associated with $fh  and parses this row into an array ref.
869       This array ref is returned by the function or "undef" for failure.
870       When $fh does not support "getline", you are likely to hit errors.
871
872       When fields are bound with "bind_columns" the return value is a
873       reference to an empty list.
874
875       The "string", "fields", and "status" methods are meaningless again.
876
877   getline_all
878        $arrayref = $csv->getline_all ($fh);
879        $arrayref = $csv->getline_all ($fh, $offset);
880        $arrayref = $csv->getline_all ($fh, $offset, $length);
881
882       This will return a reference to a list of getline ($fh) results.  In
883       this call, "keep_meta_info" is disabled.  If $offset is negative, as
884       with "splice", only the last  "abs ($offset)" records of $fh are taken
885       into consideration.
886
887       Given a CSV file with 10 lines:
888
889        lines call
890        ----- ---------------------------------------------------------
891        0..9  $csv->getline_all ($fh)         # all
892        0..9  $csv->getline_all ($fh,  0)     # all
893        8..9  $csv->getline_all ($fh,  8)     # start at 8
894        -     $csv->getline_all ($fh,  0,  0) # start at 0 first 0 rows
895        0..4  $csv->getline_all ($fh,  0,  5) # start at 0 first 5 rows
896        4..5  $csv->getline_all ($fh,  4,  2) # start at 4 first 2 rows
897        8..9  $csv->getline_all ($fh, -2)     # last 2 rows
898        6..7  $csv->getline_all ($fh, -4,  2) # first 2 of last  4 rows
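
       The table above can be tried out with an in-memory file of ten
       single-field records. This is a sketch; the helper "rows_for" and
       the "rowN" values are hypothetical names used only for this
       example.

```perl
use strict;
use warnings;
use Text::CSV;

# Ten records, "row0" .. "row9", one field each
my $data = join "", map { "row$_\n" } 0 .. 9;
my $csv  = Text::CSV->new ({ binary => 1, auto_diag => 1 });

# Re-open the in-memory file for each call so every call starts at record 0
sub rows_for {
    my @range = @_;    # optional ($offset, $length)
    open my $fh, "<", \$data or die "in-memory open: $!";
    my $aoa = $csv->getline_all ($fh, @range);
    close $fh;
    return join ",", map { $_->[0] } @$aoa;
    }

my $middle = rows_for (4, 2);     # start at 4, first 2 rows
my $tail   = rows_for (-4, 2);    # first 2 of the last 4 rows
my $all    = rows_for ();         # everything
print "$middle\n$tail\n$all\n";
```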
899
900   getline_hr
901       The "getline_hr" and "column_names" methods work together  to allow you
902       to have rows returned as hashrefs.  You must call "column_names" first
903       to declare your column names.
904
905        $csv->column_names (qw( code name price description ));
906        $hr = $csv->getline_hr ($fh);
907        print "Price for $hr->{name} is $hr->{price} EUR\n";
908
909       "getline_hr" will croak if called before "column_names".
910
911       Note that  "getline_hr"  creates a hashref for every row and will be
912       much slower than the combined use of "bind_columns"  and "getline" but
913       still offering the same ease of use hashref inside the loop:
914
915        my @cols = @{$csv->getline ($fh)};
916        $csv->column_names (@cols);
917        while (my $row = $csv->getline_hr ($fh)) {
918            print $row->{price};
919            }
920
921       Could easily be rewritten to the much faster:
922
923        my @cols = @{$csv->getline ($fh)};
924        my $row = {};
925        $csv->bind_columns (\@{$row}{@cols});
926        while ($csv->getline ($fh)) {
927            print $row->{price};
928            }
929
930       Your mileage may vary for the size of the data and the number of rows.
931       With perl-5.14.2 the comparison for a 100_000 line file with 14 columns:
932
933                   Rate hashrefs getlines
934        hashrefs 1.00/s       --     -76%
935        getlines 4.15/s     313%       --
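
       A self-contained sketch of the faster "bind_columns" variant,
       reading from an in-memory file. The sample data and the $total
       accumulator are assumptions made for the example.

```perl
use strict;
use warnings;
use Text::CSV;

my $data = "code,name,price\n1,apple,0.50\n2,pear,0.75\n";
open my $fh, "<", \$data or die "in-memory open: $!";

my $csv  = Text::CSV->new ({ binary => 1, auto_diag => 1 });

# The first record supplies the hash keys
my @cols = @{$csv->getline ($fh)};

# Bind every hash value to a column; getline then fills $row in place
my $row = {};
$csv->bind_columns (\@{$row}{@cols});

my $total = 0;
while ($csv->getline ($fh)) {
    $total += $row->{price};
    }
close $fh;
print "Total: $total\n";
```

       The same hash is reused for every record, so copy any values that
       need to outlive the current loop iteration.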
936
937   getline_hr_all
938        $arrayref = $csv->getline_hr_all ($fh);
939        $arrayref = $csv->getline_hr_all ($fh, $offset);
940        $arrayref = $csv->getline_hr_all ($fh, $offset, $length);
941
942       This will return a reference to a list of   getline_hr ($fh) results.
943       In this call, "keep_meta_info" is disabled.
944
945   parse
946        $status = $csv->parse ($line);
947
948       This method decomposes a  "CSV"  string into fields,  returning success
949       or failure.   Failure can result from a lack of argument  or from an
950       improperly formatted "CSV" string.   Upon success, "fields" can be
951       called to retrieve the decomposed fields. Upon failure calling "fields"
952       will return undefined data and  "error_input"  can be called to
953       retrieve  the invalid argument.
954
955       You may use the "types"  method for setting column types.  See the
956       description of "types" below.
957
958       The $line argument is supposed to be a simple scalar. Everything else
959       is supposed to croak and set error 1500.
960
961   fragment
962       This function tries to implement RFC7111  (URI Fragment Identifiers for
963       the text/csv Media Type) - http://tools.ietf.org/html/rfc7111
964
965        my $AoA = $csv->fragment ($fh, $spec);
966
967       In specifications,  "*" is used to specify the last item, a dash ("-")
968       to indicate a range.   All indices are 1-based:  the first row or
969       column has index 1. Selections can be combined with the semi-colon
970       (";").
971
972       When using this method in combination with  "column_names",  the
973       returned reference  will point to a  list of hashes  instead of a  list
974       of lists.  A disjointed  cell-based combined selection  might return
975       rows with different number of columns making the use of hashes
976       unpredictable.
977
978        $csv->column_names ("Name", "Age");
979        my $AoH = $csv->fragment ($fh, "col=3;8");
980
981       If the "after_parse" callback is active,  it is also called on every
982       line parsed and skipped before the fragment.
983
984       row
985          row=4
986          row=5-7
987          row=6-*
988          row=1-2;4;6-*
989
990       col
991          col=2
992          col=1-3
993          col=4-*
994          col=1-2;4;7-*
995
996       cell
997         In cell-based selection, the comma (",") is used to pair row and
998         column
999
1000          cell=4,1
1001
1002         The range operator ("-") using "cell"s can be used to define top-left
1003         and bottom-right "cell" location
1004
1005          cell=3,1-4,6
1006
1007         The "*" is only allowed in the second part of a pair
1008
1009          cell=3,2-*,2    # row 3 till end, only column 2
1010          cell=3,2-3,*    # column 2 till end, only row 3
1011          cell=3,2-*,*    # strip row 1 and 2, and column 1
1012
1013         Cells and cell ranges may be combined with ";", possibly resulting in
1014         rows with different number of columns
1015
1016          cell=1,1-2,2;3,3-4,4;1,4;4,1
1017
1018         Disjointed selections will only return selected cells.   The cells
1019         that are not  specified  will  not  be  included  in the  returned
1020         set,  not even as "undef".  As an example given a "CSV" like
1021
1022          11,12,13,...19
1023          21,22,...28,29
1024          :            :
1025          91,...97,98,99
1026
1027         with "cell=1,1-2,2;3,3-4,4;1,4;4,1" will return:
1028
1029          11,12,14
1030          21,22
1031          33,34
1032          41,43,44
1033
1034         Overlapping cell-specs will return those cells only once, so
1035         "cell=1,1-3,3;2,2-4,4;2,3;4,2" will return:
1036
1037          11,12,13
1038          21,22,23,24
1039          31,32,33,34
1040          42,43,44
1041
1042       RFC7111 <http://tools.ietf.org/html/rfc7111> does  not  allow different
1043       types of specs to be combined   (either "row" or "col" or "cell").
1044       Passing an invalid fragment specification will croak and set error
1045       2013.
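
       A small sketch of the three kinds of specification, applied to a
       3x3 grid held in memory. The helper "frag" is a hypothetical
       wrapper that re-opens the data for each call and flattens the
       result for printing.

```perl
use strict;
use warnings;
use Text::CSV;

my $data = "11,12,13\n21,22,23\n31,32,33\n";
my $csv  = Text::CSV->new ({ binary => 1, auto_diag => 1 });

# Apply one fragment spec to a fresh handle; join rows with "|"
sub frag {
    my ($spec) = @_;
    open my $fh, "<", \$data or die "in-memory open: $!";
    my $aoa = $csv->fragment ($fh, $spec);
    close $fh;
    return join "|", map { join ",", @$_ } @$aoa;
    }

my $rows  = frag ("row=2-3");         # rows 2 and 3
my $cols  = frag ("col=1;3");         # columns 1 and 3 of every row
my $cells = frag ("cell=1,1-2,2");    # top-left 2x2 block
print "$rows\n$cols\n$cells\n";
```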
1046
1047   column_names
1048       Set the "keys" that will be used in the  "getline_hr"  calls.  If no
1049       keys (column names) are passed, it will return the current setting as a
1050       list.
1051
1052       "column_names" accepts a list of scalars  (the column names)  or a
1053       single array_ref, so you can pass the return value from "getline" too:
1054
1055        $csv->column_names ($csv->getline ($fh));
1056
1057       "column_names" does no checking on duplicates at all, which might lead
1058       to unexpected results.   Undefined entries will be replaced with the
1059       string "\cAUNDEF\cA", so
1060
1061        $csv->column_names (undef, "", "name", "name");
1062        $hr = $csv->getline_hr ($fh);
1063
1064       Will set "$hr->{"\cAUNDEF\cA"}" to the 1st field,  "$hr->{""}" to the
1065       2nd field, and "$hr->{name}" to the 4th field,  discarding the 3rd
1066       field.
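
       The effect of undefined and duplicate names can be checked with a
       single four-field record. This is only a sketch; the record
       "a,b,c,d" is made up for the example.

```perl
use strict;
use warnings;
use Text::CSV;

my $data = "a,b,c,d\n";
open my $fh, "<", \$data or die "in-memory open: $!";

my $csv = Text::CSV->new ({ binary => 1, auto_diag => 1 });

# One undefined name, one empty name, and a duplicate
$csv->column_names (undef, "", "name", "name");
my $hr = $csv->getline_hr ($fh);
close $fh;

# The duplicate key keeps the value of the last matching field
printf "%d keys, name => %s\n", scalar keys %$hr, $hr->{name};
```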
1067
1068       "column_names" croaks on invalid arguments.
1069
1070   header
1071       This method does NOT work in perl-5.6.x
1072
1073       Parse the CSV header and set "sep", column_names and encoding.
1074
1075        my @hdr = $csv->header ($fh);
1076        $csv->header ($fh, { sep_set => [ ";", ",", "|", "\t" ] });
1077        $csv->header ($fh, { detect_bom => 1, munge_column_names => "lc" });
1078
1079       The first argument should be a file handle.
1080
1081       This method resets some object properties,  as it is supposed to be
1082       invoked only once per file or stream.  It will leave attributes
1083       "column_names" and "bound_columns" alone if setting column names is
1084       disabled. Reading headers on previously processed objects might fail
1085       on perl-5.8.0 and older.
1086
1087       Assuming that the file opened for parsing has a header, and the header
1088       does not contain problematic characters like embedded newlines,   read
1089       the first line from the open handle then auto-detect whether the header
1090       separates the column names with a character from the allowed separator
1091       list.
1092
1093       If any of the allowed separators matches,  and none of the other
1094       allowed separators match,  set  "sep"  to that  separator  for the
1095       current CSV instance and use it to parse the first line, map those to
1096       lowercase, and use that to set the instance "column_names":
1097
1098        my $csv = Text::CSV->new ({ binary => 1, auto_diag => 1 });
1099        open my $fh, "<", "file.csv";
1100        binmode $fh; # for Windows
1101        $csv->header ($fh);
1102        while (my $row = $csv->getline_hr ($fh)) {
1103            ...
1104            }
1105
1106       If the header is empty,  contains more than one unique separator out of
1107       the allowed set,  contains empty fields,   or contains identical fields
1108       (after folding), it will croak with error 1010, 1011, 1012, or 1013
1109       respectively.
1110
1111       If the header contains embedded newlines or is not valid  CSV  in any
1112       other way, this method will croak and leave the parse error untouched.
1113
1114       A successful call to "header"  will always set the  "sep"  of the $csv
1115       object. This behavior can not be disabled.
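
       A sketch of separator detection on an in-memory file whose header
       uses ";". The sample data is an assumption; the default "sep_set"
       and the default lower-casing of column names are relied upon.

```perl
use strict;
use warnings;
use Text::CSV;

my $data = "Code;Name;Price\n1;Widget;9.95\n";
open my $fh, "<", \$data or die "in-memory open: $!";

my $csv = Text::CSV->new ({ binary => 1, auto_diag => 1 });

# List context: returns the (munged) header fields
my @hdr = $csv->header ($fh);

# header has set the separator and the column names on the instance
my $sep = $csv->sep_char;
my $row = $csv->getline_hr ($fh);
close $fh;

print "Separator '$sep', columns: @hdr\n";
print "$row->{name} costs $row->{price}\n";
```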
1116
1117       return value
1118
1119       On error this method will croak.
1120
1121       In list context,  the headers will be returned whether they are used to
1122       set "column_names" or not.
1123
1124       In scalar context, the instance itself is returned.  Note: the values
1125       as found in the header will effectively be  lost if  "set_column_names"
1126       is false.
1127
1128       Options
1129
1130       sep_set
1131          $csv->header ($fh, { sep_set => [ ";", ",", "|", "\t" ] });
1132
1133         The list of legal separators defaults to "[ ";", "," ]" and can be
1134         changed by this option.  As this is probably the most often used
1135         option,  it can be passed on its own as an unnamed argument:
1136
1137          $csv->header ($fh, [ ";", ",", "|", "\t", "::", "\x{2063}" ]);
1138
1139         Multi-byte  sequences are allowed,  both multi-character and
1140         Unicode.  See "sep".
1141
1142       detect_bom
1143          $csv->header ($fh, { detect_bom => 1 });
1144
1145         The default behavior is to detect if the header line starts with a
1146         BOM.  If the header has a BOM, use that to set the encoding of $fh.
1147         This default behavior can be disabled by passing a false value to
1148         "detect_bom".
1149
1150         Supported encodings from BOM are: UTF-8, UTF-16BE, UTF-16LE,
1151         UTF-32BE,  and UTF-32LE. BOMs also exist for UTF-1, UTF-EBCDIC, SCSU,
1152         BOCU-1,  and GB-18030, but Encode does not (yet) support those. UTF-7
1153         is not supported.
1154
1155         If a supported BOM was detected as start of the stream, it is stored
1156         in the object attribute "ENCODING".
1157
1158          my $enc = $csv->{ENCODING};
1159
1160         The encoding is used with "binmode" on $fh.
1161
1162         If the handle was opened in a (correct) encoding,  this method will
1163         not alter the encoding, as it checks the leading bytes of the first
1164         line. In case the stream starts with a decoded BOM ("U+FEFF"),
1165         "{ENCODING}" will be "" (empty) instead of the default "undef".
1166
1167       munge_column_names
1168         This option offers the means to modify the column names into
1169         something that is most useful to the application.   The default is to
1170         map all column names to lower case.
1171
1172          $csv->header ($fh, { munge_column_names => "lc" });
1173
1174         The following values are available:
1175
1176           lc     - lower case
1177           uc     - upper case
1178           none   - do not change
1179           \%hash - supply a mapping
1180           \&cb   - supply a callback
1181
1182         Literal:
1183
1184          $csv->header ($fh, { munge_column_names => "none" });
1185
1186         Hash:
1187
1188          $csv->header ($fh, { munge_column_names => { foo => "sombrero" }});
1189
1190         If a value does not exist, the original value is used unchanged.
1191
1192         Callback:
1193
1194          $csv->header ($fh, { munge_column_names => sub { fc } });
1195          $csv->header ($fh, { munge_column_names => sub { "column_".$col++ } });
1196          $csv->header ($fh, { munge_column_names => sub { lc (s/\W+/_/gr) } });
1197
1198         As this callback is called in a "map", you can use $_ directly.
1199
1200       set_column_names
1201          $csv->header ($fh, { set_column_names => 1 });
1202
1203         The default is to set the instance's column names using
1204         "column_names" if the method is successful,  so subsequent calls to
1205         "getline_hr" can return a hash. Disabling setting the header can be
1206         forced by using a false value for this option.
1207
1208         As described in "return value" above, content is lost in scalar
1209         context.
1210
1211       Validation
1212
1213       When receiving CSV files from external sources,  this method can be
1214       used to protect against changes in the layout by restricting to known
1215       headers  (and typos in the header fields).
1216
1217        my %known = (
1218            "record key" => "c_rec",
1219            "rec id"     => "c_rec",
1220            "id_rec"     => "c_rec",
1221            "kode"       => "code",
1222            "code"       => "code",
1223            "vaule"      => "value",
1224            "value"      => "value",
1225            );
1226        my $csv = Text::CSV->new ({ binary => 1, auto_diag => 1 });
1227        open my $fh, "<", $source or die "$source: $!";
1228        $csv->header ($fh, { munge_column_names => sub {
1229            s/\s+$//;
1230            s/^\s+//;
1231            $known{lc $_} or die "Unknown column '$_' in $source";
1232            }});
1233        while (my $row = $csv->getline_hr ($fh)) {
1234            say join "\t", $row->{c_rec}, $row->{code}, $row->{value};
1235            }
1236
1237   bind_columns
1238       Takes a list of scalar references to be used for output with  "print"
1239       or to store the fields fetched by "getline".  When you do not pass
1240       enough references to store the fetched fields in, "getline" will fail
1241       with error 3006.  If you pass more than there are fields to return,
1242       the content of the remaining references is left untouched.
1243
1244        $csv->bind_columns (\$code, \$name, \$price, \$description);
1245        while ($csv->getline ($fh)) {
1246            print "The price of a $name is \x{20ac} $price\n";
1247            }
1248
1249       To reset or clear all column binding, call "bind_columns" with the
1250       single argument "undef". This will also clear column names.
1251
1252        $csv->bind_columns (undef);
1253
1254       If no arguments are passed at all, "bind_columns" will return the list
1255       of current bindings or "undef" if no binds are active.
1256
1257       Note that in parsing with  "bind_columns",  the fields are set on the
1258       fly.  That implies that if the third field of a row causes an error
1259       (or this row has just two fields where the previous row had more),  the
1260       first two fields already have been assigned the values of the current
1261       row, while the rest of the fields will still hold the values of the
1262       previous row.  If you want the parser to fail in these cases, use the
1263       "strict" attribute.
1264
1265   eof
1266        $eof = $csv->eof ();
1267
1268       If "parse" or  "getline"  was used with an IO stream,  this method will
1269       return true (1) if the last call hit end of file,  otherwise it will
1270       return false ('').  This is useful to see the difference between a
1271       failure and end of file.
1272
1273       Note that if the parsing of the last line caused an error,  "eof" is
1274       still true.  That means that if you are not using "auto_diag", an idiom
1275       like
1276
1277        while (my $row = $csv->getline ($fh)) {
1278            # ...
1279            }
1280        $csv->eof or $csv->error_diag;
1281
1282       will not report the error. You would have to change that to
1283
1284        while (my $row = $csv->getline ($fh)) {
1285            # ...
1286            }
1287        +$csv->error_diag and $csv->error_diag;
1288
1289   types
1290        $csv->types (\@tref);
1291
1292       This method is used to force that  (all)  columns are of a given type.
1293       For example, if you have an integer column,  two  columns  with
1294       doubles  and a string column, then you might do a
1295
1296        $csv->types ([Text::CSV::IV (),
1297                      Text::CSV::NV (),
1298                      Text::CSV::NV (),
1299                      Text::CSV::PV ()]);
1300
1301       Column types are used only for decoding columns while parsing,  in
1302       other words by the "parse" and "getline" methods.
1303
1304       You can unset column types by doing a
1305
1306        $csv->types (undef);
1307
1308       or fetch the current type settings with
1309
1310        $types = $csv->types ();
1311
1312       IV  Set field type to integer.
1313
1314       NV  Set field type to numeric/float.
1315
1316       PV  Set field type to string.
1317
1318   fields
1319        @columns = $csv->fields ();
1320
1321       This method returns the input to   "combine"  or the resultant
1322       decomposed fields of a successful "parse", whichever was called more
1323       recently.
1324
1325       Note that the return value is undefined after using "getline", which
1326       does not fill the data structures returned by "parse".
1327
1328   meta_info
1329        @flags = $csv->meta_info ();
1330
1331       This method returns the "flags" of the input to "combine" or the flags
1332       of the resultant  decomposed fields of  "parse",   whichever was called
1333       more recently.
1334
1335       For each field,  a meta_info field will hold  flags that  inform
1336       something about  the  field  returned  by  the  "fields"  method or
1337       passed to  the "combine" method. The flags are bit-wise-"or"'d like:
1338
1339       0x0001
1340         The field was quoted.
1341
1342       0x0002
1343         The field was binary.
1344
1345       See the "is_***" methods below.
1346
1347   is_quoted
1348        my $quoted = $csv->is_quoted ($column_idx);
1349
1350       Where  $column_idx is the  (zero-based)  index of the column in the
1351       last result of "parse".
1352
1353       This returns a true value  if the data in the indicated column was
1354       enclosed in "quote_char" quotes.  This might be important for fields
1355       where content ",20070108," is to be treated as a numeric value,  and
1356       where ","20070108"," is explicitly marked as character string data.
1357
1358       This method is only valid when "keep_meta_info" is set to a true value.
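
       A short sketch that distinguishes a quoted from an unquoted field
       after "parse". The input line is made up for the example.

```perl
use strict;
use warnings;
use Text::CSV;

# Meta information is only recorded when keep_meta_info is true
my $csv = Text::CSV->new ({ binary => 1, keep_meta_info => 1 });

$csv->parse (q{20070108,"20070108",plain})
    or die "parse failed: " . $csv->error_diag ();

# Normalize the flags to 0/1 for each of the three columns
my @quoted = map { $csv->is_quoted ($_) ? 1 : 0 } 0 .. 2;
print "quoted flags: @quoted\n";
```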
1359
1360   is_binary
1361        my $binary = $csv->is_binary ($column_idx);
1362
1363       Where  $column_idx is the  (zero-based)  index of the column in the
1364       last result of "parse".
1365
1366       This returns a true value if the data in the indicated column contained
1367       any byte in the range "[\x00-\x08,\x10-\x1F,\x7F-\xFF]".
1368
1369       This method is only valid when "keep_meta_info" is set to a true value.
1370
1371   is_missing
1372        my $missing = $csv->is_missing ($column_idx);
1373
1374       Where  $column_idx is the  (zero-based)  index of the column in the
1375       last result of "getline_hr".
1376
1377        $csv->keep_meta_info (1);
1378        while (my $hr = $csv->getline_hr ($fh)) {
1379            $csv->is_missing (0) and next; # This was an empty line
1380            }
1381
1382       When using  "getline_hr",  it is impossible to tell if the  parsed
1383       fields are "undef" because they were not filled in the "CSV" stream
1384       or because they were not read at all, as all the fields defined by
1385       "column_names" are set in the hash-ref.    If you still need to know if
1386       all fields in each row are provided, you should enable "keep_meta_info"
1387       so you can check the flags.
1388
1389       If  "keep_meta_info"  is "false",  "is_missing"  will always return
1390       "undef", regardless of $column_idx being valid or not. If this
1391       attribute is "true" it will return either 0 (the field is present) or 1
1392       (the field is missing).
1393
1394       A special case is the empty line.  If the line is completely empty -
1395       after dealing with the flags - this is still a valid CSV line:  it is a
1396       record of just one single empty field. However, if "keep_meta_info" is
1397       set, invoking "is_missing" with index 0 will now return true.
1398
1399   status
1400        $status = $csv->status ();
1401
1402       This method returns the status of the last invoked "combine" or "parse"
1403       call. Status is success (true: 1) or failure (false: "undef" or 0).
1404
1405   error_input
1406        $bad_argument = $csv->error_input ();
1407
1408       This method returns the erroneous argument (if it exists) of "combine"
1409       or "parse",  whichever was called more recently.  If the last
1410       invocation was successful, "error_input" will return "undef".
1411
1412   error_diag
1413        Text::CSV->error_diag ();
1414        $csv->error_diag ();
1415        $error_code               = 0  + $csv->error_diag ();
1416        $error_str                = "" . $csv->error_diag ();
1417        ($cde, $str, $pos, $rec, $fld) = $csv->error_diag ();
1418
1419       If (and only if) an error occurred,  this function returns  the
1420       diagnostics of that error.
1421
1422       If called in void context,  this will print the internal error code and
1423       the associated error message to STDERR.
1424
1425       If called in list context,  this will return  the error code  and the
1426       error message in that order.  If the last error was from parsing, the
1427       rest of the values returned are a best guess at the location  within
1428       the line  that was being parsed. Their values are 1-based.  The
1429       position currently is index of the byte at which the parsing failed in
1430       the current record. It might change to be the index of the current
1431       character in a later release. The record number is the index of the record
1432       parsed by the csv instance. The field number is the index of the field
1433       the parser thinks it is currently  trying to  parse. See
1434       examples/csv-check for how this can be used.
1435
1436       If called in  scalar context,  it will return  the diagnostics  in a
1437       single scalar, a-la $!.  It will contain the error code in numeric
1438       context, and the diagnostics message in string context.
1439
1440       When called as a class method or a  direct function call,  the
1441       diagnostics are that of the last "new" call.
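
       A sketch of inspecting a parse failure in list context. The input
       line has an unterminated quoted field; "auto_diag" is left off so
       the error can be examined by hand. Variable names are illustrative.

```perl
use strict;
use warnings;
use Text::CSV;

my $csv  = Text::CSV->new ();
my $line = q{a,"b};    # the closing quote is missing

my $ok = $csv->parse ($line);

# List context: code, message, and a best guess at the location
my ($code, $msg, $pos, $rec, $fld) = $csv->error_diag ();

# The argument that caused the failure
my $bad = $csv->error_input ();

print "Error $code: $msg (input: $bad)\n" unless $ok;
```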
1442
1443   record_number
1444        $recno = $csv->record_number ();
1445
1446       Returns the number of records parsed by this csv instance.  This value
1447       should be more accurate than $. when embedded newlines come into play.
1448       Records written by this instance are not counted.
1449
1450   SetDiag
1451        $csv->SetDiag (0);
1452
1453       Use to reset the diagnostics if you are dealing with errors.
1454

ADDITIONAL METHODS

1456       backend
1457           Returns the backend module name called by Text::CSV.  "module" is
1458           an alias.
1459
1460       is_xs
1461           Returns true value if Text::CSV uses an XS backend.
1462
1463       is_pp
1464           Returns true value if Text::CSV uses a pure-Perl backend.
1465

FUNCTIONS

1467       This section is also taken from Text::CSV_XS.
1468
1469   csv
1470       This function is not exported by default and should be explicitly
1471       requested:
1472
1473        use Text::CSV qw( csv );
1474
1475       This is a high-level function that aims at simple (user) interfaces.
1476       This can be used to read/parse a "CSV" file or stream (the default
1477       behavior) or to produce a file or write to a stream (define the  "out"
1478       attribute).  It returns an array- or hash-reference on parsing (or
1479       "undef" on fail) or the numeric value of  "error_diag"  on writing.
1480       When this function fails you can get to the error using the class call
1481       to "error_diag":
1482
1483        my $aoa = csv (in => "test.csv") or
1484            die Text::CSV->error_diag;
1485
1486       This function takes the arguments as key-value pairs. This can be
1487       passed as a list or as an anonymous hash:
1488
1489        my $aoa = csv (  in => "test.csv", sep_char => ";");
1490        my $aoh = csv ({ in => $fh, headers => "auto" });
1491
1492       The arguments passed consist of two parts:  the arguments to "csv"
1493       itself and the optional attributes to the  "CSV"  object used inside
1494       the function as enumerated and explained in "new".
1495
1496       If not overridden, the default options used for CSV are
1497
1498        auto_diag   => 1
1499        escape_null => 0
1500
1501       The option that is always set and cannot be altered is
1502
1503        binary      => 1
1504
1505       As this function will likely be used in one-liners,  it allows  "quote"
1506       to be abbreviated as "quo",  and  "escape_char" to be abbreviated as
1507       "esc" or "escape".
1508
1509       Alternative invocations:
1510
1511        my $aoa = Text::CSV::csv (in => "file.csv");
1512
1513        my $csv = Text::CSV->new ();
1514        my $aoa = $csv->csv (in => "file.csv");
1515
1516       In the latter case, the object attributes are used from the existing
1517       object and the attribute arguments in the function call are ignored:
1518
1519        my $csv = Text::CSV->new ({ sep_char => ";" });
1520        my $aoh = $csv->csv (in => "file.csv", bom => 1);
1521
1522       will parse using ";" as "sep_char", not ",".
1523
1524       in
1525
1526       Used to specify the source.  "in" can be a file name (e.g. "file.csv"),
1527       which will be  opened for reading  and closed when finished,  a file
1528       handle (e.g.  $fh or "FH"),  a reference to a glob (e.g. "\*ARGV"),
1529       the glob itself (e.g. *STDIN), or a reference to a scalar (e.g.
1530       "\q{1,2,"csv"}").
1531
1532       When used with "out", "in" should be a reference to a CSV structure
1533       (AoA or AoH)  or a CODE-ref that returns an array-reference or a hash-
1534       reference.  The code-ref will be invoked with no arguments.
1535
1536        my $aoa = csv (in => "file.csv");
1537
1538        open my $fh, "<", "file.csv";
1539        my $aoa = csv (in => $fh);
1540
1541        my $csv = [ [qw( Foo Bar )], [ 1, 2 ], [ 2, 3 ]];
1542        my $err = csv (in => $csv, out => "file.csv");
1543
1544       If called in void context without the "out" attribute, the resulting
1545       ref will be used as input to a subsequent call to csv:
1546
1547        csv (in => "file.csv", filter => { 2 => sub { length > 2 }})
1548
1549       will be a shortcut to
1550
1551        csv (in => csv (in => "file.csv", filter => { 2 => sub { length > 2 }}))
1552
1553       where, in the absence of the "out" attribute, this is a shortcut to
1554
1555        csv (in  => csv (in => "file.csv", filter => { 2 => sub { length > 2 }}),
1556             out => *STDOUT)
1557
1558       out
1559
1560        csv (in => $aoa, out => "file.csv");
1561        csv (in => $aoa, out => $fh);
1562        csv (in => $aoa, out =>   STDOUT);
1563        csv (in => $aoa, out =>  *STDOUT);
1564        csv (in => $aoa, out => \*STDOUT);
1565        csv (in => $aoa, out => \my $data);
1566        csv (in => $aoa, out =>  undef);
1567        csv (in => $aoa, out => \"skip");
1568
1569       In output mode, the default CSV options when producing CSV are
1570
1571        eol       => "\r\n"
1572
1573       The "fragment" attribute is ignored in output mode.
1574
1575       "out" can be a file name  (e.g.  "file.csv"),  which will be opened for
1576       writing and closed when finished,  a file handle (e.g. $fh or "FH"),  a
1577       reference to a glob (e.g. "\*STDOUT"),  the glob itself (e.g. *STDOUT),
1578       or a reference to a scalar (e.g. "\my $data").
1579
1580        csv (in => sub { $sth->fetch },            out => "dump.csv");
1581        csv (in => sub { $sth->fetchrow_hashref }, out => "dump.csv",
1582             headers => $sth->{NAME_lc});
1583
1584       When a code-ref is used for "in", the output is generated  per
1585       invocation, so no buffering is involved. This implies that there is no
1586       size restriction on the number of records. The "csv" function ends when
1587       the coderef returns a false value.
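
       As a minimal sketch of the streaming behaviour described above, the
       following feeds rows from a code-ref into an in-memory scalar. The
       @queue array and the $data variable are illustrative stand-ins (not
       part of the module) for a real row source such as a DBI statement
       handle.

```perl
use strict;
use warnings;
use Text::CSV qw( csv );

# Hypothetical row source; in practice this could be $sth->fetch
my @queue = (
    [ "code", "product" ],
    [ 1,      "pc"      ],
    [ 2,      "mouse"   ],
    );

# Each call hands back one row; shift on an empty array returns undef,
# which is false and therefore ends the csv () call
csv (in => sub { shift @queue }, out => \my $data);

print $data;
```

       Because every row is printed as soon as the code-ref returns it, the
       full data set never has to be held in memory by "csv" itself.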
1588
1589       If "out" is set to a reference to the literal string "skip", the output
1590       will be suppressed completely,  which might be useful in combination
1591       with a filter for side effects only.
1592
1593        my %cache;
1594        csv (in    => "dump.csv",
1595             out   => \"skip",
1596             on_in => sub { $cache{$_[1][1]}++ });
1597
1598       Currently,  setting "out" to any false value  ("undef", "", 0) will be
1599       equivalent to "\"skip"".
1600
1601       encoding
1602
1603       If passed,  it should be an encoding accepted by the  ":encoding()"
1604       option to "open". There is no default value. This attribute does not
1605       work in perl 5.6.x.  "encoding" can be abbreviated to "enc" for ease of
1606       use in command line invocations.
1607
1608       If "encoding" is set to the literal value "auto", the method "header"
1609       will be invoked on the opened stream to check if there is a BOM and set
1610       the encoding accordingly.   This is equal to passing a true value in
1611       the option "detect_bom".
1612
1613       detect_bom
1614
1615       If  "detect_bom"  is given, the method  "header"  will be invoked on
1616       the opened stream to check if there is a BOM and set the encoding
1617       accordingly.
1618
1619       "detect_bom" can be abbreviated to "bom".
1620
1621       This is the same as setting "encoding" to "auto".
1622
1623       Note that as the method  "header" is invoked,  its default is to also
1624       set the headers.
1625
1626       headers
1627
1628       If this attribute is not given, the default behavior is to produce an
1629       array of arrays.
1630
1631       If "headers" is supplied,  it should be an anonymous list of column
1632       names, an anonymous hashref, a coderef, or a literal flag:  "auto",
1633       "lc", "uc", or "skip".
1634
1635       skip
1636         When "skip" is used, the header will not be included in the output.
1637
1638          my $aoa = csv (in => $fh, headers => "skip");
1639
1640       auto
1641         If "auto" is used, the first line of the "CSV" source will be read as
1642         the list of field headers and used to produce an array of hashes.
1643
1644          my $aoh = csv (in => $fh, headers => "auto");
1645
1646       lc
1647         If "lc" is used,  the first line of the  "CSV" source will be read as
1648         the list of field headers mapped to  lower case and used to produce
1649         an array of hashes. This is a variation of "auto".
1650
1651          my $aoh = csv (in => $fh, headers => "lc");
1652
1653       uc
1654         If "uc" is used,  the first line of the  "CSV" source will be read as
1655         the list of field headers mapped to  upper case and used to produce
1656         an array of hashes. This is a variation of "auto".
1657
1658          my $aoh = csv (in => $fh, headers => "uc");
1659
1660       CODE
1661         If a coderef is used,  the first line of the  "CSV" source will be
1662         read as the list of mangled field headers in which each field is
1663         passed as the only argument to the coderef. This list is used to
1664         produce an array of hashes.
1665
1666          my $aoh = csv (in      => $fh,
1667                         headers => sub { lc ($_[0]) =~ s/kode/code/gr });
1668
1669         this example is a variation of using "lc" where all occurrences of
1670         "kode" are replaced with "code".
1671
1672       ARRAY
1673         If  "headers"  is an anonymous list,  the entries in the list will be
1674         used as field names. The first line is considered data instead of
1675         headers.
1676
1677          my $aoh = csv (in => $fh, headers => [qw( Foo Bar )]);
1678          csv (in => $aoa, out => $fh, headers => [qw( code description price )]);
1679
1680       HASH
1681         If "headers" is a hash reference, this implies "auto", but header
1682         fields that exist as a key in the hashref will be replaced by the
1683         value for that key. Given a CSV file like
1684
1685          post-kode,city,name,id number,fubble
1686          1234AA,Duckstad,Donald,13,"X313DF"
1687
1688         using
1689
1690          csv (headers => { "post-kode" => "pc", "id number" => "ID" }, ...
1691
1692         will return an entry like
1693
1694          { pc     => "1234AA",
1695            city   => "Duckstad",
1696            name   => "Donald",
1697            ID     => "13",
1698            fubble => "X313DF",
1699            }
1700
1701       See also "munge_column_names" and "set_column_names".
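
       The difference between the flag form and the ARRAY form of "headers"
       can be sketched as follows. The in-memory data and the column names
       "c" and "n" are made up for illustration only.

```perl
use strict;
use warnings;
use Text::CSV qw( csv );

# Illustrative in-memory CSV; a file name or file handle works the same way
my $data = "Code,Name\n1,pc\n2,mouse\n";

# "lc": the first line is consumed and becomes the lower-cased hash keys
my $aoh = csv (in => \$data, headers => "lc");
print $aoh->[0]{code}, " ", $aoh->[0]{name}, "\n";

# ARRAY: the supplied names are used and the first line is treated as data
my $all = csv (in => \$data, headers => [qw( c n )]);
print scalar @$all, " records, first c = $all->[0]{c}\n";
```

       In the second call the original header line shows up as the first
       record, so the ARRAY form is best suited to data without a header
       line.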
1702
1703       munge_column_names
1704
1705       If "munge_column_names" is set,  the method  "header"  is invoked on
1706       the opened stream with all matching arguments to detect and set the
1707       headers.
1708
1709       "munge_column_names" can be abbreviated to "munge".
1710
1711       key
1712
1713       If passed,  will default  "headers"  to "auto" and return a hashref
1714       instead of an array of hashes. Allowed values are simple scalars or
1715       array-references where the first element is the joiner and the rest are
1716       the fields to join to combine the key.
1717
1718        my $ref = csv (in => "test.csv", key => "code");
1719        my $ref = csv (in => "test.csv", key => [ ":" => "code", "color" ]);
1720
1721       with test.csv like
1722
1723        code,product,price,color
1724        1,pc,850,gray
1725        2,keyboard,12,white
1726        3,mouse,5,black
1727
1728       the first example will return
1729
1730         { 1   => {
1731               code    => 1,
1732               color   => 'gray',
1733               price   => 850,
1734               product => 'pc'
1735               },
1736           2   => {
1737               code    => 2,
1738               color   => 'white',
1739               price   => 12,
1740               product => 'keyboard'
1741               },
1742           3   => {
1743               code    => 3,
1744               color   => 'black',
1745               price   => 5,
1746               product => 'mouse'
1747               }
1748           }
1749
1750       the second example will return
1751
1752         { "1:gray"    => {
1753               code    => 1,
1754               color   => 'gray',
1755               price   => 850,
1756               product => 'pc'
1757               },
1758           "2:white"   => {
1759               code    => 2,
1760               color   => 'white',
1761               price   => 12,
1762               product => 'keyboard'
1763               },
1764           "3:black"   => {
1765               code    => 3,
1766               color   => 'black',
1767               price   => 5,
1768               product => 'mouse'
1769               }
1770           }
1771
1772       The "key" attribute can be combined with "headers" for "CSV" data that
1773       has no header line, like
1774
1775        my $ref = csv (
1776            in      => "foo.csv",
1777            headers => [qw( c_foo foo bar description stock )],
1778            key     =>     "c_foo",
1779            );
1780
1781       value
1782
1783       Used to create key-value hashes.
1784
1785       Only allowed when "key" is valid. A "value" can be either a single
1786       column label or an anonymous list of column labels.  In the first case,
1787       the value will be a simple scalar value, in the latter case, it will be
1788       a hashref.
1789
1790        my $ref = csv (in => "test.csv", key   => "code",
1791                                         value => "price");
1792        my $ref = csv (in => "test.csv", key   => "code",
1793                                         value => [ "product", "price" ]);
1794        my $ref = csv (in => "test.csv", key   => [ ":" => "code", "color" ],
1795                                         value => "price");
1796        my $ref = csv (in => "test.csv", key   => [ ":" => "code", "color" ],
1797                                         value => [ "product", "price" ]);
1798
1799       with test.csv like
1800
1801        code,product,price,color
1802        1,pc,850,gray
1803        2,keyboard,12,white
1804        3,mouse,5,black
1805
1806       the first example will return
1807
1808         { 1 => 850,
1809           2 =>  12,
1810           3 =>   5,
1811           }
1812
1813       the second example will return
1814
1815         { 1   => {
1816               price   => 850,
1817               product => 'pc'
1818               },
1819           2   => {
1820               price   => 12,
1821               product => 'keyboard'
1822               },
1823           3   => {
1824               price   => 5,
1825               product => 'mouse'
1826               }
1827           }
1828
1829       the third example will return
1830
1831         { "1:gray"    => 850,
1832           "2:white"   =>  12,
1833           "3:black"   =>   5,
1834           }
1835
1836       the fourth example will return
1837
1838         { "1:gray"    => {
1839               price   => 850,
1840               product => 'pc'
1841               },
1842           "2:white"   => {
1843               price   => 12,
1844               product => 'keyboard'
1845               },
1846           "3:black"   => {
1847               price   => 5,
1848               product => 'mouse'
1849               }
1850           }
1851
1852       keep_headers
1853
1854       When using hashes,  store the column names in the arrayref passed,  so
1855       all headers are available after the call in the original order.
1856
1857        my $aoh = csv (in => "file.csv", keep_headers => \my @hdr);
1858
1859       This attribute can be abbreviated to "kh" or passed as
1860       "keep_column_names".
1861
1862       This attribute implies a default of "auto" for the "headers" attribute.
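
       A small sketch of why this is useful: hash keys carry no order, so
       the array filled through "keep_headers" is what allows the columns to
       be walked in file order. The sample data is illustrative only.

```perl
use strict;
use warnings;
use Text::CSV qw( csv );

my $data = "code,product,price\n1,pc,850\n";

# keep_headers implies headers => "auto", so an array of hashes is returned
my $aoh = csv (in => \$data, keep_headers => \my @hdr);

# @hdr holds the column names in their original order
print join (",", map { "$_=$aoh->[0]{$_}" } @hdr), "\n";
```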
1863
1864       fragment
1865
1866       Only output the fragment as defined in the "fragment" method. This
1867       option is ignored when generating "CSV". See "out".
1868
1869       Combining all of them could give something like
1870
1871        use Text::CSV qw( csv );
1872        my $aoh = csv (
1873            in       => "test.txt",
1874            encoding => "utf-8",
1875            headers  => "auto",
1876            sep_char => "|",
1877            fragment => "row=3;6-9;15-*",
1878            );
1879        say $aoh->[15]{Foo};
1880
1881       sep_set
1882
1883       If "sep_set" is set, the method "header" is invoked on the opened
1884       stream to detect and set "sep_char" with the given set.
1885
1886       "sep_set" can be abbreviated to "seps".
1887
1888       Note that as the  "header" method is invoked,  its default is to also
1889       set the headers.
1890
1891       set_column_names
1892
1893       If  "set_column_names" is passed,  the method "header" is invoked on
1894       the opened stream with all arguments meant for "header".
1895
1896       If "set_column_names" is passed as a false value, the content of the
1897       first row is only preserved if the output is AoA:
1898
1899       With an input-file like
1900
1901        bAr,foo
1902        1,2
1903        3,4,5
1904
1905       This call
1906
1907        my $aoa = csv (in => $file, set_column_names => 0);
1908
1909       will result in
1910
1911        [[ "bar", "foo"     ],
1912         [ "1",   "2"       ],
1913         [ "3",   "4",  "5" ]]
1914
1915       and
1916
1917        my $aoa = csv (in => $file, set_column_names => 0, munge => "none");
1918
1919       will result in
1920
1921        [[ "bAr", "foo"     ],
1922         [ "1",   "2"       ],
1923         [ "3",   "4",  "5" ]]
1924
1925   Callbacks
1926       Callbacks enable actions triggered from the inside of Text::CSV.
1927
1928       While most of what this enables  can easily be done in an  unrolled
1929       loop as described in the "SYNOPSIS", callbacks can be used to meet
1930       special demands or enhance the "csv" function.
1931
1932       error
1933          $csv->callbacks (error => sub { $csv->SetDiag (0) });
1934
1935         The "error"  callback is invoked when an error occurs,  but  only
1936         when "auto_diag" is set to a true value. The callback is invoked with
1937         the values returned by "error_diag":
1938
1939          my ($c, $s);
1940
1941          sub ignore3006
1942          {
1943              my ($err, $msg, $pos, $recno, $fldno) = @_;
1944              if ($err == 3006) {
1945                  # ignore this error
1946                  ($c, $s) = (undef, undef);
1947                  Text::CSV->SetDiag (0);
1948                  }
1949              # Any other error
1950              return;
1951              } # ignore3006
1952
1953          $csv->callbacks (error => \&ignore3006);
1954          $csv->bind_columns (\$c, \$s);
1955          while ($csv->getline ($fh)) {
1956              # Error 3006 will not stop the loop
1957              }
1958
1959       after_parse
1960          $csv->callbacks (after_parse => sub { push @{$_[1]}, "NEW" });
1961          while (my $row = $csv->getline ($fh)) {
1962              $row->[-1] eq "NEW";
1963              }
1964
1965         This callback is invoked after parsing with  "getline"  only if no
1966         error occurred.  The callback is invoked with two arguments:   the
1967         current "CSV" parser object and an array reference to the fields
1968         parsed.
1969
1970         The return code of the callback is ignored  unless it is a reference
1971         to the string "skip", in which case the record will be skipped in
1972         "getline_all".
1973
1974          sub add_from_db
1975          {
1976              my ($csv, $row) = @_;
1977              $sth->execute ($row->[4]);
1978              push @$row, $sth->fetchrow_array;
1979              } # add_from_db
1980
1981          my $aoa = csv (in => "file.csv", callbacks => {
1982              after_parse => \&add_from_db });
1983
1984         This hook can be used for validation:
1985
1986         FAIL
1987           Die if any of the records does not validate a rule:
1988
1989            after_parse => sub {
1990                $_[1][4] =~ m/^[0-9]{4}\s?[A-Z]{2}$/ or
1991                    die "5th field does not have a valid Dutch zipcode";
1992                }
1993
1994         DEFAULT
1995           Replace invalid fields with a default value:
1996
1997            after_parse => sub { $_[1][2] =~ m/^\d+$/ or $_[1][2] = 0 }
1998
1999         SKIP
2000           Skip records that have invalid fields (only applies to
2001           "getline_all"):
2002
2003            after_parse => sub { $_[1][0] =~ m/^\d+$/ or return \"skip"; }
2004
2005       before_print
2006          my $idx = 1;
2007          $csv->callbacks (before_print => sub { $_[1][0] = $idx++ });
2008          $csv->print (*STDOUT, [ 0, $_ ]) for @members;
2009
2010         This callback is invoked  before printing with  "print"  only if no
2011         error occurred.  The callback is invoked with two arguments:  the
2012         current  "CSV" parser object and an array reference to the fields
2013         passed.
2014
2015         The return code of the callback is ignored.
2016
2017          sub max_4_fields
2018          {
2019              my ($csv, $row) = @_;
2020              @$row > 4 and splice @$row, 4;
2021              } # max_4_fields
2022
2023          csv (in => csv (in => "file.csv"), out => *STDOUT,
2024              callbacks => { before_print => \&max_4_fields });
2025
2026         This callback is not active for "combine".
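
         The following sketch contrasts "print" and "combine" under the
         same "before_print" hook. The member names and the in-memory
         output handle are assumptions made for the sake of a runnable
         example.

```perl
use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new ({ binary => 1, auto_diag => 1, eol => "\n" });

# Number the records on the way out by overwriting the first field
my $idx = 1;
$csv->callbacks (before_print => sub { $_[1][0] = $idx++ });

# Print to an in-memory file so the result can be inspected
open my $fh, ">", \my $out or die "in-memory open failed: $!";
$csv->print ($fh, [ 0, $_ ]) for qw( alice bob );
close $fh or die "close failed: $!";
print $out;

# combine () does not trigger the hook, so the first field stays as passed
$csv->combine (0, "carol");
print $csv->string;
```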
2027
2028       Callbacks for csv ()
2029
2030       The "csv" function allows for some callbacks that do not integrate in
2031       XS internals but are only available in the "csv" function.
2032
2033         csv (in        => "file.csv",
2034              callbacks => {
2035                  filter       => { 6 => sub { $_ > 15 } },    # first
2036                  after_parse  => sub { say "AFTER PARSE";  }, # first
2037                  after_in     => sub { say "AFTER IN";     }, # second
2038                  on_in        => sub { say "ON IN";        }, # third
2039                  },
2040              );
2041
2042         csv (in        => $aoh,
2043              out       => "file.csv",
2044              callbacks => {
2045                  on_in        => sub { say "ON IN";        }, # first
2046                  before_out   => sub { say "BEFORE OUT";   }, # second
2047                  before_print => sub { say "BEFORE PRINT"; }, # third
2048                  },
2049              );
2050
2051       filter
2052         This callback can be used to filter records.  It is called just after
2053         a new record has been scanned.  The callback accepts a:
2054
2055         hashref
2056           The keys are the index to the row (the field name or field number,
2057           1-based) and the values are subs to return a true or false value.
2058
2059            csv (in => "file.csv", filter => {
2060                       3 => sub { m/a/ },       # third field should contain an "a"
2061                       5 => sub { length > 4 }, # length of the 5th field minimal 5
2062                       });
2063
2064            csv (in => "file.csv", filter => { foo => sub { $_ > 4 }});
2065
2066           If the keys to the filter hash contain any character that is not a
2067           digit it will also implicitly set "headers" to "auto"  unless
2068           "headers"  was already passed as argument.  When headers are
2069           active, returning an array of hashes, the filter is not applicable
2070           to the header itself.
2071
2072           All sub results should match, as in AND.
2073
2074           The context of the callback sets  $_ localized to the field
2075           indicated by the filter. The two arguments are as with all other
2076           callbacks, so the other fields in the current row can be seen:
2077
2078            filter => { 3 => sub { $_ > 100 ? $_[1][1] =~ m/A/ : $_[1][6] =~ m/B/ }}
2079
2080           If the context is set to return a list of hashes  ("headers" is
2081           defined), the current record will also be available in the
2082           localized %_:
2083
2084            filter => { 3 => sub { $_ > 100 && $_{foo} =~ m/A/ && $_{bar} < 1000  }}
2085
2086           If the filter is used to alter the content by changing $_,  make
2087           sure that the sub returns true in order not to have that record
2088           skipped:
2089
2090            filter => { 2 => sub { $_ = uc }}
2091
2092           will upper-case the second field, and then skip the record if the
2093           resulting content evaluates to false. To always accept, end with truth:
2094
2095            filter => { 2 => sub { $_ = uc; 1 }}
2096
2097         coderef
2098            csv (in => "file.csv", filter => sub { $n++; 0; });
2099
2100           If the argument to "filter" is a coderef,  it is an alias or
2101           shortcut to a filter on column 0:
2102
2103            csv (filter => sub { $n++; 0 });
2104
2105           is equal to
2106
2107            csv (filter => { 0 => sub { $n++; 0 } });
2108
2109         filter-name
2110            csv (in => "file.csv", filter => "not_blank");
2111            csv (in => "file.csv", filter => "not_empty");
2112            csv (in => "file.csv", filter => "filled");
2113
2114           These are predefined filters
2115
2116           Given a file like (line numbers prefixed for doc purpose only):
2117
2118            1:1,2,3
2119            2:
2120            3:,
2121            4:""
2122            5:,,
2123            6:, ,
2124            7:"",
2125            8:" "
2126            9:4,5,6
2127
2128           not_blank
2129             Filter out the blank lines
2130
2131             This filter is a shortcut for
2132
2133              filter => { 0 => sub { @{$_[1]} > 1 or
2134                          defined $_[1][0] && $_[1][0] ne "" } }
2135
2136             Due to the implementation,  it is currently impossible to also
2137             filter lines that consist only of a quoted empty field. These
2138             lines are also considered blank lines.
2139
2140             With the given example, lines 2 and 4 will be skipped.
2141
2142           not_empty
2143             Filter out lines where all the fields are empty.
2144
2145             This filter is a shortcut for
2146
2147              filter => { 0 => sub { grep { defined && $_ ne "" } @{$_[1]} } }
2148
2149             A space is not regarded as being empty, so given the example data,
2150             lines 2, 3, 4, 5, and 7 are skipped.
2151
2152           filled
2153             Filter out lines that have no visible data
2154
2155             This filter is a shortcut for
2156
2157              filter => { 0 => sub { grep { defined && m/\S/ } @{$_[1]} } }
2158
2159             This filter rejects all lines that do not have at least one field
2160             containing a non-whitespace character.
2161
2162             With the given example data, this filter would skip lines 2
2163             through 8.
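
         To make the filter forms above concrete, here is a short sketch
         that applies a predefined filter and a hashref filter to small
         in-memory samples. The sample data and variable names are
         illustrative assumptions.

```perl
use strict;
use warnings;
use Text::CSV qw( csv );

# Two data lines with a blank line and an all-empty line in between
my $sparse = "a,1\n\n,\nb,20\n";

# Predefined filter: drop lines in which every field is empty
my $rows = csv (in => \$sparse, filter => "not_empty");
print scalar @$rows, " rows kept\n";

# Hashref filter: keep records whose 2nd field (1-based) is above 10;
# $_ is localized to that field inside the sub
my $dense = "a,1\nb,20\n";
my $big   = csv (in => \$dense, filter => { 2 => sub { $_ > 10 } });
print "$big->[0][0]\n";
```

         Without a "headers" attribute the hashref filter is applied to
         every line, including the first one.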
2164
2165       after_in
2166         This callback is invoked for each record after all records have been
2167         parsed but before returning the reference to the caller.  The hook is
2168         invoked with two arguments:  the current  "CSV"  parser object  and a
2169         reference to the record.   The reference can be a reference to a
2170         HASH  or a reference to an ARRAY as determined by the arguments.
2171
2172         This callback can also be passed as  an attribute without the
2173         "callbacks" wrapper.
2174
2175       before_out
2176         This callback is invoked for each record before the record is
2177         printed.  The hook is invoked with two arguments:  the current "CSV"
2178         parser object and a reference to the record.   The reference can be a
2179         reference to a  HASH or a reference to an ARRAY as determined by the
2180         arguments.
2181
2182         This callback can also be passed as an attribute  without the
2183         "callbacks" wrapper.
2184
2185         This callback makes the row available in %_ if the row is a hashref.
2186         In this case %_ is writable and will change the original row.
2187
2188       on_in
2189         This callback acts exactly as the "after_in" or the "before_out"
2190         hooks.
2191
2192         This callback can also be passed as an attribute  without the
2193         "callbacks" wrapper.
2194
2195         This callback makes the row available in %_ if the row is a hashref.
2196         In this case %_ is writable and will change the original row. So e.g.
2197         with
2198
2199           my $aoh = csv (
2200               in      => \"foo\n1\n2\n",
2201               headers => "auto",
2202               on_in   => sub { $_{bar} = 2; },
2203               );
2204
2205         $aoh will be:
2206
2207           [ { foo => 1,
2208               bar => 2,
2209               },
2210             { foo => 2,
2211               bar => 2,
2212               }
2213             ]
2214
2215       csv
2216         The function  "csv" can also be called as a method or with an
2217         existing Text::CSV object. This could help if the function is to be
2218         invoked a lot of times and the overhead of creating the object
2219         internally over  and  over again would be prevented by passing an
2220         existing instance.
2221
2222          my $csv = Text::CSV->new ({ binary => 1, auto_diag => 1 });
2223
2224          my $aoa = $csv->csv (in => $fh);
2225          my $aoa = csv (in => $fh, csv => $csv);
2226
2227         both act the same. Running this 20000 times on a 20-line CSV file
2228         showed a 53% speedup.
2229

DIAGNOSTICS

2231       This section is also taken from Text::CSV_XS.
2232
2233       Still under construction ...
2234
2235       If an error occurs,  "$csv->error_diag" can be used to get information
2236       on the cause of the failure. Note that for speed reasons the internal
2237       value is never cleared on success,  so using the value returned by
2238       "error_diag" in normal cases - when no error occurred - may cause
2239       unexpected results.
2240
2241       If the constructor failed, the cause can be found using "error_diag" as
2242       a class method, like "Text::CSV->error_diag".
2243
2244       The "$csv->error_diag" method is automatically invoked upon error when
2245       the constructor was called with  "auto_diag"  set to  1 or 2, or when
2246       autodie is in effect.  When set to 1, this will cause a "warn" with the
2247       error message,  when set to 2, it will "die". "2012 - EOF" is excluded
2248       from "auto_diag" reports.
2249
2250       Errors can be (individually) caught using the "error" callback.
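
       As a hedged illustration of manual error inspection, the sketch
       below leaves "auto_diag" off, parses a deliberately malformed line,
       and reads the first three values returned by "error_diag" in list
       context. The input string is the example given for error 2011
       further down.

```perl
use strict;
use warnings;
use Text::CSV;

# auto_diag is left off so the error can be inspected manually
my $csv = Text::CSV->new ({ binary => 1 })
    or die "Cannot use CSV: " . Text::CSV->error_diag;

# Text after the closing quote of a quoted field is not allowed
my $ok = $csv->parse (q{1,foo,"bar"baz,22,1});

# In list context error_diag returns the code, the message and the position
my ($code, $msg, $pos) = $csv->error_diag;
print "parse failed with $code: $msg (position $pos)\n" unless $ok;
```

       Since the internal value is not cleared on success, "error_diag"
       should only be consulted after a call has actually failed, as the
       "unless $ok" guard does here.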
2251
2252       The errors as described below are available. I have tried to make the
2253       error itself explanatory enough, but more descriptions will be added.
2254       For most of these errors, the first three capitals describe the error
2255       category:
2256
2257       · INI
2258
2259         Initialization error or option conflict.
2260
2261       · ECR
2262
2263         Carriage-Return related parse error.
2264
2265       · EOF
2266
2267         End-Of-File related parse error.
2268
2269       · EIQ
2270
2271         Parse error inside quotation.
2272
2273       · EIF
2274
2275         Parse error inside field.
2276
2277       · ECB
2278
2279         Combine error.
2280
2281       · EHR
2282
2283         HashRef parse related error.
2284
2285       And below should be the complete list of error codes that can be
2286       returned:
2287
2288       · 1001 "INI - sep_char is equal to quote_char or escape_char"
2289
2290         The  separation character  cannot be equal to  the quotation
2291         character or to the escape character,  as this would invalidate all
2292         parsing rules.
2293
2294       · 1002 "INI - allow_whitespace with escape_char or quote_char SP or
2295         TAB"
2296
2297         Using the  "allow_whitespace"  attribute  when either "quote_char" or
2298         "escape_char"  is equal to "SPACE" or "TAB" is too ambiguous to
2299         allow.
2300
2301       · 1003 "INI - \r or \n in main attr not allowed"
2302
2303         Using default "eol" characters in either "sep_char", "quote_char",
2304         or  "escape_char"  is  not allowed.
2305
2306       · 1004 "INI - callbacks should be undef or a hashref"
2307
2308         The "callbacks"  attribute only accepts  "undef" or a hash
2309         reference.
2310
2311       · 1005 "INI - EOL too long"
2312
2313         The value passed for EOL is exceeding its maximum length (16).
2314
2315       · 1006 "INI - SEP too long"
2316
2317         The value passed for SEP is exceeding its maximum length (16).
2318
2319       · 1007 "INI - QUOTE too long"
2320
2321         The value passed for QUOTE is exceeding its maximum length (16).
2322
2323       · 1008 "INI - SEP undefined"
2324
2325         The value passed for SEP should be defined and not empty.
2326
2327       · 1010 "INI - the header is empty"
2328
2329         The header line parsed in the "header" is empty.
2330
2331       · 1011 "INI - the header contains more than one valid separator"
2332
2333         The header line parsed in the  "header"  contains more than one
2334         (unique) separator character out of the allowed set of separators.
2335
2336       · 1012 "INI - the header contains an empty field"
2337
2338         The header line parsed in the "header" contains an empty field.
2339
2340       · 1013 "INI - the header contains non-unique fields"
2341
2342         The header line parsed in the  "header"  contains at least  two
2343         identical fields.
2344
2345       · 1014 "INI - header called on undefined stream"
2346
2347         The header line cannot be parsed from an undefined source.
2348
2349       · 1500 "PRM - Invalid/unsupported argument(s)"
2350
2351         Function or method called with invalid argument(s) or parameter(s).
2352
2353       · 1501 "PRM - The key attribute is passed as an unsupported type"
2354
2355         The "key" attribute is of an unsupported type.
2356
2357       · 1502 "PRM - The value attribute is passed without the key attribute"
2358
2359         The "value" attribute is only allowed when a valid key is given.
2360
2361       · 1503 "PRM - The value attribute is passed as an unsupported type"
2362
2363         The "value" attribute is of an unsupported type.
2364
2365       · 2010 "ECR - QUO char inside quotes followed by CR not part of EOL"
2366
2367         When  "eol"  has  been  set  to  anything  but the  default,  like
2368         "\r\t\n",  and  the  "\r"  is  following  the   second   (closing)
2369         "quote_char", where the characters following the "\r" do not make up
2370         the "eol" sequence, this is an error.
2371
2372       · 2011 "ECR - Characters after end of quoted field"
2373
2374         Sequences like "1,foo,"bar"baz,22,1" are not allowed. "bar" is a
2375         quoted field and after the closing double-quote, there should be
2376         either a new-line sequence or a separation character.
2377
2378       · 2012 "EOF - End of data in parsing input stream"
2379
2380         Self-explanatory. End-of-file was reached while parsing a stream. Can
2381         happen only when reading from streams with "getline",  as using
2382         "parse" is done on strings that are not required to have a trailing
2383         "eol".
2384
2385       · 2013 "INI - Specification error for fragments RFC7111"
2386
2387         Invalid specification for URI "fragment" specification.
2388
2389       · 2014 "ENF - Inconsistent number of fields"
2390
2391         Inconsistent number of fields under strict parsing.
2392
2393       · 2021 "EIQ - NL char inside quotes, binary off"
2394
2395         Sequences like "1,"foo\nbar",22,1" are allowed only when the binary
2396         option has been selected with the constructor.
2397
2398       · 2022 "EIQ - CR char inside quotes, binary off"
2399
2400         Sequences like "1,"foo\rbar",22,1" are allowed only when the binary
2401         option has been selected with the constructor.
2402
2403       · 2023 "EIQ - QUO character not allowed"
2404
2405         Sequences like ""foo "bar" baz",qu" and "2023,",2008-04-05,"Foo,
2406         Bar",\n" will cause this error.
2407
2408       · 2024 "EIQ - EOF cannot be escaped, not even inside quotes"
2409
2410         The escape character is not allowed as last character in an input
2411         stream.
2412
2413       · 2025 "EIQ - Loose unescaped escape"
2414
2415         An escape character should escape only characters that need escaping.
2416
2417         Allowing  the escape  for other characters  is possible  with the
2418         attribute "allow_loose_escapes".
2419
2420       · 2026 "EIQ - Binary character inside quoted field, binary off"
2421
2422         Binary characters are not allowed by default.   The exception is
2423         fields that contain valid UTF-8,  which will automatically be
2424         upgraded. Set "binary" to 1 to accept binary
2425         data.
2426
2427       · 2027 "EIQ - Quoted field not terminated"
2428
2429         When parsing a field that started with a quotation character,  the
2430         field is expected to be closed with a quotation character.   When the
2431         parsed line is exhausted before the quote is found, that field is not
2432         terminated.
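
         For example, a line whose quoted field is never closed can be
         inspected with "error_diag" in list context. The input string
         below is a made-up illustration.

```perl
use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new ({ binary => 1 });

# The second field opens a quote that is never closed.
my $ok = $csv->parse (q{1,"never closed}) ? 1 : 0;

# In list context error_diag returns the code, the message and the position.
my ($code, $msg, $pos) = $csv->error_diag;
print "ok=$ok code=$code message=$msg\n";
```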
2433
       · 2030 "EIF - NL char inside unquoted verbatim, binary off"

       · 2031 "EIF - CR char is first char of field, not part of EOL"

       · 2032 "EIF - CR char inside unquoted, not part of EOL"

       · 2034 "EIF - Loose unescaped quote"

       · 2035 "EIF - Escaped EOF in unquoted field"

       · 2036 "EIF - ESC error"

       · 2037 "EIF - Binary character in unquoted field, binary off"

       · 2110 "ECB - Binary character in Combine, binary off"

       · 2200 "EIO - print to IO failed. See errno"

       · 3001 "EHR - Unsupported syntax for column_names ()"

       · 3002 "EHR - getline_hr () called before column_names ()"

       · 3003 "EHR - bind_columns () and column_names () fields count
         mismatch"

       · 3004 "EHR - bind_columns () only accepts refs to scalars"

       · 3006 "EHR - bind_columns () did not pass enough refs for parsed
         fields"

       · 3007 "EHR - bind_columns needs refs to writable scalars"

       · 3008 "EHR - unexpected error in bound fields"

       · 3009 "EHR - print_hr () called before column_names ()"

       · 3010 "EHR - print_hr () called with invalid arguments"
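
       The "EHR" errors relate to the hash-reference and bound-column
       interfaces. As a sketch of error 3002, the example below calls
       "getline_hr" before "column_names" inside an "eval", then repeats
       the read in the intended order. The sample data is hypothetical.

```perl
use strict;
use warnings;
use Text::CSV;

# Hypothetical data with a header line.
my $data = "name,code\nfoo,1\n";
open my $fh, "<", \$data or die "in-memory open: $!";

my $csv = Text::CSV->new ({ binary => 1 });

# Calling getline_hr before column_names is expected to croak.
my $hr = eval { $csv->getline_hr ($fh) };
my $early_error = $@;

# Intended order: read the header, set the column names, then read rows.
$csv->column_names (@{ $csv->getline ($fh) });
my $row = $csv->getline_hr ($fh);
close $fh;

print "early call failed: $early_error";
print "name=$row->{name} code=$row->{code}\n";
```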
2471

AUTHORS and MAINTAINERS

       Alan Citterman <alan[at]mfgrtl.com> wrote the original Perl module.
       Please don't send mail concerning Text::CSV to Alan, as he is no
       longer a maintainer.

       Jochen Wiedmann <joe[at]ispsoft.de> rewrote the encoding and decoding
       in C by implementing a simple finite-state machine and added the
       variable quote, escape and separator characters, the binary mode and
       the print and getline methods. See ChangeLog releases 0.10 through
       0.23.

       H.Merijn Brand <h.m.brand[at]xs4all.nl> cleaned up the code, added the
       field flags methods, wrote the major part of the test suite, completed
       the documentation, and fixed some RT bugs. See ChangeLog releases 0.25
       and on.

       Makamaka Hannyaharamitu <makamaka[at]cpan.org> wrote Text::CSV_PP,
       which is the pure-Perl version of Text::CSV_XS.

       New Text::CSV (since 0.99) is maintained by Makamaka, and by Kenichi
       Ishigaki since 1.91.
2496

COPYRIGHT AND LICENSE

       Text::CSV:

       Copyright (C) 1997 Alan Citterman. All rights reserved.
       Copyright (C) 2007-2015 Makamaka Hannyaharamitu.
       Copyright (C) 2017- Kenichi Ishigaki.
       A large portion of the doc is taken from Text::CSV_XS. See below.

       Text::CSV_PP:

       Copyright (C) 2005-2015 Makamaka Hannyaharamitu.
       Copyright (C) 2017- Kenichi Ishigaki.
       A large portion of the code/doc is also taken from Text::CSV_XS.
       See below.

       Text::CSV_XS:

       Copyright (C) 2007-2016 H.Merijn Brand for PROCURA B.V.
       Copyright (C) 1998-2001 Jochen Wiedmann. All rights reserved.
       Portions Copyright (C) 1997 Alan Citterman. All rights reserved.

       This library is free software; you can redistribute it and/or modify it
       under the same terms as Perl itself.
2519
2520
2521
perl v5.32.0                      2020-07-28                      Text::CSV(3)