Text::CSV(3pm)

Text::CSV(3)          User Contributed Perl Documentation         Text::CSV(3)

NAME

6       Text::CSV - comma-separated values manipulator (using XS or PurePerl)
7

SYNOPSIS

9       This section is taken from Text::CSV_XS.
10
        # Functional interface
        use Text::CSV qw( csv );

        # Read whole file in memory
        my $aoa = csv (in => "data.csv");    # as array of array
        my $aoh = csv (in => "data.csv",
                       headers => "auto");   # as array of hash

        # Write array of arrays as csv file
        csv (in => $aoa, out => "file.csv", sep_char => ";");

        # Only show lines where "code" is odd
        csv (in => "data.csv", filter => { code => sub { $_ % 2 }});

        # Object interface
        use Text::CSV;

        my @rows;
        # Read/parse CSV
        my $csv = Text::CSV->new ({ binary => 1, auto_diag => 1 });
        open my $fh, "<:encoding(utf8)", "test.csv" or die "test.csv: $!";
        while (my $row = $csv->getline ($fh)) {
            $row->[2] =~ m/pattern/ or next; # 3rd field should match
            push @rows, $row;
            }
        close $fh;

        # and write as CSV
        open $fh, ">:encoding(utf8)", "new.csv" or die "new.csv: $!";
        $csv->say ($fh, $_) for @rows;
        close $fh or die "new.csv: $!";
42

DESCRIPTION

44       Text::CSV is a thin wrapper for Text::CSV_XS-compatible modules now.
45       All the backend modules provide facilities for the composition and
46       decomposition of comma-separated values. Text::CSV uses Text::CSV_XS by
47       default, and when Text::CSV_XS is not available, falls back on
48       Text::CSV_PP, which is bundled in the same distribution as this module.
49

CHOOSING BACKEND

       This module respects an environment variable called "PERL_TEXT_CSV"
       when it decides which backend module to use. If this environment
       variable is not set, it tries to load Text::CSV_XS, and if Text::CSV_XS
       is not available, falls back on Text::CSV_PP.

       If you never want it to fall back on Text::CSV_PP, set the variable
       like this ("export" may be "setenv", "set", or the like, depending on
       your environment):
59
60         > export PERL_TEXT_CSV=Text::CSV_XS
61
62       If you prefer Text::CSV_XS to Text::CSV_PP (default), then:
63
64         > export PERL_TEXT_CSV=Text::CSV_XS,Text::CSV_PP
65
       You may also want to set this variable at the top of your test files,
       so that you are not bothered by incompatibilities between backends
       (you need to wrap this in "BEGIN", and set it before actually "use"-ing
       the Text::CSV module, as it decides its backend as soon as it is
       loaded):
70
71         BEGIN { $ENV{PERL_TEXT_CSV}='Text::CSV_PP'; }
72         use Text::CSV;
73
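       To confirm which backend was actually selected, a small check such as
       the following may help. It is a sketch that assumes the class method
       "backend", which is not described in this section, is available in the
       installed Text::CSV and reports the name of the loaded backend module.

```perl
use strict;
use warnings;

# Force the pure-Perl backend before Text::CSV is loaded.
BEGIN { $ENV{PERL_TEXT_CSV} = 'Text::CSV_PP'; }
use Text::CSV;

# Assumption: Text::CSV->backend returns the name of the loaded backend.
my $backend = Text::CSV->backend;
print "Backend in use: $backend\n";
```

       Removing the "BEGIN" line should show whichever backend the default
       lookup order selects on the machine at hand.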

NOTES

75       This section is also taken from Text::CSV_XS.
76
77   Embedded newlines
78       Important Note:  The default behavior is to accept only ASCII
79       characters in the range from 0x20 (space) to 0x7E (tilde).   This means
80       that the fields can not contain newlines. If your data contains
81       newlines embedded in fields, or characters above 0x7E (tilde), or
82       binary data, you must set "binary => 1" in the call to "new". To cover
83       the widest range of parsing options, you will always want to set
84       binary.
85
       But you still have to pass a complete record to the "parse" method,
       which is harder than the usual line-by-line loop suggests:
89
90        my $csv = Text::CSV->new ({ binary => 1, eol => $/ });
91        while (<>) {           #  WRONG!
92            $csv->parse ($_);
93            my @fields = $csv->fields ();
94            }
95
       This will break, as the "while" might read broken lines: it does not
       care about the quoting. If you need to support embedded newlines, the
       way to go is to not pass "eol" in the parser (it accepts "\n", "\r",
       and "\r\n" by default) and then
100
101        my $csv = Text::CSV->new ({ binary => 1 });
102        open my $fh, "<", $file or die "$file: $!";
103        while (my $row = $csv->getline ($fh)) {
104            my @fields = @$row;
105            }
106
107       The old(er) way of using global file handles is still supported
108
109        while (my $row = $csv->getline (*ARGV)) { ... }
110
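       As a self-contained illustration of the "getline" approach, the sketch
       below reads from an in-memory file handle instead of a real file. The
       sample data and variable names are made up for the example.

```perl
use strict;
use warnings;
use Text::CSV;

# Sample data: the second field of the first record spans two lines.
my $data = qq{1,"first line\nsecond line",3\n4,plain,6\n};
open my $fh, "<", \$data or die "in-memory handle: $!";

# binary => 1 is required for the embedded newline; no eol is passed.
my $csv = Text::CSV->new ({ binary => 1, auto_diag => 1 });

my @rows;
while (my $row = $csv->getline ($fh)) {
    push @rows, $row;
    }
close $fh;

printf "%d records read\n", scalar @rows;
```

       A plain "while (<$fh>)" loop would have seen three lines here, which is
       the failure mode described above.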
111   Unicode
112       Unicode is only tested to work with perl-5.8.2 and up.
113
114       See also "BOM".
115
116       The simplest way to ensure the correct encoding is used for  in- and
117       output is by either setting layers on the filehandles, or setting the
118       "encoding" argument for "csv".
119
120        open my $fh, "<:encoding(UTF-8)", "in.csv"  or die "in.csv: $!";
121       or
122        my $aoa = csv (in => "in.csv",     encoding => "UTF-8");
123
124        open my $fh, ">:encoding(UTF-8)", "out.csv" or die "out.csv: $!";
125       or
126        csv (in => $aoa, out => "out.csv", encoding => "UTF-8");
127
       On parsing (both for "getline" and "parse"), if the source is marked
       as UTF8, then all fields that are marked binary will also be marked
       UTF8.

       On combining ("print" and "combine"): if any of the combining fields
       was marked UTF8, the resulting string will be marked as UTF8.  Note
       however that fields before the first field marked UTF8 that contain
       8-bit characters and were not upgraded to UTF8 will be "bytes" in the
       resulting string too, possibly causing unexpected errors.  If you pass
       data of different encodings, or you don't know whether the encodings
       differ, force the data to be upgraded before you pass it on:
140
141        $csv->print ($fh, [ map { utf8::upgrade (my $x = $_); $x } @data ]);
142
143       For complete control over encoding, please use Text::CSV::Encoded:
144
145        use Text::CSV::Encoded;
146        my $csv = Text::CSV::Encoded->new ({
147            encoding_in  => "iso-8859-1", # the encoding comes into   Perl
148            encoding_out => "cp1252",     # the encoding comes out of Perl
149            });
150
151        $csv = Text::CSV::Encoded->new ({ encoding  => "utf8" });
152        # combine () and print () accept *literally* utf8 encoded data
153        # parse () and getline () return *literally* utf8 encoded data
154
155        $csv = Text::CSV::Encoded->new ({ encoding  => undef }); # default
156        # combine () and print () accept UTF8 marked data
157        # parse () and getline () return UTF8 marked data
158
159   BOM
160       BOM  (or Byte Order Mark)  handling is available only inside the
161       "header" method.   This method supports the following encodings:
162       "utf-8", "utf-1", "utf-32be", "utf-32le", "utf-16be", "utf-16le",
163       "utf-ebcdic", "scsu", "bocu-1", and "gb-18030". See Wikipedia
164       <https://en.wikipedia.org/wiki/Byte_order_mark>.
165
166       If a file has a BOM, the easiest way to deal with that is
167
168        my $aoh = csv (in => $file, detect_bom => 1);
169
170       All records will be encoded based on the detected BOM.
171
172       This implies a call to the  "header"  method,  which defaults to also
173       set the "column_names". So this is not the same as
174
175        my $aoh = csv (in => $file, headers => "auto");
176
       which only reads the first record to set "column_names" but ignores
       any meaning of a possibly present BOM.
179

METHODS

181       This section is also taken from Text::CSV_XS.
182
183   version
184       (Class method) Returns the current module version.
185
186   new
187       (Class method) Returns a new instance of class Text::CSV. The
188       attributes are described by the (optional) hash ref "\%attr".
189
190        my $csv = Text::CSV->new ({ attributes ... });
191
192       The following attributes are available:
193
194       eol
195
196        my $csv = Text::CSV->new ({ eol => $/ });
197                  $csv->eol (undef);
198        my $eol = $csv->eol;
199
200       The end-of-line string to add to rows for "print" or the record
201       separator for "getline".
202
       When not passed in a parser instance, the default behavior is to
       accept "\n", "\r", and "\r\n", so it is probably safer to not specify
       "eol" at all. Passing "undef" or the empty string behaves the same.
206
207       When not passed in a generating instance,  records are not terminated
208       at all, so it is probably wise to pass something you expect. A safe
209       choice for "eol" on output is either $/ or "\r\n".
210
211       Common values for "eol" are "\012" ("\n" or Line Feed),  "\015\012"
212       ("\r\n" or Carriage Return, Line Feed),  and "\015"  ("\r" or Carriage
213       Return). The "eol" attribute cannot exceed 7 (ASCII) characters.
214
       If both $/ and "eol" equal "\015", lines that end in only a Carriage
       Return without a Line Feed will be parsed correctly.
217
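       The effect of "eol" on generated records can be seen with "combine"
       and "string". The following sketch prints the bytes of each result in
       hex so the terminator is visible; "hexdump" is a helper written for
       this example only.

```perl
use strict;
use warnings;
use Text::CSV;

# Helper for this example: show every byte of a string in hex.
sub hexdump {
    my ($str) = @_;
    return join " ", map { sprintf "%02x", ord $_ } split //, $str;
    }

my $bare = Text::CSV->new ({ binary => 1 });                # no eol
my $crlf = Text::CSV->new ({ binary => 1, eol => "\r\n" }); # explicit eol

$bare->combine ("a", "b") or die "combine failed";
$crlf->combine ("a", "b") or die "combine failed";

my $without = $bare->string;
my $with    = $crlf->string;

printf "without eol: %s\n", hexdump ($without);
printf "with CRLF:   %s\n", hexdump ($with);
```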
218       sep_char
219
220        my $csv = Text::CSV->new ({ sep_char => ";" });
221                $csv->sep_char (";");
222        my $c = $csv->sep_char;
223
       The char used to separate fields, by default a comma (",").  Limited
       to a single-byte character, usually in the range from 0x20 (space) to
       0x7E (tilde). When longer sequences are required, use "sep".
227
228       The separation character can not be equal to the quote character  or to
229       the escape character.
230
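       A minimal sketch of parsing semicolon-separated data follows. The
       input line is invented for the example; note that a separator inside a
       quoted field is kept as data.

```perl
use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new ({ binary => 1, sep_char => ";" });

# The second field is quoted and contains the separator itself.
$csv->parse (q{1;"two;2";three}) or die "parse failed: " . $csv->error_diag;
my @fields = $csv->fields;

print scalar (@fields), " fields: ", join (" | ", @fields), "\n";
```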
231       sep
232
233        my $csv = Text::CSV->new ({ sep => "\N{FULLWIDTH COMMA}" });
234                  $csv->sep (";");
235        my $sep = $csv->sep;
236
237       The chars used to separate fields, by default undefined. Limited to 8
238       bytes.
239
240       When set, overrules "sep_char".  If its length is one byte it acts as
241       an alias to "sep_char".
242
243       quote_char
244
245        my $csv = Text::CSV->new ({ quote_char => "'" });
246                $csv->quote_char (undef);
247        my $c = $csv->quote_char;
248
249       The character to quote fields containing blanks or binary data,  by
250       default the double quote character (""").  A value of undef suppresses
251       quote chars (for simple cases only). Limited to a single-byte
252       character, usually in the range from  0x20 (space) to  0x7E (tilde).
253       When longer sequences are required, use "quote".
254
255       "quote_char" can not be equal to "sep_char".
256
257       quote
258
259        my $csv = Text::CSV->new ({ quote => "\N{FULLWIDTH QUOTATION MARK}" });
260                    $csv->quote ("'");
261        my $quote = $csv->quote;
262
263       The chars used to quote fields, by default undefined. Limited to 8
264       bytes.
265
266       When set, overrules "quote_char". If its length is one byte it acts as
267       an alias to "quote_char".
268
269       This method does not support "undef".  Use "quote_char" to disable
270       quotation.
271
272       escape_char
273
274        my $csv = Text::CSV->new ({ escape_char => "\\" });
275                $csv->escape_char (":");
276        my $c = $csv->escape_char;
277
278       The character to  escape  certain characters inside quoted fields.
279       This is limited to a  single-byte  character,  usually  in the  range
280       from  0x20 (space) to 0x7E (tilde).
281
282       The "escape_char" defaults to being the double-quote mark ("""). In
283       other words the same as the default "quote_char". This means that
284       doubling the quote mark in a field escapes it:
285
286        "foo","bar","Escape ""quote mark"" with two ""quote marks""","baz"
287
288       If  you  change  the   "quote_char"  without  changing  the
289       "escape_char",  the  "escape_char" will still be the double-quote
290       (""").  If instead you want to escape the  "quote_char" by doubling it
291       you will need to also change the  "escape_char"  to be the same as what
292       you have changed the "quote_char" to.
293
294       Setting "escape_char" to <undef> or "" will disable escaping completely
295       and is greatly discouraged. This will also disable "escape_null".
296
297       The escape character can not be equal to the separation character.
298
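       The interaction between "quote_char" and "escape_char" when generating
       records can be sketched as follows. The first object uses the
       defaults; the second changes both characters to a single quote so that
       doubling still escapes it. The field values are made up for the
       example.

```perl
use strict;
use warnings;
use Text::CSV;

# Defaults: quote_char and escape_char are both the double quote.
my $dq = Text::CSV->new ({ binary => 1 });
$dq->combine (q{say "hi"}, "x") or die "combine failed";
my $dq_line = $dq->string;

# Single quote as quote_char, with escape_char changed to match.
my $sq = Text::CSV->new ({
    binary      => 1,
    quote_char  => "'",
    escape_char => "'",
    });
$sq->combine ("it's", "x") or die "combine failed";
my $sq_line = $sq->string;

print "$dq_line\n";
print "$sq_line\n";
```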
299       binary
300
301        my $csv = Text::CSV->new ({ binary => 1 });
302                $csv->binary (0);
303        my $f = $csv->binary;
304
305       If this attribute is 1,  you may use binary characters in quoted
306       fields, including line feeds, carriage returns and "NULL" bytes. (The
307       latter could be escaped as ""0".) By default this feature is off.
308
309       If a string is marked UTF8,  "binary" will be turned on automatically
310       when binary characters other than "CR" and "NL" are encountered.   Note
311       that a simple string like "\x{00a0}" might still be binary, but not
312       marked UTF8, so setting "{ binary => 1 }" is still a wise option.
313
314       strict
315
316        my $csv = Text::CSV->new ({ strict => 1 });
317                $csv->strict (0);
318        my $f = $csv->strict;
319
320       If this attribute is set to 1, any row that parses to a different
321       number of fields than the previous row will cause the parser to throw
322       error 2014.
323
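       A short sketch of "strict" in action, reading two records of different
       width from an in-memory handle. The data is invented for the example,
       and the list-context form of "error_diag" is used to fetch the error
       code.

```perl
use strict;
use warnings;
use Text::CSV;

# Two records with a different number of fields.
my $data = "a,b\nc,d,e\n";
open my $fh, "<", \$data or die "in-memory handle: $!";

my $csv = Text::CSV->new ({ binary => 1, strict => 1 });

my $first  = $csv->getline ($fh);   # two fields, accepted
my $second = $csv->getline ($fh);   # three fields, rejected under strict

# In list context error_diag returns the code first, then the message.
my ($code, $message) = $csv->error_diag;
close $fh;

print "first row has ", scalar @$first, " fields\n";
print defined $second
    ? "second row accepted\n"
    : "second row rejected: $code $message\n";
```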
324       skip_empty_rows
325
326        my $csv = Text::CSV->new ({ skip_empty_rows => 1 });
327                $csv->skip_empty_rows (0);
328        my $f = $csv->skip_empty_rows;
329
330       If this attribute is set to 1,  any row that has an  "eol" immediately
331       following the start of line will be skipped.  Default behavior is to
332       return one single empty field.
333
334       This attribute is only used in parsing.
335
336       formula_handling
337
338       formula
339
340        my $csv = Text::CSV->new ({ formula => "none" });
341                $csv->formula ("none");
342        my $f = $csv->formula;
343
344       This defines the behavior of fields containing formulas. As formulas
345       are considered dangerous in spreadsheets, this attribute can define an
346       optional action to be taken if a field starts with an equal sign ("=").
347
348       For purpose of code-readability, this can also be written as
349
350        my $csv = Text::CSV->new ({ formula_handling => "none" });
351                $csv->formula_handling ("none");
352        my $f = $csv->formula_handling;
353
354       Possible values for this attribute are
355
356       none
357         Take no specific action. This is the default.
358
359          $csv->formula ("none");
360
361       die
362         Cause the process to "die" whenever a leading "=" is encountered.
363
364          $csv->formula ("die");
365
366       croak
367         Cause the process to "croak" whenever a leading "=" is encountered.
368         (See Carp)
369
370          $csv->formula ("croak");
371
372       diag
373         Report position and content of the field whenever a leading  "=" is
374         found.  The value of the field is unchanged.
375
376          $csv->formula ("diag");
377
378       empty
379         Replace the content of fields that start with a "=" with the empty
380         string.
381
382          $csv->formula ("empty");
383          $csv->formula ("");
384
385       undef
386         Replace the content of fields that start with a "=" with "undef".
387
388          $csv->formula ("undef");
389          $csv->formula (undef);
390
391       a callback
392         Modify the content of fields that start with a  "="  with the return-
393         value of the callback.  The original content of the field is
394         available inside the callback as $_;
395
          # Replace all formulas with 42
397          $csv->formula (sub { 42; });
398
399          # same as $csv->formula ("empty") but slower
400          $csv->formula (sub { "" });
401
402          # Allow =4+12
403          $csv->formula (sub { s/^=(\d+\+\d+)$/$1/eer });
404
405          # Allow more complex calculations
406          $csv->formula (sub { eval { s{^=([-+*/0-9()]+)$}{$1}ee }; $_ });
407
       All other values will give a warning and then fall back to "diag".
409
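       The following sketch shows two of the actions above applied while
       parsing: "empty" and a callback. The input line and the replacement
       text are made up for the example.

```perl
use strict;
use warnings;
use Text::CSV;

my $line = "1,=2+3,x";

# "empty": a field with a leading "=" becomes the empty string.
my $csv = Text::CSV->new ({ binary => 1, formula => "empty" });
$csv->parse ($line) or die "parse failed: " . $csv->error_diag;
my @emptied = $csv->fields;

# Callback: the return value replaces the field content.
$csv->formula (sub { "[formula removed]" });
$csv->parse ($line) or die "parse failed: " . $csv->error_diag;
my @replaced = $csv->fields;

print join ("|", @emptied),  "\n";
print join ("|", @replaced), "\n";
```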
410       decode_utf8
411
412        my $csv = Text::CSV->new ({ decode_utf8 => 1 });
413                $csv->decode_utf8 (0);
414        my $f = $csv->decode_utf8;
415
       This attribute defaults to TRUE.
417
418       While parsing,  fields that are valid UTF-8, are automatically set to
419       be UTF-8, so that
420
421         $csv->parse ("\xC4\xA8\n");
422
423       results in
424
425         PV("\304\250"\0) [UTF8 "\x{128}"]
426
427       Sometimes it might not be a desired action.  To prevent those upgrades,
428       set this attribute to false, and the result will be
429
430         PV("\304\250"\0)
431
432       auto_diag
433
434        my $csv = Text::CSV->new ({ auto_diag => 1 });
435                $csv->auto_diag (2);
436        my $l = $csv->auto_diag;
437
       Setting this attribute to a number between 1 and 9 causes "error_diag"
       to be automatically called in void context upon errors.
440
441       In case of error "2012 - EOF", this call will be void.
442
443       If "auto_diag" is set to a numeric value greater than 1, it will "die"
444       on errors instead of "warn".  If set to anything unrecognized,  it will
445       be silently ignored.
446
       Future extensions to this feature will include more reliable
       auto-detection of "autodie" being active in the scope in which the
       error occurred, which will increment the value of "auto_diag" by 1 the
       moment the error is detected.
451
452       diag_verbose
453
454        my $csv = Text::CSV->new ({ diag_verbose => 1 });
455                $csv->diag_verbose (2);
456        my $l = $csv->diag_verbose;
457
458       Set the verbosity of the output triggered by "auto_diag".   Currently
459       only adds the current  input-record-number  (if known)  to the
460       diagnostic output with an indication of the position of the error.
461
462       blank_is_undef
463
464        my $csv = Text::CSV->new ({ blank_is_undef => 1 });
465                $csv->blank_is_undef (0);
466        my $f = $csv->blank_is_undef;
467
468       Under normal circumstances, "CSV" data makes no distinction between
469       quoted- and unquoted empty fields.  These both end up in an empty
470       string field once read, thus
471
472        1,"",," ",2
473
474       is read as
475
476        ("1", "", "", " ", "2")
477
478       When writing  "CSV" files with either  "always_quote" or  "quote_empty"
479       set, the unquoted  empty field is the result of an undefined value.
480       To enable this distinction when  reading "CSV"  data,  the
481       "blank_is_undef"  attribute will cause  unquoted empty fields to be set
482       to "undef", causing the above to be parsed as
483
484        ("1", "", undef, " ", "2")
485
486       Note that this is specifically important when loading  "CSV" fields
487       into a database that allows "NULL" values,  as the perl equivalent for
488       "NULL" is "undef" in DBI land.
489
490       empty_is_undef
491
492        my $csv = Text::CSV->new ({ empty_is_undef => 1 });
493                $csv->empty_is_undef (0);
494        my $f = $csv->empty_is_undef;
495
496       Going one  step  further  than  "blank_is_undef",  this attribute
497       converts all empty fields to "undef", so
498
499        1,"",," ",2
500
501       is read as
502
503        (1, undef, undef, " ", 2)
504
505       Note that this affects only fields that are  originally  empty,  not
506       fields that are empty after stripping allowed whitespace. YMMV.
507
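       The difference between "blank_is_undef" and "empty_is_undef" can be
       checked with the sample line used above. In this sketch "show" is a
       helper written for the example that prints undefined fields as the
       word undef.

```perl
use strict;
use warnings;
use Text::CSV;

my $line = q{1,"",," ",2};

# Helper for this example: render a field list, marking undefined fields.
sub show {
    return join ", ", map { defined $_ ? qq{"$_"} : "undef" } @_;
    }

my %parsed;
for my $attr (qw( blank_is_undef empty_is_undef )) {
    my $csv = Text::CSV->new ({ binary => 1, $attr => 1 });
    $csv->parse ($line) or die "parse failed: " . $csv->error_diag;
    $parsed{$attr} = [ $csv->fields ];
    printf "%-15s %s\n", $attr, show (@{ $parsed{$attr} });
    }
```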
508       allow_whitespace
509
510        my $csv = Text::CSV->new ({ allow_whitespace => 1 });
511                $csv->allow_whitespace (0);
512        my $f = $csv->allow_whitespace;
513
514       When this option is set to true,  the whitespace  ("TAB"'s and
515       "SPACE"'s) surrounding  the  separation character  is removed when
516       parsing.  If either "TAB" or "SPACE" is one of the three characters
517       "sep_char", "quote_char", or "escape_char" it will not be considered
518       whitespace.
519
520       Now lines like:
521
522        1 , "foo" , bar , 3 , zapp
523
524       are parsed as valid "CSV", even though it violates the "CSV" specs.
525
526       Note that  all  whitespace is stripped from both  start and  end of
527       each field.  That would make it  more than a feature to enable parsing
528       bad "CSV" lines, as
529
530        1,   2.0,  3,   ape  , monkey
531
532       will now be parsed as
533
534        ("1", "2.0", "3", "ape", "monkey")
535
536       even if the original line was perfectly acceptable "CSV".
537
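       The first sample line above can be run through a parser with
       "allow_whitespace" set, as in this small sketch:

```perl
use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new ({ binary => 1, allow_whitespace => 1 });

# Whitespace around the separators is stripped while parsing.
$csv->parse (q{1 , "foo" , bar , 3 , zapp})
    or die "parse failed: " . $csv->error_diag;
my @fields = $csv->fields;

print join ("|", @fields), "\n";
```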
538       allow_loose_quotes
539
540        my $csv = Text::CSV->new ({ allow_loose_quotes => 1 });
541                $csv->allow_loose_quotes (0);
542        my $f = $csv->allow_loose_quotes;
543
544       By default, parsing unquoted fields containing "quote_char" characters
545       like
546
547        1,foo "bar" baz,42
548
549       would result in parse error 2034.  Though it is still bad practice to
550       allow this format,  we  cannot  help  the  fact  that  some  vendors
551       make  their applications spit out lines styled this way.
552
553       If there is really bad "CSV" data, like
554
555        1,"foo "bar" baz",42
556
557       or
558
559        1,""foo bar baz"",42
560
561       there is a way to get this data-line parsed and leave the quotes inside
562       the quoted field as-is.  This can be achieved by setting
563       "allow_loose_quotes" AND making sure that the "escape_char" is  not
564       equal to "quote_char".
565
566       allow_loose_escapes
567
568        my $csv = Text::CSV->new ({ allow_loose_escapes => 1 });
569                $csv->allow_loose_escapes (0);
570        my $f = $csv->allow_loose_escapes;
571
572       Parsing fields  that  have  "escape_char"  characters that escape
573       characters that do not need to be escaped, like:
574
        my $csv = Text::CSV->new ({ escape_char => "\\" });
        $csv->parse (q{1,"my bar\'s",baz,42});
577
578       would result in parse error 2025.   Though it is bad practice to allow
579       this format,  this attribute enables you to treat all escape character
580       sequences equal.
581
582       allow_unquoted_escape
583
584        my $csv = Text::CSV->new ({ allow_unquoted_escape => 1 });
585                $csv->allow_unquoted_escape (0);
586        my $f = $csv->allow_unquoted_escape;
587
       A backward compatibility issue where "escape_char" differs from
       "quote_char" prevents "escape_char" from being in the first position
       of a field.  If "quote_char" is equal to the default """ and
       "escape_char" is set to "\", this would be illegal:
592
593        1,\0,2
594
595       Setting this attribute to 1  might help to overcome issues with
596       backward compatibility and allow this style.
597
598       always_quote
599
600        my $csv = Text::CSV->new ({ always_quote => 1 });
601                $csv->always_quote (0);
602        my $f = $csv->always_quote;
603
604       By default the generated fields are quoted only if they need to be.
605       For example, if they contain the separator character. If you set this
606       attribute to 1 then all defined fields will be quoted. ("undef" fields
607       are not quoted, see "blank_is_undef"). This makes it quite often easier
608       to handle exported data in external applications.
609
610       quote_space
611
612        my $csv = Text::CSV->new ({ quote_space => 1 });
613                $csv->quote_space (0);
614        my $f = $csv->quote_space;
615
       By default, a space in a field triggers quotation.  As no rule in
       "CSV" requires this, nor forbids it, the default is true for safety.
       You can exclude the space from this trigger by setting this attribute
       to 0.
620
621       quote_empty
622
623        my $csv = Text::CSV->new ({ quote_empty => 1 });
624                $csv->quote_empty (0);
625        my $f = $csv->quote_empty;
626
627       By default the generated fields are quoted only if they need to be.
628       An empty (defined) field does not need quotation. If you set this
629       attribute to 1 then empty defined fields will be quoted.  ("undef"
630       fields are not quoted, see "blank_is_undef"). See also "always_quote".
631
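       To compare the quoting attributes on output, the sketch below combines
       the same row with three parser configurations. The row and the
       %variant table are made up for the example.

```perl
use strict;
use warnings;
use Text::CSV;

# A number, a field with a space, an undefined field, and an empty string.
my @row = (1, "a b", undef, "");

my %variant = (
    default      => {},
    always_quote => { always_quote => 1 },
    quote_empty  => { quote_empty  => 1 },
    );

my %line;
for my $name (sort keys %variant) {
    my $csv = Text::CSV->new ({ binary => 1, %{ $variant{$name} } });
    $csv->combine (@row) or die "combine failed for $name";
    $line{$name} = $csv->string;
    printf "%-12s %s\n", $name, $line{$name};
    }
```

       In all three cases the undefined field is written without quotes,
       which is what "blank_is_undef" relies on when the data is read back.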
632       quote_binary
633
634        my $csv = Text::CSV->new ({ quote_binary => 1 });
635                $csv->quote_binary (0);
636        my $f = $csv->quote_binary;
637
638       By default,  all "unsafe" bytes inside a string cause the combined
639       field to be quoted.  By setting this attribute to 0, you can disable
640       that trigger for bytes >= 0x7F.
641
642       escape_null
643
644        my $csv = Text::CSV->new ({ escape_null => 1 });
645                $csv->escape_null (0);
646        my $f = $csv->escape_null;
647
648       By default, a "NULL" byte in a field would be escaped. This option
649       enables you to treat the  "NULL"  byte as a simple binary character in
650       binary mode (the "{ binary => 1 }" is set).  The default is true.  You
651       can prevent "NULL" escapes by setting this attribute to 0.
652
653       When the "escape_char" attribute is set to undefined,  this attribute
654       will be set to false.
655
656       The default setting will encode "=\x00=" as
657
658        "="0="
659
660       With "escape_null" set, this will result in
661
662        "=\x00="
663
664       The default when using the "csv" function is "false".
665
666       For backward compatibility reasons,  the deprecated old name
667       "quote_null" is still recognized.
668
669       keep_meta_info
670
671        my $csv = Text::CSV->new ({ keep_meta_info => 1 });
672                $csv->keep_meta_info (0);
673        my $f = $csv->keep_meta_info;
674
675       By default, the parsing of input records is as simple and fast as
676       possible.  However,  some parsing information - like quotation of the
677       original field - is lost in that process.  Setting this flag to true
678       enables retrieving that information after parsing with  the methods
679       "meta_info",  "is_quoted", and "is_binary" described below.  Default is
680       false for performance.
681
       If you set this attribute to a value greater than 9, then you can
       control output quotation style like it was used in the input of the
       last parsed record (unless quotation was added because of other
       reasons).

        my $csv = Text::CSV->new ({
           binary         => 1,
           keep_meta_info => 1,
           quote_space    => 0,
           });

        # parse () returns a status; the parsed fields come from fields ()
        $csv->parse (q{1,,"", ," ",f,"g","h""h",help,"help"});
        my @row = $csv->fields;

        $csv->print (*STDOUT, \@row);
        # 1,,, , ,f,g,"h""h",help,help
        $csv->keep_meta_info (11);
        $csv->print (*STDOUT, \@row);
        # 1,,"", ," ",f,"g","h""h",help,"help"
700
701       undef_str
702
703        my $csv = Text::CSV->new ({ undef_str => "\\N" });
704                $csv->undef_str (undef);
705        my $s = $csv->undef_str;
706
707       This attribute optionally defines the output of undefined fields. The
708       value passed is not changed at all, so if it needs quotation, the
709       quotation needs to be included in the value of the attribute.  Use with
710       caution, as passing a value like  ",",,,,"""  will for sure mess up
711       your output. The default for this attribute is "undef", meaning no
712       special treatment.
713
714       This attribute is useful when exporting  CSV data  to be imported in
715       custom loaders, like for MySQL, that recognize special sequences for
716       "NULL" data.
717
718       This attribute has no meaning when parsing CSV data.
719
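       A minimal sketch of "undef_str" on output, using the "\N" sequence
       from the example above as the marker for undefined fields:

```perl
use strict;
use warnings;
use Text::CSV;

# Undefined fields are written as the literal two characters \N.
my $csv = Text::CSV->new ({ binary => 1, undef_str => "\\N" });

$csv->combine (1, undef, 3) or die "combine failed";
my $line = $csv->string;

print "$line\n";
```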
720       comment_str
721
722        my $csv = Text::CSV->new ({ comment_str => "#" });
723                $csv->comment_str (undef);
724        my $s = $csv->comment_str;
725
726       This attribute optionally defines a string to be recognized as comment.
727       If this attribute is defined,   all lines starting with this sequence
728       will not be parsed as CSV but skipped as comment.
729
730       This attribute has no meaning when generating CSV.
731
732       Comment strings that start with any of the special characters/sequences
733       are not supported (so it cannot start with any of "sep_char",
734       "quote_char", "escape_char", "sep", "quote", or "eol").
735
736       For convenience, "comment" is an alias for "comment_str".
737
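       The sketch below reads data containing comment lines through
       "getline" with "comment_str" set. The data is invented for the example
       and is read from an in-memory handle.

```perl
use strict;
use warnings;
use Text::CSV;

# Two data records, each preceded by a comment line.
my $data = "# generated file\na,b\n# another note\nc,d\n";
open my $fh, "<", \$data or die "in-memory handle: $!";

my $csv = Text::CSV->new ({ binary => 1, comment_str => "#" });

my @rows;
while (my $row = $csv->getline ($fh)) {
    push @rows, $row;
    }
close $fh;

print join (",", @$_), "\n" for @rows;
```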
738       verbatim
739
740        my $csv = Text::CSV->new ({ verbatim => 1 });
741                $csv->verbatim (0);
742        my $f = $csv->verbatim;
743
744       This is a quite controversial attribute to set,  but makes some hard
745       things possible.
746
747       The rationale behind this attribute is to tell the parser that the
748       normally special characters newline ("NL") and Carriage Return ("CR")
749       will not be special when this flag is set,  and be dealt with  as being
750       ordinary binary characters. This will ease working with data with
751       embedded newlines.
752
753       When  "verbatim"  is used with  "getline",  "getline"  auto-"chomp"'s
754       every line.
755
756       Imagine a file format like
757
758        M^^Hans^Janssen^Klas 2\n2A^Ja^11-06-2007#\r\n
759
       where the line ending is a very specific "#\r\n", and the sep_char is
       a "^" (caret).  None of the fields is quoted, but embedded binary
       data is likely to be present. With the specific line ending, this
       should not be too hard to detect.

       By default, Text::CSV's parse function only knows about "\n" and "\r"
       as legal line endings, and so has to treat the embedded newline as a
       real "end-of-line", so it can scan the next line if binary is true
       and the newline is inside a quoted field. With this option, we tell
       "parse" to parse the line as if "\n" is nothing more than a binary
       character.

       For "parse" this means that the parser no longer has any idea about
       line endings, and "getline" "chomp"s line endings on reading.
774
775       types
776
777       A set of column types; the attribute is immediately passed to the
778       "types" method.
779
780       callbacks
781
782       See the "Callbacks" section below.
783
784       accessors
785
786       To sum it up,
787
788        $csv = Text::CSV->new ();
789
790       is equivalent to
791
792        $csv = Text::CSV->new ({
793            eol                   => undef, # \r, \n, or \r\n
794            sep_char              => ',',
795            sep                   => undef,
796            quote_char            => '"',
797            quote                 => undef,
798            escape_char           => '"',
799            binary                => 0,
800            decode_utf8           => 1,
801            auto_diag             => 0,
802            diag_verbose          => 0,
803            blank_is_undef        => 0,
804            empty_is_undef        => 0,
805            allow_whitespace      => 0,
806            allow_loose_quotes    => 0,
807            allow_loose_escapes   => 0,
808            allow_unquoted_escape => 0,
809            always_quote          => 0,
810            quote_empty           => 0,
811            quote_space           => 1,
812            escape_null           => 1,
813            quote_binary          => 1,
814            keep_meta_info        => 0,
815            strict                => 0,
816            skip_empty_rows       => 0,
817            formula               => 0,
818            verbatim              => 0,
819            undef_str             => undef,
820            comment_str           => undef,
821            types                 => undef,
822            callbacks             => undef,
823            });
824
825       For each of the attributes mentioned above, an accessor method is
826       available with which you can query or change the current value:
827
828        my $quote = $csv->quote_char;
829        $csv->binary (1);
830
831       It is not wise to change these settings halfway through writing "CSV"
832       data to a stream. If however you want to create a new stream using the
833       available "CSV" object, there is no harm in changing them.
834
835       If the "new" constructor call fails, it returns "undef" and makes
836       the reason for the failure available through the "error_diag" method.
837
838        $csv = Text::CSV->new ({ ecs_char => 1 }) or
839            die "".Text::CSV->error_diag ();
840
841       "error_diag" will return a string like
842
843        "INI - Unknown attribute 'ecs_char'"
844
845   known_attributes
846        @attr = Text::CSV->known_attributes;
847        @attr = Text::CSV::known_attributes;
848        @attr = $csv->known_attributes;
849
850       This method will return an ordered list of all the supported
851       attributes as described above.   This can be useful for knowing what
852       attributes are valid in classes that use or extend Text::CSV.
853
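       As a minimal sketch of that use case, the list can be turned into a
       lookup table to check caller-supplied options before they reach
       "new". The %opts hash and its deliberately misspelled "ecs_char" key
       are illustrative assumptions, not part of the module.

```perl
use strict;
use warnings;
use Text::CSV;

# Hypothetical user-supplied options; "ecs_char" is a deliberate typo.
my %opts = (sep_char => ";", ecs_char => 1);

# Build a lookup table of every attribute the constructor accepts.
my %known = map { $_ => 1 } Text::CSV->known_attributes;

# Collect the option names that new () would reject.
my @unknown = grep { !$known{$_} } sort keys %opts;
print "Unknown attribute(s): @unknown\n" if @unknown;
```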
854   print
855        $status = $csv->print ($fh, $colref);
856
857       Similar to  "combine" + "string" + "print",  but much more efficient.
858       It expects an array ref as input  (not an array!)  and the resulting
859       string is not really  created,  but  immediately  written  to the  $fh
860       object, typically an IO handle or any other object that offers a
861       "print" method.
862
863       For performance reasons  "print"  does not create a result string,  so
864       all "string", "status", "fields", and "error_input" methods will return
865       undefined information after executing this method.
866
867       If $colref is "undef"  (explicit,  not through a variable argument) and
868       "bind_columns"  was used to specify fields to be printed,  it is
869       possible to make performance improvements, as otherwise data would have
870       to be copied as arguments to the method call:
871
872        $csv->bind_columns (\($foo, $bar));
873        $status = $csv->print ($fh, undef);
874
875       A short benchmark
876
877        my @data = ("aa" .. "zz");
878        $csv->bind_columns (\(@data));
879
880        $csv->print ($fh, [ @data ]);   # 11800 recs/sec
881        $csv->print ($fh,  \@data  );   # 57600 recs/sec
882        $csv->print ($fh,   undef  );   # 48500 recs/sec
883
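       The bound-column form of "print" can be sketched as follows. The
       in-memory file handle and the sample records are assumptions chosen
       to keep the example free of side effects; real code would normally
       write to a file.

```perl
use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new ({ binary => 1, eol => "\n" })
    or die "" . Text::CSV->error_diag ();

# Write to an in-memory file handle so the sketch touches no files.
my $out = "";
open my $fh, ">", \$out or die "in-memory handle: $!";

# Bind two scalars once; print () then reads their current values.
my ($code, $name);
$csv->bind_columns (\($code, $name));

for my $rec ([ 1, "foo" ], [ 2, "bar" ]) {
    ($code, $name) = @$rec;
    # The literal undef tells print () to use the bound columns.
    $csv->print ($fh, undef) or die "" . $csv->error_diag ();
    }
close $fh or die "close: $!";

print $out;
```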
884   say
885        $status = $csv->say ($fh, $colref);
886
887       Like "print", but "eol" defaults to "$\".
888
889   print_hr
890        $csv->print_hr ($fh, $ref);
891
892       Provides an easy way  to print a  $ref  (as fetched with "getline_hr")
893       provided the column names are set with "column_names".
894
895       It is just a wrapper method with basic parameter checks over
896
897        $csv->print ($fh, [ map { $ref->{$_} } $csv->column_names ]);
898
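       A small sketch of "print_hr" in use, assuming an in-memory file
       handle and made-up column names ("code", "name", "price"):

```perl
use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new ({ binary => 1, eol => "\n" })
    or die "" . Text::CSV->error_diag ();

# The order given here decides the order of the fields on output.
$csv->column_names (qw( code name price ));

my $out = "";
open my $fh, ">", \$out or die "in-memory handle: $!";

# Hash keys may come in any order; print_hr () looks them up by column name.
my %rec = (name => "foo", price => "1.50", code => 1);
$csv->print_hr ($fh, \%rec) or die "" . $csv->error_diag ();
close $fh or die "close: $!";

print $out;
```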
899   combine
900        $status = $csv->combine (@fields);
901
902       This method constructs a "CSV" record from  @fields,  returning success
903       or failure.   Failure can result from lack of arguments or an argument
904       that contains an invalid character.   Upon success,  "string" can be
905       called to retrieve the resultant "CSV" string.  Upon failure,  the
906       value returned by "string" is undefined and "error_input" could be
907       called to retrieve the invalid argument.
908
909   string
910        $line = $csv->string ();
911
912       This method returns the input to  "parse"  or the resultant "CSV"
913       string of "combine", whichever was called more recently.
914
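       The pairing of "combine" and "string" can be sketched like this. The
       field values are arbitrary sample data; with the default attributes,
       a field containing a space or a quote is quoted on output.

```perl
use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new ()
    or die "" . Text::CSV->error_diag ();

# Sample fields: plain, containing a space, containing quotes, numeric.
my @fields = ("a", "b c", 'say "hi"', 3);

# combine () only reports success or failure ...
$csv->combine (@fields)
    or die "combine failed on: " . $csv->error_input ();

# ... the resulting record is fetched with string ().
my $line = $csv->string ();
print "$line\n";
```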
915   getline
916        $colref = $csv->getline ($fh);
917
918       This is the counterpart to  "print",  as "parse"  is the counterpart to
919       "combine":  it parses a row from the $fh  handle using the "getline"
920       method associated with $fh  and parses this row into an array ref.
921       This array ref is returned by the function or "undef" for failure.
922       When $fh does not support "getline", you are likely to hit errors.
923
924       When fields are bound with "bind_columns" the return value is a
925       reference to an empty list.
926
927       The "string", "fields", and "status" methods are meaningless again.
928
929   getline_all
930        $arrayref = $csv->getline_all ($fh);
931        $arrayref = $csv->getline_all ($fh, $offset);
932        $arrayref = $csv->getline_all ($fh, $offset, $length);
933
934       This will return a reference to a list of getline ($fh) results.  In
935       this call, "keep_meta_info" is disabled.  If $offset is negative, as
936       with "splice", only the last  "abs ($offset)" records of $fh are taken
937       into consideration.
938
939       Given a CSV file with 10 lines:
940
941        lines call
942        ----- ---------------------------------------------------------
943        0..9  $csv->getline_all ($fh)         # all
944        0..9  $csv->getline_all ($fh,  0)     # all
945        8..9  $csv->getline_all ($fh,  8)     # start at 8
946        -     $csv->getline_all ($fh,  0,  0) # start at 0 first 0 rows
947        0..4  $csv->getline_all ($fh,  0,  5) # start at 0 first 5 rows
948        4..5  $csv->getline_all ($fh,  4,  2) # start at 4 first 2 rows
949        8..9  $csv->getline_all ($fh, -2)     # last 2 rows
950        6..7  $csv->getline_all ($fh, -4,  2) # first 2 of last  4 rows
951
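       Two rows of the table above can be reproduced with a short sketch.
       The ten-record input is generated in memory, and the $pick helper is
       a hypothetical convenience that opens a fresh handle for every call,
       since each call reads from the current position of the handle.

```perl
use strict;
use warnings;
use Text::CSV;

# Ten single-field records, "0" through "9", one per line.
my $data = join "", map { "$_\n" } 0 .. 9;

my $csv = Text::CSV->new ({ binary => 1 })
    or die "" . Text::CSV->error_diag ();

# Hypothetical helper: run getline_all () on a fresh handle and return
# the first field of every selected record.
my $pick = sub {
    my @range = @_;
    open my $fh, "<", \$data or die "in-memory handle: $!";
    my $rows = $csv->getline_all ($fh, @range);
    close $fh;
    return map { $_->[0] } @$rows;
    };

my @middle = $pick->( 4, 2);   # start at record 4, take 2 records
my @tail   = $pick->(-4, 2);   # first 2 of the last 4 records

print "middle: @middle\n";
print "tail:   @tail\n";
```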
952   getline_hr
953       The "getline_hr" and "column_names" methods work together  to allow you
954       to have rows returned as hashrefs.  You must call "column_names" first
955       to declare your column names.
956
957        $csv->column_names (qw( code name price description ));
958        $hr = $csv->getline_hr ($fh);
959        print "Price for $hr->{name} is $hr->{price} EUR\n";
960
961       "getline_hr" will croak if called before "column_names".
962
963       Note that "getline_hr" creates a hashref for every row and will be
964       much slower than the combined use of "bind_columns" and "getline",
965       which offers the same easy-to-use hashref inside the loop. This loop:
966
967        my @cols = @{$csv->getline ($fh)};
968        $csv->column_names (@cols);
969        while (my $row = $csv->getline_hr ($fh)) {
970            print $row->{price};
971            }
972
973       Could easily be rewritten to the much faster:
974
975        my @cols = @{$csv->getline ($fh)};
976        my $row = {};
977        $csv->bind_columns (\@{$row}{@cols});
978        while ($csv->getline ($fh)) {
979            print $row->{price};
980            }
981
982       Your mileage may vary for the size of the data and the number of rows.
983       With perl-5.14.2 the comparison for a 100_000 line file with 14
984       columns:
985
986                   Rate hashrefs getlines
987        hashrefs 1.00/s       --     -76%
988        getlines 4.15/s     313%       --
989
990   getline_hr_all
991        $arrayref = $csv->getline_hr_all ($fh);
992        $arrayref = $csv->getline_hr_all ($fh, $offset);
993        $arrayref = $csv->getline_hr_all ($fh, $offset, $length);
994
995       This will return a reference to a list of   getline_hr ($fh) results.
996       In this call, "keep_meta_info" is disabled.
997
998   parse
999        $status = $csv->parse ($line);
1000
1001       This method decomposes a "CSV" string into fields, returning success
1002       or failure. Failure can result from a missing argument or from an
1003       improperly formatted "CSV" string. Upon success, "fields" can be
1004       called to retrieve the decomposed fields. Upon failure, "fields"
1005       returns undefined data and "error_input" can be called to retrieve
1006       the invalid argument.
1007
1008       You may use the "types" method for setting column types. See the
1009       description of "types" below.
1010
1011       The $line argument is supposed to be a simple scalar. Everything else
1012       is supposed to croak and set error 1500.
1013
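       A minimal sketch of "parse" followed by "fields". The input line is
       sample data with a quoted separator, a doubled quote, and a trailing
       empty field.

```perl
use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new ({ binary => 1 })
    or die "" . Text::CSV->error_diag ();

# Sample record: the trailing comma yields a final empty field.
my $line = '1,"a, b","say ""hi""",';

$csv->parse ($line)
    or die "parse failed on: " . $csv->error_input ();

# fields () returns the decomposed values of the last successful parse ().
my @fields = $csv->fields ();
printf "%d fields, second is '%s'\n", scalar @fields, $fields[1];
```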
1014   fragment
1015       This function tries to implement RFC7111  (URI Fragment Identifiers for
1016       the text/csv Media Type) - http://tools.ietf.org/html/rfc7111
1017
1018        my $AoA = $csv->fragment ($fh, $spec);
1019
1020       In specifications,  "*" is used to specify the last item, a dash ("-")
1021       to indicate a range.   All indices are 1-based:  the first row or
1022       column has index 1. Selections can be combined with the semi-colon
1023       (";").
1024
1025       When using this method in combination with  "column_names",  the
1026       returned reference  will point to a  list of hashes  instead of a  list
1027       of lists.  A disjointed  cell-based combined selection  might return
1028       rows with different number of columns making the use of hashes
1029       unpredictable.
1030
1031        $csv->column_names ("Name", "Age");
1032        my $AoH = $csv->fragment ($fh, "col=3;8");
1033
1034       If the "after_parse" callback is active,  it is also called on every
1035       line parsed and skipped before the fragment.
1036
1037       row
1038          row=4
1039          row=5-7
1040          row=6-*
1041          row=1-2;4;6-*
1042
1043       col
1044          col=2
1045          col=1-3
1046          col=4-*
1047          col=1-2;4;7-*
1048
1049       cell
1050         In cell-based selection, the comma (",") is used to pair row and
1051         column
1052
1053          cell=4,1
1054
1055         The range operator ("-") using "cell"s can be used to define top-left
1056         and bottom-right "cell" location
1057
1058          cell=3,1-4,6
1059
1060         The "*" is only allowed in the second part of a pair
1061
1062          cell=3,2-*,2    # row 3 till end, only column 2
1063          cell=3,2-3,*    # column 2 till end, only row 3
1064          cell=3,2-*,*    # strip row 1 and 2, and column 1
1065
1066         Cells and cell ranges may be combined with ";", possibly resulting in
1067         rows with different numbers of columns
1068
1069          cell=1,1-2,2;3,3-4,4;1,4;4,1
1070
1071         Disjointed selections will only return selected cells.   The cells
1072         that are not  specified  will  not  be  included  in the  returned
1073         set,  not even as "undef".  As an example given a "CSV" like
1074
1075          11,12,13,...19
1076          21,22,...28,29
1077          :            :
1078          91,...97,98,99
1079
1080         the spec "cell=1,1-2,2;3,3-4,4;1,4;4,1" will return:
1081
1082          11,12,14
1083          21,22
1084          33,34
1085          41,43,44
1086
1087         Overlapping cell-specs will return those cells only once, so
1088         "cell=1,1-3,3;2,2-4,4;2,3;4,2" will return:
1089
1090          11,12,13
1091          21,22,23,24
1092          31,32,33,34
1093          42,43,44
1094
1095       RFC7111 <http://tools.ietf.org/html/rfc7111> does  not  allow different
1096       types of specs to be combined   (either "row" or "col" or "cell").
1097       Passing an invalid fragment specification will croak and set error
1098       2013.
1099
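       The three kinds of specification can be tried out with a sketch like
       the one below. The 3x3 sample data and the $frag helper, which
       flattens the result into a string for display, are assumptions made
       for the example.

```perl
use strict;
use warnings;
use Text::CSV;

# Sample data: the value of each cell is "<row><column>".
my $data = "11,12,13\n21,22,23\n31,32,33\n";

my $csv = Text::CSV->new ({ binary => 1 })
    or die "" . Text::CSV->error_diag ();

# Hypothetical helper: apply one spec to a fresh handle and flatten the
# returned list of lists, rows joined by "|" and fields joined by ",".
my $frag = sub {
    my $spec = shift;
    open my $fh, "<", \$data or die "in-memory handle: $!";
    my $aoa = $csv->fragment ($fh, $spec);
    close $fh;
    return join "|", map { join ",", @$_ } @$aoa;
    };

my $col  = $frag->("col=2");          # second column of every row
my $rows = $frag->("row=2-*");        # row 2 up to the last row
my $cell = $frag->("cell=1,1-2,2");   # top-left 2x2 block

print "$_\n" for $col, $rows, $cell;
```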
1100   column_names
1101       Set the "keys" that will be used in the  "getline_hr"  calls.  If no
1102       keys (column names) are passed, it will return the current setting as a
1103       list.
1104
1105       "column_names" accepts a list of scalars  (the column names)  or a
1106       single array_ref, so you can pass the return value from "getline" too:
1107
1108        $csv->column_names ($csv->getline ($fh));
1109
1110       "column_names" does no checking on duplicates at all, which might lead
1111       to unexpected results.   Undefined entries will be replaced with the
1112       string "\cAUNDEF\cA", so
1113
1114        $csv->column_names (undef, "", "name", "name");
1115        $hr = $csv->getline_hr ($fh);
1116
1117       will set "$hr->{"\cAUNDEF\cA"}" to the 1st field,  "$hr->{""}" to the
1118       2nd field, and "$hr->{name}" to the 4th field,  discarding the 3rd
1119       field.
1120
1121       "column_names" croaks on invalid arguments.
1122
1123   header
1124       This method does NOT work in perl-5.6.x
1125
1126       Parse the CSV header and set "sep", column_names and encoding.
1127
1128        my @hdr = $csv->header ($fh);
1129        $csv->header ($fh, { sep_set => [ ";", ",", "|", "\t" ] });
1130        $csv->header ($fh, { detect_bom => 1, munge_column_names => "lc" });
1131
1132       The first argument should be a file handle.
1133
1134       This method resets some object properties, as it is supposed to be
1135       invoked only once per file or stream. It will leave attributes
1136       "column_names" and "bound_columns" alone if setting column names is
1137       disabled. Reading headers on previously processed objects might fail
1138       on perl-5.8.0 and older.
1139
1140       Assuming that the file opened for parsing has a header, and the header
1141       does not contain problematic characters like embedded newlines,   read
1142       the first line from the open handle then auto-detect whether the header
1143       separates the column names with a character from the allowed separator
1144       list.
1145
1146       If any of the allowed separators matches,  and none of the other
1147       allowed separators match,  set  "sep"  to that  separator  for the
1148       current CSV instance and use it to parse the first line, map those to
1149       lowercase, and use that to set the instance "column_names":
1150
1151        my $csv = Text::CSV->new ({ binary => 1, auto_diag => 1 });
1152        open my $fh, "<", "file.csv" or die "file.csv: $!";
1153        binmode $fh; # for Windows
1154        $csv->header ($fh);
1155        while (my $row = $csv->getline_hr ($fh)) {
1156            ...
1157            }
1158
1159       If the header is empty,  contains more than one unique separator out of
1160       the allowed set,  contains empty fields,   or contains identical fields
1161       (after folding), it will croak with error 1010, 1011, 1012, or 1013
1162       respectively.
1163
1164       If the header contains embedded newlines or is not valid  CSV  in any
1165       other way, this method will croak and leave the parse error untouched.
1166
1167       A successful call to "header" will always set the "sep" of the $csv
1168       object. This behavior cannot be disabled.
1169
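       The separator detection and lower-casing described above can be
       sketched with a semicolon-separated sample. The data is held in
       memory and its column names are made up for the example.

```perl
use strict;
use warnings;
use Text::CSV;

# Sample input using ";" as separator and mixed-case column names.
my $data = "Code;Name;Price\n1;foo;1.50\n";
open my $fh, "<", \$data or die "in-memory handle: $!";

my $csv = Text::CSV->new ({ binary => 1, auto_diag => 1 })
    or die "" . Text::CSV->error_diag ();

# In list context header () returns the (munged) column names.
my @hdr = $csv->header ($fh);

# The detected separator is now active, so the data row parses correctly.
my $row = $csv->getline_hr ($fh);
close $fh;

printf "columns: %s; separator: '%s'; name: %s\n",
    join (",", @hdr), $csv->sep_char, $row->{name};
```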
1170       return value
1171
1172       On error this method will croak.
1173
1174       In list context,  the headers will be returned whether they are used to
1175       set "column_names" or not.
1176
1177       In scalar context, the instance itself is returned.  Note: the values
1178       as found in the header will effectively be  lost if  "set_column_names"
1179       is false.
1180
1181       Options
1182
1183       sep_set
1184          $csv->header ($fh, { sep_set => [ ";", ",", "|", "\t" ] });
1185
1186         The list of legal separators defaults to "[ ";", "," ]" and can be
1187         changed by this option.  As this is probably the most often used
1188         option,  it can be passed on its own as an unnamed argument:
1189
1190          $csv->header ($fh, [ ";", ",", "|", "\t", "::", "\x{2063}" ]);
1191
1192         Multi-byte  sequences are allowed,  both multi-character and
1193         Unicode.  See "sep".
1194
1195       detect_bom
1196          $csv->header ($fh, { detect_bom => 1 });
1197
1198         The default behavior is to detect if the header line starts with a
1199         BOM.  If the header has a BOM, use that to set the encoding of $fh.
1200         This default behavior can be disabled by passing a false value to
1201         "detect_bom".
1202
1203         Supported encodings from BOM are: UTF-8, UTF-16BE, UTF-16LE,
1204         UTF-32BE,  and UTF-32LE. BOM also supports UTF-1, UTF-EBCDIC, SCSU,
1205         BOCU-1,  and GB-18030 but Encode does not (yet). UTF-7 is not
1206         supported.
1207
1208         If a supported BOM was detected as start of the stream, it is stored
1209         in the object attribute "ENCODING".
1210
1211          my $enc = $csv->{ENCODING};
1212
1213         The encoding is used with "binmode" on $fh.
1214
1215         If the handle was opened in a (correct) encoding,  this method will
1216         not alter the encoding, as it checks the leading bytes of the first
1217         line. In case the stream starts with a decoded BOM ("U+FEFF"),
1218         "{ENCODING}" will be "" (empty) instead of the default "undef".
1219
1220       munge_column_names
1221         This option offers the means to modify the column names into
1222         something that is most useful to the application.   The default is to
1223         map all column names to lower case.
1224
1225          $csv->header ($fh, { munge_column_names => "lc" });
1226
1227         The following values are available:
1228
1229           lc     - lower case
1230           uc     - upper case
1231           db     - valid DB field names
1232           none   - do not change
1233           \%hash - supply a mapping
1234           \&cb   - supply a callback
1235
1236         Lower case
1237            $csv->header ($fh, { munge_column_names => "lc" });
1238
1239           The header is changed to all lower-case
1240
1241            $_ = lc;
1242
1243         Upper case
1244            $csv->header ($fh, { munge_column_names => "uc" });
1245
1246           The header is changed to all upper-case
1247
1248            $_ = uc;
1249
1250         Literal
1251            $csv->header ($fh, { munge_column_names => "none" });
1252
1253         Hash
1254            $csv->header ($fh, { munge_column_names => { foo => "sombrero" }});
1255
1256           If a column name has no entry in the hash, it is used unchanged.
1257
1258         Database
1259            $csv->header ($fh, { munge_column_names => "db" });
1260
1261           - lower-case
1262
1263           - all sequences of non-word characters are replaced with an
1264             underscore
1265
1266           - all leading underscores are removed
1267
1268            $_ = lc (s/\W+/_/gr =~ s/^_+//r);
1269
1270         Callback
1271            $csv->header ($fh, { munge_column_names => sub { fc } });
1272            $csv->header ($fh, { munge_column_names => sub { "column_".$col++ } });
1273            $csv->header ($fh, { munge_column_names => sub { lc (s/\W+/_/gr) } });
1274
1275           As this callback is called in a "map", you can use $_ directly.
1276
1277       set_column_names
1278          $csv->header ($fh, { set_column_names => 1 });
1279
1280         The default is to set the instance's column names using
1281         "column_names" if the method is successful, so subsequent calls to
1282         "getline_hr" can return a hash. Setting the column names can be
1283         disabled by passing a false value for this option.
1284
1285         As described in "return value" above, content is lost in scalar
1286         context.
1287
1288       Validation
1289
1290       When receiving CSV files from external sources,  this method can be
1291       used to protect against changes in the layout by restricting to known
1292       headers  (and typos in the header fields).
1293
1294        my %known = (
1295            "record key" => "c_rec",
1296            "rec id"     => "c_rec",
1297            "id_rec"     => "c_rec",
1298            "kode"       => "code",
1299            "code"       => "code",
1300            "vaule"      => "value",
1301            "value"      => "value",
1302            );
1303        my $csv = Text::CSV->new ({ binary => 1, auto_diag => 1 });
1304        open my $fh, "<", $source or die "$source: $!";
1305        $csv->header ($fh, { munge_column_names => sub {
1306            s/\s+$//;
1307            s/^\s+//;
1308            $known{lc $_} or die "Unknown column '$_' in $source";
1309            }});
1310        while (my $row = $csv->getline_hr ($fh)) {
1311            say join "\t", $row->{c_rec}, $row->{code}, $row->{value};
1312            }
1313
1314   bind_columns
1315       Takes a list of scalar references to be used for output with "print"
1316       or to receive the fields fetched by "getline". When you do not pass
1317       enough references to store the fetched fields in, "getline" will fail
1318       with error 3006.  If you pass more than there are fields to return,
1319       the content of the remaining references is left untouched.
1320
1321        $csv->bind_columns (\$code, \$name, \$price, \$description);
1322        while ($csv->getline ($fh)) {
1323            print "The price of a $name is \x{20ac} $price\n";
1324            }
1325
1326       To reset or clear all column binding, call "bind_columns" with the
1327       single argument "undef". This will also clear column names.
1328
1329        $csv->bind_columns (undef);
1330
1331       If no arguments are passed at all, "bind_columns" will return the list
1332       of current bindings or "undef" if no binds are active.
1333
1334       Note that in parsing with  "bind_columns",  the fields are set on the
1335       fly.  That implies that if the third field of a row causes an error
1336       (or this row has just two fields where the previous row had more),  the
1337       first two fields already have been assigned the values of the current
1338       row, while the rest of the fields will still hold the values of the
1339       previous row.  If you want the parser to fail in these cases, use the
1340       "strict" attribute.
1341
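       A self-contained sketch of reading with bound columns. The two-record
       input is sample data read from an in-memory handle.

```perl
use strict;
use warnings;
use Text::CSV;

my $data = "1,foo\n2,bar\n";
open my $fh, "<", \$data or die "in-memory handle: $!";

my $csv = Text::CSV->new ({ binary => 1 })
    or die "" . Text::CSV->error_diag ();

# getline () stores each field straight into these scalars.
my ($code, $name);
$csv->bind_columns (\$code, \$name);

my @seen;
while ($csv->getline ($fh)) {
    # The return value is an empty array ref; the data is in the scalars.
    push @seen, "$code=$name";
    }
close $fh;

# Clear the binding again once it is no longer needed.
$csv->bind_columns (undef);

print "@seen\n";
```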
1342   eof
1343        $eof = $csv->eof ();
1344
1345       If "parse" or  "getline"  was used with an IO stream,  this method will
1346       return true (1) if the last call hit end of file,  otherwise it will
1347       return false ('').  This is useful to see the difference between a
1348       failure and end of file.
1349
1350       Note that if the parsing of the last line caused an error,  "eof" is
1351       still true.  That means that if you are not using "auto_diag", an idiom
1352       like
1353
1354        while (my $row = $csv->getline ($fh)) {
1355            # ...
1356            }
1357        $csv->eof or $csv->error_diag;
1358
1359       will not report the error. You would have to change that to
1360
1361        while (my $row = $csv->getline ($fh)) {
1362            # ...
1363            }
1364        +$csv->error_diag and $csv->error_diag;
1365
1366   types
1367        $csv->types (\@tref);
1368
1369       This method is used to force that  (all)  columns are of a given type.
1370       For example, if you have an integer column,  two  columns  with
1371       doubles  and a string column, then you might do a
1372
1373        $csv->types ([Text::CSV::IV (),
1374                      Text::CSV::NV (),
1375                      Text::CSV::NV (),
1376                      Text::CSV::PV ()]);
1377
1378       Column types are used only for decoding columns while parsing,  in
1379       other words by the "parse" and "getline" methods.
1380
1381       You can unset column types by doing a
1382
1383        $csv->types (undef);
1384
1385       or fetch the current type settings with
1386
1387        $types = $csv->types ();
1388
1389       IV  Set field type to integer.
1390
1391       NV  Set field type to numeric/float.
1392
1393       PV  Set field type to string.
1394
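       A brief sketch of column types applied during "parse". The three
       sample fields are arbitrary; the string column keeps its leading
       zeros because it is declared as PV.

```perl
use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new ()
    or die "" . Text::CSV->error_diag ();

# One type per column: integer, float, string.
$csv->types ([ Text::CSV::IV (), Text::CSV::NV (), Text::CSV::PV () ]);

$csv->parse ("42,3.50,007")
    or die "parse failed on: " . $csv->error_input ();
my ($int, $num, $str) = $csv->fields ();

printf "int=%d num=%.2f str=%s\n", $int, $num, $str;

# Remove the type information again.
$csv->types (undef);
```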
1395   fields
1396        @columns = $csv->fields ();
1397
1398       This method returns the input to   "combine"  or the resultant
1399       decomposed fields of a successful "parse", whichever was called more
1400       recently.
1401
1402       Note that the return value is undefined after using "getline", which
1403       does not fill the data structures returned by "parse".
1404
1405   meta_info
1406        @flags = $csv->meta_info ();
1407
1408       This method returns the "flags" of the input to "combine" or the flags
1409       of the resultant  decomposed fields of  "parse",   whichever was called
1410       more recently.
1411
1412       For each field,  a meta_info field will hold  flags that  inform
1413       something about  the  field  returned  by  the  "fields"  method or
1414       passed to  the "combine" method. The flags are bit-wise-"or"'d like:
1415
1416       0x0001
1417         The field was quoted.
1418
1419       0x0002
1420         The field was binary.
1421
1422       See the "is_***" methods below.
1423
1424   is_quoted
1425        my $quoted = $csv->is_quoted ($column_idx);
1426
1427       where  $column_idx is the  (zero-based)  index of the column in the
1428       last result of "parse".
1429
1430       This returns a true value  if the data in the indicated column was
1431       enclosed in "quote_char" quotes.  This might be important for fields
1432       where content ",20070108," is to be treated as a numeric value,  and
1433       where ","20070108"," is explicitly marked as character string data.
1434
1435       This method is only valid when "keep_meta_info" is set to a true value.
1436
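       The numeric-versus-string distinction mentioned above can be sketched
       as follows, using the same sample value once unquoted and once
       quoted:

```perl
use strict;
use warnings;
use Text::CSV;

# Meta information is only recorded when keep_meta_info is true.
my $csv = Text::CSV->new ({ keep_meta_info => 1 })
    or die "" . Text::CSV->error_diag ();

$csv->parse ('20070108,"20070108"')
    or die "parse failed on: " . $csv->error_input ();

# Normalize the returned truth values to 0/1 for display.
my @quoted = map { $csv->is_quoted ($_) ? 1 : 0 } 0, 1;
print "column $_ quoted: $quoted[$_]\n" for 0, 1;
```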
1437   is_binary
1438        my $binary = $csv->is_binary ($column_idx);
1439
1440       where  $column_idx is the  (zero-based)  index of the column in the
1441       last result of "parse".
1442
1443       This returns a true value if the data in the indicated column contained
1444       any byte in the range "[\x00-\x08,\x10-\x1F,\x7F-\xFF]".
1445
1446       This method is only valid when "keep_meta_info" is set to a true value.
1447
1448   is_missing
1449        my $missing = $csv->is_missing ($column_idx);
1450
1451       where  $column_idx is the  (zero-based)  index of the column in the
1452       last result of "getline_hr".
1453
1454        $csv->keep_meta_info (1);
1455        while (my $hr = $csv->getline_hr ($fh)) {
1456            $csv->is_missing (0) and next; # This was an empty line
1457            }
1458
1459       When using "getline_hr", it is impossible to tell if the parsed
1460       fields are "undef" because they were not filled in the "CSV" stream
1461       or because they were not read at all, as all the fields defined by
1462       "column_names" are set in the hash-ref.    If you still need to know if
1463       all fields in each row are provided, you should enable "keep_meta_info"
1464       so you can check the flags.
1465
1466       If  "keep_meta_info"  is "false",  "is_missing"  will always return
1467       "undef", regardless of $column_idx being valid or not. If this
1468       attribute is "true" it will return either 0 (the field is present) or 1
1469       (the field is missing).
1470
1471       A special case is the empty line.  If the line is completely empty -
1472       after dealing with the flags - this is still a valid CSV line:  it is a
1473       record of just one single empty field. However, if "keep_meta_info" is
1474       set, invoking "is_missing" with index 0 will now return true.
1475
1476   status
1477        $status = $csv->status ();
1478
1479       This method returns the status of the last invoked "combine" or "parse"
1480       call. Status is success (true: 1) or failure (false: "undef" or 0).
1481
1482       Note that as this only keeps track of the status of above mentioned
1483       methods, you are probably looking for "error_diag" instead.
1484
1485   error_input
1486        $bad_argument = $csv->error_input ();
1487
1488       This method returns the erroneous argument (if it exists) of "combine"
1489       or "parse",  whichever was called more recently.  If the last
1490       invocation was successful, "error_input" will return "undef".
1491
1492       Depending on the type of error, it might also hold the data for the
1493       last error-input of "getline".
1494
1495   error_diag
1496        Text::CSV->error_diag ();
1497        $csv->error_diag ();
1498        $error_code               = 0  + $csv->error_diag ();
1499        $error_str                = "" . $csv->error_diag ();
1500        ($cde, $str, $pos, $rec, $fld) = $csv->error_diag ();
1501
1502       If (and only if) an error occurred,  this function returns  the
1503       diagnostics of that error.
1504
1505       If called in void context,  this will print the internal error code and
1506       the associated error message to STDERR.
1507
1508       If called in list context,  this will return  the error code  and the
1509       error message in that order.  If the last error was from parsing, the
1510       rest of the values returned are a best guess at the location  within
1511       the line  that was being parsed. Their values are 1-based.  The
1512       position currently is index of the byte at which the parsing failed in
1513       the current record. It might change to be the index of the current
1514       character in a later release. The record number is the index of the
1515       record parsed by the csv instance. The field number is the index of
1516       the field the parser thinks it is currently trying to parse. See
1517       examples/csv-check for how this can be used.
1518
1519       If called in  scalar context,  it will return  the diagnostics  in a
1520       single scalar, a-la $!.  It will contain the error code in numeric
1521       context, and the diagnostics message in string context.
1522
1523       When called as a class method or a  direct function call,  the
1524       diagnostics are that of the last "new" call.
1525
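       The list and scalar contexts can be compared in a small sketch. The
       unterminated quoted field is a deliberately broken sample input, and
       "auto_diag" is left off so that nothing is reported automatically.

```perl
use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new ()
    or die "" . Text::CSV->error_diag ();

# Deliberately invalid input: the quoted field is never closed.
my $ok = $csv->parse ('1,"unterminated');

# List context: code, message, and a best guess at the location.
my ($cde, $str, $pos, $rec, $fld) = $csv->error_diag ();

# Scalar context: one value that acts as number or string on demand.
my $diag = $csv->error_diag ();
my $num  = 0  + $diag;
my $msg  = "" . $diag;

print "parse failed with error $num: $msg\n" unless $ok;
```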
1526   record_number
1527        $recno = $csv->record_number ();
1528
1529       Returns the number of records parsed by this csv instance. This value
1530       should be more accurate than $. when embedded newlines come into
1531       play. Records written by this instance are not counted.
1532
1533   SetDiag
1534        $csv->SetDiag (0);
1535
1536       Use to reset the diagnostics if you are dealing with errors.
1537

ADDITIONAL METHODS

1539       backend
1540           Returns the backend module name called by Text::CSV.  "module" is
1541           an alias.
1542
1543       is_xs
1544           Returns a true value if Text::CSV uses an XS backend.
1545
1546       is_pp
1547           Returns a true value if Text::CSV uses a pure-Perl backend.
1548
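       A short sketch that reports which backend is in use. Calling the
       methods on an instance, as done here, is a choice made for the
       example.

```perl
use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new ()
    or die "" . Text::CSV->error_diag ();

# Name of the module that does the actual work.
my $backend = $csv->backend;

# Normalize both checks to 0/1.
my $is_xs = $csv->is_xs ? 1 : 0;
my $is_pp = $csv->is_pp ? 1 : 0;

printf "backend: %s (%s)\n", $backend,
    $is_xs ? "XS" : $is_pp ? "pure Perl" : "other";
```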

FUNCTIONS

1550       This section is also taken from Text::CSV_XS.
1551
1552   csv
1553       This function is not exported by default and should be explicitly
1554       requested:
1555
1556        use Text::CSV qw( csv );
1557
1558       This is a high-level function that aims at simple (user) interfaces.
1559       This can be used to read/parse a "CSV" file or stream (the default
1560       behavior) or to produce a file or write to a stream (define the  "out"
1561       attribute).  It returns an array- or hash-reference on parsing (or
1562       "undef" on fail) or the numeric value of  "error_diag"  on writing.
1563       When this function fails you can get to the error using the class call
1564       to "error_diag"
1565
1566        my $aoa = csv (in => "test.csv") or
1567            die Text::CSV->error_diag;
1568
1569       This function takes the arguments as key-value pairs. This can be
1570       passed as a list or as an anonymous hash:
1571
1572        my $aoa = csv (  in => "test.csv", sep_char => ";");
1573        my $aoh = csv ({ in => $fh, headers => "auto" });
1574
1575       The arguments passed consist of two parts:  the arguments to "csv"
1576       itself and the optional attributes to the  "CSV"  object used inside
1577       the function as enumerated and explained in "new".
1578
1579       If not overridden, the default options used for CSV are
1580
1581        auto_diag   => 1
1582        escape_null => 0
1583
1584       The option that is always set and cannot be altered is
1585
1586        binary      => 1
1587
1588       As this function will likely be used in one-liners,  it allows  "quote"
1589       to be abbreviated as "quo",  and  "escape_char" to be abbreviated as
1590       "esc" or "escape".
1591
1592       Alternative invocations:
1593
1594        my $aoa = Text::CSV::csv (in => "file.csv");
1595
1596        my $csv = Text::CSV->new ();
1597        my $aoa = $csv->csv (in => "file.csv");
1598
1599       In the latter case, the object attributes are used from the existing
1600       object and the attribute arguments in the function call are ignored:
1601
1602        my $csv = Text::CSV->new ({ sep_char => ";" });
1603        my $aoh = $csv->csv (in => "file.csv", bom => 1);
1604
1605       will parse using ";" as "sep_char", not ",".
1606
1607       in
1608
1609       Used to specify the source.  "in" can be a file name (e.g. "file.csv"),
1610       which will be  opened for reading  and closed when finished,  a file
1611       handle (e.g.  $fh or "FH"),  a reference to a glob (e.g. "\*ARGV"),
1612       the glob itself (e.g. *STDIN), or a reference to a scalar (e.g.
1613       "\q{1,2,"csv"}").
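
           A short sketch of two of these source types, assuming made-up
           sample data held in a scalar.  The same text is parsed once
           through a scalar reference and once through an opened handle:

```perl
use strict;
use warnings;
use Text::CSV qw( csv );

my $data = "code;name\n1;foo\n2;bar\n";

# A reference to a scalar is parsed as if it were the file content
my $aoa = csv (in => \$data, sep_char => ";")
    or die Text::CSV->error_diag;

# An already opened file handle is read from its current position
open my $fh, "<", \$data or die "in-memory open: $!";
my $aoh = csv (in => $fh, sep_char => ";", headers => "auto")
    or die Text::CSV->error_diag;
close $fh;

printf "%d rows, first name: %s\n", scalar @$aoa, $aoh->[0]{name};
```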
1614
1615       When used with "out", "in" should be a reference to a CSV structure
1616       (AoA or AoH)  or a CODE-ref that returns an array-reference or a hash-
1617       reference.  The code-ref will be invoked with no arguments.
1618
1619        my $aoa = csv (in => "file.csv");
1620
1621        open my $fh, "<", "file.csv";
1622        my $aoa = csv (in => $fh);
1623
1624        my $csv = [ [qw( Foo Bar )], [ 1, 2 ], [ 2, 3 ]];
1625        my $err = csv (in => $csv, out => "file.csv");
1626
1627       If called in void context without the "out" attribute, the resulting
1628       ref will be used as input to a subsequent call to csv:
1629
1630        csv (in => "file.csv", filter => { 2 => sub { length > 2 }})
1631
1632       will be a shortcut to
1633
1634        csv (in => csv (in => "file.csv", filter => { 2 => sub { length > 2 }}))
1635
1636       where, in the absence of the "out" attribute, this is a shortcut to
1637
1638        csv (in  => csv (in => "file.csv", filter => { 2 => sub { length > 2 }}),
1639             out => *STDOUT)
1640
1641       out
1642
1643        csv (in => $aoa, out => "file.csv");
1644        csv (in => $aoa, out => $fh);
1645        csv (in => $aoa, out =>   STDOUT);
1646        csv (in => $aoa, out =>  *STDOUT);
1647        csv (in => $aoa, out => \*STDOUT);
1648        csv (in => $aoa, out => \my $data);
1649        csv (in => $aoa, out =>  undef);
1650        csv (in => $aoa, out => \"skip");
1651
1652        csv (in => $fh,  out => \@aoa);
1653        csv (in => $fh,  out => \@aoh, bom => 1);
1654        csv (in => $fh,  out => \%hsh, key => "key");
1655
1656       In output mode, the default CSV options when producing CSV are
1657
1658        eol       => "\r\n"
1659
1660       The "fragment" attribute is ignored in output mode.
1661
1662       "out" can be a file name  (e.g.  "file.csv"),  which will be opened for
1663       writing and closed when finished,  a file handle (e.g. $fh or "FH"),  a
1664       reference to a glob (e.g. "\*STDOUT"),  the glob itself (e.g. *STDOUT),
1665       or a reference to a scalar (e.g. "\my $data").
1666
1667        csv (in => sub { $sth->fetch },            out => "dump.csv");
1668        csv (in => sub { $sth->fetchrow_hashref }, out => "dump.csv",
1669             headers => $sth->{NAME_lc});
1670
1671       When a code-ref is used for "in", the output is generated  per
1672       invocation, so no buffering is involved. This implies that there is no
1673       size restriction on the number of records. The "csv" function ends when
1674       the coderef returns a false value.
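
           The following sketch writes into scalars instead of files so the
           result can be inspected directly.  The data and the @queue
           producer are illustrative assumptions, standing in for something
           like a database statement handle:

```perl
use strict;
use warnings;
use Text::CSV qw( csv );

# Array of arrays written into a scalar instead of a file
my $aoa = [ [qw( Foo Bar )], [ 1, 2 ], [ 2, 3 ] ];
csv (in => $aoa, out => \my $from_aoa);

# A code-ref producer: one row per call, a false value ends the output
my @queue = ([ 1, "a" ], [ 2, "b" ]);
csv (in => sub { shift @queue }, out => \my $from_code);

print $from_aoa, $from_code;
```

           Each record ends in "\r\n", the default "eol" in output mode.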
1675
1676       If "out" is set to a reference of the literal string "skip", the output
1677       will be suppressed completely,  which might be useful in combination
1678       with a filter for side effects only.
1679
1680        my %cache;
1681        csv (in    => "dump.csv",
1682             out   => \"skip",
1683             on_in => sub { $cache{$_[1][1]}++ });
1684
1685       Currently,  setting "out" to any false value  ("undef", "", 0) will be
1686       equivalent to "\"skip"".
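
           A self-contained variation of the example above, assuming made-up
           data in a scalar, that only counts values and writes nothing:

```perl
use strict;
use warnings;
use Text::CSV qw( csv );

my $data = "1,red\n2,blue\n3,red\n";

# Count the values of the 2nd column; no output is produced
my %count;
csv (in    => \$data,
     out   => \"skip",
     on_in => sub { $count{$_[1][1]}++ });

print "$_: $count{$_}\n" for sort keys %count;
```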
1687
1688       If the "in" argument points to something to parse, and "out" is set
1689       to a reference to an "ARRAY" or a "HASH", the output is appended to the
1690       data in the existing reference. The result of the parse should match
1691       what exists in the reference passed. This might come in handy when you
1692       have to parse a set of files with similar content (like data stored per
1693       period) and you want to collect that into a single data structure:
1694
1695        my %hash;
1696        csv (in => $_, out => \%hash, key => "id") for sort glob "foo-[0-9]*.csv";
1697
1698        my @list; # List of arrays
1699        csv (in => $_, out => \@list)              for sort glob "foo-[0-9]*.csv";
1700
1701        my @list; # List of hashes
1702        csv (in => $_, out => \@list, bom => 1)    for sort glob "foo-[0-9]*.csv";
1703
1704       encoding
1705
1706       If passed,  it should be an encoding accepted by the  ":encoding()"
1707       option to "open". There is no default value. This attribute does not
1708       work in perl 5.6.x.  "encoding" can be abbreviated to "enc" for ease of
1709       use in command line invocations.
1710
1711       If "encoding" is set to the literal value "auto", the method "header"
1712       will be invoked on the opened stream to check if there is a BOM and set
1713       the encoding accordingly.   This is equal to passing a true value in
1714       the option "detect_bom".
1715
1716       Encodings can be stacked, as supported by "binmode":
1717
1718        # Using PerlIO::via::gzip
1719        csv (in       => \@csv,
1720             out      => "test.csv:via.gz",
1721             encoding => ":via(gzip):encoding(utf-8)",
1722             );
1723        $aoa = csv (in => "test.csv:via.gz",  encoding => ":via(gzip)");
1724
1725        # Using PerlIO::gzip
1726        csv (in       => \@csv,
1727             out      => "test.csv:gzip.gz",
1728             encoding => ":gzip:encoding(utf-8)",
1729             );
1730        $aoa = csv (in => "test.csv:gzip.gz", encoding => ":gzip");
1731
1732       detect_bom
1733
1734       If  "detect_bom"  is given, the method  "header"  will be invoked on
1735       the opened stream to check if there is a BOM and set the encoding
1736       accordingly.
1737
1738       "detect_bom" can be abbreviated to "bom".
1739
1740       This is the same as setting "encoding" to "auto".
1741
1742       Note that as the method  "header" is invoked,  its default is to also
1743       set the headers.
1744
1745       headers
1746
1747       If this attribute is not given, the default behavior is to produce an
1748       array of arrays.
1749
1750       If "headers" is supplied,  it should be an anonymous list of column
1751       names, an anonymous hashref, a coderef, or a literal flag:  "auto",
1752       "lc", "uc", or "skip".
1753
1754       skip
1755         When "skip" is used, the header will not be included in the output.
1756
1757          my $aoa = csv (in => $fh, headers => "skip");
1758
1759       auto
1760         If "auto" is used, the first line of the "CSV" source will be read as
1761         the list of field headers and used to produce an array of hashes.
1762
1763          my $aoh = csv (in => $fh, headers => "auto");
1764
1765       lc
1766         If "lc" is used,  the first line of the  "CSV" source will be read as
1767         the list of field headers mapped to  lower case and used to produce
1768         an array of hashes. This is a variation of "auto".
1769
1770          my $aoh = csv (in => $fh, headers => "lc");
1771
1772       uc
1773         If "uc" is used,  the first line of the  "CSV" source will be read as
1774         the list of field headers mapped to  upper case and used to produce
1775         an array of hashes. This is a variation of "auto".
1776
1777          my $aoh = csv (in => $fh, headers => "uc");
1778
1779       CODE
1780         If a coderef is used,  the first line of the  "CSV" source will be
1781         read as the list of mangled field headers in which each field is
1782         passed as the only argument to the coderef. This list is used to
1783         produce an array of hashes.
1784
1785          my $aoh = csv (in      => $fh,
1786                         headers => sub { lc ($_[0]) =~ s/kode/code/gr });
1787
1788         this example is a variation of using "lc" where all occurrences of
1789         "kode" are replaced with "code".
1790
1791       ARRAY
1792         If  "headers"  is an anonymous list,  the entries in the list will be
1793         used as field names. The first line is considered data instead of
1794         headers.
1795
1796          my $aoh = csv (in => $fh, headers => [qw( Foo Bar )]);
1797          csv (in => $aoa, out => $fh, headers => [qw( code description price )]);
1798
1799       HASH
1800         If "headers" is a hash reference, this implies "auto", but header
1801         fields that exist as key in the hashref will be replaced by the value
1802         for that key. Given a CSV file like
1803
1804          post-kode,city,name,id number,fubble
1805          1234AA,Duckstad,Donald,13,"X313DF"
1806
1807         using
1808
1809          csv (headers => { "post-kode" => "pc", "id number" => "ID" }, ...
1810
1811         will return an entry like
1812
1813          { pc     => "1234AA",
1814            city   => "Duckstad",
1815            name   => "Donald",
1816            ID     => "13",
1817            fubble => "X313DF",
1818            }
1819
1820       See also "munge_column_names" and "set_column_names".
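
           The variants can be compared side by side.  This sketch reuses the
           sample line from the HASH example and parses it from a scalar; the
           column names passed in the ARRAY case are arbitrary:

```perl
use strict;
use warnings;
use Text::CSV qw( csv );

my $data = qq{post-kode,city,name,id number,fubble\n}
         . qq{1234AA,Duckstad,Donald,13,"X313DF"\n};

# "auto": the first line supplies the hash keys
my $auto = csv (in => \$data, headers => "auto");

# "uc": as "auto", with the keys mapped to upper case
my $uc   = csv (in => \$data, headers => "uc");

# ARRAY: the keys are given, so the first line is returned as data
my $list = csv (in => \$data, headers => [qw( pc city name id fubble )]);

# HASH: as "auto", with selected keys renamed
my $map  = csv (in      => \$data,
                headers => { "post-kode" => "pc", "id number" => "ID" });

print join (",", sort keys %{$map->[0]}), "\n";
```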
1821
1822       munge_column_names
1823
1824       If "munge_column_names" is set,  the method  "header"  is invoked on
1825       the opened stream with all matching arguments to detect and set the
1826       headers.
1827
1828       "munge_column_names" can be abbreviated to "munge".
1829
1830       key
1831
1832       If passed,  will default  "headers"  to "auto" and return a hashref
1833       instead of an array of hashes. Allowed values are simple scalars or
1834       array-references where the first element is the joiner and the rest are
1835       the fields to join to combine the key.
1836
1837        my $ref = csv (in => "test.csv", key => "code");
1838        my $ref = csv (in => "test.csv", key => [ ":" => "code", "color" ]);
1839
1840       with test.csv like
1841
1842        code,product,price,color
1843        1,pc,850,gray
1844        2,keyboard,12,white
1845        3,mouse,5,black
1846
1847       the first example will return
1848
1849         { 1   => {
1850               code    => 1,
1851               color   => 'gray',
1852               price   => 850,
1853               product => 'pc'
1854               },
1855           2   => {
1856               code    => 2,
1857               color   => 'white',
1858               price   => 12,
1859               product => 'keyboard'
1860               },
1861           3   => {
1862               code    => 3,
1863               color   => 'black',
1864               price   => 5,
1865               product => 'mouse'
1866               }
1867           }
1868
1869       the second example will return
1870
1871         { "1:gray"    => {
1872               code    => 1,
1873               color   => 'gray',
1874               price   => 850,
1875               product => 'pc'
1876               },
1877           "2:white"   => {
1878               code    => 2,
1879               color   => 'white',
1880               price   => 12,
1881               product => 'keyboard'
1882               },
1883           "3:black"   => {
1884               code    => 3,
1885               color   => 'black',
1886               price   => 5,
1887               product => 'mouse'
1888               }
1889           }
1890
1891       The "key" attribute can be combined with "headers" for "CSV" data that
1892       has no header line, like
1893
1894        my $ref = csv (
1895            in      => "foo.csv",
1896            headers => [qw( c_foo foo bar description stock )],
1897            key     =>     "c_foo",
1898            );
1899
1900       value
1901
1902       Used to create key-value hashes.
1903
1904       Only allowed when "key" is valid. A "value" can be either a single
1905       column label or an anonymous list of column labels.  In the first case,
1906       the value will be a simple scalar value, in the latter case, it will be
1907       a hashref.
1908
1909        my $ref = csv (in => "test.csv", key   => "code",
1910                                         value => "price");
1911        my $ref = csv (in => "test.csv", key   => "code",
1912                                         value => [ "product", "price" ]);
1913        my $ref = csv (in => "test.csv", key   => [ ":" => "code", "color" ],
1914                                         value => "price");
1915        my $ref = csv (in => "test.csv", key   => [ ":" => "code", "color" ],
1916                                         value => [ "product", "price" ]);
1917
1918       with test.csv like
1919
1920        code,product,price,color
1921        1,pc,850,gray
1922        2,keyboard,12,white
1923        3,mouse,5,black
1924
1925       the first example will return
1926
1927         { 1 => 850,
1928           2 =>  12,
1929           3 =>   5,
1930           }
1931
1932       the second example will return
1933
1934         { 1   => {
1935               price   => 850,
1936               product => 'pc'
1937               },
1938           2   => {
1939               price   => 12,
1940               product => 'keyboard'
1941               },
1942           3   => {
1943               price   => 5,
1944               product => 'mouse'
1945               }
1946           }
1947
1948       the third example will return
1949
1950         { "1:gray"    => 850,
1951           "2:white"   =>  12,
1952           "3:black"   =>   5,
1953           }
1954
1955       the fourth example will return
1956
1957         { "1:gray"    => {
1958               price   => 850,
1959               product => 'pc'
1960               },
1961           "2:white"   => {
1962               price   => 12,
1963               product => 'keyboard'
1964               },
1965           "3:black"   => {
1966               price   => 5,
1967               product => 'mouse'
1968               }
1969           }
1970
1971       keep_headers
1972
1973       When using hashes,  store the column names in the arrayref passed,  so
1974       all headers are available after the call in the original order.
1975
1976        my $aoh = csv (in => "file.csv", keep_headers => \my @hdr);
1977
1978       This attribute can be abbreviated to "kh" or passed as
1979       "keep_column_names".
1980
1981       This attribute implies a default of "auto" for the "headers" attribute.
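
           One use is printing the fields of each hash in the original column
           order.  A minimal sketch, assuming made-up data in a scalar:

```perl
use strict;
use warnings;
use Text::CSV qw( csv );

my $data = "code,product,price\n1,pc,850\n2,keyboard,12\n";

# @hdr receives the column names in the order found in the header line
my $aoh = csv (in => \$data, keep_headers => \my @hdr);

for my $rec (@$aoh) {
    print join (",", @{$rec}{@hdr}), "\n";   # hash slice in header order
    }
```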
1982
1983       fragment
1984
1985       Only output the fragment as defined in the "fragment" method. This
1986       option is ignored when generating "CSV". See "out".
1987
1988       Combining all of them could give something like
1989
1990        use Text::CSV qw( csv );
1991        my $aoh = csv (
1992            in       => "test.txt",
1993            encoding => "utf-8",
1994            headers  => "auto",
1995            sep_char => "|",
1996            fragment => "row=3;6-9;15-*",
1997            );
1998        say $aoh->[15]{Foo};
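
           A smaller, self-contained sketch of a row fragment, assuming six
           generated lines without a header, so the result is an array of
           arrays:

```perl
use strict;
use warnings;
use Text::CSV qw( csv );

# Six lines: "1,row1" .. "6,row6"
my $data = join "", map { $_ . ",row$_\n" } 1 .. 6;

# Keep rows 2 and 3, and row 5 (rows are counted from 1)
my $aoa = csv (in => \$data, fragment => "row=2-3;5");

print join (",", @$_), "\n" for @$aoa;
```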
1999
2000       sep_set
2001
2002       If "sep_set" is set, the method "header" is invoked on the opened
2003       stream to detect and set "sep_char" with the given set.
2004
2005       "sep_set" can be abbreviated to "seps".
2006
2007       Note that as the  "header" method is invoked,  its default is to also
2008       set the headers.
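
           A sketch of separator detection, assuming pipe-separated sample
           data and an arbitrary candidate set.  Because "header" also sets
           the column names, the result is an array of hashes:

```perl
use strict;
use warnings;
use Text::CSV qw( csv );

my $data = "code|name\n1|foo\n2|bar\n";

# The separator is picked from the set, based on the header line
my $aoh = csv (in => \$data, sep_set => [ ";", "|", "\t" ]);

print "$_->{code} => $_->{name}\n" for @$aoh;
```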
2009
2010       set_column_names
2011
2012       If  "set_column_names" is passed,  the method "header" is invoked on
2013       the opened stream with all arguments meant for "header".
2014
2015       If "set_column_names" is passed as a false value, the content of the
2016       first row is only preserved if the output is AoA:
2017
2018       With an input-file like
2019
2020        bAr,foo
2021        1,2
2022        3,4,5
2023
2024       This call
2025
2026        my $aoa = csv (in => $file, set_column_names => 0);
2027
2028       will result in
2029
2030        [[ "bar", "foo"     ],
2031         [ "1",   "2"       ],
2032         [ "3",   "4",  "5" ]]
2033
2034       and
2035
2036        my $aoa = csv (in => $file, set_column_names => 0, munge => "none");
2037
2038       will result in
2039
2040        [[ "bAr", "foo"     ],
2041         [ "1",   "2"       ],
2042         [ "3",   "4",  "5" ]]
2043
2044   Callbacks
2045       Callbacks enable actions triggered from the inside of Text::CSV.
2046
2047       While most of what this enables  can easily be done in an  unrolled
2048       loop as described in the "SYNOPSIS", callbacks can be used to meet
2049       special demands or enhance the "csv" function.
2050
2051       error
2052          $csv->callbacks (error => sub { $csv->SetDiag (0) });
2053
2054         The "error"  callback is invoked when an error occurs,  but  only
2055         when "auto_diag" is set to a true value. A callback is invoked with
2056         the values returned by "error_diag":
2057
2058          my ($c, $s);
2059
2060          sub ignore3006 {
2061              my ($err, $msg, $pos, $recno, $fldno) = @_;
2062              if ($err == 3006) {
2063                  # ignore this error
2064                  ($c, $s) = (undef, undef);
2065                  Text::CSV->SetDiag (0);
2066                  }
2067              # Any other error
2068              return;
2069              } # ignore3006
2070
2071          $csv->callbacks (error => \&ignore3006);
2072          $csv->bind_columns (\$c, \$s);
2073          while ($csv->getline ($fh)) {
2074              # Error 3006 will not stop the loop
2075              }
2076
2077       after_parse
2078          $csv->callbacks (after_parse => sub { push @{$_[1]}, "NEW" });
2079          while (my $row = $csv->getline ($fh)) {
2080              $row->[-1] eq "NEW";
2081              }
2082
2083         This callback is invoked after parsing with  "getline"  only if no
2084         error occurred.  The callback is invoked with two arguments:   the
2085         current "CSV" parser object and an array reference to the fields
2086         parsed.
2087
2088         The return code of the callback is ignored  unless it is a reference
2089         to the string "skip", in which case the record will be skipped in
2090         "getline_all".
2091
2092          sub add_from_db {
2093              my ($csv, $row) = @_;
2094              $sth->execute ($row->[4]);
2095              push @$row, $sth->fetchrow_array;
2096              } # add_from_db
2097
2098          my $aoa = csv (in => "file.csv", callbacks => {
2099              after_parse => \&add_from_db });
2100
2101         This hook can be used for validation:
2102
2103         FAIL
2104           Die if any of the records does not validate a rule:
2105
2106            after_parse => sub {
2107                $_[1][4] =~ m/^[0-9]{4}\s?[A-Z]{2}$/ or
2108                    die "5th field does not have a valid Dutch zipcode";
2109                }
2110
2111         DEFAULT
2112           Replace invalid fields with a default value:
2113
2114            after_parse => sub { $_[1][2] =~ m/^\d+$/ or $_[1][2] = 0 }
2115
2116         SKIP
2117           Skip records that have invalid fields (only applies to
2118           "getline_all"):
2119
2120            after_parse => sub { $_[1][0] =~ m/^\d+$/ or return \"skip"; }
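
             The DEFAULT and SKIP strategies can be combined in one hook.
             A sketch, assuming made-up data and using "csv", which returns
             all records at once, so skipping applies:

```perl
use strict;
use warnings;
use Text::CSV qw( csv );

my $data = "1,apple,10\nx,bogus,5\n2,pear,n/a\n";

my $aoa = csv (in => \$data, callbacks => {
    after_parse => sub {
        my ($csv, $row) = @_;
        $row->[0] =~ m/^\d+$/ or return \"skip";   # SKIP: drop the record
        $row->[2] =~ m/^\d+$/ or $row->[2] = 0;    # DEFAULT: repair the field
        return;
        },
    });

print join (",", @$_), "\n" for @$aoa;
```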
2121
2122       before_print
2123          my $idx = 1;
2124          $csv->callbacks (before_print => sub { $_[1][0] = $idx++ });
2125          $csv->print (*STDOUT, [ 0, $_ ]) for @members;
2126
2127         This callback is invoked  before printing with  "print"  only if no
2128         error occurred.  The callback is invoked with two arguments:  the
2129         current  "CSV" parser object and an array reference to the fields
2130         passed.
2131
2132         The return code of the callback is ignored.
2133
2134          sub max_4_fields {
2135              my ($csv, $row) = @_;
2136              @$row > 4 and splice @$row, 4;
2137              } # max_4_fields
2138
2139          csv (in => csv (in => "file.csv"), out => *STDOUT,
2140              callbacks => { before_print => \&max_4_fields });
2141
2142         This callback is not active for "combine".
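
             A self-contained variation of the example above, assuming
             made-up rows and a scalar as output target.  Note that the hook
             changes the caller's rows in place:

```perl
use strict;
use warnings;
use Text::CSV qw( csv );

my @rows = ([ 1 .. 6 ], [qw( a b c )]);

csv (in        => \@rows,
     out       => \my $out,
     callbacks => {
         before_print => sub {
             my ($csv, $row) = @_;
             @$row > 4 and splice @$row, 4;   # keep at most 4 fields
             },
         });

print $out;
```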
2143
2144       Callbacks for csv ()
2145
2146       The "csv" function allows for some callbacks that do not integrate in
2147       XS internals but are only available in the "csv" function.
2148
2149         csv (in        => "file.csv",
2150              callbacks => {
2151                  filter       => { 6 => sub { $_ > 15 } },    # first
2152                  after_parse  => sub { say "AFTER PARSE";  }, # first
2153                  after_in     => sub { say "AFTER IN";     }, # second
2154                  on_in        => sub { say "ON IN";        }, # third
2155                  },
2156              );
2157
2158         csv (in        => $aoh,
2159              out       => "file.csv",
2160              callbacks => {
2161                  on_in        => sub { say "ON IN";        }, # first
2162                  before_out   => sub { say "BEFORE OUT";   }, # second
2163                  before_print => sub { say "BEFORE PRINT"; }, # third
2164                  },
2165              );
2166
2167       filter
2168         This callback can be used to filter records.  It is called just after
2169         a new record has been scanned.  The callback accepts a:
2170
2171         hashref
2172           The keys are the index to the row (the field name or field number,
2173           1-based) and the values are subs to return a true or false value.
2174
2175            csv (in => "file.csv", filter => {
2176                       3 => sub { m/a/ },       # third field should contain an "a"
2177                       5 => sub { length > 4 }, # length of the 5th field minimal 5
2178                       });
2179
2180            csv (in => "file.csv", filter => { foo => sub { $_ > 4 }});
2181
2182           If the keys to the filter hash contain any character that is not a
2183           digit it will also implicitly set "headers" to "auto"  unless
2184           "headers"  was already passed as argument.  When headers are
2185           active, returning an array of hashes, the filter is not applicable
2186           to the header itself.
2187
2188           All sub results should match, as in AND.
2189
2190           The context of the callback sets  $_ localized to the field
2191           indicated by the filter. The two arguments are as with all other
2192           callbacks, so the other fields in the current row can be seen:
2193
2194            filter => { 3 => sub { $_ > 100 ? $_[1][1] =~ m/A/ : $_[1][6] =~ m/B/ }}
2195
2196           If the context is set to return a list of hashes  ("headers" is
2197           defined), the current record will also be available in the
2198           localized %_:
2199
2200            filter => { 3 => sub { $_ > 100 && $_{foo} =~ m/A/ && $_{bar} < 1000  }}
2201
2202           If the filter is used to alter the content by changing $_,  make
2203           sure that the sub returns true in order not to have that record
2204           skipped:
2205
2206            filter => { 2 => sub { $_ = uc }}
2207
2208           will upper-case the second field, and then skip it if the resulting
2209           content evaluates to false. To always accept, end with truth:
2210
2211            filter => { 2 => sub { $_ = uc; 1 }}
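
               A sketch that combines both uses, assuming made-up data with a
               header line.  One sub rejects records, the other alters its
               field and always accepts:

```perl
use strict;
use warnings;
use Text::CSV qw( csv );

my $data = "code,product,price\n1,pc,850\n2,keyboard,12\n3,mouse,5\n";

# Named keys imply headers => "auto", so the result is an array of hashes
my $aoh = csv (in => \$data, filter => {
    price   => sub { $_ >= 10 },      # reject cheap items
    product => sub { $_ = uc; 1 },    # alter the field, always accept
    });

print "$_->{product}: $_->{price}\n" for @$aoh;
```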
2212
2213         coderef
2214            csv (in => "file.csv", filter => sub { $n++; 0; });
2215
2216           If the argument to "filter" is a coderef,  it is an alias or
2217           shortcut to a filter on column 0:
2218
2219            csv (filter => sub { $n++; 0 });
2220
2221           is equal to
2222
2223            csv (filter => { 0 => sub { $n++; 0 } });
2224
2225         filter-name
2226            csv (in => "file.csv", filter => "not_blank");
2227            csv (in => "file.csv", filter => "not_empty");
2228            csv (in => "file.csv", filter => "filled");
2229
2230           These are predefined filters
2231
2232           Given a file like (line numbers prefixed for doc purpose only):
2233
2234            1:1,2,3
2235            2:
2236            3:,
2237            4:""
2238            5:,,
2239            6:, ,
2240            7:"",
2241            8:" "
2242            9:4,5,6
2243
2244           not_blank
2245             Filter out the blank lines
2246
2247             This filter is a shortcut for
2248
2249              filter => { 0 => sub { @{$_[1]} > 1 or
2250                          defined $_[1][0] && $_[1][0] ne "" } }
2251
2252             Due to the implementation,  it is currently impossible to also
2253             filter lines that consist only of a quoted empty field. These
2254             lines are also considered blank lines.
2255
2256             With the given example, lines 2 and 4 will be skipped.
2257
2258           not_empty
2259             Filter out lines where all the fields are empty.
2260
2261             This filter is a shortcut for
2262
2263              filter => { 0 => sub { grep { defined && $_ ne "" } @{$_[1]} } }
2264
2265             A space is not regarded as being empty, so given the example data,
2266             lines 2, 3, 4, 5, and 7 are skipped.
2267
2268           filled
2269             Filter out lines that have no visible data
2270
2271             This filter is a shortcut for
2272
2273              filter => { 0 => sub { grep { defined && m/\S/ } @{$_[1]} } }
2274
2275             This filter rejects all lines that do not have at least one field
2276             that contains a non-whitespace character.
2277
2278             With the given example data, this filter would skip lines 2
2279             through 8.
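
                 The three predefined filters can be compared on the example
                 file above.  This sketch rebuilds those nine lines in a
                 scalar (without the line-number prefixes) and counts the
                 records each filter keeps; %kept is an arbitrary name:

```perl
use strict;
use warnings;
use Text::CSV qw( csv );

# The nine example lines, without the line-number prefixes
my @lines = ('1,2,3', '', ',', '""', ',,', ', ,', '"",', '" "', '4,5,6');
my $data  = join ("\n", @lines) . "\n";

my %kept;
for my $name (qw( not_blank not_empty filled )) {
    my $aoa = csv (in => \$data, filter => $name);
    $kept{$name} = scalar @$aoa;
    print "$name keeps $kept{$name} of 9 lines\n";
    }
```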
2280
2281         One could also use modules like Types::Standard:
2282
2283          use Types::Standard -types;
2284
2285          my $type   = Tuple[Str, Str, Int, Bool, Optional[Num]];
2286          my $check  = $type->compiled_check;
2287
2288          # filter with compiled check and warnings
2289          my $aoa = csv (
2290             in     => \$data,
2291             filter => {
2292                 0 => sub {
2293                     my $ok = $check->($_[1]) or
2294                         warn $type->get_message ($_[1]), "\n";
2295                     return $ok;
2296                     },
2297                 },
2298             );
2299
2300       after_in
2301         This callback is invoked for each record after all records have been
2302         parsed but before returning the reference to the caller.  The hook is
2303         invoked with two arguments:  the current  "CSV"  parser object  and a
2304         reference to the record.   The reference can be a reference to a
2305         HASH  or a reference to an ARRAY as determined by the arguments.
2306
2307         This callback can also be passed as  an attribute without the
2308         "callbacks" wrapper.
2309
2310       before_out
2311         This callback is invoked for each record before the record is
2312         printed.  The hook is invoked with two arguments:  the current "CSV"
2313         parser object and a reference to the record.   The reference can be a
2314         reference to a  HASH or a reference to an ARRAY as determined by the
2315         arguments.
2316
2317         This callback can also be passed as an attribute  without the
2318         "callbacks" wrapper.
2319
2320         This callback makes the row available in %_ if the row is a hashref.
2321         In this case %_ is writable and will change the original row.
2322
2323       on_in
2324         This callback acts exactly as the "after_in" or the "before_out"
2325         hooks.
2326
2327         This callback can also be passed as an attribute  without the
2328         "callbacks" wrapper.
2329
2330         This callback makes the row available in %_ if the row is a hashref.
2331         In this case %_ is writable and will change the original row. So e.g.
2332         with
2333
2334           my $aoh = csv (
2335               in      => \"foo\n1\n2\n",
2336               headers => "auto",
2337               on_in   => sub { $_{bar} = 2; },
2338               );
2339
2340         $aoh will be:
2341
2342           [ { foo => 1,
2343               bar => 2,
2344               },
2345             { foo => 2,
2346               bar => 2,
2347               }
2348             ]
2349
2350       csv
2351         The function  "csv" can also be called as a method or with an
2352         existing Text::CSV object. This could help if the function is to be
2353         invoked a lot of times and the overhead of creating the object
2354         internally over  and  over again would be prevented by passing an
2355         existing instance.
2356
2357          my $csv = Text::CSV->new ({ binary => 1, auto_diag => 1 });
2358
2359          my $aoa = $csv->csv (in => $fh);
2360          my $aoa = csv (in => $fh, csv => $csv);
2361
2362         both act the same. Running this 20000 times on a 20-line CSV file
2363         showed a 53% speedup.
2364

DIAGNOSTICS

2366       This section is also taken from Text::CSV_XS.
2367
2368       Still under construction ...
2369
2370       If an error occurs,  "$csv->error_diag" can be used to get information
2371       on the cause of the failure. Note that for speed reasons the internal
2372       value is never cleared on success,  so using the value returned by
2373       "error_diag" in normal cases - when no error occurred - may cause
2374       unexpected results.
2375
2376       If the constructor failed, the cause can be found using "error_diag" as
2377       a class method, like "Text::CSV->error_diag".
2378
2379       The "$csv->error_diag" method is automatically invoked upon error when
2380       the constructor was called with  "auto_diag"  set to  1 or 2, or when
2381       autodie is in effect.  When set to 1, this will cause a "warn" with the
2382       error message,  when set to 2, it will "die". "2012 - EOF" is excluded
2383       from "auto_diag" reports.
2384
2385       Errors can be (individually) caught using the "error" callback.
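
           Both ways of reading the diagnostics can be sketched as follows.
           The inputs are deliberately broken and made up for illustration:
           an unterminated quoted field for the parser, and a "sep_char" equal
           to the default "quote_char" for the constructor:

```perl
use strict;
use warnings;
use Text::CSV;

# Parse failure: ask the object, in list context
my $csv = Text::CSV->new ({ binary => 1 }) or die Text::CSV->error_diag;
my ($code, $msg, $pos) = (0, "", 0);
unless ($csv->parse (q{1,"unterminated})) {
    ($code, $msg, $pos) = $csv->error_diag;
    print "parse failed: $code $msg (position $pos)\n";
    }

# Constructor failure: there is no object, so ask the class
my $bad     = Text::CSV->new ({ sep_char => '"' });
my $new_err = $bad ? "" : "" . Text::CSV->error_diag;
print "new failed: $new_err\n" unless $bad;
```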
2386
2387       The errors as described below are available. I have tried to make the
2388       error itself explanatory enough, but more descriptions will be added.
2389       For most of these errors, the first three capitals describe the error
2390       category:
2391
2392       • INI
2393
2394         Initialization error or option conflict.
2395
2396       • ECR
2397
2398         Carriage-Return related parse error.
2399
2400       • EOF
2401
2402         End-Of-File related parse error.
2403
2404       • EIQ
2405
2406         Parse error inside quotation.
2407
2408       • EIF
2409
2410         Parse error inside field.
2411
2412       • ECB
2413
2414         Combine error.
2415
2416       • EHR
2417
2418         HashRef parse related error.
2419
2420       And below should be the complete list of error codes that can be
2421       returned:
2422
2423       • 1001 "INI - sep_char is equal to quote_char or escape_char"
2424
2425         The  separation character  cannot be equal to  the quotation
2426         character or to the escape character,  as this would invalidate all
2427         parsing rules.
2428
2429       • 1002 "INI - allow_whitespace with escape_char or quote_char SP or
2430         TAB"
2431
2432         Using the  "allow_whitespace"  attribute  when either "quote_char" or
2433         "escape_char"  is equal to "SPACE" or "TAB" is too ambiguous to
2434         allow.
2435
2436       • 1003 "INI - \r or \n in main attr not allowed"
2437
2438         Using default "eol" characters in either "sep_char", "quote_char",
2439         or  "escape_char"  is  not allowed.
2440
2441       • 1004 "INI - callbacks should be undef or a hashref"
2442
2443         The "callbacks" attribute only accepts "undef" or a hash
2444         reference.
2445
2446       • 1005 "INI - EOL too long"
2447
2448         The value passed for EOL exceeds its maximum length (16).
2449
2450       • 1006 "INI - SEP too long"
2451
2452         The value passed for SEP exceeds its maximum length (16).
2453
2454       • 1007 "INI - QUOTE too long"
2455
2456         The value passed for QUOTE exceeds its maximum length (16).
2457
2458       • 1008 "INI - SEP undefined"
2459
2460         The value passed for SEP should be defined and not empty.
2461
2462       • 1010 "INI - the header is empty"
2463
2464         The header line parsed by the "header" method is empty.
2465
2466       • 1011 "INI - the header contains more than one valid separator"
2467
2468         The header line parsed by the "header" method contains more than one
2469         (unique) separator character out of the allowed set of separators.
2470
2471       • 1012 "INI - the header contains an empty field"
2472
2473         The header line parsed by the "header" method contains an empty field.
2474
2475       • 1013 "INI - the header contains nun-unique fields"
2476
2477         The header line parsed by the "header" method contains at least two
2478         identical fields.
2479
2480       • 1014 "INI - header called on undefined stream"
2481
2482         The header line cannot be parsed from an undefined source.
2483
2484       • 1500 "PRM - Invalid/unsupported argument(s)"
2485
2486         Function or method called with invalid argument(s) or parameter(s).
2487
2488       • 1501 "PRM - The key attribute is passed as an unsupported type"
2489
2490         The "key" attribute is of an unsupported type.
2491
2492       • 1502 "PRM - The value attribute is passed without the key attribute"
2493
2494         The "value" attribute is only allowed when a valid key is given.
2495
2496       • 1503 "PRM - The value attribute is passed as an unsupported type"
2497
2498         The "value" attribute is of an unsupported type.
2499
2500       • 2010 "ECR - QUO char inside quotes followed by CR not part of EOL"
2501
2502         When "eol" has been set to anything but the default, like
2503         "\r\t\n", and the "\r" follows the second (closing)
2504         "quote_char", where the characters following the "\r" do not make up
2505         the "eol" sequence, this is an error.
2506
2507       • 2011 "ECR - Characters after end of quoted field"
2508
2509         Sequences like "1,foo,"bar"baz,22,1" are not allowed. "bar" is a
2510         quoted field and after the closing double-quote, there should be
2511         either a new-line sequence or a separation character.
2512
2513       • 2012 "EOF - End of data in parsing input stream"
2514
2515         Self-explanatory. End-of-file was reached while parsing a stream.
2516         This can happen only when reading from streams with "getline", as
2517         "parse" works on strings that are not required to have a trailing
2518         "eol".
2519
2520       • 2013 "INI - Specification error for fragments RFC7111"
2521
2522         Invalid specification for URI "fragment" specification.
2523
2524       • 2014 "ENF - Inconsistent number of fields"
2525
2526         Inconsistent number of fields under strict parsing.
2527
2528       • 2021 "EIQ - NL char inside quotes, binary off"
2529
2530         Sequences like "1,"foo\nbar",22,1" are allowed only when the binary
2531         option has been selected with the constructor.
2532
2533       • 2022 "EIQ - CR char inside quotes, binary off"
2534
2535         Sequences like "1,"foo\rbar",22,1" are allowed only when the binary
2536         option has been selected with the constructor.
2537
2538       • 2023 "EIQ - QUO character not allowed"
2539
2540         Sequences like ""foo "bar" baz",qu" and "2023,",2008-04-05,"Foo,
2541         Bar",\n" will cause this error.
2542
2543       • 2024 "EIQ - EOF cannot be escaped, not even inside quotes"
2544
2545         The escape character is not allowed as last character in an input
2546         stream.
2547
2548       • 2025 "EIQ - Loose unescaped escape"
2549
2550         An escape character should escape only characters that need escaping.
2551
2552         Allowing  the escape  for other characters  is possible  with the
2553         attribute "allow_loose_escapes".
2554
2555       • 2026 "EIQ - Binary character inside quoted field, binary off"
2556
2557         Binary characters are not allowed by default. The exception is
2558         fields that contain valid UTF-8, which are upgraded
2559         automatically. Set "binary" to 1 to accept binary
2560         data.
2561
2562       • 2027 "EIQ - Quoted field not terminated"
2563
2564         When parsing a field that started with a quotation character,  the
2565         field is expected to be closed with a quotation character.   When the
2566         parsed line is exhausted before the quote is found, that field is not
2567         terminated.
2568
2569       • 2030 "EIF - NL char inside unquoted verbatim, binary off"
2570
2571       • 2031 "EIF - CR char is first char of field, not part of EOL"
2572
2573       • 2032 "EIF - CR char inside unquoted, not part of EOL"
2574
2575       • 2034 "EIF - Loose unescaped quote"
2576
2577       • 2035 "EIF - Escaped EOF in unquoted field"
2578
2579       • 2036 "EIF - ESC error"
2580
2581       • 2037 "EIF - Binary character in unquoted field, binary off"
2582
2583       • 2110 "ECB - Binary character in Combine, binary off"
2584
2585       • 2200 "EIO - print to IO failed. See errno"
2586
2587       • 3001 "EHR - Unsupported syntax for column_names ()"
2588
2589       • 3002 "EHR - getline_hr () called before column_names ()"
2590
2591       • 3003 "EHR - bind_columns () and column_names () fields count
2592         mismatch"
2593
2594       • 3004 "EHR - bind_columns () only accepts refs to scalars"
2595
2596       • 3006 "EHR - bind_columns () did not pass enough refs for parsed
2597         fields"
2598
2599       • 3007 "EHR - bind_columns needs refs to writable scalars"
2600
2601       • 3008 "EHR - unexpected error in bound fields"
2602
2603       • 3009 "EHR - print_hr () called before column_names ()"
2604
2605       • 3010 "EHR - print_hr () called with invalid arguments"
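
       As one worked case from the list above, the following sketch shows the
       effect of the "binary" option on a quoted field that contains a
       newline. The input line and variable names are illustrative:

```perl
use strict;
use warnings;
use Text::CSV;

# A quoted field with an embedded newline
my $line = qq{1,"foo\nbar",22,1};

# Without binary, the embedded newline is rejected (code 2021)
my $strict    = Text::CSV->new ();
my $strict_ok = $strict->parse ($line);
# Only ask for diagnostics when the parse actually failed
my ($strict_code) = $strict_ok ? (0) : $strict->error_diag;

# With binary => 1, the same line parses into four fields
my $binary    = Text::CSV->new ({ binary => 1 });
my $binary_ok = $binary->parse ($line);
my @fields    = $binary_ok ? $binary->fields : ();

print "binary off: error $strict_code\n" unless $strict_ok;
print "binary on:  ", scalar @fields, " fields\n" if $binary_ok;
```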
2606

AUTHORS and MAINTAINERS

2611       Alan Citterman <alan[at]mfgrtl.com> wrote the original Perl module.
2612       Please don't send mail concerning Text::CSV to Alan, as he's not a
2613       present maintainer.
2614
2615       Jochen Wiedmann <joe[at]ispsoft.de> rewrote the encoding and decoding
2616       in C by implementing a simple finite-state machine and added the
2617       variable quote, escape and separator characters, the binary mode and
2618       the print and getline methods. See ChangeLog releases 0.10 through
2619       0.23.
2620
2621       H.Merijn Brand <h.m.brand[at]xs4all.nl> cleaned up the code, added the
2622       field flags methods, wrote the major part of the test suite, completed
2623       the documentation, and fixed some RT bugs. See ChangeLog releases 0.25
2624       and on.
2625
2626       Makamaka Hannyaharamitu <makamaka[at]cpan.org> wrote Text::CSV_PP,
2627       which is the pure-Perl version of Text::CSV_XS.
2628
2629       The new Text::CSV (since 0.99) is maintained by Makamaka, and by
2630       Kenichi Ishigaki since 1.91.
2631

COPYRIGHT AND LICENSE

2633       Text::CSV
2634
2635       Copyright (C) 1997 Alan Citterman. All rights reserved.  Copyright (C)
2636       2007-2015 Makamaka Hannyaharamitu.  Copyright (C) 2017- Kenichi
2637       Ishigaki. A large portion of the doc is taken from Text::CSV_XS. See
2638       below.
2639
2640       Text::CSV_PP:
2641
2642       Copyright (C) 2005-2015 Makamaka Hannyaharamitu.  Copyright (C) 2017-
2643       Kenichi Ishigaki. A large portion of the code/doc is also taken from
2644       Text::CSV_XS. See below.
2645
2646       Text::CSV_XS:
2647
2648       Copyright (C) 2007-2016 H.Merijn Brand for PROCURA B.V.  Copyright (C)
2649       1998-2001 Jochen Wiedmann. All rights reserved.  Portions Copyright (C)
2650       1997 Alan Citterman. All rights reserved.
2651
2652       This library is free software; you can redistribute it and/or modify it
2653       under the same terms as Perl itself.
2654
2655
2656
2657perl v5.34.0                      2022-01-21                      Text::CSV(3)