Text::CSV(3pm)

Text::CSV(3)          User Contributed Perl Documentation         Text::CSV(3)

NAME

6       Text::CSV - comma-separated values manipulator (using XS or PurePerl)
7

SYNOPSIS

       This section is taken from Text::CSV_XS.

        # Functional interface
        use Text::CSV qw( csv );

        # Read whole file in memory
        my $aoa = csv (in => "data.csv");    # as array of array
        my $aoh = csv (in => "data.csv",
                       headers => "auto");   # as array of hash

        # Write array of arrays as csv file
        csv (in => $aoa, out => "file.csv", sep_char=> ";");

        # Only show lines where "code" is odd
        csv (in => "data.csv", filter => { code => sub { $_ % 2 }});

        # Object interface
        use Text::CSV;

        my @rows;
        # Read/parse CSV
        my $csv = Text::CSV->new ({ binary => 1, auto_diag => 1 });
        open my $fh, "<:encoding(utf8)", "test.csv" or die "test.csv: $!";
        while (my $row = $csv->getline ($fh)) {
            $row->[2] =~ m/pattern/ or next; # 3rd field should match
            push @rows, $row;
            }
        close $fh;

        # and write as CSV
        open $fh, ">:encoding(utf8)", "new.csv" or die "new.csv: $!";
        $csv->say ($fh, $_) for @rows;
        close $fh or die "new.csv: $!";

DESCRIPTION

44       Text::CSV is a thin wrapper for Text::CSV_XS-compatible modules now.
45       All the backend modules provide facilities for the composition and
46       decomposition of comma-separated values. Text::CSV uses Text::CSV_XS by
47       default, and when Text::CSV_XS is not available, falls back on
48       Text::CSV_PP, which is bundled in the same distribution as this module.
49

CHOOSING BACKEND

       This module respects an environment variable called "PERL_TEXT_CSV"
       when it decides which backend module to use. If this environment
       variable is not set, it tries to load Text::CSV_XS, and if Text::CSV_XS
       is not available, falls back on Text::CSV_PP.

       If you never want it to fall back on Text::CSV_PP, set the variable
       like this ("export" may be "setenv", "set", and the like, depending on
       your environment):
59
60         > export PERL_TEXT_CSV=Text::CSV_XS
61
62       If you prefer Text::CSV_XS to Text::CSV_PP (default), then:
63
64         > export PERL_TEXT_CSV=Text::CSV_XS,Text::CSV_PP
65
66       You may also want to set this variable at the top of your test files,
67       in order not to be bothered with incompatibilities between backends
68       (you need to wrap this in "BEGIN", and set before actually "use"-ing
69       Text::CSV module, as it decides its backend as soon as it's loaded):
70
71         BEGIN { $ENV{PERL_TEXT_CSV}='Text::CSV_PP'; }
72         use Text::CSV;
73
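       To confirm which backend was actually picked, a minimal sketch along
       these lines may help. It assumes the "backend" and "is_pp" class
       methods, which are documented elsewhere in the Text::CSV manual and
       not in this section, and it forces Text::CSV_PP only because that
       module ships with the distribution:

```perl
# Force the pure-Perl backend before Text::CSV is loaded.
BEGIN { $ENV{PERL_TEXT_CSV} = "Text::CSV_PP" }

use strict;
use warnings;
use Text::CSV;

# backend () reports the name of the module that was loaded.
my $backend = Text::CSV->backend;
my $is_pp   = Text::CSV->is_pp ? 1 : 0;

print "backend: $backend (pure Perl: $is_pp)\n";
```

       Running the same check without the "BEGIN" block shows what the
       default selection is on a given machine.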

NOTES

75       This section is also taken from Text::CSV_XS.
76
77   Embedded newlines
78       Important Note:  The default behavior is to accept only ASCII
79       characters in the range from 0x20 (space) to 0x7E (tilde).   This means
80       that the fields can not contain newlines. If your data contains
81       newlines embedded in fields, or characters above 0x7E (tilde), or
82       binary data, you must set "binary => 1" in the call to "new". To cover
83       the widest range of parsing options, you will always want to set
84       binary.
85
       But you still have the problem that you have to pass a correct line to
       the "parse" method, which is more complicated than the usual way of
       reading input suggests:

        my $csv = Text::CSV->new ({ binary => 1, eol => $/ });
        while (<>) {           #  WRONG!
            $csv->parse ($_);
            my @fields = $csv->fields ();
            }

       This will break, as the "while" might read broken lines: it does not
       care about the quoting. If you need to support embedded newlines, the
       way to go is to not pass "eol" in the parser (it accepts "\n", "\r",
       and "\r\n" by default) and then

        my $csv = Text::CSV->new ({ binary => 1 });
        open my $fh, "<", $file or die "$file: $!";
        while (my $row = $csv->getline ($fh)) {
            my @fields = @$row;
            }

       The old(er) way of using global file handles is still supported:

        while (my $row = $csv->getline (*ARGV)) { ... }
110
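       The difference can be tried out without any file on disk. The sketch
       below is only an illustration: the sample data and the in-memory
       file handle are assumptions made for the example, and the second
       record contains a quoted field that spans two physical lines.

```perl
use strict;
use warnings;
use Text::CSV;

# Hypothetical sample data; record 2 has an embedded newline.
my $data = qq{id,note\n1,"line one\nline two"\n2,plain\n};
open my $fh, "<", \$data or die "in-memory handle: $!";

# No "eol" is passed, so the parser decides where a record ends.
my $csv = Text::CSV->new ({ binary => 1, auto_diag => 1 });
my @rows;
while (my $row = $csv->getline ($fh)) {
    push @rows, $row;
    }
close $fh;

printf "%d records read\n", scalar @rows;
```

       Reading the same data with "while (<$fh>)" would yield four physical
       lines, two of which are not valid records on their own.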
111   Unicode
112       Unicode is only tested to work with perl-5.8.2 and up.
113
114       See also "BOM".
115
116       The simplest way to ensure the correct encoding is used for  in- and
117       output is by either setting layers on the filehandles, or setting the
118       "encoding" argument for "csv".
119
120        open my $fh, "<:encoding(UTF-8)", "in.csv"  or die "in.csv: $!";
121       or
122        my $aoa = csv (in => "in.csv",     encoding => "UTF-8");
123
124        open my $fh, ">:encoding(UTF-8)", "out.csv" or die "out.csv: $!";
125       or
126        csv (in => $aoa, out => "out.csv", encoding => "UTF-8");
127
       On parsing (both for "getline" and "parse"), if the source is marked
       as being UTF8, then all fields that are marked binary will also be
       marked UTF8.

       On combining ("print" and "combine"): if any of the combining fields
       was marked UTF8, the resulting string will be marked as UTF8. Note
       however that any fields before the first field marked UTF8 that
       contain 8-bit characters and were not upgraded to UTF8 will be
       "bytes" in the resulting string too, possibly causing unexpected
       errors. If you pass data of different encodings, or you don't know
       whether the encodings differ, force the data to be upgraded before
       you pass it on:
140
141        $csv->print ($fh, [ map { utf8::upgrade (my $x = $_); $x } @data ]);
142
143       For complete control over encoding, please use Text::CSV::Encoded:
144
145        use Text::CSV::Encoded;
146        my $csv = Text::CSV::Encoded->new ({
147            encoding_in  => "iso-8859-1", # the encoding comes into   Perl
148            encoding_out => "cp1252",     # the encoding comes out of Perl
149            });
150
151        $csv = Text::CSV::Encoded->new ({ encoding  => "utf8" });
152        # combine () and print () accept *literally* utf8 encoded data
153        # parse () and getline () return *literally* utf8 encoded data
154
155        $csv = Text::CSV::Encoded->new ({ encoding  => undef }); # default
156        # combine () and print () accept UTF8 marked data
157        # parse () and getline () return UTF8 marked data
158
159   BOM
160       BOM  (or Byte Order Mark)  handling is available only inside the
161       "header" method.   This method supports the following encodings:
162       "utf-8", "utf-1", "utf-32be", "utf-32le", "utf-16be", "utf-16le",
163       "utf-ebcdic", "scsu", "bocu-1", and "gb-18030". See Wikipedia
164       <https://en.wikipedia.org/wiki/Byte_order_mark>.
165
166       If a file has a BOM, the easiest way to deal with that is
167
168        my $aoh = csv (in => $file, detect_bom => 1);
169
170       All records will be encoded based on the detected BOM.
171
172       This implies a call to the  "header"  method,  which defaults to also
173       set the "column_names". So this is not the same as
174
175        my $aoh = csv (in => $file, headers => "auto");
176
177       which only reads the first record to set  "column_names"  but ignores
178       any meaning of possible present BOM.
179

METHODS

181       This section is also taken from Text::CSV_XS.
182
183   version
184       (Class method) Returns the current module version.
185
186   new
187       (Class method) Returns a new instance of class Text::CSV. The
188       attributes are described by the (optional) hash ref "\%attr".
189
190        my $csv = Text::CSV->new ({ attributes ... });
191
192       The following attributes are available:
193
194       eol
195
196        my $csv = Text::CSV->new ({ eol => $/ });
197                  $csv->eol (undef);
198        my $eol = $csv->eol;
199
200       The end-of-line string to add to rows for "print" or the record
201       separator for "getline".
202
203       When not passed in a parser instance,  the default behavior is to
204       accept "\n", "\r", and "\r\n", so it is probably safer to not specify
205       "eol" at all. Passing "undef" or the empty string behave the same.
206
207       When not passed in a generating instance,  records are not terminated
208       at all, so it is probably wise to pass something you expect. A safe
209       choice for "eol" on output is either $/ or "\r\n".
210
211       Common values for "eol" are "\012" ("\n" or Line Feed),  "\015\012"
212       ("\r\n" or Carriage Return, Line Feed),  and "\015"  ("\r" or Carriage
213       Return). The "eol" attribute cannot exceed 7 (ASCII) characters.
214
       If both $/ and "eol" equal "\015", lines that end in only a Carriage
       Return without Line Feed will be parsed correctly.
217
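       The effect of "eol" on generated records can be seen with "combine"
       and "string". This is a small sketch with made-up field values, not
       a recommendation for a particular line ending:

```perl
use strict;
use warnings;
use Text::CSV;

my $no_eol   = Text::CSV->new ();
my $with_eol = Text::CSV->new ({ eol => "\r\n" });

$no_eol->combine   (1, "two", 3) or die "combine failed";
$with_eol->combine (1, "two", 3) or die "combine failed";

my $bare = $no_eol->string;      # record without any terminator
my $term = $with_eol->string;    # record terminated with CR LF

printf "without eol: %d bytes, with eol: %d bytes\n",
    length $bare, length $term;
```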
218       sep_char
219
220        my $csv = Text::CSV->new ({ sep_char => ";" });
221                $csv->sep_char (";");
222        my $c = $csv->sep_char;
223
       The char used to separate fields, by default a comma (","). Limited
       to a single-byte character, usually in the range from 0x20 (space) to
       0x7E (tilde). When longer sequences are required, use "sep".
227
228       The separation character can not be equal to the quote character  or to
229       the escape character.
230
231       sep
232
233        my $csv = Text::CSV->new ({ sep => "\N{FULLWIDTH COMMA}" });
234                  $csv->sep (";");
235        my $sep = $csv->sep;
236
237       The chars used to separate fields, by default undefined. Limited to 8
238       bytes.
239
240       When set, overrules "sep_char".  If its length is one byte it acts as
241       an alias to "sep_char".
242
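       As an illustration of a non-default separator, the following sketch
       parses a semicolon-separated line. The input string is invented for
       the example; note that the comma is then ordinary field content.

```perl
use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new ({ sep_char => ";" });

# With ";" as separator, the comma inside "b,c" is plain data.
$csv->parse ("a;b,c;d") or die "" . $csv->error_diag;
my @fields = $csv->fields;

print join ("|", @fields), "\n";
```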
243       quote_char
244
245        my $csv = Text::CSV->new ({ quote_char => "'" });
246                $csv->quote_char (undef);
247        my $c = $csv->quote_char;
248
       The character to quote fields containing blanks or binary data, by
       default the double quote character ("). A value of undef suppresses
       quote chars (for simple cases only). Limited to a single-byte
       character, usually in the range from 0x20 (space) to 0x7E (tilde).
       When longer sequences are required, use "quote".
254
255       "quote_char" can not be equal to "sep_char".
256
257       quote
258
259        my $csv = Text::CSV->new ({ quote => "\N{FULLWIDTH QUOTATION MARK}" });
260                    $csv->quote ("'");
261        my $quote = $csv->quote;
262
263       The chars used to quote fields, by default undefined. Limited to 8
264       bytes.
265
266       When set, overrules "quote_char". If its length is one byte it acts as
267       an alias to "quote_char".
268
269       This method does not support "undef".  Use "quote_char" to disable
270       quotation.
271
272       escape_char
273
274        my $csv = Text::CSV->new ({ escape_char => "\\" });
275                $csv->escape_char (":");
276        my $c = $csv->escape_char;
277
278       The character to  escape  certain characters inside quoted fields.
279       This is limited to a  single-byte  character,  usually  in the  range
280       from  0x20 (space) to 0x7E (tilde).
281
       The "escape_char" defaults to being the double-quote mark ("). In
       other words the same as the default "quote_char". This means that
       doubling the quote mark in a field escapes it:
285
286        "foo","bar","Escape ""quote mark"" with two ""quote marks""","baz"
287
       If you change the "quote_char" without changing the "escape_char",
       the "escape_char" will still be the double-quote ("). If instead you
       want to escape the "quote_char" by doubling it, you will need to also
       change the "escape_char" to be the same as what you have changed the
       "quote_char" to.
293
294       Setting "escape_char" to "undef" or "" will completely disable escapes
295       and is greatly discouraged. This will also disable "escape_null".
296
297       The escape character can not be equal to the separation character.
298
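       The interplay between "quote_char" and "escape_char" described above
       can be sketched as follows. The single quote is used for both
       attributes here, so that doubling it escapes it; the sample strings
       are assumptions made for the example.

```perl
use strict;
use warnings;
use Text::CSV;

# Change both attributes together so a doubled ' escapes a single '.
my $csv = Text::CSV->new ({ quote_char => "'", escape_char => "'" });

$csv->parse ("'it''s',plain") or die "" . $csv->error_diag;
my @fields = $csv->fields;

# The round trip quotes and escapes the field again on output.
$csv->combine ("it's", "plain") or die "combine failed";
my $line = $csv->string;

print "parsed: @fields\n";
print "generated: $line\n";
```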
299       binary
300
301        my $csv = Text::CSV->new ({ binary => 1 });
302                $csv->binary (0);
303        my $f = $csv->binary;
304
305       If this attribute is 1,  you may use binary characters in quoted
306       fields, including line feeds, carriage returns and "NULL" bytes. (The
307       latter could be escaped as ""0".) By default this feature is off.
308
309       If a string is marked UTF8,  "binary" will be turned on automatically
310       when binary characters other than "CR" and "NL" are encountered.   Note
311       that a simple string like "\x{00a0}" might still be binary, but not
312       marked UTF8, so setting "{ binary => 1 }" is still a wise option.
313
314       strict
315
316        my $csv = Text::CSV->new ({ strict => 1 });
317                $csv->strict (0);
318        my $f = $csv->strict;
319
320       If this attribute is set to 1, any row that parses to a different
321       number of fields than the previous row will cause the parser to throw
322       error 2014.
323
324       skip_empty_rows
325
326        my $csv = Text::CSV->new ({ skip_empty_rows => 1 });
327                $csv->skip_empty_rows (0);
328        my $f = $csv->skip_empty_rows;
329
330       If this attribute is set to 1,  any row that has an  "eol" immediately
331       following the start of line will be skipped.  Default behavior is to
332       return one single empty field.
333
334       This attribute is only used in parsing.
335
336       formula_handling
337
338       Alias for "formula"
339
340       formula
341
342        my $csv = Text::CSV->new ({ formula => "none" });
343                $csv->formula ("none");
344        my $f = $csv->formula;
345
346       This defines the behavior of fields containing formulas. As formulas
347       are considered dangerous in spreadsheets, this attribute can define an
348       optional action to be taken if a field starts with an equal sign ("=").
349
350       For purpose of code-readability, this can also be written as
351
352        my $csv = Text::CSV->new ({ formula_handling => "none" });
353                $csv->formula_handling ("none");
354        my $f = $csv->formula_handling;
355
356       Possible values for this attribute are
357
358       none
359         Take no specific action. This is the default.
360
361          $csv->formula ("none");
362
363       die
364         Cause the process to "die" whenever a leading "=" is encountered.
365
366          $csv->formula ("die");
367
368       croak
369         Cause the process to "croak" whenever a leading "=" is encountered.
370         (See Carp)
371
372          $csv->formula ("croak");
373
374       diag
375         Report position and content of the field whenever a leading  "=" is
376         found.  The value of the field is unchanged.
377
378          $csv->formula ("diag");
379
380       empty
381         Replace the content of fields that start with a "=" with the empty
382         string.
383
384          $csv->formula ("empty");
385          $csv->formula ("");
386
387       undef
388         Replace the content of fields that start with a "=" with "undef".
389
390          $csv->formula ("undef");
391          $csv->formula (undef);
392
393       a callback
         Modify the content of fields that start with a "=" with the return
         value of the callback. The original content of the field is
         available inside the callback as $_.

          # Replace all formulas with 42
399          $csv->formula (sub { 42; });
400
401          # same as $csv->formula ("empty") but slower
402          $csv->formula (sub { "" });
403
404          # Allow =4+12
405          $csv->formula (sub { s/^=(\d+\+\d+)$/$1/eer });
406
407          # Allow more complex calculations
408          $csv->formula (sub { eval { s{^=([-+*/0-9()]+)$}{$1}ee }; $_ });
409
410       All other values will give a warning and then fallback to "diag".
411
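       As a short illustration of one of the actions listed above, this
       sketch uses "empty" while parsing. The input line is made up, and
       only the field with a leading "=" is expected to change.

```perl
use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new ({ formula => "empty" });

# The second field starts with "=" and is replaced by the empty string.
$csv->parse ("name,=SUM(A1:A9),42") or die "" . $csv->error_diag;
my @fields = $csv->fields;

printf "field %d: [%s]\n", $_, $fields[$_] for 0 .. $#fields;
```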
412       decode_utf8
413
414        my $csv = Text::CSV->new ({ decode_utf8 => 1 });
415                $csv->decode_utf8 (0);
416        my $f = $csv->decode_utf8;
417
       This attribute defaults to TRUE.
419
420       While parsing,  fields that are valid UTF-8, are automatically set to
421       be UTF-8, so that
422
423         $csv->parse ("\xC4\xA8\n");
424
425       results in
426
427         PV("\304\250"\0) [UTF8 "\x{128}"]
428
429       Sometimes it might not be a desired action.  To prevent those upgrades,
430       set this attribute to false, and the result will be
431
432         PV("\304\250"\0)
433
434       auto_diag
435
436        my $csv = Text::CSV->new ({ auto_diag => 1 });
437                $csv->auto_diag (2);
438        my $l = $csv->auto_diag;
439
       Setting this attribute to a number between 1 and 9 causes "error_diag"
       to be automatically called in void context upon errors.
442
443       In case of error "2012 - EOF", this call will be void.
444
445       If "auto_diag" is set to a numeric value greater than 1, it will "die"
446       on errors instead of "warn".  If set to anything unrecognized,  it will
447       be silently ignored.
448
       Future extensions to this feature will include more reliable
       auto-detection of "autodie" being active in the scope in which the
       error occurred, which will increment the value of "auto_diag" by 1
       the moment the error is detected.
453
454       diag_verbose
455
456        my $csv = Text::CSV->new ({ diag_verbose => 1 });
457                $csv->diag_verbose (2);
458        my $l = $csv->diag_verbose;
459
460       Set the verbosity of the output triggered by "auto_diag".   Currently
461       only adds the current  input-record-number  (if known)  to the
462       diagnostic output with an indication of the position of the error.
463
464       blank_is_undef
465
466        my $csv = Text::CSV->new ({ blank_is_undef => 1 });
467                $csv->blank_is_undef (0);
468        my $f = $csv->blank_is_undef;
469
470       Under normal circumstances, "CSV" data makes no distinction between
471       quoted- and unquoted empty fields.  These both end up in an empty
472       string field once read, thus
473
474        1,"",," ",2
475
476       is read as
477
478        ("1", "", "", " ", "2")
479
480       When writing  "CSV" files with either  "always_quote" or  "quote_empty"
481       set, the unquoted  empty field is the result of an undefined value.
482       To enable this distinction when  reading "CSV"  data,  the
483       "blank_is_undef"  attribute will cause  unquoted empty fields to be set
484       to "undef", causing the above to be parsed as
485
486        ("1", "", undef, " ", "2")
487
488       Note that this is specifically important when loading  "CSV" fields
489       into a database that allows "NULL" values,  as the perl equivalent for
490       "NULL" is "undef" in DBI land.
491
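       The distinction can be checked by parsing the sample line from above
       twice, once with and once without the attribute. This is a sketch;
       the variable names are chosen for the example only.

```perl
use strict;
use warnings;
use Text::CSV;

my $line = q{1,"",," ",2};

my $plain = Text::CSV->new ();
my $blank = Text::CSV->new ({ blank_is_undef => 1 });

$plain->parse ($line) or die "" . $plain->error_diag;
$blank->parse ($line) or die "" . $blank->error_diag;

my @p = $plain->fields;    # third field is an empty string
my @b = $blank->fields;    # third field is undef, second is still ""

for my $set (\@p, \@b) {
    print join (",", map { defined $_ ? "'$_'" : "undef" } @$set), "\n";
    }
```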
492       empty_is_undef
493
494        my $csv = Text::CSV->new ({ empty_is_undef => 1 });
495                $csv->empty_is_undef (0);
496        my $f = $csv->empty_is_undef;
497
498       Going one  step  further  than  "blank_is_undef",  this attribute
499       converts all empty fields to "undef", so
500
501        1,"",," ",2
502
503       is read as
504
505        (1, undef, undef, " ", 2)
506
507       Note that this affects only fields that are  originally  empty,  not
508       fields that are empty after stripping allowed whitespace. YMMV.
509
510       allow_whitespace
511
512        my $csv = Text::CSV->new ({ allow_whitespace => 1 });
513                $csv->allow_whitespace (0);
514        my $f = $csv->allow_whitespace;
515
516       When this option is set to true,  the whitespace  ("TAB"'s and
517       "SPACE"'s) surrounding  the  separation character  is removed when
518       parsing.  If either "TAB" or "SPACE" is one of the three characters
519       "sep_char", "quote_char", or "escape_char" it will not be considered
520       whitespace.
521
522       Now lines like:
523
524        1 , "foo" , bar , 3 , zapp
525
526       are parsed as valid "CSV", even though it violates the "CSV" specs.
527
528       Note that  all  whitespace is stripped from both  start and  end of
529       each field.  That would make it  more than a feature to enable parsing
530       bad "CSV" lines, as
531
532        1,   2.0,  3,   ape  , monkey
533
534       will now be parsed as
535
536        ("1", "2.0", "3", "ape", "monkey")
537
538       even if the original line was perfectly acceptable "CSV".
539
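       The first sample line above can be run through a parser with the
       attribute enabled, as in this sketch:

```perl
use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new ({ allow_whitespace => 1 });

# Whitespace around separators and quotes is dropped while parsing.
$csv->parse (q{1 , "foo" , bar , 3 , zapp}) or die "" . $csv->error_diag;
my @fields = $csv->fields;

print join ("|", @fields), "\n";
```

       Without "allow_whitespace" the same line is not accepted, because of
       the blanks around the quoted field.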
540       allow_loose_quotes
541
542        my $csv = Text::CSV->new ({ allow_loose_quotes => 1 });
543                $csv->allow_loose_quotes (0);
544        my $f = $csv->allow_loose_quotes;
545
546       By default, parsing unquoted fields containing "quote_char" characters
547       like
548
549        1,foo "bar" baz,42
550
551       would result in parse error 2034.  Though it is still bad practice to
552       allow this format,  we  cannot  help  the  fact  that  some  vendors
553       make  their applications spit out lines styled this way.
554
555       If there is really bad "CSV" data, like
556
557        1,"foo "bar" baz",42
558
559       or
560
561        1,""foo bar baz"",42
562
563       there is a way to get this data-line parsed and leave the quotes inside
564       the quoted field as-is.  This can be achieved by setting
565       "allow_loose_quotes" AND making sure that the "escape_char" is  not
566       equal to "quote_char".
567
568       allow_loose_escapes
569
570        my $csv = Text::CSV->new ({ allow_loose_escapes => 1 });
571                $csv->allow_loose_escapes (0);
572        my $f = $csv->allow_loose_escapes;
573
574       Parsing fields  that  have  "escape_char"  characters that escape
575       characters that do not need to be escaped, like:
576
577        my $csv = Text::CSV->new ({ escape_char => "\\" });
578        $csv->parse (qq{1,"my bar\'s",baz,42});
579
580       would result in parse error 2025.   Though it is bad practice to allow
581       this format,  this attribute enables you to treat all escape character
582       sequences equal.
583
584       allow_unquoted_escape
585
586        my $csv = Text::CSV->new ({ allow_unquoted_escape => 1 });
587                $csv->allow_unquoted_escape (0);
588        my $f = $csv->allow_unquoted_escape;
589
       A backward compatibility issue where "escape_char" differs from
       "quote_char" prevents "escape_char" from being in the first position
       of a field. If "quote_char" is equal to the default double quote (")
       and "escape_char" is set to "\", this would be illegal:
594
595        1,\0,2
596
597       Setting this attribute to 1  might help to overcome issues with
598       backward compatibility and allow this style.
599
600       always_quote
601
602        my $csv = Text::CSV->new ({ always_quote => 1 });
603                $csv->always_quote (0);
604        my $f = $csv->always_quote;
605
606       By default the generated fields are quoted only if they need to be.
607       For example, if they contain the separator character. If you set this
608       attribute to 1 then all defined fields will be quoted. ("undef" fields
609       are not quoted, see "blank_is_undef"). This makes it quite often easier
610       to handle exported data in external applications.
611
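       A brief sketch of the resulting output, using invented field values
       that include an undefined and an empty field:

```perl
use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new ({ always_quote => 1 });

# Defined fields are quoted, the undef field is left bare.
$csv->combine (1, "a b", undef, "") or die "combine failed";
my $line = $csv->string;

print "$line\n";
```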
612       quote_space
613
614        my $csv = Text::CSV->new ({ quote_space => 1 });
615                $csv->quote_space (0);
616        my $f = $csv->quote_space;
617
       By default, a space in a field triggers quotation. As no rule in "CSV"
       requires this, nor the opposite, the default is true for safety. You
       can exclude the space from this trigger by setting this attribute to 0.
622
623       quote_empty
624
625        my $csv = Text::CSV->new ({ quote_empty => 1 });
626                $csv->quote_empty (0);
627        my $f = $csv->quote_empty;
628
629       By default the generated fields are quoted only if they need to be.
630       An empty (defined) field does not need quotation. If you set this
631       attribute to 1 then empty defined fields will be quoted.  ("undef"
632       fields are not quoted, see "blank_is_undef"). See also "always_quote".
633
634       quote_binary
635
636        my $csv = Text::CSV->new ({ quote_binary => 1 });
637                $csv->quote_binary (0);
638        my $f = $csv->quote_binary;
639
640       By default,  all "unsafe" bytes inside a string cause the combined
641       field to be quoted.  By setting this attribute to 0, you can disable
642       that trigger for bytes ">= 0x7F".
643
644       escape_null
645
646        my $csv = Text::CSV->new ({ escape_null => 1 });
647                $csv->escape_null (0);
648        my $f = $csv->escape_null;
649
650       By default, a "NULL" byte in a field would be escaped. This option
651       enables you to treat the  "NULL"  byte as a simple binary character in
652       binary mode (the "{ binary => 1 }" is set).  The default is true.  You
653       can prevent "NULL" escapes by setting this attribute to 0.
654
655       When the "escape_char" attribute is set to undefined,  this attribute
656       will be set to false.
657
658       The default setting will encode "=\x00=" as
659
660        "="0="
661
       With "escape_null" set to 0, this will result in
663
664        "=\x00="
665
666       The default when using the "csv" function is "false".
667
668       For backward compatibility reasons,  the deprecated old name
669       "quote_null" is still recognized.
670
671       keep_meta_info
672
673        my $csv = Text::CSV->new ({ keep_meta_info => 1 });
674                $csv->keep_meta_info (0);
675        my $f = $csv->keep_meta_info;
676
677       By default, the parsing of input records is as simple and fast as
678       possible.  However,  some parsing information - like quotation of the
679       original field - is lost in that process.  Setting this flag to true
680       enables retrieving that information after parsing with  the methods
681       "meta_info",  "is_quoted", and "is_binary" described below.  Default is
682       false for performance.
683
       If you set this attribute to a value greater than 9, then you can
       control output quotation style like it was used in the input of the
       last parsed record (unless quotation was added because of other
       reasons).
688
        my $csv = Text::CSV->new ({
           binary         => 1,
           keep_meta_info => 1,
           quote_space    => 0,
           });

        # parse () returns a status; the fields come from fields ()
        $csv->parse (q{1,,"", ," ",f,"g","h""h",help,"help"})
            or die "".$csv->error_diag ();
        my @row = $csv->fields;

        $csv->print (*STDOUT, \@row);
        # 1,,, , ,f,g,"h""h",help,help
        $csv->keep_meta_info (11);
        $csv->print (*STDOUT, \@row);
        # 1,,"", ," ",f,"g","h""h",help,"help"
702
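       The meta information itself can be queried per field. The sketch
       below uses the "is_quoted" method mentioned above on a made-up
       record and collects one flag per column.

```perl
use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new ({ keep_meta_info => 1 });

$csv->parse (q{1,"two",3}) or die "" . $csv->error_diag;

# is_quoted () takes the zero-based column index of the last parsed record.
my @quoted = map { $csv->is_quoted ($_) ? 1 : 0 } 0 .. 2;

print "quoted flags: @quoted\n";
```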
703       undef_str
704
705        my $csv = Text::CSV->new ({ undef_str => "\\N" });
706                $csv->undef_str (undef);
707        my $s = $csv->undef_str;
708
709       This attribute optionally defines the output of undefined fields. The
710       value passed is not changed at all, so if it needs quotation, the
711       quotation needs to be included in the value of the attribute.  Use with
712       caution, as passing a value like  ",",,,,"""  will for sure mess up
713       your output. The default for this attribute is "undef", meaning no
714       special treatment.
715
716       This attribute is useful when exporting  CSV data  to be imported in
717       custom loaders, like for MySQL, that recognize special sequences for
718       "NULL" data.
719
720       This attribute has no meaning when parsing CSV data.
721
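       For example, a loader that expects "\N" for "NULL" could be fed like
       this. The field values are invented and the sketch only shows the
       generated record:

```perl
use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new ({ undef_str => "\\N" });

# The undefined field is written as the two characters \ and N.
$csv->combine (1, undef, "x") or die "combine failed";
my $line = $csv->string;

print "$line\n";
```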
722       comment_str
723
724        my $csv = Text::CSV->new ({ comment_str => "#" });
725                $csv->comment_str (undef);
726        my $s = $csv->comment_str;
727
728       This attribute optionally defines a string to be recognized as comment.
729       If this attribute is defined,   all lines starting with this sequence
730       will not be parsed as CSV but skipped as comment.
731
732       This attribute has no meaning when generating CSV.
733
734       Comment strings that start with any of the special characters/sequences
735       are not supported (so it cannot start with any of "sep_char",
736       "quote_char", "escape_char", "sep", "quote", or "eol").
737
738       For convenience, "comment" is an alias for "comment_str".
739
740       verbatim
741
742        my $csv = Text::CSV->new ({ verbatim => 1 });
743                $csv->verbatim (0);
744        my $f = $csv->verbatim;
745
746       This is a quite controversial attribute to set,  but makes some hard
747       things possible.
748
749       The rationale behind this attribute is to tell the parser that the
750       normally special characters newline ("NL") and Carriage Return ("CR")
751       will not be special when this flag is set,  and be dealt with  as being
752       ordinary binary characters. This will ease working with data with
753       embedded newlines.
754
755       When  "verbatim"  is used with  "getline",  "getline"  auto-"chomp"'s
756       every line.
757
758       Imagine a file format like
759
760        M^^Hans^Janssen^Klas 2\n2A^Ja^11-06-2007#\r\n
761
       where the line ending is a very specific "#\r\n", and the sep_char is
       a "^" (caret). None of the fields is quoted, but embedded binary
       data is likely to be present. With the specific line ending, this
       should not be too hard to detect.

       By default, Text::CSV's "parse" function only knows "\n" and "\r" as
       legal line endings, and so has to deal with the embedded newline as a
       real "end-of-line", so it can scan the next line if binary is true
       and the newline is inside a quoted field. With this option, we tell
       "parse" to parse the line as if "\n" is nothing more than a binary
       character.

       For "parse" this means that the parser no longer has any notion of
       line endings, and "getline" "chomp"s line endings on reading.
776
777       types
778
779       A set of column types; the attribute is immediately passed to the
780       "types" method.
781
782       callbacks
783
784       See the "Callbacks" section below.
785
786       accessors
787
788       To sum it up,
789
790        $csv = Text::CSV->new ();
791
792       is equivalent to
793
794        $csv = Text::CSV->new ({
795            eol                   => undef, # \r, \n, or \r\n
796            sep_char              => ',',
797            sep                   => undef,
798            quote_char            => '"',
799            quote                 => undef,
800            escape_char           => '"',
801            binary                => 0,
802            decode_utf8           => 1,
803            auto_diag             => 0,
804            diag_verbose          => 0,
805            blank_is_undef        => 0,
806            empty_is_undef        => 0,
807            allow_whitespace      => 0,
808            allow_loose_quotes    => 0,
809            allow_loose_escapes   => 0,
810            allow_unquoted_escape => 0,
811            always_quote          => 0,
812            quote_empty           => 0,
813            quote_space           => 1,
814            escape_null           => 1,
815            quote_binary          => 1,
816            keep_meta_info        => 0,
817            strict                => 0,
818            skip_empty_rows       => 0,
819            formula               => 0,
820            verbatim              => 0,
821            undef_str             => undef,
822            comment_str           => undef,
823            types                 => undef,
824            callbacks             => undef,
825            });
826
       For all of the above mentioned flags, an accessor method is available
       where you can inquire the current value, or change the value:
829
830        my $quote = $csv->quote_char;
831        $csv->binary (1);
832
833       It is not wise to change these settings halfway through writing "CSV"
834       data to a stream. If however you want to create a new stream using the
835       available "CSV" object, there is no harm in changing them.
836
837       If the "new" constructor call fails,  it returns "undef",  and makes
838       the fail reason available through the "error_diag" method.
839
840        $csv = Text::CSV->new ({ ecs_char => 1 }) or
841            die "".Text::CSV->error_diag ();
842
843       "error_diag" will return a string like
844
845        "INI - Unknown attribute 'ecs_char'"
846
847   known_attributes
848        @attr = Text::CSV->known_attributes;
849        @attr = Text::CSV::known_attributes;
850        @attr = $csv->known_attributes;
851
852       This method will return an ordered list of all the supported
853       attributes as described above.   This can be useful for knowing what
854       attributes are valid in classes that use or extend Text::CSV.
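
       As a minimal sketch of that use, the list can act as a whitelist
       before options reach "new". The %wanted hash and its "ecs_char" typo
       are hypothetical stand-ins for externally supplied options:

```perl
use strict;
use warnings;
use Text::CSV;

# Lookup table of every attribute the loaded backend supports.
my %known = map { $_ => 1 } Text::CSV->known_attributes;

# %wanted stands in for options coming from a config file or the
# command line (hypothetical input; "ecs_char" is a deliberate typo).
my %wanted = (sep_char => ";", ecs_char => 1);

# Drop what the backend does not know, and report it.
my @unknown = grep { !$known{$_} } sort keys %wanted;
delete @wanted{@unknown};
warn "Ignoring unknown attribute(s): @unknown\n" if @unknown;

my $csv = Text::CSV->new (\%wanted)
    or die "" . Text::CSV->error_diag ();
```

       Filtering first avoids the "INI - Unknown attribute" failure shown
       above; whether to warn or die on unknown keys is a design choice
       left to the caller.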
855
856   print
857        $status = $csv->print ($fh, $colref);
858
       Similar to "combine" + "string" + "print", but much more efficient.
       It expects an array ref as input (not an array!) and the resulting
       string is not really created, but immediately written to the $fh
       object, typically an IO handle or any other object that offers a
       "print" method.
864
865       For performance reasons  "print"  does not create a result string,  so
866       all "string", "status", "fields", and "error_input" methods will return
867       undefined information after executing this method.
868
869       If $colref is "undef"  (explicit,  not through a variable argument) and
870       "bind_columns"  was used to specify fields to be printed,  it is
871       possible to make performance improvements, as otherwise data would have
872       to be copied as arguments to the method call:
873
874        $csv->bind_columns (\($foo, $bar));
875        $status = $csv->print ($fh, undef);
876
877       A short benchmark
878
879        my @data = ("aa" .. "zz");
880        $csv->bind_columns (\(@data));
881
882        $csv->print ($fh, [ @data ]);   # 11800 recs/sec
883        $csv->print ($fh,  \@data  );   # 57600 recs/sec
884        $csv->print ($fh,   undef  );   # 48500 recs/sec
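
       A small, self-contained sketch of "print" writing to an in-memory
       handle, so the output can be inspected. The row data and the choice
       of "\n" for "eol" are assumptions made for illustration:

```perl
use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new ({ binary => 1, eol => "\n", auto_diag => 1 });

# Write to an in-memory handle instead of a file.
open my $fh, ">", \my $buffer or die "in-memory handle: $!";

my @rows = (
    [ "code", "name",       "price" ],
    [ 1,      "plain",      "1.50"  ],
    [ 2,      "with space", "2.75"  ],
    );

# print () takes an array ref per record, not a list.
for my $row (@rows) {
    $csv->print ($fh, $row) or die "" . $csv->error_diag ();
    }
close $fh or die "close: $!";

print $buffer;
```

       With the default "quote_space" setting, the field containing a space
       is quoted on output while the other fields are written bare.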
885
886   say
887        $status = $csv->say ($fh, $colref);
888
889       Like "print", but "eol" defaults to "$\".
890
891   print_hr
892        $csv->print_hr ($fh, $ref);
893
894       Provides an easy way  to print a  $ref  (as fetched with "getline_hr")
895       provided the column names are set with "column_names".
896
897       It is just a wrapper method with basic parameter checks over
898
899        $csv->print ($fh, [ map { $ref->{$_} } $csv->column_names ]);
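
       For illustration, a hedged sketch of "print_hr" with an in-memory
       handle. The column names and the record are made up for the example:

```perl
use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new ({ binary => 1, eol => "\n", auto_diag => 1 });

# The column order given here decides the order of the output fields.
$csv->column_names (qw( code name price ));

open my $fh, ">", \my $buffer or die "in-memory handle: $!";

# Hash key order is irrelevant; only column_names matters.
my $rec = { name => "widget", price => "9.95", code => 7 };
$csv->print_hr ($fh, $rec);
close $fh or die "close: $!";

print $buffer;
```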
900
901   combine
902        $status = $csv->combine (@fields);
903
904       This method constructs a "CSV" record from  @fields,  returning success
905       or failure.   Failure can result from lack of arguments or an argument
906       that contains an invalid character.   Upon success,  "string" can be
907       called to retrieve the resultant "CSV" string.  Upon failure,  the
908       value returned by "string" is undefined and "error_input" could be
909       called to retrieve the invalid argument.
910
911   string
912        $line = $csv->string ();
913
914       This method returns the input to  "parse"  or the resultant "CSV"
915       string of "combine", whichever was called more recently.
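
       The interplay of "combine", "string", "parse", and "fields" can be
       sketched as a round trip. The field values are arbitrary examples
       chosen to trigger quoting and escaping:

```perl
use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new ({ binary => 1 });

my @fields = ("id", "has,comma", 'has "quotes"');

# combine () builds the record; string () retrieves it.
$csv->combine (@fields)
    or die "combine failed on: " . $csv->error_input ();
my $line = $csv->string ();

# parse () takes the record apart again; fields () retrieves the parts.
$csv->parse ($line)
    or die "parse failed on: " . $csv->error_input ();
my @back = $csv->fields ();

print "$line\n";
print "round trip ", ("@back" eq "@fields" ? "ok" : "differs"), "\n";
```

       No "eol" is set here, so the combined string carries no line ending.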
916
917   getline
918        $colref = $csv->getline ($fh);
919
920       This is the counterpart to  "print",  as "parse"  is the counterpart to
921       "combine":  it parses a row from the $fh  handle using the "getline"
922       method associated with $fh  and parses this row into an array ref.
923       This array ref is returned by the function or "undef" for failure.
924       When $fh does not support "getline", you are likely to hit errors.
925
926       When fields are bound with "bind_columns" the return value is a
927       reference to an empty list.
928
929       The "string", "fields", and "status" methods are meaningless again.
930
931   getline_all
932        $arrayref = $csv->getline_all ($fh);
933        $arrayref = $csv->getline_all ($fh, $offset);
934        $arrayref = $csv->getline_all ($fh, $offset, $length);
935
936       This will return a reference to a list of getline ($fh) results.  In
937       this call, "keep_meta_info" is disabled.  If $offset is negative, as
938       with "splice", only the last  "abs ($offset)" records of $fh are taken
939       into consideration.
940
941       Given a CSV file with 10 lines:
942
943        lines call
944        ----- ---------------------------------------------------------
945        0..9  $csv->getline_all ($fh)         # all
946        0..9  $csv->getline_all ($fh,  0)     # all
947        8..9  $csv->getline_all ($fh,  8)     # start at 8
948        -     $csv->getline_all ($fh,  0,  0) # start at 0 first 0 rows
949        0..4  $csv->getline_all ($fh,  0,  5) # start at 0 first 5 rows
950        4..5  $csv->getline_all ($fh,  4,  2) # start at 4 first 2 rows
951        8..9  $csv->getline_all ($fh, -2)     # last 2 rows
952        6..7  $csv->getline_all ($fh, -4,  2) # first 2 of last  4 rows
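
       The table can be reproduced with a sketch like the following. The
       ten-line data set and the $slice helper are hypothetical; the helper
       reopens the data for every call because each call consumes the
       handle:

```perl
use strict;
use warnings;
use Text::CSV;

# Ten records; the second field holds the zero-based line number.
my $data = join "", map { "row$_,$_\n" } 0 .. 9;

my $csv = Text::CSV->new ({ binary => 1, auto_diag => 1 });

# Hypothetical helper: fetch a slice and return the line numbers found.
my $slice = sub {
    my @args = @_;
    open my $fh, "<", \$data or die "in-memory handle: $!";
    my $rows = $csv->getline_all ($fh, @args);
    close $fh;
    return join ",", map { $_->[1] } @$rows;
    };

my $all  = $slice->();        # no offset, no length
my $mid  = $slice->(4, 2);    # start at 4, first 2 rows
my $last = $slice->(-2);      # last 2 rows
my $win  = $slice->(-4, 2);   # first 2 of the last 4 rows

print "all=$all mid=$mid last=$last win=$win\n";
```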
953
954   getline_hr
955       The "getline_hr" and "column_names" methods work together  to allow you
956       to have rows returned as hashrefs.  You must call "column_names" first
957       to declare your column names.
958
959        $csv->column_names (qw( code name price description ));
960        $hr = $csv->getline_hr ($fh);
961        print "Price for $hr->{name} is $hr->{price} EUR\n";
962
963       "getline_hr" will croak if called before "column_names".
964
       Note that "getline_hr" creates a hashref for every row and will be
       much slower than the combined use of "bind_columns" and "getline",
       which still offers the same easy-to-use hashref inside the loop:
968
969        my @cols = @{$csv->getline ($fh)};
970        $csv->column_names (@cols);
971        while (my $row = $csv->getline_hr ($fh)) {
972            print $row->{price};
973            }
974
       This could easily be rewritten to the much faster:
976
977        my @cols = @{$csv->getline ($fh)};
978        my $row = {};
979        $csv->bind_columns (\@{$row}{@cols});
980        while ($csv->getline ($fh)) {
981            print $row->{price};
982            }
983
984       Your mileage may vary for the size of the data and the number of rows.
985       With perl-5.14.2 the comparison for a 100_000 line file with 14
986       columns:
987
988                   Rate hashrefs getlines
989        hashrefs 1.00/s       --     -76%
990        getlines 4.15/s     313%       --
991
992   getline_hr_all
993        $arrayref = $csv->getline_hr_all ($fh);
994        $arrayref = $csv->getline_hr_all ($fh, $offset);
995        $arrayref = $csv->getline_hr_all ($fh, $offset, $length);
996
997       This will return a reference to a list of   getline_hr ($fh) results.
998       In this call, "keep_meta_info" is disabled.
999
1000   parse
1001        $status = $csv->parse ($line);
1002
       This method decomposes a "CSV" string into fields, returning success
       or failure. Failure can result from a missing argument or from an
       improperly formatted "CSV" string. Upon success, "fields" can be
       called to retrieve the decomposed fields. Upon failure, calling
       "fields" will return undefined data and "error_input" can be called
       to retrieve the invalid argument.
1009
       You may use the "types" method for setting column types. See the
       description of "types" below.
1012
1013       The $line argument is supposed to be a simple scalar. Everything else
1014       is supposed to croak and set error 1500.
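
       A hedged sketch of checking the return value of "parse" for each
       line and falling back to "error_input" on failure. The input lines
       are invented; the last one has an unterminated quoted field:

```perl
use strict;
use warnings;
use Text::CSV;

# auto_diag is left off so failures are handled by hand.
my $csv = Text::CSV->new ({ binary => 1 });

my @lines = (
    'a,b,c',
    '1,"two, with comma",3',
    'x,"unterminated',
    );

my (@good, @bad);
for my $line (@lines) {
    if ($csv->parse ($line)) {
        push @good, [ $csv->fields () ];
        }
    else {
        # error_input () returns the argument that could not be parsed.
        push @bad, $csv->error_input ();
        }
    }

printf "%d parsed, %d rejected\n", scalar @good, scalar @bad;
```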
1015
1016   fragment
1017       This function tries to implement RFC7111  (URI Fragment Identifiers for
1018       the text/csv Media Type) -
1019       https://datatracker.ietf.org/doc/html/rfc7111
1020
1021        my $AoA = $csv->fragment ($fh, $spec);
1022
1023       In specifications,  "*" is used to specify the last item, a dash ("-")
1024       to indicate a range.   All indices are 1-based:  the first row or
1025       column has index 1. Selections can be combined with the semi-colon
1026       (";").
1027
1028       When using this method in combination with  "column_names",  the
1029       returned reference  will point to a  list of hashes  instead of a  list
1030       of lists.  A disjointed  cell-based combined selection  might return
1031       rows with different number of columns making the use of hashes
1032       unpredictable.
1033
1034        $csv->column_names ("Name", "Age");
1035        my $AoH = $csv->fragment ($fh, "col=3;8");
1036
1037       If the "after_parse" callback is active,  it is also called on every
1038       line parsed and skipped before the fragment.
1039
1040       row
1041          row=4
1042          row=5-7
1043          row=6-*
1044          row=1-2;4;6-*
1045
1046       col
1047          col=2
1048          col=1-3
1049          col=4-*
1050          col=1-2;4;7-*
1051
1052       cell
1053         In cell-based selection, the comma (",") is used to pair row and
1054         column
1055
1056          cell=4,1
1057
1058         The range operator ("-") using "cell"s can be used to define top-left
1059         and bottom-right "cell" location
1060
1061          cell=3,1-4,6
1062
1063         The "*" is only allowed in the second part of a pair
1064
1065          cell=3,2-*,2    # row 3 till end, only column 2
1066          cell=3,2-3,*    # column 2 till end, only row 3
1067          cell=3,2-*,*    # strip row 1 and 2, and column 1
1068
1069         Cells and cell ranges may be combined with ";", possibly resulting in
1070         rows with different numbers of columns
1071
1072          cell=1,1-2,2;3,3-4,4;1,4;4,1
1073
1074         Disjointed selections will only return selected cells.   The cells
1075         that are not  specified  will  not  be  included  in the  returned
1076         set,  not even as "undef".  As an example given a "CSV" like
1077
1078          11,12,13,...19
1079          21,22,...28,29
1080          :            :
1081          91,...97,98,99
1082
1083         with "cell=1,1-2,2;3,3-4,4;1,4;4,1" will return:
1084
1085          11,12,14
1086          21,22
1087          33,34
1088          41,43,44
1089
         Overlapping cell-specs will return those cells only once, so
         "cell=1,1-3,3;2,2-4,4;2,3;4,2" will return:
1092
1093          11,12,13
1094          21,22,23,24
1095          31,32,33,34
1096          42,43,44
1097
1098       RFC7111 <https://datatracker.ietf.org/doc/html/rfc7111> does  not
1099       allow different types of specs to be combined   (either "row" or "col"
1100       or "cell").  Passing an invalid fragment specification will croak and
1101       set error 2013.
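
       The selections above can be tried out with a sketch along these
       lines. The 4x4 data set and the $select helper are assumptions made
       for the example; the helper reopens the data per spec because
       "fragment" reads from the handle:

```perl
use strict;
use warnings;
use Text::CSV;

my $data = <<"EOC";
11,12,13,14
21,22,23,24
31,32,33,34
41,42,43,44
EOC

my $csv = Text::CSV->new ({ binary => 1, auto_diag => 1 });

# Hypothetical helper: apply one spec and flatten the result for display.
my $select = sub {
    my ($spec) = @_;
    open my $fh, "<", \$data or die "in-memory handle: $!";
    my $aoa = $csv->fragment ($fh, $spec);
    close $fh;
    return join " | ", map { join ",", @$_ } @$aoa;
    };

my $rows  = $select->("row=2-3");
my $cols  = $select->("col=1;3-*");
my $cells = $select->("cell=1,1-2,2;3,3-4,4;1,4;4,1");

print "rows : $rows\n";
print "cols : $cols\n";
print "cells: $cells\n";
```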
1102
1103   column_names
1104       Set the "keys" that will be used in the  "getline_hr"  calls.  If no
1105       keys (column names) are passed, it will return the current setting as a
1106       list.
1107
1108       "column_names" accepts a list of scalars  (the column names)  or a
1109       single array_ref, so you can pass the return value from "getline" too:
1110
1111        $csv->column_names ($csv->getline ($fh));
1112
1113       "column_names" does no checking on duplicates at all, which might lead
1114       to unexpected results.   Undefined entries will be replaced with the
1115       string "\cAUNDEF\cA", so
1116
1117        $csv->column_names (undef, "", "name", "name");
1118        $hr = $csv->getline_hr ($fh);
1119
1120       will set "$hr->{"\cAUNDEF\cA"}" to the 1st field,  "$hr->{""}" to the
1121       2nd field, and "$hr->{name}" to the 4th field,  discarding the 3rd
1122       field.
1123
1124       "column_names" croaks on invalid arguments.
1125
1126   header
1127       This method does NOT work in perl-5.6.x
1128
1129       Parse the CSV header and set "sep", column_names and encoding.
1130
1131        my @hdr = $csv->header ($fh);
1132        $csv->header ($fh, { sep_set => [ ";", ",", "|", "\t" ] });
1133        $csv->header ($fh, { detect_bom => 1, munge_column_names => "lc" });
1134
1135       The first argument should be a file handle.
1136
       This method resets some object properties, as it is supposed to be
       invoked only once per file or stream. It will leave attributes
       "column_names" and "bound_columns" alone if setting column names is
       disabled. Reading headers on previously processed objects might fail
       on perl-5.8.0 and older.
1142
1143       Assuming that the file opened for parsing has a header, and the header
1144       does not contain problematic characters like embedded newlines,   read
1145       the first line from the open handle then auto-detect whether the header
1146       separates the column names with a character from the allowed separator
1147       list.
1148
1149       If any of the allowed separators matches,  and none of the other
1150       allowed separators match,  set  "sep"  to that  separator  for the
1151       current CSV instance and use it to parse the first line, map those to
1152       lowercase, and use that to set the instance "column_names":
1153
        my $csv = Text::CSV->new ({ binary => 1, auto_diag => 1 });
        open my $fh, "<", "file.csv" or die "file.csv: $!";
        binmode $fh; # for Windows
        $csv->header ($fh);
        while (my $row = $csv->getline_hr ($fh)) {
            ...
            }
1161
1162       If the header is empty,  contains more than one unique separator out of
1163       the allowed set,  contains empty fields,   or contains identical fields
1164       (after folding), it will croak with error 1010, 1011, 1012, or 1013
1165       respectively.
1166
1167       If the header contains embedded newlines or is not valid  CSV  in any
1168       other way, this method will croak and leave the parse error untouched.
1169
       A successful call to "header" will always set the "sep" of the $csv
       object. This behavior cannot be disabled.
1172
1173       return value
1174
1175       On error this method will croak.
1176
1177       In list context,  the headers will be returned whether they are used to
1178       set "column_names" or not.
1179
1180       In scalar context, the instance itself is returned.  Note: the values
1181       as found in the header will effectively be  lost if  "set_column_names"
1182       is false.
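
       As an illustration of separator detection and the list-context
       return value, here is a sketch that reads a semicolon-separated
       header from an in-memory handle. The data is invented, and reading
       from a scalar reference is an assumption made to keep the example
       self-contained:

```perl
use strict;
use warnings;
use Text::CSV;

my $data = "Code;Name;Unit Price\n1;widget;9.95\n2;gadget;19.50\n";
open my $fh, "<", \$data or die "in-memory handle: $!";

my $csv = Text::CSV->new ({ binary => 1, auto_diag => 1 });

# List context: the (by default lower-cased) header fields are returned,
# and the detected separator is used for the rest of the stream.
my @hdr = $csv->header ($fh);

my @rows;
while (my $row = $csv->getline_hr ($fh)) {
    push @rows, $row;
    }
close $fh;

print "columns: ", join (", ", @hdr), "\n";
print "$_->{name} costs $_->{'unit price'}\n" for @rows;
```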
1183
1184       Options
1185
1186       sep_set
1187          $csv->header ($fh, { sep_set => [ ";", ",", "|", "\t" ] });
1188
1189         The list of legal separators defaults to "[ ";", "," ]" and can be
1190         changed by this option.  As this is probably the most often used
1191         option,  it can be passed on its own as an unnamed argument:
1192
1193          $csv->header ($fh, [ ";", ",", "|", "\t", "::", "\x{2063}" ]);
1194
1195         Multi-byte  sequences are allowed,  both multi-character and
1196         Unicode.  See "sep".
1197
1198       detect_bom
1199          $csv->header ($fh, { detect_bom => 1 });
1200
1201         The default behavior is to detect if the header line starts with a
1202         BOM.  If the header has a BOM, use that to set the encoding of $fh.
1203         This default behavior can be disabled by passing a false value to
1204         "detect_bom".
1205
1206         Supported encodings from BOM are: UTF-8, UTF-16BE, UTF-16LE,
1207         UTF-32BE,  and UTF-32LE. BOM also supports UTF-1, UTF-EBCDIC, SCSU,
1208         BOCU-1,  and GB-18030 but Encode does not (yet). UTF-7 is not
1209         supported.
1210
1211         If a supported BOM was detected as start of the stream, it is stored
1212         in the object attribute "ENCODING".
1213
1214          my $enc = $csv->{ENCODING};
1215
1216         The encoding is used with "binmode" on $fh.
1217
1218         If the handle was opened in a (correct) encoding,  this method will
1219         not alter the encoding, as it checks the leading bytes of the first
1220         line. In case the stream starts with a decoded BOM ("U+FEFF"),
1221         "{ENCODING}" will be "" (empty) instead of the default "undef".
1222
1223       munge_column_names
1224         This option offers the means to modify the column names into
1225         something that is most useful to the application.   The default is to
1226         map all column names to lower case.
1227
1228          $csv->header ($fh, { munge_column_names => "lc" });
1229
1230         The following values are available:
1231
1232           lc     - lower case
1233           uc     - upper case
1234           db     - valid DB field names
1235           none   - do not change
1236           \%hash - supply a mapping
1237           \&cb   - supply a callback
1238
1239         Lower case
1240            $csv->header ($fh, { munge_column_names => "lc" });
1241
1242           The header is changed to all lower-case
1243
1244            $_ = lc;
1245
1246         Upper case
1247            $csv->header ($fh, { munge_column_names => "uc" });
1248
1249           The header is changed to all upper-case
1250
1251            $_ = uc;
1252
1253         Literal
1254            $csv->header ($fh, { munge_column_names => "none" });
1255
1256         Hash
            $csv->header ($fh, { munge_column_names => { foo => "sombrero" }});

           If a value does not exist, the original value is used unchanged.
1260
1261         Database
1262            $csv->header ($fh, { munge_column_names => "db" });
1263
1264           - lower-case
1265
1266           - all sequences of non-word characters are replaced with an
1267             underscore
1268
1269           - all leading underscores are removed
1270
1271            $_ = lc (s/\W+/_/gr =~ s/^_+//r);
1272
1273         Callback
1274            $csv->header ($fh, { munge_column_names => sub { fc } });
1275            $csv->header ($fh, { munge_column_names => sub { "column_".$col++ } });
1276            $csv->header ($fh, { munge_column_names => sub { lc (s/\W+/_/gr) } });
1277
1278           As this callback is called in a "map", you can use $_ directly.
1279
1280       set_column_names
1281          $csv->header ($fh, { set_column_names => 1 });
1282
         The default is to set the instance's column names using
         "column_names" if the method is successful, so subsequent calls to
         "getline_hr" can return a hash. Setting the column names can be
         disabled by passing a false value for this option.
1287
1288         As described in "return value" above, content is lost in scalar
1289         context.
1290
1291       Validation
1292
1293       When receiving CSV files from external sources,  this method can be
1294       used to protect against changes in the layout by restricting to known
1295       headers  (and typos in the header fields).
1296
1297        my %known = (
1298            "record key" => "c_rec",
1299            "rec id"     => "c_rec",
1300            "id_rec"     => "c_rec",
1301            "kode"       => "code",
1302            "code"       => "code",
1303            "vaule"      => "value",
1304            "value"      => "value",
1305            );
1306        my $csv = Text::CSV->new ({ binary => 1, auto_diag => 1 });
1307        open my $fh, "<", $source or die "$source: $!";
1308        $csv->header ($fh, { munge_column_names => sub {
1309            s/\s+$//;
1310            s/^\s+//;
1311            $known{lc $_} or die "Unknown column '$_' in $source";
1312            }});
1313        while (my $row = $csv->getline_hr ($fh)) {
1314            say join "\t", $row->{c_rec}, $row->{code}, $row->{value};
1315            }
1316
1317   bind_columns
       Takes a list of scalar references to be used for output with "print"
       or to store the fields fetched by "getline" in. When you do not pass
       enough references to store the fetched fields in, "getline" will fail
       with error 3006. If you pass more than there are fields to return,
       the content of the remaining references is left untouched.
1323
1324        $csv->bind_columns (\$code, \$name, \$price, \$description);
1325        while ($csv->getline ($fh)) {
1326            print "The price of a $name is \x{20ac} $price\n";
1327            }
1328
1329       To reset or clear all column binding, call "bind_columns" with the
1330       single argument "undef". This will also clear column names.
1331
1332        $csv->bind_columns (undef);
1333
1334       If no arguments are passed at all, "bind_columns" will return the list
1335       of current bindings or "undef" if no binds are active.
1336
1337       Note that in parsing with  "bind_columns",  the fields are set on the
1338       fly.  That implies that if the third field of a row causes an error
1339       (or this row has just two fields where the previous row had more),  the
1340       first two fields already have been assigned the values of the current
1341       row, while the rest of the fields will still hold the values of the
1342       previous row.  If you want the parser to fail in these cases, use the
1343       "strict" attribute.
1344
1345   eof
1346        $eof = $csv->eof ();
1347
1348       If "parse" or  "getline"  was used with an IO stream,  this method will
1349       return true (1) if the last call hit end of file,  otherwise it will
1350       return false ('').  This is useful to see the difference between a
1351       failure and end of file.
1352
1353       Note that if the parsing of the last line caused an error,  "eof" is
1354       still true.  That means that if you are not using "auto_diag", an idiom
1355       like
1356
1357        while (my $row = $csv->getline ($fh)) {
1358            # ...
1359            }
1360        $csv->eof or $csv->error_diag;
1361
1362       will not report the error. You would have to change that to
1363
1364        while (my $row = $csv->getline ($fh)) {
1365            # ...
1366            }
1367        +$csv->error_diag and $csv->error_diag;
1368
1369   types
1370        $csv->types (\@tref);
1371
       This method is used to force (all) columns to be of a given type.
       For example, if you have an integer column, two columns with
       doubles, and a string column, then you might do:
1375
1376        $csv->types ([Text::CSV::IV (),
1377                      Text::CSV::NV (),
1378                      Text::CSV::NV (),
1379                      Text::CSV::PV ()]);
1380
1381       Column types are used only for decoding columns while parsing,  in
1382       other words by the "parse" and "getline" methods.
1383
1384       You can unset column types by doing a
1385
1386        $csv->types (undef);
1387
1388       or fetch the current type settings with
1389
1390        $types = $csv->types ();
1391
1392       IV
1393       CSV_TYPE_IV
1394           Set field type to integer.
1395
1396       NV
1397       CSV_TYPE_NV
1398           Set field type to numeric/float.
1399
1400       PV
1401       CSV_TYPE_PV
1402           Set field type to string.
1403
1404   fields
1405        @columns = $csv->fields ();
1406
1407       This method returns the input to   "combine"  or the resultant
1408       decomposed fields of a successful "parse", whichever was called more
1409       recently.
1410
1411       Note that the return value is undefined after using "getline", which
1412       does not fill the data structures returned by "parse".
1413
1414   meta_info
1415        @flags = $csv->meta_info ();
1416
1417       This method returns the "flags" of the input to "combine" or the flags
1418       of the resultant  decomposed fields of  "parse",   whichever was called
1419       more recently.
1420
       For each field, a meta_info field will hold flags that tell
       something about the field returned by the "fields" method or
       passed to the "combine" method. The flags are bit-wise-"or"'d like:
1424
1425       0x0001
1426       "CSV_FLAGS_IS_QUOTED"
1427         The field was quoted.
1428
1429       0x0002
1430       "CSV_FLAGS_IS_BINARY"
1431         The field was binary.
1432
1433       0x0004
1434       "CSV_FLAGS_ERROR_IN_FIELD"
1435         The field was invalid.
1436
1437         Currently only used when "allow_loose_quotes" is active.
1438
1439       0x0010
1440       "CSV_FLAGS_IS_MISSING"
1441         The field was missing.
1442
1443       See the "is_***" methods below.
1444
1445   is_quoted
1446        my $quoted = $csv->is_quoted ($column_idx);
1447
1448       where  $column_idx is the  (zero-based)  index of the column in the
1449       last result of "parse".
1450
1451       This returns a true value  if the data in the indicated column was
1452       enclosed in "quote_char" quotes.  This might be important for fields
1453       where content ",20070108," is to be treated as a numeric value,  and
1454       where ","20070108"," is explicitly marked as character string data.
1455
1456       This method is only valid when "keep_meta_info" is set to a true value.
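
       The numeric-versus-string distinction described above might be
       checked as in this sketch. The input line is taken from the example
       values in the text, with an empty trailing field added for contrast:

```perl
use strict;
use warnings;
use Text::CSV;

# keep_meta_info must be true for is_quoted () to be meaningful.
my $csv = Text::CSV->new ({ binary => 1, keep_meta_info => 1 });

$csv->parse (q{20070108,"20070108",})
    or die "" . $csv->error_diag ();

my @fields = $csv->fields ();
my @quoted = map { $csv->is_quoted ($_) ? 1 : 0 } 0 .. $#fields;

for my $i (0 .. $#fields) {
    printf "field %d: %-12s %s\n", $i, "'$fields[$i]'",
        $quoted[$i] ? "quoted (string data)" : "unquoted";
    }
```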
1457
1458   is_binary
1459        my $binary = $csv->is_binary ($column_idx);
1460
1461       where  $column_idx is the  (zero-based)  index of the column in the
1462       last result of "parse".
1463
1464       This returns a true value if the data in the indicated column contained
1465       any byte in the range "[\x00-\x08,\x10-\x1F,\x7F-\xFF]".
1466
1467       This method is only valid when "keep_meta_info" is set to a true value.
1468
1469   is_missing
1470        my $missing = $csv->is_missing ($column_idx);
1471
1472       where  $column_idx is the  (zero-based)  index of the column in the
1473       last result of "getline_hr".
1474
1475        $csv->keep_meta_info (1);
1476        while (my $hr = $csv->getline_hr ($fh)) {
1477            $csv->is_missing (0) and next; # This was an empty line
1478            }
1479
       When using "getline_hr", it is impossible to tell if the parsed
       fields are "undef" because they were not filled in the "CSV" stream
       or because they were not read at all, as all the fields defined by
       "column_names" are set in the hash-ref. If you still need to know if
       all fields in each row are provided, you should enable "keep_meta_info"
       so you can check the flags.
1486
1487       If  "keep_meta_info"  is "false",  "is_missing"  will always return
1488       "undef", regardless of $column_idx being valid or not. If this
1489       attribute is "true" it will return either 0 (the field is present) or 1
1490       (the field is missing).
1491
1492       A special case is the empty line.  If the line is completely empty -
1493       after dealing with the flags - this is still a valid CSV line:  it is a
1494       record of just one single empty field. However, if "keep_meta_info" is
1495       set, invoking "is_missing" with index 0 will now return true.
1496
1497   status
1498        $status = $csv->status ();
1499
1500       This method returns the status of the last invoked "combine" or "parse"
1501       call. Status is success (true: 1) or failure (false: "undef" or 0).
1502
1503       Note that as this only keeps track of the status of above mentioned
1504       methods, you are probably looking for "error_diag" instead.
1505
1506   error_input
1507        $bad_argument = $csv->error_input ();
1508
1509       This method returns the erroneous argument (if it exists) of "combine"
1510       or "parse",  whichever was called more recently.  If the last
1511       invocation was successful, "error_input" will return "undef".
1512
1513       Depending on the type of error, it might also hold the data for the
1514       last error-input of "getline".
1515
1516   error_diag
1517        Text::CSV->error_diag ();
1518        $csv->error_diag ();
1519        $error_code               = 0  + $csv->error_diag ();
1520        $error_str                = "" . $csv->error_diag ();
1521        ($cde, $str, $pos, $rec, $fld) = $csv->error_diag ();
1522
1523       If (and only if) an error occurred,  this function returns  the
1524       diagnostics of that error.
1525
1526       If called in void context,  this will print the internal error code and
1527       the associated error message to STDERR.
1528
       If called in list context, this will return the error code and the
       error message in that order. If the last error was from parsing, the
       rest of the values returned are a best guess at the location within
       the line that was being parsed. Their values are 1-based. The
       position currently is the index of the byte at which the parsing
       failed in the current record. It might change to be the index of the
       current character in a later release. The record is the index of the
       record parsed by the csv instance. The field number is the index of
       the field the parser thinks it is currently trying to parse. See
       examples/csv-check for how this can be used.
1539
1540       If called in  scalar context,  it will return  the diagnostics  in a
1541       single scalar, a-la $!.  It will contain the error code in numeric
1542       context, and the diagnostics message in string context.
1543
1544       When called as a class method or a  direct function call,  the
1545       diagnostics are that of the last "new" call.
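
       The three calling contexts can be compared side by side in a sketch
       like this one. The malformed input line is an invented example, and
       "auto_diag" is deliberately left off so nothing is reported
       automatically:

```perl
use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new ({ binary => 1 });

# An unterminated quoted field makes parse () fail.
my $ok = $csv->parse (q{1,"unterminated});

# List context: code, message, and a best-guess location.
my ($code, $msg, $pos, $rec, $fld) = $csv->error_diag ();

# Scalar context: one value that is numeric or string as needed.
my $as_number = 0  + $csv->error_diag ();
my $as_string = "" . $csv->error_diag ();

printf "error %d (%s) at position %s, record %s, field %s\n",
    $code, $msg, $pos, $rec, $fld unless $ok;
```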
1546
1547   record_number
1548        $recno = $csv->record_number ();
1549
       Returns the number of records parsed by this csv instance. This value
       should be more accurate than $. when embedded newlines come into
       play. Records written by this instance are not counted.
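
       To see the difference between records and physical lines, consider
       this sketch. The data is invented and contains one quoted field with
       an embedded newline; "input_line_number" from IO::Handle is used
       here as a per-handle way to read the line counter:

```perl
use strict;
use warnings;
use IO::Handle;
use Text::CSV;

my $data = qq{id,note\n1,"first line\nsecond line"\n2,plain\n};
open my $fh, "<", \$data or die "in-memory handle: $!";

my $csv = Text::CSV->new ({ binary => 1, auto_diag => 1 });

my @recno;
while (my $row = $csv->getline ($fh)) {
    # One record may span several physical lines.
    push @recno, $csv->record_number ();
    printf "record %d ended on physical line %d\n",
        $recno[-1], $fh->input_line_number ();
    }
close $fh;
```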
1553
1554   SetDiag
1555        $csv->SetDiag (0);
1556
1557       Use to reset the diagnostics if you are dealing with errors.
1558

ADDITIONAL METHODS

1560       backend
1561           Returns the backend module name called by Text::CSV.  "module" is
1562           an alias.
1563
       is_xs
           Returns a true value if Text::CSV uses an XS backend.

       is_pp
           Returns a true value if Text::CSV uses a pure-Perl backend.
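
       A short sketch of how these methods might be used to report which
       backend is active, for example in a bug report or a startup log.
       The methods are called on an object here; the wording of the output
       is arbitrary:

```perl
use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new ({ binary => 1 })
    or die "" . Text::CSV->error_diag ();

my $backend = $csv->backend;
my $kind    = $csv->is_xs ? "XS"
            : $csv->is_pp ? "pure-Perl"
            :               "other";

print "Text::CSV is using $backend ($kind)\n";
```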
1569

FUNCTIONS

1571       This section is also taken from Text::CSV_XS.
1572
1573   csv
1574       This function is not exported by default and should be explicitly
1575       requested:
1576
1577        use Text::CSV qw( csv );
1578
1579       This is a high-level function that aims at simple (user) interfaces.
1580       This can be used to read/parse a "CSV" file or stream (the default
1581       behavior) or to produce a file or write to a stream (define the  "out"
1582       attribute).  It returns an array- or hash-reference on parsing (or
1583       "undef" on fail) or the numeric value of  "error_diag"  on writing.
       When this function fails you can get to the error using the class call
       to "error_diag":
1586
1587        my $aoa = csv (in => "test.csv") or
1588            die Text::CSV->error_diag;
1589
1590       This function takes the arguments as key-value pairs. This can be
1591       passed as a list or as an anonymous hash:
1592
1593        my $aoa = csv (  in => "test.csv", sep_char => ";");
1594        my $aoh = csv ({ in => $fh, headers => "auto" });
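
       Both forms can be tried without touching the file system by passing
       a reference to a scalar as "in". The data below is invented for the
       example:

```perl
use strict;
use warnings;
use Text::CSV qw( csv );

my $data = "code;name\n1;foo\n2;bar\n";

# Arguments as a plain list
my $aoa = csv (in => \$data, sep_char => ";")
    or die Text::CSV->error_diag;

# The same kind of call with the arguments in an anonymous hash
my $aoh = csv ({ in => \$data, sep_char => ";", headers => "auto" })
    or die Text::CSV->error_diag;

print scalar @$aoa, " rows as arrays, ",
      scalar @$aoh, " rows as hashes\n";
```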
1595
1596       The arguments passed consist of two parts:  the arguments to "csv"
1597       itself and the optional attributes to the  "CSV"  object used inside
1598       the function as enumerated and explained in "new".
1599
1600       If not overridden, the default options used for CSV are
1601
1602        auto_diag   => 1
1603        escape_null => 0
1604
1605       The option that is always set and cannot be altered is
1606
1607        binary      => 1
1608
1609       As this function will likely be used in one-liners,  it allows  "quote"
1610       to be abbreviated as "quo",  and  "escape_char" to be abbreviated as
1611       "esc" or "escape".
1612
1613       Alternative invocations:
1614
1615        my $aoa = Text::CSV::csv (in => "file.csv");
1616
1617        my $csv = Text::CSV->new ();
1618        my $aoa = $csv->csv (in => "file.csv");
1619
1620       In the latter case, the object attributes are used from the existing
1621       object and the attribute arguments in the function call are ignored:
1622
1623        my $csv = Text::CSV->new ({ sep_char => ";" });
1624        my $aoh = $csv->csv (in => "file.csv", bom => 1);
1625
1626       will parse using ";" as "sep_char", not ",".
1627
1628       in
1629
1630       Used to specify the source.  "in" can be a file name (e.g. "file.csv"),
1631       which will be  opened for reading  and closed when finished,  a file
1632       handle (e.g.  $fh or "FH"),  a reference to a glob (e.g. "\*ARGV"),
1633       the glob itself (e.g. *STDIN), or a reference to a scalar (e.g.
1634       "\q{1,2,"csv"}").
1635
1636       When used with "out", "in" should be a reference to a CSV structure
1637       (AoA or AoH)  or a CODE-ref that returns an array-reference or a hash-
1638       reference.  The code-ref will be invoked with no arguments.
1639
1640        my $aoa = csv (in => "file.csv");
1641
1642        open my $fh, "<", "file.csv";
1643        my $aoa = csv (in => $fh);
1644
1645        my $csv = [ [qw( Foo Bar )], [ 1, 2 ], [ 2, 3 ]];
1646        my $err = csv (in => $csv, out => "file.csv");
1647
1648       If called in void context without the "out" attribute, the resulting
1649       ref will be used as input to a subsequent call to csv:
1650
1651        csv (in => "file.csv", filter => { 2 => sub { length > 2 }})
1652
1653       will be a shortcut to
1654
1655        csv (in => csv (in => "file.csv", filter => { 2 => sub { length > 2 }}))
1656
1657       where, in the absence of the "out" attribute, this is a shortcut to
1658
1659        csv (in  => csv (in => "file.csv", filter => { 2 => sub { length > 2 }}),
1660             out => *STDOUT)
1661
1662       out
1663
1664        csv (in => $aoa, out => "file.csv");
1665        csv (in => $aoa, out => $fh);
1666        csv (in => $aoa, out =>   STDOUT);
1667        csv (in => $aoa, out =>  *STDOUT);
1668        csv (in => $aoa, out => \*STDOUT);
1669        csv (in => $aoa, out => \my $data);
1670        csv (in => $aoa, out =>  undef);
1671        csv (in => $aoa, out => \"skip");
1672
1673        csv (in => $fh,  out => \@aoa);
1674        csv (in => $fh,  out => \@aoh, bom => 1);
1675        csv (in => $fh,  out => \%hsh, key => "key");
1676
1677       In output mode, the default CSV options when producing CSV are
1678
1679        eol       => "\r\n"
1680
1681       The "fragment" attribute is ignored in output mode.
1682
1683       "out" can be a file name  (e.g.  "file.csv"),  which will be opened for
1684       writing and closed when finished,  a file handle (e.g. $fh or "FH"),  a
1685       reference to a glob (e.g. "\*STDOUT"),  the glob itself (e.g. *STDOUT),
1686       or a reference to a scalar (e.g. "\my $data").
1687
1688        csv (in => sub { $sth->fetch },            out => "dump.csv");
1689        csv (in => sub { $sth->fetchrow_hashref }, out => "dump.csv",
1690             headers => $sth->{NAME_lc});
1691
1692       When a code-ref is used for "in", the output is generated  per
1693       invocation, so no buffering is involved. This implies that there is no
1694       size restriction on the number of records. The "csv" function ends when
1695       the coderef returns a false value.
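
       A self-contained sketch of the code-ref form, with a plain array
       standing in for the database handle. The @queue name and its
       content are assumptions for the example:

```perl
use strict;
use warnings;
use Text::CSV qw( csv );

my @queue = ([ "code", "name" ], [ 1, "foo" ], [ 2, "bar" ]);

# The code-ref hands out one row per call. Once the queue is empty,
# shift returns undef, which is false and ends the csv () call.
csv (in => sub { shift @queue }, out => \my $data);

print $data;
```

       Because "eol" defaults to "\r\n" in output mode, each record in
       $data ends with a carriage return and a line feed.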
1696
1697       If "out" is set to a reference of the literal string "skip", the output
1698       will be suppressed completely,  which might be useful in combination
1699       with a filter for side effects only.
1700
1701        my %cache;
1702        csv (in    => "dump.csv",
1703             out   => \"skip",
1704             on_in => sub { $cache{$_[1][1]}++ });
1705
1706       Currently,  setting "out" to any false value  ("undef", "", 0) will be
1707       equivalent to "\"skip"".
1708
1709       If the "in" argument points to something to parse, and "out" is set
1710       to a reference to an "ARRAY" or a "HASH", the output is appended to the
1711       data in the existing reference. The result of the parse should match
1712       what exists in the reference passed. This might come in handy when you
1713       have to parse a set of files with similar content (like data stored per
1714       period) and you want to collect that into a single data structure:
1715
1716        my %hash;
1717        csv (in => $_, out => \%hash, key => "id") for sort glob "foo-[0-9]*.csv";
1718
1719        my @list; # List of arrays
1720        csv (in => $_, out => \@list)              for sort glob "foo-[0-9]*.csv";
1721
1722        my @list; # List of hashes
1723        csv (in => $_, out => \@list, bom => 1)    for sort glob "foo-[0-9]*.csv";
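
       The same idea can be sketched with in-memory sources instead of
       files. The @sources data is hypothetical, and the sketch assumes a
       version of the module that supports a reference as "out":

```perl
use strict;
use warnings;
use Text::CSV qw( csv );

# Two in-memory sources standing in for per-period files
my @sources = ("id,name\n1,foo\n", "id,name\n2,bar\n");

# Collect all records in one hash, keyed on the "id" column
my %by_id;
csv (in => \$_, out => \%by_id, key => "id") for @sources;

# Collect all rows, header lines included, in one list of arrays
my @rows;
csv (in => \$_, out => \@rows) for @sources;

print scalar (keys %by_id), " keyed records, ", scalar @rows, " rows\n";
```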
1724
1725       encoding
1726
1727       If passed,  it should be an encoding accepted by the  :encoding()
1728       option to "open". There is no default value. This attribute does not
1729       work in perl 5.6.x.  "encoding" can be abbreviated to "enc" for ease of
1730       use in command line invocations.
1731
1732       If "encoding" is set to the literal value "auto", the method "header"
1733       will be invoked on the opened stream to check if there is a BOM and set
1734       the encoding accordingly.   This is equal to passing a true value in
1735       the option "detect_bom".
1736
1737       Encodings can be stacked, as supported by "binmode":
1738
1739        # Using PerlIO::via::gzip
1740        csv (in       => \@csv,
1741             out      => "test.csv:via.gz",
1742             encoding => ":via(gzip):encoding(utf-8)",
1743             );
1744        $aoa = csv (in => "test.csv:via.gz",  encoding => ":via(gzip)");
1745
1746        # Using PerlIO::gzip
1747        csv (in       => \@csv,
1748             out      => "test.csv:gzip.gz",
1749             encoding => ":gzip:encoding(utf-8)",
1750             );
1751        $aoa = csv (in => "test.csv:gzip.gz", encoding => ":gzip");
1752
1753       detect_bom
1754
1755       If  "detect_bom"  is given, the method  "header"  will be invoked on
1756       the opened stream to check if there is a BOM and set the encoding
1757       accordingly.
1758
1759       "detect_bom" can be abbreviated to "bom".
1760
1761       This is the same as setting "encoding" to "auto".
1762
1763       Note that as the method  "header" is invoked,  its default is to also
1764       set the headers.
1765
1766       headers
1767
1768       If this attribute is not given, the default behavior is to produce an
1769       array of arrays.
1770
1771       If "headers" is supplied,  it should be an anonymous list of column
1772       names, an anonymous hashref, a coderef, or a literal flag:  "auto",
1773       "lc", "uc", or "skip".
1774
1775       skip
1776         When "skip" is used, the header will not be included in the output.
1777
1778          my $aoa = csv (in => $fh, headers => "skip");
1779
1780       auto
1781         If "auto" is used, the first line of the "CSV" source will be read as
1782         the list of field headers and used to produce an array of hashes.
1783
1784          my $aoh = csv (in => $fh, headers => "auto");
1785
1786       lc
1787         If "lc" is used,  the first line of the  "CSV" source will be read as
1788         the list of field headers mapped to  lower case and used to produce
1789         an array of hashes. This is a variation of "auto".
1790
1791          my $aoh = csv (in => $fh, headers => "lc");
1792
1793       uc
1794         If "uc" is used,  the first line of the  "CSV" source will be read as
1795         the list of field headers mapped to  upper case and used to produce
1796         an array of hashes. This is a variation of "auto".
1797
1798          my $aoh = csv (in => $fh, headers => "uc");
1799
1800       CODE
1801         If a coderef is used,  the first line of the  "CSV" source will be
1802         read as the list of mangled field headers in which each field is
1803         passed as the only argument to the coderef. This list is used to
1804         produce an array of hashes.
1805
1806          my $aoh = csv (in      => $fh,
1807                         headers => sub { lc ($_[0]) =~ s/kode/code/gr });
1808
1809         this example is a variation of using "lc" where all occurrences of
1810         "kode" are replaced with "code".
1811
1812       ARRAY
1813         If  "headers"  is an anonymous list,  the entries in the list will be
1814         used as field names. The first line is considered data instead of
1815         headers.
1816
1817          my $aoh = csv (in => $fh, headers => [qw( Foo Bar )]);
1818          csv (in => $aoa, out => $fh, headers => [qw( code description price )]);
1819
1820       HASH
1821         If "headers" is a hash reference, this implies "auto", but header
1822         fields that exist as key in the hashref will be replaced by the value
1823         for that key. Given a CSV file like
1824
1825          post-kode,city,name,id number,fubble
1826          1234AA,Duckstad,Donald,13,"X313DF"
1827
1828         using
1829
1830          csv (headers => { "post-kode" => "pc", "id number" => "ID" }, ...
1831
1832         will return an entry like
1833
1834          { pc     => "1234AA",
1835            city   => "Duckstad",
1836            name   => "Donald",
1837            ID     => "13",
1838            fubble => "X313DF",
1839            }
1840
1841       See also "munge_column_names" and "set_column_names".
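
       The sketch below runs three of these variants against the same
       in-memory data. The column names "zip" and "town" are invented for
       the example.

```perl
use strict;
use warnings;
use Text::CSV qw( csv );

my $data = qq{Post-Kode,City\n1234AA,Duckstad\n};

# "lc" lower-cases the names found on the first line
my $lc = csv (in => \$data, headers => "lc");

# A hashref renames only the listed columns and keeps the others
my $renamed = csv (in => \$data, headers => { "Post-Kode" => "pc" });

# An arrayref supplies the names, so the first line counts as data
my $named = csv (in => \$data, headers => [qw( zip town )]);

print join (",", sort keys %{$renamed->[0]}), "\n";
```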
1842
1843       munge_column_names
1844
1845       If "munge_column_names" is set,  the method  "header"  is invoked on
1846       the opened stream with all matching arguments to detect and set the
1847       headers.
1848
1849       "munge_column_names" can be abbreviated to "munge".
1850
1851       key
1852
1853       If passed,  will default  "headers"  to "auto" and return a hashref
1854       instead of an array of hashes. Allowed values are simple scalars or
1855       array-references where the first element is the joiner and the rest are
1856       the fields to join to combine the key.
1857
1858        my $ref = csv (in => "test.csv", key => "code");
1859        my $ref = csv (in => "test.csv", key => [ ":" => "code", "color" ]);
1860
1861       with test.csv like
1862
1863        code,product,price,color
1864        1,pc,850,gray
1865        2,keyboard,12,white
1866        3,mouse,5,black
1867
1868       the first example will return
1869
1870         { 1   => {
1871               code    => 1,
1872               color   => 'gray',
1873               price   => 850,
1874               product => 'pc'
1875               },
1876           2   => {
1877               code    => 2,
1878               color   => 'white',
1879               price   => 12,
1880               product => 'keyboard'
1881               },
1882           3   => {
1883               code    => 3,
1884               color   => 'black',
1885               price   => 5,
1886               product => 'mouse'
1887               }
1888           }
1889
1890       the second example will return
1891
1892         { "1:gray"    => {
1893               code    => 1,
1894               color   => 'gray',
1895               price   => 850,
1896               product => 'pc'
1897               },
1898           "2:white"   => {
1899               code    => 2,
1900               color   => 'white',
1901               price   => 12,
1902               product => 'keyboard'
1903               },
1904           "3:black"   => {
1905               code    => 3,
1906               color   => 'black',
1907               price   => 5,
1908               product => 'mouse'
1909               }
1910           }
1911
1912       The "key" attribute can be combined with "headers" for "CSV" data that
1913       has no header line, like
1914
1915        my $ref = csv (
1916            in      => "foo.csv",
1917            headers => [qw( c_foo foo bar description stock )],
1918            key     =>     "c_foo",
1919            );
1920
1921       value
1922
1923       Used to create key-value hashes.
1924
1925       Only allowed when "key" is valid. A "value" can be either a single
1926       column label or an anonymous list of column labels.  In the first case,
1927       the value will be a simple scalar value, in the latter case, it will be
1928       a hashref.
1929
1930        my $ref = csv (in => "test.csv", key   => "code",
1931                                         value => "price");
1932        my $ref = csv (in => "test.csv", key   => "code",
1933                                         value => [ "product", "price" ]);
1934        my $ref = csv (in => "test.csv", key   => [ ":" => "code", "color" ],
1935                                         value => "price");
1936        my $ref = csv (in => "test.csv", key   => [ ":" => "code", "color" ],
1937                                         value => [ "product", "price" ]);
1938
1939       with test.csv like
1940
1941        code,product,price,color
1942        1,pc,850,gray
1943        2,keyboard,12,white
1944        3,mouse,5,black
1945
1946       the first example will return
1947
1948         { 1 => 850,
1949           2 =>  12,
1950           3 =>   5,
1951           }
1952
1953       the second example will return
1954
1955         { 1   => {
1956               price   => 850,
1957               product => 'pc'
1958               },
1959           2   => {
1960               price   => 12,
1961               product => 'keyboard'
1962               },
1963           3   => {
1964               price   => 5,
1965               product => 'mouse'
1966               }
1967           }
1968
1969       the third example will return
1970
1971         { "1:gray"    => 850,
1972           "2:white"   =>  12,
1973           "3:black"   =>   5,
1974           }
1975
1976       the fourth example will return
1977
1978         { "1:gray"    => {
1979               price   => 850,
1980               product => 'pc'
1981               },
1982           "2:white"   => {
1983               price   => 12,
1984               product => 'keyboard'
1985               },
1986           "3:black"   => {
1987               price   => 5,
1988               product => 'mouse'
1989               }
1990           }
1991
1992       keep_headers
1993
1994       When using hashes, store the column names in the arrayref passed, so
1995       all headers are available after the call in the original order.
1996
1997        my $aoh = csv (in => "file.csv", keep_headers => \my @hdr);
1998
1999       This attribute can be abbreviated to "kh" or passed as
2000       "keep_column_names".
2001
2002       This attribute implies a default of "auto" for the "headers" attribute.
2003
2004       The headers can also be kept internally to keep stable header order:
2005
2006        csv (in      => csv (in => "file.csv", kh => "internal"),
2007             out     => "new.csv",
2008             kh      => "internal");
2009
2010       where "internal" can also be 1, "yes", or "true". This is similar to
2011
2012        my @h;
2013        csv (in      => csv (in => "file.csv", kh => \@h),
2014             out     => "new.csv",
2015             headers => \@h);
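
       A small sketch showing why the kept list is useful. The data is
       invented; the column names are deliberately not in alphabetical
       order.

```perl
use strict;
use warnings;
use Text::CSV qw( csv );

my $data = "zulu,alpha,mike\n1,2,3\n";

# @hdr receives the column names in the order found in the source
my $aoh = csv (in => \$data, keep_headers => \my @hdr);

# Hash keys have no stable order; @hdr restores the original one
my $line = join ",", map { $aoh->[0]{$_} } @hdr;
print "$line\n";
```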
2016
2017       fragment
2018
2019       Only output the fragment as defined in the "fragment" method. This
2020       option is ignored when generating "CSV". See "out".
2021
2022       Combining all of them could give something like
2023
2024        use Text::CSV qw( csv );
2025        my $aoh = csv (
2026            in       => "test.txt",
2027            encoding => "utf-8",
2028            headers  => "auto",
2029            sep_char => "|",
2030            fragment => "row=3;6-9;15-*",
2031            );
2032        say $aoh->[15]{Foo};
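
       A smaller sketch that can be run as is, using generated in-memory
       data and no headers. The fragment specification selects rows by
       their 1-based position:

```perl
use strict;
use warnings;
use Text::CSV qw( csv );

# Six rows: "1,row1" up to "6,row6"
my $data = join "", map { "$_,row$_\n" } 1 .. 6;

# Keep rows 2 and 3, and row 5
my $aoa = csv (in => \$data, fragment => "row=2-3;5");

my $picked = join ",", map { $_->[0] } @$aoa;
print "$picked\n";
```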
2033
2034       sep_set
2035
2036       If "sep_set" is set, the method "header" is invoked on the opened
2037       stream to detect and set "sep_char" with the given set.
2038
2039       "sep_set" can be abbreviated to "seps".
2040
2041       Note that as the  "header" method is invoked,  its default is to also
2042       set the headers.
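
       A sketch of separator detection on in-memory data. The candidate
       set is an arbitrary choice for the example. As the header is set as
       a side effect, the result is expected to be an array of hashes.

```perl
use strict;
use warnings;
use Text::CSV qw( csv );

my $data = "code;name\n1;foo\n";

# The separator is picked from the given candidates
my $aoh = csv (in => \$data, sep_set => [ ";", "\t", "|" ]);

print "name of first record: $aoh->[0]{name}\n";
```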
2043
2044       set_column_names
2045
2046       If  "set_column_names" is passed,  the method "header" is invoked on
2047       the opened stream with all arguments meant for "header".
2048
2049       If "set_column_names" is passed as a false value, the content of the
2050       first row is only preserved if the output is AoA:
2051
2052       With an input-file like
2053
2054        bAr,foo
2055        1,2
2056        3,4,5
2057
2058       This call
2059
2060        my $aoa = csv (in => $file, set_column_names => 0);
2061
2062       will result in
2063
2064        [[ "bar", "foo"     ],
2065         [ "1",   "2"       ],
2066         [ "3",   "4",  "5" ]]
2067
2068       and
2069
2070        my $aoa = csv (in => $file, set_column_names => 0, munge => "none");
2071
2072       will result in
2073
2074        [[ "bAr", "foo"     ],
2075         [ "1",   "2"       ],
2076         [ "3",   "4",  "5" ]]
2077
2078   Callbacks
2079       Callbacks enable actions triggered from the inside of Text::CSV.
2080
2081       While most of what this enables  can easily be done in an  unrolled
2082       loop as described in the "SYNOPSIS", callbacks can be used to meet
2083       special demands or enhance the "csv" function.
2084
2085       error
2086          $csv->callbacks (error => sub { $csv->SetDiag (0) });
2087
2088         The "error"  callback is invoked when an error occurs,  but  only
2089         when "auto_diag" is set to a true value. A callback is invoked with
2090         the values returned by "error_diag":
2091
2092          my ($c, $s);
2093
2094          sub ignore3006 {
2095              my ($err, $msg, $pos, $recno, $fldno) = @_;
2096              if ($err == 3006) {
2097                  # ignore this error
2098                  ($c, $s) = (undef, undef);
2099                  Text::CSV->SetDiag (0);
2100                  }
2101              # Any other error
2102              return;
2103              } # ignore3006
2104
2105          $csv->callbacks (error => \&ignore3006);
2106          $csv->bind_columns (\$c, \$s);
2107          while ($csv->getline ($fh)) {
2108              # Error 3006 will not stop the loop
2109              }
2110
2111       after_parse
2112          $csv->callbacks (after_parse => sub { push @{$_[1]}, "NEW" });
2113          while (my $row = $csv->getline ($fh)) {
2114              $row->[-1] eq "NEW";
2115              }
2116
2117         This callback is invoked after parsing with  "getline"  only if no
2118         error occurred.  The callback is invoked with two arguments:   the
2119         current "CSV" parser object and an array reference to the fields
2120         parsed.
2121
2122         The return code of the callback is ignored  unless it is a reference
2123         to the string "skip", in which case the record will be skipped in
2124         "getline_all".
2125
2126          sub add_from_db {
2127              my ($csv, $row) = @_;
2128              $sth->execute ($row->[4]);
2129              push @$row, $sth->fetchrow_array;
2130              } # add_from_db
2131
2132          my $aoa = csv (in => "file.csv", callbacks => {
2133              after_parse => \&add_from_db });
2134
2135         This hook can be used for validation:
2136
2137         FAIL
2138           Die if any of the records does not validate a rule:
2139
2140            after_parse => sub {
2141                $_[1][4] =~ m/^[0-9]{4}\s?[A-Z]{2}$/ or
2142                    die "5th field does not have a valid Dutch zipcode";
2143                }
2144
2145         DEFAULT
2146           Replace invalid fields with a default value:
2147
2148            after_parse => sub { $_[1][2] =~ m/^\d+$/ or $_[1][2] = 0 }
2149
2150         SKIP
2151           Skip records that have invalid fields (only applies to
2152           "getline_all"):
2153
2154            after_parse => sub { $_[1][0] =~ m/^\d+$/ or return \"skip"; }
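
         The SKIP and DEFAULT ideas can be combined in one hook. The sketch
         below uses made-up data and "getline_all", as skipping only
         applies there:

```perl
use strict;
use warnings;
use Text::CSV;

my $data = "1,ok\nx,bad\n2,ok\n";

my $csv = Text::CSV->new ({ binary => 1, auto_diag => 1 });
$csv->callbacks (after_parse => sub {
    my ($obj, $row) = @_;
    $row->[0] =~ m/^\d+$/ or return \"skip";    # drop invalid records
    $row->[1] = uc $row->[1];                    # valid records only
    return;                                      # return value is ignored
    });

open my $fh, "<", \$data or die "in-memory handle: $!";
my $rows = $csv->getline_all ($fh);
close $fh;

print join (",", @$_), "\n" for @$rows;
```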
2155
2156       before_print
2157          my $idx = 1;
2158          $csv->callbacks (before_print => sub { $_[1][0] = $idx++ });
2159          $csv->print (*STDOUT, [ 0, $_ ]) for @members;
2160
2161         This callback is invoked  before printing with  "print"  only if no
2162         error occurred.  The callback is invoked with two arguments:  the
2163         current  "CSV" parser object and an array reference to the fields
2164         passed.
2165
2166         The return code of the callback is ignored.
2167
2168          sub max_4_fields {
2169              my ($csv, $row) = @_;
2170              @$row > 4 and splice @$row, 4;
2171              } # max_4_fields
2172
2173          csv (in => csv (in => "file.csv"), out => *STDOUT,
2174              callbacks => { before_print => \&max_4_fields });
2175
2176         This callback is not active for "combine".
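
         A runnable variant of the numbering example that prints to an
         in-memory handle, so the effect of the hook can be inspected. The
         names used are illustrative.

```perl
use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new ({ binary => 1, eol => "\n" });

# Replace the placeholder in the first field with a running number
my $idx = 1;
$csv->callbacks (before_print => sub { $_[1][0] = $idx++ });

open my $fh, ">", \my $out or die "in-memory handle: $!";
$csv->print ($fh, [ 0, $_ ]) for qw( alice bob );
close $fh;

print $out;
```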
2177
2178       Callbacks for csv ()
2179
2180       The "csv" function allows for some callbacks that do not integrate in
2181       XS internals but are only available in the "csv" function.
2182
2183         csv (in        => "file.csv",
2184              callbacks => {
2185                  filter       => { 6 => sub { $_ > 15 } },    # first
2186                  after_parse  => sub { say "AFTER PARSE";  }, # first
2187                  after_in     => sub { say "AFTER IN";     }, # second
2188                  on_in        => sub { say "ON IN";        }, # third
2189                  },
2190              );
2191
2192         csv (in        => $aoh,
2193              out       => "file.csv",
2194              callbacks => {
2195                  on_in        => sub { say "ON IN";        }, # first
2196                  before_out   => sub { say "BEFORE OUT";   }, # second
2197                  before_print => sub { say "BEFORE PRINT"; }, # third
2198                  },
2199              );
2200
2201       filter
2202         This callback can be used to filter records.  It is called just after
2203         a new record has been scanned.  The callback accepts a:
2204
2205         hashref
2206           The keys are the index to the row (the field name or field number,
2207           1-based) and the values are subs to return a true or false value.
2208
2209            csv (in => "file.csv", filter => {
2210                       3 => sub { m/a/ },       # third field should contain an "a"
2211                       5 => sub { length > 4 }, # length of the 5th field minimal 5
2212                       });
2213
2214            csv (in => "file.csv", filter => { foo => sub { $_ > 4 }});
2215
2216           If the keys to the filter hash contain any character that is not a
2217           digit it will also implicitly set "headers" to "auto"  unless
2218           "headers"  was already passed as argument.  When headers are
2219           active, returning an array of hashes, the filter is not applicable
2220           to the header itself.
2221
2222           All sub results should match, as in AND.
2223
2224           The context of the callback sets  $_ localized to the field
2225           indicated by the filter. The two arguments are as with all other
2226           callbacks, so the other fields in the current row can be seen:
2227
2228            filter => { 3 => sub { $_ > 100 ? $_[1][1] =~ m/A/ : $_[1][6] =~ m/B/ }}
2229
2230           If the context is set to return a list of hashes  ("headers" is
2231           defined), the current record will also be available in the
2232           localized %_:
2233
2234            filter => { 3 => sub { $_ > 100 && $_{foo} =~ m/A/ && $_{bar} < 1000  }}
2235
2236           If the filter is used to alter the content by changing $_,  make
2237           sure that the sub returns true in order not to have that record
2238           skipped:
2239
2240            filter => { 2 => sub { $_ = uc }}
2241
2242           will upper-case the second field, and then skip it if the resulting
2243           content evaluates to false. To always accept, end with truth:
2244
2245            filter => { 2 => sub { $_ = uc; 1 }}
2246
2247         coderef
2248            csv (in => "file.csv", filter => sub { $n++; 0; });
2249
2250           If the argument to "filter" is a coderef,  it is an alias or
2251           shortcut to a filter on column 0:
2252
2253            csv (filter => sub { $n++; 0 });
2254
2255           is equal to
2256
2257            csv (filter => { 0 => sub { $n++; 0 } });
2258
2259         filter-name
2260            csv (in => "file.csv", filter => "not_blank");
2261            csv (in => "file.csv", filter => "not_empty");
2262            csv (in => "file.csv", filter => "filled");
2263
2264           These are predefined filters
2265
2266           Given a file like (line numbers prefixed for doc purpose only):
2267
2268            1:1,2,3
2269            2:
2270            3:,
2271            4:""
2272            5:,,
2273            6:, ,
2274            7:"",
2275            8:" "
2276            9:4,5,6
2277
2278           not_blank
2279             Filter out the blank lines
2280
2281             This filter is a shortcut for
2282
2283              filter => { 0 => sub { @{$_[1]} > 1 or
2284                          defined $_[1][0] && $_[1][0] ne "" } }
2285
2286             Due to the implementation,  it is currently impossible to also
2287             filter lines that consist only of a quoted empty field. These
2288             lines are also considered blank lines.
2289
2290             With the given example, lines 2 and 4 will be skipped.
2291
2292           not_empty
2293             Filter out lines where all the fields are empty.
2294
2295             This filter is a shortcut for
2296
2297              filter => { 0 => sub { grep { defined && $_ ne "" } @{$_[1]} } }
2298
2299             A space is not regarded as being empty, so given the example data,
2300             lines 2, 3, 4, 5, and 7 are skipped.
2301
2302           filled
2303             Filter out lines that have no visible data
2304
2305             This filter is a shortcut for
2306
2307              filter => { 0 => sub { grep { defined && m/\S/ } @{$_[1]} } }
2308
2309             This filter rejects all lines that do not have at least one field
2310             containing a non-whitespace character.
2311
2312             With the given example data, this filter would skip lines 2
2313             through 8.
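
         The predefined filters can be checked against the example file
         above with a sketch like this one, where the nine lines are
         rebuilt as an in-memory string:

```perl
use strict;
use warnings;
use Text::CSV qw( csv );

# The nine example lines, without the line-number prefixes
my $data = join "\n",
    '1,2,3', '', ',', '""', ',,', ', ,', '"",', '" "', '4,5,6', '';

# Only records with visible data survive "filled"
my $filled = csv (in => \$data, filter => "filled");

# "not_empty" also keeps the records that hold just a space
my $not_empty = csv (in => \$data, filter => "not_empty");

print scalar @$filled, " filled, ", scalar @$not_empty, " not empty\n";
```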
2314
2315         One could also use modules like Types::Standard:
2316
2317          use Types::Standard -types;
2318
2319          my $type   = Tuple[Str, Str, Int, Bool, Optional[Num]];
2320          my $check  = $type->compiled_check;
2321
2322          # filter with compiled check and warnings
2323          my $aoa = csv (
2324             in     => \$data,
2325             filter => {
2326                 0 => sub {
2327                     my $ok = $check->($_[1]) or
2328                         warn $type->get_message ($_[1]), "\n";
2329                     return $ok;
2330                     },
2331                 },
2332             );
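
         A sketch of a filter hash keyed on field names, with invented
         data. It relies on the behavior described above: names as keys
         imply headers, all subs must pass, and the record is visible in
         the localized %_.

```perl
use strict;
use warnings;
use Text::CSV qw( csv );

my $data = "code,name,price\n1,pc,850\n2,keyboard,12\n3,mouse,5\n";

# Both conditions have to hold for a record to be kept
my $aoh = csv (in => \$data, filter => {
    price => sub { $_ >= 10 },         # $_ is the "price" field
    name  => sub { $_{code} > 1 },     # %_ holds the whole record
    });

print "$_->{name}\n" for @$aoh;
```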
2333
2334       after_in
2335         This callback is invoked for each record after all records have been
2336         parsed but before returning the reference to the caller.  The hook is
2337         invoked with two arguments:  the current  "CSV"  parser object  and a
2338         reference to the record.   The reference can be a reference to a
2339         HASH  or a reference to an ARRAY as determined by the arguments.
2340
2341         This callback can also be passed as  an attribute without the
2342         "callbacks" wrapper.
2343
2344       before_out
2345         This callback is invoked for each record before the record is
2346         printed.  The hook is invoked with two arguments:  the current "CSV"
2347         parser object and a reference to the record.   The reference can be a
2348         reference to a  HASH or a reference to an ARRAY as determined by the
2349         arguments.
2350
2351         This callback can also be passed as an attribute  without the
2352         "callbacks" wrapper.
2353
2354         This callback makes the row available in %_ if the row is a hashref.
2355         In this case %_ is writable and will change the original row.
2356
2357       on_in
2358         This callback acts exactly as the "after_in" or the "before_out"
2359         hooks.
2360
2361         This callback can also be passed as an attribute  without the
2362         "callbacks" wrapper.
2363
2364         This callback makes the row available in %_ if the row is a hashref.
2365         In this case %_ is writable and will change the original row. So e.g.
2366         with
2367
2368           my $aoh = csv (
2369               in      => \"foo\n1\n2\n",
2370               headers => "auto",
2371               on_in   => sub { $_{bar} = 2; },
2372               );
2373
2374         $aoh will be:
2375
2376           [ { foo => 1,
2377               bar => 2,
2378               },
2379             { foo => 2,
2380               bar => 2,
2381               }
2382             ]
2383
2384       csv
2385         The function  "csv" can also be called as a method or with an
2386         existing Text::CSV object. This could help if the function is to be
2387         invoked a lot of times and the overhead of creating the object
2388         internally over  and  over again would be prevented by passing an
2389         existing instance.
2390
2391          my $csv = Text::CSV->new ({ binary => 1, auto_diag => 1 });
2392
2393          my $aoa = $csv->csv (in => $fh);
2394          my $aoa = csv (in => $fh, csv => $csv);
2395
2396         both act the same. Running this 20000 times on a 20-line CSV file
2397         showed a 53% speedup.
2398

DIAGNOSTICS

2400       This section is also taken from Text::CSV_XS.
2401
2402       Still under construction ...
2403
2404       If an error occurs,  "$csv->error_diag" can be used to get information
2405       on the cause of the failure. Note that for speed reasons the internal
2406       value is never cleared on success,  so using the value returned by
2407       "error_diag" in normal cases - when no error occurred - may cause
2408       unexpected results.
2409
2410       If the constructor failed, the cause can be found using "error_diag" as
2411       a class method, like "Text::CSV->error_diag".
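
       A sketch of that class-method call. The conflicting attributes are
       chosen on purpose to make the constructor fail with error 1001 from
       the list below.

```perl
use strict;
use warnings;
use Text::CSV;

# sep_char equal to quote_char is an option conflict, so new () fails
my $csv = Text::CSV->new ({ sep_char => ";", quote_char => ";" });

# No object exists, so the diagnostics come from the class
my $diag = Text::CSV->error_diag ();
printf "new () failed: %d (%s)\n", 0 + $diag, "$diag"
    unless $csv;
```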
2412
2413       The "$csv->error_diag" method is automatically invoked upon error when
2414       the constructor was called with  "auto_diag"  set to  1 or 2, or when
2415       autodie is in effect.  When set to 1, this will cause a "warn" with the
2416       error message,  when set to 2, it will "die". "2012 - EOF" is excluded
2417       from "auto_diag" reports.
2418
2419       Errors can be (individually) caught using the "error" callback.
2420
2421       The errors as described below are available. I have tried to make the
2422       error itself explanatory enough, but more descriptions will be added.
2423       For most of these errors, the first three capitals describe the error
2424       category:
2425
2426       • INI
2427
2428         Initialization error or option conflict.
2429
2430       • ECR
2431
2432         Carriage-Return related parse error.
2433
2434       • EOF
2435
2436         End-Of-File related parse error.
2437
2438       • EIQ
2439
2440         Parse error inside quotation.
2441
2442       • EIF
2443
2444         Parse error inside field.
2445
2446       • ECB
2447
2448         Combine error.
2449
2450       • EHR
2451
2452         HashRef parse related error.
2453
2454       And below should be the complete list of error codes that can be
2455       returned:
2456
2457       • 1001 "INI - sep_char is equal to quote_char or escape_char"
2458
2459         The  separation character  cannot be equal to  the quotation
2460         character or to the escape character,  as this would invalidate all
2461         parsing rules.
2462
2463       • 1002 "INI - allow_whitespace with escape_char or quote_char SP or
2464         TAB"
2465
2466         Using the  "allow_whitespace"  attribute  when either "quote_char" or
2467         "escape_char"  is equal to "SPACE" or "TAB" is too ambiguous to
2468         allow.
2469
2470       • 1003 "INI - \r or \n in main attr not allowed"
2471
2472         Using default "eol" characters in either "sep_char", "quote_char",
2473         or  "escape_char"  is  not allowed.
2474
2475       • 1004 "INI - callbacks should be undef or a hashref"
2476
2477         The "callbacks" attribute accepts only "undef" or a hash
2478         reference.
2479
2480       • 1005 "INI - EOL too long"
2481
2482         The value passed for EOL exceeds its maximum length (16).
2483
2484       • 1006 "INI - SEP too long"
2485
2486         The value passed for SEP exceeds its maximum length (16).
2487
2488       • 1007 "INI - QUOTE too long"
2489
2490         The value passed for QUOTE exceeds its maximum length (16).
2491
2492       • 1008 "INI - SEP undefined"
2493
2494         The value passed for SEP should be defined and not empty.
2495
2496       • 1010 "INI - the header is empty"
2497
2498         The header line parsed by the "header" method is empty.
2499
2500       • 1011 "INI - the header contains more than one valid separator"
2501
2502         The header line parsed by the "header" method contains more than one
2503         (unique) separator character out of the allowed set of separators.
2504
2505       • 1012 "INI - the header contains an empty field"
2506
2507         The header line parsed by the "header" method contains an empty field.
2508
2509       • 1013 "INI - the header contains non-unique fields"
2510
2511         The header line parsed by the "header" method contains at least two
2512         identical fields.
2513
2514       • 1014 "INI - header called on undefined stream"
2515
2516         The header line cannot be parsed from an undefined source.
2517
2518       • 1500 "PRM - Invalid/unsupported argument(s)"
2519
2520         Function or method called with invalid argument(s) or parameter(s).
2521
2522       • 1501 "PRM - The key attribute is passed as an unsupported type"
2523
2524         The "key" attribute is of an unsupported type.
2525
2526       • 1502 "PRM - The value attribute is passed without the key attribute"
2527
2528         The "value" attribute is only allowed when a valid key is given.
2529
2530       • 1503 "PRM - The value attribute is passed as an unsupported type"
2531
2532         The "value" attribute is of an unsupported type.
2533
2534       • 2010 "ECR - QUO char inside quotes followed by CR not part of EOL"
2535
2536         When "eol" has been set to anything but the default, like
2537         "\r\t\n", and the "\r" follows the second (closing)
2538         "quote_char", where the characters following the "\r" do not make up
2539         the "eol" sequence, this is an error.
2540
2541       • 2011 "ECR - Characters after end of quoted field"
2542
2543         Sequences like "1,foo,"bar"baz,22,1" are not allowed. "bar" is a
2544         quoted field and after the closing double-quote, there should be
2545         either a new-line sequence or a separation character.
2546
2547       • 2012 "EOF - End of data in parsing input stream"
2548
2549         Self-explanatory. End-of-file was reached while parsing a stream.
2550         This can happen only when reading from streams with "getline", as
2551         "parse" is done on strings that are not required to have a trailing
2552         "eol".
2553
2554       • 2013 "INI - Specification error for fragments RFC7111"
2555
2556         Invalid specification for a URI "fragment".
2557
2558       • 2014 "ENF - Inconsistent number of fields"
2559
2560         Inconsistent number of fields under strict parsing.
2561
2562       • 2021 "EIQ - NL char inside quotes, binary off"
2563
2564         Sequences like "1,"foo\nbar",22,1" are allowed only when the binary
2565         option has been selected with the constructor.
2566
2567       • 2022 "EIQ - CR char inside quotes, binary off"
2568
2569         Sequences like "1,"foo\rbar",22,1" are allowed only when the binary
2570         option has been selected with the constructor.
2571
2572       • 2023 "EIQ - QUO character not allowed"
2573
2574         Sequences like ""foo "bar" baz",qu" and "2023,",2008-04-05,"Foo,
2575         Bar",\n" will cause this error.
2576
2577       • 2024 "EIQ - EOF cannot be escaped, not even inside quotes"
2578
2579         The escape character is not allowed as last character in an input
2580         stream.
2581
2582       • 2025 "EIQ - Loose unescaped escape"
2583
2584         An escape character should escape only characters that need escaping.
2585
2586         Allowing  the escape  for other characters  is possible  with the
2587         attribute "allow_loose_escapes".
2588
2589       • 2026 "EIQ - Binary character inside quoted field, binary off"
2590
2591         Binary characters are not allowed by default. The exception
2592         is a field whose content is valid UTF-8, which will
2593         automatically be upgraded. Set "binary" to 1 to accept binary
2594         data.
2595
2596       • 2027 "EIQ - Quoted field not terminated"
2597
2598         When parsing a field that started with a quotation character,  the
2599         field is expected to be closed with a quotation character.   When the
2600         parsed line is exhausted before the quote is found, that field is not
2601         terminated.
2602
2603       • 2030 "EIF - NL char inside unquoted verbatim, binary off"
2604
2605       • 2031 "EIF - CR char is first char of field, not part of EOL"
2606
2607       • 2032 "EIF - CR char inside unquoted, not part of EOL"
2608
2609       • 2034 "EIF - Loose unescaped quote"
2610
2611       • 2035 "EIF - Escaped EOF in unquoted field"
2612
2613       • 2036 "EIF - ESC error"
2614
2615       • 2037 "EIF - Binary character in unquoted field, binary off"
2616
2617       • 2110 "ECB - Binary character in Combine, binary off"
2618
2619       • 2200 "EIO - print to IO failed. See errno"
2620
2621       • 3001 "EHR - Unsupported syntax for column_names ()"
2622
2623       • 3002 "EHR - getline_hr () called before column_names ()"
2624
2625       • 3003 "EHR - bind_columns () and column_names () fields count
2626         mismatch"
2627
2628       • 3004 "EHR - bind_columns () only accepts refs to scalars"
2629
2630       • 3006 "EHR - bind_columns () did not pass enough refs for parsed
2631         fields"
2632
2633       • 3007 "EHR - bind_columns needs refs to writable scalars"
2634
2635       • 3008 "EHR - unexpected error in bound fields"
2636
2637       • 3009 "EHR - print_hr () called before column_names ()"
2638
2639       • 3010 "EHR - print_hr () called with invalid arguments"
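
       As an illustration of one entry from the list above (2021, a newline
       inside a quoted field with "binary" off), the sketch below parses the
       same hypothetical line with and without the "binary" attribute. The
       variable names are illustrative only.

```perl
use strict;
use warnings;
use Text::CSV;

# Hypothetical input with a newline embedded in the quoted second field.
my $line = qq{1,"foo\nbar",22,1};

my $plain  = Text::CSV->new ();                  # binary is off by default
my $binary = Text::CSV->new ({ binary => 1 });

# Without binary the parse is rejected; the code is read in numeric context.
my $plain_ok = $plain->parse ($line);
my $code     = $plain_ok ? 0 : 0 + $plain->error_diag;

# With binary the same line is accepted and yields four fields.
my @fields = $binary->parse ($line) ? $binary->fields : ();

printf "binary off: error %d; binary on: %d fields\n", $code, scalar @fields;
```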
2640

AUTHORS and MAINTAINERS

2645       Alan Citterman <alan[at]mfgrtl.com> wrote the original Perl module.
2646       Please don't send mail concerning Text::CSV to Alan, as he is no
2647       longer a maintainer.
2648
2649       Jochen Wiedmann <joe[at]ispsoft.de> rewrote the encoding and decoding
2650       in C by implementing a simple finite-state machine and added the
2651       variable quote, escape and separator characters, the binary mode and
2652       the print and getline methods. See ChangeLog releases 0.10 through
2653       0.23.
2654
2655       H.Merijn Brand <h.m.brand[at]xs4all.nl> cleaned up the code, added the
2656       field flags methods, wrote the major part of the test suite, completed
2657       the documentation, and fixed some RT bugs. See ChangeLog releases 0.25
2658       and on.
2659
2660       Makamaka Hannyaharamitu <makamaka[at]cpan.org> wrote Text::CSV_PP,
2661       which is the pure-Perl version of Text::CSV_XS.
2662
2663       The new Text::CSV (since 0.99) is maintained by Makamaka and, since
2664       1.91, by Kenichi Ishigaki.
2665

COPYRIGHT AND LICENSE

2667       Text::CSV:
2668
2669       Copyright (C) 1997 Alan Citterman. All rights reserved.  Copyright (C)
2670       2007-2015 Makamaka Hannyaharamitu.  Copyright (C) 2017- Kenichi
2671       Ishigaki.  A large portion of the doc is taken from Text::CSV_XS. See
2672       below.
2673
2674       Text::CSV_PP:
2675
2676       Copyright (C) 2005-2015 Makamaka Hannyaharamitu.  Copyright (C) 2017-
2677       Kenichi Ishigaki.  A large portion of the code/doc is also taken from
2678       Text::CSV_XS. See below.
2679
2680       Text::CSV_XS:
2681
2682       Copyright (C) 2007-2016 H.Merijn Brand for PROCURA B.V.  Copyright (C)
2683       1998-2001 Jochen Wiedmann. All rights reserved.  Portions Copyright (C)
2684       1997 Alan Citterman. All rights reserved.
2685
2686       This library is free software; you can redistribute it and/or modify it
2687       under the same terms as Perl itself.
2688
2689
2690
2691perl v5.38.0                      2023-07-21                      Text::CSV(3)