Text::CSV_PP(3)       User Contributed Perl Documentation      Text::CSV_PP(3)

NAME

       Text::CSV_PP - Text::CSV_XS compatible pure-Perl module

SYNOPSIS

       This section is taken from Text::CSV_XS.

        # Functional interface
        use Text::CSV_PP qw( csv );

        # Read whole file in memory
        my $aoa = csv (in => "data.csv");    # as array of array
        my $aoh = csv (in => "data.csv",
                       headers => "auto");   # as array of hash

        # Write array of arrays as csv file
        csv (in => $aoa, out => "file.csv", sep_char => ";");

        # Only show lines where "code" is odd
        csv (in => "data.csv", filter => { code => sub { $_ % 2 }});

        # Object interface
        use Text::CSV_PP;

        my @rows;
        # Read/parse CSV
        my $csv = Text::CSV_PP->new ({ binary => 1, auto_diag => 1 });
        open my $fh, "<:encoding(utf8)", "test.csv" or die "test.csv: $!";
        while (my $row = $csv->getline ($fh)) {
            $row->[2] =~ m/pattern/ or next; # 3rd field should match
            push @rows, $row;
            }
        close $fh;

        # and write as CSV
        open $fh, ">:encoding(utf8)", "new.csv" or die "new.csv: $!";
        $csv->say ($fh, $_) for @rows;
        close $fh or die "new.csv: $!";

DESCRIPTION

       Text::CSV_PP is a pure-Perl module that provides facilities for the
       composition and decomposition of comma-separated values. It is
       (almost) compatible with the much faster Text::CSV_XS, and is mainly
       used as its fallback when you use the Text::CSV module without having
       Text::CSV_XS installed. Unless you have a reason to use this module
       directly, use Text::CSV for speed and portability (or Text::CSV_XS
       when you write a one-off script and do not need to care about
       portability).

       The following caveats are taken from the documentation of
       Text::CSV_XS.

   Embedded newlines
       Important Note: The default behavior is to accept only ASCII
       characters in the range from 0x20 (space) to 0x7E (tilde). This means
       that fields cannot contain newlines. If your data contains newlines
       embedded in fields, characters above 0x7E (tilde), or binary data,
       you must set "binary => 1" in the call to "new". To cover the widest
       range of parsing options, you will always want to set binary.

       You still have to pass a complete record to the "parse" method, which
       is more complicated than the usual line-by-line reading suggests:

        my $csv = Text::CSV_PP->new ({ binary => 1, eol => $/ });
        while (<>) {           #  WRONG!
            $csv->parse ($_);
            my @fields = $csv->fields ();
            }

       This will break, as the "while" might read broken lines: it does not
       care about the quoting. If you need to support embedded newlines, the
       way to go is to not pass "eol" in the parser (it accepts "\n", "\r",
       and "\r\n" by default) and then

        my $csv = Text::CSV_PP->new ({ binary => 1 });
        open my $fh, "<", $file or die "$file: $!";
        while (my $row = $csv->getline ($fh)) {
            my @fields = @$row;
            }

       The old(er) way of using global file handles is still supported

        while (my $row = $csv->getline (*ARGV)) { ... }
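
       As a minimal sketch of the "getline" approach, the following reads
       a record with an embedded newline from an in-memory filehandle. The
       sample data is hypothetical and only serves to keep the example
       self-contained.

```perl
use strict;
use warnings;
use Text::CSV_PP;

# Hypothetical sample data: the second field of the first record
# contains a newline inside the quotes.
my $data = qq{1,"line one\nline two",3\n4,five,6\n};

# An in-memory filehandle, so no file on disk is needed.
open my $fh, "<", \$data or die "in-memory handle: $!";

my $csv = Text::CSV_PP->new ({ binary => 1, auto_diag => 1 });
my @rows;
while (my $row = $csv->getline ($fh)) {
    push @rows, $row;   # one array ref per CSV record, not per text line
}
close $fh;

printf "%d records\n", scalar @rows;
```

       The data holds three text lines but only two CSV records, which is
       why reading line by line with "while (<>)" and "parse" would fail
       here.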

   Unicode
       Unicode is only tested to work with perl-5.8.2 and up.

       See also "BOM".

       The simplest way to ensure the correct encoding is used for in- and
       output is by either setting layers on the filehandles, or setting the
       "encoding" argument for "csv".

        open my $fh, "<:encoding(UTF-8)", "in.csv"  or die "in.csv: $!";
       or
        my $aoa = csv (in => "in.csv",     encoding => "UTF-8");

        open my $fh, ">:encoding(UTF-8)", "out.csv" or die "out.csv: $!";
       or
        csv (in => $aoa, out => "out.csv", encoding => "UTF-8");

       On parsing (both for "getline" and "parse"), if the source is marked
       as being UTF8, then all fields that are marked binary will also be
       marked UTF8.

       On combining ("print" and "combine"): if any of the combining fields
       was marked UTF8, the resulting string will be marked as UTF8. Note
       however that any fields before the first field marked UTF8 that
       contain 8-bit characters and were not upgraded to UTF8 will be
       "bytes" in the resulting string too, possibly causing unexpected
       errors. If you pass data of different encodings, or you do not know
       whether the encodings differ, force the data to be upgraded before
       you pass it on:

        $csv->print ($fh, [ map { utf8::upgrade (my $x = $_); $x } @data ]);

       For complete control over encoding, please use Text::CSV::Encoded:

        use Text::CSV::Encoded;
        my $csv = Text::CSV::Encoded->new ({
            encoding_in  => "iso-8859-1", # the encoding comes into   Perl
            encoding_out => "cp1252",     # the encoding comes out of Perl
            });

        $csv = Text::CSV::Encoded->new ({ encoding  => "utf8" });
        # combine () and print () accept *literally* utf8 encoded data
        # parse () and getline () return *literally* utf8 encoded data

        $csv = Text::CSV::Encoded->new ({ encoding  => undef }); # default
        # combine () and print () accept UTF8 marked data
        # parse () and getline () return UTF8 marked data

   BOM
       BOM (or Byte Order Mark) handling is available only inside the
       "header" method. This method supports the following encodings:
       "utf-8", "utf-1", "utf-32be", "utf-32le", "utf-16be", "utf-16le",
       "utf-ebcdic", "scsu", "bocu-1", and "gb-18030". See Wikipedia
       <https://en.wikipedia.org/wiki/Byte_order_mark>.

       If a file has a BOM, the easiest way to deal with that is

        my $aoh = csv (in => $file, detect_bom => 1);

       All records will be encoded based on the detected BOM.

       This implies a call to the "header" method, which defaults to also
       set the "column_names". So this is not the same as

        my $aoh = csv (in => $file, headers => "auto");

       which only reads the first record to set "column_names" but ignores
       any BOM that may be present.

METHODS

       This section is also taken from Text::CSV_XS.

   version
       (Class method) Returns the current module version.

   new
       (Class method) Returns a new instance of class Text::CSV_PP. The
       attributes are described by the (optional) hash ref "\%attr".

        my $csv = Text::CSV_PP->new ({ attributes ... });

       The following attributes are available:

       eol

        my $csv = Text::CSV_PP->new ({ eol => $/ });
                  $csv->eol (undef);
        my $eol = $csv->eol;

       The end-of-line string to add to rows for "print" or the record
       separator for "getline".

       When not passed in a parser instance, the default behavior is to
       accept "\n", "\r", and "\r\n", so it is probably safer to not specify
       "eol" at all. Passing "undef" or the empty string behaves the same.

       When not passed in a generating instance, records are not terminated
       at all, so it is probably wise to pass something you expect. A safe
       choice for "eol" on output is either $/ or "\r\n".

       Common values for "eol" are "\012" ("\n" or Line Feed), "\015\012"
       ("\r\n" or Carriage Return, Line Feed), and "\015" ("\r" or Carriage
       Return). The "eol" attribute cannot exceed 7 (ASCII) characters.

       If both $/ and "eol" equal "\015", lines that end on only a Carriage
       Return without Line Feed will be parsed correctly.
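
       The effect of "eol" on generated records can be illustrated with
       "combine" and "string". This is a small sketch; the field values are
       arbitrary.

```perl
use strict;
use warnings;
use Text::CSV_PP;

# Without eol, the generated record is not terminated at all.
my $plain = Text::CSV_PP->new ({ binary => 1 });
$plain->combine ("a", "b") or die "combine failed";
my $no_eol = $plain->string;

# With eol set, the terminator is appended to every generated record.
my $crlf = Text::CSV_PP->new ({ binary => 1, eol => "\r\n" });
$crlf->combine ("a", "b") or die "combine failed";
my $with_eol = $crlf->string;

printf "without eol: %d bytes, with eol: %d bytes\n",
    length $no_eol, length $with_eol;
```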

       sep_char

        my $csv = Text::CSV_PP->new ({ sep_char => ";" });
                $csv->sep_char (";");
        my $c = $csv->sep_char;

       The char used to separate fields, by default a comma (","). Limited
       to a single-byte character, usually in the range from 0x20 (space) to
       0x7E (tilde). When longer sequences are required, use "sep".

       The separation character cannot be equal to the quote character or to
       the escape character.

       sep

        my $csv = Text::CSV_PP->new ({ sep => "\N{FULLWIDTH COMMA}" });
                  $csv->sep (";");
        my $sep = $csv->sep;

       The chars used to separate fields, by default undefined. Limited to 8
       bytes.

       When set, overrules "sep_char". If its length is one byte it acts as
       an alias to "sep_char".

       quote_char

        my $csv = Text::CSV_PP->new ({ quote_char => "'" });
                $csv->quote_char (undef);
        my $c = $csv->quote_char;

       The character to quote fields containing blanks or binary data, by
       default the double quote character ("""). A value of undef suppresses
       quote chars (for simple cases only). Limited to a single-byte
       character, usually in the range from 0x20 (space) to 0x7E (tilde).
       When longer sequences are required, use "quote".

       "quote_char" cannot be equal to "sep_char".

       quote

        my $csv = Text::CSV_PP->new ({ quote => "\N{FULLWIDTH QUOTATION MARK}" });
                    $csv->quote ("'");
        my $quote = $csv->quote;

       The chars used to quote fields, by default undefined. Limited to 8
       bytes.

       When set, overrules "quote_char". If its length is one byte it acts as
       an alias to "quote_char".

       This method does not support "undef". Use "quote_char" to disable
       quotation.

       escape_char

        my $csv = Text::CSV_PP->new ({ escape_char => "\\" });
                $csv->escape_char (":");
        my $c = $csv->escape_char;

       The character to escape certain characters inside quoted fields.
       This is limited to a single-byte character, usually in the range
       from 0x20 (space) to 0x7E (tilde).

       The "escape_char" defaults to being the double-quote mark ("""). In
       other words the same as the default "quote_char". This means that
       doubling the quote mark in a field escapes it:

        "foo","bar","Escape ""quote mark"" with two ""quote marks""","baz"

       If you change the "quote_char" without changing the "escape_char",
       the "escape_char" will still be the double-quote ("""). If instead
       you want to escape the "quote_char" by doubling it, you will need to
       also change the "escape_char" to be the same as what you have changed
       the "quote_char" to.

       Setting "escape_char" to "undef" or "" will completely disable escapes
       and is greatly discouraged. This will also disable "escape_null".

       The escape character cannot be equal to the separation character.
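
       To make the interaction between "quote_char" and "escape_char" more
       concrete, here is a short sketch that generates the same field with
       the default escape and with a backslash escape. The field content is
       made up for illustration.

```perl
use strict;
use warnings;
use Text::CSV_PP;

my @fields = ('say "hi"', "x");

# Default: escape_char equals quote_char, so embedded quotes are doubled.
my $default = Text::CSV_PP->new ({ binary => 1 });
$default->combine (@fields) or die "combine failed";
my $doubled = $default->string;

# Backslash as escape_char: embedded quotes are preceded by a backslash.
my $backslash = Text::CSV_PP->new ({ binary => 1, escape_char => "\\" });
$backslash->combine (@fields) or die "combine failed";
my $escaped = $backslash->string;

# Parsing with the same settings restores the original field.
$backslash->parse ($escaped) or die "parse failed";
my @roundtrip = $backslash->fields;

print "$doubled\n$escaped\n";
```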

       binary

        my $csv = Text::CSV_PP->new ({ binary => 1 });
                $csv->binary (0);
        my $f = $csv->binary;

       If this attribute is 1, you may use binary characters in quoted
       fields, including line feeds, carriage returns and "NULL" bytes. (The
       latter could be escaped as ""0".) By default this feature is off.

       If a string is marked UTF8, "binary" will be turned on automatically
       when binary characters other than "CR" and "NL" are encountered. Note
       that a simple string like "\x{00a0}" might still be binary, but not
       marked UTF8, so setting "{ binary => 1 }" is still a wise option.

       strict

        my $csv = Text::CSV_PP->new ({ strict => 1 });
                $csv->strict (0);
        my $f = $csv->strict;

       If this attribute is set to 1, any row that parses to a different
       number of fields than the previous row will cause the parser to throw
       error 2014.

       skip_empty_rows

        my $csv = Text::CSV_PP->new ({ skip_empty_rows => 1 });
                $csv->skip_empty_rows (0);
        my $f = $csv->skip_empty_rows;

       If this attribute is set to 1, any row that has an "eol" immediately
       following the start of line will be skipped. Default behavior is to
       return one single empty field.

       This attribute is only used in parsing.

       formula_handling

       Alias for "formula"

       formula

        my $csv = Text::CSV_PP->new ({ formula => "none" });
                $csv->formula ("none");
        my $f = $csv->formula;

       This defines the behavior of fields containing formulas. As formulas
       are considered dangerous in spreadsheets, this attribute can define an
       optional action to be taken if a field starts with an equal sign ("=").

       For the purpose of code readability, this can also be written as

        my $csv = Text::CSV_PP->new ({ formula_handling => "none" });
                $csv->formula_handling ("none");
        my $f = $csv->formula_handling;

       Possible values for this attribute are

       none
         Take no specific action. This is the default.

          $csv->formula ("none");

       die
         Cause the process to "die" whenever a leading "=" is encountered.

          $csv->formula ("die");

       croak
         Cause the process to "croak" whenever a leading "=" is encountered.
         (See Carp)

          $csv->formula ("croak");

       diag
         Report position and content of the field whenever a leading "=" is
         found. The value of the field is unchanged.

          $csv->formula ("diag");

       empty
         Replace the content of fields that start with a "=" with the empty
         string.

          $csv->formula ("empty");
          $csv->formula ("");

       undef
         Replace the content of fields that start with a "=" with "undef".

          $csv->formula ("undef");
          $csv->formula (undef);

       a callback
         Modify the content of fields that start with a "=" with the
         return value of the callback. The original content of the field is
         available inside the callback as $_:

          # Replace all formulas with 42
          $csv->formula (sub { 42; });

          # same as $csv->formula ("empty") but slower
          $csv->formula (sub { "" });

          # Allow =4+12
          $csv->formula (sub { s/^=(\d+\+\d+)$/$1/eer });

          # Allow more complex calculations
          $csv->formula (sub { eval { s{^=([-+*/0-9()]+)$}{$1}ee }; $_ });

       All other values will give a warning and then fall back to "diag".
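
       As a rough sketch of how the "formula" settings behave on parsing,
       the following applies "empty" and then a callback to the same input
       line. The input line and the replacement text "blocked" are
       hypothetical.

```perl
use strict;
use warnings;
use Text::CSV_PP;

my $line = q{1,=2+3,4};

# "empty": fields with a leading "=" are replaced by the empty string.
my $csv = Text::CSV_PP->new ({ formula => "empty" });
$csv->parse ($line) or die "parse failed";
my @emptied = $csv->fields;

# A callback: the return value replaces the field; $_ holds the original.
$csv->formula (sub { "blocked" });
$csv->parse ($line) or die "parse failed";
my @replaced = $csv->fields;

print join ("|", @emptied), "\n", join ("|", @replaced), "\n";
```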

       decode_utf8

        my $csv = Text::CSV_PP->new ({ decode_utf8 => 1 });
                $csv->decode_utf8 (0);
        my $f = $csv->decode_utf8;

       This attribute defaults to TRUE.

       While parsing, fields that are valid UTF-8 are automatically set to
       be UTF-8, so that

         $csv->parse ("\xC4\xA8\n");

       results in

         PV("\304\250"\0) [UTF8 "\x{128}"]

       Sometimes this might not be the desired action. To prevent those
       upgrades, set this attribute to false, and the result will be

         PV("\304\250"\0)

       auto_diag

        my $csv = Text::CSV_PP->new ({ auto_diag => 1 });
                $csv->auto_diag (2);
        my $l = $csv->auto_diag;

       Setting this attribute to a number between 1 and 9 causes
       "error_diag" to be automatically called in void context upon errors.

       In case of error "2012 - EOF", this call will be void.

       If "auto_diag" is set to a numeric value greater than 1, it will "die"
       on errors instead of "warn". If set to anything unrecognized, it will
       be silently ignored.

       Future extensions to this feature will include more reliable
       auto-detection of "autodie" being active in the scope in which the
       error occurred, which will increment the value of "auto_diag" by 1
       the moment the error is detected.

       diag_verbose

        my $csv = Text::CSV_PP->new ({ diag_verbose => 1 });
                $csv->diag_verbose (2);
        my $l = $csv->diag_verbose;

       Set the verbosity of the output triggered by "auto_diag". Currently
       this only adds the current input-record-number (if known) to the
       diagnostic output with an indication of the position of the error.

       blank_is_undef

        my $csv = Text::CSV_PP->new ({ blank_is_undef => 1 });
                $csv->blank_is_undef (0);
        my $f = $csv->blank_is_undef;

       Under normal circumstances, "CSV" data makes no distinction between
       quoted and unquoted empty fields. These both end up in an empty
       string field once read, thus

        1,"",," ",2

       is read as

        ("1", "", "", " ", "2")

       When writing "CSV" files with either "always_quote" or "quote_empty"
       set, the unquoted empty field is the result of an undefined value.
       To enable this distinction when reading "CSV" data, the
       "blank_is_undef" attribute will cause unquoted empty fields to be set
       to "undef", causing the above to be parsed as

        ("1", "", undef, " ", "2")

       Note that this is specifically important when loading "CSV" fields
       into a database that allows "NULL" values, as the perl equivalent for
       "NULL" is "undef" in DBI land.

       empty_is_undef

        my $csv = Text::CSV_PP->new ({ empty_is_undef => 1 });
                $csv->empty_is_undef (0);
        my $f = $csv->empty_is_undef;

       Going one step further than "blank_is_undef", this attribute
       converts all empty fields to "undef", so

        1,"",," ",2

       is read as

        (1, undef, undef, " ", 2)

       Note that this affects only fields that are originally empty, not
       fields that are empty after stripping allowed whitespace. YMMV.
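
       The difference between "blank_is_undef" and "empty_is_undef" can be
       checked with a small sketch that parses the sample line used above
       with each attribute. The helper "show" is only there to make "undef"
       visible in the output and is not part of the module.

```perl
use strict;
use warnings;
use Text::CSV_PP;

my $line = q{1,"",," ",2};

# Only the unquoted empty field becomes undef.
my $blank_csv = Text::CSV_PP->new ({ blank_is_undef => 1 });
$blank_csv->parse ($line) or die "parse failed";
my @blank = $blank_csv->fields;

# Both the quoted and the unquoted empty field become undef.
my $empty_csv = Text::CSV_PP->new ({ empty_is_undef => 1 });
$empty_csv->parse ($line) or die "parse failed";
my @empty = $empty_csv->fields;

# Hypothetical helper: render undef visibly.
sub show { join ", ", map { defined $_ ? qq{"$_"} : "undef" } @_ }

print show (@blank), "\n";
print show (@empty), "\n";
```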

       allow_whitespace

        my $csv = Text::CSV_PP->new ({ allow_whitespace => 1 });
                $csv->allow_whitespace (0);
        my $f = $csv->allow_whitespace;

       When this option is set to true, the whitespace ("TAB"s and
       "SPACE"s) surrounding the separation character is removed when
       parsing. If either "TAB" or "SPACE" is one of the three characters
       "sep_char", "quote_char", or "escape_char" it will not be considered
       whitespace.

       Now lines like:

        1 , "foo" , bar , 3 , zapp

       are parsed as valid "CSV", even though they violate the "CSV" specs.

       Note that all whitespace is stripped from both the start and the end
       of each field. That makes it more than just a feature to enable
       parsing bad "CSV" lines, as

        1,   2.0,  3,   ape  , monkey

       will now be parsed as

        ("1", "2.0", "3", "ape", "monkey")

       even if the original line was perfectly acceptable "CSV".
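
       A minimal sketch of "allow_whitespace" in action, using the two
       sample lines from the text above:

```perl
use strict;
use warnings;
use Text::CSV_PP;

my $csv = Text::CSV_PP->new ({ allow_whitespace => 1 });

# Whitespace around separators and around the quoted field is dropped.
$csv->parse (q{1 , "foo" , bar , 3 , zapp}) or die "parse failed";
my @loose = $csv->fields;

# Leading and trailing whitespace of unquoted fields is stripped as well.
$csv->parse (q{1,   2.0,  3,   ape  , monkey}) or die "parse failed";
my @stripped = $csv->fields;

print join ("|", @loose), "\n", join ("|", @stripped), "\n";
```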

       allow_loose_quotes

        my $csv = Text::CSV_PP->new ({ allow_loose_quotes => 1 });
                $csv->allow_loose_quotes (0);
        my $f = $csv->allow_loose_quotes;

       By default, parsing unquoted fields containing "quote_char" characters
       like

        1,foo "bar" baz,42

       would result in parse error 2034. Though it is still bad practice to
       allow this format, we cannot help the fact that some vendors make
       their applications spit out lines styled this way.

       If there is really bad "CSV" data, like

        1,"foo "bar" baz",42

       or

        1,""foo bar baz"",42

       there is a way to get this data-line parsed and leave the quotes inside
       the quoted field as-is. This can be achieved by setting
       "allow_loose_quotes" AND making sure that the "escape_char" is not
       equal to "quote_char".

       allow_loose_escapes

        my $csv = Text::CSV_PP->new ({ allow_loose_escapes => 1 });
                $csv->allow_loose_escapes (0);
        my $f = $csv->allow_loose_escapes;

       Parsing fields that have "escape_char" characters that escape
       characters that do not need to be escaped, like:

        my $csv = Text::CSV_PP->new ({ escape_char => "\\" });
        $csv->parse (q{1,"my bar\'s",baz,42});

       would result in parse error 2025. Though it is bad practice to allow
       this format, this attribute enables you to treat all escape character
       sequences equal.

       allow_unquoted_escape

        my $csv = Text::CSV_PP->new ({ allow_unquoted_escape => 1 });
                $csv->allow_unquoted_escape (0);
        my $f = $csv->allow_unquoted_escape;

       A backward compatibility issue where "escape_char" differs from
       "quote_char" prevents "escape_char" from being in the first position
       of a field. If "quote_char" is equal to the default """ and
       "escape_char" is set to "\", this would be illegal:

        1,\0,2

       Setting this attribute to 1 might help to overcome issues with
       backward compatibility and allow this style.

       always_quote

        my $csv = Text::CSV_PP->new ({ always_quote => 1 });
                $csv->always_quote (0);
        my $f = $csv->always_quote;

       By default the generated fields are quoted only if they need to be,
       for example if they contain the separator character. If you set this
       attribute to 1 then all defined fields will be quoted. ("undef" fields
       are not quoted, see "blank_is_undef"). This quite often makes it
       easier to handle exported data in external applications.

       quote_space

        my $csv = Text::CSV_PP->new ({ quote_space => 1 });
                $csv->quote_space (0);
        my $f = $csv->quote_space;

       By default, a space in a field would trigger quotation. As no rule
       exists that forces this in "CSV", nor any for the opposite, the
       default is true for safety. You can exclude the space from this
       trigger by setting this attribute to 0.

       quote_empty

        my $csv = Text::CSV_PP->new ({ quote_empty => 1 });
                $csv->quote_empty (0);
        my $f = $csv->quote_empty;

       By default the generated fields are quoted only if they need to be.
       An empty (defined) field does not need quotation. If you set this
       attribute to 1 then empty defined fields will be quoted. ("undef"
       fields are not quoted, see "blank_is_undef"). See also "always_quote".
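
       The quoting attributes "always_quote", "quote_space", and
       "quote_empty" are easiest to compare side by side. The sketch below
       generates the same record under each setting; the helper "record" and
       the field values are hypothetical.

```perl
use strict;
use warnings;
use Text::CSV_PP;

# A number, an empty string, an undefined value, a value with a space,
# and a plain word.
my @fields = (1, "", undef, "a b", "foo");

# Hypothetical helper: build one record with the given attributes.
sub record {
    my %attr = @_;
    my $csv = Text::CSV_PP->new ({ binary => 1, %attr })
        or die "" . Text::CSV_PP->error_diag ();
    $csv->combine (@fields) or die "combine failed";
    return $csv->string;
}

my $default  = record ();
my $always   = record (always_quote => 1);
my $empty    = record (quote_empty  => 1);
my $no_space = record (quote_space  => 0);

print "$_\n" for $default, $always, $empty, $no_space;
```

       In each case the undefined field stays unquoted, which is what makes
       the round trip with "blank_is_undef" possible.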

       quote_binary

        my $csv = Text::CSV_PP->new ({ quote_binary => 1 });
                $csv->quote_binary (0);
        my $f = $csv->quote_binary;

       By default, all "unsafe" bytes inside a string cause the combined
       field to be quoted. By setting this attribute to 0, you can disable
       that trigger for bytes ">= 0x7F".

       escape_null

        my $csv = Text::CSV_PP->new ({ escape_null => 1 });
                $csv->escape_null (0);
        my $f = $csv->escape_null;

       By default, a "NULL" byte in a field would be escaped. This option
       enables you to treat the "NULL" byte as a simple binary character in
       binary mode (the "{ binary => 1 }" is set). The default is true. You
       can prevent "NULL" escapes by setting this attribute to 0.

       When the "escape_char" attribute is set to undefined, this attribute
       will be set to false.

       The default setting will encode "=\x00=" as

        "="0="

       With "escape_null" set to 0, this will result in

        "=\x00="

       The default when using the "csv" function is "false".

       For backward compatibility reasons, the deprecated old name
       "quote_null" is still recognized.

       keep_meta_info

        my $csv = Text::CSV_PP->new ({ keep_meta_info => 1 });
                $csv->keep_meta_info (0);
        my $f = $csv->keep_meta_info;

       By default, the parsing of input records is as simple and fast as
       possible. However, some parsing information - like quotation of the
       original field - is lost in that process. Setting this flag to true
       enables retrieving that information after parsing with the methods
       "meta_info", "is_quoted", and "is_binary" described below. Default is
       false for performance.

       If you set this attribute to a value greater than 9, then you can
       control output quotation style like it was used in the input of the
       last parsed record (unless quotation was added because of other
       reasons).

        my $csv = Text::CSV_PP->new ({
           binary         => 1,
           keep_meta_info => 1,
           quote_space    => 0,
           });

        $csv->parse (q{1,,"", ," ",f,"g","h""h",help,"help"});
        my @row = $csv->fields;

        $csv->print (*STDOUT, \@row);
        # 1,,, , ,f,g,"h""h",help,help
        $csv->keep_meta_info (11);
        $csv->print (*STDOUT, \@row);
        # 1,,"", ," ",f,"g","h""h",help,"help"

       undef_str

        my $csv = Text::CSV_PP->new ({ undef_str => "\\N" });
                $csv->undef_str (undef);
        my $s = $csv->undef_str;

       This attribute optionally defines the output of undefined fields. The
       value passed is not changed at all, so if it needs quotation, the
       quotation needs to be included in the value of the attribute. Use with
       caution, as passing a value like ",",,,,""" will for sure mess up
       your output. The default for this attribute is "undef", meaning no
       special treatment.

       This attribute is useful when exporting CSV data to be imported in
       custom loaders, like for MySQL, that recognize special sequences for
       "NULL" data.

       This attribute has no meaning when parsing CSV data.
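
       A short sketch of "undef_str", assuming a loader that expects "\N"
       for "NULL" values. The field values are arbitrary.

```perl
use strict;
use warnings;
use Text::CSV_PP;

my @fields = (1, undef, 3);

# Default: an undefined field is written as an unquoted empty field.
my $plain = Text::CSV_PP->new ({ binary => 1 });
$plain->combine (@fields) or die "combine failed";
my $without = $plain->string;

# With undef_str: the given string is written verbatim instead.
my $nulls = Text::CSV_PP->new ({ binary => 1, undef_str => "\\N" });
$nulls->combine (@fields) or die "combine failed";
my $with = $nulls->string;

print "$without\n$with\n";
```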

       comment_str

        my $csv = Text::CSV_PP->new ({ comment_str => "#" });
                $csv->comment_str (undef);
        my $s = $csv->comment_str;

       This attribute optionally defines a string to be recognized as comment.
       If this attribute is defined, all lines starting with this sequence
       will not be parsed as CSV but skipped as comment.

       This attribute has no meaning when generating CSV.

       Comment strings that start with any of the special characters/sequences
       are not supported (so it cannot start with any of "sep_char",
       "quote_char", "escape_char", "sep", "quote", or "eol").

       For convenience, "comment" is an alias for "comment_str".
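
       As an illustration of "comment_str", the following sketch reads
       hypothetical data in which lines starting with "#" are comments. It
       uses an in-memory filehandle to stay self-contained.

```perl
use strict;
use warnings;
use Text::CSV_PP;

# Hypothetical input: two data records interleaved with comment lines.
my $data = "# exported by some tool\n1,2\n# note about the next row\n3,4\n";
open my $fh, "<", \$data or die "in-memory handle: $!";

my $csv = Text::CSV_PP->new ({
    binary      => 1,
    auto_diag   => 1,
    comment_str => "#",
});

my @rows;
while (my $row = $csv->getline ($fh)) {
    push @rows, $row;   # comment lines never show up here
}
close $fh;

print join (",", @$_), "\n" for @rows;
```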

       verbatim

        my $csv = Text::CSV_PP->new ({ verbatim => 1 });
                $csv->verbatim (0);
        my $f = $csv->verbatim;

       This is a quite controversial attribute to set, but makes some hard
       things possible.

       The rationale behind this attribute is to tell the parser that the
       normally special characters newline ("NL") and Carriage Return ("CR")
       will not be special when this flag is set, and be dealt with as being
       ordinary binary characters. This will ease working with data with
       embedded newlines.

       When "verbatim" is used with "getline", "getline" auto-"chomp"s
       every line.

       Imagine a file format like

        M^^Hans^Janssen^Klas 2\n2A^Ja^11-06-2007#\r\n

       where the line ending is a very specific "#\r\n", and the sep_char is
       a "^" (caret). None of the fields is quoted, but embedded binary
       data is likely to be present. With the specific line ending, this
       should not be too hard to detect.

       By default, Text::CSV_PP's parse function is instructed to only know
       about "\n" and "\r" to be legal line endings, and so has to deal with
       the embedded newline as a real "end-of-line", so it can scan the next
       line if binary is true, and the newline is inside a quoted field. With
       this option, we tell "parse" to parse the line as if "\n" is just
       nothing more than a binary character.

       For "parse" this means that the parser has no more idea about line
       ending and "getline" "chomp"s line endings on reading.

       types

       A set of column types; the attribute is immediately passed to the
       "types" method.

       callbacks

       See the "Callbacks" section below.

       accessors

       To sum it up,

        $csv = Text::CSV_PP->new ();

       is equivalent to

        $csv = Text::CSV_PP->new ({
            eol                   => undef, # \r, \n, or \r\n
            sep_char              => ',',
            sep                   => undef,
            quote_char            => '"',
            quote                 => undef,
            escape_char           => '"',
            binary                => 0,
            decode_utf8           => 1,
            auto_diag             => 0,
            diag_verbose          => 0,
            blank_is_undef        => 0,
            empty_is_undef        => 0,
            allow_whitespace      => 0,
            allow_loose_quotes    => 0,
            allow_loose_escapes   => 0,
            allow_unquoted_escape => 0,
            always_quote          => 0,
            quote_empty           => 0,
            quote_space           => 1,
            escape_null           => 1,
            quote_binary          => 1,
            keep_meta_info        => 0,
            strict                => 0,
            skip_empty_rows       => 0,
            formula               => 0,
            verbatim              => 0,
            undef_str             => undef,
            comment_str           => undef,
            types                 => undef,
            callbacks             => undef,
            });

       For all of the above mentioned flags, an accessor method is available
       where you can inquire the current value, or change the value:

        my $quote = $csv->quote_char;
        $csv->binary (1);

       It is not wise to change these settings halfway through writing "CSV"
       data to a stream. If however you want to create a new stream using the
       available "CSV" object, there is no harm in changing them.

       If the "new" constructor call fails, it returns "undef", and makes
       the fail reason available through the "error_diag" method.

        $csv = Text::CSV_PP->new ({ ecs_char => 1 }) or
            die "".Text::CSV_PP->error_diag ();

       "error_diag" will return a string like

        "INI - Unknown attribute 'ecs_char'"
824
825   known_attributes
826        @attr = Text::CSV_PP->known_attributes;
827        @attr = Text::CSV_PP::known_attributes;
828        @attr = $csv->known_attributes;
829
830       This method will return an ordered list of all the supported
831       attributes as described above.   This can be useful for knowing what
832       attributes are valid in classes that use or extend Text::CSV_PP.
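
       As a minimal sketch of that use, the list can be turned into a lookup
       table to vet options before they are passed to "new".  The %opt hash
       and its misspelled "ecs_char" key are hypothetical and only serve as
       illustration.

```perl
use strict;
use warnings;
use Text::CSV_PP;

# Build a lookup table of every attribute the module reports as supported.
my %known = map { $_ => 1 } Text::CSV_PP->known_attributes;

# Hypothetical user-supplied options; "ecs_char" is a deliberate typo.
my %opt = (sep_char => ";", ecs_char => "\\");

# Collect the option names that are not supported attributes.
my @unknown = sort grep { !$known{$_} } keys %opt;
print "Unknown attribute(s): @unknown\n" if @unknown;
```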
833
834   print
835        $status = $csv->print ($fh, $colref);
836
837       Similar to  "combine" + "string" + "print",  but much more efficient.
838       It expects an array ref as input  (not an array!)  and the resulting
839       string is not really  created,  but  immediately  written  to the  $fh
840       object, typically an IO handle or any other object that offers a
841       "print" method.
842
843       For performance reasons  "print"  does not create a result string,  so
844       all "string", "status", "fields", and "error_input" methods will return
845       undefined information after executing this method.
846
847       If $colref is "undef"  (explicit,  not through a variable argument) and
848       "bind_columns"  was used to specify fields to be printed,  it is
849       possible to make performance improvements, as otherwise data would have
850       to be copied as arguments to the method call:
851
852        $csv->bind_columns (\($foo, $bar));
853        $status = $csv->print ($fh, undef);
854
855       A short benchmark
856
857        my @data = ("aa" .. "zz");
858        $csv->bind_columns (\(@data));
859
860        $csv->print ($fh, [ @data ]);   # 11800 recs/sec
861        $csv->print ($fh,  \@data  );   # 57600 recs/sec
862        $csv->print ($fh,   undef  );   # 48500 recs/sec
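
       The following sketch shows "print" writing two rows.  It assumes an
       in-memory file handle (a handle opened on a scalar reference) purely
       so that the output can be inspected; any writable handle works the
       same way.

```perl
use strict;
use warnings;
use Text::CSV_PP;

my $csv = Text::CSV_PP->new ({ binary => 1, eol => "\n" });

# Write to an in-memory handle so the generated CSV ends up in $buf.
open my $fh, ">", \my $buf or die "in-memory handle: $!";

# print () takes an array reference, not a list.
$csv->print ($fh, [ "code", "name with space", 42 ]) or die "".$csv->error_diag;
$csv->print ($fh, [ 1, undef, "x,y" ])               or die "".$csv->error_diag;
close $fh or die "close: $!";

print $buf;
```

       With the default settings, the field containing a space and the field
       containing the separator are written quoted, and "undef" is written
       as an empty field.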
863
864   say
865        $status = $csv->say ($fh, $colref);
866
867       Like "print", but "eol" defaults to "$\".
868
869   print_hr
870        $csv->print_hr ($fh, $ref);
871
872       Provides an easy way  to print a  $ref  (as fetched with "getline_hr")
873       provided the column names are set with "column_names".
874
875       It is just a wrapper method with basic parameter checks over
876
877        $csv->print ($fh, [ map { $ref->{$_} } $csv->column_names ]);
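
       A short sketch of "print_hr" in use, assuming hypothetical column
       names and an in-memory file handle so the result can be shown:

```perl
use strict;
use warnings;
use Text::CSV_PP;

my $csv = Text::CSV_PP->new ({ binary => 1, eol => "\n" });

# The column names define the order in which hash values are written.
$csv->column_names (qw( code name price ));

open my $fh, ">", \my $buf or die "in-memory handle: $!";
$csv->print_hr ($fh, { name => "pencil", price => "0.25", code => 7 });
close $fh or die "close: $!";

print $buf;
```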
878
879   combine
880        $status = $csv->combine (@fields);
881
882       This method constructs a "CSV" record from  @fields,  returning success
883       or failure.   Failure can result from lack of arguments or an argument
884       that contains an invalid character.   Upon success,  "string" can be
885       called to retrieve the resultant "CSV" string.  Upon failure,  the
886       value returned by "string" is undefined and "error_input" could be
887       called to retrieve the invalid argument.
888
889   string
890        $line = $csv->string ();
891
892       This method returns the input to  "parse"  or the resultant "CSV"
893       string of "combine", whichever was called more recently.
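
       As an illustration of "combine" and "string" working together, the
       sketch below builds one record from a hypothetical field list and then
       feeds the result back through "parse" to show that it round-trips:

```perl
use strict;
use warnings;
use Text::CSV_PP;

my $csv = Text::CSV_PP->new ({ binary => 1 });

# Fields with an embedded quote, an embedded separator, and an empty string.
my @fields = ("a", 'say "hi"', "x,y", "");

$csv->combine (@fields) or die "combine failed: ".$csv->error_diag;
my $line = $csv->string;
print "$line\n";

# Round trip: parsing the combined string yields the original fields.
$csv->parse ($line) or die "parse failed: ".$csv->error_diag;
my @again = $csv->fields;
```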
894
895   getline
896        $colref = $csv->getline ($fh);
897
898       This is the counterpart to  "print",  as "parse"  is the counterpart to
899       "combine":  it parses a row from the $fh  handle using the "getline"
900       method associated with $fh  and parses this row into an array ref.
901       This array ref is returned by the function or "undef" for failure.
902       When $fh does not support "getline", you are likely to hit errors.
903
904       When fields are bound with "bind_columns" the return value is a
905       reference to an empty list.
906
       As with "print", the "string", "fields", and "status" methods do not
       return meaningful information after "getline".
908
909   getline_all
910        $arrayref = $csv->getline_all ($fh);
911        $arrayref = $csv->getline_all ($fh, $offset);
912        $arrayref = $csv->getline_all ($fh, $offset, $length);
913
914       This will return a reference to a list of getline ($fh) results.  In
915       this call, "keep_meta_info" is disabled.  If $offset is negative, as
916       with "splice", only the last  "abs ($offset)" records of $fh are taken
917       into consideration.
918
919       Given a CSV file with 10 lines:
920
921        lines call
922        ----- ---------------------------------------------------------
923        0..9  $csv->getline_all ($fh)         # all
924        0..9  $csv->getline_all ($fh,  0)     # all
925        8..9  $csv->getline_all ($fh,  8)     # start at 8
926        -     $csv->getline_all ($fh,  0,  0) # start at 0 first 0 rows
927        0..4  $csv->getline_all ($fh,  0,  5) # start at 0 first 5 rows
928        4..5  $csv->getline_all ($fh,  4,  2) # start at 4 first 2 rows
929        8..9  $csv->getline_all ($fh, -2)     # last 2 rows
930        6..7  $csv->getline_all ($fh, -4,  2) # first 2 of last  4 rows
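
       The last row of the table can be tried out with the sketch below.  The
       ten-line input is generated in memory and is hypothetical; the offset
       and length are those from the table.

```perl
use strict;
use warnings;
use Text::CSV_PP;

# Ten records, numbered 0 .. 9, held in memory instead of in a file.
my $data = join "", map { "row$_,$_\n" } 0 .. 9;
open my $fh, "<", \$data or die "in-memory handle: $!";

my $csv = Text::CSV_PP->new ({ binary => 1 });

# Of the last 4 records, take the first 2.
my $rows = $csv->getline_all ($fh, -4, 2);
close $fh;

print join (",", @$_), "\n" for @$rows;
```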
931
932   getline_hr
933       The "getline_hr" and "column_names" methods work together  to allow you
934       to have rows returned as hashrefs.  You must call "column_names" first
935       to declare your column names.
936
937        $csv->column_names (qw( code name price description ));
938        $hr = $csv->getline_hr ($fh);
939        print "Price for $hr->{name} is $hr->{price} EUR\n";
940
941       "getline_hr" will croak if called before "column_names".
942
       Note that  "getline_hr"  creates a hashref for every row and will be
       much slower than the combined use of "bind_columns"  and "getline",
       which still offers the same easy-to-use hashref inside the loop.  A
       loop like

        my @cols = @{$csv->getline ($fh)};
        $csv->column_names (@cols);
        while (my $row = $csv->getline_hr ($fh)) {
            print $row->{price};
            }

       could easily be rewritten to the much faster:
954
955        my @cols = @{$csv->getline ($fh)};
956        my $row = {};
957        $csv->bind_columns (\@{$row}{@cols});
958        while ($csv->getline ($fh)) {
959            print $row->{price};
960            }
961
962       Your mileage may vary for the size of the data and the number of rows.
963       With perl-5.14.2 the comparison for a 100_000 line file with 14
964       columns:
965
966                   Rate hashrefs getlines
967        hashrefs 1.00/s       --     -76%
968        getlines 4.15/s     313%       --
969
970   getline_hr_all
971        $arrayref = $csv->getline_hr_all ($fh);
972        $arrayref = $csv->getline_hr_all ($fh, $offset);
973        $arrayref = $csv->getline_hr_all ($fh, $offset, $length);
974
975       This will return a reference to a list of   getline_hr ($fh) results.
976       In this call, "keep_meta_info" is disabled.
977
978   parse
979        $status = $csv->parse ($line);
980
       This method decomposes a  "CSV"  string into fields,  returning success
       or failure.   Failure can result from a missing argument  or from an
       improperly formatted "CSV" string.   Upon success, "fields" can be
       called to retrieve the decomposed fields.  Upon failure, calling
       "fields" will return undefined data and  "error_input"  can be called
       to retrieve the invalid argument.
987
       You may use the "types"  method for setting column types.  See the
       description of "types" below.
990
991       The $line argument is supposed to be a simple scalar. Everything else
992       is supposed to croak and set error 1500.
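
       A minimal sketch of the usual "parse" / "fields" pairing, using a
       hypothetical input line with a quoted field that contains the
       separator:

```perl
use strict;
use warnings;
use Text::CSV_PP;

my $csv  = Text::CSV_PP->new ({ binary => 1 });
my $line = '1,"two, three",4';

my @fields;
if ($csv->parse ($line)) {
    # On success the decomposed fields are available through fields ().
    @fields = $csv->fields;
    print scalar (@fields), " fields, second is '$fields[1]'\n";
    }
else {
    # On failure error_input () holds the argument that could not be parsed.
    warn "Could not parse: ", $csv->error_input, "\n";
    }
```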
993
994   fragment
995       This function tries to implement RFC7111  (URI Fragment Identifiers for
996       the text/csv Media Type) -
997       https://datatracker.ietf.org/doc/html/rfc7111
998
999        my $AoA = $csv->fragment ($fh, $spec);
1000
1001       In specifications,  "*" is used to specify the last item, a dash ("-")
1002       to indicate a range.   All indices are 1-based:  the first row or
1003       column has index 1. Selections can be combined with the semi-colon
1004       (";").
1005
1006       When using this method in combination with  "column_names",  the
1007       returned reference  will point to a  list of hashes  instead of a  list
1008       of lists.  A disjointed  cell-based combined selection  might return
1009       rows with different number of columns making the use of hashes
1010       unpredictable.
1011
1012        $csv->column_names ("Name", "Age");
1013        my $AoH = $csv->fragment ($fh, "col=3;8");
1014
1015       If the "after_parse" callback is active,  it is also called on every
1016       line parsed and skipped before the fragment.
1017
1018       row
1019          row=4
1020          row=5-7
1021          row=6-*
1022          row=1-2;4;6-*
1023
1024       col
1025          col=2
1026          col=1-3
1027          col=4-*
1028          col=1-2;4;7-*
1029
1030       cell
1031         In cell-based selection, the comma (",") is used to pair row and
1032         column
1033
1034          cell=4,1
1035
1036         The range operator ("-") using "cell"s can be used to define top-left
1037         and bottom-right "cell" location
1038
1039          cell=3,1-4,6
1040
1041         The "*" is only allowed in the second part of a pair
1042
1043          cell=3,2-*,2    # row 3 till end, only column 2
1044          cell=3,2-3,*    # column 2 till end, only row 3
1045          cell=3,2-*,*    # strip row 1 and 2, and column 1
1046
1047         Cells and cell ranges may be combined with ";", possibly resulting in
1048         rows with different numbers of columns
1049
1050          cell=1,1-2,2;3,3-4,4;1,4;4,1
1051
1052         Disjointed selections will only return selected cells.   The cells
1053         that are not  specified  will  not  be  included  in the  returned
1054         set,  not even as "undef".  As an example given a "CSV" like
1055
1056          11,12,13,...19
1057          21,22,...28,29
1058          :            :
1059          91,...97,98,99
1060
1061         with "cell=1,1-2,2;3,3-4,4;1,4;4,1" will return:
1062
1063          11,12,14
1064          21,22
1065          33,34
1066          41,43,44
1067
         Overlapping cell-specs will return those cells only once, so
         "cell=1,1-3,3;2,2-4,4;2,3;4,2" will return:
1070
1071          11,12,13
1072          21,22,23,24
1073          31,32,33,34
1074          42,43,44
1075
1076       RFC7111 <https://datatracker.ietf.org/doc/html/rfc7111> does  not
1077       allow different types of specs to be combined   (either "row" or "col"
1078       or "cell").  Passing an invalid fragment specification will croak and
1079       set error 2013.
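
       As a small sketch of a cell-based selection, the code below reads a
       hypothetical 4 x 3 table from an in-memory handle and selects the
       block from row 2, column 2 to row 3, column 3:

```perl
use strict;
use warnings;
use Text::CSV_PP;

# Each value encodes its own position: row number followed by column number.
my $data = "11,12,13\n21,22,23\n31,32,33\n41,42,43\n";
open my $fh, "<", \$data or die "in-memory handle: $!";

my $csv = Text::CSV_PP->new ({ binary => 1 });

# Indices are 1-based; the range runs from top-left to bottom-right.
my $aoa = $csv->fragment ($fh, "cell=2,2-3,3");
close $fh;

print join (",", @$_), "\n" for @$aoa;
```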
1080
1081   column_names
1082       Set the "keys" that will be used in the  "getline_hr"  calls.  If no
1083       keys (column names) are passed, it will return the current setting as a
1084       list.
1085
1086       "column_names" accepts a list of scalars  (the column names)  or a
1087       single array_ref, so you can pass the return value from "getline" too:
1088
1089        $csv->column_names ($csv->getline ($fh));
1090
1091       "column_names" does no checking on duplicates at all, which might lead
1092       to unexpected results.   Undefined entries will be replaced with the
1093       string "\cAUNDEF\cA", so
1094
1095        $csv->column_names (undef, "", "name", "name");
1096        $hr = $csv->getline_hr ($fh);
1097
1098       will set "$hr->{"\cAUNDEF\cA"}" to the 1st field,  "$hr->{""}" to the
1099       2nd field, and "$hr->{name}" to the 4th field,  discarding the 3rd
1100       field.
1101
1102       "column_names" croaks on invalid arguments.
1103
1104   header
1105       This method does NOT work in perl-5.6.x
1106
1107       Parse the CSV header and set "sep", column_names and encoding.
1108
1109        my @hdr = $csv->header ($fh);
1110        $csv->header ($fh, { sep_set => [ ";", ",", "|", "\t" ] });
1111        $csv->header ($fh, { detect_bom => 1, munge_column_names => "lc" });
1112
1113       The first argument should be a file handle.
1114
1115       This method resets some object properties,  as it is supposed to be
1116       invoked only once per file or stream.  It will leave attributes
1117       "column_names" and "bound_columns" alone if setting column names is
       disabled. Reading headers on previously processed objects might fail on
       perl-5.8.0 and older.
1120
1121       Assuming that the file opened for parsing has a header, and the header
1122       does not contain problematic characters like embedded newlines,   read
1123       the first line from the open handle then auto-detect whether the header
1124       separates the column names with a character from the allowed separator
1125       list.
1126
1127       If any of the allowed separators matches,  and none of the other
1128       allowed separators match,  set  "sep"  to that  separator  for the
1129       current CSV_PP instance and use it to parse the first line, map those
1130       to lowercase, and use that to set the instance "column_names":
1131
1132        my $csv = Text::CSV_PP->new ({ binary => 1, auto_diag => 1 });
1133        open my $fh, "<", "file.csv";
1134        binmode $fh; # for Windows
1135        $csv->header ($fh);
1136        while (my $row = $csv->getline_hr ($fh)) {
1137            ...
1138            }
1139
1140       If the header is empty,  contains more than one unique separator out of
1141       the allowed set,  contains empty fields,   or contains identical fields
1142       (after folding), it will croak with error 1010, 1011, 1012, or 1013
1143       respectively.
1144
1145       If the header contains embedded newlines or is not valid  CSV  in any
1146       other way, this method will croak and leave the parse error untouched.
1147
1148       A successful call to "header"  will always set the  "sep"  of the $csv
1149       object. This behavior can not be disabled.
1150
1151       return value
1152
1153       On error this method will croak.
1154
1155       In list context,  the headers will be returned whether they are used to
1156       set "column_names" or not.
1157
1158       In scalar context, the instance itself is returned.  Note: the values
1159       as found in the header will effectively be  lost if  "set_column_names"
1160       is false.
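
       The sketch below shows the list-context return value together with
       the separator detection.  The semicolon-separated input is
       hypothetical and is read from an in-memory handle; the default
       options are assumed, so the column names are folded to lower case.

```perl
use strict;
use warnings;
use Text::CSV_PP;

my $data = "Code;Product Name\n1;pencil\n2;eraser\n";
open my $fh, "<", \$data or die "in-memory handle: $!";

my $csv = Text::CSV_PP->new ({ binary => 1, auto_diag => 1 });

# List context: the (munged) header fields are returned.
my @hdr = $csv->header ($fh);
print "Columns: ", join (" | ", @hdr), "\n";

# The column names are now set, so getline_hr () can be used directly.
my @names;
while (my $row = $csv->getline_hr ($fh)) {
    push @names, $row->{"product name"};
    }
close $fh;
```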
1161
1162       Options
1163
1164       sep_set
1165          $csv->header ($fh, { sep_set => [ ";", ",", "|", "\t" ] });
1166
1167         The list of legal separators defaults to "[ ";", "," ]" and can be
1168         changed by this option.  As this is probably the most often used
1169         option,  it can be passed on its own as an unnamed argument:
1170
1171          $csv->header ($fh, [ ";", ",", "|", "\t", "::", "\x{2063}" ]);
1172
1173         Multi-byte  sequences are allowed,  both multi-character and
1174         Unicode.  See "sep".
1175
1176       detect_bom
1177          $csv->header ($fh, { detect_bom => 1 });
1178
1179         The default behavior is to detect if the header line starts with a
1180         BOM.  If the header has a BOM, use that to set the encoding of $fh.
1181         This default behavior can be disabled by passing a false value to
1182         "detect_bom".
1183
1184         Supported encodings from BOM are: UTF-8, UTF-16BE, UTF-16LE,
1185         UTF-32BE,  and UTF-32LE. BOM also supports UTF-1, UTF-EBCDIC, SCSU,
1186         BOCU-1,  and GB-18030 but Encode does not (yet). UTF-7 is not
1187         supported.
1188
1189         If a supported BOM was detected as start of the stream, it is stored
1190         in the object attribute "ENCODING".
1191
1192          my $enc = $csv->{ENCODING};
1193
1194         The encoding is used with "binmode" on $fh.
1195
1196         If the handle was opened in a (correct) encoding,  this method will
1197         not alter the encoding, as it checks the leading bytes of the first
1198         line. In case the stream starts with a decoded BOM ("U+FEFF"),
1199         "{ENCODING}" will be "" (empty) instead of the default "undef".
1200
1201       munge_column_names
1202         This option offers the means to modify the column names into
1203         something that is most useful to the application.   The default is to
1204         map all column names to lower case.
1205
1206          $csv->header ($fh, { munge_column_names => "lc" });
1207
1208         The following values are available:
1209
1210           lc     - lower case
1211           uc     - upper case
1212           db     - valid DB field names
1213           none   - do not change
1214           \%hash - supply a mapping
1215           \&cb   - supply a callback
1216
1217         Lower case
1218            $csv->header ($fh, { munge_column_names => "lc" });
1219
1220           The header is changed to all lower-case
1221
1222            $_ = lc;
1223
1224         Upper case
1225            $csv->header ($fh, { munge_column_names => "uc" });
1226
1227           The header is changed to all upper-case
1228
1229            $_ = uc;
1230
1231         Literal
1232            $csv->header ($fh, { munge_column_names => "none" });
1233
1234         Hash
            $csv->header ($fh, { munge_column_names => { foo => "sombrero" }});

           If the hash has no entry for a column name, the original name is
           used unchanged.
1238
1239         Database
1240            $csv->header ($fh, { munge_column_names => "db" });
1241
1242           - lower-case
1243
1244           - all sequences of non-word characters are replaced with an
1245             underscore
1246
1247           - all leading underscores are removed
1248
1249            $_ = lc (s/\W+/_/gr =~ s/^_+//r);
1250
1251         Callback
1252            $csv->header ($fh, { munge_column_names => sub { fc } });
1253            $csv->header ($fh, { munge_column_names => sub { "column_".$col++ } });
1254            $csv->header ($fh, { munge_column_names => sub { lc (s/\W+/_/gr) } });
1255
1256           As this callback is called in a "map", you can use $_ directly.
1257
1258       set_column_names
1259          $csv->header ($fh, { set_column_names => 1 });
1260
         The default is to set the instance's column names using
         "column_names" if the method is successful,  so subsequent calls to
         "getline_hr" can return a hash. Setting the column names can be
         disabled by passing a false value for this option.
1265
1266         As described in "return value" above, content is lost in scalar
1267         context.
1268
1269       Validation
1270
1271       When receiving CSV files from external sources,  this method can be
1272       used to protect against changes in the layout by restricting to known
1273       headers  (and typos in the header fields).
1274
1275        my %known = (
1276            "record key" => "c_rec",
1277            "rec id"     => "c_rec",
1278            "id_rec"     => "c_rec",
1279            "kode"       => "code",
1280            "code"       => "code",
1281            "vaule"      => "value",
1282            "value"      => "value",
1283            );
1284        my $csv = Text::CSV_PP->new ({ binary => 1, auto_diag => 1 });
1285        open my $fh, "<", $source or die "$source: $!";
1286        $csv->header ($fh, { munge_column_names => sub {
1287            s/\s+$//;
1288            s/^\s+//;
1289            $known{lc $_} or die "Unknown column '$_' in $source";
1290            }});
1291        while (my $row = $csv->getline_hr ($fh)) {
1292            say join "\t", $row->{c_rec}, $row->{code}, $row->{value};
1293            }
1294
1295   bind_columns
       Takes a list of scalar references to be used for output with  "print"
       or to store the fields fetched by "getline" in.  When you do not pass
1298       enough references to store the fetched fields in, "getline" will fail
1299       with error 3006.  If you pass more than there are fields to return,
1300       the content of the remaining references is left untouched.
1301
1302        $csv->bind_columns (\$code, \$name, \$price, \$description);
1303        while ($csv->getline ($fh)) {
1304            print "The price of a $name is \x{20ac} $price\n";
1305            }
1306
1307       To reset or clear all column binding, call "bind_columns" with the
1308       single argument "undef". This will also clear column names.
1309
1310        $csv->bind_columns (undef);
1311
1312       If no arguments are passed at all, "bind_columns" will return the list
1313       of current bindings or "undef" if no binds are active.
1314
1315       Note that in parsing with  "bind_columns",  the fields are set on the
1316       fly.  That implies that if the third field of a row causes an error
1317       (or this row has just two fields where the previous row had more),  the
1318       first two fields already have been assigned the values of the current
1319       row, while the rest of the fields will still hold the values of the
1320       previous row.  If you want the parser to fail in these cases, use the
1321       "strict" attribute.
1322
1323   eof
1324        $eof = $csv->eof ();
1325
1326       If "parse" or  "getline"  was used with an IO stream,  this method will
1327       return true (1) if the last call hit end of file,  otherwise it will
1328       return false ('').  This is useful to see the difference between a
1329       failure and end of file.
1330
1331       Note that if the parsing of the last line caused an error,  "eof" is
1332       still true.  That means that if you are not using "auto_diag", an idiom
1333       like
1334
1335        while (my $row = $csv->getline ($fh)) {
1336            # ...
1337            }
1338        $csv->eof or $csv->error_diag;
1339
1340       will not report the error. You would have to change that to
1341
1342        while (my $row = $csv->getline ($fh)) {
1343            # ...
1344            }
1345        +$csv->error_diag and $csv->error_diag;
1346
1347   types
1348        $csv->types (\@tref);
1349
       This method is used to force  (all)  columns to be of a given type.
1351       For example, if you have an integer column,  two  columns  with
1352       doubles  and a string column, then you might do a
1353
1354        $csv->types ([Text::CSV_PP::IV (),
1355                      Text::CSV_PP::NV (),
1356                      Text::CSV_PP::NV (),
1357                      Text::CSV_PP::PV ()]);
1358
1359       Column types are used only for decoding columns while parsing,  in
1360       other words by the "parse" and "getline" methods.
1361
1362       You can unset column types by doing a
1363
1364        $csv->types (undef);
1365
1366       or fetch the current type settings with
1367
1368        $types = $csv->types ();
1369
1370       IV
1371       CSV_TYPE_IV
1372           Set field type to integer.
1373
1374       NV
1375       CSV_TYPE_NV
1376           Set field type to numeric/float.
1377
1378       PV
1379       CSV_TYPE_PV
1380           Set field type to string.
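
       A brief sketch of column types applied during "parse", assuming a
       hypothetical three-column record (integer, float, string):

```perl
use strict;
use warnings;
use Text::CSV_PP;

my $csv = Text::CSV_PP->new ({ binary => 1 });

# One type per column, in column order.
$csv->types ([ Text::CSV_PP::IV (),
               Text::CSV_PP::NV (),
               Text::CSV_PP::PV () ]);

$csv->parse ("42,3.50,0042") or die "".$csv->error_diag;
my ($int, $num, $str) = $csv->fields;

# The string column keeps its leading zeroes.
print "int=$int num=$num str=$str\n";
```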
1381
1382   fields
1383        @columns = $csv->fields ();
1384
1385       This method returns the input to   "combine"  or the resultant
1386       decomposed fields of a successful "parse", whichever was called more
1387       recently.
1388
1389       Note that the return value is undefined after using "getline", which
1390       does not fill the data structures returned by "parse".
1391
1392   meta_info
1393        @flags = $csv->meta_info ();
1394
1395       This method returns the "flags" of the input to "combine" or the flags
1396       of the resultant  decomposed fields of  "parse",   whichever was called
1397       more recently.
1398
       For each field,  a meta_info field holds flags that describe the
       field returned by the  "fields"  method or passed to the "combine"
       method. The flags are bit-wise-"or"'d like:
1402
1403       0x0001
1404       "CSV_FLAGS_IS_QUOTED"
1405         The field was quoted.
1406
1407       0x0002
1408       "CSV_FLAGS_IS_BINARY"
1409         The field was binary.
1410
1411       0x0004
1412       "CSV_FLAGS_ERROR_IN_FIELD"
1413         The field was invalid.
1414
1415         Currently only used when "allow_loose_quotes" is active.
1416
1417       0x0010
1418       "CSV_FLAGS_IS_MISSING"
1419         The field was missing.
1420
1421       See the "is_***" methods below.
1422
1423   is_quoted
1424        my $quoted = $csv->is_quoted ($column_idx);
1425
1426       where  $column_idx is the  (zero-based)  index of the column in the
1427       last result of "parse".
1428
1429       This returns a true value  if the data in the indicated column was
1430       enclosed in "quote_char" quotes.  This might be important for fields
1431       where content ",20070108," is to be treated as a numeric value,  and
1432       where ","20070108"," is explicitly marked as character string data.
1433
1434       This method is only valid when "keep_meta_info" is set to a true value.
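
       The difference between the two fields mentioned above can be checked
       with a sketch like this one, which assumes "keep_meta_info" is enabled
       at construction time:

```perl
use strict;
use warnings;
use Text::CSV_PP;

my $csv = Text::CSV_PP->new ({ binary => 1, keep_meta_info => 1 });

# Same content twice: once bare, once enclosed in quotes.
$csv->parse ('20070108,"20070108"') or die "".$csv->error_diag;

my @fields = $csv->fields;
for my $idx (0 .. $#fields) {
    printf "%d: %s is %s\n", $idx, $fields[$idx],
        $csv->is_quoted ($idx) ? "quoted" : "not quoted";
    }
```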
1435
1436   is_binary
1437        my $binary = $csv->is_binary ($column_idx);
1438
1439       where  $column_idx is the  (zero-based)  index of the column in the
1440       last result of "parse".
1441
1442       This returns a true value if the data in the indicated column contained
1443       any byte in the range "[\x00-\x08,\x10-\x1F,\x7F-\xFF]".
1444
1445       This method is only valid when "keep_meta_info" is set to a true value.
1446
1447   is_missing
1448        my $missing = $csv->is_missing ($column_idx);
1449
1450       where  $column_idx is the  (zero-based)  index of the column in the
1451       last result of "getline_hr".
1452
1453        $csv->keep_meta_info (1);
1454        while (my $hr = $csv->getline_hr ($fh)) {
1455            $csv->is_missing (0) and next; # This was an empty line
1456            }
1457
1458       When using  "getline_hr",  it is impossible to tell if the  parsed
       fields are "undef" because they were not filled in the "CSV" stream
1460       or because they were not read at all, as all the fields defined by
1461       "column_names" are set in the hash-ref.    If you still need to know if
1462       all fields in each row are provided, you should enable "keep_meta_info"
1463       so you can check the flags.
1464
1465       If  "keep_meta_info"  is "false",  "is_missing"  will always return
1466       "undef", regardless of $column_idx being valid or not. If this
1467       attribute is "true" it will return either 0 (the field is present) or 1
1468       (the field is missing).
1469
1470       A special case is the empty line.  If the line is completely empty -
1471       after dealing with the flags - this is still a valid CSV line:  it is a
1472       record of just one single empty field. However, if "keep_meta_info" is
1473       set, invoking "is_missing" with index 0 will now return true.
1474
1475   status
1476        $status = $csv->status ();
1477
1478       This method returns the status of the last invoked "combine" or "parse"
1479       call. Status is success (true: 1) or failure (false: "undef" or 0).
1480
1481       Note that as this only keeps track of the status of above mentioned
1482       methods, you are probably looking for "error_diag" instead.
1483
1484   error_input
1485        $bad_argument = $csv->error_input ();
1486
1487       This method returns the erroneous argument (if it exists) of "combine"
1488       or "parse",  whichever was called more recently.  If the last
1489       invocation was successful, "error_input" will return "undef".
1490
1491       Depending on the type of error, it might also hold the data for the
1492       last error-input of "getline".
1493
1494   error_diag
1495        Text::CSV_PP->error_diag ();
1496        $csv->error_diag ();
1497        $error_code               = 0  + $csv->error_diag ();
1498        $error_str                = "" . $csv->error_diag ();
1499        ($cde, $str, $pos, $rec, $fld) = $csv->error_diag ();
1500
1501       If (and only if) an error occurred,  this function returns  the
1502       diagnostics of that error.
1503
1504       If called in void context,  this will print the internal error code and
1505       the associated error message to STDERR.
1506
1507       If called in list context,  this will return  the error code  and the
1508       error message in that order.  If the last error was from parsing, the
1509       rest of the values returned are a best guess at the location  within
1510       the line  that was being parsed. Their values are 1-based.  The
1511       position currently is index of the byte at which the parsing failed in
1512       the current record. It might change to be the index of the current
       character in a later release. The record is the index of the record
1514       parsed by the csv instance. The field number is the index of the field
1515       the parser thinks it is currently  trying to  parse. See
1516       examples/csv-check for how this can be used.
1517
1518       If called in  scalar context,  it will return  the diagnostics  in a
1519       single scalar, a-la $!.  It will contain the error code in numeric
1520       context, and the diagnostics message in string context.
1521
1522       When called as a class method or a  direct function call,  the
1523       diagnostics are that of the last "new" call.
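
       The list and scalar contexts can be compared with the sketch below.
       The input is a hypothetical line with an unterminated quoted field;
       the exact error code and message are not assumed here, only that the
       parse fails.

```perl
use strict;
use warnings;
use Text::CSV_PP;

my $csv = Text::CSV_PP->new ();   # auto_diag is off by default

# An unterminated quoted field makes the parse fail.
my $ok = $csv->parse ('1,"unterminated');

# List context: code, message, and the best-guess location.
my ($cde, $str, $pos, $rec, $fld) = $csv->error_diag;

# Scalar context: one value that is numeric and string at the same time.
my $diag = $csv->error_diag;

printf "error %d: %s\n", 0 + $diag, "" . $diag unless $ok;
```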
1524
1525   record_number
1526        $recno = $csv->record_number ();
1527
       Returns the number of records parsed by this csv instance.  This value
       should be more accurate than $. when embedded newlines come into play.
       Records written by this instance are not counted.
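
       As a sketch, the hypothetical input below holds two records spread
       over three physical lines, because the first record has a newline
       embedded in a quoted field.  The record number is collected after
       each successful "getline".

```perl
use strict;
use warnings;
use Text::CSV_PP;

# Two records on three physical lines.
my $data = qq{1,"first line\nsecond line"\n2,plain\n};
open my $fh, "<", \$data or die "in-memory handle: $!";

my $csv = Text::CSV_PP->new ({ binary => 1 });

my @seen;
while (my $row = $csv->getline ($fh)) {
    # Counts records, not physical lines.
    push @seen, $csv->record_number;
    }
close $fh;

print "record numbers seen: @seen\n";
```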
1531
1532   SetDiag
1533        $csv->SetDiag (0);
1534
1535       Use to reset the diagnostics if you are dealing with errors.
1536

FUNCTIONS

1538       This section is also taken from Text::CSV_XS.
1539
1540   csv
1541       This function is not exported by default and should be explicitly
1542       requested:
1543
1544        use Text::CSV_PP qw( csv );
1545
1546       This is a high-level function that aims at simple (user) interfaces.
1547       This can be used to read/parse a "CSV" file or stream (the default
1548       behavior) or to produce a file or write to a stream (define the  "out"
1549       attribute).  It returns an array- or hash-reference on parsing (or
1550       "undef" on fail) or the numeric value of  "error_diag"  on writing.
1551       When this function fails you can get to the error using the class call
1552       to "error_diag"
1553
1554        my $aoa = csv (in => "test.csv") or
1555            die Text::CSV_PP->error_diag;
1556
1557       This function takes the arguments as key-value pairs. This can be
1558       passed as a list or as an anonymous hash:
1559
1560        my $aoa = csv (  in => "test.csv", sep_char => ";");
1561        my $aoh = csv ({ in => $fh, headers => "auto" });
1562
1563       The arguments passed consist of two parts:  the arguments to "csv"
1564       itself and the optional attributes to the  "CSV"  object used inside
1565       the function as enumerated and explained in "new".
1566
1567       If not overridden, the default option used for CSV is
1568
1569        auto_diag   => 1
1570        escape_null => 0
1571
1572       The option that is always set and cannot be altered is
1573
1574        binary      => 1
1575
1576       As this function will likely be used in one-liners,  it allows  "quote"
1577       to be abbreviated as "quo",  and  "escape_char" to be abbreviated as
1578       "esc" or "escape".
1579
1580       Alternative invocations:
1581
1582        my $aoa = Text::CSV_PP::csv (in => "file.csv");
1583
1584        my $csv = Text::CSV_PP->new ();
1585        my $aoa = $csv->csv (in => "file.csv");
1586
1587       In the latter case, the object attributes are used from the existing
1588       object and the attribute arguments in the function call are ignored:
1589
1590        my $csv = Text::CSV_PP->new ({ sep_char => ";" });
1591        my $aoh = $csv->csv (in => "file.csv", bom => 1);
1592
1593       will parse using ";" as "sep_char", not ",".
1594
1595       in
1596
1597       Used to specify the source.  "in" can be a file name (e.g. "file.csv"),
1598       which will be  opened for reading  and closed when finished,  a file
1599       handle (e.g.  $fh or "FH"),  a reference to a glob (e.g. "\*ARGV"),
1600       the glob itself (e.g. *STDIN), or a reference to a scalar (e.g.
1601       "\q{1,2,"csv"}").
1602
1603       When used with "out", "in" should be a reference to a CSV structure
1604       (AoA or AoH)  or a CODE-ref that returns an array-reference or a hash-
1605       reference.  The code-ref will be invoked with no arguments.
1606
1607        my $aoa = csv (in => "file.csv");
1608
1609        open my $fh, "<", "file.csv";
1610        my $aoa = csv (in => $fh);
1611
1612        my $csv = [ [qw( Foo Bar )], [ 1, 2 ], [ 2, 3 ]];
1613        my $err = csv (in => $csv, out => "file.csv");
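
       The code-ref form can be sketched as follows.  The @queue array is a
       hypothetical stand-in for any record source (a database cursor, for
       example), and the output is collected in a scalar instead of a file:

```perl
use strict;
use warnings;
use Text::CSV_PP qw( csv );

# Hypothetical record source: each call hands out the next row and
# returns undef (false) once the queue is exhausted
my @queue = ([qw( code name )], [ 1, "pc" ], [ 2, "mouse" ]);

# "in" is a code-ref, "out" a reference to a scalar that receives the CSV
csv (in => sub { shift @queue }, out => \my $buf);

print $buf;
```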
1614
1615       If called in void context without the "out" attribute, the resulting
1616       ref will be used as input to a subsequent call to csv:
1617
1618        csv (in => "file.csv", filter => { 2 => sub { length > 2 }})
1619
1620       will be a shortcut to
1621
1622        csv (in => csv (in => "file.csv", filter => { 2 => sub { length > 2 }}))
1623
1624       where, in the absence of the "out" attribute, this is a shortcut to
1625
1626        csv (in  => csv (in => "file.csv", filter => { 2 => sub { length > 2 }}),
1627             out => *STDOUT)
1628
1629       out
1630
1631        csv (in => $aoa, out => "file.csv");
1632        csv (in => $aoa, out => $fh);
1633        csv (in => $aoa, out =>   STDOUT);
1634        csv (in => $aoa, out =>  *STDOUT);
1635        csv (in => $aoa, out => \*STDOUT);
1636        csv (in => $aoa, out => \my $data);
1637        csv (in => $aoa, out =>  undef);
1638        csv (in => $aoa, out => \"skip");
1639
1640        csv (in => $fh,  out => \@aoa);
1641        csv (in => $fh,  out => \@aoh, bom => 1);
1642        csv (in => $fh,  out => \%hsh, key => "key");
1643
1644       In output mode, the default CSV options when producing CSV are
1645
1646        eol       => "\r\n"
1647
1648       The "fragment" attribute is ignored in output mode.
1649
1650       "out" can be a file name  (e.g.  "file.csv"),  which will be opened for
1651       writing and closed when finished,  a file handle (e.g. $fh or "FH"),  a
1652       reference to a glob (e.g. "\*STDOUT"),  the glob itself (e.g. *STDOUT),
1653       or a reference to a scalar (e.g. "\my $data").
1654
1655        csv (in => sub { $sth->fetch },            out => "dump.csv");
1656        csv (in => sub { $sth->fetchrow_hashref }, out => "dump.csv",
1657             headers => $sth->{NAME_lc});
1658
1659       When a code-ref is used for "in", the output is generated  per
1660       invocation, so no buffering is involved. This implies that there is no
1661       size restriction on the number of records. The "csv" function ends when
1662       the coderef returns a false value.
1663
1664       If "out" is set to a reference of the literal string "skip", the output
1665       will be suppressed completely,  which might be useful in combination
1666       with a filter for side effects only.
1667
1668        my %cache;
1669        csv (in    => "dump.csv",
1670             out   => \"skip",
1671             on_in => sub { $cache{$_[1][1]}++ });
1672
1673       Currently,  setting "out" to any false value  ("undef", "", 0) will be
1674       equivalent to "\"skip"".
1675
1676       If the "in" argument points to something to parse, and "out" is set
1677       to a reference to an "ARRAY" or a "HASH", the output is appended to the
1678       data in the existing reference. The result of the parse should match
1679       what exists in the reference passed. This might come in handy when you
1680       have to parse a set of files with similar content (like data stored per
1681       period) and you want to collect that into a single data structure:
1682
1683        my %hash;
1684        csv (in => $_, out => \%hash, key => "id") for sort glob "foo-[0-9]*.csv";
1685
1686        my @list; # List of arrays
1687        csv (in => $_, out => \@list)              for sort glob "foo-[0-9]*.csv";
1688
1689        my @list; # List of hashes
1690        csv (in => $_, out => \@list, bom => 1)    for sort glob "foo-[0-9]*.csv";
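
       A self-contained sketch of the same idea,  assuming two in-memory
       sources in place of the per-period files (the data and the name
       %stock are made up for illustration):

```perl
use strict;
use warnings;
use Text::CSV_PP qw( csv );

# Two illustrative sources that share the same layout
my @sources = (
    \"id,qty\n1,10\n2,20\n",
    \"id,qty\n3,30\n",
    );

# Every call adds its records to the hash that already exists
my %stock;
csv (in => $_, out => \%stock, key => "id") for @sources;

printf "%s => %s\n", $_, $stock{$_}{qty} for sort keys %stock;
```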
1691
1692       encoding
1693
1694       If passed,  it should be an encoding accepted by the  :encoding()
1695       option to "open". There is no default value. This attribute does not
1696       work in perl 5.6.x.  "encoding" can be abbreviated to "enc" for ease of
1697       use in command line invocations.
1698
1699       If "encoding" is set to the literal value "auto", the method "header"
1700       will be invoked on the opened stream to check if there is a BOM and set
1701       the encoding accordingly.   This is equal to passing a true value in
1702       the option "detect_bom".
1703
1704       Encodings can be stacked, as supported by "binmode":
1705
1706        # Using PerlIO::via::gzip
1707        csv (in       => \@csv,
1708             out      => "test.csv:via.gz",
1709             encoding => ":via(gzip):encoding(utf-8)",
1710             );
1711        $aoa = csv (in => "test.csv:via.gz",  encoding => ":via(gzip)");
1712
1713        # Using PerlIO::gzip
1714        csv (in       => \@csv,
1715             out      => "test.csv:gzip.gz",
1716             encoding => ":gzip:encoding(utf-8)",
1717             );
1718        $aoa = csv (in => "test.csv:gzip.gz", encoding => ":gzip");
1719
1720       detect_bom
1721
1722       If  "detect_bom"  is given, the method  "header"  will be invoked on
1723       the opened stream to check if there is a BOM and set the encoding
1724       accordingly.
1725
1726       "detect_bom" can be abbreviated to "bom".
1727
1728       This is the same as setting "encoding" to "auto".
1729
1730       Note that as the method  "header" is invoked,  its default is to also
1731       set the headers.
1732
1733       headers
1734
1735       If this attribute is not given, the default behavior is to produce an
1736       array of arrays.
1737
1738       If "headers" is supplied,  it should be an anonymous list of column
1739       names, an anonymous hashref, a coderef, or a literal flag:  "auto",
1740       "lc", "uc", or "skip".
1741
1742       skip
1743         When "skip" is used, the header will not be included in the output.
1744
1745          my $aoa = csv (in => $fh, headers => "skip");
1746
1747       auto
1748         If "auto" is used, the first line of the "CSV" source will be read as
1749         the list of field headers and used to produce an array of hashes.
1750
1751          my $aoh = csv (in => $fh, headers => "auto");
1752
1753       lc
1754         If "lc" is used,  the first line of the  "CSV" source will be read as
1755         the list of field headers mapped to  lower case and used to produce
1756         an array of hashes. This is a variation of "auto".
1757
1758          my $aoh = csv (in => $fh, headers => "lc");
1759
1760       uc
1761         If "uc" is used,  the first line of the  "CSV" source will be read as
1762         the list of field headers mapped to  upper case and used to produce
1763         an array of hashes. This is a variation of "auto".
1764
1765          my $aoh = csv (in => $fh, headers => "uc");
1766
1767       CODE
1768         If a coderef is used,  the first line of the  "CSV" source will be
1769         read as the list of mangled field headers in which each field is
1770         passed as the only argument to the coderef. This list is used to
1771         produce an array of hashes.
1772
1773          my $aoh = csv (in      => $fh,
1774                         headers => sub { lc ($_[0]) =~ s/kode/code/gr });
1775
1776         this example is a variation of using "lc" where all occurrences of
1777         "kode" are replaced with "code".
1778
1779       ARRAY
1780         If  "headers"  is an anonymous list,  the entries in the list will be
1781         used as field names. The first line is considered data instead of
1782         headers.
1783
1784          my $aoh = csv (in => $fh, headers => [qw( Foo Bar )]);
1785          csv (in => $aoa, out => $fh, headers => [qw( code description price )]);
1786
1787       HASH
1788         If "headers" is a hash reference, this implies "auto", but header
1789         fields that exist as key in the hashref will be replaced by the value
1790         for that key. Given a CSV file like
1791
1792          post-kode,city,name,id number,fubble
1793          1234AA,Duckstad,Donald,13,"X313DF"
1794
1795         using
1796
1797          csv (headers => { "post-kode" => "pc", "id number" => "ID" }, ...
1798
1799         will return an entry like
1800
1801          { pc     => "1234AA",
1802            city   => "Duckstad",
1803            name   => "Donald",
1804            ID     => "13",
1805            fubble => "X313DF",
1806            }
1807
1808       See also "munge_column_names" and "set_column_names".
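
       The variants above can be compared side by side.  This is a sketch on
       made-up in-memory data, not an exhaustive demonstration:

```perl
use strict;
use warnings;
use Text::CSV_PP qw( csv );

my $data = "Post-Kode,City\n1234AA,Duckstad\n";

my $aoa  = csv (in => \$data);                      # no headers: array of arrays
my $auto = csv (in => \$data, headers => "auto");   # keys taken from line 1
my $lc   = csv (in => \$data, headers => "lc");     # keys mapped to lower case
my $map  = csv (in => \$data,                       # selected keys renamed
                headers => { "Post-Kode" => "pc" });

# Show the keys each variant produced for the first record
print join (",", sort keys %{$_->[0]}), "\n" for $auto, $lc, $map;
```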
1809
1810       munge_column_names
1811
1812       If "munge_column_names" is set,  the method  "header"  is invoked on
1813       the opened stream with all matching arguments to detect and set the
1814       headers.
1815
1816       "munge_column_names" can be abbreviated to "munge".
1817
1818       key
1819
1820       If passed,  will default  "headers"  to "auto" and return a hashref
1821       instead of an array of hashes. Allowed values are simple scalars or
1822       array-references where the first element is the joiner and the rest are
1823       the fields to join to combine the key.
1824
1825        my $ref = csv (in => "test.csv", key => "code");
1826        my $ref = csv (in => "test.csv", key => [ ":" => "code", "color" ]);
1827
1828       with test.csv like
1829
1830        code,product,price,color
1831        1,pc,850,gray
1832        2,keyboard,12,white
1833        3,mouse,5,black
1834
1835       the first example will return
1836
1837         { 1   => {
1838               code    => 1,
1839               color   => 'gray',
1840               price   => 850,
1841               product => 'pc'
1842               },
1843           2   => {
1844               code    => 2,
1845               color   => 'white',
1846               price   => 12,
1847               product => 'keyboard'
1848               },
1849           3   => {
1850               code    => 3,
1851               color   => 'black',
1852               price   => 5,
1853               product => 'mouse'
1854               }
1855           }
1856
1857       the second example will return
1858
1859         { "1:gray"    => {
1860               code    => 1,
1861               color   => 'gray',
1862               price   => 850,
1863               product => 'pc'
1864               },
1865           "2:white"   => {
1866               code    => 2,
1867               color   => 'white',
1868               price   => 12,
1869               product => 'keyboard'
1870               },
1871           "3:black"   => {
1872               code    => 3,
1873               color   => 'black',
1874               price   => 5,
1875               product => 'mouse'
1876               }
1877           }
1878
1879       The "key" attribute can be combined with "headers" for "CSV" data that
1880       has no header line, like
1881
1882        my $ref = csv (
1883            in      => "foo.csv",
1884            headers => [qw( c_foo foo bar description stock )],
1885            key     =>     "c_foo",
1886            );
1887
1888       value
1889
1890       Used to create key-value hashes.
1891
1892       Only allowed when "key" is valid. A "value" can be either a single
1893       column label or an anonymous list of column labels.  In the first case,
1894       the value will be a simple scalar value, in the latter case, it will be
1895       a hashref.
1896
1897        my $ref = csv (in => "test.csv", key   => "code",
1898                                         value => "price");
1899        my $ref = csv (in => "test.csv", key   => "code",
1900                                         value => [ "product", "price" ]);
1901        my $ref = csv (in => "test.csv", key   => [ ":" => "code", "color" ],
1902                                         value => "price");
1903        my $ref = csv (in => "test.csv", key   => [ ":" => "code", "color" ],
1904                                         value => [ "product", "price" ]);
1905
1906       with test.csv like
1907
1908        code,product,price,color
1909        1,pc,850,gray
1910        2,keyboard,12,white
1911        3,mouse,5,black
1912
1913       the first example will return
1914
1915         { 1 => 850,
1916           2 =>  12,
1917           3 =>   5,
1918           }
1919
1920       the second example will return
1921
1922         { 1   => {
1923               price   => 850,
1924               product => 'pc'
1925               },
1926           2   => {
1927               price   => 12,
1928               product => 'keyboard'
1929               },
1930           3   => {
1931               price   => 5,
1932               product => 'mouse'
1933               }
1934           }
1935
1936       the third example will return
1937
1938         { "1:gray"    => 850,
1939           "2:white"   =>  12,
1940           "3:black"   =>   5,
1941           }
1942
1943       the fourth example will return
1944
1945         { "1:gray"    => {
1946               price   => 850,
1947               product => 'pc'
1948               },
1949           "2:white"   => {
1950               price   => 12,
1951               product => 'keyboard'
1952               },
1953           "3:black"   => {
1954               price   => 5,
1955               product => 'mouse'
1956               }
1957           }
1958
1959       keep_headers
1960
1961       When using hashes,  store the column names in the arrayref passed,  so
1962       all headers are available after the call in the original order.
1963
1964        my $aoh = csv (in => "file.csv", keep_headers => \my @hdr);
1965
1966       This attribute can be abbreviated to "kh" or passed as
1967       "keep_column_names".
1968
1969       This attribute implies a default of "auto" for the "headers" attribute.
1970
1971       The headers can also be kept internally to keep stable header order:
1972
1973        csv (in      => csv (in => "file.csv", kh => "internal"),
1974             out     => "new.csv",
1975             kh      => "internal");
1976
1977       where "internal" can also be 1, "yes", or "true". This is similar to
1978
1979        my @h;
1980        csv (in      => csv (in => "file.csv", kh => \@h),
1981             out     => "new.csv",
1982             headers => \@h);
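
       A small sketch of why the kept headers are useful:  hash keys have no
       stable order,  but a hash slice over the kept names restores the order
       of the columns in the source.  The data is illustrative:

```perl
use strict;
use warnings;
use Text::CSV_PP qw( csv );

my $data = "zeta,alpha,mid\n1,2,3\n";

# @hdr is filled with the column names in their original order
my $aoh = csv (in => \$data, keep_headers => \my @hdr);

print join (",", @hdr), "\n";
print join (",", @{$aoh->[0]}{@hdr}), "\n";   # values in column order
```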
1983
1984       fragment
1985
1986       Only output the fragment as defined in the "fragment" method. This
1987       option is ignored when generating "CSV". See "out".
1988
1989       Combining all of them could give something like
1990
1991        use Text::CSV_PP qw( csv );
1992        my $aoh = csv (
1993            in       => "test.txt",
1994            encoding => "utf-8",
1995            headers  => "auto",
1996            sep_char => "|",
1997            fragment => "row=3;6-9;15-*",
1998            );
1999        say $aoh->[15]{Foo};
2000
2001       sep_set
2002
2003       If "sep_set" is set, the method "header" is invoked on the opened
2004       stream to detect and set "sep_char" with the given set.
2005
2006       "sep_set" can be abbreviated to "seps".
2007
2008       Note that as the  "header" method is invoked,  its default is to also
2009       set the headers.
2010
2011       set_column_names
2012
2013       If  "set_column_names" is passed,  the method "header" is invoked on
2014       the opened stream with all arguments meant for "header".
2015
2016       If "set_column_names" is passed as a false value, the content of the
2017       first row is only preserved if the output is AoA:
2018
2019       With an input-file like
2020
2021        bAr,foo
2022        1,2
2023        3,4,5
2024
2025       This call
2026
2027        my $aoa = csv (in => $file, set_column_names => 0);
2028
2029       will result in
2030
2031        [[ "bar", "foo"     ],
2032         [ "1",   "2"       ],
2033         [ "3",   "4",  "5" ]]
2034
2035       and
2036
2037        my $aoa = csv (in => $file, set_column_names => 0, munge => "none");
2038
2039       will result in
2040
2041        [[ "bAr", "foo"     ],
2042         [ "1",   "2"       ],
2043         [ "3",   "4",  "5" ]]
2044
2045   Callbacks
2046       Callbacks enable actions triggered from the inside of Text::CSV_PP.
2047
2048       While most of what this enables  can easily be done in an  unrolled
2049       loop as described in the "SYNOPSIS", callbacks can be used to meet
2050       special demands or enhance the "csv" function.
2051
2052       error
2053          $csv->callbacks (error => sub { $csv->SetDiag (0) });
2054
2055         The "error"  callback is invoked when an error occurs,  but  only
2056         when "auto_diag" is set to a true value. A callback is invoked with
2057         the values returned by "error_diag":
2058
2059          my ($c, $s);
2060
2061          sub ignore3006 {
2062              my ($err, $msg, $pos, $recno, $fldno) = @_;
2063              if ($err == 3006) {
2064                  # ignore this error
2065                  ($c, $s) = (undef, undef);
2066                  Text::CSV_PP->SetDiag (0);
2067                  }
2068              # Any other error
2069              return;
2070              } # ignore3006
2071
2072          $csv->callbacks (error => \&ignore3006);
2073          $csv->bind_columns (\$c, \$s);
2074          while ($csv->getline ($fh)) {
2075              # Error 3006 will not stop the loop
2076              }
2077
2078       after_parse
2079          $csv->callbacks (after_parse => sub { push @{$_[1]}, "NEW" });
2080          while (my $row = $csv->getline ($fh)) {
2081              $row->[-1] eq "NEW";
2082              }
2083
2084         This callback is invoked after parsing with  "getline"  only if no
2085         error occurred.  The callback is invoked with two arguments:   the
2086         current "CSV" parser object and an array reference to the fields
2087         parsed.
2088
2089         The return code of the callback is ignored  unless it is a reference
2090         to the string "skip", in which case the record will be skipped in
2091         "getline_all".
2092
2093          sub add_from_db {
2094              my ($csv, $row) = @_;
2095              $sth->execute ($row->[4]);
2096              push @$row, $sth->fetchrow_array;
2097              } # add_from_db
2098
2099          my $aoa = csv (in => "file.csv", callbacks => {
2100              after_parse => \&add_from_db });
2101
2102         This hook can be used for validation:
2103
2104         FAIL
2105           Die if any of the records does not validate a rule:
2106
2107            after_parse => sub {
2108                $_[1][4] =~ m/^[0-9]{4}\s?[A-Z]{2}$/ or
2109                    die "5th field does not have a valid Dutch zipcode";
2110                }
2111
2112         DEFAULT
2113           Replace invalid fields with a default value:
2114
2115            after_parse => sub { $_[1][2] =~ m/^\d+$/ or $_[1][2] = 0 }
2116
2117         SKIP
2118           Skip records that have invalid fields (only applies to
2119           "getline_all"):
2120
2121            after_parse => sub { $_[1][0] =~ m/^\d+$/ or return \"skip"; }
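
         The DEFAULT and SKIP styles can be combined in one hook.  The sketch
         below uses made-up data and the "csv" function, which reads all
         records in one go so that "skip" applies.  The hook ends with an
         explicit true return value as a precaution, so that records which
         pass validation are kept:

```perl
use strict;
use warnings;
use Text::CSV_PP qw( csv );

# Illustrative data: line 3 has a non-numeric qty, line 4 a non-numeric id
my $data = "id,qty\n1,5\n2,n/a\nx,7\n";

my $aoa = csv (in => \$data, callbacks => {
    after_parse => sub {
        my ($csv, $row) = @_;
        $row->[0] =~ m/^(?:id|\d+)$/  or return \"skip";  # SKIP the record
        $row->[1] =~ m/^(?:qty|\d+)$/ or $row->[1] = 0;   # DEFAULT the field
        return 1;
        },
    });

print join (",", @$_), "\n" for @$aoa;
```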
2122
2123       before_print
2124          my $idx = 1;
2125          $csv->callbacks (before_print => sub { $_[1][0] = $idx++ });
2126          $csv->print (*STDOUT, [ 0, $_ ]) for @members;
2127
2128         This callback is invoked  before printing with  "print"  only if no
2129         error occurred.  The callback is invoked with two arguments:  the
2130         current  "CSV" parser object and an array reference to the fields
2131         passed.
2132
2133         The return code of the callback is ignored.
2134
2135          sub max_4_fields {
2136              my ($csv, $row) = @_;
2137              @$row > 4 and splice @$row, 4;
2138              } # max_4_fields
2139
2140          csv (in => csv (in => "file.csv"), out => *STDOUT,
2141              callbacks => { before_print => \&max_4_fields });
2142
2143         This callback is not active for "combine".
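
         A runnable variation on the example above, assuming an in-memory
         structure and a scalar as output.  The helper name max_2_fields is
         hypothetical:

```perl
use strict;
use warnings;
use Text::CSV_PP qw( csv );

# Hypothetical helper: keep at most the first two fields of every row
sub max_2_fields {
    my ($csv, $row) = @_;
    @$row > 2 and splice @$row, 2;
    return 1;
    }

my $aoa = [[qw( a b c )], [ 1, 2, 3 ], [ 4, 5 ]];
csv (in => $aoa, out => \my $buf,
    callbacks => { before_print => \&max_2_fields });

print $buf;
```

         Note that the hook works on the row that was passed in,  so the
         splice also shortens the rows held in $aoa.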
2144
2145       Callbacks for csv ()
2146
2147       The "csv" function allows for some callbacks that do not integrate in
2148       XS internals but are only available through the "csv" function.
2149
2150         csv (in        => "file.csv",
2151              callbacks => {
2152                  filter       => { 6 => sub { $_ > 15 } },    # first
2153                  after_parse  => sub { say "AFTER PARSE";  }, # first
2154                  after_in     => sub { say "AFTER IN";     }, # second
2155                  on_in        => sub { say "ON IN";        }, # third
2156                  },
2157              );
2158
2159         csv (in        => $aoh,
2160              out       => "file.csv",
2161              callbacks => {
2162                  on_in        => sub { say "ON IN";        }, # first
2163                  before_out   => sub { say "BEFORE OUT";   }, # second
2164                  before_print => sub { say "BEFORE PRINT"; }, # third
2165                  },
2166              );
2167
2168       filter
2169         This callback can be used to filter records.  It is called just after
2170         a new record has been scanned.  The callback accepts a:
2171
2172         hashref
2173           The keys are the index to the row (the field name or field number,
2174           1-based) and the values are subs to return a true or false value.
2175
2176            csv (in => "file.csv", filter => {
2177                       3 => sub { m/a/ },       # third field should contain an "a"
2178                       5 => sub { length > 4 }, # length of the 5th field minimal 5
2179                       });
2180
2181            csv (in => "file.csv", filter => { foo => sub { $_ > 4 }});
2182
2183           If the keys to the filter hash contain any character that is not a
2184           digit it will also implicitly set "headers" to "auto"  unless
2185           "headers"  was already passed as argument.  When headers are
2186           active, returning an array of hashes, the filter is not applicable
2187           to the header itself.
2188
2189           All sub results should match, as in AND.
2190
2191           The context of the callback sets  $_ localized to the field
2192           indicated by the filter. The two arguments are as with all other
2193           callbacks, so the other fields in the current row can be seen:
2194
2195            filter => { 3 => sub { $_ > 100 ? $_[1][1] =~ m/A/ : $_[1][6] =~ m/B/ }}
2196
2197           If the context is set to return a list of hashes  ("headers" is
2198           defined), the current record will also be available in the
2199           localized %_:
2200
2201            filter => { 3 => sub { $_ > 100 && $_{foo} =~ m/A/ && $_{bar} < 1000  }}
2202
2203           If the filter is used to alter the content by changing $_,  make
2204           sure that the sub returns true in order not to have that record
2205           skipped:
2206
2207            filter => { 2 => sub { $_ = uc }}
2208
2209           will upper-case the second field, and then skip it if the resulting
2210           content evaluates to false. To always accept, end with truth:
2211
2212            filter => { 2 => sub { $_ = uc; 1 }}
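
           Putting the pieces together,  a sketch on illustrative data that
           filters on one named field and alters another:

```perl
use strict;
use warnings;
use Text::CSV_PP qw( csv );

my $data = "code,product,price\n1,pc,850\n2,keyboard,12\n3,mouse,5\n";

# Named keys imply headers => "auto", so the result is an array of hashes
my $aoh = csv (in => \$data, filter => {
    price   => sub { $_ > 10 },      # keep records with a price above 10
    product => sub { $_ = uc; 1 },   # upper-case the product, always accept
    });

printf "%s: %s\n", $_->{product}, $_->{price} for @$aoh;
```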
2213
2214         coderef
2215            csv (in => "file.csv", filter => sub { $n++; 0; });
2216
2217           If the argument to "filter" is a coderef,  it is an alias or
2218           shortcut to a filter on column 0:
2219
2220            csv (filter => sub { $n++; 0 });
2221
2222           is equal to
2223
2224            csv (filter => { 0 => sub { $n++; 0 } });
2225
2226         filter-name
2227            csv (in => "file.csv", filter => "not_blank");
2228            csv (in => "file.csv", filter => "not_empty");
2229            csv (in => "file.csv", filter => "filled");
2230
2231           These are predefined filters
2232
2233           Given a file like (line numbers prefixed for doc purpose only):
2234
2235            1:1,2,3
2236            2:
2237            3:,
2238            4:""
2239            5:,,
2240            6:, ,
2241            7:"",
2242            8:" "
2243            9:4,5,6
2244
2245           not_blank
2246             Filter out the blank lines
2247
2248             This filter is a shortcut for
2249
2250              filter => { 0 => sub { @{$_[1]} > 1 or
2251                          defined $_[1][0] && $_[1][0] ne "" } }
2252
2253             Due to the implementation,  it is currently impossible to also
2254             filter lines that consist only of a quoted empty field. These
2255             lines are also considered blank lines.
2256
2257             With the given example, lines 2 and 4 will be skipped.
2258
2259           not_empty
2260             Filter out lines where all the fields are empty.
2261
2262             This filter is a shortcut for
2263
2264              filter => { 0 => sub { grep { defined && $_ ne "" } @{$_[1]} } }
2265
2266             A space is not regarded as being empty, so given the example data,
2267             lines 2, 3, 4, 5, and 7 are skipped.
2268
2269           filled
2270             Filter out lines that have no visible data
2271
2272             This filter is a shortcut for
2273
2274              filter => { 0 => sub { grep { defined && m/\S/ } @{$_[1]} } }
2275
2276             This filter rejects all lines that do not have at least one field
2277             containing a non-whitespace character.
2278
2279             With the given example data, this filter would skip lines 2
2280             through 8.
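
           The three predefined filters can be tried on the nine example
           lines shown earlier.  This sketch builds them in memory (without
           the line-number prefixes) and counts what each filter keeps.
           Going by the descriptions above, the counts should be 7, 4, and 2:

```perl
use strict;
use warnings;
use Text::CSV_PP qw( csv );

# The nine example lines, each terminated by a newline
my $data = join "\n" =>
    '1,2,3', '', ',', '""', ',,', ', ,', '"",', '" "', '4,5,6', '';

my %kept;
for my $name (qw( not_blank not_empty filled )) {
    my $aoa = csv (in => \$data, filter => $name);
    $kept{$name} = scalar @$aoa;
    printf "%-9s keeps %d of 9 lines\n", $name, $kept{$name};
    }
```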
2281
2282         One could also use modules like Types::Standard:
2283
2284          use Types::Standard -types;
2285
2286          my $type   = Tuple[Str, Str, Int, Bool, Optional[Num]];
2287          my $check  = $type->compiled_check;
2288
2289          # filter with compiled check and warnings
2290          my $aoa = csv (
2291             in     => \$data,
2292             filter => {
2293                 0 => sub {
2294                     my $ok = $check->($_[1]) or
2295                         warn $type->get_message ($_[1]), "\n";
2296                     return $ok;
2297                     },
2298                 },
2299             );
2300
2301       after_in
2302         This callback is invoked for each record after all records have been
2303         parsed but before returning the reference to the caller.  The hook is
2304         invoked with two arguments:  the current  "CSV"  parser object  and a
2305         reference to the record.   The reference can be a reference to a
2306         HASH  or a reference to an ARRAY as determined by the arguments.
2307
2308         This callback can also be passed as  an attribute without the
2309         "callbacks" wrapper.
2310
2311       before_out
2312         This callback is invoked for each record before the record is
2313         printed.  The hook is invoked with two arguments:  the current "CSV"
2314         parser object and a reference to the record.   The reference can be a
2315         reference to a  HASH or a reference to an ARRAY as determined by the
2316         arguments.
2317
2318         This callback can also be passed as an attribute  without the
2319         "callbacks" wrapper.
2320
2321         This callback makes the row available in %_ if the row is a hashref.
2322         In this case %_ is writable and will change the original row.
2323
2324       on_in
2325         This callback acts exactly as the "after_in" or the "before_out"
2326         hooks.
2327
2328         This callback can also be passed as an attribute  without the
2329         "callbacks" wrapper.
2330
2331         This callback makes the row available in %_ if the row is a hashref.
2332         In this case %_ is writable and will change the original row. So e.g.
2333         with
2334
2335           my $aoh = csv (
2336               in      => \"foo\n1\n2\n",
2337               headers => "auto",
2338               on_in   => sub { $_{bar} = 2; },
2339               );
2340
2341         $aoh will be:
2342
2343           [ { foo => 1,
2344               bar => 2,
2345               },
2346             { foo => 2,
2347               bar => 2,
2348               }
2349             ]
2350
2351       csv
2352         The function  "csv" can also be called as a method or with an
2353         existing Text::CSV_PP object. This could help if the function is to
2354         be invoked a lot of times and the overhead of creating the object
2355         internally over  and  over again would be prevented by passing an
2356         existing instance.
2357
2358          my $csv = Text::CSV_PP->new ({ binary => 1, auto_diag => 1 });
2359
2360          my $aoa = $csv->csv (in => $fh);
2361          my $aoa = csv (in => $fh, csv => $csv);
2362
2363         Both act the same. Running this 20000 times on a 20-line CSV file
2364         showed a 53% speedup.
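
         A sketch of reusing a single object for several inputs,  assuming
         in-memory sources for brevity.  The attributes of the existing
         object (here "sep_char") apply to every call:

```perl
use strict;
use warnings;
use Text::CSV_PP qw( csv );

# One parser object, created once and reused for every input
my $csv = Text::CSV_PP->new ({ binary => 1, auto_diag => 1, sep_char => ";" });

my @inputs = (\"a;b\n1;2\n", \"c;d\n3;4\n5;6\n");   # illustrative data

my @rows;
push @rows, @{$csv->csv (in => $_)}        for @inputs;   # as a method
push @rows, @{csv (in => $_, csv => $csv)} for @inputs;   # as an attribute

print scalar @rows, " rows parsed\n";
```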
2365

DIAGNOSTICS

2367       This section is also taken from Text::CSV_XS.
2368
2369       Still under construction ...
2370
2371       If an error occurs,  "$csv->error_diag" can be used to get information
2372       on the cause of the failure. Note that for speed reasons the internal
2373       value is never cleared on success,  so using the value returned by
2374       "error_diag" in normal cases - when no error occurred - may cause
2375       unexpected results.
2376
2377       If the constructor failed, the cause can be found using "error_diag" as
2378       a class method, like "Text::CSV_PP->error_diag".
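
       As a minimal sketch,  the constructor below is made to fail on purpose
       by setting "sep_char" to the default "quote_char".  The expectation
       that this reports error 1001 is based on the list of error codes in
       this section:

```perl
use strict;
use warnings;
use Text::CSV_PP;

# sep_char deliberately equal to the default quote_char to provoke a failure
my $csv = Text::CSV_PP->new ({ sep_char => '"' });

my ($code, $msg) = (0, "");
unless ($csv) {
    # No object exists, so ask the class for the diagnostics
    ($code, $msg) = Text::CSV_PP->error_diag;
    print "new () failed: $code - $msg\n";
    }
```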
2379
2380       The "$csv->error_diag" method is automatically invoked upon error when
2381       the constructor was called with  "auto_diag"  set to  1 or 2, or when
2382       autodie is in effect.  When set to 1, this will cause a "warn" with the
2383       error message,  when set to 2, it will "die". "2012 - EOF" is excluded
2384       from "auto_diag" reports.
2385
2386       Errors can be (individually) caught using the "error" callback.
2387
2388       The errors as described below are available. I have tried to make the
2389       error itself explanatory enough, but more descriptions will be added.
2390       For most of these errors, the first three capitals describe the error
2391       category:
2392
2393       • INI
2394
2395         Initialization error or option conflict.
2396
2397       • ECR
2398
2399         Carriage-Return related parse error.
2400
2401       • EOF
2402
2403         End-Of-File related parse error.
2404
2405       • EIQ
2406
2407         Parse error inside quotation.
2408
2409       • EIF
2410
2411         Parse error inside field.
2412
2413       • ECB
2414
2415         Combine error.
2416
2417       • EHR
2418
2419         HashRef parse related error.
2420
2421       And below should be the complete list of error codes that can be
2422       returned:
2423
2424       • 1001 "INI - sep_char is equal to quote_char or escape_char"
2425
2426         The separation character cannot be equal to the quotation
2427         character or to the escape character, as this would invalidate all
2428         parsing rules.
2429
2430       • 1002 "INI - allow_whitespace with escape_char or quote_char SP or
2431         TAB"
2432
2433         Using the "allow_whitespace" attribute when either "quote_char" or
2434         "escape_char" is equal to "SPACE" or "TAB" is too ambiguous to
2435         allow.
2436
2437       • 1003 "INI - \r or \n in main attr not allowed"
2438
2439         Using default "eol" characters in either "sep_char", "quote_char",
2440         or "escape_char" is not allowed.
2441
2442       • 1004 "INI - callbacks should be undef or a hashref"
2443
2444         The "callbacks" attribute only accepts "undef" or a hash
2445         reference.
2446
2447       • 1005 "INI - EOL too long"
2448
2449         The value passed for EOL exceeds its maximum length (16).
2450
2451       • 1006 "INI - SEP too long"
2452
2453         The value passed for SEP exceeds its maximum length (16).
2454
2455       • 1007 "INI - QUOTE too long"
2456
2457         The value passed for QUOTE exceeds its maximum length (16).
2458
2459       • 1008 "INI - SEP undefined"
2460
2461         The value passed for SEP should be defined and not empty.
2462
2463       • 1010 "INI - the header is empty"
2464
2465         The header line parsed by "header" is empty.
2466
2467       • 1011 "INI - the header contains more than one valid separator"
2468
2469         The header line parsed by "header" contains more than one
2470         (unique) separator character out of the allowed set of separators.
2471
2472       • 1012 "INI - the header contains an empty field"
2473
2474         The header line parsed by "header" contains an empty field.
2475
2476       • 1013 "INI - the header contains non-unique fields"
2477
2478         The header line parsed by "header" contains at least two
2479         identical fields.
2480
2481       • 1014 "INI - header called on undefined stream"
2482
2483         The header line cannot be parsed from an undefined source.
2484
2485       • 1500 "PRM - Invalid/unsupported argument(s)"
2486
2487         Function or method called with invalid argument(s) or parameter(s).
2488
2489       • 1501 "PRM - The key attribute is passed as an unsupported type"
2490
2491         The "key" attribute is of an unsupported type.
2492
2493       • 1502 "PRM - The value attribute is passed without the key attribute"
2494
2495         The "value" attribute is only allowed when a valid key is given.
2496
2497       • 1503 "PRM - The value attribute is passed as an unsupported type"
2498
2499         The "value" attribute is of an unsupported type.
2500
2501       • 2010 "ECR - QUO char inside quotes followed by CR not part of EOL"
2502
2503         When "eol" has been set to anything but the default, like
2504         "\r\t\n", and the "\r" follows the second (closing)
2505         "quote_char", where the characters following the "\r" do not make up
2506         the "eol" sequence, this is an error.
2507
2508       • 2011 "ECR - Characters after end of quoted field"
2509
2510         Sequences like "1,foo,"bar"baz,22,1" are not allowed. "bar" is a
2511         quoted field and after the closing double-quote, there should be
2512         either a new-line sequence or a separation character.
2513
2514       • 2012 "EOF - End of data in parsing input stream"
2515
2516         Self-explanatory. End-of-file was reached while parsing a stream.
2517         Can happen only when reading from streams with "getline", as
2518         "parse" works on strings that are not required to have a trailing
2519         "eol".
2520
2521       • 2013 "INI - Specification error for fragments RFC7111"
2522
2523         Invalid URI "fragment" specification.
2524
2525       • 2014 "ENF - Inconsistent number of fields"
2526
2527         Inconsistent number of fields under strict parsing.
2528
2529       • 2021 "EIQ - NL char inside quotes, binary off"
2530
2531         Sequences like "1,"foo\nbar",22,1" are allowed only when the binary
2532         option has been selected with the constructor.
2533
2534       • 2022 "EIQ - CR char inside quotes, binary off"
2535
2536         Sequences like "1,"foo\rbar",22,1" are allowed only when the binary
2537         option has been selected with the constructor.
2538
2539       • 2023 "EIQ - QUO character not allowed"
2540
2541         Sequences like ""foo "bar" baz",qu" and "2023,",2008-04-05,"Foo,
2542         Bar",\n" will cause this error.
2543
2544       • 2024 "EIQ - EOF cannot be escaped, not even inside quotes"
2545
2546         The escape character is not allowed as last character in an input
2547         stream.
2548
2549       • 2025 "EIQ - Loose unescaped escape"
2550
2551         An escape character should escape only characters that need escaping.
2552
2553         Allowing the escape for other characters is possible with the
2554         attribute "allow_loose_escapes".
2555
2556       • 2026 "EIQ - Binary character inside quoted field, binary off"
2557
2558         Binary characters are not allowed by default. Exceptions are
2559         fields that contain valid UTF-8, which will automatically be
2560         upgraded. Set "binary" to 1 to accept binary
2561         data.
2562
2563       • 2027 "EIQ - Quoted field not terminated"
2564
2565         When parsing a field that started with a quotation character, the
2566         field is expected to be closed with a quotation character. When the
2567         parsed line is exhausted before the quote is found, that field is not
2568         terminated.
2569
2570       • 2030 "EIF - NL char inside unquoted verbatim, binary off"
2571
2572       • 2031 "EIF - CR char is first char of field, not part of EOL"
2573
2574       • 2032 "EIF - CR char inside unquoted, not part of EOL"
2575
2576       • 2034 "EIF - Loose unescaped quote"
2577
2578       • 2035 "EIF - Escaped EOF in unquoted field"
2579
2580       • 2036 "EIF - ESC error"
2581
2582       • 2037 "EIF - Binary character in unquoted field, binary off"
2583
2584       • 2110 "ECB - Binary character in Combine, binary off"
2585
2586       • 2200 "EIO - print to IO failed. See errno"
2587
2588       • 3001 "EHR - Unsupported syntax for column_names ()"
2589
2590       • 3002 "EHR - getline_hr () called before column_names ()"
2591
2592       • 3003 "EHR - bind_columns () and column_names () fields count
2593         mismatch"
2594
2595       • 3004 "EHR - bind_columns () only accepts refs to scalars"
2596
2597       • 3006 "EHR - bind_columns () did not pass enough refs for parsed
2598         fields"
2599
2600       • 3007 "EHR - bind_columns needs refs to writable scalars"
2601
2602       • 3008 "EHR - unexpected error in bound fields"
2603
2604       • 3009 "EHR - print_hr () called before column_names ()"
2605
2606       • 3010 "EHR - print_hr () called with invalid arguments"
2607
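       As a rough illustration of how some of the codes listed above
       surface, the sketch below provokes three of the length-limit
       initialization errors and one parse error, then shows that the parse
       error goes away with "binary" enabled. The attribute values and the
       input line are made up for the example, and the codes named in the
       comments are the ones the list above leads one to expect.

```perl
use strict;
use warnings;
use Text::CSV_PP;

# Constructor attributes that exceed the documented maximum length (16).
my @bad_attr = (
    [ "eol too long"   => { eol   => "e" x 17 } ],   # expected: 1005
    [ "sep too long"   => { sep   => "s" x 17 } ],   # expected: 1006
    [ "quote too long" => { quote => "q" x 17 } ],   # expected: 1007
    );

my @init_codes;
for my $case (@bad_attr) {
    my ($label, $attr) = @$case;
    my $csv = Text::CSV_PP->new ($attr);
    # On failure, the cause is available through the class method.
    my ($code, $msg) = Text::CSV_PP->error_diag;
    push @init_codes, defined $csv ? 0 : $code;
    print "$label: $code - $msg\n" unless defined $csv;
    }

# A quoted field with an embedded newline, parsed with binary off and on.
my $line  = qq{1,"foo\nbar",22,1};
my $plain = Text::CSV_PP->new ();
my $parse_code = $plain->parse ($line) ? 0 : ($plain->error_diag)[0];

my $binary = Text::CSV_PP->new ({ binary => 1 });
my @fields = $binary->parse ($line) ? $binary->fields : ();
print "binary off: error $parse_code, binary on: ", scalar @fields, " fields\n";
```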

SEE ALSO

2609       Text::CSV_XS, Text::CSV
2610
2611       Older versions took many regexps from
2612       <http://www.din.or.jp/~ohzaki/perl.htm>
2613

AUTHOR

2615       Kenichi Ishigaki, <ishigaki[at]cpan.org>
2616       Makamaka Hannyaharamitu, <makamaka[at]cpan.org>
2617
2618       Text::CSV_XS was written by <joe[at]ispsoft.de> and maintained by
2619       <h.m.brand[at]xs4all.nl>.
2620
2621       Text::CSV was written by <alan[at]mfgrtl.com>.
2622

COPYRIGHT AND LICENSE

2624       Copyright 2017- by Kenichi Ishigaki, <ishigaki[at]cpan.org>
2625       Copyright 2005-2015 by Makamaka Hannyaharamitu, <makamaka[at]cpan.org>
2626
2627       Most of the code and doc is directly taken from the pure perl part of
2628       Text::CSV_XS.
2629
2630       Copyright (C) 2007-2016 H.Merijn Brand.  All rights reserved.
2631       Copyright (C) 1998-2001 Jochen Wiedmann. All rights reserved.
2632       Copyright (C) 1997      Alan Citterman.  All rights reserved.
2633
2634       This library is free software; you can redistribute it and/or modify it
2635       under the same terms as Perl itself.
2636
2637
2638
2639perl v5.38.0                      2023-07-21                   Text::CSV_PP(3)