Text::CSV_PP(3pm)

Text::CSV_PP(3)       User Contributed Perl Documentation      Text::CSV_PP(3)

NAME

6       Text::CSV_PP - Text::CSV_XS compatible pure-Perl module
7

SYNOPSIS

9       This section is taken from Text::CSV_XS.
10
11        # Functional interface
12        use Text::CSV_PP qw( csv );
13
14        # Read whole file in memory
15        my $aoa = csv (in => "data.csv");    # as array of array
16        my $aoh = csv (in => "data.csv",
17                       headers => "auto");   # as array of hash
18
19        # Write array of arrays as csv file
20        csv (in => $aoa, out => "file.csv", sep_char=> ";");
21
22        # Only show lines where "code" is odd
23        csv (in => "data.csv", filter => { code => sub { $_ % 2 }});
24
25        # Object interface
26        use Text::CSV_PP;
27
28        my @rows;
29        # Read/parse CSV
30        my $csv = Text::CSV_PP->new ({ binary => 1, auto_diag => 1 });
31        open my $fh, "<:encoding(utf8)", "test.csv" or die "test.csv: $!";
32        while (my $row = $csv->getline ($fh)) {
33            $row->[2] =~ m/pattern/ or next; # 3rd field should match
34            push @rows, $row;
35            }
36        close $fh;
37
38        # and write as CSV
39        open $fh, ">:encoding(utf8)", "new.csv" or die "new.csv: $!";
40        $csv->say ($fh, $_) for @rows;
41        close $fh or die "new.csv: $!";
42

DESCRIPTION

       Text::CSV_PP is a pure-Perl module that provides facilities for the
       composition and decomposition of comma-separated values. It is
       (almost) compatible with the much faster Text::CSV_XS, and is mainly
       used as the fallback when you use the Text::CSV module without having
       installed Text::CSV_XS. Unless you have a specific reason to use this
       module directly, use Text::CSV for speed and portability (or
       Text::CSV_XS when you write a one-off script and do not need to care
       about portability).
52
53       The following caveats are taken from the doc of Text::CSV_XS.
54
55   Embedded newlines
56       Important Note:  The default behavior is to accept only ASCII
57       characters in the range from 0x20 (space) to 0x7E (tilde).   This means
58       that the fields can not contain newlines. If your data contains
59       newlines embedded in fields, or characters above 0x7E (tilde), or
60       binary data, you must set "binary => 1" in the call to "new". To cover
61       the widest range of parsing options, you will always want to set
62       binary.
63
       But you still have to pass a complete, correct line to the "parse"
       method, which is harder than it looks in typical usage:
67
68        my $csv = Text::CSV_PP->new ({ binary => 1, eol => $/ });
69        while (<>) {           #  WRONG!
70            $csv->parse ($_);
71            my @fields = $csv->fields ();
72            }
73
       This will break, as the "while" loop might read partial records: the
       line-reading operator does not care about quoting. If you need to
       support embedded newlines, the way to go is to not pass "eol" to the
       parser (it accepts "\n", "\r", and "\r\n" by default) and then read
       records with "getline":
78
79        my $csv = Text::CSV_PP->new ({ binary => 1 });
80        open my $fh, "<", $file or die "$file: $!";
81        while (my $row = $csv->getline ($fh)) {
82            my @fields = @$row;
83            }
84
85       The old(er) way of using global file handles is still supported
86
87        while (my $row = $csv->getline (*ARGV)) { ... }
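
       As a minimal sketch of the "getline" approach, the following reads
       records from an in-memory file handle. The sample data is made up
       for illustration; a real program would open a file instead.

```perl
use strict;
use warnings;
use IO::Handle;
use Text::CSV_PP;

# Made-up sample data: the second record has a newline inside quotes.
my $data = qq{id,comment\n1,"first line\nsecond line"\n2,plain\n};
open my $fh, "<", \$data or die "in-memory handle: $!";

# binary => 1 allows the embedded newline; eol is left at its default.
my $csv = Text::CSV_PP->new ({ binary => 1, auto_diag => 1 });
my @rows;
while (my $row = $csv->getline ($fh)) {
    push @rows, $row;
}
close $fh;

printf "%d records read\n", scalar @rows;
```

       The embedded newline stays inside the second field of the second
       record instead of splitting it in two.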
88
89   Unicode
90       Unicode is only tested to work with perl-5.8.2 and up.
91
92       See also "BOM".
93
94       The simplest way to ensure the correct encoding is used for  in- and
95       output is by either setting layers on the filehandles, or setting the
96       "encoding" argument for "csv".
97
98        open my $fh, "<:encoding(UTF-8)", "in.csv"  or die "in.csv: $!";
99       or
100        my $aoa = csv (in => "in.csv",     encoding => "UTF-8");
101
102        open my $fh, ">:encoding(UTF-8)", "out.csv" or die "out.csv: $!";
103       or
104        csv (in => $aoa, out => "out.csv", encoding => "UTF-8");
105
       On parsing (both for "getline" and "parse"), if the source is marked
       as being UTF8, then all fields that are marked binary will also be
       marked UTF8.

       On combining ("print" and "combine"): if any of the combining fields
       was marked UTF8, the resulting string will be marked as UTF8. Note
       however that fields before the first field marked UTF8 that contain
       8-bit characters not upgraded to UTF8 will remain "bytes" in the
       resulting string too, possibly causing unexpected errors. If you pass
       data of different encodings, or you do not know whether the encodings
       differ, force the fields to be upgraded before you pass them on:
118
119        $csv->print ($fh, [ map { utf8::upgrade (my $x = $_); $x } @data ]);
120
121       For complete control over encoding, please use Text::CSV::Encoded:
122
123        use Text::CSV::Encoded;
124        my $csv = Text::CSV::Encoded->new ({
125            encoding_in  => "iso-8859-1", # the encoding comes into   Perl
126            encoding_out => "cp1252",     # the encoding comes out of Perl
127            });
128
129        $csv = Text::CSV::Encoded->new ({ encoding  => "utf8" });
130        # combine () and print () accept *literally* utf8 encoded data
131        # parse () and getline () return *literally* utf8 encoded data
132
133        $csv = Text::CSV::Encoded->new ({ encoding  => undef }); # default
134        # combine () and print () accept UTF8 marked data
135        # parse () and getline () return UTF8 marked data
136
137   BOM
138       BOM  (or Byte Order Mark)  handling is available only inside the
139       "header" method.   This method supports the following encodings:
140       "utf-8", "utf-1", "utf-32be", "utf-32le", "utf-16be", "utf-16le",
141       "utf-ebcdic", "scsu", "bocu-1", and "gb-18030". See Wikipedia
142       <https://en.wikipedia.org/wiki/Byte_order_mark>.
143
144       If a file has a BOM, the easiest way to deal with that is
145
146        my $aoh = csv (in => $file, detect_bom => 1);
147
148       All records will be encoded based on the detected BOM.
149
150       This implies a call to the  "header"  method,  which defaults to also
151       set the "column_names". So this is not the same as
152
153        my $aoh = csv (in => $file, headers => "auto");
154
       which only reads the first record to set "column_names" but ignores
       any BOM that may be present.
157

METHODS

159       This section is also taken from Text::CSV_XS.
160
161   version
162       (Class method) Returns the current module version.
163
164   new
165       (Class method) Returns a new instance of class Text::CSV_PP. The
166       attributes are described by the (optional) hash ref "\%attr".
167
168        my $csv = Text::CSV_PP->new ({ attributes ... });
169
170       The following attributes are available:
171
172       eol
173
174        my $csv = Text::CSV_PP->new ({ eol => $/ });
175                  $csv->eol (undef);
176        my $eol = $csv->eol;
177
178       The end-of-line string to add to rows for "print" or the record
179       separator for "getline".
180
       When not passed in a parser instance, the default behavior is to
       accept "\n", "\r", and "\r\n", so it is probably safer to not specify
       "eol" at all. Passing "undef" or the empty string behaves the same.
184
185       When not passed in a generating instance,  records are not terminated
186       at all, so it is probably wise to pass something you expect. A safe
187       choice for "eol" on output is either $/ or "\r\n".
188
189       Common values for "eol" are "\012" ("\n" or Line Feed),  "\015\012"
190       ("\r\n" or Carriage Return, Line Feed),  and "\015"  ("\r" or Carriage
191       Return). The "eol" attribute cannot exceed 7 (ASCII) characters.
192
       If both $/ and "eol" equal "\015", lines that end in only a Carriage
       Return without a Line Feed will be parsed correctly.
195
196       sep_char
197
198        my $csv = Text::CSV_PP->new ({ sep_char => ";" });
199                $csv->sep_char (";");
200        my $c = $csv->sep_char;
201
       The character used to separate fields, by default a comma (",").
       Limited to a single-byte character, usually in the range from 0x20
       (space) to 0x7E (tilde). When longer sequences are required, use "sep".
205
206       The separation character can not be equal to the quote character  or to
207       the escape character.
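
       A short sketch of the effect of "sep_char" on parsing; the input
       line is made up for illustration:

```perl
use strict;
use warnings;
use Text::CSV_PP;

# With ";" as separator, a comma is just an ordinary character.
my $csv = Text::CSV_PP->new ({ sep_char => ";" });
$csv->parse (q{a;"b;c";d,e}) or die "" . $csv->error_diag;
my @fields = $csv->fields;

print join (" | ", @fields), "\n";
```

       The quoted separator is kept as data, and the unquoted comma does
       not split the last field.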
208
209       sep
210
211        my $csv = Text::CSV_PP->new ({ sep => "\N{FULLWIDTH COMMA}" });
212                  $csv->sep (";");
213        my $sep = $csv->sep;
214
215       The chars used to separate fields, by default undefined. Limited to 8
216       bytes.
217
218       When set, overrules "sep_char".  If its length is one byte it acts as
219       an alias to "sep_char".
220
221       quote_char
222
223        my $csv = Text::CSV_PP->new ({ quote_char => "'" });
224                $csv->quote_char (undef);
225        my $c = $csv->quote_char;
226
227       The character to quote fields containing blanks or binary data,  by
228       default the double quote character (""").  A value of undef suppresses
229       quote chars (for simple cases only). Limited to a single-byte
230       character, usually in the range from  0x20 (space) to  0x7E (tilde).
231       When longer sequences are required, use "quote".
232
233       "quote_char" can not be equal to "sep_char".
234
235       quote
236
237        my $csv = Text::CSV_PP->new ({ quote => "\N{FULLWIDTH QUOTATION MARK}" });
238                    $csv->quote ("'");
239        my $quote = $csv->quote;
240
241       The chars used to quote fields, by default undefined. Limited to 8
242       bytes.
243
244       When set, overrules "quote_char". If its length is one byte it acts as
245       an alias to "quote_char".
246
247       escape_char
248
249        my $csv = Text::CSV_PP->new ({ escape_char => "\\" });
250                $csv->escape_char (":");
251        my $c = $csv->escape_char;
252
253       The character to  escape  certain characters inside quoted fields.
254       This is limited to a  single-byte  character,  usually  in the  range
255       from  0x20 (space) to 0x7E (tilde).
256
257       The "escape_char" defaults to being the double-quote mark ("""). In
258       other words the same as the default "quote_char". This means that
259       doubling the quote mark in a field escapes it:
260
261        "foo","bar","Escape ""quote mark"" with two ""quote marks""","baz"
262
263       If  you  change  the   "quote_char"  without  changing  the
264       "escape_char",  the  "escape_char" will still be the double-quote
265       (""").  If instead you want to escape the  "quote_char" by doubling it
266       you will need to also change the  "escape_char"  to be the same as what
267       you have changed the "quote_char" to.
268
       Setting "escape_char" to "undef" or "" will disable escaping completely
       and is greatly discouraged. This will also disable "escape_null".
271
272       The escape character can not be equal to the separation character.
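
       The difference between the default doubling and a backslash escape
       can be sketched with "combine" and "string". The field values are
       made up for illustration:

```perl
use strict;
use warnings;
use Text::CSV_PP;

my @fields = ('say "hi"', "x");

# Default: escape_char equals quote_char, so embedded quotes are doubled.
my $doubled = Text::CSV_PP->new ();
$doubled->combine (@fields) or die "combine failed";
my $line_doubled = $doubled->string;

# Backslash as escape_char: embedded quotes are preceded by a backslash.
my $backslash = Text::CSV_PP->new ({ escape_char => "\\" });
$backslash->combine (@fields) or die "combine failed";
my $line_backslash = $backslash->string;

print "$line_doubled\n$line_backslash\n";
```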
273
274       binary
275
276        my $csv = Text::CSV_PP->new ({ binary => 1 });
277                $csv->binary (0);
278        my $f = $csv->binary;
279
280       If this attribute is 1,  you may use binary characters in quoted
281       fields, including line feeds, carriage returns and "NULL" bytes. (The
282       latter could be escaped as ""0".) By default this feature is off.
283
284       If a string is marked UTF8,  "binary" will be turned on automatically
285       when binary characters other than "CR" and "NL" are encountered.   Note
286       that a simple string like "\x{00a0}" might still be binary, but not
287       marked UTF8, so setting "{ binary => 1 }" is still a wise option.
288
289       strict
290
291        my $csv = Text::CSV_PP->new ({ strict => 1 });
292                $csv->strict (0);
293        my $f = $csv->strict;
294
295       If this attribute is set to 1, any row that parses to a different
296       number of fields than the previous row will cause the parser to throw
297       error 2014.
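
       A minimal sketch of "strict" in action, assuming records are read
       with "getline" from an in-memory handle holding made-up data:

```perl
use strict;
use warnings;
use IO::Handle;
use Text::CSV_PP;

# Made-up input: the second record has two fields instead of three.
my $data = "a,b,c\n1,2\n";
open my $fh, "<", \$data or die "in-memory handle: $!";

my $csv = Text::CSV_PP->new ({ strict => 1 });
my $first  = $csv->getline ($fh);    # three fields, accepted
my $second = $csv->getline ($fh);    # field count differs, rejected
my ($code, $message) = $csv->error_diag;
close $fh;

print defined $second ? "accepted\n" : "rejected: $code - $message\n";
```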
298
299       formula_handling
300
301       formula
302
303        my $csv = Text::CSV_PP->new ({ formula => "none" });
304                $csv->formula ("none");
305        my $f = $csv->formula;
306
307       This defines the behavior of fields containing formulas. As formulas
308       are considered dangerous in spreadsheets, this attribute can define an
309       optional action to be taken if a field starts with an equal sign ("=").
310
311       For purpose of code-readability, this can also be written as
312
313        my $csv = Text::CSV_PP->new ({ formula_handling => "none" });
314                $csv->formula_handling ("none");
315        my $f = $csv->formula_handling;
316
317       Possible values for this attribute are
318
319       none
320         Take no specific action. This is the default.
321
322          $csv->formula ("none");
323
324       die
325         Cause the process to "die" whenever a leading "=" is encountered.
326
327          $csv->formula ("die");
328
329       croak
330         Cause the process to "croak" whenever a leading "=" is encountered.
331         (See Carp)
332
333          $csv->formula ("croak");
334
335       diag
336         Report position and content of the field whenever a leading  "=" is
337         found.  The value of the field is unchanged.
338
339          $csv->formula ("diag");
340
341       empty
342         Replace the content of fields that start with a "=" with the empty
343         string.
344
345          $csv->formula ("empty");
346          $csv->formula ("");
347
348       undef
349         Replace the content of fields that start with a "=" with "undef".
350
351          $csv->formula ("undef");
352          $csv->formula (undef);
353
       All other values will give a warning and then fall back to "diag".
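
       As an illustration of one of these actions, the sketch below parses
       a made-up line with "formula" set to "empty":

```perl
use strict;
use warnings;
use Text::CSV_PP;

my $csv = Text::CSV_PP->new ({ formula => "empty" });
$csv->parse (q{1,=2+3,4}) or die "" . $csv->error_diag;
my @fields = $csv->fields;

# The field that started with "=" has been replaced by the empty string.
print join ("|", @fields), "\n";
```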
355
356       decode_utf8
357
358        my $csv = Text::CSV_PP->new ({ decode_utf8 => 1 });
359                $csv->decode_utf8 (0);
360        my $f = $csv->decode_utf8;
361
       This attribute defaults to TRUE.
363
364       While parsing,  fields that are valid UTF-8, are automatically set to
365       be UTF-8, so that
366
367         $csv->parse ("\xC4\xA8\n");
368
369       results in
370
371         PV("\304\250"\0) [UTF8 "\x{128}"]
372
373       Sometimes it might not be a desired action.  To prevent those upgrades,
374       set this attribute to false, and the result will be
375
376         PV("\304\250"\0)
377
378       auto_diag
379
380        my $csv = Text::CSV_PP->new ({ auto_diag => 1 });
381                $csv->auto_diag (2);
382        my $l = $csv->auto_diag;
383
       Setting this attribute to a number between 1 and 9 causes "error_diag"
       to be automatically called in void context upon errors.
386
387       In case of error "2012 - EOF", this call will be void.
388
389       If "auto_diag" is set to a numeric value greater than 1, it will "die"
390       on errors instead of "warn".  If set to anything unrecognized,  it will
391       be silently ignored.
392
       Future extensions to this feature will include more reliable
       auto-detection of "autodie" being active in the scope in which the
       error occurred, which will increment the value of "auto_diag" by 1
       the moment the error is detected.
397
398       diag_verbose
399
400        my $csv = Text::CSV_PP->new ({ diag_verbose => 1 });
401                $csv->diag_verbose (2);
402        my $l = $csv->diag_verbose;
403
404       Set the verbosity of the output triggered by "auto_diag".   Currently
405       only adds the current  input-record-number  (if known)  to the
406       diagnostic output with an indication of the position of the error.
407
408       blank_is_undef
409
410        my $csv = Text::CSV_PP->new ({ blank_is_undef => 1 });
411                $csv->blank_is_undef (0);
412        my $f = $csv->blank_is_undef;
413
414       Under normal circumstances, "CSV" data makes no distinction between
415       quoted- and unquoted empty fields.  These both end up in an empty
416       string field once read, thus
417
418        1,"",," ",2
419
420       is read as
421
422        ("1", "", "", " ", "2")
423
424       When writing  "CSV" files with either  "always_quote" or  "quote_empty"
425       set, the unquoted  empty field is the result of an undefined value.
426       To enable this distinction when  reading "CSV"  data,  the
427       "blank_is_undef"  attribute will cause  unquoted empty fields to be set
428       to "undef", causing the above to be parsed as
429
430        ("1", "", undef, " ", "2")
431
       Note that this is specifically important when loading "CSV" fields
       into a database that allows "NULL" values, as the Perl equivalent for
       "NULL" is "undef" in DBI land.
435
436       empty_is_undef
437
438        my $csv = Text::CSV_PP->new ({ empty_is_undef => 1 });
439                $csv->empty_is_undef (0);
440        my $f = $csv->empty_is_undef;
441
442       Going one  step  further  than  "blank_is_undef",  this attribute
443       converts all empty fields to "undef", so
444
445        1,"",," ",2
446
447       is read as
448
449        (1, undef, undef, " ", 2)
450
       Note that this affects only fields that are originally empty, not
       fields that are empty after stripping allowed whitespace. YMMV.
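
       The difference between "blank_is_undef" and "empty_is_undef" can be
       checked side by side. This sketch parses the sample line used above
       with each attribute in turn:

```perl
use strict;
use warnings;
use Text::CSV_PP;

my $line = q{1,"",," ",2};
my %parsed;
for my $attr (qw( blank_is_undef empty_is_undef )) {
    my $csv = Text::CSV_PP->new ({ $attr => 1 });
    $csv->parse ($line) or die "" . $csv->error_diag;
    $parsed{$attr} = [ $csv->fields ];
}

# Print undef explicitly so it can be told apart from the empty string.
for my $attr (sort keys %parsed) {
    my @shown = map { defined $_ ? qq{"$_"} : "undef" } @{ $parsed{$attr} };
    print "$attr: ", join (", ", @shown), "\n";
}
```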
453
454       allow_whitespace
455
456        my $csv = Text::CSV_PP->new ({ allow_whitespace => 1 });
457                $csv->allow_whitespace (0);
458        my $f = $csv->allow_whitespace;
459
460       When this option is set to true,  the whitespace  ("TAB"'s and
461       "SPACE"'s) surrounding  the  separation character  is removed when
462       parsing.  If either "TAB" or "SPACE" is one of the three characters
463       "sep_char", "quote_char", or "escape_char" it will not be considered
464       whitespace.
465
466       Now lines like:
467
468        1 , "foo" , bar , 3 , zapp
469
470       are parsed as valid "CSV", even though it violates the "CSV" specs.
471
472       Note that  all  whitespace is stripped from both  start and  end of
473       each field.  That would make it  more than a feature to enable parsing
474       bad "CSV" lines, as
475
476        1,   2.0,  3,   ape  , monkey
477
478       will now be parsed as
479
480        ("1", "2.0", "3", "ape", "monkey")
481
482       even if the original line was perfectly acceptable "CSV".
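
       A small sketch of the first example line above being parsed with
       "allow_whitespace" enabled:

```perl
use strict;
use warnings;
use Text::CSV_PP;

my $csv = Text::CSV_PP->new ({ allow_whitespace => 1 });
$csv->parse (q{1 , "foo" , bar , 3 , zapp}) or die "" . $csv->error_diag;
my @fields = $csv->fields;

# Whitespace around the separators is gone from every field.
print join ("|", @fields), "\n";
```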
483
484       allow_loose_quotes
485
486        my $csv = Text::CSV_PP->new ({ allow_loose_quotes => 1 });
487                $csv->allow_loose_quotes (0);
488        my $f = $csv->allow_loose_quotes;
489
490       By default, parsing unquoted fields containing "quote_char" characters
491       like
492
493        1,foo "bar" baz,42
494
495       would result in parse error 2034.  Though it is still bad practice to
496       allow this format,  we  cannot  help  the  fact  that  some  vendors
497       make  their applications spit out lines styled this way.
498
499       If there is really bad "CSV" data, like
500
501        1,"foo "bar" baz",42
502
503       or
504
505        1,""foo bar baz"",42
506
507       there is a way to get this data-line parsed and leave the quotes inside
508       the quoted field as-is.  This can be achieved by setting
509       "allow_loose_quotes" AND making sure that the "escape_char" is  not
510       equal to "quote_char".
511
512       allow_loose_escapes
513
514        my $csv = Text::CSV_PP->new ({ allow_loose_escapes => 1 });
515                $csv->allow_loose_escapes (0);
516        my $f = $csv->allow_loose_escapes;
517
518       Parsing fields  that  have  "escape_char"  characters that escape
519       characters that do not need to be escaped, like:
520
        my $csv = Text::CSV_PP->new ({ escape_char => "\\" });
        $csv->parse (q{1,"my bar\'s",baz,42});

       would result in parse error 2025. Though it is bad practice to allow
       this format, this attribute enables you to treat all escape character
       sequences equally.
527
528       allow_unquoted_escape
529
530        my $csv = Text::CSV_PP->new ({ allow_unquoted_escape => 1 });
531                $csv->allow_unquoted_escape (0);
532        my $f = $csv->allow_unquoted_escape;
533
534       A backward compatibility issue where "escape_char" differs from
535       "quote_char"  prevents  "escape_char" to be in the first position of a
536       field.  If "quote_char" is equal to the default """ and "escape_char"
537       is set to "\", this would be illegal:
538
539        1,\0,2
540
541       Setting this attribute to 1  might help to overcome issues with
542       backward compatibility and allow this style.
543
544       always_quote
545
546        my $csv = Text::CSV_PP->new ({ always_quote => 1 });
547                $csv->always_quote (0);
548        my $f = $csv->always_quote;
549
550       By default the generated fields are quoted only if they need to be.
551       For example, if they contain the separator character. If you set this
552       attribute to 1 then all defined fields will be quoted. ("undef" fields
553       are not quoted, see "blank_is_undef"). This makes it quite often easier
554       to handle exported data in external applications.
555
556       quote_space
557
558        my $csv = Text::CSV_PP->new ({ quote_space => 1 });
559                $csv->quote_space (0);
560        my $f = $csv->quote_space;
561
       By default, a space in a field triggers quotation. As no rule in
       "CSV" requires this, nor any forbids it, the default is true for
       safety. You can exclude the space from this trigger by setting this
       attribute to 0.
566
567       quote_empty
568
569        my $csv = Text::CSV_PP->new ({ quote_empty => 1 });
570                $csv->quote_empty (0);
571        my $f = $csv->quote_empty;
572
573       By default the generated fields are quoted only if they need to be.
574       An empty (defined) field does not need quotation. If you set this
575       attribute to 1 then empty defined fields will be quoted.  ("undef"
576       fields are not quoted, see "blank_is_undef"). See also "always_quote".
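
       The quoting triggers "always_quote", "quote_space", and
       "quote_empty" can be compared on one record. The record below is
       made up for illustration, and each attribute is changed on its own:

```perl
use strict;
use warnings;
use Text::CSV_PP;

my @record = ("a", "", undef, "b c");
my %line;
for my $attrs ({}, { always_quote => 1 }, { quote_empty => 1 }, { quote_space => 0 }) {
    my ($name) = keys %$attrs;
    $name = "default" unless defined $name;
    my $csv = Text::CSV_PP->new ($attrs) or die "" . Text::CSV_PP->error_diag;
    $csv->combine (@record) or die "combine failed";
    $line{$name} = $csv->string;
}

printf "%-12s %s\n", $_, $line{$_} for sort keys %line;
```

       In all four cases the "undef" field is written without quotes.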
577
578       quote_binary
579
580        my $csv = Text::CSV_PP->new ({ quote_binary => 1 });
581                $csv->quote_binary (0);
582        my $f = $csv->quote_binary;
583
584       By default,  all "unsafe" bytes inside a string cause the combined
585       field to be quoted.  By setting this attribute to 0, you can disable
586       that trigger for bytes >= 0x7F.
587
588       escape_null
589
590        my $csv = Text::CSV_PP->new ({ escape_null => 1 });
591                $csv->escape_null (0);
592        my $f = $csv->escape_null;
593
594       By default, a "NULL" byte in a field would be escaped. This option
595       enables you to treat the  "NULL"  byte as a simple binary character in
596       binary mode (the "{ binary => 1 }" is set).  The default is true.  You
597       can prevent "NULL" escapes by setting this attribute to 0.
598
599       When the "escape_char" attribute is set to undefined,  this attribute
600       will be set to false.
601
602       The default setting will encode "=\x00=" as
603
604        "="0="
605
       With "escape_null" set to 0, this will result in
607
608        "=\x00="
609
610       The default when using the "csv" function is "false".
611
612       For backward compatibility reasons,  the deprecated old name
613       "quote_null" is still recognized.
614
615       keep_meta_info
616
617        my $csv = Text::CSV_PP->new ({ keep_meta_info => 1 });
618                $csv->keep_meta_info (0);
619        my $f = $csv->keep_meta_info;
620
621       By default, the parsing of input records is as simple and fast as
622       possible.  However,  some parsing information - like quotation of the
623       original field - is lost in that process.  Setting this flag to true
624       enables retrieving that information after parsing with  the methods
625       "meta_info",  "is_quoted", and "is_binary" described below.  Default is
626       false for performance.
627
       If you set this attribute to a value greater than 9, then you can
       control output quotation style like it was used in the input of the
       last parsed record (unless quotation was added because of other
       reasons).
632
        my $csv = Text::CSV_PP->new ({
           binary         => 1,
           keep_meta_info => 1,
           quote_space    => 0,
           });

        # parse returns a status; fetch the parsed fields separately
        $csv->parse (q{1,,"", ," ",f,"g","h""h",help,"help"});
        my @row = $csv->fields;

        $csv->print (*STDOUT, \@row);
        # 1,,, , ,f,g,"h""h",help,help
        $csv->keep_meta_info (11);
        $csv->print (*STDOUT, \@row);
        # 1,,"", ," ",f,"g","h""h",help,"help"
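
       The per-field information kept by "keep_meta_info" can be queried
       after a parse. A minimal sketch using the "is_quoted" method named
       above, on a made-up input line:

```perl
use strict;
use warnings;
use Text::CSV_PP;

my $csv = Text::CSV_PP->new ({ keep_meta_info => 1 });
$csv->parse (q{1,"two",3}) or die "" . $csv->error_diag;

# is_quoted takes a zero-based field index of the last parsed record.
my @quoted = map { $csv->is_quoted ($_) ? 1 : 0 } 0 .. 2;
print "@quoted\n";
```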
646
647       undef_str
648
649        my $csv = Text::CSV_PP->new ({ undef_str => "\\N" });
650                $csv->undef_str (undef);
651        my $s = $csv->undef_str;
652
653       This attribute optionally defines the output of undefined fields. The
654       value passed is not changed at all, so if it needs quotation, the
655       quotation needs to be included in the value of the attribute.  Use with
656       caution, as passing a value like  ",",,,,"""  will for sure mess up
657       your output. The default for this attribute is "undef", meaning no
658       special treatment.
659
660       This attribute is useful when exporting  CSV data  to be imported in
661       custom loaders, like for MySQL, that recognize special sequences for
662       "NULL" data.
663
664       This attribute has no meaning when parsing CSV data.
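
       A short sketch of "undef_str" on output, using the "\N" sequence
       from the example above and made-up field values:

```perl
use strict;
use warnings;
use Text::CSV_PP;

my $csv = Text::CSV_PP->new ({ undef_str => "\\N" });
$csv->combine (1, undef, "x") or die "combine failed";
my $line = $csv->string;

# The undefined field is written as the literal two characters \N.
print "$line\n";
```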
665
666       verbatim
667
668        my $csv = Text::CSV_PP->new ({ verbatim => 1 });
669                $csv->verbatim (0);
670        my $f = $csv->verbatim;
671
672       This is a quite controversial attribute to set,  but makes some hard
673       things possible.
674
675       The rationale behind this attribute is to tell the parser that the
676       normally special characters newline ("NL") and Carriage Return ("CR")
677       will not be special when this flag is set,  and be dealt with  as being
678       ordinary binary characters. This will ease working with data with
679       embedded newlines.
680
681       When  "verbatim"  is used with  "getline",  "getline"  auto-"chomp"'s
682       every line.
683
684       Imagine a file format like
685
686        M^^Hans^Janssen^Klas 2\n2A^Ja^11-06-2007#\r\n
687
       where the line ending is a very specific "#\r\n", and the sep_char is
       a "^" (caret). None of the fields is quoted, but embedded binary
       data is likely to be present. With the specific line ending, this
       should not be too hard to detect.
692
       By default, the Text::CSV_PP parse function only knows about "\n"
       and "\r" as legal line endings, and so has to treat the embedded
       newline as a real end-of-line, so that it can scan the next line if
       binary is true and the newline is inside a quoted field. With this
       option, we tell "parse" to treat "\n" as nothing more than a binary
       character.
699
       For "parse" this means that the parser no longer has any notion of
       line endings, and "getline" "chomp"s line endings on reading.
702
703       types
704
705       A set of column types; the attribute is immediately passed to the
706       "types" method.
707
708       callbacks
709
710       See the "Callbacks" section below.
711
712       accessors
713
714       To sum it up,
715
716        $csv = Text::CSV_PP->new ();
717
718       is equivalent to
719
720        $csv = Text::CSV_PP->new ({
721            eol                   => undef, # \r, \n, or \r\n
722            sep_char              => ',',
723            sep                   => undef,
724            quote_char            => '"',
725            quote                 => undef,
726            escape_char           => '"',
727            binary                => 0,
728            decode_utf8           => 1,
729            auto_diag             => 0,
730            diag_verbose          => 0,
731            blank_is_undef        => 0,
732            empty_is_undef        => 0,
733            allow_whitespace      => 0,
734            allow_loose_quotes    => 0,
735            allow_loose_escapes   => 0,
736            allow_unquoted_escape => 0,
737            always_quote          => 0,
738            quote_empty           => 0,
739            quote_space           => 1,
740            escape_null           => 1,
741            quote_binary          => 1,
742            keep_meta_info        => 0,
743            strict                => 0,
744            formula               => 0,
745            verbatim              => 0,
746            undef_str             => undef,
747            types                 => undef,
748            callbacks             => undef,
749            });
750
751       For all of the above mentioned flags, an accessor method is available
752       where you can inquire the current value, or change the value
753
754        my $quote = $csv->quote_char;
755        $csv->binary (1);
756
757       It is not wise to change these settings halfway through writing "CSV"
758       data to a stream. If however you want to create a new stream using the
759       available "CSV" object, there is no harm in changing them.
760
       If the "new" constructor call fails, it returns "undef" and makes
       the reason for the failure available through the "error_diag" method.
763
764        $csv = Text::CSV_PP->new ({ ecs_char => 1 }) or
765            die "".Text::CSV_PP->error_diag ();
766
767       "error_diag" will return a string like
768
769        "INI - Unknown attribute 'ecs_char'"
770
771   known_attributes
772        @attr = Text::CSV_PP->known_attributes;
773        @attr = Text::CSV_PP::known_attributes;
774        @attr = $csv->known_attributes;
775
776       This method will return an ordered list of all the supported
777       attributes as described above.   This can be useful for knowing what
778       attributes are valid in classes that use or extend Text::CSV_PP.
779
780   print
781        $status = $csv->print ($fh, $colref);
782
783       Similar to  "combine" + "string" + "print",  but much more efficient.
784       It expects an array ref as input  (not an array!)  and the resulting
785       string is not really  created,  but  immediately  written  to the  $fh
786       object, typically an IO handle or any other object that offers a
787       "print" method.
788
789       For performance reasons  "print"  does not create a result string,  so
790       all "string", "status", "fields", and "error_input" methods will return
791       undefined information after executing this method.
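
       A minimal sketch of "print" writing to a handle. An in-memory
       handle and made-up records are assumed here so that the result can
       be inspected; "eol" is set explicitly because records are not
       terminated otherwise.

```perl
use strict;
use warnings;
use IO::Handle;
use Text::CSV_PP;

my $out = "";
open my $fh, ">", \$out or die "in-memory handle: $!";

my $csv = Text::CSV_PP->new ({ binary => 1, eol => "\n" });
for my $record ([ "id", "name" ], [ 1, "Doe, Jane" ]) {
    # print expects an array ref, not a list
    $csv->print ($fh, $record) or die "print failed";
}
close $fh or die "close: $!";

print $out;
```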
792
793       If $colref is "undef"  (explicit,  not through a variable argument) and
794       "bind_columns"  was used to specify fields to be printed,  it is
795       possible to make performance improvements, as otherwise data would have
796       to be copied as arguments to the method call:
797
798        $csv->bind_columns (\($foo, $bar));
799        $status = $csv->print ($fh, undef);
800
801       A short benchmark
802
803        my @data = ("aa" .. "zz");
804        $csv->bind_columns (\(@data));
805
806        $csv->print ($fh, [ @data ]);   # 11800 recs/sec
807        $csv->print ($fh,  \@data  );   # 57600 recs/sec
808        $csv->print ($fh,   undef  );   # 48500 recs/sec
809
810   say
811        $status = $csv->say ($fh, $colref);
812
813       Like "print", but "eol" defaults to "$\".
814
815   print_hr
816        $csv->print_hr ($fh, $ref);
817
818       Provides an easy way  to print a  $ref  (as fetched with "getline_hr")
819       provided the column names are set with "column_names".
820
821       It is just a wrapper method with basic parameter checks over
822
823        $csv->print ($fh, [ map { $ref->{$_} } $csv->column_names ]);
824
825   combine
826        $status = $csv->combine (@fields);
827
828       This method constructs a "CSV" record from  @fields,  returning success
829       or failure.   Failure can result from lack of arguments or an argument
830       that contains an invalid character.   Upon success,  "string" can be
831       called to retrieve the resultant "CSV" string.  Upon failure,  the
832       value returned by "string" is undefined and "error_input" could be
833       called to retrieve the invalid argument.
834
835   string
836        $line = $csv->string ();
837
838       This method returns the input to  "parse"  or the resultant "CSV"
839       string of "combine", whichever was called more recently.
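
       As a minimal sketch of how "combine" and "string" work together, the
       example below builds one record from a list of fields.  The field
       values are made up for illustration; the methods are the ones
       described above.

```perl
use strict;
use warnings;
use Text::CSV_PP;

my $csv = Text::CSV_PP->new ({ binary => 1 });

# Example data (an assumption): one plain field, one containing the
# separator, and one containing the quote character.
my @fields = ("id", "a, b", 'say "hi"');

my $status = $csv->combine (@fields);
my $line   = $status ? $csv->string () : undef;

if ($status) {
    # Fields holding the separator or a quote get quoted and escaped
    print "$line\n";
    }
else {
    warn "combine failed on: ", $csv->error_input () // "", "\n";
    }
```

       With the default "sep_char", "quote_char", and "escape_char", only the
       fields that need quoting are quoted.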
840
841   getline
842        $colref = $csv->getline ($fh);
843
844       This is the counterpart to  "print",  as "parse"  is the counterpart to
845       "combine":  it parses a row from the $fh  handle using the "getline"
846       method associated with $fh  and parses this row into an array ref.
847       This array ref is returned by the function or "undef" for failure.
848       When $fh does not support "getline", you are likely to hit errors.
849
850       When fields are bound with "bind_columns" the return value is a
851       reference to an empty list.
852
853       The "string", "fields", and "status" methods are meaningless again.
854
855   getline_all
856        $arrayref = $csv->getline_all ($fh);
857        $arrayref = $csv->getline_all ($fh, $offset);
858        $arrayref = $csv->getline_all ($fh, $offset, $length);
859
860       This will return a reference to a list of getline ($fh) results.  In
861       this call, "keep_meta_info" is disabled.  If $offset is negative, as
862       with "splice", only the last  "abs ($offset)" records of $fh are taken
863       into consideration.
864
865       Given a CSV file with 10 lines:
866
867        lines call
868        ----- ---------------------------------------------------------
869        0..9  $csv->getline_all ($fh)         # all
870        0..9  $csv->getline_all ($fh,  0)     # all
871        8..9  $csv->getline_all ($fh,  8)     # start at 8
872        -     $csv->getline_all ($fh,  0,  0) # start at 0 first 0 rows
873        0..4  $csv->getline_all ($fh,  0,  5) # start at 0 first 5 rows
874        4..5  $csv->getline_all ($fh,  4,  2) # start at 4 first 2 rows
875        8..9  $csv->getline_all ($fh, -2)     # last 2 rows
876        6..7  $csv->getline_all ($fh, -4,  2) # first 2 of last  4 rows
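
       The table above can be tried out with an in-memory handle.  This is a
       sketch only: the ten-line data set is generated for the example and
       is not part of the module.

```perl
use strict;
use warnings;
use Text::CSV_PP;

# Example data (an assumption): ten records numbered 0 .. 9
my $data = join "", map { "$_,row$_\n" } 0 .. 9;
open my $fh, "<", \$data or die "in-memory handle: $!";

my $csv = Text::CSV_PP->new ({ binary => 1, auto_diag => 1 });

# Skip four records, then take the next two
my $rows = $csv->getline_all ($fh, 4, 2);
close $fh;

print join (",", @$_), "\n" for @$rows;
```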
877
878   getline_hr
879       The "getline_hr" and "column_names" methods work together  to allow you
880       to have rows returned as hashrefs.  You must call "column_names" first
881       to declare your column names.
882
883        $csv->column_names (qw( code name price description ));
884        $hr = $csv->getline_hr ($fh);
885        print "Price for $hr->{name} is $hr->{price} EUR\n";
886
887       "getline_hr" will croak if called before "column_names".
888
889       Note that  "getline_hr"  creates a hashref for every row and will be
890       much slower than the combined use of "bind_columns"  and "getline" but
891       still offering the same ease of use hashref inside the loop:
892
893        my @cols = @{$csv->getline ($fh)};
894        $csv->column_names (@cols);
895        while (my $row = $csv->getline_hr ($fh)) {
896            print $row->{price};
897            }
898
899       Could easily be rewritten to the much faster:
900
901        my @cols = @{$csv->getline ($fh)};
902        my $row = {};
903        $csv->bind_columns (\@{$row}{@cols});
904        while ($csv->getline ($fh)) {
905            print $row->{price};
906            }
907
908       Your mileage may vary for the size of the data and the number of rows.
909       With perl-5.14.2 the comparison for a 100_000 line file with 14 columns:
910
911                   Rate hashrefs getlines
912        hashrefs 1.00/s       --     -76%
913        getlines 4.15/s     313%       --
914
915   getline_hr_all
916        $arrayref = $csv->getline_hr_all ($fh);
917        $arrayref = $csv->getline_hr_all ($fh, $offset);
918        $arrayref = $csv->getline_hr_all ($fh, $offset, $length);
919
920       This will return a reference to a list of   getline_hr ($fh) results.
921       In this call, "keep_meta_info" is disabled.
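
       A short sketch of "getline_hr_all", assuming the first line of the
       stream holds the column names.  The data is invented for the example.

```perl
use strict;
use warnings;
use Text::CSV_PP;

# Example data (an assumption): a header line followed by two records
my $data = "code,name,price\n1,apple,0.50\n2,pear,0.75\n";
open my $fh, "<", \$data or die "in-memory handle: $!";

my $csv = Text::CSV_PP->new ({ binary => 1, auto_diag => 1 });

# Use the first line as column names, then fetch the rest as hashrefs
$csv->column_names ($csv->getline ($fh));
my $aoh = $csv->getline_hr_all ($fh);
close $fh;

print "$_->{name}: $_->{price}\n" for @$aoh;
```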
922
923   parse
924        $status = $csv->parse ($line);
925
926       This method decomposes a  "CSV"  string into fields,  returning success
927       or failure.   Failure can result from a lack of argument  or from an
928       improperly formatted "CSV" string.   Upon success, "fields" can be
929       called to retrieve the decomposed fields. Upon failure calling "fields"
930       will return undefined data and  "error_input"  can be called to
931       retrieve  the invalid argument.
932
933       You may use the "types"  method for setting column types.  See the
934       description of "types" below.
935
936       The $line argument is supposed to be a simple scalar. Everything else
937       is supposed to croak and set error 1500.
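
       A minimal sketch of the "parse" / "fields" pair, with "error_input"
       as the fallback on failure.  The input line is an example only.

```perl
use strict;
use warnings;
use Text::CSV_PP;

my $csv = Text::CSV_PP->new ({ binary => 1 });

# Example input (an assumption): the second field holds a quoted comma
my $line = '1,"two, three",4';

my @columns;
if ($csv->parse ($line)) {
    @columns = $csv->fields ();
    print scalar (@columns), " fields: ", join (" | ", @columns), "\n";
    }
else {
    warn "parse failed on: ", $csv->error_input () // "", "\n";
    }
```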
938
939   fragment
940       This function tries to implement RFC7111  (URI Fragment Identifiers for
941       the text/csv Media Type) - http://tools.ietf.org/html/rfc7111
942
943        my $AoA = $csv->fragment ($fh, $spec);
944
945       In specifications,  "*" is used to specify the last item, a dash ("-")
946       to indicate a range.   All indices are 1-based:  the first row or
947       column has index 1. Selections can be combined with the semi-colon
948       (";").
949
950       When using this method in combination with  "column_names",  the
951       returned reference  will point to a  list of hashes  instead of a  list
952       of lists.  A disjointed  cell-based combined selection  might return
953       rows with different number of columns making the use of hashes
954       unpredictable.
955
956        $csv->column_names ("Name", "Age");
957        my $AoH = $csv->fragment ($fh, "col=3;8");
958
959       If the "after_parse" callback is active,  it is also called on every
960       line parsed and skipped before the fragment.
961
962       row
963          row=4
964          row=5-7
965          row=6-*
966          row=1-2;4;6-*
967
968       col
969          col=2
970          col=1-3
971          col=4-*
972          col=1-2;4;7-*
973
974       cell
975         In cell-based selection, the comma (",") is used to pair row and
976         column
977
978          cell=4,1
979
980         The range operator ("-") using "cell"s can be used to define top-left
981         and bottom-right "cell" location
982
983          cell=3,1-4,6
984
985         The "*" is only allowed in the second part of a pair
986
987          cell=3,2-*,2    # row 3 till end, only column 2
988          cell=3,2-3,*    # column 2 till end, only row 3
989          cell=3,2-*,*    # strip row 1 and 2, and column 1
990
991         Cells and cell ranges may be combined with ";", possibly resulting in
992         rows with different number of columns
993
994          cell=1,1-2,2;3,3-4,4;1,4;4,1
995
996         Disjointed selections will only return selected cells.   The cells
997         that are not  specified  will  not  be  included  in the  returned
998         set,  not even as "undef".  As an example given a "CSV" like
999
1000          11,12,13,...19
1001          21,22,...28,29
1002          :            :
1003          91,...97,98,99
1004
1005         with "cell=1,1-2,2;3,3-4,4;1,4;4,1" will return:
1006
1007          11,12,14
1008          21,22
1009          33,34
1010          41,43,44
1011
1012         Overlapping cell-specs will return those cells only once, so
1013         "cell=1,1-3,3;2,2-4,4;2,3;4,2" will return:
1014
1015          11,12,13
1016          21,22,23,24
1017          31,32,33,34
1018          42,43,44
1019
1020       RFC7111 <http://tools.ietf.org/html/rfc7111> does  not  allow different
1021       types of specs to be combined   (either "row" or "col" or "cell").
1022       Passing an invalid fragment specification will croak and set error
1023       2013.
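
       The selections described above can be sketched as follows.  The 4 x 4
       grid and the helper "fragment_of" are assumptions made for this
       example; only "fragment" itself comes from the module.

```perl
use strict;
use warnings;
use Text::CSV_PP;

# Example data (an assumption): a 4 x 4 grid where each value encodes
# its 1-based row and column, e.g. "23" is row 2, column 3.
my $data = join "", map {
    my $r = $_;
    join (",", map { "$r$_" } 1 .. 4) . "\n";
    } 1 .. 4;

# Hypothetical helper: run one fragment spec against a fresh handle
sub fragment_of {
    my ($csv_text, $spec) = @_;
    open my $fh, "<", \$csv_text or die "in-memory handle: $!";
    my $csv = Text::CSV_PP->new ({ binary => 1, auto_diag => 1 });
    my $aoa = $csv->fragment ($fh, $spec);
    close $fh;
    return $aoa;
    }

my $rows  = fragment_of ($data, "row=2-3");       # rows 2 and 3
my $cells = fragment_of ($data, "cell=1,1-2,2");  # top-left 2 x 2 block

print join (",", @$_), "\n" for @$rows, @$cells;
```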
1024
1025   column_names
1026       Set the "keys" that will be used in the  "getline_hr"  calls.  If no
1027       keys (column names) are passed, it will return the current setting as a
1028       list.
1029
1030       "column_names" accepts a list of scalars  (the column names)  or a
1031       single array_ref, so you can pass the return value from "getline" too:
1032
1033        $csv->column_names ($csv->getline ($fh));
1034
1035       "column_names" does no checking on duplicates at all, which might lead
1036       to unexpected results.   Undefined entries will be replaced with the
1037       string "\cAUNDEF\cA", so
1038
1039        $csv->column_names (undef, "", "name", "name");
1040        $hr = $csv->getline_hr ($fh);
1041
1042       Will set "$hr->{"\cAUNDEF\cA"}" to the 1st field,  "$hr->{""}" to the
1043       2nd field, and "$hr->{name}" to the 4th field,  discarding the 3rd
1044       field.
1045
1046       "column_names" croaks on invalid arguments.
1047
1048   header
1049       This method does NOT work in perl-5.6.x
1050
1051       Parse the CSV header and set "sep", column_names and encoding.
1052
1053        my @hdr = $csv->header ($fh);
1054        $csv->header ($fh, { sep_set => [ ";", ",", "|", "\t" ] });
1055        $csv->header ($fh, { detect_bom => 1, munge_column_names => "lc" });
1056
1057       The first argument should be a file handle.
1058
1059       This method resets some object properties,  as it is supposed to be
1060       invoked only once per file or stream.  It will leave attributes
1061       "column_names" and "bound_columns" alone if setting column names is
1062       disabled. Reading headers on previously processed objects might fail
1063       on perl-5.8.0 and older.
1064
1065       Assuming that the file opened for parsing has a header, and the header
1066       does not contain problematic characters like embedded newlines,   read
1067       the first line from the open handle then auto-detect whether the header
1068       separates the column names with a character from the allowed separator
1069       list.
1070
1071       If any of the allowed separators matches,  and none of the other
1072       allowed separators match,  set  "sep"  to that  separator  for the
1073       current CSV_PP instance and use it to parse the first line, map those
1074       to lowercase, and use that to set the instance "column_names":
1075
1076        my $csv = Text::CSV_PP->new ({ binary => 1, auto_diag => 1 });
1077        open my $fh, "<", "file.csv";
1078        binmode $fh; # for Windows
1079        $csv->header ($fh);
1080        while (my $row = $csv->getline_hr ($fh)) {
1081            ...
1082            }
1083
1084       If the header is empty,  contains more than one unique separator out of
1085       the allowed set,  contains empty fields,   or contains identical fields
1086       (after folding), it will croak with error 1010, 1011, 1012, or 1013
1087       respectively.
1088
1089       If the header contains embedded newlines or is not valid  CSV  in any
1090       other way, this method will croak and leave the parse error untouched.
1091
1092       A successful call to "header"  will always set the  "sep"  of the $csv
1093       object. This behavior can not be disabled.
1094
1095       return value
1096
1097       On error this method will croak.
1098
1099       In list context,  the headers will be returned whether they are used to
1100       set "column_names" or not.
1101
1102       In scalar context, the instance itself is returned.  Note: the values
1103       as found in the header will effectively be  lost if  "set_column_names"
1104       is false.
1105
1106       Options
1107
1108       sep_set
1109          $csv->header ($fh, { sep_set => [ ";", ",", "|", "\t" ] });
1110
1111         The list of legal separators defaults to "[ ";", "," ]" and can be
1112         changed by this option.  As this is probably the most often used
1113         option,  it can be passed on its own as an unnamed argument:
1114
1115          $csv->header ($fh, [ ";", ",", "|", "\t", "::", "\x{2063}" ]);
1116
1117         Multi-byte  sequences are allowed,  both multi-character and
1118         Unicode.  See "sep".
1119
1120       detect_bom
1121          $csv->header ($fh, { detect_bom => 1 });
1122
1123         The default behavior is to detect if the header line starts with a
1124         BOM.  If the header has a BOM, use that to set the encoding of $fh.
1125         This default behavior can be disabled by passing a false value to
1126         "detect_bom".
1127
1128         Supported encodings from BOM are: UTF-8, UTF-16BE, UTF-16LE,
1129         UTF-32BE,  and UTF-32LE. BOMs also exist for UTF-1, UTF-EBCDIC, SCSU,
1130         BOCU-1,  and GB-18030, but Encode does not (yet) support these.
1131         UTF-7 is not supported.
1132
1133         If a supported BOM was detected as start of the stream, it is stored
1134         in the object attribute "ENCODING".
1135
1136          my $enc = $csv->{ENCODING};
1137
1138         The encoding is used with "binmode" on $fh.
1139
1140         If the handle was opened in a (correct) encoding,  this method will
1141         not alter the encoding, as it checks the leading bytes of the first
1142         line. In case the stream starts with a decoded BOM ("U+FEFF"),
1143         "{ENCODING}" will be "" (empty) instead of the default "undef".
1144
1145       munge_column_names
1146         This option offers the means to modify the column names into
1147         something that is most useful to the application.   The default is to
1148         map all column names to lower case.
1149
1150          $csv->header ($fh, { munge_column_names => "lc" });
1151
1152         The following values are available:
1153
1154           lc     - lower case
1155           uc     - upper case
1156           none   - do not change
1157           \%hash - supply a mapping
1158           \&cb   - supply a callback
1159
1160         Literal:
1161
1162          $csv->header ($fh, { munge_column_names => "none" });
1163
1164         Hash:
1165
1166          $csv->header ($fh, { munge_column_names => { foo => "sombrero" }});
1167
1168         If a value does not exist, the original value is used unchanged.
1169
1170         Callback:
1171
1172          $csv->header ($fh, { munge_column_names => sub { fc } });
1173          $csv->header ($fh, { munge_column_names => sub { "column_".$col++ } });
1174          $csv->header ($fh, { munge_column_names => sub { lc (s/\W+/_/gr) } });
1175
1176         As this callback is called in a "map", you can use $_ directly.
1177
1178       set_column_names
1179          $csv->header ($fh, { set_column_names => 1 });
1180
1181         The default is to set the instance's column names using
1182         "column_names" if the method is successful,  so subsequent calls to
1183         "getline_hr" can return a hash. Setting the column names can be
1184         disabled by passing a false value for this option.
1185
1186         As described in "return value" above, content is lost in scalar
1187         context.
1188
1189       Validation
1190
1191       When receiving CSV files from external sources,  this method can be
1192       used to protect against changes in the layout by restricting to known
1193       headers  (and typos in the header fields).
1194
1195        my %known = (
1196            "record key" => "c_rec",
1197            "rec id"     => "c_rec",
1198            "id_rec"     => "c_rec",
1199            "kode"       => "code",
1200            "code"       => "code",
1201            "vaule"      => "value",
1202            "value"      => "value",
1203            );
1204        my $csv = Text::CSV_PP->new ({ binary => 1, auto_diag => 1 });
1205        open my $fh, "<", $source or die "$source: $!";
1206        $csv->header ($fh, { munge_column_names => sub {
1207            s/\s+$//;
1208            s/^\s+//;
1209            $known{lc $_} or die "Unknown column '$_' in $source";
1210            }});
1211        while (my $row = $csv->getline_hr ($fh)) {
1212            say join "\t", $row->{c_rec}, $row->{code}, $row->{value};
1213            }
1214
1215   bind_columns
1216       Takes a list of scalar references to be used for output with  "print"
1217       or to store in the fields fetched by "getline".  When you do not pass
1218       enough references to store the fetched fields in, "getline" will fail
1219       with error 3006.  If you pass more than there are fields to return,
1220       the content of the remaining references is left untouched.
1221
1222        $csv->bind_columns (\$code, \$name, \$price, \$description);
1223        while ($csv->getline ($fh)) {
1224            print "The price of a $name is \x{20ac} $price\n";
1225            }
1226
1227       To reset or clear all column binding, call "bind_columns" with the
1228       single argument "undef". This will also clear column names.
1229
1230        $csv->bind_columns (undef);
1231
1232       If no arguments are passed at all, "bind_columns" will return the list
1233       of current bindings or "undef" if no binds are active.
1234
1235       Note that in parsing with  "bind_columns",  the fields are set on the
1236       fly.  That implies that if the third field of a row causes an error
1237       (or this row has just two fields where the previous row had more),  the
1238       first two fields already have been assigned the values of the current
1239       row, while the rest of the fields will still hold the values of the
1240       previous row.  If you want the parser to fail in these cases, use the
1241       "strict" attribute.
1242
1243   eof
1244        $eof = $csv->eof ();
1245
1246       If "parse" or  "getline"  was used with an IO stream,  this method will
1247       return true (1) if the last call hit end of file,  otherwise it will
1248       return false ('').  This is useful to see the difference between a
1249       failure and end of file.
1250
1251       Note that if the parsing of the last line caused an error,  "eof" is
1252       still true.  That means that if you are not using "auto_diag", an idiom
1253       like
1254
1255        while (my $row = $csv->getline ($fh)) {
1256            # ...
1257            }
1258        $csv->eof or $csv->error_diag;
1259
1260       will not report the error. You would have to change that to
1261
1262        while (my $row = $csv->getline ($fh)) {
1263            # ...
1264            }
1265        +$csv->error_diag and $csv->error_diag;
1266
1267   types
1268        $csv->types (\@tref);
1269
1270       This method is used to force that  (all)  columns are of a given type.
1271       For example, if you have an integer column,  two  columns  with
1272       doubles  and a string column, then you might do a
1273
1274        $csv->types ([Text::CSV_PP::IV (),
1275                      Text::CSV_PP::NV (),
1276                      Text::CSV_PP::NV (),
1277                      Text::CSV_PP::PV ()]);
1278
1279       Column types are used only for decoding columns while parsing,  in
1280       other words by the "parse" and "getline" methods.
1281
1282       You can unset column types by doing a
1283
1284        $csv->types (undef);
1285
1286       or fetch the current type settings with
1287
1288        $types = $csv->types ();
1289
1290       IV  Set field type to integer.
1291
1292       NV  Set field type to numeric/float.
1293
1294       PV  Set field type to string.
1295
1296   fields
1297        @columns = $csv->fields ();
1298
1299       This method returns the input to   "combine"  or the resultant
1300       decomposed fields of a successful "parse", whichever was called more
1301       recently.
1302
1303       Note that the return value is undefined after using "getline", which
1304       does not fill the data structures returned by "parse".
1305
1306   meta_info
1307        @flags = $csv->meta_info ();
1308
1309       This method returns the "flags" of the input to "combine" or the flags
1310       of the resultant  decomposed fields of  "parse",   whichever was called
1311       more recently.
1312
1313       For each field,  a meta_info field will hold  flags that  inform
1314       something about  the  field  returned  by  the  "fields"  method or
1315       passed to  the "combine" method. The flags are bit-wise-"or"'d like:
1316
1317       0x0001
1318         The field was quoted.
1319
1320       0x0002
1321         The field was binary.
1322
1323       See the "is_***" methods below.
1324
1325   is_quoted
1326        my $quoted = $csv->is_quoted ($column_idx);
1327
1328       Where  $column_idx is the  (zero-based)  index of the column in the
1329       last result of "parse".
1330
1331       This returns a true value  if the data in the indicated column was
1332       enclosed in "quote_char" quotes.  This might be important for fields
1333       where content ",20070108," is to be treated as a numeric value,  and
1334       where ","20070108"," is explicitly marked as character string data.
1335
1336       This method is only valid when "keep_meta_info" is set to a true value.
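
       A minimal sketch of "is_quoted",  using the date-like value from the
       text above once unquoted and once quoted.  Note that "keep_meta_info"
       has to be enabled on the object.

```perl
use strict;
use warnings;
use Text::CSV_PP;

my $csv = Text::CSV_PP->new ({ binary => 1, keep_meta_info => 1 });

# Example input (an assumption): same content, unquoted and quoted
$csv->parse ('20070108,"20070108"') or die "" . $csv->error_diag ();

my @fields = $csv->fields ();
my @quoted = map { $csv->is_quoted ($_) ? 1 : 0 } 0 .. $#fields;

for my $idx (0 .. $#fields) {
    printf "field %d (%s) was %s\n", $idx, $fields[$idx],
        $quoted[$idx] ? "quoted" : "not quoted";
    }
```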
1337
1338   is_binary
1339        my $binary = $csv->is_binary ($column_idx);
1340
1341       Where  $column_idx is the  (zero-based)  index of the column in the
1342       last result of "parse".
1343
1344       This returns a true value if the data in the indicated column contained
1345       any byte in the range "[\x00-\x08,\x10-\x1F,\x7F-\xFF]".
1346
1347       This method is only valid when "keep_meta_info" is set to a true value.
1348
1349   is_missing
1350        my $missing = $csv->is_missing ($column_idx);
1351
1352       Where  $column_idx is the  (zero-based)  index of the column in the
1353       last result of "getline_hr".
1354
1355        $csv->keep_meta_info (1);
1356        while (my $hr = $csv->getline_hr ($fh)) {
1357            $csv->is_missing (0) and next; # This was an empty line
1358            }
1359
1360       When using  "getline_hr",  it is impossible to tell if the  parsed
1361       fields are "undef" because they were not filled in the "CSV" stream
1362       or because they were not read at all, as all the fields defined by
1363       "column_names" are set in the hash-ref.    If you still need to know if
1364       all fields in each row are provided, you should enable "keep_meta_info"
1365       so you can check the flags.
1366
1367       If  "keep_meta_info"  is "false",  "is_missing"  will always return
1368       "undef", regardless of $column_idx being valid or not. If this
1369       attribute is "true" it will return either 0 (the field is present) or 1
1370       (the field is missing).
1371
1372       A special case is the empty line.  If the line is completely empty -
1373       after dealing with the flags - this is still a valid CSV line:  it is a
1374       record of just one single empty field. However, if "keep_meta_info" is
1375       set, invoking "is_missing" with index 0 will now return true.
1376
1377   status
1378        $status = $csv->status ();
1379
1380       This method returns the status of the last invoked "combine" or "parse"
1381       call. Status is success (true: 1) or failure (false: "undef" or 0).
1382
1383   error_input
1384        $bad_argument = $csv->error_input ();
1385
1386       This method returns the erroneous argument (if it exists) of "combine"
1387       or "parse",  whichever was called more recently.  If the last
1388       invocation was successful, "error_input" will return "undef".
1389
1390   error_diag
1391        Text::CSV_PP->error_diag ();
1392        $csv->error_diag ();
1393        $error_code               = 0  + $csv->error_diag ();
1394        $error_str                = "" . $csv->error_diag ();
1395        ($cde, $str, $pos, $rec, $fld) = $csv->error_diag ();
1396
1397       If (and only if) an error occurred,  this function returns  the
1398       diagnostics of that error.
1399
1400       If called in void context,  this will print the internal error code and
1401       the associated error message to STDERR.
1402
1403       If called in list context,  this will return  the error code  and the
1404       error message in that order.  If the last error was from parsing, the
1405       rest of the values returned are a best guess at the location  within
1406       the line  that was being parsed. Their values are 1-based.  The
1407       position currently is index of the byte at which the parsing failed in
1408       the current record. It might change to be the index of the current
1409       character in a later release. The records is the index of the record
1410       parsed by the csv instance. The field number is the index of the field
1411       the parser thinks it is currently  trying to  parse. See
1412       examples/csv-check for how this can be used.
1413
1414       If called in  scalar context,  it will return  the diagnostics  in a
1415       single scalar, a-la $!.  It will contain the error code in numeric
1416       context, and the diagnostics message in string context.
1417
1418       When called as a class method or a  direct function call,  the
1419       diagnostics are that of the last "new" call.
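
       The list-context form can be sketched like this.  The malformed input
       is an example chosen to make "parse" fail; the exact code and message
       reported depend on the kind of error.

```perl
use strict;
use warnings;
use Text::CSV_PP;

# auto_diag is left off, so the error is only reported when asked for
my $csv = Text::CSV_PP->new ({ binary => 1 });

# Example input (an assumption): a quoted field that is never closed
my $status = $csv->parse ('1,"unterminated');

# Code, message, byte position, record number, field number
my ($cde, $str, $pos, $rec, $fld) = $csv->error_diag ();

unless ($status) {
    printf STDERR "error %s: %s (pos %s, record %s, field %s)\n",
        $cde, $str, $pos // "?", $rec // "?", $fld // "?";
    }
```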
1420
1421   record_number
1422        $recno = $csv->record_number ();
1423
1424       Returns the records parsed by this csv instance.  This value should be
1425       more accurate than $. when embedded newlines come into play. Records
1426       written by this instance are not counted.
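
       A small sketch of the difference between records and lines.  The data
       is an example: two records spread over three physical lines because
       of an embedded newline.

```perl
use strict;
use warnings;
use Text::CSV_PP;

# Example data (an assumption): the first record spans two lines
my $data = qq{a,"line 1\nline 2"\nb,c\n};
open my $fh, "<", \$data or die "in-memory handle: $!";

my $csv = Text::CSV_PP->new ({ binary => 1, auto_diag => 1 });

my $first  = $csv->getline ($fh);
my $second = $csv->getline ($fh);
my $recno  = $csv->record_number ();
close $fh;

print "records parsed so far: $recno\n";
```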
1427
1428   SetDiag
1429        $csv->SetDiag (0);
1430
1431       Use to reset the diagnostics if you are dealing with errors.
1432

FUNCTIONS

1434       This section is also taken from Text::CSV_XS.
1435
1436   csv
1437       This function is not exported by default and should be explicitly
1438       requested:
1439
1440        use Text::CSV_PP qw( csv );
1441
1442       This is a high-level function that aims at simple (user) interfaces.
1443       This can be used to read/parse a "CSV" file or stream (the default
1444       behavior) or to produce a file or write to a stream (define the  "out"
1445       attribute).  It returns an array- or hash-reference on parsing (or
1446       "undef" on fail) or the numeric value of  "error_diag"  on writing.
1447       When this function fails you can get to the error using the class call
1448       to "error_diag"
1449
1450        my $aoa = csv (in => "test.csv") or
1451            die Text::CSV_PP->error_diag;
1452
1453       This function takes the arguments as key-value pairs. This can be
1454       passed as a list or as an anonymous hash:
1455
1456        my $aoa = csv (  in => "test.csv", sep_char => ";");
1457        my $aoh = csv ({ in => $fh, headers => "auto" });
1458
1459       The arguments passed consist of two parts:  the arguments to "csv"
1460       itself and the optional attributes to the  "CSV"  object used inside
1461       the function as enumerated and explained in "new".
1462
1463       If not overridden, the default options used for CSV are
1464
1465        auto_diag   => 1
1466        escape_null => 0
1467
1468       The option that is always set and cannot be altered is
1469
1470        binary      => 1
1471
1472       As this function will likely be used in one-liners,  it allows  "quote"
1473       to be abbreviated as "quo",  and  "escape_char" to be abbreviated as
1474       "esc" or "escape".
1475
1476       Alternative invocations:
1477
1478        my $aoa = Text::CSV_PP::csv (in => "file.csv");
1479
1480        my $csv = Text::CSV_PP->new ();
1481        my $aoa = $csv->csv (in => "file.csv");
1482
1483       In the latter case, the object attributes are used from the existing
1484       object and the attribute arguments in the function call are ignored:
1485
1486        my $csv = Text::CSV_PP->new ({ sep_char => ";" });
1487        my $aoh = $csv->csv (in => "file.csv", bom => 1);
1488
1489       will parse using ";" as "sep_char", not ",".
1490
1491       in
1492
1493       Used to specify the source.  "in" can be a file name (e.g. "file.csv"),
1494       which will be  opened for reading  and closed when finished,  a file
1495       handle (e.g.  $fh or "FH"),  a reference to a glob (e.g. "\*ARGV"),
1496       the glob itself (e.g. *STDIN), or a reference to a scalar (e.g.
1497       "\q{1,2,"csv"}").
1498
1499       When used with "out", "in" should be a reference to a CSV structure
1500       (AoA or AoH)  or a CODE-ref that returns an array-reference or a hash-
1501       reference.  The code-ref will be invoked with no arguments.
1502
1503        my $aoa = csv (in => "file.csv");
1504
1505        open my $fh, "<", "file.csv";
1506        my $aoa = csv (in => $fh);
1507
1508        my $csv = [ [qw( Foo Bar )], [ 1, 2 ], [ 2, 3 ]];
1509        my $err = csv (in => $csv, out => "file.csv");
1510
1511       If called in void context without the "out" attribute, the resulting
1512       ref will be used as input to a subsequent call to csv:
1513
1514        csv (in => "file.csv", filter => { 2 => sub { length > 2 }})
1515
1516       will be a shortcut to
1517
1518        csv (in => csv (in => "file.csv", filter => { 2 => sub { length > 2 }}))
1519
1520       where, in the absence of the "out" attribute, this is a shortcut to
1521
1522        csv (in  => csv (in => "file.csv", filter => { 2 => sub { length > 2 }}),
1523             out => *STDOUT)
1524
1525       out
1526
1527        csv (in => $aoa, out => "file.csv");
1528        csv (in => $aoa, out => $fh);
1529        csv (in => $aoa, out =>   STDOUT);
1530        csv (in => $aoa, out =>  *STDOUT);
1531        csv (in => $aoa, out => \*STDOUT);
1532        csv (in => $aoa, out => \my $data);
1533        csv (in => $aoa, out =>  undef);
1534        csv (in => $aoa, out => \"skip");
1535
1536       In output mode, the default CSV options when producing CSV are
1537
1538        eol       => "\r\n"
1539
1540       The "fragment" attribute is ignored in output mode.
1541
1542       "out" can be a file name  (e.g.  "file.csv"),  which will be opened for
1543       writing and closed when finished,  a file handle (e.g. $fh or "FH"),  a
1544       reference to a glob (e.g. "\*STDOUT"),  the glob itself (e.g. *STDOUT),
1545       or a reference to a scalar (e.g. "\my $data").
1546
1547        csv (in => sub { $sth->fetch },            out => "dump.csv");
1548        csv (in => sub { $sth->fetchrow_hashref }, out => "dump.csv",
1549             headers => $sth->{NAME_lc});
1550
1551       When a code-ref is used for "in", the output is generated  per
1552       invocation, so no buffering is involved. This implies that there is no
1553       size restriction on the number of records. The "csv" function ends when
1554       the coderef returns a false value.
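
       As a sketch of the code-ref form without a database handle, the
       example below feeds rows from a plain array and writes to a scalar.
       The @queue contents are invented for the example.

```perl
use strict;
use warnings;
use Text::CSV_PP qw( csv );

# Example data (an assumption): rows handed out one per invocation
my @queue = ([qw( code name )], [ 1, "apple" ], [ 2, "pear" ]);

# shift returns undef once @queue is empty, which ends the csv call
csv (in => sub { shift @queue }, out => \my $out);

print $out;
```

       Each line in $out ends in the output default "eol" of "\r\n".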
1555
1556       If "out" is set to a reference of the literal string "skip", the output
1557       will be suppressed completely,  which might be useful in combination
1558       with a filter for side effects only.
1559
1560        my %cache;
1561        csv (in    => "dump.csv",
1562             out   => \"skip",
1563             on_in => sub { $cache{$_[1][1]}++ });
1564
1565       Currently,  setting "out" to any false value  ("undef", "", 0) will be
1566       equivalent to "\"skip"".
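
           As a minimal sketch of the scalar-reference form of "out", the
           following writes a small array of arrays into memory and makes
           the default "\r\n" line ending visible. The sample rows and the
           variable names ($aoa, $data, $visible) are illustrative
           assumptions, not part of the module.

```perl
use strict;
use warnings;
use Text::CSV_PP qw( csv );

# Hypothetical sample rows; the fields avoid spaces and quotes,
# so no quoting is needed in the output
my $aoa = [
    [ "code", "name" ],
    [ 1,      "foo"  ],
    [ 2,      "bar"  ],
    ];

# Write into an in-memory scalar instead of a file
my $data = "";
csv (in => $aoa, out => \$data);

# Make the default "\r\n" line endings visible when printing
(my $visible = $data) =~ s/\r\n/<CRLF>\n/g;
print $visible;
```

           Passing an explicit "eol" attribute should override the default
           line ending if "\r\n" is not wanted.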
1567
1568       encoding
1569
1570       If passed,  it should be an encoding accepted by the  ":encoding()"
1571       option to "open". There is no default value. This attribute does not
1572       work in perl 5.6.x.  "encoding" can be abbreviated to "enc" for ease of
1573       use in command line invocations.
1574
1575       If "encoding" is set to the literal value "auto", the method "header"
1576       will be invoked on the opened stream to check if there is a BOM and set
1577       the encoding accordingly.   This is equal to passing a true value in
1578       the option "detect_bom".
1579
1580       detect_bom
1581
1582       If  "detect_bom"  is given, the method  "header"  will be invoked on
1583       the opened stream to check if there is a BOM and set the encoding
1584       accordingly.
1585
1586       "detect_bom" can be abbreviated to "bom".
1587
1588       This is the same as setting "encoding" to "auto".
1589
1590       Note that as the method  "header" is invoked,  its default is to also
1591       set the headers.
1592
1593       headers
1594
1595       If this attribute is not given, the default behavior is to produce an
1596       array of arrays.
1597
1598       If "headers" is supplied,  it should be an anonymous list of column
1599       names, an anonymous hashref, a coderef, or a literal flag:  "auto",
1600       "lc", "uc", or "skip".
1601
1602       skip
1603         When "skip" is used, the header will not be included in the output.
1604
1605          my $aoa = csv (in => $fh, headers => "skip");
1606
1607       auto
1608         If "auto" is used, the first line of the "CSV" source will be read as
1609         the list of field headers and used to produce an array of hashes.
1610
1611          my $aoh = csv (in => $fh, headers => "auto");
1612
1613       lc
1614         If "lc" is used,  the first line of the  "CSV" source will be read as
1615         the list of field headers mapped to  lower case and used to produce
1616         an array of hashes. This is a variation of "auto".
1617
1618          my $aoh = csv (in => $fh, headers => "lc");
1619
1620       uc
1621         If "uc" is used,  the first line of the  "CSV" source will be read as
1622         the list of field headers mapped to  upper case and used to produce
1623         an array of hashes. This is a variation of "auto".
1624
1625          my $aoh = csv (in => $fh, headers => "uc");
1626
1627       CODE
1628         If a coderef is used,  the first line of the  "CSV" source will be
1629         read as the list of mangled field headers in which each field is
1630         passed as the only argument to the coderef. This list is used to
1631         produce an array of hashes.
1632
1633          my $aoh = csv (in      => $fh,
1634                         headers => sub { lc ($_[0]) =~ s/kode/code/gr });
1635
1636         this example is a variation of using "lc" where all occurrences of
1637         "kode" are replaced with "code".
1638
1639       ARRAY
1640         If  "headers"  is an anonymous list,  the entries in the list will be
1641         used as field names. The first line is considered data instead of
1642         headers.
1643
1644          my $aoh = csv (in => $fh, headers => [qw( Foo Bar )]);
1645          csv (in => $aoa, out => $fh, headers => [qw( code description price )]);
1646
1647       HASH
1648         If "headers" is a hash reference, this implies "auto", but header
1649         fields that exist as keys in the hashref will be replaced by the
1650         value for that key. Given a CSV file like
1651
1652          post-kode,city,name,id number,fubble
1653          1234AA,Duckstad,Donald,13,"X313DF"
1654
1655         using
1656
1657          csv (headers => { "post-kode" => "pc", "id number" => "ID" }, ...
1658
1659         will return an entry like
1660
1661          { pc     => "1234AA",
1662            city   => "Duckstad",
1663            name   => "Donald",
1664            ID     => "13",
1665            fubble => "X313DF",
1666            }
1667
1668       See also "munge_column_names" and "set_column_names".
1669
1670       munge_column_names
1671
1672       If "munge_column_names" is set,  the method  "header"  is invoked on
1673       the opened stream with all matching arguments to detect and set the
1674       headers.
1675
1676       "munge_column_names" can be abbreviated to "munge".
1677
1678       key
1679
1680       If passed,  will default  "headers"  to "auto" and return a hashref
1681       instead of an array of hashes. Allowed values are simple scalars or
1682       array-references where the first element is the joiner and the rest are
1683       the fields to join to combine the key.
1684
1685        my $ref = csv (in => "test.csv", key => "code");
1686        my $ref = csv (in => "test.csv", key => [ ":" => "code", "color" ]);
1687
1688       with test.csv like
1689
1690        code,product,price,color
1691        1,pc,850,gray
1692        2,keyboard,12,white
1693        3,mouse,5,black
1694
1695       the first example will return
1696
1697         { 1   => {
1698               code    => 1,
1699               color   => 'gray',
1700               price   => 850,
1701               product => 'pc'
1702               },
1703           2   => {
1704               code    => 2,
1705               color   => 'white',
1706               price   => 12,
1707               product => 'keyboard'
1708               },
1709           3   => {
1710               code    => 3,
1711               color   => 'black',
1712               price   => 5,
1713               product => 'mouse'
1714               }
1715           }
1716
1717       the second example will return
1718
1719         { "1:gray"    => {
1720               code    => 1,
1721               color   => 'gray',
1722               price   => 850,
1723               product => 'pc'
1724               },
1725           "2:white"   => {
1726               code    => 2,
1727               color   => 'white',
1728               price   => 12,
1729               product => 'keyboard'
1730               },
1731           "3:black"   => {
1732               code    => 3,
1733               color   => 'black',
1734               price   => 5,
1735               product => 'mouse'
1736               }
1737           }
1738
1739       The "key" attribute can be combined with "headers" for "CSV" data that
1740       has no header line, like
1741
1742        my $ref = csv (
1743            in      => "foo.csv",
1744            headers => [qw( c_foo foo bar description stock )],
1745            key     =>     "c_foo",
1746            );
1747
1748       value
1749
1750       Used to create key-value hashes.
1751
1752       Only allowed when "key" is valid. A "value" can be either a single
1753       column label or an anonymous list of column labels.  In the first case,
1754       the value will be a simple scalar value, in the latter case, it will be
1755       a hashref.
1756
1757        my $ref = csv (in => "test.csv", key   => "code",
1758                                         value => "price");
1759        my $ref = csv (in => "test.csv", key   => "code",
1760                                         value => [ "product", "price" ]);
1761        my $ref = csv (in => "test.csv", key   => [ ":" => "code", "color" ],
1762                                         value => "price");
1763        my $ref = csv (in => "test.csv", key   => [ ":" => "code", "color" ],
1764                                         value => [ "product", "price" ]);
1765
1766       with test.csv like
1767
1768        code,product,price,color
1769        1,pc,850,gray
1770        2,keyboard,12,white
1771        3,mouse,5,black
1772
1773       the first example will return
1774
1775         { 1 => 850,
1776           2 =>  12,
1777           3 =>   5,
1778           }
1779
1780       the second example will return
1781
1782         { 1   => {
1783               price   => 850,
1784               product => 'pc'
1785               },
1786           2   => {
1787               price   => 12,
1788               product => 'keyboard'
1789               },
1790           3   => {
1791               price   => 5,
1792               product => 'mouse'
1793               }
1794           }
1795
1796       the third example will return
1797
1798         { "1:gray"    => 850,
1799           "2:white"   =>  12,
1800           "3:black"   =>   5,
1801           }
1802
1803       the fourth example will return
1804
1805         { "1:gray"    => {
1806               price   => 850,
1807               product => 'pc'
1808               },
1809           "2:white"   => {
1810               price   => 12,
1811               product => 'keyboard'
1812               },
1813           "3:black"   => {
1814               price   => 5,
1815               product => 'mouse'
1816               }
1817           }
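
           The "key" and "value" combinations above can be tried without
           a file on disk by passing the sample data as a scalar reference
           to "in". This is only a sketch: the variable names ($data,
           $price, $info) are assumptions made for the illustration.

```perl
use strict;
use warnings;
use Text::CSV_PP qw( csv );

# The test.csv content from the text, held in memory
my $data = join "\n",
    "code,product,price,color",
    "1,pc,850,gray",
    "2,keyboard,12,white",
    "3,mouse,5,black",
    "";

# Simple key, scalar value: code => price
my $price = csv (in => \$data, key => "code", value => "price");

# Joined key, list value: "code:color" => { product, price }
my $info = csv (
    in    => \$data,
    key   => [ ":" => "code", "color" ],
    value => [ "product", "price" ],
    );

print "price of item 2: $price->{2}\n";
print "2:white is a $info->{'2:white'}{product}\n";
```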
1818
1819       keep_headers
1820
1821       When using hashes, store the column names in the arrayref passed, so
1822       all headers are available after the call in the original order.
1823
1824        my $aoh = csv (in => "file.csv", keep_headers => \my @hdr);
1825
1826       This attribute can be abbreviated to "kh" or passed as
1827       "keep_column_names".
1828
1829       This attribute implies a default of "auto" for the "headers" attribute.
1830
1831       fragment
1832
1833       Only output the fragment as defined in the "fragment" method. This
1834       option is ignored when generating "CSV". See "out".
1835
1836       Combining all of them could give something like
1837
1838        use Text::CSV_PP qw( csv );
1839        my $aoh = csv (
1840            in       => "test.txt",
1841            encoding => "utf-8",
1842            headers  => "auto",
1843            sep_char => "|",
1844            fragment => "row=3;6-9;15-*",
1845            );
1846        say $aoh->[15]{Foo};
1847
1848       sep_set
1849
1850       If "sep_set" is set, the method "header" is invoked on the opened
1851       stream to detect and set "sep_char" with the given set.
1852
1853       "sep_set" can be abbreviated to "seps".
1854
1855       Note that as the  "header" method is invoked,  its default is to also
1856       set the headers.
1857
1858       set_column_names
1859
1860       If  "set_column_names" is passed,  the method "header" is invoked on
1861       the opened stream with all arguments meant for "header".
1862
1863       If "set_column_names" is passed as a false value, the content of the
1864       first row is only preserved if the output is AoA:
1865
1866       With an input-file like
1867
1868        bAr,foo
1869        1,2
1870        3,4,5
1871
1872       This call
1873
1874        my $aoa = csv (in => $file, set_column_names => 0);
1875
1876       will result in
1877
1878        [[ "bar", "foo"     ],
1879         [ "1",   "2"       ],
1880         [ "3",   "4",  "5" ]]
1881
1882       and
1883
1884        my $aoa = csv (in => $file, set_column_names => 0, munge => "none");
1885
1886       will result in
1887
1888        [[ "bAr", "foo"     ],
1889         [ "1",   "2"       ],
1890         [ "3",   "4",  "5" ]]
1891
1892   Callbacks
1893       Callbacks enable actions triggered from the inside of Text::CSV_PP.
1894
1895       While most of what this enables can easily be done in an unrolled
1896       loop as described in the "SYNOPSIS", callbacks can be used to meet
1897       special demands or enhance the "csv" function.
1898
1899       error
1900          $csv->callbacks (error => sub { $csv->SetDiag (0) });
1901
1902         the "error"  callback is invoked when an error occurs,  but  only
1903         when "auto_diag" is set to a true value. A callback is invoked with
1904         the values returned by "error_diag":
1905
1906          my ($c, $s);
1907
1908          sub ignore3006
1909          {
1910              my ($err, $msg, $pos, $recno, $fldno) = @_;
1911              if ($err == 3006) {
1912                  # ignore this error
1913                  ($c, $s) = (undef, undef);
1914                  Text::CSV_PP->SetDiag (0);
1915                  }
1916              # Any other error
1917              return;
1918              } # ignore3006
1919
1920          $csv->callbacks (error => \&ignore3006);
1921          $csv->bind_columns (\$c, \$s);
1922          while ($csv->getline ($fh)) {
1923              # Error 3006 will not stop the loop
1924              }
1925
1926       after_parse
1927          $csv->callbacks (after_parse => sub { push @{$_[1]}, "NEW" });
1928          while (my $row = $csv->getline ($fh)) {
1929              $row->[-1] eq "NEW";
1930              }
1931
1932         This callback is invoked after parsing with  "getline"  only if no
1933         error occurred.  The callback is invoked with two arguments:   the
1934         current "CSV" parser object and an array reference to the fields
1935         parsed.
1936
1937         The return code of the callback is ignored  unless it is a reference
1938         to the string "skip", in which case the record will be skipped in
1939         "getline_all".
1940
1941          sub add_from_db
1942          {
1943              my ($csv, $row) = @_;
1944              $sth->execute ($row->[4]);
1945              push @$row, $sth->fetchrow_array;
1946              } # add_from_db
1947
1948          my $aoa = csv (in => "file.csv", callbacks => {
1949              after_parse => \&add_from_db });
1950
1951         This hook can be used for validation:
1952
1953         FAIL
1954           Die if any of the records does not validate a rule:
1955
1956            after_parse => sub {
1957                $_[1][4] =~ m/^[0-9]{4}\s?[A-Z]{2}$/ or
1958                    die "5th field does not have a valid Dutch zipcode";
1959                }
1960
1961         DEFAULT
1962           Replace invalid fields with a default value:
1963
1964            after_parse => sub { $_[1][2] =~ m/^\d+$/ or $_[1][2] = 0 }
1965
1966         SKIP
1967           Skip records that have invalid fields (only applies to
1968           "getline_all"):
1969
1970            after_parse => sub { $_[1][0] =~ m/^\d+$/ or return \"skip"; }
1971
1972       before_print
1973          my $idx = 1;
1974          $csv->callbacks (before_print => sub { $_[1][0] = $idx++ });
1975          $csv->print (*STDOUT, [ 0, $_ ]) for @members;
1976
1977         This callback is invoked  before printing with  "print"  only if no
1978         error occurred.  The callback is invoked with two arguments:  the
1979         current  "CSV" parser object and an array reference to the fields
1980         passed.
1981
1982         The return code of the callback is ignored.
1983
1984          sub max_4_fields
1985          {
1986              my ($csv, $row) = @_;
1987              @$row > 4 and splice @$row, 4;
1988              } # max_4_fields
1989
1990          csv (in => csv (in => "file.csv"), out => *STDOUT,
1991              callbacks => { before_print => \&max_4_fields });
1992
1993         This callback is not active for "combine".
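
           The "after_parse" and "before_print" hooks described above can
           be combined on one object. The sketch below reads from and
           writes to in-memory file handles; the sample input, the choice
           of the second field for the DEFAULT rule, and the limit of two
           fields are assumptions made for the illustration.

```perl
use strict;
use warnings;
use Text::CSV_PP;

my $csv = Text::CSV_PP->new ({ binary => 1, auto_diag => 1 });

# after_parse:  replace a non-numeric 2nd field with 0 (the DEFAULT rule)
# before_print: keep at most 2 fields per record
$csv->callbacks (
    after_parse  => sub { $_[1][1] =~ m/^\d+$/ or $_[1][1] = 0 },
    before_print => sub { @{$_[1]} > 2 and splice @{$_[1]}, 2 },
    );

my $input = "1,5,x\n2,n/a,y\n";    # hypothetical sample data
open my $in, "<", \$input or die "in-memory read: $!";
my @rows;
while (my $row = $csv->getline ($in)) {
    push @rows, $row;
    }
close $in;

my $output = "";
open my $out, ">", \$output or die "in-memory write: $!";
$csv->eol ("\n");
$csv->print ($out, $_) for @rows;
close $out or die "in-memory write: $!";

print $output;    # 1,5 and 2,0 on separate lines
```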
1994
1995       Callbacks for csv ()
1996
1997       The "csv" function allows for some callbacks that do not integrate in
1998       XS internals but are only available in the "csv" function.
1999
2000         csv (in        => "file.csv",
2001              callbacks => {
2002                  filter       => { 6 => sub { $_ > 15 } },    # first
2003                  after_parse  => sub { say "AFTER PARSE";  }, # first
2004                  after_in     => sub { say "AFTER IN";     }, # second
2005                  on_in        => sub { say "ON IN";        }, # third
2006                  },
2007              );
2008
2009         csv (in        => $aoh,
2010              out       => "file.csv",
2011              callbacks => {
2012                  on_in        => sub { say "ON IN";        }, # first
2013                  before_out   => sub { say "BEFORE OUT";   }, # second
2014                  before_print => sub { say "BEFORE PRINT"; }, # third
2015                  },
2016              );
2017
2018       filter
2019         This callback can be used to filter records.  It is called just after
2020         a new record has been scanned.  The callback accepts a:
2021
2022         hashref
2023           The keys are the index to the row (the field name or field number,
2024           1-based) and the values are subs to return a true or false value.
2025
2026            csv (in => "file.csv", filter => {
2027                       3 => sub { m/a/ },       # third field should contain an "a"
2028                       5 => sub { length > 4 }, # length of the 5th field minimal 5
2029                       });
2030
2031            csv (in => "file.csv", filter => { foo => sub { $_ > 4 }});
2032
2033           If the keys to the filter hash contain any character that is not a
2034           digit it will also implicitly set "headers" to "auto"  unless
2035           "headers"  was already passed as argument.  When headers are
2036           active, returning an array of hashes, the filter is not applicable
2037           to the header itself.
2038
2039           All sub results should match, as in AND.
2040
2041           The context of the callback sets  $_ localized to the field
2042           indicated by the filter. The two arguments are as with all other
2043           callbacks, so the other fields in the current row can be seen:
2044
2045            filter => { 3 => sub { $_ > 100 ? $_[1][1] =~ m/A/ : $_[1][6] =~ m/B/ }}
2046
2047           If the context is set to return a list of hashes  ("headers" is
2048           defined), the current record will also be available in the
2049           localized %_:
2050
2051            filter => { 3 => sub { $_ > 100 && $_{foo} =~ m/A/ && $_{bar} < 1000  }}
2052
2053           If the filter is used to alter the content by changing $_,  make
2054           sure that the sub returns true in order not to have that record
2055           skipped:
2056
2057            filter => { 2 => sub { $_ = uc }}
2058
2059           will upper-case the second field, and then skip it if the resulting
2060           content evaluates to false. To always accept, end with truth:
2061
2062            filter => { 2 => sub { $_ = uc; 1 }}
2063
2064         coderef
2065            csv (in => "file.csv", filter => sub { $n++; 0; });
2066
2067           If the argument to "filter" is a coderef,  it is an alias or
2068           shortcut to a filter on column 0:
2069
2070            csv (filter => sub { $n++; 0 });
2071
2072           is equal to
2073
2074            csv (filter => { 0 => sub { $n++; 0 } });
2075
2076         filter-name
2077            csv (in => "file.csv", filter => "not_blank");
2078            csv (in => "file.csv", filter => "not_empty");
2079            csv (in => "file.csv", filter => "filled");
2080
2081           These are predefined filters
2082
2083           Given a file like (line numbers prefixed for doc purpose only):
2084
2085            1:1,2,3
2086            2:
2087            3:,
2088            4:""
2089            5:,,
2090            6:, ,
2091            7:"",
2092            8:" "
2093            9:4,5,6
2094
2095           not_blank
2096             Filter out the blank lines
2097
2098             This filter is a shortcut for
2099
2100              filter => { 0 => sub { @{$_[1]} > 1 or
2101                          defined $_[1][0] && $_[1][0] ne "" } }
2102
2103             Due to the implementation,  it is currently impossible to also
2104             filter lines that consist only of a quoted empty field. These
2105             lines are also considered blank lines.
2106
2107             With the given example, lines 2 and 4 will be skipped.
2108
2109           not_empty
2110             Filter out lines where all the fields are empty.
2111
2112             This filter is a shortcut for
2113
2114              filter => { 0 => sub { grep { defined && $_ ne "" } @{$_[1]} } }
2115
2116             A space is not regarded as being empty, so given the example data,
2117             lines 2, 3, 4, 5, and 7 are skipped.
2118
2119           filled
2120             Filter out lines that have no visible data
2121
2122             This filter is a shortcut for
2123
2124              filter => { 0 => sub { grep { defined && m/\S/ } @{$_[1]} } }
2125
2126             This filter rejects all lines that do not have at least one field
2127             that does not evaluate to the empty string.
2128
2129             With the given example data, this filter would skip lines 2
2130             through 8.
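
           As a sketch of the hashref and filter-name forms described
           above, the following runs a named-column filter and the
           predefined "filled" filter on in-memory data. The variable
           names and the "price" column are assumptions taken from the
           sample data used elsewhere in this document.

```perl
use strict;
use warnings;
use Text::CSV_PP qw( csv );

# Hashref filter on a named column; a non-numeric key implies headers => "auto"
my $priced = join "\n",
    "code,product,price,color",
    "1,pc,850,gray",
    "2,keyboard,12,white",
    "3,mouse,5,black",
    "";
my $aoh = csv (in => \$priced, filter => { price => sub { $_ > 10 } });
print "kept: $_->{product}\n" for @$aoh;

# The nine-line example file from the text, without the line-number prefixes
my $sparse = qq{1,2,3\n\n,\n""\n,,\n, ,\n"",\n" "\n4,5,6\n};
my $aoa = csv (in => \$sparse, filter => "filled");
print "filled rows: ", scalar @$aoa, "\n";
```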
2131
2132       after_in
2133         This callback is invoked for each record after all records have been
2134         parsed but before returning the reference to the caller.  The hook is
2135         invoked with two arguments:  the current  "CSV"  parser object  and a
2136         reference to the record.   The reference can be a reference to a
2137         HASH  or a reference to an ARRAY as determined by the arguments.
2138
2139         This callback can also be passed as  an attribute without the
2140         "callbacks" wrapper.
2141
2142       before_out
2143         This callback is invoked for each record before the record is
2144         printed.  The hook is invoked with two arguments:  the current "CSV"
2145         parser object and a reference to the record.   The reference can be a
2146         reference to a  HASH or a reference to an ARRAY as determined by the
2147         arguments.
2148
2149         This callback can also be passed as an attribute  without the
2150         "callbacks" wrapper.
2151
2152         This callback makes the row available in %_ if the row is a hashref.
2153         In this case %_ is writable and will change the original row.
2154
2155       on_in
2156         This callback acts exactly as the "after_in" or the "before_out"
2157         hooks.
2158
2159         This callback can also be passed as an attribute  without the
2160         "callbacks" wrapper.
2161
2162         This callback makes the row available in %_ if the row is a hashref.
2163         In this case %_ is writable and will change the original row. So e.g.
2164         with
2165
2166           my $aoh = csv (
2167               in      => \"foo\n1\n2\n",
2168               headers => "auto",
2169               on_in   => sub { $_{bar} = 2; },
2170               );
2171
2172         $aoh will be:
2173
2174           [ { foo => 1,
2175               bar => 2,
2176               },
2177             { foo => 2,
2178               bar => 2,
2179               }
2180             ]
2181
2182       csv
2183         The function "csv" can also be called as a method or with an
2184         existing Text::CSV_PP object. This could help if the function is to
2185         be invoked many times:  passing an existing instance avoids the
2186         overhead of creating the object internally over and over again.
2188
2189          my $csv = Text::CSV_PP->new ({ binary => 1, auto_diag => 1 });
2190
2191          my $aoa = $csv->csv (in => $fh);
2192          my $aoa = csv (in => $fh, csv => $csv);
2193
2194         both act the same. Running this 20000 times on a 20-line CSV file
2195         showed a 53% speedup.
2196

DIAGNOSTICS

2198       This section is also taken from Text::CSV_XS.
2199
2200       Still under construction ...
2201
2202       If an error occurs,  "$csv->error_diag" can be used to get information
2203       on the cause of the failure. Note that for speed reasons the internal
2204       value is never cleared on success,  so using the value returned by
2205       "error_diag" in normal cases - when no error occurred - may cause
2206       unexpected results.
2207
2208       If the constructor failed, the cause can be found using "error_diag" as
2209       a class method, like "Text::CSV_PP->error_diag".
2210
2211       The "$csv->error_diag" method is automatically invoked upon error when
2212       the constructor was called with  "auto_diag"  set to  1 or 2, or when
2213       autodie is in effect.  When set to 1, this will cause a "warn" with the
2214       error message,  when set to 2, it will "die". "2012 - EOF" is excluded
2215       from "auto_diag" reports.
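
           A minimal sketch of both uses of "error_diag" follows: as a
           class method after a failed constructor call, and as an object
           method after a failed "parse". The invalid attribute and the
           unterminated input string are assumptions chosen only to
           provoke an error.

```perl
use strict;
use warnings;
use Text::CSV_PP;

# Constructor failure: a sep_char equal to the default quote_char is rejected
my $bad = Text::CSV_PP->new ({ sep_char => '"' });
my ($new_code, $new_msg) = Text::CSV_PP->error_diag;
defined $bad or print "new () failed: $new_code - $new_msg\n";

# Parse failure: the quoted field is never closed
my $csv = Text::CSV_PP->new ({ binary => 1 });
my $ok  = $csv->parse (q{1,"never closed});
my ($code, $msg, $pos) = $csv->error_diag;
$ok or print "parse () failed: $code - $msg (position $pos)\n";
```

           Since the internal value is not cleared on success, the sketch
           only reads "error_diag" right after a call that reported
           failure.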
2216
2217       Errors can be (individually) caught using the "error" callback.
2218
2219       The errors as described below are available. I have tried to make the
2220       error itself explanatory enough, but more descriptions will be added.
2221       For most of these errors, the first three capitals describe the error
2222       category:
2223
2224       • INI
2225
2226         Initialization error or option conflict.
2227
2228       • ECR
2229
2230         Carriage-Return related parse error.
2231
2232       • EOF
2233
2234         End-Of-File related parse error.
2235
2236       • EIQ
2237
2238         Parse error inside quotation.
2239
2240       • EIF
2241
2242         Parse error inside field.
2243
2244       • ECB
2245
2246         Combine error.
2247
2248       • EHR
2249
2250         HashRef parse related error.
2251
2252       And below should be the complete list of error codes that can be
2253       returned:
2254
2255       • 1001 "INI - sep_char is equal to quote_char or escape_char"
2256
2257         The  separation character  cannot be equal to  the quotation
2258         character or to the escape character,  as this would invalidate all
2259         parsing rules.
2260
2261       • 1002 "INI - allow_whitespace with escape_char or quote_char SP or
2262         TAB"
2263
2264         Using the  "allow_whitespace"  attribute  when either "quote_char" or
2265         "escape_char"  is equal to "SPACE" or "TAB" is too ambiguous to
2266         allow.
2267
2268       • 1003 "INI - \r or \n in main attr not allowed"
2269
2270         Using default "eol" characters in either "sep_char", "quote_char",
2271         or  "escape_char"  is  not allowed.
2272
2273       • 1004 "INI - callbacks should be undef or a hashref"
2274
2275         The "callbacks"  attribute only accepts "undef" or a hash
2276         reference.
2277
2278       • 1005 "INI - EOL too long"
2279
2280         The value passed for EOL is exceeding its maximum length (16).
2281
2282       • 1006 "INI - SEP too long"
2283
2284         The value passed for SEP is exceeding its maximum length (16).
2285
2286       • 1007 "INI - QUOTE too long"
2287
2288         The value passed for QUOTE is exceeding its maximum length (16).
2289
2290       • 1008 "INI - SEP undefined"
2291
2292         The value passed for SEP should be defined and not empty.
2293
2294       • 1010 "INI - the header is empty"
2295
2296         The header line parsed in the "header" is empty.
2297
2298       • 1011 "INI - the header contains more than one valid separator"
2299
2300         The header line parsed in the  "header"  contains more than one
2301         (unique) separator character out of the allowed set of separators.
2302
2303       • 1012 "INI - the header contains an empty field"
2304
2305         The header line parsed in the "header" contains an empty field.
2306
2307       • 1013 "INI - the header contains nun-unique fields"
2308
2309         The header line parsed in the  "header"  contains at least  two
2310         identical fields.
2311
2312       • 1014 "INI - header called on undefined stream"
2313
2314         The header line cannot be parsed from an undefined source.
2315
2316       • 1500 "PRM - Invalid/unsupported argument(s)"
2317
2318         Function or method called with invalid argument(s) or parameter(s).
2319
2320       • 1501 "PRM - The key attribute is passed as an unsupported type"
2321
2322         The "key" attribute is of an unsupported type.
2323
2324       • 1502 "PRM - The value attribute is passed without the key attribute"
2325
2326         The "value" attribute is only allowed when a valid key is given.
2327
2328       • 1503 "PRM - The value attribute is passed as an unsupported type"
2329
2330         The "value" attribute is of an unsupported type.
2331
2332       • 2010 "ECR - QUO char inside quotes followed by CR not part of EOL"
2333
2334         When  "eol"  has  been  set  to  anything  but the  default,  like
2335         "\r\t\n",  and  the  "\r"  is  following  the   second   (closing)
2336         "quote_char", where the characters following the "\r" do not make up
2337         the "eol" sequence, this is an error.
2338
2339       • 2011 "ECR - Characters after end of quoted field"
2340
2341         Sequences like "1,foo,"bar"baz,22,1" are not allowed. "bar" is a
2342         quoted field and after the closing double-quote, there should be
2343         either a new-line sequence or a separation character.
2344
2345       • 2012 "EOF - End of data in parsing input stream"
2346
2347         Self-explanatory. End-of-file while parsing a stream. Can
2348         happen only when reading from streams with "getline",  as using
2349         "parse" is done on strings that are not required to have a trailing
2350         "eol".
2351
2352       • 2013 "INI - Specification error for fragments RFC7111"
2353
2354         Invalid URI "fragment" specification.
2355
2356       • 2014 "ENF - Inconsistent number of fields"
2357
2358         Inconsistent number of fields under strict parsing.
2359
2360       • 2021 "EIQ - NL char inside quotes, binary off"
2361
2362         Sequences like "1,"foo\nbar",22,1" are allowed only when the binary
2363         option has been selected with the constructor.
2364
2365       • 2022 "EIQ - CR char inside quotes, binary off"
2366
2367         Sequences like "1,"foo\rbar",22,1" are allowed only when the binary
2368         option has been selected with the constructor.
2369
2370       • 2023 "EIQ - QUO character not allowed"
2371
2372         Sequences like ""foo "bar" baz",qu" and "2023,",2008-04-05,"Foo,
2373         Bar",\n" will cause this error.
2374
2375       • 2024 "EIQ - EOF cannot be escaped, not even inside quotes"
2376
2377         The escape character is not allowed as last character in an input
2378         stream.
2379
2380       • 2025 "EIQ - Loose unescaped escape"
2381
2382         An escape character should escape only characters that need escaping.
2383
2384         Allowing  the escape  for other characters  is possible  with the
2385         attribute "allow_loose_escapes".
2386
2387       • 2026 "EIQ - Binary character inside quoted field, binary off"
2388
2389         Binary characters are not allowed by default.  Exceptions are
2390         fields that contain valid UTF-8, which will automatically be
2391         upgraded. Set "binary" to 1 to accept binary data.
2393
2394       • 2027 "EIQ - Quoted field not terminated"
2395
2396         When parsing a field that started with a quotation character,  the
2397         field is expected to be closed with a quotation character.   When the
2398         parsed line is exhausted before the quote is found, that field is not
2399         terminated.
2400
2401       • 2030 "EIF - NL char inside unquoted verbatim, binary off"
2402
2403       • 2031 "EIF - CR char is first char of field, not part of EOL"
2404
2405       • 2032 "EIF - CR char inside unquoted, not part of EOL"
2406
2407       • 2034 "EIF - Loose unescaped quote"
2408
2409       • 2035 "EIF - Escaped EOF in unquoted field"
2410
2411       • 2036 "EIF - ESC error"
2412
2413       • 2037 "EIF - Binary character in unquoted field, binary off"
2414
2415       • 2110 "ECB - Binary character in Combine, binary off"
2416
2417       • 2200 "EIO - print to IO failed. See errno"
2418
2419       • 3001 "EHR - Unsupported syntax for column_names ()"
2420
2421       • 3002 "EHR - getline_hr () called before column_names ()"
2422
2423       • 3003 "EHR - bind_columns () and column_names () fields count
2424         mismatch"
2425
2426       • 3004 "EHR - bind_columns () only accepts refs to scalars"
2427
2428       • 3006 "EHR - bind_columns () did not pass enough refs for parsed
2429         fields"
2430
2431       • 3007 "EHR - bind_columns needs refs to writable scalars"
2432
2433       • 3008 "EHR - unexpected error in bound fields"
2434
2435       • 3009 "EHR - print_hr () called before column_names ()"
2436
2437       • 3010 "EHR - print_hr () called with invalid arguments"
2438

AUTHOR

2446       Kenichi Ishigaki, <ishigaki[at]cpan.org>
2447       Makamaka Hannyaharamitu, <makamaka[at]cpan.org>
2448
2449       Text::CSV_XS was written by <joe[at]ispsoft.de> and maintained by
2450       <h.m.brand[at]xs4all.nl>.
2451
2452       Text::CSV was written by <alan[at]mfgrtl.com>.
2453

COPYRIGHT AND LICENSE

2455       Copyright 2017- by Kenichi Ishigaki, <ishigaki[at]cpan.org>
2456       Copyright 2005-2015 by Makamaka Hannyaharamitu, <makamaka[at]cpan.org>
2457
2458       Most of the code and doc is directly taken from the pure perl part of
2459       Text::CSV_XS.
2460
2461       Copyright (C) 2007-2016 H.Merijn Brand.  All rights reserved.
2462       Copyright (C) 1998-2001 Jochen Wiedmann. All rights reserved.
2463       Copyright (C) 1997      Alan Citterman.  All rights reserved.
2464
2465       This library is free software; you can redistribute it and/or modify it
2466       under the same terms as Perl itself.
2467
2468
2469
2470perl v5.32.1                      2021-01-27                   Text::CSV_PP(3)