Text::CSV_PP(3) User Contributed Perl Documentation Text::CSV_PP(3)

NAME
Text::CSV_PP - Text::CSV_XS compatible pure-Perl module

SYNOPSIS
This section is taken from Text::CSV_XS.
10
11 # Functional interface
12 use Text::CSV_PP qw( csv );
13
14 # Read whole file in memory
15 my $aoa = csv (in => "data.csv"); # as array of array
16 my $aoh = csv (in => "data.csv",
17 headers => "auto"); # as array of hash
18
19 # Write array of arrays as csv file
20 csv (in => $aoa, out => "file.csv", sep_char=> ";");
21
22 # Only show lines where "code" is odd
23 csv (in => "data.csv", filter => { code => sub { $_ % 2 }});
24
25 # Object interface
26 use Text::CSV_PP;
27
28 my @rows;
29 # Read/parse CSV
30 my $csv = Text::CSV_PP->new ({ binary => 1, auto_diag => 1 });
31 open my $fh, "<:encoding(utf8)", "test.csv" or die "test.csv: $!";
32 while (my $row = $csv->getline ($fh)) {
33 $row->[2] =~ m/pattern/ or next; # 3rd field should match
34 push @rows, $row;
35 }
36 close $fh;
37
38 # and write as CSV
39 open $fh, ">:encoding(utf8)", "new.csv" or die "new.csv: $!";
40 $csv->say ($fh, $_) for @rows;
41 close $fh or die "new.csv: $!";
42
DESCRIPTION
Text::CSV_PP is a pure-Perl module that provides facilities for the
composition and decomposition of comma-separated values. It is
(almost) compatible with the much faster Text::CSV_XS, and is mainly
used as its fallback when you use the Text::CSV module without having
installed Text::CSV_XS. If you don't have any reason to use this module
directly, use Text::CSV for a speed boost and portability (or maybe
Text::CSV_XS when you write a one-off script and don't need to care
about portability).
52
53 The following caveats are taken from the doc of Text::CSV_XS.
54
55 Embedded newlines
56 Important Note: The default behavior is to accept only ASCII
57 characters in the range from 0x20 (space) to 0x7E (tilde). This means
58 that the fields can not contain newlines. If your data contains
59 newlines embedded in fields, or characters above 0x7E (tilde), or
60 binary data, you must set "binary => 1" in the call to "new". To cover
61 the widest range of parsing options, you will always want to set
62 binary.
63
But you still have to pass a complete record to the "parse" method,
which is harder than it looks in the usual line-by-line read loop:
67
68 my $csv = Text::CSV_PP->new ({ binary => 1, eol => $/ });
69 while (<>) { # WRONG!
70 $csv->parse ($_);
71 my @fields = $csv->fields ();
72 }
73
This will break, because the "while" loop might read broken lines: it
does not take quoting into account. If you need to support embedded
newlines, do not pass "eol" to the parser (it accepts "\n", "\r", and
"\r\n" by default) and use "getline" instead:
78
79 my $csv = Text::CSV_PP->new ({ binary => 1 });
80 open my $fh, "<", $file or die "$file: $!";
81 while (my $row = $csv->getline ($fh)) {
82 my @fields = @$row;
83 }
84
The old(er) way of using global file handles is still supported:
86
87 while (my $row = $csv->getline (*ARGV)) { ... }
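
As a minimal sketch of the point above, the following reads a record
with an embedded newline through "getline". The sample data and the
in-memory filehandle are assumptions made for illustration only; a real
program would open a file instead.

```perl
use strict;
use warnings;
use Text::CSV_PP;

# Hypothetical sample data: the second field of the first record
# contains an embedded newline inside quotes.
my $data = qq{1,"line one\nline two",3\n4,five,6\n};

my $csv = Text::CSV_PP->new ({ binary => 1, auto_diag => 1 });
open my $fh, "<", \$data or die "in-memory handle: $!";

# getline () keeps reading until the quoted field is closed,
# so the embedded newline does not split the record.
my @records;
while (my $row = $csv->getline ($fh)) {
    push @records, $row;
}
close $fh;

printf "%d records, %d fields in the first\n",
    scalar @records, scalar @{ $records[0] };
```

A plain "while (<>)" loop over the same data would see three physical
lines and hand the parser two incomplete records.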
88
89 Unicode
90 Unicode is only tested to work with perl-5.8.2 and up.
91
92 See also "BOM".
93
94 The simplest way to ensure the correct encoding is used for in- and
95 output is by either setting layers on the filehandles, or setting the
96 "encoding" argument for "csv".
97
98 open my $fh, "<:encoding(UTF-8)", "in.csv" or die "in.csv: $!";
99 or
100 my $aoa = csv (in => "in.csv", encoding => "UTF-8");
101
102 open my $fh, ">:encoding(UTF-8)", "out.csv" or die "out.csv: $!";
103 or
104 csv (in => $aoa, out => "out.csv", encoding => "UTF-8");
105
On parsing (both for "getline" and "parse"), if the source is marked
as being UTF8, then all fields that are marked binary will also be
marked UTF8.

On combining ("print" and "combine"): if any of the combining fields
was marked UTF8, the resulting string will be marked as UTF8. Note
however that fields that precede the first field marked UTF8 and that
contain 8-bit characters not upgraded to UTF8 will remain "bytes" in
the resulting string too, possibly causing unexpected errors. If you
pass data of different encodings, or you don't know whether the
encodings differ, force the data to be upgraded before you pass it
on:
118
119 $csv->print ($fh, [ map { utf8::upgrade (my $x = $_); $x } @data ]);
120
121 For complete control over encoding, please use Text::CSV::Encoded:
122
123 use Text::CSV::Encoded;
124 my $csv = Text::CSV::Encoded->new ({
125 encoding_in => "iso-8859-1", # the encoding comes into Perl
126 encoding_out => "cp1252", # the encoding comes out of Perl
127 });
128
129 $csv = Text::CSV::Encoded->new ({ encoding => "utf8" });
130 # combine () and print () accept *literally* utf8 encoded data
131 # parse () and getline () return *literally* utf8 encoded data
132
133 $csv = Text::CSV::Encoded->new ({ encoding => undef }); # default
134 # combine () and print () accept UTF8 marked data
135 # parse () and getline () return UTF8 marked data
136
137 BOM
138 BOM (or Byte Order Mark) handling is available only inside the
139 "header" method. This method supports the following encodings:
140 "utf-8", "utf-1", "utf-32be", "utf-32le", "utf-16be", "utf-16le",
141 "utf-ebcdic", "scsu", "bocu-1", and "gb-18030". See Wikipedia
142 <https://en.wikipedia.org/wiki/Byte_order_mark>.
143
144 If a file has a BOM, the easiest way to deal with that is
145
146 my $aoh = csv (in => $file, detect_bom => 1);
147
148 All records will be encoded based on the detected BOM.
149
150 This implies a call to the "header" method, which defaults to also
151 set the "column_names". So this is not the same as
152
153 my $aoh = csv (in => $file, headers => "auto");
154
which only reads the first record to set "column_names" but ignores
the meaning of a possibly present BOM.
157
METHODS
This section is also taken from Text::CSV_XS.
160
161 version
162 (Class method) Returns the current module version.
163
164 new
165 (Class method) Returns a new instance of class Text::CSV_PP. The
166 attributes are described by the (optional) hash ref "\%attr".
167
168 my $csv = Text::CSV_PP->new ({ attributes ... });
169
170 The following attributes are available:
171
172 eol
173
174 my $csv = Text::CSV_PP->new ({ eol => $/ });
175 $csv->eol (undef);
176 my $eol = $csv->eol;
177
178 The end-of-line string to add to rows for "print" or the record
179 separator for "getline".
180
When not passed in a parser instance, the default behavior is to
accept "\n", "\r", and "\r\n", so it is probably safer to not specify
"eol" at all. Passing "undef" or the empty string behaves the same.
184
185 When not passed in a generating instance, records are not terminated
186 at all, so it is probably wise to pass something you expect. A safe
187 choice for "eol" on output is either $/ or "\r\n".
188
189 Common values for "eol" are "\012" ("\n" or Line Feed), "\015\012"
190 ("\r\n" or Carriage Return, Line Feed), and "\015" ("\r" or Carriage
191 Return). The "eol" attribute cannot exceed 7 (ASCII) characters.
192
If both $/ and "eol" equal "\015", lines that end on only a Carriage
Return without a Line Feed will be "parse"d correctly.
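
A small sketch of the effect of "eol" on generated records, using
"combine" and "string" so no file is needed. The field values are
arbitrary examples.

```perl
use strict;
use warnings;
use Text::CSV_PP;

my $csv = Text::CSV_PP->new ({ binary => 1 });

# Without "eol" the generated record is not terminated at all.
$csv->combine ("a", "b") or die "combine failed";
my $bare = $csv->string;

# With "eol" set, the terminator is appended to every generated record.
$csv->eol ("\r\n");
$csv->combine ("a", "b") or die "combine failed";
my $terminated = $csv->string;

printf "%d vs %d characters\n", length $bare, length $terminated;
```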
195
196 sep_char
197
198 my $csv = Text::CSV_PP->new ({ sep_char => ";" });
199 $csv->sep_char (";");
200 my $c = $csv->sep_char;
201
The char used to separate fields, by default a comma (","). Limited
to a single-byte character, usually in the range from 0x20 (space) to
0x7E (tilde). When longer sequences are required, use "sep".
205
206 The separation character can not be equal to the quote character or to
207 the escape character.
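
For illustration, a short sketch of parsing and regenerating a
semicolon-separated record. The sample line is made up; note that the
comma is an ordinary character once the separator is changed.

```perl
use strict;
use warnings;
use Text::CSV_PP;

my $csv = Text::CSV_PP->new ({ binary => 1, sep_char => ";" });

# The second field contains the separator, so it has to be quoted.
my $input = q{1;"two;2";3,5};
$csv->parse ($input) or die "parse failed";
my @fields = $csv->fields;

# Writing the same fields back quotes only what needs quoting.
$csv->combine (@fields) or die "combine failed";
my $line = $csv->string;

print join (" | ", @fields), "\n";
print "$line\n";
```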
208
209 sep
210
211 my $csv = Text::CSV_PP->new ({ sep => "\N{FULLWIDTH COMMA}" });
212 $csv->sep (";");
213 my $sep = $csv->sep;
214
215 The chars used to separate fields, by default undefined. Limited to 8
216 bytes.
217
218 When set, overrules "sep_char". If its length is one byte it acts as
219 an alias to "sep_char".
220
221 quote_char
222
223 my $csv = Text::CSV_PP->new ({ quote_char => "'" });
224 $csv->quote_char (undef);
225 my $c = $csv->quote_char;
226
227 The character to quote fields containing blanks or binary data, by
228 default the double quote character ("""). A value of undef suppresses
229 quote chars (for simple cases only). Limited to a single-byte
230 character, usually in the range from 0x20 (space) to 0x7E (tilde).
231 When longer sequences are required, use "quote".
232
233 "quote_char" can not be equal to "sep_char".
234
235 quote
236
237 my $csv = Text::CSV_PP->new ({ quote => "\N{FULLWIDTH QUOTATION MARK}" });
238 $csv->quote ("'");
239 my $quote = $csv->quote;
240
241 The chars used to quote fields, by default undefined. Limited to 8
242 bytes.
243
244 When set, overrules "quote_char". If its length is one byte it acts as
245 an alias to "quote_char".
246
247 escape_char
248
249 my $csv = Text::CSV_PP->new ({ escape_char => "\\" });
250 $csv->escape_char (":");
251 my $c = $csv->escape_char;
252
253 The character to escape certain characters inside quoted fields.
254 This is limited to a single-byte character, usually in the range
255 from 0x20 (space) to 0x7E (tilde).
256
257 The "escape_char" defaults to being the double-quote mark ("""). In
258 other words the same as the default "quote_char". This means that
259 doubling the quote mark in a field escapes it:
260
261 "foo","bar","Escape ""quote mark"" with two ""quote marks""","baz"
262
263 If you change the "quote_char" without changing the
264 "escape_char", the "escape_char" will still be the double-quote
265 ("""). If instead you want to escape the "quote_char" by doubling it
266 you will need to also change the "escape_char" to be the same as what
267 you have changed the "quote_char" to.
268
Setting "escape_char" to "undef" or "" will disable escaping completely
and is greatly discouraged. This will also disable "escape_null".
271
272 The escape character can not be equal to the separation character.
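
The two escaping styles can be compared with a short sketch like the
one below. The sample field is arbitrary, and the variable names are
only illustrative.

```perl
use strict;
use warnings;
use Text::CSV_PP;

my $field = q{say "hi"};

# Default: escape_char equals quote_char, so quotes are doubled.
my $default = Text::CSV_PP->new ({ binary => 1 });
$default->combine ($field) or die "combine failed";
my $doubled = $default->string;

# A backslash as escape_char: quotes are escaped with a backslash.
my $backslash = Text::CSV_PP->new ({ binary => 1, escape_char => "\\" });
$backslash->combine ($field) or die "combine failed";
my $escaped = $backslash->string;

print "$doubled\n$escaped\n";
```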
273
274 binary
275
276 my $csv = Text::CSV_PP->new ({ binary => 1 });
277 $csv->binary (0);
278 my $f = $csv->binary;
279
280 If this attribute is 1, you may use binary characters in quoted
281 fields, including line feeds, carriage returns and "NULL" bytes. (The
282 latter could be escaped as ""0".) By default this feature is off.
283
284 If a string is marked UTF8, "binary" will be turned on automatically
285 when binary characters other than "CR" and "NL" are encountered. Note
286 that a simple string like "\x{00a0}" might still be binary, but not
287 marked UTF8, so setting "{ binary => 1 }" is still a wise option.
288
289 strict
290
291 my $csv = Text::CSV_PP->new ({ strict => 1 });
292 $csv->strict (0);
293 my $f = $csv->strict;
294
295 If this attribute is set to 1, any row that parses to a different
296 number of fields than the previous row will cause the parser to throw
297 error 2014.
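
A minimal sketch of "strict" in action, assuming a two-record input
where the second record is one field short. The data is read from an
in-memory handle purely for illustration.

```perl
use strict;
use warnings;
use Text::CSV_PP;

# Hypothetical input: three fields, then only two.
my $data = "a,b,c\n1,2\n";

my $csv = Text::CSV_PP->new ({ strict => 1 });
open my $fh, "<", \$data or die "in-memory handle: $!";

my $first  = $csv->getline ($fh);   # sets the expected field count
my $second = $csv->getline ($fh);   # fails: field count differs

# In list context error_diag () returns the code first, then the text.
my ($code, $message) = $csv->error_diag;
close $fh;

print "error $code: $message\n" unless defined $second;
```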
298
299 formula_handling
300
301 formula
302
303 my $csv = Text::CSV_PP->new ({ formula => "none" });
304 $csv->formula ("none");
305 my $f = $csv->formula;
306
307 This defines the behavior of fields containing formulas. As formulas
308 are considered dangerous in spreadsheets, this attribute can define an
309 optional action to be taken if a field starts with an equal sign ("=").
310
311 For purpose of code-readability, this can also be written as
312
313 my $csv = Text::CSV_PP->new ({ formula_handling => "none" });
314 $csv->formula_handling ("none");
315 my $f = $csv->formula_handling;
316
317 Possible values for this attribute are
318
319 none
320 Take no specific action. This is the default.
321
322 $csv->formula ("none");
323
324 die
325 Cause the process to "die" whenever a leading "=" is encountered.
326
327 $csv->formula ("die");
328
329 croak
330 Cause the process to "croak" whenever a leading "=" is encountered.
331 (See Carp)
332
333 $csv->formula ("croak");
334
335 diag
336 Report position and content of the field whenever a leading "=" is
337 found. The value of the field is unchanged.
338
339 $csv->formula ("diag");
340
341 empty
342 Replace the content of fields that start with a "=" with the empty
343 string.
344
345 $csv->formula ("empty");
346 $csv->formula ("");
347
348 undef
349 Replace the content of fields that start with a "=" with "undef".
350
351 $csv->formula ("undef");
352 $csv->formula (undef);
353
All other values will give a warning and then fall back to "diag".
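
As a sketch of one of the actions listed above, the following parses a
made-up record with "formula" set to "empty". Only the field with the
leading "=" is expected to change.

```perl
use strict;
use warnings;
use Text::CSV_PP;

my $csv = Text::CSV_PP->new ({ formula => "empty" });

# The middle field starts with "=" and is treated as a formula.
$csv->parse ("1,=2+3,4") or die "parse failed";
my @fields = $csv->fields;

print join ("|", map { "[$_]" } @fields), "\n";
```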
355
356 decode_utf8
357
358 my $csv = Text::CSV_PP->new ({ decode_utf8 => 1 });
359 $csv->decode_utf8 (0);
360 my $f = $csv->decode_utf8;
361
This attribute defaults to TRUE.
363
364 While parsing, fields that are valid UTF-8, are automatically set to
365 be UTF-8, so that
366
367 $csv->parse ("\xC4\xA8\n");
368
369 results in
370
371 PV("\304\250"\0) [UTF8 "\x{128}"]
372
373 Sometimes it might not be a desired action. To prevent those upgrades,
374 set this attribute to false, and the result will be
375
376 PV("\304\250"\0)
377
378 auto_diag
379
380 my $csv = Text::CSV_PP->new ({ auto_diag => 1 });
381 $csv->auto_diag (2);
382 my $l = $csv->auto_diag;
383
Setting this attribute to a number between 1 and 9 causes "error_diag"
to be automatically called in void context upon errors.
386
387 In case of error "2012 - EOF", this call will be void.
388
389 If "auto_diag" is set to a numeric value greater than 1, it will "die"
390 on errors instead of "warn". If set to anything unrecognized, it will
391 be silently ignored.
392
Future extensions to this feature will include more reliable
auto-detection of "autodie" being active in the scope in which the
error occurred, which will increment the value of "auto_diag" by 1 the
moment the error is detected.
397
398 diag_verbose
399
400 my $csv = Text::CSV_PP->new ({ diag_verbose => 1 });
401 $csv->diag_verbose (2);
402 my $l = $csv->diag_verbose;
403
404 Set the verbosity of the output triggered by "auto_diag". Currently
405 only adds the current input-record-number (if known) to the
406 diagnostic output with an indication of the position of the error.
407
408 blank_is_undef
409
410 my $csv = Text::CSV_PP->new ({ blank_is_undef => 1 });
411 $csv->blank_is_undef (0);
412 my $f = $csv->blank_is_undef;
413
414 Under normal circumstances, "CSV" data makes no distinction between
415 quoted- and unquoted empty fields. These both end up in an empty
416 string field once read, thus
417
418 1,"",," ",2
419
420 is read as
421
422 ("1", "", "", " ", "2")
423
424 When writing "CSV" files with either "always_quote" or "quote_empty"
425 set, the unquoted empty field is the result of an undefined value.
426 To enable this distinction when reading "CSV" data, the
427 "blank_is_undef" attribute will cause unquoted empty fields to be set
428 to "undef", causing the above to be parsed as
429
430 ("1", "", undef, " ", "2")
431
Note that this is specifically important when loading "CSV" fields
into a database that allows "NULL" values, as the perl equivalent for
"NULL" is "undef" in DBI land.
435
436 empty_is_undef
437
438 my $csv = Text::CSV_PP->new ({ empty_is_undef => 1 });
439 $csv->empty_is_undef (0);
440 my $f = $csv->empty_is_undef;
441
442 Going one step further than "blank_is_undef", this attribute
443 converts all empty fields to "undef", so
444
445 1,"",," ",2
446
447 is read as
448
449 (1, undef, undef, " ", 2)
450
Note that this affects only fields that are originally empty, not
fields that are empty after stripping allowed whitespace. YMMV.
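
The difference between "blank_is_undef" and "empty_is_undef" can be
checked with a sketch like this one, which parses the sample line used
above with each attribute in turn. The %shown hash is just a helper for
this example.

```perl
use strict;
use warnings;
use Text::CSV_PP;

my $line = q{1,"",," ",2};
my %shown;

for my $attr (qw( blank_is_undef empty_is_undef )) {
    my $csv = Text::CSV_PP->new ({ $attr => 1 });
    $csv->parse ($line) or die "parse failed";

    # Render undefined fields as the word undef, defined ones in quotes.
    $shown{$attr} = join " | ",
        map { defined $_ ? "'$_'" : "undef" } $csv->fields;
    print "$attr: $shown{$attr}\n";
}
```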
453
454 allow_whitespace
455
456 my $csv = Text::CSV_PP->new ({ allow_whitespace => 1 });
457 $csv->allow_whitespace (0);
458 my $f = $csv->allow_whitespace;
459
460 When this option is set to true, the whitespace ("TAB"'s and
461 "SPACE"'s) surrounding the separation character is removed when
462 parsing. If either "TAB" or "SPACE" is one of the three characters
463 "sep_char", "quote_char", or "escape_char" it will not be considered
464 whitespace.
465
466 Now lines like:
467
468 1 , "foo" , bar , 3 , zapp
469
470 are parsed as valid "CSV", even though it violates the "CSV" specs.
471
472 Note that all whitespace is stripped from both start and end of
473 each field. That would make it more than a feature to enable parsing
474 bad "CSV" lines, as
475
476 1, 2.0, 3, ape , monkey
477
478 will now be parsed as
479
480 ("1", "2.0", "3", "ape", "monkey")
481
482 even if the original line was perfectly acceptable "CSV".
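
A short sketch of "allow_whitespace", parsing the first sample line
from this section:

```perl
use strict;
use warnings;
use Text::CSV_PP;

my $csv = Text::CSV_PP->new ({ allow_whitespace => 1 });

# Whitespace around the separators is dropped while parsing.
$csv->parse (q{1 , "foo" , bar , 3 , zapp}) or die "parse failed";
my @fields = $csv->fields;

print join ("|", @fields), "\n";
```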
483
484 allow_loose_quotes
485
486 my $csv = Text::CSV_PP->new ({ allow_loose_quotes => 1 });
487 $csv->allow_loose_quotes (0);
488 my $f = $csv->allow_loose_quotes;
489
490 By default, parsing unquoted fields containing "quote_char" characters
491 like
492
493 1,foo "bar" baz,42
494
495 would result in parse error 2034. Though it is still bad practice to
496 allow this format, we cannot help the fact that some vendors
497 make their applications spit out lines styled this way.
498
499 If there is really bad "CSV" data, like
500
501 1,"foo "bar" baz",42
502
503 or
504
505 1,""foo bar baz"",42
506
507 there is a way to get this data-line parsed and leave the quotes inside
508 the quoted field as-is. This can be achieved by setting
509 "allow_loose_quotes" AND making sure that the "escape_char" is not
510 equal to "quote_char".
511
512 allow_loose_escapes
513
514 my $csv = Text::CSV_PP->new ({ allow_loose_escapes => 1 });
515 $csv->allow_loose_escapes (0);
516 my $f = $csv->allow_loose_escapes;
517
518 Parsing fields that have "escape_char" characters that escape
519 characters that do not need to be escaped, like:
520
521 my $csv = Text::CSV_PP->new ({ escape_char => "\\" });
522 $csv->parse (qq{1,"my bar\'s",baz,42});
523
524 would result in parse error 2025. Though it is bad practice to allow
525 this format, this attribute enables you to treat all escape character
526 sequences equal.
527
528 allow_unquoted_escape
529
530 my $csv = Text::CSV_PP->new ({ allow_unquoted_escape => 1 });
531 $csv->allow_unquoted_escape (0);
532 my $f = $csv->allow_unquoted_escape;
533
A backward compatibility issue where "escape_char" differs from
"quote_char" prevents "escape_char" from being in the first position
of a field. If "quote_char" is equal to the default """ and
"escape_char" is set to "\", this would be illegal:
538
539 1,\0,2
540
541 Setting this attribute to 1 might help to overcome issues with
542 backward compatibility and allow this style.
543
544 always_quote
545
546 my $csv = Text::CSV_PP->new ({ always_quote => 1 });
547 $csv->always_quote (0);
548 my $f = $csv->always_quote;
549
550 By default the generated fields are quoted only if they need to be.
551 For example, if they contain the separator character. If you set this
552 attribute to 1 then all defined fields will be quoted. ("undef" fields
are not quoted, see "blank_is_undef"). This often makes exported data
easier to handle in external applications.
555
556 quote_space
557
558 my $csv = Text::CSV_PP->new ({ quote_space => 1 });
559 $csv->quote_space (0);
560 my $f = $csv->quote_space;
561
By default, a space in a field triggers quotation. As no rule in
"CSV" requires this, nor any rule forbids it, the default is true for
safety. You can exclude the space from this trigger by setting this
attribute to 0.
566
567 quote_empty
568
569 my $csv = Text::CSV_PP->new ({ quote_empty => 1 });
570 $csv->quote_empty (0);
571 my $f = $csv->quote_empty;
572
573 By default the generated fields are quoted only if they need to be.
574 An empty (defined) field does not need quotation. If you set this
575 attribute to 1 then empty defined fields will be quoted. ("undef"
576 fields are not quoted, see "blank_is_undef"). See also "always_quote".
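
To compare the quoting attributes side by side, a sketch along these
lines combines the same arbitrary row three times. The "default" key is
only a label used in this example, not an attribute.

```perl
use strict;
use warnings;
use Text::CSV_PP;

# A number, an empty string, an undefined value, and a field with a space.
my @row = (1, "", undef, "a b");
my %line;

for my $mode ("default", "always_quote", "quote_empty") {
    my %attr = $mode eq "default" ? () : ($mode => 1);
    my $csv  = Text::CSV_PP->new (\%attr);
    $csv->combine (@row) or die "combine failed";
    $line{$mode} = $csv->string;
    printf "%-12s %s\n", $mode, $line{$mode};
}
```

In all three cases the undefined field stays unquoted, which is what
makes it distinguishable from the empty string on reading.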
577
578 quote_binary
579
580 my $csv = Text::CSV_PP->new ({ quote_binary => 1 });
581 $csv->quote_binary (0);
582 my $f = $csv->quote_binary;
583
584 By default, all "unsafe" bytes inside a string cause the combined
585 field to be quoted. By setting this attribute to 0, you can disable
586 that trigger for bytes >= 0x7F.
587
588 escape_null
589
590 my $csv = Text::CSV_PP->new ({ escape_null => 1 });
591 $csv->escape_null (0);
592 my $f = $csv->escape_null;
593
594 By default, a "NULL" byte in a field would be escaped. This option
595 enables you to treat the "NULL" byte as a simple binary character in
596 binary mode (the "{ binary => 1 }" is set). The default is true. You
597 can prevent "NULL" escapes by setting this attribute to 0.
598
599 When the "escape_char" attribute is set to undefined, this attribute
600 will be set to false.
601
602 The default setting will encode "=\x00=" as
603
604 "="0="
605
With "escape_null" set to 0, this will result in
607
608 "=\x00="
609
610 The default when using the "csv" function is "false".
611
612 For backward compatibility reasons, the deprecated old name
613 "quote_null" is still recognized.
614
615 keep_meta_info
616
617 my $csv = Text::CSV_PP->new ({ keep_meta_info => 1 });
618 $csv->keep_meta_info (0);
619 my $f = $csv->keep_meta_info;
620
621 By default, the parsing of input records is as simple and fast as
622 possible. However, some parsing information - like quotation of the
623 original field - is lost in that process. Setting this flag to true
624 enables retrieving that information after parsing with the methods
625 "meta_info", "is_quoted", and "is_binary" described below. Default is
626 false for performance.
627
If you set this attribute to a value greater than 9, then you can
control output quotation style like it was used in the input of the
last parsed record (unless quotation was added because of other
reasons).
632
633 my $csv = Text::CSV_PP->new ({
634 binary => 1,
635 keep_meta_info => 1,
636 quote_space => 0,
637 });
638
# parse () returns a status; the parsed data comes from fields ()
$csv->parse (q{1,,"", ," ",f,"g","h""h",help,"help"});
my @row = $csv->fields;

$csv->print (*STDOUT, \@row);
# 1,,, , ,f,g,"h""h",help,help
$csv->keep_meta_info (11);
$csv->print (*STDOUT, \@row);
# 1,,"", ," ",f,"g","h""h",help,"help"
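
The retained meta information can also be queried per field. The
following sketch uses "is_quoted", one of the methods named above, on a
made-up record; field indices are zero-based.

```perl
use strict;
use warnings;
use Text::CSV_PP;

my $csv = Text::CSV_PP->new ({ binary => 1, keep_meta_info => 1 });

# Only the middle field is quoted in the input.
$csv->parse (q{1,"two",3}) or die "parse failed";

# Normalize the return values to 1 or 0 for easy printing.
my @quoted = map { $csv->is_quoted ($_) ? 1 : 0 } 0 .. 2;

print "quoted flags: @quoted\n";
```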
646
647 undef_str
648
649 my $csv = Text::CSV_PP->new ({ undef_str => "\\N" });
650 $csv->undef_str (undef);
651 my $s = $csv->undef_str;
652
653 This attribute optionally defines the output of undefined fields. The
654 value passed is not changed at all, so if it needs quotation, the
655 quotation needs to be included in the value of the attribute. Use with
656 caution, as passing a value like ",",,,,""" will for sure mess up
657 your output. The default for this attribute is "undef", meaning no
658 special treatment.
659
660 This attribute is useful when exporting CSV data to be imported in
661 custom loaders, like for MySQL, that recognize special sequences for
662 "NULL" data.
663
664 This attribute has no meaning when parsing CSV data.
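
A minimal sketch of "undef_str", assuming a loader that expects "\N"
for "NULL" values:

```perl
use strict;
use warnings;
use Text::CSV_PP;

my $csv = Text::CSV_PP->new ({ undef_str => "\\N" });

# The undefined field is written as the literal two characters \N.
$csv->combine (1, undef, "x") or die "combine failed";
my $line = $csv->string;

print "$line\n";
```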
665
666 verbatim
667
668 my $csv = Text::CSV_PP->new ({ verbatim => 1 });
669 $csv->verbatim (0);
670 my $f = $csv->verbatim;
671
672 This is a quite controversial attribute to set, but makes some hard
673 things possible.
674
675 The rationale behind this attribute is to tell the parser that the
676 normally special characters newline ("NL") and Carriage Return ("CR")
677 will not be special when this flag is set, and be dealt with as being
678 ordinary binary characters. This will ease working with data with
679 embedded newlines.
680
681 When "verbatim" is used with "getline", "getline" auto-"chomp"'s
682 every line.
683
684 Imagine a file format like
685
686 M^^Hans^Janssen^Klas 2\n2A^Ja^11-06-2007#\r\n
687
where the line ending is a very specific "#\r\n", and the sep_char is
a "^" (caret). None of the fields is quoted, but embedded binary
data is likely to be present. With the specific line ending, this
should not be too hard to detect.
692
By default, Text::CSV_PP's parse function is instructed to only know
694 about "\n" and "\r" to be legal line endings, and so has to deal with
695 the embedded newline as a real "end-of-line", so it can scan the next
696 line if binary is true, and the newline is inside a quoted field. With
697 this option, we tell "parse" to parse the line as if "\n" is just
698 nothing more than a binary character.
699
700 For "parse" this means that the parser has no more idea about line
701 ending and "getline" "chomp"s line endings on reading.
702
703 types
704
705 A set of column types; the attribute is immediately passed to the
706 "types" method.
707
708 callbacks
709
710 See the "Callbacks" section below.
711
712 accessors
713
714 To sum it up,
715
716 $csv = Text::CSV_PP->new ();
717
718 is equivalent to
719
720 $csv = Text::CSV_PP->new ({
721 eol => undef, # \r, \n, or \r\n
722 sep_char => ',',
723 sep => undef,
724 quote_char => '"',
725 quote => undef,
726 escape_char => '"',
727 binary => 0,
728 decode_utf8 => 1,
729 auto_diag => 0,
730 diag_verbose => 0,
731 blank_is_undef => 0,
732 empty_is_undef => 0,
733 allow_whitespace => 0,
734 allow_loose_quotes => 0,
735 allow_loose_escapes => 0,
736 allow_unquoted_escape => 0,
737 always_quote => 0,
738 quote_empty => 0,
739 quote_space => 1,
740 escape_null => 1,
741 quote_binary => 1,
742 keep_meta_info => 0,
743 strict => 0,
744 formula => 0,
745 verbatim => 0,
746 undef_str => undef,
747 types => undef,
748 callbacks => undef,
749 });
750
For all of the above mentioned flags, an accessor method is available
where you can inquire the current value, or change the value:
753
754 my $quote = $csv->quote_char;
755 $csv->binary (1);
756
757 It is not wise to change these settings halfway through writing "CSV"
758 data to a stream. If however you want to create a new stream using the
759 available "CSV" object, there is no harm in changing them.
760
761 If the "new" constructor call fails, it returns "undef", and makes
762 the fail reason available through the "error_diag" method.
763
764 $csv = Text::CSV_PP->new ({ ecs_char => 1 }) or
765 die "".Text::CSV_PP->error_diag ();
766
767 "error_diag" will return a string like
768
769 "INI - Unknown attribute 'ecs_char'"
770
771 known_attributes
772 @attr = Text::CSV_PP->known_attributes;
773 @attr = Text::CSV_PP::known_attributes;
774 @attr = $csv->known_attributes;
775
776 This method will return an ordered list of all the supported
777 attributes as described above. This can be useful for knowing what
778 attributes are valid in classes that use or extend Text::CSV_PP.
779
780 print
781 $status = $csv->print ($fh, $colref);
782
783 Similar to "combine" + "string" + "print", but much more efficient.
784 It expects an array ref as input (not an array!) and the resulting
785 string is not really created, but immediately written to the $fh
786 object, typically an IO handle or any other object that offers a
787 "print" method.
788
789 For performance reasons "print" does not create a result string, so
790 all "string", "status", "fields", and "error_input" methods will return
791 undefined information after executing this method.
792
793 If $colref is "undef" (explicit, not through a variable argument) and
794 "bind_columns" was used to specify fields to be printed, it is
795 possible to make performance improvements, as otherwise data would have
796 to be copied as arguments to the method call:
797
798 $csv->bind_columns (\($foo, $bar));
799 $status = $csv->print ($fh, undef);
800
801 A short benchmark
802
803 my @data = ("aa" .. "zz");
804 $csv->bind_columns (\(@data));
805
806 $csv->print ($fh, [ @data ]); # 11800 recs/sec
807 $csv->print ($fh, \@data ); # 57600 recs/sec
808 $csv->print ($fh, undef ); # 48500 recs/sec
809
810 say
811 $status = $csv->say ($fh, $colref);
812
813 Like "print", but "eol" defaults to "$\".
814
815 print_hr
816 $csv->print_hr ($fh, $ref);
817
818 Provides an easy way to print a $ref (as fetched with "getline_hr")
819 provided the column names are set with "column_names".
820
821 It is just a wrapper method with basic parameter checks over
822
823 $csv->print ($fh, [ map { $ref->{$_} } $csv->column_names ]);
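
A short sketch of "print_hr" writing one hash as a record. The column
names and values are invented for the example, and the output goes to
an in-memory handle so the result can be inspected.

```perl
use strict;
use warnings;
use Text::CSV_PP;

my $csv = Text::CSV_PP->new ({ binary => 1, eol => "\n" });

# The column names fix the order in which hash values are written.
$csv->column_names (qw( code name price ));

my $out = "";
open my $fh, ">", \$out or die "in-memory handle: $!";
$csv->print_hr ($fh, { name => "apple", price => "0.50", code => 1 });
close $fh or die "close: $!";

print $out;
```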
824
825 combine
826 $status = $csv->combine (@fields);
827
828 This method constructs a "CSV" record from @fields, returning success
829 or failure. Failure can result from lack of arguments or an argument
830 that contains an invalid character. Upon success, "string" can be
831 called to retrieve the resultant "CSV" string. Upon failure, the
832 value returned by "string" is undefined and "error_input" could be
833 called to retrieve the invalid argument.
834
835 string
836 $line = $csv->string ();
837
838 This method returns the input to "parse" or the resultant "CSV"
839 string of "combine", whichever was called more recently.
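
As an illustration of how "combine", "string", "parse", and "fields"
fit together, this sketch builds a record from arbitrary fields and
parses it back:

```perl
use strict;
use warnings;
use Text::CSV_PP;

my $csv = Text::CSV_PP->new ({ binary => 1 });

my @in = ("plain", "with, comma", q{with "quote"});

# combine () builds the record; string () returns it.
$csv->combine (@in) or die "combine failed";
my $line = $csv->string;

# parse () decomposes it again; fields () returns the pieces.
$csv->parse ($line) or die "parse failed";
my @out = $csv->fields;

print "$line\n";
```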
840
841 getline
842 $colref = $csv->getline ($fh);
843
844 This is the counterpart to "print", as "parse" is the counterpart to
845 "combine": it parses a row from the $fh handle using the "getline"
846 method associated with $fh and parses this row into an array ref.
847 This array ref is returned by the function or "undef" for failure.
848 When $fh does not support "getline", you are likely to hit errors.
849
850 When fields are bound with "bind_columns" the return value is a
851 reference to an empty list.
852
853 The "string", "fields", and "status" methods are meaningless again.
854
855 getline_all
856 $arrayref = $csv->getline_all ($fh);
857 $arrayref = $csv->getline_all ($fh, $offset);
858 $arrayref = $csv->getline_all ($fh, $offset, $length);
859
860 This will return a reference to a list of getline ($fh) results. In
861 this call, "keep_meta_info" is disabled. If $offset is negative, as
862 with "splice", only the last "abs ($offset)" records of $fh are taken
863 into consideration.
864
865 Given a CSV file with 10 lines:
866
867 lines call
868 ----- ---------------------------------------------------------
869 0..9 $csv->getline_all ($fh) # all
870 0..9 $csv->getline_all ($fh, 0) # all
871 8..9 $csv->getline_all ($fh, 8) # start at 8
872 - $csv->getline_all ($fh, 0, 0) # start at 0 first 0 rows
873 0..4 $csv->getline_all ($fh, 0, 5) # start at 0 first 5 rows
874 4..5 $csv->getline_all ($fh, 4, 2) # start at 4 first 2 rows
875 8..9 $csv->getline_all ($fh, -2) # last 2 rows
876 6..7 $csv->getline_all ($fh, -4, 2) # first 2 of last 4 rows
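
The table above can be reproduced with a sketch like the following,
which builds a ten-line single-column input in memory. The helper
first_column () is hypothetical and exists only for this example.

```perl
use strict;
use warnings;
use Text::CSV_PP;

# Ten records, each holding its own zero-based line number.
my $data = join "", map { "$_\n" } 0 .. 9;

# Hypothetical helper: run getline_all () with the given offset and
# length and return the first column of the selected rows.
sub first_column {
    my @range = @_;
    my $csv = Text::CSV_PP->new ({ binary => 1 });
    open my $fh, "<", \$data or die "in-memory handle: $!";
    my $rows = $csv->getline_all ($fh, @range);
    close $fh;
    return join ",", map { $_->[0] } @$rows;
}

my $tail   = first_column (8);       # start at 8
my $middle = first_column (4, 2);    # start at 4, first 2 rows
my $last2  = first_column (-2);      # last 2 rows
my $window = first_column (-4, 2);   # first 2 of last 4 rows

print "$tail / $middle / $last2 / $window\n";
```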
877
878 getline_hr
879 The "getline_hr" and "column_names" methods work together to allow you
880 to have rows returned as hashrefs. You must call "column_names" first
881 to declare your column names.
882
883 $csv->column_names (qw( code name price description ));
884 $hr = $csv->getline_hr ($fh);
885 print "Price for $hr->{name} is $hr->{price} EUR\n";
886
887 "getline_hr" will croak if called before "column_names".
888
Note that "getline_hr" creates a hashref for every row and will be
much slower than the combined use of "bind_columns" and "getline",
which still offers the same ease of use of a hashref inside the loop:
892
893 my @cols = @{$csv->getline ($fh)};
894 $csv->column_names (@cols);
895 while (my $row = $csv->getline_hr ($fh)) {
896 print $row->{price};
897 }
898
899 Could easily be rewritten to the much faster:
900
901 my @cols = @{$csv->getline ($fh)};
902 my $row = {};
903 $csv->bind_columns (\@{$row}{@cols});
904 while ($csv->getline ($fh)) {
905 print $row->{price};
906 }
907
Your mileage may vary for the size of the data and the number of rows.
With perl-5.14.2 the comparison for a 100_000 line file with 14 columns:
910
911 Rate hashrefs getlines
912 hashrefs 1.00/s -- -76%
913 getlines 4.15/s 313% --
914
915 getline_hr_all
916 $arrayref = $csv->getline_hr_all ($fh);
917 $arrayref = $csv->getline_hr_all ($fh, $offset);
918 $arrayref = $csv->getline_hr_all ($fh, $offset, $length);
919
920 This will return a reference to a list of getline_hr ($fh) results.
921 In this call, "keep_meta_info" is disabled.
922
923 parse
924 $status = $csv->parse ($line);
925
This method decomposes a "CSV" string into fields, returning success
or failure. Failure can result from a missing argument or from an
improperly formatted "CSV" string. Upon success, "fields" can be
called to retrieve the decomposed fields. Upon failure, calling
"fields" will return undefined data and "error_input" can be called to
retrieve the invalid argument.

You may use the "types" method for setting column types. See the
description of "types" below.
935
936 The $line argument is supposed to be a simple scalar. Everything else
937 is supposed to croak and set error 1500.
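
The success and failure paths can be sketched as follows. The input
strings are made up for illustration; the second one is deliberately
broken by leaving a quoted field unterminated.

```perl
use strict;
use warnings;
use Text::CSV_PP;

my $csv = Text::CSV_PP->new ({ binary => 1 });

# A well-formed line: parse returns true and fields holds the result.
my $ok     = $csv->parse (q{1,"foo, bar",3});
my @fields = $ok ? $csv->fields : ();

# A line with an unterminated quote: parse returns false and
# error_input can be used to inspect what was passed in.
my $bad    = q{1,"unterminated};
my $failed = $csv->parse ($bad) ? 0 : 1;
my $input  = $csv->error_input;

print "fields: ", join ("|", @fields), "\n";
print "failed on: $input\n" if $failed && defined $input;
```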
938
939 fragment
940 This function tries to implement RFC7111 (URI Fragment Identifiers for
941 the text/csv Media Type) - http://tools.ietf.org/html/rfc7111
942
943 my $AoA = $csv->fragment ($fh, $spec);
944
945 In specifications, "*" is used to specify the last item, a dash ("-")
946 to indicate a range. All indices are 1-based: the first row or
947 column has index 1. Selections can be combined with the semi-colon
948 (";").
949
950 When using this method in combination with "column_names", the
951 returned reference will point to a list of hashes instead of a list
952 of lists. A disjointed cell-based combined selection might return
953 rows with different number of columns making the use of hashes
954 unpredictable.
955
956 $csv->column_names ("Name", "Age");
957 my $AoH = $csv->fragment ($fh, "col=3;8");
958
959 If the "after_parse" callback is active, it is also called on every
960 line parsed and skipped before the fragment.
961
962 row
963 row=4
964 row=5-7
965 row=6-*
966 row=1-2;4;6-*
967
968 col
969 col=2
970 col=1-3
971 col=4-*
972 col=1-2;4;7-*
973
974 cell
975 In cell-based selection, the comma (",") is used to pair row and
976 column
977
978 cell=4,1
979
980 The range operator ("-") using "cell"s can be used to define top-left
981 and bottom-right "cell" location
982
983 cell=3,1-4,6
984
985 The "*" is only allowed in the second part of a pair
986
987 cell=3,2-*,2 # row 3 till end, only column 2
988 cell=3,2-3,* # column 2 till end, only row 3
989 cell=3,2-*,* # strip row 1 and 2, and column 1
990
991 Cells and cell ranges may be combined with ";", possibly resulting in
992 rows with different number of columns
993
994 cell=1,1-2,2;3,3-4,4;1,4;4,1
995
996 Disjointed selections will only return selected cells. The cells
997 that are not specified will not be included in the returned
998 set, not even as "undef". As an example given a "CSV" like
999
1000 11,12,13,...19
1001 21,22,...28,29
1002 : :
1003 91,...97,98,99
1004
1005 with "cell=1,1-2,2;3,3-4,4;1,4;4,1" will return:
1006
1007 11,12,14
1008 21,22
1009 33,34
1010 41,43,44
1011
1012 Overlapping cell-specs will return those cells only once, so
1013 "cell=1,1-3,3;2,2-4,4;2,3;4,2" will return:
1014
1015 11,12,13
1016 21,22,23,24
1017 31,32,33,34
1018 42,43,44
1019
1020 RFC7111 <http://tools.ietf.org/html/rfc7111> does not allow different
1021 types of specs to be combined (either "row" or "col" or "cell").
1022 Passing an invalid fragment specification will croak and set error
1023 2013.
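
The three selection types can be tried on a small grid. This is a
sketch under stated assumptions: the 4x4 sample data and the helper
"fragment_of" are hypothetical, and a fresh handle is opened for every
call so that each selection starts at the first record.

```perl
use strict;
use warnings;
use Text::CSV_PP;

# Hypothetical 4x4 grid: the cell in row R, column C holds "RC".
my $data = "";
for my $r (1 .. 4) {
    $data .= join (",", map { $r . $_ } 1 .. 4) . "\n";
}

# Helper for this sketch only: run one fragment spec on a fresh handle.
sub fragment_of {
    my $spec = shift;
    my $csv  = Text::CSV_PP->new ({ binary => 1, auto_diag => 1 });
    open my $fh, "<", \$data or die "in-memory handle: $!";
    return $csv->fragment ($fh, $spec);
}

my $rows  = fragment_of ("row=2-3");        # rows 2 and 3, all columns
my $cols  = fragment_of ("col=2;4");        # columns 2 and 4 of every row
my $cells = fragment_of ("cell=2,2-3,3");   # the 2x2 block in the middle

print join (",", @$_), "\n" for @$rows, @$cols, @$cells;
```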
1024
1025 column_names
1026 Set the "keys" that will be used in the "getline_hr" calls. If no
1027 keys (column names) are passed, it will return the current setting as a
1028 list.
1029
1030 "column_names" accepts a list of scalars (the column names) or a
1031 single array_ref, so you can pass the return value from "getline" too:
1032
1033 $csv->column_names ($csv->getline ($fh));
1034
1035 "column_names" does no checking on duplicates at all, which might lead
1036 to unexpected results. Undefined entries will be replaced with the
1037 string "\cAUNDEF\cA", so
1038
1039 $csv->column_names (undef, "", "name", "name");
1040 $hr = $csv->getline_hr ($fh);
1041
1042 Will set "$hr->{"\cAUNDEF\cA"}" to the 1st field, "$hr->{""}" to the
1043 2nd field, and "$hr->{name}" to the 4th field, discarding the 3rd
1044 field.
1045
1046 "column_names" croaks on invalid arguments.
1047
1048 header
1049 This method does NOT work in perl-5.6.x
1050
1051 Parse the CSV header and set "sep", column_names and encoding.
1052
1053 my @hdr = $csv->header ($fh);
1054 $csv->header ($fh, { sep_set => [ ";", ",", "|", "\t" ] });
1055 $csv->header ($fh, { detect_bom => 1, munge_column_names => "lc" });
1056
1057 The first argument should be a file handle.
1058
1059 This method resets some object properties, as it is supposed to be
1060 invoked only once per file or stream. It will leave attributes
1061 "column_names" and "bound_columns" alone if setting column names is
1062 disabled. Reading headers on previously processed objects might fail on
1063 perl-5.8.0 and older.
1064
1065 Assuming that the file opened for parsing has a header, and the header
1066 does not contain problematic characters like embedded newlines, read
1067 the first line from the open handle then auto-detect whether the header
1068 separates the column names with a character from the allowed separator
1069 list.
1070
1071 If any of the allowed separators matches, and none of the other
1072 allowed separators match, set "sep" to that separator for the
1073 current CSV_PP instance and use it to parse the first line, map those
1074 to lowercase, and use that to set the instance "column_names":
1075
1076 my $csv = Text::CSV_PP->new ({ binary => 1, auto_diag => 1 });
1077 open my $fh, "<", "file.csv" or die "file.csv: $!";
1078 binmode $fh; # for Windows
1079 $csv->header ($fh);
1080 while (my $row = $csv->getline_hr ($fh)) {
1081 ...
1082 }
1083
1084 If the header is empty, contains more than one unique separator out of
1085 the allowed set, contains empty fields, or contains identical fields
1086 (after folding), it will croak with error 1010, 1011, 1012, or 1013
1087 respectively.
1088
1089 If the header contains embedded newlines or is not valid CSV in any
1090 other way, this method will croak and leave the parse error untouched.
1091
1092 A successful call to "header" will always set the "sep" of the $csv
1093 object. This behavior can not be disabled.
1094
1095 return value
1096
1097 On error this method will croak.
1098
1099 In list context, the headers will be returned whether they are used to
1100 set "column_names" or not.
1101
1102 In scalar context, the instance itself is returned. Note: the values
1103 as found in the header will effectively be lost if "set_column_names"
1104 is false.
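
A short sketch of a call in list context, assuming an in-memory handle
behaves like a file handle here. The sample data is hypothetical: it
uses ";" as separator and mixed-case column names, so both the
separator detection and the default lower-casing can be observed.

```perl
use strict;
use warnings;
use Text::CSV_PP;

# Hypothetical input that uses ";" as separator and mixed-case names.
my $data = "Code;Name;Price\n1;pc;850\n";
open my $fh, "<", \$data or die "in-memory handle: $!";

my $csv = Text::CSV_PP->new ({ binary => 1, auto_diag => 1 });

# List context: the (lower-cased) header fields are returned.
my @hdr = $csv->header ($fh);

# header has also set the separator and the column names,
# so getline_hr can be used right away.
my $sep = $csv->sep_char;
my $hr  = $csv->getline_hr ($fh);

print "separator: $sep\n";
print "columns:   @hdr\n";
print "price:     $hr->{price}\n";
```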
1105
1106 Options
1107
1108 sep_set
1109 $csv->header ($fh, { sep_set => [ ";", ",", "|", "\t" ] });
1110
1111 The list of legal separators defaults to "[ ";", "," ]" and can be
1112 changed by this option. As this is probably the most often used
1113 option, it can be passed on its own as an unnamed argument:
1114
1115 $csv->header ($fh, [ ";", ",", "|", "\t", "::", "\x{2063}" ]);
1116
1117 Multi-byte sequences are allowed, both multi-character and
1118 Unicode. See "sep".
1119
1120 detect_bom
1121 $csv->header ($fh, { detect_bom => 1 });
1122
1123 The default behavior is to detect if the header line starts with a
1124 BOM. If the header has a BOM, use that to set the encoding of $fh.
1125 This default behavior can be disabled by passing a false value to
1126 "detect_bom".
1127
1128 Supported encodings from BOM are: UTF-8, UTF-16BE, UTF-16LE,
1129 UTF-32BE, and UTF-32LE. BOMs also exist for UTF-1, UTF-EBCDIC, SCSU,
1130 BOCU-1, and GB-18030, but Encode does not (yet) support them. UTF-7
1131 is not supported.
1132
1133 If a supported BOM was detected as start of the stream, it is stored
1134 in the object attribute "ENCODING".
1135
1136 my $enc = $csv->{ENCODING};
1137
1138 The encoding is used with "binmode" on $fh.
1139
1140 If the handle was opened in a (correct) encoding, this method will
1141 not alter the encoding, as it checks the leading bytes of the first
1142 line. In case the stream starts with a decoded BOM ("U+FEFF"),
1143 "{ENCODING}" will be "" (empty) instead of the default "undef".
1144
1145 munge_column_names
1146 This option offers the means to modify the column names into
1147 something that is most useful to the application. The default is to
1148 map all column names to lower case.
1149
1150 $csv->header ($fh, { munge_column_names => "lc" });
1151
1152 The following values are available:
1153
1154 lc - lower case
1155 uc - upper case
1156 none - do not change
1157 \%hash - supply a mapping
1158 \&cb - supply a callback
1159
1160 Literal:
1161
1162 $csv->header ($fh, { munge_column_names => "none" });
1163
1164 Hash:
1165
1166 $csv->header ($fh, { munge_column_names => { foo => "sombrero" }});
1167
1168 If a column name has no entry in the hash, the original name is used unchanged.
1169
1170 Callback:
1171
1172 $csv->header ($fh, { munge_column_names => sub { fc } });
1173 $csv->header ($fh, { munge_column_names => sub { "column_".$col++ } });
1174 $csv->header ($fh, { munge_column_names => sub { lc (s/\W+/_/gr) } });
1175
1176 As this callback is called in a "map", you can use $_ directly.
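
The literal, hash, and callback forms can be compared on one header
line. This is a sketch only: the sample data and the helper
"header_names" are hypothetical, and the callback reads the name from
$_ as described above.

```perl
use strict;
use warnings;
use Text::CSV_PP;

# Helper for this sketch only: parse the header line of $data with
# the given munge_column_names value and return the resulting names.
sub header_names {
    my ($data, $munge) = @_;
    my $csv = Text::CSV_PP->new ({ binary => 1, auto_diag => 1 });
    open my $fh, "<", \$data or die "in-memory handle: $!";
    return $csv->header ($fh, { munge_column_names => $munge });
}

# Hypothetical sample data.
my $data = "post-kode,City Name\n1234AA,Duckstad\n";

my @none = header_names ($data, "none");
my @hash = header_names ($data, { "post-kode" => "pc" });
my @code = header_names ($data, sub { my $n = lc $_; $n =~ s/\W+/_/g; $n });

print join ("|", @none), "\n";
print join ("|", @hash), "\n";
print join ("|", @code), "\n";
```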
1177
1178 set_column_names
1179 $csv->header ($fh, { set_column_names => 1 });
1180
1181 The default is to set the instance's column names using
1182 "column_names" if the method is successful, so subsequent calls to
1183 "getline_hr" can return a hash. Setting the column names can be
1184 disabled by using a false value for this option.
1185
1186 As described in "return value" above, content is lost in scalar
1187 context.
1188
1189 Validation
1190
1191 When receiving CSV files from external sources, this method can be
1192 used to protect against changes in the layout by restricting to known
1193 headers (and typos in the header fields).
1194
1195 my %known = (
1196 "record key" => "c_rec",
1197 "rec id" => "c_rec",
1198 "id_rec" => "c_rec",
1199 "kode" => "code",
1200 "code" => "code",
1201 "vaule" => "value",
1202 "value" => "value",
1203 );
1204 my $csv = Text::CSV_PP->new ({ binary => 1, auto_diag => 1 });
1205 open my $fh, "<", $source or die "$source: $!";
1206 $csv->header ($fh, { munge_column_names => sub {
1207 s/\s+$//;
1208 s/^\s+//;
1209 $known{lc $_} or die "Unknown column '$_' in $source";
1210 }});
1211 while (my $row = $csv->getline_hr ($fh)) {
1212 say join "\t", $row->{c_rec}, $row->{code}, $row->{value};
1213 }
1214
1215 bind_columns
1216 Takes a list of scalar references to be used for output with "print"
1217 or to store the fields fetched by "getline". When you do not pass
1218 enough references to store the fetched fields in, "getline" will fail
1219 with error 3006. If you pass more than there are fields to return,
1220 the content of the remaining references is left untouched.
1221
1222 $csv->bind_columns (\$code, \$name, \$price, \$description);
1223 while ($csv->getline ($fh)) {
1224 print "The price of a $name is \x{20ac} $price\n";
1225 }
1226
1227 To reset or clear all column binding, call "bind_columns" with the
1228 single argument "undef". This will also clear column names.
1229
1230 $csv->bind_columns (undef);
1231
1232 If no arguments are passed at all, "bind_columns" will return the list
1233 of current bindings or "undef" if no binds are active.
1234
1235 Note that in parsing with "bind_columns", the fields are set on the
1236 fly. That implies that if the third field of a row causes an error
1237 (or this row has just two fields where the previous row had more), the
1238 first two fields already have been assigned the values of the current
1239 row, while the rest of the fields will still hold the values of the
1240 previous row. If you want the parser to fail in these cases, use the
1241 "strict" attribute.
1242
1243 eof
1244 $eof = $csv->eof ();
1245
1246 If "parse" or "getline" was used with an IO stream, this method will
1247 return true (1) if the last call hit end of file, otherwise it will
1248 return false (''). This is useful to see the difference between a
1249 failure and end of file.
1250
1251 Note that if the parsing of the last line caused an error, "eof" is
1252 still true. That means that if you are not using "auto_diag", an idiom
1253 like
1254
1255 while (my $row = $csv->getline ($fh)) {
1256 # ...
1257 }
1258 $csv->eof or $csv->error_diag;
1259
1260 will not report the error. You would have to change that to
1261
1262 while (my $row = $csv->getline ($fh)) {
1263 # ...
1264 }
1265 +$csv->error_diag and $csv->error_diag;
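
The difference between a clean end of file and an error on the last
line can be sketched as follows. The sample input is hypothetical and
ends in an unterminated quoted field. The sketch also assumes that
error code 2012 is what a clean end of data reports, so any other
non-zero code is treated as a real error.

```perl
use strict;
use warnings;
use Text::CSV_PP;

# No auto_diag here, so errors have to be checked by hand.
my $csv = Text::CSV_PP->new ({ binary => 1 });

# Hypothetical input: the last record has an unterminated quote.
my $data = qq{1,ok\n2,"broken};
open my $fh, "<", \$data or die "in-memory handle: $!";

my @rows;
while (my $row = $csv->getline ($fh)) {
    push @rows, $row;
}

my $at_eof = $csv->eof ? 1 : 0;
my ($code, $msg) = $csv->error_diag;

# Assumption of this sketch: 2012 marks a clean end of data.
my $real_error = ($code && $code != 2012) ? 1 : 0;

print "rows read: ", scalar @rows, "\n";
print "eof: $at_eof, error: $code ($msg)\n" if $real_error;
```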
1266
1267 types
1268 $csv->types (\@tref);
1269
1270 This method is used to force that (all) columns are of a given type.
1271 For example, if you have an integer column, two columns with
1272 doubles and a string column, then you might do a
1273
1274 $csv->types ([Text::CSV_PP::IV (),
1275 Text::CSV_PP::NV (),
1276 Text::CSV_PP::NV (),
1277 Text::CSV_PP::PV ()]);
1278
1279 Column types are used only for decoding columns while parsing, in
1280 other words by the "parse" and "getline" methods.
1281
1282 You can unset column types by doing a
1283
1284 $csv->types (undef);
1285
1286 or fetch the current type settings with
1287
1288 $types = $csv->types ();
1289
1290 IV Set field type to integer.
1291
1292 NV Set field type to numeric/float.
1293
1294 PV Set field type to string.
1295
1296 fields
1297 @columns = $csv->fields ();
1298
1299 This method returns the input to "combine" or the resultant
1300 decomposed fields of a successful "parse", whichever was called more
1301 recently.
1302
1303 Note that the return value is undefined after using "getline", which
1304 does not fill the data structures returned by "parse".
1305
1306 meta_info
1307 @flags = $csv->meta_info ();
1308
1309 This method returns the "flags" of the input to "combine" or the flags
1310 of the resultant decomposed fields of "parse", whichever was called
1311 more recently.
1312
1313 For each field, a meta_info field will hold flags that inform
1314 something about the field returned by the "fields" method or
1315 passed to the "combine" method. The flags are bit-wise-"or"'d like:
1316
1317 0x0001
1318 The field was quoted.
1319
1320 0x0002
1321 The field was binary.
1322
1323 See the "is_***" methods below.
1324
1325 is_quoted
1326 my $quoted = $csv->is_quoted ($column_idx);
1327
1328 Where $column_idx is the (zero-based) index of the column in the
1329 last result of "parse".
1330
1331 This returns a true value if the data in the indicated column was
1332 enclosed in "quote_char" quotes. This might be important for fields
1333 where content ",20070108," is to be treated as a numeric value, and
1334 where ","20070108"," is explicitly marked as character string data.
1335
1336 This method is only valid when "keep_meta_info" is set to a true value.
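
A minimal sketch, using a made-up input line, that reads the quotation
state both through "is_quoted" and through the 0x0001 bit returned by
"meta_info":

```perl
use strict;
use warnings;
use Text::CSV_PP;

# keep_meta_info must be true for the flags to be recorded.
my $csv = Text::CSV_PP->new ({ binary => 1, keep_meta_info => 1 });

# Hypothetical line: the same value once unquoted and once quoted.
$csv->parse (q{20070108,"20070108"}) or die "parse failed";

my @fields = $csv->fields;
my @flags  = $csv->meta_info;
my @quoted = map { $csv->is_quoted ($_) ? 1 : 0 } 0 .. $#fields;

for my $i (0 .. $#fields) {
    my $bit = ($flags[$i] & 0x0001) ? "set" : "clear";
    print "field $i: $fields[$i] quoted=$quoted[$i] (bit 0x0001 $bit)\n";
}
```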
1337
1338 is_binary
1339 my $binary = $csv->is_binary ($column_idx);
1340
1341 Where $column_idx is the (zero-based) index of the column in the
1342 last result of "parse".
1343
1344 This returns a true value if the data in the indicated column contained
1345 any byte in the range "[\x00-\x08,\x10-\x1F,\x7F-\xFF]".
1346
1347 This method is only valid when "keep_meta_info" is set to a true value.
1348
1349 is_missing
1350 my $missing = $csv->is_missing ($column_idx);
1351
1352 Where $column_idx is the (zero-based) index of the column in the
1353 last result of "getline_hr".
1354
1355 $csv->keep_meta_info (1);
1356 while (my $hr = $csv->getline_hr ($fh)) {
1357 $csv->is_missing (0) and next; # This was an empty line
1358 }
1359
1360 When using "getline_hr", it is impossible to tell if the parsed
1361 fields are "undef" because they were not filled in the "CSV" stream
1362 or because they were not read at all, as all the fields defined by
1363 "column_names" are set in the hash-ref. If you still need to know if
1364 all fields in each row are provided, you should enable "keep_meta_info"
1365 so you can check the flags.
1366
1367 If "keep_meta_info" is "false", "is_missing" will always return
1368 "undef", regardless of $column_idx being valid or not. If this
1369 attribute is "true" it will return either 0 (the field is present) or 1
1370 (the field is missing).
1371
1372 A special case is the empty line. If the line is completely empty -
1373 after dealing with the flags - this is still a valid CSV line: it is a
1374 record of just one single empty field. However, if "keep_meta_info" is
1375 set, invoking "is_missing" with index 0 will now return true.
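
A sketch of checking every declared column of a short row, assuming
"keep_meta_info" is enabled as described above. The column names and
the sample data (whose second record lacks the third field) are
hypothetical.

```perl
use strict;
use warnings;
use Text::CSV_PP;

my $csv = Text::CSV_PP->new ({
    binary         => 1,
    auto_diag      => 1,
    keep_meta_info => 1,
});
$csv->column_names (qw( code name price ));

# Hypothetical input: the second record has only two fields.
my $data = "1,pc,850\n2,mouse\n";
open my $fh, "<", \$data or die "in-memory handle: $!";

my @missing;
while (my $hr = $csv->getline_hr ($fh)) {
    # One digit per declared column: 1 if missing, 0 if present.
    push @missing, join "", map { $csv->is_missing ($_) ? 1 : 0 } 0 .. 2;
}

print "$_\n" for @missing;
```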
1376
1377 status
1378 $status = $csv->status ();
1379
1380 This method returns the status of the last invoked "combine" or "parse"
1381 call. Status is success (true: 1) or failure (false: "undef" or 0).
1382
1383 error_input
1384 $bad_argument = $csv->error_input ();
1385
1386 This method returns the erroneous argument (if it exists) of "combine"
1387 or "parse", whichever was called more recently. If the last
1388 invocation was successful, "error_input" will return "undef".
1389
1390 error_diag
1391 Text::CSV_PP->error_diag ();
1392 $csv->error_diag ();
1393 $error_code = 0 + $csv->error_diag ();
1394 $error_str = "" . $csv->error_diag ();
1395 ($cde, $str, $pos, $rec, $fld) = $csv->error_diag ();
1396
1397 If (and only if) an error occurred, this function returns the
1398 diagnostics of that error.
1399
1400 If called in void context, this will print the internal error code and
1401 the associated error message to STDERR.
1402
1403 If called in list context, this will return the error code and the
1404 error message in that order. If the last error was from parsing, the
1405 rest of the values returned are a best guess at the location within
1406 the line that was being parsed. Their values are 1-based. The
1407 position currently is index of the byte at which the parsing failed in
1408 the current record. It might change to be the index of the current
1409 character in a later release. The record number is the index of the record
1410 parsed by the csv instance. The field number is the index of the field
1411 the parser thinks it is currently trying to parse. See
1412 examples/csv-check for how this can be used.
1413
1414 If called in scalar context, it will return the diagnostics in a
1415 single scalar, a-la $!. It will contain the error code in numeric
1416 context, and the diagnostics message in string context.
1417
1418 When called as a class method or a direct function call, the
1419 diagnostics are that of the last "new" call.
1420
1421 record_number
1422 $recno = $csv->record_number ();
1423
1424 Returns the number of records parsed by this csv instance. This value
1425 should be more accurate than $. when embedded newlines come into play.
1426 Records written by this instance are not counted.
1427
1428 SetDiag
1429 $csv->SetDiag (0);
1430
1431 Use to reset the diagnostics if you are dealing with errors.
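
The calling contexts of "error_diag" and the effect of "SetDiag" can
be sketched with a deliberately broken input string (made up for this
illustration):

```perl
use strict;
use warnings;
use Text::CSV_PP;

my $csv = Text::CSV_PP->new ({ binary => 1 });

# Hypothetical broken line: the quoted field is never closed.
$csv->parse (q{a,"b}) and die "expected a parse failure";

my ($code, $str, $pos, $rec, $fld) = $csv->error_diag;   # list context
my $num  = 0  + $csv->error_diag;                        # numeric context
my $text = "" . $csv->error_diag;                        # string context

# Reset the diagnostics once the error has been dealt with.
$csv->SetDiag (0);
my $after = 0 + $csv->error_diag;

print "error $num: $text\n";
print "after reset: $after\n";
```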
1432
1434 This section is also taken from Text::CSV_XS.
1435
1436 csv
1437 This function is not exported by default and should be explicitly
1438 requested:
1439
1440 use Text::CSV_PP qw( csv );
1441
1442 This is a high-level function that aims at simple (user) interfaces.
1443 This can be used to read/parse a "CSV" file or stream (the default
1444 behavior) or to produce a file or write to a stream (define the "out"
1445 attribute). It returns an array- or hash-reference on parsing (or
1446 "undef" on fail) or the numeric value of "error_diag" on writing.
1447 When this function fails you can get to the error using the class call
1448 to "error_diag"
1449
1450 my $aoa = csv (in => "test.csv") or
1451 die Text::CSV_PP->error_diag;
1452
1453 This function takes the arguments as key-value pairs. This can be
1454 passed as a list or as an anonymous hash:
1455
1456 my $aoa = csv ( in => "test.csv", sep_char => ";");
1457 my $aoh = csv ({ in => $fh, headers => "auto" });
1458
1459 The arguments passed consist of two parts: the arguments to "csv"
1460 itself and the optional attributes to the "CSV" object used inside
1461 the function as enumerated and explained in "new".
1462
1463 If not overridden, the default options used for CSV are
1464
1465 auto_diag => 1
1466 escape_null => 0
1467
1468 The option that is always set and cannot be altered is
1469
1470 binary => 1
1471
1472 As this function will likely be used in one-liners, it allows "quote"
1473 to be abbreviated as "quo", and "escape_char" to be abbreviated as
1474 "esc" or "escape".
1475
1476 Alternative invocations:
1477
1478 my $aoa = Text::CSV_PP::csv (in => "file.csv");
1479
1480 my $csv = Text::CSV_PP->new ();
1481 my $aoa = $csv->csv (in => "file.csv");
1482
1483 In the latter case, the object attributes are used from the existing
1484 object and the attribute arguments in the function call are ignored:
1485
1486 my $csv = Text::CSV_PP->new ({ sep_char => ";" });
1487 my $aoh = $csv->csv (in => "file.csv", bom => 1);
1488
1489 will parse using ";" as "sep_char", not ",".
1490
1491 in
1492
1493 Used to specify the source. "in" can be a file name (e.g. "file.csv"),
1494 which will be opened for reading and closed when finished, a file
1495 handle (e.g. $fh or "FH"), a reference to a glob (e.g. "\*ARGV"),
1496 the glob itself (e.g. *STDIN), or a reference to a scalar (e.g.
1497 "\q{1,2,"csv"}").
1498
1499 When used with "out", "in" should be a reference to a CSV structure
1500 (AoA or AoH) or a CODE-ref that returns an array-reference or a hash-
1501 reference. The code-ref will be invoked with no arguments.
1502
1503 my $aoa = csv (in => "file.csv");
1504
1505 open my $fh, "<", "file.csv" or die "file.csv: $!";
1506 my $aoa = csv (in => $fh);
1507
1508 my $csv = [ [qw( Foo Bar )], [ 1, 2 ], [ 2, 3 ]];
1509 my $err = csv (in => $csv, out => "file.csv");
1510
1511 If called in void context without the "out" attribute, the resulting
1512 ref will be used as input to a subsequent call to csv:
1513
1514 csv (in => "file.csv", filter => { 2 => sub { length > 2 }})
1515
1516 will be a shortcut to
1517
1518 csv (in => csv (in => "file.csv", filter => { 2 => sub { length > 2 }}))
1519
1520 where, in the absence of the "out" attribute, this is a shortcut to
1521
1522 csv (in => csv (in => "file.csv", filter => { 2 => sub { length > 2 }}),
1523 out => *STDOUT)
1524
1525 out
1526
1527 csv (in => $aoa, out => "file.csv");
1528 csv (in => $aoa, out => $fh);
1529 csv (in => $aoa, out => STDOUT);
1530 csv (in => $aoa, out => *STDOUT);
1531 csv (in => $aoa, out => \*STDOUT);
1532 csv (in => $aoa, out => \my $data);
1533 csv (in => $aoa, out => undef);
1534 csv (in => $aoa, out => \"skip");
1535
1536 In output mode, the default CSV options when producing CSV are
1537
1538 eol => "\r\n"
1539
1540 The "fragment" attribute is ignored in output mode.
1541
1542 "out" can be a file name (e.g. "file.csv"), which will be opened for
1543 writing and closed when finished, a file handle (e.g. $fh or "FH"), a
1544 reference to a glob (e.g. "\*STDOUT"), the glob itself (e.g. *STDOUT),
1545 or a reference to a scalar (e.g. "\my $data").
1546
1547 csv (in => sub { $sth->fetch }, out => "dump.csv");
1548 csv (in => sub { $sth->fetchrow_hashref }, out => "dump.csv",
1549 headers => $sth->{NAME_lc});
1550
1551 When a code-ref is used for "in", the output is generated per
1552 invocation, so no buffering is involved. This implies that there is no
1553 size restriction on the number of records. The "csv" function ends when
1554 the coderef returns a false value.
1555
1556 If "out" is set to a reference of the literal string "skip", the output
1557 will be suppressed completely, which might be useful in combination
1558 with a filter for side effects only.
1559
1560 my %cache;
1561 csv (in => "dump.csv",
1562 out => \"skip",
1563 on_in => sub { $cache{$_[1][1]}++ });
1564
1565 Currently, setting "out" to any false value ("undef", "", 0) will be
1566 equivalent to "\"skip"".
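
A round trip through scalar references shows both directions without
touching the file system. This is a sketch with made-up data; the
first output relies on the default "eol" of "\r\n" mentioned above,
the second overrides "sep_char" and "eol".

```perl
use strict;
use warnings;
use Text::CSV_PP qw( csv );

# Hypothetical CSV content held in a scalar.
my $data = "code,price\n1,850\n2,12\n";

# Reading: "in" is a reference to the scalar.
my $aoa = csv (in => \$data);
my $aoh = csv (in => \$data, headers => "auto");

# Writing: "out" is a reference to a scalar that receives the CSV.
csv (in => $aoa, out => \my $out);
csv (in => $aoa, out => \my $semi, sep_char => ";", eol => "\n");

print scalar @$aoa, " rows as arrays, ", scalar @$aoh, " rows as hashes\n";
print $semi;
```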
1567
1568 encoding
1569
1570 If passed, it should be an encoding accepted by the ":encoding()"
1571 option to "open". There is no default value. This attribute does not
1572 work in perl 5.6.x. "encoding" can be abbreviated to "enc" for ease of
1573 use in command line invocations.
1574
1575 If "encoding" is set to the literal value "auto", the method "header"
1576 will be invoked on the opened stream to check if there is a BOM and set
1577 the encoding accordingly. This is equal to passing a true value in
1578 the option "detect_bom".
1579
1580 detect_bom
1581
1582 If "detect_bom" is given, the method "header" will be invoked on
1583 the opened stream to check if there is a BOM and set the encoding
1584 accordingly.
1585
1586 "detect_bom" can be abbreviated to "bom".
1587
1588 This is the same as setting "encoding" to "auto".
1589
1590 Note that as the method "header" is invoked, its default is to also
1591 set the headers.
1592
1593 headers
1594
1595 If this attribute is not given, the default behavior is to produce an
1596 array of arrays.
1597
1598 If "headers" is supplied, it should be an anonymous list of column
1599 names, an anonymous hashref, a coderef, or a literal flag: "auto",
1600 "lc", "uc", or "skip".
1601
1602 skip
1603 When "skip" is used, the header will not be included in the output.
1604
1605 my $aoa = csv (in => $fh, headers => "skip");
1606
1607 auto
1608 If "auto" is used, the first line of the "CSV" source will be read as
1609 the list of field headers and used to produce an array of hashes.
1610
1611 my $aoh = csv (in => $fh, headers => "auto");
1612
1613 lc
1614 If "lc" is used, the first line of the "CSV" source will be read as
1615 the list of field headers mapped to lower case and used to produce
1616 an array of hashes. This is a variation of "auto".
1617
1618 my $aoh = csv (in => $fh, headers => "lc");
1619
1620 uc
1621 If "uc" is used, the first line of the "CSV" source will be read as
1622 the list of field headers mapped to upper case and used to produce
1623 an array of hashes. This is a variation of "auto".
1624
1625 my $aoh = csv (in => $fh, headers => "uc");
1626
1627 CODE
1628 If a coderef is used, the first line of the "CSV" source will be
1629 read as the list of mangled field headers in which each field is
1630 passed as the only argument to the coderef. This list is used to
1631 produce an array of hashes.
1632
1633 my $aoh = csv (in => $fh,
1634 headers => sub { lc ($_[0]) =~ s/kode/code/gr });
1635
1636 This example is a variation of using "lc" where all occurrences of
1637 "kode" are replaced with "code".
1638
1639 ARRAY
1640 If "headers" is an anonymous list, the entries in the list will be
1641 used as field names. The first line is considered data instead of
1642 headers.
1643
1644 my $aoh = csv (in => $fh, headers => [qw( Foo Bar )]);
1645 csv (in => $aoa, out => $fh, headers => [qw( code description price )]);
1646
1647 HASH
1648 If "headers" is a hash reference, this implies "auto", but header
1649 fields that exist as a key in the hashref will be replaced by the
1650 value for that key. Given a CSV file like
1651
1652 post-kode,city,name,id number,fubble
1653 1234AA,Duckstad,Donald,13,"X313DF"
1654
1655 using
1656
1657 csv (headers => { "post-kode" => "pc", "id number" => "ID" }, ...
1658
1659 will return an entry like
1660
1661 { pc => "1234AA",
1662 city => "Duckstad",
1663 name => "Donald",
1664 ID => "13",
1665 fubble => "X313DF",
1666 }
1667
1668 See also "munge_column_names" and "set_column_names".
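
Three of the "headers" variants can be compared on the same input.
This is only a sketch: the sample data and the replacement names are
made up for illustration.

```perl
use strict;
use warnings;
use Text::CSV_PP qw( csv );

# Hypothetical input with a mixed-case header line.
my $data = "Post-Kode,City\n1234AA,Duckstad\n";

# "lc": header fields are lower-cased and used as hash keys.
my $lc = csv (in => \$data, headers => "lc");

# ARRAY: the given names are used, the first line counts as data.
my $arr = csv (in => \$data, headers => [qw( zip town )]);

# HASH: like "auto", but matching header fields are renamed.
my $hash = csv (in => \$data, headers => { "Post-Kode" => "pc" });

print "lc:    $lc->[0]{'post-kode'}\n";
print "array: ", scalar @$arr, " records, first zip is $arr->[0]{zip}\n";
print "hash:  $hash->[0]{pc} in $hash->[0]{City}\n";
```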
1669
1670 munge_column_names
1671
1672 If "munge_column_names" is set, the method "header" is invoked on
1673 the opened stream with all matching arguments to detect and set the
1674 headers.
1675
1676 "munge_column_names" can be abbreviated to "munge".
1677
1678 key
1679
1680 If passed, will default "headers" to "auto" and return a hashref
1681 instead of an array of hashes. Allowed values are simple scalars or
1682 array-references where the first element is the joiner and the rest are
1683 the fields to join to combine the key.
1684
1685 my $ref = csv (in => "test.csv", key => "code");
1686 my $ref = csv (in => "test.csv", key => [ ":" => "code", "color" ]);
1687
1688 with test.csv like
1689
1690 code,product,price,color
1691 1,pc,850,gray
1692 2,keyboard,12,white
1693 3,mouse,5,black
1694
1695 the first example will return
1696
1697 { 1 => {
1698 code => 1,
1699 color => 'gray',
1700 price => 850,
1701 product => 'pc'
1702 },
1703 2 => {
1704 code => 2,
1705 color => 'white',
1706 price => 12,
1707 product => 'keyboard'
1708 },
1709 3 => {
1710 code => 3,
1711 color => 'black',
1712 price => 5,
1713 product => 'mouse'
1714 }
1715 }
1716
1717 the second example will return
1718
1719 { "1:gray" => {
1720 code => 1,
1721 color => 'gray',
1722 price => 850,
1723 product => 'pc'
1724 },
1725 "2:white" => {
1726 code => 2,
1727 color => 'white',
1728 price => 12,
1729 product => 'keyboard'
1730 },
1731 "3:black" => {
1732 code => 3,
1733 color => 'black',
1734 price => 5,
1735 product => 'mouse'
1736 }
1737 }
1738
1739 The "key" attribute can be combined with "headers" for "CSV" data that
1740 has no header line, like
1741
1742 my $ref = csv (
1743 in => "foo.csv",
1744 headers => [qw( c_foo foo bar description stock )],
1745 key => "c_foo",
1746 );
1747
1748 value
1749
1750 Used to create key-value hashes.
1751
1752 Only allowed when "key" is valid. A "value" can be either a single
1753 column label or an anonymous list of column labels. In the first case,
1754 the value will be a simple scalar value, in the latter case, it will be
1755 a hashref.
1756
1757 my $ref = csv (in => "test.csv", key => "code",
1758 value => "price");
1759 my $ref = csv (in => "test.csv", key => "code",
1760 value => [ "product", "price" ]);
1761 my $ref = csv (in => "test.csv", key => [ ":" => "code", "color" ],
1762 value => "price");
1763 my $ref = csv (in => "test.csv", key => [ ":" => "code", "color" ],
1764 value => [ "product", "price" ]);
1765
1766 with test.csv like
1767
1768 code,product,price,color
1769 1,pc,850,gray
1770 2,keyboard,12,white
1771 3,mouse,5,black
1772
1773 the first example will return
1774
1775 { 1 => 850,
1776 2 => 12,
1777 3 => 5,
1778 }
1779
1780 the second example will return
1781
1782 { 1 => {
1783 price => 850,
1784 product => 'pc'
1785 },
1786 2 => {
1787 price => 12,
1788 product => 'keyboard'
1789 },
1790 3 => {
1791 price => 5,
1792 product => 'mouse'
1793 }
1794 }
1795
1796 the third example will return
1797
1798 { "1:gray" => 850,
1799 "2:white" => 12,
1800 "3:black" => 5,
1801 }
1802
1803 the fourth example will return
1804
1805 { "1:gray" => {
1806 price => 850,
1807 product => 'pc'
1808 },
1809 "2:white" => {
1810 price => 12,
1811 product => 'keyboard'
1812 },
1813 "3:black" => {
1814 price => 5,
1815 product => 'mouse'
1816 }
1817 }
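
The combinations of "key" and "value" shown above can be run against
the same sample held in a scalar. This sketch reuses the test.csv
content from the examples; reading it through a scalar reference is an
assumption made to keep the sketch self-contained.

```perl
use strict;
use warnings;
use Text::CSV_PP qw( csv );

# Same content as test.csv in the examples above.
my $data = join "", map { "$_\n" }
    "code,product,price,color",
    "1,pc,850,gray",
    "2,keyboard,12,white",
    "3,mouse,5,black";

# key only: a hash of records.
my $by_code = csv (in => \$data, key => "code");

# key plus a single value column: a plain key-value hash.
my $price = csv (in => \$data, key => "code", value => "price");

# combined key plus a list of value columns: a hash of small hashes.
my $by_combo = csv (in => \$data,
    key   => [ ":" => "code", "color" ],
    value => [ "product", "price" ]);

print "product 2: $by_code->{2}{product}\n";
print "price 3:   $price->{3}\n";
print "1:gray:    $by_combo->{'1:gray'}{product}\n";
```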
1818
1819 keep_headers
1820
1821 When using hashes, store the column names in the arrayref passed, so
1822 all headers are available after the call in the original order.
1823
1824 my $aoh = csv (in => "file.csv", keep_headers => \my @hdr);
1825
1826 This attribute can be abbreviated to "kh" or passed as
1827 "keep_column_names".
1828
1829 This attribute implies a default of "auto" for the "headers" attribute.
1830
1831 fragment
1832
1833 Only output the fragment as defined in the "fragment" method. This
1834 option is ignored when generating "CSV". See "out".
1835
1836 Combining all of them could give something like
1837
1838 use Text::CSV_PP qw( csv );
1839 my $aoh = csv (
1840 in => "test.txt",
1841 encoding => "utf-8",
1842 headers => "auto",
1843 sep_char => "|",
1844 fragment => "row=3;6-9;15-*",
1845 );
1846 say $aoh->[15]{Foo};
1847
1848 sep_set
1849
1850 If "sep_set" is set, the method "header" is invoked on the opened
1851 stream to detect and set "sep_char" with the given set.
1852
1853 "sep_set" can be abbreviated to "seps".
1854
1855 Note that as the "header" method is invoked, its default is to also
1856 set the headers.
1857
1858 set_column_names
1859
1860 If "set_column_names" is passed, the method "header" is invoked on
1861 the opened stream with all arguments meant for "header".
1862
1863 If "set_column_names" is passed as a false value, the content of the
1864 first row is only preserved if the output is AoA:
1865
1866 With an input-file like
1867
1868 bAr,foo
1869 1,2
1870 3,4,5
1871
1872 This call
1873
1874 my $aoa = csv (in => $file, set_column_names => 0);
1875
1876 will result in
1877
1878 [[ "bar", "foo" ],
1879 [ "1", "2" ],
1880 [ "3", "4", "5" ]]
1881
1882 and
1883
1884 my $aoa = csv (in => $file, set_column_names => 0, munge => "none");
1885
1886 will result in
1887
1888 [[ "bAr", "foo" ],
1889 [ "1", "2" ],
1890 [ "3", "4", "5" ]]
1891
1892 Callbacks
1893 Callbacks enable actions triggered from the inside of Text::CSV_PP.
1894
1895 While most of what this enables can easily be done in an unrolled
1896 loop as described in the "SYNOPSIS", callbacks can be used to meet
1897 special demands or enhance the "csv" function.
1898
1899 error
1900 $csv->callbacks (error => sub { $csv->SetDiag (0) });
1901
1902 The "error" callback is invoked when an error occurs, but only
1903 when "auto_diag" is set to a true value. The callback is invoked with
1904 the values returned by "error_diag":
1905
1906 my ($c, $s);
1907
1908 sub ignore3006
1909 {
1910 my ($err, $msg, $pos, $recno, $fldno) = @_;
1911 if ($err == 3006) {
1912 # ignore this error
1913 ($c, $s) = (undef, undef);
1914 Text::CSV_PP->SetDiag (0);
1915 }
1916 # Any other error
1917 return;
1918 } # ignore3006
1919
1920 $csv->callbacks (error => \&ignore3006);
1921 $csv->bind_columns (\$c, \$s);
1922 while ($csv->getline ($fh)) {
1923 # Error 3006 will not stop the loop
1924 }
1925
1926 after_parse
1927 $csv->callbacks (after_parse => sub { push @{$_[1]}, "NEW" });
1928 while (my $row = $csv->getline ($fh)) {
1929 $row->[-1] eq "NEW";
1930 }
1931
1932 This callback is invoked after parsing with "getline" only if no
1933 error occurred. The callback is invoked with two arguments: the
1934 current "CSV" parser object and an array reference to the fields
1935 parsed.
1936
1937 The return code of the callback is ignored unless it is a reference
1938 to the string "skip", in which case the record will be skipped in
1939 "getline_all".
1940
1941 sub add_from_db
1942 {
1943 my ($csv, $row) = @_;
1944 $sth->execute ($row->[4]);
1945 push @$row, $sth->fetchrow_array;
1946 } # add_from_db
1947
1948 my $aoa = csv (in => "file.csv", callbacks => {
1949 after_parse => \&add_from_db });
1950
1951 This hook can be used for validation:
1952
1953 FAIL
1954 Die if any of the records does not validate a rule:
1955
1956 after_parse => sub {
1957 $_[1][4] =~ m/^[0-9]{4}\s?[A-Z]{2}$/ or
1958 die "5th field does not have a valid Dutch zipcode";
1959 }
1960
1961 DEFAULT
1962 Replace invalid fields with a default value:
1963
1964 after_parse => sub { $_[1][2] =~ m/^\d+$/ or $_[1][2] = 0 }
1965
1966 SKIP
1967 Skip records that have invalid fields (only applies to
1968 "getline_all"):
1969
1970 after_parse => sub { $_[1][0] =~ m/^\d+$/ or return \"skip"; }
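
The SKIP rule can be tried on a small data set. This is a sketch that
assumes hypothetical in-memory data and relies on the "csv" function
reading all records at once, so that the "skip" return value is honored.

```perl
use strict;
use warnings;
use Text::CSV_PP qw( csv );

# Hypothetical data: the second record has a non-numeric first field
my $data = "1,one\nx,bad\n3,three\n";

my $aoa = csv (in => \$data, callbacks => {
    # Returning a reference to the string "skip" drops the record
    after_parse => sub { $_[1][0] =~ m/^\d+$/ or return \"skip"; },
    });

print join (",", @$_), "\n" for @$aoa;
```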
1971
1972 before_print
1973 my $idx = 1;
1974 $csv->callbacks (before_print => sub { $_[1][0] = $idx++ });
1975 $csv->print (*STDOUT, [ 0, $_ ]) for @members;
1976
1977 This callback is invoked before printing with "print" only if no
1978 error occurred. The callback is invoked with two arguments: the
1979 current "CSV" parser object and an array reference to the fields
1980 passed.
1981
1982 The return code of the callback is ignored.
1983
1984 sub max_4_fields
1985 {
1986 my ($csv, $row) = @_;
1987 @$row > 4 and splice @$row, 4;
1988 } # max_4_fields
1989
1990 csv (in => csv (in => "file.csv"), out => *STDOUT,
1991 callbacks => { before_print => \&max_4_fields });
1992
1993 This callback is not active for "combine".
1994
1995 Callbacks for csv ()
1996
1997 The "csv" function allows for some callbacks that do not integrate in
1998 XS internals but are only available to the "csv" function itself.
1999
2000 csv (in => "file.csv",
2001 callbacks => {
2002 filter => { 6 => sub { $_ > 15 } }, # first
2003 after_parse => sub { say "AFTER PARSE"; }, # first
2004 after_in => sub { say "AFTER IN"; }, # second
2005 on_in => sub { say "ON IN"; }, # third
2006 },
2007 );
2008
2009 csv (in => $aoh,
2010 out => "file.csv",
2011 callbacks => {
2012 on_in => sub { say "ON IN"; }, # first
2013 before_out => sub { say "BEFORE OUT"; }, # second
2014 before_print => sub { say "BEFORE PRINT"; }, # third
2015 },
2016 );
2017
2018 filter
2019 This callback can be used to filter records. It is called just after
2020 a new record has been scanned. The callback accepts a:
2021
2022 hashref
2023 The keys are the index to the row (the field name or field number,
2024 1-based) and the values are subs to return a true or false value.
2025
2026 csv (in => "file.csv", filter => {
2027 3 => sub { m/a/ }, # third field should contain an "a"
2028 5 => sub { length > 4 }, # length of the 5th field minimal 5
2029 });
2030
2031 csv (in => "file.csv", filter => { foo => sub { $_ > 4 }});
2032
2033 If the keys to the filter hash contain any character that is not a
2034 digit it will also implicitly set "headers" to "auto" unless
2035 "headers" was already passed as argument. When headers are
2036 active, returning an array of hashes, the filter is not applicable
2037 to the header itself.
2038
2039 All sub results should match, as in AND.
2040
2041 The context of the callback sets $_ localized to the field
2042 indicated by the filter. The two arguments are as with all other
2043 callbacks, so the other fields in the current row can be seen:
2044
2045 filter => { 3 => sub { $_ > 100 ? $_[1][1] =~ m/A/ : $_[1][6] =~ m/B/ }}
2046
2047 If the context is set to return a list of hashes ("headers" is
2048 defined), the current record will also be available in the
2049 localized %_:
2050
2051 filter => { 3 => sub { $_ > 100 && $_{foo} =~ m/A/ && $_{bar} < 1000 }}
2052
2053 If the filter is used to alter the content by changing $_, make
2054 sure that the sub returns true in order not to have that record
2055 skipped:
2056
2057 filter => { 2 => sub { $_ = uc }}
2058
2059 will upper-case the second field, and then skip the record if the
2060 resulting content evaluates to false. To always accept, end with truth:
2061
2062 filter => { 2 => sub { $_ = uc; 1 }}
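
The two roles of a hashref filter, selecting records and altering the
field in $_, can be combined. A minimal sketch, assuming hypothetical
in-memory data:

```perl
use strict;
use warnings;
use Text::CSV_PP qw( csv );

# Hypothetical data: an id and a fruit name per record
my $data = "1,apple\n2,kiwi\n3,banana\n";

my $aoa = csv (in => \$data, filter => {
    # Reject records whose 2nd field has 4 characters or less,
    # upper-case the field otherwise and end with truth to accept
    2 => sub { length ($_) > 4 or return 0; $_ = uc; 1 },
    });

print join (",", @$_), "\n" for @$aoa;
```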
2063
2064 coderef
2065 csv (in => "file.csv", filter => sub { $n++; 0; });
2066
2067 If the argument to "filter" is a coderef, it is an alias or
2068 shortcut to a filter on column 0:
2069
2070 csv (filter => sub { $n++; 0 });
2071
2072 is equal to
2073
2074 csv (filter => { 0 => sub { $n++; 0 } });
2075
2076 filter-name
2077 csv (in => "file.csv", filter => "not_blank");
2078 csv (in => "file.csv", filter => "not_empty");
2079 csv (in => "file.csv", filter => "filled");
2080
2081 These are predefined filters.
2082
2083 Given a file like (line numbers prefixed for doc purpose only):
2084
2085 1:1,2,3
2086 2:
2087 3:,
2088 4:""
2089 5:,,
2090 6:, ,
2091 7:"",
2092 8:" "
2093 9:4,5,6
2094
2095 not_blank
2096 Filter out the blank lines
2097
2098 This filter is a shortcut for
2099
2100 filter => { 0 => sub { @{$_[1]} > 1 or
2101 defined $_[1][0] && $_[1][0] ne "" } }
2102
2103 Due to the implementation, it is currently impossible to also
2104 filter lines that consist only of a quoted empty field. These
2105 lines are also considered blank lines.
2106
2107 With the given example, lines 2 and 4 will be skipped.
2108
2109 not_empty
2110 Filter out lines where all the fields are empty.
2111
2112 This filter is a shortcut for
2113
2114 filter => { 0 => sub { grep { defined && $_ ne "" } @{$_[1]} } }
2115
2116 A space is not regarded as being empty, so given the example data,
2117 lines 2, 3, 4, 5, and 7 are skipped.
2118
2119 filled
2120 Filter out lines that have no visible data
2121
2122 This filter is a shortcut for
2123
2124 filter => { 0 => sub { grep { defined && m/\S/ } @{$_[1]} } }
2125
2126 This filter rejects all lines that do not have at least one field
2127 containing a non-whitespace character.
2128
2129 With the given example data, this filter would skip lines 2
2130 through 8.
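
The three predefined filters can be compared on the example data. The
sketch below assumes the nine lines shown earlier (without the line
number prefixes) are held in memory, and counts the records each filter
keeps. If the descriptions above hold, it should report 7, 4 and 2
records respectively.

```perl
use strict;
use warnings;
use Text::CSV_PP qw( csv );

# The example data, one record per line, line 2 being empty
my $data = <<'EOD';
1,2,3

,
""
,,
, ,
"",
" "
4,5,6
EOD

# Count the records that survive each of the predefined filters
my %kept;
for my $name (qw( not_blank not_empty filled )) {
    my $aoa = csv (in => \$data, filter => $name);
    $kept{$name} = scalar @$aoa;
    print "$name keeps $kept{$name} records\n";
    }
```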
2131
2132 after_in
2133 This callback is invoked for each record after all records have been
2134 parsed but before returning the reference to the caller. The hook is
2135 invoked with two arguments: the current "CSV" parser object and a
2136 reference to the record. The reference can be a reference to a
2137 HASH or a reference to an ARRAY as determined by the arguments.
2138
2139 This callback can also be passed as an attribute without the
2140 "callbacks" wrapper.
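
A minimal sketch of "after_in" with hash records, assuming hypothetical
in-memory data with two numeric columns. The hook adds a derived field to
each record before the result is returned.

```perl
use strict;
use warnings;
use Text::CSV_PP qw( csv );

# Hypothetical data with a header line
my $data = "a,b\n1,2\n3,4\n";

my $aoh = csv (
    in       => \$data,
    headers  => "auto",
    # $_[1] is a hashref here because headers are in use
    after_in => sub { $_[1]{total} = $_[1]{a} + $_[1]{b} },
    );

print "$_->{a} + $_->{b} = $_->{total}\n" for @$aoh;
```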
2141
2142 before_out
2143 This callback is invoked for each record before the record is
2144 printed. The hook is invoked with two arguments: the current "CSV"
2145 parser object and a reference to the record. The reference can be a
2146 reference to a HASH or a reference to an ARRAY as determined by the
2147 arguments.
2148
2149 This callback can also be passed as an attribute without the
2150 "callbacks" wrapper.
2151
2152 This callback makes the row available in %_ if the row is a hashref.
2153 In this case %_ is writable and will change the original row.
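
A sketch of "before_out" using the writable %_, under the assumptions
that the output is collected in an in-memory buffer (a reference to an
undefined scalar passed as "out") and that the column order is fixed with
an explicit "headers" list. The records are hypothetical.

```perl
use strict;
use warnings;
use Text::CSV_PP qw( csv );

# Hypothetical records to be written as CSV
my $aoh = [
    { name => "apple", price => 1.5 },
    { name => "kiwi",  price => 2   },
    ];

csv (
    in         => $aoh,
    out        => \my $out,              # in-memory buffer
    headers    => [qw( name price )],    # fixed column order
    # %_ is the current row; changes end up in the output
    before_out => sub { $_{price} = sprintf "%.2f", $_{price} },
    );

print $out;
```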
2154
2155 on_in
2156 This callback acts exactly as the "after_in" or the "before_out"
2157 hooks.
2158
2159 This callback can also be passed as an attribute without the
2160 "callbacks" wrapper.
2161
2162 This callback makes the row available in %_ if the row is a hashref.
2163 In this case %_ is writable and will change the original row. So e.g.
2164 with
2165
2166 my $aoh = csv (
2167 in => \"foo\n1\n2\n",
2168 headers => "auto",
2169 on_in => sub { $_{bar} = 2; },
2170 );
2171
2172 $aoh will be:
2173
2174 [ { foo => 1,
2175 bar => 2,
2176 },
2177 { foo => 2,
2178 bar => 2,
2179 }
2180 ]
2181
2182 csv
2183 The function "csv" can also be called as a method or with an
2184 existing Text::CSV_PP object. This can help if the function is to be
2185 invoked many times: passing an existing instance avoids the overhead
2186 of creating the object internally over and over again.
2188
2189 my $csv = Text::CSV_PP->new ({ binary => 1, auto_diag => 1 });
2190
2191 my $aoa = $csv->csv (in => $fh);
2192 my $aoa = csv (in => $fh, csv => $csv);
2193
2194 Both act the same. Running this 20000 times on a 20-line CSV file
2195 showed a 53% speedup.
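
Both call styles can be checked side by side. A minimal sketch, assuming
hypothetical in-memory data instead of an open file handle:

```perl
use strict;
use warnings;
use Text::CSV_PP qw( csv );

# Hypothetical in-memory data
my $data = "a,b\n1,2\n";

# One parser object, reused for every call
my $csv = Text::CSV_PP->new ({ binary => 1, auto_diag => 1 });

my $as_method   = $csv->csv (in => \$data);
my $as_function = csv (in => \$data, csv => $csv);

print scalar @$as_method, " and ", scalar @$as_function, " records\n";
```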
2196
2198 This section is also taken from Text::CSV_XS.
2199
2200 Still under construction ...
2201
2202 If an error occurs, "$csv->error_diag" can be used to get information
2203 on the cause of the failure. Note that for speed reasons the internal
2204 value is never cleared on success, so using the value returned by
2205 "error_diag" in normal cases - when no error occurred - may cause
2206 unexpected results.
2207
2208 If the constructor failed, the cause can be found using "error_diag" as
2209 a class method, like "Text::CSV_PP->error_diag".
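
A minimal sketch of both uses of "error_diag", assuming two deliberately
broken inputs: a constructor call where "sep_char" equals the default
"quote_char", and a line with a quoted field that is never closed. In
list context "error_diag" returns the error code first, followed by the
message and the position.

```perl
use strict;
use warnings;
use Text::CSV_PP;

# Constructor failure: sep_char is equal to the default quote_char
my $bad = Text::CSV_PP->new ({ sep_char => '"' });
my ($new_code, $new_msg) = Text::CSV_PP->error_diag;
defined $bad or print "new failed: $new_code $new_msg\n";

# Parse failure: the quoted field is not terminated
my $csv = Text::CSV_PP->new ({ auto_diag => 0 })
    or die "cannot create parser: " . Text::CSV_PP->error_diag;
my $ok = $csv->parse ('1,"unterminated');
my ($code, $msg, $pos) = $csv->error_diag;
$ok or print "parse failed: $code $msg\n";
```

As the internal value is not cleared on success, the sketch only looks at
"error_diag" after a call has reported failure.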
2210
2211 The "$csv->error_diag" method is automatically invoked upon error when
2212 the constructor was called with "auto_diag" set to 1 or 2, or when
2213 autodie is in effect. When set to 1, this will cause a "warn" with the
2214 error message; when set to 2, it will "die". "2012 - EOF" is excluded
2215 from "auto_diag" reports.
2216
2217 Errors can be (individually) caught using the "error" callback.
2218
2219 The errors as described below are available. I have tried to make the
2220 error itself explanatory enough, but more descriptions will be added.
2221 For most of these errors, the first three capitals describe the error
2222 category:
2223
2224 • INI
2225
2226 Initialization error or option conflict.
2227
2228 • ECR
2229
2230 Carriage-Return related parse error.
2231
2232 • EOF
2233
2234 End-Of-File related parse error.
2235
2236 • EIQ
2237
2238 Parse error inside quotation.
2239
2240 • EIF
2241
2242 Parse error inside field.
2243
2244 • ECB
2245
2246 Combine error.
2247
2248 • EHR
2249
2250 HashRef parse related error.
2251
2252 And below should be the complete list of error codes that can be
2253 returned:
2254
2255 • 1001 "INI - sep_char is equal to quote_char or escape_char"
2256
2257 The separation character cannot be equal to the quotation
2258 character or to the escape character, as this would invalidate all
2259 parsing rules.
2260
2261 • 1002 "INI - allow_whitespace with escape_char or quote_char SP or
2262 TAB"
2263
2264 Using the "allow_whitespace" attribute when either "quote_char" or
2265 "escape_char" is equal to "SPACE" or "TAB" is too ambiguous to
2266 allow.
2267
2268 • 1003 "INI - \r or \n in main attr not allowed"
2269
2270 Using default "eol" characters in either "sep_char", "quote_char",
2271 or "escape_char" is not allowed.
2272
2273 • 1004 "INI - callbacks should be undef or a hashref"
2274
2275 The "callbacks" attribute only accepts "undef" or a hash reference.
2277
2278 • 1005 "INI - EOL too long"
2279
2280 The value passed for EOL is exceeding its maximum length (16).
2281
2282 • 1006 "INI - SEP too long"
2283
2284 The value passed for SEP is exceeding its maximum length (16).
2285
2286 • 1007 "INI - QUOTE too long"
2287
2288 The value passed for QUOTE is exceeding its maximum length (16).
2289
2290 • 1008 "INI - SEP undefined"
2291
2292 The value passed for SEP should be defined and not empty.
2293
2294 • 1010 "INI - the header is empty"
2295
2296 The header line parsed in the "header" method is empty.
2297
2298 • 1011 "INI - the header contains more than one valid separator"
2299
2300 The header line parsed in the "header" method contains more than one
2301 (unique) separator character out of the allowed set of separators.
2302
2303 • 1012 "INI - the header contains an empty field"
2304
2305 The header line parsed in the "header" method contains an empty field.
2306
2307 • 1013 "INI - the header contains non-unique fields"
2308
2309 The header line parsed in the "header" method contains at least two
2310 identical fields.
2311
2312 • 1014 "INI - header called on undefined stream"
2313
2314 The header line cannot be parsed from an undefined source.
2315
2316 • 1500 "PRM - Invalid/unsupported argument(s)"
2317
2318 Function or method called with invalid argument(s) or parameter(s).
2319
2320 • 1501 "PRM - The key attribute is passed as an unsupported type"
2321
2322 The "key" attribute is of an unsupported type.
2323
2324 • 1502 "PRM - The value attribute is passed without the key attribute"
2325
2326 The "value" attribute is only allowed when a valid key is given.
2327
2328 • 1503 "PRM - The value attribute is passed as an unsupported type"
2329
2330 The "value" attribute is of an unsupported type.
2331
2332 • 2010 "ECR - QUO char inside quotes followed by CR not part of EOL"
2333
2334 When "eol" has been set to anything but the default, like
2335 "\r\t\n", and the "\r" is following the second (closing)
2336 "quote_char", where the characters following the "\r" do not make up
2337 the "eol" sequence, this is an error.
2338
2339 • 2011 "ECR - Characters after end of quoted field"
2340
2341 Sequences like "1,foo,"bar"baz,22,1" are not allowed. "bar" is a
2342 quoted field and after the closing double-quote, there should be
2343 either a new-line sequence or a separation character.
2344
2345 • 2012 "EOF - End of data in parsing input stream"
2346
2347 Self-explanatory. End-of-file was reached while parsing a stream. This
2348 can happen only when reading from streams with "getline", as "parse"
2349 works on strings that are not required to have a trailing "eol".
2351
2352 • 2013 "INI - Specification error for fragments RFC7111"
2353
2354 Invalid specification for URI "fragment" specification.
2355
2356 • 2014 "ENF - Inconsistent number of fields"
2357
2358 Inconsistent number of fields under strict parsing.
2359
2360 • 2021 "EIQ - NL char inside quotes, binary off"
2361
2362 Sequences like "1,"foo\nbar",22,1" are allowed only when the binary
2363 option has been selected with the constructor.
2364
2365 • 2022 "EIQ - CR char inside quotes, binary off"
2366
2367 Sequences like "1,"foo\rbar",22,1" are allowed only when the binary
2368 option has been selected with the constructor.
2369
2370 • 2023 "EIQ - QUO character not allowed"
2371
2372 Sequences like ""foo "bar" baz",qu" and "2023,",2008-04-05,"Foo,
2373 Bar",\n" will cause this error.
2374
2375 • 2024 "EIQ - EOF cannot be escaped, not even inside quotes"
2376
2377 The escape character is not allowed as last character in an input
2378 stream.
2379
2380 • 2025 "EIQ - Loose unescaped escape"
2381
2382 An escape character should escape only characters that need escaping.
2383
2384 Allowing the escape for other characters is possible with the
2385 attribute "allow_loose_escapes".
2386
2387 • 2026 "EIQ - Binary character inside quoted field, binary off"
2388
2389 Binary characters are not allowed by default. Exceptions are
2390 fields that contain valid UTF-8, which will automatically be
2391 upgraded. Set "binary" to 1 to accept binary data.
2393
2394 • 2027 "EIQ - Quoted field not terminated"
2395
2396 When parsing a field that started with a quotation character, the
2397 field is expected to be closed with a quotation character. When the
2398 parsed line is exhausted before the quote is found, that field is not
2399 terminated.
2400
2401 • 2030 "EIF - NL char inside unquoted verbatim, binary off"
2402
2403 • 2031 "EIF - CR char is first char of field, not part of EOL"
2404
2405 • 2032 "EIF - CR char inside unquoted, not part of EOL"
2406
2407 • 2034 "EIF - Loose unescaped quote"
2408
2409 • 2035 "EIF - Escaped EOF in unquoted field"
2410
2411 • 2036 "EIF - ESC error"
2412
2413 • 2037 "EIF - Binary character in unquoted field, binary off"
2414
2415 • 2110 "ECB - Binary character in Combine, binary off"
2416
2417 • 2200 "EIO - print to IO failed. See errno"
2418
2419 • 3001 "EHR - Unsupported syntax for column_names ()"
2420
2421 • 3002 "EHR - getline_hr () called before column_names ()"
2422
2423 • 3003 "EHR - bind_columns () and column_names () fields count
2424 mismatch"
2425
2426 • 3004 "EHR - bind_columns () only accepts refs to scalars"
2427
2428 • 3006 "EHR - bind_columns () did not pass enough refs for parsed
2429 fields"
2430
2431 • 3007 "EHR - bind_columns needs refs to writable scalars"
2432
2433 • 3008 "EHR - unexpected error in bound fields"
2434
2435 • 3009 "EHR - print_hr () called before column_names ()"
2436
2437 • 3010 "EHR - print_hr () called with invalid arguments"
2438
2440 Text::CSV_XS, Text::CSV
2441
2442 Older versions took many regexp from
2443 <http://www.din.or.jp/~ohzaki/perl.htm>
2444
2446 Kenichi Ishigaki, <ishigaki[at]cpan.org>
2447 Makamaka Hannyaharamitu, <makamaka[at]cpan.org>
2448
2449 Text::CSV_XS was written by <joe[at]ispsoft.de> and maintained by
2450 <h.m.brand[at]xs4all.nl>.
2451
2452 Text::CSV was written by <alan[at]mfgrtl.com>.
2453
2455 Copyright 2017- by Kenichi Ishigaki, <ishigaki[at]cpan.org>
2456 Copyright 2005-2015 by Makamaka Hannyaharamitu, <makamaka[at]cpan.org>
2457
2458 Most of the code and doc is directly taken from the pure perl part of
2459 Text::CSV_XS.
2460
2461 Copyright (C) 2007-2016 H.Merijn Brand. All rights reserved.
2462 Copyright (C) 1998-2001 Jochen Wiedmann. All rights reserved.
2463 Copyright (C) 1997 Alan Citterman. All rights reserved.
2464
2465 This library is free software; you can redistribute it and/or modify it
2466 under the same terms as Perl itself.
2467
2468
2469
2470perl v5.32.1 2021-01-27 Text::CSV_PP(3)