Text::CSV_PP(3)       User Contributed Perl Documentation      Text::CSV_PP(3)


NAME
Text::CSV_PP - Text::CSV_XS compatible pure-Perl module

SYNOPSIS
This section is taken from Text::CSV_XS.
10
11 # Functional interface
12 use Text::CSV_PP qw( csv );
13
14 # Read whole file in memory
15 my $aoa = csv (in => "data.csv"); # as array of array
16 my $aoh = csv (in => "data.csv",
17 headers => "auto"); # as array of hash
18
19 # Write array of arrays as csv file
20 csv (in => $aoa, out => "file.csv", sep_char=> ";");
21
22 # Only show lines where "code" is odd
23 csv (in => "data.csv", filter => { code => sub { $_ % 2 }});
24
25 # Object interface
26 use Text::CSV_PP;
27
28 my @rows;
29 # Read/parse CSV
30 my $csv = Text::CSV_PP->new ({ binary => 1, auto_diag => 1 });
31 open my $fh, "<:encoding(utf8)", "test.csv" or die "test.csv: $!";
32 while (my $row = $csv->getline ($fh)) {
33 $row->[2] =~ m/pattern/ or next; # 3rd field should match
34 push @rows, $row;
35 }
36 close $fh;
37
38 # and write as CSV
39 open $fh, ">:encoding(utf8)", "new.csv" or die "new.csv: $!";
40 $csv->say ($fh, $_) for @rows;
41 close $fh or die "new.csv: $!";

DESCRIPTION
Text::CSV_PP is a pure-perl module that provides facilities for the
composition and decomposition of comma-separated values. It is
(almost) compatible with the much faster Text::CSV_XS, and is mainly
used as its fallback module when you use the Text::CSV module without
having Text::CSV_XS installed. If you don't have any reason to use
this module directly, use Text::CSV for a speed boost and portability
(or maybe Text::CSV_XS when you write a one-off script and don't need
to care about portability).
52
53 The following caveats are taken from the doc of Text::CSV_XS.
54
55 Embedded newlines
56 Important Note: The default behavior is to accept only ASCII
57 characters in the range from 0x20 (space) to 0x7E (tilde). This means
58 that the fields can not contain newlines. If your data contains
59 newlines embedded in fields, or characters above 0x7E (tilde), or
60 binary data, you must set "binary => 1" in the call to "new". To cover
61 the widest range of parsing options, you will always want to set
62 binary.
63
But you still have the problem that you have to pass a correct line to
the "parse" method, which is more complicated than it looks from the
usual style of usage:
67
68 my $csv = Text::CSV_PP->new ({ binary => 1, eol => $/ });
69 while (<>) { # WRONG!
70 $csv->parse ($_);
71 my @fields = $csv->fields ();
72 }
73
74 this will break, as the "while" might read broken lines: it does not
75 care about the quoting. If you need to support embedded newlines, the
76 way to go is to not pass "eol" in the parser (it accepts "\n", "\r",
77 and "\r\n" by default) and then
78
79 my $csv = Text::CSV_PP->new ({ binary => 1 });
80 open my $fh, "<", $file or die "$file: $!";
81 while (my $row = $csv->getline ($fh)) {
82 my @fields = @$row;
83 }
84
85 The old(er) way of using global file handles is still supported
86
87 while (my $row = $csv->getline (*ARGV)) { ... }
88
89 Unicode
90 Unicode is only tested to work with perl-5.8.2 and up.
91
92 See also "BOM".
93
94 The simplest way to ensure the correct encoding is used for in- and
95 output is by either setting layers on the filehandles, or setting the
96 "encoding" argument for "csv".
97
98 open my $fh, "<:encoding(UTF-8)", "in.csv" or die "in.csv: $!";
99 or
100 my $aoa = csv (in => "in.csv", encoding => "UTF-8");
101
102 open my $fh, ">:encoding(UTF-8)", "out.csv" or die "out.csv: $!";
103 or
104 csv (in => $aoa, out => "out.csv", encoding => "UTF-8");
105
On parsing (both for "getline" and "parse"), if the source is marked
as being UTF8, then all fields that are marked binary will also be
marked UTF8.
109
On combining ("print" and "combine"): if any of the combining fields
was marked UTF8, the resulting string will be marked as UTF8. Note
however that if any fields before the first field marked UTF8
contained 8-bit characters that were not upgraded to UTF8, those will
remain "bytes" in the resulting string too, possibly causing
unexpected errors. If you pass data of different encodings, or you
don't know whether the encodings differ, force the data to be upgraded
before you pass it on:
118
119 $csv->print ($fh, [ map { utf8::upgrade (my $x = $_); $x } @data ]);
120
121 For complete control over encoding, please use Text::CSV::Encoded:
122
123 use Text::CSV::Encoded;
124 my $csv = Text::CSV::Encoded->new ({
125 encoding_in => "iso-8859-1", # the encoding comes into Perl
126 encoding_out => "cp1252", # the encoding comes out of Perl
127 });
128
129 $csv = Text::CSV::Encoded->new ({ encoding => "utf8" });
130 # combine () and print () accept *literally* utf8 encoded data
131 # parse () and getline () return *literally* utf8 encoded data
132
133 $csv = Text::CSV::Encoded->new ({ encoding => undef }); # default
134 # combine () and print () accept UTF8 marked data
135 # parse () and getline () return UTF8 marked data
136
137 BOM
138 BOM (or Byte Order Mark) handling is available only inside the
139 "header" method. This method supports the following encodings:
140 "utf-8", "utf-1", "utf-32be", "utf-32le", "utf-16be", "utf-16le",
141 "utf-ebcdic", "scsu", "bocu-1", and "gb-18030". See Wikipedia
142 <https://en.wikipedia.org/wiki/Byte_order_mark>.
143
144 If a file has a BOM, the easiest way to deal with that is
145
146 my $aoh = csv (in => $file, detect_bom => 1);
147
148 All records will be encoded based on the detected BOM.
149
This implies a call to the "header" method, which defaults to also
setting "column_names". So this is not the same as
152
153 my $aoh = csv (in => $file, headers => "auto");
154
which only reads the first record to set "column_names" but ignores
any possibly present BOM.

METHODS
This section is also taken from Text::CSV_XS.
160
161 version
162 (Class method) Returns the current module version.
163
164 new
165 (Class method) Returns a new instance of class Text::CSV_PP. The
166 attributes are described by the (optional) hash ref "\%attr".
167
168 my $csv = Text::CSV_PP->new ({ attributes ... });
169
170 The following attributes are available:
171
172 eol
173
174 my $csv = Text::CSV_PP->new ({ eol => $/ });
175 $csv->eol (undef);
176 my $eol = $csv->eol;
177
178 The end-of-line string to add to rows for "print" or the record
179 separator for "getline".
180
181 When not passed in a parser instance, the default behavior is to
182 accept "\n", "\r", and "\r\n", so it is probably safer to not specify
183 "eol" at all. Passing "undef" or the empty string behave the same.
184
185 When not passed in a generating instance, records are not terminated
186 at all, so it is probably wise to pass something you expect. A safe
187 choice for "eol" on output is either $/ or "\r\n".
188
189 Common values for "eol" are "\012" ("\n" or Line Feed), "\015\012"
190 ("\r\n" or Carriage Return, Line Feed), and "\015" ("\r" or Carriage
191 Return). The "eol" attribute cannot exceed 7 (ASCII) characters.
192
If both $/ and "eol" equal "\015", lines that end on only a Carriage
Return without Line Feed will be "parse"d correctly.
195
196 sep_char
197
198 my $csv = Text::CSV_PP->new ({ sep_char => ";" });
199 $csv->sep_char (";");
200 my $c = $csv->sep_char;
201
The char used to separate fields, by default a comma (","). Limited
to a single-byte character, usually in the range from 0x20 (space) to
0x7E (tilde). When longer sequences are required, use "sep".
205
206 The separation character can not be equal to the quote character or to
207 the escape character.
208
209 sep
210
211 my $csv = Text::CSV_PP->new ({ sep => "\N{FULLWIDTH COMMA}" });
212 $csv->sep (";");
213 my $sep = $csv->sep;
214
215 The chars used to separate fields, by default undefined. Limited to 8
216 bytes.
217
218 When set, overrules "sep_char". If its length is one byte it acts as
219 an alias to "sep_char".
220
221 quote_char
222
223 my $csv = Text::CSV_PP->new ({ quote_char => "'" });
224 $csv->quote_char (undef);
225 my $c = $csv->quote_char;
226
227 The character to quote fields containing blanks or binary data, by
228 default the double quote character ("""). A value of undef suppresses
229 quote chars (for simple cases only). Limited to a single-byte
230 character, usually in the range from 0x20 (space) to 0x7E (tilde).
231 When longer sequences are required, use "quote".
232
233 "quote_char" can not be equal to "sep_char".
234
235 quote
236
237 my $csv = Text::CSV_PP->new ({ quote => "\N{FULLWIDTH QUOTATION MARK}" });
238 $csv->quote ("'");
239 my $quote = $csv->quote;
240
241 The chars used to quote fields, by default undefined. Limited to 8
242 bytes.
243
244 When set, overrules "quote_char". If its length is one byte it acts as
245 an alias to "quote_char".
246
247 This method does not support "undef". Use "quote_char" to disable
248 quotation.
249
250 escape_char
251
252 my $csv = Text::CSV_PP->new ({ escape_char => "\\" });
253 $csv->escape_char (":");
254 my $c = $csv->escape_char;
255
256 The character to escape certain characters inside quoted fields.
257 This is limited to a single-byte character, usually in the range
258 from 0x20 (space) to 0x7E (tilde).
259
260 The "escape_char" defaults to being the double-quote mark ("""). In
261 other words the same as the default "quote_char". This means that
262 doubling the quote mark in a field escapes it:
263
264 "foo","bar","Escape ""quote mark"" with two ""quote marks""","baz"
265
266 If you change the "quote_char" without changing the
267 "escape_char", the "escape_char" will still be the double-quote
268 ("""). If instead you want to escape the "quote_char" by doubling it
269 you will need to also change the "escape_char" to be the same as what
270 you have changed the "quote_char" to.
271
272 Setting "escape_char" to "undef" or "" will completely disable escapes
273 and is greatly discouraged. This will also disable "escape_null".
274
275 The escape character can not be equal to the separation character.
276
277 binary
278
279 my $csv = Text::CSV_PP->new ({ binary => 1 });
280 $csv->binary (0);
281 my $f = $csv->binary;
282
283 If this attribute is 1, you may use binary characters in quoted
284 fields, including line feeds, carriage returns and "NULL" bytes. (The
285 latter could be escaped as ""0".) By default this feature is off.
286
287 If a string is marked UTF8, "binary" will be turned on automatically
288 when binary characters other than "CR" and "NL" are encountered. Note
289 that a simple string like "\x{00a0}" might still be binary, but not
290 marked UTF8, so setting "{ binary => 1 }" is still a wise option.
291
292 strict
293
294 my $csv = Text::CSV_PP->new ({ strict => 1 });
295 $csv->strict (0);
296 my $f = $csv->strict;
297
298 If this attribute is set to 1, any row that parses to a different
299 number of fields than the previous row will cause the parser to throw
300 error 2014.
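
A rough sketch of how this plays out:

    my $csv = Text::CSV_PP->new ({ strict => 1 });
    $csv->parse ("a,b,c") or die;    # 3 fields - ok
    $csv->parse ("d,e")              # 2 fields - fails
        or warn "" . $csv->error_diag;   # error 2014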
301
302 skip_empty_rows
303
304 my $csv = Text::CSV_PP->new ({ skip_empty_rows => 1 });
305 $csv->skip_empty_rows (0);
306 my $f = $csv->skip_empty_rows;
307
308 If this attribute is set to 1, any row that has an "eol" immediately
309 following the start of line will be skipped. Default behavior is to
310 return one single empty field.
311
312 This attribute is only used in parsing.
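
A rough sketch, assuming the functional interface is imported with
"use Text::CSV_PP qw( csv );":

    my $aoa = csv (in => \"a,b\nc,d\n\ne,f\n", skip_empty_rows => 1);
    # [[ "a", "b" ], [ "c", "d" ], [ "e", "f" ]] - the empty row is gone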
313
314 formula_handling
315
316 Alias for "formula"
317
318 formula
319
320 my $csv = Text::CSV_PP->new ({ formula => "none" });
321 $csv->formula ("none");
322 my $f = $csv->formula;
323
324 This defines the behavior of fields containing formulas. As formulas
325 are considered dangerous in spreadsheets, this attribute can define an
326 optional action to be taken if a field starts with an equal sign ("=").
327
328 For purpose of code-readability, this can also be written as
329
330 my $csv = Text::CSV_PP->new ({ formula_handling => "none" });
331 $csv->formula_handling ("none");
332 my $f = $csv->formula_handling;
333
334 Possible values for this attribute are
335
336 none
337 Take no specific action. This is the default.
338
339 $csv->formula ("none");
340
341 die
342 Cause the process to "die" whenever a leading "=" is encountered.
343
344 $csv->formula ("die");
345
346 croak
347 Cause the process to "croak" whenever a leading "=" is encountered.
348 (See Carp)
349
350 $csv->formula ("croak");
351
352 diag
353 Report position and content of the field whenever a leading "=" is
354 found. The value of the field is unchanged.
355
356 $csv->formula ("diag");
357
358 empty
359 Replace the content of fields that start with a "=" with the empty
360 string.
361
362 $csv->formula ("empty");
363 $csv->formula ("");
364
365 undef
366 Replace the content of fields that start with a "=" with "undef".
367
368 $csv->formula ("undef");
369 $csv->formula (undef);
370
371 a callback
372 Modify the content of fields that start with a "=" with the return-
373 value of the callback. The original content of the field is
374 available inside the callback as $_;
375
376 # Replace all formula's with 42
377 $csv->formula (sub { 42; });
378
379 # same as $csv->formula ("empty") but slower
380 $csv->formula (sub { "" });
381
382 # Allow =4+12
383 $csv->formula (sub { s/^=(\d+\+\d+)$/$1/eer });
384
385 # Allow more complex calculations
386 $csv->formula (sub { eval { s{^=([-+*/0-9()]+)$}{$1}ee }; $_ });
387
All other values will give a warning and then fall back to "diag".
389
390 decode_utf8
391
392 my $csv = Text::CSV_PP->new ({ decode_utf8 => 1 });
393 $csv->decode_utf8 (0);
394 my $f = $csv->decode_utf8;
395
This attribute defaults to TRUE.
397
While parsing, fields that are valid UTF-8 are automatically set to
be UTF-8, so that
400
401 $csv->parse ("\xC4\xA8\n");
402
403 results in
404
405 PV("\304\250"\0) [UTF8 "\x{128}"]
406
Sometimes this might not be the desired action. To prevent those
upgrades, set this attribute to false, and the result will be
409
410 PV("\304\250"\0)
411
412 auto_diag
413
414 my $csv = Text::CSV_PP->new ({ auto_diag => 1 });
415 $csv->auto_diag (2);
416 my $l = $csv->auto_diag;
417
Setting this attribute to a number between 1 and 9 causes "error_diag"
to be automatically called in void context upon errors.
420
421 In case of error "2012 - EOF", this call will be void.
422
423 If "auto_diag" is set to a numeric value greater than 1, it will "die"
424 on errors instead of "warn". If set to anything unrecognized, it will
425 be silently ignored.
426
Future extensions to this feature will include more reliable auto-
detection of "autodie" being active in the scope in which the error
occurred, which will increment the value of "auto_diag" by 1 the
moment the error is detected.
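
A rough sketch of the difference between level 1 and the higher levels:

    my $warn = Text::CSV_PP->new ({ auto_diag => 1 });
    $warn->parse (q{1,"two,3});   # fails; the diagnostics are warn()ed

    my $die = Text::CSV_PP->new ({ auto_diag => 2 });
    $die->parse (q{1,"two,3});    # fails; the diagnostics are die()d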
431
432 diag_verbose
433
434 my $csv = Text::CSV_PP->new ({ diag_verbose => 1 });
435 $csv->diag_verbose (2);
436 my $l = $csv->diag_verbose;
437
438 Set the verbosity of the output triggered by "auto_diag". Currently
439 only adds the current input-record-number (if known) to the
440 diagnostic output with an indication of the position of the error.
441
442 blank_is_undef
443
444 my $csv = Text::CSV_PP->new ({ blank_is_undef => 1 });
445 $csv->blank_is_undef (0);
446 my $f = $csv->blank_is_undef;
447
448 Under normal circumstances, "CSV" data makes no distinction between
449 quoted- and unquoted empty fields. These both end up in an empty
450 string field once read, thus
451
452 1,"",," ",2
453
454 is read as
455
456 ("1", "", "", " ", "2")
457
458 When writing "CSV" files with either "always_quote" or "quote_empty"
459 set, the unquoted empty field is the result of an undefined value.
460 To enable this distinction when reading "CSV" data, the
461 "blank_is_undef" attribute will cause unquoted empty fields to be set
462 to "undef", causing the above to be parsed as
463
464 ("1", "", undef, " ", "2")
465
466 Note that this is specifically important when loading "CSV" fields
467 into a database that allows "NULL" values, as the perl equivalent for
468 "NULL" is "undef" in DBI land.
469
470 empty_is_undef
471
472 my $csv = Text::CSV_PP->new ({ empty_is_undef => 1 });
473 $csv->empty_is_undef (0);
474 my $f = $csv->empty_is_undef;
475
476 Going one step further than "blank_is_undef", this attribute
477 converts all empty fields to "undef", so
478
479 1,"",," ",2
480
481 is read as
482
483 (1, undef, undef, " ", 2)
484
485 Note that this affects only fields that are originally empty, not
486 fields that are empty after stripping allowed whitespace. YMMV.
487
488 allow_whitespace
489
490 my $csv = Text::CSV_PP->new ({ allow_whitespace => 1 });
491 $csv->allow_whitespace (0);
492 my $f = $csv->allow_whitespace;
493
494 When this option is set to true, the whitespace ("TAB"'s and
495 "SPACE"'s) surrounding the separation character is removed when
496 parsing. If either "TAB" or "SPACE" is one of the three characters
497 "sep_char", "quote_char", or "escape_char" it will not be considered
498 whitespace.
499
500 Now lines like:
501
502 1 , "foo" , bar , 3 , zapp
503
504 are parsed as valid "CSV", even though it violates the "CSV" specs.
505
Note that all whitespace is stripped from both the start and the end
of each field. That makes this option more than just a feature for
parsing bad "CSV" lines, as
509
510 1, 2.0, 3, ape , monkey
511
512 will now be parsed as
513
514 ("1", "2.0", "3", "ape", "monkey")
515
516 even if the original line was perfectly acceptable "CSV".
517
518 allow_loose_quotes
519
520 my $csv = Text::CSV_PP->new ({ allow_loose_quotes => 1 });
521 $csv->allow_loose_quotes (0);
522 my $f = $csv->allow_loose_quotes;
523
524 By default, parsing unquoted fields containing "quote_char" characters
525 like
526
527 1,foo "bar" baz,42
528
529 would result in parse error 2034. Though it is still bad practice to
530 allow this format, we cannot help the fact that some vendors
531 make their applications spit out lines styled this way.
532
533 If there is really bad "CSV" data, like
534
535 1,"foo "bar" baz",42
536
537 or
538
539 1,""foo bar baz"",42
540
541 there is a way to get this data-line parsed and leave the quotes inside
542 the quoted field as-is. This can be achieved by setting
543 "allow_loose_quotes" AND making sure that the "escape_char" is not
544 equal to "quote_char".
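
A rough sketch of that combination:

    my $csv = Text::CSV_PP->new ({
        allow_loose_quotes => 1,
        escape_char        => "\\",   # anything that differs from quote_char
        });
    $csv->parse (q{1,"foo "bar" baz",42})
        and print join "|", $csv->fields;   # 1|foo "bar" baz|42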
545
546 allow_loose_escapes
547
548 my $csv = Text::CSV_PP->new ({ allow_loose_escapes => 1 });
549 $csv->allow_loose_escapes (0);
550 my $f = $csv->allow_loose_escapes;
551
552 Parsing fields that have "escape_char" characters that escape
553 characters that do not need to be escaped, like:
554
555 my $csv = Text::CSV_PP->new ({ escape_char => "\\" });
556 $csv->parse (qq{1,"my bar\'s",baz,42});
557
would result in parse error 2025. Though it is bad practice to allow
this format, this attribute enables you to treat all escape character
sequences equally.
561
562 allow_unquoted_escape
563
564 my $csv = Text::CSV_PP->new ({ allow_unquoted_escape => 1 });
565 $csv->allow_unquoted_escape (0);
566 my $f = $csv->allow_unquoted_escape;
567
A backward compatibility issue where "escape_char" differs from
"quote_char" prevents "escape_char" from being in the first position
of a field. If "quote_char" is equal to the default """ and
"escape_char" is set to "\", this would be illegal:
572
573 1,\0,2
574
575 Setting this attribute to 1 might help to overcome issues with
576 backward compatibility and allow this style.
577
578 always_quote
579
580 my $csv = Text::CSV_PP->new ({ always_quote => 1 });
581 $csv->always_quote (0);
582 my $f = $csv->always_quote;
583
By default the generated fields are quoted only if they need to be,
for example, if they contain the separator character. If you set this
attribute to 1 then all defined fields will be quoted. ("undef" fields
are not quoted, see "blank_is_undef".) This quite often makes it
easier to handle exported data in external applications.
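
For example, as a rough sketch:

    $csv->always_quote (1);
    $csv->combine (1, "foo", undef) and print $csv->string;
    # "1","foo",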
589
590 quote_space
591
592 my $csv = Text::CSV_PP->new ({ quote_space => 1 });
593 $csv->quote_space (0);
594 my $f = $csv->quote_space;
595
By default, a space in a field would trigger quotation. As no rule
requires this to be forced in "CSV", nor any rule forbids it, the
default is true for safety. You can exclude the space from this
trigger by setting this attribute to 0.
600
601 quote_empty
602
603 my $csv = Text::CSV_PP->new ({ quote_empty => 1 });
604 $csv->quote_empty (0);
605 my $f = $csv->quote_empty;
606
607 By default the generated fields are quoted only if they need to be.
608 An empty (defined) field does not need quotation. If you set this
609 attribute to 1 then empty defined fields will be quoted. ("undef"
610 fields are not quoted, see "blank_is_undef"). See also "always_quote".
611
612 quote_binary
613
614 my $csv = Text::CSV_PP->new ({ quote_binary => 1 });
615 $csv->quote_binary (0);
616 my $f = $csv->quote_binary;
617
618 By default, all "unsafe" bytes inside a string cause the combined
619 field to be quoted. By setting this attribute to 0, you can disable
620 that trigger for bytes ">= 0x7F".
621
622 escape_null
623
624 my $csv = Text::CSV_PP->new ({ escape_null => 1 });
625 $csv->escape_null (0);
626 my $f = $csv->escape_null;
627
628 By default, a "NULL" byte in a field would be escaped. This option
629 enables you to treat the "NULL" byte as a simple binary character in
630 binary mode (the "{ binary => 1 }" is set). The default is true. You
631 can prevent "NULL" escapes by setting this attribute to 0.
632
633 When the "escape_char" attribute is set to undefined, this attribute
634 will be set to false.
635
636 The default setting will encode "=\x00=" as
637
638 "="0="
639
640 With "escape_null" set, this will result in
641
642 "=\x00="
643
644 The default when using the "csv" function is "false".
645
646 For backward compatibility reasons, the deprecated old name
647 "quote_null" is still recognized.
648
649 keep_meta_info
650
651 my $csv = Text::CSV_PP->new ({ keep_meta_info => 1 });
652 $csv->keep_meta_info (0);
653 my $f = $csv->keep_meta_info;
654
655 By default, the parsing of input records is as simple and fast as
656 possible. However, some parsing information - like quotation of the
657 original field - is lost in that process. Setting this flag to true
658 enables retrieving that information after parsing with the methods
659 "meta_info", "is_quoted", and "is_binary" described below. Default is
660 false for performance.
661
If you set this attribute to a value greater than 9, then you can
control the output quotation style like it was used in the input of
the last parsed record (unless quotation was added because of other
reasons).
666
667 my $csv = Text::CSV_PP->new ({
668 binary => 1,
669 keep_meta_info => 1,
670 quote_space => 0,
671 });
672
673 my $row = $csv->parse (q{1,,"", ," ",f,"g","h""h",help,"help"});
674
675 $csv->print (*STDOUT, \@row);
676 # 1,,, , ,f,g,"h""h",help,help
677 $csv->keep_meta_info (11);
678 $csv->print (*STDOUT, \@row);
679 # 1,,"", ," ",f,"g","h""h",help,"help"
680
681 undef_str
682
683 my $csv = Text::CSV_PP->new ({ undef_str => "\\N" });
684 $csv->undef_str (undef);
685 my $s = $csv->undef_str;
686
687 This attribute optionally defines the output of undefined fields. The
688 value passed is not changed at all, so if it needs quotation, the
689 quotation needs to be included in the value of the attribute. Use with
690 caution, as passing a value like ",",,,,""" will for sure mess up
691 your output. The default for this attribute is "undef", meaning no
692 special treatment.
693
694 This attribute is useful when exporting CSV data to be imported in
695 custom loaders, like for MySQL, that recognize special sequences for
696 "NULL" data.
697
698 This attribute has no meaning when parsing CSV data.
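
A rough sketch of the MySQL-style "\N" convention, using the "csv"
function described below:

    csv (in => [[ 1, undef, "foo" ]], out => \my $out, undef_str => "\\N");
    # $out now holds "1,\N,foo\r\n"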
699
700 comment_str
701
702 my $csv = Text::CSV_PP->new ({ comment_str => "#" });
703 $csv->comment_str (undef);
704 my $s = $csv->comment_str;
705
This attribute optionally defines a string to be recognized as a
comment. If this attribute is defined, all lines starting with this
sequence will not be parsed as CSV but skipped as a comment.
709
710 This attribute has no meaning when generating CSV.
711
712 Comment strings that start with any of the special characters/sequences
713 are not supported (so it cannot start with any of "sep_char",
714 "quote_char", "escape_char", "sep", "quote", or "eol").
715
716 For convenience, "comment" is an alias for "comment_str".
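
A rough sketch, again using the "csv" function described below:

    my $aoa = csv (in => \"# produced by some tool\na,b\nc,d\n",
                   comment_str => "#");
    # [[ "a", "b" ], [ "c", "d" ]]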
717
718 verbatim
719
720 my $csv = Text::CSV_PP->new ({ verbatim => 1 });
721 $csv->verbatim (0);
722 my $f = $csv->verbatim;
723
724 This is a quite controversial attribute to set, but makes some hard
725 things possible.
726
727 The rationale behind this attribute is to tell the parser that the
728 normally special characters newline ("NL") and Carriage Return ("CR")
729 will not be special when this flag is set, and be dealt with as being
730 ordinary binary characters. This will ease working with data with
731 embedded newlines.
732
733 When "verbatim" is used with "getline", "getline" auto-"chomp"'s
734 every line.
735
736 Imagine a file format like
737
738 M^^Hans^Janssen^Klas 2\n2A^Ja^11-06-2007#\r\n
739
where the line ending is a very specific "#\r\n", and the sep_char is
a "^" (caret). None of the fields is quoted, but embedded binary
data is likely to be present. With the specific line ending, this
should not be too hard to detect.
744
By default, Text::CSV_PP's "parse" function is instructed to only
accept "\n" and "\r" as legal line endings, and so has to deal with
the embedded newline as a real "end-of-line", so it can scan the next
line if binary is true and the newline is inside a quoted field. With
this option, we tell "parse" to treat the line as if "\n" is nothing
more than a binary character.
751
752 For "parse" this means that the parser has no more idea about line
753 ending and "getline" "chomp"s line endings on reading.
754
755 types
756
757 A set of column types; the attribute is immediately passed to the
758 "types" method.
759
760 callbacks
761
762 See the "Callbacks" section below.
763
764 accessors
765
766 To sum it up,
767
768 $csv = Text::CSV_PP->new ();
769
770 is equivalent to
771
772 $csv = Text::CSV_PP->new ({
773 eol => undef, # \r, \n, or \r\n
774 sep_char => ',',
775 sep => undef,
776 quote_char => '"',
777 quote => undef,
778 escape_char => '"',
779 binary => 0,
780 decode_utf8 => 1,
781 auto_diag => 0,
782 diag_verbose => 0,
783 blank_is_undef => 0,
784 empty_is_undef => 0,
785 allow_whitespace => 0,
786 allow_loose_quotes => 0,
787 allow_loose_escapes => 0,
788 allow_unquoted_escape => 0,
789 always_quote => 0,
790 quote_empty => 0,
791 quote_space => 1,
792 escape_null => 1,
793 quote_binary => 1,
794 keep_meta_info => 0,
795 strict => 0,
796 skip_empty_rows => 0,
797 formula => 0,
798 verbatim => 0,
799 undef_str => undef,
800 comment_str => undef,
801 types => undef,
802 callbacks => undef,
803 });
804
805 For all of the above mentioned flags, an accessor method is available
806 where you can inquire the current value, or change the value
807
808 my $quote = $csv->quote_char;
809 $csv->binary (1);
810
811 It is not wise to change these settings halfway through writing "CSV"
812 data to a stream. If however you want to create a new stream using the
813 available "CSV" object, there is no harm in changing them.
814
815 If the "new" constructor call fails, it returns "undef", and makes
816 the fail reason available through the "error_diag" method.
817
818 $csv = Text::CSV_PP->new ({ ecs_char => 1 }) or
819 die "".Text::CSV_PP->error_diag ();
820
821 "error_diag" will return a string like
822
823 "INI - Unknown attribute 'ecs_char'"
824
825 known_attributes
826 @attr = Text::CSV_PP->known_attributes;
827 @attr = Text::CSV_PP::known_attributes;
828 @attr = $csv->known_attributes;
829
830 This method will return an ordered list of all the supported
831 attributes as described above. This can be useful for knowing what
832 attributes are valid in classes that use or extend Text::CSV_PP.
833
834 print
835 $status = $csv->print ($fh, $colref);
836
837 Similar to "combine" + "string" + "print", but much more efficient.
838 It expects an array ref as input (not an array!) and the resulting
839 string is not really created, but immediately written to the $fh
840 object, typically an IO handle or any other object that offers a
841 "print" method.
842
843 For performance reasons "print" does not create a result string, so
844 all "string", "status", "fields", and "error_input" methods will return
845 undefined information after executing this method.
846
847 If $colref is "undef" (explicit, not through a variable argument) and
848 "bind_columns" was used to specify fields to be printed, it is
849 possible to make performance improvements, as otherwise data would have
850 to be copied as arguments to the method call:
851
852 $csv->bind_columns (\($foo, $bar));
853 $status = $csv->print ($fh, undef);
854
855 A short benchmark
856
857 my @data = ("aa" .. "zz");
858 $csv->bind_columns (\(@data));
859
860 $csv->print ($fh, [ @data ]); # 11800 recs/sec
861 $csv->print ($fh, \@data ); # 57600 recs/sec
862 $csv->print ($fh, undef ); # 48500 recs/sec
863
864 say
865 $status = $csv->say ($fh, $colref);
866
867 Like "print", but "eol" defaults to "$\".
868
869 print_hr
870 $csv->print_hr ($fh, $ref);
871
872 Provides an easy way to print a $ref (as fetched with "getline_hr")
873 provided the column names are set with "column_names".
874
875 It is just a wrapper method with basic parameter checks over
876
877 $csv->print ($fh, [ map { $ref->{$_} } $csv->column_names ]);
878
879 combine
880 $status = $csv->combine (@fields);
881
882 This method constructs a "CSV" record from @fields, returning success
883 or failure. Failure can result from lack of arguments or an argument
884 that contains an invalid character. Upon success, "string" can be
885 called to retrieve the resultant "CSV" string. Upon failure, the
886 value returned by "string" is undefined and "error_input" could be
887 called to retrieve the invalid argument.
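
A minimal sketch:

    if ($csv->combine ("foo", "bar,baz", 42)) {
        print $csv->string, "\n";   # foo,"bar,baz",42
    }
    else {
        warn $csv->error_input;
    }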
888
889 string
890 $line = $csv->string ();
891
892 This method returns the input to "parse" or the resultant "CSV"
893 string of "combine", whichever was called more recently.
894
895 getline
896 $colref = $csv->getline ($fh);
897
898 This is the counterpart to "print", as "parse" is the counterpart to
899 "combine": it parses a row from the $fh handle using the "getline"
900 method associated with $fh and parses this row into an array ref.
901 This array ref is returned by the function or "undef" for failure.
902 When $fh does not support "getline", you are likely to hit errors.
903
904 When fields are bound with "bind_columns" the return value is a
905 reference to an empty list.
906
907 The "string", "fields", and "status" methods are meaningless again.
908
909 getline_all
910 $arrayref = $csv->getline_all ($fh);
911 $arrayref = $csv->getline_all ($fh, $offset);
912 $arrayref = $csv->getline_all ($fh, $offset, $length);
913
914 This will return a reference to a list of getline ($fh) results. In
915 this call, "keep_meta_info" is disabled. If $offset is negative, as
916 with "splice", only the last "abs ($offset)" records of $fh are taken
917 into consideration.
918
919 Given a CSV file with 10 lines:
920
921 lines call
922 ----- ---------------------------------------------------------
923 0..9 $csv->getline_all ($fh) # all
924 0..9 $csv->getline_all ($fh, 0) # all
925 8..9 $csv->getline_all ($fh, 8) # start at 8
926 - $csv->getline_all ($fh, 0, 0) # start at 0 first 0 rows
927 0..4 $csv->getline_all ($fh, 0, 5) # start at 0 first 5 rows
928 4..5 $csv->getline_all ($fh, 4, 2) # start at 4 first 2 rows
929 8..9 $csv->getline_all ($fh, -2) # last 2 rows
930 6..7 $csv->getline_all ($fh, -4, 2) # first 2 of last 4 rows
931
932 getline_hr
933 The "getline_hr" and "column_names" methods work together to allow you
934 to have rows returned as hashrefs. You must call "column_names" first
935 to declare your column names.
936
937 $csv->column_names (qw( code name price description ));
938 $hr = $csv->getline_hr ($fh);
939 print "Price for $hr->{name} is $hr->{price} EUR\n";
940
941 "getline_hr" will croak if called before "column_names".
942
Note that "getline_hr" creates a hashref for every row and will be
much slower than the combined use of "bind_columns" and "getline",
while still offering the same easy-to-use hashref inside the loop:
946
947 my @cols = @{$csv->getline ($fh)};
948 $csv->column_names (@cols);
949 while (my $row = $csv->getline_hr ($fh)) {
950 print $row->{price};
951 }
952
953 Could easily be rewritten to the much faster:
954
955 my @cols = @{$csv->getline ($fh)};
956 my $row = {};
957 $csv->bind_columns (\@{$row}{@cols});
958 while ($csv->getline ($fh)) {
959 print $row->{price};
960 }
961
962 Your mileage may vary for the size of the data and the number of rows.
963 With perl-5.14.2 the comparison for a 100_000 line file with 14
964 columns:
965
966 Rate hashrefs getlines
967 hashrefs 1.00/s -- -76%
968 getlines 4.15/s 313% --
969
970 getline_hr_all
971 $arrayref = $csv->getline_hr_all ($fh);
972 $arrayref = $csv->getline_hr_all ($fh, $offset);
973 $arrayref = $csv->getline_hr_all ($fh, $offset, $length);
974
975 This will return a reference to a list of getline_hr ($fh) results.
976 In this call, "keep_meta_info" is disabled.
977
978 parse
979 $status = $csv->parse ($line);
980
This method decomposes a "CSV" string into fields, returning success
or failure. Failure can result from a lack of argument or an
improperly formatted "CSV" string. Upon success, "fields" can be
called to retrieve the decomposed fields. Upon failure, calling
"fields" will return undefined data and "error_input" can be called
to retrieve the invalid argument.
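
A minimal sketch:

    if ($csv->parse (q{1,"foo, bar",3})) {
        my @fields = $csv->fields;   # ("1", "foo, bar", "3")
    }
    else {
        warn "" . $csv->error_diag;
    }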
987
988 You may use the "types" method for setting column types. See "types"'
989 description below.
990
991 The $line argument is supposed to be a simple scalar. Everything else
992 is supposed to croak and set error 1500.
993
994 fragment
995 This function tries to implement RFC7111 (URI Fragment Identifiers for
996 the text/csv Media Type) -
997 https://datatracker.ietf.org/doc/html/rfc7111
998
999 my $AoA = $csv->fragment ($fh, $spec);
1000
1001 In specifications, "*" is used to specify the last item, a dash ("-")
1002 to indicate a range. All indices are 1-based: the first row or
1003 column has index 1. Selections can be combined with the semi-colon
1004 (";").
1005
1006 When using this method in combination with "column_names", the
1007 returned reference will point to a list of hashes instead of a list
of lists. A disjointed cell-based combined selection might return
rows with different numbers of columns, making the use of hashes
unpredictable.
1011
1012 $csv->column_names ("Name", "Age");
1013 my $AoH = $csv->fragment ($fh, "col=3;8");
1014
1015 If the "after_parse" callback is active, it is also called on every
1016 line parsed and skipped before the fragment.
1017
1018 row
1019 row=4
1020 row=5-7
1021 row=6-*
1022 row=1-2;4;6-*
1023
1024 col
1025 col=2
1026 col=1-3
1027 col=4-*
1028 col=1-2;4;7-*
1029
1030 cell
1031 In cell-based selection, the comma (",") is used to pair row and
1032 column
1033
1034 cell=4,1
1035
1036 The range operator ("-") using "cell"s can be used to define top-left
1037 and bottom-right "cell" location
1038
1039 cell=3,1-4,6
1040
1041 The "*" is only allowed in the second part of a pair
1042
1043 cell=3,2-*,2 # row 3 till end, only column 2
1044 cell=3,2-3,* # column 2 till end, only row 3
1045 cell=3,2-*,* # strip row 1 and 2, and column 1
1046
1047 Cells and cell ranges may be combined with ";", possibly resulting in
1048 rows with different numbers of columns
1049
1050 cell=1,1-2,2;3,3-4,4;1,4;4,1
1051
1052 Disjointed selections will only return selected cells. The cells
1053 that are not specified will not be included in the returned
1054 set, not even as "undef". As an example given a "CSV" like
1055
1056 11,12,13,...19
1057 21,22,...28,29
1058 : :
1059 91,...97,98,99
1060
1061 with "cell=1,1-2,2;3,3-4,4;1,4;4,1" will return:
1062
1063 11,12,14
1064 21,22
1065 33,34
1066 41,43,44
1067
Overlapping cell-specs will return those cells only once, so
"cell=1,1-3,3;2,2-4,4;2,3;4,2" will return:
1070
1071 11,12,13
1072 21,22,23,24
1073 31,32,33,34
1074 42,43,44
1075
1076 RFC7111 <https://datatracker.ietf.org/doc/html/rfc7111> does not
1077 allow different types of specs to be combined (either "row" or "col"
1078 or "cell"). Passing an invalid fragment specification will croak and
1079 set error 2013.
1080
1081 column_names
1082 Set the "keys" that will be used in the "getline_hr" calls. If no
1083 keys (column names) are passed, it will return the current setting as a
1084 list.
1085
1086 "column_names" accepts a list of scalars (the column names) or a
1087 single array_ref, so you can pass the return value from "getline" too:
1088
1089 $csv->column_names ($csv->getline ($fh));
1090
1091 "column_names" does no checking on duplicates at all, which might lead
1092 to unexpected results. Undefined entries will be replaced with the
1093 string "\cAUNDEF\cA", so
1094
1095 $csv->column_names (undef, "", "name", "name");
1096 $hr = $csv->getline_hr ($fh);
1097
1098 will set "$hr->{"\cAUNDEF\cA"}" to the 1st field, "$hr->{""}" to the
1099 2nd field, and "$hr->{name}" to the 4th field, discarding the 3rd
1100 field.
1101
1102 "column_names" croaks on invalid arguments.
1103
1104 header
1105 This method does NOT work in perl-5.6.x
1106
1107 Parse the CSV header and set "sep", column_names and encoding.
1108
1109 my @hdr = $csv->header ($fh);
1110 $csv->header ($fh, { sep_set => [ ";", ",", "|", "\t" ] });
1111 $csv->header ($fh, { detect_bom => 1, munge_column_names => "lc" });
1112
1113 The first argument should be a file handle.
1114
This method resets some object properties, as it is supposed to be
invoked only once per file or stream. It will leave attributes
"column_names" and "bound_columns" alone if setting column names is
disabled. Reading headers on previously processed objects might fail
on perl-5.8.0 and older.
1120
1121 Assuming that the file opened for parsing has a header, and the header
1122 does not contain problematic characters like embedded newlines, read
1123 the first line from the open handle then auto-detect whether the header
1124 separates the column names with a character from the allowed separator
1125 list.
1126
1127 If any of the allowed separators matches, and none of the other
1128 allowed separators match, set "sep" to that separator for the
1129 current CSV_PP instance and use it to parse the first line, map those
1130 to lowercase, and use that to set the instance "column_names":
1131
1132 my $csv = Text::CSV_PP->new ({ binary => 1, auto_diag => 1 });
1133 open my $fh, "<", "file.csv";
1134 binmode $fh; # for Windows
1135 $csv->header ($fh);
1136 while (my $row = $csv->getline_hr ($fh)) {
1137 ...
1138 }
1139
1140 If the header is empty, contains more than one unique separator out of
1141 the allowed set, contains empty fields, or contains identical fields
1142 (after folding), it will croak with error 1010, 1011, 1012, or 1013
1143 respectively.
1144
1145 If the header contains embedded newlines or is not valid CSV in any
1146 other way, this method will croak and leave the parse error untouched.
1147
1148 A successful call to "header" will always set the "sep" of the $csv
1149 object. This behavior can not be disabled.
1150
1151 return value
1152
1153 On error this method will croak.
1154
1155 In list context, the headers will be returned whether they are used to
1156 set "column_names" or not.
1157
1158 In scalar context, the instance itself is returned. Note: the values
1159 as found in the header will effectively be lost if "set_column_names"
1160 is false.
1161
1162 Options
1163
1164 sep_set
1165 $csv->header ($fh, { sep_set => [ ";", ",", "|", "\t" ] });
1166
1167 The list of legal separators defaults to "[ ";", "," ]" and can be
1168 changed by this option. As this is probably the most often used
1169 option, it can be passed on its own as an unnamed argument:
1170
1171 $csv->header ($fh, [ ";", ",", "|", "\t", "::", "\x{2063}" ]);
1172
1173 Multi-byte sequences are allowed, both multi-character and
1174 Unicode. See "sep".
1175
1176 detect_bom
1177 $csv->header ($fh, { detect_bom => 1 });
1178
1179 The default behavior is to detect if the header line starts with a
1180 BOM. If the header has a BOM, use that to set the encoding of $fh.
1181 This default behavior can be disabled by passing a false value to
1182 "detect_bom".
1183
1184 Supported encodings from BOM are: UTF-8, UTF-16BE, UTF-16LE,
1185 UTF-32BE, and UTF-32LE. BOM also supports UTF-1, UTF-EBCDIC, SCSU,
1186 BOCU-1, and GB-18030 but Encode does not (yet). UTF-7 is not
1187 supported.
1188
1189 If a supported BOM was detected as start of the stream, it is stored
1190 in the object attribute "ENCODING".
1191
1192 my $enc = $csv->{ENCODING};
1193
1194 The encoding is used with "binmode" on $fh.
1195
1196 If the handle was opened in a (correct) encoding, this method will
1197 not alter the encoding, as it checks the leading bytes of the first
1198 line. In case the stream starts with a decoded BOM ("U+FEFF"),
1199 "{ENCODING}" will be "" (empty) instead of the default "undef".
1200
1201 munge_column_names
1202 This option offers the means to modify the column names into
1203 something that is most useful to the application. The default is to
1204 map all column names to lower case.
1205
1206 $csv->header ($fh, { munge_column_names => "lc" });
1207
1208 The following values are available:
1209
1210 lc - lower case
1211 uc - upper case
1212 db - valid DB field names
1213 none - do not change
1214 \%hash - supply a mapping
1215 \&cb - supply a callback
1216
1217 Lower case
1218 $csv->header ($fh, { munge_column_names => "lc" });
1219
1220 The header is changed to all lower-case
1221
1222 $_ = lc;
1223
1224 Upper case
1225 $csv->header ($fh, { munge_column_names => "uc" });
1226
1227 The header is changed to all upper-case
1228
1229 $_ = uc;
1230
1231 Literal
1232 $csv->header ($fh, { munge_column_names => "none" });
1233
1234 Hash
$csv->header ($fh, { munge_column_names => { foo => "sombrero" }});

If a value does not exist, the original value is used unchanged.
1238
1239 Database
1240 $csv->header ($fh, { munge_column_names => "db" });
1241
1242 - lower-case
1243
1244 - all sequences of non-word characters are replaced with an
1245 underscore
1246
1247 - all leading underscores are removed
1248
1249 $_ = lc (s/\W+/_/gr =~ s/^_+//r);
1250
1251 Callback
1252 $csv->header ($fh, { munge_column_names => sub { fc } });
1253 $csv->header ($fh, { munge_column_names => sub { "column_".$col++ } });
1254 $csv->header ($fh, { munge_column_names => sub { lc (s/\W+/_/gr) } });
1255
1256 As this callback is called in a "map", you can use $_ directly.
1257
1258 set_column_names
1259 $csv->header ($fh, { set_column_names => 1 });
1260
The default is to set the instance's column names using
"column_names" if the method is successful, so subsequent calls to
"getline_hr" can return a hash. Disabling this can be forced by using
a false value for this option.
1265
1266 As described in "return value" above, content is lost in scalar
1267 context.
1268
1269 Validation
1270
1271 When receiving CSV files from external sources, this method can be
1272 used to protect against changes in the layout by restricting to known
1273 headers (and typos in the header fields).
1274
1275 my %known = (
1276 "record key" => "c_rec",
1277 "rec id" => "c_rec",
1278 "id_rec" => "c_rec",
1279 "kode" => "code",
1280 "code" => "code",
1281 "vaule" => "value",
1282 "value" => "value",
1283 );
1284 my $csv = Text::CSV_PP->new ({ binary => 1, auto_diag => 1 });
1285 open my $fh, "<", $source or die "$source: $!";
1286 $csv->header ($fh, { munge_column_names => sub {
1287 s/\s+$//;
1288 s/^\s+//;
1289 $known{lc $_} or die "Unknown column '$_' in $source";
1290 }});
1291 while (my $row = $csv->getline_hr ($fh)) {
1292 say join "\t", $row->{c_rec}, $row->{code}, $row->{value};
1293 }
1294
1295 bind_columns
1296 Takes a list of scalar references to be used for output with "print"
1297 or to store in the fields fetched by "getline". When you do not pass
1298 enough references to store the fetched fields in, "getline" will fail
1299 with error 3006. If you pass more than there are fields to return,
1300 the content of the remaining references is left untouched.
1301
1302 $csv->bind_columns (\$code, \$name, \$price, \$description);
1303 while ($csv->getline ($fh)) {
1304 print "The price of a $name is \x{20ac} $price\n";
1305 }
1306
1307 To reset or clear all column binding, call "bind_columns" with the
1308 single argument "undef". This will also clear column names.
1309
1310 $csv->bind_columns (undef);
1311
1312 If no arguments are passed at all, "bind_columns" will return the list
1313 of current bindings or "undef" if no binds are active.
1314
1315 Note that in parsing with "bind_columns", the fields are set on the
1316 fly. That implies that if the third field of a row causes an error
1317 (or this row has just two fields where the previous row had more), the
1318 first two fields already have been assigned the values of the current
1319 row, while the rest of the fields will still hold the values of the
1320 previous row. If you want the parser to fail in these cases, use the
1321 "strict" attribute.
1322
1323 eof
1324 $eof = $csv->eof ();
1325
1326 If "parse" or "getline" was used with an IO stream, this method will
1327 return true (1) if the last call hit end of file, otherwise it will
1328 return false (''). This is useful to see the difference between a
1329 failure and end of file.
1330
1331 Note that if the parsing of the last line caused an error, "eof" is
1332 still true. That means that if you are not using "auto_diag", an idiom
1333 like
1334
1335 while (my $row = $csv->getline ($fh)) {
1336 # ...
1337 }
1338 $csv->eof or $csv->error_diag;
1339
1340 will not report the error. You would have to change that to
1341
1342 while (my $row = $csv->getline ($fh)) {
1343 # ...
1344 }
1345 +$csv->error_diag and $csv->error_diag;
1346
1347 types
1348 $csv->types (\@tref);
1349
1350 This method is used to force that (all) columns are of a given type.
1351 For example, if you have an integer column, two columns with
1352 doubles and a string column, then you might do a
1353
1354 $csv->types ([Text::CSV_PP::IV (),
1355 Text::CSV_PP::NV (),
1356 Text::CSV_PP::NV (),
1357 Text::CSV_PP::PV ()]);
1358
1359 Column types are used only for decoding columns while parsing, in
1360 other words by the "parse" and "getline" methods.
1361
1362 You can unset column types by doing a
1363
1364 $csv->types (undef);
1365
1366 or fetch the current type settings with
1367
1368 $types = $csv->types ();
1369
1370 IV
1371 CSV_TYPE_IV
1372 Set field type to integer.
1373
1374 NV
1375 CSV_TYPE_NV
1376 Set field type to numeric/float.
1377
1378 PV
1379 CSV_TYPE_PV
1380 Set field type to string.
1381
1382 fields
1383 @columns = $csv->fields ();
1384
1385 This method returns the input to "combine" or the resultant
1386 decomposed fields of a successful "parse", whichever was called more
1387 recently.
1388
1389 Note that the return value is undefined after using "getline", which
1390 does not fill the data structures returned by "parse".
1391
1392 meta_info
1393 @flags = $csv->meta_info ();
1394
1395 This method returns the "flags" of the input to "combine" or the flags
1396 of the resultant decomposed fields of "parse", whichever was called
1397 more recently.
1398
1399 For each field, a meta_info field will hold flags that inform
1400 something about the field returned by the "fields" method or
1401 passed to the "combine" method. The flags are bit-wise-"or"'d like:
1402
1403 0x0001
1404 "CSV_FLAGS_IS_QUOTED"
1405 The field was quoted.
1406
1407 0x0002
1408 "CSV_FLAGS_IS_BINARY"
1409 The field was binary.
1410
1411 0x0004
1412 "CSV_FLAGS_ERROR_IN_FIELD"
1413 The field was invalid.
1414
1415 Currently only used when "allow_loose_quotes" is active.
1416
1417 0x0010
1418 "CSV_FLAGS_IS_MISSING"
1419 The field was missing.
1420
1421 See the "is_***" methods below.
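
A rough sketch of reading these flags, assuming "keep_meta_info" is
enabled:

    my $csv = Text::CSV_PP->new ({ keep_meta_info => 1 });
    $csv->parse (q{1,"2",3});
    my @flags = $csv->meta_info;   # (0, 1, 0) - only field 1 was quoted
    print "second field was quoted\n" if $csv->is_quoted (1);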
1422
1423 is_quoted
1424 my $quoted = $csv->is_quoted ($column_idx);
1425
1426 where $column_idx is the (zero-based) index of the column in the
1427 last result of "parse".
1428
1429 This returns a true value if the data in the indicated column was
1430 enclosed in "quote_char" quotes. This might be important for fields
1431 where content ",20070108," is to be treated as a numeric value, and
1432 where ","20070108"," is explicitly marked as character string data.
1433
1434 This method is only valid when "keep_meta_info" is set to a true value.
1435
1436 is_binary
1437 my $binary = $csv->is_binary ($column_idx);
1438
1439 where $column_idx is the (zero-based) index of the column in the
1440 last result of "parse".
1441
1442 This returns a true value if the data in the indicated column contained
1443 any byte in the range "[\x00-\x08,\x10-\x1F,\x7F-\xFF]".
1444
1445 This method is only valid when "keep_meta_info" is set to a true value.
1446
1447 is_missing
1448 my $missing = $csv->is_missing ($column_idx);
1449
1450 where $column_idx is the (zero-based) index of the column in the
1451 last result of "getline_hr".
1452
1453 $csv->keep_meta_info (1);
1454 while (my $hr = $csv->getline_hr ($fh)) {
1455 $csv->is_missing (0) and next; # This was an empty line
1456 }
1457
1458 When using "getline_hr", it is impossible to tell if the parsed
1459 fields are "undef" because they where not filled in the "CSV" stream
1460 or because they were not read at all, as all the fields defined by
1461 "column_names" are set in the hash-ref. If you still need to know if
1462 all fields in each row are provided, you should enable "keep_meta_info"
1463 so you can check the flags.
1464
1465 If "keep_meta_info" is "false", "is_missing" will always return
1466 "undef", regardless of $column_idx being valid or not. If this
1467 attribute is "true" it will return either 0 (the field is present) or 1
1468 (the field is missing).
1469
1470 A special case is the empty line. If the line is completely empty -
1471 after dealing with the flags - this is still a valid CSV line: it is a
1472 record of just one single empty field. However, if "keep_meta_info" is
1473 set, invoking "is_missing" with index 0 will now return true.
1474
1475 status
1476 $status = $csv->status ();
1477
1478 This method returns the status of the last invoked "combine" or "parse"
1479 call. Status is success (true: 1) or failure (false: "undef" or 0).
1480
1481 Note that as this only keeps track of the status of above mentioned
1482 methods, you are probably looking for "error_diag" instead.
1483
1484 error_input
1485 $bad_argument = $csv->error_input ();
1486
1487 This method returns the erroneous argument (if it exists) of "combine"
1488 or "parse", whichever was called more recently. If the last
1489 invocation was successful, "error_input" will return "undef".
1490
1491 Depending on the type of error, it might also hold the data for the
1492 last error-input of "getline".
1493
1494 error_diag
1495 Text::CSV_PP->error_diag ();
1496 $csv->error_diag ();
1497 $error_code = 0 + $csv->error_diag ();
1498 $error_str = "" . $csv->error_diag ();
1499 ($cde, $str, $pos, $rec, $fld) = $csv->error_diag ();
1500
1501 If (and only if) an error occurred, this function returns the
1502 diagnostics of that error.
1503
1504 If called in void context, this will print the internal error code and
1505 the associated error message to STDERR.
1506
If called in list context, this will return the error code and the
error message in that order. If the last error was from parsing, the
rest of the values returned are a best guess at the location within
the line that was being parsed. Their values are 1-based. The
position currently is the index of the byte at which the parsing
failed in the current record. It might change to be the index of the
current character in a later release. The record is the index of the
record parsed by the csv instance. The field number is the index of
the field the parser thinks it is currently trying to parse. See
examples/csv-check for how this can be used.
1517
1518 If called in scalar context, it will return the diagnostics in a
1519 single scalar, a-la $!. It will contain the error code in numeric
1520 context, and the diagnostics message in string context.
1521
1522 When called as a class method or a direct function call, the
1523 diagnostics are that of the last "new" call.
1524
1525 record_number
1526 $recno = $csv->record_number ();
1527
1528 Returns the records parsed by this csv instance. This value should be
1529 more accurate than $. when embedded newlines come in play. Records
1530 written by this instance are not counted.
1531
1532 SetDiag
1533 $csv->SetDiag (0);
1534
1535 Use to reset the diagnostics if you are dealing with errors.

FUNCTIONS
This section is also taken from Text::CSV_XS.
1539
1540 csv
1541 This function is not exported by default and should be explicitly
1542 requested:
1543
1544 use Text::CSV_PP qw( csv );
1545
1546 This is a high-level function that aims at simple (user) interfaces.
1547 This can be used to read/parse a "CSV" file or stream (the default
1548 behavior) or to produce a file or write to a stream (define the "out"
1549 attribute). It returns an array- or hash-reference on parsing (or
1550 "undef" on fail) or the numeric value of "error_diag" on writing.
1551 When this function fails you can get to the error using the class call
1552 to "error_diag"
1553
1554 my $aoa = csv (in => "test.csv") or
1555 die Text::CSV_PP->error_diag;
1556
1557 This function takes the arguments as key-value pairs. This can be
1558 passed as a list or as an anonymous hash:
1559
1560 my $aoa = csv ( in => "test.csv", sep_char => ";");
1561 my $aoh = csv ({ in => $fh, headers => "auto" });
1562
1563 The arguments passed consist of two parts: the arguments to "csv"
1564 itself and the optional attributes to the "CSV" object used inside
1565 the function as enumerated and explained in "new".
1566
If not overridden, the default options used for CSV are
1568
1569 auto_diag => 1
1570 escape_null => 0
1571
1572 The option that is always set and cannot be altered is
1573
1574 binary => 1
1575
1576 As this function will likely be used in one-liners, it allows "quote"
1577 to be abbreviated as "quo", and "escape_char" to be abbreviated as
1578 "esc" or "escape".
1579
1580 Alternative invocations:
1581
1582 my $aoa = Text::CSV_PP::csv (in => "file.csv");
1583
1584 my $csv = Text::CSV_PP->new ();
1585 my $aoa = $csv->csv (in => "file.csv");
1586
1587 In the latter case, the object attributes are used from the existing
1588 object and the attribute arguments in the function call are ignored:
1589
1590 my $csv = Text::CSV_PP->new ({ sep_char => ";" });
1591 my $aoh = $csv->csv (in => "file.csv", bom => 1);
1592
1593 will parse using ";" as "sep_char", not ",".
1594
1595 in
1596
1597 Used to specify the source. "in" can be a file name (e.g. "file.csv"),
1598 which will be opened for reading and closed when finished, a file
1599 handle (e.g. $fh or "FH"), a reference to a glob (e.g. "\*ARGV"),
1600 the glob itself (e.g. *STDIN), or a reference to a scalar (e.g.
1601 "\q{1,2,"csv"}").
1602
1603 When used with "out", "in" should be a reference to a CSV structure
1604 (AoA or AoH) or a CODE-ref that returns an array-reference or a hash-
1605 reference. The code-ref will be invoked with no arguments.
1606
1607 my $aoa = csv (in => "file.csv");
1608
1609 open my $fh, "<", "file.csv";
1610 my $aoa = csv (in => $fh);
1611
1612 my $csv = [ [qw( Foo Bar )], [ 1, 2 ], [ 2, 3 ]];
1613 my $err = csv (in => $csv, out => "file.csv");
1614
1615 If called in void context without the "out" attribute, the resulting
1616 ref will be used as input to a subsequent call to csv:
1617
1618 csv (in => "file.csv", filter => { 2 => sub { length > 2 }})
1619
1620 will be a shortcut to
1621
1622 csv (in => csv (in => "file.csv", filter => { 2 => sub { length > 2 }}))
1623
1624 where, in the absence of the "out" attribute, this is a shortcut to
1625
1626 csv (in => csv (in => "file.csv", filter => { 2 => sub { length > 2 }}),
1627 out => *STDOUT)
1628
1629 out
1630
1631 csv (in => $aoa, out => "file.csv");
1632 csv (in => $aoa, out => $fh);
1633 csv (in => $aoa, out => STDOUT);
1634 csv (in => $aoa, out => *STDOUT);
1635 csv (in => $aoa, out => \*STDOUT);
1636 csv (in => $aoa, out => \my $data);
1637 csv (in => $aoa, out => undef);
1638 csv (in => $aoa, out => \"skip");
1639
1640 csv (in => $fh, out => \@aoa);
1641 csv (in => $fh, out => \@aoh, bom => 1);
1642 csv (in => $fh, out => \%hsh, key => "key");
1643
1644 In output mode, the default CSV options when producing CSV are
1645
1646 eol => "\r\n"
1647
1648 The "fragment" attribute is ignored in output mode.
1649
1650 "out" can be a file name (e.g. "file.csv"), which will be opened for
1651 writing and closed when finished, a file handle (e.g. $fh or "FH"), a
1652 reference to a glob (e.g. "\*STDOUT"), the glob itself (e.g. *STDOUT),
1653 or a reference to a scalar (e.g. "\my $data").
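
As a minimal sketch, writing to a reference to a scalar collects the
generated CSV in memory instead of in a file:

    my $out = "";
    csv (in => [[ "a", "b" ], [ 1, 2 ]], out => \$out);
    # $out should now hold qq{a,b\r\n1,2\r\n} (the default "eol" is "\r\n")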
1654
1655 csv (in => sub { $sth->fetch }, out => "dump.csv");
1656 csv (in => sub { $sth->fetchrow_hashref }, out => "dump.csv",
1657 headers => $sth->{NAME_lc});
1658
1659 When a code-ref is used for "in", the output is generated per
1660 invocation, so no buffering is involved. This implies that there is no
1661 size restriction on the number of records. The "csv" function ends when
1662 the coderef returns a false value.
1663
1664 If "out" is set to a reference of the literal string "skip", the output
1665 will be suppressed completely, which might be useful in combination
1666 with a filter for side effects only.
1667
1668 my %cache;
1669 csv (in => "dump.csv",
1670 out => \"skip",
1671 on_in => sub { $cache{$_[1][1]}++ });
1672
1673 Currently, setting "out" to any false value ("undef", "", 0) will be
1674 equivalent to "\"skip"".
1675
1676 If the "in" argument point to something to parse, and the "out" is set
1677 to a reference to an "ARRAY" or a "HASH", the output is appended to the
1678 data in the existing reference. The result of the parse should match
1679 what exists in the reference passed. This might come in handy when you
1680 have to parse a set of files with similar content (like data stored per
1681 period) and you want to collect that into a single data structure:
1682
1683 my %hash;
1684 csv (in => $_, out => \%hash, key => "id") for sort glob "foo-[0-9]*.csv";
1685
1686 my @list; # List of arrays
1687 csv (in => $_, out => \@list) for sort glob "foo-[0-9]*.csv";
1688
1689 my @list; # List of hashes
1690 csv (in => $_, out => \@list, bom => 1) for sort glob "foo-[0-9]*.csv";
1691
1692 encoding
1693
1694 If passed, it should be an encoding accepted by the :encoding()
1695 option to "open". There is no default value. This attribute does not
1696 work in perl 5.6.x. "encoding" can be abbreviated to "enc" for ease of
1697 use in command line invocations.
1698
1699 If "encoding" is set to the literal value "auto", the method "header"
1700 will be invoked on the opened stream to check if there is a BOM and set
1701 the encoding accordingly. This is equal to passing a true value in
1702 the option "detect_bom".
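
A minimal sketch (assuming "data.csv" is encoded in UTF-8, possibly with a
leading BOM):

    my $aoa = csv (in => "data.csv", enc => "UTF-8");
    my $aoh = csv (in => "data.csv", encoding => "auto"); # BOM check via "header"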
1703
1704 Encodings can be stacked, as supported by "binmode":
1705
1706 # Using PerlIO::via::gzip
1707 csv (in => \@csv,
1708 out => "test.csv:via.gz",
1709 encoding => ":via(gzip):encoding(utf-8)",
1710 );
1711 $aoa = csv (in => "test.csv:via.gz", encoding => ":via(gzip)");
1712
1713 # Using PerlIO::gzip
1714 csv (in => \@csv,
1715 out => "test.csv:via.gz",
1716 encoding => ":gzip:encoding(utf-8)",
1717 );
1718 $aoa = csv (in => "test.csv:gzip.gz", encoding => ":gzip");
1719
1720 detect_bom
1721
1722 If "detect_bom" is given, the method "header" will be invoked on
1723 the opened stream to check if there is a BOM and set the encoding
1724 accordingly.
1725
1726 "detect_bom" can be abbreviated to "bom".
1727
1728 This is the same as setting "encoding" to "auto".
1729
1730 Note that as the method "header" is invoked, its default is to also
1731 set the headers.
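
A minimal sketch; both calls below are equivalent to setting "encoding" to
"auto" as described above:

    my $aoh = csv (in => "file.csv", detect_bom => 1);
    my $aoh = csv (in => "file.csv", bom => 1);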
1732
1733 headers
1734
1735 If this attribute is not given, the default behavior is to produce an
1736 array of arrays.
1737
1738 If "headers" is supplied, it should be an anonymous list of column
1739 names, an anonymous hashref, a coderef, or a literal flag: "auto",
1740 "lc", "uc", or "skip".
1741
1742 skip
1743 When "skip" is used, the header will not be included in the output.
1744
1745 my $aoa = csv (in => $fh, headers => "skip");
1746
1747 auto
1748 If "auto" is used, the first line of the "CSV" source will be read as
1749 the list of field headers and used to produce an array of hashes.
1750
1751 my $aoh = csv (in => $fh, headers => "auto");
1752
1753 lc
1754 If "lc" is used, the first line of the "CSV" source will be read as
1755 the list of field headers mapped to lower case and used to produce
1756 an array of hashes. This is a variation of "auto".
1757
1758 my $aoh = csv (in => $fh, headers => "lc");
1759
1760 uc
1761 If "uc" is used, the first line of the "CSV" source will be read as
1762 the list of field headers mapped to upper case and used to produce
1763 an array of hashes. This is a variation of "auto".
1764
1765 my $aoh = csv (in => $fh, headers => "uc");
1766
1767 CODE
1768 If a coderef is used, the first line of the "CSV" source will be
1769 read as the list of mangled field headers in which each field is
1770 passed as the only argument to the coderef. This list is used to
1771 produce an array of hashes.
1772
1773 my $aoh = csv (in => $fh,
1774 headers => sub { lc ($_[0]) =~ s/kode/code/gr });
1775
1776 this example is a variation of using "lc" where all occurrences of
1777 "kode" are replaced with "code".
1778
1779 ARRAY
1780 If "headers" is an anonymous list, the entries in the list will be
1781 used as field names. The first line is considered data instead of
1782 headers.
1783
1784 my $aoh = csv (in => $fh, headers => [qw( Foo Bar )]);
1785 csv (in => $aoa, out => $fh, headers => [qw( code description price )]);
1786
1787 HASH
1788 If "headers" is a hash reference, this implies "auto", but header
1789 fields that exist as a key in the hashref will be replaced by the value
1790 for that key. Given a CSV file like
1791
1792 post-kode,city,name,id number,fubble
1793 1234AA,Duckstad,Donald,13,"X313DF"
1794
1795 using
1796
1797 csv (headers => { "post-kode" => "pc", "id number" => "ID" }, ...
1798
1799 will return an entry like
1800
1801 { pc => "1234AA",
1802 city => "Duckstad",
1803 name => "Donald",
1804 ID => "13",
1805 fubble => "X313DF",
1806 }
1807
1808 See also "munge_column_names" and "set_column_names".
1809
1810 munge_column_names
1811
1812 If "munge_column_names" is set, the method "header" is invoked on
1813 the opened stream with all matching arguments to detect and set the
1814 headers.
1815
1816 "munge_column_names" can be abbreviated to "munge".
1817
1818 key
1819
1820 If passed, will default "headers" to "auto" and return a hashref
1821 instead of an array of hashes. Allowed values are simple scalars or
1822 array-references where the first element is the joiner and the rest are
1823 the fields to join to combine the key.
1824
1825 my $ref = csv (in => "test.csv", key => "code");
1826 my $ref = csv (in => "test.csv", key => [ ":" => "code", "color" ]);
1827
1828 with test.csv like
1829
1830 code,product,price,color
1831 1,pc,850,gray
1832 2,keyboard,12,white
1833 3,mouse,5,black
1834
1835 the first example will return
1836
1837 { 1 => {
1838 code => 1,
1839 color => 'gray',
1840 price => 850,
1841 product => 'pc'
1842 },
1843 2 => {
1844 code => 2,
1845 color => 'white',
1846 price => 12,
1847 product => 'keyboard'
1848 },
1849 3 => {
1850 code => 3,
1851 color => 'black',
1852 price => 5,
1853 product => 'mouse'
1854 }
1855 }
1856
1857 the second example will return
1858
1859 { "1:gray" => {
1860 code => 1,
1861 color => 'gray',
1862 price => 850,
1863 product => 'pc'
1864 },
1865 "2:white" => {
1866 code => 2,
1867 color => 'white',
1868 price => 12,
1869 product => 'keyboard'
1870 },
1871 "3:black" => {
1872 code => 3,
1873 color => 'black',
1874 price => 5,
1875 product => 'mouse'
1876 }
1877 }
1878
1879 The "key" attribute can be combined with "headers" for "CSV" date that
1880 has no header line, like
1881
1882 my $ref = csv (
1883 in => "foo.csv",
1884 headers => [qw( c_foo foo bar description stock )],
1885 key => "c_foo",
1886 );
1887
1888 value
1889
1890 Used to create key-value hashes.
1891
1892 Only allowed when "key" is valid. A "value" can be either a single
1893 column label or an anonymous list of column labels. In the first case,
1894 the value will be a simple scalar value; in the latter case, it will be
1895 a hashref.
1896
1897 my $ref = csv (in => "test.csv", key => "code",
1898 value => "price");
1899 my $ref = csv (in => "test.csv", key => "code",
1900 value => [ "product", "price" ]);
1901 my $ref = csv (in => "test.csv", key => [ ":" => "code", "color" ],
1902 value => "price");
1903 my $ref = csv (in => "test.csv", key => [ ":" => "code", "color" ],
1904 value => [ "product", "price" ]);
1905
1906 with test.csv like
1907
1908 code,product,price,color
1909 1,pc,850,gray
1910 2,keyboard,12,white
1911 3,mouse,5,black
1912
1913 the first example will return
1914
1915 { 1 => 850,
1916 2 => 12,
1917 3 => 5,
1918 }
1919
1920 the second example will return
1921
1922 { 1 => {
1923 price => 850,
1924 product => 'pc'
1925 },
1926 2 => {
1927 price => 12,
1928 product => 'keyboard'
1929 },
1930 3 => {
1931 price => 5,
1932 product => 'mouse'
1933 }
1934 }
1935
1936 the third example will return
1937
1938 { "1:gray" => 850,
1939 "2:white" => 12,
1940 "3:black" => 5,
1941 }
1942
1943 the fourth example will return
1944
1945 { "1:gray" => {
1946 price => 850,
1947 product => 'pc'
1948 },
1949 "2:white" => {
1950 price => 12,
1951 product => 'keyboard'
1952 },
1953 "3:black" => {
1954 price => 5,
1955 product => 'mouse'
1956 }
1957 }
1958
1959 keep_headers
1960
1961 When using hashes, keep the column names in the arrayref passed, so
1962 all headers are available after the call in the original order.
1963
1964 my $aoh = csv (in => "file.csv", keep_headers => \my @hdr);
1965
1966 This attribute can be abbreviated to "kh" or passed as
1967 "keep_column_names".
1968
1969 This attribute implies a default of "auto" for the "headers" attribute.
1970
1971 The headers can also be kept internally to keep stable header order:
1972
1973 csv (in => csv (in => "file.csv", kh => "internal"),
1974 out => "new.csv",
1975 kh => "internal");
1976
1977 where "internal" can also be 1, "yes", or "true". This is similar to
1978
1979 my @h;
1980 csv (in => csv (in => "file.csv", kh => \@h),
1981 out => "new.csv",
1982 headers => \@h);
1983
1984 fragment
1985
1986 Only output the fragment as defined in the "fragment" method. This
1987 option is ignored when generating "CSV". See "out".
1988
1989 Combining all of them could give something like
1990
1991 use Text::CSV_PP qw( csv );
1992 my $aoh = csv (
1993 in => "test.txt",
1994 encoding => "utf-8",
1995 headers => "auto",
1996 sep_char => "|",
1997 fragment => "row=3;6-9;15-*",
1998 );
1999 say $aoh->[15]{Foo};
2000
2001 sep_set
2002
2003 If "sep_set" is set, the method "header" is invoked on the opened
2004 stream to detect and set "sep_char" with the given set.
2005
2006 "sep_set" can be abbreviated to "seps".
2007
2008 Note that as the "header" method is invoked, its default is to also
2009 set the headers.
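
A minimal sketch, letting "header" pick the separator from a list of
candidates:

    my $aoh = csv (in => "file.csv", sep_set => [ ";", ",", "\t" ]);
    my $aoh = csv (in => "file.csv", seps    => [ ";", "," ]);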
2010
2011 set_column_names
2012
2013 If "set_column_names" is passed, the method "header" is invoked on
2014 the opened stream with all arguments meant for "header".
2015
2016 If "set_column_names" is passed as a false value, the content of the
2017 first row is only preserved if the output is AoA:
2018
2019 With an input-file like
2020
2021 bAr,foo
2022 1,2
2023 3,4,5
2024
2025 This call
2026
2027 my $aoa = csv (in => $file, set_column_names => 0);
2028
2029 will result in
2030
2031 [[ "bar", "foo" ],
2032 [ "1", "2" ],
2033 [ "3", "4", "5" ]]
2034
2035 and
2036
2037 my $aoa = csv (in => $file, set_column_names => 0, munge => "none");
2038
2039 will result in
2040
2041 [[ "bAr", "foo" ],
2042 [ "1", "2" ],
2043 [ "3", "4", "5" ]]
2044
2045 Callbacks
2046 Callbacks enable actions triggered from the inside of Text::CSV_PP.
2047
2048 While most of what this enables can easily be done in an unrolled
2049 loop as described in the "SYNOPSIS", callbacks can be used to meet
2050 special demands or enhance the "csv" function.
2051
2052 error
2053 $csv->callbacks (error => sub { $csv->SetDiag (0) });
2054
2055 the "error" callback is invoked when an error occurs, but only
2056 when "auto_diag" is set to a true value. A callback is invoked with
2057 the values returned by "error_diag":
2058
2059 my ($c, $s);
2060
2061 sub ignore3006 {
2062 my ($err, $msg, $pos, $recno, $fldno) = @_;
2063 if ($err == 3006) {
2064 # ignore this error
2065 ($c, $s) = (undef, undef);
2066 Text::CSV_PP->SetDiag (0);
2067 }
2068 # Any other error
2069 return;
2070 } # ignore3006
2071
2072 $csv->callbacks (error => \&ignore3006);
2073 $csv->bind_columns (\$c, \$s);
2074 while ($csv->getline ($fh)) {
2075 # Error 3006 will not stop the loop
2076 }
2077
2078 after_parse
2079 $csv->callbacks (after_parse => sub { push @{$_[1]}, "NEW" });
2080 while (my $row = $csv->getline ($fh)) {
2081 $row->[-1] eq "NEW";
2082 }
2083
2084 This callback is invoked after parsing with "getline" only if no
2085 error occurred. The callback is invoked with two arguments: the
2086 current "CSV" parser object and an array reference to the fields
2087 parsed.
2088
2089 The return code of the callback is ignored unless it is a reference
2090 to the string "skip", in which case the record will be skipped in
2091 "getline_all".
2092
2093 sub add_from_db {
2094 my ($csv, $row) = @_;
2095 $sth->execute ($row->[4]);
2096 push @$row, $sth->fetchrow_array;
2097 } # add_from_db
2098
2099 my $aoa = csv (in => "file.csv", callbacks => {
2100 after_parse => \&add_from_db });
2101
2102 This hook can be used for validation:
2103
2104 FAIL
2105 Die if any of the records does not validate a rule:
2106
2107 after_parse => sub {
2108 $_[1][4] =~ m/^[0-9]{4}\s?[A-Z]{2}$/ or
2109 die "5th field does not have a valid Dutch zipcode";
2110 }
2111
2112 DEFAULT
2113 Replace invalid fields with a default value:
2114
2115 after_parse => sub { $_[1][2] =~ m/^\d+$/ or $_[1][2] = 0 }
2116
2117 SKIP
2118 Skip records that have invalid fields (only applies to
2119 "getline_all"):
2120
2121 after_parse => sub { $_[1][0] =~ m/^\d+$/ or return \"skip"; }
2122
2123 before_print
2124 my $idx = 1;
2125 $csv->callbacks (before_print => sub { $_[1][0] = $idx++ });
2126 $csv->print (*STDOUT, [ 0, $_ ]) for @members;
2127
2128 This callback is invoked before printing with "print" only if no
2129 error occurred. The callback is invoked with two arguments: the
2130 current "CSV" parser object and an array reference to the fields
2131 passed.
2132
2133 The return code of the callback is ignored.
2134
2135 sub max_4_fields {
2136 my ($csv, $row) = @_;
2137 @$row > 4 and splice @$row, 4;
2138 } # max_4_fields
2139
2140 csv (in => csv (in => "file.csv"), out => *STDOUT,
2141 callbacks => { before_print => \&max_4_fields });
2142
2143 This callback is not active for "combine".
2144
2145 Callbacks for csv ()
2146
2147 The "csv" allows for some callbacks that do not integrate in XS
2148 internals but only feature the "csv" function.
2149
2150 csv (in => "file.csv",
2151 callbacks => {
2152 filter => { 6 => sub { $_ > 15 } }, # first
2153 after_parse => sub { say "AFTER PARSE"; }, # first
2154 after_in => sub { say "AFTER IN"; }, # second
2155 on_in => sub { say "ON IN"; }, # third
2156 },
2157 );
2158
2159 csv (in => $aoh,
2160 out => "file.csv",
2161 callbacks => {
2162 on_in => sub { say "ON IN"; }, # first
2163 before_out => sub { say "BEFORE OUT"; }, # second
2164 before_print => sub { say "BEFORE PRINT"; }, # third
2165 },
2166 );
2167
2168 filter
2169 This callback can be used to filter records. It is called just after
2170 a new record has been scanned. The callback accepts a:
2171
2172 hashref
2173 The keys are the index to the row (the field name or field number,
2174 1-based) and the values are subs to return a true or false value.
2175
2176 csv (in => "file.csv", filter => {
2177 3 => sub { m/a/ }, # third field should contain an "a"
2178 5 => sub { length > 4 }, # 5th field should be at least 5 chars long
2179 });
2180
2181 csv (in => "file.csv", filter => { foo => sub { $_ > 4 }});
2182
2183 If the keys to the filter hash contain any character that is not a
2184 digit, it will also implicitly set "headers" to "auto" unless
2185 "headers" was already passed as argument. When headers are
2186 active, returning an array of hashes, the filter is not applicable
2187 to the header itself.
2188
2189 All sub results should match, as in AND.
2190
2191 The context of the callback sets $_ localized to the field
2192 indicated by the filter. The two arguments are as with all other
2193 callbacks, so the other fields in the current row can be seen:
2194
2195 filter => { 3 => sub { $_ > 100 ? $_[1][1] =~ m/A/ : $_[1][6] =~ m/B/ }}
2196
2197 If the context is set to return a list of hashes ("headers" is
2198 defined), the current record will also be available in the
2199 localized %_:
2200
2201 filter => { 3 => sub { $_ > 100 && $_{foo} =~ m/A/ && $_{bar} < 1000 }}
2202
2203 If the filter is used to alter the content by changing $_, make
2204 sure that the sub returns true in order not to have that record
2205 skipped:
2206
2207 filter => { 2 => sub { $_ = uc }}
2208
2209 will upper-case the second field, and then skip the record if the resulting
2210 content evaluates to false. To always accept, end with truth:
2211
2212 filter => { 2 => sub { $_ = uc; 1 }}
2213
2214 coderef
2215 csv (in => "file.csv", filter => sub { $n++; 0; });
2216
2217 If the argument to "filter" is a coderef, it is an alias or
2218 shortcut to a filter on column 0:
2219
2220 csv (filter => sub { $n++; 0 });
2221
2222 is equal to
2223
2224 csv (filter => { 0 => sub { $n++; 0 }});
2225
2226 filter-name
2227 csv (in => "file.csv", filter => "not_blank");
2228 csv (in => "file.csv", filter => "not_empty");
2229 csv (in => "file.csv", filter => "filled");
2230
2231 These are predefined filters.
2232
2233 Given a file like (line numbers prefixed for doc purpose only):
2234
2235 1:1,2,3
2236 2:
2237 3:,
2238 4:""
2239 5:,,
2240 6:, ,
2241 7:"",
2242 8:" "
2243 9:4,5,6
2244
2245 not_blank
2246 Filter out the blank lines
2247
2248 This filter is a shortcut for
2249
2250 filter => { 0 => sub { @{$_[1]} > 1 or
2251 defined $_[1][0] && $_[1][0] ne "" } }
2252
2253 Due to the implementation, it is currently impossible to also
2254 filter lines that consist only of a quoted empty field. These
2255 lines are also considered blank lines.
2256
2257 With the given example, lines 2 and 4 will be skipped.
2258
2259 not_empty
2260 Filter out lines where all the fields are empty.
2261
2262 This filter is a shortcut for
2263
2264 filter => { 0 => sub { grep { defined && $_ ne "" } @{$_[1]} } }
2265
2266 A space is not regarded as being empty, so given the example data,
2267 lines 2, 3, 4, 5, and 7 are skipped.
2268
2269 filled
2270 Filter out lines that have no visible data
2271
2272 This filter is a shortcut for
2273
2274 filter => { 0 => sub { grep { defined && m/\S/ } @{$_[1]} } }
2275
2276 This filter rejects all lines that do not have at least one field
2277 that does not evaluate to the empty string.
2278
2279 With the given example data, this filter would skip lines 2
2280 through 8.
2281
2282 One could also use modules like Types::Standard:
2283
2284 use Types::Standard -types;
2285
2286 my $type = Tuple[Str, Str, Int, Bool, Optional[Num]];
2287 my $check = $type->compiled_check;
2288
2289 # filter with compiled check and warnings
2290 my $aoa = csv (
2291 in => \$data,
2292 filter => {
2293 0 => sub {
2294 my $ok = $check->($_[1]) or
2295 warn $type->get_message ($_[1]), "\n";
2296 return $ok;
2297 },
2298 },
2299 );
2300
2301 after_in
2302 This callback is invoked for each record after all records have been
2303 parsed but before returning the reference to the caller. The hook is
2304 invoked with two arguments: the current "CSV" parser object and a
2305 reference to the record. The reference can be a reference to a
2306 HASH or a reference to an ARRAY as determined by the arguments.
2307
2308 This callback can also be passed as an attribute without the
2309 "callbacks" wrapper.
2310
2311 before_out
2312 This callback is invoked for each record before the record is
2313 printed. The hook is invoked with two arguments: the current "CSV"
2314 parser object and a reference to the record. The reference can be a
2315 reference to a HASH or a reference to an ARRAY as determined by the
2316 arguments.
2317
2318 This callback can also be passed as an attribute without the
2319 "callbacks" wrapper.
2320
2321 This callback makes the row available in %_ if the row is a hashref.
2322 In this case %_ is writable and will change the original row.
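
A minimal sketch, trimming surrounding whitespace from every field just
before it is written (the record here is an array-ref):

    csv (in  => csv (in => "file.csv"),
         out => "trimmed.csv",
         before_out => sub { s/^\s+|\s+$//g for @{$_[1]} });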
2323
2324 on_in
2325 This callback acts exactly as the "after_in" or the "before_out"
2326 hooks.
2327
2328 This callback can also be passed as an attribute without the
2329 "callbacks" wrapper.
2330
2331 This callback makes the row available in %_ if the row is a hashref.
2332 In this case %_ is writable and will change the original row. So e.g.
2333 with
2334
2335 my $aoh = csv (
2336 in => \"foo\n1\n2\n",
2337 headers => "auto",
2338 on_in => sub { $_{bar} = 2; },
2339 );
2340
2341 $aoh will be:
2342
2343 [ { foo => 1,
2344 bar => 2,
2345 }
2346 { foo => 2,
2347 bar => 2,
2348 }
2349 ]
2350
2351 csv
2352 The function "csv" can also be called as a method or with an
2353 existing Text::CSV_PP object. This can help if the function is to
2354 be invoked many times: passing an existing instance avoids the
2355 overhead of creating the object internally over and over again on
2356 every call.
2357
2358 my $csv = Text::CSV_PP->new ({ binary => 1, auto_diag => 1 });
2359
2360 my $aoa = $csv->csv (in => $fh);
2361 my $aoa = csv (in => $fh, csv => $csv);
2362
2363 both act the same. Running this 20000 times on a 20-line CSV file
2364 showed a 53% speedup.
2365
2366DIAGNOSTICS
2367 This section is also taken from Text::CSV_XS.
2368
2369 Still under construction ...
2370
2371 If an error occurs, "$csv->error_diag" can be used to get information
2372 on the cause of the failure. Note that for speed reasons the internal
2373 value is never cleared on success, so using the value returned by
2374 "error_diag" in normal cases - when no error occurred - may cause
2375 unexpected results.
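
A minimal sketch of inspecting the error after a failed parse:

    my $csv = Text::CSV_PP->new ();
    $csv->parse (q{1,"two,3}) or do {   # unterminated quoted field
        my ($err, $msg, $pos) = $csv->error_diag;
        warn "parse failed: $err - $msg (near position $pos)\n";
        };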
2376
2377 If the constructor failed, the cause can be found using "error_diag" as
2378 a class method, like "Text::CSV_PP->error_diag".
2379
2380 The "$csv->error_diag" method is automatically invoked upon error when
2381 the constructor was called with "auto_diag" set to 1 or 2, or when
2382 autodie is in effect. When set to 1, this will cause a "warn" with the
2383 error message, when set to 2, it will "die". "2012 - EOF" is excluded
2384 from "auto_diag" reports.
2385
2386 Errors can be (individually) caught using the "error" callback.
2387
2388 The errors as described below are available. I have tried to make the
2389 error itself explanatory enough, but more descriptions will be added.
2390 For most of these errors, the first three capitals describe the error
2391 category:
2392
2393 • INI
2394
2395 Initialization error or option conflict.
2396
2397 • ECR
2398
2399 Carriage-Return related parse error.
2400
2401 • EOF
2402
2403 End-Of-File related parse error.
2404
2405 • EIQ
2406
2407 Parse error inside quotation.
2408
2409 • EIF
2410
2411 Parse error inside field.
2412
2413 • ECB
2414
2415 Combine error.
2416
2417 • EHR
2418
2419 HashRef parse related error.
2420
2421 And below should be the complete list of error codes that can be
2422 returned:
2423
2424 • 1001 "INI - sep_char is equal to quote_char or escape_char"
2425
2426 The separation character cannot be equal to the quotation
2427 character or to the escape character, as this would invalidate all
2428 parsing rules.
2429
2430 • 1002 "INI - allow_whitespace with escape_char or quote_char SP or
2431 TAB"
2432
2433 Using the "allow_whitespace" attribute when either "quote_char" or
2434 "escape_char" is equal to "SPACE" or "TAB" is too ambiguous to
2435 allow.
2436
2437 • 1003 "INI - \r or \n in main attr not allowed"
2438
2439 Using default "eol" characters in either "sep_char", "quote_char",
2440 or "escape_char" is not allowed.
2441
2442 • 1004 "INI - callbacks should be undef or a hashref"
2443
2444 The "callbacks" attribute only allows one to be "undef" or a hash
2445 reference.
2446
2447 • 1005 "INI - EOL too long"
2448
2449 The value passed for EOL exceeds its maximum length (16).
2450
2451 • 1006 "INI - SEP too long"
2452
2453 The value passed for SEP exceeds its maximum length (16).
2454
2455 • 1007 "INI - QUOTE too long"
2456
2457 The value passed for QUOTE exceeds its maximum length (16).
2458
2459 • 1008 "INI - SEP undefined"
2460
2461 The value passed for SEP should be defined and not empty.
2462
2463 • 1010 "INI - the header is empty"
2464
2465 The header line parsed in the "header" is empty.
2466
2467 • 1011 "INI - the header contains more than one valid separator"
2468
2469 The header line parsed in the "header" contains more than one
2470 (unique) separator character out of the allowed set of separators.
2471
2472 • 1012 "INI - the header contains an empty field"
2473
2474 The header line parsed in the "header" contains an empty field.
2475
2476 • 1013 "INI - the header contains non-unique fields"
2477
2478 The header line parsed in the "header" contains at least two
2479 identical fields.
2480
2481 • 1014 "INI - header called on undefined stream"
2482
2483 The header line cannot be parsed from an undefined source.
2484
2485 • 1500 "PRM - Invalid/unsupported argument(s)"
2486
2487 Function or method called with invalid argument(s) or parameter(s).
2488
2489 • 1501 "PRM - The key attribute is passed as an unsupported type"
2490
2491 The "key" attribute is of an unsupported type.
2492
2493 • 1502 "PRM - The value attribute is passed without the key attribute"
2494
2495 The "value" attribute is only allowed when a valid key is given.
2496
2497 • 1503 "PRM - The value attribute is passed as an unsupported type"
2498
2499 The "value" attribute is of an unsupported type.
2500
2501 • 2010 "ECR - QUO char inside quotes followed by CR not part of EOL"
2502
2503 When "eol" has been set to anything but the default, like
2504 "\r\t\n", and the "\r" is following the second (closing)
2505 "quote_char", where the characters following the "\r" do not make up
2506 the "eol" sequence, this is an error.
2507
2508 • 2011 "ECR - Characters after end of quoted field"
2509
2510 Sequences like "1,foo,"bar"baz,22,1" are not allowed. "bar" is a
2511 quoted field and after the closing double-quote, there should be
2512 either a new-line sequence or a separation character.
2513
2514 • 2012 "EOF - End of data in parsing input stream"
2515
2516 Self-explaining. End-of-file while inside parsing a stream. Can
2517 happen only when reading from streams with "getline", as using
2518 "parse" is done on strings that are not required to have a trailing
2519 "eol".
2520
2521 • 2013 "INI - Specification error for fragments RFC7111"
2522
2523 Invalid specification for the URI "fragment".
2524
2525 • 2014 "ENF - Inconsistent number of fields"
2526
2527 Inconsistent number of fields under strict parsing.
2528
2529 • 2021 "EIQ - NL char inside quotes, binary off"
2530
2531 Sequences like "1,"foo\nbar",22,1" are allowed only when the binary
2532 option has been selected with the constructor.
2533
2534 • 2022 "EIQ - CR char inside quotes, binary off"
2535
2536 Sequences like "1,"foo\rbar",22,1" are allowed only when the binary
2537 option has been selected with the constructor.
2538
2539 • 2023 "EIQ - QUO character not allowed"
2540
2541 Sequences like ""foo "bar" baz",qu" and "2023,",2008-04-05,"Foo,
2542 Bar",\n" will cause this error.
2543
2544 • 2024 "EIQ - EOF cannot be escaped, not even inside quotes"
2545
2546 The escape character is not allowed as last character in an input
2547 stream.
2548
2549 • 2025 "EIQ - Loose unescaped escape"
2550
2551 An escape character should escape only characters that need escaping.
2552
2553 Allowing the escape for other characters is possible with the
2554 attribute "allow_loose_escapes".
2555
2556 • 2026 "EIQ - Binary character inside quoted field, binary off"
2557
2558 Binary characters are not allowed by default. Exceptions are
2559 fields that contain valid UTF-8, which will automatically be
2560 upgraded. Set "binary" to 1 to accept binary
2561 data.
2562
2563 • 2027 "EIQ - Quoted field not terminated"
2564
2565 When parsing a field that started with a quotation character, the
2566 field is expected to be closed with a quotation character. When the
2567 parsed line is exhausted before the quote is found, that field is not
2568 terminated.
2569
2570 • 2030 "EIF - NL char inside unquoted verbatim, binary off"
2571
2572 • 2031 "EIF - CR char is first char of field, not part of EOL"
2573
2574 • 2032 "EIF - CR char inside unquoted, not part of EOL"
2575
2576 • 2034 "EIF - Loose unescaped quote"
2577
2578 • 2035 "EIF - Escaped EOF in unquoted field"
2579
2580 • 2036 "EIF - ESC error"
2581
2582 • 2037 "EIF - Binary character in unquoted field, binary off"
2583
2584 • 2110 "ECB - Binary character in Combine, binary off"
2585
2586 • 2200 "EIO - print to IO failed. See errno"
2587
2588 • 3001 "EHR - Unsupported syntax for column_names ()"
2589
2590 • 3002 "EHR - getline_hr () called before column_names ()"
2591
2592 • 3003 "EHR - bind_columns () and column_names () fields count
2593 mismatch"
2594
2595 • 3004 "EHR - bind_columns () only accepts refs to scalars"
2596
2597 • 3006 "EHR - bind_columns () did not pass enough refs for parsed
2598 fields"
2599
2600 • 3007 "EHR - bind_columns needs refs to writable scalars"
2601
2602 • 3008 "EHR - unexpected error in bound fields"
2603
2604 • 3009 "EHR - print_hr () called before column_names ()"
2605
2606 • 3010 "EHR - print_hr () called with invalid arguments"
2607
2608SEE ALSO
2609 Text::CSV_XS, Text::CSV
2610
2611 Older versions took many regexp from
2612 <http://www.din.or.jp/~ohzaki/perl.htm>
2613
2614AUTHOR
2615 Kenichi Ishigaki, <ishigaki[at]cpan.org> and Makamaka Hannyaharamitu,
2616 <makamaka[at]cpan.org>
2617
2618 Text::CSV_XS was written by <joe[at]ispsoft.de> and maintained by
2619 <h.m.brand[at]xs4all.nl>.
2620
2621 Text::CSV was written by <alan[at]mfgrtl.com>.
2622
2623COPYRIGHT AND LICENSE
2624 Copyright 2017- by Kenichi Ishigaki, <ishigaki[at]cpan.org>. Copyright
2625 2005-2015 by Makamaka Hannyaharamitu, <makamaka[at]cpan.org>
2626
2627 Most of the code and doc is directly taken from the pure perl part of
2628 Text::CSV_XS.
2629
2630 Copyright (C) 2007-2016 H.Merijn Brand. All rights reserved.
2631 Copyright (C) 1998-2001 Jochen Wiedmann. All rights reserved.
2632 Copyright (C) 1997 Alan Citterman. All rights reserved.
2633
2634 This library is free software; you can redistribute it and/or modify it
2635 under the same terms as Perl itself.
2636
2637
2638
2639perl v5.38.0 2023-07-21 Text::CSV_PP(3)