Text::CSV(3)          User Contributed Perl Documentation          Text::CSV(3)

NAME
Text::CSV - comma-separated values manipulator (using XS or PurePerl)
7
SYNOPSIS
9 This section is taken from Text::CSV_XS.
10
    # Functional interface
    use Text::CSV qw( csv );

    # Read whole file in memory
    my $aoa = csv (in => "data.csv");    # as array of array
    my $aoh = csv (in => "data.csv",
                   headers => "auto");   # as array of hash

    # Write array of arrays as csv file
    csv (in => $aoa, out => "file.csv", sep_char => ";");

    # Only show lines where "code" is odd
    csv (in => "data.csv", filter => { code => sub { $_ % 2 }});

    # Object interface
    use Text::CSV;

    my @rows;
    # Read/parse CSV
    my $csv = Text::CSV->new ({ binary => 1, auto_diag => 1 });
    open my $fh, "<:encoding(utf8)", "test.csv" or die "test.csv: $!";
    while (my $row = $csv->getline ($fh)) {
        $row->[2] =~ m/pattern/ or next; # 3rd field should match
        push @rows, $row;
        }
    close $fh;

    # and write as CSV
    open $fh, ">:encoding(utf8)", "new.csv" or die "new.csv: $!";
    $csv->say ($fh, $_) for @rows;
    close $fh or die "new.csv: $!";
42
DESCRIPTION
44 Text::CSV is a thin wrapper for Text::CSV_XS-compatible modules now.
45 All the backend modules provide facilities for the composition and
46 decomposition of comma-separated values. Text::CSV uses Text::CSV_XS by
47 default, and when Text::CSV_XS is not available, falls back on
48 Text::CSV_PP, which is bundled in the same distribution as this module.
49
CHOOSING BACKEND
This module respects an environment variable called "PERL_TEXT_CSV"
when deciding which backend module to use. If this environment variable
is not set, it tries to load Text::CSV_XS, and if Text::CSV_XS is not
available, it falls back on Text::CSV_PP.
55
If you never want it to fall back on Text::CSV_PP, set the
variable like this ("export" may be "setenv", "set", and the like,
depending on your environment):
59
60 > export PERL_TEXT_CSV=Text::CSV_XS
61
62 If you prefer Text::CSV_XS to Text::CSV_PP (default), then:
63
64 > export PERL_TEXT_CSV=Text::CSV_XS,Text::CSV_PP
65
66 You may also want to set this variable at the top of your test files,
67 in order not to be bothered with incompatibilities between backends
68 (you need to wrap this in "BEGIN", and set before actually "use"-ing
69 Text::CSV module, as it decides its backend as soon as it's loaded):
70
71 BEGIN { $ENV{PERL_TEXT_CSV}='Text::CSV_PP'; }
72 use Text::CSV;
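
To confirm which backend was actually selected, the choice can be
inspected at run time. The sketch below assumes the "backend" class
method, which Text::CSV provides but which is not described in this part
of the manual. It forces the pure-Perl backend and prints the name of
the module in use.

```perl
use strict;
use warnings;

# Force the pure-Perl backend before Text::CSV is loaded.
BEGIN { $ENV{PERL_TEXT_CSV} = 'Text::CSV_PP'; }
use Text::CSV;

# "backend" is assumed to report the module that was selected at load time.
my $backend = Text::CSV->backend;
print "Backend in use: $backend\n";
```

Printing the backend at the top of a test file makes it easier to tell
backend-specific failures apart.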
73
NOTES
75 This section is also taken from Text::CSV_XS.
76
77 Embedded newlines
78 Important Note: The default behavior is to accept only ASCII
79 characters in the range from 0x20 (space) to 0x7E (tilde). This means
80 that the fields can not contain newlines. If your data contains
81 newlines embedded in fields, or characters above 0x7E (tilde), or
82 binary data, you must set "binary => 1" in the call to "new". To cover
83 the widest range of parsing options, you will always want to set
84 binary.
85
But you still have the problem that you have to pass a correct line to
the "parse" method, which is more complicated than it looks in typical
usage:
89
90 my $csv = Text::CSV->new ({ binary => 1, eol => $/ });
91 while (<>) { # WRONG!
92 $csv->parse ($_);
93 my @fields = $csv->fields ();
94 }
95
This will break, as the "while" might read broken lines: it does not
care about the quoting. If you need to support embedded newlines, the
way to go is to not pass "eol" in the parser (it accepts "\n", "\r",
and "\r\n" by default) and then use "getline":
100
101 my $csv = Text::CSV->new ({ binary => 1 });
102 open my $fh, "<", $file or die "$file: $!";
103 while (my $row = $csv->getline ($fh)) {
104 my @fields = @$row;
105 }
106
107 The old(er) way of using global file handles is still supported
108
109 while (my $row = $csv->getline (*ARGV)) { ... }
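
The difference between splitting on line endings yourself and letting
"getline" read records can be checked with a small in-memory example.
This is only a sketch: the sample data and the variable names are made
up for illustration.

```perl
use strict;
use warnings;
use Text::CSV;

# Two records; the second field of the first record spans two lines.
my $data = qq{1,"line one\nline two",3\n4,five,6\n};

# Naive line splitting ignores quoting and cuts the quoted field in half.
my @naive = split /\n/, $data;

# getline tracks quoting, so it returns the two real records.
open my $fh, "<", \$data or die "in-memory handle: $!";
my $csv = Text::CSV->new ({ binary => 1, auto_diag => 1 });
my @rows;
while (my $row = $csv->getline ($fh)) {
    push @rows, $row;
}
close $fh;

printf "naive lines: %d, CSV records: %d\n", scalar @naive, scalar @rows;
```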
110
111 Unicode
112 Unicode is only tested to work with perl-5.8.2 and up.
113
114 See also "BOM".
115
116 The simplest way to ensure the correct encoding is used for in- and
117 output is by either setting layers on the filehandles, or setting the
118 "encoding" argument for "csv".
119
120 open my $fh, "<:encoding(UTF-8)", "in.csv" or die "in.csv: $!";
121 or
122 my $aoa = csv (in => "in.csv", encoding => "UTF-8");
123
124 open my $fh, ">:encoding(UTF-8)", "out.csv" or die "out.csv: $!";
125 or
126 csv (in => $aoa, out => "out.csv", encoding => "UTF-8");
127
On parsing (both for "getline" and "parse"), if the source is marked
as being UTF8, then all fields that are marked binary will also be
marked UTF8.

On combining ("print" and "combine"): if any of the combining fields
was marked UTF8, the resulting string will be marked as UTF8. Note
however that any fields before the first field marked UTF8 that
contain 8-bit characters not upgraded to UTF8 will remain "bytes" in
the resulting string too, possibly causing unexpected errors. If you
pass data of different encodings, or you don't know whether the
encodings differ, force the data to be upgraded before you pass it
on:
140
141 $csv->print ($fh, [ map { utf8::upgrade (my $x = $_); $x } @data ]);
142
143 For complete control over encoding, please use Text::CSV::Encoded:
144
145 use Text::CSV::Encoded;
146 my $csv = Text::CSV::Encoded->new ({
147 encoding_in => "iso-8859-1", # the encoding comes into Perl
148 encoding_out => "cp1252", # the encoding comes out of Perl
149 });
150
151 $csv = Text::CSV::Encoded->new ({ encoding => "utf8" });
152 # combine () and print () accept *literally* utf8 encoded data
153 # parse () and getline () return *literally* utf8 encoded data
154
155 $csv = Text::CSV::Encoded->new ({ encoding => undef }); # default
156 # combine () and print () accept UTF8 marked data
157 # parse () and getline () return UTF8 marked data
158
159 BOM
160 BOM (or Byte Order Mark) handling is available only inside the
161 "header" method. This method supports the following encodings:
162 "utf-8", "utf-1", "utf-32be", "utf-32le", "utf-16be", "utf-16le",
163 "utf-ebcdic", "scsu", "bocu-1", and "gb-18030". See Wikipedia
164 <https://en.wikipedia.org/wiki/Byte_order_mark>.
165
166 If a file has a BOM, the easiest way to deal with that is
167
168 my $aoh = csv (in => $file, detect_bom => 1);
169
170 All records will be encoded based on the detected BOM.
171
172 This implies a call to the "header" method, which defaults to also
173 set the "column_names". So this is not the same as
174
175 my $aoh = csv (in => $file, headers => "auto");
176
which only reads the first record to set "column_names" but ignores
the meaning of any BOM that may be present.
179
METHODS
181 This section is also taken from Text::CSV_XS.
182
183 version
184 (Class method) Returns the current module version.
185
186 new
187 (Class method) Returns a new instance of class Text::CSV. The
188 attributes are described by the (optional) hash ref "\%attr".
189
190 my $csv = Text::CSV->new ({ attributes ... });
191
192 The following attributes are available:
193
194 eol
195
196 my $csv = Text::CSV->new ({ eol => $/ });
197 $csv->eol (undef);
198 my $eol = $csv->eol;
199
200 The end-of-line string to add to rows for "print" or the record
201 separator for "getline".
202
When not passed in a parser instance, the default behavior is to
accept "\n", "\r", and "\r\n", so it is probably safer to not specify
"eol" at all. Passing "undef" or the empty string has the same effect.
206
207 When not passed in a generating instance, records are not terminated
208 at all, so it is probably wise to pass something you expect. A safe
209 choice for "eol" on output is either $/ or "\r\n".
210
211 Common values for "eol" are "\012" ("\n" or Line Feed), "\015\012"
212 ("\r\n" or Carriage Return, Line Feed), and "\015" ("\r" or Carriage
213 Return). The "eol" attribute cannot exceed 7 (ASCII) characters.
214
If both $/ and "eol" equal "\015", lines that end on only a Carriage
Return without Line Feed will be parsed correctly.
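
The effect of "eol" on generated records can be seen with "combine" and
"string". The following is a minimal sketch with made-up field values:
one object without "eol" and one with "eol" set to "\r\n".

```perl
use strict;
use warnings;
use Text::CSV;

# Without eol the generated record is not terminated at all.
my $bare = Text::CSV->new ();
$bare->combine ("a", "b c", 3) or die "combine failed";
my $unterminated = $bare->string;

# With eol set, the terminator is appended to every generated record.
my $crlf = Text::CSV->new ({ eol => "\r\n" });
$crlf->combine ("a", "b c", 3) or die "combine failed";
my $terminated = $crlf->string;

print "without eol: [$unterminated]\n";
print "with eol ends in CR LF\n" if $terminated =~ /\r\n\z/;
```

The field "b c" is quoted in both cases because "quote_space" is on by
default.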
217
218 sep_char
219
220 my $csv = Text::CSV->new ({ sep_char => ";" });
221 $csv->sep_char (";");
222 my $c = $csv->sep_char;
223
The character used to separate fields, by default a comma (","). Limited
to a single-byte character, usually in the range from 0x20 (space) to
0x7E (tilde). When longer sequences are required, use "sep".
227
228 The separation character can not be equal to the quote character or to
229 the escape character.
230
231 sep
232
233 my $csv = Text::CSV->new ({ sep => "\N{FULLWIDTH COMMA}" });
234 $csv->sep (";");
235 my $sep = $csv->sep;
236
237 The chars used to separate fields, by default undefined. Limited to 8
238 bytes.
239
240 When set, overrules "sep_char". If its length is one byte it acts as
241 an alias to "sep_char".
242
243 quote_char
244
245 my $csv = Text::CSV->new ({ quote_char => "'" });
246 $csv->quote_char (undef);
247 my $c = $csv->quote_char;
248
249 The character to quote fields containing blanks or binary data, by
250 default the double quote character ("""). A value of undef suppresses
251 quote chars (for simple cases only). Limited to a single-byte
252 character, usually in the range from 0x20 (space) to 0x7E (tilde).
253 When longer sequences are required, use "quote".
254
255 "quote_char" can not be equal to "sep_char".
256
257 quote
258
259 my $csv = Text::CSV->new ({ quote => "\N{FULLWIDTH QUOTATION MARK}" });
260 $csv->quote ("'");
261 my $quote = $csv->quote;
262
263 The chars used to quote fields, by default undefined. Limited to 8
264 bytes.
265
266 When set, overrules "quote_char". If its length is one byte it acts as
267 an alias to "quote_char".
268
269 This method does not support "undef". Use "quote_char" to disable
270 quotation.
271
272 escape_char
273
274 my $csv = Text::CSV->new ({ escape_char => "\\" });
275 $csv->escape_char (":");
276 my $c = $csv->escape_char;
277
278 The character to escape certain characters inside quoted fields.
279 This is limited to a single-byte character, usually in the range
280 from 0x20 (space) to 0x7E (tilde).
281
282 The "escape_char" defaults to being the double-quote mark ("""). In
283 other words the same as the default "quote_char". This means that
284 doubling the quote mark in a field escapes it:
285
286 "foo","bar","Escape ""quote mark"" with two ""quote marks""","baz"
287
288 If you change the "quote_char" without changing the
289 "escape_char", the "escape_char" will still be the double-quote
290 ("""). If instead you want to escape the "quote_char" by doubling it
291 you will need to also change the "escape_char" to be the same as what
292 you have changed the "quote_char" to.
293
Setting "escape_char" to "undef" or "" will disable escaping completely
and is greatly discouraged. This will also disable "escape_null".
296
297 The escape character can not be equal to the separation character.
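
As a sketch of the advice above, the example below changes both
"quote_char" and "escape_char" to a single quote, so that an embedded
quote is escaped by doubling it. The field values are invented for
illustration.

```perl
use strict;
use warnings;
use Text::CSV;

# Use the same character for quoting and escaping.
my $csv = Text::CSV->new ({ quote_char => "'", escape_char => "'" });

# The embedded quote forces quotation and is escaped by doubling.
$csv->combine ("it's", "plain") or die "combine failed";
my $line = $csv->string;
print "$line\n";

# Parsing the generated line gives back the original fields.
$csv->parse ($line) or die "parse failed";
my @fields = $csv->fields;
```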
298
299 binary
300
301 my $csv = Text::CSV->new ({ binary => 1 });
302 $csv->binary (0);
303 my $f = $csv->binary;
304
305 If this attribute is 1, you may use binary characters in quoted
306 fields, including line feeds, carriage returns and "NULL" bytes. (The
307 latter could be escaped as ""0".) By default this feature is off.
308
309 If a string is marked UTF8, "binary" will be turned on automatically
310 when binary characters other than "CR" and "NL" are encountered. Note
311 that a simple string like "\x{00a0}" might still be binary, but not
312 marked UTF8, so setting "{ binary => 1 }" is still a wise option.
313
314 strict
315
316 my $csv = Text::CSV->new ({ strict => 1 });
317 $csv->strict (0);
318 my $f = $csv->strict;
319
320 If this attribute is set to 1, any row that parses to a different
321 number of fields than the previous row will cause the parser to throw
322 error 2014.
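
A minimal sketch of "strict" in action, using made-up in-memory data in
which the second row has fewer fields than the first. The numeric value
of "error_diag" in scalar context is the error code.

```perl
use strict;
use warnings;
use Text::CSV;

# Three fields in the first row, only two in the second.
my $data = "a,b,c\n1,2\n";
open my $fh, "<", \$data or die "in-memory handle: $!";

my $csv = Text::CSV->new ({ strict => 1 });
my $first  = $csv->getline ($fh);    # parsed normally
my $second = $csv->getline ($fh);    # rejected because the field count differs
my $code   = 0 + $csv->error_diag;   # numeric error code of the failure
close $fh;

printf "first row has %d fields, second row %s (error %d)\n",
    scalar @$first, defined $second ? "parsed" : "failed", $code;
```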
323
324 skip_empty_rows
325
326 my $csv = Text::CSV->new ({ skip_empty_rows => 1 });
327 $csv->skip_empty_rows (0);
328 my $f = $csv->skip_empty_rows;
329
330 If this attribute is set to 1, any row that has an "eol" immediately
331 following the start of line will be skipped. Default behavior is to
332 return one single empty field.
333
334 This attribute is only used in parsing.
335
336 formula_handling
337
338 formula
339
340 my $csv = Text::CSV->new ({ formula => "none" });
341 $csv->formula ("none");
342 my $f = $csv->formula;
343
344 This defines the behavior of fields containing formulas. As formulas
345 are considered dangerous in spreadsheets, this attribute can define an
346 optional action to be taken if a field starts with an equal sign ("=").
347
348 For purpose of code-readability, this can also be written as
349
350 my $csv = Text::CSV->new ({ formula_handling => "none" });
351 $csv->formula_handling ("none");
352 my $f = $csv->formula_handling;
353
354 Possible values for this attribute are
355
356 none
357 Take no specific action. This is the default.
358
359 $csv->formula ("none");
360
361 die
362 Cause the process to "die" whenever a leading "=" is encountered.
363
364 $csv->formula ("die");
365
366 croak
367 Cause the process to "croak" whenever a leading "=" is encountered.
368 (See Carp)
369
370 $csv->formula ("croak");
371
372 diag
373 Report position and content of the field whenever a leading "=" is
374 found. The value of the field is unchanged.
375
376 $csv->formula ("diag");
377
378 empty
379 Replace the content of fields that start with a "=" with the empty
380 string.
381
382 $csv->formula ("empty");
383 $csv->formula ("");
384
385 undef
386 Replace the content of fields that start with a "=" with "undef".
387
388 $csv->formula ("undef");
389 $csv->formula (undef);
390
391 a callback
392 Modify the content of fields that start with a "=" with the return-
393 value of the callback. The original content of the field is
394 available inside the callback as $_;
395
    # Replace all formulas with 42
397 $csv->formula (sub { 42; });
398
399 # same as $csv->formula ("empty") but slower
400 $csv->formula (sub { "" });
401
402 # Allow =4+12
403 $csv->formula (sub { s/^=(\d+\+\d+)$/$1/eer });
404
405 # Allow more complex calculations
406 $csv->formula (sub { eval { s{^=([-+*/0-9()]+)$}{$1}ee }; $_ });
407
All other values will give a warning and then fall back to "diag".
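
As an illustration of one of these values, the sketch below parses a
line with a field that starts with "=" while "formula" is set to
"empty". The input line is made up; only the second field is expected
to be affected.

```perl
use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new ({ formula => "empty" });

# The second field starts with "=" and is treated as a formula.
$csv->parse ("1,=2+3,x") or die "parse failed";
my @fields = $csv->fields;

print join ("|", @fields), "\n";
```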
409
410 decode_utf8
411
412 my $csv = Text::CSV->new ({ decode_utf8 => 1 });
413 $csv->decode_utf8 (0);
414 my $f = $csv->decode_utf8;
415
This attribute defaults to TRUE.
417
418 While parsing, fields that are valid UTF-8, are automatically set to
419 be UTF-8, so that
420
421 $csv->parse ("\xC4\xA8\n");
422
423 results in
424
425 PV("\304\250"\0) [UTF8 "\x{128}"]
426
427 Sometimes it might not be a desired action. To prevent those upgrades,
428 set this attribute to false, and the result will be
429
430 PV("\304\250"\0)
431
432 auto_diag
433
434 my $csv = Text::CSV->new ({ auto_diag => 1 });
435 $csv->auto_diag (2);
436 my $l = $csv->auto_diag;
437
Setting this attribute to a number between 1 and 9 causes "error_diag"
to be automatically called in void context upon errors.
440
441 In case of error "2012 - EOF", this call will be void.
442
443 If "auto_diag" is set to a numeric value greater than 1, it will "die"
444 on errors instead of "warn". If set to anything unrecognized, it will
445 be silently ignored.
446
Future extensions to this feature will include more reliable
auto-detection of "autodie" being active in the scope in which the error
occurred, which will increment the value of "auto_diag" by 1 the
moment the error is detected.
451
452 diag_verbose
453
454 my $csv = Text::CSV->new ({ diag_verbose => 1 });
455 $csv->diag_verbose (2);
456 my $l = $csv->diag_verbose;
457
458 Set the verbosity of the output triggered by "auto_diag". Currently
459 only adds the current input-record-number (if known) to the
460 diagnostic output with an indication of the position of the error.
461
462 blank_is_undef
463
464 my $csv = Text::CSV->new ({ blank_is_undef => 1 });
465 $csv->blank_is_undef (0);
466 my $f = $csv->blank_is_undef;
467
468 Under normal circumstances, "CSV" data makes no distinction between
469 quoted- and unquoted empty fields. These both end up in an empty
470 string field once read, thus
471
472 1,"",," ",2
473
474 is read as
475
476 ("1", "", "", " ", "2")
477
478 When writing "CSV" files with either "always_quote" or "quote_empty"
479 set, the unquoted empty field is the result of an undefined value.
480 To enable this distinction when reading "CSV" data, the
481 "blank_is_undef" attribute will cause unquoted empty fields to be set
482 to "undef", causing the above to be parsed as
483
484 ("1", "", undef, " ", "2")
485
486 Note that this is specifically important when loading "CSV" fields
487 into a database that allows "NULL" values, as the perl equivalent for
488 "NULL" is "undef" in DBI land.
489
490 empty_is_undef
491
492 my $csv = Text::CSV->new ({ empty_is_undef => 1 });
493 $csv->empty_is_undef (0);
494 my $f = $csv->empty_is_undef;
495
496 Going one step further than "blank_is_undef", this attribute
497 converts all empty fields to "undef", so
498
499 1,"",," ",2
500
501 is read as
502
503 (1, undef, undef, " ", 2)
504
505 Note that this affects only fields that are originally empty, not
506 fields that are empty after stripping allowed whitespace. YMMV.
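
The difference between "blank_is_undef" and "empty_is_undef" can be
checked by parsing the sample line from above with each attribute in
turn. This is a small sketch; the "%parsed" hash is only there to hold
the results for display.

```perl
use strict;
use warnings;
use Text::CSV;

my $line = q{1,"",," ",2};
my %parsed;

for my $attr (qw( blank_is_undef empty_is_undef )) {
    my $csv = Text::CSV->new ({ $attr => 1 });
    $csv->parse ($line) or die "parse failed";
    # Show undef explicitly so it can be told apart from the empty string.
    $parsed{$attr} = [ map { defined $_ ? qq{"$_"} : "undef" } $csv->fields ];
    print "$attr: @{$parsed{$attr}}\n";
}
```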
507
508 allow_whitespace
509
510 my $csv = Text::CSV->new ({ allow_whitespace => 1 });
511 $csv->allow_whitespace (0);
512 my $f = $csv->allow_whitespace;
513
514 When this option is set to true, the whitespace ("TAB"'s and
515 "SPACE"'s) surrounding the separation character is removed when
516 parsing. If either "TAB" or "SPACE" is one of the three characters
517 "sep_char", "quote_char", or "escape_char" it will not be considered
518 whitespace.
519
520 Now lines like:
521
522 1 , "foo" , bar , 3 , zapp
523
524 are parsed as valid "CSV", even though it violates the "CSV" specs.
525
526 Note that all whitespace is stripped from both start and end of
527 each field. That would make it more than a feature to enable parsing
528 bad "CSV" lines, as
529
530 1, 2.0, 3, ape , monkey
531
532 will now be parsed as
533
534 ("1", "2.0", "3", "ape", "monkey")
535
536 even if the original line was perfectly acceptable "CSV".
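
A short sketch of the behavior described above: the same line is
rejected by a default parser and accepted once "allow_whitespace" is
set. The variable names are chosen for illustration only.

```perl
use strict;
use warnings;
use Text::CSV;

my $line = q{1 , "foo" , bar , 3 , zapp};

# A default parser rejects the whitespace around the quoted field.
my $plain  = Text::CSV->new ();
my $status = $plain->parse ($line);

# With allow_whitespace the surrounding blanks are stripped.
my $loose = Text::CSV->new ({ allow_whitespace => 1 });
$loose->parse ($line) or die "parse failed";
my @fields = $loose->fields;

print "default parser: ", ($status ? "ok" : "failed"), "\n";
print join ("|", @fields), "\n";
```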
537
538 allow_loose_quotes
539
540 my $csv = Text::CSV->new ({ allow_loose_quotes => 1 });
541 $csv->allow_loose_quotes (0);
542 my $f = $csv->allow_loose_quotes;
543
544 By default, parsing unquoted fields containing "quote_char" characters
545 like
546
547 1,foo "bar" baz,42
548
549 would result in parse error 2034. Though it is still bad practice to
550 allow this format, we cannot help the fact that some vendors
551 make their applications spit out lines styled this way.
552
553 If there is really bad "CSV" data, like
554
555 1,"foo "bar" baz",42
556
557 or
558
559 1,""foo bar baz"",42
560
561 there is a way to get this data-line parsed and leave the quotes inside
562 the quoted field as-is. This can be achieved by setting
563 "allow_loose_quotes" AND making sure that the "escape_char" is not
564 equal to "quote_char".
565
566 allow_loose_escapes
567
568 my $csv = Text::CSV->new ({ allow_loose_escapes => 1 });
569 $csv->allow_loose_escapes (0);
570 my $f = $csv->allow_loose_escapes;
571
572 Parsing fields that have "escape_char" characters that escape
573 characters that do not need to be escaped, like:
574
    my $csv = Text::CSV->new ({ escape_char => "\\" });
    $csv->parse (q{1,"my bar\'s",baz,42});
577
578 would result in parse error 2025. Though it is bad practice to allow
579 this format, this attribute enables you to treat all escape character
580 sequences equal.
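
The following sketch shows the error and the effect of the attribute on
the same input. The line is written with q{} so that the backslash
actually reaches the parser; the variable names are illustrative.

```perl
use strict;
use warnings;
use Text::CSV;

# q{} keeps the backslash in front of the single quote.
my $line = q{1,"my bar\'s",baz,42};

my $csv = Text::CSV->new ({ escape_char => "\\" });

# The backslash escapes a character that needs no escaping: parse fails.
my $status = $csv->parse ($line);
my $code   = 0 + $csv->error_diag;

# With allow_loose_escapes the escaped character is accepted as-is.
$csv->allow_loose_escapes (1);
$csv->parse ($line) or die "parse failed";
my @fields = $csv->fields;

print "first attempt failed with error $code\n" unless $status;
print "second field: $fields[1]\n";
```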
581
582 allow_unquoted_escape
583
584 my $csv = Text::CSV->new ({ allow_unquoted_escape => 1 });
585 $csv->allow_unquoted_escape (0);
586 my $f = $csv->allow_unquoted_escape;
587
A backward compatibility issue where "escape_char" differs from
"quote_char" prevents "escape_char" from being in the first position of
a field. If "quote_char" is equal to the default """ and "escape_char"
is set to "\", this would be illegal:
592
593 1,\0,2
594
595 Setting this attribute to 1 might help to overcome issues with
596 backward compatibility and allow this style.
597
598 always_quote
599
600 my $csv = Text::CSV->new ({ always_quote => 1 });
601 $csv->always_quote (0);
602 my $f = $csv->always_quote;
603
604 By default the generated fields are quoted only if they need to be.
605 For example, if they contain the separator character. If you set this
606 attribute to 1 then all defined fields will be quoted. ("undef" fields
607 are not quoted, see "blank_is_undef"). This makes it quite often easier
608 to handle exported data in external applications.
609
610 quote_space
611
612 my $csv = Text::CSV->new ({ quote_space => 1 });
613 $csv->quote_space (0);
614 my $f = $csv->quote_space;
615
By default, a space in a field would trigger quotation. As no rule
exists that requires this in "CSV", nor any for the opposite, the
default is true for safety. You can exclude the space from this
trigger by setting this attribute to 0.
620
621 quote_empty
622
623 my $csv = Text::CSV->new ({ quote_empty => 1 });
624 $csv->quote_empty (0);
625 my $f = $csv->quote_empty;
626
627 By default the generated fields are quoted only if they need to be.
628 An empty (defined) field does not need quotation. If you set this
629 attribute to 1 then empty defined fields will be quoted. ("undef"
630 fields are not quoted, see "blank_is_undef"). See also "always_quote".
631
632 quote_binary
633
634 my $csv = Text::CSV->new ({ quote_binary => 1 });
635 $csv->quote_binary (0);
636 my $f = $csv->quote_binary;
637
638 By default, all "unsafe" bytes inside a string cause the combined
639 field to be quoted. By setting this attribute to 0, you can disable
640 that trigger for bytes >= 0x7F.
641
642 escape_null
643
644 my $csv = Text::CSV->new ({ escape_null => 1 });
645 $csv->escape_null (0);
646 my $f = $csv->escape_null;
647
648 By default, a "NULL" byte in a field would be escaped. This option
649 enables you to treat the "NULL" byte as a simple binary character in
650 binary mode (the "{ binary => 1 }" is set). The default is true. You
651 can prevent "NULL" escapes by setting this attribute to 0.
652
653 When the "escape_char" attribute is set to undefined, this attribute
654 will be set to false.
655
656 The default setting will encode "=\x00=" as
657
658 "="0="
659
With "escape_null" set to 0, this will result in
661
662 "=\x00="
663
664 The default when using the "csv" function is "false".
665
666 For backward compatibility reasons, the deprecated old name
667 "quote_null" is still recognized.
668
669 keep_meta_info
670
671 my $csv = Text::CSV->new ({ keep_meta_info => 1 });
672 $csv->keep_meta_info (0);
673 my $f = $csv->keep_meta_info;
674
675 By default, the parsing of input records is as simple and fast as
676 possible. However, some parsing information - like quotation of the
677 original field - is lost in that process. Setting this flag to true
678 enables retrieving that information after parsing with the methods
679 "meta_info", "is_quoted", and "is_binary" described below. Default is
680 false for performance.
681
If you set this attribute to a value greater than 9, then you can
control output quotation style like it was used in the input of the
last parsed record (unless quotation was added because of other
reasons).
686
    my $csv = Text::CSV->new ({
        binary         => 1,
        keep_meta_info => 1,
        quote_space    => 0,
        });

    # parse returns a status; the parsed fields come from the fields method
    $csv->parse (q{1,,"", ," ",f,"g","h""h",help,"help"}) or die "parse failed";
    my @row = $csv->fields;

    $csv->print (*STDOUT, \@row);
    # 1,,, , ,f,g,"h""h",help,help
    $csv->keep_meta_info (11);
    $csv->print (*STDOUT, \@row);
    # 1,,"", ," ",f,"g","h""h",help,"help"
700
701 undef_str
702
703 my $csv = Text::CSV->new ({ undef_str => "\\N" });
704 $csv->undef_str (undef);
705 my $s = $csv->undef_str;
706
707 This attribute optionally defines the output of undefined fields. The
708 value passed is not changed at all, so if it needs quotation, the
709 quotation needs to be included in the value of the attribute. Use with
710 caution, as passing a value like ",",,,,""" will for sure mess up
711 your output. The default for this attribute is "undef", meaning no
712 special treatment.
713
714 This attribute is useful when exporting CSV data to be imported in
715 custom loaders, like for MySQL, that recognize special sequences for
716 "NULL" data.
717
718 This attribute has no meaning when parsing CSV data.
719
720 comment_str
721
722 my $csv = Text::CSV->new ({ comment_str => "#" });
723 $csv->comment_str (undef);
724 my $s = $csv->comment_str;
725
726 This attribute optionally defines a string to be recognized as comment.
727 If this attribute is defined, all lines starting with this sequence
728 will not be parsed as CSV but skipped as comment.
729
730 This attribute has no meaning when generating CSV.
731
732 Comment strings that start with any of the special characters/sequences
733 are not supported (so it cannot start with any of "sep_char",
734 "quote_char", "escape_char", "sep", "quote", or "eol").
735
736 For convenience, "comment" is an alias for "comment_str".
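
A minimal sketch of "comment_str" while reading: lines that start with
"#" are skipped and only the data records are returned. The in-memory
data is made up for the example.

```perl
use strict;
use warnings;
use Text::CSV;

# Two data records, each preceded by a comment line.
my $data = "# first comment\n1,2\n# second comment\n3,4\n";
open my $fh, "<", \$data or die "in-memory handle: $!";

my $csv  = Text::CSV->new ({ comment_str => "#", auto_diag => 1 });
my $rows = $csv->getline_all ($fh);
close $fh;

print join (",", @$_), "\n" for @$rows;
```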
737
738 verbatim
739
740 my $csv = Text::CSV->new ({ verbatim => 1 });
741 $csv->verbatim (0);
742 my $f = $csv->verbatim;
743
744 This is a quite controversial attribute to set, but makes some hard
745 things possible.
746
747 The rationale behind this attribute is to tell the parser that the
748 normally special characters newline ("NL") and Carriage Return ("CR")
749 will not be special when this flag is set, and be dealt with as being
750 ordinary binary characters. This will ease working with data with
751 embedded newlines.
752
753 When "verbatim" is used with "getline", "getline" auto-"chomp"'s
754 every line.
755
756 Imagine a file format like
757
758 M^^Hans^Janssen^Klas 2\n2A^Ja^11-06-2007#\r\n
759
where the line ending is a very specific "#\r\n", and the sep_char is
a "^" (caret). None of the fields is quoted, but embedded binary
data is likely to be present. With the specific line ending, this
should not be too hard to detect.
764
By default, Text::CSV's parse function only knows "\n" and "\r" as
legal line endings, and so has to treat the embedded newline as a real
"end-of-line", so it can scan the next line if binary is true and the
newline is inside a quoted field. With this option, we tell "parse" to
parse the line as if "\n" is nothing more than a binary character.
771
772 For "parse" this means that the parser has no more idea about line
773 ending and "getline" "chomp"s line endings on reading.
774
775 types
776
777 A set of column types; the attribute is immediately passed to the
778 "types" method.
779
780 callbacks
781
782 See the "Callbacks" section below.
783
784 accessors
785
786 To sum it up,
787
788 $csv = Text::CSV->new ();
789
790 is equivalent to
791
792 $csv = Text::CSV->new ({
793 eol => undef, # \r, \n, or \r\n
794 sep_char => ',',
795 sep => undef,
796 quote_char => '"',
797 quote => undef,
798 escape_char => '"',
799 binary => 0,
800 decode_utf8 => 1,
801 auto_diag => 0,
802 diag_verbose => 0,
803 blank_is_undef => 0,
804 empty_is_undef => 0,
805 allow_whitespace => 0,
806 allow_loose_quotes => 0,
807 allow_loose_escapes => 0,
808 allow_unquoted_escape => 0,
809 always_quote => 0,
810 quote_empty => 0,
811 quote_space => 1,
812 escape_null => 1,
813 quote_binary => 1,
814 keep_meta_info => 0,
815 strict => 0,
816 skip_empty_rows => 0,
817 formula => 0,
818 verbatim => 0,
819 undef_str => undef,
820 comment_str => undef,
821 types => undef,
822 callbacks => undef,
823 });
824
For all of the above mentioned flags, an accessor method is available
where you can inquire the current value, or change the value:
827
828 my $quote = $csv->quote_char;
829 $csv->binary (1);
830
831 It is not wise to change these settings halfway through writing "CSV"
832 data to a stream. If however you want to create a new stream using the
833 available "CSV" object, there is no harm in changing them.
834
If the "new" constructor call fails, it returns "undef", and makes
the failure reason available through the "error_diag" method.
837
838 $csv = Text::CSV->new ({ ecs_char => 1 }) or
839 die "".Text::CSV->error_diag ();
840
841 "error_diag" will return a string like
842
843 "INI - Unknown attribute 'ecs_char'"
844
845 known_attributes
846 @attr = Text::CSV->known_attributes;
847 @attr = Text::CSV::known_attributes;
848 @attr = $csv->known_attributes;
849
850 This method will return an ordered list of all the supported
851 attributes as described above. This can be useful for knowing what
852 attributes are valid in classes that use or extend Text::CSV.
853
854 print
855 $status = $csv->print ($fh, $colref);
856
857 Similar to "combine" + "string" + "print", but much more efficient.
858 It expects an array ref as input (not an array!) and the resulting
859 string is not really created, but immediately written to the $fh
860 object, typically an IO handle or any other object that offers a
861 "print" method.
862
863 For performance reasons "print" does not create a result string, so
864 all "string", "status", "fields", and "error_input" methods will return
865 undefined information after executing this method.
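
As a sketch of basic "print" usage, the example below writes two
records to an in-memory file handle so the result can be inspected. The
data and the choice of "\n" for "eol" are assumptions made for the
example.

```perl
use strict;
use warnings;
use Text::CSV;

# Collect the output in a scalar instead of a file on disk.
my $out = "";
open my $fh, ">", \$out or die "in-memory handle: $!";

my $csv = Text::CSV->new ({ eol => "\n" });

# print takes an array ref; fields are quoted only when needed.
$csv->print ($fh, [ "a", "b,c", 3 ])   or die "print failed";
$csv->print ($fh, [ "d", undef, "" ])  or die "print failed";
close $fh;

print $out;
```

The field containing the separator is quoted; the undefined and the
empty field are both written as empty, unquoted fields.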
866
867 If $colref is "undef" (explicit, not through a variable argument) and
868 "bind_columns" was used to specify fields to be printed, it is
869 possible to make performance improvements, as otherwise data would have
870 to be copied as arguments to the method call:
871
872 $csv->bind_columns (\($foo, $bar));
873 $status = $csv->print ($fh, undef);
874
875 A short benchmark
876
877 my @data = ("aa" .. "zz");
878 $csv->bind_columns (\(@data));
879
880 $csv->print ($fh, [ @data ]); # 11800 recs/sec
881 $csv->print ($fh, \@data ); # 57600 recs/sec
882 $csv->print ($fh, undef ); # 48500 recs/sec
883
884 say
885 $status = $csv->say ($fh, $colref);
886
887 Like "print", but "eol" defaults to "$\".
888
889 print_hr
890 $csv->print_hr ($fh, $ref);
891
892 Provides an easy way to print a $ref (as fetched with "getline_hr")
893 provided the column names are set with "column_names".
894
895 It is just a wrapper method with basic parameter checks over
896
897 $csv->print ($fh, [ map { $ref->{$_} } $csv->column_names ]);
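
A short sketch of "print_hr": the column order comes from
"column_names", not from the hash. The column names and values are
invented for illustration, and the output goes to an in-memory handle.

```perl
use strict;
use warnings;
use Text::CSV;

my $out = "";
open my $fh, ">", \$out or die "in-memory handle: $!";

my $csv = Text::CSV->new ({ eol => "\n" });
$csv->column_names (qw( code name price ));

# The hash keys are looked up in column_names order.
$csv->print_hr ($fh, { name => "Widget", price => "9.95", code => 1 });
close $fh;

print $out;
```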
898
899 combine
900 $status = $csv->combine (@fields);
901
902 This method constructs a "CSV" record from @fields, returning success
903 or failure. Failure can result from lack of arguments or an argument
904 that contains an invalid character. Upon success, "string" can be
905 called to retrieve the resultant "CSV" string. Upon failure, the
906 value returned by "string" is undefined and "error_input" could be
907 called to retrieve the invalid argument.
908
909 string
910 $line = $csv->string ();
911
912 This method returns the input to "parse" or the resultant "CSV"
913 string of "combine", whichever was called more recently.
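
The success and failure paths of "combine" can be sketched as follows.
The example assumes a default object, where "binary" is off, so a field
with an embedded newline counts as an invalid argument.

```perl
use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new ();    # binary is off by default

# Success: string returns the combined record.
my $ok   = $csv->combine ("abc", "def", "ghi");
my $line = $csv->string;

# Failure: the newline is not allowed without binary => 1.
my $bad     = $csv->combine ("abc", "def\n", "ghi");
my $culprit = $csv->error_input;

print "combined: $line\n" if $ok;
print "second combine failed\n" unless $bad;
```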
914
915 getline
916 $colref = $csv->getline ($fh);
917
918 This is the counterpart to "print", as "parse" is the counterpart to
919 "combine": it parses a row from the $fh handle using the "getline"
920 method associated with $fh and parses this row into an array ref.
921 This array ref is returned by the function or "undef" for failure.
922 When $fh does not support "getline", you are likely to hit errors.
923
924 When fields are bound with "bind_columns" the return value is a
925 reference to an empty list.
926
927 The "string", "fields", and "status" methods are meaningless again.
928
929 getline_all
930 $arrayref = $csv->getline_all ($fh);
931 $arrayref = $csv->getline_all ($fh, $offset);
932 $arrayref = $csv->getline_all ($fh, $offset, $length);
933
934 This will return a reference to a list of getline ($fh) results. In
935 this call, "keep_meta_info" is disabled. If $offset is negative, as
936 with "splice", only the last "abs ($offset)" records of $fh are taken
937 into consideration.
938
939 Given a CSV file with 10 lines:
940
941 lines call
942 ----- ---------------------------------------------------------
943 0..9 $csv->getline_all ($fh) # all
944 0..9 $csv->getline_all ($fh, 0) # all
945 8..9 $csv->getline_all ($fh, 8) # start at 8
946 - $csv->getline_all ($fh, 0, 0) # start at 0 first 0 rows
947 0..4 $csv->getline_all ($fh, 0, 5) # start at 0 first 5 rows
948 4..5 $csv->getline_all ($fh, 4, 2) # start at 4 first 2 rows
949 8..9 $csv->getline_all ($fh, -2) # last 2 rows
950 6..7 $csv->getline_all ($fh, -4, 2) # first 2 of last 4 rows
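
The table above can be reproduced with a small sketch. The helper "rows" and the sample data are hypothetical and only serve to show the effect of $offset and $length:

```perl
use strict;
use warnings;
use Text::CSV;

# Ten single-field records: row0 .. row9
my $data = join "", map { "row$_\n" } 0 .. 9;

# Hypothetical helper: reopen the data and return the selected rows as text
sub rows {
    my @range = @_;
    open my $fh, "<", \$data or die "in-memory handle: $!";
    my $csv = Text::CSV->new ({ binary => 1 });
    my $aoa = $csv->getline_all ($fh, @range);
    return join ",", map { $_->[0] } @$aoa;
}

my $first5 = rows (0, 5);    # start at 0, first 5 rows
my $middle = rows (4, 2);    # start at 4, first 2 rows
my $tail   = rows (-4, 2);   # first 2 of the last 4 rows
print "$first5\n$middle\n$tail\n";
```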
951
952 getline_hr
953 The "getline_hr" and "column_names" methods work together to allow you
954 to have rows returned as hashrefs. You must call "column_names" first
955 to declare your column names.
956
957 $csv->column_names (qw( code name price description ));
958 $hr = $csv->getline_hr ($fh);
959 print "Price for $hr->{name} is $hr->{price} EUR\n";
960
961 "getline_hr" will croak if called before "column_names".
962
963 Note that "getline_hr" creates a new hashref for every row, which is
964 much slower than combining "bind_columns" with "getline", an approach
965 that still offers the same easy-to-use hashref inside the loop. For example:
966
967 my @cols = @{$csv->getline ($fh)};
968 $csv->column_names (@cols);
969 while (my $row = $csv->getline_hr ($fh)) {
970 print $row->{price};
971 }
972
973 This loop could easily be rewritten as the much faster:
974
975 my @cols = @{$csv->getline ($fh)};
976 my $row = {};
977 $csv->bind_columns (\@{$row}{@cols});
978 while ($csv->getline ($fh)) {
979 print $row->{price};
980 }
981
982 Your mileage may vary for the size of the data and the number of rows.
983 With perl-5.14.2 the comparison for a 100_000 line file with 14
984 columns:
985
986 Rate hashrefs getlines
987 hashrefs 1.00/s -- -76%
988 getlines 4.15/s 313% --
989
990 getline_hr_all
991 $arrayref = $csv->getline_hr_all ($fh);
992 $arrayref = $csv->getline_hr_all ($fh, $offset);
993 $arrayref = $csv->getline_hr_all ($fh, $offset, $length);
994
995 This will return a reference to a list of getline_hr ($fh) results.
996 In this call, "keep_meta_info" is disabled.
997
998 parse
999 $status = $csv->parse ($line);
1000
1001 This method decomposes a "CSV" string into fields, returning success
1002 or failure. Failure can result from a missing argument or from an
1003 improperly formatted "CSV" string. Upon success, "fields" can be
1004 called to retrieve the decomposed fields. Upon failure, calling "fields"
1005 will return undefined data and "error_input" can be called to
1006 retrieve the invalid argument.
1007
1008 You may use the "types" method for setting column types. See the
1009 description of "types" below.
1010
1011 The $line argument is supposed to be a simple scalar. Everything else
1012 is supposed to croak and set error 1500.
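
A short sketch of "parse" followed by "fields", using a made-up input line that holds a quoted separator and escaped quotes:

```perl
use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new ({ binary => 1 });

my @fields;
if ($csv->parse (q{1,"Smith, John","said ""hi"""})) {
    @fields = $csv->fields;
}
else {
    die "parse failed on: " . $csv->error_input;
}
print "$_\n" for @fields;
```

The quotes around a field are removed and doubled quotes inside a quoted field are collapsed to a single quote.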
1013
1014 fragment
1015 This function tries to implement RFC7111 (URI Fragment Identifiers for
1016 the text/csv Media Type) - http://tools.ietf.org/html/rfc7111
1017
1018 my $AoA = $csv->fragment ($fh, $spec);
1019
1020 In specifications, "*" is used to specify the last item, a dash ("-")
1021 to indicate a range. All indices are 1-based: the first row or
1022 column has index 1. Selections can be combined with the semi-colon
1023 (";").
1024
1025 When using this method in combination with "column_names", the
1026 returned reference will point to a list of hashes instead of a list
1027 of lists. A disjointed cell-based combined selection might return
1028 rows with different number of columns making the use of hashes
1029 unpredictable.
1030
1031 $csv->column_names ("Name", "Age");
1032 my $AoH = $csv->fragment ($fh, "col=3;8");
1033
1034 If the "after_parse" callback is active, it is also called on every
1035 line parsed and skipped before the fragment.
1036
1037 row
1038 row=4
1039 row=5-7
1040 row=6-*
1041 row=1-2;4;6-*
1042
1043 col
1044 col=2
1045 col=1-3
1046 col=4-*
1047 col=1-2;4;7-*
1048
1049 cell
1050 In cell-based selection, the comma (",") is used to pair row and
1051 column
1052
1053 cell=4,1
1054
1055 The range operator ("-") using "cell"s can be used to define top-left
1056 and bottom-right "cell" location
1057
1058 cell=3,1-4,6
1059
1060 The "*" is only allowed in the second part of a pair
1061
1062 cell=3,2-*,2 # row 3 till end, only column 2
1063 cell=3,2-3,* # column 2 till end, only row 3
1064 cell=3,2-*,* # strip row 1 and 2, and column 1
1065
1066 Cells and cell ranges may be combined with ";", possibly resulting in
1067 rows with different numbers of columns
1068
1069 cell=1,1-2,2;3,3-4,4;1,4;4,1
1070
1071 Disjointed selections will only return selected cells. The cells
1072 that are not specified will not be included in the returned
1073 set, not even as "undef". As an example given a "CSV" like
1074
1075 11,12,13,...19
1076 21,22,...28,29
1077 : :
1078 91,...97,98,99
1079
1080 with "cell=1,1-2,2;3,3-4,4;1,4;4,1" will return:
1081
1082 11,12,14
1083 21,22
1084 33,34
1085 41,43,44
1086
1087 Overlapping cell-specs will return those cells only once, so
1088 "cell=1,1-3,3;2,2-4,4;2,3;4,2" will return:
1089
1090 11,12,13
1091 21,22,23,24
1092 31,32,33,34
1093 42,43,44
1094
1095 RFC7111 <http://tools.ietf.org/html/rfc7111> does not allow different
1096 types of specs to be combined (either "row" or "col" or "cell").
1097 Passing an invalid fragment specification will croak and set error
1098 2013.
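
The selections described above can be tried with a sketch like the following. It builds the 9 x 9 grid used in the examples (11..19, 21..29, and so on) in memory; the helper "frag" is hypothetical and only formats the result:

```perl
use strict;
use warnings;
use Text::CSV;

# Build the sample grid: row r, column c holds the value "rc"
my $data = join "", map {
    my $r = $_;
    join (",", map { "$r$_" } 1 .. 9) . "\n";
    } 1 .. 9;

# Hypothetical helper: run one fragment spec against a fresh handle
sub frag {
    my ($spec) = @_;
    open my $fh, "<", \$data or die "in-memory handle: $!";
    my $csv = Text::CSV->new ({ binary => 1 });
    my $aoa = $csv->fragment ($fh, $spec);
    return join " | ", map { join ",", @$_ } @$aoa;
}

my $row  = frag ("row=2");                          # second row only
my $cols = frag ("col=1-2;9");                      # columns 1, 2 and 9
my $cell = frag ("cell=1,1-2,2;3,3-4,4;1,4;4,1");   # disjointed cells
print "$row\n$cols\n$cell\n";
```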
1099
1100 column_names
1101 Set the "keys" that will be used in the "getline_hr" calls. If no
1102 keys (column names) are passed, it will return the current setting as a
1103 list.
1104
1105 "column_names" accepts a list of scalars (the column names) or a
1106 single array_ref, so you can pass the return value from "getline" too:
1107
1108 $csv->column_names ($csv->getline ($fh));
1109
1110 "column_names" does no checking on duplicates at all, which might lead
1111 to unexpected results. Undefined entries will be replaced with the
1112 string "\cAUNDEF\cA", so
1113
1114 $csv->column_names (undef, "", "name", "name");
1115 $hr = $csv->getline_hr ($fh);
1116
1117 will set "$hr->{"\cAUNDEF\cA"}" to the 1st field, "$hr->{""}" to the
1118 2nd field, and "$hr->{name}" to the 4th field, discarding the 3rd
1119 field.
1120
1121 "column_names" croaks on invalid arguments.
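
A minimal sketch, with made-up data, that sets the column names from the first line and then reads them back:

```perl
use strict;
use warnings;
use Text::CSV;

my $data = "code,name,price\n1,pc,850\n";
open my $fh, "<", \$data or die "in-memory handle: $!";

my $csv = Text::CSV->new ({ binary => 1 });
$csv->column_names ($csv->getline ($fh));   # array ref from the header line
my @names = $csv->column_names;             # no arguments: current setting
my $hr    = $csv->getline_hr ($fh);         # first data row as a hashref

print "columns: @names\n";
print "price of $hr->{name}: $hr->{price}\n";
```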
1122
1123 header
1124 This method does NOT work in perl-5.6.x
1125
1126 Parse the CSV header and set "sep", column_names and encoding.
1127
1128 my @hdr = $csv->header ($fh);
1129 $csv->header ($fh, { sep_set => [ ";", ",", "|", "\t" ] });
1130 $csv->header ($fh, { detect_bom => 1, munge_column_names => "lc" });
1131
1132 The first argument should be a file handle.
1133
1134 This method resets some object properties, as it is supposed to be
1135 invoked only once per file or stream. It will leave attributes
1136 "column_names" and "bound_columns" alone if setting column names is
1137 disabled. Reading headers on previously processed objects might fail on
1138 perl-5.8.0 and older.
1139
1140 Assuming that the file opened for parsing has a header, and the header
1141 does not contain problematic characters like embedded newlines, read
1142 the first line from the open handle then auto-detect whether the header
1143 separates the column names with a character from the allowed separator
1144 list.
1145
1146 If any of the allowed separators matches, and none of the other
1147 allowed separators match, set "sep" to that separator for the
1148 current CSV instance and use it to parse the first line, map those to
1149 lowercase, and use that to set the instance "column_names":
1150
1151 my $csv = Text::CSV->new ({ binary => 1, auto_diag => 1 });
1152 open my $fh, "<", "file.csv" or die "file.csv: $!";
1153 binmode $fh; # for Windows
1154 $csv->header ($fh);
1155 while (my $row = $csv->getline_hr ($fh)) {
1156 ...
1157 }
1158
1159 If the header is empty, contains more than one unique separator out of
1160 the allowed set, contains empty fields, or contains identical fields
1161 (after folding), it will croak with error 1010, 1011, 1012, or 1013
1162 respectively.
1163
1164 If the header contains embedded newlines or is not valid CSV in any
1165 other way, this method will croak and leave the parse error untouched.
1166
1167 A successful call to "header" will always set the "sep" of the $csv
1168 object. This behavior cannot be disabled.
1169
1170 return value
1171
1172 On error this method will croak.
1173
1174 In list context, the headers will be returned whether they are used to
1175 set "column_names" or not.
1176
1177 In scalar context, the instance itself is returned. Note: the values
1178 as found in the header will effectively be lost if "set_column_names"
1179 is false.
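
The following sketch shows separator detection and the list-context return value. The sample data is made up and read through an in-memory handle, and the default options ("sep_set" of ";" and ",", lower-cased column names) are assumed:

```perl
use strict;
use warnings;
use Text::CSV;

my $data = "Code;Product Name;Price\n1;pc;850\n2;mouse;5\n";
open my $fh, "<", \$data or die "in-memory handle: $!";

my $csv = Text::CSV->new ({ binary => 1, auto_diag => 1 });
my @hdr = $csv->header ($fh);   # list context: the (munged) header fields
my $sep = $csv->sep;            # separator detected from the header line

my @products;
while (my $row = $csv->getline_hr ($fh)) {
    push @products, $row->{"product name"};
}
print "separator '$sep', columns: ", join (", ", @hdr), "\n";
print "products: @products\n";
```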
1180
1181 Options
1182
1183 sep_set
1184 $csv->header ($fh, { sep_set => [ ";", ",", "|", "\t" ] });
1185
1186 The list of legal separators defaults to "[ ";", "," ]" and can be
1187 changed by this option. As this is probably the most often used
1188 option, it can be passed on its own as an unnamed argument:
1189
1190 $csv->header ($fh, [ ";", ",", "|", "\t", "::", "\x{2063}" ]);
1191
1192 Multi-byte sequences are allowed, both multi-character and
1193 Unicode. See "sep".
1194
1195 detect_bom
1196 $csv->header ($fh, { detect_bom => 1 });
1197
1198 The default behavior is to detect if the header line starts with a
1199 BOM. If the header has a BOM, use that to set the encoding of $fh.
1200 This default behavior can be disabled by passing a false value to
1201 "detect_bom".
1202
1203 Supported encodings from BOM are: UTF-8, UTF-16BE, UTF-16LE,
1204 UTF-32BE, and UTF-32LE. BOM also supports UTF-1, UTF-EBCDIC, SCSU,
1205 BOCU-1, and GB-18030 but Encode does not (yet). UTF-7 is not
1206 supported.
1207
1208 If a supported BOM was detected as start of the stream, it is stored
1209 in the object attribute "ENCODING".
1210
1211 my $enc = $csv->{ENCODING};
1212
1213 The encoding is used with "binmode" on $fh.
1214
1215 If the handle was opened in a (correct) encoding, this method will
1216 not alter the encoding, as it checks the leading bytes of the first
1217 line. In case the stream starts with a decoded BOM ("U+FEFF"),
1218 "{ENCODING}" will be "" (empty) instead of the default "undef".
1219
1220 munge_column_names
1221 This option offers the means to modify the column names into
1222 something that is most useful to the application. The default is to
1223 map all column names to lower case.
1224
1225 $csv->header ($fh, { munge_column_names => "lc" });
1226
1227 The following values are available:
1228
1229 lc - lower case
1230 uc - upper case
1231 db - valid DB field names
1232 none - do not change
1233 \%hash - supply a mapping
1234 \&cb - supply a callback
1235
1236 Lower case
1237 $csv->header ($fh, { munge_column_names => "lc" });
1238
1239 The header is changed to all lower-case
1240
1241 $_ = lc;
1242
1243 Upper case
1244 $csv->header ($fh, { munge_column_names => "uc" });
1245
1246 The header is changed to all upper-case
1247
1248 $_ = uc;
1249
1250 Literal
1251 $csv->header ($fh, { munge_column_names => "none" });
1252
1253 Hash
1254 $csv->header ($fh, { munge_column_names => { foo => "sombrero" }});
1255
1256 If a value does not exist in the hash, the original value is used unchanged.
1257
1258 Database
1259 $csv->header ($fh, { munge_column_names => "db" });
1260
1261 - lower-case
1262
1263 - all sequences of non-word characters are replaced with an
1264 underscore
1265
1266 - all leading underscores are removed
1267
1268 $_ = lc (s/\W+/_/gr =~ s/^_+//r);
1269
1270 Callback
1271 $csv->header ($fh, { munge_column_names => sub { fc } });
1272 $csv->header ($fh, { munge_column_names => sub { "column_".$col++ } });
1273 $csv->header ($fh, { munge_column_names => sub { lc (s/\W+/_/gr) } });
1274
1275 As this callback is called in a "map", you can use $_ directly.
1276
1277 set_column_names
1278 $csv->header ($fh, { set_column_names => 1 });
1279
1280 The default is to set the instance's column names using
1281 "column_names" if the method is successful, so subsequent calls to
1282 "getline_hr" can return a hash. Setting the column names can be
1283 disabled by passing a false value for this option.
1284
1285 As described in "return value" above, content is lost in scalar
1286 context.
1287
1288 Validation
1289
1290 When receiving CSV files from external sources, this method can be
1291 used to protect against changes in the layout by restricting to known
1292 headers (and typos in the header fields).
1293
1294 my %known = (
1295 "record key" => "c_rec",
1296 "rec id" => "c_rec",
1297 "id_rec" => "c_rec",
1298 "kode" => "code",
1299 "code" => "code",
1300 "vaule" => "value",
1301 "value" => "value",
1302 );
1303 my $csv = Text::CSV->new ({ binary => 1, auto_diag => 1 });
1304 open my $fh, "<", $source or die "$source: $!";
1305 $csv->header ($fh, { munge_column_names => sub {
1306 s/\s+$//;
1307 s/^\s+//;
1308 $known{lc $_} or die "Unknown column '$_' in $source";
1309 }});
1310 while (my $row = $csv->getline_hr ($fh)) {
1311 say join "\t", $row->{c_rec}, $row->{code}, $row->{value};
1312 }
1313
1314 bind_columns
1315 Takes a list of scalar references to be used for output with "print"
1316 or to store the fields fetched by "getline". When you do not pass
1317 enough references to store the fetched fields in, "getline" will fail
1318 with error 3006. If you pass more than there are fields to return,
1319 the content of the remaining references is left untouched.
1320
1321 $csv->bind_columns (\$code, \$name, \$price, \$description);
1322 while ($csv->getline ($fh)) {
1323 print "The price of a $name is \x{20ac} $price\n";
1324 }
1325
1326 To reset or clear all column binding, call "bind_columns" with the
1327 single argument "undef". This will also clear column names.
1328
1329 $csv->bind_columns (undef);
1330
1331 If no arguments are passed at all, "bind_columns" will return the list
1332 of current bindings or "undef" if no binds are active.
1333
1334 Note that in parsing with "bind_columns", the fields are set on the
1335 fly. That implies that if the third field of a row causes an error
1336 (or this row has just two fields where the previous row had more), the
1337 first two fields already have been assigned the values of the current
1338 row, while the rest of the fields will still hold the values of the
1339 previous row. If you want the parser to fail in these cases, use the
1340 "strict" attribute.
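
A self-contained sketch of parsing with bound columns, using made-up data and an in-memory handle:

```perl
use strict;
use warnings;
use Text::CSV;

my $data = "1,pc,850\n3,mouse,5\n";
open my $fh, "<", \$data or die "in-memory handle: $!";

my $csv = Text::CSV->new ({ binary => 1 });
my ($code, $name, $price);
$csv->bind_columns (\$code, \$name, \$price);

my @seen;
# getline returns a reference to an empty list here; the data is in the
# bound variables
while ($csv->getline ($fh)) {
    push @seen, "$name=$price";
}
print "@seen\n";
```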
1341
1342 eof
1343 $eof = $csv->eof ();
1344
1345 If "parse" or "getline" was used with an IO stream, this method will
1346 return true (1) if the last call hit end of file, otherwise it will
1347 return false (''). This is useful to see the difference between a
1348 failure and end of file.
1349
1350 Note that if the parsing of the last line caused an error, "eof" is
1351 still true. That means that if you are not using "auto_diag", an idiom
1352 like
1353
1354 while (my $row = $csv->getline ($fh)) {
1355 # ...
1356 }
1357 $csv->eof or $csv->error_diag;
1358
1359 will not report the error. You would have to change that to
1360
1361 while (my $row = $csv->getline ($fh)) {
1362 # ...
1363 }
1364 +$csv->error_diag and $csv->error_diag;
1365
1366 types
1367 $csv->types (\@tref);
1368
1369 This method is used to force (all) columns to be of a given type.
1370 For example, if you have an integer column, two columns with
1371 doubles and a string column, then you might do a
1372
1373 $csv->types ([Text::CSV::IV (),
1374 Text::CSV::NV (),
1375 Text::CSV::NV (),
1376 Text::CSV::PV ()]);
1377
1378 Column types are used only for decoding columns while parsing, in
1379 other words by the "parse" and "getline" methods.
1380
1381 You can unset column types by doing a
1382
1383 $csv->types (undef);
1384
1385 or fetch the current type settings with
1386
1387 $types = $csv->types ();
1388
1389 IV Set field type to integer.
1390
1391 NV Set field type to numeric/float.
1392
1393 PV Set field type to string.
1394
1395 fields
1396 @columns = $csv->fields ();
1397
1398 This method returns the input to "combine" or the resultant
1399 decomposed fields of a successful "parse", whichever was called more
1400 recently.
1401
1402 Note that the return value is undefined after using "getline", which
1403 does not fill the data structures returned by "parse".
1404
1405 meta_info
1406 @flags = $csv->meta_info ();
1407
1408 This method returns the "flags" of the input to "combine" or the flags
1409 of the resultant decomposed fields of "parse", whichever was called
1410 more recently.
1411
1412 For each field, a meta_info field will hold flags that inform
1413 something about the field returned by the "fields" method or
1414 passed to the "combine" method. The flags are bit-wise-"or"'d like:
1415
1416 0x0001
1417 The field was quoted.
1418
1419 0x0002
1420 The field was binary.
1421
1422 See the "is_***" methods below.
1423
1424 is_quoted
1425 my $quoted = $csv->is_quoted ($column_idx);
1426
1427 where $column_idx is the (zero-based) index of the column in the
1428 last result of "parse".
1429
1430 This returns a true value if the data in the indicated column was
1431 enclosed in "quote_char" quotes. This might be important for fields
1432 where content ",20070108," is to be treated as a numeric value, and
1433 where ","20070108"," is explicitly marked as character string data.
1434
1435 This method is only valid when "keep_meta_info" is set to a true value.
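
A small sketch of "is_quoted", assuming "keep_meta_info" is enabled in the constructor and using a made-up input line:

```perl
use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new ({ binary => 1, keep_meta_info => 1 });
$csv->parse (q{20070108,"20070108",abc}) or die "parse failed";

# One flag per field: 1 if the field was quoted in the input, else 0
my @quoted = map { $csv->is_quoted ($_) ? 1 : 0 } 0 .. 2;
print "@quoted\n";
```

Only the second field was enclosed in quotes, so only its flag is set.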
1436
1437 is_binary
1438 my $binary = $csv->is_binary ($column_idx);
1439
1440 where $column_idx is the (zero-based) index of the column in the
1441 last result of "parse".
1442
1443 This returns a true value if the data in the indicated column contained
1444 any byte in the range "[\x00-\x08,\x10-\x1F,\x7F-\xFF]".
1445
1446 This method is only valid when "keep_meta_info" is set to a true value.
1447
1448 is_missing
1449 my $missing = $csv->is_missing ($column_idx);
1450
1451 where $column_idx is the (zero-based) index of the column in the
1452 last result of "getline_hr".
1453
1454 $csv->keep_meta_info (1);
1455 while (my $hr = $csv->getline_hr ($fh)) {
1456 $csv->is_missing (0) and next; # This was an empty line
1457 }
1458
1459 When using "getline_hr", it is impossible to tell if the parsed
1460 fields are "undef" because they were not filled in the "CSV" stream
1461 or because they were not read at all, as all the fields defined by
1462 "column_names" are set in the hash-ref. If you still need to know if
1463 all fields in each row are provided, you should enable "keep_meta_info"
1464 so you can check the flags.
1465
1466 If "keep_meta_info" is "false", "is_missing" will always return
1467 "undef", regardless of $column_idx being valid or not. If this
1468 attribute is "true" it will return either 0 (the field is present) or 1
1469 (the field is missing).
1470
1471 A special case is the empty line. If the line is completely empty -
1472 after dealing with the flags - this is still a valid CSV line: it is a
1473 record of just one single empty field. However, if "keep_meta_info" is
1474 set, invoking "is_missing" with index 0 will now return true.
1475
1476 status
1477 $status = $csv->status ();
1478
1479 This method returns the status of the last invoked "combine" or "parse"
1480 call. Status is success (true: 1) or failure (false: "undef" or 0).
1481
1482 Note that as this only keeps track of the status of above mentioned
1483 methods, you are probably looking for "error_diag" instead.
1484
1485 error_input
1486 $bad_argument = $csv->error_input ();
1487
1488 This method returns the erroneous argument (if it exists) of "combine"
1489 or "parse", whichever was called more recently. If the last
1490 invocation was successful, "error_input" will return "undef".
1491
1492 Depending on the type of error, it might also hold the data for the
1493 last error-input of "getline".
1494
1495 error_diag
1496 Text::CSV->error_diag ();
1497 $csv->error_diag ();
1498 $error_code = 0 + $csv->error_diag ();
1499 $error_str = "" . $csv->error_diag ();
1500 ($cde, $str, $pos, $rec, $fld) = $csv->error_diag ();
1501
1502 If (and only if) an error occurred, this function returns the
1503 diagnostics of that error.
1504
1505 If called in void context, this will print the internal error code and
1506 the associated error message to STDERR.
1507
1508 If called in list context, this will return the error code and the
1509 error message in that order. If the last error was from parsing, the
1510 rest of the values returned are a best guess at the location within
1511 the line that was being parsed. Their values are 1-based. The
1512 position currently is the index of the byte at which the parsing failed in
1513 the current record. It might change to be the index of the current
1514 character in a later release. The record is the index of the record
1515 parsed by the csv instance. The field number is the index of the field
1516 the parser thinks it is currently trying to parse. See
1517 examples/csv-check for how this can be used.
1518
1519 If called in scalar context, it will return the diagnostics in a
1520 single scalar, a-la $!. It will contain the error code in numeric
1521 context, and the diagnostics message in string context.
1522
1523 When called as a class method or a direct function call, the
1524 diagnostics are that of the last "new" call.
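
The calling contexts can be compared with a sketch like this one. The input line is made up and deliberately broken (a quoted field that is never closed), and "auto_diag" is left off so the error can be inspected by hand:

```perl
use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new ({ binary => 1 });   # auto_diag left off on purpose
my $ok  = $csv->parse (q{1,"unterminated});

# List context: code, message, and a best guess at the location
my ($code, $msg, $pos) = $csv->error_diag;

# Scalar context: one value that is numeric and string at the same time
my $num = 0  + $csv->error_diag;
my $str = "" . $csv->error_diag;

printf "parse %s: error %d (%s)\n", $ok ? "ok" : "failed", $code, $msg;
```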
1525
1526 record_number
1527 $recno = $csv->record_number ();
1528
1529 Returns the number of records parsed by this csv instance. This value
1530 should be more accurate than $. when embedded newlines come into play. Records
1531 written by this instance are not counted.
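
As an illustration, the made-up data below spans three physical lines but holds only two records, because the first record has an embedded newline:

```perl
use strict;
use warnings;
use Text::CSV;

my $data = qq{1,"two\nlines"\n2,plain\n};
open my $fh, "<", \$data or die "in-memory handle: $!";

my $csv = Text::CSV->new ({ binary => 1 });
my @recno;
while (my $row = $csv->getline ($fh)) {
    # Record the counter after every successfully parsed record
    push @recno, $csv->record_number;
}
print "record numbers: @recno\n";
```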
1532
1533 SetDiag
1534 $csv->SetDiag (0);
1535
1536 Use to reset the diagnostics if you are dealing with errors.
1537
1539 backend
1540 Returns the backend module name called by Text::CSV. "module" is
1541 an alias.
1542
1543 is_xs
1544 Returns true value if Text::CSV uses an XS backend.
1545
1546 is_pp
1547 Returns true value if Text::CSV uses a pure-Perl backend.
1548
1550 This section is also taken from Text::CSV_XS.
1551
1552 csv
1553 This function is not exported by default and should be explicitly
1554 requested:
1555
1556 use Text::CSV qw( csv );
1557
1558 This is a high-level function that aims at simple (user) interfaces.
1559 This can be used to read/parse a "CSV" file or stream (the default
1560 behavior) or to produce a file or write to a stream (define the "out"
1561 attribute). It returns an array- or hash-reference on parsing (or
1562 "undef" on fail) or the numeric value of "error_diag" on writing.
1563 When this function fails, you can get to the error using the class call
1564 to "error_diag":
1565
1566 my $aoa = csv (in => "test.csv") or
1567 die Text::CSV->error_diag;
1568
1569 This function takes the arguments as key-value pairs. This can be
1570 passed as a list or as an anonymous hash:
1571
1572 my $aoa = csv ( in => "test.csv", sep_char => ";");
1573 my $aoh = csv ({ in => $fh, headers => "auto" });
1574
1575 The arguments passed consist of two parts: the arguments to "csv"
1576 itself and the optional attributes to the "CSV" object used inside
1577 the function as enumerated and explained in "new".
1578
1579 If not overridden, the default options used for CSV are
1580
1581 auto_diag => 1
1582 escape_null => 0
1583
1584 The option that is always set and cannot be altered is
1585
1586 binary => 1
1587
1588 As this function will likely be used in one-liners, it allows "quote"
1589 to be abbreviated as "quo", and "escape_char" to be abbreviated as
1590 "esc" or "escape".
1591
1592 Alternative invocations:
1593
1594 my $aoa = Text::CSV::csv (in => "file.csv");
1595
1596 my $csv = Text::CSV->new ();
1597 my $aoa = $csv->csv (in => "file.csv");
1598
1599 In the latter case, the object attributes are used from the existing
1600 object and the attribute arguments in the function call are ignored:
1601
1602 my $csv = Text::CSV->new ({ sep_char => ";" });
1603 my $aoh = $csv->csv (in => "file.csv", bom => 1);
1604
1605 will parse using ";" as "sep_char", not ",".
1606
1607 in
1608
1609 Used to specify the source. "in" can be a file name (e.g. "file.csv"),
1610 which will be opened for reading and closed when finished, a file
1611 handle (e.g. $fh or "FH"), a reference to a glob (e.g. "\*ARGV"),
1612 the glob itself (e.g. *STDIN), or a reference to a scalar (e.g.
1613 "\q{1,2,"csv"}").
1614
1615 When used with "out", "in" should be a reference to a CSV structure
1616 (AoA or AoH) or a CODE-ref that returns an array-reference or a hash-
1617 reference. The code-ref will be invoked with no arguments.
1618
1619 my $aoa = csv (in => "file.csv");
1620
1621 open my $fh, "<", "file.csv" or die "file.csv: $!";
1622 my $aoa = csv (in => $fh);
1623
1624 my $csv = [ [qw( Foo Bar )], [ 1, 2 ], [ 2, 3 ]];
1625 my $err = csv (in => $csv, out => "file.csv");
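
A sketch of reading from a reference to a scalar, which avoids the need for a file. The data is made up; the first call passes the arguments as a list, the second as an anonymous hash:

```perl
use strict;
use warnings;
use Text::CSV qw( csv );

my $semi  = "code;name\n1;pc\n";
my $comma = "code,name\n1,pc\n";

# List form, with an attribute for the underlying CSV object
my $aoa = csv (in => \$semi, sep_char => ";");

# Anonymous hash form, returning an array of hashes
my $aoh = csv ({ in => \$comma, headers => "auto" });

print "rows read: ", scalar @$aoa, "\n";
print "name in first record: $aoh->[0]{name}\n";
```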
1626
1627 If called in void context without the "out" attribute, the resulting
1628 ref will be used as input to a subsequent call to csv:
1629
1630 csv (in => "file.csv", filter => { 2 => sub { length > 2 }})
1631
1632 will be a shortcut to
1633
1634 csv (in => csv (in => "file.csv", filter => { 2 => sub { length > 2 }}))
1635
1636 where, in the absence of the "out" attribute, this is a shortcut to
1637
1638 csv (in => csv (in => "file.csv", filter => { 2 => sub { length > 2 }}),
1639 out => *STDOUT)
1640
1641 out
1642
1643 csv (in => $aoa, out => "file.csv");
1644 csv (in => $aoa, out => $fh);
1645 csv (in => $aoa, out => STDOUT);
1646 csv (in => $aoa, out => *STDOUT);
1647 csv (in => $aoa, out => \*STDOUT);
1648 csv (in => $aoa, out => \my $data);
1649 csv (in => $aoa, out => undef);
1650 csv (in => $aoa, out => \"skip");
1651
1652 csv (in => $fh, out => \@aoa);
1653 csv (in => $fh, out => \@aoh, bom => 1);
1654 csv (in => $fh, out => \%hsh, key => "key");
1655
1656 In output mode, the default CSV options when producing CSV are
1657
1658 eol => "\r\n"
1659
1660 The "fragment" attribute is ignored in output mode.
1661
1662 "out" can be a file name (e.g. "file.csv"), which will be opened for
1663 writing and closed when finished, a file handle (e.g. $fh or "FH"), a
1664 reference to a glob (e.g. "\*STDOUT"), the glob itself (e.g. *STDOUT),
1665 or a reference to a scalar (e.g. "\my $data").
1666
1667 csv (in => sub { $sth->fetch }, out => "dump.csv");
1668 csv (in => sub { $sth->fetchrow_hashref }, out => "dump.csv",
1669 headers => $sth->{NAME_lc});
1670
1671 When a code-ref is used for "in", the output is generated per
1672 invocation, so no buffering is involved. This implies that there is no
1673 size restriction on the number of records. The "csv" function ends when
1674 the coderef returns a false value.
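
A minimal sketch of a code-ref source. The rows are made up and come from a plain array instead of a database handle, and the output goes to a scalar so it can be inspected:

```perl
use strict;
use warnings;
use Text::CSV qw( csv );

my @rows = ([qw( code name )], [ 1, "pc" ], [ 3, "mouse" ]);

# The code-ref is called once per record and ends the run by returning
# a false value (here: undef once @rows is empty)
csv (in => sub { shift @rows }, out => \my $data);

# Make the default "\r\n" line endings visible
(my $visible = $data) =~ s/\r\n/<CRLF>\n/g;
print $visible;
```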
1675
1676 If "out" is set to a reference of the literal string "skip", the output
1677 will be suppressed completely, which might be useful in combination
1678 with a filter for side effects only.
1679
1680 my %cache;
1681 csv (in => "dump.csv",
1682 out => \"skip",
1683 on_in => sub { $cache{$_[1][1]}++ });
1684
1685 Currently, setting "out" to any false value ("undef", "", 0) will be
1686 equivalent to "\"skip"".
1687
1688 If the "in" argument points to something to parse, and "out" is set
1689 to a reference to an "ARRAY" or a "HASH", the output is appended to the
1690 data in the existing reference. The result of the parse should match
1691 what exists in the reference passed. This might come in handy when you
1692 have to parse a set of files with similar content (like data stored per
1693 period) and you want to collect that into a single data structure:
1694
1695 my %hash;
1696 csv (in => $_, out => \%hash, key => "id") for sort glob "foo-[0-9]*.csv";
1697
1698 my @list; # List of arrays
1699 csv (in => $_, out => \@list) for sort glob "foo-[0-9]*.csv";
1700
1701 my @list; # List of hashes
1702 csv (in => $_, out => \@list, bom => 1) for sort glob "foo-[0-9]*.csv";
1703
1704 encoding
1705
1706 If passed, it should be an encoding accepted by the ":encoding()"
1707 option to "open". There is no default value. This attribute does not
1708 work in perl 5.6.x. "encoding" can be abbreviated to "enc" for ease of
1709 use in command line invocations.
1710
1711 If "encoding" is set to the literal value "auto", the method "header"
1712 will be invoked on the opened stream to check if there is a BOM and set
1713 the encoding accordingly. This is equal to passing a true value in
1714 the option "detect_bom".
1715
1716 Encodings can be stacked, as supported by "binmode":
1717
1718 # Using PerlIO::via::gzip
1719 csv (in => \@csv,
1720 out => "test.csv:via.gz",
1721 encoding => ":via(gzip):encoding(utf-8)",
1722 );
1723 $aoa = csv (in => "test.csv:via.gz", encoding => ":via(gzip)");
1724
1725 # Using PerlIO::gzip
1726 csv (in => \@csv,
1727 out => "test.csv:gzip.gz",
1728 encoding => ":gzip:encoding(utf-8)",
1729 );
1730 $aoa = csv (in => "test.csv:gzip.gz", encoding => ":gzip");
1731
1732 detect_bom
1733
1734 If "detect_bom" is given, the method "header" will be invoked on
1735 the opened stream to check if there is a BOM and set the encoding
1736 accordingly.
1737
1738 "detect_bom" can be abbreviated to "bom".
1739
1740 This is the same as setting "encoding" to "auto".
1741
1742 Note that as the method "header" is invoked, its default is to also
1743 set the headers.
1744
1745 headers
1746
1747 If this attribute is not given, the default behavior is to produce an
1748 array of arrays.
1749
1750 If "headers" is supplied, it should be an anonymous list of column
1751 names, an anonymous hashref, a coderef, or a literal flag: "auto",
1752 "lc", "uc", or "skip".
1753
1754 skip
1755 When "skip" is used, the header will not be included in the output.
1756
1757 my $aoa = csv (in => $fh, headers => "skip");
1758
1759 auto
1760 If "auto" is used, the first line of the "CSV" source will be read as
1761 the list of field headers and used to produce an array of hashes.
1762
1763 my $aoh = csv (in => $fh, headers => "auto");
1764
1765 lc
1766 If "lc" is used, the first line of the "CSV" source will be read as
1767 the list of field headers mapped to lower case and used to produce
1768 an array of hashes. This is a variation of "auto".
1769
1770 my $aoh = csv (in => $fh, headers => "lc");
1771
1772 uc
1773 If "uc" is used, the first line of the "CSV" source will be read as
1774 the list of field headers mapped to upper case and used to produce
1775 an array of hashes. This is a variation of "auto".
1776
1777 my $aoh = csv (in => $fh, headers => "uc");
1778
1779 CODE
1780 If a coderef is used, the first line of the "CSV" source will be
1781 read as the list of mangled field headers in which each field is
1782 passed as the only argument to the coderef. This list is used to
1783 produce an array of hashes.
1784
1785 my $aoh = csv (in => $fh,
1786 headers => sub { lc ($_[0]) =~ s/kode/code/gr });
1787
1788 this example is a variation of using "lc" where all occurrences of
1789 "kode" are replaced with "code".
1790
1791 ARRAY
1792 If "headers" is an anonymous list, the entries in the list will be
1793 used as field names. The first line is considered data instead of
1794 headers.
1795
1796 my $aoh = csv (in => $fh, headers => [qw( Foo Bar )]);
1797 csv (in => $aoa, out => $fh, headers => [qw( code description price )]);
1798
1799 HASH
1800 If "headers" is a hash reference, this implies "auto", but header
1801 fields that exist as key in the hashref will be replaced by the value
1802 for that key. Given a CSV file like
1803
1804 post-kode,city,name,id number,fubble
1805 1234AA,Duckstad,Donald,13,"X313DF"
1806
1807 using
1808
1809 csv (headers => { "post-kode" => "pc", "id number" => "ID" }, ...
1810
1811 will return an entry like
1812
1813 { pc => "1234AA",
1814 city => "Duckstad",
1815 name => "Donald",
1816 ID => "13",
1817 fubble => "X313DF",
1818 }
1819
1820 See also "munge_column_names" and "set_column_names".
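
The effect of the most common "headers" values can be compared side by side with a sketch like the following, which parses the same made-up data four times:

```perl
use strict;
use warnings;
use Text::CSV qw( csv );

my $text = "Code,Name\n1,pc\n";

my $auto = csv (in => \$text, headers => "auto");   # keys as found: Code, Name
my $lc   = csv (in => \$text, headers => "lc");     # keys lower-cased
my $skip = csv (in => \$text, headers => "skip");   # array of arrays, no header
my $list = csv (in => \$text, headers => [qw( a b )]);   # first line is data

print "auto: $auto->[0]{Name}\n";
print "lc:   $lc->[0]{name}\n";
print "skip: ", scalar @$skip, " row(s)\n";
print "list: ", scalar @$list, " row(s)\n";
```

With an anonymous list, the first line of the source is treated as data, so "list" holds one record more than "auto".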
1821
1822 munge_column_names
1823
1824 If "munge_column_names" is set, the method "header" is invoked on
1825 the opened stream with all matching arguments to detect and set the
1826 headers.
1827
1828 "munge_column_names" can be abbreviated to "munge".
1829
1830 key
1831
1832 If passed, will default "headers" to "auto" and return a hashref
1833 instead of an array of hashes. Allowed values are simple scalars or
1834 array-references where the first element is the joiner and the rest are
1835 the fields whose values are joined to form the key.
1836
1837 my $ref = csv (in => "test.csv", key => "code");
1838 my $ref = csv (in => "test.csv", key => [ ":" => "code", "color" ]);
1839
1840 with test.csv like
1841
1842 code,product,price,color
1843 1,pc,850,gray
1844 2,keyboard,12,white
1845 3,mouse,5,black
1846
1847 the first example will return
1848
1849 { 1 => {
1850 code => 1,
1851 color => 'gray',
1852 price => 850,
1853 product => 'pc'
1854 },
1855 2 => {
1856 code => 2,
1857 color => 'white',
1858 price => 12,
1859 product => 'keyboard'
1860 },
1861 3 => {
1862 code => 3,
1863 color => 'black',
1864 price => 5,
1865 product => 'mouse'
1866 }
1867 }
1868
1869 the second example will return
1870
1871 { "1:gray" => {
1872 code => 1,
1873 color => 'gray',
1874 price => 850,
1875 product => 'pc'
1876 },
1877 "2:white" => {
1878 code => 2,
1879 color => 'white',
1880 price => 12,
1881 product => 'keyboard'
1882 },
1883 "3:black" => {
1884 code => 3,
1885 color => 'black',
1886 price => 5,
1887 product => 'mouse'
1888 }
1889 }
1890
1891 The "key" attribute can be combined with "headers" for "CSV" data that
1892 has no header line, like
1893
1894 my $ref = csv (
1895 in => "foo.csv",
1896 headers => [qw( c_foo foo bar description stock )],
1897 key => "c_foo",
1898 );
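Both forms of "key" can be exercised on the sample data above. This is a minimal sketch that assumes the contents of test.csv are held in an in-memory string.

```perl
use strict;
use warnings;
use Text::CSV qw( csv );

# Same content as the test.csv sample above.
my $data = "code,product,price,color\n"
         . "1,pc,850,gray\n"
         . "2,keyboard,12,white\n"
         . "3,mouse,5,black\n";

# Simple key: one column identifies the record.
my $by_code = csv (in => \$data, key => "code");

# Composite key: joiner first, then the columns to join.
my $by_both = csv (in => \$data, key => [ ":" => "code", "color" ]);

print $by_code->{2}{product}, "\n";          # keyboard
print $by_both->{"3:black"}{price}, "\n";    # 5
```

A hash keyed this way suits lookups by identifier; if the key column holds duplicate values, later records would be expected to replace earlier ones, as with any hash assignment.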
1899
1900 value
1901
1902 Used to create key-value hashes.
1903
1904 Only allowed when "key" is valid. A "value" can be either a single
1905 column label or an anonymous list of column labels. In the first case,
1906 the value will be a simple scalar value, in the latter case, it will be
1907 a hashref.
1908
1909 my $ref = csv (in => "test.csv", key => "code",
1910 value => "price");
1911 my $ref = csv (in => "test.csv", key => "code",
1912 value => [ "product", "price" ]);
1913 my $ref = csv (in => "test.csv", key => [ ":" => "code", "color" ],
1914 value => "price");
1915 my $ref = csv (in => "test.csv", key => [ ":" => "code", "color" ],
1916 value => [ "product", "price" ]);
1917
1918 with test.csv like
1919
1920 code,product,price,color
1921 1,pc,850,gray
1922 2,keyboard,12,white
1923 3,mouse,5,black
1924
1925 the first example will return
1926
1927 { 1 => 850,
1928 2 => 12,
1929 3 => 5,
1930 }
1931
1932 the second example will return
1933
1934 { 1 => {
1935 price => 850,
1936 product => 'pc'
1937 },
1938 2 => {
1939 price => 12,
1940 product => 'keyboard'
1941 },
1942 3 => {
1943 price => 5,
1944 product => 'mouse'
1945 }
1946 }
1947
1948 the third example will return
1949
1950 { "1:gray" => 850,
1951 "2:white" => 12,
1952 "3:black" => 5,
1953 }
1954
1955 the fourth example will return
1956
1957 { "1:gray" => {
1958 price => 850,
1959 product => 'pc'
1960 },
1961 "2:white" => {
1962 price => 12,
1963 product => 'keyboard'
1964 },
1965 "3:black" => {
1966 price => 5,
1967 product => 'mouse'
1968 }
1969 }
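As a usage sketch, the scalar and list forms of "value" can be compared on the same sample data, here assumed to be held in an in-memory string.

```perl
use strict;
use warnings;
use Text::CSV qw( csv );

# Same content as the test.csv sample above.
my $data = "code,product,price,color\n"
         . "1,pc,850,gray\n"
         . "2,keyboard,12,white\n"
         . "3,mouse,5,black\n";

# Scalar value: each key maps directly to one field.
my $price = csv (in => \$data, key => "code", value => "price");

# List value: each key maps to a hashref of the listed fields.
my $info = csv (in => \$data, key => "code",
                value => [ "product", "price" ]);

my $total = 0;
$total += $_ for values %$price;
print "total: $total\n";                          # total: 867
print join (",", sort keys %{$info->{1}}), "\n";  # price,product
```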
1970
1971 keep_headers
1972
1973 When using hashes, store the column names in the arrayref passed, so
1974 all headers are available after the call in the original order.
1975
1976 my $aoh = csv (in => "file.csv", keep_headers => \my @hdr);
1977
1978 This attribute can be abbreviated to "kh" or passed as
1979 "keep_column_names".
1980
1981 This attribute implies a default of "auto" for the "headers" attribute.
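Because hash keys have no inherent order, the kept header list is what makes it possible to print records in their original column order. A minimal sketch, assuming in-memory sample data:

```perl
use strict;
use warnings;
use Text::CSV qw( csv );

# Hypothetical sample data for this sketch.
my $data = "code,product,price,color\n"
         . "1,pc,850,gray\n"
         . "2,keyboard,12,white\n";

# @hdr receives the column names in the order they appear in the input.
my $aoh = csv (in => \$data, keep_headers => \my @hdr);

print join ("|", @hdr), "\n";            # code|product|price|color
print join ("|", @{$_}{@hdr}), "\n" for @$aoh;
```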
1982
1983 fragment
1984
1985 Only output the fragment as defined in the "fragment" method. This
1986 option is ignored when generating "CSV". See "out".
1987
1988 Combining all of them could give something like
1989
1990 use Text::CSV qw( csv );
1991 my $aoh = csv (
1992 in => "test.txt",
1993 encoding => "utf-8",
1994 headers => "auto",
1995 sep_char => "|",
1996 fragment => "row=3;6-9;15-*",
1997 );
1998 say $aoh->[15]{Foo};
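A smaller sketch of row selection, assuming five made-up lines without a header, so the result is an array of arrays and row 1 is the first line of the input:

```perl
use strict;
use warnings;
use Text::CSV qw( csv );

# Hypothetical input: five data rows, no header line.
my $data = "1,a\n2,b\n3,c\n4,d\n5,e\n";

# Select row 2 and everything from row 4 onward.
my $aoa = csv (in => \$data, fragment => "row=2;4-*");

print join (",", map { $_->[0] } @$aoa), "\n";   # 2,4,5
```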
1999
2000 sep_set
2001
2002 If "sep_set" is set, the method "header" is invoked on the opened
2003 stream to detect and set "sep_char" with the given set.
2004
2005 "sep_set" can be abbreviated to "seps".
2006
2007 Note that as the "header" method is invoked, its default is to also
2008 set the headers.
2009
2010 set_column_names
2011
2012 If "set_column_names" is passed, the method "header" is invoked on
2013 the opened stream with all arguments meant for "header".
2014
2015 If "set_column_names" is passed as a false value, the content of the
2016 first row is only preserved if the output is AoA:
2017
2018 With an input-file like
2019
2020 bAr,foo
2021 1,2
2022 3,4,5
2023
2024 This call
2025
2026 my $aoa = csv (in => $file, set_column_names => 0);
2027
2028 will result in
2029
2030 [[ "bar", "foo" ],
2031 [ "1", "2" ],
2032 [ "3", "4", "5" ]]
2033
2034 and
2035
2036 my $aoa = csv (in => $file, set_column_names => 0, munge => "none");
2037
2038 will result in
2039
2040 [[ "bAr", "foo" ],
2041 [ "1", "2" ],
2042 [ "3", "4", "5" ]]
2043
2044 Callbacks
2045 Callbacks enable actions triggered from the inside of Text::CSV.
2046
2047 While most of what this enables can easily be done in an unrolled
2048 loop as described in the "SYNOPSIS", callbacks can be used to meet
2049 special demands or enhance the "csv" function.
2050
2051 error
2052 $csv->callbacks (error => sub { $csv->SetDiag (0) });
2053
2054 The "error" callback is invoked when an error occurs, but only
2055 when "auto_diag" is set to a true value. The callback is invoked with
2056 the values returned by "error_diag":
2057
2058 my ($c, $s);
2059
2060 sub ignore3006 {
2061 my ($err, $msg, $pos, $recno, $fldno) = @_;
2062 if ($err == 3006) {
2063 # ignore this error
2064 ($c, $s) = (undef, undef);
2065 Text::CSV->SetDiag (0);
2066 }
2067 # Any other error
2068 return;
2069 } # ignore3006
2070
2071 $csv->callbacks (error => \&ignore3006);
2072 $csv->bind_columns (\$c, \$s);
2073 while ($csv->getline ($fh)) {
2074 # Error 3006 will not stop the loop
2075 }
2076
2077 after_parse
2078 $csv->callbacks (after_parse => sub { push @{$_[1]}, "NEW" });
2079 while (my $row = $csv->getline ($fh)) {
2080 $row->[-1] eq "NEW";
2081 }
2082
2083 This callback is invoked after parsing with "getline" only if no
2084 error occurred. The callback is invoked with two arguments: the
2085 current "CSV" parser object and an array reference to the fields
2086 parsed.
2087
2088 The return code of the callback is ignored unless it is a reference
2089 to the string "skip", in which case the record will be skipped in
2090 "getline_all".
2091
2092 sub add_from_db {
2093 my ($csv, $row) = @_;
2094 $sth->execute ($row->[4]);
2095 push @$row, $sth->fetchrow_array;
2096 } # add_from_db
2097
2098 my $aoa = csv (in => "file.csv", callbacks => {
2099 after_parse => \&add_from_db });
2100
2101 This hook can be used for validation:
2102
2103 FAIL
2104 Die if any of the records does not validate a rule:
2105
2106 after_parse => sub {
2107 $_[1][4] =~ m/^[0-9]{4}\s?[A-Z]{2}$/ or
2108 die "5th field does not have a valid Dutch zipcode";
2109 }
2110
2111 DEFAULT
2112 Replace invalid fields with a default value:
2113
2114 after_parse => sub { $_[1][2] =~ m/^\d+$/ or $_[1][2] = 0 }
2115
2116 SKIP
2117 Skip records that have invalid fields (only applies to
2118 "getline_all"):
2119
2120 after_parse => sub { $_[1][0] =~ m/^\d+$/ or return \"skip"; }
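The DEFAULT and SKIP patterns can be combined in one hook. The sketch below uses the object interface with "getline_all" on an in-memory filehandle; the three input lines are made up, and the callback returns an explicit true value for accepted rows so that only the "skip" reference signals rejection.

```perl
use strict;
use warnings;
use Text::CSV;

# Hypothetical input: the second row has a non-numeric id,
# the third row has a non-numeric price.
my $data = "1,pc,850\nx,bogus,1\n3,mouse,n/a\n";
open my $fh, "<", \$data or die "in-memory open: $!";

my $csv = Text::CSV->new ({ binary => 1, auto_diag => 1 });
$csv->callbacks (after_parse => sub {
    my ($obj, $row) = @_;
    $row->[0] =~ m/^\d+$/ or return \"skip";   # SKIP: drop the record
    $row->[2] =~ m/^\d+$/ or $row->[2] = 0;    # DEFAULT: repair the field
    return 1;
    });

my $rows = $csv->getline_all ($fh);
close $fh;

print join (",", @$_), "\n" for @$rows;
```

Two records should remain, with the price of the last one replaced by 0.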
2121
2122 before_print
2123 my $idx = 1;
2124 $csv->callbacks (before_print => sub { $_[1][0] = $idx++ });
2125 $csv->print (*STDOUT, [ 0, $_ ]) for @members;
2126
2127 This callback is invoked before printing with "print" only if no
2128 error occurred. The callback is invoked with two arguments: the
2129 current "CSV" parser object and an array reference to the fields
2130 passed.
2131
2132 The return code of the callback is ignored.
2133
2134 sub max_4_fields {
2135 my ($csv, $row) = @_;
2136 @$row > 4 and splice @$row, 4;
2137 } # max_4_fields
2138
2139 csv (in => csv (in => "file.csv"), out => *STDOUT,
2140 callbacks => { before_print => \&max_4_fields });
2141
2142 This callback is not active for "combine".
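To see the effect of "before_print" without touching STDOUT, the output can be captured in a scalar. A minimal sketch, with made-up member names and "eol" set so each record ends in a newline:

```perl
use strict;
use warnings;
use Text::CSV;

my $out = "";
open my $fh, ">", \$out or die "in-memory open: $!";

my $csv = Text::CSV->new ({ binary => 1, auto_diag => 1, eol => "\n" });

# Number the records just before they are printed.
my $idx = 1;
$csv->callbacks (before_print => sub { $_[1][0] = $idx++; 1 });

# Hypothetical member list; the leading 0 is a placeholder.
$csv->print ($fh, [ 0, $_ ]) for qw( alice bob );
close $fh;

print $out;   # 1,alice
              # 2,bob
```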
2143
2144 Callbacks for csv ()
2145
2146 The "csv" function allows for some callbacks that are not integrated
2147 into the XS internals and are available only in the "csv" function.
2148
2149 csv (in => "file.csv",
2150 callbacks => {
2151 filter => { 6 => sub { $_ > 15 } }, # first
2152 after_parse => sub { say "AFTER PARSE"; }, # first
2153 after_in => sub { say "AFTER IN"; }, # second
2154 on_in => sub { say "ON IN"; }, # third
2155 },
2156 );
2157
2158 csv (in => $aoh,
2159 out => "file.csv",
2160 callbacks => {
2161 on_in => sub { say "ON IN"; }, # first
2162 before_out => sub { say "BEFORE OUT"; }, # second
2163 before_print => sub { say "BEFORE PRINT"; }, # third
2164 },
2165 );
2166
2167 filter
2168 This callback can be used to filter records. It is called just after
2169 a new record has been scanned. The callback accepts a:
2170
2171 hashref
2172 The keys are the index to the row (the field name or field number,
2173 1-based) and the values are subs to return a true or false value.
2174
2175 csv (in => "file.csv", filter => {
2176 3 => sub { m/a/ }, # third field should contain an "a"
2177 5 => sub { length > 4 }, # length of the 5th field minimal 5
2178 });
2179
2180 csv (in => "file.csv", filter => { foo => sub { $_ > 4 }});
2181
2182 If the keys to the filter hash contain any character that is not a
2183 digit it will also implicitly set "headers" to "auto" unless
2184 "headers" was already passed as argument. When headers are
2185 active, returning an array of hashes, the filter is not applicable
2186 to the header itself.
2187
2188 All sub results should match, as in AND.
2189
2190 The context of the callback sets $_ localized to the field
2191 indicated by the filter. The two arguments are as with all other
2192 callbacks, so the other fields in the current row can be seen:
2193
2194 filter => { 3 => sub { $_ > 100 ? $_[1][1] =~ m/A/ : $_[1][6] =~ m/B/ }}
2195
2196 If the context is set to return a list of hashes ("headers" is
2197 defined), the current record will also be available in the
2198 localized %_:
2199
2200 filter => { 3 => sub { $_ > 100 && $_{foo} =~ m/A/ && $_{bar} < 1000 }}
2201
2202 If the filter is used to alter the content by changing $_, make
2203 sure that the sub returns true in order not to have that record
2204 skipped:
2205
2206 filter => { 2 => sub { $_ = uc }}
2207
2208 will upper-case the second field, and then skip the record if the
2209 resulting content evaluates to false. To always accept, end with truth:
2210
2211 filter => { 2 => sub { $_ = uc; 1 }}
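The hashref form can both select and alter records. In this sketch the input is made up; filter keys that are field names imply "headers" as "auto", so the result is an array of hashes.

```perl
use strict;
use warnings;
use Text::CSV qw( csv );

# Hypothetical sample data.
my $data = "code,product,price\n"
         . "1,pc,850\n"
         . "2,keyboard,12\n"
         . "3,mouse,5\n";

my $aoh = csv (
    in     => \$data,
    filter => {
        price   => sub { $_ > 10 },      # keep only prices above 10
        product => sub { $_ = uc; 1 },   # alter the field, always accept
        },
    );

print "$_->{product} $_->{price}\n" for @$aoh;
```

All subs must return true for a record to be kept, so the mouse record is dropped by the price rule while the remaining product names are upper-cased.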
2212
2213 coderef
2214 csv (in => "file.csv", filter => sub { $n++; 0; });
2215
2216 If the argument to "filter" is a coderef, it is an alias or
2217 shortcut to a filter on column 0:
2218
2219 csv (filter => sub { $n++; 0 });
2220
2221 is equal to
2222
2223 csv (filter => { 0 => sub { $n++; 0 } });
2224
2225 filter-name
2226 csv (in => "file.csv", filter => "not_blank");
2227 csv (in => "file.csv", filter => "not_empty");
2228 csv (in => "file.csv", filter => "filled");
2229
2230 These are predefined filters.
2231
2232 Given a file like (line numbers prefixed for doc purpose only):
2233
2234 1:1,2,3
2235 2:
2236 3:,
2237 4:""
2238 5:,,
2239 6:, ,
2240 7:"",
2241 8:" "
2242 9:4,5,6
2243
2244 not_blank
2245 Filter out the blank lines
2246
2247 This filter is a shortcut for
2248
2249 filter => { 0 => sub { @{$_[1]} > 1 or
2250 defined $_[1][0] && $_[1][0] ne "" } }
2251
2252 Due to the implementation, it is currently impossible to also
2253 filter lines that consist only of a quoted empty field. These
2254 lines are also considered blank lines.
2255
2256 With the given example, lines 2 and 4 will be skipped.
2257
2258 not_empty
2259 Filter out lines where all the fields are empty.
2260
2261 This filter is a shortcut for
2262
2263 filter => { 0 => sub { grep { defined && $_ ne "" } @{$_[1]} } }
2264
2265 A space is not regarded as being empty, so given the example data,
2266 lines 2, 3, 4, 5, and 7 are skipped.
2267
2268 filled
2269 Filter out lines that have no visible data
2270
2271 This filter is a shortcut for
2272
2273 filter => { 0 => sub { grep { defined && m/\S/ } @{$_[1]} } }
2274
2275 This filter rejects all lines that do not have at least one field
2276 containing a non-whitespace character.
2277
2278 With the given example data, this filter would skip lines 2
2279 through 8.
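The three predefined filters can be compared side by side on the nine sample lines. This sketch builds the sample in memory (without the "N:" prefixes) and counts how many lines each filter keeps.

```perl
use strict;
use warnings;
use Text::CSV qw( csv );

# The nine sample lines from the listing above.
my $data = join "", map { "$_\n" }
    '1,2,3', '', ',', '""', ',,', ', ,', '"",', '" "', '4,5,6';

my %kept;
for my $name (qw( not_blank not_empty filled )) {
    my $aoa = csv (in => \$data, filter => $name);
    $kept{$name} = scalar @$aoa;
}

printf "%-9s keeps %d of 9 lines\n", $_, $kept{$_} for sort keys %kept;
```

If the filters behave as described above, the counts should be 7 for "not_blank", 4 for "not_empty" and 2 for "filled".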
2280
2281 One could also use modules like Types::Standard:
2282
2283 use Types::Standard -types;
2284
2285 my $type = Tuple[Str, Str, Int, Bool, Optional[Num]];
2286 my $check = $type->compiled_check;
2287
2288 # filter with compiled check and warnings
2289 my $aoa = csv (
2290 in => \$data,
2291 filter => {
2292 0 => sub {
2293 my $ok = $check->($_[1]) or
2294 warn $type->get_message ($_[1]), "\n";
2295 return $ok;
2296 },
2297 },
2298 );
2299
2300 after_in
2301 This callback is invoked for each record after all records have been
2302 parsed but before returning the reference to the caller. The hook is
2303 invoked with two arguments: the current "CSV" parser object and a
2304 reference to the record. The reference can be a reference to a
2305 HASH or a reference to an ARRAY as determined by the arguments.
2306
2307 This callback can also be passed as an attribute without the
2308 "callbacks" wrapper.
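As an illustration of "after_in" passed directly as an attribute, the sketch below appends a computed field to every record before the result is returned. The input data is made up.

```perl
use strict;
use warnings;
use Text::CSV qw( csv );

# Hypothetical input: two numeric columns, no header.
my $data = "1,2\n3,4\n";

my $aoa = csv (
    in       => \$data,
    after_in => sub {
        my ($csv, $row) = @_;
        push @$row, $row->[0] + $row->[1];   # add the row sum
        },
    );

print join (",", @$_), "\n" for @$aoa;   # 1,2,3
                                         # 3,4,7
```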
2309
2310 before_out
2311 This callback is invoked for each record before the record is
2312 printed. The hook is invoked with two arguments: the current "CSV"
2313 parser object and a reference to the record. The reference can be a
2314 reference to a HASH or a reference to an ARRAY as determined by the
2315 arguments.
2316
2317 This callback can also be passed as an attribute without the
2318 "callbacks" wrapper.
2319
2320 This callback makes the row available in %_ if the row is a hashref.
2321 In this case %_ is writable and will change the original row.
2322
2323 on_in
2324 This callback acts exactly as the "after_in" or the "before_out"
2325 hooks.
2326
2327 This callback can also be passed as an attribute without the
2328 "callbacks" wrapper.
2329
2330 This callback makes the row available in %_ if the row is a hashref.
2331 In this case %_ is writable and will change the original row. So e.g.
2332 with
2333
2334 my $aoh = csv (
2335 in => \"foo\n1\n2\n",
2336 headers => "auto",
2337 on_in => sub { $_{bar} = 2; },
2338 );
2339
2340 $aoh will be:
2341
2342 [ { foo => 1,
2343 bar => 2,
2344 },
2345 { foo => 2,
2346 bar => 2,
2347 }
2348 ]
2349
2350 csv
2351 The function "csv" can also be called as a method or with an
2352 existing Text::CSV object. This can help if the function is
2353 invoked many times: passing an existing instance avoids the
2354 overhead of creating the object internally over and over
2355 again.
2356
2357 my $csv = Text::CSV->new ({ binary => 1, auto_diag => 1 });
2358
2359 my $aoa = $csv->csv (in => $fh);
2360 my $aoa = csv (in => $fh, csv => $csv);
2361
2362 Both act the same. Running this 20000 times on a 20-line CSV file
2363 showed a 53% speedup.
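A sketch of reusing one parser object across several calls, using made-up in-memory inputs. No timing is done here; any speedup depends on the data and the backend in use.

```perl
use strict;
use warnings;
use Text::CSV qw( csv );

# One object, created once and shared by all calls below.
my $csv = Text::CSV->new ({ binary => 1, auto_diag => 1 });

# Hypothetical inputs.
my @inputs = ("a,b\n1,2\n", "c,d\n3,4\n5,6\n");

my $total = 0;
for my $data (@inputs) {
    my $aoa = csv (in => \$data, csv => $csv);   # function form
    $total += @$aoa;
}

my $again = $csv->csv (in => \$inputs[0]);       # method form

print "rows read: $total\n";   # rows read: 5
```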
2364
2366 This section is also taken from Text::CSV_XS.
2367
2368 Still under construction ...
2369
2370 If an error occurs, "$csv->error_diag" can be used to get information
2371 on the cause of the failure. Note that for speed reasons the internal
2372 value is never cleared on success, so using the value returned by
2373 "error_diag" in normal cases - when no error occurred - may cause
2374 unexpected results.
2375
2376 If the constructor failed, the cause can be found using "error_diag" as
2377 a class method, like "Text::CSV->error_diag".
2378
2379 The "$csv->error_diag" method is automatically invoked upon error when
2380 the constructor was called with "auto_diag" set to 1 or 2, or when
2381 autodie is in effect. When set to 1, this will cause a "warn" with the
2382 error message, when set to 2, it will "die". "2012 - EOF" is excluded
2383 from "auto_diag" reports.
2384
2385 Errors can be (individually) caught using the "error" callback.
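A minimal sketch of inspecting "error_diag" by hand, with "auto_diag" left off so that nothing is reported automatically. The input line is made up to trigger an unterminated quoted field.

```perl
use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new ({ binary => 1 });   # auto_diag not set

# A quoted field that is never closed.
my $ok = $csv->parse (q{1,"unterminated});

# In list context, error_diag returns the code, the message and
# position information. Only look at it after a failure.
my ($code, $msg, $pos) = $csv->error_diag;
$ok or print "parse failed: $code - $msg\n";
```

Since the internal value is not cleared on success, the check on $ok comes first and the diagnostics are used only in the failure branch.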
2386
2387 The errors as described below are available. I have tried to make the
2388 error itself explanatory enough, but more descriptions will be added.
2389 For most of these errors, the first three capitals describe the error
2390 category:
2391
2392 • INI
2393
2394 Initialization error or option conflict.
2395
2396 • ECR
2397
2398 Carriage-Return related parse error.
2399
2400 • EOF
2401
2402 End-Of-File related parse error.
2403
2404 • EIQ
2405
2406 Parse error inside quotation.
2407
2408 • EIF
2409
2410 Parse error inside field.
2411
2412 • ECB
2413
2414 Combine error.
2415
2416 • EHR
2417
2418 HashRef parse related error.
2419
2420 And below should be the complete list of error codes that can be
2421 returned:
2422
2423 • 1001 "INI - sep_char is equal to quote_char or escape_char"
2424
2425 The separation character cannot be equal to the quotation
2426 character or to the escape character, as this would invalidate all
2427 parsing rules.
2428
2429 • 1002 "INI - allow_whitespace with escape_char or quote_char SP or
2430 TAB"
2431
2432 Using the "allow_whitespace" attribute when either "quote_char" or
2433 "escape_char" is equal to "SPACE" or "TAB" is too ambiguous to
2434 allow.
2435
2436 • 1003 "INI - \r or \n in main attr not allowed"
2437
2438 Using default "eol" characters in either "sep_char", "quote_char",
2439 or "escape_char" is not allowed.
2440
2441 • 1004 "INI - callbacks should be undef or a hashref"
2442
2443 The "callbacks" attribute only allows "undef" or a hash
2444 reference.
2445
2446 • 1005 "INI - EOL too long"
2447
2448 The value passed for EOL is exceeding its maximum length (16).
2449
2450 • 1006 "INI - SEP too long"
2451
2452 The value passed for SEP is exceeding its maximum length (16).
2453
2454 • 1007 "INI - QUOTE too long"
2455
2456 The value passed for QUOTE is exceeding its maximum length (16).
2457
2458 • 1008 "INI - SEP undefined"
2459
2460 The value passed for SEP should be defined and not empty.
2461
2462 • 1010 "INI - the header is empty"
2463
2464 The header line parsed by the "header" method is empty.
2465
2466 • 1011 "INI - the header contains more than one valid separator"
2467
2468 The header line parsed by the "header" method contains more than
2469 one (unique) separator character out of the allowed set of separators.
2470
2471 • 1012 "INI - the header contains an empty field"
2472
2473 The header line parsed by the "header" method contains an empty field.
2474
2475 • 1013 "INI - the header contains nun-unique fields"
2476
2477 The header line parsed by the "header" method contains at least
2478 two identical fields.
2479
2480 • 1014 "INI - header called on undefined stream"
2481
2482 The header line cannot be parsed from an undefined source.
2483
2484 • 1500 "PRM - Invalid/unsupported argument(s)"
2485
2486 Function or method called with invalid argument(s) or parameter(s).
2487
2488 • 1501 "PRM - The key attribute is passed as an unsupported type"
2489
2490 The "key" attribute is of an unsupported type.
2491
2492 • 1502 "PRM - The value attribute is passed without the key attribute"
2493
2494 The "value" attribute is only allowed when a valid key is given.
2495
2496 • 1503 "PRM - The value attribute is passed as an unsupported type"
2497
2498 The "value" attribute is of an unsupported type.
2499
2500 • 2010 "ECR - QUO char inside quotes followed by CR not part of EOL"
2501
2502 When "eol" has been set to anything but the default, like
2503 "\r\t\n", and the "\r" is following the second (closing)
2504 "quote_char", where the characters following the "\r" do not make up
2505 the "eol" sequence, this is an error.
2506
2507 • 2011 "ECR - Characters after end of quoted field"
2508
2509 Sequences like "1,foo,"bar"baz,22,1" are not allowed. "bar" is a
2510 quoted field and after the closing double-quote, there should be
2511 either a new-line sequence or a separation character.
2512
2513 • 2012 "EOF - End of data in parsing input stream"
2514
2515 Self-explanatory. End-of-file was reached while parsing a stream.
2516 Can happen only when reading from streams with "getline", as using
2517 "parse" is done on strings that are not required to have a trailing
2518 "eol".
2519
2520 • 2013 "INI - Specification error for fragments RFC7111"
2521
2522 Invalid specification for URI "fragment" specification.
2523
2524 • 2014 "ENF - Inconsistent number of fields"
2525
2526 Inconsistent number of fields under strict parsing.
2527
2528 • 2021 "EIQ - NL char inside quotes, binary off"
2529
2530 Sequences like "1,"foo\nbar",22,1" are allowed only when the binary
2531 option has been selected with the constructor.
2532
2533 • 2022 "EIQ - CR char inside quotes, binary off"
2534
2535 Sequences like "1,"foo\rbar",22,1" are allowed only when the binary
2536 option has been selected with the constructor.
2537
2538 • 2023 "EIQ - QUO character not allowed"
2539
2540 Sequences like ""foo "bar" baz",qu" and "2023,",2008-04-05,"Foo,
2541 Bar",\n" will cause this error.
2542
2543 • 2024 "EIQ - EOF cannot be escaped, not even inside quotes"
2544
2545 The escape character is not allowed as last character in an input
2546 stream.
2547
2548 • 2025 "EIQ - Loose unescaped escape"
2549
2550 An escape character should escape only characters that need escaping.
2551
2552 Allowing the escape for other characters is possible with the
2553 attribute "allow_loose_escapes".
2554
2555 • 2026 "EIQ - Binary character inside quoted field, binary off"
2556
2557 Binary characters are not allowed by default. Exceptions are
2558 fields that contain valid UTF-8; these will automatically be
2559 upgraded. Set "binary" to 1 to accept binary
2560 data.
2561
2562 • 2027 "EIQ - Quoted field not terminated"
2563
2564 When parsing a field that started with a quotation character, the
2565 field is expected to be closed with a quotation character. When the
2566 parsed line is exhausted before the quote is found, that field is not
2567 terminated.
2568
2569 • 2030 "EIF - NL char inside unquoted verbatim, binary off"
2570
2571 • 2031 "EIF - CR char is first char of field, not part of EOL"
2572
2573 • 2032 "EIF - CR char inside unquoted, not part of EOL"
2574
2575 • 2034 "EIF - Loose unescaped quote"
2576
2577 • 2035 "EIF - Escaped EOF in unquoted field"
2578
2579 • 2036 "EIF - ESC error"
2580
2581 • 2037 "EIF - Binary character in unquoted field, binary off"
2582
2583 • 2110 "ECB - Binary character in Combine, binary off"
2584
2585 • 2200 "EIO - print to IO failed. See errno"
2586
2587 • 3001 "EHR - Unsupported syntax for column_names ()"
2588
2589 • 3002 "EHR - getline_hr () called before column_names ()"
2590
2591 • 3003 "EHR - bind_columns () and column_names () fields count
2592 mismatch"
2593
2594 • 3004 "EHR - bind_columns () only accepts refs to scalars"
2595
2596 • 3006 "EHR - bind_columns () did not pass enough refs for parsed
2597 fields"
2598
2599 • 3007 "EHR - bind_columns needs refs to writable scalars"
2600
2601 • 3008 "EHR - unexpected error in bound fields"
2602
2603 • 3009 "EHR - print_hr () called before column_names ()"
2604
2605 • 3010 "EHR - print_hr () called with invalid arguments"
2606
2608 Text::CSV_PP, Text::CSV_XS and Text::CSV::Encoded.
2609
2611 Alan Citterman <alan[at]mfgrtl.com> wrote the original Perl module.
2612 Please don't send mail concerning Text::CSV to Alan, as he's not a
2613 present maintainer.
2614
2615 Jochen Wiedmann <joe[at]ispsoft.de> rewrote the encoding and decoding
2616 in C by implementing a simple finite-state machine and added the
2617 variable quote, escape and separator characters, the binary mode and
2618 the print and getline methods. See ChangeLog releases 0.10 through
2619 0.23.
2620
2621 H.Merijn Brand <h.m.brand[at]xs4all.nl> cleaned up the code, added the
2622 field flags methods, wrote the major part of the test suite, completed
2623 the documentation, fixed some RT bugs. See ChangeLog releases 0.25 and
2624 on.
2625
2626 Makamaka Hannyaharamitu <makamaka[at]cpan.org> wrote Text::CSV_PP
2627 which is the pure-Perl version of Text::CSV_XS.
2628
2629 New Text::CSV (since 0.99) is maintained by Makamaka, and Kenichi
2630 Ishigaki since 1.91.
2631
2633 Text::CSV
2634
2635 Copyright (C) 1997 Alan Citterman. All rights reserved. Copyright (C)
2636 2007-2015 Makamaka Hannyaharamitu. Copyright (C) 2017- Kenichi
2637 Ishigaki. A large portion of the doc is taken from Text::CSV_XS. See
2638 below.
2639
2640 Text::CSV_PP:
2641
2642 Copyright (C) 2005-2015 Makamaka Hannyaharamitu. Copyright (C) 2017-
2643 Kenichi Ishigaki. A large portion of the code/doc is also taken from
2644 Text::CSV_XS. See below.
2645
2646 Text::CSV_XS:
2647
2648 Copyright (C) 2007-2016 H.Merijn Brand for PROCURA B.V. Copyright (C)
2649 1998-2001 Jochen Wiedmann. All rights reserved. Portions Copyright (C)
2650 1997 Alan Citterman. All rights reserved.
2651
2652 This library is free software; you can redistribute it and/or modify it
2653 under the same terms as Perl itself.
2654
2655
2656
2657perl v5.34.0 2021-07-23 Text::CSV(3)