Text::CSV(3) User Contributed Perl Documentation Text::CSV(3)

6 Text::CSV - comma-separated values manipulator (using XS or PurePerl)
7
9 This section is taken from Text::CSV_XS.
10
11 # Functional interface
12 use Text::CSV qw( csv );
13
14 # Read whole file in memory
15 my $aoa = csv (in => "data.csv"); # as array of array
16 my $aoh = csv (in => "data.csv",
17 headers => "auto"); # as array of hash
18
19 # Write array of arrays as csv file
20 csv (in => $aoa, out => "file.csv", sep_char=> ";");
21
22 # Only show lines where "code" is odd
23 csv (in => "data.csv", filter => { code => sub { $_ % 2 }});
24
25 # Object interface
26 use Text::CSV;
27
28 my @rows;
29 # Read/parse CSV
30 my $csv = Text::CSV->new ({ binary => 1, auto_diag => 1 });
31 open my $fh, "<:encoding(utf8)", "test.csv" or die "test.csv: $!";
32 while (my $row = $csv->getline ($fh)) {
33 $row->[2] =~ m/pattern/ or next; # 3rd field should match
34 push @rows, $row;
35 }
36 close $fh;
37
38 # and write as CSV
39 open $fh, ">:encoding(utf8)", "new.csv" or die "new.csv: $!";
40 $csv->say ($fh, $_) for @rows;
41 close $fh or die "new.csv: $!";
42
44 Text::CSV is a thin wrapper for Text::CSV_XS-compatible modules now.
45 All the backend modules provide facilities for the composition and
46 decomposition of comma-separated values. Text::CSV uses Text::CSV_XS by
47 default, and when Text::CSV_XS is not available, falls back on
48 Text::CSV_PP, which is bundled in the same distribution as this module.
49
This module respects an environment variable called "PERL_TEXT_CSV"
when it decides which backend module to use. If this environment variable
is not set, it tries to load Text::CSV_XS, and if Text::CSV_XS is not
available, falls back on Text::CSV_PP.

If you never want it to fall back on Text::CSV_PP, set the
variable like this ("export" may be "setenv", "set", or the like,
depending on your environment):
59
60 > export PERL_TEXT_CSV=Text::CSV_XS
61
62 If you prefer Text::CSV_XS to Text::CSV_PP (default), then:
63
64 > export PERL_TEXT_CSV=Text::CSV_XS,Text::CSV_PP
65
66 You may also want to set this variable at the top of your test files,
67 in order not to be bothered with incompatibilities between backends
68 (you need to wrap this in "BEGIN", and set before actually "use"-ing
69 Text::CSV module, as it decides its backend as soon as it's loaded):
70
71 BEGIN { $ENV{PERL_TEXT_CSV}='Text::CSV_PP'; }
72 use Text::CSV;
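
To confirm which backend was actually picked, the wrapper can be asked after loading. The following is a minimal sketch, assuming the "backend" and "is_pp" class methods of the Text::CSV wrapper are available in the installed version; the variable names are illustrative only.

```perl
use strict;
use warnings;

# Force the pure-Perl backend; this must happen before Text::CSV is loaded.
BEGIN { $ENV{PERL_TEXT_CSV} = 'Text::CSV_PP'; }
use Text::CSV;

my $backend = Text::CSV->backend;   # name of the backend module in use
my $is_pp   = Text::CSV->is_pp;     # true when Text::CSV_PP is in use

print "backend: $backend\n";
```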
73
75 This section is also taken from Text::CSV_XS.
76
77 Embedded newlines
78 Important Note: The default behavior is to accept only ASCII
79 characters in the range from 0x20 (space) to 0x7E (tilde). This means
80 that the fields can not contain newlines. If your data contains
81 newlines embedded in fields, or characters above 0x7E (tilde), or
82 binary data, you must set "binary => 1" in the call to "new". To cover
83 the widest range of parsing options, you will always want to set
84 binary.
85
But you still have to pass a complete record to the "parse" method,
which is harder than the usual line-by-line reading loop suggests:
89
90 my $csv = Text::CSV->new ({ binary => 1, eol => $/ });
91 while (<>) { # WRONG!
92 $csv->parse ($_);
93 my @fields = $csv->fields ();
94 }
95
This will break, as the "while" might read broken lines: it does not
care about the quoting. If you need to support embedded newlines, the
way to go is to not pass "eol" to the parser (it accepts "\n", "\r",
and "\r\n" by default) and then read the records with "getline":
100
101 my $csv = Text::CSV->new ({ binary => 1 });
102 open my $fh, "<", $file or die "$file: $!";
103 while (my $row = $csv->getline ($fh)) {
104 my @fields = @$row;
105 }
106
The old(er) way of using global file handles is still supported:
108
109 while (my $row = $csv->getline (*ARGV)) { ... }
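
As an illustration of the "getline" approach, here is a small sketch that reads a record containing an embedded newline. It uses an in-memory filehandle and made-up sample data so that it is self-contained; with a real file only the "open" call would differ.

```perl
use strict;
use warnings;
use Text::CSV;

# Hypothetical sample data: the second field of the first record
# contains a newline inside the quotes.
my $data = qq{1,"line one\nline two",3\n4,five,6\n};
open my $fh, "<", \$data or die "in-memory open: $!";

my $csv = Text::CSV->new ({ binary => 1, auto_diag => 1 });
my @rows;
while (my $row = $csv->getline ($fh)) {
    push @rows, $row;               # one array ref per CSV record
}
close $fh;

printf "%d records read\n", scalar @rows;
```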
110
111 Unicode
112 Unicode is only tested to work with perl-5.8.2 and up.
113
114 See also "BOM".
115
116 The simplest way to ensure the correct encoding is used for in- and
117 output is by either setting layers on the filehandles, or setting the
118 "encoding" argument for "csv".
119
120 open my $fh, "<:encoding(UTF-8)", "in.csv" or die "in.csv: $!";
121 or
122 my $aoa = csv (in => "in.csv", encoding => "UTF-8");
123
124 open my $fh, ">:encoding(UTF-8)", "out.csv" or die "out.csv: $!";
125 or
126 csv (in => $aoa, out => "out.csv", encoding => "UTF-8");
127
On parsing (both for "getline" and "parse"), if the source is marked
as being UTF8, then all fields that are marked binary will also be
marked UTF8.

On combining ("print" and "combine"): if any of the combining fields
was marked UTF8, the resulting string will be marked as UTF8. Note
however that any fields before the first field marked UTF8 that
contain 8-bit characters and were not upgraded to UTF8 will be
"bytes" in the resulting string too, possibly causing unexpected
errors. If you pass data of different encodings, or you don't know if
the encodings differ, force the data to be upgraded before you pass
it on:
140
141 $csv->print ($fh, [ map { utf8::upgrade (my $x = $_); $x } @data ]);
142
143 For complete control over encoding, please use Text::CSV::Encoded:
144
145 use Text::CSV::Encoded;
146 my $csv = Text::CSV::Encoded->new ({
147 encoding_in => "iso-8859-1", # the encoding comes into Perl
148 encoding_out => "cp1252", # the encoding comes out of Perl
149 });
150
151 $csv = Text::CSV::Encoded->new ({ encoding => "utf8" });
152 # combine () and print () accept *literally* utf8 encoded data
153 # parse () and getline () return *literally* utf8 encoded data
154
155 $csv = Text::CSV::Encoded->new ({ encoding => undef }); # default
156 # combine () and print () accept UTF8 marked data
157 # parse () and getline () return UTF8 marked data
158
159 BOM
160 BOM (or Byte Order Mark) handling is available only inside the
161 "header" method. This method supports the following encodings:
162 "utf-8", "utf-1", "utf-32be", "utf-32le", "utf-16be", "utf-16le",
163 "utf-ebcdic", "scsu", "bocu-1", and "gb-18030". See Wikipedia
164 <https://en.wikipedia.org/wiki/Byte_order_mark>.
165
166 If a file has a BOM, the easiest way to deal with that is
167
168 my $aoh = csv (in => $file, detect_bom => 1);
169
170 All records will be encoded based on the detected BOM.
171
172 This implies a call to the "header" method, which defaults to also
173 set the "column_names". So this is not the same as
174
175 my $aoh = csv (in => $file, headers => "auto");
176
which only reads the first record to set "column_names" but ignores
any BOM that may be present.
179
181 This section is also taken from Text::CSV_XS.
182
183 version
184 (Class method) Returns the current module version.
185
186 new
187 (Class method) Returns a new instance of class Text::CSV. The
188 attributes are described by the (optional) hash ref "\%attr".
189
190 my $csv = Text::CSV->new ({ attributes ... });
191
192 The following attributes are available:
193
194 eol
195
196 my $csv = Text::CSV->new ({ eol => $/ });
197 $csv->eol (undef);
198 my $eol = $csv->eol;
199
200 The end-of-line string to add to rows for "print" or the record
201 separator for "getline".
202
203 When not passed in a parser instance, the default behavior is to
204 accept "\n", "\r", and "\r\n", so it is probably safer to not specify
205 "eol" at all. Passing "undef" or the empty string behave the same.
206
207 When not passed in a generating instance, records are not terminated
208 at all, so it is probably wise to pass something you expect. A safe
209 choice for "eol" on output is either $/ or "\r\n".
210
211 Common values for "eol" are "\012" ("\n" or Line Feed), "\015\012"
212 ("\r\n" or Carriage Return, Line Feed), and "\015" ("\r" or Carriage
213 Return). The "eol" attribute cannot exceed 7 (ASCII) characters.
214
If both $/ and "eol" equal "\015", lines that end in only a
Carriage Return without Line Feed will be "parse"d correctly.
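
The difference between generating and parsing can be sketched as follows: the writer is given an explicit "eol", the reader is not. The object and variable names are illustrative only.

```perl
use strict;
use warnings;
use Text::CSV;

# Generating: without eol the record would not be terminated at all.
my $out = Text::CSV->new ({ eol => "\r\n" });
$out->combine ("a", "b", "c") or die "combine failed";
my $line = $out->string;            # the record including its eol

# Parsing: no eol given, so "\n", "\r", and "\r\n" are all accepted.
my $in = Text::CSV->new ({ binary => 1 });
$in->parse ($line) or die "parse failed";
my @fields = $in->fields;
```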
217
218 sep_char
219
220 my $csv = Text::CSV->new ({ sep_char => ";" });
221 $csv->sep_char (";");
222 my $c = $csv->sep_char;
223
The char used to separate fields, by default a comma (","). Limited
to a single-byte character, usually in the range from 0x20 (space) to
0x7E (tilde). When longer sequences are required, use "sep".
227
228 The separation character can not be equal to the quote character or to
229 the escape character.
230
231 sep
232
233 my $csv = Text::CSV->new ({ sep => "\N{FULLWIDTH COMMA}" });
234 $csv->sep (";");
235 my $sep = $csv->sep;
236
237 The chars used to separate fields, by default undefined. Limited to 8
238 bytes.
239
240 When set, overrules "sep_char". If its length is one byte it acts as
241 an alias to "sep_char".
242
243 quote_char
244
245 my $csv = Text::CSV->new ({ quote_char => "'" });
246 $csv->quote_char (undef);
247 my $c = $csv->quote_char;
248
249 The character to quote fields containing blanks or binary data, by
250 default the double quote character ("""). A value of undef suppresses
251 quote chars (for simple cases only). Limited to a single-byte
252 character, usually in the range from 0x20 (space) to 0x7E (tilde).
253 When longer sequences are required, use "quote".
254
255 "quote_char" can not be equal to "sep_char".
256
257 quote
258
259 my $csv = Text::CSV->new ({ quote => "\N{FULLWIDTH QUOTATION MARK}" });
260 $csv->quote ("'");
261 my $quote = $csv->quote;
262
263 The chars used to quote fields, by default undefined. Limited to 8
264 bytes.
265
266 When set, overrules "quote_char". If its length is one byte it acts as
267 an alias to "quote_char".
268
269 This method does not support "undef". Use "quote_char" to disable
270 quotation.
271
272 escape_char
273
274 my $csv = Text::CSV->new ({ escape_char => "\\" });
275 $csv->escape_char (":");
276 my $c = $csv->escape_char;
277
278 The character to escape certain characters inside quoted fields.
279 This is limited to a single-byte character, usually in the range
280 from 0x20 (space) to 0x7E (tilde).
281
282 The "escape_char" defaults to being the double-quote mark ("""). In
283 other words the same as the default "quote_char". This means that
284 doubling the quote mark in a field escapes it:
285
286 "foo","bar","Escape ""quote mark"" with two ""quote marks""","baz"
287
288 If you change the "quote_char" without changing the
289 "escape_char", the "escape_char" will still be the double-quote
290 ("""). If instead you want to escape the "quote_char" by doubling it
291 you will need to also change the "escape_char" to be the same as what
292 you have changed the "quote_char" to.
293
294 Setting "escape_char" to "undef" or "" will completely disable escapes
295 and is greatly discouraged. This will also disable "escape_null".
296
297 The escape character can not be equal to the separation character.
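
A short sketch of the doubling rule described above, first with the defaults and then with both "quote_char" and "escape_char" changed to a single quote. The input lines are made-up examples.

```perl
use strict;
use warnings;
use Text::CSV;

# Defaults: quote_char and escape_char are both the double quote.
my $csv = Text::CSV->new ({ binary => 1 });
$csv->parse (q{"foo","Escape ""quote mark"" here","baz"}) or die "parse failed";
my @default = $csv->fields;

# Changing quote_char alone would leave escape_char at '"',
# so both are changed to keep the doubling behavior.
my $sq = Text::CSV->new ({ quote_char => "'", escape_char => "'" });
$sq->parse (q{'it''s',plain}) or die "parse failed";
my @single = $sq->fields;
```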
298
299 binary
300
301 my $csv = Text::CSV->new ({ binary => 1 });
302 $csv->binary (0);
303 my $f = $csv->binary;
304
305 If this attribute is 1, you may use binary characters in quoted
306 fields, including line feeds, carriage returns and "NULL" bytes. (The
307 latter could be escaped as ""0".) By default this feature is off.
308
309 If a string is marked UTF8, "binary" will be turned on automatically
310 when binary characters other than "CR" and "NL" are encountered. Note
311 that a simple string like "\x{00a0}" might still be binary, but not
312 marked UTF8, so setting "{ binary => 1 }" is still a wise option.
313
314 strict
315
316 my $csv = Text::CSV->new ({ strict => 1 });
317 $csv->strict (0);
318 my $f = $csv->strict;
319
320 If this attribute is set to 1, any row that parses to a different
321 number of fields than the previous row will cause the parser to throw
322 error 2014.
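
A minimal sketch of "strict" in action, assuming an installed version that supports the attribute. The in-memory data is hypothetical; its second record has one field fewer than the first.

```perl
use strict;
use warnings;
use Text::CSV;

my $data = "a,b,c\n1,2\n";
open my $fh, "<", \$data or die "in-memory open: $!";

my $csv = Text::CSV->new ({ binary => 1, strict => 1 });
my $first  = $csv->getline ($fh);   # 3 fields
my $second = $csv->getline ($fh);   # 2 fields: rejected under strict
my ($code, $msg) = $csv->error_diag;
close $fh;

print "second record rejected: $code - $msg\n" unless $second;
```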
323
324 skip_empty_rows
325
326 my $csv = Text::CSV->new ({ skip_empty_rows => 1 });
327 $csv->skip_empty_rows (0);
328 my $f = $csv->skip_empty_rows;
329
330 If this attribute is set to 1, any row that has an "eol" immediately
331 following the start of line will be skipped. Default behavior is to
332 return one single empty field.
333
334 This attribute is only used in parsing.
335
336 formula_handling
337
338 Alias for "formula"
339
340 formula
341
342 my $csv = Text::CSV->new ({ formula => "none" });
343 $csv->formula ("none");
344 my $f = $csv->formula;
345
346 This defines the behavior of fields containing formulas. As formulas
347 are considered dangerous in spreadsheets, this attribute can define an
348 optional action to be taken if a field starts with an equal sign ("=").
349
350 For purpose of code-readability, this can also be written as
351
352 my $csv = Text::CSV->new ({ formula_handling => "none" });
353 $csv->formula_handling ("none");
354 my $f = $csv->formula_handling;
355
356 Possible values for this attribute are
357
358 none
359 Take no specific action. This is the default.
360
361 $csv->formula ("none");
362
363 die
364 Cause the process to "die" whenever a leading "=" is encountered.
365
366 $csv->formula ("die");
367
368 croak
369 Cause the process to "croak" whenever a leading "=" is encountered.
370 (See Carp)
371
372 $csv->formula ("croak");
373
374 diag
375 Report position and content of the field whenever a leading "=" is
376 found. The value of the field is unchanged.
377
378 $csv->formula ("diag");
379
380 empty
381 Replace the content of fields that start with a "=" with the empty
382 string.
383
384 $csv->formula ("empty");
385 $csv->formula ("");
386
387 undef
388 Replace the content of fields that start with a "=" with "undef".
389
390 $csv->formula ("undef");
391 $csv->formula (undef);
392
393 a callback
394 Modify the content of fields that start with a "=" with the return-
395 value of the callback. The original content of the field is
396 available inside the callback as $_;
397
398 # Replace all formula's with 42
399 $csv->formula (sub { 42; });
400
401 # same as $csv->formula ("empty") but slower
402 $csv->formula (sub { "" });
403
404 # Allow =4+12
405 $csv->formula (sub { s/^=(\d+\+\d+)$/$1/eer });
406
407 # Allow more complex calculations
408 $csv->formula (sub { eval { s{^=([-+*/0-9()]+)$}{$1}ee }; $_ });
409
410 All other values will give a warning and then fallback to "diag".
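
The effect of two of the values above on parsing can be sketched like this. The input line is a made-up example with one formula-like field.

```perl
use strict;
use warnings;
use Text::CSV;

my $line = "a,=1+2,b";

# "empty": the formula field becomes the empty string
my $empty = Text::CSV->new ({ formula => "empty" });
$empty->parse ($line) or die "parse failed";
my @e = $empty->fields;

# "undef": the formula field becomes undef
my $undef = Text::CSV->new ({ formula => "undef" });
$undef->parse ($line) or die "parse failed";
my @u = $undef->fields;
```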
411
412 decode_utf8
413
414 my $csv = Text::CSV->new ({ decode_utf8 => 1 });
415 $csv->decode_utf8 (0);
416 my $f = $csv->decode_utf8;
417
This attribute defaults to TRUE.
419
420 While parsing, fields that are valid UTF-8, are automatically set to
421 be UTF-8, so that
422
423 $csv->parse ("\xC4\xA8\n");
424
425 results in
426
427 PV("\304\250"\0) [UTF8 "\x{128}"]
428
429 Sometimes it might not be a desired action. To prevent those upgrades,
430 set this attribute to false, and the result will be
431
432 PV("\304\250"\0)
433
434 auto_diag
435
436 my $csv = Text::CSV->new ({ auto_diag => 1 });
437 $csv->auto_diag (2);
438 my $l = $csv->auto_diag;
439
Setting this attribute to a number between 1 and 9 causes "error_diag"
to be automatically called in void context upon errors.
442
443 In case of error "2012 - EOF", this call will be void.
444
445 If "auto_diag" is set to a numeric value greater than 1, it will "die"
446 on errors instead of "warn". If set to anything unrecognized, it will
447 be silently ignored.
448
449 Future extensions to this feature will include more reliable auto-
450 detection of "autodie" being active in the scope of which the error
451 occurred which will increment the value of "auto_diag" with 1 the
452 moment the error is detected.
453
454 diag_verbose
455
456 my $csv = Text::CSV->new ({ diag_verbose => 1 });
457 $csv->diag_verbose (2);
458 my $l = $csv->diag_verbose;
459
460 Set the verbosity of the output triggered by "auto_diag". Currently
461 only adds the current input-record-number (if known) to the
462 diagnostic output with an indication of the position of the error.
463
464 blank_is_undef
465
466 my $csv = Text::CSV->new ({ blank_is_undef => 1 });
467 $csv->blank_is_undef (0);
468 my $f = $csv->blank_is_undef;
469
470 Under normal circumstances, "CSV" data makes no distinction between
471 quoted- and unquoted empty fields. These both end up in an empty
472 string field once read, thus
473
474 1,"",," ",2
475
476 is read as
477
478 ("1", "", "", " ", "2")
479
480 When writing "CSV" files with either "always_quote" or "quote_empty"
481 set, the unquoted empty field is the result of an undefined value.
482 To enable this distinction when reading "CSV" data, the
483 "blank_is_undef" attribute will cause unquoted empty fields to be set
484 to "undef", causing the above to be parsed as
485
486 ("1", "", undef, " ", "2")
487
488 Note that this is specifically important when loading "CSV" fields
489 into a database that allows "NULL" values, as the perl equivalent for
490 "NULL" is "undef" in DBI land.
491
492 empty_is_undef
493
494 my $csv = Text::CSV->new ({ empty_is_undef => 1 });
495 $csv->empty_is_undef (0);
496 my $f = $csv->empty_is_undef;
497
498 Going one step further than "blank_is_undef", this attribute
499 converts all empty fields to "undef", so
500
501 1,"",," ",2
502
503 is read as
504
505 (1, undef, undef, " ", 2)
506
507 Note that this affects only fields that are originally empty, not
508 fields that are empty after stripping allowed whitespace. YMMV.
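
The difference between "blank_is_undef" and "empty_is_undef" can be checked side by side with the sample line used above. This is only a sketch; the variable names are illustrative.

```perl
use strict;
use warnings;
use Text::CSV;

my $line = q{1,"",," ",2};

# Only the unquoted empty field becomes undef
my $blank = Text::CSV->new ({ blank_is_undef => 1 });
$blank->parse ($line) or die "parse failed";
my @b = $blank->fields;

# Both the quoted and the unquoted empty field become undef
my $empty = Text::CSV->new ({ empty_is_undef => 1 });
$empty->parse ($line) or die "parse failed";
my @e = $empty->fields;
```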
509
510 allow_whitespace
511
512 my $csv = Text::CSV->new ({ allow_whitespace => 1 });
513 $csv->allow_whitespace (0);
514 my $f = $csv->allow_whitespace;
515
516 When this option is set to true, the whitespace ("TAB"'s and
517 "SPACE"'s) surrounding the separation character is removed when
518 parsing. If either "TAB" or "SPACE" is one of the three characters
519 "sep_char", "quote_char", or "escape_char" it will not be considered
520 whitespace.
521
522 Now lines like:
523
524 1 , "foo" , bar , 3 , zapp
525
526 are parsed as valid "CSV", even though it violates the "CSV" specs.
527
528 Note that all whitespace is stripped from both start and end of
529 each field. That would make it more than a feature to enable parsing
530 bad "CSV" lines, as
531
532 1, 2.0, 3, ape , monkey
533
534 will now be parsed as
535
536 ("1", "2.0", "3", "ape", "monkey")
537
538 even if the original line was perfectly acceptable "CSV".
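
A small sketch that parses the first sample line above with "allow_whitespace" enabled:

```perl
use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new ({ allow_whitespace => 1 });
$csv->parse (q{1 , "foo" , bar , 3 , zapp}) or die "parse failed";
my @fields = $csv->fields;          # surrounding whitespace is stripped

print join ("|", @fields), "\n";
```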
539
540 allow_loose_quotes
541
542 my $csv = Text::CSV->new ({ allow_loose_quotes => 1 });
543 $csv->allow_loose_quotes (0);
544 my $f = $csv->allow_loose_quotes;
545
546 By default, parsing unquoted fields containing "quote_char" characters
547 like
548
549 1,foo "bar" baz,42
550
551 would result in parse error 2034. Though it is still bad practice to
552 allow this format, we cannot help the fact that some vendors
553 make their applications spit out lines styled this way.
554
555 If there is really bad "CSV" data, like
556
557 1,"foo "bar" baz",42
558
559 or
560
561 1,""foo bar baz"",42
562
563 there is a way to get this data-line parsed and leave the quotes inside
564 the quoted field as-is. This can be achieved by setting
565 "allow_loose_quotes" AND making sure that the "escape_char" is not
566 equal to "quote_char".
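
The first case above (quotes inside an unquoted field) can be sketched as follows, once with the defaults and once with "allow_loose_quotes" set. The object names are illustrative only.

```perl
use strict;
use warnings;
use Text::CSV;

my $line = q{1,foo "bar" baz,42};

# Defaults: the parse is rejected
my $default = Text::CSV->new ();
my $ok      = $default->parse ($line);
my ($code)  = $default->error_diag;

# With allow_loose_quotes the quotes are kept as part of the field
my $loose = Text::CSV->new ({ allow_loose_quotes => 1 });
$loose->parse ($line) or die "parse failed";
my @fields = $loose->fields;
```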
567
568 allow_loose_escapes
569
570 my $csv = Text::CSV->new ({ allow_loose_escapes => 1 });
571 $csv->allow_loose_escapes (0);
572 my $f = $csv->allow_loose_escapes;
573
574 Parsing fields that have "escape_char" characters that escape
575 characters that do not need to be escaped, like:
576
my $csv = Text::CSV->new ({ escape_char => "\\" });
$csv->parse (q{1,"my bar\'s",baz,42});
579
580 would result in parse error 2025. Though it is bad practice to allow
581 this format, this attribute enables you to treat all escape character
582 sequences equal.
583
584 allow_unquoted_escape
585
586 my $csv = Text::CSV->new ({ allow_unquoted_escape => 1 });
587 $csv->allow_unquoted_escape (0);
588 my $f = $csv->allow_unquoted_escape;
589
A backward compatibility issue where "escape_char" differs from
"quote_char" prevents "escape_char" from being in the first position
of a field. If "quote_char" is equal to the default """ and
"escape_char" is set to "\", this would be illegal:
594
595 1,\0,2
596
597 Setting this attribute to 1 might help to overcome issues with
598 backward compatibility and allow this style.
599
600 always_quote
601
602 my $csv = Text::CSV->new ({ always_quote => 1 });
603 $csv->always_quote (0);
604 my $f = $csv->always_quote;
605
606 By default the generated fields are quoted only if they need to be.
607 For example, if they contain the separator character. If you set this
608 attribute to 1 then all defined fields will be quoted. ("undef" fields
609 are not quoted, see "blank_is_undef"). This makes it quite often easier
610 to handle exported data in external applications.
611
612 quote_space
613
614 my $csv = Text::CSV->new ({ quote_space => 1 });
615 $csv->quote_space (0);
616 my $f = $csv->quote_space;
617
By default, a space in a field triggers quotation. As no rule in
"CSV" requires this, nor any rule forbids it, the default is true for
safety. You can exclude the space from this trigger by setting this
attribute to 0.
622
623 quote_empty
624
625 my $csv = Text::CSV->new ({ quote_empty => 1 });
626 $csv->quote_empty (0);
627 my $f = $csv->quote_empty;
628
629 By default the generated fields are quoted only if they need to be.
630 An empty (defined) field does not need quotation. If you set this
631 attribute to 1 then empty defined fields will be quoted. ("undef"
632 fields are not quoted, see "blank_is_undef"). See also "always_quote".
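
To compare the quoting attributes described so far ("always_quote", "quote_space", and "quote_empty"), the same row can be combined under each setting. This is a sketch; "line_for" is a hypothetical helper written for this example, not part of the module.

```perl
use strict;
use warnings;
use Text::CSV;

# Hypothetical sample row: plain text, text with a space, empty string, undef
my @row = ("a", "b c", "", undef);

# Combine @row with the given attributes and return the resulting line
sub line_for {
    my %attr = @_;
    my $csv  = Text::CSV->new ({ %attr }) or die "" . Text::CSV->error_diag;
    $csv->combine (@row) or die "combine failed";
    return $csv->string;
}

my $plain    = line_for ();                    # a,"b c",,
my $always   = line_for (always_quote => 1);   # "a","b c","",
my $empty    = line_for (quote_empty  => 1);   # a,"b c","",
my $no_space = line_for (quote_space  => 0);   # a,b c,,
```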
633
634 quote_binary
635
636 my $csv = Text::CSV->new ({ quote_binary => 1 });
637 $csv->quote_binary (0);
638 my $f = $csv->quote_binary;
639
640 By default, all "unsafe" bytes inside a string cause the combined
641 field to be quoted. By setting this attribute to 0, you can disable
642 that trigger for bytes ">= 0x7F".
643
644 escape_null
645
646 my $csv = Text::CSV->new ({ escape_null => 1 });
647 $csv->escape_null (0);
648 my $f = $csv->escape_null;
649
650 By default, a "NULL" byte in a field would be escaped. This option
651 enables you to treat the "NULL" byte as a simple binary character in
652 binary mode (the "{ binary => 1 }" is set). The default is true. You
653 can prevent "NULL" escapes by setting this attribute to 0.
654
655 When the "escape_char" attribute is set to undefined, this attribute
656 will be set to false.
657
658 The default setting will encode "=\x00=" as
659
660 "="0="
661
With "escape_null" set to 0, this will result in
663
664 "=\x00="
665
666 The default when using the "csv" function is "false".
667
668 For backward compatibility reasons, the deprecated old name
669 "quote_null" is still recognized.
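
Both encodings of the "NULL" byte can be reproduced with "combine". A sketch, using binary mode as described above:

```perl
use strict;
use warnings;
use Text::CSV;

# Default (escape_null true): the NULL byte is written as escape_char + "0"
my $default = Text::CSV->new ({ binary => 1 });
$default->combine ("=\x00=") or die "combine failed";
my $escaped = $default->string;

# escape_null disabled: the NULL byte is written as-is
my $raw = Text::CSV->new ({ binary => 1, escape_null => 0 });
$raw->combine ("=\x00=") or die "combine failed";
my $unescaped = $raw->string;
```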
670
671 keep_meta_info
672
673 my $csv = Text::CSV->new ({ keep_meta_info => 1 });
674 $csv->keep_meta_info (0);
675 my $f = $csv->keep_meta_info;
676
677 By default, the parsing of input records is as simple and fast as
678 possible. However, some parsing information - like quotation of the
679 original field - is lost in that process. Setting this flag to true
680 enables retrieving that information after parsing with the methods
681 "meta_info", "is_quoted", and "is_binary" described below. Default is
682 false for performance.
683
If you set this attribute to a value greater than 9, then you can
control output quotation style like it was used in the input of the
last parsed record (unless quotation was added because of other
reasons).
688
689 my $csv = Text::CSV->new ({
690 binary => 1,
691 keep_meta_info => 1,
692 quote_space => 0,
693 });
694
$csv->parse (q{1,,"", ," ",f,"g","h""h",help,"help"});
my @row = $csv->fields;

$csv->print (*STDOUT, \@row);
# 1,,, , ,f,g,"h""h",help,help
$csv->keep_meta_info (11);
$csv->print (*STDOUT, \@row);
# 1,,"", ," ",f,"g","h""h",help,"help"
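
Retrieving the quotation state after parsing can be sketched with "is_quoted", which takes a zero-based field index. The input line is a made-up example.

```perl
use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new ({ binary => 1, keep_meta_info => 1 });
$csv->parse (q{1,"g",help}) or die "parse failed";

# 1 for fields that were quoted in the input, 0 otherwise
my @quoted = map { $csv->is_quoted ($_) ? 1 : 0 } 0 .. 2;

print "@quoted\n";
```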
702
703 undef_str
704
705 my $csv = Text::CSV->new ({ undef_str => "\\N" });
706 $csv->undef_str (undef);
707 my $s = $csv->undef_str;
708
709 This attribute optionally defines the output of undefined fields. The
710 value passed is not changed at all, so if it needs quotation, the
711 quotation needs to be included in the value of the attribute. Use with
712 caution, as passing a value like ",",,,,""" will for sure mess up
713 your output. The default for this attribute is "undef", meaning no
714 special treatment.
715
716 This attribute is useful when exporting CSV data to be imported in
717 custom loaders, like for MySQL, that recognize special sequences for
718 "NULL" data.
719
720 This attribute has no meaning when parsing CSV data.
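
A minimal sketch of "undef_str" on output, using the "\N" sequence from the synopsis of this attribute:

```perl
use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new ({ undef_str => "\\N" });
$csv->combine (1, undef, "x") or die "combine failed";
my $line = $csv->string;            # undef is written as \N, unquoted

print "$line\n";
```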
721
722 comment_str
723
724 my $csv = Text::CSV->new ({ comment_str => "#" });
725 $csv->comment_str (undef);
726 my $s = $csv->comment_str;
727
728 This attribute optionally defines a string to be recognized as comment.
729 If this attribute is defined, all lines starting with this sequence
730 will not be parsed as CSV but skipped as comment.
731
732 This attribute has no meaning when generating CSV.
733
734 Comment strings that start with any of the special characters/sequences
735 are not supported (so it cannot start with any of "sep_char",
736 "quote_char", "escape_char", "sep", "quote", or "eol").
737
738 For convenience, "comment" is an alias for "comment_str".
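
Skipping comment lines while reading can be sketched like this. The data is hypothetical and read from an in-memory filehandle to keep the example self-contained.

```perl
use strict;
use warnings;
use Text::CSV;

my $data = "# exported yesterday\n1,2\n# a note between records\n3,4\n";
open my $fh, "<", \$data or die "in-memory open: $!";

my $csv = Text::CSV->new ({ binary => 1, comment_str => "#" });
my @rows;
while (my $row = $csv->getline ($fh)) {
    push @rows, $row;               # comment lines never show up here
}
close $fh;
```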
739
740 verbatim
741
742 my $csv = Text::CSV->new ({ verbatim => 1 });
743 $csv->verbatim (0);
744 my $f = $csv->verbatim;
745
746 This is a quite controversial attribute to set, but makes some hard
747 things possible.
748
749 The rationale behind this attribute is to tell the parser that the
750 normally special characters newline ("NL") and Carriage Return ("CR")
751 will not be special when this flag is set, and be dealt with as being
752 ordinary binary characters. This will ease working with data with
753 embedded newlines.
754
755 When "verbatim" is used with "getline", "getline" auto-"chomp"'s
756 every line.
757
758 Imagine a file format like
759
760 M^^Hans^Janssen^Klas 2\n2A^Ja^11-06-2007#\r\n
761
where the line ending is a very specific "#\r\n", and the sep_char is
a "^" (caret). None of the fields is quoted, but embedded binary
data is likely to be present. With the specific line ending, this
should not be too hard to detect.

By default, Text::CSV's parse function only knows about "\n" and "\r"
as legal line endings, and so has to deal with the embedded newline as
a real "end-of-line", so it can scan the next line if binary is true
and the newline is inside a quoted field. With this option, we tell
"parse" to parse the line as if "\n" is nothing more than a binary
character.

For "parse" this means that the parser no longer has any notion of
line endings, and "getline" "chomp"s line endings on reading.
776
777 types
778
779 A set of column types; the attribute is immediately passed to the
780 "types" method.
781
782 callbacks
783
784 See the "Callbacks" section below.
785
786 accessors
787
788 To sum it up,
789
790 $csv = Text::CSV->new ();
791
792 is equivalent to
793
794 $csv = Text::CSV->new ({
795 eol => undef, # \r, \n, or \r\n
796 sep_char => ',',
797 sep => undef,
798 quote_char => '"',
799 quote => undef,
800 escape_char => '"',
801 binary => 0,
802 decode_utf8 => 1,
803 auto_diag => 0,
804 diag_verbose => 0,
805 blank_is_undef => 0,
806 empty_is_undef => 0,
807 allow_whitespace => 0,
808 allow_loose_quotes => 0,
809 allow_loose_escapes => 0,
810 allow_unquoted_escape => 0,
811 always_quote => 0,
812 quote_empty => 0,
813 quote_space => 1,
814 escape_null => 1,
815 quote_binary => 1,
816 keep_meta_info => 0,
817 strict => 0,
818 skip_empty_rows => 0,
819 formula => 0,
820 verbatim => 0,
821 undef_str => undef,
822 comment_str => undef,
823 types => undef,
824 callbacks => undef,
825 });
826
For all of the above mentioned flags, an accessor method is available
where you can inquire the current value, or change the value:
829
830 my $quote = $csv->quote_char;
831 $csv->binary (1);
832
833 It is not wise to change these settings halfway through writing "CSV"
834 data to a stream. If however you want to create a new stream using the
835 available "CSV" object, there is no harm in changing them.
836
If the "new" constructor call fails, it returns "undef", and makes
the reason for the failure available through the "error_diag" method.
839
840 $csv = Text::CSV->new ({ ecs_char => 1 }) or
841 die "".Text::CSV->error_diag ();
842
843 "error_diag" will return a string like
844
845 "INI - Unknown attribute 'ecs_char'"
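
If dying is not wanted, the failure can also be captured. A sketch, reusing the misspelled attribute from the example above:

```perl
use strict;
use warnings;
use Text::CSV;

# "ecs_char" is a deliberate misspelling of "escape_char"
my $csv    = Text::CSV->new ({ ecs_char => 1 });
my $reason = defined $csv ? "" : "" . Text::CSV->error_diag ();

print "constructor failed: $reason\n" unless $csv;
```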
846
847 known_attributes
848 @attr = Text::CSV->known_attributes;
849 @attr = Text::CSV::known_attributes;
850 @attr = $csv->known_attributes;
851
852 This method will return an ordered list of all the supported
853 attributes as described above. This can be useful for knowing what
854 attributes are valid in classes that use or extend Text::CSV.
855
856 print
857 $status = $csv->print ($fh, $colref);
858
859 Similar to "combine" + "string" + "print", but much more efficient.
860 It expects an array ref as input (not an array!) and the resulting
861 string is not really created, but immediately written to the $fh
862 object, typically an IO handle or any other object that offers a
863 "print" method.
864
865 For performance reasons "print" does not create a result string, so
866 all "string", "status", "fields", and "error_input" methods will return
867 undefined information after executing this method.
868
869 If $colref is "undef" (explicit, not through a variable argument) and
870 "bind_columns" was used to specify fields to be printed, it is
871 possible to make performance improvements, as otherwise data would have
872 to be copied as arguments to the method call:
873
874 $csv->bind_columns (\($foo, $bar));
875 $status = $csv->print ($fh, undef);
876
877 A short benchmark
878
879 my @data = ("aa" .. "zz");
880 $csv->bind_columns (\(@data));
881
882 $csv->print ($fh, [ @data ]); # 11800 recs/sec
883 $csv->print ($fh, \@data ); # 57600 recs/sec
884 $csv->print ($fh, undef ); # 48500 recs/sec
885
886 say
887 $status = $csv->say ($fh, $colref);
888
889 Like "print", but "eol" defaults to "$\".
890
891 print_hr
892 $csv->print_hr ($fh, $ref);
893
894 Provides an easy way to print a $ref (as fetched with "getline_hr")
895 provided the column names are set with "column_names".
896
897 It is just a wrapper method with basic parameter checks over
898
899 $csv->print ($fh, [ map { $ref->{$_} } $csv->column_names ]);
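
A small usage sketch for "print_hr". It writes to an in-memory filehandle so the result can be inspected; the column names and the record are made up for this example.

```perl
use strict;
use warnings;
use IO::Handle;
use Text::CSV;

my $out = "";
open my $fh, ">", \$out or die "in-memory open: $!";

my $csv = Text::CSV->new ({ binary => 1, eol => "\n" });
$csv->column_names (qw( code name price ));

# Hash keys may come in any order; column_names fixes the output order
$csv->print_hr ($fh, { name => "Widget", price => "9.95", code => 1 });
close $fh;

print $out;
```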
900
901 combine
902 $status = $csv->combine (@fields);
903
904 This method constructs a "CSV" record from @fields, returning success
905 or failure. Failure can result from lack of arguments or an argument
906 that contains an invalid character. Upon success, "string" can be
907 called to retrieve the resultant "CSV" string. Upon failure, the
908 value returned by "string" is undefined and "error_input" could be
909 called to retrieve the invalid argument.
910
911 string
912 $line = $csv->string ();
913
914 This method returns the input to "parse" or the resultant "CSV"
915 string of "combine", whichever was called more recently.
916
917 getline
918 $colref = $csv->getline ($fh);
919
920 This is the counterpart to "print", as "parse" is the counterpart to
921 "combine": it parses a row from the $fh handle using the "getline"
922 method associated with $fh and parses this row into an array ref.
923 This array ref is returned by the function or "undef" for failure.
924 When $fh does not support "getline", you are likely to hit errors.
925
926 When fields are bound with "bind_columns" the return value is a
927 reference to an empty list.
928
929 The "string", "fields", and "status" methods are meaningless again.
930
931 getline_all
932 $arrayref = $csv->getline_all ($fh);
933 $arrayref = $csv->getline_all ($fh, $offset);
934 $arrayref = $csv->getline_all ($fh, $offset, $length);
935
936 This will return a reference to a list of getline ($fh) results. In
937 this call, "keep_meta_info" is disabled. If $offset is negative, as
938 with "splice", only the last "abs ($offset)" records of $fh are taken
939 into consideration.
940
941 Given a CSV file with 10 lines:
942
943 lines  call
944 -----  ---------------------------------------------------------
945 0..9   $csv->getline_all ($fh)         # all
946 0..9   $csv->getline_all ($fh,  0)     # all
947 8..9   $csv->getline_all ($fh,  8)     # start at 8
948 -      $csv->getline_all ($fh,  0,  0) # start at 0 first 0 rows
949 0..4   $csv->getline_all ($fh,  0,  5) # start at 0 first 5 rows
950 4..5   $csv->getline_all ($fh,  4,  2) # start at 4 first 2 rows
951 8..9   $csv->getline_all ($fh, -2)     # last 2 rows
952 6..7   $csv->getline_all ($fh, -4,  2) # first 2 of last 4 rows
953
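
The offset and length handling can be tried out with a sketch like the one below. It assumes an in-memory file of ten single-field records named row0 to row9 (made up for this example) and asks for the first 2 of the last 4 rows.

```perl
use strict;
use warnings;
use Text::CSV;

# Ten records, row0 .. row9, in an in-memory file
my $data = join "", map { "row$_\n" } 0 .. 9;
open my $fh, "<", \$data or die "in-memory handle: $!";

my $csv  = Text::CSV->new ({ binary => 1 });

# Negative offset: look at the last 4 records only, then keep 2
my $rows = $csv->getline_all ($fh, -4, 2);
close $fh;

print join (",", map { $_->[0] } @{$rows}), "\n";
```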
954 getline_hr
955 The "getline_hr" and "column_names" methods work together to allow you
956 to have rows returned as hashrefs. You must call "column_names" first
957 to declare your column names.
958
959 $csv->column_names (qw( code name price description ));
960 $hr = $csv->getline_hr ($fh);
961 print "Price for $hr->{name} is $hr->{price} EUR\n";
962
963 "getline_hr" will croak if called before "column_names".
964
965 Note that "getline_hr" creates a hashref for every row and will be
966 much slower than the combined use of "bind_columns" and "getline", which
967 still offers the same easy-to-use hashref inside the loop:
968
969 my @cols = @{$csv->getline ($fh)};
970 $csv->column_names (@cols);
971 while (my $row = $csv->getline_hr ($fh)) {
972 print $row->{price};
973 }
974
975 This could easily be rewritten to the much faster:
976
977 my @cols = @{$csv->getline ($fh)};
978 my $row = {};
979 $csv->bind_columns (\@{$row}{@cols});
980 while ($csv->getline ($fh)) {
981 print $row->{price};
982 }
983
984 Your mileage may vary for the size of the data and the number of rows.
985 With perl-5.14.2 the comparison for a 100_000 line file with 14
986 columns:
987
988            Rate hashrefs getlines
989 hashrefs 1.00/s       --     -76%
990 getlines 4.15/s     313%       --
991
992 getline_hr_all
993 $arrayref = $csv->getline_hr_all ($fh);
994 $arrayref = $csv->getline_hr_all ($fh, $offset);
995 $arrayref = $csv->getline_hr_all ($fh, $offset, $length);
996
997 This will return a reference to a list of getline_hr ($fh) results.
998 In this call, "keep_meta_info" is disabled.
999
1000 parse
1001 $status = $csv->parse ($line);
1002
1003 This method decomposes a "CSV" string into fields, returning success
1004 or failure. Failure can result from a lack of argument or from an
1005 improperly formatted "CSV" string. Upon success, "fields" can be
1006 called to retrieve the decomposed fields. Upon failure, calling "fields"
1007 will return undefined data and "error_input" can be called to
1008 retrieve the invalid argument.
1009
1010 You may use the "types" method for setting column types. See the
1011 description of "types" below.
1012
1013 The $line argument is supposed to be a simple scalar. Everything else
1014 is supposed to croak and set error 1500.
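
A minimal sketch of the "parse" and "fields" pairing, using an arbitrary sample line with a quoted field that contains the separator:

```perl
use strict;
use warnings;
use Text::CSV;

my $csv  = Text::CSV->new ({ binary => 1 });

# Sample line, made up for this illustration
my $line = q{1,"Smith, John",42};

if ($csv->parse ($line)) {
    my @fields = $csv->fields;
    print scalar (@fields), " fields: ", join ("|", @fields), "\n";
    }
else {
    warn "parse failed on: ", $csv->error_input, "\n";
    }
```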
1015
1016 fragment
1017 This function tries to implement RFC7111 (URI Fragment Identifiers for
1018 the text/csv Media Type) -
1019 https://datatracker.ietf.org/doc/html/rfc7111
1020
1021 my $AoA = $csv->fragment ($fh, $spec);
1022
1023 In specifications, "*" is used to specify the last item, a dash ("-")
1024 to indicate a range. All indices are 1-based: the first row or
1025 column has index 1. Selections can be combined with the semi-colon
1026 (";").
1027
1028 When using this method in combination with "column_names", the
1029 returned reference will point to a list of hashes instead of a list
1030 of lists. A disjointed cell-based combined selection might return
1031 rows with different number of columns making the use of hashes
1032 unpredictable.
1033
1034 $csv->column_names ("Name", "Age");
1035 my $AoH = $csv->fragment ($fh, "col=3;8");
1036
1037 If the "after_parse" callback is active, it is also called on every
1038 line parsed and skipped before the fragment.
1039
1040 row
1041 row=4
1042 row=5-7
1043 row=6-*
1044 row=1-2;4;6-*
1045
1046 col
1047 col=2
1048 col=1-3
1049 col=4-*
1050 col=1-2;4;7-*
1051
1052 cell
1053 In cell-based selection, the comma (",") is used to pair row and
1054 column
1055
1056 cell=4,1
1057
1058 The range operator ("-") using "cell"s can be used to define top-left
1059 and bottom-right "cell" location
1060
1061 cell=3,1-4,6
1062
1063 The "*" is only allowed in the second part of a pair
1064
1065 cell=3,2-*,2 # row 3 till end, only column 2
1066 cell=3,2-3,* # column 2 till end, only row 3
1067 cell=3,2-*,* # strip row 1 and 2, and column 1
1068
1069 Cells and cell ranges may be combined with ";", possibly resulting in
1070 rows with different numbers of columns
1071
1072 cell=1,1-2,2;3,3-4,4;1,4;4,1
1073
1074 Disjointed selections will only return selected cells. The cells
1075 that are not specified will not be included in the returned
1076 set, not even as "undef". As an example given a "CSV" like
1077
1078 11,12,13,...19
1079 21,22,...28,29
1080 : :
1081 91,...97,98,99
1082
1083 with "cell=1,1-2,2;3,3-4,4;1,4;4,1" will return:
1084
1085 11,12,14
1086 21,22
1087 33,34
1088 41,43,44
1089
1090 Overlapping cell-specs will return those cells only once, so
1091 "cell=1,1-3,3;2,2-4,4;2,3;4,2" will return:
1092
1093 11,12,13
1094 21,22,23,24
1095 31,32,33,34
1096 42,43,44
1097
1098 RFC7111 <https://datatracker.ietf.org/doc/html/rfc7111> does not
1099 allow different types of specs to be combined (either "row" or "col"
1100 or "cell"). Passing an invalid fragment specification will croak and
1101 set error 2013.
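
The disjointed cell selection shown earlier can be reproduced with a sketch like this one. It assumes the abbreviated sample grid is a full 9 x 9 grid (11..19, 21..29, up to 91..99), generated here in memory.

```perl
use strict;
use warnings;
use Text::CSV;

# Build the assumed 9 x 9 sample grid: 11..19, 21..29, ... 91..99
my $data = join "", map {
    my $r = $_;
    join (",", map { $r . $_ } 1 .. 9) . "\n";
    } 1 .. 9;
open my $fh, "<", \$data or die "in-memory handle: $!";

my $csv = Text::CSV->new ({ binary => 1 });
my $aoa = $csv->fragment ($fh, "cell=1,1-2,2;3,3-4,4;1,4;4,1");
close $fh;

# Each returned row holds only the selected cells
print join (",", @{$_}), "\n" for @{$aoa};
```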
1102
1103 column_names
1104 Set the "keys" that will be used in the "getline_hr" calls. If no
1105 keys (column names) are passed, it will return the current setting as a
1106 list.
1107
1108 "column_names" accepts a list of scalars (the column names) or a
1109 single array_ref, so you can pass the return value from "getline" too:
1110
1111 $csv->column_names ($csv->getline ($fh));
1112
1113 "column_names" does no checking on duplicates at all, which might lead
1114 to unexpected results. Undefined entries will be replaced with the
1115 string "\cAUNDEF\cA", so
1116
1117 $csv->column_names (undef, "", "name", "name");
1118 $hr = $csv->getline_hr ($fh);
1119
1120 will set "$hr->{"\cAUNDEF\cA"}" to the 1st field, "$hr->{""}" to the
1121 2nd field, and "$hr->{name}" to the 4th field, discarding the 3rd
1122 field.
1123
1124 "column_names" croaks on invalid arguments.
1125
1126 header
1127 This method does NOT work in perl-5.6.x
1128
1129 Parse the CSV header and set "sep", column_names and encoding.
1130
1131 my @hdr = $csv->header ($fh);
1132 $csv->header ($fh, { sep_set => [ ";", ",", "|", "\t" ] });
1133 $csv->header ($fh, { detect_bom => 1, munge_column_names => "lc" });
1134
1135 The first argument should be a file handle.
1136
1137 This method resets some object properties, as it is supposed to be
1138 invoked only once per file or stream. It will leave attributes
1139 "column_names" and "bound_columns" alone if setting column names is
1140 disabled. Reading headers on previously processed objects might fail on
1141 perl-5.8.0 and older.
1142
1143 Assuming that the file opened for parsing has a header, and the header
1144 does not contain problematic characters like embedded newlines, read
1145 the first line from the open handle then auto-detect whether the header
1146 separates the column names with a character from the allowed separator
1147 list.
1148
1149 If any of the allowed separators matches, and none of the other
1150 allowed separators match, set "sep" to that separator for the
1151 current CSV instance and use it to parse the first line, map those to
1152 lowercase, and use that to set the instance "column_names":
1153
1154 my $csv = Text::CSV->new ({ binary => 1, auto_diag => 1 });
1155 open my $fh, "<", "file.csv" or die "file.csv: $!";
1156 binmode $fh; # for Windows
1157 $csv->header ($fh);
1158 while (my $row = $csv->getline_hr ($fh)) {
1159 ...
1160 }
1161
1162 If the header is empty, contains more than one unique separator out of
1163 the allowed set, contains empty fields, or contains identical fields
1164 (after folding), it will croak with error 1010, 1011, 1012, or 1013
1165 respectively.
1166
1167 If the header contains embedded newlines or is not valid CSV in any
1168 other way, this method will croak and leave the parse error untouched.
1169
1170 A successful call to "header" will always set the "sep" of the $csv
1171 object. This behavior can not be disabled.
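
As a sketch of the detection described above, the following feeds a small semicolon-separated sample (made up for this example) through "header" using an in-memory handle and the default options, then reads one row as a hash.

```perl
use strict;
use warnings;
use Text::CSV;

# Sample data with ";" as separator and mixed-case column names
my $data = "Code;Name;Price\n1;pc;850\n";
open my $fh, "<", \$data or die "in-memory handle: $!";

my $csv = Text::CSV->new ({ binary => 1, auto_diag => 1 });

# List context returns the (lowercased) header fields
my @hdr = $csv->header ($fh);
print "separator: ", $csv->sep, "\n";
print "columns:   @hdr\n";

# column_names are set, so getline_hr can be used right away
my $row = $csv->getline_hr ($fh);
print "price:     $row->{price}\n";
close $fh;
```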
1172
1173 return value
1174
1175 On error this method will croak.
1176
1177 In list context, the headers will be returned whether they are used to
1178 set "column_names" or not.
1179
1180 In scalar context, the instance itself is returned. Note: the values
1181 as found in the header will effectively be lost if "set_column_names"
1182 is false.
1183
1184 Options
1185
1186 sep_set
1187 $csv->header ($fh, { sep_set => [ ";", ",", "|", "\t" ] });
1188
1189 The list of legal separators defaults to "[ ";", "," ]" and can be
1190 changed by this option. As this is probably the most often used
1191 option, it can be passed on its own as an unnamed argument:
1192
1193 $csv->header ($fh, [ ";", ",", "|", "\t", "::", "\x{2063}" ]);
1194
1195 Multi-byte sequences are allowed, both multi-character and
1196 Unicode. See "sep".
1197
1198 detect_bom
1199 $csv->header ($fh, { detect_bom => 1 });
1200
1201 The default behavior is to detect if the header line starts with a
1202 BOM. If the header has a BOM, use that to set the encoding of $fh.
1203 This default behavior can be disabled by passing a false value to
1204 "detect_bom".
1205
1206 Supported encodings from BOM are: UTF-8, UTF-16BE, UTF-16LE,
1207 UTF-32BE, and UTF-32LE. BOM also supports UTF-1, UTF-EBCDIC, SCSU,
1208 BOCU-1, and GB-18030 but Encode does not (yet). UTF-7 is not
1209 supported.
1210
1211 If a supported BOM was detected as start of the stream, it is stored
1212 in the object attribute "ENCODING".
1213
1214 my $enc = $csv->{ENCODING};
1215
1216 The encoding is used with "binmode" on $fh.
1217
1218 If the handle was opened in a (correct) encoding, this method will
1219 not alter the encoding, as it checks the leading bytes of the first
1220 line. In case the stream starts with a decoded BOM ("U+FEFF"),
1221 "{ENCODING}" will be "" (empty) instead of the default "undef".
1222
1223 munge_column_names
1224 This option offers the means to modify the column names into
1225 something that is most useful to the application. The default is to
1226 map all column names to lower case.
1227
1228 $csv->header ($fh, { munge_column_names => "lc" });
1229
1230 The following values are available:
1231
1232 lc - lower case
1233 uc - upper case
1234 db - valid DB field names
1235 none - do not change
1236 \%hash - supply a mapping
1237 \&cb - supply a callback
1238
1239 Lower case
1240 $csv->header ($fh, { munge_column_names => "lc" });
1241
1242 The header is changed to all lower-case.
1243
1244 $_ = lc;
1245
1246 Upper case
1247 $csv->header ($fh, { munge_column_names => "uc" });
1248
1249 The header is changed to all upper-case.
1250
1251 $_ = uc;
1252
1253 Literal
1254 $csv->header ($fh, { munge_column_names => "none" });
1255
1256 Hash
1257 $csv->header ($fh, { munge_column_names => { foo => "sombrero" }});
1258
1259 If a column name does not exist as a key in the hash, the original value is used unchanged.
1260
1261 Database
1262 $csv->header ($fh, { munge_column_names => "db" });
1263
1264 - lower-case
1265
1266 - all sequences of non-word characters are replaced with an
1267 underscore
1268
1269 - all leading underscores are removed
1270
1271 $_ = lc (s/\W+/_/gr =~ s/^_+//r);
1272
1273 Callback
1274 $csv->header ($fh, { munge_column_names => sub { fc } });
1275 $csv->header ($fh, { munge_column_names => sub { "column_".$col++ } });
1276 $csv->header ($fh, { munge_column_names => sub { lc (s/\W+/_/gr) } });
1277
1278 As this callback is called in a "map", you can use $_ directly.
1279
1280 set_column_names
1281 $csv->header ($fh, { set_column_names => 1 });
1282
1283 The default is to set the instance's column names using
1284 "column_names" if the method is successful, so subsequent calls to
1285 "getline_hr" can return a hash. Setting the column names can be
1286 disabled by passing a false value for this option.
1287
1288 As described in "return value" above, content is lost in scalar
1289 context.
1290
1291 Validation
1292
1293 When receiving CSV files from external sources, this method can be
1294 used to protect against changes in the layout by restricting to known
1295 headers (and typos in the header fields).
1296
1297 my %known = (
1298 "record key" => "c_rec",
1299 "rec id" => "c_rec",
1300 "id_rec" => "c_rec",
1301 "kode" => "code",
1302 "code" => "code",
1303 "vaule" => "value",
1304 "value" => "value",
1305 );
1306 my $csv = Text::CSV->new ({ binary => 1, auto_diag => 1 });
1307 open my $fh, "<", $source or die "$source: $!";
1308 $csv->header ($fh, { munge_column_names => sub {
1309 s/\s+$//;
1310 s/^\s+//;
1311 $known{lc $_} or die "Unknown column '$_' in $source";
1312 }});
1313 while (my $row = $csv->getline_hr ($fh)) {
1314 say join "\t", $row->{c_rec}, $row->{code}, $row->{value};
1315 }
1316
1317 bind_columns
1318 Takes a list of scalar references to be used for output with "print"
1319 or to store the fields fetched by "getline". When you do not pass
1320 enough references to store the fetched fields in, "getline" will fail
1321 with error 3006. If you pass more than there are fields to return,
1322 the content of the remaining references is left untouched.
1323
1324 $csv->bind_columns (\$code, \$name, \$price, \$description);
1325 while ($csv->getline ($fh)) {
1326 print "The price of a $name is \x{20ac} $price\n";
1327 }
1328
1329 To reset or clear all column binding, call "bind_columns" with the
1330 single argument "undef". This will also clear column names.
1331
1332 $csv->bind_columns (undef);
1333
1334 If no arguments are passed at all, "bind_columns" will return the list
1335 of current bindings or "undef" if no binds are active.
1336
1337 Note that in parsing with "bind_columns", the fields are set on the
1338 fly. That implies that if the third field of a row causes an error
1339 (or this row has just two fields where the previous row had more), the
1340 first two fields already have been assigned the values of the current
1341 row, while the rest of the fields will still hold the values of the
1342 previous row. If you want the parser to fail in these cases, use the
1343 "strict" attribute.
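
The on-the-fly assignment can be observed with a sketch like the following. The data is made up, and its second row is one field short on purpose; "strict" is left off so that the short row is still accepted.

```perl
use strict;
use warnings;
use Text::CSV;

# Second row is one field short on purpose
my $data = "1,pc,850\n2,mouse\n";
open my $fh, "<", \$data or die "in-memory handle: $!";

my $csv = Text::CSV->new ({ binary => 1 });
my ($code, $name, $price);
$csv->bind_columns (\$code, \$name, \$price);

my @seen;
while ($csv->getline ($fh)) {
    # For the short row, $price is not assigned again, so per the note
    # above it is expected to still hold the value of the previous row
    push @seen, join " ", map { defined $_ ? $_ : "(undef)" } $code, $name, $price;
    }
close $fh;

print "$_\n" for @seen;
```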
1344
1345 eof
1346 $eof = $csv->eof ();
1347
1348 If "parse" or "getline" was used with an IO stream, this method will
1349 return true (1) if the last call hit end of file, otherwise it will
1350 return false (''). This is useful to see the difference between a
1351 failure and end of file.
1352
1353 Note that if the parsing of the last line caused an error, "eof" is
1354 still true. That means that if you are not using "auto_diag", an idiom
1355 like
1356
1357 while (my $row = $csv->getline ($fh)) {
1358 # ...
1359 }
1360 $csv->eof or $csv->error_diag;
1361
1362 will not report the error. You would have to change that to
1363
1364 while (my $row = $csv->getline ($fh)) {
1365 # ...
1366 }
1367 +$csv->error_diag and $csv->error_diag;
1368
1369 types
1370 $csv->types (\@tref);
1371
1372 This method is used to force that (all) columns are of a given type.
1373 For example, if you have an integer column, two columns with
1374 doubles and a string column, then you might do a
1375
1376 $csv->types ([Text::CSV::IV (),
1377 Text::CSV::NV (),
1378 Text::CSV::NV (),
1379 Text::CSV::PV ()]);
1380
1381 Column types are used only for decoding columns while parsing, in
1382 other words by the "parse" and "getline" methods.
1383
1384 You can unset column types by doing a
1385
1386 $csv->types (undef);
1387
1388 or fetch the current type settings with
1389
1390 $types = $csv->types ();
1391
1392 IV
1393 CSV_TYPE_IV
1394 Set field type to integer.
1395
1396 NV
1397 CSV_TYPE_NV
1398 Set field type to numeric/float.
1399
1400 PV
1401 CSV_TYPE_PV
1402 Set field type to string.
1403
1404 fields
1405 @columns = $csv->fields ();
1406
1407 This method returns the input to "combine" or the resultant
1408 decomposed fields of a successful "parse", whichever was called more
1409 recently.
1410
1411 Note that the return value is undefined after using "getline", which
1412 does not fill the data structures returned by "parse".
1413
1414 meta_info
1415 @flags = $csv->meta_info ();
1416
1417 This method returns the "flags" of the input to "combine" or the flags
1418 of the resultant decomposed fields of "parse", whichever was called
1419 more recently.
1420
1421 For each field, a meta_info field will hold flags that inform
1422 something about the field returned by the "fields" method or
1423 passed to the "combine" method. The flags are bit-wise-"or"'d like:
1424
1425 0x0001
1426 "CSV_FLAGS_IS_QUOTED"
1427 The field was quoted.
1428
1429 0x0002
1430 "CSV_FLAGS_IS_BINARY"
1431 The field was binary.
1432
1433 0x0004
1434 "CSV_FLAGS_ERROR_IN_FIELD"
1435 The field was invalid.
1436
1437 Currently only used when "allow_loose_quotes" is active.
1438
1439 0x0010
1440 "CSV_FLAGS_IS_MISSING"
1441 The field was missing.
1442
1443 See the "is_***" methods below.
1444
1445 is_quoted
1446 my $quoted = $csv->is_quoted ($column_idx);
1447
1448 where $column_idx is the (zero-based) index of the column in the
1449 last result of "parse".
1450
1451 This returns a true value if the data in the indicated column was
1452 enclosed in "quote_char" quotes. This might be important for fields
1453 where content ",20070108," is to be treated as a numeric value, and
1454 where ","20070108"," is explicitly marked as character string data.
1455
1456 This method is only valid when "keep_meta_info" is set to a true value.
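
A short sketch of the distinction made above, assuming "keep_meta_info" is enabled in the constructor. The same content is parsed once unquoted and once quoted:

```perl
use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new ({ binary => 1, keep_meta_info => 1 });

# Same content twice: first unquoted, then quoted
$csv->parse (q{20070108,"20070108"}) or die "parse failed";

my @fields = $csv->fields;
for my $idx (0 .. $#fields) {
    printf "%d: %s was %s\n", $idx, $fields[$idx],
        $csv->is_quoted ($idx) ? "quoted" : "not quoted";
    }
```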
1457
1458 is_binary
1459 my $binary = $csv->is_binary ($column_idx);
1460
1461 where $column_idx is the (zero-based) index of the column in the
1462 last result of "parse".
1463
1464 This returns a true value if the data in the indicated column contained
1465 any byte in the range "[\x00-\x08,\x10-\x1F,\x7F-\xFF]".
1466
1467 This method is only valid when "keep_meta_info" is set to a true value.
1468
1469 is_missing
1470 my $missing = $csv->is_missing ($column_idx);
1471
1472 where $column_idx is the (zero-based) index of the column in the
1473 last result of "getline_hr".
1474
1475 $csv->keep_meta_info (1);
1476 while (my $hr = $csv->getline_hr ($fh)) {
1477 $csv->is_missing (0) and next; # This was an empty line
1478 }
1479
1480 When using "getline_hr", it is impossible to tell if the parsed
1481 fields are "undef" because they were not filled in the "CSV" stream
1482 or because they were not read at all, as all the fields defined by
1483 "column_names" are set in the hash-ref. If you still need to know if
1484 all fields in each row are provided, you should enable "keep_meta_info"
1485 so you can check the flags.
1486
1487 If "keep_meta_info" is "false", "is_missing" will always return
1488 "undef", regardless of $column_idx being valid or not. If this
1489 attribute is "true" it will return either 0 (the field is present) or 1
1490 (the field is missing).
1491
1492 A special case is the empty line. If the line is completely empty -
1493 after dealing with the flags - this is still a valid CSV line: it is a
1494 record of just one single empty field. However, if "keep_meta_info" is
1495 set, invoking "is_missing" with index 0 will now return true.
1496
1497 status
1498 $status = $csv->status ();
1499
1500 This method returns the status of the last invoked "combine" or "parse"
1501 call. Status is success (true: 1) or failure (false: "undef" or 0).
1502
1503 Note that as this only keeps track of the status of above mentioned
1504 methods, you are probably looking for "error_diag" instead.
1505
1506 error_input
1507 $bad_argument = $csv->error_input ();
1508
1509 This method returns the erroneous argument (if it exists) of "combine"
1510 or "parse", whichever was called more recently. If the last
1511 invocation was successful, "error_input" will return "undef".
1512
1513 Depending on the type of error, it might also hold the data for the
1514 last error-input of "getline".
1515
1516 error_diag
1517 Text::CSV->error_diag ();
1518 $csv->error_diag ();
1519 $error_code = 0 + $csv->error_diag ();
1520 $error_str = "" . $csv->error_diag ();
1521 ($cde, $str, $pos, $rec, $fld) = $csv->error_diag ();
1522
1523 If (and only if) an error occurred, this function returns the
1524 diagnostics of that error.
1525
1526 If called in void context, this will print the internal error code and
1527 the associated error message to STDERR.
1528
1529 If called in list context, this will return the error code and the
1530 error message in that order. If the last error was from parsing, the
1531 rest of the values returned are a best guess at the location within
1532 the line that was being parsed. Their values are 1-based. The
1533 position currently is index of the byte at which the parsing failed in
1534 the current record. It might change to be the index of the current
1535 character in a later release. The record number is the index of the record
1536 parsed by the csv instance. The field number is the index of the field
1537 the parser thinks it is currently trying to parse. See
1538 examples/csv-check for how this can be used.
1539
1540 If called in scalar context, it will return the diagnostics in a
1541 single scalar, a-la $!. It will contain the error code in numeric
1542 context, and the diagnostics message in string context.
1543
1544 When called as a class method or a direct function call, the
1545 diagnostics are that of the last "new" call.
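
The list and scalar contexts can be compared with a sketch like this. It assumes "auto_diag" is left off and uses a deliberately broken line (a quoted field that is never closed); the number of location values returned may differ between versions, so undefined ones are printed as "-".

```perl
use strict;
use warnings;
use Text::CSV;

# auto_diag is left off so the error can be inspected by hand
my $csv = Text::CSV->new ({ binary => 1 });

# The quoted field is never closed, so parsing is expected to fail
my $ok = $csv->parse (q{1,"unterminated});
unless ($ok) {
    # List context: code, message, then the best-guess location values
    my ($code, $str, @where) = $csv->error_diag ();
    print "error $code: $str\n";
    print "position/record/field: ",
        join ("/", map { defined $_ ? $_ : "-" } @where), "\n";

    # Scalar context: one value that is numeric and string at once
    my $diag = $csv->error_diag ();
    printf "code %d, message %s\n", 0 + $diag, "" . $diag;
    }
```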
1546
1547 record_number
1548 $recno = $csv->record_number ();
1549
1550 Returns the number of records parsed by this csv instance. This value
1551 should be more accurate than $. when embedded newlines come into play.
1552 Records written by this instance are not counted.
1553
1554 SetDiag
1555 $csv->SetDiag (0);
1556
1557 Use to reset the diagnostics if you are dealing with errors.
1558
1560 backend
1561 Returns the backend module name called by Text::CSV. "module" is
1562 an alias.
1563
1564 is_xs
1565 Returns true value if Text::CSV uses an XS backend.
1566
1567 is_pp
1568 Returns true value if Text::CSV uses a pure-Perl backend.
1569
1571 This section is also taken from Text::CSV_XS.
1572
1573 csv
1574 This function is not exported by default and should be explicitly
1575 requested:
1576
1577 use Text::CSV qw( csv );
1578
1579 This is a high-level function that aims at simple (user) interfaces.
1580 This can be used to read/parse a "CSV" file or stream (the default
1581 behavior) or to produce a file or write to a stream (define the "out"
1582 attribute). It returns an array- or hash-reference on parsing (or
1583 "undef" on fail) or the numeric value of "error_diag" on writing.
1584 When this function fails you can get to the error using the class call
1585 to "error_diag":
1586
1587 my $aoa = csv (in => "test.csv") or
1588 die Text::CSV->error_diag;
1589
1590 This function takes the arguments as key-value pairs. This can be
1591 passed as a list or as an anonymous hash:
1592
1593 my $aoa = csv ( in => "test.csv", sep_char => ";");
1594 my $aoh = csv ({ in => $fh, headers => "auto" });
1595
1596 The arguments passed consist of two parts: the arguments to "csv"
1597 itself and the optional attributes to the "CSV" object used inside
1598 the function as enumerated and explained in "new".
1599
1600 If not overridden, the default options used for CSV are
1601
1602 auto_diag => 1
1603 escape_null => 0
1604
1605 The option that is always set and cannot be altered is
1606
1607 binary => 1
1608
1609 As this function will likely be used in one-liners, it allows "quote"
1610 to be abbreviated as "quo", and "escape_char" to be abbreviated as
1611 "esc" or "escape".
1612
1613 Alternative invocations:
1614
1615 my $aoa = Text::CSV::csv (in => "file.csv");
1616
1617 my $csv = Text::CSV->new ();
1618 my $aoa = $csv->csv (in => "file.csv");
1619
1620 In the latter case, the object attributes are used from the existing
1621 object and the attribute arguments in the function call are ignored:
1622
1623 my $csv = Text::CSV->new ({ sep_char => ";" });
1624 my $aoh = $csv->csv (in => "file.csv", bom => 1);
1625
1626 will parse using ";" as "sep_char", not ",".
1627
1628 in
1629
1630 Used to specify the source. "in" can be a file name (e.g. "file.csv"),
1631 which will be opened for reading and closed when finished, a file
1632 handle (e.g. $fh or "FH"), a reference to a glob (e.g. "\*ARGV"),
1633 the glob itself (e.g. *STDIN), or a reference to a scalar (e.g.
1634 "\q{1,2,"csv"}").
1635
1636 When used with "out", "in" should be a reference to a CSV structure
1637 (AoA or AoH) or a CODE-ref that returns an array-reference or a hash-
1638 reference. The code-ref will be invoked with no arguments.
1639
1640 my $aoa = csv (in => "file.csv");
1641
1642 open my $fh, "<", "file.csv" or die "file.csv: $!";
1643 my $aoa = csv (in => $fh);
1644
1645 my $csv = [ [qw( Foo Bar )], [ 1, 2 ], [ 2, 3 ]];
1646 my $err = csv (in => $csv, out => "file.csv");
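
As a self-contained sketch of both directions, the following parses from a reference to a scalar and then writes the resulting structure to another scalar. The sample data is made up, and "eol" is set explicitly so that the output does not depend on the default line ending.

```perl
use strict;
use warnings;
use Text::CSV qw( csv );

# Parse: "in" is a reference to a scalar holding CSV text
my $src = "code,name\n1,pc\n2,mouse\n";
my $aoa = csv (in => \$src);
print scalar (@{$aoa}), " rows, second row: @{$aoa->[1]}\n";

# Write: "in" is the AoA, "out" is a reference to a scalar
my $out = "";
csv (in => $aoa, out => \$out, sep_char => ";", eol => "\n");
print $out;
```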
1647
1648 If called in void context without the "out" attribute, the resulting
1649 ref will be used as input to a subsequent call to csv:
1650
1651 csv (in => "file.csv", filter => { 2 => sub { length > 2 }})
1652
1653 will be a shortcut to
1654
1655 csv (in => csv (in => "file.csv", filter => { 2 => sub { length > 2 }}))
1656
1657 where, in the absence of the "out" attribute, this is a shortcut to
1658
1659 csv (in => csv (in => "file.csv", filter => { 2 => sub { length > 2 }}),
1660 out => *STDOUT)
1661
1662 out
1663
1664 csv (in => $aoa, out => "file.csv");
1665 csv (in => $aoa, out => $fh);
1666 csv (in => $aoa, out => STDOUT);
1667 csv (in => $aoa, out => *STDOUT);
1668 csv (in => $aoa, out => \*STDOUT);
1669 csv (in => $aoa, out => \my $data);
1670 csv (in => $aoa, out => undef);
1671 csv (in => $aoa, out => \"skip");
1672
1673 csv (in => $fh, out => \@aoa);
1674 csv (in => $fh, out => \@aoh, bom => 1);
1675 csv (in => $fh, out => \%hsh, key => "key");
1676
1677 In output mode, the default CSV options when producing CSV are
1678
1679 eol => "\r\n"
1680
1681 The "fragment" attribute is ignored in output mode.
1682
1683 "out" can be a file name (e.g. "file.csv"), which will be opened for
1684 writing and closed when finished, a file handle (e.g. $fh or "FH"), a
1685 reference to a glob (e.g. "\*STDOUT"), the glob itself (e.g. *STDOUT),
1686 or a reference to a scalar (e.g. "\my $data").
1687
1688 csv (in => sub { $sth->fetch }, out => "dump.csv");
1689 csv (in => sub { $sth->fetchrow_hashref }, out => "dump.csv",
1690 headers => $sth->{NAME_lc});
1691
1692 When a code-ref is used for "in", the output is generated per
1693 invocation, so no buffering is involved. This implies that there is no
1694 size restriction on the number of records. The "csv" function ends when
1695 the coderef returns a false value.
1696
1697 If "out" is set to a reference of the literal string "skip", the output
1698 will be suppressed completely, which might be useful in combination
1699 with a filter for side effects only.
1700
1701 my %cache;
1702 csv (in => "dump.csv",
1703 out => \"skip",
1704 on_in => sub { $cache{$_[1][1]}++ });
1705
1706 Currently, setting "out" to any false value ("undef", "", 0) will be
1707 equivalent to "\"skip"".
1708
1709 If the "in" argument points to something to parse, and the "out" is set
1710 to a reference to an "ARRAY" or a "HASH", the output is appended to the
1711 data in the existing reference. The result of the parse should match
1712 what exists in the reference passed. This might come in handy when you
1713 have to parse a set of files with similar content (like data stored per
1714 period) and you want to collect that into a single data structure:
1715
1716 my %hash;
1717 csv (in => $_, out => \%hash, key => "id") for sort glob "foo-[0-9]*.csv";
1718
1719 my @list; # List of arrays
1720 csv (in => $_, out => \@list) for sort glob "foo-[0-9]*.csv";
1721
1722 my @list; # List of hashes
1723 csv (in => $_, out => \@list, bom => 1) for sort glob "foo-[0-9]*.csv";
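
The appending behavior can be sketched without files by parsing from in-memory strings. The two sources below are made up and stand in for the per-period files; this assumes a version of the module recent enough to accept a hash reference for "out".

```perl
use strict;
use warnings;
use Text::CSV qw( csv );

# Two sources with the same layout, standing in for per-period files
my @sources = ("id,name\n1,pc\n", "id,name\n2,mouse\n");

my %hash;
for my $src (@sources) {
    # Each call adds its records to the same hash, keyed on "id"
    csv (in => \$src, out => \%hash, key => "id");
    }

print join (",", map { "$_=$hash{$_}{name}" } sort keys %hash), "\n";
```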
1724
1725 encoding
1726
1727 If passed, it should be an encoding accepted by the :encoding()
1728 option to "open". There is no default value. This attribute does not
1729 work in perl 5.6.x. "encoding" can be abbreviated to "enc" for ease of
1730 use in command line invocations.
1731
1732 If "encoding" is set to the literal value "auto", the method "header"
1733 will be invoked on the opened stream to check if there is a BOM and set
1734 the encoding accordingly. This is equal to passing a true value in
1735 the option "detect_bom".
1736
1737 Encodings can be stacked, as supported by "binmode":
1738
1739 # Using PerlIO::via::gzip
1740 csv (in => \@csv,
1741 out => "test.csv:via.gz",
1742 encoding => ":via(gzip):encoding(utf-8)",
1743 );
1744 $aoa = csv (in => "test.csv:via.gz", encoding => ":via(gzip)");
1745
1746 # Using PerlIO::gzip
1747 csv (in => \@csv,
1748 out => "test.csv:gzip.gz",
1749 encoding => ":gzip:encoding(utf-8)",
1750 );
1751 $aoa = csv (in => "test.csv:gzip.gz", encoding => ":gzip");
1752
1753 detect_bom
1754
1755 If "detect_bom" is given, the method "header" will be invoked on
1756 the opened stream to check if there is a BOM and set the encoding
1757 accordingly.
1758
1759 "detect_bom" can be abbreviated to "bom".
1760
1761 This is the same as setting "encoding" to "auto".
1762
1763 Note that as the method "header" is invoked, its default is to also
1764 set the headers.
1765
1766 headers
1767
1768 If this attribute is not given, the default behavior is to produce an
1769 array of arrays.
1770
1771 If "headers" is supplied, it should be an anonymous list of column
1772 names, an anonymous hashref, a coderef, or a literal flag: "auto",
1773 "lc", "uc", or "skip".
1774
1775 skip
1776 When "skip" is used, the header will not be included in the output.
1777
1778 my $aoa = csv (in => $fh, headers => "skip");
1779
1780 auto
1781 If "auto" is used, the first line of the "CSV" source will be read as
1782 the list of field headers and used to produce an array of hashes.
1783
1784 my $aoh = csv (in => $fh, headers => "auto");
1785
1786 lc
1787 If "lc" is used, the first line of the "CSV" source will be read as
1788 the list of field headers mapped to lower case and used to produce
1789 an array of hashes. This is a variation of "auto".
1790
1791 my $aoh = csv (in => $fh, headers => "lc");
1792
1793 uc
1794 If "uc" is used, the first line of the "CSV" source will be read as
1795 the list of field headers mapped to upper case and used to produce
1796 an array of hashes. This is a variation of "auto".
1797
1798 my $aoh = csv (in => $fh, headers => "uc");
1799
1800 CODE
1801 If a coderef is used, the first line of the "CSV" source will be
1802 read as the list of mangled field headers in which each field is
1803 passed as the only argument to the coderef. This list is used to
1804 produce an array of hashes.
1805
1806 my $aoh = csv (in => $fh,
1807 headers => sub { lc ($_[0]) =~ s/kode/code/gr });
1808
1809 This example is a variation of using "lc" where all occurrences of
1810 "kode" are replaced with "code".
1811
1812 ARRAY
1813 If "headers" is an anonymous list, the entries in the list will be
1814 used as field names. The first line is considered data instead of
1815 headers.
1816
1817 my $aoh = csv (in => $fh, headers => [qw( Foo Bar )]);
1818 csv (in => $aoa, out => $fh, headers => [qw( code description price )]);
1819
1820 HASH
1821 If "headers" is a hash reference, this implies "auto", but header
1822 fields that exist as key in the hashref will be replaced by the value
1823 for that key. Given a CSV file like
1824
1825 post-kode,city,name,id number,fubble
1826 1234AA,Duckstad,Donald,13,"X313DF"
1827
1828 using
1829
1830 csv (headers => { "post-kode" => "pc", "id number" => "ID" }, ...
1831
1832 will return an entry like
1833
1834 { pc => "1234AA",
1835 city => "Duckstad",
1836 name => "Donald",
1837 ID => "13",
1838 fubble => "X313DF",
1839 }
1840
1841 See also "munge_column_names" and "set_column_names".
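
As a rough, self-contained sketch of the HASH form described above, the sample data can be held in an in-memory string and passed as a reference to a scalar. The variable names are only illustrative.

```perl
use strict;
use warnings;
use Text::CSV qw( csv );

# Sample data kept in memory (taken from the example above).
my $data = <<'CSV';
post-kode,city,name,id number,fubble
1234AA,Duckstad,Donald,13,"X313DF"
CSV

# Rename two header fields; all other fields keep their original names.
my $aoh = csv (
    in      => \$data,
    headers => { "post-kode" => "pc", "id number" => "ID" },
    );

printf "%s lives in %s (%s)\n",
    $aoh->[0]{name}, $aoh->[0]{city}, $aoh->[0]{pc};
```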
1842
1843 munge_column_names
1844
1845 If "munge_column_names" is set, the method "header" is invoked on
1846 the opened stream with all matching arguments to detect and set the
1847 headers.
1848
1849 "munge_column_names" can be abbreviated to "munge".
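
A minimal sketch of the effect, assuming invented in-memory sample data and the "uc" munge mode of the "header" method: the header fields are upper-cased before they are used as hash keys.

```perl
use strict;
use warnings;
use Text::CSV qw( csv );

# Hypothetical sample data with mixed-case header fields.
my $data = "Code,Product\n1,pc\n2,keyboard\n";

# "munge" is the abbreviation of "munge_column_names"; as the "header"
# method also sets the column names, the result is an array of hashes.
my $aoh = csv (in => \$data, munge => "uc");

print join (",", sort keys %{$aoh->[0]}), "\n";
```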
1850
1851 key
1852
1853 If passed, will default "headers" to "auto" and return a hashref
1854 instead of an array of hashes. Allowed values are simple scalars or
1855 array-references where the first element is the joiner and the rest are
1856 the fields to join to combine the key.
1857
1858 my $ref = csv (in => "test.csv", key => "code");
1859 my $ref = csv (in => "test.csv", key => [ ":" => "code", "color" ]);
1860
1861 with test.csv like
1862
1863 code,product,price,color
1864 1,pc,850,gray
1865 2,keyboard,12,white
1866 3,mouse,5,black
1867
1868 the first example will return
1869
1870 { 1 => {
1871 code => 1,
1872 color => 'gray',
1873 price => 850,
1874 product => 'pc'
1875 },
1876 2 => {
1877 code => 2,
1878 color => 'white',
1879 price => 12,
1880 product => 'keyboard'
1881 },
1882 3 => {
1883 code => 3,
1884 color => 'black',
1885 price => 5,
1886 product => 'mouse'
1887 }
1888 }
1889
1890 the second example will return
1891
1892 { "1:gray" => {
1893 code => 1,
1894 color => 'gray',
1895 price => 850,
1896 product => 'pc'
1897 },
1898 "2:white" => {
1899 code => 2,
1900 color => 'white',
1901 price => 12,
1902 product => 'keyboard'
1903 },
1904 "3:black" => {
1905 code => 3,
1906 color => 'black',
1907 price => 5,
1908 product => 'mouse'
1909 }
1910 }
1911
1912 The "key" attribute can be combined with "headers" for "CSV" data that
1913 has no header line, like
1914
1915 my $ref = csv (
1916 in => "foo.csv",
1917 headers => [qw( c_foo foo bar description stock )],
1918 key => "c_foo",
1919 );
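
The same combination can be tried without a file. This is only a sketch with made-up data and column names; the first line is treated as data because "headers" is an array reference.

```perl
use strict;
use warnings;
use Text::CSV qw( csv );

# Header-less sample data (invented for this illustration).
my $data = "1,pc,850\n2,keyboard,12\n3,mouse,5\n";

# Supply the column names and index the records by the "code" column.
my $ref = csv (
    in      => \$data,
    headers => [qw( code product price )],
    key     => "code",
    );

print "$_: $ref->{$_}{product}\n" for sort keys %$ref;
```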
1920
1921 value
1922
1923 Used to create key-value hashes.
1924
1925 Only allowed when "key" is valid. A "value" can be either a single
1926 column label or an anonymous list of column labels. In the first case,
1927 the value will be a simple scalar value, in the latter case, it will be
1928 a hashref.
1929
1930 my $ref = csv (in => "test.csv", key => "code",
1931 value => "price");
1932 my $ref = csv (in => "test.csv", key => "code",
1933 value => [ "product", "price" ]);
1934 my $ref = csv (in => "test.csv", key => [ ":" => "code", "color" ],
1935 value => "price");
1936 my $ref = csv (in => "test.csv", key => [ ":" => "code", "color" ],
1937 value => [ "product", "price" ]);
1938
1939 with test.csv like
1940
1941 code,product,price,color
1942 1,pc,850,gray
1943 2,keyboard,12,white
1944 3,mouse,5,black
1945
1946 the first example will return
1947
1948 { 1 => 850,
1949 2 => 12,
1950 3 => 5,
1951 }
1952
1953 the second example will return
1954
1955 { 1 => {
1956 price => 850,
1957 product => 'pc'
1958 },
1959 2 => {
1960 price => 12,
1961 product => 'keyboard'
1962 },
1963 3 => {
1964 price => 5,
1965 product => 'mouse'
1966 }
1967 }
1968
1969 the third example will return
1970
1971 { "1:gray" => 850,
1972 "2:white" => 12,
1973 "3:black" => 5,
1974 }
1975
1976 the fourth example will return
1977
1978 { "1:gray" => {
1979 price => 850,
1980 product => 'pc'
1981 },
1982 "2:white" => {
1983 price => 12,
1984 product => 'keyboard'
1985 },
1986 "3:black" => {
1987 price => 5,
1988 product => 'mouse'
1989 }
1990 }
1991
1992 keep_headers
1993
1994 When using hashes, store the column names in the arrayref passed, so
1995 all headers are available after the call in the original order.
1996
1997 my $aoh = csv (in => "file.csv", keep_headers => \my @hdr);
1998
1999 This attribute can be abbreviated to "kh" or passed as
2000 "keep_column_names".
2001
2002 This attribute implies a default of "auto" for the "headers" attribute.
2003
2004 The headers can also be kept internally to keep stable header order:
2005
2006 csv (in => csv (in => "file.csv", kh => "internal"),
2007 out => "new.csv",
2008 kh => "internal");
2009
2010 where "internal" can also be 1, "yes", or "true". This is similar to
2011
2012 my @h;
2013 csv (in => csv (in => "file.csv", kh => \@h),
2014 out => "new.csv",
2015 headers => \@h);
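
A small sketch of the arrayref form, assuming in-memory sample data: after the call, the array holds the column names in file order, so the hashes can be printed with a stable column order.

```perl
use strict;
use warnings;
use Text::CSV qw( csv );

# Invented sample data.
my $data = "code,product,price\n1,pc,850\n2,keyboard,12\n";

# keep_headers implies headers => "auto" and fills @hdr.
my $aoh = csv (in => \$data, keep_headers => \my @hdr);

print join (",", @hdr), "\n";
print join (",", @{$_}{@hdr}), "\n" for @$aoh;
```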
2016
2017 fragment
2018
2019 Only output the fragment as defined in the "fragment" method. This
2020 option is ignored when generating "CSV". See "out".
2021
2022 Combining all of them could give something like
2023
2024 use Text::CSV qw( csv );
2025 my $aoh = csv (
2026 in => "test.txt",
2027 encoding => "utf-8",
2028 headers => "auto",
2029 sep_char => "|",
2030 fragment => "row=3;6-9;15-*",
2031 );
2032 say $aoh->[15]{Foo};
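
The row selection on its own can be sketched as follows, assuming six generated lines of sample data and no header handling. The specification selects rows 2 to 3 and row 5 up to the last row.

```perl
use strict;
use warnings;
use Text::CSV qw( csv );

# Six generated sample lines: "1,row1" up to "6,row6".
my $data = join "", map { "$_,row$_\n" } 1 .. 6;

# Rows are numbered from 1; "*" stands for the last row.
my $aoa = csv (in => \$data, fragment => "row=2-3;5-*");

print join (",", @$_), "\n" for @$aoa;
```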
2033
2034 sep_set
2035
2036 If "sep_set" is set, the method "header" is invoked on the opened
2037 stream to detect and set "sep_char" with the given set.
2038
2039 "sep_set" can be abbreviated to "seps".
2040
2041 Note that as the "header" method is invoked, its default is to also
2042 set the headers.
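
A sketch of separator detection, assuming semicolon-separated sample data held in memory. Because "header" is invoked, the first line is also used for the column names.

```perl
use strict;
use warnings;
use Text::CSV qw( csv );

# Invented sample data that uses ";" as separator.
my $data = "code;product\n1;pc\n2;keyboard\n";

# Let "header" pick the separator out of the given set.
my $aoh = csv (in => \$data, sep_set => [ ";", "\t", "|" ]);

print "$_->{code} = $_->{product}\n" for @$aoh;
```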
2043
2044 set_column_names
2045
2046 If "set_column_names" is passed, the method "header" is invoked on
2047 the opened stream with all arguments meant for "header".
2048
2049 If "set_column_names" is passed as a false value, the content of the
2050 first row is only preserved if the output is AoA:
2051
2052 With an input-file like
2053
2054 bAr,foo
2055 1,2
2056 3,4,5
2057
2058 This call
2059
2060 my $aoa = csv (in => $file, set_column_names => 0);
2061
2062 will result in
2063
2064 [[ "bar", "foo" ],
2065 [ "1", "2" ],
2066 [ "3", "4", "5" ]]
2067
2068 and
2069
2070 my $aoa = csv (in => $file, set_column_names => 0, munge => "none");
2071
2072 will result in
2073
2074 [[ "bAr", "foo" ],
2075 [ "1", "2" ],
2076 [ "3", "4", "5" ]]
2077
2078 Callbacks
2079 Callbacks enable actions triggered from the inside of Text::CSV.
2080
2081 While most of what this enables can easily be done in an unrolled
2082 loop as described in the "SYNOPSIS", callbacks can be used to meet
2083 special demands or enhance the "csv" function.
2084
2085 error
2086 $csv->callbacks (error => sub { $csv->SetDiag (0) });
2087
2088 The "error" callback is invoked when an error occurs, but only
2089 when "auto_diag" is set to a true value. A callback is invoked with
2090 the values returned by "error_diag":
2091
2092 my ($c, $s);
2093
2094 sub ignore3006 {
2095 my ($err, $msg, $pos, $recno, $fldno) = @_;
2096 if ($err == 3006) {
2097 # ignore this error
2098 ($c, $s) = (undef, undef);
2099 Text::CSV->SetDiag (0);
2100 }
2101 # Any other error
2102 return;
2103 } # ignore3006
2104
2105 $csv->callbacks (error => \&ignore3006);
2106 $csv->bind_columns (\$c, \$s);
2107 while ($csv->getline ($fh)) {
2108 # Error 3006 will not stop the loop
2109 }
2110
2111 after_parse
2112 $csv->callbacks (after_parse => sub { push @{$_[1]}, "NEW" });
2113 while (my $row = $csv->getline ($fh)) {
2114 $row->[-1] eq "NEW";
2115 }
2116
2117 This callback is invoked after parsing with "getline" only if no
2118 error occurred. The callback is invoked with two arguments: the
2119 current "CSV" parser object and an array reference to the fields
2120 parsed.
2121
2122 The return code of the callback is ignored unless it is a reference
2123 to the string "skip", in which case the record will be skipped in
2124 "getline_all".
2125
2126 sub add_from_db {
2127 my ($csv, $row) = @_;
2128 $sth->execute ($row->[4]);
2129 push @$row, $sth->fetchrow_array;
2130 } # add_from_db
2131
2132 my $aoa = csv (in => "file.csv", callbacks => {
2133 after_parse => \&add_from_db });
2134
2135 This hook can be used for validation:
2136
2137 FAIL
2138 Die if any of the records does not validate a rule:
2139
2140 after_parse => sub {
2141 $_[1][4] =~ m/^[0-9]{4}\s?[A-Z]{2}$/ or
2142 die "5th field does not have a valid Dutch zipcode";
2143 }
2144
2145 DEFAULT
2146 Replace invalid fields with a default value:
2147
2148 after_parse => sub { $_[1][2] =~ m/^\d+$/ or $_[1][2] = 0 }
2149
2150 SKIP
2151 Skip records that have invalid fields (only applies to
2152 "getline_all"):
2153
2154 after_parse => sub { $_[1][0] =~ m/^\d+$/ or return \"skip"; }
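
As a self-contained sketch of an "after_parse" hook used with the "csv" function, the following appends a computed field to every record. The data and the computation are invented for illustration.

```perl
use strict;
use warnings;
use Text::CSV qw( csv );

# Two numeric fields per line (sample data).
my $data = "2,3\n10,5\n";

my $aoa = csv (
    in        => \$data,
    callbacks => {
        # Called with the parser object and the parsed fields.
        after_parse => sub {
            my ($csv, $row) = @_;
            push @$row, $row->[0] * $row->[1];
            },
        },
    );

print join (",", @$_), "\n" for @$aoa;
```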
2155
2156 before_print
2157 my $idx = 1;
2158 $csv->callbacks (before_print => sub { $_[1][0] = $idx++ });
2159 $csv->print (*STDOUT, [ 0, $_ ]) for @members;
2160
2161 This callback is invoked before printing with "print" only if no
2162 error occurred. The callback is invoked with two arguments: the
2163 current "CSV" parser object and an array reference to the fields
2164 passed.
2165
2166 The return code of the callback is ignored.
2167
2168 sub max_4_fields {
2169 my ($csv, $row) = @_;
2170 @$row > 4 and splice @$row, 4;
2171 } # max_4_fields
2172
2173 csv (in => csv (in => "file.csv"), out => *STDOUT,
2174 callbacks => { before_print => \&max_4_fields });
2175
2176 This callback is not active for "combine".
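
A sketch of "before_print" that can be run without touching STDOUT, assuming a hypothetical member list and output captured in a scalar through "out":

```perl
use strict;
use warnings;
use Text::CSV qw( csv );

# Hypothetical member names.
my @members = ("alice", "bob");
my $idx     = 1;

# Replace the placeholder in the first field just before printing.
csv (
    in        => [ map { [ 0, $_ ] } @members ],
    out       => \my $out,
    callbacks => { before_print => sub { $_[1][0] = $idx++ } },
    );

print $out;
```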
2177
2178 Callbacks for csv ()
2179
2180 The "csv" function allows for some callbacks that are not integrated
2181 into the XS internals and are only available through the "csv" function.
2182
2183 csv (in => "file.csv",
2184 callbacks => {
2185 filter => { 6 => sub { $_ > 15 } }, # first
2186 after_parse => sub { say "AFTER PARSE"; }, # first
2187 after_in => sub { say "AFTER IN"; }, # second
2188 on_in => sub { say "ON IN"; }, # third
2189 },
2190 );
2191
2192 csv (in => $aoh,
2193 out => "file.csv",
2194 callbacks => {
2195 on_in => sub { say "ON IN"; }, # first
2196 before_out => sub { say "BEFORE OUT"; }, # second
2197 before_print => sub { say "BEFORE PRINT"; }, # third
2198 },
2199 );
2200
2201 filter
2202 This callback can be used to filter records. It is called just after
2203 a new record has been scanned. The callback accepts a:
2204
2205 hashref
2206 The keys are the index to the row (the field name or field number,
2207 1-based) and the values are subs to return a true or false value.
2208
2209 csv (in => "file.csv", filter => {
2210 3 => sub { m/a/ }, # third field should contain an "a"
2211 5 => sub { length > 4 }, # length of the 5th field minimal 5
2212 });
2213
2214 csv (in => "file.csv", filter => { foo => sub { $_ > 4 }});
2215
2216 If the keys to the filter hash contain any character that is not a
2217 digit it will also implicitly set "headers" to "auto" unless
2218 "headers" was already passed as argument. When headers are
2219 active, returning an array of hashes, the filter is not applicable
2220 to the header itself.
2221
2222 All sub results should match, as in AND.
2223
2224 The context of the callback sets $_ localized to the field
2225 indicated by the filter. The two arguments are as with all other
2226 callbacks, so the other fields in the current row can be seen:
2227
2228 filter => { 3 => sub { $_ > 100 ? $_[1][1] =~ m/A/ : $_[1][6] =~ m/B/ }}
2229
2230 If the context is set to return a list of hashes ("headers" is
2231 defined), the current record will also be available in the
2232 localized %_:
2233
2234 filter => { 3 => sub { $_ > 100 && $_{foo} =~ m/A/ && $_{bar} < 1000 }}
2235
2236 If the filter is used to alter the content by changing $_, make
2237 sure that the sub returns true in order not to have that record
2238 skipped:
2239
2240 filter => { 2 => sub { $_ = uc }}
2241
2242 will upper-case the second field, and then skip it if the resulting
2243 content evaluates to false. To always accept, end with truth:
2244
2245 filter => { 2 => sub { $_ = uc; 1 }}
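
Putting the hashref form together: a sketch with invented data and numeric keys, so no headers are implied. Both subs have to return true for a record to be kept.

```perl
use strict;
use warnings;
use Text::CSV qw( csv );

# Sample data without a header line.
my $data = "1,apple,5\n2,banana,15\n3,cherry,25\n4,banana,30\n";

my $aoa = csv (in => \$data, filter => {
    2 => sub { m/an/ },      # 2nd field should contain "an"
    3 => sub { $_ > 10 },    # 3rd field should exceed 10
    });

print join (",", @$_), "\n" for @$aoa;
```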
2246
2247 coderef
2248 csv (in => "file.csv", filter => sub { $n++; 0; });
2249
2250 If the argument to "filter" is a coderef, it is an alias or
2251 shortcut to a filter on column 0:
2252
2253 csv (filter => sub { $n++; 0 });
2254
2255 is equal to
2256
2257 csv (filter => { 0 => sub { $n++; 0 } });
2258
2259 filter-name
2260 csv (in => "file.csv", filter => "not_blank");
2261 csv (in => "file.csv", filter => "not_empty");
2262 csv (in => "file.csv", filter => "filled");
2263
2264 These are predefined filters.
2265
2266 Given a file like (line numbers prefixed for doc purpose only):
2267
2268 1:1,2,3
2269 2:
2270 3:,
2271 4:""
2272 5:,,
2273 6:, ,
2274 7:"",
2275 8:" "
2276 9:4,5,6
2277
2278 not_blank
2279 Filter out the blank lines
2280
2281 This filter is a shortcut for
2282
2283 filter => { 0 => sub { @{$_[1]} > 1 or
2284 defined $_[1][0] && $_[1][0] ne "" } }
2285
2286 Due to the implementation, it is currently impossible to also
2287 filter lines that consist only of a quoted empty field. These
2288 lines are also considered blank lines.
2289
2290 With the given example, lines 2 and 4 will be skipped.
2291
2292 not_empty
2293 Filter out lines where all the fields are empty.
2294
2295 This filter is a shortcut for
2296
2297 filter => { 0 => sub { grep { defined && $_ ne "" } @{$_[1]} } }
2298
2299 A space is not regarded as being empty, so given the example data,
2300 lines 2, 3, 4, 5, and 7 are skipped.
2301
2302 filled
2303 Filter out lines that have no visible data
2304
2305 This filter is a shortcut for
2306
2307 filter => { 0 => sub { grep { defined && m/\S/ } @{$_[1]} } }
2308
2309 This filter rejects all lines that do not have at least one field
2310 containing a non-whitespace character.
2311
2312 With the given example data, this filter would skip lines 2
2313 through 8.
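
The three predefined filters can be compared on the example data with a short sketch like this one. The data is the nine-line file from above held in memory; the %kept hash is only an illustration.

```perl
use strict;
use warnings;
use Text::CSV qw( csv );

# The nine example lines, without the line-number prefixes.
my $data = join "\n",
    '1,2,3', '', ',', '""', ',,', ', ,', '"",', '" "', '4,5,6', '';

# Count how many records each predefined filter lets through.
my %kept;
for my $name (qw( not_blank not_empty filled )) {
    my $aoa = csv (in => \$data, filter => $name);
    $kept{$name} = scalar @$aoa;
    printf "%-9s keeps %d of 9 lines\n", $name, $kept{$name};
    }
```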
2314
2315 One could also use modules like Types::Standard:
2316
2317 use Types::Standard -types;
2318
2319 my $type = Tuple[Str, Str, Int, Bool, Optional[Num]];
2320 my $check = $type->compiled_check;
2321
2322 # filter with compiled check and warnings
2323 my $aoa = csv (
2324 in => \$data,
2325 filter => {
2326 0 => sub {
2327 my $ok = $check->($_[1]) or
2328 warn $type->get_message ($_[1]), "\n";
2329 return $ok;
2330 },
2331 },
2332 );
2333
2334 after_in
2335 This callback is invoked for each record after all records have been
2336 parsed but before returning the reference to the caller. The hook is
2337 invoked with two arguments: the current "CSV" parser object and a
2338 reference to the record. The reference can be a reference to a
2339 HASH or a reference to an ARRAY as determined by the arguments.
2340
2341 This callback can also be passed as an attribute without the
2342 "callbacks" wrapper.
2343
2344 before_out
2345 This callback is invoked for each record before the record is
2346 printed. The hook is invoked with two arguments: the current "CSV"
2347 parser object and a reference to the record. The reference can be a
2348 reference to a HASH or a reference to an ARRAY as determined by the
2349 arguments.
2350
2351 This callback can also be passed as an attribute without the
2352 "callbacks" wrapper.
2353
2354 This callback makes the row available in %_ if the row is a hashref.
2355 In this case %_ is writable and will change the original row.
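
A sketch of "before_out" passed as a plain attribute, assuming an invented array of arrays and output captured in a scalar. Note that the hook receives a reference to the record, so the rows in $aoa are changed as well.

```perl
use strict;
use warnings;
use Text::CSV qw( csv );

# Sample records to be written.
my $aoa = [[ 1, "pc", 850 ], [ 2, "keyboard", 12 ]];

# Upper-case the second field of each record before it is printed.
csv (
    in         => $aoa,
    out        => \my $out,
    before_out => sub {
        my ($csv, $row) = @_;
        $row->[1] = uc $row->[1];
        },
    );

print $out;
```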
2356
2357 on_in
2358 This callback acts exactly as the "after_in" or the "before_out"
2359 hooks.
2360
2361 This callback can also be passed as an attribute without the
2362 "callbacks" wrapper.
2363
2364 This callback makes the row available in %_ if the row is a hashref.
2365 In this case %_ is writable and will change the original row. So e.g.
2366 with
2367
2368 my $aoh = csv (
2369 in => \"foo\n1\n2\n",
2370 headers => "auto",
2371 on_in => sub { $_{bar} = 2; },
2372 );
2373
2374 $aoh will be:
2375
2376 [ { foo => 1,
2377 bar => 2,
2378 },
2379 { foo => 2,
2380 bar => 2,
2381 }
2382 ]
2383
2384 csv
2385 The function "csv" can also be called as a method or with an
2386 existing Text::CSV object. This could help if the function is to be
2387 invoked a lot of times and the overhead of creating the object
2388 internally over and over again would be prevented by passing an
2389 existing instance.
2390
2391 my $csv = Text::CSV->new ({ binary => 1, auto_diag => 1 });
2392
2393 my $aoa = $csv->csv (in => $fh);
2394 my $aoa = csv (in => $fh, csv => $csv);
2395
2396 both act the same. Running this 20000 times on a 20-line CSV file
2397 showed a 53% speedup.
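
A sketch of reusing one object for several inputs through the "csv" attribute. The in-memory inputs are invented, and no claim is made here about the speedup in any particular setup.

```perl
use strict;
use warnings;
use Text::CSV qw( csv );

# One parser object, created once.
my $csv = Text::CSV->new ({ binary => 1, auto_diag => 1 });

# Hypothetical inputs; each is parsed with the same object.
my @inputs = ("a,b\n1,2\n", "c,d\n3,4\n");
my @parsed;
for my $input (@inputs) {
    push @parsed, csv (in => \$input, csv => $csv);
    }

print scalar @parsed, " inputs parsed\n";
```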
2398
2400 This section is also taken from Text::CSV_XS.
2401
2402 Still under construction ...
2403
2404 If an error occurs, "$csv->error_diag" can be used to get information
2405 on the cause of the failure. Note that for speed reasons the internal
2406 value is never cleared on success, so using the value returned by
2407 "error_diag" in normal cases - when no error occurred - may cause
2408 unexpected results.
2409
2410 If the constructor failed, the cause can be found using "error_diag" as
2411 a class method, like "Text::CSV->error_diag".
2412
2413 The "$csv->error_diag" method is automatically invoked upon error when
2414 the constructor was called with "auto_diag" set to 1 or 2, or when
2415 autodie is in effect. When set to 1, this will cause a "warn" with the
2416 error message, when set to 2, it will "die". "2012 - EOF" is excluded
2417 from "auto_diag" reports.
2418
2419 Errors can be (individually) caught using the "error" callback.
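
The two ways of calling "error_diag" can be sketched as follows. The failing inputs are invented for illustration: a constructor call with conflicting attributes, and a line with an unterminated quoted field. In list context, "error_diag" returns the error code first, followed by the message.

```perl
use strict;
use warnings;
use Text::CSV;

# Constructor failure: sep_char equal to quote_char is not allowed.
my $bad = Text::CSV->new ({ sep_char => ";", quote_char => ";" });
my ($new_code, $new_msg) = Text::CSV->error_diag;
defined $bad or print "new () failed: $new_code $new_msg\n";

# Parse failure: the quoted field is never closed.
my $csv = Text::CSV->new ({ binary => 1 });
my $ok  = $csv->parse (q{1,"unterminated});
my ($code, $msg, $pos) = $csv->error_diag;
$ok or print "parse () failed: $code $msg\n";
```

Only inspect the values after a failure, as the internal value is not cleared on success.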
2420
2421 The errors as described below are available. I have tried to make the
2422 error itself explanatory enough, but more descriptions will be added.
2423 For most of these errors, the first three capitals describe the error
2424 category:
2425
2426 • INI
2427
2428 Initialization error or option conflict.
2429
2430 • ECR
2431
2432 Carriage-Return related parse error.
2433
2434 • EOF
2435
2436 End-Of-File related parse error.
2437
2438 • EIQ
2439
2440 Parse error inside quotation.
2441
2442 • EIF
2443
2444 Parse error inside field.
2445
2446 • ECB
2447
2448 Combine error.
2449
2450 • EHR
2451
2452 HashRef parse related error.
2453
2454 And below should be the complete list of error codes that can be
2455 returned:
2456
2457 • 1001 "INI - sep_char is equal to quote_char or escape_char"
2458
2459 The separation character cannot be equal to the quotation
2460 character or to the escape character, as this would invalidate all
2461 parsing rules.
2462
2463 • 1002 "INI - allow_whitespace with escape_char or quote_char SP or
2464 TAB"
2465
2466 Using the "allow_whitespace" attribute when either "quote_char" or
2467 "escape_char" is equal to "SPACE" or "TAB" is too ambiguous to
2468 allow.
2469
2470 • 1003 "INI - \r or \n in main attr not allowed"
2471
2472 Using default "eol" characters in either "sep_char", "quote_char",
2473 or "escape_char" is not allowed.
2474
2475 • 1004 "INI - callbacks should be undef or a hashref"
2476
2477 The "callbacks" attribute only accepts "undef" or a hash
2478 reference.
2479
2480 • 1005 "INI - EOL too long"
2481
2482 The value passed for EOL is exceeding its maximum length (16).
2483
2484 • 1006 "INI - SEP too long"
2485
2486 The value passed for SEP is exceeding its maximum length (16).
2487
2488 • 1007 "INI - QUOTE too long"
2489
2490 The value passed for QUOTE is exceeding its maximum length (16).
2491
2492 • 1008 "INI - SEP undefined"
2493
2494 The value passed for SEP should be defined and not empty.
2495
2496 • 1010 "INI - the header is empty"
2497
2498 The header line parsed in the "header" is empty.
2499
2500 • 1011 "INI - the header contains more than one valid separator"
2501
2502 The header line parsed in the "header" contains more than one
2503 (unique) separator character out of the allowed set of separators.
2504
2505 • 1012 "INI - the header contains an empty field"
2506
2507 The header line parsed in the "header" contains an empty field.
2508
2509 • 1013 "INI - the header contains nun-unique fields"
2510
2511 The header line parsed in the "header" contains at least two
2512 identical fields.
2513
2514 • 1014 "INI - header called on undefined stream"
2515
2516 The header line cannot be parsed from an undefined source.
2517
2518 • 1500 "PRM - Invalid/unsupported argument(s)"
2519
2520 Function or method called with invalid argument(s) or parameter(s).
2521
2522 • 1501 "PRM - The key attribute is passed as an unsupported type"
2523
2524 The "key" attribute is of an unsupported type.
2525
2526 • 1502 "PRM - The value attribute is passed without the key attribute"
2527
2528 The "value" attribute is only allowed when a valid key is given.
2529
2530 • 1503 "PRM - The value attribute is passed as an unsupported type"
2531
2532 The "value" attribute is of an unsupported type.
2533
2534 • 2010 "ECR - QUO char inside quotes followed by CR not part of EOL"
2535
2536 When "eol" has been set to anything but the default, like
2537 "\r\t\n", and the "\r" is following the second (closing)
2538 "quote_char", where the characters following the "\r" do not make up
2539 the "eol" sequence, this is an error.
2540
2541 • 2011 "ECR - Characters after end of quoted field"
2542
2543 Sequences like "1,foo,"bar"baz,22,1" are not allowed. "bar" is a
2544 quoted field and after the closing double-quote, there should be
2545 either a new-line sequence or a separation character.
2546
2547 • 2012 "EOF - End of data in parsing input stream"
2548
2549 Self-explanatory. End-of-file was reached while parsing a stream. Can
2550 happen only when reading from streams with "getline", as using
2551 "parse" is done on strings that are not required to have a trailing
2552 "eol".
2553
2554 • 2013 "INI - Specification error for fragments RFC7111"
2555
2556 Invalid specification for URI "fragment" specification.
2557
2558 • 2014 "ENF - Inconsistent number of fields"
2559
2560 Inconsistent number of fields under strict parsing.
2561
2562 • 2021 "EIQ - NL char inside quotes, binary off"
2563
2564 Sequences like "1,"foo\nbar",22,1" are allowed only when the binary
2565 option has been selected with the constructor.
2566
2567 • 2022 "EIQ - CR char inside quotes, binary off"
2568
2569 Sequences like "1,"foo\rbar",22,1" are allowed only when the binary
2570 option has been selected with the constructor.
2571
2572 • 2023 "EIQ - QUO character not allowed"
2573
2574 Sequences like ""foo "bar" baz",qu" and "2023,",2008-04-05,"Foo,
2575 Bar",\n" will cause this error.
2576
2577 • 2024 "EIQ - EOF cannot be escaped, not even inside quotes"
2578
2579 The escape character is not allowed as last character in an input
2580 stream.
2581
2582 • 2025 "EIQ - Loose unescaped escape"
2583
2584 An escape character should escape only characters that need escaping.
2585
2586 Allowing the escape for other characters is possible with the
2587 attribute "allow_loose_escapes".
2588
2589 • 2026 "EIQ - Binary character inside quoted field, binary off"
2590
2591 Binary characters are not allowed by default. Fields that contain
2592 valid UTF-8 are an exception: they will automatically be upgraded.
2593 Set "binary" to 1 to accept binary
2594 data.
2595
2596 • 2027 "EIQ - Quoted field not terminated"
2597
2598 When parsing a field that started with a quotation character, the
2599 field is expected to be closed with a quotation character. When the
2600 parsed line is exhausted before the quote is found, that field is not
2601 terminated.
2602
2603 • 2030 "EIF - NL char inside unquoted verbatim, binary off"
2604
2605 • 2031 "EIF - CR char is first char of field, not part of EOL"
2606
2607 • 2032 "EIF - CR char inside unquoted, not part of EOL"
2608
2609 • 2034 "EIF - Loose unescaped quote"
2610
2611 • 2035 "EIF - Escaped EOF in unquoted field"
2612
2613 • 2036 "EIF - ESC error"
2614
2615 • 2037 "EIF - Binary character in unquoted field, binary off"
2616
2617 • 2110 "ECB - Binary character in Combine, binary off"
2618
2619 • 2200 "EIO - print to IO failed. See errno"
2620
2621 • 3001 "EHR - Unsupported syntax for column_names ()"
2622
2623 • 3002 "EHR - getline_hr () called before column_names ()"
2624
2625 • 3003 "EHR - bind_columns () and column_names () fields count
2626 mismatch"
2627
2628 • 3004 "EHR - bind_columns () only accepts refs to scalars"
2629
2630 • 3006 "EHR - bind_columns () did not pass enough refs for parsed
2631 fields"
2632
2633 • 3007 "EHR - bind_columns needs refs to writable scalars"
2634
2635 • 3008 "EHR - unexpected error in bound fields"
2636
2637 • 3009 "EHR - print_hr () called before column_names ()"
2638
2639 • 3010 "EHR - print_hr () called with invalid arguments"
2640
2642 Text::CSV_PP, Text::CSV_XS and Text::CSV::Encoded.
2643
2645 Alan Citterman <alan[at]mfgrtl.com> wrote the original Perl module.
2646 Please don't send mail concerning Text::CSV to Alan, as he's not a
2647 present maintainer.
2648
2649 Jochen Wiedmann <joe[at]ispsoft.de> rewrote the encoding and decoding
2650 in C by implementing a simple finite-state machine and added the
2651 variable quote, escape and separator characters, the binary mode and
2652 the print and getline methods. See ChangeLog releases 0.10 through
2653 0.23.
2654
2655 H.Merijn Brand <h.m.brand[at]xs4all.nl> cleaned up the code, added the
2656 field flags methods, wrote the major part of the test suite, completed
2657 the documentation, fixed some RT bugs. See ChangeLog releases 0.25 and
2658 on.
2659
2660 Makamaka Hannyaharamitu, <makamaka[at]cpan.org> wrote Text::CSV_PP
2661 which is the pure-Perl version of Text::CSV_XS.
2662
2663 New Text::CSV (since 0.99) is maintained by Makamaka, and Kenichi
2664 Ishigaki since 1.91.
2665
2667 Text::CSV
2668
2669 Copyright (C) 1997 Alan Citterman. All rights reserved. Copyright (C)
2670 2007-2015 Makamaka Hannyaharamitu. Copyright (C) 2017- Kenichi
2671 Ishigaki. A large portion of the doc is taken from Text::CSV_XS. See
2672 below.
2673
2674 Text::CSV_PP:
2675
2676 Copyright (C) 2005-2015 Makamaka Hannyaharamitu. Copyright (C) 2017-
2677 Kenichi Ishigaki. A large portion of the code/doc is also taken from
2678 Text::CSV_XS. See below.
2679
2680 Text::CSV_XS:
2681
2682 Copyright (C) 2007-2016 H.Merijn Brand for PROCURA B.V. Copyright (C)
2683 1998-2001 Jochen Wiedmann. All rights reserved. Portions Copyright (C)
2684 1997 Alan Citterman. All rights reserved.
2685
2686 This library is free software; you can redistribute it and/or modify it
2687 under the same terms as Perl itself.
2688
2689
2690
2691perl v5.38.0 2023-07-21 Text::CSV(3)