CSV_XS(3)             User Contributed Perl Documentation            CSV_XS(3)
2
3
4
NAME
    Text::CSV_XS - comma-separated values manipulation routines
7
SYNOPSIS
    # Functional interface
10 use Text::CSV_XS qw( csv );
11
12 # Read whole file in memory
13 my $aoa = csv (in => "data.csv"); # as array of array
14 my $aoh = csv (in => "data.csv",
15 headers => "auto"); # as array of hash
16
17 # Write array of arrays as csv file
18 csv (in => $aoa, out => "file.csv", sep_char=> ";");
19
20 # Only show lines where "code" is odd
21 csv (in => "data.csv", filter => { code => sub { $_ % 2 }});
22
23
24 # Object interface
25 use Text::CSV_XS;
26
27 my @rows;
28 # Read/parse CSV
29 my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1 });
30 open my $fh, "<:encoding(utf8)", "test.csv" or die "test.csv: $!";
31 while (my $row = $csv->getline ($fh)) {
32 $row->[2] =~ m/pattern/ or next; # 3rd field should match
33 push @rows, $row;
34 }
35 close $fh;
36
37 # and write as CSV
38 open $fh, ">:encoding(utf8)", "new.csv" or die "new.csv: $!";
39 $csv->say ($fh, $_) for @rows;
40 close $fh or die "new.csv: $!";
41
DESCRIPTION
    Text::CSV_XS provides facilities for the composition and decomposition
    of comma-separated values. An instance of the Text::CSV_XS class will
    combine fields into a "CSV" string and parse a "CSV" string into fields.
47
The module accepts either strings or files as input and supports the
use of user-specified characters for delimiters, separators, and
escapes.
51
52 Embedded newlines
53 Important Note: The default behavior is to accept only ASCII
54 characters in the range from 0x20 (space) to 0x7E (tilde). This means
55 that the fields can not contain newlines. If your data contains
56 newlines embedded in fields, or characters above 0x7E (tilde), or
57 binary data, you must set "binary => 1" in the call to "new". To cover
58 the widest range of parsing options, you will always want to set
59 binary.
60
But you still have the problem that you have to pass a correct line to
the "parse" method, which is more cumbersome than the usual way of
reading input:
64
65 my $csv = Text::CSV_XS->new ({ binary => 1, eol => $/ });
66 while (<>) { # WRONG!
67 $csv->parse ($_);
68 my @fields = $csv->fields ();
69 }
70
71 this will break, as the "while" might read broken lines: it does not
72 care about the quoting. If you need to support embedded newlines, the
73 way to go is to not pass "eol" in the parser (it accepts "\n", "\r",
74 and "\r\n" by default) and then
75
76 my $csv = Text::CSV_XS->new ({ binary => 1 });
77 open my $fh, "<", $file or die "$file: $!";
78 while (my $row = $csv->getline ($fh)) {
79 my @fields = @$row;
80 }
81
82 The old(er) way of using global file handles is still supported
83
84 while (my $row = $csv->getline (*ARGV)) { ... }
85
86 Unicode
87 Unicode is only tested to work with perl-5.8.2 and up.
88
89 See also "BOM".
90
91 The simplest way to ensure the correct encoding is used for in- and
92 output is by either setting layers on the filehandles, or setting the
93 "encoding" argument for "csv".
94
95 open my $fh, "<:encoding(UTF-8)", "in.csv" or die "in.csv: $!";
96 or
97 my $aoa = csv (in => "in.csv", encoding => "UTF-8");
98
99 open my $fh, ">:encoding(UTF-8)", "out.csv" or die "out.csv: $!";
100 or
101 csv (in => $aoa, out => "out.csv", encoding => "UTF-8");
102
On parsing (both for "getline" and "parse"), if the source is marked as
UTF8, then all fields that are marked binary will also be marked UTF8.
106
On combining ("print" and "combine"): if any of the combining fields
was marked UTF8, the resulting string will be marked as UTF8. Note
however that fields preceding the first UTF8-marked field which contain
8-bit characters that were not upgraded to UTF8 will remain "bytes" in
the resulting string, possibly causing unexpected errors. If you pass
data of different encodings, or you do not know whether the encodings
differ, force an upgrade before you pass the data on:
115
116 $csv->print ($fh, [ map { utf8::upgrade (my $x = $_); $x } @data ]);
117
118 For complete control over encoding, please use Text::CSV::Encoded:
119
120 use Text::CSV::Encoded;
121 my $csv = Text::CSV::Encoded->new ({
122 encoding_in => "iso-8859-1", # the encoding comes into Perl
123 encoding_out => "cp1252", # the encoding comes out of Perl
124 });
125
126 $csv = Text::CSV::Encoded->new ({ encoding => "utf8" });
127 # combine () and print () accept *literally* utf8 encoded data
128 # parse () and getline () return *literally* utf8 encoded data
129
130 $csv = Text::CSV::Encoded->new ({ encoding => undef }); # default
131 # combine () and print () accept UTF8 marked data
132 # parse () and getline () return UTF8 marked data
133
134 BOM
135 BOM (or Byte Order Mark) handling is available only inside the
136 "header" method. This method supports the following encodings:
137 "utf-8", "utf-1", "utf-32be", "utf-32le", "utf-16be", "utf-16le",
138 "utf-ebcdic", "scsu", "bocu-1", and "gb-18030". See Wikipedia
139 <https://en.wikipedia.org/wiki/Byte_order_mark>.
140
141 If a file has a BOM, the easiest way to deal with that is
142
143 my $aoh = csv (in => $file, detect_bom => 1);
144
145 All records will be encoded based on the detected BOM.
146
147 This implies a call to the "header" method, which defaults to also
148 set the "column_names". So this is not the same as
149
150 my $aoh = csv (in => $file, headers => "auto");
151
which only reads the first record to set "column_names", ignoring any
BOM that might be present.
154
SPECIFICATION
    While no formal specification for CSV exists, RFC 4180
157 <https://datatracker.ietf.org/doc/html/rfc4180> (1) describes the
158 common format and establishes "text/csv" as the MIME type registered
159 with the IANA. RFC 7111 <https://datatracker.ietf.org/doc/html/rfc7111>
160 (2) adds fragments to CSV.
161
162 Many informal documents exist that describe the "CSV" format. "How
163 To: The Comma Separated Value (CSV) File Format"
164 <http://creativyst.com/Doc/Articles/CSV/CSV01.shtml> (3) provides an
165 overview of the "CSV" format in the most widely used applications and
166 explains how it can best be used and supported.
167
168 1) https://datatracker.ietf.org/doc/html/rfc4180
169 2) https://datatracker.ietf.org/doc/html/rfc7111
170 3) http://creativyst.com/Doc/Articles/CSV/CSV01.shtml
171
172 The basic rules are as follows:
173
174 CSV is a delimited data format that has fields/columns separated by
175 the comma character and records/rows separated by newlines. Fields that
176 contain a special character (comma, newline, or double quote), must be
177 enclosed in double quotes. However, if a line contains a single entry
178 that is the empty string, it may be enclosed in double quotes. If a
179 field's value contains a double quote character it is escaped by
180 placing another double quote character next to it. The "CSV" file
181 format does not require a specific character encoding, byte order, or
182 line terminator format.
183
184 • Each record is a single line ended by a line feed (ASCII/"LF"=0x0A)
185 or a carriage return and line feed pair (ASCII/"CRLF"="0x0D 0x0A"),
186 however, line-breaks may be embedded.
187
188 • Fields are separated by commas.
189
190 • Allowable characters within a "CSV" field include 0x09 ("TAB") and
191 the inclusive range of 0x20 (space) through 0x7E (tilde). In binary
192 mode all characters are accepted, at least in quoted fields.
193
194 • A field within "CSV" must be surrounded by double-quotes to
195 contain a separator character (comma).
196
197 Though this is the most clear and restrictive definition, Text::CSV_XS
198 is way more liberal than this, and allows extension:
199
200 • Line termination by a single carriage return is accepted by default
201
• The separation-, quotation-, and escape characters can be any ASCII
203 character in the range from 0x20 (space) to 0x7E (tilde).
204 Characters outside this range may or may not work as expected.
205 Multibyte characters, like UTF "U+060C" (ARABIC COMMA), "U+FF0C"
206 (FULLWIDTH COMMA), "U+241B" (SYMBOL FOR ESCAPE), "U+2424" (SYMBOL
207 FOR NEWLINE), "U+FF02" (FULLWIDTH QUOTATION MARK), and "U+201C" (LEFT
208 DOUBLE QUOTATION MARK) (to give some examples of what might look
209 promising) work for newer versions of perl for "sep_char", and
210 "quote_char" but not for "escape_char".
211
212 If you use perl-5.8.2 or higher these three attributes are
213 utf8-decoded, to increase the likelihood of success. This way
214 "U+00FE" will be allowed as a quote character.
215
216 • A field in "CSV" must be surrounded by double-quotes to make an
217 embedded double-quote, represented by a pair of consecutive double-
218 quotes, valid. In binary mode you may additionally use the sequence
219 ""0" for representation of a NULL byte. Using 0x00 in binary mode is
220 just as valid.
221
222 • Several violations of the above specification may be lifted by
223 passing some options as attributes to the object constructor.
224
METHODS
   version
227 (Class method) Returns the current module version.
228
229 new
230 (Class method) Returns a new instance of class Text::CSV_XS. The
231 attributes are described by the (optional) hash ref "\%attr".
232
233 my $csv = Text::CSV_XS->new ({ attributes ... });
234
235 The following attributes are available:
236
237 eol
238
239 my $csv = Text::CSV_XS->new ({ eol => $/ });
240 $csv->eol (undef);
241 my $eol = $csv->eol;
242
243 The end-of-line string to add to rows for "print" or the record
244 separator for "getline".
245
246 When not passed in a parser instance, the default behavior is to
247 accept "\n", "\r", and "\r\n", so it is probably safer to not specify
248 "eol" at all. Passing "undef" or the empty string behave the same.
249
250 When not passed in a generating instance, records are not terminated
251 at all, so it is probably wise to pass something you expect. A safe
252 choice for "eol" on output is either $/ or "\r\n".
253
254 Common values for "eol" are "\012" ("\n" or Line Feed), "\015\012"
255 ("\r\n" or Carriage Return, Line Feed), and "\015" ("\r" or Carriage
256 Return). The "eol" attribute cannot exceed 7 (ASCII) characters.
257
If both $/ and "eol" equal "\015", lines that end in only a Carriage
Return without a Line Feed will be "parse"d correctly.
260
261 sep_char
262
263 my $csv = Text::CSV_XS->new ({ sep_char => ";" });
264 $csv->sep_char (";");
265 my $c = $csv->sep_char;
266
267 The char used to separate fields, by default a comma. (","). Limited
268 to a single-byte character, usually in the range from 0x20 (space) to
269 0x7E (tilde). When longer sequences are required, use "sep".
270
271 The separation character can not be equal to the quote character or to
272 the escape character.
273
274 See also "CAVEATS"
275
276 sep
277
278 my $csv = Text::CSV_XS->new ({ sep => "\N{FULLWIDTH COMMA}" });
279 $csv->sep (";");
280 my $sep = $csv->sep;
281
282 The chars used to separate fields, by default undefined. Limited to 8
283 bytes.
284
285 When set, overrules "sep_char". If its length is one byte it acts as
286 an alias to "sep_char".
287
288 See also "CAVEATS"
289
290 quote_char
291
292 my $csv = Text::CSV_XS->new ({ quote_char => "'" });
293 $csv->quote_char (undef);
294 my $c = $csv->quote_char;
295
296 The character to quote fields containing blanks or binary data, by
297 default the double quote character ("""). A value of undef suppresses
298 quote chars (for simple cases only). Limited to a single-byte
299 character, usually in the range from 0x20 (space) to 0x7E (tilde).
300 When longer sequences are required, use "quote".
301
302 "quote_char" can not be equal to "sep_char".
303
304 quote
305
306 my $csv = Text::CSV_XS->new ({ quote => "\N{FULLWIDTH QUOTATION MARK}" });
307 $csv->quote ("'");
308 my $quote = $csv->quote;
309
310 The chars used to quote fields, by default undefined. Limited to 8
311 bytes.
312
313 When set, overrules "quote_char". If its length is one byte it acts as
314 an alias to "quote_char".
315
316 This method does not support "undef". Use "quote_char" to disable
317 quotation.
318
319 See also "CAVEATS"
320
321 escape_char
322
323 my $csv = Text::CSV_XS->new ({ escape_char => "\\" });
324 $csv->escape_char (":");
325 my $c = $csv->escape_char;
326
327 The character to escape certain characters inside quoted fields.
328 This is limited to a single-byte character, usually in the range
329 from 0x20 (space) to 0x7E (tilde).
330
331 The "escape_char" defaults to being the double-quote mark ("""). In
332 other words the same as the default "quote_char". This means that
333 doubling the quote mark in a field escapes it:
334
335 "foo","bar","Escape ""quote mark"" with two ""quote marks""","baz"
336
337 If you change the "quote_char" without changing the
338 "escape_char", the "escape_char" will still be the double-quote
339 ("""). If instead you want to escape the "quote_char" by doubling it
340 you will need to also change the "escape_char" to be the same as what
341 you have changed the "quote_char" to.
342
343 Setting "escape_char" to "undef" or "" will completely disable escapes
344 and is greatly discouraged. This will also disable "escape_null".
345
346 The escape character can not be equal to the separation character.
347
348 binary
349
350 my $csv = Text::CSV_XS->new ({ binary => 1 });
351 $csv->binary (0);
352 my $f = $csv->binary;
353
354 If this attribute is 1, you may use binary characters in quoted
355 fields, including line feeds, carriage returns and "NULL" bytes. (The
356 latter could be escaped as ""0".) By default this feature is off.
357
358 If a string is marked UTF8, "binary" will be turned on automatically
359 when binary characters other than "CR" and "NL" are encountered. Note
360 that a simple string like "\x{00a0}" might still be binary, but not
361 marked UTF8, so setting "{ binary => 1 }" is still a wise option.
362
363 strict
364
365 my $csv = Text::CSV_XS->new ({ strict => 1 });
366 $csv->strict (0);
367 my $f = $csv->strict;
368
369 If this attribute is set to 1, any row that parses to a different
370 number of fields than the previous row will cause the parser to throw
371 error 2014.
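
A minimal sketch of how this plays out in practice, assuming an already
opened file handle $fh:

    my $csv = Text::CSV_XS->new ({ binary => 1, strict => 1, auto_diag => 1 });
    while (my $row = $csv->getline ($fh)) {
        # a row with a deviating number of fields is reported as error 2014
        }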
372
373 skip_empty_rows
374
375 my $csv = Text::CSV_XS->new ({ skip_empty_rows => 1 });
376 $csv->skip_empty_rows (0);
377 my $f = $csv->skip_empty_rows;
378
379 If this attribute is set to 1, any row that has an "eol" immediately
380 following the start of line will be skipped. Default behavior is to
381 return one single empty field.
382
383 This attribute is only used in parsing.
384
385 formula_handling
386
387 Alias for "formula"
388
389 formula
390
391 my $csv = Text::CSV_XS->new ({ formula => "none" });
392 $csv->formula ("none");
393 my $f = $csv->formula;
394
395 This defines the behavior of fields containing formulas. As formulas
396 are considered dangerous in spreadsheets, this attribute can define an
397 optional action to be taken if a field starts with an equal sign ("=").
398
399 For purpose of code-readability, this can also be written as
400
401 my $csv = Text::CSV_XS->new ({ formula_handling => "none" });
402 $csv->formula_handling ("none");
403 my $f = $csv->formula_handling;
404
405 Possible values for this attribute are
406
407 none
408 Take no specific action. This is the default.
409
410 $csv->formula ("none");
411
412 die
413 Cause the process to "die" whenever a leading "=" is encountered.
414
415 $csv->formula ("die");
416
417 croak
418 Cause the process to "croak" whenever a leading "=" is encountered.
419 (See Carp)
420
421 $csv->formula ("croak");
422
423 diag
424 Report position and content of the field whenever a leading "=" is
425 found. The value of the field is unchanged.
426
427 $csv->formula ("diag");
428
429 empty
430 Replace the content of fields that start with a "=" with the empty
431 string.
432
433 $csv->formula ("empty");
434 $csv->formula ("");
435
436 undef
437 Replace the content of fields that start with a "=" with "undef".
438
439 $csv->formula ("undef");
440 $csv->formula (undef);
441
442 a callback
443 Modify the content of fields that start with a "=" with the return-
444 value of the callback. The original content of the field is
445 available inside the callback as $_;
446
447 # Replace all formula's with 42
448 $csv->formula (sub { 42; });
449
450 # same as $csv->formula ("empty") but slower
451 $csv->formula (sub { "" });
452
453 # Allow =4+12
454 $csv->formula (sub { s/^=(\d+\+\d+)$/$1/eer });
455
456 # Allow more complex calculations
457 $csv->formula (sub { eval { s{^=([-+*/0-9()]+)$}{$1}ee }; $_ });
458
459 All other values will give a warning and then fallback to "diag".
460
461 decode_utf8
462
463 my $csv = Text::CSV_XS->new ({ decode_utf8 => 1 });
464 $csv->decode_utf8 (0);
465 my $f = $csv->decode_utf8;
466
This attribute defaults to TRUE.
468
469 While parsing, fields that are valid UTF-8, are automatically set to
470 be UTF-8, so that
471
472 $csv->parse ("\xC4\xA8\n");
473
474 results in
475
476 PV("\304\250"\0) [UTF8 "\x{128}"]
477
Sometimes this might not be the desired behavior. To prevent these
upgrades, set this attribute to false, and the result will be
480
481 PV("\304\250"\0)
482
483 auto_diag
484
485 my $csv = Text::CSV_XS->new ({ auto_diag => 1 });
486 $csv->auto_diag (2);
487 my $l = $csv->auto_diag;
488
Setting this attribute to a number between 1 and 9 causes "error_diag"
to be called automatically in void context upon errors.
491
492 In case of error "2012 - EOF", this call will be void.
493
494 If "auto_diag" is set to a numeric value greater than 1, it will "die"
495 on errors instead of "warn". If set to anything unrecognized, it will
496 be silently ignored.
497
Future extensions to this feature will include more reliable auto-
detection of "autodie" being active in the scope in which the error
occurred, which will increment the value of "auto_diag" by 1 the moment
the error is detected.
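
A small sketch of the difference between the levels described above:

    my $warn_csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1 }); # warns on error
    my $die_csv  = Text::CSV_XS->new ({ binary => 1, auto_diag => 2 }); # dies on error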
502
503 diag_verbose
504
505 my $csv = Text::CSV_XS->new ({ diag_verbose => 1 });
506 $csv->diag_verbose (2);
507 my $l = $csv->diag_verbose;
508
509 Set the verbosity of the output triggered by "auto_diag". Currently
510 only adds the current input-record-number (if known) to the
511 diagnostic output with an indication of the position of the error.
512
513 blank_is_undef
514
515 my $csv = Text::CSV_XS->new ({ blank_is_undef => 1 });
516 $csv->blank_is_undef (0);
517 my $f = $csv->blank_is_undef;
518
519 Under normal circumstances, "CSV" data makes no distinction between
520 quoted- and unquoted empty fields. These both end up in an empty
521 string field once read, thus
522
523 1,"",," ",2
524
525 is read as
526
527 ("1", "", "", " ", "2")
528
529 When writing "CSV" files with either "always_quote" or "quote_empty"
530 set, the unquoted empty field is the result of an undefined value.
531 To enable this distinction when reading "CSV" data, the
532 "blank_is_undef" attribute will cause unquoted empty fields to be set
533 to "undef", causing the above to be parsed as
534
535 ("1", "", undef, " ", "2")
536
537 Note that this is specifically important when loading "CSV" fields
538 into a database that allows "NULL" values, as the perl equivalent for
539 "NULL" is "undef" in DBI land.
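
A small sketch of the effect, using the example line shown above:

    my $csv = Text::CSV_XS->new ({ blank_is_undef => 1 });
    $csv->parse (q{1,"",," ",2});
    my @fields = $csv->fields;   # ("1", "", undef, " ", "2")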
540
541 empty_is_undef
542
543 my $csv = Text::CSV_XS->new ({ empty_is_undef => 1 });
544 $csv->empty_is_undef (0);
545 my $f = $csv->empty_is_undef;
546
547 Going one step further than "blank_is_undef", this attribute
548 converts all empty fields to "undef", so
549
550 1,"",," ",2
551
552 is read as
553
554 (1, undef, undef, " ", 2)
555
556 Note that this affects only fields that are originally empty, not
557 fields that are empty after stripping allowed whitespace. YMMV.
558
559 allow_whitespace
560
561 my $csv = Text::CSV_XS->new ({ allow_whitespace => 1 });
562 $csv->allow_whitespace (0);
563 my $f = $csv->allow_whitespace;
564
565 When this option is set to true, the whitespace ("TAB"'s and
566 "SPACE"'s) surrounding the separation character is removed when
567 parsing. If either "TAB" or "SPACE" is one of the three characters
568 "sep_char", "quote_char", or "escape_char" it will not be considered
569 whitespace.
570
571 Now lines like:
572
573 1 , "foo" , bar , 3 , zapp
574
575 are parsed as valid "CSV", even though it violates the "CSV" specs.
576
Note that all whitespace is stripped from both the start and the end of
each field. That makes this more than just a feature for parsing bad
"CSV" lines, as
580
581 1, 2.0, 3, ape , monkey
582
583 will now be parsed as
584
585 ("1", "2.0", "3", "ape", "monkey")
586
587 even if the original line was perfectly acceptable "CSV".
588
589 allow_loose_quotes
590
591 my $csv = Text::CSV_XS->new ({ allow_loose_quotes => 1 });
592 $csv->allow_loose_quotes (0);
593 my $f = $csv->allow_loose_quotes;
594
595 By default, parsing unquoted fields containing "quote_char" characters
596 like
597
598 1,foo "bar" baz,42
599
600 would result in parse error 2034. Though it is still bad practice to
601 allow this format, we cannot help the fact that some vendors
602 make their applications spit out lines styled this way.
603
604 If there is really bad "CSV" data, like
605
606 1,"foo "bar" baz",42
607
608 or
609
610 1,""foo bar baz"",42
611
612 there is a way to get this data-line parsed and leave the quotes inside
613 the quoted field as-is. This can be achieved by setting
614 "allow_loose_quotes" AND making sure that the "escape_char" is not
615 equal to "quote_char".
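
A minimal sketch of that combination (the backslash as "escape_char" is
just an illustrative choice; anything different from "quote_char" will
do):

    my $csv = Text::CSV_XS->new ({
        binary             => 1,
        allow_loose_quotes => 1,
        escape_char        => "\\",
        });
    $csv->parse (q{1,"foo "bar" baz",42});   # inner quotes are kept as-is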
616
617 allow_loose_escapes
618
619 my $csv = Text::CSV_XS->new ({ allow_loose_escapes => 1 });
620 $csv->allow_loose_escapes (0);
621 my $f = $csv->allow_loose_escapes;
622
623 Parsing fields that have "escape_char" characters that escape
624 characters that do not need to be escaped, like:
625
626 my $csv = Text::CSV_XS->new ({ escape_char => "\\" });
627 $csv->parse (qq{1,"my bar\'s",baz,42});
628
629 would result in parse error 2025. Though it is bad practice to allow
630 this format, this attribute enables you to treat all escape character
631 sequences equal.
632
633 allow_unquoted_escape
634
635 my $csv = Text::CSV_XS->new ({ allow_unquoted_escape => 1 });
636 $csv->allow_unquoted_escape (0);
637 my $f = $csv->allow_unquoted_escape;
638
A backward compatibility issue where "escape_char" differs from
"quote_char" prevents "escape_char" from being in the first position of
a field. If "quote_char" is equal to the default """ and "escape_char"
is set to "\", this would be illegal:
643
644 1,\0,2
645
646 Setting this attribute to 1 might help to overcome issues with
647 backward compatibility and allow this style.
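
A short sketch, using the attribute combination described above:

    my $csv = Text::CSV_XS->new ({
        quote_char            => '"',
        escape_char           => "\\",
        allow_unquoted_escape => 1,
        });
    $csv->parse (q{1,\0,2});   # now accepted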
648
649 always_quote
650
651 my $csv = Text::CSV_XS->new ({ always_quote => 1 });
652 $csv->always_quote (0);
653 my $f = $csv->always_quote;
654
655 By default the generated fields are quoted only if they need to be.
656 For example, if they contain the separator character. If you set this
657 attribute to 1 then all defined fields will be quoted. ("undef" fields
are not quoted, see "blank_is_undef"). This quite often makes it easier
to handle exported data in external applications. (Poor creatures who
would be better off using Text::CSV_XS. :)
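
A small sketch of the effect ("undef" stays unquoted, as noted above):

    my $csv = Text::CSV_XS->new ({ always_quote => 1 });
    $csv->combine (1, "foo", undef, "");
    my $line = $csv->string;   # "1","foo",,""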
661
662 quote_space
663
664 my $csv = Text::CSV_XS->new ({ quote_space => 1 });
665 $csv->quote_space (0);
666 my $f = $csv->quote_space;
667
By default, a space in a field would trigger quotation. As no rule
requires this in "CSV", nor any rule forbids it, the default is true
for safety. You can exclude the space from this trigger by setting this
attribute to 0.
672
673 quote_empty
674
675 my $csv = Text::CSV_XS->new ({ quote_empty => 1 });
676 $csv->quote_empty (0);
677 my $f = $csv->quote_empty;
678
679 By default the generated fields are quoted only if they need to be.
680 An empty (defined) field does not need quotation. If you set this
681 attribute to 1 then empty defined fields will be quoted. ("undef"
682 fields are not quoted, see "blank_is_undef"). See also "always_quote".
683
684 quote_binary
685
686 my $csv = Text::CSV_XS->new ({ quote_binary => 1 });
687 $csv->quote_binary (0);
688 my $f = $csv->quote_binary;
689
690 By default, all "unsafe" bytes inside a string cause the combined
691 field to be quoted. By setting this attribute to 0, you can disable
692 that trigger for bytes ">= 0x7F".
693
694 escape_null
695
696 my $csv = Text::CSV_XS->new ({ escape_null => 1 });
697 $csv->escape_null (0);
698 my $f = $csv->escape_null;
699
700 By default, a "NULL" byte in a field would be escaped. This option
enables you to treat the "NULL" byte as a simple binary character in
binary mode (when "{ binary => 1 }" is set). The default is true. You
703 can prevent "NULL" escapes by setting this attribute to 0.
704
705 When the "escape_char" attribute is set to undefined, this attribute
706 will be set to false.
707
708 The default setting will encode "=\x00=" as
709
710 "="0="
711
With "escape_null" set to false, this will result in
713
714 "=\x00="
715
716 The default when using the "csv" function is "false".
717
718 For backward compatibility reasons, the deprecated old name
719 "quote_null" is still recognized.
720
721 keep_meta_info
722
723 my $csv = Text::CSV_XS->new ({ keep_meta_info => 1 });
724 $csv->keep_meta_info (0);
725 my $f = $csv->keep_meta_info;
726
727 By default, the parsing of input records is as simple and fast as
728 possible. However, some parsing information - like quotation of the
729 original field - is lost in that process. Setting this flag to true
730 enables retrieving that information after parsing with the methods
731 "meta_info", "is_quoted", and "is_binary" described below. Default is
732 false for performance.
733
If you set this attribute to a value greater than 9, then you can
control output quotation style like it was used in the input of the
last parsed record (unless quotation was added because of other
reasons).
738
739 my $csv = Text::CSV_XS->new ({
740 binary => 1,
741 keep_meta_info => 1,
742 quote_space => 0,
743 });
744
    $csv->parse (q{1,,"", ," ",f,"g","h""h",help,"help"});
    my @row = $csv->fields;

    $csv->print (*STDOUT, \@row);
    # 1,,, , ,f,g,"h""h",help,help
    $csv->keep_meta_info (11);
    $csv->print (*STDOUT, \@row);
    # 1,,"", ," ",f,"g","h""h",help,"help"
752
753 undef_str
754
755 my $csv = Text::CSV_XS->new ({ undef_str => "\\N" });
756 $csv->undef_str (undef);
757 my $s = $csv->undef_str;
758
759 This attribute optionally defines the output of undefined fields. The
760 value passed is not changed at all, so if it needs quotation, the
761 quotation needs to be included in the value of the attribute. Use with
762 caution, as passing a value like ",",,,,""" will for sure mess up
763 your output. The default for this attribute is "undef", meaning no
764 special treatment.
765
766 This attribute is useful when exporting CSV data to be imported in
767 custom loaders, like for MySQL, that recognize special sequences for
768 "NULL" data.
769
770 This attribute has no meaning when parsing CSV data.
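
A minimal sketch of writing MySQL-style "NULL" markers:

    my $csv = Text::CSV_XS->new ({ undef_str => "\\N" });
    $csv->combine (1, undef, "foo");
    my $line = $csv->string;   # 1,\N,foo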
771
772 comment_str
773
774 my $csv = Text::CSV_XS->new ({ comment_str => "#" });
775 $csv->comment_str (undef);
776 my $s = $csv->comment_str;
777
This attribute optionally defines a string to be recognized as a
comment. If this attribute is defined, all lines starting with this
sequence will not be parsed as CSV but skipped as comments.
781
782 This attribute has no meaning when generating CSV.
783
784 Comment strings that start with any of the special characters/sequences
785 are not supported (so it cannot start with any of "sep_char",
786 "quote_char", "escape_char", "sep", "quote", or "eol").
787
788 For convenience, "comment" is an alias for "comment_str".
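
A short sketch, assuming an already opened handle $fh on a file whose
comment lines start with "#":

    my $csv = Text::CSV_XS->new ({ binary => 1, comment_str => "#" });
    while (my $row = $csv->getline ($fh)) {
        # lines starting with "#" are skipped and never show up here
        }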
789
790 verbatim
791
792 my $csv = Text::CSV_XS->new ({ verbatim => 1 });
793 $csv->verbatim (0);
794 my $f = $csv->verbatim;
795
796 This is a quite controversial attribute to set, but makes some hard
797 things possible.
798
The rationale behind this attribute is to tell the parser that the
normally special characters newline ("NL") and Carriage Return ("CR")
will not be special when this flag is set, and will be dealt with as
ordinary binary characters. This eases working with data that has
embedded newlines.
804
805 When "verbatim" is used with "getline", "getline" auto-"chomp"'s
806 every line.
807
808 Imagine a file format like
809
810 M^^Hans^Janssen^Klas 2\n2A^Ja^11-06-2007#\r\n
811
where the line ending is a very specific "#\r\n", and the sep_char is
813 a "^" (caret). None of the fields is quoted, but embedded binary
814 data is likely to be present. With the specific line ending, this
815 should not be too hard to detect.
816
By default, Text::CSV_XS' parse function only knows about "\n" and "\r"
as legal line endings, and so has to treat an embedded newline as a
real "end-of-line", so that it can scan the next line if "binary" is
true and the newline is inside a quoted field. With this option, we
tell "parse" to treat "\n" as nothing more than a binary character.
823
824 For "parse" this means that the parser has no more idea about line
825 ending and "getline" "chomp"s line endings on reading.
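
A sketch of parsing the file format described above, assuming an
already opened handle $fh:

    my $csv = Text::CSV_XS->new ({
        binary   => 1,
        verbatim => 1,
        eol      => "#\r\n",
        sep_char => "^",
        });
    while (my $row = $csv->getline ($fh)) {
        # embedded "\n" inside fields is kept as ordinary binary data
        }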
826
827 types
828
829 A set of column types; the attribute is immediately passed to the
830 "types" method.
831
832 callbacks
833
834 See the "Callbacks" section below.
835
836 accessors
837
838 To sum it up,
839
840 $csv = Text::CSV_XS->new ();
841
842 is equivalent to
843
844 $csv = Text::CSV_XS->new ({
845 eol => undef, # \r, \n, or \r\n
846 sep_char => ',',
847 sep => undef,
848 quote_char => '"',
849 quote => undef,
850 escape_char => '"',
851 binary => 0,
852 decode_utf8 => 1,
853 auto_diag => 0,
854 diag_verbose => 0,
855 blank_is_undef => 0,
856 empty_is_undef => 0,
857 allow_whitespace => 0,
858 allow_loose_quotes => 0,
859 allow_loose_escapes => 0,
860 allow_unquoted_escape => 0,
861 always_quote => 0,
862 quote_empty => 0,
863 quote_space => 1,
864 escape_null => 1,
865 quote_binary => 1,
866 keep_meta_info => 0,
867 strict => 0,
868 skip_empty_rows => 0,
869 formula => 0,
870 verbatim => 0,
871 undef_str => undef,
872 comment_str => undef,
873 types => undef,
874 callbacks => undef,
875 });
876
877 For all of the above mentioned flags, an accessor method is available
878 where you can inquire the current value, or change the value
879
880 my $quote = $csv->quote_char;
881 $csv->binary (1);
882
883 It is not wise to change these settings halfway through writing "CSV"
884 data to a stream. If however you want to create a new stream using the
885 available "CSV" object, there is no harm in changing them.
886
887 If the "new" constructor call fails, it returns "undef", and makes
888 the fail reason available through the "error_diag" method.
889
890 $csv = Text::CSV_XS->new ({ ecs_char => 1 }) or
891 die "".Text::CSV_XS->error_diag ();
892
893 "error_diag" will return a string like
894
895 "INI - Unknown attribute 'ecs_char'"
896
897 known_attributes
898 @attr = Text::CSV_XS->known_attributes;
899 @attr = Text::CSV_XS::known_attributes;
900 @attr = $csv->known_attributes;
901
902 This method will return an ordered list of all the supported
903 attributes as described above. This can be useful for knowing what
904 attributes are valid in classes that use or extend Text::CSV_XS.
905
906 print
907 $status = $csv->print ($fh, $colref);
908
909 Similar to "combine" + "string" + "print", but much more efficient.
910 It expects an array ref as input (not an array!) and the resulting
911 string is not really created, but immediately written to the $fh
912 object, typically an IO handle or any other object that offers a
913 "print" method.
914
915 For performance reasons "print" does not create a result string, so
916 all "string", "status", "fields", and "error_input" methods will return
917 undefined information after executing this method.
918
919 If $colref is "undef" (explicit, not through a variable argument) and
920 "bind_columns" was used to specify fields to be printed, it is
921 possible to make performance improvements, as otherwise data would have
922 to be copied as arguments to the method call:
923
924 $csv->bind_columns (\($foo, $bar));
925 $status = $csv->print ($fh, undef);
926
927 A short benchmark
928
929 my @data = ("aa" .. "zz");
930 $csv->bind_columns (\(@data));
931
932 $csv->print ($fh, [ @data ]); # 11800 recs/sec
933 $csv->print ($fh, \@data ); # 57600 recs/sec
934 $csv->print ($fh, undef ); # 48500 recs/sec
935
936 say
937 $status = $csv->say ($fh, $colref);
938
939 Like "print", but "eol" defaults to "$\".
940
941 print_hr
942 $csv->print_hr ($fh, $ref);
943
944 Provides an easy way to print a $ref (as fetched with "getline_hr")
945 provided the column names are set with "column_names".
946
947 It is just a wrapper method with basic parameter checks over
948
949 $csv->print ($fh, [ map { $ref->{$_} } $csv->column_names ]);
950
951 combine
952 $status = $csv->combine (@fields);
953
954 This method constructs a "CSV" record from @fields, returning success
955 or failure. Failure can result from lack of arguments or an argument
956 that contains an invalid character. Upon success, "string" can be
957 called to retrieve the resultant "CSV" string. Upon failure, the
958 value returned by "string" is undefined and "error_input" could be
959 called to retrieve the invalid argument.
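
A minimal sketch of the combine/string pairing:

    my $csv = Text::CSV_XS->new ();
    if ($csv->combine ("abc", "def,ghi", 42)) {
        my $line = $csv->string;   # abc,"def,ghi",42
        }
    else {
        warn "combine failed on ", $csv->error_input, "\n";
        }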
960
961 string
962 $line = $csv->string ();
963
964 This method returns the input to "parse" or the resultant "CSV"
965 string of "combine", whichever was called more recently.
966
967 getline
968 $colref = $csv->getline ($fh);
969
970 This is the counterpart to "print", as "parse" is the counterpart to
"combine": it reads a row from the $fh handle using the "getline"
method associated with $fh and parses this row into an array ref.
973 This array ref is returned by the function or "undef" for failure.
974 When $fh does not support "getline", you are likely to hit errors.
975
976 When fields are bound with "bind_columns" the return value is a
977 reference to an empty list.
978
979 The "string", "fields", and "status" methods are meaningless again.
980
981 getline_all
982 $arrayref = $csv->getline_all ($fh);
983 $arrayref = $csv->getline_all ($fh, $offset);
984 $arrayref = $csv->getline_all ($fh, $offset, $length);
985
986 This will return a reference to a list of getline ($fh) results. In
987 this call, "keep_meta_info" is disabled. If $offset is negative, as
988 with "splice", only the last "abs ($offset)" records of $fh are taken
989 into consideration.
990
991 Given a CSV file with 10 lines:
992
993 lines call
994 ----- ---------------------------------------------------------
995 0..9 $csv->getline_all ($fh) # all
996 0..9 $csv->getline_all ($fh, 0) # all
997 8..9 $csv->getline_all ($fh, 8) # start at 8
998 - $csv->getline_all ($fh, 0, 0) # start at 0 first 0 rows
999 0..4 $csv->getline_all ($fh, 0, 5) # start at 0 first 5 rows
1000 4..5 $csv->getline_all ($fh, 4, 2) # start at 4 first 2 rows
1001 8..9 $csv->getline_all ($fh, -2) # last 2 rows
1002 6..7 $csv->getline_all ($fh, -4, 2) # first 2 of last 4 rows
1003
1004 getline_hr
1005 The "getline_hr" and "column_names" methods work together to allow you
1006 to have rows returned as hashrefs. You must call "column_names" first
1007 to declare your column names.
1008
1009 $csv->column_names (qw( code name price description ));
1010 $hr = $csv->getline_hr ($fh);
1011 print "Price for $hr->{name} is $hr->{price} EUR\n";
1012
1013 "getline_hr" will croak if called before "column_names".
1014
Note that "getline_hr" creates a hashref for every row and will be much
slower than the combined use of "bind_columns" and "getline", but still
offers the same easy-to-use hashref inside the loop:
1018
1019 my @cols = @{$csv->getline ($fh)};
1020 $csv->column_names (@cols);
1021 while (my $row = $csv->getline_hr ($fh)) {
1022 print $row->{price};
1023 }
1024
1025 Could easily be rewritten to the much faster:
1026
1027 my @cols = @{$csv->getline ($fh)};
1028 my $row = {};
1029 $csv->bind_columns (\@{$row}{@cols});
1030 while ($csv->getline ($fh)) {
1031 print $row->{price};
1032 }
1033
1034 Your mileage may vary for the size of the data and the number of rows.
1035 With perl-5.14.2 the comparison for a 100_000 line file with 14
1036 columns:
1037
1038 Rate hashrefs getlines
1039 hashrefs 1.00/s -- -76%
1040 getlines 4.15/s 313% --
1041
1042 getline_hr_all
1043 $arrayref = $csv->getline_hr_all ($fh);
1044 $arrayref = $csv->getline_hr_all ($fh, $offset);
1045 $arrayref = $csv->getline_hr_all ($fh, $offset, $length);
1046
1047 This will return a reference to a list of getline_hr ($fh) results.
1048 In this call, "keep_meta_info" is disabled.
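
A short sketch, assuming an open handle $fh whose first line holds the
column names:

    $csv->column_names ($csv->getline ($fh));
    my $aoh = $csv->getline_hr_all ($fh);   # all remaining rows as hashrefs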
1049
1050 parse
1051 $status = $csv->parse ($line);
1052
1053 This method decomposes a "CSV" string into fields, returning success
1054 or failure. Failure can result from a lack of argument or the given
1055 "CSV" string is improperly formatted. Upon success, "fields" can be
1056 called to retrieve the decomposed fields. Upon failure calling "fields"
1057 will return undefined data and "error_input" can be called to
1058 retrieve the invalid argument.
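
A minimal sketch of the parse/fields pairing:

    my $csv = Text::CSV_XS->new ({ binary => 1 });
    if ($csv->parse (q{1,"foo, bar",3})) {
        my @fields = $csv->fields;   # ("1", "foo, bar", "3")
        }
    else {
        $csv->error_diag;            # void context: report the error
        }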
1059
1060 You may use the "types" method for setting column types. See "types"'
1061 description below.
1062
1063 The $line argument is supposed to be a simple scalar. Everything else
1064 is supposed to croak and set error 1500.
1065
1066 fragment
1067 This function tries to implement RFC7111 (URI Fragment Identifiers for
1068 the text/csv Media Type) -
1069 https://datatracker.ietf.org/doc/html/rfc7111
1070
1071 my $AoA = $csv->fragment ($fh, $spec);
1072
1073 In specifications, "*" is used to specify the last item, a dash ("-")
1074 to indicate a range. All indices are 1-based: the first row or
1075 column has index 1. Selections can be combined with the semi-colon
1076 (";").
1077
1078 When using this method in combination with "column_names", the
1079 returned reference will point to a list of hashes instead of a list
of lists. A disjointed cell-based combined selection might return rows
with different numbers of columns, making the use of hashes
unpredictable.
1083
1084 $csv->column_names ("Name", "Age");
1085 my $AoH = $csv->fragment ($fh, "col=3;8");
1086
1087 If the "after_parse" callback is active, it is also called on every
1088 line parsed and skipped before the fragment.
1089
1090 row
1091 row=4
1092 row=5-7
1093 row=6-*
1094 row=1-2;4;6-*
1095
1096 col
1097 col=2
1098 col=1-3
1099 col=4-*
1100 col=1-2;4;7-*
1101
1102 cell
1103 In cell-based selection, the comma (",") is used to pair row and
1104 column
1105
1106 cell=4,1
1107
1108 The range operator ("-") using "cell"s can be used to define top-left
1109 and bottom-right "cell" location
1110
1111 cell=3,1-4,6
1112
1113 The "*" is only allowed in the second part of a pair
1114
1115 cell=3,2-*,2 # row 3 till end, only column 2
1116 cell=3,2-3,* # column 2 till end, only row 3
1117 cell=3,2-*,* # strip row 1 and 2, and column 1
1118
1119 Cells and cell ranges may be combined with ";", possibly resulting in
1120 rows with different numbers of columns
1121
1122 cell=1,1-2,2;3,3-4,4;1,4;4,1
1123
1124 Disjointed selections will only return selected cells. The cells
1125 that are not specified will not be included in the returned
1126 set, not even as "undef". As an example given a "CSV" like
1127
1128 11,12,13,...19
1129 21,22,...28,29
1130 : :
1131 91,...97,98,99
1132
1133 with "cell=1,1-2,2;3,3-4,4;1,4;4,1" will return:
1134
1135 11,12,14
1136 21,22
1137 33,34
1138 41,43,44
1139
Overlapping cell-specs will return those cells only once. So
"cell=1,1-3,3;2,2-4,4;2,3;4,2" will return:
1142
1143 11,12,13
1144 21,22,23,24
1145 31,32,33,34
1146 42,43,44
1147
1148 RFC7111 <https://datatracker.ietf.org/doc/html/rfc7111> does not
1149 allow different types of specs to be combined (either "row" or "col"
1150 or "cell"). Passing an invalid fragment specification will croak and
1151 set error 2013.
1152
1153 column_names
1154 Set the "keys" that will be used in the "getline_hr" calls. If no
1155 keys (column names) are passed, it will return the current setting as a
1156 list.
1157
1158 "column_names" accepts a list of scalars (the column names) or a
1159 single array_ref, so you can pass the return value from "getline" too:
1160
1161 $csv->column_names ($csv->getline ($fh));
1162
1163 "column_names" does no checking on duplicates at all, which might lead
1164 to unexpected results. Undefined entries will be replaced with the
1165 string "\cAUNDEF\cA", so
1166
1167 $csv->column_names (undef, "", "name", "name");
1168 $hr = $csv->getline_hr ($fh);
1169
1170 will set "$hr->{"\cAUNDEF\cA"}" to the 1st field, "$hr->{""}" to the
1171 2nd field, and "$hr->{name}" to the 4th field, discarding the 3rd
1172 field.
1173
1174 "column_names" croaks on invalid arguments.
1175
1176 header
1177 This method does NOT work in perl-5.6.x
1178
1179 Parse the CSV header and set "sep", column_names and encoding.
1180
1181 my @hdr = $csv->header ($fh);
1182 $csv->header ($fh, { sep_set => [ ";", ",", "|", "\t" ] });
1183 $csv->header ($fh, { detect_bom => 1, munge_column_names => "lc" });
1184
1185 The first argument should be a file handle.
1186
1187 This method resets some object properties, as it is supposed to be
1188 invoked only once per file or stream. It will leave attributes
1189 "column_names" and "bound_columns" alone if setting column names is
disabled. Reading headers on previously processed objects might fail on
1191 perl-5.8.0 and older.
1192
1193 Assuming that the file opened for parsing has a header, and the header
1194 does not contain problematic characters like embedded newlines, read
1195 the first line from the open handle then auto-detect whether the header
1196 separates the column names with a character from the allowed separator
1197 list.
1198
1199 If any of the allowed separators matches, and none of the other
1200 allowed separators match, set "sep" to that separator for the
1201 current CSV_XS instance and use it to parse the first line, map those
1202 to lowercase, and use that to set the instance "column_names":
1203
1204 my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1 });
1205 open my $fh, "<", "file.csv";
1206 binmode $fh; # for Windows
1207 $csv->header ($fh);
1208 while (my $row = $csv->getline_hr ($fh)) {
1209 ...
1210 }
1211
1212 If the header is empty, contains more than one unique separator out of
1213 the allowed set, contains empty fields, or contains identical fields
1214 (after folding), it will croak with error 1010, 1011, 1012, or 1013
1215 respectively.
1216
1217 If the header contains embedded newlines or is not valid CSV in any
1218 other way, this method will croak and leave the parse error untouched.
1219
1220 A successful call to "header" will always set the "sep" of the $csv
1221 object. This behavior can not be disabled.
1222
1223 return value
1224
1225 On error this method will croak.
1226
1227 In list context, the headers will be returned whether they are used to
1228 set "column_names" or not.
1229
1230 In scalar context, the instance itself is returned. Note: the values
1231 as found in the header will effectively be lost if "set_column_names"
1232 is false.
1233
1234 Options
1235
1236 sep_set
1237 $csv->header ($fh, { sep_set => [ ";", ",", "|", "\t" ] });
1238
1239 The list of legal separators defaults to "[ ";", "," ]" and can be
1240 changed by this option. As this is probably the most often used
1241 option, it can be passed on its own as an unnamed argument:
1242
1243 $csv->header ($fh, [ ";", ",", "|", "\t", "::", "\x{2063}" ]);
1244
1245 Multi-byte sequences are allowed, both multi-character and
1246 Unicode. See "sep".
1247
1248 detect_bom
1249 $csv->header ($fh, { detect_bom => 1 });
1250
1251 The default behavior is to detect if the header line starts with a
1252 BOM. If the header has a BOM, use that to set the encoding of $fh.
1253 This default behavior can be disabled by passing a false value to
1254 "detect_bom".
1255
1256 Supported encodings from BOM are: UTF-8, UTF-16BE, UTF-16LE,
1257 UTF-32BE, and UTF-32LE. BOM also supports UTF-1, UTF-EBCDIC, SCSU,
1258 BOCU-1, and GB-18030 but Encode does not (yet). UTF-7 is not
1259 supported.
1260
1261 If a supported BOM was detected as start of the stream, it is stored
1262 in the object attribute "ENCODING".
1263
1264 my $enc = $csv->{ENCODING};
1265
1266 The encoding is used with "binmode" on $fh.
1267
1268 If the handle was opened in a (correct) encoding, this method will
1269 not alter the encoding, as it checks the leading bytes of the first
1270 line. In case the stream starts with a decoded BOM ("U+FEFF"),
1271 "{ENCODING}" will be "" (empty) instead of the default "undef".
1272
1273 munge_column_names
1274 This option offers the means to modify the column names into
1275 something that is most useful to the application. The default is to
1276 map all column names to lower case.
1277
1278 $csv->header ($fh, { munge_column_names => "lc" });
1279
1280 The following values are available:
1281
1282 lc - lower case
1283 uc - upper case
1284 db - valid DB field names
1285 none - do not change
1286 \%hash - supply a mapping
1287 \&cb - supply a callback
1288
1289 Lower case
1290 $csv->header ($fh, { munge_column_names => "lc" });
1291
1292 The header is changed to all lower-case
1293
1294 $_ = lc;
1295
1296 Upper case
1297 $csv->header ($fh, { munge_column_names => "uc" });
1298
1299 The header is changed to all upper-case
1300
1301 $_ = uc;
1302
1303 Literal
1304 $csv->header ($fh, { munge_column_names => "none" });
1305
1306 Hash
$csv->header ($fh, { munge_column_names => { foo => "sombrero" }});
1308
If a value does not exist, the original value is used unchanged.
1310
1311 Database
1312 $csv->header ($fh, { munge_column_names => "db" });
1313
1314 - lower-case
1315
1316 - all sequences of non-word characters are replaced with an
1317 underscore
1318
1319 - all leading underscores are removed
1320
1321 $_ = lc (s/\W+/_/gr =~ s/^_+//r);
1322
1323 Callback
1324 $csv->header ($fh, { munge_column_names => sub { fc } });
1325 $csv->header ($fh, { munge_column_names => sub { "column_".$col++ } });
1326 $csv->header ($fh, { munge_column_names => sub { lc (s/\W+/_/gr) } });
1327
1328 As this callback is called in a "map", you can use $_ directly.
1329
1330 set_column_names
1331 $csv->header ($fh, { set_column_names => 1 });
1332
The default is to set the instance's column names using "column_names"
if the method is successful, so subsequent calls to "getline_hr" can
return a hash. Setting the column names can be disabled by passing a
false value for this option.
1337
1338 As described in "return value" above, content is lost in scalar
1339 context.
1340
1341 Validation
1342
1343 When receiving CSV files from external sources, this method can be
1344 used to protect against changes in the layout by restricting to known
1345 headers (and typos in the header fields).
1346
1347 my %known = (
1348 "record key" => "c_rec",
1349 "rec id" => "c_rec",
1350 "id_rec" => "c_rec",
1351 "kode" => "code",
1352 "code" => "code",
1353 "vaule" => "value",
1354 "value" => "value",
1355 );
1356 my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1 });
1357 open my $fh, "<", $source or die "$source: $!";
1358 $csv->header ($fh, { munge_column_names => sub {
1359 s/\s+$//;
1360 s/^\s+//;
1361 $known{lc $_} or die "Unknown column '$_' in $source";
1362 }});
1363 while (my $row = $csv->getline_hr ($fh)) {
1364 say join "\t", $row->{c_rec}, $row->{code}, $row->{value};
1365 }
1366
1367 bind_columns
1368 Takes a list of scalar references to be used for output with "print"
1369 or to store in the fields fetched by "getline". When you do not pass
1370 enough references to store the fetched fields in, "getline" will fail
1371 with error 3006. If you pass more than there are fields to return,
1372 the content of the remaining references is left untouched.
1373
1374 $csv->bind_columns (\$code, \$name, \$price, \$description);
1375 while ($csv->getline ($fh)) {
1376 print "The price of a $name is \x{20ac} $price\n";
1377 }
1378
1379 To reset or clear all column binding, call "bind_columns" with the
1380 single argument "undef". This will also clear column names.
1381
1382 $csv->bind_columns (undef);
1383
1384 If no arguments are passed at all, "bind_columns" will return the list
1385 of current bindings or "undef" if no binds are active.
1386
1387 Note that in parsing with "bind_columns", the fields are set on the
1388 fly. That implies that if the third field of a row causes an error
1389 (or this row has just two fields where the previous row had more), the
1390 first two fields already have been assigned the values of the current
1391 row, while the rest of the fields will still hold the values of the
1392 previous row. If you want the parser to fail in these cases, use the
1393 "strict" attribute.
1394
1395 eof
1396 $eof = $csv->eof ();
1397
1398 If "parse" or "getline" was used with an IO stream, this method will
1399 return true (1) if the last call hit end of file, otherwise it will
1400 return false (''). This is useful to see the difference between a
1401 failure and end of file.
1402
1403 Note that if the parsing of the last line caused an error, "eof" is
1404 still true. That means that if you are not using "auto_diag", an idiom
1405 like
1406
1407 while (my $row = $csv->getline ($fh)) {
1408 # ...
1409 }
1410 $csv->eof or $csv->error_diag;
1411
1412 will not report the error. You would have to change that to
1413
1414 while (my $row = $csv->getline ($fh)) {
1415 # ...
1416 }
1417 +$csv->error_diag and $csv->error_diag;
1418
1419 types
1420 $csv->types (\@tref);
1421
1422 This method is used to force that (all) columns are of a given type.
1423 For example, if you have an integer column, two columns with
1424 doubles and a string column, then you might do a
1425
1426 $csv->types ([Text::CSV_XS::IV (),
1427 Text::CSV_XS::NV (),
1428 Text::CSV_XS::NV (),
1429 Text::CSV_XS::PV ()]);
1430
1431 Column types are used only for decoding columns while parsing, in
1432 other words by the "parse" and "getline" methods.
1433
1434 You can unset column types by doing a
1435
1436 $csv->types (undef);
1437
1438 or fetch the current type settings with
1439
1440 $types = $csv->types ();
1441
1442 IV
1443 CSV_TYPE_IV
1444 Set field type to integer.
1445
1446 NV
1447 CSV_TYPE_NV
1448 Set field type to numeric/float.
1449
1450 PV
1451 CSV_TYPE_PV
1452 Set field type to string.
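
The same column types can also be set with the importable constants
mentioned under ":CONSTANTS"; a minimal sketch:

    use Text::CSV_XS qw( :CONSTANTS );
    $csv->types ([ CSV_TYPE_IV, CSV_TYPE_NV, CSV_TYPE_NV, CSV_TYPE_PV ]);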
1453
1454 fields
1455 @columns = $csv->fields ();
1456
1457 This method returns the input to "combine" or the resultant
1458 decomposed fields of a successful "parse", whichever was called more
1459 recently.
1460
1461 Note that the return value is undefined after using "getline", which
1462 does not fill the data structures returned by "parse".
1463
1464 meta_info
1465 @flags = $csv->meta_info ();
1466
1467 This method returns the "flags" of the input to "combine" or the flags
1468 of the resultant decomposed fields of "parse", whichever was called
1469 more recently.
1470
For each field, a meta_info field will hold flags that provide
information about the field returned by the "fields" method or passed
to the "combine" method. The flags are bit-wise-"or"'d like:
1474
1475 0x0001
1476 "CSV_FLAGS_IS_QUOTED"
1477 The field was quoted.
1478
1479 0x0002
1480 "CSV_FLAGS_IS_BINARY"
1481 The field was binary.
1482
1483 0x0004
1484 "CSV_FLAGS_ERROR_IN_FIELD"
1485 The field was invalid.
1486
1487 Currently only used when "allow_loose_quotes" is active.
1488
1489 0x0010
1490 "CSV_FLAGS_IS_MISSING"
1491 The field was missing.
1492
1493 See the "is_***" methods below.
1494
1495 is_quoted
1496 my $quoted = $csv->is_quoted ($column_idx);
1497
1498 where $column_idx is the (zero-based) index of the column in the
1499 last result of "parse".
1500
1501 This returns a true value if the data in the indicated column was
1502 enclosed in "quote_char" quotes. This might be important for fields
1503 where content ",20070108," is to be treated as a numeric value, and
1504 where ","20070108"," is explicitly marked as character string data.
1505
1506 This method is only valid when "keep_meta_info" is set to a true value.
1507
1508 is_binary
1509 my $binary = $csv->is_binary ($column_idx);
1510
1511 where $column_idx is the (zero-based) index of the column in the
1512 last result of "parse".
1513
1514 This returns a true value if the data in the indicated column contained
1515 any byte in the range "[\x00-\x08,\x10-\x1F,\x7F-\xFF]".
1516
1517 This method is only valid when "keep_meta_info" is set to a true value.
1518
1519 is_missing
1520 my $missing = $csv->is_missing ($column_idx);
1521
1522 where $column_idx is the (zero-based) index of the column in the
1523 last result of "getline_hr".
1524
1525 $csv->keep_meta_info (1);
1526 while (my $hr = $csv->getline_hr ($fh)) {
1527 $csv->is_missing (0) and next; # This was an empty line
1528 }
1529
1530 When using "getline_hr", it is impossible to tell if the parsed
fields are "undef" because they were not filled in the "CSV" stream
1532 or because they were not read at all, as all the fields defined by
1533 "column_names" are set in the hash-ref. If you still need to know if
1534 all fields in each row are provided, you should enable "keep_meta_info"
1535 so you can check the flags.
1536
1537 If "keep_meta_info" is "false", "is_missing" will always return
1538 "undef", regardless of $column_idx being valid or not. If this
1539 attribute is "true" it will return either 0 (the field is present) or 1
1540 (the field is missing).
1541
1542 A special case is the empty line. If the line is completely empty -
1543 after dealing with the flags - this is still a valid CSV line: it is a
1544 record of just one single empty field. However, if "keep_meta_info" is
1545 set, invoking "is_missing" with index 0 will now return true.
1546
1547 status
1548 $status = $csv->status ();
1549
1550 This method returns the status of the last invoked "combine" or "parse"
1551 call. Status is success (true: 1) or failure (false: "undef" or 0).
1552
1553 Note that as this only keeps track of the status of above mentioned
1554 methods, you are probably looking for "error_diag" instead.
1555
1556 error_input
1557 $bad_argument = $csv->error_input ();
1558
1559 This method returns the erroneous argument (if it exists) of "combine"
1560 or "parse", whichever was called more recently. If the last
1561 invocation was successful, "error_input" will return "undef".
1562
1563 Depending on the type of error, it might also hold the data for the
1564 last error-input of "getline".
1565
1566 error_diag
1567 Text::CSV_XS->error_diag ();
1568 $csv->error_diag ();
1569 $error_code = 0 + $csv->error_diag ();
1570 $error_str = "" . $csv->error_diag ();
1571 ($cde, $str, $pos, $rec, $fld) = $csv->error_diag ();
1572
1573 If (and only if) an error occurred, this function returns the
1574 diagnostics of that error.
1575
1576 If called in void context, this will print the internal error code and
1577 the associated error message to STDERR.
1578
1579 If called in list context, this will return the error code and the
1580 error message in that order. If the last error was from parsing, the
1581 rest of the values returned are a best guess at the location within
1582 the line that was being parsed. Their values are 1-based. The
position currently is the index of the byte at which the parsing failed
in the current record. It might change to be the index of the current
character in a later release. The record is the index of the record
parsed by this csv instance. The field number is the index of the field
the parser thinks it is currently trying to parse. See
1588 examples/csv-check for how this can be used.
1589
1590 If called in scalar context, it will return the diagnostics in a
1591 single scalar, a-la $!. It will contain the error code in numeric
1592 context, and the diagnostics message in string context.
1593
1594 When called as a class method or a direct function call, the
1595 diagnostics are that of the last "new" call.
1596
1597 record_number
1598 $recno = $csv->record_number ();
1599
Returns the number of records parsed by this csv instance. This value
should be more accurate than $. when embedded newlines come into play.
Records written by this instance are not counted.
1603
1604 SetDiag
1605 $csv->SetDiag (0);
1606
Use this to reset the diagnostics if you are dealing with errors.
1608
EXPORTS
    By default none of these are exported.
1611
1612 csv
1613 use Text::CSV_XS qw( csv );
1614
Import the "csv" function. See below.
1616
1617 :CONSTANTS
1618 use Text::CSV_XS qw( :CONSTANTS );
1619
1620 Import module constants "CSV_FLAGS_IS_QUOTED",
1621 "CSV_FLAGS_IS_BINARY", "CSV_FLAGS_ERROR_IN_FIELD",
1622 "CSV_FLAGS_IS_MISSING", "CSV_TYPE_PV", "CSV_TYPE_IV", and
1623 "CSV_TYPE_NV". Each can be imported alone
1624
1625 use Text::CSV_XS qw( CSV_FLAGS_IS_BINARY CSV_TYPE_NV );
1626
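     A short sketch of how one of these constants might be used together
     with "keep_meta_info" (the sample record is arbitrary):

         use Text::CSV_XS qw( CSV_FLAGS_IS_QUOTED );

         my $csv = Text::CSV_XS->new ({ keep_meta_info => 1 });
         $csv->parse (q{"foo",bar});
         my @fl = $csv->meta_info;   # one flag value per parsed field
         printf "field 0 %s quoted\n",
             $fl[0] & CSV_FLAGS_IS_QUOTED ? "was" : "was not";
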
1627 FUNCTIONS
1628 csv
1629 This function is not exported by default and should be explicitly
1630 requested:
1631
1632 use Text::CSV_XS qw( csv );
1633
1634 This is a high-level function that aims at simple (user) interfaces.
1635 This can be used to read/parse a "CSV" file or stream (the default
1636 behavior) or to produce a file or write to a stream (define the "out"
1637 attribute). It returns an array- or hash-reference on parsing (or
1638 "undef" on fail) or the numeric value of "error_diag" on writing.
1639 When this function fails you can get to the error using the class call
1640 to "error_diag"
1641
1642 my $aoa = csv (in => "test.csv") or
1643 die Text::CSV_XS->error_diag;
1644
1645 This function takes the arguments as key-value pairs. This can be
1646 passed as a list or as an anonymous hash:
1647
1648 my $aoa = csv ( in => "test.csv", sep_char => ";");
1649 my $aoh = csv ({ in => $fh, headers => "auto" });
1650
1651 The arguments passed consist of two parts: the arguments to "csv"
1652 itself and the optional attributes to the "CSV" object used inside
1653 the function as enumerated and explained in "new".
1654
1655 If not overridden, the default options used for CSV are
1656
1657 auto_diag => 1
1658 escape_null => 0
1659
1660 The option that is always set and cannot be altered is
1661
1662 binary => 1
1663
1664 As this function will likely be used in one-liners, it allows "quote"
1665 to be abbreviated as "quo", and "escape_char" to be abbreviated as
1666 "esc" or "escape".
1667
1668 Alternative invocations:
1669
1670 my $aoa = Text::CSV_XS::csv (in => "file.csv");
1671
1672 my $csv = Text::CSV_XS->new ();
1673 my $aoa = $csv->csv (in => "file.csv");
1674
1675 In the latter case, the object attributes are used from the existing
1676 object and the attribute arguments in the function call are ignored:
1677
1678 my $csv = Text::CSV_XS->new ({ sep_char => ";" });
1679 my $aoh = $csv->csv (in => "file.csv", bom => 1);
1680
1681 will parse using ";" as "sep_char", not ",".
1682
1683 in
1684
1685 Used to specify the source. "in" can be a file name (e.g. "file.csv"),
1686 which will be opened for reading and closed when finished, a file
1687 handle (e.g. $fh or "FH"), a reference to a glob (e.g. "\*ARGV"),
1688 the glob itself (e.g. *STDIN), or a reference to a scalar (e.g.
1689 "\q{1,2,"csv"}").
1690
1691 When used with "out", "in" should be a reference to a CSV structure
1692 (AoA or AoH) or a CODE-ref that returns an array-reference or a hash-
1693 reference. The code-ref will be invoked with no arguments.
1694
1695 my $aoa = csv (in => "file.csv");
1696
1697 open my $fh, "<", "file.csv";
1698 my $aoa = csv (in => $fh);
1699
1700 my $csv = [ [qw( Foo Bar )], [ 1, 2 ], [ 2, 3 ]];
1701 my $err = csv (in => $csv, out => "file.csv");
1702
1703 If called in void context without the "out" attribute, the resulting
1704 ref will be used as input to a subsequent call to csv:
1705
1706 csv (in => "file.csv", filter => { 2 => sub { length > 2 }})
1707
1708 will be a shortcut to
1709
1710 csv (in => csv (in => "file.csv", filter => { 2 => sub { length > 2 }}))
1711
1712 where, in the absence of the "out" attribute, this is a shortcut to
1713
1714 csv (in => csv (in => "file.csv", filter => { 2 => sub { length > 2 }}),
1715 out => *STDOUT)
1716
1717 out
1718
1719 csv (in => $aoa, out => "file.csv");
1720 csv (in => $aoa, out => $fh);
1721 csv (in => $aoa, out => STDOUT);
1722 csv (in => $aoa, out => *STDOUT);
1723 csv (in => $aoa, out => \*STDOUT);
1724 csv (in => $aoa, out => \my $data);
1725 csv (in => $aoa, out => undef);
1726 csv (in => $aoa, out => \"skip");
1727
1728 csv (in => $fh, out => \@aoa);
1729 csv (in => $fh, out => \@aoh, bom => 1);
1730 csv (in => $fh, out => \%hsh, key => "key");
1731
1732 In output mode, the default CSV options when producing CSV are
1733
1734 eol => "\r\n"
1735
1736 The "fragment" attribute is ignored in output mode.
1737
1738 "out" can be a file name (e.g. "file.csv"), which will be opened for
1739 writing and closed when finished, a file handle (e.g. $fh or "FH"), a
1740 reference to a glob (e.g. "\*STDOUT"), the glob itself (e.g. *STDOUT),
1741 or a reference to a scalar (e.g. "\my $data").
1742
1743 csv (in => sub { $sth->fetch }, out => "dump.csv");
1744 csv (in => sub { $sth->fetchrow_hashref }, out => "dump.csv",
1745 headers => $sth->{NAME_lc});
1746
1747 When a code-ref is used for "in", the output is generated per
1748 invocation, so no buffering is involved. This implies that there is no
1749 size restriction on the number of records. The "csv" function ends when
1750 the coderef returns a false value.
1751
1752 If "out" is set to a reference of the literal string "skip", the output
1753 will be suppressed completely, which might be useful in combination
1754 with a filter for side effects only.
1755
1756 my %cache;
1757 csv (in => "dump.csv",
1758 out => \"skip",
1759 on_in => sub { $cache{$_[1][1]}++ });
1760
1761 Currently, setting "out" to any false value ("undef", "", 0) will be
1762 equivalent to "\"skip"".
1763
1764 If the "in" argument points to something to parse, and the "out" is set
1765 to a reference to an "ARRAY" or a "HASH", the output is appended to the
1766 data in the existing reference. The result of the parse should match
1767 what exists in the reference passed. This might come in handy when you
1768 have to parse a set of files with similar content (like data stored per
1769 period) and you want to collect that into a single data structure:
1770
1771 my %hash;
1772 csv (in => $_, out => \%hash, key => "id") for sort glob "foo-[0-9]*.csv";
1773
1774 my @list; # List of arrays
1775 csv (in => $_, out => \@list) for sort glob "foo-[0-9]*.csv";
1776
1777 my @list; # List of hashes
1778 csv (in => $_, out => \@list, bom => 1) for sort glob "foo-[0-9]*.csv";
1779
1780 encoding
1781
1782 If passed, it should be an encoding accepted by the ":encoding()"
1783 option to "open". There is no default value. This attribute does not
1784 work in perl 5.6.x. "encoding" can be abbreviated to "enc" for ease of
1785 use in command line invocations.
1786
1787 If "encoding" is set to the literal value "auto", the method "header"
1788 will be invoked on the opened stream to check if there is a BOM and set
1789 the encoding accordingly. This is equal to passing a true value in
1790 the option "detect_bom".
1791
1792 Encodings can be stacked, as supported by "binmode":
1793
1794 # Using PerlIO::via::gzip
1795 csv (in => \@csv,
1796 out => "test.csv:via.gz",
1797 encoding => ":via(gzip):encoding(utf-8)",
1798 );
1799 $aoa = csv (in => "test.csv:via.gz", encoding => ":via(gzip)");
1800
1801 # Using PerlIO::gzip
1802 csv (in => \@csv,
1803 out => "test.csv:gzip.gz",
1804 encoding => ":gzip:encoding(utf-8)",
1805 );
1806 $aoa = csv (in => "test.csv:gzip.gz", encoding => ":gzip");
1807
1808 detect_bom
1809
1810 If "detect_bom" is given, the method "header" will be invoked on
1811 the opened stream to check if there is a BOM and set the encoding
1812 accordingly.
1813
1814 "detect_bom" can be abbreviated to "bom".
1815
1816 This is the same as setting "encoding" to "auto".
1817
1818 Note that as the method "header" is invoked, its default is to also
1819 set the headers.
1820
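     A minimal sketch ("file.csv" is only a placeholder; it may or may not
     start with a BOM):

         my $aoh = csv (in => "file.csv", bom => 1);
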
1821 headers
1822
1823 If this attribute is not given, the default behavior is to produce an
1824 array of arrays.
1825
1826 If "headers" is supplied, it should be an anonymous list of column
1827 names, an anonymous hashref, a coderef, or a literal flag: "auto",
1828 "lc", "uc", or "skip".
1829
1830 skip
1831 When "skip" is used, the header will not be included in the output.
1832
1833 my $aoa = csv (in => $fh, headers => "skip");
1834
1835 auto
1836 If "auto" is used, the first line of the "CSV" source will be read as
1837 the list of field headers and used to produce an array of hashes.
1838
1839 my $aoh = csv (in => $fh, headers => "auto");
1840
1841 lc
1842 If "lc" is used, the first line of the "CSV" source will be read as
1843 the list of field headers mapped to lower case and used to produce
1844 an array of hashes. This is a variation of "auto".
1845
1846 my $aoh = csv (in => $fh, headers => "lc");
1847
1848 uc
1849 If "uc" is used, the first line of the "CSV" source will be read as
1850 the list of field headers mapped to upper case and used to produce
1851 an array of hashes. This is a variation of "auto".
1852
1853 my $aoh = csv (in => $fh, headers => "uc");
1854
1855 CODE
1856 If a coderef is used, the first line of the "CSV" source will be
1857 read as the list of mangled field headers in which each field is
1858 passed as the only argument to the coderef. This list is used to
1859 produce an array of hashes.
1860
1861 my $aoh = csv (in => $fh,
1862 headers => sub { lc ($_[0]) =~ s/kode/code/gr });
1863
1864 this example is a variation of using "lc" where all occurrences of
1865 "kode" are replaced with "code".
1866
1867 ARRAY
1868 If "headers" is an anonymous list, the entries in the list will be
1869 used as field names. The first line is considered data instead of
1870 headers.
1871
1872 my $aoh = csv (in => $fh, headers => [qw( Foo Bar )]);
1873 csv (in => $aoa, out => $fh, headers => [qw( code description price )]);
1874
1875 HASH
1876 If "headers" is a hash reference, this implies "auto", but header
1877 fields that exist as a key in the hashref will be replaced by the value
1878 for that key. Given a CSV file like
1879
1880 post-kode,city,name,id number,fubble
1881 1234AA,Duckstad,Donald,13,"X313DF"
1882
1883 using
1884
1885 csv (headers => { "post-kode" => "pc", "id number" => "ID" }, ...
1886
1887 will return an entry like
1888
1889 { pc => "1234AA",
1890 city => "Duckstad",
1891 name => "Donald",
1892 ID => "13",
1893 fubble => "X313DF",
1894 }
1895
1896 See also "munge_column_names" and "set_column_names".
1897
1898 munge_column_names
1899
1900 If "munge_column_names" is set, the method "header" is invoked on
1901 the opened stream with all matching arguments to detect and set the
1902 headers.
1903
1904 "munge_column_names" can be abbreviated to "munge".
1905
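     A small sketch, mapping all column names to lower case while reading
     (assuming "file.csv" starts with a header line):

         my $aoh = csv (in => "file.csv", munge => "lc");
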
1906 key
1907
1908 If passed, will default "headers" to "auto" and return a hashref
1909 instead of an array of hashes. Allowed values are simple scalars or
1910 array-references where the first element is the joiner and the rest are
1911 the fields to join to combine the key.
1912
1913 my $ref = csv (in => "test.csv", key => "code");
1914 my $ref = csv (in => "test.csv", key => [ ":" => "code", "color" ]);
1915
1916 with test.csv like
1917
1918 code,product,price,color
1919 1,pc,850,gray
1920 2,keyboard,12,white
1921 3,mouse,5,black
1922
1923 the first example will return
1924
1925 { 1 => {
1926 code => 1,
1927 color => 'gray',
1928 price => 850,
1929 product => 'pc'
1930 },
1931 2 => {
1932 code => 2,
1933 color => 'white',
1934 price => 12,
1935 product => 'keyboard'
1936 },
1937 3 => {
1938 code => 3,
1939 color => 'black',
1940 price => 5,
1941 product => 'mouse'
1942 }
1943 }
1944
1945 the second example will return
1946
1947 { "1:gray" => {
1948 code => 1,
1949 color => 'gray',
1950 price => 850,
1951 product => 'pc'
1952 },
1953 "2:white" => {
1954 code => 2,
1955 color => 'white',
1956 price => 12,
1957 product => 'keyboard'
1958 },
1959 "3:black" => {
1960 code => 3,
1961 color => 'black',
1962 price => 5,
1963 product => 'mouse'
1964 }
1965 }
1966
1967 The "key" attribute can be combined with "headers" for "CSV" date that
1968 has no header line, like
1969
1970 my $ref = csv (
1971 in => "foo.csv",
1972 headers => [qw( c_foo foo bar description stock )],
1973 key => "c_foo",
1974 );
1975
1976 value
1977
1978 Used to create key-value hashes.
1979
1980 Only allowed when "key" is valid. A "value" can be either a single
1981 column label or an anonymous list of column labels. In the first case,
1982 the value will be a simple scalar value, in the latter case, it will be
1983 a hashref.
1984
1985 my $ref = csv (in => "test.csv", key => "code",
1986 value => "price");
1987 my $ref = csv (in => "test.csv", key => "code",
1988 value => [ "product", "price" ]);
1989 my $ref = csv (in => "test.csv", key => [ ":" => "code", "color" ],
1990 value => "price");
1991 my $ref = csv (in => "test.csv", key => [ ":" => "code", "color" ],
1992 value => [ "product", "price" ]);
1993
1994 with test.csv like
1995
1996 code,product,price,color
1997 1,pc,850,gray
1998 2,keyboard,12,white
1999 3,mouse,5,black
2000
2001 the first example will return
2002
2003 { 1 => 850,
2004 2 => 12,
2005 3 => 5,
2006 }
2007
2008 the second example will return
2009
2010 { 1 => {
2011 price => 850,
2012 product => 'pc'
2013 },
2014 2 => {
2015 price => 12,
2016 product => 'keyboard'
2017 },
2018 3 => {
2019 price => 5,
2020 product => 'mouse'
2021 }
2022 }
2023
2024 the third example will return
2025
2026 { "1:gray" => 850,
2027 "2:white" => 12,
2028 "3:black" => 5,
2029 }
2030
2031 the fourth example will return
2032
2033 { "1:gray" => {
2034 price => 850,
2035 product => 'pc'
2036 },
2037 "2:white" => {
2038 price => 12,
2039 product => 'keyboard'
2040 },
2041 "3:black" => {
2042 price => 5,
2043 product => 'mouse'
2044 }
2045 }
2046
2047 keep_headers
2048
2049 When using hashes, keep the column names in the arrayref passed, so
2050 all headers are available after the call in the original order.
2051
2052 my $aoh = csv (in => "file.csv", keep_headers => \my @hdr);
2053
2054 This attribute can be abbreviated to "kh" or passed as
2055 "keep_column_names".
2056
2057 This attribute implies a default of "auto" for the "headers" attribute.
2058
2059 The headers can also be kept internally to keep stable header order:
2060
2061 csv (in => csv (in => "file.csv", kh => "internal"),
2062 out => "new.csv",
2063 kh => "internal");
2064
2065 where "internal" can also be 1, "yes", or "true". This is similar to
2066
2067 my @h;
2068 csv (in => csv (in => "file.csv", kh => \@h),
2069 out => "new.csv",
2070 headers => \@h);
2071
2072 fragment
2073
2074 Only output the fragment as defined in the "fragment" method. This
2075 option is ignored when generating "CSV". See "out".
2076
2077 Combining all of them could give something like
2078
2079 use Text::CSV_XS qw( csv );
2080 my $aoh = csv (
2081 in => "test.txt",
2082 encoding => "utf-8",
2083 headers => "auto",
2084 sep_char => "|",
2085 fragment => "row=3;6-9;15-*",
2086 );
2087 say $aoh->[15]{Foo};
2088
2089 sep_set
2090
2091 If "sep_set" is set, the method "header" is invoked on the opened
2092 stream to detect and set "sep_char" with the given set.
2093
2094 "sep_set" can be abbreviated to "seps".
2095
2096 Note that as the "header" method is invoked, its default is to also
2097 set the headers.
2098
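     A small sketch (the file name and the candidate separators are only
     examples):

         my $aoh = csv (in => "ambiguous.csv", seps => [ ";", "," ]);
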
2099 set_column_names
2100
2101 If "set_column_names" is passed, the method "header" is invoked on
2102 the opened stream with all arguments meant for "header".
2103
2104 If "set_column_names" is passed as a false value, the content of the
2105 first row is only preserved if the output is AoA:
2106
2107 With an input-file like
2108
2109 bAr,foo
2110 1,2
2111 3,4,5
2112
2113 This call
2114
2115 my $aoa = csv (in => $file, set_column_names => 0);
2116
2117 will result in
2118
2119 [[ "bar", "foo" ],
2120 [ "1", "2" ],
2121 [ "3", "4", "5" ]]
2122
2123 and
2124
2125 my $aoa = csv (in => $file, set_column_names => 0, munge => "none");
2126
2127 will result in
2128
2129 [[ "bAr", "foo" ],
2130 [ "1", "2" ],
2131 [ "3", "4", "5" ]]
2132
2133 Callbacks
2134 Callbacks enable actions triggered from the inside of Text::CSV_XS.
2135
2136 While most of what this enables can easily be done in an unrolled
2137 loop as described in the "SYNOPSIS", callbacks can be used to meet
2138 special demands or to enhance the "csv" function.
2139
2140 error
2141 $csv->callbacks (error => sub { $csv->SetDiag (0) });
2142
2143 the "error" callback is invoked when an error occurs, but only
2144 when "auto_diag" is set to a true value. A callback is invoked with
2145 the values returned by "error_diag":
2146
2147 my ($c, $s);
2148
2149 sub ignore3006 {
2150 my ($err, $msg, $pos, $recno, $fldno) = @_;
2151 if ($err == 3006) {
2152 # ignore this error
2153 ($c, $s) = (undef, undef);
2154 Text::CSV_XS->SetDiag (0);
2155 }
2156 # Any other error
2157 return;
2158 } # ignore3006
2159
2160 $csv->callbacks (error => \&ignore3006);
2161 $csv->bind_columns (\$c, \$s);
2162 while ($csv->getline ($fh)) {
2163 # Error 3006 will not stop the loop
2164 }
2165
2166 after_parse
2167 $csv->callbacks (after_parse => sub { push @{$_[1]}, "NEW" });
2168 while (my $row = $csv->getline ($fh)) {
2169 $row->[-1] eq "NEW";
2170 }
2171
2172 This callback is invoked after parsing with "getline" only if no
2173 error occurred. The callback is invoked with two arguments: the
2174 current "CSV" parser object and an array reference to the fields
2175 parsed.
2176
2177 The return code of the callback is ignored unless it is a reference
2178 to the string "skip", in which case the record will be skipped in
2179 "getline_all".
2180
2181 sub add_from_db {
2182 my ($csv, $row) = @_;
2183 $sth->execute ($row->[4]);
2184 push @$row, $sth->fetchrow_array;
2185 } # add_from_db
2186
2187 my $aoa = csv (in => "file.csv", callbacks => {
2188 after_parse => \&add_from_db });
2189
2190 This hook can be used for validation:
2191
2192 FAIL
2193 Die if any of the records does not validate a rule:
2194
2195 after_parse => sub {
2196 $_[1][4] =~ m/^[0-9]{4}\s?[A-Z]{2}$/ or
2197 die "5th field does not have a valid Dutch zipcode";
2198 }
2199
2200 DEFAULT
2201 Replace invalid fields with a default value:
2202
2203 after_parse => sub { $_[1][2] =~ m/^\d+$/ or $_[1][2] = 0 }
2204
2205 SKIP
2206 Skip records that have invalid fields (only applies to
2207 "getline_all"):
2208
2209 after_parse => sub { $_[1][0] =~ m/^\d+$/ or return \"skip"; }
2210
2211 before_print
2212 my $idx = 1;
2213 $csv->callbacks (before_print => sub { $_[1][0] = $idx++ });
2214 $csv->print (*STDOUT, [ 0, $_ ]) for @members;
2215
2216 This callback is invoked before printing with "print" only if no
2217 error occurred. The callback is invoked with two arguments: the
2218 current "CSV" parser object and an array reference to the fields
2219 passed.
2220
2221 The return code of the callback is ignored.
2222
2223 sub max_4_fields {
2224 my ($csv, $row) = @_;
2225 @$row > 4 and splice @$row, 4;
2226 } # max_4_fields
2227
2228 csv (in => csv (in => "file.csv"), out => *STDOUT,
2229 callbacks => { before_print => \&max_4_fields });
2230
2231 This callback is not active for "combine".
2232
2233 Callbacks for csv ()
2234
2235 The "csv" allows for some callbacks that do not integrate in XS
2236 internals but only feature the "csv" function.
2237
2238 csv (in => "file.csv",
2239 callbacks => {
2240 filter => { 6 => sub { $_ > 15 } }, # first
2241 after_parse => sub { say "AFTER PARSE"; }, # first
2242 after_in => sub { say "AFTER IN"; }, # second
2243 on_in => sub { say "ON IN"; }, # third
2244 },
2245 );
2246
2247 csv (in => $aoh,
2248 out => "file.csv",
2249 callbacks => {
2250 on_in => sub { say "ON IN"; }, # first
2251 before_out => sub { say "BEFORE OUT"; }, # second
2252 before_print => sub { say "BEFORE PRINT"; }, # third
2253 },
2254 );
2255
2256 filter
2257 This callback can be used to filter records. It is called just after
2258 a new record has been scanned. The callback accepts a:
2259
2260 hashref
2261 The keys are the index to the row (the field name or field number,
2262 1-based) and the values are subs to return a true or false value.
2263
2264 csv (in => "file.csv", filter => {
2265 3 => sub { m/a/ }, # third field should contain an "a"
2266 5 => sub { length > 4 }, # 5th field should be at least 5 long
2267 });
2268
2269 csv (in => "file.csv", filter => { foo => sub { $_ > 4 }});
2270
2271 If the keys to the filter hash contain any character that is not a
2272 digit it will also implicitly set "headers" to "auto" unless
2273 "headers" was already passed as argument. When headers are
2274 active, returning an array of hashes, the filter is not applicable
2275 to the header itself.
2276
2277 All sub results should match, as in AND.
2278
2279 The context of the callback sets $_ localized to the field
2280 indicated by the filter. The two arguments are as with all other
2281 callbacks, so the other fields in the current row can be seen:
2282
2283 filter => { 3 => sub { $_ > 100 ? $_[1][1] =~ m/A/ : $_[1][6] =~ m/B/ }}
2284
2285 If the context is set to return a list of hashes ("headers" is
2286 defined), the current record will also be available in the
2287 localized %_:
2288
2289 filter => { 3 => sub { $_ > 100 && $_{foo} =~ m/A/ && $_{bar} < 1000 }}
2290
2291 If the filter is used to alter the content by changing $_, make
2292 sure that the sub returns true in order not to have that record
2293 skipped:
2294
2295 filter => { 2 => sub { $_ = uc }}
2296
2297 will upper-case the second field, and then skip the record if the
2298 resulting content evaluates to false. To always accept, end with truth:
2299
2300 filter => { 2 => sub { $_ = uc; 1 }}
2301
2302 coderef
2303 csv (in => "file.csv", filter => sub { $n++; 0; });
2304
2305 If the argument to "filter" is a coderef, it is an alias or
2306 shortcut to a filter on column 0:
2307
2308 csv (filter => sub { $n++; 0 });
2309
2310 is equal to
2311
2312 csv (filter => { 0 => sub { $n++; 0 } });
2313
2314 filter-name
2315 csv (in => "file.csv", filter => "not_blank");
2316 csv (in => "file.csv", filter => "not_empty");
2317 csv (in => "file.csv", filter => "filled");
2318
2319 These are predefined filters
2320
2321 Given a file like (line numbers prefixed for doc purposes only):
2322
2323 1:1,2,3
2324 2:
2325 3:,
2326 4:""
2327 5:,,
2328 6:, ,
2329 7:"",
2330 8:" "
2331 9:4,5,6
2332
2333 not_blank
2334 Filter out the blank lines
2335
2336 This filter is a shortcut for
2337
2338 filter => { 0 => sub { @{$_[1]} > 1 or
2339 defined $_[1][0] && $_[1][0] ne "" } }
2340
2341 Due to the implementation, it is currently impossible to also
2342 filter lines that consist only of a quoted empty field. These
2343 lines are also considered blank lines.
2344
2345 With the given example, lines 2 and 4 will be skipped.
2346
2347 not_empty
2348 Filter out lines where all the fields are empty.
2349
2350 This filter is a shortcut for
2351
2352 filter => { 0 => sub { grep { defined && $_ ne "" } @{$_[1]} } }
2353
2354 A space is not regarded as being empty, so given the example data,
2355 lines 2, 3, 4, 5, and 7 are skipped.
2356
2357 filled
2358 Filter out lines that have no visible data
2359
2360 This filter is a shortcut for
2361
2362 filter => { 0 => sub { grep { defined && m/\S/ } @{$_[1]} } }
2363
2364 This filter rejects all lines that do not have at least one field
2365 with visible data (at least one non-whitespace character).
2366
2367 With the given example data, this filter would skip lines 2
2368 through 8.
2369
2370 One could also use modules like Types::Standard:
2371
2372 use Types::Standard -types;
2373
2374 my $type = Tuple[Str, Str, Int, Bool, Optional[Num]];
2375 my $check = $type->compiled_check;
2376
2377 # filter with compiled check and warnings
2378 my $aoa = csv (
2379 in => \$data,
2380 filter => {
2381 0 => sub {
2382 my $ok = $check->($_[1]) or
2383 warn $type->get_message ($_[1]), "\n";
2384 return $ok;
2385 },
2386 },
2387 );
2388
2389 after_in
2390 This callback is invoked for each record after all records have been
2391 parsed but before returning the reference to the caller. The hook is
2392 invoked with two arguments: the current "CSV" parser object and a
2393 reference to the record. The reference can be a reference to a
2394 HASH or a reference to an ARRAY as determined by the arguments.
2395
2396 This callback can also be passed as an attribute without the
2397 "callbacks" wrapper.
2398
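     A small sketch, passing "after_in" directly as an attribute; the
     trailing-whitespace cleanup is just an arbitrary example action and
     the file name is a placeholder:

         my $aoa = csv (
             in       => "file.csv",
             after_in => sub { defined and s/\s+\z// for @{$_[1]} },
             );
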
2399 before_out
2400 This callback is invoked for each record before the record is
2401 printed. The hook is invoked with two arguments: the current "CSV"
2402 parser object and a reference to the record. The reference can be a
2403 reference to a HASH or a reference to an ARRAY as determined by the
2404 arguments.
2405
2406 This callback can also be passed as an attribute without the
2407 "callbacks" wrapper.
2408
2409 This callback makes the row available in %_ if the row is a hashref.
2410 In this case %_ is writable and will change the original row.
2411
2412 on_in
2413 This callback acts exactly as the "after_in" or the "before_out"
2414 hooks.
2415
2416 This callback can also be passed as an attribute without the
2417 "callbacks" wrapper.
2418
2419 This callback makes the row available in %_ if the row is a hashref.
2420 In this case %_ is writable and will change the original row. So e.g.
2421 with
2422
2423 my $aoh = csv (
2424 in => \"foo\n1\n2\n",
2425 headers => "auto",
2426 on_in => sub { $_{bar} = 2; },
2427 );
2428
2429 $aoh will be:
2430
2431 [ { foo => 1,
2432 bar => 2,
2433 }
2434 { foo => 2,
2435 bar => 2,
2436 }
2437 ]
2438
2439 csv
2440 The function "csv" can also be called as a method or with an
2441 existing Text::CSV_XS object. This could help if the function is to
2442 be invoked many times and the overhead of creating the object
2443 internally over and over again can be avoided by passing an
2444 existing instance.
2445
2446 my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1 });
2447
2448 my $aoa = $csv->csv (in => $fh);
2449 my $aoa = csv (in => $fh, csv => $csv);
2450
2451 both act the same. Running this 20000 times on a 20-line CSV file
2452 showed a 53% speedup.
2453
2454 INTERNALS
2455 Combine (...)
2456 Parse (...)
2457
2458 The arguments to these internal functions are deliberately not
2459 described or documented, in order to enable the module authors to
2460 change them whenever they feel the need. Using them is highly
2461 discouraged, as the API may change in future releases.
2462
2463 EXAMPLES
2464 Reading a CSV file line by line:
2465 my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1 });
2466 open my $fh, "<", "file.csv" or die "file.csv: $!";
2467 while (my $row = $csv->getline ($fh)) {
2468 # do something with @$row
2469 }
2470 close $fh or die "file.csv: $!";
2471
2472 or
2473
2474 my $aoa = csv (in => "file.csv", on_in => sub {
2475 # do something with %_
2476 });
2477
2478 Reading only a single column
2479
2480 my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1 });
2481 open my $fh, "<", "file.csv" or die "file.csv: $!";
2482 # get only the 4th column
2483 my @column = map { $_->[3] } @{$csv->getline_all ($fh)};
2484 close $fh or die "file.csv: $!";
2485
2486 with "csv", you could do
2487
2488 my @column = map { $_->[0] }
2489 @{csv (in => "file.csv", fragment => "col=4")};
2490
2491 Parsing CSV strings:
2492 my $csv = Text::CSV_XS->new ({ keep_meta_info => 1, binary => 1 });
2493
2494 my $sample_input_string =
2495 qq{"I said, ""Hi!""",Yes,"",2.34,,"1.09","\x{20ac}",};
2496 if ($csv->parse ($sample_input_string)) {
2497 my @field = $csv->fields;
2498 foreach my $col (0 .. $#field) {
2499 my $quo = $csv->is_quoted ($col) ? $csv->{quote_char} : "";
2500 printf "%2d: %s%s%s\n", $col, $quo, $field[$col], $quo;
2501 }
2502 }
2503 else {
2504 print STDERR "parse () failed on argument: ",
2505 $csv->error_input, "\n";
2506 $csv->error_diag ();
2507 }
2508
2509 Parsing CSV from memory
2510
2511 Given a complete CSV data-set in scalar $data, generate a list of
2512 lists to represent the rows and fields
2513
2514 # The data
2515 my $data = join "\r\n" => map { join "," => 0 .. 5 } 0 .. 5;
2516
2517 # in a loop
2518 my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1 });
2519 open my $fh, "<", \$data;
2520 my @foo;
2521 while (my $row = $csv->getline ($fh)) {
2522 push @foo, $row;
2523 }
2524 close $fh;
2525
2526 # a single call
2527 my $foo = csv (in => \$data);
2528
2529 Printing CSV data
2530 The fast way: using "print"
2531
2532 An example for creating "CSV" files using the "print" method:
2533
2534 my $csv = Text::CSV_XS->new ({ binary => 1, eol => $/ });
2535 open my $fh, ">", "foo.csv" or die "foo.csv: $!";
2536 for (1 .. 10) {
2537 $csv->print ($fh, [ $_, "$_" ]) or $csv->error_diag;
2538 }
2539 close $fh or die "foo.csv: $!";
2540
2541 The slow way: using "combine" and "string"
2542
2543 or using the slower "combine" and "string" methods:
2544
2545 my $csv = Text::CSV_XS->new;
2546
2547 open my $csv_fh, ">", "hello.csv" or die "hello.csv: $!";
2548
2549 my @sample_input_fields = (
2550 'You said, "Hello!"', 5.67,
2551 '"Surely"', '', '3.14159');
2552 if ($csv->combine (@sample_input_fields)) {
2553 print $csv_fh $csv->string, "\n";
2554 }
2555 else {
2556 print "combine () failed on argument: ",
2557 $csv->error_input, "\n";
2558 }
2559 close $csv_fh or die "hello.csv: $!";
2560
2561 Generating CSV into memory
2562
2563 Format a data-set (@foo) into a scalar value in memory ($data):
2564
2565 # The data
2566 my @foo = map { [ 0 .. 5 ] } 0 .. 3;
2567
2568 # in a loop
2569 my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1, eol => "\r\n" });
2570 open my $fh, ">", \my $data;
2571 $csv->print ($fh, $_) for @foo;
2572 close $fh;
2573
2574 # a single call
2575 csv (in => \@foo, out => \my $data);
2576
2577 Rewriting CSV
2578 Rewrite "CSV" files with ";" as separator character to well-formed
2579 "CSV":
2580
2581 use Text::CSV_XS qw( csv );
2582 csv (in => csv (in => "bad.csv", sep_char => ";"), out => *STDOUT);
2583
2584 As "STDOUT" is now default in "csv", a one-liner converting a UTF-16
2585 CSV file with BOM and TAB-separation to valid UTF-8 CSV could be:
2586
2587 $ perl -C3 -MText::CSV_XS=csv -we\
2588 'csv(in=>"utf16tab.csv",encoding=>"utf16",sep=>"\t")' >utf8.csv
2589
2590 Dumping database tables to CSV
2591 Dumping a database table can be as simple as this (TIMTOWTDI):
2592
2593 my $dbh = DBI->connect (...);
2594 my $sql = "select * from foo";
2595
2596 # using your own loop
2597 open my $fh, ">", "foo.csv" or die "foo.csv: $!\n";
2598 my $csv = Text::CSV_XS->new ({ binary => 1, eol => "\r\n" });
2599 my $sth = $dbh->prepare ($sql); $sth->execute;
2600 $csv->print ($fh, $sth->{NAME_lc});
2601 while (my $row = $sth->fetch) {
2602 $csv->print ($fh, $row);
2603 }
2604
2605 # using the csv function, all in memory
2606 csv (out => "foo.csv", in => $dbh->selectall_arrayref ($sql));
2607
2608 # using the csv function, streaming with callbacks
2609 my $sth = $dbh->prepare ($sql); $sth->execute;
2610 csv (out => "foo.csv", in => sub { $sth->fetch });
2611 csv (out => "foo.csv", in => sub { $sth->fetchrow_hashref });
2612
2613 Note that this does not discriminate between "empty" values and NULL-
2614 values from the database, as both will be the same empty field in CSV.
2615 To enable distinction between the two, use "quote_empty".
2616
2617 csv (out => "foo.csv", in => sub { $sth->fetch }, quote_empty => 1);
2618
2619 If the database import utility supports special sequences to insert
2620 "NULL" values into the database, like MySQL/MariaDB supports "\N",
2621 use a filter or a map
2622
2623 csv (out => "foo.csv", in => sub { $sth->fetch },
2624 on_in => sub { $_ //= "\\N" for @{$_[1]} });
2625
2626 while (my $row = $sth->fetch) {
2627 $csv->print ($fh, [ map { $_ // "\\N" } @$row ]);
2628 }
2629
2630 Note that this will not work as expected when choosing the backslash
2631 ("\") as "escape_char", as that will cause the "\" to need to be
2632 escaped by yet another "\", which will cause the field to need
2633 quotation and thus end up as "\\N" instead of "\N". See also
2634 "undef_str".
2635
2636 csv (out => "foo.csv", in => sub { $sth->fetch }, undef_str => "\\N");
2637
2638 These special sequences are not recognized by Text::CSV_XS on parsing
2639 the CSV generated like this, but map and filter are your friends again
2640
2641 while (my $row = $csv->getline ($fh)) {
2642 $sth->execute (map { $_ eq "\\N" ? undef : $_ } @$row);
2643 }
2644
2645 csv (in => "foo.csv", filter => { 1 => sub {
2646 $sth->execute (map { $_ eq "\\N" ? undef : $_ } @{$_[1]}); 0; }});
2647
2648 Converting CSV to JSON
2649 use Text::CSV_XS qw( csv );
2650 use JSON; # or Cpanel::JSON::XS for better performance
2651
2652 # AoA (no header interpretation)
2653 say encode_json (csv (in => "file.csv"));
2654
2655 # AoH (convert to structures)
2656 say encode_json (csv (in => "file.csv", bom => 1));
2657
2658 Yes, it is that simple.
2659
2660 The examples folder
2661 For more extended examples, see the examples/ sub-directory [1] in the
2662 original distribution or the git repository [2].
2663
2664 [1] https://github.com/Tux/Text-CSV_XS/tree/master/examples
2665 [2] https://github.com/Tux/Text-CSV_XS
2666
2667 The following files can be found there:
2668
2669 parser-xs.pl
2670 This can be used as a boilerplate to parse invalid "CSV" and parse
2671 beyond (expected) errors, as an alternative to using the "error" callback.
2672
2673 $ perl examples/parser-xs.pl bad.csv >good.csv
2674
2675 csv-check
2676 This is a command-line tool that uses parser-xs.pl techniques to
2677 check the "CSV" file and report on its content.
2678
2679 $ csv-check files/utf8.csv
2680 Checked files/utf8.csv with csv-check 1.9
2681 using Text::CSV_XS 1.32 with perl 5.26.0 and Unicode 9.0.0
2682 OK: rows: 1, columns: 2
2683 sep = <,>, quo = <">, bin = <1>, eol = <"\n">
2684
2685 csv-split
2686 This command splits "CSV" files into smaller files, keeping (part
2687 of) the header. Options include maximum number of (data) rows per
2688 file and maximum number of columns per file or a combination of the
2689 two.
2690
2691 csv2xls
2692 A script to convert "CSV" to Microsoft Excel ("XLS"). This requires
2693 extra modules Date::Calc and Spreadsheet::WriteExcel. The converter
2694 accepts various options and can produce UTF-8 compliant Excel files.
2695
2696 csv2xlsx
2697 A script to convert "CSV" to Microsoft Excel ("XLSX"). This requires
2698 the modules Date::Calc and Excel::Writer::XLSX. The converter
2699 does accept various options including merging several "CSV" files
2700 into a single Excel file.
2701
2702 csvdiff
2703 A script that provides colorized diff on sorted CSV files, assuming
2704 first line is header and first field is the key. Output options
2705 include colorized ANSI escape codes or HTML.
2706
2707 $ csvdiff --html --output=diff.html file1.csv file2.csv
2708
2709 rewrite.pl
2710 A script to rewrite (in)valid CSV into valid CSV files. The script
2711 has options to generate confusing CSV files or CSV files that conform
2712 to Dutch MS-Excel exports (using ";" as separator).
2713
2714 By default, the script honors a BOM and auto-detects the separator,
2715 converting the input to standard CSV with "," as separator.
2716
2717 CAVEATS
2718 Text::CSV_XS is not designed to detect the characters used to quote
2719 and separate fields. The parsing is done using predefined (default)
2720 settings. In the examples sub-directory, you can find scripts that
2721 demonstrate how you could try to detect these characters yourself.
2722
2723 Microsoft Excel
2724 The import/export from Microsoft Excel is a risky task, according to
2725 the documentation in "Text::CSV::Separator". Microsoft uses the
2726 system's list separator defined in the regional settings, which happens
2727 to be a semicolon for Dutch, German and Spanish (and probably some
2728 others as well). For the English locale, the default is a comma.
2729 In Windows however, the user is free to choose a predefined locale,
2730 and then change every individual setting in it, so checking the
2731 locale is no solution.
2732
2733 As of version 1.17, a lone first line with just
2734
2735 sep=;
2736
2737 will be recognized and honored when parsing with "getline".
2738
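     An example of a file that would trigger this behavior (the contents
     are made up for illustration):

         sep=;
         code;name;price
         1;keyboard;12
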
2739 TODO
2740 More Errors & Warnings
2741 New extensions ought to be clear and concise in reporting what
2742 error has occurred where and why, and maybe also offer a remedy to
2743 the problem.
2744
2745 "error_diag" is a (very) good start, but there is more work to be
2746 done in this area.
2747
2748 Basic calls should croak or warn on illegal parameters. Errors
2749 should be documented.
2750
2751 setting meta info
2752 Future extensions might include extending the "meta_info",
2753 "is_quoted", and "is_binary" to accept setting these flags for
2754 fields, so you can specify which fields are quoted in the
2755 "combine"/"string" combination.
2756
2757 $csv->meta_info (0, 1, 1, 3, 0, 0);
2758 $csv->is_quoted (3, 1);
2759
2760 Metadata Vocabulary for Tabular Data
2761 <http://w3c.github.io/csvw/metadata/> (a W3C editor's draft) could be
2762 an example for supporting more metadata.
2763
2764 Parse the whole file at once
2765 Implement new methods or functions that enable parsing of a
2766 complete file at once, returning a list of hashes. Possible extension
2767 to this could be to enable a column selection on the call:
2768
2769 my @AoH = $csv->parse_file ($filename, { cols => [ 1, 4..8, 12 ]});
2770
2771 returning something like
2772
2773 [ { fields => [ 1, 2, "foo", 4.5, undef, "", 8 ],
2774 flags => [ ... ],
2775 },
2776 { fields => [ ... ],
2777 .
2778 },
2779 ]
2780
2781 Note that the "csv" function already supports most of this, but does
2782 not return flags. "getline_all" returns all rows for an open stream,
2783 but this will not return flags either. "fragment" can reduce the
2784 required rows or columns, but cannot combine them.
2785
2786 Cookbook
2787 Write a document that has recipes for most known non-standard (and
2788 maybe some standard) "CSV" formats, including formats that use
2789 "TAB", ";", "|", or other non-comma separators.
2790
2791 Examples could be taken from W3C's CSV on the Web: Use Cases and
2792 Requirements <http://w3c.github.io/csvw/use-cases-and-
2793 requirements/index.html>
2794
2795 Steal
2796 Steal good new ideas and features from PapaParse
2797 <http://papaparse.com> or csvkit <http://csvkit.readthedocs.org>.
2798
2799 Raku support
2800 Raku support can be found here <https://github.com/Tux/CSV>. The
2801 interface is richer in support than the Perl5 API, as Raku supports
2802 more types.
2803
2804 The Raku version does not (yet) support pure binary CSV datasets.
2805
2806 NOT TODO
2807 combined methods
2808 Requests for adding means (methods) that combine "combine" and
2809 "string" in a single call will not be honored (use "print" instead).
2810 Likewise for "parse" and "fields" (use "getline" instead), given the
2811 problems with embedded newlines.
2812
2813 Release plan
2814 No guarantees, but this is what I had in mind some time ago:
2815
2816 • DIAGNOSTICS section in pod to *describe* the errors (see below)
2817
2818 EBCDIC
2819 Everything should now work on native EBCDIC systems. As the test suite
2820 does not cover all possible codepoints and Encode does not support
2821 "utf-ebcdic", there is no guarantee that all handling of Unicode is
2822 done correctly.
2823
2824 Opening "EBCDIC" encoded files on "ASCII"+ systems is likely to
2825 succeed using Encode's "cp37", "cp1047", or "posix-bc":
2826
2827 open my $fh, "<:encoding(cp1047)", "ebcdic_file.csv" or die "...";
2828
2829 DIAGNOSTICS
2830 Still under construction ...
2831
2832 If an error occurs, "$csv->error_diag" can be used to get information
2833 on the cause of the failure. Note that for speed reasons the internal
2834 value is never cleared on success, so using the value returned by
2835 "error_diag" in normal cases - when no error occurred - may cause
2836 unexpected results.
2837
2838 If the constructor failed, the cause can be found using "error_diag" as
2839 a class method, like "Text::CSV_XS->error_diag".
2840
2841 The "$csv->error_diag" method is automatically invoked upon error when
2842 the constructor was called with "auto_diag" set to 1 or 2, or when
2843 autodie is in effect. When set to 1, this will cause a "warn" with the
2844 error message; when set to 2, it will "die". "2012 - EOF" is excluded
2845 from "auto_diag" reports.
2846
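     A minimal sketch of the difference between the two settings (the
     variable names are arbitrary):

         my $warn_on_error = Text::CSV_XS->new ({ auto_diag => 1 }); # warn and continue
         my $die_on_error  = Text::CSV_XS->new ({ auto_diag => 2 }); # die on first error
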
2847 Errors can be (individually) caught using the "error" callback.
2848
2849 The errors as described below are available. I have tried to make the
2850 error itself explanatory enough, but more descriptions will be added.
2851 For most of these errors, the first three capitals describe the error
2852 category:
2853
2854 • INI
2855
2856 Initialization error or option conflict.
2857
2858 • ECR
2859
2860 Carriage-Return related parse error.
2861
2862 • EOF
2863
2864 End-Of-File related parse error.
2865
2866 • EIQ
2867
2868 Parse error inside quotation.
2869
2870 • EIF
2871
2872 Parse error inside field.
2873
2874 • ECB
2875
2876 Combine error.
2877
2878 • EHR
2879
2880 HashRef parse related error.
2881
2882 And below should be the complete list of error codes that can be
2883 returned:
2884
2885 • 1001 "INI - sep_char is equal to quote_char or escape_char"
2886
2887 The separation character cannot be equal to the quotation
2888 character or to the escape character, as this would invalidate all
2889 parsing rules.
2890
2891 • 1002 "INI - allow_whitespace with escape_char or quote_char SP or
2892 TAB"
2893
2894 Using the "allow_whitespace" attribute when either "quote_char" or
2895 "escape_char" is equal to "SPACE" or "TAB" is too ambiguous to
2896 allow.
2897
2898 • 1003 "INI - \r or \n in main attr not allowed"
2899
2900 Using default "eol" characters in either "sep_char", "quote_char",
2901 or "escape_char" is not allowed.
2902
2903 • 1004 "INI - callbacks should be undef or a hashref"
2904
2905 The "callbacks" attribute only allows one to be "undef" or a hash
2906 reference.
2907
2908 • 1005 "INI - EOL too long"
2909
2910 The value passed for EOL is exceeding its maximum length (16).
2911
2912 • 1006 "INI - SEP too long"
2913
2914 The value passed for SEP is exceeding its maximum length (16).
2915
2916 • 1007 "INI - QUOTE too long"
2917
2918 The value passed for QUOTE is exceeding its maximum length (16).
2919
2920 • 1008 "INI - SEP undefined"
2921
2922 The value passed for SEP should be defined and not empty.
2923
2924 • 1010 "INI - the header is empty"
2925
2926 The header line parsed in the "header" is empty.
2927
2928 • 1011 "INI - the header contains more than one valid separator"
2929
2930 The header line parsed in the "header" contains more than one
2931 (unique) separator character out of the allowed set of separators.
2932
2933 • 1012 "INI - the header contains an empty field"
2934
2935 The header line parsed in the "header" contains an empty field.
2936
2937 • 1013 "INI - the header contains non-unique fields"
2938
2939 The header line parsed in the "header" contains at least two
2940 identical fields.
2941
2942 • 1014 "INI - header called on undefined stream"
2943
2944 The header line cannot be parsed from an undefined source.
2945
2946 • 1500 "PRM - Invalid/unsupported argument(s)"
2947
2948 Function or method called with invalid argument(s) or parameter(s).
2949
2950 • 1501 "PRM - The key attribute is passed as an unsupported type"
2951
2952 The "key" attribute is of an unsupported type.
2953
2954 • 1502 "PRM - The value attribute is passed without the key attribute"
2955
2956 The "value" attribute is only allowed when a valid key is given.
2957
2958 • 1503 "PRM - The value attribute is passed as an unsupported type"
2959
2960 The "value" attribute is of an unsupported type.
2961
2962 • 2010 "ECR - QUO char inside quotes followed by CR not part of EOL"
2963
2964 When "eol" has been set to anything but the default, like
2965 "\r\t\n", and the "\r" is following the second (closing)
2966 "quote_char", where the characters following the "\r" do not make up
2967 the "eol" sequence, this is an error.
2968
2969 • 2011 "ECR - Characters after end of quoted field"
2970
2971 Sequences like "1,foo,"bar"baz,22,1" are not allowed. "bar" is a
2972 quoted field and after the closing double-quote, there should be
2973 either a new-line sequence or a separation character.
2974
2975 • 2012 "EOF - End of data in parsing input stream"
2976
2977 Self-explaining. End-of-file while inside parsing a stream. Can
2978 happen only when reading from streams with "getline", as using
2979 "parse" is done on strings that are not required to have a trailing
2980 "eol".
2981
2982 • 2013 "INI - Specification error for fragments RFC7111"
2983
2984 Invalid specification for URI "fragment" specification.
2985
2986 • 2014 "ENF - Inconsistent number of fields"
2987
2988 Inconsistent number of fields under strict parsing.
2989
2990 • 2021 "EIQ - NL char inside quotes, binary off"
2991
2992 Sequences like "1,"foo\nbar",22,1" are allowed only when the binary
2993 option has been selected with the constructor.
2994
2995 • 2022 "EIQ - CR char inside quotes, binary off"
2996
2997 Sequences like "1,"foo\rbar",22,1" are allowed only when the binary
2998 option has been selected with the constructor.
2999
3000 • 2023 "EIQ - QUO character not allowed"
3001
3002 Sequences like ""foo "bar" baz",qu" and "2023,",2008-04-05,"Foo,
3003 Bar",\n" will cause this error.
3004
3005 • 2024 "EIQ - EOF cannot be escaped, not even inside quotes"
3006
3007 The escape character is not allowed as last character in an input
3008 stream.
3009
3010 • 2025 "EIQ - Loose unescaped escape"
3011
3012 An escape character should escape only characters that need escaping.
3013
3014 Allowing the escape for other characters is possible with the
3015 attribute "allow_loose_escapes".
3016
3017 • 2026 "EIQ - Binary character inside quoted field, binary off"
3018
3019 Binary characters are not allowed by default. Exceptions are
3020 fields that contain valid UTF-8, which will automatically be
3021 upgraded. Set "binary" to 1 to accept binary
3022 data.
3023
3024 • 2027 "EIQ - Quoted field not terminated"
3025
3026 When parsing a field that started with a quotation character, the
3027 field is expected to be closed with a quotation character. When the
3028 parsed line is exhausted before the quote is found, that field is not
3029 terminated.
3030
3031 • 2030 "EIF - NL char inside unquoted verbatim, binary off"
3032
3033 • 2031 "EIF - CR char is first char of field, not part of EOL"
3034
3035 • 2032 "EIF - CR char inside unquoted, not part of EOL"
3036
3037 • 2034 "EIF - Loose unescaped quote"
3038
3039 • 2035 "EIF - Escaped EOF in unquoted field"
3040
3041 • 2036 "EIF - ESC error"
3042
3043 • 2037 "EIF - Binary character in unquoted field, binary off"
3044
3045 • 2110 "ECB - Binary character in Combine, binary off"
3046
3047 • 2200 "EIO - print to IO failed. See errno"
3048
3049 • 3001 "EHR - Unsupported syntax for column_names ()"
3050
3051 • 3002 "EHR - getline_hr () called before column_names ()"
3052
3053 • 3003 "EHR - bind_columns () and column_names () fields count
3054 mismatch"
3055
3056 • 3004 "EHR - bind_columns () only accepts refs to scalars"
3057
3058 • 3006 "EHR - bind_columns () did not pass enough refs for parsed
3059 fields"
3060
3061 • 3007 "EHR - bind_columns needs refs to writable scalars"
3062
3063 • 3008 "EHR - unexpected error in bound fields"
3064
3065 • 3009 "EHR - print_hr () called before column_names ()"
3066
3067 • 3010 "EHR - print_hr () called with invalid arguments"
3068
3069 SEE ALSO
3070 IO::File, IO::Handle, IO::Wrap, Text::CSV, Text::CSV_PP,
3071 Text::CSV::Encoded, Text::CSV::Separator, Text::CSV::Slurp,
3072 Spreadsheet::CSV and Spreadsheet::Read, and of course perl.
3073
3074 If you are using Raku, have a look at "Text::CSV" in the Raku
3075 ecosystem, offering the same features.
3076
3077 non-perl
3078
3079 A CSV parser in JavaScript, also used by W3C <http://www.w3.org>, is
3080 the multi-threaded in-browser PapaParse <http://papaparse.com/>.
3081
3082 csvkit <http://csvkit.readthedocs.org> is a python CSV parsing toolkit.
3083
3084 AUTHORS and MAINTAINERS
3085 Alan Citterman <alan@mfgrtl.com> wrote the original Perl module.
3086 Please don't send mail concerning Text::CSV_XS to Alan, who is not
3087 involved in the C/XS part that is now the main part of the module.
3088
3089 Jochen Wiedmann <joe@ispsoft.de> rewrote the en- and decoding in C by
3090 implementing a simple finite-state machine. He added variable quote,
3091 escape and separator characters, the binary mode and the print and
3092 getline methods. See ChangeLog releases 0.10 through 0.23.
3093
3094 H.Merijn Brand <h.m.brand@xs4all.nl> cleaned up the code, added the
3095 field flags methods, wrote the major part of the test suite, completed
3096 the documentation, fixed most RT bugs, added all the allow flags and
3097 the "csv" function. See ChangeLog releases 0.25 and on.
3098
3099 COPYRIGHT AND LICENSE
3100 Copyright (C) 2007-2022 H.Merijn Brand. All rights reserved.
3101 Copyright (C) 1998-2001 Jochen Wiedmann. All rights reserved.
3102 Copyright (C) 1997 Alan Citterman. All rights reserved.
3103
3104 This library is free software; you can redistribute and/or modify it
3105 under the same terms as Perl itself.
3106
3107
3108
3109perl v5.36.0 2022-07-22 CSV_XS(3)