CSV_XS(3)             User Contributed Perl Documentation            CSV_XS(3)



NAME
    Text::CSV_XS - comma-separated values manipulation routines

SYNOPSIS
        # Functional interface
        use Text::CSV_XS qw( csv );

        # Read whole file in memory
        my $aoa = csv (in => "data.csv");                 # as array of array
        my $aoh = csv (in => "data.csv",
                       headers => "auto");                # as array of hash

        # Write array of arrays as csv file
        csv (in => $aoa, out => "file.csv", sep_char => ";");

        # Only show lines where "code" is odd
        csv (in => "data.csv", filter => { code => sub { $_ % 2 }});


        # Object interface
        use Text::CSV_XS;

        my @rows;
        # Read/parse CSV
        my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1 });
        open my $fh, "<:encoding(utf8)", "test.csv" or die "test.csv: $!";
        while (my $row = $csv->getline ($fh)) {
            $row->[2] =~ m/pattern/ or next; # 3rd field should match
            push @rows, $row;
            }
        close $fh;

        # and write as CSV
        open $fh, ">:encoding(utf8)", "new.csv" or die "new.csv: $!";
        $csv->say ($fh, $_) for @rows;
        close $fh or die "new.csv: $!";

DESCRIPTION
    Text::CSV_XS provides facilities for the composition and decomposition
    of comma-separated values.  An instance of the Text::CSV_XS class will
    combine fields into a "CSV" string and parse a "CSV" string into
    fields.

    The module accepts either strings or files as input and supports the
    use of user-specified characters for delimiters, separators, and
    escapes.

  Embedded newlines
    Important Note: The default behavior is to accept only ASCII
    characters in the range from 0x20 (space) to 0x7E (tilde).  This means
    that the fields can not contain newlines.  If your data contains
    newlines embedded in fields, or characters above 0x7E (tilde), or
    binary data, you must set "binary => 1" in the call to "new".  To
    cover the widest range of parsing options, you will always want to set
    binary.

    But you still have the problem that you have to pass a correct line to
    the "parse" method, which is more complicated than the usual point of
    usage suggests:

        my $csv = Text::CSV_XS->new ({ binary => 1, eol => $/ });
        while (<>) {            # WRONG!
            $csv->parse ($_);
            my @fields = $csv->fields ();
            }

    This will break, as the "while" might read broken lines: it does not
    care about the quoting.  If you need to support embedded newlines, the
    way to go is to not pass "eol" in the parser (it accepts "\n", "\r",
    and "\r\n" by default) and then

        my $csv = Text::CSV_XS->new ({ binary => 1 });
        open my $fh, "<", $file or die "$file: $!";
        while (my $row = $csv->getline ($fh)) {
            my @fields = @$row;
            }

    The old(er) way of using global file handles is still supported

        while (my $row = $csv->getline (*ARGV)) { ... }

  Unicode
    Unicode is only tested to work with perl-5.8.2 and up.

    See also "BOM".

    The simplest way to ensure the correct encoding is used for in- and
    output is by either setting layers on the filehandles, or setting the
    "encoding" argument for "csv".

        open my $fh, "<:encoding(UTF-8)", "in.csv"  or die "in.csv: $!";
       or
        my $aoa = csv (in => "in.csv", encoding => "UTF-8");

        open my $fh, ">:encoding(UTF-8)", "out.csv" or die "out.csv: $!";
       or
        csv (in => $aoa, out => "out.csv", encoding => "UTF-8");

    On parsing (both for "getline" and "parse"), if the source is marked
    being UTF8, then all fields that are marked binary will also be marked
    UTF8.

    On combining ("print" and "combine"): if any of the combining fields
    was marked UTF8, the resulting string will be marked as UTF8.  Note
    however that if fields before the first field marked UTF8 contain
    8-bit characters that were not upgraded to UTF8, those will be
    "bytes" in the resulting string too, possibly causing unexpected
    errors.  If you pass data of different encoding, or you don't know if
    there is different encoding, force it to be upgraded before you pass
    them on:

        $csv->print ($fh, [ map { utf8::upgrade (my $x = $_); $x } @data ]);

    For complete control over encoding, please use Text::CSV::Encoded:

        use Text::CSV::Encoded;
        my $csv = Text::CSV::Encoded->new ({
            encoding_in  => "iso-8859-1", # the encoding comes into   Perl
            encoding_out => "cp1252",     # the encoding comes out of Perl
            });

        $csv = Text::CSV::Encoded->new ({ encoding => "utf8" });
        # combine () and print () accept *literally* utf8 encoded data
        # parse () and getline () return *literally* utf8 encoded data

        $csv = Text::CSV::Encoded->new ({ encoding => undef }); # default
        # combine () and print () accept UTF8 marked data
        # parse () and getline () return UTF8 marked data

  BOM
    BOM (or Byte Order Mark) handling is available only inside the
    "header" method.  This method supports the following encodings:
    "utf-8", "utf-1", "utf-32be", "utf-32le", "utf-16be", "utf-16le",
    "utf-ebcdic", "scsu", "bocu-1", and "gb-18030".  See Wikipedia
    <https://en.wikipedia.org/wiki/Byte_order_mark>.

    If a file has a BOM, the easiest way to deal with that is

        my $aoh = csv (in => $file, detect_bom => 1);

    All records will be encoded based on the detected BOM.

    This implies a call to the "header" method, which defaults to also set
    the "column_names".  So this is not the same as

        my $aoh = csv (in => $file, headers => "auto");

    which only reads the first record to set "column_names", but ignores
    any meaning of a possibly present BOM.

SPECIFICATION
    While no formal specification for CSV exists, RFC 4180
    <http://tools.ietf.org/html/rfc4180> (1) describes the common format
    and establishes "text/csv" as the MIME type registered with the IANA.
    RFC 7111 <http://tools.ietf.org/html/rfc7111> (2) adds fragments to
    CSV.

    Many informal documents exist that describe the "CSV" format.  "How
    To: The Comma Separated Value (CSV) File Format"
    <http://www.creativyst.com/Doc/Articles/CSV/CSV01.htm> (3) provides an
    overview of the "CSV" format in the most widely used applications and
    explains how it can best be used and supported.

        1) http://tools.ietf.org/html/rfc4180
        2) http://tools.ietf.org/html/rfc7111
        3) http://www.creativyst.com/Doc/Articles/CSV/CSV01.htm

    The basic rules are as follows:

    CSV is a delimited data format that has fields/columns separated by
    the comma character and records/rows separated by newlines.  Fields
    that contain a special character (comma, newline, or double quote)
    must be enclosed in double quotes.  However, if a line contains a
    single entry that is the empty string, it may be enclosed in double
    quotes.  If a field's value contains a double quote character it is
    escaped by placing another double quote character next to it.  The
    "CSV" file format does not require a specific character encoding, byte
    order, or line terminator format.

    ·   Each record is a single line ended by a line feed
        (ASCII/"LF"=0x0A) or a carriage return and line feed pair
        (ASCII/"CRLF"="0x0D 0x0A"), however, line-breaks may be embedded.

    ·   Fields are separated by commas.

    ·   Allowable characters within a "CSV" field include 0x09 ("TAB") and
        the inclusive range of 0x20 (space) through 0x7E (tilde).  In
        binary mode all characters are accepted, at least in quoted
        fields.

    ·   A field within "CSV" must be surrounded by double-quotes to
        contain a separator character (comma).

    Though this is the most clear and restrictive definition, Text::CSV_XS
    is way more liberal than this, and allows extension:

    ·   Line termination by a single carriage return is accepted by
        default.

    ·   The separation-, quotation-, and escape-characters can be any
        ASCII character in the range from 0x20 (space) to 0x7E (tilde).
        Characters outside this range may or may not work as expected.
        Multibyte characters, like UTF "U+060C" (ARABIC COMMA), "U+FF0C"
        (FULLWIDTH COMMA), "U+241B" (SYMBOL FOR ESCAPE), "U+2424" (SYMBOL
        FOR NEWLINE), "U+FF02" (FULLWIDTH QUOTATION MARK), and "U+201C"
        (LEFT DOUBLE QUOTATION MARK) (to give some examples of what might
        look promising) work for newer versions of perl for "sep_char" and
        "quote_char", but not for "escape_char".

        If you use perl-5.8.2 or higher, these three attributes are
        utf8-decoded, to increase the likelihood of success.  This way
        "U+00FE" will be allowed as a quote character.

    ·   A field in "CSV" must be surrounded by double-quotes to make an
        embedded double-quote, represented by a pair of consecutive
        double-quotes, valid.  In binary mode you may additionally use the
        sequence ""0" for representation of a NULL byte.  Using 0x00 in
        binary mode is just as valid.

    ·   Several violations of the above specification may be lifted by
        passing some options as attributes to the object constructor.

METHODS
  version
    (Class method) Returns the current module version.

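A minimal usage sketch (assuming the module is installed); being a class method, no instance is needed, and the printed value depends on the installed release:

```perl
use strict;
use warnings;
use Text::CSV_XS;

# Class method call: report which release of the module is in use.
printf "Using Text::CSV_XS %s\n", Text::CSV_XS->version;
```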
  new
    (Class method) Returns a new instance of class Text::CSV_XS.  The
    attributes are described by the (optional) hash ref "\%attr".

        my $csv = Text::CSV_XS->new ({ attributes ... });

    The following attributes are available:

    eol
            my $csv = Text::CSV_XS->new ({ eol => $/ });
            $csv->eol (undef);
            my $eol = $csv->eol;

        The end-of-line string to add to rows for "print" or the record
        separator for "getline".

        When not passed in a parser instance, the default behavior is to
        accept "\n", "\r", and "\r\n", so it is probably safer to not
        specify "eol" at all.  Passing "undef" or the empty string behave
        the same.

        When not passed in a generating instance, records are not
        terminated at all, so it is probably wise to pass something you
        expect.  A safe choice for "eol" on output is either $/ or
        "\r\n".

        Common values for "eol" are "\012" ("\n" or Line Feed),
        "\015\012" ("\r\n" or Carriage Return, Line Feed), and "\015"
        ("\r" or Carriage Return).  The "eol" attribute cannot exceed 7
        (ASCII) characters.

        If both $/ and "eol" equal "\015", lines that end on only a
        Carriage Return without Line Feed will be parsed correctly.

    sep_char
            my $csv = Text::CSV_XS->new ({ sep_char => ";" });
            $csv->sep_char (";");
            my $c = $csv->sep_char;

        The character used to separate fields, by default a comma (",").
        Limited to a single-byte character, usually in the range from
        0x20 (space) to 0x7E (tilde).  When longer sequences are
        required, use "sep".

        The separation character can not be equal to the quote character
        or to the escape character.

        See also "CAVEATS"

    sep
            my $csv = Text::CSV_XS->new ({ sep => "\N{FULLWIDTH COMMA}" });
            $csv->sep (";");
            my $sep = $csv->sep;

        The chars used to separate fields, by default undefined.  Limited
        to 8 bytes.

        When set, overrules "sep_char".  If its length is one byte it
        acts as an alias to "sep_char".

        See also "CAVEATS"

    quote_char
            my $csv = Text::CSV_XS->new ({ quote_char => "'" });
            $csv->quote_char (undef);
            my $c = $csv->quote_char;

        The character to quote fields containing blanks or binary data,
        by default the double quote character (""").  A value of undef
        suppresses quote chars (for simple cases only).  Limited to a
        single-byte character, usually in the range from 0x20 (space) to
        0x7E (tilde).  When longer sequences are required, use "quote".

        "quote_char" can not be equal to "sep_char".

    quote
            my $csv = Text::CSV_XS->new ({ quote => "\N{FULLWIDTH QUOTATION MARK}" });
            $csv->quote ("'");
            my $quote = $csv->quote;

        The chars used to quote fields, by default undefined.  Limited to
        8 bytes.

        When set, overrules "quote_char".  If its length is one byte it
        acts as an alias to "quote_char".

        This method does not support "undef".  Use "quote_char" to
        disable quotation.

        See also "CAVEATS"

    escape_char
            my $csv = Text::CSV_XS->new ({ escape_char => "\\" });
            $csv->escape_char (":");
            my $c = $csv->escape_char;

        The character to escape certain characters inside quoted fields.
        This is limited to a single-byte character, usually in the range
        from 0x20 (space) to 0x7E (tilde).

        The "escape_char" defaults to being the double-quote mark (""").
        In other words the same as the default "quote_char".  This means
        that doubling the quote mark in a field escapes it:

            "foo","bar","Escape ""quote mark"" with two ""quote marks""","baz"

        If you change the "quote_char" without changing the
        "escape_char", the "escape_char" will still be the double-quote
        (""").  If instead you want to escape the "quote_char" by
        doubling it, you will need to also change the "escape_char" to be
        the same as what you have changed the "quote_char" to.

        Setting "escape_char" to "undef" or "" will disable escaping
        completely and is greatly discouraged.  This will also disable
        "escape_null".

        The escape character can not be equal to the separation
        character.

    binary
            my $csv = Text::CSV_XS->new ({ binary => 1 });
            $csv->binary (0);
            my $f = $csv->binary;

        If this attribute is 1, you may use binary characters in quoted
        fields, including line feeds, carriage returns and "NULL" bytes.
        (The latter could be escaped as ""0".)  By default this feature
        is off.

        If a string is marked UTF8, "binary" will be turned on
        automatically when binary characters other than "CR" and "NL" are
        encountered.  Note that a simple string like "\x{00a0}" might
        still be binary, but not marked UTF8, so setting
        "{ binary => 1 }" is still a wise option.

    strict
            my $csv = Text::CSV_XS->new ({ strict => 1 });
            $csv->strict (0);
            my $f = $csv->strict;

        If this attribute is set to 1, any row that parses to a different
        number of fields than the previous row will cause the parser to
        throw error 2014.

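A short sketch of the effect, using an in-memory file handle; the sample data is made up for illustration, and "auto_diag => 2" is used so the error can be caught with "eval":

```perl
use strict;
use warnings;
use Text::CSV_XS;

# strict => 1: rows must keep the field count of the previous row.
my $csv = Text::CSV_XS->new ({ strict => 1, auto_diag => 2 });
open my $fh, "<", \"a,b,c\n1,2\n" or die $!;
my $ok_row  = $csv->getline ($fh);          # 3 fields, accepted
my $bad_row = eval { $csv->getline ($fh) }; # 2 fields, raises error 2014
defined $bad_row or warn "row rejected: " . $csv->error_diag . "\n";
```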
    formula_handling

    formula
            my $csv = Text::CSV_XS->new ({ formula => "none" });
            $csv->formula ("none");
            my $f = $csv->formula;

        This defines the behavior of fields containing formulas.  As
        formulas are considered dangerous in spreadsheets, this attribute
        can define an optional action to be taken if a field starts with
        an equal sign ("=").

        For purpose of code-readability, this can also be written as

            my $csv = Text::CSV_XS->new ({ formula_handling => "none" });
            $csv->formula_handling ("none");
            my $f = $csv->formula_handling;

        Possible values for this attribute are

        none
            Take no specific action.  This is the default.

                $csv->formula ("none");

        die
            Cause the process to "die" whenever a leading "=" is
            encountered.

                $csv->formula ("die");

        croak
            Cause the process to "croak" whenever a leading "=" is
            encountered.  (See Carp)

                $csv->formula ("croak");

        diag
            Report position and content of the field whenever a leading
            "=" is found.  The value of the field is unchanged.

                $csv->formula ("diag");

        empty
            Replace the content of fields that start with a "=" with the
            empty string.

                $csv->formula ("empty");
                $csv->formula ("");

        undef
            Replace the content of fields that start with a "=" with
            "undef".

                $csv->formula ("undef");
                $csv->formula (undef);

        a callback
            Modify the content of fields that start with a "=" with the
            return-value of the callback.  The original content of the
            field is available inside the callback as $_;

                # Replace all formulas with 42
                $csv->formula (sub { 42; });

                # same as $csv->formula ("empty") but slower
                $csv->formula (sub { "" });

                # Allow =4+12
                $csv->formula (sub { s/^=(\d+\+\d+)$/$1/eer });

                # Allow more complex calculations
                $csv->formula (sub { eval { s{^=([-+*/0-9()]+)$}{$1}ee }; $_ });

        All other values will give a warning and then fall back to
        "diag".

    decode_utf8
            my $csv = Text::CSV_XS->new ({ decode_utf8 => 1 });
            $csv->decode_utf8 (0);
            my $f = $csv->decode_utf8;

        This attribute defaults to TRUE.

        While parsing, fields that are valid UTF-8 are automatically set
        to be UTF-8, so that

            $csv->parse ("\xC4\xA8\n");

        results in

            PV("\304\250"\0) [UTF8 "\x{128}"]

        Sometimes this is not the desired behavior.  To prevent those
        upgrades, set this attribute to false, and the result will be

            PV("\304\250"\0)

    auto_diag
            my $csv = Text::CSV_XS->new ({ auto_diag => 1 });
            $csv->auto_diag (2);
            my $l = $csv->auto_diag;

        Setting this attribute to a number between 1 and 9 causes
        "error_diag" to be automatically called in void context upon
        errors.

        In case of error "2012 - EOF", this call will be void.

        If "auto_diag" is set to a numeric value greater than 1, it will
        "die" on errors instead of "warn".  If set to anything
        unrecognized, it will be silently ignored.

        Future extensions to this feature will include more reliable
        auto-detection of "autodie" being active in the scope in which
        the error occurred, which will increment the value of "auto_diag"
        by 1 the moment the error is detected.

    diag_verbose
            my $csv = Text::CSV_XS->new ({ diag_verbose => 1 });
            $csv->diag_verbose (2);
            my $l = $csv->diag_verbose;

        Set the verbosity of the output triggered by "auto_diag".
        Currently only adds the current input-record-number (if known) to
        the diagnostic output with an indication of the position of the
        error.

    blank_is_undef
            my $csv = Text::CSV_XS->new ({ blank_is_undef => 1 });
            $csv->blank_is_undef (0);
            my $f = $csv->blank_is_undef;

        Under normal circumstances, "CSV" data makes no distinction
        between quoted and unquoted empty fields.  These both end up in
        an empty string field once read, thus

            1,"",," ",2

        is read as

            ("1", "", "", " ", "2")

        When writing "CSV" files with either "always_quote" or
        "quote_empty" set, the unquoted empty field is the result of an
        undefined value.  To enable this distinction when reading "CSV"
        data, the "blank_is_undef" attribute will cause unquoted empty
        fields to be set to "undef", causing the above to be parsed as

            ("1", "", undef, " ", "2")

        Note that this is specifically important when loading "CSV"
        fields into a database that allows "NULL" values, as the perl
        equivalent for "NULL" is "undef" in DBI land.

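As an illustration of that last point, a hypothetical loader sketch; the DSN, table, and column names are invented, and it assumes DBI with a configured driver:

```perl
use strict;
use warnings;
use DBI;
use Text::CSV_XS;

my $csv = Text::CSV_XS->new ({ binary => 1, blank_is_undef => 1 });
my $dbh = DBI->connect ("dbi:SQLite:dbname=test.db", "", "",
                        { RaiseError => 1 });
my $sth = $dbh->prepare ("insert into tbl (id, code, name) values (?, ?, ?)");

open my $fh, "<", "data.csv" or die "data.csv: $!";
while (my $row = $csv->getline ($fh)) {
    # unquoted empty fields arrive as undef and are bound as SQL NULL
    $sth->execute (@$row);
    }
close $fh;
```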
    empty_is_undef
            my $csv = Text::CSV_XS->new ({ empty_is_undef => 1 });
            $csv->empty_is_undef (0);
            my $f = $csv->empty_is_undef;

        Going one step further than "blank_is_undef", this attribute
        converts all empty fields to "undef", so

            1,"",," ",2

        is read as

            (1, undef, undef, " ", 2)

        Note that this affects only fields that are originally empty, not
        fields that are empty after stripping allowed whitespace.  YMMV.

    allow_whitespace
            my $csv = Text::CSV_XS->new ({ allow_whitespace => 1 });
            $csv->allow_whitespace (0);
            my $f = $csv->allow_whitespace;

        When this option is set to true, the whitespace ("TAB"'s and
        "SPACE"'s) surrounding the separation character is removed when
        parsing.  If either "TAB" or "SPACE" is one of the three
        characters "sep_char", "quote_char", or "escape_char" it will not
        be considered whitespace.

        Now lines like:

            1 , "foo" , bar , 3 , zapp

        are parsed as valid "CSV", even though it violates the "CSV"
        specs.

        Note that all whitespace is stripped from both start and end of
        each field.  That makes it more than a feature to enable parsing
        bad "CSV" lines: it also alters lines that were perfectly
        acceptable "CSV" to begin with, as

            1, 2.0, 3, ape , monkey

        will now be parsed as

            ("1", "2.0", "3", "ape", "monkey")

    allow_loose_quotes
            my $csv = Text::CSV_XS->new ({ allow_loose_quotes => 1 });
            $csv->allow_loose_quotes (0);
            my $f = $csv->allow_loose_quotes;

        By default, parsing unquoted fields containing "quote_char"
        characters like

            1,foo "bar" baz,42

        would result in parse error 2034.  Though it is still bad
        practice to allow this format, we cannot help the fact that some
        vendors make their applications spit out lines styled this way.

        If there is really bad "CSV" data, like

            1,"foo "bar" baz",42

        or

            1,""foo bar baz"",42

        there is a way to get this data-line parsed and leave the quotes
        inside the quoted field as-is.  This can be achieved by setting
        "allow_loose_quotes" AND making sure that the "escape_char" is
        not equal to "quote_char".

    allow_loose_escapes
            my $csv = Text::CSV_XS->new ({ allow_loose_escapes => 1 });
            $csv->allow_loose_escapes (0);
            my $f = $csv->allow_loose_escapes;

        Parsing fields that have "escape_char" characters that escape
        characters that do not need to be escaped, like:

            my $csv = Text::CSV_XS->new ({ escape_char => "\\" });
            $csv->parse (qq{1,"my bar\'s",baz,42});

        would result in parse error 2025.  Though it is bad practice to
        allow this format, this attribute enables you to treat all escape
        character sequences equally.

    allow_unquoted_escape
            my $csv = Text::CSV_XS->new ({ allow_unquoted_escape => 1 });
            $csv->allow_unquoted_escape (0);
            my $f = $csv->allow_unquoted_escape;

        A backward compatibility issue where "escape_char" differs from
        "quote_char" prevents "escape_char" from being in the first
        position of a field.  If "quote_char" is equal to the default """
        and "escape_char" is set to "\", this would be illegal:

            1,\0,2

        Setting this attribute to 1 might help to overcome issues with
        backward compatibility and allow this style.

    always_quote
            my $csv = Text::CSV_XS->new ({ always_quote => 1 });
            $csv->always_quote (0);
            my $f = $csv->always_quote;

        By default the generated fields are quoted only if they need to
        be.  For example, if they contain the separator character.  If
        you set this attribute to 1 then all defined fields will be
        quoted.  ("undef" fields are not quoted, see "blank_is_undef").
        This quite often makes it easier to handle exported data in
        external applications.  (Poor creatures who would be better off
        using Text::CSV_XS. :)

    quote_space
            my $csv = Text::CSV_XS->new ({ quote_space => 1 });
            $csv->quote_space (0);
            my $f = $csv->quote_space;

        By default, a space in a field would trigger quotation.  As no
        rule exists that requires this in "CSV", nor any against it, the
        default is true for safety.  You can exclude the space from this
        trigger by setting this attribute to 0.

    quote_empty
            my $csv = Text::CSV_XS->new ({ quote_empty => 1 });
            $csv->quote_empty (0);
            my $f = $csv->quote_empty;

        By default the generated fields are quoted only if they need to
        be.  An empty (defined) field does not need quotation.  If you
        set this attribute to 1 then empty defined fields will be quoted.
        ("undef" fields are not quoted, see "blank_is_undef").  See also
        "always_quote".

    quote_binary
            my $csv = Text::CSV_XS->new ({ quote_binary => 1 });
            $csv->quote_binary (0);
            my $f = $csv->quote_binary;

        By default, all "unsafe" bytes inside a string cause the combined
        field to be quoted.  By setting this attribute to 0, you can
        disable that trigger for bytes >= 0x7F.

    escape_null
            my $csv = Text::CSV_XS->new ({ escape_null => 1 });
            $csv->escape_null (0);
            my $f = $csv->escape_null;

        By default, a "NULL" byte in a field would be escaped.  This
        option enables you to treat the "NULL" byte as a simple binary
        character in binary mode (when "{ binary => 1 }" is set).  The
        default is true.  You can prevent "NULL" escapes by setting this
        attribute to 0.

        When the "escape_char" attribute is set to undefined, this
        attribute will be set to false.

        The default setting will encode "=\x00=" as

            "="0="

        With "escape_null" set to false, this will result in

            "=\x00="

        The default when using the "csv" function is "false".

        For backward compatibility reasons, the deprecated old name
        "quote_null" is still recognized.

    keep_meta_info
            my $csv = Text::CSV_XS->new ({ keep_meta_info => 1 });
            $csv->keep_meta_info (0);
            my $f = $csv->keep_meta_info;

        By default, the parsing of input records is as simple and fast as
        possible.  However, some parsing information - like quotation of
        the original field - is lost in that process.  Setting this flag
        to true enables retrieving that information after parsing with
        the methods "meta_info", "is_quoted", and "is_binary" described
        below.  Default is false for performance.

        If you set this attribute to a value greater than 9, then you can
        control output quotation style like it was used in the input of
        the last parsed record (unless quotation was added because of
        other reasons).

            my $csv = Text::CSV_XS->new ({
                binary         => 1,
                keep_meta_info => 1,
                quote_space    => 0,
                });

            $csv->parse (q{1,,"", ," ",f,"g","h""h",help,"help"});
            my @row = $csv->fields;

            $csv->print (*STDOUT, \@row);
            # 1,,, , ,f,g,"h""h",help,help
            $csv->keep_meta_info (11);
            $csv->print (*STDOUT, \@row);
            # 1,,"", ," ",f,"g","h""h",help,"help"

    undef_str
            my $csv = Text::CSV_XS->new ({ undef_str => "\\N" });
            $csv->undef_str (undef);
            my $s = $csv->undef_str;

        This attribute optionally defines the output of undefined fields.
        The value passed is not changed at all, so if it needs quotation,
        the quotation needs to be included in the value of the attribute.
        Use with caution, as passing a value like ",",,,,""" will for
        sure mess up your output.  The default for this attribute is
        "undef", meaning no special treatment.

        This attribute is useful when exporting CSV data to be imported
        in custom loaders, like for MySQL, that recognize special
        sequences for "NULL" data.

        This attribute has no meaning when parsing CSV data.

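A small sketch of the effect on output; the "\N" sequence follows the convention recognized by e.g. MySQL's LOAD DATA:

```perl
use strict;
use warnings;
use Text::CSV_XS;

# Undefined fields are emitted verbatim as \N, unquoted.
my $csv = Text::CSV_XS->new ({ undef_str => "\\N" });
$csv->combine (1, undef, "foo") or die $csv->error_diag;
print $csv->string, "\n";   # 1,\N,foo
```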
    verbatim
            my $csv = Text::CSV_XS->new ({ verbatim => 1 });
            $csv->verbatim (0);
            my $f = $csv->verbatim;

        This is a quite controversial attribute to set, but makes some
        hard things possible.

        The rationale behind this attribute is to tell the parser that
        the normally special characters newline ("NL") and Carriage
        Return ("CR") will not be special when this flag is set, and be
        dealt with as being ordinary binary characters.  This will ease
        working with data with embedded newlines.

        When "verbatim" is used with "getline", "getline" auto-"chomp"'s
        every line.

        Imagine a file format like

            M^^Hans^Janssen^Klas 2\n2A^Ja^11-06-2007#\r\n

        where the line ending is a very specific "#\r\n", and the
        sep_char is a "^" (caret).  None of the fields is quoted, but
        embedded binary data is likely to be present.  With the specific
        line ending, this should not be too hard to detect.

        By default, Text::CSV_XS' parse function is instructed to know
        only "\n" and "\r" as legal line endings, and so has to deal with
        the embedded newline as a real "end-of-line", so it can scan the
        next line if binary is true and the newline is inside a quoted
        field.  With this option, we tell "parse" to treat "\n" as
        nothing more than a binary character.

        For "parse" this means that the parser has no more idea about
        line ending and "getline" "chomp"s line endings on reading.

    types
        A set of column types; the attribute is immediately passed to the
        "types" method.

    callbacks
        See the "Callbacks" section below.

  accessors
    To sum it up,

        $csv = Text::CSV_XS->new ();

    is equivalent to

        $csv = Text::CSV_XS->new ({
            eol                   => undef, # \r, \n, or \r\n
            sep_char              => ',',
            sep                   => undef,
            quote_char            => '"',
            quote                 => undef,
            escape_char           => '"',
            binary                => 0,
            decode_utf8           => 1,
            auto_diag             => 0,
            diag_verbose          => 0,
            blank_is_undef        => 0,
            empty_is_undef        => 0,
            allow_whitespace      => 0,
            allow_loose_quotes    => 0,
            allow_loose_escapes   => 0,
            allow_unquoted_escape => 0,
            always_quote          => 0,
            quote_empty           => 0,
            quote_space           => 1,
            escape_null           => 1,
            quote_binary          => 1,
            keep_meta_info        => 0,
            strict                => 0,
            formula               => 0,
            verbatim              => 0,
            undef_str             => undef,
            types                 => undef,
            callbacks             => undef,
            });

    For all of the above mentioned flags, an accessor method is available
    where you can inquire the current value, or change the value

        my $quote = $csv->quote_char;
        $csv->binary (1);

    It is not wise to change these settings halfway through writing "CSV"
    data to a stream.  If however you want to create a new stream using
    the available "CSV" object, there is no harm in changing them.

    If the "new" constructor call fails, it returns "undef", and makes
    the fail reason available through the "error_diag" method.

        $csv = Text::CSV_XS->new ({ ecs_char => 1 }) or
            die "" . Text::CSV_XS->error_diag ();

    "error_diag" will return a string like

        "INI - Unknown attribute 'ecs_char'"

  known_attributes
        @attr = Text::CSV_XS->known_attributes;
        @attr = Text::CSV_XS::known_attributes;
        @attr = $csv->known_attributes;

    This method will return an ordered list of all the supported
    attributes as described above.  This can be useful for knowing what
    attributes are valid in classes that use or extend Text::CSV_XS.

  print
        $status = $csv->print ($fh, $colref);

    Similar to "combine" + "string" + "print", but much more efficient.
    It expects an array ref as input (not an array!) and the resulting
    string is not really created, but immediately written to the $fh
    object, typically an IO handle or any other object that offers a
    "print" method.

    For performance reasons "print" does not create a result string, so
    all "string", "status", "fields", and "error_input" methods will
    return undefined information after executing this method.

    If $colref is "undef" (explicit, not through a variable argument) and
    "bind_columns" was used to specify fields to be printed, it is
    possible to make performance improvements, as otherwise data would
    have to be copied as arguments to the method call:

        $csv->bind_columns (\($foo, $bar));
        $status = $csv->print ($fh, undef);

    A short benchmark

        my @data = ("aa" .. "zz");
        $csv->bind_columns (\(@data));

        $csv->print ($fh, [ @data ]); # 11800 recs/sec
        $csv->print ($fh,  \@data  ); # 57600 recs/sec
        $csv->print ($fh,   undef  ); # 48500 recs/sec

  say
        $status = $csv->say ($fh, $colref);

    Like "print", but "eol" defaults to "$\".

  print_hr
        $csv->print_hr ($fh, $ref);

    Provides an easy way to print a $ref (as fetched with "getline_hr")
    provided the column names are set with "column_names".

    It is just a wrapper method with basic parameter checks over

        $csv->print ($fh, [ map { $ref->{$_} } $csv->column_names ]);

917 combine
918 $status = $csv->combine (@fields);
919
920 This method constructs a "CSV" record from @fields, returning success
921 or failure. Failure can result from lack of arguments or an argument
922 that contains an invalid character. Upon success, "string" can be
923 called to retrieve the resultant "CSV" string. Upon failure, the
924 value returned by "string" is undefined and "error_input" could be
925 called to retrieve the invalid argument.
926
927 string
928 $line = $csv->string ();
929
930 This method returns the input to "parse" or the resultant "CSV"
931 string of "combine", whichever was called more recently.
932
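A combined sketch of "combine" and "string" (the field values are made
up for illustration):

 use Text::CSV_XS;

 my $csv = Text::CSV_XS->new ({ binary => 1 });
 $csv->combine ("knife", "fork, spoon", 4.95) or die "combine failed";
 print $csv->string, "\n"; # knife,"fork, spoon",4.95
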
933 getline
934 $colref = $csv->getline ($fh);
935
936 This is the counterpart to "print", as "parse" is the counterpart to
937 "combine": it reads a row from the $fh handle using the "getline"
938 method associated with $fh and parses this row into an array ref.
939 This array ref is returned by the method, or "undef" on failure.
940 When $fh does not support "getline", you are likely to hit errors.
941
942 When fields are bound with "bind_columns" the return value is a
943 reference to an empty list.
944
945 As with "print", the "string", "fields", and "status" methods are meaningless after this call.
946
947 getline_all
948 $arrayref = $csv->getline_all ($fh);
949 $arrayref = $csv->getline_all ($fh, $offset);
950 $arrayref = $csv->getline_all ($fh, $offset, $length);
951
952 This will return a reference to a list of getline ($fh) results. In
953 this call, "keep_meta_info" is disabled. If $offset is negative, as
954 with "splice", only the last "abs ($offset)" records of $fh are taken
955 into consideration.
956
957 Given a CSV file with 10 lines:
958
959 lines call
960 ----- ---------------------------------------------------------
961 0..9 $csv->getline_all ($fh) # all
962 0..9 $csv->getline_all ($fh, 0) # all
963 8..9 $csv->getline_all ($fh, 8) # start at 8
964 - $csv->getline_all ($fh, 0, 0) # start at 0 first 0 rows
965 0..4 $csv->getline_all ($fh, 0, 5) # start at 0 first 5 rows
966 4..5 $csv->getline_all ($fh, 4, 2) # start at 4 first 2 rows
967 8..9 $csv->getline_all ($fh, -2) # last 2 rows
968 6..7 $csv->getline_all ($fh, -4, 2) # first 2 of last 4 rows
969
970 getline_hr
971 The "getline_hr" and "column_names" methods work together to allow you
972 to have rows returned as hashrefs. You must call "column_names" first
973 to declare your column names.
974
975 $csv->column_names (qw( code name price description ));
976 $hr = $csv->getline_hr ($fh);
977 print "Price for $hr->{name} is $hr->{price} EUR\n";
978
979 "getline_hr" will croak if called before "column_names".
980
981 Note that "getline_hr" creates a hashref for every row and will be
982 much slower than the combined use of "bind_columns" and "getline",
983 but it still offers the same easy-to-use hashref inside the loop:
984
985 my @cols = @{$csv->getline ($fh)};
986 $csv->column_names (@cols);
987 while (my $row = $csv->getline_hr ($fh)) {
988 print $row->{price};
989 }
990
991 This could easily be rewritten to the much faster:
992
993 my @cols = @{$csv->getline ($fh)};
994 my $row = {};
995 $csv->bind_columns (\@{$row}{@cols});
996 while ($csv->getline ($fh)) {
997 print $row->{price};
998 }
999
1000 Your mileage may vary with the size of the data and the number of rows.
1001 With perl-5.14.2 the comparison for a 100_000 line file with 14
1002 columns:
1003
1004 Rate hashrefs getlines
1005 hashrefs 1.00/s -- -76%
1006 getlines 4.15/s 313% --
1007
1008 getline_hr_all
1009 $arrayref = $csv->getline_hr_all ($fh);
1010 $arrayref = $csv->getline_hr_all ($fh, $offset);
1011 $arrayref = $csv->getline_hr_all ($fh, $offset, $length);
1012
1013 This will return a reference to a list of getline_hr ($fh) results.
1014 In this call, "keep_meta_info" is disabled.
1015
1016 parse
1017 $status = $csv->parse ($line);
1018
1019 This method decomposes a "CSV" string into fields, returning success
1020 or failure.  Failure can result from a missing argument or an
1021 improperly formatted "CSV" string.  Upon success, "fields" can be
1022 called to retrieve the decomposed fields. Upon failure calling "fields"
1023 will return undefined data and "error_input" can be called to
1024 retrieve the invalid argument.
1025
1026 You may use the "types" method for setting column types. See "types"'
1027 description below.
1028
1029 The $line argument is supposed to be a simple scalar.  Passing
1030 anything else will cause the method to croak and set error 1500.
1031
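A minimal parse sketch (the input line is made up for illustration):

 use Text::CSV_XS;

 my $csv = Text::CSV_XS->new ({ binary => 1 });
 if ($csv->parse (q{knife,"fork, spoon",4.95})) {
 my @field = $csv->fields; # ("knife", "fork, spoon", "4.95")
 }
 else {
 warn "parse failed on: ", $csv->error_input;
 }
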
1032 fragment
1033 This function tries to implement RFC7111 (URI Fragment Identifiers for
1034 the text/csv Media Type) - http://tools.ietf.org/html/rfc7111
1035
1036 my $AoA = $csv->fragment ($fh, $spec);
1037
1038 In specifications, "*" is used to specify the last item, a dash ("-")
1039 to indicate a range. All indices are 1-based: the first row or
1040 column has index 1. Selections can be combined with the semi-colon
1041 (";").
1042
1043 When using this method in combination with "column_names", the
1044 returned reference will point to a list of hashes instead of a list
1045 of lists. A disjointed cell-based combined selection might return
1046 rows with different numbers of columns, making the use of hashes
1047 unpredictable.
1048
1049 $csv->column_names ("Name", "Age");
1050 my $AoH = $csv->fragment ($fh, "col=3;8");
1051
1052 If the "after_parse" callback is active, it is also called on every
1053 line parsed and skipped before the fragment.
1054
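A complete sketch fetching a range of rows (the file name is an
assumption):

 my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1 });
 open my $fh, "<", "file.csv" or die "file.csv: $!";
 my $aoa = $csv->fragment ($fh, "row=2-4"); # rows 2, 3, and 4 only
 close $fh;
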
1055 row
1056 row=4
1057 row=5-7
1058 row=6-*
1059 row=1-2;4;6-*
1060
1061 col
1062 col=2
1063 col=1-3
1064 col=4-*
1065 col=1-2;4;7-*
1066
1067 cell
1068 In cell-based selection, the comma (",") is used to pair row and
1069 column
1070
1071 cell=4,1
1072
1073 The range operator ("-") using "cell"s can be used to define top-left
1074 and bottom-right "cell" location
1075
1076 cell=3,1-4,6
1077
1078 The "*" is only allowed in the second part of a pair
1079
1080 cell=3,2-*,2 # row 3 till end, only column 2
1081 cell=3,2-3,* # column 2 till end, only row 3
1082 cell=3,2-*,* # strip row 1 and 2, and column 1
1083
1084 Cells and cell ranges may be combined with ";", possibly resulting in
1085 rows with different numbers of columns
1086
1087 cell=1,1-2,2;3,3-4,4;1,4;4,1
1088
1089 Disjointed selections will only return selected cells. The cells
1090 that are not specified will not be included in the returned
1091 set, not even as "undef". As an example given a "CSV" like
1092
1093 11,12,13,...19
1094 21,22,...28,29
1095 : :
1096 91,...97,98,99
1097
1098 with "cell=1,1-2,2;3,3-4,4;1,4;4,1" will return:
1099
1100 11,12,14
1101 21,22
1102 33,34
1103 41,43,44
1104
1105 Overlapping cell-specs will return those cells only once, so
1106 "cell=1,1-3,3;2,2-4,4;2,3;4,2" will return:
1107
1108 11,12,13
1109 21,22,23,24
1110 31,32,33,34
1111 42,43,44
1112
1113 RFC7111 <http://tools.ietf.org/html/rfc7111> does not allow different
1114 types of specs to be combined (either "row" or "col" or "cell").
1115 Passing an invalid fragment specification will croak and set error
1116 2013.
1117
1118 column_names
1119 Set the "keys" that will be used in the "getline_hr" calls. If no
1120 keys (column names) are passed, it will return the current setting as a
1121 list.
1122
1123 "column_names" accepts a list of scalars (the column names) or a
1124 single array_ref, so you can pass the return value from "getline" too:
1125
1126 $csv->column_names ($csv->getline ($fh));
1127
1128 "column_names" does no checking on duplicates at all, which might lead
1129 to unexpected results. Undefined entries will be replaced with the
1130 string "\cAUNDEF\cA", so
1131
1132 $csv->column_names (undef, "", "name", "name");
1133 $hr = $csv->getline_hr ($fh);
1134
1135 will set "$hr->{"\cAUNDEF\cA"}" to the 1st field, "$hr->{""}" to the
1136 2nd field, and "$hr->{name}" to the 4th field, discarding the 3rd
1137 field.
1138
1139 "column_names" croaks on invalid arguments.
1140
1141 header
1142 This method does NOT work in perl-5.6.x
1143
1144 Parse the CSV header and set "sep", column_names and encoding.
1145
1146 my @hdr = $csv->header ($fh);
1147 $csv->header ($fh, { sep_set => [ ";", ",", "|", "\t" ] });
1148 $csv->header ($fh, { detect_bom => 1, munge_column_names => "lc" });
1149
1150 The first argument should be a file handle.
1151
1152 This method resets some object properties, as it is supposed to be
1153 invoked only once per file or stream. It will leave attributes
1154 "column_names" and "bound_columns" alone if setting column names is
1155 disabled.  Reading headers on previously processed objects might fail on
1156 perl-5.8.0 and older.
1157
1158 Assuming that the file opened for parsing has a header, and the header
1159 does not contain problematic characters like embedded newlines, read
1160 the first line from the open handle then auto-detect whether the header
1161 separates the column names with a character from the allowed separator
1162 list.
1163
1164 If any of the allowed separators matches, and none of the other
1165 allowed separators match, set "sep" to that separator for the
1166 current CSV_XS instance and use it to parse the first line, map those
1167 to lowercase, and use that to set the instance "column_names":
1168
1169 my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1 });
1170 open my $fh, "<", "file.csv";
1171 binmode $fh; # for Windows
1172 $csv->header ($fh);
1173 while (my $row = $csv->getline_hr ($fh)) {
1174 ...
1175 }
1176
1177 If the header is empty, contains more than one unique separator out of
1178 the allowed set, contains empty fields, or contains identical fields
1179 (after folding), it will croak with error 1010, 1011, 1012, or 1013
1180 respectively.
1181
1182 If the header contains embedded newlines or is not valid CSV in any
1183 other way, this method will croak and leave the parse error untouched.
1184
1185 A successful call to "header" will always set the "sep" of the $csv
1186 object. This behavior can not be disabled.
1187
1188 return value
1189
1190 On error this method will croak.
1191
1192 In list context, the headers will be returned whether they are used to
1193 set "column_names" or not.
1194
1195 In scalar context, the instance itself is returned. Note: the values
1196 as found in the header will effectively be lost if "set_column_names"
1197 is false.
1198
1199 Options
1200
1201 sep_set
1202 $csv->header ($fh, { sep_set => [ ";", ",", "|", "\t" ] });
1203
1204 The list of legal separators defaults to "[ ";", "," ]" and can be
1205 changed by this option. As this is probably the most often used
1206 option, it can be passed on its own as an unnamed argument:
1207
1208 $csv->header ($fh, [ ";", ",", "|", "\t", "::", "\x{2063}" ]);
1209
1210 Multi-byte sequences are allowed, both multi-character and
1211 Unicode. See "sep".
1212
1213 detect_bom
1214 $csv->header ($fh, { detect_bom => 1 });
1215
1216 The default behavior is to detect if the header line starts with a
1217 BOM. If the header has a BOM, use that to set the encoding of $fh.
1218 This default behavior can be disabled by passing a false value to
1219 "detect_bom".
1220
1221 Supported encodings from BOM are: UTF-8, UTF-16BE, UTF-16LE,
1222 UTF-32BE, and UTF-32LE. BOM also supports UTF-1, UTF-EBCDIC, SCSU,
1223 BOCU-1, and GB-18030 but Encode does not (yet). UTF-7 is not
1224 supported.
1225
1226 If a supported BOM was detected as start of the stream, it is stored
1227 in the object attribute "ENCODING".
1228
1229 my $enc = $csv->{ENCODING};
1230
1231 The encoding is used with "binmode" on $fh.
1232
1233 If the handle was opened in a (correct) encoding, this method will
1234 not alter the encoding, as it checks the leading bytes of the first
1235 line. In case the stream starts with a decoded BOM ("U+FEFF"),
1236 "{ENCODING}" will be "" (empty) instead of the default "undef".
1237
1238 munge_column_names
1239 This option offers the means to modify the column names into
1240 something that is most useful to the application. The default is to
1241 map all column names to lower case.
1242
1243 $csv->header ($fh, { munge_column_names => "lc" });
1244
1245 The following values are available:
1246
1247 lc - lower case
1248 uc - upper case
1249 db - valid DB field names
1250 none - do not change
1251 \%hash - supply a mapping
1252 \&cb - supply a callback
1253
1254 Lower case
1255 $csv->header ($fh, { munge_column_names => "lc" });
1256
1257 The header is changed to all lower-case
1258
1259 $_ = lc;
1260
1261 Upper case
1262 $csv->header ($fh, { munge_column_names => "uc" });
1263
1264 The header is changed to all upper-case
1265
1266 $_ = uc;
1267
1268 Literal
1269 $csv->header ($fh, { munge_column_names => "none" });
1270
1271 Hash
1272 $csv->header ($fh, { munge_column_names => { foo => "sombrero" }});
1273
1274 If a value does not exist, the original value is used unchanged.
1275
1276 Database
1277 $csv->header ($fh, { munge_column_names => "db" });
1278
1279 - lower-case
1280
1281 - all sequences of non-word characters are replaced with an
1282 underscore
1283
1284 - all leading underscores are removed
1285
1286 $_ = lc (s/\W+/_/gr =~ s/^_+//r);
1287
1288 Callback
1289 $csv->header ($fh, { munge_column_names => sub { fc } });
1290 $csv->header ($fh, { munge_column_names => sub { "column_".$col++ } });
1291 $csv->header ($fh, { munge_column_names => sub { lc (s/\W+/_/gr) } });
1292
1293 As this callback is called in a "map", you can use $_ directly.
1294
1295 set_column_names
1296 $csv->header ($fh, { set_column_names => 1 });
1297
1298 The default is to set the instance's column names using
1299 "column_names" if the method is successful, so subsequent calls to
1300 "getline_hr" can return a hash.  Setting the header can be disabled
1301 by passing a false value for this option.
1302
1303 As described in "return value" above, content is lost in scalar
1304 context.
1305
1306 Validation
1307
1308 When receiving CSV files from external sources, this method can be
1309 used to protect against changes in the layout by restricting to known
1310 headers (and typos in the header fields).
1311
1312 my %known = (
1313 "record key" => "c_rec",
1314 "rec id" => "c_rec",
1315 "id_rec" => "c_rec",
1316 "kode" => "code",
1317 "code" => "code",
1318 "vaule" => "value",
1319 "value" => "value",
1320 );
1321 my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1 });
1322 open my $fh, "<", $source or die "$source: $!";
1323 $csv->header ($fh, { munge_column_names => sub {
1324 s/\s+$//;
1325 s/^\s+//;
1326 $known{lc $_} or die "Unknown column '$_' in $source";
1327 }});
1328 while (my $row = $csv->getline_hr ($fh)) {
1329 say join "\t", $row->{c_rec}, $row->{code}, $row->{value};
1330 }
1331
1332 bind_columns
1333 Takes a list of scalar references to be used for output with "print"
1334 or to store in the fields fetched by "getline". When you do not pass
1335 enough references to store the fetched fields in, "getline" will fail
1336 with error 3006. If you pass more than there are fields to return,
1337 the content of the remaining references is left untouched.
1338
1339 $csv->bind_columns (\$code, \$name, \$price, \$description);
1340 while ($csv->getline ($fh)) {
1341 print "The price of a $name is \x{20ac} $price\n";
1342 }
1343
1344 To reset or clear all column binding, call "bind_columns" with the
1345 single argument "undef". This will also clear column names.
1346
1347 $csv->bind_columns (undef);
1348
1349 If no arguments are passed at all, "bind_columns" will return the list
1350 of current bindings or "undef" if no binds are active.
1351
1352 Note that in parsing with "bind_columns", the fields are set on the
1353 fly. That implies that if the third field of a row causes an error
1354 (or this row has just two fields where the previous row had more), the
1355 first two fields already have been assigned the values of the current
1356 row, while the rest of the fields will still hold the values of the
1357 previous row. If you want the parser to fail in these cases, use the
1358 "strict" attribute.
1359
1360 eof
1361 $eof = $csv->eof ();
1362
1363 If "parse" or "getline" was used with an IO stream, this method will
1364 return true (1) if the last call hit end of file, otherwise it will
1365 return false (''). This is useful to see the difference between a
1366 failure and end of file.
1367
1368 Note that if the parsing of the last line caused an error, "eof" is
1369 still true. That means that if you are not using "auto_diag", an idiom
1370 like
1371
1372 while (my $row = $csv->getline ($fh)) {
1373 # ...
1374 }
1375 $csv->eof or $csv->error_diag;
1376
1377 will not report the error. You would have to change that to
1378
1379 while (my $row = $csv->getline ($fh)) {
1380 # ...
1381 }
1382 +$csv->error_diag and $csv->error_diag;
1383
1384 types
1385 $csv->types (\@tref);
1386
1387 This method is used to force (all) columns to be of a given type.
1388 For example, if you have an integer column, two columns with
1389 doubles and a string column, then you might do a
1390
1391 $csv->types ([Text::CSV_XS::IV (),
1392 Text::CSV_XS::NV (),
1393 Text::CSV_XS::NV (),
1394 Text::CSV_XS::PV ()]);
1395
1396 Column types are used only for decoding columns while parsing, in
1397 other words by the "parse" and "getline" methods.
1398
1399 You can unset column types by doing a
1400
1401 $csv->types (undef);
1402
1403 or fetch the current type settings with
1404
1405 $types = $csv->types ();
1406
1407 IV Set field type to integer.
1408
1409 NV Set field type to numeric/float.
1410
1411 PV Set field type to string.
1412
1413 fields
1414 @columns = $csv->fields ();
1415
1416 This method returns the input to "combine" or the resultant
1417 decomposed fields of a successful "parse", whichever was called more
1418 recently.
1419
1420 Note that the return value is undefined after using "getline", which
1421 does not fill the data structures returned by "parse".
1422
1423 meta_info
1424 @flags = $csv->meta_info ();
1425
1426 This method returns the "flags" of the input to "combine" or the flags
1427 of the resultant decomposed fields of "parse", whichever was called
1428 more recently.
1429
1430 For each field, a meta_info field will hold flags that inform
1431 something about the field returned by the "fields" method or
1432 passed to the "combine" method. The flags are bit-wise-"or"'d like:
1433
1434 0x0001
1435 The field was quoted.
1436
1437 0x0002
1438 The field was binary.
1439
1440 See the "is_***" methods below.
1441
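A small sketch checking the flags directly (the input line is made up):

 my $csv = Text::CSV_XS->new ({ binary => 1, keep_meta_info => 1 });
 $csv->parse (q{1,"2",3});
 my @flags = $csv->meta_info;
 $flags[1] & 0x0001 and print "second field was quoted\n";
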
1442 is_quoted
1443 my $quoted = $csv->is_quoted ($column_idx);
1444
1445 where $column_idx is the (zero-based) index of the column in the
1446 last result of "parse".
1447
1448 This returns a true value if the data in the indicated column was
1449 enclosed in "quote_char" quotes. This might be important for fields
1450 where content ",20070108," is to be treated as a numeric value, and
1451 where ","20070108"," is explicitly marked as character string data.
1452
1453 This method is only valid when "keep_meta_info" is set to a true value.
1454
1455 is_binary
1456 my $binary = $csv->is_binary ($column_idx);
1457
1458 where $column_idx is the (zero-based) index of the column in the
1459 last result of "parse".
1460
1461 This returns a true value if the data in the indicated column contained
1462 any byte in the range "[\x00-\x08,\x10-\x1F,\x7F-\xFF]".
1463
1464 This method is only valid when "keep_meta_info" is set to a true value.
1465
1466 is_missing
1467 my $missing = $csv->is_missing ($column_idx);
1468
1469 where $column_idx is the (zero-based) index of the column in the
1470 last result of "getline_hr".
1471
1472 $csv->keep_meta_info (1);
1473 while (my $hr = $csv->getline_hr ($fh)) {
1474 $csv->is_missing (0) and next; # This was an empty line
1475 }
1476
1477 When using "getline_hr", it is impossible to tell if the parsed
1478 fields are "undef" because they were not filled in the "CSV" stream
1479 or because they were not read at all, as all the fields defined by
1480 "column_names" are set in the hash-ref. If you still need to know if
1481 all fields in each row are provided, you should enable "keep_meta_info"
1482 so you can check the flags.
1483
1484 If "keep_meta_info" is "false", "is_missing" will always return
1485 "undef", regardless of $column_idx being valid or not. If this
1486 attribute is "true" it will return either 0 (the field is present) or 1
1487 (the field is missing).
1488
1489 A special case is the empty line. If the line is completely empty -
1490 after dealing with the flags - this is still a valid CSV line: it is a
1491 record of just one single empty field. However, if "keep_meta_info" is
1492 set, invoking "is_missing" with index 0 will now return true.
1493
1494 status
1495 $status = $csv->status ();
1496
1497 This method returns the status of the last invoked "combine" or "parse"
1498 call. Status is success (true: 1) or failure (false: "undef" or 0).
1499
1500 error_input
1501 $bad_argument = $csv->error_input ();
1502
1503 This method returns the erroneous argument (if it exists) of "combine"
1504 or "parse", whichever was called more recently. If the last
1505 invocation was successful, "error_input" will return "undef".
1506
1507 error_diag
1508 Text::CSV_XS->error_diag ();
1509 $csv->error_diag ();
1510 $error_code = 0 + $csv->error_diag ();
1511 $error_str = "" . $csv->error_diag ();
1512 ($cde, $str, $pos, $rec, $fld) = $csv->error_diag ();
1513
1514 If (and only if) an error occurred, this function returns the
1515 diagnostics of that error.
1516
1517 If called in void context, this will print the internal error code and
1518 the associated error message to STDERR.
1519
1520 If called in list context, this will return the error code and the
1521 error message in that order. If the last error was from parsing, the
1522 rest of the values returned are a best guess at the location within
1523 the line that was being parsed. Their values are 1-based. The
1524 position currently is the index of the byte at which the parsing failed in
1525 the current record. It might change to be the index of the current
1526 character in a later release.  The record is the index of the record
1527 parsed by the csv instance. The field number is the index of the field
1528 the parser thinks it is currently trying to parse. See
1529 examples/csv-check for how this can be used.
1530
1531 If called in scalar context, it will return the diagnostics in a
1532 single scalar, a-la $!. It will contain the error code in numeric
1533 context, and the diagnostics message in string context.
1534
1535 When called as a class method or a direct function call, the
1536 diagnostics are that of the last "new" call.
1537
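A typical diagnostic sketch after a failed parse (the broken input is
made up):

 my $csv = Text::CSV_XS->new ({ binary => 1 });
 unless ($csv->parse (qq{1,"unterminated\n})) {
 my ($code, $str, $pos, $rec, $fld) = $csv->error_diag;
 warn "Error $code ($str) at byte $pos in field $fld\n";
 }
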
1538 record_number
1539 $recno = $csv->record_number ();
1540
1541 Returns the number of records parsed by this csv instance.  This value
1542 is more accurate than $. when embedded newlines come into play.  Records
1543 written by this instance are not counted.
1544
1545 SetDiag
1546 $csv->SetDiag (0);
1547
1548 Use this method to reset the diagnostics when you are dealing with errors.
1549
1551 csv
1552 This function is not exported by default and should be explicitly
1553 requested:
1554
1555 use Text::CSV_XS qw( csv );
1556
1557 This is a high-level function that aims at simple (user) interfaces.
1558 This can be used to read/parse a "CSV" file or stream (the default
1559 behavior) or to produce a file or write to a stream (define the "out"
1560 attribute). It returns an array- or hash-reference on parsing (or
1561 "undef" on fail) or the numeric value of "error_diag" on writing.
1562 When this function fails you can get to the error using the class call
1563 to "error_diag"
1564
1565 my $aoa = csv (in => "test.csv") or
1566 die Text::CSV_XS->error_diag;
1567
1568 This function takes the arguments as key-value pairs. This can be
1569 passed as a list or as an anonymous hash:
1570
1571 my $aoa = csv ( in => "test.csv", sep_char => ";");
1572 my $aoh = csv ({ in => $fh, headers => "auto" });
1573
1574 The arguments passed consist of two parts: the arguments to "csv"
1575 itself and the optional attributes to the "CSV" object used inside
1576 the function as enumerated and explained in "new".
1577
1578 If not overridden, the default option used for CSV is
1579
1580 auto_diag => 1
1581 escape_null => 0
1582
1583 The option that is always set and cannot be altered is
1584
1585 binary => 1
1586
1587 As this function will likely be used in one-liners, it allows "quote"
1588 to be abbreviated as "quo", and "escape_char" to be abbreviated as
1589 "esc" or "escape".
1590
1591 Alternative invocations:
1592
1593 my $aoa = Text::CSV_XS::csv (in => "file.csv");
1594
1595 my $csv = Text::CSV_XS->new ();
1596 my $aoa = $csv->csv (in => "file.csv");
1597
1598 In the latter case, the object attributes are used from the existing
1599 object and the attribute arguments in the function call are ignored:
1600
1601 my $csv = Text::CSV_XS->new ({ sep_char => ";" });
1602 my $aoh = $csv->csv (in => "file.csv", bom => 1);
1603
1604 will parse using ";" as "sep_char", not ",".
1605
1606 in
1607
1608 Used to specify the source. "in" can be a file name (e.g. "file.csv"),
1609 which will be opened for reading and closed when finished, a file
1610 handle (e.g. $fh or "FH"), a reference to a glob (e.g. "\*ARGV"),
1611 the glob itself (e.g. *STDIN), or a reference to a scalar (e.g.
1612 "\q{1,2,"csv"}").
1613
1614 When used with "out", "in" should be a reference to a CSV structure
1615 (AoA or AoH) or a CODE-ref that returns an array-reference or a hash-
1616 reference. The code-ref will be invoked with no arguments.
1617
1618 my $aoa = csv (in => "file.csv");
1619
1620 open my $fh, "<", "file.csv";
1621 my $aoa = csv (in => $fh);
1622
1623 my $csv = [ [qw( Foo Bar )], [ 1, 2 ], [ 2, 3 ]];
1624 my $err = csv (in => $csv, out => "file.csv");
1625
1626 If called in void context without the "out" attribute, the resulting
1627 ref will be used as input to a subsequent call to csv:
1628
1629 csv (in => "file.csv", filter => { 2 => sub { length > 2 }})
1630
1631 will be a shortcut to
1632
1633 csv (in => csv (in => "file.csv", filter => { 2 => sub { length > 2 }}))
1634
1635 where, in the absence of the "out" attribute, this is a shortcut to
1636
1637 csv (in => csv (in => "file.csv", filter => { 2 => sub { length > 2 }}),
1638 out => *STDOUT)
1639
1640 out
1641
1642 csv (in => $aoa, out => "file.csv");
1643 csv (in => $aoa, out => $fh);
1644 csv (in => $aoa, out => STDOUT);
1645 csv (in => $aoa, out => *STDOUT);
1646 csv (in => $aoa, out => \*STDOUT);
1647 csv (in => $aoa, out => \my $data);
1648 csv (in => $aoa, out => undef);
1649 csv (in => $aoa, out => \"skip");
1650
1651 In output mode, the default CSV options when producing CSV are
1652
1653 eol => "\r\n"
1654
1655 The "fragment" attribute is ignored in output mode.
1656
1657 "out" can be a file name (e.g. "file.csv"), which will be opened for
1658 writing and closed when finished, a file handle (e.g. $fh or "FH"), a
1659 reference to a glob (e.g. "\*STDOUT"), the glob itself (e.g. *STDOUT),
1660 or a reference to a scalar (e.g. "\my $data").
1661
1662 csv (in => sub { $sth->fetch }, out => "dump.csv");
1663 csv (in => sub { $sth->fetchrow_hashref }, out => "dump.csv",
1664 headers => $sth->{NAME_lc});
1665
1666 When a code-ref is used for "in", the output is generated per
1667 invocation, so no buffering is involved. This implies that there is no
1668 size restriction on the number of records. The "csv" function ends when
1669 the coderef returns a false value.
1670
1671 If "out" is set to a reference of the literal string "skip", the output
1672 will be suppressed completely, which might be useful in combination
1673 with a filter for side effects only.
1674
1675 my %cache;
1676 csv (in => "dump.csv",
1677 out => \"skip",
1678 on_in => sub { $cache{$_[1][1]}++ });
1679
1680 Currently, setting "out" to any false value ("undef", "", 0) will be
1681 equivalent to "\"skip"".
1682
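A sketch capturing the generated CSV in a scalar instead of a file:

 use Text::CSV_XS qw( csv );

 my $data; # will hold the CSV as a single string
 csv (in => [[ 1, 2 ], [ 3, 4 ]], out => \$data);
 # $data now holds "1,2\r\n3,4\r\n" (default eol in output mode)
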
1683 encoding
1684
1685 If passed, it should be an encoding accepted by the ":encoding()"
1686 option to "open". There is no default value. This attribute does not
1687 work in perl 5.6.x. "encoding" can be abbreviated to "enc" for ease of
1688 use in command line invocations.
1689
1690 If "encoding" is set to the literal value "auto", the method "header"
1691 will be invoked on the opened stream to check if there is a BOM and set
1692 the encoding accordingly. This is equal to passing a true value in
1693 the option "detect_bom".
1694
1695 Encodings can be stacked, as supported by "binmode":
1696
1697 # Using PerlIO::via::gzip
1698 csv (in => \@csv,
1699 out => "test.csv:via.gz",
1700 encoding => ":via(gzip):encoding(utf-8)",
1701 );
1702 $aoa = csv (in => "test.csv:via.gz", encoding => ":via(gzip)");
1703
1704 # Using PerlIO::gzip
1705 csv (in => \@csv,
1706 out => "test.csv:gzip.gz",
1707 encoding => ":gzip:encoding(utf-8)",
1708 );
1709 $aoa = csv (in => "test.csv:gzip.gz", encoding => ":gzip");
1710
1711 detect_bom
1712
1713 If "detect_bom" is given, the method "header" will be invoked on
1714 the opened stream to check if there is a BOM and set the encoding
1715 accordingly.
1716
1717 "detect_bom" can be abbreviated to "bom".
1718
1719 This is the same as setting "encoding" to "auto".
1720
1721 Note that as the method "header" is invoked, its default is to also
1722 set the headers.
1723
1724 headers
1725
1726 If this attribute is not given, the default behavior is to produce an
1727 array of arrays.
1728
1729 If "headers" is supplied, it should be an anonymous list of column
1730 names, an anonymous hashref, a coderef, or a literal flag: "auto",
1731 "lc", "uc", or "skip".
1732
1733 skip
1734 When "skip" is used, the header will not be included in the output.
1735
1736 my $aoa = csv (in => $fh, headers => "skip");
1737
1738 auto
1739 If "auto" is used, the first line of the "CSV" source will be read as
1740 the list of field headers and used to produce an array of hashes.
1741
1742 my $aoh = csv (in => $fh, headers => "auto");
1743
1744 lc
1745 If "lc" is used, the first line of the "CSV" source will be read as
1746 the list of field headers mapped to lower case and used to produce
1747 an array of hashes. This is a variation of "auto".
1748
1749 my $aoh = csv (in => $fh, headers => "lc");
1750
1751 uc
1752 If "uc" is used, the first line of the "CSV" source will be read as
1753 the list of field headers mapped to upper case and used to produce
1754 an array of hashes. This is a variation of "auto".
1755
1756 my $aoh = csv (in => $fh, headers => "uc");
1757
1758 CODE
1759 If a coderef is used, the first line of the "CSV" source will be
1760 read and each field in it will be passed as the only argument to the
1761 coderef.  The resulting (mangled) list of headers is used to
1762 produce an array of hashes.
1763
1764 my $aoh = csv (in => $fh,
1765 headers => sub { lc ($_[0]) =~ s/kode/code/gr });
1766
1767 This example is a variation of using "lc" where all occurrences of
1768 "kode" are replaced with "code".
1769
1770 ARRAY
1771 If "headers" is an anonymous list, the entries in the list will be
1772 used as field names. The first line is considered data instead of
1773 headers.
1774
1775 my $aoh = csv (in => $fh, headers => [qw( Foo Bar )]);
1776 csv (in => $aoa, out => $fh, headers => [qw( code description price )]);
1777
1778 HASH
1779 If "headers" is a hash reference, this implies "auto", but header
1780 fields that exist as key in the hashref will be replaced by the value
1781 for that key. Given a CSV file like
1782
1783 post-kode,city,name,id number,fubble
1784 1234AA,Duckstad,Donald,13,"X313DF"
1785
1786 using
1787
1788 csv (headers => { "post-kode" => "pc", "id number" => "ID" }, ...
1789
1790 will return an entry like
1791
1792 { pc => "1234AA",
1793 city => "Duckstad",
1794 name => "Donald",
1795 ID => "13",
1796 fubble => "X313DF",
1797 }
1798
1799 See also "munge_column_names" and "set_column_names".
1800
1801 munge_column_names
1802
1803 If "munge_column_names" is set, the method "header" is invoked on
1804 the opened stream with all matching arguments to detect and set the
1805 headers.
1806
1807 "munge_column_names" can be abbreviated to "munge".
1808
1809 key
1810
If passed, this attribute defaults "headers" to "auto" and makes the
function return a hashref instead of an array of hashes. Allowed
values are simple scalars or array-references where the first element
is the joiner and the rest are the fields to join to combine the key.
1815
1816 my $ref = csv (in => "test.csv", key => "code");
1817 my $ref = csv (in => "test.csv", key => [ ":" => "code", "color" ]);
1818
1819 with test.csv like
1820
1821 code,product,price,color
1822 1,pc,850,gray
1823 2,keyboard,12,white
1824 3,mouse,5,black
1825
1826 the first example will return
1827
1828 { 1 => {
1829 code => 1,
1830 color => 'gray',
1831 price => 850,
1832 product => 'pc'
1833 },
1834 2 => {
1835 code => 2,
1836 color => 'white',
1837 price => 12,
1838 product => 'keyboard'
1839 },
1840 3 => {
1841 code => 3,
1842 color => 'black',
1843 price => 5,
1844 product => 'mouse'
1845 }
1846 }
1847
1848 the second example will return
1849
1850 { "1:gray" => {
1851 code => 1,
1852 color => 'gray',
1853 price => 850,
1854 product => 'pc'
1855 },
1856 "2:white" => {
1857 code => 2,
1858 color => 'white',
1859 price => 12,
1860 product => 'keyboard'
1861 },
1862 "3:black" => {
1863 code => 3,
1864 color => 'black',
1865 price => 5,
1866 product => 'mouse'
1867 }
1868 }
1869
The "key" attribute can be combined with "headers" for "CSV" data that
has no header line, like
1872
1873 my $ref = csv (
1874 in => "foo.csv",
1875 headers => [qw( c_foo foo bar description stock )],
1876 key => "c_foo",
1877 );
1878
1879 value
1880
1881 Used to create key-value hashes.
1882
Only allowed when "key" is valid. A "value" can be either a single
column label or an anonymous list of column labels. In the first case,
the value will be a simple scalar value; in the second case, it will
be a hashref.
1887
1888 my $ref = csv (in => "test.csv", key => "code",
1889 value => "price");
1890 my $ref = csv (in => "test.csv", key => "code",
1891 value => [ "product", "price" ]);
1892 my $ref = csv (in => "test.csv", key => [ ":" => "code", "color" ],
1893 value => "price");
1894 my $ref = csv (in => "test.csv", key => [ ":" => "code", "color" ],
1895 value => [ "product", "price" ]);
1896
1897 with test.csv like
1898
1899 code,product,price,color
1900 1,pc,850,gray
1901 2,keyboard,12,white
1902 3,mouse,5,black
1903
1904 the first example will return
1905
1906 { 1 => 850,
1907 2 => 12,
1908 3 => 5,
1909 }
1910
1911 the second example will return
1912
1913 { 1 => {
1914 price => 850,
1915 product => 'pc'
1916 },
1917 2 => {
1918 price => 12,
1919 product => 'keyboard'
1920 },
1921 3 => {
1922 price => 5,
1923 product => 'mouse'
1924 }
1925 }
1926
1927 the third example will return
1928
1929 { "1:gray" => 850,
1930 "2:white" => 12,
1931 "3:black" => 5,
1932 }
1933
1934 the fourth example will return
1935
1936 { "1:gray" => {
1937 price => 850,
1938 product => 'pc'
1939 },
1940 "2:white" => {
1941 price => 12,
1942 product => 'keyboard'
1943 },
1944 "3:black" => {
1945 price => 5,
1946 product => 'mouse'
1947 }
1948 }
1949
1950 keep_headers
1951
When using hashes, keep the column names in the arrayref passed, so
all headers are available after the call in the original order.
1954
1955 my $aoh = csv (in => "file.csv", keep_headers => \my @hdr);
1956
1957 This attribute can be abbreviated to "kh" or passed as
1958 "keep_column_names".
1959
1960 This attribute implies a default of "auto" for the "headers" attribute.
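
As a sketch, the captured headers can then be used after the call,
e.g. to write the columns back out in their original file order
(assuming "file.csv" has a header line):

    my $aoh = csv (in => "file.csv", keep_headers => \my @hdr);
    print join (",", @hdr), "\n";
    print join (",", map { $_ // "" } @{$_}{@hdr}), "\n" for @$aoh;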
1961
1962 fragment
1963
1964 Only output the fragment as defined in the "fragment" method. This
1965 option is ignored when generating "CSV". See "out".
1966
1967 Combining all of them could give something like
1968
1969 use Text::CSV_XS qw( csv );
1970 my $aoh = csv (
1971 in => "test.txt",
1972 encoding => "utf-8",
1973 headers => "auto",
1974 sep_char => "|",
1975 fragment => "row=3;6-9;15-*",
1976 );
1977 say $aoh->[15]{Foo};
1978
1979 sep_set
1980
1981 If "sep_set" is set, the method "header" is invoked on the opened
1982 stream to detect and set "sep_char" with the given set.
1983
1984 "sep_set" can be abbreviated to "seps".
1985
1986 Note that as the "header" method is invoked, its default is to also
1987 set the headers.
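
A minimal sketch, assuming the separator is known to be one of comma,
semicolon, pipe, or TAB:

    my $aoh = csv (in => "data.csv",
                   sep_set => [ ";", ",", "|", "\t" ]);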
1988
1989 set_column_names
1990
1991 If "set_column_names" is passed, the method "header" is invoked on
1992 the opened stream with all arguments meant for "header".
1993
1994 If "set_column_names" is passed as a false value, the content of the
1995 first row is only preserved if the output is AoA:
1996
1997 With an input-file like
1998
1999 bAr,foo
2000 1,2
2001 3,4,5
2002
2003 This call
2004
2005 my $aoa = csv (in => $file, set_column_names => 0);
2006
2007 will result in
2008
2009 [[ "bar", "foo" ],
2010 [ "1", "2" ],
2011 [ "3", "4", "5" ]]
2012
2013 and
2014
2015 my $aoa = csv (in => $file, set_column_names => 0, munge => "none");
2016
2017 will result in
2018
2019 [[ "bAr", "foo" ],
2020 [ "1", "2" ],
2021 [ "3", "4", "5" ]]
2022
2023 Callbacks
2024 Callbacks enable actions triggered from the inside of Text::CSV_XS.
2025
While most of what this enables can easily be done in an unrolled
loop as described in the "SYNOPSIS", callbacks can be used to meet
special demands or enhance the "csv" function.
2029
2030 error
2031 $csv->callbacks (error => sub { $csv->SetDiag (0) });
2032
The "error" callback is invoked when an error occurs, but only
when "auto_diag" is set to a true value. The callback is invoked with
the values returned by "error_diag":
2036
2037 my ($c, $s);
2038
2039 sub ignore3006 {
2040 my ($err, $msg, $pos, $recno, $fldno) = @_;
2041 if ($err == 3006) {
2042 # ignore this error
2043 ($c, $s) = (undef, undef);
2044 Text::CSV_XS->SetDiag (0);
2045 }
2046 # Any other error
2047 return;
2048 } # ignore3006
2049
2050 $csv->callbacks (error => \&ignore3006);
2051 $csv->bind_columns (\$c, \$s);
2052 while ($csv->getline ($fh)) {
2053 # Error 3006 will not stop the loop
2054 }
2055
2056 after_parse
2057 $csv->callbacks (after_parse => sub { push @{$_[1]}, "NEW" });
2058 while (my $row = $csv->getline ($fh)) {
2059 $row->[-1] eq "NEW";
2060 }
2061
2062 This callback is invoked after parsing with "getline" only if no
2063 error occurred. The callback is invoked with two arguments: the
2064 current "CSV" parser object and an array reference to the fields
2065 parsed.
2066
2067 The return code of the callback is ignored unless it is a reference
2068 to the string "skip", in which case the record will be skipped in
2069 "getline_all".
2070
2071 sub add_from_db {
2072 my ($csv, $row) = @_;
2073 $sth->execute ($row->[4]);
2074 push @$row, $sth->fetchrow_array;
2075 } # add_from_db
2076
2077 my $aoa = csv (in => "file.csv", callbacks => {
2078 after_parse => \&add_from_db });
2079
2080 This hook can be used for validation:
2081
2082 FAIL
2083 Die if any of the records does not validate a rule:
2084
2085 after_parse => sub {
2086 $_[1][4] =~ m/^[0-9]{4}\s?[A-Z]{2}$/ or
2087 die "5th field does not have a valid Dutch zipcode";
2088 }
2089
2090 DEFAULT
2091 Replace invalid fields with a default value:
2092
2093 after_parse => sub { $_[1][2] =~ m/^\d+$/ or $_[1][2] = 0 }
2094
2095 SKIP
2096 Skip records that have invalid fields (only applies to
2097 "getline_all"):
2098
2099 after_parse => sub { $_[1][0] =~ m/^\d+$/ or return \"skip"; }
2100
2101 before_print
2102 my $idx = 1;
2103 $csv->callbacks (before_print => sub { $_[1][0] = $idx++ });
2104 $csv->print (*STDOUT, [ 0, $_ ]) for @members;
2105
2106 This callback is invoked before printing with "print" only if no
2107 error occurred. The callback is invoked with two arguments: the
2108 current "CSV" parser object and an array reference to the fields
2109 passed.
2110
2111 The return code of the callback is ignored.
2112
2113 sub max_4_fields {
2114 my ($csv, $row) = @_;
2115 @$row > 4 and splice @$row, 4;
2116 } # max_4_fields
2117
2118 csv (in => csv (in => "file.csv"), out => *STDOUT,
2119 callbacks => { before_print => \&max_4_fields });
2120
2121 This callback is not active for "combine".
2122
2123 Callbacks for csv ()
2124
The "csv" function supports some callbacks that are not integrated in
the XS internals but are only featured by the "csv" function itself.
2127
2128 csv (in => "file.csv",
2129 callbacks => {
2130 filter => { 6 => sub { $_ > 15 } }, # first
2131 after_parse => sub { say "AFTER PARSE"; }, # first
2132 after_in => sub { say "AFTER IN"; }, # second
2133 on_in => sub { say "ON IN"; }, # third
2134 },
2135 );
2136
2137 csv (in => $aoh,
2138 out => "file.csv",
2139 callbacks => {
2140 on_in => sub { say "ON IN"; }, # first
2141 before_out => sub { say "BEFORE OUT"; }, # second
2142 before_print => sub { say "BEFORE PRINT"; }, # third
2143 },
2144 );
2145
2146 filter
2147 This callback can be used to filter records. It is called just after
2148 a new record has been scanned. The callback accepts a:
2149
2150 hashref
The keys are the index into the row (the field name or the 1-based
field number) and the values are subs that return a true or false
value.
2153
2154 csv (in => "file.csv", filter => {
2155 3 => sub { m/a/ }, # third field should contain an "a"
5 => sub { length > 4 }, # 5th field should be at least 5 chars long
2157 });
2158
2159 csv (in => "file.csv", filter => { foo => sub { $_ > 4 }});
2160
2161 If the keys to the filter hash contain any character that is not a
2162 digit it will also implicitly set "headers" to "auto" unless
2163 "headers" was already passed as argument. When headers are
2164 active, returning an array of hashes, the filter is not applicable
2165 to the header itself.
2166
2167 All sub results should match, as in AND.
2168
2169 The context of the callback sets $_ localized to the field
2170 indicated by the filter. The two arguments are as with all other
2171 callbacks, so the other fields in the current row can be seen:
2172
2173 filter => { 3 => sub { $_ > 100 ? $_[1][1] =~ m/A/ : $_[1][6] =~ m/B/ }}
2174
2175 If the context is set to return a list of hashes ("headers" is
2176 defined), the current record will also be available in the
2177 localized %_:
2178
2179 filter => { 3 => sub { $_ > 100 && $_{foo} =~ m/A/ && $_{bar} < 1000 }}
2180
2181 If the filter is used to alter the content by changing $_, make
2182 sure that the sub returns true in order not to have that record
2183 skipped:
2184
2185 filter => { 2 => sub { $_ = uc }}
2186
2187 will upper-case the second field, and then skip it if the resulting
2188 content evaluates to false. To always accept, end with truth:
2189
2190 filter => { 2 => sub { $_ = uc; 1 }}
2191
2192 coderef
2193 csv (in => "file.csv", filter => sub { $n++; 0; });
2194
2195 If the argument to "filter" is a coderef, it is an alias or
2196 shortcut to a filter on column 0:
2197
2198 csv (filter => sub { $n++; 0 });
2199
2200 is equal to
2201
csv (filter => { 0 => sub { $n++; 0 }});
2203
2204 filter-name
2205 csv (in => "file.csv", filter => "not_blank");
2206 csv (in => "file.csv", filter => "not_empty");
2207 csv (in => "file.csv", filter => "filled");
2208
These are predefined filters.
2210
Given a file like (line numbers prefixed for documentation purposes only):
2212
2213 1:1,2,3
2214 2:
2215 3:,
2216 4:""
2217 5:,,
2218 6:, ,
2219 7:"",
2220 8:" "
2221 9:4,5,6
2222
2223 not_blank
2224 Filter out the blank lines
2225
2226 This filter is a shortcut for
2227
2228 filter => { 0 => sub { @{$_[1]} > 1 or
2229 defined $_[1][0] && $_[1][0] ne "" } }
2230
Due to the implementation, it is currently impossible to also
filter lines that consist only of a quoted empty field. These
lines are also considered blank lines.
2234
2235 With the given example, lines 2 and 4 will be skipped.
2236
2237 not_empty
2238 Filter out lines where all the fields are empty.
2239
2240 This filter is a shortcut for
2241
2242 filter => { 0 => sub { grep { defined && $_ ne "" } @{$_[1]} } }
2243
A space is not regarded as empty, so given the example data,
lines 2, 3, 4, 5, and 7 are skipped.
2246
2247 filled
2248 Filter out lines that have no visible data
2249
2250 This filter is a shortcut for
2251
2252 filter => { 0 => sub { grep { defined && m/\S/ } @{$_[1]} } }
2253
This filter rejects all lines that do not have at least one field
with visible content.
2256
2257 With the given example data, this filter would skip lines 2
2258 through 8.
2259
2260 One could also use modules like Types::Standard:
2261
2262 use Types::Standard -types;
2263
2264 my $type = Tuple[Str, Str, Int, Bool, Optional[Num]];
2265 my $check = $type->compiled_check;
2266
2267 # filter with compiled check and warnings
2268 my $aoa = csv (
2269 in => \$data,
2270 filter => {
2271 0 => sub {
2272 my $ok = $check->($_[1]) or
2273 warn $type->get_message ($_[1]), "\n";
2274 return $ok;
2275 },
2276 },
2277 );
2278
2279 after_in
2280 This callback is invoked for each record after all records have been
2281 parsed but before returning the reference to the caller. The hook is
2282 invoked with two arguments: the current "CSV" parser object and a
2283 reference to the record. The reference can be a reference to a
2284 HASH or a reference to an ARRAY as determined by the arguments.
2285
2286 This callback can also be passed as an attribute without the
2287 "callbacks" wrapper.
2288
2289 before_out
2290 This callback is invoked for each record before the record is
2291 printed. The hook is invoked with two arguments: the current "CSV"
2292 parser object and a reference to the record. The reference can be a
2293 reference to a HASH or a reference to an ARRAY as determined by the
2294 arguments.
2295
2296 This callback can also be passed as an attribute without the
2297 "callbacks" wrapper.
2298
2299 This callback makes the row available in %_ if the row is a hashref.
2300 In this case %_ is writable and will change the original row.
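
A minimal sketch, assuming the rows are hashrefs (e.g. when "in" is
an array of hashes) and assuming a hypothetical "price" column that
should use a decimal dot instead of a comma:

    csv (in => $aoh, out => "file.csv",
         before_out => sub { $_{price} =~ tr/,/./ });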
2301
2302 on_in
2303 This callback acts exactly as the "after_in" or the "before_out"
2304 hooks.
2305
2306 This callback can also be passed as an attribute without the
2307 "callbacks" wrapper.
2308
2309 This callback makes the row available in %_ if the row is a hashref.
2310 In this case %_ is writable and will change the original row. So e.g.
2311 with
2312
2313 my $aoh = csv (
2314 in => \"foo\n1\n2\n",
2315 headers => "auto",
2316 on_in => sub { $_{bar} = 2; },
2317 );
2318
2319 $aoh will be:
2320
2321 [ { foo => 1,
2322 bar => 2,
2323 }
2324 { foo => 2,
2325 bar => 2,
2326 }
2327 ]
2328
2329 csv
2330 The function "csv" can also be called as a method or with an
2331 existing Text::CSV_XS object. This could help if the function is to
2332 be invoked a lot of times and the overhead of creating the object
2333 internally over and over again would be prevented by passing an
2334 existing instance.
2335
2336 my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1 });
2337
2338 my $aoa = $csv->csv (in => $fh);
2339 my $aoa = csv (in => $fh, csv => $csv);
2340
both act the same. Running this 20000 times on a 20-line CSV file
showed a 53% speedup.
2343
2345 Combine (...)
2346 Parse (...)
2347
The arguments to these internal functions are deliberately not
described or documented in order to enable the module authors to
change them when they feel the need. Using them is highly
discouraged as the API may change in future releases.
2352
2354 Reading a CSV file line by line:
2355 my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1 });
2356 open my $fh, "<", "file.csv" or die "file.csv: $!";
2357 while (my $row = $csv->getline ($fh)) {
2358 # do something with @$row
2359 }
2360 close $fh or die "file.csv: $!";
2361
2362 or
2363
2364 my $aoa = csv (in => "file.csv", on_in => sub {
2365 # do something with %_
2366 });
2367
2368 Reading only a single column
2369
2370 my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1 });
2371 open my $fh, "<", "file.csv" or die "file.csv: $!";
2372 # get only the 4th column
2373 my @column = map { $_->[3] } @{$csv->getline_all ($fh)};
2374 close $fh or die "file.csv: $!";
2375
2376 with "csv", you could do
2377
2378 my @column = map { $_->[0] }
2379 @{csv (in => "file.csv", fragment => "col=4")};
2380
2381 Parsing CSV strings:
2382 my $csv = Text::CSV_XS->new ({ keep_meta_info => 1, binary => 1 });
2383
2384 my $sample_input_string =
2385 qq{"I said, ""Hi!""",Yes,"",2.34,,"1.09","\x{20ac}",};
2386 if ($csv->parse ($sample_input_string)) {
2387 my @field = $csv->fields;
2388 foreach my $col (0 .. $#field) {
2389 my $quo = $csv->is_quoted ($col) ? $csv->{quote_char} : "";
2390 printf "%2d: %s%s%s\n", $col, $quo, $field[$col], $quo;
2391 }
2392 }
2393 else {
2394 print STDERR "parse () failed on argument: ",
2395 $csv->error_input, "\n";
2396 $csv->error_diag ();
2397 }
2398
2399 Parsing CSV from memory
2400
2401 Given a complete CSV data-set in scalar $data, generate a list of
2402 lists to represent the rows and fields
2403
2404 # The data
2405 my $data = join "\r\n" => map { join "," => 0 .. 5 } 0 .. 5;
2406
2407 # in a loop
2408 my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1 });
2409 open my $fh, "<", \$data;
2410 my @foo;
2411 while (my $row = $csv->getline ($fh)) {
2412 push @foo, $row;
2413 }
2414 close $fh;
2415
2416 # a single call
2417 my $foo = csv (in => \$data);
2418
2419 Printing CSV data
2420 The fast way: using "print"
2421
2422 An example for creating "CSV" files using the "print" method:
2423
2424 my $csv = Text::CSV_XS->new ({ binary => 1, eol => $/ });
2425 open my $fh, ">", "foo.csv" or die "foo.csv: $!";
2426 for (1 .. 10) {
2427 $csv->print ($fh, [ $_, "$_" ]) or $csv->error_diag;
2428 }
close $fh or die "foo.csv: $!";
2430
2431 The slow way: using "combine" and "string"
2432
2433 or using the slower "combine" and "string" methods:
2434
2435 my $csv = Text::CSV_XS->new;
2436
2437 open my $csv_fh, ">", "hello.csv" or die "hello.csv: $!";
2438
2439 my @sample_input_fields = (
2440 'You said, "Hello!"', 5.67,
2441 '"Surely"', '', '3.14159');
2442 if ($csv->combine (@sample_input_fields)) {
2443 print $csv_fh $csv->string, "\n";
2444 }
2445 else {
2446 print "combine () failed on argument: ",
2447 $csv->error_input, "\n";
2448 }
2449 close $csv_fh or die "hello.csv: $!";
2450
2451 Generating CSV into memory
2452
2453 Format a data-set (@foo) into a scalar value in memory ($data):
2454
2455 # The data
2456 my @foo = map { [ 0 .. 5 ] } 0 .. 3;
2457
2458 # in a loop
2459 my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1, eol => "\r\n" });
2460 open my $fh, ">", \my $data;
2461 $csv->print ($fh, $_) for @foo;
2462 close $fh;
2463
2464 # a single call
2465 csv (in => \@foo, out => \my $data);
2466
2467 Rewriting CSV
2468 Rewrite "CSV" files with ";" as separator character to well-formed
2469 "CSV":
2470
2471 use Text::CSV_XS qw( csv );
2472 csv (in => csv (in => "bad.csv", sep_char => ";"), out => *STDOUT);
2473
2474 As "STDOUT" is now default in "csv", a one-liner converting a UTF-16
2475 CSV file with BOM and TAB-separation to valid UTF-8 CSV could be:
2476
2477 $ perl -C3 -MText::CSV_XS=csv -we\
2478 'csv(in=>"utf16tab.csv",encoding=>"utf16",sep=>"\t")' >utf8.csv
2479
2480 Dumping database tables to CSV
Dumping a database table can be as simple as this (TIMTOWTDI):
2482
2483 my $dbh = DBI->connect (...);
2484 my $sql = "select * from foo";
2485
2486 # using your own loop
2487 open my $fh, ">", "foo.csv" or die "foo.csv: $!\n";
2488 my $csv = Text::CSV_XS->new ({ binary => 1, eol => "\r\n" });
2489 my $sth = $dbh->prepare ($sql); $sth->execute;
2490 $csv->print ($fh, $sth->{NAME_lc});
2491 while (my $row = $sth->fetch) {
2492 $csv->print ($fh, $row);
2493 }
2494
2495 # using the csv function, all in memory
2496 csv (out => "foo.csv", in => $dbh->selectall_arrayref ($sql));
2497
2498 # using the csv function, streaming with callbacks
2499 my $sth = $dbh->prepare ($sql); $sth->execute;
2500 csv (out => "foo.csv", in => sub { $sth->fetch });
2501 csv (out => "foo.csv", in => sub { $sth->fetchrow_hashref });
2502
2503 Note that this does not discriminate between "empty" values and NULL-
2504 values from the database, as both will be the same empty field in CSV.
2505 To enable distinction between the two, use "quote_empty".
2506
2507 csv (out => "foo.csv", in => sub { $sth->fetch }, quote_empty => 1);
2508
2509 If the database import utility supports special sequences to insert
2510 "NULL" values into the database, like MySQL/MariaDB supports "\N",
2511 use a filter or a map
2512
2513 csv (out => "foo.csv", in => sub { $sth->fetch },
2514 on_in => sub { $_ //= "\\N" for @{$_[1]} });
2515
2516 while (my $row = $sth->fetch) {
2517 $csv->print ($fh, [ map { $_ // "\\N" } @$row ]);
2518 }
2519
2520 Note that this will not work as expected when choosing the backslash
2521 ("\") as "escape_char", as that will cause the "\" to need to be
2522 escaped by yet another "\", which will cause the field to need
2523 quotation and thus ending up as "\\N" instead of "\N". See also
2524 "undef_str".
2525
2526 csv (out => "foo.csv", in => sub { $sth->fetch }, undef_str => "\\N");
2527
2528 These special sequences are not recognized by Text::CSV_XS on parsing
2529 the CSV generated like this, but map and filter are your friends again
2530
2531 while (my $row = $csv->getline ($fh)) {
2532 $sth->execute (map { $_ eq "\\N" ? undef : $_ } @$row);
2533 }
2534
2535 csv (in => "foo.csv", filter => { 1 => sub {
2536 $sth->execute (map { $_ eq "\\N" ? undef : $_ } @{$_[1]}); 0; }});
2537
2538 The examples folder
For more extended examples, see the examples/ sub-directory [1] in
the original distribution or the git repository [2].
2541
[1] https://github.com/Tux/Text-CSV_XS/tree/master/examples
[2] https://github.com/Tux/Text-CSV_XS
2544
2545 The following files can be found there:
2546
2547 parser-xs.pl
This can be used as a boilerplate to parse invalid "CSV" and to parse
beyond (expected) errors, as an alternative to using the "error"
callback.
2550
2551 $ perl examples/parser-xs.pl bad.csv >good.csv
2552
2553 csv-check
2554 This is a command-line tool that uses parser-xs.pl techniques to
2555 check the "CSV" file and report on its content.
2556
2557 $ csv-check files/utf8.csv
2558 Checked files/utf8.csv with csv-check 1.9
2559 using Text::CSV_XS 1.32 with perl 5.26.0 and Unicode 9.0.0
2560 OK: rows: 1, columns: 2
2561 sep = <,>, quo = <">, bin = <1>, eol = <"\n">
2562
2563 csv2xls
2564 A script to convert "CSV" to Microsoft Excel ("XLS"). This requires
2565 extra modules Date::Calc and Spreadsheet::WriteExcel. The converter
2566 accepts various options and can produce UTF-8 compliant Excel files.
2567
2568 csv2xlsx
2569 A script to convert "CSV" to Microsoft Excel ("XLSX"). This requires
the modules Date::Calc and Excel::Writer::XLSX. The converter
2571 does accept various options including merging several "CSV" files
2572 into a single Excel file.
2573
2574 csvdiff
2575 A script that provides colorized diff on sorted CSV files, assuming
2576 first line is header and first field is the key. Output options
2577 include colorized ANSI escape codes or HTML.
2578
2579 $ csvdiff --html --output=diff.html file1.csv file2.csv
2580
2581 rewrite.pl
A script to rewrite (in)valid CSV into valid CSV files. The script
has options to generate confusing CSV files or CSV files that conform
to Dutch MS-Excel exports (using ";" as separator).

By default, the script honors a BOM and auto-detects the separator,
converting the input to standard CSV with "," as separator.
2588
2590 Text::CSV_XS is not designed to detect the characters used to quote
2591 and separate fields. The parsing is done using predefined (default)
2592 settings. In the examples sub-directory, you can find scripts that
2593 demonstrate how you could try to detect these characters yourself.
2594
2595 Microsoft Excel
2596 The import/export from Microsoft Excel is a risky task, according to
2597 the documentation in "Text::CSV::Separator". Microsoft uses the
2598 system's list separator defined in the regional settings, which happens
2599 to be a semicolon for Dutch, German and Spanish (and probably some
2600 others as well). For the English locale, the default is a comma.
2601 In Windows however, the user is free to choose a predefined locale,
2602 and then change every individual setting in it, so checking the
2603 locale is no solution.
2604
2605 As of version 1.17, a lone first line with just
2606
2607 sep=;
2608
2609 will be recognized and honored when parsing with "getline".
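
As a sketch (using a hypothetical "dutch.csv" that starts with such a
line), no explicit "sep_char" is needed:

    my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1 });
    open my $fh, "<", "dutch.csv" or die "dutch.csv: $!";
    while (my $row = $csv->getline ($fh)) {
        # records are split on ";"; the "sep=;" line itself is
        # consumed and not returned as data
    }
    close $fh;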
2610
2612 More Errors & Warnings
2613 New extensions ought to be clear and concise in reporting what
2614 error has occurred where and why, and maybe also offer a remedy to
2615 the problem.
2616
2617 "error_diag" is a (very) good start, but there is more work to be
2618 done in this area.
2619
2620 Basic calls should croak or warn on illegal parameters. Errors
2621 should be documented.
2622
2623 setting meta info
2624 Future extensions might include extending the "meta_info",
2625 "is_quoted", and "is_binary" to accept setting these flags for
2626 fields, so you can specify which fields are quoted in the
2627 "combine"/"string" combination.
2628
2629 $csv->meta_info (0, 1, 1, 3, 0, 0);
2630 $csv->is_quoted (3, 1);
2631
2632 Metadata Vocabulary for Tabular Data
2633 <http://w3c.github.io/csvw/metadata/> (a W3C editor's draft) could be
2634 an example for supporting more metadata.
2635
2636 Parse the whole file at once
2637 Implement new methods or functions that enable parsing of a
2638 complete file at once, returning a list of hashes. Possible extension
2639 to this could be to enable a column selection on the call:
2640
2641 my @AoH = $csv->parse_file ($filename, { cols => [ 1, 4..8, 12 ]});
2642
2643 returning something like
2644
2645 [ { fields => [ 1, 2, "foo", 4.5, undef, "", 8 ],
2646 flags => [ ... ],
2647 },
2648 { fields => [ ... ],
2649 .
2650 },
2651 ]
2652
2653 Note that the "csv" function already supports most of this, but does
2654 not return flags. "getline_all" returns all rows for an open stream,
2655 but this will not return flags either. "fragment" can reduce the
2656 required rows or columns, but cannot combine them.
2657
2658 Cookbook
2659 Write a document that has recipes for most known non-standard (and
2660 maybe some standard) "CSV" formats, including formats that use
2661 "TAB", ";", "|", or other non-comma separators.
2662
2663 Examples could be taken from W3C's CSV on the Web: Use Cases and
2664 Requirements <http://w3c.github.io/csvw/use-cases-and-
2665 requirements/index.html>
2666
2667 Steal
2668 Steal good new ideas and features from PapaParse
2669 <http://papaparse.com> or csvkit <http://csvkit.readthedocs.org>.
2670
2671 Perl6 support
2672 I'm already working on perl6 support here
2673 <https://github.com/Tux/CSV>. No promises yet on when it is finished
(or fast). Trying to keep the API as similar as possible.
2675
2676 NOT TODO
2677 combined methods
2678 Requests for adding means (methods) that combine "combine" and
2679 "string" in a single call will not be honored (use "print" instead).
2680 Likewise for "parse" and "fields" (use "getline" instead), given the
2681 problems with embedded newlines.
2682
2683 Release plan
2684 No guarantees, but this is what I had in mind some time ago:
2685
2686 · DIAGNOSTICS section in pod to *describe* the errors (see below)
2687
Everything should now work on native EBCDIC systems. As the test does
not cover all possible codepoints and Encode does not support
"utf-ebcdic", there is no guarantee that all Unicode handling is done
correctly.
2693
2694 Opening "EBCDIC" encoded files on "ASCII"+ systems is likely to
2695 succeed using Encode's "cp37", "cp1047", or "posix-bc":
2696
2697 open my $fh, "<:encoding(cp1047)", "ebcdic_file.csv" or die "...";
2698
2700 Still under construction ...
2701
2702 If an error occurs, "$csv->error_diag" can be used to get information
2703 on the cause of the failure. Note that for speed reasons the internal
2704 value is never cleared on success, so using the value returned by
2705 "error_diag" in normal cases - when no error occurred - may cause
2706 unexpected results.
2707
2708 If the constructor failed, the cause can be found using "error_diag" as
2709 a class method, like "Text::CSV_XS->error_diag".
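
In list context, "error_diag" returns the individual values, which
allows fine-grained reporting (a short sketch):

    my ($code, $str, $pos, $rec, $fld) = $csv->error_diag;
    $code and warn "Error $code ($str) in record $rec, ",
                   "field $fld, at position $pos\n";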
2710
2711 The "$csv->error_diag" method is automatically invoked upon error when
the constructor was called with "auto_diag" set to 1 or 2, or when
2713 autodie is in effect. When set to 1, this will cause a "warn" with the
2714 error message, when set to 2, it will "die". "2012 - EOF" is excluded
2715 from "auto_diag" reports.
2716
2717 Errors can be (individually) caught using the "error" callback.
2718
2719 The errors as described below are available. I have tried to make the
2720 error itself explanatory enough, but more descriptions will be added.
2721 For most of these errors, the first three capitals describe the error
2722 category:
2723
2724 · INI
2725
2726 Initialization error or option conflict.
2727
2728 · ECR
2729
2730 Carriage-Return related parse error.
2731
2732 · EOF
2733
2734 End-Of-File related parse error.
2735
2736 · EIQ
2737
2738 Parse error inside quotation.
2739
2740 · EIF
2741
2742 Parse error inside field.
2743
2744 · ECB
2745
2746 Combine error.
2747
2748 · EHR
2749
2750 HashRef parse related error.
2751
2752 And below should be the complete list of error codes that can be
2753 returned:
2754
2755 · 1001 "INI - sep_char is equal to quote_char or escape_char"
2756
2757 The separation character cannot be equal to the quotation
2758 character or to the escape character, as this would invalidate all
2759 parsing rules.
2760
2761 · 1002 "INI - allow_whitespace with escape_char or quote_char SP or
2762 TAB"
2763
2764 Using the "allow_whitespace" attribute when either "quote_char" or
2765 "escape_char" is equal to "SPACE" or "TAB" is too ambiguous to
2766 allow.
2767
2768 · 1003 "INI - \r or \n in main attr not allowed"
2769
2770 Using default "eol" characters in either "sep_char", "quote_char",
2771 or "escape_char" is not allowed.
2772
2773 · 1004 "INI - callbacks should be undef or a hashref"
2774
2775 The "callbacks" attribute only allows one to be "undef" or a hash
2776 reference.
2777
2778 · 1005 "INI - EOL too long"
2779
The value passed for EOL exceeds its maximum length (16).
2781
2782 · 1006 "INI - SEP too long"
2783
The value passed for SEP exceeds its maximum length (16).
2785
2786 · 1007 "INI - QUOTE too long"
2787
The value passed for QUOTE exceeds its maximum length (16).
2789
2790 · 1008 "INI - SEP undefined"
2791
2792 The value passed for SEP should be defined and not empty.
2793
2794 · 1010 "INI - the header is empty"
2795
2796 The header line parsed in the "header" is empty.
2797
2798 · 1011 "INI - the header contains more than one valid separator"
2799
2800 The header line parsed in the "header" contains more than one
2801 (unique) separator character out of the allowed set of separators.
2802
2803 · 1012 "INI - the header contains an empty field"
2804
2805 The header line parsed in the "header" contains an empty field.
2806
· 1013 "INI - the header contains non-unique fields"
2808
2809 The header line parsed in the "header" contains at least two
2810 identical fields.
2811
2812 · 1014 "INI - header called on undefined stream"
2813
2814 The header line cannot be parsed from an undefined source.
2815
2816 · 1500 "PRM - Invalid/unsupported argument(s)"
2817
2818 Function or method called with invalid argument(s) or parameter(s).
2819
2820 · 1501 "PRM - The key attribute is passed as an unsupported type"
2821
2822 The "key" attribute is of an unsupported type.
2823
2824 · 1502 "PRM - The value attribute is passed without the key attribute"
2825
2826 The "value" attribute is only allowed when a valid key is given.
2827
2828 · 1503 "PRM - The value attribute is passed as an unsupported type"
2829
2830 The "value" attribute is of an unsupported type.
2831
2832 · 2010 "ECR - QUO char inside quotes followed by CR not part of EOL"
2833
2834 When "eol" has been set to anything but the default, e.g.
2835 "\r\t\n", and the "\r" follows the second (closing)
2836 "quote_char", but the characters following the "\r" do not
2837 make up the "eol" sequence, this is an error.
2838
2839 · 2011 "ECR - Characters after end of quoted field"
2840
2841 Sequences like "1,foo,"bar"baz,22,1" are not allowed. "bar" is a
2842 quoted field and after the closing double-quote, there should be
2843 either a new-line sequence or a separation character.
2844
2845 · 2012 "EOF - End of data in parsing input stream"
2846
2847 Self-explanatory: end-of-file while parsing an input stream. This
2848 can happen only when reading from streams with "getline", as
2849 "parse" is done on strings that are not required to have a trailing
2850 "eol".
2851
2852 · 2013 "INI - Specification error for fragments RFC7111"
2853
2854 Invalid URI "fragment" specification. See RFC 7111.
2855
2856 · 2014 "ENF - Inconsistent number of fields"
2857
2858 Inconsistent number of fields under strict parsing.
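
A short illustration of strict parsing: the second record has a
different number of fields than the previous one, so "getline" fails
with error 2014:

```perl
use strict;
use warnings;
use Text::CSV_XS;

my $csv = Text::CSV_XS->new ({ strict => 1 });
open my $fh, "<", \"a,b,c\na,b\n" or die $!;
my $r1 = $csv->getline ($fh);   # 3 fields: OK
my $r2 = $csv->getline ($fh);   # undef: field count differs
my ($code) = $csv->error_diag;  # 2014
```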
2859
2860 · 2021 "EIQ - NL char inside quotes, binary off"
2861
2862 Sequences like "1,"foo\nbar",22,1" are allowed only when the binary
2863 option has been selected with the constructor.
2864
2865 · 2022 "EIQ - CR char inside quotes, binary off"
2866
2867 Sequences like "1,"foo\rbar",22,1" are allowed only when the binary
2868 option has been selected with the constructor.
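
A small sketch contrasting the default with "binary => 1" for an
embedded newline inside a quoted field:

```perl
use strict;
use warnings;
use Text::CSV_XS;

# Default (binary => 0): an embedded newline inside quotes fails
# with error 2021:
my $ascii = Text::CSV_XS->new;
$ascii->parse (qq{1,"foo\nbar",22}) and die "should not parse";

# With binary => 1 the same record is accepted:
my $bin = Text::CSV_XS->new ({ binary => 1 });
$bin->parse (qq{1,"foo\nbar",22}) or die "parse failed";
my @fields = $bin->fields;   # ("1", "foo\nbar", "22")
```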
2869
2870 · 2023 "EIQ - QUO character not allowed"
2871
2872 Sequences like ""foo "bar" baz",qu" and "2023,",2008-04-05,"Foo,
2873 Bar",\n" will cause this error.
2874
2875 · 2024 "EIQ - EOF cannot be escaped, not even inside quotes"
2876
2877 The escape character is not allowed as last character in an input
2878 stream.
2879
2880 · 2025 "EIQ - Loose unescaped escape"
2881
2882 An escape character should escape only characters that need escaping.
2883
2884 Allowing the escape for other characters is possible with the
2885 attribute "allow_loose_escapes".
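
A sketch of both behaviors, assuming a backslash "escape_char":

```perl
use strict;
use warnings;
use Text::CSV_XS;

# "\o" escapes a character that needs no escaping, which is
# rejected by default (error 2025):
my $csv = Text::CSV_XS->new ({ escape_char => "\\" });
$csv->parse (qq{1,"f\\oo",22}) and die "should not parse";

# allow_loose_escapes accepts such sequences:
my $loose = Text::CSV_XS->new ({
    escape_char => "\\", allow_loose_escapes => 1 });
$loose->parse (qq{1,"f\\oo",22}) or die "parse failed";
```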
2886
2887 · 2026 "EIQ - Binary character inside quoted field, binary off"
2888
2889 Binary characters are not allowed by default. The exception is
2890 fields that contain valid UTF-8: those will automatically be
2891 upgraded to UTF-8. Set "binary" to 1 to accept
2892 binary data.
2893
2894 · 2027 "EIQ - Quoted field not terminated"
2895
2896 When parsing a field that started with a quotation character, the
2897 field is expected to be closed with a quotation character. When the
2898 parsed line is exhausted before the quote is found, that field is not
2899 terminated.
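
For instance, the quote opened before "foo" here is never closed:

```perl
use strict;
use warnings;
use Text::CSV_XS;

my $csv = Text::CSV_XS->new;
$csv->parse (q{1,"foo}) and die "should not parse";
my ($code) = $csv->error_diag;   # 2027
```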
2900
2901 · 2030 "EIF - NL char inside unquoted verbatim, binary off"
2902
2903 · 2031 "EIF - CR char is first char of field, not part of EOL"
2904
2905 · 2032 "EIF - CR char inside unquoted, not part of EOL"
2906
2907 · 2034 "EIF - Loose unescaped quote"
2908
2909 · 2035 "EIF - Escaped EOF in unquoted field"
2910
2911 · 2036 "EIF - ESC error"
2912
2913 · 2037 "EIF - Binary character in unquoted field, binary off"
2914
2915 · 2110 "ECB - Binary character in Combine, binary off"
2916
2917 · 2200 "EIO - print to IO failed. See errno"
2918
2919 · 3001 "EHR - Unsupported syntax for column_names ()"
2920
2921 · 3002 "EHR - getline_hr () called before column_names ()"
2922
2923 · 3003 "EHR - bind_columns () and column_names () fields count
2924 mismatch"
2925
2926 · 3004 "EHR - bind_columns () only accepts refs to scalars"
2927
2928 · 3006 "EHR - bind_columns () did not pass enough refs for parsed
2929 fields"
2930
2931 · 3007 "EHR - bind_columns needs refs to writable scalars"
2932
2933 · 3008 "EHR - unexpected error in bound fields"
2934
2935 · 3009 "EHR - print_hr () called before column_names ()"
2936
2937 · 3010 "EHR - print_hr () called with invalid arguments"
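
A minimal sketch of the call order expected by the EHR diagnostics
above: "column_names" must be set before "getline_hr" can map fields
to keys (error 3002 otherwise):

```perl
use strict;
use warnings;
use Text::CSV_XS;

my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1 });
open my $fh, "<", \"name,age\nAlice,42\n" or die $!;
$csv->column_names (@{$csv->getline ($fh)});   # header row as keys
my $row = $csv->getline_hr ($fh);   # { name => "Alice", age => "42" }
```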
2938
2939SEE ALSO
2940 IO::File, IO::Handle, IO::Wrap, Text::CSV, Text::CSV_PP,
2941 Text::CSV::Encoded, Text::CSV::Separator, Text::CSV::Slurp,
2942 Spreadsheet::CSV and Spreadsheet::Read, and of course perl.
2943
2944 If you are using Perl 6 (now known as Raku), have a look at
2945 "Text::CSV" in its ecosystem, which offers the same features.
2946
2947 non-perl
2948
2949 A CSV parser in JavaScript, also used by W3C <http://www.w3.org>, is
2950 the multi-threaded in-browser PapaParse <http://papaparse.com/>.
2951
2952 csvkit <http://csvkit.readthedocs.org> is a Python CSV parsing toolkit.
2953
2954AUTHORS and MAINTAINERS
2955 Alan Citterman <alan@mfgrtl.com> wrote the original Perl module.
2956 Please don't send mail concerning Text::CSV_XS to Alan, who is not
2957 involved in the C/XS part that is now the main part of the module.
2958
2959 Jochen Wiedmann <joe@ispsoft.de> rewrote the en- and decoding in C by
2960 implementing a simple finite-state machine. He added variable quote,
2961 escape and separator characters, the binary mode and the print and
2962 getline methods. See ChangeLog releases 0.10 through 0.23.
2963
2964 H.Merijn Brand <h.m.brand@xs4all.nl> cleaned up the code, added the
2965 field flags methods, wrote the major part of the test suite, completed
2966 the documentation, fixed most RT bugs, added all the allow flags and
2967 the "csv" function. See ChangeLog releases 0.25 and on.
2968
2969COPYRIGHT AND LICENSE
2970 Copyright (C) 2007-2020 H.Merijn Brand. All rights reserved.
2971 Copyright (C) 1998-2001 Jochen Wiedmann. All rights reserved.
2972 Copyright (C) 1997 Alan Citterman. All rights reserved.
2973
2974 This library is free software; you can redistribute it and/or
2975 modify it under the same terms as Perl itself.
2976
2977
2978
2979perl v5.32.0 2020-07-28 CSV_XS(3)