Text::CSV(3)          User Contributed Perl Documentation          Text::CSV(3)

6 Text::CSV - comma-separated values manipulator (using XS or PurePerl)
7
9 use Text::CSV;
10
11 my @rows;
12 my $csv = Text::CSV->new ( { binary => 1 } ) # should set binary attribute.
13 or die "Cannot use CSV: ".Text::CSV->error_diag ();
14
15 open my $fh, "<:encoding(utf8)", "test.csv" or die "test.csv: $!";
16 while ( my $row = $csv->getline( $fh ) ) {
17 $row->[2] =~ m/pattern/ or next; # 3rd field should match
18 push @rows, $row;
19 }
20 $csv->eof or $csv->error_diag();
21 close $fh;
22
23 $csv->eol ("\r\n");
24
25 open $fh, ">:encoding(utf8)", "new.csv" or die "new.csv: $!";
26 $csv->print ($fh, $_) for @rows;
27 close $fh or die "new.csv: $!";
28
29 #
30 # parse and combine style
31 #
32
33 $status = $csv->combine(@columns); # combine columns into a string
34 $line = $csv->string(); # get the combined string
35
36 $status = $csv->parse($line); # parse a CSV string into fields
37 @columns = $csv->fields(); # get the parsed fields
38
39 $status = $csv->status (); # get the most recent status
40 $bad_argument = $csv->error_input (); # get the most recent bad argument
41 $diag = $csv->error_diag (); # if an error occurred, explains WHY
42
43 $status = $csv->print ($io, $colref); # Write an array of fields
44 # immediately to a file $io
45 $colref = $csv->getline ($io); # Read a line from file $io,
46 # parse it and return an array
47 # ref of fields
48 $csv->column_names (@names); # Set column names for getline_hr ()
49 $ref = $csv->getline_hr ($io); # getline (), but returns a hashref
50 $eof = $csv->eof (); # Indicate if last parse or
51 # getline () hit End Of File
52
53 $csv->types(\@t_array); # Set column types
54
Text::CSV is now a thin wrapper for Text::CSV_XS-compatible modules.
57 All the backend modules provide facilities for the composition and
58 decomposition of comma-separated values. Text::CSV uses Text::CSV_XS by
59 default, and when Text::CSV_XS is not available, falls back on
60 Text::CSV_PP, which is bundled in the same distribution as this module.
61
This module respects an environment variable called "PERL_TEXT_CSV"
when deciding which backend module to use. If this environment variable
is not set, it tries to load Text::CSV_XS, and if Text::CSV_XS is not
available, falls back on Text::CSV_PP.

If you never want it to fall back on Text::CSV_PP, set the
variable like this ("export" may be "setenv", "set" and the like,
depending on your environment):
71
72 > export PERL_TEXT_CSV=Text::CSV_XS
73
If you prefer Text::CSV_XS but accept Text::CSV_PP as a fallback (the default), then:
75
76 > export PERL_TEXT_CSV=Text::CSV_XS,Text::CSV_PP
77
78 You may also want to set this variable at the top of your test files,
79 in order not to be bothered with incompatibilities between backends
80 (you need to wrap this in "BEGIN", and set before actually "use"-ing
81 Text::CSV module, as it decides its backend as soon as it's loaded):
82
83 BEGIN { $ENV{PERL_TEXT_CSV}='Text::CSV_PP'; }
84 use Text::CSV;
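
To confirm which backend was actually loaded, a quick check along these
lines may help. It is a minimal sketch that assumes the "backend" class
method of Text::CSV (not described in this section) is available in the
installed version.

```perl
# Force the pure-Perl backend before Text::CSV is loaded.
BEGIN { $ENV{PERL_TEXT_CSV} = 'Text::CSV_PP'; }
use strict;
use warnings;
use Text::CSV;

# backend () is assumed to return the name of the module in use.
my $backend = Text::CSV->backend;
print "Text::CSV is using $backend\n";
```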
85
87 This section is taken from Text::CSV_XS.
88
89 Embedded newlines
90 Important Note: The default behavior is to accept only ASCII
91 characters in the range from 0x20 (space) to 0x7E (tilde). This means
92 that the fields can not contain newlines. If your data contains
93 newlines embedded in fields, or characters above 0x7E (tilde), or
94 binary data, you must set "binary => 1" in the call to "new". To cover
95 the widest range of parsing options, you will always want to set
96 binary.
97
But you still have the problem that you have to pass a correct line to
the "parse" method, which is more complicated than it looks in typical
usage:
101
102 my $csv = Text::CSV->new ({ binary => 1, eol => $/ });
103 while (<>) { # WRONG!
104 $csv->parse ($_);
105 my @fields = $csv->fields ();
106 }
107
This will break, as the "while" might read broken lines: it does not
care about the quoting. If you need to support embedded newlines, the
way to go is to not pass "eol" in the parser (it accepts "\n", "\r",
and "\r\n" by default) and then use "getline":
112
113 my $csv = Text::CSV->new ({ binary => 1 });
114 open my $fh, "<", $file or die "$file: $!";
115 while (my $row = $csv->getline ($fh)) {
116 my @fields = @$row;
117 }
118
119 The old(er) way of using global file handles is still supported
120
121 while (my $row = $csv->getline (*ARGV)) { ... }
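
The "getline" approach can be tried without touching the file system.
The following is a small sketch, not taken from the original
documentation, that reads from an in-memory file handle; the sample
data and variable names are made up for illustration.

```perl
use strict;
use warnings;
use Text::CSV;

# Hypothetical sample: the second record has a newline inside a quoted field.
my $data = qq{id,comment\n1,"line one\nline two"\n2,plain\n};
open my $fh, "<", \$data or die "in-memory handle: $!";

# No eol is passed, so the parser accepts \n, \r, and \r\n.
my $csv = Text::CSV->new ({ binary => 1, auto_diag => 1 });
my @rows;
while (my $row = $csv->getline ($fh)) {
    push @rows, $row;
    }
close $fh;

printf "%d records read\n", scalar @rows;
```

Reading the same data with "while (<$fh>)" would see four lines instead
of three records, which is exactly the breakage described above.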
122
123 Unicode
124 Unicode is only tested to work with perl-5.8.2 and up.
125
126 See also "BOM".
127
128 The simplest way to ensure the correct encoding is used for in- and
129 output is by either setting layers on the filehandles, or setting the
130 "encoding" argument for "csv".
131
132 open my $fh, "<:encoding(UTF-8)", "in.csv" or die "in.csv: $!";
133 or
134 my $aoa = csv (in => "in.csv", encoding => "UTF-8");
135
136 open my $fh, ">:encoding(UTF-8)", "out.csv" or die "out.csv: $!";
137 or
138 csv (in => $aoa, out => "out.csv", encoding => "UTF-8");
139
On parsing (both for "getline" and "parse"), if the source is marked
as UTF8, then all fields that are marked binary will also be marked
UTF8.

On combining ("print" and "combine"): if any of the combining fields
was marked UTF8, the resulting string will be marked as UTF8. Note
however that any field before the first field marked UTF8 that
contains 8-bit characters not upgraded to UTF8 will remain "bytes"
in the resulting string too, possibly causing unexpected errors. If
you pass data of different encodings, or you don't know whether the
encodings differ, force the data to be upgraded before you pass it
on:
152
153 $csv->print ($fh, [ map { utf8::upgrade (my $x = $_); $x } @data ]);
154
155 For complete control over encoding, please use Text::CSV::Encoded:
156
157 use Text::CSV::Encoded;
158 my $csv = Text::CSV::Encoded->new ({
159 encoding_in => "iso-8859-1", # the encoding comes into Perl
160 encoding_out => "cp1252", # the encoding comes out of Perl
161 });
162
163 $csv = Text::CSV::Encoded->new ({ encoding => "utf8" });
164 # combine () and print () accept *literally* utf8 encoded data
165 # parse () and getline () return *literally* utf8 encoded data
166
167 $csv = Text::CSV::Encoded->new ({ encoding => undef }); # default
168 # combine () and print () accept UTF8 marked data
169 # parse () and getline () return UTF8 marked data
170
171 BOM
172 BOM (or Byte Order Mark) handling is available only inside the
173 "header" method. This method supports the following encodings:
174 "utf-8", "utf-1", "utf-32be", "utf-32le", "utf-16be", "utf-16le",
175 "utf-ebcdic", "scsu", "bocu-1", and "gb-18030". See Wikipedia
176 <https://en.wikipedia.org/wiki/Byte_order_mark>.
177
178 If a file has a BOM, the easiest way to deal with that is
179
180 my $aoh = csv (in => $file, detect_bom => 1);
181
182 All records will be encoded based on the detected BOM.
183
184 This implies a call to the "header" method, which defaults to also
185 set the "column_names". So this is not the same as
186
187 my $aoh = csv (in => $file, headers => "auto");
188
which only reads the first record to set "column_names" but ignores
any BOM that may be present.
191
193 This section is also taken from Text::CSV_XS.
194
195 version
196 (Class method) Returns the current module version.
197
198 new
199 (Class method) Returns a new instance of class Text::CSV. The
200 attributes are described by the (optional) hash ref "\%attr".
201
202 my $csv = Text::CSV->new ({ attributes ... });
203
204 The following attributes are available:
205
206 eol
207
208 my $csv = Text::CSV->new ({ eol => $/ });
209 $csv->eol (undef);
210 my $eol = $csv->eol;
211
212 The end-of-line string to add to rows for "print" or the record
213 separator for "getline".
214
215 When not passed in a parser instance, the default behavior is to
216 accept "\n", "\r", and "\r\n", so it is probably safer to not specify
"eol" at all. Passing "undef" or the empty string behaves the same.
218
219 When not passed in a generating instance, records are not terminated
220 at all, so it is probably wise to pass something you expect. A safe
221 choice for "eol" on output is either $/ or "\r\n".
222
223 Common values for "eol" are "\012" ("\n" or Line Feed), "\015\012"
224 ("\r\n" or Carriage Return, Line Feed), and "\015" ("\r" or Carriage
225 Return). The "eol" attribute cannot exceed 7 (ASCII) characters.
226
If both $/ and "eol" equal "\015", lines that end on only a Carriage
Return without Line Feed will be parsed correctly.
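
The effect of "eol" on generated records can be seen with "combine"
and "string". This is an illustrative sketch; the variable names are
not part of the module.

```perl
use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new ({ binary => 1 });

# Without eol, the generated record is not terminated at all.
$csv->combine ("a", "b") or die "combine failed";
my $bare = $csv->string;

# With eol set, every generated record ends in that string.
$csv->eol ("\r\n");
$csv->combine ("a", "b") or die "combine failed";
my $terminated = $csv->string;

printf "without eol: %d bytes, with eol: %d bytes\n",
    length $bare, length $terminated;
```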
229
230 sep_char
231
232 my $csv = Text::CSV->new ({ sep_char => ";" });
233 $csv->sep_char (";");
234 my $c = $csv->sep_char;
235
The char used to separate fields, by default a comma (","). Limited
237 to a single-byte character, usually in the range from 0x20 (space) to
238 0x7E (tilde). When longer sequences are required, use "sep".
239
240 The separation character can not be equal to the quote character or to
241 the escape character.
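
As a short illustration (the input line is made up), a
semicolon-separated record can be parsed like this:

```perl
use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new ({ sep_char => ";" });

# The separator inside the quoted field is data, not a field boundary.
$csv->parse (q{1;"a;b";3}) or die "parse failed: " . $csv->error_diag;
my @fields = $csv->fields;

print join (" | ", @fields), "\n";
```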
242
243 sep
244
245 my $csv = Text::CSV->new ({ sep => "\N{FULLWIDTH COMMA}" });
246 $csv->sep (";");
247 my $sep = $csv->sep;
248
249 The chars used to separate fields, by default undefined. Limited to 8
250 bytes.
251
252 When set, overrules "sep_char". If its length is one byte it acts as
253 an alias to "sep_char".
254
255 quote_char
256
257 my $csv = Text::CSV->new ({ quote_char => "'" });
258 $csv->quote_char (undef);
259 my $c = $csv->quote_char;
260
261 The character to quote fields containing blanks or binary data, by
262 default the double quote character ("""). A value of undef suppresses
263 quote chars (for simple cases only). Limited to a single-byte
264 character, usually in the range from 0x20 (space) to 0x7E (tilde).
265 When longer sequences are required, use "quote".
266
267 "quote_char" can not be equal to "sep_char".
268
269 quote
270
271 my $csv = Text::CSV->new ({ quote => "\N{FULLWIDTH QUOTATION MARK}" });
272 $csv->quote ("'");
273 my $quote = $csv->quote;
274
275 The chars used to quote fields, by default undefined. Limited to 8
276 bytes.
277
278 When set, overrules "quote_char". If its length is one byte it acts as
279 an alias to "quote_char".
280
281 escape_char
282
283 my $csv = Text::CSV->new ({ escape_char => "\\" });
284 $csv->escape_char (":");
285 my $c = $csv->escape_char;
286
287 The character to escape certain characters inside quoted fields.
288 This is limited to a single-byte character, usually in the range
289 from 0x20 (space) to 0x7E (tilde).
290
291 The "escape_char" defaults to being the double-quote mark ("""). In
292 other words the same as the default "quote_char". This means that
293 doubling the quote mark in a field escapes it:
294
295 "foo","bar","Escape ""quote mark"" with two ""quote marks""","baz"
296
297 If you change the "quote_char" without changing the
298 "escape_char", the "escape_char" will still be the double-quote
299 ("""). If instead you want to escape the "quote_char" by doubling it
300 you will need to also change the "escape_char" to be the same as what
301 you have changed the "quote_char" to.
302
Setting "escape_char" to "undef" or "" will disable escaping completely
and is greatly discouraged. This will also disable "escape_null".
305
306 The escape character can not be equal to the separation character.
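
The difference between the default doubling and a separate escape
character can be sketched as follows. The field value is an arbitrary
example.

```perl
use strict;
use warnings;
use Text::CSV;

my $field = 'say "hi"';

# Default: escape_char equals quote_char, so embedded quotes are doubled.
my $doubling = Text::CSV->new ();
$doubling->combine ($field) or die "combine failed";
my $doubled = $doubling->string;

# With a backslash as escape_char, embedded quotes are backslash-escaped.
my $backslash = Text::CSV->new ({ escape_char => "\\" });
$backslash->combine ($field) or die "combine failed";
my $escaped = $backslash->string;

# Parsing the escaped form gives back the original field.
$backslash->parse ($escaped) or die "parse failed";
my ($round_trip) = $backslash->fields;

print "$doubled\n$escaped\n";
```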
307
308 binary
309
310 my $csv = Text::CSV->new ({ binary => 1 });
311 $csv->binary (0);
312 my $f = $csv->binary;
313
314 If this attribute is 1, you may use binary characters in quoted
315 fields, including line feeds, carriage returns and "NULL" bytes. (The
316 latter could be escaped as ""0".) By default this feature is off.
317
318 If a string is marked UTF8, "binary" will be turned on automatically
319 when binary characters other than "CR" and "NL" are encountered. Note
320 that a simple string like "\x{00a0}" might still be binary, but not
321 marked UTF8, so setting "{ binary => 1 }" is still a wise option.
322
323 strict
324
325 my $csv = Text::CSV->new ({ strict => 1 });
326 $csv->strict (0);
327 my $f = $csv->strict;
328
329 If this attribute is set to 1, any row that parses to a different
330 number of fields than the previous row will cause the parser to throw
331 error 2014.
332
333 formula_handling
334
335 formula
336
337 my $csv = Text::CSV->new ({ formula => "none" });
338 $csv->formula ("none");
339 my $f = $csv->formula;
340
341 This defines the behavior of fields containing formulas. As formulas
342 are considered dangerous in spreadsheets, this attribute can define an
343 optional action to be taken if a field starts with an equal sign ("=").
344
345 For purpose of code-readability, this can also be written as
346
347 my $csv = Text::CSV->new ({ formula_handling => "none" });
348 $csv->formula_handling ("none");
349 my $f = $csv->formula_handling;
350
351 Possible values for this attribute are
352
353 none
354 Take no specific action. This is the default.
355
356 $csv->formula ("none");
357
358 die
359 Cause the process to "die" whenever a leading "=" is encountered.
360
361 $csv->formula ("die");
362
363 croak
364 Cause the process to "croak" whenever a leading "=" is encountered.
365 (See Carp)
366
367 $csv->formula ("croak");
368
369 diag
370 Report position and content of the field whenever a leading "=" is
371 found. The value of the field is unchanged.
372
373 $csv->formula ("diag");
374
375 empty
376 Replace the content of fields that start with a "=" with the empty
377 string.
378
379 $csv->formula ("empty");
380 $csv->formula ("");
381
382 undef
383 Replace the content of fields that start with a "=" with "undef".
384
385 $csv->formula ("undef");
386 $csv->formula (undef);
387
All other values will give a warning and then fall back to "diag".
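
A brief sketch of the "empty" action while parsing, assuming a version
of Text::CSV that supports the "formula" attribute (the input line is
made up):

```perl
use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new ({ formula => "empty" });

# The second field starts with "=" and is therefore treated as a formula.
$csv->parse ("1,=2+3,4") or die "parse failed: " . $csv->error_diag;
my @fields = $csv->fields;

print join (" | ", map { "'$_'" } @fields), "\n";
```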
389
390 decode_utf8
391
392 my $csv = Text::CSV->new ({ decode_utf8 => 1 });
393 $csv->decode_utf8 (0);
394 my $f = $csv->decode_utf8;
395
This attribute defaults to TRUE.

While parsing, fields that are valid UTF-8 are automatically set to
be UTF-8, so that
400
401 $csv->parse ("\xC4\xA8\n");
402
403 results in
404
405 PV("\304\250"\0) [UTF8 "\x{128}"]
406
407 Sometimes it might not be a desired action. To prevent those upgrades,
408 set this attribute to false, and the result will be
409
410 PV("\304\250"\0)
411
412 auto_diag
413
414 my $csv = Text::CSV->new ({ auto_diag => 1 });
415 $csv->auto_diag (2);
416 my $l = $csv->auto_diag;
417
Setting this attribute to a number between 1 and 9 causes "error_diag"
to be automatically called in void context upon errors.
420
421 In case of error "2012 - EOF", this call will be void.
422
423 If "auto_diag" is set to a numeric value greater than 1, it will "die"
424 on errors instead of "warn". If set to anything unrecognized, it will
425 be silently ignored.
426
427 Future extensions to this feature will include more reliable auto-
428 detection of "autodie" being active in the scope of which the error
429 occurred which will increment the value of "auto_diag" with 1 the
430 moment the error is detected.
431
432 diag_verbose
433
434 my $csv = Text::CSV->new ({ diag_verbose => 1 });
435 $csv->diag_verbose (2);
436 my $l = $csv->diag_verbose;
437
438 Set the verbosity of the output triggered by "auto_diag". Currently
439 only adds the current input-record-number (if known) to the
440 diagnostic output with an indication of the position of the error.
441
442 blank_is_undef
443
444 my $csv = Text::CSV->new ({ blank_is_undef => 1 });
445 $csv->blank_is_undef (0);
446 my $f = $csv->blank_is_undef;
447
448 Under normal circumstances, "CSV" data makes no distinction between
449 quoted- and unquoted empty fields. These both end up in an empty
450 string field once read, thus
451
452 1,"",," ",2
453
454 is read as
455
456 ("1", "", "", " ", "2")
457
458 When writing "CSV" files with either "always_quote" or "quote_empty"
459 set, the unquoted empty field is the result of an undefined value.
460 To enable this distinction when reading "CSV" data, the
461 "blank_is_undef" attribute will cause unquoted empty fields to be set
462 to "undef", causing the above to be parsed as
463
464 ("1", "", undef, " ", "2")
465
466 note that this is specifically important when loading "CSV" fields
467 into a database that allows "NULL" values, as the perl equivalent for
468 "NULL" is "undef" in DBI land.
469
470 empty_is_undef
471
472 my $csv = Text::CSV->new ({ empty_is_undef => 1 });
473 $csv->empty_is_undef (0);
474 my $f = $csv->empty_is_undef;
475
476 Going one step further than "blank_is_undef", this attribute
477 converts all empty fields to "undef", so
478
479 1,"",," ",2
480
481 is read as
482
483 (1, undef, undef, " ", 2)
484
Note that this affects only fields that are originally empty, not
486 fields that are empty after stripping allowed whitespace. YMMV.
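
The two attributes can be compared side by side on the sample line
used above. This is an illustrative sketch; the helper name "show" is
hypothetical.

```perl
use strict;
use warnings;
use Text::CSV;

my $line = q{1,"",," ",2};

# Hypothetical helper: render undef visibly, quote everything else.
sub show { join " | ", map { defined $_ ? "'$_'" : "undef" } @_ }

my %result;
for my $attr (qw( blank_is_undef empty_is_undef )) {
    my $csv = Text::CSV->new ({ $attr => 1 });
    $csv->parse ($line) or die "parse failed: " . $csv->error_diag;
    $result{$attr} = show ($csv->fields);
    print "$attr: $result{$attr}\n";
    }
```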
487
488 allow_whitespace
489
490 my $csv = Text::CSV->new ({ allow_whitespace => 1 });
491 $csv->allow_whitespace (0);
492 my $f = $csv->allow_whitespace;
493
494 When this option is set to true, the whitespace ("TAB"'s and
495 "SPACE"'s) surrounding the separation character is removed when
496 parsing. If either "TAB" or "SPACE" is one of the three characters
497 "sep_char", "quote_char", or "escape_char" it will not be considered
498 whitespace.
499
500 Now lines like:
501
502 1 , "foo" , bar , 3 , zapp
503
504 are parsed as valid "CSV", even though it violates the "CSV" specs.
505
506 Note that all whitespace is stripped from both start and end of
507 each field. That would make it more than a feature to enable parsing
508 bad "CSV" lines, as
509
510 1, 2.0, 3, ape , monkey
511
512 will now be parsed as
513
514 ("1", "2.0", "3", "ape", "monkey")
515
516 even if the original line was perfectly acceptable "CSV".
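
The first sample line above can be checked with a few lines of code
(a sketch, using only the attribute described here):

```perl
use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new ({ allow_whitespace => 1 });

# Whitespace around the separators and quotes is stripped while parsing.
$csv->parse (q{1 , "foo" , bar , 3 , zapp})
    or die "parse failed: " . $csv->error_diag;
my @fields = $csv->fields;

print join ("|", @fields), "\n";
```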
517
518 allow_loose_quotes
519
520 my $csv = Text::CSV->new ({ allow_loose_quotes => 1 });
521 $csv->allow_loose_quotes (0);
522 my $f = $csv->allow_loose_quotes;
523
524 By default, parsing unquoted fields containing "quote_char" characters
525 like
526
527 1,foo "bar" baz,42
528
529 would result in parse error 2034. Though it is still bad practice to
530 allow this format, we cannot help the fact that some vendors
531 make their applications spit out lines styled this way.
532
533 If there is really bad "CSV" data, like
534
535 1,"foo "bar" baz",42
536
537 or
538
539 1,""foo bar baz"",42
540
541 there is a way to get this data-line parsed and leave the quotes inside
542 the quoted field as-is. This can be achieved by setting
543 "allow_loose_quotes" AND making sure that the "escape_char" is not
544 equal to "quote_char".
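
For the first case above, a quote inside an unquoted field, the
difference between the default and the loose behavior can be sketched
like this:

```perl
use strict;
use warnings;
use Text::CSV;

my $line = q{1,foo "bar" baz,42};

# Default parser: the embedded quotes are a parse error.
my $strict_csv = Text::CSV->new ();
my $strict_ok  = $strict_csv->parse ($line);
my ($code)     = $strict_csv->error_diag;   # list context: code comes first

# Loose parser: the quotes are kept as part of the field.
my $loose_csv = Text::CSV->new ({ allow_loose_quotes => 1 });
$loose_csv->parse ($line) or die "parse failed: " . $loose_csv->error_diag;
my @fields = $loose_csv->fields;

print "default parser failed with error $code\n" unless $strict_ok;
print "loose parser: $fields[1]\n";
```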
545
546 allow_loose_escapes
547
548 my $csv = Text::CSV->new ({ allow_loose_escapes => 1 });
549 $csv->allow_loose_escapes (0);
550 my $f = $csv->allow_loose_escapes;
551
552 Parsing fields that have "escape_char" characters that escape
553 characters that do not need to be escaped, like:
554
555 my $csv = Text::CSV->new ({ escape_char => "\\" });
556 $csv->parse (qq{1,"my bar\'s",baz,42});
557
558 would result in parse error 2025. Though it is bad practice to allow
559 this format, this attribute enables you to treat all escape character
560 sequences equal.
561
562 allow_unquoted_escape
563
564 my $csv = Text::CSV->new ({ allow_unquoted_escape => 1 });
565 $csv->allow_unquoted_escape (0);
566 my $f = $csv->allow_unquoted_escape;
567
A backward compatibility issue where "escape_char" differs from
"quote_char" prevents "escape_char" from being in the first position
of a field. If "quote_char" is equal to the default """ and
"escape_char" is set to "\", this would be illegal:
572
573 1,\0,2
574
575 Setting this attribute to 1 might help to overcome issues with
576 backward compatibility and allow this style.
577
578 always_quote
579
580 my $csv = Text::CSV->new ({ always_quote => 1 });
581 $csv->always_quote (0);
582 my $f = $csv->always_quote;
583
584 By default the generated fields are quoted only if they need to be.
585 For example, if they contain the separator character. If you set this
586 attribute to 1 then all defined fields will be quoted. ("undef" fields
587 are not quoted, see "blank_is_undef"). This makes it quite often easier
588 to handle exported data in external applications.
589
590 quote_space
591
592 my $csv = Text::CSV->new ({ quote_space => 1 });
593 $csv->quote_space (0);
594 my $f = $csv->quote_space;
595
By default, a space in a field triggers quotation. As no rule in "CSV"
requires this, nor any that forbids it, the default is true for
safety. You can exclude the space from this trigger by setting this
attribute to 0.
600
601 quote_empty
602
603 my $csv = Text::CSV->new ({ quote_empty => 1 });
604 $csv->quote_empty (0);
605 my $f = $csv->quote_empty;
606
607 By default the generated fields are quoted only if they need to be.
608 An empty (defined) field does not need quotation. If you set this
609 attribute to 1 then empty defined fields will be quoted. ("undef"
610 fields are not quoted, see "blank_is_undef"). See also "always_quote".
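
How "always_quote" and "quote_empty" change the generated record can
be sketched with "combine". The field list is an arbitrary example
that contains an empty string, an undefined value, and a space.

```perl
use strict;
use warnings;
use Text::CSV;

my @fields = (1, "", undef, "a b");

my %out;
for my $attr ("default", "always_quote", "quote_empty") {
    my $csv = Text::CSV->new ($attr eq "default" ? {} : { $attr => 1 });
    $csv->combine (@fields) or die "combine failed";
    $out{$attr} = $csv->string;
    printf "%-12s %s\n", $attr, $out{$attr};
    }
```

In all three cases the undefined field stays unquoted, and "a b" is
quoted because "quote_space" is true by default.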
611
612 quote_binary
613
614 my $csv = Text::CSV->new ({ quote_binary => 1 });
615 $csv->quote_binary (0);
616 my $f = $csv->quote_binary;
617
618 By default, all "unsafe" bytes inside a string cause the combined
619 field to be quoted. By setting this attribute to 0, you can disable
620 that trigger for bytes >= 0x7F.
621
622 escape_null
623
624 my $csv = Text::CSV->new ({ escape_null => 1 });
625 $csv->escape_null (0);
626 my $f = $csv->escape_null;
627
628 By default, a "NULL" byte in a field would be escaped. This option
629 enables you to treat the "NULL" byte as a simple binary character in
630 binary mode (the "{ binary => 1 }" is set). The default is true. You
631 can prevent "NULL" escapes by setting this attribute to 0.
632
633 When the "escape_char" attribute is set to undefined, this attribute
634 will be set to false.
635
636 The default setting will encode "=\x00=" as
637
638 "="0="
639
With "escape_null" set to 0, this will result in
641
642 "=\x00="
643
644 The default when using the "csv" function is "false".
645
646 For backward compatibility reasons, the deprecated old name
647 "quote_null" is still recognized.
648
649 keep_meta_info
650
651 my $csv = Text::CSV->new ({ keep_meta_info => 1 });
652 $csv->keep_meta_info (0);
653 my $f = $csv->keep_meta_info;
654
655 By default, the parsing of input records is as simple and fast as
656 possible. However, some parsing information - like quotation of the
657 original field - is lost in that process. Setting this flag to true
658 enables retrieving that information after parsing with the methods
659 "meta_info", "is_quoted", and "is_binary" described below. Default is
660 false for performance.
661
If you set this attribute to a value greater than 9, then you can
control output quotation style like it was used in the input of the
last parsed record (unless quotation was added because of other
reasons).
666
    my $csv = Text::CSV->new ({
        binary         => 1,
        keep_meta_info => 1,
        quote_space    => 0,
        });

    # parse () returns a status; the fields are fetched with fields ()
    $csv->parse (q{1,,"", ," ",f,"g","h""h",help,"help"});
    my @row = $csv->fields;

    $csv->print (*STDOUT, \@row);
    # 1,,, , ,f,g,"h""h",help,help
    $csv->keep_meta_info (11);
    $csv->print (*STDOUT, \@row);
    # 1,,"", ," ",f,"g","h""h",help,"help"
680
681 undef_str
682
683 my $csv = Text::CSV->new ({ undef_str => "\\N" });
684 $csv->undef_str (undef);
685 my $s = $csv->undef_str;
686
687 This attribute optionally defines the output of undefined fields. The
688 value passed is not changed at all, so if it needs quotation, the
689 quotation needs to be included in the value of the attribute. Use with
690 caution, as passing a value like ",",,,,""" will for sure mess up
691 your output. The default for this attribute is "undef", meaning no
692 special treatment.
693
694 This attribute is useful when exporting CSV data to be imported in
695 custom loaders, like for MySQL, that recognize special sequences for
696 "NULL" data.
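
A minimal sketch of the MySQL-style use case, assuming a version of
Text::CSV that supports "undef_str":

```perl
use strict;
use warnings;
use Text::CSV;

# "\\N" is the two-character sequence backslash + N.
my $csv = Text::CSV->new ({ undef_str => "\\N" });

$csv->combine (1, undef, 3) or die "combine failed";
my $line = $csv->string;

print "$line\n";
```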
697
698 verbatim
699
700 my $csv = Text::CSV->new ({ verbatim => 1 });
701 $csv->verbatim (0);
702 my $f = $csv->verbatim;
703
704 This is a quite controversial attribute to set, but makes some hard
705 things possible.
706
707 The rationale behind this attribute is to tell the parser that the
708 normally special characters newline ("NL") and Carriage Return ("CR")
709 will not be special when this flag is set, and be dealt with as being
710 ordinary binary characters. This will ease working with data with
711 embedded newlines.
712
713 When "verbatim" is used with "getline", "getline" auto-"chomp"'s
714 every line.
715
716 Imagine a file format like
717
718 M^^Hans^Janssen^Klas 2\n2A^Ja^11-06-2007#\r\n
719
where the line ending is a very specific "#\r\n", and the sep_char is
721 a "^" (caret). None of the fields is quoted, but embedded binary
722 data is likely to be present. With the specific line ending, this
723 should not be too hard to detect.
724
By default, Text::CSV's parse function only knows about "\n" and "\r"
as legal line endings, and so has to deal with the embedded newline as
a real "end-of-line", so it can scan the next line if binary is true
and the newline is inside a quoted field. With this option, we tell
"parse" to parse the line as if "\n" is nothing more than a binary
character.
731
732 For "parse" this means that the parser has no more idea about line
733 ending and "getline" "chomp"s line endings on reading.
734
735 types
736
737 A set of column types; the attribute is immediately passed to the
738 "types" method.
739
740 callbacks
741
742 See the "Callbacks" section below.
743
744 accessors
745
746 To sum it up,
747
748 $csv = Text::CSV->new ();
749
750 is equivalent to
751
752 $csv = Text::CSV->new ({
753 eol => undef, # \r, \n, or \r\n
754 sep_char => ',',
755 sep => undef,
756 quote_char => '"',
757 quote => undef,
758 escape_char => '"',
759 binary => 0,
760 decode_utf8 => 1,
761 auto_diag => 0,
762 diag_verbose => 0,
763 blank_is_undef => 0,
764 empty_is_undef => 0,
765 allow_whitespace => 0,
766 allow_loose_quotes => 0,
767 allow_loose_escapes => 0,
768 allow_unquoted_escape => 0,
769 always_quote => 0,
770 quote_empty => 0,
771 quote_space => 1,
772 escape_null => 1,
773 quote_binary => 1,
774 keep_meta_info => 0,
775 verbatim => 0,
776 undef_str => undef,
777 types => undef,
778 callbacks => undef,
779 });
780
781 For all of the above mentioned flags, an accessor method is available
782 where you can inquire the current value, or change the value
783
784 my $quote = $csv->quote_char;
785 $csv->binary (1);
786
787 It is not wise to change these settings halfway through writing "CSV"
788 data to a stream. If however you want to create a new stream using the
789 available "CSV" object, there is no harm in changing them.
790
791 If the "new" constructor call fails, it returns "undef", and makes
792 the fail reason available through the "error_diag" method.
793
794 $csv = Text::CSV->new ({ ecs_char => 1 }) or
795 die "".Text::CSV->error_diag ();
796
797 "error_diag" will return a string like
798
799 "INI - Unknown attribute 'ecs_char'"
800
801 known_attributes
802 @attr = Text::CSV->known_attributes;
803 @attr = Text::CSV::known_attributes;
804 @attr = $csv->known_attributes;
805
806 This method will return an ordered list of all the supported
807 attributes as described above. This can be useful for knowing what
808 attributes are valid in classes that use or extend Text::CSV.
809
810 print
811 $status = $csv->print ($fh, $colref);
812
813 Similar to "combine" + "string" + "print", but much more efficient.
814 It expects an array ref as input (not an array!) and the resulting
815 string is not really created, but immediately written to the $fh
816 object, typically an IO handle or any other object that offers a
817 "print" method.
818
819 For performance reasons "print" does not create a result string, so
820 all "string", "status", "fields", and "error_input" methods will return
821 undefined information after executing this method.
822
823 If $colref is "undef" (explicit, not through a variable argument) and
824 "bind_columns" was used to specify fields to be printed, it is
825 possible to make performance improvements, as otherwise data would have
826 to be copied as arguments to the method call:
827
828 $csv->bind_columns (\($foo, $bar));
829 $status = $csv->print ($fh, undef);
830
831 A short benchmark
832
833 my @data = ("aa" .. "zz");
834 $csv->bind_columns (\(@data));
835
836 $csv->print ($fh, [ @data ]); # 11800 recs/sec
837 $csv->print ($fh, \@data ); # 57600 recs/sec
838 $csv->print ($fh, undef ); # 48500 recs/sec
839
840 say
841 $status = $csv->say ($fh, $colref);
842
843 Like "print", but "eol" defaults to "$\".
844
845 print_hr
846 $csv->print_hr ($fh, $ref);
847
848 Provides an easy way to print a $ref (as fetched with "getline_hr")
849 provided the column names are set with "column_names".
850
851 It is just a wrapper method with basic parameter checks over
852
853 $csv->print ($fh, [ map { $ref->{$_} } $csv->column_names ]);
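
A small self-contained sketch of "print_hr", writing to an in-memory
file handle. The column names and the record are made up for
illustration.

```perl
use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new ({ binary => 1, eol => "\n" });
$csv->column_names (qw( code name price ));

my $out = "";
open my $fh, ">", \$out or die "in-memory handle: $!";

# Hash keys are written in the order given to column_names ().
$csv->print_hr ($fh, { price => "0.50", code => 1, name => "pen" });
close $fh or die "close: $!";

print $out;
```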
854
855 combine
856 $status = $csv->combine (@fields);
857
858 This method constructs a "CSV" record from @fields, returning success
859 or failure. Failure can result from lack of arguments or an argument
860 that contains an invalid character. Upon success, "string" can be
861 called to retrieve the resultant "CSV" string. Upon failure, the
862 value returned by "string" is undefined and "error_input" could be
863 called to retrieve the invalid argument.
864
865 string
866 $line = $csv->string ();
867
868 This method returns the input to "parse" or the resultant "CSV"
869 string of "combine", whichever was called more recently.
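
"combine" and "string" pair up with "parse" and "fields" for a round
trip. The following sketch uses arbitrary sample fields.

```perl
use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new ({ binary => 1 });
my @in  = ("plain", "with,comma", 'with "quotes"', "");

# Fields -> CSV string
$csv->combine (@in) or die "combine failed on: " . $csv->error_input;
my $line = $csv->string;

# CSV string -> fields
$csv->parse ($line) or die "parse failed: " . $csv->error_diag;
my @out = $csv->fields;

print "$line\n";
```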
870
871 getline
872 $colref = $csv->getline ($fh);
873
874 This is the counterpart to "print", as "parse" is the counterpart to
875 "combine": it parses a row from the $fh handle using the "getline"
876 method associated with $fh and parses this row into an array ref.
877 This array ref is returned by the function or "undef" for failure.
878 When $fh does not support "getline", you are likely to hit errors.
879
880 When fields are bound with "bind_columns" the return value is a
881 reference to an empty list.
882
883 The "string", "fields", and "status" methods are meaningless again.
884
885 getline_all
886 $arrayref = $csv->getline_all ($fh);
887 $arrayref = $csv->getline_all ($fh, $offset);
888 $arrayref = $csv->getline_all ($fh, $offset, $length);
889
890 This will return a reference to a list of getline ($fh) results. In
891 this call, "keep_meta_info" is disabled. If $offset is negative, as
892 with "splice", only the last "abs ($offset)" records of $fh are taken
893 into consideration.
894
895 Given a CSV file with 10 lines:
896
897 lines call
898 ----- ---------------------------------------------------------
899 0..9 $csv->getline_all ($fh) # all
900 0..9 $csv->getline_all ($fh, 0) # all
901 8..9 $csv->getline_all ($fh, 8) # start at 8
902 - $csv->getline_all ($fh, 0, 0) # start at 0 first 0 rows
903 0..4 $csv->getline_all ($fh, 0, 5) # start at 0 first 5 rows
904 4..5 $csv->getline_all ($fh, 4, 2) # start at 4 first 2 rows
905 8..9 $csv->getline_all ($fh, -2) # last 2 rows
906 6..7 $csv->getline_all ($fh, -4, 2) # first 2 of last 4 rows
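
The table can be reproduced with a ten-line in-memory file. This is a
sketch; the helper name "take" and the sample data are hypothetical.

```perl
use strict;
use warnings;
use Text::CSV;

# Ten single-field records holding their own line index, 0 .. 9.
my $data = join "", map { "$_\n" } 0 .. 9;

# Hypothetical helper: reopen the data, call getline_all with the
# given offset/length, and return the first field of every row.
sub take {
    my @args = @_;
    open my $fh, "<", \$data or die "in-memory handle: $!";
    my $csv = Text::CSV->new ({ binary => 1 });
    my $aoa = $csv->getline_all ($fh, @args);
    close $fh;
    return join ",", map { $_->[0] } @$aoa;
    }

print "all:      ", take (),      "\n";
print "8:        ", take (8),     "\n";
print "4, 2:     ", take (4, 2),  "\n";
print "-4, 2:    ", take (-4, 2), "\n";
```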
907
908 getline_hr
909 The "getline_hr" and "column_names" methods work together to allow you
910 to have rows returned as hashrefs. You must call "column_names" first
911 to declare your column names.
912
913 $csv->column_names (qw( code name price description ));
914 $hr = $csv->getline_hr ($fh);
915 print "Price for $hr->{name} is $hr->{price} EUR\n";
916
917 "getline_hr" will croak if called before "column_names".
918
Note that "getline_hr" creates a hashref for every row and will be
much slower than the combined use of "bind_columns" and "getline",
which still offers the same easy-to-use hashref inside the loop:
922
923 my @cols = @{$csv->getline ($fh)};
924 $csv->column_names (@cols);
925 while (my $row = $csv->getline_hr ($fh)) {
926 print $row->{price};
927 }
928
929 Could easily be rewritten to the much faster:
930
931 my @cols = @{$csv->getline ($fh)};
932 my $row = {};
933 $csv->bind_columns (\@{$row}{@cols});
934 while ($csv->getline ($fh)) {
935 print $row->{price};
936 }
937
938 Your mileage may vary for the size of the data and the number of rows.
939 With perl-5.14.2 the comparison for a 100_000 line file with 14 columns:
940
941 Rate hashrefs getlines
942 hashrefs 1.00/s -- -76%
943 getlines 4.15/s 313% --
944
945 getline_hr_all
946 $arrayref = $csv->getline_hr_all ($fh);
947 $arrayref = $csv->getline_hr_all ($fh, $offset);
948 $arrayref = $csv->getline_hr_all ($fh, $offset, $length);
949
950 This will return a reference to a list of getline_hr ($fh) results.
951 In this call, "keep_meta_info" is disabled.
952
953 parse
954 $status = $csv->parse ($line);
955
956 This method decomposes a "CSV" string into fields, returning success
957 or failure. Failure can result from a missing argument or from an
958 improperly formatted "CSV" string. Upon success, "fields" can be
959 called to retrieve the decomposed fields. Upon failure, calling "fields"
960 will return undefined data and "error_input" can be called to
961 retrieve the invalid argument.
962
963 You may use the "types" method for setting column types. See "types"'
964 description below.
965
966 The $line argument is supposed to be a simple scalar. Everything else
967 is supposed to croak and set error 1500.
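As a minimal sketch of the success and failure paths described above (the sample strings are made up; the exact error code is deliberately not asserted):

```perl
use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new ({ binary => 1 });

# A well-formed line: parse succeeds and fields returns the pieces.
my $ok     = $csv->parse (q{1,"hello, world",3});
my @fields = $ok ? $csv->fields : ();

# A quoted field that is never closed: parse fails, error_input hands
# back the offending argument and error_diag explains why.
my $broken = q{1,"unterminated};
my $failed = $csv->parse ($broken) ? 0 : 1;
my $input  = $csv->error_input;
my $code   = 0 + $csv->error_diag;

print "fields: ", join (" | ", @fields), "\n";
print "parse failed with error $code\n" if $failed;
```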
968
969 fragment
970 This function tries to implement RFC7111 (URI Fragment Identifiers for
971 the text/csv Media Type) - http://tools.ietf.org/html/rfc7111
972
973 my $AoA = $csv->fragment ($fh, $spec);
974
975 In specifications, "*" is used to specify the last item, a dash ("-")
976 to indicate a range. All indices are 1-based: the first row or
977 column has index 1. Selections can be combined with the semi-colon
978 (";").
979
980 When using this method in combination with "column_names", the
981 returned reference will point to a list of hashes instead of a list
982 of lists. A disjointed cell-based combined selection might return
983 rows with different number of columns making the use of hashes
984 unpredictable.
985
986 $csv->column_names ("Name", "Age");
987 my $AoH = $csv->fragment ($fh, "col=3;8");
988
989 If the "after_parse" callback is active, it is also called on every
990 line parsed and skipped before the fragment.
991
992 row
993 row=4
994 row=5-7
995 row=6-*
996 row=1-2;4;6-*
997
998 col
999 col=2
1000 col=1-3
1001 col=4-*
1002 col=1-2;4;7-*
1003
1004 cell
1005 In cell-based selection, the comma (",") is used to pair row and
1006 column
1007
1008 cell=4,1
1009
1010 The range operator ("-") using "cell"s can be used to define top-left
1011 and bottom-right "cell" location
1012
1013 cell=3,1-4,6
1014
1015 The "*" is only allowed in the second part of a pair
1016
1017 cell=3,2-*,2 # row 3 till end, only column 2
1018 cell=3,2-3,* # column 2 till end, only row 3
1019 cell=3,2-*,* # strip row 1 and 2, and column 1
1020
1021 Cells and cell ranges may be combined with ";", possibly resulting in
1022 rows with different number of columns
1023
1024 cell=1,1-2,2;3,3-4,4;1,4;4,1
1025
1026 Disjointed selections will only return selected cells. The cells
1027 that are not specified will not be included in the returned
1028 set, not even as "undef". As an example given a "CSV" like
1029
1030 11,12,13,...19
1031 21,22,...28,29
1032 : :
1033 91,...97,98,99
1034
1035 with "cell=1,1-2,2;3,3-4,4;1,4;4,1" will return:
1036
1037 11,12,14
1038 21,22
1039 33,34
1040 41,43,44
1041
1042 Overlapping cell-specs will return those cells only once, so
1043 "cell=1,1-3,3;2,2-4,4;2,3;4,2" will return:
1044
1045 11,12,13
1046 21,22,23,24
1047 31,32,33,34
1048 42,43,44
1049
1050 RFC7111 <http://tools.ietf.org/html/rfc7111> does not allow different
1051 types of specs to be combined (either "row" or "col" or "cell").
1052 Passing an invalid fragment specification will croak and set error
1053 2013.
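The three selection types can be compared side by side in a short sketch. It assumes a 4x4 in-memory grid whose cells are named after their 1-based row and column; the helper select_cells is hypothetical and only flattens the result for printing.

```perl
use strict;
use warnings;
use Text::CSV;

# 4 x 4 grid: "11,12,13,14" .. "41,42,43,44"
my $data = join "", map {
    my $r = $_;
    join (",", map { "$r$_" } 1 .. 4) . "\n";
} 1 .. 4;

# Hypothetical helper: apply one fragment spec to a fresh handle and
# flatten the resulting list of lists into a printable string.
sub select_cells {
    my $spec = shift;
    my $csv  = Text::CSV->new ({ binary => 1, auto_diag => 1 });
    open my $fh, "<", \$data or die "in-memory open: $!";
    my $aoa = $csv->fragment ($fh, $spec);
    close $fh;
    return join "|", map { join ",", @$_ } @$aoa;
}

my $rows  = select_cells ("row=2-3");        # rows 2 and 3, all columns
my $cols  = select_cells ("col=1;4");        # columns 1 and 4, all rows
my $block = select_cells ("cell=1,1-2,2");   # top-left 2 x 2 block

print "$_\n" for $rows, $cols, $block;
```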
1054
1055 column_names
1056 Set the "keys" that will be used in the "getline_hr" calls. If no
1057 keys (column names) are passed, it will return the current setting as a
1058 list.
1059
1060 "column_names" accepts a list of scalars (the column names) or a
1061 single array_ref, so you can pass the return value from "getline" too:
1062
1063 $csv->column_names ($csv->getline ($fh));
1064
1065 "column_names" does no checking on duplicates at all, which might lead
1066 to unexpected results. Undefined entries will be replaced with the
1067 string "\cAUNDEF\cA", so
1068
1069 $csv->column_names (undef, "", "name", "name");
1070 $hr = $csv->getline_hr ($fh);
1071
1072 Will set "$hr->{"\cAUNDEF\cA"}" to the 1st field, "$hr->{""}" to the
1073 2nd field, and "$hr->{name}" to the 4th field, discarding the 3rd
1074 field.
1075
1076 "column_names" croaks on invalid arguments.
1077
1078 header
1079 This method does NOT work in perl-5.6.x
1080
1081 Parse the CSV header and set "sep", column_names and encoding.
1082
1083 my @hdr = $csv->header ($fh);
1084 $csv->header ($fh, { sep_set => [ ";", ",", "|", "\t" ] });
1085 $csv->header ($fh, { detect_bom => 1, munge_column_names => "lc" });
1086
1087 The first argument should be a file handle.
1088
1089 This method resets some object properties, as it is supposed to be
1090 invoked only once per file or stream. It will leave attributes
1091 "column_names" and "bound_columns" alone if setting column names is
1092 disabled. Reading headers on previously processed objects might fail on
1093 perl-5.8.0 and older.
1094
1095 Assuming that the file opened for parsing has a header, and the header
1096 does not contain problematic characters like embedded newlines, read
1097 the first line from the open handle then auto-detect whether the header
1098 separates the column names with a character from the allowed separator
1099 list.
1100
1101 If any of the allowed separators matches, and none of the other
1102 allowed separators match, set "sep" to that separator for the
1103 current CSV instance and use it to parse the first line, map those to
1104 lowercase, and use that to set the instance "column_names":
1105
1106 my $csv = Text::CSV->new ({ binary => 1, auto_diag => 1 });
1107 open my $fh, "<", "file.csv";
1108 binmode $fh; # for Windows
1109 $csv->header ($fh);
1110 while (my $row = $csv->getline_hr ($fh)) {
1111 ...
1112 }
1113
1114 If the header is empty, contains more than one unique separator out of
1115 the allowed set, contains empty fields, or contains identical fields
1116 (after folding), it will croak with error 1010, 1011, 1012, or 1013
1117 respectively.
1118
1119 If the header contains embedded newlines or is not valid CSV in any
1120 other way, this method will croak and leave the parse error untouched.
1121
1122 A successful call to "header" will always set the "sep" of the $csv
1123 object. This behavior can not be disabled.
1124
1125 return value
1126
1127 On error this method will croak.
1128
1129 In list context, the headers will be returned whether they are used to
1130 set "column_names" or not.
1131
1132 In scalar context, the instance itself is returned. Note: the values
1133 as found in the header will effectively be lost if "set_column_names"
1134 is false.
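A minimal sketch of separator detection and the list-context return value, assuming a semicolon-separated in-memory sample (the data is illustrative only):

```perl
use strict;
use warnings;
use Text::CSV;

my $data = "Code;Name;Price\nA1;Widget;9.95\n";
open my $fh, "<", \$data or die "in-memory open: $!";

my $csv = Text::CSV->new ({ binary => 1, auto_diag => 1 });

# List context: the (lower-cased) header fields are returned, and the
# detected separator is stored in the object as a side effect.
my @hdr = $csv->header ($fh, { sep_set => [ ";", ",", "|" ] });
my $sep = $csv->sep_char;

# The column names are set, so getline_hr works right away.
my $row = $csv->getline_hr ($fh);
close $fh;

print "separator '$sep', columns: @hdr\n";
print "first record: $row->{code} / $row->{name} / $row->{price}\n";
```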
1135
1136 Options
1137
1138 sep_set
1139 $csv->header ($fh, { sep_set => [ ";", ",", "|", "\t" ] });
1140
1141 The list of legal separators defaults to "[ ";", "," ]" and can be
1142 changed by this option. As this is probably the most often used
1143 option, it can be passed on its own as an unnamed argument:
1144
1145 $csv->header ($fh, [ ";", ",", "|", "\t", "::", "\x{2063}" ]);
1146
1147 Multi-byte sequences are allowed, both multi-character and
1148 Unicode. See "sep".
1149
1150 detect_bom
1151 $csv->header ($fh, { detect_bom => 1 });
1152
1153 The default behavior is to detect if the header line starts with a
1154 BOM. If the header has a BOM, use that to set the encoding of $fh.
1155 This default behavior can be disabled by passing a false value to
1156 "detect_bom".
1157
1158 Supported encodings from BOM are: UTF-8, UTF-16BE, UTF-16LE,
1159 UTF-32BE, and UTF-32LE. BOMs also exist for UTF-1, UTF-EBCDIC, SCSU,
1160 BOCU-1, and GB-18030, but Encode does not (yet) support those. UTF-7
1161 is not supported.
1162
1163 If a supported BOM was detected at the start of the stream, it is stored
1164 in the object attribute "ENCODING".
1165
1166 my $enc = $csv->{ENCODING};
1167
1168 The encoding is used with "binmode" on $fh.
1169
1170 If the handle was opened in a (correct) encoding, this method will
1171 not alter the encoding, as it checks the leading bytes of the first
1172 line. In case the stream starts with a decoded BOM ("U+FEFF"),
1173 "{ENCODING}" will be "" (empty) instead of the default "undef".
1174
1175 munge_column_names
1176 This option offers the means to modify the column names into
1177 something that is most useful to the application. The default is to
1178 map all column names to lower case.
1179
1180 $csv->header ($fh, { munge_column_names => "lc" });
1181
1182 The following values are available:
1183
1184 lc - lower case
1185 uc - upper case
1186 none - do not change
1187 \%hash - supply a mapping
1188 \&cb - supply a callback
1189
1190 Literal:
1191
1192 $csv->header ($fh, { munge_column_names => "none" });
1193
1194 Hash:
1195
1196 $csv->header ($fh, { munge_column_names => { foo => "sombrero" }});
1197
1198 If a value does not exist, the original value is used unchanged.
1199
1200 Callback:
1201
1202 $csv->header ($fh, { munge_column_names => sub { fc } });
1203 $csv->header ($fh, { munge_column_names => sub { "column_".$col++ } });
1204 $csv->header ($fh, { munge_column_names => sub { lc (s/\W+/_/gr) } });
1205
1206 As this callback is called in a "map", you can use $_ directly.
1207
1208 set_column_names
1209 $csv->header ($fh, { set_column_names => 1 });
1210
1211 The default is to set the instance's column names using
1212 "column_names" if the method is successful, so subsequent calls to
1213 "getline_hr" can return a hash. Disabling the setting of the header
1214 can be forced by using a false value for this option.
1215
1216 As described in "return value" above, content is lost in scalar
1217 context.
1218
1219 Validation
1220
1221 When receiving CSV files from external sources, this method can be
1222 used to protect against changes in the layout by restricting to known
1223 headers (and typos in the header fields).
1224
1225 my %known = (
1226 "record key" => "c_rec",
1227 "rec id" => "c_rec",
1228 "id_rec" => "c_rec",
1229 "kode" => "code",
1230 "code" => "code",
1231 "vaule" => "value",
1232 "value" => "value",
1233 );
1234 my $csv = Text::CSV->new ({ binary => 1, auto_diag => 1 });
1235 open my $fh, "<", $source or die "$source: $!";
1236 $csv->header ($fh, { munge_column_names => sub {
1237 s/\s+$//;
1238 s/^\s+//;
1239 $known{lc $_} or die "Unknown column '$_' in $source";
1240 }});
1241 while (my $row = $csv->getline_hr ($fh)) {
1242 say join "\t", $row->{c_rec}, $row->{code}, $row->{value};
1243 }
1244
1245 bind_columns
1246 Takes a list of scalar references to be used for output with "print"
1247 or to store the fields fetched by "getline". When you do not pass
1248 enough references to store the fetched fields in, "getline" will fail
1249 with error 3006. If you pass more than there are fields to return,
1250 the content of the remaining references is left untouched.
1251
1252 $csv->bind_columns (\$code, \$name, \$price, \$description);
1253 while ($csv->getline ($fh)) {
1254 print "The price of a $name is \x{20ac} $price\n";
1255 }
1256
1257 To reset or clear all column binding, call "bind_columns" with the
1258 single argument "undef". This will also clear column names.
1259
1260 $csv->bind_columns (undef);
1261
1262 If no arguments are passed at all, "bind_columns" will return the list
1263 of current bindings or "undef" if no binds are active.
1264
1265 Note that in parsing with "bind_columns", the fields are set on the
1266 fly. That implies that if the third field of a row causes an error
1267 (or this row has just two fields where the previous row had more), the
1268 first two fields already have been assigned the values of the current
1269 row, while the rest of the fields will still hold the values of the
1270 previous row. If you want the parser to fail in these cases, use the
1271 "strict" attribute.
1272
1273 eof
1274 $eof = $csv->eof ();
1275
1276 If "parse" or "getline" was used with an IO stream, this method will
1277 return true (1) if the last call hit end of file, otherwise it will
1278 return false (''). This is useful to see the difference between a
1279 failure and end of file.
1280
1281 Note that if the parsing of the last line caused an error, "eof" is
1282 still true. That means that if you are not using "auto_diag", an idiom
1283 like
1284
1285 while (my $row = $csv->getline ($fh)) {
1286 # ...
1287 }
1288 $csv->eof or $csv->error_diag;
1289
1290 will not report the error. You would have to change that to
1291
1292 while (my $row = $csv->getline ($fh)) {
1293 # ...
1294 }
1295 +$csv->error_diag and $csv->error_diag;
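The difference between the two idioms can be made visible with a small sketch. It assumes an in-memory sample whose last record opens a quote and never closes it, and it assumes that 2012 is the code reported for a plain end of file, as listed in the module's diagnostics.

```perl
use strict;
use warnings;
use Text::CSV;

# The second record opens a quote that is never closed.
my $data = qq{a,b\n1,"2\n};
open my $fh, "<", \$data or die "in-memory open: $!";

my $csv = Text::CSV->new ({ binary => 1 });    # auto_diag left off

my @rows;
while (my $row = $csv->getline ($fh)) {
    push @rows, $row;
}

# eof alone cannot tell a clean end of file from a failure on the
# last record, so the error code is inspected as well.
my $at_eof = $csv->eof ? 1 : 0;
my ($code, $msg) = $csv->error_diag;
close $fh;

# Assumption: 2012 means "end of data"; anything else is a real error.
my $real_error = ($code && $code != 2012) ? 1 : 0;

print "rows read: ", scalar @rows, ", eof: $at_eof\n";
print "parse error $code: $msg\n" if $real_error;
```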
1296
1297 types
1298 $csv->types (\@tref);
1299
1300 This method is used to force that (all) columns are of a given type.
1301 For example, if you have an integer column, two columns with
1302 doubles and a string column, then you might do a
1303
1304 $csv->types ([Text::CSV::IV (),
1305 Text::CSV::NV (),
1306 Text::CSV::NV (),
1307 Text::CSV::PV ()]);
1308
1309 Column types are used only for decoding columns while parsing, in
1310 other words by the "parse" and "getline" methods.
1311
1312 You can unset column types by doing a
1313
1314 $csv->types (undef);
1315
1316 or fetch the current type settings with
1317
1318 $types = $csv->types ();
1319
1320 IV Set field type to integer.
1321
1322 NV Set field type to numeric/float.
1323
1324 PV Set field type to string.
1325
1326 fields
1327 @columns = $csv->fields ();
1328
1329 This method returns the input to "combine" or the resultant
1330 decomposed fields of a successful "parse", whichever was called more
1331 recently.
1332
1333 Note that the return value is undefined after using "getline", which
1334 does not fill the data structures returned by "parse".
1335
1336 meta_info
1337 @flags = $csv->meta_info ();
1338
1339 This method returns the "flags" of the input to "combine" or the flags
1340 of the resultant decomposed fields of "parse", whichever was called
1341 more recently.
1342
1343 For each field, a meta_info field will hold flags that inform
1344 something about the field returned by the "fields" method or
1345 passed to the "combine" method. The flags are bit-wise-"or"'d like:
1346
1347 0x0001
1348 The field was quoted.
1349
1350 0x0002
1351 The field was binary.
1352
1353 See the "is_***" methods below.
1354
1355 is_quoted
1356 my $quoted = $csv->is_quoted ($column_idx);
1357
1358 Where $column_idx is the (zero-based) index of the column in the
1359 last result of "parse".
1360
1361 This returns a true value if the data in the indicated column was
1362 enclosed in "quote_char" quotes. This might be important for fields
1363 where content ",20070108," is to be treated as a numeric value, and
1364 where ","20070108"," is explicitly marked as character string data.
1365
1366 This method is only valid when "keep_meta_info" is set to a true value.
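A short sketch of the quoted versus unquoted distinction, assuming "keep_meta_info" is enabled at construction time (the sample line is made up):

```perl
use strict;
use warnings;
use Text::CSV;

# keep_meta_info must be true, otherwise the flags are not recorded.
my $csv = Text::CSV->new ({ binary => 1, keep_meta_info => 1 });

$csv->parse (q{20070108,"20070108",plain}) or die "parse failed";

my @fields = $csv->fields;
my @flags  = $csv->meta_info;

# Ask per column whether the value was enclosed in quotes.
my @quoted = map { $csv->is_quoted ($_) ? 1 : 0 } 0 .. $#fields;

# The same information is available as bit 0x0001 of the raw flags.
my $flag_says_quoted = ($flags[1] & 0x0001) ? 1 : 0;

print "field $_: '$fields[$_]' quoted=$quoted[$_]\n" for 0 .. $#fields;
```

The first two fields hold the same text after parsing; only the meta information tells them apart.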
1367
1368 is_binary
1369 my $binary = $csv->is_binary ($column_idx);
1370
1371 Where $column_idx is the (zero-based) index of the column in the
1372 last result of "parse".
1373
1374 This returns a true value if the data in the indicated column contained
1375 any byte in the range "[\x00-\x08,\x10-\x1F,\x7F-\xFF]".
1376
1377 This method is only valid when "keep_meta_info" is set to a true value.
1378
1379 is_missing
1380 my $missing = $csv->is_missing ($column_idx);
1381
1382 Where $column_idx is the (zero-based) index of the column in the
1383 last result of "getline_hr".
1384
1385 $csv->keep_meta_info (1);
1386 while (my $hr = $csv->getline_hr ($fh)) {
1387 $csv->is_missing (0) and next; # This was an empty line
1388 }
1389
1390 When using "getline_hr", it is impossible to tell if the parsed
1391 fields are "undef" because they were not filled in the "CSV" stream
1392 or because they were not read at all, as all the fields defined by
1393 "column_names" are set in the hash-ref. If you still need to know if
1394 all fields in each row are provided, you should enable "keep_meta_info"
1395 so you can check the flags.
1396
1397 If "keep_meta_info" is "false", "is_missing" will always return
1398 "undef", regardless of $column_idx being valid or not. If this
1399 attribute is "true" it will return either 0 (the field is present) or 1
1400 (the field is missing).
1401
1402 A special case is the empty line. If the line is completely empty -
1403 after dealing with the flags - this is still a valid CSV line: it is a
1404 record of just one single empty field. However, if "keep_meta_info" is
1405 set, invoking "is_missing" with index 0 will now return true.
1406
1407 status
1408 $status = $csv->status ();
1409
1410 This method returns the status of the last invoked "combine" or "parse"
1411 call. Status is success (true: 1) or failure (false: "undef" or 0).
1412
1413 error_input
1414 $bad_argument = $csv->error_input ();
1415
1416 This method returns the erroneous argument (if it exists) of "combine"
1417 or "parse", whichever was called more recently. If the last
1418 invocation was successful, "error_input" will return "undef".
1419
1420 error_diag
1421 Text::CSV->error_diag ();
1422 $csv->error_diag ();
1423 $error_code = 0 + $csv->error_diag ();
1424 $error_str = "" . $csv->error_diag ();
1425 ($cde, $str, $pos, $rec, $fld) = $csv->error_diag ();
1426
1427 If (and only if) an error occurred, this function returns the
1428 diagnostics of that error.
1429
1430 If called in void context, this will print the internal error code and
1431 the associated error message to STDERR.
1432
1433 If called in list context, this will return the error code and the
1434 error message in that order. If the last error was from parsing, the
1435 rest of the values returned are a best guess at the location within
1436 the line that was being parsed. Their values are 1-based. The
1437 position currently is index of the byte at which the parsing failed in
1438 the current record. It might change to be the index of the current
1439 character in a later release. The record is the index of the record
1440 parsed by the csv instance. The field number is the index of the field
1441 the parser thinks it is currently trying to parse. See
1442 examples/csv-check for how this can be used.
1443
1444 If called in scalar context, it will return the diagnostics in a
1445 single scalar, a-la $!. It will contain the error code in numeric
1446 context, and the diagnostics message in string context.
1447
1448 When called as a class method or a direct function call, the
1449 diagnostics are that of the last "new" call.
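The three calling contexts can be compared in a small sketch, assuming a line that is invalid because text follows a closing quote (the sample line is made up, and the exact error code is not asserted):

```perl
use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new ({ binary => 1 });

# Text after the closing quote makes this line invalid.
my $ok = $csv->parse (q{ok,"bad" trailing,x});

# List context: code, message, position, record and field number.
my ($code, $str, $pos, $rec, $fld) = $csv->error_diag;

# Scalar context: one value that is numeric or textual on demand.
my $num  = 0  + $csv->error_diag;
my $text = "" . $csv->error_diag;

printf "error %d (%s) at position %d, record %d, field %d\n",
    $code, $str, $pos, $rec, $fld unless $ok;
```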
1450
1451 record_number
1452 $recno = $csv->record_number ();
1453
1454 Returns the number of records parsed by this csv instance. This value
1455 should be more accurate than $. when embedded newlines come into play. Records
1456 written by this instance are not counted.
1457
1458 SetDiag
1459 $csv->SetDiag (0);
1460
1461 Use to reset the diagnostics if you are dealing with errors.
1462
1464 backend
1465 Returns the backend module name called by Text::CSV. "module" is
1466 an alias.
1467
1468 is_xs
1469 Returns true value if Text::CSV uses an XS backend.
1470
1471 is_pp
1472 Returns true value if Text::CSV uses a pure-Perl backend.
1473
1475 This section is also taken from Text::CSV_XS.
1476
1477 csv
1478 This function is not exported by default and should be explicitly
1479 requested:
1480
1481 use Text::CSV qw( csv );
1482
1483 This is a high-level function that aims at simple (user) interfaces.
1484 This can be used to read/parse a "CSV" file or stream (the default
1485 behavior) or to produce a file or write to a stream (define the "out"
1486 attribute). It returns an array- or hash-reference on parsing (or
1487 "undef" on fail) or the numeric value of "error_diag" on writing.
1488 When this function fails you can get to the error using the class call
1489 to "error_diag"
1490
1491 my $aoa = csv (in => "test.csv") or
1492 die Text::CSV->error_diag;
1493
1494 This function takes the arguments as key-value pairs. This can be
1495 passed as a list or as an anonymous hash:
1496
1497 my $aoa = csv ( in => "test.csv", sep_char => ";");
1498 my $aoh = csv ({ in => $fh, headers => "auto" });
1499
1500 The arguments passed consist of two parts: the arguments to "csv"
1501 itself and the optional attributes to the "CSV" object used inside
1502 the function as enumerated and explained in "new".
1503
1504 If not overridden, the default options used for CSV are
1505
1506 auto_diag => 1
1507 escape_null => 0
1508
1509 The option that is always set and cannot be altered is
1510
1511 binary => 1
1512
1513 As this function will likely be used in one-liners, it allows "quote"
1514 to be abbreviated as "quo", and "escape_char" to be abbreviated as
1515 "esc" or "escape".
1516
1517 Alternative invocations:
1518
1519 my $aoa = Text::CSV::csv (in => "file.csv");
1520
1521 my $csv = Text::CSV->new ();
1522 my $aoa = $csv->csv (in => "file.csv");
1523
1524 In the latter case, the object attributes are used from the existing
1525 object and the attribute arguments in the function call are ignored:
1526
1527 my $csv = Text::CSV->new ({ sep_char => ";" });
1528 my $aoh = $csv->csv (in => "file.csv", bom => 1);
1529
1530 will parse using ";" as "sep_char", not ",".
1531
1532 in
1533
1534 Used to specify the source. "in" can be a file name (e.g. "file.csv"),
1535 which will be opened for reading and closed when finished, a file
1536 handle (e.g. $fh or "FH"), a reference to a glob (e.g. "\*ARGV"),
1537 the glob itself (e.g. *STDIN), or a reference to a scalar (e.g.
1538 "\q{1,2,"csv"}").
1539
1540 When used with "out", "in" should be a reference to a CSV structure
1541 (AoA or AoH) or a CODE-ref that returns an array-reference or a hash-
1542 reference. The code-ref will be invoked with no arguments.
1543
1544 my $aoa = csv (in => "file.csv");
1545
1546 open my $fh, "<", "file.csv";
1547 my $aoa = csv (in => $fh);
1548
1549 my $csv = [ [qw( Foo Bar )], [ 1, 2 ], [ 2, 3 ]];
1550 my $err = csv (in => $csv, out => "file.csv");
1551
1552 If called in void context without the "out" attribute, the resulting
1553 ref will be used as input to a subsequent call to csv:
1554
1555 csv (in => "file.csv", filter => { 2 => sub { length > 2 }})
1556
1557 will be a shortcut to
1558
1559 csv (in => csv (in => "file.csv", filter => { 2 => sub { length > 2 }}))
1560
1561 where, in the absence of the "out" attribute, this is a shortcut to
1562
1563 csv (in => csv (in => "file.csv", filter => { 2 => sub { length > 2 }}),
1564 out => *STDOUT)
1565
1566 out
1567
1568 csv (in => $aoa, out => "file.csv");
1569 csv (in => $aoa, out => $fh);
1570 csv (in => $aoa, out => STDOUT);
1571 csv (in => $aoa, out => *STDOUT);
1572 csv (in => $aoa, out => \*STDOUT);
1573 csv (in => $aoa, out => \my $data);
1574 csv (in => $aoa, out => undef);
1575 csv (in => $aoa, out => \"skip");
1576
1577 In output mode, the default CSV options when producing CSV are
1578
1579 eol => "\r\n"
1580
1581 The "fragment" attribute is ignored in output mode.
1582
1583 "out" can be a file name (e.g. "file.csv"), which will be opened for
1584 writing and closed when finished, a file handle (e.g. $fh or "FH"), a
1585 reference to a glob (e.g. "\*STDOUT"), the glob itself (e.g. *STDOUT),
1586 or a reference to a scalar (e.g. "\my $data").
1587
1588 csv (in => sub { $sth->fetch }, out => "dump.csv");
1589 csv (in => sub { $sth->fetchrow_hashref }, out => "dump.csv",
1590 headers => $sth->{NAME_lc});
1591
1592 When a code-ref is used for "in", the output is generated per
1593 invocation, so no buffering is involved. This implies that there is no
1594 size restriction on the number of records. The "csv" function ends when
1595 the coderef returns a false value.
1596
1597 If "out" is set to a reference of the literal string "skip", the output
1598 will be suppressed completely, which might be useful in combination
1599 with a filter for side effects only.
1600
1601 my %cache;
1602 csv (in => "dump.csv",
1603 out => \"skip",
1604 on_in => sub { $cache{$_[1][1]}++ });
1605
1606 Currently, setting "out" to any false value ("undef", "", 0) will be
1607 equivalent to "\"skip"".
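A round trip through "out" and "in" can be sketched with scalar references, so no files are needed. The structure and variable names are illustrative only.

```perl
use strict;
use warnings;
use Text::CSV qw( csv );

my $aoa = [ [qw( Foo Bar )], [ 1, 2 ], [ 2, 3 ] ];

# Output mode: "in" is the data structure, "out" a reference to a scalar.
csv (in => $aoa, out => \my $text);

# Input mode: read the generated CSV back as an array of hashes.
my $aoh = csv (in => \$text, headers => "auto");

print $text;
print "Bar of the second record: $aoh->[1]{Bar}\n";
```

As no "eol" is passed, the generated text uses the output-mode default described above.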
1608
1609 encoding
1610
1611 If passed, it should be an encoding accepted by the ":encoding()"
1612 option to "open". There is no default value. This attribute does not
1613 work in perl 5.6.x. "encoding" can be abbreviated to "enc" for ease of
1614 use in command line invocations.
1615
1616 If "encoding" is set to the literal value "auto", the method "header"
1617 will be invoked on the opened stream to check if there is a BOM and set
1618 the encoding accordingly. This is equal to passing a true value in
1619 the option "detect_bom".
1620
1621 detect_bom
1622
1623 If "detect_bom" is given, the method "header" will be invoked on
1624 the opened stream to check if there is a BOM and set the encoding
1625 accordingly.
1626
1627 "detect_bom" can be abbreviated to "bom".
1628
1629 This is the same as setting "encoding" to "auto".
1630
1631 Note that as the method "header" is invoked, its default is to also
1632 set the headers.
1633
1634 headers
1635
1636 If this attribute is not given, the default behavior is to produce an
1637 array of arrays.
1638
1639 If "headers" is supplied, it should be an anonymous list of column
1640 names, an anonymous hashref, a coderef, or a literal flag: "auto",
1641 "lc", "uc", or "skip".
1642
1643 skip
1644 When "skip" is used, the header will not be included in the output.
1645
1646 my $aoa = csv (in => $fh, headers => "skip");
1647
1648 auto
1649 If "auto" is used, the first line of the "CSV" source will be read as
1650 the list of field headers and used to produce an array of hashes.
1651
1652 my $aoh = csv (in => $fh, headers => "auto");
1653
1654 lc
1655 If "lc" is used, the first line of the "CSV" source will be read as
1656 the list of field headers mapped to lower case and used to produce
1657 an array of hashes. This is a variation of "auto".
1658
1659 my $aoh = csv (in => $fh, headers => "lc");
1660
1661 uc
1662 If "uc" is used, the first line of the "CSV" source will be read as
1663 the list of field headers mapped to upper case and used to produce
1664 an array of hashes. This is a variation of "auto".
1665
1666 my $aoh = csv (in => $fh, headers => "uc");
1667
1668 CODE
1669 If a coderef is used, the first line of the "CSV" source will be
1670 read as the list of mangled field headers in which each field is
1671 passed as the only argument to the coderef. This list is used to
1672 produce an array of hashes.
1673
1674 my $aoh = csv (in => $fh,
1675 headers => sub { lc ($_[0]) =~ s/kode/code/gr });
1676
1677 This example is a variation of using "lc" where all occurrences of
1678 "kode" are replaced with "code".
1679
1680 ARRAY
1681 If "headers" is an anonymous list, the entries in the list will be
1682 used as field names. The first line is considered data instead of
1683 headers.
1684
1685 my $aoh = csv (in => $fh, headers => [qw( Foo Bar )]);
1686 csv (in => $aoa, out => $fh, headers => [qw( code description price )]);
1687
1688 HASH
1689 If "headers" is a hash reference, this implies "auto", but header
1690 fields that exist as a key in the hashref will be replaced by the
1691 value for that key. Given a CSV file like
1692
1693 post-kode,city,name,id number,fubble
1694 1234AA,Duckstad,Donald,13,"X313DF"
1695
1696 using
1697
1698 csv (headers => { "post-kode" => "pc", "id number" => "ID" }, ...
1699
1700 will return an entry like
1701
1702 { pc => "1234AA",
1703 city => "Duckstad",
1704 name => "Donald",
1705 ID => "13",
1706 fubble => "X313DF",
1707 }
1708
1709 See also "munge_column_names" and "set_column_names".
1710
1711 munge_column_names
1712
1713 If "munge_column_names" is set, the method "header" is invoked on
1714 the opened stream with all matching arguments to detect and set the
1715 headers.
1716
1717 "munge_column_names" can be abbreviated to "munge".
1718
1719 key
1720
1721 If passed, will default "headers" to "auto" and return a hashref
1722 instead of an array of hashes.
1723
1724 my $ref = csv (in => "test.csv", key => "code");
1725
1726 with test.csv like
1727
1728 code,product,price,color
1729 1,pc,850,gray
1730 2,keyboard,12,white
1731 3,mouse,5,black
1732
1733 will return
1734
1735 { 1 => {
1736 code => 1,
1737 color => 'gray',
1738 price => 850,
1739 product => 'pc'
1740 },
1741 2 => {
1742 code => 2,
1743 color => 'white',
1744 price => 12,
1745 product => 'keyboard'
1746 },
1747 3 => {
1748 code => 3,
1749 color => 'black',
1750 price => 5,
1751 product => 'mouse'
1752 }
1753 }
1754
1755 The "key" attribute can be combined with "headers" for "CSV" data that
1756 has no header line, like
1757
1758 my $ref = csv (
1759 in => "foo.csv",
1760 headers => [qw( c_foo foo bar description stock )],
1761 key => "c_foo",
1762 );
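A runnable sketch of the "key" attribute, assuming the sample test.csv from above is supplied as an in-memory string instead of a file:

```perl
use strict;
use warnings;
use Text::CSV qw( csv );

my $data = <<"EOC";
code,product,price,color
1,pc,850,gray
2,keyboard,12,white
3,mouse,5,black
EOC

# The result is a hashref keyed on the "code" column instead of a list.
my $by_code = csv (in => \$data, key => "code");

printf "%s costs %s\n", $by_code->{2}{product}, $by_code->{2}{price};
```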
1763
1764 keep_headers
1765
1766 When using hashes, store the column names in the arrayref passed, so
1767 all headers are available after the call in the original order.
1768
1769 my $aoh = csv (in => "file.csv", keep_headers => \my @hdr);
1770
1771 This attribute can be abbreviated to "kh" or passed as
1772 "keep_column_names".
1773
1774 This attribute implies a default of "auto" for the "headers" attribute.
1775
1776 fragment
1777
1778 Only output the fragment as defined in the "fragment" method. This
1779 option is ignored when generating "CSV". See "out".
1780
1781 Combining all of them could give something like
1782
1783 use Text::CSV qw( csv );
1784 my $aoh = csv (
1785 in => "test.txt",
1786 encoding => "utf-8",
1787 headers => "auto",
1788 sep_char => "|",
1789 fragment => "row=3;6-9;15-*",
1790 );
1791 say $aoh->[15]{Foo};
1792
1793 sep_set
1794
1795 If "sep_set" is set, the method "header" is invoked on the opened
1796 stream to detect and set "sep_char" with the given set.
1797
1798 "sep_set" can be abbreviated to "seps".
1799
1800 Note that as the "header" method is invoked, its default is to also
1801 set the headers.
1802
1803 set_column_names
1804
1805 If "set_column_names" is passed, the method "header" is invoked on
1806 the opened stream with all arguments meant for "header".
1807
1808 If "set_column_names" is passed as a false value, the content of the
1809 first row is only preserved if the output is AoA:
1810
1811 With an input-file like
1812
1813 bAr,foo
1814 1,2
1815 3,4,5
1816
1817 This call
1818
1819 my $aoa = csv (in => $file, set_column_names => 0);
1820
1821 will result in
1822
1823 [[ "bar", "foo" ],
1824 [ "1", "2" ],
1825 [ "3", "4", "5" ]]
1826
1827 and
1828
1829 my $aoa = csv (in => $file, set_column_names => 0, munge => "none");
1830
1831 will result in
1832
1833 [[ "bAr", "foo" ],
1834 [ "1", "2" ],
1835 [ "3", "4", "5" ]]
1836
1837 Callbacks
1838 Callbacks enable actions triggered from the inside of Text::CSV.
1839
1840 While most of what this enables can easily be done in an unrolled
1841 loop as described in the "SYNOPSIS", callbacks can be used to meet
1842 special demands or enhance the "csv" function.
1843
1844 error
1845 $csv->callbacks (error => sub { $csv->SetDiag (0) });
1846
1847 The "error" callback is invoked when an error occurs, but only
1848 when "auto_diag" is set to a true value. A callback is invoked with
1849 the values returned by "error_diag":
1850
1851 my ($c, $s);
1852
1853 sub ignore3006
1854 {
1855 my ($err, $msg, $pos, $recno, $fldno) = @_;
1856 if ($err == 3006) {
1857 # ignore this error
1858 ($c, $s) = (undef, undef);
1859 Text::CSV->SetDiag (0);
1860 }
1861 # Any other error
1862 return;
1863 } # ignore3006
1864
1865 $csv->callbacks (error => \&ignore3006);
1866 $csv->bind_columns (\$c, \$s);
1867 while ($csv->getline ($fh)) {
1868 # Error 3006 will not stop the loop
1869 }
1870
1871 after_parse
1872 $csv->callbacks (after_parse => sub { push @{$_[1]}, "NEW" });
1873 while (my $row = $csv->getline ($fh)) {
1874 $row->[-1] eq "NEW";
1875 }
1876
1877 This callback is invoked after parsing with "getline" only if no
1878 error occurred. The callback is invoked with two arguments: the
1879 current "CSV" parser object and an array reference to the fields
1880 parsed.
1881
1882 The return code of the callback is ignored unless it is a reference
1883 to the string "skip", in which case the record will be skipped in
1884 "getline_all".
1885
1886 sub add_from_db
1887 {
1888 my ($csv, $row) = @_;
1889 $sth->execute ($row->[4]);
1890 push @$row, $sth->fetchrow_array;
1891 } # add_from_db
1892
1893 my $aoa = csv (in => "file.csv", callbacks => {
1894 after_parse => \&add_from_db });
1895
1896 This hook can be used for validation:
1897
1898 FAIL
1899 Die if any of the records does not validate a rule:
1900
1901 after_parse => sub {
1902 $_[1][4] =~ m/^[0-9]{4}\s?[A-Z]{2}$/ or
1903 die "5th field does not have a valid Dutch zipcode";
1904 }
1905
1906 DEFAULT
1907 Replace invalid fields with a default value:
1908
1909 after_parse => sub { $_[1][2] =~ m/^\d+$/ or $_[1][2] = 0 }
1910
1911 SKIP
1912 Skip records that have invalid fields (only applies to
1913 "getline_all"):
1914
1915 after_parse => sub { $_[1][0] =~ m/^\d+$/ or return \"skip"; }
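
The SKIP variant can be tried in isolation with in-memory data. This is only a sketch: the sample rows are invented, and it relies on "csv" reading the records through "getline_all", which is where the "skip" return value takes effect:

```perl
use strict;
use warnings;
use Text::CSV qw( csv );

# Hypothetical sample data; the second record has a non-numeric first field
my $data = "1,apple\nx,pear\n2,plum\n";

my $aoa = csv (
    in        => \$data,
    callbacks => {
        after_parse => sub {
            my ($csv, $row) = @_;
            # A reference to the string "skip" drops the record
            $row->[0] =~ m/^\d+$/ or return \"skip";
            return;
        },
    },
);

print join (",", @$_), "\n" for @$aoa;
```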
1916
1917 before_print
1918 my $idx = 1;
1919 $csv->callbacks (before_print => sub { $_[1][0] = $idx++ });
1920 $csv->print (*STDOUT, [ 0, $_ ]) for @members;
1921
1922 This callback is invoked before printing with "print" only if no
1923 error occurred. The callback is invoked with two arguments: the
1924 current "CSV" parser object and an array reference to the fields
1925 passed.
1926
1927 The return code of the callback is ignored.
1928
1929 sub max_4_fields
1930 {
1931 my ($csv, $row) = @_;
1932 @$row > 4 and splice @$row, 4;
1933 } # max_4_fields
1934
csv (in => csv (in => "file.csv"), out => *STDOUT,
    callbacks => { before_print => \&max_4_fields });
1937
1938 This callback is not active for "combine".
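
The difference between "print" and "combine" can be observed with an in-memory file handle. This is a sketch with made-up member names, not a reference implementation:

```perl
use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new ({ binary => 1, eol => "\n" })
    or die "Cannot use CSV: " . Text::CSV->error_diag ();

# Number the records as they are printed
my $idx = 1;
$csv->callbacks (before_print => sub { $_[1][0] = $idx++ });

# Print to an in-memory file so that the result can be inspected
my $out = "";
open my $fh, ">", \$out or die "in-memory file: $!";
$csv->print ($fh, [ 0, $_ ]) for qw( alice bob );
close $fh or die "in-memory file: $!";

# combine does not trigger before_print, so the leading 0 is kept
$csv->combine (0, "carol") or die "combine failed";
my $combined = $csv->string;

print $out, $combined;
```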
1939
1940 Callbacks for csv ()
1941
The "csv" function allows for some callbacks that do not integrate
with the XS internals and are available only through the "csv" function.
1944
1945 csv (in => "file.csv",
1946 callbacks => {
1947 filter => { 6 => sub { $_ > 15 } }, # first
1948 after_parse => sub { say "AFTER PARSE"; }, # first
1949 after_in => sub { say "AFTER IN"; }, # second
1950 on_in => sub { say "ON IN"; }, # third
1951 },
1952 );
1953
1954 csv (in => $aoh,
1955 out => "file.csv",
1956 callbacks => {
1957 on_in => sub { say "ON IN"; }, # first
1958 before_out => sub { say "BEFORE OUT"; }, # second
1959 before_print => sub { say "BEFORE PRINT"; }, # third
1960 },
1961 );
1962
1963 filter
1964 This callback can be used to filter records. It is called just after
1965 a new record has been scanned. The callback accepts a:
1966
1967 hashref
1968 The keys are the index to the row (the field name or field number,
1969 1-based) and the values are subs to return a true or false value.
1970
1971 csv (in => "file.csv", filter => {
1972 3 => sub { m/a/ }, # third field should contain an "a"
1973 5 => sub { length > 4 }, # length of the 5th field minimal 5
1974 });
1975
1976 csv (in => "file.csv", filter => { foo => sub { $_ > 4 }});
1977
1978 If the keys to the filter hash contain any character that is not a
1979 digit it will also implicitly set "headers" to "auto" unless
1980 "headers" was already passed as argument. When headers are
1981 active, returning an array of hashes, the filter is not applicable
1982 to the header itself.
1983
1984 All sub results should match, as in AND.
1985
1986 The context of the callback sets $_ localized to the field
1987 indicated by the filter. The two arguments are as with all other
1988 callbacks, so the other fields in the current row can be seen:
1989
1990 filter => { 3 => sub { $_ > 100 ? $_[1][1] =~ m/A/ : $_[1][6] =~ m/B/ }}
1991
1992 If the context is set to return a list of hashes ("headers" is
1993 defined), the current record will also be available in the
1994 localized %_:
1995
1996 filter => { 3 => sub { $_ > 100 && $_{foo} =~ m/A/ && $_{bar} < 1000 }}
1997
1998 If the filter is used to alter the content by changing $_, make
1999 sure that the sub returns true in order not to have that record
2000 skipped:
2001
2002 filter => { 2 => sub { $_ = uc }}
2003
2004 will upper-case the second field, and then skip it if the resulting
2005 content evaluates to false. To always accept, end with truth:
2006
2007 filter => { 2 => sub { $_ = uc; 1 }}
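
A small self-contained sketch that combines both uses of a hashref filter, altering one field and testing another. The data is invented for illustration; keys are 1-based field numbers as described above:

```perl
use strict;
use warnings;
use Text::CSV qw( csv );

# Hypothetical sample data
my $data = "a,5\nb,20\nc,17\n";

my $aoa = csv (
    in     => \$data,
    filter => {
        1 => sub { $_ = uc; 1 },    # rewrite field 1, always accept
        2 => sub { $_ > 15 },       # keep records whose field 2 exceeds 15
    },
);

print join (",", @$_), "\n" for @$aoa;
```

All subs have to return true for a record to be kept, so the first record is dropped by the test on field 2 even though the sub for field 1 accepted it.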
2008
2009 coderef
2010 csv (in => "file.csv", filter => sub { $n++; 0; });
2011
2012 If the argument to "filter" is a coderef, it is an alias or
2013 shortcut to a filter on column 0:
2014
2015 csv (filter => sub { $n++; 0 });
2016
2017 is equal to
2018
csv (filter => { 0 => sub { $n++; 0 } });
2020
2021 filter-name
2022 csv (in => "file.csv", filter => "not_blank");
2023 csv (in => "file.csv", filter => "not_empty");
2024 csv (in => "file.csv", filter => "filled");
2025
These are predefined filters.
2027
2028 Given a file like (line numbers prefixed for doc purpose only):
2029
2030 1:1,2,3
2031 2:
2032 3:,
2033 4:""
2034 5:,,
2035 6:, ,
2036 7:"",
2037 8:" "
2038 9:4,5,6
2039
2040 not_blank
2041 Filter out the blank lines
2042
2043 This filter is a shortcut for
2044
2045 filter => { 0 => sub { @{$_[1]} > 1 or
2046 defined $_[1][0] && $_[1][0] ne "" } }
2047
Due to the implementation, it is currently impossible to also
filter lines that consist only of a quoted empty field. These
lines are also considered blank lines.
2051
2052 With the given example, lines 2 and 4 will be skipped.
2053
2054 not_empty
2055 Filter out lines where all the fields are empty.
2056
2057 This filter is a shortcut for
2058
2059 filter => { 0 => sub { grep { defined && $_ ne "" } @{$_[1]} } }
2060
A space is not regarded as being empty, so given the example data,
lines 2, 3, 4, 5, and 7 are skipped.
2063
2064 filled
2065 Filter out lines that have no visible data
2066
2067 This filter is a shortcut for
2068
2069 filter => { 0 => sub { grep { defined && m/\S/ } @{$_[1]} } }
2070
This filter rejects all lines that do not have at least one field
that contains a non-whitespace character.
2073
2074 With the given example data, this filter would skip lines 2
2075 through 8.
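
To compare the three predefined filters side by side, the nine example lines can be fed to "csv" from a string. This is a sketch; the %kept hash is just a local helper, and the counts it reports should match the descriptions above (7, 4 and 2 records kept):

```perl
use strict;
use warnings;
use Text::CSV qw( csv );

# The nine example lines, without the line-number prefixes
my $data = join "", map { "$_\n" }
    '1,2,3', '', ',', '""', ',,', ', ,', '"",', '" "', '4,5,6';

my %kept;
for my $name (qw( not_blank not_empty filled )) {
    my $aoa = csv (in => \$data, filter => $name);
    $kept{$name} = scalar @$aoa;
    printf "%-9s keeps %d of 9 lines\n", $name, $kept{$name};
}
```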
2076
2077 after_in
2078 This callback is invoked for each record after all records have been
2079 parsed but before returning the reference to the caller. The hook is
2080 invoked with two arguments: the current "CSV" parser object and a
2081 reference to the record. The reference can be a reference to a
2082 HASH or a reference to an ARRAY as determined by the arguments.
2083
2084 This callback can also be passed as an attribute without the
2085 "callbacks" wrapper.
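
As an illustration, "after_in" can add a derived value to every record before the result is returned. The field names and data below are hypothetical:

```perl
use strict;
use warnings;
use Text::CSV qw( csv );

# Hypothetical sample data with a header line
my $data = "item,qty,price\npen,2,3\nink,1,10\n";

my $aoh = csv (
    in       => \$data,
    headers  => "auto",
    after_in => sub {
        my ($csv, $row) = @_;    # $row is a hashref because headers are set
        $row->{total} = $row->{qty} * $row->{price};
    },
);

printf "%s: %d\n", $_->{item}, $_->{total} for @$aoh;
```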
2086
2087 before_out
2088 This callback is invoked for each record before the record is
2089 printed. The hook is invoked with two arguments: the current "CSV"
2090 parser object and a reference to the record. The reference can be a
2091 reference to a HASH or a reference to an ARRAY as determined by the
2092 arguments.
2093
2094 This callback can also be passed as an attribute without the
2095 "callbacks" wrapper.
2096
2097 This callback makes the row available in %_ if the row is a hashref.
2098 In this case %_ is writable and will change the original row.
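
A sketch of "before_out" when writing a list of hashes. The records are invented, and the output goes to an in-memory file handle purely so that the result can be inspected. The record is changed here through the second argument; per the text above, %_ could be used instead:

```perl
use strict;
use warnings;
use Text::CSV qw( csv );

# Hypothetical records to write
my $rows = [
    { name => "alice", qty => 1 },
    { name => "bob",   qty => 2 },
];

my $out = "";
open my $fh, ">", \$out or die "in-memory file: $!";
csv (
    in         => $rows,
    out        => $fh,
    headers    => [qw( name qty )],
    before_out => sub {
        my ($csv, $row) = @_;
        $row->{name} = uc $row->{name};    # also changes the original row
    },
);
close $fh or die "in-memory file: $!";

print $out;
```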
2099
2100 on_in
2101 This callback acts exactly as the "after_in" or the "before_out"
2102 hooks.
2103
2104 This callback can also be passed as an attribute without the
2105 "callbacks" wrapper.
2106
2107 This callback makes the row available in %_ if the row is a hashref.
2108 In this case %_ is writable and will change the original row. So e.g.
2109 with
2110
2111 my $aoh = csv (
2112 in => \"foo\n1\n2\n",
2113 headers => "auto",
2114 on_in => sub { $_{bar} = 2; },
2115 );
2116
2117 $aoh will be:
2118
[ { foo => 1,
    bar => 2,
    },
  { foo => 2,
    bar => 2,
    },
  ]
2126
2127 csv
The function "csv" can also be called as a method or with an
existing Text::CSV object. This helps when the function is invoked
many times: passing an existing instance avoids the overhead of
creating the object internally on every call.
2133
2134 my $csv = Text::CSV->new ({ binary => 1, auto_diag => 1 });
2135
2136 my $aoa = $csv->csv (in => $fh);
2137 my $aoa = csv (in => $fh, csv => $csv);
2138
Both act the same. Running this 20000 times on a 20-line CSV file
showed a 53% speedup.
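
A sketch of reusing one object across several calls, using the function form with the "csv" attribute. The input chunks are made up, and no timing is attempted here:

```perl
use strict;
use warnings;
use Text::CSV qw( csv );

# One parser object, created once
my $csv = Text::CSV->new ({ binary => 1, auto_diag => 1 })
    or die "Cannot use CSV: " . Text::CSV->error_diag ();

# Hypothetical inputs, each a small CSV document
my @chunks = ("a,b\n1,2\n", "c,d\n3,4\n");

my @parsed;
for my $chunk (@chunks) {
    # Pass the existing instance instead of letting csv () build a new one
    push @parsed, csv (in => \$chunk, csv => $csv);
}

printf "%d documents, first field of last row: %s\n",
    scalar @parsed, $parsed[-1][-1][0];
```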
2141
2143 This section is also taken from Text::CSV_XS.
2144
2145 Still under construction ...
2146
2147 If an error occurs, "$csv->error_diag" can be used to get information
2148 on the cause of the failure. Note that for speed reasons the internal
2149 value is never cleared on success, so using the value returned by
2150 "error_diag" in normal cases - when no error occurred - may cause
2151 unexpected results.
2152
2153 If the constructor failed, the cause can be found using "error_diag" as
2154 a class method, like "Text::CSV->error_diag".
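
Because the internal value is not cleared on success, a reasonable pattern is to test the return value of the call first and consult "error_diag" only on failure. The sketch below uses an unterminated quoted field, which is expected to raise error 2027 from the list further down:

```perl
use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new ({ binary => 1 })
    or die "Cannot use CSV: " . Text::CSV->error_diag ();

# Hypothetical input: the quoted field is never closed
my $line = qq{1,"abc};

my $ok = $csv->parse ($line);

# Only look at error_diag when the call itself reported failure
my ($code, $msg, $pos) = $ok ? (0, "", 0) : $csv->error_diag ();

if ($ok) {
    print join ("|", $csv->fields ()), "\n";
}
else {
    print "parse failed: $code ($msg) at position $pos\n";
}
```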
2155
The "$csv->error_diag" method is automatically invoked upon error when
the constructor was called with "auto_diag" set to 1 or 2, or when
autodie is in effect. When set to 1, this will cause a "warn" with the
error message; when set to 2, it will "die". "2012 - EOF" is excluded
from "auto_diag" reports.
2161
2162 Errors can be (individually) caught using the "error" callback.
2163
2164 The errors as described below are available. I have tried to make the
2165 error itself explanatory enough, but more descriptions will be added.
2166 For most of these errors, the first three capitals describe the error
2167 category:
2168
2169 · INI
2170
2171 Initialization error or option conflict.
2172
2173 · ECR
2174
2175 Carriage-Return related parse error.
2176
2177 · EOF
2178
2179 End-Of-File related parse error.
2180
2181 · EIQ
2182
2183 Parse error inside quotation.
2184
2185 · EIF
2186
2187 Parse error inside field.
2188
2189 · ECB
2190
2191 Combine error.
2192
2193 · EHR
2194
2195 HashRef parse related error.
2196
2197 And below should be the complete list of error codes that can be
2198 returned:
2199
2200 · 1001 "INI - sep_char is equal to quote_char or escape_char"
2201
2202 The separation character cannot be equal to the quotation
2203 character or to the escape character, as this would invalidate all
2204 parsing rules.
2205
2206 · 1002 "INI - allow_whitespace with escape_char or quote_char SP or
2207 TAB"
2208
2209 Using the "allow_whitespace" attribute when either "quote_char" or
2210 "escape_char" is equal to "SPACE" or "TAB" is too ambiguous to
2211 allow.
2212
2213 · 1003 "INI - \r or \n in main attr not allowed"
2214
2215 Using default "eol" characters in either "sep_char", "quote_char",
2216 or "escape_char" is not allowed.
2217
2218 · 1004 "INI - callbacks should be undef or a hashref"
2219
The "callbacks" attribute must be either "undef" or a hash
reference.
2222
2223 · 1005 "INI - EOL too long"
2224
2225 The value passed for EOL is exceeding its maximum length (16).
2226
2227 · 1006 "INI - SEP too long"
2228
2229 The value passed for SEP is exceeding its maximum length (16).
2230
2231 · 1007 "INI - QUOTE too long"
2232
2233 The value passed for QUOTE is exceeding its maximum length (16).
2234
2235 · 1008 "INI - SEP undefined"
2236
2237 The value passed for SEP should be defined and not empty.
2238
2239 · 1010 "INI - the header is empty"
2240
2241 The header line parsed in the "header" is empty.
2242
2243 · 1011 "INI - the header contains more than one valid separator"
2244
2245 The header line parsed in the "header" contains more than one
2246 (unique) separator character out of the allowed set of separators.
2247
2248 · 1012 "INI - the header contains an empty field"
2249
The header line parsed in the "header" contains an empty field.
2251
· 1013 "INI - the header contains non-unique fields"
2253
2254 The header line parsed in the "header" contains at least two
2255 identical fields.
2256
2257 · 1014 "INI - header called on undefined stream"
2258
The header line cannot be parsed from an undefined source.
2260
2261 · 1500 "PRM - Invalid/unsupported argument(s)"
2262
2263 Function or method called with invalid argument(s) or parameter(s).
2264
2265 · 1501 "PRM - The key attribute is passed as an unsupported type"
2266
2267 The "key" attribute is of an unsupported type.
2268
2269 · 2010 "ECR - QUO char inside quotes followed by CR not part of EOL"
2270
2271 When "eol" has been set to anything but the default, like
2272 "\r\t\n", and the "\r" is following the second (closing)
2273 "quote_char", where the characters following the "\r" do not make up
2274 the "eol" sequence, this is an error.
2275
2276 · 2011 "ECR - Characters after end of quoted field"
2277
2278 Sequences like "1,foo,"bar"baz,22,1" are not allowed. "bar" is a
2279 quoted field and after the closing double-quote, there should be
2280 either a new-line sequence or a separation character.
2281
2282 · 2012 "EOF - End of data in parsing input stream"
2283
Self-explanatory. End-of-file was reached while parsing a stream.
This can happen only when reading from streams with "getline", as
"parse" operates on strings that are not required to have a trailing
"eol".
2288
2289 · 2013 "INI - Specification error for fragments RFC7111"
2290
Invalid URI "fragment" specification.
2292
2293 · 2014 "ENF - Inconsistent number of fields"
2294
2295 Inconsistent number of fields under strict parsing.
2296
2297 · 2021 "EIQ - NL char inside quotes, binary off"
2298
2299 Sequences like "1,"foo\nbar",22,1" are allowed only when the binary
2300 option has been selected with the constructor.
2301
2302 · 2022 "EIQ - CR char inside quotes, binary off"
2303
2304 Sequences like "1,"foo\rbar",22,1" are allowed only when the binary
2305 option has been selected with the constructor.
2306
2307 · 2023 "EIQ - QUO character not allowed"
2308
2309 Sequences like ""foo "bar" baz",qu" and "2023,",2008-04-05,"Foo,
2310 Bar",\n" will cause this error.
2311
2312 · 2024 "EIQ - EOF cannot be escaped, not even inside quotes"
2313
2314 The escape character is not allowed as last character in an input
2315 stream.
2316
2317 · 2025 "EIQ - Loose unescaped escape"
2318
2319 An escape character should escape only characters that need escaping.
2320
2321 Allowing the escape for other characters is possible with the
2322 attribute "allow_loose_escape".
2323
2324 · 2026 "EIQ - Binary character inside quoted field, binary off"
2325
Binary characters are not allowed by default. The exception is
fields that contain valid UTF-8, which are automatically upgraded.
Set "binary" to 1 to accept binary data.
2330
2331 · 2027 "EIQ - Quoted field not terminated"
2332
2333 When parsing a field that started with a quotation character, the
2334 field is expected to be closed with a quotation character. When the
2335 parsed line is exhausted before the quote is found, that field is not
2336 terminated.
2337
2338 · 2030 "EIF - NL char inside unquoted verbatim, binary off"
2339
2340 · 2031 "EIF - CR char is first char of field, not part of EOL"
2341
2342 · 2032 "EIF - CR char inside unquoted, not part of EOL"
2343
2344 · 2034 "EIF - Loose unescaped quote"
2345
2346 · 2035 "EIF - Escaped EOF in unquoted field"
2347
2348 · 2036 "EIF - ESC error"
2349
2350 · 2037 "EIF - Binary character in unquoted field, binary off"
2351
2352 · 2110 "ECB - Binary character in Combine, binary off"
2353
2354 · 2200 "EIO - print to IO failed. See errno"
2355
2356 · 3001 "EHR - Unsupported syntax for column_names ()"
2357
2358 · 3002 "EHR - getline_hr () called before column_names ()"
2359
2360 · 3003 "EHR - bind_columns () and column_names () fields count
2361 mismatch"
2362
2363 · 3004 "EHR - bind_columns () only accepts refs to scalars"
2364
2365 · 3006 "EHR - bind_columns () did not pass enough refs for parsed
2366 fields"
2367
2368 · 3007 "EHR - bind_columns needs refs to writable scalars"
2369
2370 · 3008 "EHR - unexpected error in bound fields"
2371
2372 · 3009 "EHR - print_hr () called before column_names ()"
2373
2374 · 3010 "EHR - print_hr () called with invalid arguments"
2375
2377 Text::CSV_PP, Text::CSV_XS and Text::CSV::Encoded.
2378
2380 Alan Citterman <alan[at]mfgrtl.com> wrote the original Perl module.
2381 Please don't send mail concerning Text::CSV to Alan, as he's not a
2382 present maintainer.
2383
2384 Jochen Wiedmann <joe[at]ispsoft.de> rewrote the encoding and decoding
2385 in C by implementing a simple finite-state machine and added the
2386 variable quote, escape and separator characters, the binary mode and
2387 the print and getline methods. See ChangeLog releases 0.10 through
2388 0.23.
2389
2390 H.Merijn Brand <h.m.brand[at]xs4all.nl> cleaned up the code, added the
2391 field flags methods, wrote the major part of the test suite, completed
2392 the documentation, fixed some RT bugs. See ChangeLog releases 0.25 and
2393 on.
2394
2395 Makamaka Hannyaharamitu, <makamaka[at]cpan.org> wrote Text::CSV_PP
2396 which is the pure-Perl version of Text::CSV_XS.
2397
2398 New Text::CSV (since 0.99) is maintained by Makamaka, and Kenichi
2399 Ishigaki since 1.91.
2400
2402 Text::CSV
2403
Copyright (C) 1997 Alan Citterman. All rights reserved. Copyright (C)
2007-2015 Makamaka Hannyaharamitu. Copyright (C) 2017- Kenichi
Ishigaki. A large portion of the doc is taken from Text::CSV_XS. See
below.
2408
2409 Text::CSV_PP:
2410
Copyright (C) 2005-2015 Makamaka Hannyaharamitu. Copyright (C) 2017-
Kenichi Ishigaki. A large portion of the code/doc is also taken from
Text::CSV_XS. See below.
2414
Text::CSV_XS:
2416
2417 Copyright (C) 2007-2016 H.Merijn Brand for PROCURA B.V. Copyright (C)
2418 1998-2001 Jochen Wiedmann. All rights reserved. Portions Copyright (C)
2419 1997 Alan Citterman. All rights reserved.
2420
2421 This library is free software; you can redistribute it and/or modify it
2422 under the same terms as Perl itself.
2423
2424
2425
2426perl v5.28.0 2018-08-17 Text::CSV(3)