Text::RecordParser(3pm)

Text::RecordParser(3)  User Contributed Perl Documentation  Text::RecordParser(3)

NAME

       Text::RecordParser - read record-oriented files

VERSION

       This documentation refers to version 1.2.1.

SYNOPSIS

         use Text::RecordParser;

         # use default record (\n) and field (,) separators
         my $p = Text::RecordParser->new( $file );

         # or be explicit
         my $p = Text::RecordParser->new({
             filename        => $file,
             field_separator => "\t",
         });

         $p->filename('foo.csv');

         # Split records on two newlines
         $p->record_separator("\n\n");

         # Split fields on tabs
         $p->field_separator("\t");

         # Skip lines beginning with hashes
         $p->comment( qr/^#/ );

         # Trim whitespace
         $p->trim(1);

         # Use the fields in the first line as column names
         $p->bind_header;

         # Get a list of the header fields (in order)
         my @columns = $p->field_list;

         # Extract a particular field from the next row
         my ( $name, $age ) = $p->extract( qw[name age] );

         # Return all the fields from the next row
         my @fields = $p->fetchrow_array;

         # Define a field alias
         $p->set_field_alias( name => 'handle' );

         # Return all the fields from the next row as a hashref
         my $record = $p->fetchrow_hashref;
         print $record->{'name'};
         # or
         print $record->{'handle'};

         # Return the record as an object with fields as accessors
         my $object = $p->fetchrow_object;
         print $object->name; # or $object->handle;

         # Get all data as arrayref of arrayrefs
         my $data = $p->fetchall_arrayref;

         # Get all data as arrayref of hashrefs
         my $data = $p->fetchall_arrayref( { Columns => {} } );

         # Get all data as hashref of hashrefs
         my $data = $p->fetchall_hashref('name');

DESCRIPTION

       This module is for reading record-oriented data in a delimited text
       file.  The most common examples have records separated by newlines and
       fields separated by commas or tabs, but this module aims to provide a
       consistent interface for handling sequential records in a file however
       they may be delimited.  Typically this data lists the field names in
       the first line of the file, in which case you should call "bind_header"
       to bind the field names (if you don't, it will be called implicitly).
       If the first line contains data, you can still bind your own field
       names via "bind_fields".  Either way, you can then use many methods to
       get at the data as arrays or hashes.

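       The reading model described above can be illustrated with a minimal
       core-Perl sketch.  It does not use Text::RecordParser and is not its
       implementation; it assumes the documented defaults (records split on
       newlines, fields on commas) and hypothetical sample data, and pairs
       each data row with the names taken from the first line, roughly what
       "bind_header" followed by "fetchrow_hashref" gives you.

```perl
use strict;
use warnings;

# Hypothetical sample input: a header line followed by two data records.
my $text = "name,age\nMike,40\nCarol,38\n";

# Split into records on "\n", then each record into fields on ",".
# The -1 limit keeps trailing empty fields instead of dropping them.
my @records = map { [ split /,/, $_, -1 ] } split /\n/, $text;

# The first record supplies the field names.
my @header = @{ shift @records };

# Pair each remaining record with the field names (a hash slice).
my @rows;
for my $rec (@records) {
    my %row;
    @row{@header} = @$rec;
    push @rows, \%row;
}

print "$_->{name} is $_->{age}\n" for @rows;
```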

METHODS

       new

       This is the object constructor.  It takes a hash (or hashref) of
       arguments.  Each argument can also be set through the method of the
       same name.

       * filename
           The path to the file being read.  If the filename is passed and the
           fh is not, then it will open a filehandle on that file and set
           "fh" accordingly.

       * comment
           A compiled regular expression identifying comment lines that should
           be skipped.

       * data
           The data to read.

       * fh
           The filehandle of the file to read.

       * field_separator | fs
           The field separator (default is comma).

       * record_separator | rs
           The record separator (default is newline).

       * field_filter
           A callback applied to all the fields as they are read.

       * header_filter
           A callback applied to the column names.

       * trim
           Boolean to enable trimming of leading and trailing whitespace from
           fields (useful if splitting on whitespace only).

       See the method of the same name as each argument for more information.

       Alternatively, if you supply a single argument to "new", it will be
       treated as the "filename" argument.

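       The three calling conventions for "new" (a hash, a hashref, or a single
       filename) can be normalized as in the sketch below.  "normalize_args"
       is a hypothetical helper written for illustration; it is not part of
       Text::RecordParser and makes no claim about how the module handles its
       arguments internally.

```perl
use strict;
use warnings;

# Hypothetical helper: turn any of the three documented calling styles
# into a hashref of options.
sub normalize_args {
    my @args = @_;

    # A single plain (non-reference) argument is taken as the filename.
    return +{ filename => $args[0] } if @args == 1 && !ref( $args[0] );

    # A single hashref is copied so the caller's hash is left untouched.
    return +{ %{ $args[0] } } if @args == 1 && ref( $args[0] ) eq 'HASH';

    # Anything else is treated as a list of key/value pairs.
    my %opts = @args;
    return \%opts;
}

my $from_name = normalize_args('foo.csv');
my $from_ref  = normalize_args( { filename => 'foo.tab', field_separator => "\t" } );
my $from_list = normalize_args( filename => 'foo.csv', trim => 1 );
```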
       bind_fields

         $p->bind_fields( qw[ name rank serial_number ] );

       Takes an array of field names and memorizes the field positions for
       later use.  If the input file has no header line but you still wish to
       retrieve the fields by name (or even if you want to call "bind_header"
       and then give your own field names), simply pass in an array of the
       field names you wish to use.

       Pass in an empty array reference to unset:

         $p->bind_fields( [] ); # unsets fields

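       The position bookkeeping behind "bind_fields" can be pictured with a
       plain hash that maps each field name to its column index.  This is a
       core-Perl sketch with made-up data, not the module's code.

```perl
use strict;
use warnings;

# Caller-supplied names for a headerless record (hypothetical data).
my @names  = qw( name rank serial_number );
my @record = split /,/, 'Hawkeye,Captain,12345';

# Remember each field's position: name => column index.
my %position;
@position{@names} = ( 0 .. $#names );

# Fields can now be looked up by name rather than by index.
my $rank = $record[ $position{rank} ];
print "rank = $rank\n";
```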
       bind_header

         $p->bind_header;
         my $name = $p->extract('name');

       Reads the next row under the cursor and uses its values as the field
       names.  Usually you would call this immediately after opening the file
       in order to bind the field names in the first row.

       comment

         $p->comment( qr/^#/ );  # Perl-style comments
         $p->comment( qr/^--/ ); # SQL-style comments

       Takes a regex that is applied to each record; records that match it
       are treated as comments and skipped.

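       Conceptually, comment handling is a filter over records.  The core-Perl
       sketch below (hypothetical sample lines, not the module's code) drops
       every record the pattern matches and keeps the rest.

```perl
use strict;
use warnings;

my $comment = qr/^#/;    # Perl-style comment pattern
my @lines   = ( '# generated file', 'a,1', '#b,2', 'c,3' );

# Keep only the records that do not look like comments.
my @data = grep { $_ !~ $comment } @lines;
print scalar(@data), " data records\n";
```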
       data

         $p->data( $string );
         $p->data( \$string );
         $p->data( @lines );
         $p->data( [$line1, $line2, $line3] );
         $p->data( IO::File->new('<data') );

       Allows a scalar, scalar reference, glob, array, or array reference as
       the thing to read instead of a file handle.

       Passing a filehandle to "data" is not advised: it reads the entire
       contents of the file at once, whereas a handle set via "fh" is read one
       line at a time.

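       The difference between slurping input and reading it record by record
       can be seen with core Perl alone.  The sketch below reads the same
       hypothetical string both ways; the in-memory filehandle (opened on a
       scalar reference, available since Perl 5.8) stands in for a real file
       and only illustrates the trade-off, not the module's internals.

```perl
use strict;
use warnings;

my $string = "a,1\nb,2\nc,3\n";

# Slurp: every record is held in memory at once.
my @all = split /\n/, $string;

# Stream: a filehandle yields one record per readline call.
open my $fh, '<', \$string or die "Cannot open in-memory handle: $!";
my @streamed;
while ( my $line = <$fh> ) {
    chomp $line;
    push @streamed, $line;
}
close $fh;

print "slurped ", scalar(@all), ", streamed ", scalar(@streamed), "\n";
```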
       extract

         my ( $foo, $bar, $baz ) = $p->extract( qw[ foo bar baz ] );

       Extracts a list of fields out of the last row read.  The field names
       must correspond to the field names bound either via "bind_fields" or
       "bind_header".

       fetchrow_array

         my @values = $p->fetchrow_array;

       Reads a row from the file and returns an array or array reference of
       the fields.

       fetchrow_hashref

         my $record = $p->fetchrow_hashref;
         print "Name = ", $record->{'name'}, "\n";

       Reads a line of the file and returns it as a hash reference.  The keys
       of the hashref are the field names bound via "bind_fields" or
       "bind_header".  If you do not bind fields prior to calling this method,
       the "bind_header" method will be implicitly called for you.

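       The implicit "bind_header" call can be pictured as an iterator that
       consumes the first row as field names on its first call.  "make_reader"
       below is a hypothetical core-Perl helper for illustration only; it
       splits on commas and does not reproduce the module's behaviour in
       detail.

```perl
use strict;
use warnings;

# Hypothetical iterator: binds the header on first use, then returns one
# hashref per call, or nothing when the input is exhausted.
sub make_reader {
    my @lines = @_;
    my @fields;
    return sub {
        # No field names bound yet: take them from the first row.
        @fields = split /,/, shift @lines if !@fields && @lines;

        my $line = shift @lines;
        return unless defined $line;

        my %record;
        @record{@fields} = split /,/, $line, -1;
        return \%record;
    };
}

my $next = make_reader( 'id,name', '1,Mike', '2,Carol' );
my @names;
while ( my $record = $next->() ) {
    push @names, $record->{name};
}
print "@names\n";
```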
       fetchrow_object

         while ( my $object = $p->fetchrow_object ) {
             my $id   = $object->id;
             my $name = $object->naem; # <-- this will throw a runtime error
         }

       This will return the next data record as a Text::RecordParser::Object
       object that has read-only accessor methods named after the field names
       and any defined field aliases.  This allows you to enforce field names,
       further helping ensure that your code is reading the input file
       correctly.  That is, if you are using the "fetchrow_hashref" method to
       read each line, you may misspell the hash key and introduce a bug in
       your code.  With this method, Perl will throw an error if you attempt
       to read a field not defined in the file's headers.

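       The benefit of accessors over hash keys can be shown with a tiny
       hand-rolled class.  "My::Record" is a hypothetical package written for
       this illustration, not Text::RecordParser::Object, and the way it
       installs accessors is an assumption made for brevity.

```perl
use strict;
use warnings;

package My::Record;    # hypothetical class for illustration

sub new {
    my ( $class, %fields ) = @_;

    # Install one read-only accessor per field name (once per class).
    for my $name ( keys %fields ) {
        next if $class->can($name);
        no strict 'refs';
        *{"${class}::$name"} = sub { $_[0]{$name} };
    }

    my $self = {%fields};
    return bless $self, $class;
}

package main;

my $rec = My::Record->new( id => 7, name => 'Mike' );
my $id  = $rec->id;

# A misspelled accessor dies at run time instead of quietly returning undef.
my $ok = eval { $rec->naem; 1 };
print "accessor error: $@" unless $ok;
```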
       fetchall_arrayref

         my $records = $p->fetchall_arrayref;
         for my $record ( @$records ) {
             print "Name = ", $record->[0], "\n";
         }

         my $records = $p->fetchall_arrayref( { Columns => {} } );
         for my $record ( @$records ) {
             print "Name = ", $record->{'name'}, "\n";
         }

       Like DBI's fetchall_arrayref, returns an arrayref of arrayrefs.  It
       also accepts an optional "{ Columns => {} }" argument to return an
       arrayref of hashrefs.

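       The two result shapes differ only in how each record is keyed.  In the
       core-Perl sketch below (hypothetical field names and data), positional
       records are turned into the hashref form that "{ Columns => {} }"
       requests; it illustrates the data shapes, not the module's code.

```perl
use strict;
use warnings;

my @fields  = qw( name age );
my $records = [ [ 'Mike', 40 ], [ 'Carol', 38 ] ];    # arrayref of arrayrefs

# Key each positional record by field name: arrayref of hashrefs.
my $as_hashes = [
    map {
        my %h;
        @h{@fields} = @$_;
        \%h;
    } @$records
];

print "Name = $_->{name}\n" for @$as_hashes;
```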
       fetchall_hashref

         my $records = $p->fetchall_hashref('id');
         for my $id ( keys %$records ) {
             my $record = $records->{ $id };
             print "Name = ", $record->{'name'}, "\n";
         }

       Like DBI's fetchall_hashref, this returns a hash reference of hash
       references.  The keys of the top-level hashref are the values of the
       field whose name you supply as the argument.  That field may also be
       one created by "field_compute".

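       Keying by a field amounts to building a hash from one column's values.
       The core-Perl sketch below uses hypothetical records; note that in this
       sketch a repeated key simply overwrites the earlier record, which says
       nothing about how the module treats duplicates.

```perl
use strict;
use warnings;

my @rows = (
    { id => 10, name => 'Mike' },
    { id => 20, name => 'Carol' },
);

# Index every record by the value of its "id" field.
my %by_id = map { ( $_->{id} => $_ ) } @rows;

for my $id ( sort { $a <=> $b } keys %by_id ) {
    print "Name = $by_id{$id}{name}\n";
}
```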
       fh

         open my $fh, '<', $file or die $!;
         $p->fh( $fh );

       Gets or sets the filehandle of the file being read.

       field_compute

       A callback applied to the fields identified by position (or field name
       if "bind_fields" or "bind_header" was called).

       The callback will be passed two arguments:

       1   The current field

       2   A reference to all the other fields, either as an array or hash
           reference, depending on the method which you called.

       If data looks like this:

         parent    children
         Mike      Greg,Peter,Bobby
         Carol     Marcia,Jane,Cindy

       You could split the "children" field into an array reference with the
       values like so:

         $p->field_compute( 'children', sub { [ split /,/, shift() ] } );

       The field position or name doesn't actually have to exist, which means
       you could create new, computed fields on-the-fly.  E.g., if your data
       looks like this:

           1,3,5
           32,4,1
           9,5,4

       You could write a field_compute like this:

           $p->field_compute( 3,
               sub {
                   my ( $cur, $others ) = @_;
                   my $sum;
                   $sum += $_ for @$others;
                   return $sum;
               }
           );

       Field "3" will be created as the sum of the other fields.  This allows
       you to further write:

           my $data = $p->fetchall_arrayref;
           for my $rec ( @$data ) {
               print "$rec->[0] + $rec->[1] + $rec->[2] = $rec->[3]\n";
           }

       Prints:

           1 + 3 + 5 = 9
           32 + 4 + 1 = 37
           9 + 5 + 4 = 18

       field_filter

         $p->field_filter( sub { ucfirst lc shift } );

       A callback which is applied to each field.  The callback will be passed
       the current value of the field.  Whatever is returned will become the
       new value of the field.  The above example capitalizes field values.
       To unset the filter, pass in the empty string.

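       A field filter is simply a function applied to every value as it is
       read.  This core-Perl sketch (hypothetical values, not the module's
       code) applies a capitalizing callback to a list of fields.

```perl
use strict;
use warnings;

# Capitalize: lowercase everything, then uppercase the first character.
my $filter = sub { ucfirst lc shift };

my @raw    = qw( mIKE CAROL greg );
my @fields = map { $filter->($_) } @raw;
print "@fields\n";
```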
       field_list

         $p->bind_fields( qw[ foo bar baz ] );
         my @fields = $p->field_list;
         print join ', ', @fields; # prints "foo, bar, baz"

       Returns the fields bound via "bind_fields" (or "bind_header").

       field_positions

         my %positions = $p->field_positions;

       Returns a hash of the fields and their positions bound via
       "bind_fields" (or "bind_header").  Mostly for internal use.

       field_separator

         $p->field_separator("\t");     # splits fields on tabs
         $p->field_separator('::');     # splits fields on double colons
         $p->field_separator(qr/\s+/);  # splits fields on whitespace
         my $sep = $p->field_separator; # returns the current separator

       Gets or sets the token to use as the field delimiter.  The default is
       a comma.  Regular expressions can be specified using qr//.

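       When splitting by hand, a plain-string separator has to be escaped
       before it can be used as a pattern, while a qr// separator can be used
       directly.  "split_fields" below is a hypothetical core-Perl helper
       that shows the distinction; it makes no claim about how the module
       itself handles separators.

```perl
use strict;
use warnings;

# Hypothetical helper: split one record on a string or qr// separator.
sub split_fields {
    my ( $line, $sep ) = @_;

    # Escape literal strings so "|" or "." are not treated as regex syntax.
    my $re = ref($sep) eq 'Regexp' ? $sep : qr/\Q$sep\E/;
    return split $re, $line, -1;
}

my @tabbed = split_fields( "a\tb\tc", "\t" );
my @colons = split_fields( 'a::b::c', '::' );
my @piped  = split_fields( 'a|b|c', '|' );
my @spaced = split_fields( 'a   b c', qr/\s+/ );
print "@piped\n";
```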
       filename

         $p->filename('/path/to/file.dat');

       Gets or sets the complete path to the file to be read.  If a file is
       already opened, then the handle on it will be closed and a new one
       opened on the new file.

       get_field_aliases

         my @aliases = $p->get_field_aliases('name');

       Returns the aliases defined for the given field name via
       "set_field_alias".

       header_filter

         $p->header_filter( sub { my $h = shift; $h =~ s/\s+/_/g; lc $h } );

       A callback applied to column header names.  The callback will be passed
       the current value of the header.  Whatever is returned will become the
       new value of the header.  The above example collapses each run of
       whitespace into a single underscore and lowercases the letters.  To
       unset a filter, pass in the empty string.

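       Applied to a hypothetical header row, a callback like the one above
       turns human-friendly column titles into hash-key-friendly names.  This
       is a core-Perl sketch, not the module's code.

```perl
use strict;
use warnings;

# Collapse each run of whitespace to "_" and lowercase the result.
my $header_filter = sub {
    my $name = shift;
    $name =~ s/\s+/_/g;
    return lc $name;
};

my @columns = map { $header_filter->($_) } ( 'First Name', "Zip \t Code", 'AGE' );
print join( ', ', @columns ), "\n";
```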
       record_separator

         $p->record_separator("\n//\n");
         $p->field_separator("\n");

       Gets or sets the token to use as the record separator.  The default is
       a newline ("\n").

       The above example would read a file that looks like this:

         field1
         field2
         field3
         //
         data1
         data2
         data3
         //

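       The same sample can be split by hand to show what the two separators
       do.  This core-Perl sketch hard-codes the sample text above, splits
       records on "\n//\n" and fields on "\n", and is not the module's code.

```perl
use strict;
use warnings;

my $text = "field1\nfield2\nfield3\n//\ndata1\ndata2\ndata3\n//\n";

# Records are separated by "\n//\n"; fields within a record by "\n".
# split drops the trailing empty string left after the last separator.
my @records = map { [ split /\n/, $_ ] } split m{\n//\n}, $text;

print join( ' | ', @$_ ), "\n" for @records;
```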
       set_field_alias

         $p->set_field_alias({
             name => 'Moniker,handle',        # comma-separated string
             city => [ qw( town township ) ], # or anonymous arrayref
         });

       Allows you to define alternate names for fields, e.g., sometimes your
       input file calls city "town" or "township," sometimes a file uses
       "Moniker" instead of "name."

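       Both alias formats can be reduced to a single alias-to-field lookup
       table.  The core-Perl sketch below is illustrative only: the table and
       variable names are hypothetical and do not describe how the module
       stores aliases.

```perl
use strict;
use warnings;

my %spec = (
    name => 'Moniker,handle',           # comma-separated string
    city => [ qw( town township ) ],    # or anonymous arrayref
);

# Normalize both forms to a list, then map each alias to its real field.
my %field_for;
for my $field ( keys %spec ) {
    my @aliases = ref( $spec{$field} )
        ? @{ $spec{$field} }
        : split /,/, $spec{$field};
    $field_for{$_} = $field for @aliases;
}

my %record = ( name => 'Mike', city => 'Springfield' );
my $town   = $record{ $field_for{town} };
my $handle = $record{ $field_for{handle} };
print "town -> $town, handle -> $handle\n";
```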
       trim

         my $trim_value = $p->trim(1);

       Provide a true argument to remove leading and trailing whitespace from
       fields.  Use a false argument to disable.

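       Field trimming can be sketched in core Perl as stripping whitespace from
       both ends of each value after the record is split.  The input is
       hypothetical and this is not the module's code.

```perl
use strict;
use warnings;

my $line = ' Mike , 40 ,Carol ';

# Split on commas, then remove whitespace at both ends of each field.
my @fields = map { ( my $f = $_ ) =~ s/^\s+|\s+$//g; $f } split /,/, $line;
print join( '|', @fields ), "\n";
```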

AUTHOR

       Ken Youens-Clark <kclark@cpan.org>

CREDITS

       Thanks to the following:

       * Benjamin Tilly
           For Text::xSV, the inspirado for this module

       * Tim Bunce et al.
           For DBI, from which many of the methods were shamelessly stolen

       * Tom Aldcroft
           For contributing code to make it easy to parse whitespace-delimited
           data

       * Liya Ren
           For catching the column-ordering error when parsing with
           "no-headers"

       * Sharon Wei
           For catching a bug in "extract" that set up infinite loops

       * Lars Thegler
           For the bug report on the missing "script_files" arg in Build.PL

BUGS

       None known.  Please use http://rt.cpan.org/ for reporting bugs.

LICENSE AND COPYRIGHT

       Copyright (C) 2006 Ken Youens-Clark.  All rights reserved.

       This program is free software; you can redistribute it and/or modify it
       under the terms of the GNU General Public License as published by the
       Free Software Foundation; version 2.

       This program is distributed in the hope that it will be useful, but
       WITHOUT ANY WARRANTY; without even the implied warranty of
       MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
       General Public License for more details.

perl v5.8.8                       2007-05-17             Text::RecordParser(3)