Text::RecordParser(3pm)

Text::RecordParser(3) User Contributed Perl Documentation Text::RecordParser(3)

NAME

       Text::RecordParser - read record-oriented files

VERSION

       This documentation refers to version 1.3.0.

SYNOPSIS

         use Text::RecordParser;

         # use default record (\n) and field (,) separators
         my $p = Text::RecordParser->new( $file );

         # or be explicit
         my $p = Text::RecordParser->new({
             filename        => $file,
             field_separator => "\t",
         });

         $p->filename('foo.csv');

         # Split records on two newlines
         $p->record_separator("\n\n");

         # Split fields on tabs
         $p->field_separator("\t");

         # Skip lines beginning with hashes
         $p->comment( qr/^#/ );

         # Trim whitespace
         $p->trim(1);

         # Use the fields in the first line as column names
         $p->bind_header;

         # Get a list of the header fields (in order)
         my @columns = $p->field_list;

         # Extract a particular field from the next row
         my ( $name, $age ) = $p->extract( qw[name age] );

         # Return all the fields from the next row
         my @fields = $p->fetchrow_array;

         # Define a field alias
         $p->set_field_alias( name => 'handle' );

         # Return all the fields from the next row as a hashref
         my $record = $p->fetchrow_hashref;
         print $record->{'name'};
         # or
         print $record->{'handle'};

         # Return the record as an object with fields as accessors
         my $object = $p->fetchrow_object;
         print $object->name; # or $object->handle;

         # Get all data as arrayref of arrayrefs
         my $data = $p->fetchall_arrayref;

         # Get all data as arrayref of hashrefs
         my $data = $p->fetchall_arrayref( { Columns => {} } );

         # Get all data as hashref of hashrefs
         my $data = $p->fetchall_hashref('name');

DESCRIPTION

       This module is for reading record-oriented data in a delimited text
       file.  The most common examples have records separated by newlines
       and fields separated by commas or tabs, but this module aims to
       provide a consistent interface for handling sequential records in a
       file however they may be delimited.  Typically this data lists the
       fields in the first line of the file, in which case you should call
       "bind_header" to bind the field names (or not, and it will be called
       implicitly).  If the first line contains data, you can still bind
       your own field names via "bind_fields".  Either way, you can then use
       many methods to get at the data as arrays or hashes.

METHODS

   new
       This is the object constructor.  It takes a hash (or hashref) of
       arguments.  Each argument can also be set through the method of the
       same name.

       ·   filename

           The path to the file being read.  If the filename is passed and
           "fh" is not, a filehandle is opened on that file and "fh" is set
           accordingly.

       ·   comment

           A compiled regular expression identifying comment lines that should
           be skipped.

       ·   data

           The data to read.

       ·   fh

           The filehandle of the file to read.

       ·   field_separator | fs

           The field separator (default is comma).

       ·   record_separator | rs

           The record separator (default is newline).

       ·   field_filter

           A callback applied to all the fields as they are read.

       ·   header_filter

           A callback applied to the column names.

       ·   trim

           Boolean to enable trimming of leading and trailing whitespace from
           fields (useful if splitting on whitespace only).

       See the method of the same name for more information on each
       argument.

       Alternatively, if you supply a single argument to "new", it will be
       treated as the "filename" argument.

   bind_fields
         $p->bind_fields( qw[ name rank serial_number ] );

       Takes an array of field names and memorizes the field positions for
       later use.  If the input file has no header line but you still wish to
       retrieve the fields by name (or even if you want to call "bind_header"
       and then give your own field names), simply pass in an array of the
       field names you wish to use.

       Pass in an empty array reference to unset:

         $p->bind_fields( [] ); # unsets fields
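
       The idea of memorizing field positions can be sketched in plain
       Perl.  This is only an illustration of the concept, not the
       module's implementation; the sample values and the variable names
       (%position, %record) are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical field names and one record that is already split.
my @names  = qw( name rank serial_number );
my @values = ( 'Smith', 'Captain', '12345' );

# Remember the position of each field name.
my %position;
@position{@names} = ( 0 .. $#names );

# Look a field up by name through its remembered position.
my $rank = $values[ $position{'rank'} ];

# A hash slice pairs every name with its value in one step.
my %record;
@record{@names} = @values;

print "$rank / $record{'serial_number'}\n";
```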

   bind_header
         $p->bind_header;
         my $name = $p->extract('name');

       Takes the fields from the next row under the cursor and uses their
       values as the field names.  Usually you would call this immediately
       after opening the file in order to bind the field names in the first
       row.

   comment
         $p->comment( qr/^#/ );  # Perl-style comments
         $p->comment( qr/^--/ ); # SQL-style comments

       Takes a regex to apply to a record to see if it looks like a comment to
       skip.
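
       The effect of a comment pattern can be sketched without the module:
       any record matching the regex is dropped before it is parsed.  The
       sample records below are made up, and this is an illustration of
       the behavior described above rather than the module's own code.

```perl
use strict;
use warnings;

# Same pattern as the Perl-style example above.
my $comment = qr/^#/;

# Hypothetical records, already split on the record separator.
my @records = ( '# generated file', 'a,b,c', '#skip me', '1,2,3' );

# Keep only the records that do not look like comments.
my @kept = grep { $_ !~ $comment } @records;

print "$_\n" for @kept;
```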

   data
         $p->data( $string );
         $p->data( \$string );
         $p->data( @lines );
         $p->data( [$line1, $line2, $line3] );
         $p->data( IO::File->new('<data') );

       Allows a scalar, scalar reference, glob, array, or array reference as
       the thing to read instead of a file handle.

       It's not advised to pass a filehandle to "data", as the entire
       contents of the file will be read at once rather than one record at
       a time, which is what happens when you set the handle via "fh".

   extract
         my ( $foo, $bar, $baz ) = $p->extract( qw[ foo bar baz ] );

       Extracts a list of fields out of the last row read.  The field names
       must correspond to the field names bound either via "bind_fields" or
       "bind_header".

   fetchrow_array
         my @values = $p->fetchrow_array;

       Reads a row from the file and returns an array or array reference of
       the fields.

   fetchrow_hashref
         my $record = $p->fetchrow_hashref;
         print "Name = ", $record->{'name'}, "\n";

       Reads a line of the file and returns it as a hash reference.  The keys
       of the hashref are the field names bound via "bind_fields" or
       "bind_header".  If you do not bind fields prior to calling this method,
       the "bind_header" method will be implicitly called for you.

   fetchrow_object
         while ( my $object = $p->fetchrow_object ) {
             my $id   = $object->id;
             my $name = $object->naem; # <-- this will throw a runtime error
         }

       This will return the next data record as a Text::RecordParser::Object
       object that has read-only accessor methods of the field names and any
       aliases.  This allows you to enforce field names, further helping
       ensure that your code is reading the input file correctly.  That is, if
       you are using the "fetchrow_hashref" method to read each line, you may
       misspell the hash key and introduce a bug in your code.  With this
       method, Perl will throw an error if you attempt to read a field not
       defined in the file's headers.  Additionally, any defined field aliases
       will be created as additional accessor methods.

   fetchall_arrayref
         my $records = $p->fetchall_arrayref;
         for my $record ( @$records ) {
             print "Name = ", $record->[0], "\n";
         }

         my $records = $p->fetchall_arrayref( { Columns => {} } );
         for my $record ( @$records ) {
             print "Name = ", $record->{'name'}, "\n";
         }

       Like DBI's fetchall_arrayref, returns an arrayref of arrayrefs.  Also
       accepts optional "{ Columns => {} }" argument to return an arrayref of
       hashrefs.

   fetchall_hashref
         my $records = $p->fetchall_hashref('id');
         for my $id ( keys %$records ) {
             my $record = $records->{ $id };
             print "Name = ", $record->{'name'}, "\n";
         }

       Like DBI's fetchall_hashref, this returns a hash reference of hash
       references.  The keys of the top-level hashref are the field values of
       the field argument you supply.  The field name you supply can be a
       field created by a "field_compute".
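
       The shape of the returned structure can be pictured with a small
       plain-Perl sketch.  It builds a hash of hashes keyed on one field
       from rows that are already split; the field names and rows are
       sample data, and the code only illustrates the data structure, not
       how the module builds it.

```perl
use strict;
use warnings;

# Hypothetical bound field names and already-split rows.
my @fields = qw( id name );
my @rows   = ( [ 1, 'Mike' ], [ 2, 'Carol' ] );

# Key each record on the value of its "id" field.
my %by_id;
for my $row (@rows) {
    my %record;
    @record{@fields} = @$row;
    $by_id{ $record{'id'} } = \%record;
}

print "Name = $by_id{2}{'name'}\n";
```

       In this sketch a later row with the same key value replaces an
       earlier one, so the key field should be unique across records.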

   fh
         open my $fh, '<', $file or die $!;
         $p->fh( $fh );

       Gets or sets the filehandle of the file being read.

   field_compute
       A callback applied to the fields identified by position (or field name
       if "bind_fields" or "bind_header" was called).

       The callback will be passed two arguments:

       1.  The current field

       2.  A reference to all the other fields, either as an array or hash
           reference, depending on the method which you called.

       If data looks like this:

         parent    children
         Mike      Greg,Peter,Bobby
         Carol     Marcia,Jane,Cindy

       You could split the "children" field into an array reference with the
       values like so:

         $p->field_compute( 'children', sub { [ split /,/, shift() ] } );

       The field position or name doesn't actually have to exist, which means
       you could create new, computed fields on-the-fly.  E.g., if your data
       looks like this:

           1,3,5
           32,4,1
           9,5,4

       You could write a field_compute like this:

           $p->field_compute( 3,
               sub {
                   my ( $cur, $others ) = @_;
                   my $sum;
                   $sum += $_ for @$others;
                   return $sum;
               }
           );

       Field "3" will be created as the sum of the other fields.  This allows
       you to further write:

           my $data = $p->fetchall_arrayref;
           for my $rec ( @$data ) {
               print "$rec->[0] + $rec->[1] + $rec->[2] = $rec->[3]\n";
           }

       Prints:

           1 + 3 + 5 = 9
           32 + 4 + 1 = 37
           9 + 5 + 4 = 18

   field_filter
         $p->field_filter( sub { $_ = shift; uc $_ } );

       A callback which is applied to each field.  The callback will be passed
       the current value of the field.  Whatever is passed back will become
       the new value of the field.  The above example upper-cases field
       values.  To unset the filter, pass in the empty string.

   field_list
         $p->bind_fields( qw[ foo bar baz ] );
         my @fields = $p->field_list;
         print join ', ', @fields; # prints "foo, bar, baz"

       Returns the fields bound via "bind_fields" (or "bind_header").

   field_positions
         my %positions = $p->field_positions;

       Returns a hash of the fields and their positions bound via
       "bind_fields" (or "bind_header").  Mostly for internal use.

   field_separator
         $p->field_separator("\t");     # splits fields on tabs
         $p->field_separator('::');     # splits fields on double colons
         $p->field_separator(qr/\s+/);  # splits fields on whitespace
         my $sep = $p->field_separator; # returns the current separator

       Gets and sets the token to use as the field delimiter.  Regular
       expressions can be specified using qr//.  If not specified, it will
       take a guess based on the filename extension (comma for ".txt",
       ".dat", or ".csv"; tab for ".tab").  The default is a comma.

   filename
         $p->filename('/path/to/file.dat');

       Gets or sets the complete path to the file to be read.  If a file is
       already opened, then the handle on it will be closed and a new one
       opened on the new file.

   get_field_aliases
         my @aliases = $p->get_field_aliases('name');

       Returns the aliases defined for the given field name (see
       "set_field_alias").

   header_filter
         $p->header_filter( sub { $_ = shift; s/\s+/_/g; lc $_ } );

       A callback applied to column header names.  The callback will be passed
       the current value of the header.  Whatever is returned will become the
       new value of the header.  The above example replaces each run of
       whitespace with a single underscore and lowercases the letters.  To
       unset a filter, pass in the empty string.
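
       To see what such a filter does to typical column names, it can be
       run on its own in plain Perl.  The sketch below uses an equivalent
       callback written with a lexical copy instead of $_, and the raw
       header names are invented for the example.

```perl
use strict;
use warnings;

# Equivalent to the filter above, but working on a lexical copy so
# the caller's data is never modified in place.
my $filter = sub {
    my $header = shift;
    $header =~ s/\s+/_/g;    # each run of whitespace becomes one "_"
    return lc $header;
};

# Hypothetical raw column names from a header line.
my @raw   = ( 'First Name', 'Serial   Number', 'RANK' );
my @clean = map { $filter->($_) } @raw;

print join( ', ', @clean ), "\n";    # first_name, serial_number, rank
```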

   record_separator
         $p->record_separator("\n//\n");
         $p->field_separator("\n");

       Gets and sets the token to use as the record separator.  The default is
       a newline ("\n").

       The above example would read a file that looks like this:

         field1
         field2
         field3
         //
         data1
         data2
         data3
         //
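
       The same two-level split can be reproduced in plain Perl with the
       input record separator variable $/, which may help when reasoning
       about multi-line records.  This is a sketch of the technique using
       an in-memory filehandle, not a description of the module's
       internals.

```perl
use strict;
use warnings;

# The sample file from above, held in a string.
my $text = "field1\nfield2\nfield3\n//\ndata1\ndata2\ndata3\n//\n";
open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";

my @records;
{
    local $/ = "\n//\n";    # record separator
    while ( my $record = <$fh> ) {
        chomp $record;      # strips the trailing "\n//\n"
        push @records, [ split /\n/, $record ];    # field separator
    }
}
close $fh;

print scalar(@records), " records, last field: $records[1][2]\n";
```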

   set_field_alias
         $p->set_field_alias({
             name => 'Moniker,handle',        # comma-separated string
             city => [ qw( town township ) ], # or anonymous arrayref
         });

       Allows you to define alternate names for fields, e.g., sometimes your
       input file calls city "town" or "township", sometimes a file uses
       "Moniker" instead of "name".
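
       As a rough picture of what aliasing means for a fetched record,
       the plain-Perl sketch below expands both alias notations and
       copies each field's value under its alternate names.  The record
       contents are sample data, and the module may well implement
       aliases differently.

```perl
use strict;
use warnings;

# The alias specification from the example above.
my %alias_spec = (
    name => 'Moniker,handle',           # comma-separated string
    city => [ qw( town township ) ],    # or anonymous arrayref
);

# Hypothetical record keyed on the real field names.
my %record = ( name => 'Ken', city => 'Tucson' );

for my $field ( keys %alias_spec ) {
    my $spec    = $alias_spec{$field};
    my @aliases = ref $spec eq 'ARRAY' ? @$spec : split /,/, $spec;

    # Make the value reachable under every alias as well.
    $record{$_} = $record{$field} for @aliases;
}

print "$record{'handle'} lives in $record{'township'}\n";
```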

   trim
         my $trim_value = $p->trim(1);

       Provide a true argument to remove leading and trailing whitespace
       from fields.  Use a false argument to disable.

AUTHOR

       Ken Youens-Clark <kclark@cpan.org>

CREDITS

       Thanks to the following:

       ·   Benjamin Tilly

           For Text::xSV, the inspirado for this module

       ·   Tim Bunce et al.

           For DBI, from which many of the methods were shamelessly stolen

       ·   Tom Aldcroft

           For contributing code to make it easy to parse whitespace-delimited
           data

       ·   Liya Ren

           For catching the column-ordering error when parsing with
           "no-headers"

       ·   Sharon Wei

           For catching bug in "extract" that sets up infinite loops

       ·   Lars Thegler

           For bug report on missing "script_files" arg in Build.PL

BUGS

       None known.  Please use http://rt.cpan.org/ for reporting bugs.

LICENSE AND COPYRIGHT

       Copyright (C) 2006-9 Ken Youens-Clark.  All rights reserved.

       This program is free software; you can redistribute it and/or modify it
       under the terms of the GNU General Public License as published by the
       Free Software Foundation; version 2.

       This program is distributed in the hope that it will be useful, but
       WITHOUT ANY WARRANTY; without even the implied warranty of
       MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
       General Public License for more details.

perl v5.12.0                      2010-05-07             Text::RecordParser(3)