Text::RecordParser(3)  User Contributed Perl Documentation  Text::RecordParser(3)

Text::RecordParser - read record-oriented files

This documentation refers to version 1.2.1.

    use Text::RecordParser;

    # use default record (\n) and field (,) separators
    my $p = Text::RecordParser->new( $file );

    # or be explicit
    my $p = Text::RecordParser->new({
        filename        => $file,
        field_separator => "\t",
    });

    $p->filename('foo.csv');

    # Split records on two newlines
    $p->record_separator("\n\n");

    # Split fields on tabs
    $p->field_separator("\t");

    # Skip lines beginning with hashes
    $p->comment( qr/^#/ );

    # Trim whitespace
    $p->trim(1);

    # Use the fields in the first line as column names
    $p->bind_header;

    # Get a list of the header fields (in order)
    my @columns = $p->field_list;

    # Extract a particular field from the next row
    my ( $name, $age ) = $p->extract( qw[name age] );

    # Return all the fields from the next row
    my @fields = $p->fetchrow_array;

    # Define a field alias
    $p->set_field_alias( name => 'handle' );

    # Return all the fields from the next row as a hashref
    my $record = $p->fetchrow_hashref;
    print $record->{'name'};
    # or
    print $record->{'handle'};

    # Return the record as an object with fields as accessors
    my $object = $p->fetchrow_object;
    print $object->name; # or $object->handle;

    # Get all data as arrayref of arrayrefs
    my $data = $p->fetchall_arrayref;

    # Get all data as arrayref of hashrefs
    my $data = $p->fetchall_arrayref( { Columns => {} } );

    # Get all data as hashref of hashrefs
    my $data = $p->fetchall_hashref('name');

This module is for reading record-oriented data in a delimited text
file. The most common examples have records separated by newlines and
fields separated by commas or tabs, but this module aims to provide a
consistent interface for handling sequential records in a file however
they may be delimited. Typically this data lists the fields in the
first line of the file, in which case you should call "bind_header" to
bind the field names (if you do not, it is called implicitly when
needed). If the first line contains data, you can still bind your own
field names via "bind_fields". Either way, you can then use many
methods to get at the data as arrays or hashes.
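
As a minimal sketch of the workflow just described, the following reads
a small in-memory CSV through the "data" argument and walks the records
as hash references. The sample data and variable names are made up for
illustration, and the sketch assumes Text::RecordParser is installed.

```perl
use strict;
use warnings;
use Text::RecordParser;

# Hypothetical sample data; the first line holds the field names.
my $csv = "name,age\nAlice,30\nBob,25\n";

my $p = Text::RecordParser->new({ data => $csv });

# Bind the first line as the field names. This call could be left out,
# since fetchrow_hashref binds the header implicitly.
$p->bind_header;

my @people;
while ( my $rec = $p->fetchrow_hashref ) {
    push @people, "$rec->{name} is $rec->{age}";
}

print "$_\n" for @people;
```

If the file had no header line, a call to "bind_fields" with your own
names would take the place of "bind_header" here.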

new

This is the object constructor. It takes a hash (or hashref) of
arguments. Each argument can also be set through the method of the same
name.

* filename
  The path to the file being read. If a filename is passed and "fh" is
  not, a filehandle is opened on that file and "fh" is set accordingly.

* comment
  A compiled regular expression identifying comment lines that should
  be skipped.

* data
  The data to read.

* fh
  The filehandle of the file to read.

* field_separator | fs
  The field separator (default is comma).

* record_separator | rs
  The record separator (default is newline).

* field_filter
  A callback applied to all the fields as they are read.

* header_filter
  A callback applied to the column names.

* trim
  Boolean to enable trimming of leading and trailing whitespace from
  fields (useful if splitting on whitespace only).

See the method of the same name for more information on each argument.

Alternatively, if you supply a single argument to "new", it will be
treated as the "filename" argument.

bind_fields

    $p->bind_fields( qw[ name rank serial_number ] );

Takes an array of field names and memorizes the field positions for
later use. If the input file has no header line but you still wish to
retrieve the fields by name (or even if you want to call "bind_header"
and then give your own field names), simply pass in an array of the
field names you wish to use.

Pass in an empty array reference to unset:

    $p->bind_fields( [] ); # unsets fields
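
To illustrate binding names to a file that has no header line, here is
a small sketch. The records are invented sample data, and the sketch
assumes Text::RecordParser is installed.

```perl
use strict;
use warnings;
use Text::RecordParser;

# Hypothetical headerless data: every line is a record.
my $text = "Pyle,Private,12345\nCarter,Sergeant,67890\n";

my $p = Text::RecordParser->new({ data => $text });

# Supply the field names ourselves, in column order.
$p->bind_fields( qw[ name rank serial_number ] );

# Pull two named fields out of the first record.
my ( $name, $rank ) = $p->extract( qw[ name rank ] );

print "$rank $name\n";
```

Because the names are bound by hand, the first line is treated as data
rather than being consumed as a header.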

bind_header

    $p->bind_header;
    my $name = $p->extract('name');

Reads the next row under the cursor and uses its values as the field
names. Usually you would call this immediately after opening the file
in order to bind the field names in the first row.

comment

    $p->comment( qr/^#/ );  # Perl-style comments
    $p->comment( qr/^--/ ); # SQL-style comments

Takes a regex to apply to a record to see if it looks like a comment to
skip.
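
A short sketch of comment skipping follows. The input is invented
sample data with comment lines both before the header and between
records, and the sketch assumes Text::RecordParser is installed.

```perl
use strict;
use warnings;
use Text::RecordParser;

# Hypothetical input with comment lines mixed in.
my $text = join '',
    "# sample file\n",
    "name,age\n",
    "# a comment between records\n",
    "Alice,30\n";

my $p = Text::RecordParser->new({ data => $text });

# Set the comment pattern before any row is read, so that the leading
# comment is not mistaken for the header line.
$p->comment( qr/^#/ );
$p->bind_header;

my $rows = $p->fetchall_arrayref( { Columns => {} } );

print scalar @$rows, " record(s) read\n";
```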

data

    $p->data( $string );
    $p->data( \$string );
    $p->data( @lines );
    $p->data( [$line1, $line2, $line3] );
    $p->data( IO::File->new('<data') );

Allows a scalar, scalar reference, glob, array, or array reference as
the thing to read instead of a file handle.

Passing a filehandle to "data" is not advised, because the entire
contents of the file are read at once rather than one record at a time,
as happens when the handle is set via "fh".

extract

    my ( $foo, $bar, $baz ) = $p->extract( qw[ foo bar baz ] );

Reads the next row and extracts the named fields from it. The field
names must correspond to the field names bound either via "bind_fields"
or "bind_header".

fetchrow_array

    my @values = $p->fetchrow_array;

Reads a row from the file and returns the fields as an array or, in
scalar context, as an array reference.

fetchrow_hashref

    my $record = $p->fetchrow_hashref;
    print "Name = ", $record->{'name'}, "\n";

Reads a line of the file and returns it as a hash reference. The keys
of the hashref are the field names bound via "bind_fields" or
"bind_header". If you do not bind fields prior to calling this method,
the "bind_header" method will be implicitly called for you.

fetchrow_object

    while ( my $object = $p->fetchrow_object ) {
        my $id   = $object->id;
        my $name = $object->naem; # <-- this will throw a runtime error
    }

This will return the next data record as a Text::RecordParser::Object
object that has read-only accessor methods of the field names and any
aliases. This allows you to enforce field names, further helping
ensure that your code is reading the input file correctly. That is, if
you are using the "fetchrow_hashref" method to read each line, you may
misspell the hash key and introduce a bug in your code. With this
method, Perl will throw an error if you attempt to read a field not
defined in the file's headers. Additionally, any defined field aliases
will be created as additional accessor methods.
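
The difference from a hash lookup can be sketched as follows, using
"eval" to trap the error raised by a misspelled accessor. The data is
invented for illustration, and the sketch assumes Text::RecordParser is
installed.

```perl
use strict;
use warnings;
use Text::RecordParser;

# Hypothetical sample data with a header line.
my $p = Text::RecordParser->new({ data => "id,name\n1,Alice\n" });
$p->bind_header;

my $object = $p->fetchrow_object;
my $name   = $object->name;    # a valid accessor

# A misspelled accessor is a missing method, so the call dies.
my $ok = eval { $object->naem; 1 };

print "name = $name\n";
print "misspelled accessor was caught\n" unless $ok;
```

With "fetchrow_hashref", the equivalent typo in a hash key would yield
an undefined value instead of an error.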

fetchall_arrayref

    my $records = $p->fetchall_arrayref;
    for my $record ( @$records ) {
        print "Name = ", $record->[0], "\n";
    }

    my $records = $p->fetchall_arrayref( { Columns => {} } );
    for my $record ( @$records ) {
        print "Name = ", $record->{'name'}, "\n";
    }

Like DBI's fetchall_arrayref, returns an arrayref of arrayrefs. Also
accepts optional "{ Columns => {} }" argument to return an arrayref of
hashrefs.

fetchall_hashref

    my $records = $p->fetchall_hashref('id');
    for my $id ( keys %$records ) {
        my $record = $records->{ $id };
        print "Name = ", $record->{'name'}, "\n";
    }

Like DBI's fetchall_hashref, this returns a hash reference of hash
references. The keys of the top-level hashref are the field values of
the field argument you supply. The field name you supply can be a field
created by a "field_compute".

fh

    open my $fh, '<', $file or die $!;
    $p->fh( $fh );

Gets or sets the filehandle of the file being read.

field_compute

A callback applied to the fields identified by position (or field name
if "bind_fields" or "bind_header" was called).

The callback will be passed two arguments:

1   The current field

2   A reference to all the other fields, either as an array or hash
    reference, depending on the method which you called.

If data looks like this:

    parent    children
    Mike      Greg,Peter,Bobby
    Carol     Marcia,Jane,Cindy

You could split the "children" field into an array reference with the
values like so:

    $p->field_compute( 'children', sub { [ split /,/, shift() ] } );

The field position or name doesn't actually have to exist, which means
you could create new, computed fields on the fly. E.g., if your data
looks like this:

    1,3,5
    32,4,1
    9,5,4

You could write a field_compute like this:

    $p->field_compute( 3,
        sub {
            my ( $cur, $others ) = @_;
            my $sum = 0;
            $sum += $_ for @$others;
            return $sum;
        }
    );

Field "3" will be created as the sum of the other fields. This allows
you to further write:

    my $data = $p->fetchall_arrayref;
    for my $rec ( @$data ) {
        print "$rec->[0] + $rec->[1] + $rec->[2] = $rec->[3]\n";
    }

Prints:

    1 + 3 + 5 = 9
    32 + 4 + 1 = 37
    9 + 5 + 4 = 18
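
The same idea works by name when records are fetched as hashes, in
which case the second argument to the callback is a hash reference.
The sketch below adds a computed "total" field; the field names and
sample data are invented, and Text::RecordParser is assumed to be
installed.

```perl
use strict;
use warnings;
use Text::RecordParser;

# Hypothetical sample data with a header line.
my $text = "item,price,qty\nwidget,2.50,4\ngadget,10,3\n";

my $p = Text::RecordParser->new({ data => $text });
$p->bind_header;

# "total" does not exist in the file, so it is created on the fly.
$p->field_compute( 'total',
    sub {
        my ( $cur, $others ) = @_;    # $others is a hashref here
        return $others->{'price'} * $others->{'qty'};
    }
);

my $first  = $p->fetchrow_hashref;
my $second = $p->fetchrow_hashref;

print "$_->{'item'}: $_->{'total'}\n" for $first, $second;
```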

field_filter

    $p->field_filter( sub { $_ = shift; ucfirst(lc($_)) } );

A callback which is applied to each field. The callback will be passed
the current value of the field. Whatever is passed back will become
the new value of the field. The above example capitalizes field
values. To unset the filter, pass in the empty string.

field_list

    $p->bind_fields( qw[ foo bar baz ] );
    my @fields = $p->field_list;
    print join ', ', @fields; # prints "foo, bar, baz"

Returns the fields bound via "bind_fields" (or "bind_header").

field_positions

    my %positions = $p->field_positions;

Returns a hash of the fields and their positions bound via
"bind_fields" (or "bind_header"). Mostly for internal use.

field_separator

    $p->field_separator("\t");     # splits fields on tabs
    $p->field_separator('::');     # splits fields on double colons
    $p->field_separator(qr/\s+/);  # splits fields on whitespace
    my $sep = $p->field_separator; # returns the current separator

Gets and sets the token to use as the field delimiter. The default is
a comma. Regular expressions can be specified using qr//.

filename

    $p->filename('/path/to/file.dat');

Gets or sets the complete path to the file to be read. If a file is
already opened, then the handle on it will be closed and a new one
opened on the new file.

get_field_aliases

    my @aliases = $p->get_field_aliases('name');

Returns the aliases defined for the given field via "set_field_alias".

header_filter

    $p->header_filter( sub { $_ = shift; s/\s+/_/g; lc $_ } );

A callback applied to column header names. The callback will be passed
the current value of the header. Whatever is returned will become the
new value of the header. The above example collapses spaces into a
single underscore and lowercases the letters. To unset a filter, pass
in the empty string.
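
As a rough sketch of how a header filter and a field filter can work
together, the following normalizes untidy column names and then
capitalizes the values. The sample data is invented, lexical variables
are used in the callbacks instead of $_, and Text::RecordParser is
assumed to be installed.

```perl
use strict;
use warnings;
use Text::RecordParser;

# Hypothetical data with untidy headers and inconsistent letter case.
my $text = "First Name,Home Town\nalice,PARIS\n";

my $p = Text::RecordParser->new({ data => $text });

# Turn "First Name" into "first_name" before the header is bound.
$p->header_filter( sub { my $h = shift; $h =~ s/\s+/_/g; lc $h } );
$p->bind_header;

# Installed after binding so that it only touches data rows here.
$p->field_filter( sub { my $v = shift; ucfirst lc $v } );

my @columns = $p->field_list;
my $record  = $p->fetchrow_hashref;

print join( ', ', @columns ), "\n";
print "$record->{'first_name'} lives in $record->{'home_town'}\n";
```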

record_separator

    $p->record_separator("\n//\n");
    $p->field_separator("\n");

Gets and sets the token to use as the record separator. The default is
a newline ("\n").

The above example would read a file that looks like this:

    field1
    field2
    field3
    //
    data1
    data2
    data3
    //

set_field_alias

    $p->set_field_alias({
        name => 'Moniker,handle',        # comma-separated string
        city => [ qw( town township ) ], # or anonymous arrayref
    });

Allows you to define alternate names for fields, e.g., sometimes your
input file calls city "town" or "township," sometimes a file uses
"Moniker" instead of "name."
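
A small sketch of aliases in use follows. The data and the alias names
are invented for illustration, and Text::RecordParser is assumed to be
installed.

```perl
use strict;
use warnings;
use Text::RecordParser;

# Hypothetical sample data with a header line.
my $p = Text::RecordParser->new({ data => "name,city\nAlice,Paris\n" });
$p->bind_header;

# Aliases are defined for fields that are already bound.
$p->set_field_alias({
    name => 'handle',
    city => [ qw( town ) ],
});

my @city_aliases = $p->get_field_aliases('city');
my $record       = $p->fetchrow_hashref;

# The record can be read through either the field name or its alias.
print "$record->{'handle'} lives in $record->{'town'}\n";
```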

trim

    my $trim_value = $p->trim(1);

Provide "true" argument to remove leading and trailing whitespace from
fields. Use a "false" argument to disable.
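
The effect of trimming can be sketched with comma-separated data whose
values carry stray padding. The sample data is invented, and
Text::RecordParser is assumed to be installed.

```perl
use strict;
use warnings;
use Text::RecordParser;

# Hypothetical headerless data with padding around some values.
my $p = Text::RecordParser->new({ data => "Alice , 30 \n Bob,25\n" });
$p->bind_fields( qw[ name age ] );

# Remove leading and trailing whitespace from every field.
$p->trim(1);

my $rows = $p->fetchall_arrayref;

print "[$_->[0]] [$_->[1]]\n" for @$rows;
```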

Ken Youens-Clark <kclark@cpan.org>

Thanks to the following:

* Benjamin Tilly
  For Text::xSV, the inspirado for this module

* Tim Bunce et al.
  For DBI, from which many of the methods were shamelessly stolen

* Tom Aldcroft
  For contributing code to make it easy to parse whitespace-delimited
  data

* Liya Ren
  For catching the column-ordering error when parsing with
  "no-headers"

* Sharon Wei
  For catching bug in "extract" that sets up infinite loops

* Lars Thegler
  For bug report on missing "script_files" arg in Build.PL

None known. Please use http://rt.cpan.org/ for reporting bugs.

Copyright (C) 2006 Ken Youens-Clark. All rights reserved.

This program is free software; you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by the
Free Software Foundation; version 2.

This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General
Public License for more details.

perl v5.8.8                       2007-05-17             Text::RecordParser(3)