Data::Record(3)       User Contributed Perl Documentation      Data::Record(3)

NAME

       Data::Record - "split" on steroids

VERSION

       Version 0.02

SYNOPSIS

         use Regexp::Common;
         use Data::Record;
         my $record = Data::Record->new({
           split  => "\n",
           unless => $RE{quoted},
         });
         my @data = $record->records($data);

DESCRIPTION

       Sometimes we need data split into records, and a simple split on the
       input record separator ($/) or some other value fails because the
       values we're splitting on may be allowed in other parts of the data.
       Perhaps they're quoted.  Perhaps they're embedded in other data which
       should not be split up.

       This module allows you to specify what you wish to split the data on,
       but also specify an "unless" regular expression.  If the text in
       question matches the "unless" regex, it will not be split there.  This
       allows us to do things like split on newlines unless newlines are
       embedded in quotes.

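       As an illustration of the idea, the following is a minimal sketch in
       plain Perl that splits on newlines unless they fall inside double
       quotes.  It masks each protected span with a placeholder, splits, and
       then restores the spans.  The helper name split_unless, the '~~MASK~~'
       placeholder, and the simple quote pattern are assumptions made for this
       example; this is not the module's own code.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Data::Record: split $data on $split,
# but keep any text matching $unless in one piece.
sub split_unless {
    my ($data, $split, $unless, $token) = @_;

    # Replace every protected span with a numbered placeholder.
    my @protected;
    (my $masked = $data) =~ s{($unless)}{
        push @protected, $1;
        $token . $#protected . $token;
    }ge;

    # Split the masked text, then restore the protected spans.
    my @records = split $split, $masked;
    s{\Q$token\E(\d+)\Q$token\E}{$protected[$1]}g for @records;
    return @records;
}

my $data    = qq{a,"b\nc"\nd,e\n};
my @records = split_unless($data, qr/\n/, qr/"[^"]*"/, '~~MASK~~');
# @records holds two records: qq{a,"b\nc"} and 'd,e'
```

       A numbered placeholder like this one would itself be broken up by a
       split pattern such as /\d+/, so the sketch only suits separators that
       cannot match the placeholder.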

METHODS

   new
       Common usage:

        my $record = Data::Record->new({
           split  => qr/$split/,
           unless => qr/$unless/,
        });

       Advanced usage:

        my $record = Data::Record->new({
           split  => qr/$split/,
           unless => qr/$unless/,  # optional
           token  => $token,       # optional
           chomp  => 0,            # optional
           limit  => $limit,       # optional (do not use with trim)
           trim   => 1,            # optional (do not use with limit)
           fields => {
               split  => ',',
               unless => $RE{quoted}, # from Regexp::Common
           }
        });

       The constructor takes a hashref of key/value pairs to set the behavior
       of data records to be created.

       ·   split

           This is the value to split the data on.  It may be either a regular
           expression or a string.

           Defaults to the current input record separator ($/).

       ·   unless

           Data will be split into records matching the split value unless
           they also match this value.  No default.

           If you do not have an "unless" value, use of this module is
           overkill.

       ·   token

           You will probably never need to set this value.

           Internally, this module attempts to find a token which does not
           match any text found in the data to be split and also does not
           match the split value.  This is necessary because we mask the data
           we don't want to split using this token.  This allows us to split
           the resulting text.

           In the unlikely event that the module cannot find a token which is
           not in the text, you may set the token value yourself to some
           string value.  Do not set it to a regular expression.

       ·   chomp

           By default, the split value is discarded (chomped) from each
           record.  Set this to a false value to keep the split value on each
           record.  This differs slightly from how it's done with split and
           capturing parentheses:

             split /(\,)/, '3,4,5';

           Ordinarily, this results in the following list:

            ( 3, ',', 4, ',', 5 )

           This module assumes you want those values with the preceding
           record.  By setting chomp to false, you get the following list:

            ( '3,', '4,', 5 )

       ·   limit

           The default split behavior is similar to this:

            split $split_regex, $data;

           Setting "limit" will cause the behavior to act like this:

            split $split_regex, $data, $limit;

           See "perldoc -f split" for more information about the behavior of
           "limit".

           You may not set both "limit" and "trim" in the constructor.

       ·   trim

           By default, we return all records.  This means that due to the
           nature of split and how we're doing things, we sometimes get a
           trailing null record.  However, setting this value causes the
           module to behave as if we had done this:

            split $split_regex, $data, 0;

           When "split" is called with a zero as the third argument, trailing
           null values are discarded.  See "perldoc -f split" for more
           information.

           You may not set both "limit" and "trim" in the constructor.

           Note:  This does not trim white space around returned records.

       ·   fields

           By default, individual records are returned as strings.  If you set
           "fields", you pass in a hashref of arguments that are identical to
           what "new" would take and resulting records are returned as array
           references processed by a new "Data::Record" instance.

           Example:  a quick CSV parser which assumes that commas and newlines
           may both be in quotes:

            use Regexp::Common;
            use Data::Record;

            # four lines, but there are only three records! (newline in quotes)
            my $data = <<'END_DATA';
            1,2,"programmer, perl",4,5
            1,2,"programmer,
            perl",4,5
            1,2,3,4,5
            END_DATA

            my $record = Data::Record->new({
                split  => "\n",
                unless => $RE{quoted},
                trim   => 1,
                fields => {
                    split  => ",",
                    unless => $RE{quoted},
                }
            });
            my @records = $record->records($data);
            foreach my $fields (@records) {
              foreach my $field (@$fields) {
                # do something
              }
            }

           Note that the above example will not remove the quotes from
           individual fields.

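       The "chomp", "limit", and "trim" options above are all described in
       terms of Perl's built-in split.  The sketch below uses only split
       itself, without Data::Record, to show the behaviors those descriptions
       rely on.  The variable names are illustrative only.

```perl
use strict;
use warnings;

# Capturing parentheses make split return the separators as separate items.
my @with_seps = split /(,)/, '3,4,5';        # 5 items: 3 , 4 , 5

# Re-attaching each separator to the item before it gives the shape
# described for a false "chomp".
my @kept;
while (@with_seps) {
    my ($value, $sep) = splice @with_seps, 0, 2;
    push @kept, defined $sep ? $value . $sep : $value;
}
# @kept is ('3,', '4,', '5')

# A positive limit caps the number of items; the last one keeps the remainder.
my @limited = split /,/, 'a,b,c,d', 2;       # ('a', 'b,c,d')

# A negative limit keeps trailing empty items; zero (or no limit) drops them.
my @all     = split /,/, 'a,b,,', -1;        # ('a', 'b', '', '')
my @trimmed = split /,/, 'a,b,,', 0;         # ('a', 'b')
```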
   split
         my $split = $record->split;
         $record->split($on_value);

       Getter/setter for split value.  May be a regular expression or a scalar
       value.

   unless
         my $unless = $record->unless;
         $record->unless($is_value);

       Getter/setter for unless value.  May be a regular expression or a
       scalar value.

   chomp
         my $chomp = $record->chomp;
         $record->chomp(0);

       Getter/setter for boolean chomp value.

   limit
         my $limit = $record->limit;
         $record->limit(3);

       Getter/setter for integer limit value.

   trim
         my $trim = $record->trim;
         $record->trim(1);

       Getter/setter for boolean trim value.  Setting this value will cause
       any previous "limit" value to be overwritten.

   token
         my $token = $record->token;
         $record->token($string_not_found_in_text);

       Getter/setter for token value.  Token must be a string that does not
       match the split value and is not found in the text.

       You can return the current token value if you have set it in your code.
       If you rely on this module to create a token (this is the normal
       behavior), it is not available via this method until "records" is
       called.

       Setting the token to an undefined value causes Data::Record to try to
       find a token itself.

       If the token matches the split value, this method will croak when you
       attempt to set the token.

       If the token is found in the data, the "records" method will croak when
       it is called.

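       The two requirements on a token (absent from the data, not matched by
       the split value) can be checked with a few lines of plain Perl.  The
       sketch below is one possible way to pick a token by hand; the helper
       name find_token and the run-of-tildes candidates are assumptions for
       illustration and do not reflect how the module searches internally.

```perl
use strict;
use warnings;

# Hypothetical helper, not the module's own search: try candidate strings
# until one is absent from the data and is not matched by the split value.
sub find_token {
    my ($data, $split) = @_;
    for my $n (1 .. 100) {
        my $candidate = '~' x $n;
        next if index($data, $candidate) >= 0;   # already present in the data
        next if $candidate =~ $split;            # would be matched by the split
        return $candidate;
    }
    die "no usable token found; supply one explicitly\n";
}

my $token = find_token("a~b~~c\n", qr/\n/);
# $token is '~~~', the shortest run of tildes not present in the data
```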
   records
         my @records = $record->records($data);

       Returns @records for $data based upon current split criteria.

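       When "fields" is set, each returned record is an array reference and
       any quotes are left on the fields.  The following sketch shows one way
       to strip a single pair of surrounding double quotes afterwards.  The
       @records list here is written by hand as a stand-in for the return
       value, so the example does not depend on the module being installed.

```perl
use strict;
use warnings;

# Stand-in for the described return value when "fields" is set:
# one array reference per record. Written by hand, not produced by the module.
my @records = (
    [ 1, 2, '"programmer, perl"', 4, 5 ],
    [ 1, 2, 3, 4, 5 ],
);

# Remove one pair of surrounding double quotes from each field, if present.
for my $fields (@records) {
    s/\A"(.*)"\z/$1/s for @$fields;
}
# $records[0][2] is now 'programmer, perl'
```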

BUGS

       It's possible to get erroneous results if the split value is "/\d+/".
       I've tried to work around this.  Please let me know if there is a
       problem.

CAVEATS

       This module must read all of the data at once.  This can make it slow
       for larger data sets.

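       Because the data is passed as a single scalar, a file has to be read
       in full before splitting.  A common plain-Perl way to do that is to
       undefine $/ locally, as sketched here.  The slurp helper name and the
       in-memory filehandle are assumptions for the sake of a self-contained
       example.

```perl
use strict;
use warnings;

# Read an entire filehandle into one scalar by locally undefining $/.
sub slurp {
    my ($fh) = @_;
    local $/;
    my $contents = <$fh>;
    return $contents;
}

# An in-memory filehandle stands in for a real file here.
my $source = "line one\nline two\n";
open my $fh, '<', \$source or die "cannot open in-memory handle: $!";
my $data = slurp($fh);
close $fh;
# $data now holds both lines as one string
```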

AUTHOR

       Curtis "Ovid" Poe, "<ovid [at] cpan [dot] org>"

BUGS

       Please report any bugs or feature requests to
       "bug-data-record@rt.cpan.org", or through the web interface at
       <http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Data-Record>.  I will
       be notified, and then you'll automatically be notified of progress on
       your bug as I make changes.

ACKNOWLEDGEMENTS

       Thanks to the Monks for inspiration from
       <http://perlmonks.org/index.pl?node_id=492002>.

       0.02 Thanks to Smylers and Stefano Rodighiero for catching POD errors.

COPYRIGHT & LICENSE

       Copyright 2005 Curtis "Ovid" Poe, all rights reserved.

       This program is free software; you can redistribute it and/or modify it
       under the same terms as Perl itself.



perl v5.32.0                      2020-07-28                   Data::Record(3)