Data::Record(3)       User Contributed Perl Documentation      Data::Record(3)

NAME
Data::Record - "split" on steroids

VERSION
Version 0.02

SYNOPSIS
    use Regexp::Common;
    use Data::Record;
    my $record = Data::Record->new({
        split  => "\n",
        unless => $RE{quoted},
    });
    my @data = $record->records($data);

DESCRIPTION
Sometimes we need data split into records, and a simple split on the
input record separator ($/) or some other value fails because the
values we're splitting on may be allowed in other parts of the data.
Perhaps they're quoted. Perhaps they're embedded in other data which
should not be split up.

This module allows you to specify what you wish to split the data on,
but also specify an "unless" regular expression. If the text in
question matches the "unless" regex, it will not be split there. This
allows us to do things like split on newlines unless newlines are
embedded in quotes.
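
To see the problem concretely, the following sketch uses only Perl's built-in split (not this module) on made-up sample data in which one record contains a quoted newline. The variable names and the data are illustrative assumptions.

```perl
use strict;
use warnings;

# Two logical records; the second has a newline inside double quotes.
my $data = qq{1,2,3\n4,"five\nsix",7};

# A plain split treats the quoted newline as a record boundary.
my @naive = split /\n/, $data;

printf "plain split produced %d pieces\n", scalar @naive;
print "  [$_]\n" for @naive;
```

The second logical record is cut in two at the quoted newline, which is the situation the "unless" regex is meant to prevent.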

METHODS
new
Common usage:

    my $record = Data::Record->new({
        split  => qr/$split/,
        unless => qr/$unless/,
    });

Advanced usage:

    my $record = Data::Record->new({
        split  => qr/$split/,
        unless => qr/$unless/,  # optional
        token  => $token,       # optional
        chomp  => 0,            # optional
        limit  => $limit,       # optional (do not use with trim)
        trim   => 1,            # optional (do not use with limit)
        fields => {
            split  => ',',
            unless => $RE{quoted}, # from Regexp::Common
        }
    });

The constructor takes a hashref of key/value pairs to set the behavior
of data records to be created.

• split

  This is the value to split the data on. It may be either a regular
  expression or a string.

  Defaults to the current input record separator ($/).

• unless

  Data will be split into records matching the split value unless
  they also match this value. No default.

  If you do not have an "unless" value, use of this module is
  overkill.

• token

  You will probably never need to set this value.

  Internally, this module attempts to find a token which does not
  match any text found in the data to be split and also does not
  match the split value. This is necessary because we mask the data
  we don't want to split using this token. This allows us to split
  the resulting text.

  In the unlikely event that the module cannot find a token which is
  not in the text, you may set the token value yourself to some
  string value. Do not set it to a regular expression.

• chomp

  By default, the split value is discarded (chomped) from each
  record. Set this to a false value to keep the split value on each
  record. This differs slightly from how it's done with split and
  capturing parentheses:

      split /(\,)/, '3,4,5';

  Ordinarily, this results in the following list:

      ( 3, ',', 4, ',', 5 )

  This module assumes you want those values with the preceding
  record. By setting chomp to false, you get the following list:

      ( '3,', '4,', 5 )

• limit

  The default split behavior is similar to this:

      split $split_regex, $data;

  Setting "limit" makes it behave like this instead:

      split $split_regex, $data, $limit;

  See "perldoc -f split" for more information about the behavior of
  "limit".

  You may not set both "limit" and "trim" in the constructor.

• trim

  By default, we return all records. This means that due to the
  nature of split and how we're doing things, we sometimes get a
  trailing null record. However, setting this value causes the
  module to behave as if we had done this:

      split $split_regex, $data, 0;

  When "split" is called with a zero as the third argument, trailing
  null values are discarded. See "perldoc -f split" for more
  information.

  You may not set both "limit" and "trim" in the constructor.

  Note: This does not trim white space around returned records.

• fields

  By default, individual records are returned as strings. If you set
  "fields", you pass in a hashref of arguments that are identical to
  what "new" would take, and the resulting records are returned as
  array references processed by a new "Data::Record" instance.

  Example: a quick CSV parser which assumes that commas and newlines
  may both be in quotes:

      use Regexp::Common;
      use Data::Record;

      # four lines, but there are only three records! (newline in quotes)
      my $data = <<'END_DATA';
      1,2,"programmer, perl",4,5
      1,2,"programmer,
      perl",4,5
      1,2,3,4,5
      END_DATA

      my $quoted = $RE{quoted};    # quoted-string pattern from Regexp::Common

      my $record = Data::Record->new({
          split  => "\n",
          unless => $quoted,
          trim   => 1,
          fields => {
              split  => ",",
              unless => $quoted,
          }
      });
      my @records = $record->records($data);
      foreach my $fields (@records) {
          foreach my $field (@$fields) {
              # do something
          }
      }

  Note that the above example will not remove the quotes from individual
  fields.
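
If unquoted values are wanted, the quotes can be stripped after the records have been returned. The sketch below is one possible approach, not something the module provides. It assumes each record is an array reference of field strings, as described for "fields", and that quoted fields are wrapped in plain double quotes with no escaped quotes inside. The @records list is built by hand so the snippet runs without the module installed.

```perl
use strict;
use warnings;

# Hand-built stand-in for the array references that records() is
# described as returning when "fields" is set (an assumption made
# so that this snippet is self-contained).
my @records = (
    [ 1, 2, '"programmer, perl"',    4, 5 ],
    [ 1, 2, qq{"programmer,\nperl"}, 4, 5 ],
    [ 1, 2, 3,                       4, 5 ],
);

foreach my $fields (@records) {
    # Strip one leading and one trailing double quote from each field.
    # The /s modifier lets "." match the embedded newline.
    s/\A"(.*)"\z/$1/s for @$fields;
}

print join('|', @$_), "\n" for @records;
```

Fields that are not wrapped in double quotes are left unchanged.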

split
    my $split = $record->split;
    $record->split($on_value);

Getter/setter for split value. May be a regular expression or a scalar
value.

unless
    my $unless = $record->unless;
    $record->unless($is_value);

Getter/setter for unless value. May be a regular expression or a
scalar value.

chomp
    my $chomp = $record->chomp;
    $record->chomp(0);

Getter/setter for boolean chomp value.
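
The two list shapes described for the chomp option under "new" can be reproduced with the built-in split alone. This is only an illustration of the difference between the shapes; it makes no claim about how Data::Record implements chomp internally.

```perl
use strict;
use warnings;

# Capturing parentheses make split return the separators as
# separate list elements.
my @with_captures = split /(,)/, '3,4,5';

# A zero-width lookbehind splits just after each comma instead, so
# the comma stays attached to the preceding item.
my @attached = split /(?<=,)/, '3,4,5';

print join(' | ', @with_captures), "\n";
print join(' | ', @attached), "\n";
```

The second list has the same shape as the one shown for a false chomp value.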

limit
    my $limit = $record->limit;
    $record->limit(3);

Getter/setter for integer limit value.

trim
    my $trim = $record->trim;
    $record->trim(1);

Getter/setter for boolean trim value. Setting this value will cause
any previous "limit" value to be overwritten.
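
Because "limit" and "trim" are both described in terms of the third argument to the built-in split, it may help to see how that argument behaves on its own. The sketch below uses only core Perl and invented sample data; it does not call Data::Record.

```perl
use strict;
use warnings;

my $data = "a,b,,";

my @default  = split /,/, $data;        # no limit given
my @zero     = split /,/, $data, 0;     # explicit zero limit
my @negative = split /,/, $data, -1;    # negative limit keeps trailing empties
my @two      = split /,/, $data, 2;     # at most two fields

printf "%-8s %d field(s)\n", $_->[0], scalar @{ $_->[1] }
    for [ default  => \@default ],
        [ zero     => \@zero ],
        [ negative => \@negative ],
        [ two      => \@two ];
```

With a zero or omitted limit the trailing empty fields are dropped, with a negative limit they are kept, and with a positive limit the remainder of the string stays in the last field. See "perldoc -f split" for the full rules.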

token
    my $token = $record->token;
    $record->token($string_not_found_in_text);

Getter/setter for token value. Token must be a string that does not
match the split value and is not found in the text.

You can return the current token value if you have set it in your code.
If you rely on this module to create a token (this is the normal
behavior), it is not available via this method until "records" is
called.

Setting the token to an undefined value causes Data::Record to try to
find a token itself.

If the token matches the split value, this method will croak when you
attempt to set the token.

If the token is found in the data, the "records" method will croak when
it is called.
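
The masking idea behind the token can be sketched in plain Perl. This is a simplified illustration under stated assumptions, not the module's actual implementation: the token string, the way it is chosen, the numbering scheme, and the simplified "unless" pattern (a stand-in for $RE{quoted}) are all hypothetical.

```perl
use strict;
use warnings;

my $data   = qq{a,"b\nc",d\ne,f,g\n};
my $split  = qr/\n/;
my $unless = qr/"[^"]*"/;    # simplified stand-in for $RE{quoted}

# Pick a token that does not occur in the data (hypothetical scheme).
my $token = '~~MASK~~';
$token .= '~' while index($data, $token) >= 0;

# 1. Replace every protected span with the token plus an index.
my @saved;
(my $masked = $data) =~ s{($unless)}{
    push @saved, $1;
    $token . $#saved . $token;
}ge;

# 2. Split the masked text; the protected newlines are now hidden.
my @records = split $split, $masked;

# 3. Put the protected spans back into each record.
s/\Q$token\E(\d+)\Q$token\E/$saved[$1]/g for @records;

print scalar(@records), " records\n";
```

The sketch also shows why the token must not appear in the data or match the split value: either case would corrupt the masking or the restore step.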

records
    my @records = $record->records($data);

Returns @records for $data based upon current split criteria.

BUGS
It's possible to get erroneous results if the split value is "/\d+/".
I've tried to work around this. Please let me know if there is a
problem.

CAVEATS
This module must read all of the data at once. This can make it slow
for larger data sets.
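
Since the whole input has to be in memory, data from a file is typically read in one go before being handed to "records". A minimal sketch of that step in core Perl follows; the in-memory filehandle and sample text are assumptions standing in for a real file.

```perl
use strict;
use warnings;

# An in-memory filehandle stands in for a real file here.
my $contents = "line one\nline two\n";
open my $fh, '<', \$contents or die "Cannot open: $!";

my $data = do {
    local $/;    # undefine the input record separator: read everything
    <$fh>;
};
close $fh;

print length($data), " characters read in one go\n";
```

The resulting $data string is what would then be passed to $record->records($data).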

AUTHOR
Curtis "Ovid" Poe, "<ovid [at] cpan [dot] org>"

BUGS
Please report any bugs or feature requests to
"bug-data-record@rt.cpan.org", or through the web interface at
<http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Data-Record>. I will
be notified, and then you'll automatically be notified of progress on
your bug as I make changes.

ACKNOWLEDGEMENTS
Thanks to the Monks for inspiration from
<http://perlmonks.org/index.pl?node_id=492002>.

0.02 Thanks to Smylers and Stefano Rodighiero for catching POD errors.

COPYRIGHT & LICENSE
Copyright 2005 Curtis "Ovid" Poe, all rights reserved.

This program is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.

perl v5.34.0                      2021-07-22                   Data::Record(3)