DateTime::Format::Builder(3)  User Contributed Perl Documentation  DateTime::Format::Builder(3)

NAME
DateTime::Format::Builder - Create DateTime parser classes and objects.

SYNOPSIS
    package DateTime::Format::Brief;

    our $VERSION = '0.07';

    use DateTime::Format::Builder (
        parsers => {
            parse_datetime => [
                {
                    regex  => qr/^(\d{4})(\d\d)(\d\d)(\d\d)(\d\d)(\d\d)$/,
                    params => [qw( year month day hour minute second )],
                },
                {
                    regex  => qr/^(\d{4})(\d\d)(\d\d)$/,
                    params => [qw( year month day )],
                },
            ],
        },
    );

    1;

DESCRIPTION
DateTime::Format::Builder creates DateTime parsers. Many string formats
of dates and times are simple and just require a basic regular
expression to extract the relevant information. Builder provides a
simple way to do this without writing reams of structural code.

Builder provides a number of methods, most of which you will rarely, if
ever, need. They are provided mainly to expose the module's innards to
subclasses, or for when you need to do something slightly beyond what I
expected.

TUTORIAL
See DateTime::Format::Builder::Tutorial.

ERROR HANDLING AND BAD PARSES
Often I will speak of "undef" being returned; however, that's not
strictly true.

When a simple single specification is given for a method, the method
isn't given a single parser directly. It's given a wrapper that will
call "on_fail()" if the single parser returns "undef". The single
parser must return "undef" so that a multiple parser can work nicely
and actual errors can be thrown from any of the callbacks.

Similarly, a multiple parser will only call "on_fail()" right at the
end, when it has tried all it could.

"on_fail()" (see later) is defined, by default, to throw an error.

Multiple parser specifications can also specify "on_fail" with a
coderef as an argument in the options block. This takes precedence
over the inheritable and overridable method.

That said, don't throw real errors from callbacks in multiple parser
specifications unless you really want parsing to stop right there and
not try any other parsers.

In summary: calling a method will result in either a "DateTime" object
being returned or an error being thrown (unless you've overridden
"on_fail()" or "create_method()", or you've specified an "on_fail" key
to a multiple parser specification).

Individual parsers (be they multiple parsers or single parsers) will
return either the "DateTime" object or "undef".

SINGLE SPECIFICATIONS
A single specification is a hashref of instructions on how to create a
parser.

The precise set of keys and values varies according to parser type.
There are some common ones, though:

· length is an optional parameter that can be used to specify that
  this particular regex is only applicable to strings of a certain
  fixed length. This can be used to make parsers more efficient. It's
  strongly recommended that any parser that can use this parameter
  does.

  You may happily specify the same length twice. The parsers will be
  tried in order of specification.

  You can also specify multiple lengths by giving it an arrayref of
  numbers rather than just a single scalar. If doing so, please keep
  the number of lengths to a minimum.

  If any specifications without lengths are given and the particular
  length parser fails, then the non-length parsers are tried.

  This parameter is ignored unless the specification is part of a
  multiple parser specification.

· label provides a name for the specification and is passed to some
  of the callbacks about to be mentioned.

· on_match and on_fail are callbacks. Both routines will be called
  with parameters of:

    · input, being the input to the parser (after any preprocessing
      callbacks).

    · label, being the label of the parser, if there is one.

    · self, being the object on which the method has been invoked
      (which may just be a class name). Naturally, you can then
      invoke your own methods on it to get any information you want.

    · args, being an arrayref of any arguments passed to the method.
      If there were no arguments, then this parameter is not given.

  These routines will be called depending on whether the regex match
  succeeded or failed.

· preprocess is a callback provided for cleaning up input prior to
  parsing. It's given a hash as arguments with the following keys:

    · input, being the datetime string the parser was given (if using
      multiple specifications and an overall preprocess, then this is
      the date after it's been through that preprocessor).

    · parsed, being the state of parsing so far. Usually empty at
      this point unless an overall preprocess was given. Items may be
      placed in it and will be given to any postprocessor and
      "DateTime->new" (unless the postprocessor deletes them).

    · self, args, label as per on_match and on_fail.

  The return value from the routine is what is given to the regex.
  Note that this is the last code stop before the match.

  Note: mixing length and a preprocess that modifies the length of
  the input string is probably not what you meant to do. You probably
  meant to use the multiple parser variant of preprocess, which is
  done before any length calculations. This "single parser" variant
  of preprocess is performed after any length calculations.

· postprocess is the last code stop before "DateTime->new()" is
  called. It's given the same arguments as preprocess. This allows it
  to modify the parsed parameters after the parse and before the
  creation of the object. For example, you might use:

      {
          regex       => qr/^(\d\d) (\d\d) (\d\d)$/,
          params      => [qw( year month day )],
          postprocess => \&_fix_year,
      }

  where "_fix_year" is defined as:

      sub _fix_year
      {
          my %args = @_;
          my ($date, $p) = @args{qw( input parsed )};
          # 69 or lower becomes 20xx; 70 or higher becomes 19xx.
          $p->{year} += $p->{year} > 69 ? 1900 : 2000;
          return 1;    # true: the parse succeeded
      }

  This will cause two-digit years to be corrected according to the
  cut-off. If the year was '69' or lower, 2000 is added to it, so it
  becomes 2069 (or 2045, or whatever the year was parsed as).
  Otherwise it is assumed to be 19xx. The DateTime::Format::Mail
  module uses code similar to this (only it allows the cut-off to be
  configured and it doesn't use Builder).

  Note: it is very important to return an explicit value from the
  postprocess callback. If the return value is false, then the parse
  is taken to have failed. If the return value is true, then the
  parse is taken to have succeeded and "DateTime->new()" is called.

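The callback arguments above can be illustrated with a minimal sketch. The subs _strip_spaces and _log_failure, the label 'compact' and the @failures array are hypothetical names chosen for this example; only the hash keys (input, label, parsed) come from the description above. Because callbacks are ordinary subs and a specification is ordinary data, the sketch runs without Builder itself.

```perl
use strict;
use warnings;

# Hypothetical preprocess callback: it receives the documented hash
# arguments (input, parsed, self, args, label) and its return value is
# the string the regex will see.
sub _strip_spaces {
    my %args = @_;
    ( my $input = $args{input} ) =~ s/\s+//g;
    return $input;
}

# Hypothetical on_fail callback: its return value is ignored, so it
# only records which labelled specification failed to match.
our @failures;

sub _log_failure {
    my %args  = @_;
    my $label = defined $args{label} ? $args{label} : 'unlabelled';
    push @failures, "$label: $args{input}";
    return;
}

# A single specification using both callbacks (hypothetical example).
my %spec = (
    label      => 'compact',
    regex      => qr/^(\d{4})(\d\d)(\d\d)$/,
    params     => [qw( year month day )],
    preprocess => \&_strip_spaces,
    on_fail    => \&_log_failure,
);
```

Under these assumptions, %spec would be supplied as, for example, parsers => { parse_datetime => \%spec }, and an input of "2003 06 21" would reach the regex as "20030621".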
See the documentation for the individual parsers for their valid keys.

Parsers at the time of writing are:

· DateTime::Format::Builder::Parser::Regex - provides regular
  expression based parsing.

· DateTime::Format::Builder::Parser::Strptime - provides strptime
  based parsing.

Subroutines / coderefs as specifications.

A single parser specification can be a coderef. This was added mostly
because it could be, and because I knew someone, somewhere, would want
to use it.

If the specification is a reference to a piece of code, be it a named
subroutine, an anonymous one, or whatever, then it's passed more or
less straight through. The code should return "undef" in the event of
failure (or any false value, but "undef" is strongly preferred), or a
true value in the event of success (ideally a "DateTime" object or
some object that has the same interface).

That said, I generally wouldn't recommend using this feature unless
you have to.

Callbacks

I mention a number of callbacks in this document.

Any time you see a callback being mentioned, you can, if you like,
substitute an arrayref of coderefs rather than having the straight
coderef.

MULTIPLE SPECIFICATIONS
These are very easily described as an array of single specifications.

Note that if the first element of the array is an arrayref, then
you're specifying options.

· preprocess lets you specify a preprocessor that is called before
  any of the parsers are tried. This lets you do things like strip
  off timezones or any unnecessary data. The most common use people
  have for it at present is to get the input date to a particular
  length so that the length is usable (DateTime::Format::ICal would
  use it to strip off the variable length timezone).

  Arguments are as for the single parser preprocess variant, with the
  exception that label is never given.

· on_fail should be a reference to a subroutine that is called if the
  parser fails. If this is not provided, the default action is to
  call "DateTime::Format::Builder::on_fail", or the "on_fail" method
  of the DateTime::Format::Builder subclass that was used to create
  the parser.

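A sketch of a multiple specification whose first element is an options arrayref. It assumes inputs such as 20030621T130000Z or 20030621; the preprocessor _strip_tz and its time_zone handling are illustrative assumptions, not something Builder provides. Trimming the zone in the overall preprocessor lets the length keys apply, and the value stored in the workspace hash is passed on to "DateTime->new()" as described for parsed in the single specification keys. The on_fail option replaces the default croak, so a failed parse should yield undef.

```perl
use strict;
use warnings;

# Hypothetical overall preprocessor: strips a trailing "Z" or a numeric
# offset such as +1000 and remembers it in the parsing workspace so it
# reaches DateTime->new() as time_zone.
sub _strip_tz {
    my %args = @_;
    my ( $input, $p ) = @args{qw( input parsed )};
    if ( $input =~ s/(Z|[+-]\d{4})$// ) {
        $p->{time_zone} = $1 eq 'Z' ? 'UTC' : $1;
    }
    return $input;
}

my %parsers = (
    parse_datetime => [
        # Leading arrayref: options for the multiple parser as a whole.
        [
            preprocess => \&_strip_tz,
            on_fail    => sub { return undef },    # do not croak
        ],
        {
            length => 15,                          # e.g. 20030621T130000
            regex  => qr/^(\d{4})(\d\d)(\d\d)T(\d\d)(\d\d)(\d\d)$/,
            params => [qw( year month day hour minute second )],
        },
        {
            length => 8,                           # e.g. 20030621
            regex  => qr/^(\d{4})(\d\d)(\d\d)$/,
            params => [qw( year month day )],
        },
    ],
);
```

The hash would be handed to Builder as parsers => \%parsers, either in the use line or via create_class().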
EXECUTION FLOW
Builder allows you to plug in a fair few callbacks, which can make
following how a parse failed (or succeeded unexpectedly) somewhat
tricky.

For Single Specifications

A single specification will do the following:

User calls parser:

    my $dt = $class->parse_datetime( $string );

1 preprocess is called. It's given $string and a reference to the
  parsing workspace hash, which we'll call $p. At this point, $p is
  empty. The return value is used as $date for the rest of this
  single parser. Anything put in $p is also used for the rest of this
  single parser.

2 regex is applied.

3 If regex did not match, then on_fail is called (and is given $date
  and also label if it was defined). Any return value is ignored and
  the single parser then returns "undef".

  If regex did match, then on_match is called with the same arguments
  as would be given to on_fail. The return value is similarly
  ignored, but we then move to step 4 rather than exiting the parser.

4 postprocess is called with $date and a filled-out $p. The return
  value is taken as an indication of whether the parse was a success
  or not. If it wasn't a success, then the single parser will exit at
  this point, returning "undef".

5 "DateTime->new()" is called and the user is given the resultant
  "DateTime" object.

See the section on error handling regarding the "undef"s mentioned
above.

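To make the five steps concrete, here is a simplified model in plain Perl. It is not Builder's implementation: run_single_spec is a hypothetical helper, the self and args arguments are omitted, and step 5 returns the parameter hash instead of calling DateTime->new() so that the sketch runs without DateTime installed.

```perl
use strict;
use warnings;

# Simplified model of the single-specification flow (not Builder's code).
# Returns a hashref of constructor parameters, or undef on failure.
sub run_single_spec {
    my ( $spec, $string ) = @_;
    my %p;    # the parsing workspace ($p in the text)
    my @label = defined $spec->{label} ? ( label => $spec->{label} ) : ();

    # Step 1: preprocess may rewrite the input and put values into %p.
    my $date = $spec->{preprocess}
      ? $spec->{preprocess}->( input => $string, parsed => \%p, @label )
      : $string;

    # Step 2: apply the regex.
    my @values = $date =~ $spec->{regex};

    # Step 3: on_fail or on_match; their return values are ignored.
    if ( !@values ) {
        $spec->{on_fail}->( input => $date, @label ) if $spec->{on_fail};
        return undef;
    }
    $spec->{on_match}->( input => $date, @label ) if $spec->{on_match};
    @p{ @{ $spec->{params} } } = @values;

    # Step 4: a false return from postprocess means the parse failed.
    if ( $spec->{postprocess} ) {
        $spec->{postprocess}->( input => $date, parsed => \%p, @label )
          or return undef;
    }

    # Step 5: Builder would call DateTime->new(%p) here.
    return \%p;
}
```

Feeding it the two-digit-year specification from the postprocess example, "03 06 21" comes back with year 2003, while "nonsense" triggers on_fail and yields undef.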
For Multiple Specifications

With multiple specifications:

User calls parser:

    my $dt = $class->complex_parse( $string );

1 The overall preprocessor is called and is given $string and the
  hashref $p (identically to the per-parser preprocess mentioned in
  the previous flow).

  If the callback modifies $p, then a copy of $p is given to each of
  the individual parsers. This is so parsers won't accidentally
  pollute each other's workspace.

2 If an appropriate length-specific parser is found, then it is
  called and the single parser flow (see the previous section) is
  followed, and the parser is given a copy of $p and the return value
  of the overall preprocessor as $date.

  If a "DateTime" object was returned, we go straight back to the
  user.

  If no appropriate parser was found, or the parser returned "undef",
  then we progress to step 3.

3 Any non-length-based parsers are tried in the order they were
  specified.

  For each of those, the single specification flow above is
  performed, and is given a copy of the output from the overall
  preprocessor.

  If a real "DateTime" object is returned, then we exit back to the
  user.

  If no parser could parse, then an error is thrown.

See the section on error handling regarding the "undef"s mentioned
above.

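Likewise, the multiple-specification flow can be modelled in plain Perl. Again this is a simplified sketch, not Builder's code: run_multiple_specs is hypothetical, each single parser is stood in for by a parse coderef returning a result or undef, every parser gets a copy of the workspace, and the final failure is a plain die where Builder would call on_fail().

```perl
use strict;
use warnings;

# Simplified model of the multiple-specification flow (not Builder's code).
# Each spec is { length => N or [N, ...] (optional), parse => CODE }, where
# parse->($date, \%copy_of_p) returns a result or undef.
sub run_multiple_specs {
    my ( $specs, $string, $preprocess ) = @_;
    my %p;

    # Step 1: the overall preprocessor runs before any length checks.
    my $date = $preprocess
      ? $preprocess->( input => $string, parsed => \%p )
      : $string;

    # Split parsers into length-keyed buckets and fallbacks, keeping order.
    my ( %by_length, @fallback );
    for my $spec (@$specs) {
        if ( defined $spec->{length} ) {
            my @lengths =
              ref $spec->{length} ? @{ $spec->{length} } : ( $spec->{length} );
            push @{ $by_length{$_} }, $spec for @lengths;
        }
        else {
            push @fallback, $spec;
        }
    }

    # Steps 2 and 3: parsers for this length first, then the rest,
    # each given its own copy of the workspace.
    my $len = length $date;
    for my $spec ( @{ $by_length{$len} || [] }, @fallback ) {
        my $result = $spec->{parse}->( $date, { %p } );
        return $result if defined $result;
    }

    # Nothing matched: Builder would call on_fail(), which croaks by default.
    die "Invalid date format: $string\n";
}
```

A string whose length has no dedicated parser goes straight to the fallback parsers, which is why the length key is worth giving wherever it applies.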
METHODS
In the general course of things you won't need any of the methods.
Life often throws unexpected things at us, so the methods are all
available for use.

import

"import()" is a wrapper for "create_class()". If you specify the class
option (see documentation for "create_class()") it will be ignored.

create_class

This method can be used as the runtime equivalent of "import()". That
is, it takes the exact same parameters as when one does:

    use DateTime::Format::Builder ( blah blah blah );

That can be (almost) equivalently written as:

    use DateTime::Format::Builder;
    DateTime::Format::Builder->create_class( blah blah blah );

The difference is that the first is done at compile time while the
second is done at run time.

In the tutorial I said there were only two parameters at present. I
lied. There are actually several of them.

· parsers takes a hashref of methods and their parser specifications.
  See the tutorial above for details.

  Note that if you define a subroutine of the same name as one of the
  methods you define here, an error will be thrown.

· constructor determines whether and how to create a "new()" function
  in the new class. If given a true value, a constructor is created.
  If given a false value, one isn't.

  If given an anonymous sub or a reference to a sub, then that is
  used as "new()".

  The default is 1 (that is, create a constructor using our default
  code, which simply creates a hashref and blesses it).

  If your class defines its own "new()" method, it will not be
  overwritten. If you define your own "new()" and also tell Builder to
  define one, an error will be thrown.

· verbose takes a value. If the value is undef, then logging is
  disabled. If the value is a filehandle, then that's where logging
  will go. If it's a true value, then output will go to "STDERR".

  Alternatively, call "$DateTime::Format::Builder::verbose()" with
  the relevant value. Whichever value is given more recently is
  adhered to.

  Be aware that verbosity is a global setting.

· class is optional and specifies the name of the class in which to
  create the specified methods.

  If using this method in the guise of "import()", then this field
  will cause an error, so it is only of use when calling as
  "create_class()".

· version is also optional and specifies the value to give $VERSION
  in the class. It's generally not recommended unless you're
  combining it with the class option. An "ExtUtils::MakeMaker" /
  "CPAN" compliant version specification is much better.

In addition to creating any of the methods, it also creates (unless
the constructor option says otherwise) a "new()" method that can
instantiate (or clone) objects.

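As a sketch of the runtime form, the following builds the SYNOPSIS parsers into a class named at run time. My::Format::Brief is a hypothetical class name and the version string is arbitrary; the require guard is only there so the example still runs on a machine where DateTime::Format::Builder is not installed.

```perl
use strict;
use warnings;

# Assumption: Builder may be missing, so load it at run time.
my $have_builder = eval { require DateTime::Format::Builder; 1 };

if ($have_builder) {
    # Runtime equivalent of "use DateTime::Format::Builder (...)",
    # installing the methods into a named class instead of the caller.
    DateTime::Format::Builder->create_class(
        class       => 'My::Format::Brief',    # hypothetical class name
        version     => '0.01',                 # arbitrary example value
        constructor => 1,                      # the default: generate new()
        parsers     => {
            parse_datetime => [
                {
                    regex  => qr/^(\d{4})(\d\d)(\d\d)(\d\d)(\d\d)(\d\d)$/,
                    params => [qw( year month day hour minute second )],
                },
                {
                    regex  => qr/^(\d{4})(\d\d)(\d\d)$/,
                    params => [qw( year month day )],
                },
            ],
        },
    );
}
```

Once created, My::Format::Brief->new->parse_datetime('20030621') should behave exactly like the compile-time class in the SYNOPSIS.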
SUBCLASSING
In the rest of the documentation I've often lied in order to get some
of the ideas across more easily. The thing is, this module's very
flexible. You can get markedly different behaviour from simply
subclassing it and overriding some methods.

create_method

Given a parser coderef, returns a coderef that is suitable to be a
method.

The default action is to call "on_fail()" in the event of a non-parse,
but you can make it do whatever you want.

on_fail

This is called in the event of a non-parse (unless you've overridden
"create_method()" to do something else).

The single argument is the input string. The default action is to call
"croak()". Above, where I've said parsers or methods throw errors, this
is the method that is doing the error throwing.

You could conceivably override this method to, say, return "undef".

USING BUILDER OBJECTS aka USERS USING BUILDER
The methods listed in the METHODS section are all you generally need
when creating your own class. Sometimes you may not want a full-blown
class to parse something just for this one program. Some methods are
provided to make that task easier.

new

The basic constructor. It takes no arguments and merely returns a new
"DateTime::Format::Builder" object.

    my $parser = DateTime::Format::Builder->new();

If called as a method on an object (rather than as a class method),
then it clones the object.

    my $clone = $parser->new();

clone

Provided for those who prefer an explicit "clone()" method rather than
using "new()" as an object method.

    my $clone_of_clone = $clone->clone();

parser

Given either a single or multiple parser specification, sets the object
to have a parser based on that specification.

    $parser->parser(
        regex  => qr/^ (\d{4}) \s (\d\d) \s (\d\d) $/x,
        params => [qw( year month day )],
    );

The arguments given to "parser()" are handed directly to
"create_parser()". The resultant parser is passed to "set_parser()".

If called as an object method, it returns the object.

If called as a class method, it creates a new object, sets its parser
and returns that object.

set_parser

Sets the parser of the object to the given parser.

    $parser->set_parser( $coderef );

Note: this method does not take specifications. It also does not take
anything except coderefs. Luckily, coderefs are what most of the other
methods produce.

The method return value is the object itself.

get_parser

Returns the parser the object is using.

    my $code = $parser->get_parser();

parse_datetime

Given a string, it calls the parser and returns the "DateTime" object
that results.

    my $dt = $parser->parse_datetime( "1979 07 16" );

The return value, if not a "DateTime" object, is whatever the parser
wants to return. Generally this means that if the parse failed, an
error will be thrown.

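A short sketch tying the object methods together, assuming DateTime::Format::Builder (and therefore DateTime) is installed; the regex and the sample strings are only illustrations, and the require guard exists only so the example runs where the module is absent. Because a failed parse ends in on_fail(), which croaks by default, the failing call is wrapped in eval.

```perl
use strict;
use warnings;

# Assumption: Builder may be missing, so load it at run time.
my $have_builder = eval { require DateTime::Format::Builder; 1 };
my ( $dt, $bad );

if ($have_builder) {
    # As a class method, parser() creates a new object and gives it a
    # parser built from this single specification.
    my $parser = DateTime::Format::Builder->parser(
        regex  => qr/^ (\d{4}) \s (\d\d) \s (\d\d) $/x,
        params => [qw( year month day )],
    );

    $dt = $parser->parse_datetime('1979 07 16');
    print $dt->ymd, "\n";

    # clone() copies the object, parser included.
    my $copy = $parser->clone;

    # The default on_fail() croaks, so trap failed parses with eval.
    $bad = eval { $copy->parse_datetime('16/07/1979') };
    print defined $bad ? "parsed\n" : "rejected: $@";
}
```

Note the \s tokens: under the /x modifier literal spaces in the pattern are ignored, so they are needed to match the spaces in "1979 07 16".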
format_datetime

If you call this function, it will throw an error.

EXAMPLES
Some longer examples are provided in the distribution. These implement
some of the common parsing DateTime modules using Builder. Each of them
is, or was, a drop-in replacement for the corresponding module at the
time of writing.

THANKS
Dave Rolsky (DROLSKY) for kickstarting the DateTime project, writing
DateTime::Format::ICal and DateTime::Format::MySQL, and some much
needed review.

Joshua Hoblitt (JHOBLITT) for the concept, some of the API, impetus for
writing the multilength code (both one length with multiple parsers and
single parser with multiple lengths), blame for the Regex custom
constructor code, spotting a bug in Dispatch, and more much needed
review.

Kellan Elliott-McCrea (KELLAN) for even more review, suggestions,
DateTime::Format::W3CDTF and the encouragement to rewrite these docs
almost 100%!

Claus Faerber (CFAERBER) for having me get around to fixing the
auto-constructor writing, providing the 'args'/'self' patch, and
suggesting the multi-callbacks.

Rick Measham (RICKM) for DateTime::Format::Strptime which Builder now
supports.

Matthew McGillis for pointing out that "on_fail" overriding should be
simpler.

Simon Cozens (SIMON) for saying it was cool.

SUPPORT
Support for this module is provided via the datetime@perl.org email
list. See http://lists.perl.org/ for more details.

Alternatively, report bugs via the CPAN RT system, either on the web or
by email:

    http://rt.cpan.org/NoAuth/ReportBug.html?Queue=DateTime%3A%3AFormat%3A%3ABuilder
    bug-datetime-format-builder@rt.cpan.org

This makes it much easier for me to track things and thus means your
problem is less likely to be neglected.

LICENCE AND COPYRIGHT
Copyright (C) Iain Truskett, 2003. All rights reserved.

This library is free software; you can redistribute it and/or modify it
under the same terms as Perl itself, either Perl version 5.000 or, at
your option, any later version of Perl 5 you may have available.

The full text of the licences can be found in the Artistic and COPYING
files included with this module, or in perlartistic and perlgpl as
supplied with Perl 5.8.1 and later.

AUTHOR
Originally written by Iain Truskett <spoon@cpan.org>, who died on
December 29, 2003.

Maintained by Dave Rolsky <autarch@urth.org>.

SEE ALSO
"datetime@perl.org" mailing list.

http://datetime.perl.org/

perl, DateTime, DateTime::Format::Builder::Tutorial,
DateTime::Format::Builder::Parser

perl v5.8.8                  2008-02-01          DateTime::Format::Builder(3)