DateTime::Format::Builder(3)  User Contributed Perl Documentation  DateTime::Format::Builder(3)

NAME
DateTime::Format::Builder - Create DateTime parser classes and objects.
7
SYNOPSIS
    package DateTime::Format::Brief;
    our $VERSION = '0.07';
    use DateTime::Format::Builder
    (
        parsers => {
            parse_datetime => [
                {
                    regex  => qr/^(\d{4})(\d\d)(\d\d)(\d\d)(\d\d)(\d\d)$/,
                    params => [qw( year month day hour minute second )],
                },
                {
                    regex  => qr/^(\d{4})(\d\d)(\d\d)$/,
                    params => [qw( year month day )],
                },
            ],
        }
    );
26
DESCRIPTION
DateTime::Format::Builder creates DateTime parsers. Many string
formats of dates and times are simple and just require a basic regular
expression to extract the relevant information. Builder provides a
simple way to do this without writing reams of structural code.
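As a rough illustration, the sketch below builds a two-format parser class and calls it as a class method. The package name "My::Format::Brief" and the sample strings are made up for this example; the layout of the "parsers" argument follows the synopsis.

```perl
use strict;
use warnings;

# Hypothetical class name, used only for this sketch.
package My::Format::Brief;

use DateTime::Format::Builder (
    parsers => {
        parse_datetime => [
            {
                # 14 digits: date and time
                regex  => qr/^(\d{4})(\d\d)(\d\d)(\d\d)(\d\d)(\d\d)$/,
                params => [qw( year month day hour minute second )],
            },
            {
                # 8 digits: date only
                regex  => qr/^(\d{4})(\d\d)(\d\d)$/,
                params => [qw( year month day )],
            },
        ],
    },
);

package main;

# Each regex capture is handed to DateTime->new() under the matching
# name from params.
my $full = My::Format::Brief->parse_datetime('20030716101112');
my $date = My::Format::Brief->parse_datetime('20030716');

print $full->iso8601, "\n";
print $date->ymd,     "\n";
```

The two specifications are tried in the order given, so the more specific pattern is listed first.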
32
33 Builder provides a number of methods, most of which you'll never need,
34 or at least rarely need. They're provided more for exposing of the
35 module's innards to any subclasses, or for when you need to do
36 something slightly beyond what I expected.
37
TUTORIAL
See DateTime::Format::Builder::Tutorial.
40
ERROR HANDLING AND BAD PARSES
Often, I will speak of "undef" being returned; however, that's not
strictly true.
44
45 When a simple single specification is given for a method, the method
46 isn't given a single parser directly. It's given a wrapper that will
47 call "on_fail()" if the single parser returns "undef". The single
48 parser must return "undef" so that a multiple parser can work nicely
49 and actual errors can be thrown from any of the callbacks.
50
51 Similarly, any multiple parsers will only call "on_fail()" right at the
52 end when it's tried all it could.
53
54 "on_fail()" (see later) is defined, by default, to throw an error.
55
56 Multiple parser specifications can also specify "on_fail" with a
57 coderef as an argument in the options block. This will take precedence
58 over the inheritable and over-ridable method.
59
60 That said, don't throw real errors from callbacks in multiple parser
61 specifications unless you really want parsing to stop right there and
62 not try any other parsers.
63
In summary: calling a method will result in either a "DateTime" object
being returned or an error being thrown (unless you've overridden
"on_fail()" or "create_method()", or you've specified an "on_fail" key
to a multiple parser specification).
68
69 Individual parsers (be they multiple parsers or single parsers) will
70 return either the "DateTime" object or "undef".
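A minimal sketch of the last point, assuming an "on_fail" coderef in the options block that simply declines to throw. The class name "My::Format::Lenient" is hypothetical, and the callback deliberately ignores whatever arguments it receives.

```perl
use strict;
use warnings;

package My::Format::Lenient;    # hypothetical name

use DateTime::Format::Builder (
    parsers => {
        parse_datetime => [
            # Options block: a leading arrayref in a multiple specification.
            [ on_fail => sub { return } ],    # do not throw on a failed parse
            {
                regex  => qr/^(\d{4})-(\d\d)-(\d\d)$/,
                params => [qw( year month day )],
            },
            {
                regex  => qr/^(\d{4})(\d\d)(\d\d)$/,
                params => [qw( year month day )],
            },
        ],
    },
);

package main;

my $good = My::Format::Lenient->parse_datetime('2003-07-16');
my $bad  = My::Format::Lenient->parse_datetime('sixteenth of July');

print defined $good ? $good->ymd : 'no parse', "\n";
print defined $bad  ? $bad->ymd  : 'no parse', "\n";
```

With the default "on_fail()" the second call would throw instead, so callers of a class like this have to check for "undef" themselves.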
71
SINGLE SPECIFICATIONS
A single specification is a hash ref of instructions on how to create a
parser.
75
76 The precise set of keys and values varies according to parser type.
77 There are some common ones though:
78
79 · length is an optional parameter that can be used to specify that
80 this particular regex is only applicable to strings of a certain
81 fixed length. This can be used to make parsers more efficient. It's
82 strongly recommended that any parser that can use this parameter
83 does.
84
85 You may happily specify the same length twice. The parsers will be
86 tried in order of specification.
87
88 You can also specify multiple lengths by giving it an arrayref of
89 numbers rather than just a single scalar. If doing so, please keep
90 the number of lengths to a minimum.
91
92 If any specifications without lengths are given and the particular
93 length parser fails, then the non-length parsers are tried.
94
95 This parameter is ignored unless the specification is part of a
96 multiple parser specification.
97
· label provides a name for the specification and is passed to some
of the callbacks about to be mentioned.
100
101 · on_match and on_fail are callbacks. Both routines will be called
102 with parameters of:
103
104 · input, being the input to the parser (after any preprocessing
105 callbacks).
106
107 · label, being the label of the parser, if there is one.
108
· self, being the object on which the method has been invoked
(which may just be a class name). Naturally, you can then
invoke your own methods on it to get the information you want.
112
113 · args, being an arrayref of any passed arguments, if any. If
114 there were no arguments, then this parameter is not given.
115
116 These routines will be called depending on whether the regex match
117 succeeded or failed.
118
119 · preprocess is a callback provided for cleaning up input prior to
120 parsing. It's given a hash as arguments with the following keys:
121
122 · input being the datetime string the parser was given (if using
123 multiple specifications and an overall preprocess then this is
124 the date after it's been through that preprocessor).
125
126 · parsed being the state of parsing so far. Usually empty at this
127 point unless an overall preprocess was given. Items may be
128 placed in it and will be given to any postprocessor and
129 "DateTime->new" (unless the postprocessor deletes it).
130
131 · self, args, label as per on_match and on_fail.
132
The return value from the routine is what is given to the regex.
Note that this is the last code stop before the match.
135
136 Note: mixing length and a preprocess that modifies the length of
137 the input string is probably not what you meant to do. You probably
138 meant to use the multiple parser variant of preprocess which is
139 done before any length calculations. This "single parser" variant
140 of preprocess is performed after any length calculations.
141
142 · postprocess is the last code stop before "DateTime->new()" is
143 called. It's given the same arguments as preprocess. This allows it
144 to modify the parsed parameters after the parse and before the
145 creation of the object. For example, you might use:
146
    {
        regex       => qr/^(\d\d) (\d\d) (\d\d)$/,
        params      => [qw( year month day )],
        postprocess => \&_fix_year,
    }

where "_fix_year" is defined as:

    sub _fix_year
    {
        my %args = @_;
        my ($date, $p) = @args{qw( input parsed )};
        $p->{year} += $p->{year} > 69 ? 1900 : 2000;
        return 1;
    }
162
163 This will cause the two digit years to be corrected according to
164 the cut off. If the year was '69' or lower, then it is made into
165 2069 (or 2045, or whatever the year was parsed as). Otherwise it is
166 assumed to be 19xx. The DateTime::Format::Mail module uses code
167 similar to this (only it allows the cut off to be configured and it
168 doesn't use Builder).
169
170 Note: It is very important to return an explicit value from the
171 postprocess callback. If the return value is false then the parse
172 is taken to have failed. If the return value is true, then the
173 parse is taken to have succeeded and "DateTime->new()" is called.
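The common keys can be combined in one class. The following is a sketch under assumptions: the class name "My::Format::Short", the helper "_note_match" and the labels are invented for illustration, and "_fix_year" is the routine shown above.

```perl
use strict;
use warnings;

package My::Format::Short;    # hypothetical name

our @matched;                 # labels of the specifications that matched

# Expand a two digit year around a cut off, as in _fix_year above.
sub _fix_year {
    my %args = @_;
    my $p    = $args{parsed};
    $p->{year} += $p->{year} > 69 ? 1900 : 2000;
    return 1;                 # a true value lets the parse succeed
}

# on_match callback: the label arrives as a named argument.
sub _note_match {
    my %args = @_;
    push @matched, $args{label};
    return;
}

use DateTime::Format::Builder (
    parsers => {
        parse_datetime => [
            {
                label       => 'two digit year',
                length      => 8,
                regex       => qr/^(\d\d) (\d\d) (\d\d)$/,
                params      => [qw( year month day )],
                on_match    => \&_note_match,
                postprocess => \&_fix_year,
            },
            {
                label    => 'four digit year',
                length   => 10,
                regex    => qr/^(\d{4}) (\d\d) (\d\d)$/,
                params   => [qw( year month day )],
                on_match => \&_note_match,
            },
        ],
    },
);

package main;

my @dates = map { My::Format::Short->parse_datetime($_)->ymd }
    ( '03 07 16', '79 07 16', '2003 07 16' );

print "$_\n" for @dates;
print join( ', ', @My::Format::Short::matched ), "\n";
```

Because each specification declares a length, only the specification whose length equals that of the input is tried first.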
174
175 See the documentation for the individual parsers for their valid keys.
176
177 Parsers at the time of writing are:
178
179 · DateTime::Format::Builder::Parser::Regex - provides regular
180 expression based parsing.
181
182 · DateTime::Format::Builder::Parser::Strptime - provides strptime
183 based parsing.
184
185 Subroutines / coderefs as specifications.
186 A single parser specification can be a coderef. This was added mostly
187 because it could be and because I knew someone, somewhere, would want
188 to use it.
189
190 If the specification is a reference to a piece of code, be it a
191 subroutine, anonymous, or whatever, then it's passed more or less
192 straight through. The code should return "undef" in event of failure
193 (or any false value, but "undef" is strongly preferred), or a true
194 value in the event of success (ideally a "DateTime" object or some
195 object that has the same interface).
196
197 This all said, I generally wouldn't recommend using this feature unless
198 you have to.
199
200 Callbacks
201 I mention a number of callbacks in this document.
202
203 Any time you see a callback being mentioned, you can, if you like,
204 substitute an arrayref of coderefs rather than having the straight
205 coderef.
206
MULTIPLE SPECIFICATIONS
These are very easily described as an array of single specifications.
209
210 Note that if the first element of the array is an arrayref, then you're
211 specifying options.
212
213 · preprocess lets you specify a preprocessor that is called before
214 any of the parsers are tried. This lets you do things like strip
215 off timezones or any unnecessary data. The most common use people
216 have for it at present is to get the input date to a particular
217 length so that the length is usable (DateTime::Format::ICal would
218 use it to strip off the variable length timezone).
219
220 Arguments are as for the single parser preprocess variant with the
221 exception that label is never given.
222
223 · on_fail should be a reference to a subroutine that is called if the
224 parser fails. If this is not provided, the default action is to
225 call "DateTime::Format::Builder::on_fail", or the "on_fail" method
226 of the subclass of DTFB that was used to create the parser.
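A sketch of an options block with an overall preprocess, loosely modelled on the timezone stripping described above. The class name "My::Format::Stamp" and the helper "_strip_zulu" are hypothetical; the preprocessor removes a trailing "Z" so that the length keys apply, and records the zone in the parsed hash.

```perl
use strict;
use warnings;

package My::Format::Stamp;    # hypothetical name

# Overall preprocessor: strip a trailing "Z" and remember it as UTC.
sub _strip_zulu {
    my %args = @_;
    my ( $date, $p ) = @args{qw( input parsed )};
    $p->{time_zone} = 'UTC' if $date =~ s/Z$//;
    return $date;             # this is the string the parsers are given
}

use DateTime::Format::Builder (
    parsers => {
        parse_datetime => [
            [ preprocess => \&_strip_zulu ],    # options block
            {
                length => 15,
                regex  => qr/^(\d{4})(\d\d)(\d\d)T(\d\d)(\d\d)(\d\d)$/,
                params => [qw( year month day hour minute second )],
            },
            {
                length => 8,
                regex  => qr/^(\d{4})(\d\d)(\d\d)$/,
                params => [qw( year month day )],
            },
        ],
    },
);

package main;

my $utc   = My::Format::Stamp->parse_datetime('20030716T101112Z');
my $plain = My::Format::Stamp->parse_datetime('20030716');

print $utc->iso8601,   ' ', $utc->time_zone->name,   "\n";
print $plain->iso8601, ' ', $plain->time_zone->name, "\n";
```

The input with a "Z" is 16 characters long, but the length check sees the 15 character string returned by the preprocessor.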
227
EXECUTION FLOW
Builder allows you to plug in a fair few callbacks, which can make
following how a parse failed (or succeeded unexpectedly) somewhat
tricky.
232
233 For Single Specifications
234 A single specification will do the following:
235
236 User calls parser:
237
238 my $dt = $class->parse_datetime( $string );
239
240 1. preprocess is called. It's given $string and a reference to the
241 parsing workspace hash, which we'll call $p. At this point, $p is
242 empty. The return value is used as $date for the rest of this
243 single parser. Anything put in $p is also used for the rest of
244 this single parser.
245
246 2. regex is applied.
247
248 3. If regex did not match, then on_fail is called (and is given $date
249 and also label if it was defined). Any return value is ignored and
250 the next thing is for the single parser to return "undef".
251
252 If regex did match, then on_match is called with the same arguments
253 as would be given to on_fail. The return value is similarly
254 ignored, but we then move to step 4 rather than exiting the parser.
255
4. postprocess is called with $date and a filled out $p. The return
value is taken as an indication of whether the parse was a success
or not. If it wasn't a success then the single parser will exit at
this point, returning undef.
260
261 5. "DateTime->new()" is called and the user is given the resultant
262 "DateTime" object.
263
264 See the section on error handling regarding the "undef"s mentioned
265 above.
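The single specification flow can be observed by logging from each callback. This is only a sketch: the class name "My::Format::Traced" and the @trace array are made up for the purpose.

```perl
use strict;
use warnings;

package My::Format::Traced;    # hypothetical name

our @trace;                    # callback names, in the order they ran

use DateTime::Format::Builder (
    parsers => {
        parse_datetime => {
            regex      => qr/^(\d{4})-(\d\d)-(\d\d)$/,
            params     => [qw( year month day )],
            preprocess => sub {
                my %args = @_;
                push @trace, 'preprocess';
                return $args{input};    # becomes the string the regex sees
            },
            on_match    => sub { push @trace, 'on_match' },
            on_fail     => sub { push @trace, 'on_fail' },
            postprocess => sub { push @trace, 'postprocess'; return 1 },
        },
    },
);

package main;

My::Format::Traced->parse_datetime('2003-07-16');
my @good_run = @My::Format::Traced::trace;

@My::Format::Traced::trace = ();

# The method-level on_fail() throws by default, so trap the error.
my $dt = eval { My::Format::Traced->parse_datetime('16 July 2003') };
my @bad_run = @My::Format::Traced::trace;

print "match: @good_run\n";
print "fail:  @bad_run\n";
```

On a failed match the postprocess callback never runs, which is why the second trace is shorter.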
266
267 For Multiple Specifications
268 With multiple specifications:
269
270 User calls parser:
271
272 my $dt = $class->complex_parse( $string );
273
274 1. The overall preprocessor is called and is given $string and the
275 hashref $p (identically to the per parser preprocess mentioned in
276 the previous flow).
277
278 If the callback modifies $p then a copy of $p is given to each of
279 the individual parsers. This is so parsers won't accidentally
280 pollute each other's workspace.
281
282 2. If an appropriate length specific parser is found, then it is
283 called and the single parser flow (see the previous section) is
284 followed, and the parser is given a copy of $p and the return value
285 of the overall preprocessor as $date.
286
If a "DateTime" object was returned, we go straight back to the
user.
289
290 If no appropriate parser was found, or the parser returned "undef",
291 then we progress to step 3!
292
293 3. Any non-length based parsers are tried in the order they were
294 specified.
295
296 For each of those the single specification flow above is performed,
297 and is given a copy of the output from the overall preprocessor.
298
299 If a real "DateTime" object is returned then we exit back to the
300 user.
301
302 If no parser could parse, then an error is thrown.
303
304 See the section on error handling regarding the "undef"s mentioned
305 above.
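The order in which a multiple specification tries its parsers can be sketched the same way. The class name "My::Format::Multi", the labels and the "_logger" helper are assumptions made for this example.

```perl
use strict;
use warnings;

package My::Format::Multi;    # hypothetical name

our @trace;

# Build an on_match/on_fail callback that logs "<event>:<label>".
sub _logger {
    my $event = shift;
    return sub { my %args = @_; push @trace, "$event:$args{label}" };
}

use DateTime::Format::Builder (
    parsers => {
        parse_datetime => [
            {
                label    => 'compact',
                length   => 8,
                regex    => qr/^(\d{4})(\d\d)(\d\d)$/,
                params   => [qw( year month day )],
                on_match => _logger('match'),
                on_fail  => _logger('fail'),
            },
            {
                label    => 'separated',
                regex    => qr/^(\d{4})\D(\d\d?)\D(\d\d?)$/,
                params   => [qw( year month day )],
                on_match => _logger('match'),
                on_fail  => _logger('fail'),
            },
            {
                label    => 'year only',
                regex    => qr/^(\d{4})$/,
                params   => [qw( year )],
                on_match => _logger('match'),
                on_fail  => _logger('fail'),
            },
        ],
    },
);

package main;

# Parse one string and report the date plus the callbacks that fired.
sub parse_and_trace {
    my $input = shift;
    @My::Format::Multi::trace = ();
    my $dt = My::Format::Multi->parse_datetime($input);
    return join ' | ', $dt->ymd, @My::Format::Multi::trace;
}

print parse_and_trace($_), "\n" for '20030716', '2003-7-1', '2003';
```

The string '2003-7-1' is eight characters long, so the length specific parser is tried and fails before the parsers without a length get their turn.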
306
METHODS
In the general course of things you won't need any of the methods. Life
often throws unexpected things at us so the methods are all available
for use.
311
312 import
313 "import()" is a wrapper for "create_class()". If you specify the class
314 option (see documentation for "create_class()") it will be ignored.
315
316 create_class
317 This method can be used as the runtime equivalent of "import()". That
318 is, it takes the exact same parameters as when one does:
319
    use DateTime::Format::Builder ( blah blah blah )

That can be (almost) equivalently written as:

    use DateTime::Format::Builder;
    DateTime::Format::Builder->create_class( blah blah blah );
326
327 The difference being that the first is done at compile time while the
328 second is done at run time.
329
330 In the tutorial I said there were only two parameters at present. I
331 lied. There are actually three of them.
332
333 · parsers takes a hashref of methods and their parser specifications.
334 See the tutorial above for details.
335
336 Note that if you define a subroutine of the same name as one of the
337 methods you define here, an error will be thrown.
338
339 · constructor determines whether and how to create a "new()" function
340 in the new class. If given a true value, a constructor is created.
341 If given a false value, one isn't.
342
343 If given an anonymous sub or a reference to a sub then that is used
344 as "new()".
345
346 The default is 1 (that is, create a constructor using our default
347 code which simply creates a hashref and blesses it).
348
349 If your class defines its own "new()" method it will not be
350 overwritten. If you define your own "new()" and also tell Builder
351 to define one an error will be thrown.
352
353 · verbose takes a value. If the value is undef, then logging is
354 disabled. If the value is a filehandle then that's where logging
355 will go. If it's a true value, then output will go to "STDERR".
356
357 Alternatively, call "$DateTime::Format::Builder::verbose()" with
358 the relevant value. Whichever value is given more recently is
359 adhered to.
360
Be aware that verbosity is a global setting.
362
363 · class is optional and specifies the name of the class in which to
364 create the specified methods.
365
366 If using this method in the guise of "import()" then this field
367 will cause an error so it is only of use when calling as
368 "create_class()".
369
· version is also optional and specifies the value to give $VERSION
in the class. It's generally not recommended unless you're
combining with the class option. An "ExtUtils::MakeMaker" / "CPAN"
compliant version specification is much better.
374
375 In addition to creating any of the methods it also creates a "new()"
376 method that can instantiate (or clone) objects.
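A small sketch of the run time form, assuming a hypothetical target class "My::Format::Runtime" and a method name "parse_date" chosen for the example:

```perl
use strict;
use warnings;
use DateTime::Format::Builder;

# Hypothetical class name; created at run time rather than via "use".
DateTime::Format::Builder->create_class(
    class   => 'My::Format::Runtime',
    parsers => {
        parse_date => {
            regex  => qr/^(\d{4})-(\d\d)-(\d\d)$/,
            params => [qw( year month day )],
        },
    },
);

# The generated method works on the class and on objects made by the
# default constructor.
my $dt     = My::Format::Runtime->parse_date('2003-07-16');
my $object = My::Format::Runtime->new();

print $dt->ymd, "\n";
print ref($object), "\n";
print $object->parse_date('1979-07-16')->ymd, "\n";
```

No "constructor" argument is given here, so the default applies and a "new()" is created in the target class.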
377
SUBCLASSING
In the rest of the documentation I've often lied in order to get some
of the ideas across more easily. The thing is, this module's very
flexible. You can get markedly different behaviour from simply
subclassing it and overriding some methods.
383
384 create_method
385 Given a parser coderef, returns a coderef that is suitable to be a
386 method.
387
388 The default action is to call "on_fail()" in the event of a non-parse,
389 but you can make it do whatever you want.
390
391 on_fail
This is called in the event of a non-parse (unless you've overridden
"create_method()" to do something else).
394
395 The single argument is the input string. The default action is to call
396 "croak()". Above, where I've said parsers or methods throw errors, this
397 is the method that is doing the error throwing.
398
399 You could conceivably override this method to, say, return "undef".
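One way that override might look, as a sketch only. "My::Builder" and "My::Format::Quiet" are hypothetical names, and the example assumes the subclass is the one used to create the parser.

```perl
use strict;
use warnings;

package My::Builder;    # hypothetical subclass
use parent 'DateTime::Format::Builder';

# Replace the default croak() with a quiet failure.
sub on_fail { return }

package main;

# Create the class through the subclass so that its on_fail() is used.
My::Builder->create_class(
    class   => 'My::Format::Quiet',
    parsers => {
        parse_datetime => {
            regex  => qr/^(\d{4})(\d\d)(\d\d)$/,
            params => [qw( year month day )],
        },
    },
);

my $good = My::Format::Quiet->parse_datetime('20030716');
my $bad  = My::Format::Quiet->parse_datetime('not a date');

print defined $good ? $good->ymd : 'no parse', "\n";
print defined $bad  ? $bad->ymd  : 'no parse', "\n";
```

Every class created through "My::Builder" inherits this behaviour, unlike an "on_fail" key in a multiple specification, which applies to one method only.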
400
USING BUILDER OBJECTS
The methods listed in the METHODS section are all you generally need
when creating your own class. Sometimes you may not want a full blown
class to parse something just for this one program. Some methods are
provided to make that task easier.
406
407 new
408 The basic constructor. It takes no arguments, merely returns a new
409 "DateTime::Format::Builder" object.
410
411 my $parser = DateTime::Format::Builder->new();
412
413 If called as a method on an object (rather than as a class method),
414 then it clones the object.
415
416 my $clone = $parser->new();
417
418 clone
419 Provided for those who prefer an explicit "clone()" method rather than
420 using "new()" as an object method.
421
422 my $clone_of_clone = $clone->clone();
423
424 parser
425 Given either a single or multiple parser specification, sets the object
426 to have a parser based on that specification.
427
    $parser->parser(
        regex  => qr/^ (\d{4}) (\d\d) (\d\d) $/x,
        params => [qw( year month day )],
    );
432
433 The arguments given to "parser()" are handed directly to
434 "create_parser()". The resultant parser is passed to "set_parser()".
435
436 If called as an object method, it returns the object.
437
438 If called as a class method, it creates a new object, sets its parser
439 and returns that object.
440
441 set_parser
442 Sets the parser of the object to the given parser.
443
444 $parser->set_parser( $coderef );
445
446 Note: this method does not take specifications. It also does not take
447 anything except coderefs. Luckily, coderefs are what most of the other
448 methods produce.
449
450 The method return value is the object itself.
451
452 get_parser
453 Returns the parser the object is using.
454
455 my $code = $parser->get_parser();
456
457 parse_datetime
458 Given a string, it calls the parser and returns the "DateTime" object
459 that results.
460
461 my $dt = $parser->parse_datetime( "1979 07 16" );
462
463 The return value, if not a "DateTime" object, is whatever the parser
464 wants to return. Generally this means that if the parse failed an error
465 will be thrown.
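Putting the object methods together, a minimal sketch might read as follows. The specification is passed as a hash ref here, which is the single specification form described earlier; the dates are arbitrary.

```perl
use strict;
use warnings;
use DateTime::Format::Builder;

# A throwaway parser object, no class required.
my $parser = DateTime::Format::Builder->new();
$parser->parser(
    {
        regex  => qr/^(\d{4}) (\d\d) (\d\d)$/,
        params => [qw( year month day )],
    }
);

my $dt    = $parser->parse_datetime('1979 07 16');
my $clone = $parser->clone();         # carries the same parser
my $code  = $parser->get_parser();    # the underlying coderef

print $dt->ymd, "\n";
print $clone->parse_datetime('2003 07 16')->ymd, "\n";
print ref($code), "\n";
```

The regex in this sketch matches literal spaces, so it does not use the /x modifier.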
466
467 format_datetime
If you call this function, it will throw an error.
469
LONGER EXAMPLES
Some longer examples are provided in the distribution. These implement
some of the common parsing DateTime modules using Builder. Each of them
is, or was, a drop-in replacement for the corresponding module at the
time of writing.
475
THANKS
Dave Rolsky (DROLSKY) for kickstarting the DateTime project, writing
DateTime::Format::ICal and DateTime::Format::MySQL, and some much
needed review.
480
481 Joshua Hoblitt (JHOBLITT) for the concept, some of the API, impetus for
482 writing the multilength code (both one length with multiple parsers and
483 single parser with multiple lengths), blame for the Regex custom
484 constructor code, spotting a bug in Dispatch, and more much needed
485 review.
486
487 Kellan Elliott-McCrea (KELLAN) for even more review, suggestions,
488 DateTime::Format::W3CDTF and the encouragement to rewrite these docs
489 almost 100%!
490
491 Claus Faerber (CFAERBER) for having me get around to fixing the auto-
492 constructor writing, providing the 'args'/'self' patch, and suggesting
493 the multi-callbacks.
494
495 Rick Measham (RICKM) for DateTime::Format::Strptime which Builder now
496 supports.
497
498 Matthew McGillis for pointing out that "on_fail" overriding should be
499 simpler.
500
501 Simon Cozens (SIMON) for saying it was cool.
502
SUPPORT
Support for this module is provided via the datetime@perl.org email
list. See http://lists.perl.org/ for more details.

Alternatively, log bugs via the CPAN RT system, on the web or by email:

    http://rt.cpan.org/NoAuth/ReportBug.html?Queue=DateTime%3A%3AFormat%3A%3ABuilder
    bug-datetime-format-builder@rt.cpan.org
511
512 This makes it much easier for me to track things and thus means your
513 problem is less likely to be neglected.
514
LICENCE AND COPYRIGHT
Copyright (C) Iain Truskett, 2003. All rights reserved.
517
518 This library is free software; you can redistribute it and/or modify it
519 under the same terms as Perl itself, either Perl version 5.000 or, at
520 your option, any later version of Perl 5 you may have available.
521
522 The full text of the licences can be found in the Artistic and COPYING
523 files included with this module, or in perlartistic and perlgpl as
524 supplied with Perl 5.8.1 and later.
525
AUTHOR
Originally written by Iain Truskett <spoon@cpan.org>, who died on
December 29, 2003.
529
530 Maintained by Dave Rolsky <autarch@urth.org>.
531
SEE ALSO
"datetime@perl.org" mailing list.
534
535 http://datetime.perl.org/
536
537 perl, DateTime, DateTime::Format::Builder::Tutorial,
538 DateTime::Format::Builder::Parser
539
540
541
perl v5.12.0                      2010-05-14      DateTime::Format::Builder(3)