DateTime::Format::Builder(3)  User Contributed Perl Documentation  DateTime::Format::Builder(3)

DateTime::Format::Builder - Create DateTime parser classes and objects.

version 0.82

    package DateTime::Format::Brief;

    use DateTime::Format::Builder
    (
        parsers => {
            parse_datetime => [
                {
                    regex  => qr/^(\d{4})(\d\d)(\d\d)(\d\d)(\d\d)(\d\d)$/,
                    params => [qw( year month day hour minute second )],
                },
                {
                    regex  => qr/^(\d{4})(\d\d)(\d\d)$/,
                    params => [qw( year month day )],
                },
            ],
        }
    );

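As a minimal sketch of how such a class might then be used, the following repeats the class definition so that it runs on its own and calls the generated "parse_datetime" method as a class method. The input strings are illustrative assumptions, one for each regex.

```perl
use strict;
use warnings;
use DateTime;

# The class from the synopsis, repeated so this sketch is self-contained.
package DateTime::Format::Brief;

use DateTime::Format::Builder (
    parsers => {
        parse_datetime => [
            {
                regex  => qr/^(\d{4})(\d\d)(\d\d)(\d\d)(\d\d)(\d\d)$/,
                params => [qw( year month day hour minute second )],
            },
            {
                regex  => qr/^(\d{4})(\d\d)(\d\d)$/,
                params => [qw( year month day )],
            },
        ],
    },
);

package main;

# Hypothetical inputs: the first matches the date-only regex,
# the second matches the full date and time regex.
my $date_only = DateTime::Format::Brief->parse_datetime('20030716');
my $full      = DateTime::Format::Brief->parse_datetime('20030716173245');

print $date_only->ymd, "\n";
print $full->iso8601,  "\n";
```
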
DateTime::Format::Builder creates DateTime parsers. Many string
formats of dates and times are simple and just require a basic regular
expression to extract the relevant information. Builder provides a
simple way to do this without writing reams of structural code.

Builder provides a number of methods, most of which you'll rarely, if
ever, need. They're provided mainly to expose the module's innards to
any subclasses, or for when you need to do something slightly beyond
what I expected.

See DateTime::Format::Builder::Tutorial.

Often, I will speak of "undef" being returned; however, that's not
strictly true.

When a simple single specification is given for a method, the method
isn't given a single parser directly. It's given a wrapper that will
call "on_fail()" if the single parser returns "undef". The single
parser must return "undef" so that a multiple parser can work nicely
and actual errors can be thrown from any of the callbacks.

Similarly, a multiple parser will only call "on_fail()" right at the
end, once it has tried every parser it could.

"on_fail()" (see later) is defined, by default, to throw an error.

Multiple parser specifications can also specify "on_fail" with a
coderef as an argument in the options block. This will take precedence
over the inheritable and overrideable method.

That said, don't throw real errors from callbacks in multiple parser
specifications unless you really want parsing to stop right there and
not try any other parsers.

In summary: calling a method will result in either a "DateTime" object
being returned or an error being thrown (unless you've overridden
"on_fail()" or "create_method()", or you've specified an "on_fail" key
to a multiple parser specification).

Individual parsers (be they multiple parsers or single parsers) will
return either the "DateTime" object or "undef".

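To make the summary above concrete, here is a small sketch of the default behaviour: a failed parse throws, so the caller wraps the call in "eval". The class name My::Format::DateOnly and the input string are assumptions made up for this illustration.

```perl
use strict;
use warnings;
use DateTime;

package My::Format::DateOnly;    # hypothetical class name

use DateTime::Format::Builder (
    parsers => {
        parse_datetime => {
            regex  => qr/^(\d{4})(\d\d)(\d\d)$/,
            params => [qw( year month day )],
        },
    },
);

package main;

# The string cannot match the regex, so the default on_fail() throws.
# eval keeps the program alive and leaves the message in $@.
my $dt    = eval { My::Format::DateOnly->parse_datetime('not a date') };
my $error = $@;

if ( defined $dt ) {
    print "Parsed: ", $dt->ymd, "\n";
}
else {
    print "Parse failed: $error";
}
```
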
A single specification is a hash ref of instructions on how to create a
parser.

The precise set of keys and values varies according to parser type.
There are some common ones though:

· length is an optional parameter that can be used to specify that
  this particular regex is only applicable to strings of a certain
  fixed length. This can be used to make parsers more efficient. It's
  strongly recommended that any parser that can use this parameter
  does.

  You may happily specify the same length twice. The parsers will be
  tried in order of specification.

  You can also specify multiple lengths by giving it an arrayref of
  numbers rather than just a single scalar. If doing so, please keep
  the number of lengths to a minimum.

  If any specifications without lengths are given and the particular
  length parser fails, then the non-length parsers are tried.

  This parameter is ignored unless the specification is part of a
  multiple parser specification.

· label provides a name for the specification and is passed to some
  of the callbacks about to be mentioned.

· on_match and on_fail are callbacks. Both routines will be called
  with parameters of:

  · input, being the input to the parser (after any preprocessing
    callbacks).

  · label, being the label of the parser, if there is one.

  · self, being the object on which the method has been invoked
    (which may just be a class name). Naturally, you can then
    invoke your own methods on it to get information you want.

  · args, being an arrayref of any passed arguments, if any. If
    there were no arguments, then this parameter is not given.

  These routines will be called depending on whether the regex match
  succeeded or failed.

· preprocess is a callback provided for cleaning up input prior to
  parsing. It's given a hash as arguments with the following keys:

  · input being the datetime string the parser was given (if using
    multiple specifications and an overall preprocess then this is
    the date after it's been through that preprocessor).

  · parsed being the state of parsing so far. Usually empty at this
    point unless an overall preprocess was given. Items may be
    placed in it and will be given to any postprocessor and
    "DateTime->new" (unless the postprocessor deletes it).

  · self, args, label as per on_match and on_fail.

  The return value from the routine is what is given to the regex.
  Note that this is the last code stop before the match.

  Note: mixing length and a preprocess that modifies the length of
  the input string is probably not what you meant to do. You probably
  meant to use the multiple parser variant of preprocess which is
  done before any length calculations. This "single parser" variant
  of preprocess is performed after any length calculations.

· postprocess is the last code stop before "DateTime->new()" is
  called. It's given the same arguments as preprocess. This allows it
  to modify the parsed parameters after the parse and before the
  creation of the object. For example, you might use:

      {
          regex       => qr/^(\d\d) (\d\d) (\d\d)$/,
          params      => [qw( year month day )],
          postprocess => \&_fix_year,
      }

  where "_fix_year" is defined as:

      sub _fix_year
      {
          my %args = @_;
          my ($date, $p) = @args{qw( input parsed )};
          $p->{year} += $p->{year} > 69 ? 1900 : 2000;
          return 1;
      }

  This will cause the two digit years to be corrected according to
  the cut off. If the year was '69' or lower, then it is made into
  2069 (or 2045, or whatever the year was parsed as). Otherwise it is
  assumed to be 19xx. The DateTime::Format::Mail module uses code
  similar to this (only it allows the cut off to be configured and it
  doesn't use Builder).

  Note: It is very important to return an explicit value from the
  postprocess callback. If the return value is false then the parse
  is taken to have failed. If the return value is true, then the
  parse is taken to have succeeded and "DateTime->new()" is called.

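The common keys above can be combined in one multiple parser specification. The sketch below is an illustration only: the class name My::Format::Compact, the labels, and the helper "_note_match" are hypothetical. It uses length to pick the applicable specification and an on_match callback to record which label matched.

```perl
use strict;
use warnings;
use DateTime;

package My::Format::Compact;    # hypothetical class name

# Labels of the specifications that matched, in call order.
our @MATCHED;

# Hypothetical on_match callback. It receives a hash with the keys
# described above (input, label, self and, if any were passed, args).
sub _note_match {
    my %args = @_;
    push @MATCHED, $args{label};
    return;
}

use DateTime::Format::Builder (
    parsers => {
        parse_datetime => [
            {
                length   => 14,
                label    => 'datetime',
                regex    => qr/^(\d{4})(\d\d)(\d\d)(\d\d)(\d\d)(\d\d)$/,
                params   => [qw( year month day hour minute second )],
                on_match => \&_note_match,
            },
            {
                length   => 8,
                label    => 'date',
                regex    => qr/^(\d{4})(\d\d)(\d\d)$/,
                params   => [qw( year month day )],
                on_match => \&_note_match,
            },
        ],
    },
);

package main;

# An eight character input, so only the length 8 specification applies.
my $dt = My::Format::Compact->parse_datetime('20030716');

print $dt->ymd, "\n";
print "matched by: ", join( ', ', @My::Format::Compact::MATCHED ), "\n";
```
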
See the documentation for the individual parsers for their valid keys.

Parsers at the time of writing are:

· DateTime::Format::Builder::Parser::Regex - provides regular
  expression based parsing.

· DateTime::Format::Builder::Parser::Strptime - provides strptime
  based parsing.

Subroutines / coderefs as specifications.
A single parser specification can be a coderef. This was added mostly
because it could be and because I knew someone, somewhere, would want
to use it.

If the specification is a reference to a piece of code, be it a
subroutine, anonymous, or whatever, then it's passed more or less
straight through. The code should return "undef" in the event of
failure (or any false value, but "undef" is strongly preferred), or a
true value in the event of success (ideally a "DateTime" object or some
object that has the same interface).

This all said, I generally wouldn't recommend using this feature unless
you have to.

Callbacks
I mention a number of callbacks in this document.

Any time you see a callback being mentioned, you can, if you like,
substitute an arrayref of coderefs rather than having the straight
coderef.

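As a sketch of the arrayref form, the following gives postprocess two coderefs instead of one. The first is the "_fix_year" routine from earlier; the second, "_log_year", and the class name My::Format::TwoDigit are hypothetical. Both callbacks return a true value, which is a deliberately conservative choice on my part since postprocess treats a false return as a failed parse.

```perl
use strict;
use warnings;
use DateTime;

package My::Format::TwoDigit;    # hypothetical class name

# Years seen after correction, recorded by the second callback.
our @LOG;

# Expand a two digit year, as in the postprocess example above.
sub _fix_year {
    my %args = @_;
    my $p    = $args{parsed};
    $p->{year} += $p->{year} > 69 ? 1900 : 2000;
    return 1;
}

# Hypothetical second step: record the corrected year.
sub _log_year {
    my %args = @_;
    push @LOG, $args{parsed}{year};
    return 1;
}

use DateTime::Format::Builder (
    parsers => {
        parse_datetime => {
            regex       => qr/^(\d\d) (\d\d) (\d\d)$/,
            params      => [qw( year month day )],
            # An arrayref of coderefs in place of a single coderef.
            postprocess => [ \&_fix_year, \&_log_year ],
        },
    },
);

package main;

my $dt = My::Format::TwoDigit->parse_datetime('03 07 16');

print $dt->ymd, "\n";
print "logged year: $My::Format::TwoDigit::LOG[0]\n";
```
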
These are very easily described as an array of single specifications.

Note that if the first element of the array is an arrayref, then you're
specifying options.

· preprocess lets you specify a preprocessor that is called before
  any of the parsers are tried. This lets you do things like strip
  off timezones or any unnecessary data. The most common use people
  have for it at present is to get the input date to a particular
  length so that the length is usable (DateTime::Format::ICal would
  use it to strip off the variable length timezone).

  Arguments are as for the single parser preprocess variant with the
  exception that label is never given.

· on_fail should be a reference to a subroutine that is called if the
  parser fails. If this is not provided, the default action is to
  call "DateTime::Format::Builder::on_fail", or the "on_fail" method
  of the subclass of DTFB that was used to create the parser.

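A minimal sketch of an options block, assuming a hypothetical class My::Format::Spaced: the overall preprocess strips whitespace so that the input reaches a fixed length before the length based specifications are consulted.

```perl
use strict;
use warnings;
use DateTime;

package My::Format::Spaced;    # hypothetical class name

use DateTime::Format::Builder (
    parsers => {
        parse_datetime => [
            # First element is an arrayref, so it is the options block.
            [
                preprocess => sub {
                    my %args = @_;
                    ( my $input = $args{input} ) =~ s/\s+//g;
                    return $input;    # what the parsers will see
                },
            ],
            {
                length => 14,
                regex  => qr/^(\d{4})(\d\d)(\d\d)(\d\d)(\d\d)(\d\d)$/,
                params => [qw( year month day hour minute second )],
            },
            {
                length => 8,
                regex  => qr/^(\d{4})(\d\d)(\d\d)$/,
                params => [qw( year month day )],
            },
        ],
    },
);

package main;

# Twelve characters on input, eight once the whitespace is removed.
my $dt = My::Format::Spaced->parse_datetime(' 2003 07 16 ');

print $dt->ymd, "\n";
```
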
Builder allows you to plug in a fair few callbacks, which can make
following how a parse failed (or succeeded unexpectedly) somewhat
tricky.

For Single Specifications
A single specification will do the following:

User calls parser:

    my $dt = $class->parse_datetime( $string );

1. preprocess is called. It's given $string and a reference to the
   parsing workspace hash, which we'll call $p. At this point, $p is
   empty. The return value is used as $date for the rest of this
   single parser. Anything put in $p is also used for the rest of
   this single parser.

2. regex is applied.

3. If regex did not match, then on_fail is called (and is given $date
   and also label if it was defined). Any return value is ignored and
   the next thing is for the single parser to return "undef".

   If regex did match, then on_match is called with the same arguments
   as would be given to on_fail. The return value is similarly
   ignored, but we then move to step 4 rather than exiting the parser.

4. postprocess is called with $date and a filled out $p. The return
   value is taken as an indication of whether the parse was a success
   or not. If it wasn't a success then the single parser will exit at
   this point, returning undef.

5. "DateTime->new()" is called and the user is given the resultant
   "DateTime" object.

See the section on error handling regarding the "undef"s mentioned
above.

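One way to watch this flow is to have every callback record its name. The sketch below does that for a hypothetical class My::Format::Traced, once for an input that matches and once for one that does not. The failing call is wrapped in "eval" because the default "on_fail()" method throws.

```perl
use strict;
use warnings;
use DateTime;

package My::Format::Traced;    # hypothetical class name

# Names of the callbacks, in the order they ran.
our @TRACE;

use DateTime::Format::Builder (
    parsers => {
        parse_datetime => {
            regex      => qr/^(\d{4})(\d\d)(\d\d)$/,
            params     => [qw( year month day )],
            preprocess => sub {
                my %args = @_;
                push @TRACE, 'preprocess';
                return $args{input};    # pass the input on unchanged
            },
            on_match    => sub { push @TRACE, 'on_match';    return },
            on_fail     => sub { push @TRACE, 'on_fail';     return },
            postprocess => sub { push @TRACE, 'postprocess'; return 1 },
        },
    },
);

package main;

# A matching input: steps 1 to 5 all run.
my $dt       = My::Format::Traced->parse_datetime('20030716');
my @ok_trace = @My::Format::Traced::TRACE;

# A non-matching input: the flow stops at step 3.
@My::Format::Traced::TRACE = ();
my $failed     = eval { My::Format::Traced->parse_datetime('2003-07-16') };
my @fail_trace = @My::Format::Traced::TRACE;

print "match:    @ok_trace\n";
print "no match: @fail_trace\n";
```
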
For Multiple Specifications
With multiple specifications:

User calls parser:

    my $dt = $class->complex_parse( $string );

1. The overall preprocessor is called and is given $string and the
   hashref $p (identically to the per parser preprocess mentioned in
   the previous flow).

   If the callback modifies $p then a copy of $p is given to each of
   the individual parsers. This is so parsers won't accidentally
   pollute each other's workspace.

2. If an appropriate length specific parser is found, then it is
   called and the single parser flow (see the previous section) is
   followed, and the parser is given a copy of $p and the return value
   of the overall preprocessor as $date.

   If a "DateTime" object was returned, we go straight back to the
   user.

   If no appropriate parser was found, or the parser returned "undef",
   then we progress to step 3!

3. Any non-length based parsers are tried in the order they were
   specified.

   For each of those the single specification flow above is performed,
   and is given a copy of the output from the overall preprocessor.

   If a real "DateTime" object is returned then we exit back to the
   user.

   If no parser could parse, then an error is thrown.

See the section on error handling regarding the "undef"s mentioned
above.

In the general course of things you won't need any of the methods. Life
often throws unexpected things at us so the methods are all available
for use.

import
"import()" is a wrapper for "create_class()". If you specify the class
option (see documentation for "create_class()") it will be ignored.

create_class
This method can be used as the runtime equivalent of "import()". That
is, it takes the exact same parameters as when one does:

    use DateTime::Format::Builder ( blah blah blah )

That can be (almost) equivalently written as:

    use DateTime::Format::Builder;
    DateTime::Format::Builder->create_class( blah blah blah );

The difference being that the first is done at compile time while the
second is done at run time.

In the tutorial I said there were only two parameters at present. I
lied. There are actually five of them.

· parsers takes a hashref of methods and their parser specifications.
  See the DateTime::Format::Builder::Tutorial for details.

  Note that if you define a subroutine of the same name as one of the
  methods you define here, an error will be thrown.

· constructor determines whether and how to create a "new()" function
  in the new class. If given a true value, a constructor is created.
  If given a false value, one isn't.

  If given an anonymous sub or a reference to a sub then that is used
  as "new()".

  The default is 1 (that is, create a constructor using our default
  code which simply creates a hashref and blesses it).

  If your class defines its own "new()" method it will not be
  overwritten. If you define your own "new()" and also tell Builder
  to define one an error will be thrown.

· verbose takes a value. If the value is undef, then logging is
  disabled. If the value is a filehandle then that's where logging
  will go. If it's a true value, then output will go to "STDERR".

  Alternatively, call "DateTime::Format::Builder::verbose()" with
  the relevant value. Whichever value is given more recently is
  adhered to.

  Be aware that verbosity is a global setting.

· class is optional and specifies the name of the class in which to
  create the specified methods.

  If using this method in the guise of "import()" then this field
  will cause an error so it is only of use when calling as
  "create_class()".

· version is also optional and specifies the value to give $VERSION
  in the class. It's generally not recommended unless you're
  combining with the class option. An "ExtUtils::MakeMaker" / "CPAN"
  compliant version specification is much better.

In addition to creating any of the methods it also creates a "new()"
method that can instantiate (or clone) objects.

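A short sketch of the run time form, using the class and version options together. The class name My::Format::Runtime and the version string are assumptions chosen for the illustration.

```perl
use strict;
use warnings;
use DateTime;
use DateTime::Format::Builder;

# Build the class at run time rather than through import().
DateTime::Format::Builder->create_class(
    class   => 'My::Format::Runtime',    # hypothetical class name
    version => '0.01',                   # becomes $My::Format::Runtime::VERSION
    parsers => {
        parse_datetime => {
            regex  => qr/^(\d{4})-(\d\d)-(\d\d)$/,
            params => [qw( year month day )],
        },
    },
);

# The generated method and the generated constructor.
my $dt  = My::Format::Runtime->parse_datetime('2003-07-16');
my $obj = My::Format::Runtime->new;

print $dt->ymd, "\n";
print "version: ", My::Format::Runtime->VERSION, "\n";
print "object class: ", ref($obj), "\n";
```
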
In the rest of the documentation I've often lied in order to get some
of the ideas across more easily. The thing is, this module's very
flexible. You can get markedly different behaviour from simply
subclassing it and overriding some methods.

create_method
Given a parser coderef, returns a coderef that is suitable to be a
method.

The default action is to call "on_fail()" in the event of a non-parse,
but you can make it do whatever you want.

on_fail
This is called in the event of a non-parse (unless you've overridden
"create_method()" to do something else).

The single argument is the input string. The default action is to call
"croak()". Above, where I've said parsers or methods throw errors, this
is the method that is doing the error throwing.

You could conceivably override this method to, say, return "undef".

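As a sketch of that last suggestion, the subclass below overrides "on_fail()" so that a failed parse yields "undef" rather than an error. The subclass name My::LenientBuilder is hypothetical, and the behaviour shown is my reading of the description above rather than something spelled out in detail here.

```perl
use strict;
use warnings;
use DateTime;

package My::LenientBuilder;    # hypothetical subclass name

use parent 'DateTime::Format::Builder';

# Return undef on a failed parse instead of calling croak().
sub on_fail { return undef }

package main;

# parser() is called on the subclass, so the subclass's on_fail()
# is the one consulted when the regex does not match.
my $parser = My::LenientBuilder->parser(
    regex  => qr/^(\d{4})(\d\d)(\d\d)$/,
    params => [qw( year month day )],
);

my $good = $parser->parse_datetime('20030716');
my $bad  = $parser->parse_datetime('nonsense');

print "good: ", $good->ymd, "\n";
print "bad:  ", ( defined $bad ? 'parsed' : 'undef' ), "\n";
```
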
The methods listed in the METHODS section are all you generally need
when creating your own class. Sometimes you may not want a full blown
class to parse something just for this one program. Some methods are
provided to make that task easier.

new
The basic constructor. It takes no arguments, merely returns a new
"DateTime::Format::Builder" object.

    my $parser = DateTime::Format::Builder->new();

If called as a method on an object (rather than as a class method),
then it clones the object.

    my $clone = $parser->new();

clone
Provided for those who prefer an explicit "clone()" method rather than
using "new()" as an object method.

    my $clone_of_clone = $clone->clone();

parser
Given either a single or multiple parser specification, sets the object
to have a parser based on that specification.

    $parser->parser(
        regex  => qr/^ (\d{4}) (\d\d) (\d\d) $/x,
        params => [qw( year month day )],
    );

The arguments given to "parser()" are handed directly to
"create_parser()". The resultant parser is passed to "set_parser()".

If called as an object method, it returns the object.

If called as a class method, it creates a new object, sets its parser
and returns that object.

set_parser
Sets the parser of the object to the given parser.

    $parser->set_parser( $coderef );

Note: this method does not take specifications. It also does not take
anything except coderefs. Luckily, coderefs are what most of the other
methods produce.

The method return value is the object itself.

get_parser
Returns the parser the object is using.

    my $code = $parser->get_parser();

parse_datetime
Given a string, it calls the parser and returns the "DateTime" object
that results.

    my $dt = $parser->parse_datetime( "1979 07 16" );

The return value, if not a "DateTime" object, is whatever the parser
wants to return. Generally this means that if the parse failed an error
will be thrown.

format_datetime
If you call this function, it will throw an error.

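Putting the object methods together, here is a minimal sketch that builds a parser object, parses with it, clones it, and confirms that "format_datetime()" throws. The input strings are assumptions, and the regex uses "\s" because literal spaces are ignored under the /x modifier.

```perl
use strict;
use warnings;
use DateTime;
use DateTime::Format::Builder;

# Create an object and give it a parser in two steps.
my $parser = DateTime::Format::Builder->new();
$parser->parser(
    regex  => qr/^ (\d{4}) \s (\d\d) \s (\d\d) $/x,
    params => [qw( year month day )],
);

my $dt = $parser->parse_datetime('1979 07 16');

# A clone carries the same parser as the original.
my $clone = $parser->clone();
my $dt2   = $clone->parse_datetime('2003 07 16');

# format_datetime() is not implemented, so the call throws.
my $format_error = eval { $parser->format_datetime($dt); 1 } ? '' : $@;

print $dt->ymd,  "\n";
print $dt2->ymd, "\n";
print "format_datetime threw an error\n" if $format_error;
```
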
Some longer examples are provided in the distribution. These implement
some of the common parsing DateTime modules using Builder. Each of them
is, or was, a drop-in replacement for the corresponding module at the
time of writing.

Dave Rolsky (DROLSKY) for kickstarting the DateTime project, writing
DateTime::Format::ICal and DateTime::Format::MySQL, and some much
needed review.

Joshua Hoblitt (JHOBLITT) for the concept, some of the API, impetus for
writing the multi-length code (both one length with multiple parsers
and single parser with multiple lengths), blame for the Regex custom
constructor code, spotting a bug in Dispatch, and more much needed
review.

Kellan Elliott-McCrea (KELLAN) for even more review, suggestions,
DateTime::Format::W3CDTF and the encouragement to rewrite these docs
almost 100%!

Claus Färber (CFAERBER) for having me get around to fixing the
auto-constructor writing, providing the 'args'/'self' patch, and
suggesting the multi-callbacks.

Rick Measham (RICKM) for DateTime::Format::Strptime which Builder now
supports.

Matthew McGillis for pointing out that "on_fail" overriding should be
simpler.

Simon Cozens (SIMON) for saying it was cool.

"datetime@perl.org" mailing list.

http://datetime.perl.org/

perl, DateTime, DateTime::Format::Builder::Tutorial,
DateTime::Format::Builder::Parser

Bugs may be submitted at
<http://rt.cpan.org/Public/Dist/Display.html?Name=DateTime-Format-Builder>
or via email to bug-datetime-format-builder@rt.cpan.org
<mailto:bug-datetime-format-builder@rt.cpan.org>.

I am also usually active on IRC as 'autarch' on "irc://irc.perl.org".

The source code repository for DateTime-Format-Builder can be found at
<https://github.com/houseabsolute/DateTime-Format-Builder>.

If you'd like to thank me for the work I've done on this module, please
consider making a "donation" to me via PayPal. I spend a lot of free
time creating free software, and would appreciate any support you'd
care to offer.

Please note that I am not suggesting that you must do this in order for
me to continue working on this particular software. I will continue to
do so, inasmuch as I have in the past, for as long as it interests me.

Similarly, a donation made in this way will probably not make me work
on this software much more, unless I get so many donations that I can
consider working on free software full time (let's all have a chuckle
at that together).

To donate, log into PayPal and send money to autarch@urth.org, or use
the button at <http://www.urth.org/~autarch/fs-donation.html>.

· Dave Rolsky <autarch@urth.org>

· Iain Truskett

· Daisuke Maki <daisuke@endeworks.jp>

· Ian Truskett <spoon@cpan.org>

· (no author) <(no author)@49043108-e40d-0410-ab17-85caa8b5b18d>

This software is Copyright (c) 2019 by Dave Rolsky.

This is free software, licensed under:

    The Artistic License 2.0 (GPL Compatible)

The full text of the license can be found in the LICENSE file included
with this distribution.

perl v5.30.1                      2020-01-29      DateTime::Format::Builder(3)