DateTime::Format::Builder(3)  User Contributed Perl Documentation  DateTime::Format::Builder(3)

NAME
DateTime::Format::Builder - Create DateTime parser classes and objects.
7
VERSION
version 0.83
10
SYNOPSIS
    package DateTime::Format::Brief;

    use DateTime::Format::Builder (
        parsers => {
            parse_datetime => [
                {
                    regex  => qr/^(\d{4})(\d\d)(\d\d)(\d\d)(\d\d)(\d\d)$/,
                    params => [qw( year month day hour minute second )],
                },
                {
                    regex  => qr/^(\d{4})(\d\d)(\d\d)$/,
                    params => [qw( year month day )],
                },
            ],
        }
    );
28
DESCRIPTION
DateTime::Format::Builder creates DateTime parsers. Many string formats
of dates and times are simple and just require a basic regular
expression to extract the relevant information. Builder provides a
simple way to do this without writing reams of structural code.

Builder provides a number of methods, most of which you'll never need,
or at least rarely need. They're provided mainly to expose the
module's innards to subclasses, or for when you need to do
something slightly beyond what I expected.
39
TUTORIAL
See DateTime::Format::Builder::Tutorial.
42
ERROR HANDLING AND BAD PARSES
Often, I will speak of "undef" being returned; however, that's not
strictly true.
46
47 When a simple single specification is given for a method, the method
48 isn't given a single parser directly. It's given a wrapper that will
49 call "on_fail" if the single parser returns "undef". The single parser
50 must return "undef" so that a multiple parser can work nicely and
51 actual errors can be thrown from any of the callbacks.
52
Similarly, a multiple parser will only call "on_fail" right at the
end, when it has tried every parser it could.
55
56 "on_fail" (see later) is defined, by default, to throw an error.
57
58 Multiple parser specifications can also specify "on_fail" with a
59 coderef as an argument in the options block. This will take precedence
60 over the inheritable and overrideable method.
61
62 That said, don't throw real errors from callbacks in multiple parser
63 specifications unless you really want parsing to stop right there and
64 not try any other parsers.
65
In summary: calling a method will result in either a "DateTime" object
being returned or an error being thrown (unless you've overridden
"on_fail" or "create_method", or you've specified an "on_fail" key to a
multiple parser specification).
70
71 Individual parsers (be they multiple parsers or single parsers) will
72 return either the "DateTime" object or "undef".
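
As a rough sketch of the "on_fail" key described above, the following gives a multiple parser specification an options block whose "on_fail" coderef returns "undef" rather than throwing. The package name My::Format::Lenient is made up for illustration, and the sketch assumes the options block is written as an arrayref in the first position of the specification list.

```perl
package My::Format::Lenient;    # hypothetical class name

use strict;
use warnings;

use DateTime::Format::Builder (
    parsers => {
        parse_datetime => [
            # Options block: an arrayref as the first element.
            # This on_fail replaces the default error-throwing behaviour.
            [ on_fail => sub { return undef } ],
            {
                regex  => qr/^(\d{4})(\d\d)(\d\d)$/,
                params => [qw( year month day )],
            },
        ],
    }
);

package main;

my $ok  = My::Format::Lenient->parse_datetime('20030716');
my $bad = My::Format::Lenient->parse_datetime('not a date');

print $ok->ymd, "\n";
print defined $bad ? "parsed\n" : "no match\n";
```

With this in place the caller has to check for "undef" itself, which is the trade-off for not having to wrap every call in an eval.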
73
SINGLE SPECIFICATIONS
A single specification is a hash ref of instructions on how to create a
parser.
77
78 The precise set of keys and values varies according to parser type.
79 There are some common ones though:
80
81 • length
82
83 length is an optional parameter that can be used to specify that
84 this particular regex is only applicable to strings of a certain
85 fixed length. This can be used to make parsers more efficient. It's
86 strongly recommended that any parser that can use this parameter
87 does.
88
89 You may happily specify the same length twice. The parsers will be
90 tried in order of specification.
91
92 You can also specify multiple lengths by giving it an arrayref of
93 numbers rather than just a single scalar. If doing so, please keep
94 the number of lengths to a minimum.
95
96 If any specifications without lengths are given and the particular
97 length parser fails, then the non-length parsers are tried.
98
99 This parameter is ignored unless the specification is part of a
100 multiple parser specification.
101
102 • label
103
label provides a name for the specification and is passed to some
of the callbacks about to be mentioned.
106
107 • on_match and on_fail
108
109 on_match and on_fail are callbacks. Both routines will be called
110 with parameters of:
111
112 • input
113
114 input is the input to the parser (after any preprocessing
115 callbacks).
116
117 • label
118
119 label is the label of the parser if there is one.
120
121 • self
122
self is the object on which the method has been invoked (which
may just be a class name). Naturally, you can then invoke your
own methods on it to get the information you want.

• args

args is an arrayref of any passed arguments. If there
were no arguments, then this parameter is not given.
129
130 These routines will be called depending on whether the regex match
131 succeeded or failed.
132
133 • preprocess
134
135 preprocess is a callback provided for cleaning up input prior to
136 parsing. It's given a hash as arguments with the following keys:
137
138 • input
139
140 input is the datetime string the parser was given (if using
141 multiple specifications and an overall preprocess then this is
142 the date after it's been through that preprocessor).
143
144 • parsed
145
146 parsed is the state of parsing so far. Usually empty at this
147 point unless an overall preprocess was given. Items may be
148 placed in it and will be given to any postprocessor and
149 "DateTime->new" (unless the postprocessor deletes it).
150
151 • self, args, label
152
153 self, args, label as per on_match and on_fail.
154
The return value from the routine is what is given to the regex.
Note that this is the last code stop before the match.
157
158 Note: mixing length and a preprocess that modifies the length of
159 the input string is probably not what you meant to do. You probably
160 meant to use the multiple parser variant of preprocess which is
161 done before any length calculations. This "single parser" variant
162 of preprocess is performed after any length calculations.
163
164 • postprocess
165
166 postprocess is the last code stop before "DateTime->new" is called.
167 It's given the same arguments as preprocess. This allows it to
168 modify the parsed parameters after the parse and before the
169 creation of the object. For example, you might use:
170
    {
        regex       => qr/^(\d\d) (\d\d) (\d\d)$/,
        params      => [qw( year month day )],
        postprocess => \&_fix_year,
    }

where "_fix_year" is defined as:

    sub _fix_year {
        my %args = @_;
        my ( $date, $p ) = @args{qw( input parsed )};

        # Years above the cut off (69) are 19xx; the rest are 20xx.
        $p->{year} += $p->{year} > 69 ? 1900 : 2000;
        return 1;
    }
185
This will cause two digit years to be corrected according to
the cut off. If the year was 69 or lower, then 2000 is added to it
(69 becomes 2069, 45 becomes 2045). Otherwise it is
assumed to be 19xx. The DateTime::Format::Mail module uses code
similar to this (only it allows the cut off to be configured and it
doesn't use Builder).
192
193 Note: It is very important to return an explicit value from the
194 postprocess callback. If the return value is false then the parse
195 is taken to have failed. If the return value is true, then the
196 parse is taken to have succeeded and "DateTime->new" is called.
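
To see the cut off arithmetic on its own, here is a small sketch that calls "_fix_year" directly with hand-made arguments, the way Builder would call a postprocess callback. It doesn't load Builder at all; the %parsed hash and the sample years are just illustrative stand-ins for what a regex match would have produced.

```perl
use strict;
use warnings;

# Same callback as in the text: adjusts a two digit year in place.
sub _fix_year {
    my %args = @_;
    my ( $date, $p ) = @args{qw( input parsed )};
    $p->{year} += $p->{year} > 69 ? 1900 : 2000;
    return 1;    # explicit true value, so the parse counts as a success
}

for my $yy ( 45, 69, 70, 99 ) {
    # Stand-in for the parsing workspace a regex match would fill in.
    my %parsed = ( year => $yy, month => 7, day => 16 );
    my $ok = _fix_year( input => "$yy 07 16", parsed => \%parsed );
    printf "%02d -> %d (callback returned %d)\n", $yy, $parsed{year}, $ok;
}
```

The callback modifies the hash it is handed rather than returning the new year, which is why the return value is free to act purely as the success flag.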
197
198 See the documentation for the individual parsers for their valid keys.
199
200 Parsers at the time of writing are:
201
202 • DateTime::Format::Builder::Parser::Regex - provides regular
203 expression based parsing.
204
205 • DateTime::Format::Builder::Parser::Strptime - provides strptime
206 based parsing.
207
208 Subroutines / coderefs as specifications.
209 A single parser specification can be a coderef. This was added mostly
210 because it could be and because I knew someone, somewhere, would want
211 to use it.
212
213 If the specification is a reference to a piece of code, be it a
214 subroutine, anonymous, or whatever, then it's passed more or less
215 straight through. The code should return "undef" in event of failure
216 (or any false value, but "undef" is strongly preferred), or a true
217 value in the event of success (ideally a "DateTime" object or some
218 object that has the same interface).
219
220 This all said, I generally wouldn't recommend using this feature unless
221 you have to.
222
223 Callbacks
224 I mention a number of callbacks in this document.
225
226 Any time you see a callback being mentioned, you can, if you like,
227 substitute an arrayref of coderefs rather than having the straight
228 coderef.
229
MULTIPLE SPECIFICATIONS
These are very easily described as an array of single specifications.
232
233 Note that if the first element of the array is an arrayref, then you're
234 specifying options.
235
236 • preprocess
237
238 preprocess lets you specify a preprocessor that is called before
239 any of the parsers are tried. This lets you do things like strip
240 off timezones or any unnecessary data. The most common use people
241 have for it at present is to get the input date to a particular
242 length so that the length is usable (DateTime::Format::ICal would
243 use it to strip off the variable length timezone).
244
245 Arguments are as for the single parser preprocess variant with the
246 exception that label is never given.
247
248 • on_fail
249
250 on_fail should be a reference to a subroutine that is called if the
251 parser fails. If this is not provided, the default action is to
252 call "DateTime::Format::Builder::on_fail", or the "on_fail" method
253 of the subclass of DTFB that was used to create the parser.
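
A sketch of how the overall preprocess and the per-specification length key might be combined: the preprocessor strips whitespace so that the input ends up at one of two known lengths, and each specification declares the length it handles. The package name My::Format::Compact and the helper "_strip_space" are invented for this example.

```perl
package My::Format::Compact;    # hypothetical class name

use strict;
use warnings;

# Overall preprocessor (hypothetical helper). It receives a hash of
# arguments and returns the string the parsers should see.
sub _strip_space {
    my %args = @_;
    ( my $input = $args{input} ) =~ s/\s+//g;
    return $input;
}

use DateTime::Format::Builder (
    parsers => {
        parse_datetime => [
            [ preprocess => \&_strip_space ],
            {
                length => 14,
                regex  => qr/^(\d{4})(\d\d)(\d\d)(\d\d)(\d\d)(\d\d)$/,
                params => [qw( year month day hour minute second )],
            },
            {
                length => 8,
                regex  => qr/^(\d{4})(\d\d)(\d\d)$/,
                params => [qw( year month day )],
            },
        ],
    }
);

package main;

my $date  = My::Format::Compact->parse_datetime(' 2003 07 16 ');
my $stamp = My::Format::Compact->parse_datetime('20030716 120000');

print $date->ymd,       "\n";
print $stamp->datetime, "\n";
```

Because the whitespace is removed in the overall preprocessor, the length check sees the cleaned-up string. Doing the same clean-up in a per-specification preprocess would be too late for that.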
254
EXECUTION FLOW
Builder allows you to plug in a fair few callbacks, which can make
following how a parse failed (or succeeded unexpectedly) somewhat
tricky.
259
260 For Single Specifications
261 A single specification will do the following:
262
263 User calls parser:
264
    my $dt = $class->parse_datetime($string);
266
267 1. preprocess is called. It's given $string and a reference to the
268 parsing workspace hash, which we'll call $p. At this point, $p is
269 empty. The return value is used as $date for the rest of this
270 single parser. Anything put in $p is also used for the rest of
271 this single parser.
272
273 2. regex is applied.
274
275 3. If regex did not match, then on_fail is called (and is given $date
276 and also label if it was defined). Any return value is ignored and
277 the next thing is for the single parser to return "undef".
278
279 If regex did match, then on_match is called with the same arguments
280 as would be given to on_fail. The return value is similarly
281 ignored, but we then move to step 4 rather than exiting the parser.
282
4. postprocess is called with $date and a filled out $p. The return
value is taken as an indication of whether the parse was a success
or not. If it wasn't a success then the single parser will exit at
this point, returning undef.
287
288 5. "DateTime->new" is called and the user is given the resultant
289 "DateTime" object.
290
291 See the section on error handling regarding the "undef"s mentioned
292 above.
293
294 For Multiple Specifications
295 With multiple specifications:
296
297 User calls parser:
298
    my $dt = $class->complex_parse($string);
300
301 1. The overall preprocessor is called and is given $string and the
302 hashref $p (identically to the per parser preprocess mentioned in
303 the previous flow).
304
305 If the callback modifies $p then a copy of $p is given to each of
306 the individual parsers. This is so parsers won't accidentally
307 pollute each other's workspace.
308
309 2. If an appropriate length specific parser is found, then it is
310 called and the single parser flow (see the previous section) is
311 followed, and the parser is given a copy of $p and the return value
312 of the overall preprocessor as $date.
313
If a "DateTime" object was returned, we go straight back to the
user.
316
317 If no appropriate parser was found, or the parser returned "undef",
318 then we progress to step 3!
319
320 3. Any non-length based parsers are tried in the order they were
321 specified.
322
323 For each of those the single specification flow above is performed,
324 and is given a copy of the output from the overall preprocessor.
325
326 If a real "DateTime" object is returned then we exit back to the
327 user.
328
329 If no parser could parse, then an error is thrown.
330
331 See the section on error handling regarding the "undef"s mentioned
332 above.
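
One way to follow the flow above in practice is to label each specification and record which callbacks fire. The sketch below does that with on_match and on_fail; the package name, the @TRACE array and the "_note" helper are all invented for illustration. Going by the documented flow, an eight digit input should fail the first specification and match the second.

```perl
package My::Format::Traced;    # hypothetical class name

use strict;
use warnings;

our @TRACE;    # hypothetical log of callback events

# Hypothetical helper: builds a callback that records which parser fired.
sub _note {
    my ($event) = @_;
    return sub {
        my %args  = @_;
        my $label = defined $args{label} ? $args{label} : '(no label)';
        push @TRACE, "$event: $label";
        return;
    };
}

use DateTime::Format::Builder (
    parsers => {
        parse_datetime => [
            {
                label    => 'full',
                regex    => qr/^(\d{4})(\d\d)(\d\d)(\d\d)(\d\d)(\d\d)$/,
                params   => [qw( year month day hour minute second )],
                on_match => _note('match'),
                on_fail  => _note('fail'),
            },
            {
                label    => 'date-only',
                regex    => qr/^(\d{4})(\d\d)(\d\d)$/,
                params   => [qw( year month day )],
                on_match => _note('match'),
                on_fail  => _note('fail'),
            },
        ],
    }
);

package main;

my $dt = My::Format::Traced->parse_datetime('20030716');

print "$_\n" for @My::Format::Traced::TRACE;
print $dt->ymd, "\n";
```

Neither specification uses length here, so they are simply tried in the order given, which keeps the trace easy to read.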
333
METHODS
In the general course of things you won't need any of the methods. Life
often throws unexpected things at us, so the methods are all available
for use.
338
339 import
340 "import" is a wrapper for "create_class". If you specify the class
341 option (see documentation for "create_class") it will be ignored.
342
343 create_class
344 This method can be used as the runtime equivalent of "import". That is,
345 it takes the exact same parameters as when one does:
346
    use DateTime::Format::Builder ( ... )

That can be (almost) equivalently written as:

    use DateTime::Format::Builder;
    DateTime::Format::Builder->create_class( ... );
353
354 The difference being that the first is done at compile time while the
355 second is done at run time.
356
In the tutorial I said there were only two parameters at present. I
lied. There are actually five of them.
359
360 • parsers
361
362 parsers takes a hashref of methods and their parser specifications.
363 See the DateTime::Format::Builder::Tutorial for details.
364
365 Note that if you define a subroutine of the same name as one of the
366 methods you define here, an error will be thrown.
367
368 • constructor
369
370 constructor determines whether and how to create a "new" function
371 in the new class. If given a true value, a constructor is created.
372 If given a false value, one isn't.
373
374 If given an anonymous sub or a reference to a sub then that is used
375 as "new".
376
377 The default is 1 (that is, create a constructor using our default
378 code which simply creates a hashref and blesses it).
379
380 If your class defines its own "new" method it will not be
381 overwritten. If you define your own "new" and also tell Builder to
382 define one an error will be thrown.
383
384 • verbose
385
386 verbose takes a value. If the value is "undef", then logging is
387 disabled. If the value is a filehandle then that's where logging
388 will go. If it's a true value, then output will go to "STDERR".
389
390 Alternatively, call $DateTime::Format::Builder::verbose with the
391 relevant value. Whichever value is given more recently is adhered
392 to.
393
394 Be aware that verbosity is a global setting.
395
396 • class
397
398 class is optional and specifies the name of the class in which to
399 create the specified methods.
400
401 If using this method in the guise of "import" then this field will
402 cause an error so it is only of use when calling as "create_class".
403
404 • version
405
version is also optional and specifies the value to give $VERSION
in the class. It's generally not recommended unless you're
combining with the class option. An "ExtUtils::MakeMaker" / "CPAN"
compliant version specification is much better.
410
411 In addition to creating any of the methods it also creates a "new"
412 method that can instantiate (or clone) objects.
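
A minimal sketch of "create_class" used at run time with the class option, so that the methods land in a named package rather than the caller's. The target class My::Format::Runtime and the method name parse_date are made up for the example.

```perl
use strict;
use warnings;

use DateTime::Format::Builder;

# Build the parser class at run time. 'My::Format::Runtime' is a
# hypothetical name; any package name without a conflicting method works.
DateTime::Format::Builder->create_class(
    class   => 'My::Format::Runtime',
    parsers => {
        parse_date => {
            regex  => qr/^(\d{4})-(\d\d)-(\d\d)$/,
            params => [qw( year month day )],
        },
    },
);

# The generated method can be called on the class...
my $dt = My::Format::Runtime->parse_date('2003-07-16');
print $dt->ymd('/'), "\n";

# ...or on an object made with the generated constructor.
my $obj = My::Format::Runtime->new;
my $dt2 = $obj->parse_date('1979-07-16');
print $dt2->year, "\n";
```

Since this runs after compilation, the parser definitions could just as well come from configuration read at start-up.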
413
SUBCLASSING
In the rest of the documentation I've often lied in order to get some
of the ideas across more easily. The thing is, this module's very
flexible. You can get markedly different behaviour from simply
subclassing it and overriding some methods.
419
420 create_method
421 Given a parser coderef, returns a coderef that is suitable to be a
422 method.
423
424 The default action is to call "on_fail" in the event of a non-parse,
425 but you can make it do whatever you want.
426
427 on_fail
This is called in the event of a non-parse (unless you've overridden
"create_method" to do something else).
430
431 The single argument is the input string. The default action is to call
432 "croak". Above, where I've said parsers or methods throw errors, this
433 is the method that is doing the error throwing.
434
435 You could conceivably override this method to, say, return "undef".
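
As a sketch of that last suggestion, the subclass below overrides "on_fail" to return "undef" and is then used in place of Builder to create a parser class. The names My::Builder and My::Format::Quiet are invented, and the sketch assumes that parsers created through the subclass pick up its "on_fail", as the multiple specification notes suggest.

```perl
package My::Builder;    # hypothetical subclass name

use strict;
use warnings;
use parent 'DateTime::Format::Builder';

# Replace the default error-throwing behaviour with a quiet failure.
sub on_fail { return undef }

package main;

# Create the parser class through the subclass so that its on_fail is
# the one consulted. 'My::Format::Quiet' is a hypothetical name.
My::Builder->create_class(
    class   => 'My::Format::Quiet',
    parsers => {
        parse_datetime => {
            regex  => qr/^(\d{4})(\d\d)(\d\d)$/,
            params => [qw( year month day )],
        },
    },
);

my $good = My::Format::Quiet->parse_datetime('20030716');
my $miss = My::Format::Quiet->parse_datetime('July 16th');

print $good->ymd, "\n";
print defined $miss ? "parsed\n" : "no match\n";
```

Compared with an "on_fail" key in an options block, this changes the behaviour for every parser created through the subclass, which may or may not be what you want.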
436
USING BUILDER OBJECTS aka USERS USE BUILDER
The methods listed in the METHODS section are all you generally need
when creating your own class. Sometimes you may not want a full blown
class to parse something just for this one program. Some methods are
provided to make that task easier.
442
443 new
444 The basic constructor. It takes no arguments, merely returns a new
445 "DateTime::Format::Builder" object.
446
    my $parser = DateTime::Format::Builder->new;

If called as a method on an object (rather than as a class method),
then it clones the object.

    my $clone = $parser->new;

clone
Provided for those who prefer an explicit "clone" method rather than
using "new" as an object method.

    my $clone_of_clone = $clone->clone;
459
460 parser
461 Given either a single or multiple parser specification, sets the object
462 to have a parser based on that specification.
463
    $parser->parser(
        regex  => qr/^ (\d{4}) (\d\d) (\d\d) $/x,
        params => [qw( year month day )],
    );
468
469 The arguments given to "parser" are handed directly to "create_parser".
470 The resultant parser is passed to "set_parser".
471
472 If called as an object method, it returns the object.
473
474 If called as a class method, it creates a new object, sets its parser
475 and returns that object.
476
477 set_parser
478 Sets the parser of the object to the given parser.
479
    $parser->set_parser($coderef);
481
482 Note: this method does not take specifications. It also does not take
483 anything except coderefs. Luckily, coderefs are what most of the other
484 methods produce.
485
486 The method return value is the object itself.
487
488 get_parser
489 Returns the parser the object is using.
490
    my $code = $parser->get_parser;
492
493 parse_datetime
494 Given a string, it calls the parser and returns the "DateTime" object
495 that results.
496
    my $dt = $parser->parse_datetime('1979 07 16');
498
499 The return value, if not a "DateTime" object, is whatever the parser
500 wants to return. Generally this means that if the parse failed an error
501 will be thrown.
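
Putting the object methods together, a throwaway parser for a single program might look like the sketch below. The regex spells out the whitespace explicitly (under /x, literal spaces in a pattern are ignored), and the failure case is wrapped in an eval because the default behaviour is to throw. The sample inputs are only for illustration.

```perl
use strict;
use warnings;

use DateTime::Format::Builder;

# Build a one-off parser object rather than a whole class.
my $parser = DateTime::Format::Builder->new;
$parser->parser(
    regex  => qr/^ (\d{4}) \s+ (\d\d) \s+ (\d\d) $/x,
    params => [qw( year month day )],
);

my $dt = $parser->parse_datetime('1979 07 16');
print $dt->ymd, "\n";

# A failed parse throws by default, so trap it with eval.
my $failed = !eval { $parser->parse_datetime('16th July 1979'); 1 };
print "could not parse: $@" if $failed;
```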
502
503 format_datetime
504 If you call this function, it will throw an error.
505
LONGER EXAMPLES
Some longer examples are provided in the distribution. These implement
some of the common DateTime parsing modules using Builder. Each of them
is, or was, a drop-in replacement for the corresponding module at the
time of writing.
511
THANKS
Dave Rolsky (DROLSKY) for kickstarting the DateTime project, writing
DateTime::Format::ICal and DateTime::Format::MySQL, and some much
needed review.
516
517 Joshua Hoblitt (JHOBLITT) for the concept, some of the API, impetus for
518 writing the multi-length code (both one length with multiple parsers
519 and single parser with multiple lengths), blame for the Regex custom
520 constructor code, spotting a bug in Dispatch, and more much needed
521 review.
522
523 Kellan Elliott-McCrea (KELLAN) for even more review, suggestions,
524 DateTime::Format::W3CDTF and the encouragement to rewrite these docs
525 almost 100%!
526
527 Claus Färber (CFAERBER) for having me get around to fixing the auto-
528 constructor writing, providing the 'args'/'self' patch, and suggesting
529 the multi-callbacks.
530
531 Rick Measham (RICKM) for DateTime::Format::Strptime which Builder now
532 supports.
533
534 Matthew McGillis for pointing out that "on_fail" overriding should be
535 simpler.
536
537 Simon Cozens (SIMON) for saying it was cool.
538
SEE ALSO
"datetime@perl.org" mailing list.
541
542 http://datetime.perl.org/
543
544 perl, DateTime, DateTime::Format::Builder::Tutorial,
545 DateTime::Format::Builder::Parser
546
SUPPORT
Bugs may be submitted at
<https://github.com/houseabsolute/DateTime-Format-Builder/issues>.
550
551 I am also usually active on IRC as 'autarch' on "irc://irc.perl.org".
552
SOURCE
The source code repository for DateTime-Format-Builder can be found at
<https://github.com/houseabsolute/DateTime-Format-Builder>.
556
DONATIONS
If you'd like to thank me for the work I've done on this module, please
consider making a "donation" to me via PayPal. I spend a lot of free
time creating free software, and would appreciate any support you'd
care to offer.
562
563 Please note that I am not suggesting that you must do this in order for
564 me to continue working on this particular software. I will continue to
565 do so, inasmuch as I have in the past, for as long as it interests me.
566
567 Similarly, a donation made in this way will probably not make me work
568 on this software much more, unless I get so many donations that I can
569 consider working on free software full time (let's all have a chuckle
570 at that together).
571
572 To donate, log into PayPal and send money to autarch@urth.org, or use
573 the button at <https://www.urth.org/fs-donation.html>.
574
AUTHORS
• Dave Rolsky <autarch@urth.org>
577
578 • Iain Truskett <spoon@cpan.org>
579
CONTRIBUTORS
• Daisuke Maki <daisuke@endeworks.jp>
582
583 • James Raspass <jraspass@gmail.com>
584
COPYRIGHT AND LICENSE
This software is Copyright (c) 2020 by Dave Rolsky.
587
588 This is free software, licensed under:
589
590 The Artistic License 2.0 (GPL Compatible)
591
592 The full text of the license can be found in the LICENSE file included
593 with this distribution.
594
595
596
perl v5.36.0                      2022-07-22      DateTime::Format::Builder(3)