Regexp::Pattern(3)    User Contributed Perl Documentation   Regexp::Pattern(3)

NAME
       Regexp::Pattern - Convention/framework for modules that contain a
       collection of regexes

SPECIFICATION VERSION
       0.2

VERSION
       This document describes version 0.2.13 of Regexp::Pattern (from Perl
       distribution Regexp-Pattern), released on 2020-02-08.

SYNOPSIS
       Subroutine interface:

           use Regexp::Pattern; # exports re()

           my $re = re('YouTube::video_id');
           say "ID does not look like a YouTube video ID" unless $id =~ /\A$re\z/;

           # a dynamic pattern (generated on demand) with generator arguments
           my $re2 = re('Example::re3', {variant=>"B"});

       Hash interface (a la Regexp::Common, but simpler: a regular,
       non-magical hash that is only one level deep):

           use Regexp::Pattern 'YouTube::video_id';
           say "ID does not look like a YouTube video ID"
               unless $id =~ /\A$RE{video_id}\z/;

           # more complex example

           use Regexp::Pattern (
               're',                                # we still want the re() function
               'Foo::bar' => (-as => 'qux'),        # the pattern will be in your $RE{qux}
               'YouTube::*',                        # wildcard import
               'Example::re3' => (variant => 'B'),  # supply generator arguments
               'JSON::*' => (-prefix => 'json_'),   # add prefix
               'License::*' => (
                   # filtering options
                   -has_tag => 'family:cc',             # only select patterns that have this tag
                   -lacks_tag => 'type:unversioned',    # only select patterns that do not have this tag
                   -has_tag_matching => qr/^type:/,     # only select patterns that have at least one tag matching this regex
                   -lacks_tag_matching => qr/^type:/,   # only select patterns that do not have any tag matching this regex

                   # other options
                   -prefix => 'pat_',      # add prefix
                   -suffix => '_license',  # add suffix
               ),
           );

DESCRIPTION
       Regexp::Pattern is a convention for organizing reusable regexp patterns
       in modules, as well as a framework that makes those patterns convenient
       to use in your program.

   Structure of an example Regexp::Pattern::* module
           package Regexp::Pattern::Example;

           our %RE = (
               # the minimum spec
               re1 => { pat => qr/\d{3}-\d{3}/ },

               # more complete spec
               re2 => {
                   summary => 'This is a regexp for blah', # plaintext
                   description => <<'_',

           A longer description in *Markdown* format.

           _
                   pat => qr/\d{3}-\d{3}(?:-\d{5})?/,
                   tags => ['A','B'],
                   examples => [
                       # Examples can be tested using the 'test-regexp-pattern'
                       # script (distributed in the Test-Regexp-Pattern
                       # distribution). Examples can also be rendered in your POD
                       # using Pod::Weaver::Plugin::Regexp::Pattern.
                       {
                           str => '123-456',
                           matches => 1,
                       },
                       {
                           summary => 'Another example that matches',
                           str => '123-456-78901',
                           matches => 1,
                       },
                       {
                           summary => 'An example that does not match',
                           str => '123456',
                           matches => 0,
                       },
                       {
                           summary => 'An example that does not get tested',
                           str => '123456',
                       },
                       {
                           summary => 'Another example that is neither tested nor rendered to POD',
                           str => '234567',
                           matches => 0,
                           test => 0,
                           doc => 0,
                       },
                   ],
               },

               # dynamic (regexp generator)
               re3 => {
                   summary => 'This is a regexp for blah blah',
                   description => <<'_',

           ...

           _
                   gen => sub {
                       my %args = @_;
                       my $variant = $args{variant} || 'A';
                       if ($variant eq 'A') {
                           return qr/\d{3}-\d{3}/;
                       } else { # B
                           return qr/\d{3}-\d{2}-\d{5}/;
                       }
                   },
                   gen_args => {
                       variant => {
                           summary => 'Choose variant',
                           schema => ['str*', in=>['A','B']],
                           default => 'A',
                           req => 1,
                       },
                   },
                   tags => ['B','C'],
                   examples => [
                       {
                           summary => 'An example that matches',
                           gen_args => {variant=>'A'},
                           str => '123-456',
                           matches => 1,
                       },
                       {
                           summary => "An example that doesn't match",
                           gen_args => {variant=>'B'},
                           str => '123-456',
                           matches => 0,
                       },
                   ],
               },

               re4 => {
                   summary => 'This is a regexp that does capturing',
                   # It is recommended that your pattern does not capture unless
                   # necessary. A capturing pattern should be tagged 'capturing'
                   # to let users/tools know.
                   tags => ['capturing'],
                   pat => qr/(\d{3})-(\d{3})/,
                   examples => [
                       {str=>'123-456', matches=>[123, 456]},
                       {str=>'foo-bar', matches=>[]},
                   ],
               },

               re5 => {
                   summary => 'This is another regexp that is anchored and does (named) capturing',
                   # It is recommended that your pattern is not anchored (for more
                   # reusability) unless necessary. An anchored pattern should be
                   # tagged 'anchored' to let users/tools know.
                   tags => ['capturing', 'anchored'],
                   pat => qr/^(?<cap1>\d{3})-(?<cap2>\d{3})/,
                   examples => [
                       {str=>'123-456', matches=>{cap1=>123, cap2=>456}},
                       {str=>'something 123-456', matches=>{}},
                   ],
               },
           );

       A Regexp::Pattern::* module must declare a package global hash variable
       named %RE. Hash keys are pattern names; hash values are pattern
       definitions in the form of defhashes (see DefHash).

       A pattern name should be a simple identifier that matches this regexp:
       "/\A[A-Za-z_][A-Za-z_0-9]*\z/". The definition for the qualified
       pattern name "Foo::Bar::baz" can then be located in
       %Regexp::Pattern::Foo::Bar::RE under the hash key "baz".
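
       The lookup rule just described can be sketched in plain Perl. This is
       a minimal illustration, not Regexp::Pattern's own code: the helper
       "lookup_pattern" is hypothetical, and the pattern module is defined
       inline instead of living in Regexp/Pattern/Foo/Bar.pm.

```perl
use strict;
use warnings;

# Inline stand-in for a pattern module; normally this package would live
# in its own file, Regexp/Pattern/Foo/Bar.pm.
package Regexp::Pattern::Foo::Bar;
our %RE = (
    baz => { pat => qr/\d{3}-\d{3}/ },
);

package main;

# Hypothetical helper (not part of Regexp::Pattern): split a qualified
# name such as "Foo::Bar::baz" into a module part and a pattern name,
# validate the name, and look the definition up in
# %Regexp::Pattern::<module>::RE.
sub lookup_pattern {
    my ($qname) = @_;
    my ($mod, $name) = $qname =~ /\A(.+)::([^:]+)\z/
        or die "Invalid qualified pattern name '$qname'\n";
    $name =~ /\A[A-Za-z_][A-Za-z_0-9]*\z/
        or die "Invalid pattern name '$name'\n";
    no strict 'refs';    # the hash name is built at run time
    my $table = \%{"Regexp::Pattern::${mod}::RE"};
    return $table->{$name}
        // die "No pattern '$name' in Regexp::Pattern::$mod\n";
}

my $def = lookup_pattern('Foo::Bar::baz');
my $pat = $def->{pat};
print '123-456' =~ /\A$pat\z/ ? "match\n" : "no match\n";   # match
```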

       At a minimum, a pattern definition hash should be:

           { pat => qr/.../ }

       You can add more properties from the DefHash specification, e.g.
       summary, description, tags, and so on. For example (taken from
       Regexp::Pattern::CPAN):

           {
               summary => 'PAUSE author ID, or PAUSE ID for short',
               pat => qr/[A-Z][A-Z0-9]{1,8}/,
               description => <<~HERE,   # <<~ (indented here-doc) requires Perl 5.26+
               I'm not sure whether PAUSE allows digit for the first letter. For safety
               I'm assuming no.
               HERE
               examples => [
                   {str=>'PERLANCAR', matches=>1},
                   {str=>'BAD ID', anchor=>1, matches=>0},
               ],
           }

       Examples. Your regexp specification can include an "examples" property
       (see above for an example). The value of the "examples" property is an
       array of defhashes, one per example. For each example, you should at
       minimum specify "str" (the string to be matched by the regexp),
       "gen_args" (a hash of arguments to use when generating a dynamic regexp
       pattern; only needed for dynamic patterns), and "matches" (a boolean
       value that specifies whether the regexp should match the string, or an
       array/hash that specifies the expected captures). As the example module
       above shows, an example without "matches" is not tested, and "test =>
       0" or "doc => 0" excludes an example from testing or from POD
       rendering. You can of course specify other defhash properties (e.g.
       "summary", "description", etc). Other example properties might be
       introduced in the future.
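
       To make the meaning of "str", "gen_args" and "matches" concrete, here
       is a simplified example checker. It is a hypothetical sketch, not the
       code used by Test::Regexp::Pattern or test-regexp-pattern; the name
       "check_example" is made up, and it assumes the "anchor" / "-anchor"
       conventions described in the FAQ below.

```perl
use strict;
use warnings;

# Hypothetical, simplified example checker (NOT Test::Regexp::Pattern's
# implementation). Returns 'ok', 'not ok', or 'skip'.
sub check_example {
    my ($def, $eg) = @_;
    # No "matches", or test => 0: the example is not tested.
    return 'skip' if !exists $eg->{matches} || (exists $eg->{test} && !$eg->{test});

    my %args   = %{ $eg->{gen_args} // {} };
    my $anchor = delete($args{-anchor}) || $eg->{anchor};
    # Static pattern, or call the generator for a dynamic one.
    my $re = $def->{pat} // $def->{gen}->(%args);
    $re = qr/\A(?:$re)\z/ if $anchor;

    my $want = $eg->{matches};
    my $ok;
    if (ref $want eq 'ARRAY') {            # expected numbered captures
        my @got = $eg->{str} =~ $re;
        $ok = "@got" eq "@$want";
    } elsif (ref $want eq 'HASH') {        # expected named captures
        my %got = $eg->{str} =~ $re ? %+ : ();
        $ok = join(',', map { $_ . '=' . $got{$_} } sort keys %got)
           eq join(',', map { $_ . '=' . $want->{$_} } sort keys %$want);
    } else {                               # plain boolean
        $ok = ($eg->{str} =~ $re ? 1 : 0) == ($want ? 1 : 0);
    }
    return $ok ? 'ok' : 'not ok';
}

# re4 and re5 from the example module above.
my %RE = (
    re4 => {
        pat      => qr/(\d{3})-(\d{3})/,
        examples => [
            {str => '123-456', matches => [123, 456]},
            {str => 'foo-bar', matches => []},
        ],
    },
    re5 => {
        pat      => qr/^(?<cap1>\d{3})-(?<cap2>\d{3})/,
        examples => [
            {str => '123-456',           matches => {cap1 => 123, cap2 => 456}},
            {str => 'something 123-456', matches => {}},
        ],
    },
);

for my $name (sort keys %RE) {
    for my $eg (@{ $RE{$name}{examples} }) {
        printf "%s '%s': %s\n", $name, $eg->{str}, check_example($RE{$name}, $eg);
    }
}
```

       Array values are compared against the numbered captures returned in
       list context, hash values against the named captures in %+.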

       If you use Dist::Zilla to build your distribution, you can use the
       plugin [Regexp::Pattern] to test the examples at build time, and the
       Pod::Weaver plugin [-Regexp::Pattern] to render the examples in your
       POD.

   Using a Regexp::Pattern::* module
       Standalone

       A Regexp::Pattern::* module can be used in a standalone way (i.e.
       without going through the Regexp::Pattern framework), as it simply
       contains data that can be accessed by normal means, e.g.:

           use Regexp::Pattern::Example;

           say "Input does not match blah"
               unless $input =~ /\A$Regexp::Pattern::Example::RE{re1}{pat}\z/;
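
       Standalone use also works for dynamic patterns: since "gen" is just a
       code reference in the data, you can call it yourself with generator
       arguments. A minimal sketch, assuming the definitions of re1 and re3
       from the example module above (reproduced inline so the snippet runs
       without the module being installed):

```perl
use strict;
use warnings;

# Inline stand-in for Regexp::Pattern::Example (normally loaded with
# "use Regexp::Pattern::Example;"); only re1 and re3 are reproduced.
package Regexp::Pattern::Example;
our %RE = (
    re1 => { pat => qr/\d{3}-\d{3}/ },
    re3 => {
        gen => sub {
            my %args = @_;
            my $variant = $args{variant} || 'A';
            return $variant eq 'A' ? qr/\d{3}-\d{3}/ : qr/\d{3}-\d{2}-\d{5}/;
        },
    },
);

package main;

# Static pattern: read the "pat" property directly.
my $re1 = $Regexp::Pattern::Example::RE{re1}{pat};

# Dynamic pattern: call the "gen" coderef yourself with generator arguments.
my $re3b = $Regexp::Pattern::Example::RE{re3}{gen}->(variant => 'B');

for my $input ('123-456', '123-45-67890') {
    printf "%-12s re1: %-3s re3(B): %s\n", $input,
        ($input =~ /\A$re1\z/  ? 'yes' : 'no'),
        ($input =~ /\A$re3b\z/ ? 'yes' : 'no');
}
```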

       Via Regexp::Pattern, sub interface

       Regexp::Pattern (this module) also provides the "re()" function to help
       retrieve a regexp pattern. See "re" for more details.

       Via Regexp::Pattern, hash interface

       Additionally, Regexp::Pattern (since v0.2.0) lets you import regexp
       patterns into your %RE package hash variable, a la Regexp::Common (but
       simpler, because the hash is just a regular hash, only one level deep,
       and not magical).

       To import, specify qualified pattern names as the import arguments:

           use Regexp::Pattern 'Q::pat1', 'Q::pat2', ...;

       Each qualified pattern name can optionally be followed by a list of
       name-value pairs. A pair name can be either an option name (a dash
       followed by a word, e.g. "-as", "-prefix") or a generator argument
       name for a dynamic pattern.
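
       One way to read such an import list is sketched below. The parser is
       hypothetical and only illustrates the grammar described above; it
       assumes that qualified names are recognized by containing "::" and
       that the bare word 're' requests the "re()" function. It is not
       Regexp::Pattern's own import code.

```perl
use strict;
use warnings;

# Hypothetical parser (an illustration, not Regexp::Pattern's own code).
# A qualified name (containing "::") may be followed by name-value pairs;
# pair names starting with a dash are options, the others are generator
# arguments. The plain word 're' requests the re() function.
sub parse_import_args {
    my @args = @_;
    my $want_re = 0;
    my @specs;
    while (@args) {
        my $item = shift @args;
        if ($item eq 're') { $want_re = 1; next }
        die "Not a qualified pattern name: '$item'\n" unless $item =~ /::/;

        my (%opts, %gen_args);
        # Consume pairs until the next qualified name (or 're') shows up.
        while (@args >= 2 && $args[0] ne 're' && $args[0] !~ /::/) {
            my ($key, $value) = splice @args, 0, 2;
            if ($key =~ /\A-\w+\z/) { $opts{$key}     = $value }
            else                    { $gen_args{$key} = $value }
        }
        push @specs, { name => $item, opts => \%opts, gen_args => \%gen_args };
    }
    return ($want_re, \@specs);
}

my ($want_re, $specs) = parse_import_args(
    're',
    'Foo::bar'     => (-as => 'qux'),
    'YouTube::*',
    'Example::re3' => (variant => 'B'),
);
for my $spec (@$specs) {
    printf "%-13s options: %-8s generator args: %s\n", $spec->{name},
        join(',', map { $_ . '=' . $spec->{opts}{$_} } sort keys %{ $spec->{opts} }),
        join(',', map { $_ . '=' . $spec->{gen_args}{$_} } sort keys %{ $spec->{gen_args} });
}
```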

       Wildcard import. Instead of a qualified pattern name, you can use the
       'Module::SubModule::*' wildcard syntax to import all patterns from a
       pattern module.

       Importing into a different name. You can add the import option "-as"
       to import into a different name, for example:

           use Regexp::Pattern 'YouTube::video_id' => (-as => 'yt_id');

       Prefix and suffix. You can also add a prefix and/or suffix to the
       imported name:

           use Regexp::Pattern 'Example::*' => (-prefix => 'example_');
           use Regexp::Pattern 'Example::*' => (-suffix => '_sample');

       Filtering. When wildcard-importing, you can select the patterns you
       want using a combination of these options: "-has_tag" (only select
       patterns that have the specified tag), "-lacks_tag" (only select
       patterns that do not have the specified tag), "-has_tag_matching" (only
       select patterns that have at least one tag matching the specified regex
       pattern), and "-lacks_tag_matching" (only select patterns that do not
       have any tag matching the specified regex pattern).
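
       The filtering and renaming rules can be illustrated with a small
       sketch. The function "select_patterns" and the pattern table below are
       hypothetical (the table is only loosely modeled on the License::*
       example in the SYNOPSIS); this is not Regexp::Pattern's own
       implementation.

```perl
use strict;
use warnings;

# Hypothetical sketch of how a wildcard import could be filtered by tags
# and renamed with -prefix/-suffix; not Regexp::Pattern's own code.
sub select_patterns {
    my ($re_table, %opt) = @_;
    my %selected;
    for my $name (sort keys %$re_table) {
        my @tags = @{ $re_table->{$name}{tags} // [] };
        next if defined $opt{-has_tag}
            && !grep { $_ eq $opt{-has_tag} } @tags;
        next if defined $opt{-lacks_tag}
            && grep { $_ eq $opt{-lacks_tag} } @tags;
        next if defined $opt{-has_tag_matching}
            && !grep { $_ =~ $opt{-has_tag_matching} } @tags;
        next if defined $opt{-lacks_tag_matching}
            && grep { $_ =~ $opt{-lacks_tag_matching} } @tags;
        # Name under which the pattern would land in the caller's %RE.
        my $import_as = ($opt{-prefix} // '') . $name . ($opt{-suffix} // '');
        $selected{$import_as} = $re_table->{$name}{pat};
    }
    return \%selected;
}

# Made-up pattern table for illustration only.
my %RE = (
    cc_by => { pat => qr/CC-BY/, tags => ['family:cc', 'type:versioned'] },
    cc0   => { pat => qr/CC0/,   tags => ['family:cc', 'type:unversioned'] },
    gpl   => { pat => qr/GPL/,   tags => ['family:gnu'] },
);

my $sel = select_patterns(
    \%RE,
    -has_tag   => 'family:cc',
    -lacks_tag => 'type:unversioned',
    -prefix    => 'pat_',
    -suffix    => '_license',
);
print join(', ', sort keys %$sel), "\n";   # pat_cc_by_license
```

       Each filtering option narrows the selection independently, so
       combining options behaves like a logical AND.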

   Recommendations for writing the regex patterns
       ·   Regexp pattern should in general be written as a "qr//" literal
           instead of a string

           That is:

               pat => qr/foo[abc]+/,

           is preferred over:

               pat => 'foo[abc]+',

           Using a string literal is less desirable because it lacks
           compile-time checking. An exception to this rule is when you want
           to delay regex compilation for some reason, e.g. you want your
           users to compile the patterns themselves using a different regex
           engine (see the "re::engine::*" modules on CPAN).

       ·   Regexp pattern should not be anchored (unless really necessary)

           That is:

               pat => qr/foo/,

           is preferred over:

               pat => qr/^foo/, # or qr/foo$/, or qr/\Afoo\z/

           Adding anchors limits the reusability of the pattern. When
           composing patterns, users can add anchors themselves if needed.

           When you define an anchored pattern, adding the tag "anchored" is
           recommended:

               tags => ['anchored'],

       ·   Regexp pattern should not contain capture groups (unless really
           necessary)

           Adding capture groups limits the reusability of the pattern because
           they shift the numbering of the capture groups in the composed
           pattern. When composing patterns, users can add captures
           themselves if needed.

           When you define a capturing pattern, adding the tag "capturing" is
           recommended:

               tags => ['capturing'],
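
       The three recommendations above can be checked with a short
       experiment. The phone/extension input is made up for illustration;
       the script composes an unanchored non-capturing pattern, an anchored
       one, and a capturing one into the same larger regex, then shows that a
       string pattern only fails when it is compiled at run time.

```perl
use strict;
use warnings;

my $str = 'phone: 123-456 (ext 78901)';

my $plain     = qr/\d{3}-\d{3}/;       # recommended: unanchored, non-capturing
my $anchored  = qr/^\d{3}-\d{3}/;      # anchored variant
my $capturing = qr/(\d{3})-(\d{3})/;   # capturing variant

# 1. The plain building block composes cleanly; the extension is group 1.
my ($ext) = $str =~ /phone: $plain \(ext (\d{5})\)/;

# 2. The embedded ^ can only match at the very start of the string, so the
#    composed pattern no longer matches this input at all.
my $anchored_ok = $str =~ /phone: $anchored \(ext (\d{5})\)/ ? 1 : 0;

# 3. The two embedded groups shift the numbering: the extension is now the
#    third capture instead of the first.
my @caps = $str =~ /phone: $capturing \(ext (\d{5})\)/;

# 4. A pattern kept as a string is only compiled when used, so a typo such as
#    an unclosed bracket is reported at run time instead of compile time.
my $string_pat  = 'foo[abc';
my $compiled_ok = eval { my $re = qr/$string_pat/; 1 } ? 1 : 0;

print "extension via plain pattern: $ext\n";             # 78901
print "anchored pattern still matches: $anchored_ok\n";  # 0
print "captures with capturing pattern: @caps\n";        # 123 456 78901
print "string pattern compiled: $compiled_ok\n";         # 0
```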

FUNCTIONS
   re
       Exported by default. Get a regexp pattern by name from a
       "Regexp::Pattern::*" module.

       Usage:

           re($name[, \%args ]) => $re

       $name is MODULE_NAME::PATTERN_NAME, where MODULE_NAME is the name of a
       "Regexp::Pattern::*" module without the "Regexp::Pattern::" prefix and
       PATTERN_NAME is a key in the %RE package global hash of that module. A
       dynamic pattern can accept arguments for its generator; pass them as a
       hashref in the second argument of "re()".

       Anchoring. You can also put "-anchor => 1" in %args. This conveniently
       wraps the regex inside "qr/\A(?:...)\z/".

       Dies when the pattern named $name cannot be found (either the module
       cannot be loaded or the pattern with that name is not found in the
       module).
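
       The documented behavior of "re()" can be approximated as follows.
       This is a hypothetical sketch, not Regexp::Pattern's source: it is
       named "my_re" to avoid confusion with the real function, and it
       assumes that "-anchor" is the only option and that every other key in
       %args is passed to the generator.

```perl
use strict;
use warnings;

# Inline stand-in for Regexp::Pattern::Example (normally a .pm file).
package Regexp::Pattern::Example;
our %RE = (
    re1 => { pat => qr/\d{3}-\d{3}/ },
    re3 => { gen => sub {
        my %args = @_;
        my $variant = $args{variant} || 'A';
        return $variant eq 'A' ? qr/\d{3}-\d{3}/ : qr/\d{3}-\d{2}-\d{5}/;
    } },
);

package main;

# Hypothetical re()-like lookup; a sketch of the documented behavior only.
sub my_re {
    my ($qname, $args) = @_;
    my %args = %{ $args // {} };
    my ($mod, $name) = $qname =~ /\A(.+)::([^:]+)\z/
        or die "Invalid pattern name '$qname'\n";

    no strict 'refs';                       # hash name is built at run time
    my $table = \%{"Regexp::Pattern::${mod}::RE"};
    unless (%$table) {                      # load the module if needed
        (my $file = "Regexp/Pattern/$mod.pm") =~ s{::}{/}g;
        require $file;                      # dies if the module can't be loaded
    }
    my $def = $table->{$name}
        or die "No pattern '$name' in module Regexp::Pattern::$mod\n";

    my $anchor = delete $args{-anchor};     # option, not a generator argument
    my $re = $def->{pat} // $def->{gen}->(%args);
    return $anchor ? qr/\A(?:$re)\z/ : $re;
}

my $re3 = my_re('Example::re3', {variant => 'B', -anchor => 1});
print '123-45-67890' =~ $re3 ? "match\n" : "no match\n";   # match
```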

FAQ
   My pattern is not anchored, but what if I want to test the anchored version?
       You can add "anchor=>1" or "gen_args=>{-anchor=>1}" to the example, for
       example:

           {
               summary => 'PAUSE author ID, or PAUSE ID for short',
               pat => qr/[A-Z][A-Z0-9]{1,8}/,
               description => <<~HERE,
               I'm not sure whether PAUSE allows digit for the first letter. For safety
               I'm assuming no.
               HERE
               examples => [
                   {str=>'PERLANCAR', matches=>1},
                   {str=>'BAD ID', anchor=>1, matches=>0, summary=>"Contains whitespace"},
                   {str=>'NAMETOOLONG', gen_args=>{-anchor=>1}, matches=>0, summary=>"Too long"},
               ],
           }
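
       Why the anchor matters for these examples: unanchored, the pattern
       finds a valid-looking ID inside 'BAD ID' ("BAD") and inside
       'NAMETOOLONG' (its first nine letters), so both would count as
       matches. The snippet below, which only uses core Perl, compares the
       two cases.

```perl
use strict;
use warnings;

my $pat = qr/[A-Z][A-Z0-9]{1,8}/;   # the PAUSE ID pattern from the example above

my %result;
for my $str ('PERLANCAR', 'BAD ID', 'NAMETOOLONG') {
    my $unanchored = $str =~ $pat           ? 'match' : 'no match';
    my $anchored   = $str =~ /\A(?:$pat)\z/ ? 'match' : 'no match';
    $result{$str} = [$unanchored, $anchored];
    printf "%-12s unanchored: %-9s anchored: %s\n", $str, $unanchored, $anchored;
}
```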

HOMEPAGE
       Please visit the project's homepage at
       <https://metacpan.org/release/Regexp-Pattern>.

SOURCE
       Source repository is at
       <https://github.com/perlancar/perl-Regexp-Pattern>.

BUGS
       Please report any bugs or feature requests on the bugtracker website
       <https://rt.cpan.org/Public/Dist/Display.html?Name=Regexp-Pattern>

       When submitting a bug or request, please include a test-file or a patch
       to an existing test-file that illustrates the bug or desired feature.

SEE ALSO
       Regexp::Common. Regexp::Pattern is an alternative to Regexp::Common.
       Regexp::Pattern offers simplicity and lower startup overhead. Instead
       of a magic hash, you retrieve available regexes from a normal data
       structure or via the provided "re()" function. Regexp::Pattern also
       provides a hash interface, albeit one that is not magic.

       Regexp::Common::RegexpPattern, a bridge module to use patterns in
       "Regexp::Pattern::*" modules via Regexp::Common.

       Regexp::Pattern::RegexpCommon, a bridge module to use patterns in
       "Regexp::Common::*" modules via Regexp::Pattern.

       App::RegexpPatternUtils

       If you use Dist::Zilla: Dist::Zilla::Plugin::Regexp::Pattern,
       Pod::Weaver::Plugin::Regexp::Pattern,
       Dist::Zilla::Plugin::AddModule::RegexpCommon::FromRegexpPattern,
       Dist::Zilla::Plugin::AddModule::RegexpPattern::FromRegexpCommon.

       Test::Regexp::Pattern and test-regexp-pattern.

AUTHOR
       perlancar <perlancar@cpan.org>

COPYRIGHT AND LICENSE
       This software is copyright (c) 2020, 2019, 2018, 2016 by
       perlancar@cpan.org.

       This is free software; you can redistribute it and/or modify it under
       the same terms as the Perl 5 programming language system itself.

perl v5.30.1                      2020-02-08               Regexp::Pattern(3)