Regexp::Pattern(3)    User Contributed Perl Documentation    Regexp::Pattern(3)

NAME
       Regexp::Pattern - Convention/framework for modules that contain
       collection of regexes

SPECIFICATION VERSION
       0.2

VERSION
       This document describes version 0.2.14 of Regexp::Pattern (from Perl
       distribution Regexp-Pattern), released on 2020-04-01.

SYNOPSIS
       Subroutine interface:

           use Regexp::Pattern; # exports re()

           my $re = re('YouTube::video_id');
           say "ID does not look like a YouTube video ID" unless $id =~ /\A$re\z/;

           # a dynamic pattern (generated on-demand) with generator arguments
           my $re2 = re('Example::re3', {variant=>"B"});

       Hash interface (a la Regexp::Common, but simpler: a regular, non-magical
       hash that is only one level deep):

           use Regexp::Pattern 'YouTube::video_id';
           say "ID does not look like a YouTube video ID"
               unless $id =~ /\A$RE{video_id}\z/;

           # more complex example

           use Regexp::Pattern (
               're',                               # we still want the re() function
               'Foo::bar' => (-as => 'qux'),       # the pattern will be in your $RE{qux}
               'YouTube::*',                       # wildcard import
               'Example::re3' => (variant => 'B'), # supply generator arguments
               'JSON::*' => (-prefix => 'json_'),  # add prefix
               'License::*' => (
                   # filtering options
                   -has_tag => 'family:cc',           # only select patterns that have this tag
                   -lacks_tag => 'type:unversioned',  # only select patterns that do not have this tag
                   -has_tag_matching => qr/^type:/,   # only select patterns that have at least a tag matching this regex
                   -lacks_tag_matching => qr/^type:/, # only select patterns that do not have any tags matching this regex

                   # other options
                   -prefix => 'pat_',     # add prefix
                   -suffix => '_license', # add suffix
               ),
           );

DESCRIPTION
       Regexp::Pattern is a convention for organizing reusable regexp patterns
       in modules, as well as a framework that makes it convenient to use
       those patterns in your program.

   Structure of an example Regexp::Pattern::* module
           package Regexp::Pattern::Example;

           our %RE = (
               # the minimum spec
               re1 => { pat => qr/\d{3}-\d{3}/ },

               # more complete spec
               re2 => {
                   summary => 'This is regexp for blah', # plaintext
                   description => <<'_',

           A longer description in *Markdown* format.

           _
                   pat => qr/\d{3}-\d{3}(?:-\d{5})?/,
                   tags => ['A','B'],
                   examples => [
                       # examples can be tested using 'test-regexp-pattern' script
                       # (distributed in Test-Regexp-Pattern distribution). examples can
                       # also be rendered in your POD using
                       # Pod::Weaver::Plugin::Regexp::Pattern.
                       {
                           str => '123-456',
                           matches => 1,
                       },
                       {
                           summary => 'Another example that matches',
                           str => '123-456-78901',
                           matches => 1,
                       },
                       {
                           summary => 'An example that does not match',
                           str => '123456',
                           matches => 0,
                       },
                       {
                           summary => 'An example that does not get tested',
                           str => '123456',
                       },
                       {
                           summary => 'Another example that does not get tested nor rendered to POD',
                           str => '234567',
                           matches => 0,
                           test => 0,
                           doc => 0,
                       },
                   ],
               },

               # dynamic (regexp generator)
               re3 => {
                   summary => 'This is a regexp for blah blah',
                   description => <<'_',

           ...

           _
                   gen => sub {
                       my %args = @_;
                       my $variant = $args{variant} || 'A';
                       if ($variant eq 'A') {
                           return qr/\d{3}-\d{3}/;
                       } else { # B
                           return qr/\d{3}-\d{2}-\d{5}/;
                       }
                   },
                   gen_args => {
                       variant => {
                           summary => 'Choose variant',
                           schema => ['str*', in=>['A','B']],
                           default => 'A',
                           req => 1,
                       },
                   },
                   tags => ['B','C'],
                   examples => [
                       {
                           summary => 'An example that matches',
                           gen_args => {variant=>'A'},
                           str => '123-456',
                           matches => 1,
                       },
                       {
                           summary => "An example that doesn't match",
                           gen_args => {variant=>'B'},
                           str => '123-456',
                           matches => 0,
                       },
                   ],
               },

               re4 => {
                   summary => 'This is a regexp that does capturing',
                   # it is recommended that your pattern does not capture, unless
                   # necessary. a capturing pattern should be tagged 'capturing'
                   # to let users/tools know.
                   tags => ['capturing'],
                   pat => qr/(\d{3})-(\d{3})/,
                   examples => [
                       {str=>'123-456', matches=>[123, 456]},
                       {str=>'foo-bar', matches=>[]},
                   ],
               },

               re5 => {
                   summary => 'This is another regexp that is anchored and does (named) capturing',
                   # it is recommended that your pattern is not anchored, for more
                   # reusability, unless necessary. an anchored pattern should be
                   # tagged 'anchored' to let users/tools know.
                   tags => ['capturing', 'anchored'],
                   pat => qr/^(?<cap1>\d{3})-(?<cap2>\d{3})/,
                   examples => [
                       {str=>'123-456', matches=>{cap1=>123, cap2=>456}},
                       {str=>'something 123-456', matches=>{}},
                   ],
               },
           );

       A Regexp::Pattern::* module must declare a package global hash variable
       named %RE. Hash keys are pattern names; hash values are pattern
       definitions in the form of defhashes (see DefHash).

       A pattern name should be a simple identifier that matches this regexp:
       "/\A[A-Za-z_][A-Za-z_0-9]*\z/". The definition for the qualified
       pattern name "Foo::Bar::baz" can then be located in
       %Regexp::Pattern::Foo::Bar::RE under the hash key "baz".
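
       The naming rule above can be sketched in a few lines of core Perl.
       This only illustrates the convention and is not the module's actual
       implementation; the helper name locate_pattern and the inline
       Regexp::Pattern::Foo::Bar package are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical pattern module, defined inline so the sketch is self-contained.
package Regexp::Pattern::Foo::Bar {
    our %RE = (
        baz => { pat => qr/\d{3}-\d{3}/ },
    );
}

package main;

# Split a qualified pattern name into its module part and pattern name,
# then look the definition up in that module's %RE package hash.
sub locate_pattern {
    my ($qname) = @_;
    my ($module, $name) = $qname =~ /\A(.+)::([A-Za-z_][A-Za-z_0-9]*)\z/
        or die "Invalid qualified pattern name '$qname'";
    my $package = "Regexp::Pattern::$module";
    no strict 'refs';    # symbolic reference to %{"${package}::RE"}
    my $def = ${"${package}::RE"}{$name}
        or die "No pattern '$name' in package $package";
    return $def;
}

my $def = locate_pattern('Foo::Bar::baz');
print "found\n" if '123-456' =~ /\A$def->{pat}\z/;
```

       Everything before the last "::" selects the package and the final
       component selects the hash key, which is why pattern names themselves
       must be simple identifiers.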

       At a minimum, a pattern definition hash should be:

           { pat => qr/.../ }

       You can add more properties from the defhash specification, e.g.
       summary, description, tags, and so on. For example (taken from
       Regexp::Pattern::CPAN):

           {
               summary => 'PAUSE author ID, or PAUSE ID for short',
               pat => qr/[A-Z][A-Z0-9]{1,8}/,
               description => <<~HERE,
                   I'm not sure whether PAUSE allows digit for the first letter. For safety
                   I'm assuming no.
                   HERE
               examples => [
                   {str=>'PERLANCAR', matches=>1},
                   {str=>'BAD ID', anchor=>1, matches=>0},
               ],
           }

       Examples. Your regexp specification can include an "examples" property
       (see above for an example). The value of the "examples" property is an
       array, each element of which should be a defhash. For each example, at
       a minimum you should specify "str" (the string to be matched by the
       regexp) and "matches" (a boolean value that specifies whether the
       regexp should match the string, or an array/hash that specifies the
       captures); for a dynamic pattern, also specify "gen_args" (a hash of
       arguments to use when generating the regexp pattern). You can of
       course specify other defhash properties (e.g. "summary",
       "description", etc). Other example properties might be introduced in
       the future.
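
       As a rough illustration of the boolean form of "matches", the
       following core-Perl sketch checks a few examples against a pattern. It
       is not how test-regexp-pattern is implemented; the check_examples
       helper is hypothetical, strings are matched unanchored, and skipping
       examples that lack "matches" or set "test => 0" is an assumption based
       on the comments in the sample module.

```perl
use strict;
use warnings;

my $spec = {
    pat      => qr/\d{3}-\d{3}(?:-\d{5})?/,
    examples => [
        { str => '123-456',       matches => 1 },
        { str => '123-456-78901', matches => 1 },
        { str => '123456',        matches => 0 },
        { str => '123456' },                          # no "matches": not tested
        { str => '234567', matches => 0, test => 0 }, # explicitly skipped
    ],
};

# Return the number of examples checked; die on the first mismatch.
sub check_examples {
    my ($spec) = @_;
    my $checked = 0;
    for my $eg (@{ $spec->{examples} }) {
        next unless exists $eg->{matches};
        next if exists $eg->{test} && !$eg->{test};
        my $got  = $eg->{str} =~ $spec->{pat} ? 1 : 0;
        my $want = $eg->{matches} ? 1 : 0;
        die "Example '$eg->{str}': expected $want, got $got\n"
            unless $got == $want;
        $checked++;
    }
    return $checked;
}

print check_examples($spec), " examples checked\n";
```

       Only the boolean form is handled here; the array and hash forms of
       "matches", which describe captures, would need a comparison of the
       captured values instead.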

       If you use Dist::Zilla to build your distribution, you can use the
       plugin [Regexp::Pattern] to test the examples during building, and the
       Pod::Weaver plugin [-Regexp::Pattern] to render the examples in your
       POD.

   Using a Regexp::Pattern::* module
     Standalone

       A Regexp::Pattern::* module can be used standalone (i.e. without going
       through the Regexp::Pattern framework), as it simply contains data
       that can be accessed by normal means, e.g.:

           use Regexp::Pattern::Example;

           say "Input does not match blah"
               unless $input =~ /\A$Regexp::Pattern::Example::RE{re1}{pat}\z/;

     Via Regexp::Pattern, sub interface

       Regexp::Pattern (this module) also provides the "re()" function to
       help retrieve the regexp pattern. See "re" for more details.

     Via Regexp::Pattern, hash interface

       Additionally, Regexp::Pattern (since v0.2.0) lets you import regexp
       patterns into your %RE package hash variable, a la Regexp::Common (but
       simpler because the hash is just a regular hash, only 1-level deep, and
       not magical).

       To import, you specify qualified pattern names as the import arguments:

           use Regexp::Pattern 'Q::pat1', 'Q::pat2', ...;

       Each qualified pattern name can optionally be followed by a list of
       name-value pairs. A pair name can be an option name (a dash followed
       by a word, e.g. "-as", "-prefix") or a generator argument name for a
       dynamic pattern.

       Wildcard import. Instead of a qualified pattern name, you can use the
       'Module::SubModule::*' wildcard syntax to import all patterns from a
       pattern module.

       Importing into a different name. You can add the import option "-as" to
       import into a different name, for example:

           use Regexp::Pattern 'YouTube::video_id' => (-as => 'yt_id');

       Prefix and suffix. You can also add a prefix and/or suffix to the
       imported name:

           use Regexp::Pattern 'Example::*' => (-prefix => 'example_');
           use Regexp::Pattern 'Example::*' => (-suffix => '_sample');

       Filtering. When wildcard-importing, you can select the patterns you
       want using a combination of these options: "-has_tag" (only select
       patterns that have the specified tag), "-lacks_tag" (only select
       patterns that do not have the specified tag), "-has_tag_matching"
       (only select patterns that have at least one tag matching the
       specified regex pattern), "-lacks_tag_matching" (only select patterns
       that do not have any tags matching the specified regex pattern).
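
       The selection and naming rules above can be pictured with a small
       core-Perl sketch. This is not the module's import code: the
       select_patterns helper and the sample %RE contents are hypothetical,
       and it assumes that all given filter options must hold at once and
       that the imported name is prefix, pattern name, then suffix.

```perl
use strict;
use warnings;

# Hypothetical pattern collection standing in for a module's %RE.
my %RE = (
    cc_by_4 => { pat => qr/CC-BY-4\.0/, tags => ['family:cc', 'type:versioned'] },
    cc_by   => { pat => qr/CC-BY/,      tags => ['family:cc', 'type:unversioned'] },
    mit     => { pat => qr/MIT/,        tags => ['family:mit'] },
);

# Select patterns by tag and compute the names they would be imported under.
sub select_patterns {
    my ($re, %opt) = @_;
    my %selected;
    for my $name (sort keys %$re) {
        my @tags = @{ $re->{$name}{tags} || [] };
        if (defined $opt{-has_tag}) {
            next unless grep { $_ eq $opt{-has_tag} } @tags;
        }
        if (defined $opt{-lacks_tag}) {
            next if grep { $_ eq $opt{-lacks_tag} } @tags;
        }
        if (defined $opt{-has_tag_matching}) {
            next unless grep { $_ =~ $opt{-has_tag_matching} } @tags;
        }
        if (defined $opt{-lacks_tag_matching}) {
            next if grep { $_ =~ $opt{-lacks_tag_matching} } @tags;
        }
        my $as = ($opt{-prefix} // '') . $name . ($opt{-suffix} // '');
        $selected{$as} = $re->{$name}{pat};
    }
    return %selected;
}

my %imported = select_patterns(
    \%RE,
    -has_tag   => 'family:cc',
    -lacks_tag => 'type:unversioned',
    -prefix    => 'pat_',
    -suffix    => '_license',
);
print join(', ', sort keys %imported), "\n";
```

       With the sample data, only cc_by_4 passes both filters, so the
       resulting hash has the single key pat_cc_by_4_license.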

   Recommendations for writing the regex patterns
       •   A regexp pattern should in general be written as a "qr//" literal
           instead of a string

           That is:

               pat => qr/foo[abc]+/,

           is preferred over:

               pat => 'foo[abc]+',

           Using a string literal is less desirable because of the lack of
           compile-time checking. An exception to this rule is when you want
           to delay regex compilation for some reason, e.g. you want your
           users to compile the patterns themselves using a different regex
           engine (see "re::engine::*" modules on CPAN).

       •   A regexp pattern should not be anchored (unless really necessary)

           That is:

               pat => qr/foo/,

           is preferred over:

               pat => qr/^foo/, # or qr/foo$/, or qr/\Afoo\z/

           Adding anchors limits the reusability of the pattern. When
           composing a pattern, the user can add anchors herself if needed.

           When you define an anchored pattern, adding the tag "anchored" is
           recommended:

               tags => ['anchored'],

       •   A regexp pattern should not contain capture groups (unless really
           necessary)

           Adding capture groups limits the reusability of the pattern because
           it can affect the groups of the composed pattern. When composing
           a pattern, the user can add captures herself if needed.

           When you define a capturing pattern, adding the tag "capturing" is
           recommended:

               tags => ['capturing'],
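
       To see why anchors and capture groups get in the way of reuse, here is
       a minimal core-Perl sketch of composing patterns into larger ones. The
       variable names and sample strings are made up for illustration.

```perl
use strict;
use warnings;

my $plain     = qr/\d{3}-\d{3}/;      # no captures, no anchors
my $capturing = qr/(\d{3})-(\d{3})/;  # two capture groups
my $anchored  = qr/^\d{3}-\d{3}/;     # anchored at the start

# Composing with the plain pattern: the caller's own group stays $1.
my ($label) = 'tel: 123-456' =~ /(\w+): $plain/;
print "label=$label\n";

# Composing with the capturing pattern: it contributes groups 2 and 3,
# so a group the caller adds after it is pushed to position 4.
my @groups = 'tel: 123-456 ext' =~ /(\w+): $capturing (\w+)/;
print scalar(@groups), " groups\n";

# The anchored pattern cannot match in the middle of a larger string.
print "anchored pattern fails when embedded\n"
    unless 'tel: 123-456' =~ /tel: $anchored/;
```

       The plain pattern composes without side effects, while the other two
       force the caller to renumber groups or give up on embedding the
       pattern at all.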

FUNCTIONS
   re
       Exported by default. Gets a regexp pattern by name from a
       "Regexp::Pattern::*" module.

       Usage:

           re($name[, \%args ]) => $re

       $name is MODULE_NAME::PATTERN_NAME, where MODULE_NAME is the name of a
       "Regexp::Pattern::*" module without the "Regexp::Pattern::" prefix and
       PATTERN_NAME is a key in the %RE package global hash of that module. A
       dynamic pattern can accept arguments for its generator, and you can
       pass them as a hashref in the second argument of "re()".

       Anchoring. You can also put "-anchor => 1" in %args. This
       conveniently wraps the regex inside "qr/\A(?:...)\z/". To add only
       left anchoring, specify "-anchor => 'left'" ("qr/\A(?:...)/"). To add
       only right anchoring, specify "-anchor => 'right'" ("qr/(?:...)\z/").

       Dies when the pattern named $name cannot be found (either the module
       cannot be loaded or the pattern with that name is not found in the
       module).
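
       The behaviour described above can be approximated in core Perl as
       follows. This is a simplified sketch, not the real "re()": the name
       my_re is hypothetical, the pattern module is defined inline rather
       than loaded on demand, and only the "-anchor" option is handled.

```perl
use strict;
use warnings;

# Hypothetical pattern module, defined inline instead of being loaded.
package Regexp::Pattern::Example {
    our %RE = (
        re1 => { pat => qr/\d{3}-\d{3}/ },
        re3 => {
            gen => sub {
                my %args = @_;
                my $variant = $args{variant} || 'A';
                return $variant eq 'A' ? qr/\d{3}-\d{3}/ : qr/\d{3}-\d{2}-\d{5}/;
            },
        },
    );
}

package main;

sub my_re {
    my ($qname, $args) = @_;
    my %args = %{ $args || {} };
    my ($module, $name) = $qname =~ /\A(.+)::(\w+)\z/
        or die "Invalid pattern name '$qname'";

    no strict 'refs';    # symbolic reference to the module's %RE
    my $def = ${"Regexp::Pattern::${module}::RE"}{$name}
        or die "Pattern '$qname' not found";

    # Options start with a dash; everything else goes to the generator.
    my $anchor = delete $args{-anchor};
    my $re = $def->{gen} ? $def->{gen}->(%args) : $def->{pat};

    return $re unless $anchor;
    return qr/\A(?:$re)/ if $anchor eq 'left';
    return qr/(?:$re)\z/ if $anchor eq 'right';
    return qr/\A(?:$re)\z/;
}

my $re = my_re('Example::re3', { variant => 'B', -anchor => 1 });
print "matches\n" if '123-45-67890' =~ $re;
```

       Wrapping the pattern in a non-capturing group before adding the
       anchors keeps any alternation inside the original pattern from
       escaping the anchors.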

FAQ
   My pattern is not anchored, but what if I want to test the anchored version?
       You can add "anchor=>1" or "gen_args=>{-anchor=>1}" in the example, for
       example:

           {
               summary => 'PAUSE author ID, or PAUSE ID for short',
               pat => qr/[A-Z][A-Z0-9]{1,8}/,
               description => <<~HERE,
                   I'm not sure whether PAUSE allows digit for the first letter. For safety
                   I'm assuming no.
                   HERE
               examples => [
                   {str=>'PERLANCAR', matches=>1},
                   {str=>'BAD ID', anchor=>1, matches=>0, summary=>"Contains whitespace"},
                   {str=>'NAMETOOLONG', gen_args=>{-anchor=>1}, matches=>0, summary=>"Too long"},
               ],
           }
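
       The reason the anchor matters for these examples can be checked with
       plain Perl. The sketch below uses the same pattern and strings and
       adds the anchors by hand, on the assumption that an anchored test
       wraps the pattern in "\A(?:...)\z" as described for "re()".

```perl
use strict;
use warnings;

my $pat      = qr/[A-Z][A-Z0-9]{1,8}/;
my $anchored = qr/\A(?:$pat)\z/;

for my $str ('PERLANCAR', 'BAD ID', 'NAMETOOLONG') {
    printf "%-12s unanchored=%d anchored=%d\n",
        $str,
        ($str =~ $pat      ? 1 : 0),
        ($str =~ $anchored ? 1 : 0);
}
```

       All three strings match the unanchored pattern, because it is free to
       match a substring such as "BAD". Only "PERLANCAR" matches once the
       pattern is anchored at both ends, which is what the "matches=>0"
       examples rely on.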

HOMEPAGE
       Please visit the project's homepage at
       <https://metacpan.org/release/Regexp-Pattern>.

SOURCE
       Source repository is at
       <https://github.com/perlancar/perl-Regexp-Pattern>.

BUGS
       Please report any bugs or feature requests on the bugtracker website
       <https://rt.cpan.org/Public/Dist/Display.html?Name=Regexp-Pattern>

       When submitting a bug or request, please include a test-file or a patch
       to an existing test-file that illustrates the bug or desired feature.

SEE ALSO
       Regexp::Common. Regexp::Pattern is an alternative to Regexp::Common.
       Regexp::Pattern offers simplicity and lower startup overhead. Instead
       of a magic hash, you retrieve available regexes from normal data
       structure or via the provided "re()" function. Regexp::Pattern also
       provides a hash interface, albeit the hash is not magic.

       Regexp::Common::RegexpPattern, a bridge module to use patterns in
       "Regexp::Pattern::*" modules via Regexp::Common.

       Regexp::Pattern::RegexpCommon, a bridge module to use patterns in
       "Regexp::Common::*" modules via Regexp::Pattern.

       App::RegexpPatternUtils

       If you use Dist::Zilla: Dist::Zilla::Plugin::Regexp::Pattern,
       Pod::Weaver::Plugin::Regexp::Pattern,
       Dist::Zilla::Plugin::AddModule::RegexpCommon::FromRegexpPattern,
       Dist::Zilla::Plugin::AddModule::RegexpPattern::FromRegexpCommon.

       Test::Regexp::Pattern and test-regexp-pattern.

AUTHOR
       perlancar <perlancar@cpan.org>

COPYRIGHT AND LICENSE
       This software is copyright (c) 2020, 2019, 2018, 2016 by
       perlancar@cpan.org.

       This is free software; you can redistribute it and/or modify it under
       the same terms as the Perl 5 programming language system itself.

perl v5.34.0                      2021-07-22                Regexp::Pattern(3)