Regexp::Pattern(3)    User Contributed Perl Documentation   Regexp::Pattern(3)

NAME

       Regexp::Pattern - Convention/framework for modules that contain a
       collection of regexes

SPECIFICATION VERSION

       0.2

VERSION

       This document describes version 0.2.14 of Regexp::Pattern (from Perl
       distribution Regexp-Pattern), released on 2020-04-01.

SYNOPSIS

       Subroutine interface:

        use Regexp::Pattern; # exports re()

        my $re = re('YouTube::video_id');
        say "ID does not look like a YouTube video ID" unless $id =~ /\A$re\z/;

        # a dynamic pattern (generated on-demand) with generator arguments
        my $re2 = re('Example::re3', {variant=>"B"});

       Hash interface (a la Regexp::Common but simpler, with a regular,
       non-magical hash that is only one level deep):

        use Regexp::Pattern 'YouTube::video_id';
        say "ID does not look like a YouTube video ID"
            unless $id =~ /\A$RE{video_id}\z/;

        # more complex example

        use Regexp::Pattern (
            're',                                # we still want the re() function
            'Foo::bar' => (-as => 'qux'),        # the pattern will be in your $RE{qux}
            'YouTube::*',                        # wildcard import
            'Example::re3' => (variant => 'B'),  # supply generator arguments
            'JSON::*' => (-prefix => 'json_'),   # add prefix
            'License::*' => (
              # filtering options
              -has_tag    => 'family:cc',        # only select patterns that have this tag
              -lacks_tag  => 'type:unversioned', # only select patterns that do not have this tag
              -has_tag_matching   => qr/^type:/, # only select patterns that have at least a tag matching this regex
              -lacks_tag_matching => qr/^type:/, # only select patterns that do not have any tags matching this regex

              # other options
              -prefix  => 'pat_',       # add prefix
              -suffix  => '_license',   # add suffix
            ),
        );

DESCRIPTION

       Regexp::Pattern is a convention for organizing reusable regexp patterns
       in modules, as well as a framework that makes it convenient to use
       those patterns in your program.

   Structure of an example Regexp::Pattern::* module
        package Regexp::Pattern::Example;

        our %RE = (
            # the minimum spec
            re1 => { pat => qr/\d{3}-\d{3}/ },

            # more complete spec
            re2 => {
                summary => 'This is a regexp for blah', # plaintext
                description => <<'_',

        A longer description in *Markdown* format.

        _
                pat => qr/\d{3}-\d{3}(?:-\d{5})?/,
                tags => ['A','B'],
                examples => [
                    # examples can be tested using the 'test-regexp-pattern'
                    # script (distributed in the Test-Regexp-Pattern
                    # distribution). examples can also be rendered in your POD
                    # using Pod::Weaver::Plugin::Regexp::Pattern.
                    {
                        str => '123-456',
                        matches => 1,
                    },
                    {
                        summary => 'Another example that matches',
                        str => '123-456-78901',
                        matches => 1,
                    },
                    {
                        summary => 'An example that does not match',
                        str => '123456',
                        matches => 0,
                    },
                    {
                        summary => 'An example that does not get tested',
                        str => '123456',
                    },
                    {
                        summary => 'Another example that does not get tested nor rendered to POD',
                        str => '234567',
                        matches => 0,
                        test => 0,
                        doc => 0,
                    },
                ],
            },

            # dynamic (regexp generator)
            re3 => {
                summary => 'This is a regexp for blah blah',
                description => <<'_',

        ...

        _
                gen => sub {
                    my %args = @_;
                    my $variant = $args{variant} || 'A';
                    if ($variant eq 'A') {
                        return qr/\d{3}-\d{3}/;
                    } else { # B
                        return qr/\d{3}-\d{2}-\d{5}/;
                    }
                },
                gen_args => {
                    variant => {
                        summary => 'Choose variant',
                        schema => ['str*', in=>['A','B']],
                        default => 'A',
                        req => 1,
                    },
                },
                tags => ['B','C'],
                examples => [
                    {
                        summary => 'An example that matches',
                        gen_args => {variant=>'A'},
                        str => '123-456',
                        matches => 1,
                    },
                    {
                        summary => "An example that doesn't match",
                        gen_args => {variant=>'B'},
                        str => '123-456',
                        matches => 0,
                    },
                ],
            },

            re4 => {
                summary => 'This is a regexp that does capturing',
                # it is recommended that your pattern does not capture, unless
                # necessary. a capturing pattern should be tagged 'capturing'
                # to let users/tools know.
                tags => ['capturing'],
                pat => qr/(\d{3})-(\d{3})/,
                examples => [
                    {str=>'123-456', matches=>[123, 456]},
                    {str=>'foo-bar', matches=>[]},
                ],
            },

            re5 => {
                summary => 'This is another regexp that is anchored and does (named) capturing',
                # it is recommended that your pattern is not anchored for more
                # reusability, unless necessary. an anchored pattern should be
                # tagged 'anchored' to let users/tools know.
                tags => ['capturing', 'anchored'],
                pat => qr/^(?<cap1>\d{3})-(?<cap2>\d{3})/,
                examples => [
                    {str=>'123-456', matches=>{cap1=>123, cap2=>456}},
                    {str=>'something 123-456', matches=>{}},
                ],
            },
        );

       A Regexp::Pattern::* module must declare a package global hash variable
       named %RE. Hash keys are pattern names and hash values are pattern
       definitions in the form of defhashes (see DefHash).

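       As a rough illustration of that layout, the sketch below walks a %RE
       hash and reports each pattern's name and kind. The package
       Regexp::Pattern::Demo and its two patterns are hypothetical and are
       defined inline only to keep the sketch self-contained.

```perl
use strict;
use warnings;

# Hypothetical pattern module, defined inline for the sake of the example.
package Regexp::Pattern::Demo {
    our %RE = (
        digits => { summary => 'A run of digits', pat => qr/\d+/ },
        word   => {
            summary => 'A word, optionally uppercase only',
            gen     => sub { my %args = @_; $args{upper} ? qr/[A-Z]+/ : qr/[A-Za-z]+/ },
        },
    );
}

# Hash keys are pattern names; hash values are the definition hashes.
my @lines;
for my $name (sort keys %Regexp::Pattern::Demo::RE) {
    my $def  = $Regexp::Pattern::Demo::RE{$name};
    my $kind = $def->{gen} ? 'dynamic' : 'static';
    push @lines, "$name ($kind): $def->{summary}";
}
print "$_\n" for @lines;
```
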
       A pattern name should be a simple identifier that matches this regexp:
       "/\A[A-Za-z_][A-Za-z_0-9]*\z/". The definition for the qualified
       pattern name "Foo::Bar::baz" can then be located in
       %Regexp::Pattern::Foo::Bar::RE under the hash key "baz".

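       The mapping from a qualified name to a package and hash key can be
       sketched as follows. The helper split_qualified_name is hypothetical
       and is not part of Regexp::Pattern; it only applies the naming rule
       stated above.

```perl
use strict;
use warnings;

# Hypothetical helper: split 'Foo::Bar::baz' into the package expected to
# hold the definition and the key to look up in that package's %RE.
sub split_qualified_name {
    my ($qname) = @_;
    my ($mod, $pat) = $qname =~ /\A(.+)::([^:]+)\z/
        or die "Invalid qualified pattern name '$qname'\n";
    $pat =~ /\A[A-Za-z_][A-Za-z_0-9]*\z/
        or die "Invalid pattern name '$pat'\n";
    return ("Regexp::Pattern::$mod", $pat);
}

my ($module, $key) = split_qualified_name('Foo::Bar::baz');
print "$module -> \$RE{$key}\n";
```
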
       A pattern definition hash should at the minimum be:

        { pat => qr/.../ }

       You can add more properties from the defhash specification, e.g.
       summary, description, tags, and so on. For example (taken from
       Regexp::Pattern::CPAN):

        {
            summary     => 'PAUSE author ID, or PAUSE ID for short',
            pat         => qr/[A-Z][A-Z0-9]{1,8}/,
            description => <<~HERE,
            I'm not sure whether PAUSE allows digit for the first letter. For safety
            I'm assuming no.
            HERE
            examples => [
                {str=>'PERLANCAR', matches=>1},
                {str=>'BAD ID', anchor=>1, matches=>0},
            ],
        }

       Examples. Your regexp specification can include an "examples" property
       (see above for example). The value of the "examples" property is an
       array, each element of which should be a defhash. For each example, at
       the minimum you should specify "str" (string to be matched by the
       regexp) and "matches" (a boolean value that specifies whether the
       regexp should match the string or not, or an array/hash that specifies
       the captures); for a dynamic pattern, also specify "gen_args" (hash,
       arguments to use when generating the regexp pattern). You can of course
       specify other defhash properties (e.g. "summary", "description", etc).
       Other example properties might be introduced in the future.

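       To make the meaning of "str" and "matches" concrete, here is a minimal
       sketch of checking examples against a pattern. The function
       check_example is hypothetical and is not how test-regexp-pattern is
       implemented; it handles only boolean and array values of "matches" and
       ignores properties such as "anchor".

```perl
use strict;
use warnings;

# Pattern taken from the re4 entry in the example module above.
my $def = {
    pat      => qr/(\d{3})-(\d{3})/,
    examples => [
        { str => '123-456', matches => [123, 456] },
        { str => 'foo-bar', matches => [] },
        { str => '999-000', matches => 1 },
        { str => 'untested' },    # no "matches" key: not checked
    ],
};

# Hypothetical checker: 1 if the example holds, 0 if it does not, undef if
# the example has no "matches" key.
sub check_example {
    my ($re, $eg) = @_;
    return undef unless exists $eg->{matches};
    my @got  = $eg->{str} =~ $re;    # captures, or empty list on failure
    my $want = $eg->{matches};
    if (ref $want eq 'ARRAY') {
        return 0 unless @got == @$want;
        $got[$_] eq $want->[$_] or return 0 for 0 .. $#got;
        return 1;
    }
    my $matched = @got ? 1 : 0;
    return $matched == ($want ? 1 : 0) ? 1 : 0;
}

my @results = map { check_example($def->{pat}, $_) } @{ $def->{examples} };
print join(', ', map { defined $_ ? $_ : 'skipped' } @results), "\n";
```
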
       If you use Dist::Zilla to build your distribution, you can use the
       plugin [Regexp::Pattern] to test the examples during building, and the
       Pod::Weaver plugin [-Regexp::Pattern] to render the examples in your
       POD.

   Using a Regexp::Pattern::* module
       Standalone

       A Regexp::Pattern::* module can be used standalone (i.e. without going
       through the Regexp::Pattern framework), as it simply contains data that
       can be accessed by normal means, e.g.:

        use Regexp::Pattern::Example;

        say "Input does not match blah"
            unless $input =~ /\A$Regexp::Pattern::Example::RE{re1}{pat}\z/;

       Via Regexp::Pattern, sub interface

       Regexp::Pattern (this module) also provides the re() function to help
       retrieve a regexp pattern. See "re" for more details.

       Via Regexp::Pattern, hash interface

       Additionally, Regexp::Pattern (since v0.2.0) lets you import regexp
       patterns into your %RE package hash variable, a la Regexp::Common (but
       simpler because the hash is just a regular hash, only 1-level deep, and
       not magical).

       To import, you specify qualified pattern names as the import arguments:

        use Regexp::Pattern 'Q::pat1', 'Q::pat2', ...;

       Each qualified pattern name can optionally be followed by a list of
       name-value pairs. A pair name can be an option name (which is a dash
       followed by a word, e.g. "-as", "-prefix") or a generator argument
       name for a dynamic pattern.

       Wildcard import. Instead of a qualified pattern name, you can use
       'Module::SubModule::*' wildcard syntax to import all patterns from a
       pattern module.

       Importing into a different name. You can add the import option "-as" to
       import into a different name, for example:

        use Regexp::Pattern 'YouTube::video_id' => (-as => 'yt_id');

       Prefix and suffix. You can also add a prefix and/or suffix to the
       imported name:

        use Regexp::Pattern 'Example::*' => (-prefix => 'example_');
        use Regexp::Pattern 'Example::*' => (-suffix => '_sample');

       Filtering. When wildcard-importing, you can select the patterns you
       want using a combination of these options: "-has_tag" (only select
       patterns that have a specified tag), "-lacks_tag" (only select patterns
       that do not have a specified tag), "-has_tag_matching" (only select
       patterns that have at least one tag matching a specified regex
       pattern), "-lacks_tag_matching" (only select patterns that do not have
       any tags matching a specified regex pattern).

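       The selection rules can be sketched in plain Perl. This is not
       Regexp::Pattern's own import code: the helper select_patterns and the
       three pattern definitions are hypothetical, and the sketch assumes that
       a pattern must pass every filter that is given.

```perl
use strict;
use warnings;

# Made-up definitions; only the tags matter here.
my %RE = (
    cc_by_4 => { pat => qr/CC-BY-4\.0/, tags => ['family:cc', 'type:versioned'] },
    cc_by   => { pat => qr/CC-BY/,      tags => ['family:cc', 'type:unversioned'] },
    mit     => { pat => qr/MIT/,        tags => ['family:mit'] },
);

# Hypothetical helper: sorted names of the patterns that pass every filter.
sub select_patterns {
    my ($defs, %opt) = @_;
    my ($has, $lacks)       = ($opt{'-has_tag'}, $opt{'-lacks_tag'});
    my ($has_re, $lacks_re) = ($opt{'-has_tag_matching'}, $opt{'-lacks_tag_matching'});
    my @names;
    for my $name (sort keys %$defs) {
        my @tags = @{ $defs->{$name}{tags} || [] };
        next if defined $has      && !grep { $_ eq $has } @tags;
        next if defined $lacks    &&  grep { $_ eq $lacks } @tags;
        next if defined $has_re   && !grep { $_ =~ $has_re } @tags;
        next if defined $lacks_re &&  grep { $_ =~ $lacks_re } @tags;
        push @names, $name;
    }
    return @names;
}

my @cc_versioned = select_patterns(\%RE,
    -has_tag => 'family:cc', -lacks_tag => 'type:unversioned');
my @untyped = select_patterns(\%RE, -lacks_tag_matching => qr/^type:/);
print "@cc_versioned | @untyped\n";
```
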
   Recommendations for writing the regex patterns
       •   A regexp pattern should in general be written as a "qr//" literal
           instead of a string

           That is:

            pat => qr/foo[abc]+/,

           is preferred over:

            pat => 'foo[abc]+',

           Using a string literal is less desirable because of the lack of
           compile-time checking. An exception to this rule is when you want
           to delay regex compilation for some reason, e.g. you want your
           users to compile the patterns themselves using a different regex
           engine (see "re::engine::*" modules on CPAN).

       •   A regexp pattern should not be anchored (unless really necessary)

           That is:

            pat => qr/foo/,

           is preferred over:

            pat => qr/^foo/, # or qr/foo$/, or qr/\Afoo\z/

           Adding anchors limits the reusability of the pattern. When
           composing a pattern, users can add anchors themselves if needed.

           When you define an anchored pattern, adding tag "anchored" is
           recommended:

            tags => ['anchored'],

       •   A regexp pattern should not contain capture groups (unless really
           necessary)

           Adding capture groups limits the reusability of the pattern because
           it can affect the groups of the composed pattern. When composing a
           pattern, users can add captures themselves if needed.

           When you define a capturing pattern, adding tag "capturing" is
           recommended:

            tags => ['capturing'],

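       A small illustration of the capture-group recommendation, using
       patterns like re1 and re4 from the example module above. The variable
       names are only illustrative.

```perl
use strict;
use warnings;

my $capturing     = qr/(\d{3})-(\d{3})/;    # like re4: two capture groups
my $non_capturing = qr/\d{3}-\d{3}/;        # like re1: no capture groups

# The composer wants exactly one capture: the label before the number.
my ($label_a, @extra) = 'id 123-456' =~ /\A(\w+) $capturing\z/;
my ($label_b, @none)  = 'id 123-456' =~ /\A(\w+) $non_capturing\z/;

# With the non-capturing pattern, the composer chooses where captures go.
my ($label_c, $number) = 'id 123-456' =~ /\A(\w+) ($non_capturing)\z/;

print "embedded captures leaked: @extra\n";
print "composer's own capture: $number\n";
```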

FUNCTIONS

   re
       Exported by default. Get a regexp pattern by name from a
       "Regexp::Pattern::*" module.

       Usage:

        re($name[, \%args ]) => $re

       $name is MODULE_NAME::PATTERN_NAME where MODULE_NAME is the name of a
       "Regexp::Pattern::*" module without the "Regexp::Pattern::" prefix and
       PATTERN_NAME is a key in the %RE package global hash in the module. A
       dynamic pattern can accept arguments for its generator, and you can
       pass them as a hashref in the second argument of re().

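       A minimal sketch of what generator arguments amount to, assuming (as
       the "my %args = @_" line in the re3 example suggests) that the
       generator receives its arguments as name/value pairs. The generator is
       copied from re3 in the example module above and is called directly
       here rather than through re().

```perl
use strict;
use warnings;

# Generator copied from the re3 entry in the example module.
my %RE = (
    re3 => {
        gen => sub {
            my %args = @_;
            my $variant = $args{variant} || 'A';
            if ($variant eq 'A') {
                return qr/\d{3}-\d{3}/;
            } else { # B
                return qr/\d{3}-\d{2}-\d{5}/;
            }
        },
    },
);

my $re_a = $RE{re3}{gen}->();                  # falls back to variant A
my $re_b = $RE{re3}{gen}->(variant => 'B');

my $a_ok = '123-456'      =~ /\A$re_a\z/ ? 1 : 0;
my $b_ok = '123-45-67890' =~ /\A$re_b\z/ ? 1 : 0;
my $b_no = '123-456'      =~ /\A$re_b\z/ ? 1 : 0;
print "$a_ok $b_ok $b_no\n";    # prints "1 1 0"
```
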
       Anchoring. You can also put "-anchor => 1" in %args. This will
       conveniently wrap the regex inside "qr/\A(?:...)\z/". To only add left
       anchoring, specify "-anchor => 'left'" ("qr/\A(?:...)/"). To only add
       right anchoring, specify "-anchor => 'right'" ("qr/(?:...)\z/").

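       The three wrappings can be reproduced in plain Perl. The helper
       anchor_re is hypothetical and is not Regexp::Pattern's own code; it
       only applies the wrappings quoted above to the PAUSE ID pattern shown
       earlier.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the three documented wrappings.
sub anchor_re {
    my ($re, $mode) = @_;
    return $mode eq 'left'  ? qr/\A(?:$re)/
         : $mode eq 'right' ? qr/(?:$re)\z/
         :                    qr/\A(?:$re)\z/;
}

my $id    = qr/[A-Z][A-Z0-9]{1,8}/;    # PAUSE ID pattern
my $both  = anchor_re($id, 1);
my $left  = anchor_re($id, 'left');
my $right = anchor_re($id, 'right');

# Unanchored, the pattern finds "BAD" inside "BAD ID"; anchored on both
# sides, the embedded space makes the match fail.
my $loose  = 'BAD ID' =~ $id   ? 1 : 0;
my $strict = 'BAD ID' =~ $both ? 1 : 0;
print "unanchored=$loose anchored=$strict\n";
```
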
       Dies when the pattern named $name cannot be found (either the module
       cannot be loaded or the pattern with that name is not found in the
       module).

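       Because re() dies on an unknown name, code that takes pattern names
       from outside may want to trap the failure. A minimal sketch: the name
       'NoSuch::Module::pat' is made up, and the same branch is taken if
       Regexp::Pattern itself is not installed.

```perl
use strict;
use warnings;

# eval {} turns the exception into an undef return value plus a message in $@.
my $re = eval {
    require Regexp::Pattern;
    Regexp::Pattern::re('NoSuch::Module::pat');
};
my $error = $@;

if (!defined $re) {
    print "pattern unavailable: $error";
}
```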

FAQ

   My pattern is not anchored, but what if I want to test the anchored
       version?
       You can add "anchor=>1" or "gen_args=>{-anchor=>1}" in the example, for
       example:

        {
            summary     => 'PAUSE author ID, or PAUSE ID for short',
            pat         => qr/[A-Z][A-Z0-9]{1,8}/,
            description => <<~HERE,
            I'm not sure whether PAUSE allows digit for the first letter. For safety
            I'm assuming no.
            HERE
            examples => [
                {str=>'PERLANCAR', matches=>1},
                {str=>'BAD ID', anchor=>1, matches=>0, summary=>"Contains whitespace"},
                {str=>'NAMETOOLONG', gen_args=>{-anchor=>1}, matches=>0, summary=>"Too long"},
            ],
        }

HOMEPAGE

       Please visit the project's homepage at
       <https://metacpan.org/release/Regexp-Pattern>.

SOURCE

       Source repository is at
       <https://github.com/perlancar/perl-Regexp-Pattern>.

BUGS

       Please report any bugs or feature requests on the bugtracker website
       <https://rt.cpan.org/Public/Dist/Display.html?Name=Regexp-Pattern>

       When submitting a bug or request, please include a test-file or a patch
       to an existing test-file that illustrates the bug or desired feature.

SEE ALSO

       Regexp::Common. Regexp::Pattern is an alternative to Regexp::Common.
       Regexp::Pattern offers simplicity and lower startup overhead. Instead
       of a magic hash, you retrieve available regexes from normal data
       structure or via the provided re() function. Regexp::Pattern also
       provides a hash interface, albeit the hash is not magic.

       Regexp::Common::RegexpPattern, a bridge module to use patterns in
       "Regexp::Pattern::*" modules via Regexp::Common.

       Regexp::Pattern::RegexpCommon, a bridge module to use patterns in
       "Regexp::Common::*" modules via Regexp::Pattern.

       App::RegexpPatternUtils

       If you use Dist::Zilla: Dist::Zilla::Plugin::Regexp::Pattern,
       Pod::Weaver::Plugin::Regexp::Pattern,
       Dist::Zilla::Plugin::AddModule::RegexpCommon::FromRegexpPattern,
       Dist::Zilla::Plugin::AddModule::RegexpPattern::FromRegexpCommon.

       Test::Regexp::Pattern and test-regexp-pattern.

AUTHOR

       perlancar <perlancar@cpan.org>

COPYRIGHT AND LICENSE

       This software is copyright (c) 2020, 2019, 2018, 2016 by
       perlancar@cpan.org.

       This is free software; you can redistribute it and/or modify it under
       the same terms as the Perl 5 programming language system itself.

perl v5.36.0                      2023-01-20                Regexp::Pattern(3)