Regexp::Common(3)     User Contributed Perl Documentation    Regexp::Common(3)

NAME

       Regexp::Common - Provide commonly requested regular expressions

SYNOPSIS

        # STANDARD USAGE

        use Regexp::Common;

        while (<>) {
            /$RE{num}{real}/                and print q{a number};
            /$RE{quoted}/                   and print q{a ['"`] quoted string};
            m{$RE{delimited}{-delim=>'/'}}  and print q{a /.../ sequence};
            /$RE{balanced}{-parens=>'()'}/  and print q{balanced parentheses};
            /$RE{profanity}/                and print q{a #*@%-ing word};
        }


        # SUBROUTINE-BASED INTERFACE

        use Regexp::Common 'RE_ALL';

        while (<>) {
            $_ =~ RE_num_real()              and print q{a number};
            $_ =~ RE_quoted()                and print q{a ['"`] quoted string};
            $_ =~ RE_delimited(-delim=>'/')  and print q{a /.../ sequence};
            $_ =~ RE_balanced(-parens=>'()') and print q{balanced parentheses};
            $_ =~ RE_profanity()             and print q{a #*@%-ing word};
        }


        # IN-LINE MATCHING...

        if ( $RE{num}{int}->matches($text) ) {...}


        # ...AND SUBSTITUTION

        my $cropped = $RE{ws}{crop}->subs($uncropped);


        # ROLL-YOUR-OWN PATTERNS

        use Regexp::Common 'pattern';

        pattern name   => ['name', 'mine'],
                create => '(?i:J[.]?\s+A[.]?\s+Perl-Hacker)',
                ;

        my $name_matcher = $RE{name}{mine};

        pattern name    => [ 'lineof', '-char=_' ],
                create  => sub {
                               # Called with the pattern object, then the flags.
                               my ($self, $flags) = @_;
                               my $char = quotemeta $flags->{-char};
                               # Double quotes so that $char interpolates.
                               return "(?:^$char+\$)";
                           },
                matches => sub {
                               my ($self, $str) = @_;
                               return $str !~ /[^$self->{flags}{-char}]/;
                           },
                subs    => sub {
                               my ($self, $str, $replacement) = @_;
                               $_[1] =~ s/^$self->{flags}{-char}+$//g;
                           },
                ;

        my $asterisks = $RE{lineof}{-char=>'*'};

        # DECIDING WHICH PATTERNS TO LOAD.

        use Regexp::Common qw /comment number/;  # Comment and number patterns.
        use Regexp::Common qw /no_defaults/;     # Don't load any patterns.
        use Regexp::Common qw /!delimited/;      # All, but delimited patterns.

DESCRIPTION

       By default, this module exports a single hash (%RE) that stores or
       generates commonly needed regular expressions (see "List of available
       patterns").

       There is an alternative, subroutine-based syntax described in
       "Subroutine-based interface".

   General syntax for requesting patterns
       To access a particular pattern, %RE is treated as a hierarchical hash
       of hashes (of hashes...), with each successive key being an identifier.
       For example, to access the pattern that matches real numbers, you
       specify:

               $RE{num}{real}

       and to access the pattern that matches integers:

               $RE{num}{int}

       Deeper layers of the hash are used to specify flags: arguments that
       modify the resulting pattern in some way. The keys used to access these
       layers are prefixed with a minus sign and may have a value; if a value
       is given, it's done by using a multidimensional key.  For example, to
       access the pattern that matches base-2 real numbers with embedded
       commas separating groups of three digits (e.g. 10,101,110.110101101):

               $RE{num}{real}{-base => 2}{-sep => ','}{-group => 3}

       Through the magic of Perl, these flag layers may be specified in any
       order (and even interspersed through the identifier keys!)  so you
       could get the same pattern with:

               $RE{num}{real}{-sep => ','}{-group => 3}{-base => 2}

       or:

               $RE{num}{-base => 2}{real}{-group => 3}{-sep => ','}

       or even:

               $RE{-base => 2}{-group => 3}{-sep => ','}{num}{real}

       etc.

       Note, however, that the relative order of the identifier keys is
       significant. That is:

               $RE{list}{set}

       would not be the same as:

               $RE{set}{list}

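       The ordering rules above can be checked with a short sketch. It
       assumes Regexp::Common is installed; the variable names are
       illustrative only. It stringifies the same pattern requested with
       three different flag orders, then tries a request with the identifier
       keys swapped, which is expected to fail when the pattern is
       stringified.

```perl
use strict;
use warnings;
use Regexp::Common;    # assumption: installed from CPAN

# Same identifier keys (num, real); flags given in three different orders.
my @requests = (
    $RE{num}{real}{-base => 2}{-sep => ','}{-group => 3},
    $RE{num}{-base => 2}{real}{-group => 3}{-sep => ','},
    $RE{-base => 2}{-group => 3}{-sep => ','}{num}{real},
);

# Stringify each object to obtain the generated pattern text.
my @patterns  = map { "$_" } @requests;
my $differing = grep { $_ ne $patterns[0] } @patterns;
print $differing ? "patterns differ\n" : "all three patterns are identical\n";

# Swapping the identifier keys names a different (here: nonexistent) pattern.
my $swapped_ok = eval { my $p = '' . $RE{real}{num}; 1 };
print $swapped_ok ? "swapped keys worked\n" : "swapped keys failed: $@";
```
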
   Flag syntax
       In versions prior to 2.113, flags could also be written as
       "{"-flag=value"}". This no longer works, although "{"-flag$;value"}"
       still does. However, "{-flag => 'value'}" is the preferred syntax.

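       The reason "{"-flag$;value"}" and "{-flag => 'value'}" are
       interchangeable is plain Perl behaviour rather than anything specific
       to this module: a list inside a hash subscript is joined with the
       subscript separator $; to form a single key. A minimal sketch using
       an ordinary hash (%h is a stand-in, not part of Regexp::Common):

```perl
use strict;
use warnings;

my %h;
$h{-flag => 'value'} = 1;    # list subscript: elements are joined with $;

my ($key) = keys %h;
my $same  = $key eq join($;, '-flag', 'value');
print $same ? "same key\n" : "different key\n";          # prints "same key"

# The pre-2.113 spelling is simply a different, ordinary string key.
my $old_style = exists $h{'-flag=value'};
print $old_style ? "found\n" : "not found\n";            # prints "not found"
```
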
   Universal flags
       Normally, flags are specific to a single pattern.  However, there are
       two flags that all patterns may specify.

       "-keep"
           By default, the patterns provided by %RE contain no capturing
           parentheses. However, if the "-keep" flag is specified (it requires
           no value) then any significant substrings that the pattern matches
           are captured. For example:

                   if ($str =~ $RE{num}{real}{-keep}) {
                           $number   = $1;
                           $whole    = $3;
                           $decimals = $5;
                   }

           Special care is needed if a "kept" pattern is interpolated into a
           larger regular expression, as the presence of other capturing
           parentheses is likely to change the "number variables" into which
           significant substrings are saved.

           See also "Adding new regular expressions", which describes how to
           create new patterns with "optional" capturing brackets that respond
           to "-keep".

       "-i"
           Some patterns or subpatterns only match lowercase or uppercase
           letters.  If one wants to do case insensitive matching, one option
           is to use the "/i" regexp modifier, or the special sequence "(?i)".
           But if the functional interface is used, one does not have this
           option. The "-i" switch solves this problem; by using it, the
           pattern will do case insensitive matching.

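       The caveat about interpolating a "kept" pattern into a larger regular
       expression can be illustrated as follows. This is a sketch that
       assumes Regexp::Common is installed; the input string and variable
       names are made up for the example. Used alone, the whole number lands
       in $1; once another capturing group precedes it, the same substring
       moves to $2.

```perl
use strict;
use warnings;
use Regexp::Common;    # assumption: installed from CPAN

my $line = 'width=3.14';

# Used on its own: the entire number is captured in $1.
my $alone;
if ($line =~ /$RE{num}{real}{-keep}/) {
    $alone = $1;
}

# Interpolated after another capturing group: the number shifts to $2.
my ($label, $number);
if ($line =~ /(\w+)=$RE{num}{real}{-keep}/) {
    ($label, $number) = ($1, $2);
}

print "alone:  $alone\n";
print "label:  $label\n";
print "number: $number\n";
```
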
   OO interface and inline matching/substitution
       The patterns returned from %RE are objects, so rather than writing:

               if ($str =~ /$RE{some}{pattern}/ ) {...}

       you can write:

               if ( $RE{some}{pattern}->matches($str) ) {...}

       For matching this would seem to have no great advantage apart from
       readability (but see below).

       For substitutions, it has other significant benefits. Frequently you
       want to perform a substitution on a string without changing the
       original. Most people use this:

               $changed = $original;
               $changed =~ s/$RE{some}{pattern}/$replacement/;

       The more adept use:

               ($changed = $original) =~ s/$RE{some}{pattern}/$replacement/;

       Regexp::Common allows you to write this:

               $changed = $RE{some}{pattern}->subs($original=>$replacement);

       Apart from reducing precedence-angst, this approach has the added
       advantages that the substitution behaviour can be optimized from the
       regular expression, and the replacement string can be provided by
       default (see "Adding new regular expressions").

       For example, in the implementation of this substitution:

               $cropped = $RE{ws}{crop}->subs($uncropped);

       the default empty string is provided automatically, and the
       substitution is optimized to use:

               $uncropped =~ s/^\s+//;
               $uncropped =~ s/\s+$//;

       rather than:

               $uncropped =~ s/^\s+|\s+$//g;

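       The two cropping strategies just shown produce the same string, which
       a few lines of plain Perl can confirm. The sketch below does not use
       Regexp::Common at all; it only combines the copy-then-substitute
       idiom with both forms of the whitespace substitution. The sample
       string is an assumption made for the example.

```perl
use strict;
use warnings;

my $uncropped = "  \t hello world \n";

# Two anchored passes, applied to a copy so the original is untouched.
(my $two_pass = $uncropped) =~ s/^\s+//;
$two_pass =~ s/\s+$//;

# One pass using alternation and /g, again on a copy.
(my $one_pass = $uncropped) =~ s/^\s+|\s+$//g;

print "[$two_pass]\n";    # prints "[hello world]"
print "[$one_pass]\n";    # prints "[hello world]"
```
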
   Subroutine-based interface
       The hash-based interface was chosen because it allows regexes to be
       effortlessly interpolated, and because it also allows them to be
       "curried". For example:

               my $num = $RE{num}{int};

               my $commad     = $num->{-sep=>','}{-group=>3};
               my $duodecimal = $num->{-base=>12};

       However, the use of tied hashes does make the access to Regexp::Common
       patterns slower than it might otherwise be. In contexts where
       impatience overrules laziness, Regexp::Common provides an additional
       subroutine-based interface.

       For each (sub-)entry in the %RE hash ($RE{key1}{key2}{etc}), there is a
       corresponding exportable subroutine: "RE_key1_key2_etc()". The name of
       each subroutine is the underscore-separated concatenation of the
       non-flag keys that locate the same pattern in %RE. Flags are passed to
       the subroutine in its argument list. Thus:

               use Regexp::Common qw( RE_ws_crop RE_num_real RE_profanity );

               $str =~ RE_ws_crop() and die "Surrounded by whitespace";

               $str =~ RE_num_real(-base=>8, -sep=>" ") or next;

               $offensive = RE_profanity(-keep);
               $str =~ s/$offensive/$bad{$1}++; "<expletive deleted>"/ge;

       Note that, unlike the hash-based interface (which returns objects),
       these subroutines return ordinary "qr"'d regular expressions. Hence
       they do not curry, nor do they provide the OO match and substitution
       inlining described in the previous section.

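       Because the subroutines return ordinary compiled regexes, the result
       can be stored once and interpolated like any other "qr" value. A
       small sketch, assuming Regexp::Common is installed (the input string
       and variable names are illustrative):

```perl
use strict;
use warnings;
use Regexp::Common qw(RE_num_int);    # assumption: installed from CPAN

# The subroutine returns a compiled regex, not a Regexp::Common object.
my $int  = RE_num_int();
my $type = ref $int;
print "$type\n";

# It can be interpolated into a larger pattern like any qr// value.
my ($value) = 'count=42' =~ /^count=($int)$/;
print "value: $value\n";
```

       Since the value is not an object of the hash-based interface, calls
       such as "->matches" or further flag subscripts are not available on
       it.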
       It is also possible to export subroutines for all available patterns
       like so:

               use Regexp::Common 'RE_ALL';

       Or you can export all subroutines with a common prefix of keys like so:

               use Regexp::Common 'RE_num_ALL';

       which will export "RE_num_int" and "RE_num_real" (and if you have
       created more patterns whose first key is num, those will be exported
       as well). In general, RE_key1_..._keyn_ALL will export all subroutines
       whose pattern names have first keys key1 ... keyn.

   Adding new regular expressions
       You can add your own regular expressions to the %RE hash at run-time,
       using the exportable "pattern" subroutine. It expects a hash-like list
       of key/value pairs that specify the behaviour of the pattern. The
       various possible argument pairs are:

       "name => [ @list ]"
           A required argument that specifies the name of the pattern, and any
           flags it may take, via a reference to a list of strings. For
           example:

                    pattern name => [qw( line of -char )],
                            # other args here
                            ;

           This specifies an entry $RE{line}{of}, which may take a "-char"
           flag.

           Flags may also be specified with a default value, which is then
           used whenever the flag is specified without an explicit value (but
           not when the flag is omitted). For example:

                    pattern name => [qw( line of -char=_ )],
                            # default char is '_'
                            # other args here
                            ;

       "create => $sub_ref_or_string"
           A required argument that specifies either a string that is to be
           returned as the pattern:

                   pattern name    => [qw( line of underscores )],
                           create  => q/(?:^_+$)/
                           ;

           or a reference to a subroutine that will be called to create the
           pattern:

                   pattern name    => [qw( line of -char=_ )],
                           create  => sub {
                                           my ($self, $flags) = @_;
                                           my $char = quotemeta $flags->{-char};
                                           # Double quotes so $char interpolates.
                                           return "(?:^$char+\$)";
                                       },
                           ;

           If the subroutine version is used, the subroutine will be called
           with three arguments: a reference to the pattern object itself, a
           reference to a hash containing the flags and their values, and a
           reference to an array containing the non-flag keys.

           Whatever the subroutine returns is stringified as the pattern.

           No matter how the pattern is created, it is immediately
           postprocessed to include or exclude capturing parentheses
           (according to the value of the "-keep" flag). To specify such
           "optional" capturing parentheses within the regular expression
           associated with "create", use the notation "(?k:...)". Any
           parentheses of this type will be converted to "(...)"  when the
           "-keep" flag is specified, or "(?:...)" when it is not.  It is a
           Regexp::Common convention that the outermost capturing parentheses
           always capture the entire pattern, but this is not enforced.

       "matches => $sub_ref"
           An optional argument that specifies a subroutine that is to be
           called when the "$RE{...}->matches(...)" method of this pattern is
           invoked.

           The subroutine should expect two arguments: a reference to the
           pattern object itself, and the string to be matched against.

           It should return the same types of values as a "m/.../" does.

                pattern name    => [qw( line of -char )],
                        create  => sub {...},
                        matches => sub {
                                        my ($self, $str) = @_;
                                        $str !~ /[^$self->{flags}{-char}]/;
                                   },
                        ;

       "subs => $sub_ref"
           An optional argument that specifies a subroutine that is to be
           called when the "$RE{...}->subs(...)" method of this pattern is
           invoked.

           The subroutine should expect three arguments: a reference to the
           pattern object itself, the string to be changed, and the value to
           be substituted into it.  The third argument may be "undef",
           indicating the default substitution is required.

           The subroutine should return the same types of values as an
           "s/.../.../" does.

           For example:

                pattern name    => [ 'lineof', '-char=_' ],
                        create  => sub {...},
                        subs    => sub {
                                     my ($self, $str, $ignore_replacement) = @_;
                                     $_[1] =~ s/^$self->{flags}{-char}+$//g;
                                   },
                        ;

           Note that such a subroutine will almost always need to modify $_[1]
           directly.

       "version => $minimum_perl_version"
           If this argument is given, it specifies the minimum version of perl
           required to use the new pattern. Attempts to use the pattern with
           earlier versions of perl will generate a fatal diagnostic.

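       The arguments described above can be combined into a small working
       definition. The following is a sketch, assuming Regexp::Common is
       installed; the pattern name "line of" and the test string are
       illustrative. It uses the "(?k:...)" notation so that the same
       definition yields a capturing or a non-capturing pattern depending
       on whether "-keep" is given, and it passes the "-char" flag
       explicitly rather than relying on the default.

```perl
use strict;
use warnings;
use Regexp::Common qw(pattern no_defaults);    # assumption: installed from CPAN

pattern name   => [qw( line of -char=_ )],
        create => sub {
            my ($self, $flags) = @_;
            my $char = quotemeta $flags->{-char};
            # (?k:...) becomes (...) under -keep and (?:...) otherwise.
            return "(?k:^$char+\$)";
        },
        ;

# Without -keep there are no capture groups, so a successful list-context
# match returns the single value 1.
my @plain = '****' =~ $RE{line}{of}{-char => '*'};

# With -keep the outermost (?k:...) captures the entire match.
my @kept = '****' =~ $RE{line}{of}{-char => '*'}{-keep};

print "plain: @plain\n";
print "kept:  @kept\n";
```
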
   Loading specific sets of patterns.
       By default, all the sets of patterns listed below are made available.
       However, it is possible to indicate which sets of patterns should be
       made available - the wanted sets should be given as arguments to "use".
       Alternatively, it is also possible to indicate which sets of patterns
       should not be made available - those sets are given as arguments to
       the "use" statement, but are preceded by an exclamation mark. The
       argument no_defaults indicates none of the default patterns should be
       made available. This is useful for instance if all you want is the
       "pattern()" subroutine.

       Examples:

        use Regexp::Common qw /comment number/;  # Comment and number patterns.
        use Regexp::Common qw /no_defaults/;     # Don't load any patterns.
        use Regexp::Common qw /!delimited/;      # All, but delimited patterns.

       It's also possible to load your own set of patterns. If you have a
       module "Regexp::Common::my_patterns" that makes patterns available, you
       can have it made available with

        use Regexp::Common qw /my_patterns/;

       Note that the default patterns will still be made available. The
       unmentioned defaults are left out only if you use no_defaults or
       mention one of the default sets explicitly.

   List of available patterns
       The patterns listed below are currently available. Each set of patterns
       has its own manual page describing the details. For each pattern set
       named name, the manual page Regexp::Common::name describes the details.

       Currently available are:

       Regexp::Common::balanced
           Provides regexes for strings with balanced parenthesized
           delimiters.

       Regexp::Common::comment
           Provides regexes for comments of various languages (43 languages
           currently).

       Regexp::Common::delimited
           Provides regexes for delimited strings.

       Regexp::Common::lingua
           Provides regexes for palindromes.

       Regexp::Common::list
           Provides regexes for lists.

       Regexp::Common::net
           Provides regexes for IPv4 addresses and MAC addresses.

       Regexp::Common::number
           Provides regexes for numbers (integers and reals).

       Regexp::Common::profanity
           Provides regexes for profanity.

       Regexp::Common::whitespace
           Provides regexes for leading and trailing whitespace.

       Regexp::Common::zip
           Provides regexes for zip codes.

   Forthcoming patterns and features
       Future releases of the module will also provide patterns for the
       following:

               * email addresses
               * HTML/XML tags
               * more numerical matchers
               * mail headers (including multiline ones)
               * more URLs
               * telephone numbers of various countries
               * currency (universal 3 letter format, Latin-1, currency names)
               * dates
               * binary formats (e.g. UUencoded, MIMEd)

       If you have other patterns or pattern generators that you think would
       be generally useful, please send them to the maintainer -- preferably
       as source code using the "pattern" subroutine. Submissions that include
       a set of tests will be especially welcome.

DIAGNOSTICS

       "Can't export unknown subroutine %s"
           The subroutine-based interface didn't recognize the requested
           subroutine.  Often caused by a spelling mistake or an incompletely
           specified name.

       "Can't create unknown regex: $RE{...}"
           Regexp::Common doesn't have a generator for the requested pattern.
           Often indicates a misspelt or missing parameter.

       "Perl %f does not support the pattern $RE{...}. You need Perl %f or
       later"
           The requested pattern requires advanced regex features (e.g.
           recursion) that are not available in your version of Perl. Time to
           upgrade.

       "pattern() requires argument: name => [ @list ]"
           Every user-defined pattern specification must have a name.

       "pattern() requires argument: create => $sub_ref_or_string"
           Every user-defined pattern specification must provide a pattern
           creation mechanism: either a pattern string or a reference to a
           subroutine that returns the pattern string.

       "Base must be between 1 and 36"
           The $RE{num}{real}{-base=>'N'} pattern uses the characters
           [0-9A-Z] to represent the digits of various bases. Hence it only
           produces regular expressions for bases up to hexatrigesimal
           (base 36).

       "Must specify delimiter in $RE{delimited}"
           The pattern has no default delimiter.  You need to write:
           $RE{delimited}{-delim=>'X'} for some character X.

ACKNOWLEDGEMENTS

       Deepest thanks to the many people who have encouraged and contributed
       to this project, especially: Elijah, Jarkko, Tom, Nat, Ed, and Vivek.

       Further thanks go to: Alexandr Ciornii, Blair Zajac, Bob Stockdale,
       Charles Thomas, Chris Vertonghen, the CPAN Testers, David Hand, Fany,
       Geoffrey Leach, Hermann-Marcus Behrens, Jerome Quelin, Jim Cromie, Lars
       Wilke, Linda Julien, Mike Arms, Mike Castle, Mikko, Murat Uenalan,
       Rafaël Garcia-Suarez, Ron Savage, Sam Vilain, Slaven Rezic, Smylers,
       Tim Maher, and all the others I've forgotten.

AUTHOR

       Damian Conway (damian@conway.org)

MAINTENANCE

       This package is maintained by Abigail (regexp-common@abigail.be).

BUGS AND IRRITATIONS

       Bound to be plenty.

       For a start, there are many common regexes missing.  Send them in to
       regexp-common@abigail.be.

       There are some POD issues when installing this module using a pre-5.6.0
       perl; some manual pages may not install, or may not install correctly
       using a perl that is that old. You might consider upgrading your perl.

LICENSE and COPYRIGHT

       This software is Copyright (c) 2001 - 2009, Damian Conway and Abigail.

       This module is free software, and may be used under any of the
       following licenses:

        1) The Perl Artistic License.     See the file COPYRIGHT.AL.
        2) The Perl Artistic License 2.0. See the file COPYRIGHT.AL2.
        3) The BSD Licence.               See the file COPYRIGHT.BSD.
        4) The MIT Licence.               See the file COPYRIGHT.MIT.



perl v5.12.0                      2010-01-02                 Regexp::Common(3)