Regexp::Common(3)     User Contributed Perl Documentation    Regexp::Common(3)

NAME

       Regexp::Common - Provide commonly requested regular expressions

SYNOPSIS

        # STANDARD USAGE

        use Regexp::Common;

        while (<>) {
            /$RE{num}{real}/               and print q{a number};
            /$RE{quoted}/                  and print q{a ['"`] quoted string};
            /$RE{delimited}{-delim=>'/'}/  and print q{a /.../ sequence};
            /$RE{balanced}{-parens=>'()'}/ and print q{balanced parentheses};
            /$RE{profanity}/               and print q{a #*@%-ing word};
        }

        # SUBROUTINE-BASED INTERFACE

        use Regexp::Common 'RE_ALL';

        while (<>) {
            $_ =~ RE_num_real()              and print q{a number};
            $_ =~ RE_quoted()                and print q{a ['"`] quoted string};
            $_ =~ RE_delimited(-delim=>'/')  and print q{a /.../ sequence};
            $_ =~ RE_balanced(-parens=>'()') and print q{balanced parentheses};
            $_ =~ RE_profanity()             and print q{a #*@%-ing word};
        }

        # IN-LINE MATCHING...

        if ( $RE{num}{int}->matches($text) ) {...}

        # ...AND SUBSTITUTION

        my $cropped = $RE{ws}{crop}->subs($uncropped);

        # ROLL-YOUR-OWN PATTERNS

        use Regexp::Common 'pattern';

        pattern name   => ['name', 'mine'],
                create => '(?i:J[.]?\s+A[.]?\s+Perl-Hacker)',
                ;

        my $name_matcher = $RE{name}{mine};

        pattern name    => [ 'lineof', '-char=_' ],
                create  => sub {
                               # called as ($self, \%flags, \@non_flag_keys)
                               my ($self, $flags) = @_;
                               my $char = quotemeta $flags->{-char};
                               # double quotes so that $char interpolates
                               return "(?:^$char+\$)";
                           },
                matches => sub {
                               my ($self, $str) = @_;
                               return $str !~ /[^$self->{flags}{-char}]/;
                           },
                subs   => sub {
                               my ($self, $str, $replacement) = @_;
                               $_[1] =~ s/^$self->{flags}{-char}+$//g;
                          },
                ;

        my $asterisks = $RE{lineof}{-char=>'*'};

        # DECIDING WHICH PATTERNS TO LOAD.

        use Regexp::Common qw /comment number/;  # Comment and number patterns.
        use Regexp::Common qw /no_defaults/;     # Don't load any patterns.
        use Regexp::Common qw /!delimited/;      # All, but delimited patterns.

DESCRIPTION

       By default, this module exports a single hash (%RE) that stores or
       generates commonly needed regular expressions (see "List of available
       patterns").

       There is an alternative, subroutine-based syntax described in
       "Subroutine-based interface".

       General syntax for requesting patterns

       To access a particular pattern, %RE is treated as a hierarchical hash
       of hashes (of hashes...), with each successive key being an identifier.
       For example, to access the pattern that matches real numbers, you
       specify:

               $RE{num}{real}

       and to access the pattern that matches integers:

               $RE{num}{int}

       Deeper layers of the hash are used to specify flags: arguments that
       modify the resulting pattern in some way. The keys used to access these
       layers are prefixed with a minus sign and may have a value; if a value
       is given, it's done by using a multidimensional key.  For example, to
       access the pattern that matches base-2 real numbers with embedded
       commas separating groups of three digits (e.g. 10,101,110.110101101):

               $RE{num}{real}{-base => 2}{-sep => ','}{-group => 3}

       Through the magic of Perl, these flag layers may be specified in any
       order (and even interspersed through the identifier keys!), so you
       could get the same pattern with:

               $RE{num}{real}{-sep => ','}{-group => 3}{-base => 2}

       or:

               $RE{num}{-base => 2}{real}{-group => 3}{-sep => ','}

       or even:

               $RE{-base => 2}{-group => 3}{-sep => ','}{num}{real}

       etc.

       Note, however, that the relative order of the identifier keys is
       significant. That is:

               $RE{list}{set}

       would not be the same as:

               $RE{set}{list}

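       As a rough sketch of the flag-ordering rule above, the following
       stringifies the same request with the flags in two different orders
       and compares the results. It assumes Regexp::Common is installed; the
       variable names are illustrative only.

```perl
use strict;
use warnings;
use Regexp::Common qw(number);

# Stringify the same request with the flag layers in two different orders.
my $first  = '' . $RE{num}{real}{-base => 2}{-sep => ','}{-group => 3};
my $second = '' . $RE{-base => 2}{-group => 3}{-sep => ','}{num}{real};

print "flag order does not matter\n" if $first eq $second;

# The example string from the text, matched as a whole.
print "matched\n" if '10,101,110.110101101' =~ /^$first$/;
```

       Concatenating with an empty string forces the pattern object to
       stringify, which is what happens implicitly when it is interpolated
       into a regex.
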
       Flag syntax

       In versions prior to 2.113, flags could also be written as
       "{"-flag=value"}". This no longer works, although "{"-flag$;value"}"
       still does. However, "{-flag => 'value'}" is the preferred syntax.

       Universal flags

       Normally, flags are specific to a single pattern.  However, there are
       two flags that all patterns may specify.

       "-keep"
           By default, the patterns provided by %RE contain no capturing
           parentheses. However, if the "-keep" flag is specified (it requires
           no value) then any significant substrings that the pattern matches
           are captured. For example:

                   if ($str =~ $RE{num}{real}{-keep}) {
                           $number   = $1;
                           $whole    = $3;
                           $decimals = $5;
                   }

           Special care is needed if a "kept" pattern is interpolated into a
           larger regular expression, as the presence of other capturing
           parentheses is likely to change the "number variables" into which
           significant substrings are saved.

           See also "Adding new regular expressions", which describes how to
           create new patterns with "optional" capturing brackets that respond
           to "-keep".

       "-i"
           Some patterns or subpatterns only match lowercase or uppercase
           letters.  If one wants to do case-insensitive matching, one option
           is to use the "/i" regexp modifier, or the special sequence "(?i)".
           But if the functional interface is used, one does not have this
           option. The "-i" switch solves this problem; by using it, the
           pattern will do case-insensitive matching.

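       The caveat about interpolating a "kept" pattern into a larger regular
       expression can be illustrated with a minimal sketch. It assumes
       Regexp::Common is installed and relies only on the convention that the
       outermost group of a kept pattern captures the entire match; the input
       string and variable names are made up for the example.

```perl
use strict;
use warnings;
use Regexp::Common qw(number);

my $real = $RE{num}{real}{-keep};

# Our own (\w+) group comes first, so it takes $1 and every group inside
# the kept pattern moves up by one: the whole number lands in $2.
my ($key, $value) = 'width=3.25' =~ /^(\w+)=$real$/;

print "$key -> $value\n";
```
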
       OO interface and inline matching/substitution

       The patterns returned from %RE are objects, so rather than writing:

               if ($str =~ /$RE{some}{pattern}/ ) {...}

       you can write:

               if ( $RE{some}{pattern}->matches($str) ) {...}

       For matching this would seem to have no great advantage apart from
       readability (but see below).

       For substitutions, it has other significant benefits. Frequently you
       want to perform a substitution on a string without changing the
       original. Most people use this:

               $changed = $original;
               $changed =~ s/$RE{some}{pattern}/$replacement/;

       The more adept use:

               ($changed = $original) =~ s/$RE{some}{pattern}/$replacement/;

       Regexp::Common allows you to write this:

               $changed = $RE{some}{pattern}->subs($original=>$replacement);

       Apart from reducing precedence-angst, this approach has the added
       advantages that the substitution behaviour can be optimized from the
       regular expression, and the replacement string can be provided by
       default (see "Adding new regular expressions").

       For example, in the implementation of this substitution:

               $cropped = $RE{ws}{crop}->subs($uncropped);

       the default empty string is provided automatically, and the
       substitution is optimized to use:

               $uncropped =~ s/^\s+//;
               $uncropped =~ s/\s+$//;

       rather than:

               $uncropped =~ s/^\s+|\s+$//g;

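       A minimal sketch of the OO calls described above, assuming
       Regexp::Common is installed. The sample strings are invented for the
       example; the point is that subs() returns a changed copy and leaves
       its argument alone.

```perl
use strict;
use warnings;
use Regexp::Common qw(whitespace number);

my $uncropped = "   hello world \n";

# subs() works on a copy and returns it; $uncropped itself is untouched.
my $cropped = $RE{ws}{crop}->subs($uncropped);

print "[$cropped]\n";
print "original unchanged\n" if $uncropped eq "   hello world \n";

# matches() is the method form of $str =~ /$RE{num}{int}/.
my $has_int = $RE{num}{int}->matches('order 66') ? 1 : 0;
print "has an integer\n" if $has_int;
```
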
       Subroutine-based interface

       The hash-based interface was chosen because it allows regexes to be
       effortlessly interpolated, and because it also allows them to be
       "curried". For example:

               my $num = $RE{num}{int};

               my $commad     = $num->{-sep=>','}{-group=>3};
               my $duodecimal = $num->{-base=>12};

       However, the use of tied hashes does make the access to Regexp::Common
       patterns slower than it might otherwise be. In contexts where
       impatience overrules laziness, Regexp::Common provides an additional
       subroutine-based interface.

       For each (sub-)entry in the %RE hash ($RE{key1}{key2}{etc}), there is a
       corresponding exportable subroutine: "RE_key1_key2_etc()". The name of
       each subroutine is the underscore-separated concatenation of the
       non-flag keys that locate the same pattern in %RE. Flags are passed to
       the subroutine in its argument list. Thus:

               use Regexp::Common qw( RE_ws_crop RE_num_real RE_profanity );

               $str =~ RE_ws_crop() and die "Surrounded by whitespace";

               $str =~ RE_num_real(-base=>8, -sep=>" ") or next;

               $offensive = RE_profanity(-keep);
               $str =~ s/$offensive/$bad{$1}++; "<expletive deleted>"/ge;

       Note that, unlike the hash-based interface (which returns objects),
       these subroutines return ordinary "qr"'d regular expressions. Hence
       they do not curry, nor do they provide the OO match and substitution
       inlining described in the previous section.

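       The difference between the two interfaces can be seen in a short
       sketch, assuming Regexp::Common is installed. It requests a grouped
       integer pattern through the subroutine interface and inspects what
       comes back; the test string is invented for the example.

```perl
use strict;
use warnings;
use Regexp::Common qw(RE_num_int);

# Flags go in the argument list instead of extra hash layers.
my $grouped = RE_num_int(-sep => ',', -group => 3);

# The result is a plain compiled regex, not a Regexp::Common object,
# so there is nothing to curry and no matches()/subs() methods.
print ref($grouped), "\n";

print "matched\n" if '1,234,567' =~ /^$grouped$/;
```
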
       It is also possible to export subroutines for all available patterns
       like so:

               use Regexp::Common 'RE_ALL';

       Or you can export all subroutines with a common prefix of keys like so:

               use Regexp::Common 'RE_num_ALL';

       which will export "RE_num_int" and "RE_num_real" (and if you have
       created more patterns whose first key is num, those will be exported
       as well). In general, RE_key1_..._keyn_ALL will export all subroutines
       whose pattern names have first keys key1 ... keyn.

       Adding new regular expressions

       You can add your own regular expressions to the %RE hash at run-time,
       using the exportable "pattern" subroutine. It expects a hash-like list
       of key/value pairs that specify the behaviour of the pattern. The
       various possible argument pairs are:

           "name => [ @list ]"
               A required argument that specifies the name of the pattern, and
               any flags it may take, via a reference to a list of strings.
               For example:

                        pattern name => [qw( line of -char )],
                                # other args here
                                ;

               This specifies an entry $RE{line}{of}, which may take a "-char"
               flag.

               Flags may also be specified with a default value, which is then
               used whenever the flag is omitted, or specified without an
               explicit value. For example:

                        pattern name => [qw( line of -char=_ )],
                                # default char is '_'
                                # other args here
                                ;

           "create => $sub_ref_or_string"
               A required argument that specifies either a string that is to
               be returned as the pattern:

                       pattern name    => [qw( line of underscores )],
                               create  => q/(?:^_+$)/
                               ;

               or a reference to a subroutine that will be called to create
               the pattern:

                       pattern name    => [qw( line of -char=_ )],
                               create  => sub {
                                               my ($self, $flags) = @_;
                                               my $char = quotemeta $flags->{-char};
                                               # double quotes so that $char interpolates
                                               return "(?:^$char+\$)";
                                           },
                               ;

               If the subroutine version is used, the subroutine will be
               called with three arguments: a reference to the pattern object
               itself, a reference to a hash containing the flags and their
               values, and a reference to an array containing the non-flag
               keys.

               Whatever the subroutine returns is stringified as the pattern.

               No matter how the pattern is created, it is immediately
               postprocessed to include or exclude capturing parentheses
               (according to the value of the "-keep" flag). To specify such
               "optional" capturing parentheses within the regular expression
               associated with "create", use the notation "(?k:...)". Any
               parentheses of this type will be converted to "(...)"  when the
               "-keep" flag is specified, or "(?:...)" when it is not.  It is
               a Regexp::Common convention that the outermost capturing
               parentheses always capture the entire pattern, but this is not
               enforced.

           "matches => $sub_ref"
               An optional argument that specifies a subroutine that is to be
               called when the "$RE{...}->matches(...)" method of this pattern
               is invoked.

               The subroutine should expect two arguments: a reference to the
               pattern object itself, and the string to be matched against.

               It should return the same types of values as a "m/.../" does.

                    pattern name    => [qw( line of -char )],
                            create  => sub {...},
                            matches => sub {
                                            my ($self, $str) = @_;
                                            $str !~ /[^$self->{flags}{-char}]/;
                                       },
                            ;

           "subs => $sub_ref"
               An optional argument that specifies a subroutine that is to be
               called when the "$RE{...}->subs(...)" method of this pattern is
               invoked.

               The subroutine should expect three arguments: a reference to
               the pattern object itself, the string to be changed, and the
               value to be substituted into it.  The third argument may be
               "undef", indicating the default substitution is required.

               The subroutine should return the same types of values as an
               "s/.../.../" does.

               For example:

                    pattern name    => [ 'lineof', '-char=_' ],
                            create  => sub {...},
                            subs    => sub {
                                         my ($self, $str, $ignore_replacement) = @_;
                                         $_[1] =~ s/^$self->{flags}{-char}+$//g;
                                       },
                            ;

               Note that such a subroutine will almost always need to modify
               $_[1] directly.

           "version => $minimum_perl_version"
               If this argument is given, it specifies the minimum version of
               perl required to use the new pattern. Attempts to use the
               pattern with earlier versions of perl will generate a fatal
               diagnostic.

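           The "name" and "create" arguments and the "(?k:...)" notation
           described above can be sketched as follows, assuming Regexp::Common
           is installed. The pattern $RE{kv}{pair} is hypothetical and exists
           only for this example; it is not part of the distribution.

```perl
use strict;
use warnings;
use Regexp::Common qw(no_defaults pattern);

# Hypothetical pattern $RE{kv}{pair}: a word, '=', then digits.
# The outermost (?k:...) wraps the whole match, following the convention.
pattern name   => [qw( kv pair )],
        create => '(?k:(?k:\w+)=(?k:\d+))',
        ;

my $plain = '' . $RE{kv}{pair};          # each (?k:...) becomes (?:...)
my $keep  = '' . $RE{kv}{pair}{-keep};   # each (?k:...) becomes (...)

print "plain: $plain\n";
print "keep:  $keep\n";

# With -keep, the groups come back in left-to-right order.
my @parts = 'size=42' =~ /^$keep$/;
print join(' | ', @parts), "\n";
```

           Loading with no_defaults keeps the example small: only %RE and the
           "pattern" subroutine are needed here.
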
           Loading specific sets of patterns.

           By default, all the sets of patterns listed below are made
           available.  However, it is possible to indicate which sets of
           patterns should be made available: the wanted sets should be given
           as arguments to "use". Alternatively, it is also possible to
           indicate which sets of patterns should not be made available:
           those sets are given as arguments to the "use" statement, but are
           preceded by an exclamation mark. The argument no_defaults
           indicates that none of the default patterns should be made
           available. This is useful, for instance, if all you want is the
           "pattern()" subroutine.

           Examples:

            use Regexp::Common qw /comment number/;  # Comment and number patterns.
            use Regexp::Common qw /no_defaults/;     # Don't load any patterns.
            use Regexp::Common qw /!delimited/;      # All, but delimited patterns.

           It's also possible to load your own set of patterns. If you have a
           module "Regexp::Common::my_patterns" that makes patterns available,
           you can have it made available with

            use Regexp::Common qw /my_patterns/;

           Note that the default patterns will still be made available. The
           unmentioned defaults are left out only if you use no_defaults or
           mention one of the default sets explicitly.

           List of available patterns

           The patterns listed below are currently available. Each set of
           patterns has its own manual page describing the details. For each
           pattern set named name, the manual page Regexp::Common::name
           describes the details.

           Currently available are:

           Regexp::Common::balanced
               Provides regexes for strings with balanced parenthesized
               delimiters.

           Regexp::Common::comment
               Provides regexes for comments of various languages (43
               languages currently).

           Regexp::Common::delimited
               Provides regexes for delimited strings.

           Regexp::Common::lingua
               Provides regexes for palindromes.

           Regexp::Common::list
               Provides regexes for lists.

           Regexp::Common::net
               Provides regexes for IPv4 addresses and MAC addresses.

           Regexp::Common::number
               Provides regexes for numbers (integers and reals).

           Regexp::Common::profanity
               Provides regexes for profanity.

           Regexp::Common::whitespace
               Provides regexes for leading and trailing whitespace.

           Regexp::Common::zip
               Provides regexes for zip codes.

           Forthcoming patterns and features

           Future releases of the module will also provide patterns for the
           following:

                   * email addresses
                   * HTML/XML tags
                   * more numerical matchers
                   * mail headers (including multiline ones)
                   * more URLs
                   * telephone numbers of various countries
                   * currency (universal 3 letter format, Latin-1, currency names)
                   * dates
                   * binary formats (e.g. UUencoded, MIMEd)

           If you have other patterns or pattern generators that you think
           would be generally useful, please send them to the maintainer --
           preferably as source code using the "pattern" subroutine.
           Submissions that include a set of tests will be especially welcome.

DIAGNOSTICS

       "Can't export unknown subroutine %s"
           The subroutine-based interface didn't recognize the requested
           subroutine.  Often caused by a spelling mistake or an incompletely
           specified name.

       "Can't create unknown regex: $RE{...}"
           Regexp::Common doesn't have a generator for the requested pattern.
           Often indicates a misspelt or missing parameter.

       "Perl %f does not support the pattern $RE{...}. You need Perl %f or
       later"
           The requested pattern requires advanced regex features (e.g.
           recursion) that are not available in your version of Perl. Time to
           upgrade.

       "pattern() requires argument: name => [ @list ]"
           Every user-defined pattern specification must have a name.

       "pattern() requires argument: create => $sub_ref_or_string"
           Every user-defined pattern specification must provide a pattern
           creation mechanism: either a pattern string or a reference to a
           subroutine that returns the pattern string.

       "Base must be between 1 and 36"
           The $RE{num}{real}{-base=>'N'} pattern uses the characters [0-9A-Z]
           to represent the digits of various bases. Hence it only produces
           regular expressions for bases up to 36 (hexatrigesimal).

       "Must specify delimiter in $RE{delimited}"
           The pattern has no default delimiter.  You need to write:
           $RE{delimited}{-delim=>'X'} for some character X

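       As a minimal sketch of avoiding the last diagnostic, the following
       always supplies a "-delim" flag. It assumes Regexp::Common is
       installed; the input string is invented for the example.

```perl
use strict;
use warnings;
use Regexp::Common qw(delimited);

# -delim is mandatory; here the delimiter is a double quote.
my $quoted = $RE{delimited}{-delim => '"'};

# Pull the first double-quoted run, delimiters included, out of a string.
my ($found) = 'say "hi there" twice' =~ /($quoted)/;

print "$found\n";
```
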

ACKNOWLEDGEMENTS

       Deepest thanks to the many people who have encouraged and contributed
       to this project, especially: Elijah, Jarkko, Tom, Nat, Ed, and Vivek.

HISTORY

         $Log: Common.pm,v $
         Revision 2.120  2005/03/16 00:24:45  abigail
         Load Carp only on demand

         Revision 2.119  2005/01/01 16:35:14  abigail
         - Updated copyright notice. New release.

         Revision 2.118  2004/12/14 23:17:57  abigail
         Fixed the generic OO routines.

         Revision 2.117  2004/06/30 15:01:35  abigail
         Pod nits. (Jim Cromie)

         Revision 2.116  2004/06/30 09:37:36  abigail
         New version

         Revision 2.115  2004/06/09 21:58:01  abigail
         - 'SEN'
         - New release.

         Revision 2.114  2003/05/25 21:34:56  abigail
         POD nits from Bryan C. Warnock

         Revision 2.113  2003/04/02 21:23:48  abigail
         Removed anything related to $; being '='

         Revision 2.112  2003/03/25 23:27:27  abigail
         New release

         Revision 2.111  2003/03/12 22:37:13  abigail
         +  The -i switch.
         +  New release.

         Revision 2.110  2003/02/21 14:55:31  abigail
         New release

         Revision 2.109  2003/02/10 21:36:58  abigail
         New release

         Revision 2.108  2003/02/09 21:45:07  abigail
         New release

         Revision 2.107  2003/02/07 15:23:03  abigail
         New release

         Revision 2.106  2003/02/02 17:44:58  abigail
         New release

         Revision 2.105  2003/02/02 03:20:32  abigail
         New release

         Revision 2.104  2003/01/24 15:43:40  abigail
         New release

         Revision 2.103  2003/01/23 02:19:01  abigail
         New release

         Revision 2.102  2003/01/22 17:32:34  abigail
         New release

         Revision 2.101  2003/01/21 23:52:18  abigail
         POD fix.

         Revision 2.100  2003/01/21 23:19:40  abigail
         The whole world understands RCS/CVS version numbers, that 1.9 is an
         older version than 1.10. Except CPAN. Curse the idiot(s) who think
         that version numbers are floats (in which universe do floats have
         more than one decimal dot?).
         Everything is bumped to version 2.100 because CPAN couldn't deal
         with the fact one file had version 1.10.

         Revision 1.30  2003/01/17 13:19:04  abigail
         New release

         Revision 1.29  2003/01/16 11:08:41  abigail
         New release

         Revision 1.28  2003/01/01 23:03:53  abigail
         New distribution

         Revision 1.27  2003/01/01 17:09:07  abigail
         lingua class added

         Revision 1.26  2002/12/30 23:08:28  abigail
         New module Regexp::Common::zip

         Revision 1.25  2002/12/27 23:34:44  abigail
         New release

         Revision 1.24  2002/12/24 00:00:04  abigail
         New release

         Revision 1.23  2002/11/06 13:50:23  abigail
         Minor POD changes.

         Revision 1.22  2002/10/01 18:25:46  abigail
         POD buglets.

         Revision 1.21  2002/09/18 17:46:11  abigail
         POD Typo fix (Douglas Hunter)

         Revision 1.20  2002/08/27 17:04:29  abigail
         VERSION is now extracted from the CVS revision number.

         Revision 1.19  2002/08/06 14:46:49  abigail
         Upped version number to 0.09.

         Revision 1.18  2002/08/06 13:50:08  abigail
         - Added HISTORY section with CVS log.
         - Upped version number to 0.08.

         Revision 1.17  2002/08/05 12:21:46  abigail
         Upped version number to 0.07.

         Revision 1.16  2002/08/05 12:16:30  abigail
         Fixed 'Regex::' typo to 'Regexp::' (Found my Mike Castle).

         Revision 1.15  2002/08/04 22:56:02  abigail
         Upped version number to 0.06.

         Revision 1.14  2002/08/04 19:33:33  abigail
         Loaded URI by default.

         Revision 1.13  2002/08/01 10:02:42  abigail
         Upped version number.

         Revision 1.12  2002/07/31 23:26:06  abigail
         Upped version number.

         Revision 1.11  2002/07/31 13:11:20  abigail
         Removed URL from the list of default loaded regexes, as this one isn't
         ready yet.

         Upped the version number to 0.03.

         Revision 1.10  2002/07/29 13:16:38  abigail
         Introduced 'use strict' (which uncovered a bug, \@non_flags was used
         when $spec{create} was called instead of \@nonflags).

         Turned warnings on (using local $^W = 1; "use warnings" isn't available
         in pre 5.6).

         Revision 1.9  2002/07/28 23:02:54  abigail
         Split out the remaining pattern groups to separate files.

         Fixed a bug in _decache, changed the regex /$fpat=(.+)/ to
         /$fpat=(.*)/, to be able to distinguish the case of a flag
         set to the empty string, or a flag without an argument.

         Added 'undef' to @_ in the sub_interface setting to avoid a warning
         of setting a hash with an odd number of arguments.

         POD fixes.

         Revision 1.8  2002/07/25 23:55:54  abigail
         Moved balanced, net and URL to separate files.

         Revision 1.7  2002/07/25 20:01:40  abigail
         Modified import() to deal with factoring out groups of related regexes.
         Factored out comments into Common/comment.

         Revision 1.6  2002/07/23 21:20:43  abigail
         Upped version number to 0.02.

         Revision 1.5  2002/07/23 21:14:55  abigail
         Added $RE{comment}{HTML}.

         Revision 1.4  2002/07/23 17:01:09  abigail
         Added lines about new maintainer, and an email address to submit bugs
         and new regexes to.

         Revision 1.3  2002/07/23 13:58:58  abigail
         Changed various occurences of C<... => ...> into C<< ... => ... >>.

         Revision 1.2  2002/07/23 12:27:07  abigail
         Line 733 was missing the closing > of a C<> in the POD.

         Revision 1.1  2002/07/23 12:22:51  abigail
         Initial revision

AUTHOR

       Damian Conway (damian@conway.org)

MAINTENANCE

       This package is maintained by Abigail (regexp-common@abigail.nl).

BUGS AND IRRITATIONS

       Bound to be plenty.

       For a start, there are many common regexes missing.  Send them in to
       regexp-common@abigail.nl.

          Copyright (c) 2001 - 2005, Damian Conway and Abigail. All Rights
        Reserved. This module is free software. It may be used, redistributed
            and/or modified under the terms of the Perl Artistic License
                  (see http://www.perl.com/perl/misc/Artistic.html)



perl v5.8.8                       2003-03-23                 Regexp::Common(3)