PPIx::Regexp(3)       User Contributed Perl Documentation      PPIx::Regexp(3)

NAME

       PPIx::Regexp - Represent a regular expression of some sort

SYNOPSIS

        use PPIx::Regexp;
        use PPIx::Regexp::Dumper;
        my $re = PPIx::Regexp->new( 'qr{foo}smx' );
        PPIx::Regexp::Dumper->new( $re )
            ->print();

INHERITANCE

       "PPIx::Regexp" is a PPIx::Regexp::Node.

       "PPIx::Regexp" has no descendants.

DESCRIPTION

       The purpose of the PPIx-Regexp package is to parse regular expressions
       in a manner similar to the way the PPI package parses Perl. This class
       forms the root of the parse tree, playing a role similar to
       PPI::Document.

       This package shares with PPI the property of being round-trip safe.
       That is,

        my $expr = 's/ ( \d+ ) ( \D+ ) /$2$1/smxg';
        my $re = PPIx::Regexp->new( $expr );
        print $re->content() eq $expr ? "yes\n" : "no\n";

       should print 'yes' for any valid regular expression.

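       The round-trip property can be checked over several expressions at
       once. The following is a minimal sketch; the sample expressions are
       chosen here purely for illustration and are not taken from the
       package's test suite.

```perl
use strict;
use warnings;
use PPIx::Regexp;

# Illustrative samples only; any well-formed expression should behave alike.
my @samples = (
    's/ ( \d+ ) ( \D+ ) /$2$1/smxg',
    'qr{foo}smx',
    'm/(?<name>\w+)/',
);

my $mismatches = 0;
foreach my $expr ( @samples ) {
    my $re = PPIx::Regexp->new( $expr )
        or die 'Parse failed: ', PPIx::Regexp->errstr(), "\n";
    # Round-trip safety: the parse tree reproduces its input exactly.
    my $ok = $re->content() eq $expr;
    $mismatches++ unless $ok;
    printf "%-32s %s\n", $expr, $ok ? 'round-trips' : 'differs';
}
```
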
       Navigation is similar to that provided by PPI. That is to say, things
       like "children", "find_first", "snext_sibling" and so on all work
       pretty much the same way as in PPI.

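       As a rough illustration of PPI-style navigation, the sketch below
       lists the root's children and looks up the first literal token. It
       assumes that "find_first" accepts a class name, as its PPI
       counterpart does, and that literal characters are represented by
       PPIx::Regexp::Token::Literal; check both against the documentation
       of the installed version.

```perl
use strict;
use warnings;
use PPIx::Regexp;

my $re = PPIx::Regexp->new( 'qr{foo}smx' )
    or die 'Parse failed: ', PPIx::Regexp->errstr(), "\n";

# children() returns the elements directly beneath the root.
my @children = $re->children();
foreach my $child ( @children ) {
    printf "%-40s %s\n", ref $child, $child->content();
}

# find_first() is given a class name here (assumed to work as in PPI).
my $literal = $re->find_first( 'PPIx::Regexp::Token::Literal' );
print 'First literal: ', $literal->content(), "\n" if $literal;
```
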
       The class hierarchy is also similar to PPI. Except for some utility
       classes (the dumper, the lexer, and the tokenizer) all classes are
       descended from PPIx::Regexp::Element, which provides basic navigation.
       Tokens are descended from PPIx::Regexp::Token, which provides content.
       All containers are descended from PPIx::Regexp::Node, which provides
       for children, and all structure elements are descended from
       PPIx::Regexp::Structure, which provides beginning and ending
       delimiters, and a type.

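       The hierarchy can be probed with "isa". This is only a sketch: that
       "regular_expression()" returns an object descended from
       PPIx::Regexp::Structure is an assumption drawn from the description
       above, not something this page states outright.

```perl
use strict;
use warnings;
use PPIx::Regexp;

my $re = PPIx::Regexp->new( 's/(foo)/${1}bar/smx' )
    or die 'Parse failed: ', PPIx::Regexp->errstr(), "\n";

# Each entry: label, object to test, class it is expected to descend from.
my @checks = (
    [ 'root object',          $re,                       'PPIx::Regexp::Node' ],
    [ 'root object',          $re,                       'PPIx::Regexp::Element' ],
    [ 'modifier()',           $re->modifier(),           'PPIx::Regexp::Token' ],
    [ 'type()',               $re->type(),               'PPIx::Regexp::Token' ],
    [ 'regular_expression()', $re->regular_expression(), 'PPIx::Regexp::Structure' ],
);

my $failed = 0;
foreach my $check ( @checks ) {
    my ( $label, $object, $class ) = @{ $check };
    my $ok = defined $object && $object->isa( $class );
    $failed++ unless $ok;
    printf "%-22s %-6s %s\n", $label, $ok ? 'isa' : 'is not', $class;
}
```
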
       There are two features of PPI that this package does not provide -
       mutability and operator overloading. There are no plans for serious
       mutability, though something like PPI's "prune" functionality might be
       considered. Similarly there are no plans for operator overloading,
       which appears to the author to represent a performance hit for little
       tangible gain.

NOTICE

       The author will attempt to preserve the documented interface, but if
       the interface needs to change to correct some egregiously bad design or
       implementation decision, then it will change.  Any incompatible changes
       will go through a deprecation cycle.

       The goal of this package is to parse well-formed regular expressions
       correctly. A secondary goal is not to blow up on ill-formed regular
       expressions. The correct identification and characterization of ill-
       formed regular expressions is not a goal of this package.

       This policy attempts to track features in development releases as well
       as public releases. However, features added in a development release
       and then removed before the next production release will not be
       tracked, and any functionality relating to such features will be
       removed. The issue here is the potential re-use (with different
       semantics) of syntax that did not make it into the production release.

METHODS

       This class provides the following public methods. Methods not
       documented here are private, and unsupported in the sense that the
       author reserves the right to change or remove them without notice.

   new
        my $re = PPIx::Regexp->new('/foo/');

       This method instantiates a "PPIx::Regexp" object from a string, a
       PPI::Token::QuoteLike::Regexp, a PPI::Token::Regexp::Match, or a
       PPI::Token::Regexp::Substitute.  Honestly, any PPI::Element will do,
       but only the three Regexp classes mentioned previously are likely to do
       anything useful.

       Optionally you can pass one or more name/value pairs after the regular
       expression. The possible options are:

       default_modifiers array_reference
           This option specifies a reference to an array of default modifiers
           to apply to the regular expression being parsed. Each modifier is
           specified as a string. Any actual modifiers found supersede the
           defaults.

           When applying the defaults, '?' and '/' are completely ignored, and
           '^' is ignored unless it occurs at the beginning of the modifier.
           The first dash ('-') causes subsequent modifiers to be negated.

           So, for example, if you wish to produce a "PPIx::Regexp" object
           representing the regular expression in

            use re '/smx';
            {
               no re '/x';
               m/ foo /;
            }

           you would (after some help from PPI in finding the relevant
           statements), do something like

            my $re = PPIx::Regexp->new( 'm/ foo /',
                default_modifiers => [ '/smx', '-/x' ] );

       encoding name
           This option specifies the encoding of the regular expression. This
           is passed to the tokenizer, which will "decode" the regular
           expression string before it tokenizes it. For example:

            my $re = PPIx::Regexp->new( '/foo/',
                encoding => 'iso-8859-1',
            );

       trace number
           If greater than zero, this option causes trace output from the
           parse.  The author reserves the right to change or eliminate this
           without notice.

       Passing optional input other than the above is not an error, but
       neither is it supported.

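       To see the effect of "default_modifiers", one can query the result
       with "modifier_asserted". The sketch below reuses the "use re" example
       from above; going by the rules just described, one would expect 's'
       and 'm' to be asserted and 'x' to be negated, but treat that reading
       as an assumption to verify against the installed version.

```perl
use strict;
use warnings;
use PPIx::Regexp;

# Mirrors "use re '/smx'" followed by "no re '/x'" in an inner block.
my $re = PPIx::Regexp->new( 'm/ foo /',
    default_modifiers => [ '/smx', '-/x' ] )
    or die 'Parse failed: ', PPIx::Regexp->errstr(), "\n";

foreach my $mod ( qw{ s m x i } ) {
    printf "/%s is %s\n", $mod,
        $re->modifier_asserted( $mod ) ? 'asserted' : 'not asserted';
}

# modifier() reports only what is written on the expression itself.
my $explicit = $re->modifier();
printf "Explicit modifiers: '%s'\n",
    defined $explicit ? $explicit->content() : '';
```
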
   new_from_cache
       This static method wraps "new" in a caching mechanism. Only one object
       will be generated for a given PPI::Element, no matter how many times
       this method is called. Calls after the first for a given PPI::Element
       simply return the same "PPIx::Regexp" object.

       When the "PPIx::Regexp" object is returned from cache, the values of
       the optional arguments are ignored.

       Calls to this method with the regular expression in a string rather
       than a PPI::Element will not be cached.

       Caveat: This method is provided for code like Perl::Critic which might
       instantiate the same object multiple times. The cache will persist
       until "flush_cache" is called.

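       A minimal sketch of the caching behavior follows. The source snippet
       being analyzed is hypothetical, and PPI is used only to obtain a
       PPI::Element to pass in.

```perl
use strict;
use warnings;
use PPI;
use PPIx::Regexp;
use Scalar::Util qw{ refaddr };

# Hypothetical code to analyze; only the m/foo/i token matters here.
my $source = 'my $ok = $text =~ m/foo/i;';
my $doc = PPI::Document->new( \$source )
    or die "PPI could not parse the source\n";
my $elem = $doc->find_first( 'PPI::Token::Regexp::Match' )
    or die "No match operator found\n";

# Two calls for the same PPI::Element should yield the same object.
my $first  = PPIx::Regexp->new_from_cache( $elem );
my $second = PPIx::Regexp->new_from_cache( $elem );
print refaddr( $first ) == refaddr( $second )
    ? "second call came from cache\n"
    : "second call built a new object\n";

# After emptying the cache, a fresh object is built.
PPIx::Regexp->flush_cache();
my $third = PPIx::Regexp->new_from_cache( $elem );
print refaddr( $first ) == refaddr( $third )
    ? "third call came from cache\n"
    : "third call built a new object\n";
```

       Because the cache persists until flushed, long-running code would
       presumably want to call "flush_cache" once it is done with a document.
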
   flush_cache
        $re->flush_cache();            # Remove $re from cache
        PPIx::Regexp->flush_cache();   # Empty the cache

       This method flushes the cache used by "new_from_cache". If called as a
       static method with no arguments, the entire cache is emptied. Otherwise
       any objects specified are removed from the cache.

   capture_names
        foreach my $name ( $re->capture_names() ) {
            print "Capture name '$name'\n";
        }

       This convenience method returns the capture names found in the regular
       expression.

       This method is equivalent to

        $self->regular_expression()->capture_names();

       except that if "$self->regular_expression()" returns "undef" (meaning
       that something went terribly wrong with the parse) this method will
       simply return.

   delimiters
        print join("\t", PPIx::Regexp->new('s/foo/bar/')->delimiters());
        # prints '//      //'

       When called in list context, this method returns either one or two
       strings, depending on whether the parsed expression has a replacement
       string. In the case of non-bracketed substitutions, the start delimiter
       of the replacement string is considered to be the same as its finish
       delimiter, as illustrated by the above example.

       When called in scalar context, you get the delimiters of the regular
       expression; that is, element 0 of the array that is returned in list
       context.

       Optionally, you can pass an index value and the corresponding
       delimiters will be returned; index 0 represents the regular
       expression's delimiters, and index 1 represents the replacement
       string's delimiters, which may be undef. For example,

        print PPIx::Regexp->new('s{foo}<bar>')->delimiters(1);
        # prints '<>'

       If the object was not initialized with a valid regexp of some sort, the
       results of this method are undefined.

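       The three calling conventions of "delimiters" can be compared side by
       side. This is a sketch only; the expressions are chosen for
       illustration.

```perl
use strict;
use warnings;
use PPIx::Regexp;

my %seen;    # expression => array reference of its delimiter strings
foreach my $expr ( 'm/foo/', 's/foo/bar/', 's{foo}<bar>' ) {
    my $re = PPIx::Regexp->new( $expr )
        or die 'Parse failed: ', PPIx::Regexp->errstr(), "\n";

    my @all    = $re->delimiters();         # list context: one or two strings
    my $first  = $re->delimiters();         # scalar context: element 0
    my ( $repl ) = $re->delimiters( 1 );    # by index: undef if no replacement

    $seen{$expr} = [ @all ];
    printf "%-12s list: %s; scalar: %s; replacement: %s\n",
        $expr, join( ' ', @all ), $first,
        defined $repl ? $repl : '(none)';
}
```
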
   errstr
       This static method returns the error string from the most recent
       attempt to instantiate a "PPIx::Regexp". It will be "undef" if the most
       recent attempt succeeded.

   failures
        print "There were ", $re->failures(), " parse failures\n";

       This method returns the number of parse failures. This is a count of
       the number of unknown tokens plus the number of unterminated structures
       plus the number of unmatched right brackets of any sort.

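       "errstr" and "failures" report different things: the former covers
       failure to instantiate at all, the latter covers problems found during
       an otherwise completed parse. One way to combine them is sketched
       below; "parse_report" is a hypothetical helper, not part of this
       package.

```perl
use strict;
use warnings;
use PPIx::Regexp;

# Hypothetical helper: summarize how well an expression parsed.
sub parse_report {
    my ( $expr ) = @_;
    my $re = PPIx::Regexp->new( $expr );
    if ( ! defined $re ) {
        my $why = PPIx::Regexp->errstr();
        return 'could not parse: ' . ( defined $why ? $why : 'unknown error' );
    }
    my $failures = $re->failures();
    return $failures ? "$failures parse failure(s)" : 'parsed cleanly';
}

print '/foo(bar)?/ : ', parse_report( '/foo(bar)?/' ), "\n";
print '/foo(bar/   : ', parse_report( '/foo(bar/' ), "\n";
```
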
   max_capture_number
        print "Highest used capture number ",
            $re->max_capture_number(), "\n";

       This convenience method returns the highest capture number used by the
       regular expression. If there are no captures, the return will be 0.

       This method is equivalent to

        $self->regular_expression()->max_capture_number();

       except that if "$self->regular_expression()" returns "undef" (meaning
       that something went terribly wrong with the parse) this method will
       too.

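       The two capture-related convenience methods can be used together, as
       in the following sketch. It assumes that named captures are also
       counted as numbered captures, as they are in Perl itself; the names
       are sorted because this page does not specify the order in which
       "capture_names" returns them.

```perl
use strict;
use warnings;
use PPIx::Regexp;

# Two named captures followed by one unnamed capture.
my $re = PPIx::Regexp->new( 'm/(?<year>\d{4})-(?<month>\d{2})-(\d{2})/' )
    or die 'Parse failed: ', PPIx::Regexp->errstr(), "\n";

my @names = sort $re->capture_names();
my $max   = $re->max_capture_number();

print "Named captures: @names\n";
print "Highest capture number: $max\n";
```
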
   modifier
        my $re = PPIx::Regexp->new( 's/(foo)/${1}bar/smx' );
        print $re->modifier()->content(), "\n";
        # prints 'smx'.

       This method retrieves the modifier of the object. This comes from the
       end of the initializing string or object and will be a
       PPIx::Regexp::Token::Modifier.

       Note that this object represents the actual modifiers present on the
       regexp, and does not take into account any that may have been applied
       by default (i.e. via the "default_modifiers" argument to "new()"). For
       something that takes account of default modifiers, see
       modifier_asserted(), below.

       In the event of a parse failure, there may not be a modifier present,
       in which case nothing is returned.

   modifier_asserted
        my $re = PPIx::Regexp->new( '/ . /',
            default_modifiers => [ 'smx' ] );
        print $re->modifier_asserted( 'x' ) ? "yes\n" : "no\n";
        # prints 'yes'.

       This method returns true if the given modifier is asserted for the
       regexp, whether explicitly or by the modifiers passed in the
       "default_modifiers" argument.

   regular_expression
        my $re = PPIx::Regexp->new( 's/(foo)/${1}bar/smx' );
        print $re->regular_expression()->content(), "\n";
        # prints '/(foo)/'.

       This method returns that portion of the object which actually
       represents a regular expression.

   replacement
        my $re = PPIx::Regexp->new( 's/(foo)/${1}bar/smx' );
        print $re->replacement()->content(), "\n";
        # prints '${1}bar/'.

       This method returns that portion of the object which represents the
       replacement string. This will be "undef" unless the regular expression
       actually has a replacement string. Delimiters will be included, but
       there will be no beginning delimiter unless the regular expression was
       bracketed.

   source
        my $source = $re->source();

       This method returns the object or string that was used to instantiate
       the object.

   type
        my $re = PPIx::Regexp->new( 's/(foo)/${1}bar/smx' );
        print $re->type()->content(), "\n";
        # prints 's'.

       This method retrieves the type of the object. This comes from the
       beginning of the initializing string or object, and will be a
       PPIx::Regexp::Token::Structure whose "content" is one of 's', 'm',
       'qr', or ''.

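       Taken together, "type", "regular_expression", "replacement" and
       "modifier" split an expression into its top-level parts. The sketch
       below collects them in a hash, allowing for "replacement" being
       "undef" on a match; "parts" is a hypothetical helper, not part of
       this package.

```perl
use strict;
use warnings;
use PPIx::Regexp;

# Hypothetical helper: return the content of each top-level part.
sub parts {
    my ( $expr ) = @_;
    my $re = PPIx::Regexp->new( $expr )
        or die 'Parse failed: ', PPIx::Regexp->errstr(), "\n";
    my %part;
    foreach my $method ( qw{ type regular_expression replacement modifier } ) {
        my $elem = $re->$method();
        # A part may be absent, e.g. no replacement on m// or qr//.
        $part{$method} = defined $elem ? $elem->content() : undef;
    }
    return \%part;
}

foreach my $expr ( 's/(foo)/${1}bar/smx', 'm/foo/i' ) {
    my $part = parts( $expr );
    print "$expr\n";
    foreach my $name ( qw{ type regular_expression replacement modifier } ) {
        printf "    %-20s %s\n", $name,
            defined $part->{$name} ? "'$part->{$name}'" : '(none)';
    }
}
```
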

RESTRICTIONS

       By the nature of this module, it is never going to get everything
       right.  Many of the known problem areas involve interpolations one way
       or another.

   Ambiguous Syntax
       Perl's regular expressions contain cases where the syntax is ambiguous.
       A particularly egregious example is an interpolation followed by square
       or curly brackets, for example $foo[...]. There is nothing in the
       syntax to say whether the programmer wanted to interpolate an element
       of array @foo, or whether he wanted to interpolate scalar $foo, and
       then follow that interpolation by a character class.

       The perlop documentation notes that in this case what Perl does is to
       guess. That is, it employs various heuristics on the code to try to
       figure out what the programmer wanted. These heuristics are documented
       as being undocumented (!) and subject to change without notice.

       Given this situation, this module's chances of duplicating every Perl
       version's interpretation of every regular expression are pretty much
       nil.  What it does now is to assume that square brackets containing
       only an integer or an interpolation represent a subscript; otherwise
       they represent a character class. Similarly, curly brackets containing
       only a bareword or an interpolation are a subscript; otherwise they
       represent a quantifier.

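       The two readings of $foo[...] can be made explicit in Perl itself by
       using the curly-bracket forms that perlop describes. The sketch below
       uses only core Perl and says nothing about how this module parses the
       bracketed forms; the variable contents are made up for illustration.

```perl
use strict;
use warnings;

my $foo = 'a';         # scalar $foo
my @foo = ( 'b' );     # array @foo, unrelated to $foo

# ${foo}[0]: scalar $foo followed by the character class [0].
my $scalar_then_class = 'a0' =~ m/\A${foo}[0]\z/ ? 1 : 0;

# ${foo[0]}: element 0 of array @foo.
my $array_element = 'b' =~ m/\A${foo[0]}\z/ ? 1 : 0;

# The two readings match different strings, so a wrong guess matters.
my $cross_check = 'b' =~ m/\A${foo}[0]\z/ ? 1 : 0;

print "scalar plus class matches 'a0': $scalar_then_class\n";
print "array element matches 'b':      $array_element\n";
print "scalar plus class matches 'b':  $cross_check\n";
```
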
   Changes in Syntax
       Sometimes the introduction of new syntax changes the way a regular
       expression is parsed. For example, the "\v" character class was
       introduced in Perl 5.9.5. But it did not represent a syntax error prior
       to that version of Perl, it was simply parsed as "v". So

        $ perl -le 'print "v" =~ m/\v/ ? "yes" : "no"'

       prints "yes" under Perl 5.8.9, but "no" under 5.10.0. "PPIx::Regexp"
       generally assumes the more modern parse in cases like this.

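       The modern meaning of "\v" can be confirmed with core Perl alone. The
       sketch below assumes Perl 5.10 or later, where "\v" matches vertical
       white space rather than a literal "v".

```perl
use strict;
use warnings;

# Under the modern parse, \v is vertical white space, not the letter 'v'.
my $literal_v    = 'v'    =~ m/\v/ ? 'yes' : 'no';
my $vertical_tab = "\x0B" =~ m/\v/ ? 'yes' : 'no';
my $newline      = "\n"   =~ m/\v/ ? 'yes' : 'no';

printf "Perl %vd\n", $^V;
print "letter 'v' matches \\v:   $literal_v\n";
print "vertical tab matches \\v: $vertical_tab\n";
print "newline matches \\v:      $newline\n";
```
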
   Static Parsing
       It is well known that Perl cannot be statically parsed. That is, you
       cannot completely parse a piece of Perl code without executing that
       same code.

       Nevertheless, this class is trying to statically parse regular
       expressions. The main problem with this is that there is no way to know
       what is being interpolated into the regular expression by an
       interpolated variable. This is a problem because the interpolated value
       can change the interpretation of adjacent elements.

       This module deals with this by making assumptions about what is in an
       interpolated variable. These assumptions will not be enumerated here,
       but in general the principle is to assume the interpolated value does
       not change the interpretation of the regular expression. For example,

        my $foo = 'a-z]';
        my $re = qr{[$foo};

       is fine with the Perl interpreter, but will confuse the dickens out of
       this module. Similarly and more usefully, something like

        my $mods = 'i';
        my $re = qr{(?$mods:foo)};

       or maybe

        my $mods = 'i';
        my $re = qr{(?$mods)$foo};

       probably sets a modifier of some sort, and that is how this module
       interprets it. If the interpolation is not about modifiers, this module
       will get it wrong. Another such semi-benign example is

        my $foo = $] >= 5.010 ? '?<foo>' : '';
        my $re = qr{($foo\w+)};

       which will parse, but this module will never realize that it might be
       looking at a named capture.

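       For contrast, the sketch below shows what the Perl interpreter makes
       of two of these cases at run time, which is exactly the information a
       static parse does not have. It uses only core Perl (5.10 or later for
       the named capture), and the variable names are illustrative.

```perl
use strict;
use warnings;

# The interpolated value supplies the closing bracket of the class.
my $class    = 'a-z]';
my $re_class = qr{\A[$class};
my $class_ok = 'q' =~ $re_class ? 1 : 0;

# The interpolated value turns a plain group into a named capture.
my $name     = '?<word>';
my $re_named = qr{($name\w+)};
my $captured = 'hello' =~ $re_named ? $+{word} : undef;

print "character class completed at run time: $class_ok\n";
print 'named capture seen at run time: ',
    defined $captured ? $captured : '(none)', "\n";
```
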
   Non-Standard Syntax
       There are modules out there that alter the syntax of Perl. If the
       syntax of a regular expression is altered, this module has no way to
       understand that it has been altered, much less to adapt to the
       alteration. The following modules are known to cause problems:

       Acme::PerlML, which renders Perl as XML.

       Data::PostfixDeref, which causes Perl to interpret suffixed empty
       brackets as dereferencing the thing they suffix.

       Filter::Trigraph, which recognizes ANSI C trigraphs, allowing Perl to
       be written in the ISO 646 character set.

       Perl6::Pugs. Enough said.

       Perl6::Rules, which back-ports some of the Perl 6 regular expression
       syntax to Perl 5.

       Regexp::Extended, which extends regular expressions in various ways,
       some of which seem to conflict with Perl 5.010.

SEE ALSO

       Regexp::Parser, which parses a bare regular expression (without
       enclosing "qr{}", "m//", or whatever) and uses a different navigation
       model.

SUPPORT

       Support is by the author. Please file bug reports at
       <http://rt.cpan.org>, or in electronic mail to the author.

AUTHOR

       Thomas R. Wyant, III wyant at cpan dot org

COPYRIGHT AND LICENSE

       Copyright (C) 2009-2013 by Thomas R. Wyant, III

       This program is free software; you can redistribute it and/or modify it
       under the same terms as Perl 5.10.0. For more details, see the full
       text of the licenses in the directory LICENSES.

       This program is distributed in the hope that it will be useful, but
       without any warranty; without even the implied warranty of
       merchantability or fitness for a particular purpose.

perl v5.16.3                      2014-06-10                   PPIx::Regexp(3)