PPIx::Regexp(3)       User Contributed Perl Documentation      PPIx::Regexp(3)

NAME

       PPIx::Regexp - Represent a regular expression of some sort

SYNOPSIS

        use PPIx::Regexp;
        use PPIx::Regexp::Dumper;
        my $re = PPIx::Regexp->new( 'qr{foo}smx' );
        PPIx::Regexp::Dumper->new( $re )
            ->print();

INHERITANCE

       "PPIx::Regexp" is a PPIx::Regexp::Node.

       "PPIx::Regexp" has no descendants.

DESCRIPTION

       The purpose of the PPIx-Regexp package is to parse regular expressions
       in a manner similar to the way the PPI package parses Perl. This class
       forms the root of the parse tree, playing a role similar to
       PPI::Document.

       This package shares with PPI the property of being round-trip safe.
       That is,

        my $expr = 's/ ( \d+ ) ( \D+ ) /$2$1/smxg';
        my $re = PPIx::Regexp->new( $expr );
        print $re->content() eq $expr ? "yes\n" : "no\n";

       should print 'yes' for any valid regular expression.

       Navigation is similar to that provided by PPI. That is to say, things
       like "children", "find_first", "snext_sibling" and so on all work
       pretty much the same way as in PPI.

       The class hierarchy is also similar to PPI. Except for some utility
       classes (the dumper, the lexer, and the tokenizer) all classes are
       descended from PPIx::Regexp::Element, which provides basic navigation.
       Tokens are descended from PPIx::Regexp::Token, which provides content.
       All containers are descended from PPIx::Regexp::Node, which provides
       for children, and all structure elements are descended from
       PPIx::Regexp::Structure, which provides beginning and ending
       delimiters, and a type.
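
       As a rough illustration of the hierarchy and navigation described
       above, the following sketch checks the ancestry of a parsed object and
       lists its immediate children. It uses only methods named in this
       document ("new", "errstr", "children", "content") and assumes that
       "new" returns a false value on failure. The classes printed for the
       children depend on the installed version, so no output is shown.

```perl
use strict;
use warnings;
use PPIx::Regexp;

# Assumption: new() returns a false value when instantiation fails.
my $re = PPIx::Regexp->new( 'qr{foo}smx' )
    or die PPIx::Regexp->errstr();

# The root object is a Node, and every Node is an Element.
foreach my $class ( qw{ PPIx::Regexp::Node PPIx::Regexp::Element } ) {
    printf "%-24s %s\n", $class, $re->isa( $class ) ? 'yes' : 'no';
}

# Walk the immediate children, much as one would with PPI.
foreach my $child ( $re->children() ) {
    printf "%-36s %s\n", ref $child, $child->content();
}
```

       Because the parse is round-trip safe, joining the content of the
       children should reproduce the original string.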

       There are two features of PPI that this package does not provide -
       mutability and operator overloading. There are no plans for serious
       mutability, though something like PPI's "prune" functionality might be
       considered. Similarly there are no plans for operator overloading,
       which appears to the author to represent a performance hit for little
       tangible gain.

NOTICE

       This is alpha code. The author will attempt to preserve the documented
       interface, but if the interface needs to change to correct some
       egregiously bad design or implementation decision, then it will change.

       The goal of this package is to parse well-formed regular expressions
       correctly. A secondary goal is not to blow up on ill-formed regular
       expressions. The correct identification and characterization of ill-
       formed regular expressions is not a goal of this package.

METHODS

       This class provides the following public methods. Methods not
       documented here are private, and unsupported in the sense that the
       author reserves the right to change or remove them without notice.

   new
        my $re = PPIx::Regexp->new('/foo/');

       This method instantiates a "PPIx::Regexp" object from a string, a
       PPI::Token::QuoteLike::Regexp, a PPI::Token::Regexp::Match, or a
       PPI::Token::Regexp::Substitute.  Honestly, any PPI::Element will do,
       but only the three Regexp classes mentioned previously are likely to do
       anything useful.

       Optionally you can pass one or more name/value pairs after the regular
       expression. The possible options are:

       encoding name
           This option specifies the encoding of the regular expression. This
           is passed to the tokenizer, which will "decode" the regular
           expression string before it tokenizes it. For example:

            my $re = PPIx::Regexp->new( '/foo/',
                encoding => 'iso-8859-1',
            );

       trace number
           If greater than zero, this option causes trace output from the
           parse.  The author reserves the right to change or eliminate this
           without notice.

       Passing optional input other than the above is not an error, but
       neither is it supported.

   new_from_cache
       This static method wraps "new" in a caching mechanism. Only one object
       will be generated for a given PPI::Element, no matter how many times
       this method is called. Calls after the first for a given PPI::Element
       simply return the same "PPIx::Regexp" object.

       When the "PPIx::Regexp" object is returned from cache, the values of
       the optional arguments are ignored.

       Calls to this method with the regular expression in a string rather
       than a PPI::Element will not be cached.

       Caveat: This method is provided for code like Perl::Critic which might
       instantiate the same object multiple times. The cache will persist
       until "flush_cache" is called.

   flush_cache
        $re->flush_cache();            # Remove $re from cache
        PPIx::Regexp->flush_cache();   # Empty the cache

       This method flushes the cache used by "new_from_cache". If called as a
       static method with no arguments, the entire cache is emptied. Otherwise
       any objects specified are removed from the cache.
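
       The interplay of "new_from_cache" and "flush_cache" can be sketched as
       follows. The sample source is invented for illustration, and the
       comparison uses "refaddr" from the core Scalar::Util module to test
       whether two calls returned the very same object.

```perl
use strict;
use warnings;
use PPI::Document;
use PPIx::Regexp;
use Scalar::Util qw{ refaddr };

# Illustrative Perl source containing a single qr{} expression.
my $perl = 'my $digits = qr{ \\d+ }smx;';

my $doc = PPI::Document->new( \$perl )
    or die 'PPI could not parse the source';
my $token = $doc->find_first( 'PPI::Token::QuoteLike::Regexp' )
    or die 'No qr{} expression found';

# Two calls with the same PPI element should yield one shared object.
my $first  = PPIx::Regexp->new_from_cache( $token );
my $second = PPIx::Regexp->new_from_cache( $token );
print 'Shared before flush: ',
    ( refaddr( $first ) == refaddr( $second ) ? 'yes' : 'no' ), "\n";

# The cache persists until it is flushed, so empty it when done.
PPIx::Regexp->flush_cache();

# After the flush, the same element is parsed afresh.
my $third = PPIx::Regexp->new_from_cache( $token );
print 'Shared after flush: ',
    ( refaddr( $first ) == refaddr( $third ) ? 'yes' : 'no' ), "\n";
```

       A long-running program that never calls "flush_cache" keeps every
       cached parse alive, which is the reason for the caveat above.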

   capture_names
        foreach my $name ( $re->capture_names() ) {
            print "Capture name '$name'\n";
        }

       This convenience method returns the capture names found in the regular
       expression.

       This method is equivalent to

        $self->regular_expression()->capture_names();

       except that if "$self->regular_expression()" returns "undef" (meaning
       that something went terribly wrong with the parse) this method will
       simply return.

   delimiters
        print join("\t", PPIx::Regexp->new('s/foo/bar/')->delimiters());
        # prints '//      //'

       When called in list context, this method returns either one or two
       strings, depending on whether the parsed expression has a replacement
       string. In the case of non-bracketed substitutions, the start delimiter
       of the replacement string is considered to be the same as its finish
       delimiter, as illustrated by the above example.

       When called in scalar context, you get the delimiters of the regular
       expression; that is, element 0 of the array that is returned in list
       context.

       Optionally, you can pass an index value and the corresponding
       delimiters will be returned; index 0 represents the regular
       expression's delimiters, and index 1 represents the replacement
       string's delimiters, which may be undef. For example,

        print PPIx::Regexp->new('s{foo}<bar>')->delimiters(1);
        # prints '<>'

       If the object was not initialized with a valid regexp of some sort, the
       results of this method are undefined.

   errstr
       This static method returns the error string from the most recent
       attempt to instantiate a "PPIx::Regexp". It will be "undef" if the most
       recent attempt succeeded.

   failures
        print "There were ", $re->failures(), " parse failures\n";

       This method returns the number of parse failures. This is a count of
       the number of unknown tokens plus the number of unterminated structures
       plus the number of unmatched right brackets of any sort.
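
       The two methods above cover different situations: "errstr" reports
       why no object could be created at all, while "failures" counts the
       problems found inside an object that was created. A minimal sketch of
       checking both follows. The second expression is deliberately missing a
       right parenthesis, and the code assumes that "new" returns a false
       value on failure. The exact count reported for the ill-formed
       expression is not shown, since characterizing ill-formed expressions
       is not a goal of this package.

```perl
use strict;
use warnings;
use PPIx::Regexp;

foreach my $expr ( 'm/ ( foo ) /smx', 'm/ ( foo /smx' ) {
    my $re = PPIx::Regexp->new( $expr );

    # Assumption: a false return means instantiation failed outright.
    if ( ! $re ) {
        my $err = PPIx::Regexp->errstr();
        print "Could not parse '$expr': ",
            ( defined $err ? $err : 'unknown error' ), "\n";
        next;
    }

    # An object was built; ask how many problems the parse recorded.
    printf "%-18s %d parse failure(s)\n", $expr, $re->failures();
}
```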

   max_capture_number
        print "Highest used capture number ",
            $re->max_capture_number(), "\n";

       This convenience method returns the highest capture number used by the
       regular expression. If there are no captures, the return will be 0.

       This method is equivalent to

        $self->regular_expression()->max_capture_number();

       except that if "$self->regular_expression()" returns "undef" (meaning
       that something went terribly wrong with the parse) this method will
       too.

   modifier
        my $re = PPIx::Regexp->new( 's/(foo)/${1}bar/smx' );
        print $re->modifier()->content(), "\n";
        # prints 'smx'.

       This method retrieves the modifier of the object. This comes from the
       end of the initializing string or object and will be a
       PPIx::Regexp::Token::Modifier.

       In the event of a parse failure, there may not be a modifier present,
       in which case nothing is returned.

   regular_expression
        my $re = PPIx::Regexp->new( 's/(foo)/${1}bar/smx' );
        print $re->regular_expression()->content(), "\n";
        # prints '/(foo)/'.

       This method returns that portion of the object which actually
       represents a regular expression.

   replacement
        my $re = PPIx::Regexp->new( 's/(foo)/${1}bar/smx' );
        print $re->replacement()->content(), "\n";
        # prints '${1}bar/'.

       This method returns that portion of the object which represents the
       replacement string. This will be "undef" unless the regular expression
       actually has a replacement string. Delimiters will be included, but
       there will be no beginning delimiter unless the regular expression was
       bracketed.

   source
        my $source = $re->source();

       This method returns the object or string that was used to instantiate
       the object.

   type
        my $re = PPIx::Regexp->new( 's/(foo)/${1}bar/smx' );
        print $re->type()->content(), "\n";
        # prints 's'.

       This method retrieves the type of the object. This comes from the
       beginning of the initializing string or object, and will be a
       PPIx::Regexp::Token::Structure whose "content" is one of 's', 'm',
       'qr', or ''.

RESTRICTIONS

       By the nature of this module, it is never going to get everything
       right.  Many of the known problem areas involve interpolations one way
       or another.

   Ambiguous Syntax
       Perl's regular expressions contain cases where the syntax is ambiguous.
       A particularly egregious example is an interpolation followed by square
       or curly brackets, for example $foo[...]. There is nothing in the
       syntax to say whether the programmer wanted to interpolate an element
       of array @foo, or whether he wanted to interpolate scalar $foo, and
       then follow that interpolation by a character class.

       The perlop documentation notes that in this case what Perl does is to
       guess. That is, it employs various heuristics on the code to try to
       figure out what the programmer wanted. These heuristics are documented
       as being undocumented (!) and subject to change without notice.
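
       The ambiguity can be observed directly in plain Perl, without this
       module. In the sketch below both a scalar $foo and an array @foo
       exist, so the first pattern leaves the choice to Perl's guess. With a
       single digit between the brackets, current perls are believed to pick
       the array element, but since the heuristics may change this is an
       observation rather than a guarantee. The second pattern wraps the
       variable name in curly brackets, which forces the scalar
       interpretation.

```perl
use strict;
use warnings;

my $foo = 'x';
my @foo = ( 'a', 'b' );

# Left to Perl's guess: is this $foo[1], or $foo followed by the class [1]?
my $guessed = 'b' =~ m/\A$foo[1]\z/
    ? 'array element'
    : 'scalar plus character class';

# Curly brackets around the name end the variable before the bracket.
my $forced = 'x1' =~ m/\A${foo}[1]\z/
    ? 'scalar plus character class'
    : 'array element';

print "\$foo[1]   was read as: $guessed\n";
print "\${foo}[1] was read as: $forced\n";
```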

       Given this situation, this module's chances of duplicating every Perl
       version's interpretation of every regular expression are pretty much
       nil.  What it does now is to assume that square brackets containing
       only an integer or an interpolation represent a subscript; otherwise
       they represent a character class. Similarly, curly brackets containing
       only a bareword or an interpolation are a subscript; otherwise they
       represent a quantifier.
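
       The rule just described can be paraphrased as a pair of small
       classifier functions. This is only a sketch of the stated rule, not
       the module's actual code: the function names "guess_square" and
       "guess_curly" are hypothetical, and the patterns used for "integer"
       (optionally signed digits), "bareword", and "interpolation" (a simple
       scalar name) are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: classify the text between square brackets that
# follow an interpolated variable.
sub guess_square {
    my ( $inside ) = @_;
    return $inside =~ m/ \A (?: -? \d+ | \$ \w+ ) \z /smx
        ? 'subscript'
        : 'character class';
}

# Hypothetical helper: classify the text between curly brackets that
# follow an interpolated variable.
sub guess_curly {
    my ( $inside ) = @_;
    return $inside =~ m/ \A (?: [[:alpha:]_] \w* | \$ \w+ ) \z /smx
        ? 'subscript'
        : 'quantifier';
}

foreach my $inside ( '0', '$i', 'a-z' ) {
    printf "\$foo[%s] is treated as a %s\n",
        $inside, guess_square( $inside );
}
foreach my $inside ( 'bar', '$key', '3,5' ) {
    printf "\$foo{%s} is treated as a %s\n",
        $inside, guess_curly( $inside );
}
```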

   Static Parsing
       It is well known that Perl cannot be statically parsed. That is, you
       cannot completely parse a piece of Perl code without executing that
       same code.

       Nevertheless, this class is trying to statically parse regular
       expressions. The main problem with this is that there is no way to know
       what is being interpolated into the regular expression by an
       interpolated variable. This is a problem because the interpolated value
       can change the interpretation of adjacent elements.

       This module deals with this by making assumptions about what is in an
       interpolated variable. These assumptions will not be enumerated here,
       but in general the principle is to assume the interpolated value does
       not change the interpretation of the regular expression. For example,

        my $foo = 'a-z]';
        my $re = qr{[$foo};

       is fine with the Perl interpreter, but will confuse the dickens out of
       this module. Similarly and more usefully, something like

        my $mods = 'i';
        my $re = qr{(?$mods:foo)};

       or maybe

        my $mods = 'i';
        my $re = qr{(?$mods)$foo};

       probably sets a modifier of some sort, and that is how this module
       interprets it. If the interpolation is not about modifiers, this module
       will get it wrong. Another such semi-benign example is

        my $foo = $] >= 5.010 ? '?<foo>' : '';
        my $re = qr{($foo\w+)};

       which will parse, but this module will never realize that it might be
       looking at a named capture.
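
       To make the run-time dependence concrete, the following plain Perl
       sketch builds the modifier example from above with two different
       values of the interpolated variable. The helper name "build_re" is
       made up for illustration. The source text of the pattern is identical
       in both cases, yet the compiled expression behaves differently, which
       is exactly the information a static parser does not have.

```perl
use strict;
use warnings;

# Hypothetical helper: compile the same pattern text with different
# interpolated modifiers.
sub build_re {
    my ( $mods ) = @_;
    return qr{(?$mods:foo)};
}

foreach my $mods ( 'i', '' ) {
    my $re = build_re( $mods );
    printf "mods='%s': pattern %s 'FOO'\n",
        $mods, 'FOO' =~ $re ? 'matches' : 'does not match';
}
```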

   Non-Standard Syntax
       There are modules out there that alter the syntax of Perl. If the
       syntax of a regular expression is altered, this module has no way to
       understand that it has been altered, much less to adapt to the
       alteration. The following modules are known to cause problems:

       Acme::PerlML, which renders Perl as XML.

       Data::PostfixDeref, which causes Perl to interpret suffixed empty
       brackets as dereferencing the thing they suffix.

       Filter::Trigraph, which recognizes ANSI C trigraphs, allowing Perl to
       be written in the ISO 646 character set.

       Perl6::Pugs. Enough said.

       Perl6::Rules, which back-ports some of the Perl 6 regular expression
       syntax to Perl 5.

       Regexp::Extended, which extends regular expressions in various ways,
       some of which seem to conflict with Perl 5.010.

SEE ALSO

       Regexp::Parser, which parses a bare regular expression (without
       enclosing "qr{}", "m//", or whatever) and uses a different navigation
       model.

SUPPORT

       Support is by the author. Please file bug reports at
       <http://rt.cpan.org>, or in electronic mail to the author.

AUTHOR

       Thomas R. Wyant, III wyant at cpan dot org

       Copyright (C) 2009-2010, Thomas R. Wyant, III

       This program is free software; you can redistribute it and/or modify it
       under the same terms as Perl 5.10.0. For more details, see the full
       text of the licenses in the directory LICENSES.

       This program is distributed in the hope that it will be useful, but
       without any warranty; without even the implied warranty of
       merchantability or fitness for a particular purpose.

perl v5.12.0                      2010-06-08                   PPIx::Regexp(3)