re(3pm) Perl Programmers Reference Guide re(3pm)

NAME
re - Perl pragma to alter regular expression behaviour

SYNOPSIS
    use re 'taint';
    ($x) = ($^X =~ /^(.*)$/s);     # $x is tainted here

    $pat = '(?{ $foo = 1 })';
    use re 'eval';
    /foo${pat}bar/;                # won't fail (when not under -T switch)

    {
        no re 'taint';             # the default
        ($x) = ($^X =~ /^(.*)$/s); # $x is not tainted here

        no re 'eval';              # the default
        /foo${pat}bar/;            # disallowed (with or without -T switch)
    }

    use re 'debug';                # output debugging info during
    /^(.*)$/s;                     # compile and run time

    use re 'debugcolor';           # same as 'debug', but with colored output
    ...

    use re qw(Debug All);          # Finer-tuned debugging options.
    use re qw(Debug More);
    no re qw(Debug ALL);           # Turn off all re debugging in this scope

    use re qw(is_regexp regexp_pattern); # import utility functions
    my ($pat, $mods) = regexp_pattern(qr/foo/i);
    if (is_regexp($obj)) {
        print "Got regexp: ",
            scalar regexp_pattern($obj); # just as perl would stringify it
    }                                    # but no hassle with blessed re's.

(We use $^X in these examples because it's tainted by default.)

DESCRIPTION
'taint' mode
When "use re 'taint'" is in effect, and a tainted string is the target
of a regexp, the regexp memories (or values returned by the m//
operator in list context) are tainted. This feature is useful when
regexp operations on tainted data aren't meant to extract safe
substrings, but to perform other transformations.
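
As an illustration of such a transformation, the following is a minimal
sketch rather than a definitive recipe. The helper name trim_keep_taint
is hypothetical, and Scalar::Util's tainted() is used only to report the
result. Taintedness is only tracked when perl runs with the -T switch,
so without it the script reports "no".

```perl
use strict;
use warnings;
use Scalar::Util qw(tainted);

# Hypothetical helper: trims surrounding whitespace. The capture is a
# transformation, not a validation step, so taintedness should survive.
sub trim_keep_taint {
    my ($text) = @_;
    use re 'taint';                            # lexically scoped to this sub
    my ($trimmed) = $text =~ /^\s*(.*?)\s*$/s;
    return $trimmed;
}

my $padded  = "  $^X  ";                       # tainted when run under -T
my $trimmed = trim_keep_taint($padded);

printf "trimmed: %s\n", $trimmed;
printf "still tainted: %s\n", tainted($trimmed) ? 'yes' : 'no';
```

Without "use re 'taint'" the captured value would be untainted even
under -T, which is the behaviour wanted for validation but not for a
plain transformation such as this one.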

'eval' mode
When "use re 'eval'" is in effect, a regexp is allowed to contain "(?{
... })" zero-width assertions and "(??{ ... })" postponed
subexpressions, even if the regular expression contains variable
interpolation. That is normally disallowed, since it is a potential
security risk. Note that this pragma is ignored when the regular
expression is obtained from tainted data, i.e. evaluation is always
disallowed with tainted regular expressions. See "(?{ code })" in
perlre and "(??{ code })" in perlre.

For the purpose of this pragma, interpolation of precompiled regular
expressions (i.e., the result of "qr//") is not considered variable
interpolation. Thus:

    /foo${pat}bar/

is allowed if $pat is a precompiled regular expression, even if $pat
contains "(?{ ... })" assertions or "(??{ ... })" subexpressions.
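
The three cases above can be compared side by side. This is a sketch
under stated assumptions: the variable names are illustrative, the
script is run without -T, and a package variable is used as the counter
so that the embedded code does not depend on how a given perl version
handles lexicals inside code blocks.

```perl
use strict;
use warnings;

our $hits = 0;                               # package variable $main::hits
my $code_text = '(?{ $main::hits++ })';      # pattern text in a plain string

# Default ("no re 'eval'"): interpolating the string is refused at run time.
my $allowed_by_default = eval { "foobar" =~ /foo${code_text}bar/; 1 } ? 1 : 0;

# With "use re 'eval'" in scope, the same interpolation is accepted.
my $allowed_with_pragma = do {
    use re 'eval';
    eval { "foobar" =~ /foo${code_text}bar/; 1 } ? 1 : 0;
};

# A precompiled qr// object is not treated as variable interpolation,
# so no pragma is needed here.
my $code_qr = qr/(?{ $main::hits++ })/;
my $allowed_with_qr = eval { "foobar" =~ /foo${code_qr}bar/; 1 } ? 1 : 0;

print "string, default:   $allowed_by_default\n";
print "string, re 'eval': $allowed_with_pragma\n";
print "qr//, default:     $allowed_with_qr\n";
print "code blocks run:   $hits\n";
```

Keeping "use re 'eval'" inside the smallest block that needs it limits
how much interpolated text is allowed to carry executable code.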

'debug' mode
When "use re 'debug'" is in effect, perl emits debugging messages when
compiling and using regular expressions. The output is the same as
that obtained by running a "-DDEBUGGING"-enabled perl interpreter with
the -Dr switch. It may be quite voluminous depending on the complexity
of the match. Using "debugcolor" instead of "debug" enables a form of
output that can be used to get a colorful display on terminals that
understand termcap color sequences. Set $ENV{PERL_RE_TC} to a
comma-separated list of "termcap" properties to use for highlighting
strings on/off, pre-point part on/off. See "Debugging regular
expressions" in perldebug for additional info.

As of 5.9.5 the directive "use re 'debug'" and its equivalents are
lexically scoped, as the other directives are. However, they have both
compile-time and run-time effects.

See "Pragmatic Modules" in perlmodlib.
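
A minimal sketch of the lexical scoping follows. It assumes a perl of
5.9.5 or later and that the debugging messages go to standard error,
which is where perl's debug log normally goes. The variable names are
illustrative.

```perl
use strict;
use warnings;

my ($inside, $outside);

{
    use re 'debug';                        # in effect until the closing brace
    # Compiling and running this match is traced.
    $inside = "xabbc" =~ /ab+c/ ? 1 : 0;
}

# Same pattern outside the block: matched without debugging output.
$outside = "xabbc" =~ /ab+c/ ? 1 : 0;

print "inside: $inside, outside: $outside\n";
```

Wrapping only the suspect match in a bare block keeps the output
manageable, since the trace for even a short pattern can run to many
lines.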

'Debug' mode
Similarly, "use re 'Debug'" produces debugging output, the difference
being that it allows fine tuning of what debugging output will be
emitted. Options are divided into three groups: those related to
compilation, those related to execution, and those related to special
purposes. The options are as follows:

Compile related options
    COMPILE
        Turns on all compile related debug options.

    PARSE
        Turns on debug output related to the process of parsing the
        pattern.

    OPTIMISE
        Enables output related to the optimisation phase of
        compilation.

    TRIEC
        Detailed info about trie compilation.

    DUMP
        Dump the final program out after it is compiled and optimised.

Execute related options
    EXECUTE
        Turns on all execute related debug options.

    MATCH
        Turns on debugging of the main matching loop.

    TRIEE
        Extra debugging of how tries execute.

    INTUIT
        Enable debugging of start point optimisations.

Extra debugging options
    EXTRA
        Turns on all "extra" debugging options.

    BUFFERS
        Enable debugging of the capture buffer storage during a match.
        Warning: this can produce extremely large output.

    TRIEM
        Enable enhanced TRIE debugging. Enhances both TRIEE and TRIEC.

    STATE
        Enable debugging of states in the engine.

    STACK
        Enable debugging of the recursion stack in the engine. Enabling
        or disabling this option automatically does the same for
        debugging states as well. The output from this can be quite
        large.

    OPTIMISEM
        Enable enhanced optimisation debugging and start point
        optimisations. Probably not useful except when debugging the
        regexp engine itself.

    OFFSETS
        Dump offset information. This can be used to see how regops
        correlate to the pattern. Output format is

            NODENUM:POSITION[LENGTH]

        where 1 is the position of the first char in the string. Note
        that position can be 0, or larger than the actual length of the
        pattern; likewise, length can be zero.

    OFFSETSDBG
        Enable debugging of offsets information. This emits copious
        amounts of trace information and doesn't mesh well with other
        debug options.

        Almost definitely only useful to people hacking on the offsets
        part of the debug engine.

Other useful flags
These are useful shortcuts to save on typing.

    ALL
        Enable all options at once except OFFSETS, OFFSETSDBG and
        BUFFERS.

    All
        Enable DUMP and all execute options. Equivalent to:

            use re 'debug';

    MORE
    More
        Enable TRIEM and all compile and execute options.

As of 5.9.5 the directive "use re 'debug'" and its equivalents are
lexically scoped, as the other directives are. However, they have both
compile-time and run-time effects.
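
As a hedged illustration of combining individual options, the sketch
below asks only for the PARSE and DUMP output while a pattern is
compiled, then uses the "no re qw(Debug ALL)" form from the SYNOPSIS for
an inner block. The pattern and variable names are illustrative, and the
exact trace text varies between perl versions.

```perl
use strict;
use warnings;

my @found;

{
    # Request only the parse trace and the final compiled program.
    use re qw(Debug PARSE DUMP);
    my $word = qr/\b(?:foo|bar)\b/;

    {
        # Per the SYNOPSIS: turn off all re debugging in this scope.
        no re qw(Debug ALL);
        @found = "foo baz bar" =~ /$word/g;
    }
}

print "found: @found\n";
```

Naming the options explicitly tends to give far less output than "All"
or "More", which helps when only one phase of the engine is of interest.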

Exportable Functions
As of perl 5.9.5, 're' contains a number of utility functions that
may be optionally exported into the caller's namespace. They are listed
below.

    is_regexp($ref)
        Returns true if the argument is a compiled regular expression as
        returned by "qr//", false if it is not.

        This function will not be confused by overloading or blessing. In
        internals terms, this extracts the regexp pointer out of the
        PERL_MAGIC_qr structure so it cannot be fooled.
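
A short sketch of why this matters: once a "qr//" object is reblessed,
a check based on ref() no longer sees "Regexp", whereas is_regexp()
still recognises it. The class name My::Pattern is hypothetical.

```perl
use strict;
use warnings;
use re qw(is_regexp);

my $plain   = qr/foo/i;
my $blessed = bless qr/bar/, 'My::Pattern';   # hypothetical class name
my $string  = "$plain";                       # stringified pattern, not a regexp

for my $item ([plain => $plain], [blessed => $blessed], [string => $string]) {
    my ($label, $value) = @$item;
    printf "%-8s ref=%-12s is_regexp=%s\n",
        $label, ref($value) || '(none)', is_regexp($value) ? 'yes' : 'no';
}
```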

    regexp_pattern($ref)
        If the argument is a compiled regular expression as returned by
        "qr//", then this function returns the pattern.

        In list context it returns a two element list, the first element
        containing the pattern and the second containing the modifiers used
        when the pattern was compiled.

            my ($pat, $mods) = regexp_pattern($ref);

        In scalar context it returns the same as perl would when
        stringifying a raw "qr//" with the same pattern inside. If the
        argument is not a compiled regular expression then this routine
        returns false but defined in scalar context, and the empty list in
        list context. Thus the following

            if (regexp_pattern($ref) eq '(?i-xsm:foo)')

        will be warning free regardless of what $ref actually is.

        Like "is_regexp" this function will not be confused by overloading
        or blessing of the object.
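
The two calling contexts can be sketched as follows. The exact
stringified form differs between perl versions, so the example compares
against perl's own stringification instead of a literal. The variable
names are illustrative.

```perl
use strict;
use warnings;
use re qw(regexp_pattern);

my $re = qr/foo/i;

# List context: pattern text and modifiers as separate values.
my ($pat, $mods) = regexp_pattern($re);

# Scalar context: the same string that stringifying the qr// gives.
my $as_string = regexp_pattern($re);

# Non-regexp argument in scalar context: false but defined, so a string
# comparison does not trigger an "uninitialized" warning.
my $not_re = regexp_pattern('plain string');

print "pattern:   $pat\n";
print "modifiers: $mods\n";
print "scalar:    $as_string\n";
print "non-regexp result is ", (defined $not_re ? 'defined' : 'undef'),
      " and ", ($not_re ? 'true' : 'false'), "\n";
```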

    regmust($ref)
        If the argument is a compiled regular expression as returned by
        "qr//", then this function returns what the optimiser considers to
        be the longest anchored fixed string and longest floating fixed
        string in the pattern.

        A fixed string is defined as being a substring that must appear for
        the pattern to match. An anchored fixed string is a fixed string
        that must appear at a particular offset from the beginning of the
        match. A floating fixed string is defined as a fixed string that
        can appear at any point in a range of positions relative to the
        start of the match. For example,

            use re qw(regmust);
            my $qr = qr/here .* there/x;
            my ($anchored, $floating) = regmust($qr);
            print "anchored:'$anchored'\nfloating:'$floating'\n";

        results in

            anchored:'here'
            floating:'there'

        Because the "here" is before the ".*" in the pattern, its position
        can be determined exactly. That's not true, however, for the
        "there"; it could appear at any point after where the anchored
        string appeared. Perl uses both for its optimisations, preferring
        the longer, or, if they are equal, the floating.

        NOTE: This may not necessarily be the definitive longest anchored
        and floating string. This will be what the optimiser of the Perl
        that you are using thinks is the longest. If you believe that the
        result is wrong please report it via the perlbug utility.

    regname($name,$all)
        Returns the contents of a named buffer of the last successful
        match. If $all is true, then returns an array ref containing one
        entry per buffer, otherwise returns the first defined buffer.

    regnames($all)
        Returns a list of all of the named buffers defined in the last
        successful match. If $all is true, then it returns all names
        defined; if not, it returns only names which were involved in the
        match.

    regnames_count()
        Returns the number of distinct names defined in the pattern used
        for the last successful match.

        Note: this result is always the actual number of distinct named
        buffers defined; it may not match the number returned by
        "regnames()" and related routines when those routines have not
        been called with the $all parameter set.
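
The named-buffer functions can be tried together on one match. This is
a sketch with an illustrative date pattern, in which every named buffer
takes part in the match. The names are sorted because no particular
order is assumed for the list that regnames() returns.

```perl
use strict;
use warnings;
use re qw(regname regnames regnames_count);

my ($year, @names, $count);

if ("2011-11-04" =~ /(?<year>\d{4})-(?<month>\d\d)-(?<day>\d\d)/) {
    $year  = regname('year');                  # contents of buffer "year"
    @names = sort { $a cmp $b } regnames();    # names involved in the match
    $count = regnames_count();                 # distinct names in the pattern
}

print "year:  $year\n";
print "names: @names\n";
print "count: $count\n";
```

All three functions refer to the last successful match, so they are
called inside the block guarded by that match.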

SEE ALSO
"Pragmatic Modules" in perlmodlib.

perl v5.12.4 2011-11-04 re(3pm)