re(3pm)                Perl Programmers Reference Guide                re(3pm)

NAME
re - Perl pragma to alter regular expression behaviour

SYNOPSIS
    use re 'taint';
    ($x) = ($^X =~ /^(.*)$/s);     # $x is tainted here

    $pat = '(?{ $foo = 1 })';
    use re 'eval';
    /foo${pat}bar/;                # won't fail (when not under -T switch)

    {
        no re 'taint';             # the default
        ($x) = ($^X =~ /^(.*)$/s); # $x is not tainted here

        no re 'eval';              # the default
        /foo${pat}bar/;            # disallowed (with or without -T switch)
    }

    use re '/ix';
    "FOO" =~ / foo /;              # /ix implied
    no re '/x';
    "FOO" =~ /foo/;                # just /i implied

    use re 'debug';                # output debugging info during
    /^(.*)$/s;                     # compile and run time

    use re 'debugcolor';           # same as 'debug', but with colored output
    ...

    use re qw(Debug All);          # Finer tuned debugging options.
    use re qw(Debug More);
    no re qw(Debug ALL);           # Turn off all re debugging in this scope

    use re qw(is_regexp regexp_pattern); # import utility functions
    my ($pat,$mods)=regexp_pattern(qr/foo/i);
    if (is_regexp($obj)) {
        print "Got regexp: ",
            scalar regexp_pattern($obj); # just as perl would stringify it
    }                                    # but no hassle with blessed re's.

(We use $^X in these examples because it's tainted by default.)

DESCRIPTION
'taint' mode
When "use re 'taint'" is in effect, and a tainted string is the target
of a regexp, the regexp memories (or values returned by the m//
operator in list context) are tainted. This feature is useful when
regexp operations on tainted data aren't meant to extract safe
substrings, but to perform other transformations.
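
As a rough sketch of the difference, the script below captures $^X once
under "use re 'taint'" and once under the default. It assumes the core
Scalar::Util module for its tainted() check, and the two helper names are
made up for this illustration. Taint tracking only happens when perl is
started with -T, so without that switch both helpers report 0.

```perl
use strict;
use warnings;
use Scalar::Util qw(tainted);

# Hypothetical helper: capture under "use re 'taint'".
# Returns 1 if the captured copy is tainted, 0 otherwise.
sub capture_with_re_taint {
    my ($input) = @_;
    use re 'taint';                       # lexically scoped to this sub body
    my ($copy) = $input =~ /^(.*)$/s;
    return tainted($copy) ? 1 : 0;
}

# Hypothetical helper: capture under the default ("no re 'taint'").
# A successful capture launders the data, so the copy is never tainted.
sub capture_default {
    my ($input) = @_;
    my ($copy) = $input =~ /^(.*)$/s;
    return tainted($copy) ? 1 : 0;
}

printf "taint mode:            %s\n", ${^TAINT} ? "on" : "off";
printf "with re 'taint':       %d\n", capture_with_re_taint($^X);
printf "default (no re taint): %d\n", capture_default($^X);
```

Run it as "perl -T script.pl" to see the two helpers disagree.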

'eval' mode
When "use re 'eval'" is in effect, a regexp is allowed to contain
"(?{ ... })" zero-width assertions and "(??{ ... })" postponed
subexpressions, even if the regular expression contains variable
interpolation. That is normally disallowed, since it is a potential
security risk. Note that this pragma is ignored when the regular
expression is obtained from tainted data, i.e. evaluation is always
disallowed with tainted regular expressions. See "(?{ code })" in
perlre and "(??{ code })" in perlre.

For the purpose of this pragma, interpolation of precompiled regular
expressions (i.e., the result of "qr//") is not considered variable
interpolation. Thus:

    /foo${pat}bar/

is allowed if $pat is a precompiled regular expression, even if $pat
contains "(?{ ... })" assertions or "(??{ ... })" subexpressions.
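
The three cases described above can be sketched as follows. The counter
$main::hits and the variable names are assumptions made for this example.
The embedded code uses a fully qualified package variable so that it does
not depend on how a given perl version handles lexicals inside code blocks.

```perl
use strict;
use warnings;

our $hits = 0;                            # bumped by the embedded code
my $code_str = '(?{ $main::hits++ })';    # plain string: needs re 'eval'
my $code_qr  = qr/(?{ $main::hits++ })/;  # precompiled: allowed as-is

# 1. Interpolating the string without the pragma is a fatal run-time error.
my $refused = !eval { my $m = "foobar" =~ /foo${code_str}bar/; 1 };

# 2. The same pattern inside the lexical scope of "use re 'eval'".
my $with_pragma;
{
    use re 'eval';
    $with_pragma = "foobar" =~ /foo${code_str}bar/ ? 1 : 0;
}

# 3. Interpolating a qr// object does not count as variable interpolation.
my $with_qr = "foobar" =~ /foo${code_qr}bar/ ? 1 : 0;

print "string without pragma refused: ", ($refused ? "yes" : "no"), "\n";
print "string with pragma matched:    $with_pragma\n";
print "qr// without pragma matched:   $with_qr\n";
print "embedded code ran $hits time(s)\n";
```

Wrapping the risky match in a block eval, as in the first case, is one way
to turn the fatal error into a value the program can test.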

'/flags' mode
When "use re '/flags'" is specified, the given flags are automatically
added to every regular expression till the end of the lexical scope.

"no re '/flags'" will turn off the effect of "use re '/flags'" for the
given flags.

For example, if you want all your regular expressions to have /msx on
by default, simply put

    use re '/msx';

at the top of your code.
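
A minimal sketch of the scoping rules, assuming perl 5.14 or later (where
the '/flags' form is available). The @results array is only there to
collect the outcome of each match.

```perl
use strict;
use warnings;

my @results;
{
    use re '/ix';                                   # add /i and /x in this block
    push @results, ("FOO" =~ / foo /) ? 1 : 0;      # 1: blanks ignored, case folded

    {
        no re '/x';                                 # drop /x only; /i stays on
        push @results, ("FOO" =~ /foo/)   ? 1 : 0;  # 1: /i still applies
        push @results, ("FOO" =~ / foo /) ? 1 : 0;  # 0: blanks are literal again
    }
}
push @results, ("FOO" =~ /foo/) ? 1 : 0;            # 0: outside the scope, no /i

print "@results\n";                                 # prints "1 1 0 0"
```

Because the pragma is lexical, the last match is unaffected even though it
appears after the "use re" line in the file.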

The character set /adul flags cancel each other out. So, in this
example,

    use re "/u";
    "ss" =~ /\xdf/;
    use re "/d";
    "ss" =~ /\xdf/;

the second "use re" does an implicit "no re '/u'".

Turning on one of the character set flags with "use re" takes
precedence over the "locale" pragma and the 'unicode_strings'
"feature", for regular expressions. Turning off one of these flags when
it is active reverts to the behaviour specified by whatever other
pragmata are in scope. For example:

    use feature "unicode_strings";
    no re "/u"; # does nothing
    use re "/l";
    no re "/l"; # reverts to unicode_strings behaviour
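
One way to observe the character set flags replacing each other is to
test a non-ASCII byte against \w. This sketch assumes an ASCII-based
platform, perl 5.14 or later, and no "locale" or 'unicode_strings' in
scope. The variable names are illustrative.

```perl
use strict;
use warnings;

my $e_acute = "\xe9";   # LATIN SMALL LETTER E WITH ACUTE, held as one byte

# Default (/d) rules: a non-UTF-8 byte above 127 is not a word character.
my $default = $e_acute =~ /\w/ ? 1 : 0;

my ($unicode, $ascii);
{
    use re '/u';
    $unicode = $e_acute =~ /\w/ ? 1 : 0;   # Unicode rules: a word character

    use re '/a';                           # implicitly cancels the /u above
    $ascii = $e_acute =~ /\w/ ? 1 : 0;     # \w restricted to ASCII
}

print "default=$default unicode=$unicode ascii=$ascii\n";
# prints "default=0 unicode=1 ascii=0"
```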

'debug' mode
When "use re 'debug'" is in effect, perl emits debugging messages when
compiling and using regular expressions. The output is the same as
that obtained by running a "-DDEBUGGING"-enabled perl interpreter with
the -Dr switch. It may be quite voluminous depending on the complexity
of the match. Using "debugcolor" instead of "debug" enables a form of
output that can be used to get a colorful display on terminals that
understand termcap color sequences. Set $ENV{PERL_RE_TC} to a
comma-separated list of "termcap" properties to use for highlighting
strings on/off, pre-point part on/off. See "Debugging Regular
Expressions" in perldebug for additional info.

As of 5.9.5 the directive "use re 'debug'" and its equivalents are
lexically scoped, as the other directives are. However they have both
compile-time and run-time effects.

See "Pragmatic Modules" in perlmodlib.

'Debug' mode
Similarly "use re 'Debug'" produces debugging output, the difference
being that it allows the fine tuning of what debugging output will be
emitted. Options are divided into three groups, those related to
compilation, those related to execution and those related to special
purposes. The options are as follows:

Compile related options
    COMPILE
        Turns on all compile related debug options.

    PARSE
        Turns on debug output related to the process of parsing the
        pattern.

    OPTIMISE
        Enables output related to the optimisation phase of
        compilation.

    TRIEC
        Detailed info about trie compilation.

    DUMP
        Dump the final program out after it is compiled and optimised.

Execute related options
    EXECUTE
        Turns on all execute related debug options.

    MATCH
        Turns on debugging of the main matching loop.

    TRIEE
        Extra debugging of how tries execute.

    INTUIT
        Enable debugging of start point optimisations.

Extra debugging options
    EXTRA
        Turns on all "extra" debugging options.

    BUFFERS
        Enable debugging the capture group storage during match.
        Warning, this can potentially produce extremely large output.

    TRIEM
        Enable enhanced TRIE debugging. Enhances both TRIEE and TRIEC.

    STATE
        Enable debugging of states in the engine.

    STACK
        Enable debugging of the recursion stack in the engine. Enabling
        or disabling this option automatically does the same for
        debugging states as well. The output from this can be quite
        large.

    OPTIMISEM
        Enable enhanced optimisation debugging and start point
        optimisations. Probably not useful except when debugging the
        regexp engine itself.

    OFFSETS
        Dump offset information. This can be used to see how regops
        correlate to the pattern. Output format is

            NODENUM:POSITION[LENGTH]

        Where 1 is the position of the first char in the string. Note
        that position can be 0, or larger than the actual length of the
        pattern, likewise length can be zero.

    OFFSETSDBG
        Enable debugging of offsets information. This emits copious
        amounts of trace information and doesn't mesh well with other
        debug options.

        Almost definitely only useful to people hacking on the offsets
        part of the debug engine.

Other useful flags
    These are useful shortcuts to save on the typing.

    ALL
        Enable all options at once except OFFSETS, OFFSETSDBG and
        BUFFERS.

    All
        Enable DUMP and all execute options. Equivalent to:

            use re 'debug';

    MORE
    More
        Enable TRIEM and all compile and execute options.

As of 5.9.5 the directive "use re 'debug'" and its equivalents are
lexically scoped, as the other directives are. However they have both
compile-time and run-time effects.
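
As an illustration of the lexical scoping, the sketch below asks for the
DUMP option alone inside a single block. The debugging text goes to
standard error and its exact form varies between perl versions, so no
sample output is shown here. The variable names are illustrative.

```perl
use strict;
use warnings;

my $matched;
{
    # Request only the dump of the compiled program for this block.
    use re qw(Debug DUMP);
    $matched = "xabbbc" =~ /ab+c/ ? 1 : 0;
}

# The pragma is out of scope here, so this pattern is compiled and run
# without any debugging output.
my $quiet = "xyz" =~ /y/ ? 1 : 0;

print "matched=$matched quiet=$quiet\n";
```

Redirecting standard error (for example "perl script.pl 2>re.log") keeps
the dump separate from the program's normal output.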

Exportable Functions
As of perl 5.9.5 're' contains a number of utility functions that
may be optionally exported into the caller's namespace. They are listed
below.

is_regexp($ref)
    Returns true if the argument is a compiled regular expression as
    returned by "qr//", false if it is not.

    This function will not be confused by overloading or blessing. In
    internals terms, this extracts the regexp pointer out of the
    PERL_MAGIC_qr structure so it cannot be fooled.

regexp_pattern($ref)
    If the argument is a compiled regular expression as returned by
    "qr//", then this function returns the pattern.

    In list context it returns a two element list, the first element
    containing the pattern and the second containing the modifiers used
    when the pattern was compiled.

        my ($pat, $mods) = regexp_pattern($ref);

    In scalar context it returns the same as perl would when
    stringifying a raw "qr//" with the same pattern inside. If the
    argument is not a compiled regular expression then this routine
    returns false but defined in scalar context, and the empty list in
    list context. Thus the following

        if (regexp_pattern($ref) eq '(?^i:foo)')

    will be warning free regardless of what $ref actually is.

    Like "is_regexp" this function will not be confused by overloading
    or blessing of the object.
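
A short sketch of both functions follows. The class name My::Pattern is
hypothetical and exists only to show that reblessing does not hide the
regexp. The stringified form shown in the comments assumes perl 5.14 or
later with no 'unicode_strings' feature in scope.

```perl
use strict;
use warnings;
use re qw(is_regexp regexp_pattern);

my $plain   = qr/foo/i;
my $blessed = bless qr/bar/, 'My::Pattern';   # hypothetical class name

# is_regexp sees through the reblessing and rejects non-regexps.
my @checks = map { is_regexp($_) ? 1 : 0 } ($plain, $blessed, 'foo', [1]);
print "is_regexp: @checks\n";                 # is_regexp: 1 1 0 0

# List context: pattern and modifiers as separate values.
my ($pat, $mods) = regexp_pattern($plain);
print "pattern=$pat modifiers=$mods\n";       # pattern=foo modifiers=i

# Scalar context: the same text perl produces when stringifying qr/foo/i.
my $string = regexp_pattern($plain);
print "stringified: $string\n";               # stringified: (?^i:foo)

# A non-regexp yields a defined but false value in scalar context.
my $not_re = regexp_pattern('foo');
print "non-regexp: defined=", (defined $not_re ? 1 : 0),
      " true=", ($not_re ? 1 : 0), "\n";      # non-regexp: defined=1 true=0
```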

regmust($ref)
    If the argument is a compiled regular expression as returned by
    "qr//", then this function returns what the optimiser considers to
    be the longest anchored fixed string and longest floating fixed
    string in the pattern.

    A fixed string is defined as being a substring that must appear for
    the pattern to match. An anchored fixed string is a fixed string
    that must appear at a particular offset from the beginning of the
    match. A floating fixed string is defined as a fixed string that
    can appear at any point in a range of positions relative to the
    start of the match. For example,

        my $qr = qr/here .* there/x;
        my ($anchored, $floating) = regmust($qr);
        print "anchored:'$anchored'\nfloating:'$floating'\n";

    results in

        anchored:'here'
        floating:'there'

    Because the "here" is before the ".*" in the pattern, its position
    can be determined exactly. That's not true, however, for the
    "there"; it could appear at any point after where the anchored
    string appeared. Perl uses both for its optimisations, preferring
    the longer, or, if they are equal, the floating.

    NOTE: This may not necessarily be the definitive longest anchored
    and floating string. This will be what the optimiser of the Perl
    that you are using thinks is the longest. If you believe that the
    result is wrong please report it via the perlbug utility.

regname($name,$all)
    Returns the contents of a named buffer of the last successful
    match. If $all is true, then returns an array ref containing one
    entry per buffer, otherwise returns the first defined buffer.

regnames($all)
    Returns a list of all of the named buffers defined in the last
    successful match. If $all is true, then it returns all names
    defined, if not it returns only names which were involved in the
    match.

regnames_count()
    Returns the number of distinct names defined in the pattern used
    for the last successful match.

    Note: this result is always the actual number of distinct named
    buffers defined; it may not match what is returned by "regnames()"
    and related routines when those routines have not been called with
    the $all parameter set.
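
The difference between the two forms of regnames() shows up when a named
group is optional and does not take part in the match. The pattern, the
input string and the variable names below are made up for this sketch,
which assumes perl 5.10 or later for named captures.

```perl
use strict;
use warnings;
use re qw(regname regnames regnames_count);

my ($involved, $defined, $year, $count);

# The "month" group is optional and does not take part in this match.
if ("2013" =~ /^(?<year>\d{4})(?:-(?<month>\d\d))?$/) {
    my @involved = regnames();     # only names that took part in the match
    my @defined  = regnames(1);    # every name defined in the pattern

    $involved = join ',', sort @involved;
    $defined  = join ',', sort @defined;
    $year     = regname('year');   # contents of the named buffer
    $count    = regnames_count();  # distinct names in the pattern
}

print "involved=$involved defined=$defined year=$year count=$count\n";
# prints "involved=year defined=month,year year=2013 count=2"
```

The results are copied into ordinary variables inside the "if" block
because these functions refer to the last successful match in scope.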

SEE ALSO
"Pragmatic Modules" in perlmodlib.

perl v5.16.3                      2013-03-04                           re(3pm)