re(3pm)                Perl Programmers Reference Guide                re(3pm)

NAME

       re - Perl pragma to alter regular expression behaviour

SYNOPSIS

           use re 'taint';
           ($x) = ($^X =~ /^(.*)$/s);     # $x is tainted here

           $pat = '(?{ $foo = 1 })';
           use re 'eval';
           /foo${pat}bar/;                # won't fail (when not under -T
                                          # switch)

           {
               no re 'taint';             # the default
               ($x) = ($^X =~ /^(.*)$/s); # $x is not tainted here

               no re 'eval';              # the default
               /foo${pat}bar/;            # disallowed (with or without -T
                                          # switch)
           }

           use re 'strict';               # Raise warnings for more conditions

           use re '/ix';
           "FOO" =~ / foo /; # /ix implied
           no re '/x';
           "FOO" =~ /foo/; # just /i implied

           use re 'debug';                # output debugging info during
           /^(.*)$/s;                     # compile and run time

           use re 'debugcolor';           # same as 'debug', but with colored
                                          # output
           ...

           use re qw(Debug All);          # Same as "use re 'debug'", but you
                                          # can use "Debug" with things other
                                          # than 'All'
           use re qw(Debug More);         # 'All' plus output more details
           no re qw(Debug ALL);           # Turn off (almost) all re debugging
                                          # in this scope

           use re qw(is_regexp regexp_pattern); # import utility functions
           my ($pat,$mods)=regexp_pattern(qr/foo/i);
           if (is_regexp($obj)) {
               print "Got regexp: ",
                   scalar regexp_pattern($obj); # just as perl would stringify
           }                                    # it but no hassle with blessed
                                                # re's.

       (We use $^X in these examples because it's tainted by default.)

DESCRIPTION

   'taint' mode
       When "use re 'taint'" is in effect, and a tainted string is the target
       of a regexp, the regexp memories (or values returned by the m//
       operator in list context) are tainted.  This feature is useful when
       regexp operations on tainted data aren't meant to extract safe
       substrings, but to perform other transformations.
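
       A minimal sketch of the difference, assuming the core Scalar::Util
       module is available for its tainted() check.  The variable names are
       illustrative only.  Without the -T switch nothing is tainted, so both
       captures come back clean; under -T only the capture made under
       "use re 'taint'" is expected to stay tainted.

```perl
use strict;
use warnings;
use Scalar::Util qw(tainted);

my ($laundered, $kept);

{
    no re 'taint';                       # the default: captures are untainted
    ($laundered) = $^X =~ /^(.*)$/s;
}

{
    use re 'taint';                      # captures inherit taint from the target
    ($kept) = $^X =~ /^(.*)$/s;
}

# ${^TAINT} is non-zero only when perl was started with -T or -t.
printf "taint mode: %s\n",             ${^TAINT}          ? "on"  : "off";
printf "default capture tainted: %s\n", tainted($laundered) ? "yes" : "no";
printf "re 'taint' capture tainted: %s\n", tainted($kept)   ? "yes" : "no";
```

       Run it once as "perl script.pl" and once as "perl -T script.pl" to
       compare the two cases.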

   'eval' mode
       When "use re 'eval'" is in effect, a regexp is allowed to contain "(?{
       ... })" zero-width assertions and "(??{ ... })" postponed
       subexpressions that are derived from variable interpolation, rather
       than appearing literally within the regexp.  That is normally
       disallowed, since it is a potential security risk.  Note that this
       pragma is ignored when the regular expression is obtained from tainted
       data, i.e., evaluation is always disallowed with tainted regular
       expressions.  See "(?{ code })" in perlre and "(??{ code })" in perlre.

       For the purpose of this pragma, interpolation of precompiled regular
       expressions (i.e., the result of "qr//") is not considered variable
       interpolation.  Thus:

           /foo${pat}bar/

       is allowed if $pat is a precompiled regular expression, even if $pat
       contains "(?{ ... })" assertions or "(??{ ... })" subexpressions.
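
       The three cases can be sketched as follows.  This is only an
       illustration: the counter $main::hits and the variable names are made
       up for the example, and the script is assumed to run without -T.

```perl
use strict;
use warnings;

$main::hits = 0;
my $frag = '(?{ $main::hits++ })';   # a plain string holding a code block

# 1. Interpolating the string without "use re 'eval'" is rejected when the
#    pattern is compiled at run time, so the eval block dies.
my $ok_without = eval { "foobar" =~ /foo${frag}bar/; 1 } ? 1 : 0;

# 2. The same interpolation is accepted under "use re 'eval'".
my $ok_with;
{
    use re 'eval';
    $ok_with = eval { "foobar" =~ /foo${frag}bar/ } ? 1 : 0;
}

# 3. A precompiled qr// with a literal code block may be interpolated
#    without the pragma.
my $qr    = qr/(?{ $main::hits++ })/;
my $ok_qr = "foobar" =~ /foo${qr}bar/ ? 1 : 0;

print "string, no pragma:   $ok_without\n";
print "string, re 'eval':   $ok_with\n";
print "qr//, no pragma:     $ok_qr\n";
```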

   'strict' mode
       Note that this is an experimental feature which may be changed or
       removed in a future Perl release.

       When "use re 'strict'" is in effect, stricter checks are applied than
       otherwise when compiling regular expression patterns.  These may cause
       more warnings to be raised than otherwise, and more things to be fatal
       instead of just warnings.  The purpose of this is to find and report at
       compile time some things, which may be legal, but have a reasonable
       possibility of not being the programmer's actual intent.  This
       automatically turns on the "regexp" warnings category (if not already
       on) within its scope.

       An example of something that is caught under 'strict', but not
       otherwise, is the pattern

        qr/\xABC/

       The "\x" construct without curly braces should be followed by exactly
       two hex digits; this one is followed by three.  This currently
       evaluates as equivalent to

        qr/\x{AB}C/

       that is, the character whose code point value is 0xAB, followed by the
       letter "C".  But since "C" is a hex digit, there is a reasonable chance
       that the intent was

        qr/\x{ABC}/

       that is, the single character at 0xABC.  Under 'strict' it is an error
       to not follow "\x" with exactly two hex digits.  When not under
       'strict' a warning is generated if there is only one hex digit, and no
       warning is raised if there are more than two.
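
       A small sketch of both behaviours, assuming a Perl recent enough to
       provide 'strict'.  The strict pattern is compiled inside a string eval
       only so that the compile-time error can be caught; the variable names
       are illustrative.

```perl
use strict;
use warnings;

# Outside 'strict', \xABC is read as \x{AB} followed by a literal "C".
my $two_chars = "\x{AB}C" =~ /^\xABC$/ ? 1 : 0;   # two-character string
my $one_char  = "\x{ABC}" =~ /^\xABC$/ ? 1 : 0;   # single character U+0ABC

# Under 'strict' the same pattern is expected to fail to compile.  The
# experimental warning is silenced before the pragma is enabled.
my $strict_compiles = eval q{
    no warnings 'experimental::re_strict';
    use re 'strict';
    qr/\xABC/;
    1;
} ? 1 : 0;
my $strict_error = $@;

print "matches \\x{AB}C: $two_chars\n";
print "matches \\x{ABC}: $one_char\n";
print "compiles under 'strict': $strict_compiles\n";
print "error: $strict_error" unless $strict_compiles;
```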

       It is expected that what exactly 'strict' does will evolve over time as
       we gain experience with it.  This means that programs that compile
       under it in today's Perl may not compile, or may have more or fewer
       warnings, in future Perls.  There are no backwards compatibility
       promises with regard to it.  Also there are already proposals for an
       alternate syntax for enabling it.  For these reasons, using it will
       raise an "experimental::re_strict" class warning, unless that category
       is turned off.

       Note that if a pattern compiled within 'strict' is recompiled, say by
       interpolating into another pattern, outside of 'strict', it is not
       checked again for strictness.  This is because if it works under strict
       it must work under non-strict.

   '/flags' mode
       When "use re '/flags'" is specified, the given flags are automatically
       added to every regular expression till the end of the lexical scope.
       flags can be any combination of 'a', 'aa', 'd', 'i', 'l', 'm', 'n',
       'p', 's', 'u', 'x', and/or 'xx'.

       "no re '/flags'" will turn off the effect of "use re '/flags'" for the
       given flags.

       For example, if you want all your regular expressions to have /msxx on
       by default, simply put

           use re '/msxx';

       at the top of your code.

       The character set "/adul" flags cancel each other out. So, in this
       example,

           use re "/u";
           "ss" =~ /\xdf/;
           use re "/d";
           "ss" =~ /\xdf/;

       the second "use re" does an implicit "no re '/u'".

       Similarly,

           use re "/xx";   # Doubled-x
           ...
           use re "/x";    # Single x from here on
           ...

       Turning on one of the character set flags with "use re" takes
       precedence over the "locale" pragma and the 'unicode_strings'
       "feature", for regular expressions. Turning off one of these flags when
       it is active reverts to the behaviour specified by whatever other
       pragmata are in scope. For example:

           use feature "unicode_strings";
           no re "/u"; # does nothing
           use re "/l";
           no re "/l"; # reverts to unicode_strings behaviour
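
       The lexical scoping of default flags can be sketched like this.  The
       variable names are illustrative; the patterns are the ones used in the
       SYNOPSIS.

```perl
use strict;
use warnings;

my ($with_ix, $without_x, $outside);

{
    use re '/ix';
    # /i and /x are implied: the spaces in the pattern are ignored.
    $with_ix = "FOO" =~ / foo / ? 1 : 0;

    {
        no re '/x';
        # Only /i is implied now, so the spaces must match literally.
        $without_x = "FOO" =~ / foo / ? 1 : 0;
    }
}

# Outside the block no default flags apply, so the match is case-sensitive.
$outside = "FOO" =~ /foo/ ? 1 : 0;

print "with /ix: $with_ix\n";
print "after no re '/x': $without_x\n";
print "outside the scope: $outside\n";
```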

   'debug' mode
       When "use re 'debug'" is in effect, perl emits debugging messages when
       compiling and using regular expressions.  The output is the same as
       that obtained by running a "-DDEBUGGING"-enabled perl interpreter with
       the -Dr switch. It may be quite voluminous depending on the complexity
       of the match.  Using "debugcolor" instead of "debug" enables a form of
       output that can be used to get a colorful display on terminals that
       understand termcap color sequences.  Set $ENV{PERL_RE_TC} to a comma-
       separated list of "termcap" properties to use for highlighting strings
       on/off, pre-point part on/off.  See "Debugging Regular Expressions" in
       perldebug for additional info.

       NOTE that the exact format of the "debug" mode is NOT considered to be
       an officially supported API of Perl. It is intended for debugging only
       and may change as the core development team deems appropriate without
       notice or deprecation in any release of Perl, major or minor.  Any
       documentation of the output is purely advisory.

       As of 5.9.5 the directive "use re 'debug'" and its equivalents are
       lexically scoped, as the other directives are.  However they have both
       compile-time and run-time effects.

       See "Pragmatic Modules" in perlmodlib.

   'Debug' mode
       Similarly "use re 'Debug'" produces debugging output, the difference
       being that it allows the fine tuning of what debugging output will be
       emitted. Options are divided into three groups, those related to
       compilation, those related to execution and those related to special
       purposes.

       NOTE that the options provided under the "Debug" mode and the exact
       format of the output they create is NOT considered to be an officially
       supported API of Perl. It is intended for debugging only and may change
       as the core development team deems appropriate without notice or
       deprecation in any release of Perl, major or minor. Any documentation
       of the format or options available is advisory only and is subject to
       change without notice.

       The options are as follows:

       Compile related options
           COMPILE
               Turns on all non-extra compile related debug options.

           PARSE
               Turns on debug output related to the process of parsing the
               pattern.

           OPTIMISE
               Enables output related to the optimisation phase of
               compilation.

           TRIEC
               Detailed info about trie compilation.

           DUMP
               Dump the final program out after it is compiled and optimised.

           FLAGS
               Dump the flags associated with the program.

           TEST
               Print output intended for testing the internals of the compile
               process.

       Execute related options
           EXECUTE
               Turns on all non-extra execute related debug options.

           MATCH
               Turns on debugging of the main matching loop.

           TRIEE
               Extra debugging of how tries execute.

           INTUIT
               Enable debugging of start-point optimisations.

       Extra debugging options
           EXTRA
               Turns on all "extra" debugging options.

           BUFFERS
               Enable debugging the capture group storage during match.
               Warning, this can potentially produce extremely large output.

           TRIEM
               Enable enhanced TRIE debugging. Enhances both TRIEE and TRIEC.

           STATE
               Enable debugging of states in the engine.

           STACK
               Enable debugging of the recursion stack in the engine. Enabling
               or disabling this option automatically does the same for
               debugging states as well. The output from this can be quite
               large.

           GPOS
               Enable debugging of the \G assertion.

           OPTIMISEM
               Enable enhanced optimisation debugging and start-point
               optimisations.  Probably not useful except when debugging the
               regexp engine itself.

           DUMP_PRE_OPTIMIZE
               Enable the dumping of the compiled pattern before the
               optimization phase.

           WILDCARD
               When Perl encounters a wildcard subpattern (see "Wildcards in
               Property Values" in perlunicode), it suspends compilation of
               the main pattern, compiles the subpattern, and then matches
               that against all legal possibilities to determine the actual
               code points the subpattern matches.  After that it adds these
               to the main pattern, and continues its compilation.

               You may very well want to see how your subpattern gets
               compiled, but it is likely of less use to you to see how Perl
               matches that against all the legal possibilities, as that is
               under control of Perl, not you.  Therefore, the debugging
               information of the compilation portion is as specified by the
               other options, but the debugging output of the matching portion
               is normally suppressed.

               You can use the WILDCARD option to enable the debugging output
               of this subpattern matching.  Careful!  This can lead to
               voluminous output, and it may not be obvious what Perl is
               doing or why.  But it may be helpful to you to see why things
               aren't going the way you expect.

               Note that this option alone doesn't cause any debugging
               information to be output.  What it does is stop the normal
               suppression of execution-related debugging information during
               the matching portion of the compilation of wildcards.  You also
               have to specify which execution debugging information you want,
               such as by also including the EXECUTE option.

       Other useful flags
           These are useful shortcuts to save on the typing.

           ALL Enable all options at once except BUFFERS, WILDCARD, and
               DUMP_PRE_OPTIMIZE.  (To get every single option without
               exception, use both ALL and EXTRA, or starting in 5.30 on a
               "-DDEBUGGING"-enabled perl interpreter, use the -Drv command-
               line switches.)

           All Enable DUMP and all non-extra execute options. Equivalent to:

                 use re 'debug';

           MORE
           More
               Enable the options enabled by "All", plus STATE, TRIEC, and
               TRIEM.

       As of 5.9.5 the directive "use re 'debug'" and its equivalents are
       lexically scoped, as are the other directives.  However they have both
       compile-time and run-time effects.
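
       As a rough sketch of how a single option can be limited to one block,
       the following enables only DUMP for one pattern.  The debugging text
       goes to STDERR and its format is not stable, so the example checks
       only the match results; the variable names are illustrative.

```perl
use strict;
use warnings;

my ($debugged, $quiet);

{
    # Only patterns compiled inside this block have their final program
    # dumped to STDERR.
    use re qw(Debug DUMP);
    $debugged = "foobar" =~ /foo(?:bar|baz)/ ? 1 : 0;
}

# Outside the block no debugging output is produced for this pattern.
$quiet = "foobaz" =~ /foo(?:bar|baz)/ ? 1 : 0;

print "matched with DUMP enabled: $debugged\n";
print "matched without debugging: $quiet\n";
```

       Several options may be listed together, for example
       "use re qw(Debug PARSE DUMP)".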

   Exportable Functions
       As of perl 5.9.5 're' contains a number of utility functions that
       may be optionally exported into the caller's namespace. They are listed
       below.

       is_regexp($ref)
           Returns true if the argument is a compiled regular expression as
           returned by "qr//", false if it is not.

           This function will not be confused by overloading or blessing. In
           internals terms, this extracts the regexp pointer out of the
           PERL_MAGIC_qr structure so it cannot be fooled.

       regexp_pattern($ref)
           If the argument is a compiled regular expression as returned by
           "qr//", then this function returns the pattern.

           In list context it returns a two element list, the first element
           containing the pattern and the second containing the modifiers used
           when the pattern was compiled.

             my ($pat, $mods) = regexp_pattern($ref);

           In scalar context it returns the same as perl would when
           stringifying a raw "qr//" with the same pattern inside.  If the
           argument is not a compiled regular expression then this routine
           returns false but defined in scalar context, and the empty list in
           list context. Thus the following

               if (regexp_pattern($ref) eq '(?^i:foo)')

           will be warning free regardless of what $ref actually is.

           Like "is_regexp" this function will not be confused by overloading
           or blessing of the object.
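
       A short sketch of both functions.  The class name My::Pattern and the
       variable names are hypothetical and exist only for this example.

```perl
use strict;
use warnings;
use re qw(is_regexp regexp_pattern);

my $qr = qr/foo/i;

# List context: the pattern and its modifiers as separate strings.
my ($pat, $mods) = regexp_pattern($qr);

# Scalar context: the same string perl produces when stringifying $qr.
my $as_string = regexp_pattern($qr);

# Reblessing a qr// object into another class does not hide it.
my $blessed      = bless qr/bar/, 'My::Pattern';
my $still_regexp = is_regexp($blessed) ? 1 : 0;

# A plain string is not a compiled regexp.
my $plain_is_re = is_regexp('foo') ? 1 : 0;
my $not_re      = regexp_pattern('foo');   # false but defined

print "pattern: $pat, modifiers: $mods\n";
print "stringified: $as_string\n";
print "blessed qr// recognised: $still_regexp\n";
print "plain string recognised: $plain_is_re\n";
```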

       regname($name,$all)
           Returns the contents of a named buffer of the last successful
           match. If $all is true, then returns an array ref containing one
           entry per buffer, otherwise returns the first defined buffer.

       regnames($all)
           Returns a list of all of the named buffers defined in the last
           successful match. If $all is true, then it returns all names
           defined, if not it returns only names which were involved in the
           match.

       regnames_count()
           Returns the number of distinct names defined in the pattern used
           for the last successful match.

           Note: this result is always the actual number of distinct named
           buffers defined, it may not actually match that which is returned
           by "regnames()" and related routines when those routines have not
           been called with the $all parameter set.
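
       The difference between names that are defined and names that took part
       in the match can be sketched as follows.  The date pattern and the
       variable names are invented for the example; the optional "day" group
       is deliberately left unmatched.

```perl
use strict;
use warnings;
use re qw(regname regnames regnames_count);

# "day" is optional and does not take part in this match.
"2022-08" =~ /^(?<year>\d{4})-(?<month>\d\d)(?:-(?<day>\d\d))?$/
    or die "pattern did not match";

my $year     = regname('year');
my @involved = sort { $a cmp $b } regnames();    # names that matched
my @all      = sort { $a cmp $b } regnames(1);   # every name in the pattern
my $count    = regnames_count();                 # counts every name

print "year: $year\n";
print "involved in match: @involved\n";
print "all names: @all\n";
print "regnames_count: $count\n";
```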

       regmust($ref)
           If the argument is a compiled regular expression as returned by
           "qr//", then this function returns what the optimiser considers to
           be the longest anchored fixed string and longest floating fixed
           string in the pattern.

           A fixed string is defined as being a substring that must appear for
           the pattern to match. An anchored fixed string is a fixed string
           that must appear at a particular offset from the beginning of the
           match. A floating fixed string is defined as a fixed string that
           can appear at any point in a range of positions relative to the
           start of the match. For example,

               my $qr = qr/here .* there/x;
               my ($anchored, $floating) = regmust($qr);
               print "anchored:'$anchored'\nfloating:'$floating'\n";

           results in

               anchored:'here'
               floating:'there'

           Because the "here" is before the ".*" in the pattern, its position
           can be determined exactly. That's not true, however, for the
           "there"; it could appear at any point after where the anchored
           string appeared.  Perl uses both for its optimisations, preferring
           the longer, or, if they are equal, the floating.

           NOTE: This may not necessarily be the definitive longest anchored
           and floating string. This will be what the optimiser of the Perl
           that you are using thinks is the longest. If you believe that the
           result is wrong please report it via the perlbug utility.

       optimization($ref)
           If the argument is a compiled regular expression as returned by
           "qr//", then this function returns a hashref of the optimization
           information discovered at compile time, so we can write tests
           around it. If any other argument is given, returns "undef".

           The hash contents are expected to change from time to time as we
           develop new ways to optimize - no assumption of stability should be
           made, not even between minor versions of perl.

           For the current version, the hash will have the following contents:

           minlen
               An integer, the least number of characters in any string that
               can match.

           minlenret
               An integer, the least number of characters that can be in $&
               after a match. (Consider e.g. "/ns(?=\d)/".)

           gofs
               An integer, the number of characters before "pos()" to start
               match at.

           noscan
               A boolean, "TRUE" to indicate that any anchored/floating
               substrings found should not be used. (CHECKME: apparently this
               is set for an anchored pattern with no floating substring, but
               never used.)

           isall
               A boolean, "TRUE" to indicate that the optimizer information is
               all that the regular expression contains, and thus one does not
               need to enter the regexp runtime engine at all.

           anchor SBOL
               A boolean, "TRUE" if the pattern is anchored to start of
               string.

           anchor MBOL
               A boolean, "TRUE" if the pattern is anchored to any start of
               line within the string.

           anchor GPOS
               A boolean, "TRUE" if the pattern is anchored to the end of the
               previous match.

           skip
               A boolean, "TRUE" if the start class can match only the first
               of a run.

           implicit
               A boolean, "TRUE" if a "/.*/" has been turned implicitly into a
               "/^.*/".

           anchored/floating
               A byte string representing an anchored or floating substring
               respectively that any match must contain, or undef if no such
               substring was found, or if the substring would require utf8 to
               represent.

           anchored utf8/floating utf8
               A utf8 string representing an anchored or floating substring
               respectively that any match must contain, or undef if no such
               substring was found, or if the substring contains only 7-bit
               ASCII characters.

           anchored min offset/floating min offset
               An integer, the first offset in characters from a match
               location at which we should look for the corresponding
               substring.

           anchored max offset/floating max offset
               An integer, the last offset in characters from a match location
               at which we should look for the corresponding substring.

               Ignored for anchored, so may be 0 or same as min.

           anchored end shift/floating end shift
               FIXME: not sure what this is, something to do with lookbehind.
               regcomp.c says:

                   When the final pattern is compiled and the data is moved
                   from the scan_data_t structure into the regexp structure
                   the information about lookbehind is factored in, with the
                   information that would have been lost precalculated in
                   the end_shift field for the associated string.

           checking
               A constant string, one of "anchored", "floating" or "none" to
               indicate which substring (if any) should be checked for first.

           stclass
               A string representation of a character class ("start class")
               that must be the first character of any match.

               TODO: explain the representations.

SEE ALSO

       "Pragmatic Modules" in perlmodlib.

perl v5.36.0                      2022-08-30                           re(3pm)