perlre(1)

PERLRE(1)              Perl Programmers Reference Guide              PERLRE(1)

NAME

       perlre - Perl regular expressions

DESCRIPTION

       This page describes the syntax of regular expressions in Perl.

       If you haven't used regular expressions before, a quick-start
       introduction is available in perlrequick, and a longer tutorial
       introduction is available in perlretut.

       For reference on how regular expressions are used in matching
       operations, plus various examples of the same, see discussions of
       "m//", "s///", "qr//" and "??" in "Regexp Quote-Like Operators" in
       perlop.

   Modifiers
       Matching operations can have various modifiers.  Modifiers that relate
       to the interpretation of the regular expression inside are listed
       below.  Modifiers that alter the way a regular expression is used by
       Perl are detailed in "Regexp Quote-Like Operators" in perlop and "Gory
       details of parsing quoted constructs" in perlop.

       m   Treat string as multiple lines.  That is, change "^" and "$" from
           matching the start or end of the string to matching the start or
           end of any line anywhere within the string.

       s   Treat string as single line.  That is, change "." to match any
           character whatsoever, even a newline, which normally it would not
           match.

           Used together, as "/ms", they let the "." match any character
           whatsoever, while still allowing "^" and "$" to match,
           respectively, just after and just before newlines within the
           string.

       i   Do case-insensitive pattern matching.

           If locale matching rules are in effect, the case map is taken from
           the current locale for code points up to 255, and from Unicode
           rules for larger code points.  However, matches that would cross
           the Unicode rules/non-Unicode rules boundary (ords 255/256) will
           not succeed.  See perllocale.

           There are a number of Unicode characters that match multiple
           characters under "/i".  For example, "LATIN SMALL LIGATURE FI"
           should match the sequence "fi".  Perl is not currently able to do
           this when the multiple characters are in the pattern and are split
           between groupings, or when one or more are quantified.  Thus

            "\N{LATIN SMALL LIGATURE FI}" =~ /fi/i;          # Matches
            "\N{LATIN SMALL LIGATURE FI}" =~ /[fi][fi]/i;    # Doesn't match!
            "\N{LATIN SMALL LIGATURE FI}" =~ /fi*/i;         # Doesn't match!

            # The below doesn't match, and it isn't clear what $1 and $2 would
            # be even if it did!!
            "\N{LATIN SMALL LIGATURE FI}" =~ /(f)(i)/i;      # Doesn't match!

           Perl doesn't match multiple characters in an inverted bracketed
           character class, which otherwise could be highly confusing.  See
           "Negation" in perlrecharclass.

           Another bug involves character classes that match both a sequence
           of multiple characters, and an initial sub-string of that sequence.
           For example,

            /[s\xDF]/i

           should match both a single and a double "s", since "\xDF" (on ASCII
           platforms) matches "ss".  However, this bug ([perl #89774]
           <https://rt.perl.org/rt3/Ticket/Display.html?id=89774>) causes it
           to only match a single "s", even if the final larger match fails,
           and matching the double "ss" would have succeeded.

           Also, Perl matching doesn't fully conform to the current Unicode
           "/i" recommendations, which ask that the matching be made upon the
           NFD (Normalization Form Decomposed) of the text.  However, Unicode
           is in the process of reconsidering and revising its
           recommendations.

       x   Extend your pattern's legibility by permitting whitespace and
           comments.  Details in "/x".

       p   Preserve the string matched such that ${^PREMATCH}, ${^MATCH}, and
           ${^POSTMATCH} are available for use after matching.

       g and c
           Global matching, and keep the Current position after failed
           matching.  Unlike i, m, s and x, these two flags affect the way the
           regex is used rather than the regex itself.  See "Using regular
           expressions in Perl" in perlretut for further explanation of the g
           and c modifiers.

       a, d, l and u
           These modifiers, all new in 5.14, affect which character-set
           semantics (Unicode, etc.) are used, as described below in
           "Character set modifiers".

       Regular expression modifiers are usually written in documentation as
       e.g., "the "/x" modifier", even though the delimiter in question might
       not really be a slash.  The modifiers "/imsxadlup" may also be embedded
       within the regular expression itself using the "(?...)" construct; see
       "Extended Patterns" below.

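       As a rough illustration of the "/i", "/s", "/m" and "/p" modifiers
       listed above, here is a minimal sketch.  The sample strings and
       variable names are made up for the example, and "/p" assumes Perl 5.10
       or later.

```perl
use strict;
use warnings;

my $text = "first line\nSecond line\n";

# /i: case-insensitive matching.
my $ci = $text =~ /second/i ? 1 : 0;

# /s lets "." cross the newline; /m lets "^" match after it.
# Both are needed for this pattern to succeed.
my $ms_both = $text =~ /first.*^Second/ms ? 1 : 0;
my $s_only  = $text =~ /first.*^Second/s  ? 1 : 0;
my $m_only  = $text =~ /first.*^Second/m  ? 1 : 0;

# /p: make ${^PREMATCH}, ${^MATCH} and ${^POSTMATCH} available.
my ($pre, $hit, $post);
if ("hello world" =~ /o w/p) {
    ($pre, $hit, $post) = (${^PREMATCH}, ${^MATCH}, ${^POSTMATCH});
}
```

       Here only the "/ms" variant matches; "/s" alone leaves "^" anchored to
       the start of the string, and "/m" alone keeps "." from crossing the
       newline.
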
       /x

       "/x" tells the regular expression parser to ignore most whitespace that
       is neither backslashed nor within a character class.  You can use this
       to break up your regular expression into (slightly) more readable
       parts.  The "#" character is also treated as a metacharacter
       introducing a comment, just as in ordinary Perl code.  This also means
       that if you want real whitespace or "#" characters in the pattern
       (outside a character class, where they are unaffected by "/x"), then
       you'll either have to escape them (using backslashes or "\Q...\E") or
       encode them using octal, hex, or "\N{}" escapes.  Taken together, these
       features go a long way towards making Perl's regular expressions more
       readable.  Note that you have to be careful not to include the pattern
       delimiter in the comment--perl has no way of knowing you did not intend
       to close the pattern early.  See the C-comment deletion code in perlop.
       Also note that anything inside a "\Q...\E" stays unaffected by "/x".
       And note that "/x" doesn't affect space interpretation within a single
       multi-character construct.  For example in "\x{...}", regardless of the
       "/x" modifier, there can be no spaces.  Same for a quantifier such as
       "{3}" or "{5,}".  Similarly, "(?:...)" can't have a space between the
       "?" and ":", but can between the "(" and "?".  Within any delimiters
       for such a construct, allowed spaces are not affected by "/x", and
       depend on the construct.  For example, "\x{...}" can't have spaces
       because hexadecimal numbers don't have spaces in them.  But, Unicode
       properties can have spaces, so in "\p{...}" there can be spaces that
       follow the Unicode rules, for which see "Properties accessible through
       \p{} and \P{}" in perluniprops.

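       The following sketch shows one way "/x" might be used.  The date
       pattern and all variable names are hypothetical, chosen only to
       illustrate whitespace and comment handling.

```perl
use strict;
use warnings;

# Hypothetical pattern for a date such as "2012-05-20".
# The comments must not contain the pattern delimiter.
my $date_re = qr{
    \A (\d{4})     # year, four digits
    -  (\d{2})     # month
    -  (\d{2}) \z  # day
}x;

my ($year, $month, $day) = "2012-05-20" =~ $date_re;

# Unescaped whitespace is ignored under /x ...
my $ignored = "foobar"  =~ /foo bar/x ? 1 : 0;
my $missing = "foo bar" =~ /foo bar/x ? 1 : 0;

# ... so a real space or "#" must be escaped or put in a character class.
my $escaped = "foo bar" =~ /foo\ bar/x  ? 1 : 0;
my $classed = "foo bar" =~ /foo[ ]bar/x ? 1 : 0;
my $hash    = "a#b"     =~ /a \# b/x    ? 1 : 0;
```

       Note that "/foo bar/x" matches "foobar" but not "foo bar", because the
       space in the pattern is discarded before matching.
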
       Character set modifiers

       "/d", "/u", "/a", and "/l", available starting in 5.14, are called the
       character set modifiers; they affect the character set semantics used
       for the regular expression.

       The "/d", "/u", and "/l" modifiers are not likely to be of much use to
       you, and so you need not worry about them very much.  They exist for
       Perl's internal use, so that complex regular expression data structures
       can be automatically serialized and later exactly reconstituted,
       including all their nuances.  But, since Perl can't keep a secret, and
       there may be rare instances where they are useful, they are documented
       here.

       The "/a" modifier, on the other hand, may be useful.  Its purpose is to
       allow code that is to work mostly on ASCII data to not have to concern
       itself with Unicode.

       Briefly, "/l" sets the character set to that of whatever Locale is in
       effect at the time of the execution of the pattern match.

       "/u" sets the character set to Unicode.

       "/a" also sets the character set to Unicode, BUT adds several
       restrictions for ASCII-safe matching.

       "/d" is the old, problematic, pre-5.14 Default character set behavior.
       Its only use is to force that old behavior.

       At any given time, exactly one of these modifiers is in effect.  Their
       existence allows Perl to keep the originally compiled behavior of a
       regular expression, regardless of what rules are in effect when it is
       actually executed.  And if it is interpolated into a larger regex, the
       original's rules continue to apply to it, and only it.

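       To illustrate how an interpolated regex keeps its own character set
       modifier, here is a small sketch, assuming Perl 5.14 or later.  The
       variable names and the choice of ARABIC-INDIC DIGIT THREE (U+0663) as
       a sample non-ASCII digit are illustrative only.

```perl
use strict;
use warnings;

my $ascii_digit = qr/\d/a;                 # compiled under /a
my $mixed       = qr/^$ascii_digit\d$/u;   # outer pattern compiled under /u

my $arabic_three = "\x{0663}";             # ARABIC-INDIC DIGIT THREE

# The first \d keeps its /a rules; the second follows the outer /u rules.
my $ascii_then_arabic = "3$arabic_three"   =~ $mixed ? 1 : 0;
my $arabic_then_ascii = "${arabic_three}3" =~ $mixed ? 1 : 0;

# The stringified form records the modifier the regex was compiled with.
my $as_string = "$ascii_digit";
```

       Only the first string matches, since the interpolated "\d" still
       refuses non-ASCII digits even though the enclosing pattern uses "/u".
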
       The "/l" and "/u" modifiers are automatically selected for regular
       expressions compiled within the scope of various pragmas, and we
       recommend that in general, you use those pragmas instead of specifying
       these modifiers explicitly.  For one thing, the modifiers affect only
       pattern matching, and do not extend to even any replacement done,
       whereas using the pragmas gives consistent results for all appropriate
       operations within their scopes.  For example,

        s/foo/\Ubar/il

       will match "foo" using the locale's rules for case-insensitive
       matching, but the "/l" does not affect how the "\U" operates.  Most
       likely you want both of them to use locale rules.  To do this, instead
       compile the regular expression within the scope of "use locale".  This
       both implicitly adds the "/l" and applies locale rules to the "\U".
       The lesson is to "use locale" and not "/l" explicitly.

       Similarly, it would be better to use "use feature 'unicode_strings'"
       instead of

        s/foo/\Lbar/iu

       to get Unicode rules, as the "\L" in the former (but not necessarily
       the latter) would also use Unicode rules.

       More detail on each of the modifiers follows.  Most likely you don't
       need to know this detail for "/l", "/u", and "/d", and can skip ahead
       to /a.

       /l

       means to use the current locale's rules (see perllocale) when pattern
       matching.  For example, "\w" will match the "word" characters of that
       locale, and "/i" case-insensitive matching will match according to the
       locale's case folding rules.  The locale used will be the one in effect
       at the time of execution of the pattern match.  This may not be the
       same as the compilation-time locale, and can differ from one match to
       another if there is an intervening call of the setlocale() function.

       Perl only supports single-byte locales.  This means that code points
       above 255 are treated as Unicode no matter what locale is in effect.
       Under Unicode rules, there are a few case-insensitive matches that
       cross the 255/256 boundary.  These are disallowed under "/l".  For
       example, 0xFF (on ASCII platforms) does not caselessly match the
       character at 0x178, "LATIN CAPITAL LETTER Y WITH DIAERESIS", because
       0xFF may not be "LATIN SMALL LETTER Y WITH DIAERESIS" in the current
       locale, and Perl has no way of knowing if that character even exists in
       the locale, much less what code point it is.

       This modifier may be specified to be the default by "use locale", but
       see "Which character set modifier is in effect?".

       /u

       means to use Unicode rules when pattern matching.  On ASCII platforms,
       this means that the code points between 128 and 255 take on their
       Latin-1 (ISO-8859-1) meanings (which are the same as Unicode's).
       (Otherwise Perl considers their meanings to be undefined.)  Thus, under
       this modifier, the ASCII platform effectively becomes a Unicode
       platform; and hence, for example, "\w" will match any of the more than
       100_000 word characters in Unicode.

       Unlike most locales, which are specific to a language and country pair,
       Unicode classifies all the characters that are letters somewhere in the
       world as "\w".  For example, your locale might not think that "LATIN
       SMALL LETTER ETH" is a letter (unless you happen to speak Icelandic),
       but Unicode does.  Similarly, all the characters that are decimal
       digits somewhere in the world will match "\d"; this is hundreds, not
       10, possible matches.  And some of those digits look like some of the
       10 ASCII digits, but mean a different number, so a human could easily
       think a number is a different quantity than it really is.  For example,
       "BENGALI DIGIT FOUR" (U+09EA) looks very much like an "ASCII DIGIT
       EIGHT" (U+0038).  And "\d+" may match strings of digits that are a
       mixture from different writing systems, creating a security issue.
       "num()" in Unicode::UCD can be used to sort this out.  Or the "/a"
       modifier can be used to force "\d" to match just the ASCII 0 through 9.

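       The difference between "\d" under "/u" and under "/a" can be sketched
       as follows, assuming Perl 5.14 or later.  The variable names are
       illustrative, and the second test string is simply one example of
       digits mixed from two writing systems.

```perl
use strict;
use warnings;
use Unicode::UCD 'num';

my $bengali_four = "\x{09EA}";   # BENGALI DIGIT FOUR

# Under /u, \d accepts any Unicode decimal digit; under /a, only 0-9.
my $digit_u = $bengali_four =~ /^\d$/u ? 1 : 0;
my $digit_a = $bengali_four =~ /^\d$/a ? 1 : 0;

# A string mixing ASCII and Arabic-Indic digits still satisfies \d+ under /u.
my $mixed_u = "1\x{0663}" =~ /^\d+$/u ? 1 : 0;
my $mixed_a = "1\x{0663}" =~ /^\d+$/a ? 1 : 0;

# num() reports the numeric value a digit character actually stands for.
my $value = num($bengali_four);
```

       Here "$value" is 4, even though the glyph resembles an ASCII "8".
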
       Also, under this modifier, case-insensitive matching works on the full
       set of Unicode characters.  The "KELVIN SIGN", for example, matches the
       letters "k" and "K"; and "LATIN SMALL LIGATURE FF" matches the sequence
       "ff", which, if you're not prepared, might make it look like a
       hexadecimal constant, presenting another potential security issue.  See
       <http://unicode.org/reports/tr36> for a detailed discussion of Unicode
       security issues.

       On the EBCDIC platforms that Perl handles, the native character set is
       equivalent to Latin-1.  Thus this modifier changes behavior only when
       the "/i" modifier is also specified, and it turns out it affects only
       two characters, giving them full Unicode semantics: the "MICRO SIGN"
       will match the Greek capital and small letters "MU", otherwise not; and
       the "LATIN CAPITAL LETTER SHARP S" will match any of "SS", "Ss", "sS",
       and "ss", otherwise not.

       This modifier may be specified to be the default by "use feature
       'unicode_strings'", "use locale ':not_characters'", or "use 5.012" (or
       higher), but see "Which character set modifier is in effect?".

       /d

       This modifier means to use the "Default" native rules of the platform
       except when there is cause to use Unicode rules instead, as follows:

       1.  the target string is encoded in UTF-8; or

       2.  the pattern is encoded in UTF-8; or

       3.  the pattern explicitly mentions a code point that is above 255 (say
           by "\x{100}"); or

       4.  the pattern uses a Unicode name ("\N{...}");  or

       5.  the pattern uses a Unicode property ("\p{...}")

       Another mnemonic for this modifier is "Depends", as the rules actually
       used depend on various things, and as a result you can get unexpected
       results.  See "The "Unicode Bug"" in perlunicode.  The Unicode Bug has
       become rather infamous, leading to yet another (printable) name for
       this modifier, "Dodgy".

       On ASCII platforms, the native rules are ASCII, and on EBCDIC platforms
       (at least the ones that Perl handles), they are Latin-1.

       Here are some examples of how that works on an ASCII platform:

        $str =  "\xDF";      # $str is not in UTF-8 format.
        $str =~ /^\w/;       # No match, as $str isn't in UTF-8 format.
        $str .= "\x{0e0b}";  # Now $str is in UTF-8 format.
        $str =~ /^\w/;       # Match! $str is now in UTF-8 format.
        chop $str;
        $str =~ /^\w/;       # Still a match! $str remains in UTF-8 format.

       This modifier is automatically selected by default when none of the
       others are, so yet another name for it is "Default".

       Because of the unexpected behaviors associated with this modifier, you
       probably should only use it to maintain weird backward compatibilities.

       /a (and /aa)

       This modifier stands for ASCII-restrict (or ASCII-safe).  This
       modifier, unlike the others, may be doubled-up to increase its effect.

       When it appears singly, it causes the sequences "\d", "\s", "\w", and
       the Posix character classes to match only in the ASCII range.  They
       thus revert to their pre-5.6, pre-Unicode meanings.  Under "/a",  "\d"
       always means precisely the digits "0" to "9"; "\s" means the five
       characters "[ \f\n\r\t]"; "\w" means the 63 characters "[A-Za-z0-9_]";
       and likewise, all the Posix classes such as "[[:print:]]" match only
       the appropriate ASCII-range characters.

       This modifier is useful for people who only incidentally use Unicode,
       and who do not wish to be burdened with its complexities and security
       concerns.

       With "/a", one can write "\d" with confidence that it will only match
       ASCII characters, and should the need arise to match beyond ASCII, you
       can instead use "\p{Digit}" (or "\p{Word}" for "\w").  There are
       similar "\p{...}" constructs that can match beyond ASCII both white
       space (see "Whitespace" in perlrecharclass), and Posix classes (see
       "POSIX Character Classes" in perlrecharclass).  Thus, this modifier
       doesn't mean you can't use Unicode, it means that to get Unicode
       matching you must explicitly use a construct ("\p{}", "\P{}") that
       signals Unicode.

       As you would expect, this modifier causes, for example, "\D" to mean
       the same thing as "[^0-9]"; in fact, all non-ASCII characters match
       "\D", "\S", and "\W".  "\b" still means to match at the boundary
       between "\w" and "\W", using the "/a" definitions of them (similarly
       for "\B").

       Otherwise, "/a" behaves like the "/u" modifier, in that case-
       insensitive matching uses Unicode semantics; for example, "k" will
       match the Unicode "\N{KELVIN SIGN}" under "/i" matching, and code
       points in the Latin1 range above ASCII will have Unicode rules when it
       comes to case-insensitive matching.

       To forbid ASCII/non-ASCII matches (like "k" with "\N{KELVIN SIGN}"),
       specify the "a" twice, for example "/aai" or "/aia".  (The first
       occurrence of "a" restricts the "\d", etc., and the second occurrence
       adds the "/i" restrictions.)  But, note that code points outside the
       ASCII range will use Unicode rules for "/i" matching, so the modifier
       doesn't really restrict things to just ASCII; it just forbids the
       intermixing of ASCII and non-ASCII.

       To summarize, this modifier provides protection for applications that
       don't wish to be exposed to all of Unicode.  Specifying it twice gives
       added protection.

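       A brief sketch of the "/a" and "/aa" behavior described above,
       assuming Perl 5.14 or later on an ASCII platform.  The variable names
       are illustrative; "\x{212A}" is KELVIN SIGN, and "\xE9" and "\xC9" are
       the small and capital LATIN LETTER E WITH ACUTE.

```perl
use strict;
use warnings;

my $kelvin = "\x{212A}";   # KELVIN SIGN

# A single "a" still allows "k" to match KELVIN SIGN caselessly;
# doubling it forbids such ASCII/non-ASCII matches.
my $kelvin_ai  = $kelvin =~ /k/ai  ? 1 : 0;
my $kelvin_aai = $kelvin =~ /k/aai ? 1 : 0;

# Two non-ASCII characters still fold together under /aa.
my $latin1_aai = "\xE9" =~ /\xC9/aai ? 1 : 0;

# \w is restricted to [A-Za-z0-9_] under /a, but \p{Word} is not.
my $word_a    = "\xE9" =~ /\w/a        ? 1 : 0;
my $word_prop = "\xE9" =~ /\p{Word}/a  ? 1 : 0;
```

       In this sketch "$kelvin_ai", "$latin1_aai" and "$word_prop" come out
       true, while "$kelvin_aai" and "$word_a" come out false.
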
       This modifier may be specified to be the default by "use re '/a'" or
       "use re '/aa'".  If you do so, you may actually have occasion to use
       the "/u" modifier explicitly if there are a few regular expressions
       where you do want full Unicode rules (but even here, it's best if
       everything were under feature "unicode_strings", along with the "use re
       '/aa'").  Also see "Which character set modifier is in effect?".

       Which character set modifier is in effect?

       Which of these modifiers is in effect at any given point in a regular
       expression depends on a fairly complex set of interactions.  These have
       been designed so that in general you don't have to worry about it, but
       this section gives the gory details.  As explained below in "Extended
       Patterns" it is possible to explicitly specify modifiers that apply
       only to portions of a regular expression.  The innermost always has
       priority over any outer ones, and one applying to the whole expression
       has priority over any of the default settings that are described in the
       remainder of this section.

       The "use re '/foo'" pragma can be used to set default modifiers
       (including these) for regular expressions compiled within its scope.
       This pragma has precedence over the other pragmas listed below that
       also change the defaults.

       Otherwise, "use locale" sets the default modifier to "/l"; and "use
       feature 'unicode_strings'", or "use 5.012" (or higher) set the default
       to "/u" when not in the same scope as either "use locale" or "use
       bytes".  ("use locale ':not_characters'" also sets the default to "/u",
       overriding any plain "use locale".)  Unlike the mechanisms mentioned
       above, these affect operations besides regular expression pattern
       matching, and so give more consistent results with other operators,
       including using "\U", "\l", etc. in substitution replacements.

       If none of the above apply, for backwards compatibility reasons, the
       "/d" modifier is the one in effect by default.  As this can lead to
       unexpected results, it is best to specify which other rule set should
       be used.

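       The precedence rules above might be exercised as in the following
       sketch, which assumes Perl 5.14 or later and no other pragmas in
       scope.  The variable names and the sample digit (ARABIC-INDIC DIGIT
       THREE, U+0663) are illustrative.

```perl
use strict;
use warnings;

my $digit = "\x{0663}";   # ARABIC-INDIC DIGIT THREE
my ($by_default, $scoped, $explicit);

# No pragma: /d is in effect, and the UTF-8 target string triggers
# Unicode rules.
$by_default = $digit =~ /\d/ ? 1 : 0;

{
    use re '/a';   # default modifier for regexes compiled in this block

    $scoped   = $digit =~ /\d/  ? 1 : 0;   # /a applies by default
    $explicit = $digit =~ /\d/u ? 1 : 0;   # explicit /u overrides it
}
```

       Inside the block, only the pattern with the explicit "/u" accepts the
       non-ASCII digit.
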
       Character set modifier behavior prior to Perl 5.14

       Prior to 5.14, there were no explicit modifiers, but "/l" was implied
       for regexes compiled within the scope of "use locale", and "/d" was
       implied otherwise.  However, interpolating a regex into a larger regex
       would ignore the original compilation in favor of whatever was in
       effect at the time of the second compilation.  There were a number of
       inconsistencies (bugs) with the "/d" modifier, where Unicode rules
       would be used when inappropriate, and vice versa.  "\p{}" did not imply
       Unicode rules, and neither did all occurrences of "\N{}", until 5.12.

   Regular Expressions
       Metacharacters

       The patterns used in Perl pattern matching evolved from those supplied
       in the Version 8 regex routines.  (The routines are derived (distantly)
       from Henry Spencer's freely redistributable reimplementation of the V8
       routines.)  See "Version 8 Regular Expressions" for details.

       In particular the following metacharacters have their standard
       egrep-ish meanings:

           \        Quote the next metacharacter
           ^        Match the beginning of the line
           .        Match any character (except newline)
           $        Match the end of the line (or before newline at the end)
           |        Alternation
           ()       Grouping
           []       Bracketed Character class

       By default, the "^" character is guaranteed to match only the beginning
       of the string, the "$" character only the end (or before the newline at
       the end), and Perl does certain optimizations with the assumption that
       the string contains only one line.  Embedded newlines will not be
       matched by "^" or "$".  You may, however, wish to treat a string as a
       multi-line buffer, such that the "^" will match after any newline
       within the string (except if the newline is the last character in the
       string), and "$" will match before any newline.  At the cost of a
       little more overhead, you can do this by using the /m modifier on the
       pattern match operator.  (Older programs did this by setting $*, but
       this option was removed in perl 5.9.)

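       The default and "/m" behavior of "^" and "$" can be sketched like
       this.  The buffer contents and variable names are made up for the
       example.

```perl
use strict;
use warnings;

my $buf = "alpha\nbeta\n";

# By default "^" matches only at the start of the string.
my $caret_default = $buf =~ /^beta/  ? 1 : 0;
my $caret_m       = $buf =~ /^beta/m ? 1 : 0;

# By default "$" matches at the end, or before a final newline,
# but not before an embedded newline.
my $dollar_final    = $buf =~ /beta$/   ? 1 : 0;
my $dollar_embedded = $buf =~ /alpha$/  ? 1 : 0;
my $dollar_m        = $buf =~ /alpha$/m ? 1 : 0;

# With /m, "^" matches after each newline except a trailing one.
my @line_starts = $buf =~ /^(\w+)/mg;
```

       Here "@line_starts" holds "alpha" and "beta"; the position after the
       final newline does not count as a line start.
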
       To simplify multi-line substitutions, the "." character never matches a
       newline unless you use the "/s" modifier, which in effect tells Perl to
       pretend the string is a single line--even if it isn't.

       Quantifiers

       The following standard quantifiers are recognized:

           *           Match 0 or more times
           +           Match 1 or more times
           ?           Match 1 or 0 times
           {n}         Match exactly n times
           {n,}        Match at least n times
           {n,m}       Match at least n but not more than m times

       (If a curly bracket occurs in any other context and does not form part
       of a backslashed sequence like "\x{...}", it is treated as a regular
       character.  In particular, the lower quantifier bound is not optional.
       However, in Perl v5.18, it is planned to issue a deprecation warning
       for all such occurrences, and in Perl v5.20 to require literal uses of
       a curly bracket to be escaped, say by preceding them with a backslash
       or enclosing them within square brackets, ("\{" or "[{]").  This change
       will allow for future syntax extensions (like making the lower bound of
       a quantifier optional), and better error checking of quantifiers.  Now,
       a typo in a quantifier silently causes it to be treated as the literal
       characters.  For example,

           /o{4,3}/

       looks like a quantifier that matches 0 times, since 4 is greater than
       3, but it really means to match the sequence of six characters
       "o { 4 , 3 }".)

       The "*" quantifier is equivalent to "{0,}", the "+" quantifier to
       "{1,}", and the "?" quantifier to "{0,1}".  n and m are limited to non-
       negative integral values less than a preset limit defined when perl is
       built.  This is usually 32766 on the most common platforms.  The actual
       limit can be seen in the error message generated by code such as this:

           $_ **= $_ , / {$_} / for 2 .. 42;

       By default, a quantified subpattern is "greedy", that is, it will match
       as many times as possible (given a particular starting location) while
       still allowing the rest of the pattern to match.  If you want it to
       match the minimum number of times possible, follow the quantifier with
       a "?".  Note that the meanings don't change, just the "greediness":

           *?        Match 0 or more times, not greedily
           +?        Match 1 or more times, not greedily
           ??        Match 0 or 1 time, not greedily
           {n}?      Match exactly n times, not greedily (redundant)
           {n,}?     Match at least n times, not greedily
           {n,m}?    Match at least n but not more than m times, not greedily

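       The difference between greedy and non-greedy quantifiers can be seen
       in a small sketch such as the following.  The sample text and variable
       names are invented for illustration.

```perl
use strict;
use warnings;

my $text = '<b>bold</b> and <i>italic</i>';

# Greedy: ".+" runs to the last ">" that still lets the pattern match.
my ($greedy) = $text =~ /<(.+)>/;

# Non-greedy: ".+?" stops at the first ">" it can.
my ($lazy) = $text =~ /<(.+?)>/;

# The same applies to counted quantifiers.
my ($most)  = 'aaaa' =~ /(a{2,3})/;
my ($least) = 'aaaa' =~ /(a{2,3}?)/;
```

       Here "$greedy" captures everything between the first "<" and the last
       ">", while "$lazy" captures only "b"; likewise "$most" is "aaa" and
       "$least" is "aa".
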
       By default, when a quantified subpattern does not allow the rest of the
       overall pattern to match, Perl will backtrack.  However, this behaviour
       is sometimes undesirable.  Thus Perl provides the "possessive"
       quantifier form as well.

        *+     Match 0 or more times and give nothing back
        ++     Match 1 or more times and give nothing back
        ?+     Match 0 or 1 time and give nothing back
        {n}+   Match exactly n times and give nothing back (redundant)
        {n,}+  Match at least n times and give nothing back
        {n,m}+ Match at least n but not more than m times and give nothing back

       For instance,

          'aaaa' =~ /a++a/

       will never match, as the "a++" will gobble up all the "a"'s in the
       string and won't leave any for the remaining part of the pattern.  This
       feature can be extremely useful to give perl hints about where it
       shouldn't backtrack.  For instance, the typical "match a double-quoted
       string" problem can be most efficiently performed when written as:

          /"(?:[^"\\]++|\\.)*+"/

       as we know that if the final quote does not match, backtracking will
       not help.  See the independent subexpression "(?>pattern)" for more
       details; possessive quantifiers are just syntactic sugar for that
       construct.  For instance the above example could also be written as
       follows:

          /"(?>(?:(?>[^"\\]+)|\\.)*)"/

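       The possessive forms above can be tried out with a sketch like the
       following, assuming Perl 5.10 or later.  The input string and variable
       names are hypothetical.

```perl
use strict;
use warnings;

# "a++" keeps every "a" it matched, so the trailing "a" can never match.
my $possessive  = 'aaaa' =~ /a++a/ ? 1 : 0;
my $backtracked = 'aaaa' =~ /a+a/  ? 1 : 0;

# The double-quoted string pattern from the text, in both spellings.
my $quoted_poss   = qr/"(?:[^"\\]++|\\.)*+"/;
my $quoted_atomic = qr/"(?>(?:(?>[^"\\]+)|\\.)*)"/;

# Single-quoted Perl string, so the backslashes are kept literally.
my $input = 'say "he said \"hi\"" now';

my ($from_poss)   = $input =~ /($quoted_poss)/;
my ($from_atomic) = $input =~ /($quoted_atomic)/;
```

       Both spellings extract the same quoted substring, including its
       escaped inner quotes.
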
       Escape sequences

       Because patterns are processed as double-quoted strings, the following
       also work:

        \t          tab                   (HT, TAB)
        \n          newline               (LF, NL)
        \r          return                (CR)
        \f          form feed             (FF)
        \a          alarm (bell)          (BEL)
        \e          escape (think troff)  (ESC)
        \cK         control char          (example: VT)
        \x{}, \x00  character whose ordinal is the given hexadecimal number
        \N{name}    named Unicode character or character sequence
        \N{U+263D}  Unicode character     (example: FIRST QUARTER MOON)
        \o{}, \000  character whose ordinal is the given octal number
        \l          lowercase next char (think vi)
        \u          uppercase next char (think vi)
        \L          lowercase till \E (think vi)
        \U          uppercase till \E (think vi)
        \Q          quote (disable) pattern metacharacters till \E
        \E          end either case modification or quoted section, think vi

       Details are in "Quote and Quote-like Operators" in perlop.

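       A few of the escapes listed above in action, as a minimal sketch.  The
       strings and variable names are illustrative only.

```perl
use strict;
use warnings;

my $literal = 'a.b*c';

# Interpolated as-is, "." and "*" act as metacharacters.
my $loose = 'aXbbc' =~ /^$literal$/ ? 1 : 0;

# \Q...\E quotes them, so only the literal text matches.
my $exact_hit  = 'a.b*c' =~ /^\Q$literal\E$/ ? 1 : 0;
my $exact_miss = 'aXbbc' =~ /^\Q$literal\E$/ ? 1 : 0;

# Hex and control-style escapes work as in double-quoted strings.
my $hex = 'A'     =~ /^\x41$/ ? 1 : 0;
my $tab = "a\tb"  =~ /a\tb/   ? 1 : 0;

# \U...\E uppercases the enclosed part of the pattern.
my $upper = 'HELLO' =~ /^\Uhello\E$/ ? 1 : 0;
```

       Using "\Q...\E" around interpolated variables is a common way to match
       user-supplied text literally.
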
       Character Classes and other Special Escapes

       In addition, Perl defines the following:

        Sequence   Note    Description
         [...]     [1]  Match a character according to the rules of the
                          bracketed character class defined by the "...".
                          Example: [a-z] matches "a" or "b" or "c" ... or "z"
         [[:...:]] [2]  Match a character according to the rules of the POSIX
                          character class "..." within the outer bracketed
                          character class.  Example: [[:upper:]] matches any
                          uppercase character.
         \w        [3]  Match a "word" character (alphanumeric plus "_", plus
                          other connector punctuation chars plus Unicode
                          marks)
         \W        [3]  Match a non-"word" character
         \s        [3]  Match a whitespace character
         \S        [3]  Match a non-whitespace character
         \d        [3]  Match a decimal digit character
         \D        [3]  Match a non-digit character
         \pP       [3]  Match P, named property.  Use \p{Prop} for longer names
         \PP       [3]  Match non-P
         \X        [4]  Match Unicode "eXtended grapheme cluster"
         \C             Match a single C-language char (octet) even if that is
                          part of a larger UTF-8 character.  Thus it breaks up
                          characters into their UTF-8 bytes, so you may end up
                          with malformed pieces of UTF-8.  Unsupported in
                          lookbehind.
         \1        [5]  Backreference to a specific capture group or buffer.
                          '1' may actually be any positive integer.
         \g1       [5]  Backreference to a specific or previous group,
         \g{-1}    [5]  The number may be negative indicating a relative
                          previous group and may optionally be wrapped in
                          curly brackets for safer parsing.
         \g{name}  [5]  Named backreference
         \k<name>  [5]  Named backreference
         \K        [6]  Keep the stuff left of the \K, don't include it in $&
         \N        [7]  Any character but \n (experimental).  Not affected by
                          /s modifier
         \v        [3]  Vertical whitespace
         \V        [3]  Not vertical whitespace
         \h        [3]  Horizontal whitespace
         \H        [3]  Not horizontal whitespace
         \R        [4]  Linebreak

595       [1] See "Bracketed Character Classes" in perlrecharclass for details.
596
597       [2] See "POSIX Character Classes" in perlrecharclass for details.
598
599       [3] See "Backslash sequences" in perlrecharclass for details.
600
601       [4] See "Misc" in perlrebackslash for details.
602
603       [5] See "Capture groups" below for details.
604
605       [6] See "Extended Patterns" below for details.
606
607       [7] Note that "\N" has two meanings.  When of the form "\N{NAME}", it
608           matches the character or character sequence whose name is "NAME";
609           and similarly when of the form "\N{U+hex}", it matches the
610           character whose Unicode code point is hex.  Otherwise it matches
611           any character but "\n".
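
       As a brief illustration of a few table entries, the following sketch
       contrasts "." with "\N" under "/s", and splits on "\h" and "\R".  The
       variable names are chosen only for this example.

```perl
use strict;
use warnings;

my $text = "one\ntwo";

# Under /s, "." also matches a newline, so ".+" runs across both lines.
my ($dot) = $text =~ /(.+)/s;

# "\N" is not affected by /s, so it stops before the newline.
my ($not_nl) = $text =~ /(\N+)/s;

# "\h" matches horizontal whitespace; "\R" matches a generic linebreak,
# treating "\r\n" as a single unit.
my @fields = split /\h+/, "a \t b";
my @lines  = split /\R/,  "x\r\ny\nz";

print "dot matched ", length($dot), " characters\n";
print "not_nl: $not_nl\n";
print scalar(@fields), " fields, ", scalar(@lines), " lines\n";
```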
612
613       Assertions
614
615       Perl defines the following zero-width assertions:
616
617           \b  Match a word boundary
618           \B  Match except at a word boundary
619           \A  Match only at beginning of string
620           \Z  Match only at end of string, or before newline at the end
621           \z  Match only at end of string
622           \G  Match only at pos() (e.g. at the end-of-match position
623               of prior m//g)
624
625       A word boundary ("\b") is a spot between two characters that has a "\w"
626       on one side of it and a "\W" on the other side of it (in either order),
627       counting the imaginary characters off the beginning and end of the
628       string as matching a "\W".  (Within character classes "\b" represents
629       backspace rather than a word boundary, just as it normally does in any
630       double-quoted string.)  The "\A" and "\Z" are just like "^" and "$",
631       except that they won't match multiple times when the "/m" modifier is
632       used, while "^" and "$" will match at every internal line boundary.  To
633       match the actual end of the string and not ignore an optional trailing
634       newline, use "\z".
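
       The differences between these anchors can be seen in a small sketch
       (the sample string and variable names are illustrative only):

```perl
use strict;
use warnings;

my $text = "first\nsecond\n";

# With /m, "^" matches at the start of every line ...
my @caret = $text =~ /^(\w+)/mg;

# ... but "\A" still matches only at the very start of the string.
my @start = $text =~ /\A(\w+)/mg;

# "\Z" tolerates one trailing newline; "\z" insists on the true end.
my $Z_ok = $text =~ /second\Z/ ? 1 : 0;
my $z_ok = $text =~ /second\z/ ? 1 : 0;

print "caret: @caret\n";
print "start: @start\n";
print "\\Z: $Z_ok, \\z: $z_ok\n";
```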
635
636       The "\G" assertion can be used to chain global matches (using "m//g"),
637       as described in "Regexp Quote-Like Operators" in perlop.  It is also
638       useful when writing "lex"-like scanners, when you have several patterns
639       that you want to match against subsequent substrings of your string;
640       see the previous reference.  The actual location where "\G" will match
641       can also be influenced by using "pos()" as an lvalue: see "pos" in
642       perlfunc. Note that the rule for zero-length matches (see "Repeated
643       Patterns Matching a Zero-length Substring") is modified somewhat, in
644       that contents to the left of "\G" are not counted when determining the
645       length of the match. Thus the following will not match forever:
646
647            my $string = 'ABC';
648            pos($string) = 1;
649            while ($string =~ /(.\G)/g) {
650                print $1;
651            }
652
653       It will print 'A' and then terminate, as it considers the match to be
654       zero-width, and thus will not match at the same position twice in a
655       row.
656
657       It is worth noting that "\G" improperly used can result in an infinite
658       loop. Take care when using patterns that include "\G" in an
659       alternation.
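
       A minimal sketch of a "lex"-like scanner built on "\G" follows.  It
       relies on the "/c" modifier (described in perlop) to keep "pos()"
       intact when a branch fails; the token names and the tiny grammar are
       invented for this example.

```perl
use strict;
use warnings;

my $input = "x = 42 + y";
my @tokens;

# Each branch is anchored with \G, so it can only match where the
# previous token ended.
for ($input) {
    while (1) {
        if    (/\G\s+/gc)            { next }                     # skip blanks
        elsif (/\G(\d+)/gc)          { push @tokens, "NUM($1)" }
        elsif (/\G([A-Za-z_]\w*)/gc) { push @tokens, "ID($1)" }
        elsif (/\G([=+])/gc)         { push @tokens, "OP($1)" }
        else                         { last }    # end of input or unknown text
    }
}

print "@tokens\n";
```

       Because every alternative consumes at least one character, the loop
       always advances and cannot spin at one position.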
660
661       Capture groups
662
663       The bracketing construct "( ... )" creates capture groups (also
664       referred to as capture buffers). To refer to the current contents of a
665       group later on, within the same pattern, use "\g1" (or "\g{1}") for the
666       first, "\g2" (or "\g{2}") for the second, and so on.  This is called a
667       backreference.
668
676       There is no limit to the number of captured substrings that you may
677       use.  Groups are numbered with the leftmost open parenthesis being
678       number 1, etc.  If a group did not match, the associated backreference
679       won't match either. (This can happen if the group is optional, or in a
680       different branch of an alternation.)  You can omit the "g", and write
681       "\1", etc, but there are some issues with this form, described below.
682
683       You can also refer to capture groups relatively, by using a negative
684       number, so that "\g-1" and "\g{-1}" both refer to the immediately
685       preceding capture group, and "\g-2" and "\g{-2}" both refer to the
686       group before it.  For example:
687
688               /
689                (Y)            # group 1
690                (              # group 2
691                   (X)         # group 3
692                   \g{-1}      # backref to group 3
693                   \g{-3}      # backref to group 1
694                )
695               /x
696
697       would match the same as "/(Y) ( (X) \g3 \g1 )/x".  This allows you to
698       interpolate regexes into larger regexes and not have to worry about the
699       capture groups being renumbered.
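
       The renumbering problem, and how a relative reference sidesteps it,
       might be sketched like this (the fragment and variable names are
       hypothetical):

```perl
use strict;
use warnings;

# A reusable fragment: one character followed by the same character.
# \g{-1} always means "the group just before this point".
my $doubled = qr/(.)\g{-1}/;

# Embedded after another group, the fragment's group becomes number 2,
# yet the relative reference still points at it.
my ($num, $char) = "17:xx" =~ /(\d+):$doubled/;

# The same fragment written with an absolute reference breaks once
# embedded, because \g1 now refers to (\d+).
my $absolute = qr/(.)\g1/;
my $clash = "17:xx" =~ /(\d+):$absolute/ ? 1 : 0;

print "num=$num char=$char clash=$clash\n";
```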
700
701       You can dispense with numbers altogether and create named capture
702       groups.  The notation is "(?<name>...)" to declare and "\g{name}" to
703       reference.  (To be compatible with .NET regular expressions, "\g{name}"
704       may also be written as "\k{name}", "\k<name>" or "\k'name'".)  name
705       must not begin with a number, nor contain hyphens.  When different
706       groups within the same pattern have the same name, any reference to
707       that name assumes the leftmost defined group.  Named groups count in
708       absolute and relative numbering, and so can also be referred to by
709       those numbers.  (It's possible to do things with named capture groups
710       that would otherwise require "(??{})".)
711
712       Capture group contents are dynamically scoped and available to you
713       outside the pattern until the end of the enclosing block or until the
714       next successful match, whichever comes first.  (See "Compound
715       Statements" in perlsyn.)  You can refer to them by absolute number
716       (using "$1" instead of "\g1", etc); or by name via the "%+" hash, using
717       "$+{name}".
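
       For instance, a date might be picked apart with named groups as in
       the sketch below; the group names are arbitrary.

```perl
use strict;
use warnings;

my $line = "2024-05-17";
my ($year, $month, $first);

if ($line =~ /(?<year>\d{4})-(?<month>\d\d)-(?<day>\d\d)/) {
    # Outside the pattern, named groups are read through %+.
    ($year, $month) = ($+{year}, $+{month});

    # Named groups are still numbered, so $1 holds the same text as
    # $+{year}.
    $first = $1;
    print "year=$year (\$1=$first), month=$month\n";
}
```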
718
719       Braces are required in referring to named capture groups, but are
720       optional for absolute or relative numbered ones.  Braces are safer when
721       creating a regex by concatenating smaller strings.  For example if you
722       have "qr/$a$b/", and $a contained "\g1", and $b contained "37", you
723       would get "/\g137/" which is probably not what you intended.
724
725       The "\g" and "\k" notations were introduced in Perl 5.10.0.  Prior to
726       that there were no named nor relative numbered capture groups.
727       Absolute numbered groups were referred to using "\1", "\2", etc., and
728       this notation is still accepted (and likely always will be).  But it
729       leads to some ambiguities if there are more than 9 capture groups, as
730       "\10" could mean either the tenth capture group, or the character whose
731       ordinal in octal is 010 (a backspace in ASCII).  Perl resolves this
732       ambiguity by interpreting "\10" as a backreference only if at least 10
733       left parentheses have opened before it.  Likewise "\11" is a
734       backreference only if at least 11 left parentheses have opened before
735       it.  And so on.  "\1" through "\9" are always interpreted as
736       backreferences.  There are several examples below that illustrate these
737       perils.  You can avoid the ambiguity by always using "\g{}" or "\g" if
738       you mean capturing groups; and for octal constants always using "\o{}",
739       or for "\077" and below, using 3 digits padded with leading zeros,
740       since a leading zero implies an octal constant.
741
742       The "\digit" notation also works in certain circumstances outside the
743       pattern.  See "Warning on \1 Instead of $1" below for details.
744
745       Examples:
746
747           s/^([^ ]*) *([^ ]*)/$2 $1/;     # swap first two words
748
749           /(.)\g1/                        # find first doubled char
750                and print "'$1' is the first doubled character\n";
751
752           /(?<char>.)\k<char>/            # ... a different way
753                and print "'$+{char}' is the first doubled character\n";
754
755           /(?'char'.)\g1/                 # ... mix and match
756                and print "'$1' is the first doubled character\n";
757
758           if (/Time: (..):(..):(..)/) {   # parse out values
759               $hours = $1;
760               $minutes = $2;
761               $seconds = $3;
762           }
763
764           /(.)(.)(.)(.)(.)(.)(.)(.)(.)\g10/   # \g10 is a backreference
765           /(.)(.)(.)(.)(.)(.)(.)(.)(.)\10/    # \10 is octal
766           /((.)(.)(.)(.)(.)(.)(.)(.)(.))\10/  # \10 is a backreference
767           /((.)(.)(.)(.)(.)(.)(.)(.)(.))\010/ # \010 is octal
768
769           $a = '(.)\1';        # Creates problems when concatenated.
770           $b = '(.)\g{1}';     # Avoids the problems.
771           "aa" =~ /${a}/;      # True
772           "aa" =~ /${b}/;      # True
773           "aa0" =~ /${a}0/;    # False!
774           "aa0" =~ /${b}0/;    # True
775           "aa\x08" =~ /${a}0/;  # True!
776           "aa\x08" =~ /${b}0/;  # False
777
778       Several special variables also refer back to portions of the previous
779       match.  $+ returns whatever the last bracket match matched.  $& returns
780       the entire matched string.  (At one point $0 did also, but now it
781       returns the name of the program.)  "$`" returns everything before the
782       matched string.  "$'" returns everything after the matched string. And
783       $^N contains whatever was matched by the most-recently closed group
784       (submatch). $^N can be used in extended patterns (see below), for
785       example to assign a submatch to a variable.
786
787       These special variables, like the "%+" hash and the numbered match
788       variables ($1, $2, $3, etc.) are dynamically scoped until the end of
789       the enclosing block or until the next successful match, whichever comes
790       first.  (See "Compound Statements" in perlsyn.)
791
792       NOTE: Failed matches in Perl do not reset the match variables, which
793       makes it easier to write code that tests for a series of more specific
794       cases and remembers the best match.
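
       A short sketch of that behaviour, using made-up sample strings:

```perl
use strict;
use warnings;

my $first;
if ("key=value" =~ /(\w+)=(\w+)/) {
    # This second match fails, so $1 and $2 keep their earlier values.
    "no digits here" =~ /(\d+)/;
    $first = $1;
}
print "first=$first\n";
```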
795
796       WARNING: Once Perl sees that you need one of $&, "$`", or "$'" anywhere
797       in the program, it has to provide them for every pattern match.  This
798       may substantially slow your program.  Perl uses the same mechanism to
799       produce $1, $2, etc, so you also pay a price for each pattern that
800       contains capturing parentheses.  (To avoid this cost while retaining
801       the grouping behaviour, use the extended regular expression "(?: ... )"
802       instead.)  But if you never use $&, "$`" or "$'", then patterns without
803       capturing parentheses will not be penalized.  So avoid $&, "$'", and
804       "$`" if you can, but if you can't (and some algorithms really
805       appreciate them), once you've used them once, use them at will, because
806       you've already paid the price.  As of 5.005, $& is not so costly as the
807       other two.
808
809       As a workaround for this problem, Perl 5.10.0 introduces
810       "${^PREMATCH}", "${^MATCH}" and "${^POSTMATCH}", which are equivalent
811       to "$`", $& and "$'", except that they are only guaranteed to be
812       defined after a successful match that was executed with the "/p"
813       (preserve) modifier.  The use of these variables incurs no global
814       performance penalty, unlike their punctuation char equivalents, however
815       at the trade-off that you have to tell perl when you want to use them.
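
       A minimal sketch of the "/p" variables (requires Perl 5.10.0 or
       later; the sample string is arbitrary):

```perl
use strict;
use warnings;

my ($pre, $match, $post);
if ("hello world" =~ /o w/p) {
    ($pre, $match, $post) = (${^PREMATCH}, ${^MATCH}, ${^POSTMATCH});
}
print "[$pre][$match][$post]\n";
```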
816
817   Quoting metacharacters
818       Backslashed metacharacters in Perl are alphanumeric, such as "\b",
819       "\w", "\n".  Unlike some other regular expression languages, there are
820       no backslashed symbols that aren't alphanumeric.  So anything that
821       looks like \\, \(, \), \<, \>, \{, or \} is always interpreted as a
822       literal character, not a metacharacter.  This was once used in a common
823       idiom to disable or quote the special meanings of regular expression
824       metacharacters in a string that you want to use for a pattern. Simply
825       quote all non-"word" characters:
826
827           $pattern =~ s/(\W)/\\$1/g;
828
829       (If "use locale" is set, then this depends on the current locale.)
830       Today it is more common to use the quotemeta() function or the "\Q"
831       metaquoting escape sequence to disable all metacharacters' special
832       meanings like this:
833
834           /$unquoted\Q$quoted\E$unquoted/
835
836       Beware that if you put literal backslashes (those not inside
837       interpolated variables) between "\Q" and "\E", double-quotish backslash
838       interpolation may lead to confusing results.  If you need to use
839       literal backslashes within "\Q...\E", consult "Gory details of parsing
840       quoted constructs" in perlop.
841
842       "quotemeta()" and "\Q" are fully described in "quotemeta" in perlfunc.
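
       The effect of quoting can be seen by matching the same string with
       and without "\Q"; the sample values below are illustrative only.

```perl
use strict;
use warnings;

my $literal = "a.b*c";    # contains regex metacharacters

# Unquoted, "." and "*" keep their special meaning.
my $unquoted = "aXbbbc" =~ /\A$literal\z/     ? 1 : 0;

# Quoted, the variable's contents match only themselves.
my $quoted   = "aXbbbc" =~ /\A\Q$literal\E\z/ ? 1 : 0;
my $exact    = "a.b*c"  =~ /\A\Q$literal\E\z/ ? 1 : 0;

# quotemeta() applies the same quoting to a string.
my $escaped = quotemeta($literal);

print "$unquoted $quoted $exact $escaped\n";
```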
843
844   Extended Patterns
845       Perl also defines a consistent extension syntax for features not found
846       in standard tools like awk and lex.  The syntax for most of these is a
847       pair of parentheses with a question mark as the first thing within the
848       parentheses.  The character after the question mark indicates the
849       extension.
850
851       The stability of these extensions varies widely.  Some have been part
852       of the core language for many years.  Others are experimental and may
853       change without warning or be completely removed.  Check the
854       documentation on an individual feature to verify its current status.
855
856       A question mark was chosen for this and for the minimal-matching
857       construct because 1) question marks are rare in older regular
858       expressions, and 2) whenever you see one, you should stop and
859       "question" exactly what is going on.  That's psychology....
860
861       "(?#text)"
862           A comment.  The text is ignored.  If the "/x" modifier enables
863           whitespace formatting, a simple "#" will suffice.  Note that Perl
864           closes the comment as soon as it sees a ")", so there is no way to
865           put a literal ")" in the comment.
866
867       "(?adlupimsx-imsx)"
868       "(?^alupimsx)"
869           One or more embedded pattern-match modifiers, to be turned on (or
870           turned off, if preceded by "-") for the remainder of the pattern or
871           the remainder of the enclosing pattern group (if any).
872
873           This is particularly useful for dynamic patterns, such as those
874           read in from a configuration file, taken from an argument, or
875           specified in a table somewhere.  Consider the case where some
876           patterns want to be case-sensitive and some do not:  The case-
877           insensitive ones merely need to include "(?i)" at the front of the
878           pattern.  For example:
879
880               $pattern = "foobar";
881               if ( /$pattern/i ) { }
882
883               # more flexible:
884
885               $pattern = "(?i)foobar";
886               if ( /$pattern/ ) { }
887
888           These modifiers are restored at the end of the enclosing group. For
889           example,
890
891               ( (?i) blah ) \s+ \g1
892
893           will match "blah" in any case, some spaces, and an exact (including
894           the case!)  repetition of the previous word, assuming the "/x"
895           modifier, and no "/i" modifier outside this group.
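
           That scoping can be checked with a sketch such as the following,
           which compiles the pattern above under "/x" and tries it on two
           sample strings:

```perl
use strict;
use warnings;

my $re = qr/ ( (?i) blah ) \s+ \g1 /x;

# The group matches in any case, but \g1 lies outside the (?i) scope and
# so must repeat the captured text exactly.
my $same  = "BLAH BLAH" =~ $re ? 1 : 0;
my $mixed = "BLAH blah" =~ $re ? 1 : 0;

print "same=$same mixed=$mixed\n";
```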
896
897           These modifiers do not carry over into named subpatterns called in
898           the enclosing group. In other words, a pattern such as
899           "((?i)(?&NAME))" does not change the case-sensitivity of the "NAME"
900           pattern.
901
902           Any of these modifiers can be set to apply globally to all regular
903           expressions compiled within the scope of a "use re".  See "'/flags'
904           mode" in re.
905
906           Starting in Perl 5.14, a "^" (caret or circumflex accent)
907           immediately after the "?" is a shorthand equivalent to "d-imsx".
908           Flags (except "d") may follow the caret to override it.  But a
909           minus sign is not legal with it.
910
911           Note that the "a", "d", "l", "p", and "u" modifiers are special in
912           that they can only be enabled, not disabled, and the "a", "d", "l",
913           and "u" modifiers are mutually exclusive: specifying one de-
914           specifies the others, and a maximum of one (or two "a"'s) may
915           appear in the construct.  Thus, for example, "(?-p)" will warn when
916           compiled under "use warnings"; "(?-d:...)" and "(?dl:...)" are
917           fatal errors.
918
919           Note also that the "p" modifier is special in that its presence
920           anywhere in a pattern has a global effect.
921
922       "(?:pattern)"
923       "(?adluimsx-imsx:pattern)"
924       "(?^aluimsx:pattern)"
925           This is for clustering, not capturing; it groups subexpressions
926           like "()", but doesn't make backreferences as "()" does.  So
927
928               @fields = split(/\b(?:a|b|c)\b/)
929
930           is like
931
932               @fields = split(/\b(a|b|c)\b/)
933
934           but doesn't spit out extra fields.  It's also cheaper not to
935           capture characters if you don't need to.
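
           The difference in the returned fields might be seen as follows
           (the input string is made up for the example):

```perl
use strict;
use warnings;

my $text = "x a y b z";

# Clustering only: the separators are discarded.
my @plain = split /\b(?:a|b|c)\b/, $text;

# Capturing: each separator is returned as an extra field.
my @captured = split /\b(a|b|c)\b/, $text;

print scalar(@plain), " fields vs ", scalar(@captured), " fields\n";
```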
936
937           Any letters between "?" and ":" act as flags modifiers as with
938           "(?adluimsx-imsx)".  For example,
939
940               /(?s-i:more.*than).*million/i
941
942           is equivalent to the more verbose
943
944               /(?:(?s-i)more.*than).*million/i
945
946           Starting in Perl 5.14, a "^" (caret or circumflex accent)
947           immediately after the "?" is a shorthand equivalent to "d-imsx".
948           Any positive flags (except "d") may follow the caret, so
949
950               (?^x:foo)
951
952           is equivalent to
953
954               (?x-ims:foo)
955
956           The caret tells Perl that this cluster doesn't inherit the flags of
957           any surrounding pattern, but uses the system defaults ("d-imsx"),
958           modified by any flags specified.
959
960           The caret allows for simpler stringification of compiled regular
961           expressions.  These look like
962
963               (?^:pattern)
964
965           with any non-default flags appearing between the caret and the
966           colon.  A test that looks at such stringification thus doesn't need
967           to have the system default flags hard-coded in it, just the caret.
968           If new flags are added to Perl, the meaning of the caret's
969           expansion will change to include the default for those flags, so
970           the test will still work, unchanged.
971
972           Specifying a negative flag after the caret is an error, as the flag
973           is redundant.
974
975           Mnemonic for "(?^...)":  A fresh beginning since the usual use of a
976           caret is to match at the beginning.
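
           As a rough illustration, the sketch below prints the
           stringification of two compiled patterns and shows that an
           interpolated "qr//" object does not pick up the flags of the
           pattern around it.  The exact flag letters printed may vary
           with the Perl version and the pragmas in effect.

```perl
use strict;
use warnings;

my $plain = qr/foo/;
my $icase = qr/foo/i;

# On Perl 5.14 and later these typically print as (?^:foo) and (?^i:foo).
print "$plain\n$icase\n";

# The caret cluster starts from the defaults, so the embedded pattern
# stays case-sensitive even though the outer match uses /i.
my $outer_i = "FOO" =~ /$plain/i ? 1 : 0;
my $own_i   = "FOO" =~ /$icase/  ? 1 : 0;

print "outer_i=$outer_i own_i=$own_i\n";
```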
977
978       "(?|pattern)"
979           This is the "branch reset" pattern, which has the special property
980           that the capture groups are numbered from the same starting point
981           in each alternation branch. It is available starting from perl
982           5.10.0.
983
984           Capture groups are numbered from left to right, but inside this
985           construct the numbering is restarted for each branch.
986
987           The numbering within each branch will be as normal, and any groups
988           following this construct will be numbered as though the construct
989           contained only one branch, that being the one with the most capture
990           groups in it.
991
992           This construct is useful when you want to capture one of a number
993           of alternative matches.
994
995           Consider the following pattern.  The numbers underneath show in
996           which group the captured content will be stored.
997
998               # before  ---------------branch-reset----------- after
999               / ( a )  (?| x ( y ) z | (p (q) r) | (t) u (v) ) ( z ) /x
1000               # 1            2         2  3        2     3     4
1001
1002           Be careful when using the branch reset pattern in combination with
1003           named captures. Named captures are implemented as being aliases to
1004           numbered groups holding the captures, and that interferes with the
1005           implementation of the branch reset pattern. If you are using named
1006           captures in a branch reset pattern, it's best to use the same
1007           names, in the same order, in each of the alternations:
1008
1009              /(?|  (?<a> x ) (?<b> y )
1010                 |  (?<a> z ) (?<b> w )) /x
1011
1012           Not doing so may lead to surprises:
1013
1014             "12" =~ /(?| (?<a> \d+ ) | (?<b> \D+))/x;
1015             say $+{a};    # Prints '12'
1016             say $+{b};    # *Also* prints '12'.
1017
1018           The problem here is that both the group named "a" and the group
1019           named "b" are aliases for the group belonging to $1.
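
           A small sketch of the basic use of branch reset: two spellings
           of a key/value line whose parts land in $1 and $2 whichever
           branch matches.  The line formats are invented for this
           example.

```perl
use strict;
use warnings;

# Without (?|...), the second branch would fill $3 and $4 instead.
my $pair = qr/\A (?| (\w+) = (\w+) | (\w+) : \s* (\w+) ) \z/x;

my @results;
for my $line ("name=perl", "name: perl") {
    push @results, "$1/$2" if $line =~ $pair;
}
print "@results\n";
```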
1020
1021       Look-Around Assertions
1022           Look-around assertions are zero-width patterns which match a
1023           specific pattern without including it in $&. Positive assertions
1024           match when their subpattern matches; negative assertions match when
1025           their subpattern fails. Look-behind matches text up to the current
1026           match position; look-ahead matches text following the current match
1027           position.
1028
1029           "(?=pattern)"
1030               A zero-width positive look-ahead assertion.  For example,
1031               "/\w+(?=\t)/" matches a word followed by a tab, without
1032               including the tab in $&.
1033
1034           "(?!pattern)"
1035               A zero-width negative look-ahead assertion.  For example
1036               "/foo(?!bar)/" matches any occurrence of "foo" that isn't
1037               followed by "bar".  Note however that look-ahead and look-
1038               behind are NOT the same thing.  You cannot use this for look-
1039               behind.
1040
1041               If you are looking for a "bar" that isn't preceded by a "foo",
1042               "/(?!foo)bar/" will not do what you want.  That's because the
1043               "(?!foo)" is just saying that the next thing cannot be
1044               "foo"--and it's not, it's a "bar", so "foobar" will match.  Use
1045               look-behind instead (see below).
1046
1047           "(?<=pattern)" "\K"
1048               A zero-width positive look-behind assertion.  For example,
1049               "/(?<=\t)\w+/" matches a word that follows a tab, without
1050               including the tab in $&.  Works only for fixed-width look-
1051               behind.
1052
1053               There is a special form of this construct, called "\K", which
1054               causes the regex engine to "keep" everything it had matched
1055               prior to the "\K" and not include it in $&. This effectively
1056               provides variable-length look-behind. The use of "\K" inside of
1057               another look-around assertion is allowed, but the behaviour is
1058               currently not well defined.
1059
1060               For various reasons "\K" may be significantly more efficient
1061               than the equivalent "(?<=...)" construct, and it is especially
1062               useful in situations where you want to efficiently remove
1063               something following something else in a string. For instance
1064
1065                 s/(foo)bar/$1/g;
1066
1067               can be rewritten as the much more efficient
1068
1069                 s/foo\Kbar//g;
1070
1071           "(?<!pattern)"
1072               A zero-width negative look-behind assertion.  For example
1073               "/(?<!bar)foo/" matches any occurrence of "foo" that does not
1074               follow "bar".  Works only for fixed-width look-behind.
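
           The look-around forms described in this section can be compared
           side by side in a short sketch; the sample strings are
           arbitrary.

```perl
use strict;
use warnings;

# Look-ahead: digits followed by " USD"; the suffix is not consumed.
my ($usd) = "cost: 250 EUR, price: 100 USD" =~ /(\d+)(?= USD)/;

# The pitfall: (?!foo) only inspects what comes next, so this matches.
my $pitfall = "foobar" =~ /(?!foo)bar/  ? 1 : 0;

# Look-behind expresses "bar not preceded by foo" correctly.
my $behind  = "foobar" =~ /(?<!foo)bar/ ? 1 : 0;

# Negative look-behind: only the "foo" that does not follow "bar".
my @hits = "barfoo bazfoo" =~ /(?<!bar)foo/g;

# \K: "foo" must be present but is kept, so only "bar" is removed.
(my $trimmed = "foobar foobaz") =~ s/foo\Kbar//g;

print "$usd $pitfall $behind ", scalar(@hits), " '$trimmed'\n";
```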
1075
1076       "(?'NAME'pattern)"
1077       "(?<NAME>pattern)"
1078           A named capture group. Identical in every respect to normal
1079           capturing parentheses "()" but for the additional fact that the
1080           group can be referred to by name in various regular expression
1081           constructs (like "\g{NAME}") and can be accessed by name after a
1082           successful match via "%+" or "%-". See perlvar for more details on
1083           the "%+" and "%-" hashes.
1084
1085           If multiple distinct capture groups have the same name then the
1086           $+{NAME} will refer to the leftmost defined group in the match.
1087
1088           The forms "(?'NAME'pattern)" and "(?<NAME>pattern)" are equivalent.
1089
1090           NOTE: While the notation of this construct is the same as the
1091           similar function in .NET regexes, the behavior is not. In Perl the
1092           groups are numbered sequentially regardless of being named or not.
1093           Thus in the pattern
1094
1095             /(x)(?<foo>y)(z)/
1096
1097           $+{foo} will be the same as $2, and $3 will contain 'z' instead of
1098           the opposite which is what a .NET regex hacker might expect.
1099
1100           Currently NAME is restricted to simple identifiers only.  In other
1101           words, it must match "/^[_A-Za-z][_A-Za-z0-9]*\z/" or its Unicode
1102           extension (see utf8), though it isn't extended by the locale (see
1103           perllocale).
1104
1105           NOTE: In order to make things easier for programmers with
1106           experience with the Python or PCRE regex engines, the pattern
1107           "(?P<NAME>pattern)" may be used instead of "(?<NAME>pattern)";
1108           however this form does not support the use of single quotes as a
1109           delimiter for the name.
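
           When a name is reused, "%+" and "%-" differ as sketched below
           (see perlvar for the authoritative description; the group name
           "c" is arbitrary):

```perl
use strict;
use warnings;

my ($leftmost, @all);
if ("ab" =~ /(?<c>a)(?<c>b)/) {
    $leftmost = $+{c};         # the leftmost defined group named "c"
    @all      = @{ $-{c} };    # every group named "c", in order
}
print "leftmost=$leftmost all=@all\n";
```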
1110
1111       "\k<NAME>"
1112       "\k'NAME'"
1113           Named backreference. Similar to numeric backreferences, except that
1114           the group is designated by name and not number. If multiple groups
1115           have the same name then it refers to the leftmost defined group in
1116           the current match.
1117
1118           It is an error to refer to a name not defined by a "(?<NAME>)"
1119           earlier in the pattern.
1120
1121           Both forms are equivalent.
1122
1123           NOTE: In order to make things easier for programmers with
1124           experience with the Python or PCRE regex engines, the pattern
1125           "(?P=NAME)" may be used instead of "\k<NAME>".
1126
1127       "(?{ code })"
1128           WARNING: This extended regular expression feature is considered
1129           experimental, and may be changed without notice. Code executed that
1130           has side effects may not perform identically from version to
1131           version due to the effect of future optimisations in the regex
1132           engine.
1133
1134           This zero-width assertion evaluates any embedded Perl code.  It
1135           always succeeds, and its "code" is not interpolated.  Currently,
1136           the rules to determine where the "code" ends are somewhat
1137           convoluted.
1138
1139           This feature can be used together with the special variable $^N to
1140           capture the results of submatches in variables without having to
1141           keep track of the number of nested parentheses. For example:
1142
1143             $_ = "The brown fox jumps over the lazy dog";
1144             /the (\S+)(?{ $color = $^N }) (\S+)(?{ $animal = $^N })/i;
1145             print "color = $color, animal = $animal\n";
1146
1147           Inside the "(?{...})" block, $_ refers to the string the regular
1148           expression is matching against. You can also use "pos()" to find
1149           the current matching position within this string.
1150
1151           The "code" is properly scoped in the following sense: If the
1152           assertion is backtracked (compare "Backtracking"), all changes
1153           introduced after "local"ization are undone, so that
1154
1155             $_ = 'a' x 8;
1156             m<
1157                (?{ $cnt = 0 })               # Initialize $cnt.
1158                (
1159                  a
1160                  (?{
1161                      local $cnt = $cnt + 1;  # Update $cnt,
1162                                              # backtracking-safe.
1163                  })
1164                )*
1165                aaaa
1166                (?{ $res = $cnt })            # On success copy to
1167                                              # non-localized location.
1168              >x;
1169
1170           will set "$res = 4".  Note that after the match, $cnt returns to
1171           the globally introduced value, because the scopes that restrict
1172           "local" operators are unwound.
1173
1174           This assertion may be used as a
1175           "(?(condition)yes-pattern|no-pattern)" switch.  If not used in this
1176           way, the result of evaluation of "code" is put into the special
1177           variable $^R.  This happens immediately, so $^R can be used from
1178           other "(?{ code })" assertions inside the same regular expression.
1179
1180           The assignment to $^R above is properly localized, so the old value
1181           of $^R is restored if the assertion is backtracked; compare
1182           "Backtracking".
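
           As a hedged sketch of $^R in use: each alternative below leaves
           a label in $^R, and a final code block copies it out once the
           whole pattern has matched.  The "classify" function and its
           labels are invented for this example, and a package ("our")
           variable is used rather than a lexical to stay clear of the
           problems with "my" variables in these blocks.  Since the
           feature is experimental, treat this as an illustration rather
           than a recommended idiom.

```perl
use strict;
use warnings;

our $kind;    # package variable, not a lexical

sub classify {
    my ($text) = @_;
    local $kind;
    return unless $text =~ /
        \A
        (?: \d+    (?{ 'number' })    # value of the block goes to $^R
          | [a-z]+ (?{ 'word'   })
        )
        \z
        (?{ $kind = $^R })            # copy out only on overall success
    /x;
    return $kind;
}

print classify("42"), " ", classify("abc"), "\n";
```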
1183
1184           For reasons of security, this construct is forbidden if the regular
1185           expression involves run-time interpolation of variables, unless the
1186           perilous "use re 'eval'" pragma has been used (see re), or the
1187           variables contain results of the "qr//" operator (see
1188           "qr/STRING/msixpodual" in perlop).
1189
1190           This restriction is due to the wide-spread and remarkably
1191           convenient custom of using run-time determined strings as patterns.
1192           For example:
1193
1194               $re = <>;
1195               chomp $re;
1196               $string =~ /$re/;
1197
1198           Before Perl knew how to execute interpolated code within a pattern,
1199           this operation was completely safe from a security point of view,
1200           although it could raise an exception from an illegal pattern.  If
1201           you turn on the "use re 'eval'", though, it is no longer secure, so
1202           you should only do so if you are also using taint checking.  Better
1203           yet, use the carefully constrained evaluation within a Safe
1204           compartment.  See perlsec for details about both these mechanisms.
1205
1206           WARNING: Use of lexical ("my") variables in these blocks is broken.
1207           The result is unpredictable and will make perl unstable. The
1208           workaround is to use global ("our") variables.
1209
               WARNING: In perl 5.12.x and earlier, the regex engine was not
               re-entrant, so interpolated code could not safely invoke the
               regex engine either directly (with "m//" or "s///") or
               indirectly (with functions such as "split"). Invoking the regex
               engine in these blocks would make perl unstable.
1215
1216       "(??{ code })"
1217           WARNING: This extended regular expression feature is considered
1218           experimental, and may be changed without notice. Code executed that
1219           has side effects may not perform identically from version to
1220           version due to the effect of future optimisations in the regex
1221           engine.
1222
1223           This is a "postponed" regular subexpression.  The "code" is
1224           evaluated at run time, at the moment this subexpression may match.
1225           The result of evaluation is considered a regular expression and
1226           matched as if it were inserted instead of this construct.  Note
1227           that this means that the contents of capture groups defined inside
1228           an eval'ed pattern are not available outside of the pattern, and
1229           vice versa, there is no way for the inner pattern returned from the
1230           code block to refer to a capture group defined outside.  (The code
1231           block itself can use $1, etc., to refer to the enclosing pattern's
1232           capture groups.)  Thus,
1233
1234               ('a' x 100)=~/(??{'(.)' x 100})/
1235
               will match, but it will not set $1.
1237
1238           The "code" is not interpolated.  As before, the rules to determine
1239           where the "code" ends are currently somewhat convoluted.
1240
1241           The following pattern matches a parenthesized group:
1242
1243            $re = qr{
1244                       \(
1245                       (?:
1246                          (?> [^()]+ )  # Non-parens without backtracking
1247                        |
1248                          (??{ $re })   # Group with matching parens
1249                       )*
1250                       \)
1251                    }x;
1252
1253           See also "(?PARNO)" for a different, more efficient way to
1254           accomplish the same task.
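
               A hedged usage sketch of the recursive pattern above follows.
               The test strings are made up, and $re is declared with "our"
               so that the code block refers to a package variable rather
               than a lexical:

```perl
use strict;
use warnings;

our $re;    # package variable, referenced from inside the code block
$re = qr{
          \(
          (?:
             (?> [^()]+ )  # Non-parens without backtracking
           |
             (??{ $re })   # Group with matching parens
          )*
          \)
        }x;

# Capture the first balanced group in a string.
my ($balanced) = "f(a(b)c) tail" =~ /($re)/;
print "balanced group: $balanced\n";

# A string whose parentheses never close contains no such group.
my $unbalanced = "f(a(b" =~ /$re/ ? 1 : 0;
print "unbalanced input matched: $unbalanced\n";
```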
1255
1256           For reasons of security, this construct is forbidden if the regular
1257           expression involves run-time interpolation of variables, unless the
1258           perilous "use re 'eval'" pragma has been used (see re), or the
1259           variables contain results of the "qr//" operator (see
1260           "qr/STRING/msixpodual" in perlop).
1261
               In perl 5.12.x and earlier, because the regex engine was not
               re-entrant, delayed code could not safely invoke the regex
               engine either directly (with "m//" or "s///") or indirectly
               (with functions such as "split").
1266
1267           Recursing deeper than 50 times without consuming any input string
1268           will result in a fatal error.  The maximum depth is compiled into
1269           perl, so changing it requires a custom build.
1270
1271       "(?PARNO)" "(?-PARNO)" "(?+PARNO)" "(?R)" "(?0)"
               Similar to "(??{ code })" except that it does not involve
               compiling any code; instead, it treats the contents of a capture
               group as an independent pattern that must match at the current
               position.  Capture groups contained by the pattern will have the
               value as determined by the outermost recursion.
1277
1278           PARNO is a sequence of digits (not starting with 0) whose value
1279           reflects the paren-number of the capture group to recurse to.
1280           "(?R)" recurses to the beginning of the whole pattern. "(?0)" is an
1281           alternate syntax for "(?R)". If PARNO is preceded by a plus or
1282           minus sign then it is assumed to be relative, with negative numbers
1283           indicating preceding capture groups and positive ones following.
1284           Thus "(?-1)" refers to the most recently declared group, and
1285           "(?+1)" indicates the next group to be declared.  Note that the
1286           counting for relative recursion differs from that of relative
1287           backreferences, in that with recursion unclosed groups are
1288           included.
1289
1290           The following pattern matches a function foo() which may contain
1291           balanced parentheses as the argument.
1292
1293             $re = qr{ (                   # paren group 1 (full function)
1294                         foo
1295                         (                 # paren group 2 (parens)
1296                           \(
1297                             (             # paren group 3 (contents of parens)
1298                             (?:
1299                              (?> [^()]+ ) # Non-parens without backtracking
1300                             |
1301                              (?2)         # Recurse to start of paren group 2
1302                             )*
1303                             )
1304                           \)
1305                         )
1306                       )
1307                     }x;
1308
1309           If the pattern was used as follows
1310
1311               'foo(bar(baz)+baz(bop))'=~/$re/
1312                   and print "\$1 = $1\n",
1313                             "\$2 = $2\n",
1314                             "\$3 = $3\n";
1315
1316           the output produced should be the following:
1317
1318               $1 = foo(bar(baz)+baz(bop))
1319               $2 = (bar(baz)+baz(bop))
1320               $3 = bar(baz)+baz(bop)
1321
1322           If there is no corresponding capture group defined, then it is a
1323           fatal error.  Recursing deeper than 50 times without consuming any
1324           input string will also result in a fatal error.  The maximum depth
1325           is compiled into perl, so changing it requires a custom build.
1326
1327           The following shows how using negative indexing can make it easier
1328           to embed recursive patterns inside of a "qr//" construct for later
1329           use:
1330
                   my $parens = qr/(\((?:[^()]++|(?-1))*+\))/;
                   # \+ matches a literal plus sign between the two groups
                   if (/foo $parens \s+ \+ \s+ bar $parens/x) {
                      # do something here...
                   }
1335
               Note that this pattern does not behave the same way as the
               equivalent PCRE or Python construct of the same form. In Perl
               you can backtrack into a recursed group, whereas in PCRE and
               Python the recursed-into group is treated as atomic. Also,
               modifiers are resolved at compile time, so constructs like
               "(?i:(?1))" or "(?:(?i)(?1))" do not affect how the sub-pattern
               will be processed.
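
               The benefit of relative numbering can be sketched as follows
               (the input string is invented for illustration).  Because
               "(?-1)" always refers to the group opened just before it, the
               same "qr//" object can be interpolated twice and each copy
               recurses into its own group:

```perl
use strict;
use warnings;

my $parens = qr/(\((?:[^()]++|(?-1))*+\))/;
my $text   = 'foo (a(b)) + bar (c)';

# The first copy of $parens becomes group 1, the second becomes group 2.
my ($first, $second);
if ($text =~ /foo \s* $parens \s+ \+ \s+ bar \s* $parens/x) {
    ($first, $second) = ($1, $2);
    print "first = $first, second = $second\n";
}
```

               Had the pattern used an absolute "(?1)", the second copy would
               have recursed into the first copy's group instead of its own.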
1342
1343       "(?&NAME)"
1344           Recurse to a named subpattern. Identical to "(?PARNO)" except that
1345           the parenthesis to recurse to is determined by name. If multiple
1346           parentheses have the same name, then it recurses to the leftmost.
1347
1348           It is an error to refer to a name that is not declared somewhere in
1349           the pattern.
1350
1351           NOTE: In order to make things easier for programmers with
1352           experience with the Python or PCRE regex engines the pattern
1353           "(?P>NAME)" may be used instead of "(?&NAME)".
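
               A small sketch of named recursion, assuming a group name
               ("group") chosen only for this example; the second pattern
               uses the "(?P>NAME)" spelling and is expected to behave
               identically:

```perl
use strict;
use warnings;

# Balanced parentheses, recursing by name rather than by number.
my $by_name  = qr{ (?<group> \( (?: [^()]++ | (?&group)  )* \) ) }x;
my $pcre_ish = qr{ (?<group> \( (?: [^()]++ | (?P>group) )* \) ) }x;

my $got_name = "(a(b)c)" =~ /\A$by_name\z/  ? $+{group} : undef;
my $got_pcre = "(a(b)c)" =~ /\A$pcre_ish\z/ ? $+{group} : undef;

print "by name: $got_name\n";
print "PCRE-style: $got_pcre\n";
```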
1354
1355       "(?(condition)yes-pattern|no-pattern)"
1356       "(?(condition)yes-pattern)"
1357           Conditional expression. Matches "yes-pattern" if "condition" yields
1358           a true value, matches "no-pattern" otherwise. A missing pattern
1359           always matches.
1360
               "(condition)" should be either an integer in parentheses (which
               is valid if the corresponding pair of parentheses matched), a
               look-ahead/look-behind/evaluate zero-width assertion, a name in
               angle brackets or single quotes (which is valid if a group with
               the given name matched), or the special symbol (R) (true when
               evaluated inside of recursion or eval). Additionally, the R may
               be followed by a number (in which case it will be true when
               evaluated while recursing inside of the appropriate group), or
               by &NAME, in which case it will be true only when evaluated
               during recursion in the named group.
1370
1371           Here's a summary of the possible predicates:
1372
1373           (1) (2) ...
1374               Checks if the numbered capturing group has matched something.
1375
1376           (<NAME>) ('NAME')
1377               Checks if a group with the given name has matched something.
1378
1379           (?=...) (?!...) (?<=...) (?<!...)
1380               Checks whether the pattern matches (or does not match, for the
1381               '!'  variants).
1382
1383           (?{ CODE })
1384               Treats the return value of the code block as the condition.
1385
1386           (R) Checks if the expression has been evaluated inside of
1387               recursion.
1388
1389           (R1) (R2) ...
1390               Checks if the expression has been evaluated while executing
1391               directly inside of the n-th capture group. This check is the
1392               regex equivalent of
1393
1394                 if ((caller(0))[3] eq 'subname') { ... }
1395
1396               In other words, it does not check the full recursion stack.
1397
1398           (R&NAME)
1399               Similar to "(R1)", this predicate checks to see if we're
1400               executing directly inside of the leftmost group with a given
1401               name (this is the same logic used by "(?&NAME)" to
1402               disambiguate). It does not check the full stack, but only the
1403               name of the innermost active recursion.
1404
1405           (DEFINE)
1406               In this case, the yes-pattern is never directly executed, and
1407               no no-pattern is allowed. Similar in spirit to "(?{0})" but
1408               more efficient.  See below for details.
1409
1410           For example:
1411
1412               m{ ( \( )?
1413                  [^()]+
1414                  (?(1) \) )
1415                }x
1416
1417           matches a chunk of non-parentheses, possibly included in
1418           parentheses themselves.
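
               To see the conditional at work, the sketch below anchors the
               same pattern at both ends (the anchors and the test strings
               are additions made for this illustration) so that a missing or
               stray parenthesis causes failure:

```perl
use strict;
use warnings;

# The closing paren is required only if group 1 (the opening paren) matched.
my $chunk = qr{ \A ( \( )? [^()]+ (?(1) \) ) \z }x;

my %accepted;
for my $input ('abc', '(abc)', '(abc', 'abc)') {
    $accepted{$input} = $input =~ $chunk ? 1 : 0;
    print "$input => ", ($accepted{$input} ? "match" : "no match"), "\n";
}
```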
1419
1420           A special form is the "(DEFINE)" predicate, which never executes
1421           its yes-pattern directly, and does not allow a no-pattern. This
1422           allows one to define subpatterns which will be executed only by the
1423           recursion mechanism.  This way, you can define a set of regular
1424           expression rules that can be bundled into any pattern you choose.
1425
1426           It is recommended that for this usage you put the DEFINE block at
1427           the end of the pattern, and that you name any subpatterns defined
1428           within it.
1429
1430           Also, it's worth noting that patterns defined this way probably
1431           will not be as efficient, as the optimiser is not very clever about
1432           handling them.
1433
1434           An example of how this might be used is as follows:
1435
                 /(?<NAME>(?&NAME_PAT))(?<ADDR>(?&ADDRESS_PAT))
                  (?(DEFINE)
                    (?<NAME_PAT>....)
                    (?<ADDRESS_PAT>....)
                  )/x
1441
1442           Note that capture groups matched inside of recursion are not
1443           accessible after the recursion returns, so the extra layer of
1444           capturing groups is necessary. Thus $+{NAME_PAT} would not be
1445           defined even though $+{NAME} would be.
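
               A complete, if simplified, instance of this layout might look
               like the following.  The rule names WORD and NUMBER and the
               "key=value" input format are assumptions made for the sketch:

```perl
use strict;
use warnings;

my $pair = qr{
    \A (?<key> (?&WORD) ) = (?<value> (?&NUMBER) ) \z

    (?(DEFINE)
        (?<WORD>   [a-z]+ )   # reached only through (?&WORD)
        (?<NUMBER> \d+    )   # reached only through (?&NUMBER)
    )
}x;

my ($key, $value, $inner_defined);
if ("width=42" =~ $pair) {
    ($key, $value) = ($+{key}, $+{value});
    # Groups matched inside the recursion are not visible afterwards.
    $inner_defined = defined $+{WORD} ? 1 : 0;
    print "key=$key value=$value inner_defined=$inner_defined\n";
}
```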
1446
1447           Finally, keep in mind that subpatterns created inside a DEFINE
1448           block count towards the absolute and relative number of captures,
1449           so this:
1450
1451               my @captures = "a" =~ /(.)                  # First capture
1452                                      (?(DEFINE)
1453                                          (?<EXAMPLE> 1 )  # Second capture
1454                                      )/x;
1455               say scalar @captures;
1456
               will output 2, not 1. This is particularly important if you
               intend to compile the definitions with the "qr//" operator and
               later interpolate them in another pattern.
1460
1461       "(?>pattern)"
1462           An "independent" subexpression, one which matches the substring
1463           that a standalone "pattern" would match if anchored at the given
1464           position, and it matches nothing other than this substring.  This
1465           construct is useful for optimizations of what would otherwise be
1466           "eternal" matches, because it will not backtrack (see
1467           "Backtracking").  It may also be useful in places where the "grab
1468           all you can, and do not give anything back" semantic is desirable.
1469
1470           For example: "^(?>a*)ab" will never match, since "(?>a*)" (anchored
1471           at the beginning of string, as above) will match all characters "a"
1472           at the beginning of string, leaving no "a" for "ab" to match.  In
1473           contrast, "a*ab" will match the same as "a+b", since the match of
1474           the subgroup "a*" is influenced by the following group "ab" (see
1475           "Backtracking").  In particular, "a*" inside "a*ab" will match
1476           fewer characters than a standalone "a*", since this makes the tail
1477           match.
1478
1479           "(?>pattern)" does not disable backtracking altogether once it has
1480           matched. It is still possible to backtrack past the construct, but
1481           not into it. So "((?>a*)|(?>b*))ar" will still match "bar".
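
               The three claims above can be checked directly; this is only a
               sketch, and the test strings are chosen for illustration:

```perl
use strict;
use warnings;

# (?>a*) keeps every leading "a", so nothing is left for "ab".
my $atomic = "aaab" =~ /^(?>a*)ab/ ? 1 : 0;

# Plain a* gives one "a" back, so the tail can match.
my $greedy = "aaab" =~ /^a*ab/ ? 1 : 0;

# The engine may backtrack past (?>a*) to try the other alternative.
my $past = "bar" =~ /((?>a*)|(?>b*))ar/ ? 1 : 0;

print "atomic=$atomic greedy=$greedy past=$past\n";
```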
1482
               An effect similar to "(?>pattern)" may be achieved by writing
               "(?=(pattern))\g{-1}".  The look-ahead matches the same
               substring as a standalone "pattern", and the following "\g{-1}"
               eats the matched string; it therefore makes a zero-length
               assertion into an analogue of "(?>...)".  (The difference
               between these two constructs is that the second one uses a
               capturing group, thus shifting ordinals of backreferences in the
               rest of a regular expression.)
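
               As a sketch of this emulation, with "a+" standing in for
               "pattern", the look-ahead form below refuses to give back any
               "a", just as the atomic group does:

```perl
use strict;
use warnings;

# Both forms consume all three "a"s and cannot release one for "ab".
my $emulated = "aaab" =~ /^(?=(a+))\g{-1}ab/ ? 1 : 0;
my $atomic   = "aaab" =~ /^(?>a+)ab/         ? 1 : 0;

# When the tail only needs "b", the emulation matches and, unlike
# (?>...), leaves the grabbed text in a capture group.
my ($grabbed) = "aaab" =~ /^(?=(a+))\g{-1}b/;

print "emulated=$emulated atomic=$atomic grabbed=$grabbed\n";
```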
1490
1491           Consider this pattern:
1492
1493               m{ \(
1494                     (
1495                       [^()]+           # x+
1496                     |
1497                       \( [^()]* \)
1498                     )+
1499                  \)
1500                }x
1501
               That will efficiently match a nonempty group with matching
               parentheses two levels deep or less.  However, if there is no
               such group, it will take virtually forever on a long string.
               That's because there are so many different ways to split a long
               string into several substrings.  This is what "(.+)+" is doing,
               and "(.+)+" is similar to a subpattern of the above pattern.
               Consider that the pattern above takes several seconds to detect
               the failure to match on "((()aaaaaaaaaaaaaaaaaa", and that each
               extra letter doubles this time.  This exponential performance
               will make it appear that your program has hung.  However, a tiny
               change to this pattern
1512
1513               m{ \(
1514                     (
1515                       (?> [^()]+ )        # change x+ above to (?> x+ )
1516                     |
1517                       \( [^()]* \)
1518                     )+
1519                  \)
1520                }x
1521
1522           which uses "(?>...)" matches exactly when the one above does
1523           (verifying this yourself would be a productive exercise), but
1524           finishes in a fourth the time when used on a similar string with
1525           1000000 "a"s.  Be aware, however, that, when this construct is
1526           followed by a quantifier, it currently triggers a warning message
1527           under the "use warnings" pragma or -w switch saying it "matches
1528           null string many times in regex".
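
               One way to start the suggested exercise is to run both
               patterns over a handful of inputs and compare what they match.
               The inputs below are arbitrary and kept short so that the
               slower pattern still finishes quickly; this spot-checks the
               equivalence rather than proving it:

```perl
use strict;
use warnings;

my $plain  = qr{ \( (     [^()]+   | \( [^()]* \) )+ \) }x;
my $atomic = qr{ \( ( (?> [^()]+ ) | \( [^()]* \) )+ \) }x;

my $disagreements = 0;
for my $input ('(abc)', '(a(b)c)', 'no parens', '((()aaaa', '()') {
    # Record the matched text, or a marker when there is no match.
    my $got_plain  = $input =~ $plain  ? $& : '<no match>';
    my $got_atomic = $input =~ $atomic ? $& : '<no match>';
    $disagreements++ if $got_plain ne $got_atomic;
    print "$input => $got_plain / $got_atomic\n";
}
print "disagreements: $disagreements\n";
```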
1529
1530           On simple groups, such as the pattern "(?> [^()]+ )", a comparable
1531           effect may be achieved by negative look-ahead, as in "[^()]+ (?!
1532           [^()] )".  This was only 4 times slower on a string with 1000000
1533           "a"s.
1534
               The "grab all you can, and do not give anything back" semantic
               is desirable in many situations where at first sight a simple
               "()*" looks like the correct solution.  Suppose we parse text
               with comments being delimited by "#" followed by some optional
               (horizontal) whitespace.  Contrary to its appearance, "#[ \t]*"
               is not the correct subexpression to match the comment
               delimiter, because it may "give up" some whitespace if the
               remainder of the pattern can be made to match that way.  The
               correct answer is either one of these:
1544
1545               (?>#[ \t]*)
1546               #[ \t]*(?![ \t])
1547
1548           For example, to grab non-empty comments into $1, one should use
1549           either one of these:
1550
1551               / (?> \# [ \t]* ) (        .+ ) /x;
1552               /     \# [ \t]*   ( [^ \t] .* ) /x;
1553
1554           Which one you pick depends on which of these expressions better
1555           reflects the above specification of comments.
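
               The difference shows up on a comment that consists of nothing
               but whitespace.  In this sketch (the sample strings are
               invented), the naive delimiter gives back one blank so that
               ".+" can match, whereas both corrected forms report that the
               comment is empty:

```perl
use strict;
use warnings;

my $blank = "#   ";    # a "#" followed only by spaces

# Naive form: [ \t]* releases its last space to satisfy .+
my ($naive) = $blank =~ / \# [ \t]* ( .+ ) /x;

# Corrected forms: the delimiter keeps all of its whitespace.
my $atomic    = $blank =~ / (?> \# [ \t]* ) (        .+ ) /x ? 1 : 0;
my $lookahead = $blank =~ /     \# [ \t]*   ( [^ \t] .* ) /x ? 1 : 0;

# On a real comment the atomic form captures just the text.
my ($text) = "#  note" =~ / (?> \# [ \t]* ) ( .+ ) /x;

print "naive captured <$naive>\n";
print "atomic=$atomic lookahead=$lookahead text=<$text>\n";
```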
1556
1557           In some literature this construct is called "atomic matching" or
1558           "possessive matching".
1559
1560           Possessive quantifiers are equivalent to putting the item they are
1561           applied to inside of one of these constructs. The following
1562           equivalences apply:
1563
1564               Quantifier Form     Bracketing Form
1565               ---------------     ---------------
1566               PAT*+               (?>PAT*)
1567               PAT++               (?>PAT+)
1568               PAT?+               (?>PAT?)
1569               PAT{min,max}+       (?>PAT{min,max})
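
               A quick sketch of the first rows of the table, using "a" as
               PAT: both the possessive and the bracketed form consume every
               "a" and leave nothing for the trailing "a", while the ordinary
               greedy quantifier backs off by one:

```perl
use strict;
use warnings;

my $possessive = "aaa" =~ /a++a/      ? 1 : 0;   # quantifier form
my $bracketed  = "aaa" =~ /(?>a+)a/   ? 1 : 0;   # bracketing form
my $counted    = "aaa" =~ /a{1,3}+a/  ? 1 : 0;   # PAT{min,max}+
my $greedy     = "aaa" =~ /a+a/       ? 1 : 0;   # ordinary greedy

print "possessive=$possessive bracketed=$bracketed ",
      "counted=$counted greedy=$greedy\n";
```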
1570
1571   Special Backtracking Control Verbs
1572       WARNING: These patterns are experimental and subject to change or
1573       removal in a future version of Perl. Their usage in production code
1574       should be noted to avoid problems during upgrades.
1575
1576       These special patterns are generally of the form "(*VERB:ARG)". Unless
1577       otherwise stated the ARG argument is optional; in some cases, it is
1578       forbidden.
1579
1580       Any pattern containing a special backtracking verb that allows an
1581       argument has the special behaviour that when executed it sets the
1582       current package's $REGERROR and $REGMARK variables. When doing so the
1583       following rules apply:
1584
1585       On failure, the $REGERROR variable will be set to the ARG value of the
1586       verb pattern, if the verb was involved in the failure of the match. If
1587       the ARG part of the pattern was omitted, then $REGERROR will be set to
1588       the name of the last "(*MARK:NAME)" pattern executed, or to TRUE if
1589       there was none. Also, the $REGMARK variable will be set to FALSE.
1590
1591       On a successful match, the $REGERROR variable will be set to FALSE, and
1592       the $REGMARK variable will be set to the name of the last
1593       "(*MARK:NAME)" pattern executed.  See the explanation for the
1594       "(*MARK:NAME)" verb below for more details.
1595
1596       NOTE: $REGERROR and $REGMARK are not magic variables like $1 and most
1597       other regex-related variables. They are not local to a scope, nor
1598       readonly, but instead are volatile package variables similar to
1599       $AUTOLOAD.  Use "local" to localize changes to them to a specific scope
1600       if necessary.
1601
1602       If a pattern does not contain a special backtracking verb that allows
1603       an argument, then $REGERROR and $REGMARK are not touched at all.
1604
1605       Verbs that take an argument
1606          "(*PRUNE)" "(*PRUNE:NAME)"
1607              This zero-width pattern prunes the backtracking tree at the
1608              current point when backtracked into on failure. Consider the
1609              pattern "A (*PRUNE) B", where A and B are complex patterns.
1610              Until the "(*PRUNE)" verb is reached, A may backtrack as
1611              necessary to match. Once it is reached, matching continues in B,
1612              which may also backtrack as necessary; however, should B not
1613              match, then no further backtracking will take place, and the
1614              pattern will fail outright at the current starting position.
1615
1616              The following example counts all the possible matching strings
1617              in a pattern (without actually matching any of them).
1618
1619                  'aaab' =~ /a+b?(?{print "$&\n"; $count++})(*FAIL)/;
1620                  print "Count=$count\n";
1621
1622              which produces:
1623
1624                  aaab
1625                  aaa
1626                  aa
1627                  a
1628                  aab
1629                  aa
1630                  a
1631                  ab
1632                  a
1633                  Count=9
1634
1635              If we add a "(*PRUNE)" before the count like the following
1636
1637                  'aaab' =~ /a+b?(*PRUNE)(?{print "$&\n"; $count++})(*FAIL)/;
1638                  print "Count=$count\n";
1639
1640              we prevent backtracking and find the count of the longest
1641              matching string at each matching starting point like so:
1642
1643                  aaab
1644                  aab
1645                  ab
1646                  Count=3
1647
1648              Any number of "(*PRUNE)" assertions may be used in a pattern.
1649
1650              See also "(?>pattern)" and possessive quantifiers for other ways
1651              to control backtracking. In some cases, the use of "(*PRUNE)"
1652              can be replaced with a "(?>pattern)" with no functional
1653              difference; however, "(*PRUNE)" can be used to handle cases that
1654              cannot be expressed using a "(?>pattern)" alone.
1655
1656          "(*SKIP)" "(*SKIP:NAME)"
                  This zero-width pattern is similar to "(*PRUNE)", except that
                  on failure it also signifies that whatever text was matched
                  leading up to the "(*SKIP)" pattern being executed cannot be
                  part of any match of this pattern. This effectively means
                  that the regex engine "skips" forward to this position on
                  failure and tries to match again (assuming that there is
                  sufficient room to match).
1664
1665              The name of the "(*SKIP:NAME)" pattern has special significance.
1666              If a "(*MARK:NAME)" was encountered while matching, then it is
1667              that position which is used as the "skip point". If no "(*MARK)"
1668              of that name was encountered, then the "(*SKIP)" operator has no
1669              effect. When used without a name the "skip point" is where the
1670              match point was when executing the (*SKIP) pattern.
1671
1672              Compare the following to the examples in "(*PRUNE)"; note the
1673              string is twice as long:
1674
1675               'aaabaaab' =~ /a+b?(*SKIP)(?{print "$&\n"; $count++})(*FAIL)/;
1676               print "Count=$count\n";
1677
1678              outputs
1679
1680                  aaab
1681                  aaab
1682                  Count=2
1683
1684              Once the 'aaab' at the start of the string has matched, and the
1685              "(*SKIP)" executed, the next starting point will be where the
1686              cursor was when the "(*SKIP)" was executed.
1687
1688          "(*MARK:NAME)" "(*:NAME)"
1689              This zero-width pattern can be used to mark the point reached in
1690              a string when a certain part of the pattern has been
1691              successfully matched. This mark may be given a name. A later
1692              "(*SKIP)" pattern will then skip forward to that point if
1693              backtracked into on failure. Any number of "(*MARK)" patterns
1694              are allowed, and the NAME portion may be duplicated.
1695
1696              In addition to interacting with the "(*SKIP)" pattern,
1697              "(*MARK:NAME)" can be used to "label" a pattern branch, so that
1698              after matching, the program can determine which branches of the
1699              pattern were involved in the match.
1700
1701              When a match is successful, the $REGMARK variable will be set to
1702              the name of the most recently executed "(*MARK:NAME)" that was
1703              involved in the match.
1704
1705              This can be used to determine which branch of a pattern was
1706              matched without using a separate capture group for each branch,
1707              which in turn can result in a performance improvement, as perl
1708              cannot optimize "/(?:(x)|(y)|(z))/" as efficiently as something
1709              like "/(?:x(*MARK:x)|y(*MARK:y)|z(*MARK:z))/".
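
                  A minimal sketch of branch labelling follows.  The mark
                  names are arbitrary, and $REGMARK is declared with "our"
                  because it is an ordinary package variable:

```perl
use strict;
use warnings;

our $REGMARK;    # set by the regex engine in the current package

my %branch_for;
for my $input (qw(x y z)) {
    if ($input =~ /(?:x(*MARK:got_x)|y(*MARK:got_y)|z(*MARK:got_z))/) {
        # $REGMARK names the last mark executed in the successful match.
        $branch_for{$input} = $REGMARK;
        print "$input matched via $REGMARK\n";
    }
}
```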
1710
1711              When a match has failed, and unless another verb has been
1712              involved in failing the match and has provided its own name to
1713              use, the $REGERROR variable will be set to the name of the most
1714              recently executed "(*MARK:NAME)".
1715
1716              See "(*SKIP)" for more details.
1717
1718              As a shortcut "(*MARK:NAME)" can be written "(*:NAME)".
1719
1720          "(*THEN)" "(*THEN:NAME)"
1721              This is similar to the "cut group" operator "::" from Perl 6.
1722              Like "(*PRUNE)", this verb always matches, and when backtracked
1723              into on failure, it causes the regex engine to try the next
1724              alternation in the innermost enclosing group (capturing or
1725              otherwise) that has alternations.  The two branches of a
1726              "(?(condition)yes-pattern|no-pattern)" do not count as an
1727              alternation, as far as "(*THEN)" is concerned.
1728
1729              Its name comes from the observation that this operation combined
1730              with the alternation operator ("|") can be used to create what
1731              is essentially a pattern-based if/then/else block:
1732
1733                ( COND (*THEN) FOO | COND2 (*THEN) BAR | COND3 (*THEN) BAZ )
1734
                  Note that if this operator is used outside of an alternation,
                  it acts exactly like the "(*PRUNE)" operator.
1737
1738                / A (*PRUNE) B /
1739
1740              is the same as
1741
1742                / A (*THEN) B /
1743
1744              but
1745
1746                / ( A (*THEN) B | C (*THEN) D ) /
1747
1748              is not the same as
1749
1750                / ( A (*PRUNE) B | C (*PRUNE) D ) /
1751
1752              as after matching the A but failing on the B the "(*THEN)" verb
1753              will backtrack and try C; but the "(*PRUNE)" verb will simply
1754              fail.
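
                  The contrast can be sketched with literal stand-ins for A,
                  B, C and D.  The letters and the anchors are illustrative
                  choices; anchoring keeps the engine from retrying at a
                  later start position:

```perl
use strict;
use warnings;

# After "a" matches and "b" fails, (*THEN) moves on to the next
# alternative, which matches "ad".
my $then  = "ad" =~ /\A(?: a (*THEN)  b | a d )\z/x ? 1 : 0;

# (*PRUNE) in the same place abandons the attempt at this start
# position, so the second alternative is never tried.
my $prune = "ad" =~ /\A(?: a (*PRUNE) b | a d )\z/x ? 1 : 0;

print "then=$then prune=$prune\n";
```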
1755
1756       Verbs without an argument
1757          "(*COMMIT)"
1758              This is the Perl 6 "commit pattern" "<commit>" or ":::". It's a
1759              zero-width pattern similar to "(*SKIP)", except that when
1760              backtracked into on failure it causes the match to fail
1761              outright. No further attempts to find a valid match by advancing
1762              the start pointer will occur again.  For example,
1763
1764               'aaabaaab' =~ /a+b?(*COMMIT)(?{print "$&\n"; $count++})(*FAIL)/;
1765               print "Count=$count\n";
1766
1767              outputs
1768
1769                  aaab
1770                  Count=1
1771
1772              In other words, once the "(*COMMIT)" has been entered, and if
1773              the pattern does not match, the regex engine will not try any
1774              further matching on the rest of the string.
1775
1776          "(*FAIL)" "(*F)"
1777              This pattern matches nothing and always fails. It can be used to
1778              force the engine to backtrack. It is equivalent to "(?!)", but
1779              easier to read. In fact, "(?!)" gets optimised into "(*FAIL)"
1780              internally.
1781
1782              It is probably useful only when combined with "(?{})" or
1783              "(??{})".
1784
1785          "(*ACCEPT)"
1786              WARNING: This feature is highly experimental. It is not
1787              recommended for production code.
1788
1789              This pattern matches nothing and causes the end of successful
1790              matching at the point at which the "(*ACCEPT)" pattern was
1791              encountered, regardless of whether there is actually more to
1792              match in the string. When inside of a nested pattern, such as
1793              recursion, or in a subpattern dynamically generated via
1794              "(??{})", only the innermost pattern is ended immediately.
1795
1796              If the "(*ACCEPT)" is inside of capturing groups then the groups
1797              are marked as ended at the point at which the "(*ACCEPT)" was
1798              encountered.  For instance:
1799
1800                'AB' =~ /(A (A|B(*ACCEPT)|C) D)(E)/x;
1801
                  will match; $1 will be "AB", $2 will be "B", and $3 will not
                  be set. If another branch in the inner parentheses was
                  matched, such as in the string 'ACDE', then the "D" and "E"
                  would have to be matched as well.
1806
1807   Backtracking
1808       NOTE: This section presents an abstract approximation of regular
1809       expression behavior.  For a more rigorous (and complicated) view of the
1810       rules involved in selecting a match among possible alternatives, see
1811       "Combining RE Pieces".
1812
           A fundamental feature of regular expression matching involves the
           notion called backtracking, which is currently used (when needed)
           by all non-possessive regular expression quantifiers, namely "*",
           "*?", "+", "+?", "{n,m}", and "{n,m}?".  Backtracking is often
           optimized internally, but the general principle outlined here is
           valid.
1818
1819       For a regular expression to match, the entire regular expression must
1820       match, not just part of it.  So if the beginning of a pattern
1821       containing a quantifier succeeds in a way that causes later parts in
1822       the pattern to fail, the matching engine backs up and recalculates the
1823       beginning part--that's why it's called backtracking.
1824
1825       Here is an example of backtracking:  Let's say you want to find the
1826       word following "foo" in the string "Food is on the foo table.":
1827
1828           $_ = "Food is on the foo table.";
1829           if ( /\b(foo)\s+(\w+)/i ) {
1830               print "$2 follows $1.\n";
1831           }
1832
1833       When the match runs, the first part of the regular expression
1834       ("\b(foo)") finds a possible match right at the beginning of the
1835       string, and loads up $1 with "Foo".  However, as soon as the matching
1836       engine sees that there's no whitespace following the "Foo" that it had
1837       saved in $1, it realizes its mistake and starts over again one
1838       character after where it had the tentative match.  This time it goes
1839       all the way until the next occurrence of "foo". The complete regular
1840       expression matches this time, and you get the expected output of "table
1841       follows foo."
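
       One way to see that the engine gave up on the leading "Foo" is to
       look at where the successful match began.  The sketch below is only
       an illustration; the helper name word_after_foo is invented here,
       and it relies on the standard @- array, whose first element holds
       the start offset of the last successful match.

```perl
use strict;
use warnings;

# Hypothetical helper: returns ($1, $2, start offset of the whole match),
# or an empty list if the pattern from the text does not match.
sub word_after_foo {
    my ($text) = @_;
    return unless $text =~ /\b(foo)\s+(\w+)/i;
    return ($1, $2, $-[0]);    # $-[0] is the offset where the match began
}

my ($foo, $next, $offset) = word_after_foo("Food is on the foo table.");
print "$next follows $foo (match began at offset $offset)\n";
```

       The offset is 15 rather than 0, which is the position of the
       second, lower-case "foo".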
1842
1843       Sometimes minimal matching can help a lot.  Imagine you'd like to match
1844       everything between "foo" and "bar".  Initially, you write something
1845       like this:
1846
1847           $_ =  "The food is under the bar in the barn.";
1848           if ( /foo(.*)bar/ ) {
1849               print "got <$1>\n";
1850           }
1851
       This, perhaps unexpectedly, yields:
1853
1854         got <d is under the bar in the >
1855
1856       That's because ".*" was greedy, so you get everything between the first
1857       "foo" and the last "bar".  Here it's more effective to use minimal
1858       matching to make sure you get the text between a "foo" and the first
1859       "bar" thereafter.
1860
1861           if ( /foo(.*?)bar/ ) { print "got <$1>\n" }
1862         got <d is under the >
1863
1864       Here's another example. Let's say you'd like to match a number at the
1865       end of a string, and you also want to keep the preceding part of the
1866       match.  So you write this:
1867
1868           $_ = "I have 2 numbers: 53147";
1869           if ( /(.*)(\d*)/ ) {                                # Wrong!
1870               print "Beginning is <$1>, number is <$2>.\n";
1871           }
1872
       That won't work at all, because ".*" was greedy and gobbled up the
       whole string.  Since "\d*" can match the empty string, the complete
       regular expression still matched successfully.
1876
1877           Beginning is <I have 2 numbers: 53147>, number is <>.
1878
1879       Here are some variants, most of which don't work:
1880
1881           $_ = "I have 2 numbers: 53147";
1882           @pats = qw{
1883               (.*)(\d*)
1884               (.*)(\d+)
1885               (.*?)(\d*)
1886               (.*?)(\d+)
1887               (.*)(\d+)$
1888               (.*?)(\d+)$
1889               (.*)\b(\d+)$
1890               (.*\D)(\d+)$
1891           };
1892
1893           for $pat (@pats) {
1894               printf "%-12s ", $pat;
1895               if ( /$pat/ ) {
1896                   print "<$1> <$2>\n";
1897               } else {
1898                   print "FAIL\n";
1899               }
1900           }
1901
1902       That will print out:
1903
1904           (.*)(\d*)    <I have 2 numbers: 53147> <>
1905           (.*)(\d+)    <I have 2 numbers: 5314> <7>
1906           (.*?)(\d*)   <> <>
1907           (.*?)(\d+)   <I have > <2>
1908           (.*)(\d+)$   <I have 2 numbers: 5314> <7>
1909           (.*?)(\d+)$  <I have 2 numbers: > <53147>
1910           (.*)\b(\d+)$ <I have 2 numbers: > <53147>
1911           (.*\D)(\d+)$ <I have 2 numbers: > <53147>
1912
1913       As you see, this can be a bit tricky.  It's important to realize that a
1914       regular expression is merely a set of assertions that gives a
1915       definition of success.  There may be 0, 1, or several different ways
1916       that the definition might succeed against a particular string.  And if
1917       there are multiple ways it might succeed, you need to understand
1918       backtracking to know which variety of success you will achieve.
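
       The three variants that gave the hoped-for split above are not
       interchangeable on other inputs.  The following sketch, which uses
       a made-up helper called try_split and two test strings chosen for
       this illustration, shows where they part ways.

```perl
use strict;
use warnings;

# Hypothetical helper: apply one pattern to one string and report the two
# captures, or the string "FAIL" when there is no match.
sub try_split {
    my ($pattern, $string) = @_;
    return $string =~ /$pattern/ ? "<$1> <$2>" : 'FAIL';
}

for my $string ('abc123', '123') {
    for my $pattern ('(.*?)(\d+)$', '(.*)\b(\d+)$', '(.*\D)(\d+)$') {
        printf "%-8s %-14s %s\n", $string, $pattern, try_split($pattern, $string);
    }
}
```

       With 'abc123' the "\b" variant fails, because there is no word
       boundary between "c" and "1".  With '123' the "\D" variant fails,
       because it insists on at least one non-digit before the number.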
1919
1920       When using look-ahead assertions and negations, this can all get even
1921       trickier.  Imagine you'd like to find a sequence of non-digits not
1922       followed by "123".  You might try to write that as
1923
1924           $_ = "ABC123";
1925           if ( /^\D*(?!123)/ ) {                # Wrong!
1926               print "Yup, no 123 in $_\n";
1927           }
1928
1929       But that isn't going to match; at least, not the way you're hoping.  It
1930       claims that there is no 123 in the string.  Here's a clearer picture of
1931       why that pattern matches, contrary to popular expectations:
1932
1933           $x = 'ABC123';
1934           $y = 'ABC445';
1935
1936           print "1: got $1\n" if $x =~ /^(ABC)(?!123)/;
1937           print "2: got $1\n" if $y =~ /^(ABC)(?!123)/;
1938
1939           print "3: got $1\n" if $x =~ /^(\D*)(?!123)/;
1940           print "4: got $1\n" if $y =~ /^(\D*)(?!123)/;
1941
1942       This prints
1943
1944           2: got ABC
1945           3: got AB
1946           4: got ABC
1947
       You might have expected test 3 to fail because it seems to be a more
       general-purpose version of test 1.  The important difference between
1950       them is that test 3 contains a quantifier ("\D*") and so can use
1951       backtracking, whereas test 1 will not.  What's happening is that you've
1952       asked "Is it true that at the start of $x, following 0 or more non-
1953       digits, you have something that's not 123?"  If the pattern matcher had
1954       let "\D*" expand to "ABC", this would have caused the whole pattern to
1955       fail.
1956
1957       The search engine will initially match "\D*" with "ABC".  Then it will
1958       try to match "(?!123)" with "123", which fails.  But because a
1959       quantifier ("\D*") has been used in the regular expression, the search
1960       engine can backtrack and retry the match differently in the hope of
1961       matching the complete regular expression.
1962
1963       The pattern really, really wants to succeed, so it uses the standard
1964       pattern back-off-and-retry and lets "\D*" expand to just "AB" this
1965       time.  Now there's indeed something following "AB" that is not "123".
1966       It's "C123", which suffices.
1967
1968       We can deal with this by using both an assertion and a negation.  We'll
1969       say that the first part in $1 must be followed both by a digit and by
1970       something that's not "123".  Remember that the look-aheads are zero-
1971       width expressions--they only look, but don't consume any of the string
1972       in their match.  So rewriting this way produces what you'd expect; that
1973       is, case 5 will fail, but case 6 succeeds:
1974
1975           print "5: got $1\n" if $x =~ /^(\D*)(?=\d)(?!123)/;
1976           print "6: got $1\n" if $y =~ /^(\D*)(?=\d)(?!123)/;
1977
1978           6: got ABC
1979
1980       In other words, the two zero-width assertions next to each other work
1981       as though they're ANDed together, just as you'd use any built-in
1982       assertions:  "/^$/" matches only if you're at the beginning of the line
1983       AND the end of the line simultaneously.  The deeper underlying truth is
1984       that juxtaposition in regular expressions always means AND, except when
1985       you write an explicit OR using the vertical bar.  "/ab/" means match
1986       "a" AND (then) match "b", although the attempted matches are made at
1987       different positions because "a" is not a zero-width assertion, but a
1988       one-width assertion.
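
       The ANDing of adjacent zero-width assertions can be put to work
       whenever several independent conditions must hold at one position.
       The sketch below is an assumed example, not taken from this page:
       the sub looks_strong and its three requirements are invented
       purely to show two look-aheads applied at the start of a string.

```perl
use strict;
use warnings;

# Hypothetical check: true only if the string contains a digit AND a
# lower-case letter AND is at least six characters long.  Both look-aheads
# are tested at offset 0 and consume nothing.
sub looks_strong {
    my ($candidate) = @_;
    return $candidate =~ /\A(?=.*\d)(?=.*[a-z]).{6,}\z/ ? 1 : 0;
}

print "$_: ", looks_strong($_), "\n" for qw(abc123 abcdef 123456 a1);
```

       Only 'abc123' satisfies all three conditions at once.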
1989
1990       WARNING: Particularly complicated regular expressions can take
1991       exponential time to solve because of the immense number of possible
1992       ways they can use backtracking to try for a match.  For example,
1993       without internal optimizations done by the regular expression engine,
1994       this will take a painfully long time to run:
1995
1996           'aaaaaaaaaaaa' =~ /((a{0,5}){0,5})*[c]/
1997
1998       And if you used "*"'s in the internal groups instead of limiting them
1999       to 0 through 5 matches, then it would take forever--or until you ran
2000       out of stack space.  Moreover, these internal optimizations are not
2001       always applicable.  For example, if you put "{0,5}" instead of "*" on
2002       the external group, no current optimization is applicable, and the
2003       match takes a long time to finish.
2004
2005       A powerful tool for optimizing such beasts is what is known as an
2006       "independent group", which does not backtrack (see ""(?>pattern)"").
2007       Note also that zero-length look-ahead/look-behind assertions will not
2008       backtrack to make the tail match, since they are in "logical" context:
2009       only whether they match is considered relevant.  For an example where
2010       side-effects of look-ahead might have influenced the following match,
2011       see ""(?>pattern)"".
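
       A minimal sketch of what "does not backtrack" means for an
       independent group follows.  The two sub names are made up for the
       illustration; the patterns differ only in the "(?>...)" wrapper.

```perl
use strict;
use warnings;

# Ordinary greedy "a*" gives back one "a" so that "ab" can still match.
sub matches_with_backtracking { return $_[0] =~ /\Aa*ab/ ? 1 : 0 }

# Inside (?>...) the "a*" keeps every "a" it took, so "ab" cannot match.
sub matches_independent { return $_[0] =~ /\A(?>a*)ab/ ? 1 : 0 }

printf "backtracking: %d, independent: %d\n",
    matches_with_backtracking('aaab'), matches_independent('aaab');
```

       The same property that makes the second pattern fail here is what
       stops the engine from exploring an exponential number of ways to
       divide a run of characters among nested quantifiers.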
2012
2013   Version 8 Regular Expressions
2014       In case you're not familiar with the "regular" Version 8 regex
2015       routines, here are the pattern-matching rules not described above.
2016
2017       Any single character matches itself, unless it is a metacharacter with
2018       a special meaning described here or above.  You can cause characters
2019       that normally function as metacharacters to be interpreted literally by
2020       prefixing them with a "\" (e.g., "\." matches a ".", not any character;
2021       "\\" matches a "\"). This escape mechanism is also required for the
2022       character used as the pattern delimiter.
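
       When the text to be matched literally is held in a variable,
       escaping by hand is error-prone.  The sketch below assumes the
       standard "\Q...\E" quoting (equivalent to the quotemeta function)
       and uses an invented helper name, contains_literal.

```perl
use strict;
use warnings;

# Hypothetical helper: does $text contain $literal exactly as written?
# \Q...\E backslashes every metacharacter in the interpolated $literal.
sub contains_literal {
    my ($text, $literal) = @_;
    return $text =~ /\Q$literal\E/ ? 1 : 0;
}

print contains_literal('version 1.5', '1.5'), "\n";   # "." is literal here
print contains_literal('version 105', '1.5'), "\n";   # so "105" is rejected

# Choosing another delimiter avoids having to escape "/" in a path.
print "path ok\n" if '/usr/bin/perl' =~ m{\A/usr/bin/};
```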
2023
2024       A series of characters matches that series of characters in the target
2025       string, so the pattern "blurfl" would match "blurfl" in the target
2026       string.
2027
2028       You can specify a character class, by enclosing a list of characters in
2029       "[]", which will match any character from the list.  If the first
2030       character after the "[" is "^", the class matches any character not in
2031       the list.  Within a list, the "-" character specifies a range, so that
2032       "a-z" represents all characters between "a" and "z", inclusive.  If you
2033       want either "-" or "]" itself to be a member of a class, put it at the
2034       start of the list (possibly after a "^"), or escape it with a
2035       backslash.  "-" is also taken literally when it is at the end of the
2036       list, just before the closing "]".  (The following all specify the same
2037       class of three characters: "[-az]", "[az-]", and "[a\-z]".  All are
2038       different from "[a-z]", which specifies a class containing twenty-six
2039       characters, even on EBCDIC-based character sets.)  Also, if you try to
2040       use the character classes "\w", "\W", "\s", "\S", "\d", or "\D" as
2041       endpoints of a range, the "-" is understood literally.
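
       The claim about the three equivalent classes can be checked by
       brute force.  This is only a sketch, restricted to the ASCII range,
       with a made-up helper called class_members.

```perl
use strict;
use warnings;

# Hypothetical helper: list the ASCII characters a bracketed class matches.
sub class_members {
    my ($class) = @_;
    return join '', grep { /\A$class\z/ } map { chr($_) } 0 .. 127;
}

print "$_ => ", length(class_members($_)), " characters\n"
    for '[-az]', '[az-]', '[a\-z]', '[a-z]';
```

       The first three classes each contain the same three characters,
       while "[a-z]" contains twenty-six.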
2042
2043       Note also that the whole range idea is rather unportable between
2044       character sets--and even within character sets they may cause results
2045       you probably didn't expect.  A sound principle is to use only ranges
2046       that begin from and end at either alphabetics of equal case ([a-e],
2047       [A-E]), or digits ([0-9]).  Anything else is unsafe.  If in doubt,
2048       spell out the character sets in full.
2049
2050       Characters may be specified using a metacharacter syntax much like that
2051       used in C: "\n" matches a newline, "\t" a tab, "\r" a carriage return,
2052       "\f" a form feed, etc.  More generally, \nnn, where nnn is a string of
2053       three octal digits, matches the character whose coded character set
2054       value is nnn.  Similarly, \xnn, where nn are hexadecimal digits,
2055       matches the character whose ordinal is nn. The expression \cx matches
2056       the character control-x.  Finally, the "." metacharacter matches any
2057       character except "\n" (unless you use "/s").
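
       As a small illustration of these escapes, the sketch below spells
       the tab character four ways and counts how many of them match.  The
       sub name tab_spellings_agree is an assumption of this example.

```perl
use strict;
use warnings;

# A tab is octal 011, hexadecimal 09, and control-I, so each of these
# patterns should match a string consisting of exactly one tab.
sub tab_spellings_agree {
    my $tab  = "\t";
    my @hits = grep { $tab =~ $_ }
        qr/\A\t\z/, qr/\A\011\z/, qr/\A\x09\z/, qr/\A\cI\z/;
    return scalar @hits;
}

print tab_spellings_agree(), " of 4 spellings matched\n";

# "." skips newlines unless /s is in effect.
print "without /s: ", ("a\nb" =~ /a.b/  ? 'match' : 'no match'), "\n";
print "with /s:    ", ("a\nb" =~ /a.b/s ? 'match' : 'no match'), "\n";
```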
2058
2059       You can specify a series of alternatives for a pattern using "|" to
2060       separate them, so that "fee|fie|foe" will match any of "fee", "fie", or
2061       "foe" in the target string (as would "f(e|i|o)e").  The first
2062       alternative includes everything from the last pattern delimiter ("(",
2063       "(?:", etc. or the beginning of the pattern) up to the first "|", and
2064       the last alternative contains everything from the last "|" to the next
2065       closing pattern delimiter.  That's why it's common practice to include
2066       alternatives in parentheses: to minimize confusion about where they
2067       start and end.
2068
       Alternatives are tried from left to right, so the first alternative
       for which the entire expression matches is the one that is chosen.
       This means that alternatives are not necessarily greedy.  For
2072       example: when matching "foo|foot" against "barefoot", only the "foo"
2073       part will match, as that is the first alternative tried, and it
2074       successfully matches the target string. (This might not seem important,
2075       but it is important when you are capturing matched text using
2076       parentheses.)
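
       The effect of alternative order on captured text can be seen in a
       short sketch.  The helper first_capture is invented for this
       example; the last call also shows "|" acting as an ordinary
       character inside square brackets.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text captured by the first group,
# or undef when the pattern does not match.
sub first_capture {
    my ($string, $regex) = @_;
    return $string =~ $regex ? $1 : undef;
}

print first_capture('barefoot', qr/(foo|foot)/), "\n";   # "foo" is tried first
print first_capture('barefoot', qr/(foot|foo)/), "\n";   # "foot" is tried first
print first_capture('x|y', qr/([fee|fie|foe])/), "\n";   # class contains "|"
```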
2077
2078       Also remember that "|" is interpreted as a literal within square
2079       brackets, so if you write "[fee|fie|foe]" you're really only matching
2080       "[feio|]".
2081
2082       Within a pattern, you may designate subpatterns for later reference by
2083       enclosing them in parentheses, and you may refer back to the nth
2084       subpattern later in the pattern using the metacharacter \n or \gn.
2085       Subpatterns are numbered based on the left to right order of their
2086       opening parenthesis.  A backreference matches whatever actually matched
2087       the subpattern in the string being examined, not the rules for that
2088       subpattern.  Therefore, "(0|0x)\d*\s\g1\d*" will match "0x1234 0x4321",
2089       but not "0x1234 01234", because subpattern 1 matched "0x", even though
2090       the rule "0|0x" could potentially match the leading 0 in the second
2091       number.
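
       The backreference example above can be tried directly.  In this
       sketch the sub name same_prefix is made up; the pattern is the one
       from the text.

```perl
use strict;
use warnings;

# \g1 must repeat whatever group 1 actually matched ("0x" here), not
# merely something that the rule "0|0x" could have matched.
sub same_prefix {
    my ($string) = @_;
    return $string =~ /(0|0x)\d*\s\g1\d*/ ? 1 : 0;
}

print "$_: ", same_prefix($_), "\n" for '0x1234 0x4321', '0x1234 01234';
```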
2092
2093   Warning on \1 Instead of $1
2094       Some people get too used to writing things like:
2095
2096           $pattern =~ s/(\W)/\\\1/g;
2097
2098       This is grandfathered (for \1 to \9) for the RHS of a substitute to
2099       avoid shocking the sed addicts, but it's a dirty habit to get into.
2100       That's because in PerlThink, the righthand side of an "s///" is a
2101       double-quoted string.  "\1" in the usual double-quoted string means a
2102       control-A.  The customary Unix meaning of "\1" is kludged in for
2103       "s///".  However, if you get into the habit of doing that, you get
2104       yourself into trouble if you then add an "/e" modifier.
2105
2106           s/(\d+)/ \1 + 1 /eg;            # causes warning under -w
2107
2108       Or if you try to do
2109
2110           s/(\d+)/\1000/;
2111
2112       You can't disambiguate that by saying "\{1}000", whereas you can fix it
2113       with "${1}000".  The operation of interpolation should not be confused
2114       with the operation of matching a backreference.  Certainly they mean
2115       two different things on the left side of the "s///".
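
       A sketch of the recommended spelling follows.  The two sub names
       are assumptions made for the example; the replacements use $1
       rather than "\1".

```perl
use strict;
use warnings;

# Append three zeros to the first number.  The braces in ${1} keep "000"
# from being read as part of the variable name.
sub append_zeros {
    my ($text) = @_;
    $text =~ s/(\d+)/${1}000/;
    return $text;
}

# With /e the replacement is Perl code, so $1 is an ordinary variable.
sub increment_all {
    my ($text) = @_;
    $text =~ s/(\d+)/$1 + 1/eg;
    return $text;
}

print append_zeros('port 42'), "\n";
print increment_all('a1 b22'), "\n";
```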
2116
2117   Repeated Patterns Matching a Zero-length Substring
2118       WARNING: Difficult material (and prose) ahead.  This section needs a
2119       rewrite.
2120
2121       Regular expressions provide a terse and powerful programming language.
2122       As with most other power tools, power comes together with the ability
2123       to wreak havoc.
2124
2125       A common abuse of this power stems from the ability to make infinite
2126       loops using regular expressions, with something as innocuous as:
2127
2128           'foo' =~ m{ ( o? )* }x;
2129
2130       The "o?" matches at the beginning of 'foo', and since the position in
2131       the string is not moved by the match, "o?" would match again and again
2132       because of the "*" quantifier.  Another common way to create a similar
2133       cycle is with the looping modifier "//g":
2134
2135           @matches = ( 'foo' =~ m{ o? }xg );
2136
2137       or
2138
2139           print "match: <$&>\n" while 'foo' =~ m{ o? }xg;
2140
2141       or the loop implied by split().
2142
2143       However, long experience has shown that many programming tasks may be
2144       significantly simplified by using repeated subexpressions that may
       match zero-length substrings.  Here's a simple example:
2146
2147           @chars = split //, $string;           # // is not magic in split
2148           ($whitewashed = $string) =~ s/()/ /g; # parens avoid magic s// /
2149
2150       Thus Perl allows such constructs, by forcefully breaking the infinite
2151       loop.  The rules for this are different for lower-level loops given by
2152       the greedy quantifiers "*+{}", and for higher-level ones like the "/g"
2153       modifier or split() operator.
2154
2155       The lower-level loops are interrupted (that is, the loop is broken)
2156       when Perl detects that a repeated expression matched a zero-length
2157       substring.   Thus
2158
2159          m{ (?: NON_ZERO_LENGTH | ZERO_LENGTH )* }x;
2160
2161       is made equivalent to
2162
2163          m{ (?: NON_ZERO_LENGTH )* (?: ZERO_LENGTH )? }x;
2164
2165       For example, this program
2166
2167          #!perl -l
2168          "aaaaab" =~ /
2169            (?:
2170               a                 # non-zero
2171               |                 # or
2172              (?{print "hello"}) # print hello whenever this
2173                                 #    branch is tried
2174              (?=(b))            # zero-width assertion
2175            )*  # any number of times
2176           /x;
2177          print $&;
2178          print $1;
2179
2180       prints
2181
2182          hello
2183          aaaaa
2184          b
2185
       Notice that "hello" is only printed once: when Perl sees that the
       sixth iteration of the outermost "(?:)*" matches a zero-length string,
       it stops the "*".
2189
       The higher-level loops preserve an additional state between iterations:
       whether the last match was zero-length.  To break the loop, the match
       that follows a zero-length match is prohibited from having a length of
       zero.  This prohibition interacts with backtracking (see
       "Backtracking"), and so the second best match is chosen if the best
       match is of zero length.
2196
2197       For example:
2198
2199           $_ = 'bar';
2200           s/\w??/<$&>/g;
2201
2202       results in "<><b><><a><><r><>".  At each position of the string the
2203       best match given by non-greedy "??" is the zero-length match, and the
2204       second best match is what is matched by "\w".  Thus zero-length matches
2205       alternate with one-character-long matches.
2206
2207       Similarly, for repeated "m/()/g" the second-best match is the match at
2208       the position one notch further in the string.
2209
2210       The additional state of being matched with zero-length is associated
2211       with the matched string, and is reset by each assignment to pos().
2212       Zero-length matches at the end of the previous match are ignored during
2213       "split".
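
       The rules for the higher-level loops can be observed with a few
       one-liners wrapped in subs.  This is a sketch only, and the sub
       names are invented for it.

```perl
use strict;
use warnings;

# Collect every match of an optional "o" with /g in list context.
sub optional_o_matches {
    my ($string) = @_;
    my @matches = $string =~ m{ o? }xg;
    return @matches;
}

# The substitution from the text, wrapped so the result can be inspected.
sub bracket_lazy_word_chars {
    my ($string) = @_;
    $string =~ s/\w??/<$&>/g;
    return $string;
}

print join('', map { "<$_>" } optional_o_matches('foo')), "\n";
print bracket_lazy_word_chars('bar'), "\n";
print join(',', split //, 'abc'), "\n";
```

       The first call terminates with four matches: an empty one before
       "f", the two "o" characters, and an empty one at the end of the
       string.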
2214
2215   Combining RE Pieces
2216       Each of the elementary pieces of regular expressions which were
2217       described before (such as "ab" or "\Z") could match at most one
2218       substring at the given position of the input string.  However, in a
2219       typical regular expression these elementary pieces are combined into
2220       more complicated patterns using combining operators "ST", "S|T", "S*"
2221       etc.  (in these examples "S" and "T" are regular subexpressions).
2222
2223       Such combinations can include alternatives, leading to a problem of
2224       choice: if we match a regular expression "a|ab" against "abc", will it
2225       match substring "a" or "ab"?  One way to describe which substring is
2226       actually matched is the concept of backtracking (see "Backtracking").
2227       However, this description is too low-level and makes you think in terms
2228       of a particular implementation.
2229
2230       Another description starts with notions of "better"/"worse".  All the
2231       substrings which may be matched by the given regular expression can be
2232       sorted from the "best" match to the "worst" match, and it is the "best"
       match which is chosen.  This replaces the question of "what is
       chosen?" with the question of "which matches are better, and which are
       worse?".
2236
2237       Again, for elementary pieces there is no such question, since at most
2238       one match at a given position is possible.  This section describes the
2239       notion of better/worse for combining operators.  In the description
2240       below "S" and "T" are regular subexpressions.
2241
2242       "ST"
           Consider two possible matches, "AB" and "A'B'", where "A" and "A'"
           are substrings which can be matched by "S", and "B" and "B'" are
           substrings which can be matched by "T".
2246
2247           If "A" is a better match for "S" than "A'", "AB" is a better match
2248           than "A'B'".
2249
2250           If "A" and "A'" coincide: "AB" is a better match than "AB'" if "B"
2251           is a better match for "T" than "B'".
2252
2253       "S|T"
2254           When "S" can match, it is a better match than when only "T" can
2255           match.
2256
           The ordering of two matches for "S" is the same as it is for "S"
           alone, and similarly for two matches for "T".
2259
2260       "S{REPEAT_COUNT}"
2261           Matches as "SSS...S" (repeated as many times as necessary).
2262
2263       "S{min,max}"
2264           Matches as "S{max}|S{max-1}|...|S{min+1}|S{min}".
2265
2266       "S{min,max}?"
2267           Matches as "S{min}|S{min+1}|...|S{max-1}|S{max}".
2268
2269       "S?", "S*", "S+"
2270           Same as "S{0,1}", "S{0,BIG_NUMBER}", "S{1,BIG_NUMBER}"
2271           respectively.
2272
2273       "S??", "S*?", "S+?"
2274           Same as "S{0,1}?", "S{0,BIG_NUMBER}?", "S{1,BIG_NUMBER}?"
2275           respectively.
2276
2277       "(?>S)"
2278           Matches the best match for "S" and only that.
2279
2280       "(?=S)", "(?<=S)"
2281           Only the best match for "S" is considered.  (This is important only
2282           if "S" has capturing parentheses, and backreferences are used
2283           somewhere else in the whole regular expression.)
2284
2285       "(?!S)", "(?<!S)"
2286           For this grouping operator there is no need to describe the
2287           ordering, since only whether or not "S" can match is important.
2288
2289       "(??{ EXPR })", "(?PARNO)"
2290           The ordering is the same as for the regular expression which is the
2291           result of EXPR, or the pattern contained by capture group PARNO.
2292
2293       "(?(condition)yes-pattern|no-pattern)"
2294           Recall that which of "yes-pattern" or "no-pattern" actually matches
2295           is already determined.  The ordering of the matches is the same as
2296           for the chosen subexpression.
2297
2298       The above recipes describe the ordering of matches at a given position.
2299       One more rule is needed to understand how a match is determined for the
2300       whole regular expression: a match at an earlier position is always
2301       better than a match at a later position.
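
       A few of the ordering rules above can be checked with a short
       sketch.  The helper matched_text is made up for this illustration;
       it rebuilds the matched substring from the standard @- and @+
       offset arrays.

```perl
use strict;
use warnings;

# Hypothetical helper: return the substring the pattern matched, or undef.
sub matched_text {
    my ($string, $regex) = @_;
    return $string =~ $regex
        ? substr($string, $-[0], $+[0] - $-[0])
        : undef;
}

print matched_text('abc',  qr/a|ab/),    "\n";  # S|T: the "a" branch wins
print matched_text('abc',  qr/ab|a/),    "\n";  # branch order reversed
print matched_text('aaaa', qr/a{1,3}/),  "\n";  # greedy: highest count first
print matched_text('aaaa', qr/a{1,3}?/), "\n";  # minimal: lowest count first
print matched_text('b ab', qr/ab|b/),    "\n";  # earlier position wins
```

       In the last call the match is the "b" at offset 0, even though the
       "ab" branch is listed first, because a match at an earlier position
       is always preferred.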
2302
2303   Creating Custom RE Engines
       As of Perl 5.10.0, one can create custom regular expression engines.
       This is not for the faint of heart, as such engines have to plug in at
       the C level.  See perlreapi for more details.
2307
2308       As an alternative, overloaded constants (see overload) provide a simple
2309       way to extend the functionality of the RE engine, by substituting one
2310       pattern for another.
2311
2312       Suppose that we want to enable a new RE escape-sequence "\Y|" which
2313       matches at a boundary between whitespace characters and non-whitespace
2314       characters.  Note that "(?=\S)(?<!\S)|(?!\S)(?<=\S)" matches exactly at
2315       these positions, so we want to have each "\Y|" in the place of the more
2316       complicated version.  We can create a module "customre" to do this:
2317
2318           package customre;
2319           use overload;
2320
2321           sub import {
2322             shift;
2323             die "No argument to customre::import allowed" if @_;
2324             overload::constant 'qr' => \&convert;
2325           }
2326
2327           sub invalid { die "/$_[0]/: invalid escape '\\$_[1]'"}
2328
2329           # We must also take care of not escaping the legitimate \\Y|
2330           # sequence, hence the presence of '\\' in the conversion rules.
2331           my %rules = ( '\\' => '\\\\',
2332                         'Y|' => qr/(?=\S)(?<!\S)|(?!\S)(?<=\S)/ );
2333           sub convert {
2334             my $re = shift;
2335             $re =~ s{
2336                       \\ ( \\ | Y . )
2337                     }
2338                     { $rules{$1} or invalid($re,$1) }sgex;
2339             return $re;
2340           }
2341
2342       Now "use customre" enables the new escape in constant regular
2343       expressions, i.e., those without any runtime variable interpolations.
2344       As documented in overload, this conversion will work only over literal
2345       parts of regular expressions.  For "\Y|$re\Y|" the variable part of
2346       this regular expression needs to be converted explicitly (but only if
2347       the special meaning of "\Y|" should be enabled inside $re):
2348
2349           use customre;
2350           $re = <>;
2351           chomp $re;
2352           $re = customre::convert $re;
2353           /\Y|$re\Y|/;
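
       To experiment with the conversion step on its own, without
       installing a module or touching overload, something like the
       following stand-alone sketch may help.  The name convert_boundaries
       is invented here; its body is adapted from the "convert" sub above.

```perl
use strict;
use warnings;

# Stand-alone adaptation of the conversion step: rewrites each "\Y|" in a
# pattern string and leaves escaped backslashes alone.
sub convert_boundaries {
    my ($re) = @_;
    my %rules = ( '\\' => '\\\\',
                  'Y|' => qr/(?=\S)(?<!\S)|(?!\S)(?<=\S)/ );
    $re =~ s{ \\ ( \\ | Y . ) }
            { $rules{$1} or die "/$re/: invalid escape '\\$1'\n" }sgex;
    return $re;
}

my $pattern = convert_boundaries('\Y|foo\Y|');
for my $text ('a foo b', 'afoob') {
    printf "%-8s %s\n", $text, ($text =~ /$pattern/ ? 'match' : 'no match');
}
```

       Only 'a foo b' matches, since in 'afoob' the word "foo" has
       non-whitespace characters on both sides.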
2354
2355   PCRE/Python Support
2356       As of Perl 5.10.0, Perl supports several Python/PCRE-specific
2357       extensions to the regex syntax. While Perl programmers are encouraged
2358       to use the Perl-specific syntax, the following are also accepted:
2359
2360       "(?P<NAME>pattern)"
2361           Define a named capture group. Equivalent to "(?<NAME>pattern)".
2362
2363       "(?P=NAME)"
2364           Backreference to a named capture group. Equivalent to "\g{NAME}".
2365
2366       "(?P>NAME)"
2367           Subroutine call to a named capture group. Equivalent to "(?&NAME)".
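
       The three constructs can be exercised as in the sketch below.  The
       sub names and the sample strings are assumptions made for this
       example.

```perl
use strict;
use warnings;

# (?P<NAME>...) defines a group and (?P=NAME) refers back to it: returns
# the repeated word when the same word appears twice in a row.
sub doubled_word {
    my ($text) = @_;
    return $text =~ /\b(?P<word>\w+)\s+(?P=word)\b/ ? $+{word} : undef;
}

# (?P>NAME) calls the named group as a subroutine, here recursively, to
# accept a string that is one balanced pair of parentheses.
sub is_balanced {
    my ($text) = @_;
    return $text =~ /\A(?P<paren>\((?:[^()]++|(?P>paren))*\))\z/ ? 1 : 0;
}

print doubled_word('this is is a test'), "\n";
print is_balanced('(a(b)c)'), is_balanced('(a(b c)'), "\n";
```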
2368

BUGS

2370       Many regular expression constructs don't work on EBCDIC platforms.
2371
2372       There are a number of issues with regard to case-insensitive matching
2373       in Unicode rules.  See "i" under "Modifiers" above.
2374
2375       This document varies from difficult to understand to completely and
2376       utterly opaque.  The wandering prose riddled with jargon is hard to
2377       fathom in several places.
2378
2379       This document needs a rewrite that separates the tutorial content from
2380       the reference content.
2381
