PERLRE(1)              Perl Programmers Reference Guide              PERLRE(1)

NAME

       perlre - Perl regular expressions

DESCRIPTION

       This page describes the syntax of regular expressions in Perl.

       If you haven't used regular expressions before, a tutorial introduction
       is available in perlretut.  If you know just a little about them, a
       quick-start introduction is available in perlrequick.

       Except for "The Basics" section, this page assumes you are familiar
       with regular expression basics, such as what a "pattern" is, what it
       looks like, and how it is basically used.  For a reference on how they
       are used, plus various examples of the same, see discussions of "m//",
       "s///", "qr//" and "??" in "Regexp Quote-Like Operators" in perlop.

       New in v5.22, "use re 'strict'" applies stricter rules than otherwise
       when compiling regular expression patterns.  It can find things that,
       while legal, may not be what you intended.

   The Basics
       Regular expressions are strings with the very particular syntax and
       meaning described in this document and auxiliary documents referred to
       by this one.  The strings are called "patterns".  Patterns are used to
       determine if some other string, called the "target", has (or doesn't
       have) the characteristics specified by the pattern.  We call this
       "matching" the target string against the pattern.  Usually the match is
       done by having the target be the first operand, and the pattern be the
       second operand, of one of the two binary operators "=~" and "!~",
       listed in "Binding Operators" in perlop; and the pattern will have been
       converted from an ordinary string by one of the operators in "Regexp
       Quote-Like Operators" in perlop, like so:

        $foo =~ m/abc/

       This evaluates to true if and only if the string in the variable $foo
       contains, somewhere in it, the sequence of characters "a", "b", then
       "c".  (The "=~ m", or match operator, is described in
       "m/PATTERN/msixpodualngc" in perlop.)
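
       As a minimal sketch of the two binding operators (the target string
       and the variable names here are illustrative assumptions):

```perl
use strict;
use warnings;

my $foo = "xxabcxx";

# "=~" is true when the pattern matches somewhere in the target.
my $has_abc = ($foo =~ m/abc/) ? 1 : 0;

# "!~" is the negation: true when the pattern matches nowhere.
my $lacks_abd = ($foo !~ m/abd/) ? 1 : 0;

print "has_abc=$has_abc lacks_abd=$lacks_abd\n";   # has_abc=1 lacks_abd=1
```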

       Patterns that aren't already stored in some variable must be
       delimited, at both ends, by delimiter characters.  These are often,
       as in the example above, forward slashes, and the typical way a pattern
       is written in documentation is with those slashes.  In most cases, the
       delimiter is the same character, fore and aft, but there are a few
       cases where a character looks like it has a mirror-image mate, where
       the opening version is the beginning delimiter, and the closing one is
       the ending delimiter, like

        $foo =~ m<abc>

       Most times, the pattern is evaluated in double-quotish context, but it
       is possible to choose delimiters to force single-quotish, like

        $foo =~ m'abc'

       If the pattern contains its delimiter within it, that delimiter must be
       escaped.  Prefixing it with a backslash (e.g., "/foo\/bar/") serves
       this purpose.
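
       The delimiter choices above can be compared side by side.  This is
       only a sketch; the path string and the variable names are assumptions
       made for illustration:

```perl
use strict;
use warnings;

my $path = "/usr/local/bin/perl";

# With slash delimiters, every slash inside the pattern must be escaped.
my $escaped = ($path =~ /\/usr\/local\//) ? 1 : 0;

# Mirror-image delimiters such as braces avoid those escapes.
my $braces = ($path =~ m{/usr/local/}) ? 1 : 0;

# Double-quotish context interpolates $name.  Single-quote delimiters do
# not, so there the "$" stays in the pattern as a metacharacter.
my $name = "perl";
my $interpolated     = ($path =~ m/bin\/$name/) ? 1 : 0;
my $not_interpolated = ($path =~ m'bin/$name')  ? 1 : 0;

print "$escaped $braces $interpolated $not_interpolated\n";   # 1 1 1 0
```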

       Any single character in a pattern matches that same character in the
       target string, unless the character is a metacharacter with a special
       meaning described in this document.  A sequence of non-metacharacters
       matches the same sequence in the target string, as we saw above with
       "m/abc/".

       Only a few characters (all of them being ASCII punctuation characters)
       are metacharacters.  The most commonly used one is a dot ".", which
       normally matches almost any character (including a dot itself).

       You can cause characters that normally function as metacharacters to be
       interpreted literally by prefixing them with a "\", just as the
       pattern's delimiter must be escaped if it also occurs within the
       pattern.  Thus, "\." matches just a literal dot, "." instead of its
       normal meaning.  This means that the backslash is also a metacharacter,
       so "\\" matches a single "\".  And a sequence that contains an escaped
       metacharacter matches the same sequence (but without the escape) in the
       target string.  So, the pattern "/blur\\fl/" would match any target
       string that contains the sequence "blur\fl".
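
       A short sketch of the difference between a metacharacter and its
       escaped form (the target strings are assumptions chosen for
       illustration):

```perl
use strict;
use warnings;

# An unescaped dot matches almost any character, here the "+".
my $any_char = ("a+b" =~ /a.b/) ? 1 : 0;

# An escaped dot matches only a literal dot.
my $literal_dot = ("a+b" =~ /a\.b/) ? 1 : 0;
my $real_dot    = ("a.b" =~ /a\.b/) ? 1 : 0;

# "\\" in the pattern matches one backslash in the target.  The target is
# single-quoted so that it really contains a backslash followed by "f".
my $backslash = ('blur\fl' =~ /blur\\fl/) ? 1 : 0;

print "$any_char $literal_dot $real_dot $backslash\n";   # 1 0 1 1
```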

       The metacharacter "|" is used to match one thing or another.  Thus

        $foo =~ m/this|that/

       is TRUE if and only if $foo contains either the sequence "this" or the
       sequence "that".  Like all metacharacters, prefixing the "|" with a
       backslash makes it match the plain punctuation character; in its case,
       the VERTICAL LINE.

        $foo =~ m/this\|that/

       is TRUE if and only if $foo contains the sequence "this|that".

       You aren't limited to just a single "|".

        $foo =~ m/fee|fie|foe|fum/

       is TRUE if and only if $foo contains any of those 4 sequences from the
       children's story "Jack and the Beanstalk".

       As you can see, the "|" binds less tightly than a sequence of ordinary
       characters.  We can override this by using the grouping metacharacters,
       the parentheses "(" and ")".

        $foo =~ m/th(is|at) thing/

       is TRUE if and only if $foo contains either the sequence "this thing"
       or the sequence "that thing".  The portions of the string that match
       the portions of the pattern enclosed in parentheses are normally made
       available separately for use later in the pattern, substitution, or
       program.  This is called "capturing", and it can get complicated.  See
       "Capture groups".

       The first alternative includes everything from the last pattern
       delimiter ("(", "(?:" (described later), etc. or the beginning of the
       pattern) up to the first "|", and the last alternative contains
       everything from the last "|" to the next closing pattern delimiter.
       That's why it's common practice to include alternatives in parentheses:
       to minimize confusion about where they start and end.

       Alternatives are tried from left to right, so the first alternative
       found for which the entire expression matches is the one that is
       chosen.  This means that alternatives are not necessarily greedy.  For
       example: when matching "foo|foot" against "barefoot", only the "foo"
       part will match, as that is the first alternative tried, and it
       successfully matches the target string.  (This might not seem important,
       but it is important when you are capturing matched text using
       parentheses.)
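
       The left-to-right rule can be observed by capturing the chosen
       alternative.  This is a sketch only; the third pattern, with its "$"
       anchor, is an added illustration of the "entire expression" condition
       and not an example from the text above:

```perl
use strict;
use warnings;

# "foo" is tried first and succeeds, so "foot" is never tried.
my ($first) = "barefoot" =~ /(foo|foot)/;

# Reversing the alternatives lets the longer one win.
my ($reversed) = "barefoot" =~ /(foot|foo)/;

# With "$" after the group, "foo" fails to complete the whole expression,
# so the engine moves on to the second alternative.
my ($anchored) = "barefoot" =~ /(foo|foot)$/;

print "$first $reversed $anchored\n";   # foo foot foot
```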

       Besides taking away the special meaning of a metacharacter, a prefixed
       backslash changes some letter and digit characters away from matching
       just themselves to instead have special meaning.  These are called
       "escape sequences", and all such are described in perlrebackslash.  A
       backslash sequence (of a letter or digit) that doesn't currently have
       special meaning to Perl will raise a warning if warnings are enabled,
       as those are reserved for potential future use.

       One such sequence is "\b", which matches a boundary of some sort.
       "\b{wb}" and a few others give specialized types of boundaries.  (They
       are all described in detail starting at "\b{}, \b, \B{}, \B" in
       perlrebackslash.)  Note that these don't match characters, but the
       zero-width spaces between characters.  They are an example of a zero-
       width assertion.  Consider again,

        $foo =~ m/fee|fie|foe|fum/

       It evaluates to TRUE if, besides those 4 words, any of the sequences
       "feed", "field", "Defoe", "fume", and many others are in $foo.  By
       judicious use of "\b" (or better (because it is designed to handle
       natural language) "\b{wb}"), we can make sure that only the Giant's
       words are matched:

        $foo =~ m/\b(fee|fie|foe|fum)\b/
        $foo =~ m/\b{wb}(fee|fie|foe|fum)\b{wb}/

       The final example shows that the characters "{" and "}" are
       metacharacters.
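
       A small sketch of the effect of "\b" on the words mentioned above
       (the word list and the variable names are assumptions; only plain
       "\b" is used so that the code does not depend on v5.22):

```perl
use strict;
use warnings;

my @words = qw(fee feed field Defoe fume fum);

# Without boundaries, every word contains one of the four sequences.
my @loose = grep { /fee|fie|foe|fum/ } @words;

# With "\b" on both sides, only the whole words survive.
my @bounded = grep { /\b(fee|fie|foe|fum)\b/ } @words;

print "@loose\n";     # fee feed field Defoe fume fum
print "@bounded\n";   # fee fum
```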

       Another use for escape sequences is to specify characters that cannot
       (or which you prefer not to) be written literally.  These are described
       in detail in "Character Escapes" in perlrebackslash, but the next three
       paragraphs briefly describe some of them.

       Various control characters can be written in C language style: "\n"
       matches a newline, "\t" a tab, "\r" a carriage return, "\f" a form
       feed, etc.

       More generally, "\nnn", where nnn is a string of three octal digits,
       matches the character whose native code point is nnn.  You can easily
       run into trouble if you don't have exactly three digits.  So always use
       three, or since Perl 5.14, you can use "\o{...}" to specify any number
       of octal digits.

       Similarly, "\xnn", where nn are hexadecimal digits, matches the
       character whose native ordinal is nn.  Again, not using exactly two
       digits is a recipe for disaster, but you can use "\x{...}" to specify
       any number of hex digits.
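
       The following sketch exercises these escapes.  It assumes an ASCII
       platform, where "A" has code point 65 (octal 101, hex 41), and Perl
       5.14 or later for "\o{...}":

```perl
use strict;
use warnings;

# C-style control character escape.
my $tab = ("a\tb" =~ /a\tb/) ? 1 : 0;

# Exactly three octal digits, and the braced octal form.
my $octal        = ("A" =~ /\101/)    ? 1 : 0;
my $octal_braces = ("A" =~ /\o{101}/) ? 1 : 0;

# Exactly two hex digits, and the braced form for a larger code point.
my $hex        = ("A" =~ /\x41/) ? 1 : 0;
my $hex_braces = ("\x{263A}" =~ /\x{263A}/) ? 1 : 0;

print "$tab $octal $octal_braces $hex $hex_braces\n";   # 1 1 1 1 1
```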

       Besides being a metacharacter, the "." is an example of a "character
       class", something that can match any single character of a given set of
       them.  In its case, the set is just about all possible characters.
       Perl predefines several character classes besides the "."; there is a
       separate reference page about just these, perlrecharclass.

       You can define your own custom character classes, by putting into your
       pattern in the appropriate place(s), a list of all the characters you
       want in the set.  You do this by enclosing the list within "[]" bracket
       characters.  These are called "bracketed character classes" when we are
       being precise, but often the word "bracketed" is dropped.  (Dropping it
       usually doesn't cause confusion.)  This means that the "[" character is
       another metacharacter.  It doesn't match anything just by itself; it is
       used only to tell Perl that what follows it is a bracketed character
       class.  If you want to match a literal left square bracket, you must
       escape it, like "\[".  The matching "]" is also a metacharacter; again
       it doesn't match anything by itself, but just marks the end of your
       custom class to Perl.  It is an example of a "sometimes metacharacter".
       It isn't a metacharacter if there is no corresponding "[", and matches
       its literal self:

        print "]" =~ /]/;  # prints 1

       The list of characters within the character class gives the set of
       characters matched by the class.  "[abc]" matches a single "a" or "b"
       or "c".  But if the first character after the "[" is "^", the class
       instead matches any character not in the list.  Within a list, the "-"
       character specifies a range of characters, so that "a-z" represents all
       characters between "a" and "z", inclusive.  If you want either "-" or
       "]" itself to be a member of a class, put it at the start of the list
       (possibly after a "^"), or escape it with a backslash.  "-" is also
       taken literally when it is at the end of the list, just before the
       closing "]".  (The following all specify the same class of three
       characters: "[-az]", "[az-]", and "[a\-z]".  All are different from
       "[a-z]", which specifies a class containing twenty-six characters, even
       on EBCDIC-based character sets.)
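
       The equivalence of the three hyphen placements can be checked with a
       sketch like this one.  The four probe characters and the %members
       hash are assumptions for illustration:

```perl
use strict;
use warnings;

my @chars = ('-', 'a', 'm', 'z');
my %members;

# For each class, record which probe characters it matches.
for my $class ('[-az]', '[az-]', '[a\-z]', '[a-z]', '[^a-z]') {
    $members{$class} = join '', grep { /^$class$/ } @chars;
}

print "$_ matches: $members{$_}\n"
    for '[-az]', '[az-]', '[a\-z]', '[a-z]', '[^a-z]';
```

       The first three classes each match "-", "a" and "z" but not "m";
       "[a-z]" matches "a", "m" and "z" but not "-"; and the negated class
       matches only "-".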

       There is lots more to bracketed character classes; full details are in
       "Bracketed Character Classes" in perlrecharclass.

       Metacharacters

       "The Basics" introduced some of the metacharacters.  This section gives
       them all.  Most of them have the same meaning as in the egrep command.

       Only the "\" is always a metacharacter.  The others are metacharacters
       just sometimes.  The following table lists all of them, summarizes
       their use, and gives the contexts where they are metacharacters.
       Outside those contexts or if prefixed by a "\", they match their
       corresponding punctuation character.  In some cases, their meaning
       varies depending on various pattern modifiers that alter the default
       behaviors.  See "Modifiers".

                   PURPOSE                                  WHERE
        \   Escape the next character                    Always, except when
                                                         escaped by another \
        ^   Match the beginning of the string            Not in []
              (or line, if /m is used)
        ^   Complement the [] class                      At the beginning of []
        .   Match any single character except newline    Not in []
              (under /s, includes newline)
        $   Match the end of the string                  Not in [], but can
              (or before newline at the end of the       mean interpolate a
              string; or before any newline if /m is     scalar
              used)
        |   Alternation                                  Not in []
        ()  Grouping                                     Not in []
        [   Start Bracketed Character class              Not in []
        ]   End Bracketed Character class                Only in [], and
                                                           not first
        *   Matches the preceding element 0 or more      Not in []
              times
        +   Matches the preceding element 1 or more      Not in []
              times
        ?   Matches the preceding element 0 or 1         Not in []
              times
        {   Starts a sequence that gives number(s)       Not in []
              of times the preceding element can be
              matched
        {   when following certain escape sequences
              starts a modifier to the meaning of the
              sequence
        }   End sequence started by {
        -   Indicates a range                            Only in [] interior
        #   Beginning of comment, extends to line end    Only with /x modifier

       Notice that most of the metacharacters lose their special meaning when
       they occur in a bracketed character class, except "^" has a different
       meaning when it is at the beginning of such a class.  And "-" and "]"
       are metacharacters only at restricted positions within bracketed
       character classes; while "}" is a metacharacter only when closing a
       special construct started by "{".

       In double-quotish context, as is usually the case, you need to be
       careful about "$" and the non-metacharacter "@".  Those could
       interpolate variables, which may or may not be what you intended.

       These rules were designed for compactness of expression, rather than
       legibility and maintainability.  The "/x and /xx" pattern modifiers
       allow you to insert white space to improve readability.  And use of
       "re 'strict'" adds extra checking to catch some typos that might
       silently compile into something unintended.

       By default, the "^" character is guaranteed to match only the beginning
       of the string, the "$" character only the end (or before the newline at
       the end), and Perl does certain optimizations with the assumption that
       the string contains only one line.  Embedded newlines will not be
       matched by "^" or "$".  You may, however, wish to treat a string as a
       multi-line buffer, such that the "^" will match after any newline
       within the string (except if the newline is the last character in the
       string), and "$" will match before any newline.  At the cost of a
       little more overhead, you can do this by using the "/m" modifier on the
       pattern match operator.  (Older programs did this by setting $*, but
       this option was removed in perl 5.10.)

       To simplify multi-line substitutions, the "." character never matches a
       newline unless you use the "/s" modifier, which in effect tells Perl to
       pretend the string is a single line--even if it isn't.
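
       The effect of "/m" and "/s" can be seen in a sketch such as the
       following, where the two-line $text value is an assumption chosen for
       illustration:

```perl
use strict;
use warnings;

my $text = "first line\nsecond line\n";

# By default "^" matches only at the very start of the string.
my @default = $text =~ /^(\w+)/g;

# Under /m, "^" also matches after each embedded newline.
my @multi = $text =~ /^(\w+)/mg;

# By default "." does not match a newline; under /s it does.
my $dot_default = ($text =~ /line.second/)  ? 1 : 0;
my $dot_single  = ($text =~ /line.second/s) ? 1 : 0;

print "@default | @multi | $dot_default $dot_single\n";
# first | first second | 0 1
```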

   Modifiers
       Overview

       The default behavior for matching can be changed, using various
       modifiers.  Modifiers that relate to the interpretation of the pattern
       are listed just below.  Modifiers that alter the way a pattern is used
       by Perl are detailed in "Regexp Quote-Like Operators" in perlop and
       "Gory details of parsing quoted constructs" in perlop.  Modifiers can
       be added dynamically; see "Extended Patterns" below.

       "m" Treat the string being matched against as multiple lines.  That is,
           change "^" and "$" from matching the start of the string's first
           line and the end of its last line to matching the start and end of
           each line within the string.

       "s" Treat the string as single line.  That is, change "." to match any
           character whatsoever, even a newline, which normally it would not
           match.

           Used together, as "/ms", they let the "." match any character
           whatsoever, while still allowing "^" and "$" to match,
           respectively, just after and just before newlines within the
           string.

       "i" Do case-insensitive pattern matching.  For example, "A" will match
           "a" under "/i".

           If locale matching rules are in effect, the case map is taken from
           the current locale for code points less than 256, and from Unicode
           rules for larger code points.  However, matches that would cross
           the Unicode rules/non-Unicode rules boundary (ords 255/256) will
           not succeed, unless the locale is a UTF-8 one.  See perllocale.

           There are a number of Unicode characters that match a sequence of
           multiple characters under "/i".  For example, "LATIN SMALL LIGATURE
           FI" should match the sequence "fi".  Perl is not currently able to
           do this when the multiple characters are in the pattern and are
           split between groupings, or when one or more are quantified.  Thus

            "\N{LATIN SMALL LIGATURE FI}" =~ /fi/i;          # Matches
            "\N{LATIN SMALL LIGATURE FI}" =~ /[fi][fi]/i;    # Doesn't match!
            "\N{LATIN SMALL LIGATURE FI}" =~ /fi*/i;         # Doesn't match!

            # The below doesn't match, and it isn't clear what $1 and $2 would
            # be even if it did!!
            "\N{LATIN SMALL LIGATURE FI}" =~ /(f)(i)/i;      # Doesn't match!

           Perl doesn't match multiple characters in a bracketed character
           class unless the character that maps to them is explicitly
           mentioned, and it doesn't match them at all if the character class
           is inverted, which otherwise could be highly confusing.  See
           "Bracketed Character Classes" in perlrecharclass, and "Negation" in
           perlrecharclass.

       "x" and "xx"
           Extend your pattern's legibility by permitting whitespace and
           comments.  Details in "/x and /xx".

       "p" Preserve the string matched such that "${^PREMATCH}", "${^MATCH}",
           and "${^POSTMATCH}" are available for use after matching.

           In Perl 5.20 and higher this is ignored.  Due to a new copy-on-write
           mechanism, "${^PREMATCH}", "${^MATCH}", and "${^POSTMATCH}" will be
           available after the match regardless of the modifier.
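
           A minimal sketch of the three variables, assuming the target
           string "hello world" (the lexical names $pre, $match and $post
           are illustrative only).  The "/p" is kept so that the code also
           behaves this way on Perls older than 5.20:

```perl
use strict;
use warnings;

my ($pre, $match, $post);

# Copy the special variables right after a successful match.
if ("hello world" =~ /o w/p) {
    ($pre, $match, $post) = (${^PREMATCH}, ${^MATCH}, ${^POSTMATCH});
}

print "[$pre] [$match] [$post]\n";   # [hell] [o w] [orld]
```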

       "a", "d", "l", and "u"
           These modifiers, all new in 5.14, affect which character-set rules
           (Unicode, etc.) are used, as described below in "Character set
           modifiers".

       "n" Prevent the grouping metacharacters "()" from capturing.  This
           modifier, new in 5.22, will stop $1, $2, etc... from being filled
           in.

             "hello" =~ /(hi|hello)/;   # $1 is "hello"
             "hello" =~ /(hi|hello)/n;  # $1 is undef

           This is equivalent to putting "?:" at the beginning of every
           capturing group:

             "hello" =~ /(?:hi|hello)/; # $1 is undef

           "/n" can be negated on a per-group basis.  Alternatively, named
           captures may still be used.

             "hello" =~ /(?-n:(hi|hello))/n;   # $1 is "hello"
             "hello" =~ /(?<greet>hi|hello)/n; # $1 is "hello", $+{greet} is
                                               # "hello"

       Other Modifiers
           There are a number of flags that can be found at the end of regular
           expression constructs that are not generic regular expression
           flags, but apply to the operation being performed, like matching or
           substitution ("m//" or "s///" respectively).

           Flags described further in "Using regular expressions in Perl" in
           perlretut are:

             c  - keep the current position during repeated matching
             g  - globally match the pattern repeatedly in the string

           Substitution-specific modifiers described in
           "s/PATTERN/REPLACEMENT/msixpodualngcer" in perlop are:

             e  - evaluate the right-hand side as an expression
             ee - evaluate the right side as a string then eval the result
             o  - pretend to optimize your code, but actually introduce bugs
             r  - perform non-destructive substitution and return the new value

       Regular expression modifiers are usually written in documentation as
       e.g., "the "/x" modifier", even though the delimiter in question might
       not really be a slash.  The modifiers "/imnsxadlup" may also be
       embedded within the regular expression itself using the "(?...)"
       construct, see "Extended Patterns" below.
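
       As a brief sketch of embedding a modifier, here is "i" written inside
       the pattern, both for the rest of the pattern and scoped to one group.
       The target strings are assumptions; the constructs themselves are
       covered under "Extended Patterns":

```perl
use strict;
use warnings;

# "(?i)" turns on case-insensitivity for the remainder of the pattern.
my $inline = ("HELLO" =~ /(?i)hello/) ? 1 : 0;

# "(?i:...)" limits it to the enclosed group, so " world" stays
# case-sensitive.
my $scoped_ok   = ("HELLO world" =~ /(?i:hello) world/) ? 1 : 0;
my $scoped_fail = ("HELLO WORLD" =~ /(?i:hello) world/) ? 1 : 0;

print "$inline $scoped_ok $scoped_fail\n";   # 1 1 0
```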

       Details on some modifiers

       Some of the modifiers require more explanation than given in the
       "Overview" above.

       "/x" and "/xx"

       A single "/x" tells the regular expression parser to ignore most
       whitespace that is neither backslashed nor within a bracketed character
       class.  You can use this to break up your regular expression into more
       readable parts.  Also, the "#" character is treated as a metacharacter
       introducing a comment that runs up to the pattern's closing delimiter,
       or to the end of the current line if the pattern extends onto the next
       line.  Hence, this is very much like an ordinary Perl code comment.
       (You can include the closing delimiter within the comment only if you
       precede it with a backslash, so be careful!)

       Use of "/x" means that if you want real whitespace or "#" characters in
       the pattern (outside a bracketed character class, which is unaffected
       by "/x"), then you'll either have to escape them (using backslashes or
       "\Q...\E") or encode them using octal, hex, or "\N{}" or "\p{name=...}"
       escapes.  It is ineffective to try to continue a comment onto the next
       line by escaping the "\n" with a backslash or "\Q".

       You can use "(?#text)" to create a comment that ends earlier than the
       end of the current line, but "text" also can't contain the closing
       delimiter unless escaped with a backslash.

       A common pitfall is to forget that "#" characters begin a comment under
       "/x" and are not matched literally.  Just keep that in mind when trying
       to puzzle out why a particular "/x" pattern isn't working as expected.
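
       The pitfall can be reproduced with a sketch like the one below.  The
       "issue #42" strings and the variable names are assumptions for
       illustration:

```perl
use strict;
use warnings;

# The bare "#" starts a comment, so the pattern is really just "issue\s".
# It therefore matches a string that contains no "#" at all.
my $comment_swallowed = ("issue 42" =~ / issue \s # (\d+) /x) ? 1 : 0;

# Escaping the "#" makes it literal again.
my ($escaped) = "issue #42" =~ / issue \s \# (\d+) /x;

# A bracketed character class is unaffected by a single /x.
my ($in_class) = "issue #42" =~ / issue \s [#] (\d+) /x;

print "$comment_swallowed $escaped $in_class\n";   # 1 42 42
```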

       Starting in Perl v5.26, if the modifier has a second "x" within it, it
       does everything that a single "/x" does, but additionally non-
       backslashed SPACE and TAB characters within bracketed character classes
       are also generally ignored, and hence can be added to make the classes
       more readable.

           / [d-e g-i 3-7]/xx
           /[ ! @ " # $ % ^ & * () = ? <> ' ]/xx

       may be easier to grasp than the squashed equivalents

           /[d-eg-i3-7]/
           /[!@"#$%^&*()=?<>']/

       Taken together, these features go a long way towards making Perl's
       regular expressions more readable.  Here's an example:

           # Delete (most) C comments.
           $program =~ s {
               /\*     # Match the opening delimiter.
               .*?     # Match a minimal number of characters.
               \*/     # Match the closing delimiter.
           } []gsx;
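
       To try the substitution above, it can be wrapped in a small runnable
       sketch.  The contents of $program are an assumption invented for this
       illustration:

```perl
use strict;
use warnings;

# Sample input with one single-line and one multi-line C comment.
my $program = "int x; /* counter */ int y; /* spans\n   lines */ return x;";

# Delete (most) C comments.
$program =~ s {
    /\*     # Match the opening delimiter.
    .*?     # Match a minimal number of characters.
    \*/     # Match the closing delimiter.
} []gsx;

print "$program\n";   # int x;  int y;  return x;
```

       Both comments are removed, including the one that spans a newline,
       because "/s" lets the "." match newlines.  The spaces that surrounded
       each comment remain.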

       Note that anything inside a "\Q...\E" stays unaffected by "/x".  And
       note that "/x" doesn't affect space interpretation within a single
       multi-character construct.  For example "(?:...)" can't have a space
       between the "(", "?", and ":".  Within any delimiters for such a
       construct, allowed spaces are not affected by "/x", and depend on the
       construct.  For example, all constructs using curly braces as
       delimiters, such as "\x{...}" can have blanks within but adjacent to
       the braces, but not elsewhere, and no non-blank space characters.  An
       exception is Unicode properties, which follow Unicode rules; for
       these, see "Properties accessible through \p{} and \P{}" in
       perluniprops.

       The set of characters that are deemed whitespace are those that Unicode
       calls "Pattern White Space", namely:

        U+0009 CHARACTER TABULATION
        U+000A LINE FEED
        U+000B LINE TABULATION
        U+000C FORM FEED
        U+000D CARRIAGE RETURN
        U+0020 SPACE
        U+0085 NEXT LINE
        U+200E LEFT-TO-RIGHT MARK
        U+200F RIGHT-TO-LEFT MARK
        U+2028 LINE SEPARATOR
        U+2029 PARAGRAPH SEPARATOR

       Character set modifiers

       "/d", "/u", "/a", and "/l", available starting in 5.14, are called the
       character set modifiers; they affect the character set rules used for
       the regular expression.

       The "/d", "/u", and "/l" modifiers are not likely to be of much use to
       you, and so you need not worry about them very much.  They exist for
       Perl's internal use, so that complex regular expression data structures
       can be automatically serialized and later exactly reconstituted,
       including all their nuances.  But, since Perl can't keep a secret, and
       there may be rare instances where they are useful, they are documented
       here.

       The "/a" modifier, on the other hand, may be useful.  Its purpose is to
       allow code that is to work mostly on ASCII data to not have to concern
       itself with Unicode.

       Briefly, "/l" sets the character set to that of whatever Locale is in
       effect at the time of the execution of the pattern match.

       "/u" sets the character set to Unicode.

       "/a" also sets the character set to Unicode, BUT adds several
       restrictions for ASCII-safe matching.

       "/d" is the old, problematic, pre-5.14 Default character set behavior.
       Its only use is to force that old behavior.
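
       One visible difference between "/u" and "/a" is what "\d" accepts.
       The sketch below assumes Perl 5.14 or later and uses U+09EA (BENGALI
       DIGIT FOUR) as a non-ASCII decimal digit; the variable names are
       illustrative:

```perl
use strict;
use warnings;

my $bengali_four = "\x{09EA}";   # BENGALI DIGIT FOUR

# Under /u, "\d" matches any Unicode decimal digit.
my $under_u = ($bengali_four =~ /^\d$/u) ? 1 : 0;

# Under /a, "\d" is restricted to the ASCII digits 0-9.
my $under_a     = ($bengali_four =~ /^\d$/a) ? 1 : 0;
my $ascii_digit = ("4" =~ /^\d$/a) ? 1 : 0;

print "$under_u $under_a $ascii_digit\n";   # 1 0 1
```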

       At any given time, exactly one of these modifiers is in effect.  Their
       existence allows Perl to keep the originally compiled behavior of a
       regular expression, regardless of what rules are in effect when it is
       actually executed.  And if it is interpolated into a larger regex, the
       original's rules continue to apply to it, and don't affect the other
       parts.
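
       A sketch of that last point, assuming Perl 5.14 or later: a fragment
       compiled with "/a" keeps its ASCII-only "\d" when interpolated into a
       pattern compiled with "/u", while the outer "\d" follows Unicode
       rules.  The names $digits and $pair are hypothetical:

```perl
use strict;
use warnings;

my $digits = qr/\d+/a;               # inner fragment: ASCII digits only
my $pair   = qr/^${digits}-\d+$/u;   # outer pattern: Unicode rules

# The Bengali digit is accepted by the outer \d+ (compiled under /u) ...
my $outer_unicode = ("12-\x{09EA}" =~ $pair) ? 1 : 0;

# ... but not by the interpolated fragment, which keeps its /a rules.
my $inner_ascii_only = ("\x{09EA}-12" =~ $pair) ? 1 : 0;

print "$outer_unicode $inner_ascii_only\n";   # 1 0
```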

       The "/l" and "/u" modifiers are automatically selected for regular
       expressions compiled within the scope of various pragmas, and we
       recommend that in general, you use those pragmas instead of specifying
       these modifiers explicitly.  For one thing, the modifiers affect only
       pattern matching, and do not extend to even any replacement done,
       whereas using the pragmas gives consistent results for all appropriate
       operations within their scopes.  For example,

        s/foo/\Ubar/il

       will match "foo" using the locale's rules for case-insensitive
       matching, but the "/l" does not affect how the "\U" operates.  Most
       likely you want both of them to use locale rules.  To do this, instead
       compile the regular expression within the scope of "use locale".  This
       both implicitly adds the "/l", and applies locale rules to the "\U".
       The lesson is to "use locale", and not "/l" explicitly.

       Similarly, it would be better to use "use feature 'unicode_strings'"
       instead of,

        s/foo/\Lbar/iu

       to get Unicode rules, as the "\L" in the former (but not necessarily
       the latter) would also use Unicode rules.

       More detail on each of the modifiers follows.  Most likely you don't
       need to know this detail for "/l", "/u", and "/d", and can skip ahead
       to /a.

       /l

       means to use the current locale's rules (see perllocale) when pattern
       matching.  For example, "\w" will match the "word" characters of that
       locale, and "/i" case-insensitive matching will match according to the
       locale's case folding rules.  The locale used will be the one in effect
       at the time of execution of the pattern match.  This may not be the
       same as the compilation-time locale, and can differ from one match to
       another if there is an intervening call of the setlocale() function.

       Prior to v5.20, Perl did not support multi-byte locales.  Starting
       then, UTF-8 locales are supported.  No other multi byte locales are
       ever likely to be supported.  However, in all locales, one can have
       code points above 255 and these will always be treated as Unicode no
       matter what locale is in effect.

       Under Unicode rules, there are a few case-insensitive matches that
       cross the 255/256 boundary.  Except for UTF-8 locales in Perls v5.20
       and later, these are disallowed under "/l".  For example, 0xFF (on
       ASCII platforms) does not caselessly match the character at 0x178,
       "LATIN CAPITAL LETTER Y WITH DIAERESIS", because 0xFF may not be "LATIN
       SMALL LETTER Y WITH DIAERESIS" in the current locale, and Perl has no
       way of knowing if that character even exists in the locale, much less
       what code point it is.

       In a UTF-8 locale in v5.20 and later, the only visible difference
       between locale and non-locale in regular expressions should be tainting
       (see perlsec).

       This modifier may be specified to be the default by "use locale", but
       see "Which character set modifier is in effect?".
595
596       /u
597
598       means to use Unicode rules when pattern matching.  On ASCII platforms,
599       this means that the code points between 128 and 255 take on their
600       Latin-1 (ISO-8859-1) meanings (which are the same as Unicode's).
601       (Otherwise Perl considers their meanings to be undefined.)  Thus, under
602       this modifier, the ASCII platform effectively becomes a Unicode
603       platform; and hence, for example, "\w" will match any of the more than
604       100_000 word characters in Unicode.
605
606       Unlike most locales, which are specific to a language and country pair,
607       Unicode classifies all the characters that are letters somewhere in the
608       world as "\w".  For example, your locale might not think that "LATIN
609       SMALL LETTER ETH" is a letter (unless you happen to speak Icelandic),
610       but Unicode does.  Similarly, all the characters that are decimal
611       digits somewhere in the world will match "\d"; this is hundreds, not
612       10, possible matches.  And some of those digits look like some of the
613       10 ASCII digits, but mean a different number, so a human could easily
614       think a number is a different quantity than it really is.  For example,
615       "BENGALI DIGIT FOUR" (U+09EA) looks very much like an "ASCII DIGIT
616       EIGHT" (U+0038), and "LEPCHA DIGIT SIX" (U+1C46) looks very much like
617       an "ASCII DIGIT FIVE" (U+0035).  And "\d+" may match strings of
618       digits that are a mixture from different writing systems, creating a
619       security issue.  A fraudulent website, for example, could display the
620       price of something using U+1C46, and it would appear to the user that
621       something costs 500 units, but it really costs 600.  A browser that
622       enforced script runs ("Script Runs") would prevent that fraudulent
623       display.  "num()" in Unicode::UCD can also be used to sort this out.
624       Or the "/a" modifier can be used to force "\d" to match just the ASCII
625       0 through 9.
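
       As a rough illustration of the digit issue described above, the
       following sketch compares "\d" under "/u" and "/a".  The variable
       names are only illustrative, and Perl v5.14 or later is assumed
       (for the "/a" modifier).

```perl
use strict;
use warnings;

my $bengali_four = "\x{09EA}";     # BENGALI DIGIT FOUR
my $mixed        = "5\x{1C46}0";   # ASCII 5, LEPCHA DIGIT SIX, ASCII 0

# Under /u, any Unicode decimal digit satisfies \d.
my $u_single = $bengali_four =~ /^\d$/u  ? 1 : 0;
my $u_mixed  = $mixed        =~ /^\d+$/u ? 1 : 0;

# Under /a, \d is restricted to the ASCII digits 0 through 9.
my $a_single = $bengali_four =~ /^\d$/a  ? 1 : 0;
my $a_mixed  = $mixed        =~ /^\d+$/a ? 1 : 0;

print "/u: single=$u_single mixed=$u_mixed\n";
print "/a: single=$a_single mixed=$a_mixed\n";
```

       Only code points are tested and only flags are printed, so no
       output encoding layer is needed.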
626
627       Also, under this modifier, case-insensitive matching works on the full
628       set of Unicode characters.  The "KELVIN SIGN", for example, matches the
629       letters "k" and "K"; and "LATIN SMALL LIGATURE FF" matches the sequence
630       "ff", which, if you're not prepared, might make it look like a
631       hexadecimal constant, presenting another potential security issue.  See
632       <https://unicode.org/reports/tr36> for a detailed discussion of Unicode
633       security issues.
634
635       This modifier may be specified to be the default by "use feature
636       'unicode_strings'", "use locale ':not_characters'", or "use 5.012" (or
637       higher), but see "Which character set modifier is in effect?".
638
639       /d
640
641       This modifier means to use the "Default" native rules of the platform
642       except when there is cause to use Unicode rules instead, as follows:
643
644       1.  the target string is encoded in UTF-8; or
645
646       2.  the pattern is encoded in UTF-8; or
647
648       3.  the pattern explicitly mentions a code point that is above 255 (say
649           by "\x{100}"); or
650
651       4.  the pattern uses a Unicode name ("\N{...}");  or
652
653       5.  the pattern uses a Unicode property ("\p{...}" or "\P{...}"); or
654
655       6.  the pattern uses a Unicode break ("\b{...}" or "\B{...}"); or
656
657       7.  the pattern uses "(?[ ])"; or
658
659       8.  the pattern uses "(*script_run: ...)".
660
661       Another mnemonic for this modifier is "Depends", as the rules actually
662       used depend on various things, and as a result you can get unexpected
663       results.  See "The "Unicode Bug"" in perlunicode.  The Unicode Bug has
664       become rather infamous, leading to yet other (without swearing) names
665       for this modifier, "Dicey" and "Dodgy".
666
667       Unless the pattern or string is encoded in UTF-8, only ASCII
668       characters can match positively.
669
670       Here are some examples of how that works on an ASCII platform:
671
672        $str =  "\xDF";      # $str is not in UTF-8 format.
673        $str =~ /^\w/;       # No match, as $str isn't in UTF-8 format.
674        $str .= "\x{0e0b}";  # Now $str is in UTF-8 format.
675        $str =~ /^\w/;       # Match! $str is now in UTF-8 format.
676        chop $str;
677        $str =~ /^\w/;       # Still a match! $str remains in UTF-8 format.
678
679       This modifier is automatically selected by default when none of the
680       others are, so yet another name for it is "Default".
681
682       Because of the unexpected behaviors associated with this modifier, you
683       probably should only explicitly use it to maintain weird backward
684       compatibilities.
685
686       /a (and /aa)
687
688       This modifier stands for ASCII-restrict (or ASCII-safe).  This modifier
689       may be doubled-up to increase its effect.
690
691       When it appears singly, it causes the sequences "\d", "\s", "\w", and
692       the Posix character classes to match only in the ASCII range.  They
693       thus revert to their pre-5.6, pre-Unicode meanings.  Under "/a",  "\d"
694       always means precisely the digits "0" to "9"; "\s" means the five
695       characters "[ \f\n\r\t]", and starting in Perl v5.18, the vertical tab;
696       "\w" means the 63 characters "[A-Za-z0-9_]"; and likewise, all the
697       Posix classes such as "[[:print:]]" match only the appropriate ASCII-
698       range characters.
699
700       This modifier is useful for people who only incidentally use Unicode,
701       and who do not wish to be burdened with its complexities and security
702       concerns.
703
704       With "/a", one can write "\d" with confidence that it will only match
705       ASCII characters, and should the need arise to match beyond ASCII, you
706       can instead use "\p{Digit}" (or "\p{Word}" for "\w").  There are
707       similar "\p{...}" constructs that can match beyond ASCII both white
708       space (see "Whitespace" in perlrecharclass), and Posix classes (see
709       "POSIX Character Classes" in perlrecharclass).  Thus, this modifier
710       doesn't mean you can't use Unicode, it means that to get Unicode
711       matching you must explicitly use a construct ("\p{}", "\P{}") that
712       signals Unicode.
713
714       As you would expect, this modifier causes, for example, "\D" to mean
715       the same thing as "[^0-9]"; in fact, all non-ASCII characters match
716       "\D", "\S", and "\W".  "\b" still means to match at the boundary
717       between "\w" and "\W", using the "/a" definitions of them (similarly
718       for "\B").
719
720       Otherwise, "/a" behaves like the "/u" modifier, in that case-
721       insensitive matching uses Unicode rules; for example, "k" will match
722       the Unicode "\N{KELVIN SIGN}" under "/i" matching, and code points in
723       the Latin1 range above ASCII will have Unicode rules when it comes to
724       case-insensitive matching.
725
726       To forbid ASCII/non-ASCII matches (like "k" with "\N{KELVIN SIGN}"),
727       specify the "a" twice, for example "/aai" or "/aia".  (The first
728       occurrence of "a" restricts the "\d", etc., and the second occurrence
729       adds the "/i" restrictions.)  But, note that code points outside the
730       ASCII range will use Unicode rules for "/i" matching, so the modifier
731       doesn't really restrict things to just ASCII; it just forbids the
732       intermixing of ASCII and non-ASCII.
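
       The difference between "/a" and "/aa" under "/i" can be sketched as
       follows.  This is a minimal example assuming Perl v5.14 or later;
       the variable names are illustrative only.

```perl
use strict;
use warnings;

my $kelvin = "\x{212A}";   # KELVIN SIGN

# Full Unicode case folding: KELVIN SIGN folds to "k".
my $u_match  = $kelvin =~ /^k$/iu  ? 1 : 0;

# A single /a restricts \d, \s, \w, but /i still uses Unicode folding.
my $a_match  = $kelvin =~ /^k$/ai  ? 1 : 0;

# /aa additionally forbids an ASCII character matching a non-ASCII one.
my $aa_match = $kelvin =~ /^k$/aai ? 1 : 0;

# Two non-ASCII characters may still fold together under /aa:
# LATIN CAPITAL LETTER E WITH ACUTE against its lowercase form.
my $aa_latin1 = "\x{C9}" =~ /^\x{E9}$/aai ? 1 : 0;

print "iu=$u_match ai=$a_match aai=$aa_match aai(latin1)=$aa_latin1\n";
```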
733
734       To summarize, this modifier provides protection for applications that
735       don't wish to be exposed to all of Unicode.  Specifying it twice gives
736       added protection.
737
738       This modifier may be specified to be the default by "use re '/a'" or
739       "use re '/aa'".  If you do so, you may actually have occasion to use
740       the "/u" modifier explicitly if there are a few regular expressions
741       where you do want full Unicode rules (but even here, it's best if
742       everything were under feature "unicode_strings", along with the "use re
743       '/aa'").  Also see "Which character set modifier is in effect?".
744
745       Which character set modifier is in effect?
746
747       Which of these modifiers is in effect at any given point in a regular
748       expression depends on a fairly complex set of interactions.  These have
749       been designed so that in general you don't have to worry about it, but
750       this section gives the gory details.  As explained below in "Extended
751       Patterns" it is possible to explicitly specify modifiers that apply
752       only to portions of a regular expression.  The innermost always has
753       priority over any outer ones, and one applying to the whole expression
754       has priority over any of the default settings that are described in the
755       remainder of this section.
756
757       The "use re '/foo'" pragma can be used to set default modifiers
758       (including these) for regular expressions compiled within its scope.
759       This pragma has precedence over the other pragmas listed below that
760       also change the defaults.
761
762       Otherwise, "use locale" sets the default modifier to "/l"; and "use
763       feature 'unicode_strings'", or "use 5.012" (or higher) set the default
764       to "/u" when not in the same scope as either "use locale" or "use
765       bytes".  ("use locale ':not_characters'" also sets the default to "/u",
766       overriding any plain "use locale".)  Unlike the mechanisms mentioned
767       above, these affect operations besides regular expression pattern
768       matching, and so give more consistent results with other operators,
769       including using "\U", "\l", etc. in substitution replacements.
770
771       If none of the above apply, for backwards compatibility reasons, the
772       "/d" modifier is the one in effect by default.  As this can lead to
773       unexpected results, it is best to specify which other rule set should
774       be used.
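
       The precedence rules above can be observed with a small sketch such
       as the following, which assumes Perl v5.14 or later.  A lexically
       scoped "use re '/aa'" sets the default, an explicit "/u" on one
       pattern overrides it, and outside the block the "/d" default
       applies again.  The variable names are illustrative.

```perl
use strict;
use warnings;

my $digit = "\x{09EA}";   # BENGALI DIGIT FOUR, a non-ASCII decimal digit

my ($scoped_default, $explicit_u);
{
    use re '/aa';   # default modifier for patterns compiled in this block

    $scoped_default = $digit =~ /^\d$/  ? 1 : 0;   # ASCII-restricted
    $explicit_u     = $digit =~ /^\d$/u ? 1 : 0;   # explicit /u wins
}

# Outside the block /d is in effect.  The target contains a code point
# above 255, so it is stored as UTF-8 and Unicode rules are used.
my $outside = $digit =~ /^\d$/ ? 1 : 0;

print "scoped=$scoped_default explicit_u=$explicit_u outside=$outside\n";
```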
775
776       Character set modifier behavior prior to Perl 5.14
777
778       Prior to 5.14, there were no explicit modifiers, but "/l" was implied
779       for regexes compiled within the scope of "use locale", and "/d" was
780       implied otherwise.  However, interpolating a regex into a larger regex
781       would ignore the original compilation in favor of whatever was in
782       effect at the time of the second compilation.  There were a number of
783       inconsistencies (bugs) with the "/d" modifier, where Unicode rules
784       would be used when inappropriate, and vice versa.  "\p{}" did not imply
785       Unicode rules, and neither did all occurrences of "\N{}", until 5.12.
786
787   Regular Expressions
788       Quantifiers
789
790       Quantifiers are used when a particular portion of a pattern needs to
791       match a certain number (or numbers) of times.  If there isn't a
792       quantifier the number of times to match is exactly one.  The following
793       standard quantifiers are recognized:
794
795           *           Match 0 or more times
796           +           Match 1 or more times
797           ?           Match 1 or 0 times
798           {n}         Match exactly n times
799           {n,}        Match at least n times
800           {,n}        Match at most n times
801           {n,m}       Match at least n but not more than m times
802
803       (If a non-escaped curly bracket occurs in a context other than one of
804       the quantifiers listed above, where it does not form part of a
805       backslashed sequence like "\x{...}", it is either a fatal syntax error,
806       or treated as a regular character, generally with a deprecation warning
807       raised.  To escape it, you can precede it with a backslash ("\{") or
808       enclose it within square brackets  ("[{]").  This change will allow for
809       future syntax extensions (like making the lower bound of a quantifier
810       optional), and better error checking of quantifiers).
811
812       The "*" quantifier is equivalent to "{0,}", the "+" quantifier to
813       "{1,}", and the "?" quantifier to "{0,1}".  n and m are limited to non-
814       negative integral values less than a preset limit defined when perl is
815       built.  This is usually 65534 on the most common platforms.  The actual
816       limit can be seen in the error message generated by code such as this:
817
818           $_ **= $_ , / {$_} / for 2 .. 42;
819
820       By default, a quantified subpattern is "greedy", that is, it will match
821       as many times as possible (given a particular starting location) while
822       still allowing the rest of the pattern to match.  If you want it to
823       match the minimum number of times possible, follow the quantifier with
824       a "?".  Note that the meanings don't change, just the "greediness":
825
826           *?        Match 0 or more times, not greedily
827           +?        Match 1 or more times, not greedily
828           ??        Match 0 or 1 time, not greedily
829           {n}?      Match exactly n times, not greedily (redundant)
830           {n,}?     Match at least n times, not greedily
831           {,n}?     Match at most n times, not greedily
832           {n,m}?    Match at least n but not more than m times, not greedily
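
       A short sketch of the difference, using a made-up input string:
       both patterns begin matching at the same position, but the greedy
       form extends as far as it can while the non-greedy form stops at
       the first opportunity.

```perl
use strict;
use warnings;

my $html = '<b>bold</b> and <i>italic</i>';

# Greedy: .+ runs to the end of the string, then backs off to the last ">".
my ($greedy) = $html =~ /(<.+>)/;

# Non-greedy: .+? grows one character at a time until ">" can match.
my ($non_greedy) = $html =~ /(<.+?>)/;

print "greedy:     $greedy\n";
print "non-greedy: $non_greedy\n";
```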
833
834       Normally when a quantified subpattern does not allow the rest of the
835       overall pattern to match, Perl will backtrack. However, this behaviour
836       is sometimes undesirable. Thus Perl provides the "possessive"
837       quantifier form as well.
838
839        *+     Match 0 or more times and give nothing back
840        ++     Match 1 or more times and give nothing back
841        ?+     Match 0 or 1 time and give nothing back
842        {n}+   Match exactly n times and give nothing back (redundant)
843        {n,}+  Match at least n times and give nothing back
844        {,n}+  Match at most n times and give nothing back
845        {n,m}+ Match at least n but not more than m times and give nothing back
846
847       For instance,
848
849          'aaaa' =~ /a++a/
850
851       will never match, as the "a++" will gobble up all the "a"'s in the
852       string and won't leave any for the remaining part of the pattern. This
853       feature can be extremely useful to give perl hints about where it
854       shouldn't backtrack. For instance, the typical "match a double-quoted
855       string" problem can be most efficiently performed when written as:
856
857          /"(?:[^"\\]++|\\.)*+"/
858
859       as we know that if the final quote does not match, backtracking will
860       not help. See the independent subexpression "(?>pattern)" for more
861       details; possessive quantifiers are just syntactic sugar for that
862       construct. For instance the above example could also be written as
863       follows:
864
865          /"(?>(?:(?>[^"\\]+)|\\.)*)"/
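
       The behaviour described above can be checked with a sketch like
       this one.  The sample strings and variable names are illustrative
       only.

```perl
use strict;
use warnings;

# The possessive a++ keeps all four "a"s, so the trailing "a" cannot match.
my $possessive = 'aaaa' =~ /a++a/    ? 1 : 0;
# The ordinary greedy a+ gives one "a" back and succeeds.
my $greedy     = 'aaaa' =~ /a+a/     ? 1 : 0;
# The independent subexpression behaves like the possessive form.
my $atomic     = 'aaaa' =~ /(?>a+)a/ ? 1 : 0;

# The double-quoted string pattern from the text.
my $quoted = qr/"(?:[^"\\]++|\\.)*+"/;

my ($found)      = 'say "he said \"hi\"" now' =~ /($quoted)/;
my $unterminated = 'say "oops' =~ $quoted ? 1 : 0;

print "possessive=$possessive greedy=$greedy atomic=$atomic\n";
print "found=$found unterminated=$unterminated\n";
```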
866
867       Note that the possessive quantifier modifier can not be combined with
868       the non-greedy modifier. This is because it would make no sense.
869       Consider the following equivalency table:
870
871           Illegal         Legal
872           ------------    ------
873           X??+            X{0}
874           X+?+            X{1}
875           X{min,max}?+    X{min}
876
877       Escape sequences
878
879       Because patterns are processed as double-quoted strings, the following
880       also work:
881
882        \t          tab                   (HT, TAB)
883        \n          newline               (LF, NL)
884        \r          return                (CR)
885        \f          form feed             (FF)
886        \a          alarm (bell)          (BEL)
887        \e          escape (think troff)  (ESC)
888        \cK         control char          (example: VT)
889        \x{}, \x00  character whose ordinal is the given hexadecimal number
890        \N{name}    named Unicode character or character sequence
891        \N{U+263D}  Unicode character     (example: FIRST QUARTER MOON)
892        \o{}, \000  character whose ordinal is the given octal number
893        \l          lowercase next char (think vi)
894        \u          uppercase next char (think vi)
895        \L          lowercase until \E (think vi)
896        \U          uppercase until \E (think vi)
897        \Q          quote (disable) pattern metacharacters until \E
898        \E          end either case modification or quoted section (think vi)
899
900       Details are in "Quote and Quote-like Operators" in perlop.
901
902       Character Classes and other Special Escapes
903
904       In addition, Perl defines the following:
905
906        Sequence   Note    Description
907         [...]     [1]  Match a character according to the rules of the
908                          bracketed character class defined by the "...".
909                          Example: [a-z] matches "a" or "b" or "c" ... or "z"
910         [[:...:]] [2]  Match a character according to the rules of the POSIX
911                          character class "..." within the outer bracketed
912                          character class.  Example: [[:upper:]] matches any
913                          uppercase character.
914         (?[...])  [8]  Extended bracketed character class
915         \w        [3]  Match a "word" character (alphanumeric plus "_", plus
916                          other connector punctuation chars plus Unicode
917                          marks)
918         \W        [3]  Match a non-"word" character
919         \s        [3]  Match a whitespace character
920         \S        [3]  Match a non-whitespace character
921         \d        [3]  Match a decimal digit character
922         \D        [3]  Match a non-digit character
923         \pP       [3]  Match P, named property.  Use \p{Prop} for longer names
924         \PP       [3]  Match non-P
925         \X        [4]  Match Unicode "eXtended grapheme cluster"
926         \1        [5]  Backreference to a specific capture group or buffer.
927                          '1' may actually be any positive integer.
928         \g1       [5]  Backreference to a specific or previous group,
929         \g{-1}    [5]  The number may be negative indicating a relative
930                          previous group and may optionally be wrapped in
931                          curly brackets for safer parsing.
932         \g{name}  [5]  Named backreference
933         \k<name>  [5]  Named backreference
934         \k'name'  [5]  Named backreference
935         \k{name}  [5]  Named backreference
936         \K        [6]  Keep the stuff left of the \K, don't include it in $&
937         \N        [7]  Any character but \n.  Not affected by /s modifier
938         \v        [3]  Vertical whitespace
939         \V        [3]  Not vertical whitespace
940         \h        [3]  Horizontal whitespace
941         \H        [3]  Not horizontal whitespace
942         \R        [4]  Linebreak
943
944       [1] See "Bracketed Character Classes" in perlrecharclass for details.
945
946       [2] See "POSIX Character Classes" in perlrecharclass for details.
947
948       [3] See "Unicode Character Properties" in perlunicode for details.
949
950       [4] See "Misc" in perlrebackslash for details.
951
952       [5] See "Capture groups" below for details.
953
954       [6] See "Extended Patterns" below for details.
955
956       [7] Note that "\N" has two meanings.  When of the form "\N{NAME}", it
957           matches the character or character sequence whose name is NAME; and
958           similarly when of the form "\N{U+hex}", it matches the character
959           whose Unicode code point is hex.  Otherwise it matches any
960           character but "\n".
961
962       [8] See "Extended Bracketed Character Classes" in perlrecharclass for
963           details.
964
965       Assertions
966
967       Besides "^" and "$", Perl defines the following zero-width assertions:
968
969        \b{}   Match at Unicode boundary of specified type
970        \B{}   Match where corresponding \b{} doesn't match
971        \b     Match a \w\W or \W\w boundary
972        \B     Match except at a \w\W or \W\w boundary
973        \A     Match only at beginning of string
974        \Z     Match only at end of string, or before newline at the end
975        \z     Match only at end of string
976        \G     Match only at pos() (e.g. at the end-of-match position
977               of prior m//g)
978
979       A Unicode boundary ("\b{}"), available starting in v5.22, is a spot
980       between two characters, or before the first character in the string, or
981       after the final character in the string where certain criteria defined
982       by Unicode are met.  See "\b{}, \b, \B{}, \B" in perlrebackslash for
983       details.
984
985       A word boundary ("\b") is a spot between two characters that has a "\w"
986       on one side of it and a "\W" on the other side of it (in either order),
987       counting the imaginary characters off the beginning and end of the
988       string as matching a "\W".  (Within character classes "\b" represents
989       backspace rather than a word boundary, just as it normally does in any
990       double-quoted string.)  The "\A" and "\Z" are just like "^" and "$",
991       except that they won't match multiple times when the "/m" modifier is
992       used, while "^" and "$" will match at every internal line boundary.  To
993       match the actual end of the string and not ignore an optional trailing
994       newline, use "\z".
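
       The differences among these anchors can be seen in a sketch like
       the following (the strings are made up for illustration):

```perl
use strict;
use warnings;

my $line = "foo\n";

my $dollar  = $line =~ /foo$/  ? 1 : 0;   # $ allows a trailing newline
my $big_z   = $line =~ /foo\Z/ ? 1 : 0;   # so does \Z
my $small_z = $line =~ /foo\z/ ? 1 : 0;   # \z requires the very end

my $text = "one\ntwo\nthree";

# Under /m, ^ matches at the start of every line ...
my @caret = $text =~ /^(\w+)/mg;
# ... whereas \A still matches only at the start of the string.
my @big_a = $text =~ /\A(\w+)/mg;

print "\$=$dollar \\Z=$big_z \\z=$small_z\n";
print "^: @caret\n";
print "\\A: @big_a\n";
```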
995
996       The "\G" assertion can be used to chain global matches (using "m//g"),
997       as described in "Regexp Quote-Like Operators" in perlop.  It is also
998       useful when writing "lex"-like scanners, when you have several patterns
999       that you want to match against consecutive substrings of your string;
1000       see the previous reference.  The actual location where "\G" will match
1001       can also be influenced by using "pos()" as an lvalue: see "pos" in
1002       perlfunc. Note that the rule for zero-length matches (see "Repeated
1003       Patterns Matching a Zero-length Substring") is modified somewhat, in
1004       that contents to the left of "\G" are not counted when determining the
1005       length of the match. Thus the following will not match forever:
1006
1007            my $string = 'ABC';
1008            pos($string) = 1;
1009            while ($string =~ /(.\G)/g) {
1010                print $1;
1011            }
1012
1013       It will print 'A' and then terminate, as it considers the match to be
1014       zero-width, and thus will not match at the same position twice in a
1015       row.
1016
1017       It is worth noting that "\G" improperly used can result in an infinite
1018       loop. Take care when using patterns that include "\G" in an
1019       alternation.
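
       A minimal sketch of a "lex"-like scanner built on "\G" might look
       like the following.  The token names and the tiny grammar are
       invented for the example.  Each branch uses "/gc" so that a failed
       attempt leaves pos() where it was, and every pattern consumes at
       least one character, so the loop always makes progress.

```perl
use strict;
use warnings;

my $src = 'x1 = 42 + y';
my @tokens;

pos($src) = 0;                          # start scanning at the beginning
while (pos($src) < length $src) {
    if    ($src =~ /\G\s+/gc)            { next }                       # skip blanks
    elsif ($src =~ /\G([A-Za-z_]\w*)/gc) { push @tokens, "IDENT($1)" }
    elsif ($src =~ /\G(\d+)/gc)          { push @tokens, "NUM($1)" }
    elsif ($src =~ /\G([=+])/gc)         { push @tokens, "OP($1)" }
    else  { die "unexpected character at offset " . pos($src) . "\n" }
}

print join(' ', @tokens), "\n";
```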
1020
1021       Note also that "s///" will refuse to overwrite part of a substitution
1022       that has already been replaced; so for example this will stop after the
1023       first iteration, rather than iterating its way backwards through the
1024       string:
1025
1026           $_ = "123456789";
1027           pos = 6;
1028           s/.(?=.\G)/X/g;
1029           print;      # prints 1234X6789, not XXXXX6789
1030
1031       Capture groups
1032
1033       The grouping construct "( ... )" creates capture groups (also referred
1034       to as capture buffers). To refer to the current contents of a group
1035       later on, within the same pattern, use "\g1" (or "\g{1}") for the
1036       first, "\g2" (or "\g{2}") for the second, and so on.  This is called a
1037       backreference.
1038
1046       There is no limit to the number of captured substrings that you may
1047       use.  Groups are numbered with the leftmost open parenthesis being
1048       number 1, etc.  If a group did not match, the associated backreference
1049       won't match either. (This can happen if the group is optional, or in a
1050       different branch of an alternation.)  You can omit the "g", and write
1051       "\1", etc, but there are some issues with this form, described below.
1052
1053       You can also refer to capture groups relatively, by using a negative
1054       number, so that "\g-1" and "\g{-1}" both refer to the immediately
1055       preceding capture group, and "\g-2" and "\g{-2}" both refer to the
1056       group before it.  For example:
1057
1058               /
1059                (Y)            # group 1
1060                (              # group 2
1061                   (X)         # group 3
1062                   \g{-1}      # backref to group 3
1063                   \g{-3}      # backref to group 1
1064                )
1065               /x
1066
1067       would match the same as "/(Y) ( (X) \g3 \g1 )/x".  This allows you to
1068       interpolate regexes into larger regexes and not have to worry about the
1069       capture groups being renumbered.
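
       To illustrate the renumbering point, here is a small sketch in
       which the same fragment is written once with a relative and once
       with an absolute backreference, then interpolated into a larger
       pattern.  The names $relative and $absolute are illustrative.

```perl
use strict;
use warnings;

my $relative = qr/(.)\g{-1}/;   # "the group just before me"
my $absolute = qr/(.)\g1/;      # "group number 1", wherever that ends up

# Standing alone, both fragments find a doubled character.
my $alone_rel = 'xx' =~ $relative ? 1 : 0;
my $alone_abs = 'xx' =~ $absolute ? 1 : 0;

# Once interpolated after another group, the fragment's own group
# becomes number 2.  \g{-1} still refers to it; \g1 now refers to (\d+).
my $embedded_rel = '12-xx' =~ /(\d+)-$relative/ ? 1 : 0;
my $embedded_abs = '12-xx' =~ /(\d+)-$absolute/ ? 1 : 0;

print "alone: rel=$alone_rel abs=$alone_abs\n";
print "embedded: rel=$embedded_rel abs=$embedded_abs\n";
```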
1070
1071       You can dispense with numbers altogether and create named capture
1072       groups.  The notation is "(?<name>...)" to declare and "\g{name}" to
1073       reference.  (To be compatible with .Net regular expressions, "\g{name}"
1074       may also be written as "\k{name}", "\k<name>" or "\k'name'".)  name
1075       must not begin with a number, nor contain hyphens.  When different
1076       groups within the same pattern have the same name, any reference to
1077       that name assumes the leftmost defined group.  Named groups count in
1078       absolute and relative numbering, and so can also be referred to by
1079       those numbers.  (It's possible to do things with named capture groups
1080       that would otherwise require "(??{})".)
1081
1082       Capture group contents are dynamically scoped and available to you
1083       outside the pattern until the end of the enclosing block or until the
1084       next successful match, whichever comes first.  (See "Compound
1085       Statements" in perlsyn.)  You can refer to them by absolute number
1086       (using "$1" instead of "\g1", etc); or by name via the "%+" hash, using
1087       "$+{name}".
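
       As a brief sketch, named groups can be read back through "%+"
       after a successful match, and the same groups remain available by
       number.  The group names and input string are illustrative; Perl
       5.10 or later is assumed.

```perl
use strict;
use warnings;

my $stamp = 'Time: 12:34:56';
my (%time, $first_by_number);

if ($stamp =~ /Time: (?<hours>\d\d):(?<minutes>\d\d):(?<seconds>\d\d)/) {
    %time            = %+;   # copy, since the next successful match resets %+
    $first_by_number = $1;   # named groups are numbered as well
}

# A named backreference inside the pattern itself.
my $repeated;
if ('the the cat' =~ /\b(?<word>\w+)\s+\g{word}\b/) {
    $repeated = $+{word};
}

print "$time{hours}h $time{minutes}m $time{seconds}s\n";
print "repeated word: $repeated\n";
```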
1088
1089       Braces are required in referring to named capture groups, but are
1090       optional for absolute or relative numbered ones.  Braces are safer when
1091       creating a regex by concatenating smaller strings.  For example if you
1092       have "qr/$a$b/", and $a contained "\g1", and $b contained "37", you
1093       would get "/\g137/" which is probably not what you intended.
1094
1095       If you use braces, you may also optionally add any number of blank
1096       (space or tab) characters within but adjacent to the braces, like
1097       "\g{ -1 }", or "\k{ name }".
1098
1099       The "\g" and "\k" notations were introduced in Perl 5.10.0.  Prior to
1100       that there were neither named nor relative numbered capture groups.
1101       Absolute numbered groups were referred to using "\1", "\2", etc., and
1102       this notation is still accepted (and likely always will be).  But it
1103       leads to some ambiguities if there are more than 9 capture groups, as
1104       "\10" could mean either the tenth capture group, or the character whose
1105       ordinal in octal is 010 (a backspace in ASCII).  Perl resolves this
1106       ambiguity by interpreting "\10" as a backreference only if at least 10
1107       left parentheses have opened before it.  Likewise "\11" is a
1108       backreference only if at least 11 left parentheses have opened before
1109       it.  And so on.  "\1" through "\9" are always interpreted as
1110       backreferences.  There are several examples below that illustrate these
1111       perils.  You can avoid the ambiguity by always using "\g{}" or "\g" if
1112       you mean capturing groups; and for octal constants always using "\o{}",
1113       or for "\077" and below, using 3 digits padded with leading zeros,
1114       since a leading zero implies an octal constant.
1115
1116       The "\digit" notation also works in certain circumstances outside the
1117       pattern.  See "Warning on \1 Instead of $1" below for details.
1118
1119       Examples:
1120
1121           s/^([^ ]*) *([^ ]*)/$2 $1/;     # swap first two words
1122
1123           /(.)\g1/                        # find first doubled char
1124                and print "'$1' is the first doubled character\n";
1125
1126           /(?<char>.)\k<char>/            # ... a different way
1127                and print "'$+{char}' is the first doubled character\n";
1128
1129           /(?'char'.)\g1/                 # ... mix and match
1130                and print "'$1' is the first doubled character\n";
1131
1132           if (/Time: (..):(..):(..)/) {   # parse out values
1133               $hours = $1;
1134               $minutes = $2;
1135               $seconds = $3;
1136           }
1137
1138           /(.)(.)(.)(.)(.)(.)(.)(.)(.)\g10/   # \g10 is a backreference
1139           /(.)(.)(.)(.)(.)(.)(.)(.)(.)\10/    # \10 is octal
1140           /((.)(.)(.)(.)(.)(.)(.)(.)(.))\10/  # \10 is a backreference
1141           /((.)(.)(.)(.)(.)(.)(.)(.)(.))\010/ # \010 is octal
1142
1143           $a = '(.)\1';        # Creates problems when concatenated.
1144           $b = '(.)\g{1}';     # Avoids the problems.
1145           "aa" =~ /${a}/;      # True
1146           "aa" =~ /${b}/;      # True
1147           "aa0" =~ /${a}0/;    # False!
1148           "aa0" =~ /${b}0/;    # True
1149           "aa\x08" =~ /${a}0/;  # True!
1150           "aa\x08" =~ /${b}0/;  # False
1151
1152       Several special variables also refer back to portions of the previous
1153       match.  $+ returns whatever the last bracket match matched.  $& returns
1154       the entire matched string.  (At one point $0 did also, but now it
1155       returns the name of the program.)  "$`" returns everything before the
1156       matched string.  "$'" returns everything after the matched string. And
1157       $^N contains whatever was matched by the most-recently closed group
1158       (submatch). $^N can be used in extended patterns (see below), for
1159       example to assign a submatch to a variable.
1160
1161       These special variables, like the "%+" hash and the numbered match
1162       variables ($1, $2, $3, etc.) are dynamically scoped until the end of
1163       the enclosing block or until the next successful match, whichever comes
1164       first.  (See "Compound Statements" in perlsyn.)
1165
1166       NOTE: Failed matches in Perl do not reset the match variables, which
1167       makes it easier to write code that tests for a series of more specific
1168       cases and remembers the best match.
1169
1170       WARNING: If your code is to run on Perl 5.16 or earlier, beware that
1171       once Perl sees that you need one of $&, "$`", or "$'" anywhere in the
1172       program, it has to provide them for every pattern match.  This may
1173       substantially slow your program.
1174
1175       Perl uses the same mechanism to produce $1, $2, etc, so you also pay a
1176       price for each pattern that contains capturing parentheses.  (To avoid
1177       this cost while retaining the grouping behaviour, use the extended
1178       regular expression "(?: ... )" instead.)  But if you never use $&, "$`"
1179       or "$'", then patterns without capturing parentheses will not be
1180       penalized.  So avoid $&, "$'", and "$`" if you can, but if you can't
1181       (and some algorithms really appreciate them), once you've used them
1182       once, use them at will, because you've already paid the price.
1183
1184       Perl 5.16 introduced a slightly more efficient mechanism that notes
1185       separately whether each of "$`", $&, and "$'" have been seen, and thus
1186       may only need to copy part of the string.  Perl 5.20 introduced a much
1187       more efficient copy-on-write mechanism which eliminates any slowdown.
1188
1189       As another workaround for this problem, Perl 5.10.0 introduced
1190       "${^PREMATCH}", "${^MATCH}" and "${^POSTMATCH}", which are equivalent
1191       to "$`", $& and "$'", except that they are only guaranteed to be
1192       defined after a successful match that was executed with the "/p"
       (preserve) modifier.  Unlike their punctuation-character equivalents,
       these variables incur no global performance penalty; the trade-off
       is that you have to tell perl when you want to use them.  As of Perl
       5.20, these three variables are equivalent to "$`", $& and "$'", and
       "/p" is ignored.
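
       For instance, a match with "/p" might be used as follows.  This is
       only a sketch; the string and variable names are invented.

```perl
use strict;
use warnings;

my $str = "hello world";                 # illustrative input
my ($pre, $match, $post);

# /p asks perl to preserve the matched string for the ${^...} variables
# (it is ignored, but harmless, from Perl 5.20 onwards).
if ($str =~ /o w/p) {
    ($pre, $match, $post) = (${^PREMATCH}, ${^MATCH}, ${^POSTMATCH});
}

print "[$pre][$match][$post]\n";         # prints "[hell][o w][orld]"
```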
1198
1199   Quoting metacharacters
1200       Backslashed metacharacters in Perl are alphanumeric, such as "\b",
1201       "\w", "\n".  Unlike some other regular expression languages, there are
1202       no backslashed symbols that aren't alphanumeric.  So anything that
1203       looks like "\\", "\(", "\)", "\[", "\]", "\{", or "\}" is always
1204       interpreted as a literal character, not a metacharacter.  This was once
1205       used in a common idiom to disable or quote the special meanings of
1206       regular expression metacharacters in a string that you want to use for
1207       a pattern. Simply quote all non-"word" characters:
1208
1209           $pattern =~ s/(\W)/\\$1/g;
1210
1211       (If "use locale" is set, then this depends on the current locale.)
1212       Today it is more common to use the "quotemeta()" function or the "\Q"
1213       metaquoting escape sequence to disable all metacharacters' special
1214       meanings like this:
1215
1216           /$unquoted\Q$quoted\E$unquoted/
1217
1218       Beware that if you put literal backslashes (those not inside
1219       interpolated variables) between "\Q" and "\E", double-quotish backslash
1220       interpolation may lead to confusing results.  If you need to use
1221       literal backslashes within "\Q...\E", consult "Gory details of parsing
1222       quoted constructs" in perlop.
1223
1224       "quotemeta()" and "\Q" are fully described in "quotemeta" in perlfunc.
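
       The difference that quoting makes can be seen in a short sketch like
       the one below, where the literal string and the test strings are
       invented for the example.

```perl
use strict;
use warnings;

my $literal = "a.b*c";                   # contains two metacharacters
my $escaped = quotemeta($literal);       # backslashes the "." and "*"

# Unquoted, "." and "*" keep their special meanings.
my $raw_hit    = "aXbbc" =~ /^$literal\z/     ? 1 : 0;
# Quoted with \Q...\E, only the literal text can match.
my $quoted_hit = "aXbbc" =~ /^\Q$literal\E\z/ ? 1 : 0;
my $exact_hit  = "a.b*c" =~ /^\Q$literal\E\z/ ? 1 : 0;

print "$escaped $raw_hit $quoted_hit $exact_hit\n";
# prints "a\.b\*c 1 0 1"
```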
1225
1226   Extended Patterns
1227       Perl also defines a consistent extension syntax for features not found
1228       in standard tools like awk and lex.  The syntax for most of these is a
1229       pair of parentheses with a question mark as the first thing within the
1230       parentheses.  The character after the question mark indicates the
1231       extension.
1232
1233       A question mark was chosen for this and for the minimal-matching
1234       construct because 1) question marks are rare in older regular
1235       expressions, and 2) whenever you see one, you should stop and
1236       "question" exactly what is going on.  That's psychology....
1237
1238       "(?#text)"
1239           A comment.  The text is ignored.  Note that Perl closes the comment
1240           as soon as it sees a ")", so there is no way to put a literal ")"
1241           in the comment.  The pattern's closing delimiter must be escaped by
1242           a backslash if it appears in the comment.
1243
1244           See "/x" for another way to have comments in patterns.
1245
1246           Note that a comment can go just about anywhere, except in the
1247           middle of an escape sequence.   Examples:
1248
            qr/foo(?#comment)bar/  # Matches 'foobar'
1250
1251            # The pattern below matches 'abcd', 'abccd', or 'abcccd'
1252            qr/abc(?#comment between literal and its quantifier){1,3}d/
1253
1254            # The pattern below generates a syntax error, because the '\p' must
1255            # be followed immediately by a '{'.
1256            qr/\p(?#comment between \p and its property name){Any}/
1257
1258            # The pattern below generates a syntax error, because the initial
1259            # '\(' is a literal opening parenthesis, and so there is nothing
            # for the closing ')' to match
1261            qr/\(?#the backslash means this isn't a comment)p{Any}/
1262
1263            # Comments can be used to fold long patterns into multiple lines
1264            qr/First part of a long regex(?#
1265              )remaining part/
1266
1267       "(?adlupimnsx-imnsx)"
1268       "(?^alupimnsx)"
1269           Zero or more embedded pattern-match modifiers, to be turned on (or
1270           turned off if preceded by "-") for the remainder of the pattern or
1271           the remainder of the enclosing pattern group (if any).
1272
1273           This is particularly useful for dynamically-generated patterns,
1274           such as those read in from a configuration file, taken from an
1275           argument, or specified in a table somewhere.  Consider the case
1276           where some patterns want to be case-sensitive and some do not:  The
1277           case-insensitive ones merely need to include "(?i)" at the front of
1278           the pattern.  For example:
1279
1280               $pattern = "foobar";
1281               if ( /$pattern/i ) { }
1282
1283               # more flexible:
1284
1285               $pattern = "(?i)foobar";
1286               if ( /$pattern/ ) { }
1287
1288           These modifiers are restored at the end of the enclosing group. For
1289           example,
1290
1291               ( (?i) blah ) \s+ \g1
1292
1293           will match "blah" in any case, some spaces, and an exact (including
1294           the case!)  repetition of the previous word, assuming the "/x"
1295           modifier, and no "/i" modifier outside this group.
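
           A sketch of this scoping, with invented test strings: the group
           itself matches case-insensitively, but the backreference
           outside it must repeat the captured text exactly.

```perl
use strict;
use warnings;

# (?i) applies only inside the capture group; \g1 is outside it.
my $re = qr/ ( (?i) blah ) \s+ \g1 /x;

my $same_case  = "BLAH BLAH" =~ $re ? 1 : 0;   # repetition has the same case
my $mixed_case = "BLAH blah" =~ $re ? 1 : 0;   # repetition differs in case

print "$same_case $mixed_case\n";              # prints "1 0"
```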
1296
1297           These modifiers do not carry over into named subpatterns called in
1298           the enclosing group. In other words, a pattern such as
1299           "((?i)(?&NAME))" does not change the case-sensitivity of the NAME
1300           pattern.
1301
1302           A modifier is overridden by later occurrences of this construct in
1303           the same scope containing the same modifier, so that
1304
1305               /((?im)foo(?-m)bar)/
1306
1307           matches all of "foobar" case insensitively, but uses "/m" rules for
1308           only the "foo" portion.  The "a" flag overrides "aa" as well;
1309           likewise "aa" overrides "a".  The same goes for "x" and "xx".
1310           Hence, in
1311
1312               /(?-x)foo/xx
1313
1314           both "/x" and "/xx" are turned off during matching "foo".  And in
1315
1316               /(?x)foo/x
1317
1318           "/x" but NOT "/xx" is turned on for matching "foo".  (One might
1319           mistakenly think that since the inner "(?x)" is already in the
1320           scope of "/x", that the result would effectively be the sum of
1321           them, yielding "/xx".  It doesn't work that way.)  Similarly, doing
           something like "(?xx-x)foo" turns off all "x" behavior for matching
           "foo"; it is not that you subtract 1 "x" from 2 to get 1 "x"
1324           remaining.
1325
1326           Any of these modifiers can be set to apply globally to all regular
1327           expressions compiled within the scope of a "use re".  See "'/flags'
1328           mode" in re.
1329
1330           Starting in Perl 5.14, a "^" (caret or circumflex accent)
1331           immediately after the "?" is a shorthand equivalent to "d-imnsx".
1332           Flags (except "d") may follow the caret to override it.  But a
1333           minus sign is not legal with it.
1334
1335           Note that the "a", "d", "l", "p", and "u" modifiers are special in
1336           that they can only be enabled, not disabled, and the "a", "d", "l",
1337           and "u" modifiers are mutually exclusive: specifying one de-
1338           specifies the others, and a maximum of one (or two "a"'s) may
1339           appear in the construct.  Thus, for example, "(?-p)" will warn when
1340           compiled under "use warnings"; "(?-d:...)" and "(?dl:...)" are
1341           fatal errors.
1342
1343           Note also that the "p" modifier is special in that its presence
1344           anywhere in a pattern has a global effect.
1345
1346           Having zero modifiers makes this a no-op (so why did you specify
1347           it, unless it's generated code), and starting in v5.30, warns under
1348           "use re 'strict'".
1349
1350       "(?:pattern)"
1351       "(?adluimnsx-imnsx:pattern)"
1352       "(?^aluimnsx:pattern)"
1353           This is for clustering, not capturing; it groups subexpressions
1354           like "()", but doesn't make backreferences as "()" does.  So
1355
1356               @fields = split(/\b(?:a|b|c)\b/)
1357
1358           matches the same field delimiters as
1359
1360               @fields = split(/\b(a|b|c)\b/)
1361
1362           but doesn't spit out the delimiters themselves as extra fields
1363           (even though that's the behaviour of "split" in perlfunc when its
1364           pattern contains capturing groups).  It's also cheaper not to
1365           capture characters if you don't need to.
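
           The effect on "split" can be sketched with a simpler delimiter
           pattern than the one above (the input line is invented):

```perl
use strict;
use warnings;

my $line = "1,2;3";                           # illustrative input

# Clustering only: the delimiters are discarded.
my @clustered = split /(?:,|;)/, $line;       # ("1", "2", "3")
# Capturing: each delimiter is returned as an extra field.
my @captured  = split /(,|;)/,   $line;       # ("1", ",", "2", ";", "3")

print scalar(@clustered), " ", scalar(@captured), "\n";   # prints "3 5"
```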
1366
1367           Any letters between "?" and ":" act as flags modifiers as with
1368           "(?adluimnsx-imnsx)".  For example,
1369
1370               /(?s-i:more.*than).*million/i
1371
1372           is equivalent to the more verbose
1373
1374               /(?:(?s-i)more.*than).*million/i
1375
1376           Note that any "()" constructs enclosed within this one will still
1377           capture unless the "/n" modifier is in effect.
1378
1379           Like the "(?adlupimnsx-imnsx)" construct, "aa" and "a" override
1380           each other, as do "xx" and "x".  They are not additive.  So, doing
1381           something like "(?xx-x:foo)" turns off all "x" behavior for
1382           matching "foo".
1383
1384           Starting in Perl 5.14, a "^" (caret or circumflex accent)
1385           immediately after the "?" is a shorthand equivalent to "d-imnsx".
1386           Any positive flags (except "d") may follow the caret, so
1387
1388               (?^x:foo)
1389
1390           is equivalent to
1391
1392               (?x-imns:foo)
1393
1394           The caret tells Perl that this cluster doesn't inherit the flags of
1395           any surrounding pattern, but uses the system defaults ("d-imnsx"),
1396           modified by any flags specified.
1397
1398           The caret allows for simpler stringification of compiled regular
1399           expressions.  These look like
1400
1401               (?^:pattern)
1402
1403           with any non-default flags appearing between the caret and the
1404           colon.  A test that looks at such stringification thus doesn't need
1405           to have the system default flags hard-coded in it, just the caret.
1406           If new flags are added to Perl, the meaning of the caret's
1407           expansion will change to include the default for those flags, so
1408           the test will still work, unchanged.
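
           As a sketch, stringifying two compiled patterns shows the caret
           form.  The output shown assumes Perl 5.14 or later with no
           pragma in scope that changes the default character-set
           modifier (such as "use feature 'unicode_strings'" or "use
           locale").

```perl
use strict;
use warnings;

my $plain  = qr/foo/;      # default flags only
my $nocase = qr/foo/i;     # one non-default flag

print "$plain $nocase\n";  # prints "(?^:foo) (?^i:foo)"
```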
1409
1410           Specifying a negative flag after the caret is an error, as the flag
1411           is redundant.
1412
1413           Mnemonic for "(?^...)":  A fresh beginning since the usual use of a
1414           caret is to match at the beginning.
1415
1416       "(?|pattern)"
1417           This is the "branch reset" pattern, which has the special property
1418           that the capture groups are numbered from the same starting point
1419           in each alternation branch. It is available starting from perl
1420           5.10.0.
1421
1422           Capture groups are numbered from left to right, but inside this
1423           construct the numbering is restarted for each branch.
1424
1425           The numbering within each branch will be as normal, and any groups
1426           following this construct will be numbered as though the construct
1427           contained only one branch, that being the one with the most capture
1428           groups in it.
1429
1430           This construct is useful when you want to capture one of a number
1431           of alternative matches.
1432
1433           Consider the following pattern.  The numbers underneath show in
1434           which group the captured content will be stored.
1435
1436               # before  ---------------branch-reset----------- after
1437               / ( a )  (?| x ( y ) z | (p (q) r) | (t) u (v) ) ( z ) /x
1438               # 1            2         2  3        2     3     4
1439
1440           Be careful when using the branch reset pattern in combination with
1441           named captures. Named captures are implemented as being aliases to
1442           numbered groups holding the captures, and that interferes with the
1443           implementation of the branch reset pattern. If you are using named
1444           captures in a branch reset pattern, it's best to use the same
1445           names, in the same order, in each of the alternations:
1446
1447              /(?|  (?<a> x ) (?<b> y )
1448                 |  (?<a> z ) (?<b> w )) /x
1449
1450           Not doing so may lead to surprises:
1451
1452             "12" =~ /(?| (?<a> \d+ ) | (?<b> \D+))/x;
1453             say $+{a};    # Prints '12'
1454             say $+{b};    # *Also* prints '12'.
1455
1456           The problem here is that both the group named "a" and the group
1457           named "b" are aliases for the group belonging to $1.
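
           A typical use of branch reset with plain numbered groups might
           look like the following sketch, where whichever alternative
           matches leaves its capture in $1.  The attribute syntax and
           the inputs are invented for the example.

```perl
use strict;
use warnings;

# Each branch has exactly one group, and all of them are group 1.
my $attr = qr/ name= (?| "([^"]*)" | '([^']*)' | (\S+) ) /x;

my @values;
for my $input (q{name="foo bar"}, q{name='x y'}, q{name=plain}) {
    push @values, $1 if $input =~ $attr;
}

print join("|", @values), "\n";   # prints "foo bar|x y|plain"
```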
1458
1459       Lookaround Assertions
1460           Lookaround assertions are zero-width patterns which match a
1461           specific pattern without including it in $&. Positive assertions
1462           match when their subpattern matches, negative assertions match when
1463           their subpattern fails. Lookbehind matches text up to the current
1464           match position, lookahead matches text following the current match
1465           position.
1466
1467           "(?=pattern)"
1468           "(*pla:pattern)"
1469           "(*positive_lookahead:pattern)"
1470               A zero-width positive lookahead assertion.  For example,
1471               "/\w+(?=\t)/" matches a word followed by a tab, without
1472               including the tab in $&.
1473
1474           "(?!pattern)"
1475           "(*nla:pattern)"
1476           "(*negative_lookahead:pattern)"
1477               A zero-width negative lookahead assertion.  For example
1478               "/foo(?!bar)/" matches any occurrence of "foo" that isn't
1479               followed by "bar".  Note however that lookahead and lookbehind
1480               are NOT the same thing.  You cannot use this for lookbehind.
1481
1482               If you are looking for a "bar" that isn't preceded by a "foo",
1483               "/(?!foo)bar/" will not do what you want.  That's because the
1484               "(?!foo)" is just saying that the next thing cannot be
1485               "foo"--and it's not, it's a "bar", so "foobar" will match.  Use
1486               lookbehind instead (see below).
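
               The difference can be checked with a short sketch (the
               variable names are illustrative):

```perl
use strict;
use warnings;

# Lookahead at the wrong place: "bar" is still found inside "foobar".
my $lookahead_hit  = "foobar" =~ /(?!foo)bar/  ? 1 : 0;
# Lookbehind: "bar" is rejected because "foo" precedes it.
my $lookbehind_hit = "foobar" =~ /(?<!foo)bar/ ? 1 : 0;
# Lookbehind: "bar" is accepted because "baz" precedes it.
my $other_hit      = "bazbar" =~ /(?<!foo)bar/ ? 1 : 0;

print "$lookahead_hit $lookbehind_hit $other_hit\n";   # prints "1 0 1"
```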
1487
1488           "(?<=pattern)"
1489           "\K"
1490           "(*plb:pattern)"
1491           "(*positive_lookbehind:pattern)"
1492               A zero-width positive lookbehind assertion.  For example,
1493               "/(?<=\t)\w+/" matches a word that follows a tab, without
1494               including the tab in $&.
1495
1496               Prior to Perl 5.30, it worked only for fixed-width lookbehind,
1497               but starting in that release, it can handle variable lengths
1498               from 1 to 255 characters as an experimental feature.  The
1499               feature is enabled automatically if you use a variable length
1500               lookbehind assertion, but will raise a warning at pattern
1501               compilation time, unless turned off, in the "experimental::vlb"
1502               category.  This is to warn you that the exact behavior is
1503               subject to change should feedback from actual use in the field
1504               indicate to do so; or even complete removal if the problems
1505               found are not practically surmountable.  You can achieve close
1506               to pre-5.30 behavior by fatalizing warnings in this category.
1507
1508               There is a special form of this construct, called "\K"
1509               (available since Perl 5.10.0), which causes the regex engine to
1510               "keep" everything it had matched prior to the "\K" and not
1511               include it in $&. This effectively provides non-experimental
1512               variable-length lookbehind of any length.
1513
1514               And, there is a technique that can be used to handle variable
1515               length lookbehinds on earlier releases, and longer than 255
1516               characters.  It is described in
1517               <http://www.drregex.com/2019/02/variable-length-lookbehinds-actually.html>.
1518
1519               Note that under "/i", a few single characters match two or
1520               three other characters.  This makes them variable length, and
1521               the 255 length applies to the maximum number of characters in
1522               the match.  For example "qr/\N{LATIN SMALL LETTER SHARP S}/i"
1523               matches the sequence "ss".  Your lookbehind assertion could
1524               contain 127 Sharp S characters under "/i", but adding a 128th
1525               would generate a compilation error, as that could match 256 "s"
1526               characters in a row.
1527
1528               The use of "\K" inside of another lookaround assertion is
1529               allowed, but the behaviour is currently not well defined.
1530
1531               For various reasons "\K" may be significantly more efficient
1532               than the equivalent "(?<=...)" construct, and it is especially
1533               useful in situations where you want to efficiently remove
1534               something following something else in a string. For instance
1535
1536                 s/(foo)bar/$1/g;
1537
1538               can be rewritten as the much more efficient
1539
1540                 s/foo\Kbar//g;
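
               A sketch confirming that the two substitutions give the
               same result on an invented sample string:

```perl
use strict;
use warnings;

my $with_capture = "foobar barfoobar";   # illustrative input
my $with_keep    = $with_capture;

$with_capture =~ s/(foo)bar/$1/g;        # capture "foo" and put it back
$with_keep    =~ s/foo\Kbar//g;          # keep "foo", replace only "bar"

print "$with_capture\n";                 # prints "foo barfoo"
print "$with_keep\n";                    # prints "foo barfoo"
```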
1541
1542               Use of the non-greedy modifier "?" may not give you the
1543               expected results if it is within a capturing group within the
1544               construct.
1545
1546           "(?<!pattern)"
1547           "(*nlb:pattern)"
1548           "(*negative_lookbehind:pattern)"
1549               A zero-width negative lookbehind assertion.  For example
1550               "/(?<!bar)foo/" matches any occurrence of "foo" that does not
1551               follow "bar".
1552
1553               Prior to Perl 5.30, it worked only for fixed-width lookbehind,
1554               but starting in that release, it can handle variable lengths
1555               from 1 to 255 characters as an experimental feature.  The
1556               feature is enabled automatically if you use a variable length
1557               lookbehind assertion, but will raise a warning at pattern
1558               compilation time, unless turned off, in the "experimental::vlb"
1559               category.  This is to warn you that the exact behavior is
1560               subject to change should feedback from actual use in the field
1561               indicate to do so; or even complete removal if the problems
1562               found are not practically surmountable.  You can achieve close
1563               to pre-5.30 behavior by fatalizing warnings in this category.
1564
1565               There is a technique that can be used to handle variable length
1566               lookbehinds on earlier releases, and longer than 255
1567               characters.  It is described in
1568               <http://www.drregex.com/2019/02/variable-length-lookbehinds-actually.html>.
1569
1570               Note that under "/i", a few single characters match two or
1571               three other characters.  This makes them variable length, and
1572               the 255 length applies to the maximum number of characters in
1573               the match.  For example "qr/\N{LATIN SMALL LETTER SHARP S}/i"
1574               matches the sequence "ss".  Your lookbehind assertion could
1575               contain 127 Sharp S characters under "/i", but adding a 128th
1576               would generate a compilation error, as that could match 256 "s"
1577               characters in a row.
1578
1579               Use of the non-greedy modifier "?" may not give you the
1580               expected results if it is within a capturing group within the
1581               construct.
1582
1583       "(?<NAME>pattern)"
1584       "(?'NAME'pattern)"
1585           A named capture group. Identical in every respect to normal
1586           capturing parentheses "()" but for the additional fact that the
1587           group can be referred to by name in various regular expression
1588           constructs (like "\g{NAME}") and can be accessed by name after a
1589           successful match via "%+" or "%-". See perlvar for more details on
1590           the "%+" and "%-" hashes.
1591
1592           If multiple distinct capture groups have the same name, then
1593           $+{NAME} will refer to the leftmost defined group in the match.
1594
1595           The forms "(?'NAME'pattern)" and "(?<NAME>pattern)" are equivalent.
1596
1597           NOTE: While the notation of this construct is the same as the
1598           similar function in .NET regexes, the behavior is not. In Perl the
1599           groups are numbered sequentially regardless of being named or not.
1600           Thus in the pattern
1601
1602             /(x)(?<foo>y)(z)/
1603
1604           $+{foo} will be the same as $2, and $3 will contain 'z' instead of
1605           the opposite which is what a .NET regex hacker might expect.
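
           The numbering can be verified with a sketch along these lines
           (the variable names are illustrative):

```perl
use strict;
use warnings;

my ($by_name, $second, $third);
if ("xyz" =~ /(x)(?<foo>y)(z)/) {
    # The named group is also group 2; the unnamed (z) is group 3.
    ($by_name, $second, $third) = ($+{foo}, $2, $3);
}

print "$by_name $second $third\n";   # prints "y y z"
```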
1606
1607           Currently NAME is restricted to simple identifiers only.  In other
1608           words, it must match "/^[_A-Za-z][_A-Za-z0-9]*\z/" or its Unicode
1609           extension (see utf8), though it isn't extended by the locale (see
1610           perllocale).
1611
1612           NOTE: In order to make things easier for programmers with
1613           experience with the Python or PCRE regex engines, the pattern
1614           "(?P<NAME>pattern)" may be used instead of "(?<NAME>pattern)";
1615           however this form does not support the use of single quotes as a
1616           delimiter for the name.
1617
1618       "\k<NAME>"
1619       "\k'NAME'"
1620       "\k{NAME}"
1621           Named backreference. Similar to numeric backreferences, except that
1622           the group is designated by name and not number. If multiple groups
1623           have the same name then it refers to the leftmost defined group in
1624           the current match.
1625
1626           It is an error to refer to a name not defined by a "(?<NAME>)"
1627           earlier in the pattern.
1628
1629           All three forms are equivalent, although with "\k{ NAME }", you may
1630           optionally have blanks within but adjacent to the braces, as shown.
1631
1632           NOTE: In order to make things easier for programmers with
1633           experience with the Python or PCRE regex engines, the pattern
1634           "(?P=NAME)" may be used instead of "\k<NAME>".
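
           As a sketch, both spellings can be used to find a doubled word.
           The group name "word" and the sample sentences are invented
           for the example.

```perl
use strict;
use warnings;

my $doubled      = qr/\b(?<word>\w+)\s+\k<word>\b/;
my $python_style = qr/\b(?P<word>\w+)\s+(?P=word)\b/;

my $perl_hit   = "it is is here" =~ $doubled      ? $+{word} : "";
my $python_hit = "it is is here" =~ $python_style ? $+{word} : "";
my $no_hit     = "this is here"  =~ $doubled      ? 1 : 0;

print "$perl_hit $python_hit $no_hit\n";   # prints "is is 0"
```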
1635
1636       "(?{ code })"
1637           WARNING: Using this feature safely requires that you understand its
1638           limitations.  Code executed that has side effects may not perform
1639           identically from version to version due to the effect of future
1640           optimisations in the regex engine.  For more information on this,
1641           see "Embedded Code Execution Frequency".
1642
1643           This zero-width assertion executes any embedded Perl code.  It
1644           always succeeds, and its return value is set as $^R.
1645
1646           In literal patterns, the code is parsed at the same time as the
1647           surrounding code. While within the pattern, control is passed
1648           temporarily back to the perl parser, until the logically-balancing
1649           closing brace is encountered. This is similar to the way that an
1650           array index expression in a literal string is handled, for example
1651
1652               "abc$array[ 1 + f('[') + g()]def"
1653
1654           In particular, braces do not need to be balanced:
1655
1656               s/abc(?{ f('{'); })/def/
1657
1658           Even in a pattern that is interpolated and compiled at run-time,
1659           literal code blocks will be compiled once, at perl compile time;
1660           the following prints "ABCD":
1661
1662               print "D";
1663               my $qr = qr/(?{ BEGIN { print "A" } })/;
1664               my $foo = "foo";
1665               /$foo$qr(?{ BEGIN { print "B" } })/;
1666               BEGIN { print "C" }
1667
1668           In patterns where the text of the code is derived from run-time
1669           information rather than appearing literally in a source code
1670           /pattern/, the code is compiled at the same time that the pattern
1671           is compiled, and for reasons of security, "use re 'eval'" must be
1672           in scope. This is to stop user-supplied patterns containing code
1673           snippets from being executable.
1674
1675           In situations where you need to enable this with "use re 'eval'",
1676           you should also have taint checking enabled.  Better yet, use the
1677           carefully constrained evaluation within a Safe compartment.  See
1678           perlsec for details about both these mechanisms.
1679
1680           From the viewpoint of parsing, lexical variable scope and closures,
1681
1682               /AAA(?{ BBB })CCC/
1683
1684           behaves approximately like
1685
1686               /AAA/ && do { BBB } && /CCC/
1687
1688           Similarly,
1689
1690               qr/AAA(?{ BBB })CCC/
1691
1692           behaves approximately like
1693
1694               sub { /AAA/ && do { BBB } && /CCC/ }
1695
1696           In particular:
1697
1698               { my $i = 1; $r = qr/(?{ print $i })/ }
1699               my $i = 2;
1700               /$r/; # prints "1"
1701
           Inside a "(?{...})" block, $_ refers to the string the regular
           expression is matching against. You can also use "pos()" to find
           the current match position within this string.
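
           A sketch of both facts, assuming a reasonably recent perl (5.18
           or later, where lexical variables behave predictably inside
           code blocks); the variable names are illustrative.

```perl
use strict;
use warnings;

my ($target, $offset);

# The block runs once "ll" has matched, i.e. at offset 4 of "hello".
my $ok = "hello" =~ /ll(?{ $target = $_; $offset = pos() })/;

print "$target $offset\n";   # prints "hello 4"
```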
1705
1706           The code block introduces a new scope from the perspective of
1707           lexical variable declarations, but not from the perspective of
1708           "local" and similar localizing behaviours. So later code blocks
1709           within the same pattern will still see the values which were
1710           localized in earlier blocks.  These accumulated localizations are
1711           undone either at the end of a successful match, or if the assertion
1712           is backtracked (compare "Backtracking"). For example,
1713
1714             $_ = 'a' x 8;
1715             m<
1716                (?{ $cnt = 0 })               # Initialize $cnt.
1717                (
1718                  a
1719                  (?{
1720                      local $cnt = $cnt + 1;  # Update $cnt,
1721                                              # backtracking-safe.
1722                  })
1723                )*
1724                aaaa
1725                (?{ $res = $cnt })            # On success copy to
1726                                              # non-localized location.
1727              >x;
1728
1729           will initially increment $cnt up to 8; then during backtracking,
1730           its value will be unwound back to 4, which is the value assigned to
1731           $res.  At the end of the regex execution, $cnt will be wound back
1732           to its initial value of 0.
1733
1734           This assertion may be used as the condition in a
1735
1736               (?(condition)yes-pattern|no-pattern)
1737
1738           switch.  If not used in this way, the result of evaluation of code
1739           is put into the special variable $^R.  This happens immediately, so
1740           $^R can be used from other "(?{ code })" assertions inside the same
1741           regular expression.
1742
1743           The assignment to $^R above is properly localized, so the old value
1744           of $^R is restored if the assertion is backtracked; compare
1745           "Backtracking".
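
           A minimal sketch of reading $^R once the match has succeeded
           (the value 42 is arbitrary):

```perl
use strict;
use warnings;

# The code block's value becomes $^R when the block is executed.
my $ok     = "abc" =~ /b(?{ 42 })/;
my $result = $^R;

print "$result\n";   # prints "42"
```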
1746
           Note that the special variable $^N is particularly useful with
1748           code blocks to capture the results of submatches in variables
1749           without having to keep track of the number of nested parentheses.
1750           For example:
1751
1752             $_ = "The brown fox jumps over the lazy dog";
1753             /the (\S+)(?{ $color = $^N }) (\S+)(?{ $animal = $^N })/i;
1754             print "color = $color, animal = $animal\n";
1755
1756       "(??{ code })"
1757           WARNING: Using this feature safely requires that you understand its
1758           limitations.  Code executed that has side effects may not perform
1759           identically from version to version due to the effect of future
1760           optimisations in the regex engine.  For more information on this,
1761           see "Embedded Code Execution Frequency".
1762
1763           This is a "postponed" regular subexpression.  It behaves in exactly
1764           the same way as a "(?{ code })" code block as described above,
1765           except that its return value, rather than being assigned to $^R, is
1766           treated as a pattern, compiled if it's a string (or used as-is if
           it's a qr// object), then matched as if it were inserted instead of
1768           this construct.
1769
1770           During the matching of this sub-pattern, it has its own set of
1771           captures which are valid during the sub-match, but are discarded
1772           once control returns to the main pattern. For example, the
1773           following matches, with the inner pattern capturing "B" and
           matching "BB", while the outer pattern captures "A":
1775
1776               my $inner = '(.)\1';
1777               "ABBA" =~ /^(.)(??{ $inner })\1/;
               print $1; # prints "A"
1779
           Note that this means that there is no way for the inner pattern to
1781           refer to a capture group defined outside.  (The code block itself
1782           can use $1, etc., to refer to the enclosing pattern's capture
1783           groups.)  Thus, although
1784
1785               ('a' x 100)=~/(??{'(.)' x 100})/
1786
1787           will match, it will not set $1 on exit.
1788
1789           The following pattern matches a parenthesized group:
1790
1791            $re = qr{
1792                       \(
1793                       (?:
1794                          (?> [^()]+ )  # Non-parens without backtracking
1795                        |
1796                          (??{ $re })   # Group with matching parens
1797                       )*
1798                       \)
1799                    }x;
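
           The pattern might be exercised as in the sketch below.  $re is
           declared before it is assigned so that the code block can
           refer to it under "use strict"; the test strings are invented.

```perl
use strict;
use warnings;

my $re;
$re = qr{
           \(
           (?:
              (?> [^()]+ )  # Non-parens without backtracking
            |
              (??{ $re })   # Group with matching parens
           )*
           \)
        }x;

# The match is unanchored, so $& holds the first balanced group found.
my $balanced = "x(a(b)c)y" =~ $re ? $& : "";
my $partial  = "((a)"      =~ $re ? $& : "";

print "$balanced $partial\n";   # prints "(a(b)c) (a)"
```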
1800
1801           See also "(?PARNO)" for a different, more efficient way to
1802           accomplish the same task.
1803
1804           Executing a postponed regular expression too many times without
1805           consuming any input string will also result in a fatal error.  The
1806           depth at which that happens is compiled into perl, so it can be
1807           changed with a custom build.
1808
1809       "(?PARNO)" "(?-PARNO)" "(?+PARNO)" "(?R)" "(?0)"
1810           Recursive subpattern. Treat the contents of a given capture buffer
1811           in the current pattern as an independent subpattern and attempt to
1812           match it at the current position in the string. Information about
1813           capture state from the caller for things like backreferences is
1814           available to the subpattern, but capture buffers set by the
1815           subpattern are not visible to the caller.
1816
1817           Similar to "(??{ code })" except that it does not involve executing
1818           any code or potentially compiling a returned pattern string;
1819           instead it treats the part of the current pattern contained within
1820           a specified capture group as an independent pattern that must match
           at the current position. The treatment of capture buffers also
           differs: unlike "(??{ code })", recursive patterns have access to
           their caller's match state, so one can use backreferences safely.
1824
1825           PARNO is a sequence of digits (not starting with 0) whose value
1826           reflects the paren-number of the capture group to recurse to.
1827           "(?R)" recurses to the beginning of the whole pattern. "(?0)" is an
1828           alternate syntax for "(?R)". If PARNO is preceded by a plus or
1829           minus sign then it is assumed to be relative, with negative numbers
1830           indicating preceding capture groups and positive ones following.
1831           Thus "(?-1)" refers to the most recently declared group, and
1832           "(?+1)" indicates the next group to be declared.  Note that the
1833           counting for relative recursion differs from that of relative
1834           backreferences, in that with recursion unclosed groups are
1835           included.
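
           The numbering rules above can be illustrated with a few small
           patterns.  This is only a sketch; the hash %pattern, its keys and
           the helper "recursion_matches" are hypothetical names.

```perl
use strict;
use warnings;

my %pattern = (
    absolute => qr/^(\d+)-(?1)$/,       # (?1): group 1, by absolute number
    previous => qr/^(\w)(\d+)-(?-1)$/,  # (?-1): most recently declared group, here 2
    next     => qr/^(?+1)-(\d\d)$/,     # (?+1): next group to be declared, here 1
    self     => qr/^(a(?-1)?b)$/,       # (?-1) inside the still-open group 1: itself
);

# Hypothetical helper: does $text match the named pattern?
sub recursion_matches {
    my ($name, $text) = @_;
    return $text =~ $pattern{$name} ? 1 : 0;
}

for my $case (['absolute', '12-345'], ['previous', 'a12-7'], ['previous', 'a12-b'],
              ['next', '12-34'], ['self', 'aabb'], ['self', 'aab']) {
    my ($name, $text) = @$case;
    printf "%-8s %-6s %s\n", $name, $text,
        recursion_matches($name, $text) ? 'matches' : 'no match';
}
```

           The "self" entry relies on unclosed groups being counted: inside
           group 1, "(?-1)" recurses into group 1 itself.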
1836
1837           The following pattern matches a function "foo()" which may contain
1838           balanced parentheses as the argument.
1839
1840             $re = qr{ (                   # paren group 1 (full function)
1841                         foo
1842                         (                 # paren group 2 (parens)
1843                           \(
1844                             (             # paren group 3 (contents of parens)
1845                             (?:
1846                              (?> [^()]+ ) # Non-parens without backtracking
1847                             |
1848                              (?2)         # Recurse to start of paren group 2
1849                             )*
1850                             )
1851                           \)
1852                         )
1853                       )
1854                     }x;
1855
1856           If the pattern was used as follows
1857
1858               'foo(bar(baz)+baz(bop))'=~/$re/
1859                   and print "\$1 = $1\n",
1860                             "\$2 = $2\n",
1861                             "\$3 = $3\n";
1862
1863           the output produced should be the following:
1864
1865               $1 = foo(bar(baz)+baz(bop))
1866               $2 = (bar(baz)+baz(bop))
1867               $3 = bar(baz)+baz(bop)
1868
1869           If there is no corresponding capture group defined, then it is a
1870           fatal error.  Recursing deeply without consuming any input string
1871           will also result in a fatal error.  The depth at which that happens
1872           is compiled into perl, so it can be changed with a custom build.
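
           Because a reference to a missing group is fatal, a pattern that
           arrives as a string at run time can be compiled inside "eval" to
           trap the error.  A minimal sketch, assuming a hypothetical helper
           named "compile_error":

```perl
use strict;
use warnings;

# Hypothetical helper: compile a pattern given as a string and return the
# error message, or undef if it compiled.
sub compile_error {
    my ($source) = @_;
    my $compiled = eval { qr/$source/ };
    return defined $compiled ? undef : $@;
}

for my $source ('(a)(?1)', '(a)(?2)') {
    printf "%-8s %s\n", $source,
        defined compile_error($source) ? 'rejected' : 'compiles';
}
```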
1873
1874           The following shows how using negative indexing can make it easier
1875           to embed recursive patterns inside of a "qr//" construct for later
1876           use:
1877
1878               my $parens = qr/(\((?:[^()]++|(?-1))*+\))/;
1879               if (/foo $parens \s+ \+ \s+ bar $parens/x) {
1880                  # do something here...
1881               }
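
           A runnable variant of the snippet above is sketched here.  The
           helper name "sum_args" and the sample input are assumptions; the
           point is that each interpolated copy of $parens recurses into its
           own group, whatever absolute number that group ends up with.

```perl
use strict;
use warnings;

my $parens = qr/(\((?:[^()]++|(?-1))*+\))/;

# Hypothetical helper: returns the two parenthesized argument lists.
sub sum_args {
    my ($text) = @_;
    if ($text =~ /foo $parens \s+ \+ \s+ bar $parens/x) {
        return ($1, $2);
    }
    return;
}

my ($foo_args, $bar_args) = sum_args('foo(1,(2)) + bar((3),4)');
print "foo: $foo_args\n";
print "bar: $bar_args\n";
```

           Had $parens used the absolute form "(?1)", the second copy would
           have recursed into the first copy's group instead of its own.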
1882
           Note that this pattern does not behave the same way as the
           equivalent PCRE or Python construct of the same form. In Perl you
           can backtrack into a recursed group, whereas in PCRE and Python
           the recursed-into group is treated as atomic. Also, modifiers are
           resolved at compile time, so constructs like "(?i:(?1))" or
           "(?:(?i)(?1))" do not affect how the sub-pattern will be processed.
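
           Both points can be checked with small patterns.  The helper names
           below are hypothetical, and the test strings are chosen only for
           illustration.

```perl
use strict;
use warnings;

# Perl can backtrack into a recursed group: with group 1 holding "a", the
# recursion (?1) first takes "aa" and has to give one "a" back so that the
# final literal "a" can match.
sub backtracks_into_recursion {
    my ($text) = @_;
    return $text =~ /^(a+)(?1)a$/ ? 1 : 0;
}

# Modifiers are fixed when group 1 is compiled, so the (?i:...) wrapped
# around the recursion does not make the recursed group case-insensitive.
sub scoped_modifier_match {
    my ($text) = @_;
    return $text =~ /^(a)(?i:(?1))$/ ? 1 : 0;
}

print "aaa with backtracking into (?1): ", backtracks_into_recursion('aaa'), "\n";
print "aa under (?i:(?1)):              ", scoped_modifier_match('aa'), "\n";
print "aA under (?i:(?1)):              ", scoped_modifier_match('aA'), "\n";
```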
1889
1890       "(?&NAME)"
1891           Recurse to a named subpattern. Identical to "(?PARNO)" except that
1892           the parenthesis to recurse to is determined by name. If multiple
1893           parentheses have the same name, then it recurses to the leftmost.
1894
1895           It is an error to refer to a name that is not declared somewhere in
1896           the pattern.
1897
1898           NOTE: In order to make things easier for programmers with
1899           experience with the Python or PCRE regex engines the pattern
1900           "(?P>NAME)" may be used instead of "(?&NAME)".
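
           As a sketch of named recursion, the following pattern accepts
           nested lists of integers such as "[1,[2,3],[]]".  The group name
           "list" and the helper "is_list" are assumptions made for this
           example.

```perl
use strict;
use warnings;

# The group named "list" recurses into itself by name rather than by number.
my $list = qr{
    (?<list>
        \[
        (?: (?: \d+ | (?&list) )
            (?: , (?: \d+ | (?&list) ) )*
        )?
        \]
    )
}x;

# Hypothetical helper: true if the whole string is one nested list.
sub is_list {
    my ($text) = @_;
    return $text =~ /^$list$/ ? 1 : 0;
}

for my $text ('[1,[2,3],[]]', '[1,[2,3]', '[1,,2]') {
    printf "%-14s %s\n", $text, is_list($text) ? 'list' : 'not a list';
}
```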
1901
1902       "(?(condition)yes-pattern|no-pattern)"
1903       "(?(condition)yes-pattern)"
1904           Conditional expression. Matches yes-pattern if condition yields a
1905           true value, matches no-pattern otherwise. A missing pattern always
1906           matches.
1907
1908           "(condition)" should be one of:
1909
1910           an integer in parentheses
1911               (which is valid if the corresponding pair of parentheses
1912               matched);
1913
           a lookahead/lookbehind/evaluate zero-width assertion;

           a name in angle brackets or single quotes
1916               (which is valid if a group with the given name matched);
1917
1918           the special symbol "(R)"
1919               (true when evaluated inside of recursion or eval).
1920               Additionally the "R" may be followed by a number, (which will
1921               be true when evaluated when recursing inside of the appropriate
1922               group), or by "&NAME", in which case it will be true only when
1923               evaluated during recursion in the named group.
1924
1925           Here's a summary of the possible predicates:
1926
1927           "(1)" "(2)" ...
1928               Checks if the numbered capturing group has matched something.
1929               Full syntax: "(?(1)then|else)"
1930
1931           "(<NAME>)" "('NAME')"
1932               Checks if a group with the given name has matched something.
1933               Full syntax: "(?(<name>)then|else)"
1934
1935           "(?=...)" "(?!...)" "(?<=...)" "(?<!...)"
1936               Checks whether the pattern matches (or does not match, for the
1937               "!"  variants).  Full syntax: "(?(?=lookahead)then|else)"
1938
1939           "(?{ CODE })"
1940               Treats the return value of the code block as the condition.
1941               Full syntax: "(?(?{ code })then|else)"
1942
1943           "(R)"
1944               Checks if the expression has been evaluated inside of
1945               recursion.  Full syntax: "(?(R)then|else)"
1946
1947           "(R1)" "(R2)" ...
1948               Checks if the expression has been evaluated while executing
1949               directly inside of the n-th capture group. This check is the
1950               regex equivalent of
1951
1952                 if ((caller(0))[3] eq 'subname') { ... }
1953
1954               In other words, it does not check the full recursion stack.
1955
1956               Full syntax: "(?(R1)then|else)"
1957
1958           "(R&NAME)"
1959               Similar to "(R1)", this predicate checks to see if we're
1960               executing directly inside of the leftmost group with a given
1961               name (this is the same logic used by "(?&NAME)" to
1962               disambiguate). It does not check the full stack, but only the
1963               name of the innermost active recursion.  Full syntax:
1964               "(?(R&name)then|else)"
1965
1966           "(DEFINE)"
1967               In this case, the yes-pattern is never directly executed, and
1968               no no-pattern is allowed. Similar in spirit to "(?{0})" but
1969               more efficient.  See below for details.  Full syntax:
1970               "(?(DEFINE)definitions...)"
1971
1972           For example:
1973
1974               m{ ( \( )?
1975                  [^()]+
1976                  (?(1) \) )
1977                }x
1978
1979           matches a chunk of non-parentheses, possibly included in
1980           parentheses themselves.
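
           The conditional above can be tried out as follows.  This sketch
           adds "\A" and "\z" anchors, which are not part of the original
           pattern, so that a missing or stray parenthesis makes the whole
           match fail; the helper name "is_chunk" is hypothetical.

```perl
use strict;
use warnings;

# The closing parenthesis is required exactly when group 1 (the opening
# parenthesis) took part in the match.
sub is_chunk {
    my ($text) = @_;
    return $text =~ m{ \A ( \( )? [^()]+ (?(1) \) ) \z }x ? 1 : 0;
}

for my $text ('abc', '(abc)', '(abc', 'abc)') {
    printf "%-6s %s\n", $text, is_chunk($text) ? 'ok' : 'rejected';
}
```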
1981
1982           A special form is the "(DEFINE)" predicate, which never executes
1983           its yes-pattern directly, and does not allow a no-pattern. This
1984           allows one to define subpatterns which will be executed only by the
1985           recursion mechanism.  This way, you can define a set of regular
1986           expression rules that can be bundled into any pattern you choose.
1987
1988           It is recommended that for this usage you put the DEFINE block at
1989           the end of the pattern, and that you name any subpatterns defined
1990           within it.
1991
1992           Also, it's worth noting that patterns defined this way probably
1993           will not be as efficient, as the optimizer is not very clever about
1994           handling them.
1995
1996           An example of how this might be used is as follows:
1997
1998             /(?<NAME>(?&NAME_PAT))(?<ADDR>(?&ADDRESS_PAT))
1999              (?(DEFINE)
2000                (?<NAME_PAT>....)
2001                (?<ADDRESS_PAT>....)
2002              )/x
2003
2004           Note that capture groups matched inside of recursion are not
2005           accessible after the recursion returns, so the extra layer of
2006           capturing groups is necessary. Thus $+{NAME_PAT} would not be
2007           defined even though $+{NAME} would be.
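
           A filled-in version of that skeleton might look like the sketch
           below.  The record format "name@address", the concrete
           definitions and the helper "parse_record" are all assumptions
           made for illustration.

```perl
use strict;
use warnings;

# Hypothetical record format, e.g. "alice@10.0.0.1".
my $record = qr{
    \A (?<NAME> (?&NAME_PAT) ) \@ (?<ADDR> (?&ADDRESS_PAT) ) \z

    (?(DEFINE)
        (?<NAME_PAT>    [a-z]+ )
        (?<ADDRESS_PAT> \d+ (?: \. \d+ ){3} )
    )
}x;

# Hypothetical helper: returns a hash reference of the named captures.
sub parse_record {
    my ($text) = @_;
    return unless $text =~ $record;
    return +{ %+ };     # copy %+ now; it reflects the last successful match
}

my $fields = parse_record('alice@10.0.0.1');
print "name: $fields->{NAME}\n";
print "addr: $fields->{ADDR}\n";
print "NAME_PAT is ", (defined $fields->{NAME_PAT} ? 'set' : 'not set'), "\n";
```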
2008
2009           Finally, keep in mind that subpatterns created inside a DEFINE
2010           block count towards the absolute and relative number of captures,
2011           so this:
2012
2013               my @captures = "a" =~ /(.)                  # First capture
2014                                      (?(DEFINE)
2015                                          (?<EXAMPLE> 1 )  # Second capture
2016                                      )/x;
2017               say scalar @captures;
2018
           will output 2, not 1. This is particularly important if you intend
           to compile the definitions with the "qr//" operator, and later
           interpolate them in another pattern.
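
           The effect on interpolated definitions can be sketched as
           follows.  The variable $defs, the name "WORD" and the two helper
           subs are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical definitions block, compiled separately for reuse.
my $defs = qr/(?(DEFINE)(?<WORD>[a-z]+))/;

# Definitions first: WORD takes group number 1 and shifts the real captures.
sub captures_defs_first {
    my ($text) = @_;
    my @captures = $text =~ /$defs((?&WORD)) (\d+)/;
    return @captures;
}

# Definitions last: the real captures keep numbers 1 and 2.
sub captures_defs_last {
    my ($text) = @_;
    my @captures = $text =~ /((?&WORD)) (\d+)$defs/;
    return @captures;
}

for my $form (\&captures_defs_first, \&captures_defs_last) {
    print join(', ', map { defined $_ ? "'$_'" : 'undef' } $form->('ab 12')), "\n";
}
```

           In both forms three values come back; only the position of the
           undefined slot that belongs to "WORD" changes.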
2022
2023       "(?>pattern)"
2024       "(*atomic:pattern)"
2025           An "independent" subexpression, one which matches the substring
2026           that a standalone pattern would match if anchored at the given
2027           position, and it matches nothing other than this substring.  This
2028           construct is useful for optimizations of what would otherwise be
2029           "eternal" matches, because it will not backtrack (see
2030           "Backtracking").  It may also be useful in places where the "grab
2031           all you can, and do not give anything back" semantic is desirable.
2032
2033           For example: "^(?>a*)ab" will never match, since "(?>a*)" (anchored
2034           at the beginning of string, as above) will match all characters "a"
2035           at the beginning of string, leaving no "a" for "ab" to match.  In
2036           contrast, "a*ab" will match the same as "a+b", since the match of
2037           the subgroup "a*" is influenced by the following group "ab" (see
2038           "Backtracking").  In particular, "a*" inside "a*ab" will match
2039           fewer characters than a standalone "a*", since this makes the tail
2040           match.
2041
2042           "(?>pattern)" does not disable backtracking altogether once it has
2043           matched. It is still possible to backtrack past the construct, but
2044           not into it. So "((?>a*)|(?>b*))ar" will still match "bar".
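
           The three claims above can be checked directly.  The helper names
           are hypothetical, and the target string "aaab" is an assumption
           chosen to fit the example.

```perl
use strict;
use warnings;

# Each hypothetical helper returns 1 on a match and 0 otherwise.
sub greedy_matches { my ($text) = @_; return $text =~ /^a*ab/     ? 1 : 0 }
sub atomic_matches { my ($text) = @_; return $text =~ /^(?>a*)ab/ ? 1 : 0 }
sub either_branch  { my ($text) = @_; return $text =~ /((?>a*)|(?>b*))ar/ ? 1 : 0 }

print "a*ab on aaab:                ", greedy_matches('aaab'), "\n";
print "(?>a*)ab on aaab:            ", atomic_matches('aaab'), "\n";
print "((?>a*)|(?>b*))ar on bar:    ", either_branch('bar'), "\n";
```

           In the last pattern the engine backtracks past the first atomic
           group, which matched the empty string, and tries the second
           alternative.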
2045
           An effect similar to "(?>pattern)" may be achieved by writing
           "(?=(pattern))\g{-1}".  The lookahead matches the same substring
           as a standalone pattern would, and the following "\g{-1}" eats the
           matched string; it therefore makes a zero-length assertion into an
           analogue of "(?>...)".  (The difference between these two
           constructs is that the second one uses a capturing group, thus
           shifting ordinals of backreferences in the rest of a regular
           expression.)
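
           A short comparison of the two spellings, as a sketch; the helper
           names and sample strings are assumptions.

```perl
use strict;
use warnings;

# Both forms refuse to give back any "a", so neither can match "aaab" when
# the rest of the pattern asks for one more "a".
sub atomic_form   { my ($text) = @_; return $text =~ /^(?>a+)ab/         ? 1 : 0 }
sub emulated_form { my ($text) = @_; return $text =~ /^(?=(a+))\g{-1}ab/ ? 1 : 0 }

# The emulation spends a capture group, so a later group moves from 1 to 2.
sub tail_after_atomic {
    my ($text) = @_;
    my ($tail) = $text =~ /^(?>a+)b-(x)/;
    return $tail;
}
sub tail_after_emulated {
    my ($text) = @_;
    my (undef, $tail) = $text =~ /^(?=(a+))\g{-1}b-(x)/;
    return $tail;
}

print "atomic:   ", atomic_form('aaab'), "\n";
print "emulated: ", emulated_form('aaab'), "\n";
print "tails:    ", tail_after_atomic('aaab-x'), ' ', tail_after_emulated('aaab-x'), "\n";
```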
2053
2054           Consider this pattern:
2055
2056               m{ \(
2057                     (
2058                       [^()]+           # x+
2059                     |
2060                       \( [^()]* \)
2061                     )+
2062                  \)
2063                }x
2064
2065           That will efficiently match a nonempty group with matching
2066           parentheses two levels deep or less.  However, if there is no such
2067           group, it will take virtually forever on a long string.  That's
2068           because there are so many different ways to split a long string
2069           into several substrings.  This is what "(.+)+" is doing, and
           "(.+)+" is similar to a subpattern of the above pattern.  The
           pattern above takes several seconds to detect no-match on
           "((()aaaaaaaaaaaaaaaaaa", and each extra letter doubles this time.
           This exponential performance will make it appear that your program
           has hung.  However, a tiny change to this pattern
2075
2076               m{ \(
2077                     (
2078                       (?> [^()]+ )        # change x+ above to (?> x+ )
2079                     |
2080                       \( [^()]* \)
2081                     )+
2082                  \)
2083                }x
2084
           which uses "(?>...)" matches exactly when the one above does
           (verifying this yourself would be a productive exercise), but
           finishes in a quarter of the time when used on a similar string
           with 1000000 "a"s.  Be aware, however, that when this construct is
           followed by a quantifier, it currently triggers a warning message
           under the "use warnings" pragma or -w switch saying it "matches
           null string many times in regex".
2092
2093           On simple groups, such as the pattern "(?> [^()]+ )", a comparable
2094           effect may be achieved by negative lookahead, as in "[^()]+ (?!
2095           [^()] )".  This was only 4 times slower on a string with 1000000
2096           "a"s.
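
           The exercise of checking that the variants agree can be started
           with a sketch like the one below.  The hash %pattern and the
           helper "matching_variants" are hypothetical, and the failing
           input is kept deliberately short so that the plain variant
           finishes quickly.

```perl
use strict;
use warnings;

my %pattern = (
    plain     => qr{ \( (     [^()]+             | \( [^()]* \) )+ \) }x,
    atomic    => qr{ \( ( (?> [^()]+ )           | \( [^()]* \) )+ \) }x,
    lookahead => qr{ \( (     [^()]+ (?! [^()] ) | \( [^()]* \) )+ \) }x,
);

# Hypothetical helper: names of the variants that match $text, sorted.
sub matching_variants {
    my ($text) = @_;
    return grep { $text =~ $pattern{$_} } sort keys %pattern;
}

# Only 12 letters: each extra letter roughly doubles the plain variant's work.
my $no_group = '((()' . 'a' x 12;
for my $text ('(a(b)c)', $no_group) {
    my @names = matching_variants($text);
    printf "%-18s %s\n", $text, @names ? "@names" : 'no variant matches';
}
```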
2097
2098           The "grab all you can, and do not give anything back" semantic is
           desirable in many situations where at first sight a simple
2100           "()*" looks like the correct solution.  Suppose we parse text with
2101           comments being delimited by "#" followed by some optional
2102           (horizontal) whitespace.  Contrary to its appearance, "#[ \t]*" is
2103           not the correct subexpression to match the comment delimiter,
2104           because it may "give up" some whitespace if the remainder of the
2105           pattern can be made to match that way.  The correct answer is
2106           either one of these:
2107
2108               (?>#[ \t]*)
2109               #[ \t]*(?![ \t])
2110
2111           For example, to grab non-empty comments into $1, one should use
2112           either one of these:
2113
2114               / (?> \# [ \t]* ) (        .+ ) /x;
2115               /     \# [ \t]*   ( [^ \t] .* ) /x;
2116
2117           Which one you pick depends on which of these expressions better
2118           reflects the above specification of comments.
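
           The difference shows up on a comment that contains nothing but
           whitespace.  In this sketch the three helper names and the sample
           lines are assumptions; "comment_naive" uses the incorrect
           "#[ \t]*" form for comparison.

```perl
use strict;
use warnings;

# Each hypothetical helper returns the comment text, or undef if none is found.
sub comment_naive {
    my ($line) = @_;
    return $line =~ /     \# [ \t]*   (        .+ ) /x ? $1 : undef;
}
sub comment_atomic {
    my ($line) = @_;
    return $line =~ / (?> \# [ \t]* ) (        .+ ) /x ? $1 : undef;
}
sub comment_class {
    my ($line) = @_;
    return $line =~ /     \# [ \t]*   ( [^ \t] .* ) /x ? $1 : undef;
}

for my $line ('x = 1;  # note', 'x = 1;  #   ') {
    for my $grab (\&comment_naive, \&comment_atomic, \&comment_class) {
        my $got = $grab->($line);
        print defined $got ? "<$got>" : 'no comment', "\n";
    }
}
```

           On the second line the naive form gives one space back to ".+"
           and reports a comment consisting of a single space, whereas the
           other two forms find no comment.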
2119
2120           In some literature this construct is called "atomic matching" or
2121           "possessive matching".
2122
2123           Possessive quantifiers are equivalent to putting the item they are
2124           applied to inside of one of these constructs. The following
2125           equivalences apply:
2126
2127               Quantifier Form     Bracketing Form
2128               ---------------     ---------------
2129               PAT*+               (?>PAT*)
2130               PAT++               (?>PAT+)
2131               PAT?+               (?>PAT?)
2132               PAT{min,max}+       (?>PAT{min,max})
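
           The table can be spot-checked by running both forms of each row
           against the same input.  This is a sketch only: the pattern list,
           the bound "{1,2}" and the helper names are assumptions.

```perl
use strict;
use warnings;

# One row per line of the table: [ quantifier form, bracketing form ].
my @pairs = (
    [ qr/^a*+a/,     qr/^(?>a*)a/     ],
    [ qr/^a++a/,     qr/^(?>a+)a/     ],
    [ qr/^a?+a/,     qr/^(?>a?)a/     ],
    [ qr/^a{1,2}+a/, qr/^(?>a{1,2})a/ ],
);

# Hypothetical helper: 1 if both forms of every row give the same verdict.
sub forms_agree {
    my ($text) = @_;
    for my $pair (@pairs) {
        my ($quantifier_form, $bracket_form) = @$pair;
        my $with_quantifier = $text =~ $quantifier_form ? 1 : 0;
        my $with_bracket    = $text =~ $bracket_form    ? 1 : 0;
        return 0 if $with_quantifier != $with_bracket;
    }
    return 1;
}

# For contrast: the ordinary greedy quantifier gives one "a" back.
sub greedy_plus     { my ($text) = @_; return $text =~ /^a+a/  ? 1 : 0 }
sub possessive_plus { my ($text) = @_; return $text =~ /^a++a/ ? 1 : 0 }

print "forms agree on '$_': ", forms_agree($_), "\n" for 'a', 'aa', 'aaa';
print "a+a on aaa: ", greedy_plus('aaa'), ", a++a on aaa: ", possessive_plus('aaa'), "\n";
```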
2133
2134           Nested "(?>...)" constructs are not no-ops, even if at first glance
2135           they might seem to be.  This is because the nested "(?>...)" can
2136           restrict internal backtracking that otherwise might occur.  For
2137           example,
2138
2139            "abc" =~ /(?>a[bc]*c)/
2140
2141           matches, but
2142
2143            "abc" =~ /(?>a(?>[bc]*)c)/
2144
2145           does not.
2146
2147       "(?[ ])"
2148           See "Extended Bracketed Character Classes" in perlrecharclass.
2149
2150           Note that this feature is currently experimental; using it yields a
2151           warning in the "experimental::regex_sets" category.
2152
2153   Backtracking
2154       NOTE: This section presents an abstract approximation of regular
2155       expression behavior.  For a more rigorous (and complicated) view of the
2156       rules involved in selecting a match among possible alternatives, see
2157       "Combining RE Pieces".
2158
2159       A fundamental feature of regular expression matching involves the
2160       notion called backtracking, which is currently used (when needed) by
       all non-possessive regular expression quantifiers, namely "*", "*?",
2162       "+", "+?", "{n,m}", and "{n,m}?".  Backtracking is often optimized
2163       internally, but the general principle outlined here is valid.
2164
2165       For a regular expression to match, the entire regular expression must
2166       match, not just part of it.  So if the beginning of a pattern
2167       containing a quantifier succeeds in a way that causes later parts in
2168       the pattern to fail, the matching engine backs up and recalculates the
2169       beginning part--that's why it's called backtracking.
2170
2171       Here is an example of backtracking:  Let's say you want to find the
2172       word following "foo" in the string "Food is on the foo table.":
2173
2174           $_ = "Food is on the foo table.";
2175           if ( /\b(foo)\s+(\w+)/i ) {
2176               print "$2 follows $1.\n";
2177           }
2178
2179       When the match runs, the first part of the regular expression
2180       ("\b(foo)") finds a possible match right at the beginning of the
2181       string, and loads up $1 with "Foo".  However, as soon as the matching
2182       engine sees that there's no whitespace following the "Foo" that it had
2183       saved in $1, it realizes its mistake and starts over again one
2184       character after where it had the tentative match.  This time it goes
2185       all the way until the next occurrence of "foo". The complete regular
2186       expression matches this time, and you get the expected output of "table
2187       follows foo."
2188
2189       Sometimes minimal matching can help a lot.  Imagine you'd like to match
2190       everything between "foo" and "bar".  Initially, you write something
2191       like this:
2192
2193           $_ =  "The food is under the bar in the barn.";
2194           if ( /foo(.*)bar/ ) {
2195               print "got <$1>\n";
2196           }
2197
2198       Which perhaps unexpectedly yields:
2199
2200         got <d is under the bar in the >
2201
2202       That's because ".*" was greedy, so you get everything between the first
2203       "foo" and the last "bar".  Here it's more effective to use minimal
2204       matching to make sure you get the text between a "foo" and the first
2205       "bar" thereafter.
2206
2207           if ( /foo(.*?)bar/ ) { print "got <$1>\n" }
2208         got <d is under the >
2209
2210       Here's another example. Let's say you'd like to match a number at the
2211       end of a string, and you also want to keep the preceding part of the
2212       match.  So you write this:
2213
2214           $_ = "I have 2 numbers: 53147";
2215           if ( /(.*)(\d*)/ ) {                                # Wrong!
2216               print "Beginning is <$1>, number is <$2>.\n";
2217           }
2218
2219       That won't work at all, because ".*" was greedy and gobbled up the
2220       whole string. As "\d*" can match on an empty string the complete
2221       regular expression matched successfully.
2222
2223           Beginning is <I have 2 numbers: 53147>, number is <>.
2224
2225       Here are some variants, most of which don't work:
2226
2227           $_ = "I have 2 numbers: 53147";
2228           @pats = qw{
2229               (.*)(\d*)
2230               (.*)(\d+)
2231               (.*?)(\d*)
2232               (.*?)(\d+)
2233               (.*)(\d+)$
2234               (.*?)(\d+)$
2235               (.*)\b(\d+)$
2236               (.*\D)(\d+)$
2237           };
2238
2239           for $pat (@pats) {
2240               printf "%-12s ", $pat;
2241               if ( /$pat/ ) {
2242                   print "<$1> <$2>\n";
2243               } else {
2244                   print "FAIL\n";
2245               }
2246           }
2247
2248       That will print out:
2249
2250           (.*)(\d*)    <I have 2 numbers: 53147> <>
2251           (.*)(\d+)    <I have 2 numbers: 5314> <7>
2252           (.*?)(\d*)   <> <>
2253           (.*?)(\d+)   <I have > <2>
2254           (.*)(\d+)$   <I have 2 numbers: 5314> <7>
2255           (.*?)(\d+)$  <I have 2 numbers: > <53147>
2256           (.*)\b(\d+)$ <I have 2 numbers: > <53147>
2257           (.*\D)(\d+)$ <I have 2 numbers: > <53147>
2258
2259       As you see, this can be a bit tricky.  It's important to realize that a
2260       regular expression is merely a set of assertions that gives a
2261       definition of success.  There may be 0, 1, or several different ways
2262       that the definition might succeed against a particular string.  And if
2263       there are multiple ways it might succeed, you need to understand
2264       backtracking to know which variety of success you will achieve.
2265
2266       When using lookahead assertions and negations, this can all get even
2267       trickier.  Imagine you'd like to find a sequence of non-digits not
2268       followed by "123".  You might try to write that as
2269
2270           $_ = "ABC123";
2271           if ( /^\D*(?!123)/ ) {                # Wrong!
2272               print "Yup, no 123 in $_\n";
2273           }
2274
2275       But that isn't going to match; at least, not the way you're hoping.  It
2276       claims that there is no 123 in the string.  Here's a clearer picture of
2277       why that pattern matches, contrary to popular expectations:
2278
2279           $x = 'ABC123';
2280           $y = 'ABC445';
2281
2282           print "1: got $1\n" if $x =~ /^(ABC)(?!123)/;
2283           print "2: got $1\n" if $y =~ /^(ABC)(?!123)/;
2284
2285           print "3: got $1\n" if $x =~ /^(\D*)(?!123)/;
2286           print "4: got $1\n" if $y =~ /^(\D*)(?!123)/;
2287
2288       This prints
2289
2290           2: got ABC
2291           3: got AB
2292           4: got ABC
2293
       You might have expected test 3 to fail because it seems to be a more
       general-purpose version of test 1.  The important difference between
2296       them is that test 3 contains a quantifier ("\D*") and so can use
2297       backtracking, whereas test 1 will not.  What's happening is that you've
2298       asked "Is it true that at the start of $x, following 0 or more non-
2299       digits, you have something that's not 123?"  If the pattern matcher had
2300       let "\D*" expand to "ABC", this would have caused the whole pattern to
2301       fail.
2302
2303       The search engine will initially match "\D*" with "ABC".  Then it will
2304       try to match "(?!123)" with "123", which fails.  But because a
2305       quantifier ("\D*") has been used in the regular expression, the search
2306       engine can backtrack and retry the match differently in the hope of
2307       matching the complete regular expression.
2308
2309       The pattern really, really wants to succeed, so it uses the standard
2310       pattern back-off-and-retry and lets "\D*" expand to just "AB" this
2311       time.  Now there's indeed something following "AB" that is not "123".
2312       It's "C123", which suffices.
2313
2314       We can deal with this by using both an assertion and a negation.  We'll
2315       say that the first part in $1 must be followed both by a digit and by
2316       something that's not "123".  Remember that the lookaheads are zero-
2317       width expressions--they only look, but don't consume any of the string
2318       in their match.  So rewriting this way produces what you'd expect; that
2319       is, case 5 will fail, but case 6 succeeds:
2320
2321           print "5: got $1\n" if $x =~ /^(\D*)(?=\d)(?!123)/;
2322           print "6: got $1\n" if $y =~ /^(\D*)(?=\d)(?!123)/;
2323
2324           6: got ABC
2325
2326       In other words, the two zero-width assertions next to each other work
2327       as though they're ANDed together, just as you'd use any built-in
2328       assertions:  "/^$/" matches only if you're at the beginning of the line
2329       AND the end of the line simultaneously.  The deeper underlying truth is
2330       that juxtaposition in regular expressions always means AND, except when
2331       you write an explicit OR using the vertical bar.  "/ab/" means match
2332       "a" AND (then) match "b", although the attempted matches are made at
2333       different positions because "a" is not a zero-width assertion, but a
2334       one-width assertion.
2335
2336       WARNING: Particularly complicated regular expressions can take
2337       exponential time to solve because of the immense number of possible
2338       ways they can use backtracking to try for a match.  For example,
2339       without internal optimizations done by the regular expression engine,
2340       this will take a painfully long time to run:
2341
2342           'aaaaaaaaaaaa' =~ /((a{0,5}){0,5})*[c]/
2343
2344       And if you used "*"'s in the internal groups instead of limiting them
2345       to 0 through 5 matches, then it would take forever--or until you ran
2346       out of stack space.  Moreover, these internal optimizations are not
2347       always applicable.  For example, if you put "{0,5}" instead of "*" on
2348       the external group, no current optimization is applicable, and the
2349       match takes a long time to finish.
2350
2351       A powerful tool for optimizing such beasts is what is known as an
2352       "independent group", which does not backtrack (see "(?>pattern)").
2353       Note also that zero-length lookahead/lookbehind assertions will not
2354       backtrack to make the tail match, since they are in "logical" context:
2355       only whether they match is considered relevant.  For an example where
2356       side-effects of lookahead might have influenced the following match,
2357       see "(?>pattern)".
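
       As a sketch of that idea applied to the earlier "ABC123" tests, a
       possessive quantifier (equivalent to wrapping "\D*" in "(?>...)")
       stops "\D*" from backing off to "AB".  The helper name
       "leading_non_digits" is hypothetical.

```perl
use strict;
use warnings;

# Returns the leading non-digits if they are not followed by "123",
# or undef when the pattern cannot match.
sub leading_non_digits {
    my ($text) = @_;
    return $text =~ /^(\D*+)(?!123)/ ? $1 : undef;
}

for my $text ('ABC123', 'ABC445') {
    my $got = leading_non_digits($text);
    print "$text: ", (defined $got ? "got $got" : 'no match'), "\n";
}
```

       Unlike test 3 above, "ABC123" no longer matches, because the
       possessive "\D*+" keeps all of "ABC" and the lookahead then fails
       for good.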
2358
2359   Script Runs
2360       A script run is basically a sequence of characters, all from the same
2361       Unicode script (see "Scripts" in perlunicode), such as Latin or Greek.
2362       In most places a single word would never be written in multiple
       scripts, unless it is a spoofing attack.  An infamous example is
2364
2365        paypal.com
2366
2367       Those letters could all be Latin (as in the example just above), or
2368       they could be all Cyrillic (except for the dot), or they could be a
       mixture of the two.  In the case of an internet address the ".com"
       would be in Latin, and any Cyrillic letters would cause it to be a
       mixture, not a script run.  Someone clicking on such a link would not
       be directed to the real PayPal website, but to a look-alike one that
       an attacker has crafted to gather sensitive information from the
       person.
2375
2376       Starting in Perl 5.28, it is now easy to detect strings that aren't
2377       script runs.  Simply enclose just about any pattern like either of
2378       these:
2379
2380        (*script_run:pattern)
2381        (*sr:pattern)
2382
2383       What happens is that after pattern succeeds in matching, it is
2384       subjected to the additional criterion that every character in it must
2385       be from the same script (see exceptions below).  If this isn't true,
2386       backtracking occurs until something all in the same script is found
       that matches, or all possibilities are exhausted.  This can cause a lot
       of backtracking.  Generally only malicious input will trigger it, but
       the resulting slowdown could amount to a denial of service attack.  If
2390       your needs permit, it is best to make the pattern atomic to cut down on
2391       the amount of backtracking.  This is so likely to be what you want,
2392       that instead of writing this:
2393
2394        (*script_run:(?>pattern))
2395
2396       you can write either of these:
2397
2398        (*atomic_script_run:pattern)
2399        (*asr:pattern)
2400
2401       (See "(?>pattern)".)
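
       A minimal sketch of a spoofing check built on this construct follows.
       The helper names and the sample labels are assumptions.  On Perl 5.28
       and 5.30 the feature is still experimental, so compiling these
       patterns is expected to emit an "experimental::script_run" warning.

```perl
use 5.028;    # (*script_run:...) was added in Perl 5.28
use strict;
use warnings;

# Hypothetical check: is the whole label a single script run?
sub single_script {
    my ($label) = @_;
    return $label =~ /\A(*sr:\w+)\z/ ? 1 : 0;
}

# Without the anchors, backtracking shortens the match until what is left
# is a script run.
sub first_run {
    my ($label) = @_;
    return $label =~ /((*sr:\w+))/ ? $1 : undef;
}

my $latin = 'paypal';
my $mixed = "p\x{430}yp\x{430}l";    # U+0430 is CYRILLIC SMALL LETTER A

print "latin label is a script run: ", single_script($latin), "\n";
print "mixed label is a script run: ", single_script($mixed), "\n";
print "first run in mixed label has ", length(first_run($mixed)), " character(s)\n";
```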
2402
2403       In Taiwan, Japan, and Korea, it is common for text to have a mixture of
2404       characters from their native scripts and base Chinese.  Perl follows
2405       Unicode's UTS 39 (<https://unicode.org/reports/tr39/>) Unicode Security
2406       Mechanisms in allowing such mixtures.  For example, the Japanese
2407       scripts Katakana and Hiragana are commonly mixed together in practice,
2408       along with some Chinese characters, and hence are treated as being in a
2409       single script run by Perl.
2410
2411       The rules used for matching decimal digits are slightly stricter.  Many
2412       scripts have their own sets of digits equivalent to the Western 0
2413       through 9 ones.  A few, such as Arabic, have more than one set.  For a
2414       string to be considered a script run, all digits in it must come from
2415       the same set of ten, as determined by the first digit encountered.  As
2416       an example,
2417
2418        qr/(*script_run: \d+ \b )/x
2419
2420       guarantees that the digits matched will all be from the same set of 10.
2421       You won't get a look-alike digit from a different script that has a
2422       different value than what it appears to be.
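
       The digit rule can be sketched like this.  The helper name
       "same_digit_set" is hypothetical, and U+0661 to U+0663 (Arabic-Indic
       digits one to three) are used as the second digit set.

```perl
use 5.028;    # (*script_run:...) was added in Perl 5.28
use strict;
use warnings;

# Hypothetical check: do all digits in $text come from one set of ten?
sub same_digit_set {
    my ($text) = @_;
    return $text =~ /\A(*sr:\d+)\z/ ? 1 : 0;
}

my %sample = (
    ascii  => '123',
    arabic => "\x{661}\x{662}\x{663}",   # Arabic-Indic 1, 2, 3
    mixed  => "12\x{663}",               # ASCII 1, 2 then Arabic-Indic 3
);

for my $name (sort keys %sample) {
    printf "%-6s %s\n", $name, same_digit_set($sample{$name}) ? 'one set' : 'mixed sets';
}
```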
2423
2424       Unicode has three pseudo scripts that are handled specially.
2425
       "Unknown" is applied to code points whose meaning has yet to be
       determined.  Perl currently will match, as a script run, any
       single-character string consisting of one of these code points.  But
       any string longer than one code point containing one of these will not
       be considered a script run.
2431
2432       "Inherited" is applied to characters that modify another, such as an
2433       accent of some type.  These are considered to be in the script of the
2434       master character, and so never cause a script run to not match.
2435
2436       The other one is "Common".  This consists of mostly punctuation, emoji,
2437       and characters used in mathematics and music, the ASCII digits 0
2438       through 9, and full-width forms of these digits.  These characters can
2439       appear intermixed in text in many of the world's scripts.  These also
2440       don't cause a script run to not match.  But like other scripts, all
2441       digits in a run must come from the same set of 10.
2442
       This construct is non-capturing.  You can add parentheses to the
       pattern to capture, if desired.  You will have to do this if you plan
       to use "(*ACCEPT) (*ACCEPT:arg)" and not have it bypass the script run
       checking.
2447
2448       The "Script_Extensions" property as modified by UTS 39
2449       (<https://unicode.org/reports/tr39/>) is used as the basis for this
2450       feature.
2451
2452       To summarize,
2453
2454       •   All length 0 or length 1 sequences are script runs.
2455
2456       •   A longer sequence is a script run if and only if all of the
2457           following conditions are met:

           1.  No code point in the sequence has the "Script_Extensions"
               property of "Unknown".

               This currently means that all code points in the sequence have
               been assigned by Unicode to be characters that are neither
               private use nor surrogate code points.
2467
2468           2.  All characters in the sequence come from the Common script
2469               and/or the Inherited script and/or a single other script.
2470
2471               The script of a character is determined by the
2472               "Script_Extensions" property as modified by UTS 39
2473               (<https://unicode.org/reports/tr39/>), as described above.
2474
2475           3.  All decimal digits in the sequence come from the same block of
2476               10 consecutive digits.
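
       The summary can be tried out with a short sketch, again assuming Perl
       5.28 or later.  The helper name "is_single_script_word" and the sample
       strings are illustrative only.

```perl
use strict;
use warnings;

# Hypothetical helper: true if all of $str is a run of word characters
# drawn from one script (plus Common and Inherited).
sub is_single_script_word {
    my ($str) = @_;
    return $str =~ /\A(*script_run:\w+)\z/ ? 1 : 0;
}

my @samples = (
    [ latin       => "paypal" ],           # Latin letters only
    [ spoofed     => "p\x{0430}ypal" ],    # U+0430 CYRILLIC SMALL LETTER A
    [ with_digits => "paypal2021" ],       # Latin letters plus Common digits
    [ one_char    => "\x{0430}" ],         # length 1 is always a script run
);

for my $sample (@samples) {
    my ($label, $str) = @$sample;
    printf "%-12s %s\n", $label,
        is_single_script_word($str) ? "script run" : "not a script run";
}
```

       The "spoofed" string mixes Latin and Cyrillic letters, so it fails
       condition 2; the other three strings satisfy all the conditions.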
2477
2478   Special Backtracking Control Verbs
2479       These special patterns are generally of the form "(*VERB:arg)". Unless
2480       otherwise stated the arg argument is optional; in some cases, it is
2481       mandatory.
2482
2483       Any pattern containing a special backtracking verb that allows an
2484       argument has the special behaviour that when executed it sets the
2485       current package's $REGERROR and $REGMARK variables. When doing so the
2486       following rules apply:
2487
2488       On failure, the $REGERROR variable will be set to the arg value of the
2489       verb pattern, if the verb was involved in the failure of the match. If
2490       the arg part of the pattern was omitted, then $REGERROR will be set to
2491       the name of the last "(*MARK:NAME)" pattern executed, or to TRUE if
2492       there was none. Also, the $REGMARK variable will be set to FALSE.
2493
2494       On a successful match, the $REGERROR variable will be set to FALSE, and
2495       the $REGMARK variable will be set to the name of the last
2496       "(*MARK:NAME)" pattern executed.  See the explanation for the
2497       "(*MARK:NAME)" verb below for more details.
2498
2499       NOTE: $REGERROR and $REGMARK are not magic variables like $1 and most
2500       other regex-related variables. They are not local to a scope, nor
2501       readonly, but instead are volatile package variables similar to
2502       $AUTOLOAD.  They are set in the package containing the code that
2503       executed the regex (rather than the one that compiled it, where those
2504       differ).  If necessary, you can use "local" to localize changes to
2505       these variables to a specific scope before executing a regex.
2506
2507       If a pattern does not contain a special backtracking verb that allows
2508       an argument, then $REGERROR and $REGMARK are not touched at all.
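
       A minimal sketch of these rules follows.  It assumes the code runs
       under "use strict", which is why the two package variables are
       declared with "our"; the verb name "foo" is arbitrary.

```perl
use strict;
use warnings;

# The verbs set these package variables, so declare them for "use strict".
our ($REGERROR, $REGMARK);

# Failure in which the named verb took part: (*FAIL) forces the engine
# to backtrack into (*PRUNE:foo), so the match as a whole fails.
my $failed              = 'aaaab' !~ /a+b(*PRUNE:foo)(*FAIL)/;
my $error_after_failure = $REGERROR;

# Success: $REGERROR is reset to a false value.
my $matched             = 'aaaab' =~ /a+b(*PRUNE:foo)/;
my $error_after_success = $REGERROR;

print "after failure: \$REGERROR is '$error_after_failure'\n";
print "after success: \$REGERROR is ",
    ( $error_after_success ? "true" : "false" ), "\n";
```

       Code that must not disturb a caller's view of these variables can
       wrap the match in a block that first does "local ($REGERROR,
       $REGMARK);".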
2509
2510       Verbs
2511          "(*PRUNE)" "(*PRUNE:NAME)"
2512              This zero-width pattern prunes the backtracking tree at the
2513              current point when backtracked into on failure. Consider the
2514              pattern "/A (*PRUNE) B/", where A and B are complex patterns.
2515              Until the "(*PRUNE)" verb is reached, A may backtrack as
2516              necessary to match. Once it is reached, matching continues in B,
2517              which may also backtrack as necessary; however, should B not
2518              match, then no further backtracking will take place, and the
2519              pattern will fail outright at the current starting position.
2520
2521              The following example counts all the possible matching strings
2522              in a pattern (without actually matching any of them).
2523
2524                  'aaab' =~ /a+b?(?{print "$&\n"; $count++})(*FAIL)/;
2525                  print "Count=$count\n";
2526
2527              which produces:
2528
2529                  aaab
2530                  aaa
2531                  aa
2532                  a
2533                  aab
2534                  aa
2535                  a
2536                  ab
2537                  a
2538                  Count=9
2539
2540              If we add a "(*PRUNE)" before the count like the following
2541
2542                  'aaab' =~ /a+b?(*PRUNE)(?{print "$&\n"; $count++})(*FAIL)/;
2543                  print "Count=$count\n";
2544
2545              we prevent backtracking and find the count of the longest
2546              matching string at each matching starting point like so:
2547
2548                  aaab
2549                  aab
2550                  ab
2551                  Count=3
2552
2553              Any number of "(*PRUNE)" assertions may be used in a pattern.
2554
2555              See also "(?>pattern)" and possessive quantifiers for other ways
2556              to control backtracking. In some cases, the use of "(*PRUNE)"
2557              can be replaced with a "(?>pattern)" with no functional
2558              difference; however, "(*PRUNE)" can be used to handle cases that
2559              cannot be expressed using a "(?>pattern)" alone.
2560
2561          "(*SKIP)" "(*SKIP:NAME)"
              This zero-width pattern is similar to "(*PRUNE)", except that on
              failure it also signifies that whatever text was matched leading
              up to the "(*SKIP)" pattern being executed cannot be part of any
              match of this pattern. This effectively means that the regex
              engine "skips" forward to this position on failure and tries to
              match again (assuming that there is sufficient room to match).
2569
2570              The name of the "(*SKIP:NAME)" pattern has special significance.
2571              If a "(*MARK:NAME)" was encountered while matching, then it is
2572              that position which is used as the "skip point". If no "(*MARK)"
2573              of that name was encountered, then the "(*SKIP)" operator has no
2574              effect. When used without a name the "skip point" is where the
2575              match point was when executing the "(*SKIP)" pattern.
2576
2577              Compare the following to the examples in "(*PRUNE)"; note the
2578              string is twice as long:
2579
2580               'aaabaaab' =~ /a+b?(*SKIP)(?{print "$&\n"; $count++})(*FAIL)/;
2581               print "Count=$count\n";
2582
2583              outputs
2584
2585                  aaab
2586                  aaab
2587                  Count=2
2588
2589              Once the 'aaab' at the start of the string has matched, and the
2590              "(*SKIP)" executed, the next starting point will be where the
2591              cursor was when the "(*SKIP)" was executed.
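
              The effect of the name can be sketched by collecting the
              candidate matches instead of printing them.  The mark names
              "m" and "no_such_mark" and the array names are illustrative
              only.

```perl
use strict;
use warnings;

my (@unnamed, @named, @missing);

# Unnamed: the skip point is where (*SKIP) itself was executed.
'aaab' =~ /a+b?(*SKIP)(?{ push @unnamed, $& })(*FAIL)/;

# Named: the skip point is where (*MARK:m) was executed, that is,
# just after the first "a" of each attempt.
'aaab' =~ /a(*MARK:m)a*b?(*SKIP:m)(?{ push @named, $& })(*FAIL)/;

# Named, but no mark of that name was ever executed: the verb has no
# effect, so the engine backtracks as if it were not there.
'aaab' =~ /a+b?(*SKIP:no_such_mark)(?{ push @missing, $& })(*FAIL)/;

print "unnamed: @unnamed\n";
print "named:   @named\n";
print "missing: ", scalar(@missing), " candidates\n";
```

              With the unnamed verb the next attempt starts after the
              whole of "aaab"; with the named verb it starts one
              character further along each time.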
2592
2593          "(*MARK:NAME)" "(*:NAME)"
2594              This zero-width pattern can be used to mark the point reached in
2595              a string when a certain part of the pattern has been
2596              successfully matched. This mark may be given a name. A later
2597              "(*SKIP)" pattern will then skip forward to that point if
2598              backtracked into on failure. Any number of "(*MARK)" patterns
2599              are allowed, and the NAME portion may be duplicated.
2600
2601              In addition to interacting with the "(*SKIP)" pattern,
2602              "(*MARK:NAME)" can be used to "label" a pattern branch, so that
2603              after matching, the program can determine which branches of the
2604              pattern were involved in the match.
2605
2606              When a match is successful, the $REGMARK variable will be set to
2607              the name of the most recently executed "(*MARK:NAME)" that was
2608              involved in the match.
2609
2610              This can be used to determine which branch of a pattern was
2611              matched without using a separate capture group for each branch,
2612              which in turn can result in a performance improvement, as perl
2613              cannot optimize "/(?:(x)|(y)|(z))/" as efficiently as something
2614              like "/(?:x(*MARK:x)|y(*MARK:y)|z(*MARK:z))/".
2615
2616              When a match has failed, and unless another verb has been
2617              involved in failing the match and has provided its own name to
2618              use, the $REGERROR variable will be set to the name of the most
2619              recently executed "(*MARK:NAME)".
2620
2621              See "(*SKIP)" for more details.
2622
2623              As a shortcut "(*MARK:NAME)" can be written "(*:NAME)".
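
              As a sketch of branch labelling, the hypothetical helper
              "token_kind" below reports which alternative matched by
              reading $REGMARK.  The mark names are arbitrary.

```perl
use strict;
use warnings;

# $REGMARK is a package variable, so declare it for "use strict".
our $REGMARK;

# Hypothetical helper: classify a token by the branch that matched.
sub token_kind {
    my ($token) = @_;
    return
        unless $token =~ /\A(?:\d+(*MARK:number)
                              |[a-z]+(*MARK:word)
                              |\s+(*MARK:space))\z/x;
    return $REGMARK;
}

for my $token ( '42', 'hello', '  ', '?' ) {
    my $kind = token_kind($token);
    printf "'%s' => %s\n", $token, defined $kind ? $kind : 'no match';
}
```

              No capture groups are needed; the name of the last mark
              executed identifies the branch.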
2624
2625          "(*THEN)" "(*THEN:NAME)"
2626              This is similar to the "cut group" operator "::" from Raku.
2627              Like "(*PRUNE)", this verb always matches, and when backtracked
2628              into on failure, it causes the regex engine to try the next
2629              alternation in the innermost enclosing group (capturing or
2630              otherwise) that has alternations.  The two branches of a
2631              "(?(condition)yes-pattern|no-pattern)" do not count as an
2632              alternation, as far as "(*THEN)" is concerned.
2633
2634              Its name comes from the observation that this operation combined
2635              with the alternation operator ("|") can be used to create what
2636              is essentially a pattern-based if/then/else block:
2637
2638                ( COND (*THEN) FOO | COND2 (*THEN) BAR | COND3 (*THEN) BAZ )
2639
2640              Note that if this operator is used and NOT inside of an
2641              alternation then it acts exactly like the "(*PRUNE)" operator.
2642
2643                / A (*PRUNE) B /
2644
2645              is the same as
2646
2647                / A (*THEN) B /
2648
2649              but
2650
2651                / ( A (*THEN) B | C ) /
2652
2653              is not the same as
2654
2655                / ( A (*PRUNE) B | C ) /
2656
2657              as after matching the A but failing on the B the "(*THEN)" verb
2658              will backtrack and try C; but the "(*PRUNE)" verb will simply
2659              fail.
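
              The difference can be seen in a small sketch, in which "a",
              "b" and "ac" stand in for A, B and C, and the pattern is
              anchored so that only one starting position is possible:

```perl
use strict;
use warnings;

# After "a" matches and "b" fails, (*THEN) moves on to the next
# alternative, while (*PRUNE) fails the whole attempt.
my $with_then  = 'ac' =~ /\A(?:a(*THEN)b|ac)\z/  ? 1 : 0;
my $with_prune = 'ac' =~ /\A(?:a(*PRUNE)b|ac)\z/ ? 1 : 0;

print "(*THEN):  ", ( $with_then  ? "match" : "no match" ), "\n";
print "(*PRUNE): ", ( $with_prune ? "match" : "no match" ), "\n";
```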
2660
2661          "(*COMMIT)" "(*COMMIT:arg)"
2662              This is the Raku "commit pattern" "<commit>" or ":::". It's a
2663              zero-width pattern similar to "(*SKIP)", except that when
2664              backtracked into on failure it causes the match to fail
2665              outright. No further attempts to find a valid match by advancing
2666              the start pointer will occur again.  For example,
2667
2668               'aaabaaab' =~ /a+b?(*COMMIT)(?{print "$&\n"; $count++})(*FAIL)/;
2669               print "Count=$count\n";
2670
2671              outputs
2672
2673                  aaab
2674                  Count=1
2675
2676              In other words, once the "(*COMMIT)" has been entered, and if
2677              the pattern does not match, the regex engine will not try any
2678              further matching on the rest of the string.
2679
2680          "(*FAIL)" "(*F)" "(*FAIL:arg)"
2681              This pattern matches nothing and always fails. It can be used to
2682              force the engine to backtrack. It is equivalent to "(?!)", but
2683              easier to read. In fact, "(?!)" gets optimised into "(*FAIL)"
2684              internally. You can provide an argument so that if the match
2685              fails because of this "FAIL" directive the argument can be
2686              obtained from $REGERROR.
2687
2688              It is probably useful only when combined with "(?{})" or
2689              "(??{})".
2690
2691          "(*ACCEPT)" "(*ACCEPT:arg)"
2692              This pattern matches nothing and causes the end of successful
2693              matching at the point at which the "(*ACCEPT)" pattern was
2694              encountered, regardless of whether there is actually more to
2695              match in the string. When inside of a nested pattern, such as
2696              recursion, or in a subpattern dynamically generated via
2697              "(??{})", only the innermost pattern is ended immediately.
2698
2699              If the "(*ACCEPT)" is inside of capturing groups then the groups
2700              are marked as ended at the point at which the "(*ACCEPT)" was
2701              encountered.  For instance:
2702
2703                'AB' =~ /(A (A|B(*ACCEPT)|C) D)(E)/x;
2704
              will match; $1 will be "AB", $2 will be "B", and $3 will not be
              set. If another branch in the inner parentheses was matched,
              such as in the string 'ACDE', then the "D" and "E" would have to
              be matched as well.

              You can provide an argument, which will be available in the
              variable $REGMARK after the match completes.
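
              The following sketch records the captures for both strings.
              The argument "early" is an arbitrary name chosen for
              illustration.

```perl
use strict;
use warnings;

# $REGMARK is a package variable, so declare it for "use strict".
our $REGMARK;

# Matching stops as soon as (*ACCEPT:early) is reached.
my $ok    = 'AB' =~ /(A (A|B(*ACCEPT:early)|C) D)(E)/x;
my @early = ( $1, $2, $3 );
my $mark  = $REGMARK;

# A different inner branch: "D" and "E" must now match as well.
'ACDE' =~ /(A (A|B(*ACCEPT:early)|C) D)(E)/x;
my @full = ( $1, $2, $3 );

print "early: ", join( ", ", map { defined $_ ? $_ : 'undef' } @early ), "\n";
print "mark:  $mark\n";
print "full:  ", join( ", ", @full ), "\n";
```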
2712
2713   Warning on "\1" Instead of $1
2714       Some people get too used to writing things like:
2715
2716           $pattern =~ s/(\W)/\\\1/g;
2717
2718       This is grandfathered (for \1 to \9) for the RHS of a substitute to
2719       avoid shocking the sed addicts, but it's a dirty habit to get into.
2720       That's because in PerlThink, the righthand side of an "s///" is a
2721       double-quoted string.  "\1" in the usual double-quoted string means a
2722       control-A.  The customary Unix meaning of "\1" is kludged in for
2723       "s///".  However, if you get into the habit of doing that, you get
2724       yourself into trouble if you then add an "/e" modifier.
2725
2726           s/(\d+)/ \1 + 1 /eg;            # causes warning under -w
2727
2728       Or if you try to do
2729
2730           s/(\d+)/\1000/;
2731
2732       You can't disambiguate that by saying "\{1}000", whereas you can fix it
2733       with "${1}000".  The operation of interpolation should not be confused
2734       with the operation of matching a backreference.  Certainly they mean
2735       two different things on the left side of the "s///".
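
       A short sketch of the recommended spelling, using an illustrative
       input string:

```perl
use strict;
use warnings;

my $text = 'item 7';

# ${1} keeps the capture variable separate from the literal "000".
( my $padded = $text ) =~ s/(\d+)/${1}000/;

# With /e the replacement is Perl code, so $1 is the spelling to use.
( my $bumped = $text ) =~ s/(\d+)/$1 + 1/e;

print "$padded\n";    # item 7000
print "$bumped\n";    # item 8
```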
2736
2737   Repeated Patterns Matching a Zero-length Substring
2738       WARNING: Difficult material (and prose) ahead.  This section needs a
2739       rewrite.
2740
2741       Regular expressions provide a terse and powerful programming language.
2742       As with most other power tools, power comes together with the ability
2743       to wreak havoc.
2744
2745       A common abuse of this power stems from the ability to make infinite
2746       loops using regular expressions, with something as innocuous as:
2747
2748           'foo' =~ m{ ( o? )* }x;
2749
       The "o?" matches at the beginning of "foo", and since the position in
       the string is not moved by the match, "o?" would match again and again
       because of the "*" quantifier.  Another common way to create a similar
       cycle is with the looping modifier "/g":
2754
2755           @matches = ( 'foo' =~ m{ o? }xg );
2756
2757       or
2758
2759           print "match: <$&>\n" while 'foo' =~ m{ o? }xg;
2760
2761       or the loop implied by "split()".
2762
       However, long experience has shown that many programming tasks may be
       significantly simplified by using repeated subexpressions that may
       match zero-length substrings.  Here's a simple example:
2766
2767           @chars = split //, $string;           # // is not magic in split
2768           ($whitewashed = $string) =~ s/()/ /g; # parens avoid magic s// /
2769
2770       Thus Perl allows such constructs, by forcefully breaking the infinite
2771       loop.  The rules for this are different for lower-level loops given by
2772       the greedy quantifiers "*+{}", and for higher-level ones like the "/g"
2773       modifier or "split()" operator.
2774
2775       The lower-level loops are interrupted (that is, the loop is broken)
2776       when Perl detects that a repeated expression matched a zero-length
2777       substring.   Thus
2778
2779          m{ (?: NON_ZERO_LENGTH | ZERO_LENGTH )* }x;
2780
2781       is made equivalent to
2782
2783          m{ (?: NON_ZERO_LENGTH )* (?: ZERO_LENGTH )? }x;
2784
2785       For example, this program
2786
2787          #!perl -l
2788          "aaaaab" =~ /
2789            (?:
2790               a                 # non-zero
2791               |                 # or
2792              (?{print "hello"}) # print hello whenever this
2793                                 #    branch is tried
2794              (?=(b))            # zero-width assertion
2795            )*  # any number of times
2796           /x;
2797          print $&;
2798          print $1;
2799
2800       prints
2801
2802          hello
2803          aaaaa
2804          b
2805
2806       Notice that "hello" is only printed once, as when Perl sees that the
2807       sixth iteration of the outermost "(?:)*" matches a zero-length string,
2808       it stops the "*".
2809
2810       The higher-level loops preserve an additional state between iterations:
2811       whether the last match was zero-length.  To break the loop, the
2812       following match after a zero-length match is prohibited to have a
2813       length of zero.  This prohibition interacts with backtracking (see
2814       "Backtracking"), and so the second best match is chosen if the best
2815       match is of zero length.
2816
2817       For example:
2818
2819           $_ = 'bar';
2820           s/\w??/<$&>/g;
2821
2822       results in "<><b><><a><><r><>".  At each position of the string the
2823       best match given by non-greedy "??" is the zero-length match, and the
2824       second best match is what is matched by "\w".  Thus zero-length matches
2825       alternate with one-character-long matches.
2826
2827       Similarly, for repeated "m/()/g" the second-best match is the match at
2828       the position one notch further in the string.
2829
2830       The additional state of being matched with zero-length is associated
2831       with the matched string, and is reset by each assignment to "pos()".
2832       Zero-length matches at the end of the previous match are ignored during
2833       "split".
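
       The behaviour of the higher-level loops can be observed with a short
       sketch.  The variable names are illustrative only.

```perl
use strict;
use warnings;

# List-context /g: zero-length and one-character matches of "o?".
my @matches = ( 'foo' =~ m{ o? }xg );

# Scalar-context /g with an always-empty pattern: record where each
# match ends.  Each match lies one notch further along the string.
my $string = 'ab';
my @positions;
while ( $string =~ m/()/g ) {
    push @positions, pos($string);
}

# The substitution from the example above.
( my $bracketed = 'bar' ) =~ s/\w??/<$&>/g;

print "matches:   ", join( ",", map { "<$_>" } @matches ), "\n";
print "positions: @positions\n";
print "bracketed: $bracketed\n";
```

       None of these loops runs forever, even though every pattern used can
       match the empty string.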
2834
2835   Combining RE Pieces
2836       Each of the elementary pieces of regular expressions which were
2837       described before (such as "ab" or "\Z") could match at most one
2838       substring at the given position of the input string.  However, in a
2839       typical regular expression these elementary pieces are combined into
2840       more complicated patterns using combining operators "ST", "S|T", "S*"
2841       etc.  (in these examples "S" and "T" are regular subexpressions).
2842
2843       Such combinations can include alternatives, leading to a problem of
2844       choice: if we match a regular expression "a|ab" against "abc", will it
2845       match substring "a" or "ab"?  One way to describe which substring is
2846       actually matched is the concept of backtracking (see "Backtracking").
2847       However, this description is too low-level and makes you think in terms
2848       of a particular implementation.
2849
2850       Another description starts with notions of "better"/"worse".  All the
2851       substrings which may be matched by the given regular expression can be
2852       sorted from the "best" match to the "worst" match, and it is the "best"
2853       match which is chosen.  This substitutes the question of "what is
2854       chosen?"  by the question of "which matches are better, and which are
2855       worse?".
2856
2857       Again, for elementary pieces there is no such question, since at most
2858       one match at a given position is possible.  This section describes the
2859       notion of better/worse for combining operators.  In the description
2860       below "S" and "T" are regular subexpressions.
2861
2862       "ST"
           Consider two possible matches, "AB" and "A'B'", where "A" and "A'"
           are substrings which can be matched by "S", and "B" and "B'" are
           substrings which can be matched by "T".
2866
2867           If "A" is a better match for "S" than "A'", "AB" is a better match
2868           than "A'B'".
2869
2870           If "A" and "A'" coincide: "AB" is a better match than "AB'" if "B"
2871           is a better match for "T" than "B'".
2872
2873       "S|T"
2874           When "S" can match, it is a better match than when only "T" can
2875           match.
2876
           The ordering of two matches that both come from "S" is the same as
           it is for "S" alone.  Likewise for two matches that both come from
           "T".
2879
2880       "S{REPEAT_COUNT}"
2881           Matches as "SSS...S" (repeated as many times as necessary).
2882
2883       "S{min,max}"
2884           Matches as "S{max}|S{max-1}|...|S{min+1}|S{min}".
2885
2886       "S{min,max}?"
2887           Matches as "S{min}|S{min+1}|...|S{max-1}|S{max}".
2888
2889       "S?", "S*", "S+"
2890           Same as "S{0,1}", "S{0,BIG_NUMBER}", "S{1,BIG_NUMBER}"
2891           respectively.
2892
2893       "S??", "S*?", "S+?"
2894           Same as "S{0,1}?", "S{0,BIG_NUMBER}?", "S{1,BIG_NUMBER}?"
2895           respectively.
2896
2897       "(?>S)"
2898           Matches the best match for "S" and only that.
2899
2900       "(?=S)", "(?<=S)"
2901           Only the best match for "S" is considered.  (This is important only
2902           if "S" has capturing parentheses, and backreferences are used
2903           somewhere else in the whole regular expression.)
2904
2905       "(?!S)", "(?<!S)"
2906           For this grouping operator there is no need to describe the
2907           ordering, since only whether or not "S" can match is important.
2908
2909       "(??{ EXPR })", "(?PARNO)"
2910           The ordering is the same as for the regular expression which is the
2911           result of EXPR, or the pattern contained by capture group PARNO.
2912
2913       "(?(condition)yes-pattern|no-pattern)"
2914           Recall that which of yes-pattern or no-pattern actually matches is
2915           already determined.  The ordering of the matches is the same as for
2916           the chosen subexpression.
2917
2918       The above recipes describe the ordering of matches at a given position.
2919       One more rule is needed to understand how a match is determined for the
2920       whole regular expression: a match at an earlier position is always
2921       better than a match at a later position.
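
       A few of these rules can be checked with a short sketch.  The helper
       "matched" is hypothetical; it simply returns the text that a pattern
       matches in a target string.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text matched by $re in $target,
# or undef if there is no match.
sub matched {
    my ( $target, $re ) = @_;
    return $target =~ $re ? $& : undef;
}

my @cases = (
    [ 'S|T: first alternative is better' => matched( 'abc', qr/a|ab/ ) ],
    [ 'S|T: order of alternatives swapped' => matched( 'abc', qr/ab|a/ ) ],
    [ 'S{min,max}: longest first'        => matched( 'aaa', qr/a{1,3}/ ) ],
    [ 'S{min,max}?: shortest first'      => matched( 'aaa', qr/a{1,3}?/ ) ],
    [ 'earlier position is better'       => matched( 'xab', qr/b|ab/ ) ],
);
printf "%-36s %s\n", @$_ for @cases;

# ST: the better match for S wins, even though T then has to settle
# for its second alternative.
my ( $s_part, $t_part ) = 'abcd' =~ /(a|ab)(c|bcd)/;
print "ST: S matched '$s_part', T matched '$t_part'\n";
```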
2922
2923   Creating Custom RE Engines
2924       As of Perl 5.10.0, one can create custom regular expression engines.
2925       This is not for the faint of heart, as they have to plug in at the C
2926       level.  See perlreapi for more details.
2927
2928       As an alternative, overloaded constants (see overload) provide a simple
2929       way to extend the functionality of the RE engine, by substituting one
2930       pattern for another.
2931
2932       Suppose that we want to enable a new RE escape-sequence "\Y|" which
2933       matches at a boundary between whitespace characters and non-whitespace
2934       characters.  Note that "(?=\S)(?<!\S)|(?!\S)(?<=\S)" matches exactly at
2935       these positions, so we want to have each "\Y|" in the place of the more
2936       complicated version.  We can create a module "customre" to do this:
2937
2938           package customre;
2939           use overload;
2940
2941           sub import {
2942             shift;
2943             die "No argument to customre::import allowed" if @_;
2944             overload::constant 'qr' => \&convert;
2945           }
2946
2947           sub invalid { die "/$_[0]/: invalid escape '\\$_[1]'"}
2948
2949           # We must also take care of not escaping the legitimate \\Y|
2950           # sequence, hence the presence of '\\' in the conversion rules.
2951           my %rules = ( '\\' => '\\\\',
2952                         'Y|' => qr/(?=\S)(?<!\S)|(?!\S)(?<=\S)/ );
2953           sub convert {
2954             my $re = shift;
2955             $re =~ s{
2956                       \\ ( \\ | Y . )
2957                     }
2958                     { $rules{$1} or invalid($re,$1) }sgex;
2959             return $re;
2960           }
2961
2962       Now "use customre" enables the new escape in constant regular
2963       expressions, i.e., those without any runtime variable interpolations.
2964       As documented in overload, this conversion will work only over literal
2965       parts of regular expressions.  For "\Y|$re\Y|" the variable part of
2966       this regular expression needs to be converted explicitly (but only if
2967       the special meaning of "\Y|" should be enabled inside $re):
2968
2969           use customre;
2970           $re = <>;
2971           chomp $re;
2972           $re = customre::convert $re;
2973           /\Y|$re\Y|/;
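
       The conversion step can also be tried on its own, without the
       "overload" machinery.  The sketch below is a stand-alone stand-in for
       "customre::convert"; the name "convert_escapes" is hypothetical, and
       the rules are copied from the module above.

```perl
use strict;
use warnings;

my %rules = (
    '\\' => '\\\\',
    'Y|' => qr/(?=\S)(?<!\S)|(?!\S)(?<=\S)/,
);

# Rewrite each "\Y|" in a pattern string and reject any other "\Y"
# escape.  A doubled backslash is passed through unchanged.
sub convert_escapes {
    my $re = shift;
    $re =~ s{ \\ ( \\ | Y . ) }
            { $rules{$1} or die "/$re/: invalid escape '\\$1'\n" }sgex;
    return $re;
}

my $pattern = convert_escapes('\Y|bar');

for my $target ( 'foo bar', 'foobar' ) {
    printf "%-8s %s\n", $target,
        $target =~ /$pattern/ ? "matches at offset $-[0]" : "no match";
}
```

       Only "foo bar" has a whitespace boundary immediately before "bar", so
       only that string should match.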
2974
2975   Embedded Code Execution Frequency
       The exact rules for how often "(??{})" and "(?{})" are executed in a
       pattern are unspecified.  In the case of a successful match you can
       assume that they DWIM and will be executed in left to right order the
       appropriate number of times in the accepting path of the pattern, as
       would any other meta-pattern.  How non-accepting pathways and match
       failures affect the number of times a pattern is executed is
       deliberately left unspecified: it may vary depending on what
       optimizations can be applied to the pattern, and is likely to change
       from version to version.
2985
2986       For instance in
2987
2988         "aaabcdeeeee"=~/a(?{print "a"})b(?{print "b"})cde/;
2989
       the exact number of times "a" or "b" are printed out is unspecified for
       failure, but you may assume they will be printed at least once during a
       successful match.  Additionally, you may assume that if "b" is printed,
       it will be preceded by at least one "a".
2994
2995       In the case of branching constructs like the following:
2996
2997         /a(b|(?{ print "a" }))c(?{ print "c" })/;
2998
2999       you can assume that the input "ac" will output "ac", and that "abc"
3000       will output only "c".
3001
3002       When embedded code is quantified, successful matches will call the code
3003       once for each matched iteration of the quantifier.  For example:
3004
3005         "good" =~ /g(?:o(?{print "o"}))*d/;
3006
3007       will output "o" twice.
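
       The guarantees for successful matches can be checked with a sketch
       that records the calls instead of printing them.  The variable names
       are illustrative only.

```perl
use strict;
use warnings;

# Quantified code block: one call per matched iteration.
my $calls = 0;
'good' =~ /g(?:o(?{ $calls++ }))*d/;

# Branching: the code block in the second alternative runs only when
# that alternative is on the accepting path.
my ( $trace_ac, $trace_abc ) = ( '', '' );
'ac'  =~ /a(b|(?{ $trace_ac  .= 'a' }))c(?{ $trace_ac  .= 'c' })/;
'abc' =~ /a(b|(?{ $trace_abc .= 'a' }))c(?{ $trace_abc .= 'c' })/;

print "calls for 'good': $calls\n";
print "trace for 'ac':   $trace_ac\n";
print "trace for 'abc':  $trace_abc\n";
```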
3008
3009   PCRE/Python Support
3010       As of Perl 5.10.0, Perl supports several Python/PCRE-specific
3011       extensions to the regex syntax. While Perl programmers are encouraged
3012       to use the Perl-specific syntax, the following are also accepted:
3013
3014       "(?P<NAME>pattern)"
3015           Define a named capture group. Equivalent to "(?<NAME>pattern)".
3016
3017       "(?P=NAME)"
3018           Backreference to a named capture group. Equivalent to "\g{NAME}".
3019
3020       "(?P>NAME)"
3021           Subroutine call to a named capture group. Equivalent to "(?&NAME)".
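
       A short sketch exercising all three forms follows.  The group names
       and sample strings are illustrative only.

```perl
use strict;
use warnings;

# (?P<NAME>...) defines named groups; %+ holds the captures.
my %date;
if ( '2021-10-18' =~ /(?P<year>\d{4})-(?P<month>\d\d)-(?P<day>\d\d)/ ) {
    %date = %+;
}

# (?P=NAME) is a backreference to the text captured by the named group.
my $doubled = 'abab' =~ /\A(?P<pair>ab)(?P=pair)\z/ ? 1 : 0;

# (?P>NAME) calls the named group as a subroutine, here recursively,
# to match balanced parentheses.
my $balanced_re = qr/\A(?P<paren>\((?:[^()]++|(?P>paren))*\))\z/;
my $balanced    = '(a(b)c)' =~ $balanced_re ? 1 : 0;
my $unbalanced  = '(a(b)c'  =~ $balanced_re ? 1 : 0;

print "year=$date{year} month=$date{month} day=$date{day}\n";
print "doubled=$doubled balanced=$balanced unbalanced=$unbalanced\n";
```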
3022

BUGS

3024       There are a number of issues with regard to case-insensitive matching
3025       in Unicode rules.  See "i" under "Modifiers" above.
3026
3027       This document varies from difficult to understand to completely and
3028       utterly opaque.  The wandering prose riddled with jargon is hard to
3029       fathom in several places.
3030
3031       This document needs a rewrite that separates the tutorial content from
3032       the reference content.
3033

SEE ALSO

3035       The syntax of patterns used in Perl pattern matching evolved from those
3036       supplied in the Bell Labs Research Unix 8th Edition (Version 8) regex
3037       routines.  (The code is actually derived (distantly) from Henry
3038       Spencer's freely redistributable reimplementation of those V8
3039       routines.)
3040
3041       perlrequick.
3042
3043       perlretut.
3044
3045       "Regexp Quote-Like Operators" in perlop.
3046
3047       "Gory details of parsing quoted constructs" in perlop.
3048
3049       perlfaq6.
3050
3051       "pos" in perlfunc.
3052
3053       perllocale.
3054
3055       perlebcdic.
3056
3057       Mastering Regular Expressions by Jeffrey Friedl, published by O'Reilly
3058       and Associates.
3059
3060
3061
3062perl v5.34.0                      2021-10-18                         PERLRE(1)