PERLREREF(1)           Perl Programmers Reference Guide           PERLREREF(1)

NAME

       perlreref - Perl Regular Expressions Reference

DESCRIPTION

       This is a quick reference to Perl's regular expressions.  For full
       information see perlre and perlop, as well as the "SEE ALSO" section in
       this document.

   OPERATORS
       "=~" determines to which variable the regex is applied.  In its
       absence, $_ is used.

           $var =~ /foo/;

       "!~" determines to which variable the regex is applied, and negates the
       result of the match; it returns false if the match succeeds, and true
       if it fails.

           $var !~ /foo/;

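       A minimal sketch of the two binding operators and the $_ default. The
       sample strings and variable names are made up for illustration.

```perl
use strict;
use warnings;

my $var = "food";
my $has_foo = ($var =~ /foo/) ? 1 : 0;   # =~ applies the match to $var
my $no_bar  = ($var !~ /bar/) ? 1 : 0;   # !~ is true when the match fails

# With no binding operator, the match is applied to $_ (set here by for).
my @with_o;
for (qw(foo bar boo)) {
    push @with_o, $_ if /o/;
}
print "$has_foo $no_bar @with_o\n";      # prints "1 1 foo boo"
```
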
       "m/pattern/msixpogcdual" searches a string for a pattern match,
       applying the given options.

           m  Multiline mode - ^ and $ match internal lines
           s  match as a Single line - . matches \n
           i  case-Insensitive
           x  eXtended legibility - free whitespace and comments
           p  Preserve a copy of the matched string -
              ${^PREMATCH}, ${^MATCH}, ${^POSTMATCH} will be defined.
           o  compile pattern Once
           g  Global - all occurrences
           c  don't reset pos on failed matches when using /g
           a  restrict \d, \s, \w and [:posix:] to match ASCII only
           aa (two a's) as /a, and /i also never matches an ASCII
              character against a non-ASCII one
           l  match according to current locale
           u  match according to Unicode rules
           d  match according to native rules unless something indicates
              Unicode

       If 'pattern' is an empty string, the last successfully matched regex is
       used. Delimiters other than '/' may be used for both this operator and
       the following ones. The leading "m" can be omitted if the delimiter is
       '/'.

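       The following sketch uses an invented three-line string to show how
       /i, /g, /m, /s and /x change what m// matches. It needs nothing beyond
       core Perl.

```perl
use strict;
use warnings;

my $text = "Apple pie\napple tart\nCherry pie";

# /i: case-insensitive; /g in list context returns every capture.
my @apples = $text =~ /(apple)/gi;              # ("Apple", "apple")

# /m: ^ also matches right after each internal newline.
my @first_words = $text =~ /^(\w+)/mg;          # ("Apple", "apple", "Cherry")

# /s: '.' also matches "\n"; without /s this pattern cannot cross lines.
my ($span) = $text =~ /pie(.*)tart/s;           # "\napple "
my $no_s   = $text =~ /pie.*tart/ ? 1 : 0;      # 0

# /x: whitespace inside the pattern is ignored, so it can be spaced out.
my ($dessert) = $text =~ / (\w+) \s+ pie $ /x;  # "Cherry"

print "@apples | @first_words | $dessert\n";
# prints "Apple apple | Apple apple Cherry | Cherry"
```
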
       "qr/pattern/msixpodual" lets you store a regex in a variable, or pass
       one around. Modifiers are as for "m//" (except that "g" and "c" are
       not allowed), and are stored within the regex.

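       As an illustration (the patterns and inputs are hypothetical), a qr//
       object keeps its /i flag. It can be interpolated into a larger pattern
       or bound directly with =~:

```perl
use strict;
use warnings;

# /i is stored inside the compiled regex object.
my $word  = qr/(\w+)ing\b/i;
my @stems = "Walking and TALKING" =~ /$word/g;   # ("Walk", "TALK")

# qr objects can be embedded in other patterns.
my $num  = qr/\d+/;
my $pair = qr/^($num),($num)$/;
my @xy   = "12,34" =~ $pair;                     # ("12", "34")

print "@stems @xy\n";                            # prints "Walk TALK 12 34"
```
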
       "s/pattern/replacement/msixpogcedualr" substitutes matches of
       'pattern' with 'replacement'. Modifiers are as for "m//", with two
       additions:

           e  Evaluate 'replacement' as an expression
           r  Return the result of the substitution and leave the original
              string untouched

       'e' may be specified multiple times; each additional 'e' evaluates the
       result again. 'replacement' is interpreted as a double quoted string
       unless a single-quote ("'") is the delimiter.

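       A short sketch of s/// with /g, /e and /r, plus a single-quote
       delimiter that turns off interpolation in the replacement. The input
       string is invented for the example.

```perl
use strict;
use warnings;

my $s = "a1 b22 c333";

# /g replaces every match; /e evaluates the replacement as Perl code.
(my $doubled = $s) =~ s/(\d+)/$1 * 2/ge;      # "a2 b44 c666"

# /r returns a modified copy and leaves $s itself untouched.
my $upper = $s =~ s/([a-z])/\U$1/gr;          # "A1 B22 C333"

# With ' as the delimiter, '$x' in the replacement stays literal text.
(my $literal = $s) =~ s'a1'$x';               # '$x b22 c333'

print "$s | $doubled | $upper\n";
# prints "a1 b22 c333 | a2 b44 c666 | A1 B22 C333"
```
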
       "?pattern?" is like "m/pattern/" but matches only once; it must be
       reset with reset() before it can match again.  No alternate
       delimiters can be used.

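       A rough sketch of the match-once behaviour. It assumes the code runs
       in package main, because reset() only re-arms ?? matches in the
       current package. The loop data is illustrative.

```perl
use strict;
use warnings;

my @hits;
for my $pass (1 .. 2) {
    for my $line ("a foo", "b foo", "c foo") {
        # m?foo? succeeds only once until reset is called
        push @hits, "$pass:$line" if $line =~ m?foo?;
    }
    reset;    # re-arm ?? matches in the current package
}
print join(",", @hits), "\n";    # prints "1:a foo,2:a foo"
```
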
   SYNTAX
        \       Escapes the character immediately following it
        .       Matches any single character except a newline (unless /s is
                  used)
        ^       Matches at the beginning of the string (or line, if /m is used)
        $       Matches at the end of the string (or line, if /m is used)
        *       Matches the preceding element 0 or more times
        +       Matches the preceding element 1 or more times
        ?       Matches the preceding element 0 or 1 times
        {...}   Specifies a range of occurrences for the element preceding it
        [...]   Matches any one of the characters contained within the brackets
        (...)   Groups subexpressions for capturing to $1, $2...
        (?:...) Groups subexpressions without capturing (cluster)
        |       Matches either the subexpression preceding or following it
        \g1 or \g{1}, \g2 ...    Matches the text from the Nth group
        \1, \2, \3 ...           Matches the text from the Nth group
        \g-1 or \g{-1}, \g-2 ... Matches the text from the Nth previous group
        \g{name}     Named backreference
        \k<name>     Named backreference
        \k'name'     Named backreference
        (?P=name)    Named backreference (python syntax)

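       A minimal sketch of numbered, relative and named backreferences. The
       sample strings are invented for the example.

```perl
use strict;
use warnings;

# \g1 (or \1) must match the same text that group 1 matched.
my @doubled = "this is is a test test" =~ /\b(\w+)\s+\g1\b/g;  # ("is", "test")

# \g{-1} refers to the nearest preceding group; \k<name> to a named one.
my $rel   = "abab" =~ /^(ab)\g{-1}$/ ? 1 : 0;                  # 1
my $named = "xyxy" =~ /^(?<pair>xy)\k<pair>$/ ? 1 : 0;         # 1

print "@doubled $rel $named\n";                                # prints "is test 1 1"
```
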
   ESCAPE SEQUENCES
       These work as in normal strings.

          \a       Alarm (beep)
          \e       Escape
          \f       Formfeed
          \n       Newline
          \r       Carriage return
          \t       Tab
          \037     Char whose ordinal is the 3 octal digits, max \777
          \o{2307} Char whose ordinal is the octal number, unrestricted
          \x7f     Char whose ordinal is the 2 hex digits, max \xFF
          \x{263a} Char whose ordinal is the hex number, unrestricted
          \cx      Control-x
          \N{name} A named Unicode character or character sequence
          \N{U+263D} A Unicode character by hex ordinal

          \l  Lowercase next character
          \u  Titlecase next character
          \L  Lowercase until \E
          \U  Uppercase until \E
          \F  Foldcase until \E
          \Q  Disable pattern metacharacters until \E
          \E  End modification

       For Titlecase, see "Titlecase".

       This one works differently from normal strings:

          \b  An assertion, not backspace, except in a character class

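       A small sketch of \Q...\E, a case modifier, two spellings of the same
       character, and \b inside a character class. The price string is made
       up and deliberately full of metacharacters.

```perl
use strict;
use warnings;

my $price = '$9.99 (sale)';
my $ad    = 'Now only $9.99 (sale)!';

my $raw    = $ad =~ /$price/     ? 1 : 0;   # 0: '$', '.', '(' act as metacharacters
my $quoted = $ad =~ /\Q$price\E/ ? 1 : 0;   # 1: \Q...\E matches the text literally

my $shout = "\Uhello\E world";              # "HELLO world"
my $same  = "\x{263a}" eq "\N{U+263A}" ? 1 : 0;   # 1: same code point
my $bs    = "\b" =~ /^[\b]$/ ? 1 : 0;       # 1: inside a class, \b is backspace

print "$raw $quoted $shout $same $bs\n";    # prints "0 1 HELLO world 1 1"
```
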
   CHARACTER CLASSES
          [amy]    Match 'a', 'm' or 'y'
          [f-j]    Dash specifies "range"
          [f-j-]   Dash escaped or at start or end means 'dash'
          [^f-j]   Caret indicates "match any character _except_ these"

       The following sequences (except "\N") work within or without a
       character class.  The first six are locale aware; all are Unicode
       aware. See perllocale and perlunicode for details.

          \d      A digit
          \D      A nondigit
          \w      A word character
          \W      A non-word character
          \s      A whitespace character
          \S      A non-whitespace character
          \h      A horizontal whitespace
          \H      A non-horizontal whitespace
          \N      A non-newline (when not followed by '{NAME}'; experimental;
                  not valid in a character class; equivalent to [^\n]; it's
                  like '.' without /s modifier)
          \v      A vertical whitespace
          \V      A non-vertical whitespace
          \R      A generic newline           (?>\x0D\x0A|\v)

          \C      Match a byte (with Unicode, '.' matches a character)
          \pP     Match P-named (Unicode) property
          \p{...} Match Unicode property with name longer than 1 character
          \PP     Match non-P
          \P{...} Match lack of Unicode property with name longer than 1 char
          \X      Match Unicode extended grapheme cluster

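       A minimal sketch of a few backslash classes on an invented record
       with a tab and a CRLF line ending. Note that \R consumes "\r\n" as
       one line break.

```perl
use strict;
use warnings;

my $row = "id:\t42  name: Ann\r\nnext line";

my @digits = $row =~ /(\d+)/g;           # ("42")
my @words  = $row =~ /(\w+)/g;           # ("id", "42", "name", "Ann", "next", "line")
my $h_count = () = $row =~ /\h/g;        # tab plus four spaces: 5
my @lines  = split /\R/, $row;           # "\r\n" is a single generic newline

print scalar(@words), " $h_count ", scalar(@lines), "\n";   # prints "6 5 2"
```
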
       POSIX character classes and their Unicode and Perl equivalents:

                   ASCII-         Full-
          POSIX    range          range    backslash
        [[:...:]]  \p{...}        \p{...}   sequence    Description

        -----------------------------------------------------------------------
        alnum   PosixAlnum       XPosixAlnum            Alpha plus Digit
        alpha   PosixAlpha       XPosixAlpha            Alphabetic characters
        ascii   ASCII                                   Any ASCII character
        blank   PosixBlank       XPosixBlank   \h       Horizontal whitespace;
                                                          full-range also
                                                          written as
                                                          \p{HorizSpace} (GNU
                                                          extension)
        cntrl   PosixCntrl       XPosixCntrl            Control characters
        digit   PosixDigit       XPosixDigit   \d       Decimal digits
        graph   PosixGraph       XPosixGraph            Alnum plus Punct
        lower   PosixLower       XPosixLower            Lowercase characters
        print   PosixPrint       XPosixPrint            Graph plus Blank, but
                                                          not any Cntrls
        punct   PosixPunct       XPosixPunct            Punctuation and Symbols
                                                          in ASCII-range; just
                                                          punct outside it
        space   PosixSpace       XPosixSpace            [\s\cK]
                PerlSpace        XPerlSpace    \s       Perl's whitespace def'n
        upper   PosixUpper       XPosixUpper            Uppercase characters
        word    PosixWord        XPosixWord    \w       Alnum + Unicode marks +
                                                          connectors, like '_'
                                                          (Perl extension)
        xdigit  ASCII_Hex_Digit  XPosixXDigit           Hexadecimal digit,
                                                           ASCII-range is
                                                           [0-9A-Fa-f]

       Also, various synonyms like "\p{Alpha}" for "\p{XPosixAlpha}"; all are
       listed in "Properties accessible through \p{} and \P{}" in perluniprops.

       Within a character class:

           POSIX      traditional   Unicode
         [:digit:]       \d        \p{Digit}
         [:^digit:]      \D        \P{Digit}

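       A short sketch of POSIX classes (only valid inside a bracketed class)
       next to the "\p{Alpha}" synonym mentioned above. The sample string is
       illustrative.

```perl
use strict;
use warnings;

my $s = "abc-123_XYZ";

my @alpha     = $s =~ /([[:alpha:]]+)/g;        # ("abc", "XYZ")
my @non_digit = $s =~ /([[:^digit:]]+)/g;       # ("abc-", "_XYZ")
my $is_hex    = "DeadBeef" =~ /^[[:xdigit:]]+$/ ? 1 : 0;   # 1

# \P{...} also works outside brackets.
my @non_alpha = $s =~ /(\P{Alpha}+)/g;          # ("-123_")

print "@alpha | @non_digit | $is_hex | @non_alpha\n";
```
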
   ANCHORS
       All are zero-width assertions.

          ^  Match string start (or line, if /m is used)
          $  Match string end (or line, if /m is used) or before a final
             newline
          \b Match word boundary (between \w and \W)
          \B Match except at word boundary (between \w and \w or \W and \W)
          \A Match string start (regardless of /m)
          \Z Match string end (before optional newline)
          \z Match absolute string end
          \G Match where previous m//g left off
          \K Keep the stuff left of the \K, don't include it in $&

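       A minimal sketch contrasting several anchors. The strings are
       invented. It shows \Z versus \z on a trailing newline, \K inside s///,
       and \G pinning each /g match to where the last one stopped.

```perl
use strict;
use warnings;

my $text = "cat concat cat\n";

my $words = () = $text =~ /\bcat\b/g;     # 2: the "cat" in "concat" has no boundary
my $end_Z = $text =~ /cat\Z/ ? 1 : 0;     # 1: \Z allows a trailing newline
my $end_z = $text =~ /cat\z/ ? 1 : 0;     # 0: \z is the absolute end

# \K: only the text after it is replaced.
(my $total = "total: 10") =~ s/total: \K\d+/20/;   # "total: 20"

# \G: matching stops at the first gap instead of skipping ahead to "dd".
my @pairs = "aabbcc dd" =~ /\G(\w\w)/g;   # ("aa", "bb", "cc")

print "$words $end_Z $end_z $total @pairs\n";   # prints "2 1 0 total: 20 aa bb cc"
```
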
   QUANTIFIERS
       Quantifiers are greedy by default and match the longest leftmost.

          Maximal Minimal Possessive Allowed range
          ------- ------- ---------- -------------
          {n,m}   {n,m}?  {n,m}+     Must occur at least n times
                                     but no more than m times
          {n,}    {n,}?   {n,}+      Must occur at least n times
          {n}     {n}?    {n}+       Must occur exactly n times
          *       *?      *+         0 or more times (same as {0,})
          +       +?      ++         1 or more times (same as {1,})
          ?       ??      ?+         0 or 1 time (same as {0,1})

       The possessive forms (new in Perl 5.10) prevent backtracking: what gets
       matched by a pattern with a possessive quantifier will not be
       backtracked into, even if that causes the whole match to fail.

       There is no quantifier "{,n}". That's interpreted as a literal string.

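       A short sketch of maximal, minimal and possessive quantifiers on
       made-up input. The possessive "a++" keeps every "a", so the pattern's
       final "a" can never match.

```perl
use strict;
use warnings;

my $html = '<b>bold</b> and <i>italic</i>';

my ($greedy)  = $html =~ /<(.+)>/;          # "b>bold</b> and <i>italic</i"
my ($minimal) = $html =~ /<(.+?)>/;         # "b"

my $backtracks = "aaa" =~ /^a+a$/  ? 1 : 0;  # 1: a+ gives back one "a"
my $possessive = "aaa" =~ /^a++a$/ ? 1 : 0;  # 0: a++ never gives anything back

my ($three) = "abcdef" =~ /^(\w{2,3})/;     # "abc": a maximal {n,m} takes m
```
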
   EXTENDED CONSTRUCTS
          (?#text)          A comment
          (?:...)           Groups subexpressions without capturing (cluster)
          (?pimsx-imsx:...) Enable/disable option (as per m// modifiers)
          (?=...)           Zero-width positive lookahead assertion
          (?!...)           Zero-width negative lookahead assertion
          (?<=...)          Zero-width positive lookbehind assertion
          (?<!...)          Zero-width negative lookbehind assertion
          (?>...)           Grab what we can, prohibit backtracking
          (?|...)           Branch reset
          (?<name>...)      Named capture
          (?'name'...)      Named capture
          (?P<name>...)     Named capture (python syntax)
          (?{ code })       Embedded code, return value becomes $^R
          (??{ code })      Dynamic regex, return value used as regex
          (?N)              Recurse into subpattern number N
          (?-N), (?+N)      Recurse into Nth previous/next subpattern
          (?R), (?0)        Recurse at the beginning of the whole pattern
          (?&name)          Recurse into a named subpattern
          (?P>name)         Recurse into a named subpattern (python syntax)
          (?(cond)yes|no)
          (?(cond)yes)      Conditional expression, where "cond" can be:
                            (?=pat)   look-ahead
                            (?!pat)   negative look-ahead
                            (?<=pat)  look-behind
                            (?<!pat)  negative look-behind
                            (N)       subpattern N has matched something
                            (<name>)  named subpattern has matched something
                            ('name')  named subpattern has matched something
                            (?{code}) code condition
                            (R)       true if recursing
                            (RN)      true if recursing into Nth subpattern
                            (R&name)  true if recursing into named subpattern
                            (DEFINE)  always false; no 'no' branch allowed

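       A sketch of three of these constructs on invented data: named
       captures read back through %+, lookbehind and lookahead assertions,
       and (?1) recursion matching nested parentheses. The "$balanced"
       pattern is one common way to write this, not the only one.

```perl
use strict;
use warnings;

# Named captures are available in %+ after a successful match.
"2013-03-04" =~ /(?<year>\d{4})-(?<mon>\d\d)-(?<day>\d\d)/
    or die "no date";
my $dmy = "$+{day}/$+{mon}/$+{year}";                   # "04/03/2013"

# Lookarounds test context without consuming it.
my @usd    = "USD10 EUR20 USD30" =~ /(?<=USD)(\d+)/g;   # ("10", "30")
my $strong = "abc123" =~ /^(?=.*\d)(?=.*[a-z]).{6,}$/ ? 1 : 0;   # 1

# (?1) recurses into group 1, so nesting depth is unlimited.
my $balanced = qr/^(\((?:[^()]++|(?1))*\))$/;
my $ok  = "(a(b)c)" =~ $balanced ? 1 : 0;               # 1
my $bad = "(a(b c)" =~ $balanced ? 1 : 0;               # 0
```
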
   VARIABLES
          $_    Default variable for operators to use

          $`    Everything prior to matched string
          $&    Entire matched string
          $'    Everything after matched string

          ${^PREMATCH}   Everything prior to matched string
          ${^MATCH}      Entire matched string
          ${^POSTMATCH}  Everything after matched string

       The use of "$`", "$&" or "$'" will slow down all regex use within your
       program. Consult perlvar for "@-" to see equivalent expressions that
       won't cause slow down.  See also Devel::SawAmpersand. Starting with
       Perl 5.10, you can also use the equivalent variables "${^PREMATCH}",
       "${^MATCH}" and "${^POSTMATCH}", but for them to be defined, you have
       to specify the "/p" (preserve) modifier on your regular expression.

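       A minimal sketch of the "/p" variables, with the same pieces rebuilt
       from "@-" and "@+" (which need no /p). The input string is made up.
       Everything is read inside the if block, while the match is still the
       most recent one.

```perl
use strict;
use warnings;

my $str = "key=value; rest";
my ($pre, $match, $post, $pre_off, $post_off);

if ($str =~ /=\w+/p) {
    ($pre, $match, $post) = (${^PREMATCH}, ${^MATCH}, ${^POSTMATCH});

    # Equivalent slices via the match offset arrays.
    $pre_off  = substr($str, 0, $-[0]);
    $post_off = substr($str, $+[0]);
}
print "$pre|$match|$post\n";    # prints "key|=value|; rest"
```
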
          $1, $2 ...  hold the Xth captured expr
          $+    Last parenthesized pattern match
          $^N   Holds the most recently closed capture
          $^R   Holds the result of the last (?{...}) expr
          @-    Offsets of starts of groups. $-[0] holds start of whole match
          @+    Offsets of ends of groups. $+[0] holds end of whole match
          %+    Named capture groups
          %-    Named capture groups, as array refs

       Captured groups are numbered according to their opening paren.

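       A short sketch of numbering by opening parenthesis. The outer group
       opens first, so it is $1. The sketch also shows the offset arrays for
       the same match. The date string is only sample input.

```perl
use strict;
use warnings;

my $s = "2013-03-04";
my (@caps, @starts, @ends, $last);

if ($s =~ /((\d+)-(\d+))-(\d+)/) {
    @caps   = ($1, $2, $3, $4);   # $1 is the outer group
    @starts = @-;                 # whole match first, then each group
    @ends   = @+;
    $last   = $+;                 # last matched group
}
print "@caps\n";      # prints "2013-03 2013 03 04"
print "@starts\n";    # prints "0 0 0 5 8"
print "@ends\n";      # prints "10 7 4 7 10"
```
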
   FUNCTIONS
          lc          Lowercase a string
          lcfirst     Lowercase first char of a string
          uc          Uppercase a string
          ucfirst     Titlecase first char of a string
          fc          Foldcase a string

          pos         Return or set current match position
          quotemeta   Quote metacharacters
          reset       Reset ?pattern? status
          study       Analyze string for optimizing matching

          split       Use a regex to split a string into parts

       The first five of these are like the escape sequences "\L", "\l", "\U",
       "\u", and "\F".  For Titlecase, see "Titlecase"; for Foldcase, see
       "Foldcase".

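       A small sketch of pos, split and quotemeta, plus two of the case
       functions. The strings are invented. Note that a capturing group in
       the split pattern keeps the separators in the result.

```perl
use strict;
use warnings;

# pos: where the next m//g on $str will start searching.
my $str = "aXbXc";
$str =~ /X/g or die "no X";
my $after_first = pos($str);                      # 2

my @fields   = split /\s*,\s*/, "a , b,c ,d";     # ("a", "b", "c", "d")
my @with_sep = split /(-)/, "1-2-3";              # ("1", "-", "2", "-", "3")

my $q     = quotemeta("a.b*c");                   # 'a\.b\*c' (what \Q does)
my $title = ucfirst(lc("hELLO"));                 # "Hello"
```
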
   TERMINOLOGY
       Titlecase

       Unicode concept which most often is equal to uppercase, but for certain
       characters like the German "sharp s" there is a difference.

       Foldcase

       Unicode form that is useful when comparing strings regardless of case,
       as certain characters have complex one-to-many case mappings. Primarily
       a variant of lowercase.

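       A sketch of the "sharp s" case described above. It assumes Perl 5.16
       or later: "use v5.16" enables fc and Unicode rules for native strings.
       The comparison strings are illustrative.

```perl
use v5.16;        # enables fc, the unicode_strings feature, and strict
use warnings;

my $sharp_s = "\x{DF}";                  # LATIN SMALL LETTER SHARP S

my $upper = uc $sharp_s;                 # "SS"
my $title = ucfirst $sharp_s;            # "Ss": titlecase differs from uppercase

# lc keeps the sharp s, so it cannot equate these; fc folds it to "ss".
my $lc_same = lc("Stra\x{DF}e") eq lc("STRASSE") ? 1 : 0;   # 0
my $fc_same = fc("Stra\x{DF}e") eq fc("STRASSE") ? 1 : 0;   # 1

print "$upper $title $lc_same $fc_same\n";   # prints "SS Ss 0 1"
```
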

AUTHOR

       Iain Truskett. Updated by the Perl 5 Porters.

       This document may be distributed under the same terms as Perl itself.

SEE ALSO

       ·   perlretut for a tutorial on regular expressions.

       ·   perlrequick for a rapid tutorial.

       ·   perlre for more details.

       ·   perlvar for details on the variables.

       ·   perlop for details on the operators.

       ·   perlfunc for details on the functions.

       ·   perlfaq6 for FAQs on regular expressions.

       ·   perlrebackslash for a reference on backslash sequences.

       ·   perlrecharclass for a reference on character classes.

       ·   The re module to alter behaviour and aid debugging.

       ·   "Debugging Regular Expressions" in perldebug

       ·   perluniintro, perlunicode, charnames and perllocale for details on
           regexes and internationalisation.

       ·   Mastering Regular Expressions by Jeffrey Friedl
           (http://oreilly.com/catalog/9780596528126/) for a thorough
           grounding and reference on the topic.

THANKS

       David P.C. Wollmann, Richard Soderberg, Sean M. Burke, Tom
       Christiansen, Jim Cromie, and Jeffrey Goff for useful advice.

perl v5.16.3                      2013-03-04                      PERLREREF(1)