PERLREREF(1)           Perl Programmers Reference Guide           PERLREREF(1)

NAME
       perlreref - Perl Regular Expressions Reference

DESCRIPTION
       This is a quick reference to Perl's regular expressions. For full
       information see perlre and perlop, as well as the "SEE ALSO" section in
       this document.

OPERATORS
       "=~" determines to which variable the regex is applied. In its
       absence, $_ is used.

           $var =~ /foo/;

       "!~" determines to which variable the regex is applied, and negates the
       result of the match; it returns false if the match succeeds, and true
       if it fails.

           $var !~ /foo/;

       "m/pattern/msixpogcdualn" searches a string for a pattern match,
       applying the given options.

           m  Multiline mode - ^ and $ match internal lines
           s  match as a Single line - . matches \n
           i  case-Insensitive
           x  eXtended legibility - free whitespace and comments
           p  Preserve a copy of the matched string -
              ${^PREMATCH}, ${^MATCH}, ${^POSTMATCH} will be defined.
           o  compile pattern Once
           g  Global - all occurrences
           c  don't reset pos on failed matches when using /g
           a  restrict \d, \s, \w and [:posix:] to match ASCII only
           aa (two a's) also stop /i from matching ASCII against non-ASCII
           l  match according to current locale
           u  match according to Unicode rules
           d  match according to native rules unless something indicates
              Unicode
           n  Non-capture mode. Don't let () fill in $1, $2, etc...

       If 'pattern' is an empty string, the last successfully matched regex is
       used. Delimiters other than '/' may be used for both this operator and
       the following ones. The leading "m" can be omitted if the delimiter is
       '/'.
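
       As a minimal sketch of the binding operators and a few of the
       modifiers above (the sample string and variable names are made up for
       illustration):

```perl
use strict;
use warnings;

my $text = "First line\nsecond LINE\n";

# /i ignores case; /m lets ^ and $ match at internal line boundaries.
my $has_line = $text =~ /^second line$/im ? 1 : 0;

# !~ negates the result of the match.
my $no_digits = $text !~ /\d/ ? 1 : 0;

# /g in list context returns every occurrence.
my @words = $text =~ /\b(\w+)\b/g;

# /x permits whitespace and comments inside the pattern.
my ($first) = $text =~ / \A (\w+)   # first word of the string
                       /x;

print "$has_line $no_digits @words $first\n";
```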

       "qr/pattern/msixpodualn" lets you store a regex in a variable, or pass
       one around. Modifiers are as for "m//", and are stored within the
       regex.
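
       A short sketch of storing a regex with "qr//" and reusing it, assuming
       some invented sample strings:

```perl
use strict;
use warnings;

# The /i modifier is stored inside the compiled regex object.
my $word = qr/perl/i;

# The stored regex can be applied directly with =~.
my @hits = grep { $_ =~ $word } ("Perl 5", "python", "mod_perl");

# It can also be interpolated into a larger pattern; its /i travels with it.
my $version = "PERL 5.36" =~ /^$word\s+(\d+\.\d+)$/ ? $1 : undef;

print "@hits | $version\n";
```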

       "s/pattern/replacement/msixpodualngcer" substitutes matches of
       'pattern' with 'replacement'. Modifiers as for "m//", with two
       additions:

           e  Evaluate 'replacement' as an expression
           r  Return substitution and leave the original string untouched.

       'e' may be specified multiple times. 'replacement' is interpreted as a
       double quoted string unless a single-quote ("'") is the delimiter.
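
       The two extra modifiers might be used as follows; this is only a
       sketch, and the input string is an assumption made for the example
       ("/r" needs Perl 5.14 or later):

```perl
use strict;
use warnings;

my $price = "apple 3, pear 12";

# /e evaluates the replacement as Perl code; /g applies it to every match.
(my $doubled = $price) =~ s/(\d+)/$1 * 2/ge;

# /r returns the modified copy and leaves $price unchanged.
# The replacement is a double-quoted string, so \U...\E works in it.
my $shouted = $price =~ s/(\w+) (\d+)/\U$1\E=$2/gr;

print "$doubled\n$shouted\n$price\n";
```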

       "m?pattern?" is like "m/pattern/" but matches only once. No alternate
       delimiters can be used. It must be reset with reset() before it can
       match again.

SYNTAX
        \       Escapes the character immediately following it
        .       Matches any single character except a newline (unless /s is
                  used)
        ^       Matches at the beginning of the string (or line, if /m is used)
        $       Matches at the end of the string (or line, if /m is used)
        *       Matches the preceding element 0 or more times
        +       Matches the preceding element 1 or more times
        ?       Matches the preceding element 0 or 1 times
        {...}   Specifies a range of occurrences for the element preceding it
        [...]   Matches any one of the characters contained within the brackets
        (...)   Groups subexpressions for capturing to $1, $2...
        (?:...) Groups subexpressions without capturing (cluster)
        |       Matches either the subexpression preceding or following it
        \g1 or \g{1}, \g2 ...    Matches the text from the Nth group
        \1, \2, \3 ...           Matches the text from the Nth group
        \g-1 or \g{-1}, \g-2 ... Matches the text from the Nth previous group
        \g{name}     Named backreference
        \k<name>     Named backreference
        \k'name'     Named backreference
        (?P=name)    Named backreference (python syntax)
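
       A small sketch of the three backreference styles listed above, using
       invented test strings:

```perl
use strict;
use warnings;

my $text = "this is is a test of of backreferences";

# \1 refers to the first capture group: find doubled words.
my @doubled = $text =~ /\b(\w+)\s+\1\b/g;

# \g{-1} is relative: the group that closed most recently before it.
my $relative = "abab" =~ /^(ab)\g{-1}$/ ? 1 : 0;

# \k<name> refers to a named group; here it matches the same quote mark.
my $quoted = q{say "hi" now} =~ /(?<q>["'])(.*?)\k<q>/ ? $2 : undef;

print "@doubled $relative $quoted\n";
```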

ESCAPE SEQUENCES
       These work as in normal strings.

          \a       Alarm (beep)
          \e       Escape
          \f       Formfeed
          \n       Newline
          \r       Carriage return
          \t       Tab
          \037     Char whose ordinal is the 3 octal digits, max \777
          \o{2307} Char whose ordinal is the octal number, unrestricted
          \x7f     Char whose ordinal is the 2 hex digits, max \xFF
          \x{263a} Char whose ordinal is the hex number, unrestricted
          \cx      Control-x
          \N{name} A named Unicode character or character sequence
          \N{U+263D} A Unicode character by hex ordinal

          \l  Lowercase next character
          \u  Titlecase next character
          \L  Lowercase until \E
          \U  Uppercase until \E
          \F  Foldcase until \E
          \Q  Disable pattern metacharacters until \E
          \E  End modification

       For Titlecase, see "Titlecase".

       This one works differently from normal strings:

          \b  An assertion, not backspace, except in a character class
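
       To illustrate "\Q...\E" and the two meanings of "\b", here is a
       minimal sketch with made-up inputs:

```perl
use strict;
use warnings;

my $literal = "a.b";

# Without \Q the dot is a metacharacter; with \Q...\E it is literal.
my $loose = "axb" =~ /^$literal$/     ? 1 : 0;
my $exact = "axb" =~ /^\Q$literal\E$/ ? 1 : 0;

# \x{...} works in a pattern as it does in a double-quoted string.
my $tabbed = "key\tvalue" =~ /^key\x{09}value$/ ? 1 : 0;

# \b is a word-boundary assertion in a pattern ...
my $offset = "concat cat" =~ /\bcat\b/ ? $-[0] : -1;

# ... but a backspace character inside a bracketed class.
my $backspace = "\x08" =~ /^[\b]$/ ? 1 : 0;

print "$loose $exact $tabbed $offset $backspace\n";
```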

CHARACTER CLASSES
          [amy]    Match 'a', 'm' or 'y'
          [f-j]    Dash specifies "range"
          [f-j-]   Dash escaped or at start or end means 'dash'
          [^f-j]   Caret indicates "match any character _except_ these"

       The following sequences (except "\N") work inside or outside a
       character class. The first six are locale aware, all are Unicode
       aware. See perllocale and perlunicode for details.

          \d      A digit
          \D      A nondigit
          \w      A word character
          \W      A non-word character
          \s      A whitespace character
          \S      A non-whitespace character
          \h      A horizontal whitespace
          \H      A non horizontal whitespace
          \N      A non newline (when not followed by '{NAME}';
                  not valid in a character class; equivalent to [^\n]; it's
                  like '.' without /s modifier)
          \v      A vertical whitespace
          \V      A non vertical whitespace
          \R      A generic newline           (?>\v|\x0D\x0A)

          \pP     Match P-named (Unicode) property
          \p{...} Match Unicode property with name longer than 1 character
          \PP     Match non-P
          \P{...} Match lack of Unicode property with name longer than 1 char
          \X      Match Unicode extended grapheme cluster

       POSIX character classes and their Unicode and Perl equivalents:

                 ASCII-          Full-
       POSIX     range           range         backslash
       [[:...:]] \p{...}         \p{...}       sequence  Description

       -----------------------------------------------------------------------
       alnum     PosixAlnum      XPosixAlnum             'alpha' plus 'digit'
       alpha     PosixAlpha      XPosixAlpha             Alphabetic characters
       ascii     ASCII                                   Any ASCII character
       blank     PosixBlank      XPosixBlank   \h        Horizontal whitespace;
                                                         full-range also
                                                         written as
                                                         \p{HorizSpace} (GNU
                                                         extension)
       cntrl     PosixCntrl      XPosixCntrl             Control characters
       digit     PosixDigit      XPosixDigit   \d        Decimal digits
       graph     PosixGraph      XPosixGraph             'alnum' plus 'punct'
       lower     PosixLower      XPosixLower             Lowercase characters
       print     PosixPrint      XPosixPrint             'graph' plus 'space',
                                                         but not any Controls
       punct     PosixPunct      XPosixPunct             Punctuation and Symbols
                                                         in ASCII-range; just
                                                         punct outside it
       space     PosixSpace      XPosixSpace   \s        Whitespace
       upper     PosixUpper      XPosixUpper             Uppercase characters
       word      PosixWord       XPosixWord    \w        'alnum' + Unicode marks
                                                         + connectors, like
                                                         '_' (Perl extension)
       xdigit    ASCII_Hex_Digit XPosixXDigit            Hexadecimal digit,
                                                         ASCII-range is
                                                         [0-9A-Fa-f]

       Also, various synonyms like "\p{Alpha}" for "\p{XPosixAlpha}"; all
       listed in "Properties accessible through \p{} and \P{}" in
       perluniprops.

       Within a character class:

           POSIX      traditional   Unicode
         [:digit:]       \d        \p{Digit}
         [:^digit:]      \D        \P{Digit}
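
       The difference between Unicode-aware and ASCII-only matching can be
       sketched like this; the sample strings are assumptions chosen for the
       example:

```perl
use strict;
use warnings;

# U+0663 is ARABIC-INDIC DIGIT THREE, a non-ASCII decimal digit.
my $mixed = "a1 \x{0663} _";

# On a Unicode string \d matches any Unicode decimal digit.
my $unicode_digits = () = $mixed =~ /\d/g;

# /a restricts \d to the ASCII digits 0-9.
my $ascii_digits = () = $mixed =~ /\d/ag;

# POSIX classes are written inside a bracketed class.
my @letters = "a1-B2" =~ /[[:alpha:]]/g;

# \p{...} matches by Unicode property; \P{...} is its negation.
my @upper     = "aBcD" =~ /\p{Lu}/g;
my @not_upper = "aBcD" =~ /\P{Lu}/g;

print "$unicode_digits $ascii_digits @letters @upper @not_upper\n";
```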

ANCHORS
       All are zero-width assertions.

          ^  Match string start (or line, if /m is used)
          $  Match string end (or line, if /m is used) or before newline
          \b{} Match boundary of type specified within the braces
          \B{} Match wherever \b{} doesn't match
          \b Match word boundary (between \w and \W)
          \B Match except at word boundary (between \w and \w or \W and \W)
          \A Match string start (regardless of /m)
          \Z Match string end (before optional newline)
          \z Match absolute string end
          \G Match where previous m//g left off
          \K Keep the stuff left of the \K, don't include it in $&
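
       A brief sketch of "\Z" versus "\z", a "\G"-driven tokenizer loop, and
       "\K"; the inputs and token labels are hypothetical:

```perl
use strict;
use warnings;

my $line = "foo\n";

# \Z tolerates one trailing newline; \z insists on the very end.
my $cap_z   = $line =~ /foo\Z/ ? 1 : 0;
my $small_z = $line =~ /foo\z/ ? 1 : 0;

# \G continues where the previous m//g stopped; /c keeps pos on failure,
# so the next alternative can be tried at the same position.
my $input = "12ab";
my @tokens;
while (1) {
    if    ($input =~ /\G(\d)/gc)    { push @tokens, "digit:$1" }
    elsif ($input =~ /\G([a-z])/gc) { push @tokens, "letter:$1" }
    else                            { last }
}

# \K keeps "foo" out of the matched text, so only "bar" is replaced.
(my $kept = "foobar") =~ s/foo\Kbar/BAZ/;

print "$cap_z $small_z @tokens $kept\n";
```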

QUANTIFIERS
       Quantifiers are greedy by default and match the longest leftmost.

          Maximal Minimal Possessive Allowed range
          ------- ------- ---------- -------------
          {n,m}   {n,m}?  {n,m}+     Must occur at least n times
                                     but no more than m times
          {n,}    {n,}?   {n,}+      Must occur at least n times
          {,n}    {,n}?   {,n}+      Must occur at most n times
          {n}     {n}?    {n}+       Must occur exactly n times
          *       *?      *+         0 or more times (same as {0,})
          +       +?      ++         1 or more times (same as {1,})
          ?       ??      ?+         0 or 1 time (same as {0,1})

       The possessive forms (new in Perl 5.10) prevent backtracking: what gets
       matched by a pattern with a possessive quantifier will not be
       backtracked into, even if that causes the whole match to fail.
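
       The three quantifier flavours can be compared with a sketch like the
       following, where the sample strings are invented:

```perl
use strict;
use warnings;

my $html = "<b>one</b><b>two</b>";

# Maximal (greedy): .* takes as much as it can while still matching.
my ($greedy) = $html =~ /<b>(.*)<\/b>/;

# Minimal: .*? takes as little as it can.
my ($minimal) = $html =~ /<b>(.*?)<\/b>/;

# Possessive: a++ never gives back an "a", so the trailing "a" cannot
# match and the whole pattern fails; the greedy form backtracks and succeeds.
my $possessive = "aaa" =~ /^a++a$/ ? 1 : 0;
my $backtracks = "aaa" =~ /^a+a$/  ? 1 : 0;

# {n,m} bounds the repetition count.
my @runs = "a aa aaa aaaa" =~ /\b(a{2,3})\b/g;

print "$greedy | $minimal | $possessive $backtracks | @runs\n";
```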

EXTENDED CONSTRUCTS
       (?#text)                   A comment
       (?:...)                    Groups subexpressions without capturing
                                  (cluster)
       (?pimsx-imsx:...)          Enable/disable option (as per m// modifiers)
       (?=...)                    Zero-width positive lookahead assertion
       (*pla:...)                 Same, starting in 5.32; experimentally in 5.28
       (*positive_lookahead:...)  Same, same versions as *pla
       (?!...)                    Zero-width negative lookahead assertion
       (*nla:...)                 Same, starting in 5.32; experimentally in 5.28
       (*negative_lookahead:...)  Same, same versions as *nla
       (?<=...)                   Zero-width positive lookbehind assertion
       (*plb:...)                 Same, starting in 5.32; experimentally in 5.28
       (*positive_lookbehind:...) Same, same versions as *plb
       (?<!...)                   Zero-width negative lookbehind assertion
       (*nlb:...)                 Same, starting in 5.32; experimentally in 5.28
       (*negative_lookbehind:...) Same, same versions as *nlb
       (?>...)                    Grab what we can, prohibit backtracking
       (*atomic:...)              Same, starting in 5.32; experimentally in 5.28
       (?|...)                    Branch reset
       (?<name>...)               Named capture
       (?'name'...)               Named capture
       (?P<name>...)              Named capture (python syntax)
       (?[...])                   Extended bracketed character class
       (?{ code })                Embedded code, return value becomes $^R
       (??{ code })               Dynamic regex, return value used as regex
       (?N)                       Recurse into subpattern number N
       (?-N), (?+N)               Recurse into Nth previous/next subpattern
       (?R), (?0)                 Recurse at the beginning of the whole pattern
       (?&name)                   Recurse into a named subpattern
       (?P>name)                  Recurse into a named subpattern (python syntax)
       (?(cond)yes|no)
       (?(cond)yes)               Conditional expression, where "(cond)" can be:
           (?=pat)    lookahead; also (*pla:pat)
                      (*positive_lookahead:pat)
           (?!pat)    negative lookahead; also (*nla:pat)
                      (*negative_lookahead:pat)
           (?<=pat)   lookbehind; also (*plb:pat)
                      (*positive_lookbehind:pat)
           (?<!pat)   negative lookbehind; also (*nlb:pat)
                      (*negative_lookbehind:pat)
           (N)        subpattern N has matched something
           (<name>)   named subpattern has matched something
           ('name')   named subpattern has matched something
           (?{code})  code condition
           (R)        true if recursing
           (RN)       true if recursing into Nth subpattern
           (R&name)   true if recursing into named subpattern
           (DEFINE)   always false, no no-pattern allowed
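
       A few of these constructs in use, as a sketch only; the date, price
       and key strings are assumptions made for the example:

```perl
use strict;
use warnings;

# Named captures: the results are available in %+.
my %date;
if ("2022-08-30" =~ /^(?<year>\d{4})-(?<month>\d\d)-(?<day>\d\d)$/) {
    %date = %+;
}

# Lookahead and lookbehind are zero-width: they test without consuming.
my @usd       = "15 USD, 20 EUR, 30 USD" =~ /(\d+)(?= USD)/g;
my ($dollars) = 'cost $15 and 20' =~ /(?<=\$)(\d+)/;

# (?|...) branch reset: both alternatives store into $1.
my ($value) = "key: 42" =~ /(?|id=(\d+)|key: (\d+))/;

# (?1) recurses into group 1, which here matches balanced parentheses.
my $balanced_re = qr/^(\((?:[^()]++|(?1))*\))$/;
my $balanced    = "(a(b)c)" =~ $balanced_re ? 1 : 0;
my $unbalanced  = "(a(b c)" =~ $balanced_re ? 1 : 0;

print "$date{year} @usd $dollars $value $balanced $unbalanced\n";
```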

VARIABLES
          $_    Default variable for operators to use

          $`    Everything prior to matched string
          $&    Entire matched string
          $'    Everything after matched string

          ${^PREMATCH}   Everything prior to matched string
          ${^MATCH}      Entire matched string
          ${^POSTMATCH}  Everything after matched string

       Note to those still using Perl 5.18 or earlier: The use of "$`", $& or
       "$'" will slow down all regex use within your program. Consult perlvar
       for "@-" to see equivalent expressions that won't cause a slowdown. See
       also Devel::SawAmpersand. Starting with Perl 5.10, you can also use the
       equivalent variables "${^PREMATCH}", "${^MATCH}" and "${^POSTMATCH}",
       but for them to be defined, you have to specify the "/p" (preserve)
       modifier on your regular expression. In Perl 5.20 and later, the use of
       "$`", $& and "$'" makes no speed difference.

          $1, $2 ...  hold the Xth captured expr
          $+    Last parenthesized pattern match
          $^N   Holds the most recently closed capture
          $^R   Holds the result of the last (?{...}) expr
          @-    Offsets of starts of groups. $-[0] holds start of whole match
          @+    Offsets of ends of groups. $+[0] holds end of whole match
          %+    Named capture groups
          %-    Named capture groups, as array refs

       Captured groups are numbered according to their opening paren.
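
       The match variables can be inspected with a sketch such as this one;
       the input string and the group name "key" are illustrative:

```perl
use strict;
use warnings;

my $text = "name=perl; version=5.36";

my ($pre, $match, $post, $whole, $minor, $key, $last);
if ($text =~ /(?<key>version)=(\d+)\.(\d+)/p) {
    # With /p these are defined and mirror $`, $& and $'.
    ($pre, $match, $post) = (${^PREMATCH}, ${^MATCH}, ${^POSTMATCH});

    # @- and @+ hold start and end offsets; index 0 is the whole match.
    $whole = substr($text, $-[0], $+[0] - $-[0]);
    $minor = substr($text, $-[3], $+[3] - $-[3]);

    $key  = $+{key};   # named capture group
    $last = $+;        # last parenthesized group that matched
}

print "[$pre] [$match] [$post] $whole $minor $key $last\n";
```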

FUNCTIONS
          lc          Lowercase a string
          lcfirst     Lowercase first char of a string
          uc          Uppercase a string
          ucfirst     Titlecase first char of a string
          fc          Foldcase a string

          pos         Return or set current match position
          quotemeta   Quote metacharacters
          reset       Reset m?pattern? status
          study       Analyze string for optimizing matching

          split       Use a regex to split a string into parts

       The first five of these are like the escape sequences "\L", "\l", "\U",
       "\u", and "\F". For Titlecase, see "Titlecase"; for Foldcase, see
       "Foldcase".
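
       Several of these functions in a short sketch, with invented inputs
       ("fc" needs Perl 5.16 or later and the "fc" feature):

```perl
use strict;
use warnings;
use feature 'fc';

# quotemeta escapes metacharacters, like \Q...\E inside a pattern.
my $safe = quotemeta("1+1=2");

# pos reports where the last m//g match on the string ended.
my $str = "aXbXc";
$str =~ /X/g;
my $after_first = pos($str);

# split takes a regex as its separator.
my @fields = split /\s*,\s*/, "a , b,c";

# fc gives a case-folded form for comparing strings regardless of case.
my $same = fc("HELLO") eq fc("hello") ? 1 : 0;

# lc then ucfirst: lowercase everything, then titlecase the first char.
my $name = ucfirst lc "pERL";

print "$safe $after_first @fields $same $name\n";
```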

TERMINOLOGY
   Titlecase
       Unicode concept which most often is equal to uppercase, but for certain
       characters like the German "sharp s" there is a difference.

   Foldcase
       Unicode form that is useful when comparing strings regardless of case,
       as certain characters have complex one-to-many case mappings. Primarily
       a variant of lowercase.

AUTHOR
       Iain Truskett. Updated by the Perl 5 Porters.

       This document may be distributed under the same terms as Perl itself.

SEE ALSO
       •   perlretut for a tutorial on regular expressions.

       •   perlrequick for a rapid tutorial.

       •   perlre for more details.

       •   perlvar for details on the variables.

       •   perlop for details on the operators.

       •   perlfunc for details on the functions.

       •   perlfaq6 for FAQs on regular expressions.

       •   perlrebackslash for a reference on backslash sequences.

       •   perlrecharclass for a reference on character classes.

       •   The re module to alter behaviour and aid debugging.

       •   "Debugging Regular Expressions" in perldebug

       •   perluniintro, perlunicode, charnames and perllocale for details on
           regexes and internationalisation.

       •   Mastering Regular Expressions by Jeffrey Friedl
           (<http://oreilly.com/catalog/9780596528126/>) for a thorough
           grounding and reference on the topic.

THANKS
       David P.C. Wollmann, Richard Soderberg, Sean M. Burke, Tom
       Christiansen, Jim Cromie, and Jeffrey Goff for useful advice.

perl v5.36.0                      2022-08-30                      PERLREREF(1)