PERLREREF(1)          Perl Programmers Reference Guide          PERLREREF(1)

NAME
perlreref - Perl Regular Expressions Reference

DESCRIPTION
This is a quick reference to Perl's regular expressions. For full
information see perlre and perlop, as well as the "SEE ALSO" section in
this document.

OPERATORS
"=~" determines to which variable the regex is applied. In its
absence, $_ is used.

    $var =~ /foo/;

"!~" determines to which variable the regex is applied, and negates the
result of the match; it returns false if the match succeeds, and true
if it fails.

    $var !~ /foo/;

"m/pattern/msixpogc" searches a string for a pattern match, applying
the given options.

    m  Multiline mode - ^ and $ match internal lines
    s  match as a Single line - . matches \n
    i  case-Insensitive
    x  eXtended legibility - free whitespace and comments
    p  Preserve a copy of the matched string -
       ${^PREMATCH}, ${^MATCH}, ${^POSTMATCH} will be defined.
    o  compile pattern Once
    g  Global - all occurrences
    c  don't reset pos on failed matches when using /g

If 'pattern' is an empty string, the last successfully matched regex is
used. Delimiters other than '/' may be used for both this operator and
the following ones. The leading "m" can be omitted if the delimiter is
'/'.

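The following is a minimal sketch of the match operator with a few of the modifiers listed above. The variable names and the sample text are illustrative only.

```perl
use strict;
use warnings;

my $text = "Foo bar\nfoo baz\n";

# /i ignores case; /g in list context returns every occurrence.
my @all = $text =~ /foo/gi;

# /m lets ^ match at the start of each internal line.
my @line_starts = $text =~ /^(\w+)/mg;

# /x allows whitespace and comments inside the pattern.
my ($second) = $text =~ /
    ^foo \s+    # a line starting with lowercase "foo"
    (\w+)       # capture the following word
/mx;

# /s lets "." match a newline as well.
my $spans_lines = $text =~ /bar.foo/s ? 1 : 0;

print "all=@all starts=@line_starts second=$second spans=$spans_lines\n";
```

In list context a /g match without capture groups returns the matched strings themselves; with capture groups it returns the captures.
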
"qr/pattern/msixpo" lets you store a regex in a variable, or pass one
around. The modifiers are the same as for "m//" and are stored within
the regex.

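As a rough illustration of storing and passing a compiled regex, the sketch below uses a hypothetical helper named count_matches; neither it nor the sample strings come from the original text.

```perl
use strict;
use warnings;

# The /i modifier is stored inside the compiled regex object.
my $word = qr/perl/i;

# A qr// object can be used directly or interpolated into a larger pattern.
my $direct   = "I like Perl" =~ $word              ? 1 : 0;
my $embedded = "PERL 5.10"   =~ /^$word\s+[\d.]+$/ ? 1 : 0;

# It can also be passed to a subroutine like any other scalar.
# count_matches is a hypothetical helper for this example.
sub count_matches {
    my ($re, @strings) = @_;
    return scalar grep { $_ =~ $re } @strings;
}
my $n = count_matches($word, "perl", "Python", "PERL");

print "direct=$direct embedded=$embedded n=$n\n";
```
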
"s/pattern/replacement/msixpogce" substitutes matches of 'pattern' with
'replacement'. Modifiers as for "m//", with one addition:

    e  Evaluate 'replacement' as an expression

'e' may be specified multiple times. 'replacement' is interpreted as a
double quoted string unless a single-quote ("'") is the delimiter.

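The substitution rules above can be sketched as follows. The sample string and variable names are assumptions made for the example.

```perl
use strict;
use warnings;

my $prices = "apple 3, pear 5";

# /e evaluates the replacement as Perl code; /g repeats for every match.
(my $doubled = $prices) =~ s/(\d+)/$1 * 2/ge;

# Without /e the replacement is interpolated like a double-quoted string.
(my $tagged = $prices) =~ s/(\d+)/<$1>/g;

# With single-quote delimiters the replacement is taken literally.
(my $literal = $prices) =~ s'(\d+)'$1'g;

# s/// returns the number of substitutions made.
my $count = (my $copy = $prices) =~ s/\d+/N/g;

print "$doubled | $tagged | $literal | $count\n";
```

The "(my $copy = $orig) =~ s/.../.../" idiom copies the string first, so the original is left untouched.
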
"?pattern?" is like "m/pattern/" but matches only once. No alternate
delimiters can be used. Must be reset with reset().

SYNTAX
    \       Escapes the character immediately following it
    .       Matches any single character except a newline (unless /s is
            used)
    ^       Matches at the beginning of the string (or line, if /m is used)
    $       Matches at the end of the string (or line, if /m is used)
    *       Matches the preceding element 0 or more times
    +       Matches the preceding element 1 or more times
    ?       Matches the preceding element 0 or 1 times
    {...}   Specifies a range of occurrences for the element preceding it
    [...]   Matches any one of the characters contained within the brackets
    (...)   Groups subexpressions for capturing to $1, $2...
    (?:...) Groups subexpressions without capturing (cluster)
    |       Matches either the subexpression preceding or following it
    \1, \2, \3 ...           Matches the text from the Nth group
    \g1 or \g{1}, \g2 ...    Matches the text from the Nth group
    \g-1 or \g{-1}, \g-2 ... Matches the text from the Nth previous group
    \g{name}     Named backreference
    \k<name>     Named backreference
    \k'name'     Named backreference
    (?P=name)    Named backreference (python syntax)

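A short sketch of grouping and the backreference forms in the table above, using a made-up sentence with a doubled word. The \g, \k and named-group forms require Perl 5.10 or later.

```perl
use strict;
use warnings;

my $line = "this is is a test";

# \1 refers back to whatever group 1 captured: here, a doubled word.
my ($dup) = $line =~ /\b(\w+)\s+\1\b/;

# \g{-1} is relative: the group that closed most recently before it.
my ($rel) = $line =~ /\b(\w+)\s+\g{-1}\b/;

# A named group with a \k<name> backreference.
my $named = $line =~ /\b(?<word>\w+)\s+\k<word>\b/ ? $+{word} : undef;

# (?:...) groups without capturing, so the year is still $1.
my ($year) = "rel-2009" =~ /(?:rel|ver)-(\d{4})/;

print "dup=$dup rel=$rel named=$named year=$year\n";
```
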
ESCAPE SEQUENCES
These work as in normal strings.

    \a       Alarm (beep)
    \e       Escape
    \f       Formfeed
    \n       Newline
    \r       Carriage return
    \t       Tab
    \037     Any octal ASCII value
    \x7f     Any hexadecimal ASCII value
    \x{263a} A wide hexadecimal value
    \cx      Control-x
    \N{name} A named character

    \l  Lowercase next character
    \u  Titlecase next character
    \L  Lowercase until \E
    \U  Uppercase until \E
    \Q  Disable pattern metacharacters until \E
    \E  End modification

For Titlecase, see "Titlecase".

This one works differently from normal strings:

    \b  An assertion, not backspace, except in a character class

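The sketch below exercises \Q...\E, the case-modifying escapes, and the two meanings of \b. All variable names and inputs are invented for illustration.

```perl
use strict;
use warnings;

my $input = "a.b";

# Unquoted, the "." in $input matches any character.
my $loose = "axb" =~ /^$input$/ ? 1 : 0;

# \Q...\E quotes metacharacters, so "." must be a literal dot.
my $quoted = "axb" =~ /^\Q$input\E$/ ? 1 : 0;

# Case-modifying escapes also work in ordinary double-quoted strings.
my $name  = "jOHN";
my $fixed = "\u\L$name\E";

# \b is a word-boundary assertion in a pattern, but a backspace
# (character 0x08) inside a character class.
my $boundary  = "cat nap" =~ /cat\b/  ? 1 : 0;
my $backspace = "x\x08y"  =~ /x[\b]y/ ? 1 : 0;

print "loose=$loose quoted=$quoted fixed=$fixed\n";
```
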
CHARACTER CLASSES
    [amy]    Match 'a', 'm' or 'y'
    [f-j]    Dash specifies "range"
    [f-j-]   Dash escaped or at start or end means 'dash'
    [^f-j]   Caret indicates "match any character _except_ these"

The following sequences work inside or outside a character class. The
first six are locale aware; all are Unicode aware. See perllocale and
perlunicode for details.

    \d      A digit
    \D      A nondigit
    \w      A word character
    \W      A non-word character
    \s      A whitespace character
    \S      A non-whitespace character
    \h      A horizontal whitespace character
    \H      A non-horizontal whitespace character
    \v      A vertical whitespace character
    \V      A non-vertical whitespace character
    \R      A generic newline           (?>\v|\x0D\x0A)

    \C      Match a byte (with Unicode, '.' matches a character)
    \pP     Match P-named (Unicode) property
    \p{...} Match Unicode property with long name
    \PP     Match non-P
    \P{...} Match lack of Unicode property with long name
    \X      Match extended Unicode combining character sequence

POSIX character classes and their Unicode and Perl equivalents:

    POSIX   Unicode      Perl          Description
    alnum   IsAlnum                    Alphanumeric
    alpha   IsAlpha                    Alphabetic
    ascii   IsASCII                    Any ASCII char
    blank   IsSpace      [ \t]         Horizontal whitespace (GNU extension)
    cntrl   IsCntrl                    Control characters
    digit   IsDigit      \d            Digits
    graph   IsGraph                    Alphanumeric and punctuation
    lower   IsLower                    Lowercase chars (locale and Unicode aware)
    print   IsPrint                    Alphanumeric, punct, and space
    punct   IsPunct                    Punctuation
    space   IsSpace      [\s\ck]       Whitespace
            IsSpacePerl  \s            Perl's whitespace definition
    upper   IsUpper                    Uppercase chars (locale and Unicode aware)
    word    IsWord       \w            Alphanumeric plus _ (Perl extension)
    xdigit  IsXDigit     [0-9A-Fa-f]   Hexadecimal digit

Within a character class:

    POSIX       traditional   Unicode
    [:digit:]   \d            \p{IsDigit}
    [:^digit:]  \D            \P{IsDigit}

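A brief sketch of bracketed classes, the backslash shorthands, POSIX classes and Unicode properties, run against an invented sample string. \h requires Perl 5.10 or later.

```perl
use strict;
use warnings;

my $s = "Room 42-B, floor\t7";

my @digits  = $s =~ /\d+/g;            # runs of digits
my @words   = $s =~ /[[:alpha:]]+/g;   # POSIX class, used inside [...]
my ($range) = $s =~ /([A-C])\b/;       # a range, then a word boundary
my @nondash = "a-b" =~ /[^-]/g;        # negated class: anything but a dash

# \h matches horizontal whitespace such as space and tab.
my $h_count = () = $s =~ /\h/g;

# \p{...} matches by Unicode property.
my @upper = $s =~ /\p{IsUpper}/g;

print "digits=@digits words=@words range=$range h=$h_count upper=@upper\n";
```

The "my $n = () = ..." idiom forces list context and then counts the matches.
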
ANCHORS
All are zero-width assertions.

    ^  Match string start (or line, if /m is used)
    $  Match string end (or line, if /m is used) or before newline
    \b Match word boundary (between \w and \W)
    \B Match except at word boundary (between \w and \w or \W and \W)
    \A Match string start (regardless of /m)
    \Z Match string end (before optional newline)
    \z Match absolute string end
    \G Match where previous m//g left off

    \K Keep the stuff left of the \K, don't include it in $&

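The differences between these anchors are easiest to see side by side. The following sketch uses made-up strings; \K requires Perl 5.10 or later.

```perl
use strict;
use warnings;

my $str = "one\ntwo\n";

# $ and \Z can match before a final newline; \z only at the very end.
my $dollar  = $str =~ /two$/  ? 1 : 0;
my $big_z   = $str =~ /two\Z/ ? 1 : 0;
my $small_z = $str =~ /two\z/ ? 1 : 0;

# ^ honours /m and matches at each line start; \A never does.
my @starts_m = $str =~ /^(\w+)/mg;
my @starts_A = $str =~ /\A(\w+)/mg;

# \G continues exactly where the previous m//g stopped.
my $data = "aab";
$data =~ /a/g;                                   # leaves pos($data) at 1
my $next = $data =~ /\Ga/g ? pos($data) : -1;

# \K drops everything to its left from the matched text,
# so only the digits are replaced.
(my $bumped = "version=1") =~ s/version=\K\d+/2/;

print "$dollar $big_z $small_z | @starts_m | @starts_A | $next | $bumped\n";
```
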
QUANTIFIERS
Quantifiers are greedy by default -- match the longest leftmost.

    Maximal Minimal Possessive Allowed range
    ------- ------- ---------- -------------
    {n,m}   {n,m}?  {n,m}+     Must occur at least n times
                               but no more than m times
    {n,}    {n,}?   {n,}+      Must occur at least n times
    {n}     {n}?    {n}+       Must occur exactly n times
    *       *?      *+         0 or more times (same as {0,})
    +       +?      ++         1 or more times (same as {1,})
    ?       ??      ?+         0 or 1 time (same as {0,1})

The possessive forms (new in Perl 5.10) prevent backtracking: what gets
matched by a pattern with a possessive quantifier will not be
backtracked into, even if that causes the whole match to fail.

There is no quantifier {,n} -- that gets understood as a literal
string.

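To make the three quantifier flavours concrete, here is a small sketch comparing greedy, minimal and possessive matching. The inputs are arbitrary examples.

```perl
use strict;
use warnings;

my $html = "<b>bold</b>";

# Greedy .+ runs to the last ">"; minimal .+? stops at the first one.
my ($greedy)  = $html =~ /(<.+>)/;
my ($minimal) = $html =~ /(<.+?>)/;

# A greedy \d+ gives back one digit so that the final \d can match...
my $backtracks = "12345" =~ /^\d+\d$/ ? 1 : 0;

# ...but the possessive \d++ keeps every digit, so the match fails.
my $possessive = "12345" =~ /^\d++\d$/ ? 1 : 0;

# A counted range: words made of two or three "a" characters.
my @runs = "a aa aaa aaaa" =~ /\b(a{2,3})\b/g;

print "$greedy | $minimal | $backtracks | $possessive | @runs\n";
```
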
EXTENDED CONSTRUCTS
    (?#text)          A comment
    (?:...)           Groups subexpressions without capturing (cluster)
    (?pimsx-imsx:...) Enable/disable option (as per m// modifiers)
    (?=...)           Zero-width positive lookahead assertion
    (?!...)           Zero-width negative lookahead assertion
    (?<=...)          Zero-width positive lookbehind assertion
    (?<!...)          Zero-width negative lookbehind assertion
    (?>...)           Grab what we can, prohibit backtracking
    (?|...)           Branch reset
    (?<name>...)      Named capture
    (?'name'...)      Named capture
    (?P<name>...)     Named capture (python syntax)
    (?{ code })       Embedded code, return value becomes $^R
    (??{ code })      Dynamic regex, return value used as regex
    (?N)              Recurse into subpattern number N
    (?-N), (?+N)      Recurse into Nth previous/next subpattern
    (?R), (?0)        Recurse at the beginning of the whole pattern
    (?&name)          Recurse into a named subpattern
    (?P>name)         Recurse into a named subpattern (python syntax)
    (?(cond)yes|no)
    (?(cond)yes)      Conditional expression, where "cond" can be:
                      (N)       subpattern N has matched something
                      (<name>)  named subpattern has matched something
                      ('name')  named subpattern has matched something
                      (?{code}) code condition
                      (R)       true if recursing
                      (RN)      true if recursing into Nth subpattern
                      (R&name)  true if recursing into named subpattern
                      (DEFINE)  always false, no no-pattern allowed

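A handful of the constructs above, sketched on invented inputs: named captures, lookaround, inline options, branch reset, a conditional and named recursion. Most of these need Perl 5.10 or later, and the group name "paren" is an arbitrary choice.

```perl
use strict;
use warnings;

# Named captures are available through %+.
my %date;
if ("2009-02-12" =~ /^(?<y>\d{4})-(?<m>\d\d)-(?<d>\d\d)$/) {
    %date = %+;
}

# Lookahead and lookbehind assert context without consuming it.
my ($amount) = "cost: 15 USD" =~ /(\d+)(?= USD)/;
my ($after)  = "key=value"    =~ /(?<==)(\w+)/;

# (?i:...) switches on an option for part of the pattern only.
my $partial = "FOObar" =~ /^(?i:foo)bar$/ ? 1 : 0;

# Branch reset: both alternatives capture into $1.
my ($num) = "n=7" =~ /(?|x=(\d)|n=(\d))/;

# Conditional: require ")" only if "(" was captured as group 1.
my $cond = qr/^(\()?\d+(?(1)\))$/;
my @ok = grep { $_ =~ $cond } "(12)", "12", "(12", "12)";

# Named recursion: (?&paren) re-enters the group to match nested parens.
my $balanced = qr/(?<paren>\((?:[^()]++|(?&paren))*\))/;
my $nested = "f(a(b)c) + g" =~ $balanced ? $+{paren} : undef;

print "$date{y} $amount $after $partial $num | @ok | $nested\n";
```
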
VARIABLES
    $_    Default variable for operators to use

    $`    Everything prior to matched string
    $&    Entire matched string
    $'    Everything after matched string

    ${^PREMATCH}   Everything prior to matched string
    ${^MATCH}      Entire matched string
    ${^POSTMATCH}  Everything after matched string

The use of "$`", $& or "$'" will slow down all regex use within your
program. Consult perlvar for "@-" to see equivalent expressions that
won't cause a slowdown. See also Devel::SawAmpersand. Starting with
Perl 5.10, you can also use the equivalent variables "${^PREMATCH}",
"${^MATCH}" and "${^POSTMATCH}", but for them to be defined, you have
to specify the "/p" (preserve) modifier on your regular expression.

    $1, $2 ...  Hold the Nth captured expression
    $+    Last parenthesized pattern match
    $^N   Holds the most recently closed capture
    $^R   Holds the result of the last (?{...}) expr
    @-    Offsets of starts of groups. $-[0] holds start of whole match
    @+    Offsets of ends of groups. $+[0] holds end of whole match
    %+    Named capture buffers
    %-    Named capture buffers, as array refs

Captured groups are numbered according to their opening paren.

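The sketch below collects several of these variables after a single match with /p, and rebuilds the matched text from "@-" and "@+" instead of using $&. The hash keys and the sample string are assumptions for the example.

```perl
use strict;
use warnings;

my $s = "say: key = value!";
my %seen;

if ($s =~ /(\w+)\s*=\s*(?<val>\w+)/p) {
    %seen = (
        pre        => ${^PREMATCH},    # text before the match
        match      => ${^MATCH},       # the matched text
        post       => ${^POSTMATCH},   # text after the match
        first      => $1,              # first capture group
        named      => $+{val},         # named capture
        last_group => $+,              # last parenthesized match
        from       => $-[0],           # offset where the match starts
        to         => $+[0],           # offset just past the match
        # substr with @- and @+ rebuilds the match without $&
        rebuilt    => substr($s, $-[0], $+[0] - $-[0]),
    );
}

print "[$seen{pre}][$seen{match}][$seen{post}] $seen{from}-$seen{to}\n";
```
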
FUNCTIONS
    lc          Lowercase a string
    lcfirst     Lowercase first char of a string
    uc          Uppercase a string
    ucfirst     Titlecase first char of a string

    pos         Return or set current match position
    quotemeta   Quote metacharacters
    reset       Reset ?pattern? status
    study       Analyze string for optimizing matching

    split       Use a regex to split a string into parts

The first four of these are like the escape sequences "\L", "\l", "\U",
and "\u". For Titlecase, see "Titlecase".

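A minimal sketch of the regex-related functions listed above, using illustrative values only.

```perl
use strict;
use warnings;

# lc, uc, lcfirst and ucfirst mirror \L, \U, \l and \u.
my $title = ucfirst(lc("hELLO"));

# quotemeta escapes metacharacters, like \Q...\E inside a pattern.
my $quoted = quotemeta("1+1");
my $found  = "2 is 1+1" =~ /$quoted/ ? 1 : 0;

# pos reports (or sets) where the next m//g on a string will start.
my $text = "aXbXc";
$text =~ /X/g;
my $first_pos = pos($text);
pos($text) = 0;                        # rewind to the beginning
my @hits = $text =~ /X/g;              # finds both X characters again

# split uses a regex as the separator.
my @fields = split /\s*,\s*/, "a , b,c";

print "$title $quoted $found $first_pos ", scalar(@hits), " @fields\n";
```
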
TERMINOLOGY
Titlecase

Unicode concept which most often is equal to uppercase, but for certain
characters like the German "sharp s" there is a difference.

AUTHOR
Iain Truskett. Updated by the Perl 5 Porters.

This document may be distributed under the same terms as Perl itself.

SEE ALSO
·   perlretut for a tutorial on regular expressions.

·   perlrequick for a rapid tutorial.

·   perlre for more details.

·   perlvar for details on the variables.

·   perlop for details on the operators.

·   perlfunc for details on the functions.

·   perlfaq6 for FAQs on regular expressions.

·   perlrebackslash for a reference on backslash sequences.

·   perlrecharclass for a reference on character classes.

·   The re module to alter behaviour and aid debugging.

·   "Debugging regular expressions" in perldebug

·   perluniintro, perlunicode, charnames and perllocale for details on
    regexes and internationalisation.

·   Mastering Regular Expressions by Jeffrey Friedl
    (http://regex.info/) for a thorough grounding and reference on the
    topic.

THANKS
David P.C. Wollmann, Richard Soderberg, Sean M. Burke, Tom
Christiansen, Jim Cromie, and Jeffrey Goff for useful advice.

perl v5.10.1                      2009-02-12                      PERLREREF(1)