PERLREREF(1)           Perl Programmers Reference Guide           PERLREREF(1)

NAME
perlreref - Perl Regular Expressions Reference

DESCRIPTION
This is a quick reference to Perl's regular expressions. For full
information see perlre and perlop, as well as the "SEE ALSO" section in
this document.

OPERATORS

=~ determines to which variable the regex is applied.
In its absence, $_ is used.

    $var =~ /foo/;

!~ determines to which variable the regex is applied,
and negates the result of the match; it returns
false if the match succeeds, and true if it fails.

    $var !~ /foo/;

m/pattern/igmsoxc searches a string for a pattern match,
applying the given options.

    i  case-Insensitive
    g  Global - all occurrences
    m  Multiline mode - ^ and $ match internal lines
    s  match as a Single line - . matches \n
    o  compile pattern Once
    x  eXtended legibility - free whitespace and comments
    c  don't reset pos on failed matches when using /g

If 'pattern' is an empty string, the last successfully matched
regex is used. Delimiters other than '/' may be used for both this
operator and the following ones.
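
As an illustration of the m// modifiers listed above, here is a minimal
sketch. The sample text and variable names are invented for this example.

```perl
use strict;
use warnings;

my $text = "first line\nSecond LINE\n";

# /i: case-insensitive; /g in list context returns all occurrences.
my @lines = $text =~ /line/gi;                    # ("line", "LINE")

# /m: ^ and $ also match at internal line boundaries.
my @starts = $text =~ /^(\w+)/mg;                 # ("first", "Second")

# /s: . also matches a newline, so a match can span lines.
my $spans = $text =~ /first.*Second/s ? 1 : 0;    # 1
my $no_s  = $text =~ /first.*Second/  ? 1 : 0;    # 0

# /x: whitespace and comments inside the pattern are ignored.
my ($word) = $text =~ /
    (\w+)      # capture a word
    \s+ line   # followed by whitespace and the text line
/x;                                               # "first"

print "@lines | @starts | $spans $no_s | $word\n";
```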

qr/pattern/imsox lets you store a regex in a variable,
or pass one around. Modifiers are as for m//, and are stored
within the regex.
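
A short sketch of how a qr// object might be stored, passed to another
function, and interpolated into a larger pattern. The strings and variable
names are assumptions made for illustration.

```perl
use strict;
use warnings;

# Store a compiled, case-insensitive pattern in a variable.
my $word = qr/\bperl\b/i;

# Pass it around like any other scalar, here to grep.
my @hits = grep { $_ =~ $word } ("Perl rocks", "properly", "use perl");

# Interpolate it into a larger pattern; the stored /i travels with it.
my $versioned = "PERL 5" =~ /^$word \d+$/ ? 1 : 0;    # 1

print scalar(@hits), " $versioned\n";
```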

s/pattern/replacement/igmsoxe substitutes matches of
'pattern' with 'replacement'. Modifiers as for m//
with one addition:

    e  Evaluate replacement as an expression

'e' may be specified multiple times. 'replacement' is interpreted
as a double quoted string unless a single-quote (') is the delimiter.
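
The following sketch shows the three ways a replacement can be interpreted,
as described above. The data and variable names are hypothetical.

```perl
use strict;
use warnings;

# The replacement is interpolated like a double-quoted string.
(my $swapped = "key=value") =~ s/(\w+)=(\w+)/$2=$1/;        # "value=key"

# /e evaluates the replacement as an expression; /g repeats per match.
(my $doubled = "3 apples, 4 pears") =~ s/(\d+)/$1 * 2/ge;   # "6 apples, 8 pears"

# With ' as the delimiter, the replacement is taken literally.
(my $literal = "total: price") =~ s'price'$cost';           # 'total: $cost'

# s/// returns the number of substitutions made.
my $text  = "a-b-c";
my $count = ($text =~ s/-/+/g);                             # 2

print "$swapped | $doubled | $literal | $text ($count)\n";
```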

?pattern? is like m/pattern/ but matches only once. No alternate
delimiters can be used. Must be reset with reset (see perlfunc).

SYNTAX

    \           Escapes the character immediately following it
    .           Matches any single character except a newline (unless /s is used)
    ^           Matches at the beginning of the string (or line, if /m is used)
    $           Matches at the end of the string (or line, if /m is used)
    *           Matches the preceding element 0 or more times
    +           Matches the preceding element 1 or more times
    ?           Matches the preceding element 0 or 1 times
    {...}       Specifies a range of occurrences for the element preceding it
    [...]       Matches any one of the characters contained within the brackets
    (...)       Groups subexpressions for capturing to $1, $2...
    (?:...)     Groups subexpressions without capturing (cluster)
    |           Matches either the subexpression preceding or following it
    \1, \2 ...  The text from the Nth group
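
A minimal sketch of capturing, clustering, alternation, and
backreferences working together. The input strings are made up for this
example.

```perl
use strict;
use warnings;

# (...) captures to $1, $2, ...; in list context the captures are returned.
my ($year, $mon, $day) = "2006-01-07" =~ /^(\d{4})-(\d{2})-(\d{2})$/;

# | binds loosely, so ^cat|dog$ means "^cat" or "dog$".
my $loose = "hotdog" =~ /^cat|dog$/     ? 1 : 0;    # 1

# (?:...) groups the alternatives without capturing.
my $tight = "hotdog" =~ /^(?:cat|dog)$/ ? 1 : 0;    # 0

# \1 matches the same text that the first group matched.
my ($dup) = "this is is a test" =~ /\b(\w+) \1\b/;  # "is"

print "$year $mon $day | $loose $tight | $dup\n";
```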

ESCAPE SEQUENCES

These work as in normal strings.

    \a        Alarm (beep)
    \e        Escape
    \f        Formfeed
    \n        Newline
    \r        Carriage return
    \t        Tab
    \037      Any octal ASCII value
    \x7f      Any hexadecimal ASCII value
    \x{263a}  A wide hexadecimal value
    \cx       Control-x
    \N{name}  A named character

    \l  Lowercase next character
    \u  Titlecase next character
    \L  Lowercase until \E
    \U  Uppercase until \E
    \Q  Disable pattern metacharacters until \E
    \E  End case modification

For Titlecase, see "Titlecase".

This one works differently from normal strings:

    \b  An assertion, not backspace, except in a character class
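
To make the case-modification and quoting escapes concrete, here is a
small sketch. All strings and variable names are assumptions chosen for
illustration.

```perl
use strict;
use warnings;

my $name = "iAIN";

# \L lowercases up to \E; \u titlecases the first character of the result.
my $nice = "\u\L$name\E!";                            # "Iain!"

# \Q ... \E disables metacharacters in interpolated text.
my $literal  = "a.b*c";
my $quoted   = "xa.b*cx" =~ /\Q$literal\E/ ? 1 : 0;   # 1: literal text found
my $unquoted = "aXbbbc"  =~ /$literal/     ? 1 : 0;   # 1: . and * are special
my $exact    = "aXbbbc"  =~ /\Q$literal\E/ ? 1 : 0;   # 0

# Character escapes work in patterns as they do in strings.
my $hex = "A" =~ /\x41/ ? 1 : 0;                      # 1

# \b is a backspace inside a character class, an assertion outside one.
my $backspace = "\x08" =~ /[\b]/ ? 1 : 0;             # 1
my $boundary  = "a b"  =~ /a\b/  ? 1 : 0;             # 1

print "$nice $quoted $unquoted $exact $hex $backspace $boundary\n";
```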

CHARACTER CLASSES

    [amy]    Match 'a', 'm' or 'y'
    [f-j]    Dash specifies "range"
    [f-j-]   Dash escaped or at start or end means 'dash'
    [^f-j]   Caret indicates "match any character _except_ these"

The following sequences work within or without a character class. The
first six are locale aware, all are Unicode aware. The default character
class equivalents are given. See perllocale and perlunicode for
details.

    \d  A digit                     [0-9]
    \D  A nondigit                  [^0-9]
    \w  A word character            [a-zA-Z0-9_]
    \W  A non-word character        [^a-zA-Z0-9_]
    \s  A whitespace character      [ \t\n\r\f]
    \S  A non-whitespace character  [^ \t\n\r\f]

    \C       Match a byte (with Unicode, '.' matches a character)
    \pP      Match P-named (Unicode) property
    \p{...}  Match Unicode property with long name
    \PP      Match non-P
    \P{...}  Match lack of Unicode property with long name
    \X       Match extended unicode sequence

POSIX character classes and their Unicode and Perl equivalents:

    alnum   IsAlnum                   Alphanumeric
    alpha   IsAlpha                   Alphabetic
    ascii   IsASCII                   Any ASCII char
    blank   IsSpace      [ \t]        Horizontal whitespace (GNU extension)
    cntrl   IsCntrl                   Control characters
    digit   IsDigit      \d           Digits
    graph   IsGraph                   Alphanumeric and punctuation
    lower   IsLower                   Lowercase chars (locale and Unicode aware)
    print   IsPrint                   Alphanumeric, punct, and space
    punct   IsPunct                   Punctuation
    space   IsSpace      [\s\ck]      Whitespace
            IsSpacePerl  \s           Perl's whitespace definition
    upper   IsUpper                   Uppercase chars (locale and Unicode aware)
    word    IsWord       \w           Alphanumeric plus _ (Perl extension)
    xdigit  IsXDigit     [0-9A-Fa-f]  Hexadecimal digit

Within a character class:

    POSIX       traditional  Unicode
    [:digit:]   \d           \p{IsDigit}
    [:^digit:]  \D           \P{IsDigit}
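
The three notations in the table above can be compared side by side. The
following is a sketch on an invented ASCII sample string, where all three
digit classes agree.

```perl
use strict;
use warnings;

my $s = "Room 42, Floor 7b";

# Traditional, POSIX, and Unicode spellings of "digit".
my @trad  = $s =~ /\d+/g;             # ("42", "7")
my @posix = $s =~ /[[:digit:]]+/g;    # ("42", "7")
my @uni   = $s =~ /\p{IsDigit}+/g;    # ("42", "7")

# POSIX classes only work inside a bracketed class and can be combined
# there; a leading ^ negates the whole class.
my @rest = $s =~ /[^[:digit:][:space:],]+/g;    # ("Room", "Floor", "b")

# A dash at the end of a class is a literal dash.
my $dash = "g-h" =~ /^[f-j-]+$/ ? 1 : 0;        # 1

print "@trad | @posix | @uni | @rest | $dash\n";
```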

ANCHORS

All are zero-width assertions.

    ^   Match string start (or line, if /m is used)
    $   Match string end (or line, if /m is used) or before newline
    \b  Match word boundary (between \w and \W)
    \B  Match except at word boundary (between \w and \w or \W and \W)
    \A  Match string start (regardless of /m)
    \Z  Match string end (before optional newline)
    \z  Match absolute string end
    \G  Match where previous m//g left off
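
The differences between the end-of-string anchors, and the use of \G with
/gc, are easiest to see in a small sketch. The tiny tokenizer below is a
hypothetical illustration, not a recommended lexer design.

```perl
use strict;
use warnings;

my $line = "foo\n";
my $dollar  = $line =~ /foo$/  ? 1 : 0;   # 1: $ matches before a final newline
my $big_z   = $line =~ /foo\Z/ ? 1 : 0;   # 1: so does \Z
my $small_z = $line =~ /foo\z/ ? 1 : 0;   # 0: \z is the absolute end

my $text = "one\ntwo";
my $caret   = $text =~ /^two/   ? 1 : 0;  # 0: ^ is string start by default
my $caret_m = $text =~ /^two/m  ? 1 : 0;  # 1: /m lets ^ match after a newline
my $big_a   = $text =~ /\Atwo/m ? 1 : 0;  # 0: \A ignores /m

# \G continues from pos(); /c keeps pos() when one alternative fails.
my $input = "abc123def";
my @tokens;
pos($input) = 0;
while (pos($input) < length($input)) {
    if    ($input =~ /\G([a-z]+)/gc) { push @tokens, "WORD:$1" }
    elsif ($input =~ /\G(\d+)/gc)    { push @tokens, "NUM:$1"  }
    else                             { last }
}

print "$dollar $big_z $small_z | $caret $caret_m $big_a | @tokens\n";
```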

QUANTIFIERS

Quantifiers are greedy by default -- match the longest leftmost.

    Maximal  Minimal  Allowed range
    -------  -------  -------------
    {n,m}    {n,m}?   Must occur at least n times but no more than m times
    {n,}     {n,}?    Must occur at least n times
    {n}      {n}?     Must occur exactly n times
    *        *?       0 or more times (same as {0,})
    +        +?       1 or more times (same as {1,})
    ?        ??       0 or 1 time (same as {0,1})

There is no quantifier {,n} -- that gets understood as a literal
string.
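
A brief sketch contrasting maximal (greedy) and minimal matching; the
sample strings are invented for this example.

```perl
use strict;
use warnings;

my $html = "<b>bold</b> and <i>italic</i>";

# Greedy: .+ takes as much as it can and still lets > match.
my ($greedy)  = $html =~ /<(.+)>/;     # "b>bold</b> and <i>italic</i"

# Minimal: .+? stops at the first place the rest of the pattern matches.
my ($minimal) = $html =~ /<(.+?)>/;    # "b"

# Counted ranges behave the same way.
my ($upto)  = "aaaa" =~ /(a{1,3})/;    # "aaa"
my ($least) = "aaaa" =~ /(a{1,3}?)/;   # "a"
my @pairs   = "aaaa" =~ /a{2}/g;       # ("aa", "aa")

print "$greedy | $minimal | $upto $least | @pairs\n";
```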

EXTENDED CONSTRUCTS

    (?#text)          A comment
    (?imxs-imsx:...)  Enable/disable option (as per m// modifiers)
    (?=...)           Zero-width positive lookahead assertion
    (?!...)           Zero-width negative lookahead assertion
    (?<=...)          Zero-width positive lookbehind assertion
    (?<!...)          Zero-width negative lookbehind assertion
    (?>...)           Grab what we can, prohibit backtracking
    (?{ code })       Embedded code, return value becomes $^R
    (??{ code })      Dynamic regex, return value used as regex
    (?(cond)yes|no)   cond being integer corresponding to capturing parens
    (?(cond)yes)      or a lookaround/eval zero-width assertion
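
Several of these constructs are shown in the sketch below. The input
strings and the $paren pattern are assumptions made for illustration; the
embedded-code constructs are left out.

```perl
use strict;
use warnings;

my $s = 'cost $5, 10 units, $20';

# Lookarounds test the surrounding text without consuming it.
my @dollars = $s =~ /(?<=\$)(\d+)/g;      # ("5", "20")
my @units   = $s =~ /(\d+)(?= units)/g;   # ("10")
my @plain   = $s =~ /(?<!\$)\b(\d+)/g;    # ("10")

# (?i:...) enables /i for part of a pattern only.
my $part_i = "FOObar" =~ /(?i:foo)bar/ ? 1 : 0;    # 1
my $not_i  = "FOOBAR" =~ /(?i:foo)bar/ ? 1 : 0;    # 0

# (?>...) does not give back what it grabbed.
my $normal = "aaab" =~ /a+ab/     ? 1 : 0;         # 1
my $atomic = "aaab" =~ /(?>a+)ab/ ? 1 : 0;         # 0

# (?(1)...) requires a closing paren only if group 1 matched an opening one.
my $paren = qr/^(\()?\d+(?(1)\))$/;
my @ok = map { $_ =~ $paren ? 1 : 0 } ("(42)", "42", "(42");   # (1, 1, 0)

print "@dollars | @units | @plain | $part_i $not_i | $normal $atomic | @ok\n";
```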

VARIABLES

    $_  Default variable for operators to use
    $*  Enable multiline matching (deprecated; not in 5.9.0 or later)

    $&  Entire matched string
    $`  Everything prior to matched string
    $'  Everything after matched string

The use of those last three will slow down all regex use within your
program. Consult perlvar for @LAST_MATCH_START to see equivalent
expressions that won't cause a slowdown. See also Devel::SawAmpersand.

    $1, $2 ...  hold the Xth captured expr
    $+          Last parenthesized pattern match
    $^N         Holds the most recently closed capture
    $^R         Holds the result of the last (?{...}) expr
    @-          Offsets of starts of groups. $-[0] holds start of whole match
    @+          Offsets of ends of groups. $+[0] holds end of whole match

Captured groups are numbered according to their opening paren.
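
As a sketch of the offset arrays, the substr expressions below rebuild
the match and the text on either side of it from @- and @+. The variable
names $pre, $match, and $post are hypothetical.

```perl
use strict;
use warnings;

my $s = "hello world";
my ($pre, $match, $post, $last, @groups);

if ($s =~ /(w)(or)/) {
    @groups = ($1, $2);                             # ("w", "or")
    $last   = $+;                                   # "or"
    $pre    = substr($s, 0, $-[0]);                 # "hello "
    $match  = substr($s, $-[0], $+[0] - $-[0]);     # "wor"
    $post   = substr($s, $+[0]);                    # "ld"
}

# Groups are numbered by the position of their opening paren.
my @nested = "abc" =~ /((a)(b))c/;                  # ("ab", "a", "b")

print "[$pre][$match][$post] @groups $last | @nested\n";
```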

FUNCTIONS

    lc         Lowercase a string
    lcfirst    Lowercase first char of a string
    uc         Uppercase a string
    ucfirst    Titlecase first char of a string

    pos        Return or set current match position
    quotemeta  Quote metacharacters
    reset      Reset ?pattern? status
    study      Analyze string for optimizing matching

    split      Use regex to split a string into parts

The first four of these are like the escape sequences "\L", "\l", "\U",
and "\u". For Titlecase, see "Titlecase".
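
A minimal sketch exercising the most commonly used of these functions;
the input values are made up for this example.

```perl
use strict;
use warnings;

# Case functions, the counterparts of \U, \u, \L and \l.
my @case = (uc("perl"), ucfirst("perl"), lc("PERL"), lcfirst("PERL"));
# ("PERL", "Perl", "perl", "pERL")

# quotemeta backslashes metacharacters, like \Q inside a pattern.
my $quoted = quotemeta("a.b*c");                # a\.b\*c

# split uses a regex as the separator.
my @parts = split /\s*,\s*/, "a , b,c";         # ("a", "b", "c")

# pos reports where the last m//g on a string left off, and can be set.
my $str = "aXbXc";
my $first;
if ($str =~ /X/g) { $first = pos($str) }        # 2
pos($str) = 0;                                  # start over
my $again;
if ($str =~ /X/g) { $again = pos($str) }        # 2 again

print "@case | $quoted | @parts | $first $again\n";
```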

TERMINOLOGY

Titlecase

A Unicode concept that is most often equal to uppercase, but differs
for certain characters such as the German "sharp s".

AUTHOR
Iain Truskett.

This document may be distributed under the same terms as Perl itself.

SEE ALSO
·   perlretut for a tutorial on regular expressions.

·   perlrequick for a rapid tutorial.

·   perlre for more details.

·   perlvar for details on the variables.

·   perlop for details on the operators.

·   perlfunc for details on the functions.

·   perlfaq6 for FAQs on regular expressions.

·   The re module to alter behaviour and aid debugging.

·   "Debugging regular expressions" in perldebug

·   perluniintro, perlunicode, charnames and locale for details on
    regexes and internationalisation.

·   Mastering Regular Expressions by Jeffrey Friedl
    (http://regex.info/) for a thorough grounding and reference on the
    topic.

THANKS
David P.C. Wollmann, Richard Soderberg, Sean M. Burke, Tom
Christiansen, Jim Cromie, and Jeffrey Goff for useful advice.

perl v5.8.8                       2006-01-07                      PERLREREF(1)