PCRESYNTAX(3)              Library Functions Manual              PCRESYNTAX(3)

NAME
       PCRE - Perl-compatible regular expressions

PCRE REGULAR EXPRESSION SYNTAX SUMMARY

       The full syntax and semantics of the regular expressions that are
       supported by PCRE are described in the pcrepattern documentation. This
       document contains just a quick-reference summary of the syntax.

QUOTING

         \x         where x is non-alphanumeric is a literal x
         \Q...\E    treat enclosed characters as literal

CHARACTERS

         \a         alarm, that is, the BEL character (hex 07)
         \cx        "control-x", where x is any character
         \e         escape (hex 1B)
         \f         form feed (hex 0C)
         \n         newline (hex 0A)
         \r         carriage return (hex 0D)
         \t         tab (hex 09)
         \ddd       character with octal code ddd, or backreference
         \xhh       character with hex code hh
         \x{hhh..}  character with hex code hhh..
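
       As an illustration only, the sketch below exercises a few of these
       escapes with Python's built-in re module. That module is not PCRE; the
       example assumes only that it reads \t, \n, \xhh and three-digit octal
       escapes the same way. It does not accept \e, \cx or \x{hhh..}, so those
       are left out. All variable names are hypothetical.

```python
import re

# \x41 is "A" (hex 41), \t is a tab (hex 09), \x42 is "B" (hex 42).
hex_and_tab = re.fullmatch(r"\x41\t\x42", "A\tB")

# \101 and \102 are three-digit octal codes for "A" and "B".
octal = re.fullmatch(r"\101\102", "AB")

# \n matches the newline character (hex 0A).
newline = re.search(r"\n", "line one\nline two")

print(hex_and_tab is not None)  # True
print(octal is not None)        # True
print(newline.start())          # 8
```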

CHARACTER TYPES

         .          any character except newline;
                      in dotall mode, any character whatsoever
         \C         one byte, even in UTF-8 mode (best avoided)
         \d         a decimal digit
         \D         a character that is not a decimal digit
         \h         a horizontal white space character
         \H         a character that is not a horizontal white space character
         \p{xx}     a character with the xx property
         \P{xx}     a character without the xx property
         \R         a newline sequence
         \s         a white space character
         \S         a character that is not a white space character
         \v         a vertical white space character
         \V         a character that is not a vertical white space character
         \w         a "word" character
         \W         a "non-word" character
         \X         an extended Unicode sequence

       In PCRE, \d, \D, \s, \S, \w, and \W recognize only ASCII characters.
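
       A minimal sketch of the most common character types, again using
       Python's re module as a stand-in rather than PCRE itself. The re.ASCII
       flag is used here on the assumption that it approximates the ASCII-only
       behaviour described above. \h, \v, \R, \C and \X are not available in
       that module and are not shown.

```python
import re

# re.ASCII restricts \d, \w and \S to ASCII characters.
digits = re.findall(r"\d+", "a12 b345", re.ASCII)
words = re.findall(r"\w+", "foo_bar, baz!", re.ASCII)
non_space = re.findall(r"\S+", "x  y\tz", re.ASCII)

# Without dotall mode "." stops at a newline; with it, "." matches it too.
dot_default = re.match(r".*", "ab\ncd").group()
dot_all = re.match(r".*", "ab\ncd", re.DOTALL).group()

print(digits)     # ['12', '345']
print(words)      # ['foo_bar', 'baz']
print(non_space)  # ['x', 'y', 'z']
```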

GENERAL CATEGORY PROPERTY CODES FOR \p and \P

         C          Other
         Cc         Control
         Cf         Format
         Cn         Unassigned
         Co         Private use
         Cs         Surrogate

         L          Letter
         Ll         Lower case letter
         Lm         Modifier letter
         Lo         Other letter
         Lt         Title case letter
         Lu         Upper case letter
         L&         Ll, Lu, or Lt

         M          Mark
         Mc         Spacing mark
         Me         Enclosing mark
         Mn         Non-spacing mark

         N          Number
         Nd         Decimal number
         Nl         Letter number
         No         Other number

         P          Punctuation
         Pc         Connector punctuation
         Pd         Dash punctuation
         Pe         Close punctuation
         Pf         Final punctuation
         Pi         Initial punctuation
         Po         Other punctuation
         Ps         Open punctuation

         S          Symbol
         Sc         Currency symbol
         Sk         Modifier symbol
         Sm         Mathematical symbol
         So         Other symbol

         Z          Separator
         Zl         Line separator
         Zp         Paragraph separator
         Zs         Space separator

SCRIPT NAMES FOR \p AND \P

       Arabic, Armenian, Balinese, Bengali, Bopomofo, Braille, Buginese,
       Buhid, Canadian_Aboriginal, Cherokee, Common, Coptic, Cuneiform,
       Cypriot, Cyrillic, Deseret, Devanagari, Ethiopic, Georgian, Glagolitic,
       Gothic, Greek, Gujarati, Gurmukhi, Han, Hangul, Hanunoo, Hebrew,
       Hiragana, Inherited, Kannada, Katakana, Kharoshthi, Khmer, Lao, Latin,
       Limbu, Linear_B, Malayalam, Mongolian, Myanmar, New_Tai_Lue, Nko,
       Ogham, Old_Italic, Old_Persian, Oriya, Osmanya, Phags_Pa, Phoenician,
       Runic, Shavian, Sinhala, Syloti_Nagri, Syriac, Tagalog, Tagbanwa,
       Tai_Le, Tamil, Telugu, Thaana, Thai, Tibetan, Tifinagh, Ugaritic, Yi.

CHARACTER CLASSES

         [...]       positive character class
         [^...]      negative character class
         [x-y]       range (can be used for hex characters)
         [[:xxx:]]   positive POSIX named set
         [[:^xxx:]]  negative POSIX named set

         alnum       alphanumeric
         alpha       alphabetic
         ascii       0-127
         blank       space or tab
         cntrl       control character
         digit       decimal digit
         graph       printing, excluding space
         lower       lower case letter
         print       printing, including space
         punct       printing, excluding alphanumeric
         space       white space
         upper       upper case letter
         word        same as \w
         xdigit      hexadecimal digit

       In PCRE, POSIX character set names recognize only ASCII characters. You
       can use \Q...\E inside a character class.
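
       The positive, negative and range forms can be sketched as follows.
       This uses Python's re module, which is not PCRE and does not accept the
       [[:xxx:]] POSIX names, so the xdigit set is spelled out as explicit
       ranges. The sample strings and variable names are made up for the
       example.

```python
import re

# Positive class built from ranges: the same characters as xdigit.
hex_runs = re.findall(r"[0-9A-Fa-f]+", "0x1F, zz, beef")

# Negative class: runs of anything except a space or a comma.
fields = re.findall(r"[^ ,]+", "a, b,c")

# A range whose end points are given as hex escapes (\x30-\x39 is 0-9).
digit_runs = re.findall(r"[\x30-\x39]+", "ab12cd345")

print(hex_runs)    # ['0', '1F', 'beef']
print(fields)      # ['a', 'b', 'c']
print(digit_runs)  # ['12', '345']
```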

QUANTIFIERS

         ?           0 or 1, greedy
         ?+          0 or 1, possessive
         ??          0 or 1, lazy
         *           0 or more, greedy
         *+          0 or more, possessive
         *?          0 or more, lazy
         +           1 or more, greedy
         ++          1 or more, possessive
         +?          1 or more, lazy
         {n}         exactly n
         {n,m}       at least n, no more than m, greedy
         {n,m}+      at least n, no more than m, possessive
         {n,m}?      at least n, no more than m, lazy
         {n,}        n or more, greedy
         {n,}+       n or more, possessive
         {n,}?       n or more, lazy
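
       The difference between greedy and lazy quantifiers is easiest to see
       on a concrete subject. The sketch below uses Python's re module as a
       stand-in for PCRE. Possessive quantifiers are omitted because older
       versions of that module do not accept them. The subject strings are
       invented for the example.

```python
import re

subject = "<a><b>"

# Greedy: .+ takes as much as it can, then gives back only what it must.
greedy = re.match(r"<.+>", subject).group()

# Lazy: .+? takes as little as it can, growing only when forced to.
lazy = re.match(r"<.+?>", subject).group()

# {n} is exact; {n,m} is bounded and greedy unless followed by "?".
exact = re.fullmatch(r"\d{4}", "2008") is not None
bounded = re.match(r"a{2,3}", "aaaa").group()
bounded_lazy = re.match(r"a{2,3}?", "aaaa").group()

print(greedy)        # <a><b>
print(lazy)          # <a>
print(bounded)       # aaa
print(bounded_lazy)  # aa
```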

ANCHORS AND SIMPLE ASSERTIONS

         \b          word boundary
         \B          not a word boundary
         ^           start of subject
                      also after internal newline in multiline mode
         \A          start of subject
         $           end of subject
                      also before newline at end of subject
                      also before internal newline in multiline mode
         \Z          end of subject
                      also before newline at end of subject
         \z          end of subject
         \G          first matching position in subject
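
       A short sketch of how ^, $, \A and \b behave with and without
       multiline mode, using Python's re module rather than PCRE. Python's \Z
       does not have the PCRE meaning listed above, and \z and \G are not
       available in that module, so they are not shown. The subject strings
       are hypothetical.

```python
import re

text = "one\ntwo\nthree"

# Without multiline mode, ^ and $ only anchor at the ends of the subject.
single = re.findall(r"^\w+$", text)

# In multiline mode they also match around internal newlines.
multi = re.findall(r"^\w+$", text, re.MULTILINE)

# \A matches only at the very start, even in multiline mode.
start_only = re.findall(r"\A\w+", text, re.MULTILINE)

# $ also matches just before a newline at the end of the subject.
before_final_newline = re.search(r"end$", "the end\n") is not None

# \b restricts the match to whole words.
whole_words = re.findall(r"\bcat\b", "cat concat cat.")

print(single)       # []
print(multi)        # ['one', 'two', 'three']
print(start_only)   # ['one']
print(whole_words)  # ['cat', 'cat']
```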

MATCH POINT RESET

         \K          reset start of match

ALTERNATION

         expr|expr|expr...

CAPTURING

         (...)          capturing group
         (?<name>...)   named capturing group (Perl)
         (?'name'...)   named capturing group (Perl)
         (?P<name>...)  named capturing group (Python)
         (?:...)        non-capturing group
         (?|...)        non-capturing group; reset group numbers for
                         capturing groups in each alternative
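
       As a rough illustration of numbered, named and non-capturing groups,
       the sketch below uses Python's re module, which accepts only the
       (?P<name>...) spelling from the table above and has no (?|...) form.
       The group names year and month are invented for the example.

```python
import re

# Two named groups; each is also numbered from left to right.
m = re.match(r"(?P<year>\d{4})-(?P<month>\d{2})", "2008-04")
named = m.groupdict()
numbered = m.groups()

# (?:...) groups without capturing, so (c) is group 1 here.
n = re.match(r"(?:ab)+(c)", "ababc")
only_group = n.groups()

print(named)       # {'year': '2008', 'month': '04'}
print(numbered)    # ('2008', '04')
print(only_group)  # ('c',)
```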

ATOMIC GROUPS

         (?>...)        atomic, non-capturing group

COMMENT

         (?#....)       comment (not nestable)

OPTION SETTING

         (?i)           caseless
         (?J)           allow duplicate names
         (?m)           multiline
         (?s)           single line (dotall)
         (?U)           default ungreedy (lazy)
         (?x)           extended (ignore white space)
         (?-...)        unset option(s)
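
       The inline options that Python's re module shares with PCRE can be
       sketched as below. This is an approximation, not PCRE: that module
       requires these options at the start of the pattern and has no (?J) or
       (?U). The subject strings are made up for the example.

```python
import re

# (?i) makes the match caseless.
caseless = re.search(r"(?i)hello", "Say HeLLo").group()

# (?s) is dotall mode: "." also matches a newline.
dotall = re.match(r"(?s)a.b", "a\nb") is not None

# (?m) is multiline mode: ^ also matches after an internal newline.
multiline = re.findall(r"(?m)^\w+", "x1\ny2")

# (?x) is extended mode: white space in the pattern is ignored.
extended = re.match(r"(?x) \d+  \s  [a-z]+ ", "42 apples").group()

print(caseless)   # HeLLo
print(multiline)  # ['x1', 'y2']
print(extended)   # 42 apples
```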

LOOKAHEAD AND LOOKBEHIND ASSERTIONS

         (?=...)        positive look ahead
         (?!...)        negative look ahead
         (?<=...)       positive look behind
         (?<!...)       negative look behind

       Each top-level branch of a look behind must be of a fixed length.
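
       The four assertions above can be sketched on small subjects. Python's
       re module is used as a stand-in for PCRE; it accepts the same four
       forms and also requires fixed-length look behinds. The subject strings
       are hypothetical.

```python
import re

prices = "10 USD, 20 EUR"
order = "cost $42, qty 7"

# Positive look ahead: digits that are followed by " USD".
ahead = re.findall(r"\d+(?= USD)", prices)

# Negative look ahead: whole numbers that are not followed by " USD".
not_ahead = re.findall(r"\b\d+\b(?! USD)", prices)

# Positive look behind: digits that are preceded by "$".
behind = re.findall(r"(?<=\$)\d+", order)

# Negative look behind: whole numbers that are not preceded by "$".
not_behind = re.findall(r"(?<!\$)\b\d+", order)

print(ahead)       # ['10']
print(not_ahead)   # ['20']
print(behind)      # ['42']
print(not_behind)  # ['7']
```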

BACKREFERENCES

         \n             reference by number (can be ambiguous)
         \gn            reference by number
         \g{n}          reference by number
         \g{-n}         relative reference by number
         \k<name>       reference by name (Perl)
         \k'name'       reference by name (Perl)
         \g{name}       reference by name (Perl)
         \k{name}       reference by name (.NET)
         (?P=name)      reference by name (Python)
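
       A minimal sketch of a numbered and a named backreference, using
       Python's re module rather than PCRE. Of the forms listed above, that
       module accepts only \n and (?P=name). The group name q and the subject
       strings are invented for the example.

```python
import re

# \1 must match the same text that group 1 captured: a doubled word.
doubled = re.search(r"\b(\w+) \1\b", "this is is a test").group()

# (?P=q) must match the same quote character that opened the string.
quoted = re.search(r"(?P<q>['\"]).*?(?P=q)", "say 'hi' now").group()

print(doubled)  # is is
print(quoted)   # 'hi'
```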

SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)

         (?R)            recurse whole pattern
         (?n)            call subpattern by absolute number
         (?+n)           call subpattern by relative number
         (?-n)           call subpattern by relative number
         (?&name)        call subpattern by name (Perl)
         (?P>name)       call subpattern by name (Python)
         \g<name>        call subpattern by name (Oniguruma)
         \g'name'        call subpattern by name (Oniguruma)
         \g<n>           call subpattern by absolute number (Oniguruma)
         \g'n'           call subpattern by absolute number (Oniguruma)
         \g<+n>          call subpattern by relative number (PCRE extension)
         \g'+n'          call subpattern by relative number (PCRE extension)
         \g<-n>          call subpattern by relative number (PCRE extension)
         \g'-n'          call subpattern by relative number (PCRE extension)

CONDITIONAL PATTERNS

         (?(condition)yes-pattern)
         (?(condition)yes-pattern|no-pattern)

         (?(n)...        absolute reference condition
         (?(+n)...       relative reference condition
         (?(-n)...       relative reference condition
         (?(<name>)...   named reference condition (Perl)
         (?('name')...   named reference condition (Perl)
         (?(name)...     named reference condition (PCRE)
         (?(R)...        overall recursion condition
         (?(Rn)...       specific group recursion condition
         (?(R&name)...   specific recursion condition
         (?(DEFINE)...   define subpattern for reference
         (?(assert)...   assertion condition
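
       The absolute reference condition can be sketched as follows: the yes
       branch is tried only if the numbered group took part in the match.
       Python's re module is used as a stand-in for PCRE; it accepts the (n)
       and (name) conditions but not the recursion, DEFINE or assertion
       conditions. The patterns and subjects are made up for the example.

```python
import re

# If group 1 matched "<", a closing ">" is required; otherwise nothing.
tag = re.compile(r"^(<)?\w+(?(1)>)$")
tag_results = {s: tag.match(s) is not None
               for s in ("tag", "<tag>", "<tag", "tag>")}

# With a no-pattern: ")" after an opening "(", otherwise ";".
num = re.compile(r"^(\()?\d+(?(1)\)|;)$")
num_results = {s: num.match(s) is not None
               for s in ("(12)", "12;", "(12;", "12)")}

print(tag_results)
print(num_results)
```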

BACKTRACKING CONTROL

       The following act as soon as they are reached:

         (*ACCEPT)       force successful match
         (*FAIL)         force backtrack; synonym (*F)

       The following act only when a subsequent match failure causes a
       backtrack to reach them. They all force a match failure, but they
       differ in what happens afterwards. Those that advance the
       start-of-match point do so only if the pattern is not anchored.

         (*COMMIT)       overall failure, no advance of starting point
         (*PRUNE)        advance to next starting character
         (*SKIP)         advance start to current matching position
         (*THEN)         local failure, backtrack to next alternation

NEWLINE CONVENTIONS

       These are recognized only at the very start of the pattern or after a
       (*BSR_...) option.

         (*CR)
         (*LF)
         (*CRLF)
         (*ANYCRLF)
         (*ANY)

WHAT \R MATCHES

       These are recognized only at the very start of the pattern or after a
       (*...) option that sets the newline convention.

         (*BSR_ANYCRLF)
         (*BSR_UNICODE)

CALLOUTS

         (?C)      callout
         (?Cn)     callout with data n

SEE ALSO

       pcrepattern(3), pcreapi(3), pcrecallout(3), pcrematching(3), pcre(3).

AUTHOR

       Philip Hazel
       University Computing Service
       Cambridge CB2 3QH, England.

REVISION

       Last updated: 09 April 2008
       Copyright (c) 1997-2008 University of Cambridge.