PCRESYNTAX(3)              Library Functions Manual              PCRESYNTAX(3)

NAME
       PCRE - Perl-compatible regular expressions

PCRE REGULAR EXPRESSION SYNTAX SUMMARY

       The full syntax and semantics of the regular expressions that are
       supported by PCRE are described in the pcrepattern documentation. This
       document contains a quick-reference summary of the syntax.

QUOTING

         \x         where x is non-alphanumeric is a literal x
         \Q...\E    treat enclosed characters as literal
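
       As a rough illustration of backslash quoting, the sketch below uses
       Python's built-in re module as a stand-in. That module is a separate
       engine, not PCRE; it accepts the same \x quoting for non-alphanumeric
       characters, but it has no \Q...\E, so re.escape() is used here only
       as an approximate substitute for it.

```python
import re

# "\." is a literal dot; an unescaped "." matches any character but newline.
literal_dot = re.compile(r"a\.b")
any_char = re.compile(r"a.b")

dot_matches_dot = literal_dot.fullmatch("a.b") is not None
dot_matches_x = literal_dot.fullmatch("axb") is not None
any_matches_x = any_char.fullmatch("axb") is not None

# re.escape() is a Python helper (an assumption of this sketch, not a PCRE
# feature); it quotes every metacharacter, much as \Q...\E would in PCRE.
quoted = re.compile(re.escape("1+1=2?"))
quoted_matches = quoted.fullmatch("1+1=2?") is not None
```

       In PCRE itself the last pattern could be written as \Q1+1=2?\E.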

CHARACTERS

         \a         alarm, that is, the BEL character (hex 07)
         \cx        "control-x", where x is any ASCII character
         \e         escape (hex 1B)
         \f         form feed (hex 0C)
         \n         newline (hex 0A)
         \r         carriage return (hex 0D)
         \t         tab (hex 09)
         \ddd       character with octal code ddd, or backreference
         \xhh       character with hex code hh
         \x{hhh..}  character with hex code hhh..

CHARACTER TYPES

         .          any character except newline;
                      in dotall mode, any character whatsoever
         \C         one data unit, even in UTF mode (best avoided)
         \d         a decimal digit
         \D         a character that is not a decimal digit
         \h         a horizontal white space character
         \H         a character that is not a horizontal white space character
         \N         a character that is not a newline
         \p{xx}     a character with the xx property
         \P{xx}     a character without the xx property
         \R         a newline sequence
         \s         a white space character
         \S         a character that is not a white space character
         \v         a vertical white space character
         \V         a character that is not a vertical white space character
         \w         a "word" character
         \W         a "non-word" character
         \X         a Unicode extended grapheme cluster

       In PCRE, by default, \d, \D, \s, \S, \w, and \W recognize only ASCII
       characters, even in a UTF mode. However, this can be changed by setting
       the PCRE_UCP option.
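
       The difference between ASCII-only and Unicode-aware \d can be
       sketched with Python's built-in re module, used here as an analogy
       rather than as PCRE itself. The assumption is that re.ASCII behaves
       like PCRE's default, while Python's default for str patterns behaves
       roughly like PCRE with PCRE_UCP set.

```python
import re

# ARABIC-INDIC DIGIT THREE: a decimal digit (category Nd) outside ASCII.
arabic_indic_three = "\u0663"

# Analogous to PCRE's default: \d recognizes only the ASCII digits 0-9.
ascii_only = re.compile(r"\d", re.ASCII)

# Analogous to PCRE_UCP: \d recognizes any Unicode decimal digit.
unicode_aware = re.compile(r"\d")

ascii_hit = ascii_only.fullmatch(arabic_indic_three) is not None
unicode_hit = unicode_aware.fullmatch(arabic_indic_three) is not None
ascii_digit_hit = ascii_only.fullmatch("3") is not None
```

       Note that the defaults are reversed: Python needs a flag to get the
       ASCII-only behaviour, whereas PCRE needs PCRE_UCP to leave it.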

GENERAL CATEGORY PROPERTIES FOR \p and \P

         C          Other
         Cc         Control
         Cf         Format
         Cn         Unassigned
         Co         Private use
         Cs         Surrogate

         L          Letter
         Ll         Lower case letter
         Lm         Modifier letter
         Lo         Other letter
         Lt         Title case letter
         Lu         Upper case letter
         L&         Ll, Lu, or Lt

         M          Mark
         Mc         Spacing mark
         Me         Enclosing mark
         Mn         Non-spacing mark

         N          Number
         Nd         Decimal number
         Nl         Letter number
         No         Other number

         P          Punctuation
         Pc         Connector punctuation
         Pd         Dash punctuation
         Pe         Close punctuation
         Pf         Final punctuation
         Pi         Initial punctuation
         Po         Other punctuation
         Ps         Open punctuation

         S          Symbol
         Sc         Currency symbol
         Sk         Modifier symbol
         Sm         Mathematical symbol
         So         Other symbol

         Z          Separator
         Zl         Line separator
         Zp         Paragraph separator
         Zs         Space separator

PCRE SPECIAL CATEGORY PROPERTIES FOR \p and \P

         Xan        Alphanumeric: union of properties L and N
         Xps        POSIX space: property Z or tab, NL, VT, FF, CR
         Xsp        Perl space: property Z or tab, NL, FF, CR
         Xwd        Perl word: property Xan or underscore

SCRIPT NAMES FOR \p AND \P

       Arabic, Armenian, Avestan, Balinese, Bamum, Batak, Bengali, Bopomofo,
       Brahmi, Braille, Buginese, Buhid, Canadian_Aboriginal, Carian, Chakma,
       Cham, Cherokee, Common, Coptic, Cuneiform, Cypriot, Cyrillic, Deseret,
       Devanagari, Egyptian_Hieroglyphs, Ethiopic, Georgian, Glagolitic,
       Gothic, Greek, Gujarati, Gurmukhi, Han, Hangul, Hanunoo, Hebrew,
       Hiragana, Imperial_Aramaic, Inherited, Inscriptional_Pahlavi,
       Inscriptional_Parthian, Javanese, Kaithi, Kannada, Katakana, Kayah_Li,
       Kharoshthi, Khmer, Lao, Latin, Lepcha, Limbu, Linear_B, Lisu, Lycian,
       Lydian, Malayalam, Mandaic, Meetei_Mayek, Meroitic_Cursive,
       Meroitic_Hieroglyphs, Miao, Mongolian, Myanmar, New_Tai_Lue, Nko,
       Ogham, Old_Italic, Old_Persian, Old_South_Arabian, Old_Turkic,
       Ol_Chiki, Oriya, Osmanya, Phags_Pa, Phoenician, Rejang, Runic,
       Samaritan, Saurashtra, Sharada, Shavian, Sinhala, Sora_Sompeng,
       Sundanese, Syloti_Nagri, Syriac, Tagalog, Tagbanwa, Tai_Le, Tai_Tham,
       Tai_Viet, Takri, Tamil, Telugu, Thaana, Thai, Tibetan, Tifinagh,
       Ugaritic, Vai, Yi.

CHARACTER CLASSES

         [...]       positive character class
         [^...]      negative character class
         [x-y]       range (can be used for hex characters)
         [[:xxx:]]   positive POSIX named set
         [[:^xxx:]]  negative POSIX named set

         alnum       alphanumeric
         alpha       alphabetic
         ascii       0-127
         blank       space or tab
         cntrl       control character
         digit       decimal digit
         graph       printing, excluding space
         lower       lower case letter
         print       printing, including space
         punct       printing, excluding alphanumeric
         space       white space
         upper       upper case letter
         word        same as \w
         xdigit      hexadecimal digit

       In PCRE, POSIX character set names recognize only ASCII characters by
       default, but some of them use Unicode properties if PCRE_UCP is set.
       You can use \Q...\E inside a character class.

QUANTIFIERS

         ?           0 or 1, greedy
         ?+          0 or 1, possessive
         ??          0 or 1, lazy
         *           0 or more, greedy
         *+          0 or more, possessive
         *?          0 or more, lazy
         +           1 or more, greedy
         ++          1 or more, possessive
         +?          1 or more, lazy
         {n}         exactly n
         {n,m}       at least n, no more than m, greedy
         {n,m}+      at least n, no more than m, possessive
         {n,m}?      at least n, no more than m, lazy
         {n,}        n or more, greedy
         {n,}+       n or more, possessive
         {n,}?       n or more, lazy
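
       The greedy and lazy forms above can be compared in a short sketch.
       It assumes Python's built-in re module as a stand-in (a separate
       engine from PCRE that shares this syntax); the possessive forms are
       left out because older Python versions do not accept them.

```python
import re

subject = "<a><b>"

# Greedy: take as much as possible, then give back only what is needed.
greedy = re.search(r"<.+>", subject).group()

# Lazy: take as little as possible, then extend only when forced to.
lazy = re.search(r"<.+?>", subject).group()

# The same distinction applies to bounded repetition.
bounded_greedy = re.search(r"\d{2,4}", "12345").group()
bounded_lazy = re.search(r"\d{2,4}?", "12345").group()
```

       Here greedy is "<a><b>" and lazy is "<a>"; the bounded forms yield
       "1234" and "12" respectively.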

ANCHORS AND SIMPLE ASSERTIONS

         \b          word boundary
         \B          not a word boundary
         ^           start of subject
                      also after internal newline in multiline mode
         \A          start of subject
         $           end of subject
                      also before newline at end of subject
                      also before internal newline in multiline mode
         \Z          end of subject
                      also before newline at end of subject
         \z          end of subject
         \G          first matching position in subject
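
       The effect of multiline mode on ^ and $ can be sketched as follows,
       assuming Python's built-in re module as a stand-in for PCRE. Only
       ^, $, \A and \b are shown, because Python's \Z does not behave like
       PCRE's \Z and \G is not available there.

```python
import re

subject = "one\ntwo\n"

# Without multiline mode, ^ matches only at the start of the subject.
starts_default = re.findall(r"^\w+", subject)

# In multiline mode, ^ also matches after each internal newline.
starts_multiline = re.findall(r"^\w+", subject, re.MULTILINE)

# \A ignores multiline mode and matches only at the very start.
starts_absolute = re.findall(r"\A\w+", subject, re.MULTILINE)

# $ matches before a newline that ends the subject, even without multiline.
end_before_final_newline = re.search(r"two$", subject) is not None

# \b anchors a match to a word boundary without consuming a character.
words_starting_with_t = re.findall(r"\bt\w*", "two tattoo")
```

       The first three results are ['one'], ['one', 'two'] and ['one'].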

MATCH POINT RESET

         \K          reset start of match

ALTERNATION

         expr|expr|expr...

CAPTURING

         (...)           capturing group
         (?<name>...)    named capturing group (Perl)
         (?'name'...)    named capturing group (Perl)
         (?P<name>...)   named capturing group (Python)
         (?:...)         non-capturing group
         (?|...)         non-capturing group; reset group numbers for
                          capturing groups in each alternative
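
       A small sketch of how numbered, named and non-capturing groups
       interact, assuming Python's built-in re module as a stand-in for
       PCRE. Only the Python-style (?P<name>...) spelling is used, since
       that is the one form of named group that both engines accept.

```python
import re

# Group 1 is named "year"; the month is matched but not captured;
# the day becomes group 2 because (?:...) does not consume a number.
date = re.fullmatch(r"(?P<year>\d{4})-(?:\d{2})-(\d{2})", "2012-11-11")

year_by_name = date.group("year")
year_by_number = date.group(1)
day = date.group(2)
all_groups = date.groups()
```

       The non-capturing group leaves no trace in all_groups, which holds
       only the year and the day.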

ATOMIC GROUPS

         (?>...)         atomic, non-capturing group

COMMENT

         (?#....)        comment (not nestable)

OPTION SETTING

         (?i)            caseless
         (?J)            allow duplicate names
         (?m)            multiline
         (?s)            single line (dotall)
         (?U)            default ungreedy (lazy)
         (?x)            extended (ignore white space)
         (?-...)         unset option(s)
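
       The inline options can be tried out in a short sketch, assuming
       Python's built-in re module as a stand-in for PCRE. Only (?i), (?s)
       and (?x) at the start of a pattern are shown; (?J) and (?U) are not
       available in Python, and its handling of (?-...) differs.

```python
import re

# (?i): letters match regardless of case.
caseless = re.fullmatch(r"(?i)pcre", "PcRe") is not None

# (?s): dot also matches a newline; without it the match fails.
dotall = re.fullmatch(r"(?s)a.b", "a\nb") is not None
no_dotall = re.fullmatch(r"a.b", "a\nb") is not None

# (?x): unescaped white space and "#" comments in the pattern are ignored.
extended = re.fullmatch(r"(?x) \d+  # one or more digits", "42") is not None
```

       All of these are True except no_dotall.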

       The following are recognized only at the start of a pattern or after
       one of the newline-setting options with similar syntax:

         (*NO_START_OPT) no start-match optimization (PCRE_NO_START_OPTIMIZE)
         (*UTF8)         set UTF-8 mode: 8-bit library (PCRE_UTF8)
         (*UTF16)        set UTF-16 mode: 16-bit library (PCRE_UTF16)
         (*UTF32)        set UTF-32 mode: 32-bit library (PCRE_UTF32)
         (*UTF)          set appropriate UTF mode for the library in use
         (*UCP)          set PCRE_UCP (use Unicode properties for \d etc)

LOOKAHEAD AND LOOKBEHIND ASSERTIONS

         (?=...)         positive look ahead
         (?!...)         negative look ahead
         (?<=...)        positive look behind
         (?<!...)        negative look behind

       Each top-level branch of a look behind must be of a fixed length.
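
       The four assertions and the fixed-length rule can be sketched as
       follows, assuming Python's built-in re module as a stand-in for
       PCRE. Python is stricter than the rule stated above: it requires
       the whole look behind, not just each top-level branch, to have a
       fixed width.

```python
import re

subject = "cost: $40, weight: 12 kg, fee: $7"

# Positive look behind: digits preceded by a dollar sign.
after_dollar = re.findall(r"(?<=\$)\d+", subject)

# Negative look behind: whole numbers not preceded by a dollar sign.
not_after_dollar = re.findall(r"(?<!\$)\b\d+", subject)

# Positive look ahead: digits followed by " kg".
before_kg = re.findall(r"\d+(?= kg)", subject)

# Negative look ahead: whole numbers not followed by " kg".
not_before_kg = re.findall(r"\b\d+\b(?! kg)", subject)

# A look behind of variable length is rejected when the pattern is compiled.
try:
    re.compile(r"(?<=a+)b")
    variable_width_rejected = False
except re.error:
    variable_width_rejected = True
```

       None of the assertions consumes input, so the surrounding "$" and
       " kg" never appear in the results.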

BACKREFERENCES

         \n              reference by number (can be ambiguous)
         \gn             reference by number
         \g{n}           reference by number
         \g{-n}          relative reference by number
         \k<name>        reference by name (Perl)
         \k'name'        reference by name (Perl)
         \g{name}        reference by name (Perl)
         \k{name}        reference by name (.NET)
         (?P=name)       reference by name (Python)
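
       A backreference matches the text that a group actually captured,
       not the group's pattern. The sketch below assumes Python's built-in
       re module as a stand-in for PCRE and uses only the two forms that
       both engines accept, \n and (?P=name).

```python
import re

# \1 must repeat exactly what group 1 matched: this finds a doubled word.
doubled = re.search(r"\b(\w+) \1\b", "it is is fine").group()

# (?P=q) must match the same quote character that opened the string.
quoted = [
    m.group()
    for m in re.finditer(r"(?P<q>['\"]).*?(?P=q)", "say \"hi\" or 'yo'")
]
```

       Here doubled is "is is", and quoted holds the two quoted strings
       with their matching delimiters.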

SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)

         (?R)            recurse whole pattern
         (?n)            call subpattern by absolute number
         (?+n)           call subpattern by relative number
         (?-n)           call subpattern by relative number
         (?&name)        call subpattern by name (Perl)
         (?P>name)       call subpattern by name (Python)
         \g<name>        call subpattern by name (Oniguruma)
         \g'name'        call subpattern by name (Oniguruma)
         \g<n>           call subpattern by absolute number (Oniguruma)
         \g'n'           call subpattern by absolute number (Oniguruma)
         \g<+n>          call subpattern by relative number (PCRE extension)
         \g'+n'          call subpattern by relative number (PCRE extension)
         \g<-n>          call subpattern by relative number (PCRE extension)
         \g'-n'          call subpattern by relative number (PCRE extension)

CONDITIONAL PATTERNS

         (?(condition)yes-pattern)
         (?(condition)yes-pattern|no-pattern)

         (?(n)...        absolute reference condition
         (?(+n)...       relative reference condition
         (?(-n)...       relative reference condition
         (?(<name>)...   named reference condition (Perl)
         (?('name')...   named reference condition (Perl)
         (?(name)...     named reference condition (PCRE)
         (?(R)...        overall recursion condition
         (?(Rn)...       specific group recursion condition
         (?(R&name)...   specific recursion condition
         (?(DEFINE)...   define subpattern for reference
         (?(assert)...   assertion condition
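
       The absolute reference condition can be sketched as follows,
       assuming Python's built-in re module as a stand-in for PCRE. Python
       accepts only the (?(n)...) and (?(name)...) forms; the recursion,
       DEFINE and assertion conditions have no equivalent there.

```python
import re

# If group 1 matched "(", require ")" after the digits; otherwise require ";".
pattern = re.compile(r"(\()?\d+(?(1)\)|;)")

parenthesized = pattern.fullmatch("(12)") is not None
terminated = pattern.fullmatch("12;") is not None
unclosed = pattern.fullmatch("(12;") is not None
stray_close = pattern.fullmatch("12)") is not None
```

       The first two subjects match; the last two fail because the branch
       selected by the condition does not fit what follows the digits.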

BACKTRACKING CONTROL

       The following act immediately when they are reached:

         (*ACCEPT)       force successful match
         (*FAIL)         force backtrack; synonym (*F)
         (*MARK:NAME)    set name to be passed back; synonym (*:NAME)

       The following act only when a subsequent match failure causes a
       backtrack to reach them. They all force a match failure, but they
       differ in what happens afterwards. Those that advance the
       start-of-match point do so only if the pattern is not anchored.

         (*COMMIT)       overall failure, no advance of starting point
         (*PRUNE)        advance to next starting character
         (*PRUNE:NAME)   equivalent to (*MARK:NAME)(*PRUNE)
         (*SKIP)         advance to current matching position
         (*SKIP:NAME)    advance to position corresponding to an earlier
                         (*MARK:NAME); if not found, the (*SKIP) is ignored
         (*THEN)         local failure, backtrack to next alternation
         (*THEN:NAME)    equivalent to (*MARK:NAME)(*THEN)

NEWLINE CONVENTIONS

       These are recognized only at the very start of the pattern or after a
       (*BSR_...), (*UTF8), (*UTF16), (*UTF32) or (*UCP) option.

         (*CR)           carriage return only
         (*LF)           linefeed only
         (*CRLF)         carriage return followed by linefeed
         (*ANYCRLF)      all three of the above
         (*ANY)          any Unicode newline sequence

WHAT \R MATCHES

       These are recognized only at the very start of the pattern or after a
       (*...) option that sets the newline convention or a UTF or UCP mode.

         (*BSR_ANYCRLF)  CR, LF, or CRLF
         (*BSR_UNICODE)  any Unicode newline sequence

CALLOUTS

         (?C)      callout
         (?Cn)     callout with data n

SEE ALSO

       pcrepattern(3), pcreapi(3), pcrecallout(3), pcrematching(3), pcre(3).

AUTHOR

       Philip Hazel
       University Computing Service
       Cambridge CB2 3QH, England.

REVISION

       Last updated: 11 November 2012
       Copyright (c) 1997-2012 University of Cambridge.

PCRE 8.32                      11 November 2012                  PCRESYNTAX(3)