PCRESYNTAX(3)              Library Functions Manual              PCRESYNTAX(3)

NAME

       PCRE - Perl-compatible regular expressions

PCRE REGULAR EXPRESSION SYNTAX SUMMARY

       The full syntax and semantics of the regular expressions that are
       supported by PCRE are described in the pcrepattern documentation.
       This document contains just a quick-reference summary of the syntax.

QUOTING

         \x         where x is non-alphanumeric is a literal x
         \Q...\E    treat enclosed characters as literal
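       The sketch below uses Python's standard re module. It is a separate
       engine from PCRE, but it likewise treats a backslash before a
       non-alphanumeric character as a literal. Python's re has no \Q...\E,
       so the example uses re.escape() as the closest equivalent. That
       substitution is an assumption of the example, not PCRE behaviour.

```python
import re


def literal_dot(s: str) -> bool:
    # "\." is an escaped non-alphanumeric, so it matches only a literal "."
    return re.fullmatch(r"a\.c", s) is not None


def contains_literal(text: str, literal: str) -> bool:
    # re.escape() backslash-escapes the metacharacters in `literal`,
    # playing the role that \Q...\E plays in PCRE.
    return re.search(re.escape(literal), text) is not None


print(literal_dot("a.c"), literal_dot("abc"))        # True False
print(contains_literal("total: 1+1=2?", "1+1=2?"))  # True
```

       Without escaping, "1+1=2?" would be read as "one or more 1s, then 1,
       =, optional 2", which does not occur in that subject.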

CHARACTERS

         \a         alarm, that is, the BEL character (hex 07)
         \cx        "control-x", where x is any character
         \e         escape (hex 1B)
         \f         form feed (hex 0C)
         \n         newline (hex 0A)
         \r         carriage return (hex 0D)
         \t         tab (hex 09)
         \ddd       character with octal code ddd, or backreference
         \xhh       character with hex code hh
         \x{hhh..}  character with hex code hhh..

CHARACTER TYPES

         .          any character except newline;
                      in dotall mode, any character whatsoever
         \C         one byte, even in UTF-8 mode (best avoided)
         \d         a decimal digit
         \D         a character that is not a decimal digit
         \h         a horizontal white space character
         \H         a character that is not a horizontal white space character
         \p{xx}     a character with the xx property
         \P{xx}     a character without the xx property
         \R         a newline sequence
         \s         a white space character
         \S         a character that is not a white space character
         \v         a vertical white space character
         \V         a character that is not a vertical white space character
         \w         a "word" character
         \W         a "non-word" character
         \X         an extended Unicode sequence

       In PCRE, \d, \D, \s, \S, \w, and \W recognize only ASCII characters.
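       The following sketch illustrates the ASCII-only rule with Python's re
       module, not PCRE. In Python 3, \d and \w in string patterns are
       Unicode-aware by default. The sketch assumes the re.ASCII flag as a
       stand-in for the ASCII-only PCRE behaviour described above.

```python
import re

# Contains U+00E9 (e with acute accent) and U+0663 (ARABIC-INDIC DIGIT THREE).
text = "caf\u00e9 42 \u0663"

# With re.ASCII, \w and \d match only ASCII characters.
ascii_words = re.findall(r"\w+", text, re.ASCII)
ascii_digits = re.findall(r"\d", text, re.ASCII)

# Without it, Python also counts non-ASCII letters and digits.
unicode_words = re.findall(r"\w+", text)
unicode_digits = re.findall(r"\d", text)

print(ascii_words, ascii_digits)  # ['caf', '42'] ['4', '2']
```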

GENERAL CATEGORY PROPERTY CODES FOR \p AND \P

         C          Other
         Cc         Control
         Cf         Format
         Cn         Unassigned
         Co         Private use
         Cs         Surrogate

         L          Letter
         Ll         Lower case letter
         Lm         Modifier letter
         Lo         Other letter
         Lt         Title case letter
         Lu         Upper case letter
         L&         Ll, Lu, or Lt

         M          Mark
         Mc         Spacing mark
         Me         Enclosing mark
         Mn         Non-spacing mark

         N          Number
         Nd         Decimal number
         Nl         Letter number
         No         Other number

         P          Punctuation
         Pc         Connector punctuation
         Pd         Dash punctuation
         Pe         Close punctuation
         Pf         Final punctuation
         Pi         Initial punctuation
         Po         Other punctuation
         Ps         Open punctuation

         S          Symbol
         Sc         Currency symbol
         Sk         Modifier symbol
         Sm         Mathematical symbol
         So         Other symbol

         Z          Separator
         Zl         Line separator
         Zp         Paragraph separator
         Zs         Space separator

SCRIPT NAMES FOR \p AND \P

       Arabic, Armenian, Balinese, Bengali, Bopomofo, Braille, Buginese,
       Buhid, Canadian_Aboriginal, Cherokee, Common, Coptic, Cuneiform,
       Cypriot, Cyrillic, Deseret, Devanagari, Ethiopic, Georgian, Glagolitic,
       Gothic, Greek, Gujarati, Gurmukhi, Han, Hangul, Hanunoo, Hebrew,
       Hiragana, Inherited, Kannada, Katakana, Kharoshthi, Khmer, Lao, Latin,
       Limbu, Linear_B, Malayalam, Mongolian, Myanmar, New_Tai_Lue, Nko,
       Ogham, Old_Italic, Old_Persian, Oriya, Osmanya, Phags_Pa, Phoenician,
       Runic, Shavian, Sinhala, Syloti_Nagri, Syriac, Tagalog, Tagbanwa,
       Tai_Le, Tamil, Telugu, Thaana, Thai, Tibetan, Tifinagh, Ugaritic, Yi.

CHARACTER CLASSES

         [...]       positive character class
         [^...]      negative character class
         [x-y]       range (can be used for hex characters)
         [[:xxx:]]   positive POSIX named set
         [[:^xxx:]]  negative POSIX named set

         alnum       alphanumeric
         alpha       alphabetic
         ascii       0-127
         blank       space or tab
         cntrl       control character
         digit       decimal digit
         graph       printing, excluding space
         lower       lower case letter
         print       printing, including space
         punct       printing, excluding alphanumeric
         space       white space
         upper       upper case letter
         word        same as \w
         xdigit      hexadecimal digit

       In PCRE, POSIX character set names recognize only ASCII characters.
       You can use \Q...\E inside a character class.

QUANTIFIERS

         ?           0 or 1, greedy
         ?+          0 or 1, possessive
         ??          0 or 1, lazy
         *           0 or more, greedy
         *+          0 or more, possessive
         *?          0 or more, lazy
         +           1 or more, greedy
         ++          1 or more, possessive
         +?          1 or more, lazy
         {n}         exactly n
         {n,m}       at least n, no more than m, greedy
         {n,m}+      at least n, no more than m, possessive
         {n,m}?      at least n, no more than m, lazy
         {n,}        n or more, greedy
         {n,}+       n or more, possessive
         {n,}?       n or more, lazy
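       Greedy and lazy quantifiers behave the same way in Python's re module,
       which the following sketch uses in place of PCRE. Possessive forms are
       left out because Python accepts them only from version 3.11.

```python
import re

html = "<b>bold</b>"

# Greedy: .+ takes as much as it can, then gives back only what it must.
greedy = re.match(r"<.+>", html).group()
# Lazy: .+? takes as little as it can, stopping at the first ">".
lazy = re.match(r"<.+?>", html).group()

# {n,m} is greedy; {n,m}? prefers the lower bound.
bounded = re.findall(r"\d{2,3}", "12345")
bounded_lazy = re.findall(r"\d{2,3}?", "12345")

print(greedy, lazy)           # <b>bold</b> <b>
print(bounded, bounded_lazy)  # ['123', '45'] ['12', '34']
```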

ANCHORS AND SIMPLE ASSERTIONS

         \b          word boundary
         \B          not a word boundary
         ^           start of subject
                      also after internal newline in multiline mode
         \A          start of subject
         $           end of subject
                      also before newline at end of subject
                      also before internal newline in multiline mode
         \Z          end of subject
                      also before newline at end of subject
         \z          end of subject
         \G          first matching position in subject
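       Here is a minimal Python sketch of the $ and multiline ^ rules. It uses
       Python's re module rather than PCRE, and one difference matters here.
       Python's \Z means the absolute end of the subject, which is what PCRE
       writes as \z.

```python
import re

subject = "first\nsecond\n"

# $ matches at the end and also just before a newline at the end.
end_before_newline = re.search(r"second$", subject) is not None
# In multiline mode, ^ also matches after each internal newline.
line_starts = re.findall(r"^\w+", subject, re.MULTILINE)
# Python's \Z behaves like PCRE's \z: no match before the final newline.
z_before_newline = re.search(r"second\Z", subject) is not None
z_at_end = re.search(r"second\n\Z", subject) is not None

print(end_before_newline, line_starts)  # True ['first', 'second']
print(z_before_newline, z_at_end)       # False True
```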

MATCH POINT RESET

         \K          reset start of match

ALTERNATION

         expr|expr|expr...

CAPTURING

         (...)          capturing group
         (?<name>...)   named capturing group (Perl)
         (?'name'...)   named capturing group (Perl)
         (?P<name>...)  named capturing group (Python)
         (?:...)        non-capturing group
         (?|...)        non-capturing group; reset group numbers for
                         capturing groups in each alternative
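       The sketch below exercises the Python-style named group and a
       non-capturing group with Python's re module, which is not PCRE. Only
       the (?P<name>...) form is used, because the example does not assume
       the Perl forms are available there. The date pattern is hypothetical.

```python
import re

# Hypothetical ISO-style date with an optional time suffix.
# (?P<name>...) captures by name; (?:...) groups without capturing.
date = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})(?:T\d{2}:\d{2})?")

m = date.fullmatch("2008-04-09T12:30")
print(m.group("year"), m.group("month"), m.group("day"))  # 2008 04 09
print(len(m.groups()))  # 3: the (?:...) group is not numbered
```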

ATOMIC GROUPS

         (?>...)        atomic, non-capturing group

COMMENT

         (?#....)       comment (not nestable)

OPTION SETTING

         (?i)           caseless
         (?J)           allow duplicate names
         (?m)           multiline
         (?s)           single line (dotall)
         (?U)           default ungreedy (lazy)
         (?x)           extended (ignore white space)
         (?-...)        unset option(s)
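       Python's re module accepts the (?i), (?s), and (?x) settings with the
       same meaning, so the sketch below uses it in place of PCRE. Each
       setting sits at the very start of the pattern, which recent Python
       versions require for flags that apply to the whole expression. (?J)
       and (?U) have no direct counterpart in Python's re and are not shown.

```python
import re

# (?i): caseless match for the whole pattern.
caseless = re.fullmatch(r"(?i)pcre", "PCRE") is not None

# (?s): dotall, so . also matches a newline.
dot_default = re.search(r"a.b", "a\nb") is not None
dot_all = re.search(r"(?s)a.b", "a\nb") is not None

# (?x): extended mode ignores unescaped white space and # comments.
phone = re.compile(r"""(?x)
    \d{3}   # exchange
    -
    \d{4}   # line number
""")
extended = phone.fullmatch("555-1234") is not None

print(caseless, dot_default, dot_all, extended)  # True False True True
```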

LOOKAHEAD AND LOOKBEHIND ASSERTIONS

         (?=...)        positive look ahead
         (?!...)        negative look ahead
         (?<=...)       positive look behind
         (?<!...)       negative look behind

       Each top-level branch of a look behind must be of a fixed length.
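       The Python sketch below shows a look behind and a negative look ahead.
       It uses Python's re module, not PCRE. Python is stricter than PCRE
       about look behind: it rejects top-level branches of differing lengths.
       The last lines of the sketch check that.

```python
import re

# Positive look behind: digits preceded by "$"; the "$" is not part of the match.
price = re.search(r"(?<=\$)\d+", "cost: $42").group()

# Negative look ahead: "foo" only where it is not followed by "bar".
foo_starts = [m.start() for m in re.finditer(r"foo(?!bar)", "foobar foobaz")]

# Python needs every look behind to have a single fixed width, so this
# pattern (branches of length 1 and 2) is rejected, although PCRE allows it.
try:
    re.compile(r"(?<=a|bc)x")
    mixed_lengths_accepted = True
except re.error:
    mixed_lengths_accepted = False

print(price, foo_starts, mixed_lengths_accepted)  # 42 [7] False
```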

BACKREFERENCES

         \n             reference by number (can be ambiguous)
         \gn            reference by number
         \g{n}          reference by number
         \g{-n}         relative reference by number
         \k<name>       reference by name (Perl)
         \k'name'       reference by name (Perl)
         \g{name}       reference by name (Perl)
         \k{name}       reference by name (.NET)
         (?P=name)      reference by name (Python)
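       This short Python sketch shows numbered and named backreferences,
       using the re module in place of PCRE. \1 and the Python-style
       (?P=name) work there. The example does not assume that the \g and \k
       forms are available in Python patterns.

```python
import re

# \1 must repeat exactly what group 1 captured: finds doubled words.
doubled = re.findall(r"\b(\w+)\s+\1\b", "this is is a test test")

# (?P=name) refers back to the named group "q": a quote closed by the same quote.
quoted = re.compile(r"(?P<q>['\"]).*?(?P=q)")
first_quoted = quoted.search("say \"hi\" or 'bye'").group()

print(doubled)       # ['is', 'test']
print(first_quoted)  # "hi"
```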

SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)

         (?R)           recurse whole pattern
         (?n)           call subpattern by absolute number
         (?+n)          call subpattern by relative number
         (?-n)          call subpattern by relative number
         (?&name)       call subpattern by name (Perl)
         (?P>name)      call subpattern by name (Python)
         \g<name>       call subpattern by name (Oniguruma)
         \g'name'       call subpattern by name (Oniguruma)
         \g<n>          call subpattern by absolute number (Oniguruma)
         \g'n'          call subpattern by absolute number (Oniguruma)
         \g<+n>         call subpattern by relative number (PCRE extension)
         \g'+n'         call subpattern by relative number (PCRE extension)
         \g<-n>         call subpattern by relative number (PCRE extension)
         \g'-n'         call subpattern by relative number (PCRE extension)

CONDITIONAL PATTERNS

         (?(condition)yes-pattern)
         (?(condition)yes-pattern|no-pattern)

         (?(n)...       absolute reference condition
         (?(+n)...      relative reference condition
         (?(-n)...      relative reference condition
         (?(<name>)...  named reference condition (Perl)
         (?('name')...  named reference condition (Perl)
         (?(name)...    named reference condition (PCRE)
         (?(R)...       overall recursion condition
         (?(Rn)...      specific group recursion condition
         (?(R&name)...  specific recursion condition
         (?(DEFINE)...  define subpattern for reference
         (?(assert)...  assertion condition
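       Python's re module supports the numbered form (?(n)yes|no), so the
       condition can be sketched there in place of PCRE. The pattern below is
       a hypothetical address check. It requires a closing ">" only when the
       optional opening "<" (group 1) matched.

```python
import re

# (<)? is group 1; (?(1)>) demands ">" only if group 1 took part in the match.
addr = re.compile(r"(<)?\w+@\w+\.com(?(1)>)")


def is_valid(s: str) -> bool:
    return addr.fullmatch(s) is not None


print(is_valid("<user@example.com>"))  # True
print(is_valid("user@example.com"))    # True
print(is_valid("<user@example.com"))   # False
print(is_valid("user@example.com>"))   # False
```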

BACKTRACKING CONTROL

       The following act as soon as they are reached:

         (*ACCEPT)      force successful match
         (*FAIL)        force backtrack; synonym (*F)

       The following act only when a subsequent match failure causes a
       backtrack to reach them. They all force a match failure, but they
       differ in what happens afterwards. Those that advance the
       start-of-match point do so only if the pattern is not anchored.

         (*COMMIT)      overall failure, no advance of starting point
         (*PRUNE)       advance to next starting character
         (*SKIP)        advance start to current matching position
         (*THEN)        local failure, backtrack to next alternation

NEWLINE CONVENTIONS

       These are recognized only at the very start of the pattern or after a
       (*BSR_...) option.

         (*CR)
         (*LF)
         (*CRLF)
         (*ANYCRLF)
         (*ANY)

WHAT \R MATCHES

       These are recognized only at the very start of the pattern or after a
       (*...) option that sets the newline convention.

         (*BSR_ANYCRLF)
         (*BSR_UNICODE)

CALLOUTS

         (?C)      callout
         (?Cn)     callout with data n

SEE ALSO

       pcrepattern(3), pcreapi(3), pcrecallout(3), pcrematching(3), pcre(3).

AUTHOR

       Philip Hazel
       University Computing Service
       Cambridge CB2 3QH, England.

REVISION

       Last updated: 09 April 2008
       Copyright (c) 1997-2008 University of Cambridge.