pcresyntax(3)

PCRESYNTAX(3)              Library Functions Manual              PCRESYNTAX(3)

NAME

       PCRE - Perl-compatible regular expressions

PCRE REGULAR EXPRESSION SYNTAX SUMMARY

       The  full syntax and semantics of the regular expressions that are sup‐
       ported by PCRE are described in  the  pcrepattern  documentation.  This
       document contains a quick-reference summary of the syntax.

QUOTING

         \x         where x is non-alphanumeric is a literal x
         \Q...\E    treat enclosed characters as literal

CHARACTERS

         \a         alarm, that is, the BEL character (hex 07)
         \cx        "control-x", where x is any ASCII character
         \e         escape (hex 1B)
         \f         form feed (hex 0C)
         \n         newline (hex 0A)
         \r         carriage return (hex 0D)
         \t         tab (hex 09)
         \0dd       character with octal code 0dd
         \ddd       character with octal code ddd, or backreference
         \o{ddd..}  character with octal code ddd..
         \xhh       character with hex code hh
         \x{hhh..}  character with hex code hhh..

       Note that \0dd is always an octal code, and that \8 and \9 are the lit‐
       eral characters "8" and "9".
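
       As a rough illustration of the hex and octal escapes above, the  sketch
       below uses Python's re module as a stand-in. That choice is an  assump‐
       tion made for the example: re is not PCRE, and it lacks \e, \cx,  \o{}
       and \x{}, but it reads \xhh, \0dd and three-digit \ddd the same way.

```python
import re

# Hex and three-digit octal escapes can name the same character:
# hex 41 and octal 101 are both "A".
hex_a = re.fullmatch(r"\x41", "A")
octal_a = re.fullmatch(r"\101", "A")

# \t, \r and \n stand for the control characters themselves.
controls = re.fullmatch(r"\t\r\n", "\t\r\n")

# A leading zero forces the octal reading: \012 is hex 0A, the newline.
octal_newline = re.fullmatch(r"\012", "\n")
```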

CHARACTER TYPES

         .          any character except newline;
                      in dotall mode, any character whatsoever
         \C         one data unit, even in UTF mode (best avoided)
         \d         a decimal digit
         \D         a character that is not a decimal digit
         \h         a horizontal white space character
         \H         a character that is not a horizontal white space character
         \N         a character that is not a newline
         \p{xx}     a character with the xx property
         \P{xx}     a character without the xx property
         \R         a newline sequence
         \s         a white space character
         \S         a character that is not a white space character
         \v         a vertical white space character
         \V         a character that is not a vertical white space character
         \w         a "word" character
         \W         a "non-word" character
         \X         a Unicode extended grapheme cluster

       By default, \d, \s, and \w match only ASCII characters, even  in  UTF-8
       mode  or  in  the 16-bit and 32-bit libraries. However, if locale-spe‐
       cific matching is happening, \s and \w may also match  characters  with
       code  points  in  the range 128-255. If the PCRE_UCP option is set, the
       behaviour of these escape sequences is changed to use  Unicode  proper‐
       ties and they match many more characters.
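
       The ASCII-versus-Unicode distinction for \d can be sketched with  Pyth‐
       on's re module. This is only an analogy and rests  on  an  assumption:
       Python's default is the reverse of PCRE's, so re.ASCII is used here  to
       imitate the PCRE default, and the flagless call to imitate PCRE_UCP.

```python
import re

arabic_three = "\u0663"  # ARABIC-INDIC DIGIT THREE, general category Nd

# With re.ASCII (standing in for PCRE's default), \d matches only 0-9.
ascii_only = re.fullmatch(r"\d", arabic_three, re.ASCII)

# Without re.ASCII (standing in for PCRE_UCP), \d uses Unicode properties.
unicode_aware = re.fullmatch(r"\d", arabic_three)

# \D is the complement of \d.
non_digits = re.findall(r"\D", "a1 b2", re.ASCII)
```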

GENERAL CATEGORY PROPERTIES FOR \p and \P

         C          Other
         Cc         Control
         Cf         Format
         Cn         Unassigned
         Co         Private use
         Cs         Surrogate

         L          Letter
         Ll         Lower case letter
         Lm         Modifier letter
         Lo         Other letter
         Lt         Title case letter
         Lu         Upper case letter
         L&         Ll, Lu, or Lt

         M          Mark
         Mc         Spacing mark
         Me         Enclosing mark
         Mn         Non-spacing mark

         N          Number
         Nd         Decimal number
         Nl         Letter number
         No         Other number

         P          Punctuation
         Pc         Connector punctuation
         Pd         Dash punctuation
         Pe         Close punctuation
         Pf         Final punctuation
         Pi         Initial punctuation
         Po         Other punctuation
         Ps         Open punctuation

         S          Symbol
         Sc         Currency symbol
         Sk         Modifier symbol
         Sm         Mathematical symbol
         So         Other symbol

         Z          Separator
         Zl         Line separator
         Zp         Paragraph separator
         Zs         Space separator

PCRE SPECIAL CATEGORY PROPERTIES FOR \p and \P

         Xan        Alphanumeric: union of properties L and N
         Xps        POSIX space: property Z or tab, NL, VT, FF, CR
         Xsp        Perl space: property Z or tab, NL, VT, FF, CR
         Xuc        Universally-named character: one that can be
                      represented by a Universal Character Name
         Xwd        Perl word: property Xan or underscore

       Perl and POSIX space are now the same. Perl added VT to its space char‐
       acter set at release 5.18 and PCRE changed at release 8.34.

SCRIPT NAMES FOR \p AND \P

       Arabic, Armenian, Avestan, Balinese, Bamum, Bassa_Vah, Batak,  Bengali,
       Bopomofo,  Brahmi,  Braille, Buginese, Buhid, Canadian_Aboriginal, Car‐
       ian, Caucasian_Albanian, Chakma, Cham, Cherokee, Common, Coptic, Cunei‐
       form, Cypriot, Cyrillic, Deseret, Devanagari, Duployan, Egyptian_Hiero‐
       glyphs,  Elbasan,  Ethiopic,  Georgian,  Glagolitic,  Gothic,  Grantha,
       Greek,  Gujarati, Gurmukhi, Han, Hangul, Hanunoo, Hebrew, Hiragana, Im‐
       perial_Aramaic,     Inherited,     Inscriptional_Pahlavi,      Inscrip‐
       tional_Parthian,   Javanese,   Kaithi,   Kannada,  Katakana,  Kayah_Li,
       Kharoshthi, Khmer, Khojki, Khudawadi, Lao, Latin, Lepcha,  Limbu,  Lin‐
       ear_A,  Linear_B,  Lisu,  Lycian, Lydian, Mahajani, Malayalam, Mandaic,
       Manichaean, Meetei_Mayek, Mende_Kikakui, Meroitic_Cursive, Meroitic_Hi‐
       eroglyphs, Miao, Modi, Mongolian, Mro, Myanmar, Nabataean, New_Tai_Lue,
       Nko,  Ogham,  Ol_Chiki,  Old_Italic,   Old_North_Arabian,   Old_Permic,
       Old_Persian,   Old_South_Arabian,   Old_Turkic,   Oriya,  Osmanya,  Pa‐
       hawh_Hmong,    Palmyrene,    Pau_Cin_Hau,     Phags_Pa,     Phoenician,
       Psalter_Pahlavi,  Rejang,  Runic,  Samaritan, Saurashtra, Sharada, Sha‐
       vian, Siddham, Sinhala, Sora_Sompeng, Sundanese, Syloti_Nagri,  Syriac,
       Tagalog,  Tagbanwa,  Tai_Le,  Tai_Tham, Tai_Viet, Takri, Tamil, Telugu,
       Thaana, Thai, Tibetan, Tifinagh, Tirhuta, Ugaritic,  Vai,  Warang_Citi,
       Yi.

CHARACTER CLASSES

         [...]       positive character class
         [^...]      negative character class
         [x-y]       range (can be used for hex characters)
         [[:xxx:]]   positive POSIX named set
         [[:^xxx:]]  negative POSIX named set

         alnum       alphanumeric
         alpha       alphabetic
         ascii       0-127
         blank       space or tab
         cntrl       control character
         digit       decimal digit
         graph       printing, excluding space
         lower       lower case letter
         print       printing, including space
         punct       printing, excluding alphanumeric
         space       white space
         upper       upper case letter
         word        same as \w
         xdigit      hexadecimal digit

       In  PCRE,  POSIX character set names recognize only ASCII characters by
       default, but some of them use Unicode properties if  PCRE_UCP  is  set.
       You can use \Q...\E inside a character class.
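
       Positive classes, negative classes and ranges  can  be  sketched  with
       Python's re module, used here as a stand-in on the assumption that  it
       treats these three forms as PCRE does. It does not support the  POSIX
       [[:xxx:]] names, so the hexadecimal-digit set that xdigit  names  is
       spelled out as explicit ranges.

```python
import re

text = "id=7F;\tok"

# Positive class built from three ranges: the hexadecimal digits.
hex_digits = re.findall(r"[0-9A-Fa-f]", text)

# Negative class: every character that is not an ASCII letter or digit.
others = re.findall(r"[^0-9A-Za-z]", text)

# Range written with hex escapes: control characters hex 00 to hex 1F.
controls = re.findall(r"[\x00-\x1f]", text)
```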

QUANTIFIERS

         ?           0 or 1, greedy
         ?+          0 or 1, possessive
         ??          0 or 1, lazy
         *           0 or more, greedy
         *+          0 or more, possessive
         *?          0 or more, lazy
         +           1 or more, greedy
         ++          1 or more, possessive
         +?          1 or more, lazy
         {n}         exactly n
         {n,m}       at least n, no more than m, greedy
         {n,m}+      at least n, no more than m, possessive
         {n,m}?      at least n, no more than m, lazy
         {n,}        n or more, greedy
         {n,}+       n or more, possessive
         {n,}?       n or more, lazy
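
       The difference between greedy and lazy quantifiers is easiest  to  see
       on a small subject. The sketch below uses Python's re module  as  a
       stand-in, on the assumption that its greedy and lazy forms behave like
       PCRE's; possessive forms are left out because older Python  versions
       do not accept them.

```python
import re

text = "<a><b>"

# Greedy: .+ takes as much as it can, then gives back only what it must.
greedy = re.match(r"<.+>", text).group()

# Lazy: .+? takes as little as it can.
lazy = re.match(r"<.+?>", text).group()

# {n,m} bounds the repeat count; the lazy form stops at the minimum.
bounded_greedy = re.match(r"\d{2,4}", "123456").group()
bounded_lazy = re.match(r"\d{2,4}?", "123456").group()
```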

ANCHORS AND SIMPLE ASSERTIONS

         \b          word boundary
         \B          not a word boundary
         ^           start of subject
                      also after internal newline in multiline mode
         \A          start of subject
         $           end of subject
                      also before newline at end of subject
                      also before internal newline in multiline mode
         \Z          end of subject
                      also before newline at end of subject
         \z          end of subject
         \G          first matching position in subject
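
       The effect of multiline mode on ^, and the way $ matches before a fi‐
       nal newline, can be sketched with Python's re module. This assumes re
       agrees with PCRE on ^, $ and \b; \Z, \z and \G are not shown  because
       Python defines \Z differently and has no \G.

```python
import re

text = "first\nsecond\n"

# Without multiline mode, ^ matches only at the start of the subject.
starts_default = re.findall(r"^\w+", text)

# In multiline mode, ^ also matches after each internal newline.
starts_multiline = re.findall(r"^\w+", text, re.MULTILINE)

# $ also matches before a newline that ends the subject.
before_final_newline = re.search(r"second$", text)

# \b is zero-width: it marks where a word starts or ends.
whole_words = re.findall(r"\bcat\b", "cat concat cat.")
```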

MATCH POINT RESET

         \K          reset start of match

       \K is honoured in positive assertions, but ignored in negative ones.

ALTERNATION

         expr|expr|expr...

CAPTURING

         (...)           capturing group
         (?<name>...)    named capturing group (Perl)
         (?'name'...)    named capturing group (Perl)
         (?P<name>...)   named capturing group (Python)
         (?:...)         non-capturing group
         (?|...)         non-capturing group; reset group numbers for
                          capturing groups in each alternative
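
       A short sketch of how numbered, named and non-capturing  groups  in‐
       teract, using Python's re module, which accepts the  (?P<name>...)
       form listed above. The pattern and the date string are made  up  for
       the example.

```python
import re

# One named group, one non-capturing group, one plain capturing group.
date = re.fullmatch(r"(?P<year>\d{4})-(?:\d{2})-(\d{2})", "2014-01-08")

year_by_name = date.group("year")
year_by_number = date.group(1)      # named groups are numbered too
day = date.group(2)                 # (?:...) did not consume a number
all_groups = date.groups()
```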

ATOMIC GROUPS

         (?>...)         atomic, non-capturing group

COMMENT

         (?#....)        comment (not nestable)

OPTION SETTING

         (?i)            caseless
         (?J)            allow duplicate names
         (?m)            multiline
         (?s)            single line (dotall)
         (?U)            default ungreedy (lazy)
         (?x)            extended (ignore white space)
         (?-...)         unset option(s)

       The following are recognized only at the very start of a pattern or af‐
       ter one of the newline or \R options with similar syntax. More than one
       of them may appear.

         (*LIMIT_MATCH=d) set the match limit to d (decimal number)
         (*LIMIT_RECURSION=d) set the recursion limit to d (decimal number)
         (*NO_AUTO_POSSESS) no auto-possessification (PCRE_NO_AUTO_POSSESS)
         (*NO_START_OPT) no start-match optimization (PCRE_NO_START_OPTIMIZE)
         (*UTF8)         set UTF-8 mode: 8-bit library (PCRE_UTF8)
         (*UTF16)        set UTF-16 mode: 16-bit library (PCRE_UTF16)
         (*UTF32)        set UTF-32 mode: 32-bit library (PCRE_UTF32)
         (*UTF)          set appropriate UTF mode for the library in use
         (*UCP)          set PCRE_UCP (use Unicode properties for \d etc)

       Note  that LIMIT_MATCH and LIMIT_RECURSION can only reduce the value of
       the limits set by the caller of pcre_exec(), not increase them.
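
       The single-letter options (?i), (?s) and (?x) from the first table  in
       this section can be tried out with Python's re module, used here as a
       stand-in on the assumption that these three letters mean the  same  as
       in PCRE. Python has no (?J) or (?U) and none of the (*...) items,  so
       those are not shown.

```python
import re

# (?i): caseless matching.
caseless = re.fullmatch(r"(?i)pcre", "PCRE")

# By default the dot does not match a newline ...
dot_default = re.fullmatch(r"a.b", "a\nb")
# ... but with (?s) (dotall) it matches any character whatsoever.
dotall = re.fullmatch(r"(?s)a.b", "a\nb")

# (?x): white space in the pattern is ignored and # starts a comment.
extended = re.fullmatch(r"(?x) \d+  # one or more digits", "42")
```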

NEWLINE CONVENTION

       These are recognized only at the very start of the pattern or after op‐
       tion settings with a similar syntax.

         (*CR)           carriage return only
         (*LF)           linefeed only
         (*CRLF)         carriage return followed by linefeed
         (*ANYCRLF)      all three of the above
         (*ANY)          any Unicode newline sequence

WHAT \R MATCHES

       These are recognized only at the very start of the pattern or after op‐
       tion settings with a similar syntax.

         (*BSR_ANYCRLF)  CR, LF, or CRLF
         (*BSR_UNICODE)  any Unicode newline sequence

LOOKAHEAD AND LOOKBEHIND ASSERTIONS

         (?=...)         positive look ahead
         (?!...)         negative look ahead
         (?<=...)        positive look behind
         (?<!...)        negative look behind

       Each top-level branch of a look behind must be of a fixed length.
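
       The four assertions can be compared side by side. The sketch uses Py‐
       thon's re module as a stand-in, assuming its lookaround syntax matches
       PCRE's; the subject strings are invented for the example. Note  that
       the lookbehinds are fixed-length (a single "$").

```python
import re

amounts = "10 USD, 20 EUR"

# Positive look ahead: digits that are followed by " USD".
followed_by_usd = re.findall(r"\d+(?= USD)", amounts)

# Negative look ahead: whole numbers that are not followed by " USD".
not_followed_by_usd = re.findall(r"\b\d+\b(?! USD)", amounts)

prices = "$42 and 7"

# Positive look behind: digits that are preceded by a dollar sign.
after_dollar = re.findall(r"(?<=\$)\d+", prices)

# Negative look behind: numbers that are not preceded by a dollar sign.
not_after_dollar = re.findall(r"(?<!\$)\b\d+", prices)
```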

BACKREFERENCES

         \n              reference by number (can be ambiguous)
         \gn             reference by number
         \g{n}           reference by number
         \g{-n}          relative reference by number
         \k<name>        reference by name (Perl)
         \k'name'        reference by name (Perl)
         \g{name}        reference by name (Perl)
         \k{name}        reference by name (.NET)
         (?P=name)       reference by name (Python)
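
       A back reference matches the text a group actually captured, not  the
       group's pattern. The sketch below shows the \n and  (?P=name)  forms
       using Python's re module as a stand-in; the \g and \k forms are  not
       accepted in Python patterns and are omitted. The group name "quote" is
       made up for the example.

```python
import re

# \1 must repeat exactly what group 1 matched: this finds a doubled word.
doubled = re.search(r"\b(\w+) \1\b", "it is is fine")
doubled_word = doubled.group(1)

# (?P=quote) must match the same quote character that opened the string.
quoted = re.compile(r"""(?P<quote>["']).*?(?P=quote)""")
balanced = quoted.fullmatch('"hi"')
mismatched = quoted.fullmatch("\"hi'")
```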

SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)

         (?R)            recurse whole pattern
         (?n)            call subpattern by absolute number
         (?+n)           call subpattern by relative number
         (?-n)           call subpattern by relative number
         (?&name)        call subpattern by name (Perl)
         (?P>name)       call subpattern by name (Python)
         \g<name>        call subpattern by name (Oniguruma)
         \g'name'        call subpattern by name (Oniguruma)
         \g<n>           call subpattern by absolute number (Oniguruma)
         \g'n'           call subpattern by absolute number (Oniguruma)
         \g<+n>          call subpattern by relative number (PCRE extension)
         \g'+n'          call subpattern by relative number (PCRE extension)
         \g<-n>          call subpattern by relative number (PCRE extension)
         \g'-n'          call subpattern by relative number (PCRE extension)

CONDITIONAL PATTERNS

         (?(condition)yes-pattern)
         (?(condition)yes-pattern|no-pattern)

         (?(n)...        absolute reference condition
         (?(+n)...       relative reference condition
         (?(-n)...       relative reference condition
         (?(<name>)...   named reference condition (Perl)
         (?('name')...   named reference condition (Perl)
         (?(name)...     named reference condition (PCRE)
         (?(R)...        overall recursion condition
         (?(Rn)...       specific group recursion condition
         (?(R&name)...   specific recursion condition
         (?(DEFINE)...   define subpattern for reference
         (?(assert)...   assertion condition
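
       The absolute reference condition (?(n)...) can be sketched  with  Py‐
       thon's re module, which supports that form and the (?(name)...)  form
       but none of the recursion, DEFINE or assertion  conditions.  Treating
       it as equivalent to PCRE here is an assumption, and both patterns are
       invented for the example.

```python
import re

# yes-pattern only: a closing ">" is required only if "<" was captured.
tag = re.compile(r"(<)?\w+(?(1)>)")
bracketed = tag.fullmatch("<a>")
bare = tag.fullmatch("a")
unclosed = tag.fullmatch("<a")

# yes|no: after "(" expect ")", otherwise expect ";".
number = re.compile(r"(\()?\d+(?(1)\)|;)")
parenthesized = number.fullmatch("(12)")
terminated = number.fullmatch("12;")
mixed = number.fullmatch("12)")
```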

BACKTRACKING CONTROL

       The following act as soon as they are reached:

         (*ACCEPT)       force successful match
         (*FAIL)         force backtrack; synonym (*F)
         (*MARK:NAME)    set name to be passed back; synonym (*:NAME)

       The following act only when a subsequent match failure causes  a  back‐
       track to reach them. They all force a match failure, but they differ in
       what happens afterwards. Those that advance the start-of-match point do
       so only if the pattern is not anchored.

         (*COMMIT)       overall failure, no advance of starting point
         (*PRUNE)        advance to next starting character
         (*PRUNE:NAME)   equivalent to (*MARK:NAME)(*PRUNE)
         (*SKIP)         advance to current matching position
         (*SKIP:NAME)    advance to position corresponding to an earlier
                         (*MARK:NAME); if not found, the (*SKIP) is ignored
         (*THEN)         local failure, backtrack to next alternation
         (*THEN:NAME)    equivalent to (*MARK:NAME)(*THEN)

CALLOUTS

         (?C)      callout
         (?Cn)     callout with data n

AUTHOR

       Philip Hazel
       University Computing Service
       Cambridge CB2 3QH, England.

REVISION

       Last updated: 08 January 2014
       Copyright (c) 1997-2014 University of Cambridge.

PCRE 8.35                       08 January 2014                  PCRESYNTAX(3)