PCRE2COMPAT(3)             Library Functions Manual             PCRE2COMPAT(3)

NAME

       PCRE2 - Perl-compatible regular expressions (revised API)

DIFFERENCES BETWEEN PCRE2 AND PERL

       This document describes the differences in the ways that PCRE2 and Perl
       handle regular expressions. The differences described here are with
       respect to Perl versions 5.10 and above.

       1. PCRE2 has only a subset of Perl's Unicode support. Details of what
       it does have are given in the pcre2unicode page.

       2. PCRE2 allows repeat quantifiers only on parenthesized assertions,
       but they do not mean what you might think. For example, (?!a){3} does
       not assert that the next three characters are not "a". It just asserts
       that the next character is not "a" three times (in principle: PCRE2
       optimizes this to run the assertion just once). Perl allows repeat
       quantifiers on other assertions such as \b, but these do not seem to
       have any use.

       3. Capturing subpatterns that occur inside negative lookahead
       assertions are counted, but their entries in the offsets vector are
       never set. Perl sometimes (but not always) sets its numerical variables
       from inside negative assertions.

       4. The following Perl escape sequences are not supported: \l, \u, \L,
       \U, and \N when followed by a character name or Unicode value. (\N on
       its own, matching a non-newline character, is supported.) In fact these
       are implemented by Perl's general string-handling and are not part of
       its pattern matching engine. If any of these are encountered by PCRE2,
       an error is generated by default. However, if the PCRE2_ALT_BSUX option
       is set, \U and \u are interpreted as ECMAScript interprets them.

       5. The Perl escape sequences \p, \P, and \X are supported only if PCRE2
       is built with Unicode support. The properties that can be tested with
       \p and \P are limited to the general category properties such as Lu and
       Nd, script names such as Greek or Han, and the derived properties Any
       and L&. PCRE2 does support the Cs (surrogate) property, which Perl does
       not; the Perl documentation says "Because Perl hides the need for the
       user to understand the internal representation of Unicode characters,
       there is no need to implement the somewhat messy concept of
       surrogates."

       6. PCRE2 does support the \Q...\E escape for quoting substrings.
       Characters in between are treated as literals. This is slightly
       different from Perl in that $ and @ are also handled as literals inside
       the quotes. In Perl, they cause variable interpolation (but of course
       PCRE2 does not have variables). Note the following examples:

           Pattern            PCRE2 matches     Perl matches

           \Qabc$xyz\E        abc$xyz           abc followed by the
                                                  contents of $xyz
           \Qabc\$xyz\E       abc\$xyz          abc\$xyz
           \Qabc\E\$\Qxyz\E   abc$xyz           abc$xyz

       The \Q...\E sequence is recognized both inside and outside character
       classes.

       7. Fairly obviously, PCRE2 does not support the (?{code}) and
       (??{code}) constructions. However, there is support for recursive
       patterns. This is not available in Perl 5.8, but it is in Perl 5.10.
       Also, the PCRE2 "callout" feature allows an external function to be
       called during pattern matching. See the pcre2callout documentation for
       details.

       8. Subroutine calls (whether recursive or not) are treated as atomic
       groups. Atomic recursion is like Python, but unlike Perl. Captured
       values that are set outside a subroutine call can be referenced from
       inside in PCRE2, but not in Perl. There is a discussion that explains
       these differences in more detail in the section on recursion
       differences from Perl in the pcre2pattern page.

       9. If any of the backtracking control verbs are used in a subpattern
       that is called as a subroutine (whether or not recursively), their
       effect is confined to that subpattern; it does not extend to the
       surrounding pattern. This is not always the case in Perl. In
       particular, if (*THEN) is present in a group that is called as a
       subroutine, its action is limited to that group, even if the group does
       not contain any | characters. Note that such subpatterns are processed
       as anchored at the point where they are tested.

       10. If a pattern contains more than one backtracking control verb, the
       first one that is backtracked onto acts. For example, in the pattern
       A(*COMMIT)B(*PRUNE)C a failure in B triggers (*COMMIT), but a failure
       in C triggers (*PRUNE). Perl's behaviour is more complex; in many cases
       it is the same as PCRE2, but there are cases where it differs.

       11. Most backtracking verbs in assertions have their normal actions.
       They are not confined to the assertion.

       12. There are some differences that are concerned with the settings of
       captured strings when part of a pattern is repeated. For example,
       matching "aba" against the pattern /^(a(b)?)+$/ in Perl leaves $2
       unset, but in PCRE2 it is set to "b".

       13. PCRE2's handling of duplicate subpattern numbers and duplicate
       subpattern names is not as general as Perl's. This is a consequence of
       the fact that PCRE2 works internally just with numbers, using an
       external table to translate between numbers and names. In particular, a
       pattern such as (?|(?<a>A)|(?<b>B)), where the two capturing
       parentheses have the same number but different names, is not supported,
       and causes an error at compile time. If it were allowed, it would not
       be possible to distinguish which parentheses matched, because both
       names map to capturing subpattern number 1.

       14. Perl used to recognize comments in some places that PCRE2 does not,
       for example, between the ( and ? at the start of a subpattern. If the
       /x modifier was set, Perl allowed white space between ( and ?, though
       the latest Perls give an error (for a while it was just deprecated).
       There may still be some cases where Perl behaves differently.

       15. Perl, when in warning mode, gives warnings for character classes
       such as [A-\d] or [a-[:digit:]]. It then treats the hyphens as
       literals. PCRE2 has no warning features, so it gives an error in these
       cases because they are almost certainly user mistakes.

       16. In PCRE2, the upper/lower case character properties Lu and Ll are
       not affected when case-independent matching is specified. For example,
       \p{Lu} always matches an upper case letter. I think Perl has changed in
       this respect; in the release at the time of writing (5.16), \p{Lu} and
       \p{Ll} match all letters, regardless of case, when case independence is
       specified.

       17. PCRE2 provides some extensions to the Perl regular expression
       facilities. Perl 5.10 includes new features that are not in earlier
       versions of Perl, some of which (such as named parentheses) have been
       in PCRE2 for some time. This list is with respect to Perl 5.10:

       (a) Although lookbehind assertions in PCRE2 must match fixed length
       strings, each alternative branch of a lookbehind assertion can match a
       different length of string. Perl requires them all to have the same
       length.

       (b) From PCRE2 10.23, back references to groups of fixed length are
       supported in lookbehinds, provided that there is no possibility of
       referencing a non-unique number or name. Perl does not support
       backreferences in lookbehinds.

       (c) If PCRE2_DOLLAR_ENDONLY is set and PCRE2_MULTILINE is not set, the
       $ meta-character matches only at the very end of the string.

       (d) A backslash followed by a letter with no special meaning is
       faulted. (Perl can be made to issue a warning.)

       (e) If PCRE2_UNGREEDY is set, the greediness of the repetition
       quantifiers is inverted, that is, by default they are not greedy, but
       if followed by a question mark they are.

       (f) PCRE2_ANCHORED can be used at matching time to force a pattern to
       be tried only at the first matching position in the subject string.

       (g) The PCRE2_NOTBOL, PCRE2_NOTEOL, PCRE2_NOTEMPTY,
       PCRE2_NOTEMPTY_ATSTART, and PCRE2_NO_AUTO_CAPTURE options have no Perl
       equivalents.

       (h) The \R escape sequence can be restricted to match only CR, LF, or
       CRLF by the PCRE2_BSR_ANYCRLF option.

       (i) The callout facility is PCRE2-specific.

       (j) The partial matching facility is PCRE2-specific.

       (k) The alternative matching function (pcre2_dfa_match()) matches in a
       different way and is not Perl-compatible.

       (l) PCRE2 recognizes some special sequences such as (*CR) at the start
       of a pattern that set overall options that cannot be changed within the
       pattern.

AUTHOR

       Philip Hazel
       University Computing Service
       Cambridge, England.

REVISION

       Last updated: 18 October 2016
       Copyright (c) 1997-2016 University of Cambridge.

PCRE2 10.23                     18 October 2016                 PCRE2COMPAT(3)