PCREPARTIAL(3)             Library Functions Manual             PCREPARTIAL(3)

NAME

       PCRE - Perl-compatible regular expressions

PARTIAL MATCHING IN PCRE

       In normal use of PCRE, if the subject string that is passed to
       pcre_exec() or pcre_dfa_exec() matches as far as it goes, but is too
       short to match the entire pattern, PCRE_ERROR_NOMATCH is returned.
       There are circumstances where it might be helpful to distinguish this
       case from other cases in which there is no match.

       Consider, for example, an application where a human is required to
       type in data for a field with specific formatting requirements. An
       example might be a date in the form ddmmmyy, defined by this pattern:

         ^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$

       If the application sees the user's keystrokes one by one, and can
       check that what has been typed so far is potentially valid, it is able
       to raise an error as soon as a mistake is made, possibly beeping and
       not echoing the character that has been typed. This immediate feedback
       is likely to be a better user interface than a check that is delayed
       until the entire string has been entered.

       PCRE supports the concept of partial matching by means of the
       PCRE_PARTIAL option, which can be set when calling pcre_exec() or
       pcre_dfa_exec(). When this flag is set for pcre_exec(), the return code
       PCRE_ERROR_NOMATCH is converted into PCRE_ERROR_PARTIAL if at any time
       during the matching process the last part of the subject string
       matched part of the pattern. Unfortunately, for non-anchored matching,
       it is not possible to obtain the position of the start of the partial
       match. No captured data is set when PCRE_ERROR_PARTIAL is returned.

       When PCRE_PARTIAL is set for pcre_dfa_exec(), the return code
       PCRE_ERROR_NOMATCH is converted into PCRE_ERROR_PARTIAL if the end of
       the subject is reached, there have been no complete matches, but there
       is still at least one matching possibility. The portion of the string
       that provided the partial match is set as the first matching string,
       that is, in the first pair of offsets in the ovector.

       Using PCRE_PARTIAL disables one of PCRE's optimizations. PCRE remembers
       the last literal byte in a pattern, and abandons matching immediately
       if such a byte is not present in the subject string. This optimization
       cannot be used for a subject string that might match only partially,
       because such a subject may end before the point in the pattern where
       that byte occurs.

RESTRICTED PATTERNS FOR PCRE_PARTIAL

       Because of the way certain internal optimizations are implemented in
       the pcre_exec() function, the PCRE_PARTIAL option cannot be used with
       all patterns. These restrictions do not apply when pcre_dfa_exec() is
       used. For pcre_exec(), repeated single characters such as

         a{2,4}

       and repeated single metasequences such as

         \d+

       are not permitted if the maximum number of occurrences is greater than
       one. Optional items such as \d? (where the maximum is one) are
       permitted. Quantifiers with any values are permitted after
       parentheses, so the invalid examples above can be coded thus:

         (a){2,4}
         (\d)+

       These constructions run more slowly, but for the kinds of application
       that are envisaged for this facility, this is not felt to be a major
       restriction.

       If PCRE_PARTIAL is set for a pattern that does not conform to the
       restrictions, pcre_exec() returns the error code PCRE_ERROR_BADPARTIAL
       (-13). You can call pcre_fullinfo() with the PCRE_INFO_OKPARTIAL
       option to find out whether a compiled pattern can be used for partial
       matching.

EXAMPLE OF PARTIAL MATCHING USING PCRETEST

       If the escape sequence \P is present in a pcretest data line, the
       PCRE_PARTIAL flag is used for the match. Here is a run of pcretest that
       uses the date example quoted above:

           re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
         data> 25jun04\P
          0: 25jun04
          1: jun
         data> 25dec3\P
         Partial match
         data> 3ju\P
         Partial match
         data> 3juj\P
         No match
         data> j\P
         No match

       The first data string is matched completely, so pcretest shows the
       matched substrings. The remaining four strings do not match the
       complete pattern, but the first two are partial matches. The same test,
       using pcre_dfa_exec() matching (by means of the \D escape sequence),
       produces the following output:

           re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
         data> 25jun04\P\D
          0: 25jun04
         data> 23dec3\P\D
         Partial match: 23dec3
         data> 3ju\P\D
         Partial match: 3ju
         data> 3juj\P\D
         No match
         data> j\P\D
         No match

       Notice that in this case the portion of the string that was partially
       matched is made available.

MULTI-SEGMENT MATCHING WITH pcre_dfa_exec()

       When a partial match has been found using pcre_dfa_exec(), it is
       possible to continue the match by providing additional subject data
       and calling pcre_dfa_exec() again with the same compiled regular
       expression, this time setting the PCRE_DFA_RESTART option. You must
       also pass the same working space as before, because this is where
       details of the previous partial match are stored. Here is an example
       using pcretest, with the \R escape sequence setting the
       PCRE_DFA_RESTART option (\P and \D are as above):

           re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
         data> 23ja\P\D
         Partial match: 23ja
         data> n05\R\D
          0: n05

       The first call has "23ja" as the subject, and requests partial
       matching; the second call has "n05" as the subject for the continued
       (restarted) match. Notice that when the match is complete, only the
       last part is shown; PCRE does not retain the previously
       partially-matched string. It is up to the calling program to do that
       if it needs to.

       You can set PCRE_PARTIAL with PCRE_DFA_RESTART to continue partial
       matching over multiple segments. This facility can be used to pass very
       long subject strings to pcre_dfa_exec(). However, some care is needed
       for certain types of pattern.

       1. If the pattern contains tests for the beginning or end of a line,
       you need to pass the PCRE_NOTBOL or PCRE_NOTEOL options, as
       appropriate, when the subject string for any call does not contain the
       beginning or end of a line.

       2. If the pattern contains backward assertions (including \b or \B),
       you need to arrange for some overlap in the subject strings to allow
       for this. For example, you could pass the subject in chunks that are
       500 bytes long, but in a buffer of 700 bytes, with the starting offset
       set to 200 and the previous 200 bytes at the start of the buffer.

       3. Matching a subject string that is split into multiple segments does
       not always produce exactly the same result as matching over one single
       long string. The difference arises when there are multiple matching
       possibilities, because a partial match result is given only when there
       are no completed matches in a call to pcre_dfa_exec(). This means that
       as soon as the shortest match has been found, continuation to a new
       subject segment is no longer possible. Consider this pcretest example:

           re> /dog(sbody)?/
         data> do\P\D
         Partial match: do
         data> gsb\R\P\D
          0: g
         data> dogsbody\D
          0: dogsbody
          1: dog

       The pattern matches the words "dog" or "dogsbody". When the subject is
       presented in several parts ("do" and "gsb" being the first two) the
       match stops when "dog" has been found, and it is not possible to
       continue. On the other hand, if "dogsbody" is presented as a single
       string, both matches are found.

       Because of this phenomenon, it does not usually make sense to end a
       pattern that is going to be matched in this way with a variable repeat.

       4. Patterns that contain alternatives at the top level which do not all
       start with the same pattern item may not work as expected. For example,
       consider this pattern:

         1234|3789

       If the first part of the subject is "ABC123", a partial match of the
       first alternative is found at offset 3. There is no partial match for
       the second alternative, because such a match does not start at the same
       point in the subject string. Attempting to continue with the string
       "789" does not yield a match because only those alternatives that match
       at one point in the subject are remembered. The problem arises because
       the start of the second alternative matches within the first
       alternative. There is no problem with anchored patterns or patterns
       such as:

         1234|ABCD

       where no string can be a partial match for both alternatives.
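       Points 1 and 2 above can be combined when preparing each call. The
       following standard C sketch makes no PCRE calls; struct segment_call
       and layout_segment() are hypothetical names. It copies a segment into
       a buffer behind up to `overlap` bytes of earlier data, returns the
       starting offset to pass, and reports whether the call needs
       PCRE_NOTBOL or PCRE_NOTEOL. It assumes that the only line boundaries
       of interest are the start and end of the whole subject (no
       PCRE_MULTILINE).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-call parameters for one segment (not a PCRE type). */
struct segment_call {
    size_t length;        /* subject length to pass for this call        */
    size_t start_offset;  /* offset in buf where the new data begins     */
    bool not_bol;         /* buf[0] is not the start of the whole
                             subject: pass PCRE_NOTBOL                    */
    bool not_eol;         /* the segment stops short of the end of the
                             whole subject: pass PCRE_NOTEOL              */
};

/* Copy subject[seg_start .. seg_start + seg_len) into buf, preceded by up
   to `overlap` bytes of earlier data so that backward assertions such as
   \b can look behind the segment start. buf must hold at least
   overlap + seg_len bytes. The final segment is clamped to the end of the
   subject. */
static struct segment_call layout_segment(const char *subject, size_t total,
                                          size_t seg_start, size_t seg_len,
                                          size_t overlap, char *buf)
{
    struct segment_call c;
    size_t keep = seg_start < overlap ? seg_start : overlap;

    assert(seg_start <= total);
    if (seg_len > total - seg_start)
        seg_len = total - seg_start;
    memcpy(buf, subject + seg_start - keep, keep + seg_len);
    c.length = keep + seg_len;
    c.start_offset = keep;
    c.not_bol = seg_start - keep > 0;
    c.not_eol = seg_start + seg_len < total;
    return c;
}
```

       With the figures from point 2 (500-byte segments and a 200-byte
       overlap), every segment after the first is laid out in a buffer of at
       most 700 bytes with a starting offset of 200.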

AUTHOR

       Philip Hazel
       University Computing Service
       Cambridge CB2 3QH, England.

REVISION

       Last updated: 04 June 2007
       Copyright (c) 1997-2007 University of Cambridge.


                                                                PCREPARTIAL(3)