PCREPARTIAL(3)          Library Functions Manual          PCREPARTIAL(3)


NAME
PCRE - Perl-compatible regular expressions


PARTIAL MATCHING IN PCRE

In normal use of PCRE, if the subject string that is passed to
pcre_exec() or pcre_dfa_exec() matches as far as it goes, but is too
short to match the entire pattern, PCRE_ERROR_NOMATCH is returned.
There are circumstances where it might be helpful to distinguish this
case from other cases in which there is no match.

Consider, for example, an application in which a user types data into
a field with specific formatting requirements. An example might be a
date in the form ddmmmyy, defined by this pattern:

  ^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$

If the application sees the user's keystrokes one by one, and can check
that what has been typed so far is potentially valid, it can raise an
error as soon as a mistake is made, for example by beeping and not
echoing the character that was typed. This immediate feedback is
likely to be a better user interface than a check that is delayed
until the entire string has been entered.
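
The following minimal C sketch illustrates the three outcomes such an application needs to tell apart: complete match, partial match and no match. It is written by hand for this one pattern and does not use PCRE. The names date_status(), NO_MATCH, PARTIAL and FULL are hypothetical. The sketch assumes lowercase input and anchoring at both ends, as in the pattern above.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical result type: the three outcomes a keystroke checker needs. */
enum status { NO_MATCH, PARTIAL, FULL };

static const char *const month_names[12] = {
    "jan", "feb", "mar", "apr", "may", "jun",
    "jul", "aug", "sep", "oct", "nov", "dec"
};

/* Returns 1 if the n characters at s (n <= 3) begin some month name. */
static int month_prefix(const char *s, size_t n) {
    for (size_t i = 0; i < 12; i++)
        if (strncmp(month_names[i], s, n) == 0)
            return 1;
    return 0;
}

/* Hand-coded check for ^\d?\d(jan|...|dec)\d\d$ that also reports whether
   the text so far could still become a match if more characters follow. */
static enum status date_status(const char *s) {
    size_t len = strlen(s), i = 0;

    /* One or two leading digits. */
    while (i < len && i < 2 && isdigit((unsigned char)s[i]))
        i++;
    if (i == 0)
        return NO_MATCH;              /* must start with a digit */
    if (i == len)
        return PARTIAL;               /* ran out while reading the day */

    /* Up to three letters that must begin a month name. */
    size_t m = len - i < 3 ? len - i : 3;
    if (!month_prefix(s + i, m))
        return NO_MATCH;
    i += m;
    if (i == len)
        return PARTIAL;               /* ran out inside or right after the month */

    /* Exactly two digits, then the end of the string. */
    size_t y = 0;
    while (y < 2 && i + y < len && isdigit((unsigned char)s[i + y]))
        y++;
    if (i + y != len)
        return NO_MATCH;              /* bad character or too many characters */
    return y == 2 ? FULL : PARTIAL;
}
```

A keystroke handler built on this would accept a character while the result is PARTIAL or FULL, and beep and drop it on NO_MATCH. For example, "25dec3" and "3ju" give PARTIAL, while "3juj" and "j" give NO_MATCH.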

PCRE supports the concept of partial matching by means of the
PCRE_PARTIAL option, which can be set when calling pcre_exec() or
pcre_dfa_exec(). When this flag is set for pcre_exec(), the return code
PCRE_ERROR_NOMATCH is converted into PCRE_ERROR_PARTIAL if, at any time
during the matching process, the last part of the subject string
matched part of the pattern. Unfortunately, for non-anchored matching,
it is not possible to obtain the position of the start of the partial
match. No captured data is set when PCRE_ERROR_PARTIAL is returned.

When PCRE_PARTIAL is set for pcre_dfa_exec(), the return code
PCRE_ERROR_NOMATCH is converted into PCRE_ERROR_PARTIAL if the end of
the subject is reached, there have been no complete matches, but there
is still at least one matching possibility. The portion of the string
that provided the partial match is set as the first matching string.

Using PCRE_PARTIAL disables one of PCRE's optimizations. PCRE remembers
the last literal byte in a pattern, and abandons matching immediately
if such a byte is not present in the subject string. This optimization
cannot be used for a subject string that might match only partially.
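
The minimal C sketch below shows why this shortcut conflicts with partial matching. It does not use PCRE and assumes the simplest possible pattern, a plain literal string. The names quick_reject() and find_literal() are hypothetical. Take the literal "abc" and the subject "xab". The last literal byte 'c' is absent, so the shortcut would give up at once, yet the subject ends with "ab", which is a partial match.

```c
#include <string.h>

enum result { NO_MATCH, PARTIAL, FULL };

/* The shortcut: if the literal's last byte never occurs in the subject,
   a complete match is impossible, so matching can be abandoned early. */
static int quick_reject(const char *subject, const char *literal) {
    size_t n = strlen(literal);
    return n > 0 && strchr(subject, literal[n - 1]) == NULL;
}

/* Unanchored search for a literal. A partial match is reported when the
   end of the subject is reached after matching a non-empty prefix of it. */
static enum result find_literal(const char *subject, const char *literal) {
    size_t slen = strlen(subject), llen = strlen(literal);
    enum result best = NO_MATCH;
    for (size_t start = 0; start < slen; start++) {
        size_t k = 0;
        while (k < llen && start + k < slen && subject[start + k] == literal[k])
            k++;
        if (k == llen)
            return FULL;
        if (k > 0 && start + k == slen)
            best = PARTIAL;
    }
    return best;
}
```

A matcher that supports partial results therefore has to skip a check like quick_reject() whenever partial matching is requested, at the cost of some speed on subjects that cannot match.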


RESTRICTED PATTERNS FOR PCRE_PARTIAL

Because of the way certain internal optimizations are implemented in
the pcre_exec() function, the PCRE_PARTIAL option cannot be used with
all patterns. These restrictions do not apply when pcre_dfa_exec() is
used. For pcre_exec(), repeated single characters such as

  a{2,4}

and repeated single metasequences such as

  \d+

are not permitted if the maximum number of occurrences is greater than
one. Optional items such as \d? (where the maximum is one) are
permitted. Quantifiers with any values are permitted after parentheses,
so the invalid examples above can be coded thus:

  (a){2,4}
  (\d)+

These constructions run more slowly, but for the kinds of application
that are envisaged for this facility, this is not felt to be a major
restriction.

If PCRE_PARTIAL is set for a pattern that does not conform to the
restrictions, pcre_exec() returns the error code PCRE_ERROR_BADPARTIAL
(-13). You can use the PCRE_INFO_OKPARTIAL call to pcre_fullinfo() to
find out whether a compiled pattern can be used for partial matching.
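
In the keystroke-checking application described earlier, the caller of pcre_exec() has to turn its return value into an action. The C sketch below shows one possible mapping. So that it compiles without the PCRE headers, it defines local stand-ins for the three return codes, using the values in pcre.h: PCRE_ERROR_NOMATCH is -1, PCRE_ERROR_PARTIAL is -12 and PCRE_ERROR_BADPARTIAL is -13. Real code should include pcre.h and use the PCRE_ERROR_* names. The RC_* and KEY_* names and action_for() are hypothetical.

```c
/* Local stand-ins for the pcre.h values, so the sketch builds on its own. */
enum {
    RC_NOMATCH = -1,      /* PCRE_ERROR_NOMATCH */
    RC_PARTIAL = -12,     /* PCRE_ERROR_PARTIAL */
    RC_BADPARTIAL = -13   /* PCRE_ERROR_BADPARTIAL */
};

/* Hypothetical actions for a field that is validated as the user types. */
enum key_action {
    KEY_COMPLETE,     /* whole field matches: accept the character */
    KEY_ACCEPT,       /* valid so far: echo the character, wait for more */
    KEY_REJECT,       /* cannot lead to a match: beep, drop the character */
    KEY_BAD_PATTERN,  /* pattern breaks the PCRE_PARTIAL restrictions */
    KEY_ERROR         /* any other negative return code */
};

/* Map the value returned by pcre_exec(), called with PCRE_PARTIAL on the
   text typed so far, onto an action. Non-negative values mean a match. */
static enum key_action action_for(int rc) {
    if (rc >= 0)
        return KEY_COMPLETE;
    switch (rc) {
    case RC_PARTIAL:    return KEY_ACCEPT;
    case RC_NOMATCH:    return KEY_REJECT;
    case RC_BADPARTIAL: return KEY_BAD_PATTERN;
    default:            return KEY_ERROR;
    }
}
```

The application can check PCRE_INFO_OKPARTIAL once, when the pattern is compiled. That avoids ever reaching the KEY_BAD_PATTERN case while the user is typing.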


EXAMPLE OF PARTIAL MATCHING USING PCRETEST

If the escape sequence \P is present in a pcretest data line, the
PCRE_PARTIAL flag is used for the match. Here is a run of pcretest that
uses the date example quoted above:

    re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
  data> 25jun04\P
   0: 25jun04
   1: jun
  data> 25dec3\P
  Partial match
  data> 3ju\P
  Partial match
  data> 3juj\P
  No match
  data> j\P
  No match

The first data string is matched completely, so pcretest shows the
matched substrings. The remaining four strings do not match the
complete pattern, but the first two are partial matches. The same test,
using pcre_dfa_exec() matching (by means of the \D escape sequence),
produces the following output:

    re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
  data> 25jun04\P\D
   0: 25jun04
  data> 23dec3\P\D
  Partial match: 23dec3
  data> 3ju\P\D
  Partial match: 3ju
  data> 3juj\P\D
  No match
  data> j\P\D
  No match

Notice that in this case the portion of the string that was matched is
made available.

MULTI-SEGMENT MATCHING WITH pcre_dfa_exec()

When a partial match has been found using pcre_dfa_exec(), it is
possible to continue the match by providing additional subject data and
calling pcre_dfa_exec() again with the same compiled regular
expression, this time setting the PCRE_DFA_RESTART option. You must
also pass the same working space as before, because this is where
details of the previous partial match are stored. Here is an example
using pcretest, with the \R escape sequence setting the
PCRE_DFA_RESTART option (\P and \D are as above):

    re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
  data> 23ja\P\D
  Partial match: 23ja
  data> n05\R\D
   0: n05

The first call has "23ja" as the subject and requests partial
matching; the second call has "n05" as the subject for the continued
(restarted) match. Notice that when the match is complete, only the
last part is shown; PCRE does not retain the previously partially
matched string. It is up to the calling program to do that if it
needs to.
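
The restart mechanism can be pictured with a small C sketch that does not use PCRE. It is a hand-written matcher for the date pattern whose only memory between calls is a small workspace structure. This is loosely analogous to the working space passed to pcre_dfa_exec(). The structure records how far matching has progressed and which month names are still possible, but not the characters themselves. All names (struct date_ws, date_feed() and its restart argument) are hypothetical.

```c
#include <ctype.h>

enum status { NO_MATCH, PARTIAL, FULL };

static const char *const month_names[12] = {
    "jan", "feb", "mar", "apr", "may", "jun",
    "jul", "aug", "sep", "oct", "nov", "dec"
};

/* Hypothetical workspace: the state needed to resume matching
   ^\d?\d(jan|...|dec)\d\d$; the subject characters are not stored. */
struct date_ws {
    int phase;        /* 0 = day digits, 1 = month letters, 2 = year digits */
    int count;        /* characters consumed in the current phase */
    unsigned months;  /* bit i set: month_names[i] is still possible */
    int failed;       /* set once a character could not be matched */
};

static void date_ws_init(struct date_ws *ws) {
    ws->phase = 0;
    ws->count = 0;
    ws->months = 0xFFFu;   /* all twelve months possible */
    ws->failed = 0;
}

/* Advance the workspace by one character; returns 0 if it cannot match. */
static int date_step(struct date_ws *ws, char c) {
    unsigned char uc = (unsigned char)c;
    if (ws->phase == 0) {
        if (isdigit(uc) && ws->count < 2) {
            ws->count++;
            return 1;
        }
        if (ws->count == 0)
            return 0;                 /* at least one digit is required */
        ws->phase = 1;                /* day finished; c must start a month */
        ws->count = 0;
    }
    if (ws->phase == 1) {
        unsigned keep = 0;
        for (int i = 0; i < 12; i++)
            if (((ws->months >> i) & 1u) && month_names[i][ws->count] == c)
                keep |= 1u << i;
        if (keep == 0)
            return 0;
        ws->months = keep;
        if (++ws->count == 3) {
            ws->phase = 2;
            ws->count = 0;
        }
        return 1;
    }
    /* Phase 2: at most two digits; anything after them violates the $. */
    if (isdigit(uc) && ws->count < 2) {
        ws->count++;
        return 1;
    }
    return 0;
}

/* Match one segment. If restart is 0 the workspace is reset first;
   otherwise matching continues from the state left by the previous call. */
static enum status date_feed(struct date_ws *ws, const char *seg, int restart) {
    if (!restart)
        date_ws_init(ws);
    for (const char *p = seg; *p != '\0' && !ws->failed; p++)
        if (!date_step(ws, *p))
            ws->failed = 1;
    if (ws->failed)
        return NO_MATCH;
    if (ws->phase == 2 && ws->count == 2)
        return FULL;
    return (ws->phase > 0 || ws->count > 0) ? PARTIAL : NO_MATCH;
}
```

As in the pcretest run, feeding "23ja" gives PARTIAL, and feeding "n05" with restart set gives FULL. The second call recognizes "n05" as completing a date only because of the saved state. An application that needs the whole text "23jan05" must keep the earlier segment itself.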

You can set PCRE_PARTIAL with PCRE_DFA_RESTART to continue partial
matching over multiple segments. This facility can be used to pass very
long subject strings to pcre_dfa_exec(). However, some care is needed
for certain types of pattern.

1. If the pattern contains tests for the beginning or end of a line,
you need to pass the PCRE_NOTBOL or PCRE_NOTEOL options, as
appropriate, when the subject string for any call does not contain the
beginning or end of a line.

2. If the pattern contains backward assertions (including \b or \B),
you need to arrange for some overlap in the subject strings to allow
for this. For example, you could pass the subject in chunks that are
500 bytes long, but in a buffer of 700 bytes, with the starting offset
set to 200 and the previous 200 bytes at the start of the buffer.

3. Matching a subject string that is split into multiple segments does
not always produce exactly the same result as matching over one single
long string. The difference arises when there are multiple matching
possibilities, because a partial match result is given only when there
are no completed matches in a call to pcre_dfa_exec(). This means that
as soon as the shortest match has been found, continuation to a new
subject segment is no longer possible. Consider this pcretest example:

    re> /dog(sbody)?/
  data> do\P\D
  Partial match: do
  data> gsb\R\P\D
   0: g
  data> dogsbody\D
   0: dogsbody
   1: dog

The pattern matches the words "dog" or "dogsbody". When the subject is
presented in several parts ("do" and "gsb" being the first two) the
match stops when "dog" has been found, and it is not possible to
continue. On the other hand, if "dogsbody" is presented as a single
string, both matches are found.

Because of this phenomenon, it does not usually make sense to end a
pattern that is going to be matched in this way with a variable repeat.

4. Patterns that contain alternatives at the top level which do not all
start with the same pattern item may not work as expected. For example,
consider this pattern:

  1234|3789

If the first part of the subject is "ABC123", a partial match of the
first alternative is found at offset 3. There is no partial match for
the second alternative, because such a match does not start at the same
point in the subject string. Attempting to continue with the string
"789" does not yield a match, because only those alternatives that
match at one point in the subject are remembered. The problem arises
because the start of the second alternative matches within the first
alternative. There is no problem with anchored patterns or patterns
such as:

  1234|ABCD

where no string can be a partial match for both alternatives.
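
The bookkeeping for points 1 and 2 above can be sketched in C as follows. This is a minimal illustration under stated assumptions. "Line" is taken to mean the whole subject, so there is no multiline matching. The structure and function names are hypothetical. The not_bol and not_eol fields only mark where a real caller would pass PCRE_NOTBOL and PCRE_NOTEOL. With 500-byte chunks and a 200-byte overlap, the sketch reproduces the 700-byte buffer and starting offset of 200 described in point 2.

```c
#include <stddef.h>

/* Hypothetical description of one call in a multi-segment match. A real
   caller would hand the buffer, its length and start_offset to
   pcre_dfa_exec() and turn not_bol/not_eol into PCRE_NOTBOL/PCRE_NOTEOL. */
struct segment_call {
    size_t buf_start;     /* index in the full subject where the buffer begins */
    size_t buf_len;       /* bytes in the buffer: lookbehind overlap + new data */
    size_t start_offset;  /* where matching resumes inside the buffer */
    int not_bol;          /* buffer does not contain the start of the subject */
    int not_eol;          /* buffer does not contain the end of the subject */
};

/* Plan call number k for a subject of `total` bytes, fed in chunks of
   `chunk` new bytes, each preceded by up to `overlap` earlier bytes so
   that backward assertions such as \b can see the preceding text. */
static struct segment_call plan_segment(size_t total, size_t chunk,
                                        size_t overlap, size_t k) {
    struct segment_call c;
    size_t new_start = k * chunk;                      /* first new byte */
    size_t new_end = new_start + chunk < total ? new_start + chunk : total;
    size_t back = new_start < overlap ? new_start : overlap;

    c.buf_start = new_start - back;
    c.buf_len = new_end - c.buf_start;
    c.start_offset = back;
    c.not_bol = new_start > 0;
    c.not_eol = new_end < total;
    return c;
}
```

One plausible driver loops over k = 0, 1, 2, ... while k * chunk is less than the total length. It would set PCRE_PARTIAL on each call and add PCRE_DFA_RESTART after a call that returned PCRE_ERROR_PARTIAL. Point 3 still applies: such a loop cannot recover longer alternatives once a shorter match has completed.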


AUTHOR

Philip Hazel
University Computing Service
Cambridge CB2 3QH, England.


REVISION

Last updated: 04 June 2007
Copyright (c) 1997-2007 University of Cambridge.