PCRECALLOUT(3) Library Functions Manual PCRECALLOUT(3)

NAME
PCRE - Perl-compatible regular expressions

PCRE CALLOUTS

  int (*pcre_callout)(pcre_callout_block *);

PCRE provides a feature called "callout", which is a means of temporarily
passing control to the caller of PCRE in the middle of pattern matching.
The caller of PCRE provides an external function by putting its entry
point in the global variable pcre_callout. By default, this variable
contains NULL, which disables all calling out.

Within a regular expression, (?C) indicates the points at which the
external function is to be called. Different callout points can be
identified by putting a number less than 256 after the letter C; if no
number is given, the callout number is zero. For example, this pattern
has two callout points:

  (?C1)abc(?C2)def

If the PCRE_AUTO_CALLOUT option bit is set when pcre_compile() or
pcre_compile2() is called, PCRE automatically inserts callouts, all
with number 255, before each item in the pattern. For example, if
PCRE_AUTO_CALLOUT is used with the pattern

  A(\d{2}|--)

it is processed as if it were

  (?C255)A(?C255)((?C255)\d{2}(?C255)|(?C255)-(?C255)-(?C255))(?C255)

Notice that there is a callout before and after each parenthesis and
alternation bar. Automatic callouts can be used for tracking the
progress of pattern matching. The pcretest command has an option that
sets automatic callouts; when it is used, the output indicates how the
pattern is matched. This is useful information when you are trying to
optimize the performance of a particular pattern.
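One way to use automatic callouts for performance work is to count how
often each point in the pattern is reached. The following is a minimal
sketch, not part of PCRE: struct pattern_profile, profile_reset(),
profile_record(), profile_hottest() and the limit MAX_PATTERN_LEN are
hypothetical names, and the sketch assumes the counters are indexed by the
pattern_position field described later in this page (present only from
callout block version 1).

```c
#include <assert.h>
#include <string.h>

/* Hypothetical profiling helpers (not part of the PCRE API). Counters are
   indexed by pattern offset; MAX_PATTERN_LEN is an arbitrary limit chosen
   for this sketch. */
#define MAX_PATTERN_LEN 256

struct pattern_profile {
  unsigned long hits[MAX_PATTERN_LEN];
};

/* Clear all counters before a new match. */
void profile_reset(struct pattern_profile *p)
{
  assert(p != NULL);
  memset(p->hits, 0, sizeof p->hits);
}

/* Count one callout at the given pattern offset; offsets outside the
   array are ignored. */
void profile_record(struct pattern_profile *p, int pattern_position)
{
  assert(p != NULL);
  if (pattern_position >= 0 && pattern_position < MAX_PATTERN_LEN)
    p->hits[pattern_position]++;
}

/* Return the pattern offset reached most often, or -1 if nothing has been
   recorded. Ties go to the lowest offset. */
int profile_hottest(const struct pattern_profile *p)
{
  int best = -1;
  unsigned long best_hits = 0;

  assert(p != NULL);
  for (int i = 0; i < MAX_PATTERN_LEN; i++) {
    if (p->hits[i] > best_hits) {
      best_hits = p->hits[i];
      best = i;
    }
  }
  return best;
}
```

In a callout function installed through pcre_callout, with callout_data
pointing at a struct pattern_profile, the body could be
"if (cb->version >= 1) profile_record(cb->callout_data,
cb->pattern_position); return 0;". Returning zero leaves the match itself
unaffected.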

MISSING CALLOUTS

You should be aware that, because of optimizations in the way PCRE
matches patterns by default, callouts sometimes do not happen. For
example, if the pattern is
49
50 ab(?C4)cd
51
52 PCRE knows that any matching string must contain the letter "d". If the
53 subject string is "abyz", the lack of "d" means that matching doesn't
54 ever start, and the callout is never reached. However, with "abyd",
55 though the result is still no match, the callout is obeyed.
56
57 If the pattern is studied, PCRE knows the minimum length of a matching
58 string, and will immediately give a "no match" return without actually
59 running a match if the subject is not long enough, or, for unanchored
60 patterns, if it has been scanned far enough.
61
62 You can disable these optimizations by passing the PCRE_NO_START_OPTI‐
63 MIZE option to pcre_exec() or pcre_dfa_exec(). This slows down the
64 matching process, but does ensure that callouts such as the example
65 above are obeyed.

THE CALLOUT INTERFACE

During matching, when PCRE reaches a callout point, the external function
defined by pcre_callout is called (if it is set). This applies to both
the pcre_exec() and the pcre_dfa_exec() matching functions. The only
argument to the callout function is a pointer to a pcre_callout_block
structure, which contains the following fields:

  int         version;
  int         callout_number;
  int        *offset_vector;
  const char *subject;
  int         subject_length;
  int         start_match;
  int         current_position;
  int         capture_top;
  int         capture_last;
  void       *callout_data;
  int         pattern_position;
  int         next_item_length;

The version field is an integer containing the version number of the
block format. The initial version was 0; the current version is 1. The
version number will change again in future if additional fields are
added, but the intention is never to remove any of the existing fields.

The callout_number field contains the number of the callout, as compiled
into the pattern (that is, the number after ?C for manual callouts, and
255 for automatically generated callouts).
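A callout function usually branches on callout_number. Below is a minimal
sketch for the pattern (?C1)abc(?C2)def shown earlier; enum callout_point
and classify_callout() are hypothetical names used only for illustration.
Because automatic callouts and an explicit (?C255) share the number 255,
the sketch reports both the same way.

```c
#include <assert.h>

/* Hypothetical labels for the callout points of (?C1)abc(?C2)def. */
enum callout_point {
  POINT_BEFORE_ABC,   /* (?C1) */
  POINT_BEFORE_DEF,   /* (?C2) */
  POINT_NUMBER_255,   /* automatic callout, or an explicit (?C255) */
  POINT_OTHER         /* any other number, including the default 0 */
};

/* Map the callout_number field of the callout block to a callout point. */
enum callout_point classify_callout(int callout_number)
{
  /* Callout numbers are always in the range 0 to 255. */
  assert(callout_number >= 0 && callout_number <= 255);

  switch (callout_number) {
  case 1:
    return POINT_BEFORE_ABC;
  case 2:
    return POINT_BEFORE_DEF;
  case 255:
    return POINT_NUMBER_255;
  default:
    return POINT_OTHER;
  }
}
```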

The offset_vector field is a pointer to the vector of offsets that was
passed by the caller to pcre_exec() or pcre_dfa_exec(). When pcre_exec()
is used, the contents can be inspected in order to extract substrings
that have been matched so far, in the same way as for extracting
substrings after a match has completed. For pcre_dfa_exec(), this field
is not useful.
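The offsets are laid out as for a completed match: group n starts at
offset_vector[2*n] and ends at offset_vector[2*n+1], with -1 marking a
group that is not set. The minimal sketch below copies one group during a
pcre_exec() callout. copy_capture() is a hypothetical helper; it assumes
the vector passed to pcre_exec() is large enough for every group the
pattern can capture, reads only groups 1 and above (leaving the overall
match to start_match and current_position), and uses capture_top as the
upper bound.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (not part of the PCRE API): copy capturing group n
   as it stands at the moment of a pcre_exec() callout. subject,
   offset_vector and capture_top are the callout block fields of the same
   names. Returns the substring length, or -1 if the group has not been
   captured or buf is too small. */
int copy_capture(const char *subject, const int *offset_vector,
                 int capture_top, int n, char *buf, size_t bufsize)
{
  assert(subject != NULL && offset_vector != NULL && buf != NULL);

  /* Only groups 1 .. capture_top-1 can have been captured so far. */
  if (n < 1 || n >= capture_top)
    return -1;

  int start = offset_vector[2 * n];
  int end = offset_vector[2 * n + 1];
  if (start < 0 || end < start)   /* -1 marks an unset group */
    return -1;

  size_t len = (size_t)(end - start);
  if (len + 1 > bufsize)          /* leave room for the terminator */
    return -1;

  memcpy(buf, subject + start, len);
  buf[len] = '\0';
  return (int)len;
}
```

From a callout this would be called as copy_capture(cb->subject,
cb->offset_vector, cb->capture_top, 1, buf, sizeof buf).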

The subject and subject_length fields contain copies of the values that
were passed to pcre_exec() or pcre_dfa_exec().

The start_match field normally contains the offset within the subject
at which the current match attempt started. However, if the escape
sequence \K has been encountered, this value is changed to reflect the
modified starting point. If the pattern is not anchored, the callout
function may be called several times from the same point in the pattern
for different starting points in the subject.

The current_position field contains the offset within the subject of
the current match pointer.
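Taken together, start_match and current_position delimit the part of the
subject consumed by the current match attempt. The minimal sketch below
copies that span. copy_matched_so_far() is a hypothetical helper; rather
than assume current_position can never lie before start_match, it treats
that case as nothing to report.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (not part of the PCRE API): copy
   subject[start_match .. current_position), the text consumed so far by
   the current match attempt. The arguments correspond to the callout
   block fields of the same names. Returns the length copied, or -1 if
   the offsets are out of order or out of range, or buf is too small. */
int copy_matched_so_far(const char *subject, int subject_length,
                        int start_match, int current_position,
                        char *buf, size_t bufsize)
{
  assert(subject != NULL && buf != NULL);

  if (start_match < 0 || current_position < start_match ||
      current_position > subject_length)
    return -1;

  size_t len = (size_t)(current_position - start_match);
  if (len + 1 > bufsize)
    return -1;

  /* memcpy rather than strncpy: the subject may contain binary zeros. */
  memcpy(buf, subject + start_match, len);
  buf[len] = '\0';
  return (int)len;
}
```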

When the pcre_exec() function is used, the capture_top field contains
one more than the number of the highest numbered captured substring so
far. If no substrings have been captured, the value of capture_top is
one. This is always the case when pcre_dfa_exec() is used, because it
does not support captured substrings.

The capture_last field contains the number of the most recently
captured substring. If no substrings have been captured, its value is
-1. This is always the case when pcre_dfa_exec() is used.
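capture_last gives direct access to the group that has just been
captured. A minimal sketch follows; copy_last_capture() is a hypothetical
helper that assumes the offset vector is large enough for every group and
uses capture_top only as a defensive upper bound.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (not part of the PCRE API): copy the substring of
   the most recently captured group. capture_last is -1 when nothing has
   been captured, which is always the case with pcre_dfa_exec().
   Returns the length, or -1 if there is no such capture or buf is too
   small. */
int copy_last_capture(const char *subject, const int *offset_vector,
                      int capture_top, int capture_last,
                      char *buf, size_t bufsize)
{
  assert(subject != NULL && offset_vector != NULL && buf != NULL);

  /* Defensive: the group number must be below capture_top. */
  if (capture_last < 1 || capture_last >= capture_top)
    return -1;

  int start = offset_vector[2 * capture_last];
  int end = offset_vector[2 * capture_last + 1];
  if (start < 0 || end < start)
    return -1;

  size_t len = (size_t)(end - start);
  if (len + 1 > bufsize)
    return -1;

  memcpy(buf, subject + start, len);
  buf[len] = '\0';
  return (int)len;
}
```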

The callout_data field contains a value that is passed to pcre_exec()
or pcre_dfa_exec() specifically so that it can be passed back in
callouts. It is passed in the callout_data field of the pcre_extra data
structure, with the PCRE_EXTRA_CALLOUT_DATA bit set in that structure's
flags field. If no such data was passed, the value of callout_data in a
pcre_callout_block structure is NULL. There is a description of the
pcre_extra structure in the pcreapi documentation.
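A typical use of callout_data is to give the callout function somewhere
to keep per-match state without resorting to globals. The sketch below is
hypothetical (struct callout_stats, stats_init() and stats_callout() are
not PCRE names) and shows the conversion from the void pointer back to
the caller's own type. The wiring to the real library, which sets the
callout_data field and the PCRE_EXTRA_CALLOUT_DATA flag of a pcre_extra
structure, appears only as a comment because it needs <pcre.h> and the
PCRE library.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-match state owned by the caller (not a PCRE type). */
struct callout_stats {
  int calls;      /* number of callouts seen */
  int furthest;   /* largest current_position seen, or -1 */
};

void stats_init(struct callout_stats *s)
{
  s->calls = 0;
  s->furthest = -1;
}

/* Body of a callout: callout_data comes back from PCRE as void *, so it
   is converted to the caller's type. Returning 0 lets matching continue
   as normal. */
int stats_callout(void *callout_data, int current_position)
{
  struct callout_stats *s = callout_data;

  assert(s != NULL);   /* NULL means no callout_data was supplied */
  s->calls++;
  if (current_position > s->furthest)
    s->furthest = current_position;
  return 0;
}

/* Wiring with the real library (needs <pcre.h>; sketch only):

     static int my_callout(pcre_callout_block *cb)
     {
       return stats_callout(cb->callout_data, cb->current_position);
     }

     struct callout_stats stats;
     pcre_extra extra;

     stats_init(&stats);
     memset(&extra, 0, sizeof extra);
     extra.flags = PCRE_EXTRA_CALLOUT_DATA;
     extra.callout_data = &stats;
     pcre_callout = my_callout;
     rc = pcre_exec(re, &extra, subject, length, 0, 0, ovector, 30);
*/
```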

The pattern_position field is present from version 1 of the
pcre_callout_block structure. It contains the offset within the pattern
string of the next item to be matched.

The next_item_length field is present from version 1 of the
pcre_callout_block structure. It contains the length of the next item
to be matched in the pattern string. When the callout immediately
precedes an alternation bar, a closing parenthesis, or the end of the
pattern, the length is zero. When the callout precedes an opening
parenthesis, the length is that of the entire subpattern.

The pattern_position and next_item_length fields are intended to help
in distinguishing between different automatic callouts, which all have
the same callout number. However, they are set for all callouts, not
only automatic ones.
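The callout block carries offsets into the pattern but not the pattern
itself, so a callout that wants to show the next item, for example when
tracing automatic callouts, needs the pattern string from elsewhere (for
instance through callout_data). copy_next_item() below is a hypothetical
helper that extracts the item and refuses blocks older than version 1.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (not part of the PCRE API): copy the pattern item
   that a callout precedes. pattern is the string that was compiled; the
   other arguments are the callout block fields of the same names.
   Returns the item length, or -1 if the block is older than version 1
   (which lacks these fields), the offsets do not fit the pattern, or buf
   is too small. A length of 0 means the callout precedes "|", ")" or
   the end of the pattern, and buf is set to an empty string. */
int copy_next_item(const char *pattern, int version,
                   int pattern_position, int next_item_length,
                   char *buf, size_t bufsize)
{
  assert(pattern != NULL && buf != NULL);

  if (version < 1)
    return -1;

  size_t pattern_length = strlen(pattern);
  if (pattern_position < 0 || next_item_length < 0 ||
      (size_t)pattern_position + (size_t)next_item_length > pattern_length)
    return -1;

  size_t len = (size_t)next_item_length;
  if (len + 1 > bufsize)
    return -1;

  memcpy(buf, pattern + pattern_position, len);
  buf[len] = '\0';
  return (int)len;
}
```

For the pattern A(\d{2}|--), a callout before the opening parenthesis has
pattern_position 1 and a next_item_length covering the whole subpattern,
so the helper yields "(\d{2}|--)".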

RETURN VALUES

The external callout function returns an integer to PCRE. If the value
is zero, matching proceeds as normal. If the value is greater than
zero, matching fails at the current point, but the testing of other
matching possibilities goes ahead, just as if a lookahead assertion had
failed. If the value is less than zero, the match is abandoned, and
pcre_exec() or pcre_dfa_exec() returns the negative value.

Negative values should normally be chosen from the set of PCRE_ERROR_xxx
values. In particular, PCRE_ERROR_NOMATCH forces a standard "no match"
failure. The error number PCRE_ERROR_CALLOUT is reserved for use by
callout functions; it will never be used by PCRE itself.
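On the caller's side, the reserved PCRE_ERROR_CALLOUT code makes it
possible to tell a match abandoned by a callout apart from an ordinary
failure. A minimal sketch, using a hypothetical classify_result() helper
and fallback definitions that apply only when pcre.h has not been
included. A callout that returns PCRE_ERROR_NOMATCH cannot be
distinguished from a genuine "no match" in this way.

```c
#include <assert.h>

/* Values as defined in pcre.h; used only if pcre.h is not included. */
#ifndef PCRE_ERROR_NOMATCH
#define PCRE_ERROR_NOMATCH (-1)
#endif
#ifndef PCRE_ERROR_CALLOUT
#define PCRE_ERROR_CALLOUT (-9)
#endif

/* Hypothetical outcome categories (not a PCRE type). */
enum match_outcome {
  OUTCOME_MATCH,
  OUTCOME_NO_MATCH,
  OUTCOME_CALLOUT_ABORT,
  OUTCOME_ERROR
};

/* Classify the value returned by pcre_exec() or pcre_dfa_exec(). A
   non-negative value means a match was found. PCRE never returns
   PCRE_ERROR_CALLOUT itself, so that value means a callout abandoned
   the match. Any other negative value is treated as an error. */
enum match_outcome classify_result(int rc)
{
  if (rc >= 0)
    return OUTCOME_MATCH;

  switch (rc) {
  case PCRE_ERROR_NOMATCH:
    return OUTCOME_NO_MATCH;
  case PCRE_ERROR_CALLOUT:
    return OUTCOME_CALLOUT_ABORT;
  default:
    return OUTCOME_ERROR;
  }
}
```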

AUTHOR

Philip Hazel
University Computing Service
Cambridge CB2 3QH, England.

REVISION

Last updated: 29 September 2009
Copyright (c) 1997-2009 University of Cambridge.
174
175
176
177 PCRECALLOUT(3)