PCRECALLOUT(3)          Library Functions Manual          PCRECALLOUT(3)

PCRE - Perl-compatible regular expressions

int (*pcre_callout)(pcre_callout_block *);

PCRE provides a feature called "callout", which is a means of
temporarily passing control to the caller of PCRE in the middle of
pattern matching. The caller of PCRE provides an external function by
putting its entry point in the global variable pcre_callout. By default,
this variable contains NULL, which disables all calling out.

Within a regular expression, (?C) indicates the points at which the
external function is to be called. Different callout points can be
identified by putting a number less than 256 after the letter C. The
default value is zero. For example, this pattern has two callout points:

  (?C1)abc(?C2)def

If the PCRE_AUTO_CALLOUT option bit is set when pcre_compile() is
called, PCRE automatically inserts callouts, all with number 255, before
each item in the pattern. For example, if PCRE_AUTO_CALLOUT is used with
the pattern

  A(\d{2}|--)

it is processed as if it were

  (?C255)A(?C255)((?C255)\d{2}(?C255)|(?C255)-(?C255)-(?C255))(?C255)

Notice that there is a callout before and after each parenthesis and
alternation bar. Automatic callouts can be used for tracking the
progress of pattern matching. The pcretest command has an option that
sets automatic callouts; when it is used, the output indicates how the
pattern is matched. This is useful information when you are trying to
optimize the performance of a particular pattern.

You should be aware that, because of optimizations in the way PCRE
matches patterns, callouts sometimes do not happen. For example, if the
pattern is

  ab(?C4)cd

PCRE knows that any matching string must contain the letter "d". If the
subject string is "abyz", the lack of "d" means that matching doesn't
ever start, and the callout is never reached. However, with "abyd",
though the result is still no match, the callout is obeyed.

During matching, when PCRE reaches a callout point, the external
function defined by pcre_callout is called (if it is set). This applies
to both the pcre_exec() and the pcre_dfa_exec() matching functions. The
only argument to the callout function is a pointer to a
pcre_callout_block structure. This structure contains the following
fields:

  int          version;
  int          callout_number;
  int         *offset_vector;
  const char  *subject;
  int          subject_length;
  int          start_match;
  int          current_position;
  int          capture_top;
  int          capture_last;
  void        *callout_data;
  int          pattern_position;
  int          next_item_length;

The version field is an integer containing the version number of the
block format. The initial version was 0; the current version is 1. The
version number will change again in future if additional fields are
added, but the intention is never to remove any of the existing fields.

The callout_number field contains the number of the callout, as
compiled into the pattern (that is, the number after ?C for manual
callouts, and 255 for automatically generated callouts).

The offset_vector field is a pointer to the vector of offsets that was
passed by the caller to pcre_exec() or pcre_dfa_exec(). When pcre_exec()
is used, the contents can be inspected in order to extract substrings
that have been matched so far, in the same way as for extracting
substrings after a match has completed. For pcre_dfa_exec() this field
is not useful.

The subject and subject_length fields contain copies of the values that
were passed to pcre_exec() or pcre_dfa_exec().

The start_match field normally contains the offset within the subject at
which the current match attempt started. However, if the escape sequence
\K has been encountered, this value is changed to reflect the modified
starting point. If the pattern is not anchored, the callout function may
be called several times from the same point in the pattern for different
starting points in the subject.

The current_position field contains the offset within the subject of the
current match pointer.

When the pcre_exec() function is used, the capture_top field contains
one more than the number of the highest numbered captured substring so
far. If no substrings have been captured, the value of capture_top is
one. This is always the case when pcre_dfa_exec() is used, because it
does not support captured substrings.

The capture_last field contains the number of the most recently captured
substring. If no substrings have been captured, its value is -1. This is
always the case when pcre_dfa_exec() is used.
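
As a rough illustration of reading a numbered capture from inside a
callout under pcre_exec(), the sketch below checks capture_top before
indexing offset_vector. So that it compiles without PCRE installed, it
uses a local stand-in type, sketch_callout_block, copied from the field
list above; real code would include <pcre.h> and use pcre_callout_block.
The helper name capture_so_far() is hypothetical and not part of PCRE.
The sketch assumes that unset groups hold negative offsets, as they do
in the vector after a completed match.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for pcre_callout_block, copied from the field list above so
   that the sketch compiles without <pcre.h>. Assumption: real code uses
   the PCRE type instead. */
typedef struct sketch_callout_block {
    int          version;
    int          callout_number;
    int         *offset_vector;
    const char  *subject;
    int          subject_length;
    int          start_match;
    int          current_position;
    int          capture_top;
    int          capture_last;
    void        *callout_data;
    int          pattern_position;
    int          next_item_length;
} sketch_callout_block;

/* Hypothetical helper: returns the length of capture group n as captured
   so far and stores a pointer to its first character in *start. Returns
   -1 if the group has not been captured yet. Only numbered groups
   (n >= 1) are handled. */
int capture_so_far(const sketch_callout_block *cb, int n, const char **start)
{
    int so, eo;

    assert(cb != NULL && start != NULL);
    if (n < 1 || n >= cb->capture_top || cb->offset_vector == NULL)
        return -1;              /* beyond the highest captured group */
    so = cb->offset_vector[2 * n];
    eo = cb->offset_vector[2 * n + 1];
    if (so < 0 || eo < so)
        return -1;              /* numbered below capture_top but unset */
    *start = cb->subject + so;
    return eo - so;
}
```

The returned span is not NUL-terminated; it points into the subject
string, so callers would copy it or print it with a length.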

The callout_data field contains a value that is passed to pcre_exec() or
pcre_dfa_exec() specifically so that it can be passed back in callouts.
It is passed in the callout_data field of the pcre_extra data structure.
If no such data was passed, the value of callout_data in a
pcre_callout_block is NULL. There is a description of the pcre_extra
structure in the pcreapi documentation.
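
One possible use of callout_data is to carry per-match state into the
callout without global variables. The sketch below counts callouts and
records the furthest subject position reached. The names callout_stats
and stats_callout() are hypothetical, and sketch_callout_block is a
local stand-in copied from the field list above so that the example
compiles without <pcre.h>.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for pcre_callout_block, copied from the field list above.
   Assumption: real code includes <pcre.h> and uses the PCRE type. */
typedef struct sketch_callout_block {
    int          version;
    int          callout_number;
    int         *offset_vector;
    const char  *subject;
    int          subject_length;
    int          start_match;
    int          current_position;
    int          capture_top;
    int          capture_last;
    void        *callout_data;
    int          pattern_position;
    int          next_item_length;
} sketch_callout_block;

/* Hypothetical per-match state owned by the caller. */
typedef struct callout_stats {
    int calls;      /* number of callouts seen */
    int furthest;   /* largest current_position seen */
} callout_stats;

/* Records statistics and always returns 0 so that matching proceeds. */
int stats_callout(sketch_callout_block *cb)
{
    callout_stats *st;

    assert(cb != NULL);
    st = (callout_stats *)cb->callout_data;
    if (st == NULL)
        return 0;               /* no data was passed: nothing to record */
    st->calls++;
    if (cb->current_position > st->furthest)
        st->furthest = cb->current_position;
    return 0;
}
```

With the real library, the caller would point the callout_data field of
a pcre_extra block at a callout_stats object, set
PCRE_EXTRA_CALLOUT_DATA in its flags field, and pass that block to
pcre_exec() or pcre_dfa_exec().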

The pattern_position field is present from version 1 of the
pcre_callout_block structure. It contains the offset to the next item to
be matched in the pattern string.

The next_item_length field is present from version 1 of the
pcre_callout_block structure. It contains the length of the next item to
be matched in the pattern string. When the callout immediately precedes
an alternation bar, a closing parenthesis, or the end of the pattern,
the length is zero. When the callout precedes an opening parenthesis,
the length is that of the entire subpattern.

The pattern_position and next_item_length fields are intended to help in
distinguishing between different automatic callouts, which all have the
same callout number. However, they are set for all callouts.
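
To tell automatic callouts apart, a callout can slice the pattern text
using these two fields. The following is a minimal sketch under stated
assumptions: the caller still has the pattern string available,
next_item_text() is a hypothetical helper name, and
sketch_callout_block is a local stand-in copied from the field list
above so that the example compiles without <pcre.h>.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for pcre_callout_block, copied from the field list above.
   Assumption: real code includes <pcre.h> and uses the PCRE type. */
typedef struct sketch_callout_block {
    int          version;
    int          callout_number;
    int         *offset_vector;
    const char  *subject;
    int          subject_length;
    int          start_match;
    int          current_position;
    int          capture_top;
    int          capture_last;
    void        *callout_data;
    int          pattern_position;
    int          next_item_length;
} sketch_callout_block;

/* Hypothetical helper: copies the text of the next pattern item into buf
   as a NUL-terminated string and returns its length. Returns -1 if the
   block is version 0 (which lacks these fields), if the offsets fall
   outside the pattern, or if buf is too small. */
int next_item_text(const sketch_callout_block *cb, const char *pattern,
                   char *buf, size_t buflen)
{
    size_t plen, pos, len;

    assert(cb != NULL && pattern != NULL && buf != NULL);
    if (cb->version < 1)
        return -1;
    if (cb->pattern_position < 0 || cb->next_item_length < 0)
        return -1;
    plen = strlen(pattern);
    pos = (size_t)cb->pattern_position;
    len = (size_t)cb->next_item_length;
    if (pos > plen || len > plen - pos || len >= buflen)
        return -1;
    memcpy(buf, pattern + pos, len);
    buf[len] = '\0';
    return cb->next_item_length;
}
```

For the pattern A(\d{2}|--), a block built by hand with
pattern_position 2 and next_item_length 5 yields the text \d{2}; a
length of zero yields an empty string.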

The external callout function returns an integer to PCRE. If the value
is zero, matching proceeds as normal. If the value is greater than zero,
matching fails at the current point, but the testing of other matching
possibilities goes ahead, just as if a lookahead assertion had failed.
If the value is less than zero, the match is abandoned, and pcre_exec()
(or pcre_dfa_exec()) returns the negative value.

Negative values should normally be chosen from the set of PCRE_ERROR_xxx
values. In particular, PCRE_ERROR_NOMATCH forces a standard "no match"
failure. The error number PCRE_ERROR_CALLOUT is reserved for use by
callout functions; it will never be used by PCRE itself.
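
The three kinds of return value can be sketched with a callout that
applies a simple policy: proceed normally, fail at one chosen callout
number, or abandon the match once a budget of callouts is used up. The
names callout_policy and policy_callout() are hypothetical, and
sketch_callout_block is a local stand-in copied from the field list
above so that the example compiles without <pcre.h>. The fallback
definition of PCRE_ERROR_CALLOUT uses the value from pcre.h (-9) and
exists only because the sketch does not include that header.

```c
#include <assert.h>
#include <stddef.h>

#ifndef PCRE_ERROR_CALLOUT
#define PCRE_ERROR_CALLOUT (-9)   /* fallback; normally comes from pcre.h */
#endif

/* Stand-in for pcre_callout_block, copied from the field list above.
   Assumption: real code includes <pcre.h> and uses the PCRE type. */
typedef struct sketch_callout_block {
    int          version;
    int          callout_number;
    int         *offset_vector;
    const char  *subject;
    int          subject_length;
    int          start_match;
    int          current_position;
    int          capture_top;
    int          capture_last;
    void        *callout_data;
    int          pattern_position;
    int          next_item_length;
} sketch_callout_block;

/* Hypothetical policy passed in through callout_data. */
typedef struct callout_policy {
    int remaining;      /* callouts still allowed before giving up */
    int reject_number;  /* callout number at which to force a local failure */
} callout_policy;

int policy_callout(sketch_callout_block *cb)
{
    callout_policy *pol;

    assert(cb != NULL);
    pol = (callout_policy *)cb->callout_data;
    if (pol == NULL)
        return 0;                       /* zero: matching proceeds as normal */
    if (pol->remaining <= 0)
        return PCRE_ERROR_CALLOUT;      /* negative: the match is abandoned */
    pol->remaining--;
    if (cb->callout_number == pol->reject_number)
        return 1;                       /* positive: fail here, try alternatives */
    return 0;
}
```

Returning PCRE_ERROR_CALLOUT rather than PCRE_ERROR_NOMATCH lets the
caller distinguish "the callout gave up" from an ordinary failure to
match.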

Philip Hazel
University Computing Service
Cambridge CB2 3QH, England.

Last updated: 29 May 2007
Copyright (c) 1997-2007 University of Cambridge.