Tcl_RegExpMatch(3)          Tcl Library Procedures          Tcl_RegExpMatch(3)

______________________________________________________________________________

NAME
       Tcl_RegExpMatch, Tcl_RegExpCompile, Tcl_RegExpExec, Tcl_RegExpRange,
       Tcl_GetRegExpFromObj, Tcl_RegExpMatchObj, Tcl_RegExpExecObj,
       Tcl_RegExpGetInfo - Pattern matching with regular expressions

SYNOPSIS
       #include <tcl.h>

       int
       Tcl_RegExpMatchObj(interp, strObj, patObj)

       int
       Tcl_RegExpMatch(interp, string, pattern)

       Tcl_RegExp
       Tcl_RegExpCompile(interp, pattern)

       int
       Tcl_RegExpExec(interp, regexp, string, start)

       void
       Tcl_RegExpRange(regexp, index, startPtr, endPtr)

       Tcl_RegExp
       Tcl_GetRegExpFromObj(interp, patObj, cflags)

       int
       Tcl_RegExpExecObj(interp, regexp, objPtr, offset, nmatches, eflags)

       void
       Tcl_RegExpGetInfo(regexp, infoPtr)

ARGUMENTS
       Tcl_Interp *interp (in)
              Tcl interpreter to use for error reporting. The interpreter
              may be NULL if no error reporting is desired.

       Tcl_Obj *strObj (in/out)
              Refers to the object from which to get the string to search.
              The internal representation of the object may be converted to
              a form that can be efficiently searched.

       Tcl_Obj *patObj (in/out)
              Refers to the object from which to get a regular expression.
              The compiled regular expression is cached in the object.

       char *string (in)
              String to check for a match with a regular expression.

       CONST char *pattern (in)
              String in the form of a regular expression pattern.

       Tcl_RegExp regexp (in)
              Compiled regular expression. Must have been returned
              previously by Tcl_GetRegExpFromObj or Tcl_RegExpCompile.

       char *start (in)
              If string is just a portion of some other string, this
              argument identifies the beginning of the larger string. If it
              isn't the same as string, then no ^ matches will be allowed.

       int index (in)
              Specifies which range is desired: 0 means the range of the
              entire match, 1 or greater means the range that matched a
              parenthesized sub-expression.

       CONST char **startPtr (out)
              The address of the first character in the range is stored
              here, or NULL if there is no such range.

       CONST char **endPtr (out)
              The address of the character just after the last one in the
              range is stored here, or NULL if there is no such range.

       int cflags (in)
              OR-ed combination of compilation flags. See below for more
              information.

       Tcl_Obj *objPtr (in/out)
              An object which contains the string to check for a match with
              a regular expression.

       int offset (in)
              The character offset into the string where matching should
              begin. The value of the offset has no impact on ^ matches.
              This behavior is controlled by eflags.

       int nmatches (in)
              The number of matching subexpressions that should be
              remembered for later use. If this value is 0, then no
              subexpression match information will be computed. If the
              value is -1, then all of the matching subexpressions will be
              remembered. Any other value will be taken as the maximum
              number of subexpressions to remember.

       int eflags (in)
              OR-ed combination of the values TCL_REG_NOTBOL and
              TCL_REG_NOTEOL. See below for more information.

       Tcl_RegExpInfo *infoPtr (out)
              The address of the location where information about a
              previous match should be stored by Tcl_RegExpGetInfo.
______________________________________________________________________________

DESCRIPTION
       Tcl_RegExpMatch determines whether its string argument matches
       pattern, where pattern is interpreted as a regular expression using
       the rules in the re_syntax reference page. If there is a match then
       Tcl_RegExpMatch returns 1. If there is no match then Tcl_RegExpMatch
       returns 0. If an error occurs in the matching process (e.g. pattern
       is not a valid regular expression) then Tcl_RegExpMatch returns -1
       and leaves an error message in the interpreter result.
       Tcl_RegExpMatchObj is similar to Tcl_RegExpMatch except it operates
       on the Tcl objects strObj and patObj instead of UTF strings.
       Tcl_RegExpMatchObj is generally more efficient than Tcl_RegExpMatch,
       so it is the preferred interface.

       Tcl_RegExpCompile, Tcl_RegExpExec, and Tcl_RegExpRange provide
       lower-level access to the regular expression pattern matcher.
       Tcl_RegExpCompile compiles a regular expression string into the
       internal form used for efficient pattern matching. The return value
       is a token for this compiled form, which can be used in subsequent
       calls to Tcl_RegExpExec or Tcl_RegExpRange. If an error occurs while
       compiling the regular expression then Tcl_RegExpCompile returns NULL
       and leaves an error message in the interpreter result. Note: the
       return value from Tcl_RegExpCompile is only valid up to the next
       call to Tcl_RegExpCompile; it is not safe to retain these values for
       long periods of time.

       Tcl_RegExpExec executes the regular expression pattern matcher. It
       returns 1 if string contains a range of characters that match
       regexp, 0 if no match is found, and -1 if an error occurs. In the
       case of an error, Tcl_RegExpExec leaves an error message in the
       interpreter result. When searching a string for multiple matches of
       a pattern, it is important to distinguish between the start of the
       original string and the start of the current search. For example,
       when searching for the second occurrence of a match, the string
       argument might point to the character just after the first match;
       however, it is important for the pattern matcher to know that this
       is not the start of the entire string, so that it doesn't allow ^
       atoms in the pattern to match. The start argument provides this
       information by pointing to the start of the overall string
       containing string. Start will be less than or equal to string; if it
       is less than string then no ^ matches will be allowed.

       Tcl_RegExpRange may be invoked after Tcl_RegExpExec returns; it
       provides detailed information about what ranges of the string
       matched what parts of the pattern. Tcl_RegExpRange returns a pair of
       pointers in *startPtr and *endPtr that identify a range of
       characters in the source string for the most recent call to
       Tcl_RegExpExec. Index indicates which of several ranges is desired:
       if index is 0, information is returned about the overall range of
       characters that matched the entire pattern; otherwise, information
       is returned about the range of characters that matched the index'th
       parenthesized subexpression within the pattern. If there is no range
       corresponding to index then NULL is stored in *startPtr and *endPtr.

       Tcl_GetRegExpFromObj, Tcl_RegExpExecObj, and Tcl_RegExpGetInfo are
       object interfaces that provide the most direct control of Henry
       Spencer's regular expression library. Users who need to modify
       compilation and execution options directly should use these
       interfaces instead of calling the internal regexp functions. These
       interfaces handle the details of UTF to Unicode translations as well
       as providing improved performance through caching in the pattern and
       string objects.

       Tcl_GetRegExpFromObj attempts to return a compiled regular
       expression from the patObj. If the object does not already contain a
       compiled regular expression it will attempt to create one from the
       string in the object and assign it to the internal representation of
       the patObj. The return value of this function is of type Tcl_RegExp.
       The return value is a token for this compiled form, which can be
       used in subsequent calls to Tcl_RegExpExecObj or Tcl_RegExpGetInfo.
       If an error occurs while compiling the regular expression then
       Tcl_GetRegExpFromObj returns NULL and leaves an error message in the
       interpreter result. The regular expression token can be used as long
       as the internal representation of patObj refers to the compiled
       form. The cflags argument is a bitwise OR of zero or more of the
       following flags that control the compilation of patObj:

       TCL_REG_ADVANCED
              Compile advanced regular expressions (`AREs'). This mode
              corresponds to the normal regular expression syntax accepted
              by the Tcl regexp and regsub commands.

       TCL_REG_EXTENDED
              Compile extended regular expressions (`EREs'). This mode
              corresponds to the regular expression syntax recognized by
              Tcl 8.0 and earlier versions.

       TCL_REG_BASIC
              Compile basic regular expressions (`BREs'). This mode
              corresponds to the regular expression syntax recognized by
              common Unix utilities like sed and grep. This is the default
              if no flags are specified.

       TCL_REG_EXPANDED
              Compile the regular expression (basic, extended, or advanced)
              using an expanded syntax that allows comments and whitespace.
              This mode causes non-backslashed non-bracket-expression
              whitespace and #-to-end-of-line comments to be ignored.

       TCL_REG_QUOTE
              Compile a literal string, with all characters treated as
              ordinary characters.

       TCL_REG_NOCASE
              Compile for matching that ignores upper/lower case
              distinctions.

       TCL_REG_NEWLINE
              Compile for newline-sensitive matching. By default, newline
              is a completely ordinary character with no special meaning in
              either regular expressions or strings. With this flag, `[^'
              bracket expressions and `.' never match newline, `^' matches
              an empty string after any newline in addition to its normal
              function, and `$' matches an empty string before any newline
              in addition to its normal function. TCL_REG_NEWLINE is the
              bitwise OR of TCL_REG_NLSTOP and TCL_REG_NLANCH.

       TCL_REG_NLSTOP
              Compile for partial newline-sensitive matching, with the
              behavior of `[^' bracket expressions and `.' affected, but
              not the behavior of `^' and `$'. In this mode, `[^' bracket
              expressions and `.' never match newline.

       TCL_REG_NLANCH
              Compile for inverse partial newline-sensitive matching, with
              the behavior of `^' and `$' (the ``anchors'') affected, but
              not the behavior of `[^' bracket expressions and `.'. In this
              mode `^' matches an empty string after any newline in
              addition to its normal function, and `$' matches an empty
              string before any newline in addition to its normal function.

       TCL_REG_NOSUB
              Compile for matching that reports only success or failure,
              not what was matched. This reduces compile overhead and may
              improve performance. Subsequent calls to Tcl_RegExpGetInfo or
              Tcl_RegExpRange will not report any match information.

       TCL_REG_CANMATCH
              Compile for matching that reports the potential to complete a
              partial match given more text (see below).

       Only one of TCL_REG_EXTENDED, TCL_REG_ADVANCED, TCL_REG_BASIC, and
       TCL_REG_QUOTE may be specified.

       Tcl_RegExpExecObj executes the regular expression pattern matcher.
       It returns 1 if objPtr contains a range of characters that match
       regexp, 0 if no match is found, and -1 if an error occurs. In the
       case of an error, Tcl_RegExpExecObj leaves an error message in the
       interpreter result. The nmatches value indicates to the matcher how
       many subexpressions are of interest. If nmatches is 0, then no
       subexpression match information is recorded, which may allow the
       matcher to make various optimizations. If the value is -1, then all
       of the subexpressions in the pattern are remembered. If the value is
       a positive integer, then only that number of subexpressions will be
       remembered. Matching begins at the specified Unicode character index
       given by offset. Unlike Tcl_RegExpExec, the behavior of anchors is
       not affected by the offset value. Instead the behavior of the
       anchors is explicitly controlled by the eflags argument, which is a
       bitwise OR of zero or more of the following flags:

       TCL_REG_NOTBOL
              The starting character will not be treated as the beginning
              of a line or the beginning of the string, so `^' will not
              match there. Note that this flag has no effect on how `\A'
              matches.

       TCL_REG_NOTEOL
              The last character in the string will not be treated as the
              end of a line or the end of the string, so `$' will not match
              there. Note that this flag has no effect on how `\Z' matches.

       Tcl_RegExpGetInfo retrieves information about the last match
       performed with a given regular expression regexp. The infoPtr
       argument contains a pointer to a structure that is defined as
       follows:

              typedef struct Tcl_RegExpInfo {
                  int nsubs;
                  Tcl_RegExpIndices *matches;
                  long extendStart;
              } Tcl_RegExpInfo;

       The nsubs field contains a count of the number of parenthesized
       subexpressions within the regular expression. If the TCL_REG_NOSUB
       flag was used, then this value will be zero. The matches field
       points to an array of nsubs+1 values that indicate the bounds of
       each subexpression matched. The first element in the array refers to
       the range matched by the entire regular expression, and subsequent
       elements refer to the parenthesized subexpressions in the order that
       they appear in the pattern. Each element is a structure that is
       defined as follows:

              typedef struct Tcl_RegExpIndices {
                  long start;
                  long end;
              } Tcl_RegExpIndices;

       The start and end values are Unicode character indices relative to
       the offset location within objPtr where matching began. The start
       index identifies the first character of the matched subexpression.
       The end index identifies the first character after the matched
       subexpression. If the subexpression matched the empty string, then
       start and end will be equal. If the subexpression did not
       participate in the match, then start and end will be set to -1.

       The extendStart field in Tcl_RegExpInfo is only set if the
       TCL_REG_CANMATCH flag was used. It indicates the first character in
       the string where a match could occur. If a match was found, this
       will be the same as the beginning of the current match. If no match
       was found, then it indicates the earliest point at which a match
       might occur if additional text is appended to the string. If no
       match is possible even with further text, this field will be set to
       -1.

SEE ALSO
       re_syntax(n)

KEYWORDS
       match, pattern, regular expression, string, subexpression,
       Tcl_RegExpIndices, Tcl_RegExpInfo

Tcl                                   8.1                   Tcl_RegExpMatch(3)