Tcl_RegExpMatch(3)          Tcl Library Procedures          Tcl_RegExpMatch(3)

______________________________________________________________________________

NAME

       Tcl_RegExpMatch, Tcl_RegExpCompile, Tcl_RegExpExec, Tcl_RegExpRange,
       Tcl_GetRegExpFromObj, Tcl_RegExpMatchObj, Tcl_RegExpExecObj,
       Tcl_RegExpGetInfo - Pattern matching with regular expressions

SYNOPSIS

       #include <tcl.h>

       int
       Tcl_RegExpMatchObj(interp, textObj, patObj)

       int
       Tcl_RegExpMatch(interp, text, pattern)

       Tcl_RegExp
       Tcl_RegExpCompile(interp, pattern)

       int
       Tcl_RegExpExec(interp, regexp, text, start)

       void
       Tcl_RegExpRange(regexp, index, startPtr, endPtr)

       Tcl_RegExp
       Tcl_GetRegExpFromObj(interp, patObj, cflags)

       int
       Tcl_RegExpExecObj(interp, regexp, textObj, offset, nmatches, eflags)

       void
       Tcl_RegExpGetInfo(regexp, infoPtr)

ARGUMENTS

       Tcl_Interp *interp (in)              Tcl interpreter to use for error
                                            reporting. The interpreter may be
                                            NULL if no error reporting is
                                            desired.

       Tcl_Obj *textObj (in/out)            Refers to the value from which to
                                            get the text to search. The
                                            internal representation of the
                                            value may be converted to a form
                                            that can be efficiently searched.

       Tcl_Obj *patObj (in/out)             Refers to the value from which to
                                            get a regular expression. The
                                            compiled regular expression is
                                            cached in the value.

       const char *text (in)                Text to search for a match with a
                                            regular expression.

       const char *pattern (in)             String in the form of a regular
                                            expression pattern.

       Tcl_RegExp regexp (in)               Compiled regular expression. Must
                                            have been returned previously by
                                            Tcl_GetRegExpFromObj or
                                            Tcl_RegExpCompile.

       const char *start (in)               If text is just a portion of some
                                            other string, this argument
                                            identifies the beginning of the
                                            larger string. If it is not the
                                            same as text, then no “^” matches
                                            will be allowed.

       int index (in)                       Specifies which range is desired:
                                            0 means the range of the entire
                                            match, 1 or greater means the
                                            range that matched a parenthesized
                                            sub-expression.

       const char **startPtr (out)          The address of the first character
                                            in the range is stored here, or
                                            NULL if there is no such range.

       const char **endPtr (out)            The address of the character just
                                            after the last one in the range is
                                            stored here, or NULL if there is
                                            no such range.

       int cflags (in)                      OR-ed combination of the
                                            compilation flags
                                            TCL_REG_ADVANCED,
                                            TCL_REG_EXTENDED, TCL_REG_BASIC,
                                            TCL_REG_EXPANDED, TCL_REG_QUOTE,
                                            TCL_REG_NOCASE, TCL_REG_NEWLINE,
                                            TCL_REG_NLSTOP, TCL_REG_NLANCH,
                                            TCL_REG_NOSUB, and
                                            TCL_REG_CANMATCH. See below for
                                            more information.

       int offset (in)                      The character offset into the text
                                            where matching should begin. The
                                            value of the offset has no impact
                                            on “^” matches; that behavior is
                                            controlled by eflags.

       int nmatches (in)                    The number of matching
                                            subexpressions that should be
                                            remembered for later use. If this
                                            value is 0, then no subexpression
                                            match information will be
                                            computed. If the value is -1, then
                                            all of the matching subexpressions
                                            will be remembered. Any other
                                            value will be taken as the maximum
                                            number of subexpressions to
                                            remember.

       int eflags (in)                      OR-ed combination of the execution
                                            flags TCL_REG_NOTBOL and
                                            TCL_REG_NOTEOL. See below for more
                                            information.

       Tcl_RegExpInfo *infoPtr (out)        The address of the location where
                                            information about a previous match
                                            should be stored by
                                            Tcl_RegExpGetInfo.
______________________________________________________________________________

DESCRIPTION

       Tcl_RegExpMatch determines whether its text argument matches pattern,
       where pattern is interpreted as a regular expression using the rules in
       the re_syntax reference page. If there is a match then Tcl_RegExpMatch
       returns 1. If there is no match then Tcl_RegExpMatch returns 0. If an
       error occurs in the matching process (e.g. pattern is not a valid
       regular expression) then Tcl_RegExpMatch returns -1 and leaves an error
       message in the interpreter result. Tcl_RegExpMatchObj is similar to
       Tcl_RegExpMatch except it operates on the Tcl values textObj and patObj
       instead of UTF strings. Tcl_RegExpMatchObj is generally more efficient
       than Tcl_RegExpMatch, because the compiled regular expression is cached
       in patObj and reused on later calls, so it is the preferred interface.

       Tcl_RegExpCompile, Tcl_RegExpExec, and Tcl_RegExpRange provide
       lower-level access to the regular expression pattern matcher.
       Tcl_RegExpCompile compiles a regular expression string into the
       internal form used for efficient pattern matching. The return value is
       a token for this compiled form, which can be used in subsequent calls
       to Tcl_RegExpExec or Tcl_RegExpRange. If an error occurs while
       compiling the regular expression then Tcl_RegExpCompile returns NULL
       and leaves an error message in the interpreter result. Note: the
       return value from Tcl_RegExpCompile is only valid up to the next call
       to Tcl_RegExpCompile; it is not safe to retain these values for long
       periods of time.

       Tcl_RegExpExec executes the regular expression pattern matcher. It
       returns 1 if text contains a range of characters that match regexp, 0
       if no match is found, and -1 if an error occurs. In the case of an
       error, Tcl_RegExpExec leaves an error message in the interpreter
       result. When searching a string for multiple matches of a pattern, it
       is important to distinguish between the start of the original string
       and the start of the current search. For example, when searching for
       the second occurrence of a match, the text argument might point to the
       character just after the first match; however, the pattern matcher
       must know that this is not the start of the entire string, so that it
       does not allow “^” atoms in the pattern to match. The start argument
       provides this information by pointing to the start of the overall
       string containing text. Start must be less than or equal to text; if
       it is less than text then no “^” matches will be allowed.

       Tcl_RegExpRange may be invoked after Tcl_RegExpExec returns; it
       provides detailed information about what ranges of the string matched
       what parts of the pattern. Tcl_RegExpRange returns a pair of pointers
       in *startPtr and *endPtr that identify a range of characters in the
       source string for the most recent call to Tcl_RegExpExec. Index
       indicates which of several ranges is desired: if index is 0,
       information is returned about the overall range of characters that
       matched the entire pattern; otherwise, information is returned about
       the range of characters that matched the index'th parenthesized
       subexpression within the pattern. If there is no range corresponding
       to index then NULL is stored in *startPtr and *endPtr.

       Tcl_GetRegExpFromObj, Tcl_RegExpExecObj, and Tcl_RegExpGetInfo are
       value interfaces that provide the most direct control of Henry
       Spencer's regular expression library. Users who need to modify
       compilation and execution options directly should use these
       interfaces instead of calling the internal regexp functions. These
       interfaces handle the details of UTF to Unicode translation and
       improve performance by caching the compiled and converted forms in
       the pattern and string values.

       Tcl_GetRegExpFromObj attempts to return a compiled regular expression
       from patObj. If the value does not already contain a compiled regular
       expression, it will attempt to create one from the string in the value
       and assign it to the internal representation of patObj. The return
       value of this function is of type Tcl_RegExp. The return value is a
       token for this compiled form, which can be used in subsequent calls to
       Tcl_RegExpExecObj or Tcl_RegExpGetInfo. If an error occurs while
       compiling the regular expression then Tcl_GetRegExpFromObj returns
       NULL and leaves an error message in the interpreter result. The
       regular expression token can be used as long as the internal
       representation of patObj refers to the compiled form. The cflags
       argument is a bit-wise OR of zero or more of the following flags that
       control the compilation of patObj:

         TCL_REG_ADVANCED
                Compile advanced regular expressions (“ARE”s). This mode
                corresponds to the normal regular expression syntax accepted
                by the Tcl regexp and regsub commands.

         TCL_REG_EXTENDED
                Compile extended regular expressions (“ERE”s). This mode
                corresponds to the regular expression syntax recognized by
                Tcl 8.0 and earlier versions.

         TCL_REG_BASIC
                Compile basic regular expressions (“BRE”s). This mode
                corresponds to the regular expression syntax recognized by
                common Unix utilities like sed and grep. This is the default
                if no flags are specified.

         TCL_REG_EXPANDED
                Compile the regular expression (basic, extended, or advanced)
                using an expanded syntax that allows comments and whitespace.
                This mode causes non-backslashed non-bracket-expression white
                space and #-to-end-of-line comments to be ignored.

         TCL_REG_QUOTE
                Compile a literal string, with all characters treated as
                ordinary characters.

         TCL_REG_NOCASE
                Compile for matching that ignores upper/lower case
                distinctions.

         TCL_REG_NEWLINE
                Compile for newline-sensitive matching. By default, newline
                is a completely ordinary character with no special meaning in
                either regular expressions or strings. With this flag, “[^”
                bracket expressions and “.” never match newline, “^” matches
                an empty string after any newline in addition to its normal
                function, and “$” matches an empty string before any newline
                in addition to its normal function. TCL_REG_NEWLINE is the
                bit-wise OR of TCL_REG_NLSTOP and TCL_REG_NLANCH.

         TCL_REG_NLSTOP
                Compile for partial newline-sensitive matching, with the
                behavior of “[^” bracket expressions and “.” affected, but not
                the behavior of “^” and “$”. In this mode, “[^” bracket
                expressions and “.” never match newline.

         TCL_REG_NLANCH
                Compile for inverse partial newline-sensitive matching, with
                the behavior of “^” and “$” (the “anchors”) affected, but not
                the behavior of “[^” bracket expressions and “.”. In this
                mode “^” matches an empty string after any newline in addition
                to its normal function, and “$” matches an empty string before
                any newline in addition to its normal function.

         TCL_REG_NOSUB
                Compile for matching that reports only success or failure, not
                what was matched. This reduces compile overhead and may
                improve performance. Subsequent calls to Tcl_RegExpGetInfo or
                Tcl_RegExpRange will not report any match information.

         TCL_REG_CANMATCH
                Compile for matching that reports the potential to complete a
                partial match given more text (see below).

       Only one of TCL_REG_EXTENDED, TCL_REG_ADVANCED, TCL_REG_BASIC, and
       TCL_REG_QUOTE may be specified.

       Tcl_RegExpExecObj executes the regular expression pattern matcher. It
       returns 1 if textObj contains a range of characters that match regexp,
       0 if no match is found, and -1 if an error occurs. In the case of an
       error, Tcl_RegExpExecObj leaves an error message in the interpreter
       result. The nmatches value indicates to the matcher how many
       subexpressions are of interest. If nmatches is 0, then no
       subexpression match information is recorded, which may allow the
       matcher to make various optimizations. If the value is -1, then all of
       the subexpressions in the pattern are remembered. If the value is a
       positive integer, then only that number of subexpressions will be
       remembered. Matching begins at the specified Unicode character index
       given by offset. Unlike Tcl_RegExpExec, the behavior of anchors is not
       affected by the offset value. Instead the behavior of the anchors is
       explicitly controlled by the eflags argument, which is a bit-wise OR
       of zero or more of the following flags:

         TCL_REG_NOTBOL
                The starting character will not be treated as the beginning of
                a line or the beginning of the string, so “^” will not match
                there. Note that this flag has no effect on how “\A” matches.

         TCL_REG_NOTEOL
                The last character in the string will not be treated as the
                end of a line or the end of the string, so “$” will not match
                there. Note that this flag has no effect on how “\Z” matches.

       Tcl_RegExpGetInfo retrieves information about the last match performed
       with a given regular expression regexp. The infoPtr argument points to
       a structure that is defined as follows:

              typedef struct Tcl_RegExpInfo {
                  int nsubs;
                  Tcl_RegExpIndices *matches;
                  long extendStart;
              } Tcl_RegExpInfo;

       The nsubs field contains a count of the number of parenthesized
       subexpressions within the regular expression. If the TCL_REG_NOSUB
       flag was used, then this value will be zero. The matches field points
       to an array of nsubs+1 values that indicate the bounds of each
       subexpression matched. The first element in the array refers to the
       range matched by the entire regular expression, and subsequent
       elements refer to the parenthesized subexpressions in the order that
       they appear in the pattern. Each element is a structure that is
       defined as follows:

              typedef struct Tcl_RegExpIndices {
                  long start;
                  long end;
              } Tcl_RegExpIndices;

       The start and end values are Unicode character indices relative to the
       offset location within textObj where matching began. The start index
       identifies the first character of the matched subexpression. The end
       index identifies the first character after the matched subexpression.
       If the subexpression matched the empty string, then start and end will
       be equal. If the subexpression did not participate in the match, then
       start and end will be set to -1.

       The extendStart field in Tcl_RegExpInfo is only set if the
       TCL_REG_CANMATCH flag was used. It indicates the first character in
       the string where a match could occur. If a match was found, this will
       be the same as the beginning of the current match. If no match was
       found, then it indicates the earliest point at which a match might
       occur if additional text is appended to the string. If no match is
       possible even with further text, this field will be set to -1.


SEE ALSO

       re_syntax(n)

KEYWORDS

       match, pattern, regular expression, string, subexpression,
       Tcl_RegExpIndices, Tcl_RegExpInfo

Tcl                                   8.1                   Tcl_RegExpMatch(3)