regcomp(3C)              Standard C Library Functions              regcomp(3C)

NAME
regcomp, regexec, regerror, regfree - regular expression matching

SYNOPSIS
#include <sys/types.h>
#include <regex.h>

int regcomp(regex_t *restrict preg, const char *restrict pattern,
    int cflags);

int regexec(const regex_t *restrict preg,
    const char *restrict string, size_t nmatch,
    regmatch_t pmatch[restrict], int eflags);

size_t regerror(int errcode, const regex_t *restrict preg,
    char *restrict errbuf, size_t errbuf_size);

void regfree(regex_t *preg);

DESCRIPTION
These functions interpret basic and extended regular expressions
(described on the regex(5) manual page).

The structure type regex_t contains at least the following member:

size_t re_nsub    Number of parenthesised subexpressions.

The structure type regmatch_t contains at least the following members:

regoff_t rm_so    Byte offset from start of string to start of
                  substring.

regoff_t rm_eo    Byte offset from start of string to the first
                  character after the end of substring.

regcomp()
The regcomp() function will compile the regular expression contained in
the string pointed to by the pattern argument and place the results in
the structure pointed to by preg. The cflags argument is the bitwise
inclusive OR of zero or more of the following flags, which are defined
in the header <regex.h>:

REG_EXTENDED    Use Extended Regular Expressions.

REG_ICASE       Ignore case in match.

REG_NOSUB       Report only success/fail in regexec().

REG_NEWLINE     Change the handling of NEWLINE characters, as described
                under regexec().

The default regular expression type for pattern is a Basic Regular
Expression. The application can specify Extended Regular Expressions
using the REG_EXTENDED cflags flag.

If the REG_NOSUB flag was not set in cflags, then regcomp() will set
re_nsub to the number of parenthesised subexpressions (delimited by
\(\) in basic regular expressions or () in extended regular
expressions) found in pattern.
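
As a minimal sketch of the paragraph above, the following compiles a
pattern and reads back re_nsub. The helper name count_subexpressions()
is hypothetical and is not part of the regex interface; it assumes the
caller does not pass REG_NOSUB in cflags.

```c
#include <sys/types.h>
#include <regex.h>

/*
 * Hypothetical helper: compile pattern with the given cflags and
 * return re_nsub, or -1 if regcomp() reports an error.
 */
long
count_subexpressions(const char *pattern, int cflags)
{
    regex_t re;
    long nsub;

    if (regcomp(&re, pattern, cflags) != 0)
        return (-1);    /* content of re is undefined; do not free it */
    nsub = (long)re.re_nsub;
    regfree(&re);
    return (nsub);
}
```

With this sketch, "(a)(b(c))" compiled with REG_EXTENDED should report
three subexpressions, while the same parentheses in a basic regular
expression are ordinary characters and count as none; the basic form
needs \(a\)\(b\).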

regexec()
The regexec() function compares the null-terminated string specified by
string with the compiled regular expression preg initialized by a
previous call to regcomp(). The eflags argument is the bitwise inclusive
OR of zero or more of the following flags, which are defined in the
header <regex.h>:

REG_NOTBOL    The first character of the string pointed to by string is
              not the beginning of the line. Therefore, the circumflex
              character (^), when taken as a special character, will
              not match the beginning of string.

REG_NOTEOL    The last character of the string pointed to by string is
              not the end of the line. Therefore, the dollar sign ($),
              when taken as a special character, will not match the end
              of string.
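
The effect of the two eflags values can be sketched as follows. The
helper matches_with_eflags() is a hypothetical name used only for
illustration; it compiles a basic regular expression with REG_NOSUB and
passes eflags straight through to regexec().

```c
#include <sys/types.h>
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of the regex interface): return 1 if
 * the basic regular expression in pattern matches string when
 * regexec() is called with eflags, 0 otherwise.
 */
int
matches_with_eflags(const char *pattern, const char *string, int eflags)
{
    regex_t re;
    int status;

    if (regcomp(&re, pattern, REG_NOSUB) != 0)
        return (0);
    status = regexec(&re, string, 0, NULL, eflags);
    regfree(&re);
    return (status == 0);
}
```

Under these assumptions, "^foo" matches "foobar" with eflags of 0 but
not with REG_NOTBOL, and "bar$" matches with 0 but not with REG_NOTEOL.
A pattern without anchors is unaffected by either flag.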

If nmatch is zero or REG_NOSUB was set in the cflags argument to
regcomp(), then regexec() will ignore the pmatch argument. Otherwise,
the pmatch argument must point to an array with at least nmatch
elements, and regexec() will fill in the elements of that array with
offsets of the substrings of string that correspond to the
parenthesised subexpressions of pattern: pmatch[i].rm_so will be the
byte offset of the beginning and pmatch[i].rm_eo will be one greater
than the byte offset of the end of substring i. (Subexpression i begins
at the ith matched open parenthesis, counting from 1.) Offsets in
pmatch[0] identify the substring that corresponds to the entire regular
expression. Unused elements of pmatch up to pmatch[nmatch−1] will be
filled with −1. If there are more than nmatch subexpressions in pattern
(pattern itself counts as a subexpression), then regexec() will still
do the match, but will record only the first nmatch substrings.
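
The offsets described above can be turned into substrings as in the
following sketch. Both function names, the pattern, and the buffer
handling are assumptions made for illustration. Because the pattern has
two subexpressions, nmatch is 3: pmatch[0] for the whole match plus one
element per subexpression.

```c
#include <sys/types.h>
#include <regex.h>
#include <string.h>

/*
 * Hypothetical helper: copy the text of substring i, as reported in
 * pm, from string into buf (of size bufsize). Returns the length of
 * the substring, or -1 if subexpression i did not take part in the
 * match or buf is too small.
 */
long
copy_submatch(const char *string, const regmatch_t pm[], size_t i,
    char *buf, size_t bufsize)
{
    size_t len;

    if (pm[i].rm_so == -1)
        return (-1);
    len = (size_t)(pm[i].rm_eo - pm[i].rm_so);
    if (len >= bufsize)
        return (-1);
    memcpy(buf, string + pm[i].rm_so, len);
    buf[len] = '\0';
    return ((long)len);
}

/*
 * Illustrative use: split "name=digits" found anywhere in string.
 * Returns 0 on success, non-zero on no match or error.
 */
int
split_assignment(const char *string, char *name, char *value, size_t size)
{
    regex_t re;
    regmatch_t pm[3];
    int status;

    if (regcomp(&re, "([a-z]+)=([0-9]+)", REG_EXTENDED) != 0)
        return (-1);
    status = regexec(&re, string, 3, pm, 0);
    regfree(&re);
    if (status != 0)
        return (status);
    if (copy_submatch(string, pm, 1, name, size) < 0 ||
        copy_submatch(string, pm, 2, value, size) < 0)
        return (-1);
    return (0);
}
```

The length of each substring is rm_eo minus rm_so, since rm_eo is the
offset of the first byte after the substring.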

When matching a basic or extended regular expression, any given
parenthesised subexpression of pattern might participate in the match
of several different substrings of string, or it might not match any
substring even though the pattern as a whole did match. The following
rules are used to determine which substrings to report in pmatch when
matching regular expressions:

1.  If subexpression i in a regular expression is not contained
    within another subexpression, and it participated in the match
    several times, then the byte offsets in pmatch[i] will delimit
    the last such match.

2.  If subexpression i is not contained within another subexpression,
    and it did not participate in an otherwise successful match, the
    byte offsets in pmatch[i] will be −1. A subexpression does not
    participate in the match when:

    The * or \{\} operator appears immediately after the subexpression
    in a basic regular expression, or the *, ?, or {} operator appears
    immediately after the subexpression in an extended regular
    expression, and the subexpression did not match (matched zero
    times)

    or

    The | operator is used in an extended regular expression to select
    this subexpression or another, and the other subexpression matched.

3.  If subexpression i is contained within another subexpression j,
    and i is not contained within any other subexpression that is
    contained within j, and a match of subexpression j is reported in
    pmatch[j], then the match or non-match of subexpression i
    reported in pmatch[i] will be as described in 1. and 2. above,
    but within the substring reported in pmatch[j] rather than the
    whole string.

4.  If subexpression i is contained in subexpression j, and the byte
    offsets in pmatch[j] are −1, then the byte offsets in pmatch[i]
    will also be −1.

5.  If subexpression i matched a zero-length string, then both byte
    offsets in pmatch[i] will be the byte offset of the character or
    null terminator immediately following the zero-length string.
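
Rules 1, 2, and 5 can be observed with a small sketch such as the one
below. The helper name ere_offsets() is hypothetical; it compiles its
pattern as an extended regular expression and hands the caller's pmatch
array to regexec() unchanged.

```c
#include <sys/types.h>
#include <regex.h>

/*
 * Hypothetical helper: compile pattern as an extended regular
 * expression, match it against string, and record up to nmatch offset
 * pairs in pm. Returns 0 on a match, non-zero otherwise.
 */
int
ere_offsets(const char *pattern, const char *string, size_t nmatch,
    regmatch_t pm[])
{
    regex_t re;
    int status;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return (-1);
    status = regexec(&re, string, nmatch, pm, 0);
    regfree(&re);
    return (status);
}
```

Matching "(a|b)*" against "ab" should leave pm[1] delimiting only the
final "b" (rule 1). Matching "(a)|(b)" against "b" should leave both
offsets in pm[1] at −1 because the other alternative matched (rule 2).
Matching "(x*)y" against "y" should set both offsets in pm[1] to 0, the
offset of the character following the zero-length match (rule 5).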

If, when regexec() is called, the locale is different from when the
regular expression was compiled, the result is undefined.

If REG_NEWLINE is not set in cflags, then a NEWLINE character in
pattern or string will be treated as an ordinary character. If
REG_NEWLINE is set, then NEWLINE will be treated as an ordinary
character except as follows:

1.  A NEWLINE character in string will not be matched by a period
    outside a bracket expression or by any form of a non-matching
    list.

2.  A circumflex (^) in pattern, when used to specify expression
    anchoring, will match the zero-length string immediately after a
    NEWLINE in string, regardless of the setting of REG_NOTBOL.

3.  A dollar sign ($) in pattern, when used to specify expression
    anchoring, will match the zero-length string immediately before a
    NEWLINE in string, regardless of the setting of REG_NOTEOL.
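
The three REG_NEWLINE exceptions above can be checked with a sketch
like the following. The helper first_match_offset() is a hypothetical
name; it compiles a basic regular expression with the given cflags and
reports where the first match starts.

```c
#include <sys/types.h>
#include <regex.h>

/*
 * Hypothetical helper: return the byte offset at which pattern first
 * matches string when compiled with cflags, or -1 if there is no
 * match or the pattern does not compile.
 */
long
first_match_offset(const char *pattern, const char *string, int cflags)
{
    regex_t re;
    regmatch_t pm;
    int status;

    if (regcomp(&re, pattern, cflags) != 0)
        return (-1);
    status = regexec(&re, string, 1, &pm, 0);
    regfree(&re);
    return (status == 0 ? (long)pm.rm_so : -1);
}
```

For the two-line string "a\nb", the pattern "^b" should match at offset
2 only when REG_NEWLINE is set, "a$" should match at offset 0 only when
REG_NEWLINE is set, and "a.b" should match only when REG_NEWLINE is not
set, because the period then matches the NEWLINE.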

regfree()
The regfree() function frees any memory allocated by regcomp()
associated with preg.

The following constants are defined as error return values:

REG_NOMATCH     The regexec() function failed to match.

REG_BADPAT      Invalid regular expression.

REG_ECOLLATE    Invalid collating element referenced.

REG_ECTYPE      Invalid character class type referenced.

REG_EESCAPE     Trailing \ in pattern.

REG_ESUBREG     Number in \digit invalid or in error.

REG_EBRACK      [] imbalance.

REG_ENOSYS      The function is not supported.

REG_EPAREN      \(\) or () imbalance.

REG_EBRACE      \{ \} imbalance.

REG_BADBR       Content of \{ \} invalid: not a number, number too
                large, more than two numbers, first larger than second.

REG_ERANGE      Invalid endpoint in range expression.

REG_ESPACE      Out of memory.

REG_BADRPT      ?, * or + not preceded by valid regular expression.

regerror()
The regerror() function provides a mapping from error codes returned by
regcomp() and regexec() to unspecified printable strings. It generates
a string corresponding to the value of the errcode argument, which must
be the last non-zero value returned by regcomp() or regexec() with the
given value of preg. If errcode is not such a value, an error message
indicating that the error code is invalid is returned.

If preg is a NULL pointer, but errcode is a value returned by a
previous call to regexec() or regcomp(), regerror() still generates an
error string corresponding to the value of errcode.

If the errbuf_size argument is not zero, regerror() will place the
generated string into the buffer of size errbuf_size bytes pointed to
by errbuf. If the string (including the terminating null byte) cannot
fit in the buffer, regerror() will truncate the string and
null-terminate the result.

If errbuf_size is zero, regerror() ignores the errbuf argument, and
returns the size of the buffer needed to hold the generated string.

If the preg argument to regexec() or regfree() is not a compiled
regular expression returned by regcomp(), the result is undefined. A
preg is no longer treated as a compiled regular expression after it is
given to regfree().

See regex(5) for BRE (Basic Regular Expression) Anchoring.

RETURN VALUES
On successful completion, the regcomp() function returns 0. Otherwise,
it returns an integer value indicating an error as described in
<regex.h>, and the content of preg is undefined.

On successful completion, the regexec() function returns 0. Otherwise,
it returns REG_NOMATCH to indicate no match, or REG_ENOSYS to indicate
that the function is not supported.

Upon successful completion, the regerror() function returns the number
of bytes needed to hold the entire generated string. Otherwise, it
returns 0 to indicate that the function is not implemented.

The regfree() function returns no value.

ERRORS
No errors are defined.

USAGE
An application could use:

    regerror(code, preg, (char *)NULL, (size_t)0)

to find out how big a buffer is needed for the generated string, malloc
a buffer to hold the string, and then call regerror() again to get the
string (see malloc(3C)). Alternatively, it could allocate a fixed,
static buffer that is big enough to hold most strings, and then use
malloc() to allocate a larger buffer if it finds that this is too small.
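
The first approach described above might look like the following
sketch. The function name regerror_alloc() is hypothetical; the caller
is assumed to pass the value most recently returned by regcomp() or
regexec() for preg and to free() the result.

```c
#include <sys/types.h>
#include <regex.h>
#include <stdlib.h>

/*
 * Hypothetical helper: return a malloc()ed copy of the message for
 * errcode, or NULL on failure. The caller frees the result.
 */
char *
regerror_alloc(int errcode, const regex_t *preg)
{
    size_t need;
    char *buf;

    /* With a size of zero, regerror() only reports the size needed. */
    need = regerror(errcode, preg, NULL, 0);
    if (need == 0)
        return (NULL);  /* regerror() is not implemented */
    buf = malloc(need);
    if (buf != NULL)
        (void) regerror(errcode, preg, buf, need);
    return (buf);
}
```

The size returned by the first call already includes the terminating
null byte, so no extra byte is added before calling malloc().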

EXAMPLES
Example 1 Example to match string against the extended regular
expression in pattern.

    #include <sys/types.h>
    #include <regex.h>
    #include <stddef.h>

    /*
     * Match string against the extended regular expression in
     * pattern, treating errors as no match.
     *
     * return 1 for match, 0 for no match
     */
    int
    match(const char *string, const char *pattern)
    {
        int status;
        regex_t re;

        if (regcomp(&re, pattern, REG_EXTENDED|REG_NOSUB) != 0) {
            return (0);     /* compile error, reported as no match */
        }
        status = regexec(&re, string, (size_t)0, NULL, 0);
        regfree(&re);
        if (status != 0) {
            return (0);     /* no match, or regexec() error */
        }
        return (1);
    }

The following demonstrates how the REG_NOTBOL flag could be used with
regexec() to find all substrings in a line that match a pattern
supplied by a user. (For simplicity of the example, very little error
checking is done.) The offsets that regexec() reports are relative to
the string it was given, so the example advances a pointer, p, past
each match before searching again.

    const char *p = buffer;     /* start of the text not yet searched */
    (void) regcomp(&re, pattern, 0);
    /* this call to regexec() finds the first match on the line */
    error = regexec(&re, p, 1, &pm, 0);
    while (error == 0) {        /* while matches found */
        /* substring found between p + pm.rm_so and p + pm.rm_eo */
        if (pm.rm_eo == pm.rm_so) {     /* empty match: step one byte */
            if (p[pm.rm_eo] == '\0')
                break;
            pm.rm_eo++;
        }
        p += pm.rm_eo;          /* offsets are relative to p */
        /* This call to regexec() finds the next match */
        error = regexec(&re, p, 1, &pm, REG_NOTBOL);
    }
    regfree(&re);

ATTRIBUTES
See attributes(5) for descriptions of the following attributes:

┌─────────────────────────────┬─────────────────────────────┐
│      ATTRIBUTE TYPE         │      ATTRIBUTE VALUE        │
├─────────────────────────────┼─────────────────────────────┤
│CSI                          │Enabled                      │
├─────────────────────────────┼─────────────────────────────┤
│Interface Stability          │Standard                     │
├─────────────────────────────┼─────────────────────────────┤
│MT-Level                     │MT-Safe with exceptions      │
└─────────────────────────────┴─────────────────────────────┘

SEE ALSO
fnmatch(3C), glob(3C), malloc(3C), setlocale(3C), attributes(5),
standards(5), regex(5)

NOTES
The regcomp() function can be used safely in a multithreaded
application as long as setlocale(3C) is not being called to change the
locale.

SunOS 5.11                        1 Nov 2003                       regcomp(3C)