regcomp(3C)              Standard C Library Functions              regcomp(3C)

NAME

       regcomp, regexec, regerror, regfree - regular expression matching

SYNOPSIS

       #include <sys/types.h>
       #include <regex.h>

       int regcomp(regex_t *restrict preg, const char *restrict pattern,
            int cflags);

       int regexec(const regex_t *restrict preg,
            const char *restrict string, size_t nmatch,
            regmatch_t pmatch[restrict], int eflags);

       size_t regerror(int errcode, const regex_t *restrict preg,
            char *restrict errbuf, size_t errbuf_size);

       void regfree(regex_t *preg);

DESCRIPTION

       These functions interpret basic and extended regular expressions
       (described on the regex(5) manual page).

       The structure type regex_t contains at least the following member:

       size_t re_nsub    Number of parenthesised subexpressions.

       The structure type regmatch_t contains at least the following members:

       regoff_t rm_so    Byte offset from start of string to start of
                         substring.

       regoff_t rm_eo    Byte offset from start of string of the first
                         character after the end of substring.

   regcomp()
       The regcomp() function will compile the regular expression contained in
       the string pointed to by the pattern argument and place the results in
       the structure pointed to by preg. The cflags argument is the bitwise
       inclusive OR of zero or more of the following flags, which are defined
       in the header <regex.h>:

       REG_EXTENDED    Use Extended Regular Expressions.

       REG_ICASE       Ignore case in match.

       REG_NOSUB       Report only success/fail in regexec().

       REG_NEWLINE     Change the handling of NEWLINE characters, as described
                       in the text.

       The default regular expression type for pattern is a Basic Regular
       Expression. The application can specify Extended Regular Expressions
       using the REG_EXTENDED cflags flag.

       If the REG_NOSUB flag was not set in cflags, then regcomp() will set
       re_nsub to the number of parenthesised subexpressions (delimited by
       \(\) in basic regular expressions or () in extended regular
       expressions) found in pattern.

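       As an illustration of the re_nsub behaviour described above, the
       following minimal sketch compiles a pattern as an extended regular
       expression and reports how many parenthesised subexpressions it
       contains. The helper name count_subexpressions() is hypothetical and
       is not part of <regex.h>.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of <regex.h>): compile pattern as an
 * extended regular expression and return the number of parenthesised
 * subexpressions, or -1 if compilation fails.
 */
long
count_subexpressions(const char *pattern)
{
	regex_t re;
	long nsub;

	/* REG_NOSUB is deliberately not set, so regcomp() fills in re_nsub. */
	if (regcomp(&re, pattern, REG_EXTENDED) != 0)
		return (-1);	/* content of re is undefined on failure */
	nsub = (long)re.re_nsub;
	regfree(&re);
	return (nsub);
}
```

       A caller could use the result to size the pmatch array passed to
       regexec() as re_nsub + 1 elements, leaving room for pmatch[0].
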
   regexec()
       The regexec() function compares the null-terminated string specified by
       string with the compiled regular expression preg initialized by a
       previous call to regcomp(). The eflags argument is the bitwise
       inclusive OR of zero or more of the following flags, which are defined
       in the header <regex.h>:

       REG_NOTBOL    The first character of the string pointed to by string is
                     not the beginning of the line. Therefore, the circumflex
                     character (^), when taken as a special character, will
                     not match the beginning of string.

       REG_NOTEOL    The last character of the string pointed to by string is
                     not the end of the line. Therefore, the dollar sign ($),
                     when taken as a special character, will not match the end
                     of string.

       If nmatch is zero or REG_NOSUB was set in the cflags argument to
       regcomp(), then regexec() will ignore the pmatch argument. Otherwise,
       the pmatch argument must point to an array with at least nmatch
       elements, and regexec() will fill in the elements of that array with
       offsets of the substrings of string that correspond to the
       parenthesised subexpressions of pattern: pmatch[i].rm_so will be the
       byte offset of the beginning and pmatch[i].rm_eo will be one greater
       than the byte offset of the end of substring i. (Subexpression i begins
       at the ith matched open parenthesis, counting from 1.) Offsets in
       pmatch[0] identify the substring that corresponds to the entire regular
       expression. Unused elements of pmatch up to pmatch[nmatch−1] will be
       filled with −1. If there are more than nmatch subexpressions in pattern
       (pattern itself counts as a subexpression), then regexec() will still
       do the match, but will record only the first nmatch substrings.

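       The use of pmatch described above can be sketched as follows. The
       helper extract_value() and the "key=value" pattern are assumptions
       made for illustration only. The sketch copies the substring that
       corresponds to the second parenthesised subexpression.

```c
#include <regex.h>
#include <string.h>

/*
 * Hypothetical helper (illustration only): find "key=value" in line and
 * copy the value, which is subexpression 2, into out.
 * Returns 0 on success, -1 on no match, compile error, or short buffer.
 */
int
extract_value(const char *line, char *out, size_t outsz)
{
	regex_t re;
	regmatch_t pm[3];	/* pm[0] whole match, pm[1] key, pm[2] value */
	size_t len;
	int rc = -1;

	if (regcomp(&re, "([a-z]+)=([0-9]+)", REG_EXTENDED) != 0)
		return (-1);
	if (regexec(&re, line, 3, pm, 0) == 0) {
		/* rm_eo is one past the last byte, so the difference is the length */
		len = (size_t)(pm[2].rm_eo - pm[2].rm_so);
		if (len < outsz) {
			memcpy(out, line + pm[2].rm_so, len);
			out[len] = '\0';
			rc = 0;
		}
	}
	regfree(&re);
	return (rc);
}
```

       Because rm_so and rm_eo are byte offsets from the start of the string
       passed to regexec(), they are added to line rather than used as
       pointers.
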
       When matching a basic or extended regular expression, any given
       parenthesised subexpression of pattern might participate in the match
       of several different substrings of string, or it might not match any
       substring even though the pattern as a whole did match. The following
       rules are used to determine which substrings to report in pmatch when
       matching regular expressions:

       1.    If subexpression i in a regular expression is not contained
             within another subexpression, and it participated in the match
             several times, then the byte offsets in pmatch[i] will delimit
             the last such match.

       2.    If subexpression i is not contained within another subexpression,
             and it did not participate in an otherwise successful match, the
             byte offsets in pmatch[i] will be −1. A subexpression does not
             participate in the match when:

             * or \{\} appears immediately after the subexpression in a basic
             regular expression, or *, ?, or {} appears immediately after the
             subexpression in an extended regular expression, and the
             subexpression did not match (matched zero times)

             or

             | is used in an extended regular expression to select this
             subexpression or another, and the other subexpression matched.

       3.    If subexpression i is contained within another subexpression j,
             and i is not contained within any other subexpression that is
             contained within j, and a match of subexpression j is reported in
             pmatch[j], then the match or non-match of subexpression i
             reported in pmatch[i] will be as described in 1. and 2. above,
             but within the substring reported in pmatch[j] rather than the
             whole string.

       4.    If subexpression i is contained in subexpression j, and the byte
             offsets in pmatch[j] are −1, then the byte offsets in pmatch[i]
             also will be −1.

       5.    If subexpression i matched a zero-length string, then both byte
             offsets in pmatch[i] will be the byte offset of the character or
             null terminator immediately following the zero-length string.

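       Rule 2 above can be illustrated with an alternation. In this minimal
       sketch the helper which_alternative() and the pattern "(cat)|(dog)"
       are hypothetical. The subexpression that did not participate in the
       match is recognised by its offsets of -1.

```c
#include <regex.h>

/*
 * Hypothetical helper (illustration only): report which alternative of
 * "(cat)|(dog)" matched in s.
 * Returns 1 or 2, or 0 if there was no match or the compile failed.
 */
int
which_alternative(const char *s)
{
	regex_t re;
	regmatch_t pm[3];	/* pm[0] whole match, pm[1] cat, pm[2] dog */
	int which = 0;

	if (regcomp(&re, "(cat)|(dog)", REG_EXTENDED) != 0)
		return (0);
	if (regexec(&re, s, 3, pm, 0) == 0) {
		/* The alternative that did not participate has offsets of -1. */
		if (pm[1].rm_so != -1)
			which = 1;
		else if (pm[2].rm_so != -1)
			which = 2;
	}
	regfree(&re);
	return (which);
}
```

       Code that copies substrings out of pmatch should make the same check
       before using rm_so and rm_eo as offsets.
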
       If, when regexec() is called, the locale is different from when the
       regular expression was compiled, the result is undefined.

       If REG_NEWLINE is not set in cflags, then a NEWLINE character in
       pattern or string will be treated as an ordinary character. If
       REG_NEWLINE is set, then a NEWLINE character will be treated as an
       ordinary character except as follows:

       1.    A NEWLINE character in string will not be matched by a period
             outside a bracket expression or by any form of a non-matching
             list.

       2.    A circumflex (^) in pattern, when used to specify expression
             anchoring, will match the zero-length string immediately after a
             NEWLINE character in string, regardless of the setting of
             REG_NOTBOL.

       3.    A dollar sign ($) in pattern, when used to specify expression
             anchoring, will match the zero-length string immediately before a
             NEWLINE character in string, regardless of the setting of
             REG_NOTEOL.

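       The effect of REG_NEWLINE on anchors and on the period can be observed
       with a small wrapper. This is a sketch only, and the helper name
       matches_with() is hypothetical. It compiles pattern as a basic regular
       expression with the caller's cflags and reports whether text matches.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper (illustration only): return 1 if pattern, compiled
 * as a basic regular expression with the given cflags, matches text.
 * Returns 0 on no match or on a compile error.
 */
int
matches_with(const char *pattern, const char *text, int cflags)
{
	regex_t re;
	int rc;

	/* Only success or failure is needed, so REG_NOSUB is added. */
	if (regcomp(&re, pattern, cflags | REG_NOSUB) != 0)
		return (0);
	rc = regexec(&re, text, 0, NULL, 0);
	regfree(&re);
	return (rc == 0);
}
```

       With the two-line string "a\nb", the pattern "^b" matches only when
       REG_NEWLINE is passed, while the pattern "a.b" matches only when it
       is not.
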
   regfree()
       The regfree() function frees any memory allocated by regcomp()
       associated with preg.

       The following constants are defined as error return values:

       REG_NOMATCH     The regexec() function failed to match.

       REG_BADPAT      Invalid regular expression.

       REG_ECOLLATE    Invalid collating element referenced.

       REG_ECTYPE      Invalid character class type referenced.

       REG_EESCAPE     Trailing \ in pattern.

       REG_ESUBREG     Number in \digit invalid or in error.

       REG_EBRACK      [] imbalance.

       REG_ENOSYS      The function is not supported.

       REG_EPAREN      \(\) or () imbalance.

       REG_EBRACE      \{ \} imbalance.

       REG_BADBR       Content of \{ \} invalid: not a number, number too
                       large, more than two numbers, first larger than second.

       REG_ERANGE      Invalid endpoint in range expression.

       REG_ESPACE      Out of memory.

       REG_BADRPT      ?, * or + not preceded by valid regular expression.

   regerror()
       The regerror() function provides a mapping from error codes returned by
       regcomp() and regexec() to unspecified printable strings. It generates
       a string corresponding to the value of the errcode argument, which must
       be the last non-zero value returned by regcomp() or regexec() with the
       given value of preg. If errcode is not such a value, an error message
       indicating that the error code is invalid is returned.

       If preg is a NULL pointer, but errcode is a value returned by a
       previous call to regexec() or regcomp(), regerror() still generates an
       error string corresponding to the value of errcode.

       If the errbuf_size argument is not zero, regerror() will place the
       generated string into the buffer of size errbuf_size bytes pointed to
       by errbuf. If the string (including the terminating null byte) cannot
       fit in the buffer, regerror() will truncate the string and
       null-terminate the result.

       If errbuf_size is zero, regerror() ignores the errbuf argument, and
       returns the size of the buffer needed to hold the generated string.

       If the preg argument to regexec() or regfree() is not a compiled
       regular expression returned by regcomp(), the result is undefined. A
       preg is no longer treated as a compiled regular expression after it is
       given to regfree().

       See regex(5) for BRE (Basic Regular Expression) Anchoring.

RETURN VALUES

       On successful completion, the regcomp() function returns 0. Otherwise,
       it returns an integer value indicating an error as described in
       <regex.h>, and the content of preg is undefined.

       On successful completion, the regexec() function returns 0. Otherwise,
       it returns REG_NOMATCH to indicate no match, or REG_ENOSYS to indicate
       that the function is not supported.

       Upon successful completion, the regerror() function returns the number
       of bytes needed to hold the entire generated string. Otherwise, it
       returns 0 to indicate that the function is not implemented.

       The regfree() function returns no value.

ERRORS

       No errors are defined.

USAGE

       An application could use:

         regerror(code, preg, (char *)NULL, (size_t)0)

       to find out how big a buffer is needed for the generated string, malloc
       a buffer to hold the string, and then call regerror() again to get the
       string (see malloc(3C)). Alternatively, it could allocate a fixed,
       static buffer that is big enough to hold most strings, and then use
       malloc() to allocate a larger buffer if it finds that this is too
       small.
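
       The first approach described above can be sketched as follows. The
       helper name regex_error_string() is hypothetical. It calls regerror()
       once with a zero size to learn the required length, allocates that
       many bytes, and calls regerror() again to fill the buffer.

```c
#include <regex.h>
#include <stdlib.h>

/*
 * Hypothetical helper (illustration only): return a malloc()ed copy of
 * the message for errcode, or NULL if the size query returns 0 or the
 * allocation fails. The caller frees the result with free().
 */
char *
regex_error_string(int errcode, const regex_t *preg)
{
	size_t need;
	char *msg;

	/* First call: a zero size asks only for the required length. */
	need = regerror(errcode, preg, NULL, 0);
	if (need == 0 || (msg = malloc(need)) == NULL)
		return (NULL);
	/* Second call: fill the buffer, including the terminating null byte. */
	(void) regerror(errcode, preg, msg, need);
	return (msg);
}
```

       The text of the message is unspecified, so callers should display it
       rather than compare it against fixed strings.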

EXAMPLES

       Example 1 Matching a string against the extended regular expression
       in pattern.

         #include <regex.h>
         #include <stddef.h>

         /*
          * Match string against the extended regular expression in
          * pattern, treating errors as no match.
          *
          * Return 1 for match, 0 for no match.
          */
         int
         match(const char *string, const char *pattern)
         {
               int status;
               regex_t re;

               if (regcomp(&re, pattern, REG_EXTENDED|REG_NOSUB) != 0) {
                     return (0);     /* compile error: treat as no match */
               }
               status = regexec(&re, string, (size_t)0, NULL, 0);
               regfree(&re);
               if (status != 0) {
                     return (0);     /* no match, or regexec() error */
               }
               return (1);
         }

       The following demonstrates how the REG_NOTBOL flag could be used with
       regexec() to find all substrings in a line that match a pattern
       supplied by a user. The offsets in pm are relative to the string passed
       to each call, so the example advances a pointer p through buffer. (For
       simplicity of the example, very little error checking is done.)

         const char *p = buffer;

         (void) regcomp(&re, pattern, 0);
         /* this call to regexec() finds the first match on the line */
         error = regexec(&re, p, 1, &pm, 0);
         while (error == 0) {     /* while matches found */
                 /* substring found between p + pm.rm_so and p + pm.rm_eo */
                 if (pm.rm_eo == 0 && *p == '\0')
                         break;   /* empty match at end of line */
                 /* step past the match; step one byte past an empty match */
                 p += (pm.rm_eo > 0) ? pm.rm_eo : 1;
                 /* this call to regexec() finds the next match */
                 error = regexec(&re, p, 1, &pm, REG_NOTBOL);
         }

ATTRIBUTES

       See attributes(5) for descriptions of the following attributes:

       ┌─────────────────────────────┬─────────────────────────────┐
       │      ATTRIBUTE TYPE         │      ATTRIBUTE VALUE        │
       ├─────────────────────────────┼─────────────────────────────┤
       │CSI                          │Enabled                      │
       ├─────────────────────────────┼─────────────────────────────┤
       │Interface Stability          │Standard                     │
       ├─────────────────────────────┼─────────────────────────────┤
       │MT-Level                     │MT-Safe with exceptions      │
       └─────────────────────────────┴─────────────────────────────┘

SEE ALSO

       fnmatch(3C), glob(3C), malloc(3C), setlocale(3C), attributes(5),
       standards(5), regex(5)

NOTES

       The regcomp() function can be used safely in a multithreaded
       application as long as setlocale(3C) is not being called to change the
       locale.

SunOS 5.11                        1 Nov 2003                       regcomp(3C)