regex(3)                   Library Functions Manual                   regex(3)

NAME
       regcomp, regexec, regerror, regfree - POSIX regex functions

LIBRARY
       Standard C library (libc, -lc)

SYNOPSIS
       #include <regex.h>

       int regcomp(regex_t *restrict preg, const char *restrict regex,
                   int cflags);
       int regexec(const regex_t *restrict preg, const char *restrict string,
                   size_t nmatch, regmatch_t pmatch[restrict .nmatch],
                   int eflags);

       size_t regerror(int errcode, const regex_t *restrict preg,
                   char errbuf[restrict .errbuf_size], size_t errbuf_size);
       void regfree(regex_t *preg);

DESCRIPTION
   POSIX regex compiling
       regcomp() compiles a regular expression into a form that is suitable
       for subsequent regexec() searches.

       regcomp() is supplied with preg, a pointer to a pattern buffer storage
       area; regex, a pointer to the null-terminated pattern string; and
       cflags, flags that determine the type of compilation.

       All regular expression searching must be done via a compiled pattern
       buffer, so regexec() must always be supplied with the address of a
       regcomp()-initialized pattern buffer.

       cflags is the bitwise OR of zero or more of the following:

       REG_EXTENDED
              Use POSIX Extended Regular Expression syntax when interpreting
              regex.  If not set, POSIX Basic Regular Expression syntax is
              used.

       REG_ICASE
              Do not differentiate case.  Subsequent regexec() searches using
              this pattern buffer will be case insensitive.

       REG_NOSUB
              Do not report the position of matches.  The nmatch and pmatch
              arguments to regexec() are ignored if the pattern buffer
              supplied was compiled with this flag set.

       REG_NEWLINE
              Match-any-character operators don't match a newline.

              A nonmatching list ([^...]) not containing a newline does not
              match a newline.

              The match-beginning-of-line operator (^) matches the empty
              string immediately after a newline, regardless of whether
              eflags, the execution flags of regexec(), contains REG_NOTBOL.

              The match-end-of-line operator ($) matches the empty string
              immediately before a newline, regardless of whether eflags
              contains REG_NOTEOL.
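
       The effect of the compilation flags can be illustrated with a small
       yes/no matcher.  This is a minimal sketch, not part of the interface
       described here: re_test() is a hypothetical helper name, and it always
       adds REG_NOSUB because it only needs to know whether a match exists.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of <regex.h>): compile pattern with the
 * given cflags plus REG_NOSUB and test it against text.
 * Returns 1 on a match, 0 otherwise, and -1 if regcomp() fails.
 */
static int re_test(const char *pattern, const char *text, int cflags)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, cflags | REG_NOSUB) != 0)
        return -1;

    /* With REG_NOSUB, nmatch and pmatch are ignored, so pass 0 and NULL. */
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);

    return rc == 0;
}
```

       For instance, re_test("^b$", "a\nb\nc", 0) reports no match because ^
       and $ anchor only at the ends of the whole string, while the same call
       with REG_NEWLINE reports a match on the middle line.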

   POSIX regex matching
       regexec() matches a null-terminated string against the precompiled
       pattern buffer, preg.  nmatch and pmatch are used to provide
       information regarding the location of any matches.  eflags is the
       bitwise OR of zero or more of the following flags:

       REG_NOTBOL
              The match-beginning-of-line operator always fails to match (but
              see the compilation flag REG_NEWLINE above).  This flag may be
              used when different portions of a string are passed to
              regexec() and the beginning of the string should not be
              interpreted as the beginning of the line.

       REG_NOTEOL
              The match-end-of-line operator always fails to match (but see
              the compilation flag REG_NEWLINE above).

       REG_STARTEND
              Match only the portion of the input string delimited by
              pmatch[0], starting at byte pmatch[0].rm_so and ending before
              byte pmatch[0].rm_eo.  This allows matching embedded NUL bytes
              and avoids a strlen(3) on large strings.  It does not use
              nmatch on input, and does not change REG_NOTBOL or REG_NEWLINE
              processing.  This flag is a BSD extension, not present in
              POSIX.
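
       The execution flags can be sketched as follows.  Both helper names are
       hypothetical.  count_matches() restarts the search after each match
       and optionally passes REG_NOTBOL so that ^ is not honored at the
       restart points.  search_range() is compiled only where REG_STARTEND is
       defined, and it assumes the glibc and BSD behavior that reported
       offsets are relative to the start of the buffer.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: count successive matches of an ERE in text.
 * If use_notbol is nonzero, every search after the first passes
 * REG_NOTBOL.  Returns -1 if regcomp() fails.
 */
static int count_matches(const char *pattern, const char *text,
                         int use_notbol)
{
    regex_t re;
    regmatch_t m;
    const char *p = text;
    int eflags = 0;
    int count = 0;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    while (regexec(&re, p, 1, &m, eflags) == 0) {
        count++;
        if (m.rm_eo == 0) {        /* empty match at p: step one byte */
            if (*p == '\0')
                break;
            p++;
        } else {
            p += m.rm_eo;          /* resume after the matched text */
        }
        if (use_notbol)
            eflags = REG_NOTBOL;   /* p is no longer a line start */
    }

    regfree(&re);
    return count;
}

#ifdef REG_STARTEND
/*
 * Hypothetical helper: search only bytes [start, end) of buf, which may
 * contain NUL bytes.  Assumption: offsets stored in *m are relative to
 * buf, as on glibc and the BSDs.
 */
static int search_range(const regex_t *re, const char *buf,
                        size_t start, size_t end, regmatch_t *m)
{
    m->rm_so = (regoff_t) start;
    m->rm_eo = (regoff_t) end;
    return regexec(re, buf, 1, m, REG_STARTEND);
}
#endif
```

       With the pattern "^a" and the string "aaa", count_matches() finds
       three matches when use_notbol is 0, because each restart looks like a
       new line start, and one match when use_notbol is 1.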

   Byte offsets
       Unless REG_NOSUB was set for the compilation of the pattern buffer,
       regexec() can report where each match was found.  pmatch must be
       dimensioned to have at least nmatch elements.  These are filled in by
       regexec() with substring match offsets.  The offsets of the
       subexpression starting at the ith open parenthesis are stored in
       pmatch[i].  The offsets of the entire regular expression's match are
       stored in pmatch[0].  (Note that to return the offsets of N
       subexpression matches, nmatch must be at least N+1.)  Any unused
       structure elements will contain the value -1.

       The regmatch_t structure, which is the type of pmatch, is defined in
       <regex.h>.

           typedef struct {
               regoff_t rm_so;
               regoff_t rm_eo;
           } regmatch_t;

       Each rm_so element that is not -1 holds the start offset of the
       corresponding substring match within the string.  The matching rm_eo
       element holds the end offset of the match, which is the offset of the
       first character after the matching text.
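
       As an illustration of subexpression offsets, the following sketch
       extracts two parenthesized groups from a string.  The helper names
       copy_group() and parse_pair() and the key=value pattern are
       assumptions made for this example only.  Note that pmatch has three
       elements: one for the whole match and one per group.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy the substring described by *m out of text.
 * Returns -1 if the group is unused (rm_so == -1) or does not fit.
 */
static int copy_group(const char *text, const regmatch_t *m,
                      char *out, size_t outsz)
{
    size_t len;

    if (m->rm_so == -1)
        return -1;

    len = (size_t) (m->rm_eo - m->rm_so);  /* rm_eo is one past the end */
    if (len >= outsz)
        return -1;

    memcpy(out, text + m->rm_so, len);
    out[len] = '\0';
    return 0;
}

/*
 * Hypothetical helper: find the first "name=digits" pair in text and
 * copy both parts.  Returns 0 on success, -1 otherwise.
 */
static int parse_pair(const char *text, char *key, size_t keysz,
                      char *val, size_t valsz)
{
    regex_t re;
    regmatch_t pm[3];   /* pm[0]: whole match, pm[1] and pm[2]: groups */
    int rc = -1;

    if (regcomp(&re, "([a-z]+)=([0-9]+)", REG_EXTENDED) != 0)
        return -1;

    if (regexec(&re, text, 3, pm, 0) == 0
        && copy_group(text, &pm[1], key, keysz) == 0
        && copy_group(text, &pm[2], val, valsz) == 0)
        rc = 0;

    regfree(&re);
    return rc;
}
```

       Called on "set width=80 now", parse_pair() stores "width" in key and
       "80" in val.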

   POSIX error reporting
       regerror() turns the error codes that can be returned by both
       regcomp() and regexec() into error message strings.

       regerror() is passed the error code, errcode; the pattern buffer,
       preg; a pointer to a character string buffer, errbuf; and the size of
       the string buffer, errbuf_size.  It returns the size of the errbuf
       required to contain the null-terminated error message string.  If
       both errbuf and errbuf_size are nonzero, errbuf is filled in with the
       first errbuf_size - 1 characters of the error message and a
       terminating null byte ('\0').
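
       Because regerror() returns the required buffer size, it can be called
       twice: once to learn the size and once to fetch the message.  The
       following is a minimal sketch of that pattern; regex_strerror() is a
       hypothetical name, not a library function.

```c
#include <regex.h>
#include <stdlib.h>

/*
 * Hypothetical helper: return a malloc(3)ed copy of the message for
 * errcode, or NULL if allocation fails.  The caller must free(3) it.
 */
static char *regex_strerror(int errcode, const regex_t *re)
{
    /* First call: no buffer, just ask how many bytes are needed. */
    size_t need = regerror(errcode, re, NULL, 0);
    char *buf = malloc(need);

    /* Second call: the buffer is large enough for the full message. */
    if (buf != NULL)
        regerror(errcode, re, buf, need);

    return buf;
}
```

       A caller would typically pass the nonzero value returned by regcomp()
       or regexec(), print the resulting string, and then free it.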

   POSIX pattern buffer freeing
       regfree() frees the memory that regcomp() allocated for the
       precompiled pattern buffer, preg.
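
       A typical lifetime is to compile once, search many times, and free
       once.  The sketch below assumes a hypothetical helper,
       count_matching(), and assumes that nothing needs to be freed when
       regcomp() fails.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: count how many of the n strings in lines match
 * the ERE pattern.  Returns -1 if regcomp() fails.
 */
static int count_matching(const char *pattern,
                          const char *const *lines, size_t n)
{
    regex_t re;
    int count = 0;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;   /* assumption: no regfree() needed after failure */

    /* The same compiled buffer is reused for every search. */
    for (size_t i = 0; i < n; i++)
        if (regexec(&re, lines[i], 0, NULL, 0) == 0)
            count++;

    regfree(&re);    /* release the buffer exactly once */
    return count;
}
```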

RETURN VALUE
       regcomp() returns zero for a successful compilation or an error code
       for failure.

       regexec() returns zero for a successful match or REG_NOMATCH for
       failure.

ERRORS
       The following errors can be returned by regcomp():

       REG_BADBR
              Invalid use of back reference operator.

       REG_BADPAT
              Invalid use of pattern operators such as group or list.

       REG_BADRPT
              Invalid use of repetition operators such as using '*' as the
              first character.

       REG_EBRACE
              Unmatched brace interval operators.

       REG_EBRACK
              Unmatched bracket list operators.

       REG_ECOLLATE
              Invalid collating element.

       REG_ECTYPE
              Unknown character class name.

       REG_EEND
              Nonspecific error.  This is not defined by POSIX.2.

       REG_EESCAPE
              Trailing backslash.

       REG_EPAREN
              Unmatched parenthesis group operators.

       REG_ERANGE
              Invalid use of the range operator; for example, the ending
              point of the range occurs prior to the starting point.

       REG_ESIZE
              Compiled regular expression requires a pattern buffer larger
              than 64 kB.  This is not defined by POSIX.2.

       REG_ESPACE
              The regex routines ran out of memory.

       REG_ESUBREG
              Invalid back reference to a subexpression.

ATTRIBUTES
       For an explanation of the terms used in this section, see
       attributes(7).

       ┌─────────────────────────────────────┬───────────────┬────────────────┐
       │Interface                            │ Attribute     │ Value          │
       ├─────────────────────────────────────┼───────────────┼────────────────┤
       │regcomp(), regexec()                 │ Thread safety │ MT-Safe locale │
       ├─────────────────────────────────────┼───────────────┼────────────────┤
       │regerror()                           │ Thread safety │ MT-Safe env    │
       ├─────────────────────────────────────┼───────────────┼────────────────┤
       │regfree()                            │ Thread safety │ MT-Safe        │
       └─────────────────────────────────────┴───────────────┴────────────────┘

STANDARDS
       POSIX.1-2008.

HISTORY
       POSIX.1-2001.

EXAMPLES
       #include <stdint.h>
       #include <stdio.h>
       #include <stdlib.h>
       #include <regex.h>

       #define ARRAY_SIZE(arr) (sizeof((arr)) / sizeof((arr)[0]))

       static const char *const str =
               "1) John Driverhacker;\n2) John Doe;\n3) John Foo;\n";
       static const char *const re = "John.*o";

       int main(void)
       {
           const char  *s = str;   /* start of the not-yet-searched text */
           regex_t     regex;
           regmatch_t  pmatch[1];
           regoff_t    off, len;

           /* REG_NEWLINE keeps ".*" from running past the end of a line. */
           if (regcomp(&regex, re, REG_NEWLINE))
               exit(EXIT_FAILURE);

           printf("String = \"%s\"\n", str);
           printf("Matches:\n");

           for (unsigned int i = 0; ; i++) {
               if (regexec(&regex, s, ARRAY_SIZE(pmatch), pmatch, 0))
                   break;

               /* Offsets are relative to s; convert to offsets in str. */
               off = pmatch[0].rm_so + (s - str);
               len = pmatch[0].rm_eo - pmatch[0].rm_so;
               printf("#%u:\n", i);
               printf("offset = %jd; length = %jd\n", (intmax_t) off,
                       (intmax_t) len);
               /* "%.*s" expects an int precision. */
               printf("substring = \"%.*s\"\n", (int) len,
                       s + pmatch[0].rm_so);

               s += pmatch[0].rm_eo;   /* continue after this match */
           }

           regfree(&regex);
           exit(EXIT_SUCCESS);
       }

SEE ALSO
       grep(1), regex(7)

       The glibc manual section, Regular Expressions

Linux man-pages 6.04              2023-03-30                          regex(3)