PCRECPP(3)                Library Functions Manual                PCRECPP(3)

PCRE - Perl-compatible regular expressions.

#include <pcrecpp.h>

The C++ wrapper for PCRE was provided by Google Inc. Some additional
functionality was added by Giuseppe Maxia. This brief man page was
constructed from the notes in the pcrecpp.h file, which should be
consulted for further details. Note that the C++ wrapper supports only
the original 8-bit PCRE library. There is no 16-bit or 32-bit support
at present.

The "FullMatch" operation checks that the supplied text matches a
supplied pattern exactly. If pointer arguments are supplied, it copies
the sub-strings matched by the sub-patterns into them.

Example: successful match
    pcrecpp::RE re("h.*o");
    re.FullMatch("hello");

Example: unsuccessful match (requires full match):
    pcrecpp::RE re("e");
    !re.FullMatch("hello");

Example: creating a temporary RE object:
    pcrecpp::RE("h.*o").FullMatch("hello");

You can pass in a "const char*" or a "string" for "text". The examples
below tend to use a const char*. As the examples above show, you can
either store the RE object in a variable or use a temporary RE object.
The examples below use one style or the other arbitrarily; either
would work for any of them.

You must supply extra pointer arguments to extract matched subpieces.

Example: extracts "ruby" into "s" and 1234 into "i"
    int i;
    string s;
    pcrecpp::RE re("(\\w+):(\\d+)");
    re.FullMatch("ruby:1234", &s, &i);

Example: does not try to extract any extra sub-patterns
    re.FullMatch("ruby:1234", &s);

Example: does not try to extract into NULL
    re.FullMatch("ruby:1234", NULL, &i);

Example: integer overflow causes failure
    !re.FullMatch("ruby:1234567891234", NULL, &i);

Example: fails because there aren't enough sub-patterns:
    !pcrecpp::RE("\\w+:\\d+").FullMatch("ruby:1234", &s);

Example: fails because string cannot be stored in integer
    !pcrecpp::RE("(.*)").FullMatch("ruby", &i);

The provided pointer arguments can be pointers to any scalar numeric
type, or one of:

    string        (matched piece is copied to string)
    StringPiece   (StringPiece is mutated to point to matched piece)
    T             (where "bool T::ParseFrom(const char*, int)" exists)
    NULL          (the corresponding matched sub-pattern is not copied)

The function returns true iff all of the following conditions are
satisfied:

  a. "text" matches "pattern" exactly;

  b. The number of matched sub-patterns is >= number of supplied
     pointers;

  c. The "i"th argument has a suitable type for holding the
     string captured as the "i"th sub-pattern. If you pass in
     void * NULL for the "i"th argument, or a non-void * NULL
     of the correct type, or pass fewer arguments than the
     number of sub-patterns, the "i"th captured sub-pattern is
     ignored.

CAVEAT: An optional sub-pattern that does not exist in the matched
string is assigned the empty string. Therefore, the following will
return false (because the empty string is not a valid number):

    int number;
    pcrecpp::RE("[a-z]+(\\d+)?").FullMatch("abc", &number);

The matching interface supports at most 16 arguments per call. If you
need more, consider using the more general interface
pcrecpp::RE::DoMatch. See pcrecpp.h for the signature for DoMatch.

NOTE: Do not use no_arg, which is used internally to mark the end of a
list of optional arguments, as a placeholder for missing arguments, as
this can lead to segfaults.

You can use the "QuoteMeta" operation to insert backslashes before all
potentially meaningful characters in a string. The returned string,
used as a regular expression, will exactly match the original string.

Example:
    string quoted = RE::QuoteMeta(unquoted);

Note that it's legal to escape a character even if it has no special
meaning in a regular expression -- so this function does that. (This
also makes it identical to the perl function of the same name; see
"perldoc -f quotemeta".) For example, "1.5-2.0?" becomes
"1\.5\-2\.0\?".

You can use the "PartialMatch" operation when you want the pattern to
match any substring of the text.

Example: simple search for a string:
    pcrecpp::RE("ell").PartialMatch("hello");

Example: find first number in a string:
    int number;
    pcrecpp::RE re("(\\d+)");
    re.PartialMatch("x*100 + 20", &number);
    assert(number == 100);

By default, pattern and text are plain text, one byte per character.
The UTF8 flag, passed to the constructor, causes both pattern and
string to be treated as UTF-8 text, still a byte stream but potentially
multiple bytes per character. In practice, the text is likelier to be
UTF-8 than the pattern, but the match returned may depend on the UTF8
flag, so always use it when matching UTF-8 text. For example, "." will
match one byte normally but with UTF8 set may match up to four bytes
of a multi-byte character.

Example:
    pcrecpp::RE_Options options;
    options.set_utf8(true);
    pcrecpp::RE re(utf8_pattern, options);
    re.FullMatch(utf8_string);

Example: using the convenience function UTF8():
    pcrecpp::RE re(utf8_pattern, pcrecpp::UTF8());
    re.FullMatch(utf8_string);

NOTE: The UTF8 flag is ignored if pcre was not configured with the
--enable-utf8 flag.

PCRE defines some modifiers to change the behavior of the regular
expression engine. The C++ wrapper defines an auxiliary class,
RE_Options, as a vehicle to pass such modifiers to a RE class.
Currently, the following modifiers are supported:

    modifier              description                 Perl corresponding

    PCRE_CASELESS         case insensitive match      /i
    PCRE_MULTILINE        multiple lines match        /m
    PCRE_DOTALL           dot matches newlines        /s
    PCRE_DOLLAR_ENDONLY   $ matches only at end       N/A
    PCRE_EXTRA            strict escape parsing       N/A
    PCRE_EXTENDED         ignore white spaces         /x
    PCRE_UTF8             handles UTF8 chars          built-in
    PCRE_UNGREEDY         reverses * and *?           N/A
    PCRE_NO_AUTO_CAPTURE  disables capturing parens   N/A (*)

(*) Both Perl and PCRE allow non capturing parentheses by means of the
"?:" modifier within the pattern itself. e.g. (?:ab|cd) does not
capture, while (ab|cd) does.

For a full account of how each modifier works, please check the PCRE
API reference page.

For each modifier, there are two member functions whose names are
derived from the modifier name in lowercase, without the "PCRE_"
prefix. For instance, PCRE_CASELESS is handled by

    bool caseless()

which returns true if the modifier is set, and

    RE_Options & set_caseless(bool)

which sets or unsets the modifier. Moreover, PCRE_EXTRA_MATCH_LIMIT can
be accessed through the set_match_limit() and match_limit() member
functions. Setting match_limit to a non-zero value will limit the
execution of pcre to keep it from doing bad things like blowing the
stack or taking an eternity to return a result. A value of 5000 is good
enough to stop stack blowup in a 2MB thread stack. Setting match_limit
to zero disables match limiting. Alternatively, you can use
set_match_limit_recursion() and match_limit_recursion(), which use
PCRE_EXTRA_MATCH_LIMIT_RECURSION to limit how much PCRE recurses.
match_limit() limits the number of times PCRE's internal match()
function is called; match_limit_recursion() limits the depth of
internal recursion, and therefore the amount of stack that is used.

Normally, to pass one or more modifiers to a RE class, you declare a
RE_Options object, set the appropriate options, and pass this object to
a RE constructor. Example:

    RE_Options opt;
    opt.set_caseless(true);
    if (RE("HELLO", opt).PartialMatch("hello world")) ...

RE_Options has two constructors. The default constructor takes no
arguments and creates a set of flags that are all off. The other takes
an option_flags parameter, which is there to ease the transfer of
legacy code from C programs. This lets you do

    RE(pattern,
       RE_Options(PCRE_CASELESS|PCRE_MULTILINE)).PartialMatch(str);

However, new code is better off doing

    RE(pattern,
       RE_Options().set_caseless(true).set_multiline(true))
         .PartialMatch(str);

If you are going to pass one of the most used modifiers, there are some
convenience functions that return a RE_Options class with the
appropriate modifier already set: CASELESS(), UTF8(), MULTILINE(),
DOTALL(), and EXTENDED().

If you need to set several options at once, and you don't want to go
through the pains of declaring a RE_Options object and setting several
options, there is a parallel method that gives you such ability on the
fly. You can chain several set_xxxxx() member functions, since each of
them returns a reference to its class object. For example, to pass
PCRE_CASELESS, PCRE_EXTENDED, and PCRE_MULTILINE to a RE with one
statement, you may write:

    RE(" ^ xyz \\s+ .* blah$",
       RE_Options()
         .set_caseless(true)
         .set_extended(true)
         .set_multiline(true)).PartialMatch(sometext);

The "Consume" operation may be useful if you want to repeatedly match
regular expressions at the front of a string and skip over them as they
match. This requires use of the "StringPiece" type, which represents a
sub-range of a real string. Like RE, StringPiece is defined in the
pcrecpp namespace.

Example: read lines of the form "var = value" from a string.
    string contents = ...;                 // Fill string somehow
    pcrecpp::StringPiece input(contents);  // Wrap in a StringPiece

    string var;
    int value;
    pcrecpp::RE re("(\\w+) = (\\d+)\n");
    while (re.Consume(&input, &var, &value)) {
      ...;
    }

Each successful call to "Consume" will set "var/value", and also
advance "input" so it points past the matched text.

The "FindAndConsume" operation is similar to "Consume" but does not
anchor your match at the beginning of the string. For example, you
could extract all words from a string by repeatedly calling

    pcrecpp::RE("(\\w+)").FindAndConsume(&input, &word)

By default, if you pass a pointer to a numeric value, the corresponding
text is interpreted as a base-10 number. You can instead wrap the
pointer with a call to one of the operators Hex(), Octal(), or CRadix()
to interpret the text in another base. The CRadix operator interprets
C-style "0" (base-8) and "0x" (base-16) prefixes, but defaults to
base-10.

Example:
    int a, b, c, d;
    pcrecpp::RE re("(.*) (.*) (.*) (.*)");
    re.FullMatch("100 40 0100 0x40",
                 pcrecpp::Octal(&a), pcrecpp::Hex(&b),
                 pcrecpp::CRadix(&c), pcrecpp::CRadix(&d));

will leave 64 in a, b, c, and d.

You can replace the first match of "pattern" in "str" with "rewrite".
Within "rewrite", backslash-escaped digits (\1 to \9) can be used to
insert the text matching the corresponding parenthesized group from the
pattern. \0 in "rewrite" refers to the entire matching text. For
example:

    string s = "yabba dabba doo";
    pcrecpp::RE("b+").Replace("d", &s);

will leave "s" containing "yada dabba doo". The result is true if the
pattern matches and a replacement occurs, false otherwise.

GlobalReplace is like Replace except that it replaces all occurrences
of the pattern in the string with the rewrite. Replacements are not
subject to re-matching. For example:

    string s = "yabba dabba doo";
    pcrecpp::RE("b+").GlobalReplace("d", &s);

will leave "s" containing "yada dada doo". It returns the number of
replacements made.

Extract is like Replace, except that if the pattern matches, "rewrite"
is copied into "out" (an additional argument) with substitutions. The
non-matching portions of "text" are ignored. Returns true iff a match
occurred and the extraction happened successfully; if no match occurs,
the string is left unaffected.

The C++ wrapper was contributed by Google Inc.
Copyright (c) 2007 Google Inc.

Last updated: 08 January 2012

PCRE 8.30                      08 January 2012                     PCRECPP(3)