PCRECPP(3)                 Library Functions Manual                 PCRECPP(3)

NAME

       PCRE - Perl-compatible regular expressions.

SYNOPSIS OF C++ WRAPPER

       #include <pcrecpp.h>

DESCRIPTION

       The C++ wrapper for PCRE was provided by Google Inc. Some additional
       functionality was added by Giuseppe Maxia. This brief man page was
       constructed from the notes in the pcrecpp.h file, which should be
       consulted for further details.

MATCHING INTERFACE

       The "FullMatch" operation checks that the supplied text matches a
       supplied pattern exactly. If pointer arguments are supplied, it copies
       the substrings that match sub-patterns into them.

         Example: successful match
            pcrecpp::RE re("h.*o");
            re.FullMatch("hello");

         Example: unsuccessful match (requires full match):
            pcrecpp::RE re("e");
            !re.FullMatch("hello");

         Example: creating a temporary RE object:
            pcrecpp::RE("h.*o").FullMatch("hello");

       You can pass in a "const char*" or a "string" for "text". The examples
       below tend to use a const char*. As in the different examples above,
       you can store the RE object explicitly in a variable or use a temporary
       RE object. The examples below use one mode or the other arbitrarily;
       either could be used correctly for any of these examples.

       You must supply extra pointer arguments to extract matched subpieces.

         Example: extracts "ruby" into "s" and 1234 into "i"
            int i;
            string s;
            pcrecpp::RE re("(\\w+):(\\d+)");
            re.FullMatch("ruby:1234", &s, &i);

         Example: does not try to extract any extra sub-patterns
            re.FullMatch("ruby:1234", &s);

         Example: does not try to extract into NULL
            re.FullMatch("ruby:1234", NULL, &i);

         Example: integer overflow causes failure
            !re.FullMatch("ruby:1234567891234", NULL, &i);

         Example: fails because there aren't enough sub-patterns:
            !pcrecpp::RE("\\w+:\\d+").FullMatch("ruby:1234", &s);

         Example: fails because string cannot be stored in integer
            !pcrecpp::RE("(.*)").FullMatch("ruby", &i);

       The provided pointer arguments can be pointers to any scalar numeric
       type, or one of:

          string        (matched piece is copied to string)
          StringPiece   (StringPiece is mutated to point to matched piece)
          T             (where "bool T::ParseFrom(const char*, int)" exists)
          NULL          (the corresponding matched sub-pattern is not copied)

       The function returns true iff all of the following conditions are
       satisfied:

         a. "text" matches "pattern" exactly;

         b. The number of matched sub-patterns is >= number of supplied
            pointers;

         c. The "i"th argument has a suitable type for holding the
            string captured as the "i"th sub-pattern. If you pass in
            void * NULL for the "i"th argument, or a non-void * NULL
            of the correct type, or pass fewer arguments than the
            number of sub-patterns, the "i"th captured sub-pattern is
            ignored.
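
       The rules above can be imitated in standard C++17 for the
       "(\w+):(\d+)" pattern used in the examples. This is a sketch under
       stated assumptions, not the wrapper's code: std::regex (ECMAScript
       grammar) stands in for PCRE, std::from_chars does the conversion, and
       FullMatchWordNumber() and ParseWholeInt() are hypothetical names. The
       whole text must match, a null pointer skips its sub-pattern without
       converting it, and a capture that does not fit in an int makes the
       call fail.

```cpp
#include <charconv>
#include <regex>
#include <string>
#include <system_error>

// Converts all of `s` to an int. Fails on empty input, trailing
// characters, or overflow; *out is written only on success.
bool ParseWholeInt(const std::string& s, int* out) {
  int value = 0;
  const char* first = s.data();
  const char* last = first + s.size();
  std::from_chars_result r = std::from_chars(first, last, value);
  if (r.ec != std::errc() || r.ptr != last) return false;
  *out = value;
  return true;
}

// Hypothetical analogue of RE("(\\w+):(\\d+)").FullMatch(text, word, number).
bool FullMatchWordNumber(const std::string& text, std::string* word,
                         int* number) {
  static const std::regex re("(\\w+):(\\d+)");
  std::smatch m;
  // regex_match succeeds only if the pattern spans the entire text.
  if (!std::regex_match(text, m, re)) return false;
  // A null pointer means "do not extract", so no conversion is attempted.
  if (number != nullptr && !ParseWholeInt(m[2].str(), number)) return false;
  if (word != nullptr) *word = m[1].str();
  return true;
}
```

       In this sketch the overflowing text is accepted when the int pointer
       is null, because nothing is converted. An empty capture, such as an
       optional group that took no part in the match, is rejected by
       ParseWholeInt() just like an overflowing one.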

       CAVEAT: An optional sub-pattern that does not exist in the matched
       string is assigned the empty string. Therefore, the following will
       return false (because the empty string is not a valid number):

          int number;
          pcrecpp::RE("[a-z]+(\\d+)?").FullMatch("abc", &number);

       The matching interface supports at most 16 arguments per call. If you
       need more, consider using the more general interface
       pcrecpp::RE::DoMatch. See pcrecpp.h for the signature for DoMatch.

QUOTING METACHARACTERS

       You can use the "QuoteMeta" operation to insert backslashes before all
       potentially meaningful characters in a string. The returned string,
       used as a regular expression, will exactly match the original string.

         Example:
            string quoted = pcrecpp::RE::QuoteMeta(unquoted);

       Note that it's legal to escape a character even if it has no special
       meaning in a regular expression, so this function does that. (This
       also makes it identical to the Perl function of the same name; see
       "perldoc -f quotemeta".) For example, "1.5-2.0?" becomes
       "1\.5\-2\.0\?".
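
       A minimal standalone sketch of the quoting rule, assuming the
       Perl-style behaviour described above: a backslash goes before every
       ASCII character outside [A-Za-z0-9_]. QuoteMetaSketch() is a
       hypothetical name, not the library function. It passes bytes >= 0x80
       through unchanged so that UTF-8 sequences are not split, and it does
       not attempt to reproduce any special handling of NUL bytes.

```cpp
#include <string>

// Escapes every ASCII byte that is not a letter, digit, or underscore.
std::string QuoteMetaSketch(const std::string& unquoted) {
  std::string result;
  result.reserve(unquoted.size() * 2);
  for (char ch : unquoted) {
    unsigned char c = static_cast<unsigned char>(ch);
    bool word = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
                (c >= '0' && c <= '9') || c == '_';
    // Escaping a character with no special meaning is harmless.
    if (!word && c < 0x80) result += '\\';
    result += ch;
  }
  return result;
}
```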

PARTIAL MATCHES

       You can use the "PartialMatch" operation when you want the pattern to
       match any substring of the text.

         Example: simple search for a string:
            pcrecpp::RE("ell").PartialMatch("hello");

         Example: find first number in a string:
            int number;
            pcrecpp::RE re("(\\d+)");
            re.PartialMatch("x*100 + 20", &number);
            assert(number == 100);
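
       The difference between the two operations can be sketched with the
       standard library, assuming std::regex (ECMAScript grammar) as a
       stand-in for PCRE. std::regex_search, like PartialMatch, accepts a
       match anywhere in the text. Wrapping the pattern as ^(?:...)$ forces
       the whole text to match, as FullMatch does. FirstNumber() and
       MatchesWhole() are hypothetical helpers, not pcrecpp functions.

```cpp
#include <regex>
#include <string>

// Analogue of RE("(\\d+)").PartialMatch(text, number): unanchored search.
// Note: std::stoi throws std::out_of_range on overflow, whereas the
// wrapper documented here reports overflow by returning false.
bool FirstNumber(const std::string& text, int* number) {
  static const std::regex re("(\\d+)");
  std::smatch m;
  if (!std::regex_search(text, m, re)) return false;
  *number = std::stoi(m[1].str());
  return true;
}

// FullMatch-style test expressed as an anchored search.
bool MatchesWhole(const std::string& pattern, const std::string& text) {
  return std::regex_search(text, std::regex("^(?:" + pattern + ")$"));
}
```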

UTF-8 AND THE MATCHING INTERFACE

       By default, pattern and text are plain text, one byte per character.
       The UTF8 flag, passed to the constructor, causes both pattern and
       string to be treated as UTF-8 text, still a byte stream but potentially
       multiple bytes per character. In practice, the text is likelier to be
       UTF-8 than the pattern, but the match returned may depend on the UTF8
       flag, so always use it when matching UTF-8 text. For example, "." will
       match one byte normally, but with UTF8 set it may match several bytes
       that together encode one multi-byte character.

         Example:
            pcrecpp::RE_Options options;
            options.set_utf8(true);
            pcrecpp::RE re(utf8_pattern, options);
            re.FullMatch(utf8_string);

         Example: using the convenience function UTF8():
            pcrecpp::RE re(utf8_pattern, pcrecpp::UTF8());
            re.FullMatch(utf8_string);

       NOTE: The UTF8 flag is ignored if PCRE was not configured with the
             --enable-utf8 flag.
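
       To see why "." can consume more than one byte once UTF8 is set, it
       helps to look at how UTF-8 encodes characters. The standalone sketch
       below is not part of pcrecpp, and its helper names are hypothetical.
       It reads the lead byte of each sequence, using the RFC 3629 ranges,
       to count characters rather than bytes. A UTF-8 aware "." advances by
       the same amount over well-formed input.

```cpp
#include <cstddef>
#include <string>

// Length in bytes of the UTF-8 sequence that starts with `lead`, using the
// RFC 3629 lead-byte ranges; 0 means a continuation or invalid byte.
std::size_t Utf8SequenceLength(unsigned char lead) {
  if (lead < 0x80) return 1;                   // 0xxxxxxx: ASCII
  if (lead >= 0xC2 && lead <= 0xDF) return 2;  // 110xxxxx
  if (lead >= 0xE0 && lead <= 0xEF) return 3;  // 1110xxxx
  if (lead >= 0xF0 && lead <= 0xF4) return 4;  // 11110xxx
  return 0;
}

// Number of characters in `s`, or -1 if a sequence is malformed or truncated.
// Only lead bytes are checked; continuation bytes are assumed valid.
long Utf8CharCount(const std::string& s) {
  long count = 0;
  std::size_t i = 0;
  while (i < s.size()) {
    std::size_t n = Utf8SequenceLength(static_cast<unsigned char>(s[i]));
    if (n == 0 || i + n > s.size()) return -1;
    i += n;
    ++count;
  }
  return count;
}
```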

PASSING MODIFIERS TO THE REGULAR EXPRESSION ENGINE

       PCRE defines some modifiers to change the behavior of the regular
       expression engine. The C++ wrapper defines an auxiliary class,
       RE_Options, as a vehicle to pass such modifiers to an RE class.
       Currently, the following modifiers are supported:

          modifier              description               Perl corresponding

          PCRE_CASELESS         case insensitive match      /i
          PCRE_MULTILINE        multiple lines match        /m
          PCRE_DOTALL           dot matches newlines        /s
          PCRE_DOLLAR_ENDONLY   $ matches only at end       N/A
          PCRE_EXTRA            strict escape parsing       N/A
          PCRE_EXTENDED         ignore whitespace           /x
          PCRE_UTF8             handles UTF-8 chars         built-in
          PCRE_UNGREEDY         reverses * and *?           N/A
          PCRE_NO_AUTO_CAPTURE  disables capturing parens   N/A (*)

       (*) Both Perl and PCRE allow non-capturing parentheses by means of the
       "?:" modifier within the pattern itself. For example, (?:ab|cd) does
       not capture, while (ab|cd) does.

       For a full account of how each modifier works, please check the PCRE
       API reference page.

       For each modifier, there are two member functions whose names are
       formed from the modifier name in lowercase, without the "PCRE_" prefix.
       For instance, PCRE_CASELESS is handled by

         bool caseless()

       which returns true if the modifier is set, and

         RE_Options & set_caseless(bool)

       which sets or unsets the modifier. Moreover, PCRE_EXTRA_MATCH_LIMIT can
       be accessed through the set_match_limit() and match_limit() member
       functions. Setting match_limit to a non-zero value will limit the
       execution of PCRE to keep it from doing bad things like blowing the
       stack or taking an eternity to return a result. A value of 5000 is good
       enough to stop stack blowup in a 2MB thread stack. Setting match_limit
       to zero disables match limiting. Alternatively, you can call
       set_match_limit_recursion(), which uses PCRE_EXTRA_MATCH_LIMIT_RECURSION
       to limit how much PCRE recurses. match_limit() limits the number of
       times PCRE's internal match() function is called;
       match_limit_recursion() limits the depth of internal recursion, and
       therefore the amount of stack that is used.
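
       The effect of a match limit can be shown with a toy backtracking
       matcher. The sketch below is not PCRE's algorithm: BudgetedMatcher is
       a hypothetical class in the style of the small Kernighan-Pike matcher.
       It supports only literals, "." and "x*", and charges one unit per
       recursive call, in the same spirit as match_limit capping PCRE's
       internal match() calls. A pattern with many stacked stars backtracks
       heavily on text that cannot match. It therefore exhausts the budget
       and reports kLimitExceeded instead of running for a very long time.

```cpp
#include <string>

enum class MatchResult { kMatch, kNoMatch, kLimitExceeded };

// Toy anchored matcher for literals, '.', and 'c*' with a step budget.
class BudgetedMatcher {
 public:
  explicit BudgetedMatcher(long limit) : limit_(limit) {}

  // Returns kLimitExceeded if more than `limit` steps were needed.
  MatchResult FullMatch(const std::string& pattern, const std::string& text) {
    budget_ = limit_;
    exceeded_ = false;
    bool ok = MatchHere(pattern.c_str(), text.c_str());
    if (exceeded_) return MatchResult::kLimitExceeded;
    return ok ? MatchResult::kMatch : MatchResult::kNoMatch;
  }

 private:
  // Spends one step of the budget; fails once it is used up.
  bool Charge() {
    if (budget_ <= 0) {
      exceeded_ = true;
      return false;
    }
    --budget_;
    return true;
  }

  bool MatchHere(const char* re, const char* text) {
    if (!Charge()) return false;
    if (re[0] == '\0') return text[0] == '\0';  // whole text must be used
    if (re[1] == '*') return MatchStar(re[0], re + 2, text);
    if (text[0] != '\0' && (re[0] == '.' || re[0] == text[0]))
      return MatchHere(re + 1, text + 1);
    return false;
  }

  // Tries zero, one, two, ... copies of `c` before matching the rest.
  bool MatchStar(char c, const char* re, const char* text) {
    do {
      if (MatchHere(re, text)) return true;
      if (exceeded_) return false;  // stop backtracking once over budget
    } while (*text != '\0' && (*text++ == c || c == '.'));
    return false;
  }

  long limit_;
  long budget_ = 0;
  bool exceeded_ = false;
};
```

       With a budget of 10000, "a*a*a*a*a*a*a*a*c" against 30 copies of "a"
       gives up almost immediately, although an unlimited run would try
       millions of ways to split the text among the stars.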

       Normally, to pass one or more modifiers to an RE class, you declare an
       RE_Options object, set the appropriate options, and pass this object to
       an RE constructor. Example:

          RE_Options opt;
          opt.set_caseless(true);
          if (RE("HELLO", opt).PartialMatch("hello world")) ...

       RE_Options has two constructors. The default constructor takes no
       arguments and creates a set of flags that are off by default. The
       other takes an integer option_flags argument, to facilitate transfer of
       legacy code from C programs. This lets you do

          RE(pattern,
            RE_Options(PCRE_CASELESS|PCRE_MULTILINE)).PartialMatch(str);

       However, new code is better off doing

          RE(pattern,
            RE_Options().set_caseless(true).set_multiline(true))
              .PartialMatch(str);

       If you are going to pass one of the most commonly used modifiers, there
       are convenience functions that return an RE_Options object with the
       appropriate modifier already set: CASELESS(), UTF8(), MULTILINE(),
       DOTALL(), and EXTENDED().

       If you need to set several options at once, and you don't want to go
       through the trouble of declaring an RE_Options object and setting
       several options, you can do it on the fly instead: chain several
       set_xxxxx() member functions, since each of them returns a reference to
       its class object. For example, to pass PCRE_CASELESS, PCRE_EXTENDED,
       and PCRE_MULTILINE to an RE in one statement, you may write:

          RE(" ^ xyz \\s+ .* blah$",
            RE_Options()
              .set_caseless(true)
              .set_extended(true)
              .set_multiline(true)).PartialMatch(sometext);
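
       The chaining works because each setter returns a reference to the
       object it modified. The sketch below shows that design on a
       hypothetical Options class. It is not RE_Options itself, and its flag
       constants are illustrative values, not the ones defined in pcre.h.
       Both the legacy-style bitmask constructor and a chain of setters end
       up with the same flag word.

```cpp
#include <cstdint>

// Hypothetical stand-in for RE_Options: a flag word plus chainable setters.
class Options {
 public:
  static constexpr std::uint32_t kCaseless = 1u << 0;   // illustrative bits
  static constexpr std::uint32_t kMultiline = 1u << 1;
  static constexpr std::uint32_t kExtended = 1u << 2;

  Options() = default;                                       // all flags off
  explicit Options(std::uint32_t flags) : flags_(flags) {}  // legacy bitmask

  bool caseless() const { return Has(kCaseless); }
  Options& set_caseless(bool on) { return Set(kCaseless, on); }
  bool multiline() const { return Has(kMultiline); }
  Options& set_multiline(bool on) { return Set(kMultiline, on); }
  bool extended() const { return Has(kExtended); }
  Options& set_extended(bool on) { return Set(kExtended, on); }

  std::uint32_t flags() const { return flags_; }

 private:
  bool Has(std::uint32_t bit) const { return (flags_ & bit) != 0; }

  // Returning *this by reference is what makes the calls chainable.
  Options& Set(std::uint32_t bit, bool on) {
    if (on) {
      flags_ |= bit;
    } else {
      flags_ &= ~bit;
    }
    return *this;
  }

  std::uint32_t flags_ = 0;
};
```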

SCANNING TEXT INCREMENTALLY

       The "Consume" operation may be useful if you want to repeatedly match
       regular expressions at the front of a string and skip over them as they
       match. This requires use of the "StringPiece" type, which represents a
       sub-range of a real string. Like RE, StringPiece is defined in the
       pcrecpp namespace.

         Example: read lines of the form "var = value" from a string.
            string contents = ...;                 // Fill string somehow
            pcrecpp::StringPiece input(contents);  // Wrap in a StringPiece

            string var;
            int value;
            pcrecpp::RE re("(\\w+) = (\\d+)\n");
            while (re.Consume(&input, &var, &value)) {
              ...;
            }

       Each successful call to "Consume" will set "var/value", and also
       advance "input" so it points past the matched text.
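
       The same loop can be sketched with the standard library alone, under
       these assumptions: std::regex with the ECMAScript grammar replaces
       PCRE, a string iterator plays the role of StringPiece, and
       ParseAssignments() is a hypothetical name. The match_continuous flag
       anchors each search at the current position, and the position is
       moved past the match after every success, just as Consume advances
       its input. Like the Consume loop, it stops at the first line that does
       not match.

```cpp
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Reads "var = value\n" lines from the front of `contents` until one fails.
std::vector<std::pair<std::string, int>> ParseAssignments(
    const std::string& contents) {
  static const std::regex re("(\\w+) = (\\d+)\n");
  std::vector<std::pair<std::string, int>> result;
  std::string::const_iterator pos = contents.cbegin();  // like StringPiece
  std::smatch m;
  // match_continuous: the match must start exactly at `pos`.
  while (std::regex_search(pos, contents.cend(), m, re,
                           std::regex_constants::match_continuous)) {
    result.emplace_back(m[1].str(), std::stoi(m[2].str()));
    pos = m[0].second;  // advance past the consumed text
  }
  return result;
}
```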

       The "FindAndConsume" operation is similar to "Consume" but does not
       anchor your match at the beginning of the string. For example, you
       could extract all words from a string by repeatedly calling

         pcrecpp::RE("(\\w+)").FindAndConsume(&input, &word)
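
       An unanchored sketch of that word-extraction loop, using std::regex
       (ECMAScript grammar) as a stand-in for PCRE and a hypothetical
       AllWords() helper: std::regex_search without the match_continuous
       flag may skip ahead to the next word, and the position then moves past
       it, mirroring the FindAndConsume behaviour described above.

```cpp
#include <regex>
#include <string>
#include <vector>

// Collects every run of word characters, scanning left to right.
std::vector<std::string> AllWords(const std::string& input) {
  static const std::regex re("(\\w+)");
  std::vector<std::string> words;
  std::string::const_iterator pos = input.cbegin();
  std::smatch m;
  // No match_continuous flag: the match may begin anywhere after `pos`.
  while (std::regex_search(pos, input.cend(), m, re)) {
    words.push_back(m[1].str());
    pos = m[0].second;  // continue after the word just found
  }
  return words;
}
```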

PARSING HEX/OCTAL/C-RADIX NUMBERS

       By default, if you pass a pointer to a numeric value, the corresponding
       text is interpreted as a base-10 number. You can instead wrap the
       pointer with a call to one of the operators Hex(), Octal(), or CRadix()
       to interpret the text in another base. The CRadix operator interprets
       C-style "0" (base-8) and "0x" (base-16) prefixes, but defaults to
       base-10.

         Example:
           int a, b, c, d;
           pcrecpp::RE re("(.*) (.*) (.*) (.*)");
           re.FullMatch("100 40 0100 0x40",
                        pcrecpp::Octal(&a), pcrecpp::Hex(&b),
                        pcrecpp::CRadix(&c), pcrecpp::CRadix(&d));

       will leave 64 in a, b, c, and d.
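
       The base handling can be imitated with std::strtol, whose base argument
       covers all three cases: 8 as for Octal(), 16 as for Hex(), and 0 for
       C-style prefix detection as for CRadix(). ParseInBase() is a
       hypothetical helper, not the wrapper's code. It rejects empty input,
       trailing characters, and values that do not fit in an int.

```cpp
#include <cerrno>
#include <climits>
#include <cstdlib>
#include <string>

// Parses all of `text` in `base` (0 = C-style prefixes, like CRadix).
bool ParseInBase(const std::string& text, int base, int* out) {
  if (text.empty()) return false;
  const char* begin = text.c_str();
  char* end = nullptr;
  errno = 0;
  long value = std::strtol(begin, &end, base);
  if (end != begin + text.size()) return false;  // junk or nothing parsed
  if (errno == ERANGE || value < INT_MIN || value > INT_MAX) return false;
  *out = static_cast<int>(value);
  return true;
}
```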

REPLACING PARTS OF STRINGS

       You can replace the first match of "pattern" in "str" with "rewrite".
       Within "rewrite", backslash-escaped digits (\1 to \9) can be used to
       insert text matching the corresponding parenthesized group from the
       pattern. \0 in "rewrite" refers to the entire matching text. For
       example:

         string s = "yabba dabba doo";
         pcrecpp::RE("b+").Replace("d", &s);

       will leave "s" containing "yada dabba doo". The result is true if the
       pattern matches and a replacement occurs, false otherwise.

       GlobalReplace is like Replace except that it replaces all occurrences
       of the pattern in the string with the rewrite. Replacements are not
       subject to re-matching. For example:

         string s = "yabba dabba doo";
         pcrecpp::RE("b+").GlobalReplace("d", &s);

       will leave "s" containing "yada dada doo". It returns the number of
       replacements made.

       Extract is like Replace, except that if the pattern matches, "rewrite"
       is copied into "out" (an additional argument) with substitutions. The
       non-matching portions of "text" are ignored. Returns true iff a match
       occurred and the extraction happened successfully; if no match occurs,
       the string is left unaffected.
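
       The rewrite rules above can be sketched with the standard library.
       This is not the wrapper's implementation: std::regex (ECMAScript
       grammar) stands in for PCRE, and ExpandRewrite(), ReplaceFirst(),
       GlobalReplaceAll(), and ExtractRewrite() are hypothetical names. The
       expansion step handles \0 to \9 and turns a doubled backslash into a
       single one. Copying any other backslash sequence through unchanged is
       an assumption of the sketch.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <utility>

// Expands \0..\9 using the groups in `m`; a doubled backslash yields one.
// Unmatched or out-of-range groups expand to the empty string.
std::string ExpandRewrite(const std::string& rewrite, const std::smatch& m) {
  std::string out;
  for (std::size_t i = 0; i < rewrite.size(); ++i) {
    char c = rewrite[i];
    if (c == '\\' && i + 1 < rewrite.size()) {
      char next = rewrite[i + 1];
      if (next >= '0' && next <= '9') {
        std::size_t group = static_cast<std::size_t>(next - '0');
        if (group < m.size()) out += m[group].str();
        ++i;
        continue;
      }
      if (next == '\\') {
        out += '\\';
        ++i;
        continue;
      }
    }
    out += c;  // ordinary character (or unrecognized escape, kept as is)
  }
  return out;
}

// Like Replace: rewrite the first match in *str; false if nothing matched.
bool ReplaceFirst(const std::regex& re, const std::string& rewrite,
                  std::string* str) {
  std::smatch m;
  if (!std::regex_search(*str, m, re)) return false;
  std::string result =
      m.prefix().str() + ExpandRewrite(rewrite, m) + m.suffix().str();
  *str = std::move(result);
  return true;
}

// Like GlobalReplace: rewrite every non-overlapping match, left to right.
// Inserted text is never searched again. Returns the number of matches.
int GlobalReplaceAll(const std::regex& re, const std::string& rewrite,
                     std::string* str) {
  std::string out;
  int count = 0;
  std::string::const_iterator last = str->cbegin();
  for (std::sregex_iterator it(str->cbegin(), str->cend(), re), end;
       it != end; ++it) {
    out.append(last, (*it)[0].first);  // text before this match
    out += ExpandRewrite(rewrite, *it);
    last = (*it)[0].second;
    ++count;
  }
  if (count == 0) return 0;
  out.append(last, str->cend());
  *str = std::move(out);
  return count;
}

// Like Extract: on a match, write only the expanded rewrite to *out.
bool ExtractRewrite(const std::regex& re, const std::string& rewrite,
                    const std::string& text, std::string* out) {
  std::smatch m;
  if (!std::regex_search(text, m, re)) return false;  // *out left as is
  *out = ExpandRewrite(rewrite, m);
  return true;
}
```

       Because std::sregex_iterator resumes after each match in the original
       string, the "d" inserted by GlobalReplaceAll() is never rescanned,
       which matches the no-re-matching rule stated above.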

AUTHOR

       The C++ wrapper was contributed by Google Inc.
       Copyright (c) 2007 Google Inc.

REVISION

       Last updated: 12 November 2007

                                                                    PCRECPP(3)