PCRECPP(3)                 Library Functions Manual                 PCRECPP(3)

NAME

       PCRE - Perl-compatible regular expressions.

SYNOPSIS OF C++ WRAPPER

       #include <pcrecpp.h>

DESCRIPTION

       The C++ wrapper for PCRE was provided by Google Inc. Some additional
       functionality was added by Giuseppe Maxia. This brief man page was
       constructed from the notes in the pcrecpp.h file, which should be
       consulted for further details. Note that the C++ wrapper supports only
       the original 8-bit PCRE library. There is no 16-bit or 32-bit support
       at present.

MATCHING INTERFACE

       The "FullMatch" operation checks that the supplied text matches a
       supplied pattern exactly. If pointer arguments are supplied, it copies
       the sub-strings matched by sub-patterns into them.

         Example: successful match
            pcrecpp::RE re("h.*o");
            re.FullMatch("hello");

         Example: unsuccessful match (requires full match):
            pcrecpp::RE re("e");
            !re.FullMatch("hello");

         Example: creating a temporary RE object:
            pcrecpp::RE("h.*o").FullMatch("hello");

       You can pass in a "const char*" or a "string" for "text". The examples
       below tend to use a const char*. As the examples above show, you can
       either store the RE object in a variable or use a temporary RE object.
       The examples below use one form or the other arbitrarily; either could
       correctly be used for any of them.

       You must supply extra pointer arguments to extract matched subpieces.

         Example: extracts "ruby" into "s" and 1234 into "i"
            int i;
            string s;
            pcrecpp::RE re("(\\w+):(\\d+)");
            re.FullMatch("ruby:1234", &s, &i);

         Example: does not try to extract any extra sub-patterns
            re.FullMatch("ruby:1234", &s);

         Example: does not try to extract into NULL
            re.FullMatch("ruby:1234", NULL, &i);

         Example: integer overflow causes failure
            !re.FullMatch("ruby:1234567891234", NULL, &i);

         Example: fails because there aren't enough sub-patterns:
            !pcrecpp::RE("\\w+:\\d+").FullMatch("ruby:1234", &s);

         Example: fails because string cannot be stored in integer
            !pcrecpp::RE("(.*)").FullMatch("ruby", &i);

       The provided pointer arguments can be pointers to any scalar numeric
       type, or one of:

          string        (matched piece is copied to string)
          StringPiece   (StringPiece is mutated to point to matched piece)
          T             (where "bool T::ParseFrom(const char*, int)" exists)
          NULL          (the corresponding matched sub-pattern is not copied)

       The function returns true iff all of the following conditions are
       satisfied:

         a. "text" matches "pattern" exactly;

         b. The number of matched sub-patterns is >= the number of supplied
            pointers;

         c. The "i"th argument has a suitable type for holding the
            string captured as the "i"th sub-pattern. If you pass in
            void * NULL for the "i"th argument, or a non-void * NULL
            of the correct type, or pass fewer arguments than the
            number of sub-patterns, the "i"th captured sub-pattern is
            ignored.

       CAVEAT: An optional sub-pattern that does not exist in the matched
       string is assigned the empty string. Therefore, the following will
       return false (because the empty string is not a valid number):

          int number;
          pcrecpp::RE("[a-z]+(\\d+)?").FullMatch("abc", &number);

       The matching interface supports at most 16 arguments per call. If you
       need more, consider using the more general interface
       pcrecpp::RE::DoMatch. See pcrecpp.h for the signature for DoMatch.

       NOTE: Do not use no_arg as a placeholder for missing arguments. It is
       used internally to mark the end of a list of optional arguments, and
       passing it yourself can lead to segfaults.
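
       The numeric failures described above (integer overflow, non-numeric
       text, and the empty string left by an unmatched optional sub-pattern)
       can be pictured with a small standalone sketch. It does not use
       pcrecpp and is not the library's conversion code; parse_int and
       parse_int_examples are hypothetical names, and strtol's usual leniency
       (leading whitespace, an optional sign) is left in place.

```cpp
#include <cassert>
#include <cerrno>
#include <climits>
#include <cstdlib>
#include <string>

// Hypothetical helper (not part of pcrecpp): convert a captured piece of
// text to an int, returning false when the text is not an in-range number.
bool parse_int(const std::string& text, int* out) {
  if (text.empty()) return false;         // an empty capture is not a number
  errno = 0;
  char* end = nullptr;
  const long value = std::strtol(text.c_str(), &end, 10);
  if (errno == ERANGE) return false;      // does not even fit in a long
  if (*end != '\0') return false;         // unparsed characters remain
  if (value < INT_MIN || value > INT_MAX) return false;  // overflows int
  *out = static_cast<int>(value);
  return true;
}

// Mirrors the three failing cases from the examples in this section.
void parse_int_examples() {
  int i = 0;
  assert(parse_int("1234", &i) && i == 1234);
  assert(!parse_int("1234567891234", &i));  // integer overflow
  assert(!parse_int("ruby", &i));           // not a number
  assert(!parse_int("", &i));               // unmatched optional group
}
```

       The point of the sketch is only that a capture is converted as a
       whole: any leftover characters, an out-of-range value, or an empty
       capture makes the conversion, and therefore the match, fail.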

QUOTING METACHARACTERS

       You can use the "QuoteMeta" operation to insert backslashes before all
       potentially meaningful characters in a string. The returned string,
       used as a regular expression, will exactly match the original string.

         Example:
            string quoted = RE::QuoteMeta(unquoted);

       Note that it is legal to escape a character even if it has no special
       meaning in a regular expression, so this function does that. (This
       also makes it identical to the Perl function of the same name; see
       "perldoc -f quotemeta".) For example, "1.5-2.0?" becomes
       "1\.5\-2\.0\?".
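
       As a rough illustration of the quoting rule, the following standalone
       sketch escapes every character that is not an ASCII letter, digit, or
       underscore. It is an assumption-laden stand-in, not the library's
       source: quote_meta_sketch is a hypothetical name, and the sketch
       ignores whatever special handling a real implementation may apply to
       NUL bytes or non-ASCII text.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the quoting rule described above.
std::string quote_meta_sketch(const std::string& unquoted) {
  std::string quoted;
  quoted.reserve(unquoted.size() * 2);
  for (unsigned char c : unquoted) {
    const bool is_word = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
                         (c >= '0' && c <= '9') || c == '_';
    if (!is_word) quoted += '\\';  // escape anything outside [A-Za-z0-9_]
    quoted += static_cast<char>(c);
  }
  return quoted;
}

// Reproduces the "1.5-2.0?" example from the text.
void quote_meta_examples() {
  assert(quote_meta_sketch("1.5-2.0?") == "1\\.5\\-2\\.0\\?");
  assert(quote_meta_sketch("abc_123") == "abc_123");
}
```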

PARTIAL MATCHES

       You can use the "PartialMatch" operation when you want the pattern to
       match any substring of the text.

         Example: simple search for a string:
            pcrecpp::RE("ell").PartialMatch("hello");

         Example: find first number in a string:
            int number;
            pcrecpp::RE re("(\\d+)");
            re.PartialMatch("x*100 + 20", &number);
            assert(number == 100);

UTF-8 AND THE MATCHING INTERFACE

       By default, pattern and text are plain text, one byte per character.
       The UTF8 flag, passed to the constructor, causes both pattern and
       string to be treated as UTF-8 text, still a byte stream but potentially
       multiple bytes per character. In practice, the text is likelier to be
       UTF-8 than the pattern, but the match returned may depend on the UTF8
       flag, so always use it when matching UTF-8 text. For example, "." will
       match one byte normally, but with UTF8 set it matches one whole
       character, which may span several bytes.

         Example:
            pcrecpp::RE_Options options;
            options.set_utf8();
            pcrecpp::RE re(utf8_pattern, options);
            re.FullMatch(utf8_string);

         Example: using the convenience function UTF8():
            pcrecpp::RE re(utf8_pattern, pcrecpp::UTF8());
            re.FullMatch(utf8_string);

       NOTE: The UTF8 flag is ignored if PCRE was not configured with the
             --enable-utf8 flag.
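
       To make the byte-versus-character distinction concrete, the sketch
       below reads the length of a UTF-8 sequence from its first byte, which
       is how many bytes a single character occupies in the byte stream. It
       is independent of pcrecpp; utf8_sequence_length is a hypothetical
       name, and the helper deliberately does not validate continuation
       bytes or reject overlong or out-of-range encodings.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: number of bytes in the UTF-8 sequence that starts
// with `lead`, or 0 if `lead` cannot start a sequence.
std::size_t utf8_sequence_length(unsigned char lead) {
  if (lead < 0x80) return 1;            // 0xxxxxxx: ASCII
  if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx
  if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx
  if ((lead & 0xF8) == 0xF0) return 4;  // 11110xxx
  return 0;                             // continuation byte or invalid lead
}

void utf8_sequence_length_examples() {
  assert(utf8_sequence_length('a') == 1);   // U+0061
  assert(utf8_sequence_length(0xC3) == 2);  // lead byte of U+00E9
  assert(utf8_sequence_length(0xE2) == 3);  // lead byte of U+20AC
  assert(utf8_sequence_length(0xF0) == 4);  // lead byte of U+1F600
  assert(utf8_sequence_length(0x80) == 0);  // continuation byte
}
```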

PASSING MODIFIERS TO THE REGULAR EXPRESSION ENGINE

       PCRE defines some modifiers to change the behavior of the regular
       expression engine. The C++ wrapper defines an auxiliary class,
       RE_Options, as a vehicle to pass such modifiers to a RE class.
       Currently, the following modifiers are supported:

          modifier              description                 Perl equivalent

          PCRE_CASELESS         case insensitive match      /i
          PCRE_MULTILINE        multiple lines match        /m
          PCRE_DOTALL           dot matches newlines        /s
          PCRE_DOLLAR_ENDONLY   $ matches only at end       N/A
          PCRE_EXTRA            strict escape parsing       N/A
          PCRE_EXTENDED         ignore whitespace           /x
          PCRE_UTF8             handles UTF8 chars          built-in
          PCRE_UNGREEDY         reverses * and *?           N/A
          PCRE_NO_AUTO_CAPTURE  disables capturing parens   N/A (*)

       (*) Both Perl and PCRE allow non-capturing parentheses by means of the
       "?:" modifier within the pattern itself; e.g., (?:ab|cd) does not
       capture, while (ab|cd) does.

       For a full account of how each modifier works, please check the PCRE
       API reference page.

       For each modifier, there are two member functions whose names are
       derived from the modifier name in lowercase, without the "PCRE_"
       prefix. For instance, PCRE_CASELESS is handled by

         bool caseless()

       which returns true if the modifier is set, and

         RE_Options & set_caseless(bool)

       which sets or unsets the modifier. Moreover, PCRE_EXTRA_MATCH_LIMIT can
       be accessed through the set_match_limit() and match_limit() member
       functions. Setting match_limit to a non-zero value will limit the
       execution of PCRE to keep it from doing bad things like blowing the
       stack or taking an eternity to return a result. A value of 5000 is good
       enough to stop stack blowup in a 2MB thread stack. Setting match_limit
       to zero disables match limiting. Alternatively, you can call
       set_match_limit_recursion(), which uses
       PCRE_EXTRA_MATCH_LIMIT_RECURSION to limit how much PCRE recurses. The
       match limit bounds the number of internal match() calls PCRE makes;
       the recursion limit bounds the depth of internal recursion, and
       therefore the amount of stack that is used.

       Normally, to pass one or more modifiers to a RE class, you declare a
       RE_Options object, set the appropriate options, and pass this object to
       a RE constructor. Example:

          RE_Options opt;
          opt.set_caseless(true);
          if (RE("HELLO", opt).PartialMatch("hello world")) ...

       RE_Options has two constructors. The default constructor takes no
       arguments and creates a set of flags that are all off. The other takes
       an option_flags argument, which is there to ease the transfer of legacy
       code from C programs. This lets you do

          RE(pattern,
            RE_Options(PCRE_CASELESS|PCRE_MULTILINE)).PartialMatch(str);

       However, new code is better off doing

          RE(pattern,
            RE_Options().set_caseless(true).set_multiline(true))
              .PartialMatch(str);

       If you are going to pass one of the most used modifiers, there are some
       convenience functions that return a RE_Options class with the
       appropriate modifier already set: CASELESS(), UTF8(), MULTILINE(),
       DOTALL(), and EXTENDED().

       If you need to set several options at once, and you don't want to go
       through the pains of declaring a RE_Options object and setting several
       options, there is a parallel method that gives you this ability on the
       fly. You can chain several set_xxxxx() member functions, since each of
       them returns a reference to its class object. For example, to pass
       PCRE_CASELESS, PCRE_EXTENDED, and PCRE_MULTILINE to a RE with one
       statement, you may write:

          RE(" ^ xyz \\s+ .* blah$",
            RE_Options()
              .set_caseless(true)
              .set_extended(true)
              .set_multiline(true)).PartialMatch(sometext);

SCANNING TEXT INCREMENTALLY

       The "Consume" operation may be useful if you want to repeatedly match
       regular expressions at the front of a string and skip over them as they
       match. This requires use of the "StringPiece" type, which represents a
       sub-range of a real string. Like RE, StringPiece is defined in the
       pcrecpp namespace.

         Example: read lines of the form "var = value" from a string.
            string contents = ...;                 // Fill string somehow
            pcrecpp::StringPiece input(contents);  // Wrap in a StringPiece

            string var;
            int value;
            pcrecpp::RE re("(\\w+) = (\\d+)\n");
            while (re.Consume(&input, &var, &value)) {
              ...;
            }

       Each successful call to "Consume" will set "var" and "value", and also
       advance "input" so it points past the matched text.

       The "FindAndConsume" operation is similar to "Consume" but does not
       anchor your match at the beginning of the string. For example, you
       could extract all words from a string by repeatedly calling

         pcrecpp::RE("(\\w+)").FindAndConsume(&input, &word)

PARSING HEX/OCTAL/C-RADIX NUMBERS

       By default, if you pass a pointer to a numeric value, the corresponding
       text is interpreted as a base-10 number. You can instead wrap the
       pointer with a call to one of the operators Hex(), Octal(), or CRadix()
       to interpret the text in another base. The CRadix operator interprets
       C-style "0" (base-8) and "0x" (base-16) prefixes, but defaults to
       base-10.

         Example:
           int a, b, c, d;
           pcrecpp::RE re("(.*) (.*) (.*) (.*)");
           re.FullMatch("100 40 0100 0x40",
                        pcrecpp::Octal(&a), pcrecpp::Hex(&b),
                        pcrecpp::CRadix(&c), pcrecpp::CRadix(&d));

       will leave 64 in a, b, c, and d.
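
       The arithmetic behind that result can be checked without the library.
       The sketch below uses the standard strtol function, whose base
       argument of 0 honours C-style "0" and "0x" prefixes in the way the
       text describes for CRadix. That the wrapper behaves like strtol here
       is an assumption made for illustration; parse_in_base is a
       hypothetical name.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical helper: parse `text` in the given base using strtol.
// Base 0 asks strtol to pick base 8, 16, or 10 from a C-style prefix.
long parse_in_base(const char* text, int base) {
  return std::strtol(text, nullptr, base);
}

// The four inputs from the example above all denote the value 64.
void parse_in_base_examples() {
  assert(parse_in_base("100", 8) == 64);    // octal
  assert(parse_in_base("40", 16) == 64);    // hexadecimal
  assert(parse_in_base("0100", 0) == 64);   // "0" prefix selects base 8
  assert(parse_in_base("0x40", 0) == 64);   // "0x" prefix selects base 16
  assert(parse_in_base("100", 0) == 100);   // no prefix: base 10
}
```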

REPLACING PARTS OF STRINGS

       You can replace the first match of "pattern" in "str" with "rewrite".
       Within "rewrite", backslash-escaped digits (\1 to \9) can be used to
       insert the text matched by the corresponding parenthesized group in the
       pattern. \0 in "rewrite" refers to the entire matching text. For
       example:

         string s = "yabba dabba doo";
         pcrecpp::RE("b+").Replace("d", &s);

       will leave "s" containing "yada dabba doo". The result is true if the
       pattern matches and a replacement occurs, false otherwise.

       GlobalReplace is like Replace, except that it replaces all occurrences
       of the pattern in the string with the rewrite. Replacements are not
       subject to re-matching. For example:

         string s = "yabba dabba doo";
         pcrecpp::RE("b+").GlobalReplace("d", &s);

       will leave "s" containing "yada dada doo". It returns the number of
       replacements made.

       Extract is like Replace, except that if the pattern matches, "rewrite"
       is copied into "out" (an additional argument) with substitutions. The
       non-matching portions of "text" are ignored. Extract returns true iff a
       match occurred and the extraction happened successfully; if no match
       occurs, the string is left unaffected.
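
       The difference between replacing the first match and replacing every
       match can be reproduced with the C++ standard library alone. The
       sketch below swaps in std::regex (ECMAScript syntax) for pcrecpp, so
       it illustrates only the semantics, not this wrapper's API: std::regex
       writes group references in the rewrite string as $1 rather than \1,
       and replace_first and replace_all are hypothetical names.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Stand-in for "replace the first match", built on std::regex.
std::string replace_first(const std::string& s, const std::string& pattern,
                          const std::string& rewrite) {
  return std::regex_replace(s, std::regex(pattern), rewrite,
                            std::regex_constants::format_first_only);
}

// Stand-in for "replace all matches", built on std::regex.
std::string replace_all(const std::string& s, const std::string& pattern,
                        const std::string& rewrite) {
  return std::regex_replace(s, std::regex(pattern), rewrite);
}

// Same input and pattern as the Replace and GlobalReplace examples.
void replace_examples() {
  assert(replace_first("yabba dabba doo", "b+", "d") == "yada dabba doo");
  assert(replace_all("yabba dabba doo", "b+", "d") == "yada dada doo");
}
```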

AUTHOR

       The C++ wrapper was contributed by Google Inc.
       Copyright (c) 2007 Google Inc.

REVISION

       Last updated: 08 January 2012

PCRE 8.30                       08 January 2012                     PCRECPP(3)