PCRECPP(3)                 Library Functions Manual                 PCRECPP(3)

NAME

       PCRE - Perl-compatible regular expressions.

SYNOPSIS OF C++ WRAPPER

       #include <pcrecpp.h>

DESCRIPTION

       The C++ wrapper for PCRE was provided by Google Inc. Some additional
       functionality was added by Giuseppe Maxia. This brief man page was
       constructed from the notes in the pcrecpp.h file, which should be
       consulted for further details.

MATCHING INTERFACE

       The "FullMatch" operation checks that supplied text matches a supplied
       pattern exactly. If pointer arguments are supplied, it copies matched
       sub-strings that match sub-patterns into them.

         Example: successful match
            pcrecpp::RE re("h.*o");
            re.FullMatch("hello");

         Example: unsuccessful match (requires full match):
            pcrecpp::RE re("e");
            !re.FullMatch("hello");

         Example: creating a temporary RE object:
            pcrecpp::RE("h.*o").FullMatch("hello");

       You can pass in a "const char*" or a "string" for "text". The examples
       below tend to use a const char*. As the examples above show, you can
       either store the RE object explicitly in a variable or use a temporary
       RE object. The examples below use one mode or the other arbitrarily;
       either could correctly be used for any of them.

       You must supply extra pointer arguments to extract matched subpieces.

         Example: extracts "ruby" into "s" and 1234 into "i"
            int i;
            string s;
            pcrecpp::RE re("(\\w+):(\\d+)");
            re.FullMatch("ruby:1234", &s, &i);

         Example: does not try to extract any extra sub-patterns
            re.FullMatch("ruby:1234", &s);

         Example: does not try to extract into NULL
            re.FullMatch("ruby:1234", NULL, &i);

         Example: integer overflow causes failure
            !re.FullMatch("ruby:1234567891234", NULL, &i);

         Example: fails because there aren't enough sub-patterns:
            !pcrecpp::RE("\\w+:\\d+").FullMatch("ruby:1234", &s);

         Example: fails because string cannot be stored in integer
            !pcrecpp::RE("(.*)").FullMatch("ruby", &i);

       The provided pointer arguments can be pointers to any scalar numeric
       type, or one of:

          string        (matched piece is copied to string)
          StringPiece   (StringPiece is mutated to point to matched piece)
          T             (where "bool T::ParseFrom(const char*, int)" exists)
          NULL          (the corresponding matched sub-pattern is not copied)

       The function returns true iff all of the following conditions are
       satisfied:

         a. "text" matches "pattern" exactly;

         b. The number of matched sub-patterns is >= number of supplied
            pointers;

         c. The "i"th argument has a suitable type for holding the
            string captured as the "i"th sub-pattern. If you pass in
            NULL for the "i"th argument, or pass fewer arguments than
            the number of sub-patterns, the "i"th captured sub-pattern
            is ignored.

       CAVEAT: An optional sub-pattern that does not exist in the matched
       string is assigned the empty string. Therefore, the following will
       return false (because the empty string is not a valid number):

          int number;
          pcrecpp::RE("[a-z]+(\\d+)?").FullMatch("abc", &number);

       The matching interface supports at most 16 arguments per call. If you
       need more, consider using the more general interface
       pcrecpp::RE::DoMatch. See pcrecpp.h for the signature for DoMatch.
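
       The numeric failure cases listed above (overflow, non-numeric text,
       and the empty string left by an unmatched optional sub-pattern) can be
       illustrated with a small standalone sketch. It uses only the C++
       standard library; parse_int_sketch is a hypothetical helper written
       for this page, not part of pcrecpp, and the wrapper's own conversion
       code may differ in detail.

```cpp
#include <cerrno>
#include <climits>
#include <cstdlib>
#include <string>

// Hypothetical helper (NOT part of pcrecpp): converts captured text to an
// int and returns false in the situations the man page describes.
bool parse_int_sketch(const std::string& text, int* out) {
  if (text.empty()) return false;  // e.g. an unmatched optional sub-pattern
  errno = 0;
  char* end = nullptr;
  const long v = std::strtol(text.c_str(), &end, 10);
  if (end != text.c_str() + text.size()) return false;  // e.g. "ruby"
  if (errno == ERANGE) return false;                     // does not fit in long
  if (v < INT_MIN || v > INT_MAX) return false;          // does not fit in int
  if (out != nullptr) *out = static_cast<int>(v);
  return true;
}
```

       With this sketch, "1234" converts successfully, while "1234567891234",
       "ruby", and "" are all rejected, which mirrors the FullMatch examples
       that return false.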

QUOTING METACHARACTERS

       You can use the "QuoteMeta" operation to insert backslashes before all
       potentially meaningful characters in a string. The returned string,
       used as a regular expression, will exactly match the original string.

         Example:
            string quoted = RE::QuoteMeta(unquoted);

       Note that it is legal to escape a character even if it has no special
       meaning in a regular expression, so this function does that. (This
       also makes it identical to the Perl function of the same name; see
       "perldoc -f quotemeta".) For example, "1.5-2.0?" becomes
       "1\.5\-2\.0\?".
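
       The escaping rule can be sketched in a few lines of standard C++. This
       is an illustration only: quote_meta_sketch is a hypothetical name, it
       assumes plain ASCII input, and it does not attempt to reproduce how
       the library treats NUL bytes or bytes outside the ASCII range. Real
       code should call RE::QuoteMeta.

```cpp
#include <string>

// Hypothetical stand-in for illustration (NOT the pcrecpp implementation):
// puts a backslash before every ASCII character outside [A-Za-z0-9_].
std::string quote_meta_sketch(const std::string& unquoted) {
  std::string result;
  result.reserve(unquoted.size() * 2);
  for (char c : unquoted) {
    const bool is_word = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
                         (c >= '0' && c <= '9') || c == '_';
    if (!is_word) result += '\\';  // escape even if the character is harmless
    result += c;
  }
  return result;
}
```

       Applied to "1.5-2.0?", the sketch yields the same "1\.5\-2\.0\?" shown
       in the example above.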

PARTIAL MATCHES

       You can use the "PartialMatch" operation when you want the pattern to
       match any substring of the text.

         Example: simple search for a string:
            pcrecpp::RE("ell").PartialMatch("hello");

         Example: find first number in a string:
            int number;
            pcrecpp::RE re("(\\d+)");
            re.PartialMatch("x*100 + 20", &number);
            assert(number == 100);

UTF-8 AND THE MATCHING INTERFACE

       By default, pattern and text are plain text, one byte per character.
       The UTF8 flag, passed to the constructor, causes both pattern and
       string to be treated as UTF-8 text, still a byte stream but potentially
       multiple bytes per character. In practice, the text is likelier to be
       UTF-8 than the pattern, but the match returned may depend on the UTF8
       flag, so always use it when matching UTF-8 text. For example, "." will
       match one byte normally, but with UTF8 set it may match a complete
       multi-byte character (up to four bytes in standard UTF-8).

         Example:
            pcrecpp::RE_Options options;
            options.set_utf8();
            pcrecpp::RE re(utf8_pattern, options);
            re.FullMatch(utf8_string);

         Example: using the convenience function UTF8():
            pcrecpp::RE re(utf8_pattern, pcrecpp::UTF8());
            re.FullMatch(utf8_string);

       NOTE: The UTF8 flag is ignored if pcre was not configured with the
             --enable-utf8 flag.

PASSING MODIFIERS TO THE REGULAR EXPRESSION ENGINE

       PCRE defines some modifiers to change the behavior of the regular
       expression engine. The C++ wrapper defines an auxiliary class,
       RE_Options, as a vehicle to pass such modifiers to an RE class.
       Currently, the following modifiers are supported:

          modifier              description               Perl equivalent

          PCRE_CASELESS         case insensitive match      /i
          PCRE_MULTILINE        multiple lines match        /m
          PCRE_DOTALL           dot matches newlines        /s
          PCRE_DOLLAR_ENDONLY   $ matches only at end       N/A
          PCRE_EXTRA            strict escape parsing       N/A
          PCRE_EXTENDED         ignore whitespace           /x
          PCRE_UTF8             handles UTF-8 chars         built-in
          PCRE_UNGREEDY         reverses * and *?           N/A
          PCRE_NO_AUTO_CAPTURE  disables capturing parens   N/A (*)

       (*) Both Perl and PCRE allow non-capturing parentheses by means of the
       "?:" modifier within the pattern itself. For example, (?:ab|cd) does
       not capture, while (ab|cd) does.

       For a full account of how each modifier works, please check the PCRE
       API reference page.

       For each modifier, there are two member functions whose names are
       derived from the modifier in lowercase, without the "PCRE_" prefix.
       For instance, PCRE_CASELESS is handled by

         bool caseless()

       which returns true if the modifier is set, and

         RE_Options & set_caseless(bool)

       which sets or unsets the modifier. Moreover, PCRE_EXTRA_MATCH_LIMIT can
       be accessed through the set_match_limit() and match_limit() member
       functions. Setting match_limit to a non-zero value limits the
       execution of pcre, to keep it from blowing the stack or taking an
       extremely long time to return a result. A value of 5000 is good enough
       to stop stack blowup in a 2MB thread stack. Setting match_limit to
       zero disables match limiting. Alternatively, you can call
       set_match_limit_recursion(), which uses
       PCRE_EXTRA_MATCH_LIMIT_RECURSION to limit how much PCRE recurses.
       match_limit() limits the number of matches PCRE does;
       match_limit_recursion() limits the depth of internal recursion, and
       therefore the amount of stack that is used.

       Normally, to pass one or more modifiers to an RE class, you declare an
       RE_Options object, set the appropriate options, and pass this object
       to an RE constructor. Example:

          RE_Options opt;
          opt.set_caseless(true);
          if (RE("HELLO", opt).PartialMatch("hello world")) ...

       RE_Options has two constructors. The default constructor takes no
       arguments and creates a set of flags that are all off. The other takes
       an option_flags parameter, which is there to ease the transfer of
       legacy code from C programs. This lets you do

          RE(pattern,
            RE_Options(PCRE_CASELESS|PCRE_MULTILINE)).PartialMatch(str);

       However, new code is better off doing

          RE(pattern,
            RE_Options().set_caseless(true).set_multiline(true))
              .PartialMatch(str);

       If you are going to pass one of the most used modifiers, there are some
       convenience functions that return an RE_Options object with the
       appropriate modifier already set: CASELESS(), UTF8(), MULTILINE(),
       DOTALL(), and EXTENDED().

       If you need to set several options at once and do not want to declare
       an RE_Options object and set each option in a separate statement, you
       can do it on the fly. The set_xxxxx() member functions can be chained,
       since each of them returns a reference to its class object. For
       example, to pass PCRE_CASELESS, PCRE_EXTENDED, and PCRE_MULTILINE to an
       RE with one statement, you may write:

          RE(" ^ xyz \\s+ .* blah$",
            RE_Options()
              .set_caseless(true)
              .set_extended(true)
              .set_multiline(true)).PartialMatch(sometext);

SCANNING TEXT INCREMENTALLY

       The "Consume" operation may be useful if you want to repeatedly match
       regular expressions at the front of a string and skip over them as they
       match. This requires use of the "StringPiece" type, which represents a
       sub-range of a real string. Like RE, StringPiece is defined in the
       pcrecpp namespace.

         Example: read lines of the form "var = value" from a string.
            string contents = ...;                 // Fill string somehow
            pcrecpp::StringPiece input(contents);  // Wrap in a StringPiece

            string var;
            int value;
            pcrecpp::RE re("(\\w+) = (\\d+)\n");
            while (re.Consume(&input, &var, &value)) {
              ...;
            }

       Each successful call to "Consume" will set "var/value", and also
       advance "input" so it points past the matched text.

       The "FindAndConsume" operation is similar to "Consume" but does not
       anchor your match at the beginning of the string. For example, you
       could extract all words from a string by repeatedly calling

         pcrecpp::RE("(\\w+)").FindAndConsume(&input, &word)
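
       The "match at the front, then advance" loop that "Consume" performs
       can be imitated without PCRE. The sketch below swaps in std::regex
       from the C++ standard library, with the match_continuous flag standing
       in for the anchoring that "Consume" applies, and an iterator standing
       in for the StringPiece. consume_all_sketch is a hypothetical helper
       for illustration and says nothing about how pcrecpp is implemented.

```cpp
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (uses std::regex, NOT pcrecpp): reads consecutive
// "var = value\n" lines from the front of `contents` and stops at the first
// position where the pattern no longer matches.
std::vector<std::pair<std::string, int>> consume_all_sketch(
    const std::string& contents) {
  static const std::regex re("(\\w+) = (\\d+)\n");
  std::vector<std::pair<std::string, int>> out;
  std::string::const_iterator pos = contents.begin();
  std::smatch m;
  // match_continuous requires the match to start exactly at `pos`.
  while (std::regex_search(pos, contents.cend(), m, re,
                           std::regex_constants::match_continuous)) {
    // std::stoi throws on overflow; a real caller would handle that.
    out.emplace_back(m[1].str(), std::stoi(m[2].str()));
    pos = m[0].second;  // advance past the matched text
  }
  return out;
}
```

       Dropping the match_continuous flag would let each search skip ahead to
       the next match, which is the difference between "Consume" and
       "FindAndConsume" described above.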

PARSING HEX/OCTAL/C-RADIX NUMBERS

       By default, if you pass a pointer to a numeric value, the corresponding
       text is interpreted as a base-10 number. You can instead wrap the
       pointer with a call to one of the operators Hex(), Octal(), or CRadix()
       to interpret the text in another base. The CRadix operator interprets
       C-style "0" (base-8) and "0x" (base-16) prefixes, but defaults to
       base-10.

         Example:
           int a, b, c, d;
           pcrecpp::RE re("(.*) (.*) (.*) (.*)");
           re.FullMatch("100 40 0100 0x40",
                        pcrecpp::Octal(&a), pcrecpp::Hex(&b),
                        pcrecpp::CRadix(&c), pcrecpp::CRadix(&d));

       will leave 64 in a, b, c, and d.
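
       The arithmetic behind that result can be checked with the standard
       strtol() function, whose base argument behaves in the same way: 8 for
       octal, 16 for hexadecimal, and 0 for C-style prefix detection. The
       helper below is a hypothetical sketch for illustration, not the
       wrapper's own code.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper (NOT part of pcrecpp). base 8 plays the role of
// Octal(), base 16 of Hex(), and base 0 of CRadix().
bool parse_radix_sketch(const std::string& text, int base, long* out) {
  if (text.empty()) return false;
  char* end = nullptr;
  const long v = std::strtol(text.c_str(), &end, base);
  if (end != text.c_str() + text.size()) return false;  // unparsed trailing text
  *out = v;
  return true;
}
```

       Parsing "100" in base 8, "40" in base 16, and both "0100" and "0x40"
       in base 0 gives 64 each time, while "100" in base 0 stays 100 because
       it carries no prefix.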

REPLACING PARTS OF STRINGS

       You can replace the first match of "pattern" in "str" with "rewrite".
       Within "rewrite", backslash-escaped digits (\1 to \9) can be used to
       insert the text matching the corresponding parenthesized group from
       the pattern. \0 in "rewrite" refers to the entire matching text. For
       example:

         string s = "yabba dabba doo";
         pcrecpp::RE("b+").Replace("d", &s);

       will leave "s" containing "yada dabba doo". The result is true if the
       pattern matches and a replacement occurs, false otherwise.

       GlobalReplace is like Replace except that it replaces all occurrences
       of the pattern in the string with the rewrite. Replacements are not
       subject to re-matching. For example:

         string s = "yabba dabba doo";
         pcrecpp::RE("b+").GlobalReplace("d", &s);

       will leave "s" containing "yada dada doo". It returns the number of
       replacements made.

       Extract is like Replace, except that if the pattern matches, "rewrite"
       is copied into "out" (an additional argument) with substitutions. The
       non-matching portions of "text" are ignored. Returns true iff a match
       occurred and the extraction happened successfully; if no match occurs,
       the string is left unaffected.

AUTHOR

       The C++ wrapper was contributed by Google Inc.
       Copyright (c) 2007 Google Inc.

REVISION

       Last updated: 06 March 2007