PCRECPP(3)                 Library Functions Manual                 PCRECPP(3)

NAME

       PCRE - Perl-compatible regular expressions.

SYNOPSIS OF C++ WRAPPER

       #include <pcrecpp.h>

DESCRIPTION

       The C++ wrapper for PCRE was provided by Google Inc. Some additional
       functionality was added by Giuseppe Maxia. This brief man page was
       constructed from the notes in the pcrecpp.h file, which should be
       consulted for further details.

MATCHING INTERFACE

       The "FullMatch" operation checks that the supplied text matches a
       supplied pattern exactly. If pointer arguments are supplied, it copies
       the substrings that match sub-patterns into them.

         Example: successful match
            pcrecpp::RE re("h.*o");
            re.FullMatch("hello");

         Example: unsuccessful match (requires full match):
            pcrecpp::RE re("e");
            !re.FullMatch("hello");

         Example: creating a temporary RE object:
            pcrecpp::RE("h.*o").FullMatch("hello");

       You can pass in a "const char*" or a "string" for "text". The examples
       below tend to use a const char*. You can, as in the examples above,
       store the RE object explicitly in a variable or use a temporary RE
       object. The examples below use one mode or the other arbitrarily;
       either could correctly be used for any of them.

       You must supply extra pointer arguments to extract matched subpieces.

         Example: extracts "ruby" into "s" and 1234 into "i"
            int i;
            string s;
            pcrecpp::RE re("(\\w+):(\\d+)");
            re.FullMatch("ruby:1234", &s, &i);

         Example: does not try to extract any extra sub-patterns
            re.FullMatch("ruby:1234", &s);

         Example: does not try to extract into NULL
            re.FullMatch("ruby:1234", NULL, &i);

         Example: integer overflow causes failure
            !re.FullMatch("ruby:1234567891234", NULL, &i);

         Example: fails because there aren't enough sub-patterns:
            !pcrecpp::RE("\\w+:\\d+").FullMatch("ruby:1234", &s);

         Example: fails because string cannot be stored in integer
            !pcrecpp::RE("(.*)").FullMatch("ruby", &i);

       The provided pointer arguments can be pointers to any scalar numeric
       type, or one of:

          string        (matched piece is copied to string)
          StringPiece   (StringPiece is mutated to point to matched piece)
          T             (where "bool T::ParseFrom(const char*, int)" exists)
          NULL          (the corresponding matched sub-pattern is not copied)

       The function returns true if and only if all of the following
       conditions are satisfied:

         a. "text" matches "pattern" exactly;

         b. The number of matched sub-patterns is >= number of supplied
            pointers;

         c. The "i"th argument has a suitable type for holding the
            string captured as the "i"th sub-pattern. If you pass in
            void * NULL for the "i"th argument, or a non-void * NULL
            of the correct type, or pass fewer arguments than the
            number of sub-patterns, the "i"th captured sub-pattern is
            ignored.

       CAVEAT: An optional sub-pattern that does not exist in the matched
       string is assigned the empty string. Therefore, the following will
       return false (because the empty string is not a valid number):

          int number;
          pcrecpp::RE("[a-z]+(\\d+)?").FullMatch("abc", &number);

       The matching interface supports at most 16 arguments per call. If you
       need more, consider using the more general interface
       pcrecpp::RE::DoMatch. See pcrecpp.h for the signature for DoMatch.

       NOTE: Do not use no_arg as a placeholder for missing arguments. It is
       used internally to mark the end of a list of optional arguments, and
       passing it yourself can lead to segfaults.
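
       The integer-related failure cases above (overflow, non-numeric text,
       and the empty string from an unmatched optional sub-pattern) can be
       illustrated without the library. The following is only a sketch of
       the rule, not the wrapper's actual conversion code; ParseCapturedInt
       is a hypothetical helper name, and it uses std::strtol, so details
       such as leading whitespace may be handled differently by pcrecpp.

```cpp
#include <cerrno>
#include <climits>
#include <cstdlib>
#include <string>

// Hypothetical helper (not part of pcrecpp): converts captured text to an
// int and reports failure for empty, non-numeric, or out-of-range input.
bool ParseCapturedInt(const std::string& text, int* out) {
  if (text.empty()) return false;  // e.g. an unmatched optional sub-pattern
  errno = 0;
  char* end = nullptr;
  const long value = std::strtol(text.c_str(), &end, 10);
  if (end != text.c_str() + text.size()) return false;   // not all digits
  if (errno == ERANGE) return false;                     // overflowed long
  if (value < INT_MIN || value > INT_MAX) return false;  // overflowed int
  *out = static_cast<int>(value);
  return true;
}
```

       Under these assumptions "1234" converts, while "1234567891234",
       "ruby", and the empty string are all rejected, which matches the
       outcomes of the examples in this section.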

QUOTING METACHARACTERS

       You can use the "QuoteMeta" operation to insert backslashes before all
       potentially meaningful characters in a string. The returned string,
       used as a regular expression, will exactly match the original string.

         Example:
            string quoted = pcrecpp::RE::QuoteMeta(unquoted);

       Note that it is legal to escape a character even if it has no special
       meaning in a regular expression, so this function does that. (This
       also makes it identical to the Perl function of the same name; see
       "perldoc -f quotemeta".) For example, "1.5-2.0?" becomes
       "1\.5\-2\.0\?".
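
       To make the quoting rule concrete, here is a minimal stand-alone
       sketch that escapes every byte other than ASCII letters, digits, and
       underscore. QuoteMetaSketch is a hypothetical name, and the rule is
       an assumption drawn from the example above; the real QuoteMeta may
       treat some bytes (such as NUL or the bytes of UTF-8 sequences)
       differently, so consult pcrecpp.h before relying on edge cases.

```cpp
#include <string>

// Hypothetical stand-in for the quoting rule: backslash-escape every byte
// that is not an ASCII letter, digit, or underscore.
std::string QuoteMetaSketch(const std::string& unquoted) {
  std::string quoted;
  quoted.reserve(unquoted.size() * 2);
  for (unsigned char c : unquoted) {
    const bool is_word = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
                         (c >= '0' && c <= '9') || c == '_';
    if (!is_word) quoted += '\\';  // escape anything that might be special
    quoted += static_cast<char>(c);
  }
  return quoted;
}
```

       With this sketch, "1.5-2.0?" becomes "1\.5\-2\.0\?" and a string made
       only of word characters is returned unchanged.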

PARTIAL MATCHES

       You can use the "PartialMatch" operation when you want the pattern to
       match any substring of the text.

         Example: simple search for a string:
            pcrecpp::RE("ell").PartialMatch("hello");

         Example: find first number in a string:
            int number;
            pcrecpp::RE re("(\\d+)");
            re.PartialMatch("x*100 + 20", &number);
            assert(number == 100);
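
       The difference between matching the whole text and matching any
       substring can also be seen with the standard library. The sketch
       below uses std::regex (ECMAScript syntax) as a stand-in, not pcrecpp
       itself; FullMatchSketch and PartialMatchSketch are hypothetical names
       chosen for illustration.

```cpp
#include <regex>
#include <string>

// std::regex analogue of FullMatch: the pattern must cover the entire text.
bool FullMatchSketch(const std::string& text, const std::string& pattern) {
  return std::regex_match(text, std::regex(pattern));
}

// std::regex analogue of PartialMatch: the pattern may match any substring.
bool PartialMatchSketch(const std::string& text, const std::string& pattern) {
  return std::regex_search(text, std::regex(pattern));
}
```

       For the text "hello", the pattern "ell" succeeds only with the
       partial form, whereas "h.*o" succeeds with both.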

UTF-8 AND THE MATCHING INTERFACE

       By default, pattern and text are plain text, one byte per character.
       The UTF8 flag, passed to the constructor, causes both pattern and
       string to be treated as UTF-8 text, still a byte stream but potentially
       multiple bytes per character. In practice, the text is likelier to be
       UTF-8 than the pattern, but the match returned may depend on the UTF8
       flag, so always use it when matching UTF-8 text. For example, "."
       normally matches one byte, but with UTF8 set it matches one whole
       character, which may span several bytes.

         Example:
            pcrecpp::RE_Options options;
            options.set_utf8();
            pcrecpp::RE re(utf8_pattern, options);
            re.FullMatch(utf8_string);

         Example: using the convenience function UTF8():
            pcrecpp::RE re(utf8_pattern, pcrecpp::UTF8());
            re.FullMatch(utf8_string);

       NOTE: The UTF8 flag is ignored if pcre was not configured with the
             --enable-utf8 flag.

PASSING MODIFIERS TO THE REGULAR EXPRESSION ENGINE

       PCRE defines some modifiers to change the behavior of the regular
       expression engine. The C++ wrapper defines an auxiliary class,
       RE_Options, as a vehicle to pass such modifiers to an RE object.
       Currently, the following modifiers are supported:

          modifier              description                 Perl equivalent

          PCRE_CASELESS         case insensitive match      /i
          PCRE_MULTILINE        multiple lines match        /m
          PCRE_DOTALL           dot matches newlines        /s
          PCRE_DOLLAR_ENDONLY   $ matches only at end       N/A
          PCRE_EXTRA            strict escape parsing       N/A
          PCRE_EXTENDED         ignore whitespace           /x
          PCRE_UTF8             handles UTF8 chars          built-in
          PCRE_UNGREEDY         reverses * and *?           N/A
          PCRE_NO_AUTO_CAPTURE  disables capturing parens   N/A (*)

       (*) Both Perl and PCRE allow non-capturing parentheses by means of the
       "?:" modifier within the pattern itself. For example, (?:ab|cd) does
       not capture, while (ab|cd) does.

       For a full account of how each modifier works, please check the PCRE
       API reference page.

       For each modifier, there are two member functions whose names are
       derived from the modifier name in lowercase, without the "PCRE_"
       prefix. For instance, PCRE_CASELESS is handled by

         bool caseless()

       which returns true if the modifier is set, and

         RE_Options & set_caseless(bool)

       which sets or unsets the modifier. Moreover, PCRE_EXTRA_MATCH_LIMIT can
       be accessed through the set_match_limit() and match_limit() member
       functions. Setting match_limit to a non-zero value limits the
       execution of pcre to keep it from doing bad things like blowing the
       stack or taking an eternity to return a result. A value of 5000 is good
       enough to stop stack blowup in a 2MB thread stack. Setting match_limit
       to zero disables match limiting. Alternatively, you can call
       set_match_limit_recursion(), which uses
       PCRE_EXTRA_MATCH_LIMIT_RECURSION to limit how much PCRE recurses.
       match_limit() limits the number of matches PCRE does;
       match_limit_recursion() limits the depth of internal recursion, and
       therefore the amount of stack that is used.

       Normally, to pass one or more modifiers to an RE object, you declare
       an RE_Options object, set the appropriate options, and pass this
       object to an RE constructor. Example:

          RE_Options opt;
          opt.set_caseless(true);
          if (RE("HELLO", opt).PartialMatch("hello world")) ...

       RE_Options has two constructors. The default constructor takes no
       arguments and creates a set of flags that are all off. The other takes
       an option_flags parameter, which is intended to ease the transfer of
       legacy code from C programs. This lets you do

          RE(pattern,
            RE_Options(PCRE_CASELESS|PCRE_MULTILINE)).PartialMatch(str);

       However, new code is better off doing

          RE(pattern,
            RE_Options().set_caseless(true).set_multiline(true))
              .PartialMatch(str);

       If you are going to pass one of the most used modifiers, there are some
       convenience functions that return an RE_Options object with the
       appropriate modifier already set: CASELESS(), UTF8(), MULTILINE(),
       DOTALL(), and EXTENDED().

       If you need to set several options at once, and you don't want to go
       through the pains of declaring an RE_Options object and setting
       several options, there is a parallel method that gives you this
       ability on the fly. You can chain several set_xxxxx() member
       functions, since each of them returns a reference to its class object.
       For example, to pass PCRE_CASELESS, PCRE_EXTENDED, and PCRE_MULTILINE
       to an RE with one statement, you may write:

          RE(" ^ xyz \\s+ .* blah$",
            RE_Options()
              .set_caseless(true)
              .set_extended(true)
              .set_multiline(true)).PartialMatch(sometext);

SCANNING TEXT INCREMENTALLY

       The "Consume" operation may be useful if you want to repeatedly match
       regular expressions at the front of a string and skip over them as they
       match. This requires use of the "StringPiece" type, which represents a
       sub-range of a real string. Like RE, StringPiece is defined in the
       pcrecpp namespace.

         Example: read lines of the form "var = value" from a string.
            string contents = ...;                 // Fill string somehow
            pcrecpp::StringPiece input(contents);  // Wrap in a StringPiece

            string var;
            int value;
            pcrecpp::RE re("(\\w+) = (\\d+)\n");
            while (re.Consume(&input, &var, &value)) {
              ...;
            }

       Each successful call to "Consume" will set "var" and "value", and also
       advance "input" so it points past the matched text.

       The "FindAndConsume" operation is similar to "Consume" but does not
       anchor your match at the beginning of the string. For example, you
       could extract all words from a string by repeatedly calling

         pcrecpp::RE("(\\w+)").FindAndConsume(&input, &word)
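
       The anchored, advance-as-you-go behaviour of "Consume" can be
       imitated with the standard library. The sketch below is an analogue
       built on std::regex and the match_continuous flag, not pcrecpp code;
       ParseAssignments is a hypothetical name, and an iterator plays the
       role that the StringPiece plays above.

```cpp
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical analogue of a Consume loop: repeatedly match "var = value\n"
// at the current position and step past each match.
std::vector<std::pair<std::string, int>> ParseAssignments(
    const std::string& contents) {
  static const std::regex re("(\\w+) = (\\d+)\n");
  std::vector<std::pair<std::string, int>> result;
  auto position = contents.cbegin();
  std::smatch match;
  // match_continuous anchors the match at 'position', as Consume anchors
  // it at the front of the remaining input.
  while (std::regex_search(position, contents.cend(), match, re,
                           std::regex_constants::match_continuous)) {
    // std::stoi throws if the digits do not fit in an int.
    result.emplace_back(match[1].str(), std::stoi(match[2].str()));
    position = match[0].second;  // advance past the matched text
  }
  return result;
}
```

       The loop stops at the first position where the pattern no longer
       matches, leaving any trailing text unread. Dropping the
       match_continuous flag would give the unanchored behaviour described
       for "FindAndConsume".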

PARSING HEX/OCTAL/C-RADIX NUMBERS

       By default, if you pass a pointer to a numeric value, the corresponding
       text is interpreted as a base-10 number. You can instead wrap the
       pointer with a call to one of the operators Hex(), Octal(), or CRadix()
       to interpret the text in another base. The CRadix operator interprets
       C-style "0" (base-8) and "0x" (base-16) prefixes, but defaults to
       base-10.

         Example:
           int a, b, c, d;
           pcrecpp::RE re("(.*) (.*) (.*) (.*)");
           re.FullMatch("100 40 0100 0x40",
                        pcrecpp::Octal(&a), pcrecpp::Hex(&b),
                        pcrecpp::CRadix(&c), pcrecpp::CRadix(&d));

       will leave 64 in a, b, c, and d.
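
       The arithmetic behind that result can be checked with std::strtol,
       whose base argument follows the same conventions: 8 for octal, 16 for
       hex, and 0 for C-style prefix detection. This is only an illustration
       of the number bases; ParseInBase is a hypothetical helper, and the
       wrapper's own conversion code is not shown here.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: parses text in the given base with std::strtol.
// Base 0 applies C rules: "0x" means hex, a leading "0" means octal, and
// anything else is decimal, which mirrors the CRadix description.
long ParseInBase(const std::string& text, int base) {
  return std::strtol(text.c_str(), nullptr, base);
}
```

       ParseInBase("100", 8), ParseInBase("40", 16), ParseInBase("0100", 0),
       and ParseInBase("0x40", 0) all yield 64.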

REPLACING PARTS OF STRINGS

       You can replace the first match of "pattern" in "str" with "rewrite".
       Within "rewrite", backslash-escaped digits (\1 to \9) can be used to
       insert the text matching the corresponding parenthesized group from
       the pattern. \0 in "rewrite" refers to the entire matching text. For
       example:

         string s = "yabba dabba doo";
         pcrecpp::RE("b+").Replace("d", &s);

       will leave "s" containing "yada dabba doo". The result is true if the
       pattern matches and a replacement occurs, false otherwise.

       GlobalReplace is like Replace except that it replaces all occurrences
       of the pattern in the string with the rewrite. Replacements are not
       subject to re-matching. For example:

         string s = "yabba dabba doo";
         pcrecpp::RE("b+").GlobalReplace("d", &s);

       will leave "s" containing "yada dada doo". It returns the number of
       replacements made.

       Extract is like Replace, except that if the pattern matches, "rewrite"
       is copied into "out" (an additional argument) with substitutions. The
       non-matching portions of "text" are ignored. It returns true if and
       only if a match occurred and the extraction happened successfully; if
       no match occurs, the string is left unaffected.
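
       The first-match versus all-matches distinction can be reproduced with
       std::regex_replace. This is a standard-library analogue rather than
       pcrecpp code: ReplaceFirstSketch and ReplaceAllSketch are hypothetical
       names, they return a new string instead of modifying one in place,
       and std::regex writes group references as $1 rather than \1.

```cpp
#include <regex>
#include <string>

// Replaces only the first match, like the Replace behaviour described above.
std::string ReplaceFirstSketch(const std::string& text,
                               const std::string& pattern,
                               const std::string& rewrite) {
  return std::regex_replace(text, std::regex(pattern), rewrite,
                            std::regex_constants::format_first_only);
}

// Replaces every non-overlapping match, like GlobalReplace.
std::string ReplaceAllSketch(const std::string& text,
                             const std::string& pattern,
                             const std::string& rewrite) {
  return std::regex_replace(text, std::regex(pattern), rewrite);
}
```

       Applied to "yabba dabba doo" with the pattern "b+" and the rewrite
       "d", the first function gives "yada dabba doo" and the second gives
       "yada dada doo", the same strings as in the examples above.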

AUTHOR

       The C++ wrapper was contributed by Google Inc.
       Copyright (c) 2007 Google Inc.

REVISION

       Last updated: 17 March 2009

                                                                    PCRECPP(3)