PCREAPI(3)                 Library Functions Manual                 PCREAPI(3)

NAME

       PCRE - Perl-compatible regular expressions

PCRE NATIVE API

       #include <pcre.h>

       pcre *pcre_compile(const char *pattern, int options,
            const char **errptr, int *erroffset,
            const unsigned char *tableptr);

       pcre *pcre_compile2(const char *pattern, int options,
            int *errorcodeptr,
            const char **errptr, int *erroffset,
            const unsigned char *tableptr);

       pcre_extra *pcre_study(const pcre *code, int options,
            const char **errptr);

       int pcre_exec(const pcre *code, const pcre_extra *extra,
            const char *subject, int length, int startoffset,
            int options, int *ovector, int ovecsize);

       int pcre_dfa_exec(const pcre *code, const pcre_extra *extra,
            const char *subject, int length, int startoffset,
            int options, int *ovector, int ovecsize,
            int *workspace, int wscount);

       int pcre_copy_named_substring(const pcre *code,
            const char *subject, int *ovector,
            int stringcount, const char *stringname,
            char *buffer, int buffersize);

       int pcre_copy_substring(const char *subject, int *ovector,
            int stringcount, int stringnumber, char *buffer,
            int buffersize);

       int pcre_get_named_substring(const pcre *code,
            const char *subject, int *ovector,
            int stringcount, const char *stringname,
            const char **stringptr);

       int pcre_get_stringnumber(const pcre *code,
            const char *name);

       int pcre_get_stringtable_entries(const pcre *code,
            const char *name, char **first, char **last);

       int pcre_get_substring(const char *subject, int *ovector,
            int stringcount, int stringnumber,
            const char **stringptr);

       int pcre_get_substring_list(const char *subject,
            int *ovector, int stringcount, const char ***listptr);

       void pcre_free_substring(const char *stringptr);

       void pcre_free_substring_list(const char **stringptr);

       const unsigned char *pcre_maketables(void);

       int pcre_fullinfo(const pcre *code, const pcre_extra *extra,
            int what, void *where);

       int pcre_info(const pcre *code, int *optptr, int *firstcharptr);

       int pcre_refcount(pcre *code, int adjust);

       int pcre_config(int what, void *where);

       char *pcre_version(void);

       void *(*pcre_malloc)(size_t);

       void (*pcre_free)(void *);

       void *(*pcre_stack_malloc)(size_t);

       void (*pcre_stack_free)(void *);

       int (*pcre_callout)(pcre_callout_block *);

PCRE API OVERVIEW

       PCRE has its own native API, which is described in this document. There
       are also some wrapper functions that correspond to  the  POSIX  regular
       expression  API.  These  are  described in the pcreposix documentation.
       Both of these APIs define a set of C function calls. A C++  wrapper  is
       distributed with PCRE. It is documented in the pcrecpp page.

       The  native  API  C  function prototypes are defined in the header file
       pcre.h, and on Unix systems the library itself is called  libpcre.   It
       can normally be accessed by adding -lpcre to the command for linking an
       application  that  uses  PCRE.  The  header  file  defines  the  macros
       PCRE_MAJOR  and  PCRE_MINOR to contain the major and minor release num‐
       bers for the library.  Applications can use these  to  include  support
       for different releases of PCRE.
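
       As a rough illustration of the release macros, the sketch below gates a
       code path on the header's release numbers. The stand-in values given to
       PCRE_MAJOR and PCRE_MINOR, the PCRE_AT_LEAST macro, and the function
       name are all hypothetical; in a real program the two release macros
       come from pcre.h and the fallback definitions are never used.

```c
/* Stand-in values so that this sketch compiles without PCRE installed.
   They are hypothetical; real code gets both macros from <pcre.h>. */
#ifndef PCRE_MAJOR
#define PCRE_MAJOR 8
#define PCRE_MINOR 10
#endif

/* Hypothetical helper: true if the header describes release maj.min
   or a later one. */
#define PCRE_AT_LEAST(maj, min) \
  (PCRE_MAJOR > (maj) || (PCRE_MAJOR == (maj) && PCRE_MINOR >= (min)))

/* Returns 1 if the code path that needs release 7.7 or later was
   compiled in, and 0 otherwise. The threshold is only an example. */
int have_newer_api(void)
{
#if PCRE_AT_LEAST(7, 7)
  return 1;
#else
  return 0;
#endif
}
```

       Because the comparison is made by the preprocessor, code written for a
       newer release is left out entirely when building against an older
       header.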

       In a Windows environment, if you want to statically link an application
       program against a non-dll pcre.a  file,  you  must  define  PCRE_STATIC
       before  including  pcre.h or pcrecpp.h, because otherwise the pcre_mal‐
       loc()   and   pcre_free()   exported   functions   will   be   declared
       __declspec(dllimport), with unwanted results.

       The   functions   pcre_compile(),  pcre_compile2(),  pcre_study(),  and
       pcre_exec() are used for compiling and matching regular expressions  in
       a  Perl-compatible  manner. A sample program that demonstrates the sim‐
       plest way of using them is provided in the file  called  pcredemo.c  in
       the PCRE source distribution. A listing of this program is given in the
       pcredemo documentation, and the pcresample documentation describes  how
       to compile and run it.

       A second matching function, pcre_dfa_exec(), which is not Perl-compati‐
       ble, is also provided. This uses a different algorithm for  the  match‐
       ing.  The  alternative algorithm finds all possible matches (at a given
       point in the subject), and scans the subject just  once  (unless  there
       are  lookbehind  assertions).  However,  this algorithm does not return
       captured substrings. A description of the two matching  algorithms  and
       their  advantages  and disadvantages is given in the pcrematching docu‐
       mentation.

       In addition to the main compiling and  matching  functions,  there  are
       convenience functions for extracting captured substrings from a subject
       string that is matched by pcre_exec(). They are:

         pcre_copy_substring()
         pcre_copy_named_substring()
         pcre_get_substring()
         pcre_get_named_substring()
         pcre_get_substring_list()
         pcre_get_stringnumber()
         pcre_get_stringtable_entries()

       pcre_free_substring() and pcre_free_substring_list() are also provided,
       to free the memory used for extracted strings.

       The  function  pcre_maketables()  is  used  to build a set of character
       tables  in  the  current  locale   for   passing   to   pcre_compile(),
       pcre_exec(),  or  pcre_dfa_exec(). This is an optional facility that is
       provided for specialist use.  Most  commonly,  no  special  tables  are
       passed,  in  which case internal tables that are generated when PCRE is
       built are used.

       The function pcre_fullinfo() is used to find out  information  about  a
       compiled  pattern; pcre_info() is an obsolete version that returns only
       some of the available information, but is retained for  backwards  com‐
       patibility.   The function pcre_version() returns a pointer to a string
       containing the version of PCRE and its date of release.

       The function pcre_refcount() maintains a  reference  count  in  a  data
       block  containing  a compiled pattern. This is provided for the benefit
       of object-oriented applications.

       The global variables pcre_malloc and pcre_free  initially  contain  the
       entry  points  of  the  standard malloc() and free() functions, respec‐
       tively. PCRE calls the memory management functions via these variables,
       so  a  calling  program  can replace them if it wishes to intercept the
       calls. This should be done before calling any PCRE functions.
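
       One possible use of this indirection is to keep track of how many
       blocks are outstanding. The following is a minimal sketch, not part of
       PCRE: the function names and the counter are hypothetical, and the
       counter is not protected against concurrent use by several threads.
       The replacement functions only need to have the same signatures as the
       two variables.

```c
#include <stdlib.h>

static size_t live_blocks = 0;   /* blocks obtained and not yet freed */

/* Same signature as the pcre_malloc variable: void *(*)(size_t). */
void *counting_malloc(size_t size)
{
  void *p = malloc(size);
  if (p != NULL) live_blocks++;
  return p;
}

/* Same signature as the pcre_free variable: void (*)(void *). */
void counting_free(void *p)
{
  if (p != NULL) live_blocks--;
  free(p);
}

/* Reports how many blocks are currently outstanding. */
size_t counting_live_blocks(void)
{
  return live_blocks;
}

/* With <pcre.h> included, the functions would be installed once at
   start-up, before any other PCRE call is made:

     pcre_malloc = counting_malloc;
     pcre_free   = counting_free;
*/
```

       A non-zero count at program exit would then point to a compiled pattern
       or other PCRE block that was never passed back to pcre_free.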

       The global variables pcre_stack_malloc  and  pcre_stack_free  are  also
       indirections  to  memory  management functions. These special functions
       are used only when PCRE is compiled to use  the  heap  for  remembering
       data, instead of recursive function calls, when running the pcre_exec()
       function. See the pcrebuild documentation for  details  of  how  to  do
       this.  It  is  a non-standard way of building PCRE, for use in environ‐
       ments that have limited stacks. Because of the greater  use  of  memory
       management,  it  runs  more  slowly. Separate functions are provided so
       that special-purpose external code can be  used  for  this  case.  When
       used,  these  functions  are always called in a stack-like manner (last
       obtained, first freed), and always for memory blocks of the same  size.
       There  is  a discussion about PCRE's stack usage in the pcrestack docu‐
       mentation.
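
       The stack-like, same-size calling pattern makes a very simple cache of
       freed blocks possible. The sketch below is one way such special-purpose
       code might look; it is an illustration under stated assumptions, not
       code from PCRE. All names are hypothetical, the cache is never released,
       and nothing here is thread-safe. Each block carries a small header that
       records its size, so a block of an unexpected size is simply handed
       back to free() instead of being cached.

```c
#include <stdlib.h>

/* Header placed in front of every block. The extra union members only
   force an alignment that is suitable for the payload that follows. */
typedef union block_header {
  struct { union block_header *next; size_t size; } info;
  long double align_ld;
  long long align_ll;
  void *align_ptr;
} block_header;

static block_header *cache_head = NULL;   /* freed blocks awaiting reuse */

/* Same signature as pcre_stack_malloc: void *(*)(size_t). */
void *cached_stack_malloc(size_t size)
{
  block_header *b = cache_head;
  if (b != NULL && b->info.size == size) {
    cache_head = b->info.next;            /* reuse the last freed block */
  } else {
    b = malloc(sizeof(block_header) + size);
    if (b == NULL) return NULL;
    b->info.size = size;
  }
  b->info.next = NULL;
  return b + 1;                           /* payload follows the header */
}

/* Same signature as pcre_stack_free: void (*)(void *). */
void cached_stack_free(void *p)
{
  block_header *b;
  if (p == NULL) return;
  b = (block_header *)p - 1;
  if (cache_head == NULL || cache_head->info.size == b->info.size) {
    b->info.next = cache_head;            /* keep the block for reuse */
    cache_head = b;
  } else {
    free(b);                              /* unexpected size: do not cache */
  }
}
```

       With pcre.h included, the two functions would be assigned to
       pcre_stack_malloc and pcre_stack_free before the first call to
       pcre_exec().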

       The global variable pcre_callout initially contains NULL. It can be set
       by  the  caller  to  a "callout" function, which PCRE will then call at
       specified points during a matching operation. Details are given in  the
       pcrecallout documentation.

NEWLINES

       PCRE  supports five different conventions for indicating line breaks in
       strings: a single CR (carriage return) character, a  single  LF  (line‐
       feed) character, the two-character sequence CRLF, any of the three pre‐
       ceding, or any Unicode newline sequence. The Unicode newline  sequences
       are  the  three just mentioned, plus the single characters VT (vertical
       tab, U+000B), FF (formfeed, U+000C), NEL (next line, U+0085), LS  (line
       separator, U+2028), and PS (paragraph separator, U+2029).

       Each  of  the first three conventions is used by at least one operating
       system as its standard newline sequence. When PCRE is built, a  default
       can be specified. If none is specified, the default is LF, which is the
       Unix standard. When PCRE is run, the default can be overridden,  either
       when a pattern is compiled, or when it is matched.

       At compile time, the newline convention can be specified by the options
       argument of pcre_compile(), or it can be specified by special  text  at
       the start of the pattern itself; this overrides any other settings. See
       the pcrepattern page for details of the special character sequences.

       In the PCRE documentation the word "newline" is used to mean "the char‐
       acter  or pair of characters that indicate a line break". The choice of
       newline convention affects the handling of  the  dot,  circumflex,  and
       dollar metacharacters, the handling of #-comments in /x mode, and, when
       CRLF is a recognized line ending sequence, the match position  advance‐
       ment for a non-anchored pattern. There is more detail about this in the
       section on pcre_exec() options below.

       The choice of newline convention does not affect the interpretation  of
       the  \n  or  \r  escape  sequences, nor does it affect what \R matches,
       which is controlled in a similar way, but by separate options.

MULTITHREADING

       The PCRE functions can be used in  multithreaded  applications,   with
       the  proviso  that  the  memory  management  functions  pointed  to  by
       pcre_malloc, pcre_free, pcre_stack_malloc, and pcre_stack_free, and the
       callout function pointed to by pcre_callout, are shared by all threads.

       The  compiled form of a regular expression is not altered during match‐
       ing, so the same compiled pattern can safely be used by several threads
       at once.

SAVING PRECOMPILED PATTERNS FOR LATER USE

       The compiled form of a regular expression can be saved and re-used at a
       later time, possibly by a different program, and even on a  host  other
       than  the  one  on  which  it  was  compiled.  Details are given in the
       pcreprecompile documentation. However, compiling a  regular  expression
       with  one version of PCRE for use with a different version is not guar‐
       anteed to work and may cause crashes.

CHECKING BUILD-TIME OPTIONS

       int pcre_config(int what, void *where);

       The function pcre_config() makes it possible for a PCRE client to  dis‐
       cover which optional features have been compiled into the PCRE library.
       The pcrebuild documentation has more details about these optional  fea‐
       tures.

       The  first  argument  for pcre_config() is an integer, specifying which
       information is required; the second argument is a pointer to a variable
       into  which  the  information  is  placed. The following information is
       available:

         PCRE_CONFIG_UTF8

       The output is an integer that is set to one if UTF-8 support is  avail‐
       able; otherwise it is set to zero.

         PCRE_CONFIG_UNICODE_PROPERTIES

       The  output  is  an  integer  that is set to one if support for Unicode
       character properties is available; otherwise it is set to zero.

         PCRE_CONFIG_NEWLINE

       The output is an integer whose value specifies  the  default  character
       sequence  that is recognized as meaning "newline". The five values that
       are supported are: 10 for LF, 13 for CR, 3338 for CRLF, -2 for ANYCRLF,
       and  -1  for  ANY.  Though they are derived from ASCII, the same values
       are returned in EBCDIC environments. The default should normally corre‐
       spond to the standard sequence for your operating system.
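
       The numeric values can be turned back into names with a small lookup.
       This is only a sketch: the function name is hypothetical, and the value
       passed in is assumed to be the integer that pcre_config() stored for
       PCRE_CONFIG_NEWLINE. Note that 3338 is simply the code for CR (13)
       shifted left by eight bits, combined with the code for LF (10).

```c
#include <stddef.h>

/* Maps the integer reported for PCRE_CONFIG_NEWLINE to a readable name.
   Returns NULL for a value that is not one of the five listed ones. */
const char *newline_name(int value)
{
  switch (value) {
    case 10:   return "LF";
    case 13:   return "CR";
    case 3338: return "CRLF";      /* (13 << 8) | 10 */
    case -2:   return "ANYCRLF";
    case -1:   return "ANY";
    default:   return NULL;
  }
}
```

       A caller with pcre.h included would first obtain the value with
       pcre_config(PCRE_CONFIG_NEWLINE, &value), where value is an int, and
       then pass it to the helper.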

         PCRE_CONFIG_BSR

       The output is an integer whose value indicates what character sequences
       the \R escape sequence matches by default. A value of 0 means  that  \R
       matches  any  Unicode  line ending sequence; a value of 1 means that \R
       matches only CR, LF, or CRLF. The default can be overridden when a pat‐
       tern is compiled or matched.

         PCRE_CONFIG_LINK_SIZE

       The  output  is  an  integer that contains the number of bytes used for
       internal linkage in compiled regular expressions. The value is 2, 3, or
       4.  Larger  values  allow larger regular expressions to be compiled, at
       the expense of slower matching. The default value of  2  is  sufficient
       for  all  but  the  most massive patterns, since it allows the compiled
       pattern to be up to 64K in size.
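
       The 64K figure follows from the width of the link field: two bytes can
       hold 2^16 distinct offsets. The sketch below applies the same
       arithmetic to the other link sizes. Treating 2^(8n) as the bound for
       sizes 3 and 4 is an assumption based only on the width of the field,
       not a limit stated in this document, and the function name is
       hypothetical.

```c
/* Number of distinct offsets that fit in a link field of the given
   width in bytes. Returns 0 for a width other than 2, 3, or 4. */
unsigned long long link_size_limit(int link_size)
{
  if (link_size < 2 || link_size > 4) return 0;
  return 1ULL << (8 * link_size);
}
```

       For the default link size of 2 the result is 65536, which is the 64K
       mentioned above.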

         PCRE_CONFIG_POSIX_MALLOC_THRESHOLD

       The output is an integer that contains the threshold  above  which  the
       POSIX  interface  uses malloc() for output vectors. Further details are
       given in the pcreposix documentation.

         PCRE_CONFIG_MATCH_LIMIT

       The output is a long integer that gives the default limit for the  num‐
       ber  of  internal  matching  function calls in a pcre_exec() execution.
       Further details are given with pcre_exec() below.

         PCRE_CONFIG_MATCH_LIMIT_RECURSION

       The output is a long integer that gives the default limit for the depth
       of   recursion  when  calling  the  internal  matching  function  in  a
       pcre_exec() execution.  Further  details  are  given  with  pcre_exec()
       below.

         PCRE_CONFIG_STACKRECURSE

       The  output is an integer that is set to one if internal recursion when
       running pcre_exec() is implemented by recursive function calls that use
       the  stack  to remember their state. This is the usual way that PCRE is
       compiled. The output is zero if PCRE was compiled to use blocks of data
       on  the  heap  instead  of  recursive  function  calls.  In  this case,
       pcre_stack_malloc and  pcre_stack_free  are  called  to  manage  memory
       blocks on the heap, thus avoiding the use of the stack.

COMPILING A PATTERN

       pcre *pcre_compile(const char *pattern, int options,
            const char **errptr, int *erroffset,
            const unsigned char *tableptr);

       pcre *pcre_compile2(const char *pattern, int options,
            int *errorcodeptr,
            const char **errptr, int *erroffset,
            const unsigned char *tableptr);

       Either of the functions pcre_compile() or pcre_compile2() can be called
       to compile a pattern into an internal form. The only difference between
       the  two interfaces is that pcre_compile2() has an additional argument,
       errorcodeptr, via which a numerical error  code  can  be  returned.  To
       avoid  too  much repetition, we refer just to pcre_compile() below, but
       the information applies equally to pcre_compile2().

       The pattern is a C string terminated by a binary zero, and is passed in
       the  pattern  argument.  A  pointer to a single block of memory that is
       obtained via pcre_malloc is returned. This contains the  compiled  code
       and related data. The pcre type is defined for the returned block; this
       is a typedef for a structure whose contents are not externally defined.
       It is up to the caller to free the memory (via pcre_free) when it is no
       longer required.

       Although the compiled code of a PCRE regex is relocatable, that is,  it
       does not depend on memory location, the complete pcre data block is not
       fully relocatable, because it may contain a copy of the tableptr  argu‐
       ment, which is an address (see below).

       The options argument contains various bit settings that affect the com‐
       pilation. It should be zero if no options are required.  The  available
       options  are  described  below. Some of them (in particular, those that
       are compatible with Perl, but some others as well) can also be set  and
       unset  from  within  the  pattern  (see the detailed description in the
       pcrepattern documentation). For those options that can be different  in
       different  parts  of  the pattern, the contents of the options argument
       specify  their settings at the start of compilation and execution.  The
       PCRE_ANCHORED, PCRE_BSR_xxx, and PCRE_NEWLINE_xxx options can be set at
       the time of matching as well as at compile time.

       If errptr is NULL, pcre_compile() returns NULL immediately.  Otherwise,
       if  compilation  of  a  pattern fails, pcre_compile() returns NULL, and
       sets the variable pointed to by errptr to point to a textual error mes‐
       sage. This is a static string that is part of the library. You must not
       try to free it. The byte offset from the start of the  pattern  to  the
       character  that  was  being  processed when the error was discovered is
       placed in the variable pointed to by erroffset, which must not be NULL.
       If  it  is,  an  immediate error is given. Some errors are not detected
       until checks are carried out when the whole pattern has  been  scanned;
       in this case the offset is set to the end of the pattern.

       If  pcre_compile2()  is  used instead of pcre_compile(), and the error‐
       codeptr argument is not NULL, a non-zero error code number is  returned
       via  this argument in the event of an error. This is in addition to the
       textual error message. Error codes and messages are listed below.

       If the final argument, tableptr, is NULL, PCRE uses a  default  set  of
       character  tables  that  are  built  when  PCRE  is compiled, using the
       default C locale. Otherwise, tableptr must be an address  that  is  the
       result  of  a  call to pcre_maketables(). This value is stored with the
       compiled pattern, and used again by pcre_exec(), unless  another  table
       pointer is passed to it. For more discussion, see the section on locale
       support below.

       This code fragment shows a typical straightforward  call  to  pcre_com‐
       pile():

         pcre *re;
         const char *error;
         int erroffset;
         re = pcre_compile(
           "^A.*Z",          /* the pattern */
           0,                /* default options */
           &error,           /* for error message */
           &erroffset,       /* for error offset */
           NULL);            /* use default character tables */
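
       When the call returns NULL, the message and the offset can be combined
       into a diagnostic that points at the offending position. The helper
       below is a hypothetical sketch that does not call PCRE itself: it takes
       the pattern together with the two values that pcre_compile() stored via
       errptr and erroffset. The offset is a byte offset, so in UTF-8 mode the
       marker may not line up with the displayed character.

```c
#include <stdio.h>
#include <string.h>

/* Formats a three-line diagnostic into buf: the message, the pattern,
   and a caret under the byte at which the error was detected. Returns
   the value of snprintf(). */
int format_compile_error(char *buf, size_t bufsize, const char *pattern,
                         const char *message, int erroffset)
{
  int len = (int)strlen(pattern);

  /* Keep the caret within the pattern, or just past its end. */
  if (erroffset < 0) erroffset = 0;
  if (erroffset > len) erroffset = len;

  return snprintf(buf, bufsize, "error at offset %d: %s\n  %s\n  %*s^\n",
                  erroffset, message, pattern, erroffset, "");
}
```

       After a failed call like the one in the fragment above, a program might
       pass its pattern, error, and erroffset to this helper and write the
       buffer to stderr.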

       The  following  names  for option bits are defined in the pcre.h header
       file:

         PCRE_ANCHORED

       If this bit is set, the pattern is forced to be "anchored", that is, it
       is  constrained to match only at the first matching point in the string
       that is being searched (the "subject string"). This effect can also  be
       achieved  by appropriate constructs in the pattern itself, which is the
       only way to do it in Perl.

         PCRE_AUTO_CALLOUT

       If this bit is set, pcre_compile() automatically inserts callout items,
       all  with  number  255, before each pattern item. For discussion of the
       callout facility, see the pcrecallout documentation.

         PCRE_BSR_ANYCRLF
         PCRE_BSR_UNICODE

       These options (which are mutually exclusive) control what the \R escape
       sequence  matches.  The choice is either to match only CR, LF, or CRLF,
       or to match any Unicode newline sequence. The default is specified when
       PCRE is built. It can be overridden from within the pattern, or by set‐
       ting an option when a compiled pattern is matched.

         PCRE_CASELESS

       If this bit is set, letters in the pattern match both upper  and  lower
       case  letters.  It  is  equivalent  to  Perl's /i option, and it can be
       changed within a pattern by a (?i) option setting. In UTF-8 mode,  PCRE
       always  understands the concept of case for characters whose values are
       less than 128, so caseless matching is always possible. For  characters
       with  higher  values,  the concept of case is supported if PCRE is com‐
       piled with Unicode property support, but not otherwise. If you want  to
       use  caseless  matching  for  characters 128 and above, you must ensure
       that PCRE is compiled with Unicode property support  as  well  as  with
       UTF-8 support.

         PCRE_DOLLAR_ENDONLY

       If  this bit is set, a dollar metacharacter in the pattern matches only
       at the end of the subject string. Without this option,  a  dollar  also
       matches  immediately before a newline at the end of the string (but not
       before any other newlines). The PCRE_DOLLAR_ENDONLY option  is  ignored
       if  PCRE_MULTILINE  is  set.   There is no equivalent to this option in
       Perl, and no way to set it within a pattern.

         PCRE_DOTALL

       If this bit is set, a dot metacharacter in the pattern matches all char‐
       acters,  including  those that indicate newline. Without it, a dot does
       not match when the current position is at a  newline.  This  option  is
       equivalent  to Perl's /s option, and it can be changed within a pattern
       by a (?s) option setting. A negative class such as [^a] always  matches
       newline characters, independent of the setting of this option.

         PCRE_DUPNAMES

       If  this  bit is set, names used to identify capturing subpatterns need
       not be unique. This can be helpful for certain types of pattern when it
       is  known  that  only  one instance of the named subpattern can ever be
       matched. There are more details of named subpatterns  below;  see  also
       the pcrepattern documentation.

         PCRE_EXTENDED

       If  this  bit  is  set,  whitespace  data characters in the pattern are
       totally ignored except when escaped or inside a character class. White‐
       space does not include the VT character (code 11). In addition, charac‐
       ters between an unescaped # outside a character class and the next new‐
       line,  inclusive,  are  also  ignored.  This is equivalent to Perl's /x
       option, and it can be changed within a pattern by a  (?x)  option  set‐
       ting.

       This  option  makes  it possible to include comments inside complicated
       patterns.  Note, however, that this applies only  to  data  characters.
       Whitespace   characters  may  never  appear  within  special  character
       sequences in a pattern, for  example  within  the  sequence  (?(  which
       introduces a conditional subpattern.

         PCRE_EXTRA

       This  option  was invented in order to turn on additional functionality
       of PCRE that is incompatible with Perl, but it  is  currently  of  very
       little  use. When set, any backslash in a pattern that is followed by a
       letter that has no special meaning  causes  an  error,  thus  reserving
       these  combinations  for  future  expansion.  By default, as in Perl, a
       backslash followed by a letter with no special meaning is treated as  a
       literal. (Perl can, however, be persuaded to give an error for this, by
       running it with the -w option.) There are at present no other  features
       controlled  by this option. It can also be set by a (?X) option setting
       within a pattern.

         PCRE_FIRSTLINE

       If this option is set, an  unanchored  pattern  is  required  to  match
       before  or  at  the  first  newline  in  the subject string, though the
       matched text may continue over the newline.

         PCRE_JAVASCRIPT_COMPAT

       If this option is set, PCRE's behaviour is changed in some ways so that
       it  is  compatible with JavaScript rather than Perl. The changes are as
       follows:

       (1) A lone closing square bracket in a pattern  causes  a  compile-time
       error,  because this is illegal in JavaScript (by default it is treated
       as a data character). Thus, the pattern AB]CD becomes illegal when this
       option is set.

       (2)  At run time, a back reference to an unset subpattern group matches
       an empty string (by default this causes the current  matching  alterna‐
       tive  to  fail). A pattern such as (\1)(a) succeeds when this option is
       set (assuming it can find an "a" in the subject), whereas it  fails  by
       default, for Perl compatibility.

         PCRE_MULTILINE

       By  default,  PCRE  treats the subject string as consisting of a single
       line of characters (even if it actually contains newlines). The  "start
       of  line"  metacharacter  (^)  matches only at the start of the string,
       while the "end of line" metacharacter ($) matches only at  the  end  of
       the string, or before a terminating newline (unless PCRE_DOLLAR_ENDONLY
       is set). This is the same as Perl.

       When PCRE_MULTILINE is set, the "start of line"  and  "end  of  line"
       constructs  match  immediately following or immediately before internal
       newlines in the subject string, respectively, as well as  at  the  very
       start  and  end.  This is equivalent to Perl's /m option, and it can be
       changed within a pattern by a (?m) option setting. If there are no new‐
       lines  in  a  subject string, or no occurrences of ^ or $ in a pattern,
       setting PCRE_MULTILINE has no effect.

         PCRE_NEWLINE_CR
         PCRE_NEWLINE_LF
         PCRE_NEWLINE_CRLF
         PCRE_NEWLINE_ANYCRLF
         PCRE_NEWLINE_ANY

       These options override the default newline definition that  was  chosen
       when  PCRE  was built. Setting the first or the second specifies that a
       newline is indicated by a single character (CR  or  LF,  respectively).
       Setting  PCRE_NEWLINE_CRLF specifies that a newline is indicated by the
       two-character CRLF  sequence.  Setting  PCRE_NEWLINE_ANYCRLF  specifies
       that any of the three preceding sequences should be recognized. Setting
       PCRE_NEWLINE_ANY specifies that any Unicode newline sequence should  be
       recognized. The Unicode newline sequences are the three just mentioned,
       plus the single characters VT (vertical  tab,  U+000B),  FF  (formfeed,
       U+000C),  NEL  (next line, U+0085), LS (line separator, U+2028), and PS
       (paragraph separator, U+2029). The last  two  are  recognized  only  in
       UTF-8 mode.

       The  newline  setting  in  the  options  word  uses three bits that are
       treated as a number, giving eight possibilities. Currently only six are
       used  (default  plus the five values above). This means that if you set
       more than one newline option, the combination may or may not be  sensi‐
       ble. For example, PCRE_NEWLINE_CR with PCRE_NEWLINE_LF is equivalent to
       PCRE_NEWLINE_CRLF, but other combinations may yield unused numbers  and
       cause an error.
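
       The effect of combining newline options can be seen by looking at the
       three-bit field directly. In the sketch below the NL_ constants are
       stand-ins whose values are assumed to mirror the PCRE_NEWLINE_xxx
       definitions in the pcre.h of this release series; check them against
       the installed header before relying on them. The mask name and the
       function are hypothetical.

```c
#include <stddef.h>

/* Assumed values of the PCRE_NEWLINE_xxx bits; real code should use
   the macros from <pcre.h> instead of these stand-ins. */
#define NL_CR      0x00100000
#define NL_LF      0x00200000
#define NL_CRLF    0x00300000
#define NL_ANY     0x00400000
#define NL_ANYCRLF 0x00500000
#define NL_MASK    0x00700000   /* the three bits treated as a number */

/* Names the newline setting held in an options word. Returns NULL
   when the three bits form one of the unused numbers. */
const char *newline_option_name(int options)
{
  switch (options & NL_MASK) {
    case 0:          return "default";
    case NL_CR:      return "CR";
    case NL_LF:      return "LF";
    case NL_CRLF:    return "CRLF";
    case NL_ANY:     return "ANY";
    case NL_ANYCRLF: return "ANYCRLF";
    default:         return NULL;
  }
}
```

       Under these assumed values, combining CR with LF gives CRLF as
       described above, combining CR with ANY happens to give ANYCRLF, and
       combining LF with ANY gives a number that is not in use.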

       The  only time that a line break is specially recognized when compiling
       a pattern is if PCRE_EXTENDED is set, and  an  unescaped  #  outside  a
       character  class  is  encountered.  This indicates a comment that lasts
       until after the next line break sequence. In other circumstances,  line
       break   sequences   are   treated  as  literal  data,  except  that  in
       PCRE_EXTENDED mode, both CR and LF are treated as whitespace characters
       and are therefore ignored.

       The newline option that is set at compile time becomes the default that
       is used for pcre_exec() and pcre_dfa_exec(), but it can be overridden.

         PCRE_NO_AUTO_CAPTURE

       If this option is set, it disables the use of numbered capturing paren‐
       theses  in the pattern. Any opening parenthesis that is not followed by
       ? behaves as if it were followed by ?: but named parentheses can  still
       be  used  for  capturing  (and  they acquire numbers in the usual way).
       There is no equivalent of this option in Perl.

         PCRE_UCP

       This option changes the way PCRE processes \b, \d, \s, \w, and some  of
       the POSIX character classes. By default, only ASCII characters are rec‐
       ognized, but if PCRE_UCP is set, Unicode properties are used instead to
       classify  characters.  More details are given in the section on generic
       character types in the pcrepattern page. If you set PCRE_UCP,  matching
       one  of the items it affects takes much longer. The option is available
       only if PCRE has been compiled with Unicode property support.

         PCRE_UNGREEDY

       This option inverts the "greediness" of the quantifiers  so  that  they
       are  not greedy by default, but become greedy if followed by "?". It is
       not compatible with Perl. It can also be set by a (?U)  option  setting
       within the pattern.

         PCRE_UTF8

       This  option  causes PCRE to regard both the pattern and the subject as
       strings of UTF-8 characters instead of single-byte  character  strings.
       However,  it is available only when PCRE is built to include UTF-8 sup‐
       port. If not, the use of this option provokes an error. Details of  how
       this  option  changes the behaviour of PCRE are given in the section on
       UTF-8 support in the main pcre page.

         PCRE_NO_UTF8_CHECK

       When PCRE_UTF8 is set, the validity of the pattern as a UTF-8 string is
       automatically  checked.  There  is  a  discussion about the validity of
       UTF-8 strings in the main pcre page. If an invalid  UTF-8  sequence  of
       bytes  is  found,  pcre_compile() returns an error. If you already know
       that your pattern is valid, and you want to skip this check for perfor‐
       mance  reasons,  you  can set the PCRE_NO_UTF8_CHECK option. When it is
       set, the effect of passing an invalid UTF-8  string  as  a  pattern  is
       undefined.  It  may  cause your program to crash. Note that this option
       can also be passed to pcre_exec() and pcre_dfa_exec(), to suppress  the
       UTF-8 validity checking of subject strings.

COMPILATION ERROR CODES

614
615       The  following  table  lists  the  error  codes that may be returned by
616       pcre_compile2(), along with the error messages that may be returned  by
617       both  compiling functions. As PCRE has developed, some error codes have
618       fallen out of use. To avoid confusion, they have not been re-used.
619
620          0  no error
621          1  \ at end of pattern
622          2  \c at end of pattern
623          3  unrecognized character follows \
624          4  numbers out of order in {} quantifier
625          5  number too big in {} quantifier
626          6  missing terminating ] for character class
627          7  invalid escape sequence in character class
628          8  range out of order in character class
629          9  nothing to repeat
630         10  [this code is not in use]
631         11  internal error: unexpected repeat
632         12  unrecognized character after (? or (?-
633         13  POSIX named classes are supported only within a class
634         14  missing )
635         15  reference to non-existent subpattern
636         16  erroffset passed as NULL
637         17  unknown option bit(s) set
638         18  missing ) after comment
639         19  [this code is not in use]
640         20  regular expression is too large
641         21  failed to get memory
642         22  unmatched parentheses
643         23  internal error: code overflow
644         24  unrecognized character after (?<
645         25  lookbehind assertion is not fixed length
646         26  malformed number or name after (?(
647         27  conditional group contains more than two branches
648         28  assertion expected after (?(
649         29  (?R or (?[+-]digits must be followed by )
650         30  unknown POSIX class name
651         31  POSIX collating elements are not supported
652         32  this version of PCRE is not compiled with PCRE_UTF8 support
653         33  [this code is not in use]
654         34  character value in \x{...} sequence is too large
655         35  invalid condition (?(0)
656         36  \C not allowed in lookbehind assertion
657         37  PCRE does not support \L, \l, \N, \U, or \u
658         38  number after (?C is > 255
659         39  closing ) for (?C expected
660         40  recursive call could loop indefinitely
661         41  unrecognized character after (?P
662         42  syntax error in subpattern name (missing terminator)
663         43  two named subpatterns have the same name
664         44  invalid UTF-8 string
665         45  support for \P, \p, and \X has not been compiled
666         46  malformed \P or \p sequence
667         47  unknown property name after \P or \p
668         48  subpattern name is too long (maximum 32 characters)
669         49  too many named subpatterns (maximum 10000)
670         50  [this code is not in use]
671         51  octal value is greater than \377 (not in UTF-8 mode)
672         52  internal error: overran compiling workspace
673         53  internal error: previously-checked referenced subpattern
674               not found
675         54  DEFINE group contains more than one branch
676         55  repeating a DEFINE group is not allowed
677         56  inconsistent NEWLINE options
678         57  \g is not followed by a braced, angle-bracketed, or quoted
679               name/number or by a plain number
680         58  a numbered reference must not be zero
681         59  an argument is not allowed for (*ACCEPT), (*FAIL), or (*COMMIT)
682         60  (*VERB) not recognized
683         61  number is too big
684         62  subpattern name expected
685         63  digit expected after (?+
686         64  ] is an invalid data character in JavaScript compatibility mode
687         65  different names for subpatterns of the same number are
688               not allowed
689         66  (*MARK) must have an argument
690         67  this version of PCRE is not compiled with PCRE_UCP support
691
692       The numbers 32 and 10000 in errors 48 and 49  are  defaults;  different
693       values may be used if the limits were changed when PCRE was built.
694

STUDYING A PATTERN

696
697       pcre_extra *pcre_study(const pcre *code, int options,
698            const char **errptr);
699
700       If  a  compiled  pattern is going to be used several times, it is worth
701       spending more time analyzing it in order to speed up the time taken for
702       matching.  The function pcre_study() takes a pointer to a compiled pat‐
703       tern as its first argument. If studying the pattern produces additional
704       information  that  will  help speed up matching, pcre_study() returns a
705       pointer to a pcre_extra block, in which the study_data field points  to
706       the results of the study.
707
708       The  returned  value  from  pcre_study()  can  be  passed  directly  to
709       pcre_exec() or pcre_dfa_exec(). However, a pcre_extra block  also  con‐
710       tains  other  fields  that can be set by the caller before the block is
711       passed; these are described below in the section on matching a pattern.
712
713       If studying the  pattern  does  not  produce  any  useful  information,
714       pcre_study() returns NULL. In that circumstance, if the calling program
715       wants  to  pass  any  of   the   other   fields   to   pcre_exec()   or
716       pcre_dfa_exec(), it must set up its own pcre_extra block.
717
718       The  second  argument of pcre_study() contains option bits. At present,
719       no options are defined, and this argument should always be zero.
720
721       The third argument for pcre_study() is a pointer for an error  message.
722       If  studying  succeeds  (even  if no data is returned), the variable it
723       points to is set to NULL. Otherwise it is set to  point  to  a  textual
724       error message. This is a static string that is part of the library. You
725       must not try to free it. You should test the  error  pointer  for  NULL
726       after calling pcre_study(), to be sure that it has run successfully.
727
728       This is a typical call to pcre_study():
729
730         pcre_extra *pe;
731         pe = pcre_study(
732           re,             /* result of pcre_compile() */
733           0,              /* no options exist */
734           &error);        /* set to NULL or points to a message */
735
736       Studying a pattern does two things: first, a lower bound for the length
737       of subject string that is needed to match the pattern is computed. This
738       does not mean that there are any strings of that length that match, but
739       it does guarantee that no shorter strings match. The value is  used  by
740       pcre_exec()  and  pcre_dfa_exec()  to  avoid  wasting time by trying to
741       match strings that are shorter than the lower bound. You can  find  out
742       the value in a calling program via the pcre_fullinfo() function.
743
744       Studying a pattern is also useful for non-anchored patterns that do not
745       have a single fixed starting character. A bitmap of  possible  starting
746       bytes  is  created. This speeds up finding a position in the subject at
747       which to start matching.
748
749       The two optimizations just described can be  disabled  by  setting  the
750       PCRE_NO_START_OPTIMIZE    option    when    calling    pcre_exec()   or
751       pcre_dfa_exec(). You might want to do this  if  your  pattern  contains
752       callouts,  or  make  use of (*MARK), and you make use of these in cases
753       where matching fails.  See  the  discussion  of  PCRE_NO_START_OPTIMIZE
754       below.
755

LOCALE SUPPORT

757
758       PCRE  handles  caseless matching, and determines whether characters are
759       letters, digits, or whatever, by reference to a set of tables,  indexed
760       by  character  value.  When running in UTF-8 mode, this applies only to
761       characters with codes less than 128. By  default,  higher-valued  codes
762       never match escapes such as \w or \d, but they can be tested with \p if
763       PCRE is built with Unicode character property  support.  Alternatively,
764       the  PCRE_UCP  option  can  be  set at compile time; this causes \w and
765       friends to use Unicode property support instead of built-in tables. The
766       use of locales with Unicode is discouraged. If you are handling
767       characters with codes of 128 or greater, you should either use UTF-8
768       and Unicode, or use locales, but not try to mix the two.
769
770       PCRE  contains  an  internal set of tables that are used when the final
771       argument of pcre_compile() is  NULL.  These  are  sufficient  for  many
772       applications.  Normally, the internal tables recognize only ASCII char‐
773       acters. However, when PCRE is built, it is possible to cause the inter‐
774       nal tables to be rebuilt in the default "C" locale of the local system,
775       which may cause them to be different.
776
777       The internal tables can always be overridden by tables supplied by  the
778       application that calls PCRE. These may be created in a different locale
779       from the default. As more and more applications change  to  using  Uni‐
780       code, the need for this locale support is expected to die away.
781
782       External  tables  are  built by calling the pcre_maketables() function,
783       which has no arguments, in the relevant locale. The result can then  be
784       passed  to  pcre_compile()  or  pcre_exec()  as often as necessary. For
785       example, to build and use tables that are appropriate  for  the  French
786       locale  (where  accented  characters  with  values greater than 128 are
787       treated as letters), the following code could be used:
788
789         setlocale(LC_CTYPE, "fr_FR");
790         tables = pcre_maketables();
791         re = pcre_compile(..., tables);
792
793       The locale name "fr_FR" is used on Linux and other  Unix-like  systems;
794       if you are using Windows, the name for the French locale is "french".
795
796       When  pcre_maketables()  runs,  the  tables are built in memory that is
797       obtained via pcre_malloc. It is the caller's responsibility  to  ensure
798       that  the memory containing the tables remains available for as long as
799       it is needed.
800
801       The pointer that is passed to pcre_compile() is saved with the compiled
802       pattern,  and the same tables are used via this pointer by pcre_study()
803       and normally also by pcre_exec(). Thus, by default, for any single pat‐
804       tern, compilation, studying and matching all happen in the same locale,
805       but different patterns can be compiled in different locales.
806
807       It is possible to pass a table pointer or NULL (indicating the  use  of
808       the  internal  tables)  to  pcre_exec(). Although not intended for this
809       purpose, this facility could be used to match a pattern in a  different
810       locale from the one in which it was compiled. Passing table pointers at
811       run time is discussed below in the section on matching a pattern.
812

INFORMATION ABOUT A PATTERN

814
815       int pcre_fullinfo(const pcre *code, const pcre_extra *extra,
816            int what, void *where);
817
818       The pcre_fullinfo() function returns information about a compiled
819       pattern. It replaces the obsolete pcre_info() function, which is
820       nevertheless retained for backwards compatibility (documented below).
821
822       The first argument for pcre_fullinfo() is a  pointer  to  the  compiled
823       pattern.  The second argument is the result of pcre_study(), or NULL if
824       the pattern was not studied. The third argument specifies  which  piece
825       of  information  is required, and the fourth argument is a pointer to a
826       variable to receive the data. The yield of the  function  is  zero  for
827       success, or one of the following negative numbers:
828
829         PCRE_ERROR_NULL       the argument code was NULL
830                               the argument where was NULL
831         PCRE_ERROR_BADMAGIC   the "magic number" was not found
832         PCRE_ERROR_BADOPTION  the value of what was invalid
833
834       The  "magic  number" is placed at the start of each compiled pattern as
835       a simple check against passing an arbitrary memory pointer.  Here is  a
836       typical  call  of pcre_fullinfo(), to obtain the length of the compiled
837       pattern:
838
839         int rc;
840         size_t length;
841         rc = pcre_fullinfo(
842           re,               /* result of pcre_compile() */
843           pe,               /* result of pcre_study(), or NULL */
844           PCRE_INFO_SIZE,   /* what is required */
845           &length);         /* where to put the data */
846
847       The possible values for the third argument are defined in  pcre.h,  and
848       are as follows:
849
850         PCRE_INFO_BACKREFMAX
851
852       Return  the  number  of  the highest back reference in the pattern. The
853       fourth argument should point to an int variable. Zero  is  returned  if
854       there are no back references.
855
856         PCRE_INFO_CAPTURECOUNT
857
858       Return  the  number of capturing subpatterns in the pattern. The fourth
859       argument should point to an int variable.
860
861         PCRE_INFO_DEFAULT_TABLES
862
863       Return a pointer to the internal default character tables within  PCRE.
864       The  fourth  argument should point to an unsigned char * variable. This
865       information call is provided for internal use by the pcre_study() func‐
866       tion.  External  callers  can  cause PCRE to use its internal tables by
867       passing a NULL table pointer.
868
869         PCRE_INFO_FIRSTBYTE
870
871       Return information about the first byte of any matched  string,  for  a
872       non-anchored  pattern. The fourth argument should point to an int vari‐
873       able. (This option used to be called PCRE_INFO_FIRSTCHAR; the old  name
874       is still recognized for backwards compatibility.)
875
876       If  there  is  a  fixed first byte, for example, from a pattern such as
877       (cat|cow|coyote), its value is returned. Otherwise, if either
878
879       (a) the pattern was compiled with the PCRE_MULTILINE option, and  every
880       branch starts with "^", or
881
882       (b) every branch of the pattern starts with ".*" and PCRE_DOTALL is not
883       set (if it were set, the pattern would be anchored),
884
885       -1 is returned, indicating that the pattern matches only at  the  start
886       of  a  subject string or after any newline within the string. Otherwise
887       -2 is returned. For anchored patterns, -2 is returned.
888
889         PCRE_INFO_FIRSTTABLE
890
891       If the pattern was studied, and this resulted in the construction of  a
892       256-bit table indicating a fixed set of bytes for the first byte in any
893       matching string, a pointer to the table is returned. Otherwise NULL  is
894       returned.  The fourth argument should point to an unsigned char * vari‐
895       able.
896
897         PCRE_INFO_HASCRORLF
898
899       Return 1 if the pattern contains any explicit  matches  for  CR  or  LF
900       characters,  otherwise  0.  The  fourth argument should point to an int
901       variable. An explicit match is either a literal CR or LF character,  or
902       \r or \n.
903
904         PCRE_INFO_JCHANGED
905
906       Return  1  if  the (?J) or (?-J) option setting is used in the pattern,
907       otherwise 0. The fourth argument should point to an int variable.  (?J)
908       and (?-J) set and unset the local PCRE_DUPNAMES option, respectively.
909
910         PCRE_INFO_LASTLITERAL
911
912       Return  the  value of the rightmost literal byte that must exist in any
913       matched string, other than at its  start,  if  such  a  byte  has  been
914       recorded. The fourth argument should point to an int variable. If there
915       is no such byte, -1 is returned. For anchored patterns, a last  literal
916       byte  is  recorded only if it follows something of variable length. For
917       example, for the pattern /^a\d+z\d+/ the returned value is "z", but for
918       /^a\dz\d/ the returned value is -1.
919
920         PCRE_INFO_MINLENGTH
921
922       If  the  pattern  was studied and a minimum length for matching subject
923       strings was computed, its value is  returned.  Otherwise  the  returned
924       value  is  -1. The value is a number of characters, not bytes (this may
925       be relevant in UTF-8 mode). The fourth argument should point to an  int
926       variable.  A  non-negative  value is a lower bound to the length of any
927       matching string. There may not be any strings of that  length  that  do
928       actually match, but every string that does match is at least that long.
929
930         PCRE_INFO_NAMECOUNT
931         PCRE_INFO_NAMEENTRYSIZE
932         PCRE_INFO_NAMETABLE
933
934       PCRE  supports the use of named as well as numbered capturing parenthe‐
935       ses. The names are just an additional way of identifying the  parenthe‐
936       ses, which still acquire numbers. Several convenience functions such as
937       pcre_get_named_substring() are provided for  extracting  captured  sub‐
938       strings  by  name. It is also possible to extract the data directly, by
939       first converting the name to a number in order to  access  the  correct
940       pointers in the output vector (described with pcre_exec() below). To do
941       the conversion, you need  to  use  the  name-to-number  map,  which  is
942       described by these three values.
943
944       The map consists of a number of fixed-size entries. PCRE_INFO_NAMECOUNT
945       gives the number of entries, and PCRE_INFO_NAMEENTRYSIZE gives the size
946       of  each  entry;  both  of  these  return  an int value. The entry size
947       depends on the length of the longest name. PCRE_INFO_NAMETABLE  returns
948       a  pointer  to  the  first  entry of the table (a pointer to char). The
949       first two bytes of each entry are the number of the capturing parenthe‐
950       sis,  most  significant byte first. The rest of the entry is the corre‐
951       sponding name, zero terminated.
952
953       The names are in alphabetical order. Duplicate names may appear if  (?|
954       is used to create multiple groups with the same number, as described in
955       the section on duplicate subpattern numbers in  the  pcrepattern  page.
956       Duplicate  names  for  subpatterns with different numbers are permitted
957       only if PCRE_DUPNAMES is set. In all cases  of  duplicate  names,  they
958       appear  in  the table in the order in which they were found in the pat‐
959       tern. In the absence of (?| this is the  order  of  increasing  number;
960       when (?| is used this is not necessarily the case because later subpat‐
961       terns may have lower numbers.
962
963       As a simple example of the name/number table,  consider  the  following
964       pattern  (assume  PCRE_EXTENDED is set, so white space - including new‐
965       lines - is ignored):
966
967         (?<date> (?<year>(\d\d)?\d\d) -
968         (?<month>\d\d) - (?<day>\d\d) )
969
970       There are four named subpatterns, so the table has  four  entries,  and
971       each  entry  in the table is eight bytes long. The table is as follows,
972       with non-printing bytes shown in hexadecimal, and undefined bytes shown
973       as ??:
974
975         00 01 d  a  t  e  00 ??
976         00 05 d  a  y  00 ?? ??
977         00 04 m  o  n  t  h  00
978         00 02 y  e  a  r  00 ??
979
980       When  writing  code  to  extract  data from named subpatterns using the
981       name-to-number map, remember that the length of the entries  is  likely
982       to be different for each compiled pattern.
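
       The layout just described can be decoded with a few lines of C. The
       sketch below assumes that the three values have already been fetched
       with pcre_fullinfo(); the function name group_number_for_name() is
       invented for this example. It uses a linear scan for simplicity,
       although the alphabetical ordering would also allow a binary search.

```c
#include <string.h>

/* Hypothetical helper: look up `name` in a name-to-number table.
   nametable  - value returned for PCRE_INFO_NAMETABLE
   namecount  - value returned for PCRE_INFO_NAMECOUNT
   entrysize  - value returned for PCRE_INFO_NAMEENTRYSIZE
   Returns the group number, or -1 if the name is not in the table.
   With duplicate names, the first entry found is returned. */
int group_number_for_name(const char *nametable, int namecount,
                          int entrysize, const char *name)
{
    int i;

    for (i = 0; i < namecount; i++) {
        const unsigned char *entry =
            (const unsigned char *)nametable + (size_t)i * (size_t)entrysize;

        /* Bytes 0-1: group number, most significant byte first.
           Bytes 2..: the zero-terminated name. */
        if (strcmp((const char *)entry + 2, name) == 0)
            return (entry[0] << 8) | entry[1];
    }
    return -1;
}
```

       Applied to the example table above, looking up "month" gives 4 and
       "day" gives 5. The convenience function pcre_get_stringnumber()
       performs the same conversion without requiring access to the table.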
983
984         PCRE_INFO_OKPARTIAL
985
986       Return  1  if  the  pattern  can  be  used  for  partial  matching with
987       pcre_exec(), otherwise 0. The fourth argument should point  to  an  int
988       variable.  From  release  8.00,  this  always  returns  1,  because the
989       restrictions that previously applied  to  partial  matching  have  been
990       lifted.  The  pcrepartial documentation gives details of partial match‐
991       ing.
992
993         PCRE_INFO_OPTIONS
994
995       Return a copy of the options with which the pattern was  compiled.  The
996       fourth  argument  should  point to an unsigned long int variable. These
997       option bits are those specified in the call to pcre_compile(), modified
998       by any top-level option settings at the start of the pattern itself. In
999       other words, they are the options that will be in force  when  matching
1000       starts.  For  example, if the pattern /(?im)abc(?-i)d/ is compiled with
1001       the PCRE_EXTENDED option, the result is PCRE_CASELESS,  PCRE_MULTILINE,
1002       and PCRE_EXTENDED.
1003
1004       A  pattern  is  automatically  anchored by PCRE if all of its top-level
1005       alternatives begin with one of the following:
1006
1007         ^     unless PCRE_MULTILINE is set
1008         \A    always
1009         \G    always
1010         .*    if PCRE_DOTALL is set and there are no back
1011                 references to the subpattern in which .* appears
1012
1013       For such patterns, the PCRE_ANCHORED bit is set in the options returned
1014       by pcre_fullinfo().
1015
1016         PCRE_INFO_SIZE
1017
1018       Return  the  size  of the compiled pattern, that is, the value that was
1019       passed as the argument to pcre_malloc() when PCRE was getting memory in
1020       which to place the compiled data. The fourth argument should point to a
1021       size_t variable.
1022
1023         PCRE_INFO_STUDYSIZE
1024
1025       Return the size of the data block pointed to by the study_data field in
1026       a  pcre_extra  block.  That  is,  it  is  the  value that was passed to
1027       pcre_malloc() when PCRE was getting memory into which to place the data
1028       created  by  pcre_study().  If pcre_extra is NULL, or there is no study
1029       data, zero is returned. The fourth argument should point  to  a  size_t
1030       variable.
1031

OBSOLETE INFO FUNCTION

1033
1034       int pcre_info(const pcre *code, int *optptr, int *firstcharptr);
1035
1036       The  pcre_info()  function is now obsolete because its interface is too
1037       restrictive to return all the available data about a compiled  pattern.
1038       New   programs   should  use  pcre_fullinfo()  instead.  The  yield  of
1039       pcre_info() is the number of capturing subpatterns, or one of the  fol‐
1040       lowing negative numbers:
1041
1042         PCRE_ERROR_NULL       the argument code was NULL
1043         PCRE_ERROR_BADMAGIC   the "magic number" was not found
1044
1045       If  the  optptr  argument is not NULL, a copy of the options with which
1046       the pattern was compiled is placed in the integer  it  points  to  (see
1047       PCRE_INFO_OPTIONS above).
1048
1049       If  the  pattern  is  not anchored and the firstcharptr argument is not
1050       NULL, it is used to pass back information about the first character  of
1051       any matched string (see PCRE_INFO_FIRSTBYTE above).
1052

REFERENCE COUNTS

1054
1055       int pcre_refcount(pcre *code, int adjust);
1056
1057       The  pcre_refcount()  function is used to maintain a reference count in
1058       the data block that contains a compiled pattern. It is provided for the
1059       benefit  of  applications  that  operate  in an object-oriented manner,
1060       where different parts of the application may be using the same compiled
1061       pattern, but you want to free the block when they are all done.
1062
1063       When a pattern is compiled, the reference count field is initialized to
1064       zero.  It is changed only by calling this function, whose action is  to
1065       add  the  adjust  value  (which may be positive or negative) to it. The
1066       yield of the function is the new value. However, the value of the count
1067       is  constrained to lie between 0 and 65535, inclusive. If the new value
1068       is outside these limits, it is forced to the appropriate limit value.
1069
1070       Except when it is zero, the reference count is not correctly  preserved
1071       if  a  pattern  is  compiled on one host and then transferred to a host
1072       whose byte-order is different. (This seems a highly unlikely scenario.)
1073

MATCHING A PATTERN: THE TRADITIONAL FUNCTION

1075
1076       int pcre_exec(const pcre *code, const pcre_extra *extra,
1077            const char *subject, int length, int startoffset,
1078            int options, int *ovector, int ovecsize);
1079
1080       The function pcre_exec() is called to match a subject string against  a
1081       compiled  pattern, which is passed in the code argument. If the pattern
1082       was studied, the result of the study should  be  passed  in  the  extra
1083       argument.  This  function is the main matching facility of the library,
1084       and it operates in a Perl-like manner. For specialist use there is also
1085       an  alternative matching function, which is described below in the sec‐
1086       tion about the pcre_dfa_exec() function.
1087
1088       In most applications, the pattern will have been compiled (and  option‐
1089       ally  studied)  in the same process that calls pcre_exec(). However, it
1090       is possible to save compiled patterns and study data, and then use them
1091       later  in  different processes, possibly even on different hosts. For a
1092       discussion about this, see the pcreprecompile documentation.
1093
1094       Here is an example of a simple call to pcre_exec():
1095
1096         int rc;
1097         int ovector[30];
1098         rc = pcre_exec(
1099           re,             /* result of pcre_compile() */
1100           NULL,           /* we didn't study the pattern */
1101           "some string",  /* the subject string */
1102           11,             /* the length of the subject string */
1103           0,              /* start at offset 0 in the subject */
1104           0,              /* default options */
1105           ovector,        /* vector of integers for substring information */
1106           30);            /* number of elements (NOT size in bytes) */
1107
1108   Extra data for pcre_exec()
1109
1110       If the extra argument is not NULL, it must point to a  pcre_extra  data
1111       block.  The pcre_study() function returns such a block (when it doesn't
1112       return NULL), but you can also create one for yourself, and pass  addi‐
1113       tional  information  in it. The pcre_extra block contains the following
1114       fields (not necessarily in this order):
1115
1116         unsigned long int flags;
1117         void *study_data;
1118         unsigned long int match_limit;
1119         unsigned long int match_limit_recursion;
1120         void *callout_data;
1121         const unsigned char *tables;
1122         unsigned char **mark;
1123
1124       The flags field is a bitmap that specifies which of  the  other  fields
1125       are set. The flag bits are:
1126
1127         PCRE_EXTRA_STUDY_DATA
1128         PCRE_EXTRA_MATCH_LIMIT
1129         PCRE_EXTRA_MATCH_LIMIT_RECURSION
1130         PCRE_EXTRA_CALLOUT_DATA
1131         PCRE_EXTRA_TABLES
1132         PCRE_EXTRA_MARK
1133
1134       Other  flag  bits should be set to zero. The study_data field is set in
1135       the pcre_extra block that is returned by  pcre_study(),  together  with
1136       the appropriate flag bit. You should not set this yourself, but you may
1137       add to the block by setting the other fields  and  their  corresponding
1138       flag bits.
1139
1140       The match_limit field provides a means of preventing PCRE from using up
1141       a vast amount of resources when running patterns that are not going  to
1142       match,  but  which  have  a very large number of possibilities in their
1143       search trees. The classic example is a pattern that uses nested  unlim‐
1144       ited repeats.
1145
1146       Internally,  PCRE uses a function called match() which it calls repeat‐
1147       edly (sometimes recursively). The limit set by match_limit  is  imposed
1148       on  the  number  of times this function is called during a match, which
1149       has the effect of limiting the amount of  backtracking  that  can  take
1150       place. For patterns that are not anchored, the count restarts from zero
1151       for each position in the subject string.
1152
1153       The default value for the limit can be set  when  PCRE  is  built;  the
1154       default  default  is 10 million, which handles all but the most extreme
1155       cases. You can override the default  by  suppling  pcre_exec()  with  a
1156       pcre_extra     block    in    which    match_limit    is    set,    and
1157       PCRE_EXTRA_MATCH_LIMIT is set in the  flags  field.  If  the  limit  is
1158       exceeded, pcre_exec() returns PCRE_ERROR_MATCHLIMIT.
1159
1160       The  match_limit_recursion field is similar to match_limit, but instead
1161       of limiting the total number of times that match() is called, it limits
1162       the  depth  of  recursion. The recursion depth is a smaller number than
1163       the total number of calls, because not all calls to match() are  recur‐
1164       sive.  This limit is of use only if it is set smaller than match_limit.
1165
1166       Limiting  the  recursion  depth  limits the amount of stack that can be
1167       used, or, when PCRE has been compiled to use memory on the heap instead
1168       of the stack, the amount of heap memory that can be used.
1169
1170       The  default  value  for  match_limit_recursion can be set when PCRE is
1171       built; the default default  is  the  same  value  as  the  default  for
1172       match_limit.  You can override the default by suppling pcre_exec() with
1173       a  pcre_extra  block  in  which  match_limit_recursion  is   set,   and
1174       PCRE_EXTRA_MATCH_LIMIT_RECURSION  is  set  in  the  flags field. If the
1175       limit is exceeded, pcre_exec() returns PCRE_ERROR_RECURSIONLIMIT.
1176
1177       The callout_data field is used in conjunction with the  "callout"  fea‐
1178       ture, and is described in the pcrecallout documentation.
1179
1180       The  tables  field  is  used  to  pass  a  character  tables pointer to
1181       pcre_exec(); this overrides the value that is stored with the  compiled
1182       pattern.  A  non-NULL value is stored with the compiled pattern only if
1183       custom tables were supplied to pcre_compile() via  its  tableptr  argu‐
1184       ment.  If NULL is passed to pcre_exec() using this mechanism, it forces
1185       PCRE's internal tables to be used. This facility is  helpful  when  re-
1186       using  patterns  that  have been saved after compiling with an external
1187       set of tables, because the external tables  might  be  at  a  different
1188       address  when  pcre_exec() is called. See the pcreprecompile documenta‐
1189       tion for a discussion of saving compiled patterns for later use.
1190
1191       If PCRE_EXTRA_MARK is set in the flags field, the mark  field  must  be
1192       set  to  point  to a char * variable. If the pattern contains any back‐
1193       tracking control verbs such as (*MARK:NAME), and the execution ends  up
1194       with  a  name  to  pass back, a pointer to the name string (zero termi‐
1195       nated) is placed in the variable pointed to  by  the  mark  field.  The
1196       names  are  within  the  compiled pattern; if you wish to retain such a
1197       name you must copy it before freeing the memory of a compiled  pattern.
1198       If  there  is no name to pass back, the variable pointed to by the mark
1199       field is set to NULL. For details of the backtracking control verbs, see
1200       the section entitled "Backtracking control" in the pcrepattern documen‐
1201       tation.
1202
1203   Option bits for pcre_exec()
1204
1205       The unused bits of the options argument for pcre_exec() must  be  zero.
1206       The  only  bits  that  may  be set are PCRE_ANCHORED, PCRE_NEWLINE_xxx,
1207       PCRE_NOTBOL,   PCRE_NOTEOL,    PCRE_NOTEMPTY,    PCRE_NOTEMPTY_ATSTART,
1208       PCRE_NO_START_OPTIMIZE,   PCRE_NO_UTF8_CHECK,   PCRE_PARTIAL_SOFT,  and
1209       PCRE_PARTIAL_HARD.
1210
1211         PCRE_ANCHORED
1212
1213       The PCRE_ANCHORED option limits pcre_exec() to matching  at  the  first
1214       matching  position.  If  a  pattern was compiled with PCRE_ANCHORED, or
1215       turned out to be anchored by virtue of its contents, it cannot be  made
1216       unanchored at matching time.
1217
1218         PCRE_BSR_ANYCRLF
1219         PCRE_BSR_UNICODE
1220
1221       These options (which are mutually exclusive) control what the \R escape
1222       sequence matches. The choice is either to match only CR, LF,  or  CRLF,
1223       or  to  match  any Unicode newline sequence. These options override the
1224       choice that was made or defaulted when the pattern was compiled.
1225
1226         PCRE_NEWLINE_CR
1227         PCRE_NEWLINE_LF
1228         PCRE_NEWLINE_CRLF
1229         PCRE_NEWLINE_ANYCRLF
1230         PCRE_NEWLINE_ANY
1231
1232       These options override  the  newline  definition  that  was  chosen  or
1233       defaulted  when the pattern was compiled. For details, see the descrip‐
1234       tion of pcre_compile()  above.  During  matching,  the  newline  choice
1235       affects  the  behaviour  of the dot, circumflex, and dollar metacharac‐
1236       ters. It may also alter the way the match position is advanced after  a
1237       match failure for an unanchored pattern.
1238
1239       When  PCRE_NEWLINE_CRLF,  PCRE_NEWLINE_ANYCRLF,  or PCRE_NEWLINE_ANY is
1240       set, and a match attempt for an unanchored pattern fails when the  cur‐
1241       rent  position  is  at  a  CRLF  sequence,  and the pattern contains no
1242       explicit matches for  CR  or  LF  characters,  the  match  position  is
1243       advanced by two characters instead of one; that is, to just after the
1244       CRLF.
1245
1246       The above rule is a compromise that makes the most common cases work as
1247       expected.  For  example,  if  the  pattern  is .+A (and the PCRE_DOTALL
1248       option is not set), it does not match the string "\r\nA" because, after
1249       failing  at the start, it skips both the CR and the LF before retrying.
1250       However, the pattern [\r\n]A does match that string,  because  it  con‐
1251       tains an explicit CR or LF reference, and so advances only by one char‐
1252       acter after the first failure.
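
       The advancement rule described above can be sketched in plain C. The
       helper below is hypothetical and is not part of the PCRE API. It
       assumes byte mode (one character per byte) and that the caller
       already knows whether CRLF is a valid newline and whether the pattern
       contains an explicit CR or LF. It follows the description in the
       text, not PCRE's actual implementation.

```c
/* Hypothetical helper (not a PCRE function): compute where the next
 * match attempt starts after a failed attempt at `offset` for an
 * unanchored pattern.
 *
 * crlf_is_newline           non-zero if PCRE_NEWLINE_CRLF, _ANYCRLF or
 *                           _ANY is in force
 * pattern_has_explicit_crlf non-zero if the pattern contains a literal
 *                           CR/LF or a \r or \n escape
 */
int next_start_offset(const char *subject, int length, int offset,
                      int crlf_is_newline, int pattern_has_explicit_crlf)
{
    /* Skip both bytes only when we are sitting on a CRLF pair and the
     * pattern has no explicit CR or LF match. */
    if (crlf_is_newline && !pattern_has_explicit_crlf &&
        offset + 1 < length &&
        subject[offset] == '\r' && subject[offset + 1] == '\n')
        return offset + 2;

    /* Otherwise advance by a single character. */
    return offset + 1;
}
```

       With the subject "\r\nA", the sketch advances from offset 0 to
       offset 2 for a pattern such as .+A, and to offset 1 for [\r\n]A.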
1253
1254       An explicit match for CR or LF is either a literal appearance of one of
1255       those  characters,  or  one  of the \r or \n escape sequences. Implicit
1256       matches such as [^X] do not count, nor does \s (which includes  CR  and
1257       LF in the characters that it matches).
1258
1259       Notwithstanding  the above, anomalous effects may still occur when CRLF
1260       is a valid newline sequence and explicit \r or \n escapes appear in the
1261       pattern.
1262
1263         PCRE_NOTBOL
1264
1265       This option specifies that the first character of the subject string is
1266       not the beginning of a line, so the circumflex metacharacter should not
1267       match  before it. Setting this without PCRE_MULTILINE (at compile time)
1268       causes circumflex never to match. This option affects only  the  behav‐
1269       iour of the circumflex metacharacter. It does not affect \A.
1270
1271         PCRE_NOTEOL
1272
1273       This option specifies that the end of the subject string is not the end
1274       of a line, so the dollar metacharacter should not match it nor  (except
1275       in  multiline mode) a newline immediately before it. Setting this with‐
1276       out PCRE_MULTILINE (at compile time) causes dollar never to match. This
1277       option  affects only the behaviour of the dollar metacharacter. It does
1278       not affect \Z or \z.
1279
1280         PCRE_NOTEMPTY
1281
1282       An empty string is not considered to be a valid match if this option is
1283       set.  If  there are alternatives in the pattern, they are tried. If all
1284       the alternatives match the empty string, the entire  match  fails.  For
1285       example, if the pattern
1286
1287         a?b?
1288
1289       is  applied  to  a  string not beginning with "a" or "b", it matches an
1290       empty string at the start of the subject. With PCRE_NOTEMPTY set,  this
1291       match is not valid, so PCRE searches further into the string for occur‐
1292       rences of "a" or "b".
1293
1294         PCRE_NOTEMPTY_ATSTART
1295
1296       This is like PCRE_NOTEMPTY, except that an empty string match  that  is
1297       not  at  the  start  of  the  subject  is  permitted. If the pattern is
1298       anchored, such a match can occur only if the pattern contains \K.
1299
1300       Perl    has    no    direct    equivalent    of    PCRE_NOTEMPTY     or
1301       PCRE_NOTEMPTY_ATSTART,  but  it  does  make a special case of a pattern
1302       match of the empty string within its split() function, and  when  using
1303       the  /g  modifier.  It  is  possible  to emulate Perl's behaviour after
1304       matching a null string by first trying the match again at the same off‐
1305       set  with  PCRE_NOTEMPTY_ATSTART  and  PCRE_ANCHORED,  and then if that
1306       fails, by advancing the starting offset (see below) and trying an ordi‐
1307       nary  match  again. There is some code that demonstrates how to do this
1308       in the pcredemo sample program.
1309
1310         PCRE_NO_START_OPTIMIZE
1311
1312       There are a number of optimizations that pcre_exec() uses at the  start
1313       of  a  match,  in  order to speed up the process. For example, if it is
1314       known that an unanchored match must start with a specific character, it
1315       searches  the  subject  for that character, and fails immediately if it
1316       cannot find it, without actually running the  main  matching  function.
1317       This means that a special item such as (*COMMIT) at the start of a pat‐
1318       tern is not considered until after a suitable starting  point  for  the
1319       match  has been found. When callouts or (*MARK) items are in use, these
1320       "start-up" optimizations can cause them to be skipped if the pattern is
1321       never  actually  used.  The start-up optimizations are in effect a pre-
1322       scan of the subject that takes place before the pattern is run.
1323
1324       The PCRE_NO_START_OPTIMIZE option disables the start-up  optimizations,
1325       possibly  causing  performance  to  suffer,  but ensuring that in cases
1326       where the result is "no match", the callouts do occur, and  that  items
1327       such as (*COMMIT) and (*MARK) are considered at every possible starting
1328       position in the subject  string.   Setting  PCRE_NO_START_OPTIMIZE  can
1329       change the outcome of a matching operation.  Consider the pattern
1330
1331         (*COMMIT)ABC
1332
1333       When  this  is  compiled, PCRE records the fact that a match must start
1334       with the character "A". Suppose the subject  string  is  "DEFABC".  The
1335       start-up  optimization  scans along the subject, finds "A" and runs the
1336       first match attempt from there. The (*COMMIT) item means that the  pat‐
1337       tern  must  match the current starting position, which in this case, it
1338       does. However, if the same match  is  run  with  PCRE_NO_START_OPTIMIZE
1339       set,  the  initial  scan  along the subject string does not happen. The
1340       first match attempt is run starting  from  "D"  and  when  this  fails,
1341       (*COMMIT)  prevents  any  further  matches  being tried, so the overall
1342       result is "no match". If the pattern is studied,  more  start-up  opti‐
1343       mizations  may  be  used. For example, a minimum length for the subject
1344       may be recorded. Consider the pattern
1345
1346         (*MARK:A)(X|Y)
1347
1348       The minimum length for a match is one  character.  If  the  subject  is
1349       "ABC",  there  will  be  attempts  to  match "ABC", "BC", "C", and then
1350       finally an empty string.  If the pattern is studied, the final  attempt
1351       does  not take place, because PCRE knows that the subject is too short,
1352       and so the (*MARK) is never encountered.  In this  case,  studying  the
1353       pattern  does  not  affect the overall match result, which is still "no
1354       match", but it does affect the auxiliary information that is returned.
1355
1356         PCRE_NO_UTF8_CHECK
1357
1358       When PCRE_UTF8 is set at compile time, the validity of the subject as a
1359       UTF-8  string is automatically checked when pcre_exec() is subsequently
1360       called.  The value of startoffset is also checked  to  ensure  that  it
1361       points  to  the start of a UTF-8 character. There is a discussion about
1362       the validity of UTF-8 strings in the section on UTF-8  support  in  the
1363       main  pcre  page.  If  an  invalid  UTF-8  sequence  of bytes is found,
1364       pcre_exec() returns the error PCRE_ERROR_BADUTF8. If  startoffset  con‐
1365       tains an invalid value, PCRE_ERROR_BADUTF8_OFFSET is returned.
1366
1367       If  you  already  know that your subject is valid, and you want to skip
1368       these   checks   for   performance   reasons,   you   can    set    the
1369       PCRE_NO_UTF8_CHECK  option  when calling pcre_exec(). You might want to
1370       do this for the second and subsequent calls to pcre_exec() if  you  are
1371       making  repeated  calls  to  find  all  the matches in a single subject
1372       string. However, you should be  sure  that  the  value  of  startoffset
1373       points  to  the  start of a UTF-8 character. When PCRE_NO_UTF8_CHECK is
1374       set, the effect of passing an invalid UTF-8 string as a subject,  or  a
1375       value  of startoffset that does not point to the start of a UTF-8 char‐
1376       acter, is undefined. Your program may crash.
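
       Because the behaviour is undefined when PCRE_NO_UTF8_CHECK is set and
       startoffset is wrong, a caller may want a cheap sanity check of its
       own. The following is a minimal sketch, not a PCRE function: the name
       utf8_offset_ok is invented for illustration, and it checks only that
       the byte at the offset is not a UTF-8 continuation byte (10xxxxxx).
       It does not validate the subject as a whole. Treating an offset equal
       to the subject length as acceptable, and anything beyond it as
       invalid, is an assumption of this sketch.

```c
/* Hypothetical helper: returns 1 if `offset` could be the start of a
 * UTF-8 character in `subject`, 0 otherwise. */
int utf8_offset_ok(const char *subject, int length, int offset)
{
    unsigned char b;

    if (offset < 0 || offset > length)
        return 0;               /* outside the subject (assumed invalid) */
    if (offset == length)
        return 1;               /* end of subject (assumed acceptable)   */

    b = (unsigned char)subject[offset];

    /* Continuation bytes have the bit pattern 10xxxxxx; every other
     * byte value can begin a character. */
    return (b & 0xC0) != 0x80;
}
```

       For the four-byte subject "a\xC3\xA9z", offsets 0, 1, and 3 pass the
       check, while offset 2 (the second byte of the two-byte character)
       does not.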
1377
1378         PCRE_PARTIAL_HARD
1379         PCRE_PARTIAL_SOFT
1380
1381       These options turn on the partial matching feature. For backwards  com‐
1382       patibility,  PCRE_PARTIAL is a synonym for PCRE_PARTIAL_SOFT. A partial
1383       match occurs if the end of the subject string is reached  successfully,
1384       but  there  are not enough subject characters to complete the match. If
1385       this happens when PCRE_PARTIAL_HARD  is  set,  pcre_exec()  immediately
1386       returns  PCRE_ERROR_PARTIAL.  Otherwise,  if  PCRE_PARTIAL_SOFT is set,
1387       matching continues by testing any other alternatives. Only if they  all
1388       fail  is  PCRE_ERROR_PARTIAL  returned (instead of PCRE_ERROR_NOMATCH).
1389       The portion of the string that was inspected when the partial match was
1390       found  is  set  as  the first matching string. There is a more detailed
1391       discussion in the pcrepartial documentation.
1392
1393   The string to be matched by pcre_exec()
1394
1395       The subject string is passed to pcre_exec() as a pointer in subject,  a
1396       length (in bytes) in length, and a starting byte offset in startoffset.
1397       In UTF-8 mode, the byte offset must point to the start of a UTF-8 char‐
1398       acter.  Unlike  the pattern string, the subject may contain binary zero
1399       bytes. When the starting offset is zero, the search for a match  starts
1400       at  the  beginning  of  the subject, and this is by far the most common
1401       case.
1402
1403       A non-zero starting offset is useful when searching for  another  match
1404       in  the same subject by calling pcre_exec() again after a previous suc‐
1405       cess.  Setting startoffset differs from just passing over  a  shortened
1406       string  and  setting  PCRE_NOTBOL  in the case of a pattern that begins
1407       with any kind of lookbehind. For example, consider the pattern
1408
1409         \Biss\B
1410
1411       which finds occurrences of "iss" in the middle of  words.  (\B  matches
1412       only  if  the  current position in the subject is not a word boundary.)
1413       When applied to the string "Mississipi" the first call  to  pcre_exec()
1414       finds  the  first  occurrence. If pcre_exec() is called again with just
1415       the remainder of the subject,  namely  "issipi",  it  does  not  match,
1416       because \B is always false at the start of the subject, which is deemed
1417       to be a word boundary. However, if pcre_exec()  is  passed  the  entire
1418       string again, but with startoffset set to 4, it finds the second occur‐
1419       rence of "iss" because it is able to look behind the starting point  to
1420       discover that it is preceded by a letter.
1421
1422       If  a  non-zero starting offset is passed when the pattern is anchored,
1423       one attempt to match at the given offset is made. This can only succeed
1424       if  the  pattern  does  not require the match to be at the start of the
1425       subject.
1426
1427   How pcre_exec() returns captured substrings
1428
1429       In general, a pattern matches a certain portion of the subject, and  in
1430       addition,  further  substrings  from  the  subject may be picked out by
1431       parts of the pattern. Following the usage  in  Jeffrey  Friedl's  book,
1432       this  is  called "capturing" in what follows, and the phrase "capturing
1433       subpattern" is used for a fragment of a pattern that picks out  a  sub‐
1434       string.  PCRE  supports several other kinds of parenthesized subpattern
1435       that do not cause substrings to be captured.
1436
1437       Captured substrings are returned to the caller via a vector of integers
1438       whose  address is passed in ovector. The number of elements in the vec‐
1439       tor is passed in ovecsize, which must be a non-negative  number.  Note:
1440       this argument is NOT the size of ovector in bytes.
1441
1442       The  first  two-thirds of the vector is used to pass back captured sub‐
1443       strings, each substring using a pair of integers. The  remaining  third
1444       of  the  vector is used as workspace by pcre_exec() while matching cap‐
1445       turing subpatterns, and is not available for passing back  information.
1446       The  number passed in ovecsize should always be a multiple of three. If
1447       it is not, it is rounded down.
1448
1449       When a match is successful, information about  captured  substrings  is
1450       returned  in  pairs  of integers, starting at the beginning of ovector,
1451       and continuing up to two-thirds of its length at the  most.  The  first
1452       element  of  each pair is set to the byte offset of the first character
1453       in a substring, and the second is set to the byte offset of  the  first
1454       character  after  the end of a substring. Note: these values are always
1455       byte offsets, even in UTF-8 mode. They are not character counts.
1456
1457       The first pair of integers, ovector[0]  and  ovector[1],  identify  the
1458       portion  of  the subject string matched by the entire pattern. The next
1459       pair is used for the first capturing subpattern, and so on.  The  value
1460       returned by pcre_exec() is one more than the highest numbered pair that
1461       has been set.  For example, if two substrings have been  captured,  the
1462       returned  value is 3. If there are no capturing subpatterns, the return
1463       value from a successful match is 1, indicating that just the first pair
1464       of offsets has been set.
1465
1466       If a capturing subpattern is matched repeatedly, it is the last portion
1467       of the string that it matched that is returned.
1468
1469       If the vector is too small to hold all the captured substring  offsets,
1470       it is used as far as possible (up to two-thirds of its length), and the
1471       function returns a value of zero. If the substring offsets are  not  of
1472       interest,  pcre_exec()  may  be  called with ovector passed as NULL and
1473       ovecsize as zero. However, if the pattern contains back references  and
1474       the  ovector is not big enough to remember the related substrings, PCRE
1475       has to get additional memory for use during matching. Thus it  is  usu‐
1476       ally advisable to supply an ovector.
1477
1478       The pcre_fullinfo() function can be used to find out how many capturing
1479       subpatterns there are in a compiled  pattern.  The  smallest  size  for
1480       ovector  that  will allow for n captured substrings, in addition to the
1481       offsets of the substring matched by the whole pattern, is (n+1)*3.
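
       The sizing arithmetic above is small enough to write down directly.
       The two helpers below are illustrative only (their names are not part
       of PCRE). The capture count n would normally come from
       pcre_fullinfo() with PCRE_INFO_CAPTURECOUNT.

```c
/* Hypothetical helper: smallest ovector size (in ints, not bytes) that
 * can return n captured substrings plus the whole-pattern match. */
int ovector_elems_for(int n_captures)
{
    return (n_captures + 1) * 3;
}

/* Hypothetical helper: number of offset pairs that can be passed back
 * in an ovector of `ovecsize` ints. The size is rounded down to a
 * multiple of three, and only the first two-thirds hold pairs. */
int usable_pairs(int ovecsize)
{
    return ovecsize / 3;
}
```

       For example, a pattern with two capturing subpatterns needs at least
       9 elements, and a vector declared with 10 elements still offers only
       3 pairs.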
1482
1483       It is possible for capturing subpattern number n+1 to match  some  part
1484       of the subject when subpattern n has not been used at all. For example,
1485       if the string "abc" is matched  against  the  pattern  (a|(z))(bc)  the
1486       return from the function is 4, and subpatterns 1 and 3 are matched, but
1487       2 is not. When this happens, both values in  the  offset  pairs  corre‐
1488       sponding to unused subpatterns are set to -1.
1489
1490       Offset  values  that correspond to unused subpatterns at the end of the
1491       expression are also set to -1. For example,  if  the  string  "abc"  is
1492       matched  against the pattern (abc)(x(yz)?)? subpatterns 2 and 3 are not
1493       matched. The return from the function is 2, because  the  highest  used
1494       capturing subpattern number is 1. However, you can refer to the offsets
1495       for the second and third capturing subpatterns if  you  wish  (assuming
1496       the vector is large enough, of course).
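
       To make the offset-pair layout concrete, here is a sketch of copying
       one capture out of the subject by hand while treating -1 offsets as
       "unset". The function copy_capture and its return codes (-1 and -2)
       are invented for this example. It is not the library's
       pcre_copy_substring(), which returns an empty string for an unset
       substring.

```c
#include <string.h>

/* Hypothetical helper: copy capture `number` into `buffer`.
 * `paircount` is the number of offset pairs known to be filled in.
 * Returns the substring length, -1 if the capture is out of range or
 * unset, or -2 if the buffer is too small (sketch-specific codes). */
int copy_capture(const char *subject, const int *ovector, int paircount,
                 int number, char *buffer, int buffersize)
{
    int start, len;

    if (number < 0 || number >= paircount)
        return -1;

    start = ovector[2 * number];        /* byte offset of first char   */
    if (start < 0)
        return -1;                      /* unset: both offsets are -1  */

    len = ovector[2 * number + 1] - start;  /* end offset is exclusive */
    if (len < 0)
        return -1;
    if (len + 1 > buffersize)
        return -2;

    memcpy(buffer, subject + start, (size_t)len);
    buffer[len] = '\0';
    return len;
}
```

       With the subject "abc" and the pattern (a|(z))(bc), the vector begins
       0,3, 0,1, -1,-1, 1,3. The sketch then yields "abc", "a", and "bc" for
       captures 0, 1, and 3, and reports capture 2 as unset.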
1497
1498       Some  convenience  functions  are  provided for extracting the captured
1499       substrings as separate strings. These are described below.
1500
1501   Error return values from pcre_exec()
1502
1503       If pcre_exec() fails, it returns a negative number. The  following  are
1504       defined in the header file:
1505
1506         PCRE_ERROR_NOMATCH        (-1)
1507
1508       The subject string did not match the pattern.
1509
1510         PCRE_ERROR_NULL           (-2)
1511
1512       Either  code  or  subject  was  passed as NULL, or ovector was NULL and
1513       ovecsize was not zero.
1514
1515         PCRE_ERROR_BADOPTION      (-3)
1516
1517       An unrecognized bit was set in the options argument.
1518
1519         PCRE_ERROR_BADMAGIC       (-4)
1520
1521       PCRE stores a 4-byte "magic number" at the start of the compiled  code,
1522       to catch the case when it is passed a junk pointer and to detect when a
1523       pattern that was compiled in an environment of one endianness is run in
1524       an  environment  with the other endianness. This is the error that PCRE
1525       gives when the magic number is not present.
1526
1527         PCRE_ERROR_UNKNOWN_OPCODE (-5)
1528
1529       While running the pattern match, an unknown item was encountered in the
1530       compiled  pattern.  This  error  could be caused by a bug in PCRE or by
1531       overwriting of the compiled pattern.
1532
1533         PCRE_ERROR_NOMEMORY       (-6)
1534
1535       If a pattern contains back references, but the ovector that  is  passed
1536       to pcre_exec() is not big enough to remember the referenced substrings,
1537       PCRE gets a block of memory at the start of matching to  use  for  this
1538       purpose.  If the call via pcre_malloc() fails, this error is given. The
1539       memory is automatically freed at the end of matching.
1540
1541       This error is also given if pcre_stack_malloc() fails  in  pcre_exec().
1542       This  can happen only when PCRE has been compiled with --disable-stack-
1543       for-recursion.
1544
1545         PCRE_ERROR_NOSUBSTRING    (-7)
1546
1547       This error is used by the pcre_copy_substring(),  pcre_get_substring(),
1548       and  pcre_get_substring_list()  functions  (see  below).  It  is  never
1549       returned by pcre_exec().
1550
1551         PCRE_ERROR_MATCHLIMIT     (-8)
1552
1553       The backtracking limit, as specified by  the  match_limit  field  in  a
1554       pcre_extra  structure  (or  defaulted) was reached. See the description
1555       above.
1556
1557         PCRE_ERROR_CALLOUT        (-9)
1558
1559       This error is never generated by pcre_exec() itself. It is provided for
1560       use  by  callout functions that want to yield a distinctive error code.
1561       See the pcrecallout documentation for details.
1562
1563         PCRE_ERROR_BADUTF8        (-10)
1564
1565       A string that contains an invalid UTF-8 byte sequence was passed  as  a
1566       subject.
1567
1568         PCRE_ERROR_BADUTF8_OFFSET (-11)
1569
1570       The UTF-8 byte sequence that was passed as a subject was valid, but the
1571       value of startoffset did not point to the beginning of a UTF-8  charac‐
1572       ter.
1573
1574         PCRE_ERROR_PARTIAL        (-12)
1575
1576       The  subject  string did not match, but it did match partially. See the
1577       pcrepartial documentation for details of partial matching.
1578
1579         PCRE_ERROR_BADPARTIAL     (-13)
1580
1581       This code is no longer in  use.  It  was  formerly  returned  when  the
1582       PCRE_PARTIAL  option  was used with a compiled pattern containing items
1583       that were  not  supported  for  partial  matching.  From  release  8.00
1584       onwards, there are no restrictions on partial matching.
1585
1586         PCRE_ERROR_INTERNAL       (-14)
1587
1588       An  unexpected  internal error has occurred. This error could be caused
1589       by a bug in PCRE or by overwriting of the compiled pattern.
1590
1591         PCRE_ERROR_BADCOUNT       (-15)
1592
1593       This error is given if the value of the ovecsize argument is negative.
1594
1595         PCRE_ERROR_RECURSIONLIMIT (-21)
1596
1597       The internal recursion limit, as specified by the match_limit_recursion
1598       field  in  a  pcre_extra  structure (or defaulted) was reached. See the
1599       description above.
1600
1601         PCRE_ERROR_BADNEWLINE     (-23)
1602
1603       An invalid combination of PCRE_NEWLINE_xxx options was given.
1604
1605       Error numbers -16 to -20 and -22 are not used by pcre_exec().
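
       When logging failures it can help to turn the numeric result into a
       name. The lookup below is a sketch that uses the numeric values
       listed in this section. The function name exec_error_name is made up
       for the example, and real code that includes pcre.h would normally
       use the symbolic constants in the case labels instead of literals.

```c
/* Hypothetical helper: map a pcre_exec() return value to the name of
 * the corresponding error code as listed in this section. */
const char *exec_error_name(int rc)
{
    switch (rc) {
    case  -1: return "PCRE_ERROR_NOMATCH";
    case  -2: return "PCRE_ERROR_NULL";
    case  -3: return "PCRE_ERROR_BADOPTION";
    case  -4: return "PCRE_ERROR_BADMAGIC";
    case  -5: return "PCRE_ERROR_UNKNOWN_OPCODE";
    case  -6: return "PCRE_ERROR_NOMEMORY";
    case  -7: return "PCRE_ERROR_NOSUBSTRING"; /* never from pcre_exec() */
    case  -8: return "PCRE_ERROR_MATCHLIMIT";
    case  -9: return "PCRE_ERROR_CALLOUT";
    case -10: return "PCRE_ERROR_BADUTF8";
    case -11: return "PCRE_ERROR_BADUTF8_OFFSET";
    case -12: return "PCRE_ERROR_PARTIAL";
    case -13: return "PCRE_ERROR_BADPARTIAL";  /* no longer in use */
    case -14: return "PCRE_ERROR_INTERNAL";
    case -15: return "PCRE_ERROR_BADCOUNT";
    case -21: return "PCRE_ERROR_RECURSIONLIMIT";
    case -23: return "PCRE_ERROR_BADNEWLINE";
    default:
        /* Non-negative values are not errors; other negative values
         * are not used by pcre_exec(). */
        return rc >= 0 ? "(not an error)" : "(not used by pcre_exec)";
    }
}
```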
1606

EXTRACTING CAPTURED SUBSTRINGS BY NUMBER

1608
1609       int pcre_copy_substring(const char *subject, int *ovector,
1610            int stringcount, int stringnumber, char *buffer,
1611            int buffersize);
1612
1613       int pcre_get_substring(const char *subject, int *ovector,
1614            int stringcount, int stringnumber,
1615            const char **stringptr);
1616
1617       int pcre_get_substring_list(const char *subject,
1618            int *ovector, int stringcount, const char ***listptr);
1619
1620       Captured substrings can be  accessed  directly  by  using  the  offsets
1621       returned  by  pcre_exec()  in  ovector.  For convenience, the functions
1622       pcre_copy_substring(),    pcre_get_substring(),    and    pcre_get_sub‐
1623       string_list()  are  provided for extracting captured substrings as new,
1624       separate, zero-terminated strings. These functions identify  substrings
1625       by  number.  The  next section describes functions for extracting named
1626       substrings.
1627
1628       A substring that contains a binary zero is correctly extracted and  has
1629       a  further zero added on the end, but the result is not, of course, a C
1630       string.  However, you can process such a string  by  referring  to  the
1631       length  that  is  returned  by  pcre_copy_substring() and pcre_get_sub‐
1632       string().  Unfortunately, the interface to pcre_get_substring_list() is
1633       not  adequate for handling strings containing binary zeros, because the
1634       end of the final string is not independently indicated.
1635
1636       The first three arguments are the same for all  three  of  these  func‐
1637       tions:  subject  is  the subject string that has just been successfully
1638       matched, ovector is a pointer to the vector of integer offsets that was
1639       passed to pcre_exec(), and stringcount is the number of substrings that
1640       were captured by the match, including the substring  that  matched  the
1641       entire regular expression. This is the value returned by pcre_exec() if
1642       it is greater than zero. If pcre_exec() returned zero, indicating  that
1643       it  ran out of space in ovector, the value passed as stringcount should
1644       be the number of elements in the vector divided by three.
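
       The rule for choosing stringcount can be captured in a few lines. The
       helper name stringcount_for is an assumption of this sketch, as is
       returning zero for a negative (error) result, since no substrings
       are available in that case.

```c
/* Hypothetical helper: derive the stringcount argument for the
 * substring-extraction functions from the pcre_exec() result `rc` and
 * the ovecsize that was passed to pcre_exec(). */
int stringcount_for(int rc, int ovecsize)
{
    if (rc > 0)
        return rc;              /* normal case: use the return value   */
    if (rc == 0)
        return ovecsize / 3;    /* ovector was too small: use capacity */
    return 0;                   /* error return: nothing to extract    */
}
```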
1645
1646       The functions pcre_copy_substring() and pcre_get_substring() extract  a
1647       single  substring,  whose  number  is given as stringnumber. A value of
1648       zero extracts the substring that matched the  entire  pattern,  whereas
1649       higher  values  extract  the  captured  substrings.  For pcre_copy_sub‐
1650       string(), the string is placed in buffer,  whose  length  is  given  by
1651       buffersize,  while  for  pcre_get_substring()  a new block of memory is
1652       obtained via pcre_malloc, and its address is  returned  via  stringptr.
1653       The  yield  of  the function is the length of the string, not including
1654       the terminating zero, or one of these error codes:
1655
1656         PCRE_ERROR_NOMEMORY       (-6)
1657
1658       The buffer was too small for pcre_copy_substring(), or the  attempt  to
1659       get memory failed for pcre_get_substring().
1660
1661         PCRE_ERROR_NOSUBSTRING    (-7)
1662
1663       There is no substring whose number is stringnumber.
1664
1665       The  pcre_get_substring_list()  function  extracts  all  available sub‐
1666       strings and builds a list of pointers to them. All this is  done  in  a
1667       single block of memory that is obtained via pcre_malloc. The address of
1668       the memory block is returned via listptr, which is also  the  start  of
1669       the  list  of  string pointers. The end of the list is marked by a NULL
1670       pointer. The yield of the function is zero if all  went  well,  or  the
1671       error code
1672
1673         PCRE_ERROR_NOMEMORY       (-6)
1674
1675       if the attempt to get the memory block failed.
1676
1677       When any of these functions encounters a substring that is unset, which
1678       can happen when capturing subpattern number n+1 matches  some  part  of
1679       the  subject, but subpattern n has not been used at all, they return an
1680       empty string. This can be distinguished from a genuine zero-length sub‐
1681       string  by inspecting the appropriate offset in ovector, which is nega‐
1682       tive for unset substrings.
1683
1684       The two convenience functions pcre_free_substring() and  pcre_free_sub‐
1685       string_list()  can  be  used  to free the memory returned by a previous
1686       call  of  pcre_get_substring()  or  pcre_get_substring_list(),  respec‐
1687       tively.  They  do  nothing  more  than  call the function pointed to by
1688       pcre_free, which of course could be called directly from a  C  program.
1689       However,  PCRE is used in some situations where it is linked via a spe‐
1690       cial  interface  to  another  programming  language  that  cannot   use
1691       pcre_free  directly;  it is for these cases that the functions are pro‐
1692       vided.
1693

EXTRACTING CAPTURED SUBSTRINGS BY NAME

1695
1696       int pcre_get_stringnumber(const pcre *code,
1697            const char *name);
1698
1699       int pcre_copy_named_substring(const pcre *code,
1700            const char *subject, int *ovector,
1701            int stringcount, const char *stringname,
1702            char *buffer, int buffersize);
1703
1704       int pcre_get_named_substring(const pcre *code,
1705            const char *subject, int *ovector,
1706            int stringcount, const char *stringname,
1707            const char **stringptr);
1708
1709       To extract a substring by name, you first have to find the associated
1710       number. For example, for this pattern
1711
1712         (a+)b(?<xxx>\d+)...
1713
1714       the number of the subpattern called "xxx" is 2. If the name is known to
1715       be unique (PCRE_DUPNAMES was not set), you can find the number from the
1716       name by calling pcre_get_stringnumber(). The first argument is the com‐
1717       piled pattern, and the second is the name. The yield of the function is
1718       the  subpattern  number,  or PCRE_ERROR_NOSUBSTRING (-7) if there is no
1719       subpattern of that name.
1720
1721       Given the number, you can extract the substring directly, or use one of
1722       the functions described in the previous section. For convenience, there
1723       are also two functions that do the whole job.
1724
1725       Most   of   the   arguments    of    pcre_copy_named_substring()    and
1726       pcre_get_named_substring()  are  the  same  as  those for the similarly
1727       named functions that extract by number. As these are described  in  the
1728       previous  section,  they  are not re-described here. There are just two
1729       differences:
1730
1731       First, instead of a substring number, a substring name is  given.  Sec‐
1732       ond, there is an extra argument, given at the start, which is a pointer
1733       to the compiled pattern. This is needed in order to gain access to  the
1734       name-to-number translation table.
1735
1736       These  functions call pcre_get_stringnumber(), and if it succeeds, they
1737       then call pcre_copy_substring() or pcre_get_substring(),  as  appropri‐
1738       ate.  NOTE:  If PCRE_DUPNAMES is set and there are duplicate names, the
1739       behaviour may not be what you want (see the next section).
1740
1741       Warning: If the pattern uses the (?| feature to set up multiple subpat‐
1742       terns  with  the  same number, as described in the section on duplicate
1743       subpattern numbers in the pcrepattern page, you  cannot  use  names  to
1744       distinguish  the  different subpatterns, because names are not included
1745       in the compiled code. The matching process uses only numbers. For  this
1746       reason,  the  use of different names for subpatterns of the same number
1747       causes an error at compile time.
1748

DUPLICATE SUBPATTERN NAMES

1750
       int pcre_get_stringtable_entries(const pcre *code,
            const char *name, char **first, char **last);
1753
1754       When a pattern is compiled with the  PCRE_DUPNAMES  option,  names  for
1755       subpatterns  are not required to be unique. (Duplicate names are always
1756       allowed for subpatterns with the same number, created by using the  (?|
1757       feature.  Indeed,  if  such subpatterns are named, they are required to
1758       use the same names.)
1759
1760       Normally, patterns with duplicate names are such that in any one match,
1761       only  one of the named subpatterns participates. An example is shown in
1762       the pcrepattern documentation.
1763
1764       When   duplicates   are   present,   pcre_copy_named_substring()    and
1765       pcre_get_named_substring()  return the first substring corresponding to
1766       the given name that is set. If  none  are  set,  PCRE_ERROR_NOSUBSTRING
1767       (-7)  is  returned;  no  data  is returned. The pcre_get_stringnumber()
1768       function returns one of the numbers that are associated with the  name,
1769       but it is not defined which it is.
1770
1771       If  you want to get full details of all captured substrings for a given
1772       name, you must use  the  pcre_get_stringtable_entries()  function.  The
1773       first argument is the compiled pattern, and the second is the name. The
1774       third and fourth are pointers to variables which  are  updated  by  the
1775       function. After it has run, they point to the first and last entries in
1776       the name-to-number table  for  the  given  name.  The  function  itself
1777       returns  the  length  of  each entry, or PCRE_ERROR_NOSUBSTRING (-7) if
1778       there are none. The format of the table is described above in the  sec‐
1779       tion  entitled  Information  about  a  pattern.  Given all the relevant
1780       entries for the name, you can extract each of their numbers, and  hence
1781       the captured data, if any.
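
       The following is a minimal sketch of that last step, not part of the
       PCRE library. It assumes the entry layout given in the section on
       information about a pattern (a two-byte group number, most significant
       byte first, followed by the zero-terminated name), and it assumes that
       entrysize was obtained separately, for example as the return value of
       pcre_get_stringtable_entries(). The name first_set_group() is
       hypothetical. The stringcount argument is the positive value returned
       by pcre_exec().

```c
#include <stddef.h>

/* Walk the name-table entries from `first` to `last` (inclusive) and return
   the number of the first group that captured something, or -1 if none of
   the groups with this name was set. This helper is illustrative only. */
int first_set_group(const char *first, const char *last, int entrysize,
                    const int *ovector, int stringcount)
{
    const char *entry;

    for (entry = first; entry <= last; entry += entrysize) {
        const unsigned char *e = (const unsigned char *)entry;
        int number = (e[0] << 8) | e[1];   /* most significant byte first */

        /* Groups numbered stringcount or higher were not set by the match;
           an unset group inside the range has -1 as its start offset. */
        if (number < stringcount && ovector[2 * number] >= 0)
            return number;
    }
    return -1;
}
```

       Given the returned number, the captured text can be fetched with
       pcre_get_substring() or pcre_copy_substring() in the usual way.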
1782

FINDING ALL POSSIBLE MATCHES

1784
1785       The  traditional  matching  function  uses a similar algorithm to Perl,
1786       which stops when it finds the first match, starting at a given point in
1787       the  subject.  If you want to find all possible matches, or the longest
1788       possible match, consider using the alternative matching  function  (see
1789       below)  instead.  If you cannot use the alternative function, but still
1790       need to find all possible matches, you can kludge it up by  making  use
1791       of the callout facility, which is described in the pcrecallout documen‐
1792       tation.
1793
1794       What you have to do is to insert a callout right at the end of the pat‐
1795       tern.   When your callout function is called, extract and save the cur‐
1796       rent matched substring. Then return  1,  which  forces  pcre_exec()  to
1797       backtrack  and  try other alternatives. Ultimately, when it runs out of
1798       matches, pcre_exec() will yield PCRE_ERROR_NOMATCH.
1799

MATCHING A PATTERN: THE ALTERNATIVE FUNCTION

1801
       int pcre_dfa_exec(const pcre *code, const pcre_extra *extra,
            const char *subject, int length, int startoffset,
            int options, int *ovector, int ovecsize,
            int *workspace, int wscount);
1806
1807       The function pcre_dfa_exec()  is  called  to  match  a  subject  string
1808       against  a  compiled pattern, using a matching algorithm that scans the
1809       subject string just once, and does not backtrack.  This  has  different
1810       characteristics  to  the  normal  algorithm, and is not compatible with
1811       Perl. Some of the features of PCRE patterns are not  supported.  Never‐
1812       theless,  there are times when this kind of matching can be useful. For
1813       a discussion of the two matching algorithms, and  a  list  of  features
1814       that  pcre_dfa_exec() does not support, see the pcrematching documenta‐
1815       tion.
1816
1817       The arguments for the pcre_dfa_exec() function  are  the  same  as  for
1818       pcre_exec(), plus two extras. The ovector argument is used in a differ‐
1819       ent way, and this is described below. The other  common  arguments  are
1820       used  in  the  same way as for pcre_exec(), so their description is not
1821       repeated here.
1822
1823       The two additional arguments provide workspace for  the  function.  The
1824       workspace  vector  should  contain at least 20 elements. It is used for
1825       keeping  track  of  multiple  paths  through  the  pattern  tree.  More
1826       workspace  will  be  needed for patterns and subjects where there are a
1827       lot of potential matches.
1828
1829       Here is an example of a simple call to pcre_dfa_exec():
1830
         int rc;
         int ovector[10];
         int wspace[20];
         rc = pcre_dfa_exec(
           re,             /* result of pcre_compile() */
           NULL,           /* we didn't study the pattern */
           "some string",  /* the subject string */
           11,             /* the length of the subject string */
           0,              /* start at offset 0 in the subject */
           0,              /* default options */
           ovector,        /* vector of integers for substring information */
           10,             /* number of elements (NOT size in bytes) */
           wspace,         /* working space vector */
           20);            /* number of elements (NOT size in bytes) */
1845
1846   Option bits for pcre_dfa_exec()
1847
1848       The unused bits of the options argument  for  pcre_dfa_exec()  must  be
1849       zero.  The  only  bits  that  may  be  set are PCRE_ANCHORED, PCRE_NEW‐
1850       LINE_xxx,        PCRE_NOTBOL,        PCRE_NOTEOL,        PCRE_NOTEMPTY,
1851       PCRE_NOTEMPTY_ATSTART,       PCRE_NO_UTF8_CHECK,      PCRE_BSR_ANYCRLF,
1852       PCRE_BSR_UNICODE, PCRE_NO_START_OPTIMIZE, PCRE_PARTIAL_HARD,  PCRE_PAR‐
1853       TIAL_SOFT,  PCRE_DFA_SHORTEST,  and PCRE_DFA_RESTART.  All but the last
1854       four of these are  exactly  the  same  as  for  pcre_exec(),  so  their
1855       description is not repeated here.
1856
1857         PCRE_PARTIAL_HARD
1858         PCRE_PARTIAL_SOFT
1859
1860       These  have the same general effect as they do for pcre_exec(), but the
1861       details are slightly  different.  When  PCRE_PARTIAL_HARD  is  set  for
1862       pcre_dfa_exec(),  it  returns PCRE_ERROR_PARTIAL if the end of the sub‐
1863       ject is reached and there is still at least  one  matching  possibility
1864       that requires additional characters. This happens even if some complete
1865       matches have also been found. When PCRE_PARTIAL_SOFT is set, the return
1866       code PCRE_ERROR_NOMATCH is converted into PCRE_ERROR_PARTIAL if the end
1867       of the subject is reached, there have been  no  complete  matches,  but
1868       there  is  still  at least one matching possibility. The portion of the
1869       string that was inspected when the longest partial match was  found  is
1870       set as the first matching string in both cases.
1871
1872         PCRE_DFA_SHORTEST
1873
1874       Setting  the  PCRE_DFA_SHORTEST option causes the matching algorithm to
1875       stop as soon as it has found one match. Because of the way the alterna‐
1876       tive  algorithm  works, this is necessarily the shortest possible match
1877       at the first possible matching point in the subject string.
1878
1879         PCRE_DFA_RESTART
1880
1881       When pcre_dfa_exec() returns a partial match, it is possible to call it
1882       again,  with  additional  subject characters, and have it continue with
1883       the same match. The PCRE_DFA_RESTART option requests this action;  when
1884       it  is  set,  the workspace and wscount arguments must reference the same
1885       vector as before because data about the match so far is  left  in  them
1886       after a partial match. There is more discussion of this facility in the
1887       pcrepartial documentation.
1888
1889   Successful returns from pcre_dfa_exec()
1890
1891       When pcre_dfa_exec() succeeds, it may have matched more than  one  sub‐
1892       string in the subject. Note, however, that all the matches from one run
1893       of the function start at the same point in  the  subject.  The  shorter
1894       matches  are all initial substrings of the longer matches. For example,
1895       if the pattern
1896
1897         <.*>
1898
1899       is matched against the string
1900
1901         This is <something> <something else> <something further> no more
1902
1903       the three matched strings are
1904
1905         <something>
1906         <something> <something else>
1907         <something> <something else> <something further>
1908
1909       On success, the yield of the function is a number  greater  than  zero,
1910       which  is  the  number of matched substrings. The substrings themselves
1911       are returned in ovector. Each string uses two elements;  the  first  is
1912       the  offset  to  the start, and the second is the offset to the end. In
1913       fact, all the strings have the same start  offset.  (Space  could  have
1914       been  saved by giving this only once, but it was decided to retain some
1915       compatibility with the way pcre_exec() returns data,  even  though  the
1916       meaning of the strings is different.)
1917
1918       The strings are returned in reverse order of length; that is, the long‐
1919       est matching string is given first. If there were too many  matches  to
1920       fit  into ovector, the yield of the function is zero, and the vector is
1921       filled with the longest matches.
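
       As an illustration of how the returned data might be consumed, here is
       a small sketch. The helper names dfa_match_count() and dfa_copy_match()
       are hypothetical and not part of PCRE. The sketch assumes that, when
       the yield is zero, all ovecsize/2 pairs of the vector hold matches.

```c
#include <string.h>

/* Number of (start, end) pairs available in ovector after pcre_dfa_exec()
   returned rc. A zero return means the vector overflowed and is full. */
int dfa_match_count(int rc, int ovecsize)
{
    if (rc > 0)
        return rc;
    if (rc == 0)
        return ovecsize / 2;
    return 0;                       /* negative: error or no match */
}

/* Copy match number `index` (0 is the longest) into buffer as a
   zero-terminated string. Returns the match length, or -1 if the index is
   out of range or the buffer is too small. */
int dfa_copy_match(const char *subject, const int *ovector, int count,
                   int index, char *buffer, int buffersize)
{
    int start, length;

    if (index < 0 || index >= count)
        return -1;
    start = ovector[2 * index];
    length = ovector[2 * index + 1] - start;
    if (length + 1 > buffersize)
        return -1;
    memcpy(buffer, subject + start, (size_t)length);
    buffer[length] = '\0';
    return length;
}
```

       Because the longest match comes first, index count-1 selects the
       shortest one. With the <.*> example above, that is <something>.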
1922
1923   Error returns from pcre_dfa_exec()
1924
1925       The pcre_dfa_exec() function returns a negative number when  it  fails.
1926       Many  of  the  errors  are  the  same as for pcre_exec(), and these are
1927       described above.  There are in addition the following errors  that  are
1928       specific to pcre_dfa_exec():
1929
1930         PCRE_ERROR_DFA_UITEM      (-16)
1931
1932       This  return is given if pcre_dfa_exec() encounters an item in the pat‐
1933       tern that it does not support, for instance, the use of \C  or  a  back
1934       reference.
1935
1936         PCRE_ERROR_DFA_UCOND      (-17)
1937
1938       This  return  is  given  if pcre_dfa_exec() encounters a condition item
1939       that uses a back reference for the condition, or a test  for  recursion
1940       in a specific group. These are not supported.
1941
1942         PCRE_ERROR_DFA_UMLIMIT    (-18)
1943
1944       This  return  is given if pcre_dfa_exec() is called with an extra block
1945       that contains a setting of the match_limit field. This is not supported
1946       (it is meaningless).
1947
1948         PCRE_ERROR_DFA_WSSIZE     (-19)
1949
1950       This  return  is  given  if  pcre_dfa_exec()  runs  out of space in the
1951       workspace vector.
1952
1953         PCRE_ERROR_DFA_RECURSE    (-20)
1954
1955       When a recursive subpattern is processed, the matching  function  calls
1956       itself  recursively,  using  private vectors for ovector and workspace.
1957       This error is given if the output vector  is  not  large  enough.  This
1958       should be extremely rare, as a vector of size 1000 is used.
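
       One possible way of coping with PCRE_ERROR_DFA_WSSIZE is to retry with
       a larger workspace. The sketch below shows only the retry logic and is
       not part of PCRE. The callback type dfa_attempt, the wrapper
       run_with_growing_workspace(), and the stand-in demo_attempt() are all
       hypothetical; in real use the callback would call pcre_dfa_exec() with
       the workspace it is given. The constant repeats the value (-19) listed
       above.

```c
#include <stdlib.h>

#define DFA_WSSIZE_ERROR (-19)   /* value of PCRE_ERROR_DFA_WSSIZE */

/* One matching attempt with the supplied workspace; returns what
   pcre_dfa_exec() would return. */
typedef int (*dfa_attempt)(int *workspace, int wscount, void *ctx);

/* Call `attempt` with a workspace of wscount elements, doubling the size
   after each workspace overflow until doubling would exceed max_wscount. */
int run_with_growing_workspace(dfa_attempt attempt, void *ctx,
                               int wscount, int max_wscount)
{
    for (;;) {
        int rc;
        int *workspace = malloc((size_t)wscount * sizeof *workspace);

        if (workspace == NULL)
            return DFA_WSSIZE_ERROR;
        rc = attempt(workspace, wscount, ctx);
        free(workspace);

        if (rc != DFA_WSSIZE_ERROR || wscount > max_wscount / 2)
            return rc;
        wscount *= 2;
    }
}

/* Stand-in for a real matching call: pretends that the match needs
   *(int *)ctx workspace elements and yields one match when it has them. */
int demo_attempt(int *workspace, int wscount, void *ctx)
{
    int needed = *(const int *)ctx;

    (void)workspace;
    return wscount >= needed ? 1 : DFA_WSSIZE_ERROR;
}
```

       This approach does not suit PCRE_DFA_RESTART, because a restarted match
       relies on the contents of the workspace from the previous call, and
       here each attempt gets a fresh vector.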
1959

SEE ALSO

1961
1962       pcrebuild(3), pcrecallout(3), pcrecpp(3), pcrematching(3), pcrepartial(3),
1963       pcreposix(3), pcreprecompile(3), pcresample(3), pcrestack(3).
1964

AUTHOR

1966
1967       Philip Hazel
1968       University Computing Service
1969       Cambridge CB2 3QH, England.
1970

REVISION

1972
1973       Last updated: 21 June 2010
1974       Copyright (c) 1997-2010 University of Cambridge.
1975
1976
1977
1978                                                                    PCREAPI(3)