PCREAPI(3)                 Library Functions Manual                 PCREAPI(3)

NAME

       PCRE - Perl-compatible regular expressions

PCRE NATIVE API

       #include <pcre.h>

       pcre *pcre_compile(const char *pattern, int options,
            const char **errptr, int *erroffset,
            const unsigned char *tableptr);

       pcre *pcre_compile2(const char *pattern, int options,
            int *errorcodeptr,
            const char **errptr, int *erroffset,
            const unsigned char *tableptr);

       pcre_extra *pcre_study(const pcre *code, int options,
            const char **errptr);

       int pcre_exec(const pcre *code, const pcre_extra *extra,
            const char *subject, int length, int startoffset,
            int options, int *ovector, int ovecsize);

       int pcre_dfa_exec(const pcre *code, const pcre_extra *extra,
            const char *subject, int length, int startoffset,
            int options, int *ovector, int ovecsize,
            int *workspace, int wscount);

       int pcre_copy_named_substring(const pcre *code,
            const char *subject, int *ovector,
            int stringcount, const char *stringname,
            char *buffer, int buffersize);

       int pcre_copy_substring(const char *subject, int *ovector,
            int stringcount, int stringnumber, char *buffer,
            int buffersize);

       int pcre_get_named_substring(const pcre *code,
            const char *subject, int *ovector,
            int stringcount, const char *stringname,
            const char **stringptr);

       int pcre_get_stringnumber(const pcre *code,
            const char *name);

       int pcre_get_stringtable_entries(const pcre *code,
            const char *name, char **first, char **last);

       int pcre_get_substring(const char *subject, int *ovector,
            int stringcount, int stringnumber,
            const char **stringptr);

       int pcre_get_substring_list(const char *subject,
            int *ovector, int stringcount, const char ***listptr);

       void pcre_free_substring(const char *stringptr);

       void pcre_free_substring_list(const char **stringptr);

       const unsigned char *pcre_maketables(void);

       int pcre_fullinfo(const pcre *code, const pcre_extra *extra,
            int what, void *where);

       int pcre_info(const pcre *code, int *optptr, int *firstcharptr);

       int pcre_refcount(pcre *code, int adjust);

       int pcre_config(int what, void *where);

       char *pcre_version(void);

       void *(*pcre_malloc)(size_t);

       void (*pcre_free)(void *);

       void *(*pcre_stack_malloc)(size_t);

       void (*pcre_stack_free)(void *);

       int (*pcre_callout)(pcre_callout_block *);

PCRE API OVERVIEW

       PCRE has its own native API, which is described in this document. There
       are also some wrapper functions that correspond to  the  POSIX  regular
       expression  API.  These  are  described in the pcreposix documentation.
       Both of these APIs define a set of C function calls. A C++  wrapper  is
       distributed with PCRE. It is documented in the pcrecpp page.

       The  native  API  C  function prototypes are defined in the header file
       pcre.h, and on Unix systems the library itself is called  libpcre.   It
       can normally be accessed by adding -lpcre to the command for linking an
       application  that  uses  PCRE.  The  header  file  defines  the  macros
       PCRE_MAJOR  and  PCRE_MINOR to contain the major and minor release num‐
       bers for the library.  Applications can use these  to  include  support
       for different releases of PCRE.

       The   functions   pcre_compile(),  pcre_compile2(),  pcre_study(),  and
       pcre_exec() are used for compiling and matching regular expressions  in
       a  Perl-compatible  manner. A sample program that demonstrates the sim‐
       plest way of using them is provided in the file  called  pcredemo.c  in
       the  source distribution. The pcresample documentation describes how to
       run it.

       A second matching function, pcre_dfa_exec(), which is not Perl-compati‐
       ble,  is  also provided. This uses a different algorithm for the match‐
       ing. The alternative algorithm finds all possible matches (at  a  given
       point  in  the subject), and scans the subject just once. However, this
       algorithm does not return captured substrings. A description of the two
       matching  algorithms and their advantages and disadvantages is given in
       the pcrematching documentation.

       In addition to the main compiling and  matching  functions,  there  are
       convenience functions for extracting captured substrings from a subject
       string that is matched by pcre_exec(). They are:

         pcre_copy_substring()
         pcre_copy_named_substring()
         pcre_get_substring()
         pcre_get_named_substring()
         pcre_get_substring_list()
         pcre_get_stringnumber()
         pcre_get_stringtable_entries()

       pcre_free_substring() and pcre_free_substring_list() are also provided,
       to free the memory used for extracted strings.

       The  function  pcre_maketables()  is  used  to build a set of character
       tables  in  the  current  locale   for   passing   to   pcre_compile(),
       pcre_exec(),  or  pcre_dfa_exec(). This is an optional facility that is
       provided for specialist use.  Most  commonly,  no  special  tables  are
       passed,  in  which case internal tables that are generated when PCRE is
       built are used.

       The function pcre_fullinfo() is used to find out  information  about  a
       compiled  pattern; pcre_info() is an obsolete version that returns only
       some of the available information, but is retained for  backwards  com‐
       patibility.   The function pcre_version() returns a pointer to a string
       containing the version of PCRE and its date of release.

       The function pcre_refcount() maintains a  reference  count  in  a  data
       block  containing  a compiled pattern. This is provided for the benefit
       of object-oriented applications.

       The global variables pcre_malloc and pcre_free  initially  contain  the
       entry  points  of  the  standard malloc() and free() functions, respec‐
       tively. PCRE calls the memory management functions via these variables,
       so  a  calling  program  can replace them if it wishes to intercept the
       calls. This should be done before calling any PCRE functions.

       The global variables pcre_stack_malloc  and  pcre_stack_free  are  also
       indirections  to  memory  management functions. These special functions
       are used only when PCRE is compiled to use  the  heap  for  remembering
       data, instead of recursive function calls, when running the pcre_exec()
       function. See the pcrebuild documentation for  details  of  how  to  do
       this.  It  is  a non-standard way of building PCRE, for use in environ‐
       ments that have limited stacks. Because of the greater  use  of  memory
       management,  it  runs  more  slowly. Separate functions are provided so
       that special-purpose external code can be  used  for  this  case.  When
       used,  these  functions  are always called in a stack-like manner (last
       obtained, first freed), and always for memory blocks of the same  size.
       There  is  a discussion about PCRE's stack usage in the pcrestack docu‐
       mentation.
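
       Because  requests  arrive  last-obtained-first-freed  and always for
       the same size, a replacement allocator can keep freed blocks on a
       simple list and hand the most recent one back. The following is a
       minimal sketch of that idea, not code from PCRE: every name in it
       (frame_alloc, frame_free, frame_cache_drain, frame_reuse_count) is
       hypothetical, and it relies on the same-size guarantee described
       above, so it is not a general-purpose allocator.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical names throughout; none of these are part of PCRE. */
typedef struct free_block {
    struct free_block *next;
} free_block;

static free_block *free_list = NULL;  /* most recently freed block first */
static size_t block_size = 0;         /* size shared by all cached blocks */
static int reuse_count = 0;           /* requests served from the cache */

/* Return every cached block to the system allocator. */
void frame_cache_drain(void)
{
    while (free_list != NULL) {
        free_block *next = free_list->next;
        free(free_list);
        free_list = next;
    }
}

/* Same signature as the function that pcre_stack_malloc points to. */
void *frame_alloc(size_t size)
{
    if (size < sizeof(free_block))
        size = sizeof(free_block);    /* leave room for the list link */
    if (size != block_size) {         /* size changed: discard the cache */
        frame_cache_drain();
        block_size = size;
    }
    if (free_list != NULL) {          /* reuse the last block freed */
        free_block *block = free_list;
        free_list = block->next;
        reuse_count++;
        return block;
    }
    return malloc(size);
}

/* Same signature as the function that pcre_stack_free points to.
   Assumes ptr came from frame_alloc() with the current block size. */
void frame_free(void *ptr)
{
    free_block *block = ptr;
    if (block == NULL)
        return;
    block->next = free_list;
    free_list = block;
}

/* Number of requests that were satisfied without calling malloc(). */
int frame_reuse_count(void)
{
    return reuse_count;
}
```

       In a program linked against a heap-recursion build of PCRE, functions
       like these would be assigned to pcre_stack_malloc and pcre_stack_free
       before any other PCRE call, and the cache drained once matching is
       finished.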

       The global variable pcre_callout initially contains NULL. It can be set
       by  the  caller  to  a "callout" function, which PCRE will then call at
       specified points during a matching operation. Details are given in  the
       pcrecallout documentation.

NEWLINES

       PCRE  supports five different conventions for indicating line breaks in
       strings: a single CR (carriage return) character, a  single  LF  (line‐
       feed) character, the two-character sequence CRLF, any of the three pre‐
       ceding, or any Unicode newline sequence. The Unicode newline  sequences
       are  the  three just mentioned, plus the single characters VT (vertical
       tab, U+000B), FF (formfeed, U+000C), NEL (next line, U+0085), LS  (line
       separator, U+2028), and PS (paragraph separator, U+2029).

       Each  of  the first three conventions is used by at least one operating
       system as its standard newline sequence. When PCRE is built, a  default
       can  be  specified; if none is given, the default is LF, which is the
       Unix standard. When PCRE is run, the default can be overridden, either
       when a pattern is compiled, or when it is matched.

       At compile time, the newline convention can be specified by the options
       argument of pcre_compile(), or it can be specified by special  text  at
       the start of the pattern itself; this overrides any other settings. See
       the pcrepattern page for details of the special character sequences.

       In the PCRE documentation the word "newline" is used to mean "the char‐
       acter  or pair of characters that indicate a line break". The choice of
       newline convention affects the handling of  the  dot,  circumflex,  and
       dollar metacharacters, the handling of #-comments in /x mode, and, when
       CRLF is a recognized line ending sequence, the match position  advance‐
       ment for a non-anchored pattern. There is more detail about this in the
       section on pcre_exec() options below. The choice of newline  convention
       does not affect the interpretation of the \n or \r escape sequences.

MULTITHREADING

       The  PCRE  functions  can be used in multithreaded applications, with
       the  proviso  that  the  memory  management  functions  pointed  to  by
       pcre_malloc, pcre_free, pcre_stack_malloc, and pcre_stack_free, and the
       callout function pointed to by pcre_callout, are shared by all threads.

       The compiled form of a regular expression is not altered during  match‐
       ing, so the same compiled pattern can safely be used by several threads
       at once.

SAVING PRECOMPILED PATTERNS FOR LATER USE

       The compiled form of a regular expression can be saved and re-used at a
       later  time,  possibly by a different program, and even on a host other
       than the one on which  it  was  compiled.  Details  are  given  in  the
       pcreprecompile  documentation.  However, compiling a regular expression
       with one version of PCRE for use with a different version is not  guar‐
       anteed to work and may cause crashes.

CHECKING BUILD-TIME OPTIONS

       int pcre_config(int what, void *where);

       The  function pcre_config() makes it possible for a PCRE client to dis‐
       cover which optional features have been compiled into the PCRE library.
       The  pcrebuild documentation has more details about these optional fea‐
       tures.

       The first argument for pcre_config() is an  integer,  specifying  which
       information is required; the second argument is a pointer to a variable
       into which the information is  placed.  The  following  information  is
       available:

         PCRE_CONFIG_UTF8

       The  output is an integer that is set to one if UTF-8 support is avail‐
       able; otherwise it is set to zero.

         PCRE_CONFIG_UNICODE_PROPERTIES

       The output is an integer that is set to  one  if  support  for  Unicode
       character properties is available; otherwise it is set to zero.

         PCRE_CONFIG_NEWLINE

       The  output  is  an integer whose value specifies the default character
       sequence that is recognized as meaning "newline". The five values  that
       are supported are: 10 for LF, 13 for CR, 3338 for CRLF, -2 for ANYCRLF,
       and -1 for ANY. The default should normally be  the  standard  sequence
       for your operating system.

         PCRE_CONFIG_LINK_SIZE

       The  output  is  an  integer that contains the number of bytes used for
       internal linkage in compiled regular expressions. The value is 2, 3, or
       4.  Larger  values  allow larger regular expressions to be compiled, at
       the expense of slower matching. The default value of  2  is  sufficient
       for  all  but  the  most massive patterns, since it allows the compiled
       pattern to be up to 64K in size.

         PCRE_CONFIG_POSIX_MALLOC_THRESHOLD

       The output is an integer that contains the threshold  above  which  the
       POSIX  interface  uses malloc() for output vectors. Further details are
       given in the pcreposix documentation.

         PCRE_CONFIG_MATCH_LIMIT

       The output is an integer that gives the default limit for the number of
       internal  matching  function  calls in a pcre_exec() execution. Further
       details are given with pcre_exec() below.

         PCRE_CONFIG_MATCH_LIMIT_RECURSION

       The output is an integer that gives the default limit for the depth  of
       recursion  when calling the internal matching function in a pcre_exec()
       execution. Further details are given with pcre_exec() below.

         PCRE_CONFIG_STACKRECURSE

       The output is an integer that is set to one if internal recursion  when
       running pcre_exec() is implemented by recursive function calls that use
       the stack to remember their state. This is the usual way that  PCRE  is
       compiled. The output is zero if PCRE was compiled to use blocks of data
       on the  heap  instead  of  recursive  function  calls.  In  this  case,
       pcre_stack_malloc  and  pcre_stack_free  are  called  to  manage memory
       blocks on the heap, thus avoiding the use of the stack.
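
       The integer written for PCRE_CONFIG_NEWLINE is easier to work with
       once it is mapped to a name. The sketch below does only that
       mapping, using the five values listed above; the type and function
       names (newline_kind, classify_newline, newline_kind_name) are
       hypothetical and not part of PCRE. Note that 3338 is numerically
       equal to (13 << 8) | 10, that is, the CR code followed by the LF
       code.

```c
/* Hypothetical names; only the numeric values come from the text above. */
typedef enum {
    NEWLINE_KIND_LF,
    NEWLINE_KIND_CR,
    NEWLINE_KIND_CRLF,
    NEWLINE_KIND_ANYCRLF,
    NEWLINE_KIND_ANY,
    NEWLINE_KIND_UNKNOWN
} newline_kind;

/* Map the integer reported for PCRE_CONFIG_NEWLINE to a convention. */
newline_kind classify_newline(int value)
{
    switch (value) {
    case 10:   return NEWLINE_KIND_LF;
    case 13:   return NEWLINE_KIND_CR;
    case 3338: return NEWLINE_KIND_CRLF;     /* (13 << 8) | 10 */
    case -2:   return NEWLINE_KIND_ANYCRLF;
    case -1:   return NEWLINE_KIND_ANY;
    default:   return NEWLINE_KIND_UNKNOWN;  /* not one of the five */
    }
}

/* Printable name for a convention, for use in diagnostics. */
const char *newline_kind_name(newline_kind kind)
{
    switch (kind) {
    case NEWLINE_KIND_LF:      return "LF";
    case NEWLINE_KIND_CR:      return "CR";
    case NEWLINE_KIND_CRLF:    return "CRLF";
    case NEWLINE_KIND_ANYCRLF: return "ANYCRLF";
    case NEWLINE_KIND_ANY:     return "ANY";
    default:                   return "unknown";
    }
}
```

       A caller would typically obtain the integer with
       pcre_config(PCRE_CONFIG_NEWLINE, &value) and pass value to
       classify_newline().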

COMPILING A PATTERN

       pcre *pcre_compile(const char *pattern, int options,
            const char **errptr, int *erroffset,
            const unsigned char *tableptr);

       pcre *pcre_compile2(const char *pattern, int options,
            int *errorcodeptr,
            const char **errptr, int *erroffset,
            const unsigned char *tableptr);

       Either of the functions pcre_compile() or pcre_compile2() can be called
       to compile a pattern into an internal form. The only difference between
       the two interfaces is that pcre_compile2() has an additional  argument,
       errorcodeptr, via which a numerical error code can be returned.

       The pattern is a C string terminated by a binary zero, and is passed in
       the pattern argument. A pointer to a single block  of  memory  that  is
       obtained  via  pcre_malloc is returned. This contains the compiled code
       and related data. The pcre type is defined for the returned block; this
       is a typedef for a structure whose contents are not externally defined.
       It is up to the caller to free the memory (via pcre_free) when it is no
       longer required.

       Although  the compiled code of a PCRE regex is relocatable, that is, it
       does not depend on memory location, the complete pcre data block is not
       fully  relocatable, because it may contain a copy of the tableptr argu‐
       ment, which is an address (see below).

       The options argument contains various bit settings that affect the com‐
       pilation.  It  should be zero if no options are required. The available
       options are described below. Some of them, in  particular,  those  that
       are  compatible  with  Perl,  can also be set and unset from within the
       pattern (see the detailed description  in  the  pcrepattern  documenta‐
       tion).  For  these options, the contents of the options argument speci‐
       fies their initial settings at the start of compilation and  execution.
       The  PCRE_ANCHORED  and PCRE_NEWLINE_xxx options can be set at the time
       of matching as well as at compile time.

       If errptr is NULL, pcre_compile() returns NULL immediately.  Otherwise,
       if  compilation  of  a  pattern fails, pcre_compile() returns NULL, and
       sets the variable pointed to by errptr to point to a textual error mes‐
       sage. This is a static string that is part of the library. You must not
       try to free it. The offset from the start of the pattern to the charac‐
       ter where the error was discovered is placed in the variable pointed to
       by erroffset, which must not be NULL. If it is, an immediate  error  is
       given.

       If  pcre_compile2()  is  used instead of pcre_compile(), and the error‐
       codeptr argument is not NULL, a non-zero error code number is  returned
       via  this argument in the event of an error. This is in addition to the
       textual error message. Error codes and messages are listed below.

       If the final argument, tableptr, is NULL, PCRE uses a  default  set  of
       character  tables  that  are  built  when  PCRE  is compiled, using the
       default C locale. Otherwise, tableptr must be an address  that  is  the
       result  of  a  call to pcre_maketables(). This value is stored with the
       compiled pattern, and used again by pcre_exec(), unless  another  table
       pointer is passed to it. For more discussion, see the section on locale
       support below.

       This code fragment shows a typical straightforward  call  to  pcre_com‐
       pile():

         pcre *re;
         const char *error;
         int erroffset;
         re = pcre_compile(
           "^A.*Z",          /* the pattern */
           0,                /* default options */
           &error,           /* for error message */
           &erroffset,       /* for error offset */
           NULL);            /* use default character tables */
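
       When the call fails, the message and offset are usually shown to
       whoever wrote the pattern. The sketch below formats them with a
       caret under the offending position. It is an illustration only:
       format_compile_error is a hypothetical helper, not a PCRE function,
       and the caret lines up only for patterns made of single-column,
       single-byte characters.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of PCRE. Writes three lines into buf:
   "<message> at offset <n>", the pattern, and a caret under position n.
   Returns what snprintf() returns. */
int format_compile_error(char *buf, size_t bufsize, const char *pattern,
                         const char *message, int erroffset)
{
    int len = (int)strlen(pattern);

    /* Clamp so the caret never lands outside the pattern text. */
    if (erroffset < 0)
        erroffset = 0;
    if (erroffset > len)
        erroffset = len;

    /* "%*s" with an empty string prints erroffset spaces. */
    return snprintf(buf, bufsize, "%s at offset %d\n%s\n%*s^",
                    message, erroffset, pattern, erroffset, "");
}
```

       After a failed pcre_compile(), the error and erroffset variables from
       the fragment above would be passed as the message and erroffset
       arguments.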

       The  following  names  for option bits are defined in the pcre.h header
       file:

         PCRE_ANCHORED

       If this bit is set, the pattern is forced to be "anchored", that is, it
       is  constrained to match only at the first matching point in the string
       that is being searched (the "subject string"). This effect can also  be
       achieved  by appropriate constructs in the pattern itself, which is the
       only way to do it in Perl.

         PCRE_AUTO_CALLOUT

       If this bit is set, pcre_compile() automatically inserts callout items,
       all  with  number  255, before each pattern item. For discussion of the
       callout facility, see the pcrecallout documentation.

         PCRE_CASELESS

       If this bit is set, letters in the pattern match both upper  and  lower
       case  letters.  It  is  equivalent  to  Perl's /i option, and it can be
       changed within a pattern by a (?i) option setting. In UTF-8 mode,  PCRE
       always  understands the concept of case for characters whose values are
       less than 128, so caseless matching is always possible. For  characters
       with  higher  values,  the concept of case is supported if PCRE is com‐
       piled with Unicode property support, but not otherwise. If you want  to
       use  caseless  matching  for  characters 128 and above, you must ensure
       that PCRE is compiled with Unicode property support  as  well  as  with
       UTF-8 support.

         PCRE_DOLLAR_ENDONLY

       If  this bit is set, a dollar metacharacter in the pattern matches only
       at the end of the subject string. Without this option,  a  dollar  also
       matches  immediately before a newline at the end of the string (but not
       before any other newlines). The PCRE_DOLLAR_ENDONLY option  is  ignored
       if  PCRE_MULTILINE  is  set.   There is no equivalent to this option in
       Perl, and no way to set it within a pattern.

         PCRE_DOTALL

       If this bit is set, a dot metacharacter in the pattern matches all
       characters, including those that indicate newline. Without it, a dot
       does not match when the current position is at a newline. This option
       is equivalent to Perl's /s option, and it can be changed within a
       pattern by a (?s) option setting. A negative class such as [^a] always
       matches newline characters, independent of the setting of this option.

         PCRE_DUPNAMES

       If  this  bit is set, names used to identify capturing subpatterns need
       not be unique. This can be helpful for certain types of pattern when it
       is  known  that  only  one instance of the named subpattern can ever be
       matched. There are more details of named subpatterns  below;  see  also
       the pcrepattern documentation.

         PCRE_EXTENDED

       If  this  bit  is  set,  whitespace  data characters in the pattern are
       totally ignored except when escaped or inside a character class. White‐
       space does not include the VT character (code 11). In addition, charac‐
       ters between an unescaped # outside a character class and the next new‐
       line,  inclusive,  are  also  ignored.  This is equivalent to Perl's /x
       option, and it can be changed within a pattern by a  (?x)  option  set‐
       ting.

       This  option  makes  it possible to include comments inside complicated
       patterns.  Note, however, that this applies only  to  data  characters.
       Whitespace   characters  may  never  appear  within  special  character
       sequences in a pattern, for  example  within  the  sequence  (?(  which
       introduces a conditional subpattern.

         PCRE_EXTRA

       This  option  was invented in order to turn on additional functionality
       of PCRE that is incompatible with Perl, but it  is  currently  of  very
       little  use. When set, any backslash in a pattern that is followed by a
       letter that has no special meaning  causes  an  error,  thus  reserving
       these  combinations  for  future  expansion.  By default, as in Perl, a
       backslash followed by a letter with no special meaning is treated as  a
       literal.  (Perl can, however, be persuaded to give a warning for this.)
       There are at present no other features controlled by  this  option.  It
       can also be set by a (?X) option setting within a pattern.

         PCRE_FIRSTLINE

       If  this  option  is  set,  an  unanchored pattern is required to match
       before or at the first  newline  in  the  subject  string,  though  the
       matched text may continue over the newline.

         PCRE_MULTILINE

       By  default,  PCRE  treats the subject string as consisting of a single
       line of characters (even if it actually contains newlines). The  "start
       of  line"  metacharacter  (^)  matches only at the start of the string,
       while the "end of line" metacharacter ($) matches only at  the  end  of
       the string, or before a terminating newline (unless PCRE_DOLLAR_ENDONLY
       is set). This is the same as Perl.

       When PCRE_MULTILINE is set, the "start of line" and "end of line"
       constructs  match  immediately following or immediately before internal
       newlines in the subject string, respectively, as well as  at  the  very
       start  and  end.  This is equivalent to Perl's /m option, and it can be
       changed within a pattern by a (?m) option setting. If there are no new‐
       lines  in  a  subject string, or no occurrences of ^ or $ in a pattern,
       setting PCRE_MULTILINE has no effect.

         PCRE_NEWLINE_CR
         PCRE_NEWLINE_LF
         PCRE_NEWLINE_CRLF
         PCRE_NEWLINE_ANYCRLF
         PCRE_NEWLINE_ANY

       These options override the default newline definition that  was  chosen
       when  PCRE  was built. Setting the first or the second specifies that a
       newline is indicated by a single character (CR  or  LF,  respectively).
       Setting  PCRE_NEWLINE_CRLF specifies that a newline is indicated by the
       two-character CRLF  sequence.  Setting  PCRE_NEWLINE_ANYCRLF  specifies
       that any of the three preceding sequences should be recognized. Setting
       PCRE_NEWLINE_ANY specifies that any Unicode newline sequence should  be
       recognized. The Unicode newline sequences are the three just mentioned,
       plus the single characters VT (vertical  tab,  U+000B),  FF  (formfeed,
       U+000C),  NEL  (next line, U+0085), LS (line separator, U+2028), and PS
       (paragraph separator, U+2029). The last  two  are  recognized  only  in
       UTF-8 mode.

       The  newline  setting  in  the  options  word  uses three bits that are
       treated as a number, giving eight possibilities. Currently only six are
       used  (default  plus the five values above). This means that if you set
       more than one newline option, the combination may or may not be  sensi‐
       ble. For example, PCRE_NEWLINE_CR with PCRE_NEWLINE_LF is equivalent to
       PCRE_NEWLINE_CRLF, but other combinations may yield unused numbers  and
       cause an error.
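
       The three-bit field can be examined directly, which shows why some
       combinations are harmless and others are not. The sketch below is
       an illustration under assumptions: the NL_OPT_xxx constants are
       local stand-ins whose values are believed to match the
       PCRE_NEWLINE_xxx definitions in pcre.h for this series of releases,
       and should be checked against the installed header. The two
       functions are hypothetical helpers, not part of PCRE.

```c
/* Stand-ins for the PCRE_NEWLINE_xxx macros. The values are an assumption
   about pcre.h; verify them against your own copy of the header. */
#define NL_OPT_CR      0x00100000
#define NL_OPT_LF      0x00200000
#define NL_OPT_CRLF    0x00300000
#define NL_OPT_ANY     0x00400000
#define NL_OPT_ANYCRLF 0x00500000
#define NL_OPT_MASK    0x00700000   /* the three newline bits */

/* Extract the newline field as a number from 0 to 7. */
int newline_field(int options)
{
    return (options & NL_OPT_MASK) >> 20;
}

/* 0 means "use the default" and 1 to 5 are the named conventions;
   6 and 7 are the unused possibilities mentioned in the text. */
int newline_field_is_valid(int options)
{
    return newline_field(options) <= 5;
}
```

       Under these assumed values, NL_OPT_CR | NL_OPT_LF produces the same
       field as NL_OPT_CRLF, whereas NL_OPT_LF | NL_OPT_ANY produces 6,
       one of the unused numbers.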

       The  only time that a line break is specially recognized when compiling
       a pattern is if PCRE_EXTENDED is set, and  an  unescaped  #  outside  a
       character  class  is  encountered.  This indicates a comment that lasts
       until after the next line break sequence. In other circumstances,  line
       break   sequences   are   treated  as  literal  data,  except  that  in
       PCRE_EXTENDED mode, both CR and LF are treated as whitespace characters
       and are therefore ignored.

       The newline option that is set at compile time becomes the default that
       is used for pcre_exec() and pcre_dfa_exec(), but it can be overridden.

         PCRE_NO_AUTO_CAPTURE

       If this option is set, it disables the use of numbered capturing paren‐
       theses  in the pattern. Any opening parenthesis that is not followed by
       ? behaves as if it were followed by ?: but named parentheses can  still
       be  used  for  capturing  (and  they acquire numbers in the usual way).
       There is no equivalent of this option in Perl.

         PCRE_UNGREEDY

       This option inverts the "greediness" of the quantifiers  so  that  they
       are  not greedy by default, but become greedy if followed by "?". It is
       not compatible with Perl. It can also be set by a (?U)  option  setting
       within the pattern.

         PCRE_UTF8

       This  option  causes PCRE to regard both the pattern and the subject as
       strings of UTF-8 characters instead of single-byte  character  strings.
       However,  it is available only when PCRE is built to include UTF-8 sup‐
       port. If not, the use of this option provokes an error. Details of  how
       this  option  changes the behaviour of PCRE are given in the section on
       UTF-8 support in the main pcre page.

         PCRE_NO_UTF8_CHECK

       When PCRE_UTF8 is set, the validity of the pattern as a UTF-8 string is
       automatically  checked.  There  is  a  discussion about the validity of
       UTF-8 strings in the main pcre page. If an invalid  UTF-8  sequence  of
       bytes  is  found,  pcre_compile() returns an error. If you already know
       that your pattern is valid, and you want to skip this check for perfor‐
       mance  reasons,  you  can set the PCRE_NO_UTF8_CHECK option. When it is
       set, the effect of passing an invalid UTF-8  string  as  a  pattern  is
       undefined.  It  may  cause your program to crash. Note that this option
       can also be passed to pcre_exec() and pcre_dfa_exec(), to suppress  the
       UTF-8 validity checking of subject strings.

COMPILATION ERROR CODES

551
552       The  following  table  lists  the  error  codes than may be returned by
553       pcre_compile2(), along with the error messages that may be returned  by
554       both  compiling functions. As PCRE has developed, some error codes have
555       fallen out of use. To avoid confusion, they have not been re-used.
556
557          0  no error
558          1  \ at end of pattern
559          2  \c at end of pattern
560          3  unrecognized character follows \
561          4  numbers out of order in {} quantifier
562          5  number too big in {} quantifier
563          6  missing terminating ] for character class
564          7  invalid escape sequence in character class
565          8  range out of order in character class
566          9  nothing to repeat
567         10  [this code is not in use]
568         11  internal error: unexpected repeat
569         12  unrecognized character after (?
570         13  POSIX named classes are supported only within a class
571         14  missing )
572         15  reference to non-existent subpattern
573         16  erroffset passed as NULL
574         17  unknown option bit(s) set
575         18  missing ) after comment
576         19  [this code is not in use]
577         20  regular expression too large
578         21  failed to get memory
579         22  unmatched parentheses
580         23  internal error: code overflow
581         24  unrecognized character after (?<
582         25  lookbehind assertion is not fixed length
583         26  malformed number or name after (?(
584         27  conditional group contains more than two branches
585         28  assertion expected after (?(
586         29  (?R or (?[+-]digits must be followed by )
587         30  unknown POSIX class name
588         31  POSIX collating elements are not supported
589         32  this version of PCRE is not compiled with PCRE_UTF8 support
590         33  [this code is not in use]
591         34  character value in \x{...} sequence is too large
592         35  invalid condition (?(0)
593         36  \C not allowed in lookbehind assertion
594         37  PCRE does not support \L, \l, \N, \U, or \u
595         38  number after (?C is > 255
596         39  closing ) for (?C expected
597         40  recursive call could loop indefinitely
598         41  unrecognized character after (?P
599         42  syntax error in subpattern name (missing terminator)
600         43  two named subpatterns have the same name
601         44  invalid UTF-8 string
602         45  support for \P, \p, and \X has not been compiled
603         46  malformed \P or \p sequence
604         47  unknown property name after \P or \p
605         48  subpattern name is too long (maximum 32 characters)
606         49  too many named subpatterns (maximum 10,000)
607         50  [this code is not in use]
608         51  octal value is greater than \377 (not in UTF-8 mode)
609         52  internal error: overran compiling workspace
         53  internal error: previously-checked referenced subpattern not
               found
612         54  DEFINE group contains more than one branch
613         55  repeating a DEFINE group is not allowed
         56  inconsistent NEWLINE options
615         57  \g is not followed by a braced name or an optionally braced
616               non-zero number
617         58  (?+ or (?- or (?(+ or (?(- must be followed by a non-zero number
618

STUDYING A PATTERN

620
       pcre_extra *pcre_study(const pcre *code, int options,
            const char **errptr);
623
624       If  a  compiled  pattern is going to be used several times, it is worth
625       spending more time analyzing it in order to speed up the time taken for
626       matching.  The function pcre_study() takes a pointer to a compiled pat‐
627       tern as its first argument. If studying the pattern produces additional
628       information  that  will  help speed up matching, pcre_study() returns a
629       pointer to a pcre_extra block, in which the study_data field points  to
630       the results of the study.
631
632       The  returned  value  from  pcre_study()  can  be  passed  directly  to
633       pcre_exec(). However, a pcre_extra block  also  contains  other  fields
634       that  can  be  set  by the caller before the block is passed; these are
635       described below in the section on matching a pattern.
636
       If studying the pattern does not produce any additional information,
       pcre_study() returns NULL. In that circumstance, if the calling program
       wants to pass any of the other fields to pcre_exec(), it must set up
       its own pcre_extra block.
641
642       The  second  argument of pcre_study() contains option bits. At present,
643       no options are defined, and this argument should always be zero.
644
645       The third argument for pcre_study() is a pointer for an error  message.
646       If  studying  succeeds  (even  if no data is returned), the variable it
647       points to is set to NULL. Otherwise it is set to  point  to  a  textual
648       error message. This is a static string that is part of the library. You
649       must not try to free it. You should test the  error  pointer  for  NULL
650       after calling pcre_study(), to be sure that it has run successfully.
651
652       This is a typical call to pcre_study():
653
         const char *error;   /* receives NULL or an error message */
         pcre_extra *pe;
         pe = pcre_study(
           re,             /* result of pcre_compile() */
           0,              /* no options exist */
           &error);        /* set to NULL or points to a message */
659
660       At present, studying a pattern is useful only for non-anchored patterns
661       that do not have a single fixed starting character. A bitmap of  possi‐
662       ble starting bytes is created.
663

LOCALE SUPPORT

665
666       PCRE  handles  caseless matching, and determines whether characters are
667       letters, digits, or whatever, by reference to a set of tables,  indexed
668       by  character  value.  When running in UTF-8 mode, this applies only to
669       characters with codes less than 128. Higher-valued  codes  never  match
670       escapes  such  as  \w or \d, but can be tested with \p if PCRE is built
       with Unicode character property support. The use of locales with
       Unicode is discouraged. If you are handling characters with codes
       greater than 127, you should either use UTF-8 and Unicode, or use
       locales, but not try to mix the two.
675
676       PCRE  contains  an  internal set of tables that are used when the final
677       argument of pcre_compile() is  NULL.  These  are  sufficient  for  many
678       applications.  Normally, the internal tables recognize only ASCII char‐
679       acters. However, when PCRE is built, it is possible to cause the inter‐
680       nal tables to be rebuilt in the default "C" locale of the local system,
681       which may cause them to be different.
682
683       The internal tables can always be overridden by tables supplied by  the
684       application that calls PCRE. These may be created in a different locale
685       from the default. As more and more applications change  to  using  Uni‐
686       code, the need for this locale support is expected to die away.
687
688       External  tables  are  built by calling the pcre_maketables() function,
689       which has no arguments, in the relevant locale. The result can then  be
690       passed  to  pcre_compile()  or  pcre_exec()  as often as necessary. For
       example, to build and use tables that are appropriate for the French
       locale (where accented characters with values greater than 127 are
       treated as letters), the following code could be used:
694
695         setlocale(LC_CTYPE, "fr_FR");
696         tables = pcre_maketables();
697         re = pcre_compile(..., tables);
698
699       The locale name "fr_FR" is used on Linux and other  Unix-like  systems;
700       if you are using Windows, the name for the French locale is "french".
701
702       When  pcre_maketables()  runs,  the  tables are built in memory that is
703       obtained via pcre_malloc. It is the caller's responsibility  to  ensure
704       that  the memory containing the tables remains available for as long as
705       it is needed.
706
707       The pointer that is passed to pcre_compile() is saved with the compiled
708       pattern,  and the same tables are used via this pointer by pcre_study()
709       and normally also by pcre_exec(). Thus, by default, for any single pat‐
710       tern, compilation, studying and matching all happen in the same locale,
711       but different patterns can be compiled in different locales.
712
713       It is possible to pass a table pointer or NULL (indicating the  use  of
714       the  internal  tables)  to  pcre_exec(). Although not intended for this
715       purpose, this facility could be used to match a pattern in a  different
716       locale from the one in which it was compiled. Passing table pointers at
717       run time is discussed below in the section on matching a pattern.
718

INFORMATION ABOUT A PATTERN

720
721       int pcre_fullinfo(const pcre *code, const pcre_extra *extra,
722            int what, void *where);
723
       The pcre_fullinfo() function returns information about a compiled
       pattern. It replaces the obsolete pcre_info() function, which is
       nevertheless retained for backwards compatibility (and is documented
       below).
727
728       The first argument for pcre_fullinfo() is a  pointer  to  the  compiled
729       pattern.  The second argument is the result of pcre_study(), or NULL if
730       the pattern was not studied. The third argument specifies  which  piece
731       of  information  is required, and the fourth argument is a pointer to a
732       variable to receive the data. The yield of the  function  is  zero  for
733       success, or one of the following negative numbers:
734
735         PCRE_ERROR_NULL       the argument code was NULL
736                               the argument where was NULL
737         PCRE_ERROR_BADMAGIC   the "magic number" was not found
738         PCRE_ERROR_BADOPTION  the value of what was invalid
739
       The "magic number" is placed at the start of each compiled pattern as
       a simple check against passing an arbitrary memory pointer. Here is a
       typical call of pcre_fullinfo(), to obtain the length of the compiled
       pattern:
744
745         int rc;
746         size_t length;
747         rc = pcre_fullinfo(
748           re,               /* result of pcre_compile() */
749           pe,               /* result of pcre_study(), or NULL */
750           PCRE_INFO_SIZE,   /* what is required */
751           &length);         /* where to put the data */
752
753       The possible values for the third argument are defined in  pcre.h,  and
754       are as follows:
755
756         PCRE_INFO_BACKREFMAX
757
758       Return  the  number  of  the highest back reference in the pattern. The
759       fourth argument should point to an int variable. Zero  is  returned  if
760       there are no back references.
761
762         PCRE_INFO_CAPTURECOUNT
763
764       Return  the  number of capturing subpatterns in the pattern. The fourth
765       argument should point to an int variable.
766
767         PCRE_INFO_DEFAULT_TABLES
768
769       Return a pointer to the internal default character tables within  PCRE.
770       The  fourth  argument should point to an unsigned char * variable. This
771       information call is provided for internal use by the pcre_study() func‐
772       tion.  External  callers  can  cause PCRE to use its internal tables by
773       passing a NULL table pointer.
774
775         PCRE_INFO_FIRSTBYTE
776
777       Return information about the first byte of any matched  string,  for  a
778       non-anchored  pattern. The fourth argument should point to an int vari‐
779       able. (This option used to be called PCRE_INFO_FIRSTCHAR; the old  name
780       is still recognized for backwards compatibility.)
781
782       If  there  is  a  fixed first byte, for example, from a pattern such as
783       (cat|cow|coyote), its value is returned. Otherwise, if either
784
785       (a) the pattern was compiled with the PCRE_MULTILINE option, and  every
786       branch starts with "^", or
787
788       (b) every branch of the pattern starts with ".*" and PCRE_DOTALL is not
789       set (if it were set, the pattern would be anchored),
790
791       -1 is returned, indicating that the pattern matches only at  the  start
792       of  a  subject string or after any newline within the string. Otherwise
793       -2 is returned. For anchored patterns, -2 is returned.
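
       The three kinds of result can be told apart with a small helper. The
       sketch below is illustrative only: classify_firstbyte() and the FB_
       names are hypothetical and not part of PCRE. The argument is the int
       that pcre_fullinfo() stores for PCRE_INFO_FIRSTBYTE.

```c
/* Hypothetical helper, not part of PCRE: interpret the int stored by
 * pcre_fullinfo() for PCRE_INFO_FIRSTBYTE. */
enum firstbyte_kind {
    FB_FIXED_BYTE,   /* value is the byte every match starts with */
    FB_LINE_START,   /* -1: matches only at subject start or after a newline */
    FB_NONE          /* -2: anchored pattern, or no information recorded */
};

enum firstbyte_kind classify_firstbyte(int value)
{
    if (value >= 0)
        return FB_FIXED_BYTE;
    if (value == -1)
        return FB_LINE_START;
    return FB_NONE;
}
```

       For the pattern (cat|cow|coyote) the stored value would be the byte
       "c", which this helper reports as FB_FIXED_BYTE.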
794
795         PCRE_INFO_FIRSTTABLE
796
797       If the pattern was studied, and this resulted in the construction of  a
798       256-bit table indicating a fixed set of bytes for the first byte in any
799       matching string, a pointer to the table is returned. Otherwise NULL  is
800       returned.  The fourth argument should point to an unsigned char * vari‐
801       able.
802
803         PCRE_INFO_HASCRORLF
804
805       Return 1 if the pattern contains any explicit  matches  for  CR  or  LF
806       characters,  otherwise  0.  The  fourth argument should point to an int
807       variable.
808
809         PCRE_INFO_JCHANGED
810
811       Return 1 if the (?J) option setting is used in the  pattern,  otherwise
812       0. The fourth argument should point to an int variable. The (?J) inter‐
813       nal option setting changes the local PCRE_DUPNAMES option.
814
815         PCRE_INFO_LASTLITERAL
816
817       Return the value of the rightmost literal byte that must exist  in  any
818       matched  string,  other  than  at  its  start,  if such a byte has been
819       recorded. The fourth argument should point to an int variable. If there
820       is  no such byte, -1 is returned. For anchored patterns, a last literal
821       byte is recorded only if it follows something of variable  length.  For
822       example, for the pattern /^a\d+z\d+/ the returned value is "z", but for
823       /^a\dz\d/ the returned value is -1.
824
825         PCRE_INFO_NAMECOUNT
826         PCRE_INFO_NAMEENTRYSIZE
827         PCRE_INFO_NAMETABLE
828
829       PCRE supports the use of named as well as numbered capturing  parenthe‐
830       ses.  The names are just an additional way of identifying the parenthe‐
831       ses, which still acquire numbers. Several convenience functions such as
832       pcre_get_named_substring()  are  provided  for extracting captured sub‐
833       strings by name. It is also possible to extract the data  directly,  by
834       first  converting  the  name to a number in order to access the correct
835       pointers in the output vector (described with pcre_exec() below). To do
836       the  conversion,  you  need  to  use  the  name-to-number map, which is
837       described by these three values.
838
839       The map consists of a number of fixed-size entries. PCRE_INFO_NAMECOUNT
840       gives the number of entries, and PCRE_INFO_NAMEENTRYSIZE gives the size
841       of each entry; both of these  return  an  int  value.  The  entry  size
842       depends  on the length of the longest name. PCRE_INFO_NAMETABLE returns
843       a pointer to the first entry of the table  (a  pointer  to  char).  The
844       first two bytes of each entry are the number of the capturing parenthe‐
845       sis, most significant byte first. The rest of the entry is  the  corre‐
846       sponding  name,  zero  terminated. The names are in alphabetical order.
847       When PCRE_DUPNAMES is set, duplicate names are in order of their paren‐
848       theses  numbers.  For  example,  consider the following pattern (assume
849       PCRE_EXTENDED is  set,  so  white  space  -  including  newlines  -  is
850       ignored):
851
852         (?<date> (?<year>(\d\d)?\d\d) -
853         (?<month>\d\d) - (?<day>\d\d) )
854
       There are four named subpatterns, so the table has four entries, and
       each entry in the table is eight bytes long. The table is as follows,
       with non-printing bytes shown in hexadecimal, and undefined bytes shown
       as ??:
859
860         00 01 d  a  t  e  00 ??
861         00 05 d  a  y  00 ?? ??
862         00 04 m  o  n  t  h  00
863         00 02 y  e  a  r  00 ??
864
865       When writing code to extract data  from  named  subpatterns  using  the
866       name-to-number  map,  remember that the length of the entries is likely
867       to be different for each compiled pattern.
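
       A minimal sketch of walking the name-to-number map by hand follows.
       The function lookup_group_number() and the array example_table are
       hypothetical names used only for illustration; in a real program the
       table pointer, entry count, and entry size would come from
       pcre_fullinfo() with PCRE_INFO_NAMETABLE, PCRE_INFO_NAMECOUNT, and
       PCRE_INFO_NAMEENTRYSIZE. The table is treated as unsigned bytes here
       so that the two-byte number is assembled without sign problems.

```c
#include <string.h>

/* Hypothetical helper, not part of PCRE: return the parenthesis number
 * recorded for `name`, or -1 if the name is not in the table. */
int lookup_group_number(const unsigned char *table, int count,
                        int entry_size, const char *name)
{
    int i;
    for (i = 0; i < count; i++) {
        const unsigned char *entry = table + i * entry_size;
        /* Bytes 0-1 hold the number, most significant byte first;
         * the zero-terminated name starts at byte 2. */
        if (strcmp((const char *)(entry + 2), name) == 0)
            return (entry[0] << 8) | entry[1];
    }
    return -1;
}

/* The four-entry table from the example above, eight bytes per entry.
 * The bytes shown as ?? are undefined; they are zero here. */
static const unsigned char example_table[4 * 8] = {
    0, 1, 'd', 'a', 't', 'e', 0,   0,
    0, 5, 'd', 'a', 'y', 0,   0,   0,
    0, 4, 'm', 'o', 'n', 't', 'h', 0,
    0, 2, 'y', 'e', 'a', 'r', 0,   0
};
```

       A linear scan is used for simplicity. Because the names are stored in
       alphabetical order, a binary search would also work. With duplicate
       names, this scan returns the first entry, that is, the one with the
       lowest parenthesis number.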
868
869         PCRE_INFO_OKPARTIAL
870
871       Return 1 if the pattern can be used for partial matching, otherwise  0.
872       The  fourth  argument  should point to an int variable. The pcrepartial
873       documentation lists the restrictions that apply to patterns  when  par‐
874       tial matching is used.
875
876         PCRE_INFO_OPTIONS
877
878       Return  a  copy of the options with which the pattern was compiled. The
879       fourth argument should point to an unsigned long  int  variable.  These
880       option bits are those specified in the call to pcre_compile(), modified
881       by any top-level option settings at the start of the pattern itself. In
882       other  words,  they are the options that will be in force when matching
883       starts. For example, if the pattern /(?im)abc(?-i)d/ is  compiled  with
884       the  PCRE_EXTENDED option, the result is PCRE_CASELESS, PCRE_MULTILINE,
885       and PCRE_EXTENDED.
886
887       A pattern is automatically anchored by PCRE if  all  of  its  top-level
888       alternatives begin with one of the following:
889
890         ^     unless PCRE_MULTILINE is set
891         \A    always
892         \G    always
893         .*    if PCRE_DOTALL is set and there are no back
894                 references to the subpattern in which .* appears
895
896       For such patterns, the PCRE_ANCHORED bit is set in the options returned
897       by pcre_fullinfo().
898
899         PCRE_INFO_SIZE
900
901       Return the size of the compiled pattern, that is, the  value  that  was
902       passed as the argument to pcre_malloc() when PCRE was getting memory in
903       which to place the compiled data. The fourth argument should point to a
904       size_t variable.
905
906         PCRE_INFO_STUDYSIZE
907
908       Return the size of the data block pointed to by the study_data field in
909       a pcre_extra block. That is,  it  is  the  value  that  was  passed  to
910       pcre_malloc() when PCRE was getting memory into which to place the data
911       created by pcre_study(). The fourth argument should point to  a  size_t
912       variable.
913

OBSOLETE INFO FUNCTION

915
916       int pcre_info(const pcre *code, int *optptr, int *firstcharptr);
917
918       The  pcre_info()  function is now obsolete because its interface is too
919       restrictive to return all the available data about a compiled  pattern.
920       New   programs   should  use  pcre_fullinfo()  instead.  The  yield  of
921       pcre_info() is the number of capturing subpatterns, or one of the  fol‐
922       lowing negative numbers:
923
924         PCRE_ERROR_NULL       the argument code was NULL
925         PCRE_ERROR_BADMAGIC   the "magic number" was not found
926
927       If  the  optptr  argument is not NULL, a copy of the options with which
928       the pattern was compiled is placed in the integer  it  points  to  (see
929       PCRE_INFO_OPTIONS above).
930
931       If  the  pattern  is  not anchored and the firstcharptr argument is not
932       NULL, it is used to pass back information about the first character  of
933       any matched string (see PCRE_INFO_FIRSTBYTE above).
934

REFERENCE COUNTS

936
937       int pcre_refcount(pcre *code, int adjust);
938
939       The  pcre_refcount()  function is used to maintain a reference count in
940       the data block that contains a compiled pattern. It is provided for the
941       benefit  of  applications  that  operate  in an object-oriented manner,
942       where different parts of the application may be using the same compiled
943       pattern, but you want to free the block when they are all done.
944
945       When a pattern is compiled, the reference count field is initialized to
946       zero.  It is changed only by calling this function, whose action is  to
947       add  the  adjust  value  (which may be positive or negative) to it. The
948       yield of the function is the new value. However, the value of the count
949       is  constrained to lie between 0 and 65535, inclusive. If the new value
950       is outside these limits, it is forced to the appropriate limit value.
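
       The clamping rule can be modelled in a few lines. This is only a model
       of the arithmetic described above, not PCRE's source code, and
       model_refcount() is a hypothetical name; it assumes the stored count is
       already within 0 to 65535.

```c
/* Model of the saturating update described above (not PCRE's code).
 * `current` is the stored count (0..65535); `adjust` is the value that
 * would be passed as the second argument of pcre_refcount(). */
int model_refcount(int current, int adjust)
{
    if (adjust > 65535 - current)
        return 65535;            /* forced to the upper limit */
    if (adjust < -current)
        return 0;                /* forced to the lower limit */
    return current + adjust;     /* in range: the new value */
}
```

       One consequence is that an unmatched decrement is silently absorbed at
       zero, so the count alone cannot detect a release that has no matching
       acquire.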
951
952       Except when it is zero, the reference count is not correctly  preserved
953       if  a  pattern  is  compiled on one host and then transferred to a host
954       whose byte-order is different. (This seems a highly unlikely scenario.)
955

MATCHING A PATTERN: THE TRADITIONAL FUNCTION

957
958       int pcre_exec(const pcre *code, const pcre_extra *extra,
959            const char *subject, int length, int startoffset,
960            int options, int *ovector, int ovecsize);
961
962       The function pcre_exec() is called to match a subject string against  a
963       compiled  pattern, which is passed in the code argument. If the pattern
964       has been studied, the result of the study should be passed in the extra
965       argument.  This  function is the main matching facility of the library,
966       and it operates in a Perl-like manner. For specialist use there is also
967       an  alternative matching function, which is described below in the sec‐
968       tion about the pcre_dfa_exec() function.
969
970       In most applications, the pattern will have been compiled (and  option‐
971       ally  studied)  in the same process that calls pcre_exec(). However, it
972       is possible to save compiled patterns and study data, and then use them
973       later  in  different processes, possibly even on different hosts. For a
974       discussion about this, see the pcreprecompile documentation.
975
976       Here is an example of a simple call to pcre_exec():
977
978         int rc;
979         int ovector[30];
980         rc = pcre_exec(
981           re,             /* result of pcre_compile() */
982           NULL,           /* we didn't study the pattern */
983           "some string",  /* the subject string */
984           11,             /* the length of the subject string */
985           0,              /* start at offset 0 in the subject */
986           0,              /* default options */
987           ovector,        /* vector of integers for substring information */
988           30);            /* number of elements (NOT size in bytes) */
989
990   Extra data for pcre_exec()
991
992       If the extra argument is not NULL, it must point to a  pcre_extra  data
993       block.  The pcre_study() function returns such a block (when it doesn't
994       return NULL), but you can also create one for yourself, and pass  addi‐
995       tional  information  in it. The pcre_extra block contains the following
996       fields (not necessarily in this order):
997
998         unsigned long int flags;
999         void *study_data;
1000         unsigned long int match_limit;
1001         unsigned long int match_limit_recursion;
1002         void *callout_data;
1003         const unsigned char *tables;
1004
1005       The flags field is a bitmap that specifies which of  the  other  fields
1006       are set. The flag bits are:
1007
1008         PCRE_EXTRA_STUDY_DATA
1009         PCRE_EXTRA_MATCH_LIMIT
1010         PCRE_EXTRA_MATCH_LIMIT_RECURSION
1011         PCRE_EXTRA_CALLOUT_DATA
1012         PCRE_EXTRA_TABLES
1013
1014       Other  flag  bits should be set to zero. The study_data field is set in
1015       the pcre_extra block that is returned by  pcre_study(),  together  with
1016       the appropriate flag bit. You should not set this yourself, but you may
1017       add to the block by setting the other fields  and  their  corresponding
1018       flag bits.
1019
1020       The match_limit field provides a means of preventing PCRE from using up
1021       a vast amount of resources when running patterns that are not going  to
1022       match,  but  which  have  a very large number of possibilities in their
1023       search trees. The classic  example  is  the  use  of  nested  unlimited
1024       repeats.
1025
1026       Internally,  PCRE uses a function called match() which it calls repeat‐
1027       edly (sometimes recursively). The limit set by match_limit  is  imposed
1028       on  the  number  of times this function is called during a match, which
1029       has the effect of limiting the amount of  backtracking  that  can  take
1030       place. For patterns that are not anchored, the count restarts from zero
1031       for each position in the subject string.
1032
       The default value for the limit can be set when PCRE is built; if it
       is not changed at build time, the default is 10 million, which handles
       all but the most extreme cases. You can override the default by
       supplying pcre_exec() with a pcre_extra block in which match_limit is
       set, and PCRE_EXTRA_MATCH_LIMIT is set in the flags field. If the limit
       is exceeded, pcre_exec() returns PCRE_ERROR_MATCHLIMIT.
1039
1040       The  match_limit_recursion field is similar to match_limit, but instead
1041       of limiting the total number of times that match() is called, it limits
1042       the  depth  of  recursion. The recursion depth is a smaller number than
1043       the total number of calls, because not all calls to match() are  recur‐
1044       sive.  This limit is of use only if it is set smaller than match_limit.
1045
1046       Limiting  the  recursion  depth  limits the amount of stack that can be
1047       used, or, when PCRE has been compiled to use memory on the heap instead
1048       of the stack, the amount of heap memory that can be used.
1049
       The default value for match_limit_recursion can be set when PCRE is
       built; if it is not changed at build time, it is the same value as the
       default for match_limit. You can override the default by supplying
       pcre_exec() with a pcre_extra block in which match_limit_recursion is
       set, and PCRE_EXTRA_MATCH_LIMIT_RECURSION is set in the flags field. If
       the limit is exceeded, pcre_exec() returns PCRE_ERROR_RECURSIONLIMIT.
1056
       The callout_data field is used in conjunction with the "callout"
       feature, which is described in the pcrecallout documentation.
1059
1060       The  tables  field  is  used  to  pass  a  character  tables pointer to
1061       pcre_exec(); this overrides the value that is stored with the  compiled
1062       pattern.  A  non-NULL value is stored with the compiled pattern only if
1063       custom tables were supplied to pcre_compile() via  its  tableptr  argu‐
1064       ment.  If NULL is passed to pcre_exec() using this mechanism, it forces
1065       PCRE's internal tables to be used. This facility is  helpful  when  re-
1066       using  patterns  that  have been saved after compiling with an external
1067       set of tables, because the external tables  might  be  at  a  different
1068       address  when  pcre_exec() is called. See the pcreprecompile documenta‐
1069       tion for a discussion of saving compiled patterns for later use.
1070
1071   Option bits for pcre_exec()
1072
1073       The unused bits of the options argument for pcre_exec() must  be  zero.
1074       The  only  bits  that  may  be set are PCRE_ANCHORED, PCRE_NEWLINE_xxx,
1075       PCRE_NOTBOL,   PCRE_NOTEOL,   PCRE_NOTEMPTY,   PCRE_NO_UTF8_CHECK   and
1076       PCRE_PARTIAL.
1077
1078         PCRE_ANCHORED
1079
       The PCRE_ANCHORED option limits pcre_exec() to matching at the first
       matching position. If a pattern was compiled with PCRE_ANCHORED, or
       turned out to be anchored by virtue of its contents, it cannot be made
       unanchored at matching time.
1084
1085         PCRE_NEWLINE_CR
1086         PCRE_NEWLINE_LF
1087         PCRE_NEWLINE_CRLF
1088         PCRE_NEWLINE_ANYCRLF
1089         PCRE_NEWLINE_ANY
1090
1091       These options override  the  newline  definition  that  was  chosen  or
1092       defaulted  when the pattern was compiled. For details, see the descrip‐
1093       tion of pcre_compile()  above.  During  matching,  the  newline  choice
1094       affects  the  behaviour  of the dot, circumflex, and dollar metacharac‐
1095       ters. It may also alter the way the match position is advanced after  a
1096       match failure for an unanchored pattern.
1097
       When PCRE_NEWLINE_CRLF, PCRE_NEWLINE_ANYCRLF, or PCRE_NEWLINE_ANY is
       set, and a match attempt for an unanchored pattern fails when the
       current position is at a CRLF sequence, and the pattern contains no
       explicit matches for CR or LF characters, the match position is
       advanced by two characters instead of one, in other words, to after the
       CRLF.
1104
1105       The above rule is a compromise that makes the most common cases work as
1106       expected.  For  example,  if  the  pattern  is .+A (and the PCRE_DOTALL
1107       option is not set), it does not match the string "\r\nA" because, after
1108       failing  at the start, it skips both the CR and the LF before retrying.
       However, the pattern [\r\n]A does match that string, because it
       contains an explicit CR or LF reference, and so advances only by one
       character after the first failure. Note that an explicit CR or LF
       reference occurs for negated character classes such as [^X] because
       they can match CR or LF characters.
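
       The advance rule can be expressed as a small function. The sketch
       below is a model of the rule as described here, not PCRE's source
       code; next_start_after_failure() and its parameters are hypothetical
       names, and single-byte characters are assumed (in UTF-8 mode one
       character may occupy several bytes).

```c
/* Model of the rule described above (not PCRE's code). Returns the next
 * start position to try after a failed attempt at `pos`.
 *   crlf_is_newline  - non-zero if the newline setting is CRLF, ANYCRLF,
 *                      or ANY
 *   pattern_has_crlf - non-zero if the pattern contains explicit matches
 *                      for CR or LF (the PCRE_INFO_HASCRORLF result) */
int next_start_after_failure(const char *subject, int length, int pos,
                             int crlf_is_newline, int pattern_has_crlf)
{
    if (crlf_is_newline && !pattern_has_crlf &&
        pos + 1 < length &&
        subject[pos] == '\r' && subject[pos + 1] == '\n')
        return pos + 2;   /* step over the whole CRLF pair */
    return pos + 1;       /* ordinary advance by one character */
}
```

       With the subject "\r\nA", a pattern such as .+A is retried at offset
       2, whereas [\r\n]A is retried at offset 1 and can therefore match the
       LF followed by "A".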
1114
1115       Notwithstanding the above, anomalous effects may still occur when  CRLF
1116       is a valid newline sequence and explicit \r or \n escapes appear in the
1117       pattern.
1118
1119         PCRE_NOTBOL
1120
       This option specifies that the first character of the subject string is
       not the beginning of a line, so the circumflex metacharacter should not
       match before it. Setting this without PCRE_MULTILINE (at compile time)
       causes circumflex never to match. This option affects only the
       behaviour of the circumflex metacharacter. It does not affect \A.
1126
1127         PCRE_NOTEOL
1128
1129       This option specifies that the end of the subject string is not the end
1130       of  a line, so the dollar metacharacter should not match it nor (except
1131       in multiline mode) a newline immediately before it. Setting this  with‐
1132       out PCRE_MULTILINE (at compile time) causes dollar never to match. This
1133       option affects only the behaviour of the dollar metacharacter. It  does
1134       not affect \Z or \z.
1135
1136         PCRE_NOTEMPTY
1137
1138       An empty string is not considered to be a valid match if this option is
1139       set. If there are alternatives in the pattern, they are tried.  If  all
1140       the  alternatives  match  the empty string, the entire match fails. For
1141       example, if the pattern
1142
1143         a?b?
1144
1145       is applied to a string not beginning with "a" or "b",  it  matches  the
1146       empty  string at the start of the subject. With PCRE_NOTEMPTY set, this
1147       match is not valid, so PCRE searches further into the string for occur‐
1148       rences of "a" or "b".
1149
1150       Perl has no direct equivalent of PCRE_NOTEMPTY, but it does make a spe‐
1151       cial case of a pattern match of the empty  string  within  its  split()
1152       function,  and  when  using  the /g modifier. It is possible to emulate
1153       Perl's behaviour after matching a null string by first trying the match
1154       again at the same offset with PCRE_NOTEMPTY and PCRE_ANCHORED, and then
1155       if that fails by advancing the starting offset (see below)  and  trying
1156       an ordinary match again. There is some code that demonstrates how to do
1157       this in the pcredemo.c sample program.
1158
1159         PCRE_NO_UTF8_CHECK
1160
1161       When PCRE_UTF8 is set at compile time, the validity of the subject as a
1162       UTF-8  string is automatically checked when pcre_exec() is subsequently
1163       called.  The value of startoffset is also checked  to  ensure  that  it
1164       points  to  the start of a UTF-8 character. There is a discussion about
1165       the validity of UTF-8 strings in the section on UTF-8  support  in  the
1166       main  pcre  page.  If  an  invalid  UTF-8  sequence  of bytes is found,
1167       pcre_exec() returns the error PCRE_ERROR_BADUTF8. If  startoffset  con‐
1168       tains an invalid value, PCRE_ERROR_BADUTF8_OFFSET is returned.
1169
1170       If  you  already  know that your subject is valid, and you want to skip
1171       these   checks   for   performance   reasons,   you   can    set    the
1172       PCRE_NO_UTF8_CHECK  option  when calling pcre_exec(). You might want to
1173       do this for the second and subsequent calls to pcre_exec() if  you  are
1174       making  repeated  calls  to  find  all  the matches in a single subject
1175       string. However, you should be  sure  that  the  value  of  startoffset
1176       points  to  the  start of a UTF-8 character. When PCRE_NO_UTF8_CHECK is
1177       set, the effect of passing an invalid UTF-8 string as a subject,  or  a
1178       value  of startoffset that does not point to the start of a UTF-8 char‐
1179       acter, is undefined. Your program may crash.
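
       Before setting PCRE_NO_UTF8_CHECK, a caller may want a cheap test that
       a computed startoffset does not land inside a multibyte character. The
       following sketch is one way to do that; is_utf8_char_start() is a
       hypothetical helper, not part of PCRE. It rejects only UTF-8
       continuation bytes (bit pattern 10xxxxxx) and does not validate the
       subject as a whole.

```c
/* Hypothetical helper, not part of PCRE: return 1 if `offset` could be
 * the start of a UTF-8 character in `subject`, 0 otherwise. The end of
 * the subject (offset == length) is treated as a valid position here. */
int is_utf8_char_start(const char *subject, int length, int offset)
{
    unsigned char byte;

    if (offset < 0 || offset > length)
        return 0;
    if (offset == length)
        return 1;
    byte = (unsigned char)subject[offset];
    return (byte & 0xC0) != 0x80;   /* 10xxxxxx marks a continuation byte */
}
```

       For the three-byte subject "\xC3\xA9x" (a two-byte character followed
       by "x"), offsets 0 and 2 pass the test and offset 1 does not.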
1180
1181         PCRE_PARTIAL
1182
1183       This option turns on the  partial  matching  feature.  If  the  subject
1184       string  fails to match the pattern, but at some point during the match‐
1185       ing process the end of the subject was reached (that  is,  the  subject
1186       partially  matches  the  pattern and the failure to match occurred only
1187       because there were not enough subject characters), pcre_exec()  returns
1188       PCRE_ERROR_PARTIAL  instead of PCRE_ERROR_NOMATCH. When PCRE_PARTIAL is
1189       used, there are restrictions on what may appear in the  pattern.  These
1190       are discussed in the pcrepartial documentation.
1191
1192   The string to be matched by pcre_exec()
1193
1194       The  subject string is passed to pcre_exec() as a pointer in subject, a
1195       length in length, and a starting byte offset in startoffset.  In  UTF-8
1196       mode,  the  byte  offset  must point to the start of a UTF-8 character.
1197       Unlike the pattern string, the subject may contain binary  zero  bytes.
1198       When  the starting offset is zero, the search for a match starts at the
1199       beginning of the subject, and this is by far the most common case.
1200
1201       A non-zero starting offset is useful when searching for  another  match
1202       in  the same subject by calling pcre_exec() again after a previous suc‐
1203       cess.  Setting startoffset differs from just passing over  a  shortened
1204       string  and  setting  PCRE_NOTBOL  in the case of a pattern that begins
1205       with any kind of lookbehind. For example, consider the pattern
1206
1207         \Biss\B
1208
1209       which finds occurrences of "iss" in the middle of  words.  (\B  matches
1210       only  if  the  current position in the subject is not a word boundary.)
1211       When applied to the string "Mississippi" the first call to  pcre_exec()
1212       finds  the  first  occurrence. If pcre_exec() is called again with just
1213       the remainder of the subject, namely  "issippi",  it  does  not  match,
1214       because \B is always false at the start of the subject, which is deemed
1215       to be a word boundary. However, if pcre_exec()  is  passed  the  entire
1216       string again, but with startoffset set to 4, it finds the second occur‐
1217       rence of "iss" because it is able to look behind the starting point  to
1218       discover that it is preceded by a letter.
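
       The difference between the two calls comes down to what  precedes  the
       starting  point.  The  sketch  below  is  a hand-written, ASCII-only
       approximation of the word boundary test, not PCRE's  implementation;
       is_word_char() and is_word_boundary() are hypothetical names introduced
       for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* ASCII-only stand-in for a "word" character: letter, digit, underscore. */
int is_word_char(unsigned char c)
{
    return isalnum(c) || c == '_';
}

/* Returns 1 if position `pos` in subject[0..length) is a word boundary,
   that is, exactly one of the characters on either side of `pos` is a
   word character. Anything outside the subject counts as non-word. */
int is_word_boundary(const char *subject, int length, int pos)
{
    int before, after;
    assert(subject != NULL && pos >= 0 && pos <= length);
    before = pos > 0 && is_word_char((unsigned char)subject[pos - 1]);
    after = pos < length && is_word_char((unsigned char)subject[pos]);
    return before != after;
}
```

       At position 0 of the shortened subject nothing precedes the "i", so the
       position is a boundary and \B fails. At position 4 of the full  subject
       the preceding "s" is visible, so the position is not a boundary.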
1219
1220       If  a  non-zero starting offset is passed when the pattern is anchored,
1221       one attempt to match at the given offset is made. This can only succeed
1222       if  the  pattern  does  not require the match to be at the start of the
1223       subject.
1224
1225   How pcre_exec() returns captured substrings
1226
1227       In general, a pattern matches a certain portion of the subject, and  in
1228       addition,  further  substrings  from  the  subject may be picked out by
1229       parts of the pattern. Following the usage  in  Jeffrey  Friedl's  book,
1230       this  is  called "capturing" in what follows, and the phrase "capturing
1231       subpattern" is used for a fragment of a pattern that picks out  a  sub‐
1232       string.  PCRE  supports several other kinds of parenthesized subpattern
1233       that do not cause substrings to be captured.
1234
1235       Captured substrings are returned to the caller via a vector of  integer
1236       offsets  whose  address is passed in ovector. The number of elements in
1237       the vector is passed in ovecsize, which must be a non-negative  number.
1238       Note: this argument is NOT the size of ovector in bytes.
1239
1240       The  first  two-thirds of the vector is used to pass back captured sub‐
1241       strings, each substring using a pair of integers. The  remaining  third
1242       of  the  vector is used as workspace by pcre_exec() while matching cap‐
1243       turing subpatterns, and is not available for passing back  information.
1244       The  length passed in ovecsize should always be a multiple of three. If
1245       it is not, it is rounded down.
1246
1247       When a match is successful, information about  captured  substrings  is
1248       returned  in  pairs  of integers, starting at the beginning of ovector,
1249       and continuing up to two-thirds of its length at the  most.  The  first
1250       element of a pair is set to the offset of the first character in a sub‐
1251       string, and the second is set to the  offset  of  the  first  character
1252       after  the  end  of  a  substring. The first pair, ovector[0] and ovec‐
1253       tor[1], identify the portion of  the  subject  string  matched  by  the
1254       entire  pattern.  The next pair is used for the first capturing subpat‐
1255       tern, and so on. The value returned by pcre_exec() is one more than the
1256       highest numbered pair that has been set. For example, if two substrings
1257       have been captured, the returned value is 3. If there are no  capturing
1258       subpatterns,  the return value from a successful match is 1, indicating
1259       that just the first pair of offsets has been set.
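
       The pairing of offsets can be illustrated by a small routine that reads
       ovector directly. This is a sketch under stated assumptions:
       copy_capture() is a hypothetical helper, not a PCRE function, and it
       treats any pair numbered at or above the return value as unavailable.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper, not part of PCRE. Copies captured substring `n`
   (0 is the whole match) into `buf` as a zero-terminated string, using
   the offsets in `ovector` and the positive value `rc` that pcre_exec()
   returned. Returns the substring length, or -1 if pair `n` was not set
   by the match or `buf` is too small. */
int copy_capture(const char *subject, const int *ovector, int rc,
                 int n, char *buf, int bufsize)
{
    int start, end, len;
    assert(subject != NULL && ovector != NULL && buf != NULL);
    if (n < 0 || n >= rc)
        return -1;                 /* beyond the highest pair that was set */
    start = ovector[2 * n];        /* offset of the first character */
    end = ovector[2 * n + 1];      /* offset just past the last character */
    if (start < 0 || end < start)
        return -1;                 /* subpattern did not take part */
    len = end - start;
    if (len >= bufsize)
        return -1;                 /* no room for the terminating zero */
    memcpy(buf, subject + start, (size_t)len);
    buf[len] = '\0';
    return len;
}
```

       The length of a substring is always the difference between the two
       elements of its pair, which is why no separate length is returned.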
1260
1261       If a capturing subpattern is matched repeatedly, it is the last portion
1262       of the string that it matched that is returned.
1263
1264       If  the vector is too small to hold all the captured substring offsets,
1265       it is used as far as possible (up to two-thirds of its length), and the
1266       function  returns a value of zero. In particular, if the substring off‐
1267       sets are not of interest, pcre_exec() may be called with ovector passed
1268       as  NULL  and  ovecsize  as zero. However, if the pattern contains back
1269       references and the ovector is not big enough to  remember  the  related
1270       substrings,  PCRE has to get additional memory for use during matching.
1271       Thus it is usually advisable to supply an ovector.
1272
1273       The pcre_fullinfo() function (item PCRE_INFO_CAPTURECOUNT) reports  how
1274       many capturing subpatterns there are in a compiled pattern. The  small‐
1275       est size for ovector that allows for n captured substrings, in addition
1276       to the offsets of the substring matched by the whole pattern, is (n+1)*3.
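
       The sizing rule and the rounding rule can be written down  as  two  one
       line  functions.  Both  names  are hypothetical and are introduced here
       only to make the arithmetic explicit.

```c
#include <assert.h>
#include <stddef.h>

/* Smallest ovecsize that can return `n` captured substrings in addition
   to the pair for the whole match: (n+1)*3. */
int ovecsize_for_captures(int n)
{
    assert(n >= 0);
    return (n + 1) * 3;
}

/* Number of offset pairs that a vector of `ovecsize` elements can pass
   back. Only the first two-thirds holds pairs, and a size that is not a
   multiple of three is rounded down. */
int usable_pairs(int ovecsize)
{
    assert(ovecsize >= 0);
    return ovecsize / 3;
}
```

       For instance, a vector of 10 elements behaves like one of 9 and can pass
       back three pairs: the whole match and two captured substrings.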
1277
1278       It  is  possible for capturing subpattern number n+1 to match some part
1279       of the subject when subpattern n has not been used at all. For example,
1280       if  the  string  "abc"  is  matched against the pattern (a|(z))(bc) the
1281       return from the function is 4, and subpatterns 1 and 3 are matched, but
1282       2  is  not.  When  this happens, both values in the offset pairs corre‐
1283       sponding to unused subpatterns are set to -1.
1284
1285       Offset values that correspond to unused subpatterns at the end  of  the
1286       expression  are  also  set  to  -1. For example, if the string "abc" is
1287       matched against the pattern (abc)(x(yz)?)? subpatterns 2 and 3 are  not
1288       matched.  The  return  from the function is 2, because the highest used
1289       capturing subpattern number is 1. However, you can refer to the offsets
1290       for  the  second  and third capturing subpatterns if you wish (assuming
1291       the vector is large enough, of course).
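
       A caller that walks the pairs therefore has to tell three cases  apart:
       a subpattern that did not take part, one that matched an empty string,
       and one that matched some text. The following sketch shows one way  to
       do so; the enum and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum capture_state { CAPTURE_UNSET, CAPTURE_EMPTY, CAPTURE_NONEMPTY };

/* Classifies offset pair `n` of an ovector with `ovecsize` elements.
   The pair must lie in the part of the vector that holds offsets. */
enum capture_state classify_capture(const int *ovector, int ovecsize, int n)
{
    assert(ovector != NULL && n >= 0 && n < ovecsize / 3);
    if (ovector[2 * n] < 0)
        return CAPTURE_UNSET;      /* both offsets are -1 */
    if (ovector[2 * n + 1] == ovector[2 * n])
        return CAPTURE_EMPTY;      /* matched, but zero characters */
    return CAPTURE_NONEMPTY;
}
```

       For the pattern (a|(z))(bc) applied to "abc", pair 2 is classified  as
       unset while pairs 1 and 3 are non-empty.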
1292
1293       Some convenience functions are provided  for  extracting  the  captured
1294       substrings as separate strings. These are described below.
1295
1296   Error return values from pcre_exec()
1297
1298       If  pcre_exec()  fails, it returns a negative number. The following are
1299       defined in the header file:
1300
1301         PCRE_ERROR_NOMATCH        (-1)
1302
1303       The subject string did not match the pattern.
1304
1305         PCRE_ERROR_NULL           (-2)
1306
1307       Either code or subject was passed as NULL,  or  ovector  was  NULL  and
1308       ovecsize was not zero.
1309
1310         PCRE_ERROR_BADOPTION      (-3)
1311
1312       An unrecognized bit was set in the options argument.
1313
1314         PCRE_ERROR_BADMAGIC       (-4)
1315
1316       PCRE  stores a 4-byte "magic number" at the start of the compiled code,
1317       to catch the case when it is passed a junk pointer and to detect when a
1318       pattern that was compiled in an environment of one endianness is run in
1319       an environment with the other endianness. This is the error  that  PCRE
1320       gives when the magic number is not present.
1321
1322         PCRE_ERROR_UNKNOWN_OPCODE (-5)
1323
1324       While running the pattern match, an unknown item was encountered in the
1325       compiled pattern. This error could be caused by a bug  in  PCRE  or  by
1326       overwriting of the compiled pattern.
1327
1328         PCRE_ERROR_NOMEMORY       (-6)
1329
1330       If  a  pattern contains back references, but the ovector that is passed
1331       to pcre_exec() is not big enough to remember the referenced substrings,
1332       PCRE  gets  a  block of memory at the start of matching to use for this
1333       purpose. If the call via pcre_malloc() fails, this error is given.  The
1334       memory is automatically freed at the end of matching.
1335
1336         PCRE_ERROR_NOSUBSTRING    (-7)
1337
1338       This  error is used by the pcre_copy_substring(), pcre_get_substring(),
1339       and  pcre_get_substring_list()  functions  (see  below).  It  is  never
1340       returned by pcre_exec().
1341
1342         PCRE_ERROR_MATCHLIMIT     (-8)
1343
1344       The  backtracking  limit,  as  specified  by the match_limit field in a
1345       pcre_extra structure (or defaulted) was reached.  See  the  description
1346       above.
1347
1348         PCRE_ERROR_CALLOUT        (-9)
1349
1350       This error is never generated by pcre_exec() itself. It is provided for
1351       use by callout functions that want to yield a distinctive  error  code.
1352       See the pcrecallout documentation for details.
1353
1354         PCRE_ERROR_BADUTF8        (-10)
1355
1356       A  string  that contains an invalid UTF-8 byte sequence was passed as a
1357       subject.
1358
1359         PCRE_ERROR_BADUTF8_OFFSET (-11)
1360
1361       The UTF-8 byte sequence that was passed as a subject was valid, but the
1362       value  of startoffset did not point to the beginning of a UTF-8 charac‐
1363       ter.
1364
1365         PCRE_ERROR_PARTIAL        (-12)
1366
1367       The subject string did not match, but it did match partially.  See  the
1368       pcrepartial documentation for details of partial matching.
1369
1370         PCRE_ERROR_BADPARTIAL     (-13)
1371
1372       The  PCRE_PARTIAL  option  was  used with a compiled pattern containing
1373       items that are not supported for partial matching. See the  pcrepartial
1374       documentation for details of partial matching.
1375
1376         PCRE_ERROR_INTERNAL       (-14)
1377
1378       An  unexpected  internal error has occurred. This error could be caused
1379       by a bug in PCRE or by overwriting of the compiled pattern.
1380
1381         PCRE_ERROR_BADCOUNT       (-15)
1382
1383       This error is given if the value of the ovecsize argument is negative.
1384
1385         PCRE_ERROR_RECURSIONLIMIT (-21)
1386
1387       The internal recursion limit, as specified by the match_limit_recursion
1388       field  in  a  pcre_extra  structure (or defaulted) was reached. See the
1389       description above.
1390
1391         PCRE_ERROR_BADNEWLINE     (-23)
1392
1393       An invalid combination of PCRE_NEWLINE_xxx options was given.
1394
1395       Error numbers -16 to -20 and -22 are not used by pcre_exec().
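
       For logging, it can be handy to turn a negative return value back  into
       its  symbolic  name.  The  table  below is a sketch: exec_error_name()
       is a hypothetical helper, and the numeric values are copied from  the
       list  above  rather than taken from pcre.h. Real code should use the
       symbolic names from the header file.

```c
#include <assert.h>
#include <stddef.h>

struct exec_error { int code; const char *name; };

/* Values as listed in this section; -16 to -20 and -22 are absent
   because pcre_exec() does not use them. */
static const struct exec_error exec_errors[] = {
    { -1, "PCRE_ERROR_NOMATCH" },        { -2, "PCRE_ERROR_NULL" },
    { -3, "PCRE_ERROR_BADOPTION" },      { -4, "PCRE_ERROR_BADMAGIC" },
    { -5, "PCRE_ERROR_UNKNOWN_OPCODE" }, { -6, "PCRE_ERROR_NOMEMORY" },
    { -7, "PCRE_ERROR_NOSUBSTRING" },    { -8, "PCRE_ERROR_MATCHLIMIT" },
    { -9, "PCRE_ERROR_CALLOUT" },        { -10, "PCRE_ERROR_BADUTF8" },
    { -11, "PCRE_ERROR_BADUTF8_OFFSET" },{ -12, "PCRE_ERROR_PARTIAL" },
    { -13, "PCRE_ERROR_BADPARTIAL" },    { -14, "PCRE_ERROR_INTERNAL" },
    { -15, "PCRE_ERROR_BADCOUNT" },      { -21, "PCRE_ERROR_RECURSIONLIMIT" },
    { -23, "PCRE_ERROR_BADNEWLINE" }
};

/* Returns the symbolic name for a negative return value, or NULL if the
   value is not in the list above. */
const char *exec_error_name(int rc)
{
    size_t i;
    assert(rc < 0);                /* non-negative returns are not errors */
    for (i = 0; i < sizeof exec_errors / sizeof exec_errors[0]; i++)
        if (exec_errors[i].code == rc)
            return exec_errors[i].name;
    return NULL;
}
```

       PCRE_ERROR_NOMATCH and PCRE_ERROR_PARTIAL are expected outcomes  rather
       than faults, so callers normally test for them before reporting anything.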
1396

EXTRACTING CAPTURED SUBSTRINGS BY NUMBER

1398
1399       int pcre_copy_substring(const char *subject, int *ovector,
1400            int stringcount, int stringnumber, char *buffer,
1401            int buffersize);
1402
1403       int pcre_get_substring(const char *subject, int *ovector,
1404            int stringcount, int stringnumber,
1405            const char **stringptr);
1406
1407       int pcre_get_substring_list(const char *subject,
1408            int *ovector, int stringcount, const char ***listptr);
1409
1410       Captured substrings can be  accessed  directly  by  using  the  offsets
1411       returned  by  pcre_exec()  in  ovector.  For convenience, the functions
1412       pcre_copy_substring(),    pcre_get_substring(),    and    pcre_get_sub‐
1413       string_list()  are  provided for extracting captured substrings as new,
1414       separate, zero-terminated strings. These functions identify  substrings
1415       by  number.  The  next section describes functions for extracting named
1416       substrings.
1417
1418       A substring that contains a binary zero is correctly extracted and  has
1419       a  further zero added on the end, but the result is not, of course, a C
1420       string.  However, you can process such a string  by  referring  to  the
1421       length  that  is  returned  by  pcre_copy_substring() and pcre_get_sub‐
1422       string().  Unfortunately, the interface to pcre_get_substring_list() is
1423       not  adequate for handling strings containing binary zeros, because the
1424       end of the final string is not independently indicated.
1425
1426       The first three arguments are the same for all  three  of  these  func‐
1427       tions:  subject  is  the subject string that has just been successfully
1428       matched, ovector is a pointer to the vector of integer offsets that was
1429       passed to pcre_exec(), and stringcount is the number of substrings that
1430       were captured by the match, including the substring  that  matched  the
1431       entire regular expression. This is the value returned by pcre_exec() if
1432       it is greater than zero. If pcre_exec() returned zero, indicating  that
1433       it  ran out of space in ovector, the value passed as stringcount should
1434       be the number of elements in the vector divided by three.
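
       The rule for choosing stringcount is short enough to capture in a  sin‐
       gle expression. The function name below is hypothetical; it is a sketch
       of the rule just described, not part of the library.

```c
#include <assert.h>
#include <stddef.h>

/* Value to pass as stringcount after a successful pcre_exec() call that
   returned `rc` (zero or positive) with a vector of `ovecsize` elements. */
int stringcount_for(int rc, int ovecsize)
{
    assert(rc >= 0 && ovecsize >= 0);
    /* rc == 0 means the vector filled up: every available pair is in use. */
    return rc > 0 ? rc : ovecsize / 3;
}
```

       Negative return values are errors and must not be passed on to the sub‐
       string functions at all.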
1435
1436       The functions pcre_copy_substring() and pcre_get_substring() extract  a
1437       single  substring,  whose  number  is given as stringnumber. A value of
1438       zero extracts the substring that matched the  entire  pattern,  whereas
1439       higher  values  extract  the  captured  substrings.  For pcre_copy_sub‐
1440       string(), the string is placed in buffer,  whose  length  is  given  by
1441       buffersize,  while  for  pcre_get_substring()  a new block of memory is
1442       obtained via pcre_malloc, and its address is  returned  via  stringptr.
1443       The  yield  of  the function is the length of the string, not including
1444       the terminating zero, or one of these error codes:
1445
1446         PCRE_ERROR_NOMEMORY       (-6)
1447
1448       The buffer was too small for pcre_copy_substring(), or the  attempt  to
1449       get memory failed for pcre_get_substring().
1450
1451         PCRE_ERROR_NOSUBSTRING    (-7)
1452
1453       There is no substring whose number is stringnumber.
1454
1455       The  pcre_get_substring_list()  function  extracts  all  available sub‐
1456       strings and builds a list of pointers to them. All this is  done  in  a
1457       single block of memory that is obtained via pcre_malloc. The address of
1458       the memory block is returned via listptr, which is also  the  start  of
1459       the  list  of  string pointers. The end of the list is marked by a NULL
1460       pointer. The yield of the function is zero if all  went  well,  or  the
1461       error code
1462
1463         PCRE_ERROR_NOMEMORY       (-6)
1464
1465       if the attempt to get the memory block failed.
1466
1467       When any of these functions encounters a substring that is unset, which
1468       can happen when capturing subpattern number n+1 matches  some  part  of
1469       the  subject, but subpattern n has not been used at all, they return an
1470       empty string. This can be distinguished from a genuine zero-length sub‐
1471       string  by inspecting the appropriate offset in ovector, which is nega‐
1472       tive for unset substrings.
1473
1474       The two convenience functions pcre_free_substring() and  pcre_free_sub‐
1475       string_list()  can  be  used  to free the memory returned by a previous
1476       call  of  pcre_get_substring()  or  pcre_get_substring_list(),  respec‐
1477       tively.  They  do  nothing  more  than  call the function pointed to by
1478       pcre_free, which of course could be called directly from a  C  program.
1479       However,  PCRE is used in some situations where it is linked via a spe‐
1480       cial  interface  to  another  programming  language  that  cannot   use
1481       pcre_free  directly;  it is for these cases that the functions are pro‐
1482       vided.
1483

EXTRACTING CAPTURED SUBSTRINGS BY NAME

1485
1486       int pcre_get_stringnumber(const pcre *code,
1487            const char *name);
1488
1489       int pcre_copy_named_substring(const pcre *code,
1490            const char *subject, int *ovector,
1491            int stringcount, const char *stringname,
1492            char *buffer, int buffersize);
1493
1494       int pcre_get_named_substring(const pcre *code,
1495            const char *subject, int *ovector,
1496            int stringcount, const char *stringname,
1497            const char **stringptr);
1498
1499       To extract a substring by name, you first have to find  the  associated
1500       number. For example, for this pattern
1501
1502         (a+)b(?<xxx>\d+)...
1503
1504       the number of the subpattern called "xxx" is 2. If the name is known to
1505       be unique (PCRE_DUPNAMES was not set), you can find the number from the
1506       name by calling pcre_get_stringnumber(). The first argument is the com‐
1507       piled pattern, and the second is the name. The yield of the function is
1508       the  subpattern  number,  or PCRE_ERROR_NOSUBSTRING (-7) if there is no
1509       subpattern of that name.
1510
1511       Given the number, you can extract the substring directly, or use one of
1512       the functions described in the previous section. For convenience, there
1513       are also two functions that do the whole job.
1514
1515       Most   of   the   arguments    of    pcre_copy_named_substring()    and
1516       pcre_get_named_substring()  are  the  same  as  those for the similarly
1517       named functions that extract by number. As these are described  in  the
1518       previous  section,  they  are not re-described here. There are just two
1519       differences:
1520
1521       First, instead of a substring number, a substring name is  given.  Sec‐
1522       ond, there is an extra argument, given at the start, which is a pointer
1523       to the compiled pattern. This is needed in order to gain access to  the
1524       name-to-number translation table.
1525
1526       These  functions call pcre_get_stringnumber(), and if it succeeds, they
1527       then call pcre_copy_substring() or pcre_get_substring(),  as  appropri‐
1528       ate.  NOTE:  If PCRE_DUPNAMES is set and there are duplicate names, the
1529       behaviour may not be what you want (see the next section).
1530

DUPLICATE SUBPATTERN NAMES

1532
1533       int pcre_get_stringtable_entries(const pcre *code,
1534            const char *name, char **first, char **last);
1535
1536       When a pattern is compiled with the  PCRE_DUPNAMES  option,  names  for
1537       subpatterns  are  not  required  to  be unique. Normally, patterns with
1538       duplicate names are such that in any one match, only one of  the  named
1539       subpatterns  participates. An example is shown in the pcrepattern docu‐
1540       mentation.
1541
1542       When   duplicates   are   present,   pcre_copy_named_substring()    and
1543       pcre_get_named_substring()  return the first substring corresponding to
1544       the given name that is set. If  none  are  set,  PCRE_ERROR_NOSUBSTRING
1545       (-7)  is  returned;  no  data  is returned. The pcre_get_stringnumber()
1546       function returns one of the numbers that are associated with the  name,
1547       but it is not defined which it is.
1548
1549       If  you want to get full details of all captured substrings for a given
1550       name, you must use  the  pcre_get_stringtable_entries()  function.  The
1551       first argument is the compiled pattern, and the second is the name. The
1552       third and fourth are pointers to variables which  are  updated  by  the
1553       function. After it has run, they point to the first and last entries in
1554       the name-to-number table  for  the  given  name.  The  function  itself
1555       returns  the  length  of  each entry, or PCRE_ERROR_NOSUBSTRING (-7) if
1556       there are none. The format of the table is described above in the  sec‐
1557       tion  entitled  Information  about  a  pattern.  Given all the relevant
1558       entries for the name, you can extract each of their numbers, and  hence
1559       the captured data, if any.
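
       Walking the entries between first and last can be sketched as  follows.
       The  code  assumes  the entry layout given in the section on information
       about a pattern: a two-byte subpattern number, most significant  byte
       first, followed by the zero-terminated name. Both function names are
       hypothetical, and the code works on any table with that layout.

```c
#include <assert.h>
#include <stddef.h>

/* Subpattern number stored in the first two bytes of a table entry. */
int entry_group_number(const char *entry)
{
    const unsigned char *p = (const unsigned char *)entry;
    assert(entry != NULL);
    return (p[0] << 8) | p[1];
}

/* Stores the numbers of the entries from `first` to `last` inclusive,
   which are `entry_size` bytes apart, into numbers[0..max). Returns the
   total number of entries, which may exceed `max`. */
int collect_group_numbers(const char *first, const char *last,
                          int entry_size, int *numbers, int max)
{
    int i, count;
    assert(first != NULL && last != NULL && last >= first);
    assert(entry_size > 2 && numbers != NULL);
    count = (int)((last - first) / entry_size) + 1;
    for (i = 0; i < count && i < max; i++)
        numbers[i] = entry_group_number(first + (size_t)i * entry_size);
    return count;
}
```

       Here first, last, and entry_size stand for the two  pointers  and  the
       return  value  produced  by pcre_get_stringtable_entries(). Each number
       obtained this way can then be used to index ovector.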
1560

FINDING ALL POSSIBLE MATCHES

1562
1563       The  traditional  matching  function  uses a similar algorithm to Perl,
1564       which stops when it finds the first match, starting at a given point in
1565       the  subject.  If you want to find all possible matches, or the longest
1566       possible match, consider using the alternative matching  function  (see
1567       below)  instead.  If you cannot use the alternative function, but still
1568       need to find all possible matches, you can kludge it up by  making  use
1569       of the callout facility, which is described in the pcrecallout documen‐
1570       tation.
1571
1572       What you have to do is to insert a callout right at the end of the pat‐
1573       tern.   When your callout function is called, extract and save the cur‐
1574       rent matched substring. Then return  1,  which  forces  pcre_exec()  to
1575       backtrack  and  try other alternatives. Ultimately, when it runs out of
1576       matches, pcre_exec() will yield PCRE_ERROR_NOMATCH.
1577

MATCHING A PATTERN: THE ALTERNATIVE FUNCTION

1579
1580       int pcre_dfa_exec(const pcre *code, const pcre_extra *extra,
1581            const char *subject, int length, int startoffset,
1582            int options, int *ovector, int ovecsize,
1583            int *workspace, int wscount);
1584
1585       The function pcre_dfa_exec()  is  called  to  match  a  subject  string
1586       against  a  compiled pattern, using a matching algorithm that scans the
1587       subject string just once, and does not backtrack.  This  has  different
1588       characteristics  to  the  normal  algorithm, and is not compatible with
1589       Perl. Some of the features of PCRE patterns are not  supported.  Never‐
1590       theless,  there are times when this kind of matching can be useful. For
1591       a discussion of the two matching algorithms, see the pcrematching docu‐
1592       mentation.
1593
1594       The  arguments  for  the  pcre_dfa_exec()  function are the same as for
1595       pcre_exec(), plus two extras. The ovector argument is used in a differ‐
1596       ent  way,  and  this is described below. The other common arguments are
1597       used in the same way as for pcre_exec(), so their  description  is  not
1598       repeated here.
1599
1600       The  two  additional  arguments provide workspace for the function. The
1601       workspace vector should contain at least 20 elements. It  is  used  for
1602       keeping  track  of  multiple  paths  through  the  pattern  tree.  More
1603       workspace will be needed for patterns and subjects where  there  are  a
1604       lot of potential matches.
1605
1606       Here is an example of a simple call to pcre_dfa_exec():
1607
         int rc;
         int ovector[10];
         int wspace[20];
         rc = pcre_dfa_exec(
           re,             /* result of pcre_compile() */
           NULL,           /* we didn't study the pattern */
           "some string",  /* the subject string */
           11,             /* the length of the subject string */
           0,              /* start at offset 0 in the subject */
           0,              /* default options */
           ovector,        /* vector of integers for substring information */
           10,             /* number of elements (NOT size in bytes) */
           wspace,         /* working space vector */
           20);            /* number of elements (NOT size in bytes) */
1622
1623   Option bits for pcre_dfa_exec()
1624
1625       The  unused  bits  of  the options argument for pcre_dfa_exec() must be
1626       zero. The only bits  that  may  be  set  are  PCRE_ANCHORED,  PCRE_NEW‐
1627       LINE_xxx,  PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY, PCRE_NO_UTF8_CHECK,
1628       PCRE_PARTIAL, PCRE_DFA_SHORTEST, and PCRE_DFA_RESTART. All but the last
1629       three of these are the same as for pcre_exec(), so their description is
1630       not repeated here.
1631
1632         PCRE_PARTIAL
1633
1634       This has the same general effect as it does for  pcre_exec(),  but  the
1635       details   are   slightly   different.  When  PCRE_PARTIAL  is  set  for
1636       pcre_dfa_exec(), the return code PCRE_ERROR_NOMATCH is  converted  into
1637       PCRE_ERROR_PARTIAL  if  the  end  of the subject is reached, there have
1638       been no complete matches, but there is still at least one matching pos‐
1639       sibility.  The portion of the string that provided the partial match is
1640       set as the first matching string.
1641
1642         PCRE_DFA_SHORTEST
1643
1644       Setting the PCRE_DFA_SHORTEST option causes the matching  algorithm  to
1645       stop as soon as it has found one match. Because of the way the alterna‐
1646       tive algorithm works, this is necessarily the shortest  possible  match
1647       at the first possible matching point in the subject string.
1648
1649         PCRE_DFA_RESTART
1650
1651       When  pcre_dfa_exec()  is  called  with  the  PCRE_PARTIAL  option, and
1652       returns a partial match, it is possible to call it  again,  with  addi‐
1653       tional  subject  characters,  and have it continue with the same match.
1654       The PCRE_DFA_RESTART option requests this action; when it is  set,  the
1655       workspace and wscount arguments must reference the same vector as before
1656       because data about the match so far is left in  them  after  a  partial
1657       match.  There  is  more  discussion of this facility in the pcrepartial
1658       documentation.
1659
1660   Successful returns from pcre_dfa_exec()
1661
1662       When pcre_dfa_exec() succeeds, it may have matched more than  one  sub‐
1663       string in the subject. Note, however, that all the matches from one run
1664       of the function start at the same point in  the  subject.  The  shorter
1665       matches  are all initial substrings of the longer matches. For example,
1666       if the pattern
1667
1668         <.*>
1669
1670       is matched against the string
1671
1672         This is <something> <something else> <something further> no more
1673
1674       the three matched strings are
1675
1676         <something>
1677         <something> <something else>
1678         <something> <something else> <something further>
1679
1680       On success, the yield of the function is a number  greater  than  zero,
1681       which  is  the  number of matched substrings. The substrings themselves
1682       are returned in ovector. Each string uses two elements;  the  first  is
1683       the  offset  to  the start, and the second is the offset to the end. In
1684       fact, all the strings have the same start  offset.  (Space  could  have
1685       been  saved by giving this only once, but it was decided to retain some
1686       compatibility with the way pcre_exec() returns data,  even  though  the
1687       meaning of the strings is different.)
1688
1689       The strings are returned in reverse order of length; that is, the long‐
1690       est matching string is given first. If there were too many  matches  to
1691       fit  into ovector, the yield of the function is zero, and the vector is
1692       filled with the longest matches.
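
       Reading the results back differs from pcre_exec() in that  every  pair
       describes a complete match rather than a captured substring. The sketch
       below uses two hypothetical helpers and handles only a positive  return
       value.

```c
#include <assert.h>
#include <stddef.h>

/* Length in bytes of match `i` returned by pcre_dfa_exec(), where `rc`
   is its positive return value and match 0 is the longest. */
int dfa_match_length(const int *ovector, int rc, int i)
{
    assert(ovector != NULL && rc > 0 && i >= 0 && i < rc);
    return ovector[2 * i + 1] - ovector[2 * i];
}

/* Returns 1 if the pairs have the documented shape: all matches start
   at the same offset and appear in strictly decreasing order of length. */
int dfa_results_well_formed(const int *ovector, int rc)
{
    int i;
    assert(ovector != NULL && rc > 0);
    for (i = 1; i < rc; i++) {
        if (ovector[2 * i] != ovector[0])
            return 0;              /* start offsets differ */
        if (ovector[2 * i + 1] >= ovector[2 * i - 1])
            return 0;              /* not shorter than the previous match */
    }
    return 1;
}
```

       For the <.*> example above, the first "<" is at offset 8, so  the  three
       pairs  would  be  (8,56),  (8,36), and (8,19), giving lengths of 48, 28,
       and 11 bytes. These offsets were worked out by hand for this sketch.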
1693
1694   Error returns from pcre_dfa_exec()
1695
1696       The pcre_dfa_exec() function returns a negative number when  it  fails.
1697       Many  of  the  errors  are  the  same as for pcre_exec(), and these are
1698       described above.  There are in addition the following errors  that  are
1699       specific to pcre_dfa_exec():
1700
1701         PCRE_ERROR_DFA_UITEM      (-16)
1702
1703       This  return is given if pcre_dfa_exec() encounters an item in the pat‐
1704       tern that it does not support, for instance, the use of \C  or  a  back
1705       reference.
1706
1707         PCRE_ERROR_DFA_UCOND      (-17)
1708
1709       This  return  is  given  if pcre_dfa_exec() encounters a condition item
1710       that uses a back reference for the condition, or a test  for  recursion
1711       in a specific group. These are not supported.
1712
1713         PCRE_ERROR_DFA_UMLIMIT    (-18)
1714
1715       This  return  is given if pcre_dfa_exec() is called with an extra block
1716       that contains a setting of the match_limit field. This is not supported
1717       (it is meaningless).
1718
1719         PCRE_ERROR_DFA_WSSIZE     (-19)
1720
1721       This  return  is  given  if  pcre_dfa_exec()  runs  out of space in the
1722       workspace vector.
1723
1724         PCRE_ERROR_DFA_RECURSE    (-20)
1725
1726       When a recursive subpattern is processed, the matching  function  calls
1727       itself  recursively,  using  private vectors for ovector and workspace.
1728       This error is given if the output vector  is  not  large  enough.  This
1729       should be extremely rare, as a vector of size 1000 is used.
1730

SEE ALSO

1732
1733       pcrebuild(3), pcrecallout(3),  pcrecpp(3),  pcrematching(3),  pcrepar‐
1734       tial(3), pcreposix(3), pcreprecompile(3), pcresample(3), pcrestack(3).
1735

AUTHOR

1737
1738       Philip Hazel
1739       University Computing Service
1740       Cambridge CB2 3QH, England.
1741

REVISION

1743
1744       Last updated: 21 August 2007
1745       Copyright (c) 1997-2007 University of Cambridge.
1746
1747
1748
1749                                                                    PCREAPI(3)