PCREAPI(3) Library Functions Manual PCREAPI(3)

6 PCRE - Perl-compatible regular expressions
7
9
#include <pcre.h>

pcre *pcre_compile(const char *pattern, int options,
     const char **errptr, int *erroffset,
     const unsigned char *tableptr);

pcre *pcre_compile2(const char *pattern, int options,
     int *errorcodeptr,
     const char **errptr, int *erroffset,
     const unsigned char *tableptr);

pcre_extra *pcre_study(const pcre *code, int options,
     const char **errptr);

int pcre_exec(const pcre *code, const pcre_extra *extra,
     const char *subject, int length, int startoffset,
     int options, int *ovector, int ovecsize);

int pcre_dfa_exec(const pcre *code, const pcre_extra *extra,
     const char *subject, int length, int startoffset,
     int options, int *ovector, int ovecsize,
     int *workspace, int wscount);

int pcre_copy_named_substring(const pcre *code,
     const char *subject, int *ovector,
     int stringcount, const char *stringname,
     char *buffer, int buffersize);

int pcre_copy_substring(const char *subject, int *ovector,
     int stringcount, int stringnumber, char *buffer,
     int buffersize);

int pcre_get_named_substring(const pcre *code,
     const char *subject, int *ovector,
     int stringcount, const char *stringname,
     const char **stringptr);

int pcre_get_stringnumber(const pcre *code,
     const char *name);

int pcre_get_stringtable_entries(const pcre *code,
     const char *name, char **first, char **last);

int pcre_get_substring(const char *subject, int *ovector,
     int stringcount, int stringnumber,
     const char **stringptr);

int pcre_get_substring_list(const char *subject,
     int *ovector, int stringcount, const char ***listptr);

void pcre_free_substring(const char *stringptr);

void pcre_free_substring_list(const char **stringptr);

const unsigned char *pcre_maketables(void);

int pcre_fullinfo(const pcre *code, const pcre_extra *extra,
     int what, void *where);

int pcre_info(const pcre *code, int *optptr, int *firstcharptr);

int pcre_refcount(pcre *code, int adjust);

int pcre_config(int what, void *where);

char *pcre_version(void);

void *(*pcre_malloc)(size_t);

void (*pcre_free)(void *);

void *(*pcre_stack_malloc)(size_t);

void (*pcre_stack_free)(void *);

int (*pcre_callout)(pcre_callout_block *);
86
88
89 PCRE has its own native API, which is described in this document. There
90 are also some wrapper functions that correspond to the POSIX regular
91 expression API. These are described in the pcreposix documentation.
92 Both of these APIs define a set of C function calls. A C++ wrapper is
93 distributed with PCRE. It is documented in the pcrecpp page.
94
95 The native API C function prototypes are defined in the header file
96 pcre.h, and on Unix systems the library itself is called libpcre. It
97 can normally be accessed by adding -lpcre to the command for linking an
98 application that uses PCRE. The header file defines the macros
99 PCRE_MAJOR and PCRE_MINOR to contain the major and minor release num‐
100 bers for the library. Applications can use these to include support
101 for different releases of PCRE.
102
103 The functions pcre_compile(), pcre_compile2(), pcre_study(), and
104 pcre_exec() are used for compiling and matching regular expressions in
105 a Perl-compatible manner. A sample program that demonstrates the sim‐
106 plest way of using them is provided in the file called pcredemo.c in
107 the source distribution. The pcresample documentation describes how to
108 run it.
109
110 A second matching function, pcre_dfa_exec(), which is not Perl-compati‐
111 ble, is also provided. This uses a different algorithm for the match‐
112 ing. The alternative algorithm finds all possible matches (at a given
113 point in the subject), and scans the subject just once. However, this
114 algorithm does not return captured substrings. A description of the two
115 matching algorithms and their advantages and disadvantages is given in
116 the pcrematching documentation.
117
118 In addition to the main compiling and matching functions, there are
119 convenience functions for extracting captured substrings from a subject
120 string that is matched by pcre_exec(). They are:
121
122 pcre_copy_substring()
123 pcre_copy_named_substring()
124 pcre_get_substring()
125 pcre_get_named_substring()
126 pcre_get_substring_list()
127 pcre_get_stringnumber()
128 pcre_get_stringtable_entries()
129
130 pcre_free_substring() and pcre_free_substring_list() are also provided,
131 to free the memory used for extracted strings.
132
133 The function pcre_maketables() is used to build a set of character
134 tables in the current locale for passing to pcre_compile(),
135 pcre_exec(), or pcre_dfa_exec(). This is an optional facility that is
136 provided for specialist use. Most commonly, no special tables are
137 passed, in which case internal tables that are generated when PCRE is
138 built are used.
139
140 The function pcre_fullinfo() is used to find out information about a
141 compiled pattern; pcre_info() is an obsolete version that returns only
142 some of the available information, but is retained for backwards com‐
143 patibility. The function pcre_version() returns a pointer to a string
144 containing the version of PCRE and its date of release.
145
146 The function pcre_refcount() maintains a reference count in a data
147 block containing a compiled pattern. This is provided for the benefit
148 of object-oriented applications.
149
150 The global variables pcre_malloc and pcre_free initially contain the
151 entry points of the standard malloc() and free() functions, respec‐
152 tively. PCRE calls the memory management functions via these variables,
153 so a calling program can replace them if it wishes to intercept the
154 calls. This should be done before calling any PCRE functions.
155
156 The global variables pcre_stack_malloc and pcre_stack_free are also
157 indirections to memory management functions. These special functions
158 are used only when PCRE is compiled to use the heap for remembering
159 data, instead of recursive function calls, when running the pcre_exec()
160 function. See the pcrebuild documentation for details of how to do
161 this. It is a non-standard way of building PCRE, for use in environ‐
162 ments that have limited stacks. Because of the greater use of memory
163 management, it runs more slowly. Separate functions are provided so
164 that special-purpose external code can be used for this case. When
165 used, these functions are always called in a stack-like manner (last
166 obtained, first freed), and always for memory blocks of the same size.
167 There is a discussion about PCRE's stack usage in the pcrestack docu‐
168 mentation.
169
170 The global variable pcre_callout initially contains NULL. It can be set
171 by the caller to a "callout" function, which PCRE will then call at
172 specified points during a matching operation. Details are given in the
173 pcrecallout documentation.
174
176
177 PCRE supports five different conventions for indicating line breaks in
178 strings: a single CR (carriage return) character, a single LF (line‐
179 feed) character, the two-character sequence CRLF, any of the three pre‐
180 ceding, or any Unicode newline sequence. The Unicode newline sequences
181 are the three just mentioned, plus the single characters VT (vertical
182 tab, U+000B), FF (formfeed, U+000C), NEL (next line, U+0085), LS (line
183 separator, U+2028), and PS (paragraph separator, U+2029).
184
Each of the first three conventions is used by at least one operating
system as its standard newline sequence. When PCRE is built, a default
can be specified. If none is specified, the default is LF, which is the
Unix standard. When PCRE is run, the default can be overridden, either
when a pattern is compiled, or when it is matched.
190
191 At compile time, the newline convention can be specified by the options
192 argument of pcre_compile(), or it can be specified by special text at
193 the start of the pattern itself; this overrides any other settings. See
194 the pcrepattern page for details of the special character sequences.
195
196 In the PCRE documentation the word "newline" is used to mean "the char‐
197 acter or pair of characters that indicate a line break". The choice of
198 newline convention affects the handling of the dot, circumflex, and
199 dollar metacharacters, the handling of #-comments in /x mode, and, when
200 CRLF is a recognized line ending sequence, the match position advance‐
201 ment for a non-anchored pattern. There is more detail about this in the
202 section on pcre_exec() options below. The choice of newline convention
203 does not affect the interpretation of the \n or \r escape sequences.
204
206
207 The PCRE functions can be used in multi-threading applications, with
208 the proviso that the memory management functions pointed to by
209 pcre_malloc, pcre_free, pcre_stack_malloc, and pcre_stack_free, and the
210 callout function pointed to by pcre_callout, are shared by all threads.
211
212 The compiled form of a regular expression is not altered during match‐
213 ing, so the same compiled pattern can safely be used by several threads
214 at once.
215
217
218 The compiled form of a regular expression can be saved and re-used at a
219 later time, possibly by a different program, and even on a host other
220 than the one on which it was compiled. Details are given in the
221 pcreprecompile documentation. However, compiling a regular expression
222 with one version of PCRE for use with a different version is not guar‐
223 anteed to work and may cause crashes.
224
226
int pcre_config(int what, void *where);
228
229 The function pcre_config() makes it possible for a PCRE client to dis‐
230 cover which optional features have been compiled into the PCRE library.
231 The pcrebuild documentation has more details about these optional fea‐
232 tures.
233
234 The first argument for pcre_config() is an integer, specifying which
235 information is required; the second argument is a pointer to a variable
236 into which the information is placed. The following information is
237 available:
238
239 PCRE_CONFIG_UTF8
240
241 The output is an integer that is set to one if UTF-8 support is avail‐
242 able; otherwise it is set to zero.
243
244 PCRE_CONFIG_UNICODE_PROPERTIES
245
246 The output is an integer that is set to one if support for Unicode
247 character properties is available; otherwise it is set to zero.
248
249 PCRE_CONFIG_NEWLINE
250
The output is an integer whose value specifies the default character
sequence that is recognized as meaning "newline". The five values that
are supported are: 10 for LF, 13 for CR, 3338 for CRLF, -2 for ANYCRLF,
and -1 for ANY. The default should normally be the standard sequence
for your operating system.
256
257 PCRE_CONFIG_LINK_SIZE
258
259 The output is an integer that contains the number of bytes used for
260 internal linkage in compiled regular expressions. The value is 2, 3, or
261 4. Larger values allow larger regular expressions to be compiled, at
262 the expense of slower matching. The default value of 2 is sufficient
263 for all but the most massive patterns, since it allows the compiled
264 pattern to be up to 64K in size.
265
266 PCRE_CONFIG_POSIX_MALLOC_THRESHOLD
267
268 The output is an integer that contains the threshold above which the
269 POSIX interface uses malloc() for output vectors. Further details are
270 given in the pcreposix documentation.
271
272 PCRE_CONFIG_MATCH_LIMIT
273
274 The output is an integer that gives the default limit for the number of
275 internal matching function calls in a pcre_exec() execution. Further
276 details are given with pcre_exec() below.
277
278 PCRE_CONFIG_MATCH_LIMIT_RECURSION
279
280 The output is an integer that gives the default limit for the depth of
281 recursion when calling the internal matching function in a pcre_exec()
282 execution. Further details are given with pcre_exec() below.
283
284 PCRE_CONFIG_STACKRECURSE
285
286 The output is an integer that is set to one if internal recursion when
287 running pcre_exec() is implemented by recursive function calls that use
288 the stack to remember their state. This is the usual way that PCRE is
289 compiled. The output is zero if PCRE was compiled to use blocks of data
290 on the heap instead of recursive function calls. In this case,
291 pcre_stack_malloc and pcre_stack_free are called to manage memory
292 blocks on the heap, thus avoiding the use of the stack.
293
295
pcre *pcre_compile(const char *pattern, int options,
     const char **errptr, int *erroffset,
     const unsigned char *tableptr);

pcre *pcre_compile2(const char *pattern, int options,
     int *errorcodeptr,
     const char **errptr, int *erroffset,
     const unsigned char *tableptr);
304
305 Either of the functions pcre_compile() or pcre_compile2() can be called
306 to compile a pattern into an internal form. The only difference between
307 the two interfaces is that pcre_compile2() has an additional argument,
308 errorcodeptr, via which a numerical error code can be returned.
309
310 The pattern is a C string terminated by a binary zero, and is passed in
311 the pattern argument. A pointer to a single block of memory that is
312 obtained via pcre_malloc is returned. This contains the compiled code
313 and related data. The pcre type is defined for the returned block; this
314 is a typedef for a structure whose contents are not externally defined.
315 It is up to the caller to free the memory (via pcre_free) when it is no
316 longer required.
317
318 Although the compiled code of a PCRE regex is relocatable, that is, it
319 does not depend on memory location, the complete pcre data block is not
320 fully relocatable, because it may contain a copy of the tableptr argu‐
321 ment, which is an address (see below).
322
The options argument contains various bit settings that affect the
compilation. It should be zero if no options are required. The
available options are described below. Some of them, in particular
those that are compatible with Perl, can also be set and unset from
within the pattern (see the detailed description in the pcrepattern
documentation). For these options, the contents of the options argument
specify their initial settings at the start of compilation and
execution. The PCRE_ANCHORED and PCRE_NEWLINE_xxx options can be set at
the time of matching as well as at compile time.
332
333 If errptr is NULL, pcre_compile() returns NULL immediately. Otherwise,
334 if compilation of a pattern fails, pcre_compile() returns NULL, and
335 sets the variable pointed to by errptr to point to a textual error mes‐
336 sage. This is a static string that is part of the library. You must not
337 try to free it. The offset from the start of the pattern to the charac‐
338 ter where the error was discovered is placed in the variable pointed to
339 by erroffset, which must not be NULL. If it is, an immediate error is
340 given.
341
342 If pcre_compile2() is used instead of pcre_compile(), and the error‐
343 codeptr argument is not NULL, a non-zero error code number is returned
344 via this argument in the event of an error. This is in addition to the
345 textual error message. Error codes and messages are listed below.
346
347 If the final argument, tableptr, is NULL, PCRE uses a default set of
348 character tables that are built when PCRE is compiled, using the
349 default C locale. Otherwise, tableptr must be an address that is the
350 result of a call to pcre_maketables(). This value is stored with the
351 compiled pattern, and used again by pcre_exec(), unless another table
352 pointer is passed to it. For more discussion, see the section on locale
353 support below.
354
355 This code fragment shows a typical straightforward call to pcre_com‐
356 pile():
357
  pcre *re;
  const char *error;
  int erroffset;
  re = pcre_compile(
    "^A.*Z",          /* the pattern */
    0,                /* default options */
    &error,           /* for error message */
    &erroffset,       /* for error offset */
    NULL);            /* use default character tables */
367
368 The following names for option bits are defined in the pcre.h header
369 file:
370
371 PCRE_ANCHORED
372
373 If this bit is set, the pattern is forced to be "anchored", that is, it
374 is constrained to match only at the first matching point in the string
375 that is being searched (the "subject string"). This effect can also be
376 achieved by appropriate constructs in the pattern itself, which is the
377 only way to do it in Perl.
378
379 PCRE_AUTO_CALLOUT
380
381 If this bit is set, pcre_compile() automatically inserts callout items,
382 all with number 255, before each pattern item. For discussion of the
383 callout facility, see the pcrecallout documentation.
384
385 PCRE_CASELESS
386
387 If this bit is set, letters in the pattern match both upper and lower
388 case letters. It is equivalent to Perl's /i option, and it can be
389 changed within a pattern by a (?i) option setting. In UTF-8 mode, PCRE
390 always understands the concept of case for characters whose values are
391 less than 128, so caseless matching is always possible. For characters
392 with higher values, the concept of case is supported if PCRE is com‐
393 piled with Unicode property support, but not otherwise. If you want to
394 use caseless matching for characters 128 and above, you must ensure
395 that PCRE is compiled with Unicode property support as well as with
396 UTF-8 support.
397
398 PCRE_DOLLAR_ENDONLY
399
400 If this bit is set, a dollar metacharacter in the pattern matches only
401 at the end of the subject string. Without this option, a dollar also
402 matches immediately before a newline at the end of the string (but not
403 before any other newlines). The PCRE_DOLLAR_ENDONLY option is ignored
404 if PCRE_MULTILINE is set. There is no equivalent to this option in
405 Perl, and no way to set it within a pattern.
406
407 PCRE_DOTALL
408
If this bit is set, a dot metacharacter in the pattern matches all
characters, including those that indicate newline. Without it, a dot
does not match when the current position is at a newline. This option
is equivalent to Perl's /s option, and it can be changed within a
pattern by a (?s) option setting. A negative class such as [^a] always
matches newline characters, independent of the setting of this option.
415
416 PCRE_DUPNAMES
417
418 If this bit is set, names used to identify capturing subpatterns need
419 not be unique. This can be helpful for certain types of pattern when it
420 is known that only one instance of the named subpattern can ever be
421 matched. There are more details of named subpatterns below; see also
422 the pcrepattern documentation.
423
424 PCRE_EXTENDED
425
426 If this bit is set, whitespace data characters in the pattern are
427 totally ignored except when escaped or inside a character class. White‐
428 space does not include the VT character (code 11). In addition, charac‐
429 ters between an unescaped # outside a character class and the next new‐
430 line, inclusive, are also ignored. This is equivalent to Perl's /x
431 option, and it can be changed within a pattern by a (?x) option set‐
432 ting.
433
434 This option makes it possible to include comments inside complicated
435 patterns. Note, however, that this applies only to data characters.
436 Whitespace characters may never appear within special character
437 sequences in a pattern, for example within the sequence (?( which
438 introduces a conditional subpattern.
439
440 PCRE_EXTRA
441
442 This option was invented in order to turn on additional functionality
443 of PCRE that is incompatible with Perl, but it is currently of very
444 little use. When set, any backslash in a pattern that is followed by a
445 letter that has no special meaning causes an error, thus reserving
446 these combinations for future expansion. By default, as in Perl, a
447 backslash followed by a letter with no special meaning is treated as a
448 literal. (Perl can, however, be persuaded to give a warning for this.)
449 There are at present no other features controlled by this option. It
450 can also be set by a (?X) option setting within a pattern.
451
452 PCRE_FIRSTLINE
453
454 If this option is set, an unanchored pattern is required to match
455 before or at the first newline in the subject string, though the
456 matched text may continue over the newline.
457
458 PCRE_MULTILINE
459
460 By default, PCRE treats the subject string as consisting of a single
461 line of characters (even if it actually contains newlines). The "start
462 of line" metacharacter (^) matches only at the start of the string,
463 while the "end of line" metacharacter ($) matches only at the end of
464 the string, or before a terminating newline (unless PCRE_DOLLAR_ENDONLY
465 is set). This is the same as Perl.
466
When PCRE_MULTILINE is set, the "start of line" and "end of line"
constructs match immediately following or immediately before internal
newlines in the subject string, respectively, as well as at the very
start and end. This is equivalent to Perl's /m option, and it can be
changed within a pattern by a (?m) option setting. If there are no
newlines in a subject string, or no occurrences of ^ or $ in a pattern,
setting PCRE_MULTILINE has no effect.
474
475 PCRE_NEWLINE_CR
476 PCRE_NEWLINE_LF
477 PCRE_NEWLINE_CRLF
478 PCRE_NEWLINE_ANYCRLF
479 PCRE_NEWLINE_ANY
480
481 These options override the default newline definition that was chosen
482 when PCRE was built. Setting the first or the second specifies that a
483 newline is indicated by a single character (CR or LF, respectively).
484 Setting PCRE_NEWLINE_CRLF specifies that a newline is indicated by the
485 two-character CRLF sequence. Setting PCRE_NEWLINE_ANYCRLF specifies
486 that any of the three preceding sequences should be recognized. Setting
487 PCRE_NEWLINE_ANY specifies that any Unicode newline sequence should be
488 recognized. The Unicode newline sequences are the three just mentioned,
489 plus the single characters VT (vertical tab, U+000B), FF (formfeed,
490 U+000C), NEL (next line, U+0085), LS (line separator, U+2028), and PS
491 (paragraph separator, U+2029). The last two are recognized only in
492 UTF-8 mode.
493
494 The newline setting in the options word uses three bits that are
495 treated as a number, giving eight possibilities. Currently only six are
496 used (default plus the five values above). This means that if you set
497 more than one newline option, the combination may or may not be sensi‐
498 ble. For example, PCRE_NEWLINE_CR with PCRE_NEWLINE_LF is equivalent to
499 PCRE_NEWLINE_CRLF, but other combinations may yield unused numbers and
500 cause an error.
501
502 The only time that a line break is specially recognized when compiling
503 a pattern is if PCRE_EXTENDED is set, and an unescaped # outside a
504 character class is encountered. This indicates a comment that lasts
505 until after the next line break sequence. In other circumstances, line
506 break sequences are treated as literal data, except that in
507 PCRE_EXTENDED mode, both CR and LF are treated as whitespace characters
508 and are therefore ignored.
509
510 The newline option that is set at compile time becomes the default that
511 is used for pcre_exec() and pcre_dfa_exec(), but it can be overridden.
512
513 PCRE_NO_AUTO_CAPTURE
514
515 If this option is set, it disables the use of numbered capturing paren‐
516 theses in the pattern. Any opening parenthesis that is not followed by
517 ? behaves as if it were followed by ?: but named parentheses can still
518 be used for capturing (and they acquire numbers in the usual way).
519 There is no equivalent of this option in Perl.
520
521 PCRE_UNGREEDY
522
523 This option inverts the "greediness" of the quantifiers so that they
524 are not greedy by default, but become greedy if followed by "?". It is
525 not compatible with Perl. It can also be set by a (?U) option setting
526 within the pattern.
527
528 PCRE_UTF8
529
530 This option causes PCRE to regard both the pattern and the subject as
531 strings of UTF-8 characters instead of single-byte character strings.
532 However, it is available only when PCRE is built to include UTF-8 sup‐
533 port. If not, the use of this option provokes an error. Details of how
534 this option changes the behaviour of PCRE are given in the section on
535 UTF-8 support in the main pcre page.
536
537 PCRE_NO_UTF8_CHECK
538
539 When PCRE_UTF8 is set, the validity of the pattern as a UTF-8 string is
540 automatically checked. There is a discussion about the validity of
541 UTF-8 strings in the main pcre page. If an invalid UTF-8 sequence of
542 bytes is found, pcre_compile() returns an error. If you already know
543 that your pattern is valid, and you want to skip this check for perfor‐
544 mance reasons, you can set the PCRE_NO_UTF8_CHECK option. When it is
545 set, the effect of passing an invalid UTF-8 string as a pattern is
546 undefined. It may cause your program to crash. Note that this option
547 can also be passed to pcre_exec() and pcre_dfa_exec(), to suppress the
548 UTF-8 validity checking of subject strings.
549
551
The following table lists the error codes that may be returned by
pcre_compile2(), along with the error messages that may be returned by
both compiling functions. As PCRE has developed, some error codes have
fallen out of use. To avoid confusion, they have not been re-used.

   0  no error
   1  \ at end of pattern
   2  \c at end of pattern
   3  unrecognized character follows \
   4  numbers out of order in {} quantifier
   5  number too big in {} quantifier
   6  missing terminating ] for character class
   7  invalid escape sequence in character class
   8  range out of order in character class
   9  nothing to repeat
  10  [this code is not in use]
  11  internal error: unexpected repeat
  12  unrecognized character after (?
  13  POSIX named classes are supported only within a class
  14  missing )
  15  reference to non-existent subpattern
  16  erroffset passed as NULL
  17  unknown option bit(s) set
  18  missing ) after comment
  19  [this code is not in use]
  20  regular expression too large
  21  failed to get memory
  22  unmatched parentheses
  23  internal error: code overflow
  24  unrecognized character after (?<
  25  lookbehind assertion is not fixed length
  26  malformed number or name after (?(
  27  conditional group contains more than two branches
  28  assertion expected after (?(
  29  (?R or (?[+-]digits must be followed by )
  30  unknown POSIX class name
  31  POSIX collating elements are not supported
  32  this version of PCRE is not compiled with PCRE_UTF8 support
  33  [this code is not in use]
  34  character value in \x{...} sequence is too large
  35  invalid condition (?(0)
  36  \C not allowed in lookbehind assertion
  37  PCRE does not support \L, \l, \N, \U, or \u
  38  number after (?C is > 255
  39  closing ) for (?C expected
  40  recursive call could loop indefinitely
  41  unrecognized character after (?P
  42  syntax error in subpattern name (missing terminator)
  43  two named subpatterns have the same name
  44  invalid UTF-8 string
  45  support for \P, \p, and \X has not been compiled
  46  malformed \P or \p sequence
  47  unknown property name after \P or \p
  48  subpattern name is too long (maximum 32 characters)
  49  too many named subpatterns (maximum 10,000)
  50  [this code is not in use]
  51  octal value is greater than \377 (not in UTF-8 mode)
  52  internal error: overran compiling workspace
  53  internal error: previously-checked referenced subpattern not
      found
  54  DEFINE group contains more than one branch
  55  repeating a DEFINE group is not allowed
  56  inconsistent NEWLINE options
  57  \g is not followed by a braced name or an optionally braced
      non-zero number
  58  (?+ or (?- or (?(+ or (?(- must be followed by a non-zero number
618
620
pcre_extra *pcre_study(const pcre *code, int options,
     const char **errptr);
623
624 If a compiled pattern is going to be used several times, it is worth
625 spending more time analyzing it in order to speed up the time taken for
626 matching. The function pcre_study() takes a pointer to a compiled pat‐
627 tern as its first argument. If studying the pattern produces additional
628 information that will help speed up matching, pcre_study() returns a
629 pointer to a pcre_extra block, in which the study_data field points to
630 the results of the study.
631
632 The returned value from pcre_study() can be passed directly to
633 pcre_exec(). However, a pcre_extra block also contains other fields
634 that can be set by the caller before the block is passed; these are
635 described below in the section on matching a pattern.
636
If studying the pattern does not produce any additional information,
pcre_study() returns NULL. In that circumstance, if the calling program
wants to pass any of the other fields to pcre_exec(), it must set up
its own pcre_extra block.
641
642 The second argument of pcre_study() contains option bits. At present,
643 no options are defined, and this argument should always be zero.
644
645 The third argument for pcre_study() is a pointer for an error message.
646 If studying succeeds (even if no data is returned), the variable it
647 points to is set to NULL. Otherwise it is set to point to a textual
648 error message. This is a static string that is part of the library. You
649 must not try to free it. You should test the error pointer for NULL
650 after calling pcre_study(), to be sure that it has run successfully.
651
652 This is a typical call to pcre_study():
653
  pcre_extra *pe;
  pe = pcre_study(
    re,             /* result of pcre_compile() */
    0,              /* no options exist */
    &error);        /* set to NULL or points to a message */
659
660 At present, studying a pattern is useful only for non-anchored patterns
661 that do not have a single fixed starting character. A bitmap of possi‐
662 ble starting bytes is created.
663
665
PCRE handles caseless matching, and determines whether characters are
letters, digits, or members of other character classes, by reference to
a set of tables, indexed by character value. When running in UTF-8
mode, this applies only to characters with codes less than 128.
Higher-valued codes never match escapes such as \w or \d, but can be
tested with \p if PCRE is built with Unicode character property
support. The use of locales with Unicode is discouraged. If you are
handling characters with codes of 128 or greater, you should either use
UTF-8 and Unicode, or use locales, but not try to mix the two.
675
676 PCRE contains an internal set of tables that are used when the final
677 argument of pcre_compile() is NULL. These are sufficient for many
678 applications. Normally, the internal tables recognize only ASCII char‐
679 acters. However, when PCRE is built, it is possible to cause the inter‐
680 nal tables to be rebuilt in the default "C" locale of the local system,
681 which may cause them to be different.
682
683 The internal tables can always be overridden by tables supplied by the
684 application that calls PCRE. These may be created in a different locale
685 from the default. As more and more applications change to using Uni‐
686 code, the need for this locale support is expected to die away.
687
688 External tables are built by calling the pcre_maketables() function,
689 which has no arguments, in the relevant locale. The result can then be
690 passed to pcre_compile() or pcre_exec() as often as necessary. For
691 example, to build and use tables that are appropriate for the French
692 locale (where accented characters with values greater than 128 are
693 treated as letters), the following code could be used:
694
695 setlocale(LC_CTYPE, "fr_FR");
696 tables = pcre_maketables();
697 re = pcre_compile(..., tables);
698
699 The locale name "fr_FR" is used on Linux and other Unix-like systems;
700 if you are using Windows, the name for the French locale is "french".
701
702 When pcre_maketables() runs, the tables are built in memory that is
703 obtained via pcre_malloc. It is the caller's responsibility to ensure
704 that the memory containing the tables remains available for as long as
705 it is needed.
706
707 The pointer that is passed to pcre_compile() is saved with the compiled
708 pattern, and the same tables are used via this pointer by pcre_study()
709 and normally also by pcre_exec(). Thus, by default, for any single pat‐
710 tern, compilation, studying and matching all happen in the same locale,
711 but different patterns can be compiled in different locales.
712
713 It is possible to pass a table pointer or NULL (indicating the use of
714 the internal tables) to pcre_exec(). Although not intended for this
715 purpose, this facility could be used to match a pattern in a different
716 locale from the one in which it was compiled. Passing table pointers at
717 run time is discussed below in the section on matching a pattern.
718
720
721 int pcre_fullinfo(const pcre *code, const pcre_extra *extra,
722 int what, void *where);
723
724 The pcre_fullinfo() function returns information about a compiled
725 pattern. It replaces the obsolete pcre_info() function, which is
726 nevertheless retained for backwards compatibility (and is documented below).
727
728 The first argument for pcre_fullinfo() is a pointer to the compiled
729 pattern. The second argument is the result of pcre_study(), or NULL if
730 the pattern was not studied. The third argument specifies which piece
731 of information is required, and the fourth argument is a pointer to a
732 variable to receive the data. The yield of the function is zero for
733 success, or one of the following negative numbers:
734
735 PCRE_ERROR_NULL the argument code was NULL
736 the argument where was NULL
737 PCRE_ERROR_BADMAGIC the "magic number" was not found
738 PCRE_ERROR_BADOPTION the value of what was invalid
739
740 The "magic number" is placed at the start of each compiled pattern as
741 a simple check against passing an arbitrary memory pointer. Here is a
742 typical call of pcre_fullinfo(), to obtain the length of the compiled
743 pattern:
744
745 int rc;
746 size_t length;
747 rc = pcre_fullinfo(
748 re, /* result of pcre_compile() */
749 pe, /* result of pcre_study(), or NULL */
750 PCRE_INFO_SIZE, /* what is required */
751 &length); /* where to put the data */
752
753 The possible values for the third argument are defined in pcre.h, and
754 are as follows:
755
756 PCRE_INFO_BACKREFMAX
757
758 Return the number of the highest back reference in the pattern. The
759 fourth argument should point to an int variable. Zero is returned if
760 there are no back references.
761
762 PCRE_INFO_CAPTURECOUNT
763
764 Return the number of capturing subpatterns in the pattern. The fourth
765 argument should point to an int variable.
766
767 PCRE_INFO_DEFAULT_TABLES
768
769 Return a pointer to the internal default character tables within PCRE.
770 The fourth argument should point to an unsigned char * variable. This
771 information call is provided for internal use by the pcre_study() func‐
772 tion. External callers can cause PCRE to use its internal tables by
773 passing a NULL table pointer.
774
775 PCRE_INFO_FIRSTBYTE
776
777 Return information about the first byte of any matched string, for a
778 non-anchored pattern. The fourth argument should point to an int vari‐
779 able. (This option used to be called PCRE_INFO_FIRSTCHAR; the old name
780 is still recognized for backwards compatibility.)
781
782 If there is a fixed first byte, for example, from a pattern such as
783 (cat|cow|coyote), its value is returned. Otherwise, if either
784
785 (a) the pattern was compiled with the PCRE_MULTILINE option, and every
786 branch starts with "^", or
787
788 (b) every branch of the pattern starts with ".*" and PCRE_DOTALL is not
789 set (if it were set, the pattern would be anchored),
790
791 -1 is returned, indicating that the pattern matches only at the start
792 of a subject string or after any newline within the string. Otherwise
793 -2 is returned. For anchored patterns, -2 is returned.
794
795 PCRE_INFO_FIRSTTABLE
796
797 If the pattern was studied, and this resulted in the construction of a
798 256-bit table indicating a fixed set of bytes for the first byte in any
799 matching string, a pointer to the table is returned. Otherwise NULL is
800 returned. The fourth argument should point to an unsigned char * vari‐
801 able.
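
As a rough illustration of how a 256-bit table of this kind can be consulted, the sketch below tests whether a byte value is marked in a 32-byte bitmap. The helper name first_byte_possible() and the bit layout (bit c % 8 of byte c / 8 stands for value c) are assumptions made for this example; the text above does not specify the layout.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if byte value c is marked in a 256-bit
   (32-byte) bitmap, 0 otherwise. Assumes that bit (c % 8) of byte (c / 8)
   represents value c. A NULL table is taken to mean that no table was
   built, so nothing is known and any byte may start a match. */
int first_byte_possible(const unsigned char *table, unsigned char c)
{
    if (table == NULL)
        return 1;
    return (table[c / 8] & (1u << (c % 8))) != 0;
}
```

The table pointer would be the value obtained through PCRE_INFO_FIRSTTABLE, which is NULL when the pattern was not studied or no table resulted.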
802
803 PCRE_INFO_HASCRORLF
804
805 Return 1 if the pattern contains any explicit matches for CR or LF
806 characters, otherwise 0. The fourth argument should point to an int
807 variable.
808
809 PCRE_INFO_JCHANGED
810
811 Return 1 if the (?J) option setting is used in the pattern, otherwise
812 0. The fourth argument should point to an int variable. The (?J) inter‐
813 nal option setting changes the local PCRE_DUPNAMES option.
814
815 PCRE_INFO_LASTLITERAL
816
817 Return the value of the rightmost literal byte that must exist in any
818 matched string, other than at its start, if such a byte has been
819 recorded. The fourth argument should point to an int variable. If there
820 is no such byte, -1 is returned. For anchored patterns, a last literal
821 byte is recorded only if it follows something of variable length. For
822 example, for the pattern /^a\d+z\d+/ the returned value is "z", but for
823 /^a\dz\d/ the returned value is -1.
824
825 PCRE_INFO_NAMECOUNT
826 PCRE_INFO_NAMEENTRYSIZE
827 PCRE_INFO_NAMETABLE
828
829 PCRE supports the use of named as well as numbered capturing parenthe‐
830 ses. The names are just an additional way of identifying the parenthe‐
831 ses, which still acquire numbers. Several convenience functions such as
832 pcre_get_named_substring() are provided for extracting captured sub‐
833 strings by name. It is also possible to extract the data directly, by
834 first converting the name to a number in order to access the correct
835 pointers in the output vector (described with pcre_exec() below). To do
836 the conversion, you need to use the name-to-number map, which is
837 described by these three values.
838
839 The map consists of a number of fixed-size entries. PCRE_INFO_NAMECOUNT
840 gives the number of entries, and PCRE_INFO_NAMEENTRYSIZE gives the size
841 of each entry; both of these return an int value. The entry size
842 depends on the length of the longest name. PCRE_INFO_NAMETABLE returns
843 a pointer to the first entry of the table (a pointer to char). The
844 first two bytes of each entry are the number of the capturing parenthe‐
845 sis, most significant byte first. The rest of the entry is the corre‐
846 sponding name, zero terminated. The names are in alphabetical order.
847 When PCRE_DUPNAMES is set, duplicate names are in order of their paren‐
848 theses numbers. For example, consider the following pattern (assume
849 PCRE_EXTENDED is set, so white space - including newlines - is
850 ignored):
851
852 (?<date> (?<year>(\d\d)?\d\d) -
853 (?<month>\d\d) - (?<day>\d\d) )
854
855 There are four named subpatterns, so the table has four entries, and
856 each entry in the table is eight bytes long. The table is as follows,
857 with non-printing bytes shown in hexadecimal, and undefined bytes shown
858 as ??:
859
860 00 01 d a t e 00 ??
861 00 05 d a y 00 ?? ??
862 00 04 m o n t h 00
863 00 02 y e a r 00 ??
864
865 When writing code to extract data from named subpatterns using the
866 name-to-number map, remember that the length of the entries is likely
867 to be different for each compiled pattern.
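
The entry layout described above can be walked with a few lines of C. The following is a minimal sketch, assuming the table pointer, entry count, and entry size have already been obtained (in a real program, via PCRE_INFO_NAMETABLE, PCRE_INFO_NAMECOUNT, and PCRE_INFO_NAMEENTRYSIZE). The function name lookup_group_number() is hypothetical; pcre_get_stringnumber() is the library's own interface for this conversion.

```c
#include <string.h>

/* Hypothetical helper: look up a group name in a name-to-number table.
   Each entry is entry_size bytes long: two bytes holding the group number
   (most significant byte first), followed by the zero-terminated name.
   Returns the group number, or -1 if the name is not in the table.
   With duplicate names, the first matching entry is returned. */
int lookup_group_number(const unsigned char *table, int name_count,
                        int entry_size, const char *name)
{
    int i;
    for (i = 0; i < name_count; i++) {
        const unsigned char *entry = table + (size_t)i * entry_size;
        if (strcmp((const char *)(entry + 2), name) == 0)
            return (entry[0] << 8) | entry[1];
    }
    return -1;
}
```

A linear scan keeps the sketch short; because the names are stored in alphabetical order, a binary search is also possible.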
868
869 PCRE_INFO_OKPARTIAL
870
871 Return 1 if the pattern can be used for partial matching, otherwise 0.
872 The fourth argument should point to an int variable. The pcrepartial
873 documentation lists the restrictions that apply to patterns when par‐
874 tial matching is used.
875
876 PCRE_INFO_OPTIONS
877
878 Return a copy of the options with which the pattern was compiled. The
879 fourth argument should point to an unsigned long int variable. These
880 option bits are those specified in the call to pcre_compile(), modified
881 by any top-level option settings at the start of the pattern itself. In
882 other words, they are the options that will be in force when matching
883 starts. For example, if the pattern /(?im)abc(?-i)d/ is compiled with
884 the PCRE_EXTENDED option, the result is PCRE_CASELESS, PCRE_MULTILINE,
885 and PCRE_EXTENDED.
886
887 A pattern is automatically anchored by PCRE if all of its top-level
888 alternatives begin with one of the following:
889
890 ^ unless PCRE_MULTILINE is set
891 \A always
892 \G always
893 .* if PCRE_DOTALL is set and there are no back
894 references to the subpattern in which .* appears
895
896 For such patterns, the PCRE_ANCHORED bit is set in the options returned
897 by pcre_fullinfo().
898
899 PCRE_INFO_SIZE
900
901 Return the size of the compiled pattern, that is, the value that was
902 passed as the argument to pcre_malloc() when PCRE was getting memory in
903 which to place the compiled data. The fourth argument should point to a
904 size_t variable.
905
906 PCRE_INFO_STUDYSIZE
907
908 Return the size of the data block pointed to by the study_data field in
909 a pcre_extra block. That is, it is the value that was passed to
910 pcre_malloc() when PCRE was getting memory into which to place the data
911 created by pcre_study(). The fourth argument should point to a size_t
912 variable.
913
915
916 int pcre_info(const pcre *code, int *optptr, int *firstcharptr);
917
918 The pcre_info() function is now obsolete because its interface is too
919 restrictive to return all the available data about a compiled pattern.
920 New programs should use pcre_fullinfo() instead. The yield of
921 pcre_info() is the number of capturing subpatterns, or one of the fol‐
922 lowing negative numbers:
923
924 PCRE_ERROR_NULL the argument code was NULL
925 PCRE_ERROR_BADMAGIC the "magic number" was not found
926
927 If the optptr argument is not NULL, a copy of the options with which
928 the pattern was compiled is placed in the integer it points to (see
929 PCRE_INFO_OPTIONS above).
930
931 If the pattern is not anchored and the firstcharptr argument is not
932 NULL, it is used to pass back information about the first character of
933 any matched string (see PCRE_INFO_FIRSTBYTE above).
934
936
937 int pcre_refcount(pcre *code, int adjust);
938
939 The pcre_refcount() function is used to maintain a reference count in
940 the data block that contains a compiled pattern. It is provided for the
941 benefit of applications that operate in an object-oriented manner,
942 where different parts of the application may be using the same compiled
943 pattern, but you want to free the block when they are all done.
944
945 When a pattern is compiled, the reference count field is initialized to
946 zero. It is changed only by calling this function, whose action is to
947 add the adjust value (which may be positive or negative) to it. The
948 yield of the function is the new value. However, the value of the count
949 is constrained to lie between 0 and 65535, inclusive. If the new value
950 is outside these limits, it is forced to the appropriate limit value.
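
The clamping arithmetic can be modelled as follows. This is an illustrative sketch of the rule stated above, not PCRE's source code, and the function name adjusted_refcount() is hypothetical.

```c
/* Illustrative model only: add adjust to the current reference count and
   force the result into the range 0..65535, as the text above describes. */
int adjusted_refcount(int current, int adjust)
{
    long long value = (long long)current + adjust;
    if (value < 0)
        value = 0;
    if (value > 65535)
        value = 65535;
    return (int)value;
}
```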
951
952 Except when it is zero, the reference count is not correctly preserved
953 if a pattern is compiled on one host and then transferred to a host
954 whose byte-order is different. (This seems a highly unlikely scenario.)
955
957
958 int pcre_exec(const pcre *code, const pcre_extra *extra,
959 const char *subject, int length, int startoffset,
960 int options, int *ovector, int ovecsize);
961
962 The function pcre_exec() is called to match a subject string against a
963 compiled pattern, which is passed in the code argument. If the pattern
964 has been studied, the result of the study should be passed in the extra
965 argument. This function is the main matching facility of the library,
966 and it operates in a Perl-like manner. For specialist use there is also
967 an alternative matching function, which is described below in the sec‐
968 tion about the pcre_dfa_exec() function.
969
970 In most applications, the pattern will have been compiled (and option‐
971 ally studied) in the same process that calls pcre_exec(). However, it
972 is possible to save compiled patterns and study data, and then use them
973 later in different processes, possibly even on different hosts. For a
974 discussion about this, see the pcreprecompile documentation.
975
976 Here is an example of a simple call to pcre_exec():
977
978 int rc;
979 int ovector[30];
980 rc = pcre_exec(
981 re, /* result of pcre_compile() */
982 NULL, /* we didn't study the pattern */
983 "some string", /* the subject string */
984 11, /* the length of the subject string */
985 0, /* start at offset 0 in the subject */
986 0, /* default options */
987 ovector, /* vector of integers for substring information */
988 30); /* number of elements (NOT size in bytes) */
989
990 Extra data for pcre_exec()
991
992 If the extra argument is not NULL, it must point to a pcre_extra data
993 block. The pcre_study() function returns such a block (when it doesn't
994 return NULL), but you can also create one for yourself, and pass addi‐
995 tional information in it. The pcre_extra block contains the following
996 fields (not necessarily in this order):
997
998 unsigned long int flags;
999 void *study_data;
1000 unsigned long int match_limit;
1001 unsigned long int match_limit_recursion;
1002 void *callout_data;
1003 const unsigned char *tables;
1004
1005 The flags field is a bitmap that specifies which of the other fields
1006 are set. The flag bits are:
1007
1008 PCRE_EXTRA_STUDY_DATA
1009 PCRE_EXTRA_MATCH_LIMIT
1010 PCRE_EXTRA_MATCH_LIMIT_RECURSION
1011 PCRE_EXTRA_CALLOUT_DATA
1012 PCRE_EXTRA_TABLES
1013
1014 Other flag bits should be set to zero. The study_data field is set in
1015 the pcre_extra block that is returned by pcre_study(), together with
1016 the appropriate flag bit. You should not set this yourself, but you may
1017 add to the block by setting the other fields and their corresponding
1018 flag bits.
1019
1020 The match_limit field provides a means of preventing PCRE from using up
1021 a vast amount of resources when running patterns that are not going to
1022 match, but which have a very large number of possibilities in their
1023 search trees. The classic example is the use of nested unlimited
1024 repeats.
1025
1026 Internally, PCRE uses a function called match() which it calls repeat‐
1027 edly (sometimes recursively). The limit set by match_limit is imposed
1028 on the number of times this function is called during a match, which
1029 has the effect of limiting the amount of backtracking that can take
1030 place. For patterns that are not anchored, the count restarts from zero
1031 for each position in the subject string.
1032
1033 The default value for the limit can be set when PCRE is built; the
1034 default default is 10 million, which handles all but the most extreme
1035 cases. You can override the default by supplying pcre_exec() with a
1036 pcre_extra block in which match_limit is set, and
1037 PCRE_EXTRA_MATCH_LIMIT is set in the flags field. If the limit is
1038 exceeded, pcre_exec() returns PCRE_ERROR_MATCHLIMIT.
1039
1040 The match_limit_recursion field is similar to match_limit, but instead
1041 of limiting the total number of times that match() is called, it limits
1042 the depth of recursion. The recursion depth is a smaller number than
1043 the total number of calls, because not all calls to match() are recur‐
1044 sive. This limit is of use only if it is set smaller than match_limit.
1045
1046 Limiting the recursion depth limits the amount of stack that can be
1047 used, or, when PCRE has been compiled to use memory on the heap instead
1048 of the stack, the amount of heap memory that can be used.
1049
1050 The default value for match_limit_recursion can be set when PCRE is
1051 built; the default default is the same value as the default for
1052 match_limit. You can override the default by supplying pcre_exec() with
1053 a pcre_extra block in which match_limit_recursion is set, and
1054 PCRE_EXTRA_MATCH_LIMIT_RECURSION is set in the flags field. If the
1055 limit is exceeded, pcre_exec() returns PCRE_ERROR_RECURSIONLIMIT.
1056
1057 The callout_data field is used in conjunction with the "callout"
1058 feature, which is described in the pcrecallout documentation.
1059
1060 The tables field is used to pass a character tables pointer to
1061 pcre_exec(); this overrides the value that is stored with the compiled
1062 pattern. A non-NULL value is stored with the compiled pattern only if
1063 custom tables were supplied to pcre_compile() via its tableptr argu‐
1064 ment. If NULL is passed to pcre_exec() using this mechanism, it forces
1065 PCRE's internal tables to be used. This facility is helpful when re-
1066 using patterns that have been saved after compiling with an external
1067 set of tables, because the external tables might be at a different
1068 address when pcre_exec() is called. See the pcreprecompile documenta‐
1069 tion for a discussion of saving compiled patterns for later use.
1070
1071 Option bits for pcre_exec()
1072
1073 The unused bits of the options argument for pcre_exec() must be zero.
1074 The only bits that may be set are PCRE_ANCHORED, PCRE_NEWLINE_xxx,
1075 PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY, PCRE_NO_UTF8_CHECK and
1076 PCRE_PARTIAL.
1077
1078 PCRE_ANCHORED
1079
1080 The PCRE_ANCHORED option limits pcre_exec() to matching at the first
1081 matching position. If a pattern was compiled with PCRE_ANCHORED, or
1082 turned out to be anchored by virtue of its contents, it cannot be made
1083 unanchored at matching time.
1084
1085 PCRE_NEWLINE_CR
1086 PCRE_NEWLINE_LF
1087 PCRE_NEWLINE_CRLF
1088 PCRE_NEWLINE_ANYCRLF
1089 PCRE_NEWLINE_ANY
1090
1091 These options override the newline definition that was chosen or
1092 defaulted when the pattern was compiled. For details, see the descrip‐
1093 tion of pcre_compile() above. During matching, the newline choice
1094 affects the behaviour of the dot, circumflex, and dollar metacharac‐
1095 ters. It may also alter the way the match position is advanced after a
1096 match failure for an unanchored pattern.
1097
1098 When PCRE_NEWLINE_CRLF, PCRE_NEWLINE_ANYCRLF, or PCRE_NEWLINE_ANY is
1099 set, and a match attempt for an unanchored pattern fails when the cur‐
1100 rent position is at a CRLF sequence, and the pattern contains no
1101 explicit matches for CR or LF characters, the match position is
1102 advanced by two characters instead of one, in other words, to after the
1103 CRLF.
1104
1105 The above rule is a compromise that makes the most common cases work as
1106 expected. For example, if the pattern is .+A (and the PCRE_DOTALL
1107 option is not set), it does not match the string "\r\nA" because, after
1108 failing at the start, it skips both the CR and the LF before retrying.
1109 However, the pattern [\r\n]A does match that string, because it
1110 contains an explicit CR or LF reference, and so advances only by one
1111 character after the first failure. Note that an explicit CR or LF
1112 reference occurs for negated character classes such as [^X] because
1113 they can match CR or LF characters.
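
The advance rule can be summarised in code. The sketch below is an illustrative model of the rule as stated above, not PCRE's source; the function name and its two flag parameters are hypothetical, and it works in bytes, ignoring the fact that a character may occupy several bytes in UTF-8 mode.

```c
/* Illustrative model only. Returns how many bytes to advance the start
   position after a failed match attempt at subject[pos].
   crlf_is_newline is nonzero when PCRE_NEWLINE_CRLF, PCRE_NEWLINE_ANYCRLF,
   or PCRE_NEWLINE_ANY is in force; pattern_has_crlf is nonzero when the
   pattern contains an explicit match for CR or LF. */
int advance_after_failure(const char *subject, int length, int pos,
                          int crlf_is_newline, int pattern_has_crlf)
{
    if (crlf_is_newline && !pattern_has_crlf &&
        pos + 1 < length &&
        subject[pos] == '\r' && subject[pos + 1] == '\n')
        return 2;   /* skip the whole CRLF sequence */
    return 1;       /* ordinary case: move on by one */
}
```

For the subject "\r\nA" this model advances by two for a pattern such as .+A and by one for [\r\n]A, which is why only the second pattern gets the chance to match at the LF.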
1114
1115 Notwithstanding the above, anomalous effects may still occur when CRLF
1116 is a valid newline sequence and explicit \r or \n escapes appear in the
1117 pattern.
1118
1119 PCRE_NOTBOL
1120
1121 This option specifies that the first character of the subject string is not
1122 the beginning of a line, so the circumflex metacharacter should not
1123 match before it. Setting this without PCRE_MULTILINE (at compile time)
1124 causes circumflex never to match. This option affects only the behav‐
1125 iour of the circumflex metacharacter. It does not affect \A.
1126
1127 PCRE_NOTEOL
1128
1129 This option specifies that the end of the subject string is not the end
1130 of a line, so the dollar metacharacter should not match it nor (except
1131 in multiline mode) a newline immediately before it. Setting this with‐
1132 out PCRE_MULTILINE (at compile time) causes dollar never to match. This
1133 option affects only the behaviour of the dollar metacharacter. It does
1134 not affect \Z or \z.
1135
1136 PCRE_NOTEMPTY
1137
1138 An empty string is not considered to be a valid match if this option is
1139 set. If there are alternatives in the pattern, they are tried. If all
1140 the alternatives match the empty string, the entire match fails. For
1141 example, if the pattern
1142
1143 a?b?
1144
1145 is applied to a string not beginning with "a" or "b", it matches the
1146 empty string at the start of the subject. With PCRE_NOTEMPTY set, this
1147 match is not valid, so PCRE searches further into the string for occur‐
1148 rences of "a" or "b".
1149
1150 Perl has no direct equivalent of PCRE_NOTEMPTY, but it does make a spe‐
1151 cial case of a pattern match of the empty string within its split()
1152 function, and when using the /g modifier. It is possible to emulate
1153 Perl's behaviour after matching a null string by first trying the match
1154 again at the same offset with PCRE_NOTEMPTY and PCRE_ANCHORED, and then
1155 if that fails by advancing the starting offset (see below) and trying
1156 an ordinary match again. There is some code that demonstrates how to do
1157 this in the pcredemo.c sample program.
1158
1159 PCRE_NO_UTF8_CHECK
1160
1161 When PCRE_UTF8 is set at compile time, the validity of the subject as a
1162 UTF-8 string is automatically checked when pcre_exec() is subsequently
1163 called. The value of startoffset is also checked to ensure that it
1164 points to the start of a UTF-8 character. There is a discussion about
1165 the validity of UTF-8 strings in the section on UTF-8 support in the
1166 main pcre page. If an invalid UTF-8 sequence of bytes is found,
1167 pcre_exec() returns the error PCRE_ERROR_BADUTF8. If startoffset con‐
1168 tains an invalid value, PCRE_ERROR_BADUTF8_OFFSET is returned.
1169
1170 If you already know that your subject is valid, and you want to skip
1171 these checks for performance reasons, you can set the
1172 PCRE_NO_UTF8_CHECK option when calling pcre_exec(). You might want to
1173 do this for the second and subsequent calls to pcre_exec() if you are
1174 making repeated calls to find all the matches in a single subject
1175 string. However, you should be sure that the value of startoffset
1176 points to the start of a UTF-8 character. When PCRE_NO_UTF8_CHECK is
1177 set, the effect of passing an invalid UTF-8 string as a subject, or a
1178 value of startoffset that does not point to the start of a UTF-8 char‐
1179 acter, is undefined. Your program may crash.
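
When the checks are skipped, the caller takes over responsibility for startoffset. One way to guard a computed offset is sketched below; the function name is_utf8_char_start() is hypothetical, and the test assumes the subject as a whole is already known to be valid UTF-8, in which case any byte that is not a continuation byte begins a character.

```c
/* Hypothetical helper: returns 1 if offset is the start of a UTF-8
   character in subject (or is the end of the subject), 0 otherwise.
   Continuation bytes have the bit pattern 10xxxxxx, so an offset is
   acceptable when the byte there does not match that pattern. */
int is_utf8_char_start(const char *subject, int length, int offset)
{
    if (offset < 0 || offset > length)
        return 0;
    if (offset == length)
        return 1;
    return ((unsigned char)subject[offset] & 0xC0) != 0x80;
}
```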
1180
1181 PCRE_PARTIAL
1182
1183 This option turns on the partial matching feature. If the subject
1184 string fails to match the pattern, but at some point during the match‐
1185 ing process the end of the subject was reached (that is, the subject
1186 partially matches the pattern and the failure to match occurred only
1187 because there were not enough subject characters), pcre_exec() returns
1188 PCRE_ERROR_PARTIAL instead of PCRE_ERROR_NOMATCH. When PCRE_PARTIAL is
1189 used, there are restrictions on what may appear in the pattern. These
1190 are discussed in the pcrepartial documentation.
1191
1192 The string to be matched by pcre_exec()
1193
1194 The subject string is passed to pcre_exec() as a pointer in subject, a
1195 length in length, and a starting byte offset in startoffset. In UTF-8
1196 mode, the byte offset must point to the start of a UTF-8 character.
1197 Unlike the pattern string, the subject may contain binary zero bytes.
1198 When the starting offset is zero, the search for a match starts at the
1199 beginning of the subject, and this is by far the most common case.
1200
1201 A non-zero starting offset is useful when searching for another match
1202 in the same subject by calling pcre_exec() again after a previous suc‐
1203 cess. Setting startoffset differs from just passing over a shortened
1204 string and setting PCRE_NOTBOL in the case of a pattern that begins
1205 with any kind of lookbehind. For example, consider the pattern
1206
1207 \Biss\B
1208
1209 which finds occurrences of "iss" in the middle of words. (\B matches
1210 only if the current position in the subject is not a word boundary.)
1211 When applied to the string "Mississippi" the first call to pcre_exec()
1212 finds the first occurrence. If pcre_exec() is called again with just
1213 the remainder of the subject, namely "issippi", it does not match,
1214 because \B is always false at the start of the subject, which is deemed
1215 to be a word boundary. However, if pcre_exec() is passed the entire
1216 string again, but with startoffset set to 4, it finds the second occur‐
1217 rence of "iss" because it is able to look behind the starting point to
1218 discover that it is preceded by a letter.
1219
1220 If a non-zero starting offset is passed when the pattern is anchored,
1221 one attempt to match at the given offset is made. This can only succeed
1222 if the pattern does not require the match to be at the start of the
1223 subject.
1224
1225 How pcre_exec() returns captured substrings
1226
1227 In general, a pattern matches a certain portion of the subject, and in
1228 addition, further substrings from the subject may be picked out by
1229 parts of the pattern. Following the usage in Jeffrey Friedl's book,
1230 this is called "capturing" in what follows, and the phrase "capturing
1231 subpattern" is used for a fragment of a pattern that picks out a sub‐
1232 string. PCRE supports several other kinds of parenthesized subpattern
1233 that do not cause substrings to be captured.
1234
1235 Captured substrings are returned to the caller via a vector of integer
1236 offsets whose address is passed in ovector. The number of elements in
1237 the vector is passed in ovecsize, which must be a non-negative number.
1238 Note: this argument is NOT the size of ovector in bytes.
1239
1240 The first two-thirds of the vector is used to pass back captured sub‐
1241 strings, each substring using a pair of integers. The remaining third
1242 of the vector is used as workspace by pcre_exec() while matching cap‐
1243 turing subpatterns, and is not available for passing back information.
1244 The length passed in ovecsize should always be a multiple of three. If
1245 it is not, it is rounded down.
1246
1247 When a match is successful, information about captured substrings is
1248 returned in pairs of integers, starting at the beginning of ovector,
1249 and continuing up to two-thirds of its length at the most. The first
1250 element of a pair is set to the offset of the first character in a sub‐
1251 string, and the second is set to the offset of the first character
1252 after the end of a substring. The first pair, ovector[0] and ovec‐
1253 tor[1], identify the portion of the subject string matched by the
1254 entire pattern. The next pair is used for the first capturing subpat‐
1255 tern, and so on. The value returned by pcre_exec() is one more than the
1256 highest numbered pair that has been set. For example, if two substrings
1257 have been captured, the returned value is 3. If there are no capturing
1258 subpatterns, the return value from a successful match is 1, indicating
1259 that just the first pair of offsets has been set.
1260
1261 If a capturing subpattern is matched repeatedly, it is the last portion
1262 of the string that it matched that is returned.
1263
1264 If the vector is too small to hold all the captured substring offsets,
1265 it is used as far as possible (up to two-thirds of its length), and the
1266 function returns a value of zero. In particular, if the substring off‐
1267 sets are not of interest, pcre_exec() may be called with ovector passed
1268 as NULL and ovecsize as zero. However, if the pattern contains back
1269 references and the ovector is not big enough to remember the related
1270 substrings, PCRE has to get additional memory for use during matching.
1271 Thus it is usually advisable to supply an ovector.
1272
1273 The pcre_fullinfo() function can be used to find out how many capturing
1274 subpatterns there are in a compiled pattern. The smallest size for
1275 ovector that will allow for n captured substrings, in addition to the
1276 offsets of the substring matched by the whole pattern, is (n+1)*3.
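
The sizing arithmetic can be written down directly. In this sketch both function names are hypothetical; the capture count would normally come from PCRE_INFO_CAPTURECOUNT.

```c
/* Smallest ovecsize that leaves room for the whole-match pair plus
   capture_count capturing subpatterns: (n + 1) * 3. */
int ovecsize_for_captures(int capture_count)
{
    return (capture_count + 1) * 3;
}

/* Number of offset pairs that can be passed back for a given ovecsize.
   Only two-thirds of the vector holds results, at two integers per pair,
   and a size that is not a multiple of three is rounded down. */
int usable_pairs(int ovecsize)
{
    return ovecsize / 3;
}
```

For instance, the 30-element vector used in the pcre_exec() example earlier has room for the whole match plus nine captured substrings.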
1277
1278 It is possible for capturing subpattern number n+1 to match some part
1279 of the subject when subpattern n has not been used at all. For example,
1280 if the string "abc" is matched against the pattern (a|(z))(bc) the
1281 return from the function is 4, and subpatterns 1 and 3 are matched, but
1282 2 is not. When this happens, both values in the offset pairs corre‐
1283 sponding to unused subpatterns are set to -1.
1284
1285 Offset values that correspond to unused subpatterns at the end of the
1286 expression are also set to -1. For example, if the string "abc" is
1287 matched against the pattern (abc)(x(yz)?)? subpatterns 2 and 3 are not
1288 matched. The return from the function is 2, because the highest used
1289 capturing subpattern number is 1. However, you can refer to the offsets
1290 for the second and third capturing subpatterns if you wish (assuming
1291 the vector is large enough, of course).
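
To make the pair layout concrete, here is a minimal sketch of reading one captured substring out of ovector by hand, including the check for unset subpatterns. The function name copy_capture() and its return codes are hypothetical and are not the library's error numbers.

```c
#include <string.h>

/* Hypothetical helper: copy captured substring number n into buf as a
   zero-terminated string. rc is the positive value returned by a
   successful pcre_exec() call. Returns the substring length, -1 if n is
   out of range or the subpattern was unset, or -2 if buf is too small. */
int copy_capture(const char *subject, const int *ovector, int rc, int n,
                 char *buf, int bufsize)
{
    int start, end, len;
    if (n < 0 || n >= rc)
        return -1;
    start = ovector[2 * n];        /* offset of first character */
    end = ovector[2 * n + 1];      /* offset just past the last one */
    if (start < 0 || end < start)
        return -1;                 /* unset pairs hold -1 */
    len = end - start;
    if (len + 1 > bufsize)
        return -2;
    memcpy(buf, subject + start, (size_t)len);
    buf[len] = '\0';
    return len;
}
```

With the "abc" against (a|(z))(bc) example above, substring 1 is "a", substring 2 is reported as unset, and substring 3 is "bc".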
1292
1293 Some convenience functions are provided for extracting the captured
1294 substrings as separate strings. These are described below.
1295
1296 Error return values from pcre_exec()
1297
1298 If pcre_exec() fails, it returns a negative number. The following are
1299 defined in the header file:
1300
1301 PCRE_ERROR_NOMATCH (-1)
1302
1303 The subject string did not match the pattern.
1304
1305 PCRE_ERROR_NULL (-2)
1306
1307 Either code or subject was passed as NULL, or ovector was NULL and
1308 ovecsize was not zero.
1309
1310 PCRE_ERROR_BADOPTION (-3)
1311
1312 An unrecognized bit was set in the options argument.
1313
1314 PCRE_ERROR_BADMAGIC (-4)
1315
1316 PCRE stores a 4-byte "magic number" at the start of the compiled code,
1317 to catch the case when it is passed a junk pointer and to detect when a
1318 pattern that was compiled in an environment of one endianness is run in
1319 an environment with the other endianness. This is the error that PCRE
1320 gives when the magic number is not present.
1321
1322 PCRE_ERROR_UNKNOWN_OPCODE (-5)
1323
1324 While running the pattern match, an unknown item was encountered in the
1325 compiled pattern. This error could be caused by a bug in PCRE or by
1326 overwriting of the compiled pattern.
1327
1328 PCRE_ERROR_NOMEMORY (-6)
1329
1330 If a pattern contains back references, but the ovector that is passed
1331 to pcre_exec() is not big enough to remember the referenced substrings,
1332 PCRE gets a block of memory at the start of matching to use for this
1333 purpose. If the call via pcre_malloc() fails, this error is given. The
1334 memory is automatically freed at the end of matching.
1335
1336 PCRE_ERROR_NOSUBSTRING (-7)
1337
1338 This error is used by the pcre_copy_substring(), pcre_get_substring(),
1339 and pcre_get_substring_list() functions (see below). It is never
1340 returned by pcre_exec().
1341
1342 PCRE_ERROR_MATCHLIMIT (-8)
1343
1344 The backtracking limit, as specified by the match_limit field in a
1345 pcre_extra structure (or defaulted) was reached. See the description
1346 above.
1347
1348 PCRE_ERROR_CALLOUT (-9)
1349
1350 This error is never generated by pcre_exec() itself. It is provided for
1351 use by callout functions that want to yield a distinctive error code.
1352 See the pcrecallout documentation for details.
1353
1354 PCRE_ERROR_BADUTF8 (-10)
1355
1356 A string that contains an invalid UTF-8 byte sequence was passed as a
1357 subject.
1358
1359 PCRE_ERROR_BADUTF8_OFFSET (-11)
1360
1361 The UTF-8 byte sequence that was passed as a subject was valid, but the
1362 value of startoffset did not point to the beginning of a UTF-8 charac‐
1363 ter.
1364
1365 PCRE_ERROR_PARTIAL (-12)
1366
1367 The subject string did not match, but it did match partially. See the
1368 pcrepartial documentation for details of partial matching.
1369
1370 PCRE_ERROR_BADPARTIAL (-13)
1371
1372 The PCRE_PARTIAL option was used with a compiled pattern containing
1373 items that are not supported for partial matching. See the pcrepartial
1374 documentation for details of partial matching.
1375
1376 PCRE_ERROR_INTERNAL (-14)
1377
1378 An unexpected internal error has occurred. This error could be caused
1379 by a bug in PCRE or by overwriting of the compiled pattern.
1380
1381 PCRE_ERROR_BADCOUNT (-15)
1382
1383 This error is given if the value of the ovecsize argument is negative.
1384
1385 PCRE_ERROR_RECURSIONLIMIT (-21)
1386
1387 The internal recursion limit, as specified by the match_limit_recursion
1388 field in a pcre_extra structure (or defaulted) was reached. See the
1389 description above.
1390
1391 PCRE_ERROR_BADNEWLINE (-23)
1392
1393 An invalid combination of PCRE_NEWLINE_xxx options was given.
1394
1395 Error numbers -16 to -20 and -22 are not used by pcre_exec().
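
The error codes listed above can be turned into readable diagnostics with a small lookup. The following is a minimal sketch, not part of PCRE: the helper names exec_error_name() and describe_exec_result() are hypothetical, and the numeric values are copied from the list above rather than taken from the header file, so that the fragment compiles on its own. In a real program you would include pcre.h and switch on the PCRE_ERROR_xxx macros instead.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: map a negative pcre_exec() return value to the
   name given in the list above. Codes that pcre_exec() does not use
   yield "unknown". */
static const char *exec_error_name(int rc)
{
    switch (rc) {
    case -1:  return "PCRE_ERROR_NOMATCH";
    case -2:  return "PCRE_ERROR_NULL";
    case -3:  return "PCRE_ERROR_BADOPTION";
    case -4:  return "PCRE_ERROR_BADMAGIC";
    case -5:  return "PCRE_ERROR_UNKNOWN_OPCODE";
    case -6:  return "PCRE_ERROR_NOMEMORY";
    case -7:  return "PCRE_ERROR_NOSUBSTRING";
    case -8:  return "PCRE_ERROR_MATCHLIMIT";
    case -9:  return "PCRE_ERROR_CALLOUT";
    case -10: return "PCRE_ERROR_BADUTF8";
    case -11: return "PCRE_ERROR_BADUTF8_OFFSET";
    case -12: return "PCRE_ERROR_PARTIAL";
    case -13: return "PCRE_ERROR_BADPARTIAL";
    case -14: return "PCRE_ERROR_INTERNAL";
    case -15: return "PCRE_ERROR_BADCOUNT";
    case -21: return "PCRE_ERROR_RECURSIONLIMIT";
    case -23: return "PCRE_ERROR_BADNEWLINE";
    default:  return "unknown";
    }
}

/* Hypothetical helper: write a one-line description of a pcre_exec()
   return value into buf. Returns whatever snprintf() returns. */
static int describe_exec_result(int rc, char *buf, size_t bufsize)
{
    assert(buf != NULL && bufsize > 0);
    if (rc > 0)
        return snprintf(buf, bufsize, "matched, %d substring(s) set", rc);
    if (rc == 0)
        return snprintf(buf, bufsize, "matched, but ovector was too small");
    return snprintf(buf, bufsize, "failed: %s (%d)", exec_error_name(rc), rc);
}
```

A caller would typically treat PCRE_ERROR_NOMATCH and PCRE_ERROR_PARTIAL as ordinary outcomes and log the remaining codes as genuine failures.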
1396
1398
1399 int pcre_copy_substring(const char *subject, int *ovector,
1400 int stringcount, int stringnumber, char *buffer,
1401 int buffersize);
1402
1403 int pcre_get_substring(const char *subject, int *ovector,
1404 int stringcount, int stringnumber,
1405 const char **stringptr);
1406
1407 int pcre_get_substring_list(const char *subject,
1408 int *ovector, int stringcount, const char ***listptr);
1409
1410 Captured substrings can be accessed directly by using the offsets
1411 returned by pcre_exec() in ovector. For convenience, the functions
1412 pcre_copy_substring(), pcre_get_substring(), and pcre_get_sub‐
1413 string_list() are provided for extracting captured substrings as new,
1414 separate, zero-terminated strings. These functions identify substrings
1415 by number. The next section describes functions for extracting named
1416 substrings.
1417
1418 A substring that contains a binary zero is correctly extracted and has
1419 a further zero added on the end, but the result is not, of course, a C
1420 string. However, you can process such a string by referring to the
1421 length that is returned by pcre_copy_substring() and pcre_get_sub‐
1422 string(). Unfortunately, the interface to pcre_get_substring_list() is
1423 not adequate for handling strings containing binary zeros, because the
1424 end of the final string is not independently indicated.
1425
1426 The first three arguments are the same for all three of these func‐
1427 tions: subject is the subject string that has just been successfully
1428 matched, ovector is a pointer to the vector of integer offsets that was
1429 passed to pcre_exec(), and stringcount is the number of substrings that
1430 were captured by the match, including the substring that matched the
1431 entire regular expression. This is the value returned by pcre_exec() if
1432 it is greater than zero. If pcre_exec() returned zero, indicating that
1433 it ran out of space in ovector, the value passed as stringcount should
1434 be the number of elements in the vector divided by three.
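
The rule for choosing stringcount can be sketched as a small function. This is only an illustration of the paragraph above; the name stringcount_from_exec() is hypothetical, and the return value of -1 for a failed match is a local convention, not a PCRE error code.

```c
#include <assert.h>

/* Hypothetical helper: derive the stringcount argument from the value
   returned by pcre_exec() and the number of ints in ovector.
   Returns -1 if rc is an error code, in which case there is nothing
   to extract. */
static int stringcount_from_exec(int rc, int ovecsize)
{
    assert(ovecsize >= 0);
    if (rc > 0)
        return rc;            /* number of substrings that were set */
    if (rc == 0)
        return ovecsize / 3;  /* ovector was too small: use its capacity */
    return -1;                /* match failed */
}
```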
1435
1436 The functions pcre_copy_substring() and pcre_get_substring() extract a
1437 single substring, whose number is given as stringnumber. A value of
1438 zero extracts the substring that matched the entire pattern, whereas
1439 higher values extract the captured substrings. For pcre_copy_sub‐
1440 string(), the string is placed in buffer, whose length is given by
1441 buffersize, while for pcre_get_substring() a new block of memory is
1442 obtained via pcre_malloc, and its address is returned via stringptr.
1443 The yield of the function is the length of the string, not including
1444 the terminating zero, or one of these error codes:
1445
1446 PCRE_ERROR_NOMEMORY (-6)
1447
1448 The buffer was too small for pcre_copy_substring(), or the attempt to
1449 get memory failed for pcre_get_substring().
1450
1451 PCRE_ERROR_NOSUBSTRING (-7)
1452
1453 There is no substring whose number is stringnumber.
1454
1455 The pcre_get_substring_list() function extracts all available sub‐
1456 strings and builds a list of pointers to them. All this is done in a
1457 single block of memory that is obtained via pcre_malloc. The address of
1458 the memory block is returned via listptr, which is also the start of
1459 the list of string pointers. The end of the list is marked by a NULL
1460 pointer. The yield of the function is zero if all went well, or the
1461 error code
1462
1463 PCRE_ERROR_NOMEMORY (-6)
1464
1465 if the attempt to get the memory block failed.
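
The single-block layout described above (a NULL-terminated array of pointers followed by the strings themselves) can be illustrated with the following sketch. It is not PCRE's implementation: the function name build_substring_list() is hypothetical, it uses malloc() directly rather than pcre_malloc, and it returns the literal value -6 where the library would return PCRE_ERROR_NOMEMORY.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: copy all captured substrings into one block that
   starts with a NULL-terminated pointer list. Returns 0 on success or
   -6 if memory could not be obtained. Free the block with free(). */
static int build_substring_list(const char *subject, const int *ovector,
                                int stringcount, const char ***listptr)
{
    size_t bytes;
    char **list;
    char *p;
    int i;

    assert(subject != NULL && ovector != NULL && listptr != NULL);
    assert(stringcount >= 0);

    /* Space for the pointers, including the terminating NULL. */
    bytes = (size_t)(stringcount + 1) * sizeof(char *);

    /* Space for each string plus its terminating zero. An unset
       substring has offsets (-1, -1), so its length is zero. */
    for (i = 0; i < stringcount; i++)
        bytes += (size_t)(ovector[2 * i + 1] - ovector[2 * i]) + 1;

    list = malloc(bytes);
    if (list == NULL)
        return -6;

    /* The string data begins immediately after the pointer list. */
    p = (char *)(list + stringcount + 1);
    for (i = 0; i < stringcount; i++) {
        int start = ovector[2 * i];
        int len = ovector[2 * i + 1] - start;

        list[i] = p;
        if (len > 0)
            memcpy(p, subject + start, (size_t)len);
        p += len;
        *p++ = '\0';
    }
    list[stringcount] = NULL;

    *listptr = (const char **)list;
    return 0;
}
```

Because the pointers and the strings share one allocation, a single call to the matching free function releases everything.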
1466
When any of these functions encounters a substring that is unset, which
can happen when capturing subpattern number n+1 matches some part of
the subject but subpattern n has not been used at all, it returns an
empty string. This can be distinguished from a genuine zero-length
substring by inspecting the appropriate offset in ovector, which is
negative for unset substrings.
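
Checking for an unset substring, and reading a substring directly from the offsets, might look like the sketch below. The names capture_is_set() and copy_capture() are hypothetical, and the return values -1 and -2 are local conventions for this example, not PCRE error codes.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: returns 1 if substring n exists and was set by
   the match, 0 if it is out of range or unset (negative offset). */
static int capture_is_set(const int *ovector, int stringcount, int n)
{
    assert(ovector != NULL);
    return n >= 0 && n < stringcount && ovector[2 * n] >= 0;
}

/* Hypothetical helper: copy substring n into buffer as a
   zero-terminated string, using the offsets in ovector directly.
   Returns the length, -1 if the substring is unset or out of range,
   or -2 if the buffer is too small. */
static int copy_capture(const char *subject, const int *ovector,
                        int stringcount, int n,
                        char *buffer, int buffersize)
{
    int len;

    assert(subject != NULL && buffer != NULL);
    if (!capture_is_set(ovector, stringcount, n))
        return -1;

    len = ovector[2 * n + 1] - ovector[2 * n];
    if (len + 1 > buffersize)
        return -2;

    memcpy(buffer, subject + ovector[2 * n], (size_t)len);
    buffer[len] = '\0';
    return len;
}
```

For a pattern such as (a)|(b) matched against "b", substring 1 is unset and substring 2 is "b"; the first helper reports the difference that the empty string alone would hide.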
1473
1474 The two convenience functions pcre_free_substring() and pcre_free_sub‐
1475 string_list() can be used to free the memory returned by a previous
1476 call of pcre_get_substring() or pcre_get_substring_list(), respec‐
1477 tively. They do nothing more than call the function pointed to by
1478 pcre_free, which of course could be called directly from a C program.
1479 However, PCRE is used in some situations where it is linked via a spe‐
1480 cial interface to another programming language that cannot use
1481 pcre_free directly; it is for these cases that the functions are pro‐
1482 vided.
1483
1485
1486 int pcre_get_stringnumber(const pcre *code,
1487 const char *name);
1488
1489 int pcre_copy_named_substring(const pcre *code,
1490 const char *subject, int *ovector,
1491 int stringcount, const char *stringname,
1492 char *buffer, int buffersize);
1493
1494 int pcre_get_named_substring(const pcre *code,
1495 const char *subject, int *ovector,
1496 int stringcount, const char *stringname,
1497 const char **stringptr);
1498
To extract a substring by name, you first have to find the associated
number. For example, for this pattern
1501
1502 (a+)b(?<xxx>\d+)...
1503
1504 the number of the subpattern called "xxx" is 2. If the name is known to
1505 be unique (PCRE_DUPNAMES was not set), you can find the number from the
1506 name by calling pcre_get_stringnumber(). The first argument is the com‐
1507 piled pattern, and the second is the name. The yield of the function is
1508 the subpattern number, or PCRE_ERROR_NOSUBSTRING (-7) if there is no
1509 subpattern of that name.
1510
1511 Given the number, you can extract the substring directly, or use one of
1512 the functions described in the previous section. For convenience, there
1513 are also two functions that do the whole job.
1514
1515 Most of the arguments of pcre_copy_named_substring() and
1516 pcre_get_named_substring() are the same as those for the similarly
1517 named functions that extract by number. As these are described in the
1518 previous section, they are not re-described here. There are just two
1519 differences:
1520
1521 First, instead of a substring number, a substring name is given. Sec‐
1522 ond, there is an extra argument, given at the start, which is a pointer
1523 to the compiled pattern. This is needed in order to gain access to the
1524 name-to-number translation table.
1525
1526 These functions call pcre_get_stringnumber(), and if it succeeds, they
1527 then call pcre_copy_substring() or pcre_get_substring(), as appropri‐
1528 ate. NOTE: If PCRE_DUPNAMES is set and there are duplicate names, the
1529 behaviour may not be what you want (see the next section).
1530
1532
1533 int pcre_get_stringtable_entries(const pcre *code,
1534 const char *name, char **first, char **last);
1535
1536 When a pattern is compiled with the PCRE_DUPNAMES option, names for
1537 subpatterns are not required to be unique. Normally, patterns with
1538 duplicate names are such that in any one match, only one of the named
1539 subpatterns participates. An example is shown in the pcrepattern docu‐
1540 mentation.
1541
1542 When duplicates are present, pcre_copy_named_substring() and
1543 pcre_get_named_substring() return the first substring corresponding to
1544 the given name that is set. If none are set, PCRE_ERROR_NOSUBSTRING
1545 (-7) is returned; no data is returned. The pcre_get_stringnumber()
1546 function returns one of the numbers that are associated with the name,
1547 but it is not defined which it is.
1548
1549 If you want to get full details of all captured substrings for a given
1550 name, you must use the pcre_get_stringtable_entries() function. The
1551 first argument is the compiled pattern, and the second is the name. The
1552 third and fourth are pointers to variables which are updated by the
1553 function. After it has run, they point to the first and last entries in
1554 the name-to-number table for the given name. The function itself
1555 returns the length of each entry, or PCRE_ERROR_NOSUBSTRING (-7) if
1556 there are none. The format of the table is described above in the sec‐
1557 tion entitled Information about a pattern. Given all the relevant
1558 entries for the name, you can extract each of their numbers, and hence
1559 the captured data, if any.
1560
1562
The traditional matching function uses a similar algorithm to Perl,
which stops when it finds the first match, starting at a given point in
the subject. If you want to find all possible matches, or the longest
possible match, consider using the alternative matching function (see
below) instead. If you cannot use the alternative function but still
need to find all possible matches, you can work around this by using
the callout facility, which is described in the pcrecallout
documentation.
1571
To do this, insert a callout right at the end of the pattern. When your
callout function is called, extract and save the current matched
substring. Then return 1, which forces pcre_exec() to backtrack and try
other alternatives. Ultimately, when it runs out of matches,
pcre_exec() will yield PCRE_ERROR_NOMATCH.
1577
1579
1580 int pcre_dfa_exec(const pcre *code, const pcre_extra *extra,
1581 const char *subject, int length, int startoffset,
1582 int options, int *ovector, int ovecsize,
1583 int *workspace, int wscount);
1584
1585 The function pcre_dfa_exec() is called to match a subject string
1586 against a compiled pattern, using a matching algorithm that scans the
1587 subject string just once, and does not backtrack. This has different
1588 characteristics to the normal algorithm, and is not compatible with
1589 Perl. Some of the features of PCRE patterns are not supported. Never‐
1590 theless, there are times when this kind of matching can be useful. For
1591 a discussion of the two matching algorithms, see the pcrematching docu‐
1592 mentation.
1593
1594 The arguments for the pcre_dfa_exec() function are the same as for
1595 pcre_exec(), plus two extras. The ovector argument is used in a differ‐
1596 ent way, and this is described below. The other common arguments are
1597 used in the same way as for pcre_exec(), so their description is not
1598 repeated here.
1599
1600 The two additional arguments provide workspace for the function. The
1601 workspace vector should contain at least 20 elements. It is used for
1602 keeping track of multiple paths through the pattern tree. More
1603 workspace will be needed for patterns and subjects where there are a
1604 lot of potential matches.
1605
1606 Here is an example of a simple call to pcre_dfa_exec():
1607
  int rc;
  int ovector[10];
  int wspace[20];
  rc = pcre_dfa_exec(
    re,             /* result of pcre_compile() */
    NULL,           /* we didn't study the pattern */
    "some string",  /* the subject string */
    11,             /* the length of the subject string */
    0,              /* start at offset 0 in the subject */
    0,              /* default options */
    ovector,        /* vector of integers for substring information */
    10,             /* number of elements (NOT size in bytes) */
    wspace,         /* working space vector */
    20);            /* number of elements (NOT size in bytes) */
1622
1623 Option bits for pcre_dfa_exec()
1624
1625 The unused bits of the options argument for pcre_dfa_exec() must be
1626 zero. The only bits that may be set are PCRE_ANCHORED, PCRE_NEW‐
1627 LINE_xxx, PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY, PCRE_NO_UTF8_CHECK,
1628 PCRE_PARTIAL, PCRE_DFA_SHORTEST, and PCRE_DFA_RESTART. All but the last
1629 three of these are the same as for pcre_exec(), so their description is
1630 not repeated here.
1631
1632 PCRE_PARTIAL
1633
1634 This has the same general effect as it does for pcre_exec(), but the
1635 details are slightly different. When PCRE_PARTIAL is set for
1636 pcre_dfa_exec(), the return code PCRE_ERROR_NOMATCH is converted into
1637 PCRE_ERROR_PARTIAL if the end of the subject is reached, there have
1638 been no complete matches, but there is still at least one matching pos‐
1639 sibility. The portion of the string that provided the partial match is
1640 set as the first matching string.
1641
1642 PCRE_DFA_SHORTEST
1643
1644 Setting the PCRE_DFA_SHORTEST option causes the matching algorithm to
1645 stop as soon as it has found one match. Because of the way the alterna‐
1646 tive algorithm works, this is necessarily the shortest possible match
1647 at the first possible matching point in the subject string.
1648
1649 PCRE_DFA_RESTART
1650
When pcre_dfa_exec() is called with the PCRE_PARTIAL option, and
returns a partial match, it is possible to call it again, with
additional subject characters, and have it continue with the same match.
The PCRE_DFA_RESTART option requests this action; when it is set, the
workspace and wscount arguments must reference the same vector as before
because data about the match so far is left in it after a partial
match. There is more discussion of this facility in the pcrepartial
documentation.
1659
1660 Successful returns from pcre_dfa_exec()
1661
1662 When pcre_dfa_exec() succeeds, it may have matched more than one sub‐
1663 string in the subject. Note, however, that all the matches from one run
1664 of the function start at the same point in the subject. The shorter
1665 matches are all initial substrings of the longer matches. For example,
1666 if the pattern
1667
1668 <.*>
1669
1670 is matched against the string
1671
1672 This is <something> <something else> <something further> no more
1673
1674 the three matched strings are
1675
1676 <something>
1677 <something> <something else>
1678 <something> <something else> <something further>
1679
1680 On success, the yield of the function is a number greater than zero,
1681 which is the number of matched substrings. The substrings themselves
1682 are returned in ovector. Each string uses two elements; the first is
1683 the offset to the start, and the second is the offset to the end. In
1684 fact, all the strings have the same start offset. (Space could have
1685 been saved by giving this only once, but it was decided to retain some
1686 compatibility with the way pcre_exec() returns data, even though the
1687 meaning of the strings is different.)
1688
1689 The strings are returned in reverse order of length; that is, the long‐
1690 est matching string is given first. If there were too many matches to
1691 fit into ovector, the yield of the function is zero, and the vector is
1692 filled with the longest matches.
1693
1694 Error returns from pcre_dfa_exec()
1695
1696 The pcre_dfa_exec() function returns a negative number when it fails.
1697 Many of the errors are the same as for pcre_exec(), and these are
1698 described above. There are in addition the following errors that are
1699 specific to pcre_dfa_exec():
1700
1701 PCRE_ERROR_DFA_UITEM (-16)
1702
1703 This return is given if pcre_dfa_exec() encounters an item in the pat‐
1704 tern that it does not support, for instance, the use of \C or a back
1705 reference.
1706
1707 PCRE_ERROR_DFA_UCOND (-17)
1708
1709 This return is given if pcre_dfa_exec() encounters a condition item
1710 that uses a back reference for the condition, or a test for recursion
1711 in a specific group. These are not supported.
1712
1713 PCRE_ERROR_DFA_UMLIMIT (-18)
1714
1715 This return is given if pcre_dfa_exec() is called with an extra block
1716 that contains a setting of the match_limit field. This is not supported
1717 (it is meaningless).
1718
1719 PCRE_ERROR_DFA_WSSIZE (-19)
1720
1721 This return is given if pcre_dfa_exec() runs out of space in the
1722 workspace vector.
1723
1724 PCRE_ERROR_DFA_RECURSE (-20)
1725
1726 When a recursive subpattern is processed, the matching function calls
1727 itself recursively, using private vectors for ovector and workspace.
1728 This error is given if the output vector is not large enough. This
1729 should be extremely rare, as a vector of size 1000 is used.
1730
1732
pcrebuild(3), pcrecallout(3), pcrecpp(3), pcrematching(3),
pcrepartial(3), pcreposix(3), pcreprecompile(3), pcresample(3), pcrestack(3).
1735
1737
1738 Philip Hazel
1739 University Computing Service
1740 Cambridge CB2 3QH, England.
1741
1743
1744 Last updated: 21 August 2007
1745 Copyright (c) 1997-2007 University of Cambridge.