PCREPOSIX(3)               Library Functions Manual               PCREPOSIX(3)

NAME

       PCRE - Perl-compatible regular expressions.

SYNOPSIS

       #include <pcreposix.h>

       int regcomp(regex_t *preg, const char *pattern,
            int cflags);

       int regexec(regex_t *preg, const char *string,
            size_t nmatch, regmatch_t pmatch[], int eflags);

       size_t regerror(int errcode, const regex_t *preg,
            char *errbuf, size_t errbuf_size);

       void regfree(regex_t *preg);

DESCRIPTION

       This set of functions provides a POSIX-style API for the PCRE regular
       expression 8-bit library. See the pcreapi documentation for a
       description of PCRE's native API, which contains much additional
       functionality. There is no POSIX-style wrapper for PCRE's 16-bit and
       32-bit libraries.

       The functions described here are just wrapper functions that ultimately
       call the PCRE native API. Their prototypes are defined in the
       pcreposix.h header file, and on Unix systems the library itself is
       called libpcreposix.a, so it can be accessed by adding -lpcreposix to
       the command for linking an application that uses them. Because the
       POSIX functions call the native ones, it is also necessary to add
       -lpcre.

       I have implemented only those POSIX option bits that can be reasonably
       mapped to PCRE native options. In addition, the option REG_EXTENDED is
       defined with the value zero. This has no effect, but since programs
       that are written to the POSIX interface often use it, this makes it
       easier to slot in PCRE as a replacement library. Other POSIX options
       are not even defined.

       There are also some other options that are not defined by POSIX. These
       have been added at the request of users who want to make use of certain
       PCRE-specific features via the POSIX calling interface.

       When PCRE is called via these functions, it is only the API that is
       POSIX-like in style. The syntax and semantics of the regular
       expressions themselves are still those of Perl, subject to the setting
       of various PCRE options, as described below. "POSIX-like in style"
       means that the API approximates the POSIX definition; it is not fully
       POSIX-compatible, and in multi-byte encoding domains it is probably
       even less compatible.

       The header for these functions is supplied as pcreposix.h to avoid any
       potential clash with other POSIX libraries. It can, of course, be
       renamed or aliased as regex.h, which is the "correct" name. It provides
       two structure types, regex_t for compiled internal forms, and
       regmatch_t for returning captured substrings. It also defines some
       constants whose names start with "REG_"; these are used for setting
       options and identifying error codes.

COMPILING A PATTERN

       The function regcomp() is called to compile a pattern into an internal
       form. The pattern is a C string terminated by a binary zero, and is
       passed in the argument pattern. The preg argument is a pointer to a
       regex_t structure that is used as a base for storing information about
       the compiled regular expression.

       The argument cflags is either zero, or contains one or more of the bits
       defined by the following macros:

         REG_DOTALL

       The PCRE_DOTALL option is set when the regular expression is passed for
       compilation to the native function. Note that REG_DOTALL is not part of
       the POSIX standard.

         REG_ICASE

       The PCRE_CASELESS option is set when the regular expression is passed
       for compilation to the native function.

         REG_NEWLINE

       The PCRE_MULTILINE option is set when the regular expression is passed
       for compilation to the native function. Note that this does not mimic
       the defined POSIX behaviour for REG_NEWLINE (see the following
       section).

         REG_NOSUB

       The PCRE_NO_AUTO_CAPTURE option is set when the regular expression is
       passed for compilation to the native function. In addition, when a
       pattern that is compiled with this flag is passed to regexec() for
       matching, the nmatch and pmatch arguments are ignored, and no captured
       strings are returned.

         REG_UCP

       The PCRE_UCP option is set when the regular expression is passed for
       compilation to the native function. This causes PCRE to use Unicode
       properties when matching \d, \w, etc., instead of just recognizing
       ASCII values. Note that REG_UCP is not part of the POSIX standard.

         REG_UNGREEDY

       The PCRE_UNGREEDY option is set when the regular expression is passed
       for compilation to the native function. Note that REG_UNGREEDY is not
       part of the POSIX standard.

         REG_UTF8

       The PCRE_UTF8 option is set when the regular expression is passed for
       compilation to the native function. This causes the pattern itself and
       all data strings used for matching it to be treated as UTF-8 strings.
       Note that REG_UTF8 is not part of the POSIX standard.

       When none of these flags is set, no options are passed to the native
       function. This means that the regex is compiled with PCRE default
       semantics. In particular, the way it handles newline characters in the
       subject string is the Perl way, not the POSIX way. Note that setting
       PCRE_MULTILINE has only some of the effects specified for REG_NEWLINE.
       It does not affect the way newlines are matched by . (they are not) or
       by a negative class such as [^a] (they are).

       The yield of regcomp() is zero on success, and non-zero otherwise. The
       preg structure is filled in on success, and one member of the structure
       is public: re_nsub contains the number of capturing subpatterns in the
       regular expression. Various error codes are defined in the header file.

       NOTE: If the yield of regcomp() is non-zero, you must not attempt to
       use the contents of the preg structure. If, for example, you pass it to
       regexec(), the result is undefined and your program is likely to crash.

MATCHING NEWLINE CHARACTERS

       This area is not simple, because POSIX and Perl take different views of
       things. It is not possible to get PCRE to obey POSIX semantics, but
       then PCRE was never intended to be a POSIX engine. The following table
       lists the different possibilities for matching newline characters in
       PCRE:

                                 Default   Change with

         . matches newline          no     PCRE_DOTALL
         newline matches [^a]       yes    not changeable
         $ matches \n at end        yes    PCRE_DOLLAR_ENDONLY
         $ matches \n in middle     no     PCRE_MULTILINE
         ^ matches \n in middle     no     PCRE_MULTILINE

       This is the equivalent table for POSIX:

                                 Default   Change with

         . matches newline          yes    REG_NEWLINE
         newline matches [^a]       yes    REG_NEWLINE
         $ matches \n at end        no     REG_NEWLINE
         $ matches \n in middle     no     REG_NEWLINE
         ^ matches \n in middle     no     REG_NEWLINE

       PCRE's behaviour is the same as Perl's, except that there is no
       equivalent for PCRE_DOLLAR_ENDONLY in Perl. In both PCRE and Perl,
       there is no way to stop newline from matching [^a].

       The default POSIX newline handling can be obtained by setting
       PCRE_DOTALL and PCRE_DOLLAR_ENDONLY, but there is no way to make PCRE
       behave exactly as for the REG_NEWLINE action.

MATCHING A PATTERN

       The function regexec() is called to match a compiled pattern preg
       against a given string, which is by default terminated by a zero byte
       (but see REG_STARTEND below), subject to the options in eflags. These
       can be:

         REG_NOTBOL

       The PCRE_NOTBOL option is set when calling the underlying PCRE matching
       function.

         REG_NOTEMPTY

       The PCRE_NOTEMPTY option is set when calling the underlying PCRE
       matching function. Note that REG_NOTEMPTY is not part of the POSIX
       standard. However, setting this option can give more POSIX-like
       behaviour in some situations.

         REG_NOTEOL

       The PCRE_NOTEOL option is set when calling the underlying PCRE matching
       function.

         REG_STARTEND

       The string is considered to start at string + pmatch[0].rm_so and to
       have a terminating NUL located at string + pmatch[0].rm_eo (there need
       not actually be a NUL at that location), regardless of the value of
       nmatch. This is a BSD extension, compatible with but not specified by
       IEEE Standard 1003.2 (POSIX.2), and should be used with caution in
       software intended to be portable to other systems. Note that a non-zero
       rm_so does not imply REG_NOTBOL; REG_STARTEND affects only the location
       of the string, not how it is matched.

       If the pattern was compiled with the REG_NOSUB flag, no data about any
       matched strings is returned. The nmatch and pmatch arguments of
       regexec() are ignored.

       If the value of nmatch is zero, or if the value of pmatch is NULL, no
       data about any matched strings is returned.

       Otherwise, the portion of the string that was matched, and also any
       captured substrings, are returned via the pmatch argument, which points
       to an array of nmatch structures of type regmatch_t, containing the
       members rm_so and rm_eo. These contain the offset to the first
       character of each substring and the offset to the first character
       after the end of each substring, respectively. Element 0 of the array
       relates to the entire portion of string that was matched; subsequent
       elements relate to the capturing subpatterns of the regular
       expression. Unused entries in the array have both structure members
       set to -1.

       A successful match yields a zero return; various error codes are
       defined in the header file, of which REG_NOMATCH is the "expected"
       failure code.

ERROR MESSAGES

       The regerror() function maps a non-zero error code (errcode) from
       either regcomp() or regexec() to a printable message. If preg is not
       NULL, the error should have arisen from the use of that structure. A
       message terminated by a binary zero is placed in errbuf. The length of
       the message, including the zero, is limited to errbuf_size. The yield
       of the function is the size of buffer needed to hold the whole message.

MEMORY USAGE

       Compiling a regular expression causes memory to be allocated and
       associated with the preg structure. The function regfree() frees all
       such memory, after which preg may no longer be used as a compiled
       expression.

AUTHOR

       Philip Hazel
       University Computing Service
       Cambridge CB2 3QH, England.

REVISION

       Last updated: 09 January 2012
       Copyright (c) 1997-2012 University of Cambridge.


PCRE 8.30                       09 January 2012                   PCREPOSIX(3)