PCREPOSIX(3)               Library Functions Manual               PCREPOSIX(3)

NAME

       PCRE - Perl-compatible regular expressions.

SYNOPSIS OF POSIX API

       #include <pcreposix.h>

       int regcomp(regex_t *preg, const char *pattern,
            int cflags);

       int regexec(regex_t *preg, const char *string,
            size_t nmatch, regmatch_t pmatch[], int eflags);

       size_t regerror(int errcode, const regex_t *preg,
            char *errbuf, size_t errbuf_size);

       void regfree(regex_t *preg);

DESCRIPTION

       This  set  of functions provides a POSIX-style API for the PCRE regular
       expression 8-bit library. See the pcreapi documentation for a  descrip‐
       tion  of  PCRE's native API, which contains much additional functional‐
       ity. There is no POSIX-style wrapper for PCRE's  16-bit  and  32-bit
       libraries.

       The functions described here are just wrapper functions that ultimately
       call  the  PCRE  native  API.  Their  prototypes  are  defined  in  the
       pcreposix.h  header  file,  and  on  Unix systems the library itself is
       called libpcreposix.a, so it can be accessed by adding -lpcreposix to
       the command for linking an application that uses them.  Because  the
       POSIX functions call the native ones, it is also necessary to add
       -lpcre.

       I have implemented only those POSIX option bits that can be  reasonably
       mapped  to PCRE native options. In addition, the option REG_EXTENDED is
       defined with the value zero. This has no  effect,  but  since  programs
       that  are  written  to  the POSIX interface often use it, this makes it
       easier to slot in PCRE as a replacement library.  Other  POSIX  options
       are not even defined.

       There  are also some other options that are not defined by POSIX. These
       have been added at the request of users who want to make use of certain
       PCRE-specific features via the POSIX calling interface.

       When  PCRE  is  called  via these functions, it is only the API that is
       POSIX-like in style. The syntax and semantics of  the  regular  expres‐
       sions  themselves  are  still  those of Perl, subject to the setting of
       various PCRE options, as described below. "POSIX-like in  style"  means
       that  the  API  approximates  to  the POSIX definition; it is not fully
       POSIX-compatible, and in multi-byte encoding  domains  it  is  probably
       even less compatible.

       The  header for these functions is supplied as pcreposix.h to avoid any
       potential clash with other POSIX  libraries.  It  can,  of  course,  be
       renamed or aliased as regex.h, which is the "correct" name. It provides
       two structure types, regex_t for compiled internal forms, and
       regmatch_t for returning captured substrings. It also defines some con‐
       stants whose names start  with  "REG_";  these  are  used  for  setting
       options and identifying error codes.

COMPILING A PATTERN

       The  function regcomp() is called to compile a pattern into an internal
       form. The pattern is a C string terminated by a  binary  zero,  and  is
       passed  in  the  argument  pattern. The preg argument is a pointer to a
       regex_t structure that is used as a base for storing information  about
       the compiled regular expression.

       The argument cflags is either zero, or contains one or more of the bits
       defined by the following macros:

         REG_DOTALL

       The PCRE_DOTALL option is set when the regular expression is passed for
       compilation to the native function. Note that REG_DOTALL is not part of
       the POSIX standard.

         REG_ICASE

       The PCRE_CASELESS option is set when the regular expression  is  passed
       for compilation to the native function.

         REG_NEWLINE

       The  PCRE_MULTILINE option is set when the regular expression is passed
       for compilation to the native function. Note that this does  not  mimic
       the  defined  POSIX  behaviour  for REG_NEWLINE (see the following sec‐
       tion).

         REG_NOSUB

       The PCRE_NO_AUTO_CAPTURE option is set when the regular  expression  is
       passed for compilation to the native function. In addition, when a pat‐
       tern that is compiled with this flag is passed to regexec() for  match‐
       ing,  the  nmatch  and  pmatch  arguments  are ignored, and no captured
       strings are returned.

         REG_UCP

       The PCRE_UCP option is set when the regular expression  is  passed  for
       compilation  to  the  native  function. This causes PCRE to use Unicode
       properties when matching \d, \w,  etc.,  instead  of  just  recognizing
       ASCII values. Note that REG_UCP is not part of the POSIX standard.

         REG_UNGREEDY

       The  PCRE_UNGREEDY  option is set when the regular expression is passed
       for compilation to the native function. Note that REG_UNGREEDY  is  not
       part of the POSIX standard.

         REG_UTF8

       The  PCRE_UTF8  option is set when the regular expression is passed for
       compilation to the native function. This causes the pattern itself  and
       all  data  strings used for matching it to be treated as UTF-8 strings.
       Note that REG_UTF8 is not part of the POSIX standard.

       In the absence of these flags, no options  are  passed  to  the  native
       function.  This  means  that  the  regex  is compiled with PCRE default
       semantics. In particular, the way it handles newline characters in  the
       subject  string  is  the Perl way, not the POSIX way. Note that setting
       PCRE_MULTILINE has only some of the effects specified for  REG_NEWLINE.
       It  does not affect the way newlines are matched by . (they are not) or
       by a negative class such as [^a] (they are).

       The yield of regcomp() is zero on success, and non-zero otherwise.  The
       preg structure is filled in on success, and one member of the structure
       is public: re_nsub contains the number of capturing subpatterns in  the
       regular expression. Various error codes are defined in the header file.

       NOTE:  If  the  yield of regcomp() is non-zero, you must not attempt to
       use the contents of the preg structure. If, for example, you pass it to
       regexec(), the result is undefined and your program is likely to crash.

MATCHING NEWLINE CHARACTERS

       This area is not simple, because POSIX and Perl take different views of
       things.  It is not possible to get PCRE to obey  POSIX  semantics,  but
       then  PCRE was never intended to be a POSIX engine. The following table
       lists the different possibilities for matching  newline  characters  in
       PCRE:

                                 Default   Change with

         . matches newline          no     PCRE_DOTALL
         newline matches [^a]       yes    not changeable
         $ matches \n at end        yes    PCRE_DOLLAR_ENDONLY
         $ matches \n in middle     no     PCRE_MULTILINE
         ^ matches \n in middle     no     PCRE_MULTILINE

       This is the equivalent table for POSIX:

                                 Default   Change with

         . matches newline          yes    REG_NEWLINE
         newline matches [^a]       yes    REG_NEWLINE
         $ matches \n at end        no     REG_NEWLINE
         $ matches \n in middle     no     REG_NEWLINE
         ^ matches \n in middle     no     REG_NEWLINE

       PCRE's behaviour is the same as Perl's, except that there is no equiva‐
       lent for PCRE_DOLLAR_ENDONLY in Perl. In both PCRE and Perl,  there  is
       no way to stop newline from matching [^a].

       The   default  POSIX  newline  handling  can  be  obtained  by  setting
       PCRE_DOTALL and PCRE_DOLLAR_ENDONLY, but there is no way to  make  PCRE
       behave exactly as for the REG_NEWLINE action.

MATCHING A PATTERN

       The  function  regexec()  is  called  to  match a compiled pattern preg
       against a given string, which is by default terminated by a  zero  byte
       (but  see  REG_STARTEND below), subject to the options in eflags. These
       can be:

         REG_NOTBOL

       The PCRE_NOTBOL option is set when calling the underlying PCRE matching
       function.

         REG_NOTEMPTY

       The PCRE_NOTEMPTY option is set when calling the underlying PCRE match‐
       ing function. Note that REG_NOTEMPTY is not part of the POSIX standard.
       However, setting this option can give more POSIX-like behaviour in some
       situations.

         REG_NOTEOL

       The PCRE_NOTEOL option is set when calling the underlying PCRE matching
       function.

         REG_STARTEND

       The  string  is  considered to start at string + pmatch[0].rm_so and to
       have a terminating NUL located at string + pmatch[0].rm_eo (there  need
       not  actually  be  a  NUL at that location), regardless of the value of
       nmatch. This is a BSD extension, compatible with but not  specified  by
       IEEE  Standard  1003.2  (POSIX.2),  and  should be used with caution in
       software intended to be portable to other systems. Note that a non-zero
       rm_so does not imply REG_NOTBOL; REG_STARTEND affects only the location
       of the string, not how it is matched.

       If the pattern was compiled with the REG_NOSUB flag, no data about  any
       matched  strings  is  returned.  The  nmatch  and  pmatch  arguments of
       regexec() are ignored.

       If the value of nmatch is zero, or if the value of pmatch is  NULL,  no
       data about any matched strings is returned.

       Otherwise, the portion of the string that was matched, and also any
       captured substrings, are returned via the pmatch argument, which points
       to an array of nmatch structures of type regmatch_t, containing the mem‐
       bers rm_so and rm_eo. These contain the offset to the  first  character
       of  each  substring and the offset to the first character after the end
       of each substring, respectively. The 0th element of the vector  relates
       to  the  entire portion of string that was matched; subsequent elements
       relate to the capturing subpatterns of the regular  expression.  Unused
       entries in the array have both structure members set to -1.

       A  successful  match  yields  a  zero  return;  various error codes are
       defined in the header file, of  which  REG_NOMATCH  is  the  "expected"
       failure code.

ERROR MESSAGES

       The  regerror()  function  maps  a  non-zero  error  code  from  either
       regcomp() or regexec() to a printable message. If preg is not NULL, the
       error  should  have arisen from the use of that structure. A message
       terminated by a binary zero is placed in  errbuf.  The  length  of  the
       message, including  the  zero, is limited to errbuf_size. The yield of
       the function is the size of buffer needed to hold the whole message.

MEMORY USAGE

       Compiling a regular expression causes memory to be allocated and  asso‐
       ciated  with  the preg structure. The function regfree() frees all such
       memory, after which preg may no longer be used as  a  compiled  expres‐
       sion.

AUTHOR

       Philip Hazel
       University Computing Service
       Cambridge CB2 3QH, England.

REVISION

       Last updated: 09 January 2012
       Copyright (c) 1997-2012 University of Cambridge.

PCRE 8.30                       09 January 2012                   PCREPOSIX(3)