PCRE2POSIX(3)              Library Functions Manual              PCRE2POSIX(3)

NAME

       PCRE2 - Perl-compatible regular expressions (revised API)

SYNOPSIS

       #include <pcre2posix.h>

       int regcomp(regex_t *preg, const char *pattern,
            int cflags);

       int regexec(const regex_t *preg, const char *string,
            size_t nmatch, regmatch_t pmatch[], int eflags);

       size_t regerror(int errcode, const regex_t *preg,
            char *errbuf, size_t errbuf_size);

       void regfree(regex_t *preg);

DESCRIPTION

       This set of functions provides a POSIX-style API for the PCRE2 regular
       expression 8-bit library. See the pcre2api documentation for a
       description of PCRE2's native API, which contains much additional
       functionality. There are no POSIX-style wrappers for PCRE2's 16-bit
       and 32-bit libraries.

       The functions described here are just wrapper functions that ultimately
       call the PCRE2 native API. Their prototypes are defined in the
       pcre2posix.h header file, and on Unix systems the library itself is
       called libpcre2-posix.a, so it can be accessed by adding -lpcre2-posix
       to the command for linking an application that uses them. Because the
       POSIX functions call the native ones, it is also necessary to add
       -lpcre2-8.

       Those POSIX option bits that can reasonably be mapped to PCRE2 native
       options have been implemented. In addition, the option REG_EXTENDED is
       defined with the value zero. This has no effect, but since programs
       that are written to the POSIX interface often use it, this makes it
       easier to slot in PCRE2 as a replacement library. Other POSIX options
       are not even defined.

       There are also some options that are not defined by POSIX. These have
       been added at the request of users who want to make use of certain
       PCRE2-specific features via the POSIX calling interface.

       When PCRE2 is called via these functions, it is only the API that is
       POSIX-like in style. The syntax and semantics of the regular
       expressions themselves are still those of Perl, subject to the setting
       of various PCRE2 options, as described below. "POSIX-like in style"
       means that the API approximates the POSIX definition; it is not fully
       POSIX-compatible, and in multi-unit encoding domains it is probably
       even less compatible.

       The header for these functions is supplied as pcre2posix.h to avoid any
       potential clash with other POSIX libraries. It can, of course, be
       renamed or aliased as regex.h, which is the "correct" name. It provides
       two structure types, regex_t for compiled internal forms, and
       regmatch_t for returning captured substrings. It also defines some
       constants whose names start with "REG_"; these are used for setting
       options and identifying error codes.


COMPILING A PATTERN

       The function regcomp() is called to compile a pattern into an internal
       form. The pattern is a C string terminated by a binary zero, and is
       passed in the argument pattern. The preg argument is a pointer to a
       regex_t structure that is used as a base for storing information about
       the compiled regular expression.

       The argument cflags is either zero, or contains one or more of the bits
       defined by the following macros:

         REG_DOTALL

       The PCRE2_DOTALL option is set when the regular expression is passed
       for compilation to the native function. Note that REG_DOTALL is not
       part of the POSIX standard.

         REG_ICASE

       The PCRE2_CASELESS option is set when the regular expression is passed
       for compilation to the native function.

         REG_NEWLINE

       The PCRE2_MULTILINE option is set when the regular expression is passed
       for compilation to the native function. Note that this does not mimic
       the defined POSIX behaviour for REG_NEWLINE (see the following
       section).

         REG_NOSUB

       When a pattern that is compiled with this flag is passed to regexec()
       for matching, the nmatch and pmatch arguments are ignored, and no
       captured strings are returned. Versions of the PCRE2 library prior to
       10.22 used to set the PCRE2_NO_AUTO_CAPTURE compile option, but this no
       longer happens because it disables the use of back references.

         REG_UCP

       The PCRE2_UCP option is set when the regular expression is passed for
       compilation to the native function. This causes PCRE2 to use Unicode
       properties when matching \d, \w, etc., instead of just recognizing
       ASCII values. Note that REG_UCP is not part of the POSIX standard.

         REG_UNGREEDY

       The PCRE2_UNGREEDY option is set when the regular expression is passed
       for compilation to the native function. Note that REG_UNGREEDY is not
       part of the POSIX standard.

         REG_UTF

       The PCRE2_UTF option is set when the regular expression is passed for
       compilation to the native function. This causes the pattern itself and
       all data strings used for matching it to be treated as UTF-8 strings.
       Note that REG_UTF is not part of the POSIX standard.

       In the absence of these flags, no options are passed to the native
       function. This means that the regex is compiled with PCRE2 default
       semantics. In particular, the way it handles newline characters in the
       subject string is the Perl way, not the POSIX way. Note that setting
       PCRE2_MULTILINE has only some of the effects specified for REG_NEWLINE.
       It does not affect the way newlines are matched by the dot
       metacharacter (they are not) or by a negative class such as [^a] (they
       are).

       The yield of regcomp() is zero on success, and non-zero otherwise. The
       preg structure is filled in on success, and one member of the structure
       is public: re_nsub contains the number of capturing subpatterns in the
       regular expression. Various error codes are defined in the header file.

       NOTE: If the yield of regcomp() is non-zero, you must not attempt to
       use the contents of the preg structure. If, for example, you pass it to
       regexec(), the result is undefined and your program is likely to crash.


MATCHING NEWLINE CHARACTERS

       This area is not simple, because POSIX and Perl take different views of
       things. It is not possible to get PCRE2 to obey POSIX semantics, but
       then PCRE2 was never intended to be a POSIX engine. The following table
       lists the different possibilities for matching newline characters in
       Perl and PCRE2:

                                 Default   Change with

         . matches newline          no     PCRE2_DOTALL
         newline matches [^a]       yes    not changeable
         $ matches \n at end        yes    PCRE2_DOLLAR_ENDONLY
         $ matches \n in middle     no     PCRE2_MULTILINE
         ^ matches \n in middle     no     PCRE2_MULTILINE

       This is the equivalent table for a POSIX-compatible pattern matcher:

                                 Default   Change with

         . matches newline          yes    REG_NEWLINE
         newline matches [^a]       yes    REG_NEWLINE
         $ matches \n at end        no     REG_NEWLINE
         $ matches \n in middle     no     REG_NEWLINE
         ^ matches \n in middle     no     REG_NEWLINE

       This behaviour is not what happens when PCRE2 is called via its POSIX
       API. By default, PCRE2's behaviour is the same as Perl's, except that
       there is no equivalent for PCRE2_DOLLAR_ENDONLY in Perl. In both PCRE2
       and Perl, there is no way to stop newline from matching [^a].

       Default POSIX newline handling can be obtained by setting PCRE2_DOTALL
       and PCRE2_DOLLAR_ENDONLY when calling pcre2_compile() directly, but
       there is no way to make PCRE2 behave exactly as for the REG_NEWLINE
       action. When using the POSIX API, passing REG_NEWLINE to PCRE2's
       regcomp() function causes PCRE2_MULTILINE to be passed to
       pcre2_compile(), and REG_DOTALL passes PCRE2_DOTALL. There is no way to
       pass PCRE2_DOLLAR_ENDONLY.


MATCHING A PATTERN

       The function regexec() is called to match a compiled pattern preg
       against a given string, which is by default terminated by a zero byte
       (but see REG_STARTEND below), subject to the options in eflags. These
       can be:

         REG_NOTBOL

       The PCRE2_NOTBOL option is set when calling the underlying PCRE2
       matching function.

         REG_NOTEMPTY

       The PCRE2_NOTEMPTY option is set when calling the underlying PCRE2
       matching function. Note that REG_NOTEMPTY is not part of the POSIX
       standard. However, setting this option can give more POSIX-like
       behaviour in some situations.

         REG_NOTEOL

       The PCRE2_NOTEOL option is set when calling the underlying PCRE2
       matching function.

         REG_STARTEND

       The string is considered to start at string + pmatch[0].rm_so and to
       have a terminating NUL located at string + pmatch[0].rm_eo (there need
       not actually be a NUL at that location), regardless of the value of
       nmatch. This is a BSD extension, compatible with but not specified by
       IEEE Standard 1003.2 (POSIX.2), and should be used with caution in
       software intended to be portable to other systems. Note that a non-zero
       rm_so does not imply REG_NOTBOL; REG_STARTEND affects only the location
       of the string, not how it is matched. Setting REG_STARTEND and passing
       pmatch as NULL are mutually exclusive; if both are done, the error
       REG_INVARG is returned.


       If the pattern was compiled with the REG_NOSUB flag, no data about any
       matched strings is returned. The nmatch and pmatch arguments of
       regexec() are ignored (except possibly as input for REG_STARTEND).

       The value of nmatch may be zero, and the value of pmatch may be NULL
       (unless REG_STARTEND is set); in both these cases no data about any
       matched strings is returned.

       Otherwise, the portion of the string that was matched, and also any
       captured substrings, are returned via the pmatch argument, which points
       to an array of nmatch structures of type regmatch_t, containing the
       members rm_so and rm_eo. These contain the byte offset to the first
       character of each substring and the offset to the first character after
       the end of each substring, respectively. The 0th element of the vector
       relates to the entire portion of string that was matched; subsequent
       elements relate to the capturing subpatterns of the regular expression.
       Unused entries in the array have both structure members set to -1.

       A successful match yields a zero return; various error codes are
       defined in the header file, of which REG_NOMATCH is the "expected"
       failure code.


ERROR MESSAGES

       The regerror() function maps a non-zero error code from either
       regcomp() or regexec() to a printable message. If preg is not NULL, the
       error should have arisen from the use of that structure. A message
       terminated by a binary zero is placed in errbuf. If the buffer is too
       short, only the first errbuf_size - 1 characters of the error message
       are used. The yield of the function is the size of buffer needed to
       hold the whole message, including the terminating zero. This value is
       greater than errbuf_size if the message was truncated.


MEMORY USAGE

       Compiling a regular expression causes memory to be allocated and
       associated with the preg structure. The function regfree() frees all
       such memory, after which preg may no longer be used as a compiled
       expression.


AUTHOR

       Philip Hazel
       University Computing Service
       Cambridge, England.

REVISION

       Last updated: 31 January 2016
       Copyright (c) 1997-2016 University of Cambridge.

PCRE2 10.22                     31 January 2016                  PCRE2POSIX(3)