PCREBUILD(3)               Library Functions Manual               PCREBUILD(3)

NAME

       PCRE - Perl-compatible regular expressions

PCRE BUILD-TIME OPTIONS

       This document describes the optional features of PCRE that can be
       selected when the library is compiled. It assumes use of the configure
       script, where the optional features are selected or deselected by
       providing options to configure before running the make command. However,
       the same options can be selected in both Unix-like and non-Unix-like
       environments using the GUI facility of CMakeSetup if you are using
       CMake instead of configure to build PCRE.

       The complete list of options for configure (which includes the standard
       ones such as the selection of the installation directory) can be
       obtained by running

         ./configure --help

       The following sections include descriptions of options whose names
       begin with --enable or --disable. These settings specify changes to the
       defaults for the configure command. Because of the way that configure
       works, --enable and --disable always come in pairs, so the
       complementary option always exists as well, but as it specifies the
       default, it is not described.
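       The help output mixes the feature switches with the standard
       installation options. As a minimal sketch, the paired switches can be
       filtered out with grep; the helper name list_feature_switches is
       hypothetical, and the script assumes it is run from the top-level PCRE
       source directory.

```shell
# Keep only lines that mention an --enable-... or --disable-... switch.
# (list_feature_switches is an illustrative helper, not part of PCRE.)
list_feature_switches() {
    grep -E -- '--(enable|disable)-'
}

if [ -x ./configure ]; then
    ./configure --help | list_feature_switches
else
    echo "configure not found; run this from the PCRE source directory" >&2
fi
```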

C++ SUPPORT

       By default, the configure script will search for a C++ compiler and C++
       header files. If it finds them, it automatically builds the C++ wrapper
       library for PCRE. You can disable this by adding

         --disable-cpp

       to the configure command.

UTF-8 SUPPORT

       To build PCRE with support for UTF-8 character strings, add

         --enable-utf8

       to the configure command. Of itself, this does not make PCRE treat
       strings as UTF-8. As well as compiling PCRE with this option, you also
       have to set the PCRE_UTF8 option when you call the pcre_compile()
       function.

UNICODE CHARACTER PROPERTY SUPPORT

       UTF-8 support allows PCRE to process character values greater than 255
       in the strings that it handles. On its own, however, it does not
       provide any facilities for accessing the properties of such characters.
       If you want to be able to use the pattern escapes \P, \p, and \X, which
       refer to Unicode character properties, you must add

         --enable-unicode-properties

       to the configure command. This implies UTF-8 support, even if you have
       not explicitly requested it.

       Including Unicode property support adds around 30K of tables to the
       PCRE library. Only the general category properties such as Lu and Nd
       are supported. Details are given in the pcrepattern documentation.
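       A possible way to request both features and then check the result is
       sketched below. The variable name UNICODE_FLAGS is illustrative. The
       script only prints the configure command so that it can be reviewed
       first, and it calls pcretest -C (which reports the optional features
       compiled into the library) only if a pcretest binary is on the PATH.

```shell
# --enable-utf8 is redundant next to --enable-unicode-properties, which
# implies it, but spelling it out documents the intent.
UNICODE_FLAGS="--enable-utf8 --enable-unicode-properties"
echo "./configure $UNICODE_FLAGS"

# After building and installing, list the features that were compiled in.
if command -v pcretest >/dev/null 2>&1; then
    pcretest -C || true
fi
```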

CODE VALUE OF NEWLINE

       By default, PCRE interprets character 10 (linefeed, LF) as indicating
       the end of a line. This is the normal newline character on Unix-like
       systems. You can compile PCRE to use character 13 (carriage return, CR)
       instead, by adding

         --enable-newline-is-cr

       to the configure command. There is also a --enable-newline-is-lf
       option, which explicitly specifies linefeed as the newline character.

       Alternatively, you can specify that line endings are to be indicated by
       the two-character sequence CRLF. If you want this, add

         --enable-newline-is-crlf

       to the configure command. There is a fourth option, specified by

         --enable-newline-is-anycrlf

       which causes PCRE to recognize any of the three sequences CR, LF, or
       CRLF as indicating a line ending. Finally, a fifth option, specified by

         --enable-newline-is-any

       causes PCRE to recognize any Unicode newline sequence.

       Whatever line ending convention is selected when PCRE is built can be
       overridden when the library functions are called. At build time it is
       conventional to use the standard for your operating system.
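       One way to follow the operating system's convention in a build script
       is sketched below. The function name newline_flag_for and the mapping
       from uname -s output to a flag are assumptions made for illustration;
       adjust the patterns to the platforms you actually build on.

```shell
# Hypothetical mapping from a system name to a build-time newline flag.
newline_flag_for() {
    case "$1" in
        CYGWIN*|MINGW*|MSYS*) echo "--enable-newline-is-crlf" ;;
        *)                    echo "--enable-newline-is-lf" ;;
    esac
}

NEWLINE_FLAG=$(newline_flag_for "$(uname -s)")
echo "./configure $NEWLINE_FLAG"
```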

WHAT \R MATCHES

       By default, the sequence \R in a pattern matches any Unicode newline
       sequence, whatever has been selected as the line ending sequence. If
       you specify

         --enable-bsr-anycrlf

       the default is changed so that \R matches only CR, LF, or CRLF.
       Whatever is selected when PCRE is built can be overridden when the
       library functions are called.

BUILDING SHARED AND STATIC LIBRARIES

       The PCRE building process uses libtool to build both shared and static
       Unix libraries by default. You can suppress one of these by adding one
       of

         --disable-shared
         --disable-static

       to the configure command, as required.

POSIX MALLOC USAGE

       When PCRE is called through the POSIX interface (see the pcreposix
       documentation), additional working storage is required for holding the
       pointers to capturing substrings, because PCRE requires three integers
       per substring, whereas the POSIX interface provides only two. If the
       number of expected substrings is small, the wrapper function uses space
       on the stack, because this is faster than using malloc() for each call.
       The default threshold above which the stack is no longer used is 10; it
       can be changed by adding a setting such as

         --with-posix-malloc-threshold=20

       to the configure command.

HANDLING VERY LARGE PATTERNS

       Within a compiled pattern, offset values are used to point from one
       part to another (for example, from an opening parenthesis to an
       alternation metacharacter). By default, two-byte values are used for
       these offsets, leading to a maximum size for a compiled pattern of
       around 64K. This is sufficient to handle all but the most gigantic
       patterns. Nevertheless, some people do want to process enormous
       patterns, so it is possible to compile PCRE to use three-byte or
       four-byte offsets by adding a setting such as

         --with-link-size=3

       to the configure command. The value given must be 2, 3, or 4. Using
       longer offsets slows down the operation of PCRE because it has to load
       additional bytes when handling them.
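       The 64K figure follows from the width of the offsets: an n-byte value
       can distinguish 2^(8n) positions. The sketch below prints that count
       for each permitted link size. The helper name offset_range is
       hypothetical, the arithmetic assumes a shell with 64-bit integers, and
       the numbers are the raw range of the offset field rather than a
       statement of PCRE's practical limits.

```shell
# Number of distinct values representable in an offset of $1 bytes.
offset_range() {
    echo $((1 << (8 * $1)))
}

for link_size in 2 3 4; do
    echo "--with-link-size=$link_size: $(offset_range "$link_size") offset values"
done
```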

AVOIDING EXCESSIVE STACK USAGE

       When matching with the pcre_exec() function, PCRE implements
       backtracking by making recursive calls to an internal function called
       match(). In environments where the size of the stack is limited, this
       can severely limit PCRE's operation. (The Unix environment does not
       usually suffer from this problem, but it may sometimes be necessary to
       increase the maximum stack size. There is a discussion in the pcrestack
       documentation.) An alternative approach to recursion that uses memory
       from the heap to remember data, instead of using recursive function
       calls, has been implemented to work round the problem of limited stack
       size. If you want to build a version of PCRE that works this way, add

         --disable-stack-for-recursion

       to the configure command. With this configuration, PCRE will use the
       pcre_stack_malloc and pcre_stack_free variables to call memory
       management functions. By default these point to malloc() and free(),
       but you can replace the pointers so that your own functions are used.

       Separate functions are provided rather than using pcre_malloc and
       pcre_free because the usage is very predictable: the block sizes
       requested are always the same, and the blocks are always freed in
       reverse order. A calling program might be able to implement optimized
       functions that perform better than malloc() and free(). PCRE runs
       noticeably more slowly when built in this way. This option affects only
       the pcre_exec() function; it is not relevant for the pcre_dfa_exec()
       function.
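       Before choosing the heap-based build, it can help to see what stack
       limit the build or target environment imposes. The sketch below assumes
       a shell whose ulimit builtin accepts -s (common, but not required by
       POSIX); the variable name STACK_LIMIT is illustrative, and no particular
       threshold is implied.

```shell
# Soft stack limit for this shell, in kilobytes, or the word "unlimited".
STACK_LIMIT=$(ulimit -s)
echo "current stack limit: $STACK_LIMIT"

# If the limit is small and cannot be raised, the heap-based build is the
# alternative described above. Printed here rather than executed.
echo "./configure --disable-stack-for-recursion"
```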

LIMITING PCRE RESOURCE USAGE

       Internally, PCRE has a function called match(), which it calls
       repeatedly (sometimes recursively) when matching a pattern with the
       pcre_exec() function. By controlling the maximum number of times this
       function may be called during a single matching operation, a limit can
       be placed on the resources used by a single call to pcre_exec(). The
       limit can be changed at run time, as described in the pcreapi
       documentation. The default is 10 million, but this can be changed by
       adding a setting such as

         --with-match-limit=500000

       to the configure command. This setting has no effect on the
       pcre_dfa_exec() matching function.

       In some environments it is desirable to limit the depth of recursive
       calls of match() more strictly than the total number of calls, in order
       to restrict the maximum amount of stack (or heap, if
       --disable-stack-for-recursion is specified) that is used. A second
       limit controls this; it defaults to the value that is set for
       --with-match-limit, which imposes no additional constraints. However,
       you can set a lower limit by adding, for example,

         --with-match-limit-recursion=10000

       to the configure command. This value can also be overridden at run
       time.
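       The two limits can be given together. The sketch below reuses the
       example values from the text and warns when the recursion limit is not
       lower than the match limit, since in that case it adds no constraint.
       The variable names are illustrative, and the configure command is
       printed rather than run.

```shell
MATCH_LIMIT=500000        # total calls to match() per pcre_exec() call
RECURSION_LIMIT=10000     # maximum depth of recursive match() calls

# The depth of recursion can never exceed the number of calls, so a
# recursion limit at or above the match limit has no additional effect.
if [ "$RECURSION_LIMIT" -ge "$MATCH_LIMIT" ]; then
    echo "warning: recursion limit does not tighten the match limit" >&2
fi

LIMIT_FLAGS="--with-match-limit=$MATCH_LIMIT --with-match-limit-recursion=$RECURSION_LIMIT"
echo "./configure $LIMIT_FLAGS"
```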

CREATING CHARACTER TABLES AT BUILD TIME

       PCRE uses fixed tables for processing characters whose code values are
       less than 256. By default, PCRE is built with a set of tables that are
       distributed in the file pcre_chartables.c.dist. These tables are for
       ASCII codes only. If you add

         --enable-rebuild-chartables

       to the configure command, the distributed tables are no longer used.
       Instead, a program called dftables is compiled and run. This outputs
       the source for a new set of tables, created in the default locale of
       your C run-time system. (This method of replacing the tables does not
       work if you are cross compiling, because dftables is run on the local
       host. If you need to create alternative tables when cross compiling,
       you will have to do so "by hand".)

USING EBCDIC CODE

       PCRE assumes by default that it will run in an environment where the
       character code is ASCII (or Unicode, which is a superset of ASCII).
       This is the case for most computer operating systems. PCRE can,
       however, be compiled to run in an EBCDIC environment by adding

         --enable-ebcdic

       to the configure command. This setting implies
       --enable-rebuild-chartables. You should only use it if you know that
       you are in an EBCDIC environment (for example, an IBM mainframe
       operating system).

PCREGREP OPTIONS FOR COMPRESSED FILE SUPPORT

       By default, pcregrep reads all files as plain text. You can build it so
       that it recognizes files whose names end in .gz or .bz2, and reads them
       with libz or libbz2, respectively, by adding one or both of

         --enable-pcregrep-libz
         --enable-pcregrep-libbz2

       to the configure command. These options naturally require that the
       relevant libraries are installed on your system. Configuration will
       fail if they are not.

PCRETEST OPTION FOR LIBREADLINE SUPPORT

       If you add

         --enable-pcretest-libreadline

       to the configure command, pcretest is linked with the libreadline
       library, and when its input is from a terminal, it reads it using the
       readline() function. This provides line-editing and history facilities.
       Note that libreadline is GPL-licenced, so if you distribute a binary of
       pcretest linked in this way, there may be licensing issues.

       Setting this option causes the -lreadline option to be added to the
       pcretest build. In many operating environments with a system-installed
       libreadline this is sufficient. However, in some environments (e.g. if
       an unmodified distribution version of readline is in use), some extra
       configuration may be necessary. The INSTALL file for libreadline says
       this:

         "Readline uses the termcap functions, but does not link with the
         termcap or curses library itself, allowing applications which link
         with readline the to choose an appropriate library."

       If your environment has not been set up so that an appropriate library
       is automatically included, you may need to add something like

         LIBS="-lncurses"

       immediately before the configure command.
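       "Immediately before" here means on the same command line, so that the
       assignment applies to that one configure run only. A minimal sketch
       follows; the variable name READLINE_LIBS is illustrative, ncurses is
       only one possible choice (some systems need a termcap or curses library
       instead), and the command is printed rather than run.

```shell
# Library that supplies the termcap functions readline depends on.
READLINE_LIBS="-lncurses"

# Print the full command line: the LIBS assignment prefixes configure.
echo "LIBS=\"$READLINE_LIBS\" ./configure --enable-pcretest-libreadline"
```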

SEE ALSO

       pcreapi(3), pcre_config(3).

AUTHOR

       Philip Hazel
       University Computing Service
       Cambridge CB2 3QH, England.

REVISION

       Last updated: 13 April 2008
       Copyright (c) 1997-2008 University of Cambridge.