PCREBUILD(3)               Library Functions Manual               PCREBUILD(3)

NAME

       PCRE - Perl-compatible regular expressions

PCRE BUILD-TIME OPTIONS

       This document describes the optional features of PCRE that can be
       selected when the library is compiled. They are all selected, or
       deselected, by providing options to the configure script that is run
       before the make command. The complete list of options for configure
       (which includes the standard ones, such as the selection of the
       installation directory) can be obtained by running

         ./configure --help

       The following sections describe options whose names begin with
       --enable or --disable. These settings specify changes to the defaults
       for the configure command. Because of the way that configure works,
       --enable and --disable always come in pairs, so the complementary
       option always exists as well; since it specifies the default, it is
       not described.

C++ SUPPORT

       By default, the configure script will search for a C++ compiler and C++
       header files. If it finds them, it automatically builds the C++ wrapper
       library for PCRE. You can disable this by adding

         --disable-cpp

       to the configure command.

UTF-8 SUPPORT

       To build PCRE with support for UTF-8 character strings, add

         --enable-utf8

       to the configure command. Of itself, this does not make PCRE treat
       strings as UTF-8. As well as compiling PCRE with this option, you also
       have to set the PCRE_UTF8 option when you call the pcre_compile()
       function.

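       With PCRE_UTF8 set, a multibyte UTF-8 sequence in the pattern or
       subject is treated as a single character, which is how character
       values above 255 arise. The following sketch is not PCRE code:
       decode_utf8() is a hypothetical helper that decodes one sequence,
       assuming the input is well-formed (PCRE itself checks UTF-8 validity
       unless told not to).

```c
/* Hypothetical helper, not part of PCRE: decode the UTF-8 sequence at s,
   store its code point in *cp, and return the number of bytes consumed.
   Assumes well-formed input; no validation is performed. */
static int decode_utf8(const unsigned char *s, unsigned long *cp)
{
    if (s[0] < 0x80) {                      /* 0xxxxxxx: one byte (ASCII) */
        *cp = s[0];
        return 1;
    }
    if ((s[0] & 0xE0) == 0xC0) {            /* 110xxxxx 10xxxxxx */
        *cp = ((unsigned long)(s[0] & 0x1F) << 6) | (s[1] & 0x3F);
        return 2;
    }
    if ((s[0] & 0xF0) == 0xE0) {            /* 1110xxxx + 2 continuation bytes */
        *cp = ((unsigned long)(s[0] & 0x0F) << 12)
            | ((unsigned long)(s[1] & 0x3F) << 6)
            | (s[2] & 0x3F);
        return 3;
    }
    /* 11110xxx + 3 continuation bytes */
    *cp = ((unsigned long)(s[0] & 0x07) << 18)
        | ((unsigned long)(s[1] & 0x3F) << 12)
        | ((unsigned long)(s[2] & 0x3F) << 6)
        | (s[3] & 0x3F);
    return 4;
}
```

       For example, the bytes C3 A9 decode to U+00E9 (two bytes consumed),
       and E2 82 AC decode to U+20AC, a single character whose value is
       above 255.
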
UNICODE CHARACTER PROPERTY SUPPORT

       UTF-8 support allows PCRE to process character values greater than 255
       in the strings that it handles. On its own, however, it does not
       provide any facilities for accessing the properties of such
       characters. If you want to be able to use the pattern escapes \P, \p,
       and \X, which refer to Unicode character properties, you must add

         --enable-unicode-properties

       to the configure command. This implies UTF-8 support, even if you have
       not explicitly requested it.

       Including Unicode property support adds around 30K of tables to the
       PCRE library. Only the general category properties such as Lu and Nd
       are supported. Details are given in the pcrepattern documentation.

CODE VALUE OF NEWLINE

       By default, PCRE interprets character 10 (linefeed, LF) as indicating
       the end of a line. This is the normal newline character on Unix-like
       systems. You can compile PCRE to use character 13 (carriage return,
       CR) instead, by adding

         --enable-newline-is-cr

       to the configure command. There is also a --enable-newline-is-lf
       option, which explicitly specifies linefeed as the newline character.

       Alternatively, you can specify that line endings are to be indicated by
       the two-character sequence CRLF. If you want this, add

         --enable-newline-is-crlf

       to the configure command. There is a fourth option, specified by

         --enable-newline-is-anycrlf

       which causes PCRE to recognize any of the three sequences CR, LF, or
       CRLF as indicating a line ending. Finally, a fifth option, specified by

         --enable-newline-is-any

       causes PCRE to recognize any Unicode newline sequence.

       Whatever line ending convention is selected when PCRE is built can be
       overridden when the library functions are called. At build time it is
       conventional to use the standard for your operating system.

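       The five conventions differ only in which byte sequences count as a
       line ending. The sketch below is not PCRE's implementation;
       newline_length() and the NL_* names are hypothetical. It reports the
       length of the newline sequence, if any, that starts at a given
       position, treating the subject as 8-bit bytes. For the "any"
       convention it covers CR, LF, CRLF, VT, FF, and NEL (0x85); in UTF-8
       mode the Unicode set also includes LS (U+2028) and PS (U+2029), which
       are multibyte and omitted here.

```c
#include <stddef.h>

/* Hypothetical names for the five build-time newline conventions. */
enum newline_convention { NL_LF, NL_CR, NL_CRLF, NL_ANYCRLF, NL_ANY };

/* Return the length in bytes of the newline sequence that starts at
   s[pos] under convention nl, or 0 if none starts there. The subject is
   treated as 8-bit bytes (no UTF-8 decoding). */
static size_t newline_length(const unsigned char *s, size_t len, size_t pos,
                             enum newline_convention nl)
{
    if (pos >= len)
        return 0;
    unsigned char c = s[pos];
    int crlf = (c == '\r' && pos + 1 < len && s[pos + 1] == '\n');

    switch (nl) {
    case NL_LF:
        return c == '\n' ? 1 : 0;
    case NL_CR:
        return c == '\r' ? 1 : 0;
    case NL_CRLF:
        return crlf ? 2 : 0;        /* a lone CR or LF is not a newline */
    case NL_ANYCRLF:
        if (crlf)
            return 2;
        return (c == '\r' || c == '\n') ? 1 : 0;
    case NL_ANY:
        if (crlf)
            return 2;
        /* CR, LF, VT, FF, NEL */
        return (c == '\r' || c == '\n' || c == 0x0B || c == 0x0C || c == 0x85)
               ? 1 : 0;
    }
    return 0;
}
```

       Note that under NL_CRLF a lone CR or LF does not end a line, whereas
       NL_ANYCRLF accepts all three forms and treats CR followed by LF as one
       two-byte line ending.
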
BUILDING SHARED AND STATIC LIBRARIES

       The PCRE building process uses libtool to build both shared and static
       Unix libraries by default. You can suppress one of these by adding one
       of

         --disable-shared
         --disable-static

       to the configure command, as required.

POSIX MALLOC USAGE

       When PCRE is called through the POSIX interface (see the pcreposix
       documentation), additional working storage is required for holding the
       offsets of captured substrings, because PCRE requires three integers
       per substring, whereas the POSIX interface provides only two. If the
       number of expected substrings is small, the wrapper function uses space
       on the stack, because this is faster than using malloc() for each call.
       The default threshold above which the stack is no longer used is 10; it
       can be changed by adding a setting such as

         --with-posix-malloc-threshold=20

       to the configure command.

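       The stack-versus-heap choice can be sketched as follows. This is not
       the pcreposix source: ovector_for() is a hypothetical helper, and
       POSIX_MALLOC_THRESHOLD here is a local stand-in for the configured
       value (default 10). For nmatch substrings it needs 3 * nmatch ints: a
       start/end offset pair per substring plus a third that pcre_exec() uses
       as workspace.

```c
#include <stdlib.h>

/* Local stand-in for the value chosen with --with-posix-malloc-threshold. */
#define POSIX_MALLOC_THRESHOLD 10

/* Return a buffer of 3 * nmatch ints. Requests up to the threshold use the
   caller's stack array; larger ones are taken from the heap. *heap is set
   to 1 when the caller must free() the result, 0 otherwise. */
static int *ovector_for(size_t nmatch,
                        int stackbuf[3 * POSIX_MALLOC_THRESHOLD], int *heap)
{
    if (nmatch <= POSIX_MALLOC_THRESHOLD) {
        *heap = 0;
        return stackbuf;
    }
    *heap = 1;
    return malloc(3 * nmatch * sizeof(int));
}
```

       A caller would declare int small[3 * POSIX_MALLOC_THRESHOLD] on the
       stack, call ovector_for(), and free() the result only when *heap has
       been set to 1.
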
HANDLING VERY LARGE PATTERNS

       Within a compiled pattern, offset values are used to point from one
       part to another (for example, from an opening parenthesis to an
       alternation metacharacter). By default, two-byte values are used for
       these offsets, leading to a maximum size for a compiled pattern of
       around 64K. This is sufficient to handle all but the most gigantic
       patterns. Nevertheless, some people do want to process enormous
       patterns, so it is possible to compile PCRE to use three-byte or
       four-byte offsets by adding a setting such as

         --with-link-size=3

       to the configure command. The value given must be 2, 3, or 4. Using
       longer offsets slows down the operation of PCRE because it has to load
       additional bytes when handling them.

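       The limit and the cost both follow from the link size n: an offset
       stored in n bytes can be at most 2^(8*n) - 1, that is 65535 for n = 2,
       16777215 for n = 3, and 4294967295 for n = 4. The sketch below uses
       hypothetical helpers put_link(), get_link(), and max_link() that store
       an offset most significant byte first; it is an illustration, not
       PCRE's internal code, but it shows that reading a link takes one byte
       load per byte of link size.

```c
/* Store offset d in link_size bytes at a, most significant byte first. */
static void put_link(unsigned char *a, int link_size, unsigned long d)
{
    for (int i = link_size - 1; i >= 0; i--) {
        a[i] = (unsigned char)(d & 0xFF);
        d >>= 8;
    }
}

/* Read back a link_size-byte offset: one load per byte, so each extra
   byte of link size adds work every time a link is followed. */
static unsigned long get_link(const unsigned char *a, int link_size)
{
    unsigned long d = 0;
    for (int i = 0; i < link_size; i++)
        d = (d << 8) | a[i];
    return d;
}

/* Largest offset representable in link_size bytes: 2^(8*link_size) - 1. */
static unsigned long long max_link(int link_size)
{
    return (1ULL << (8 * link_size)) - 1;
}
```
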
AVOIDING EXCESSIVE STACK USAGE

       When matching with the pcre_exec() function, PCRE implements
       backtracking by making recursive calls to an internal function called
       match(). In environments where the size of the stack is limited, this
       can severely limit PCRE's operation. (The Unix environment does not
       usually suffer from this problem, but it may sometimes be necessary to
       increase the maximum stack size. There is a discussion in the pcrestack
       documentation.) An alternative approach to recursion that uses memory
       from the heap to remember data, instead of using recursive function
       calls, has been implemented to work round the problem of limited stack
       size. If you want to build a version of PCRE that works this way, add

         --disable-stack-for-recursion

       to the configure command. With this configuration, PCRE will use the
       pcre_stack_malloc and pcre_stack_free variables to call memory
       management functions. By default these point to malloc() and free(),
       but you can replace the pointers so that your own functions are used.

       Separate functions are provided rather than using pcre_malloc and
       pcre_free because the usage is very predictable: the block sizes
       requested are always the same, and the blocks are always freed in
       reverse order. A calling program might be able to implement optimized
       functions that perform better than malloc() and free(). PCRE runs
       noticeably more slowly when built in this way. This option affects only
       the pcre_exec() function; it is not relevant for the pcre_dfa_exec()
       function.

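       Because the requests are fixed-size and freed in strict last-in,
       first-out order, a simple arena allocator is enough. The sketch below
       is hypothetical: frame_alloc(), frame_free(), and the 64K arena size
       are assumptions, the arena is a single global (so it is only suitable
       for single-threaded use), and frame_alloc() returns NULL when the
       arena is full rather than falling back to malloc(), which a production
       version would need to handle.

```c
#include <stddef.h>
#include <string.h>

#define STACK_ARENA_SIZE (64 * 1024)   /* hypothetical arena size in bytes */
#define HDR 16                         /* per-block header; keeps 16-byte spacing */

/* The union gives the arena alignment suitable for common object types. */
static union {
    long double ld;
    long long ll;
    void *vp;
    unsigned char bytes[STACK_ARENA_SIZE];
} arena;
static size_t arena_top = 0;           /* bytes currently in use */

/* Take a block from the top of the arena. A header in front of each block
   records its total footprint so that frame_free() can pop it again.
   Returns NULL when the request does not fit. */
void *frame_alloc(size_t size)
{
    if (size > STACK_ARENA_SIZE)
        return NULL;
    size_t total = HDR + ((size + HDR - 1) / HDR) * HDR;
    if (total > STACK_ARENA_SIZE - arena_top)
        return NULL;
    unsigned char *header = arena.bytes + arena_top;
    memcpy(header, &total, sizeof total);
    arena_top += total;
    return header + HDR;
}

/* Release a block. Relies on LIFO order: p must be the block most
   recently returned by frame_alloc() that has not yet been freed. */
void frame_free(void *p)
{
    size_t total;
    if (p == NULL)
        return;
    memcpy(&total, (unsigned char *)p - HDR, sizeof total);
    arena_top -= total;
}
```

       With a PCRE built using --disable-stack-for-recursion, these functions
       would be installed before calling pcre_exec():

         pcre_stack_malloc = frame_alloc;
         pcre_stack_free = frame_free;
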
LIMITING PCRE RESOURCE USAGE

       Internally, PCRE has a function called match(), which it calls
       repeatedly (sometimes recursively) when matching a pattern with the
       pcre_exec() function. By controlling the maximum number of times this
       function may be called during a single matching operation, a limit can
       be placed on the resources used by a single call to pcre_exec(). The
       limit can be changed at run time, as described in the pcreapi
       documentation. The default is 10 million, but this can be changed by
       adding a setting such as

         --with-match-limit=500000

       to the configure command. This setting has no effect on the
       pcre_dfa_exec() matching function.

       In some environments it is desirable to limit the depth of recursive
       calls of match() more strictly than the total number of calls, in order
       to restrict the maximum amount of stack (or heap, if
       --disable-stack-for-recursion is specified) that is used. A second
       limit controls this; it defaults to the value that is set for
       --with-match-limit, which imposes no additional constraints. However,
       you can set a lower limit by adding, for example,

         --with-match-limit-recursion=10000

       to the configure command. This value can also be overridden at run
       time.

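       The two limits can be illustrated with a toy backtracking matcher. The
       sketch below is not PCRE's match(): toy_match(), toy_exec(), and the
       TOY_* result codes are hypothetical, and the matcher handles only
       literal characters, '.', and postfix '*', anchored at the start of the
       subject. It counts every call against a total-call limit and tracks
       the nesting depth against a separate depth limit, giving up with an
       error code when either is exceeded. In the real library, both limits
       can also be overridden per call through the match_limit and
       match_limit_recursion fields of a pcre_extra block, as described in
       the pcreapi documentation.

```c
/* Result codes for the toy matcher (hypothetical). */
enum { TOY_MATCH = 1, TOY_NOMATCH = 0,
       TOY_ERR_MATCHLIMIT = -1, TOY_ERR_RECURSIONLIMIT = -2 };

struct toy_limits {
    unsigned long match_limit;      /* max total calls of toy_match() */
    unsigned long recursion_limit;  /* max nesting depth of toy_match() */
    unsigned long calls;            /* calls made so far */
};

/* Backtracking matcher for literals, '.', and postfix '*'. */
static int toy_match(const char *re, const char *text,
                     struct toy_limits *lim, unsigned long depth)
{
    if (++lim->calls > lim->match_limit)
        return TOY_ERR_MATCHLIMIT;
    if (depth > lim->recursion_limit)
        return TOY_ERR_RECURSIONLIMIT;

    if (re[0] == '\0')
        return TOY_MATCH;                       /* whole pattern consumed */
    if (re[1] == '*') {
        /* Try zero, one, two, ... repetitions, backtracking on failure. */
        const char *t = text;
        for (;;) {
            int rc = toy_match(re + 2, t, lim, depth + 1);
            if (rc != TOY_NOMATCH)
                return rc;                      /* a match or an error */
            if (*t == '\0' || (re[0] != '.' && *t != re[0]))
                return TOY_NOMATCH;
            t++;
        }
    }
    if (*text != '\0' && (re[0] == '.' || re[0] == *text))
        return toy_match(re + 1, text + 1, lim, depth + 1);
    return TOY_NOMATCH;
}

/* Match re at the start of text under the given limits. */
static int toy_exec(const char *re, const char *text,
                    unsigned long match_limit, unsigned long recursion_limit)
{
    struct toy_limits lim = { match_limit, recursion_limit, 0 };
    return toy_match(re, text, &lim, 0);
}
```

       A pattern such as a*a*a*a*a*a*b applied to a long run of a's with no b
       makes the number of calls grow combinatorially while the depth stays
       small, so it trips the call limit; a long literal pattern does the
       opposite, nesting deeply with few calls, so only the depth limit
       catches it.
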
CREATING CHARACTER TABLES AT BUILD TIME

       PCRE uses fixed tables for processing characters whose code values are
       less than 256. By default, PCRE is built with a set of tables that are
       distributed in the file pcre_chartables.c.dist. These tables are for
       ASCII codes only. If you add

         --enable-rebuild-chartables

       to the configure command, the distributed tables are no longer used.
       Instead, a program called dftables is compiled and run. This outputs
       the source for a new set of tables, created in the default locale of
       your C runtime system. (This method of replacing the tables does not
       work if you are cross-compiling, because dftables is run on the local
       host. If you need to create alternative tables when cross-compiling,
       you will have to do so "by hand".)

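       The idea behind dftables can be sketched as follows: classify every
       byte value with the <ctype.h> functions of the running C library and
       record the results. This is a much-simplified illustration, not
       dftables itself; the table names and the CT_* flags are hypothetical
       and do not follow the layout of pcre_chartables.c.

```c
#include <ctype.h>

/* Hypothetical type flags; PCRE's real tables use their own layout. */
enum { CT_SPACE = 0x01, CT_LETTER = 0x02, CT_DIGIT = 0x04, CT_WORD = 0x08 };

static unsigned char lower_table[256];  /* maps each byte to its lowercase */
static unsigned char type_table[256];   /* bitmap of CT_* flags per byte */

/* Fill both tables using the <ctype.h> classification of the current
   locale, so the result depends on the machine and locale it runs in. */
static void build_tables(void)
{
    for (int c = 0; c < 256; c++) {
        unsigned char t = 0;
        lower_table[c] = (unsigned char)tolower(c);
        if (isspace(c)) t |= CT_SPACE;
        if (isalpha(c)) t |= CT_LETTER;
        if (isdigit(c)) t |= CT_DIGIT;
        if (isalnum(c) || c == '_') t |= CT_WORD;
        type_table[c] = t;
    }
}
```

       A C program starts in the "C" locale; calling setlocale(LC_CTYPE, "")
       before build_tables() would instead classify bytes according to the
       locale named in the environment, which is why tables generated this
       way describe the host on which they are built.
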
USING EBCDIC CODE

       PCRE assumes by default that it will run in an environment where the
       character code is ASCII (or Unicode, which is a superset of ASCII).
       This is the case for most computer operating systems. PCRE can,
       however, be compiled to run in an EBCDIC environment by adding

         --enable-ebcdic

       to the configure command. This setting implies
       --enable-rebuild-chartables. You should only use it if you know that
       you are in an EBCDIC environment (for example, an IBM mainframe
       operating system).

SEE ALSO

       pcreapi(3), pcre_config(3).

AUTHOR

       Philip Hazel
       University Computing Service
       Cambridge CB2 3QH, England.

REVISION

       Last updated: 30 July 2007
       Copyright (c) 1997-2007 University of Cambridge.

                                                                  PCREBUILD(3)